| Stat | Value |
|---|---|
| Watchers on GitHub | 102 |
| Open issues | 4 |
| Average time to close an issue | about 15 hours |
| Open pull requests | 1+ |
| Closed pull requests | 0+ |
| Last commit | over 2 years ago |
| Repo created | over 2 years ago |
| Repo last updated | 6 months ago |
| Organization / Author | coreylynch |
This allows you to extract deep visual features from a pre-trained VGG-19 net for collections of millions of images. Images are loaded and preprocessed in parallel using multiple CPU threads, then shipped to the GPU in minibatches for the forward pass through the net. Model weights are downloaded for you and loaded using Torch's loadcaffe library, so you don't need to compile Caffe.
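The load-in-parallel-then-batch pipeline can be sketched roughly like this. This is a Python sketch, not the actual Torch/Lua code, and `load_fn` / `forward_fn` are hypothetical stand-ins for the image preprocessing step and the GPU forward pass:

```python
from concurrent.futures import ThreadPoolExecutor

def batches(items, batch_size):
    """Group a list into consecutive minibatches."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def extract_all(image_ids, load_fn, forward_fn, n_threads=8, batch_size=128):
    """Load/preprocess images on a CPU thread pool, then run the
    forward pass one minibatch at a time, collecting id -> vector."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        loaded = list(pool.map(load_fn, image_ids))
    results = {}
    for ids, imgs in zip(batches(image_ids, batch_size),
                         batches(loaded, batch_size)):
        vecs = forward_fn(imgs)  # one GPU forward pass per minibatch
        results.update(zip(ids, vecs))
    return results
```

The thread pool keeps the (slow, I/O-bound) decoding off the critical path, while the minibatch loop keeps the GPU fed with large contiguous batches.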
The feature extractor computes a 4096-dimensional feature vector for every image, containing the activations of the hidden layer immediately before VGG's object classifier. The activations are ReLU-ed and L2-normalized, so they can be used as generic off-the-shelf features for tasks like classification or image similarity.
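The post-processing applied to the activations amounts to the following (a minimal Python sketch of the math, not the Torch code):

```python
import math

def relu_l2_normalize(activations):
    """ReLU the raw activations, then scale the result to unit L2 norm,
    matching the post-processing described above."""
    rectified = [max(0.0, a) for a in activations]
    norm = math.sqrt(sum(a * a for a in rectified))
    if norm == 0.0:  # all activations were non-positive
        return rectified
    return [a / norm for a in rectified]
```

Unit-normalizing makes dot products between feature vectors equal to their cosine similarity, which is what makes them convenient for nearest-neighbor image search.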
You point it at a tab-separated file of (image_id, path to image on disk), specified by the -data flag, e.g.

    12	/home/username/images/12.jpg
    342	/home/username/images/342.jpg
    169	/home/username/images/169.jpg

and it creates a tab-separated file of (image_id, JSON-encoded VGG vector), specified by the -outFile flag, e.g.

    12	[4096 dimensional vector]
    342	[4096 dimensional vector]
    169	[4096 dimensional vector]

-nThreads tells it how many CPU loader threads to use.
-batchSize tells it how many images to put in each minibatch. The higher the batchSize, the higher the throughput, so I'd make this as large as your GPU memory will allow.
    th main.lua -data [tab separated file of (image_id, path_to_image_on_disk)] -outFile out_vecs -nThreads 8 -batchSize 128
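Once the run finishes, the output file can be consumed from any language by splitting on the tab and JSON-decoding the vector. A small Python sketch (the file name comes from the -outFile flag above):

```python
import json

def read_vgg_vectors(path):
    """Parse a tab-separated (image_id, JSON-encoded vector) file
    into a dict mapping image_id -> list of floats."""
    vectors = {}
    with open(path) as f:
        for line in f:
            image_id, blob = line.rstrip("\n").split("\t", 1)
            vectors[image_id] = json.loads(blob)
    return vectors
```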
On macOS, brew install coreutils findutils to get GNU versions of these utilities.
Lower layers learn filters that respond to simple features like edges and blobs of color. Middle layers see combinations of these lower-level features, forming filters that respond to common textures.
Higher layers see combinations of these middle layers, forming filters that respond to object parts, and so on.
You can see the actual content of the image becoming increasingly explicit along the processing hierarchy.
Or being able to embed images and words in a joint space, then do vector arithmetic in the learned space:

Yep, that's a multimodal vector describing a blue car, minus the multimodal vector for the word "blue", plus the vector for "red", resulting in a vector that is near images of red cars.
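That arithmetic-then-nearest-neighbor lookup can be sketched with plain cosine similarity. The 2-d embeddings and candidate ids below are made up for illustration; real multimodal vectors would live in a much higher-dimensional joint space:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def analogy(query_vec, minus_vec, plus_vec, candidates):
    """Return the id of the candidate embedding nearest (by cosine)
    to query - minus + plus, i.e. 'blue car' - 'blue' + 'red'."""
    target = [q - m + p for q, m, p in zip(query_vec, minus_vec, plus_vec)]
    return max(candidates, key=lambda cid: cosine(target, candidates[cid]))
```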
Take advice from here (actually go read the entire course, it's amazing).