YFCC100M HybridNet fc6 Deep Features for Content-Based Image Retrieval

Post on 23-Jan-2018


Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro and Fausto Rabitti

fabrizio.falchi@cnr.it

YFCC100M HYBRIDNET FC6 DEEP FEATURES FOR CONTENT-BASED IMAGE RETRIEVAL

Multimedia COMMONS Workshop at ACM Multimedia 2016

Amsterdam, The Netherlands, October 15-19


WHERE WE COME FROM AND MOTIVATIONS

CoPhIR – Content-based Photo Image Retrieval

http://cophir.isti.cnr.it

• Flickr 106M Photos (not all CC)

• Metadata: title, description, author, tags, comments, notes, GPS coordinates, number of views, and number of users who marked the photo as a favorite

• MPEG-7 Visual Features

• mainly used by the Similarity Search community (144 citations and about 100 requests)

Similarity Search: The Metric Space Approach. Zezula, Amato, Dohnal, Batko, 2008


MAJOR RELATED EVENTS

Deep Learning explosion

YFCC100M

The Multimedia Commons Initiative


CONTRIBUTIONS

• HybridNet fc6 Deep Features for YFCC100M images: multimediacommons.wordpress.com/

• CBIR Systems on the YFCC100M

o MI-File: mifile.deepfeatures.org

o Lucene Quantization: melisandre.deepfeatures.org

• Ground-truth Results for evaluating Approximate k-NN (k=10,001): www.deepfeatures.org/

o For 3 types of processing of the neuron activations (features)

o For subsets of the whole collection, at steps of 1M images
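As a rough illustration of how ground-truth k-NN results can be used to evaluate an approximate index, the sketch below computes exact nearest neighbours by sequential scan and then measures the recall of an approximate result list against them. The function names (`exact_knn`, `recall_at_k`), the Euclidean distance, and the toy 2-d data are assumptions made for the example; they are not taken from the published ground-truth files.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, dataset, k):
    """Sequential scan: ids of the k objects closest to the query, nearest first."""
    scored = ((euclidean(query, vec), obj_id) for obj_id, vec in enumerate(dataset))
    return [obj_id for _, obj_id in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, gt_ids, k):
    """Fraction of the true k nearest neighbours found in the approximate top k."""
    truth = set(gt_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k


# Toy data (hypothetical): five 2-d points and a query at the origin.
dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [5.0, 5.0]]
query = [0.0, 0.0]

ground_truth = exact_knn(query, dataset, k=3)
approximate = [0, 2, 4]  # pretend output of some approximate index
recall = recall_at_k(approximate, ground_truth, k=3)
```

With a precomputed ground truth, only `recall_at_k` has to be run for each approximate method; the expensive sequential scan is done once.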


HYBRIDNET

• Trained on 3.5 million images from 1,183 categories:

o ImageNet-ILSVRC: about 1 million images from 888 categories (removing Places 295 duplicates)

o Places 205: about 2.5 million images from 205 categories

Learning Deep Features for Scene Recognition using Places Database. Zhou, Lapedriza, Xiao, Torralba, Oliva, NIPS 2014


WHY HYBRIDNET FC6?

A Practical Guide to CNNs and Fisher Vectors for Image Instance Retrieval. V. Chandrasekhar, J. Lin, O. Morère, H. Goh, A. Veillard. Signal Processing, 2016, Elsevier


DEEP FEATURES PROCESSING

• We generated 3 distinct features from the fc6 activations:

o Raw (no ReLu) + L2Norm.

o ReLu + L2Norm.

o Binary: a simple binarization of deep features was shown to lead to a negligible performance drop for both classification and detection (PASCAL-CLS in particular).

$$ b_i = \begin{cases} 1 & \text{if } f_i > 0 \\ 0 & \text{otherwise} \end{cases} $$

Analyzing the performance of multilayer neural networks for object recognition. P. Agrawal, R. Girshick, and J. Malik (ECCV 2014)
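The three variants listed above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the helper names (`l2_normalize`, `raw_l2`, `relu_l2`, `binarize`) and the 4-dimensional toy vector are invented for the example, and real fc6 activations are far larger and would normally be processed with a numerical library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def raw_l2(fc6):
    """Variant 1: raw activations (no ReLU), then L2 normalization."""
    return l2_normalize(fc6)


def relu_l2(fc6):
    """Variant 2: clamp negative activations to zero, then L2 normalization."""
    return l2_normalize([max(0.0, x) for x in fc6])


def binarize(fc6):
    """Variant 3: b_i = 1 if f_i > 0, otherwise 0."""
    return [1 if x > 0 else 0 for x in fc6]


# Toy stand-in for one fc6 activation vector (hypothetical values).
fc6 = [3.0, -4.0, 0.0, 12.0]

raw_feature = raw_l2(fc6)        # keeps the sign of each activation
relu_feature = relu_l2(fc6)      # sparse: negatives become 0
binary_feature = binarize(fc6)   # one bit per dimension
```

Note that the binary variant marks exactly the positions that are non-zero after ReLU, so the ReLU and binary features share the same sparsity pattern.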


GT RESULTS www.deepfeatures.org


GT RESULTS (SEQUENTIAL SCANNING)


GT RESULTS (SEQUENTIAL SCANNING)


APPROXIMATE CBIR RESULTS

[Figure: result panels for MI-File and Lucene Quantization]


THE CBIR ONLINE SYSTEMS

• MI-File

o Permutation-based method

o Uses Inverted Files

MI-File: using inverted files for scalable approximate similarity search

G Amato, C Gennaro, P Savino (Multimedia tools and applications)

• Lucene Quantization

o Exploits the sparsity of deep features (ReLu -> 25% non zeros)

o Quantization approach to allow text encoding

o Also able to perform text and combined search

Large scale indexing and searching deep convolutional neural network features

G. Amato, F. Debole, F. Falchi, C. Gennaro, and F. Rabitti (DaWaK 2016)
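The idea of quantizing a sparse feature vector into text can be sketched as follows. The sketch assumes one common form of the approach: each dimension is mapped to a synthetic term, and the term is repeated in proportion to the quantized activation, so that a term-frequency match in a text engine approximates a dot product. The term naming (`f0`, `f1`, ...), the quantization factor `q`, and the `tf_dot` helper are assumptions for illustration and do not reproduce the online system or the Lucene scoring function.

```python
from collections import Counter


def to_surrogate_text(feature, q=8):
    """Encode a non-negative feature vector as space-separated terms.

    Dimension i becomes the term "f<i>", repeated floor(q * value) times.
    Zero (or negative) components produce no terms, so sparse vectors
    yield short documents.
    """
    terms = []
    for i, value in enumerate(feature):
        repeats = int(q * max(0.0, value))
        terms.extend(["f%d" % i] * repeats)
    return " ".join(terms)


def tf_dot(text_a, text_b):
    """Dot product of term-frequency vectors: a stand-in for text-engine scoring."""
    tf_a = Counter(text_a.split())
    tf_b = Counter(text_b.split())
    return sum(count * tf_b[term] for term, count in tf_a.items())


# Toy ReLU-style features (hypothetical values; zeros mimic the sparsity).
doc_feature = [0.0, 0.5, 0.0, 0.25]
query_feature = [0.5, 0.5, 0.0, 0.0]

doc_text = to_surrogate_text(doc_feature)
query_text = to_surrogate_text(query_feature)
score = tf_dot(query_text, doc_text)
```

Because the encoded image is an ordinary text document, it can sit in the same index as tags or descriptions, which is what makes text and combined search possible.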


MI-FILE (INDEXING BINARY FEATURES)
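A permutation-based index over binary features can be illustrated with the simplified sketch below. Each object is represented by the ranks of its closest reference objects (pivots), those ranks are stored in an inverted file keyed by pivot, and a query is answered by accumulating a Spearman-footrule-style distance over the posting lists. The Hamming distance, the pivot choice, the prefix length, and all function names are assumptions for the example; this is not the MI-File implementation.

```python
def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pivots(obj, pivots, prefix_len):
    """Ids of the prefix_len pivots nearest to obj, closest first (ties by id)."""
    order = sorted(range(len(pivots)), key=lambda p: (hamming(obj, pivots[p]), p))
    return order[:prefix_len]


def build_index(dataset, pivots, prefix_len):
    """Inverted file: pivot id -> list of (object id, rank of that pivot for the object)."""
    index = {p: [] for p in range(len(pivots))}
    for obj_id, obj in enumerate(dataset):
        for rank, p in enumerate(closest_pivots(obj, pivots, prefix_len)):
            index[p].append((obj_id, rank))
    return index


def search(query, index, pivots, prefix_len, n_objects, k):
    """Rank objects by a footrule distance between truncated pivot permutations."""
    query_prefix = closest_pivots(query, pivots, prefix_len)
    # Start from the worst case: a query pivot missing from an object's
    # prefix costs prefix_len; matches found in posting lists reduce the cost.
    scores = [prefix_len * len(query_prefix)] * n_objects
    for q_rank, p in enumerate(query_prefix):
        for obj_id, rank in index[p]:
            scores[obj_id] += abs(q_rank - rank) - prefix_len
    return sorted(range(n_objects), key=lambda i: (scores[i], i))[:k]


# Toy binary features and pivots (hypothetical values).
pivots = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
dataset = [[0, 0, 0, 1], [1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0]]

index = build_index(dataset, pivots, prefix_len=2)
result = search([0, 0, 0, 0], index, pivots, prefix_len=2, n_objects=len(dataset), k=2)
```

Only the posting lists of the query's closest pivots are read, which is why this kind of index scales better than a sequential scan at the price of approximate results.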


LUCENE QUANTIZATION (INDEXING RELU L2NORM.)


MI-FILE (COMPARED TO GT FOR RELU-L2NORM)

ONGOING WORKS


IMAGE ANNOTATION


CROSS MEDIA RETRIEVAL (RESULTS ON MS-COCO)

• Text queries are translated into HybridNet fc6 visual vectors by a neural network

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions. Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández. https://arxiv.org/abs/1606.07287


CROSS MEDIA RETRIEVAL (RESULTS ON YFCC100M)

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions. Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández. https://arxiv.org/abs/1606.07287


CONCLUSIONS AND FUTURE WORK

Contributions:

• HybridNet fc6 Deep Features

• CBIR Systems for YFCC100M:

o MI-File mifile.deepfeatures.org

o Lucene Quantization melisandre.deepfeatures.org

• GT k-NN results for evaluating Approximate Search www.deepfeatures.org/

Ongoing and future works:

• HybridNet fc6 PCA256

• Image annotation based on the YFCC100M metadata

• Extracting new features, e.g.: Deep Image Retrieval: Learning Global Representations for Image Search. Albert Gordo, Xerox Research; Jon Almazan, XRCE; Jerome Revaud, Xerox Research; Diane Larlus, Xerox

• Cross-media retrieval: Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions. Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández. https://arxiv.org/abs/1606.07287


THANKS!

Questions are welcome

Fabrizio Falchi

fabrizio.falchi@cnr.it