Fast and Compact Retrieval Methods in Computer Vision, Part II


• A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition. CVPR 2008.

• A. Torralba, R. Fergus and W. Freeman. 80 Million Tiny Images: a large dataset for non-parametric object and scene recognition. Technical Report.

Presented by Ken and Ryan

Outline

• Large Datasets of Images

• Searching Large Datasets
– Nearest Neighbor
– ANN: Locality-Sensitive Hashing

• Dimensionality Reduction
– Boosting
– Restricted Boltzmann Machines (RBM)

• Results

Goal

• Develop image search and scene-matching techniques that are fast and require very little memory

• Particularly on VERY large image sets

[Figure: example query image]

Motivation

• Image sets:
– Vogel & Schiele: 702 natural scenes in 6 categories
– Oliva & Torralba: 2,688 images
– Caltech 101: ~50 images/category, ~5,000 total
– Caltech 256: 80–800 images/category, 30,608 total

• Why do we want larger datasets?

Motivation

• Classify any image

• Complex classification methods don't scale well to datasets of this size

• Can we use a simple classification method?

Thumbnail Collection Project

• Collect images for ALL objects
– List obtained from WordNet
– 75,378 non-abstract nouns in English

Thumbnail Collection Project

• Collected 80M images
• http://people.csail.mit.edu/torralba/tinyimages

How Much is 80M Images?

• One feature-length movie:
– 105 min = 151K frames @ 24 FPS

• For 80M images, watch 530 movies

• How do we store this?
– 1 KB/image × 80M images = 80 GB
– Actual storage: 760 GB

First Attempt

• Store each image as a 32×32 color thumbnail
• Size chosen based on the limits of human visual perception
• Information: 32 × 32 × 3 channels = 3,072 entries

First Attempt

• Used SSD (sum of squared differences) to find nearest neighbors of the query image
– Used first 19 principal components

Motivation Part 2

• Is this good enough?

• SSD is naïve

• Still too much storage required

• How can we fix this?
– Traditional methods of searching large datasets
– Binary reduction

Locality-Sensitive Hash Families

LSH Example
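To make the idea concrete, below is a minimal sketch of one standard LSH family, random hyperplanes for angular similarity; the dimensionality, bit count, and tuple-based bucket key are illustrative assumptions, not necessarily the family used in the paper.

```python
import numpy as np

def make_lsh_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH: each bit is the sign of a random projection.

    Vectors separated by a small angle agree on most bits, so similar
    inputs tend to land in the same hash bucket.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))  # one hyperplane per bit
    def hash_fn(x):
        return tuple((planes @ x > 0).astype(np.uint8))  # bucket key
    return hash_fn

# Toy usage: a near-duplicate vector usually hashes to the same bucket.
h = make_lsh_hash(dim=512, n_bits=16)
x = np.random.randn(512)
y = x + 0.05 * np.random.randn(512)  # slightly perturbed copy of x
print(h(x) == h(y))  # usually True for vectors this close
```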

Binary Reduction

[Figure: pipeline from an image (lots of pixels) to a Gist vector (512 values) to a binary code (32 bits)]

• For 80 million images: 164 GB of Gist vectors vs. 320 MB of 32-bit codes

Gist

“The ‘gist’ is an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)”

A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. Journal of Computer Vision, 42(3):145–175, 2001.

Gist

http://ilab.usc.edu/siagian/Research/Gist/Gist.html

[Figure: computation of the Gist vector]
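As a rough sketch of how a gist-style descriptor can be computed, the code below filters the image with a bank of oriented band-pass filters and averages the filter energy over a spatial grid; the filter shapes and parameters are illustrative assumptions, not Oliva & Torralba's implementation.

```python
import numpy as np

def gist_like(img, n_scales=4, n_orients=8, grid=4):
    """Very simplified gist-style descriptor (a sketch, not the original code).

    Filters the image with Gabor-like band-pass filters built in the
    Fourier domain, then averages the filter energy over a grid x grid
    layout, giving n_scales * n_orients * grid * grid values.
    """
    h, w = img.shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.sqrt(fx**2 + fy**2) + 1e-12
    angle = np.arctan2(fy, fx)
    F = np.fft.fft2(img)

    feats = []
    for s in range(n_scales):
        f0 = 0.25 / (2 ** s)                      # center frequency per scale
        for o in range(n_orients):
            th = np.pi * o / n_orients
            # Log-Gaussian in frequency, Gaussian in orientation (widths arbitrary)
            G = np.exp(-(np.log(radius / f0) ** 2) / 0.5)
            dth = np.angle(np.exp(1j * (angle - th)))  # wrapped angle difference
            G *= np.exp(-(dth ** 2) / 0.5)
            resp = np.abs(np.fft.ifft2(F * G))    # filter energy
            # Average energy over a grid x grid spatial layout
            for rows in np.array_split(resp, grid, axis=0):
                for cell in np.array_split(rows, grid, axis=1):
                    feats.append(cell.mean())
    return np.asarray(feats)

desc = gist_like(np.random.rand(32, 32))
print(desc.shape)  # (512,)
```

With 4 scales, 8 orientations and a 4×4 grid this yields 4 × 8 × 16 = 512 values, matching the 512-value Gist vector above.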

Querying

[Figure: the query image's binary code is compared against the codes of the dataset images to retrieve its nearest neighbors]
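A minimal sketch of what this lookup can look like as an exhaustive scan in Hamming space; the packed-byte layout, table-based popcount, and choice of k are illustrative assumptions.

```python
import numpy as np

# Precomputed popcount table for all byte values (0..255).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_search(query_code, db_codes, k=10):
    """Exhaustive nearest-neighbor search in Hamming space.

    query_code: (n_bytes,) uint8 packed binary code of the query image.
    db_codes:   (n_images, n_bytes) uint8 packed codes of the database.
    Returns indices of the k codes with the smallest Hamming distance.
    """
    xor = np.bitwise_xor(db_codes, query_code)  # differing bits per byte
    dists = POPCOUNT[xor].sum(axis=1)           # total differing bits per image
    return np.argsort(dists)[:k]

# Toy usage: 1M random 32-bit codes (4 bytes each).
db = np.random.randint(0, 256, size=(1_000_000, 4), dtype=np.uint8)
q = db[42]                       # query with a code we know is in the set
print(hamming_search(q, db)[0])  # index 42 sorts to the front (distance 0)
```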

Boosting

• Positive (similar) and negative (dissimilar) image pairs are used to train the binary reduction.

[Figure: example pairs; similar pairs are labeled +1, dissimilar pairs −1]

• Training set: 150K pairs, 80% negatives

BoostSSC

• Similarity Sensitive Coding

• Pair weights start out uniform

[Figure: a feature vector x_i with N values, and the initial weight distribution over training pairs]

BoostSSC

• For each bit m:
– Choose the feature index n that minimizes a weighted error across the entire training set

[Figure: the binary reduction h(x) maps a feature vector x from image i (N values) to M bits; bit m looks at dimension n]

BoostSSC

• Weak classifications are evaluated via regression stumps:

f(x_i, x_j) = α · [ (x_i(n) > T) = (x_j(n) > T) ] + β

• We need to figure out α, β, and T for each n.

• If x_i and x_j are similar, we should get 1 for most n's.

BoostSSC

• Try a range of thresholds T:
– Regress f across the entire training set to find each α and β
– Keep the T that fits best

• Then, keep the n that causes the least weighted error (see the sketch below).

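As a sketch of this stump-fitting step: for a fixed dimension n and threshold T, the weighted least-squares fit of α and β has a closed form (weighted group means over agreeing and disagreeing pairs). The candidate-threshold grid and this exact fitting routine are illustrative assumptions, not necessarily the paper's procedure.

```python
import numpy as np

def fit_stump(xi_n, xj_n, labels, weights, thresholds):
    """Fit one BoostSSC-style regression stump on feature dimension n.

    For each candidate threshold T, the stump predicts
        f = alpha * [ (xi_n > T) == (xj_n > T) ] + beta,
    with alpha, beta the weighted least-squares fit to the pair labels
    (+1 similar, -1 dissimilar). Returns (T, alpha, beta, error) with
    the smallest weighted squared error.
    """
    best = None
    for T in thresholds:
        agree = ((xi_n > T) == (xj_n > T)).astype(float)
        w1, w0 = weights @ agree, weights @ (1 - agree)
        y1 = weights @ (labels * agree)
        y0 = weights @ (labels * (1 - agree))
        beta = y0 / max(w0, 1e-12)          # fit on pairs whose bits disagree
        alpha = y1 / max(w1, 1e-12) - beta  # extra credit for agreeing bits
        f = alpha * agree + beta
        err = weights @ (labels - f) ** 2   # weighted squared error
        if best is None or err < best[3]:
            best = (T, alpha, beta, err)
    return best
```

Sweeping this over all N dimensions and keeping the (n, T, α, β) with the lowest error gives one bit of the code.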


BoostSSC

• Update the weights
– Pairs the current stump handles poorly count more in future error calculations

[Figure: updated weight distribution over training pairs]

BoostSSC

• In the end, each bit m has a feature index n and a threshold T

[Figure: the final mapping from an N-value feature vector x_i to M bits]
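Encoding a new image is then just M comparisons. A minimal sketch; the dims/thresholds arrays stand in for the learned per-bit parameters (random here only for the demo).

```python
import numpy as np

def encode(x, dims, thresholds):
    """Map a feature vector x to an M-bit code: bit m is [ x[dims[m]] > thresholds[m] ].

    dims and thresholds are the per-bit (n, T) pairs chosen by BoostSSC.
    """
    bits = (x[np.asarray(dims)] > np.asarray(thresholds)).astype(np.uint8)
    return np.packbits(bits)  # pack into bytes for compact storage

# Hypothetical learned parameters for a 32-bit code over 512-d Gist vectors.
dims = np.random.randint(0, 512, size=32)
thresholds = np.random.rand(32)
code = encode(np.random.rand(512), dims, thresholds)
print(code.shape)  # (4,) -- 32 bits stored in 4 bytes
```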


Restricted Boltzmann Machine (RBM) Architecture

• Network of binary stochastic units
• Hinton & Salakhutdinov, Science 2006

Parameters:
w: symmetric weights
b: biases
h: hidden units
v: visible units

Multi-Layer RBM Architecture

Training RBM Models

• Two phases:

1. Pre-training
• Unsupervised
• Use Contrastive Divergence to learn weights and biases
• Gets parameters in the right ballpark

2. Fine-tuning
• Supervised
• No longer stochastic
• Backpropagate error to update parameters
• Moves parameters to a local minimum

Greedy Pre-training (Unsupervised)

[Figure: layers are trained one at a time; each new RBM takes the previous layer's hidden activations as its visible data — a CD-1 sketch follows below]
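A minimal sketch of the Contrastive Divergence (CD-1) update for a single RBM layer; the learning rate, batch handling, and absence of momentum or weight decay are simplifying assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_h, b_v, rng, lr=0.05):
    """One CD-1 update for an RBM with binary stochastic units.

    v0: (batch, n_visible) batch of input vectors.
    W:  (n_visible, n_hidden) symmetric weights; b_h, b_v: biases.
    """
    # Up: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down and up again: one step of alternating Gibbs sampling.
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # Contrastive divergence: data statistics minus model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_h += lr * (ph0 - ph1).mean(axis=0)
    b_v += lr * (v0 - pv1).mean(axis=0)
    return W, b_h, b_v

# Toy usage: fit a tiny RBM to random binary data.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((6, 4)); b_h = np.zeros(4); b_v = np.zeros(6)
v = (rng.random((8, 6)) < 0.5).astype(float)
W, b_h, b_v = cd1_step(v, W, b_h, b_v, rng)
```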

Neighborhood Components Analysis

• Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004

• Codes are the output of the RBM network, where W are the RBM weights (the probability model is written out below)
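The neighbor probabilities behind NCA can be written as follows (the standard formulation from the NIPS 2004 paper, with f(x) denoting the RBM output; the notation on the original slide may have differed):

```latex
p_{ij} = \frac{\exp\left(-\lVert f(x_i) - f(x_j) \rVert^2\right)}
              {\sum_{k \neq i} \exp\left(-\lVert f(x_i) - f(x_k) \rVert^2\right)},
\qquad
\mathcal{O} = \sum_i \sum_{j :\, c_j = c_i} p_{ij}
```

Fine-tuning ascends the gradient of O, the expected number of neighbors of the correct class.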

• Assume K = 2 classes

• Pulls nearby points of the same class closer

• Goal is to preserve the neighborhood structure of the original, high-dimensional space
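A sketch of the objective on a toy batch; the two-cluster data and 2-d code dimensionality are illustrative assumptions.

```python
import numpy as np

def nca_objective(codes, classes):
    """NCA objective on a batch of low-dimensional codes (e.g. RBM outputs).

    p[i, j] is the probability that point i picks j as its neighbor,
    a softmax over negative squared distances. The objective sums p[i, j]
    over same-class pairs; fine-tuning ascends its gradient, pulling
    nearby points of the same class closer together.
    """
    d2 = ((codes[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point never picks itself
    p = np.exp(-d2)
    p /= p.sum(axis=1, keepdims=True)
    same = classes[:, None] == classes[None, :]
    return (p * same).sum()

# Toy usage: two well-separated classes give a near-maximal objective.
codes = np.vstack([np.random.randn(10, 2), np.random.randn(10, 2) + 8])
classes = np.repeat([0, 1], 10)
print(nca_objective(codes, classes))  # close to 20 (at most one per point)
```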

Experiments and Results

Searching

• Bit limitations:
– Hashing scheme: maximum of ~30 bits for 13M images
– Exhaustive search: 256 bits possible

Searching Results

LabelMe Retrieval

Examples of Web Retrieval

• 12 neighbors using different distance metrics

Web Image Retrieval

Conclusion

• Efficient searching for large image datasets

• Compact image representation

• Methods for binary reduction
– Locality-Sensitive Hashing
– Boosting
– Restricted Boltzmann Machines

• Searching techniques