Efficient Subwindow Search: A Branch and Bound Framework for Object Localization (PAMI 2009)
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search
Winner of the best paper prize at CVPR 2008
Motivation
To localize objects without exhaustive search
Observation: often, only a small portion of the image contains the object of interest
To find a global optimum in a huge search space
Branching and bounding
Object detection and retrieval
SVM: Localization problem
SVM answers 'Yes' or 'No' to whether the object belongs to the classifier's object class, and returns a confidence score
It cannot say where in the image the object is located, or at what scale
SVM Object Localization Methods
Exhaustive Search. For an n x n image the complexity is O(n^4), since a rectangle has four degrees of freedom
Sliding Window Approach
Branch–and–Bound Scheme
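The O(n^4) count for exhaustive search can be checked directly: a rectangle is determined by choosing 2 of the n+1 horizontal and 2 of the n+1 vertical grid lines. A quick sketch (my own illustration):

```python
# Number of axis-aligned rectangles in an n x n grid: choose 2 of n+1
# horizontal and 2 of n+1 vertical boundary lines -> O(n^4) candidates.
from math import comb

def num_rectangles(n):
    return comb(n + 1, 2) ** 2

print(num_rectangles(100))  # 25502500 -> already ~2.6e7 for a small image
```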
Branching. Dividing the space of candidate rectangles into subspaces
Bounding. Pruning subspaces whose highest possible score is lower than a score already guaranteed in another subspace
Bounding function
To use branch-and-bound for a given quality function f, we need an upper-bound function f_hat over sets of rectangles R such that f_hat(R) >= max over r in R of f(r), with equality when R contains a single rectangle
Algorithm
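A minimal sketch of the branch-and-bound loop in Python (my own illustration, not the authors' code). Candidate sets of rectangles are 4-tuples of coordinate intervals (top, bottom, left, right), kept in a priority queue ordered by upper bound; the toy quality function here is rectangle area, validly bounded by the area of the largest rectangle Rmax in the region:

```python
import heapq

def split(region):
    """Split the coordinate interval with the largest width in half."""
    widths = [hi - lo for lo, hi in region]
    i = max(range(4), key=lambda k: widths[k])
    lo, hi = region[i]
    mid = (lo + hi) // 2
    a = list(region); a[i] = (lo, mid)
    b = list(region); b[i] = (mid + 1, hi)
    return tuple(a), tuple(b)

def is_single(region):
    return all(lo == hi for lo, hi in region)

def ess(n, bound):
    """Branch-and-bound over rectangles (t, b, l, r) in an n x n image.
    `bound` maps a region (4 intervals) to an upper bound on the quality
    of every rectangle in it, tight for single rectangles."""
    root = ((0, n - 1),) * 4   # ((t_lo,t_hi),(b_lo,b_hi),(l_lo,l_hi),(r_lo,r_hi))
    heap = [(-bound(root), root)]
    while heap:
        ub, region = heapq.heappop(heap)
        if is_single(region):  # best-first: first single state is the global optimum
            return tuple(lo for lo, _ in region), -ub
        for child in split(region):
            heapq.heappush(heap, (-bound(child), child))

# Toy quality: rectangle area; a valid upper bound is the area of the
# largest rectangle Rmax in the region (top/left at the lower interval
# ends, bottom/right at the upper ends).
def area_bound(region):
    (t_lo, _), (_, b_hi), (l_lo, _), (_, r_hi) = region
    return max(0, b_hi - t_lo + 1) * max(0, r_hi - l_lo + 1)

best, score = ess(8, area_bound)
print(best, score)  # -> (0, 7, 0, 7) 64: the full-image rectangle
```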
Example I. Bag of visual words SVM
For every image:
Extract SIFT image descriptors
Quantize descriptors using a K-entry codebook of descriptors
Represent the image by a histogram of codebook entry occurrences
Every image is coded as a 1-dimensional vector h of length K, where K is the number of codebook 'words'
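The coding step above can be sketched as follows (a minimal illustration with made-up descriptors and a random stand-in codebook; a real pipeline would learn the codebook by k-means on training descriptors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 200 local descriptors (e.g. 128-d SIFT) from one
# image, and a K-entry codebook assumed to be learned beforehand.
K = 50
descriptors = rng.normal(size=(200, 128))
codebook = rng.normal(size=(K, 128))

# Assign each descriptor to its nearest codebook entry (visual word).
dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
words = dists.argmin(axis=1)

# The image is represented by h: a length-K histogram of word occurrences.
h = np.bincount(words, minlength=K)
print(h.shape)  # (50,)
```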
Example I. Bounding function
SVM decision function: f(h) = beta + sum_i alpha_i <h_i, h>
Since this is linear in h, we can express it as a sum of per-point contributions with weights: a feature point of codebook word c contributes w_c = sum_i alpha_i h_{i,c}, so the score of a rectangle R is beta plus the sum of w_{c_j} over the points x_j inside R
If we denote by Rmax the largest and by Rmin the smallest rectangle consistent with a parameter region R, then every rectangle in R contains Rmin and is contained in Rmax, giving the upper bound
f_hat(R) = (sum of positive per-point weights inside Rmax) + (sum of negative per-point weights inside Rmin)
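This bound can be sketched directly (my own illustration with hypothetical feature points and per-point weights, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature points (x, y) with per-point SVM weights w_j
# (the learned weight of each point's visual word).
pts = rng.integers(0, 100, size=(300, 2))
w = rng.normal(size=300)

def rect_score(t, b, l, r):
    """f for a single rectangle: sum of weights of points inside it."""
    inside = (pts[:, 1] >= t) & (pts[:, 1] <= b) & \
             (pts[:, 0] >= l) & (pts[:, 0] <= r)
    return w[inside].sum()

def bound(region):
    """Upper bound over a region of rectangles: positive weights are
    counted inside Rmax, negative weights only inside Rmin."""
    (t_lo, t_hi), (b_lo, b_hi), (l_lo, l_hi), (r_lo, r_hi) = region
    in_max = (pts[:, 1] >= t_lo) & (pts[:, 1] <= b_hi) & \
             (pts[:, 0] >= l_lo) & (pts[:, 0] <= r_hi)
    in_min = (pts[:, 1] >= t_hi) & (pts[:, 1] <= b_lo) & \
             (pts[:, 0] >= l_hi) & (pts[:, 0] <= r_lo)
    return w[(w > 0) & in_max].sum() + w[(w < 0) & in_min].sum()

# Sanity check: the bound dominates the score of any rectangle the region
# contains (here t in [10,20], b in [40,60], l in [10,20], r in [40,60]).
region = ((10, 20), (40, 60), (10, 20), (40, 60))
assert bound(region) >= rect_score(15, 50, 12, 45)
```

When the region shrinks to a single rectangle, Rmax and Rmin coincide and the bound equals the true score, as required for branch-and-bound.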
Example I. Experiment
PASCAL VOC 2006: 5,304 images with 9,507 objects from 10 categories
1,000 visual words from 50,000 SURF descriptors
A match is claimed when there is > 50% overlap between the detected bounding box and the ground truth
PASCAL VOC 2007: 9,963 images with 24,640 objects
Recall Precision Curve
Example II. Spatial Pyramid Kernel SVM
SVM decision function: as in Example I, but with one histogram per pyramid level and cell
We can express it as a sum of per-point contributions, now with separate weights for each level and cell
The upper bound for f is obtained by summing the bounds for all levels and cells
The coarser pyramid levels can be thought of as downsampled spatial grids
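To make the representation concrete, here is a sketch of building a spatial pyramid histogram (a minimal illustration with made-up points and parameters; level l splits the image into a 2^l x 2^l grid and each cell gets its own word histogram):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical inputs: 150 word-labelled feature points in a 64 x 64 image.
K, side = 20, 64
pts = rng.integers(0, side, size=(150, 2))
words = rng.integers(0, K, size=150)

def pyramid_histogram(levels=2):
    """Concatenate per-cell word histograms for pyramid levels 0..levels."""
    hists = []
    for l in range(levels + 1):
        g = 2 ** l          # grid is g x g cells at this level
        cell = side // g
        for i in range(g):
            for j in range(g):
                in_cell = (pts[:, 0] // cell == i) & (pts[:, 1] // cell == j)
                hists.append(np.bincount(words[in_cell], minlength=K))
    return np.concatenate(hists)

h = pyramid_histogram()
# Cells over levels 0..2: 1 + 4 + 16 = 21, each with a K-bin histogram.
print(h.shape)  # (420,)
```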
Example II. Experiment
UIUC Car database (side view, one car per image)
1,050 training images (550 positive)
277 test images (170 single-scale + 107 multi-scale)
1,000 visual words from 50,000 SURF descriptors
Example III. Nonlinear
More quality bounds can be derived automatically via interval arithmetic
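The idea of interval arithmetic is to propagate intervals through the quality function: evaluating f on coordinate or count intervals yields an enclosure of its range, whose upper end is a valid bound. A toy sketch (my own illustration, not the paper's code):

```python
# Minimal interval arithmetic: each value is an interval (lo, hi).
def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def i_mul(a, b):
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def i_neg(a):
    return (-a[1], -a[0])

# Example: enclose f(x) = x*x - 3*x for x in [1, 2].
x = (1.0, 2.0)
fx = i_add(i_mul(x, x), i_neg(i_mul((3.0, 3.0), x)))
print(fx)  # -> (-5.0, 1.0), an enclosure of the true range [-2.25, -2]
```

The enclosure is looser than the true range but always contains it, which is all branch-and-bound needs from an upper bound.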
Example III. Experiment
10,143 keyframes of a movie
Return the 100 most relevant images for a query
2 s per returned image
Experiments
Summary
Fast
Globally optimal
Easy to extend (change classifiers, parametric spaces)
Future work
Kernel-based classifiers
Extensions (groups of boxes, circles, ...)