SEMI-SUPERVISED DETECTOR LEARNING USING MINIMAL LABELS
Sudeep Pillai
Thursday, Dec 6th, 2012
WHY SEMI-SUPERVISED?
• A few million labeled images on the interweb
• But trillions of unlabeled images
SEMI-SUPERVISED LEARNING
• Features extracted from images each lie on a different high-dimensional manifold
• Smooth the classification under the assumption that the data lies on a continuous manifold
Semi-supervised learning using the graph Laplacian (LapRLS) vs. standard RLS
SEMI-SUPERVISED LEARNING USING GRAPH LAPLACIAN
• Affinity matrix W = exp(−γ‖xᵢ − xⱼ‖²), where W is an n×n matrix
• Neighborhood graph with k ≪ n
• Graph Laplacian L = I − D^(−1/2) W D^(−1/2), where Dᵢᵢ = Σⱼ Wᵢⱼ
• Objective function J(f) = fᵀLf + (f − y)ᵀΛ(f − y); the first term enforces smoothness, the second label classification
• y are the labels, with Λᵢᵢ = 0 if xᵢ is unlabeled and Λᵢᵢ = λ if labeled
• Laplacian least-squares solution: (L + Λ)f = Λy
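The graph-Laplacian propagation above can be sketched end to end in numpy. For simplicity the dense RBF affinity matrix stands in for the k-nearest-neighbor graph (k ≪ n); function and parameter names are illustrative, not from the slides:

```python
import numpy as np

def propagate_labels(X, y, labeled_mask, gamma=1.0, lam=10.0):
    """Label propagation via the normalized graph Laplacian.

    Solves (L + Lambda) f = Lambda y, where Lambda_ii = lam for
    labeled points and 0 for unlabeled points.
    """
    # Affinity matrix W_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-gamma * d2)
    np.fill_diagonal(W, 0.0)

    # Degree D_ii = sum_j W_ij, normalized Laplacian L = I - D^(-1/2) W D^(-1/2)
    d = W.sum(axis=1)
    dinv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - dinv_sqrt[:, None] * W * dinv_sqrt[None, :]

    # Lambda penalizes disagreement with y only on labeled entries
    Lam = np.diag(np.where(labeled_mask, lam, 0.0))

    # Closed-form minimizer of J(f) = f^T L f + (f - y)^T Lambda (f - y)
    return np.linalg.solve(L + Lam, Lam @ y)
```

With two well-separated clusters and one labeled point each, the solved f takes the sign of the nearby label across each cluster.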
IMAGE REPRESENTATION
• Combination of descriptors to discriminate between images
• GIST + PHOG (Pyramid Histogram of Oriented Gradients)
• Single unified kernel to classify images
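The slides do not spell out how the two descriptors are fused into a single kernel. One plausible sketch, where the mixing weight alpha and the distance-to-kernel mapping are assumptions, converts each descriptor's distance matrix into an RBF-style similarity and blends them:

```python
import numpy as np

def combined_kernel(D_gist, D_phog, alpha=0.5):
    """Blend two pairwise-distance matrices into one similarity kernel.

    Each distance matrix is mapped to exp(-d / mean(d)) and the two
    kernels are mixed with weight alpha (an assumed, tunable choice).
    """
    def to_kernel(D):
        return np.exp(-D / D.mean())

    return alpha * to_kernel(D_gist) + (1.0 - alpha) * to_kernel(D_phog)
```

A per-descriptor weighted kernel like this is exactly what the future-work bullet at the end of the deck proposes to learn rather than fix by hand.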
IMAGE REPRESENTATION
• GIST descriptor
• Low-dimensional representation of the spatial structure of a scene
• 512 dims - 8 orientations, 4x4 bins, 4 scales
• Reduce dimensionality to top 64 dims. to account for ~98% of the variance in the data
• Similarity - L2 distance between 64-D descriptors
GIST descriptor
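The dimensionality-reduction step (512 dims down to the top 64, keeping ~98% of the variance) is standard PCA, which can be sketched via an SVD of the centered descriptors; the function name and synthetic data are illustrative:

```python
import numpy as np

def pca_reduce(descriptors, n_components=64):
    """Project descriptors onto their top principal components.

    Returns the reduced descriptors and the fraction of total
    variance the kept components explain.
    """
    mu = descriptors.mean(axis=0)
    Xc = descriptors - mu
    # Rows of Vt are principal axes, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    reduced = Xc @ Vt[:n_components].T
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return reduced, explained
```

After this projection, similarity between two images is just the L2 distance between their 64-D reduced descriptors.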
IMAGE REPRESENTATION
• Pyramid Histogram of Oriented Gradients (PHOG) descriptor
• Low-dimensional representation of local shape and spatial layout
• Build a vocabulary of k words from the histograms via k-means
• Encode spatial relationships of the vocabulary with a kd-tree
• Similarity: χ² distance between histograms
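The χ² similarity used above is a short formula worth writing out; a minimal sketch (the epsilon guard against empty bins is an implementation detail, not from the slides):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms:
    0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

The distance is zero for identical histograms and, for L1-normalized histograms with disjoint support, reaches its maximum of 1.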
IMAGE REPRESENTATION
• Pyramid Histogram of Oriented Gradients (PHOG) descriptor
• Extract oriented gradients at multiple scales
• Final histogram is a weighted concatenation of the histograms at each level
PHOG computed for each image at different scales
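The weighted concatenation of per-level histograms can be sketched as follows; the slides do not give the per-level weights, so they are left as a parameter (equal weights by default), and the final L1 normalization is an assumption:

```python
import numpy as np

def phog_descriptor(level_hists, level_weights=None):
    """Concatenate per-level orientation histograms into one PHOG vector.

    level_hists: list of 1-D arrays, one per pyramid level.
    level_weights: optional per-level weights (assumed equal if omitted).
    The concatenated vector is L1-normalized.
    """
    if level_weights is None:
        level_weights = [1.0] * len(level_hists)
    phog = np.concatenate([w * np.asarray(h, dtype=float)
                           for w, h in zip(level_weights, level_hists)])
    total = phog.sum()
    return phog / total if total > 0 else phog
```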
GIST ON CALTECH-101
ROC performance curve for the GIST descriptor with increasing number of training examples
GIST ON CALTECH-101
Classification confusion matrix (legend: training example, classified correctly, misclassified)
Misclassification rate vs. number of training examples
PHOG ON CALTECH-101
ROC performance curve for the PHOG descriptor with increasing number of training examples
PHOG ON CALTECH-101
Misclassification error rate drops with increasing training data
Classification Confusion Matrix with a single training example
CONCLUSIONS
• Demonstrated a semi-supervised detector, comparable to supervised baselines, capable of learning from unlabeled data given a minimal set of labels
• With only a few labeled examples, the detector should be capable of outperforming fully supervised detectors
• Future work
• Weighted kernel matrix for each descriptor
• Multi-view label propagation in each feature space
• Test on datasets with large intra-class variation but reasonably separated classes
THANKS!