Grouping & Codebooks
CS 510 Lecture #15
March 31st, 2014
Where are we?
• We know how to match images (patches)
– Under translation and 2D rotation
• How does this help recognize
– Houses? (different shapes, colors, etc.)
– Cars? (different shapes, colors, etc.)
• General problem:
– Classes of objects share features
– Classes may also have intra-class variation
Feature Matching
• New goal (started before break):
– Detect/match objects based on local features
• Houses have windows, chimneys, doors…
• Cars have wheels, headlights, bumpers…
• Starting point: interest points as features
– Serve as a Focus of Attention (FOA) mechanism
– Feature descriptors:
• SIFT
• HOG
• LBP
• Next step: what do we do with descriptors?
Looking Ahead: Codebooks
• Idea #1: group similar descriptors
– Ideally, each cluster corresponds to a local semantic feature
• e.g., headlight, door handle, etc.
– Cluster identity serves as a proxy for a label
• Even if we don't know what the feature is
• Idea #2: images with similar features contain similar objects
– Of course, it could be the background… (a sketch of the full pipeline follows below)
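To make the two ideas concrete, here is a minimal bag-of-words sketch in Python: SIFT descriptors are pooled across images, clustered into K visual words with OpenCV's K-Means, and each image is summarized as a histogram over those words. The filenames and K = 50 are hypothetical, and it assumes an OpenCV build that exposes SIFT as cv2.SIFT_create (older builds use cv2.xfeatures2d.SIFT_create).

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
descriptors = []
for fname in ["house1.png", "house2.png"]:        # hypothetical filenames
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    _, d = sift.detectAndCompute(gray, None)      # keypoints, descriptors
    if d is not None:
        descriptors.append(d)
data = np.vstack(descriptors).astype(np.float32)

# Group similar descriptors into K visual "words" via OpenCV K-Means.
K = 50                                            # illustrative choice
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-4)
_, labels, centers = cv2.kmeans(data, K, None, criteria, 5,
                                cv2.KMEANS_PP_CENTERS)

# Each image is then summarized as a histogram over the K words.
def bow_histogram(d, centers):
    dists = np.linalg.norm(d[:, None, :] - centers[None, :, :], axis=2)
    return np.bincount(dists.argmin(axis=1), minlength=len(centers))
```

Images whose histograms are close (e.g., under L2 distance) become candidates for containing similar objects — or, as the slide warns, similar backgrounds.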
Step #1: Clustering
• Assumptions
– K: the number of clusters
– Every descriptor is a point in feature space
• Approaches
– Generative models: fit the K statistical distributions most likely to explain the data (today)
• K-Means (in OpenCV)
• Expectation Maximization (EM) (in OpenCV)
– Split/merge approaches: find the splits that optimize a reward function (Wednesday)
• Hierarchical agglomeration (merging)
• Spectral clustering (division)
K-Means
• Select K samples at random, make them cluster centers
– There are useful variations on this step (e.g., K-Means++ seeding)
• Iterate until no change:
– Assign every sample to the nearest cluster center
– Move every cluster center to the mean of the samples assigned to it
(A sketch of this loop follows below.)
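A minimal NumPy sketch of exactly these two alternating steps; the random seed and the empty-cluster guard are implementation choices, not part of the slide's algorithm.

```python
import numpy as np

def kmeans(samples, K, seed=0):
    """Plain K-Means sketch. samples: (N, D) array -> (centers, assign)."""
    rng = np.random.default_rng(seed)
    # Select K samples at random as the initial cluster centers.
    centers = samples[rng.choice(len(samples), K, replace=False)].astype(float)
    assign = None
    while True:
        # Assign every sample to the nearest cluster center (L2 distance).
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                              # no change: converged
        assign = new_assign
        # Move every cluster center to the mean of its assigned samples.
        for k in range(K):
            members = samples[assign == k]
            if len(members):                   # guard against empty clusters
                centers[k] = members.mean(axis=0)
    return centers, assign
```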
[Figure: K-Means Illustration, K = 2 — a cloud of samples ('o') shown over several iterations, with the two cluster centers ('X') moving to the means of their assigned samples until the assignments stop changing.]
Analysis of K-Means
• K-Means minimizes
$$\sum_{s \in S} \lVert s - C(s) \rVert_2$$
– where S is the set of samples
– C(s) is the cluster center that sample s is assigned to
• The assignment step reduces the value by changing the assignments C(s)
• The mean-computation step reduces the value by centering the means
• Together, they hill-climb to a local optimum
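The same objective as code — a one-line check that can be evaluated after each step of a run such as the sketch above (the function and argument names are illustrative):

```python
import numpy as np

def kmeans_objective(samples, centers, assign):
    # Sum over samples of the L2 distance to the assigned center:
    # the quantity both K-Means steps monotonically decrease, which is
    # why the iteration terminates at a local optimum.
    return np.linalg.norm(samples - centers[assign], axis=1).sum()
```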
Probabilistic Interpretation of K-Means
• Every cluster center can be viewed as the mean of a Gaussian random process
– The standard deviation is the same in every direction
– The standard deviation is the same for every process
– Samples are assigned to the process most likely to have created them
• This interpretation supports:
– Estimating the likelihood of a sample
– If K-Means is run more than once, selecting the solution most likely to generate the observed data (as sketched below)
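A sketch of that selection criterion, assuming the isotropic equal-variance model above; sigma is a free assumption, and only relative values across runs matter when comparing solutions.

```python
import numpy as np

def run_log_likelihood(samples, centers, sigma=1.0):
    """Log-likelihood of the data under K isotropic Gaussians N(c_k, sigma^2 I),
    with each sample assigned to (generated by) its nearest process."""
    d = samples.shape[1]
    # Squared distance from every sample to every center: (N, K)
    sq = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    best = sq.min(axis=1)                     # nearest-process distances
    return np.sum(-0.5 * best / sigma**2
                  - 0.5 * d * np.log(2 * np.pi * sigma**2))

# Across several K-Means runs, keep the solution with the highest value.
```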
K-Means: Problem with Unequal Variances
• Implicit assumption: Gaussian processes with equal variance
• Below: two Gaussians, but with different variances
• Need to model each cluster, not just its center
[Figure: samples drawn from two Gaussians with very different variances ('x' and 'o'), with the two estimated cluster centers marked 'X'.]
Measuring Cluster Variance
• Measure the covariance Σ of the PDFs:
– Let X be the D×N matrix of mean-subtracted samples:
$$X = \begin{bmatrix} s_1 & s_2 & \cdots & s_N \end{bmatrix}$$
– Then Σ is the covariance matrix:
$$\Sigma = \frac{1}{N} X X^{T}$$
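The same estimate in NumPy; the toy data is illustrative, and np.cov with bias=True matches the 1/N normalization above.

```python
import numpy as np

samples = np.random.randn(200, 2) * np.array([3.0, 0.5])  # toy, unequal variances
X = (samples - samples.mean(axis=0)).T                     # D x N, mean-subtracted
Sigma = (X @ X.T) / X.shape[1]                             # Sigma = (1/N) X X^T
# Equivalently: np.cov(X, bias=True)
```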
The Hard-Assignment Problem
• Which Gaussian generated these samples?

Solution: Soft Assignments
• Which process generated the points in the middle?
– Either could have
• For every sample/cluster pair, compute the likelihood that the sample was generated by the cluster
– Note: the value is never zero
– This is called "soft assignment"
– Samples are not uniquely assigned to clusters (see the sketch below)
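A soft-assignment sketch; SciPy supplies the Gaussian density, and the per-cluster weights are mixture priors — an assumption beyond the slide (set them uniform if unknown).

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(samples, means, covs, weights):
    """Per-sample soft assignments: likelihood of every sample under every
    cluster, normalized so each row sums to 1 ('the sample exists')."""
    lik = np.stack([w * multivariate_normal.pdf(samples, m, c)
                    for m, c, w in zip(means, covs, weights)], axis=1)
    return lik / lik.sum(axis=1, keepdims=True)   # (N, K), never exactly zero
```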
Even Harder: Overlapping Gaussians
[Figure: samples from two Gaussian clusters ('x' and 'o') whose distributions overlap; an annotation marks the true 2nd cluster.]
Expectation Maximization (EM)
• Initialize clusters using random samples, uniform variance
• Iterate until minimal change:
– For every sample:
• Compute the likelihood that it could be generated by each cluster
• Normalize the likelihoods to sum to 1
– The sample exists!
– For every cluster:
• Estimate the mean and covariance using probability-weighted samples
(An OpenCV sketch follows below.)
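A sketch using OpenCV's EM implementation, which the course references; the toy data and K = 2 are illustrative. Note that, per the OpenCV documentation, trainEM initializes the model with K-Means rather than the random initialization on the slide.

```python
import cv2
import numpy as np

samples = np.random.randn(300, 2).astype(np.float32)   # toy data

em = cv2.ml.EM_create()
em.setClustersNumber(2)
em.setCovarianceMatrixType(cv2.ml.EM_COV_MAT_GENERIC)  # full covariances
ok, log_lik, labels, probs = em.trainEM(samples)

means = em.getMeans()   # K x D fitted means
covs = em.getCovs()     # list of K (D x D) covariance matrices
# probs[i, k] is the soft assignment of sample i to cluster k;
# each row sums to 1.
```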
Probabilistic Interpretation of EM
• Every cluster represents a Gaussian random process
– The assignment (Expectation) step computes the likelihood of generation for each sample/process pair
– The fitting (Maximization) step estimates the Gaussian parameters most likely to have generated the data
• This supports:
– Estimating the likelihood of the data set
– Estimating the likelihood of any sample being created by any process
K-Means vs. EM
• EM is a more general model
– The processes don't even have to be Gaussian (just a known distribution)
• EM fits far more parameters
– Good if enough training data is available
– Good if the data fits the model
• K-Means is simpler and more robust
– Better when dimensionality is high
– Better when the data may not be Gaussian
• OpenCV includes both K-Means and EM
Generative → Model-Free
• K-Means and EM fit Gaussian models
• What if your data isn't Gaussian?
• Simple alternative solutions:
– Bottom-up (agglomerative) — sketched below:
• Start with one cluster per sample
• While more than K clusters remain, merge the most similar cluster pair
– Top-down (spectral clustering):
• Measure the similarity of every sample pair
• Divide the data set so as to minimize the similarity of pairs in different groups
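A minimal sketch of the bottom-up variant, using single-link distance as the (assumed) measure of cluster similarity; a real implementation would use an optimized library such as scipy.cluster.hierarchy.

```python
import numpy as np

def agglomerate(samples, K):
    """Bottom-up clustering: merge the closest pair until K clusters remain."""
    clusters = [[i] for i in range(len(samples))]   # one cluster per sample
    while len(clusters) > K:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link: distance between the two closest members.
                d = min(np.linalg.norm(samples[i] - samples[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))         # merge most similar pair
    return clusters
```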