Grouping & Codebooks
CS 510 Lecture #15
March 31st, 2014
Where are we?
• We know how to match images (patches)
– Under translation and 2D rotation
• How does this help recognize
– Houses? (different shapes, colors, etc.)
– Cars? (different shapes, colors, etc.)
• General problem:
– Classes of objects share features
– Classes may also have intra-class variation
Feature Matching
• New goal (started before break):
– Detect/match objects based on local features
• Houses have windows, chimneys, doors…
• Cars have wheels, headlights, bumpers…
• Starting point: interest points as features
– Serve as a Focus of Attention (FOA) mechanism
– Feature descriptors:
• SIFT
• HOG
• LBP
• Next step: what do we do with descriptors?
Looking Ahead: Codebooks
• Idea #1: group similar descriptors
– Ideally, each cluster corresponds to a local semantic feature
• e.g., headlight, door handle, etc.
– Cluster identity serves as a proxy for a label
• Even if we don't know what the feature is
• Idea #2: images with similar features contain similar objects
– Of course, it could be the background… (a sketch of the full pipeline follows below)
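To make the two ideas concrete, here is a minimal bag-of-words sketch in Python: SIFT descriptors are pooled across images, clustered into K visual words with OpenCV's K-Means, and each image is summarized as a histogram over those words. The filenames and K = 50 are hypothetical, and it assumes an OpenCV build that exposes SIFT as cv2.SIFT_create (older builds use cv2.xfeatures2d.SIFT_create).

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
descriptors = []
for fname in ["house1.png", "house2.png"]:        # hypothetical filenames
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    _, d = sift.detectAndCompute(gray, None)      # keypoints, descriptors
    if d is not None:
        descriptors.append(d)
data = np.vstack(descriptors).astype(np.float32)

# Group similar descriptors into K visual "words" via OpenCV K-Means.
K = 50                                            # illustrative choice
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-4)
_, labels, centers = cv2.kmeans(data, K, None, criteria, 5,
                                cv2.KMEANS_PP_CENTERS)

# Each image is then summarized as a histogram over the K words.
def bow_histogram(d, centers):
    dists = np.linalg.norm(d[:, None, :] - centers[None, :, :], axis=2)
    return np.bincount(dists.argmin(axis=1), minlength=len(centers))
```

Images whose histograms are close (e.g., under L2 distance) become candidates for containing similar objects — or, as the slide warns, similar backgrounds.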
Step #1: Clustering
• Assumptions
– K: the number of clusters
– Every descriptor is a point in feature space
• Approaches
– Generative models: fit the K statistical distributions most likely to explain the data (today)
• K-Means (in OpenCV)
• Expectation Maximization (EM) (in OpenCV)
– Split/merge approaches: find the splits that optimize a reward function (Wednesday)
• Hierarchical agglomeration (merging)
• Spectral clustering (division)
K-Means
• Select K samples at random, make them cluster centers
– There are useful variations on this step (e.g., K-Means++ seeding)
• Iterate until no change:
– Assign every sample to the nearest cluster center
– Move every cluster center to the mean of the samples assigned to it
(A sketch of this loop follows below.)
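A minimal NumPy sketch of exactly these two alternating steps; the random seed and the empty-cluster guard are implementation choices, not part of the slide's algorithm.

```python
import numpy as np

def kmeans(samples, K, seed=0):
    """Plain K-Means sketch. samples: (N, D) array -> (centers, assign)."""
    rng = np.random.default_rng(seed)
    # Select K samples at random as the initial cluster centers.
    centers = samples[rng.choice(len(samples), K, replace=False)].astype(float)
    assign = None
    while True:
        # Assign every sample to the nearest cluster center (L2 distance).
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                              # no change: converged
        assign = new_assign
        # Move every cluster center to the mean of its assigned samples.
        for k in range(K):
            members = samples[assign == k]
            if len(members):                   # guard against empty clusters
                centers[k] = members.mean(axis=0)
    return centers, assign
```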
[Figure: K-Means Illustration, K = 2 — a cloud of samples ('o') shown over several iterations, with the two cluster centers ('X') moving to the means of their assigned samples until the assignments stop changing.]
Analysis of K-Means
• K-Means minimizes
$$\sum_{s \in S} \lVert s - C(s) \rVert_2$$
– where S is the set of samples
– C(s) is the cluster center that sample s is assigned to
• The assignment step reduces the value by changing the assignments C(s)
• The mean-computation step reduces the value by centering the means
• Together, they hill-climb to a local optimum
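The same objective as code — a one-line check that can be evaluated after each step of a run such as the sketch above (the function and argument names are illustrative):

```python
import numpy as np

def kmeans_objective(samples, centers, assign):
    # Sum over samples of the L2 distance to the assigned center:
    # the quantity both K-Means steps monotonically decrease, which is
    # why the iteration terminates at a local optimum.
    return np.linalg.norm(samples - centers[assign], axis=1).sum()
```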
Probabilistic Interpretation of K-Means
• Every cluster center can be viewed as the mean of a Gaussian random process
– The standard deviation is the same in every direction
– The standard deviation is the same for every process
– Samples are assigned to the process most likely to have created them
• This interpretation supports:
– Estimating the likelihood of a sample
– If K-Means is run more than once, selecting the solution most likely to generate the observed data (as sketched below)
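A sketch of that selection criterion, assuming the isotropic equal-variance model above; sigma is a free assumption, and only relative values across runs matter when comparing solutions.

```python
import numpy as np

def run_log_likelihood(samples, centers, sigma=1.0):
    """Log-likelihood of the data under K isotropic Gaussians N(c_k, sigma^2 I),
    with each sample assigned to (generated by) its nearest process."""
    d = samples.shape[1]
    # Squared distance from every sample to every center: (N, K)
    sq = ((samples[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    best = sq.min(axis=1)                     # nearest-process distances
    return np.sum(-0.5 * best / sigma**2
                  - 0.5 * d * np.log(2 * np.pi * sigma**2))

# Across several K-Means runs, keep the solution with the highest value.
```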
K-Means: Problem with Unequal Variances
• Implicit assumption: Gaussian processes with equal variance
• Below: two Gaussians, but with different variances
• Need to model each cluster, not just its center
[Figure: samples drawn from two Gaussians with very different variances ('x' and 'o'), with the two estimated cluster centers marked 'X'.]
Measuring Cluster Variance
• Measure the covariance Σ of the PDFs:
– Let X be the D×N matrix of mean-subtracted samples:
$$X = \begin{bmatrix} s_1 & s_2 & \cdots & s_N \end{bmatrix}$$
– Then Σ is the covariance matrix:
$$\Sigma = \frac{1}{N} X X^{T}$$
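The same estimate in NumPy; the toy data is illustrative, and np.cov with bias=True matches the 1/N normalization above.

```python
import numpy as np

samples = np.random.randn(200, 2) * np.array([3.0, 0.5])  # toy, unequal variances
X = (samples - samples.mean(axis=0)).T                     # D x N, mean-subtracted
Sigma = (X @ X.T) / X.shape[1]                             # Sigma = (1/N) X X^T
# Equivalently: np.cov(X, bias=True)
```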
The Hard-Assignment Problem
• Which Gaussian generated these samples?

Solution: Soft Assignments
• Which process generated the points in the middle?
– Either could have
• For every sample/cluster pair, compute the likelihood that the sample was generated by the cluster
– Note: the value is never zero
– This is called "soft assignment"
– Samples are not uniquely assigned to clusters (see the sketch below)
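A soft-assignment sketch; SciPy supplies the Gaussian density, and the per-cluster weights are mixture priors — an assumption beyond the slide (set them uniform if unknown).

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(samples, means, covs, weights):
    """Per-sample soft assignments: likelihood of every sample under every
    cluster, normalized so each row sums to 1 ('the sample exists')."""
    lik = np.stack([w * multivariate_normal.pdf(samples, m, c)
                    for m, c, w in zip(means, covs, weights)], axis=1)
    return lik / lik.sum(axis=1, keepdims=True)   # (N, K), never exactly zero
```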
Even Harder: Overlapping Gaussians
[Figure: samples from two Gaussian clusters ('x' and 'o') whose distributions overlap; an annotation marks the true 2nd cluster.]
Expectation Maximization (EM)
• Initialize clusters using random samples, uniform variance
• Iterate until minimal change:
– For every sample:
• Compute the likelihood that it could be generated by each cluster
• Normalize the likelihoods to sum to 1
– The sample exists!
– For every cluster:
• Estimate the mean and covariance using probability-weighted samples
(An OpenCV sketch follows below.)
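A sketch using OpenCV's EM implementation, which the course references; the toy data and K = 2 are illustrative. Note that, per the OpenCV documentation, trainEM initializes the model with K-Means rather than the random initialization on the slide.

```python
import cv2
import numpy as np

samples = np.random.randn(300, 2).astype(np.float32)   # toy data

em = cv2.ml.EM_create()
em.setClustersNumber(2)
em.setCovarianceMatrixType(cv2.ml.EM_COV_MAT_GENERIC)  # full covariances
ok, log_lik, labels, probs = em.trainEM(samples)

means = em.getMeans()   # K x D fitted means
covs = em.getCovs()     # list of K (D x D) covariance matrices
# probs[i, k] is the soft assignment of sample i to cluster k;
# each row sums to 1.
```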
Probabilistic Interpretation of EM
• Every cluster represents a Gaussian random process
– The assignment (Expectation) step computes the likelihood of generation for each sample/process pair
– The fitting (Maximization) step estimates the Gaussian parameters most likely to have generated the data
• This supports:
– Estimating the likelihood of the data set
– Estimating the likelihood of any sample being created by any process
K-Means vs. EM
• EM is a more general model
– The processes don't even have to be Gaussian (just a known distribution)
• EM fits far more parameters
– Good if enough training data is available
– Good if the data fits the model
• K-Means is simpler and more robust
– Better when dimensionality is high
– Better when the data may not be Gaussian
• OpenCV includes both K-Means and EM
Generative → Model-Free
• K-Means and EM fit Gaussian models
• What if your data isn't Gaussian?
• Simple alternative solutions:
– Bottom-up (agglomerative) — sketched below:
• Start with one cluster per sample
• While more than K clusters remain, merge the most similar cluster pair
– Top-down (spectral clustering):
• Measure the similarity of every sample pair
• Divide the data set so as to minimize the similarity of pairs in different groups
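A minimal sketch of the bottom-up variant, using single-link distance as the (assumed) measure of cluster similarity; a real implementation would use an optimized library such as scipy.cluster.hierarchy.

```python
import numpy as np

def agglomerate(samples, K):
    """Bottom-up clustering: merge the closest pair until K clusters remain."""
    clusters = [[i] for i in range(len(samples))]   # one cluster per sample
    while len(clusters) > K:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link: distance between the two closest members.
                d = min(np.linalg.norm(samples[i] - samples[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))         # merge most similar pair
    return clusters
```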