Lecture 11: E-M and Mean-Shift
CAP 5415, Fall 2007
Review on Segmentation by Clustering
Each Pixel → Data Vector
Example
(From Comanciu and Meer)
Review of k-means
• Let's find three clusters in this data
• These points could represent RGB triplets in 3D
• Begin by guessing where the “center” of each cluster is
Review of k-means
• Now assign each point to the closest cluster
Review of k-means
• Now move each cluster center to the center of the points assigned to it
• Repeat this process until it converges
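The loop described above (guess centers, assign points, re-center, repeat) can be sketched in a few lines of NumPy; the data, number of clusters, and iteration count below are illustrative choices, not from the lecture:

```python
import numpy as np

def kmeans(points, k, n_iters=20, seed=0):
    """Minimal k-means sketch: points is an (N, D) array, e.g. RGB triplets."""
    rng = np.random.default_rng(seed)
    # Begin by guessing where the "center" of each cluster is:
    # here, k randomly chosen data points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to the closest cluster center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Toy data: three well-separated blobs in 2-D
pts = np.vstack([np.zeros((10, 2)), 5.0 * np.ones((10, 2)), 10.0 * np.ones((10, 2))])
centers, labels = kmeans(pts, k=3)
```

A fixed iteration count stands in for a real convergence test (stop when the assignments no longer change).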
Review of k-means
Probabilistic Point of View
• We'll take a generative point of view
• How to generate a data point:
1) Choose a cluster, z, from (1, …, N)
2) Sample that point from the distribution associated with that cluster
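The two generative steps can be sketched directly; the mixing coefficients, means, and standard deviations below are made-up values for a 1-D illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D mixture: mixing coefficients pi_k, means, std devs
pis    = np.array([0.5, 0.3, 0.2])   # must sum to 1
mus    = np.array([-4.0, 0.0, 5.0])
sigmas = np.array([1.0, 0.5, 2.0])

def sample_point():
    # 1) Choose a cluster z from (1, ..., N) with probability pi_z
    z = rng.choice(len(pis), p=pis)
    # 2) Sample the point from that cluster's Gaussian
    return rng.normal(mus[z], sigmas[z]), z

xs = np.array([sample_point()[0] for _ in range(1000)])
```

Plotting a histogram of `xs` would show the three bumps of the mixture.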
1D Example
Called a Mixture Model
• z indicates which cluster is chosen
Probability of choosing cluster k
Probability of x given the cluster is k
or
To make it a Mixture of Gaussians
Called a mixing coefficient
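The equations these labels annotate are not reproduced in the text; in standard notation, the mixture model being described is

```latex
p(x) = \sum_{k=1}^{K} p(z = k)\, p(x \mid z = k)
     = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0
```

where π_k = p(z = k) is the mixing coefficient and each cluster's distribution is a Gaussian.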
Brief Review of Gaussians
Mixture of Gaussians
In Context of Our Previous Model
• Now, we have means and covariances
How does this help with clustering?
• Let's think about a different problem first
• What if we had a set of data points and we wanted to find the parameters of the mixture model?
• Typical strategy: Optimize parameters to maximize likelihood of the data
Maximizing the likelihood
• Easy if we knew which cluster each point belonged to
• But we don't, so we compute the probability that each point belongs to each cluster using Bayes' rule
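In standard GMM notation, the Bayes-rule step referred to here gives the "responsibility" of cluster k for point x_i:

```latex
\gamma_{ik} \;=\; p(z = k \mid x_i)
\;=\; \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
           {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
```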
Where this comes from
• Let's differentiate with respect to μ_k
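The derivation itself is not reproduced in the text; the standard result is that setting the derivative of the log-likelihood with respect to μ_k to zero gives a weighted mean, where γ_ik = p(z = k | x_i) is the responsibility of cluster k for point x_i:

```latex
\frac{\partial}{\partial \mu_k} \sum_{i} \log p(x_i) = 0
\quad\Longrightarrow\quad
\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}
```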
EM Algorithm
• This is called the E-Step
• M-Step: Using these estimates of the cluster responsibilities, maximize over the rest of the parameters
• Lots of interesting math and intuitions go into this algorithm that I'm not covering
• Take Pattern Recognition!
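A minimal 1-D sketch of the two alternating steps, assuming a Gaussian mixture; the quantile-based initialization and fixed iteration count are my choices for illustration:

```python
import numpy as np

def em_gmm_1d(x, k, n_iters=50):
    """Minimal EM sketch for a 1-D mixture of Gaussians."""
    pis = np.full(k, 1.0 / k)
    # Spread the initial means across the data via quantiles
    mus = np.quantile(x, (np.arange(k) + 0.5) / k)
    vars_ = np.full(k, np.var(x))
    for _ in range(n_iters):
        # E-step: responsibilities via Bayes' rule, r[i, j] = P(z = j | x_i)
        dens = np.exp(-(x[:, None] - mus) ** 2 / (2 * vars_)) \
               / np.sqrt(2 * np.pi * vars_)
        r = pis * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters using the responsibilities
        nk = r.sum(axis=0)
        pis = nk / len(x)
        mus = (r * x[:, None]).sum(axis=0) / nk
        vars_ = (r * (x[:, None] - mus) ** 2).sum(axis=0) / nk
    return pis, mus, vars_

# Toy data: two well-separated 1-D clusters
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(10.0, 1.0, 200)])
pis, mus, vars_ = em_gmm_1d(x, k=2)
```

A production implementation would also monitor the log-likelihood for convergence and guard against collapsing variances.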
Back to clustering
• Now we have the probability that each point belongs to each cluster
• This can be seen as a soft clustering
Another Clustering Application
Another Clustering Application
• In this case, we have a video and we want to segment out what's moving or changing
(From C. Stauffer and W. Grimson)
Easy Solution
• Average a bunch of frames to get a “background” image
• Compute the difference between the background and each new frame to find the foreground
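A minimal sketch of this easy solution, assuming a grayscale frame stack; the synthetic frames and threshold value are illustrative assumptions:

```python
import numpy as np

# Illustrative data: frames is a (T, H, W) stack of grayscale video frames
rng = np.random.default_rng(0)
frames = rng.integers(100, 110, size=(50, 4, 4)).astype(float)  # near-static scene
frames[-1, 1, 1] = 255.0  # a bright "foreground" object appears in the last frame

# Average a bunch of frames to get a background image
background = frames.mean(axis=0)

# Compute the difference between the background and the current frame
diff = np.abs(frames[-1] - background)
mask = diff > 25  # hypothetical threshold, tuned per application
```

Pixels where `mask` is true are flagged as foreground; everything else is treated as background.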
The difficulty with this approach
• The background changes
(From Stauffer and Grimson)
Solution
• Fit a mixture model to the background
• i.e., a background pixel can take on multiple colors
Can use this to track in surveillance
Suggested Reading
• Chapter 14, David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”
• Chapter 3, Mubarak Shah, “Fundamentals of Computer Vision”
Advantages and Disadvantages
Mean-Shift
• Like EM, this algorithm is built on probabilistic intuitions.
• To understand EM we had to understand mixture models
• To understand mean-shift, we need to understand kernel density estimation (Take Pattern Recognition!)
Basics of Kernel Density Estimation
• Let’s say you have a bunch of points drawn from some distribution
• What’s the distribution that generated these points?
Using a Parametric Model
• Could fit a parametric model (like a Gaussian)
• Why:
– Can express the distribution with a small number of parameters (like mean and variance)
• Why not:
– Limited in flexibility
Non-Parametric Methods
• We’ll focus on kernel-density estimates
• Basic idea: Use the data to define the distribution
• Intuition:
– If I were to draw more samples from the same probability distribution, those points would probably be close to the points I have already drawn
– Build the distribution by putting a little mass of probability around each data point
Example
(From Tappen – Thesis)
Formally
• Most Common Kernel: Gaussian or Normal Kernel
• Another way to think about it:
– Make an image and put a 1 (or more) wherever you have a sample
– Convolve with a Gaussian
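The same idea can be written as an explicit sum of Gaussian bumps, one per data point; the bandwidth and sample values below are illustrative:

```python
import numpy as np

def kde(samples, query, bandwidth=0.5):
    """Kernel density estimate with a Gaussian kernel: the average of a small
    Gaussian 'bump' of probability centered on every data point."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    d = (query[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * d ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)  # average over the data points

# Two groups of 1-D samples, one near 0 and one near 5
samples = np.array([0.0, 0.1, -0.2, 5.0, 5.3])
xs = np.array([0.0, 2.5, 5.0])
density = kde(samples, xs)
```

The estimated density is high near the data (0 and 5) and low in the empty region between them.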
Kernel
What is Mean-Shift?
• The density will have peaks (also called modes)
• If we started at a point and did gradient ascent, we would end up at one of the modes
• Cluster based on which mode each point belongs to
Gradient Ascent?
• Actually, no.
• A set of iterative steps can be taken that will monotonically converge to a mode
– No worries about step sizes
– This is an adaptive gradient ascent
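A 1-D sketch of the iteration with a Gaussian kernel: each step moves the point to the kernel-weighted mean of the data, which is the adaptive-step ascent described above. The data and bandwidth are toy values:

```python
import numpy as np

def mean_shift_mode(x0, points, bandwidth=1.0, n_iters=100, tol=1e-6):
    """Iterate the mean-shift update from x0 until it settles on a mode of
    the Gaussian kernel density estimate over `points` (1-D sketch)."""
    y = float(x0)
    for _ in range(n_iters):
        # Kernel weight of each data point relative to the current position
        w = np.exp(-0.5 * ((y - points) / bandwidth) ** 2)
        # The shifted point is the weighted mean of the data
        y_new = (w * points).sum() / w.sum()
        if abs(y_new - y) < tol:
            break
        y = y_new
    return y

# Two groups of points -> the density has two modes
points = np.array([0.0, 0.2, -0.1, 6.0, 6.2, 5.9])
mode_low = mean_shift_mode(1.0, points, bandwidth=0.8)   # climbs to the mode near 0
mode_high = mean_shift_mode(5.0, points, bandwidth=0.8)  # climbs to the mode near 6
```

Clustering then amounts to grouping points that converge to the same mode.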
Results
Results
Normalized Cuts
• Clustering approach based on graphs
• First, some background
Graphs
• A graph G(V,E) is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices, called its end points.
(From Slides by Khurram Shafique)
Connected and Disconnected Graphs
• A graph G is connected if there is a path from every vertex to every other vertex in G.
• A graph G that is not connected is called a disconnected graph.
(From Slides by Khurram Shafique)
Can represent a graph with a matrix
Adjacency Matrix: W (one row per node; vertices a–e)
0 1 0 0 1
1 0 0 0 0
0 0 0 0 1
0 0 0 0 1
1 0 1 1 0
(Based on Slides by Khurram Shafique)
Can add weights to edges
Weight Matrix: W
0 1 3 ∞ ∞
1 0 4 ∞ 2
3 4 0 6 7
∞ ∞ 6 0 1
∞ 2 7 1 0
(Based on Slides by Khurram Shafique)
Minimum Cut
A cut of a graph G is a set of edges S such that removing S from G disconnects G.
The minimum cut is the cut of minimum weight, where the weight of cut <A,B> is given as
cut(A, B) = Σ_{u ∈ A, v ∈ B} w(u, v)
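The weight of a cut can be computed directly from a weight matrix by summing the weights of all edges crossing the partition; the graph below is a made-up example:

```python
import numpy as np

def cut_weight(W, A):
    """Weight of the cut separating vertex set A from its complement B:
    cut(A, B) = sum of w(u, v) over u in A, v in B."""
    n = W.shape[0]
    A = np.asarray(sorted(A))
    B = np.array([v for v in range(n) if v not in set(A.tolist())])
    return W[np.ix_(A, B)].sum()

# Hypothetical symmetric weight matrix on 4 vertices
W = np.array([[0, 2, 1, 0],
              [2, 0, 0, 3],
              [1, 0, 0, 1],
              [0, 3, 1, 0]], dtype=float)
w = cut_weight(W, {0, 1})  # edges (0,2) with weight 1 and (1,3) with weight 3 cross the cut
```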
(Based on Slides by Khurram Shafique)
Minimum Cut
• There can be more than one minimum cut in a given graph
• All minimum cuts of a graph can be found in polynomial time [1]
[1] H. Nagamochi, K. Nishimura and T. Ibaraki, “Computing all small cuts in an undirected network,” SIAM J. Discrete Math. 10 (1997) 469–481.
(Based on Slides by Khurram Shafique)
How does this relate to image segmentation?
• When we compute the cut, we've divided the graph into two clusters
• To get a good segmentation, the edge weights should represent the pixels' affinity for belonging to the same group
(Images from Khurram Shafique)
Affinities for Image Segmentation
Brightness Features
• Interpretation:
– High-weight edges connect pixels that
• Have similar intensity
• Are close to each other
Min-Cut won't work though
• The minimum cut will often choose a cut with one small cluster
(Image From Shi and Malik)
We need a better criterion
• Instead of the min-cut, we can use the normalized cut:
Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
• Basic idea: Big clusters will increase assoc(A,V), thus decreasing Ncut(A,B), so the criterion no longer favors cutting off tiny clusters
Finding the Normalized Cut
• NP-hard problem
• Can find an approximate solution by finding the eigenvector with the second-smallest eigenvalue of the generalized eigenvalue problem (D − W)y = λDy, where D is the diagonal matrix of node degrees
• That eigenvector splits the data into two clusters
• Can recursively partition the data to find more clusters
• Code is available on Jianbo Shi's webpage
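A sketch of this spectral approximation, rewritten as the equivalent symmetric eigenproblem so that plain NumPy suffices; the weight matrix below is a toy graph with two obvious groups:

```python
import numpy as np

# Toy weight matrix: vertices {0,1,2} are strongly connected to each other,
# vertices {3,4} likewise, with only weak links between the two groups.
W = np.array([[0,   5,   5,   0.1, 0  ],
              [5,   0,   5,   0,   0.1],
              [5,   5,   0,   0,   0  ],
              [0.1, 0,   0,   0,   5  ],
              [0,   0.1, 0,   5,   0  ]], dtype=float)

d = W.sum(axis=1)                    # node degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
# Equivalent symmetric problem: D^{-1/2} (D - W) D^{-1/2} z = lambda z
L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigh returns ascending eigenvalues
y = D_inv_sqrt @ eigvecs[:, 1]            # second-smallest eigenvector
labels = (y > 0).astype(int)              # split the graph on the sign of y
```

Thresholding the eigenvector at zero is the simplest splitting rule; Shi and Malik also discuss searching over thresholds to minimize Ncut directly.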
Results
Figure from “Normalized cuts and image segmentation,” Shi and Malik, 2000
So what if I want to segment my image?
• Ncuts is a very common solution
• Mean-shift is also very popular