Lecture 10: Segmentation by Clustering
Marshall Tappen
Region Segmentation
What is the idea underlying segmentation?
We want to group pixels together that “belong” together.
Computer vision researchers aren't the first ones to think about this.
Also studied by the Gestalt school of psychologists.
Grouping
To perceive the image, the elements must be perceived as a whole.
The Gestalt psychologists studied how elements could be grouped together.
They identified a group of factors that led to elements being grouped together.
I'm mentioning these ideas because they often come up in discussions of segmentation and grouping in computer vision.
Here are some examples.
Proximity
Things that are nearby tend to be grouped together
(Figure from Forsyth and Ponce)
Similarity
Similar things tend to be grouped together
(Figure from Forsyth and Ponce)
Common Region
Tokens that lie in the same region tend to be grouped together
(Figure from Forsyth and Ponce)
Parallelism
Parallel lines or tokens tend to be grouped together
(Figure from Forsyth and Ponce)
Symmetry
We prefer groupings that lead to symmetric groups
(Figure from Forsyth and Ponce)
Closure
Tokens that lead to closed curves tend to be grouped together
(Figure from Forsyth and Ponce)
Grouping can lead to interesting effects
This is called the Kanizsa Triangle.
Grouping is causing you to see illusory contours.
Back to Pixels
Our goal is to group pixels.
We won't be able to incorporate all of the Gestalt cues, so we will have to focus on simpler cues:
RGB similarity
Proximity
Simple idea
Let's find three clusters in this data.
These points could represent RGB triplets in 3D.
Simple idea
Begin by guessing where the “center” of each cluster is
Simple idea
Now assign each point to the closest cluster
Simple idea
Now move each cluster center to the center of the points assigned to it.
Repeat this process until it converges.
Mathematically, what's going on?
Each cluster will be described by a center μj
Each point, xi, will be assigned to one cluster
Call this assignment c(i).
Our goal is to find the assignments and centers that minimize
\sum_i \| x_i - \mu_{c(i)} \|^2
How do we do this?
Optimizing c(i) and μj jointly is too difficult
But! What if I know the \mu_j already?
How do I minimize this?
(With the centers fixed, the sum is minimized by assigning each point to its closest center.)
How do we do this?
What if I know c(i) already?
(Then the best \mu_j is simply the mean of the points assigned to cluster j.)
Do you see why it's called k-means?
K-Means
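Putting the two steps together: alternate between assigning points to the closest center and moving each center to the mean of its assigned points, until nothing changes. Here is a minimal NumPy sketch of this (the function name and initialization scheme are my own choices, not from the lecture):

```python
import numpy as np

def kmeans(x, k, iters=100, seed=0):
    """x: (n, d) array of points; returns centers (k, d) and assignments c."""
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), size=k, replace=False)]  # guess initial centers
    for _ in range(iters):
        # Assignment step: c(i) = index of the closest center to x_i
        dists = np.linalg.norm(x[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        new_mu = np.array([x[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):  # converged: assignments won't change
            break
        mu = new_mu
    return mu, c
```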
How does this translate to images?
(From Comaniciu and Meer)
Image Segmentation by K-Means
1) Select a value of K.
2) Select a feature vector for every pixel (color, texture, position, or a combination of these, etc.).
3) Define a similarity measure between feature vectors (usually Euclidean distance).
4) Apply the K-Means algorithm.
5) Apply the Connected Components algorithm.
6) Merge any components of size less than some threshold into the most similar adjacent component.
(A code sketch of steps 1-5 follows.)
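As an illustration, steps 1-5 might look like the following sketch; the position_weight scaling and the use of scikit-learn's KMeans and SciPy's connected-components labeling are my choices here, not prescribed by the lecture:

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def segment_image(image, k=5, position_weight=0.5):
    """image: (H, W, 3) float array in [0, 1]; returns an (H, W) label map."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Step 2: feature vector per pixel -- color plus (scaled) position.
    # The scaling matters: position_weight trades proximity against color.
    feats = np.column_stack([
        image.reshape(-1, 3),
        position_weight * ys.ravel() / h,
        position_weight * xs.ravel() / w,
    ])
    # Steps 3-4: k-means under Euclidean distance on the feature vectors.
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats).reshape(h, w)
    # Step 5: split each cluster into spatially connected components.
    segments = np.zeros((h, w), dtype=int)
    next_id = 0
    for j in range(k):
        comp, n_comp = ndimage.label(labels == j)
        segments[comp > 0] = comp[comp > 0] + next_id
        next_id += n_comp
    return segments  # step 6 (merging tiny components) omitted for brevity
```

How the position features are scaled relative to color decides how strongly proximity competes with similarity; this is the scaling issue flagged in the advantages slide later.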
K-means clustering using intensity alone and color alone
Image
Clusters on intensity
Clusters on color
K-means using color alone, 11 segments
Image
Clusters on color
K-means using color alone, 11 segments.
Probabilistic Point of View
We'll take a generative point of view.
How to generate a data point:
1) Choose a cluster, z, from (1 .... N)
2) Sample that point from the distribution associated with that cluster
1D Example
Called a Mixture Model:
p(x) = \sum_k p(z = k) \, p(x | z = k)
z indicates which cluster is chosen.
p(z = k) = \pi_k is the probability of choosing cluster k; \pi_k is called a mixing coefficient (the \pi_k are non-negative and add up to one).
p(x | z = k) is the probability of x given the cluster is k.
To make it a Mixture of Gaussians, let each cluster's distribution be a Gaussian:
p(x) = \sum_k \pi_k \mathcal{N}(x | \mu_k, \Sigma_k)
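As a concrete example, here is the two-step generative process for a 1-D mixture of Gaussians in NumPy (all parameter values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])     # mixing coefficients (sum to one)
mu = np.array([-2.0, 0.0, 3.0])    # per-cluster means
sigma = np.array([0.5, 1.0, 0.8])  # per-cluster standard deviations

def sample(n):
    z = rng.choice(len(pi), size=n, p=pi)  # 1) choose a cluster z
    return rng.normal(mu[z], sigma[z])     # 2) sample from that cluster

def density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), evaluated pointwise."""
    x = np.atleast_1d(x)
    comps = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
            / (sigma * np.sqrt(2.0 * np.pi))
    return comps @ pi
```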
Brief Review of Gaussians
A d-dimensional Gaussian with mean \mu and covariance \Sigma has density
\mathcal{N}(x | \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)
Mixture of Gaussians
In Context of Our Previous Model
Now, we have means and covariances
How does this help with clustering?
If we had the parameters of the clusters, it would be easy to assign points to clusters
How do we get the cluster parameters? We'll maximize the likelihood of the data
Mathematically, this means maximizing the log of the mixture model, summed over all of the data:
\sum_i \log p(x_i) = \sum_i \log \sum_k \pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)
Now we run into a problem
This is hard to maximize.
But, we can lower bound it.
If the lower bound is easy to work with, we can maximize it. That should push the true function up.
Lower Bounding
We use a theorem called Jensen's inequality: for weights q_k \geq 0 that have to add up to one (\sum_k q_k = 1),
\log \sum_k q_k f_k \geq \sum_k q_k \log f_k
Applying this to each term of the log-likelihood, with weights q_{ik} for point i:
\sum_i \log \sum_k \pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k) \geq \sum_i \sum_k q_{ik} \log \frac{\pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)}{q_{ik}}
This looks familiar
The bound is tightest when
q_{ik} = \frac{\pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_i | \mu_j, \Sigma_j)} = p(z = k | x_i)
This looks a lot like using Bayes' rule to find the probability of that point's cluster.
Now life is easier.
We can now differentiate to find the parameters. This is called the M-Step; the previous step is called the E-Step.
You are always increasing a lower bound.
Complete set of steps: find the mean, covariance, and mixing coefficients:
N_k = \sum_i q_{ik}
\mu_k = \frac{1}{N_k} \sum_i q_{ik} x_i
\Sigma_k = \frac{1}{N_k} \sum_i q_{ik} (x_i - \mu_k)(x_i - \mu_k)^T
\pi_k = N_k / N
Where this comes from
Let's differentiate the lower bound with respect to \mu_k and set the result to zero:
\sum_i q_{ik} \Sigma_k^{-1} (x_i - \mu_k) = 0 \Rightarrow \mu_k = \frac{\sum_i q_{ik} x_i}{\sum_i q_{ik}}
Mixing Coefficients
For the mixing coefficients, maximize subject to the constraint \sum_k \pi_k = 1 (e.g., with a Lagrange multiplier), which gives \pi_k = N_k / N.
EM Algorithm
This is called the E-Step.
M-Step: using these estimates of q_{ik}, maximize over the rest of the parameters.
Find the mean, covariance, and mixing coefficients as above.
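A compact sketch of the full EM loop for a mixture of Gaussians, following the E-step and M-step equations above (the random initialization and fixed iteration count are my simplifications; a robust implementation would also regularize the covariances):

```python
import numpy as np

def gaussian(x, mu, cov):
    """Evaluate N(x | mu, cov) for each row of x."""
    d = x.shape[1]
    diff = x - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1)) / norm

def em_gmm(x, k, iters=100, seed=0):
    n, d = x.shape
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(n, size=k, replace=False)]  # init means at random points
    cov = np.stack([np.eye(d)] * k)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities q[i, j] = p(z = j | x_i) via Bayes' rule
        q = np.column_stack([pi[j] * gaussian(x, mu[j], cov[j])
                             for j in range(k)])
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate the mean, covariance, and mixing coefficients
        nk = q.sum(axis=0)
        mu = (q.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - mu[j]
            cov[j] = (q[:, j, None] * diff).T @ diff / nk[j]
        pi = nk / n
    return pi, mu, cov, q  # q gives the soft cluster assignments
```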
Back to clustering
Now we have q_{ik} = p(z = k | x_i) for every point.
This can be seen as a soft clustering: each point gets a distribution over the clusters rather than a single hard assignment.
How many clusters?
Remember the line problem?
Basic Idea
We want to fit the data well, but we don't want a model that is too complex
We are balancing two issues:
Fitting the data
Model complexity (here, that is the number of lines)
Three popular criteria for evaluating this:
AIC – An Information Criterion
Choose the model that minimizes
AIC = L + 2p
L is the squared error in our predictions (there is a probabilistic interpretation also, involving the log-likelihood, in which L becomes −2 times the log-likelihood). The variable p is the number of parameters.
BIC – Bayes Information Criterion
Choose the model that minimizes
BIC = L + p \log N
L is again the squared error in our predictions (there is a probabilistic interpretation also, involving the log-likelihood). The variable p is the number of parameters, and N is the number of data points.
Also called MDL (Minimum Description Length).
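As a sketch, model selection with these criteria is just a loop over candidate model sizes. The squared-error forms follow the slide's convention, and the helper names in the commented usage are placeholders, not lecture code:

```python
import numpy as np

def aic(sse, p):
    # Squared-error form from the slide: fit term plus 2 per parameter.
    return sse + 2 * p

def bic(sse, p, n):
    # BIC penalizes each parameter by log(n), so the penalty grows
    # with the size of the dataset.
    return sse + p * np.log(n)

# Hypothetical usage: fit models with 1..9 lines and keep the best score.
# fit_lines() and its .sse / .num_params fields are placeholders.
# scores = [bic(fit_lines(k).sse, fit_lines(k).num_params, n)
#           for k in range(1, 10)]
# best_k = 1 + int(np.argmin(scores))
```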
It doesn't always work (But it's close)
Another Clustering Application
In this case, we have a video and we want to segment out what's moving or changing
from C. Stauffer and W. Grimson
Easy Solution
Average a bunch of frames to get a “Background” image.
Compute the difference between the background image and each new frame; large differences mark the foreground.
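A minimal sketch of this frame-averaging approach (the threshold value is an arbitrary illustrative choice):

```python
import numpy as np

def foreground_mask(frames, frame, thresh=30.0):
    """frames: (T, H, W) stack of past frames; frame: (H, W) current frame."""
    background = frames.mean(axis=0)            # averaged "background" image
    return np.abs(frame - background) > thresh  # large difference => foreground
```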
The difficulty with this approach
The background changes
(From Stauffer and Grimson)
Solution
Fit a mixture model to the background.
I.e., a background pixel could have multiple colors.
Can use this to track in surveillance
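A much-simplified per-pixel sketch in the spirit of Stauffer and Grimson's online mixture update; the constants, the matching rule, and the weakest-component replacement are illustrative simplifications, not their exact algorithm:

```python
import numpy as np

class PixelMixture:
    """Grayscale mixture-of-Gaussians background model for a single pixel."""

    def __init__(self, k=3, alpha=0.05, match_sigma=2.5):
        self.w = np.full(k, 1.0 / k)           # component weights
        self.mu = np.linspace(0.0, 255.0, k)   # component means
        self.var = np.full(k, 400.0)           # component variances
        self.alpha = alpha                     # online learning rate
        self.match_sigma = match_sigma         # match threshold in std devs

    def update(self, x):
        """Fold in a new pixel value; return True if it looks like foreground."""
        d2 = (x - self.mu) ** 2
        matched = d2 < (self.match_sigma ** 2) * self.var
        if matched.any():
            j = int(np.argmax(matched))        # first matching component
            # Online update of the matched Gaussian and the weights
            self.mu[j] += self.alpha * (x - self.mu[j])
            self.var[j] += self.alpha * (d2[j] - self.var[j])
            self.w *= (1.0 - self.alpha)
            self.w[j] += self.alpha
            foreground = self.w[j] < 0.25      # rare component => not background
        else:
            # No component explains x: replace the weakest one
            j = int(np.argmin(self.w))
            self.mu[j], self.var[j], self.w[j] = x, 400.0, self.alpha
            foreground = True
        self.w /= self.w.sum()                 # keep weights normalized
        return foreground
```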
Advantages/Disadvantages
Advantages:
Easy to code!
Flexible: you can easily incorporate cues like proximity by including more features.
Be careful about scaling! (Why?)
Monotonic optimization
Advantages/Disadvantages
Disadvantages:
Only converges to a local minimum.
You still need to initialize it, and that initialization could have a big impact on the quality of the results.