Lecture 11: E-M and Mean-Shift
CAP 5415, Fall 2007
Review on Segmentation by Clustering
Each Pixel → Data Vector
Example
(From Comanciu and Meer)
Review of k-means
• Let's find three clusters in this data
• These points could represent RGB triplets in 3D
• Begin by guessing where the “center” of each cluster is
Review of k-means
• Now assign each point to the closest cluster
Review of k-means
• Now move each cluster center to the center of the points assigned to it
• Repeat this process until it converges
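The loop described above (guess centers, assign points, re-center, repeat) can be sketched in a few lines of NumPy; the data, number of clusters, and iteration count below are illustrative choices, not from the lecture:

```python
import numpy as np

def kmeans(points, k, n_iters=20, seed=0):
    """Minimal k-means sketch: points is an (N, D) array, e.g. RGB triplets."""
    rng = np.random.default_rng(seed)
    # Begin by guessing where the "center" of each cluster is:
    # here, k randomly chosen data points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to the closest cluster center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Toy data: three well-separated blobs in 2-D
pts = np.vstack([np.zeros((10, 2)), 5.0 * np.ones((10, 2)), 10.0 * np.ones((10, 2))])
centers, labels = kmeans(pts, k=3)
```

A fixed iteration count stands in for a real convergence test (stop when the assignments no longer change).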
Review of k-means
Probabilistic Point of View
• We'll take a generative point of view
• How to generate a data point:
1) Choose a cluster, z, from (1, …, N)
2) Sample that point from the distribution associated with that cluster
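The two generative steps can be sketched directly; the mixing coefficients, means, and standard deviations below are made-up values for a 1-D illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D mixture: mixing coefficients pi_k, means, std devs
pis    = np.array([0.5, 0.3, 0.2])   # must sum to 1
mus    = np.array([-4.0, 0.0, 5.0])
sigmas = np.array([1.0, 0.5, 2.0])

def sample_point():
    # 1) Choose a cluster z from (1, ..., N) with probability pi_z
    z = rng.choice(len(pis), p=pis)
    # 2) Sample the point from that cluster's Gaussian
    return rng.normal(mus[z], sigmas[z]), z

xs = np.array([sample_point()[0] for _ in range(1000)])
```

Plotting a histogram of `xs` would show the three bumps of the mixture.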
1D Example
Called a Mixture Model
• z indicates which cluster is chosen
Probability of choosing cluster k
Probability of x given the cluster is k
or
To make it a Mixture of Gaussians
Called a mixing coefficient
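The equations these labels annotate are not reproduced in the text; in standard notation, the mixture model being described is

```latex
p(x) = \sum_{k=1}^{K} p(z = k)\, p(x \mid z = k)
     = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0
```

where π_k = p(z = k) is the mixing coefficient and each cluster's distribution is a Gaussian.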
Brief Review of Gaussians
Mixture of Gaussians
In Context of Our Previous Model
• Now, we have means and covariances
How does this help with clustering?
• Let's think about a different problem first
• What if we had a set of data points and we wanted to find the parameters of the mixture model?
• Typical strategy: Optimize parameters to maximize likelihood of the data
Maximizing the likelihood
• Easy if we knew which cluster each point belonged to
• But we don't, so we compute the probability that each point belongs to each cluster using Bayes' rule
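In standard GMM notation, the Bayes-rule step referred to here gives the "responsibility" of cluster k for point x_i:

```latex
\gamma_{ik} \;=\; p(z = k \mid x_i)
\;=\; \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
           {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
```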
Where this comes from
• Let's differentiate with respect to μ_k
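The derivation itself is not reproduced in the text; the standard result is that setting the derivative of the log-likelihood with respect to μ_k to zero gives a weighted mean, where γ_ik = p(z = k | x_i) is the responsibility of cluster k for point x_i:

```latex
\frac{\partial}{\partial \mu_k} \sum_{i} \log p(x_i) = 0
\quad\Longrightarrow\quad
\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}
```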
EM Algorithm
• This is called the E-Step
• M-Step: Using these estimates of the cluster responsibilities, maximize over the rest of the parameters
• Lots of interesting math and intuitions go into this algorithm that I'm not covering
• Take Pattern Recognition!
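A minimal 1-D sketch of the two alternating steps, assuming a Gaussian mixture; the quantile-based initialization and fixed iteration count are my choices for illustration:

```python
import numpy as np

def em_gmm_1d(x, k, n_iters=50):
    """Minimal EM sketch for a 1-D mixture of Gaussians."""
    pis = np.full(k, 1.0 / k)
    # Spread the initial means across the data via quantiles
    mus = np.quantile(x, (np.arange(k) + 0.5) / k)
    vars_ = np.full(k, np.var(x))
    for _ in range(n_iters):
        # E-step: responsibilities via Bayes' rule, r[i, j] = P(z = j | x_i)
        dens = np.exp(-(x[:, None] - mus) ** 2 / (2 * vars_)) \
               / np.sqrt(2 * np.pi * vars_)
        r = pis * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters using the responsibilities
        nk = r.sum(axis=0)
        pis = nk / len(x)
        mus = (r * x[:, None]).sum(axis=0) / nk
        vars_ = (r * (x[:, None] - mus) ** 2).sum(axis=0) / nk
    return pis, mus, vars_

# Toy data: two well-separated 1-D clusters
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(10.0, 1.0, 200)])
pis, mus, vars_ = em_gmm_1d(x, k=2)
```

A production implementation would also monitor the log-likelihood for convergence and guard against collapsing variances.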
Back to clustering
• Now we have the probability that each point belongs to each cluster
• This can be seen as a soft clustering
Another Clustering Application
Another Clustering Application
• In this case, we have a video and we want to segment out what's moving or changing
(From C. Stauffer and W. Grimson)
Easy Solution
• Average a bunch of frames to get a “background” image
• Compute the difference between the background and each new frame to find the foreground
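A minimal sketch of this easy solution, assuming a grayscale frame stack; the synthetic frames and threshold value are illustrative assumptions:

```python
import numpy as np

# Illustrative data: frames is a (T, H, W) stack of grayscale video frames
rng = np.random.default_rng(0)
frames = rng.integers(100, 110, size=(50, 4, 4)).astype(float)  # near-static scene
frames[-1, 1, 1] = 255.0  # a bright "foreground" object appears in the last frame

# Average a bunch of frames to get a background image
background = frames.mean(axis=0)

# Compute the difference between the background and the current frame
diff = np.abs(frames[-1] - background)
mask = diff > 25  # hypothetical threshold, tuned per application
```

Pixels where `mask` is true are flagged as foreground; everything else is treated as background.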
The difficulty with this approach
• The background changes
(From Stauffer and Grimson)
Solution
• Fit a mixture model to the background
• i.e., a background pixel can take on multiple colors
Can use this to track in surveillance
Suggested Reading
• Chapter 14, David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”
• Chapter 3, Mubarak Shah, “Fundamentals of Computer Vision”
Advantages and Disadvantages
Mean-Shift
• Like EM, this algorithm is built on probabilistic intuitions.
• To understand EM we had to understand mixture models
• To understand mean-shift, we need to understand kernel density estimation (Take Pattern Recognition!)
Basics of Kernel Density Estimation
• Let’s say you have a bunch of points drawn from some distribution
• What’s the distribution that generated these points?
Using a Parametric Model
• Could fit a parametric model (like a Gaussian)
• Why:
– Can express the distribution with a small number of parameters (like mean and variance)
• Why not:
– Limited in flexibility
Non-Parametric Methods
• We’ll focus on kernel-density estimates
• Basic idea: Use the data to define the distribution
• Intuition:
– If I were to draw more samples from the same probability distribution, those points would probably be close to the points I have already drawn
– Build the distribution by putting a little mass of probability around each data point
Example
(From Tappen – Thesis)
Formally
• Most Common Kernel: Gaussian or Normal Kernel
• Another way to think about it:
– Make an image and put a 1 (or more) wherever you have a sample
– Convolve with a Gaussian
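The same idea can be written as an explicit sum of Gaussian bumps, one per data point; the bandwidth and sample values below are illustrative:

```python
import numpy as np

def kde(samples, query, bandwidth=0.5):
    """Kernel density estimate with a Gaussian kernel: the average of a small
    Gaussian 'bump' of probability centered on every data point."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    d = (query[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * d ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)  # average over the data points

# Two groups of 1-D samples, one near 0 and one near 5
samples = np.array([0.0, 0.1, -0.2, 5.0, 5.3])
xs = np.array([0.0, 2.5, 5.0])
density = kde(samples, xs)
```

The estimated density is high near the data (0 and 5) and low in the empty region between them.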
Kernel
What is Mean-Shift?
• The density will have peaks (also called modes)
• If we started at a point and did gradient ascent, we would end up at one of the modes
• Cluster based on which mode each point belongs to
Gradient Ascent?
• Actually, no.
• A set of iterative steps can be taken that will monotonically converge to a mode
– No worries about step sizes
– This is an adaptive gradient ascent
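A 1-D sketch of the iteration with a Gaussian kernel: each step moves the point to the kernel-weighted mean of the data, which is the adaptive-step ascent described above. The data and bandwidth are toy values:

```python
import numpy as np

def mean_shift_mode(x0, points, bandwidth=1.0, n_iters=100, tol=1e-6):
    """Iterate the mean-shift update from x0 until it settles on a mode of
    the Gaussian kernel density estimate over `points` (1-D sketch)."""
    y = float(x0)
    for _ in range(n_iters):
        # Kernel weight of each data point relative to the current position
        w = np.exp(-0.5 * ((y - points) / bandwidth) ** 2)
        # The shifted point is the weighted mean of the data
        y_new = (w * points).sum() / w.sum()
        if abs(y_new - y) < tol:
            break
        y = y_new
    return y

# Two groups of points -> the density has two modes
points = np.array([0.0, 0.2, -0.1, 6.0, 6.2, 5.9])
mode_low = mean_shift_mode(1.0, points, bandwidth=0.8)   # climbs to the mode near 0
mode_high = mean_shift_mode(5.0, points, bandwidth=0.8)  # climbs to the mode near 6
```

Clustering then amounts to grouping points that converge to the same mode.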
Results
Results
Normalized Cuts
• Clustering approach based on graphs
• First, some background
Graphs
• A graph G(V,E) is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices, called its end points.
(From Slides by Khurram Shafique)
Connected and Disconnected Graphs
• A graph G is connected if there is a path from every vertex to every other vertex in G.
• A graph G that is not connected is called a disconnected graph.
(From Slides by Khurram Shafique)
Can represent a graph with a matrix
Adjacency Matrix: W (one row per node; vertices a–e)
0 1 0 0 1
1 0 0 0 0
0 0 0 0 1
0 0 0 0 1
1 0 1 1 0
(Based on Slides by Khurram Shafique)
Can add weights to edges
Weight Matrix: W
0 1 3 ∞ ∞
1 0 4 ∞ 2
3 4 0 6 7
∞ ∞ 6 0 1
∞ 2 7 1 0
(Based on Slides by Khurram Shafique)
Minimum Cut
A cut of a graph G is a set of edges S such that removing S from G disconnects G.
The minimum cut is the cut of minimum weight, where the weight of cut <A,B> is given as
cut(A, B) = Σ_{u ∈ A, v ∈ B} w(u, v)
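The weight of a cut can be computed directly from a weight matrix by summing the weights of all edges crossing the partition; the graph below is a made-up example:

```python
import numpy as np

def cut_weight(W, A):
    """Weight of the cut separating vertex set A from its complement B:
    cut(A, B) = sum of w(u, v) over u in A, v in B."""
    n = W.shape[0]
    A = np.asarray(sorted(A))
    B = np.array([v for v in range(n) if v not in set(A.tolist())])
    return W[np.ix_(A, B)].sum()

# Hypothetical symmetric weight matrix on 4 vertices
W = np.array([[0, 2, 1, 0],
              [2, 0, 0, 3],
              [1, 0, 0, 1],
              [0, 3, 1, 0]], dtype=float)
w = cut_weight(W, {0, 1})  # edges (0,2) with weight 1 and (1,3) with weight 3 cross the cut
```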
(Based on Slides by Khurram Shafique)
Minimum Cut
• There can be more than one minimum cut in a given graph
• All minimum cuts of a graph can be found in polynomial time [1]
[1] H. Nagamochi, K. Nishimura and T. Ibaraki, “Computing all small cuts in an undirected network,” SIAM J. Discrete Math. 10 (1997) 469–481.
(Based on Slides by Khurram Shafique)
How does this relate to image segmentation?
• When we compute the cut, we've divided the graph into two clusters
• To get a good segmentation, the edge weights should represent the pixels' affinity for belonging to the same group
(Images from Khurram Shafique)
Affinities for Image Segmentation
Brightness Features
• Interpretation:
– High-weight edges connect pixels that
• Have similar intensity
• Are close to each other
Min-Cut won't work though
• The minimum cut will often choose a cut with one small cluster
(Image From Shi and Malik)
We need a better criterion
• Instead of the min-cut, we can use the normalized cut:
Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
• Basic idea: Big clusters will increase assoc(A,V), thus decreasing Ncut(A,B), so the criterion no longer favors cutting off tiny clusters
Finding the Normalized Cut
• NP-hard problem
• Can find an approximate solution by finding the eigenvector with the second-smallest eigenvalue of the generalized eigenvalue problem (D − W)y = λDy, where D is the diagonal matrix of node degrees
• That eigenvector splits the data into two clusters
• Can recursively partition the data to find more clusters
• Code is available on Jianbo Shi's webpage
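A sketch of this spectral approximation, rewritten as the equivalent symmetric eigenproblem so that plain NumPy suffices; the weight matrix below is a toy graph with two obvious groups:

```python
import numpy as np

# Toy weight matrix: vertices {0,1,2} are strongly connected to each other,
# vertices {3,4} likewise, with only weak links between the two groups.
W = np.array([[0,   5,   5,   0.1, 0  ],
              [5,   0,   5,   0,   0.1],
              [5,   5,   0,   0,   0  ],
              [0.1, 0,   0,   0,   5  ],
              [0,   0.1, 0,   5,   0  ]], dtype=float)

d = W.sum(axis=1)                    # node degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
# Equivalent symmetric problem: D^{-1/2} (D - W) D^{-1/2} z = lambda z
L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigh returns ascending eigenvalues
y = D_inv_sqrt @ eigvecs[:, 1]            # second-smallest eigenvector
labels = (y > 0).astype(int)              # split the graph on the sign of y
```

Thresholding the eigenvector at zero is the simplest splitting rule; Shi and Malik also discuss searching over thresholds to minimize Ncut directly.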
Results
Figure from “Normalized cuts and image segmentation,” Shi and Malik, 2000
So what if I want to segment my image?
• Ncuts is a very common solution
• Mean-shift is also very popular