MC949 - Computer Vision Lecture #13
Prof. Dr. Anderson Rocha
Microsoft Research Faculty Fellow; Affiliate Member, Brazilian Academy of Sciences
Reasoning for Complex Data (Recod) Lab. Institute of Computing, University of Campinas (Unicamp)
Campinas, SP, Brazil
[email protected] http://www.ic.unicamp.br/~rocha
Hidden Variables, the EM Algorithm, and Mixtures of Gaussians
These lecture slides are based on slides by several researchers, including James Hays, Derek Hoiem, Alexei Efros, Steve Seitz, David Forsyth, and many others. Many thanks to all of these authors.
Reading: PRML, Chapter 9: Secs. 9.1 and 9.2
Today's Class
• Examples of Missing Data Problems
  – Detecting outliers
• Background
  – Maximum Likelihood Estimation
  – Probabilistic Inference
• Dealing with "Hidden" Variables
  – EM algorithm, Mixture of Gaussians
  – Hard EM
Slide: Derek Hoiem
Missing Data Problems: Outliers
You want to train an algorithm to predict whether a photograph is attractive. You collect annotations from Mechanical Turk. Some annotators try to give accurate ratings, but others answer randomly.
Challenge: Determine which people to trust and the average rating by accurate annotators.
Photo: Jam343 (Flickr)
Annotator Ratings
10 8 9 2 8
Missing Data Problems: Object Discovery
You have a collection of images and have extracted regions from them. Each is represented by a histogram of "visual words".
Challenge: Discover frequently occurring object categories, without pre-trained appearance models.
http://www.robots.ox.ac.uk/~vgg/publications/papers/russell06.pdf
Missing Data Problems: Segmentation
You are given an image and want to label each pixel as foreground or background.
Challenge: Segment the image into figure and ground without knowing what the foreground looks like in advance.
[Figure: example image with labeled foreground and background regions]
Slide: Derek Hoiem
Missing Data Problems: Segmentation
Challenge: Segment the image into figure and ground without knowing what the foreground looks like in advance.
Three steps:
1. If we had labels, how could we model the appearance of foreground and background?
2. Once we have modeled the fg/bg appearance, how do we compute the likelihood that a pixel is foreground?
3. How can we get both labels and appearance models at once?
Slide: Derek Hoiem
Maximum Likelihood Estimation
1. If we had labels, how could we model the appearance of foreground and background?
Slide: Derek Hoiem
Maximum Likelihood Es:ma:on
$$\hat{\theta} = \operatorname*{argmax}_{\theta}\; p(\mathbf{x} \mid \theta), \qquad \mathbf{x} = \{x_1, \ldots, x_N\}$$
$$\hat{\theta} = \operatorname*{argmax}_{\theta}\; \prod_n p(x_n \mid \theta)$$
x: data; θ: parameters
Slide: Derek Hoiem
Maximum Likelihood Es:ma:on
$$\hat{\theta} = \operatorname*{argmax}_{\theta}\; p(\mathbf{x} \mid \theta) = \operatorname*{argmax}_{\theta}\; \prod_n p(x_n \mid \theta), \qquad \mathbf{x} = \{x_1, \ldots, x_N\}$$
Gaussian Distribution
$$p(x_n \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_n - \mu)^2}{2\sigma^2}\right)$$
Slide: Derek Hoiem
Maximum Likelihood Es:ma:on
$$\hat{\theta} = \operatorname*{argmax}_{\theta}\; p(\mathbf{x} \mid \theta) = \operatorname*{argmax}_{\theta}\; \prod_n p(x_n \mid \theta), \qquad \mathbf{x} = \{x_1, \ldots, x_N\}$$
Gaussian Distribution
$$p(x_n \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_n - \mu)^2}{2\sigma^2}\right)$$
MLE solution for the Gaussian:
$$\hat{\mu} = \frac{1}{N}\sum_n x_n, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_n (x_n - \hat{\mu})^2$$
Slide: Derek Hoiem
Example: MLE
>> mu_fg = mean(im(labels))
mu_fg = 0.6012
>> sigma_fg = sqrt(mean((im(labels)-mu_fg).^2))
sigma_fg = 0.1007
>> mu_bg = mean(im(~labels))
mu_bg = 0.4007
>> sigma_bg = sqrt(mean((im(~labels)-mu_bg).^2))
sigma_bg = 0.1007
>> pfg = mean(labels(:));

[Figures: labels and im]
Parameters used to generate: fg: mu=0.6, sigma=0.1; bg: mu=0.4, sigma=0.1
Slide: Derek Hoiem
Probabilistic Inference
2. Once we have modeled the fg/bg appearance, how do we compute the likelihood that a pixel is foreground?
Slide: Derek Hoiem
Probabilistic Inference
Compute the likelihood that a particular model generated a sample, where $z_n$ is the component or label of sample $n$:
$$p(z_n = m \mid x_n, \theta)$$
Slide: Derek Hoiem
Probabilistic Inference
Compute the likelihood that a particular model generated a sample:
$$p(z_n = m \mid x_n, \theta) = \frac{p(z_n = m, x_n \mid \theta)}{p(x_n \mid \theta)}$$
Slide: Derek Hoiem
Probabilistic Inference
Compute the likelihood that a particular model generated a sample:
$$p(z_n = m \mid x_n, \theta) = \frac{p(z_n = m, x_n \mid \theta)}{p(x_n \mid \theta)} = \frac{p(z_n = m, x_n \mid \theta)}{\sum_k p(z_n = k, x_n \mid \theta)}$$
Slide: Derek Hoiem
Probabilistic Inference
Compute the likelihood that a particular model generated a sample:
$$p(z_n = m \mid x_n, \theta) = \frac{p(z_n = m, x_n \mid \theta)}{\sum_k p(z_n = k, x_n \mid \theta)} = \frac{p(x_n \mid z_n = m, \theta)\, p(z_n = m \mid \theta)}{\sum_k p(x_n \mid z_n = k, \theta)\, p(z_n = k \mid \theta)}$$
Slide: Derek Hoiem
Example: Inference
>> pfg = 0.5;
>> px_fg = normpdf(im, mu_fg, sigma_fg);
>> px_bg = normpdf(im, mu_bg, sigma_bg);
>> pfg_x = px_fg*pfg ./ (px_fg*pfg + px_bg*(1-pfg));

[Figures: im and p(fg | im)]
Learned parameters: fg: mu=0.6, sigma=0.1; bg: mu=0.4, sigma=0.1
Slide: Derek Hoiem
Figure from “Bayesian Matting”, Chuang et al. 2001
Mixture of Gaussians* Example: Matting
Result from “Bayesian Matting”, Chuang et al. 2001
Dealing with Hidden Variables
3. How can we get both labels and appearance models at once?
Slide: Derek Hoiem
Mixture of Gaussians
$$p(x_n \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi})$$
Each mixture component factors into a component model (parameters $\mu_m, \sigma_m^2$) times a component prior ($\pi_m$):
$$p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = p(x_n \mid z_n = m, \mu_m, \sigma_m^2)\, p(z_n = m \mid \pi_m) = \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\!\left(-\frac{(x_n - \mu_m)^2}{2\sigma_m^2}\right) \pi_m$$
Slide: Derek Hoiem
Mixture of Gaussians
With enough components, a mixture of Gaussians can approximate any probability density function – Widely used as a general-purpose pdf estimator (see the sketch below)
Slide: Derek Hoiem
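As a small illustration (a minimal sketch, not from the original slides; the component parameters below are made-up values), the following MATLAB snippet evaluates a three-component 1-D mixture density on a grid and plots the resulting multi-modal pdf:

% Minimal sketch: evaluate a 1-D mixture-of-Gaussians pdf on a grid.
% The parameter values are illustrative, not from the lecture.
mus    = [0.2 0.5 0.8];      % component means
sigmas = [0.05 0.15 0.05];   % component standard deviations
pis    = [0.3 0.4 0.3];      % component priors (must sum to 1)
x  = linspace(0, 1, 500);
px = zeros(size(x));
for m = 1:numel(mus)
    px = px + pis(m) * normpdf(x, mus(m), sigmas(m));  % add pi_m * N(x | mu_m, sigma_m^2)
end
plot(x, px);   % a multi-modal density that no single Gaussian could represent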
Segmentation with Mixture of Gaussians
Pixels come from one of several Gaussian components
– We don't know which pixels come from which components
– We don't know the parameters for the components
Slide: Derek Hoiem
Simple solution
1. Initialize parameters
2. Compute the probability of each hidden variable given the current parameters
3. Compute new parameters for each model, weighted by the likelihood of the hidden variables
4. Repeat 2-3 until convergence
Slide: Derek Hoiem
Mixture of Gaussians: Simple Solution
1. Initialize parameters
2. Compute the likelihood of the hidden variables for the current parameters:
$$\alpha_{nm} = p(z_n = m \mid x_n, \boldsymbol{\mu}^{(t)}, \boldsymbol{\sigma}^{2(t)}, \boldsymbol{\pi}^{(t)})$$
3. Estimate new parameters for each model, weighted by likelihood (see the code sketch after this slide):
$$\hat{\mu}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}\, x_n}{\sum_n \alpha_{nm}}, \qquad \hat{\sigma}_m^{2\,(t+1)} = \frac{\sum_n \alpha_{nm}\, (x_n - \hat{\mu}_m)^2}{\sum_n \alpha_{nm}}, \qquad \hat{\pi}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}}{N}$$
Slide: Derek Hoiem
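To make these updates concrete, here is a minimal MATLAB sketch of the loop for a 1-D mixture (illustrative only; the data vector x, the number of components M, the initialization, and the fixed iteration count are assumptions, and a practical implementation would add a convergence check and numerical safeguards):

% Minimal EM sketch for a 1-D mixture of M Gaussians (illustrative only).
% Assumes x is an N-by-1 data vector and M is the number of components.
N = numel(x);
mu = x(randi(N, M, 1));          % initialize means from random data points
sigma2 = var(x) * ones(M, 1);    % initialize variances
ppi = ones(M, 1) / M;            % component priors ("ppi" avoids shadowing pi)
for t = 1:100
    % E-step: alpha(n,m) = p(z_n = m | x_n, current parameters)
    alpha = zeros(N, M);
    for m = 1:M
        alpha(:, m) = ppi(m) * normpdf(x, mu(m), sqrt(sigma2(m)));
    end
    alpha = alpha ./ sum(alpha, 2);   % normalize over components
    % M-step: re-estimate each component, weighted by alpha
    for m = 1:M
        w = alpha(:, m);
        mu(m)     = sum(w .* x) / sum(w);
        sigma2(m) = sum(w .* (x - mu(m)).^2) / sum(w);
        ppi(m)    = sum(w) / N;
    end
end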
Expectation Maximization (EM) Algorithm
Goal:
$$\hat{\theta} = \operatorname*{argmax}_{\theta}\; \log\!\left(\sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \theta)\right)$$
The log of a sum is intractable to maximize directly.
Jensen's Inequality: for concave functions, such as $f(x) = \log(x)$,
$$f(\mathrm{E}[X]) \geq \mathrm{E}[f(X)]$$
(see www.stanford.edu/class/cs229/notes/cs229-notes8.ps for a proof). This yields a tractable lower bound, expanded below.
Slide: Derek Hoiem
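Filling in the step the slide alludes to (the standard EM derivation, not verbatim from the slides): introduce any distribution $q(\mathbf{z})$ over the hidden variables and apply Jensen's inequality to the concave log,
$$\log \sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \theta) = \log \sum_{\mathbf{z}} q(\mathbf{z}) \frac{p(\mathbf{x}, \mathbf{z} \mid \theta)}{q(\mathbf{z})} \;\geq\; \sum_{\mathbf{z}} q(\mathbf{z}) \log \frac{p(\mathbf{x}, \mathbf{z} \mid \theta)}{q(\mathbf{z})}.$$
Choosing $q(\mathbf{z}) = p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)})$ makes the bound tight at the current parameters; maximizing the bound over $\theta$ is then exactly the E-step expectation and M-step maximization on the next slide.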
Expectation Maximization (EM) Algorithm
Goal:
$$\hat{\theta} = \operatorname*{argmax}_{\theta}\; \log\!\left(\sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \theta)\right)$$
1. E-step: compute
$$\mathrm{E}_{\mathbf{z} \mid \mathbf{x}, \theta^{(t)}}\!\left[\log p(\mathbf{x}, \mathbf{z} \mid \theta)\right] = \sum_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)})\, \log p(\mathbf{x}, \mathbf{z} \mid \theta)$$
2. M-step: solve
$$\theta^{(t+1)} = \operatorname*{argmax}_{\theta}\; \sum_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)})\, \log p(\mathbf{x}, \mathbf{z} \mid \theta)$$
Slide: Derek Hoiem
EM for Mixture of Gaussians (on board)
$$p(x_n \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\!\left(-\frac{(x_n - \mu_m)^2}{2\sigma_m^2}\right) \pi_m$$
1. E-step: compute
$$\mathrm{E}_{\mathbf{z} \mid \mathbf{x}, \theta^{(t)}}\!\left[\log p(\mathbf{x}, \mathbf{z} \mid \theta)\right] = \sum_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)})\, \log p(\mathbf{x}, \mathbf{z} \mid \theta)$$
2. M-step: solve
$$\theta^{(t+1)} = \operatorname*{argmax}_{\theta}\; \sum_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)})\, \log p(\mathbf{x}, \mathbf{z} \mid \theta)$$
EM for Mixture of Gaussians (on board)
$$p(x_n \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\!\left(-\frac{(x_n - \mu_m)^2}{2\sigma_m^2}\right) \pi_m$$
1. E-step:
$$\alpha_{nm} = p(z_n = m \mid x_n, \boldsymbol{\mu}^{(t)}, \boldsymbol{\sigma}^{2(t)}, \boldsymbol{\pi}^{(t)})$$
2. M-step:
$$\hat{\mu}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}\, x_n}{\sum_n \alpha_{nm}}, \qquad \hat{\sigma}_m^{2\,(t+1)} = \frac{\sum_n \alpha_{nm}\, (x_n - \hat{\mu}_m)^2}{\sum_n \alpha_{nm}}, \qquad \hat{\pi}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}}{N}$$
Slide: Derek Hoiem
EM Algorithm
• Maximizes a lower bound on the data likelihood at each iteration
• Each step increases the data likelihood
  – Converges to a local maximum
• Common tricks for the derivation
  – Find terms that sum or integrate to 1
  – Lagrange multiplier to deal with constraints (see the worked example below)
Slide: Derek Hoiem
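As an example of the Lagrange-multiplier trick (a standard derivation, filled in here rather than copied from the slides): to re-estimate the priors $\pi_m$ subject to $\sum_m \pi_m = 1$, maximize the expected complete-data log-likelihood with a multiplier $\lambda$,
$$\frac{\partial}{\partial \pi_m}\left[\sum_n \alpha_{nm} \log \pi_m + \lambda\Big(1 - \sum_{m'} \pi_{m'}\Big)\right] = \frac{\sum_n \alpha_{nm}}{\pi_m} - \lambda = 0.$$
So $\pi_m \propto \sum_n \alpha_{nm}$; since $\sum_m \alpha_{nm} = 1$ for each $n$, the constraint gives $\lambda = N$ and hence $\hat{\pi}_m = \frac{1}{N}\sum_n \alpha_{nm}$, matching the M-step update above.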
Mixture of Gaussian demos
• http://www.cs.cmu.edu/~alad/em/
• http://lcn.epfl.ch/tutorial/english/gaussian/html/
Slide: Derek Hoiem
"Hard EM"
• Same as EM, except compute z* as the most likely values of the hidden variables
• K-means is an example (see the sketch below)
• Advantages
  – Simpler: can be applied when the EM updates cannot be derived
  – Sometimes works better if you want to make hard predictions at the end
• But
  – Generally, the pdf parameters are not as accurate as with EM
Slide: Derek Hoiem
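A minimal hard-EM sketch for the same 1-D mixture (illustrative; it reuses the hypothetical x, M, mu, sigma2, ppi setup from the soft-EM sketch earlier and replaces the soft responsibilities with hard assignments):

% Hard EM: assign each point to its single most likely component,
% then re-estimate each component from its assigned points only.
for t = 1:100
    % "E-step": z(n) = argmax_m p(z_n = m, x_n | current parameters)
    lik = zeros(N, M);
    for m = 1:M
        lik(:, m) = ppi(m) * normpdf(x, mu(m), sqrt(sigma2(m)));
    end
    [~, z] = max(lik, [], 2);
    % "M-step": per-component MLE on the hard-assigned points
    for m = 1:M
        xm = x(z == m);
        if ~isempty(xm)
            mu(m)     = mean(xm);
            sigma2(m) = mean((xm - mu(m)).^2);
            ppi(m)    = numel(xm) / N;
        end
    end
end

With the variances and priors held fixed and equal, the assignment step reduces to nearest-mean, i.e., K-means clustering.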
What's wrong with this prediction?
P(foreground | image)
Slide: Derek Hoiem
Next class
• MRFs and Graph-cut Segmentation
Slide: Derek Hoiem
Missing data problems
• Outlier detection
  – You expect that some of the data are valid points and some are outliers, but...
  – You don't know which are inliers
  – You don't know the parameters of the distribution of inliers
• Modeling probability densities
  – You expect that most of the colors within a region come from one of a few normally distributed sources, but...
  – You don't know which source each pixel comes from
  – You don't know the Gaussian means or covariances
• Discovering patterns or "topics"
  – You expect that there are some recurring "topics" of codewords within an image collection. You want to...
  – Figure out the frequency of each codeword within each topic
  – Figure out which topic each image belongs to
Slide: Derek Hoiem
Running Example: Segmentation
1. Given examples of foreground and background
   – Estimate distribution parameters
   – Perform inference
2. Knowing that foreground and background are "normally distributed"
3. Some initial estimate, but no good knowledge of foreground and background