Probabilistic Inference Lecture 5
M. Pawan Kumar [email protected]
Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/
What to Expect in the Final Exam
• Open Book – Textbooks – Research Papers – Course Slides – No Electronic Devices
• Easy Questions – 10 points
• Hard Questions – 10 points
Easy Question – BP Compute the reparameterization constants for (a,b) and (c,b) such that the unary potentials of b are equal to its min-marginals.
[Figure: a three-variable MRF over Va, Vb, Vc with labels l0, l1 and its unary and pairwise potentials]
Hard Question – BP Provide an O(h) algorithm to compute the reparameterization constants of BP for an edge whose pairwise potentials are specified by a truncated linear model.
Easy Question – Minimum Cut Provide the graph corresponding to the MAP estimation problem in the following MRF.
[Figure: the same three-variable MRF over Va, Vb, Vc with its unary and pairwise potentials]
Hard Question – Minimum Cut Show that the expansion algorithm provides a bound of 2M for the truncated linear metric, where M is the value of the truncation.
Easy Question – Relaxations Using an example, show that the LP-S relaxation is not tight for a frustrated cycle (cycle with an odd number of supermodular pairwise potentials).
Hard Question – Relaxations Prove or disprove that the LP-S and SOCP-MS relaxations are invariant to reparameterization.
Recap
Integer Programming Formulation
min ∑a ∑i θa;i ya;i + ∑(a,b) ∑ik θab;ik yab;ik
ya;i ∈ {0,1}
∑i ya;i = 1
yab;ik = ya;i yb;k
Integer Programming Formulation
min θTy
ya;i ∈ {0,1}
∑i ya;i = 1
yab;ik = ya;i yb;k
θ = [ … θa;i … ; … θab;ik … ]    y = [ … ya;i … ; … yab;ik … ]
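The vector form can be sanity-checked on a toy instance: for any labelling, θTy equals the MRF energy. A minimal sketch (illustrative numbers, not from the slides):

```python
import itertools
import numpy as np

# Toy 2-variable, 2-label MRF (illustrative numbers, not from the slides).
theta_a = np.array([2.0, 5.0])                 # theta_{a;i}
theta_b = np.array([2.0, 4.0])                 # theta_{b;k}
theta_ab = np.array([[0.0, 3.0], [3.0, 0.0]])  # theta_{ab;ik}

def energy(i, k):
    return theta_a[i] + theta_b[k] + theta_ab[i, k]

def objective(i, k):
    # Build y = [ ... ya;i ... ; ... yab;ik ... ] for the labelling (i, k).
    ya, yb = np.eye(2)[i], np.eye(2)[k]
    yab = np.outer(ya, yb).ravel()             # yab;ik = ya;i * yb;k
    theta = np.concatenate([theta_a, theta_b, theta_ab.ravel()])
    y = np.concatenate([ya, yb, yab])
    return float(theta @ y)

for i, k in itertools.product(range(2), range(2)):
    assert objective(i, k) == energy(i, k)     # theta^T y is the MRF energy
```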
Linear Programming Relaxation
min θTy
ya;i ∈ {0,1}
∑i ya;i = 1
yab;ik = ya;i yb;k
Two reasons why we can’t solve this
Linear Programming Relaxation
min θTy
ya;i ∈ [0,1]
∑i ya;i = 1
yab;ik = ya;i yb;k
One reason why we can’t solve this
Linear Programming Relaxation
min θTy
ya;i ∈ [0,1]
∑i ya;i = 1
∑k yab;ik = ∑k ya;i yb;k
One reason why we can’t solve this
Linear Programming Relaxation
min θTy
ya;i ∈ [0,1]
∑i ya;i = 1
∑k yab;ik = ya;i ∑k yb;k    (since ∑k yb;k = 1)
One reason why we can’t solve this
Linear Programming Relaxation
min θTy
ya;i ∈ [0,1]
∑i ya;i = 1
∑k yab;ik = ya;i
One reason why we can’t solve this
Linear Programming Relaxation
min θTy
ya;i ∈ [0,1]
∑i ya;i = 1
∑k yab;ik = ya;i
No reason why we can’t solve this* (*modulo memory requirements and time complexity)
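As a sketch of why the final relaxation is tractable, it can be handed to an off-the-shelf LP solver. The snippet below assumes SciPy is available and, as is standard for LP-S, also adds the symmetric marginalization constraint ∑i yab;ik = yb;k (the slides derive only one direction); the numbers are illustrative:

```python
from scipy.optimize import linprog  # assumed available

# Toy 2-variable, 2-label MRF (illustrative numbers, not from the slides).
# y = [ya;0, ya;1, yb;0, yb;1, yab;00, yab;01, yab;10, yab;11]
c = [2.0, 5.0, 2.0, 4.0, 0.0, 3.0, 3.0, 0.0]

A_eq = [
    [1, 1, 0, 0, 0, 0, 0, 0],    # sum_i ya;i = 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # sum_k yb;k = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_k yab;0k = ya;0
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_k yab;1k = ya;1
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_i yab;i0 = yb;0 (symmetric direction)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_i yab;i1 = yb;1
]
b_eq = [1, 1, 0, 0, 0, 0]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
# A single edge is a tree, so the relaxation is tight: the optimum matches
# the best labelling (Va = 0, Vb = 0, energy 2 + 2 + 0 = 4).
print(res.fun)
```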
Dual of the LP Relaxation Wainwright et al., 2001
[Figure: a 3×3 grid MRF over Va–Vi with potentials θ, decomposed into six trees: three row chains (θ1, θ2, θ3) and three column chains (θ4, θ5, θ6)]
∑ θi = θ
Dual of the LP Relaxation Wainwright et al., 2001
[Figure: each tree θi is solved exactly, yielding the optimal tree energies q*(θ1), …, q*(θ6)]
Dual of LP: max ∑ q*(θi)  s.t.  ∑ θi = θ
Dual of the LP Relaxation Wainwright et al., 2001
[Figure: the same decomposition, with the sum constraint relaxed to a reparameterization]
Dual of LP: max ∑ q*(θi)  s.t.  ∑ θi ≡ θ
Dual of the LP Relaxation Wainwright et al., 2001
∑ θi ≡ θ
max ∑ q*(θi)
I can easily compute q*(θi)
I can easily maintain reparam constraint
So can I easily solve the dual?
Outline
• TRW Message Passing
• Dual Decomposition
Things to Remember
• Forward-pass computes min-marginals of root
• BP is exact for trees
• Every iteration provides a reparameterization
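As a reminder of the first point, a minimal min-sum forward pass on a chain, returning the min-marginals of the last (root) variable (toy numbers, not from the slides):

```python
import numpy as np

def chain_min_marginals(unary, pairwise):
    """Forward pass of min-sum BP on a chain V_0 - V_1 - ... - V_{n-1}.

    unary: list of length-n arrays, unary[a][i] = theta_{a;i}
    pairwise: list of matrices, pairwise[a][i, k] = theta_{a,a+1;ik}
    Returns the min-marginals of the root (last) variable.
    """
    msg = np.zeros_like(unary[0])
    for a in range(len(unary) - 1):
        # Message from V_a to V_{a+1}: minimize over the labels of V_a.
        msg = np.min((unary[a] + msg)[:, None] + pairwise[a], axis=0)
    return unary[-1] + msg

# Two-variable example: the root unaries become its min-marginals.
unary = [np.array([2.0, 5.0]), np.array([2.0, 4.0])]
pairwise = [np.array([[0.0, 1.0], [1.0, 0.0]])]
print(chain_min_marginals(unary, pairwise))
```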
TRW Message Passing Kolmogorov, 2006
[Figure: the 3×3 grid and its six-tree decomposition (θ1, …, θ6), with ∑ θi ≡ θ and dual value ∑ q*(θi)]
Pick a variable Va
TRW Message Passing Kolmogorov, 2006
∑ θi ≡ θ    ∑ q*(θi)
[Figure: the two trees containing Va: the chain Vc–Vb–Va with unaries θ1c;0, θ1c;1, θ1b;0, θ1b;1, θ1a;0, θ1a;1, and the chain Va–Vd–Vg with unaries θ4a;0, θ4a;1, θ4d;0, θ4d;1, θ4g;0, θ4g;1]
TRW Message Passing Kolmogorov, 2006
θ1 + θ4 + θrest ≡ θ    q*(θ1) + q*(θ4) + K
Reparameterize to obtain the min-marginals of Va
[Figure: the chains Vc–Vb–Va and Va–Vd–Vg with their unary potentials]
TRW Message Passing Kolmogorov, 2006
θ’1 + θ’4 + θrest ≡ θ    q*(θ’1) + q*(θ’4) + K
One pass of Belief Propagation
[Figure: the two chains with reparameterized unaries θ’1 and θ’4]
TRW Message Passing Kolmogorov, 2006
θ’1 + θ’4 + θrest ≡ θ    q*(θ’1) + q*(θ’4) + K
The tree optima remain the same (BP is a reparameterization)
[Figure: the two reparameterized chains]
TRW Message Passing Kolmogorov, 2006
θ’1 + θ’4 + θrest ≡ θ    min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K
[Figure: the two reparameterized chains; the unaries of Va now equal its min-marginals in each tree]
TRW Message Passing Kolmogorov, 2006
θ’1 + θ’4 + θrest ≡ θ    min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K
Compute the average of the min-marginals of Va
[Figure: the two reparameterized chains]
TRW Message Passing Kolmogorov, 2006
θ’1 + θ’4 + θrest ≡ θ    min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K
θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2    θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2
[Figure: the two reparameterized chains]
TRW Message Passing Kolmogorov, 2006
θ’’1 + θ’’4 + θrest    min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K
θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2    θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2
[Figure: both chains now carry the averaged unaries θ’’a;0, θ’’a;1 at Va]
TRW Message Passing Kolmogorov, 2006
θ’’1 + θ’’4 + θrest ≡ θ    min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K
θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2    θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2
[Figure: both chains with the averaged unaries at Va]
TRW Message Passing Kolmogorov, 2006
θ’’1 + θ’’4 + θrest ≡ θ    2 min{θ’’a;0, θ’’a;1} + K
θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2    θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2
[Figure: both chains with the averaged unaries at Va]
TRW Message Passing Kolmogorov, 2006
θ’’1 + θ’’4 + θrest ≡ θ
[Figure: both chains with the averaged unaries at Va]
2 min{θ’’a;0, θ’’a;1} + K = min{p1 + p2, q1 + q2} + K ≥ min{p1, q1} + min{p2, q2} + K, where p1 = θ’1a;0, p2 = θ’4a;0, q1 = θ’1a;1, q2 = θ’4a;1
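The inequality on this slide can be spelled out as a one-line derivation (writing p1 = θ’1a;0, p2 = θ’4a;0, q1 = θ’1a;1, q2 = θ’4a;1):

```latex
% Node-averaging cannot decrease the dual: with
% p_1 = \theta'^1_{a;0}, p_2 = \theta'^4_{a;0},
% q_1 = \theta'^1_{a;1}, q_2 = \theta'^4_{a;1},
\[
2\min\{\theta''_{a;0}, \theta''_{a;1}\}
  = \min\{p_1 + p_2,\; q_1 + q_2\}
  \;\ge\; \min\{p_1, q_1\} + \min\{p_2, q_2\},
\]
% since each candidate on the left, p_1 + p_2 or q_1 + q_2, is at least
% \min\{p_1, q_1\} + \min\{p_2, q_2\}. Adding K to both sides compares
% the new dual value with the old one.
```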
TRW Message Passing Kolmogorov, 2006
θ’’1 + θ’’4 + θrest ≡ θ    2 min{θ’’a;0, θ’’a;1} + K
Objective function increases or remains constant
[Figure: both chains with the averaged unaries at Va]
TRW Message Passing (Kolmogorov, 2006)
Initialize θi, taking care of the reparameterization constraint
REPEAT:
  Choose a random variable Va
  Compute the min-marginals of Va for all trees
  Node-average the min-marginals
(Edge-averaging is also possible)
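One TRW step on the smallest non-trivial case, two single-edge trees sharing Va, can be sketched as follows (toy numbers, not the slides' example; it illustrates that node-averaging never decreases the dual):

```python
import numpy as np

# Two single-edge trees sharing Va (toy numbers, not the slides' example).
# Tree 1: Va - Vb,  Tree 2: Va - Vc.
u1_a, u1_b = np.array([1.0, 4.0]), np.array([2.0, 1.0])
P1 = np.array([[0.0, 2.0], [2.0, 0.0]])        # P1[i, k] = pairwise(Va=i, Vb=k)
u2_a, u2_c = np.array([3.0, 0.0]), np.array([1.0, 2.0])
P2 = np.array([[0.0, 1.0], [1.0, 0.0]])

def min_marginals(u_a, u_other, P):
    # Reparameterized unaries of Va: absorb the message from the other node.
    return u_a + np.min(P + u_other[None, :], axis=1)

m1 = min_marginals(u1_a, u1_b, P1)             # [3, 5]
m2 = min_marginals(u2_a, u2_c, P2)             # [4, 2]
dual_before = m1.min() + m2.min()              # q*(theta1) + q*(theta2)

avg = (m1 + m2) / 2                            # node-average the min-marginals
dual_after = 2 * avg.min()                     # both trees now share the unaries

# min{p1, q1} + min{p2, q2} <= min{p1 + p2, q1 + q2}: dual never decreases.
assert dual_after >= dual_before
```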
Example 1
[Figure: three two-variable trees Va–Vb, Vb–Vc, Vc–Va over labels l0, l1 with their unary and pairwise potentials; tree dual values 5, 6, 7]
Pick variable Va. Reparameterize.
Example 1
[Figure: the three trees after reparameterizing Va; tree dual values 5, 6, 7]
Average the min-marginals of Va
Example 1
[Figure: the three trees after averaging; tree dual values 7, 6, 7]
Pick variable Vb. Reparameterize.
Example 1
[Figure: the three trees after reparameterizing Vb; tree dual values 7, 6, 7]
Average the min-marginals of Vb
Example 1
[Figure: the three trees after averaging; tree dual values 6.5, 6.5, 7]
Value of the dual does not increase
Example 1
[Figure: the same trees; tree dual values 6.5, 6.5, 7]
Maybe it will increase for Vc? NO
Example 1
[Figure: the converged trees]
f1(a) = 0  f1(b) = 0    f2(b) = 0  f2(c) = 0    f3(c) = 0  f3(a) = 0
Strong Tree Agreement
Exact MAP Estimate
Example 2
[Figure: three two-variable trees Va–Vb, Vb–Vc, Vc–Va over labels l0, l1 with their unary and pairwise potentials; tree dual values 4, 0, 4]
Pick variable Va. Reparameterize.
Example 2
[Figure: the three trees after reparameterizing Va; tree dual values 4, 0, 4]
Average the min-marginals of Va
Example 2
[Figure: the three trees after averaging; tree dual values 4, 0, 4]
Value of the dual does not increase
Example 2
[Figure: the same trees; tree dual values 4, 0, 4]
Maybe it will increase for Vb or Vc? NO
Example 2
[Figure: the converged trees]
f1(a) = 1  f1(b) = 1    f2(b) = 1  f2(c) = 0    f3(c) = 1  f3(a) = 1
f2(b) = 0  f2(c) = 1 is also optimal for tree 2
Weak Tree Agreement
Not Exact MAP Estimate
Example 2
[Figure: the converged trees]
f1(a) = 1  f1(b) = 1    f2(b) = 1  f2(c) = 0    f3(c) = 1  f3(a) = 1    (f2(b) = 0, f2(c) = 1 also optimal)
Weak Tree Agreement: the convergence point of TRW
Obtaining the Labelling
Only solves the dual. Primal solutions?
[Figure: the 3×3 grid with θ’ = ∑ θi ≡ θ]
Fix the label of Va
Obtaining the Labelling
Only solves the dual. Primal solutions?
[Figure: the 3×3 grid with θ’ = ∑ θi ≡ θ]
Fix the label of Vb
Continue in some fixed order (Meltzer et al., 2006)
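The fixing scheme can be sketched as a greedy rounding pass over the reparameterized potentials θ’ (a hypothetical minimal implementation, not Meltzer et al.'s exact procedure; the numbers are illustrative):

```python
import numpy as np

def round_primal(unary, pairwise, order):
    """Greedy rounding: fix labels in `order`, conditioning on earlier picks.

    unary: dict a -> array of reparameterized unaries theta'_{a;i}
    pairwise: dict (a, b) -> matrix theta'_{ab;ik} (each edge stored once)
    """
    label = {}
    for a in order:
        cost = unary[a].copy()
        for (u, v), P in pairwise.items():
            if u == a and v in label:          # neighbour already fixed
                cost += P[:, label[v]]
            elif v == a and u in label:
                cost += P[label[u], :]
        label[a] = int(np.argmin(cost))
    return label

# Toy 2-variable instance (illustrative numbers).
unary = {0: np.array([0.0, 2.0]), 1: np.array([1.0, 0.0])}
pairwise = {(0, 1): np.array([[0.0, 3.0], [3.0, 0.0]])}
print(round_primal(unary, pairwise, order=[0, 1]))  # {0: 0, 1: 0}
```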
Computational Issues of TRW
The basic component is Belief Propagation
• Speed-ups for some pairwise potentials (Felzenszwalb & Huttenlocher, 2004)
• Memory requirements cut down by half (Kolmogorov, 2006)
• Further speed-ups using monotonic chains (Kolmogorov, 2006)
Theoretical Properties of TRW
• Always converges, unlike BP (Kolmogorov, 2006)
• Strong tree agreement implies exact MAP (Wainwright et al., 2001)
• Optimal MAP for two-label submodular problems (Kolmogorov and Wainwright, 2005), i.e. when θab;00 + θab;11 ≤ θab;01 + θab;10
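The two-label submodularity condition from the last bullet is easy to check directly:

```python
def is_submodular(theta_ab):
    """Two-label pairwise potential theta_ab[i][k], i, k in {0, 1}."""
    return theta_ab[0][0] + theta_ab[1][1] <= theta_ab[0][1] + theta_ab[1][0]

assert is_submodular([[0, 1], [1, 0]])       # Potts-style cost: submodular
assert not is_submodular([[2, 0], [0, 2]])   # supermodular: TRW guarantee lost
```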
Results: Binary Segmentation (Szeliski et al., 2008)
Labels: {foreground, background}
Unary potentials: -log(likelihood) using learnt fg/bg models
Pairwise potentials: 0 if the labels are the same; 1 - λexp(|da - db|) if they differ
[Figure: TRW result on binary segmentation]
[Figure: Belief Propagation result on binary segmentation]
Results: Stereo Correspondence (Szeliski et al., 2008)
Labels: {disparities}
Unary potentials: similarity of pixel colours
Pairwise potentials: 0 if the labels are the same; 1 - λexp(|da - db|) if they differ
[Figure: TRW result on stereo correspondence]
[Figure: Belief Propagation result on stereo correspondence]
Results: Non-submodular Problems (Kolmogorov, 2006)
[Figure: energy plots for BP and TRW-S on a 30×30 grid (K50 instance)]
BP outperforms TRW-S
Code + standard data: http://vision.middlebury.edu/MRF
Outline
• TRW Message Passing
• Dual Decomposition
Dual Decomposition
minx ∑i gi(x) s.t. x ∈ C
Dual Decomposition
minx,xi ∑i gi(xi)  s.t. xi ∈ C, xi = x
Dual Decomposition
minx,xi ∑i gi(xi)
s.t. xi ∈ C
Dual Decomposition
maxλi minx,xi ∑i gi(xi) + ∑i λiT(xi − x)  s.t. xi ∈ C
KKT condition: ∑i λi = 0
Dual Decomposition
maxλi minx,xi ∑i gi(xi) + ∑i λiTxi  s.t. xi ∈ C
Dual Decomposition
maxλi minxi ∑i (gi(xi) + λiTxi)  s.t. xi ∈ C
Projected Supergradient Ascent
Supergradient s of h(z) at z0: h(z) − h(z0) ≤ sT(z − z0), for all z in the feasible region
Dual Decomposition
maxλi minxi ∑i (gi(xi) + λiTxi)  s.t. xi ∈ C
Initialize λi0 = 0
Dual Decomposition
maxλi minxi ∑i (gi(xi) + λiTxi)  s.t. xi ∈ C
Compute supergradients: si = argminxi (gi(xi) + (λit)Txi)
Dual Decomposition
maxλi minxi ∑i (gi(xi) + λiTxi)  s.t. xi ∈ C
Project supergradients: pi = si − ∑j sj / m, where m = number of subproblems (slaves)
Dual Decomposition
maxλi minxi ∑i (gi(xi) + λiTxi)  s.t. xi ∈ C
Update dual variables: λit+1 = λit + ηt pi, where ηt is the learning rate, e.g. ηt = 1/(t+1)
Dual Decomposition (summary)
Initialize λi0 = 0
REPEAT:
  Compute projected supergradients: si = argminxi (gi(xi) + (λit)Txi);  pi = si − ∑j sj / m
  Update dual variables: λit+1 = λit + ηt pi
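A minimal runnable sketch of the whole loop on a toy problem, one variable with three labels shared by two slaves (made-up costs):

```python
import numpy as np

# Toy problem: one variable with 3 labels shared by two slaves (made-up costs).
g = [np.array([1.0, 3.0, 2.0]), np.array([4.0, 0.0, 2.0])]
m = len(g)                                  # number of slaves
lam = [np.zeros(3) for _ in range(m)]       # dual variables, lam_i^0 = 0

for t in range(200):
    # Each slave minimizes g_i(x) + lam_i^T x over indicator vectors x.
    s = [np.eye(3)[int(np.argmin(g[i] + lam[i]))] for i in range(m)]
    # Project the supergradients so that sum_i lam_i stays zero.
    mean = sum(s) / m
    eta = 1.0 / (t + 1)                     # diminishing learning rate
    for i in range(m):
        lam[i] = lam[i] + eta * (s[i] - mean)

dual = sum((g[i] + lam[i]).min() for i in range(m))
primal = (g[0] + g[1]).min()                # exact MAP on this tiny problem
assert abs(dual - primal) < 1e-6            # slaves agree; the dual is tight here
```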
Dual Decomposition Komodakis et al., 2007
[Figure: the 3×3 grid and its six-tree decomposition (θ1, …, θ6)]
s1a = [1 0]T, s4a = [1 0]T: the slaves agree on the label for Va
Dual Decomposition Komodakis et al., 2007
[Figure: the grid decomposition]
s1a = [1 0]T, s4a = [1 0]T  ⇒  p1a = [0 0]T, p4a = [0 0]T
Dual Decomposition Komodakis et al., 2007
[Figure: the grid decomposition]
s1a = [1 0]T, s4a = [0 1]T: the slaves disagree on the label for Va
Dual Decomposition Komodakis et al., 2007
[Figure: the grid decomposition]
s1a = [1 0]T, s4a = [0 1]T  ⇒  p1a = [0.5 −0.5]T, p4a = [−0.5 0.5]T
Unary cost increases
Dual Decomposition Komodakis et al., 2007
[Figure: the grid decomposition]
s1a = [1 0]T, s4a = [0 1]T  ⇒  p1a = [0.5 −0.5]T, p4a = [−0.5 0.5]T
Unary cost decreases
Dual Decomposition Komodakis et al., 2007
[Figure: the grid decomposition]
s1a = [1 0]T, s4a = [0 1]T  ⇒  p1a = [0.5 −0.5]T, p4a = [−0.5 0.5]T
Push the slaves towards agreement
Comparison
                TRW              DD
Speed           Fast             Slow
Optimum         Local maximum    Global maximum
Requires        Min-marginals    MAP estimates
Extensions      -                Other forms of slaves; tighter relaxations; sparse high-order potentials