ICCV2009: MAP Inference in Discrete Models: Part 2

Page 1: ICCV2009: MAP Inference in Discrete Models: Part 2

Course Program

9.30-10.00 Introduction (Andrew Blake)

10.00-11.00 Discrete Models in Computer Vision (Carsten Rother)

15 min coffee break

11.15-12.30 Message Passing: DP, TRW, LP relaxation (Pawan Kumar)

12.30-13.00 Quadratic pseudo-boolean optimization (Pushmeet Kohli)

1 hour lunch break

14.00-15.00 Transformation and move-making methods (Pushmeet Kohli)

15.00-15.30 Speed and Efficiency (Pushmeet Kohli)

15 min coffee break

15.45-16.15 Comparison of Methods (Carsten Rother)

16.15-17.30 Recent Advances: Dual-decomposition, higher-order, etc. (Carsten Rother + Pawan Kumar)

All material will be available online (after the conference): http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/

Page 2: ICCV2009: MAP Inference in Discrete Models: Part 2

Discrete Models in Computer Vision

Carsten Rother

Microsoft Research Cambridge

Page 3: ICCV2009: MAP Inference in Discrete Models: Part 2

Overview

• Introduce factor graph notation

• Categorization of models in Computer Vision:

– 4-connected MRFs

– Highly-connected MRFs

– Higher-order MRFs

Page 4: ICCV2009: MAP Inference in Discrete Models: Part 2

Markov Random Field Models for Computer Vision

Model: discrete or continuous variables? Discrete or continuous space? Dependencies between variables? …

Inference: Graph Cut (GC), Belief Propagation (BP), Tree-Reweighted Message Passing (TRW), Iterated Conditional Modes (ICM), Cutting-plane, Dual-decomposition

Learning: Exhaustive search (grid search), Pseudo-Likelihood approximation, Training in Pieces, Max-margin

Applications: 2D/3D image segmentation, object recognition, 3D reconstruction, stereo matching, image denoising, texture synthesis, pose estimation, panoramic stitching, …

Page 5: ICCV2009: MAP Inference in Discrete Models: Part 2

Recap: Image Segmentation

P(x|z) ~ P(z|x) P(x)      (posterior ~ likelihood × prior)

P(x|z) ~ exp{-E(x)}      (Gibbs distribution)

Energy E: {0,1}n → R

E(x) = ∑i θi (xi) + w ∑i,j Є N4 θij (xi,xj)      (unary terms + pairwise terms)

Maximum-a-posteriori (MAP), given input image z:

x* = argmax_x P(x|z) = argmin_x E(x)
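A minimal numpy sketch of this energy, assuming an Ising pairwise term θij = |xi - xj| on the 4-neighbourhood (array names are illustrative, not from the tutorial):

    import numpy as np

    def segmentation_energy(x, unary, w):
        """E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j|.
        x     : (H, W) binary labeling
        unary : (H, W, 2) array, unary[i, j, k] = theta_i(x_i = k)
        w     : pairwise weight"""
        H, W = x.shape
        e = unary[np.arange(H)[:, None], np.arange(W)[None, :], x].sum()
        # each 4-neighbour pair is counted once via right- and down-edges
        e += w * np.abs(x[:, 1:] - x[:, :-1]).sum()
        e += w * np.abs(x[1:, :] - x[:-1, :]).sum()
        return e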

Page 6: ICCV2009: MAP Inference in Discrete Models: Part 2

Min-Marginals (uncertainty of the MAP solution)

Definition: ψv;i = min E(x) subject to xv = i

Can be used in several ways:
• Insights on the model
• For optimization (TRW, comes later)

[Figure: image, MAP, and min-marginals of the foreground label (bright = very certain).]
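A brute-force sketch of this definition; the enumeration is exponential, so this is only feasible on tiny grids (real solvers use dynamic programming or graph cuts). It can be called with any energy, e.g. the segmentation energy from the previous sketch:

    import numpy as np
    from itertools import product

    def min_marginals(energy, shape, v):
        """psi_{v;i} = min over all x with x_v = i of E(x).
        energy: callable mapping an (H, W) labeling to E(x)
        v:      pixel index (row, col)"""
        H, W = shape
        best = {0: np.inf, 1: np.inf}
        for bits in product([0, 1], repeat=H * W):
            x = np.array(bits).reshape(H, W)
            best[int(x[v])] = min(best[int(x[v])], energy(x))
        return best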

Page 7: ICCV2009: MAP Inference in Discrete Models: Part 2

Introducing Factor Graphs

Write probability distributions as graphical models:

- Directed graphical models
- Undirected graphical models (… what Andrew Blake used)
- Factor graphs

References:
- Pattern Recognition and Machine Learning [Bishop '08, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)

Page 8: ICCV2009: MAP Inference in Discrete Models: Part 2

Factor Graphs

P(x) ~ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)      (Gibbs distribution, "4 factors")

P(x) ~ exp{-E(x)},  E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)

[Figure: factor graph over the unobserved/latent/hidden variables x1,…,x5; a factor node connects exactly the variables that appear in the same factor.]

Page 9: ICCV2009: MAP Inference in Discrete Models: Part 2

Definition "Order": the arity (number of variables) of the largest factor.

P(x) ~ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)

(the first factor has arity 3, the others arity 2)

[Figure: the factor graph over x1,…,x5; this factor graph has order 3.]
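As a sketch, a factor graph can be stored simply as the list of variable tuples its factors touch; the order is then the largest arity (toy variable names, for illustration only):

    # Factors of the example above, each as the tuple of variables it touches.
    factors = [("x1", "x2", "x3"), ("x2", "x4"), ("x3", "x4"), ("x3", "x5")]

    order = max(len(f) for f in factors)  # arity of the largest factor
    print(order)                          # -> 3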

Page 10: ICCV2009: MAP Inference in Discrete Models: Part 2

Examples - Order

4-connected, pairwise MRF ("pairwise energy", order 2):

E(x) = ∑i,j Є N4 θij (xi,xj)

Higher(8)-connected, pairwise MRF (order 2):

E(x) = ∑i,j Є N8 θij (xi,xj)

Higher-order MRF ("higher-order energy", order n):

E(x) = ∑i,j Є N4 θij (xi,xj) + θ(x1,…,xn)

Page 11: ICCV2009: MAP Inference in Discrete Models: Part 2

Example: Image segmentation

P(x|z) ~ exp{-E(x)}

E(x) = ∑i θi (xi,zi) + ∑i,j Є N4 θij (xi,xj)

[Figure: factor graph with observed variables zi and unobserved (latent) variables xi, xj.]

Page 12: ICCV2009: MAP Inference in Discrete Models: Part 2

Simplest inference technique: ICM (iterated conditional modes)

Goal: x* = argmin_x E(x)

E(x) = θ12 (x1,x2) + θ13 (x1,x3) + θ14 (x1,x4) + θ15 (x1,x5) + …

[Figure: star-shaped graph; x1 is updated while its neighbours x2,…,x5 stay fixed.]

Page 13: ICCV2009: MAP Inference in Discrete Models: Part 2

Simplest inference technique: ICM (iterated conditional modes)

Goal: x* = argmin_x E(x)

E(x) = θ12 (x1,x2) + θ13 (x1,x3) + θ14 (x1,x4) + θ15 (x1,x5) + …

(shaded nodes mean observed/fixed variables)

ICM can get stuck in local minima!

[Figure: denoising result of ICM vs. the global minimum.]

Simulated annealing: accept a move even if the energy increases (with a certain probability).
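A minimal ICM sketch for the grid energy used earlier (binary labels, Ising pairwise term; simulated annealing would additionally accept uphill moves at random):

    import numpy as np

    def icm(x, unary, w, sweeps=10):
        """Iterated conditional modes: set each variable in turn to the label
        minimizing the energy given its 4-neighbours; stops in a local minimum."""
        H, W = x.shape
        for _ in range(sweeps):
            for i in range(H):
                for j in range(W):
                    costs = []
                    for k in (0, 1):
                        c = unary[i, j, k]
                        for ni, nj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                            if 0 <= ni < H and 0 <= nj < W:
                                c += w * abs(k - x[ni, nj])  # theta_ij = |xi - xj|
                        costs.append(c)
                    x[i, j] = int(np.argmin(costs))
        return x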

Page 14: ICCV2009: MAP Inference in Discrete Models: Part 2

Overview

• Introduce factor graph notation

• Categorization of models in Computer Vision:

– 4-connected MRFs

– Highly-connected MRFs

– Higher-order MRFs

Page 15: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo matching

Labels: d (depth/shift), di Є {0,…,D-1}

Energy: E(d): {0,…,D-1}n → R

• Images rectified
• Ignore occlusion for now

[Figure: left image (a), right image (b), ground-truth depth; example disparities d=0 and d=4.]

Page 16: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo matching - Energy

Energy: E(d): {0,…,D-1}n → R

E(d) = ∑i θi (di) + ∑i,j Є N4 θij (di,dj)

Unary: θi (di) = |li - ri-di| (left pixel li compared with the right pixel shifted by di)
"SAD: sum of absolute differences" (many others possible: NCC, …)

Pairwise: θij (di,dj) = g(|di-dj|)

[Figure: left and right image; pixel i in the left image is compared with pixel i-di in the right image (e.g. di = 2).]
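A sketch of the SAD unary term as a cost volume over all disparities, assuming grayscale numpy images (rectified, occlusion ignored as above):

    import numpy as np

    def sad_data_cost(left, right, D):
        """theta_i(d) = |l_i - r_{i-d}| for every pixel i and d in {0,...,D-1}.
        Pixels whose match would fall outside the image get infinite cost."""
        H, W = left.shape
        cost = np.full((H, W, D), np.inf)
        for d in range(D):
            # pixel (i, j) in the left image matches (i, j - d) in the right
            cost[:, d:, d] = np.abs(left[:, d:] - right[:, :W - d])
        return cost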

Page 17: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo matching - prior

θij (di,dj) = g(|di-dj|)

[Figure: cost as a function of |di-dj|, no truncation (global minimum computable).]

[Olga Veksler PhD thesis, Daniel Cremers et al.]

Page 18: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo matching - prior

θij (di,dj) = g(|di-dj|)

Discontinuity-preserving potentials [Blake & Zisserman '83, '87]

[Figure: cost as a function of |di-dj|, without truncation (global minimum computable) and with truncation (NP-hard optimization).]

[Olga Veksler PhD thesis, Daniel Cremers et al.]

Page 19: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo matching, see http://vision.middlebury.edu/stereo/

[Figure comparison: no MRF, i.e. pixel-independent (WTA); no horizontal links, efficient since the chains are independent; pairwise MRF [Boykov et al. '01]; ground truth.]

Page 20: ICCV2009: MAP Inference in Discrete Models: Part 2

Texture synthesis [Kwatra et al. Siggraph '03]

E: {0,1}n → R

E(x) = ∑i,j Є N4 |xi-xj| [ |ai-bi| + |aj-bj| ]

where xi selects which of the two overlapping patches, a (xi = 0) or b (xi = 1), pixel i is copied from.

[Figure: input texture and synthesized output; good case: the seam between a and b runs where the patches agree; bad case: visible seam where they differ.]
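A sketch of this seam energy for two overlapping patches a and b (numpy arrays over the overlap region; x selects the source patch per pixel):

    import numpy as np

    def seam_energy(x, a, b):
        """E(x) = sum over 4-neighbours |x_i - x_j| * (|a_i - b_i| + |a_j - b_j|):
        switching patches is cheap exactly where a and b already agree."""
        d = np.abs(a - b)
        e  = (np.abs(x[:, 1:] - x[:, :-1]) * (d[:, 1:] + d[:, :-1])).sum()
        e += (np.abs(x[1:, :] - x[:-1, :]) * (d[1:, :] + d[:-1, :])).sum()
        return e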

Page 21: ICCV2009: MAP Inference in Discrete Models: Part 2

Video Synthesis

[Figure: input video and synthesized output; the input video shown duplicated.]

Page 22: ICCV2009: MAP Inference in Discrete Models: Part 2

Panoramic stitching

Page 23: ICCV2009: MAP Inference in Discrete Models: Part 2

Panoramic stitching

Page 24: ICCV2009: MAP Inference in Discrete Models: Part 2

AutoCollage

http://research.microsoft.com/en-us/um/cambridge/projects/autocollage/ [Rother et al. Siggraph '05]

Page 25: ICCV2009: MAP Inference in Discrete Models: Part 2

Recap: 4-connected MRFs

• Many useful vision systems are based on 4-connected pairwise MRFs.

• Possible reason (see the inference part): many fast and good (globally optimal) inference methods exist.

Page 26: ICCV2009: MAP Inference in Discrete Models: Part 2

Overview

• Introduce factor graph notation

• Categorization of models in Computer Vision:

– 4-connected MRFs

– Highly-connected MRFs

– Higher-order MRFs

Page 27: ICCV2009: MAP Inference in Discrete Models: Part 2

Why larger connectivity?

We have seen…

• "Knock-on" effect (each pixel can influence every other pixel)

• Many good systems

What is missing:

1. Modelling real-world texture (images)

2. Reduce discretization artefacts

3. Encode complex prior knowledge

4. Use non-local parameters

Page 28: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 1: Texture modelling

[Figure: training images; test image; test image with 60% noise; results of a 4-connected MRF, a 4-connected MRF (neighbours), and a 9-connected MRF (7 attractive; 2 repulsive).]

Page 29: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 1: Texture modelling

[Figure: input and output. Zalesny et al. '01]

Page 30: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 2: Discretization artefacts

Larger connectivity can model the true Euclidean length (other metrics are also possible). [Boykov et al. '03, '05]

[Figure: example paths on a grid and their lengths measured under the 4-connected metric, the 8-connected metric, and the true Euclidean metric (values shown: 5.65, 8, 1, 6.28, 6.28, 5.08, 6.75); the 8-connected lengths are much closer to the Euclidean ones.]

Page 31: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason2: Discretization artefacts

Higher connectivity can model the true Euclidean length.

[Figure: segmentation results with a 4-connected Euclidean metric, an 8-connected Euclidean metric, and an 8-connected geodesic metric. Boykov et al. '03, '05]

Page 32: ICCV2009: MAP Inference in Discrete Models: Part 2

3D reconstruction

[Slide credits: Daniel Cremers]

Page 33: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 3: Encode complex prior knowledge: stereo with occlusion

Each pixel is connected to D pixels in the other image.

E(d): {1,…,D}2n → R

θlr (dl,dr) = pairwise cost coupling a left-view and a right-view disparity: consistent assignments cost 0, assignments that contradict each other cost ∞.

[Figure: left view and right view with disparity axes 1…D; example assignments d=10 (match), d=20 (cost 0), d=1 (cost ∞).]

Page 34: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo with occlusion

[Figure: ground truth; stereo with occlusion handling [Kolmogorov et al. '02]; stereo without occlusion handling [Boykov et al. '01].]

Page 35: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 4: Use non-local parameters: interactive segmentation (GrabCut)

[Boykov and Jolly '01]; GrabCut [Rother et al. '04]

Page 36: ICCV2009: MAP Inference in Discrete Models: Part 2

A meeting with the Queen

Page 37: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 4: Use non-local parameters: interactive segmentation (GrabCut)

An object is a compact set of colors. [Rother et al. Siggraph '04]

Model segmentation and color model jointly:

E(x,w): {0,1}n x {GMMs} → R

E(x,w) = ∑i θi (xi,w) + ∑i,j Є N4 θij (xi,xj)

[Figure: foreground/background color models w (GMMs) in red-green space.]
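A sketch of the resulting joint optimization: E(x,w) is minimized by alternating over the two kinds of unknowns. Here fit_gmms and graphcut_segment are hypothetical placeholder functions, not the paper's actual code:

    def grabcut_alternation(z, x, iters=5):
        """Alternate: fit the color model w (GMMs) to the current segmentation,
        then re-segment with w fixed. Both helpers are hypothetical."""
        for _ in range(iters):
            w = fit_gmms(z, x)           # fg/bg GMMs from current labeling x
            x = graphcut_segment(z, w)   # argmin_x E(x, w), e.g. via graph cut
        return x, w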

Page 38: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 4: Use non-local parameters: segmentation and recognition

Large set of example segmentations (exemplars T(1), T(2), T(3), …; up to 2,000,000 exemplars).

Goal: segment the test image.

E(x,w): {0,1}n x {Exemplar} → R

E(x,w) = ∑i |T(w)i-xi| + ∑i,j Є N4 θij (xi,xj)

The unary term is the "Hamming distance" between x and the chosen exemplar T(w).

[Lempitsky et al. ECCV '08]

Page 39: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 4: Use non-local parameters: segmentation and recognition

[Lempitsky et al. ECCV '08]

UIUC dataset; 98.8% accuracy

Page 40: ICCV2009: MAP Inference in Discrete Models: Part 2

Overview

• Introduce factor graph notation

• Categorization of models in Computer Vision:

– 4-connected MRFs

– Highly-connected MRFs

– Higher-order MRFs

Page 41: ICCV2009: MAP Inference in Discrete Models: Part 2

Why Higher-order Functions?

In general θ(x1,x2,x3) ≠ θ(x1,x2) + θ(x1,x3) + θ(x2,x3)

Reasons for higher-order MRFs:

1. Even better image (texture) models:
– Field of Experts [FoE, Roth et al. '05]
– Curvature [Woodford et al. '08]

2. Use global priors:
– Connectivity [Vicente et al. '08, Nowozin et al. '09]
– Encode better training statistics [Woodford et al. '09]

Page 42: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 1: Better texture modelling

[Figure: training images; test image; test image with 60% noise; result of a 9-connected pairwise MRF (higher-order structure not preserved) vs. a higher-order MRF. Rother et al. CVPR '09]

Page 43: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 2: Use a global prior. The foreground object must be connected:

E(x) = P(x) + h(x),  with h(x) = { ∞ if x is not 4-connected; 0 otherwise }

[Figure: user input; standard MRF: removes noise (+) but shrinks the boundary (-); with connectivity prior. Vicente et al. '08, Nowozin et al. '09]
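A sketch of evaluating the connectivity term h(x): a breadth-first search over the foreground pixels checks whether they form one 4-connected component:

    import numpy as np
    from collections import deque

    def is_4connected(x):
        """True iff all foreground pixels (x == 1) form a single 4-connected
        component; then h(x) = 0, otherwise h(x) = infinity."""
        fg = list(zip(*np.nonzero(x)))
        if not fg:
            return True
        seen, todo = {fg[0]}, deque([fg[0]])
        while todo:
            i, j = todo.popleft()
            for n in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                if (0 <= n[0] < x.shape[0] and 0 <= n[1] < x.shape[1]
                        and x[n] == 1 and n not in seen):
                    seen.add(n)
                    todo.append(n)
        return len(seen) == len(fg)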

Page 44: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 2: Use a global prior. What is the prior of a MAP-MRF solution?

[Woodford et al. ICCV '09] (see poster on Friday)

Training image: 60% black, 40% white.

The MAP solution is all black: prior(x) = 0.6^8 ≈ 0.0168 (for an 8-pixel example).

Other labelings are less likely, e.g. prior(x) = 0.6^5 · 0.4^3 ≈ 0.005.

The MRF is a bad prior, since the marginal statistics of the input are ignored!

Remedy: introduce a global term which controls the global statistics.

[Figure: ground truth; noisy input; pairwise MRF with increasing prior strength; global gradient prior.]

Page 45: ICCV2009: MAP Inference in Discrete Models: Part 2

Summary

• Introduced factor graph notation

• Categorization of models in Computer Vision:

– 4-connected MRFs

– Highly-connected MRFs

– Higher-order MRFs

… all useful models, but how do I optimize them?

Page 46: ICCV2009: MAP Inference in Discrete Models: Part 2

Course Program

9.30-10.00 Introduction (Andrew Blake)

10.00-11.00 Discrete Models in Computer Vision (Carsten Rother)

15 min coffee break

11.15-12.30 Message Passing: DP, TRW, LP relaxation (Pawan Kumar)

12.30-13.00 Quadratic pseudo-boolean optimization (Pushmeet Kohli)

1 hour lunch break

14.00-15.00 Transformation and move-making methods (Pushmeet Kohli)

15.00-15.30 Speed and Efficiency (Pushmeet Kohli)

15 min coffee break

15.45-16.15 Comparison of Methods (Carsten Rother)

16.15-17.30 Recent Advances: Dual-decomposition, higher-order, etc. (Carsten Rother + Pawan Kumar)

All material will be available online (after the conference): http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/

Page 47: ICCV2009: MAP Inference in Discrete Models: Part 2

END

Page 48: ICCV2009: MAP Inference in Discrete Models: Part 2

unused slides …

Page 49: ICCV2009: MAP Inference in Discrete Models: Part 2

Markov Property

• Markov property: each variable is connected to only a few others, i.e. many pixels are conditionally independent.

• This makes inference easier (or possible at all).

• But still… every pixel can influence every other pixel (knock-on effect).

[Figure: grid MRF with variable xi highlighted.]

Page 50: ICCV2009: MAP Inference in Discrete Models: Part 2

Recap: Factor Graphs

• Factor graphs are a very good representation, since they directly reflect the given energy.

• MRF (Markov property) means many pixels are conditionally independent.

• Still… all pixels influence each other (knock-on effect).

Page 51: ICCV2009: MAP Inference in Discrete Models: Part 2

Interactive Segmentation - Tutorial example

Goal: given data z = (R,G,B)n, infer the unknown (latent) variables x Є {0,1}n.

P(x|z) = P(z|x) P(x) / P(z) ~ P(z|x) P(x)

Posterior probability ~ likelihood (data-dependent) × prior (data-independent)

Maximum a posteriori (MAP): x* = argmax_x P(x|z)

Page 52: ICCV2009: MAP Inference in Discrete Models: Part 2

Likelihood P(x|z) ~ P(z|x) P(x)

[Figure: foreground and background color distributions plotted in red-green space.]

Page 53: ICCV2009: MAP Inference in Discrete Models: Part 2

Likelihood P(x|z) ~ P(z|x) P(x)

Maximum likelihood:

x* = argmax_x P(z|x) = argmax_x ∏i P(zi|xi)

[Figure: per-pixel log P(zi|xi=0) and log P(zi|xi=1), and the resulting labeling.]
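Because the likelihood factorizes over pixels, the maximizer is an independent per-pixel decision; a one-line sketch given per-pixel log-likelihood maps (argument names are illustrative):

    def ml_labeling(log_p_bg, log_p_fg):
        """argmax_x prod_i P(z_i|x_i), decided independently per pixel;
        inputs are (H, W) maps of log P(z_i|x_i=0) and log P(z_i|x_i=1)."""
        return (log_p_fg > log_p_bg).astype(int)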

Page 54: ICCV2009: MAP Inference in Discrete Models: Part 2

Prior P(x|z) ~ P(z|x) P(x)

P(x) = 1/f ∏i,j Є N θij (xi,xj)

f = ∑x ∏i,j Є N θij (xi,xj)      "partition function"

θij (xi,xj) = exp{-|xi-xj|}      "Ising prior"      (exp{-1} = 0.36; exp{0} = 1)
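A brute-force sketch of the Ising prior and its partition function f; the sum runs over all 2^n labelings, so only tiny grids are feasible:

    import numpy as np
    from itertools import product

    def ising_prior(x):
        """Unnormalized prior: product over 4-neighbour pairs of exp(-|xi - xj|)."""
        s  = np.abs(x[:, 1:] - x[:, :-1]).sum()
        s += np.abs(x[1:, :] - x[:-1, :]).sum()
        return np.exp(-float(s))

    def partition_function(H, W):
        """f = sum over all binary labelings of the unnormalized prior."""
        return sum(ising_prior(np.array(b).reshape(H, W))
                   for b in product([0, 1], repeat=H * W))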

Page 55: ICCV2009: MAP Inference in Discrete Models: Part 2

Posterior distribution

P(x|z) ~ P(z|x) P(x)

Posterior "Gibbs" distribution:

P(x|z) = 1/f(z,w) exp{-E(x,z,w)},  f(z,w) = ∑x exp{-E(x,z,w)}

Energy:

E(x,z,w) = ∑i θi (xi,zi) + w ∑i,j θij (xi,xj)      (unary terms: likelihood; pairwise terms: prior)

θi (xi,zi) = -log P(zi|xi=1) xi - log P(zi|xi=0) (1-xi)

θij (xi,xj) = |xi-xj|

Note: the likelihood can be an arbitrary function of the data.

Page 56: ICCV2009: MAP Inference in Discrete Models: Part 2

Energy minimization

P(x|z) = 1/f(z,w) exp{-E(x,z,w)},  f(z,w) = ∑x exp{-E(x,z,w)}

-log P(x|z) = -log(1/f(z,w)) + E(x,z,w) = log f(z,w) + E(x,z,w)

Since f(z,w) does not depend on x, the MAP solution equals the minimum-energy solution:

x* = argmax_x P(x|z) = argmin_x E(x,z,w)

[Figure: energy landscape with the ML and MAP (global minimum of E) solutions marked.]

Page 57: ICCV2009: MAP Inference in Discrete Models: Part 2

Weighting prior and likelihood

E(x,z,w) = ∑i θi (xi,zi) + w ∑i,j θij (xi,xj)

[Figure: segmentations for w = 0, w = 10, w = 40, w = 200.]

Page 58: ICCV2009: MAP Inference in Discrete Models: Part 2

Moving away from a pure prior …

E(x,z,w) = ∑i θi (xi,zi) + w ∑i,j θij (xi,xj,zi,zj)

θij (xi,xj,zi,zj) = |xi-xj| exp{-ß||zi-zj||²}

ß = ( 2 Mean(||zi-zj||²) )^-1

[Figure: pairwise cost as a function of ||zi-zj||²: Ising cost vs. contrast-sensitive cost.]

This takes us from a Markov random field to a conditional random field.
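A sketch of the contrast term for the horizontal edges of a color image (vertical edges are analogous; z is an (H, W, 3) float array):

    import numpy as np

    def contrast_weights(z):
        """exp(-beta * ||z_i - z_j||^2) per horizontal 4-neighbour edge,
        with beta = (2 * mean ||z_i - z_j||^2)^-1 as defined above."""
        d2 = ((z[:, 1:] - z[:, :-1]) ** 2).sum(axis=-1)
        beta = 1.0 / (2.0 * d2.mean())
        return np.exp(-beta * d2)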

Page 59: ICCV2009: MAP Inference in Discrete Models: Part 2

Tree vs. loopy graphs

Chains and trees [Felzenszwalb, Huttenlocher '01]:
• MAP is tractable
• Marginals, e.g. P(foot), are tractable

Loopy graphs:
• MAP is (in general) NP-hard (see inference part)
• Marginals P(xi) are also NP-hard

Markov blanket of xi: all variables which are in the same factor as xi.

[Figure: a chain, a tree (with root), and a loopy grid; variable xi highlighted.]
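A sketch of why MAP is tractable on a chain: min-sum dynamic programming (the Viterbi recursion) finds the exact minimizer in O(n L²):

    import numpy as np

    def chain_map(unary, pairwise):
        """Exact MAP on a chain of n variables with L labels.
        unary:    (n, L), unary[i, k] = theta_i(k)
        pairwise: (L, L), pairwise[k, l] = theta(k, l) between neighbours"""
        n, L = unary.shape
        cost, back = unary[0].copy(), np.zeros((n, L), dtype=int)
        for i in range(1, n):
            tot = cost[:, None] + pairwise     # previous label x current label
            back[i] = tot.argmin(axis=0)
            cost = tot.min(axis=0) + unary[i]
        x = [int(cost.argmin())]
        for i in range(n - 1, 0, -1):          # backtrack the argmins
            x.append(int(back[i, x[-1]]))
        return x[::-1]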

Page 60: ICCV2009: MAP Inference in Discrete Models: Part 2

Stereo matching - prior

θij (di,dj) = g(|di-dj|)

Potts model: g(|di-dj|) is 0 for di = dj and a constant otherwise; it favors piecewise-smooth disparities.

[Figure: cost as a function of |di-dj| for the Potts model; left image and resulting smooth disparity map. Olga Veksler PhD thesis]

Page 61: ICCV2009: MAP Inference in Discrete Models: Part 2

Modelling texture [Zalesny et al. '01]

[Figure: input texture; synthesis results with unary terms only, an 8-connected MRF, and a 13-connected MRF.]

Page 62: ICCV2009: MAP Inference in Discrete Models: Part 2

Reason 2: Discretization artefacts [Boykov et al. '03, '05]

Larger connectivity can model the true Euclidean length (any Riemannian metric, e.g. geodesic length, can also be modelled).

θij (xi,xj) = Δa / (2·dist(xi,xj)) · |xi-xj|,  with Δa = π/4 and dist Є {1, √2} for an 8-neighbourhood

[Figure: path lengths under the 4-connected metric, the 8-connected metric (6.28), and the true Euclidean metric (6.28); further values shown: 5.08, 5.65, 6.75, 8.]
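A sketch of these Cauchy-Crofton-style edge weights for an 8-neighbourhood (Δa = π/4; edge lengths 1 for axis edges, √2 for diagonals):

    import numpy as np

    def euclidean_edge_weight(delta_a, dist):
        """theta_ij per unit label difference: delta_a / (2 * dist(i, j))."""
        return delta_a / (2.0 * dist)

    w_axis = euclidean_edge_weight(np.pi / 4, 1.0)         # horizontal/vertical
    w_diag = euclidean_edge_weight(np.pi / 4, np.sqrt(2))  # diagonal edges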

Page 63: ICCV2009: MAP Inference in Discrete Models: Part 2

References: Higher-order Functions

In general θ(x1,x2,x3) ≠ θ(x1,x2) + θ(x1,x3) + θ(x2,x3)

Field of Experts model (2x2; 5x5): [Roth, Black CVPR '05], [Potetz CVPR '07]

Minimize curvature (3x1): [Woodford et al. CVPR '08]

Large neighbourhood (10x10 -> whole image): [Rother, Kolmogorov, Minka & Blake CVPR '06], [Vicente, Kolmogorov, Rother CVPR '08], [Komodakis, Paragios CVPR '09], [Rother, Kohli, Feng, Jia CVPR '09], [Woodford, Rother, Kolmogorov ICCV '09], [Vicente, Kolmogorov, Rother ICCV '09], [Ishikawa CVPR '09], [Ishikawa ICCV '09]

Page 64: ICCV2009: MAP Inference in Discrete Models: Part 2

Conditional Random Field (CRF)

Definition CRF: all factors may depend on the data z. This is no problem for inference (but it complicates parameter learning).

E(x) = ∑i θi (xi,zi) + ∑i,j Є N4 θij (xi,xj,zi,zj)

with θij (xi,xj,zi,zj) = |xi-xj| exp(-ß||zi-zj||²)

[Figure: factor graph in which the pairwise factors θij also connect to the data zi, zj; pairwise cost as a function of ||zi-zj||²: Ising cost vs. contrast-sensitive cost.]

