Page 1: Graphical Models for Computer Vision

Graphical Models for Computer Vision

Pedro F Felzenszwalb
Brown University

Joint work with Dan Huttenlocher, Joshua Schwartz, Ross Girshick, David McAllester, Deva Ramanan, Allie Shapiro, John Oberlin

Page 2: Graphical Models for Computer Vision

Vision Problems

Low-level vision | High-level vision

Restoration: corrupted input | restoration

Figure 8: Restoration results with an input that has missing values.

8 Summary

We have presented three algorithmic techniques for speeding up the belief propagation approach for solving low-level vision problems formulated in terms of Markov random fields. The main focus of the paper is on the max-product formulation of belief propagation, and the corresponding energy minimization problem in terms of costs that are proportional to negative log probabilities. We also show how similar techniques apply to the sum-product formulation of belief propagation. The use of our techniques yields results of comparable accuracy to other algorithms but hundreds of times faster. In the case of stereo we quantified this accuracy using the Middlebury benchmark. The method is quite straightforward to implement and in many cases should remove the need to choose between fast local methods that have relatively low accuracy, and slow global methods that have high accuracy.

The first of the three techniques reduces the time necessary to compute a single message update from O(k²) to O(k), where k is the number of possible labels for each pixel. For the max-product formulation this technique is applicable to problems where the discontinuity cost for neighboring labels is a truncated linear or truncated quadratic function of the difference between the labels. The method is not an approximation; it uses an efficient algorithm to produce exactly the same results as the brute-force quadratic-time method. For sum-product a similar technique yields an O(k log k)


Depth estimation

Figure 5: Stereo results for the Tsukuba image pair.

[Plot omitted: energy (×10⁴) vs. message update iterations, multiscale vs. standard.]

Figure 6: Energy of stereo solution as a function of the number of message update iterations.

with all but one of the techniques. In each case the running time of the algorithm is controlled by varying the number of message update iterations. We see that each speedup technique provides a significant benefit. Note how the min-convolution method provides an important speedup even when the number of labels is small (16 disparities for the Tsukuba images).

Table 1 shows evaluation results of our stereo algorithm on the Middlebury stereo benchmark [9]. These results were obtained using the parameters described above. Overall our method per-


Segmentation

Recognition

Page 3: Graphical Models for Computer Vision

Bayesian Framework

• Bayesian approach

- We observe an image Y

- Hidden variables X --- depth map, object labels, etc.

- Vision involves statistical inference --- P(X|Y)

• Challenges

- Building good models for X and Y

- Thousands of random variables and large state spaces

Page 4: Graphical Models for Computer Vision

Image Restoration
Object Detection

Multi-scale Models

Page 5: Graphical Models for Computer Vision

Image Restoration

• Random variables

- X : clean picture

- Y : observed image

• P(X) : Markov random field

- Nearby pixels tend to be similar

- Markov blanket = 4 neighbors

• P(Y|X) : iid noise at each pixel

- Yi = Xi + ei

[Diagram: hidden grid X coupled to observed grid Y]

Page 6: Graphical Models for Computer Vision

• Minimize -log P(X|Y)

- D enforces consistency with the data (-log P(Y|X))

- V enforces smoothness (-log P(X))

• Computational burden

- huge number of variables

- large state spaces

- high treewidth

MAP estimation

E(X) = Σ_i D(X_i, Y_i) + Σ_{ij} V(X_i, X_j)
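The energy above can be made concrete on a toy 1-D signal. This sketch assumes a quadratic data term D and a Potts smoothness term V with weight lam — illustrative choices, not the talk's exact costs:

```python
import numpy as np

def energy(X, Y, lam=2.0):
    """E(X) = sum_i D(X_i, Y_i) + sum_ij V(X_i, X_j) on a 1-D chain.

    D is a quadratic data term and V a Potts smoothness term; both are
    illustrative stand-ins for the generic costs in the slide.
    """
    data = np.sum((X - Y) ** 2)             # -log P(Y|X) up to a constant
    smooth = lam * np.sum(X[:-1] != X[1:])  # -log P(X) up to a constant
    return data + smooth

Y = np.array([0, 0, 5, 0, 0])        # observed signal with one noisy spike
X_noisy = Y.copy()                   # keep the spike: zero data cost, pays smoothness at the jumps
X_flat = np.zeros(5, dtype=int)      # smooth it away: pays data cost at the spike
print(energy(X_noisy, Y), energy(X_flat, Y))
```

MAP estimation picks the X that balances the two terms; here keeping the spike costs 4 while flattening it costs 25, so the spike survives under this lam.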

Page 7: Graphical Models for Computer Vision

Discontinuity Costs

[Plots omitted: the three cost functions W(x) for x in [-5, 5].]

Quadratic: X is smooth
Potts: X is piecewise constant
Truncated quadratic: X is piecewise smooth

MAP with different choices for V(a,b) = W(a-b), applied to image Y
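The three discontinuity costs can be written as functions of the label difference x = a - b; the scale s and truncation d below are illustrative assumptions, not values from the talk:

```python
import numpy as np

def quadratic(x, s=1.0):
    return s * x ** 2                    # favors globally smooth X

def potts(x, d=1.0):
    return np.where(x == 0, 0.0, d)     # favors piecewise-constant X

def truncated_quadratic(x, s=1.0, d=4.0):
    return np.minimum(s * x ** 2, d)    # favors piecewise-smooth X

x = np.arange(-5, 6)
print(truncated_quadratic(x))  # grows like the quadratic, then flattens at d
```

The truncation is what allows sharp discontinuities: once the label jump is large, the cost stops growing, so edges are not over-smoothed.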

Page 8: Graphical Models for Computer Vision

Computation

• MCMC (simulated annealing)

- very general

- slow...

• Graph-cuts

- huge impact

- works extremely well on restricted models

• Loopy belief propagation

- very general

- can be very fast

Page 9: Graphical Models for Computer Vision

Runtime of Loopy BP

• Runtime depends on

- Time for computing a message

- Number of message updates for convergence

• Can exploit special problem structure to address both

Page 10: Graphical Models for Computer Vision

Message Computation

• M(b) = min_a (V(a,b) + M1(a) + M2(a) + M3(a) + D(a))

• M(b) = min_a (V(a,b) + H(a))

- k possible values for each pixel
- O(k²) time by "brute force"
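The brute-force update is a direct double loop over labels. A minimal sketch, where H folds together the data term and incoming messages and the linear cost W is an illustrative choice:

```python
import numpy as np

def message_brute_force(H, W):
    """Compute M(b) = min_a (W(b - a) + H(a)) in O(k^2) time.

    H[a] combines the data term and incoming messages at label a;
    W(x) is the discontinuity cost on the label difference.
    """
    k = len(H)
    M = np.empty(k)
    for b in range(k):
        M[b] = min(W(b - a) + H[a] for a in range(k))
    return M

H = np.array([3.0, 0.0, 4.0, 1.0])
M = message_brute_force(H, lambda x: abs(x))   # linear cost W(x) = |x|
print(M)
```

With hundreds of labels per pixel and millions of message updates, this quadratic inner loop is exactly the bottleneck the next slides remove.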

Page 11: Graphical Models for Computer Vision

Fast Message Computation

• M(b) = min_a (V(a,b) + H(a))

- States are integers and V(a,b) = W(b-a)
- M(b) = min_a (W(b-a) + H(a))

• Convolution of H and W in the (min,+) semi-ring

- No known general fast algorithm like the FFT
- Best general algorithm: O(k²/log k)
- Fast methods for restricted W (we can pick W)

Page 12: Graphical Models for Computer Vision

Fast Min-Convolution

• M(b) = argmin_a (W(b-a) + H(a))

• Assume W(x) is convex

- If b' ≥ b then M(b') ≥ M(b) --- "no crossings"
- O(k log k) divide-and-conquer method
- A little more work to get an O(k) method

[Plot omitted: copies of convex W anchored at each a, with their lower envelope A(b).]
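For the special case of a linear convex cost W(x) = c|x|, the O(k) computation reduces to two sweeps over the labels, a well-known pattern; the constant c and the input are illustrative:

```python
import numpy as np

def min_conv_linear(H, c=1.0):
    """M(b) = min_a (c * |b - a| + H(a)) in O(k) via two passes.

    The forward pass covers a <= b, the backward pass a >= b; each
    step either keeps the current value or extends the best neighbor
    by the per-step cost c.
    """
    M = H.astype(float).copy()
    for b in range(1, len(M)):            # forward: propagate from the left
        M[b] = min(M[b], M[b - 1] + c)
    for b in range(len(M) - 2, -1, -1):   # backward: propagate from the right
        M[b] = min(M[b], M[b + 1] + c)
    return M

H = np.array([3.0, 0.0, 4.0, 1.0])
print(min_conv_linear(H))   # each entry equals min_a (|b - a| + H[a])
```

This is exact, not an approximation: it produces the same values as the O(k²) double loop.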

Page 13: Graphical Models for Computer Vision

Fast Min-Convolution

• If W(x) = min(E(x), F(x))

- M_W(b) = min(M_E(b), M_F(b))

• For truncated quadratic W

- E is quadratic
- F is constant
- Both convex: two O(k) computations plus O(k) to combine

[Plot omitted: truncated quadratic W as the lower envelope of a quadratic and a constant.]
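Once the convex piece is solved, handling the truncation costs only an elementwise min more, because the constant piece F(x) = d contributes d + min_a H(a) at every b. A sketch using a truncated linear cost (the quadratic piece would need the lower-envelope algorithm, so linear keeps it short; constants are illustrative):

```python
import numpy as np

def min_conv_truncated(H, c=1.0, d=2.0):
    """M(b) = min_a (min(c*|b-a|, d) + H(a)): truncated linear cost.

    The convex piece E(x) = c|x| is min-convolved in O(k) with two
    sweeps; the constant piece F(x) = d contributes d + min(H) at
    every b. Combining the two envelopes is an elementwise min.
    """
    M = H.astype(float).copy()
    for b in range(1, len(M)):            # forward sweep for E(x) = c|x|
        M[b] = min(M[b], M[b - 1] + c)
    for b in range(len(M) - 2, -1, -1):   # backward sweep
        M[b] = min(M[b], M[b + 1] + c)
    return np.minimum(M, d + H.min())     # envelope with F(x) = d

H = np.array([0.0, 9.0, 9.0, 9.0, 9.0])
print(min_conv_truncated(H))   # cost grows away from the minimum, then caps at d
```

The cap is visible in the output: far labels pay the truncation value d plus min(H) rather than the full linear distance.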

Page 14: Graphical Models for Computer Vision

Multi-Grid

• Number of updates for convergence is large

- Information needs to propagate across the whole image

• Define a hierarchy of problems

- Use messages from one level to initialize the one below

- Good initialization leads to fast convergence

level 0 level 1

Page 15: Graphical Models for Computer Vision

Hierarchical Algorithm

• Number of levels = log of image size

• LBP converges after ~10 iterations at each level (500x500 image)

[Plot omitted: energy of the Tsukuba solution vs. number of iterations, multiscale vs. standard.]

Page 16: Graphical Models for Computer Vision

Image Restoration

• Truncated quadratic discontinuity costs

• Quadratic data terms, no data for masked pixels

• 256 states per pixel, propagating over large areas

Page 17: Graphical Models for Computer Vision

Stereo Depth Estimation

• State-of-the-art accuracy at frame-rate

• Simple, elegant model

left camera right camera disparities

Page 18: Graphical Models for Computer Vision

Image Restoration
Object Detection

Multi-scale Models

Page 19: Graphical Models for Computer Vision

Object Detection

[PASCAL VOC dataset]

Page 20: Graphical Models for Computer Vision

Part-Based Models

• Each part has an appearance model

• Flexible geometric arrangement

- Easier to model appearance of part than whole object

- Factorization leads to better generalization

[Fischler, Elschlager 73]

Page 21: Graphical Models for Computer Vision

Graphical Model

• Object with n parts

• Random variables

- X = (X1 ... Xn) : object configuration

- Xi : location/pose of one part

- Y : image

• P(X) : Markov random field

- captures which geometric configurations are likely

• P(Y|X) : part appearance models + background model

[Diagram: MRF over part locations X1, X2, X3, X4 and image Y]

Page 22: Graphical Models for Computer Vision

Data Model

• We would like P(Y|X) to factor

• Assume

- 1) pixels (features) in background are iid
- 2) parts don't overlap

[Diagram: part supports A1, A2, A3 and background BG = Y \ {A1, ..., A3}]

Page 23: Graphical Models for Computer Vision

Inference

• Fully factored data model

• Further assume P(X)

- tree-structured

- pairwise relative positions

• Fast MAP estimation using min-convolutions

- O(nk) time, where n = number of parts and k = size of the state space

- As fast as detecting each part separately

[Diagram: tree model over X1, ..., X4, each part connected to the image Y]
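The O(nk) claim can be sketched on a chain of parts (a special case of a tree), using a linear deformation cost so each message is a two-sweep min-convolution; the appearance costs and part count below are illustrative:

```python
import numpy as np

def chain_map_cost(D, c=1.0):
    """MAP cost of a chain model:
    min_X sum_i D[i][X_i] + sum_i c * |X_i - X_{i+1}|.

    D is an (n, k) array of per-part appearance costs. Each of the
    n-1 message passes is an O(k) min-convolution, so the total is
    O(nk) rather than the O(nk^2) of brute-force message passing.
    """
    n, k = D.shape
    m = D[0].astype(float).copy()
    for i in range(1, n):
        for b in range(1, k):              # forward sweep of the min-convolution
            m[b] = min(m[b], m[b - 1] + c)
        for b in range(k - 2, -1, -1):     # backward sweep
            m[b] = min(m[b], m[b + 1] + c)
        m = m + D[i]                       # add the next part's appearance cost
    return m.min()

D = np.array([[0.0, 5.0, 5.0],   # part 0 prefers location 0
              [5.0, 5.0, 0.0],   # part 1 prefers location 2
              [0.0, 5.0, 5.0]])  # part 2 prefers location 0
print(chain_map_cost(D))
```

Here the optimum places the parts at (0, 2, 0) and pays only the deformation |0-2| + |2-0| = 4, illustrating why joint inference is as cheap as detecting each part separately.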

Page 24: Graphical Models for Computer Vision

Human Pose Estimation

Page 25: Graphical Models for Computer Vision

Object Category Detection

• Mixture of part-based models for each category

• Photometric invariant features (HOG)

• Discriminative learning (Latent SVM)

• Leading approach on PASCAL VOC benchmark

Page 26: Graphical Models for Computer Vision

Car

high scoring true positives | high scoring false positives

[PASCAL VOC dataset]

Page 27: Graphical Models for Computer Vision

Horse

high scoring true positives | high scoring false positives

[PASCAL VOC dataset]

Page 28: Graphical Models for Computer Vision

Image Restoration
Object Detection

Multi-scale Models

Page 29: Graphical Models for Computer Vision

Curve Models

• Model for curve

- Sequence of control points

- Markov model P(X) captures local shape

- Drift: hard to control accumulation of local variation

Locally these look very similar

Locally these look very different

Page 30: Graphical Models for Computer Vision

Multi-Scale Sequence Model

• Capture local properties at multiple resolutions

- Subsample A to get B

- local property of B = non-local property of A

full model: tree-width = 2

[Diagram: fine sequence A, subsampled sequence B, coarsest sequence C]

Page 31: Graphical Models for Computer Vision

Models for Closed Curves

1st-order Markov model | Multi-scale model

Both graphs have tree-width 2

Multi-scale model captures global shape properties

Page 32: Graphical Models for Computer Vision

Samples from P(X)

Multi-scale model captures global shape properties

Page 33: Graphical Models for Computer Vision

Shape Recognition

Swedish leaf dataset: 15 species, 75 examples per species (25 training, 50 test)

Nearest neighbor classification accuracy (%):

Multi-scale model    96.28
Inner distance       94.13
Shape context        88.12

Page 34: Graphical Models for Computer Vision

Shape Detection

Model

Model

Page 35: Graphical Models for Computer Vision

Boundary Detection

• Lots of regularities

- continuity, smoothness, closure, parallel lines, symmetry

• Can we build a “low/mid-level” model for P(X)?

[BSDS]

Page 36: Graphical Models for Computer Vision

Local Patterns

• Look at each 3x3 patch

- 512 possible patterns

• Energy model

- Parameterized by 512 costs
- Symmetries reduce to ~100

• Capture continuity, frequency, junctions, etc.
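The "~100" symmetry count can be checked by brute force: canonicalizing each of the 512 binary 3x3 patterns under the 8 rotations/reflections of the square leaves 102 equivalence classes:

```python
from itertools import product

def symmetries(p):
    """All 8 rotations/reflections of a 3x3 pattern given as a 9-tuple."""
    grid = [list(p[0:3]), list(p[3:6]), list(p[6:9])]
    out = []
    for _ in range(4):
        grid = [list(row) for row in zip(*grid[::-1])]     # rotate 90 degrees
        out.append(tuple(v for row in grid for v in row))
        flipped = [row[::-1] for row in grid]              # mirrored copy
        out.append(tuple(v for row in flipped for v in row))
    return out

# Canonical form = lexicographically smallest symmetric variant.
classes = {min(symmetries(p)) for p in product((0, 1), repeat=9)}
print(len(classes))   # 102 distinct patterns up to symmetry
```

Tying symmetric patterns to a shared cost both shrinks the parameter count and builds rotation/reflection invariance into the energy model.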

Page 37: Graphical Models for Computer Vision

Coarse Local Patterns

• Coarsenings

- X^1 ... X^K
- X^(i+1) is a function of X^i

• Look at 3x3 blocks at all resolutions

- Vi ≠ Vj

Page 38: Graphical Models for Computer Vision

Coarse Patterns

[Figure: coarse patterns arranged by frequency (high-to-low →) and resolution]

Page 39: Graphical Models for Computer Vision

MCMC Inference

• Repeatedly update pixels

• P(X) is not Markov

- 3x3 block in X^K might depend on whole picture

• Efficient MCMC via multi-scale representation

- Energy difference is local over X^1 ... X^K

Page 40: Graphical Models for Computer Vision

Restoring noisy images

[Images: observed Y (iid noise, 20% of pixels flipped), posterior marginals P(Xi=1|Y), restored X]
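A toy version of this experiment, assuming an Ising prior and a 20% flip noise model, estimates P(Xi=1|Y) by averaging Gibbs samples; the coupling strength and the 3x3 image are illustrative:

```python
import numpy as np

def gibbs_denoise(Y, beta=1.5, flip=0.2, iters=200, seed=0):
    """Estimate P(X_i = 1 | Y) for a binary image with an Ising prior
    on X and iid pixel flips in Y, via Gibbs sampling.

    beta (smoothness strength) is an assumed value for illustration,
    not a parameter from the talk.
    """
    rng = np.random.default_rng(seed)
    h, w = Y.shape
    X = Y.copy()
    counts = np.zeros((h, w))
    llr = np.log((1 - flip) / flip)          # data log-odds when Y_ij = 1
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                nb = 0
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        nb += 2 * X[i + di, j + dj] - 1   # neighbor spins in {-1,+1}
                logit = 2 * beta * nb + llr * (2 * Y[i, j] - 1)
                p1 = 1.0 / (1.0 + np.exp(-logit))        # P(X_ij = 1 | neighbors, Y_ij)
                X[i, j] = int(rng.random() < p1)
                counts[i, j] += X[i, j]
    return counts / iters

Y = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])   # center pixel likely flipped
P = gibbs_denoise(Y)
print(P[1, 1])   # high: the smoothness prior overrides the noisy observation
```

Averaging the sampled states approximates the posterior marginals shown on the slide; a burn-in period, omitted here for brevity, would normally be discarded.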

Page 41: Graphical Models for Computer Vision

Restoring noisy images

[Images: observed Y (iid noise, 20% of pixels flipped), posterior marginals P(Xi=1|Y), restored X]

Page 42: Graphical Models for Computer Vision
Page 43: Graphical Models for Computer Vision
Page 44: Graphical Models for Computer Vision

Summary

• Graphical models permeate computer vision

- Image restoration
- Depth estimation
- Segmentation / Edge detection
- Object recognition

• A lot of work to do in object recognition/detection

- Better data models
- Structure variation

• Need better priors for low/mid-level vision

Page 45: Graphical Models for Computer Vision

Thank you

