Divergence measures and message passing

Tom Minka

Microsoft Research

Cambridge, UK

with thanks to the Machine Learning and Perception Group

Message-Passing Algorithms

• Power EP (PEP) [Minka 04]
• Fractional belief propagation (FBP) [Wiegerinck,Heskes 02]
• Tree-reweighted message passing (TRW) [Wainwright,Jaakkola,Willsky 03]
• Expectation propagation (EP) [Minka 01]
• Loopy belief propagation (BP) [Frey,MacKay 97]
• Mean-field (MF) [Peterson,Anderson 87]

Outline

• Example of message passing

• Interpreting message passing

• Divergence measures

• Message passing from a divergence measure

• Big picture

Estimation Problem

[Figure: a graph over the variables x, y, z, with elements labeled a through f. Each variable is either 0 or 1, and its value is unknown.]

Queries: marginals, normalizing constant, argmax

Want to do these quickly
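As a point of reference for why speed matters, the three queries can always be answered by enumerating every joint state, at a cost that grows exponentially with the number of variables. The sketch below does this for three binary variables. The potential tables `F_XY`, `F_YZ`, `F_XZ` are hypothetical values chosen for illustration, not the ones from the talk.

```python
from itertools import product

# Hypothetical pairwise potentials on binary x, y, z (not the values from the talk).
F_XY = [[1.0, 0.8], [0.7, 1.2]]
F_YZ = [[1.5, 0.9], [0.8, 1.0]]
F_XZ = [[1.0, 0.6], [0.9, 1.1]]

def weight(x, y, z):
    """Unnormalized probability of one joint state."""
    return F_XY[x][y] * F_YZ[y][z] * F_XZ[x][z]

def brute_force_queries():
    """Answer all three queries by enumerating the 2**3 joint states."""
    states = list(product((0, 1), repeat=3))
    z_const = sum(weight(*s) for s in states)          # normalizing constant
    marginals = []
    for var in range(3):
        p1 = sum(weight(*s) for s in states if s[var] == 1) / z_const
        marginals.append([1.0 - p1, p1])               # [p(var=0), p(var=1)]
    best = max(states, key=lambda s: weight(*s))       # argmax
    return z_const, marginals, best

z_const, marginals, best = brute_force_queries()
print("normalizing constant:", z_const)
print("marginals:", marginals)
print("argmax:", best)
```

With n binary variables the enumeration visits 2**n states, which is the cost that message passing is meant to avoid.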

Belief Propagation

[Figure: belief propagation on the x, y, z example, shown step by step up to the final state]

Marginals: (Exact) vs. (BP) [values shown as an image]
Normalizing constant: 0.45 (Exact), 0.44 (BP)
Argmax: (0,0,0) (Exact), (0,0,0) (BP)

Message Passing = Distributed Optimization

• Messages represent a simpler distribution q(x) that approximates p(x)
– A distributed representation
• Message passing = optimizing q to fit p
– q stands in for p when answering queries
• Parameters:
– What type of distribution to construct (approximating family)
– What cost to minimize (divergence measure)

How to make a message-passing algorithm

1. Pick an approximating family

• fully-factorized, Gaussian, etc.

2. Pick a divergence measure

3. Construct an optimizer for that measure

• usually fixed-point iteration

4. Distribute the optimization across factors

Kullback-Leibler (KL) divergence

Let p,q be unnormalized distributions

Alpha-divergence (α is any real number)

Asymmetric, convex
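The formulas on this slide did not survive extraction. As a hedged reconstruction, the definitions below are the standard ones for unnormalized p and q, in the form used in Minka's technical report of the same title; the slide may have used slightly different notation.

```latex
\mathrm{KL}(p \,\|\, q) = \int p(x) \log\frac{p(x)}{q(x)}\,dx + \int \bigl(q(x) - p(x)\bigr)\,dx

D_\alpha(p \,\|\, q) = \frac{\int \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha} q(x)^{1-\alpha}\,dx}{\alpha(1-\alpha)}
```

The extra term in KL vanishes when p and q are both normalized, which recovers the familiar definition.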

Examples of alpha-divergence

Minimum alpha-divergence

q is Gaussian, minimizes Dα(p||q)

[Figures: the minimizing Gaussian q plotted against p for α = -∞, 0, 0.5, 1, ∞]

Properties of alpha-divergence

• α ≤ 0 seeks the mode with largest mass (not tallest)

– zero-forcing: p(x)=0 forces q(x)=0

– underestimates the support of p

• α ≥ 1 stretches to cover everything

– inclusive: p(x)>0 forces q(x)>0

– overestimates the support of p

[Frey,Patrascu,Jaakkola,Moran 00]
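These properties can be seen in a small numerical experiment. The sketch below fits a normalized Gaussian q to a bimodal p by brute-force grid search over the mean and standard deviation, for α = 0, 0.5, and 1. The target mixture, the grids, and the restriction to normalized q are all assumptions made for illustration, and the grid search stands in for a proper optimizer.

```python
import math

def gauss(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

DX = 0.1
XS = [-10 + DX * i for i in range(201)]
# Hypothetical bimodal target: the left mode holds 70% of the mass.
P = [0.7 * gauss(x, -2.0, 0.8) + 0.3 * gauss(x, 3.0, 0.8) for x in XS]

def alpha_divergence(p, q, alpha):
    """Grid approximation of D_alpha(p||q), intended for 0 <= alpha <= 1."""
    total = 0.0
    for pi, qi in zip(p, q):
        if alpha == 0:                       # limit: KL(q||p)
            if qi > 0:
                total += qi * (math.log(qi) - math.log(pi))
            total += pi - qi
        elif alpha == 1:                     # limit: KL(p||q)
            if qi <= 0:
                return math.inf
            total += pi * (math.log(pi) - math.log(qi)) + qi - pi
        else:
            total += (alpha * pi + (1 - alpha) * qi
                      - pi ** alpha * qi ** (1 - alpha)) / (alpha * (1 - alpha))
    return total * DX

def fit_gaussians(alphas):
    """Return {alpha: (divergence, mean, std)} for the best grid point."""
    best = {a: (math.inf, None, None) for a in alphas}
    for mi in range(33):                     # means -4.0 .. 4.0 in steps of 0.25
        mean = -4.0 + 0.25 * mi
        for si in range(4, 41):              # stds 0.4 .. 4.0 in steps of 0.1
            std = 0.1 * si
            q = [gauss(x, mean, std) for x in XS]
            for a in alphas:
                d = alpha_divergence(P, q, a)
                if d < best[a][0]:
                    best[a] = (d, mean, std)
    return best

fits = fit_gaussians([0.0, 0.5, 1.0])
for a, (d, mean, std) in fits.items():
    print(f"alpha={a}: mean={mean:.2f}, std={std:.2f}, divergence={d:.4f}")
```

In this setup the α = 0 fit should lock onto the heavier left mode with a narrow q, while the α = 1 fit should spread out to match the overall mean and variance of p.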

Structure of alpha space

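[Figure: the α axis, with 0 and 1 marked]
• α ≤ 0: zero-forcing; MF is at α = 0
• α ≥ 1: inclusive (zero-avoiding); BP and EP are at α = 1, TRW at α > 1
• FBP, PEP: general α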

Other properties

• If q is an exact minimum of alpha-divergence:
• Normalizing constant:
• If α=1: Gaussian q matches mean, variance of p
– Fully factorized q matches marginals of p
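The normalizing-constant formula was an image and is not recoverable from the transcript. One relation that follows from the standard definition of Dα, assuming the approximating family is closed under scaling and α ≠ 0, is obtained by writing q = s·q̃ and setting the derivative with respect to the scale s to zero. It may or may not be the exact expression shown on the slide.

```latex
\frac{\partial}{\partial s} D_\alpha(p \,\|\, s\tilde q)
  = \frac{(1-\alpha)\int \tilde q(x)\,dx - (1-\alpha)\, s^{-\alpha} \int p(x)^{\alpha}\,\tilde q(x)^{1-\alpha}\,dx}{\alpha(1-\alpha)} = 0
\quad\Longrightarrow\quad
\int q(x)\,dx = \int p(x)^{\alpha}\, q(x)^{1-\alpha}\,dx
```

For α = 1 this reduces to ∫q = ∫p, so q reproduces the normalizing constant of p exactly.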

Two-node example

• q is fully-factorized, minimizes α-divergence to p

• q has correct marginals only for α = 1 (BP)
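The claim about marginals can be checked on a toy joint. The sketch below searches over fully-factorized q(x,y) = q(x)q(y) on two binary variables and reports the minimizer for several α. The joint table `P` is a hypothetical bimodal example, q is restricted to normalized distributions for simplicity, and the grid search is only a stand-in for a real optimizer.

```python
import math

# Hypothetical bimodal joint over two binary variables, indexed P[x][y].
P = [[0.5, 0.05], [0.05, 0.4]]
P_FLAT = [P[x][y] for x in (0, 1) for y in (0, 1)]

def kl(p, q):
    return sum(pi * math.log(pi / qi) + qi - pi for pi, qi in zip(p, q))

def alpha_divergence(p, q, alpha):
    """D_alpha(p||q) on a finite set; alpha = 0 and 1 use the KL limits."""
    if alpha == 0:
        return kl(q, p)
    if alpha == 1:
        return kl(p, q)
    total = sum(alpha * pi + (1 - alpha) * qi - pi ** alpha * qi ** (1 - alpha)
                for pi, qi in zip(p, q))
    return total / (alpha * (1 - alpha))

def best_factorized(alpha):
    """Grid search over a = q(x=1) and b = q(y=1); returns (divergence, a, b)."""
    best = (math.inf, None, None)
    for i in range(1, 100):
        a = i / 100
        for j in range(1, 100):
            b = j / 100
            # Same state order as P_FLAT: (0,0), (0,1), (1,0), (1,1).
            q = [qx * qy for qx in (1 - a, a) for qy in (1 - b, b)]
            d = alpha_divergence(P_FLAT, q, alpha)
            if d < best[0]:
                best = (d, a, b)
    return best

true_px1 = P[1][0] + P[1][1]   # true marginal p(x=1)
print("true p(x=1) =", true_px1)
for alpha in (0.0, 0.5, 1.0, 2.0):
    d, a, b = best_factorized(alpha)
    print(f"alpha={alpha}: q(x=1)={a:.2f}, q(y=1)={b:.2f}, divergence={d:.4f}")
```

In this example the α = 1 solution reproduces the true marginal p(x=1) = 0.45, while the α = 0 solution concentrates on the heavier mode at (0,0) and reports a much smaller q(x=1).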

[Figure: two-node graph over x and y]

Bimodal distribution

α = 1 (BP)
• Good: Marginals, Mass
• Bad: Zeros, Peak heights

α = 0 (MF), α ≤ 0.5
• Good: Zeros, One peak
• Bad: Marginals, Mass

α = ∞

[Table: Good/Bad assessment for the bimodal distribution at α = ∞, over Zeros, Marginals, and Peak heights]

Lessons

• Neither method is inherently superior; it depends on what you care about
• A factorized approximation does not imply matching marginals (that holds only for α=1)
• Adding y to the problem can change the estimated marginal for x (though the true marginal is unchanged)

Distributed divergence minimization

• Write p as product of factors:
• Approximate factors one by one:
• Multiply to get the approximation:
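The equations for the three steps were images. A hedged reconstruction in commonly used notation (the symbols f_a for the exact factors and \tilde f_a for their approximations are assumptions, not recovered from the slide) is:

```latex
p(x) = \prod_a f_a(x)
\qquad
\tilde f_a(x) \approx f_a(x)
\qquad
q(x) = \prod_a \tilde f_a(x)
```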

Global divergence to local divergence

• Global divergence:

• Local divergence:
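The two expressions were images. Assuming p is a product of factors f_a and q is the product of their approximations \tilde f_a, a hedged reconstruction is the following, where q^{\backslash a} denotes the product of all factor approximations except the one for factor a:

```latex
\text{Global:}\quad D\bigl(p \,\|\, q\bigr) = D\Bigl(\prod_a f_a \,\Big\|\, \prod_a \tilde f_a\Bigr)

\text{Local:}\quad D\bigl(f_a\, q^{\backslash a} \,\|\, \tilde f_a\, q^{\backslash a}\bigr),
\qquad q^{\backslash a}(x) = \prod_{b \neq a} \tilde f_b(x) = \frac{q(x)}{\tilde f_a(x)}
```

The local problem replaces every factor except f_a by its current approximation, so only one exact factor has to be handled at a time.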

Message passing

• Messages are passed between factors

• Messages are factor approximations:

• Factor a receives

– Minimize local divergence to get

– Send to other factors

– Repeat until convergence

• Produces all 6 algorithms (MF, BP, EP, TRW, FBP, PEP)
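The loop above can be sketched for one concrete choice: a fully-factorized q and the local divergence KL(p||q) (α = 1), which yields loopy belief propagation. Each factor approximation is stored as one message per variable; a factor multiplies its exact table by the rest of q, keeps the marginals (the KL projection onto factorized distributions), and divides the rest of q back out. The factor tables are hypothetical, and the brute-force marginals are included only for comparison.

```python
from itertools import product

# Hypothetical pairwise factors on a 3-cycle of binary variables (0=x, 1=y, 2=z).
FACTORS = {
    (0, 1): [[1.0, 0.8], [0.7, 1.2]],
    (1, 2): [[1.5, 0.9], [0.8, 1.0]],
    (0, 2): [[1.0, 0.6], [0.9, 1.1]],
}

def normalize(v):
    total = sum(v)
    return [u / total for u in v]

def product_of_messages(msgs, var, skip=None):
    """Product over factors (except `skip`) of their approximation at `var`."""
    out = [1.0, 1.0]
    for a, m in msgs.items():
        if a != skip and var in m:
            out = [out[k] * m[var][k] for k in (0, 1)]
    return out

def message_passing(sweeps=50):
    # Each factor approximation is a product of one message per variable.
    msgs = {a: {v: [0.5, 0.5] for v in a} for a in FACTORS}
    for _ in range(sweeps):
        for (i, j), f in FACTORS.items():
            a = (i, j)
            ci = product_of_messages(msgs, i, skip=a)   # rest of q at variable i
            cj = product_of_messages(msgs, j, skip=a)   # rest of q at variable j
            # Exact factor times the rest of q.
            t = [[f[xi][xj] * ci[xi] * cj[xj] for xj in (0, 1)] for xi in (0, 1)]
            ti = [t[0][0] + t[0][1], t[1][0] + t[1][1]]
            tj = [t[0][0] + t[1][0], t[0][1] + t[1][1]]
            # Minimizing KL(p||q) keeps the marginals; divide the rest of q back out.
            msgs[a] = {i: normalize([ti[k] / ci[k] for k in (0, 1)]),
                       j: normalize([tj[k] / cj[k] for k in (0, 1)])}
    return [normalize(product_of_messages(msgs, v)) for v in range(3)]

def exact_marginals():
    marg = [[0.0, 0.0] for _ in range(3)]
    for state in product((0, 1), repeat=3):
        w = 1.0
        for (i, j), f in FACTORS.items():
            w *= f[state[i]][state[j]]
        for v in range(3):
            marg[v][state[v]] += w
    return [normalize(row) for row in marg]

approx = message_passing()
exact = exact_marginals()
for v, name in enumerate("xyz"):
    print(name, "message passing:", approx[v], "exact:", exact[v])
```

Messages are normalized after every update, which is enough for marginals; estimating the normalizing constant would require tracking the scale factors as well.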

Global divergence vs. local divergence

In general, local ≠ global

• but results are similar

• BP doesn’t minimize global KL, but comes close

[Figure: the α axis. At α = 0 (MF), local = global, so there is no loss from message passing; at other α, local ≠ global]

Experiment

• Which message passing algorithm is best at minimizing global Dα(p||q)?
• Procedure:
1. Run FBP with various αL
2. Compute global divergence for various αG
3. Find best αL (best alg) for each αG

Results

• Average over 20 graphs, random singleton and pairwise potentials: exp(w_ij x_i x_j)
• Mixed potentials (w ~ U(-1,1)):
– best αL = αG (local should match global)
– FBP with same α is best at minimizing Dα
• BP is best at minimizing KL

Hierarchy of algorithms

• BP: fully factorized, KL(p||q)
• EP: exp family, KL(p||q)
• FBP: fully factorized, Dα(p||q)
• Power EP: exp family, Dα(p||q)
• MF: fully factorized, KL(q||p)
• TRW: fully factorized, Dα(p||q), α>1
• Structured MF: exp family, KL(q||p)

Matrix of algorithms

Rows: approximation family. Columns: divergence measure.

                    KL(q||p)        KL(p||q)   Dα(p||q)   Dα(p||q), α>1   Other divergences?
fully factorized    MF              BP         FBP        TRW
exp family          Structured MF   EP         Power EP
Other families?
(mixtures)

Other Message Passing Algorithms

Do they correspond to divergence measures?

• Generalized belief propagation [Yedidia,Freeman,Weiss 00]

• Iterated conditional modes [Besag 86]

• Max-product belief revision

• TRW-max-product [Wainwright,Jaakkola,Willsky 02]

• Laplace propagation [Smola,Vishwanathan,Eskin 03]

• Penniless propagation [Cano,Moral,Salmerón 00]

• Bound propagation [Leisink,Kappen 03]

Future work

• Understand existing message passing algorithms

• Understand local vs. global divergence

• New message passing algorithms:

– Specialized divergence measures

– Richer approximating families

• Other ways to minimize divergence

