Page 1: Belief Propagation and its Generalizations

Belief Propagation and its Generalizations

Shane Oldenburger

Page 2: Belief Propagation and its Generalizations

Outline

The BP algorithm

MRFs – Markov Random Fields

Gibbs free energy

Bethe approximation

Kikuchi approximation

Generalized BP


Page 4: Belief Propagation and its Generalizations

Recall from the Jointree Algorithm

We separate evidence e into:

e+: denotes evidence pertaining to ancestors

e-: denotes evidence pertaining to descendants

BEL(X) = P(X|e) = P(X|e+, e-) = P(e-|X, e+)·P(X|e+) / P(e-|e+) = α·P(e-|X)·P(X|e+) = α·λ(X)·π(X)

π(X): messages from parents
λ(X): messages from children
α: normalization constant

Page 5: Belief Propagation and its Generalizations

Pearl’s Belief Propagation Algorithm: Initialization

Nodes with evidence:
λ(xi) = 1 where xi = ei; 0 otherwise
π(xi) = 1 where xi = ei; 0 otherwise

Nodes with no parents:
π(xi) = p(xi) //prior probabilities

Nodes with no children:
λ(xi) = 1

Page 6: Belief Propagation and its Generalizations

Pearl’s BP Algorithm: Iterate

For each X:

If all π messages from parents of X have arrived, combine into π(X)

If all λ messages from children of X have arrived, combine into λ(X)

If π(X) has been computed and all λ messages other than from Yi have arrived, calculate and send message π_{X→Yi} to child Yi

If λ(X) has been computed and all π messages other than from Ui have arrived, calculate and send message λ_{X→Ui} to parent Ui

Compute BEL(X) = α·λ(X)·π(X)
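To make the schedule concrete, here is a minimal sketch of the π/λ recursions for the special case of a tree-structured network (each node has at most one parent). The tiny example network A→B, A→C, its CPT values, and the two-pass ordering are illustrative choices, not from the slides; Pearl's full algorithm also handles polytrees with multiple parents.

```python
import numpy as np

# Minimal sketch of Pearl's pi/lambda propagation on a TREE-structured
# Bayesian network (each node has at most one parent). The network,
# CPTs, and two-pass schedule are illustrative assumptions.

# Network: A -> B, A -> C, all variables binary.
prior_A = np.array([0.6, 0.4])            # P(A)
cpt_B = np.array([[0.9, 0.1],             # P(B|A), rows indexed by A
                  [0.2, 0.8]])
cpt_C = np.array([[0.7, 0.3],             # P(C|A)
                  [0.4, 0.6]])

# Evidence: C = 1 is observed -> lambda(C) is an indicator vector.
lam_C = np.array([0.0, 1.0])
lam_B = np.array([1.0, 1.0])              # B unobserved: lambda(B) = 1

# Upward (lambda) pass: children send lambda messages to the parent,
#   lambda_{Y->X}(x) = sum_y P(y|x) * lambda(y)
lam_B_to_A = cpt_B @ lam_B
lam_C_to_A = cpt_C @ lam_C
lam_A = lam_B_to_A * lam_C_to_A           # combine messages from children

# Downward (pi) pass: pi at the root is just the prior.
pi_A = prior_A

# BEL(X) = alpha * lambda(X) * pi(X)
bel_A = lam_A * pi_A
bel_A /= bel_A.sum()

# pi message to B excludes B's own lambda contribution:
#   pi_{A->B}(a) = pi(a) * prod over OTHER children of lambda_{Z->A}(a)
pi_A_to_B = pi_A * lam_C_to_A
pi_B = pi_A_to_B @ cpt_B                  # pi(b) = sum_a P(b|a) pi_{A->B}(a)
bel_B = lam_B * pi_B
bel_B /= bel_B.sum()

print("BEL(A) =", bel_A)                  # equals P(A | C=1)
print("BEL(B) =", bel_B)                  # equals P(B | C=1)
```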

Page 7: Belief Propagation and its Generalizations

Example of data propagation in a simple tree

Page 8: Belief Propagation and its Generalizations

BP properties

Exact for polytrees:
Only one path between any two nodes
Each node X separates the graph into two disjoint graphs (e+, e-)

But most graphs of interest are not polytrees – what do we do?
Exact inference: cutset conditioning, jointree method
Approximate inference: loopy BP

Page 9: Belief Propagation and its Generalizations

Loopy BP

In the simple tree example, a finite number of messages were passed

In a graph with loops, messages may be passed around indefinitely. Two stopping rules:
Stop when beliefs converge
Stop after some number of iterations

Loopy BP tends to achieve good empirical results:
Low-level computer vision problems
Error-correcting codes: turbo codes, Gallager codes

Page 10: Belief Propagation and its Generalizations

Outline

The BP algorithm

MRFs – Markov Random Fields

Gibbs free energy

Bethe approximation

Kikuchi approximation

Generalized BP

Page 11: Belief Propagation and its Generalizations

Markov Random Fields

BP algorithms have been developed for many graphical models. Pairwise Markov Random Fields are used in this paper for ease of presentation.

An MRF consists of “observable” nodes and “hidden” nodes. Since it is pairwise, each observable node is connected to exactly one hidden node, and each hidden node is connected to at most one observable node.

Page 12: Belief Propagation and its Generalizations

Markov Random Fields

Two hidden variables xi and xj are connected by a “compatibility function” ψij(xi, xj).

Hidden variable xi is connected to observable variable yi by an “evidence function” φi(xi, yi) = φi(xi).

The joint probability for a pairwise MRF is
p({x}) = (1/Z) ∏(ij) ψij(xi, xj) ∏i φi(xi)

The BP algorithm for pairwise MRFs is similar to that for Bayesian Networks
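As a concrete (and entirely illustrative) instance, the sketch below implements the standard sum-product update for a pairwise MRF, m_{i→j}(xj) ∝ Σ_{xi} φi(xi) ψij(xi, xj) ∏_{k∈N(i)\j} m_{k→i}(xi), iterated until the messages converge. The graph, potentials, and schedule are made-up assumptions; run on a loopy graph, as here, this is exactly the loopy BP of the earlier slides.

```python
import numpy as np

# Sketch of sum-product BP on a pairwise MRF (binary states).
# Graph, potentials, and update schedule are illustrative assumptions.
edges = [(0, 1), (1, 2), (2, 0)]          # a 3-node loop -> "loopy" BP
n_nodes, n_states = 3, 2
rng = np.random.default_rng(0)
psi = {e: rng.uniform(0.5, 1.5, (n_states, n_states)) for e in edges}
phi = rng.uniform(0.5, 1.5, (n_nodes, n_states))     # evidence functions

def neighbors(i):
    return [j for a, b in edges
            for j in ((b,) if a == i else (a,) if b == i else ())]

def pairwise_psi(i, j):                   # psi indexed [x_i, x_j]
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# messages m[(i, j)]: vector over x_j, initialized uniformly
m = {(i, j): np.ones(n_states) for i in range(n_nodes) for j in neighbors(i)}

for sweep in range(100):                  # asynchronous sweeps
    delta = 0.0
    for (i, j) in list(m):
        prod_in = phi[i].copy()           # phi_i times incoming messages,
        for k in neighbors(i):            # excluding the one from j
            if k != j:
                prod_in *= m[(k, i)]
        new = pairwise_psi(i, j).T @ prod_in   # sum over x_i
        new /= new.sum()
        delta = max(delta, np.abs(new - m[(i, j)]).max())
        m[(i, j)] = new
    if delta < 1e-8:                      # stop when beliefs converge
        break

# beliefs: b_i(x_i) proportional to phi_i(x_i) * incoming messages
for i in range(n_nodes):
    b = phi[i].copy()
    for k in neighbors(i):
        b *= m[(k, i)]
    print(f"b_{i} =", b / b.sum())
```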

Page 13: Belief Propagation and its Generalizations

Conversion between graphical models

We can limit ourselves to considering pairwise MRFs

Any pairwise MRF or BN can be converted to an equivalent “Factor graph”

Any factor graph can be converted into an equivalent pairwise MRF or BN

Page 14: Belief Propagation and its Generalizations

An intermediary model

A factor graph is composed of:
“variable” nodes, represented by circles
“function” nodes, represented by squares

Factor graphs are a generalization of Tanner graphs, where the “function” nodes are parity checks of their connected variables. A function node in a factor graph can be any arbitrary function of the variables connected to it.

Page 15: Belief Propagation and its Generalizations

From pairwise MRF to BN

Page 16: Belief Propagation and its Generalizations

From BN to pairwise MRF

Page 17: Belief Propagation and its Generalizations

Outline

The BP algorithm

MRFs – Markov Random Fields

Gibbs free energy

Bethe approximation

Kikuchi approximation

Generalized BP

Page 18: Belief Propagation and its Generalizations

Gibbs Free Energy

Gibbs free energy is the difference in the energy of a system from an initial state to a final state of some process (e.g. chemical reaction)

For a chemical reaction, if the Gibbs free energy is negative then the reaction is “spontaneous”, or “allowed”

If the Gibbs free energy is positive, the reaction is “not allowed” (it does not proceed spontaneously)

Page 19: Belief Propagation and its Generalizations

Gibbs free energy

Instead of the difference in energy of a chemical process, we want to define Gibbs free energy in terms of the difference between a target probability distribution p and an approximate probability distribution b.

Define the “distance” between p({x}) and b({x}) as
D(b({x}) || p({x})) = Σ_{x} b({x}) ln[b({x}) / p({x})]
This is known as the Kullback-Leibler distance.

Boltzmann’s law: p({x}) = (1/Z) e^(−E({x})/T)

Boltzmann’s law is generally assumed by statistical physicists. Here we will use it as our definition of the “energy” E. T acts as a unit scale parameter; let T = 1.

Substituting Boltzmann’s law into our distance measure:
D(b({x}) || p({x})) = Σ_{x} b({x}) E({x}) + Σ_{x} b({x}) ln b({x}) + ln Z
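A quick numerical sanity check of this identity (the distributions and sizes below are arbitrary illustrative choices, with T = 1): build p from Boltzmann's law and confirm that the KL distance equals Σ b E + Σ b ln b + ln Z.

```python
import numpy as np

# Check D(b||p) = sum_x b E + sum_x b ln b + ln Z, using Boltzmann's
# law E({x}) = -ln p({x}) - ln Z with T = 1. Illustrative numbers.
rng = np.random.default_rng(1)
w = rng.uniform(0.1, 1.0, 8)      # unnormalized weights e^{-E}
Z = w.sum()
p = w / Z
E = -np.log(p) - np.log(Z)        # so that p = (1/Z) e^{-E}

b = rng.uniform(0.1, 1.0, 8)
b /= b.sum()                      # an approximate distribution b

kl = np.sum(b * np.log(b / p))                    # D(b || p)
gibbs_form = np.sum(b * E) + np.sum(b * np.log(b)) + np.log(Z)
print(np.isclose(kl, gibbs_form))                 # True
```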

Page 20: Belief Propagation and its Generalizations

Gibbs free energy

Our distance measure
D(b({x}) || p({x})) = Σ_{x} b({x}) E({x}) + Σ_{x} b({x}) ln b({x}) + ln Z
will be zero (p = b) when the Gibbs free energy
G(b({x})) = Σ_{x} b({x}) E({x}) + Σ_{x} b({x}) ln b({x}) = U(b({x})) − S(b({x}))
is minimized, and its minimum value is F = −ln Z.

G: “Gibbs free energy”
F: “Helmholtz free energy”
U: “average energy”
S: “entropy”
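A similarly hedged check of the minimization claim, reusing the illustrative setup from the previous snippet: G(b) = U − S attains its minimum value F = −ln Z at b = p.

```python
import numpy as np

# Check that G(b) = U - S is minimized at b = p, with minimum -ln Z.
# Same illustrative setup as the previous snippet.
rng = np.random.default_rng(1)
w = rng.uniform(0.1, 1.0, 8)
Z, p = w.sum(), w / w.sum()
E = -np.log(p) - np.log(Z)                  # Boltzmann's law, T = 1

def G(b):                                   # Gibbs free energy U - S
    return np.sum(b * E) + np.sum(b * np.log(b))

print(np.isclose(G(p), -np.log(Z)))         # G at b = p equals F = -ln Z
b = rng.dirichlet(np.ones(8))               # any other distribution b
print(G(b) >= G(p))                         # True: p minimizes G
```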

Page 21: Belief Propagation and its Generalizations

Outline

The BP algorithm

MRFs – Markov Random Fields

Gibbs free energy

Bethe approximation

Kikuchi approximation

Generalized BP

Page 22: Belief Propagation and its Generalizations

Bethe approximation

We would like to derive Gibbs free energy in terms of one- and two-node beliefs bi and bij

Due to the pairwise nature of pairwise MRFs, bi and bij are sufficient to compute the average energy U:
U = −Σ(ij) Σ_{xi,xj} bij(xi,xj) ln ψij(xi,xj) − Σi Σ_{xi} bi(xi) ln φi(xi)

The exact marginal probabilities pi and pij yield the same form, so this average energy is exact if the one- and two-node beliefs are exact.

Page 23: Belief Propagation and its Generalizations

Bethe approximation

The entropy term is more problematic; usually we must settle for an approximation.

The entropy can be computed exactly if the joint belief can be explicitly expressed in terms of one- and two-node beliefs:
b({x}) = ∏(ij) bij(xi,xj) / ∏i bi(xi)^(qi−1), where qi = #neighbors of xi

Then the Bethe approximation to the entropy is
SBethe = −Σ(ij) Σ_{xi,xj} bij(xi,xj) ln bij(xi,xj) + Σi (qi − 1) Σ_{xi} bi(xi) ln bi(xi)

For singly connected networks, this is exact and GBethe = U – SBethe corresponds to the exact marginal probabilities p

For graphs with loops, this is only an approximation (but usually a good one)
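To see the exactness claim on singly connected graphs in action, here is a sketch (the 3-node chain, binary states, and random potentials are my choices) that brute-forces a tiny tree MRF, computes the exact one- and two-node marginals, and checks both that SBethe equals the true entropy and that GBethe = U − SBethe = −ln Z.

```python
import numpy as np
from itertools import product

# Check that the Bethe entropy is exact on a tree: 3-node chain 0-1-2.
# Potentials are arbitrary illustrative choices; states are binary.
rng = np.random.default_rng(2)
psi01 = rng.uniform(0.5, 1.5, (2, 2))
psi12 = rng.uniform(0.5, 1.5, (2, 2))
phi = rng.uniform(0.5, 1.5, (3, 2))

# Brute-force joint p({x}) = (1/Z) psi01 psi12 phi0 phi1 phi2.
joint = np.zeros((2, 2, 2))
for x0, x1, x2 in product(range(2), repeat=3):
    joint[x0, x1, x2] = (psi01[x0, x1] * psi12[x1, x2]
                         * phi[0, x0] * phi[1, x1] * phi[2, x2])
Z = joint.sum()
p = joint / Z

# Exact one- and two-node marginals (the "beliefs" b_i, b_ij).
b0, b1, b2 = p.sum((1, 2)), p.sum((0, 2)), p.sum((0, 1))
b01, b12 = p.sum(2), p.sum(0)

# True entropy S = -sum p ln p.
S_true = -np.sum(p * np.log(p))

# Bethe entropy: q_i - 1 = 1 for the middle node, 0 for the leaves.
S_bethe = (-np.sum(b01 * np.log(b01)) - np.sum(b12 * np.log(b12))
           + np.sum(b1 * np.log(b1)))
print(np.isclose(S_true, S_bethe))          # True on a tree

# Average energy U, and G_Bethe = U - S_Bethe = -ln Z = Helmholtz F.
U = (-np.sum(b01 * np.log(psi01)) - np.sum(b12 * np.log(psi12))
     - sum(np.sum(b * np.log(phi[i])) for i, b in enumerate((b0, b1, b2))))
print(np.isclose(U - S_bethe, -np.log(Z)))  # True
```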

Page 24: Belief Propagation and its Generalizations

Equivalence of BP and Bethe

The Bethe approximation is exact for pairwise MRFs when the graph contains no loops, so the Bethe free energy is minimal for the correct marginals.

BP gives correct marginals when the graph contains no loops.

Thus, when there are no loops, the BP beliefs are the global minima of the Bethe free energy.

We can say more: a set of beliefs gives a BP fixed point in any graph iff they are stationary points of the Bethe free energy. This can be shown by adding Lagrange multipliers to GBethe to enforce the marginalization constraints.

Page 25: Belief Propagation and its Generalizations

Outline

The BP algorithm

MRFs – Markov Random Fields

Gibbs free energy

Bethe approximation

Kikuchi approximation

Generalized BP

Page 26: Belief Propagation and its Generalizations

Kikuchi approximation

The Kikuchi approximation is an improvement on, and generalization of, the Bethe approximation.

Given this association between BP and the Bethe approximation to the Gibbs free energy, can we use better approximation methods to craft better BP algorithms?

Page 27: Belief Propagation and its Generalizations

Cluster variational method

The free energy is approximated as a sum of the local free energies of a set of regions of nodes. The “cluster variational method” provides a way to select the set of regions:

Begin with a basic set of clusters including every interaction and node
Subtract the free energies of over-counted intersection regions
Add back over-counted intersections of intersections, etc.

Bethe is the Kikuchi approximation whose basic clusters are the set of all connected pairs of hidden nodes.

Page 28: Belief Propagation and its Generalizations

Cluster variational method

Bethe regions involve one or two nodes.

Define the local free energy of a single node:
Gi(bi) = Σ_{xi} bi(xi) [ln bi(xi) + Ei(xi)]

Define the local free energy of a pair of nodes:
Gij(bij) = Σ_{xi,xj} bij(xi,xj) [ln bij(xi,xj) + Eij(xi,xj)]

Then for the regions corresponding to Bethe,
GBethe = G12 + G23 + G45 + G56 + G14 + G25 + G36 − G1 − G3 − G4 − G6 − 2G2 − 2G5
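The coefficients can be read off mechanically from node degrees: each pair region enters with +1 and each node i with −(qi − 1). A small sketch; the 2×3 grid layout and edge list are inferred from the formula above.

```python
# Coefficients in G_Bethe for the six-node example: each pair region
# enters with +1 and each node i with -(q_i - 1), q_i = degree of i.
# Edge list inferred from the pair terms in the formula above.
edges = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 4), (2, 5), (3, 6)]
q = {}
for a, b in edges:
    q[a] = q.get(a, 0) + 1
    q[b] = q.get(b, 0) + 1
coeffs = {f"G{a}{b}": 1 for a, b in edges}
coeffs.update({f"G{i}": -(qi - 1) for i, qi in sorted(q.items())})
print(coeffs)   # G2 and G5 get -2; G1, G3, G4, G6 get -1; pairs get +1
```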

Page 29: Belief Propagation and its Generalizations

Cluster variational method

For the Kikuchi example shown below, regions involve four nodes.

Extend the same logic as before. Define the local free energy of a four-node region, e.g.
G1245(b1245) = Σ_{x1,x2,x4,x5} b1245(x1,x2,x4,x5) [ln b1245(x1,x2,x4,x5) + E1245(x1,x2,x4,x5)]

Then for the Kikuchi regions shown,
GKikuchi = G1245 + G2356 − G25

Page 30: Belief Propagation and its Generalizations

A more general example

Now we have basic regions [1245], [2356], [4578], [5689],

intersection regions [25], [45], [56], [58], and

the intersection-of-intersections region [5].

Then we have

GKikuchi = G1245 + G2356 + G4578 + G5689 − G25 − G45 − G56 − G58 + G5
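These ±1 coefficients follow mechanically from the region hierarchy: each region gets a counting number c_r = 1 − Σ c_a over the regions a that strictly contain it. A sketch, with the regions hard-coded as sets:

```python
# Counting numbers for the 3x3 Kikuchi example: c_r = 1 - sum of c_a
# over all regions a that strictly contain r. Reproduces the +1/-1
# coefficients in G_Kikuchi above.
regions = [frozenset(s) for s in
           ({1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, {5, 6, 8, 9},
            {2, 5}, {4, 5}, {5, 6}, {5, 8},    # intersections
            {5})]                              # intersection of intersections

c = {}
for r in regions:                              # listed largest-first above
    ancestors = [a for a in regions if r < a]  # strict supersets
    c[r] = 1 - sum(c[a] for a in ancestors)

for r in regions:
    print(sorted(r), c[r])
# basic clusters -> +1, pairwise intersections -> -1, [5] -> +1
```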

Page 31: Belief Propagation and its Generalizations

Outline

The BP algorithm

MRFs – Markov Random Fields

Gibbs free energy

Bethe approximation

Kikuchi approximation

Generalized BP

Page 32: Belief Propagation and its Generalizations

Generalized BP

We show how to construct a GBP algorithm for this example.

First find the intersections, intersections of intersections, etc. of the basic clusters:
Basic: [1245], [2356], [4578], [5689]
Intersections: [25], [45], [56], [58]
Intersection of intersections: [5]

Page 33: Belief Propagation and its Generalizations

Region Graph

Next, organize the regions into the region graph: a hierarchy of regions and their “direct” subregions.

“Direct” subregions are subregions not contained in another subregion.

e.g. [5] is a subregion of [1245], but it is also a subregion of [25], so it is a direct subregion of [25] rather than of [1245].
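A sketch of the “direct subregion” test in code (regions hard-coded as before; the predicate is my phrasing of the rule above):

```python
# Direct subregions: s is a direct subregion of r when s is a strict
# subset of r and no other region t sits strictly between them.
regions = [frozenset(s) for s in
           ({1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, {5, 6, 8, 9},
            {2, 5}, {4, 5}, {5, 6}, {5, 8}, {5})]

def direct_subregions(r):
    subs = [s for s in regions if s < r]
    return [s for s in subs
            if not any(s < t < r for t in regions)]

for r in regions:
    kids = direct_subregions(r)
    if kids:
        print(sorted(r), "->", [sorted(s) for s in kids])
# e.g. [1,2,4,5] -> [[2,5], [4,5]], and [2,5] -> [[5]]:
# [5] is a direct subregion of the intersections, not the basic clusters.
```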

Page 34: Belief Propagation and its Generalizations

Messages

Construct messages from every region r to each of its direct subregions s. These correspond to the edges of the region graph.

Consider the message from region [1245] to subregion [25]: a message from the nodes not in the subregion (1, 4) to those in the subregion (2, 5), written m14→25.

Page 35: Belief Propagation and its Generalizations

Belief Equations

Construct belief equations for every region r:

br({x}r) is proportional to each compatibility matrix and evidence term completely contained in r, times the messages coming into r:

b5 = k φ5 [m2→5 m4→5 m6→5 m8→5]

b45 = k φ4 φ5 ψ45 [m12→45 m78→45 m2→5 m6→5 m8→5]

b1245 = k [φ1 φ2 φ4 φ5 ψ12 ψ14 ψ25 ψ45] [m36→25 m78→45 m6→5 m8→5]

Page 36: Belief Propagation and its Generalizations

Belief Equations

b5 = k φ5 [m2→5 m4→5 m6→5 m8→5]

Page 37: Belief Propagation and its Generalizations

Belief Equations

b45 = k φ4 φ5 ψ45 [m12→45 m78→45 m2→5 m6→5 m8→5]

Page 38: Belief Propagation and its Generalizations

Belief Equations

b1245 = k [φ1 φ2 φ4 φ5 ψ12 ψ14 ψ25 ψ45] [m36→25 m78→45 m6→5 m8→5]

Page 39: Belief Propagation and its Generalizations

Enforcing Marginalization

Now, we need to enforce the marginalization condition relating each pair of regions that share an edge in the hierarchy

e.g. between [5] and [45]:
b5(x5) = Σ_{x4} b45(x4, x5)

Page 40: Belief Propagation and its Generalizations

Message Update

Adding the marginalization into the belief equations, we get the message update rule:

m4→5(x5) ← k Σ_{x4} φ4(x4) ψ45(x4, x5) m12→45(x4, x5) m78→45(x4, x5)

The collection of belief equations and message update rules defines our GBP algorithm.
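A sketch of this single update as array code (shapes, values, and variable names are illustrative assumptions; the sum runs over x4, consistent with the belief equations and the marginalization condition above):

```python
import numpy as np

# One GBP message update:
#   m_{4->5}(x5) <- k * sum_{x4} phi4(x4) psi45(x4,x5)
#                       * m_{12->45}(x4,x5) * m_{78->45}(x4,x5)
# Shapes and values are illustrative assumptions (binary states).
rng = np.random.default_rng(3)
phi4 = rng.uniform(0.5, 1.5, 2)           # evidence at node 4
psi45 = rng.uniform(0.5, 1.5, (2, 2))     # compatibility, indexed [x4, x5]
m12_45 = rng.uniform(0.5, 1.5, (2, 2))    # region message, indexed [x4, x5]
m78_45 = rng.uniform(0.5, 1.5, (2, 2))

m4_5 = np.einsum('i,ij,ij,ij->j', phi4, psi45, m12_45, m78_45)
m4_5 /= m4_5.sum()                        # k: normalize the message
print("m_{4->5} =", m4_5)
```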

Page 41: Belief Propagation and its Generalizations

Complexity of GBP

Bad news: running time grows exponentially with the size of the basic clusters chosen.

Good news: if the basic clusters encompass the shortest loops in the graphical model, usually nearly all of the error from BP is eliminated.

This usually requires only a small additional amount of computation compared to BP.

