Extending Expectation Propagation for Graphical Models
Yuan (Alan) Qi
Joint work with Tom Minka
Motivation
• Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
• Inference techniques on graphical models often sacrifice efficiency for accuracy or sacrifice accuracy for efficiency.
• Need a method that better balances the trade-off between accuracy and efficiency.
Motivation
[Figure: error versus computational time, contrasting current techniques with what we want]
Outline
• Background on expectation propagation (EP)
• Extending EP on Bayesian networks for dynamic systems
– Poisson tracking
– Signal detection for wireless communications
• Tree-structured EP on loopy graphs
• Conclusions and future work
Graphical Models
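Directed (Bayesian networks):
$p(\mathbf{x}, \mathbf{y}) = \prod_i p(x_i \mid \mathbf{x}_{\mathrm{pa}(i)}) \prod_j p(y_j \mid \mathbf{x}_{\mathrm{pa}(j)})$
Undirected (Markov networks):
$p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \prod_a \phi_a(\mathbf{x}, \mathbf{y})$
[Figure: a directed and an undirected example network, each over nodes x1, x2, y1, y2]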
Inference on Graphical Models
• Bayesian inference techniques:
– Belief propagation (BP): Kalman filtering/smoothing, forward-backward algorithm
– Monte Carlo: particle filters/smoothers, MCMC
• Loopy BP: typically efficient, but not accurate on general loopy graphs
• Monte Carlo: accurate, but often not efficient
Expectation Propagation in a Nutshell
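• Approximate a probability distribution by simpler parametric terms:
$p(\mathbf{x}) = \prod_a f_a(\mathbf{x}) \;\to\; q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x})$
For directed graphs: $f_a(\mathbf{x}) = p(x_{ia} \mid x_{ja})$
For undirected graphs: $f_a(\mathbf{x}) = \phi_a(\mathbf{x})$
• Each approximation term $\tilde{f}_a(\mathbf{x})$ lives in an exponential family (e.g. Gaussian)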
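EP in a Nutshell
• The approximate term $\tilde{f}_a(\mathbf{x})$ minimizes the following KL divergence by moment matching:
$\arg\min_{\tilde{f}_a(\mathbf{x})} D\big(f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \,\|\, \tilde{f}_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x})\big)$
Where the leave-one-out approximation is
$q^{\setminus a}(\mathbf{x}) = q(\mathbf{x}) / \tilde{f}_a(\mathbf{x}) = \prod_{b \neq a} \tilde{f}_b(\mathbf{x})$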
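A minimal sketch of the moment-matching step, not the authors' code: when the approximating family is Gaussian, minimizing KL(p || q) over q amounts to matching the mean and variance of p. The example assumes a hypothetical one-dimensional case, a Gaussian leave-one-out distribution multiplied by a logistic factor, and integrates on a uniform grid. The names `match_moments`, `cavity_mean`, and `cavity_var` are illustrative only.

```python
import numpy as np

def match_moments(log_factor, cavity_mean, cavity_var, half_width=10.0, n=20001):
    """Mean and variance of p(x) proportional to f(x) * N(x; cavity_mean, cavity_var).

    The Gaussian with these two moments minimizes KL(p || q) over Gaussians q.
    Integration uses a uniform grid, an assumption made here for simplicity.
    """
    sd = np.sqrt(cavity_var)
    x = np.linspace(cavity_mean - half_width * sd, cavity_mean + half_width * sd, n)
    log_p = log_factor(x) - 0.5 * (x - cavity_mean) ** 2 / cavity_var
    w = np.exp(log_p - log_p.max())   # unnormalized tilted density on the grid
    w /= w.sum()                      # normalize; the grid spacing cancels
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    return mean, var

def log_logistic(x):
    """Hypothetical exact term f(x) = 1 / (1 + exp(-x)), in log form."""
    return -np.logaddexp(0.0, -x)

cavity_mean, cavity_var = 0.0, 1.0
mean, var = match_moments(log_logistic, cavity_mean, cavity_var)

# The updated approximate term is q_new / q_cavity, in natural parameters.
term_precision = 1.0 / var - 1.0 / cavity_var
term_precision_mean = mean / var - cavity_mean / cavity_var
```

Working in natural parameters (precision and precision times mean) turns the division by the leave-one-out distribution into a subtraction.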
Limitations of Plain EP
• Can be difficult or expensive to analytically compute the needed moments in order to minimize the desired KL divergence.
• Can be expensive to compute and maintain a valid approximation distribution q(x), which is coherent under marginalization.
– Tree-structured q(x): $q(x_i) = \sum_{x_j} q(x_i, x_j)$
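To make "coherent under marginalization" concrete, here is a small sketch for discrete variables: a pairwise marginal is coherent with its single-node marginals when summing out either variable recovers the other's marginal. The function name `is_coherent` and the numbers are hypothetical.

```python
import numpy as np

def is_coherent(q_ij, q_i, q_j, tol=1e-12):
    """Check that q_ij(x_i, x_j) sums to q_i over x_j and to q_j over x_i."""
    return (np.allclose(q_ij.sum(axis=1), q_i, atol=tol)
            and np.allclose(q_ij.sum(axis=0), q_j, atol=tol))

# Illustrative pairwise marginal over two binary variables.
q_ij = np.array([[0.3, 0.1],
                 [0.2, 0.4]])
q_i = q_ij.sum(axis=1)   # marginal of x_i
q_j = q_ij.sum(axis=0)   # marginal of x_j

coherent = is_coherent(q_ij, q_i, q_j)
# A single-node marginal that disagrees with the pairwise one is not coherent.
incoherent = is_coherent(q_ij, np.array([0.5, 0.5]), q_j)
```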
Three Extensions
1. Instead of choosing the approximate term $\tilde{f}_a(\mathbf{x})$ to minimize the following KL divergence:
$\arg\min_{\tilde{f}_a(\mathbf{x})} D\big(f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \,\|\, \tilde{f}_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x})\big)$
use other criteria.
2. Use numerical approximation to compute moments: quadrature or Monte Carlo.
3. Allow the tree-structured q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
Efficiency vs. Accuracy
[Figure: error versus computational time for loopy BP (factorized EP), Monte Carlo, and "Extended EP ?"]
Object Tracking
Guess the position of an object given noisy observations
[Figure: an object moving through positions x1, x2, x3, x4, with noisy observations y1, y2, y3, y4]
Bayesian Network
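e.g. $x_t = x_{t-1} + \nu_t$ (random walk)
$y_t = x_t + \text{noise}$
want distribution of x’s given y’s
[Figure: chain-structured Bayesian network with hidden states x1, x2, ..., xT and observations y1, y2, ..., yT]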
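For the linear-Gaussian random walk, exact forward inference is the Kalman filter. The sketch below simulates the model and filters it. The starting value x_0 = 0, the noise variances, and the function names are assumptions chosen for illustration, not values from the talk.

```python
import numpy as np

def simulate_random_walk(n_steps, process_var, obs_var, rng):
    """Draw x_t = x_{t-1} + nu_t (with x_0 = 0) and y_t = x_t + noise."""
    x = np.cumsum(rng.normal(0.0, np.sqrt(process_var), size=n_steps))
    y = x + rng.normal(0.0, np.sqrt(obs_var), size=n_steps)
    return x, y

def kalman_filter(y, process_var, obs_var):
    """Return the filtered means and variances of p(x_t | y_1, ..., y_t)."""
    mean, var = 0.0, 0.0                     # x_0 = 0 is known in this sketch
    means, variances = [], []
    for y_t in y:
        var = var + process_var              # predict through the random walk
        gain = var / (var + obs_var)         # Kalman gain
        mean = mean + gain * (y_t - mean)    # correct with the observation
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(0)
x, y = simulate_random_walk(500, process_var=0.01, obs_var=1.0, rng=rng)
means, variances = kalman_filter(y, process_var=0.01, obs_var=1.0)
```

The filter only conditions on past and current observations; a smoother would add a backward pass so that each estimate also uses later observations.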
Approximation
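$p(\mathbf{x}, \mathbf{y}) = p(x_1)\, p(y_1 \mid x_1) \prod_{t>1} p(x_t \mid x_{t-1})\, p(y_t \mid x_t)$
$q(\mathbf{x}) = p(x_1)\, \tilde{o}_1(x_1) \prod_{t>1} \tilde{p}_{t-1 \to t}(x_t)\, \tilde{p}_{t-1 \leftarrow t}(x_{t-1})\, \tilde{o}_t(x_t)$
Factorized and Gaussian in x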
Message Interpretation
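$q(x_t) = \tilde{p}_{t-1 \to t}(x_t)\, \tilde{o}_t(x_t)\, \tilde{p}_{t \leftarrow t+1}(x_t)$
= (forward msg)(observation msg)(backward msg)
[Figure: node xt with its observation yt, showing the forward message, the backward message, and the observation message]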
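When all three messages are Gaussian, their product is again Gaussian: precisions add, and so do precision-weighted means. A minimal sketch with hypothetical message values:

```python
def multiply_gaussians(*messages):
    """Multiply 1-D Gaussian messages given as (mean, variance) pairs.

    Returns the (mean, variance) of the normalized product.
    """
    precision = sum(1.0 / v for _, v in messages)
    precision_mean = sum(m / v for m, v in messages)
    return precision_mean / precision, 1.0 / precision

# Hypothetical messages arriving at x_t, each as (mean, variance).
forward = (0.8, 2.0)
observation = (1.2, 0.5)
backward = (1.0, 1.0)

mean, var = multiply_gaussians(forward, observation, backward)
```

The resulting variance is smaller than that of any single message, since every message contributes precision.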
Extensions of EP
• Instead of matching moments, use any method for approximate filtering.
– Examples: statistical linearization, unscented Kalman filter (UKF), mixture of Kalman filters
Turn any deterministic filtering method into a smoothing method!
All methods can be interpreted as finding linear/Gaussian approximations to original terms.
• Use quadrature or Monte Carlo for term approximations
Example: Poisson Tracking
• $y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$
Poisson Tracking Model
$p(x_t \mid x_{t-1}) \sim N(x_{t-1}, 0.01)$
$p(x_1) \sim N(0, 100)$
$p(y_t \mid x_t) = \exp(y_t x_t - e^{x_t}) / y_t!$
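The Poisson observation term has no closed-form Gaussian moments, so the needed mean and variance can be approximated numerically. The sketch below uses Gauss-Hermite quadrature for a Gaussian distribution on x_t multiplied by the Poisson likelihood with mean exp(x_t). The function name, the number of quadrature points, and the example inputs are assumptions for illustration.

```python
import numpy as np
from math import lgamma

def poisson_tilted_moments(y, prior_mean, prior_var, n_points=40):
    """Moments of p(x) proportional to N(x; prior_mean, prior_var) * Poisson(y | exp(x))."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variables so the weight exp(-t^2) matches the Gaussian on x.
    x = prior_mean + np.sqrt(2.0 * prior_var) * nodes
    log_lik = y * x - np.exp(x) - lgamma(y + 1)   # log of exp(y x - e^x) / y!
    lik = np.exp(log_lik)
    z = np.sum(weights * lik)                     # normalizer up to a constant
    mean = np.sum(weights * lik * x) / z
    var = np.sum(weights * lik * (x - mean) ** 2) / z
    return mean, var

# Hypothetical example: observed count 3, Gaussian N(0, 1) on x_t.
mean, var = poisson_tilted_moments(3, prior_mean=0.0, prior_var=1.0)
```

The constant factor 1/sqrt(pi) from the quadrature rule cancels in both ratios, so it is omitted.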
Extended EP vs. Monte Carlo: Accuracy
[Figure: accuracy of the estimated mean and variance, extended EP versus Monte Carlo]
Accuracy/Efficiency Tradeoff
Bayesian network for Wireless Signal Detection
[Figure: Bayesian network over s1, ..., sT, x1, ..., xT, and y1, ..., yT]
$s_i$: Transmitted signals
$x_i$: Channel coefficients for digital wireless communications
$y_i$: Received noisy observations
Extended-EP Joint Signal Detection and Channel Estimation
• Turn mixture of Kalman filters into a smoothing method
• Smoothing over a window of the most recent observations
• Observations before the window act as a prior for the current estimation
Computational Complexity
• Expectation propagation: $O(nLd^2)$
• Stochastic mixture of Kalman filters: $O(LMd^2)$
• Rao-Blackwellised particle smoothers: $O(LMNd^2)$
n: Number of EP iterations (typically 4 or 5)
d: Dimension of the parameter vector
L: Smoothing window length
M: Number of samples in filtering (often larger than 500)
N: Number of samples in smoothing (larger than 50)
EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.
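The 5,000-fold figure follows from the ratio of the two complexity bounds, (L M N d^2) / (n L d^2) = M N / n. A quick check using the representative values quoted above (n = 5, M = 500, N = 50); the values of L and d are hypothetical and cancel in the ratio:

```python
n, M, N = 5, 500, 50      # EP iterations, filtering samples, smoothing samples
L, d = 10, 2              # hypothetical window length and dimension

ep_cost = n * L * d**2                    # O(n L d^2)
mkf_cost = L * M * d**2                   # O(L M d^2)
particle_smoother_cost = L * M * N * d**2  # O(L M N d^2)

speedup_vs_particle_smoother = particle_smoother_cost / ep_cost
speedup_vs_mkf = mkf_cost / ep_cost
```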
Experimental Results
EP outperforms particle smoothers in efficiency with comparable accuracy.
(Chen, Wang, Liu 2000)
[Figure: two panels of results plotted against signal-to-noise ratio]
Bayesian Networks for Adaptive Decoding
[Figure: Bayesian network over x1, ..., xT, y1, ..., yT, and e1, ..., eT]
The information bits $e_t$ are coded by a convolutional error-correcting encoder.
EP Outperforms Viterbi Decoding
[Figure: EP versus Viterbi decoding, plotted against signal-to-noise ratio]
Inference on Loopy Graphs
Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.
[Figure: 4x4 grid graph over nodes X1, ..., X16]
4-node Loopy Graph
[Figure: loopy graph over nodes x1, x2, x3, x4]
The joint distribution is the product of pairwise potentials over all edges:
$p(\mathbf{x}) = \prod_a f_a(\mathbf{x})$
Want to approximate $p(\mathbf{x})$ by a simpler distribution
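On a graph this small, the exact marginals can be computed by brute-force enumeration, which gives a reference for judging any approximation. The sketch assumes binary variables, a 4-cycle edge set, and random potentials; all three are hypothetical choices, since the figure's exact edges are not recoverable from the text.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # hypothetical 4-cycle
# One 2x2 table of positive pairwise potentials per edge.
potentials = {edge: rng.uniform(0.5, 2.0, size=(2, 2)) for edge in edges}

def exact_marginals(edges, potentials, n_nodes=4):
    """Single-node marginals of p(x) proportional to the product of edge potentials."""
    marginals = np.zeros((n_nodes, 2))
    for state in itertools.product([0, 1], repeat=n_nodes):
        weight = np.prod([potentials[(i, j)][state[i], state[j]] for i, j in edges])
        for node, value in enumerate(state):
            marginals[node, value] += weight
    # Each row sums to the partition function Z, so dividing normalizes it.
    return marginals / marginals.sum(axis=1, keepdims=True)

marginals = exact_marginals(edges, potentials)
```

Enumeration costs 2^n evaluations for n binary nodes, which is why it is only usable as a check on tiny graphs.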
BP vs. TreeEP
[Figure: the 4-node graph over x1, x2, x3, x4 alongside the approximating structures used by BP and by TreeEP]
Junction Tree Representation
[Figure: two examples, each showing p(x), q(x), and the corresponding junction tree]
Two Kinds of Edges
• On-tree edges, e.g., (x1,x4): exactly incorporated into the junction tree
• Off-tree edges, e.g., (x1,x2): approximated by projecting them onto the tree structure
[Figure: the 4-node graph over x1, x2, x3, x4, showing on-tree and off-tree edges]
KL Minimization
• KL minimization is done by moment matching
• Match single and pairwise marginals of the two distributions being compared
[Figure: the two distributions, each drawn as a graph over x1, x2, x3, x4]
• Reduces to exact inference on single loops
– Use cutset conditioning
Matching Marginals on Graph
(1) Incorporate edge (x3 x4)
(2) Incorporate edge (x6 x7)
[Figure: junction tree with cliques (x1 x2), (x1 x3), (x1 x4), (x3 x5), (x3 x6), (x5 x7), shown as each off-tree edge is incorporated]
Drawbacks of Global Propagation
• Update all the cliques even when only incorporating one off-tree edge
– Computationally expensive
• Store each off-tree data message as a whole tree
– Requires large memory size
Solution: Local Propagation
• Allow q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
• Exploit the junction tree representation: only locally propagate information within the minimal loop (subtree) that is directly connected to the off-tree edge.
– Reduce computational complexity
– Save memory
(1) Incorporate edge (x3 x4)
(2) Propagate evidence
(3) Incorporate edge (x6 x7)
[Figure: junction tree with cliques (x1 x2), (x1 x3), (x1 x4), (x3 x5), (x3 x6), (x5 x7), showing the subtree involved at each step]
On this simple graph, local propagation runs roughly twice as fast as plain EP and uses about half the memory to store messages.
New Interpretation of TreeEP
• Marry EP with the junction tree algorithm
• Can perform efficiently over hypertrees and hypernodes
4-node Graph
TreeEP = the proposed method
GBP = generalized belief propagation on triangles
TreeVB = variational tree
BP = loopy belief propagation = Factorized EP
MF = mean-field
Fully-connected graphs
Results are averaged over 10 graphs with randomly generated potentials
• TreeEP performs as well as or better than all other methods in both accuracy and efficiency!
8x8 grids, 10 trials
Method           FLOPS        Error
Exact            30,000       0
TreeEP           300,000      0.149
BP/double-loop   15,500,000   0.358
GBP              17,500,000   0.003
TreeEP versus BP and GBP
• TreeEP is always more accurate than BP and is often faster
• TreeEP is much more efficient than GBP and more accurate on some problems
• TreeEP converges more often than BP and GBP
Conclusions
• Extend EP on graphical models:
– Instead of minimizing KL divergence, use other sensible criteria to generate messages. Effectively turn any deterministic filtering method into a smoothing method.
– Use quadrature to approximate messages.
– Use local propagation to save computation and memory in tree-structured EP.
Conclusions
• Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency
[Figure: error versus computational time, comparing extended EP with state-of-the-art techniques]
Future Work
• More extensions of EP:
– How to choose a sensible approximation family (e.g. which tree structure)
– More flexible approximation: mixture of EP?
– Error bound?
– Bayesian conditional random fields
– EP for optimization (generalize max-product)
• More real-world applications, e.g., classification of gene expression data.
Classifying Colon Cancer Data by Predictive Automatic Relevance Determination
• The task: distinguish normal and cancer samples
• The dataset: 22 normal and 40 cancer samples with 2000 features per sample.
• The dataset was randomly split 100 times into 50 training and 12 testing samples.
• SVM results from Li et al. 2002
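The evaluation protocol above, 100 random splits of the 62 samples into 50 training and 12 testing samples, can be sketched as follows. Only the index bookkeeping is shown; the function name `random_splits` and the seed are hypothetical, and no classifier or data is included.

```python
import numpy as np

def random_splits(n_samples=62, n_train=50, n_splits=100, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        permutation = rng.permutation(n_samples)
        yield permutation[:n_train], permutation[n_train:]

splits = list(random_splits())
train_idx, test_idx = splits[0]
```

Averaging the test error over all splits reduces the dependence of the result on any single partition of such a small dataset.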