Approximate Bayesian Inference I: Structural Approximations
Pattern Recognition and Machine Learning, Chapter 10
Falk Lieder, December 2, 2010
Statistical Inference

[Diagram: hidden states $Z$ generate observations $X$; statistical inference inverts this to obtain the posterior belief $P(Z\mid X)$ and expectations $\mathbb{E}[f(Z)\mid X]$.]
When Do You Need Approximations?

The problem with Bayes' theorem is that it often leads to integrals that you don't know how to solve:
1. No analytic solution for the evidence $p(X)$.
2. No analytic solution for expectations $\mathbb{E}[f(Z)\mid X]$.
3. In the discrete case, computing $p(X)$ by summation has complexity exponential in the number of hidden variables.
4. Sequential learning for non-conjugate priors.
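In symbols, the quantities involved are (standard Bayes' theorem, stated here for reference):

$$p(Z\mid X) = \frac{p(X\mid Z)\,p(Z)}{p(X)}, \qquad p(X) = \int p(X\mid Z)\,p(Z)\,dZ, \qquad \mathbb{E}[f(Z)\mid X] = \int f(Z)\,p(Z\mid X)\,dZ.$$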
How to Approximate?

1. Stochastic approximation (samples): approximate the density by a histogram of samples; approximate expectations by sample averages.
2. Structural approximation: approximate the posterior by a density of a given (tractable) form, whose evidence and expectations are easy to compute.
3. Numerical integration: approximate the integrals numerically, (a) the evidence $p(X)$ and (b) expectations. Infeasible if $Z$ is high-dimensional (a grid with $N$ points per dimension requires $N^D$ evaluations).
How to Approximate? Trade-Offs

Structural approximations (variational inference):
+ fast to compute
+ efficient representation
+ learning rules give insight
− systematic error
− application often requires mathematical derivations

Stochastic approximations (Monte Carlo methods, sampling):
+ asymptotically exact
+ easily applicable general-purpose algorithms
− time-intensive
− storage-intensive
Variational Inference: An Intuition

[Figure: the space of probability distributions, with a tractable target family as a subset; the VB approximation is the member of the target family closest to the true posterior, where closeness is measured by the KL divergence.]
What Does Closest Mean?
Intuition: Closest means minimal additional surprise on average.
Kullback-Leibler (KL) divergence measures average additional surprise.
KL[p||q] measures how much less accurate the belief q is than p, if p is the true belief.
Equivalently, KL[p||q] is the largest reduction in average surprise you can achieve (namely by replacing q with the true belief p).
KL-Divergence Illustration
$$\mathrm{KL}\big[\,p(\cdot\mid X)\;\big\|\;q\,\big] := \int p(Z\mid X)\,\ln\!\left(\frac{p(Z\mid X)}{q(Z)}\right) dZ$$
Properties of the KL-Divergence

1. Zero iff both arguments are identical: $\mathrm{KL}[p\|q] = 0 \Leftrightarrow p = q$.
2. Greater than zero if they differ: $p \neq q \Rightarrow \mathrm{KL}[p\|q] > 0$.

Disadvantage: the KL divergence is not a metric (distance function), because
a) it is not symmetric, $\mathrm{KL}[p\|q] \neq \mathrm{KL}[q\|p]$ in general, and
b) it does not satisfy the triangle inequality.
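A quick numeric check of these properties, using the closed-form KL divergence between two univariate Gaussians (a standard identity; the example values are arbitrary):

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL[N(m1, s1^2) || N(m2, s2^2)], closed form."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

print(kl_gauss(0, 1, 0, 1))  # 0.0  -> zero iff the arguments are identical
print(kl_gauss(0, 1, 2, 3))  # ~0.88 -> positive for different distributions
print(kl_gauss(2, 3, 0, 1))  # ~4.90 -> differs from the line above: not symmetric
```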
How to Find the Closest Target Density?

• Intuition: minimize the distance to the true posterior.
• Implementations:
  – Variational Bayes: minimize $\mathrm{KL}[q\|p]$.
  – Expectation Propagation: minimize $\mathrm{KL}[p\|q]$.
• Arbitrariness:
  – Different measures → different algorithms & different results.
  – Alternative schemes are being developed, e.g. the Jaakkola-Jordan variational method and Kikuchi approximations.
Minimizing Functionals

• The KL divergence is a functional: it maps a function (a density $q$) to a real number.

Minimizing functions (calculus) vs. minimizing functionals (variational calculus):
• Functions map vectors to real numbers; functionals map functions to real numbers.
• Derivative: change of $f(x)$ for infinitesimal changes in $x$. Minimize by finding the root of the derivative.
• Functional derivative: change of $F(q)$ for infinitesimal changes in the function $q$. Minimize by finding the root of the functional derivative.
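A standard example of a functional derivative (not from the slides), for the negative-entropy functional that appears inside the free energy:

$$F[q] = \int q(z)\,\ln q(z)\,dz \quad\Longrightarrow\quad \frac{\delta F}{\delta q(z)} = \ln q(z) + 1.$$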
VB and the Free-Energy

• Variational Bayes: $\hat{q} = \arg\min_q \mathrm{KL}[q \,\|\, p(\cdot\mid X)]$.
• Problem: you can't evaluate the KL divergence, because you can't evaluate the posterior.
• Solution: rewrite it in terms of the free energy,
$$\mathrm{KL}[q \,\|\, p(\cdot\mid X)] = \ln p(X) - \mathcal{F}(q) = \mathrm{const} - \mathcal{F}(q).$$
• Conclusion: you can maximize the free energy $\mathcal{F}(q)$ (Bishop's lower bound $\mathcal{L}(q)$) instead.
VB: Minimizing the KL-Divergence is Equivalent to Maximizing the Free-Energy

$$\ln p(X) = \mathcal{F}(q) + \mathrm{KL}\big[q \,\big\|\, p(\cdot\mid X)\big]$$

[Figure: the fixed quantity $\ln p(X)$ decomposed into the free energy $\mathcal{F}(q)$ and the KL divergence; raising $\mathcal{F}(q)$ necessarily lowers $\mathrm{KL}[q\|p]$.]
Constrained Free-Energy Maximization

Intuition:
• Maximize a lower bound on the log model evidence.
• Maximization is restricted to tractable target densities.

Definition:
$$\mathcal{F}(q) := \int q(Z)\,\ln\!\left(\frac{p(X,Z)}{q(Z)}\right) dZ$$

Properties:
• The free energy is maximal for the true posterior: $\mathcal{F}(q) \le \ln p(X)$, with equality iff $q = p(\cdot\mid X)$.
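The decomposition above follows in one line from the definition (as in PRML, Section 10.1), using $p(X,Z) = p(Z\mid X)\,p(X)$:

$$\mathcal{F}(q) = \int q(Z)\,\ln\!\left(\frac{p(Z\mid X)\,p(X)}{q(Z)}\right) dZ = \ln p(X) - \mathrm{KL}\big[q\,\big\|\,p(\cdot\mid X)\big] \;\le\; \ln p(X).$$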
Variational Approximations

1. Factorial approximations (meanfield):
   – independence assumption $q(Z) = \prod_i q_i(Z_i)$
   – optimization with respect to the factor densities
   – no restriction on the functional form of the factors
2. Approximation by parametric distributions:
   – optimization w.r.t. the parameters
3. Variational approximations for model comparison:
   – variational approximation of the log model evidence
Meanfield Approximation

Goal:
1. Rewrite $\mathcal{F}(q)$ as a functional of a single factor $q_j$ and optimize.
2. Optimize separately for each factor $q_j$.

Step 1: With $q(Z) = \prod_i q_i(z_i)$, the first term of the free energy becomes

$$\int q_j(z_j) \left( \int \prod_{i\neq j} q_i(z_i)\,\ln p(X,Z)\; dz_1 \cdots dz_{j-1}\, dz_{j+1} \cdots dz_K \right) dz_j = \int q_j(z_j)\,\ln \tilde{p}(X, z_j)\, dz_j + \mathrm{const},$$

where $\ln \tilde{p}(X, z_j) := \mathbb{E}_{i\neq j}[\ln p(X,Z)] + \mathrm{const}$.
Meanfield Approximation, Step 1 (continued)

The second (entropy) term decomposes as

$$\int q(Z)\,\sum_i \ln q_i(z_i)\; dz_1 \cdots dz_K = \int q_j(z_j)\,\ln q_j(z_j)\, dz_j + \mathrm{const},$$

since the terms with $i \neq j$ do not depend on $q_j$. Combining both terms:

$$\mathcal{F}(q_j) = \int q_j(z_j)\,\ln \tilde{p}(X, z_j)\, dz_j - \int q_j(z_j)\,\ln q_j(z_j)\, dz_j + \mathrm{const}.$$
Meanfield Approximation, Step 2

Notice that $\mathcal{F}(q_j) = -\mathrm{KL}\big[q_j\,\big\|\,\tilde{p}(X,\cdot)\big] + \mathrm{const}$, which is maximal when $q_j = \tilde{p}$. Hence

$$\hat{q}_j = \arg\max_{q_j} \mathcal{F}(q_j) = \tilde{p}(X, z_j) = \exp\big(\mathbb{E}_{i\neq j}[\ln p(X,Z)] + \mathrm{const}\big).$$

The constant is fixed by normalization, because $\hat{q}_j$ has to integrate to one:

$$\hat{q}_j(z_j) = \frac{\exp\big(\mathbb{E}_{i\neq j}[\ln p(X,Z)]\big)}{\int \exp\big(\mathbb{E}_{i\neq j}[\ln p(X,Z)]\big)\, dz_j}.$$
Meanfield Example

True distribution: $p(z) = \mathcal{N}(z \mid \mu, \Lambda^{-1})$ with $z = (z_1, z_2)$.
Target family: $q(z) = q_1(z_1)\, q_2(z_2)$.
VB meanfield solution:
1. $\ln \hat{q}_1(z_1) = \mathbb{E}_{z_2}[\ln p(z)] = -\tfrac{1}{2}\Lambda_{11} z_1^2 + z_1\big(\Lambda_{11}\mu_1 - \Lambda_{12}(\mathbb{E}[z_2] - \mu_2)\big) + \mathrm{const}$.
2. Hence $\hat{q}_1(z_1) = \mathcal{N}(z_1 \mid m_1, \Lambda_{11}^{-1})$ with $m_1 = \mu_1 - \Lambda_{11}^{-1}\Lambda_{12}(\mathbb{E}[z_2] - \mu_2)$.
3. By symmetry, $\hat{q}_2(z_2) = \mathcal{N}(z_2 \mid m_2, \Lambda_{22}^{-1})$ with $m_2 = \mu_2 - \Lambda_{22}^{-1}\Lambda_{21}(\mathbb{E}[z_1] - \mu_1)$.
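A minimal numeric sketch of this example (the correlation value 0.9 is an arbitrary choice for illustration): iterating the two mean updates and comparing the factor variances with the true marginal variances.

```python
import numpy as np

# True posterior: a strongly correlated bivariate Gaussian N(mu, inv(Lambda)).
mu = np.array([0.0, 0.0])
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lam = np.linalg.inv(Sigma)  # precision matrix Lambda

# Meanfield factors q_j(z_j) = N(m_j, 1/Lam[j, j]); only the means are coupled.
m = np.array([1.0, -1.0])   # arbitrary initialization of the factor means
for _ in range(50):         # coordinate ascent on the free energy
    m[0] = mu[0] - Lam[0, 1] * (m[1] - mu[1]) / Lam[0, 0]
    m[1] = mu[1] - Lam[1, 0] * (m[0] - mu[0]) / Lam[1, 1]

print("factor means:", m)                          # converge to the true mean (0, 0)
print("factor variances:", 1 / np.diag(Lam))       # 1 - rho^2 = 0.19 each
print("true marginal variances:", np.diag(Sigma))  # 1.0 each -> q is too compact
```

The printed variances illustrate the observation on the next slide: the meanfield factors capture the true means but underestimate the marginal variances.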
Meanfield Example

Observation: the VB approximation is more compact than the true density.

Reason: $\mathrm{KL}[q\|p]$ does not penalize deviations where $q$ is close to 0.

[Figure: contours of the correlated true density and of the narrower, axis-aligned meanfield approximation.]

Unreasonable independence assumptions → poor approximation.
KL[q||p] vs. KL[p||q]

Variational Bayes (minimizes KL[q||p]):
• analytically easier
• approximation is more compact

Expectation Propagation (minimizes KL[p||q]):
• more involved
• approximation is wider
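A small numeric illustration of the two behaviors (the bimodal target and the grid search are my own construction, not from the slides): fitting a single Gaussian to a two-mode density under each KL direction.

```python
import numpy as np

# Bimodal "true posterior": equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
z = np.linspace(-8, 8, 4001)
dz = z[1] - z[0]

def gauss(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * gauss(z, -2, 0.5) + 0.5 * gauss(z, 2, 0.5)

def kl(a, b):
    """Discretized KL[a||b] on the grid (b clipped to avoid log(0))."""
    mask = a > 1e-12
    return np.sum(a[mask] * np.log(a[mask] / np.maximum(b[mask], 1e-300))) * dz

# Grid search over single-Gaussian candidates q = N(m, s^2).
best_qp = best_pq = (np.inf, 0.0, 1.0)
for m in np.linspace(-3, 3, 121):
    for s in np.linspace(0.2, 4, 96):
        q = gauss(z, m, s)
        d = kl(q, p)                        # VB direction
        if d < best_qp[0]: best_qp = (d, m, s)
        d = kl(p, q)                        # EP direction
        if d < best_pq[0]: best_pq = (d, m, s)

print("argmin KL[q||p] (VB): m=%.2f, s=%.2f" % best_qp[1:])  # locks onto one mode
print("argmin KL[p||q] (EP): m=%.2f, s=%.2f" % best_pq[1:])  # covers both modes
```

Minimizing KL[q||p] yields a compact Gaussian sitting on one mode (zero-forcing), while minimizing KL[p||q] yields a wide Gaussian spanning both modes (moment matching).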
2. Parametric Approximations

• Problem: you don't know how to integrate prior times likelihood.
• Solution (see the sketch below):
  – Approximate $p(Z\mid X)$ by a parametric density $q(Z; \theta)$.
  – The KL divergence and the free energy become functions of the parameters $\theta$.
  – Apply standard optimization techniques.
  – Setting derivatives to zero → one equation per parameter.
  – Solve the system of equations by iterative updating.
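In symbols, the recipe amounts to maximizing the free energy as an ordinary function of $\theta$:

$$\mathcal{F}(\theta) = \mathbb{E}_{q(\cdot;\theta)}\big[\ln p(X,Z)\big] - \mathbb{E}_{q(\cdot;\theta)}\big[\ln q(Z;\theta)\big], \qquad \frac{\partial \mathcal{F}}{\partial \theta_k} = 0 \;\text{ for each } k.$$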
Parametric Approximation Example

Goal: learn the reward probability $p$.
• Likelihood: Bernoulli observations $X \in \{0, 1\}$ (reward / no reward) driven by a latent variable $Z \in \mathbb{R}$.
• Prior: a density over $Z$, not conjugate to the likelihood.
• Posterior: $\propto$ likelihood $\times$ prior, with no closed form.

Problem: you cannot derive a learning rule for the expected reward and its variance, because
a) there is no analytic formula for the expected reward probability, and
b) the form of the prior changes with every observation.

Solution: approximate the posterior by a Gaussian.
Solution

Solve $\partial \mathcal{F}(m, s)/\partial m = 0$ and $\partial \mathcal{F}(m, s)/\partial s = 0$ for the mean $m$ and standard deviation $s$ of the Gaussian approximation $q(Z) = \mathcal{N}(Z \mid m, s^2)$.
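The slides do not spell out the exact likelihood and prior, so the sketch below assumes $p(X{=}1\mid Z) = \sigma(Z)$ with a Gaussian prior on $Z$ (consistent with $Z \in \mathbb{R}$, $X \in \{0,1\}$), and maximizes the free energy numerically instead of solving the stationarity equations analytically:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed model (the slides leave the exact forms open): latent Z in R,
#   prior       p(Z)       = N(Z | m0, s0^2)
#   likelihood  p(X=1 | Z) = sigmoid(Z),  rewards X in {0, 1}
# Each trial's posterior is approximated by q(Z) = N(Z | m, s^2), found by
# maximizing the free energy F(m, s) = E_q[ln p(x, Z)] + H[q].

nodes, weights = np.polynomial.hermite.hermgauss(40)  # Gauss-Hermite rule

def neg_free_energy(params, x, m0, s0):
    m, log_s = params
    s = np.exp(log_s)                 # optimize log(s) so that s stays positive
    z = m + np.sqrt(2.0) * s * nodes  # quadrature points for E_{N(m, s^2)}[.]
    log_prior = -0.5 * ((z - m0) / s0) ** 2 - np.log(s0 * np.sqrt(2 * np.pi))
    log_lik = x * z - np.logaddexp(0.0, z)  # Bernoulli log-likelihood, sigmoid link
    e_log_joint = np.sum(weights * (log_prior + log_lik)) / np.sqrt(np.pi)
    entropy = 0.5 * np.log(2 * np.pi * np.e) + log_s  # entropy of N(m, s^2)
    return -(e_log_joint + entropy)

# Sequential learning: today's Gaussian posterior is tomorrow's prior.
m, s = 0.0, 2.0
for x in [1, 1, 0, 1, 1, 1]:
    res = minimize(neg_free_energy, x0=[m, np.log(s)], args=(x, m, s))
    m, s = res.x[0], np.exp(res.x[1])
    print(f"x={x}: E[p_reward] ~ sigmoid(m) = {1 / (1 + np.exp(-m)):.3f}, s = {s:.3f}")
```

Because the approximate posterior has the same Gaussian form as the prior, it can be fed back in as the prior for the next trial, which is exactly the sequential learning scheme the next slide refers to.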
Result: A Global Approximation

[Figure: the true posterior together with its Laplace and Variational Bayes approximations; VB fits the posterior globally rather than only locally around the mode.]

• Learning rules for the expected reward probability and the uncertainty about it.
• Sequential learning algorithm.
VB for Bayesian Model Selection

• Model posterior: $p(m\mid X) \propto p(X\mid m)\,p(m)$. Hence, if $p(m)$ is uniform, $p(m\mid X) \propto p(X\mid m)$.
• Problem: the model evidence $p(X\mid m)$ is "intractable".
• Solution: approximate $\ln p(X\mid m)$ by the maximized free energy $\mathcal{F}_m(\hat{q})$.
• Justification: if $\mathrm{KL}[\hat{q} \,\|\, p(\cdot\mid X, m)] \approx 0$, then $\mathcal{F}_m(\hat{q}) \approx \ln p(X\mid m)$.
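This reuses the earlier free-energy decomposition, now applied per model $m$:

$$\ln p(X\mid m) = \mathcal{F}_m(q) + \mathrm{KL}\big[q\,\big\|\,p(\cdot\mid X, m)\big] \quad\Longrightarrow\quad p(m\mid X) \propto p(m)\,e^{\ln p(X\mid m)} \approx p(m)\,e^{\mathcal{F}_m(\hat{q})}.$$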
Summary

Approximate Bayesian inference via structural approximations:
• Variational Bayes (ensemble learning)
  – Meanfield approximations
  – Parametric approximations
• Applications: learning rules, model selection