Page 1: Approximate Bayesian Inference  I:

Approximate Bayesian Inference I:

PATTERN RECOGNITION AND MACHINE LEARNING, CHAPTER 10

FALK LIEDER, DECEMBER 2, 2010

Structural Approximations

Page 2: Approximate Bayesian Inference  I:

Statistical Inference

Hidden states $Z$ → observations $X$: statistical inference inverts the generative process to obtain the posterior belief $P(Z \mid X)$ and posterior expectations $\mathbb{E}[f(Z) \mid X]$.

Page 3: Approximate Bayesian Inference  I:

When Do You Need Approximations?

The problem with Bayes' theorem is that it often leads to integrals that you don't know how to solve:
1. No analytic solution for the posterior $p(Z \mid X) = p(X, Z)\,/\,p(X)$
2. No analytic solution for expectations $\mathbb{E}[f(Z) \mid X]$
3. In the discrete case, computing the evidence $p(X)$ has complexity exponential in the number of hidden variables
4. Sequential learning for non-conjugate priors

Page 4: Approximate Bayesian Inference  I:

How to Approximate?

1. Stochastic approximation (samples):
– Approximate the density by a histogram of samples
– Approximate expectations by sample averages

2. Structural approximation:
– Approximate by a density of a given (tractable) form
– Evidence and expectations of the approximating density are easy to compute

3. Numerical integration:
– Approximate the integrals numerically: a) the evidence $p(X)$, b) expectations
– Infeasible if $Z$ is high-dimensional
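To make the two generic strategies concrete, here is a minimal Python sketch for a toy expectation; the density and test function are arbitrary illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (illustrative): Z ~ N(0, 1); estimate E[f(Z)] for f(z) = z**2.
# The true value is Var(Z) = 1.
def f(z):
    return z ** 2

# 1) Stochastic approximation: sample average over draws from p(z).
samples = rng.standard_normal(100_000)
mc_estimate = f(samples).mean()

# 2) Numerical integration: quadrature on a grid. Fine in 1D, but the number
#    of grid points grows exponentially with the dimension of Z.
z = np.linspace(-8.0, 8.0, 4001)
dz = z[1] - z[0]
pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
grid_estimate = np.sum(f(z) * pdf) * dz

print(mc_estimate, grid_estimate)  # both close to 1
```

A structural approximation would instead replace $p$ itself by a tractable density $q$ and compute the expectation under $q$ in closed form; that is the route the remaining slides take.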

Page 5: Approximate Bayesian Inference  I:

How to Approximate?

Structural Approximations (Variational Inference)

+ Fast to compute
+ Efficient representation
+ Learning rules give insight
− Systematic error
− Application often requires mathematical derivations

Stochastic Approximations (Monte Carlo methods, sampling)

+ Asymptotically exact
+ Easily applicable general-purpose algorithms
− Time-intensive
− Storage-intensive

Page 6: Approximate Bayesian Inference  I:

Variational Inference—An Intuition

[Figure: within the space of probability distributions, the target family is a subset; the VB approximation is the member of the target family with minimal KL-divergence to the true posterior.]

Page 7: Approximate Bayesian Inference  I:

What Does Closest Mean?

Intuition: Closest means minimal additional surprise on average.

Kullback-Leibler (KL) divergence measures average additional surprise.

KL[p||q] measures how much less accurate the belief q is than p, if p is the true belief.

KL[p||q] is the largest reduction in average surprise that you can achieve, namely by replacing the belief q with the true belief p.
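In symbols (a standard identity, added for clarity): measuring the surprise of an outcome as its negative log-probability, the average additional surprise from holding belief $q$ when $p$ is the true belief is exactly the KL-divergence:

```latex
\mathbb{E}_p\!\left[-\ln q(Z)\right] - \mathbb{E}_p\!\left[-\ln p(Z)\right]
 = \int p(z)\,\ln\frac{p(z)}{q(z)}\,dz
 = \mathrm{KL}[p \,\|\, q] \;\ge\; 0 .
```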

Page 8: Approximate Bayesian Inference  I:

KL-Divergence Illustration

$\mathrm{KL}[\,p(\cdot \mid X) \,\|\, q\,] := \int p(Z \mid X)\,\ln\!\left(\frac{p(Z \mid X)}{q(Z)}\right) dZ$
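A quick numerical sanity check of this definition (an illustrative sketch; the two Gaussian densities are arbitrary choices, not from the slides):

```python
import numpy as np

# Two arbitrary Gaussian beliefs (illustrative choices, not from the slides):
# p = N(0, 1) plays the role of the posterior, q = N(1, 2^2) the approximation.
def normal_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

z = np.linspace(-12.0, 12.0, 20001)   # integration grid
dz = z[1] - z[0]
p = normal_pdf(z, 0.0, 1.0)
q = normal_pdf(z, 1.0, 2.0)

# KL[p||q] = integral of p(z) * ln(p(z) / q(z)) dz, approximated on the grid.
kl_pq = np.sum(p * np.log(p / q)) * dz
kl_qp = np.sum(q * np.log(q / p)) * dz

print(f"KL[p||q] = {kl_pq:.4f}")   # ~0.443: extra average surprise from using q
print(f"KL[q||p] = {kl_qp:.4f}")   # ~1.307: not the same value
```

The two printed values differ, anticipating the asymmetry discussed on the next slide.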

Page 9: Approximate Bayesian Inference  I:

Properties of the KL-Divergence

1. Zero iff both arguments are identical: $\mathrm{KL}[p \,\|\, q] = 0 \iff p = q$
2. Greater than zero if they differ: $p \neq q \Rightarrow \mathrm{KL}[p \,\|\, q] > 0$

Disadvantage: The KL-divergence is not a metric (distance function), because
a) it is not symmetric: in general $\mathrm{KL}[p \,\|\, q] \neq \mathrm{KL}[q \,\|\, p]$, and
b) it does not satisfy the triangle inequality.

Page 10: Approximate Bayesian Inference  I:

How to Find the Closest Target Density?

• Intuition: Minimize the distance to the true posterior
• Implementations:
– Variational Bayes: Minimize KL[q||p]
– Expectation Propagation: Minimize KL[p||q]
• Arbitrariness:
– Different measures ⇒ different algorithms & different results
– Alternative schemes are being developed, e.g. the Jaakkola-Jordan variational method, Kikuchi approximations

Page 11: Approximate Bayesian Inference  I:

Minimizing Functionals

• The KL-divergence is a functional: it maps a density $q$ to a real number.

Minimizing Functions (Calculus):
– Functions map vectors to real numbers: $f : \mathbb{R}^n \to \mathbb{R}$
– Derivative: change of $f$ for infinitesimal changes in $x$
– Find the root of the derivative

Minimizing Functionals (Variational Calculus):
– Functionals map functions to real numbers: $F : q \mapsto F(q) \in \mathbb{R}$
– Functional derivative: change of $F$ for infinitesimal changes in $q$
– Find the root of the functional derivative
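A concrete instance (added for illustration, not from the slides): the differential entropy is a functional of $q$, and its functional derivative follows by perturbing $q$ in the direction of a test function $\eta$:

```latex
H[q] = -\int q(z)\,\ln q(z)\,dz,
\qquad
\frac{d}{d\epsilon}\,H[q + \epsilon \eta]\Big|_{\epsilon = 0}
 = -\int \eta(z)\,\bigl(\ln q(z) + 1\bigr)\,dz
\;\Rightarrow\;
\frac{\delta H}{\delta q(z)} = -\ln q(z) - 1 .
```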

Page 12: Approximate Bayesian Inference  I:

VB and the Free-Energy

Variational Bayes: approximate the posterior by the closest member of the target family, $\hat{q} = \arg\min_q \mathrm{KL}[\,q \,\|\, p(\cdot \mid X)\,]$.

Problem: You can't evaluate the KL-divergence, because you can't evaluate the posterior.
Solution: Rewrite it in terms of the free-energy:

$\mathrm{KL}[q \,\|\, p] = \ln p(X) - \mathcal{F}(q) = \text{const} - \mathcal{F}(q)$

Conclusion:
• You can maximize the free-energy instead.

Page 13: Approximate Bayesian Inference  I:

VB: Minimizing KL-Divergence is equivalent to Maximizing Free-Energy

$\ln p(X) = \mathcal{F}(q) + \mathrm{KL}[q \,\|\, p]$

[Figure: the fixed log evidence $\ln p(X)$ split into the lower bound $\mathcal{F}(q)$ and the gap $\mathrm{KL}[q \,\|\, p]$.]
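This decomposition follows in two steps from $p(X, Z) = p(Z \mid X)\,p(X)$ and $\int q(Z)\,dZ = 1$ (standard, as in PRML §10.1):

```latex
\ln p(X)
 = \int q(Z)\,\ln p(X)\,dZ
 = \int q(Z)\,\ln\frac{p(X, Z)}{p(Z \mid X)}\,dZ
 = \underbrace{\int q(Z)\,\ln\frac{p(X, Z)}{q(Z)}\,dZ}_{\mathcal{F}(q)}
 \;+\; \underbrace{\int q(Z)\,\ln\frac{q(Z)}{p(Z \mid X)}\,dZ}_{\mathrm{KL}[q \,\|\, p]}
```

Because the left-hand side does not depend on $q$, increasing $\mathcal{F}(q)$ necessarily decreases $\mathrm{KL}[q \,\|\, p]$.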

Page 14: Approximate Bayesian Inference  I:

Constrained Free-Energy Maximization

Intuition:
• Maximize a lower bound on the log model evidence
• Maximization is restricted to tractable target densities

Definition:

$\mathcal{F}(q) := \int q(Z)\,\ln\!\left(\frac{p(X, Z)}{q(Z)}\right) dZ$

Properties:
• The free-energy is maximal for the true posterior: $\mathcal{F}(q) \leq \ln p(X)$, with equality iff $q = p(\cdot \mid X)$.

Page 15: Approximate Bayesian Inference  I:

Variational Approximations

1. Factorial Approximations (Meanfield)
– Independence assumption: $q(Z) = \prod_i q_i(Z_i)$
– Optimization with respect to the factor densities $q_i$
– No restriction on the functional form of the factors

2. Approximation by Parametric Distributions
– Optimization w.r.t. the parameters

3. Variational Approximations for Model Comparison
– Variational approximation of the log model evidence

Page 16: Approximate Bayesian Inference  I:

Meanfield Approximation

Goal:
1. Rewrite $\mathcal{F}(q)$ as a function of a single factor $q_j$ and optimize.
2. Optimize separately for each factor.

Step 1: Insert $q(Z) = \prod_i q_i(z_i)$ into $\mathcal{F}(q)$. The $\ln p(X, Z)$ term becomes

$\int q_j(z_j) \left( \int \prod_{i \neq j} q_i(z_i)\,\ln p(X, Z)\; dz_1 \cdots dz_{j-1}\, dz_{j+1} \cdots dz_K \right) dz_j$

where the inner integral defines $\ln \tilde{p}(X, Z_j) + \text{const} := \mathbb{E}_{i \neq j}[\ln p(X, Z)]$.

Page 17: Approximate Bayesian Inference  I:

Meanfield Approximation, Step 1

The entropy term of $\mathcal{F}(q)$ splits across the factors:

$\int q(Z)\,\ln q_j(z_j)\; dz_1 \cdots dz_K \;+\; \int q(Z) \sum_{i \neq j} \ln q_i(z_i)\; dz_1 \cdots dz_K$

The second term is constant with respect to $q_j$, and the first reduces to $\int q_j(z_j)\,\ln q_j(z_j)\; dz_j$ because the other factors integrate to one. Hence

$\mathcal{F}(q_j) = \int q_j(z_j)\,\ln \tilde{p}(X, Z_j)\; dz_j \;-\; \int q_j(z_j)\,\ln q_j(z_j)\; dz_j \;+\; \text{const}$

Page 18: Approximate Bayesian Inference  I:

Meanfield Approximation, Step 2

Notice that $\mathcal{F}(q_j) = -\mathrm{KL}[\,q_j \,\|\, \tilde{p}(X, Z_j)\,] + \text{const}$, which is maximal when the KL-divergence vanishes:

$\hat{q}_j = \arg\max_{q_j} \mathcal{F}(q_j) = \tilde{p}(X, Z_j) = \exp\!\big(\mathbb{E}_{i \neq j}[\ln p(X, Z)] + \text{const}\big)$

The constant must normalize the density, because $\hat{q}_j$ has to integrate to one. Hence

$\hat{q}_j(z_j) = \frac{\exp\!\big(\mathbb{E}_{i \neq j}[\ln p(X, Z)]\big)}{\int \exp\!\big(\mathbb{E}_{i \neq j}[\ln p(X, Z)]\big)\, dz_j}$

Page 19: Approximate Bayesian Inference  I:

Meanfield Example

True Distribution: $p(z) = \mathcal{N}(z \mid \mu, \Lambda^{-1})$ with $z = (z_1, z_2)$
Target Family: $q(z) = q_1(z_1)\,q_2(z_2)$
VB meanfield solution:
1. $\ln \hat{q}_1(z_1) = \mathbb{E}_{z_2}[\ln p(z)] = -\tfrac{1}{2}\Lambda_{11} z_1^2 + z_1\big(\Lambda_{11}\mu_1 - \Lambda_{12}(\mathbb{E}[z_2] - \mu_2)\big) + \text{const}$
2. Hence, $\hat{q}_1(z_1) = \mathcal{N}(z_1 \mid m_1, \Lambda_{11}^{-1})$ with $m_1 = \mu_1 - \Lambda_{11}^{-1}\Lambda_{12}(\mathbb{E}[z_2] - \mu_2)$
3. By symmetry, $\hat{q}_2(z_2) = \mathcal{N}(z_2 \mid m_2, \Lambda_{22}^{-1})$ with $m_2 = \mu_2 - \Lambda_{22}^{-1}\Lambda_{21}(\mathbb{E}[z_1] - \mu_1)$
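A minimal numerical sketch of these coordinate updates (the precision matrix and initialization are assumed values for illustration):

```python
import numpy as np

# Assumed example values (illustrative): mean and precision of the true Gaussian.
mu = np.array([0.0, 0.0])
Lam = np.array([[2.0, 1.2],
                [1.2, 2.0]])   # precision matrix Lambda, positive definite

# Coordinate-wise mean-field updates derived above:
# m1 = mu1 - Lam12 * (E[z2] - mu2) / Lam11, and symmetrically for m2.
m = np.array([1.0, -1.0])      # arbitrary initialization of (E[z1], E[z2])
for _ in range(50):
    m[0] = mu[0] - Lam[0, 1] * (m[1] - mu[1]) / Lam[0, 0]
    m[1] = mu[1] - Lam[1, 0] * (m[0] - mu[0]) / Lam[1, 1]

Sigma = np.linalg.inv(Lam)     # true covariance, for comparison
print("mean-field means:", m)                                   # converge to mu
print("true marginal variances:", Sigma[0, 0], Sigma[1, 1])
print("mean-field variances:", 1 / Lam[0, 0], 1 / Lam[1, 1])    # smaller
```

At the fixed point the means match $\mu$, while the printed mean-field variances $1/\Lambda_{ii}$ are smaller than the true marginal variances, which is exactly the compactness observed on the next slide.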

Page 20: Approximate Bayesian Inference  I:

Meanfield Example

Observation: The VB approximation is more compact than the true density.

Reason: KL[q||p] does not penalize deviations where $q$ is close to 0, because

$\mathrm{KL}[q \,\|\, p] = \int q(Z)\,\ln\!\left(\frac{q(Z)}{p(Z \mid X)}\right) dZ$

[Figure: contours of the true density vs. the more compact factorized approximation.]

Unreasonable assumptions ⇒ poor approximation.

Page 21: Approximate Bayesian Inference  I:

KL[q||p] vs. KL[p||q]

Variational Bayes (minimizes KL[q||p]):
• Analytically easier
• Approximation is more compact

Expectation Propagation (minimizes KL[p||q]):
• More involved
• Approximation is wider

Page 22: Approximate Bayesian Inference  I:

2. Parametric Approximations

• Problem: – You don’t know how to integrate prior times likelihood.

• Solution (see the sketch below):
– Approximate the posterior by a parametric density $q(Z \mid \theta)$.
– KL-divergence and free-energy become functions of the parameters $\theta$.
– Apply standard optimization techniques.
– Setting derivatives to zero ⇒ one equation per parameter.
– Solve the system of equations by iterative updating.
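In symbols (a compact restatement, not verbatim from the slides): with a parametric family $q(Z \mid \theta)$, the variational problem becomes ordinary optimization over $\theta$:

```latex
\mathcal{F}(\theta)
  = \mathbb{E}_{q(\cdot \mid \theta)}\!\left[\ln p(X, Z)\right]
  - \mathbb{E}_{q(\cdot \mid \theta)}\!\left[\ln q(Z \mid \theta)\right],
\qquad
\nabla_{\theta}\,\mathcal{F}(\theta) = 0 .
```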

Page 23: Approximate Bayesian Inference  I:

Parametric Approximation Example

Goal: Learn the reward probability $Z$ from binary observations $X \in \{0, 1\}$ (generative model: hidden $Z$ → observed $X$).
• Likelihood: $P(X = 1 \mid Z) = Z$
• Prior:
• Posterior:

Problem: You cannot derive a learning rule for the expected reward and its variance, because…
a) there is no analytic formula for the expected reward probability
b) the form of the prior changes with every observation

Solution: Approximate the posterior by a Gaussian (see the sketch below).
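A hedged sketch of this idea: the slides' exact model is not recoverable from the transcript, so this assumes a Gaussian prior on the log-odds of the reward probability (non-conjugate with the Bernoulli likelihood) and made-up observations, then fits the Gaussian $q$ by numerically maximizing the free-energy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_expit  # numerically stable log sigmoid

# Assumptions (hypothetical, for illustration): a Gaussian prior N(mu0, s0^2)
# on the log-odds x of the reward probability, and made-up binary rewards.
mu0, s0 = 0.0, 1.5
rewards = [1, 1, 0, 1, 1, 0, 1]

def log_joint(x):
    """Unnormalized log posterior ln p(x, data): Gaussian prior + Bernoulli terms."""
    loglik = sum(log_expit(x) if r else log_expit(-x) for r in rewards)
    return loglik - 0.5 * ((x - mu0) / s0) ** 2

# Free-energy F(m, s) = E_q[ln p(x, data)] + entropy of q, for q = N(m, s^2);
# the expectation is computed by Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite_e.hermegauss(40)

def neg_free_energy(params):
    m, log_s = params                 # optimize log(s) to keep s positive
    s = np.exp(log_s)
    e_log_joint = np.sum(weights * log_joint(m + s * nodes)) / np.sqrt(2.0 * np.pi)
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * s ** 2)
    return -(e_log_joint + entropy)   # minimize the negative free-energy

m_hat, log_s_hat = minimize(neg_free_energy, x0=[0.0, 0.0]).x
print(f"q(x) = N({m_hat:.3f}, {np.exp(log_s_hat):.3f}^2) on the log-odds scale")
```

Re-running this after each new observation, with the fitted Gaussian as the next prior, yields a sequential learning scheme of the kind shown on the following slides.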

Page 24: Approximate Bayesian Inference  I:

Solution

Solve the stationarity conditions (one equation per parameter of the Gaussian approximation): $\partial\mathcal{F}/\partial\mu = 0$, $\partial\mathcal{F}/\partial\sigma = 0$.

Page 25: Approximate Bayesian Inference  I:

Result: A Global Approximation

[Figure: true posterior compared with the Laplace and variational-Bayes approximations; VB provides a global rather than local approximation.]

Learning rules for the expected reward probability and the uncertainty about it ⇒ a sequential learning algorithm.

Page 26: Approximate Bayesian Inference  I:

VB for Bayesian Model Selection

• Bayesian model selection ranks models by $p(m \mid X) \propto p(X \mid m)\,p(m)$.
• Hence, if $p(m)$ is uniform, the best model is the one with the largest evidence $p(X \mid m)$.
• Problem:
– $\ln p(X \mid m)$ is “intractable”
• Solution:
– Approximate $\ln p(X \mid m)$ by the free-energy $\mathcal{F}(q_m)$
• Justification:
– If $\mathrm{KL}[q_m \,\|\, p(\cdot \mid X, m)]$ is small, then $\mathcal{F}(q_m) \approx \ln p(X \mid m)$
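Spelling out the justification (a restatement of the Page 13 decomposition, applied per model):

```latex
\ln p(X \mid m)
  = \mathcal{F}(q_m) + \mathrm{KL}\!\left[\, q_m \,\middle\|\, p(\cdot \mid X, m) \,\right]
\quad\Rightarrow\quad
p(m \mid X) \;\propto\; p(m)\, p(X \mid m) \;\approx\; p(m)\, e^{\mathcal{F}(q_m)} .
```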

Page 27: Approximate Bayesian Inference  I:

Summary: Approximate Bayesian Inference
→ Structural Approximations
→ Variational Bayes (Ensemble Learning)
→ Meanfield and Parametric Approximations
→ Learning Rules and Model Selection

