Particle Markov chain Monte Carlo



Fredrik Lindsten

Division of Automatic Control, Linköping University, Sweden


Particle Markov chain Monte Carlo

Particle MCMC (PMCMC) introduced in the seminal paper,

C. Andrieu, A. Doucet and R. Holenstein, “Particle Markov chain Monte Carlo methods”, Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.

More on backward simulation in PMCMC:

N. Whiteley, C. Andrieu and A. Doucet, “Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo methods”, Bristol Statistics Research Report 10:04, 2010.

F. Lindsten, M. I. Jordan and T. B. Schön, “Ancestral Sampling for Particle Gibbs”, NIPS (accepted), 2012.

F. Lindsten, T. B. Schön and M. I. Jordan, “Data driven Wiener system identification”, Submitted to Automatica, 2012.


Bayesian system identification

Consider a nonlinear, discrete-time state-space model,

xt+1 = ft(xt, ut; θ) + vt(θ),
yt = ht(xt, ut; θ) + et(θ).

We observe

DT = {ut, yt}, t = 1, . . . , T.

Bayesian model: θ random variable with prior density π(θ).

Aim: Find p(θ | DT).



Gibbs sampler for SSMs

Aim: Find p(θ, x1:T | DT).

Alternate between updating θ and updating x1:T.

MCMC: Gibbs sampling for state-space models. Iterate:

• Draw θ[r] ∼ p(θ | x1:T[r − 1], DT);
• Draw x1:T[r] ∼ p(x1:T | θ[r], DT).

This procedure results in a Markov chain {θ[r], x1:T[r]}, r ≥ 1, with stationary distribution p(θ, x1:T | DT).


Gibbs sampler

ex) Sample from

N( (x; y); (10; 10), (2 1; 1 1) ).

Gibbs sampler:

• Draw x′ ∼ p(x | y);
• Draw y′ ∼ p(y | x′).
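The two-step sweep above can be sketched directly. A minimal example, assuming the target read off the slide is N with mean (10, 10) and covariance [[2, 1], [1, 1]]; the Gaussian full conditionals below are derived from those values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed target: N(mu, Sigma), mu = (10, 10), Sigma = [[2, 1], [1, 1]].
# Both full conditionals of a bivariate Gaussian are Gaussian:
#   x | y ~ N(mu_x + (S_xy/S_yy)(y - mu_y), S_xx - S_xy^2/S_yy) = N(y, 1)
#   y | x ~ N(mu_y + (S_xy/S_xx)(x - mu_x), S_yy - S_xy^2/S_xx)
#         = N(10 + 0.5 (x - 10), 0.5)

def gibbs(n_iter=20000):
    samples = np.empty((n_iter, 2))
    x, y = 0.0, 0.0                      # arbitrary initialization
    for r in range(n_iter):
        x = rng.normal(y, 1.0)           # draw x' ~ p(x | y)
        y = rng.normal(10 + 0.5 * (x - 10), np.sqrt(0.5))  # y' ~ p(y | x')
        samples[r] = (x, y)
    return samples

s = gibbs()[1000:]                       # discard burn-in
print(s.mean(axis=0))                    # approaches (10, 10)
print(np.cov(s.T))                       # approaches [[2, 1], [1, 1]]
```

Although each draw only conditions on the other coordinate, the chain has the joint Gaussian as its stationary distribution.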


Linear Gaussian state-space model

ex) Gibbs sampling for linear system identification.

[xt+1; yt] = [A B; C D] [xt; ut] + [vt; et].

Iterate:

• Draw θ′ ∼ p(θ | x1:T, DT);
• Draw x′1:T ∼ p(x1:T | θ′, DT).

[Figure: Bode diagram, magnitude (dB) and phase (deg) versus frequency (rad/s), showing the true system, the posterior mean and a 95 % credibility region.]



Gibbs sampler for general SSM?

What about the general nonlinear/non-Gaussian case?

• Draw θ′ ∼ p(θ | x1:T, DT); OK!

• Draw x′1:T ∼ p(x1:T | θ′, DT). Hard!

Problem: p(x1:T | θ, DT) not available!

Idea: Approximate p(x1:T | θ, DT) using a particle smoother.


Backward simulator

Sampling strategy:

• Run a particle filter

• Sample a trajectory

x′1:T ∼ p(x1:T | θ, DT) (approximately).

[Figure: particle trajectories (state versus time) with one backward-sampled trajectory highlighted.]

S. J. Godsill, A. Doucet and M. West, “Monte Carlo Smoothing for Nonlinear Time Series”, Journal of the American Statistical Association, 99:156-168, 2004.


Problems

Problems with this approach:

• Based on the particle filter (PF) ⇒ approximate sample.
• Relies on large N to be successful.
• A lot of wasted computations.

To get around these problems: analyze the PF and MCMC together ⇒ PMCMC.


Particle Markov chain Monte Carlo

Particle Markov chain Monte Carlo:

• Combines PF and MCMC in a systematic manner.
• “Exact approximation” of MCMC samplers.
• Family of Bayesian inference methods:
  • Particle Metropolis–Hastings (PMH)
  • Particle Gibbs (PG) – with backward simulation


The particle filter

• Resampling: {x^i_{1:t−1}, w^i_{t−1}}, i = 1, . . . , N → {x̃^i_{1:t−1}, 1/N}, i = 1, . . . , N.
• Propagation: x^i_t ∼ R^θ_t(dxt | x̃^i_{1:t−1}) and x^i_{1:t} = {x̃^i_{1:t−1}, x^i_t}.
• Weighting: w^i_t = W^θ_t(x^i_{1:t}).

⇒ {x^i_{1:t}, w^i_t}, i = 1, . . . , N.
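The three steps above can be sketched as a bootstrap PF. The scalar model below is an assumption chosen only to make the sketch runnable (it is not from the slides); in the bootstrap choice, R_t is the transition density and W_t is the likelihood p(yt | xt):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative scalar model (an assumption, not from the slides):
#   x_{t+1} = 0.7 x_t + v_t,  v_t ~ N(0, 0.5),   y_t = x_t + e_t,  e_t ~ N(0, 1).

def norm_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def particle_filter(y, N=500):
    T = len(y)
    particles = np.zeros((T, N))
    weights = np.zeros((T, N))
    particles[0] = rng.normal(0, 1, size=N)          # initialize from the prior
    w = norm_pdf(y[0], particles[0], 1.0)            # weighting at t = 1
    weights[0] = w / w.sum()
    for t in range(1, T):
        # Resampling: ancestor indices drawn with prob. prop. to the weights,
        # giving an equally weighted set {x_{1:t-1}^i, 1/N}.
        a = rng.choice(N, size=N, p=weights[t - 1])
        # Propagation: x_t^i ~ R_t(dx_t | x_{1:t-1}^{a_i}).
        particles[t] = 0.7 * particles[t - 1, a] + rng.normal(0, np.sqrt(0.5), size=N)
        # Weighting: w_t^i = W_t(x_{1:t}^i), here the likelihood p(y_t | x_t^i).
        w = norm_pdf(y[t], particles[t], 1.0)
        weights[t] = w / w.sum()
    return particles, weights

# Simulate data from the same model and run the filter.
T = 50
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.7 * x_true[t - 1] + rng.normal(0, np.sqrt(0.5))
y = x_true + rng.normal(0, 1, size=T)
particles, weights = particle_filter(y)
x_filt = (weights * particles).sum(axis=1)           # filtered posterior mean
```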


The particle filter

• Resampling + Propagation:

  (a^i_t, x^i_t) ∼ M^θ_t(a_t, x_t) = [w^{a_t}_{t−1} / Σ_l w^l_{t−1}] R^θ_t(x_t | x^{a_t}_{1:t−1}).

• Weighting: w^i_t = W^θ_t(x^i_{1:t}).

⇒ {x^i_{1:t}, w^i_t}, i = 1, . . . , N.


A closer look at the PF

Random variables generated by the PF. Let

x_t = {x^1_t, . . . , x^N_t}, a_t = {a^1_t, . . . , a^N_t}.

The PF generates a single sample on X^{NT} × {1, . . . , N}^{N(T−1)} with density

ψ^θ(x_{1:T}, a_{2:T}) ≜ ∏_{i=1}^{N} R^θ_1(x^i_1) ∏_{t=2}^{T} ∏_{i=1}^{N} M^θ_t(a^i_t, x^i_t).


Extended target density

What is the target density?

• Must admit p(x1:T, θ | DT) as a marginal.
• As close as possible to ψ.

Let x^k_{1:T} = x^{b_{1:T}}_{1:T} = {x^{b_1}_1, . . . , x^{b_T}_T} be a specific path.

Introduce the extended target

φ(θ, x_{1:T}, a_{2:T}, k) = φ(θ, x^{b_{1:T}}_{1:T}, b_{1:T}) φ(x^{−b_{1:T}}_{1:T}, a^{−b_{2:T}}_{2:T} | θ, x^{b_{1:T}}_{1:T}, b_{1:T})
  ≜ [ p(x^{b_{1:T}}_{1:T}, θ | DT) / N^T ] × ∏_{i=1, i≠b_1}^{N} R^θ_1(x^i_1) ∏_{t=2}^{T} ∏_{i=1, i≠b_t}^{N} M^θ_t(a^i_t, x^i_t),

where the first factor is the marginal and the second factor is the conditional.


Particle Gibbs with backward simulation (PG-BS)

Multi-stage Gibbs sampler, targeting φ:

i) Draw θ′ ∼ φ(θ | x^{b_{1:T}}_{1:T}, b_{1:T});
ii) Draw {x′^{−b_{1:T}}_{1:T}, a′^{−b_{2:T}}_{2:T}} ∼ φ(x^{−b_{1:T}}_{1:T}, a^{−b_{2:T}}_{2:T} | θ′, x^{b_{1:T}}_{1:T}, b_{1:T});
iii) Draw, for t = T, . . . , 1,

  b′_t ∼ φ(b_t | θ′, x′^{−b_{1:t}}_{1:t}, a′^{−b_{2:t}}_{2:t}, x^{b_{1:T}}_{1:T}, b′_{t+1:T}).

Step i) By construction,

  φ(θ | x^{b_{1:T}}_{1:T}, b_{1:T}) = p(θ | x^{b_{1:T}}_{1:T}, DT).

Sampling is assumed to be feasible.


PG-BS, Step ii)

Step ii) By construction,

φ(x^{−b_{1:T}}_{1:T}, a^{−b_{2:T}}_{2:T} | θ, x^{b_{1:T}}_{1:T}, b_{1:T}) = ∏_{i=1, i≠b_1}^{N} R^θ_1(x^i_1) ∏_{t=2}^{T} ∏_{i=1, i≠b_t}^{N} M^θ_t(a^i_t, x^i_t).

Conditional PF (conditioned on {x′_{1:T}, b_{1:T}}):

1. Initialize (t = 1):
   (a) Draw x^i_1 ∼ R^θ_1(x_1) for i ≠ b_1 and set x^{b_1}_1 = x′_1.
   (b) Set w^i_1 = W^θ_1(x^i_1) for i = 1, . . . , N.
2. For t = 2, . . . , T:
   (a) Draw (a^i_t, x^i_t) ∼ M^θ_t(a_t, x_t) for i ≠ b_t.
   (b) Set x^{b_t}_t = x′_t and a^{b_t}_t = b_{t−1}.
   (c) Set x^i_{1:t} = {x^{a^i_t}_{1:t−1}, x^i_t} and w^i_t = W^θ_t(x^i_{1:t}) for i = 1, . . . , N.
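The conditional PF can be sketched for the same illustrative scalar model used earlier (an assumption, not from the slides). For simplicity the reference trajectory is clamped at particle index b = 0 at every time step, i.e. b_t = 0 for all t:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative model (an assumption, not from the slides):
#   x_{t+1} = 0.7 x_t + N(0, 0.5),   y_t = x_t + N(0, 1).

def norm_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def conditional_pf(y, x_ref, N=100):
    T = len(y)
    particles = np.zeros((T, N))
    ancestors = np.zeros((T, N), dtype=int)
    weights = np.zeros((T, N))
    # Step 1: draw from the prior for i != b, clamp the reference at i = b.
    particles[0] = rng.normal(0, 1, size=N)
    particles[0, 0] = x_ref[0]
    w = norm_pdf(y[0], particles[0], 1.0)
    weights[0] = w / w.sum()
    for t in range(1, T):
        # Step 2(a): resample ancestors and propagate for i != b ...
        a = rng.choice(N, size=N, p=weights[t - 1])
        particles[t] = 0.7 * particles[t - 1, a] + rng.normal(0, np.sqrt(0.5), size=N)
        # Step 2(b): ... but keep the reference particle and its ancestor.
        a[0] = 0
        particles[t, 0] = x_ref[t]
        ancestors[t] = a
        # Step 2(c): reweight all N particles.
        w = norm_pdf(y[t], particles[t], 1.0)
        weights[t] = w / w.sum()
    return particles, ancestors, weights

y = rng.normal(0, 1, size=30)            # placeholder observations
x_ref = np.zeros(30)                     # reference trajectory to condition on
particles, ancestors, weights = conditional_pf(y, x_ref)
```

The key difference from a standard PF is that one particle slot is never resampled away: the reference path survives every sweep by construction.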


PG-BS, Step iii)

Step iii) Sequence of Gibbs steps. For t = T, . . . , 1, draw

b_t ∼ φ(b_t | x_{1:t}, a_{2:t}, x^{b_{t+1:T}}_{t+1:T}, b_{t+1:T}).  (⋆)

By expanding

p(x_{1:t} | θ, D_t) ∝ W^θ_t(x_{1:t}) R^θ_t(x_t | x_{1:t−1}) p(x_{1:t−1} | θ, D_{t−1}),

we can show that (⋆) corresponds to

P(b_t = i) ∝ w^i_t p(x^{b_{t+1}}_{t+1} | θ, x^i_t).

Sampling b1:T corresponds exactly to a run of a backward simulator!
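A sketch of that backward pass, assuming the illustrative transition x_{t+1} = 0.7 x_t + N(0, 0.5) used earlier; `particles` and `weights` stand for the (T, N) arrays produced by a (conditional) particle filter:

```python
import numpy as np

rng = np.random.default_rng(3)

def trans_pdf(x_next, x, var=0.5):
    # p(x_{t+1} | x_t) for the assumed transition x_{t+1} = 0.7 x_t + N(0, var)
    mean = 0.7 * x
    return np.exp(-0.5 * (x_next - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def backward_simulate(particles, weights):
    T, N = particles.shape
    b = np.zeros(T, dtype=int)
    # At t = T, draw b_T with probability proportional to the final weights.
    b[T - 1] = rng.choice(N, p=weights[T - 1])
    for t in range(T - 2, -1, -1):
        # P(b_t = i) proportional to w_t^i * p(x_{t+1}^{b_{t+1}} | x_t^i).
        probs = weights[t] * trans_pdf(particles[t + 1, b[t + 1]], particles[t])
        probs /= probs.sum()
        b[t] = rng.choice(N, p=probs)
    return particles[np.arange(T), b], b

# Toy PF output: random particles with uniform weights, just to run the sketch.
T, N = 20, 50
particles = rng.normal(0, 1, size=(T, N))
weights = np.full((T, N), 1.0 / N)
traj, b = backward_simulate(particles, weights)
```

Because each b_t is drawn anew, the returned trajectory can switch between particle paths at every step instead of being confined to the surviving ancestral lines of the filter.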


Final PG-BS algorithm

Algorithm 1 PG-BS: Particle Gibbs with backward simulation

1. Initialize: Set θ[0], x1:T[0] and b1:T[0] arbitrarily.
2. For r ≥ 1, iterate:
   (a) Draw θ[r] ∼ p(θ | x1:T[r − 1], DT).
   (b) Run a conditional PF, targeting p(x1:T | θ[r], DT), conditioned on {x1:T[r − 1], b1:T[r − 1]}.
   (c) Run a backward simulator to generate b1:T[r] and set x1:T[r] to the corresponding particle trajectory.

{θ[r], x1:T[r]}r≥1 has stationary distribution p(θ, x1:T | DT).


ex) Stochastic volatility

Stochastic volatility model:

xt+1 = θ1 xt + vt, vt ∼ N(0, θ2),
yt = et exp(xt/2), et ∼ N(0, 1).
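A minimal, runnable sketch of Algorithm 1 applied to this model. The priors (flat on θ1, inverse-gamma IG(0.01, 0.01) on θ2) and the N(0, 1) initial-state distribution are assumptions made here for illustration; the slides do not state them. Ancestor indices are not stored, since backward simulation draws fresh indices b1:T anyway:

```python
import numpy as np

rng = np.random.default_rng(4)

def loglik(y, x):
    # log N(y; 0, exp(x)) up to an additive constant
    return -0.5 * (x + y ** 2 * np.exp(-x))

def cpf_bs(y, th1, th2, x_ref, N=5):
    """Conditional PF (reference clamped at index 0) + backward simulation."""
    T = len(y)
    X = np.zeros((T, N))
    W = np.zeros((T, N))
    X[0] = rng.normal(0, 1, size=N)
    X[0, 0] = x_ref[0]
    lw = loglik(y[0], X[0])
    w = np.exp(lw - lw.max())
    W[0] = w / w.sum()
    for t in range(1, T):
        a = rng.choice(N, size=N, p=W[t - 1])
        X[t] = th1 * X[t - 1, a] + rng.normal(0, np.sqrt(th2), size=N)
        X[t, 0] = x_ref[t]
        lw = loglik(y[t], X[t])
        w = np.exp(lw - lw.max())
        W[t] = w / w.sum()
    # Backward simulation: P(b_t = i) prop. to w_t^i N(x_{t+1}; th1 x_t^i, th2).
    b = rng.choice(N, p=W[-1])
    traj = [X[-1, b]]
    for t in range(T - 2, -1, -1):
        lp = np.log(W[t] + 1e-300) - 0.5 * (traj[-1] - th1 * X[t]) ** 2 / th2
        p = np.exp(lp - lp.max())
        b = rng.choice(N, p=p / p.sum())
        traj.append(X[t, b])
    return np.array(traj[::-1])

def pg_bs(y, n_iter=200, N=5):
    T = len(y)
    th1, th2 = 0.5, 1.0
    x = np.zeros(T)
    for r in range(n_iter):
        x = cpf_bs(y, th1, th2, x, N)                  # steps 2(b)-2(c)
        # Step 2(a): theta1 | x, theta2 is Gaussian under the flat prior,
        # theta2 | x, theta1 is inverse gamma under the assumed IG prior.
        sxx = np.sum(x[:-1] ** 2) + 1e-12
        th1 = rng.normal(np.sum(x[:-1] * x[1:]) / sxx, np.sqrt(th2 / sxx))
        resid = x[1:] - th1 * x[:-1]
        th2 = 1.0 / rng.gamma(0.01 + 0.5 * (T - 1),
                              1.0 / (0.01 + 0.5 * resid @ resid))
    return th1, th2, x

# Simulate T = 200 observations with theta1 = 0.9, theta2 = 0.25 and run.
T = 200
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal(0, 0.5)
y = rng.normal(0, 1, size=T) * np.exp(x_true / 2)
th1, th2, x_path = pg_bs(y, n_iter=50)
```

Note that the sweep remains a valid Gibbs sampler for the extended target even with very few particles, which is what makes the small-N settings on the following slides work.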


[Figure: Gibbs trace plots of θ1 and θ2 over 1000 iterations, for N = 5, 20, 100, 1000 and 5000 particles.]


[Figure: estimated posterior densities of θ1 and θ2, for N = 5, 20, 100, 1000 and 5000 particles.]

ex) Wiener system identification

[Block diagram: Wiener system – the input ut enters the linear block G, whose output passes through the static nonlinearity h(·) and a summation Σ to give yt; vt and et are noise terms.]

• Find θ = {G, h(·)}.
• Parametric (state-space) model for G.
• Nonparametric model for h, based on a Gaussian process.
• Example system:
  • 4th-order linear system, T = 1000.
  • Blind identification (ut = 0).
• PG-BS with:
  • N = 5 particles.
  • 15000 iterations of the Gibbs sampler.


ex) Wiener system identification, cont’d.

[Figure: Bode diagram of the linear block (magnitude in dB and phase in deg versus frequency in rad/s) and the estimated nonlinear mapping h(z) versus z, each showing the true function, the posterior mean and a 99 % credibility region.]

Summary

Particle Gibbs with backward simulation

• Combines PF and MCMC in a systematic manner.

• Provably convergent for any N ≥ 2 – and it works in practice!

• Makes efficient use of the available particles.

• How does it scale with the state dimension?

• Models with strong dependencies between state and parameter?

PG-BS is only one member of the PMCMC family – there are other methods with different properties.

MATLAB code available at: http://www.control.isy.liu.se/~lindsten/code/



Stochastic volatility example

• θ1 = 0.9, θ2 = 0.5².
• T = 5000.


LGSS example

A = [ −0.5107  1  0  0  0
      −1.0705  0  1  0  0
      −0.4268  0  0  1  0
      −0.1080  0  0  0  1
      −0.0005  0  0  0  0 ],   B = [ −1.6599; −0.9034; −2.3697; −0.8543; −0.2029 ],

C = [ 1  0  0  0  0 ].

• Q = 0.05 I5, R = 0.01.
• ut ∼ N(0, 0.01).
• T = 1000.
• MNIW prior with subspace initialization for A and B.
