Approximate Bayesian Computation for State Space Models

Worapree (Ole) Maneesoonthorn
Melbourne Business School, The University of Melbourne

Joint work with Gael M. Martin, Brendan C.P. McCabe & Christian Robert

December 2014
Thailand Development Research Institute

Maneesoonthorn, ABC for state space models, December 2014, Thailand Development Research Institute

Presentation outline

• State space models and Bayesian inference
• Approximate Bayesian Computation (ABC)
• Our framework
• Illustration using stochastic volatility models
• Conclusions

State space models

• Structural time series model with hidden/unobserved components
• Simplest form - linear-Gaussian model
• Example:

$$y_t = \text{Trend}_t + e_t$$
$$\text{Trend}_t = \beta_0 + \beta_1 \text{Trend}_{t-1} + u_t$$

• Observed component: $y_t$; unobserved component: $\text{Trend}_t$
• Random components: $e_t \sim N(0, \sigma_e^2)$ and $u_t \sim N(0, \sigma_u^2)$
• Inference about $\phi = (\beta_0, \beta_1, \sigma_u)'$ via maximum likelihood
  • ⇒ Kalman filter to obtain the likelihood in closed form
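
For the linear-Gaussian trend example, the likelihood can be evaluated through the standard Kalman filter prediction-error decomposition. The following is a minimal Python sketch, assuming NumPy. The function name `kalman_loglik`, the initial moments `m0` and `p0` for the trend, and the parameter values in the usage lines are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def kalman_loglik(y, beta0, beta1, sigma_e, sigma_u, m0=0.0, p0=10.0):
    """Gaussian log-likelihood of y under the trend model, via the Kalman filter.

    m0 and p0 are an ASSUMED prior mean and variance for the initial trend.
    """
    m, p = m0, p0
    loglik = 0.0
    for obs in y:
        # Prediction step for the latent trend
        m_pred = beta0 + beta1 * m
        p_pred = beta1**2 * p + sigma_u**2
        # One-step-ahead predictive density of the observation
        f = p_pred + sigma_e**2
        v = obs - m_pred
        loglik += -0.5 * (np.log(2.0 * np.pi * f) + v**2 / f)
        # Update step
        gain = p_pred / f
        m = m_pred + gain * v
        p = (1.0 - gain) * p_pred
    return loglik

# Illustrative usage on simulated data (parameter values are arbitrary)
rng = np.random.default_rng(0)
T = 200
trend = np.empty(T)
prev = 0.0
for t in range(T):
    prev = 0.1 + 0.9 * prev + 0.3 * rng.standard_normal()
    trend[t] = prev
y_sim = trend + 0.5 * rng.standard_normal(T)
print(kalman_loglik(y_sim, 0.1, 0.9, 0.5, 0.3))
```

Maximizing this function over the parameters (for instance with a generic numerical optimizer) would give the maximum likelihood estimate referred to in the list above.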


State space models

• In general, the state space model can be written as

$$y_t = f(x_t, e_t, \phi)$$
$$x_t = g(x_{t-1}, u_t, \phi)$$

where $\phi$ denotes a vector of static parameters

• $f(\cdot)$ and $g(\cdot)$ are potentially nonlinear
• Densities $p(e_t)$ and $p(u_t)$ are potentially non-Gaussian

• Example: stochastic volatility with Student-t errors

$$y_t = \sqrt{x_t}\, e_t$$
$$x_t = \beta_0 + \beta_1 x_{t-1} + \sigma_u \sqrt{x_{t-1}}\, u_t$$

where $e_t \sim t_\nu(0, 1)$ and $u_t \sim TN(0, 1)$
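
Simulating from a model of this kind is straightforward even though its likelihood is not available in closed form, which is what ABC exploits later. Below is a hedged Python sketch of such a simulator, assuming NumPy. The slides do not state the truncation bounds of $TN(0, 1)$; the sketch ASSUMES that $u_t$ is truncated from below so that $x_t$ stays positive, and implements this by rejection sampling. The function name `simulate_sv` and all parameter values are illustrative.

```python
import numpy as np

def simulate_sv(T, beta0, beta1, sigma_u, nu, x0, rng):
    """Simulate (y, x) from the stochastic volatility example.

    ASSUMPTION: u_t is a standard normal truncated from below so that x_t > 0.
    """
    x = np.empty(T)
    y = np.empty(T)
    prev = x0
    for t in range(T):
        scale = sigma_u * np.sqrt(prev)
        # Smallest admissible u_t that keeps the next variance positive
        lower = -(beta0 + beta1 * prev) / scale
        u = rng.standard_normal()
        while u <= lower:  # rejection sampling from the truncated normal
            u = rng.standard_normal()
        prev = beta0 + beta1 * prev + scale * u
        x[t] = prev
        # Student-t measurement error with nu degrees of freedom
        y[t] = np.sqrt(prev) * rng.standard_t(nu)
    return y, x

rng = np.random.default_rng(1)
y_sv, x_sv = simulate_sv(T=500, beta0=0.02, beta1=0.9, sigma_u=0.1,
                         nu=5.0, x0=0.2, rng=rng)
print(y_sv[:5], x_sv[:5])
```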


State space models

• Inference for nonlinear/non-Gaussian state space models - difficult!
• Likelihood cannot be evaluated in closed form

$$p(y_1, \ldots, y_T \mid \phi) = \prod_{t=1}^{T} p(y_t \mid y_1, \ldots, y_{t-1}, \phi)$$

• with

$$p(y_t \mid y_1, \ldots, y_{t-1}, \phi) = \int p(y_t \mid x_t, y_1, \ldots, y_{t-1}, \phi)\, p(x_t \mid y_1, \ldots, y_{t-1}, \phi)\, dx_t$$


State space models

Inference for nonlinear/non-Gaussian state space models:

• Working with approximations
  • INLAR approximations
  • Mixtures of linear-Gaussians
  • ⇒ estimate the approximating model

• Simulation based methods
  • Simulated maximum likelihood
  • Particle filtering

• Bayesian updating schemes
  • Simulating from a proposed (approximating) model
  • Correct the draw via an updating scheme based on posterior densities


Bayesian inference

Classical vs Bayesian inference

• Classical frequentist inference
  • Latent state variable to be integrated out
  • Point estimates of parameters $\phi$ + Central Limit Theorem from MLE theory
  • Dynamic state - implied by parameter estimates

• Bayesian inference
  • Possible to estimate all unknowns - parameters $\phi$ + states $x$
  • Data-based inference via posterior distributions
  • Integration - by simulation


Bayesian inference

• Objective is to estimate

$$p(\phi, x \mid y) = \frac{p(y \mid x, \phi)\, p(x \mid \phi)\, p(\phi)}{p(y)}$$

by sampling the model unknowns iteratively, as a Markov chain, from
  • $\phi \sim p(\phi \mid x, y)$
  • $x \sim p(x \mid \phi, y)$

• Posterior inference

$$p(\phi \mid y) = \int p(\phi, x \mid y)\, dx$$

$$p(x \mid y) = \int p(\phi, x \mid y)\, d\phi$$

by numerical integration


Bayesian inference

• Vast literature on Bayesian inference in the state space setting
  • Markov chain Monte Carlo (MCMC), particle MCMC, sequential Monte Carlo (SMC)...

• However, these methods are not black box
  • High level of expertise needed to develop
  • Convergence issues
  • Time consuming
  • Not widely applied by non-technical experts

• We propose a simpler alternative based on Approximate Bayesian Computation (ABC)
  • Producing a simulation-based estimate of an approximation to $p(\phi, x \mid y)$


ABC

Approximate Bayesian computation - in brief

Aim:

• Produce i.i.d. draws from an approximation to $p(\phi, x \mid y)$
• Use draws to estimate that approximation
• Employing a simple accept/reject algorithm

Need:

• To be able to simulate from $p(x \mid \phi)$ exactly
• To be able to simulate from $p(y \mid x, \phi)$ exactly

Recent review: Marin, Pudlo, Robert & Ryder (2011)


Steps: for $i = 1, \ldots, R$

1. Simulate $\phi^i$ from $p(\phi)$
2. Simulate $x^i$ from $p(x \mid \phi^i)$
3. Simulate pseudo-data $z^i$ from the conditional likelihood $p(z \mid x^i, \phi^i)$
4. Select $(x^i, \phi^i)$ such that

$$d\{\eta(y), \eta(z^i)\} \le \varepsilon$$

where

• $\eta(\cdot)$ is a vector of summary statistics
• $d\{\cdot\}$ is a distance criterion
• $\varepsilon$ is an arbitrarily small tolerance
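
The four steps above can be sketched as a short accept/reject loop. The Python code below, assuming NumPy, is only a toy illustration: the latent AR(1) model, the uniform prior, the two summary statistics, the Euclidean distance, and the tolerance value are all assumptions chosen to keep the example small, and none of them is the specification used in the talk.

```python
import numpy as np

def simulate_states(phi, T, rng):
    """Step 2: draw x from p(x | phi); here an ASSUMED AR(1) with persistence phi."""
    x = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = phi * prev + rng.standard_normal()
        x[t] = prev
    return x

def simulate_data(x, rng, sigma_e=0.5):
    """Step 3: draw pseudo-data from p(z | x, phi); states plus Gaussian noise."""
    return x + sigma_e * rng.standard_normal(x.shape[0])

def summaries(y):
    """eta(.): sample variance and lag-1 autocovariance (an arbitrary choice)."""
    yc = y - y.mean()
    return np.array([np.mean(yc**2), np.mean(yc[1:] * yc[:-1])])

def abc_reject(y, R, eps, rng):
    """Keep the (phi, x) draws whose pseudo-data summaries fall within eps of eta(y)."""
    eta_y = summaries(y)
    kept_phi, kept_x = [], []
    for _ in range(R):
        phi = rng.uniform(0.0, 1.0)             # step 1: draw from the prior
        x = simulate_states(phi, len(y), rng)   # step 2
        z = simulate_data(x, rng)               # step 3
        if np.linalg.norm(summaries(z) - eta_y) <= eps:  # step 4
            kept_phi.append(phi)
            kept_x.append(x)
    return np.array(kept_phi), kept_x

rng = np.random.default_rng(1)
y_obs = simulate_data(simulate_states(0.8, 200, rng), rng)
phi_draws, x_draws = abc_reject(y_obs, R=2000, eps=0.5, rng=rng)
print(len(phi_draws))
```

The accepted `phi_draws` play the role of draws from the approximate posterior; a histogram or kernel density estimate of them would then estimate that approximation.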


When $\eta(\cdot)$ is sufficient and $\varepsilon \to 0$

• ⇒ selected draws of $(x^i, \phi^i)$ are draws from $p(\phi, x \mid y)$
• ... giving exact inference, up to simulation error

When $\eta(\cdot)$ is not sufficient

• ⇒ selected draws of $(x^i, \phi^i)$ are draws from $p(\phi, x \mid \eta(y))$ only

Choice of $\eta(\cdot)$ is usually problem-specific - still an open topic

• Joyce & Marjoram, 2008; Blum, 2010; Fearnhead & Prangle, 2012; Gleim & Pigorsch, 2013
• No general discussion in a state space setting


ABC and sufficiency

How to render $\eta(\cdot)$ ‘close to’ sufficient in a state space model (SSM) setting?

• Linear Gaussian SSM ≡ ARMA model
  • ⇒ no reduction to sufficient statistics (due to the MA component)
  • ⇒ would not expect ABC based on arbitrary summary statistics (calculated from $y$) to perform well

Confirmed by numerical experimentation

• signal to noise ratio playing a role
• dimension of $\eta(\cdot)$ also a problem (the ‘multiple matching’ problem)


Our framework

Our approach to ABC

• In the spirit of indirect inference:
  • Gourieroux et al., 1993; Heggland and Frigessi, 2004

• Think about a model that approximates the true (analytically intractable) SSM
  • with associated likelihood function: $L_A(\beta; y)$
  • Apply maximum likelihood estimation to $L_A(\beta; \cdot)$ to produce $\hat\beta$

• $\hat\beta$ is asymptotically sufficient for $\beta$ in the approximating model
  • ($\hat\beta$ is also asymptotically sufficient for $\phi$ in the true model if the true model is nested in the approximating model)

• If the approximating model is ‘accurate’ enough
  • $\hat\beta$ may be ‘close to’ being sufficient for $\phi$ in the true model


Our approach to ABC

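• Setting $\eta(\cdot) = \hat\beta$ is computationally burdensome (optimization required at each iteration of ABC)

• Instead, in the spirit of the efficient method of moments
  • Gallant and Tauchen, 1996; Gallant and Long, 1997

• construct the summary statistic as the score:

$$\eta(\cdot) = S(\beta; \cdot)\big|_{\beta = \hat\beta(y)} = T^{-1} \left. \frac{\partial \ln L_A(\beta; \cdot)}{\partial \beta} \right|_{\beta = \hat\beta(y)}$$

• Select ABC draws $(\phi^i, x^i)$ such that:

$$d\{\underbrace{\eta(y)}_{=0}, \eta(z^i)\} \le \varepsilon$$

• Does the ‘approximate asymptotic sufficiency’ of $\hat\beta$ carry over to $S(\beta; \cdot)|_{\beta = \hat\beta(y)}$?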
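
To make the score-based statistic concrete, the Python sketch below (assuming NumPy) uses a Gaussian AR(1) as a stand-in auxiliary model. This auxiliary model, the function names `aux_mle`, `aux_score` and `score_distance`, and the placeholder data are all assumptions for illustration; the talk itself uses a different approximating model. The point shown is that the auxiliary model is estimated once, on the observed data, and each pseudo-data set then needs only a score evaluation rather than a new optimization.

```python
import numpy as np

def aux_mle(y):
    """Closed-form conditional MLE of the ASSUMED Gaussian AR(1) auxiliary model.

    Returns beta_hat = (intercept, slope, innovation variance).
    """
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ coef
    return np.array([coef[0], coef[1], np.mean(resid**2)])

def aux_score(beta, z):
    """Average score of the auxiliary log-likelihood at beta, computed on data z.

    The average is taken over the T - 1 conditional terms.
    """
    b0, b1, s2 = beta
    r = z[1:] - b0 - b1 * z[:-1]
    return np.array([
        np.mean(r) / s2,
        np.mean(r * z[:-1]) / s2,
        np.mean(-0.5 / s2 + 0.5 * r**2 / s2**2),
    ])

def score_distance(beta_hat_y, z, weight=None):
    """sqrt(S' W S) with S the score on z at beta_hat(y); W defaults to the identity."""
    s = aux_score(beta_hat_y, z)
    w = np.eye(len(s)) if weight is None else weight
    return float(np.sqrt(s @ w @ s))

rng = np.random.default_rng(2)
y = rng.standard_normal(300)          # placeholder for the observed data
z = 2.0 * rng.standard_normal(300)    # placeholder for one ABC pseudo-data set
beta_hat = aux_mle(y)
print(aux_score(beta_hat, y))         # numerically zero, i.e. eta(y) = 0
print(score_distance(beta_hat, z))    # positive when z is unlike y
```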


• We show that:

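$$d\{\eta(y), \eta(z^i)\} = \sqrt{[S(\hat\beta(y); z^i)]'\, \Sigma\, [S(\hat\beta(y); z^i)]} \le \varepsilon$$

and

$$d\{\eta(y), \eta(z^i)\} = \sqrt{[\hat\beta(y) - \hat\beta(z^i)]'\, \Omega\, [\hat\beta(y) - \hat\beta(z^i)]} \le \varepsilon$$

• (for any positive definite weighting matrices $\Sigma$ and $\Omega$)
• ⇒ the same selected $\phi^i$ for $\varepsilon \to 0$
• ⇒ same estimate of $p(\phi \mid y)$
• For both:
  • exactly identified ($\dim(\beta) = \dim(\phi)$) case
  • over-identified ($\dim(\beta) > \dim(\phi)$) case
• Estimates of $p(\phi \mid y)$ are approximately equal for small enough $\varepsilon$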
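
One heuristic way to see why a score-based and an MLE-based criterion can select the same draws is a first-order expansion of the score. The LaTeX sketch below is an informal argument under standard regularity assumptions, not the proof referred to in the talk; the matrix $H(z^i)$ is notation introduced here for illustration.

```latex
% By definition of the auxiliary MLE on the pseudo-data, S(\hat\beta(z^i); z^i) = 0.
\begin{align*}
S(\hat\beta(y); z^i)
  &\approx S(\hat\beta(z^i); z^i) + H(z^i)\,[\hat\beta(y) - \hat\beta(z^i)]
   = H(z^i)\,[\hat\beta(y) - \hat\beta(z^i)], \\
H(z^i)
  &= \left. \frac{\partial S(\beta; z^i)}{\partial \beta'} \right|_{\beta = \hat\beta(z^i)}, \\
[S(\hat\beta(y); z^i)]'\, \Sigma\, [S(\hat\beta(y); z^i)]
  &\approx [\hat\beta(y) - \hat\beta(z^i)]'\, H(z^i)'\, \Sigma\, H(z^i)\, [\hat\beta(y) - \hat\beta(z^i)].
\end{align*}
```

If $H(z^i)$ is nonsingular, the score distance shrinks to zero exactly when $\hat\beta(y) - \hat\beta(z^i)$ does, so to first order the two criteria differ only in their weighting matrix.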


Our approach to ABC

• Also show that, for both criteria, (under regularity, and for ε→ 0)

• as T → ∞, p(φ|y) collapses onto the true φ0

• because we will only ever accept draws arbitrarily close to φ0

• ⇒ MLE- (or score-) based inference is (Bayesian) consistent


Our approach to ABC

• Link between indirect inference and ABC already acknowledged; e.g.
  • Drovandi et al. (2011) - specific biological model
  • Drovandi and Pettitt (unpublished, 2013)
  • Gleim and Pigorsch (unpublished, 2013) - SSM
    • Use a semi-parametric approximating model based on a Hermite expansion - Gallant and Tauchen (1989)
    • highly parameterized (by construction) - 12 parameters - and tuned to the problem at hand

• Our aim is to produce a simple and generic algorithm suitable for any SSM
  • Using an easily computed, parsimonious approximating model


Our approach to ABC

• Steps:
  1. Define a non-linear/non-Gaussian (discrete time) state space model (SSM) of some sort
  2. Apply the (augmented) unscented Kalman filter (AUKF) (Julier, Uhlmann, and Durrant-Whyte, 2000) to evaluate the likelihood: $L_A(\beta; y)$
  3. ⇒ use

$$\eta(\cdot) = S(\beta; \cdot)\big|_{\beta = \hat\beta(y)} = T^{-1} \left. \frac{\partial \ln L_A(\beta; \cdot)}{\partial \beta} \right|_{\beta = \hat\beta(y)}$$

     as the matching statistic in the ABC algorithm

• Key point: computational burden of AUKF ≈ Kalman filter
  • ⇒ computationally feasible within ABC
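
The building block behind any unscented Kalman filter is the unscented transform, which pushes a small set of deterministically chosen sigma points through the nonlinear function instead of linearizing it. The Python sketch below (assuming NumPy) shows only this transform for a scalar Gaussian input; it is not the AUKF itself, and the function name `unscented_transform` and the choice `kappa = 2` (so that the dimension plus `kappa` equals 3, a common heuristic for Gaussian inputs) are assumptions made for illustration.

```python
import numpy as np

def unscented_transform(f, mean, var, kappa=2.0):
    """Approximate the mean and variance of f(x) for scalar x ~ N(mean, var)."""
    n = 1  # dimension of x
    spread = np.sqrt((n + kappa) * var)
    # Three sigma points: the mean and one point on either side of it
    points = np.array([mean, mean + spread, mean - spread])
    weights = np.array([kappa, 0.5, 0.5]) / (n + kappa)
    fx = f(points)
    m = np.sum(weights * fx)
    v = np.sum(weights * (fx - m) ** 2)
    return m, v

# For x ~ N(0, 1), the exact mean and variance of x**2 are 1 and 2
print(unscented_transform(lambda x: x**2, 0.0, 1.0))
```

Because each filtering step needs only a handful of function evaluations, the cost stays of the same order as a Kalman filter recursion, which is the feasibility point made above.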


Illustration

Illustration: Heston (CIR) SV model

• Assume:

$$r_t = \sqrt{x_t}\, \epsilon_t; \quad \epsilon_t \sim \text{i.i.d. } N(0, 1)$$
$$dx_t = (\delta - \alpha x_t)\, dt + \sigma_v \sqrt{x_t}\, dW_t$$

• $r_t$ = (demeaned) daily log return (observed discretely)
• $x_t$ = latent variance (evolving continuously)

• Set parameters ⇒ $r_t$ and $x_t$ that ‘match’ returns and realized volatility on the S&P 500 over the 2003-2004 period

• Deliberately chose a tranquil period as:
  • not modelling price (and/or volatility) jumps
  • adopting conditional Gaussianity for returns


• Transition densities are known:

$$x_t \mid x_{t-1} \sim \text{Non-central } \chi^2(2c x_t;\ 2q + 2,\ 2u)$$

• Use the exact transitions to produce an exact comparator for the ABC estimate
  • Applying the grid-based non-linear filtering method of Ng, Forbes, Martin and McCabe (2013)
  • (Appropriate for low-dimensional SSMs for which $x_t$ can be solved from the measurement equation)
  • ⇒ exact $p(\phi \mid y)$ (up to numerical integration error)
  • where $\phi = (k = 1 - \alpha, \delta, \sigma_v^2)$
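
The known transition density also gives a way to simulate the latent variance exactly over a discrete time step. The Python sketch below (assuming NumPy) does this with a non-central chi-square draw. The slides do not define $c$, $q$ and $u$; the expressions in the code are ASSUMED from the standard result for the square-root diffusion, namely $c = 2\alpha / (\sigma_v^2 (1 - e^{-\alpha \Delta}))$, $q = 2\delta/\sigma_v^2 - 1$ and $u = c\, x_{t-1} e^{-\alpha \Delta}$, with $\Delta$ the time step. The function name and parameter values are illustrative.

```python
import numpy as np

def cir_exact_step(x_prev, delta, alpha, sigma_v, dt, rng, size=None):
    """Draw x_t | x_{t-1} for dx = (delta - alpha x) dt + sigma_v sqrt(x) dW."""
    decay = np.exp(-alpha * dt)
    c = 2.0 * alpha / (sigma_v**2 * (1.0 - decay))
    df = 4.0 * delta / sigma_v**2       # 2q + 2 with q = 2 delta / sigma_v^2 - 1
    nonc = 2.0 * c * x_prev * decay     # 2u with u = c x_{t-1} exp(-alpha dt)
    # 2 c x_t is non-central chi-square, so rescale the draw by 1 / (2c)
    return rng.noncentral_chisquare(df, nonc, size=size) / (2.0 * c)

rng = np.random.default_rng(4)
draws = cir_exact_step(0.3, delta=0.02, alpha=0.1, sigma_v=0.1,
                       dt=1.0, rng=rng, size=200_000)
# Compare with the conditional mean x e^{-alpha dt} + (delta/alpha)(1 - e^{-alpha dt})
print(draws.mean(), 0.3 * np.exp(-0.1) + 0.2 * (1.0 - np.exp(-0.1)))
```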


Compare the ABC score-based approx. of $p(\phi \mid y)$ with

1. Exact $p(\phi \mid y)$ (produced via grid-based non-linear filter)
2. Euler approximation to $p(\phi \mid y)$ (also produced via grid-based non-linear filter)
3. AUKF approx. of $p(\phi \mid y)$


Also of interest to compare the ABC score-based approx. of $p(\phi \mid y)$ with

4. ABC approx. of $p(\phi \mid y)$ based on the use of 5 arbitrary summary statistics

$$s_1 = \sum_{t=2}^{T-1} y_t, \quad s_2 = \sum_{t=2}^{T-1} y_t^2, \quad s_3 = \sum_{t=2}^{T} y_t y_{t-1}, \quad s_4 = y_1 + y_T, \quad s_5 = y_1^2 + y_T^2$$

• sufficient for an observed AR(1)
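
The five statistics are simple sums and can be computed directly from a series. A minimal Python sketch, assuming NumPy and a hypothetical function name `ar1_summaries`:

```python
import numpy as np

def ar1_summaries(y):
    """Return (s1, ..., s5) as defined above for a series y_1, ..., y_T."""
    y = np.asarray(y, dtype=float)
    s1 = np.sum(y[1:-1])           # sum of y_t for t = 2, ..., T-1
    s2 = np.sum(y[1:-1] ** 2)      # sum of y_t^2 for t = 2, ..., T-1
    s3 = np.sum(y[1:] * y[:-1])    # sum of y_t * y_{t-1} for t = 2, ..., T
    s4 = y[0] + y[-1]              # y_1 + y_T
    s5 = y[0] ** 2 + y[-1] ** 2    # y_1^2 + y_T^2
    return np.array([s1, s2, s3, s4, s5])

print(ar1_summaries([1.0, 2.0, 3.0, 4.0]))
```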


4a. Use Euclidean distance:

$$d\{\eta(y), \eta(z^i)\} = \left[ \sum_{j=1}^{5} (s_j^i - s_j^{obs})^2 / \mathrm{var}(s_j) \right]^{1/2}$$

4b. Use the dimension reduction method of Fearnhead and Prangle (2012). Steps:

  1. For each scalar parameter $\phi_k$, regress $\phi_k^i$ on $s^i = [s_1^i, s_2^i, s_3^i, s_4^i, s_5^i]$ for $i = 1, 2, \ldots, R$ ⇒ $(a, b)$
  2. Define:

$$\eta(z^i) = E(\phi_k \mid z^i) = a + s^i b$$
$$\eta(y) = E(\phi_k \mid y) = a + s^{obs} b$$

  3. And use:

$$d\{\eta(y), \eta(z^i)\} = |E(\phi_k \mid y) - E(\phi_k \mid z^i)|$$

     as the distance measure
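
Both distances can be computed from a matrix of simulated statistics and the matching prior draws. The Python sketch below (assuming NumPy) is an illustration under stated assumptions: $\mathrm{var}(s_j)$ is estimated from the simulated statistics, the regression is ordinary least squares with an intercept, and the synthetic `(phi_sim, s_sim)` pairs and the 1% acceptance quantile are placeholders rather than settings from the talk.

```python
import numpy as np

def scaled_euclidean(s_sim, s_obs):
    """4a: Euclidean distance, each statistic scaled by its variance across draws."""
    var = s_sim.var(axis=0)  # ASSUMED estimator of var(s_j)
    return np.sqrt(np.sum((s_sim - s_obs) ** 2 / var, axis=1))

def fp_distance(phi_sim, s_sim, s_obs):
    """4b: regress phi_k on the statistics, then compare the fitted values."""
    X = np.column_stack([np.ones(len(s_sim)), s_sim])
    coef, *_ = np.linalg.lstsq(X, phi_sim, rcond=None)  # coef = (a, b)
    eta_sim = X @ coef                       # fitted E(phi_k | z^i)
    eta_obs = coef[0] + s_obs @ coef[1:]     # fitted E(phi_k | y)
    return np.abs(eta_obs - eta_sim)

# Synthetic placeholder draws: R prior draws and 5 noisy statistics per draw
rng = np.random.default_rng(3)
R = 1000
phi_sim = rng.uniform(0.0, 1.0, R)
s_sim = np.column_stack([phi_sim + 0.1 * rng.standard_normal(R) for _ in range(5)])
s_obs = np.full(5, 0.7)

d_a = scaled_euclidean(s_sim, s_obs)
d_b = fp_distance(phi_sim, s_sim, s_obs)
keep = d_b <= np.quantile(d_b, 0.01)  # keep the closest 1% of draws
print(phi_sim[keep].mean())
```

The regression-based distance is one-dimensional for each parameter, whereas the Euclidean distance matches all five statistics at once.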


Illustration: Heston SV model: results

Fix all parameters other than k = 1− α (volatility persistence): p(k |y)


To summarize so far.....

Key insights thus far are:

1. finite sample sufficiency unattainable in SSMs (even in the LG case)
   • ⇒ ABC based on arbitrary summary statistics ⇏ $p(\phi \mid y)$
2. asymptotic sufficiency obtained via MLE/score
   • only ‘approximate’ sufficiency accessible in general non-linear (incl. latent diffusion) SSMs
3. even an inaccurate approximating model can produce an accurate $p(\phi \mid y)$ via ABC/score


To summarize so far.....

However......

4. dimensionality of the matching statistics is critical
   • if $\dim(\eta(\cdot)) = m$, ⇒ accuracy of the estimate of $p(\phi \mid \eta(y))$ declines with $m$ (Blum, JASA, 2010)
   • in addition to any difference between $p(\phi \mid \eta(y))$ and $p(\phi \mid y)$
   • Complexity of the approximating model increases the dimension of $\eta(y)$
   • Hence our focus on a parsimonious approximation

5. Advocate use of integrated likelihood in multiple parameter settings
   • ⇒ uni-dimensional score statistic ($m = 1$) for each parameter $\phi_j$
   • Only makes sense if dimension of approx. model = dimension of true model


Multiple parameter case: linear Gaussian model

$$y_t = x_t + \eta_t, \quad \eta_t \sim \text{i.i.d. } N(0, \sigma_\eta^2)$$
$$x_t = d + k x_{t-1} + v_t, \quad v_t \sim \text{i.i.d. } N(0, \sigma_v^2)$$

• $\phi = (d, k, \sigma_v^2)$; ($\sigma_\eta^2$ fixed to control signal to noise)

• Use the exact (KF) likelihood to generate the score
  • ⇒ Enables us to measure the gain from exploiting asymptotic sufficiency via the likelihood function
  • Compared with use of arbitrary (non-sufficient) summary statistics
  • Without the confounding effect of a (potentially inaccurate) approximating model

• Plus gain from moving from joint score ⇒ marginal score (dimension reduction)


Multiple parameter case: LG model

• Estimates of exact $p(k \mid y)$ based on:
  • 1) joint score; 2) marg. score; 3) AR(1) stats (Euclid. distance); 4) AR(1) stats (FP distance)


Estimates of exact p(σv |y)


Estimates of exact p(d |y)


• Box plots (for estimates of $p(k \mid y)$) for 100 runs of ABC
• High signal to noise:
  • Clear ranking: 1) marginal score; 2) joint score; 3) summ. stats (FP); 4) summ. stats (Euclidean)
  • Marginal score method extremely accurate

• Score methods robust to signal to noise (exact likelihood still accessed)

• Two other parameters ($\sigma_v$ and $d$):
  • Main ranking still clear:
    • 1) marginal score .......... 4) summ. stats (Euclidean)
  • No uniform (intermediate) ranking for joint score/FP
  • ⇒ shows the tension between the quest for asymptotic sufficiency (via the joint score) and the quest for dimension reduction (via the FP regression method)


Multiple parameter case: SQ model

$\phi = (k = 1 - \alpha, \delta)$ (hold $\sigma_v^2$ fixed)


Conclusion

To Conclude.....

• Use of (the score of) an auxiliary model to generate summary statistics for ABC in an SSM setting seems promising
  • Given that finite sample sufficiency is unattainable
  • (Approximate) asymptotic sufficiency is a good goal to aim for
  • Know that (Bayesian) consistency is also achievable
  • Accuracy of the approximating model is always important (as it is in II/EMM)


To Conclude.....

• However, for auxiliary models with higher dimension

• ⇒ the closer is p(φ|η(y)) to p(φ|y)

• ⇒ the more inaccurate is the ABC estimate of p(φ|η(y))!

• ⇒ marginal score approach may reap benefits

• If not too compromised by the inaccuracy of the auxiliary model