Bayesian Emulation and Calibration of a Dynamic Epidemic Model for H1N1 Influenza

Marian Farah1

Paul Birrell1, Stefano Conti2, Daniela De Angelis1,2

1 MRC Biostatistics Unit, Cambridge, UK

2 Health Protection Agency, London, UK

ICERM Bayesian Nonparametrics Workshop, September 19, 2012

Dynamic Bayesian modeling for epidemics

Marian Farah, Biostatistics, Cambridge

Outline

Introduction

Methods

Results

Discussion

Motivation

• Tracking and predicting the behavior of an emerging epidemic is essential for a prompt public health response.

• Inferential goals:

• What is happening? i.e., real-time estimation of the epidemic parameters.

• What is going to happen next? i.e., forecasting the (short-term) evolution of the epidemic.

• What happened? i.e., “reconstructing” the epidemic by estimating its parameters and evolution dynamics.

• Noisy time-series data coming from different sources.

1 Introduction: Epidemic modeling

2 Emulation and calibration of epidemic models

3 Preliminary results

4 Discussion

Introduction

Epidemic modeling

• Transmission model:

$$S(t) \xrightarrow{\text{getting infected}} E(t) \xrightarrow{\text{latent period}} I(t) \xrightarrow{\text{infectious period}} R(t)$$

• Transmission depends on the virulence, the mixing patterns in the population, and the transition rates among the S, E, I, and R states.

• Transmission dynamics are typically described by a system of differential equations.
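
A system of this kind can be sketched in a few lines. The block below is a minimal illustration only, assuming a textbook SEIR model with hypothetical rates `beta` (transmission), `sigma` (end of latency), and `gamma` (recovery) and a fixed-step Euler scheme; none of these names or values come from the Birrell et al. (2011) model.

```python
def seir_rhs(state, beta, sigma, gamma):
    """Right-hand side of a basic SEIR system (compartments as population fractions)."""
    s, e, i, r = state
    new_infections = beta * s * i      # S -> E: getting infected
    end_latency = sigma * e            # E -> I: end of the latent period
    recoveries = gamma * i             # I -> R: end of the infectious period
    return (-new_infections,
            new_infections - end_latency,
            end_latency - recoveries,
            recoveries)


def simulate_seir(beta=0.5, sigma=0.5, gamma=0.25, i0=1e-4, days=250, dt=0.1):
    """Integrate the SEIR equations with fixed-step Euler; returns one state per day."""
    state = (1.0 - i0, 0.0, i0, 0.0)
    steps_per_day = int(round(1.0 / dt))
    path = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            deriv = seir_rhs(state, beta, sigma, gamma)
            state = tuple(x + dt * dx for x, dx in zip(state, deriv))
        path.append(state)
    return path


path = simulate_seir()
peak_day = max(range(len(path)), key=lambda d: path[d][2])
print("peak infectious fraction on day", peak_day)
```

Euler is used here only to keep the sketch dependency-free; a production simulator would normally rely on an adaptive ODE solver.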

Birrell et al. (2011) H1N1 model

$$S(t) \xrightarrow{\eta_1,\,\eta_2} E(t) \xrightarrow{\eta_3} I(t) \xrightarrow{\eta_4} R(t)$$

↓ incubation ($\eta_5$)

Expected # of symptomatic individuals

↓ propensity to consult doctor ($\eta_6$)

Expected # of doctor consultations

↓ delay in reporting

Expected # of reported cases, $\mu(\eta, t)$

• $\eta = (\eta_1, \ldots, \eta_6)$: underlying parameters of the epidemic.

• Proportion of symptomatic cases, propensity to consult, exponential growth rate, expected infectious period, a measure of the initial number of infected individuals, population interaction parameters.

Computational challenge

• The likelihood of reported data, $z(t)$, $t = 1, \ldots, T$, depends on $\mu$.

• $$p(\eta \mid z_{1:T}, \mu) \propto \prod_{t=1}^{T} p\big(z(t);\, \mu(\eta, t)\big) \times p(\eta)$$

• µ(η, t) must be computed at every MCMC iteration.

• µ is computationally expensive.

• What about an efficient estimate?
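
To make the cost concrete, here is a minimal sketch of a random-walk Metropolis sampler for the posterior above. It assumes a Poisson likelihood, a Uniform(0, 1) prior on a single parameter, and a hypothetical cheap stand-in `mu(eta, t)` for the real simulator; the point is only that every iteration calls `mu` at all T time points.

```python
import math
import random


def mu(eta, t):
    """Hypothetical cheap stand-in for the expensive simulator output mu(eta, t)."""
    return 5.0 * math.exp(eta * t)


def log_posterior(eta, z):
    """log p(eta | z) up to a constant: Poisson log-likelihood plus Uniform(0, 1) log-prior."""
    if not 0.0 < eta < 1.0:
        return -math.inf
    total = 0.0
    for t, z_t in enumerate(z, start=1):
        m = mu(eta, t)                         # one simulator evaluation per time point
        total += z_t * math.log(m) - m - math.lgamma(z_t + 1)
    return total


def metropolis(z, n_iter=5000, step=0.005, seed=1):
    """Random-walk Metropolis; each iteration re-evaluates mu at the proposed eta."""
    rng = random.Random(seed)
    eta = 0.5
    current = log_posterior(eta, z)
    draws = []
    for _ in range(n_iter):
        proposal = eta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, z)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            eta, current = proposal, candidate
        draws.append(eta)
    return draws


eta_true = 0.2                                  # assumed value, for illustration only
z = [round(mu(eta_true, t)) for t in range(1, 21)]   # noise-free synthetic counts
draws = metropolis(z)
posterior_mean = sum(draws[1000:]) / len(draws[1000:])
print("posterior mean of eta:", round(posterior_mean, 3))
```

With 5000 iterations and T = 20, the stand-in is evaluated 100,000 times, which is why a slow simulator makes this scheme impractical.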

Computer simulator

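specify inputs η = X → run code → outputs

$$X = \begin{pmatrix} x_{1,1} & \cdots & x_{1,6} \\ x_{2,1} & \cdots & x_{2,6} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,6} \end{pmatrix} \;\xrightarrow{\text{Birrell et al. (2011)}}\; \begin{pmatrix} \mu(x_1, 1), \ldots, \mu(x_1, T) \\ \mu(x_2, 1), \ldots, \mu(x_2, T) \\ \vdots \\ \mu(x_n, 1), \ldots, \mu(x_n, T) \end{pmatrix}$$

[Figure: two panels, µ(η, t) against time (vertical axis in units of 10^4) and log µ(η, t) against time, for t from 0 to 250.]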

Calibration and Emulation

• Calibration (e.g., Higdon et al., 2004): posterior inference for η through the simulator, µ, and “field” observed data z(t),

Observed = Reality + Error

Observed (z) = Simulator (µ) + bias (b) + Error

• $$p(\eta, b \mid z_{1:T}, \mu) \propto \prod_{t=1}^{T} p\big(z(t);\, \mu(\eta, t) + b\big) \times p(\eta)\, p(b)$$

• Emulation (e.g., Kennedy and O’Hagan, 2000): estimating a slow computer simulator output, µ, using a fast statistical model (an emulator), say µ̂.

• Idea (e.g., Bayarri et al., 2007a): replace the slow simulator output, µ, with the fast emulator estimate, µ̂, and obtain posterior inference for η through

• $$p(\eta, b \mid z_{1:T}, \hat{\mu}) \propto \prod_{t=1}^{T} p\big(z(t);\, \hat{\mu}(\eta, t) + b\big) \times p(\eta)\, p(b)$$

Emulation and calibration of dynamic models

Emulation review

• A deterministic computer simulator is a function f(·) that maps input x to a unique output y = f(x).

• The function f(·) is treated as unknown and given a prior.

• Likelihood: data are runs of the simulator, given a design over the input space, e.g., a Latin hypercube.

• Emulator: the posterior (predictive) distribution of f(·).
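
As an illustration of the design step, the following is a minimal sketch of a Latin hypercube on the unit cube. The function name `latin_hypercube`, the seed, and the choice of 10 runs over 6 inputs are assumptions made for the example, not details from the talk.

```python
import random


def latin_hypercube(n_runs, n_inputs, seed=0):
    """Return an n_runs x n_inputs Latin hypercube design on the unit cube.

    Each input's range [0, 1) is cut into n_runs equal strata, and every
    stratum is used exactly once per input.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_inputs):
        strata = list(range(n_runs))
        rng.shuffle(strata)                     # random pairing of strata across inputs
        columns.append([(k + rng.random()) / n_runs for k in strata])
    # Transpose so that each row is one simulator input point x_i.
    return [list(row) for row in zip(*columns)]


design = latin_hypercube(n_runs=10, n_inputs=6)
for x in design[:3]:
    print([round(v, 2) for v in x])
```

Each row of `design` would then be rescaled to the plausible range of the corresponding parameter and passed to the simulator.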

The Gaussian process

$$y(x) \sim \mathrm{GP}\big(m(x),\; v\, c(x, x')\big)$$

m(·), v, and c(·, ·) are the mean, variance, and correlation function (e.g., Neal 1998; Rasmussen & Williams 2006).

[Figure: y against x, for x from −3 to 3.]

Toy example

[Figure: toy example, output against input for inputs from 0 to 8; legend: output function.]

$$f(x) = x + 3\sin(x/2)$$

[Figure: toy example, output against input for inputs from 0 to 8; legend: output function, simulator data, prior realizations.]

[Figure: toy example, output against input for inputs from 0 to 8; legend: output function, simulator data, prior realizations, 95% posterior region.]
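
The toy emulator can be sketched as follows, assuming the toy function f(x) = x + 3 sin(x/2), a Gaussian correlation function, a plug-in constant prior mean, and fixed hyperparameters. The design points, length scale, and variance `v` are illustrative assumptions; the settings behind the figures are not given in the slides.

```python
import math


def f(x):
    """Toy simulator used in the toy example."""
    return x + 3.0 * math.sin(x / 2.0)


def corr(a, b, length=2.0):
    """Gaussian correlation function; the length scale is an assumed value."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_chol(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_emulator(xs, ys, v=9.0, jitter=1e-8):
    """Return a function giving the GP posterior mean and variance at a new input."""
    mean = sum(ys) / len(ys)                       # plug-in constant prior mean
    c_mat = [[corr(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
    low = cholesky(c_mat)
    weights = solve_chol(low, [y - mean for y in ys])   # C^{-1} (y - m)

    def predict(x_new):
        c_vec = [corr(x_new, a) for a in xs]
        post_mean = mean + sum(c * w for c, w in zip(c_vec, weights))
        reduction = sum(c * s for c, s in zip(c_vec, solve_chol(low, c_vec)))
        return post_mean, max(v * (1.0 - reduction), 0.0)

    return predict


design = [0.0, 1.6, 3.2, 4.8, 6.4, 8.0]             # assumed simulator runs
predict = gp_emulator(design, [f(x) for x in design])
m, var = predict(4.0)
print("f(4) =", round(f(4.0), 3), " emulator mean =", round(m, 3),
      " 95% half-width =", round(1.96 * math.sqrt(var), 3))
```

The posterior variance collapses at the design points and grows between them, which is the behavior a 95% posterior region is meant to convey.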

Calibration review

• Simulator: Specify x → f (x).

• For x = η, f (η) simulates a physical system.

• η is uncertain.

• Calibration: solving the inverse problem, i.e., η | z, f(·).

• If f(·) is computationally expensive, it is emulated.

• Priors for η and f(·).

• Likelihood: data come from field observations and simulator runs.

Toy example

[Figure: toy example, output against input for inputs from 0 to 8; legend: output function, simulator data, field data.]

$$z \sim N\big(f(\eta),\; \sigma^2 = 0.3^2\big), \qquad \eta \sim N(2,\; 0.05^2)$$

[Figure: density against η, for η from 1 to 5; legend: truth, prior, posterior.]

• Assuming $\sigma^2$ is known.
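
A minimal sketch of this toy calibration evaluates the posterior of η on a grid. It calls the toy function directly rather than an emulator, reads the slide values as a noise standard deviation of 0.3 and a N(2, 0.05²) prior, and uses a hypothetical true value `eta_true` with ten synthetic field observations; all of these are assumptions for illustration.

```python
import math
import random


def f(eta):
    """Toy simulator, evaluated at the calibration input eta."""
    return eta + 3.0 * math.sin(eta / 2.0)


def grid_posterior(z, grid, sigma=0.3, prior_mean=2.0, prior_sd=0.05):
    """Normalised posterior weights for eta on a grid, with sigma treated as known."""
    log_post = []
    for eta in grid:
        log_prior = -0.5 * ((eta - prior_mean) / prior_sd) ** 2
        log_lik = sum(-0.5 * ((z_j - f(eta)) / sigma) ** 2 for z_j in z)
        log_post.append(log_prior + log_lik)
    top = max(log_post)                              # subtract the max for stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
eta_true = 2.1                                       # assumed value, for illustration only
z = [f(eta_true) + rng.gauss(0.0, 0.3) for _ in range(10)]   # synthetic field data

grid = [1.0 + 4.0 * k / 2000 for k in range(2001)]   # eta in [1, 5]
post = grid_posterior(z, grid)
post_mean = sum(e * w for e, w in zip(grid, post))
post_sd = math.sqrt(sum((e - post_mean) ** 2 * w for e, w in zip(grid, post)))
print("posterior mean:", round(post_mean, 3), " posterior sd:", round(post_sd, 3))
```

A grid works here because η is one-dimensional; with six parameters, as in the epidemic model, sampling methods such as MCMC are needed instead.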

Dynamic emulation

• $y_t(x_i) = f(x_i, t)$ is the simulator output at input point $x_i$ and time $t$.

$$\begin{array}{ccl} x_1 & \longrightarrow & y_1(x_1),\, y_2(x_1),\, \ldots,\, y_T(x_1) \\ x_2 & \longrightarrow & y_1(x_2),\, y_2(x_2),\, \ldots,\, y_T(x_2) \\ \vdots & & \vdots \\ x_n & \longrightarrow & y_1(x_n),\, y_2(x_n),\, \ldots,\, y_T(x_n) \end{array}$$

• Need to model three types of interdependencies:

1 over the input space.

2 over time within each time series.

3 across series of different input points.

• Modeling dependence over the input space alone

Typically using a Gaussian process prior for outputs.

$$y(x) \sim \mathrm{GP}\big(m(x),\; v\, c(x, x')\big)$$

• Modeling dependence for a single time series

Typically, a time-varying autoregressive model, TVAR(p), is used; e.g., for p = 1,

$$y_t(x) = \phi_t\, y_{t-1}(x) + \epsilon_t(x), \qquad \epsilon_t(x) \sim N(0, v_t),$$

$$\phi_t = \phi_{t-1} + \omega_t, \qquad \omega_t \sim N(0, w_t).$$
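
The TVAR(1) recursion above can be simulated directly. This is a minimal sketch that holds the variances constant over time (`v` and `w` in place of v_t and w_t) and uses illustrative starting values; these are simplifying assumptions.

```python
import random


def simulate_tvar1(n_steps, y0=1.0, phi0=0.9, v=0.01, w=1e-4, seed=0):
    """Simulate y_t = phi_t * y_{t-1} + eps_t with a random-walk coefficient phi_t.

    v and w are held constant over time here, a simplification of v_t and w_t.
    """
    rng = random.Random(seed)
    y, phi = y0, phi0
    ys, phis = [], []
    for _ in range(n_steps):
        phi = phi + rng.gauss(0.0, w ** 0.5)     # phi_t = phi_{t-1} + omega_t
        y = phi * y + rng.gauss(0.0, v ** 0.5)   # y_t = phi_t * y_{t-1} + eps_t
        ys.append(y)
        phis.append(phi)
    return ys, phis


ys, phis = simulate_tvar1(250)
print("last coefficient:", round(phis[-1], 3), " last value:", round(ys[-1], 3))
```

Setting `w` to zero recovers an ordinary AR(1) model with a fixed coefficient.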

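• Linking across time series for different inputs using a multivariate TVAR(p) model (Liu and West, 2009),

$$\begin{pmatrix} y_t(x_1) \\ \vdots \\ y_t(x_n) \end{pmatrix} = \begin{pmatrix} y_{t-1}(x_1) & \cdots & y_{t-p}(x_1) \\ \vdots & \ddots & \vdots \\ y_{t-1}(x_n) & \cdots & y_{t-p}(x_n) \end{pmatrix} \begin{pmatrix} \phi_{1,t} \\ \vdots \\ \phi_{p,t} \end{pmatrix} + \begin{pmatrix} \epsilon_t(x_1) \\ \vdots \\ \epsilon_t(x_n) \end{pmatrix}$$

$$\mathrm{Cov}\big(\epsilon_t(x_i),\, \epsilon_t(x_j)\big) = v_t\, c(x_i, x_j)$$

• $c(x_i, x_j)$ is the $(i, j)$ element in the $n \times n$ correlation matrix induced by the Gaussian process.

• $\phi_t = \phi_{t-1} + \omega_t$, where $\phi_t = (\phi_{1,t}, \ldots, \phi_{p,t})'$.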
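
One step of this multivariate recursion can be sketched as below. The sketch assumes a Gaussian correlation function with a single hypothetical length scale, three made-up input points, and fixed values for the coefficients and v_t; it illustrates the structure only and is not the estimation procedure of Liu and West (2009).

```python
import math
import random


def gaussian_corr(xi, xj, length=0.5):
    """Gaussian correlation c(x_i, x_j); the common length scale is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-0.5 * sq_dist / length ** 2)


def corr_matrix(inputs, jitter=1e-10):
    """n x n matrix with (i, j) element c(x_i, x_j); jitter keeps it positive definite."""
    return [[gaussian_corr(a, b) + (jitter if i == j else 0.0)
             for j, b in enumerate(inputs)] for i, a in enumerate(inputs)]


def cholesky(a):
    """Lower-triangular factor L with L L^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def tvar_step(lagged, phi, v_t, chol_c, rng):
    """One step y_t = F_t phi_t + eps_t, with Cov(eps_t) = v_t * C.

    lagged[i] = (y_{t-1}(x_i), ..., y_{t-p}(x_i)) is row i of F_t.
    """
    n = len(lagged)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate the white noise across input points with the Cholesky factor of C.
    eps = [math.sqrt(v_t) * sum(chol_c[i][k] * white[k] for k in range(i + 1))
           for i in range(n)]
    return [sum(f * p for f, p in zip(lagged[i], phi)) + eps[i] for i in range(n)]


inputs = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8)]      # three hypothetical input points
chol_c = cholesky(corr_matrix(inputs))
lagged = [(1.0, 0.8), (1.1, 0.9), (2.0, 1.5)]        # p = 2 lags per series
y_t = tvar_step(lagged, phi=(0.6, 0.3), v_t=0.04, chol_c=chol_c, rng=random.Random(0))
print([round(val, 3) for val in y_t])
```

Because the first two input points are close together, their innovations are strongly correlated, so their series move together; the distant third point evolves almost independently.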

Birrell et al. (2011) simulator

[Figure: four panels of log µ against x2 (from 0 to 1), at t = 20, 40, 85, and 140.]

• x2: exponential growth rate.

[Figure: log µ(η, t) against time, for t from 0 to 250.]

Dynamic emulation

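• Extending Liu and West (2009)

• Modeling input-dependent trends:

$$y_t(x) = \phi_t\, y_{t-1}(x) + h(x)\,\beta_t + \epsilon_t$$

• Modeling systematic temporal trend:

$$y_t(x) = \theta_t + \phi_t\, y_{t-1}(x) + h(x)\,\beta_t + \epsilon_t$$

$$\begin{pmatrix} \theta_t \\ \phi_t \\ \beta_t \end{pmatrix} = \begin{pmatrix} \theta_{t-1} \\ \phi_{t-1} \\ \beta_{t-1} \end{pmatrix} + \begin{pmatrix} \omega_{1t} \\ \omega_{2t} \\ \omega_{3t} \end{pmatrix}$$

• Posterior inference through Forward-Filtering Backward-Sampling.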
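
Forward-Filtering Backward-Sampling can be sketched for the simplest case: a single series, one lag, no trend terms, and known constant variances `v` and `w`. These restrictions, the N(m0, c0) prior on the initial coefficient, and the synthetic data are assumptions for illustration; the full model also carries θ_t and β_t and correlates innovations across inputs.

```python
import random


def ffbs(y, y0, v, w, m0=0.0, c0=1.0, rng=None):
    """One joint posterior draw of phi_1..phi_T for the scalar model
    y_t = phi_t * y_{t-1} + eps_t,  phi_t = phi_{t-1} + omega_t,
    with known constant variances v (> 0) and w (>= 0) and prior phi_0 ~ N(m0, c0).
    Returns the sampled path and the filtered means.
    """
    rng = rng or random.Random(0)
    regressors = [y0] + list(y[:-1])           # F_t = y_{t-1}
    m, c = m0, c0
    means, variances, prior_means, prior_vars = [], [], [], []
    # Forward filtering (Kalman recursions).
    for y_t, f_t in zip(y, regressors):
        a, r = m, c + w                         # prior for phi_t given data up to t-1
        q = f_t * f_t * r + v                   # one-step forecast variance
        gain = r * f_t / q
        m = a + gain * (y_t - f_t * a)          # filtered mean
        c = r * v / q                           # filtered variance (= r - gain^2 * q)
        prior_means.append(a)
        prior_vars.append(r)
        means.append(m)
        variances.append(c)
    # Backward sampling, starting from the final filtered distribution.
    path = [0.0] * len(y)
    path[-1] = rng.gauss(means[-1], variances[-1] ** 0.5)
    for t in range(len(y) - 2, -1, -1):
        b = variances[t] / prior_vars[t + 1]
        h = means[t] + b * (path[t + 1] - prior_means[t + 1])
        var = max(variances[t] - b * b * prior_vars[t + 1], 0.0)
        path[t] = rng.gauss(h, var ** 0.5)
    return path, means


sim = random.Random(42)
y, prev = [], 1.0
for _ in range(200):
    prev = 0.8 * prev + sim.gauss(0.0, 0.1)     # synthetic series with constant phi = 0.8
    y.append(prev)
path, means = ffbs(y, y0=1.0, v=0.01, w=1e-4)
print("filtered mean at T:", round(means[-1], 3), " sampled phi_T:", round(path[-1], 3))
```

Repeating the call with different random states gives independent draws of the whole coefficient path, which is what a Gibbs sampler for the remaining parameters would use.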

Calibration

• Two sources of data:

• Simulator data: $D_s = \{(y_t, x);\ t = 1, \ldots, T\}$. Model parameters are specified as inputs x.

• “Field” observed epidemic data: $D_F = \{z_t;\ t = 1, \ldots, T\}$. Model parameters, η, are unknown.

• Two-stage calibration (e.g., Bayarri et al., 2007b)

• Stage 1: Estimate the emulator model parameters using only $D_s$.

• Stage 2: Model $z_t$ using a parametric distribution centered on the emulator model. Then, conditional on stage 1, estimate $p(\eta \mid D_F, D_s)$.

Results

Validating the emulator

[Figure: nine panels of log µ against time, for t from 0 to 250.]

• Simulation runs (black), emulator’s median & 95% region (red).

• Plots based on a MVTVAR(1) and Gaussian correlation function.

Calibration

• Generated synthetic epidemic data.

• Set $\eta = \eta_0$. Then, $z(t) \sim \mathrm{Poisson}\big(\mu(\eta_0, t)\big)$.

[Figure: log observation against time, for t from 0 to 250.]

[Figure: six panels, one for each of η1, . . . , η6, each over the range 0 to 1; legend: truth, prior.]

• η2 is the exponential growth rate, and η5 is the effect of the summer holiday on population interaction.

Discussion

• What we have done:

• Estimation of epidemic dynamics by combining a statistical emulator with reported epidemic data.

• Dynamic emulation through modeling dependencies across time and epidemic parameter space.

• Still to do:

• Consider different age groups in the population.

• Incorporate additional sources of information.

• Real-time calibration and forecasting using epidemic data.

• . . .

Thank you!

References:

• Bayarri, M., Berger, J., Paulo, R., Sacks, J., Cafeo, J., Cavendish, J., Lin, C., and Tu, J. (2007a), “A framework for validation of computer models,” Technometrics, 49, 138–154.

• Bayarri, M. J., Berger, J. O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R., Paulo, R., Sacks, J., and Walsh, D. (2007b), “Computer model validation with functional output,” Annals of Statistics, 35, 1874–1906.

• Birrell, P. J., Ketsetzis, G., Gay, N. J., Cooper, B. S., Presanis, A. M., Harris, R. J., Charlett, A., Zhang, X.-S., White, P. J., Pebody, R. G., and De Angelis, D. (2011), “Bayesian modeling to unmask and predict influenza A/H1N1pdm dynamics in London,” Proceedings of the National Academy of Sciences.

• Kennedy, M. C. and O’Hagan, A. (2000), “Predicting the output from a complex computer code when fast approximations are available,” Biometrika, 87, 1–13.

• Liu, F. and West, M. (2009), “A Dynamic Modelling Strategy for Bayesian Computer Model Emulation,” Bayesian Analysis, 4, 393–412.