Computational methods for continuous time Markov chains with applications to biological processes

David F. Anderson∗

[email protected]

Department of Mathematics

University of Wisconsin - Madison

Penn. State

January 13th, 2012

Stochastic Models of Biochemical Reaction Systems

- Most common stochastic models of biochemical reaction systems are continuous time Markov chains.

- Often called "chemical master equation" type models in the biosciences.

Common examples include:

1. Gene regulatory networks.

2. Models of viral infection.

3. General population models (epidemic, predator-prey, etc.)

Path-wise simulation methods include:

Language in Biology    | Language in Math
Gillespie's Algorithm  | Simulate the embedded DTMC
Next reaction method   | Simulate the random time change representation of Tom Kurtz
First reaction method  | Simulate using exponential "alarm clocks"
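For reference, here is a minimal Python sketch of the first row of the table, Gillespie's direct method: simulate the embedded DTMC together with exponential holding times. The propensity function, stoichiometry, and the A + B → C example at the bottom are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gillespie(x0, zeta, propensities, T, rng=np.random.default_rng()):
    """Simulate one CTMC path on [0, T] with Gillespie's direct method.

    x0           : initial state (1-d array of species counts)
    zeta         : (num_reactions, num_species) array of reaction vectors
    propensities : function x -> array of reaction intensities lambda_k(x)
    Returns the state at time T.
    """
    x = np.array(x0, dtype=float)
    t = 0.0
    while True:
        lam = propensities(x)
        lam0 = lam.sum()
        if lam0 <= 0:                      # absorbing state: nothing can fire
            return x
        t += rng.exponential(1.0 / lam0)   # exponential holding time
        if t > T:
            return x
        k = rng.choice(len(lam), p=lam / lam0)  # embedded DTMC step
        x += zeta[k]

# Illustrative (assumed) example: A + B -> C with mass-action rate constant kappa.
kappa = 0.01
zeta = np.array([[-1.0, -1.0, 1.0]])
prop = lambda x: np.array([kappa * x[0] * x[1]])
print(gillespie([100, 80, 0], zeta, prop, T=5.0))
```

Averaging f of the returned state over many independent calls gives the crude Monte Carlo estimators discussed next.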

Stochastic Models of Biochemical Reaction Systems

Path-wise methods can approximate values such as

E f(X(t)).

For example,

1. Means: f(x) = x_i.

2. Moments/variances: f(x) = x_i^2.

3. Probabilities: f(x) = 1_{x ∈ A}.

Or they can compute sensitivities:

(d/dκ) E f(X_κ(t)).

Problem: solving using these algorithms can be computationally expensive.

First problem: joint with Des Higham

Our first problem: approximate E f(X(T)) to some desired tolerance ε > 0.

Easy!

- Simulate the CTMC exactly,

- generate independent paths X_[i](t) and use the unbiased estimator

  µ_n = (1/n) ∑_{i=1}^n f(X_[i](t)),

- stop when the desired confidence interval is ± ε.

What is the computational cost?

Recall,

µ_n = (1/n) ∑_{i=1}^n f(X_[i](t)).

Thus,

Var(µ_n) = O(1/n).

So, if we want σ_n = O(ε), we need

1/√n = O(ε)  ⟹  n = O(ε^-2).

If N gives the average cost (steps) of a path using the exact algorithm:

Total computational complexity = (cost per path) × (# paths) = O(N ε^-2).

Can be bad if (i) N is large, or (ii) ε is small.
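A minimal sketch of the crude Monte Carlo loop implied by this count: keep adding independent exact paths until the 95% confidence half-width falls below ε, so that roughly n = O(ε^-2) paths get used. The function names, the batch size, and the use of the normal quantile 1.96 are assumptions for illustration; `sample_path` stands for any routine returning one exact path value X_[i](T), such as the `gillespie` sketch above.

```python
import numpy as np

def crude_mc(sample_path, f, eps, batch=1000, max_paths=10**7, rng=np.random.default_rng()):
    """Estimate E f(X(T)) until the 95% CI half-width is at most eps."""
    vals = []
    while len(vals) < max_paths:
        vals.extend(f(sample_path(rng)) for _ in range(batch))
        v = np.asarray(vals, dtype=float)
        half_width = 1.96 * v.std(ddof=1) / np.sqrt(len(v))  # normal-approximation CI
        if half_width <= eps:
            break
    return v.mean(), half_width, len(v)
```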

Benefits/drawbacks

Benefits:

1. Easy to implement.

2. The estimator

   µ_n = (1/n) ∑_{i=1}^n f(X_[i](t))

   is unbiased.

Drawbacks:

1. The cost of O(Nε−2) could be prohibitively large.

2. For our models, we often have that N is very large.

We need to develop the model for better ideas....

Build up model: Random time change representation of Tom Kurtz

Consider the simple system

A + B → C,

where one molecule each of A and B is converted to one of C.

Simple book-keeping: if X(t) = (X_A(t), X_B(t), X_C(t))^T gives the state at time t,

X(t) = X(0) + R(t) (−1, −1, 1)^T,

where

- R(t) is the number of times the reaction has occurred by time t, and

- X(0) is the initial condition.

Build up model: Random time change representation of Tom Kurtz

Assuming the intensity (or propensity) of the reaction is

κ X_A(s) X_B(s),

we can model

R(t) = Y( ∫_0^t κ X_A(s) X_B(s) ds ),

where Y is a unit-rate Poisson process.

Hence

(X_A(t), X_B(t), X_C(t))^T ≡ X(t) = X(0) + (−1, −1, 1)^T Y( ∫_0^t κ X_A(s) X_B(s) ds ).
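Because the intensity κ X_A(s) X_B(s) is constant between reaction events, the representation above can be simulated exactly by stepping the unit-rate process Y through exponential waiting times. A minimal Python sketch; the rate κ, the initial counts, and the time horizon are assumptions for illustration.

```python
import numpy as np

def simulate_A_plus_B_to_C(xA, xB, xC, kappa, T, rng=np.random.default_rng()):
    """Exact path of A + B -> C via the random time change representation.

    Between jumps the intensity kappa*xA*xB is constant, so the next firing
    of the unit-rate process Y occurs after an exponential amount of internal
    time, i.e. after an exponential(rate = kappa*xA*xB) amount of real time.
    """
    t, R = 0.0, 0                      # R counts reaction firings
    while True:
        lam = kappa * xA * xB
        if lam == 0:
            return (xA, xB, xC), R
        t += rng.exponential(1.0 / lam)
        if t > T:
            return (xA, xB, xC), R
        xA, xB, xC, R = xA - 1, xB - 1, xC + 1, R + 1

print(simulate_A_plus_B_to_C(50, 30, 0, kappa=0.02, T=2.0))
```

For a network, the next reaction method applies the same idea with one unit-rate clock per reaction channel.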

Build up model: Random time change representation of Tom Kurtz

• Now consider a network of reactions involving d chemical species, S_1, ..., S_d:

  ∑_{i=1}^d ν_{ik} S_i  →  ∑_{i=1}^d ν'_{ik} S_i.

• Denote the reaction vector by ζ_k = ν'_k − ν_k.

• The intensity (or propensity) of the k-th reaction is λ_k : Z^d_{≥0} → R.

• By analogy with before,

  X(t) = X(0) + ∑_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k,

  where the Y_k are independent, unit-rate Poisson processes.

Example

Consider a model of gene transcription and translation:

G → G + M, rate 25 (transcription)

M → M + P, rate 1000 (translation)

P + P → D, rate 0.001 (dimerization)

M → ∅, rate 0.1 (degradation of mRNA)

P → ∅, rate 1 (degradation of protein).

Then, if X = [X_M, X_P, X_D]^T,

X(t) = X(0) + Y_1(25 t) (1, 0, 0)^T

      + Y_2( 1000 ∫_0^t X_M(s) ds ) (0, 1, 0)^T

      + Y_3( 0.001 ∫_0^t X_P(s)(X_P(s) − 1) ds ) (0, −2, 1)^T

      + Y_4( 0.1 ∫_0^t X_M(s) ds ) (−1, 0, 0)^T

      + Y_5( 1 ∫_0^t X_P(s) ds ) (0, −1, 0)^T.

Back to our problem

Recall:

Benefits:

1. Easy to implement.

2. The estimator

   µ_n = (1/n) ∑_{i=1}^n f(X_[i](t))

   is unbiased.

Drawbacks:

1. The cost of O(Nε−2) could be prohibitively large.

2. For our models, we often have that N is very large.

Let’s try an approximate scheme.

Tau-leaping: Euler’s method

Explicit tau-leaping¹, or Euler's method, was first formulated by Dan Gillespie in this setting.

Tau-leaping is essentially an Euler approximation of ∫_0^t λ_k(X(s)) ds:

Z(h) = Z(0) + ∑_k Y_k( ∫_0^h λ_k(Z(s)) ds ) ζ_k

     ≈ Z(0) + ∑_k Y_k( λ_k(Z(0)) h ) ζ_k

     =_d Z(0) + ∑_k Poisson( λ_k(Z(0)) h ) ζ_k.

¹ D. T. Gillespie, J. Chem. Phys., 115, 1716–1733.

Euler’s method

The path-wise representation for Z(t) generated by Euler's method is

Z(t) = X(0) + ∑_k Y_k( ∫_0^t λ_k(Z ∘ η(s)) ds ) ζ_k,

where

η(s) = t_n if t_n ≤ s < t_{n+1} = t_n + h

is a step function giving the left endpoints of the time discretization.
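A minimal Python sketch of this scheme: freeze the intensities at the left endpoint of each step and draw Poisson numbers of firings. The model passed in via `zeta` and `propensities` is arbitrary; treating T as a multiple of h and clipping negative rates are simplifying assumptions of the sketch.

```python
import numpy as np

def tau_leap(x0, zeta, propensities, T, h, rng=np.random.default_rng()):
    """Euler (explicit tau-leap) approximation Z(T) with fixed step h.

    On each step the intensities are frozen at the left endpoint eta(s),
    so the number of firings of reaction k is Poisson(lambda_k(Z(t_n)) * h).
    Assumes T is (close to) an integer multiple of h.
    """
    z = np.array(x0, dtype=float)
    n_steps = int(np.ceil(T / h))
    for _ in range(n_steps):
        lam = np.maximum(propensities(z), 0.0)   # guard against negative rates
        firings = rng.poisson(lam * h)           # one Poisson draw per channel
        z += firings @ zeta
    return z
```

Note that nothing here prevents species counts from going negative; that is exactly the tau-leaping drawback raised in the drawbacks slide below.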

Return to approximating E f(X(T))

Let Z_L denote an approximate process generated with a time discretization step of h_L. Let

µ_n = (1/n) ∑_{i=1}^n f(Z_{L,[i]}(t)).

We note

E f(X(t)) − µ_n = [ E f(X(t)) − E f(Z_L(t)) ] + [ E f(Z_L(t)) − µ_n ].

Suppose we have an order one method:

E f(X(t)) − E f(Z_L(t)) = O(h_L).

We need:

1. h_L = O(ε).

2. n = O(ε^-2).

Suppose a path costs O(ε^-1) steps. Then

Total computational complexity = (# paths) × (cost per path) = O(ε^-3).

Benefits/drawbacks

Benefits:

1. Can drastically lower the computational complexity of a problem if ε^-1 ≪ N:

   CC of using the exact method = N ε^-2,

   CC of using the approximate method = ε^-1 ε^-2.

Drawbacks:

1. Convergence results usually give only the order of convergence; they cannot give a precise h_L. Bias is a problem.

2. Tau-leaping has problems: what happens if you go negative?

3. We have given up an unbiased estimator.

Multi-level Monte Carlo and control variates

- Suppose I want

  EX ≈ (1/n) ∑_{i=1}^n X_[i],

  but realizations of X are expensive.

- Suppose X ≈ Z_L, and Z_L is cheap.

- Suppose X and Z_L can be generated simultaneously so that

  Var(X − Z_L)

  is small.

- Then use

  EX = E[X − Z_L] + E Z_L ≈ (1/n_1) ∑_{i=1}^{n_1} (X_[i] − Z_{L,[i]}) + (1/n_2) ∑_{i=1}^{n_2} Z_{L,[i]}.

- Multi-level Monte Carlo (Mike Giles, Stefan Heinrich) = keep going:

  EX = E(X − Z_L) + E Z_L = E(X − Z_L) + E(Z_L − Z_{L−1}) + E Z_{L−1} = ···
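A minimal Python sketch of the two-level control-variate identity EX = E[X − Z_L] + E Z_L. The two sampling routines are assumptions: `sample_coupled` must return one coupled draw (X, Z_L) (the coupling itself is constructed later in the talk), and `sample_cheap` one independent draw of Z_L.

```python
import numpy as np

def two_level_estimate(sample_coupled, sample_cheap, n1, n2, rng=np.random.default_rng()):
    """Control-variate estimate of EX = E[X - Z_L] + E[Z_L].

    sample_coupled(rng) -> (x, z) : one coupled draw of (X, Z_L)
    sample_cheap(rng)   -> z      : one independent draw of Z_L
    n1, n2              : number of coupled / cheap samples
    """
    diffs = np.array([np.subtract(*sample_coupled(rng)) for _ in range(n1)])
    cheap = np.array([sample_cheap(rng) for _ in range(n2)])
    estimate = diffs.mean() + cheap.mean()
    # rough variance of the combined estimator (the two terms are independent)
    var = diffs.var(ddof=1) / n1 + cheap.var(ddof=1) / n2
    return estimate, var
```

Because Var(X − Z_L) is small, n_1 can be taken much smaller than n_2, which is where the savings come from.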

Multi-level Monte Carlo: an unbiased estimator

In our setting:

E f(X(t)) = E[f(X(t)) − f(Z_L(t))] + ∑_{ℓ=ℓ_0+1}^{L} E[f(Z_ℓ(t)) − f(Z_{ℓ−1}(t))] + E f(Z_{ℓ_0}(t)).

For appropriate choices of n_0, n_ℓ, and n_E, we define estimators for the three terms above via

Q_E := (1/n_E) ∑_{i=1}^{n_E} ( f(X_[i](T)) − f(Z_{L,[i]}(T)) ),

Q_ℓ := (1/n_ℓ) ∑_{i=1}^{n_ℓ} ( f(Z_{ℓ,[i]}(T)) − f(Z_{ℓ−1,[i]}(T)) ),  for ℓ ∈ {ℓ_0 + 1, ..., L},

Q_0 := (1/n_0) ∑_{i=1}^{n_0} f(Z_{ℓ_0,[i]}(T)),

and note that

Q := Q_E + ∑_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0

is an unbiased estimator for E f(X(T)).

So what is the coupling, and what is the variance of the estimator?
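A minimal Python sketch of assembling Q from per-level estimators, assuming the coupled samplers already exist; the argument names and the way sample sizes are passed in are illustrative, not from the slides.

```python
import numpy as np

def mlmc_estimate(sample_exact_vs_finest,  # rng -> f(X(T)) - f(Z_L(T)), coupled
                  sample_level_pair,        # (rng, l) -> f(Z_l(T)) - f(Z_{l-1}(T)), coupled
                  sample_coarsest,          # rng -> f(Z_{l0}(T))
                  n_E, n_levels, n_0, l0, L, rng=np.random.default_rng()):
    """Unbiased MLMC estimator Q = Q_E + sum_l Q_l + Q_0.

    n_levels : mapping l -> number of coupled samples n_l at that level.
    """
    Q_E = np.mean([sample_exact_vs_finest(rng) for _ in range(n_E)])
    Q_levels = sum(
        np.mean([sample_level_pair(rng, l) for _ in range(n_levels[l])])
        for l in range(l0 + 1, L + 1)
    )
    Q_0 = np.mean([sample_coarsest(rng) for _ in range(n_0)])
    return Q_E + Q_levels + Q_0
```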

How do we generate processes simultaneously

Suppose I want to generate:

- A Poisson process with intensity 13.1.

- A Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent, unit-rate Poisson processes and set

Z_13.1(t) = Y_1(13.1 t),

Z_13(t) = Y_2(13 t).

Using this representation, these processes are independent and, hence, not coupled.

The variance of the difference is large:

Var(Z_13.1(t) − Z_13(t)) = Var(Y_1(13.1 t)) + Var(Y_2(13 t)) = 26.1 t.

How do we generate processes simultaneously

Suppose I want to generate:

- A Poisson process with intensity 13.1.

- A Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent unit-rate Poisson processes and set

Z_13.1(t) = Y_1(13 t) + Y_2(0.1 t),

Z_13(t) = Y_1(13 t).

The variance of the difference is much smaller:

Var(Z_13.1(t) − Z_13(t)) = Var(Y_2(0.1 t)) = 0.1 t.
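A quick Monte Carlo check of the two constructions at a fixed time t, assuming nothing beyond numpy: under the shared-clock construction the difference is just Y_2(0.1 t), so its sample variance should sit near 0.1 t rather than 26.1 t.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 5.0, 200_000

# Independent construction: Var(difference) should be near 26.1 * t.
diff_indep = rng.poisson(13.1 * t, n) - rng.poisson(13.0 * t, n)

# Coupled construction: Z_13.1 = Y1(13 t) + Y2(0.1 t), Z_13 = Y1(13 t),
# so the difference is Y2(0.1 t) and Var should be near 0.1 * t.
diff_coupled = rng.poisson(0.1 * t, n)   # Y1 cancels exactly

print(diff_indep.var(), 26.1 * t)        # both near 130.5
print(diff_coupled.var(), 0.1 * t)       # both near 0.5
```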

How do we generate processes simultaneously

More generally, suppose we want

1. a non-homogeneous Poisson process with intensity f(t), and

2. a non-homogeneous Poisson process with intensity g(t).

We can let Y_1, Y_2, and Y_3 be independent, unit-rate Poisson processes and define

Z_f(t) = Y_1( ∫_0^t f(s) ∧ g(s) ds ) + Y_2( ∫_0^t f(s) − (f(s) ∧ g(s)) ds ),

Z_g(t) = Y_1( ∫_0^t f(s) ∧ g(s) ds ) + Y_3( ∫_0^t g(s) − (f(s) ∧ g(s)) ds ),

where we are using that, for example,

Y_1( ∫_0^t f(s) ∧ g(s) ds ) + Y_2( ∫_0^t f(s) − (f(s) ∧ g(s)) ds ) = Y( ∫_0^t f(s) ds ),

where Y is a unit-rate Poisson process.

Back to our processes

X(t) = X(0) + ∑_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k,

Z(t) = X(0) + ∑_k Y_k( ∫_0^t λ_k(Z ∘ η(s)) ds ) ζ_k.

Now couple:

X(t) = X(0) + ∑_k Y_{k,1}( ∫_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k

            + ∑_k Y_{k,2}( ∫_0^t λ_k(X(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k,

Z_ℓ(t) = Z_ℓ(0) + ∑_k Y_{k,1}( ∫_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k

               + ∑_k Y_{k,3}( ∫_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k.

The algorithm for simulation is equivalent to the next reaction method or Gillespie.

For approximate processes

Z_ℓ(t) = Z_ℓ(0) + ∑_k Y_{k,1}( ∫_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k

               + ∑_k Y_{k,2}( ∫_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k,

Z_{ℓ−1}(t) = Z_{ℓ−1}(0) + ∑_k Y_{k,1}( ∫_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k

                       + ∑_k Y_{k,3}( ∫_0^t λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k.

The algorithm for simulation is equivalent to τ-leaping.
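A minimal Python sketch of simulating this coupled pair (Z_ℓ, Z_{ℓ−1}) by tau-leaping, assuming nested meshes with h_{ℓ−1} = M · h_ℓ: on each fine step the shared channel fires a Poisson number of times with rate equal to the minimum of the two frozen intensities, and the two leftover channels fire independently. The routine and its arguments are illustrative assumptions, not code from the slides.

```python
import numpy as np

def coupled_tau_leap(z0, zeta, propensities, T, h_fine, M, rng=np.random.default_rng()):
    """One coupled sample (Z_l(T), Z_{l-1}(T)) with h_coarse = M * h_fine.

    The coarse intensities are frozen at the last coarse grid point
    eta_{l-1}(s); the shared firings come from one Poisson draw with the
    minimum intensity, the leftovers from two independent draws.
    """
    zf = np.array(z0, dtype=float)     # fine-level process Z_l
    zc = np.array(z0, dtype=float)     # coarse-level process Z_{l-1}
    n_fine = int(round(T / h_fine))
    for step in range(n_fine):
        if step % M == 0:              # refresh frozen coarse intensities
            lam_c = np.maximum(propensities(zc), 0.0)
        lam_f = np.maximum(propensities(zf), 0.0)
        lam_min = np.minimum(lam_f, lam_c)
        shared  = rng.poisson(lam_min * h_fine)             # Y_{k,1}
        extra_f = rng.poisson((lam_f - lam_min) * h_fine)   # Y_{k,2}
        extra_c = rng.poisson((lam_c - lam_min) * h_fine)   # Y_{k,3}
        zf += (shared + extra_f) @ zeta
        zc += (shared + extra_c) @ zeta
    return zf, zc
```

A coupled sample of f(Z_ℓ(T)) − f(Z_{ℓ−1}(T)) for the estimator Q_ℓ is then f(zf) − f(zc).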

Multi-level Monte Carlo: chemical kinetic setting

Can prove:

Theorem (Anderson, Higham 2011). Suppose (X, Z_ℓ) satisfy the coupling. Then

sup_{t≤T} E|X(t) − Z_ℓ(t)|² ≤ C_1(T) N^{-ρ} h_ℓ + C_2(T) h_ℓ².

Theorem (Anderson, Higham 2011). Suppose (Z_ℓ, Z_{ℓ−1}) satisfy the coupling. Then

sup_{t≤T} E|Z_ℓ(t) − Z_{ℓ−1}(t)|² ≤ C_1(T) N^{-ρ} h_ℓ + C_2(T) h_ℓ².

¹ David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for stochastically modeled chemical kinetic systems. To appear in SIAM Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

Multi-level Monte Carlo: an unbiased estimator

For well-chosen n_0, n_ℓ, and n_E we have

Var(Q) = Var( Q_E + ∑_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0 ) = O(ε²),

with

Comp. cost = [ ε^-2 (N^{-ρ} h_L + h_L²) ] N + ε^-2 ( h_{ℓ_0}^{-1} + ln(ε)² N^{-ρ} + ln(ε^{-1}) (1/(M−1)) h_{ℓ_0} ).

Multi-level Monte Carlo: an unbiased estimator

Some observations:

1. The weak error plays no role in the analysis: we are free to choose h_L.

2. Common problems associated with tau-leaping, such as negativity of species numbers, do not matter. Just define the process in a sensible way.

3. The method is unbiased.

Example

Consider a model of gene transcription and translation:

G → G + M, rate 25,

M → M + P, rate 1000,

P + P → D, rate 0.001,

M → ∅, rate 0.1,

P → ∅, rate 1.

Suppose:

1. initialize with: G = 1, M = 0, P = 0, D = 0,

2. want to estimate the expected number of dimers at time T = 1,

3. to an accuracy of ± 1.0 with 95% confidence.

Example

Method: exact algorithm with crude Monte Carlo.

Approximation | # paths | CPU time | # updates
3,714.2 ± 1.0 | 4,740,000 | 149,000 CPU s (41 hours!) | 8.27 × 10^10

Method: Euler tau-leaping with crude Monte Carlo.

Step size | Approximation | # paths | CPU time | # updates
h = 3^-7 | 3,712.3 ± 1.0 | 4,750,000 | 13,374.6 s | 6.2 × 10^10
h = 3^-6 | 3,707.5 ± 1.0 | 4,750,000 | 6,207.9 s | 2.1 × 10^10
h = 3^-5 | 3,693.4 ± 1.0 | 4,700,000 | 2,803.9 s | 6.9 × 10^9
h = 3^-4 | 3,654.6 ± 1.0 | 4,650,000 | 1,219.0 s | 2.6 × 10^9

Method: unbiased MLMC with ℓ_0 = 2, and M and L detailed below.

Step-size parameters | Approximation | CPU time | # updates
M = 3, L = 6 | 3,713.9 ± 1.0 | 1,063.3 s | 1.1 × 10^9
M = 3, L = 5 | 3,714.7 ± 1.0 | 1,114.9 s | 9.4 × 10^8
M = 3, L = 4 | 3,714.2 ± 1.0 | 1,656.6 s | 1.0 × 10^9
M = 4, L = 4 | 3,714.2 ± 1.0 | 1,334.8 s | 1.1 × 10^9
M = 4, L = 5 | 3,713.8 ± 1.0 | 1,014.9 s | 1.1 × 10^9

- The exact algorithm with crude Monte Carlo demanded 140 times more CPU time than our unbiased MLMC estimator!

Example

Method: exact algorithm with crude Monte Carlo.

Approximation | # paths | CPU time | # updates
3,714.2 ± 1.0 | 4,740,000 | 149,000 CPU s (41 hours!) | 8.27 × 10^10

Unbiased multi-level Monte Carlo with M = 3, L = 5, and ℓ_0 = 2.

Level | # paths | CPU time | Var. estimator | # updates
(X, Z_{3^-5}) | 3,900 | 279.6 s | 0.0658 | 6.8 × 10^7
(Z_{3^-5}, Z_{3^-4}) | 30,000 | 49.0 s | 0.0217 | 8.8 × 10^7
(Z_{3^-4}, Z_{3^-3}) | 150,000 | 71.7 s | 0.0179 | 1.5 × 10^8
(Z_{3^-3}, Z_{3^-2}) | 510,000 | 112.3 s | 0.0319 | 1.7 × 10^8
Tau-leap with h = 3^-2 | 8,630,000 | 518.4 s | 0.1192 | 4.7 × 10^8
Totals | N.A. | 1,031.0 s | 0.2565 | 9.5 × 10^8

Some conclusions about this method

1. Gillespie's algorithm is by far the most common way to compute expectations:

   1.1 Means.

   1.2 Variances.

   1.3 Probabilities.

2. The new method (MLMC) also performs this task with no bias (exact).

3. It will be at worst the same speed as Gillespie (exact algorithm + crude Monte Carlo).

4. It will commonly be many orders of magnitude faster.

5. It is applicable to essentially all continuous time Markov chains:

   X(t) = X(0) + ∑_k Y_k( ∫_0^t λ_k(X(s)) ds ) ζ_k.

6. Con: it is substantially harder to implement; good software is needed.

7. It makes no use of any specific structure or scaling in the problem.

Another example: Viral infection

Let

1. T = viral template.

2. G = viral genome.

3. S = viral structure.

4. V = virus.

Reactions:

R1) T + stuff → T + G, κ_1 = 1

R2) G → T, κ_2 = 0.025

R3) T + stuff → T + S, κ_3 = 1000

R4) T → ∅, κ_4 = 0.25

R5) S → ∅, κ_5 = 2

R6) G + S → V, κ_6 = 7.5 × 10^-6

- R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
- E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
- K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
- W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.

Another example: Viral infection

The stochastic equations for X = (X_G, X_S, X_T, X_V) are

X_1(t) = X_1(0) + Y_1( ∫_0^t X_3(s) ds ) − Y_2( 0.025 ∫_0^t X_1(s) ds ) − Y_6( 7.5 × 10^-6 ∫_0^t X_1(s) X_2(s) ds ),

X_2(t) = X_2(0) + Y_3( 1000 ∫_0^t X_3(s) ds ) − Y_5( 2 ∫_0^t X_2(s) ds ) − Y_6( 7.5 × 10^-6 ∫_0^t X_1(s) X_2(s) ds ),

X_3(t) = X_3(0) + Y_2( 0.025 ∫_0^t X_1(s) ds ) − Y_4( 0.25 ∫_0^t X_3(s) ds ),

X_4(t) = X_4(0) + Y_6( 7.5 × 10^-6 ∫_0^t X_1(s) X_2(s) ds ).

Another example: Viral infection

Reactions:

R1) T + stuff → T + G, κ_1 = 1

R2) G → T, κ_2 = 0.025

R3) T + stuff → T + S, κ_3 = 1000

R4) T → ∅, κ_4 = 0.25

R5) S → ∅, κ_5 = 2

R6) G + S → V, κ_6 = 7.5 × 10^-6

If T > 0,

- reactions 3 and 5 are much faster than the others, and

- S looks approximately Poisson(500 × T).

We can average out to get an approximate process Z(t).

Another example: Viral infection

The approximate process satisfies

Z_1(t) = X_1(0) + Y_1( ∫_0^t Z_3(s) ds ) − Y_2( 0.025 ∫_0^t Z_1(s) ds ) − Y_6( 3.75 × 10^-3 ∫_0^t Z_1(s) Z_3(s) ds ),

Z_3(t) = X_3(0) + Y_2( 0.025 ∫_0^t Z_1(s) ds ) − Y_4( 0.25 ∫_0^t Z_3(s) ds ),

Z_4(t) = X_4(0) + Y_6( 3.75 × 10^-3 ∫_0^t Z_1(s) Z_3(s) ds ).    (1)

Now use

E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).
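A minimal Python sketch of simulating the reduced process (1) exactly, using the rates above and the initial condition T(0) = 10 from the results slide; the routine itself is an illustrative assumption. Only G, T, and V are tracked, with S averaged into the rate 3.75 × 10^-3 · Z_1 Z_3 of the last channel.

```python
import numpy as np

def simulate_reduced_viral(T_final, rng=np.random.default_rng()):
    """Exact (Gillespie) path of the averaged viral model Z = (G, T, V)."""
    G, Tpl, V = 0, 10, 0               # template T(0) = 10, all others zero
    t = 0.0
    while True:
        rates = np.array([
            1.0   * Tpl,               # template -> template + genome
            0.025 * G,                 # genome -> template
            0.25  * Tpl,               # template degradation
            3.75e-3 * G * Tpl,         # averaged G + S -> V channel
        ])
        total = rates.sum()
        if total == 0:
            return G, Tpl, V
        t += rng.exponential(1.0 / total)
        if t > T_final:
            return G, Tpl, V
        k = rng.choice(4, p=rates / total)
        if   k == 0: G += 1
        elif k == 1: G -= 1; Tpl += 1
        elif k == 2: Tpl -= 1
        else:        G -= 1; V += 1
```

Averaging the returned V over many paths estimates the cheap term E f(Z(20)) in the identity above; the expensive correction E[f(X(t)) − f(Z(t))] uses the coupled pair on the next slide.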

Another example: Viral infection

X(t) = X(0) + Y_{1,1}( ∫_0^t min{X_3(s), Z_3(s)} ds ) ζ_1 + Y_{1,2}( ∫_0^t X_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_1

     + Y_{2,1}( 0.025 ∫_0^t min{X_1(s), Z_1(s)} ds ) ζ_2 + Y_{2,2}( 0.025 ∫_0^t X_1(s) − min{X_1(s), Z_1(s)} ds ) ζ_2

     + Y_3( 1000 ∫_0^t X_3(s) ds ) ζ_3

     + Y_{4,1}( 0.25 ∫_0^t min{X_3(s), Z_3(s)} ds ) ζ_4 + Y_{4,2}( 0.25 ∫_0^t X_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_4

     + Y_5( 2 ∫_0^t X_2(s) ds ) ζ_5

     + Y_{6,1}( ∫_0^t min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6 + Y_{6,2}( ∫_0^t λ_6(X(s)) − min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6,

Z(t) = Z(0) + Y_{1,1}( ∫_0^t min{X_3(s), Z_3(s)} ds ) ζ_1 + Y_{1,3}( ∫_0^t Z_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_1

     + Y_{2,1}( 0.025 ∫_0^t min{X_1(s), Z_1(s)} ds ) ζ_2 + Y_{2,3}( 0.025 ∫_0^t Z_1(s) − min{X_1(s), Z_1(s)} ds ) ζ_2

     + Y_{4,1}( 0.25 ∫_0^t min{X_3(s), Z_3(s)} ds ) ζ_4 + Y_{4,3}( 0.25 ∫_0^t Z_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_4

     + Y_{6,1}( ∫_0^t min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6 + Y_{6,3}( ∫_0^t Λ_6(Z(s)) − min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6.

Another example: Viral infection

Suppose we want E X_virus(20), given T(0) = 10 and all other species zero.

Method: exact algorithm with crude Monte Carlo.

Approximation | # paths | CPU time | # updates
13.85 ± 0.07 | 75,000 | 24,800 CPU s | 1.45 × 10^10

Method: E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).

Approximation | CPU time | # updates
13.91 ± 0.07 | 1,118.5 CPU s | 2.41 × 10^8

Exact + crude Monte Carlo used:

1. 60 times more total steps.

2. 22 times more CPU time.

Mathematical Analysis

We had

X(t) = X(0) + ∑_k Y_k( ∫_0^t λ'_k(X(s)) ds ) ζ_k.

We assumed

∑_k λ'_k(X(·)) ≈ N ≫ 1.

There are therefore two extreme parameters floating around our models:

1. Some parameter N ≫ 1, causing N ≫ 1 (inherent to the model).

2. h, the step size (inherent to the approximation).

To quantify errors, we need to account for both.

Mathematical Analysis: Scaling in style of Thomas Kurtz

For each species i, define the normalized abundance

X^N_i(t) = N^{-α_i} X_i(t),

where α_i ≥ 0 should be selected so that X^N_i = O(1).

Rate constants κ'_k may also vary over several orders of magnitude. We write

κ'_k = κ_k N^{β_k},

where the β_k are selected so that κ_k = O(1).

This eventually leads to the scaled model

X^N(t) = X^N(0) + ∑_k Y_k( N^γ ∫_0^t N^{β_k + α·ν_k − γ} λ_k(X^N(s)) ds ) ζ^N_k.

Results

X^N(t) = X^N(0) + ∑_k Y_k( N^γ ∫_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ^N_k.

Let ρ_k ≥ 0 satisfy

|ζ^N_k| ≈ N^{-ρ_k},

and set ρ = min_k {ρ_k}.

Theorem (A., Higham 2011). Suppose (Z^N_ℓ, Z^N_{ℓ−1}) satisfy the coupling with Z^N_ℓ(0) = Z^N_{ℓ−1}(0). Then

sup_{t≤T} E|Z^N_ℓ(t) − Z^N_{ℓ−1}(t)|² ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ².

Results

With the same scaled model and the same ρ as above:

Theorem (A., Higham 2011). Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then

sup_{t≤T} E|X^N(t) − Z^N_ℓ(t)|² ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ².

Flavor of Proof

Theorem (A., Higham 2011). Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then

sup_{t≤T} E|X^N(t) − Z^N_ℓ(t)|² ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ².

X^N(t) = X^N(0) + ∑_k Y_{k,1}( N^γ N^{c_k} ∫_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k

                + ∑_k Y_{k,2}( N^γ N^{c_k} ∫_0^t λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k,

Z^N_ℓ(t) = Z^N_ℓ(0) + ∑_k Y_{k,1}( N^γ N^{c_k} ∫_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k

                    + ∑_k Y_{k,3}( N^γ N^{c_k} ∫_0^t λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k.

Flavor of Proof

So,

X^N(t) − Z^N_ℓ(t) = ∑_k [ Y_{k,2}( N^γ N^{c_k} ∫_0^t λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds )

                        − Y_{k,3}( N^γ N^{c_k} ∫_0^t λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ] ζ^N_k.

Hence,

X^N(t) − Z^N_ℓ(t) = M^N(t) + ∑_k N^γ ζ^N_k N^{c_k} ∫_0^t ( λ_k(X^N(s)) − λ_k(Z^N_ℓ ∘ η_ℓ(s)) ) ds.

Now work.

Next problem: parameter sensitivities.

Motivated by Jim Rawlings.

We have

X^θ(t) = X^θ(0) + ∑_k Y_k( ∫_0^t λ^θ_k(X^θ(s)) ds ) ζ_k,

and we define

J(θ) = E f(X^θ(t)).

We want

J'(θ) = (d/dθ) E f(X^θ(t)).

There are multiple methods. We consider finite differences:

J'(θ) = ( J(θ + ε) − J(θ) ) / ε + O(ε).

Next problem: parameter sensitivities.

Noting that

J'(θ) = (d/dθ) E f(X^θ(t)) = ( E f(X^{θ+ε}(t)) − E f(X^θ(t)) ) / ε + O(ε),

the usual finite difference estimator is

D_R(ε) = ε^{-1} [ (1/R) ∑_{i=1}^{R} f(X^{θ+ε}_[i](t)) − (1/R) ∑_{j=1}^{R} f(X^θ_[j](t)) ].

If the paths are generated independently, then

Var(D_R(ε)) = O(R^{-1} ε^{-2}).

Next problem: parameter sensitivities.

Couple the processes:

X^{θ+ε}(t) = X^{θ+ε}(0) + ∑_k Y_{k,1}( ∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k

                         + ∑_k Y_{k,2}( ∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k,

X^θ(t) = X^θ(0) + ∑_k Y_{k,1}( ∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k

                + ∑_k Y_{k,3}( ∫_0^t λ^θ_k(X^θ(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k.

Use:

D_R(ε) = ε^{-1} (1/R) ∑_{i=1}^{R} [ f(X^{θ+ε}_[i](t)) − f(X^θ_[i](t)) ].
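A minimal Python sketch of this coupled finite-difference estimator: the coupled pair (X^{θ+ε}, X^θ) is itself a CTMC in which each reaction splits into a shared channel (rate = the minimum of the two intensities) and two leftover channels, so it can be simulated with an ordinary Gillespie loop over 3K channels. The function and argument names are assumptions for illustration.

```python
import numpy as np

def coupled_pair(x0, zeta, prop, theta, eps, T, rng=np.random.default_rng()):
    """One coupled draw (X^{theta+eps}(T), X^{theta}(T)).

    prop(x, theta) -> array of intensities lambda_k^theta(x).
    Each reaction k is split into three channels: the shared channel with
    rate min(lam_p, lam_n) moves both processes; the two leftovers move one.
    """
    xp = np.array(x0, dtype=float)     # perturbed process, parameter theta + eps
    xn = np.array(x0, dtype=float)     # nominal process, parameter theta
    t = 0.0
    K = zeta.shape[0]
    while True:
        lam_p, lam_n = prop(xp, theta + eps), prop(xn, theta)
        lam_min = np.minimum(lam_p, lam_n)
        rates = np.concatenate([lam_min, lam_p - lam_min, lam_n - lam_min])
        total = rates.sum()
        if total == 0:
            return xp, xn
        t += rng.exponential(1.0 / total)
        if t > T:
            return xp, xn
        j = rng.choice(3 * K, p=rates / total)
        k = j % K
        if j < K:            # shared channel: both processes jump
            xp += zeta[k]; xn += zeta[k]
        elif j < 2 * K:      # only the perturbed process jumps
            xp += zeta[k]
        else:                # only the nominal process jumps
            xn += zeta[k]

def coupled_fd(x0, zeta, prop, f, theta, eps, R, T, rng=np.random.default_rng()):
    """Coupled finite-difference estimate of (d/d theta) E f(X^theta(T))."""
    vals = []
    for _ in range(R):
        xp, xn = coupled_pair(x0, zeta, prop, theta, eps, T, rng)
        vals.append(f(xp) - f(xn))
    return np.mean(vals) / eps
```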

Next problem: parameter sensitivities.

Theorem (Anderson, 2011). Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

E[ sup_{t≤T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )² ] ≤ C_{T,f} ε.

This lowers the variance of the estimator from

O(R^{-1} ε^{-2})

to

O(R^{-1} ε^{-1}),

i.e., it is lowered by an order of magnitude (in ε).

¹ David F. Anderson, An efficient finite difference method for parameter sensitivities of continuous time Markov chains. Submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Parameter Sensitivities

G → G + M, rate 2,

M → M + P, rate 10,

M → ∅, rate k,

P → ∅, rate 1.

Want (∂/∂k) E[X^k_protein(30)], for k ≈ 1/4.

Method | # paths | Approximation | # updates | CPU time
Likelihood ratio | 689,600 | −312.1 ± 6.0 | 2.9 × 10^9 | 3,506.6 s
Exact/naive FD | 246,200 | −318.8 ± 6.0 | 2.1 × 10^9 | 3,282.1 s
CRP | 26,320 | −320.7 ± 6.0 | 2.2 × 10^8 | 410.0 s
Coupled | 4,780 | −321.2 ± 6.0 | 2.1 × 10^7 | 35.3 s

Analysis

Theorem. Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

E sup_{t≤T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )² ≤ C_{T,f} ε.

Proof: the key observation is that

X^{θ+ε}(t) − X^θ(t) = M^{θ,ε}(t) + ∫_0^t ( F^{θ+ε}(X^{θ+ε}(s)) − F^θ(X^θ(s)) ) ds.

Now work on the martingale and the absolutely continuous part.

Thanks!

References:

1. David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, to appear in SIAM Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

2. David F. Anderson, Efficient finite difference method for parameter sensitivities of continuous time Markov chains, submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Funding: NSF-DMS-1009275.

