Computational methods for continuous time Markov chains with applications to biological processes
David F. Anderson∗
Department of Mathematics
University of Wisconsin - Madison
Penn. State
January 13th, 2012
Stochastic Models of Biochemical Reaction Systems
- The most common stochastic models of biochemical reaction systems are continuous time Markov chains.
- These are often called "chemical master equation" models in the biosciences.
Common examples include:
1. Gene regulatory networks.
2. Models of viral infection.
3. General population models (epidemic, predator-prey, etc.)
Path-wise simulation methods include:

    Language in Biology      Language in Math
    Gillespie's Algorithm    Sim. embedded DTMC
    Next reaction method     Sim. random time change representation of Tom Kurtz
    First reaction method    Sim. using exponential "alarm clocks"
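As a concrete sketch of the first row of this dictionary, Gillespie's algorithm (simulating the embedded DTMC together with exponential holding times) fits in a few lines of Python. The birth-death network in the usage line is a hypothetical stand-in chosen for illustration, not a model from the talk.

```python
import random

def gillespie(x0, zetas, lambdas, T, rng):
    """Exact simulation of a CTMC up to time T (Gillespie's algorithm).

    x0      : initial state (sequence of counts)
    zetas   : list of reaction vectors zeta_k
    lambdas : list of intensity functions lambda_k(x)
    """
    t, x = 0.0, list(x0)
    while True:
        rates = [lam(x) for lam in lambdas]
        total = sum(rates)
        if total == 0.0:
            return x                        # absorbed: nothing can fire
        t += rng.expovariate(total)         # exponential holding time
        if t > T:
            return x
        u, acc = rng.random() * total, 0.0  # pick reaction k w.p. rates[k]/total
        for k, r in enumerate(rates):
            acc += r
            if u <= acc:
                x = [xi + zi for xi, zi in zip(x, zetas[k])]
                break

# hypothetical birth-death example: 0 -> S at rate 10, S -> 0 at rate X_S
rng = random.Random(1)
x = gillespie((0,), [(1,), (-1,)],
              [lambda x: 10.0, lambda x: float(x[0])], 5.0, rng)
```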
Stochastic Models of Biochemical Reaction Systems
Path-wise methods can approximate values such as

    E f(X(t)).

For example,

1. Means: f(x) = x_i.
2. Moments/variances: f(x) = x_i^2.
3. Probabilities: f(x) = 1_{x ∈ A},

or compute sensitivities,

    (d/dκ) E f(X_κ(t)).

Problem: computing these quantities with path-wise algorithms can be computationally expensive.
First problem: joint with Des Higham
Our first problem: approximate E f(X(T)) to some desired tolerance ε > 0.

Easy!

- Simulate the CTMC exactly,
- generate independent paths X_{[i]}(t), and use the unbiased estimator

    μ_n = (1/n) ∑_{i=1}^n f(X_{[i]}(t)),

- stopping when the desired confidence interval is ± ε.
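The stopping rule can be sketched as follows; the Exp(1) draw below is a hypothetical placeholder for one path functional f(X_{[i]}(t)), chosen because its mean is known to be 1, since only the confidence-interval logic matters here.

```python
import math, random

def crude_mc(sample, eps, rng, batch=1000, max_n=10**6):
    """Crude Monte Carlo: average i.i.d. samples until the 95% CI is +/- eps."""
    vals = []
    while len(vals) < max_n:
        vals.extend(sample(rng) for _ in range(batch))
        n = len(vals)
        mu = sum(vals) / n
        var = sum((v - mu) ** 2 for v in vals) / (n - 1)
        half = 1.96 * math.sqrt(var / n)    # 95% confidence half-width
        if half <= eps:
            break
    return mu, half, n

# placeholder "path functional": an Exp(1) draw, true mean 1
rng = random.Random(0)
mu, half, n = crude_mc(lambda r: r.expovariate(1.0), 0.05, rng)
```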
What is the computational cost?
Recall,

    μ_n = (1/n) ∑_{i=1}^n f(X_{[i]}(t)).

Thus,

    Var(μ_n) = O(1/n).

So, if we want σ_n = O(ε), we need

    1/√n = O(ε)  ⟹  n = O(ε^{-2}).

If N gives the average cost (steps) of a path using the exact algorithm:

    Total computational complexity = (cost per path) × (# paths) = O(N ε^{-2}).

This can be bad if (i) N is large, or (ii) ε is small.
Benefits/drawbacks
Benefits:

1. Easy to implement.
2. The estimator

    μ_n = (1/n) ∑_{i=1}^n f(X_{[i]}(t))

   is unbiased.

Drawbacks:

1. The cost of O(N ε^{-2}) could be prohibitively large.
2. For our models, N is often very large.

We need to develop the model further for better ideas....
Build up model: Random time change representation of Tom Kurtz
Consider the simple system

    A + B → C,

where one molecule each of A and B is converted to one molecule of C.

Simple book-keeping: if X(t) = (X_A(t), X_B(t), X_C(t))^T gives the state at time t,

    X(t) = X(0) + R(t) [−1, −1, 1]^T,

where

- R(t) is the number of times the reaction has occurred by time t, and
- X(0) is the initial condition.
Build up model: Random time change representation of Tom Kurtz
Assuming the intensity (or propensity) of the reaction is

    κ X_A(s) X_B(s),

we can model

    R(t) = Y(∫_0^t κ X_A(s) X_B(s) ds),

where Y is a unit-rate Poisson process. Hence

    X(t) = X(0) + Y(∫_0^t κ X_A(s) X_B(s) ds) [−1, −1, 1]^T.
Build up model: Random time change representation of Tom Kurtz
• Now consider a network of reactions involving d chemical species, S_1, ..., S_d:

    ∑_{i=1}^d ν_{ik} S_i  →  ∑_{i=1}^d ν'_{ik} S_i.

Denote the reaction vector by

    ζ_k = ν'_k − ν_k.

• The intensity (or propensity) of the kth reaction is λ_k : Z^d_{≥0} → R.

• By analogy with before,

    X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k,

where the Y_k are independent, unit-rate Poisson processes.
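This representation can be simulated directly by giving each Y_k its own internal clock, in the spirit of the next reaction method; the sketch below, including the A + B → C test case with κ = 0.01, uses invented names and parameters for illustration only.

```python
import math, random

def next_reaction(x0, zetas, lambdas, T, rng):
    """Simulate X(t) = X(0) + sum_k Y_k(int_0^t lambda_k(X(s)) ds) zeta_k.

    Each unit-rate process Y_k carries the internal time Tk[k] it has already
    consumed and the internal time Pk[k] of its next jump."""
    K = len(zetas)
    t, x = 0.0, list(x0)
    Tk = [0.0] * K
    Pk = [rng.expovariate(1.0) for _ in range(K)]
    while True:
        rates = [lam(x) for lam in lambdas]
        # wall-clock time for each Y_k to reach its next jump at current rates
        dts = [(Pk[k] - Tk[k]) / rates[k] if rates[k] > 0 else math.inf
               for k in range(K)]
        k_min = min(range(K), key=lambda k: dts[k])
        if dts[k_min] == math.inf or t + dts[k_min] > T:
            return x
        for k in range(K):                  # advance every internal clock
            Tk[k] += rates[k] * dts[k_min]
        t += dts[k_min]
        x = [xi + zi for xi, zi in zip(x, zetas[k_min])]
        Pk[k_min] += rng.expovariate(1.0)   # schedule Y_k's next jump

# hypothetical A + B -> C with kappa = 0.01 and X(0) = (100, 100, 0)
rng = random.Random(2)
x = next_reaction((100, 100, 0), [(-1, -1, 1)],
                  [lambda x: 0.01 * x[0] * x[1]], 10.0, rng)
```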
Example

Consider a model of gene transcription and translation:

    G      →[25]     G + M,   (Transcription)
    M      →[1000]   M + P,   (Translation)
    P + P  →[0.001]  D,       (Dimerization)
    M      →[0.1]    ∅,       (Degradation of mRNA)
    P      →[1]      ∅.       (Degradation of Protein)

Then, if X = [X_M, X_P, X_D]^T,

    X(t) = X(0) + Y_1(25t) [1, 0, 0]^T
         + Y_2(1000 ∫_0^t X_M(s) ds) [0, 1, 0]^T
         + Y_3(0.001 ∫_0^t X_P(s)(X_P(s) − 1) ds) [0, −2, 1]^T
         + Y_4(0.1 ∫_0^t X_M(s) ds) [−1, 0, 0]^T
         + Y_5(∫_0^t X_P(s) ds) [0, −1, 0]^T.
Back to our problem
Recall:

Benefits:

1. Easy to implement.
2. The estimator

    μ_n = (1/n) ∑_{i=1}^n f(X_{[i]}(t))

   is unbiased.

Drawbacks:

1. The cost of O(N ε^{-2}) could be prohibitively large.
2. For our models, N is often very large.

Let's try an approximate scheme.
Tau-leaping: Euler’s method
Explicit tau-leaping¹, or Euler's method, was first formulated by Dan Gillespie in this setting.

Tau-leaping is essentially an Euler approximation of ∫_0^t λ_k(X(s)) ds:

    Z(h) = Z(0) + ∑_k Y_k(∫_0^h λ_k(Z(s)) ds) ζ_k
         ≈ Z(0) + ∑_k Y_k(λ_k(Z(0)) h) ζ_k
         =^d Z(0) + ∑_k Poisson(λ_k(Z(0)) h) ζ_k.

¹ D. T. Gillespie, J. Chem. Phys., 115, 1716–1733.
Euler’s method
The path-wise representation for Z(t) generated by Euler's method is

    Z(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(Z ∘ η(s)) ds) ζ_k,

where

    η(s) = t_n  if  t_n ≤ s < t_{n+1} = t_n + h

is the step function giving the left endpoints of the time discretization.
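A minimal tau-leaping sketch under the same assumptions: intensities are frozen at the left endpoint of each step, and, as one crude fix for the negativity problem discussed later, counts are clipped at zero. The A + B → C usage case and all names are illustrative assumptions.

```python
import random

def poisson(mean, rng):
    """Poisson(mean) sampled by counting unit-rate exponential arrivals."""
    n, acc = 0, rng.expovariate(1.0)
    while acc < mean:
        n += 1
        acc += rng.expovariate(1.0)
    return n

def tau_leap(x0, zetas, lambdas, T, h, rng):
    """Euler tau-leaping: fire Poisson(lambda_k(Z) * h) copies of reaction k
    over each step, holding intensities fixed at the step's left endpoint."""
    t, z = 0.0, list(x0)
    while t < T - 1e-12:
        step = min(h, T - t)
        jumps = [poisson(lam(z) * step, rng) for lam in lambdas]
        for n_k, zeta in zip(jumps, zetas):
            z = [zi + n_k * di for zi, di in zip(z, zeta)]
        z = [max(zi, 0) for zi in z]        # ad hoc guard against negativity
        t += step
    return z

# hypothetical A + B -> C with kappa = 0.01, X(0) = (100, 100, 0), h = 0.1
rng = random.Random(5)
z = tau_leap((100, 100, 0), [(-1, -1, 1)],
             [lambda z: 0.01 * z[0] * z[1]], 10.0, 0.1, rng)
```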
Return to approximating E f(X(T))

Let Z_L denote an approximate process generated with time-discretization step h_L. Let

    μ_n = (1/n) ∑_{i=1}^n f(Z_{L,[i]}(t)).

We note

    E f(X(t)) − μ_n = [E f(X(t)) − E f(Z_L(t))] + [E f(Z_L(t)) − μ_n].

Suppose we have an order-one method:

    E f(X(t)) − E f(Z_L(t)) = O(h_L).

We need:

1. h_L = O(ε).
2. n = O(ε^{-2}).

Suppose a path costs O(ε^{-1}) steps. Then

    Total computational complexity = (# paths) × (cost per path) = O(ε^{-3}).
Benefits/drawbacks
Benefits:

1. Can drastically lower the computational complexity of a problem if ε^{-1} ≪ N:

    CC of using exact = N ε^{-2},
    CC of using approximate = ε^{-1} ε^{-2}.

Drawbacks:

1. Convergence results usually give only an order of convergence; they cannot give a precise h_L. Bias is a problem.
2. Tau-leaping has problems: what happens if a species count goes negative?
3. We have given up an unbiased estimator.
Multi-level Monte Carlo and control variates

- Suppose I want

    E X ≈ (1/n) ∑_{i=1}^n X_{[i]},

  but realizations of X are expensive.

- Suppose X ≈ Z_L, and Z_L is cheap.

- Suppose X and Z_L can be generated simultaneously so that

    Var(X − Z_L)

  is small.

- Then use

    E X = E[X − Z_L] + E Z_L ≈ (1/n_1) ∑_{i=1}^{n_1} (X_{[i]} − Z_{L,[i]}) + (1/n_2) ∑_{i=1}^{n_2} Z_{L,[i]}.

- Multi-level Monte Carlo (Mike Giles, Stefan Heinrich) = keep going:

    E X = E(X − Z_L) + E Z_L = E(X − Z_L) + E(Z_L − Z_{L−1}) + E Z_{L−1} = ···
Multi-level Monte Carlo: an unbiased estimator

In our setting:

    E f(X(t)) = E[f(X(t)) − f(Z_L(t))] + ∑_{ℓ=ℓ_0+1}^{L} E[f(Z_ℓ(t)) − f(Z_{ℓ−1}(t))] + E f(Z_{ℓ_0}(t)).

For appropriate choices of n_0, n_ℓ, and n_E, we define the estimators for the three terms above via

    Q_E := (1/n_E) ∑_{i=1}^{n_E} (f(X_{[i]}(T)) − f(Z_{L,[i]}(T))),

    Q_ℓ := (1/n_ℓ) ∑_{i=1}^{n_ℓ} (f(Z_{ℓ,[i]}(T)) − f(Z_{ℓ−1,[i]}(T))),  for ℓ ∈ {ℓ_0 + 1, ..., L},

    Q_0 := (1/n_0) ∑_{i=1}^{n_0} f(Z_{ℓ_0,[i]}(T)),

and note that

    Q := Q_E + ∑_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0

is an unbiased estimator for E f(X(T)).

So what is the coupling, and what is the variance of the estimator?
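Before turning to the coupling, the assembled estimator Q itself can be sketched abstractly, with one sampler per telescoping term. The Exp(1) toy below (levels = truncation caps, coupled by sharing the same draw) is an invented placeholder whose exact mean is 1; it is not the chemical-kinetics coupling.

```python
import random

def mlmc_estimate(sample_coarse, sample_diffs, ns, rng):
    """Unbiased multilevel estimator Q = Q_0 + sum_l Q_l + Q_E.

    sample_coarse(rng)   : one draw of f(Z_{l0})
    sample_diffs[j](rng) : one coupled draw of a telescoping difference,
                           the last entry playing the role of f(X) - f(Z_L)
    ns                   : numbers of paths, coarsest level first
    """
    q = sum(sample_coarse(rng) for _ in range(ns[0])) / ns[0]
    for sampler, n in zip(sample_diffs, ns[1:]):
        q += sum(sampler(rng) for _ in range(n)) / n
    return q

# toy stand-in: X ~ Exp(1) and Z_l = min(X, cap_l); coupling = reuse same X
caps = [1.0, 2.0, 4.0]

def coarse(rng):
    return min(rng.expovariate(1.0), caps[0])

def make_diff(lo, hi):
    def diff(rng):
        x = rng.expovariate(1.0)            # shared draw couples both levels
        return min(x, hi) - min(x, lo)
    return diff

def exact_correction(rng):
    x = rng.expovariate(1.0)
    return x - min(x, caps[-1])

samplers = [make_diff(caps[j], caps[j + 1]) for j in range(len(caps) - 1)]
samplers.append(exact_correction)
rng = random.Random(3)
q = mlmc_estimate(coarse, samplers, [4000, 2000, 1000, 500], rng)
```

Because the terms telescope exactly, E[Q] equals E[X] = 1 regardless of how the caps are chosen; only the variance depends on the coupling.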
How do we generate processes simultaneously
Suppose I want to generate:

- a Poisson process with intensity 13.1, and
- a Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent, unit-rate Poisson processes, and set

    Z_{13.1}(t) = Y_1(13.1 t),
    Z_{13}(t) = Y_2(13 t).

Using this representation, the processes are independent and, hence, not coupled. The variance of the difference is large:

    Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_1(13.1 t)) + Var(Y_2(13 t)) = 26.1 t.
How do we generate processes simultaneously
Suppose I want to generate:

- a Poisson process with intensity 13.1, and
- a Poisson process with intensity 13.

Instead, we could let Y_1 and Y_2 be independent, unit-rate Poisson processes, and set

    Z_{13.1}(t) = Y_1(13 t) + Y_2(0.1 t),
    Z_{13}(t) = Y_1(13 t).

The variance of the difference is much smaller:

    Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_2(0.1 t)) = 0.1 t.
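This variance reduction is easy to check numerically. The sketch below compares sample variances of the two constructions at t = 1; the small arrival-counting Poisson sampler is a simple helper, adequate for these means.

```python
import random

def poisson(mean, rng):
    """Poisson(mean) sampled by counting unit-rate exponential arrivals."""
    n, acc = 0, rng.expovariate(1.0)
    while acc < mean:
        n += 1
        acc += rng.expovariate(1.0)
    return n

def sample_var(draw, n, rng):
    vals = [draw(rng) for _ in range(n)]
    mu = sum(vals) / n
    return sum((v - mu) ** 2 for v in vals) / (n - 1)

t, n = 1.0, 20000
rng = random.Random(4)
# independent: Z_{13.1} - Z_{13} = Y1(13.1 t) - Y2(13 t), variance about 26.1 t
v_indep = sample_var(lambda r: poisson(13.1 * t, r) - poisson(13.0 * t, r),
                     n, rng)
# coupled: the shared Y1(13 t) cancels, leaving Y2(0.1 t), variance 0.1 t
v_coupled = sample_var(lambda r: poisson(0.1 * t, r), n, rng)
```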
How do we generate processes simultaneously
More generally, suppose we want

1. a non-homogeneous Poisson process with intensity f(t), and
2. a non-homogeneous Poisson process with intensity g(t).

We can let Y_1, Y_2, and Y_3 be independent, unit-rate Poisson processes and define

    Z_f(t) = Y_1(∫_0^t f(s) ∧ g(s) ds) + Y_2(∫_0^t [f(s) − f(s) ∧ g(s)] ds),

    Z_g(t) = Y_1(∫_0^t f(s) ∧ g(s) ds) + Y_3(∫_0^t [g(s) − f(s) ∧ g(s)] ds),

where we are using that, for example,

    Y_1(∫_0^t f(s) ∧ g(s) ds) + Y_2(∫_0^t [f(s) − f(s) ∧ g(s)] ds)  =^d  Y(∫_0^t f(s) ds),

where Y is a unit-rate Poisson process.
Back to our processes
    X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k,

    Z(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(Z ∘ η(s)) ds) ζ_k.

Now couple:

    X(t) = X(0) + ∑_k Y_{k,1}(∫_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds) ζ_k
                + ∑_k Y_{k,2}(∫_0^t [λ_k(X(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s))] ds) ζ_k,

    Z_ℓ(t) = Z_ℓ(0) + ∑_k Y_{k,1}(∫_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds) ζ_k
                    + ∑_k Y_{k,3}(∫_0^t [λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s))] ds) ζ_k.

The algorithm for simulating this pair is equivalent to the next reaction method or Gillespie's algorithm.
For approximate processes
    Z_ℓ(t) = Z_ℓ(0) + ∑_k Y_{k,1}(∫_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds) ζ_k
                    + ∑_k Y_{k,2}(∫_0^t [λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s))] ds) ζ_k,

    Z_{ℓ−1}(t) = Z_{ℓ−1}(0) + ∑_k Y_{k,1}(∫_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds) ζ_k
                            + ∑_k Y_{k,3}(∫_0^t [λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s))] ds) ζ_k.

The algorithm for simulating this pair is equivalent to τ-leaping.
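A simplified sketch of such a coupled pair for nested tau-leap grids (fine step h, coarse step Mh): on each fine step the shared jumps are drawn from a common stream with the minimum of the two frozen intensities, plus independent residual streams. The A + B → C usage case and all names are assumptions for illustration.

```python
import random

def poisson(mean, rng):
    """Poisson(mean) sampled by counting unit-rate exponential arrivals."""
    n, acc = 0, rng.expovariate(1.0)
    while acc < mean:
        n += 1
        acc += rng.expovariate(1.0)
    return n

def coupled_tau_leap(x0, zetas, lambdas, T, h, M, rng):
    """Coupled tau-leap pair (Z_fine, Z_coarse) with steps h and M*h.

    Per fine step and reaction k: shared jumps ~ Poisson(min(a, b) h), plus
    independent residuals Poisson((a - min) h) and Poisson((b - min) h),
    where a, b are the intensities frozen on the fine and coarse grids."""
    zf, zc = list(x0), list(x0)
    t, n_fine = 0.0, 0
    zc_frozen = [lam(zc) for lam in lambdas]    # held fixed for M fine steps
    while t < T - 1e-12:
        zf_frozen = [lam(zf) for lam in lambdas]
        for k, zeta in enumerate(zetas):
            a, b = zf_frozen[k], zc_frozen[k]
            m = min(a, b)
            common = poisson(m * h, rng)        # shared stream Y_{k,1}
            extra_f = poisson((a - m) * h, rng) # residual stream Y_{k,2}
            extra_c = poisson((b - m) * h, rng) # residual stream Y_{k,3}
            zf = [zi + (common + extra_f) * di for zi, di in zip(zf, zeta)]
            zc = [zi + (common + extra_c) * di for zi, di in zip(zc, zeta)]
        t += h
        n_fine += 1
        if n_fine % M == 0:                     # coarse gridpoint: refresh
            zc_frozen = [lam(zc) for lam in lambdas]
    return zf, zc

# hypothetical A + B -> C with kappa = 0.01, X(0) = (100, 100, 0)
lam = [lambda z: 0.01 * max(z[0], 0) * max(z[1], 0)]
rng = random.Random(6)
zf, zc = coupled_tau_leap((100, 100, 0), [(-1, -1, 1)], lam, 5.0, 0.05, 3, rng)
```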
Multi-level Monte Carlo: chemical kinetic setting
Can prove:

Theorem (Anderson, Higham 2011). Suppose (X, Z_ℓ) satisfy the coupling. Then,

    sup_{t≤T} E|X(t) − Z_ℓ(t)|² ≤ C_1(T) N^{−ρ} h_ℓ + C_2(T) h_ℓ².

Theorem (Anderson, Higham 2011). Suppose (Z_ℓ, Z_{ℓ−1}) satisfy the coupling. Then,

    sup_{t≤T} E|Z_ℓ(t) − Z_{ℓ−1}(t)|² ≤ C_1(T) N^{−ρ} h_ℓ + C_2(T) h_ℓ².

¹ David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for stochastically modeled chemical kinetic systems. To appear in SIAM: Modeling and Simulation. Available at arxiv.org:1107.2181. Also at www.math.wisc.edu/~anderson.
Multi-level Monte Carlo: an unbiased estimator
For well-chosen n_0, n_ℓ, and n_E, we have

    Var(Q) = Var(Q_E + ∑_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0) = O(ε²),

with

    Comp. cost = [ε^{-2}(N^{-ρ} h_L + h_L²)] N + ε^{-2}(h_{ℓ_0}^{-1} + ln(ε)² N^{-ρ} + ln(ε^{-1}) h_{ℓ_0}/(M − 1)).
Multi-level Monte Carlo: an unbiased estimator
Some observations:

1. The weak error plays no role in the analysis: we are free to choose h_L.
2. Common problems associated with tau-leaping, e.g. negativity of species numbers, do not matter: just define the process in a sensible way.
3. The method is unbiased.
Example
Consider a model of gene transcription and translation:

    G      →[25]     G + M,
    M      →[1000]   M + P,
    P + P  →[0.001]  D,
    M      →[0.1]    ∅,
    P      →[1]      ∅.

Suppose:

1. we initialize with G = 1, M = 0, P = 0, D = 0,
2. we want to estimate the expected number of dimers at time T = 1,
3. to an accuracy of ± 1.0 with 95% confidence.
Example

Method: Exact algorithm with crude Monte Carlo.

    Approximation   # paths     CPU Time                    # updates
    3,714.2 ± 1.0   4,740,000   149,000 CPU s (41 hours!)   8.27 × 10^10

Method: Euler tau-leaping with crude Monte Carlo.

    Step-size    Approximation   # paths     CPU Time     # updates
    h = 3^{-7}   3,712.3 ± 1.0   4,750,000   13,374.6 s   6.2 × 10^10
    h = 3^{-6}   3,707.5 ± 1.0   4,750,000   6,207.9 s    2.1 × 10^10
    h = 3^{-5}   3,693.4 ± 1.0   4,700,000   2,803.9 s    6.9 × 10^9
    h = 3^{-4}   3,654.6 ± 1.0   4,650,000   1,219.0 s    2.6 × 10^9

Method: unbiased MLMC with ℓ_0 = 2, and M and L detailed below.

    Step-size parameters   Approx.         CPU Time    # updates
    M = 3, L = 6           3,713.9 ± 1.0   1,063.3 s   1.1 × 10^9
    M = 3, L = 5           3,714.7 ± 1.0   1,114.9 s   9.4 × 10^8
    M = 3, L = 4           3,714.2 ± 1.0   1,656.6 s   1.0 × 10^9
    M = 4, L = 4           3,714.2 ± 1.0   1,334.8 s   1.1 × 10^9
    M = 4, L = 5           3,713.8 ± 1.0   1,014.9 s   1.1 × 10^9

- The exact algorithm with crude Monte Carlo demanded 140 times more CPU time than our unbiased MLMC estimator!
Example
Method: Exact algorithm with crude Monte Carlo.

    Approximation   # paths     CPU Time                    # updates
    3,714.2 ± 1.0   4,740,000   149,000 CPU s (41 hours!)   8.27 × 10^10

Unbiased multi-level Monte Carlo with M = 3, L = 5, and ℓ_0 = 2.

    Level                       # paths     CPU Time    Var. estimator   # updates
    (X, Z_{3^{-5}})             3,900       279.6 s     0.0658           6.8 × 10^7
    (Z_{3^{-5}}, Z_{3^{-4}})    30,000      49.0 s      0.0217           8.8 × 10^7
    (Z_{3^{-4}}, Z_{3^{-3}})    150,000     71.7 s      0.0179           1.5 × 10^8
    (Z_{3^{-3}}, Z_{3^{-2}})    510,000     112.3 s     0.0319           1.7 × 10^8
    Tau-leap with h = 3^{-2}    8,630,000   518.4 s     0.1192           4.7 × 10^8
    Totals                      N.A.        1,031.0 s   0.2565           9.5 × 10^8
Some conclusions about this method

1. Gillespie's algorithm is by far the most common way to compute expectations:
   1.1 Means.
   1.2 Variances.
   1.3 Probabilities.
2. The new method (MLMC) also performs this task with no bias (exact).
3. It will be at worst the same speed as Gillespie (exact algorithm + crude Monte Carlo).
4. It will commonly be many orders of magnitude faster.
5. It is applicable to essentially all continuous time Markov chains:

    X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k.

6. Con: it is substantially harder to implement; good software is needed.
7. It makes no use of any specific structure or scaling in the problem.
Another example: Viral infection

Let
1. T = viral template.
2. G = viral genome.
3. S = viral structure.
4. V = virus.

Reactions:
R1) T + stuff → T + G,  κ1 = 1
R2) G → T,  κ2 = 0.025
R3) T + stuff → T + S,  κ3 = 1000
R4) T → ∅,  κ4 = 0.25
R5) S → ∅,  κ5 = 2
R6) G + S → V,  κ6 = 7.5 × 10⁻⁶

References:
- R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
- E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
- K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
- W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.
Another example: Viral infection

Stochastic equations for X = (X_G, X_S, X_T, X_V) are

X1(t) = X1(0) + Y1( ∫_0^t X3(s) ds ) − Y2( 0.025 ∫_0^t X1(s) ds ) − Y6( 7.5 × 10⁻⁶ ∫_0^t X1(s) X2(s) ds )

X2(t) = X2(0) + Y3( 1000 ∫_0^t X3(s) ds ) − Y5( 2 ∫_0^t X2(s) ds ) − Y6( 7.5 × 10⁻⁶ ∫_0^t X1(s) X2(s) ds )

X3(t) = X3(0) + Y2( 0.025 ∫_0^t X1(s) ds ) − Y4( 0.25 ∫_0^t X3(s) ds )

X4(t) = X4(0) + Y6( 7.5 × 10⁻⁶ ∫_0^t X1(s) X2(s) ds ).
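As a sanity check, the six intensities above (λ1 = X_T, λ2 = 0.025·X_G, λ3 = 1000·X_T, λ4 = 0.25·X_T, λ5 = 2·X_S, λ6 = 7.5 × 10⁻⁶·X_G·X_S) drop straight into an exact simulation. A rough sketch; the short horizon and single initial template are my choices to keep the run cheap, not the talk's experiment (which uses T(0) = 10 out to t = 20).

```python
import numpy as np

# State ordering follows the slides: x = (G, S, T, V).
ZETA = np.array([
    [ 1,  0,  0,  0],   # R1: T + stuff -> T + G
    [-1,  0,  1,  0],   # R2: G -> T
    [ 0,  1,  0,  0],   # R3: T + stuff -> T + S
    [ 0,  0, -1,  0],   # R4: T -> 0
    [ 0, -1,  0,  0],   # R5: S -> 0
    [-1, -1,  0,  1],   # R6: G + S -> V
])

def intensities(x):
    g, s, t, _ = x
    return np.array([1.0 * t, 0.025 * g, 1000.0 * t,
                     0.25 * t, 2.0 * s, 7.5e-6 * g * s])

def simulate(x0, t_end, rng):
    """Exact (Gillespie) simulation of the viral infection network."""
    t, x = 0.0, np.array(x0, dtype=np.int64)
    while True:
        lam = intensities(x)
        lam0 = lam.sum()
        if lam0 == 0.0:
            break
        t += rng.exponential(1.0 / lam0)
        if t > t_end:
            break
        x = x + ZETA[rng.choice(6, p=lam / lam0)]
    return x

# One initial template, short horizon (chosen only for speed).
rng = np.random.default_rng(1)
xg, xs, xt, xv = simulate([0, 0, 1, 0], t_end=0.5, rng=rng)
```

Even on this toy run, reactions 3 and 5 dominate the event count, which is the source of the cost that the averaged process Z addresses on the next slides.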
Another example: Viral infection

Reactions:
R1) T + stuff → T + G,  κ1 = 1
R2) G → T,  κ2 = 0.025
R3) T + stuff → T + S,  κ3 = 1000
R4) T → ∅,  κ4 = 0.25
R5) S → ∅,  κ5 = 2
R6) G + S → V,  κ6 = 7.5 × 10⁻⁶

If T > 0,
- reactions 3 and 5 are much faster than the others.
- S looks approximately Poisson(500 × T).

Can average out to get an approximate process Z(t).
Another example: Viral infection

The approximate process satisfies

Z1(t) = X1(0) + Y1( ∫_0^t Z3(s) ds ) − Y2( 0.025 ∫_0^t Z1(s) ds ) − Y6( 3.75 × 10⁻³ ∫_0^t Z1(s) Z3(s) ds )

Z3(t) = X3(0) + Y2( 0.025 ∫_0^t Z1(s) ds ) − Y4( 0.25 ∫_0^t Z3(s) ds )

Z4(t) = X4(0) + Y6( 3.75 × 10⁻³ ∫_0^t Z1(s) Z3(s) ds ).    (1)

Now use E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).
Another example: Viral infection

X(t) = X(0)
  + Y1,1( ∫_0^t min{X3(s), Z3(s)} ds ) ζ1 + Y1,2( ∫_0^t [X3(s) − min{X3(s), Z3(s)}] ds ) ζ1
  + Y2,1( 0.025 ∫_0^t min{X1(s), Z1(s)} ds ) ζ2 + Y2,2( 0.025 ∫_0^t [X1(s) − min{X1(s), Z1(s)}] ds ) ζ2
  + Y3( 1000 ∫_0^t X3(s) ds ) ζ3
  + Y4,1( 0.25 ∫_0^t min{X3(s), Z3(s)} ds ) ζ4 + Y4,2( 0.25 ∫_0^t [X3(s) − min{X3(s), Z3(s)}] ds ) ζ4
  + Y5( 2 ∫_0^t X2(s) ds ) ζ5
  + Y6,1( ∫_0^t min{λ6(X(s)), Λ6(Z(s))} ds ) ζ6 + Y6,2( ∫_0^t [λ6(X(s)) − min{λ6(X(s)), Λ6(Z(s))}] ds ) ζ6

Z(t) = Z(0)
  + Y1,1( ∫_0^t min{X3(s), Z3(s)} ds ) ζ1 + Y1,3( ∫_0^t [Z3(s) − min{X3(s), Z3(s)}] ds ) ζ1
  + Y2,1( 0.025 ∫_0^t min{X1(s), Z1(s)} ds ) ζ2 + Y2,3( 0.025 ∫_0^t [Z1(s) − min{X1(s), Z1(s)}] ds ) ζ2
  + Y4,1( 0.25 ∫_0^t min{X3(s), Z3(s)} ds ) ζ4 + Y4,3( 0.25 ∫_0^t [Z3(s) − min{X3(s), Z3(s)}] ds ) ζ4
  + Y6,1( ∫_0^t min{λ6(X(s)), Λ6(Z(s))} ds ) ζ6 + Y6,3( ∫_0^t [Λ6(Z(s)) − min{λ6(X(s)), Λ6(Z(s))}] ds ) ζ6.
Another example: Viral infection

Suppose we want E X_virus(20), given T(0) = 10 and all other species zero.

Method: Exact algorithm with crude Monte Carlo.

Approximation    # paths   CPU time        # updates
13.85 ± 0.07     75,000    24,800 CPU s    1.45 × 10¹⁰

Method: E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).

Approximation    CPU time         # updates
13.91 ± 0.07     1,118.5 CPU s    2.41 × 10⁸

Exact + crude Monte Carlo used:
1. 60 times more total steps.
2. 22 times more CPU time.
Mathematical Analysis

We had

X(t) = X(0) + ∑_k Y_k( ∫_0^t λ′_k(X(s)) ds ) ζ_k.

Assumed ∑_k λ′_k(X(·)) ≈ N ≫ 1.

There are therefore two extreme parameters floating around our models:
1. Some parameter N ≫ 1 (inherent to the model).
2. h, the step size (inherent to the approximation).

To quantify errors, we need to account for both.
Mathematical Analysis: Scaling in the style of Thomas Kurtz

For each species i, define the normalized abundance

X^N_i(t) = N^{−α_i} X_i(t),

where α_i ≥ 0 should be selected so that X^N_i = O(1).

Rate constants, κ′_k, may also vary over several orders of magnitude. We write

κ′_k = κ_k N^{β_k},

where the β_k are selected so that κ_k = O(1).

This eventually leads to the scaled model

X^N(t) = X^N(0) + ∑_k Y_k( N^γ ∫_0^t N^{β_k + α·ν_k − γ} λ_k(X^N(s)) ds ) ζ^N_k.
Results

X^N(t) = X^N(0) + ∑_k Y_k( N^γ ∫_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ^N_k.

Let ρ_k ≥ 0 satisfy |ζ^N_k| ≈ N^{−ρ_k}, and set ρ = min{ρ_k}.

Theorem (A., Higham 2011)
Suppose (Z^N_ℓ, Z^N_{ℓ−1}) satisfy the coupling with Z^N_ℓ(0) = Z^N_{ℓ−1}(0). Then,

sup_{t≤T} E|Z^N_ℓ(t) − Z^N_{ℓ−1}(t)|² ≤ C_1(T, N, γ) N^{−ρ} h_ℓ + C_2(T, N, γ) h_ℓ².
Results

X^N(t) = X^N(0) + ∑_k Y_k( N^γ ∫_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ^N_k.

Let ρ_k ≥ 0 satisfy |ζ^N_k| ≈ N^{−ρ_k}, and set ρ = min{ρ_k}.

Theorem (A., Higham 2011)
Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then,

sup_{t≤T} E|X^N(t) − Z^N_ℓ(t)|² ≤ C_1(T, N, γ) N^{−ρ} h_ℓ + C_2(T, N, γ) h_ℓ².
Flavor of Proof

Theorem (A., Higham 2011)
Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then,

sup_{t≤T} E|X^N(t) − Z^N_ℓ(t)|² ≤ C_1(T, N, γ) N^{−ρ} h_ℓ + C_2(T, N, γ) h_ℓ².

X^N(t) = X^N(0) + ∑_k Y_{k,1}( N^γ N^{c_k} ∫_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k
  + ∑_k Y_{k,2}( N^γ N^{c_k} ∫_0^t [λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s))] ds ) ζ^N_k

Z^N_ℓ(t) = Z^N_ℓ(0) + ∑_k Y_{k,1}( N^γ N^{c_k} ∫_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k
  + ∑_k Y_{k,3}( N^γ N^{c_k} ∫_0^t [λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s))] ds ) ζ^N_k
Flavor of Proof

So,

X^N(t) − Z^N_ℓ(t) = ∑_k [ Y_{k,2}( N^γ N^{c_k} ∫_0^t [λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s))] ds )
  − Y_{k,3}( N^γ N^{c_k} ∫_0^t [λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s))] ds ) ] ζ^N_k.

Hence,

X^N(t) − Z^N_ℓ(t) = M^N(t) + ∑_k N^γ ζ^N_k N^{c_k} ∫_0^t ( λ_k(X^N(s)) − λ_k(Z^N_ℓ ∘ η_ℓ(s)) ) ds.

Now work.
Next problem: parameter sensitivities.

Motivated by Jim Rawlings.

We have

X^θ(t) = X^θ(0) + ∑_k Y_k( ∫_0^t λ^θ_k(X^θ(s)) ds ) ζ_k,

and we define

J(θ) = E f(X^θ(t)).

We want

J′(θ) = (d/dθ) E f(X^θ(t)).

There are multiple methods. We consider finite differences:

J′(θ) = [J(θ + ε) − J(θ)] / ε + O(ε).
Next problem: parameter sensitivities.

Noting that

J′(θ) = (d/dθ) E f(X^θ(t)) = [E f(X^{θ+ε}(t)) − E f(X^θ(t))] / ε + O(ε),

the usual finite difference estimator is

D_R(ε) = ε⁻¹ ( (1/R) ∑_{i=1}^R f(X^{θ+ε}_{[i]}(t)) − (1/R) ∑_{j=1}^R f(X^θ_{[j]}(t)) ).

If the paths are generated independently, then

Var(D_R(ε)) = O(R⁻¹ ε⁻²).
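To see the ε⁻² blow-up concretely: for a pure-death chain with intensity θx, the terminal state has the exact law X(t) ~ Binomial(x0, e^{−θt}), so terminal values can be sampled without path simulation. This toy model and its parameter values are my own, not from the talk.

```python
import numpy as np

# Pure-death chain, intensity theta * x:  X(t) ~ Binomial(x0, exp(-theta*t)).
rng = np.random.default_rng(3)
x0, t, theta, eps, R = 100, 1.0, 0.25, 0.05, 20000

# Independent paths at theta + eps and theta (crude finite difference).
f_plus = rng.binomial(x0, np.exp(-(theta + eps) * t), size=R)
f_base = rng.binomial(x0, np.exp(-theta * t), size=R)
D_naive = (f_plus.mean() - f_base.mean()) / eps

# Exact sensitivity for comparison: d/dtheta E X(t) = -x0 * t * exp(-theta*t).
exact = -x0 * t * np.exp(-theta * t)
```

The per-path variance of the difference quotient is roughly Var(f)/ε² here, so halving ε quadruples the number of paths needed for the same confidence interval.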
Next problem: parameter sensitivities.

Couple the processes:

X^{θ+ε}(t) = X^{θ+ε}(0) + ∑_k Y_{k,1}( ∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k
  + ∑_k Y_{k,2}( ∫_0^t [λ^{θ+ε}_k(X^{θ+ε}(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s))] ds ) ζ_k,

X^θ(t) = X^θ(0) + ∑_k Y_{k,1}( ∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k
  + ∑_k Y_{k,3}( ∫_0^t [λ^θ_k(X^θ(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s))] ds ) ζ_k.

Use:

D_R(ε) = ε⁻¹ (1/R) ∑_{i=1}^R [ f(X^{θ+ε}_{[i]}(t)) − f(X^θ_{[i]}(t)) ].
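For intuition, here is the same three-clock split applied to a toy pure-death chain with intensity θx (my own example, not from the talk): the shared clock fires at the minimum of the two intensities, so the paths only drift apart at rate |λ^{θ+ε} − λ^θ| = O(ε).

```python
import numpy as np

def coupled_pair(x0, theta, eps, t_end, rng):
    """Exactly simulate (X^{theta+eps}, X^{theta}) for the pure-death chain
    (intensity theta * x) with three clocks: one shared clock at the minimum
    of the two intensities, plus one residual clock per process."""
    x1 = x2 = x0                # x1 uses theta + eps, x2 uses theta
    t = 0.0
    while True:
        l1, l2 = (theta + eps) * x1, theta * x2
        m = min(l1, l2)
        rates = np.array([m, l1 - m, l2 - m])  # both jump / x1 only / x2 only
        tot = rates.sum()
        if tot == 0.0:
            break
        t += rng.exponential(1.0 / tot)
        if t > t_end:
            break
        k = rng.choice(3, p=rates / tot)
        if k != 2:
            x1 -= 1
        if k != 1:
            x2 -= 1
    return x1, x2

rng = np.random.default_rng(4)
x0, theta, eps, t_end, R = 100, 0.25, 0.05, 1.0, 2000
pairs = np.array([coupled_pair(x0, theta, eps, t_end, rng) for _ in range(R)])
D_coupled = (pairs[:, 0] - pairs[:, 1]).mean() / eps

# Exact sensitivity for this chain: d/dtheta E X(t) = -x0 * t * exp(-theta*t).
exact = -x0 * t_end * np.exp(-theta * t_end)
```

Since the two paths share almost every jump, f(X^{θ+ε}) − f(X^θ) has a small variance per path, and far fewer paths are needed than with independent simulations.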
Next problem: parameter sensitivities.

Theorem (Anderson, 2011)
Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

E[ sup_{t≤T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )² ] ≤ C_{T,f} ε.

This lowers the variance of the estimator from

O(R⁻¹ ε⁻²)

to

O(R⁻¹ ε⁻¹).

Lowered by an order of magnitude (in ε).

¹David F. Anderson, An efficient finite difference method for parameter sensitivities of continuous time Markov chains. Submitted. Available at arXiv.org:1109.2890. Also at www.math.wisc.edu/~anderson.
Parameter Sensitivities

G → G + M,  rate 2,
M → M + P,  rate 10,
M → ∅,      rate k,
P → ∅,      rate 1.

Want ∂/∂k E[X^k_protein(30)], with k ≈ 1/4.

Method            # paths    Approximation   # updates    CPU time
Likelihood ratio  689,600    −312.1 ± 6.0    2.9 × 10⁹    3,506.6 s
Exact/Naive FD    246,200    −318.8 ± 6.0    2.1 × 10⁹    3,282.1 s
CRP               26,320     −320.7 ± 6.0    2.2 × 10⁸    410.0 s
Coupled           4,780      −321.2 ± 6.0    2.1 × 10⁷    35.3 s
Analysis

Theorem
Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

E sup_{t≤T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )² ≤ C_{T,f} ε.

Proof:

Key observation of the proof:

X^{θ+ε}(t) − X^θ(t) = M^{θ,ε}(t) + ∫_0^t [ F^{θ+ε}(X^{θ+ε}(s)) − F^θ(X^θ(s)) ] ds.

Now work on the martingale and the absolutely continuous part.
Thanks!

References:

1. David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, to appear in SIAM: Multiscale Modeling and Simulation.
   Available at arXiv.org:1107.2181. Also on my website: www.math.wisc.edu/~anderson.

2. David F. Anderson, An efficient finite difference method for parameter sensitivities of continuous time Markov chains, submitted.
   Available at arXiv.org:1109.2890. Also on my website: www.math.wisc.edu/~anderson.

Funding: NSF-DMS-1009275.