Paula Tataru
Conditional expectations of sufficient statistics for continuous time Markov chains
Joint work with Asger Hobolth
November 28, 2012
Motivation
Methods
Results
Paula Tataru
2/30
Conditional expectations of sufficient statistics for CTMCs
Continuous-time Markov chains
A stochastic process {X(t) | t ≥ 0}
that fulfills the Markov property:
P(X(t + s) = j | X(s) = i, X(u) = x(u) for all u < s) = P(X(t + s) = j | X(s) = i)
Continuous-time Markov chains
• Credit risk
• States: different types of ratings
• Chemical reactions
• States: different states of a molecule
• DNA data
• States: nucleotides, amino acids, codons
Modeling DNA data
1st base   2nd base                                        3rd base
           T          C          A           G
T          TTT Phe    TCT Ser    TAT Tyr     TGT Cys      T
           TTC Phe    TCC Ser    TAC Tyr     TGC Cys      C
           TTA Leu    TCA Ser    TAA Stop    TGA Stop     A
           TTG Leu    TCG Ser    TAG Stop    TGG Trp      G
C          CTT Leu    CCT Pro    CAT His     CGT Arg      T
           CTC Leu    CCC Pro    CAC His     CGC Arg      C
           CTA Leu    CCA Pro    CAA Gln     CGA Arg      A
           CTG Leu    CCG Pro    CAG Gln     CGG Arg      G
A          ATT Ile    ACT Thr    AAT Asn     AGT Ser      T
           ATC Ile    ACC Thr    AAC Asn     AGC Ser      C
           ATA Ile    ACA Thr    AAA Lys     AGA Arg      A
           ATG Met    ACG Thr    AAG Lys     AGG Arg      G
G          GTT Val    GCT Ala    GAT Asp     GGT Gly      T
           GTC Val    GCC Ala    GAC Asp     GGC Gly      C
           GTA Val    GCA Ala    GAA Glu     GGA Gly      A
           GTG Val    GCG Ala    GAG Glu     GGG Gly      G
Modeling DNA data
• Nucleotides (n = 4)
• Amino acids (n = 20)
• Codons (n = 64)
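The codon count can be checked by enumeration. This is a minimal sketch, assuming the standard genetic code with stop codons TAA, TAG and TGA; the variable names are illustrative.

```python
from itertools import product

NUCLEOTIDES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code

# All 4^3 = 64 nucleotide triplets.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# Codon models typically drop the stop codons from the state space.
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(NUCLEOTIDES), len(codons), len(sense_codons))  # 4 64 61
```

The 61 sense codons match the n = 61 state space quoted for the GY model in the speed comparison.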
Continuous-time Markov chains
• Characterized by a rate matrix Q = (q_ij)
with properties q_ij ≥ 0 for i ≠ j and q_ii = −∑_{j≠i} q_ij (each row sums to zero)
Modeling DNA data
• Nucleotides (n = 4)
• Jukes Cantor (JC)
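A minimal sketch of the Jukes-Cantor model, assuming the common parameterization in which every off-diagonal rate equals a single parameter alpha. The function names and the value alpha = 0.1 are illustrative assumptions.

```python
import numpy as np

def jukes_cantor_rate_matrix(alpha, n=4):
    """Rate matrix with every off-diagonal entry equal to alpha."""
    Q = np.full((n, n), float(alpha))
    np.fill_diagonal(Q, -(n - 1) * alpha)  # makes each row sum to zero
    return Q

def jukes_cantor_transition_matrix(alpha, t, n=4):
    """Closed-form P(t) = exp(Qt) for the rate matrix above."""
    decay = np.exp(-n * alpha * t)
    P = np.full((n, n), (1.0 - decay) / n)            # i != j
    np.fill_diagonal(P, (1.0 + (n - 1) * decay) / n)  # i == j
    return P

Q = jukes_cantor_rate_matrix(alpha=0.1)
P = jukes_cantor_transition_matrix(alpha=0.1, t=2.0)
print(Q.sum(axis=1))  # rows of a rate matrix sum to zero
print(P.sum(axis=1))  # rows of a transition matrix sum to one
```

Because P(t) is available in closed form, this model is a convenient reference when checking numerical methods.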
Continuous-time Markov chains
• Waiting time is exponentially distributed
• Jumps are discretely distributed
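These two properties give a direct way to simulate a path. The sketch below assumes the waiting time in state i is exponential with rate q_i = −q_ii and that the next state j is drawn with probability q_ij / q_i. The function name and the example rate matrix are illustrative.

```python
import numpy as np

def simulate_ctmc(Q, start, t_end, rng):
    """Simulate one path on [0, t_end]; returns jump times and visited states."""
    Q = np.asarray(Q, dtype=float)
    times, states = [0.0], [start]
    t, state = 0.0, start
    while True:
        rate = -Q[state, state]            # total rate of leaving `state`
        if rate <= 0.0:                    # absorbing state: no more jumps
            break
        t += rng.exponential(1.0 / rate)   # waiting time ~ Exp(rate)
        if t >= t_end:
            break
        jump_probs = Q[state].copy()
        jump_probs[state] = 0.0
        jump_probs /= rate                 # P(jump to j) = q_ij / q_i
        state = int(rng.choice(len(Q), p=jump_probs))
        times.append(t)
        states.append(state)
    return times, states

rng = np.random.default_rng(0)
Q = np.full((4, 4), 0.5)
np.fill_diagonal(Q, -1.5)
times, states = simulate_ctmc(Q, start=0, t_end=10.0, rng=rng)
```

The times spent in each state and the jump counts of such a path are the sufficient statistics referred to in the title.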
General EM
• Full log likelihood
• E-step
• M-step
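In generic notation these three ingredients can be written as follows. The symbols are assumptions chosen for illustration: x is the unobserved full path, y the observed data, θ the parameters and θ₀ the current estimate.

```latex
\begin{align*}
\text{Full log likelihood:}\quad & \ell(\theta; x, y) = \log f_\theta(x, y) \\
\text{E-step:}\quad & G(\theta; \theta_0) = \mathrm{E}_{\theta_0}\bigl[\ell(\theta; x, y) \mid y\bigr] \\
\text{M-step:}\quad & \hat{\theta} = \arg\max_{\theta} G(\theta; \theta_0)
\end{align*}
```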
EM for Jukes Cantor
• Full log likelihood
• M-step
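A sketch of the two formulas for the Jukes-Cantor case, assuming every off-diagonal rate equals α on n states, so that each state is left at rate (n − 1)α. Here N is the total number of jumps, T the total time, y the observed data and α₀ the current estimate; this parameterization and notation are assumptions.

```latex
\begin{align*}
\ell(\alpha; x) &= N \log \alpha - (n - 1)\,\alpha\,T \\
\hat{\alpha} &= \frac{\mathrm{E}_{\alpha_0}[N \mid y]}{(n - 1)\,\mathrm{E}_{\alpha_0}[T \mid y]}
\end{align*}
```

The second line follows from setting the derivative of the expected log likelihood with respect to α to zero, so the M-step needs only conditional expectations of N and T.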
Calculating expectations
Linear combinations
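A sketch of the quantities involved, following the formulation in the paper cited on the last slide as best recalled; the notation should be checked against that paper. For an interval of length t with endpoints X(0) = a and X(t) = b, T_α is the time spent in state α and N_αβ the number of jumps from α to β. C is a coefficient matrix.

```latex
\begin{align*}
I(a, b, \alpha, \beta) &= \int_0^t P_{a\alpha}(s)\, P_{\beta b}(t - s)\, ds,
  \qquad P(t) = e^{Qt} \\
\mathrm{E}[T_\alpha \mid X(0) = a, X(t) = b] &= \frac{I(a, b, \alpha, \alpha)}{P_{ab}(t)} \\
\mathrm{E}[N_{\alpha\beta} \mid X(0) = a, X(t) = b] &= \frac{q_{\alpha\beta}\, I(a, b, \alpha, \beta)}{P_{ab}(t)} \\
\Sigma(C; t) &= \int_0^t e^{Qs}\, C\, e^{Q(t - s)}\, ds,
  \qquad \Sigma(C; t)_{ab} = \sum_{\alpha, \beta} C_{\alpha\beta}\, I(a, b, \alpha, \beta)
\end{align*}
```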
Methods
• In Hobolth & Jensen (2010), three methods are presented to evaluate the necessary statistics
• Eigenvalue decomposition (EVD)
• Uniformization (UNI)
• Exponentiation (EXPM)
• Extend efficiently to linear combinations
• Which method is more accurate?
• Which method is faster?
Eigenvalue decomposition
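A minimal sketch of the EVD approach, assuming Q is diagonalizable as Q = U Λ U⁻¹ and that the target is the matrix integral Σ(C; t) = ∫₀ᵗ e^{Qs} C e^{Q(t−s)} ds. That definition, the function name and the tolerance `tol` are assumptions made for illustration. In the eigenbasis the integral is an elementwise product with a matrix J(t).

```python
import numpy as np

def sigma_evd(Q, C, t, tol=1e-10):
    """Sigma(C; t) = int_0^t exp(Qs) C exp(Q(t-s)) ds via Q = U diag(lam) U^-1."""
    Q = np.asarray(Q, dtype=float)
    C = np.asarray(C, dtype=float)
    lam, U = np.linalg.eig(Q)           # precomputation, O(n^3)
    U_inv = np.linalg.inv(U)
    e = np.exp(lam * t)
    diff = lam[:, None] - lam[None, :]
    same = np.abs(diff) < tol           # treat near-equal eigenvalues as equal
    # J_ij(t) = t * exp(lam_i t)                                  if lam_i = lam_j
    #         = (exp(lam_i t) - exp(lam_j t)) / (lam_i - lam_j)   otherwise
    ratio = (e[:, None] - e[None, :]) / np.where(same, 1.0, diff)
    J = np.where(same, t * e[:, None], ratio)
    B = U_inv @ C @ U
    return np.real(U @ (B * J) @ U_inv)

# Sanity check on a Jukes-Cantor matrix: with C = I the integral is t * exp(Qt).
Q = np.full((4, 4), 0.1)
np.fill_diagonal(Q, -0.3)
t = 2.0
S = sigma_evd(Q, np.eye(4), t)
```

The eigendecomposition depends only on Q, so it can be reused across time points and across choices of C.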
Uniformization
• Choose truncation point s(μt)
• Bound error using the tail of Pois(m+1; μt)
• s(μt) = 4 + 6√(μt) + μt
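A minimal sketch of uniformization with this truncation point, assuming μ = max_i |q_ii|, R = Q/μ + I and the target Σ(C; t) = ∫₀ᵗ e^{Qs} C e^{Q(t−s)} ds. Expanding both exponentials as Poisson mixtures of powers of R gives Σ(C; t) = ∑_m (1/μ) Pois(m+1; μt) ∑_{l=0}^{m} R^l C R^{m−l}. The function names are illustrative, and the plain `exp(-mu * t)` would underflow for very large μt.

```python
import math
import numpy as np

def truncation_point(mu_t):
    """s(mu t) = 4 + 6 sqrt(mu t) + mu t, rounded up to an integer."""
    return math.ceil(4 + 6 * math.sqrt(mu_t) + mu_t)

def sigma_uni(Q, C, t):
    """Truncated uniformization series for int_0^t exp(Qs) C exp(Q(t-s)) ds."""
    Q = np.asarray(Q, dtype=float)
    C = np.asarray(C, dtype=float)
    n = Q.shape[0]
    mu = np.max(-np.diag(Q))           # uniformization rate (assumed > 0)
    R = Q / mu + np.eye(n)             # stochastic matrix
    s = truncation_point(mu * t)
    R_pow = np.eye(n)                  # R^m
    A = C.copy()                       # A_m = sum_{l=0}^{m} R^l C R^(m-l)
    weight = math.exp(-mu * t) * mu * t  # Pois(m+1; mu t) at m = 0
    total = np.zeros((n, n))
    for m in range(s + 1):
        total += (weight / mu) * A
        R_pow = R_pow @ R              # R^(m+1)
        A = R @ A + C @ R_pow          # A_(m+1) = R A_m + C R^(m+1)
        weight *= mu * t / (m + 2)     # Pois(m+2; mu t) from Pois(m+1; mu t)
    return total

# Sanity check on a Jukes-Cantor matrix: with C = I the integral is t * exp(Qt).
Q = np.full((4, 4), 0.1)
np.fill_diagonal(Q, -0.3)
t = 2.0
S = sigma_uni(Q, np.eye(4), t)
```

The number of terms grows with μt, which is why the cost of this method depends on the rate matrix and on the time point.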
Exponentiation
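A minimal sketch of the exponentiation approach, assuming the target is Σ(C; t) = ∫₀ᵗ e^{Qs} C e^{Q(t−s)} ds. By Van Loan's identity, this integral is the upper-right block of e^{At} for the 2n × 2n block matrix A = [[Q, C], [0, Q]]. The helper `expm_taylor` is a simple scaling-and-squaring stand-in written for this example, not the routine used in the talk; in practice a library matrix exponential would take its place.

```python
import numpy as np

def expm_taylor(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, 1)
    # Scale so that the 1-norm of A / 2^k is at most 0.5.
    k = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2.0 ** k)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms + 1):
        term = term @ B / i            # B^i / i!
        E = E + term
    for _ in range(k):                 # undo the scaling: exp(A) = exp(B)^(2^k)
        E = E @ E
    return E

def sigma_expm(Q, C, t):
    """Upper-right block of exp(A t) with A = [[Q, C], [0, Q]]."""
    Q = np.asarray(Q, dtype=float)
    C = np.asarray(C, dtype=float)
    n = Q.shape[0]
    A = np.block([[Q, C], [np.zeros((n, n)), Q]])
    return expm_taylor(A * t)[:n, n:]

# Sanity check on a Jukes-Cantor matrix: with C = I the integral is t * exp(Qt).
Q = np.full((4, 4), 0.1)
np.fill_diagonal(Q, -0.3)
t = 2.0
S = sigma_expm(Q, np.eye(4), t)
```

No decomposition of Q is needed, but every time point requires a fresh exponential of a matrix twice the size of Q.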
Algorithms
                   EVD             UNI               EXPM*
1                  Λ, U, U⁻¹       μ, s(μt), R       A
Order              O(n³)           O(n²)             O(n²)
2                  J(t)            R^m               e^(At)
Order              O(n²)           O(s(μt)n³)        O(n³)
3                  Σ(C; t)         Σ(C; t)           Σ(C; t)
Order              O(n³)           O(s(μt)n²)        O(n²)
Influenced by Q                    μt                -
* expm package in R
Accuracy
• Use two models for which ∑(C; t) is available in analytical form
• JC, varying n
• HKY, varying t
• Measure accuracy as the normalized difference: approximation / true - 1
• as a function of n
• as a function of t
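The accuracy measure is a one-line computation. A minimal sketch with an illustrative function name and made-up numbers:

```python
import numpy as np

def normalized_difference(approximation, true_value):
    """approximation / true - 1, elementwise; 0 means an exact match."""
    approximation = np.asarray(approximation, dtype=float)
    true_value = np.asarray(true_value, dtype=float)
    return approximation / true_value - 1.0

# Illustrative values only: an approximation that is 0.01% too large.
err = normalized_difference(1.0001, 1.0)
```

The sign shows whether a method overestimates or underestimates the true value.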
Speed
• Partition of computation
• Precomputation (EVD, UNI)
• Main computation
• Use three models and different time points to assess the speed
• GY, n = 61
• GTR, n = 4
• UNR, n = 4
• 10 equidistant time points
Results
• Accuracy
• Similar accuracy
• EXPM is the most accurate one
• Speed
• EXPM is the slowest
• EVD vs UNI
• UNI has potential
• Better cut off point s(μt)
• Ameliorate effect of μt by
• Adaptive uniformization
• On-the-fly variant of adaptive uniformization
Thank you!
Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains
BMC Bioinformatics 12(1):465, 2011