Scientific Computing and Mathematical Modeling
Computational Challenges in Large-Scale Pathway Modeling
Frank Tobin
Scientific Computing and Mathematical Modeling, GlaxoSmithKline
September 22, 2004
Agenda
• Biological pathways
♦ simple example of a pathway
♦ simple example of pharmaceutical interest
• Building a mathematical model of biological networks
• Computational challenges
Motivation
• Build as complete a model of as much of a cell or organism as possible
♦ E. coli is the archetypical prototype
• Figure out what to do with it once we get it
What if we had a perfect model? Then what?
What is a Pathway?
For the purposes of this talk:
A network of interacting biological entities represented as a directed graph.
So network and pathway are equivalent under this definition.
[Figure: Saturated Fatty Acid Elongation]
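As a minimal sketch of this definition, a pathway can be held as an adjacency map from each entity to the entities it feeds into. The edges below are illustrative only (a glycolysis fragment), and the helper name `reachable` is hypothetical, not something from the talk.

```python
from collections import deque

# Directed graph: each entity maps to the entities it is converted into.
# Illustrative edges only; a real pathway would also carry enzymes and cofactors.
pathway = {
    "D-glucose": ["D-glucose 6-phosphate"],
    "D-glucose 6-phosphate": ["D-fructose 6-phosphate"],
    "D-fructose 6-phosphate": ["D-fructose 1,6-bisphosphate"],
    "D-fructose 1,6-bisphosphate": [],
}

def reachable(graph, start):
    """Return the set of entities downstream of `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream = reachable(pathway, "D-glucose")
```

Because edges are directed, the last entity in the chain has nothing downstream of it, while the first one reaches all the others.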
Pharmaceutical Interest in Pathways
• Predicting culture conditions for overproduction of biopharmaceuticals and drug targets, bioengineering of target assays, enzymes, receptors, etc.
• Understanding compound modes of action
• Identifying novel behaviors and new behaviors of known pathways
♦ clues to new intervention approaches
♦ selecting and prioritizing of new targets
• Identifying and validating bio-markers
♦ animal ⇔ human correlation
• Interpreting and integrating system biology data:
♦ transcriptomics, proteomics and metabolomics and other ‘omics’
A Simple Pharmaceutical Pathway Example
• Risperidone is a psychotropic agent used for treating schizophrenia or psychosis
• 2.1% of patients develop extrapyramidal symptoms:
♦ involuntary movements
♦ tremors and rigidity
♦ body restlessness
♦ muscle contractions
♦ changes in breathing and heart rate
• Hypothesis for the extrapyramidal symptoms: Dopamine receptor antagonism
Yamada, et al, Synapse 46, 32-37 (2002)
Mechanism of dopamine receptor inhibition
Receptor binding: DA + D2 ⇔ DA•D2
Formation of active complex: DA•D2 + T ⇔ DA•D2•T
Risperidone conversion to 9-hydroxyrisperidone: R → OH
Binding to D2 and 5-HT2 receptors:
R + D2 ⇔ R•D2
R + HT2 ⇔ R•HT2
OH + D2 ⇔ OH•D2
OH + HT2 ⇔ OH•HT2
Species: DA: Dopamine; D2: Receptor; T: Transmitter; R: Risperidone; OH: 9-hydroxyrisperidone; HT2: Receptor
[Diagram: reaction network of the species above, with dosing and clearance of R and OH. Legend: Missing from Yamada Model; Incorrectly specified in Yamada; Non-antagonized system; Risperidone dosing and clearance; Risperidone metabolism; Risperidone antagonism.]
Yamada model for Risperidone PK
Yamada et al, 2002, Synapse, 46:32-37
1-compartment PK model for Risperidone concentration
[Diagram: oral dose of Risperidone → Gut → Blood (R, OH) → clearance. Input (from gut) with rate constants ka,R and ka,OH gives the plasma concentrations cR(t) and cOH(t); clearance has rate constants kel,R and kel,OH.]
\[ c(t) = A(c_0, k_a, k_{el})\,\left[\exp(-k_{el} t) - \exp(-k_a t)\right] \]
[Figure: plasma conc. (ng/ml, log scale from 10^-2 to 10^2) versus time (h, 0 to 45) for R and OH at 1 mg and 2 mg doses.]
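The single-dose expression above can be evaluated directly. The sketch below is illustrative only: the slide says just that A depends on c0, ka and kel, so the prefactor A = c0·ka/(ka − kel) (the usual first-order-absorption form) is an assumption, as are the function names and the parameter values.

```python
import math

def plasma_conc(t, c0, ka, kel):
    """One-compartment concentration c(t) = A * (exp(-kel*t) - exp(-ka*t)).

    Assumption: A = c0 * ka / (ka - kel); the slide does not spell out A.
    """
    if ka == kel:
        raise ValueError("ka and kel must differ for this closed form")
    amplitude = c0 * ka / (ka - kel)
    return amplitude * (math.exp(-kel * t) - math.exp(-ka * t))

def time_of_peak(ka, kel):
    """Time at which c(t) is maximal, obtained by setting dc/dt = 0."""
    return math.log(ka / kel) / (ka - kel)

# Hypothetical parameters (not from Yamada et al.): c0 in ng/ml, rates in 1/h.
c0, ka, kel = 10.0, 2.0, 0.25
t_peak = time_of_peak(ka, kel)
c_peak = plasma_conc(t_peak, c0, ka, kel)
```

The curve starts at zero, rises while absorption dominates, and decays at the elimination rate once the gut compartment is empty.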
The ODE Model Approach
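Biological model → Mathematical model (ODEs) → Numerical simulations
\[ x' = f(x, \lambda) + D(t) \]
\[
\begin{aligned}
\frac{d[R]_{gut}}{dt} &= -k_a^{R}\,[R]_{gut} \\
\frac{d[OH]_{gut}}{dt} &= -k_a^{OH}\,[OH]_{gut} \\
\frac{d[R]}{dt} &= k_a^{R}\,[R]_{gut} - k_{el}^{R}\,[R] \\
\frac{d[OH]}{dt} &= k_a^{OH}\,[OH]_{gut} - k_{el}^{OH}\,[OH] \\
\frac{d[DA{\cdot}D2]}{dt} &= k_{+}^{DA\cdot D2}\,[DA][D2] - K_A^{DA\cdot D2}\,k_{+}^{DA\cdot D2}\,[DA{\cdot}D2] \\
\frac{d[DA{\cdot}D2{\cdot}T]}{dt} &= k_{+}^{DA\cdot D2\cdot T}\,[DA{\cdot}D2][T] - K_A^{DA\cdot D2\cdot T}\,k_{+}^{DA\cdot D2\cdot T}\,[DA{\cdot}D2{\cdot}T] \\
\frac{d[R{\cdot}D2]}{dt} &= k_{+}^{R\cdot D2}\,\beta_R\,[R][D2] - K_A^{R\cdot D2}\,k_{+}^{R\cdot D2}\,[R{\cdot}D2] \\
\frac{d[OH{\cdot}D2]}{dt} &= k_{+}^{OH\cdot D2}\,\beta_{OH}\,[OH][D2] - K_A^{OH\cdot D2}\,k_{+}^{OH\cdot D2}\,[OH{\cdot}D2] \\
\frac{d[R{\cdot}HT2]}{dt} &= k_{+}^{R\cdot HT2}\,\beta_R\,[R][HT2] - K_A^{R\cdot HT2}\,k_{+}^{R\cdot HT2}\,[R{\cdot}HT2] \\
\frac{d[OH{\cdot}HT2]}{dt} &= k_{+}^{OH\cdot HT2}\,\beta_{OH}\,[OH][HT2] - K_A^{OH\cdot HT2}\,k_{+}^{OH\cdot HT2}\,[OH{\cdot}HT2]
\end{aligned}
\]
\[
\begin{aligned}
[DA]_{total} &= [DA] + [DA{\cdot}D2] + [DA{\cdot}D2{\cdot}T] \\
[T]_{total} &= [T] + [DA{\cdot}D2{\cdot}T] \\
[D2]_{total} &= [D2] + [DA{\cdot}D2] + [DA{\cdot}D2{\cdot}T] + [R{\cdot}D2] + [OH{\cdot}D2] \\
[HT2]_{total} &= [HT2] + [R{\cdot}HT2] + [OH{\cdot}HT2]
\end{aligned}
\]
[Diagram: the dopamine receptor reaction network, with dosing and clearance of R and OH.]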
Daily Dosing Differs from a Single Dose: Plasma Concentration
R, OH exptl data from Ishigooka et al., Clin Eval 19, 93-163 (1991)
[Figure: plasma conc. (ng/ml, log scale from 0.01 to 10) versus time (h, 0 to 120). Curves: OH simulation and R simulation under daily dose; OH and R after a single dose; average OH conc. and average R conc. marked.]
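One way to see why daily dosing differs from a single dose: when absorption and elimination are both first order, the model is linear, so the multi-dose curve is a sum of time-shifted single-dose curves. The sketch below assumes that linearity, reuses the single-dose form c(t) = A [exp(-kel t) - exp(-ka t)] with an assumed prefactor A = c0·ka/(ka − kel), and uses hypothetical parameter values and function names.

```python
import math

def single_dose(t, c0, ka, kel):
    """Single-dose curve; zero before the dose is given (t < 0)."""
    if t < 0:
        return 0.0
    return c0 * ka / (ka - kel) * (math.exp(-kel * t) - math.exp(-ka * t))

def repeated_dose(t, c0, ka, kel, interval=24.0, n_doses=5):
    """Superpose n_doses identical doses given every `interval` hours."""
    return sum(single_dose(t - i * interval, c0, ka, kel) for i in range(n_doses))

# Hypothetical parameters, chosen only to make accumulation visible.
c0, ka, kel = 10.0, 2.0, 0.05
trough_day1 = repeated_dose(24.0, c0, ka, kel)   # just as the second dose is given
trough_day5 = repeated_dose(120.0, c0, ka, kel)  # 24 h after the fifth dose
```

In this sketch the first 24 hours coincide with the single-dose curve, and the trough rises on later days because earlier doses have not been fully cleared.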
Effect of multiple dosing on receptor occupancy
[Figure: mean receptor occupancy (%, 0 to 100) versus dose (mg/day, log scale from 0.1 to 100) for the 5-HT2 and D2 receptors, each after 1 dose and after 5 doses, with the therapeutic range marked.]
Daily dosing causes differences in predicted side-effects
Multiple dosing results in increased ESRS shift, increasing with daily dose administered
[Figure, two panels. Panel 1: ESRS shift (0 to 3) versus dose (mg/day, log scale from 0.1 to 100) for a single dose and for 5 daily doses. Panel 2: ESRS shift (0 to 3) versus D2 receptor occupancy (0 to 100) for a single dose, experimental data (single dose), and multiple dose (daily, 5 days).]
Receptor Occupancy as a function of cumulative dosing
[Figure, two panels: D2 and 5-HT2 receptor occupancy (0 to 100) versus time (h, 0 to 120), with curves for R + OH and for only R.]
Cumulative changes in occupancy
First 24 hours identical between single and multiple doses
Real Pathways are More Complex
Mathematical Complexity
• Consider a small, relatively unsophisticated bacterium: Escherichia coli
♦ ≈ 2000 genes
♦ 2500 proteins
♦ at least several hundred small molecules
♦ 3 interactions per entity × 5000 entities
♦ 3 parameters per equation
♦ ≈ 15 000 equations with 45 000 parameters!
• Now add on spatial change - 15 000 PDEs!
\[
\begin{aligned}
X' &= F(X;\lambda) && \text{continuous, discrete, stochastic} \\
0 &= G(X;\lambda) && \text{analytic constraints} \\
0 &= H(X;\lambda) && \text{non-analytic constraints}
\end{aligned}
\]
The Modeling Process
Building the model
1A Building the model -- forward problem
♦ Static
♦ Kinetic
− Rate law determination
− Parameter determination
1B Reconstructing the model -- inverse problem
Getting it right
2 Validating the model
♦ Experimental data comparison
♦ Plausible biology from analytic analysis/simulation
♦ Examining results and testing assertions
Getting value out of it
3 Simulation
♦ Hypothesis testing
♦ Hypothesis generation
Static Model
• Only connectivity (topology) of the interactions
• Visualised as connection or interaction graph
• Used for initial model verification and testing
• Types
♦ Metabolic
♦ Gene Regulation
♦ Gene-Product, and Protein-Protein Interactions
[Figure: three example networks. Metabolic network: D-glucose, D-glucose 6-phosphate, D-fructose 6-phosphate and D-fructose 1,6-bisphosphate, with enzymes 2.7.1.69, 5.3.1.9 and 2.7.1.11, HPr-P/HPr and ATP/ADP. Genetic network: activator, repressor and operon, linking DNA, RNA, protein and metabolites. Gene-Product interactions network: ras complexes with GDP/GTP exchange.]
Kinetic Model
• First phase: Kinetic models - time dependency incorporated
♦ Kinetic behaviour (rate laws) added to static model
♦ May or may not obey mass action kinetics
• Second phase: Kinetic constants determined from experimental data
• Third phase: Mathematical model - equations generated
♦ Time variation of all concentrations and fluxes can be simulated
♦ Model analyses possible: sensitivity, linear stability theory, asymptotic analysis, etc.
Example: Inhibition of a Ligand-Receptor Complex Formation
Static model: Receptor, Ligand, Inhibitor
Kinetic model:
R + L ⇔ R⋅L
R + I ⇔ R⋅I
Mathematical model:
\[
\begin{aligned}
[R]' &= -k_1 [R][L] + k_2 [RL] - k_3 [R][I] + k_4 [RI] \\
[RL]' &= k_1 [R][L] - k_2 [RL] \\
[RI]' &= k_3 [R][I] - k_4 [RI] \\
[L]' &= -k_1 [R][L] + k_2 [RL] \\
[I]' &= -k_3 [R][I] + k_4 [RI] \\
L_0 &= [L] + [RL] \\
I_0 &= [I] + [RI] \\
R_0 &= [R] + [RL] + [RI]
\end{aligned}
\]
[Figure: numerical simulation of the Ligand-Receptor Complex (0 to 250) versus Time (0 to 10,000).]
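The step from kinetic model to numerical simulation can be sketched for this example with a fixed-step fourth-order Runge-Kutta integrator. This is an illustration under stated assumptions, not the solver used for the slide: the rate constants, initial amounts, step size and function names are all hypothetical.

```python
def rhs(state, k1, k2, k3, k4):
    """Mass-action right-hand side for R + L <-> RL and R + I <-> RI."""
    R, L, I, RL, RI = state
    bind_L = k1 * R * L - k2 * RL   # net formation rate of RL
    bind_I = k3 * R * I - k4 * RI   # net formation rate of RI
    return (-bind_L - bind_I, -bind_L, -bind_I, bind_L, bind_I)

def rk4_step(state, dt, *k):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    s1 = rhs(state, *k)
    s2 = rhs(shifted(state, s1, dt / 2), *k)
    s3 = rhs(shifted(state, s2, dt / 2), *k)
    s4 = rhs(shifted(state, s3, dt), *k)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, s1, s2, s3, s4)
    )

# Hypothetical rate constants and initial amounts (arbitrary units).
k = (1e-3, 1e-2, 2e-3, 1e-2)
state = (100.0, 250.0, 50.0, 0.0, 0.0)   # R, L, I, RL, RI
for _ in range(10000):                   # integrate to t = 100 with dt = 0.01
    state = rk4_step(state, 0.01, *k)
R, L, I, RL, RI = state
```

The three conservation relations (R0, L0, I0) are not imposed here; they hold because the right-hand side sums to zero along each of them, which makes them a cheap check on the integration.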
The Resulting System: Very Large, Flawed, and Damned Useful!
• The resulting system of equations:
\[
\begin{aligned}
x' &= F(x,\lambda) \\
0 &= G(x,\lambda) && \text{algebraic relationships} \\
0 &= H(x,\lambda) && \text{analytic constraints} \\
0 &= I(x,\lambda) && \text{non-analytic constraints}
\end{aligned}
\]
• Very large dimensionalities in:
♦ the number of species, X
♦ the number of interactions
♦ the number of parameters, λ
♦ the number of constraint equations
• Uncertainty, error, ambiguity, approximations, etc.
[Figure: Fatty Acid ACP Biosynthesis]
As the pathways grow large, the nature of the problems changes.
• Building the model
♦ knowledge management
♦ knowledge updating
♦ incomplete knowledge
♦ Automation
♦ Updating the model - versioning
• Analysis of the model
♦ Too much for a human to peruse
♦ Theory gaps
♦ Automation
• Analysis of the simulation results
♦ Too much for a human to peruse
♦ New techniques
♦ Automation
[Figure: Phytanic Acid Peroxisomal Oxidation]
Automation
• No human intervention whatsoever
♦ None, nada, zip!
♦ If it takes a human to set up, run or analyze - it's not automated
• Robust algorithms
♦ Graceful failure
♦ Knowledge of domain of applicability
♦ Pathological data happens very often - Murphy is omnipresent
• Not as easy as it may seem at first
• Many existing algorithms are not automatable in current usage
The Model Understanding Roadmap
[Diagram labels: Biology; Static model (graph theory); Kinetic model (analytic theory); Dynamics model (analytic theory); Simulations; Analysis; Analyzing Results; Understanding; Automate; Exhaustive Analysis; New experiments; Find model errors; “Gaps”; Computational Opportunities.]
Theory Gap for Large Systems
• Large but not infinite dimensionality is the problem
• Analytical and numerical determination:
♦ Finding ‘true’ null states - there may be a great number
♦ Finding linear null states - there may be a great number
♦ Asymptotic behaviors
♦ Controllability, predictability, integrability, ...
♦ Steady state, non-linear behaviors
♦ Bifurcation analyses
♦ Perturbed behaviors - drug dosing, environment, mutants, etc.
♦ ...
• How to calculate in a computationally efficient manner
• Can’t afford to calculate everything
• Need to determine a priori which calculations are to be done
Continuousness / Stochasticity/ Discreteness / Ambiguity
• Continuous approximation breaks down
♦ Need to use master equations or some other formulation involving stochasticity
♦ May need to dynamically switch as system evolves
• Some processes are truly discrete
♦ Consider cellular automata, Petri Nets, discrete events, etc.
• Some parts of the model are only known qualitatively
♦ Qualitative simulation techniques.
• Uncertainty and variation in the system
♦ Initial conditions
♦ Rate constants and rate laws
♦ Population variations
♦ Interval or fuzzy integration
• Multiscale - time, length, concentration, etc.
• Constraints - DAEs
The challenge: one hybrid integrator
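To make the stochastic end of this spectrum concrete, here is a minimal sketch of Gillespie's direct method for a single reversible binding reaction R + L ⇔ RL with integer molecule counts. It is one standard way to sample a master equation, not necessarily what a hybrid integrator would use; the function name, copy numbers and rate constants are hypothetical.

```python
import random

def gillespie_binding(n_R, n_L, n_RL, k_on, k_off, t_end, seed=0):
    """Stochastic simulation of R + L <-> RL with integer molecule counts."""
    rng = random.Random(seed)
    t = 0.0
    while True:
        a_on = k_on * n_R * n_L      # propensity of a binding event
        a_off = k_off * n_RL         # propensity of an unbinding event
        a_total = a_on + a_off
        if a_total == 0.0:
            break                    # nothing can react any more
        t += rng.expovariate(a_total)    # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_on:
            n_R, n_L, n_RL = n_R - 1, n_L - 1, n_RL + 1
        else:
            n_R, n_L, n_RL = n_R + 1, n_L + 1, n_RL - 1
    return n_R, n_L, n_RL

# Hypothetical copy numbers and stochastic rate constants.
final_R, final_L, final_RL = gillespie_binding(20, 30, 0, 0.01, 0.1, t_end=50.0)
```

At low copy numbers each run gives a different trajectory, which is exactly the behaviour a continuous ODE cannot represent; the totals of receptor and ligand are still conserved event by event.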
Parameter challenges
• The larger the model:
♦ the more parameters compared to the experiments
• Static guessing - filling in the gaps
♦ guessing gene function by analogy
♦ looking for missing reactions - i.e. enzymes
• Kinetic guessing - integrating kinetic islands - guessing plausible rate laws and parameters
♦ Analogy approaches, similarity across species (‘multiple alignment’)
♦ From flux analysis?
• Do we need to know all parameters? Accuracy?
[Figure: PyrD: DHO + Q = Or + QH2. v (A600/min, 0.05 to 0.3) versus DHO (mM, 0.02 to 0.2) for Or = 0 µM, 10 µM, 20 mM and 40 mM (units as printed on the slide).]
Parameter challenges
• Determine parameters of rate laws from an optimization to fit experimental kinetics data
♦ noisy and incomplete data
♦ ill-posed, possibly severely
• How do we scale this up as the model gets bigger?
♦ One huge model fitting? - Can we even afford this approach?
♦ One sub-system at a time fitting?
♦ Hierarchical fitting? - Stitching together individually calibrated pieces does not a priori mean the model is calibrated
• What’s the best way to optimize?
♦ Is L2 the best objective function?
♦ Constraints - incorporating and coming up with better ones
• How do we know how well we’ve done?
[Figure: v (%, 0 to 90) versus UTP (mM, 0.05 to 0.5) for UMP = 0.1 mM and UMP = 1 mM.]
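A small sketch of what fitting a rate law with an L2 objective can look like. The slides do not name a rate law, so a Michaelis-Menten form v = Vmax·S/(Km + S) is assumed here purely for illustration; the data are synthetic and noise-free, and all function names are hypothetical. For a fixed Km the best Vmax has a closed form, so only Km needs a one-dimensional search.

```python
def best_vmax(km, substrate, rate):
    """For fixed Km the L2-optimal Vmax is a linear least-squares solution."""
    shape = [s / (km + s) for s in substrate]
    return sum(v * f for v, f in zip(rate, shape)) / sum(f * f for f in shape)

def l2_error(km, substrate, rate):
    """Sum of squared residuals at this Km, using the best Vmax for it."""
    vmax = best_vmax(km, substrate, rate)
    return sum((v - vmax * s / (km + s)) ** 2 for s, v in zip(substrate, rate))

def fit_michaelis_menten(substrate, rate, km_lo=1e-4, km_hi=10.0, iterations=100):
    """Golden-section search over Km; assumes the error is unimodal on the bracket."""
    ratio = (5 ** 0.5 - 1) / 2
    lo, hi = km_lo, km_hi
    for _ in range(iterations):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if l2_error(a, substrate, rate) < l2_error(b, substrate, rate):
            hi = b
        else:
            lo = a
    km = (lo + hi) / 2
    return best_vmax(km, substrate, rate), km

# Synthetic, noise-free data generated from hypothetical values Vmax = 0.3, Km = 0.05.
S = [0.02 * i for i in range(1, 11)]
v = [0.3 * s / (0.05 + s) for s in S]
vmax_fit, km_fit = fit_michaelis_menten(S, v)
```

With clean data the generating parameters are recovered; with noisy or incomplete data the same objective can become flat in some directions, which is where the ill-posedness noted above shows up.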
Inverse Problems and Biological Plausibility
• What makes a model more biological than another?
♦ thermodynamic constraints
♦ numerical integrity - semi-definite solutions
♦ asymptotic behaviors
♦ stability properties
♦ information theory constraints
♦ physico-chemical constraints
♦ environmental constraints
♦ evolution constraints
♦ flux distributions
♦ mass and energy balance
• Parameter determination needs also
Visualization Challenges
• Visualization in a large graph with too much detail
♦ Analysis of results - what’s interesting?
♦ Drill down, hyperbolic viewers, database driven for large models
♦ Visualizing fluxes in a meaningful way
• How do you visualize huge networks?
• Tools needed for panning, zooming, drill-down, scalable, incrementally updatable from a database, etc.
• Pathway editors for input
• Animation - visualizing temporal fluxes
[Figure: experiments by T. Munzner, UBC, for visualizing Web connections]
Discovering “New” Biology
Assumption: if we didn’t know any biology per se, could we rediscover it from the model?
Caveat: if we can find “old” biology, then presumably we could find “new” biology
Discovering “New” Biology
• Finding new cooperative or emergent phenomena:
♦ pathways and “distinguishable” sub-systems
♦ cycles and “clocks”
♦ oscillatory systems
♦ regulatory systems
♦ “states” or “modes” of the system
• The resulting biology acts as plausible checks on the model
• Some ideas:
♦ Persistent - pathway behavior is or is not independent of initial conditions
♦ Conditional - pathway is active only for certain initial conditions - the nub of course is how to identify this
♦ Model ⇒ graph ⇒ matrix ⇒ permutation matrix reordering ⇒ structure ⇒ biology?
♦ Pattern recognition approaches. Model comparison? Different organisms/species?
♦ Some type of flux or domain decomposition?
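The "graph ⇒ matrix ⇒ permutation reordering ⇒ structure" idea can be sketched on a toy case: find the weakly connected groups of an interaction matrix, then permute rows and columns so each group becomes a diagonal block. This is only one simple reordering among many, chosen for illustration; the matrix and function names are hypothetical.

```python
def connected_blocks(adjacency):
    """Group node indices into weakly connected components of the interaction matrix."""
    n = len(adjacency)
    unvisited = set(range(n))
    blocks = []
    while unvisited:
        stack = [min(unvisited)]
        unvisited.discard(stack[0])
        block = []
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(n):
                # Treat an edge in either direction as a connection.
                if j in unvisited and (adjacency[i][j] or adjacency[j][i]):
                    unvisited.discard(j)
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks

def reorder(adjacency, order):
    """Apply the same permutation to rows and columns (P A P^T)."""
    return [[adjacency[i][j] for j in order] for i in order]

# Hypothetical 5-entity interaction matrix: entities 0, 2, 4 interact; so do 1 and 3.
A = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
blocks = connected_blocks(A)
order = [i for block in blocks for i in block]
A_blocked = reorder(A, order)
```

After reordering, the off-diagonal blocks are empty, so the two sub-systems that were interleaved in the original numbering become visible as separate structures.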
How do you know they’re right? Assertions checking
• Provide a means to formally represent biology that went into the model
♦ aspects of computer language parsing, AI-knowledge representation, inference
• Purposes
♦ as a formal computer language for incorporation into software
♦ for automation of the biology knowledge comparisons against data
♦ allow checking model accuracy
♦ used as criteria for optimisation - e.g. parameter determination of rate laws
• Consequences of the assertions
♦ require certain behaviours to be present in the model
♦ expect, but not require some behaviours
♦ search for speculative behaviours
♦ provide diagnostic tools for examining the quality of the data
Assertions - Bacterial Aerobicity Example
Different genes are expressed under different environmental conditions - temperature, media composition, pH, and oxygen. Regulatory systems control expression, but assertions can be used to ensure the basic regulatory processes of the model are accurate.

# Find the time when the system changes from anaerobic to aerobic behaviour and then
# make sure that the key regulations appear to be happening
regulation_time = time > change.time('ANAEROBIC', 'AEROBIC')
    AND 'ArcA-P' >> 'ArcA'       #positive regulation (activation) of ArcA by ArcA-P
    OR 'FNR-ox' >> 'FNR-red'     #FNR repressed aerobically

#Then, if regulation appears to be happening, for each protein behaving aerobically:
ForEach aerobic_protein=aerobic(*)       #look at each aerobic protein, one at a time
{
    b = flux.value(aerobic_protein);     #get the flux of each aerobic protein concentration
    c = gene.name(aerobic_protein);      #time course of expression of the parent gene
    if (regulation_time AND (b > 0))
    {
        Success Action:                  #if the assertion for this protein is true
            Message ("'AerobicityState' confirmed by the expression profile of gene %s",c)
        Failure Action:                  #if the assertion for this protein is false
            Message ("Gene %s does not have the expected 'AerobicityState' expression pattern",c)
            Status = WARNING             #indicate a non-fatal problem
    }
}
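For readers who want something executable, the flux part of this assertion can be approximated in Python against simulation output. This is a loose sketch, not the assertion language itself: it skips the ArcA/FNR regulation test, takes the switch time as given, and every name and data value below is hypothetical.

```python
def check_aerobic_expression(switch_time, times, fluxes, gene_of):
    """Check that each aerobic protein has positive flux after the aerobic switch.

    times   : list of sample times
    fluxes  : dict protein -> list of flux values, one per sample time
    gene_of : dict protein -> name of its parent gene
    Returns a list of (status, message) pairs; failures are non-fatal warnings.
    """
    after = [i for i, t in enumerate(times) if t > switch_time]
    report = []
    for protein, series in fluxes.items():
        gene = gene_of[protein]
        if after and all(series[i] > 0 for i in after):
            report.append(("OK", f"'AerobicityState' confirmed by the expression profile of gene {gene}"))
        else:
            report.append(("WARNING", f"Gene {gene} does not have the expected 'AerobicityState' expression pattern"))
    return report

# Hypothetical simulation output; protein and gene names are placeholders.
times = [0, 1, 2, 3, 4]
fluxes = {
    "protein_A": [0.0, 0.0, 0.2, 0.5, 0.6],
    "protein_B": [0.1, 0.0, 0.0, -0.1, 0.0],
}
gene_of = {"protein_A": "gene_a", "protein_B": "gene_b"}
report = check_aerobic_expression(1.5, times, fluxes, gene_of)
```

Returning warnings instead of raising keeps a failed assertion non-fatal, so one unexpected gene does not stop the rest of an automated run.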
The Pathway Modeling Factory Concept
[Diagram labels: STATIC MODEL BUILDING; DYNAMIC MODELS BUILDING; MODEL BUILDING; EQNs; FITTING; SIMULATOR; ASSERTIONS; COMPARATOR; RESULTS ANALYSIS; WEB VIEWER; SCENARIOS INPUT; ASSERTIONS INPUT; REGISTRATION; DATA CONVERSION; STATISTICS COLLECTION; PROBLEM; HUMAN RESOLUTION; TRIGGER NEW VERSION; BET; MODEL DB; EXP. DATA DB; SCENARIOS DB; ASSERTIONS DB; RESULTS DB; STATISTICS DB. Steps are marked as AUTOMATED or HUMAN.]
What Else Is There?
Much, Much More!
Only limited by our imaginations
Acknowledgements
Serge Dronov 423907
Igor Goryanin 55470021
Hugh Spence 1000001
Frank Tobin -1+2.718i+3.14j-1.0k
Valeriu Damian-Iordache 60402
Chetan Gadgil 00946218
Jana Wolf 6
Laura Potter -978