Systems Biology Approaches to ATC and RDCR
Mitchell Jay Cohen, MD, FACS
Associate Professor In Residence, University of California San Francisco
San Francisco General Hospital
• San Francisco’s only trauma center
• Cares for all patients with traumatic injury in San Francisco, regardless of ability to pay
• Serves 100,000 patients per year
• Cares for 4,900 injured patients per year
SFGH Surgical Research Lab
Problems with our data.
• Single time point predictions
• No attention to dynamics
• Correlations
• Multivariate data
• Multivariate confounding
• Overfitting
• Dimensionality
• Complexity
Why Models?
• They allow for hypothesis generation.
• They allow for visualization.
• They allow for identification of pathways and relationships that would otherwise be impossible to visualize.
• They allow for in silico experimentation.
They are the most rigorous and unambiguous representation of our hypotheses and understanding.
Kinds of models
Cartoon Models
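[Figure: cartoon model of INJURY / HYPOPERFUSION leading to coagulopathy and hyperfibrinolysis. Labels include the vascular endothelium, thrombin (T), TM, ePCR, protein C (PC) and aPC, FVa and FVIIIa, PAI-1, tPA, plasminogen, plasmin, TAFI/TAFIa, D-dimers and fibrinolysis, together with the complement cascade: classical pathway (C1q, C1r, C1s, C4, C2), lectin pathway (MBL, MASP-1, MASP-2) and alternative pathway (Factor B, Factor D, C3b), converging on C5, C5a and MAC (C5b-9).]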
Animal Models: Traumatic Coagulopathy
• Soft-tissue trauma
• Hemorrhagic shock:
  – Non-ventilated, fixed-pressure.
  – Blood withdrawn via vascular line.
  – MAP 35 ± 5 mmHg for 60 min.
• Resuscitation:
  – Lactated Ringer's (LR) at 2× the shed blood volume, plus the shed blood
Statistical Models
• Multivariate component space
[Figure: Factor II, Factor V and Factor VIII shown in a multivariate component space with principal component axes PC1 and PC2.]
Dynamic ODE Models
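• Suppose the input is a (unit) impulse, U(s) = 1, and:
$$\frac{Y(s)}{U(s)} = \frac{\alpha}{(s+\beta)^3} = \frac{\alpha}{s^3 + 3\beta s^2 + 3\beta^2 s + \beta^3}$$
• System transfer function, including delay:
$$\frac{Y(s)}{U(s)} = \frac{k_n e^{-ds}}{s^3 + k_2 s^2 + k_1 s + k_0}$$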
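The transfer functions on this slide describe third-order dynamics with an input delay. As a minimal numerical sketch (ours, not the author's code), assuming the delayed form Y(s)/U(s) = k_n e^(-ds) / (s^3 + k_2 s^2 + k_1 s + k_0), the Python below computes the impulse response by integrating the equivalent ODE with fixed-step fourth-order Runge-Kutta. The function `impulse_response` and its parameter names are hypothetical labels for those symbols.

```python
import math


def impulse_response(k_n, k2, k1, k0, delay, t_end, dt=1e-3):
    """Impulse response of Y(s)/U(s) = k_n e^{-ds} / (s^3 + k2 s^2 + k1 s + k0).

    A unit impulse into y''' + k2 y'' + k1 y' + k0 y = k_n u(t - d) is
    equivalent to solving the unforced ODE from t = d with y = y' = 0 and
    y'' = k_n; the output is zero before the delay d.
    """

    def deriv(state):
        y, dy, ddy = state
        return (dy, ddy, -k2 * ddy - k1 * dy - k0 * y)

    def rk4_step(state):
        a = deriv(state)
        b = deriv(tuple(x + 0.5 * dt * v for x, v in zip(state, a)))
        c = deriv(tuple(x + 0.5 * dt * v for x, v in zip(state, b)))
        e = deriv(tuple(x + dt * v for x, v in zip(state, c)))
        return tuple(x + dt / 6 * (p + 2 * q + 2 * r + w)
                     for x, p, q, r, w in zip(state, a, b, c, e))

    n_steps = round(t_end / dt)
    n_delay = round(delay / dt)
    times = [i * dt for i in range(n_steps + 1)]
    ys = [0.0] * (n_steps + 1)   # output stays zero until the delay has elapsed
    state = (0.0, 0.0, k_n)      # state just after the impulse arrives at t = d
    for i in range(n_delay, n_steps + 1):
        ys[i] = state[0]
        state = rk4_step(state)
    return times, ys


# alpha = 2, beta = 0.5: k2 = 3*beta, k1 = 3*beta**2, k0 = beta**3, no delay.
times, ys = impulse_response(2.0, 1.5, 0.75, 0.125, delay=0.0, t_end=4.0)
t = times[-1]
print(ys[-1], (2.0 / 2) * t**2 * math.exp(-0.5 * t))
```

With k_2 = 3β, k_1 = 3β², k_0 = β³, k_n = α and d = 0 the system reduces to α/(s+β)³, whose impulse response is (α/2)t²e^(-βt); the two printed numbers should agree closely, which is a cheap check on the integrator.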
Statistical Models
The first step…physiologic state recognition.
The problem of physiologic recognition
Recognition of resuscitation status and patient physiology remains elusive. Occult hypoperfusion is common but difficult to identify.
Clinicians rely on a few favorite variables and on acquired clinical acumen to make treatment decisions. This acumen is threatened.
[Figure: ICU data streams, including heart rate, blood pressure, oxygen saturation, pulmonary artery pressure, intracranial pressure, temperature, nursing documentation (medications, treatments, intake, output, vital signs, blood products, IV fluids), tissue oxygen, inspired oxygen, tidal volume, peak pressure, end-expiratory pressure, respiratory rate, ventilation mode, cardiac output, sedation level and ICP waveform.]
This is a data-intensive environment.
The ICU Today
Trauma Informatics
2013
We treat one parameter to one threshold: we order interventions targeting single parameters.
• Fluid bolus or pressors for SBP < 90 mmHg
• Mannitol for ICP > 20 mmHg
We treat univariately in a multivariate world
How do we recognize transitions and trends?
• Data are not manageable
• No attention to relationships among parameters
• Reliance on the experience of the practitioner
[Figure: bedside data sources, including EtCO2, muscle oxygen, ventilator parameters, sedation, HR, MAP, RR, SpO2, nursing documentation and lab results.]
Aristein Bioinformatics
The first step: collect the data…
Fig. 2: Selected medical devices and network connections.
Harnessing the Data
Building the Infrastructure: Biomedical Informatics for Critical Care
COMPUTER SCIENCE
ACADEMIC MEDICINE
INDUSTRY
A Multidisciplinary Collaboration
The next step…conjure the simple from the complex
At any time point, a patient’s state is characterized by the interactions of all variables.
The patient states identified this way would be impossible to see by brute force or with traditional methods.
Examining only a few variables is insufficient and provides only partial information about the patient’s state.
New informatic techniques allow complexity to be organized so that all data can be used and examined simultaneously.
[Figure: base deficit and muscle lactate plotted against time in hours.]
What’s wrong with the old way?
[Figure: overlaid time series of BD, mlac, mgluc, map, hr, mpyr, pmo2 and mlp (series label: to51).]
New informatic techniques come from business data mining, high-throughput genomics and statistical physics. These include hierarchical clustering, k-means clustering, self-organizing maps, artificial neural networks and network topology.
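Of the techniques listed, k-means is the easiest to show in a few lines. The following is a minimal sketch of Lloyd's algorithm in plain Python, assuming numeric data rows and Euclidean distance. The source does not say which variant, number of clusters or initialization was used, so `kmeans`, `n_iter` and `seed` are illustrative choices, not the authors' settings.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k data rows
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of the rows assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(pts, 2)
print(labels, centroids)
```

Because the distance is computed on raw values, variables measured on very different scales would normally be standardized first so that no single variable dominates the clustering.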
This is how…
We can make sense of the expression patterns of thousands of genes at one time.
Facebook can know your next best friend before you do. Companies can market women’s underwear next to cereal next to beer.
For this analysis… microdialysis, lab and physiologic data were collected on 25 patients.
Our data set had 45 variables, 96,000 rows and over 4 million data points.
Analyzing the data with traditional techniques got us only so far…
We turned to complex bioinformatic techniques to make sense of these data.
Hierarchical Clustering
Distance metric (Euclidean distance):
$$d(i,j) = \sqrt{\sum_{k=1}^{n} \left(x_{ik} - x_{jk}\right)^2}$$
Calculate distance metric
Merge “closest” clusters
Recalculate affected distances
“Closest” defined by complete linkage:
$$C(A,B) = \max_{i \in A,\, j \in B} \left\{ d(i,j) \right\}$$
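The three steps listed for hierarchical clustering, with Euclidean distance between data rows and complete linkage between clusters, can be sketched in plain Python. This is a minimal, naive illustration under those assumptions; the function name `complete_linkage` and its merge-history return format are hypothetical, not the authors' software.

```python
import math


def complete_linkage(rows):
    """Agglomerative clustering with Euclidean distance and complete linkage.

    Returns the merge history as (cluster_a, cluster_b, linkage_distance)
    tuples, where clusters are frozensets of row indices.
    """
    # Step 1: distance metric d(i, j) between every pair of data rows.
    d = {(i, j): math.dist(rows[i], rows[j])
         for i in range(len(rows)) for j in range(i + 1, len(rows))}

    def dist(i, j):
        return 0.0 if i == j else d[(min(i, j), max(i, j))]

    def linkage(a, b):
        # Complete linkage: C(A, B) = max over i in A, j in B of d(i, j).
        return max(dist(i, j) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        # Step 2: find and merge the "closest" pair of clusters.
        a, b = min(((x, y) for k, x in enumerate(clusters) for y in clusters[k + 1:]),
                   key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        # Step 3: replace the pair with their union; this naive sketch simply
        # recomputes every cluster linkage on the next pass.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
for a, b, h in complete_linkage(rows):
    print(sorted(a), sorted(b), round(h, 3))
```

The merge heights are what a dendrogram draws. Recomputing everything each pass is slow (roughly quartic in the number of rows) and only suits small examples; a data set with tens of thousands of rows would need an optimized library routine.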
Data-row dendrogram and heatmap
Cluster profiles. Columns: MAP | Heart rate | PmO2 | Temp | Lung comp | PEEP | Min vol | SpO2 | FiO2 | mLac | mGlu | mGlut | mPyru | mLP
Cluster 1: 82.7 ± 12.8 | 99.4 ± 16.8 | 36.5 ± 14.5 | 37.3 ± 0.9 | 37.2 ± 11.7 | 7.8 ± 3.9 | 10.3 ± 2.0 | 99.0 ± 1.7 | 51.7 ± 19.0 | 3.7 ± 2.7 | 7.0 ± 2.3 | 7.4 ± 5.5 | 213.2 ± 25.2 | 0.016 ± 0.011
Cluster 2: 81.6 ± 13.2 | 96.2 ± 21.3 | 36.0 ± 10.3 | 37.0 ± 1.6 | 36.0 ± 14.1 | 7.6 ± 2.8 | 9.5 ± 2.2 | 98.5 ± 2.2 | 54.6 ± 19.1 | 2.2 ± 0.7 | 7.2 ± 2.2 | 9.2 ± 4.4 | 113.9 ± 32.7 | 0.020 ± 0.014
Cluster 3: 75.7 ± 16.0 | 101.8 ± 11.4 | 43.3 ± 13.1 | 37.1 ± 0.8 | 199.4 ± 48.5 | 4.8 ± 0.4 | 8.3 ± 2.2 | 99.5 ± 0.8 | 32.2 ± 4.9 | 6.2 ± 1.3 | 6.6 ± 0.1 | 5.6 ± 3.3 | 533.2 ± 28.0 | 0.012 ± 0.002
Cluster 4: 81.5 ± 12.0 | 105.3 ± 28.4 | 31.4 ± 10.3 | 36.7 ± 0.8 | 41.4 ± 24.9 | 6.7 ± 2.5 | 9.1 ± 2.1 | 99.0 ± 2.0 | 45.0 ± 13.9 | 7.3 ± 2.1 | 6.8 ± 2.2 | 10.6 ± 15.1 | 523.4 ± 32.9 | 0.014 ± 0.004
Cluster 5: 86.6 ± 16.9 | 100.6 ± 25.8 | 28.9 ± 11.2 | 36.4 ± 0.8 | 30.4 ± 11.5 | 7.8 ± 3.0 | 9.4 ± 1.8 | 98.4 ± 3.9 | 52.4 ± 18.5 | 7.4 ± 2.0 | 6.1 ± 2.0 | 12.4 ± 12.7 | 430.7 ± 30.0 | 0.017 ± 0.005
Cluster 6: 83.6 ± 12.7 | 104.2 ± 15.7 | 31.6 ± 10.9 | 37.2 ± 0.7 | 33.9 ± 14.6 | 6.7 ± 2.7 | 10.1 ± 1.9 | 99.1 ± 1.7 | 46.4 ± 15.0 | 5.4 ± 2.1 | 7.0 ± 2.7 | 12.1 ± 10.5 | 322.4 ± 39.4 | 0.017 ± 0.007
Cluster 7: 86.4 ± 2.5 | 88.9 ± 1.3 | 45.3 ± 7.3 | 37.8 ± 0.1 | 210.2 ± 62.9 | 4.2 ± 0.4 | 13.9 ± 1.4 | 96.0 ± 2.2 | 40.2 ± 0.3 | 1.1 ± 0.0 | 7.6 ± 0.1 | 8.5 ± 0.1 | 87.2 ± 0.6 | 0.013 ± 0.000
Cluster 8: 81.4 ± 13.2 | 115.4 ± 11.1 | 36.7 ± 7.9 | 36.6 ± 0.4 | 38.4 ± 22.8 | 6.6 ± 2.5 | 8.9 ± 1.3 | 99.0 ± 1.4 | 41.6 ± 13.1 | 10.7 ± 2.7 | 6.4 ± 2.6 | 14.6 ± 16.2 | 697.3 ± 60.0 | 0.015 ± 0.003
Cluster 9: 76.4 ± 12.7 | 109.4 ± 10.9 | 32.5 ± 8.1 | 36.9 ± 0.8 | 34.5 ± 14.5 | 5.7 ± 1.4 | 8.8 ± 1.6 | 99.0 ± 1.6 | 39.7 ± 9.2 | 11.8 ± 2.8 | 7.9 ± 4.8 | 11.6 ± 11.3 | 847.5 ± 51.8 | 0.013 ± 0.003
Cluster 10: 96.2 ± 25.2 | 103.7 ± 3.6 | 22.9 ± 3.0 | 36.8 ± 0.7 | 29.5 ± 5.4 | 5.0 ± 0.0 | 8.8 ± 1.5 | 98.6 ± 1.3 | 41.1 ± 8.5 | 12.3 ± 2.6 | 11.9 ± 2.9 | 30.4 ± 26.5 | 989.5 ± 48.3 | 0.012 ± 0.003
Probability of death, MOF and infection by cluster (C1 to C10 = Clusters 1 to 10):
            Baseline  C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
Death       0.11      0.02  0.13  0.00  0.20  0.17  0.06  0.00  0.13  0.00  0.00
MOF         0.47      0.31  0.50  0.00  0.52  0.63  0.35  0.00  0.56  0.55  0.84
Infection   0.73      0.67  0.78  0.00  0.61  0.67  0.75  1.00  0.56  0.55  0.84
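The slides do not say how the per-cluster probabilities of death, MOF and infection were computed. One plausible reading, sketched here purely as an assumption, is the share of observations assigned to each cluster that belong to a patient who went on to have the outcome; counting each patient once per cluster would be an equally reasonable alternative. The function name is hypothetical, and the "Baseline" column is not defined in the source, so it is not reproduced.

```python
from collections import defaultdict


def outcome_probability_by_cluster(labels, outcomes):
    """Share of observations in each cluster whose outcome flag is 1.

    labels[i] is the cluster of observation i; outcomes[i] is 1 if the
    corresponding patient had the outcome (e.g. death), else 0.
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [events, observations]
    for label, outcome in zip(labels, outcomes):
        counts[label][0] += outcome
        counts[label][1] += 1
    return {label: events / total for label, (events, total) in sorted(counts.items())}


print(outcome_probability_by_cluster([1, 1, 2, 2, 2, 3], [0, 1, 0, 0, 1, 0]))
```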
Cluster Assignment Over Time
Cluster Assignment and probability of death over time
Patient Movement Through State Space
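One simple way to summarize a patient's movement through state space, offered as an illustrative assumption rather than the method behind these slides, is a first-order Markov estimate of how often patients move from one cluster to another between consecutive time points. The function `transition_probabilities` is hypothetical; sequences are kept per patient so that no spurious transitions are counted between the end of one patient's record and the start of the next.

```python
from collections import Counter


def transition_probabilities(sequences):
    """First-order Markov estimate of P(next cluster | current cluster).

    Each element of `sequences` is one patient's cluster assignments in time
    order; transitions are only counted within a patient.
    """
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))  # (from, to) counts for this patient
    leaving = Counter()
    for (src, _), n in pairs.items():
        leaving[src] += n                 # total transitions out of each cluster
    return {(src, dst): n / leaving[src] for (src, dst), n in pairs.items()}


print(transition_probabilities([[1, 1, 2], [2, 2, 3]]))
```

Self-transitions such as (2, 2) capture how long patients tend to stay in a state; off-diagonal entries capture the movements between states.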
The next step…better inference from our data.
What’s wrong with the statistics in our literature?
Super Learner Algorithm
Statistical Learning Methods for Traumatic (Complex, Messy) Data
Alan Hubbard (UC Berkeley); Mitch Cohen (UCSF); I. Diaz, A. Decker, M. Kutcher (UCSF)
Super Learner: Prediction
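The slides name the Super Learner without showing how it works. In broad terms, the Super Learner uses cross-validation to evaluate a library of candidate prediction algorithms and then fits an optimally weighted combination of them. The sketch below shows only the simplest version, often called the discrete Super Learner, which picks the single candidate with the lowest V-fold cross-validated mean squared error and refits it on all the data. The two toy learners (`fit_mean`, `fit_linear`) and `discrete_super_learner` are hypothetical illustrations, not the authors' code or any package's API.

```python
import random
import statistics


def fit_mean(xs, ys):
    # Candidate 1: always predict the training mean.
    m = statistics.fmean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    # Candidate 2: ordinary least squares for y = a + b*x.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def discrete_super_learner(xs, ys, learners, v=5, seed=0):
    """Select the candidate learner with the lowest V-fold cross-validated MSE."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::v] for f in range(v)]
    cv_mse = {}
    for name, fit in learners.items():
        sq_err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            sq_err += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
        cv_mse[name] = sq_err / len(xs)
    best = min(cv_mse, key=cv_mse.get)
    return best, learners[best](xs, ys), cv_mse  # refit the winner on all data


xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
best, model, cv = discrete_super_learner(xs, ys, {"mean": fit_mean, "linear": fit_linear})
print(best, round(model(10.0), 3), cv)
```

Judging every candidate on held-out data is what guards against overfitting; the full Super Learner goes one step further and fits weights across the candidates' cross-validated predictions instead of keeping only the best one.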
Principal component analysis
• Pattern-finding
• Data reduction
• Correlated variables are decomposed into uncorrelated synthetic multivariables, or ‘principal components’ (PCs)
• Each successive PC explains as much of the remaining variance as possible while staying uncorrelated with all previous PCs
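The PCA bullets describe decomposing correlated variables into uncorrelated components ordered by the variance they explain. A minimal sketch in plain Python, assuming numeric rows and using power iteration with deflation on the sample covariance matrix (our choice of algorithm, not necessarily what any statistics package does). Note that the study in the Methods slide used non-linear PCA, which this linear sketch does not implement; `pca` and its parameters are hypothetical names.

```python
import math
import random


def pca(rows, n_components=2, n_iter=500, seed=0):
    """Leading principal components via power iteration on the covariance matrix.

    Returns (components, percent): unit-length direction vectors and the
    percentage of total variance each component explains.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # Sample covariance matrix of the centered variables.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]

    rng = random.Random(seed)
    components, percent = [], []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
        components.append(v)
        percent.append(100.0 * eigval / total_var)
        # Deflation: remove this component so the next one is orthogonal to it.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return components, percent


rows = [(float(x), 2.0 * x + (0.1 if x % 2 else -0.1)) for x in range(10)]
comps, pct = pca(rows)
print(comps[0], pct)
```

With two strongly correlated variables, almost all of the variance lands on PC1 and PC2 carries only the small residual scatter, which is the data-reduction idea in miniature.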
Principal components analysis
• Simple linear regression
[Figure: regression plots with axes labeled Factor V, Factor VIII and Factor II.]
• Multiple linear regression
• Multivariate component space
[Figure: Factor II, Factor V and Factor VIII shown with principal component axes PC1 and PC2.]
Methods
• Prospective cohort study
• February 2005 to October 2010
• 163 critically injured trauma patients
• Factor activity levels drawn on admission:
  – Factors II, V, VII, VIII, IX, X
  – Anticoagulants: protein C, antithrombin III
  – Fibrinolysis marker: D-dimer
• Non-linear PCA modeling
PCA model construction
• Identify factor patterns
Component loadings (shaded on a scale from -1.0 to +1.0):
                PC1     PC2     PC3
% variance      43.91   13.45   10.12
Prothrombin     -0.86   -0.04    0.11
Factor V        -0.78    0.01   -0.11
Factor VII      -0.62    0.01    0.47
Factor VIII     -0.35    0.34   -0.73
Factor IX       -0.69    0.07    0.03
Factor X        -0.88   -0.01    0.20
D-dimer          0.25    0.80    0.00
aPC              0.20    0.74    0.39
Protein C       -0.80    0.11   -0.05
AT III          -0.74    0.16   -0.17
PCA model: PC1
• 43.91% of variance
• Negative correlation:
  – All numbered factors
  – Anticoagulants protein C (PC) and antithrombin III (AT III)
• Depletion coagulopathy
PC1 loadings (scale -1.0 to +1.0): Prothrombin -0.86, Factor V -0.78, Factor VII -0.62, Factor VIII -0.35, Factor IX -0.69, Factor X -0.88, D-dimer 0.25, aPC 0.20, Protein C -0.80, AT III -0.74
PCA model: PC2
• 13.45% of variance
• Positive correlation:
  – D-dimer and aPC
  – Factor VIII
• Fibrinolytic coagulopathy
PC2 loadings (scale -1.0 to +1.0): Prothrombin -0.04, Factor V 0.01, Factor VII 0.01, Factor VIII 0.34, Factor IX 0.07, Factor X -0.01, D-dimer 0.80, aPC 0.74, Protein C 0.11, AT III 0.16
PCA model: outcomes
Odds ratios by principal component:
                     PC1    PC2    PC3
Mortality            1.48   1.62   -
Multiorgan failure   -      1.83   -
Acute lung injury    -      2.24   -
VAP                  -      1.59   -
INR ≥ 1.3            4.68   -      -
PTT ≥ 30             3.10   -      1.44
Summary
• Principal component scores generated from this model correlate with outcomes.
• PC1 describes coagulopathic bleeding and mortality.
• PC2 describes coagulopathy associated with infectious and inflammatory outcomes.
• These two components are independent and not identifiable using ‘traditional statistics’.
The next step… In silico
Network Topology
Model Kinetic Equations
• Rate constants aggregated from literature ranges known in 2002.
• Initial conditions specify mean plasma concentrations for the proteins, with tissue factor (TF) as the varied input.
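The kinetic-model slide describes ODEs built from rate constants and initial plasma concentrations, with TF varied. The full model is not reproduced in the source, so as a hedged illustration only, here is mass-action kinetics for a single hypothetical reversible binding A + B ⇌ C with made-up rate constants; sweeping the initial amount of A mimics varying one initial condition in the way TF is varied. The species, rate values and the name `simulate_binding` are all hypothetical, and forward Euler is used for brevity.

```python
def simulate_binding(a0, b0, kon, koff, t_end, dt=1e-3):
    """Mass-action kinetics for a hypothetical reversible binding A + B <-> C.

    d[C]/dt = kon*[A]*[B] - koff*[C] and d[A]/dt = d[B]/dt = -d[C]/dt,
    integrated with forward Euler from [A] = a0, [B] = b0, [C] = 0.
    """
    a, b, c = a0, b0, 0.0
    for _ in range(round(t_end / dt)):
        flux = kon * a * b - koff * c  # net forward reaction rate
        a, b, c = a - flux * dt, b - flux * dt, c + flux * dt
    return a, b, c


# Sweep the initial amount of A, as one initial condition (e.g. TF) might be varied.
for a0 in (0.5, 1.0, 2.0):
    a, b, c = simulate_binding(a0, b0=1.0, kon=1.0, koff=0.5, t_end=50.0)
    print(f"A0={a0}: final C={c:.3f}")
```

A real coagulation model couples dozens of such equations, one per species, but each term has this same mass-action form: a rate constant times the concentrations of the reacting species.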
Patient Movement Through State Space
Systems Biology Approaches
• Allow assessment of a patient’s physiologic and biologic state.
• Can solve the curse of dimensionality.
• Can learn?
• Model and harvest the gestalt of the astute clinician.
• Are the future in the era of big data.
?