Bioinformatics Fox Chase Cancer Center
Identifying Changes in Signaling from High-Throughput Data
Michael Ochs
Fox Chase Cancer Center
The “New” Paradigm
Personalized Medicine
Targeted Therapies
[Figure: overall survival (years, 0 to 10) for Group 1 and Group 2 patients]
Your Chromosomes Here
Outline
• Signaling and Gene Expression
• Bayesian Decomposition
• Examples of Analyses
Cellular Signaling
Extracellular Signal
Signal Transduction
Metabolic Changes
Transcription
Downward, Nature, 411, 759, 2001
Gene Expression
Identifying Pathways
[Figure: pathway diagram with nodes M, F, and H and nodes A, B, C, D, and E]
Goal of Analysis
Take measurements of thousands of genes, some of which are responding to stimuli of interest,
[Figure: measured genes, with those responding to stimuli 1, 2, and 3 marked]
then identify the pathways
and find the correct set of basis vectors that link to pathways
Biological Model
Block Protein-Protein Interaction
Leads to Loss of Some Transcripts, Reduction of Others Depending on Active Signaling Pathways
But the Gene Lists are Incomplete as are the Network Diagrams!
Issues to Solve
• Overlapping Signals
– Genes are involved in multiple processes
– Various processes are active simultaneously in any observed data
• Identification of Process Behind Signal
– If we find a signal, what is the cause?
– Do identification without a complete model
Outline
• Signaling and Gene Expression
• Bayesian Decomposition
• Examples of Analyses
Data
(Spellman et al, Mol Biol Cell, 9, 3273, 1998)
(Cho et al, Mol Cell, 2, 65, 1998)
BD: Identification of Signals
[Figure: the data matrix (labeled "vs Mock") shown as the product of two smaller matrices]
Data (gene 1 to gene N by condition 1 to condition M)
=
Distribution of Patterns (gene 1 to gene N by pattern 1 to pattern k)
X
Patterns of Behavior (pattern 1 to pattern k by condition 1 to condition M)
Complex behavior is explained as combinations of simpler behaviors
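The factorization on this slide, a genes-by-conditions data matrix written as a genes-by-patterns matrix times a patterns-by-conditions matrix, can be sketched in Python. This is a minimal illustration assuming NumPy and non-negative data. It uses plain multiplicative-update non-negative factorization as a stand-in and is not Bayesian Decomposition itself. The names `factorize`, `A`, `P`, and `k` are hypothetical.

```python
import numpy as np

def factorize(D, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate a non-negative matrix D (genes x conditions) as A @ P.

    A is genes x patterns and P is patterns x conditions. Multiplicative
    updates are used here only as a simple stand-in for illustration.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_conditions = D.shape
    A = rng.random((n_genes, k)) + eps
    P = rng.random((k, n_conditions)) + eps
    for _ in range(n_iter):
        # Each update keeps entries non-negative and does not increase
        # the squared reconstruction error.
        P *= (A.T @ D) / (A.T @ A @ P + eps)
        A *= (D @ P.T) / (A @ (P @ P.T) + eps)
    return A, P

# Toy data (made up): 50 genes, 8 conditions, built from 3 hidden patterns.
rng = np.random.default_rng(1)
true_A = rng.random((50, 3))
true_P = rng.random((3, 8))
D = true_A @ true_P

A, P = factorize(D, k=3)
rel_error = np.linalg.norm(D - A @ P) / np.linalg.norm(D)
```

Each row of `P` plays the role of one pattern of behavior across conditions, and each row of `A` says how strongly a gene participates in each pattern, so a gene can belong to several patterns at once.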
Markov Chain Monte Carlo
Markov Chain Monte Carlo is used to explore the possible solutions
We cannot always solve the problem directly; we can only estimate the relative probabilities of possible solutions
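To illustrate how a Markov chain can explore solutions using only relative probabilities, here is a minimal random-walk Metropolis sampler in Python. The one-dimensional toy target, the step size, and the function names are illustrative assumptions and are not the sampler used in Bayesian Decomposition.

```python
import math
import random

def log_target(x):
    """Unnormalized log density of a standard normal (toy target)."""
    return -0.5 * x * x

def metropolis(log_p, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis: only ratios of probabilities are needed."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_p(proposal) - log_p(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(log_target, n_samples=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The normalizing constant of the target never appears: the chain visits each region in proportion to its probability, so sample averages estimate properties of the distribution.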
Bayesian Statistics
\[ p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})} \]
[Figure: model = (gene 1 to gene N by pattern 1 to pattern k) X (pattern 1 to pattern k by condition 1 to condition M); data = gene 1 to gene N by condition 1 to condition M]
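A small worked instance of Bayes' rule, sketched in Python, may make the formula concrete. It compares two hypothetical candidate models; the likelihoods, priors, and names are made-up values for illustration only.

```python
def posterior(likelihoods, priors):
    """Return p(model | data) for each model via Bayes' rule."""
    # Numerator of Bayes' rule: p(data | model) * p(model).
    joint = {m: likelihoods[m] * priors[m] for m in priors}
    # Denominator: p(data), the sum of the numerators over all models.
    evidence = sum(joint.values())
    return {m: joint[m] / evidence for m in joint}

# Made-up numbers: the data are four times as likely under model_b.
likelihoods = {"model_a": 0.02, "model_b": 0.08}
priors = {"model_a": 0.5, "model_b": 0.5}
post = posterior(likelihoods, priors)
```

The ratio of two posteriors does not depend on p(data), so relative probabilities of candidate solutions can be compared without ever computing it.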
Outline
• Signaling and Gene Expression
• Bayesian Decomposition
• Examples of Analyses
Acknowledgements
• Tom Moloshok (Cell Cycle, Mouse)
• Ghislain Bidaut (Yeast Deletion Mutants)
• Andrew Kossenkov (TFs, YDMs)
• Bill Speier, DJ Datta, Daniel Chung, Ryan Goldstein, Matt Lewandowski
Cell Cycle
Tobin and Morel, Asking About Cells, Harcourt Brace, 1997
Data
• Data: Expression data for 788 yeast cell-cycle regulated genes [Cho, 1998] across 17 time points were taken for analysis.
• Coregulation: 11 groups of 5 to 17 genes each (67 genes in total, 18 of which belong to more than one group) were assembled from a literature review (not cell cycle literature).
• Analysis: with and without coregulation information
Validation
Cherepinsky et al, PNAS, 100, 9668, 2003
ROC Analysis
ROC: Receiver Operator Characteristic
\[ \text{Sensitivity} = \frac{TP}{TP + FN} = \frac{1}{1 + FN/TP} \]
\[ \text{Specificity} = \frac{TN}{TN + FP} = \frac{1}{1 + FP/TN} \]
TP: true positive; TN: true negative; FP: false positive; FN: false negative
Sensitivity: fraction of actual positives that are called positive
Specificity: fraction of actual negatives that are called negative
[Figure: ROC curves, sensitivity versus 1 - specificity, both axes from 0 to 1, with curves labeled almost perfect, good, and worthless]
Area under the curve is the measurement of algorithm efficacy
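The ROC construction can be sketched in a few lines of Python: rank the calls by score, sweep a threshold, record (1 - specificity, sensitivity) at each step, and integrate with the trapezoidal rule. The scores and labels below are made-up toy values, and the function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs from high to low threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy example: 1 marks a true member of a group, 0 a non-member.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

An area of 1.0 corresponds to the "almost perfect" end of the plot, and 0.5 to the "worthless" diagonal.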
Hierarchical Clustering
ROC Curve
Cherepinsky et al, PNAS, 100, 9668, 2003
Bayesian Decomposition
[Figure: ROC curve, sensitivity versus 1 - specificity]
Deletion Mutant Data Set
• 300 Deletion Mutants in S. cerevisiae
– Biological/Technical Replicates with Gene Specific Error Model
– Filter Genes
  • >25% Data Missing in Ratios or Uncertainties
  • < 2 Experiments with 3 Fold Change
– Filter Experiments
  • < 2 Genes Changing by 3 Fold
228 Experiments/764 Genes
(Hughes et al, Cell, 102, 109, 2000)
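The filtering rules on this slide can be sketched in Python as below. This is a minimal illustration, not the original pipeline. It assumes NumPy, linear expression ratios with NaN marking missing values, and that genes are filtered before experiments. The function name, argument names, and toy matrix are hypothetical.

```python
import numpy as np

def filter_data(ratios, uncertainties, max_missing=0.25, fold=3.0, min_count=2):
    """Return boolean masks (keep_genes, keep_experiments).

    ratios and uncertainties are genes x experiments; NaN marks missing data.
    Assumes linear ratios, so a 3 fold change means ratio >= 3 or <= 1/3.
    """
    missing = np.isnan(ratios) | np.isnan(uncertainties)
    with np.errstate(invalid="ignore"):
        changed = np.abs(np.log(ratios)) >= np.log(fold)
    changed &= ~missing
    # Drop genes with >25% missing data or < 2 experiments with a 3 fold change.
    keep_genes = (missing.mean(axis=1) <= max_missing) & (changed.sum(axis=1) >= min_count)
    # Drop experiments with < 2 retained genes changing by 3 fold.
    keep_experiments = changed[keep_genes].sum(axis=0) >= min_count
    return keep_genes, keep_experiments

# Toy matrix (made up): 4 genes x 4 experiments.
ratios = np.array([
    [3.5, 0.2, 1.0, 1.1],        # changes in two experiments: kept
    [4.0, 0.25, 1.2, 0.9],       # changes in two experiments: kept
    [1.0, 1.1, 0.9, 1.2],        # never changes 3 fold: dropped
    [np.nan, np.nan, 5.0, 6.0],  # 50% missing: dropped
])
uncertainties = np.full(ratios.shape, 0.1)
keep_genes, keep_experiments = filter_data(ratios, uncertainties)
filtered = ratios[keep_genes][:, keep_experiments]
```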
BD: Matrix Decomposition
Data (gene 1 to gene N by Mutant 1 to Mutant M)
=
Distribution of Patterns (what genes are in patterns): gene 1 to gene N by pattern 1 to pattern k
X
Patterns of Behavior (does mutant contain pattern): pattern 1 to pattern k by Mutant 1 to Mutant M
Analysis
• Bayesian Decomposition
– Identify patterns and linked genes
– Use genes to determine function
• Interpretation of Functions
– Gene Ontology
– Transcription factor data
• Validation
Use of Ontology: Pattern 13
[Figure: panels labeled 13 and 15]
The Other Pattern: 15
[Figure: panels labeled 13 and 15]
Transcription Factors
Signaling Pathways
to Transcription Factors
to mRNA Changes
Genes from Pattern 13
*Fig1, *Prm6, *Fus1, *Ste2, *Aga1, *Fus3, Pes4, *Prm1, ORF, *Bar1
* known to be involved in mating response
known to be regulated by Ste12p
Validation
(Posas, et al, Curr Opin Microbiology, 1, 175, 1998)
Amount of Behavior Explained by Mating Pathway for Mutants
Pattern 13 Mutants
Pattern 15 Mutants
Conclusions
• Transcriptional Response Provides Signatures of Pathway Activity
• Ontologies Can Guide Interpretation
• Bayesian Decomposition Can Dissect Strongly Overlapping Signatures
Acknowledgements
Tom Moloshok, Jeffrey Grant, Yue Zhang, Elizabeth Goralczyk, Liat Shimoni, Luke Somers (UPenn), Olga Tchuvatkina, Michael Slifker, Sinoula Apostolou, Brendan Reilly
Collaborators
A. Godwin (FCCC), A. Favorov (GosNIIGenetika), J.-M. Claverie (CNRS), G. Parmigiani (JHU), O. Favorova (RMSU)
Ghislain Bidaut (UPenn CBIL), Andrew Kossenkov, Vladimir Minayev (MPEI), Garo Toby (Dana Farber), Yan Zhou, Aidan Petersen
Bill Speier (Johns Hopkins), Daniel Chung (Columbia), DJ Datta (UCSF), Elizabeth Faulkner (UPenn)
Frank Manion, Bob Beck
Fox Chase
Patterns as Basis Vectors
PCA
BD
Fuzzy Clustering
Making Proteins
(Phenotype)
ROSETTA DATA
• From 5 to 20 patterns were posited in the analysis.
• Results were checked against metabolic pathway information from the Saccharomyces Genome Database: 11 groups of 4 to 6 genes known to be involved in the same metabolic pathways.
• ROC analysis was performed
ROSETTA DATA
[Figure: area under ROC (axis from 0.64 to 0.76) versus number of patterns (9 to 20), WITH and WITHOUT coregulation info]