Florian Markowetz
Probabilistic refinement of cellular pathway models
Cambridge Statistical Laboratory Networks seminar series
2009 Jan 21
What is a signaling pathway?
DNA
mRNA
Protein
Environmental stimuli
Receptor in cell membrane
Protein cascade
Transcription factors regulating target genes
Pathway
Pathway reconstruction
Signaling pathways are important
- Deregulation causes many diseases incl. cancer
Signaling pathways are poorly understood
- Only parts-lists
- Missing are interactions within and between pathways
Biological research
- So far mostly focused on individual genes
New genome-scale datasets
- Opportunity for data integration and novel methods
What data do we have?
DNA
mRNA
Protein
mRNA:
- Expression under different stimuli
- binding to DNA
Sequence:
- binding motifs
- epigenetic marks
Proteins:
- interactions between proteins
- binding to DNA
Morphology
Bulk of data: Microarray
Pathways as graphs
• Nodes are (mostly) known
• Goal: infer edges from data
• Data are heterogeneous
Nodes
• binding motifs at genes
• Protein domains
• Functional annotation
Edges
• co-expression between genes
• interactions between proteins
• binding of proteins to DNA
Paths
• Cause-effect data: changing environments, experimental perturbations
Pathway reconstruction
“Classical” statistical approaches: treat the genes/proteins as random variables and explore correlation structure in the data:
– Correlation graphs
– Gaussian graphical models (partial correlation)
– Bayesian networks
Review: Markowetz and Spang (2007)
Challenges/Problems/Opportunities
1. Correlation may be uninformative
2. Integrate heterogeneous, noisy, and complementary data sources
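The difference between a correlation graph and a Gaussian graphical model can be illustrated on three variables. The sketch below is only a toy, not part of the talk: the data are made up so that x and y are both driven by a common regulator z, and the first-order partial correlation formula is used in place of a full GGM fit.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical toy data: z is a common regulator, x and y follow z
# plus small perturbations that are uncorrelated with z and each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
e1 = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
e2 = [0.5, 0.5, -0.5, -0.5, -0.5, -0.5, 0.5, 0.5]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

r_xy = pearson(x, y)             # high: x and y look connected
r_xy_given_z = partial_corr(x, y, z)  # near zero: no direct x-y link
```

A correlation graph would draw an edge between x and y here, while the partial correlation removes it once z is accounted for.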
– Part 1 –
Nested Effects Models
Experimental perturbations
DNA
mRNA
Protein
RNAi
Knockout
Drugs
Small molecules
Stress
Readout: Global gene expression measurements
Drosophila immune response
Columns: perturbed genes
Rows: effects on other genes
1. Silencing tak1 reduces expression of all LPS-inducible transcripts
2. Silencing rel (key) or mkk4/hep reduces expression of subsets of induced transcripts
(Boutros et al, Dev Cell 2002)
(!) Two types of entities
Components of signaling pathway which are experimentally perturbed
Downstream effect reporters
(!!) Only indirect information
No direct observation of perturbation effects on other pathway components!
Inference from observed perturbation effects on downstream reporters.
The information gap
[Figure: pathway with components A, B, C, D; observable readouts: cell survival or death, growth rate, downstream genes]
Direct information: effects are visible at other pathway components
Indirect information: effects are only visible at downstream reporters
Correlation won’t do
[Figure: pathway with components A, B, C, D and downstream regulated genes]
“Classical” approach: Correlation, Graphical models (Bayes Nets, GGMs), Mutual Information
Nested Effects Models
Nested Effects Models
[Figure: phenotypic profiles (rows: gene perturbations A to H; columns: effects) and the inferred pathway over A to H]
INPUT
1. Set of candidate pathway genes
2. High-dimensional phenotypic profile, e.g. microarray
OUTPUT
Graph representation of information flow explaining the phenotypes
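The core idea, reading pathway structure off nested sets of effects, can be sketched in a few lines. This is a noise-free idealization, not the probabilistic model of the talk; the function name `infer_edges` and the effect sets below are hypothetical.

```python
def infer_edges(effects):
    """Draw an edge A -> B whenever the effects seen after perturbing B
    are a subset of the effects seen after perturbing A.

    Noise-free idealization: real data need the probabilistic scoring.
    Note that a gene with no observed effects would end up downstream
    of every other gene under this rule.
    """
    edges = set()
    for a, effects_a in effects.items():
        for b, effects_b in effects.items():
            if a != b and effects_b <= effects_a:
                edges.add((a, b))
    return edges

# Hypothetical phenotypic profiles: reporters affected by each perturbation.
effects = {
    "X": {"E1", "E2", "E3", "E4", "E5", "E6"},
    "Y": {"E3", "E4", "E5", "E6"},
    "Z": {"E5", "E6"},
}

pathway = infer_edges(effects)
```

With these nested sets the result is the transitively closed chain X -> Y -> Z, including the shortcut X -> Z.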
NEM: model formulation
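[Figure: example model M over pathway genes X, Y, Z with effect reporters E1 to E6; expected and observed effect matrices (rows: X, Y, Z; columns: E1 to E6), with false negatives (FN) and false positives (FP) marked in the observed matrix]
Pathway genes: X, Y, Z
• core topology
• to be reconstructed
= Model M
Effect reporters: E1, …, E6
• states are observed = Data D
• positions in pathway unknown = Parameters θ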
Posterior:
$P(M \mid D) = \frac{1}{Z} \cdot P(D \mid M) \cdot P(M)$
where $P(D \mid M)$ is the marginal likelihood
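Likelihood $P(D \mid M, \theta)$
Compare predictions with observations:
[Figure: pathway genes X, Y, Z with effect reporters E1, E2]
Prediction: E1 = 0, E2 = 1
Observation 1: E1 = 1, E2 = 1
Observation 2: E1 = 0, E2 = 1
Error probabilities, e.g. false NEG rate 20%, false POS rate 5%
$\mathrm{Lik} = \Pr(E_1 = 1) \cdot \Pr(E_2 = 1) \cdot \Pr(E_1 = 0) \cdot \Pr(E_2 = 1) = 0.05 \cdot 0.80 \cdot 0.95 \cdot 0.80$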
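The comparison of predictions with observations can be written out as a small calculation. This is a sketch under the error rates stated on the slide (false negative 20%, false positive 5%), assuming reporters and replicates are independent; the function names are illustrative, not from the talk.

```python
FALSE_POS = 0.05  # Pr(observe 1 | predicted 0)
FALSE_NEG = 0.20  # Pr(observe 0 | predicted 1)

def reporter_prob(observed, predicted):
    """Probability of one observed reporter state given its predicted state."""
    if predicted == 1:
        return 1 - FALSE_NEG if observed == 1 else FALSE_NEG
    return FALSE_POS if observed == 1 else 1 - FALSE_POS

def likelihood(prediction, observations):
    """Product over replicate observations and over reporters."""
    lik = 1.0
    for obs in observations:
        for o, p in zip(obs, prediction):
            lik *= reporter_prob(o, p)
    return lik

prediction = (0, 1)               # predicted (E1, E2)
observations = [(1, 1), (0, 1)]   # two replicate observations of (E1, E2)

lik = likelihood(prediction, observations)
```

Under these assumptions the four factors are a false positive, a true positive, a true negative, and a true positive: 0.05 · 0.80 · 0.95 · 0.80 = 0.0304.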
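Marginal likelihood
$P(D \mid M) = \int P(D \mid M, \Theta) \, P(\Theta \mid M) \, d\Theta$
$= \frac{1}{n^m} \prod_{i=1}^{m} \sum_{j=1}^{n} \prod_{k=1}^{l} P(e_{ik} \mid M, \theta_i = j)$
• $\prod_{i=1}^{m}$: product over all effect reporters
• $\frac{1}{n} \sum_{j=1}^{n}$: average over possible positions in the pathway (uniform prior over positions)
• $\prod_{k=1}^{l}$: product over replicate observations
• $P(e_{ik} \mid M, \theta_i = j)$: distribution of a single effect reporter with known position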
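The marginal likelihood can be evaluated directly for small models by averaging each reporter over its possible attachment points. The sketch below assumes binary data, the error rates from the likelihood slide, and a model given as a transitively closed set of directed edges; the data layout (a list of `(perturbed_gene, observed_states)` pairs) and all names are hypothetical choices for illustration.

```python
def reporter_prob(observed, predicted, fp=0.05, fn=0.20):
    """Probability of an observed reporter state given the predicted state."""
    if predicted == 1:
        return 1 - fn if observed == 1 else fn
    return fp if observed == 1 else 1 - fp

def predicted_effect(model, perturbed, position):
    """A reporter attached at `position` responds if the perturbed gene
    is that position or lies upstream of it in the (closed) model."""
    return 1 if perturbed == position or (perturbed, position) in model else 0

def marginal_likelihood(model, genes, experiments):
    reporters = sorted(experiments[0][1])
    total = 1.0
    for rep in reporters:                        # product over effect reporters
        avg = 0.0
        for pos in genes:                        # average over positions
            prod = 1.0
            for perturbed, obs in experiments:   # product over observations
                pred = predicted_effect(model, perturbed, pos)
                prod *= reporter_prob(obs[rep], pred)
            avg += prod / len(genes)             # uniform prior 1/n
        total *= avg
    return total

# Hypothetical noise-free data generated from the chain X -> Y -> Z.
genes = ["X", "Y", "Z"]
experiments = [
    ("X", {"E1": 1, "E2": 1, "E3": 1}),
    ("Y", {"E1": 0, "E2": 1, "E3": 1}),
    ("Z", {"E1": 0, "E2": 0, "E3": 1}),
]
chain = {("X", "Y"), ("X", "Z"), ("Y", "Z")}
reversed_chain = {("Z", "Y"), ("Z", "X"), ("Y", "X")}

score_chain = marginal_likelihood(chain, genes, experiments)
score_reversed = marginal_likelihood(reversed_chain, genes, experiments)
score_empty = marginal_likelihood(set(), genes, experiments)
```

On this toy data the generating chain scores higher than both the reversed chain and the empty graph.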
NEM: inference
Model space: all transitively closed directed graphs
Exhaustive enumeration: score all models to find the one fitting the data best
Markowetz et al. Bioinformatics, 2005
MCMC, Simulated Annealing: take small probabilistic steps to explore model space
. . . with A Tresch; in preparation
Divide and conquer: break a big model into smaller, manageable pieces and then re-assemble
Markowetz et al. ISMB 2007
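To see why exhaustive enumeration only works for small pathways, the model space can be listed by brute force. This sketch assumes self-loops are ignored and cycles are allowed, so a model is any edge set in which a -> b and b -> c (with a different from c) imply a -> c; the helper names are illustrative.

```python
from itertools import combinations, permutations

def is_transitive(edges):
    """True if a->b and b->c (with a != c) always implies a->c."""
    return all((a, d) in edges
               for (a, b) in edges
               for (c, d) in edges
               if b == c and a != d)

def transitive_models(genes):
    """Yield every transitively closed edge set over the given genes."""
    candidates = list(permutations(genes, 2))  # ordered pairs, no self-loops
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            edges = frozenset(subset)
            if is_transitive(edges):
                yield edges

models_2 = list(transitive_models(["X", "Y"]))
models_3 = list(transitive_models(["X", "Y", "Z"]))
```

Two genes give 4 models and three genes give 29; the count grows very quickly with the number of genes, which is what motivates the sampling and divide-and-conquer strategies.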
NEM: extensions
Drop transitivity requirement
Likelihood based on log-ratios of effects
Feature selection to concentrate on informative effect reporters
Tresch and Markowetz (2008)
NEMs on Drosophila data
Summary of part 1
1. Gene perturbation screens with gene-expression readouts
2. Perturbation screens suffer from the information gap between pathways and reporters
3. Nested Effects Models reconstruct pathway features from subset relations between observed effects
– Part 2 –
Data integration and probabilistic refinement of a signaling pathway hypothesis
Pathway refinement
1. Start from given pathway hypothesis
Even if our understanding of pathways is poor, that does not mean we have none at all!
2. Evaluate evidence for hypothesis in data
3. Identify weakly supported areas and likely extensions
Not reconstruction from scratch.
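The three refinement steps can be mimicked with a simple counting heuristic. This is only a sketch to fix ideas, not the probabilistic model developed in this part: the gene names, data sources, and thresholds below are all made up, and edges are treated as ordered pairs for simplicity.

```python
def edge_support(hypothesis, evidence):
    """Map each hypothesized edge to the data sources that contain it."""
    return {edge: sorted(src for src, edges in evidence.items() if edge in edges)
            for edge in hypothesis}

def weakly_supported(support, min_sources=2):
    """Hypothesis edges backed by fewer than `min_sources` data sources."""
    return sorted(e for e, srcs in support.items() if len(srcs) < min_sources)

def candidate_extensions(hypothesis, evidence, min_sources=2):
    """Edges outside the hypothesis seen in at least `min_sources` sources."""
    counts = {}
    for edges in evidence.values():
        for e in edges - set(hypothesis):
            counts[e] = counts.get(e, 0) + 1
    return sorted(e for e, c in counts.items() if c >= min_sources)

# Step 1: hypothetical pathway hypothesis A - B - C - D.
hypothesis = [("A", "B"), ("B", "C"), ("C", "D")]

# Step 2: hypothetical evidence from three data sources.
evidence = {
    "ppi": {("A", "B"), ("B", "C"), ("B", "E")},
    "coexpression": {("C", "D"), ("B", "E")},
    "tf_binding": {("C", "D")},
}

# Step 3: weakly supported areas and likely extensions.
support = edge_support(hypothesis, evidence)
weak = weakly_supported(support)
extensions = candidate_extensions(hypothesis, evidence)
```

In this toy setup the edges A-B and B-C are each supported by a single source, and B-E is proposed as an extension because two sources agree on it.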
Step 1: assemble pathway hypothesis (KEGG, literature, …) for pheromone response pathway in yeast
Edge data I
Support for hypothesis in protein-protein interaction data
Edge data II
Support for hypothesis in co-expression data
Edge data III
Why is it so hard to reconstruct nuclear regulatory network from correlations?
Edge data IV
Support for hypothesis in TF-DNA binding data
Paths: cause-effect data
Expression profiling of knock-out mutants (Hughes et al., 2000)
Result: transcriptional response to perturbation only visible on downstream genes (information gap!)
Conclusion from data analysis
• Every data source is informative for a specific compartment of the pathway
• No data source is informative in all compartments
• We expect these observations also to hold for other MAPK and signaling pathways.
Need compartment-specific integrative model encompassing edge, node, and path data.
Integrative model
Pathway graph as hidden/latent variables
Conditional distributions for each data type
Different data types contribute to each compartment
Graphical model defines posterior P(G | data) -> inference by Gibbs sampler
Parameters
Prior
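A heavily simplified version of this setup shows the mechanics of sampling a hidden pathway graph. The sketch assumes binary edge indicators with a common prior and one binary observation per edge and data type, each with its own hit and false-alarm rate; every number and name below is hypothetical. In this toy the edges are independent given the data, so each full conditional does not depend on the other edges and the sampler's estimates can be checked against the exact posterior. The actual model couples data types through pathway compartments.

```python
import random

def conditional_p1(edge, obs, prior, sens, fpr):
    """Full conditional P(g_edge = 1 | data). Other edges do not enter
    in this toy because edges are independent given the data."""
    w1, w0 = prior, 1.0 - prior
    for dtype, x in obs[edge].items():
        w1 *= sens[dtype] if x else 1.0 - sens[dtype]
        w0 *= fpr[dtype] if x else 1.0 - fpr[dtype]
    return w1 / (w1 + w0)

def gibbs(obs, prior, sens, fpr, sweeps=20000, seed=0):
    """Resample each edge indicator in turn from its full conditional
    and return the fraction of sweeps in which each edge was present."""
    rng = random.Random(seed)
    state = {e: 0 for e in obs}
    counts = {e: 0 for e in obs}
    for _ in range(sweeps):
        for e in obs:
            p1 = conditional_p1(e, obs, prior, sens, fpr)
            state[e] = 1 if rng.random() < p1 else 0
            counts[e] += state[e]
    return {e: counts[e] / sweeps for e in obs}

# Hypothetical observations: 1 = evidence for the edge in that data type.
obs = {
    ("A", "B"): {"ppi": 1, "coexpr": 1},
    ("B", "C"): {"ppi": 0, "coexpr": 1},
    ("C", "D"): {"ppi": 0, "coexpr": 0},
}
prior = 0.5
sens = {"ppi": 0.7, "coexpr": 0.6}  # P(evidence | edge present)
fpr = {"ppi": 0.1, "coexpr": 0.3}   # P(evidence | edge absent)

estimates = gibbs(obs, prior, sens, fpr)
```

Edges with consistent evidence across data types receive high posterior probability, while edges with no support fall well below the prior.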
Evaluation
1. Fit model parameters on pheromone response pathway (training)
2. Use fitted model on other MAPK pathways (generalization to closely related examples)
3. Use fitted model on all other yeast signaling pathways (generalization to everything else)
… work in progress …
Acknowledgements
Nested Effects Models
Rainer Spang (Univ. Regensburg) .:. Dennis Kostka (UCSF) .:. Achim Tresch (Gene Center Munich) .:. Holger Fröhlich (DKFZ Heidelberg) .:. Tim Beißbarth (Univ. Göttingen) .:. Josh Stuart, Charlie Vaske (UCSC)
Data integration
Olga G. Troyanskaya (Princeton) .:. Edoardo Airoldi (Harvard) .:. David Blei (Princeton)