Date post: | 05-Jan-2016 |
Upload: | lynne-phillips |
Problem
• Limited number of experimental replications.
• Postgenomic data intrinsically noisy.
• Poor network reconstruction.
• Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?
• Which sources of prior knowledge are reliable?
• How do we trade off the different sources of prior knowledge against each other and against the data?
Overview of the talk
• Revision: Bayesian networks
• Integration of prior knowledge
• Empirical evaluation
Bayesian networks
[Figure: example DAG over the nodes A, B, C, D, E and F, with the nodes and edges labelled]
• Marriage between graph theory and probability theory.
• Directed acyclic graph (DAG) representing conditional independence relations.
• It is possible to score a network in light of the data: P(D|M), D: data, M: network structure.
• We can infer how well a particular network explains the observed data.
P(A,B,C,D,E,F) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D)
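The factorisation of the example network can be sketched in code. This is a minimal illustration, assuming binary variables and made-up conditional probabilities; the names `PARENTS`, `p_true` and `joint` are hypothetical and only the parent sets come from the slide.

```python
import itertools

# Parent sets read off the factorisation
# P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D).
PARENTS = {
    "A": (),
    "B": ("A",),
    "C": ("A",),
    "D": ("B", "C"),
    "E": ("D",),
    "F": ("C", "D"),
}


def p_true(parent_values):
    """Hypothetical P(node = 1 | parents); the numbers are invented."""
    if not parent_values:
        return 0.3
    return 0.2 + 0.6 * sum(parent_values) / len(parent_values)


def joint(assignment):
    """Joint probability of one full 0/1 assignment via the DAG factorisation."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = p_true(tuple(assignment[p] for p in parents))
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


# The local terms multiply to a proper distribution: all 64 states sum to 1.
total = sum(
    joint(dict(zip("ABCDEF", values)))
    for values in itertools.product((0, 1), repeat=6)
)
```

Each node only needs a table indexed by its own parents, which is why the factorised form is much cheaper to store than the full joint table.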
Bayesian networks versus causal networks
Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.
[Figure: two graphs over the nodes A, B and C. One panel shows the true causal graph, the other the case where node A is unknown.]
[Figure: network over the nodes A, B and C]
• Equivalence classes: networks with the same scores: P(D|M).
• Equivalent networks cannot be distinguished in light of the data.
[Figure: three networks over the nodes A, B and C]
Symmetry breaking
[Figure: network over the nodes A, B and C]
Prior knowledge
[Figure: three networks over the nodes A, B and C]
P(M|D) = P(D|M) P(M) / Z
D: data. M: network structure
P(D|M)
Prior knowledge:
B is a transcription factor with binding sites in the upstream regions of A and C
P(M)
P(M|D) ~ P(D|M) P(M)
Learning Bayesian networks
P(M|D) = P(D|M) P(M) / Z
M: Network structure. D: Data
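As a small illustration of how the prior breaks the symmetry between equivalent networks, the sketch below normalises P(D|M) P(M) over a handful of candidate structures. The three log-likelihoods and the prior values are invented for the example, and `posterior` is a hypothetical helper name.

```python
import math


def posterior(log_likelihoods, log_priors):
    """Return P(M|D) for each candidate M from log P(D|M) and log P(M)."""
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    shift = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_joint]
    z = sum(weights)  # plays the role of Z, up to the constant shift
    return [w / z for w in weights]


# Three equivalent structures: identical P(D|M), so the data cannot separate them.
log_lik = [-10.0, -10.0, -10.0]
# Made-up prior that favours the first structure.
log_prior = [math.log(0.5), math.log(0.25), math.log(0.25)]

post = posterior(log_lik, log_prior)
```

With identical likelihoods the posterior simply reproduces the prior, so any preference among equivalent networks comes from P(M) alone.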
Use TF binding motifs in promoter sequences
Biological prior knowledge matrix
• Each entry indicates some knowledge about the relationship between genes i and j.
• Define the energy of a graph G.
Notation
• Prior knowledge matrix: P → B (for “belief”)
• Network structure: G (for “graph”) or M (for “model”)
• P: probabilities
Prior distribution over networks
Energy of a network
Sample networks and hyperparameters from the posterior distribution
• Capture intrinsic inference uncertainty
• Learn the trade-off parameters automatically
Rewriting the energy
Approximation of the partition function
Partition function of a perfect gas
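The formulas for the energy, the prior and the partition function did not survive in this transcript. The sketch below is therefore an assumption, not the slides' definition: it takes the energy to be the summed absolute difference between the prior knowledge matrix B (entries in [0, 1]) and the binary adjacency matrix G, and a prior proportional to exp(-β E(G)). Under that assumption, ignoring the acyclicity constraint makes the partition function factorise over edges, much as for non-interacting particles. All function names are hypothetical.

```python
import itertools
import math


def energy(B, G):
    """Assumed energy: sum over i != j of |B[i][j] - G[i][j]|."""
    n = len(B)
    return sum(abs(B[i][j] - G[i][j]) for i in range(n) for j in range(n) if i != j)


def log_partition_factorised(B, beta):
    """log Z(beta) over all directed graphs, with acyclicity ignored.

    Each possible edge contributes independently: absent costs B[i][j],
    present costs 1 - B[i][j].
    """
    n = len(B)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                absent = math.exp(-beta * B[i][j])
                present = math.exp(-beta * (1.0 - B[i][j]))
                total += math.log(absent + present)
    return total


def log_partition_bruteforce(B, beta):
    """Same quantity by explicit enumeration; only feasible for tiny n."""
    n = len(B)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    z = 0.0
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        G = [[0] * n for _ in range(n)]
        for (i, j), bit in zip(pairs, bits):
            G[i][j] = bit
        z += math.exp(-beta * energy(B, G))
    return math.log(z)


def log_prior(B, G, beta):
    """log P(G | beta) under the assumed form, using the factorised Z."""
    return -beta * energy(B, G) - log_partition_factorised(B, beta)
```

With β = 0 every graph gets the same prior probability, so the prior knowledge is ignored; larger β penalises disagreement with B more strongly.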
Multiple sources of prior knowledge
MCMC sampling scheme
Sample networks and hyperparameters from the posterior distribution
Metropolis-Hastings scheme
Proposal probabilities
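The acceptance formula is not reproduced in this transcript, so here is a minimal sketch of the standard Metropolis-Hastings rule it refers to. The functions `mh_accept_prob` and `mh_step`, and the `propose` and `log_post` callables, are hypothetical names; in the setting of the talk the state would be a network together with its hyperparameters.

```python
import math
import random


def mh_accept_prob(log_post_new, log_post_old, log_q_forward=0.0, log_q_backward=0.0):
    """Acceptance probability min(1, posterior ratio * proposal ratio).

    log_q_forward:  log probability of proposing new from old
    log_q_backward: log probability of proposing old from new
    """
    log_ratio = (log_post_new - log_post_old) + (log_q_backward - log_q_forward)
    return math.exp(min(0.0, log_ratio))


def mh_step(state, log_post, propose, rng):
    """One Metropolis-Hastings update; returns the next state of the chain."""
    candidate, log_q_fwd, log_q_bwd = propose(state, rng)
    prob = mh_accept_prob(log_post(candidate), log_post(state), log_q_fwd, log_q_bwd)
    if rng.random() < prob:
        return candidate
    return state
```

The proposal probabilities matter whenever the move is not symmetric, for example when the number of valid edge operations differs between the current and the proposed network.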
Bayesian networks with biological prior knowledge
• Biological prior knowledge: information about the interactions between the nodes.
• We use two distinct sources of biological prior knowledge.
• Each source of biological prior knowledge is associated with its own trade-off parameter: β1 and β2.
• The trade-off parameter indicates how much biological prior information is used.
• The trade-off parameters are inferred. They are not set by the user!
Bayesian networks with two sources of prior
[Figure: the data and two sources of prior knowledge (Source 1 and Source 2, with trade-off parameters β1 and β2) feed into BNs + MCMC, which returns the recovered networks and trade-off parameters.]
Evaluation
• Can the method automatically evaluate how useful the different sources of prior knowledge are?
• Do we get an improvement in the regulatory network reconstruction?
• Is this improvement optimal?
Raf regulatory network
[Figure: Raf regulatory network, from Sachs et al., Science 2005]
Evaluation: Raf signalling pathway
• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells
• Deregulation → carcinogenesis
• Extensively studied in the literature → gold standard network
Data / Prior knowledge
Flow cytometry data
• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins
• 5400 cells have been measured under 9 different cellular conditions (cues)
• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
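The downsampling step could look like the sketch below. It assumes the five subsets are disjoint and drawn uniformly without replacement, which the slide does not state; `draw_subsets` and its parameters are hypothetical names.

```python
import random


def draw_subsets(n_cells, subset_size=100, n_subsets=5, seed=0):
    """Draw n_subsets disjoint index sets of subset_size cells each."""
    rng = random.Random(seed)
    # Sampling all indices at once without replacement keeps the subsets disjoint.
    chosen = rng.sample(range(n_cells), subset_size * n_subsets)
    return [
        sorted(chosen[k * subset_size:(k + 1) * subset_size])
        for k in range(n_subsets)
    ]


subsets = draw_subsets(5400)
```

Each index list would then select the rows of the 5400-cell measurement table that form one 100-instance data set.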
Microarray examples
• Spellman et al. (1998): cell cycle, 73 samples
• Tu et al. (2005): metabolic cycle, 36 samples
[Figure: two gene expression panels, genes against time]
Data / Prior knowledge
KEGG PATHWAYS are a collection of manually drawn pathway maps representing our knowledge of molecular interactions and reaction networks.
http://www.genome.jp/kegg/
Flow cytometry data and KEGG
Prior knowledge from KEGG
Prior distribution
The data and the priors
[Figure: the data combined with two priors, labelled "+ KEGG" and "+ Random"]
Sampled values of the hyperparameters
How can we evaluate the reconstruction accuracy?
Flow cytometry data and KEGG
Learning the trade-off hyperparameter
• Repeat MCMC simulations for a large set of fixed hyperparameters β
• Obtain AUC scores for each value of β
• Compare with the proposed scheme in which β is automatically inferred.
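An AUC score for one reconstruction can be sketched as follows, assuming AUC means the area under the ROC curve obtained by ranking candidate edges by their posterior probability against the gold standard network. The function name `auc` and the example numbers are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: posterior probability of each candidate edge
    labels: 1 if the edge is in the gold standard network, else 0
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    # Count how often a true edge outranks a non-edge; ties count one half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up edge scores: both true edges rank above both non-edges.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Repeating this for every fixed β gives an AUC curve over β, against which the AUC from the automatically inferred β can be compared.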
Mean and standard deviation of the sampled trade-off parameter
Conclusion
• Bayesian scheme for the systematic integration of different sources of biological prior knowledge.
• The method can automatically evaluate how useful the different sources of prior knowledge are.
• We get an improvement in the regulatory network reconstruction.
• This improvement is close to optimal.
Thank you