
Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor...

Date post: 05-Jan-2016
Upload: lynne-phillips
Page 1: Problem Limited number of experimental replications. Postgenomic data intrinsically noisy. Poor network reconstruction.

Problem

• Limited number of experimental replications.

• Postgenomic data intrinsically noisy.

• Poor network reconstruction.

• Can we improve the network reconstruction by systematically integrating different sources of biological prior knowledge?


• Which sources of prior knowledge are reliable?

• How do we trade off the different sources of prior knowledge against each other and against the data?


Overview of the talk

• Revision: Bayesian networks

• Integration of prior knowledge

• Empirical evaluation


Bayesian networks

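[Figure: example directed acyclic graph with nodes A, B, C, D, E and F; the labels "NODES" and "EDGES" point to its components.]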

• Marriage between graph theory and probability theory.

• Directed acyclic graph (DAG) representing conditional independence relations.

• It is possible to score a network in light of the data: P(D|M), D: data, M: network structure.

• We can infer how well a particular network explains the observed data.

P(A,B,C,D,E,F) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D)
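The factorization on this slide can be checked numerically. The following is a minimal sketch, assuming binary variables and randomly generated conditional probability tables (the numbers are illustrative and not taken from the talk). The `parents` dictionary encodes the parent sets read off the factorization P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D).

```python
import itertools
import random

# Parent sets read off the factorization of P(A,B,C,D,E,F).
parents = {
    "A": (),
    "B": ("A",),
    "C": ("A",),
    "D": ("B", "C"),
    "E": ("D",),
    "F": ("C", "D"),
}

# Hypothetical conditional probability tables: for each node and each
# parent configuration, the probability that the node equals 1.
rng = random.Random(0)
cpt = {
    node: {cfg: rng.uniform(0.1, 0.9)
           for cfg in itertools.product((0, 1), repeat=len(pa))}
    for node, pa in parents.items()
}

def joint(assignment):
    """Joint probability of a full 0/1 assignment as a product of local terms."""
    p = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p

# The local terms define a proper distribution: the joint sums to 1.
nodes = list(parents)
total = sum(joint(dict(zip(nodes, values)))
            for values in itertools.product((0, 1), repeat=len(nodes)))
print(total)
```

Each node contributes one local term that depends only on its parents, which is what makes scoring a network tractable.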


Bayesian networks versus causal networks

Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

[Figure: two graphs over nodes A, B and C, labelled "True causal graph" and "Node A unknown".]

• Equivalence classes: networks with the same score P(D|M).

• Equivalent networks cannot be distinguished in light of the data.

[Figure: four graphs over nodes A, B and C.]
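The statement that equivalent networks receive the same score can be illustrated on the smallest case, the two networks A → B and B → A. This is a sketch under stated assumptions: it uses the maximised log-likelihood as a stand-in for P(D|M), and the data are simulated toy values, not data from the talk.

```python
import math
import random
from collections import Counter

rng = random.Random(1)
# Hypothetical toy data: 200 samples of two dependent binary variables (a, b).
data = []
for _ in range(200):
    a = int(rng.random() < 0.4)
    b = int(rng.random() < (0.8 if a else 0.3))
    data.append((a, b))

def max_loglik(data, parent_idx, child_idx):
    """Maximised log-likelihood of the two-node network parent -> child."""
    n = len(data)
    parent_counts = Counter(row[parent_idx] for row in data)
    pair_counts = Counter((row[parent_idx], row[child_idx]) for row in data)
    # Term for the parent's marginal distribution.
    ll = sum(c * math.log(c / n) for c in parent_counts.values())
    # Term for the child's distribution given the parent.
    ll += sum(c * math.log(c / parent_counts[p])
              for (p, _), c in pair_counts.items())
    return ll

ll_a_to_b = max_loglik(data, 0, 1)
ll_b_to_a = max_loglik(data, 1, 0)
print(ll_a_to_b, ll_b_to_a)
```

Both directions reduce to the same sum over joint counts, so the two values agree up to floating-point error and the data cannot orient the edge.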


Symmetry breaking

[Figure: graphs over nodes A, B and C, with the label "Prior knowledge".]

P(M|D) = P(D|M) P(M) / Z

D: data. M: network structure


P(D|M)


Prior knowledge:

B is a transcription factor with binding sites in the upstream regions of A and C

P(M)


P(M|D) ∝ P(D|M) P(M)
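How a prior breaks the symmetry between equivalent networks can be shown with a few lines of arithmetic. This is a minimal sketch: the structure names M1 to M3, the log-likelihood values and the prior weights are all hypothetical, chosen only so that the likelihoods tie.

```python
import math

# Hypothetical log P(D|M) for three equivalent structures: the scores tie,
# so the data alone cannot separate them.
log_lik = {"M1": -120.0, "M2": -120.0, "M3": -120.0}

# Hypothetical prior P(M) that favours M1, e.g. from biological knowledge.
prior = {"M1": 0.6, "M2": 0.2, "M3": 0.2}

def posterior(log_lik, prior):
    """Normalise P(D|M) P(M) over the candidate structures in log space."""
    log_joint = {m: log_lik[m] + math.log(prior[m]) for m in log_lik}
    top = max(log_joint.values())          # subtract the max for stability
    weights = {m: math.exp(v - top) for m, v in log_joint.items()}
    z = sum(weights.values())
    return {m: w / z for m, w in weights.items()}

post = posterior(log_lik, prior)
print(post)
```

With tied likelihoods the posterior simply reproduces the prior, so any preference among the equivalent structures comes from P(M).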


Learning Bayesian networks

P(M|D) = P(D|M) P(M) / Z

M: Network structure. D: Data


Overview of the talk

• Revision: Bayesian networks

• Integration of prior knowledge

• Empirical evaluation


Use TF binding motifs in promoter sequences


Biological prior knowledge matrix

Biological Prior Knowledge

Indicates some knowledge about the relationship between genes i and j

Define the energy of a graph G


Notation

• Prior knowledge matrix: B (for “belief”)

• Network structure: G (for “graph”) or M (for “model”)

• P: probabilities


Prior distribution over networks

Energy of a network
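The formulas on this slide did not survive in the transcript. As a hedged illustration only, the sketch below assumes one common choice: the energy is the total absolute mismatch between the prior knowledge matrix B (entries in [0, 1]) and the adjacency matrix of G, and the prior is a Gibbs distribution proportional to exp(-β E(G)). The function names and the example matrices are hypothetical.

```python
import numpy as np

def energy(G, B):
    """Assumed energy: sum over i != j of |B[i, j] - G[i, j]|."""
    off_diag = ~np.eye(G.shape[0], dtype=bool)   # ignore self-loops
    return float(np.abs(B - G)[off_diag].sum())

def log_prior_unnormalised(G, B, beta):
    """Assumed Gibbs form: log P(G | beta) = -beta * E(G) - log Z(beta)."""
    return -beta * energy(G, B)

# Hypothetical prior knowledge: strong belief in edge 0 -> 1, strong belief
# against 1 -> 0, and 0.5 (no information) everywhere else.
B = np.array([[0.5, 0.9, 0.5],
              [0.1, 0.5, 0.5],
              [0.5, 0.5, 0.5]])
G_agree = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])      # contains 0 -> 1
G_disagree = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 0]])   # contains 1 -> 0

print(energy(G_agree, B), energy(G_disagree, B))
```

Under this assumed form, a graph that agrees with the prior knowledge has lower energy and therefore higher prior probability, and β = 0 switches the prior knowledge off.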


Sample networks and hyperparameters from the posterior distribution

• Capture intrinsic inference uncertainty

• Learn the trade-off parameters automatically

P(M|D) = P(D|M) P(M) / Z



Rewriting the energy

Energy of a network


Approximation of the partition function

Partition function of a perfect gas
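The approximation itself is not preserved in the transcript. One reading of the perfect-gas analogy is that the terms are treated as non-interacting, so the partition function becomes a product of independent factors. The sketch below assumes that reading together with an energy of the form sum over i != j of |B[i, j] - G[i, j]|; it drops the acyclicity constraint, so the product is exact for all directed graphs without self-loops and only approximate for DAGs. All names are hypothetical.

```python
import itertools
import numpy as np

def energy(G, B):
    """Assumed energy: sum over i != j of |B[i, j] - G[i, j]|."""
    off_diag = ~np.eye(G.shape[0], dtype=bool)
    return float(np.abs(B - G)[off_diag].sum())

def z_factorised(B, beta):
    """Product over potential edges of (edge absent term + edge present term)."""
    off_diag = ~np.eye(B.shape[0], dtype=bool)
    b = B[off_diag]
    return float(np.prod(np.exp(-beta * b) + np.exp(-beta * (1.0 - b))))

def z_brute_force(B, beta):
    """Sum of exp(-beta * E(G)) over every directed graph without self-loops."""
    n = B.shape[0]
    slots = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(slots)):
        G = np.zeros((n, n))
        for (i, j), bit in zip(slots, bits):
            G[i, j] = bit
        total += np.exp(-beta * energy(G, B))
    return float(total)

B = np.array([[0.5, 0.9, 0.5],
              [0.1, 0.5, 0.5],
              [0.5, 0.5, 0.5]])
print(z_factorised(B, 1.5), z_brute_force(B, 1.5))
```

The brute-force sum grows as 2^(n(n-1)), while the factorised form costs one pass over the matrix, which is why such an approximation matters when β has to be re-evaluated at every sampling step.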


Multiple sources of prior knowledge


MCMC sampling scheme


Sample networks and hyperparameters from the posterior distribution

Metropolis-Hastings scheme

Proposal probabilities
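The details of the sampling scheme are not in the transcript, so the following is only a simplified sketch of a Metropolis-Hastings sampler over a network G and one trade-off parameter β. Its assumptions: a Gibbs prior with energy sum over i != j of |B[i, j] - G[i, j]| and a factorised partition function, a uniform hyperprior on [0, beta_max], single-edge flips as structure moves, and a user-supplied `log_lik` function. Both proposals are symmetric, so the proposal probabilities cancel in the acceptance ratio. A second source of prior knowledge would add a second energy term and a second β move of the same form.

```python
import math
import random
import numpy as np

def is_acyclic(G):
    """True if the directed graph with adjacency matrix G has no cycle."""
    reach = G.astype(bool)
    for k in range(G.shape[0]):              # Warshall transitive closure
        reach = reach | (reach[:, [k]] & reach[[k], :])
    return not reach.diagonal().any()

def log_prior(G, B, beta):
    """Assumed Gibbs prior, normalised with the factorised partition function."""
    off = ~np.eye(G.shape[0], dtype=bool)
    b = B[off]
    log_z = np.log(np.exp(-beta * b) + np.exp(-beta * (1.0 - b))).sum()
    return float(-beta * np.abs(B - G)[off].sum() - log_z)

def accept(rng, log_ratio):
    """Metropolis acceptance test for a symmetric proposal."""
    return rng.random() < math.exp(min(0.0, log_ratio))

def mh_sample(log_lik, B, n_steps, beta_max=30.0, step=3.0, seed=0):
    rng = random.Random(seed)
    n = B.shape[0]
    G, beta, samples = np.zeros((n, n), dtype=int), 1.0, []
    for _ in range(n_steps):
        # Structure move: flip one off-diagonal entry; cyclic graphs are rejected.
        i, j = rng.sample(range(n), 2)
        G_new = G.copy()
        G_new[i, j] = 1 - G_new[i, j]
        if is_acyclic(G_new):
            log_ratio = (log_lik(G_new) + log_prior(G_new, B, beta)
                         - log_lik(G) - log_prior(G, B, beta))
            if accept(rng, log_ratio):
                G = G_new
        # Hyperparameter move: proposals outside [0, beta_max] are rejected.
        beta_new = beta + rng.uniform(-step, step)
        if 0.0 <= beta_new <= beta_max:
            if accept(rng, log_prior(G, B, beta_new) - log_prior(G, B, beta)):
                beta = beta_new
        samples.append((G.copy(), beta))
    return samples

# Usage with a flat placeholder likelihood, i.e. sampling from the prior only.
B = np.array([[0.5, 0.9, 0.5], [0.1, 0.5, 0.5], [0.5, 0.5, 0.5]])
samples = mh_sample(lambda G: 0.0, B, n_steps=2000)
edge_freq = np.mean([G for G, _ in samples], axis=0)   # marginal edge frequencies
betas = [beta for _, beta in samples]
```

Because β is sampled alongside G, the output contains both edge frequencies and a distribution over the trade-off parameter rather than a single user-chosen value.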


Bayesian networks with biological prior knowledge

• Biological prior knowledge: information about the interactions between the nodes.

• We use two distinct sources of biological prior knowledge.

• Each source of biological prior knowledge is associated with its own trade-off parameter: β1 and β2.

• The trade-off parameter indicates how much biological prior information is used.

• The trade-off parameters are inferred. They are not set by the user!

Bayesian networks with two sources of prior

[Figure: the data and two sources of prior knowledge (Source 1 with β1, Source 2 with β2) enter "BNs + MCMC", which yields the recovered networks and trade-off parameters.]


Overview of the talk

• Revision: Bayesian networks

• Integration of prior knowledge

• Empirical evaluation


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


Raf regulatory network

From Sachs et al., Science 2005


Evaluation: Raf signalling pathway

• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells

• Deregulation → carcinogenesis

• Extensively studied in the literature → gold standard network

Data and prior knowledge

Flow cytometry data

• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins

• 5400 cells have been measured under 9 different cellular conditions (cues)

• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
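The downsampling step can be sketched as follows. The array below is random placeholder data with the dimensions given on the slide (5400 cells, 11 proteins); drawing the five subsets without overlap is an assumption, since the slide does not say how the subsets were chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_proteins = 5400, 11
# Placeholder measurements standing in for the flow cytometry data.
full_data = rng.normal(size=(n_cells, n_proteins))

def downsample(data, rng, n_subsets=5, subset_size=100):
    """Draw disjoint random subsets of rows, without replacement."""
    order = rng.permutation(data.shape[0])
    return [data[order[k * subset_size:(k + 1) * subset_size]]
            for k in range(n_subsets)]

subsets = downsample(full_data, rng)
print([s.shape for s in subsets])
```

Running the reconstruction on each subset separately gives five results whose spread indicates how stable the method is at microarray-like sample sizes.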

Microarray example

• Spellman et al. (1998): cell cycle, 73 samples

• Tu et al. (2005): metabolic cycle, 36 samples

[Figure: two panels with axes "Genes" and "time".]


KEGG PATHWAYS are a collection of manually drawn pathway maps representing our knowledge of molecular interactions and reaction networks.

http://www.genome.jp/kegg/

Flow cytometry data and KEGG


Prior knowledge from KEGG


Prior distribution


The data and the priors

+ KEGG

+ Random


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?

Bayesian networks with two sources of prior

[Figure: the data and two sources of prior knowledge (Source 1 with β1, Source 2 with β2) enter "BNs + MCMC", which yields the recovered networks and trade-off parameters.]

Sampled values of the hyperparameters


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


How can we evaluate the reconstruction accuracy?


Flow cytometry data and KEGG


Evaluation

• Can the method automatically evaluate how useful the different sources of prior knowledge are?

• Do we get an improvement in the regulatory network reconstruction?

• Is this improvement optimal?


Learning the trade-off hyperparameter

• Repeat MCMC simulations for a large set of fixed hyperparameters β

• Obtain AUC scores for each value of β

• Compare with the proposed scheme in which β is automatically inferred.

Mean and standard deviation of the sampled trade-off parameter
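The AUC score used in this comparison can be computed from edge scores (for example, marginal posterior edge probabilities from the MCMC samples) and a gold-standard network. The sketch below assumes directed edges with self-loops ignored, which the slides do not specify; the gold-standard matrix and the scores are hypothetical.

```python
import numpy as np

def edge_auc(edge_scores, gold):
    """Area under the ROC curve for edge scores against a gold-standard graph."""
    off = ~np.eye(gold.shape[0], dtype=bool)     # ignore self-loops
    scores = edge_scores[off]
    labels = gold[off].astype(bool)
    pos, neg = scores[labels], scores[~labels]
    # AUC equals the probability that a true edge outranks a non-edge,
    # with ties counted as one half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Hypothetical gold standard with edges 0 -> 1 and 1 -> 2.
gold = np.array([[0, 1, 0],
                 [0, 0, 1],
                 [0, 0, 0]])
perfect = 0.05 + 0.9 * gold            # true edges score 0.95, others 0.05
uninformative = np.full((3, 3), 0.5)   # every edge scores the same

print(edge_auc(perfect, gold), edge_auc(uninformative, gold))
```

Evaluating this function once per fixed β gives the reference curve, against which the AUC obtained with an automatically inferred β can be compared.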


Conclusion

• Bayesian scheme for the systematic integration of different sources of biological prior knowledge.

• The method can automatically evaluate how useful the different sources of prior knowledge are.

• We get an improvement in the regulatory network reconstruction.

• This improvement is close to optimal.


Thank you

