
Date post: 27-Mar-2015
Two bioinformatics applications of dynamic Bayesian networks William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington

Two bioinformatics applications of dynamic Bayesian networks

William Stafford Noble
Department of Genome Sciences
Department of Computer Science and Engineering
University of Washington


Outline

• Segmenting genomic data
  – Background: DNA, chromatin and DNase I
  – Simple solution
  – Wavelets
  – Hierarchical model

• Matching peptides to mass spectra
  – Background: tandem mass spectrometry
  – Modeling peptide fragmentation


The human genome in vivo

[Figure: genomic DNA packaged into chromatin within the nucleus. Labels: genes, gene 'domains', DNase I hypersensitive site, trans-factor complex, chromatin fiber, nucleus.]


Measuring chromatin accessibility


A simple hidden Markov model

• Each state contains a single Gaussian.
• The model has six parameters (two transitions, two means, two standard deviations).
• The parameters are initialized randomly and trained in an unsupervised fashion via expectation-maximization.
• EM is re-started 100 times, and we select the parameters that yield the highest likelihood.
• The original data set is then segmented using either Viterbi or posterior decoding.

[Figure: two-state model with states "Open chromatin" and "Closed chromatin"; a caret annotation inserts the word "very".]
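The recipe on this slide can be sketched in plain Python. This is a minimal illustration on synthetic data, not the code behind the talk: the uniform start distribution, the variance floor, the iteration count, and all function names (`forward_backward`, `em`, `fit`, `viterbi`) are assumptions made for the example.

```python
import math
import random

def gauss(x, mu, sd):
    """Gaussian density, floored so that logs and divisions stay finite."""
    p = math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return max(p, 1e-300)

def forward_backward(x, stay, mu, sd):
    """Scaled forward-backward pass: returns (log-likelihood, gamma, xi)."""
    A = [[stay[0], 1 - stay[0]], [1 - stay[1], stay[1]]]
    e = [[gauss(v, mu[k], sd[k]) for k in (0, 1)] for v in x]
    alpha, scale = [], []
    for t in range(len(x)):
        if t == 0:
            a = [0.5 * e[0][k] for k in (0, 1)]  # uniform start (assumed)
        else:
            a = [sum(alpha[-1][j] * A[j][k] for j in (0, 1)) * e[t][k] for k in (0, 1)]
        scale.append(sum(a))
        alpha.append([v / scale[-1] for v in a])
    beta = [[1.0, 1.0] for _ in x]
    xi = [[0.0, 0.0], [0.0, 0.0]]  # expected transition counts
    for t in range(len(x) - 2, -1, -1):
        for j in (0, 1):
            terms = [A[j][k] * e[t + 1][k] * beta[t + 1][k] / scale[t + 1] for k in (0, 1)]
            beta[t][j] = sum(terms)
            for k in (0, 1):
                xi[j][k] += alpha[t][j] * terms[k]
    gamma = [[a[k] * b[k] for k in (0, 1)] for a, b in zip(alpha, beta)]
    return sum(math.log(c) for c in scale), gamma, xi

def em(x, iters, rng):
    """One EM (Baum-Welch) run from a random start: (loglik, stay, mu, sd)."""
    lo, hi = min(x), max(x)
    mu = [rng.uniform(lo, hi) for _ in (0, 1)]
    sd = [rng.uniform(0.1, 1.0) * (hi - lo or 1.0) for _ in (0, 1)]
    stay = [rng.uniform(0.5, 0.99) for _ in (0, 1)]
    for _ in range(iters):
        _, gamma, xi = forward_backward(x, stay, mu, sd)
        for k in (0, 1):
            w = sum(g[k] for g in gamma) + 1e-12
            mu[k] = sum(g[k] * v for g, v in zip(gamma, x)) / w
            var = sum(g[k] * (v - mu[k]) ** 2 for g, v in zip(gamma, x)) / w
            sd[k] = max(math.sqrt(var), 1e-3)  # variance floor (assumed)
            p = xi[k][k] / (xi[k][0] + xi[k][1] + 1e-12)
            stay[k] = min(max(p, 1e-6), 1 - 1e-6)
    return forward_backward(x, stay, mu, sd)[0], stay, mu, sd

def fit(x, restarts=100, iters=25, seed=0):
    """Restart EM several times and keep the highest-likelihood parameters."""
    rng = random.Random(seed)
    return max((em(x, iters, rng) for _ in range(restarts)), key=lambda r: r[0])

def viterbi(x, stay, mu, sd):
    """Most probable state path under the fitted model."""
    logA = [[math.log(stay[0]), math.log(1 - stay[0])],
            [math.log(1 - stay[1]), math.log(stay[1])]]
    def loge(v, k):  # log emission density up to a shared constant
        return -0.5 * ((v - mu[k]) / sd[k]) ** 2 - math.log(sd[k])
    score = [math.log(0.5) + loge(x[0], k) for k in (0, 1)]
    back = []
    for v in x[1:]:
        ptr = [max((0, 1), key=lambda j: score[j] + logA[j][k]) for k in (0, 1)]
        score = [score[ptr[k]] + logA[ptr[k]][k] + loge(v, k) for k in (0, 1)]
        back.append(ptr)
    state = max((0, 1), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Synthetic signal: 1 marks an "open" stretch with a higher mean.
rng = random.Random(1)
truth = [0] * 60 + [1] * 20 + [0] * 60
signal = [rng.gauss(3.0 * s, 0.5) for s in truth]
loglik, stay, mu, sd = fit(signal, restarts=20)  # fewer restarts than the slide's 100, for speed
path = viterbi(signal, stay, mu, sd)
open_state = 0 if mu[0] > mu[1] else 1
calls = [int(s == open_state) for s in path]
```

Posterior decoding, the alternative named on the slide, would instead label each position with the state that has the larger `gamma` value.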


[Figure; scale label: 1.5 megabases.]


A problem, and two solutions

• Problem: We are interested in phenomena occurring at multiple scales.

• Solution #1: Perform a wavelet smooth prior to HMM analysis.

• Solution #2: Build a more complex probability model.
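As a rough illustration of Solution #1, the sketch below smooths a signal with a Haar wavelet by discarding the finest detail coefficients. The slides do not say which wavelet family was used, so Haar is an assumption, as are the function name `haar_smooth` and the `levels` parameter.

```python
def haar_smooth(signal, levels):
    """Smooth a signal by dropping the `levels` finest scales of Haar detail.

    For the Haar wavelet, zeroing those detail coefficients and inverting
    the transform is the same as replacing each block of 2**levels samples
    by its mean, which is what this function computes directly.
    """
    block = 2 ** levels
    if len(signal) % block != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    for _ in range(levels):
        # One Haar analysis step: keep pairwise averages, drop pairwise differences.
        approx = [(approx[i] + approx[i + 1]) / 2 for i in range(0, len(approx), 2)]
    # Haar synthesis with zero details: repeat each coarse value over its block.
    return [a for a in approx for _ in range(block)]

fine = haar_smooth([1, 3, 5, 7], levels=1)
coarse = haar_smooth([1, 3, 5, 7], levels=2)
```

Increasing `levels` trades resolution for smoothness, so the same HMM can then be run on the signal at several scales.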


Change point model

• Four-state model:
  – major DNase hypersensitive site (DHS),
  – minor DHS,
  – intermediate sensitivity region, and
  – insensitive region.

• Continuous mixture of Gaussians at each state.

• Gamma distribution of lengths within each region.
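To make the three ingredients concrete, here is a small generative sketch that samples data from a model of this general shape. It is one possible reading, not the model from the talk: "continuous mixture of Gaussians" is interpreted as a Gaussian whose mean is itself drawn from a Gaussian for each segment, and every numeric value, the uniform choice of next state, and the name `sample_segments` are hypothetical.

```python
import random

STATES = ["insensitive", "intermediate", "minor DHS", "major DHS"]

def sample_segments(total_len, rng, shape=2.0, scale=50.0,
                    base_means=(0.0, 1.0, 2.0, 4.0), mean_sd=0.3, noise_sd=0.5):
    """Sample (signal, labels) from a four-state change point model."""
    signal, labels = [], []
    state = rng.randrange(len(STATES))
    while len(signal) < total_len:
        # Segment length follows a gamma distribution (rounded, at least 1).
        length = max(1, round(rng.gammavariate(shape, scale)))
        length = min(length, total_len - len(signal))
        # Continuous mixture (assumed form): the segment mean is itself random.
        seg_mean = rng.gauss(base_means[state], mean_sd)
        signal.extend(rng.gauss(seg_mean, noise_sd) for _ in range(length))
        labels.extend([state] * length)
        # Move to a different state at each change point (uniform choice, assumed).
        state = rng.choice([s for s in range(len(STATES)) if s != state])
    return signal, labels

signal, labels = sample_segments(500, random.Random(0))
```

Unlike the simple HMM, whose segment lengths are implicitly geometric, the explicit gamma length distribution lets each region type have a characteristic size.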


Spanning the gaps

Beginning in State 1 (Insensitive)


Spanning the gaps

Beginning in State 4 (Major DHS)


Selecting the number of states


Improved fit to the data

Each panel is a QQ plot comparing the observed residuals with the theoretical Gaussian.

Panels: Insensitive, Intermediate sensitivity, Minor DHS, Major DHS.
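For reference, the points of such a QQ plot can be computed as below. This is a generic sketch using only the Python standard library; the plotting-position formula `(i + 0.5) / n` and the name `qq_points` are choices made for the example, not details taken from the slides.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical quantile, observed quantile) pairs for a Gaussian QQ plot."""
    n = len(residuals)
    # Reference Gaussian with the same mean and standard deviation as the data.
    ref = NormalDist(mean(residuals), stdev(residuals))
    observed = sorted(residuals)
    theoretical = [ref.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

points = qq_points([-1.0, 0.0, 1.0])
```

If the residuals within a state are well described by the Gaussian, the points fall close to the diagonal; systematic departures indicate a poor fit.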


Capturing different scales


Enrichment of biologically relevant features


Future directions

• Many types of genomic data
  – Phylogenetic conservation scores
  – Various histone modifications
  – Replication timing, etc.

• Perform segmentations in multiple dimensions simultaneously.

• Assign statistical significance to observed segments.


Shotgun proteomics

[Figure: workflow diagram with boxes "Training PSMs", "Probability model", "Trained model", "Test PSMs" and "Evaluation".]

PSM = peptide-spectrum match


Peptide sequence influences peak height


Bayesian network

• We model peptide fragmentation using a Bayesian network.

• Nodes represent random variables, and edges represent conditional dependencies.

• Each node stores a conditional probability table (CPT) giving Pr(node|parents).

                      intensity > 50%    intensity < 50%
no b-ion observed     1.00               0.00
b-ion observed        0.75               0.25

[Figure: two-node network with nodes "Is b-ion observed?" and "b-ion intensity".]
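A conditional probability table of this kind maps naturally onto a nested dictionary. The sketch below uses the four values from the slide, with columns assigned in the order they appear in the extracted slide text, which may not match the original layout. The prior on "Is b-ion observed?" is hypothetical, since the slide does not give one, and the name `joint` is illustrative.

```python
# Pr(intensity bin | is b-ion observed), values as read from the slide's table.
CPT_INTENSITY = {
    False: {"> 50%": 1.00, "< 50%": 0.00},
    True: {"> 50%": 0.75, "< 50%": 0.25},
}

# Hypothetical prior for the parent node; not given on the slide.
PRIOR_OBSERVED = {True: 0.6, False: 0.4}

def joint(observed, intensity_bin):
    """Joint probability as the product of each node's CPT entry given its parents."""
    return PRIOR_OBSERVED[observed] * CPT_INTENSITY[observed][intensity_bin]

total = sum(joint(o, b) for o in (True, False) for b in ("> 50%", "< 50%"))
```

Each row of a CPT sums to one, so the joint distribution built from the product of CPT entries also sums to one.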


Ion series modeled in a Markov chain

[Figure: five copies of the two-node network ("Is b-ion observed?", "b-ion intensity") linked in a chain.]

Similar to PepHMM (Han et al., 2005).


A more realistic model

[Figure: network with nodes "Is b-ion observed?", "b-ion intensity", "N-term AA", "C-term AA", "Is ion detectable?", "Fractional m/z" and "Is proton mobile?".]


Ion series modeled in a Markov chain

$$\mathrm{LOR}_b = \log \frac{\Pr(\text{b-ions}, \text{peptide} \mid \text{model})}{\Pr(\text{b-ions}, \text{peptide} \mid \text{null model})}$$
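The log-odds ratio can be illustrated with a deliberately simplified stand-in for the network: a first-order Markov chain over whether each successive b-ion is observed, compared against a null chain. The sketch ignores intensities and the peptide sequence itself, and all probabilities and names (`chain_log_prob`, `log_odds_ratio`, `MODEL`, `NULL`) are hypothetical.

```python
import math

def chain_log_prob(observed, p_first, p_after_hit, p_after_miss):
    """Log probability of a 0/1 sequence under a first-order Markov chain.

    p_first is Pr(first ion observed); p_after_hit and p_after_miss are
    Pr(ion observed | previous ion observed / not observed).
    """
    logp = math.log(p_first if observed[0] else 1 - p_first)
    for prev, cur in zip(observed, observed[1:]):
        p = p_after_hit if prev else p_after_miss
        logp += math.log(p if cur else 1 - p)
    return logp

def log_odds_ratio(observed, model, null):
    """log Pr(observations | model) - log Pr(observations | null model)."""
    return chain_log_prob(observed, *model) - chain_log_prob(observed, *null)

MODEL = (0.7, 0.8, 0.3)  # hypothetical trained parameters
NULL = (0.5, 0.5, 0.5)   # hypothetical null: every ion is a coin flip

b_ions = [1, 1, 1, 0, 1]
lor_b = log_odds_ratio(b_ions, MODEL, NULL)
```

A positive value means the observed fragmentation pattern is better explained by the trained model than by the null model.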


Vectors of log-odds ratios

Panels: Correct peptide-spectrum matches, Incorrect peptide-spectrum matches.


Binary classifier


Model Evaluation: Accuracy

Model        Redundant TP/FP    Unique TP/FP
Bayes Net    285/300, 95%       137/144, 95.1%
SEQUEST      288/300, 96%       136/144, 94.4%
InsPecT      274/300, 91.3%     131/144, 90.9%

[Figure: workflow diagram from the "Shotgun proteomics" slide, with boxes "Training PSMs", "Probability model", "Trained model", "Test PSMs" and "Evaluation".]


An incorrect identification

SEQUEST: LRPGAELLEGAHVGNFVEMK
Bayes net: HQDETQDALNALDLLTNEK

Blue = b and y, green = a, red = ammonia loss, magenta = water loss, sienna = +2

This peptide does not appear in E. coli, the organism from which this protein sample was derived.


Co-eluting peptides

SEQUEST: AFPEAVLFIHPLDAK
Bayes net: DVFVHFSALQGNQFK

Blue = b and y, green = a, red = ammonia loss, magenta = water loss, sienna = +2


Future directions

• Build a single Bayesian network that includes all ion types.

• Produce more descriptive outputs from the Bayesian network for input to the classifier.

• Add more biophysical details to the model: chromatography retention time, a better mass-to-charge estimate, etc.

• Generate a better (larger, more accurate) gold standard data set.


Acknowledgments

• DNase I hypersensitivity
  – John Stamatoyannopoulos
  – Pete Sabo
  – Scott Kuehn
  – many others in the Stam lab

• Wavelet analysis: Bob Thurman

• Change point model
  – Charles Lawrence
  – Heng Lian
  – William Thompson

• Mass spectrometry
  – Aaron Klammer
  – Jeff Bilmes
  – Sheila Reynolds
  – Michael MacCoss

