
Chapter 4: Hidden Markov Models - Columbia University · Chapter 4: Hidden Markov Models 4.4 HMM...

Date post: 13-Mar-2019

Prof. Yechiam Yemini (YY)

Computer Science Department, Columbia University

Chapter 4: Hidden Markov Models

4.4 HMM Applications


Overview

• Alignment with HMM
• Gene predictions
• Protein structure predictions


Sequence Analysis Using HMM

Step 1: construct an HMM model
• Design an HMM generator for the observed sequences
• Assign hidden states to underlying sequence regions
• Model the questions to be answered in terms of the hidden pathway

Step 2: train the HMM
• Supervised/unsupervised

Step 3: analyze sequences
• Viterbi decoding: compute the most likely hidden pathway
• Forward/Backward: compute the likelihood of the sequence (& p/Z-score)
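The Viterbi step of this workflow can be sketched as follows. This is a minimal illustration, not the course's own code: the two-state model (a GC-rich state "H" and an AT-rich state "L") and all of its probabilities are hypothetical, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Most likely hidden path for obs, computed in log space."""
    # v[s]: log-probability of the best path ending in state s at the current position
    v = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for x in obs[1:]:
        ptr = {s: max(states, key=lambda r: v[r] + math.log(trans[r][s])) for s in states}
        v = {s: v[ptr[s]] + math.log(trans[ptr[s]][s]) + math.log(emit[s][x]) for s in states}
        back.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return v[last], path

# Hypothetical toy model: "H" = GC-rich region, "L" = AT-rich region
states = ["H", "L"]
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

best_logp, best_path = viterbi("GGCCAATT", states, start, trans, emit)
print("".join(best_path))  # prints HHHHLLLL
```

Working in log space avoids numerical underflow on long sequences; the hidden pathway returned is the region annotation of the sequence.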


Alignment


Profile HMM

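[Figure: profile HMM with start (S) and end (E) states and chains of match (M), insert (I) and delete (D) states; one match state shows emission probabilities .6, .2, .1, .1 over the nucleotides listed as A, C, T, G]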


Searching With Profile HMM

Is the target sequence S generated by profile H?
• Viterbi: use H to compute the most probable alignment of S
• Forward: compute the probability that S is generated by H

Example: searching for globins
• Create a profile HMM from known globin sequences
• Use the forward algorithm to search SWISS-PROT for globins
• Compare against the null hypothesis (random sequence)
• Result: globins are highly distinct
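The forward computation and the comparison against a null model can be sketched as a log-odds score. This is a simplified illustration under stated assumptions: a plain two-state HMM stands in for the profile H (a real profile HMM has match, insert and delete states), the null model is an i.i.d. background, and all names and probabilities are hypothetical and assumed strictly positive.

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(obs, states, start, trans, emit):
    """log P(obs | HMM), summed over all hidden paths (forward algorithm)."""
    f = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    for x in obs[1:]:
        f = {s: math.log(emit[s][x])
                + logsumexp([f[r] + math.log(trans[r][s]) for r in states])
             for s in states}
    return logsumexp(list(f.values()))

def log_odds(obs, states, start, trans, emit, background):
    """log [ P(obs | HMM) / P(obs | null) ] with an i.i.d. background null model."""
    null = sum(math.log(background[x]) for x in obs)
    return forward_loglik(obs, states, start, trans, emit) - null

# Hypothetical stand-in for the profile H
states = ["H", "L"]
start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

score = log_odds("GGCCAATT", states, start, trans, emit, background)
print(score > 0)  # a positive score favors the model over the null
```

In a database search, each sequence would receive such a score, and sequences scoring well above the distribution of scores for unrelated sequences would be reported as hits.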


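Pairwise Alignment With HMM

Key idea: recast DP as Viterbi decoding
• “Alignment” = sequence of pairs {<X,Y>, <X,_>, <_,Y>}
• Consider an HMM that emits pairs (pair HMM):

[Figure: pair HMM with states Begin, M (emits an aligned pair with probability P_XY), X (emits with probability q_X), Y (emits with probability q_Y) and End; transition labels shown: δ, ε, τ, 1−2δ−τ, 1−ε−τ]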


Viterbi Decoding Pairwise Alignment

[Figure: the pair HMM (Begin, M: P_XY, X: q_X, Y: q_Y, End) next to a decoding trellis over the states B, M, X, Y, E for steps 1 to 6; example sequences X=TGCGCA, Y=TGGCTA]

• Viterbi decoding: optimum path of pairs
• Forward step: select among {XY, -Y, X-}


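Comparing With Standard DP

• Compute the relationship between scores and probabilities
• Consider an HMM that emits pairs (pair HMM)

[Figure: standard DP alignment automaton with states M, X, Y, match score s(Xi,Yj), gap-open penalty -d and gap-extension penalty -e, shown next to the pair HMM (Begin, M: P_XY, X: q_X, Y: q_Y, End) with transition labels δ, ε, τ, 1−2δ−τ, 1−ε−τ]

<δ,ε,τ,P,q> ⇔ <s(x,y),d,e>

e.g., e = −log[ε/(1−τ)]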
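The mapping from pair-HMM parameters to DP scores can be sketched numerically. Only the expression for e comes from the slide; the expressions for s(x,y) and d below follow the log-odds derivation in Durbin et al., "Biological Sequence Analysis", under the assumption that the random model's termination probability equals τ. The function name and the example parameter values are hypothetical.

```python
import math

def pair_hmm_to_scores(delta, epsilon, tau, p, q):
    """Map pair-HMM parameters <delta, epsilon, tau, P, q> to DP scores <s, d, e>.

    Assumes the textbook pair HMM: M->X and M->Y with probability delta,
    X->X and Y->Y with probability epsilon, and any state->End with tau.
    """
    # Constant added to every substitution score (transition log-odds for M->M)
    match_const = math.log((1 - 2 * delta - tau) / (1 - tau) ** 2)
    s = {(a, b): math.log(p[(a, b)] / (q[a] * q[b])) + match_const for (a, b) in p}
    # Gap-open penalty
    d = -math.log(delta * (1 - epsilon - tau) / ((1 - tau) * (1 - 2 * delta - tau)))
    # Gap-extension penalty (the expression given on the slide)
    e = -math.log(epsilon / (1 - tau))
    return s, d, e

# Hypothetical DNA example: identical pairs carry 80% of the pair probability
alphabet = "ACGT"
q = {a: 0.25 for a in alphabet}
p = {(a, b): (0.2 if a == b else 0.2 / 12) for a in alphabet for b in alphabet}

s, d, e = pair_hmm_to_scores(delta=0.1, epsilon=0.3, tau=0.05, p=p, q=q)
print(round(s[("A", "A")], 3), round(s[("A", "C")], 3), round(d, 3), round(e, 3))
```

With these values, matches score positive, mismatches negative, and opening a gap costs more than extending one, which is the familiar shape of an affine-gap scoring scheme.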


Gene Predictions


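Schematics

Prokaryotic genes
[Figure: a promoter followed by several genes, each with start and stop, and a terminator]

Eukaryotic genes
[Figure: a promoter followed by exons separated by introns, between start and stop; introns are bounded by donor and acceptor sites]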


Modeling Gene Structure With HMM

• Gene regions are modeled as HMM states
• Emission probabilities reflect nucleotide statistics

[Figure: splice site]


Simple Gene Models

Consider a region of random length L
• Markov model: L is geometrically distributed, P[L=k] = p^k (1−p), E[L] = p/(1−p)

How do we model an ORF?
• Codons can be modeled as higher-order states

[Figure: example state with emission probabilities A=.29, C=.31, G=.04, T=.36]
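The geometric length distribution can be checked numerically. The sketch below assumes a single state with self-transition probability p, so that L counts the number of times the state is re-entered; the function names and the value p = 0.9 are hypothetical.

```python
import random

def expected_length(p, terms=10_000):
    """Sum k * P[L=k] with P[L=k] = p**k * (1 - p), truncated after `terms` terms."""
    return sum(k * p**k * (1 - p) for k in range(terms))

def sample_length(p, rng):
    """Draw L by repeating the self-transition with probability p."""
    k = 0
    while rng.random() < p:
        k += 1
    return k

p = 0.9
rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample_length(p, rng) for _ in range(20_000)]
print(expected_length(p), p / (1 - p), sum(samples) / len(samples))
```

Note that P[L=k] is largest at k = 0 and decays monotonically, so a plain Markov state always favors the shortest regions regardless of p.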


VEIL: Viterbi Exon-Intron Locator

• Contains 9 hidden states
• Each state is a Markovian model of a region: exons, introns, intergenic regions, splice sites, etc.

VEIL Architecture
[Figure: states Upstream, Start Codon, Exon, Stop Codon, Downstream, 3’ Splice Site, Intron, 5’ Poly-A Site, 5’ Splice Site]

Exon HMM Model
• Enter: start codon or intron (3’ Splice Site)
• Exit: 5’ Splice Site or one of three stop codons (taa, tag, tga)

(Henderson, Salzberg, & Fasman 1997)


Genie (Kulp 96)

• Uses a generalized HMM (GHMM)
• Edges in the model are complete HMMs
• States are neural networks for signal finding

• J5’ – 5’ UTR

• EI – Initial Exon

• E – Exon, Internal Exon

• I – Intron

• EF – Final Exon

• ES – Single Exon

• J3’ – 3’ UTR

[Figure: Genie GHMM; signal nodes: Begin Sequence, Start Translation, Donor splice site, Acceptor splice site, Stop Translation, End Sequence]


GENSCAN (Burge & Karlin 97)

[Figure: gene parse track with labels Intergenic, Promoter, 5’ UTR, Ex1 to Ex5, In1 to In4, Poly A, 3’ UTR, above the state diagram with states Intergenic region, Promoter, 5’ UTR, Einit, E0, E1, E2, I0, I1, I2, Eterm, Single exon gene, 3’ UTR, Poly A signal]

J. Mol. Bio. (1997) 268, 78-94

• Base: models the overall parse of the gene
  • 5’ UTR / 3’ UTR; Promoter; Poly A…
  • Exon-Intron-Exon structure of the gene
• Intron states model 3 scenarios:
  • I0: intron falls between two codons
  • I1: intron falls right after the first base of a codon
  • I2: intron falls right after the second base of a codon
• Exon states model the respective scenarios
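The three intron scenarios correspond to the reading-frame phase at which an intron interrupts the coding sequence. A minimal sketch, assuming exon lengths count coding bases only (no UTR bases); the function name and the example lengths are hypothetical.

```python
def intron_states(coding_exon_lengths):
    """Return the intron state (I0, I1 or I2) following each exon except the last.

    The phase is the number of coding bases seen so far, modulo 3:
    0 = between two codons, 1 = after the first codon base, 2 = after the second.
    """
    states, total = [], 0
    for length in coding_exon_lengths[:-1]:
        total += length
        states.append(f"I{total % 3}")
    return states

# Three coding exons of 100, 50 and 33 bases (183 = 61 codons in total)
print(intron_states([100, 50, 33]))  # ['I1', 'I0']
```

Tracking the phase in the state is what lets the model keep codons in frame across an intron: the exon that follows an I1 intron must start with the remaining two bases of the interrupted codon.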


GENSCAN Strategy

[Figure: gene parse track and state diagram (Intergenic region, Promoter, 5’ UTR, Einit, E0, E1, E2, I0, I1, I2, Eterm, Single exon gene, 3’ UTR, Poly A signal), labeled "Forward Strand"]

• HMM: sequence generator; state = region
• Decoding: parse the sequence into regions
• Viterbi: maximum likelihood decoder via DP
• These structures admit generalizations


Extending The HMM Model

Problem: the HMM does not model nature
• Sequence lengths are not geometrically distributed
• Some regions have special features (e.g., splice sites use GT…)
• …

Challenge: generalize the HMM while retaining its algorithms

[Figure: the sequence ….ACAGTTATAGCGTAATTGCGAGCTATCATGAG…… is decoded into a parse of regions (Intergenic, Promoter, exons, introns, Poly A)]


GENSCAN: Extended Sequence Generator Model

Add to state k a length distribution f_k(L)
• Semi-Markov model
• Extend the Viterbi DP recursion to reflect this

Incorporate special region structures
• Weight matrix model for splice sites
• …

Preserve the recursion structure

[Figure: state diagram with states Intergenic region, Promoter, 5’ UTR, Einit, E0, E1, E2, I0, I1, I2, Eterm, Single exon gene, 3’ UTR, Poly A signal]
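The extension of the Viterbi recursion to explicit length distributions can be sketched as follows. This is a generic semi-Markov decoder under stated assumptions, not the published implementation: each segment of state k has a length drawn from a table length_pmf[k], emissions inside a segment are independent, segments of the same state are never adjacent, and a valid parse is assumed to exist. All names and the toy model are hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def semi_markov_viterbi(obs, states, start, trans, emit, length_pmf, max_len):
    """Best segmentation of obs into (state, begin, end) segments."""
    n = len(obs)
    # best[t][k]: best log-probability of a parse of obs[:t] whose last segment is in state k
    best = [{k: NEG_INF for k in states} for _ in range(n + 1)]
    back = [{k: None for k in states} for _ in range(n + 1)]
    for t in range(1, n + 1):
        for k in states:
            seg = 0.0  # log emission probability of obs[t-d:t] under state k
            for d in range(1, min(t, max_len) + 1):
                seg += safe_log(emit[k][obs[t - d]])
                dur = safe_log(length_pmf[k].get(d, 0.0))
                if d == t:  # the segment starts the sequence
                    cands = [(safe_log(start[k]), None)]
                else:       # the segment follows a segment of another state
                    cands = [(best[t - d][j] + safe_log(trans[j][k]), j)
                             for j in states if j != k]
                for score, j in cands:
                    total = score + dur + seg
                    if total > best[t][k]:
                        best[t][k], back[t][k] = total, (d, j)
    # Trace back the segments from the end of the sequence
    k = max(states, key=lambda s: best[n][s])
    score, segments, t = best[n][k], [], n
    while t > 0:
        d, j = back[t][k]
        segments.append((k, t - d, t))
        t, k = t - d, j
    segments.reverse()
    return score, segments

# Hypothetical toy model: state "A" lasts 2 or 3 symbols, state "B" exactly 4
states = ["A", "B"]
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"B": 1.0}, "B": {"A": 1.0}}
emit = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}
length_pmf = {"A": {2: 0.5, 3: 0.5}, "B": {4: 1.0}}

score, segments = semi_markov_viterbi("aaabbbbaa", states, start, trans, emit, length_pmf, max_len=4)
print(segments)  # [('A', 0, 3), ('B', 3, 7), ('A', 7, 9)]
```

The recursion keeps the structure of ordinary Viterbi but adds an inner loop over segment lengths, so the cost grows by a factor of max_len; bounding or factoring the length distribution is what keeps decoding practical.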


Measuring Classifier Performance

The challenge: how do we evaluate the performance of a classifier?
• E.g., consider a test to decide whether a person is sick (p) or healthy (n)

Measured performance
• TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative

              Prediction
              P’    N’
Actual   P    TP    FN
         N    FP    TN

• Sensitivity: SN = TP/(TP+FN), the percentage of correct predictions when the actual state is positive
• Specificity: SP = TN/(TN+FP), the percentage of correct predictions when the actual state is negative

Receiver Operating Characteristics (ROC) curves
• Represent the tradeoff between sensitivity and specificity

[Figure: ROC curves; x-axis: 1−specificity, y-axis: sensitivity, both ranging from 0 to 1.0]
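These definitions translate directly into code. The sketch below computes SN and SP from counts and traces ROC points by sweeping a decision threshold over classifier scores; the function names, counts and scores are hypothetical, and both classes are assumed to be present.

```python
def sensitivity(tp, fn):
    """SN = TP / (TP + FN): fraction of actual positives predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """SP = TN / (TN + FP): fraction of actual negatives predicted negative."""
    return tn / (tn + fp)

def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for each threshold, highest threshold first.

    labels[i] is True when item i is an actual positive; an item is predicted
    positive when its score is at or above the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((fp / neg, tp / pos))
    return points

print(sensitivity(tp=90, fn=10), specificity(tn=80, fp=20))  # 0.9 0.8
points = roc_points([0.9, 0.8, 0.7, 0.4], [True, True, False, False])
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```

Lowering the threshold moves along the curve from (0, 0) toward (1, 1): sensitivity rises while specificity falls, which is the tradeoff the ROC curve displays.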


Performance Comparisons


Identifying Transmembrane Proteins


Transmembrane Proteins

Transmembrane proteins are the key tools for cell interactions
• Signaling, transport, adhesion…

A common structural motif: core helices

Example: receptors
• Function: receive signals, activate pathways
• Operations:
  1. Signal recognition
  2. Signal transduction
  3. Pathway activation
  4. Down regulation

www.pdb.org


Example: The Insulin Pathway

From Wikipedia

[Figure annotations:]
• Insulin binds to the InsR receptor
• Receptor activates the pathway
• Glut-4 is transported to the membrane to generate glucose influx
• Metabolic network processing
• InsR phosphorylated complex


TMHMM [Sonnhammer et al 98]

A Hidden Markov Model for Predicting Transmembrane Helices in Protein Sequences. In J. Glasgow et al., eds., Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, 175-182. AAAI Press, 1998.

An HMM to identify transmembrane proteins

Architecture:
• The helix core is key
• Globular and loop domains
• Modeling challenge: variable length

Training:
• The maximization step of Baum-Welch uses simulated annealing
• 3-stage training


TMHMM Performance


Protein Structure Prediction


Proteins Are Formed From Peptide Bonds

Ramachandran Regions


Levels of Protein Structure


Structure Formation


Helix is Formed Through H Bonds


Tertiary Structures: β-Barrel Example


Why Is Structure Important?

Protein functions are handled by structural elements
• HIV protease drug binding site
• Enzyme active site
• Antibody-protein interfaces


Structure Determination

Map: sequence → structure

Structure determination is handled experimentally
• X-ray crystallography; NMR

PDB: a database of structures


The Problem Of Structure Prediction

Derive conformation geometry from sequence
• Ab-initio techniques
• Homology techniques

Secondary structure is organized in folds
• α-helix, β-sheets…


The Structure Prediction Problem

• Given: an amino acid sequence
• Output: structure annotation {H,B…}


HMM Model

• Hidden states: folds
• Observable: AA sequence
• Viterbi decoding: find the most likely annotation
• Training: supervised learning using known structures
• Performance: HMM works well for limited protein types
  • E.g., TMHMM…
• For general structure predictions, neural nets are superior (e.g., PHD, B. Rost 93)

Why?
• An HMM works as long as the underlying conformation is consistent with the Markovian model. Protein conformations may depend on complex long-range non-Markovian interactions. E.g., consider the impact of a single AA change on hemoglobin conformation in sickle-cell anemia.


Example (Chu et al, 2004)

Step 1: align sequences to partition into regions

Step 2: build extended HMM (semi-Markov length)

www.gatsby.ucl.ac.uk/~chuwei/paper/icml04_talk.pdf


Performance

Challenges: long-range interactions (e.g., β-sheets)


Conclusions

HMMs are very useful when a Markovian model is appropriate
• Solve three central problems:
  • Decoding sequences to classify their components
  • Computing the likelihood of the classification
  • Training
• May be extended to handle non-Markovian elements
• But can lose their predictive power when Markovian assumptions diverge from nature

