HMM (Hidden Markov Model)

Post on 29-Jun-2015


INTRODUCTION TO HIDDEN MARKOV MODELS

Mohan Kumar Yadav, M.Sc Bioinformatics

JNU JAIPUR

HIDDEN MARKOV MODEL (HMM)

The real world has structures and processes that produce observable outputs.

– Usually sequential.
– The event producing the output cannot be seen.

Problem: how to construct a model of the structure or process given only observations.

HISTORY OF HMM

• Basic theory developed and published in the 1960s and 70s

• No widespread understanding or application until the late 80s

• Why?
  – Theory was published in mathematics journals that were not widely read.
  – Insufficient tutorial material for readers to understand and apply the concepts.

Andrey Andreyevich Markov (1856-1922)

Andrey Andreyevich Markov was a Russian mathematician. He is best known for his work on stochastic processes. A primary subject of his research later became known as Markov chains and Markov processes.

HIDDEN MARKOV MODEL

• A Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with hidden (unobserved) states.

• Markov chain property: the probability of each subsequent state depends only on the previous state.
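In symbols, the Markov chain property can be written as follows. The notation x_t for the state at time t is an assumption made here; it matches the x's used for states in the components slide.

```latex
P(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1) = P(x_t \mid x_{t-1})
```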

EXAMPLE OF HMM

• Coin toss:
  – Heads, tails sequence with 2 coins
  – You are in a room, with a wall
  – Person behind the wall flips a coin, tells the result
    • Coin selection and toss are hidden
    • Cannot observe events, only output (heads, tails) from events
  – Problem is then to build a model to explain the observed sequence of heads and tails.
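The coin-toss setup can be simulated to show which part is hidden. This is only a sketch: the probability of keeping the same coin (`p_stay`) and the two coin biases (`p_heads`) are made-up values, and the function name is hypothetical.

```python
import random

def simulate_coin_hmm(n_tosses, seed=0):
    """Simulate the two-coin setup: the coin choice is hidden, only H/T is reported."""
    rng = random.Random(seed)
    # Hypothetical parameters (not from the slides): chance of keeping the
    # same coin, and each coin's probability of landing heads.
    p_stay = 0.7
    p_heads = {"coin1": 0.5, "coin2": 0.9}

    coin = rng.choice(["coin1", "coin2"])  # hidden initial state
    hidden, observed = [], []
    for _ in range(n_tosses):
        hidden.append(coin)
        observed.append("H" if rng.random() < p_heads[coin] else "T")
        # Hidden transition: keep the current coin or switch to the other one.
        if rng.random() >= p_stay:
            coin = "coin2" if coin == "coin1" else "coin1"
    return hidden, observed

hidden, observed = simulate_coin_hmm(10)
print("observed:", "".join(observed))  # all the listener in the room gets to see
```

The `hidden` list is what the person behind the wall knows; a model built from `observed` alone has to infer it.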

• Weather
  – Once each day the weather is observed
    • State 1: rain
    • State 2: cloudy
    • State 3: sunny
  – What is the probability the weather for the next 7 days will be:
    • sun, sun, rain, rain, sun, cloudy, sun
  – Each state corresponds to a physical observable event
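Because every state here is directly observable, the weather question reduces to multiplying transition probabilities along the sequence. The sketch below assumes an illustrative transition matrix `A` and assumes that today is sunny; neither is given in the slides.

```python
# States as listed above: rain, cloudy, sunny.
RAIN, CLOUDY, SUNNY = 0, 1, 2

# Illustrative transition matrix (assumed; the slides give no values).
# A[i][j] = P(tomorrow is state j | today is state i); each row sums to 1.
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]

def sequence_probability(start_state, next_states, trans):
    """Probability of seeing next_states in order, given today's state."""
    prob = 1.0
    current = start_state
    for nxt in next_states:
        prob *= trans[current][nxt]
        current = nxt
    return prob

week = [SUNNY, SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]
p_week = sequence_probability(SUNNY, week, A)  # assumes today is sunny
print(p_week)
```

With these assumed values the product is 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2, roughly 1.5 × 10⁻⁴.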


HMM COMPONENTS

• A set of states (x’s)
• A set of possible output symbols (y’s)
• A state transition matrix (a’s)
  – probability of making a transition from one state to the next
• An output emission matrix (b’s)
  – probability of emitting/observing a symbol at a particular state
• An initial probability vector
  – probability of starting at a particular state
  – Not shown, sometimes assumed to be 1

EXAMPLE OF HMM

[State diagram: two states, Rain and Dry, with the transition probabilities listed below]

• Two states: ‘Rain’ and ‘Dry’.
• Transition probabilities: P(‘Rain’|‘Rain’)=0.3, P(‘Dry’|‘Rain’)=0.7, P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8.
• Initial probabilities: say P(‘Rain’)=0.4, P(‘Dry’)=0.6.

CALCULATION OF HMM
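As one possible worked calculation, the probability of a particular state sequence in the Rain/Dry example is the initial probability of the first state times the transition probabilities along the sequence. The sequence Dry, Dry, Rain, Rain is chosen here only for illustration, and the function name is hypothetical.

```python
# Parameters taken from the Rain/Dry example in these slides.
initial = {"Rain": 0.4, "Dry": 0.6}
transition = {
    "Rain": {"Rain": 0.3, "Dry": 0.7},
    "Dry": {"Rain": 0.2, "Dry": 0.8},
}

def state_sequence_probability(states):
    """P(states) = P(first state) * product of P(next | previous)."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob

p = state_sequence_probability(["Dry", "Dry", "Rain", "Rain"])
print(round(p, 4))  # 0.6 * 0.8 * 0.2 * 0.3 = 0.0288
```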


COMMON HMM TYPES

• Ergodic (fully connected):
  – Every state of the model can be reached in a single step from every other state of the model.

• Bakis (left-right):
  – As time increases, states proceed from left to right
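The two types differ only in which entries of the transition matrix are allowed to be non-zero. The checks below are a minimal sketch of that idea; the function names and the 3-state `bakis` matrix are hypothetical, while `rain_dry` reuses the Rain/Dry transition probabilities from the earlier example.

```python
def is_ergodic(trans):
    """Fully connected: every state reaches every other state in one step."""
    n = len(trans)
    return all(trans[i][j] > 0 for i in range(n) for j in range(n) if i != j)

def is_left_right(trans):
    """Bakis: no transition goes back to an earlier (lower-index) state."""
    n = len(trans)
    return all(trans[i][j] == 0 for i in range(n) for j in range(i))

rain_dry = [[0.3, 0.7], [0.2, 0.8]]  # Rain/Dry example
bakis = [  # hypothetical left-right model
    [0.5, 0.5, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]

print(is_ergodic(rain_dry), is_left_right(rain_dry))
print(is_ergodic(bakis), is_left_right(bakis))
```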

HMM IN BIOINFORMATICS

• Hidden Markov Models (HMMs) are probabilistic models for representing biological sequences.

• They allow us to do things like find genes, do sequence alignments and find regulatory elements such as promoters in a principled manner.

PROBLEMS OF HMM

• Three problems must be solved for HMMs to be useful in real-world applications:

1) Evaluation

2) Decoding

3) Learning

EVALUATION PROBLEM

Given a set of HMMs, which is the one most likely to have produced the observation sequence?

GACGAAACCCTGTCTCTATTTATCC

HMM 1, HMM 2, HMM 3, … HMM n

p(HMM-1)? p(HMM-2)? p(HMM-3)? … p(HMM-n)?
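A standard way to score a sequence under each candidate model is the forward algorithm, which sums the probability of the observations over all hidden state paths. The sketch below is not taken from the slides: the two 2-state models and all their probabilities are made up, and only the observation sequence comes from the slide.

```python
def forward_likelihood(obs, init, trans, emit):
    """P(obs | model) via the forward algorithm, summing over all hidden paths."""
    n = len(init)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)

# Two hypothetical 2-state models over the DNA alphabet (all values made up).
models = {
    "HMM 1": dict(
        init=[0.5, 0.5],
        trans=[[0.9, 0.1], [0.1, 0.9]],
        emit=[{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
              {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}],
    ),
    "HMM 2": dict(
        init=[0.5, 0.5],
        trans=[[0.9, 0.1], [0.1, 0.9]],
        emit=[{"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
              {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}],
    ),
}

sequence = "GACGAAACCCTGTCTCTATTTATCC"
scores = {name: forward_likelihood(sequence, **m) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

For long sequences these products underflow, so practical implementations usually work with log probabilities or rescale `alpha` at each step.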

DECODING PROBLEM
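Decoding asks for the most probable hidden state sequence given the observations; the usual tool for this is the Viterbi algorithm. The sketch below reuses the Rain/Dry transition and initial probabilities from the earlier example. The emission probabilities (umbrella or none) are hypothetical and do not appear in the slides.

```python
def viterbi(obs, states, init, trans, emit):
    """Return the most probable hidden state path for obs and its probability."""
    # best[s] = probability of the best path so far that ends in state s
    best = {s: init[s] * emit[s][obs[0]] for s in states}
    backpointers = []
    for symbol in obs[1:]:
        step, new_best = {}, {}
        for s in states:
            # Best predecessor of s at this time step.
            prev = max(states, key=lambda r: best[r] * trans[r][s])
            step[s] = prev
            new_best[s] = best[prev] * trans[prev][s] * emit[s][symbol]
        backpointers.append(step)
        best = new_best
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, best[last]

states = ["Rain", "Dry"]
init = {"Rain": 0.4, "Dry": 0.6}
trans = {"Rain": {"Rain": 0.3, "Dry": 0.7}, "Dry": {"Rain": 0.2, "Dry": 0.8}}
# Hypothetical emission probabilities (not in the slides).
emit = {
    "Rain": {"umbrella": 0.9, "none": 0.1},
    "Dry": {"umbrella": 0.2, "none": 0.8},
}

path, prob = viterbi(["umbrella", "umbrella", "none"], states, init, trans, emit)
print(path, prob)
```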

TRAINING PROBLEM

AATAGAGAGGTTCGACTCTGCATTTCCCAAATACGTAATGCTTACGGTACACGACCCAAGCTCTCTGCTTGAATCCCAAATCTGAGCGGACAGATGAGGGGGCGCAGAGGAAAAACAGGTTTTGGACCCTACATAAANAGAGAGGTTCGTAAATAGAGAGGTTCGACTCTGCATTTCCCAAATACGTAATGCTTACGGTTAAATAGAGAGGTTCGACTCTGCATTTCCCAAATACGTAATGCTTACGGTACACGACCCAAGCTCTCTGCTTGTAACTTGTTTTNGTCGCAGCTGGTCTTGCCTTTGCTGGGGCTGCTGAC

From raw sequence data… to transition probabilities

       A+    C+    G+    T+    A-    C-    G-    T-
A+    0.17  0.26  0.42  0.11  0.01  0.01  0.01  0.01
C+    0.16  0.36  0.26  0.18  0.01  0.01  0.01  0.01
G+    0.15  0.33  0.37  0.11  0.01  0.01  0.01  0.01
T+    0.07  0.35  0.37  0.17  0.01  0.01  0.01  0.01
A-    0.01  0.01  0.01  0.01  0.29  0.2   0.27  0.2
C-    0.01  0.01  0.01  0.01  0.31  0.29  0.07  0.29
G-    0.01  0.01  0.01  0.01  0.24  0.23  0.29  0.2
T-    0.01  0.01  0.01  0.01  0.17  0.23  0.28  0.28

How?
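When the state of every position is known, one simple answer is to count how often each symbol follows each other symbol and normalize each row. The sketch below does this for the plain A, C, G, T alphabet only; the function name is hypothetical. Estimating separate + and - states, as in the 8-state matrix above, would additionally need labeled regions, or an iterative method such as Baum-Welch when the labels are unknown.

```python
from collections import Counter

def estimate_transitions(sequence, alphabet="ACGT"):
    """Maximum-likelihood transition probabilities from adjacent-symbol counts."""
    counts = Counter()
    for prev, cur in zip(sequence, sequence[1:]):
        # Skip pairs that contain an ambiguous symbol such as N.
        if prev in alphabet and cur in alphabet:
            counts[(prev, cur)] += 1
    matrix = {}
    for prev in alphabet:
        row_total = sum(counts[(prev, cur)] for cur in alphabet)
        matrix[prev] = {
            cur: (counts[(prev, cur)] / row_total if row_total else 0.0)
            for cur in alphabet
        }
    return matrix

# Prefix of the raw sequence shown on the training slide.
seq = "AATAGAGAGGTTCGACTCTGCATTTCCCAAATACGTAATGCTTACGG"
probs = estimate_transitions(seq)
for prev, row in probs.items():
    print(prev, {cur: round(p, 2) for cur, p in row.items()})
```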

HMM-APPLICATION

• DNA sequence analysis
• Protein family profiling
• Predprediction
• Splicing signals prediction
• Prediction of genes
• Horizontal gene transfer
• Radiation hybrid mapping, linkage analysis
• Prediction of DNA functional sites
• CpG island

HMM-APPLICATION

• Speech Recognition
• Vehicle Trajectory Projection
• Gesture Learning for Human-Robot Interface
• Positron Emission Tomography (PET)
• Optical Signal Detection
• Digital Communications
• Music Analysis

HMM-BASED TOOLS

• GENSCAN (Burge 1997)
• FGENESH (Solovyev 1997)
• HMMgene (Krogh 1997)
• GENIE (Kulp 1996)
• GENMARK (Borodovsky & McIninch 1993)
• VEIL (Henderson, Salzberg, & Fasman 1997)

BIOINFORMATICS RESOURCES

• PROBE www.ncbi.nlm.nih.gov/
• BLOCKS www.blocks.fhcrc.org/
• META-MEME www.cse.ucsd.edu/users/bgrundy/metameme.1.0.html
• SAM www.cse.ucsc.edu/research/compbio/sam.html
• HMMER hmmer.wustl.edu/
• HMMpro www.netid.com/
• GENEWISE www.sanger.ac.uk/Software/Wise2/
• PSI-BLAST www.ncbi.nlm.nih.gov/BLAST/newblast.html
• PFAM www.sanger.ac.uk/Pfam/

References

• Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-285.
• Xiong, J. Essential Bioinformatics.
• http://www.sociable1.com/v/Andrey-Markov-108362562522144#sthash.tbdud7my.dpuf

Thank You!