Page 1: Hidden Markov Models (source: read.pudn.com/.../555264/lecture9-Hidden-Markov-Models.pdf, 2008-04-12)

Hidden Markov Models

Reference: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, 1989

Page 2: Outline
• Introduction
• Markov Models
• Hidden Markov Models
• Forward/Backward Algorithms
• Viterbi Algorithm
• Baum-Welch Estimation Algorithm

Page 3: Introduction
• Input consists of a sequence of signals
• Types of signal models
  • Deterministic models: sine wave, sum of exponentials
  • Statistical models: Gaussian process, Markov process, hidden Markov process
• Examples of applications
  • Speech recognition
  • Word-sense disambiguation
  • DNA sequence modeling
  • Text modeling and information extraction

Page 4: Markov Models

Page 5: Markov Models
• States are observable

Page 6: Markov Models

Page 7: Weather Model
• States: R (Rainy), C (Cloudy), S (Sunny)
• State transition probability matrix
• What is the probability of observing O = SSRRSCS given that today is S?

Page 8: Weather Model
• Basic rule: P(A, B) = P(A|B) P(B)
• Markov chain rule:
  P(q1, q2, …, qT) = P(q1) P(q2|q1) P(q3|q2) ⋯ P(qT|qT-1)

Page 9: Weather Model
• Observation sequence O = (S, S, S, R, R, S, C, S)
• By the chain rule (with day 1 being S):
  P(O|Model) = πS · aSS · aSS · aSR · aRR · aRS · aSC · aCS
• Initial probability: πi = P(q1 = i)
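The chain-rule product above can be evaluated directly. A minimal sketch, using the weather transition matrix from Rabiner's tutorial (the reference cited on the title slide; the matrix values are his, not stated on the surviving slides):

```python
# Chain-rule probability of a weather sequence under a first-order Markov
# model.  Transition matrix from Rabiner's 1989 tutorial:
# states R = rainy, C = cloudy, S = sunny.
A = {
    'R': {'R': 0.4, 'C': 0.3, 'S': 0.3},
    'C': {'R': 0.2, 'C': 0.6, 'S': 0.2},
    'S': {'R': 0.1, 'C': 0.1, 'S': 0.8},
}

def sequence_probability(states, A):
    """P(q2, ..., qT | q1): product of the transition probabilities."""
    p = 1.0
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

O = "SSSRRSCS"                      # today is S, then the 7 observed days
p = sequence_probability(O, A)
print(p)                            # ≈ 1.536e-4
```

With these numbers the product is 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2 ≈ 1.536 × 10⁻⁴, matching the worked example in the tutorial.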

Page 10: Duration
• What is the probability that the sequence remains in state i for exactly d time units?
• The duration density is exponential (geometric in discrete time)
• Expected value of the duration d in state i
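The formulas elided from this slide can be reconstructed from the Rabiner tutorial cited above: staying in state i for exactly d steps means d − 1 self-transitions followed by one departure, so

```latex
p_i(d) = (a_{ii})^{d-1}\,(1 - a_{ii}),
\qquad
\bar{d}_i = \sum_{d=1}^{\infty} d\, p_i(d) = \frac{1}{1 - a_{ii}}
```

which is the geometric (discrete "exponential") duration density and its mean.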

Page 11: Duration

Page 12: Hidden Markov Models
• States are not observable
• Observations are probabilistic functions of states
• State transitions are still probabilistic

Page 13: Coin Toss Models
• Scenario: you are in a room with a barrier through which you cannot see what is happening. On the other side of the barrier another person is performing a coin-tossing (or multiple-coin-tossing) experiment. He will only tell you the result of each coin flip.
• The problem: how do we build a model to explain the observed sequence of heads and tails?

Page 14: Coin Toss Models
• Observation: a sequence of heads and tails
• Build an HMM to explain the observed sequence
  • What do the states correspond to?
  • How many states? (How many coins?)
  • What are the parameters?

Page 15: One-Coin Model
• Each state corresponds to a side of the coin (observation generator)
• An observable Markov model
• Corresponds to a 1-state HMM

Page 16: Two-Coin Model
• Each state corresponds to a biased coin
• A hidden Markov model
• Transition probabilities are estimated from a set of independent coin tosses

Page 17: Three-Coin Model

Page 18: Model Selection
• Which model best matches the actual observations?
  • 1-coin model: 1 unknown parameter
  • 2-coin model: 4 unknown parameters
  • 3-coin model: 9 unknown parameters
• Larger HMMs will match better than smaller HMMs
• Impose strong limitations on the size of models

Page 19: Urn and Ball Model

Page 20: Headers of Scientific Papers
• Citation index
• Citation database
• Each state corresponds to one component of the paper header
• Application: information extraction

Page 21: DNA Sequence Modeling
• Each state corresponds to one position
• Application: profile HMM

Page 22: Elements of an HMM
• Q = {1, 2, …, N}: set of hidden states
• V = {1, 2, …, M}: set of observation symbols
• A: state transition probability matrix, aij = P(qt+1 = j | qt = i)
• B: observation symbol probability distribution, bj(k) = P(ot = k | qt = j)
• π: initial state distribution, πi = P(q1 = i)
• λ: the entire model, λ = (A, B, π)

Page 23: Sequence Generator
To generate a sequence of T observations O = (o1, o2, …, oT):
1. Choose an initial state q1 = Si according to the initial state distribution π, and set t = 1
2. Choose ot = vk according to the symbol probability distribution in state Si, i.e. bi(k)
3. Transit to a new state qt+1 = Sj according to the state transition probability distribution for state Si, i.e. aij
4. Set t = t + 1; go to step 2 if t < T; otherwise terminate the procedure
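The four steps above can be sketched directly in code. The model parameters here are toy values invented for illustration:

```python
import random

# Sample an observation sequence from a generic discrete HMM
# lambda = (A, B, pi), following steps 1-4 of the slide.
def generate(A, B, pi, T, rng):
    """Return (states, observations) of length T sampled from lambda."""
    states, obs = [], []
    # Step 1: choose the initial state according to pi
    q = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(q)
        # Step 2: emit a symbol according to row q of B
        obs.append(rng.choices(range(len(B[q])), weights=B[q])[0])
        # Step 3: move to the next state according to row q of A
        q = rng.choices(range(len(A[q])), weights=A[q])[0]
    return states, obs

A  = [[0.7, 0.3], [0.4, 0.6]]   # toy state transition matrix
B  = [[0.9, 0.1], [0.2, 0.8]]   # toy emission matrix (2 symbols)
pi = [0.6, 0.4]                 # toy initial state distribution
states, obs = generate(A, B, pi, 10, random.Random(0))
```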

Page 24: Execution of an HMM
A state sequence corresponds to a path in the trellis (grid)

Page 25: Three Basic Problems
• Compute the probability that the model generates the observation sequence
• Find the optimal state sequence that generates the observation sequence
• Learn an HMM that best fits the observation sequences

Page 26: Basic Problem 1
• Given observation O = (o1, o2, …, oT) and model λ = (A, B, π), efficiently compute P(O|λ)
• P(O|λ) is the probability that O is produced by λ
• Hidden states complicate the probability evaluation
• Given two models λ1 and λ2, the probability (score) can be used to choose the better one
  • λi models some protein family; O denotes a protein; find the most probable protein family for O
  • speech recognition
  • on-line handwritten character recognition

Page 27: Basic Problem 2
• Given observation O = (o1, o2, …, oT) and model λ = (A, B, π), find the optimal state sequence q = (q1, q2, …, qT)
• Uncovers the hidden part of the model
• An optimality criterion has to be decided (e.g. maximum likelihood)
• Finds an "explanation" for the data
  • O is the header of some scientific paper; find the title, author, publication date, … of the paper
  • a fundamental problem in citation index generation
  • word-sense disambiguation, gene finding

Page 28: Basic Problem 3
• Given observations O = (o1, o2, …, oT), estimate the model parameters λ = (A, B, π) that maximize P(O|λ)
• Trains the model
  • find the best topology
  • find the best parameters

Page 29: Word Speech Recognizer
• The speech signal of each word is represented as a time sequence of coded spectral vectors
• Build an HMM for each word; the training data consists of codebook sequences from one or more talkers
• An unknown word is recognized by choosing the word whose model scores highest (i.e. gives the highest likelihood)

Page 30: Solution to Problem 1
• Problem: compute P(o1, o2, …, oT | λ)
• Consider a state sequence q = (q1, q2, …, qT)
• Assume observations are independent given the states:
  • P(O|q, λ) = Πt=1..T P(ot|qt, λ) = bq1(o1) bq2(o2) ⋯ bqT(oT)
  • P(q|λ) = πq1 aq1q2 aq2q3 ⋯ aqT-1qT
  • P(O|λ) = Σq P(O|q, λ) P(q|λ)
• N^T state sequences, each costing O(T) time
• Complexity: O(T·N^T); for N = 5, T = 100, T·N^T = 100 × 5^100 ≈ 10^72

Page 31: Forward Algorithm: Intuition

αt(i) = P(o1, o2, …, ot, qt = i | λ): the probability of observing the partial sequence (o1, o2, …, ot) such that state qt is i

Page 32: Forward Algorithm
• Forward variable: αt(i) = P(o1, o2, …, ot, qt = i | λ)
• αt(i) is the probability of observing the partial sequence (o1, o2, …, ot) such that state qt is i
• Initialization: α1(i) = πi bi(o1)
• Induction: αt+1(j) = [ Σi=1..N αt(i) aij ] bj(ot+1)
• Termination: P(O|λ) = Σi=1..N αT(i)
• Complexity: O(N²T)
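The three stages of the forward algorithm map directly onto a few lines of code. A minimal sketch with invented toy parameters:

```python
# Forward algorithm for a discrete HMM lambda = (A, B, pi).
def forward(A, B, pi, obs):
    """Return P(O | lambda) in O(N^2 T) time."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O|lambda) = sum_i alpha_T(i)
    return sum(alpha)

A  = [[0.7, 0.3], [0.4, 0.6]]   # toy parameters, for illustration only
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 1, 0]
print(forward(A, B, pi, obs))
```

On a model this small the result can be checked against the O(T·N^T) brute-force sum over all state sequences from Page 30.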

Page 33: Backward Algorithm: Intuition

βt(i) = P(ot+1, ot+2, …, oT | qt = i, λ): the probability of observing the partial sequence (ot+1, ot+2, …, oT) given that state qt is i

Page 34: Backward Algorithm
• Backward variable: βt(i) = P(ot+1, ot+2, …, oT | qt = i, λ)
• βt(i) is the probability of observing the partial sequence (ot+1, ot+2, …, oT) given that state qt is i
• Initialization: βT(i) = 1
• Induction: βt(i) = Σj=1..N aij bj(ot+1) βt+1(j)
• Termination: P(O|λ) = Σi=1..N πi bi(o1) β1(i)
• Complexity: O(N²T)
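The backward recursion runs the same computation from the other end of the sequence. A minimal sketch, with the same invented toy parameters:

```python
# Backward algorithm for a discrete HMM lambda = (A, B, pi).
def backward(A, B, pi, obs):
    """Return P(O | lambda) via the backward recursion."""
    N = len(pi)
    # Initialization: beta_T(i) = 1
    beta = [1.0] * N
    # Induction: beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(N))
                for i in range(N)]
    # Termination: P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(N))

A  = [[0.7, 0.3], [0.4, 0.6]]   # toy parameters, for illustration only
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 1, 0]
print(backward(A, B, pi, obs))
```

It should return the same P(O|λ) as the forward algorithm and the brute-force enumeration.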

Page 35: Combining Forward and Backward

P(O, qt = i | λ)
  = P(o1, …, oT, qt = i | λ)
  = P(o1, …, ot, qt = i | λ) · P(ot+1, …, oT | qt = i, o1, …, ot, λ)
  = P(o1, …, ot, qt = i | λ) · P(ot+1, …, oT | qt = i, λ)   (by the Markov property)
  = αt(i) βt(i)

P(O|λ) = Σi=1..N αt(i) βt(i),  for any 1 ≤ t ≤ T

Page 36: Solution to Problem 2
• Find the most likely path
• Find the path that maximizes the likelihood P(q1, q2, …, qT | O, λ), which is equivalent to maximizing P(q1, q2, …, qT, O | λ)
• Define δt(i) = max over q1, …, qt-1 of P(q1, …, qt-1, qt = i, o1, …, ot | λ)
• δt(i) is the probability of the highest-probability path ending in state i at time t; by induction,
  δt+1(j) = [ maxi δt(i) aij ] · bj(ot+1)

Page 37: Viterbi Algorithm

δt(i) = max over q1, …, qt-1 of P(q1, …, qt-1, qt = i, o1, …, ot | λ)

δt+1(j) = [ maxi δt(i) aij ] · bj(ot+1)

P* = max over 1 ≤ i ≤ N of δT(i)

Page 38: Viterbi Algorithm
• Initialization: δ1(i) = πi bi(o1), ψ1(i) = 0
• Recursion: δt(j) = max1≤i≤N [ δt-1(i) aij ] bj(ot), ψt(j) = argmax1≤i≤N [ δt-1(i) aij ]
• Termination: P* = max1≤i≤N δT(i), qT* = argmax1≤i≤N δT(i)
• Path (state sequence) backtracking: qt* = ψt+1(qt+1*), for t = T−1, T−2, …, 1
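The recursion plus backtracking can be sketched as follows, again with invented toy parameters:

```python
# Viterbi algorithm with backtracking for a discrete HMM lambda = (A, B, pi).
def viterbi(A, B, pi, obs):
    """Return (P*, most likely state sequence) for the observations."""
    N = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    psi = []
    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) a_ij] * b_j(o_t)
    for o in obs[1:]:
        scores = [[delta[i] * A[i][j] for i in range(N)] for j in range(N)]
        psi.append([max(range(N), key=lambda i: scores[j][i]) for j in range(N)])
        delta = [max(scores[j]) * B[j][o] for j in range(N)]
    # Termination: P* = max_i delta_T(i); then backtrack through psi
    q = max(range(N), key=lambda i: delta[i])
    best, path = delta[q], [q]
    for back in reversed(psi):
        q = back[q]
        path.append(q)
    path.reverse()
    return best, path

A  = [[0.7, 0.3], [0.4, 0.6]]   # toy parameters, for illustration only
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
print(viterbi(A, B, pi, [0, 1, 1, 0]))   # P* ≈ 0.0224, path [0, 1, 1, 0]
```

The structure mirrors the forward algorithm, with max replacing the sum and a ψ table recording the argmax for backtracking.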

Page 39: Solution to Problem 3
• Estimate λ = (A, B, π) to maximize P(O|λ)
• No analytic method because of the complexity; an iterative method is used
• ξt(i, j) is the probability of being in state i at time t and in state j at time t+1:

ξt(i, j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ)
         = αt(i) aij bj(ot+1) βt+1(j) / [ Σk=1..N Σl=1..N αt(k) akl bl(ot+1) βt+1(l) ]

Page 40: Expectation Maximization
• a'ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
• b'j(k) = (expected number of times in state j observing symbol k) / (expected number of times in state j)
• P(O|λ') ≥ P(O|λ)

Page 41: Expectation Maximization

ξt(i, j) = P(qt = i, qt+1 = j | O, λ) = P(qt = i, qt+1 = j, O | λ) / P(O|λ)

a'ij = [ Σt=1..T-1 ξt(i, j) ] / [ Σt=1..T-1 Σk=1..N ξt(i, k) ]
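One ξ-based re-estimation step for A can be sketched as follows. The parameters are toy values invented for illustration, and the π and B updates are omitted to keep the example short:

```python
# One Baum-Welch re-estimation step for the transition matrix of a
# discrete HMM lambda = (A, B, pi).
def forward_backward(A, B, pi, obs):
    """Return the full alpha and beta tables."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]       # beta_T(i) = 1
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(N))
    return alpha, beta

def reestimate_A(A, B, pi, obs):
    """a'_ij = sum_t xi_t(i,j) / sum_t sum_k xi_t(i,k)."""
    N, T = len(pi), len(obs)
    alpha, beta = forward_backward(A, B, pi, obs)
    p_obs = sum(alpha[T-1])                    # P(O|lambda)
    num = [[0.0] * N for _ in range(N)]
    for t in range(T - 1):
        for i in range(N):
            for j in range(N):
                # xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
                num[i][j] += alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / p_obs
    return [[num[i][j] / sum(num[i]) for j in range(N)] for i in range(N)]

A  = [[0.7, 0.3], [0.4, 0.6]]   # toy parameters, for illustration only
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
A2 = reestimate_A(A, B, pi, [0, 1, 1, 0, 0, 1])
```

Each row of the re-estimated matrix sums to 1, and by the EM argument the updated model cannot decrease P(O|λ).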

Page 42: Part-of-Speech Tagging

Page 43: POS Tagging
• Labeling each word in a sentence with its appropriate part of speech, e.g. noun, verb, adjective, …
• The-AT representative-NN put-VBD chairs-NNS on-IN the-AT table-NN.
• The-AT representative-JJ put-NN chairs-VBZ on-IN the-AT table-NN.

Page 44: Information Sources in Tagging
• Context information
  • a new play
  • play football
• Syntagmatic information
  • AT JJ NN is common
  • AT JJ VBP is extremely rare
• Lexical information
  • the tag distribution of a word is extremely uneven
  • basic tag vs. derived tags
  • a dumb tagger achieves 90% accuracy

Page 45: Summary
• A rule-based tagger using syntagmatic patterns achieves about 77% accuracy (Greene and Rubin, 1971)
• A dumb tagger (basic tag) achieves 90% (Charniak, 1993)
• An HMM tagger achieves about 97%

Page 46: HMM Taggers
• States of the HMM are tags
• Transition probability: P(tk|tj) = C(tj, tk) / C(tj)
• Emission probability: P(wl|tj) = C(wl, tj) / C(tj)
• Tag sequence: argmax over t1,n of P(t1,n | w1,n)
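These count-ratio estimates can be sketched as follows. The tiny tagged corpus below is invented for illustration (it reuses words and tags from the tables on the next slides, not their actual Brown-corpus counts):

```python
from collections import Counter

# Maximum-likelihood estimation of HMM-tagger parameters from tagged text.
corpus = [
    [("the", "AT"), ("bear", "NN"), ("is", "BEZ"), ("on", "IN"),
     ("the", "AT"), ("move", "NN")],
    [("the", "AT"), ("president", "NN"), ("is", "BEZ"), ("in", "IN"),
     ("progress", "NN")],
]

tag_count = Counter()     # C(t_j)
trans_count = Counter()   # C(t_j, t_k)
emit_count = Counter()    # C(w_l, t_j)
for sent in corpus:
    for word, tag in sent:
        tag_count[tag] += 1
        emit_count[(word, tag)] += 1
    for (_, t1), (_, t2) in zip(sent, sent[1:]):
        trans_count[(t1, t2)] += 1

def p_trans(tk, tj):
    """P(t_k | t_j) = C(t_j, t_k) / C(t_j)."""
    return trans_count[(tj, tk)] / tag_count[tj]

def p_emit(w, tj):
    """P(w_l | t_j) = C(w_l, t_j) / C(t_j)."""
    return emit_count[(w, tj)] / tag_count[tj]

print(p_trans("NN", "AT"))   # in this toy corpus AT is always followed by NN
print(p_emit("the", "AT"))
```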

Page 47: Transition Probability
Tag-transition counts C(tj, tk) (rows: tj, columns: tk), with P(tk|tj) = C(tj, tk) / C(tj):

tj \ tk     AT    BEZ     IN     NN    VB    PRD
AT           0      0      0  48636     0     19
BEZ       1973      0    426    187     0     38
IN       43322      0   1325  17314     0    185
NN        1067   3720  42470  11773   614  21392
VB        6072     42   4758   1476   129   1522
PRD       8016     75   4656   1329   954      0

Page 48: Emission Probability
Word-tag counts C(wl, tj) (rows: wl, columns: tj), with P(wl|tj) = C(wl, tj) / C(tj):

wl \ tj      AT    BEZ    IN    NN    VB   PRD
bear          0      0     0    10    43     0
is            0  10065     0     0     0     0
move          0      0     0    36   133     0
on            0      0  5484     0     0     0
president     0      0     0   382     0     0
progress      0      0     0   108     4     0
the       69016      0     0     0     0     0

