
Hidden Markov Model

Prof. Yan Wang
Woodruff School of Mechanical Engineering

Georgia Institute of Technology
Atlanta, GA 30332, U.S.A.
[email protected]

Multiscale Systems Engineering Research Group

Learning Objectives

To understand the hidden Markov model as a generalization of the Markov chain

To understand the three basic problems (evaluation, decoding, and learning) in HMM construction and applications


Hidden Markov Model (HMM)

HMM is an extension of the regular Markov chain: the state variables q’s are not directly observable

All statistical inference about the Markov chain itself has to be done in terms of the observable o’s

[Figure: trellis diagram with the hidden states q1,…,qt−1, qt, qt+1,…,qT in one row and the observables o1,…,ot−1, ot, ot+1,…,oT in the other]


HMM

The o’s are conditionally independent given the state sequence {qt}.

However, {ot} is not an independent sequence, nor a Markov chain itself.



HMM Components

State sequence: Q={q1,q2,…,qT}, where each qt takes one of N possible values
Observation sequence: O={o1,o2,…,oT} with M possible symbols {v1,v2,…,vM}
State transition matrix: A=(aij)N×N where aij=P(qt+1=j|qt=i)
Observation matrix: B=(bik)N×M where bik=bi(vk)=P(ot=vk|qt=i)
Initial state distribution: Δ=(δi)1×N where δi=P(q1=i)

Model’s parameters: λ={A, B, Δ}


Three Basic Problems in HMM

Evaluation: Given a model with parameters λ and a sequence of observations O={o1,o2,…,oT}, what is the probability that the model generates those observations, P(O|λ)?

Decoding: Given a model with parameters λ and a sequence of observations O={o1,o2,…,oT}, what is the most likely state sequence Q={q1,q2,…,qT} in the model that produces the observations?

Learning: Given a set of observations O={o1,o2,…,oT}, how can we find a model with parameters λ with the maximum likelihood P(O|λ)?


Evaluation Problem

Given O={o1,o2,…,oT}, what is P(O|λ)?

Naïve algorithm: because of Lemma 3.1,

P(O|λ) = Σ_{Q(d)} P(O, Q(d) | λ) = Σ_{Q(d)} P(O | Q(d), λ) P(Q(d) | λ)

where Q(d) is one of all possible combinations of state sequences.

Assuming conditional independence between observations,

P(O | Q(d), λ) = Π_{t=1..T} P(ot | qt, λ) = Π_{t=1..T} b_qt(ot)

P(Q(d) | λ) = δ_q1 Π_{t=1..T−1} P(qt+1 | qt, λ) = δ_q1 Π_{t=1..T−1} a_{qt,qt+1}

However, the number of possible combinations of state sequences (N^T) is huge!
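The naïve evaluation above can be sketched by brute-force enumeration over all N^T state sequences. The two-state, two-symbol model below uses made-up illustration values for A, B, and Δ (not taken from the lecture):

```python
from itertools import product

# Hypothetical two-state, two-symbol HMM (illustration values only)
A = [[0.7, 0.3], [0.4, 0.6]]   # a[i][j] = P(q_{t+1}=j | q_t=i)
B = [[0.9, 0.1], [0.2, 0.8]]   # b[i][k] = P(o_t=v_k | q_t=i)
delta = [0.6, 0.4]             # delta[i] = P(q_1=i)

def naive_evaluate(obs, A, B, delta):
    """Sum P(O, Q | lambda) over every possible state sequence Q."""
    N, total = len(delta), 0.0
    for Q in product(range(N), repeat=len(obs)):
        p = delta[Q[0]] * B[Q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total

print(naive_evaluate([0, 1, 0], A, B, delta))
```

The loop visits N^T sequences, so this costs O(T·N^T); the forward algorithm on the next slide reduces the same computation to O(N²T).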


Evaluation Problem – Forward Algorithm

Define a forward variable

αt(i) = P(o1,o2,…,ot, qt=i | λ)

Initialization:

α1(i) = P(o1, q1=i | λ) = P(q1=i | λ) P(o1 | q1=i, λ) = δi bi(o1)

which can be recursively calculated by

αt+1(i) = P(o1,…,ot+1, qt+1=i | λ)
        = P(ot+1 | qt+1=i, λ) P(o1,…,ot, qt+1=i | λ)
        = P(ot+1 | qt+1=i, λ) Σ_{qt} P(qt+1=i | qt, λ) P(o1,…,ot, qt | λ)
        = bi(ot+1) Σ_{qt} a_{qt,i} αt(qt)

Termination:

P(O|λ) = Σ_{i=1..N} P(O, qT=i | λ) = Σ_{i=1..N} αT(i)
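The forward recursion above can be sketched in a few lines; the toy two-state parameters below are made-up illustration values, not from the lecture:

```python
# Hypothetical two-state, two-symbol HMM (illustration values only)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
delta = [0.6, 0.4]

def forward(obs, A, B, delta):
    """Return alpha[t][i] = P(o_1..o_{t+1}, q_{t+1}=i | lambda), 0-indexed in t."""
    N = len(delta)
    alpha = [[delta[i] * B[i][obs[0]] for i in range(N)]]        # initialization
    for t in range(1, len(obs)):
        alpha.append([B[j][obs[t]] * sum(alpha[-1][i] * A[i][j] for i in range(N))
                      for j in range(N)])                        # recursion
    return alpha

def evaluate(obs, A, B, delta):
    return sum(forward(obs, A, B, delta)[-1])                    # termination

print(evaluate([0, 1, 0], A, B, delta))
```

Each time step only needs the previous column of α, so the cost is O(N²T) rather than the naïve O(N^T).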


Evaluation Problem – Backward Algorithm

Define a backward variable

βt(i) = P(ot+1,ot+2,…,oT | qt=i, λ)

with the terminal condition βT(i) = 1, which can be recursively calculated by

βt(i) = P(ot+1,…,oT | qt=i, λ)
      = Σ_{qt+1} P(ot+1 | qt+1, λ) P(qt+1 | qt=i, λ) P(ot+2,…,oT | qt+1, λ)
      = Σ_{j=1..N} aij bj(ot+1) βt+1(j)
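The backward recursion above can be sketched the same way; the toy parameters are again made-up illustration values. As a consistency check, the likelihood can also be recovered from β as P(O|λ) = Σi δi bi(o1) β1(i):

```python
# Hypothetical two-state, two-symbol HMM (illustration values only)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
delta = [0.6, 0.4]

def backward(obs, A, B):
    """Return beta[t][i] = P(o_{t+2}..o_T | q_{t+1}=i, lambda), 0-indexed in t."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N]                                   # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        prev = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * prev[j] for j in range(N))
                        for i in range(N)])              # recursion
    return beta

obs = [0, 1, 0]
beta = backward(obs, A, B)
# Likelihood from the backward pass: P(O|lambda) = sum_i delta_i b_i(o_1) beta_1(i)
p = sum(delta[i] * B[i][obs[0]] * beta[0][i] for i in range(len(delta)))
print(p)
```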


Decoding Problem

Given O={o1,o2,…,oT}, find the state sequence Q*={q1*,q2*,…,qT*} that maximizes P(Q|O,λ) (equivalently, the joint probability P(Q,O|λ))

The Viterbi algorithm is a dynamic programming method to solve the decoding problem


Dynamic Programming

Breaking complex problems down into subproblems in a recursive manner.

e.g. searching for the shortest tour in the Traveling Salesman Problem


Decoding Problem – Viterbi Algorithm

Define an auxiliary probability

ρt(i) := max over q1,…,qt−1 of P(q1,…,qt−1, qt=i, o1,…,ot | λ)

which is the highest probability that a single path leads to qt=i at time t.

Recursively,

ρt+1(j) = maxi { ρt(i) P(qt+1=j | qt=i, λ) } P(ot+1 | qt+1=j, λ)
        = maxi { ρt(i) aij } bj(ot+1)


Decoding Problem – Viterbi Algorithm – cont’d

Define an auxiliary variable

ψt+1(j) = argmaxi { ρt(i) aij bj(ot+1) } = argmaxi { ρt(i) aij }

to store the optimal state at time t to reach state j at time t+1.

The algorithm is
1. initialize
ρ1(j) = δj bj(o1), ψ1(j) = 0, ∀j = 1,…,N
2. recursion
ρt+1(j) = maxi { ρt(i) aij } bj(ot+1)
ψt+1(j) = argmaxi { ρt(i) aij }


Decoding Problem – Viterbi Algorithm – cont’d

3. terminate
• The optimal probability: P* = maxj { ρT(j) }
• The optimal final state: qT* = argmaxj { ρT(j) }
4. backtrack state sequence
qt* = ψt+1(qt+1*)
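The four Viterbi steps above can be sketched as follows; the toy two-state parameters are made-up illustration values, not from the lecture:

```python
# Hypothetical two-state, two-symbol HMM (illustration values only)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
delta = [0.6, 0.4]

def viterbi(obs, A, B, delta):
    """Return (best path probability P*, most likely state sequence Q*)."""
    N, T = len(delta), len(obs)
    rho = [[delta[j] * B[j][obs[0]] for j in range(N)]]    # 1. initialize
    psi = [[0] * N]
    for t in range(1, T):                                  # 2. recursion
        rho_t, psi_t = [], []
        for j in range(N):
            i_best = max(range(N), key=lambda i: rho[-1][i] * A[i][j])
            psi_t.append(i_best)
            rho_t.append(rho[-1][i_best] * A[i_best][j] * B[j][obs[t]])
        rho.append(rho_t)
        psi.append(psi_t)
    q = [max(range(N), key=lambda j: rho[-1][j])]          # 3. terminate
    for t in range(T - 1, 0, -1):                          # 4. backtrack
        q.insert(0, psi[t][q[0]])
    return max(rho[-1]), q

print(viterbi([0, 1, 0], A, B, delta))
```

Because only the maximizing predecessor is kept at each step, the cost is O(N²T), the same shape as the forward pass but with max replacing the sum.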


Learning Problem

Given O={o1,o2,…,oT}, find λ*={A*, B*, Δ*} with the maximum likelihood P(O|λ)

Finding the global optimum requires searching over all possible state and observation sequences

Instead, the Baum-Welch algorithm is usually used to search heuristically

λ* = argmaxλ { P(O|λ) } = argmaxλ { log P(O|λ) }

λ* = argmaxλ { log Σ_{O(d)} Σ_{Q(d)} P(O(d), Q(d) | λ) }


Learning Problem – Baum-Welch Algorithm

The algorithm:
1. Guess some parameters λ
2. Determine some “probable paths” {Q(1),…,Q(d)}
3. Estimate the number of transitions âij from state i to state j, given the current estimate of λ
4. Estimate the number of times the observation vk is emitted from state i as b̂i(vk)
5. Estimate the initial distribution δ̂i
6. Re-estimate λ’ from the âij’s and b̂i(vk)’s
7. If λ’ and λ are close enough, stop; otherwise, assign λ = λ’ and go back to step 2


Learning Problem – Baum-Welch Algorithm – cont’d

Define an auxiliary likelihood

ξt(i,j) = P(qt=i, qt+1=j | o1,…,oT, λ)

which is the probability that a transition from qt=i to qt+1=j occurs at time t, given the complete observations {o1,o2,…,oT} and model parameters λ


Learning Problem – Baum-Welch Algorithm – cont’d

ξt(i,j) = P(qt=i, qt+1=j | o1,…,oT, λ)
        = P(qt=i, qt+1=j, o1,…,oT | λ) / P(o1,…,oT | λ)
        = P(o1,…,ot, qt=i | λ) P(qt+1=j | qt=i, λ) P(ot+1 | qt+1=j, λ) P(ot+2,…,oT | qt+1=j, λ) / P(o1,…,oT | λ)
        = αt(i) aij bj(ot+1) βt+1(j) / Σ_{i=1..N} αT(i)


Learning Problem – Baum-Welch Algorithm – cont’d

Estimate parameters:

âij = [ Σ_{t=1..T−1} P(qt=i, qt+1=j | o1,…,oT, λ) ] / [ Σ_{t=1..T−1} P(qt=i | o1,…,oT, λ) ]
    = [ Σ_{t=1..T−1} ξt(i,j) ] / [ Σ_{t=1..T−1} Σ_{k=1..N} ξt(i,k) ]

b̂i(vk) = [ Σ_{t: ot=vk} P(qt=i | o1,…,oT, λ) ] / [ Σ_{t=1..T−1} P(qt=i | o1,…,oT, λ) ]
       = [ Σ_{t: ot=vk} Σ_{j=1..N} ξt(i,j) ] / [ Σ_{t=1..T−1} Σ_{j=1..N} ξt(i,j) ]

δ̂i = Σ_{k=1..N} P(q1=i, q2=k | o1,…,oT, λ) = Σ_{k=1..N} ξ1(i,k)
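The re-estimation formulas above can be sketched as a single EM iteration: compute α and β, form ξt(i,j), then normalize the expected counts. The toy parameters and observation sequence below are made-up illustration values:

```python
# Hypothetical two-state, two-symbol HMM (illustration values only)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
delta = [0.6, 0.4]
obs = [0, 1, 0, 0, 1]

def baum_welch_step(obs, A, B, delta):
    """One re-estimation of (A, B, delta) from xi_t(i,j)."""
    N, T = len(delta), len(obs)
    # forward variables alpha_t(i)
    alpha = [[delta[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([B[j][obs[t]] * sum(alpha[-1][i] * A[i][j] for i in range(N))
                      for j in range(N)])
    # backward variables beta_t(i)
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        prev = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * prev[j] for j in range(N))
                        for i in range(N)])
    p_obs = sum(alpha[-1])
    # xi_t(i,j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[sum(xi[t][i]) for i in range(N)] for t in range(T - 1)]  # P(q_t=i | O)
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    B_new = [[sum(gamma[t][i] for t in range(T - 1) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T - 1)) for k in range(len(B[0]))]
             for i in range(N)]
    delta_new = gamma[0]
    return A_new, B_new, delta_new

A2, B2, d2 = baum_welch_step(obs, A, B, delta)
```

Each re-estimated quantity is a ratio of expected counts, so the updated rows of A and B and the updated Δ remain valid probability distributions by construction.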


Learning Problem – Baum-Welch Algorithm – cont’d

How do we measure that two models, λ’ and λ, are close enough?

p(d) = P(O(d) | λ)

Cross Entropy = Σ_x p1(x) log [ p1(x) / p2(x) ]

(i.e., the Kullback–Leibler divergence between the distributions p1 and p2)
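The closeness test above can be sketched numerically. Here p1 and p2 are hypothetical stand-ins for the sequence probabilities p(d) computed under λ and the re-estimated λ’:

```python
import math

def cross_entropy_distance(p1, p2):
    """Sum_x p1(x) * log(p1(x)/p2(x)) -- the KL divergence used as a stopping test."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2) if a > 0)

p_old = [0.5, 0.3, 0.2]     # hypothetical sequence probabilities under lambda
p_new = [0.48, 0.32, 0.20]  # ...and under the re-estimated lambda'
d = cross_entropy_distance(p_old, p_new)
print(d)
```

The distance is zero when the two models assign identical probabilities and small when they are close, so iteration can stop once it falls below a threshold.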


HMM Applications

HMM has been applied in many fields that are based on the analysis of discrete-valued time series, such as

speech recognition (Rabiner, 1989)

image recognition

genetic profiling and classification (Eddy, 1998)


Kalman Filter

Can be regarded as a special case of HMM

Also known as the Gaussian linear state-space model

The states depend linearly on their history, subject to white process noise.

The observations also depend linearly on the states, subject to measurement noise.

xt = C xt−1 + wt

yt = D xt + vt
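A minimal simulation sketch of the state-space equations above; the scalar C, D, and noise levels are made-up illustration values:

```python
import random

def simulate(C, D, x0, T, w_std, v_std, seed=0):
    """Simulate x_t = C x_{t-1} + w_t, y_t = D x_t + v_t for t = 1..T (scalar case)."""
    rng = random.Random(seed)
    x, ys = x0, []
    for _ in range(T):
        x = C * x + rng.gauss(0.0, w_std)        # process (state) update
        ys.append(D * x + rng.gauss(0.0, v_std))  # noisy measurement
    return ys

ys = simulate(C=0.9, D=2.0, x0=1.0, T=5, w_std=0.1, v_std=0.05)
print(ys)
```

With the noise levels set to zero the recursion is deterministic, yt = D·C^t·x0, which mirrors the HMM analogy: the state recursion plays the role of the transition matrix A and the measurement equation plays the role of the observation matrix B.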


Summary

The hidden Markov model is a generalization of the Markov chain


Further Readings

Rabiner L.R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2): 257-286
Eddy S.R. (1998) Profile hidden Markov models. Bioinformatics, 14(9): 755-763
Dempster A.P., Laird N.M., and Rubin D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1-38
Wu C.F.J. (1983) On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1): 95-103

HMM Software Packages: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html

