Page 1:

Doug Downey, adapted from Bryan Pardo, Northwestern University

Machine Learning

Hidden Markov Models

Page 2:

The Markov Property

A stochastic process has the Markov property if the conditional probability of future states of the process depends only upon the present state.

i.e. what I’m likely to do next depends only on where I am now, NOT on how I got here.

P(q_t | q_{t-1}, …, q_1) = P(q_t | q_{t-1})
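
To make the property concrete, here is a minimal sketch (a hypothetical two-state weather chain, not from the slides) of sampling a process whose next state depends only on the current state:

```python
import random

# Hypothetical illustration: a 2-state Markov chain. The next state is drawn
# from a distribution conditioned ONLY on the current state -- exactly the
# Markov property P(q_t | q_{t-1}, ..., q_1) = P(q_t | q_{t-1}).
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state):
    """Sample the next state given only the current one."""
    nxt = TRANSITIONS[state]
    return random.choices(list(nxt), weights=list(nxt.values()))[0]

state = "sunny"
path = [state]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)   # how we got here never enters the computation
```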

Which processes have the Markov property?

[Diagram: a Markov chain with states 1, 2, …, K]

Page 3:

Markov model for Dow Jones

Page 4:

The Dishonest Casino

A casino has two dice:

• Fair die: P(1) = P(2) = … = P(5) = P(6) = 1/6

• Loaded die: P(1) = P(2) = … = P(5) = 1/10; P(6) = 1/2

I think the casino switches back and forth between the fair and loaded die once every 20 turns, on average.

Page 5:

My dishonest casino model

[Diagram: two states, FAIR and LOADED. Self-loops: FAIR → FAIR 0.95, LOADED → LOADED 0.95. Cross transitions: FAIR → LOADED 0.05, LOADED → FAIR 0.05.]

Emissions from FAIR: P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6

Emissions from LOADED: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10; P(6|L) = 1/2

This is a hidden Markov model (HMM)
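
As a concrete encoding, here is a minimal sketch of this model in Python (the names STATES, A, B are my own; the slide gives no starting distribution, so the sampler simply assumes the first die is fair):

```python
import random

# The dishonest casino HMM from the slide: hidden states FAIR/LOADED,
# die rolls 1-6 as observations. Names and start state are assumptions.
STATES = ["FAIR", "LOADED"]
A = {"FAIR":   {"FAIR": 0.95, "LOADED": 0.05},    # transition probabilities
     "LOADED": {"FAIR": 0.05, "LOADED": 0.95}}
B = {"FAIR":   {r: 1/6 for r in range(1, 7)},     # emission probabilities
     "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def sample(n, start="FAIR"):
    """Generate n rolls; the state path stays hidden from the observer."""
    state, rolls = start, []
    for _ in range(n):
        faces = B[state]
        rolls.append(random.choices(list(faces), weights=list(faces.values()))[0])
        state = random.choices(STATES, weights=[A[state][s] for s in STATES])[0]
    return rolls

print(sample(20))   # an observer sees only the rolls, never FAIR/LOADED
```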

Page 6:

Elements of a Hidden Markov Model

• A finite set of states Q = { q_1, …, q_K }

• A set of transition probabilities between states, A … each a_ij in A is the probability of going from state i to state j

• A set of starting probabilities Π = { π_1, …, π_K } … each π_k in Π is the probability of starting in state k

• A set of emission probabilities, B … where each b_i(o_j) in B is the probability of observing output o_j when in state i

Page 7:

My dishonest casino model

[Diagram repeated: FAIR and LOADED with self-loops 0.95 and cross transitions 0.05]

This is a HIDDEN Markov model because the states are not directly observable.

If the fair die were red and the unfair die were blue, then the Markov model would NOT be hidden.

Page 8:

HMMs are good for…

• Speech recognition
• Gene sequence matching
• Text processing
 – Part-of-speech tagging
 – Information extraction
 – Handwriting recognition

Page 9:

The Three Basic Problems for HMMs

• Given: an observation sequence O = (o_1 o_2 … o_T) of events from the alphabet Σ, and an HMM model λ = (A, B, Π)…

• Problem 1 (Evaluation): What is P(O|λ), the probability of the observation sequence, given the model?

• Problem 2 (Decoding): What sequence of states Q = (q_1 q_2 … q_T) best explains the observations?

• Problem 3 (Learning): How do we adjust the model parameters λ = (A, B, Π) to maximize P(O|λ)?

Page 10:

The Evaluation Problem

• Given an observation sequence O and an HMM λ, compute P(O|λ)

• Helps us pick which model is the best one

[Diagram: Model 1 — FAIR and LOADED with self-loops 0.95 and cross transitions 0.05]

[Diagram: Model 2 — self-loops FAIR → FAIR 0.05 and LOADED → LOADED 0.95; cross transitions FAIR → LOADED 0.95 and LOADED → FAIR 0.05]

O = 1,6,6,2,6,3,6,6

Page 11:

Computing P(O|λ)

• Naïve: try every path through the model

• Sum the probabilities of all possible paths

• This can be intractable: O(N^T)

• What we do instead:
 – the Forward Algorithm: O(N²T)

[Diagram repeated: Model 2 from the previous slide]
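
To see where the O(N^T) comes from, here is a sketch of the brute-force sum the Forward Algorithm replaces (the function and parameter names are assumptions, not from the slides):

```python
from itertools import product

# Naive evaluation: enumerate every length-T state path, multiply out its
# joint probability with the observations, and sum. With N states this
# loops N**T times -- hence intractable for long sequences.
def naive_evaluate(obs, states, pi, a, b):
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
    return total
```

The Forward Algorithm computes the same number in O(N²T) by sharing partial sums across paths.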

Page 12:

The Forward Algorithm

Page 13:

The inductive step

• Computation of α_t(j) by summing all previous values α_{t-1}(i), for all i:

α_t(j) = [ Σ_{i=1..N} α_{t-1}(i) · a_ij ] · b_j(o_t)

[Diagram: each hidden state i at time t−1 contributes α_{t-1}(i) times the transition probability a_ij to α_t(j)]

Page 14:

Forward Algorithm Example

Observation sequence = 1,6,6,2

Model λ:

P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6

P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10; P(6|L) = 1/2

Start probabilities: P(fair) = 0.7, P(loaded) = 0.3

[Diagram: Model 2 — self-loops FAIR → FAIR 0.05 and LOADED → LOADED 0.95; cross transitions FAIR → LOADED 0.95 and LOADED → FAIR 0.05]

State 1 = fair, State 2 = loaded. Trellis of forward values α_t(i), with the start probabilities filling the first column:

α_1(1) = 0.7 · 1/6        α_1(2) = 0.3 · 1/10        (o_1 = 1)

α_2(1) = [α_1(1)·0.05 + α_1(2)·0.05] · 1/6        α_2(2) = [α_1(1)·0.95 + α_1(2)·0.95] · 1/2        (o_2 = 6)

α_3(1) = [α_2(1)·0.05 + α_2(2)·0.05] · 1/6        α_3(2) = [α_2(1)·0.95 + α_2(2)·0.95] · 1/2        (o_3 = 6)

α_4(1) = [α_3(1)·0.05 + α_3(2)·0.05] · 1/6        α_4(2) = [α_3(1)·0.95 + α_3(2)·0.95] · 1/10        (o_4 = 2)
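
A minimal sketch of the Forward Algorithm under my own encoding of this example (the names PI, A, B, STATES are assumptions; the transition probabilities are the second model from Page 10):

```python
# Forward algorithm on the slide's example: O = (1, 6, 6, 2).
STATES = ["FAIR", "LOADED"]
PI = {"FAIR": 0.7, "LOADED": 0.3}                 # start probabilities
A = {"FAIR":   {"FAIR": 0.05, "LOADED": 0.95},    # transition probabilities
     "LOADED": {"FAIR": 0.05, "LOADED": 0.95}}
B = {"FAIR":   {r: 1/6 for r in range(1, 7)},     # emission probabilities
     "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def forward(obs):
    """Return the trellis of alpha values, one dict per time step."""
    alpha = [{s: PI[s] * B[s][obs[0]] for s in STATES}]        # base case
    for o in obs[1:]:                                          # inductive step
        alpha.append({j: sum(alpha[-1][i] * A[i][j] for i in STATES) * B[j][o]
                      for j in STATES})
    return alpha

trellis = forward([1, 6, 6, 2])
print(sum(trellis[-1].values()))   # P(O | lambda): sum of the last column
```

Running this under each candidate model and comparing the resulting P(O|λ) values is exactly how the evaluation problem on Page 10 picks the better model.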

Page 15:

Markov model for Dow Jones

Page 16:

Forward trellis for Dow Jones

Page 17:

The Decoding Problem

• What sequence of states Q = (q_1 q_2 … q_T) best explains the observation sequence O = (o_1 o_2 … o_T)?

• Helps us find the path through a model.

ART N V ADV

The dog sat quietly

Page 18:

The Decoding Problem

What sequence of states Q = (q_1 q_2 … q_T) best explains the observation sequence O = (o_1 o_2 … o_T)?

• Viterbi decoding:
 – a slight modification of the forward algorithm
 – the major difference is the maximization over previous states

Note: the most likely state sequence is not the same as the sequence of most likely states.

Page 19:

The Viterbi Algorithm

Page 20:

The Forward inductive step

• Computation of α_t(j):

α_t(j) = [ Σ_{i=1..N} α_{t-1}(i) · a_ij ] · b_j(o_t)

[Diagram: trellis columns at times t−1 and t; the α_{t-1}(i) values feed into α_t(j)]

Page 21:

The Viterbi inductive step

• Computation of v_t(j):

v_t(j) = max_i [ v_{t-1}(i) · a_ij ] · b_j(o_t)

[Diagram: trellis columns at times t−1 and t; the v_{t-1}(i) values feed into v_t(j)]

Keep track of which state was the predecessor at each step.
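
A sketch of Viterbi decoding under the same assumed encoding (it reuses PI, A, B, STATES from the Forward sketch on Page 14): the recursion is identical except that the sum over predecessors becomes a max, plus back-pointers:

```python
def viterbi(obs):
    """Most likely hidden state sequence for obs (max in place of sum)."""
    v = [{s: PI[s] * B[s][obs[0]] for s in STATES}]    # base case
    back = []                                          # predecessor pointers
    for o in obs[1:]:
        col, ptr = {}, {}
        for j in STATES:
            # maximize over previous states instead of summing over them
            best = max(STATES, key=lambda i: v[-1][i] * A[i][j])
            col[j] = v[-1][best] * A[best][j] * B[j][o]
            ptr[j] = best
        v.append(col)
        back.append(ptr)
    # trace the pointers back from the best final state
    path = [max(STATES, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi([1, 6, 6, 2]))   # the best single path, not per-step argmaxes
```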

Page 22:

Viterbi for Dow Jones

Page 23:

The Learning Problem

• Given O, how do we adjust the model parameters λ = (A, B, Π) to maximize P(O|λ)?

• In other words: how do we make a hidden Markov model that best models what we observe?

Page 24:

Baum-Welch Local Maximization

• 1st step: You determine
 – the number of hidden states, N
 – the emission (observation) alphabet

• 2nd step: Randomly assign values to…
 A – the transition probabilities
 B – the observation (emission) probabilities
 Π – the starting state probabilities

• 3rd step: Let the machine re-estimate A, B, Π

Page 25:

Estimation Formulae

π̄_i = expected frequency of state i at time 1

ā_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)

b̄_j(k) = (expected number of observations of symbol k in state j) / (expected number of times in state j)

Page 26:

Learning transitions…

ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ)

Page 27:

Math…

ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ)

         = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)

         = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / [ Σ_{k=1..N} Σ_{l=1..N} α_t(k) · a_kl · b_l(o_{t+1}) · β_{t+1}(l) ]

Page 28:

Estimation of starting probs.

π̄_i = expected frequency of state i at time 1 = γ_1(i)

where

γ_t(i) = Σ_{j=1..N} ξ_t(i, j)

(This is the expected number of transitions from state i at time t, i.e. the probability of being in state i at time t.)

Page 29:

Estimation Formulae

ā_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)

     = Σ_{t=1..T−1} ξ_t(i, j) / Σ_{t=1..T−1} γ_t(i)

Page 30:

Estimation Formulae

b̄_j(k) = (expected number of observations of symbol v_k in state j) / (expected number of times in state j)

        = [ Σ_{t=1..T, o_t = v_k} γ_t(j) ] / [ Σ_{t=1..T} γ_t(j) ]
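
Putting the formulas from Pages 26–30 together, here is a sketch of one Baum-Welch re-estimation pass (names are assumptions; it reuses PI, A, B, STATES and forward() from the earlier sketches, and backward() is the mirror-image recursion the ξ formula needs):

```python
def backward(obs):
    """beta_t(i): probability of the observations after time t, given state i."""
    beta = [{s: 1.0 for s in STATES}]                  # base case at t = T
    for o in reversed(obs[1:]):
        beta.insert(0, {i: sum(A[i][j] * B[j][o] * beta[0][j] for j in STATES)
                        for i in STATES})
    return beta

def baum_welch_step(obs):
    """One re-estimation of (pi, A, B); assumes T >= 2 and no zero denominators."""
    alpha, beta = forward(obs), backward(obs)
    p_obs = sum(alpha[-1].values())                    # P(O | lambda)
    # xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = [{i: {j: alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
               for j in STATES} for i in STATES}
          for t in range(len(obs) - 1)]
    # gamma[t][i] = P(q_t = S_i | O, lambda)
    gamma = [{i: alpha[t][i] * beta[t][i] / p_obs for i in STATES}
             for t in range(len(obs))]
    new_pi = {i: gamma[0][i] for i in STATES}          # expected freq. at t = 1
    new_a = {i: {j: sum(x[i][j] for x in xi) / sum(g[i] for g in gamma[:-1])
                 for j in STATES} for i in STATES}
    new_b = {j: {k: sum(g[j] for g, o in zip(gamma, obs) if o == k) /
                    sum(g[j] for g in gamma)
                 for k in range(1, 7)}                 # die faces as alphabet
             for j in STATES}
    return new_pi, new_a, new_b
```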

Page 31:

What are we maximizing again?

The current model is λ = (A, B, Π).

Our re-estimated model is λ̄ = (Ā, B̄, Π̄).

Page 32:

The game is…

• EITHER the current model is at a local maximum and…

reestimate = current model

• OR our reestimate will be slightly better and…

reestimate != current model

• SO we feed in the reestimate as the current model, over and over until we can’t improve any more.
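
A sketch of that loop (the convergence test on P(O|λ) is my assumption; the slides say only "until we can't improve any more"):

```python
def train(obs, tol=1e-9, max_iters=200):
    """Feed the reestimate back in as the current model until P(O|lambda) stalls."""
    global PI, A, B                          # the sketches above read these
    prev = 0.0
    for _ in range(max_iters):
        PI, A, B = baum_welch_step(obs)      # reestimate becomes current model
        p = sum(forward(obs)[-1].values())   # P(O | current lambda)
        if abs(p - prev) < tol:              # local maximum: no more improvement
            break
        prev = p
    return PI, A, B
```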

Page 33:

Caveats

• This is a kind of hill-climbing technique
 – It often has serious problems with local maxima
 – You don't know when you're done

Page 34:

So…how else could we do this?

• Standard gradient descent techniques?
 – Hill climbing?
 – Beam search?
 – Genetic algorithms?

Page 35:

Back to the fundamental question

• Which processes have the Markov property?
 – What if a hidden state variable is included? (as in an HMM)
