Page 1:

Hidden Markov Models

The three basic HMM problems

(note: change in notation)

Mitch Marcus

CSE 391

Page 2:

Parameters of an HMM

States: a set of states S = {s_1, …, s_N}

Transition probabilities: A = {a_{i,j}}. Each a_{i,j} represents the probability of transitioning from state s_i to state s_j.

Emission probabilities: a set B of functions of the form b_i(o_t), which is the probability of observation o_t being emitted by state s_i.

Initial state distribution: π_i is the probability that s_i is a start state.

(This and later slides follow the classic formulation by Rabiner and Juang, following Ferguson, as adapted by Manning and Schütze. Slides adapted from Dorr. Note the change in notation!!)
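As a concrete illustration (not part of the original slides), the parameter set λ = (A, B, π) can be written down directly as arrays. The sketch below assumes a made-up two-state HMM; the state names, observation symbols, and probability values are purely hypothetical.

```python
import numpy as np

# Hypothetical 2-state HMM with a 2-symbol observation alphabet.
states = ["s1", "s2"]
vocab = ["v1", "v2"]

# A[i, j] = a_ij: probability of transitioning from state s_i to state s_j
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# B[i, k] = b_i(v_k): probability that state s_i emits symbol v_k
B = np.array([[0.2, 0.8],
              [0.7, 0.3]])

# pi[i] = pi_i: probability that s_i is the start state
pi = np.array([0.6, 0.4])

# Every distribution should sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```

The later sketches in this transcript reuse these A, B, and pi arrays, together with an observation sequence obs given as a list of symbol indices.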

Page 3:

The Three Basic HMM Problems

Problem 1 (Evaluation): Given the observation sequence O = o_1, …, o_T and an HMM model λ = (A, B, π), how do we compute the probability of O given the model?

Problem 2 (Decoding): Given the observation sequence O = o_1, …, o_T and an HMM model λ = (A, B, π), how do we find the state sequence that best explains the observations?

Problem 3 (Learning): How do we adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?

Page 4:

Problem 1: Probability of an Observation Sequence

Q: What is P(O | λ)?

A: The sum, over all possible state sequences in the HMM, of the probability of generating O along that sequence.
• The probability of each state sequence is itself the product of the state transition and emission probabilities.

Naïve computation is very expensive. Given T observations and N states, there are N^T possible state sequences.
• (For T = 10 and N = 10, that is 10 billion different paths!!)

Solution: dynamic programming, which is linear in T!
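To make the N^T blow-up concrete, here is a brute-force evaluation sketch (not in the original slides) that literally enumerates every state sequence. It assumes the hypothetical A, B, pi arrays from the earlier parameter sketch and an observation sequence obs of symbol indices.

```python
from itertools import product
import numpy as np

def brute_force_likelihood(A, B, pi, obs):
    """P(O | lambda) by summing P(O, Q | lambda) over all N**T state sequences."""
    N = A.shape[0]
    T = len(obs)
    total = 0.0
    for path in product(range(N), repeat=T):        # N**T candidate paths
        p = pi[path[0]] * B[path[0], obs[0]]         # start state and first emission
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```

Even for the modest T = 10, N = 10 case mentioned above, this loop visits 10^10 paths, which is exactly why the trellis-based dynamic programming below matters.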

Page 5:

The Crucial Data Structure: The Trellis

Page 6:

Forward Probabilities: α_t(i) = P(o_1 … o_t, q_t = s_i | λ)

For a given HMM λ, given that the state is s_i at time t (with the change of notation, t is some arbitrary time), what is the probability that the partial observation o_1 … o_t has been generated?

The Forward algorithm computes α_t(i) for all 1 ≤ i ≤ N, 1 ≤ t ≤ T in time O(N²T) using the trellis.

Page 7:

Forward Algorithm: Induction step

  α_t(j) = [ Σ_{i=1}^{N} α_{t−1}(i) · a_{ij} ] · b_j(o_t)

where α_t(i) = P(o_1 … o_t, q_t = s_i | λ)

Page 8:

Forward Algorithm

Initialization:
  α_1(i) = π_i · b_i(o_1),   1 ≤ i ≤ N

Induction:
  α_t(j) = [ Σ_{i=1}^{N} α_{t−1}(i) · a_{ij} ] · b_j(o_t),   2 ≤ t ≤ T, 1 ≤ j ≤ N

Termination:
  P(O | λ) = Σ_{i=1}^{N} α_T(i)
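A direct transcription of these three steps into Python/NumPy might look like the sketch below (assumptions: A, B, pi are the hypothetical arrays from the parameter sketch, and obs is a list of observation-symbol indices).

```python
import numpy as np

def forward(A, B, pi, obs):
    """Return the alpha trellis and P(O | lambda) via the Forward algorithm."""
    N = A.shape[0]
    T = len(obs)
    alpha = np.zeros((T, N))

    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha[0] = pi * B[:, obs[0]]

    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return alpha, alpha[-1].sum()
```

For long sequences the α values underflow in floating point, so practical implementations rescale each column of the trellis or work in log space; the recursion itself is unchanged.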

Page 9:

Forward Algorithm Complexity

The naïve approach requires exponential time to evaluate all N^T state sequences.

The Forward algorithm, using dynamic programming, takes O(N²T) computations.

Page 10:

Backward Probabilities: β_t(i) = P(o_{t+1} … o_T | q_t = s_i, λ)

For a given HMM λ, given that the state is s_i at time t, what is the probability that the partial observation o_{t+1} … o_T will be generated?

Analogous to the forward probability, just in the other direction.

The Backward algorithm computes β_t(i) for all 1 ≤ i ≤ N, 1 ≤ t ≤ T in time O(N²T) using the trellis.

Page 11:

Backward Probabilities

  β_t(i) = Σ_{j=1}^{N} a_{ij} · b_j(o_{t+1}) · β_{t+1}(j)

where β_t(i) = P(o_{t+1} … o_T | q_t = s_i, λ)

Page 12:

Backward Algorithm

Initialization:
  β_T(i) = 1,   1 ≤ i ≤ N

Induction:
  β_t(i) = Σ_{j=1}^{N} a_{ij} · b_j(o_{t+1}) · β_{t+1}(j),   t = T−1, …, 1,  1 ≤ i ≤ N

Termination:
  P(O | λ) = Σ_{i=1}^{N} π_i · b_i(o_1) · β_1(i)
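By analogy with the forward sketch, a minimal backward pass might look like this (same assumed A, B, pi, obs as before).

```python
import numpy as np

def backward(A, B, pi, obs):
    """Return the beta trellis and P(O | lambda) via the Backward algorithm."""
    N = A.shape[0]
    T = len(obs)
    beta = np.zeros((T, N))

    # Initialization: beta_T(i) = 1
    beta[T - 1] = 1.0

    # Induction: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return beta, np.sum(pi * B[:, obs[0]] * beta[0])
```

Both terminations (forward and backward) compute the same P(O | λ), which makes a handy consistency check when debugging an implementation.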

Page 13:

Problem 2: Decoding

The Forward algorithm efficiently computes the sum of the probabilities of all paths through an HMM.

Here, we instead want to find the single highest-probability path.

That is, we want the state sequence Q = q_1 … q_T such that

  Q = argmax_{Q'} P(Q' | O, λ)

Page 14:

Viterbi Algorithm

Just like the Forward algorithm, but instead of summing over transitions from incoming states, compute the maximum:

Forward:
  α_t(j) = [ Σ_{i=1}^{N} α_{t−1}(i) · a_{ij} ] · b_j(o_t)

Viterbi recursion:
  δ_t(j) = [ max_{1 ≤ i ≤ N} δ_{t−1}(i) · a_{ij} ] · b_j(o_t)

Page 15:

Core Idea of Viterbi Algorithm

Page 16:

Not quite what we want….

The Viterbi recursion computes the maximum probability of any path reaching state j at time t, given that the partial observation o_1 … o_t has been generated:

  δ_t(j) = [ max_{1 ≤ i ≤ N} δ_{t−1}(i) · a_{ij} ] · b_j(o_t)

But we want the path itself that gives the maximum probability.

Solution:
1. Keep backpointers.
2. Find argmax_j δ_T(j).
3. Chase the backpointers from that state at time T to recover the state sequence (backwards).

Page 17:

Viterbi Algorithm

Initialization:
  δ_1(i) = π_i · b_i(o_1),   1 ≤ i ≤ N

Induction:
  δ_t(j) = [ max_{1 ≤ i ≤ N} δ_{t−1}(i) · a_{ij} ] · b_j(o_t),   2 ≤ t ≤ T,  1 ≤ j ≤ N

Backpointers:
  ψ_t(j) = argmax_{1 ≤ i ≤ N} δ_{t−1}(i) · a_{ij},   2 ≤ t ≤ T,  1 ≤ j ≤ N

Termination (final state!):
  q*_T = argmax_{1 ≤ i ≤ N} δ_T(i)

Backpointer path:
  q*_t = ψ_{t+1}(q*_{t+1}),   t = T−1, …, 1
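Putting initialization, induction, backpointers, and backtracking together, a compact sketch (again using the assumed A, B, pi, obs arrays from the earlier examples) could be:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the most likely state sequence and its probability."""
    N = A.shape[0]
    T = len(obs)
    delta = np.zeros((T, N))            # delta_t(j): best path probability ending in j at t
    psi = np.zeros((T, N), dtype=int)   # psi_t(j): best predecessor of state j at time t

    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta[0] = pi * B[:, obs[0]]

    # Induction with backpointers
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A          # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)              # best incoming state for each j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    # Termination: pick the best final state, then chase backpointers
    path = np.zeros(T, dtype=int)
    path[T - 1] = delta[T - 1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path, delta[T - 1].max()
```

As with the forward pass, real implementations usually take the max over log probabilities to avoid underflow; the argmax and backpointer structure is identical.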

Page 18:

Problem 3: Learning

Up to now we've assumed that we know the underlying model λ = (A, B, π).

Often these parameters are estimated from annotated training data, but:
• Annotation is often difficult and/or expensive
• The training data may differ from the current data

We want to maximize the parameters with respect to the current data, i.e., we're looking for a model λ' such that

  λ' = argmax_λ P(O | λ)

Page 19:

Problem 3: Learning (If Time Allows…)

Unfortunately, there is no known way to analytically find a global maximum, i.e., a model λ' such that

  λ' = argmax_λ P(O | λ)

But it is possible to find a local maximum: given an initial model λ, we can always find a model λ' such that

  P(O | λ') ≥ P(O | λ)

Page 20:

Forward-Backward (Baum-Welch) algorithm

Key idea: parameter re-estimation by hill-climbing.

The FB algorithm iteratively re-estimates the parameters, yielding a new λ' at each iteration:
1. Initialize λ to a random set of values
2. Estimate P(O | λ), filling out the trellis for both the Forward and the Backward algorithms
3. Re-estimate the parameters using both trellises, yielding a new estimate λ'

Theorem: P(O | λ') ≥ P(O | λ)

Page 21:

Parameter Re-estimation

Three parameters need to be re-estimated:
• Initial state distribution: π_i
• Transition probabilities: a_{i,j}
• Emission probabilities: b_i(o_t)

Page 22:

Re-estimating Transition Probabilities: Step 1

What's the probability of being in state s_i at time t and going to state s_j, given the current model and parameters?

  ξ_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | O, λ)

Page 23:

Re-estimating Transition Probabilities: Step 1

  ξ_t(i, j) = [ α_t(i) · a_{i,j} · b_j(o_{t+1}) · β_{t+1}(j) ] / [ Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) · a_{i,j} · b_j(o_{t+1}) · β_{t+1}(j) ]

where ξ_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | O, λ)
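As a sketch (not from the slides), ξ_t(i, j) can be computed directly from the trellises produced by the hypothetical forward() and backward() helpers above:

```python
import numpy as np

def xi_matrix(A, B, alpha, beta, obs):
    """xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | O, lambda), for t = 1 .. T-1."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()     # denominator equals P(O | lambda)
    return xi
```

The normalizing denominator in each time slice is exactly the double sum in the formula above, which in turn equals P(O | λ).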

Page 24:

Re-estimating Transition Probabilities: Step 2

The intuition behind the re-estimation equation for transition probabilities is:

  a_{i,j} = (expected number of transitions from state s_i to state s_j) / (expected number of transitions from state s_i)

Formally:

  â_{i,j} = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} Σ_{j'=1}^{N} ξ_t(i, j')

Page 25:

Re-estimating Transition Probabilities

Defining

  γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)

as the probability of being in state s_i at time t, given the complete observation O, we can say:

  â_{i,j} = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)

Page 26:

Re-estimating Initial State Probabilities

Initial state distribution: π_i is the probability that s_i is a start state.

Re-estimation is easy:

  π̂_i = expected number of times in state s_i at time 1

Formally:

  π̂_i = γ_1(i)

Page 27:

Re-estimation of Emission Probabilities

Emission probabilities are re-estimated as

  b̂_i(k) = (expected number of times in state s_i observing symbol v_k) / (expected number of times in state s_i)

Formally:

  b̂_i(k) = Σ_{t=1}^{T} δ(o_t, v_k) · γ_t(i) / Σ_{t=1}^{T} γ_t(i)

where δ(o_t, v_k) = 1 if o_t = v_k, and 0 otherwise.

Note that δ here is the Kronecker delta function and is not related to the δ in the discussion of the Viterbi algorithm!!

Page 28:

The Updated Model

Coming from λ = (A, B, π), we get to λ' = (Â, B̂, π̂) by the following update rules:

  â_{i,j} = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)

  b̂_i(k) = Σ_{t=1}^{T} δ(o_t, v_k) · γ_t(i) / Σ_{t=1}^{T} γ_t(i)

  π̂_i = γ_1(i)
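Combining the sketches above, one Forward-Backward re-estimation step might look like the following. This is only a sketch for a single observation sequence, and it relies on the hypothetical forward(), backward(), and xi_matrix() helpers defined earlier.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Forward-Backward re-estimation step: returns (A_hat, B_hat, pi_hat)."""
    alpha, _ = forward(A, B, pi, obs)
    beta, _ = backward(A, B, pi, obs)
    xi = xi_matrix(A, B, alpha, beta, obs)        # shape (T-1, N, N)
    gamma = xi.sum(axis=2)                        # gamma_t(i) for t = 1 .. T-1
    gamma_T = alpha[-1] * beta[-1] / (alpha[-1] * beta[-1]).sum()
    gamma_full = np.vstack([gamma, gamma_T])      # gamma_t(i) for t = 1 .. T

    # pi_hat_i = gamma_1(i)
    pi_hat = gamma_full[0]

    # a_hat_ij = sum_{t=1}^{T-1} xi_t(i,j) / sum_{t=1}^{T-1} gamma_t(i)
    A_hat = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]

    # b_hat_i(k) = sum_{t: o_t = v_k} gamma_t(i) / sum_t gamma_t(i)
    obs = np.asarray(obs)
    B_hat = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_hat[:, k] = gamma_full[obs == k].sum(axis=0)
    B_hat /= gamma_full.sum(axis=0)[:, None]

    return A_hat, B_hat, pi_hat
```

Iterating this step cannot decrease the likelihood, by the theorem on the Forward-Backward slide (P(O | λ') ≥ P(O | λ)), and the iterations converge to a local maximum of P(O | λ).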

Page 29:

Expectation Maximization

The Forward-Backward algorithm is an instance of the more general EM algorithm:

• The E Step: Compute the forward and backward probabilities for a given model
• The M Step: Re-estimate the model parameters