Hidden Markov Models
The three basic HMM problems
(note: change in notation)
Mitch Marcus
CSE 391
Parameters of an HMM
States: a set of states $S = \{s_1, \ldots, s_N\}$
Transition probabilities: $A = a_{1,1}, a_{1,2}, \ldots, a_{N,N}$. Each $a_{i,j}$ represents the probability of transitioning from state $s_i$ to $s_j$.
Emission probabilities: a set B of functions of the form $b_i(o_t)$, the probability of observation $o_t$ being emitted by $s_i$.
Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state.
(This and later slides follow the classic formulation by Rabiner and Juang,
following Ferguson, as adapted by Manning and Schütze. Slides
adapted from Dorr. Note the change in notation!!)
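As a concrete picture of these parameters, here is a minimal sketch in Python using NumPy arrays; the specific numbers, the choice of N = 2 states, and the vocabulary of M = 3 symbols are invented for illustration and are not from the slides:

```python
import numpy as np

# A toy HMM with N = 2 states and a vocabulary of M = 3 symbols.
# All numbers here are invented for illustration.

# Transition probabilities: A[i, j] = a_ij = P(q_{t+1} = s_j | q_t = s_i).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Emission probabilities: B[i, k] = b_i(v_k) = P(o_t = v_k | q_t = s_i).
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# Initial state distribution: pi[i] = P(q_1 = s_i).
pi = np.array([0.6, 0.4])

# Every distribution must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```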
The Three Basic HMM Problems
Problem 1 (Evaluation): Given the observation sequence $O = o_1, \ldots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we compute the probability of O given the model, $P(O \mid \lambda)$?
Problem 2 (Decoding): Given the observation sequence $O = o_1, \ldots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we find the state sequence that best explains the observations?
Problem 3 (Learning): How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?
Problem 1: Probability of an Observation Sequence
Q: What is $P(O \mid \lambda)$?
A: The sum of the probabilities of all possible state sequences in the HMM.
• The probability of each state sequence is itself the product of the state-transition and emission probabilities along it.
Naïve computation is very expensive. Given T observations and N states, there are $N^T$ possible state sequences.
• (For T = 10 and N = 10, that is 10 billion different paths!!)
Solution: dynamic programming, linear in T!
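To make the cost concrete, here is a deliberately naïve sketch in Python, my own illustration, assuming the array layout from the earlier example and observations given as integer symbol indices:

```python
import itertools
import numpy as np

def naive_likelihood(A, B, pi, obs):
    """P(O | lambda) by brute-force enumeration of all N**T state paths."""
    N, T = len(pi), len(obs)
    total = 0.0
    # One term per state sequence:
    # pi[q1] * b_{q1}(o1) * prod_t a_{q(t-1), qt} * b_{qt}(ot)
    for path in itertools.product(range(N), repeat=T):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```

Even for modest T this loop runs $N^T$ times, which is exactly why the trellis-based dynamic programming on the next slides matters.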
The Crucial Data Structure: The Trellis
Forward Probabilities:
$\alpha_t(i) = P(o_1 \ldots o_t,\; q_t = s_i \mid \lambda)$
For a given HMM $\lambda$, given that the state is $s_i$ at time t (with change of notation: some arbitrary time), what is the probability that the partial observation $o_1 \ldots o_t$ has been generated?
The Forward algorithm computes $\alpha_t(i)$ for $1 \le i \le N$, $1 \le t \le T$ in time $O(N^2 T)$ using the trellis.
Forward Algorithm: Induction step
$\alpha_t(j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\right] b_j(o_t)$
where, as before, $\alpha_t(i) = P(o_1 \ldots o_t,\; q_t = s_i \mid \lambda)$
Forward Algorithm
Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$
Induction: $\alpha_t(j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\right] b_j(o_t), \quad 2 \le t \le T,\; 1 \le j \le N$
Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
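These three steps translate almost directly into Python; a sketch rather than a definitive implementation, assuming the NumPy parameter layout from the earlier example:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns P(O | lambda) and the alpha trellis.

    obs is a sequence of integer symbol indices (an assumption of this sketch).
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha[0] = pi * B[:, obs[0]]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return alpha[-1].sum(), alpha
```

For long sequences the $\alpha$ values underflow, so practical implementations rescale each time step or work in log space.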
Forward Algorithm Complexity
The naïve approach requires exponential time to evaluate all $N^T$ state sequences.
The Forward algorithm, using dynamic programming, takes $O(N^2 T)$ computations.
Backward Probabilities:
$\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = s_i, \lambda)$
For a given HMM $\lambda$, given that the state is $s_i$ at time t, what is the probability that the partial observation $o_{t+1} \ldots o_T$ will be generated?
Analogous to the forward probability, just in the other direction.
The Backward algorithm computes $\beta_t(i)$ for $1 \le i \le N$, $1 \le t \le T$ in time $O(N^2 T)$ using the trellis.
Backward Probabilities
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
where, as before, $\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = s_i, \lambda)$
Backward Algorithm
Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$
Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, \ldots, 1,\; 1 \le i \le N$
Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$
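The same kind of transcription works for the backward pass; again a sketch under the same assumptions as the forward() example:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns P(O | lambda) and the beta trellis."""
    N, T = len(pi), len(obs)
    beta = np.zeros((T, N))
    # Initialization: beta_T(i) = 1
    beta[-1] = 1.0
    # Induction: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return (pi * B[:, obs[0]] * beta[0]).sum(), beta
```

On the same inputs, forward() and backward() should return the same likelihood $P(O \mid \lambda)$, which makes a convenient correctness check.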
Problem 2: Decoding
The Forward algorithm gives the sum of all
paths through an HMM efficiently.
Here, we want to find the highest probability
path.
We want to find the state sequence $Q = q_1 \ldots q_T$ such that
$Q = \arg\max_{Q'} P(Q' \mid O, \lambda)$
Viterbi Algorithm
Just like the forward algorithm, but instead of
summing over transitions from incoming
states, compute the maximum
Forward: $\alpha_t(j) = \left[\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\right] b_j(o_t)$
Viterbi Recursion: $\delta_t(j) = \left[\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij}\right] b_j(o_t)$
Core Idea of Viterbi Algorithm
Not quite what we want….
The Viterbi recursion
$\delta_t(j) = \left[\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij}\right] b_j(o_t)$
computes the maximum probability of a path ending in state j at time t, given that the partial observation $o_1 \ldots o_t$ has been generated.
But we want the path itself that gives the maximum probability.
Solution:
1. Keep backpointers
2. Find $\arg\max_{1 \le j \le N} \delta_T(j)$
3. Chase backpointers from that state at time T to find the state sequence (backwards)
Viterbi Algorithm
Initialization: $\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$
Induction: $\delta_t(j) = \left[\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij}\right] b_j(o_t), \quad 2 \le t \le T,\; 1 \le j \le N$
(Backpointers) $\psi_t(j) = \arg\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij}, \quad 2 \le t \le T,\; 1 \le j \le N$
Termination: (Final state!) $q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$
Backpointer path: $q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \ldots, 1$
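These steps also translate nearly line for line into Python; a sketch under the same assumptions as the earlier examples, with the backpointers $\psi$ kept as an integer matrix:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: returns the best state path and its probability."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))           # delta[t, j]: best path prob. ending in j at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, j]: best predecessor of j at t
    # Initialization: delta_1(i) = pi_i * b_i(o_1)
    delta[0] = pi * B[:, obs[0]]
    # Induction: maximize over incoming states instead of summing
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)       # psi_t(j) = argmax_i delta_{t-1}(i) * a_ij
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Termination, then chase backpointers from the final state
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path, delta[-1].max()
```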
Problem 3: Learning
Up to now we’ve assumed that we know the underlying model $\lambda = (A, B, \pi)$.
Often these parameters are estimated on annotated training data, but:
• Annotation is often difficult and/or expensive
• Training data is different from the current data
We want to maximize the parameters with respect to the current data, i.e., we’re looking for a model $\lambda'$ such that
$\lambda' = \arg\max_{\lambda} P(O \mid \lambda)$
Problem 3: Learning (If Time Allows…)
Unfortunately, there is no known way to analytically find a global maximum, i.e., a model $\lambda'$ such that $\lambda' = \arg\max_{\lambda} P(O \mid \lambda)$.
But it is possible to find a local maximum:
Given an initial model $\lambda$, we can always find a model $\lambda'$ such that $P(O \mid \lambda') \ge P(O \mid \lambda)$.
Forward-Backward (Baum-Welch) algorithm
Key Idea: parameter re-estimation by hill-climbing.
The FB algorithm iteratively re-estimates the parameters, yielding a new $\lambda'$ at each iteration:
1. Initialize $\lambda$ to a random set of values
2. Estimate $P(O \mid \lambda)$, filling out the trellis for both the Forward and the Backward algorithms
3. Re-estimate $\lambda$ using both trellises, yielding a new estimate $\lambda'$
Theorem: $P(O \mid \lambda') \ge P(O \mid \lambda)$
Parameter Re-estimation
Three parameters need to be re-estimated:
• Initial state distribution: $\pi_i$
• Transition probabilities: $a_{i,j}$
• Emission probabilities: $b_i(o_t)$
Re-estimating Transition Probabilities: Step 1
What’s the probability of being in state $s_i$ at time t and going to state $s_j$, given the current model and parameters?
$\xi_t(i, j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O, \lambda)$
Re-estimating Transition Probabilities: Step 1
$\xi_t(i, j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O, \lambda)$
$\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$
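This ratio falls directly out of the two trellises; a sketch in Python, assuming the forward() and backward() helpers sketched earlier:

```python
import numpy as np

def xi_matrix(A, B, alpha, beta, obs, t):
    """xi_t(i, j): probability of being in s_i at t and s_j at t+1, given O."""
    # Numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j), for all (i, j)
    num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
    # The denominator normalizes over all (i, j) pairs; it equals P(O | lambda)
    return num / num.sum()
```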
Re-estimating Transition Probabilities: Step 2
The intuition behind the re-estimation equation for transition probabilities is
$\hat{a}_{i,j} = \dfrac{\text{expected number of transitions from state } s_i \text{ to state } s_j}{\text{expected number of transitions from state } s_i}$
Formally:
$\hat{a}_{i,j} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j'=1}^{N} \xi_t(i, j')}$
Re-estimating Transition Probabilities
Defining
$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$
as the probability of being in state $s_i$, given the complete observation O, we can say:
$\hat{a}_{i,j} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
Re-estimating Initial State Probabilities
Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state.
Re-estimation is easy:
$\hat{\pi}_i = \text{expected number of times in state } s_i \text{ at time } 1$
Formally: $\hat{\pi}_i = \gamma_1(i)$
Re-estimation of Emission Probabilities
Emission probabilities are re-estimated as
$\hat{b}_i(k) = \dfrac{\text{expected number of times in state } s_i \text{ observing symbol } v_k}{\text{expected number of times in state } s_i}$
Formally:
$\hat{b}_i(k) = \dfrac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$
where $\delta(o_t, v_k) = 1$ if $o_t = v_k$, and 0 otherwise.
Note that $\delta$ here is the Kronecker delta function and is not related to the $\delta$ in the discussion of the Viterbi algorithm!!
The Updated Model
Coming from $\lambda = (A, B, \pi)$ we get to $\lambda' = (\hat{A}, \hat{B}, \hat{\pi})$ by the following update rules:
$\hat{a}_{i,j} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \qquad \hat{b}_i(k) = \dfrac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)} \qquad \hat{\pi}_i = \gamma_1(i)$
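Putting the update rules together, one full re-estimation pass might look like the following sketch, composed from the formulas above and reusing the hypothetical forward(), backward(), and xi_matrix() helpers; the step that derives $\gamma_T$ directly from the trellises is my own addition, needed because $\xi$ is only defined up to T-1:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Forward-Backward re-estimation step; returns (A_hat, B_hat, pi_hat)."""
    N, M = B.shape
    T = len(obs)
    _, alpha = forward(A, B, pi, obs)
    _, beta = backward(A, B, pi, obs)
    # xi[t] is the N x N matrix xi_t(i, j), for t = 1 .. T-1
    xi = np.array([xi_matrix(A, B, alpha, beta, obs, t) for t in range(T - 1)])
    # gamma_t(i) = sum_j xi_t(i, j), shape (T-1, N)
    gamma = xi.sum(axis=2)
    # a_hat_ij = sum_t xi_t(i, j) / sum_t gamma_t(i)
    A_hat = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
    # gamma at the final time step comes straight from alpha and beta
    last = alpha[-1] * beta[-1]
    gamma_full = np.vstack([gamma, last / last.sum()])  # shape (T, N)
    # b_hat_i(k): gamma mass at the times where v_k was observed, normalized
    obs = np.asarray(obs)
    B_hat = np.zeros((N, M))
    for k in range(M):
        B_hat[:, k] = gamma_full[obs == k].sum(axis=0)
    B_hat /= gamma_full.sum(axis=0)[:, None]
    # pi_hat_i = gamma_1(i)
    pi_hat = gamma_full[0]
    return A_hat, B_hat, pi_hat
```

Iterating this step until $P(O \mid \lambda)$ stops improving climbs to the local maximum promised by the theorem on the earlier slide.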
Expectation Maximization
The forward-backward algorithm is an instance of
the more general EM algorithm
• The E Step: Compute the forward and backward probabilities for a given model
• The M Step: Re-estimate the model parameters