Lecture 13: Hidden Markov Model
Shuai Li
A Markov system
• There are $N$ states, $S_1, S_2, \ldots, S_N$, and the time steps are discrete: $t = 0, 1, 2, \ldots$
• On the $t$-th time step the system is in exactly one of the available states. Call it $q_t$, with $q_t \in \{S_1, S_2, \ldots, S_N\}$
• Between each time step, the next state is chosen based only on the current state
• The current state determines the probability distribution for the
next state
Example

[Figure: state-transition diagram of an example Markov system]
Markovian property
• $q_{t+1}$ is conditionally independent of $q_{t-1}, q_{t-2}, \ldots, q_0$ given $q_t$
• In other words: $P(q_{t+1} = S_j \mid q_t = S_i) = P(q_{t+1} = S_j \mid q_t = S_i, \text{any earlier history})$
Example
• A human and a robot wander around randomly on a grid
Example (cont.)
• Each time step the human/robot moves randomly to an adjacent
cell
• Typical questions:
  • “What’s the expected time until the human is crushed like a bug?”
  • “What’s the probability that the robot will hit the left wall before it hits the human?”
  • “What’s the probability the robot crushes the human on the next time step?”
Example (cont.)
• The current time is $t$, and the human remains uncrushed. What’s the probability of a crushing occurring at time $t+1$?
• If the robot is blind:
  • We can compute this in advance
• If the robot is omniscient (i.e., if the robot knows the current state):
  • We can compute this directly
• If the robot has some sensors, but incomplete state information:
  • Hidden Markov Models are applicable
$P(q_t = S_i)$ -- a clumsy solution
• Step 1: Work out how to compute $P(Q)$ for any path $Q = q_1 q_2 \ldots q_t$
• Step 2: Use this knowledge to get $P(q_t = S_i) = \sum_{\text{paths } Q \text{ of length } t \text{ ending in } S_i} P(Q)$
$P(q_t = S_i)$ -- a cleverer solution
• For each state $S_i$, define $p_t(i) = P(q_t = S_i)$ to be the probability of being in state $S_i$ at time $t$
• Easy to do inductive computation: $p_0(i) = P(q_0 = S_i)$, and
  $p_{t+1}(j) = \sum_{i=1}^{N} P(q_{t+1} = S_j \mid q_t = S_i)\, p_t(i)$
Complexity comparison
• Cost of computing $p_t(i)$ for all states $S_i$ is now $O(tN^2)$
  • Why? Each of the $t$ inductive steps updates $N$ values, and each update sums over $N$ predecessor states
• Cost of the clumsy solution is $O(N^t)$
  • Why? It enumerates every possible path of length $t$
• This is the power of dynamic programming, which is widely used in HMMs (see the sketch below)
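To make the induction concrete, here is a minimal Python sketch of this dynamic-programming computation. The two-state transition matrix and start distribution are made-up illustrative values, not taken from the lecture.

```python
import numpy as np

# Illustrative (made-up) parameters:
# A[i, j] = P(q_{t+1} = S_j | q_t = S_i),  pi[i] = P(q_0 = S_i)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([0.5, 0.5])

def state_marginal(t, pi, A):
    """Return p_t, where p_t[i] = P(q_t = S_i), in O(t N^2) time."""
    p = pi.copy()
    for _ in range(t):
        p = p @ A  # p_{t+1}(j) = sum_i p_t(i) P(q_{t+1} = S_j | q_t = S_i)
    return p

print(state_marginal(3, pi, A))  # -> [0.5695 0.4305]
```

Each step is a single vector-matrix product, which is exactly where the $O(tN^2)$ cost comes from.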
Hidden state
• The previous example tries to estimate $P(q_t = S_i)$ unconditionally (using no observed information)
• Suppose we can observe something that’s affected by the true state
Noisy observation of hidden state
• Let’s denote the observation at time $t$ by $O_t$
• $O_t$ is noisily determined depending on the current state $q_t$
• Assume that $O_t$ is conditionally independent of $q_{t-1}, q_{t-2}, \ldots, q_0, O_{t-1}, O_{t-2}, \ldots, O_1, O_0$ given $q_t$
• In other words: $P(O_t = X \mid q_t = S_i) = P(O_t = X \mid q_t = S_i, \text{any earlier history})$
• The robot with noisy sensors is a good example
• Question 1 (evaluation), state estimation:
  • What is $P(q_t = S_i \mid O_1, \ldots, O_t)$?
• Question 2 (inference), most probable path:
  • Given $O_1, \ldots, O_T$, what is the most probable path of states? And what is its probability?
• Question 3 (learning), learning HMMs:
  • Given $O_1, \ldots, O_T$, what is the maximum likelihood HMM that could have produced this string of observations?
  • Answered by maximum likelihood estimation (MLE)
Applications of HMMs
• Speech recognition/understanding: signal → phones, phones → words
• Human genome project
• Consumer decision modeling
• Economics and finance
Basic operations in HMMs
• For an observation sequence $O = O_1, \ldots, O_T$, three basic HMM operations are:
  • Evaluation: calculating $P(O \mid \lambda)$ -- forward algorithm, $O(TN^2)$
  • Inference: computing $Q^* = \arg\max_Q P(Q \mid O)$ -- Viterbi decoding, $O(TN^2)$
  • Learning: computing $\lambda^* = \arg\max_\lambda P(O \mid \lambda)$ -- Baum-Welch (EM), $O(TN^2)$ per iteration
• ($T$ = number of time steps, $N$ = number of states)
Formal definition of HMM
• The states are labeled $S_1, S_2, \ldots, S_N$
• For a particular trial, let
  • $T$ be the number of observations
  • $N$ be the number of states
  • $M$ be the number of possible observations
  • $\pi_1, \pi_2, \ldots, \pi_N$ be the starting state probabilities: $\pi_i = P(q_1 = S_i)$
  • $O = O_1 O_2 \ldots O_T$ be a sequence of observations
  • $Q = q_1 q_2 \ldots q_T$ be a path of states
• Then $\lambda = (N, M, \{\pi_i\}, \{a_{ij}\}, \{b_i(k)\})$ is the specification of an HMM, where
  • $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$ are the state transition probabilities
  • $b_i(k) = P(O_t = k \mid q_t = S_i)$ are the observation (emission) probabilities
Example
• Start randomly in state 1 or 2
• Choose one of the output symbols in each state at random
• Let’s generate a sequence of observations (a sampling sketch follows below)
Probability of a series of observations
• What is $P(O) = P(O_1 O_2 O_3) = P(O_1 = x_1 \wedge O_2 = x_2 \wedge O_3 = x_3)$ for a given three-symbol sequence?
• Slow, stupid way: $P(O) = \sum_{\text{paths } Q} P(O \wedge Q) = \sum_{\text{paths } Q} P(O \mid Q)\,P(Q)$
• How do we compute $P(Q)$ for an arbitrary path $Q$?
• How do we compute $P(O \mid Q)$ for an arbitrary path $Q$?
• $P(Q)$ for an arbitrary path $Q = q_1 q_2 \ldots q_T$:
  $P(Q) = P(q_1)\,P(q_2 \mid q_1)\cdots P(q_T \mid q_{T-1}) = \pi_{q_1} \prod_{t=1}^{T-1} a_{q_t q_{t+1}}$
• $P(O \mid Q)$ for an arbitrary path $Q$:
  $P(O \mid Q) = \prod_{t=1}^{T} P(O_t \mid q_t) = \prod_{t=1}^{T} b_{q_t}(O_t)$
Probability of a series of observations (cont.)
• Computation complexity of the slow, stupid answer:
  • $P(O)$ would require computing 27 $P(Q)$’s and 27 $P(O \mid Q)$’s, one per path ($N^T = 3^3 = 27$)
  • A sequence of 20 observations would need $3^{20} = 3{,}486{,}784{,}401 \approx 3.5$ billion $P(Q)$’s and 3.5 billion $P(O \mid Q)$’s
• So we have to find some smarter answer
• Smart answer (based on dynamic programming)
• Given observations $O_1 O_2 \ldots O_T$
• Define: $\alpha_t(i) = P(O_1 O_2 \ldots O_t \wedge q_t = S_i \mid \lambda)$, the probability that the first $t$ observations are produced and the system ends up in state $S_i$
• In the example, what is $\alpha_2(3)$?
• The $\alpha_t(i)$’s can be computed recursively:
  • Base case: $\alpha_1(i) = \pi_i\, b_i(O_1)$
  • Induction: $\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_{i=1}^{N} a_{ij}\, \alpha_t(i)$
  • Finally: $P(O) = \sum_{i=1}^{N} \alpha_T(i)$, so $P(O)$ costs only $O(TN^2)$
• A sketch of this forward algorithm follows below
The most probable path
• We now turn to Question 2 (inference): given $O_1, \ldots, O_T$, what is the most probable path of states, and what is its probability?
• We’re going to compute the following variables:
  $\delta_t(i) = \max_{q_1 q_2 \ldots q_{t-1}} P(q_1 q_2 \ldots q_{t-1} \wedge q_t = S_i \wedge O_1 \ldots O_t)$
• That is, $\delta_t(i)$ is the probability of the path of length $t-1$ with the maximum chance of occurring, ending up in state $S_i$, and producing output $O_1 \ldots O_t$
• Define: $\text{mpp}_t(i)$ = that path, so $\delta_t(i)$ is the probability of $\text{mpp}_t(i)$
• These also satisfy a recursion: $\delta_1(i) = \pi_i\, b_i(O_1)$ and $\delta_{t+1}(j) = b_j(O_{t+1}) \max_i a_{ij}\, \delta_t(i)$; keeping the argmax at each step lets us backtrack the most probable path, as the sketch below shows
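Here is a minimal sketch of Viterbi decoding under the same assumed toy parameters; it returns the most probable path and its probability.

```python
import numpy as np

# Same illustrative (assumed) parameters as the earlier sketches.
pi = np.array([0.5, 0.5, 0.0])
A = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
B = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
symbols = ["X", "Y", "Z"]

def viterbi(obs):
    """Return (most probable state path, 1-indexed, and its probability)."""
    o = [symbols.index(s) for s in obs]
    T, N = len(o), len(pi)
    delta = np.zeros((T, N))            # delta[t, i] = best-path prob ending in S_i
    back = np.zeros((T, N), dtype=int)  # back[t, j]  = best predecessor of S_j
    delta[0] = pi * B[:, o[0]]          # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) a_ij
        back[t] = scores.argmax(axis=0)
        delta[t] = B[:, o[t]] * scores.max(axis=0)
    path = [int(delta[-1].argmax())]         # best final state
    for t in range(T - 1, 0, -1):            # backtrack through the argmaxes
        path.append(int(back[t][path[-1]]))
    return [q + 1 for q in reversed(path)], delta[-1].max()

print(viterbi(["X", "X", "Z"]))  # -> ([1, 3, 2], 0.015625)
```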
Learning HMMs
• We now turn to Question 3 (learning): given $O_1, \ldots, O_T$, what is the maximum likelihood HMM that could have produced this string of observations?
• Recall that $\lambda = (N, M, \{\pi_i\}, \{a_{ij}\}, \{b_i(k)\})$ is the notation for our HMM parameters
• Now we want to estimate $\lambda$ from the observations
• As usual, we could use maximum likelihood: pick $\lambda^* = \arg\max_\lambda P(O_1 \ldots O_T \mid \lambda)$
EM for HMMs
• If we knew $\lambda$, we could estimate expectations of quantities such as
  • Expected number of times in state $S_i$
  • Expected number of transitions $S_i \to S_j$
• If we knew quantities such as
  • Expected number of times in state $S_i$
  • Expected number of transitions $S_i \to S_j$
• We could compute the maximum likelihood estimate of $\lambda$
• EM alternates these two steps; a sketch of one iteration follows below
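Here is a minimal sketch of one such EM iteration (a Baum-Welch step), assuming the standard forward-backward recursions and re-estimation formulas. The parameters are the same illustrative stand-ins as before, and the epsilon guard merely keeps the toy example from dividing by zero for states it never visits.

```python
import numpy as np

# Same illustrative (assumed) parameters as the earlier sketches.
pi = np.array([0.5, 0.5, 0.0])
A = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
B = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
symbols = ["X", "Y", "Z"]

def baum_welch_step(obs, pi, A, B, eps=1e-12):
    """One EM iteration: expected counts (E-step), then re-estimation (M-step)."""
    o = np.array([symbols.index(s) for s in obs])
    T, N = len(o), len(pi)
    # E-step: forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, o[0]]
    for t in range(1, T):
        alpha[t] = B[:, o[t]] * (alpha[t - 1] @ A)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])
    PO = alpha[-1].sum()                    # P(O | lambda)
    gamma = alpha * beta / PO               # gamma[t, i] = P(q_t = S_i | O)
    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O)
    xi = alpha[:-1, :, None] * A[None] * (B[:, o[1:]].T * beta[1:])[:, None, :] / PO
    # M-step: new parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / (gamma[:-1].sum(axis=0)[:, None] + eps)
    new_B = np.stack([gamma[o == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / (gamma.sum(axis=0)[:, None] + eps)
    return new_pi, new_A, new_B

print(baum_welch_step(["X", "X", "Z", "Y", "X"], pi, A, B))
```

The expected state occupancies (`gamma`) and expected transition counts (`xi`) are exactly the quantities named in the bullets above.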
• Bad news:
  • There are lots of local optima of the likelihood
• Good news:
  • The local optima are usually adequate models of the data
• Notice:
  • EM does not estimate the number of states. That must be given
  • Often, HMMs are forced to have some links with zero probability. This is done by setting $a_{ij} = 0$ in the initial estimate $\lambda^{(0)}$
• Easy extension of everything seen today: HMMs with real-valued outputs