Lecture 13: Hidden Markov Model
Shuai Li
A Markov system
• There are $N$ states $S_1, S_2, \dots, S_N$, and the time steps are discrete, $t = 0, 1, 2, \dots$
• On the $t$-th time step the system is in exactly one of the available states. Call it $q_t$
• Between each time step, the next state is chosen only based on the information provided by the current state
• The current state determines the probability distribution for the next state
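A minimal sketch of such a system, assuming a hypothetical 3-state transition matrix `A` (the numbers are illustrative and do not come from the slides). The helper name `simulate` is also an assumption. The point is that each step looks only at the row of the current state.

```python
import random

# Hypothetical transition matrix: A[i][j] = P(next state = j | current state = i).
# Each row is a probability distribution over the next state.
A = [
    [0.0, 0.5, 0.5],
    [0.3, 0.3, 0.4],
    [1.0, 0.0, 0.0],
]

def simulate(A, start, steps, rng):
    """Return the visited states q_0, q_1, ..., q_steps as a list of indices."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # The next state is drawn using only the row of the current state.
        nxt = rng.choices(range(len(A)), weights=A[current])[0]
        path.append(nxt)
    return path

path = simulate(A, start=0, steps=10, rng=random.Random(0))
print(path)
```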
Example
Example (cont.)
Markovian property
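• $q_{t+1}$ is independent of $q_{t-1}, q_{t-2}, \dots, q_0$ given $q_t$
• In other words: $P(q_{t+1} = S_j \mid q_t = S_i) = P(q_{t+1} = S_j \mid q_t = S_i, \text{any earlier history})$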
Example
• A human and a robot wander around randomly on a grid
Example (cont.)
• Each time step the human/robot moves randomly to an adjacent cell
• Typical questions:
  • “What’s the expected time until the human is crushed like a bug?”
  • “What’s the probability that the robot will hit the left wall before it hits the human?”
  • “What’s the probability that the robot crushes the human on the next time step?”
Example (cont.)
• The current time is $t$, and the human remains uncrushed. What’s the probability of crushing occurring at time $t + 1$?
• If the robot is blind:
  • We can compute this in advance
• If the robot is omniscient (i.e. if the robot knows the current state):
  • We can compute this directly
• If the robot has some sensors, but incomplete state information:
  • Hidden Markov Models are applicable
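$P(q_t = S_i)$ -- A clumsy solution
• Step 1: Work out how to compute $P(Q)$ for any path $Q = q_1 q_2 \dots q_t$
• Step 2: Use this knowledge to get $P(q_t = S_i) = \sum_{Q} P(Q)$, where the sum runs over all paths $Q$ of length $t$ that end in $S_i$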
$P(q_t = S_i)$ -- A cleverer solution
• For each state $S_i$, define $p_t(i) = P(q_t = S_i)$ to be the probability of being in state $S_i$ at time $t$
• Easy to do inductive computation: $p_{t+1}(j) = \sum_{i=1}^{N} a_{ij}\, p_t(i)$, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$
Complexity comparison
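• Cost of computing $p_t(i)$ for all states $S_i$ is now $O(tN^2)$
• Why? Each of the $t$ steps updates $N$ states, and each update sums over $N$ predecessor states.
• Cost of the clumsy solution is $O(N^t)$
• Why? It enumerates every path of length $t$, and there are $N^t$ of them.
• This is the power of dynamic programming, which is widely used in HMMs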
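The two approaches can be compared in a short sketch. It assumes a hypothetical 3-state transition matrix `A` and a known start state. The function names `state_probs_dp` and `state_probs_bruteforce` are illustrative, not from the lecture. The first function propagates the state distribution one step at a time, and the second enumerates every path.

```python
from itertools import product

# Hypothetical transition matrix: A[i][j] = P(q_{t+1} = j | q_t = i).
A = [
    [0.0, 0.5, 0.5],
    [0.3, 0.3, 0.4],
    [1.0, 0.0, 0.0],
]

def state_probs_dp(A, start, t):
    """Dynamic programming: O(t * N^2). Returns [P(q_t = i) for each state i]."""
    n = len(A)
    p = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(t):
        # One inductive step: new p[j] is the sum over i of p[i] * A[i][j].
        p = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
    return p

def state_probs_bruteforce(A, start, t):
    """Clumsy solution: enumerate all N^t paths and add up their probabilities."""
    n = len(A)
    p = [0.0] * n
    for tail in product(range(n), repeat=t):
        path = (start,) + tail
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= A[a][b]
        p[path[-1]] += prob
    return p

dp = state_probs_dp(A, start=0, t=5)
bf = state_probs_bruteforce(A, start=0, t=5)
print(dp, bf)
```

Both functions should agree up to floating-point error, but the brute-force version becomes impractical quickly as `t` grows.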
Hidden state
• The previous example tries to estimate $P(q_t = S_i)$ unconditionally (using no other information)
• Suppose we can observe something that’s affected by the true state
Noisy observation of hidden state
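• Let’s denote the observation at time $t$ by $O_t$
• $O_t$ is noisily determined by the current state $q_t$
• Assume that $O_t$ is conditionally independent of $q_{t-1}, q_{t-2}, \dots, q_0, O_{t-1}, O_{t-2}, \dots, O_1, O_0$ given $q_t$
• In other words: $P(O_t = X \mid q_t = S_i) = P(O_t = X \mid q_t = S_i, \text{any earlier history})$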
• The robot with noisy sensors is a good example
• Question 1: (Evaluation) State estimation:
  • What is $P(q_T = S_i \mid O_1, \dots, O_T)$?
• Question 2: (Inference) Most probable path:
  • Given $O_1, \dots, O_T$, what is the most probable path of the states? And what is its probability?
• Question 3: (Learning) Learning HMMs:
  • Given $O_1, \dots, O_T$, what is the maximum likelihood HMM that could have produced this string of observations?
  • MLE
• Speech recognition/understanding
  • Phones → words, signal → phones
• Human genome project
• Consumer decision modeling
• Economics and finance
Basic operations in HMMs
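• For an observation sequence $O = O_1, \dots, O_T$, three basic HMM operations are:
  • Evaluation (state estimation): forward algorithm, $O(TN^2)$
  • Inference (most probable path): Viterbi algorithm, $O(TN^2)$
  • Learning (maximum likelihood HMM): EM, $O(TN^2)$ per iteration
• $T$ = number of time steps, $N$ = number of states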
Formal definition of HMM
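• The states are labeled $S_1, S_2, \dots, S_N$
• For a particular trial, let
  • $T$ be the number of observations
  • $N$ be the number of states
  • $M$ be the number of possible observations
  • $\{\pi_1, \pi_2, \dots, \pi_N\}$ be the starting state probabilities
  • $O = O_1 \dots O_T$ be a sequence of observations
  • $Q = q_1 q_2 \dots q_T$ be a path of states
• Then $\lambda = \langle N, M, \{\pi_i\}, \{a_{ij}\}, \{b_i(k)\} \rangle$ is the specification of an HMM
• The definitions of $a_{ij}$ and $b_i(k)$:
  • $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, the state transition probabilities
  • $b_i(k) = P(O_t = k \mid q_t = S_i)$, the observation probabilities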
Example
• Start randomly in state 1 or 2
• Choose one of the output symbols in each state at random
• Let’s generate a sequence of observations:
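The generation procedure can be sketched as follows. The parameter values of the slide example did not survive in this transcript, so `pi`, `A`, and `B` below are hypothetical numbers chosen only so that the chain starts in state 1 or 2 with equal probability. The function name `sample_hmm` and the symbol names are likewise assumptions.

```python
import random

symbols = ["X", "Y", "Z"]          # hypothetical output alphabet
pi = [0.5, 0.5, 0.0]               # start randomly in state 1 or 2 (indices 0 and 1)
A = [                              # hypothetical A[i][j] = P(next = j | current = i)
    [0.0, 1 / 3, 2 / 3],
    [1 / 3, 0.0, 2 / 3],
    [1 / 3, 1 / 3, 1 / 3],
]
B = [                              # hypothetical B[i][k] = P(output = k | state = i)
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]

def sample_hmm(pi, A, B, T, rng):
    """Draw a hidden path and an observation sequence, both of length T."""
    q = rng.choices(range(len(pi)), weights=pi)[0]
    path, obs = [], []
    for _ in range(T):
        path.append(q)
        # Emit a symbol that depends only on the current hidden state.
        obs.append(rng.choices(range(len(B[q])), weights=B[q])[0])
        # Move to the next hidden state.
        q = rng.choices(range(len(A[q])), weights=A[q])[0]
    return path, obs

path, obs = sample_hmm(pi, A, B, T=8, rng=random.Random(0))
print([symbols[k] for k in obs])   # what an observer sees
print(path)                        # what stays hidden
```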
Probability of a series of observations
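• What is $P(O) = P(O_1 O_2 O_3)$, the joint probability of the three observed symbols?
• Slow, stupid way: $P(O) = \sum_{Q} P(O \wedge Q) = \sum_{Q} P(O \mid Q)\, P(Q)$, summing over all paths $Q$ of length 3
• How do we compute $P(Q)$ for an arbitrary path $Q$?
• How do we compute $P(O \mid Q)$ for an arbitrary path $Q$?
• $P(Q)$ for an arbitrary path $Q = q_1 q_2 q_3$: $P(Q) = P(q_1)\, P(q_2 \mid q_1)\, P(q_3 \mid q_2)$
• $P(O \mid Q)$ for an arbitrary path $Q = q_1 q_2 q_3$: $P(O \mid Q) = P(O_1 \mid q_1)\, P(O_2 \mid q_2)\, P(O_3 \mid q_3)$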
Probability of a series of observations (cont.)
• Computational complexity of the slow, stupid answer:
  • $P(O)$ would require 27 $P(Q)$ and 27 $P(O \mid Q)$ computations
  • A sequence of 20 observations would need $3^{20} \approx 3.5$ billion $P(Q)$ and 3.5 billion $P(O \mid Q)$ computations
• So we have to find some smarter answer
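• Smart answer (based on dynamic programming)
• Given observations $O_1 O_2 \dots O_T$, define: $\alpha_t(i) = P(O_1 O_2 \dots O_t \wedge q_t = S_i \mid \lambda)$, where $1 \le t \le T$
• In the example, what is $\alpha_2(3)$?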
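A minimal sketch of the dynamic-programming answer, using the standard recursion $\alpha_1(i) = \pi_i\, b_i(O_1)$ and $\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_i \alpha_t(i)\, a_{ij}$, so that $P(O) = \sum_i \alpha_T(i)$. The parameters `pi`, `A`, `B` and the observation sequence are hypothetical, since the slide values did not survive. The brute-force function is included only to check the result on a tiny case.

```python
from itertools import product

pi = [0.5, 0.5, 0.0]                       # hypothetical start probabilities
A = [[0.0, 1 / 3, 2 / 3],                  # hypothetical a_ij
     [1 / 3, 0.0, 2 / 3],
     [1 / 3, 1 / 3, 1 / 3]]
B = [[0.5, 0.5, 0.0],                      # hypothetical b_i(k)
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
obs = [0, 0, 2]                            # hypothetical observations (symbol indices)

def forward(pi, A, B, obs):
    """Return alpha, where alpha[t][i] = P(O_1..O_{t+1} and state i at that step)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha

def likelihood_bruteforce(pi, A, B, obs):
    """Slow way: sum P(O | Q) P(Q) over all N^T paths Q."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

alpha = forward(pi, A, B, obs)
likelihood = sum(alpha[-1])                               # P(O)
posterior = [a / likelihood for a in alpha[-1]]           # P(q_T = S_i | O_1..O_T)
print(likelihood, posterior)
```

Normalizing the last row of `alpha` gives the state estimate asked for in Question 1.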
Question 2: (Inference) Most probable path
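• We’re going to compute the following variables: $\delta_t(i) = \max_{q_1 q_2 \dots q_{t-1}} P(q_1 q_2 \dots q_{t-1} \wedge q_t = S_i \wedge O_1 \dots O_t)$
• It’s the probability of the path of length $t - 1$ with the maximum chance of doing all these things: OCCURRING and ENDING UP IN STATE $S_i$ and PRODUCING OUTPUT $O_1 \dots O_t$
• DEFINE: $\text{mpp}_t(i)$ = that path, so $\delta_t(i) = P(\text{mpp}_t(i))$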
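These variables can be computed with the standard Viterbi recursion $\delta_1(i) = \pi_i\, b_i(O_1)$ and $\delta_{t+1}(j) = \max_i \delta_t(i)\, a_{ij}\, b_j(O_{t+1})$, keeping a back-pointer for each maximizing $i$. The sketch below assumes hypothetical parameters `pi`, `A`, `B` and a hypothetical observation sequence, and the function name `viterbi` is illustrative.

```python
pi = [0.5, 0.5, 0.0]                       # hypothetical start probabilities
A = [[0.0, 1 / 3, 2 / 3],                  # hypothetical a_ij
     [1 / 3, 0.0, 2 / 3],
     [1 / 3, 1 / 3, 1 / 3]]
B = [[0.5, 0.5, 0.0],                      # hypothetical b_i(k)
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
obs = [0, 0, 2]                            # hypothetical observations (symbol indices)

def viterbi(pi, A, B, obs):
    """Return (most probable state path, joint probability of that path and obs)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []                           # backptr[t][j] = best predecessor of state j
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step_ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step_ptr)
        delta = new_delta
    # Trace back from the best final state (ties go to the lowest index).
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]

path, prob = viterbi(pi, A, B, obs)
print(path, prob)
```

Like the forward recursion, this costs $O(TN^2)$, because each step replaces a sum over predecessors with a max.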
Question 3: (Learning) Learning HMMs
• $\lambda$ is the notation for our HMM parameters
• Now we want to estimate $\lambda$ from the observations
• As usual, we could use maximum likelihood: $\lambda^* = \arg\max_{\lambda} P(O_1 \dots O_T \mid \lambda)$
EM for HMMs
• If we knew $\lambda$ we could estimate EXPECTATIONS of quantities such as
  • Expected number of times in state $S_i$
  • Expected number of transitions $S_i \to S_j$
• If we knew the quantities such as
  • Expected number of times in state $S_i$
  • Expected number of transitions $S_i \to S_j$
• We could compute the MAX LIKELIHOOD estimate of $\lambda$
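One iteration of this alternation can be sketched as below. It follows the standard Baum-Welch update for a single observation sequence. The 2-state, 2-symbol parameters are hypothetical and strictly positive so that no division by zero occurs, and the function names are assumptions rather than the lecture's notation.

```python
pi = [0.6, 0.4]                            # hypothetical initial estimate lambda(0)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0, 1, 1, 1, 0]             # hypothetical observations (symbol indices)

def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[j][o] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def em_step(pi, A, B, obs):
    """One EM iteration. Returns (new_pi, new_A, new_B, likelihood under the old parameters)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    lik = sum(alpha[-1])
    # E-step: gamma[t][i] = P(q_t = S_i | O), xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O).
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: ratios of expected counts.
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, lik

new_pi, new_A, new_B, likelihood_before = em_step(pi, A, B, obs)
_, _, _, likelihood_after = em_step(new_pi, new_A, new_B, obs)
print(likelihood_before, likelihood_after)
```

The likelihood should not decrease from one iteration to the next, which is the usual sanity check when implementing EM.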
• Bad news
  • There are lots of local minima
• Good news
  • The local minima are usually adequate models of the data
• Notice
  • EM does not estimate the number of states. That must be given.
  • Often, HMMs are forced to have some links with zero probability. This is done by setting $a_{ij} = 0$ in the initial estimate $\lambda(0)$
• Easy extension of everything seen today:
  • HMMs with real-valued outputs