Lecture 13: Hidden Markov Model
Shuai Li
John Hopcroft Center, Shanghai Jiao Tong University
https://shuaili8.github.io
https://shuaili8.github.io/Teaching/VE445/index.html
A Markov system
• There are N states s_1, s_2, …, s_N, and the time steps are discrete: t = 0, 1, 2, …
• On the t-th time step the system is in exactly one of the available states; call it q_t
• Between each time step, the next state is chosen based only on the current state q_t
• The current state determines the probability distribution for the next state (simulation sketch below)
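To make this concrete, here is a minimal simulation sketch (not from the slides); the 3-state transition matrix is made up for illustration, and `simulate` is a hypothetical helper that just repeatedly samples the next state from the row of the current one.

```python
import numpy as np

# Made-up transition matrix for a 3-state Markov system:
# row i gives P(next state = s_j | current state = s_i).
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.3, 0.7],
              [0.4, 0.0, 0.6]])

rng = np.random.default_rng(0)

def simulate(A, q0, steps):
    """Sample a path: the next state depends only on the current one."""
    path = [q0]
    for _ in range(steps):
        path.append(int(rng.choice(len(A), p=A[path[-1]])))
    return path

print(simulate(A, q0=2, steps=10))  # e.g. a random path starting in state s_3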
Example
• Three states
• Current state: s_3
Example (cont.)
• Three states
• Current state: s_2
Example (cont.)
• Three states
• The transition matrix: entry (i, j) gives the probability of moving from state s_i to state s_j
Example (cont.)
Markovian property
• q_{t+1} is independent of q_{t-1}, q_{t-2}, …, q_0 given q_t
• In other words: P(q_{t+1} = s_j | q_t = s_i, q_{t-1}, q_{t-2}, …, q_0) = P(q_{t+1} = s_j | q_t = s_i)
Example 2
Example
• A human and a robot wander around randomly on a grid
• Note: N (number of states) = 18 × 18 = 324, one state for each (human position, robot position) pair
Example (cont.)
• Each time step the human/robot moves randomly to an adjacent cell
• Typical questions:
• "What's the expected time until the human is crushed like a bug?"
• "What's the probability that the robot will hit the left wall before it hits the human?"
• "What's the probability the robot crushes the human on the next time step?"
Example (cont.)
• The current time is t, and the human remains uncrushed. What's the probability of a crushing occurring at time t + 1?
• If the robot is blind:
• We can compute this in advance
• If the robot is omniscient (i.e., if it knows the current state):
• It can compute the probability directly
• If the robot has some sensors, but incomplete state information:
• Hidden Markov Models are applicable
P(q_t = s) -- A clumsy solution
• Step 1: Work out how to compute P(Q) for any path Q = q_1 q_2 ⋯ q_t
• Step 2: Use this knowledge to get P(q_t = s): sum P(Q) over all t-step paths that end in state s
P(q_t = s) -- A cleverer solution
• For each state s_i, define p_t(i) = P(q_t = s_i), the probability of being in state s_i at time t
• Write a_ij = P(q_{t+1} = s_j | q_t = s_i) for the transition probabilities
• Easy to do inductive computation:
• p_0(i) is given by the initial state distribution
• p_{t+1}(j) = Σ_i p_t(i) a_ij (code sketch below)
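In code, the inductive computation is one vector-matrix product per time step. A minimal sketch, reusing the made-up matrix from the earlier snippet; `state_marginals` is a hypothetical helper name.

```python
import numpy as np

# Same made-up transition matrix as in the simulation sketch.
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.3, 0.7],
              [0.4, 0.0, 0.6]])
p0 = np.array([0.0, 0.0, 1.0])  # e.g. known to start in state s_3

def state_marginals(A, p0, t):
    """p_{t+1}(j) = sum_i p_t(i) * a_ij: one vector-matrix product per step."""
    p = p0
    for _ in range(t):
        p = p @ A  # O(N^2) work per time step
    return p

print(state_marginals(A, p0, t=5))  # p_5(i) for every state, O(t N^2) in total
```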
Complexity comparison
• Cost of computing p_t(i) for all states s_i is now O(t N^2)
• Why? Each of the t steps updates N entries, and each entry sums over N predecessor states
• The first method costs O(N^t)
• Why? It enumerates every possible t-step path, and there are N^t of them
• This is the power of dynamic programming, which is used throughout HMM algorithms
Example (cont.)
• It's currently time t, and the human remains uncrushed. What's the probability of a crushing occurring at time t + 1?
• If the robot is blind:
• We can compute this in advance
• If the robot is omniscient (i.e., if it knows the state at time t):
• It can compute the probability directly
• If the robot has some sensors, but incomplete state information:
• Hidden Markov Models are applicable
Hidden state
• The previous example tries to estimate P(q_t = s_i) unconditionally, using no observed information
• Suppose we can observe something that is affected by the true state
What the robot sees (uncorrupted data)
What the robot sees (corrupted data)
Noisy observation of hidden state
• Let's denote the observation at time t by O_t
• O_t is noisily determined by the current state q_t
• Assume that O_t is conditionally independent of q_{t-1}, q_{t-2}, …, q_0, O_{t-1}, O_{t-2}, …, O_1, O_0 given q_t
• In other words: P(O_t = X | q_t = s_i, any earlier states and observations) = P(O_t = X | q_t = s_i) (simulation sketch below)
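A sketch extending the earlier simulation with noisy emissions; the emission matrix B is made up, with each state mostly emitting "its own" symbol so the observations are informative but corrupted.

```python
import numpy as np

# Made-up emission matrix: B[i, k] = P(O_t = v_k | q_t = s_i).
A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.3, 0.7],
              [0.4, 0.0, 0.6]])
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

rng = np.random.default_rng(0)

def simulate_hmm(A, B, q0, steps):
    """At each step the current state emits a symbol, then the chain moves."""
    states, obs = [q0], []
    for _ in range(steps):
        obs.append(int(rng.choice(B.shape[1], p=B[states[-1]])))
        states.append(int(rng.choice(len(A), p=A[states[-1]])))
    return states, obs

states, obs = simulate_hmm(A, B, q0=2, steps=10)
print(obs)     # what the robot "sees"
print(states)  # the hidden truth it cannot observe directly
```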
Hidden Markov models
• The robot with noisy sensors is a good example
• Question 1 (Evaluation) State estimation: what is P(q_t = s_i | O_1, …, O_t)?
• Question 2 (Inference) Most probable path: given O_1, …, O_t, what is the most probable path of states, and what is its probability?
• Question 3 (Learning) Learning HMMs: given O_1, …, O_t, what is the maximum likelihood HMM that could have produced this string of observations? (MLE)
Applications of HMMs
• Robot planning + sensing when there's uncertainty
• Speech recognition/understanding: phones → words, signal → phones
• Human genome project
• Consumer decision modeling
• Economics and finance
• …
Basic operations in HMMs
• For an observation sequence O = O_1, …, O_T, the three basic HMM operations are:

  Problem                                      Algorithm           Complexity
  Evaluation: calculate P(O | λ)               Forward             O(T N^2)
  Inference: compute Q* = argmax_Q P(Q | O)    Viterbi decoding    O(T N^2)
  Learning: compute λ* = argmax_λ P(O | λ)     Baum-Welch (EM)     O(T N^2) per iteration

• T = # timesteps, N = # states
Formal definition of HMM
• The states are labeled s_1, s_2, …, s_N
• For a particular trial, let:
• T be the number of observations
• N be the number of states
• M be the number of possible observation symbols
• π_1, π_2, …, π_N be the starting state probabilities, π_i = P(q_1 = s_i)
• O = O_1 … O_T be a sequence of observations
• Q = q_1 q_2 ⋯ q_T be a path of states
• Then λ = (N, M, {π_i}, {a_ij}, {b_i(k)}) is the specification of an HMM
• The definitions of a_ij and b_i(k) are introduced on the next page
Formal definition of HMM (cont.)
• a_ij = P(q_{t+1} = s_j | q_t = s_i): the state transition probabilities
• b_i(k) = P(O_t = v_k | q_t = s_i): the observation (emission) probabilities, where v_1, …, v_M are the possible observation symbols (array sketch below)
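A sketch of this specification as plain NumPy arrays; the starting probabilities mirror the upcoming example's "start randomly in state 1 or 2", while the transition and emission numbers are made up. Later sketches reuse these `pi`, `A`, `B` names.

```python
import numpy as np

# λ = (N, M, {π_i}, {a_ij}, {b_i(k)}) as plain arrays; numbers are made up.
pi = np.array([0.5, 0.5, 0.0])        # starting state probabilities π_i
A  = np.array([[0.5, 0.5, 0.0],       # a_ij = P(q_{t+1} = s_j | q_t = s_i)
               [0.0, 0.3, 0.7],
               [0.4, 0.0, 0.6]])
B  = np.array([[0.8, 0.1, 0.1],       # b_i(k) = P(O_t = v_k | q_t = s_i)
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
N, M = A.shape[0], B.shape[1]         # number of states / observation symbols

# Sanity check: every row is a probability distribution.
assert np.isclose(pi.sum(), 1)
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```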
Example
• Start randomly in state 1 or 2
• Choose one of the output symbols in each state at random
• Let's generate a sequence of observations, one random draw per step (stepped through over several slides)
Probability of a series of observations
• What is P(O) = P(O_1 O_2 O_3) = P(O_1 = X ∧ O_2 = X ∧ O_3 = Z)?
• Slow, stupid way: P(O) = Σ_Q P(O ∧ Q) = Σ_Q P(O | Q) P(Q), summing over every length-3 path Q
• How do we compute P(Q) for an arbitrary path Q?
• How do we compute P(O | Q) for an arbitrary path Q?
Probability of a series of observations (cont.)
• P(Q) for an arbitrary path Q = q_1 q_2 ⋯ q_T:
• P(Q) = P(q_1) ∏_{t=2}^{T} P(q_t | q_{t-1}) = π_{q_1} · a_{q_1 q_2} · a_{q_2 q_3} ⋯ a_{q_{T-1} q_T}
Probability of a series of observations (cont.)
• P(O | Q) for an arbitrary path Q:
• P(O | Q) = ∏_{t=1}^{T} P(O_t | q_t) = b_{q_1}(O_1) · b_{q_2}(O_2) ⋯ b_{q_T}(O_T) (code sketch below)
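A sketch of both formulas, plus the "slow, stupid way" as a brute-force sum over all N^T paths; it assumes the `pi`, `A`, `B` arrays from the specification sketch above, with states and symbols 0-indexed.

```python
from itertools import product

def prob_path(pi, A, Q):
    """P(Q) = π_{q_1} * a_{q_1 q_2} * a_{q_2 q_3} * ..."""
    p = pi[Q[0]]
    for s, s_next in zip(Q, Q[1:]):
        p *= A[s, s_next]
    return p

def prob_obs_given_path(B, O, Q):
    """P(O | Q) = b_{q_1}(O_1) * b_{q_2}(O_2) * ..."""
    p = 1.0
    for s, o in zip(Q, O):
        p *= B[s, o]
    return p

def prob_obs_bruteforce(pi, A, B, O):
    """The slow, stupid way: sum P(O | Q) P(Q) over all N^T paths Q."""
    N, T = len(pi), len(O)
    return sum(prob_path(pi, A, Q) * prob_obs_given_path(B, O, Q)
               for Q in product(range(N), repeat=T))
```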
Probability of a series of observations (cont.)
• Computational complexity of the slow, stupid answer:
• P(O) would require 27 P(Q) computations and 27 P(O | Q) computations: one per path, and with 3 states there are 3^3 = 27 length-3 paths
• A sequence of 20 observations would need 3^20 ≈ 3.5 billion P(Q) computations and 3.5 billion P(O | Q) computations
• So we have to find some smarter answer
Probability of a series of observations (cont.)
• Smart answer (based on dynamic programming)
• Given observations O_1 O_2 … O_T, define:
• α_t(i) = P(O_1 O_2 ⋯ O_t ∧ q_t = s_i), for 1 ≤ t ≤ T
• In the example, what is α_2(3)?
α_t(i): easy to define recursively
• α_1(i) = P(O_1 ∧ q_1 = s_i) = π_i b_i(O_1)
• α_{t+1}(j) = P(O_1 ⋯ O_{t+1} ∧ q_{t+1} = s_j) = (Σ_i α_t(i) a_ij) · b_j(O_{t+1}) (code sketch below)
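A sketch of the forward recursion under the same assumed `pi`, `A`, `B`; encoding the symbols X, Y, Z as 0, 1, 2, the final line recovers P(O) and agrees with the brute-force sum above.

```python
import numpy as np

# Forward recursion; assumes pi, A, B from the specification sketch above.
def forward(pi, A, B, O):
    """Return the T x N table of forward variables (row t-1 holds α_t(·))."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                 # α_1(i) = π_i b_i(O_1)
    for t in range(1, T):
        # α_{t+1}(j) = (Σ_i α_t(i) a_ij) b_j(O_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha

O = [0, 0, 2]                # e.g. the sequence X X Z with X, Y, Z coded 0, 1, 2
alpha = forward(pi, A, B, O)
print(alpha[-1].sum())       # P(O); agrees with prob_obs_bruteforce(pi, A, B, O)
```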
πΌπ‘(π) in the example
• We see O_1 O_2 O_3 = X X Z; the slide works the α_t(i) values out numerically for this sequence
Easy question
• We can cheaply compute α_t(i) = P(O_1 O_2 ⋯ O_t ∧ q_t = s_i)
• (How) can we cheaply compute P(O_1 O_2 ⋯ O_t)?
• (How) can we cheaply compute P(q_t = s_i | O_1 O_2 ⋯ O_t)?
Easy question (cont.)
• We can cheaply compute α_t(i) = P(O_1 O_2 ⋯ O_t ∧ q_t = s_i)
• P(O_1 O_2 ⋯ O_t) = Σ_j α_t(j): sum the forward variables over states
• P(q_t = s_i | O_1 O_2 ⋯ O_t) = α_t(i) / Σ_j α_t(j): normalize them (snippet below)
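Both answers fall out of the same α table; a short snippet reusing the `forward` sketch and the `pi`, `A`, `B`, `O` values from above.

```python
alpha = forward(pi, A, B, O)

evidence = alpha[-1].sum()          # P(O_1 ... O_t) = Σ_j α_t(j), here t = T
filtered = alpha / alpha.sum(axis=1, keepdims=True)
print(filtered[-1])                 # P(q_t = s_i | O_1 ... O_t) for each state i
```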
Recall: Hidden Markov models
• The robot with noisy sensors is a good example
• Question 1 (Evaluation) State estimation: what is P(q_t = s_i | O_1, …, O_t)?
• Question 2 (Inference) Most probable path: given O_1, …, O_t, what is the most probable path of states, and what is its probability?
• Question 3 (Learning) Learning HMMs: given O_1, …, O_t, what is the maximum likelihood HMM that could have produced this string of observations? (MLE)
Most probable path (MPP) given observations
• Given O_1 O_2 ⋯ O_t, the most probable path is Q* = argmax_{q_1, …, q_t} P(q_1 ⋯ q_t | O_1 ⋯ O_t)
• Naively trying every path is again exponentially expensive, so we need a dynamic-programming approach
Efficient MPP computation
• We're going to compute the following variables:
• δ_t(i) = max_{q_1, …, q_{t-1}} P(q_1 ⋯ q_{t-1} ∧ q_t = s_i ∧ O_1 ⋯ O_t)
• δ_t(i) is the probability of the best length-t path: the one with the maximum chance of occurring, ending up in state s_i, and producing output O_1 … O_t
• Define: mpp_t(i) = that path
• So: δ_t(i) = Prob(mpp_t(i))
The Viterbi algorithm
• Base case: δ_1(i) = P(q_1 = s_i ∧ O_1) = π_i b_i(O_1), and mpp_1(i) is the one-state path (s_i)
• Induction: δ_{t+1}(j) = max_i [δ_t(i) a_ij] · b_j(O_{t+1})
• Record the maximizing predecessor as a backpointer: ψ_{t+1}(j) = argmax_i δ_t(i) a_ij
• Summary: sweep forward in time computing δ and ψ, pick the best final state argmax_i δ_T(i), then follow the backpointers backwards to read off the most probable path; total cost is O(T N^2) (code sketch below)
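A sketch of Viterbi under the same assumed `pi`, `A`, `B`; `delta` holds δ_t(i), `psi` holds the backpointers, and the backtrace reads the path off in reverse.

```python
import numpy as np

# Viterbi decoding; assumes pi, A, B from the specification sketch above.
def viterbi(pi, A, B, O):
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))            # delta[t, i]: prob of best path ending in s_i
    psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of s_j at step t
    delta[0] = pi * B[:, O[0]]          # base case: π_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] = δ_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]            # best final state
    for t in range(T - 1, 0, -1):               # follow backpointers in reverse
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

path, prob = viterbi(pi, A, B, [0, 0, 2])
print(path, prob)   # most probable state path and its joint probability with O
```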
Recall: Hidden Markov models
• The robot with noisy sensors is a good example
• Question 1 (Evaluation) State estimation: what is P(q_t = s_i | O_1, …, O_t)?
• Question 2 (Inference) Most probable path: given O_1, …, O_t, what is the most probable path of states, and what is its probability?
• Question 3 (Learning) Learning HMMs: given O_1, …, O_t, what is the maximum likelihood HMM that could have produced this string of observations? (MLE)
Inferring an HMM
• Remember, we've been doing things like P(O_1 O_2 ⋯ O_T | λ)
• That "λ" is the notation for our HMM parameters: λ = ({π_i}, {a_ij}, {b_i(k)})
• Now we want to estimate λ from the observations
• As usual, we could use:
• (i) Maximum likelihood: λ* = argmax_λ P(O_1 ⋯ O_T | λ)
• (ii) Bayes: work out the posterior P(λ | O_1 ⋯ O_T)
Max likelihood HMM estimation
• Define:
• γ_t(i) = P(q_t = s_i | O_1 O_2 ⋯ O_T, λ): the probability of being in state s_i at time t, given all the observations
• ε_t(i, j) = P(q_t = s_i ∧ q_{t+1} = s_j | O_1 O_2 ⋯ O_T, λ): the probability of the transition s_i → s_j at time t, given all the observations
• Both can be computed efficiently for all i, j, t from the forward variables α_t(i) together with analogous backward variables β_t(i)
Max likelihood HMM estimation (cont.)
• Expected number of times in state s_i = Σ_{t=1}^{T} γ_t(i)
• Expected number of transitions s_i → s_j = Σ_{t=1}^{T-1} ε_t(i, j)
• The maximum likelihood re-estimates are ratios of these expected counts:
• π_i = γ_1(i)
• a_ij = Σ_{t=1}^{T-1} ε_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)
• b_i(k) = Σ_{t: O_t = v_k} γ_t(i) / Σ_{t=1}^{T} γ_t(i)
EM for HMMs
• If we knew λ we could estimate EXPECTATIONS of quantities such as:
• Expected number of times in state i
• Expected number of transitions i → j
• If we knew the quantities such as:
• Expected number of times in state i
• Expected number of transitions i → j
• We could compute the MAX LIKELIHOOD estimates of λ = ({π_i}, {a_ij}, {b_i(k)})
• Alternating these two steps is exactly EM; for HMMs it is known as the Baum-Welch algorithm (sketch below)
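A sketch of a single Baum-Welch iteration for one observation sequence, reusing `forward` and the assumed `pi`, `A`, `B` from above; a real implementation would add numerical scaling and loop to convergence, which this sketch omits.

```python
import numpy as np

def backward(A, B, O):
    """Backward variables: row t holds P(remaining observations | state at step t)."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(pi, A, B, O):
    alpha, beta = forward(pi, A, B, O), backward(A, B, O)
    evidence = alpha[-1].sum()                      # P(O | current λ)
    # E-step: expected state occupancies and transitions.
    gamma = alpha * beta / evidence                 # gamma[t, i] ~ γ_{t+1}(i)
    eps = (alpha[:-1, :, None] * A[None]            # eps[t, i, j] ~ ε_{t+1}(i, j)
           * (B[:, O[1:]].T * beta[1:])[:, None, :]) / evidence
    # M-step: ratios of expected counts.
    new_pi = gamma[0]
    new_A = eps.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[np.array(O) == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

Iterating `baum_welch_step` until the evidence stops improving is the full EM loop.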
EM for HMMs
• Bad news:
• There are lots of local optima
• Good news:
• The local optima are usually adequate models of the data
• Notice:
• EM does not estimate the number of states; that must be given
• Often, HMMs are forced to have some links with zero probability. This is done by setting a_ij = 0 in the initial estimate λ(0)
• Easy extension of everything seen today: HMMs with real-valued outputs