Hidden Markov Models
CS 4495 Computer Vision
Aaron Bobick, School of Interactive Computing
Administrivia
• PS4 – going OK?
• Please share your experiences on Piazza – e.g., something subtle you discovered about using vl_sift. If you want to talk about what scales worked and why, that's OK too.
Outline
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture

Slides "borrowed" from UMd and elsewhere. Material from slides by Sebastian Thrun and Yair Weiss.
Audio Spectrum
Audio Spectrum of the Song of the Prothonotary Warbler
Bird Sounds
Chestnut-sided Warbler Prothonotary Warbler
Questions One Could Ask
• What bird is this? → Time series classification
• How will the song continue? → Time series prediction
• Is this bird sick? → Outlier detection
• What phases does this song have? → Time series segmentation
Other Sound Samples
Another Time Series Problem

[Figure: stock price time series for Intel, Cisco, General Electric, Microsoft]
Questions One Could Ask
• Will the stock go up or down? → Time series prediction
• What type of stock is this (e.g., risky)? → Time series classification
• Is the behavior abnormal? → Outlier detection
Music Analysis
Questions One Could Ask
• Is this Beethoven or Bach? → Time series classification
• Can we compose more of that? → Time series prediction/generation
• Can we segment the piece into themes? → Time series segmentation
For vision: Waving, pointing, controlling?
The Real Question
• How do we model these problems?
• How do we formulate these questions as inference/learning problems?
Outline For Today
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture
• Summary
Weather: A Markov Model (maybe?)

From \ To   Sunny   Rainy   Snowy
Sunny       80%     15%     5%
Rainy       38%     60%     2%
Snowy       75%     5%      20%

Probability of moving to a given state depends only on the current state: 1st-order Markovian.
Ingredients of a Markov Model
• States: S = {S1, S2, ..., SN}
• State transition probabilities: aij = P(qt+1 = Sj | qt = Si)
• Initial state distribution: πi = P[q1 = Si]
Ingredients of Our Markov Model
• States: {Ssunny, Srainy, Ssnowy}
• State transition probabilities:

      | .80  .15  .05 |
  A = | .38  .60  .02 |
      | .75  .05  .20 |

• Initial state distribution: π = (.7  .25  .05)
Probability of a Time Series
• Given:

  π = (.7  .25  .05)

      | .80  .15  .05 |
  A = | .38  .60  .02 |
      | .75  .05  .20 |

• What is the probability of the series sunny, rainy, rainy, rainy, snowy, snowy?

  P = P(Ssunny) · P(Srainy|Ssunny) · P(Srainy|Srainy) · P(Srainy|Srainy) · P(Ssnowy|Srainy) · P(Ssnowy|Ssnowy)
    = 0.7 · 0.15 · 0.6 · 0.6 · 0.02 · 0.2 = 0.0001512
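The calculation above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original slides; π and A are the weather model values given above, and states are encoded as indices 0/1/2.

```python
# Probability of a state sequence under a 1st-order Markov chain:
# P(q1, ..., qT) = pi[q1] * product of transition probabilities.
# Values for PI and A are the weather model from the slides.

PI = [0.7, 0.25, 0.05]                 # initial distribution (sunny, rainy, snowy)
A = [[0.8, 0.15, 0.05],                # row i, col j: P(next = j | current = i)
     [0.38, 0.6, 0.02],
     [0.75, 0.05, 0.2]]

def sequence_probability(states):
    """P(q1) * P(q2|q1) * ... * P(qT|qT-1)."""
    p = PI[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

SUNNY, RAINY, SNOWY = 0, 1, 2
p = sequence_probability([SUNNY, RAINY, RAINY, RAINY, SNOWY, SNOWY])
print(p)  # 0.0001512, up to floating-point rounding
```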
Outline For Today
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture
• Summary
Hidden Markov Models

[Figure: the same weather Markov chain, now NOT OBSERVABLE; each hidden state emits OBSERVABLE symbols (what people carry) according to its own output probabilities.]
Probability of a Time Series
• Given:

  π = (.7  .25  .05)

      | .80  .15  .05 |        | .60  .30  .10 |
  A = | .38  .60  .02 |    B = | .05  .30  .65 |
      | .75  .05  .20 |        | .00  .50  .50 |

• What is the probability of this series of observations?

  P(O|λ) = Σ_{all Q} P(O|Q) P(Q) = Σ_{q1,...,q7} P(o1, ..., o7 | q1, ..., q7) P(q1, ..., q7)

  e.g., P(O) = P(Ocoat, Ocoat, Oumbrella, ..., Oumbrella) — one term per possible state sequence, which quickly becomes intractable to expand directly.
Specification of an HMM
• N – number of states
• Q = {q1, q2, ..., qT} – sequence of states
• Some form of output symbols:
  • Discrete – finite vocabulary of symbols of size M. One symbol is "emitted" each time a state is visited (or transition taken).
  • Continuous – an output density in some feature space associated with each state, where an output is emitted with each visit.
• For a given observation sequence O:
  • O = {o1, o2, ..., oT} – oi is the observed symbol or feature at time i
Specification of an HMM
• A – the state transition probability matrix
  • aij = P(qt+1 = j | qt = i)
• B – observation probability distribution
  • Discrete: bj(k) = P(ot = k | qt = j),  1 ≤ k ≤ M
  • Continuous: bj(x) = p(ot = x | qt = j)
• π – the initial state distribution
  • π(j) = P(q1 = j)
• The full HMM over a set of states and an output space is thus specified as a triplet: λ = (A, B, π)
What does this have to do with Vision?
• Given some sequence of observations, what "model" generated them?
• Using the previous example: given some observation sequence of clothing:
• Is this Philadelphia, Boston, or Newark?
• Notice that if it were Boston vs. Arizona, we would not need the sequence!
Outline For Today
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture
• Summary
The 3 great problems in HMM modeling
1. Evaluation: given the model λ = (A, B, π), what is the probability of occurrence of a particular observation sequence O = {o1, ..., oT}, i.e., P(O|λ)?
  • This is the heart of the classification/recognition problem: I have a trained model for each of a set of classes; which one would most likely generate what I saw?
2. Decoding: find the optimal state sequence to produce an observation sequence O = {o1, ..., oT}
  • Useful in recognition problems – helps give meaning to states – which is not exactly legal but often done anyway.
3. Learning: determine model λ, given a training set of observations
  • Find λ such that P(O|λ) is maximal.
Problem 1: Naïve solution
• State sequence Q = (q1, ..., qT)
• Assume independent observations:

  P(O | q, λ) = Π_{t=1..T} P(ot | qt, λ) = bq1(o1) bq2(o2) ... bqT(oT)

NB: Observations are mutually independent, given the hidden states. That is, if I know the states then the previous observations don't help me predict a new observation. The states encode *all* the information. Usually only kind-of true – see CRFs.
Problem 1: Naïve solution
• But we know the probability of any given sequence of states:

  P(q | λ) = πq1 aq1q2 aq2q3 ... aq(T-1)qT
Problem 1: Naïve solution
• Given:

  P(O | q, λ) = bq1(o1) bq2(o2) ... bqT(oT)
  P(q | λ) = πq1 aq1q2 ... aq(T-1)qT

• We get:

  P(O | λ) = Σq P(O | q, λ) P(q | λ)

NB: The above sum is over all state paths. There are N^T state paths, each 'costing' O(T) calculations, leading to O(T·N^T) time complexity.
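The naïve sum can be written directly by enumerating all N^T state paths. This is a sketch for illustration only (it is exactly the O(T·N^T) computation the slide warns about); π, A, and B are the weather model from the earlier slides, and the observation symbols are just column indices of B (their names, e.g. coat vs. umbrella, are an assumption here, not given by the math).

```python
import itertools

# Naive evaluation: P(O|lambda) = sum over all N**T state paths of P(O|Q)P(Q).
# PI, A, B are the weather HMM from the slides; obs entries index B's columns.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]

def naive_evaluation(obs):
    N, T = len(PI), len(obs)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):   # all N**T paths
        p = PI[path[0]] * B[path[0]][obs[0]]             # pi_{q1} * b_{q1}(o1)
        for t in range(1, T):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

print(naive_evaluation([1, 1, 2, 2]))   # e.g. P(coat, coat, umbrella, umbrella)
```

A quick sanity check on such code: summing P(O|λ) over every possible observation sequence of a fixed length must give exactly 1.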
Problem 1: Efficient solution
• Define the auxiliary forward variable α:

  αt(i) = P(o1, ..., ot, qt = i | λ)

αt(i) is the probability of observing the partial sequence of observables o1, ..., ot AND being in state qt = i at time t.
Problem 1: Efficient solution
• Recursive algorithm:
• Initialize: α1(i) = πi bi(o1)
• Calculate: αt+1(j) = [Σ_{i=1..N} αt(i) aij] · bj(ot+1)
  (partial obs seq to t AND state i at t) × (transition to j at t+1) × (sensor); the sum is there because we can reach j from any preceding state.
• Obtain: P(O|λ) = Σ_{i=1..N} αT(i)
  (sum over the different ways of getting the obs seq)
• Complexity is only O(N²T)!!!
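The recursion above can be sketched directly. This is a hedged illustration using the same weather model as the earlier slides; a production implementation would scale or work in log space to avoid underflow on long sequences.

```python
# Forward algorithm: O(N^2 T) evaluation of P(O|lambda).
# PI, A, B are the weather HMM from the slides.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]

def forward(obs):
    """Return the trellis of alpha values; P(O|lambda) = sum(alpha[-1])."""
    N = len(PI)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(N)]]      # alpha_1(i) = pi_i b_i(o1)
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])                   # alpha_{t+1}(j)
    return alpha

alpha = forward([1, 1, 2, 2])
print(sum(alpha[-1]))   # P(O | lambda)
```

For short sequences this agrees exactly with the naïve sum over all state paths, while doing only N² work per time step.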
The Forward Algorithm (trellis diagram)

[Figure: trellis of states S1, S2, S3 unrolled over observations O1, O2, ..., OT]

  αt(i) = P(O1, ..., Ot, qt = Si)

  αt+1(j) = P(O1, ..., Ot+1, qt+1 = Sj)
          = Σ_{i=1..N} P(O1, ..., Ot+1, qt = Si, qt+1 = Sj)
          = Σ_{i=1..N} P(Ot+1, qt+1 = Sj | O1, ..., Ot, qt = Si) P(O1, ..., Ot, qt = Si)
          = Σ_{i=1..N} P(Ot+1, qt+1 = Sj | qt = Si) αt(i)
          = bj(Ot+1) Σ_{i=1..N} aij αt(i)

  α1(i) = πi bi(O1)
Problem 1: Alternative solution
Backward algorithm:
• Define the auxiliary backward variable β:

  βt(i) = P(ot+1, ot+2, ..., oT | qt = i, λ)

βt(i) is the probability of observing the sequence of observables ot+1, ..., oT GIVEN state qt = i at time t, and λ.
Problem 1: Alternative solution
• Recursive algorithm:
• Initialize: βT(j) = 1
• Calculate: βt(i) = Σ_{j=1..N} aij bj(ot+1) βt+1(j),   t = T−1, ..., 1
• Terminate: p(O|λ) = Σ_{i=1..N} πi bi(o1) β1(i)
• Complexity is O(N²T)
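The backward recursion admits the same kind of sketch as the forward one. Again a hedged illustration on the weather model from the slides; in `beta`, index 0 of the list holds β for the earliest remaining time step.

```python
# Backward algorithm: beta_T(i) = 1, then recurse from t = T-1 down to 1.
# PI, A, B are the weather HMM from the slides.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]

def backward(obs):
    """Return the trellis of beta values; beta[0] corresponds to t = 1."""
    N, T = len(PI), len(obs)
    beta = [[1.0] * N]                                       # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])                  # beta_t(i)
    return beta

obs = [1, 1, 2, 2]
beta = backward(obs)
# Termination: P(O|lambda) = sum_i pi_i b_i(o1) beta_1(i)
print(sum(PI[i] * B[i][obs[0]] * beta[0][i] for i in range(3)))
```

By construction this termination yields the same P(O|λ) as the forward algorithm.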
Forward-Backward
• Optimality criterion: choose the states qt that are individually most likely at each time t.
• The probability of being in state i at time t:

  γt(i) = p(qt = i | O, λ) = αt(i) βt(i) / Σ_{i=1..N} αt(i) βt(i)

  where the numerator αt(i) βt(i) = p(O, qt = i | λ) and the denominator = p(O|λ).

• αt(i) accounts for the partial observation sequence o1, o2, ..., ot
• βt(i) accounts for the remainder ot+1, ot+2, ..., oT
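Combining the two passes into γ can be sketched as below. This repeats the forward and backward routines so the block is self-contained; same weather model, no rescaling (fine for short sequences only).

```python
# gamma_t(i) = alpha_t(i) beta_t(i) / sum_i alpha_t(i) beta_t(i):
# the posterior probability of being in state i at time t, given all of O.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]
N = 3

def forward(obs):
    alpha = [[PI[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs):
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def gamma(obs):
    alpha, beta = forward(obs), backward(obs)
    out = []
    for a_t, b_t in zip(alpha, beta):
        norm = sum(x * y for x, y in zip(a_t, b_t))      # = P(O | lambda)
        out.append([x * y / norm for x, y in zip(a_t, b_t)])
    return out

g = gamma([1, 1, 2, 2])
print(g[0])   # posterior over (sunny, rainy, snowy) at t = 1
```

Note the normalizer is the same P(O|λ) at every t, so each γ row is a proper distribution over states.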
Problem 2: Decoding
• Choose the state sequence that maximizes the probability of the observation sequence.
• Viterbi algorithm – an inductive algorithm that keeps the best state sequence at each instant.

[Figure: trellis of states S1, S2, S3 over observations O1, O2, ..., OT]
Problem 2: Decoding
• Find the state sequence that maximizes P(q1, q2, ..., qT | O, λ)

Viterbi algorithm:
• Define the auxiliary variable δ:

  δt(i) = max_{q1,...,qt-1} P(q1, q2, ..., qt-1, qt = i, o1, o2, ..., ot | λ)

δt(i) is the probability of the most probable path ending in state qt = i.
Problem 2: Decoding
• Recurrent property: δt+1(j) = max_i (δt(i) aij) · bj(ot+1)
• Algorithm:
• 1. Initialize: δ1(i) = πi bi(o1),  1 ≤ i ≤ N;  ψ1(i) = 0

To get the state sequence, we need to keep track of the argument that maximizes this, for each t and j. Done via the array ψt(j).
Problem 2: Decoding
• 2. Recursion:

  δt(j) = max_{1≤i≤N} (δt-1(i) aij) · bj(ot),   2 ≤ t ≤ T, 1 ≤ j ≤ N
  ψt(j) = argmax_{1≤i≤N} (δt-1(i) aij)

• 3. Terminate:

  P* = max_{1≤i≤N} δT(i)
  qT* = argmax_{1≤i≤N} δT(i)

P* gives the state-optimized probability.
Q* is the optimal state sequence (Q* = {q1*, q2*, ..., qT*}).
Problem 2: Decoding
• 4. Backtrack the state sequence:

  qt* = ψt+1(qt+1*),   t = T−1, T−2, ..., 1

• O(N²T) time complexity

[Figure: trellis showing the backtracked optimal path]
Problem 3: Learning
• Train the HMM to encode an observation sequence such that the HMM will identify a similar observation sequence in the future.
• Find λ = (A, B, π) maximizing P(O|λ)
• General algorithm:
  • Initialize: λ0
  • Compute a new model λ, using λ0 and the observed sequence O
  • Then λ0 ← λ
  • Repeat steps 2 and 3 until: log P(O|λ) − log P(O|λ0) < d
Problem 3: Learning
Step 1 of the Baum-Welch algorithm:
• Let ξt(i,j) be the probability of being in state i at time t and in state j at time t+1, given λ and the observation sequence O:

  ξt(i,j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ)
          = αt(i) aij bj(ot+1) βt+1(j) / [Σ_{i=1..N} Σ_{j=1..N} αt(i) aij bj(ot+1) βt+1(j)]

The numerator is p(O and (take i to j at time t) | λ); the denominator is p(O|λ); so ξt(i,j) = p(take i to j at time t | O, λ).
Problem 3: Learning

[Figure: operations required for the computation of the joint event that the system is in state Si at time t and in state Sj at time t+1]
Problem 3: Learning
• Let γt(i) be the probability of being in state i at time t, given O:

  γt(i) = Σ_{j=1..N} ξt(i,j)

• Σ_{t=1..T−1} γt(i) = expected number of transitions from state i
• Σ_{t=1..T−1} ξt(i,j) = expected number of transitions i → j
Problem 3: Learning
Step 2 of the Baum-Welch algorithm:

  π̂i = γ1(i)
  – the expected frequency of state i at time t = 1

  âij = Σt ξt(i,j) / Σt γt(i)
  – ratio of the expected number of transitions from state i to j over the expected number of transitions from state i

  b̂j(k) = Σ_{t: ot = k} γt(j) / Σt γt(j)
  – ratio of the expected number of times in state j observing symbol k over the expected number of times in state j
Problem 3: Learning
• The Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables α, β.
• B-W is a special case of the EM algorithm:
  • E-step: calculation of ξ and γ
  • M-step: iterative calculation of π̂, âij, b̂j(k)
• Practical issues:
  • Can get stuck in local maxima
  • Numerical problems – use logs and scaling
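One full E-step/M-step iteration can be sketched as below. This is a minimal illustration assuming a single observation sequence and no scaling (exactly the numerical issue the slide warns about; a real implementation rescales α/β or works in log space). The model values are the weather HMM from the earlier slides.

```python
# One Baum-Welch re-estimation step: E-step computes xi and gamma from
# alpha/beta; M-step re-estimates pi, A, B from the expected counts.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]
N, M = 3, 3

def forward(obs):
    alpha = [[PI[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs):
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs):
    T = len(obs)
    alpha, beta = forward(obs), backward(obs)
    p_obs = sum(alpha[-1])                                # P(O | lambda)
    # E-step: xi_t(i,j) and gamma_t(i)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
             for t in range(T)]
    # M-step: re-estimate pi, A, B from expected counts
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step([1, 1, 2, 2, 0, 0])
```

Each step is guaranteed (by the EM argument) not to decrease P(O|λ), which is a useful check on an implementation.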
Now HMMs and Vision: Gesture Recognition…
"Gesture recognition"-like activities
Some thoughts about gesture
• There is a conference on Face and Gesture Recognition, so obviously gesture recognition is an important problem…
• Prototype scenario:
  • Subject does several examples of "each gesture"
  • System "learns" (or is trained) to have some sort of model for each
  • At run time, compare the input to the known models and pick one
• New-found life for gesture recognition:
Generic Gesture Recognition using HMMs
Nam, Y., & Wohn, K. (1996, July). Recognition of space-time hand-gestures using hidden Markov model. In ACM Symposium on Virtual Reality Software and Technology (pp. 51–58).
Generic gesture recognition using HMMs (1)
Data glove
Generic gesture recognition using HMMs (2)
Generic gesture recognition using HMMs (3)
Generic gesture recognition using HMMs (4)
Generic gesture recognition using HMMs (5)
Wins and Losses of HMMs in Gesture
• Good points about HMMs:
  • A learning paradigm that acquires spatial and temporal models and does some amount of feature selection.
  • Recognition is fast; training is not so fast, but not too bad.
• Not so good points:
  • If you know something about state definitions, it is difficult to incorporate.
  • Every gesture is a new class, independent of anything else you've learned.
  • → Particularly bad for "parameterized gesture."
Parameterized Gesture
“I caught a fish this big.”
Parametric HMMs (PAMI, 1999)
• Basic ideas:
  • Make the output probabilities of the state a function of the parameter of interest: bj(x) becomes b′j(x, θ).
  • Maintain the same temporal properties: aij unchanged.
  • Train with known parameter values to solve for the dependence of bj on θ.
  • During testing, use EM to find the θ that gives the highest probability. That probability is the confidence in recognition; the best θ is the parameter estimate.
• Issues:
  • How to represent the dependence on θ?
  • How to train given θ?
  • How to test for θ?
  • What are the limitations on the dependence on θ?
Linear PHMM – Representation
Represent the dependence on θ as linear movement of the mean of the Gaussians of the states: the mean of state j becomes a linear function of θ. Need to learn Wj and µj for each state j. (ICCV '98)
Linear PHMM - training • Need to derive EM equations for linear parameters and
proceed as normal:
Linear PHMM – testing
• Derive EM equations with respect to θ:
• We are testing by EM! (i.e. iterative): • Solve for γtk given guess for θ • Solve for θ given guess for γtk
How big was the fish?
Pointing
• Pointing is the prototypical example of a parameterized gesture.
• Assuming two DOF, can parameterize either by (x,y) or by (θ,φ) .
• Under the linear assumption, we must choose the parameterization carefully.
• A generalized non-linear map would allow greater freedom. (ICCV 99)
Linear pointing results
Test for both recognition and recovery:
If we prune based on legal θ (MAP via a uniform density):
Noise sensitivity
• Compare ad hoc procedure with PHMM parameter recovery (ignoring “their” recognition problem!!).
HMMs and vision • HMMs capture sequencing nicely in a probabilistic
manner.
• Moderate time to train, fast to test.
• More when we do activity recognition…