Hidden Markov Models
CS 4495 Computer Vision
Aaron Bobick, School of Interactive Computing
Administrivia
• PS4 – going OK?
• Please share your experiences on Piazza – e.g., something subtle you discovered about using vl_sift. If you want to talk about what scales worked and why, that's OK too.
Outline
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture

Slides "borrowed" from UMd and elsewhere. Material from slides by Sebastian Thrun and Yair Weiss.
Audio Spectrum
Audio Spectrum of the Song of the Prothonotary Warbler
Bird Sounds
Chestnut-sided Warbler Prothonotary Warbler
Questions One Could Ask
• What bird is this? → Time series classification
• How will the song continue? → Time series prediction
• Is this bird sick? → Outlier detection
• What phases does this song have? → Time series segmentation
Other Sound Samples
Another Time Series Problem

[Figure: stock price time series for Intel, Cisco, General Electric, Microsoft]
Questions One Could Ask
• Will the stock go up or down? → Time series prediction
• What type of stock is this (e.g., risky)? → Time series classification
• Is the behavior abnormal? → Outlier detection
Music Analysis
Questions One Could Ask
• Is this Beethoven or Bach? → Time series classification
• Can we compose more of that? → Time series prediction/generation
• Can we segment the piece into themes? → Time series segmentation
For vision: Waving, pointing, controlling?
The Real Question
• How do we model these problems?
• How do we formulate these questions as inference/learning problems?
Outline For Today
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture
• Summary
Weather: A Markov Model (maybe?)

From \ To   Sunny   Rainy   Snowy
Sunny       80%     15%     5%
Rainy       38%     60%     2%
Snowy       75%     5%      20%

Probability of moving to a given state depends only on the current state: 1st-order Markovian.
Ingredients of a Markov Model
• States: S = {S1, S2, ..., SN}
• State transition probabilities: aij = P(qt+1 = Sj | qt = Si)
• Initial state distribution: πi = P[q1 = Si]
Ingredients of Our Markov Model
• States: {Ssunny, Srainy, Ssnowy}
• State transition probabilities:

      | .80  .15  .05 |
  A = | .38  .60  .02 |
      | .75  .05  .20 |

• Initial state distribution: π = (.7  .25  .05)
Probability of a Time Series
• Given:

  π = (.7  .25  .05)

      | .80  .15  .05 |
  A = | .38  .60  .02 |
      | .75  .05  .20 |

• What is the probability of the series sunny, rainy, rainy, rainy, snowy, snowy?

  P = P(Ssunny) · P(Srainy|Ssunny) · P(Srainy|Srainy) · P(Srainy|Srainy) · P(Ssnowy|Srainy) · P(Ssnowy|Ssnowy)
    = 0.7 · 0.15 · 0.6 · 0.6 · 0.02 · 0.2 = 0.0001512
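The calculation above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original slides; π and A are the weather model values given above, and states are encoded as indices 0/1/2.

```python
# Probability of a state sequence under a 1st-order Markov chain:
# P(q1, ..., qT) = pi[q1] * product of transition probabilities.
# Values for PI and A are the weather model from the slides.

PI = [0.7, 0.25, 0.05]                 # initial distribution (sunny, rainy, snowy)
A = [[0.8, 0.15, 0.05],                # row i, col j: P(next = j | current = i)
     [0.38, 0.6, 0.02],
     [0.75, 0.05, 0.2]]

def sequence_probability(states):
    """P(q1) * P(q2|q1) * ... * P(qT|qT-1)."""
    p = PI[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

SUNNY, RAINY, SNOWY = 0, 1, 2
p = sequence_probability([SUNNY, RAINY, RAINY, RAINY, SNOWY, SNOWY])
print(p)  # 0.0001512, up to floating-point rounding
```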
Outline For Today
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture
• Summary
Hidden Markov Models

[Figure: the same weather Markov chain, now NOT OBSERVABLE; each hidden state emits OBSERVABLE symbols (what people carry) according to its own output probabilities.]
Probability of a Time Series
• Given:

  π = (.7  .25  .05)

      | .80  .15  .05 |        | .60  .30  .10 |
  A = | .38  .60  .02 |    B = | .05  .30  .65 |
      | .75  .05  .20 |        | .00  .50  .50 |

• What is the probability of this series of observations?

  P(O|λ) = Σ_{all Q} P(O|Q) P(Q) = Σ_{q1,...,q7} P(o1, ..., o7 | q1, ..., q7) P(q1, ..., q7)

  e.g., P(O) = P(Ocoat, Ocoat, Oumbrella, ..., Oumbrella) — one term per possible state sequence, which quickly becomes intractable to expand directly.
Specification of an HMM
• N – number of states
• Q = {q1, q2, ..., qT} – sequence of states
• Some form of output symbols:
  • Discrete – finite vocabulary of symbols of size M. One symbol is "emitted" each time a state is visited (or transition taken).
  • Continuous – an output density in some feature space associated with each state, where an output is emitted with each visit.
• For a given observation sequence O:
  • O = {o1, o2, ..., oT} – oi is the observed symbol or feature at time i
Specification of an HMM
• A – the state transition probability matrix
  • aij = P(qt+1 = j | qt = i)
• B – observation probability distribution
  • Discrete: bj(k) = P(ot = k | qt = j),  1 ≤ k ≤ M
  • Continuous: bj(x) = p(ot = x | qt = j)
• π – the initial state distribution
  • π(j) = P(q1 = j)
• The full HMM over a set of states and an output space is thus specified as a triplet: λ = (A, B, π)
What does this have to do with Vision?
• Given some sequence of observations, what "model" generated them?
• Using the previous example: given some observation sequence of clothing:
• Is this Philadelphia, Boston, or Newark?
• Notice that if it were Boston vs. Arizona, we would not need the sequence!
Outline For Today
• Time Series
• Markov Models
• Hidden Markov Models
• 3 computational problems of HMMs
• Applying HMMs in vision – Gesture
• Summary
The 3 great problems in HMM modeling
1. Evaluation: given the model λ = (A, B, π), what is the probability of occurrence of a particular observation sequence O = {o1, ..., oT}, i.e., P(O|λ)?
  • This is the heart of the classification/recognition problem: I have a trained model for each of a set of classes; which one would most likely generate what I saw?
2. Decoding: find the optimal state sequence to produce an observation sequence O = {o1, ..., oT}
  • Useful in recognition problems – helps give meaning to states – which is not exactly legal but often done anyway.
3. Learning: determine model λ, given a training set of observations
  • Find λ such that P(O|λ) is maximal.
Problem 1: Naïve solution
• State sequence Q = (q1, ..., qT)
• Assume independent observations:

  P(O | q, λ) = Π_{t=1..T} P(ot | qt, λ) = bq1(o1) bq2(o2) ... bqT(oT)

NB: Observations are mutually independent, given the hidden states. That is, if I know the states then the previous observations don't help me predict a new observation. The states encode *all* the information. Usually only kind-of true – see CRFs.
Problem 1: Naïve solution
• But we know the probability of any given sequence of states:

  P(q | λ) = πq1 aq1q2 aq2q3 ... aq(T-1)qT
Problem 1: Naïve solution
• Given:

  P(O | q, λ) = bq1(o1) bq2(o2) ... bqT(oT)
  P(q | λ) = πq1 aq1q2 ... aq(T-1)qT

• We get:

  P(O | λ) = Σq P(O | q, λ) P(q | λ)

NB: The above sum is over all state paths. There are N^T state paths, each 'costing' O(T) calculations, leading to O(T·N^T) time complexity.
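The naïve sum can be written directly by enumerating all N^T state paths. This is a sketch for illustration only (it is exactly the O(T·N^T) computation the slide warns about); π, A, and B are the weather model from the earlier slides, and the observation symbols are just column indices of B (their names, e.g. coat vs. umbrella, are an assumption here, not given by the math).

```python
import itertools

# Naive evaluation: P(O|lambda) = sum over all N**T state paths of P(O|Q)P(Q).
# PI, A, B are the weather HMM from the slides; obs entries index B's columns.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]

def naive_evaluation(obs):
    N, T = len(PI), len(obs)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):   # all N**T paths
        p = PI[path[0]] * B[path[0]][obs[0]]             # pi_{q1} * b_{q1}(o1)
        for t in range(1, T):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

print(naive_evaluation([1, 1, 2, 2]))   # e.g. P(coat, coat, umbrella, umbrella)
```

A quick sanity check on such code: summing P(O|λ) over every possible observation sequence of a fixed length must give exactly 1.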
Problem 1: Efficient solution
• Define the auxiliary forward variable α:

  αt(i) = P(o1, ..., ot, qt = i | λ)

αt(i) is the probability of observing the partial sequence of observables o1, ..., ot AND being in state qt = i at time t.
Problem 1: Efficient solution
• Recursive algorithm:
• Initialize: α1(i) = πi bi(o1)
• Calculate: αt+1(j) = [Σ_{i=1..N} αt(i) aij] · bj(ot+1)
  (partial obs seq to t AND state i at t) × (transition to j at t+1) × (sensor); the sum is there because we can reach j from any preceding state.
• Obtain: P(O|λ) = Σ_{i=1..N} αT(i)
  (sum over the different ways of getting the obs seq)
• Complexity is only O(N²T)!!!
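The recursion above can be sketched directly. This is a hedged illustration using the same weather model as the earlier slides; a production implementation would scale or work in log space to avoid underflow on long sequences.

```python
# Forward algorithm: O(N^2 T) evaluation of P(O|lambda).
# PI, A, B are the weather HMM from the slides.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]

def forward(obs):
    """Return the trellis of alpha values; P(O|lambda) = sum(alpha[-1])."""
    N = len(PI)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(N)]]      # alpha_1(i) = pi_i b_i(o1)
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])                   # alpha_{t+1}(j)
    return alpha

alpha = forward([1, 1, 2, 2])
print(sum(alpha[-1]))   # P(O | lambda)
```

For short sequences this agrees exactly with the naïve sum over all state paths, while doing only N² work per time step.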
The Forward Algorithm (trellis diagram)

[Figure: trellis of states S1, S2, S3 unrolled over observations O1, O2, ..., OT]

  αt(i) = P(O1, ..., Ot, qt = Si)

  αt+1(j) = P(O1, ..., Ot+1, qt+1 = Sj)
          = Σ_{i=1..N} P(O1, ..., Ot+1, qt = Si, qt+1 = Sj)
          = Σ_{i=1..N} P(Ot+1, qt+1 = Sj | O1, ..., Ot, qt = Si) P(O1, ..., Ot, qt = Si)
          = Σ_{i=1..N} P(Ot+1, qt+1 = Sj | qt = Si) αt(i)
          = bj(Ot+1) Σ_{i=1..N} aij αt(i)

  α1(i) = πi bi(O1)
Problem 1: Alternative solution
Backward algorithm:
• Define the auxiliary backward variable β:

  βt(i) = P(ot+1, ot+2, ..., oT | qt = i, λ)

βt(i) is the probability of observing the sequence of observables ot+1, ..., oT GIVEN state qt = i at time t, and λ.
Problem 1: Alternative solution
• Recursive algorithm:
• Initialize: βT(j) = 1
• Calculate: βt(i) = Σ_{j=1..N} aij bj(ot+1) βt+1(j),   t = T−1, ..., 1
• Terminate: p(O|λ) = Σ_{i=1..N} πi bi(o1) β1(i)
• Complexity is O(N²T)
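The backward recursion admits the same kind of sketch as the forward one. Again a hedged illustration on the weather model from the slides; in `beta`, index 0 of the list holds β for the earliest remaining time step.

```python
# Backward algorithm: beta_T(i) = 1, then recurse from t = T-1 down to 1.
# PI, A, B are the weather HMM from the slides.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]

def backward(obs):
    """Return the trellis of beta values; beta[0] corresponds to t = 1."""
    N, T = len(PI), len(obs)
    beta = [[1.0] * N]                                       # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])                  # beta_t(i)
    return beta

obs = [1, 1, 2, 2]
beta = backward(obs)
# Termination: P(O|lambda) = sum_i pi_i b_i(o1) beta_1(i)
print(sum(PI[i] * B[i][obs[0]] * beta[0][i] for i in range(3)))
```

By construction this termination yields the same P(O|λ) as the forward algorithm.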
Forward-Backward
• Optimality criterion: choose the states qt that are individually most likely at each time t.
• The probability of being in state i at time t:

  γt(i) = p(qt = i | O, λ) = αt(i) βt(i) / Σ_{i=1..N} αt(i) βt(i)

  where the numerator αt(i) βt(i) = p(O, qt = i | λ) and the denominator = p(O|λ).

• αt(i) accounts for the partial observation sequence o1, o2, ..., ot
• βt(i) accounts for the remainder ot+1, ot+2, ..., oT
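Combining the two passes into γ can be sketched as below. This repeats the forward and backward routines so the block is self-contained; same weather model, no rescaling (fine for short sequences only).

```python
# gamma_t(i) = alpha_t(i) beta_t(i) / sum_i alpha_t(i) beta_t(i):
# the posterior probability of being in state i at time t, given all of O.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]
N = 3

def forward(obs):
    alpha = [[PI[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs):
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def gamma(obs):
    alpha, beta = forward(obs), backward(obs)
    out = []
    for a_t, b_t in zip(alpha, beta):
        norm = sum(x * y for x, y in zip(a_t, b_t))      # = P(O | lambda)
        out.append([x * y / norm for x, y in zip(a_t, b_t)])
    return out

g = gamma([1, 1, 2, 2])
print(g[0])   # posterior over (sunny, rainy, snowy) at t = 1
```

Note the normalizer is the same P(O|λ) at every t, so each γ row is a proper distribution over states.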
Problem 2: Decoding
• Choose the state sequence that maximizes the probability of the observation sequence.
• Viterbi algorithm – an inductive algorithm that keeps the best state sequence at each instant.

[Figure: trellis of states S1, S2, S3 over observations O1, O2, ..., OT]
Problem 2: Decoding
• Find the state sequence that maximizes P(q1, q2, ..., qT | O, λ)

Viterbi algorithm:
• Define the auxiliary variable δ:

  δt(i) = max_{q1,...,qt-1} P(q1, q2, ..., qt-1, qt = i, o1, o2, ..., ot | λ)

δt(i) is the probability of the most probable path ending in state qt = i.
Problem 2: Decoding
• Recurrent property: δt+1(j) = max_i (δt(i) aij) · bj(ot+1)
• Algorithm:
• 1. Initialize: δ1(i) = πi bi(o1),  1 ≤ i ≤ N;  ψ1(i) = 0

To get the state sequence, we need to keep track of the argument that maximizes this, for each t and j. Done via the array ψt(j).
Problem 2: Decoding
• 2. Recursion:

  δt(j) = max_{1≤i≤N} (δt-1(i) aij) · bj(ot),   2 ≤ t ≤ T, 1 ≤ j ≤ N
  ψt(j) = argmax_{1≤i≤N} (δt-1(i) aij)

• 3. Terminate:

  P* = max_{1≤i≤N} δT(i)
  qT* = argmax_{1≤i≤N} δT(i)

P* gives the state-optimized probability.
Q* is the optimal state sequence (Q* = {q1*, q2*, ..., qT*}).
Problem 2: Decoding
• 4. Backtrack the state sequence:

  qt* = ψt+1(qt+1*),   t = T−1, T−2, ..., 1

• O(N²T) time complexity

[Figure: trellis showing the backtracked optimal path]
Problem 3: Learning
• Train the HMM to encode an observation sequence such that the HMM will identify a similar observation sequence in the future.
• Find λ = (A, B, π) maximizing P(O|λ)
• General algorithm:
  • Initialize: λ0
  • Compute a new model λ, using λ0 and the observed sequence O
  • Then λ0 ← λ
  • Repeat steps 2 and 3 until: log P(O|λ) − log P(O|λ0) < d
Problem 3: Learning
Step 1 of the Baum-Welch algorithm:
• Let ξt(i,j) be the probability of being in state i at time t and in state j at time t+1, given λ and the observation sequence O:

  ξt(i,j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ)
          = αt(i) aij bj(ot+1) βt+1(j) / [Σ_{i=1..N} Σ_{j=1..N} αt(i) aij bj(ot+1) βt+1(j)]

The numerator is p(O and (take i to j at time t) | λ); the denominator is p(O|λ); so ξt(i,j) = p(take i to j at time t | O, λ).
Problem 3: Learning

[Figure: operations required for the computation of the joint event that the system is in state Si at time t and in state Sj at time t+1]
Problem 3: Learning
• Let γt(i) be the probability of being in state i at time t, given O:

  γt(i) = Σ_{j=1..N} ξt(i,j)

• Σ_{t=1..T−1} γt(i) = expected number of transitions from state i
• Σ_{t=1..T−1} ξt(i,j) = expected number of transitions i → j
Problem 3: Learning
Step 2 of the Baum-Welch algorithm:

  π̂i = γ1(i)
  – the expected frequency of state i at time t = 1

  âij = Σt ξt(i,j) / Σt γt(i)
  – ratio of the expected number of transitions from state i to j over the expected number of transitions from state i

  b̂j(k) = Σ_{t: ot = k} γt(j) / Σt γt(j)
  – ratio of the expected number of times in state j observing symbol k over the expected number of times in state j
Problem 3: Learning
• The Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables α, β.
• B-W is a special case of the EM algorithm:
  • E-step: calculation of ξ and γ
  • M-step: iterative calculation of π̂, âij, b̂j(k)
• Practical issues:
  • Can get stuck in local maxima
  • Numerical problems – use logs and scaling
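One full E-step/M-step iteration can be sketched as below. This is a minimal illustration assuming a single observation sequence and no scaling (exactly the numerical issue the slide warns about; a real implementation rescales α/β or works in log space). The model values are the weather HMM from the earlier slides.

```python
# One Baum-Welch re-estimation step: E-step computes xi and gamma from
# alpha/beta; M-step re-estimates pi, A, B from the expected counts.

PI = [0.7, 0.25, 0.05]
A = [[0.8, 0.15, 0.05], [0.38, 0.6, 0.02], [0.75, 0.05, 0.2]]
B = [[0.6, 0.3, 0.1], [0.05, 0.3, 0.65], [0.0, 0.5, 0.5]]
N, M = 3, 3

def forward(obs):
    alpha = [[PI[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs):
    beta = [[1.0] * N]
    for t in range(len(obs) - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs):
    T = len(obs)
    alpha, beta = forward(obs), backward(obs)
    p_obs = sum(alpha[-1])                                # P(O | lambda)
    # E-step: xi_t(i,j) and gamma_t(i)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)]
             for t in range(T)]
    # M-step: re-estimate pi, A, B from expected counts
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step([1, 1, 2, 2, 0, 0])
```

Each step is guaranteed (by the EM argument) not to decrease P(O|λ), which is a useful check on an implementation.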
Now HMMs and Vision: Gesture Recognition…
"Gesture recognition"-like activities
Some thoughts about gesture
• There is a conference on Face and Gesture Recognition, so obviously gesture recognition is an important problem…
• Prototype scenario:
  • Subject does several examples of "each gesture"
  • System "learns" (or is trained) to have some sort of model for each
  • At run time, compare the input to the known models and pick one
• New-found life for gesture recognition:
Generic Gesture Recognition using HMMs
Nam, Y., & Wohn, K. (1996, July). Recognition of space-time hand-gestures using hidden Markov model. In ACM Symposium on Virtual Reality Software and Technology (pp. 51–58).
Generic gesture recognition using HMMs (1)
Data glove
Generic gesture recognition using HMMs (2)
Generic gesture recognition using HMMs (3)
Generic gesture recognition using HMMs (4)
Generic gesture recognition using HMMs (5)
Wins and Losses of HMMs in Gesture
• Good points about HMMs:
  • A learning paradigm that acquires spatial and temporal models and does some amount of feature selection.
  • Recognition is fast; training is not so fast, but not too bad.
• Not so good points:
  • If you know something about state definitions, it is difficult to incorporate.
  • Every gesture is a new class, independent of anything else you've learned.
  • → Particularly bad for "parameterized gesture."
Parameterized Gesture
“I caught a fish this big.”
Parametric HMMs (PAMI, 1999)
• Basic ideas:
  • Make the output probabilities of the state a function of the parameter of interest: bj(x) becomes b′j(x, θ).
  • Maintain the same temporal properties: aij unchanged.
  • Train with known parameter values to solve for the dependence of bj on θ.
  • During testing, use EM to find the θ that gives the highest probability. That probability is the confidence in recognition; the best θ is the parameter estimate.
• Issues:
  • How to represent the dependence on θ?
  • How to train given θ?
  • How to test for θ?
  • What are the limitations on the dependence on θ?
Linear PHMM – Representation
Represent the dependence on θ as linear movement of the mean of the Gaussians of the states: the mean of state j becomes a linear function of θ. Need to learn Wj and µj for each state j. (ICCV '98)
Linear PHMM - training • Need to derive EM equations for linear parameters and
proceed as normal:
Linear PHMM – testing
• Derive EM equations with respect to θ:
• We are testing by EM! (i.e. iterative): • Solve for γtk given guess for θ • Solve for θ given guess for γtk
How big was the fish?
Pointing
• Pointing is the prototypical example of a parameterized gesture.
• Assuming two DOF, can parameterize either by (x,y) or by (θ,φ) .
• Under the linear assumption, we must choose the parameterization carefully.
• A generalized non-linear map would allow greater freedom. (ICCV 99)
Linear pointing results
Test for both recognition and recovery:
If we prune based on legal θ (MAP via a uniform density):
Noise sensitivity
• Compare ad hoc procedure with PHMM parameter recovery (ignoring “their” recognition problem!!).
HMMs and vision • HMMs capture sequencing nicely in a probabilistic
manner.
• Moderate time to train, fast to test.
• More when we do activity recognition…