
Hidden Markov Model

Nov 11, 2008

Sung-Bae Cho

Agenda

• Hidden Markov Model

• Inference of Hidden Markov Model

• Path Tracking of HMM

• Learning of Hidden Markov Model

• Hidden Markov Model Applications

• Summary & Review

Temporal Pattern Recognition

• The world is constantly changing.

• Temporal data sequence = …, X−2, X−1, X0, X1, X2, …

• Observed vs. real value

– Real value: X

– Observation: Y

– Relation: Yt ~ Xt (each observation Yt is generated from the underlying hidden value Xt)

Hidden Concept and Actual Realization

• A hidden sequence X1 X2 … Xn (“idea”) underlies the observed sequence Y1 Y2 … Yn (“reality”)

[Figure: hidden chain X1 → X2 → … → X8 emitting the observations Y1, Y2, …, Y8]

Hidden Markov Model

• Definition:

– A statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters

– The challenge is to determine the hidden parameters from the observable parameters

– Extracted model parameters can be used to perform further analysis

• Expression:

• A hidden random variable Xt that conditions another random variable Yt

X1 → X2 → … → Xt (hidden chain), each state emitting an observation: Y1, Y2, …, Yt

• At each time t, the pair (Xt, Yt) is characterized by P(Xt) and P(Yt | Xt), with Xt ∈ S = {1, 2, …, N}

[Figure: N-state transition diagram with probabilities such as P1|1, P2|1, P3|2]

Random Processes

• Xt−1 → Xt, i.e. Xt | Xt−1 => Markov process

– Description: {P(Xt | Xt−1)}

• Yt | Xt => Random process (often a Gaussian process)

– Description: {P(Yt | Xt)}

• Combination: {P(Xt | Xt−1) P(Yt | Xt)}

– Doubly stochastic process

[Figure: hidden chain X1 → X2 → … → Xt with emissions Y1, Y2, …, Yt]

Why HMM?

• A good model for highly variable discrete-time sequences

– often noisy, uncertain and incomplete

• A generalization of the DTW template-matching approach

• Rigorous and theoretical foundation

– the model can be optimized

• Models spatiotemporal variabilities elegantly

– greater variability-modeling power than a plain Markov chain

• Efficient inference/computation algorithms

• Theoretically grounded, robust learning algorithm

• Can be combined to model complex patterns (composition and extension)


What is an HMM - Notation

• Three sets of parameters: λ = (π, A, B)

– Initial state probabilities π:

π = {πi : πi = Pr(X1 = i)}

• constraints: πi ≥ 0, Σi πi = 1

– Transition probabilities A:

A = {aij : aij = Pr(Xt+1 = j | Xt = i)}

• constraints: aij ≥ 0, Σj aij = 1

– Observation probabilities B:

B = {bj(v) : bj(v) = Pr(ot = v | Xt = j)}

• constraints: bj(v) ≥ 0, Σv∈V bj(v) = 1

Model Parameters

• State space/alphabet:

– S = { 1, 2, 3 }, N = 3

– V = { 1, 2, 3, 4 }

• Matrices:

A = | .7 .3 .0 |
    | .0 .4 .6 |
    | .0 .2 .8 |

π = [ 1. 0. 0. ]

b1(v) = [ .4 .3 .1 .2 ]

[Figure: 3-state transition diagram with self-loops and a12 = 0.3]

• N : number of hidden states

• Q : state set

– Q = {q1, q2, …, qN}

• M : number of observation symbols

• S : observation symbol set

– S = {s1, s2, …, sM}

• A : transition probabilities

• B : observation probabilities

• π : initial state probabilities

• λ : HMM model

– λ = (A, B, π)
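A quick way to sanity-check a parameter set like the one above is to verify the stochastic constraints in code. This sketch uses the slide's A, π, and b1(v); the rows for b2 and b3 are hypothetical placeholders added only so that B is complete.

```python
# Representing the example model lambda = (A, B, pi) and checking the
# probability constraints from the previous slide.

A = [  # transition probabilities a_ij, N = 3 states
    [0.7, 0.3, 0.0],
    [0.0, 0.4, 0.6],
    [0.0, 0.2, 0.8],
]
B = [  # observation probabilities b_j(v), M = 4 symbols
    [0.4, 0.3, 0.1, 0.2],       # b_1(v), from the slide
    [0.25, 0.25, 0.25, 0.25],   # hypothetical row for b_2(v)
    [0.1, 0.2, 0.3, 0.4],       # hypothetical row for b_3(v)
]
pi = [1.0, 0.0, 0.0]  # initial state probabilities

def is_stochastic(rows, tol=1e-9):
    """Each row must be non-negative and sum to 1."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in rows
    )

assert is_stochastic(A) and is_stochastic(B) and is_stochastic([pi])
```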

Markov Model Rule

• Observation sequence

– O = {o1, o2, …, oT}

• Chain rule

– P(o1, o2, …, oT) = ∏i=1..T P(oi | o1, …, oi−1)

• Markov assumption

– Observation oi is affected only by the previous observation oi−1

– P(oi | o1, …, oi−1) = P(oi | oi−1)

• Markov chain rule

– P(o1, o2, …, oT) = ∏i=1..T P(oi | oi−1)
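The Markov chain rule above turns a sequence probability into a product of one-step conditionals, which is easy to compute directly. A minimal sketch, with made-up probabilities:

```python
# Markov chain rule: P(o1,...,oT) = P(o1) * prod_i P(o_i | o_{i-1}).
# The probability tables are illustrative numbers, not from the slides.

p_init = {"a": 0.5, "b": 0.5}        # P(o_1)
p_next = {                           # P(o_i | o_{i-1})
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.4, "b": 0.6},
}

def sequence_prob(seq):
    """P(o_1,...,o_T) under the first-order Markov assumption."""
    p = p_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= p_next[prev][cur]
    return p

print(sequence_prob("aab"))  # 0.5 * 0.9 * 0.1 = 0.045
```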


Three Basic Problems

• Evaluation (Estimation) Problem

– given an HMM λ and an observation o1, o2, …, oT

– compute the probability of the observation: p(o1, o2, …, oT | λ)

– Solution: Forward Algorithm, Backward Algorithm

• Decoding Problem

– given an HMM λ and an observation o1, o2, …, oT

– compute the most likely state sequence sq1, sq2, …, sqT

– i.e. argmax over q1, …, qT of p(o1, o2, …, oT, q1, …, qT | λ)

– Solution: Viterbi Algorithm

• Learning / Optimization Problem

– given an HMM λ and an observation o1, o2, …, oT

– find an HMM λ̄ such that p(o1, o2, …, oT | λ̄) ≥ p(o1, o2, …, oT | λ)

– Solution: Baum-Welch Algorithm

The Evaluation Problem

• We know:

p(o1, o2, …, oT | sq1, …, sqT, λ) = πq1 bq1(o1) ∏k=2..T a(q(k−1), qk) bqk(ok)

• From this:

p(o1, o2, …, oT | λ) = Σq1=1..N … ΣqT=1..N πq1 bq1(o1) ∏k=2..T a(q(k−1), qk) bqk(ok)

• Obvious:

for sufficiently large values of T, it is infeasible to compute the above term for all N^T possible state sequences → need another solution
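The infeasibility claim can be made concrete: brute-force evaluation enumerates every one of the N^T state sequences. A sketch with a hypothetical two-state model and T = 3 (so only 2^3 = 8 paths):

```python
# Brute-force evaluation: sum pi_{q1} b_{q1}(o1) * prod a * b over all N**T
# state sequences. Feasible only for tiny T. Model numbers are hypothetical.
from itertools import product

A = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities
pi = [0.6, 0.4]                # initial state probabilities
obs = [0, 1, 0]                # o_1, o_2, o_3 as symbol indices

N, T = len(pi), len(obs)
total = 0.0
for q in product(range(N), repeat=T):      # all N**T state sequences
    p = pi[q[0]] * B[q[0]][obs[0]]
    for k in range(1, T):
        p *= A[q[k - 1]][q[k]] * B[q[k]][obs[k]]
    total += p
print(total)  # P(O | lambda)
```

The forward algorithm on the next slides computes the same quantity in O(N²T) operations instead of O(T·N^T).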

The Forward Algorithm

• At time t and state i, probability of the partial observation sequence o1, o2, …, ot

– αt(i) : stored as an array alpha[time][state]

• Initialization: α1(i) = πi bi(o1), 1 ≤ i ≤ N

• Induction: αt+1(j) = [ Σi=1..N αt(i) aij ] bj(ot+1)

– each alpha[time+1][state] is computed from the row alpha[time][·]

• As a result, at the last time T:

p(o1, o2, …, oT | λ) = Σi αT(i), i.e. alpha[T][state] summed over states

Forward Algorithm

• Definition

– αt(i) = P(o1 … ot, qt = si | λ)

• Algorithm

– Initialization

• α1(i) = πi bi(o1), 1 ≤ i ≤ N

– Induction

• αt+1(j) = [ Σi=1..N αt(i) aij ] bj(ot+1)   (t = 1, …, T−1; 1 ≤ j ≤ N)

– End condition

• P(O | λ) = Σi=1..N αT(i)
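The three steps above translate almost line by line into code. A minimal sketch, with the same hypothetical two-state model used earlier:

```python
# Forward algorithm: alpha[t][j] accumulates the probability of the prefix
# o_1..o_t ending in state j. Model numbers are hypothetical.

A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def forward(obs):
    N = len(pi)
    # initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    # induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
    for t in range(1, len(obs)):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
            for j in range(N)
        ])
    # end condition: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha[-1])

print(forward([0, 1, 0]))  # P(O | lambda)
```

This matches the brute-force sum over all state sequences, but costs only O(N²T).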

Backward Algorithm

• Definition

– βt(i) = P(ot+1 … oT | qt = si, λ)

• Algorithm

– Initialization

• βT(i) = 1, 1 ≤ i ≤ N

– Induction

• βt(i) = Σj=1..N aij bj(ot+1) βt+1(j)   (t = T−1, …, 1; 1 ≤ i ≤ N)

– End condition

• P(O | λ) = Σi=1..N πi bi(o1) β1(i)
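The backward recursion can be sketched the same way; its end condition must produce the same P(O | λ) as the forward pass, which makes a useful consistency check. Model numbers are the same hypothetical ones as before.

```python
# Backward algorithm: beta[t][i] is the probability of the suffix
# o_{t+1}..o_T given state i at time t.

A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def backward(obs):
    N, T = len(pi), len(obs)
    beta = [[1.0] * N]                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):        # induction, t = T-1, ..., 1
        beta.insert(0, [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N))
            for i in range(N)
        ])
    # end condition: P(O | lambda) = sum_i pi_i b_i(o_1) beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))

print(backward([0, 1, 0]))  # same value the forward algorithm produces
```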


The Decoding Problem

• Finding the “optimal” state sequence associated with the given observation sequence

Forward-Backward

• Optimality criterion: choose the states that are individually most likely at each time t

• The probability of being in state i at time t:

γt(i) = p(qt = i | O, λ) = αt(i) βt(i) / Σi=1..N αt(i) βt(i)

• αt(i) : accounts for the partial observation sequence o1, o2, …, ot

• βt(i) : accounts for the remainder ot+1, ot+2, …, oT
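Combining the two passes gives the posterior γt(i) directly; each γt(·) is a distribution over states, so its entries must sum to 1. A sketch, again with the hypothetical two-state model:

```python
# Forward-backward smoothing: gamma_t(i) = alpha_t(i) * beta_t(i) / P(O|lambda).
# Model numbers are hypothetical.

A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 0]
N, T = len(pi), len(obs)

# forward pass
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                  for j in range(N)])

# backward pass
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
               for i in range(N)]

# posterior state probabilities; the normalizer equals P(O | lambda) at every t
gamma = []
for t in range(T):
    z = sum(alpha[t][i] * beta[t][i] for i in range(N))
    gamma.append([alpha[t][i] * beta[t][i] / z for i in range(N)])

assert all(abs(sum(row) - 1.0) < 1e-9 for row in gamma)
print(gamma)
```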

Viterbi Algorithm

• Solution to model decoding problem

– Given Y = O = o1 o2 o3 · · · oT,

– What is the best among all possible state sequences that might have produced O?

• The best?

– To be evaluated in probabilistic terms:

1. A sequence of the most likely states at each time? (greedy fashion)

2. The most likely complete state sequence X̂1,T (from any one of the start states to any one of the final states): P(X, O | λ)

Viterbi Path

• X̂1,T is the path whose joint probability with the observation is the most likely:

X̂1,T = argmax over X of P(O, X | λ)

• Simple rewriting (let X = X1,T = x1 x2 … xT):

P(O, X | λ) = P(O | X, λ) P(X | λ) = πx1 a(x1, x2) a(x2, x3) … a(x(T−1), xT) · bx1(o1) bx2(o2) … bxT(oT)

• N^T possible paths for X

• O(T·N^T) multiplications with exhaustive enumeration

Viterbi Path Likelihood

• Partial Viterbi path likelihood (for X1,t, t ≤ T):

δt(j) = max over x1, …, x(t−1) of Pr(o1 … ot, x1 … x(t−1), Xt = j | λ)

• By the Markov property this decomposes recursively:

δt(j) = maxi [ δt−1(i) aij ] bj(ot),   j = 1, …, N;  t = 1, …, T

• Back pointer to the previous best state:

ψt(j) = argmaxi δt−1(i) aij

Viterbi Algorithm

• Initialization

δ1(i) = πi bi(o1),  ψ1(i) = 0

• Recursion

δt(j) = max over 1≤i≤N of δt−1(i) aij · bj(ot),  ψt(j) = argmax over 1≤i≤N of δt−1(i) aij

• Termination

P* = max over 1≤i≤N of δT(i),  x̂T = argmax over 1≤i≤N of δT(i)

• Backtracing

x̂t = ψt+1(x̂t+1),  t = T−1, …, 1

[Figure: three-state trellis illustrating the recursion]

Viterbi Algorithm: Example

• Viterbi trellis construction for the observation sequence O = R R G B

– Model: three states, observation symbols {R, G, B}, π = [1 0 0]^T

– Emission probabilities over (R, G, B): state 1: (.6 .2 .2), state 2: (.2 .5 .3), state 3: (.0 .3 .7)

[Figure: trellis with per-edge products such as .5×.6, .6×.2, .4×.3; the surviving path scores are .6, .18, .048, .018, .036, .00576, .0018, .00648, .01008]

• Result: P(O, X* | λ) = Pr(RRGB, X = 1123 | λ) = 0.01008
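The trellis result above can be reproduced in code. π and the emission rows are read off the slide; the transition matrix below is reconstructed from the trellis edge labels (a11=.5, a12=.4, a13=.1, a22=.6, a23=.4, a33=1), so treat it as a plausible reading rather than a quoted parameter.

```python
# Viterbi decoding of O = R,R,G,B; should recover path 1,1,2,3 with
# probability 0.01008 as on the slide. A is a reconstruction (see lead-in).

A = [[0.5, 0.4, 0.1],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
B = [[0.6, 0.2, 0.2],   # emissions over (R, G, B)
     [0.2, 0.5, 0.3],
     [0.0, 0.3, 0.7]]
pi = [1.0, 0.0, 0.0]

def viterbi(obs):
    N = len(pi)
    delta = [[pi[i] * B[i][obs[0]] for i in range(N)]]  # initialization
    psi = [[0] * N]
    for t in range(1, len(obs)):                        # recursion
        delta.append([0.0] * N)
        psi.append([0] * N)
        for j in range(N):
            best = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best
            delta[t][j] = delta[t - 1][best] * A[best][j] * B[j][obs[t]]
    # termination and backtracing
    q = [max(range(N), key=lambda i: delta[-1][i])]
    for t in range(len(obs) - 1, 0, -1):
        q.insert(0, psi[t][q[0]])
    return delta[-1][q[-1]], q

obs = [0, 0, 1, 2]                              # R, R, G, B
prob, path = viterbi(obs)
print(round(prob, 5), [s + 1 for s in path])    # 0.01008 and path 1, 1, 2, 3
```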


The Learning / Optimization problem

• How do we adjust the model parameters λ to maximize P(O | λ)?

• Parameter Estimation

• Baum-Welch Algorithm (EM: Expectation Maximization)

• Iterative Procedure

Parameter Estimation

• Probability of being in state i at time t, and state j at time t+1:

ξt(i, j) = P(qt = i, qt+1 = j | O, λ) = αt(i) aij bj(ot+1) βt+1(j) / Σi=1..N Σj=1..N αt(i) aij bj(ot+1) βt+1(j)

• Probability of being in state i at time t, given the entire observation sequence and the model: γt(i)

• We can relate these by summing over j:

γt(i) = Σj=1..N ξt(i, j)

Parameter Estimation (3)

• By summing over the time index t …

– Σt=1..T−1 γt(i) = expected number of times that state i is visited in O = expected number of transitions made from state i

– Σt=1..T−1 ξt(i, j) = expected number of transitions made from state i to j in O

• Update λ̄ = (Ā, B̄, π̄) using ξt(i, j) & γt(i)

– π̄i = γ1(i) : expected frequency (number of times) in state i at time t = 1

Parameter Estimation (5)

• New transition probability:

āij = expected number of transitions from state i to j / expected number of transitions from state i

    = Σt=1..T−1 ξt(i, j) / Σt=1..T−1 γt(i)

Parameter Estimation (6)

• New observation probability:

b̄j(k) = expected number of times in state j observing symbol vk / expected number of times in state j

      = Σ{t : ot = vk} γt(j) / Σt=1..T γt(j)

Parameter Estimation (7)

• From λ = (A, B, π), we define a new model λ̄ = (Ā, B̄, π̄)

• The new model is more likely than the old model in the sense that P(O | λ̄) ≥ P(O | λ)

– i.e. the observation sequence is more likely to be produced by the new model

– proved by Baum & his colleagues

• Iteratively use the new model in place of the old model and repeat the re-estimation calculation: “ML estimation”

Baum-Welch Algorithm (1)

• Definition

– ξt(i, j) = P(qt = si, qt+1 = sj | O, λ)

• Calculation

– ξt(i, j) = αt(i) ai,j bj(ot+1) βt+1(j) / Σi=1..N Σj=1..N αt(i) ai,j bj(ot+1) βt+1(j)

• Definition

– γt(i) = P(qt = si | O, λ)

• Calculation

– γt(i) = Σj=1..N ξt(i, j)

Baum-Welch Algorithm (2)

• Algorithm

1. Set initial model λ0

2. Estimation: calculate γt(i), ξt(i, j) under λ0

3. Maximization: find the new λ

– āi,j = Σt=1..T−1 ξt(i, j) / Σt=1..T−1 γt(i)

– b̄j(k) = Σt=1..T γt(j) δ(ot, vk) / Σt=1..T γt(j)

– π̄i = γ1(i)

– where δ(ot, vk) = 1 if ot = vk, and 0 otherwise

4. If P(O | λ) − P(O | λ0) < threshold then stop

5. Else λ0 = λ, go to 2 (repetition)
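One full re-estimation step can be sketched end to end: run forward-backward, form ξ and γ, apply the three update formulas, and check the Baum inequality P(O | λ̄) ≥ P(O | λ). The two-state model and observation sequence are hypothetical.

```python
# One Baum-Welch re-estimation step on a hypothetical two-state,
# two-symbol model.

A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 1, 0]
N, M, T = 2, 2, len(obs)

def likelihood(pi_, A_, B_):
    """P(O | lambda) via the forward algorithm."""
    a = [pi_[i] * B_[i][obs[0]] for i in range(N)]
    for t in range(1, T):
        a = [sum(a[i] * A_[i][j] for i in range(N)) * B_[j][obs[t]]
             for j in range(N)]
    return sum(a)

# forward and backward passes
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                  for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
               for i in range(N)]
pO = sum(alpha[-1])

# E-step: xi_t(i, j) and gamma_t(i)
xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / pO
        for j in range(N)] for i in range(N)] for t in range(T - 1)]
gamma = [[alpha[t][i] * beta[t][i] / pO for i in range(N)] for t in range(T)]

# M-step: the three update formulas from the slide
pi2 = gamma[0][:]
A2 = [[sum(xi[t][i][j] for t in range(T - 1)) /
       sum(gamma[t][i] for t in range(T - 1))
       for j in range(N)] for i in range(N)]
B2 = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
       sum(gamma[t][j] for t in range(T))
       for k in range(M)] for j in range(N)]

# Baum inequality: the re-estimated model is at least as likely
assert likelihood(pi2, A2, B2) >= likelihood(pi, A, B) - 1e-12
```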

Classification Algorithm

• Classification

• Viterbi algorithm

• Domain/linguistic knowledge

– Markov source model for character probability

k̂ = argmax over k of p(Y | λk) P(λk) ≈ argmax over k of P(λk) · maxi δT,k(i)

P(W) = P(w1 w2 … wn) = P(w1) P(w2 | w1) … P(wn | wn−1)

Example: P(“123”) = P(“1”) P(“2” | “1”) P(“3” | “2”)
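The Markov source model used as linguistic knowledge is just a bigram product, as in the "123" example above. A minimal sketch with made-up probability tables:

```python
# Bigram (Markov source) model: P(w1 w2 ... wn) = P(w1) * prod P(w_i | w_{i-1}).
# The probability tables below are illustrative numbers only.

p_first = {"1": 0.3, "2": 0.4, "3": 0.3}           # P(w1)
p_bigram = {("1", "2"): 0.5, ("2", "3"): 0.6}      # P(w_i | w_{i-1})

def word_prob(w):
    """Probability of a character string under the bigram model."""
    p = p_first[w[0]]
    for prev, cur in zip(w, w[1:]):
        p *= p_bigram[(prev, cur)]
    return p

print(word_prob("123"))  # P("1") * P("2"|"1") * P("3"|"2") = 0.3 * 0.5 * 0.6
```

In a recognizer this language-model score would be combined with the per-class HMM likelihoods p(Y | λk) from the classification rule above.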


University of Alberta

• National ICT Australia project, University of Alberta, Canada

• Object

– Human motion and gesture recognition

• Sensors

– active, magnetic field, acoustic, laser, and camera sensors

• Method

– Coupled hidden Markov model (CHMM)

– Coupled HMMs provide an efficient way to resolve many complex problems, and offer superior training speeds, model likelihoods, and robustness to initial conditions.

– Proposed by M. Brand (1997)

[M. Brand, N. Oliver, and A. Pentland, “Coupled Hidden Markov Models for complex action recognition,” in IEEE Intl. Conf. Comp. Vis. Pat. Rec., 1997, pp. 994-999.]


University of Bologna

• Micrel Lab, University of Bologna, Italy (2004)

• Research

– Setup ubiquitous environments

– Sensory data processing

– Gesture recognition

• Sensors

– Developed: wireless MOCA (motion capture with integrated accelerometers)

• Accelerometer, gyroscope

• Small size, low power consumption, wireless

• Worn on the body

• Recognition method

– Hidden Markov Model


MIT Media LAB

• Media Laboratory, Massachusetts Institute of Technology

• Area: Visual Contextual Awareness in Wearable Computing (1998)

• Sensor: Vision

• Method

– Probabilistic object recognition

• Based on diverse observed feature vectors

• Using probabilistic relations (O: object, M: measurement)

– Task recognition with HMM


eWatch Sensor Platform

• CMU Computer Science Lab, 2005

• Activity Recognition + improving power consumption

• Hardware

– LCD, LED, vibration motor, speaker, Bluetooth for wireless communication

– Li-Ion battery with a capacity of 700 mAh

• Sensors

– a two-axis accelerometer (ADXL202; +/- 2g)

– Microphone, light & temperature sensors

• Method

– multi-class SVMs + HMM based Selective Sampling


Summary

• Hidden Markov Model introduction

• HMM inference method (estimation)

• HMM path tracking (decoding)

• HMM learning

• HMM application
