Page 1: Hidden  Markov Models  (HMMs)

1

Hidden Markov Models (HMMs)

• Probabilistic Automata

• Ubiquitous in Speech/Speaker Recognition/Verification

• Suitable for modelling phenomena which are dynamic in nature

• Can be used for handwriting, keystroke biometrics

Page 2: Hidden  Markov Models  (HMMs)

2

Classification with Static Features

• Simpler than dynamic problem

• Can use, for example, MLPs

• E.g. In two dimensional space:

[Scatter plot: two classes of points, marked “x” and “o”, forming two separable clusters in the two-dimensional space]

Page 3: Hidden  Markov Models  (HMMs)

3

Hidden Markov Models (HMMs)

• First: Visible Markov Models (VMMs)
– Formal Definition
– Recognition
– Training

• HMMs
– Formal Definition
– Recognition
– Training
– Trellis Algorithms
  • Forward-Backward
  • Viterbi

Page 4: Hidden  Markov Models  (HMMs)

4

Visible Markov Models

• Probabilistic Automaton

• N distinct states S = {s1, …, sN}

• M-element output alphabet K = {k1, …, kM}

• Initial state probabilities Π = {πi}, i ∈ S

• State transition at t = 1, 2,…

• State trans. probabilities A = {aij}, i,j ∈ S

• State sequence X = {X1, …, XT}, Xt ∈ S

• Output seq. O = {o1, …, oT}, ot ∈ K
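
As an illustration (not part of the original slides), these VMM parameters map directly onto arrays; the three-state weather model below is a made-up example with invented probabilities:

```python
import numpy as np

# A visible Markov model is fully specified by (S, Pi, A).
states = ["sunny", "rainy", "cloudy"]          # S = {s1, ..., sN}, N = 3
Pi = np.array([0.5, 0.2, 0.3])                 # initial state probabilities, sums to 1
A = np.array([[0.6, 0.2, 0.2],                 # A[i, j] = P(X_{t+1} = s_j | X_t = s_i)
              [0.3, 0.5, 0.2],                 # each row sums to 1
              [0.4, 0.3, 0.3]])
assert np.allclose(Pi.sum(), 1.0) and np.allclose(A.sum(axis=1), 1.0)
```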

Page 5: Hidden  Markov Models  (HMMs)

5

VMM: Weather Example

[State-transition diagram: three weather states, labelled 1, 2 and 3, with example transition probabilities such as 0.2, 0.5 and 0.3 on the arcs]

Page 6: Hidden  Markov Models  (HMMs)

6

Generative VMM

• We choose the state sequence probabilistically…

• We could try this using:
– the numbers 1-10
– drawing from a hat
– an ad-hoc assignment scheme
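
A minimal sketch of this generative procedure (assuming the Pi and A arrays from the previous sketch; the function name sample_state_sequence is my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_state_sequence(Pi, A, T):
    """Draw X_1 from Pi, then each X_{t+1} from row X_t of the transition matrix A."""
    X = [rng.choice(len(Pi), p=Pi)]
    for _ in range(T - 1):
        X.append(rng.choice(len(Pi), p=A[X[-1]]))
    return X

# Example with the three-state weather model from the previous sketch:
Pi = np.array([0.5, 0.2, 0.3])
A = np.array([[0.6, 0.2, 0.2], [0.3, 0.5, 0.2], [0.4, 0.3, 0.3]])
print(sample_state_sequence(Pi, A, T=5))
```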

Page 7: Hidden  Markov Models  (HMMs)

7

2 Questions

• Training Problem
– Given an observation sequence O and a “space” of possible models which spans possible values for model parameters w = {A, Π}, how do we find the model that best explains the observed data?

• Recognition (decoding) problem
– Given a model wi = {A, Π}, how do we compute how likely a certain observation is, i.e. P(O | wi)?

Page 8: Hidden  Markov Models  (HMMs)

8

Training VMMs

• Given observation sequences Os, we want to find model parameters w = {A, Π} which best explain the observations

• I.e. we want to find values for w = {A, Π} that maximise P(O | w)

• {A, Π}chosen = argmax_{A, Π} P(O | {A, Π})

Page 9: Hidden  Markov Models  (HMMs)

9

Training VMMs

• Straightforward for VMMs

• πi = frequency in state i at time t = 1

• aij = (number of transitions from state i to state j) / (number of transitions from state i)
     = (number of transitions from state i to state j) / (number of times in state i)
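
These counting formulas translate directly into code; a sketch (the helper name train_vmm is my own, and it assumes every state is the source of at least one transition):

```python
import numpy as np

def train_vmm(state_sequences, N):
    """Relative-frequency estimates of Pi and A from fully observed state sequences."""
    Pi = np.zeros(N)
    counts = np.zeros((N, N))
    for X in state_sequences:
        Pi[X[0]] += 1.0                      # frequency in state i at time t = 1
        for i, j in zip(X[:-1], X[1:]):
            counts[i, j] += 1.0              # count transitions i -> j
    Pi /= Pi.sum()
    A = counts / counts.sum(axis=1, keepdims=True)   # divide by transitions out of i
    return Pi, A

# Example with three states labelled 0, 1, 2:
print(train_vmm([[0, 0, 1, 2, 2, 0], [1, 1, 0, 2, 1]], N=3))
```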

Page 10: Hidden  Markov Models  (HMMs)

10

Recognition

• We need to calculate P(O | wi)

• P(O | wi) is handy for calculating P(wi|O)

• If we have a set of models L = {w1, w2, …, wV} then if we can calculate P(wi | O) we can choose the model which returns the highest probability, i.e.

wchosen = argmax_{wi ∈ L} P(wi | O)

Page 11: Hidden  Markov Models  (HMMs)

11

Recognition

• Why is P(O | wi) of use?

• Let’s revisit speech for a moment.

• In speech we are given a sequence of observations, e.g. a series of MFCC vectors
– E.g. MFCCs taken from frames of length 20-40 ms, every 10-20 ms

• If we have a set of models L = {w1, w2, …, wV} and if we can calculate P(wi | O) we can choose the model which returns the highest probability, i.e.
wchosen = argmax_{wi ∈ L} P(wi | O)

Page 12: Hidden  Markov Models  (HMMs)

12

wchosen = argmax_{wi ∈ L} P(wi | O)

• P(wi | O) is difficult to calculate as we would have to have a model for every possible observation sequence O

• Use Bayes’ rule: P(x | y) = P(y | x) P(x) / P(y)

• So now we have wchosen = argmax_{wi ∈ L} P(O | wi) P(wi) / P(O)

• P(wi) can be easily calculated

• P(O) is the same for each calculation and so can be ignored

• So P(O | wi) is the key!!!
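
A hedged sketch of this model-selection rule (choose_model and likelihood_fn are hypothetical names; likelihood_fn stands in for whatever computes P(O | wi), e.g. the forward algorithm introduced later):

```python
import numpy as np

def choose_model(models, priors, likelihood_fn, O):
    """Pick argmax_i P(O | w_i) P(w_i); the common factor P(O) is ignored."""
    scores = [likelihood_fn(w, O) * prior for w, prior in zip(models, priors)]
    return int(np.argmax(scores))
```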

Page 13: Hidden  Markov Models  (HMMs)

13

Hidden Markov Models

• Probabilistic Automaton

• N distinct states S = {s1, …, sN}

• M-element output alphabet K = {k1, …, kM}

• Initial state probabilities Π = {πi}, i ∈ S

• State transition at t = 1, 2,…

• State trans. probabilities A = {aij}, i,j ∈ S

• Symbol emission probabilities B = {bik}, i ∈ S, k ∈ K

• State sequence X = {X1, …, XT}, Xt ∈ S

• Output sequence O = {o1, …, oT}, ot ∈ K
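
As before, these parameters map onto arrays; the two-state, three-symbol model below is a made-up example, now with an emission matrix B added:

```python
import numpy as np

# An HMM adds symbol emission probabilities B to the visible-model parameters.
states  = ["state1", "state2"]               # S,  N = 2   (hypothetical example)
symbols = ["sunny", "rainy", "cloudy"]       # K,  M = 3
Pi = np.array([0.6, 0.4])                    # initial state probabilities
A  = np.array([[0.7, 0.3],                   # A[i, j] = P(X_{t+1} = s_j | X_t = s_i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.1, 0.4],              # B[i, k] = P(o_t = k_k | X_t = s_i)
               [0.1, 0.5, 0.4]])
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```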

Page 14: Hidden  Markov Models  (HMMs)

14

HMM: Weather Example

Page 15: Hidden  Markov Models  (HMMs)

15

State Emission Distributions

Discrete probability distribution

[Bar chart: emission probabilities for the symbols sunny, rainy and cloudy, on a 0–0.5 scale]

Page 16: Hidden  Markov Models  (HMMs)

16

State Emission Distributions

Continuous probability distribution

[Plot: a continuous emission probability density over Temperature (C), roughly 5–30 C, with density values up to about 0.14]

Page 17: Hidden  Markov Models  (HMMs)

17

Generative HMM

• Now we not only choose the state sequence probabilistically…

• …but also the state emissions

• Try this yourself using the numbers 1-10 and drawing from a hat...
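
A minimal sketch of this generative procedure (sample_hmm is my own name; Pi, A and B are assumed to be arrays like those in the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_hmm(Pi, A, B, T):
    """Sample a hidden state sequence and, at each step, an emitted symbol."""
    states, outputs = [], []
    s = rng.choice(len(Pi), p=Pi)
    for t in range(T):
        states.append(int(s))
        outputs.append(int(rng.choice(B.shape[1], p=B[s])))   # emit a symbol from state s
        s = rng.choice(len(Pi), p=A[s])                        # move to the next state
    return states, outputs
```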

Page 18: Hidden  Markov Models  (HMMs)

18

3 Questions

• Recognition (decoding) problem
– Given a model wi = {A, B, Π}, how do we compute how likely a certain observation is, i.e. P(O | wi)?

• State sequence?
– Given the observation sequence and a model, how do we choose a state sequence X = {X1, …, XT} that best explains the observations?

• Training Problem
– Given an observation sequence O and a “space” of possible models which spans possible values for model parameters w = {A, B, Π}, how do we find the model that best explains the observed data?

Page 19: Hidden  Markov Models  (HMMs)

19

Computing P(O | w)

• For any particular state sequence X = {X1, …, XT} we have

P(O | X, w) = ∏_{t=1}^{T} P(o_t | X_t, w) = b_{X_1 o_1} b_{X_2 o_2} ⋯ b_{X_T o_T}

and

P(X | w) = π_{X_1} a_{X_1 X_2} a_{X_2 X_3} ⋯ a_{X_{T−1} X_T}

Page 20: Hidden  Markov Models  (HMMs)

20

Computing P(O | w)

P(O, X | w) = P(O | X, w) P(X | w)

P(O | w) = ∑_X P(O | X, w) P(X | w) = ∑_{X_1 ⋯ X_T} π_{X_1} b_{X_1 o_1} ∏_{t=2}^{T} a_{X_{t−1} X_t} b_{X_t o_t}

• This requires on the order of (2T) N^T multiplications

• Very inefficient!
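
For very small N and T the sum can be evaluated literally, which is useful later as a reference against the efficient algorithms; a sketch (brute_force_likelihood is my own name, and O is assumed to be a list of symbol indices):

```python
import itertools
import numpy as np

def brute_force_likelihood(Pi, A, B, O):
    """P(O | w) by summing P(O | X, w) P(X | w) over all N**T state sequences.
    Exponential in T -- only for checking the efficient algorithms on tiny cases."""
    N, T = A.shape[0], len(O)
    total = 0.0
    for X in itertools.product(range(N), repeat=T):
        p = Pi[X[0]] * B[X[0], O[0]]
        for t in range(1, T):
            p *= A[X[t - 1], X[t]] * B[X[t], O[t]]
        total += p
    return total
```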

Page 21: Hidden  Markov Models  (HMMs)

21

Trellis Algorithms

• Array of states vs. time

Page 22: Hidden  Markov Models  (HMMs)

22

• Overlap in paths implies repetition of the same calculations

• Harness the overlap to make calculations efficient

• A node at (si, t) stores info about state sequences that contain Xt = si

Trellis Algorithms

Page 23: Hidden  Markov Models  (HMMs)

23

Trellis Algorithms

• Consider 2 states and 3 time points:

P(O | w) = ∑_X π_{X_1} b_{X_1 o_1} ∏_{t=2}^{T} a_{X_{t−1} X_t} b_{X_t o_t}

= π_{s_1} b_{s_1 o_1} a_{s_1 s_2} b_{s_2 o_2} a_{s_2 s_1} b_{s_1 o_3}
+ π_{s_1} b_{s_1 o_1} a_{s_1 s_1} b_{s_1 o_2} a_{s_1 s_2} b_{s_2 o_3}
+ π_{s_1} b_{s_1 o_1} a_{s_1 s_1} b_{s_1 o_2} a_{s_1 s_1} b_{s_1 o_3}
+ …

Page 24: Hidden  Markov Models  (HMMs)

24

Forward Algorithm

• A node at (si, t) stores info about state sequences up to time t that arrive at si

[Diagram: forward values α_{s_1}(t), α_{s_2}(t), … feed into α_{s_j}(t+1) via the transition probabilities a_{s_1 s_j}, a_{s_2 s_j}, …]

Page 25: Hidden  Markov Models  (HMMs)

25

Forward Algorithm

Definition:

α_{s_i}(t) = P(o_1 o_2 ⋯ o_t, X_t = s_i | w)

Initialisation:

α_{s_i}(1) = π_{s_i} b_{s_i o_1},  1 ≤ i ≤ N

Induction:

α_{s_j}(t+1) = [∑_{i=1}^{N} α_{s_i}(t) a_{s_i s_j}] b_{s_j o_{t+1}},  1 ≤ t ≤ T−1,  1 ≤ j ≤ N

Termination:

P(O | w) = ∑_{i=1}^{N} α_{s_i}(T)
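
A sketch of the recursion above using 0-based array indices (the function name forward is my own; O is a list of symbol indices):

```python
import numpy as np

def forward(Pi, A, B, O):
    """Forward algorithm: alpha[t, i] holds the forward probability for state s_i
    after seeing the first t+1 observations.  Returns alpha and P(O | w)."""
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]                       # initialisation
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]   # induction
    return alpha, alpha[-1].sum()                    # termination
```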

Page 26: Hidden  Markov Models  (HMMs)

26

Backward Algorithm

• A node at (si, t) stores info about state sequences from time t that evolve from si

[Diagram: β_{s_i}(t) is built from β_{s_1}(t+1), β_{s_2}(t+1), … via a_{s_i s_1} b_{s_1 o_{t+1}}, a_{s_i s_2} b_{s_2 o_{t+1}}, …]

Page 27: Hidden  Markov Models  (HMMs)

27

Backward Algorithm

Definition:

β_{s_i}(t) = P(o_{t+1} o_{t+2} ⋯ o_T | X_t = s_i, w)

Initialisation:

β_{s_i}(T) = 1,  1 ≤ i ≤ N

Induction:

β_{s_i}(t) = ∑_{j=1}^{N} a_{s_i s_j} b_{s_j o_{t+1}} β_{s_j}(t+1),  t = T−1, …, 1,  1 ≤ i ≤ N

Termination:

P(O | w) = ∑_{i=1}^{N} π_{s_i} b_{s_i o_1} β_{s_i}(1)
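
A matching sketch of the backward recursion (again 0-based indices and my own function name; O is a list of symbol indices):

```python
import numpy as np

def backward(Pi, A, B, O):
    """Backward algorithm: beta[t, i] holds the probability of the observations
    after time t+1, given state s_i at that time.  Returns beta and P(O | w)."""
    N, T = A.shape[0], len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                        # initialisation
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])      # induction
    return beta, (Pi * B[:, O[0]] * beta[0]).sum()        # termination
```

On the same model and observation sequence this should return the same P(O | w) as the forward sketch, which is the consistency check mentioned on the next slide.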

Page 28: Hidden  Markov Models  (HMMs)

28

• P(O | w) as calculated from the forward and backward algorithms should be the same

• FB algorithm usually used in training

• FB algorithm not suited to recognition as it considers all possible state sequences

• In reality, we would like to only consider the “best” state sequence (HMM problem 2)

Forward & Backward Algorithms

Page 29: Hidden  Markov Models  (HMMs)

29

“Best” State Sequence

• How is “best” defined?

• We could choose most likely individual state at each time t:

argmax_{1 ≤ i ≤ N} P(X_t = s_i | O, w)

Page 30: Hidden  Markov Models  (HMMs)

30

argmax_{1 ≤ i ≤ N} P(X_t = s_i | O, w)

• Define

γ_{s_i}(t) = P(X_t = s_i | O, w)
= P(X_t = s_i, O | w) / P(O | w)
= α_{s_i}(t) β_{s_i}(t) / ∑_{i=1}^{N} α_{s_i}(t) β_{s_i}(t)
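
A sketch of this per-time-step choice, reusing the forward and backward sketches from earlier (most_likely_states is my own name):

```python
import numpy as np

def most_likely_states(Pi, A, B, O):
    """gamma[t, i] = P(X_t = s_i | O, w); picks argmax_i gamma[t, i]
    independently at each time t, using forward() and backward() defined above."""
    alpha, p_obs = forward(Pi, A, B, O)
    beta, _ = backward(Pi, A, B, O)
    gamma = alpha * beta / p_obs            # alpha_i(t) beta_i(t) / P(O | w)
    return gamma, gamma.argmax(axis=1)
```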

Page 31: Hidden  Markov Models  (HMMs)

31

Viterbi Algorithm

• Choosing the most likely individual state at each time t, argmax_{1 ≤ i ≤ N} P(X_t = s_i | O, w), may produce an unlikely or even invalid state sequence

• One solution is to choose the most likely state sequence:

argmax_X P(X | O, w) = argmax_X P(X, O | w)

Page 32: Hidden  Markov Models  (HMMs)

32

Viterbi Algorithm

• Define

δ_{s_i}(t) = max_{X_1 X_2 ⋯ X_{t−1}} P(X_1 X_2 ⋯ X_{t−1}, X_t = s_i, o_1 o_2 ⋯ o_t | w)

δ_{s_i}(t) is the best score along a single path, at time t, which accounts for the first t observations and ends in state s_i

By induction we have:

δ_{s_j}(t+1) = [max_{1 ≤ i ≤ N} δ_{s_i}(t) a_{s_i s_j}] b_{s_j o_{t+1}}

Page 33: Hidden  Markov Models  (HMMs)

33

Viterbi Algorithm

Initialisation:

δ_{s_i}(1) = π_{s_i} b_{s_i o_1},  1 ≤ i ≤ N

ψ_{s_i}(1) = 0

Recursion:

δ_{s_j}(t) = max_{1 ≤ i ≤ N} [δ_{s_i}(t−1) a_{s_i s_j}] b_{s_j o_t},  2 ≤ t ≤ T,  1 ≤ j ≤ N

ψ_{s_j}(t) = argmax_{1 ≤ i ≤ N} [δ_{s_i}(t−1) a_{s_i s_j}],  2 ≤ t ≤ T,  1 ≤ j ≤ N

Page 34: Hidden  Markov Models  (HMMs)

34

Viterbi Algorithm

Termination:

P* = max_{1 ≤ i ≤ N} [δ_{s_i}(T)]

X*_T = argmax_{1 ≤ i ≤ N} [δ_{s_i}(T)]

Path by backtracking:

X*_t = ψ_{s_{X*_{t+1}}}(t+1),  t = T−1, T−2, …, 1
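
A sketch of the full Viterbi recursion with backtracking, in the same 0-based array convention as the earlier sketches (viterbi is my own name):

```python
import numpy as np

def viterbi(Pi, A, B, O):
    """Most likely state sequence argmax_X P(X, O | w) by dynamic programming.
    delta[t, j] is the best score of any path ending in s_j after t+1 observations;
    psi[t, j] records which predecessor achieved it (for backtracking)."""
    N, T = A.shape[0], len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = Pi * B[:, O[0]]                          # initialisation
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A              # scores[i, j] = delta_i(t-1) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]      # recursion
    X = [int(delta[-1].argmax())]                       # termination
    for t in range(T - 1, 0, -1):                       # path by backtracking
        X.append(int(psi[t, X[-1]]))
    return X[::-1], delta[-1].max()
```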

Page 35: Hidden  Markov Models  (HMMs)

35

Viterbi vs. Forward Algorithm

• Similar in implementation
– Forward sums over all incoming paths
– Viterbi maximises

• Viterbi probability ≤ Forward probability

• Both efficiently implemented using a trellis structure

