Page 1:

An introduction to Hidden Markov Models

Arnaud Hubaux

Computer Science Institute, FUNDP - Namur

March 20, 2007

Academic year 2006-2007

Page 2: Outline

Part I: Concepts
Part II: Basic problems
Part III: Case study
Part IV: Conclusion

Page 3: Part I - Concepts

Page 4: Outline

1 Introduction
  Overview
  Stochastic process
  Markov chain

2 HMM

Page 5: Overview

First described in statistical papers in the mid-1960s

stochastic process → Markov process → HMM

One of the first applications was speech recognition, in the mid-1970s

Applications to the analysis of biological sequences (DNA) started in the mid-1980s

Nowadays they are used in:
1 gesture and body motion recognition
2 optical character recognition
3 bioinformatics
4 information extraction
5 . . .

Page 6: Stochastic process

Definition ([Bil02])

A discrete-time stochastic process is a collection {Xt : 1 ≤ t ≤ T, t ∈ ℕ} of random variables ordered by the discrete time index t.

In general, the distribution of each of the variables Xt can be arbitrary and different for each t.

There may also be arbitrary conditional independence relationships between different subsets of variables of the process.

Page 7: Markov chain: Definition

Definition ([Bil02])

A collection of discrete-valued random variables {Qt : t ≥ 1} forms an nth-order Markov chain if the current state depends only on the n previous states, i.e.

P[Qt = qt | Qt−1 = qt−1, Qt−2 = qt−2, . . . , Q1 = q1] =
P[Qt = qt | Qt−1 = qt−1, Qt−2 = qt−2, . . . , Qt−n = qt−n]

∀q1, . . . , qt and n < t.

In particular, a chain satisfying

P[Qt = qt | Qt−1 = qt−1, Qt−2 = qt−2, . . . , Q1 = q1] = P[Qt = qt | Qt−1 = qt−1]

is a first-order Markov chain.

Page 8: Markov chain: Definition (cont'd)

The event {Qt = qi} is interpreted as the chain being in state qi at time t

The event {Qt = qi, Qt+1 = qj} is the transition from state qi to state qj at time t

A discrete-time homogeneous Markov chain can be seen as a finite state automaton (FSA) with conditional probabilities on its transitions

Definition

A Markov process is a stochastic process that has the Markov property.

The term Markov chain is used to mean a discrete-time Markov process.

Page 9: Markov chain: Transition probabilities

The statistical evolution of a Markov chain is determined by its state transition probabilities:

aij(t) = P[Qt = qj | Qt−1 = qi]

⇒ a function of the state succession and of the current time t

If the chain is time-independent, i.e. the transition probabilities do not depend on time t, the chain is (time-)homogeneous and:

∀t : aij(t) = aij

In a homogeneous chain, the (stochastic) transition matrix is A, where ∀i, j:

aij = (A)ij

is the probability of going from state i to state j, with

∀i : Σ_{j} aij = 1  ∧  aij ≥ 0

Page 10: Markov chain: Example

Given the transition matrix:

A = | 0.2  0.4  0.4 |
    | 0.1  0.7  0.2 |
    | 0.5  0.5  0   |

the FSA corresponding to this homogeneous first-order Markov chain is:

[Figure: FSA with states S1, S2, S3; each arc is labelled with the corresponding transition probability from A]

Page 11: Outline

1 Introduction

2 HMM
  Definitions
  Basic problems
  Implementation problems

Page 12: Definitions: Introductory example

Let's suppose a dishonest casino where you can play betting games such as coin tossing. Tosses are assumed to be independent. Imagine now that the casino has two coins:

the first coin, c1, is fair

the second coin, c2, is unfair, with a probability of 0.3 for head H and 0.7 for tail T

The coin being flipped is changed at every play with a probability of 0.1, to avoid arousing the players' suspicion. Note that each coin has a probability of 0.5 of being chosen for the first play.

How is it possible to model such a problem?
How can we, given an observed result, determine the coins that have been tossed?

Page 13: Definitions: Notations

Let's define:

1 A set S = {S1, S2, . . . , SN} of N states, where the observed state at time t is noted qt

2 An alphabet V = {v1, v2, . . . , vM} of size M corresponding to the output symbols

3 A state transition probability distribution matrix A = {aij} where

   aij = P[qt = Sj | qt−1 = Si]   1 ≤ i, j ≤ N

4 An observation symbol probability distribution matrix B = {bj(k)} where

   bj(k) = P[vk at t | qt = Sj]   1 ≤ j ≤ N, 1 ≤ k ≤ M

5 An initial state distribution vector π = {πi} where

   πi = P[q1 = Si]   1 ≤ i ≤ N

Page 14: Definitions: HMM

Definition ([Rab90])

A complete specification of an HMM is defined with:

1 the probability measures A, B and π, noted

λ = {A, B, π}

where λ is called an HMM model

2 the observation symbol alphabet V

3 the model parameters N and M

Page 15: Definitions: Observation sequence

Definition ([Rab90])

An HMM can be used to generate an observation sequence

O = O1 O2 . . . OT

where Ot ∈ V, 1 ≤ t ≤ T, as follows:

1 choose q1 = Si according to π

2 set t = 1

3 choose Ot = vk according to bi(k) of state Si

4 transition to qt+1 = Sj according to aij of state Si

5 if t < T then set t = t + 1 and go to 3, else end

Page 16: Definitions: Introductory example (cont'd)

Given the previous example, we obtain:

1 the number of states N = 2

2 the alphabet is V = {H,T} and thus M = 2

3 λ = {A,B, π} is

A = | 0.9  0.1 |     B = | 0.5  0.5 |     π = | 0.5 |
    | 0.1  0.9 |         | 0.3  0.7 |         | 0.5 |

where A is the transition matrix associated with transitions between c1 and c2, B is the observation symbol probability matrix, and π is the initial state distribution vector.

Page 17: Definitions: Representations of HMM

FSA: shows only the state transitions

[Figure: FSA over states S1, S2, S3 showing the state transitions]

⇒ easy analysis of the HMM topology

Distribution dependence: shows the dependence between the observed symbols and the hidden states

[Figure: chain . . . qt+1 → qt+2 → qt+3 → qt+4 → . . . , each state qt emitting the observation Ot]

⇒ better view of the dependence between the distributions

Page 18: Definitions: Representations of HMM (cont'd)

Time step evolution: shows the collection of states and the transitions between states at each successive time step

[Figure: trellis with states S1, S2, S3 replicated at time steps t1, t2, t3, . . . and transitions between consecutive columns]

⇒ makes it possible to represent non-homogeneous Markov chains

Page 19: Definitions: Models of HMM

Ergodic model: every state can be reached from any other state in a single step, i.e. ∀i, j : aij ≠ 0

[Figure: fully connected FSA over S1, S2, S3]

Left-Right/Bakis model: as time increases, the state index increases, i.e. ∀i, j : j < i → aij = 0, and πi = 1 if i = 1, 0 otherwise

[Figure: left-right FSA S1 → S2 → S3 → S4]

Page 20: Definitions: HMM variants

Null transition model: some transitions produce no output, i.e. they jump from one state to another without producing an observable symbol

[Figure: left-right chain S1 S2 S3 S4 with additional null transitions]

Tied states model: tie together states whose parameters are the same. This has the advantage of reducing the number of parameters to adjust when training the model

State duration model: add an explicit state duration (instead of a simple self-loop)

. . .

Page 21: Basic problems [Rab90]: Overview

1 Evaluation problem
  How to efficiently compute P(O|λ) for some given O and λ?

2 State sequence problem
  How to find the state sequence Q = q1 q2 . . . qT which best matches some given O and λ?

3 Training problem
  How to adjust the parameters of λ to maximize P(O|λ) for a given O?

Page 22: Implementation problems [Rab90]: Overview

Scaling
  How to avoid precision loss (computer limits) due to huge computations?
  → scaling procedures

Multiple observation sequences
  How to ensure that the observation sequences run through every state of the model?
  → use multiple observation sequences

Initial estimates
  How to initialise the values of A, B and π?
  → A, π: random or uniform
  → B: manual

Page 23: Implementation problems (cont'd): Overview

Insufficient training
  How is it possible to ensure that the model has been sufficiently trained?
  → increase the size of the observation set
  → reduce the size of the model
  → segment the model training

Model choice
  How to choose the model architecture, size and observation symbols?
  → trial and error
  → best practice

Page 24: Part II - Basic Problems

Page 25: Outline

3 Evaluation problem
  Simple approach
  Forward/Backward procedure

4 State sequence problem

5 Training problem

Page 26: Simple approach

Given O = O1 O2 . . . OT and λ = {A, B, π}, enumerate every possible state sequence of length T. For a given state sequence with initial state q1

Q = q1 q2 . . . qT

the probability of observing O for this sequence is

P(O|Q, λ) = ∏_{t=1..T} P(Ot | qt, λ)

which is equal to

P(O|Q, λ) = bq1(O1) bq2(O2) . . . bqT(OT)

and the probability of such a state sequence is

P(Q|λ) = πq1 aq1q2 aq2q3 . . . aqT−1qT

Page 27: Simple approach (cont'd)

The joint probability of O and Q, i.e. their simultaneous occurrence, is

P(O, Q|λ) = P(O|Q, λ) P(Q|λ)

From this we can derive

P(O|λ) = Σ_{all Q} P(O|Q, λ) P(Q|λ)
        = Σ_{q1 q2 ... qT} πq1 bq1(O1) aq1q2 bq2(O2) . . . aqT−1qT bqT(OT)

Such an approach implies:

1 (2T − 1) N^T multiplications
2 N^T − 1 additions

where N^T is the number of possible state sequences

⇒ O(2T N^T) complexity
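Not from the slides: a brute-force evaluation sketch in Python (NumPy assumed) that literally enumerates the N^T state sequences; it is only meant to make the exponential cost concrete.

```python
import itertools
import numpy as np

def evaluate_naive(A, B, pi, O):
    """P(O|λ) by summing over all N**T state sequences (exponential cost)."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):      # every state sequence
        p = pi[Q[0]] * B[Q[0], O[0]]
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total

# Casino model from page 16, observation sequence H, T, T (encoded as 0, 1, 1).
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.5, 0.5], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
print(evaluate_naive(A, B, pi, [0, 1, 1]))
```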

Page 28: Forward/Backward procedure: Forward variable definition

Definition

Consider the forward variable αt(i) defined as

αt(i) = P(O1 O2 . . . Ot, qt = Si | λ)

which is the probability of observing O1 O2 . . . Ot up to time t and of being in state Si at time t, given the model λ

Page 29: Forward/Backward procedure: Forward procedure algorithm

αt(i) can be solved inductively:

1 Initialization:

   α1(i) = πi bi(O1)   1 ≤ i ≤ N

2 Induction:

   αt+1(j) = [ Σ_{i=1..N} αt(i) aij ] bj(Ot+1)   1 ≤ t ≤ T − 1, 1 ≤ j ≤ N

3 Termination:

   P(O|λ) = Σ_{i=1..N} αT(i)
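Again not part of the slides: a compact Python sketch (NumPy assumed) of the forward procedure; for long sequences a scaled or log-space version would be needed, as discussed under the implementation problems.

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward procedure: returns alpha (T x N) and P(O|λ)."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialization
    for t in range(1, T):                          # induction
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha, alpha[-1].sum()                  # termination

# Casino model from page 16, O = H, T, T (encoded as 0, 1, 1).
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.5, 0.5], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
alpha, likelihood = forward(A, B, pi, [0, 1, 1])
print(likelihood)    # same value as the naive enumeration, at O(N^2 T) cost
```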

Page 30: Forward/Backward procedure: Forward procedure evaluation

This procedure requires:

1 N(N + 1)(T − 1) + N multiplications
2 N(N − 1)(T − 1) additions

⇒ O(N^2 T) complexity

For N = 5 and T = 100 we obtain:

                 Forward    Simple
# computations   ≈ 2500     ≈ 10^72

Page 31: Forward/Backward procedure: Forward procedure lattice

Implementation of the computation of αt(i) in terms of:

states i

a lattice of observations t

[Figure: lattice with the observation index t (0, 1, 2, . . . , T) on the horizontal axis and the state index (1, 2, . . . , N) on the vertical axis]

Page 32: Forward/Backward procedure: Backward variable definition

Definition

Consider the backward variable βt(i) defined as

βt(i) = P(Ot+1 Ot+2 . . . OT | qt = Si, λ)

which is the probability of the partial observation sequence from t + 1 to T, given state Si at time t and the model λ

Page 33: Forward/Backward procedure: Backward procedure algorithm

βt(i) can be solved inductively:

1 Initialization:

   βT(i) = 1   1 ≤ i ≤ N

2 Induction:

   βt(i) = Σ_{j=1..N} aij bj(Ot+1) βt+1(j)   t = T − 1, T − 2, . . . , 1,  1 ≤ i ≤ N
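A matching backward-procedure sketch (not in the slides; NumPy assumed); the final line checks the identity P(O|λ) = Σi πi bi(O1) β1(i).

```python
import numpy as np

def backward(A, B, O):
    """Backward procedure: returns beta (T x N)."""
    N, T = A.shape[0], len(O)
    beta = np.ones((T, N))                              # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                      # induction, t = T-1, ..., 1
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

# Casino model again; P(O|λ) can also be recovered from the backward variables.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.5, 0.5], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
O = [0, 1, 1]
beta = backward(A, B, O)
print((pi * B[:, O[0]] * beta[0]).sum())    # equals the forward result
```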

Page 34: Outline

3 Evaluation problem

4 State sequence problem
  Problem specification
  Individually most likely state method
  Viterbi algorithm

5 Training problem

Page 35: Problem specification

Observation: there is more than one way of defining the optimal state sequence.

Solutions:

1 choose the individually most likely states qt

2 find the single best state sequence

Solution 1 maximizes the number of correct states by choosing the most likely state for each t.

Solution 2 is the most widely used method; the state sequence is determined with the Viterbi algorithm.

Page 36: Individually most likely state method

Let's define the probability of being in state Si at time t, given a model λ and an observation sequence O:

γt(i) = P(qt = Si | O, λ)
      = αt(i) βt(i) / P(O|λ)
      = αt(i) βt(i) / Σ_{i=1..N} αt(i) βt(i)

where Σ_{i=1..N} γt(i) = 1.

The most likely state qt at time t is

qt = argmax_{1≤i≤N} [γt(i)]   1 ≤ t ≤ T

⇒ no regard to the probability of occurrence of the sequence of states
⇒ invalid state sequences may appear
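Not in the slides: a Python sketch (NumPy assumed) of this individually-most-likely-state (posterior) decoding, combining the forward and backward passes shown earlier.

```python
import numpy as np

def posterior_states(A, B, pi, O):
    """γt(i) = P(qt = Si | O, λ) and the individually most likely state for each t."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                        # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):               # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)    # each row now sums to 1
    return gamma, gamma.argmax(axis=1)           # most likely state per time step

A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.5, 0.5], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
gamma, states = posterior_states(A, B, pi, [0, 1, 1, 1, 1])
print(states)    # which coin most likely produced each toss, one state at a time
```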

Page 37: Viterbi algorithm: Introduction

Goal: find the best state sequence Q = q1 q2 . . . qT for the corresponding observation O = O1 O2 . . . OT

Let's define

δt(i) = max_{q1 q2 ... qt−1} P[q1 q2 . . . qt = Si, O1 O2 . . . Ot | λ]

which is the highest probability along a single path at time t that accounts for the first t observations and ends in state Si, with induction step

δt+1(j) = [max_i δt(i) aij] bj(Ot+1)

where the arguments maximizing δt+1(j) for each t and j are stored in an array ψt(j)

Page 38: Viterbi algorithm: Algorithm

1 Initialization:

   δ1(i) = πi bi(O1)   1 ≤ i ≤ N
   ψ1(i) = 0

2 Recursion:

   δt(j) = max_{1≤i≤N} [δt−1(i) aij] bj(Ot)      2 ≤ t ≤ T, 1 ≤ j ≤ N
   ψt(j) = argmax_{1≤i≤N} [δt−1(i) aij]          2 ≤ t ≤ T, 1 ≤ j ≤ N

3 Termination:

   P* = max_{1≤i≤N} [δT(i)]
   q*T = argmax_{1≤i≤N} [δT(i)]

4 State sequence backtracking:

   q*t = ψt+1(q*t+1)   t = T − 1, T − 2, . . . , 1
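A Python sketch of the Viterbi algorithm (not from the slides; NumPy assumed); states are 0-indexed here, so state k corresponds to S(k+1).

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Viterbi algorithm: best state sequence (0-indexed) and its probability P*."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                      # initialization
    for t in range(1, T):                           # recursion
        scores = delta[t - 1][:, None] * A          # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]                # termination
    for t in range(T - 1, 0, -1):                   # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

# Model of page 40 (symbols a, b, c encoded as 0, 1, 2), observation O = a, b, c.
A = np.array([[0.3, 0.3, 0.4], [0.4, 0.4, 0.2], [0.1, 0.6, 0.3]])
B = np.array([[0.25, 0.35, 0.4], [0.1, 0.25, 0.65], [0.5, 0.45, 0.05]])
pi = np.array([0.2, 0.3, 0.5])
print(viterbi(A, B, pi, [0, 1, 2]))    # ([2, 2, 1], ...) i.e. S3, S3, S2
```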

Page 39: Viterbi algorithm: Algorithm (cont'd)

The first 3 steps are quite similar to the forward procedure, where the Σ has been replaced by a max

The computation may also be implemented with a lattice

Page 40: Viterbi algorithm: Example [Oco06]

Let's define:

the alphabet V = {a, b, c}

the model λ = {A, B, π} given by

A = | 0.3  0.3  0.4 |     B = | 0.25  0.35  0.4  |     π = | 0.2 |
    | 0.4  0.4  0.2 |         | 0.1   0.25  0.65 |         | 0.3 |
    | 0.1  0.6  0.3 |         | 0.5   0.45  0.05 |         | 0.5 |

Page 41: Viterbi algorithm: Example (cont'd)

Goal: find the path q1, q2, q3 which best matches O = a, b, c

Solution:

Initialization:
δ1(1) = 0.2 × 0.25 = 0.05
δ1(2) = 0.3 × 0.1  = 0.03
δ1(3) = 0.5 × 0.5  = 0.25
ψ1 = [0 0 0]

Recursion:
δ2(1) = max[0.05 × 0.3, 0.03 × 0.4, 0.25 × 0.1] × 0.35 = 0.00875
δ2(2) = max[0.05 × 0.3, 0.03 × 0.4, 0.25 × 0.6] × 0.25 = 0.0375
δ2(3) = max[0.05 × 0.4, 0.03 × 0.2, 0.25 × 0.3] × 0.45 = 0.03375
ψ2 = [3 3 3]

Page 42: Viterbi algorithm: Example (cont'd)

Solution (cont'd):

Recursion:
δ3(1) = max[0.00875 × 0.3, 0.0375 × 0.4, 0.03375 × 0.1] × 0.4  = 0.006
δ3(2) = max[0.00875 × 0.3, 0.0375 × 0.4, 0.03375 × 0.6] × 0.65 = 0.0131625
δ3(3) = max[0.00875 × 0.4, 0.0375 × 0.2, 0.03375 × 0.3] × 0.05 = 0.00050625
ψ3 = [2 3 3]

Termination:
P* = max[0.006, 0.0131625, 0.00050625] = 0.0131625
q*T = argmax[0.006, 0.0131625, 0.00050625] = 2

Page 43: Viterbi algorithm: Example (cont'd)

Solution:

Backtracking:

[Figure: backtracking through the trellis over observations a, b, c and states S1, S2, S3, ending in the highest-probability final state]

⇒ Maximum probability path: q1 = S3, q2 = S3 and q3 = S2
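Not part of the slides: a short NumPy check of the δ recursion for this example; it reproduces the values above and the path S3, S3, S2.

```python
import numpy as np

A = np.array([[0.3, 0.3, 0.4], [0.4, 0.4, 0.2], [0.1, 0.6, 0.3]])
B = np.array([[0.25, 0.35, 0.4], [0.1, 0.25, 0.65], [0.5, 0.45, 0.05]])
pi = np.array([0.2, 0.3, 0.5])
O = [0, 1, 2]                                   # a, b, c

delta = pi * B[:, O[0]]                         # delta_1 = [0.05, 0.03, 0.25]
psis = []
for t in range(1, len(O)):
    scores = delta[:, None] * A                 # scores[i, j] = delta_{t-1}(i) * a_ij
    psis.append(scores.argmax(axis=0))          # psi_t
    delta = scores.max(axis=0) * B[:, O[t]]
    print(delta)                                # delta_2, then delta_3

# Backtrack from the best final state (0-indexed; add 1 for the S-numbering).
state = int(delta.argmax())
path = [state]
for psi in reversed(psis):
    state = int(psi[state])
    path.append(state)
print([s + 1 for s in reversed(path)])          # [3, 3, 2] -> S3, S3, S2
```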

Page 44: Outline

3 Evaluation problem

4 State sequence problem

5 Training problem
  Problem specification
  HMM training

Page 45: Problem specification

Observation: adjusting the model parameters λ to maximize the probability of the observed sequence is the most intricate problem to solve

Issue: there is no known way to solve the problem analytically

Solution: an iterative procedure that maximizes P(O|λ) by adjusting the λ parameters: the Baum-Welch method (alternative: gradient techniques)

Page 46: Baum-Welch Method

Let's define

ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)

which is the probability of being in state Si at time t and in state Sj at time t + 1. It can also be expressed in terms of the forward/backward variables:

ξt(i, j) = P(qt = Si, qt+1 = Sj, O | λ) / P(O|λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / P(O|λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / Σ_{i=1..N} Σ_{j=1..N} αt(i) aij bj(Ot+1) βt+1(j)

Page 47: Baum-Welch Method (cont'd)

Let's define

γt(i) = Σ_{j=1..N} ξt(i, j)

which is the probability of being in state Si at time t, for a given O and λ. From this we can derive:

1 the expected number of transitions from Si, summed from t = 1 to T − 1, i.e.

   Σ_{t=1..T−1} γt(i)

2 the expected number of transitions from Si to Sj, summed from t = 1 to T − 1, i.e.

   Σ_{t=1..T−1} ξt(i, j)

Page 48: Baum-Welch Method (cont'd)

From the previous formulas, we can derive re-estimations of A, B and π:

1 πi = expected frequency (number of times) in state Si at time t = 1:

   πi = γ1(i)   s.t. Σ_{i=1..N} πi = 1

2 aij = expected number of transitions from Si to Sj / expected number of transitions from Si:

   aij = Σ_{t=1..T−1} ξt(i, j) / Σ_{t=1..T−1} γt(i)   s.t. Σ_{j=1..N} aij = 1,  1 ≤ i ≤ N

3 bj(k) = expected number of times in Sj observing symbol vk / expected number of times in Sj:

   bj(k) = Σ_{t=1..T, Ot=vk} γt(j) / Σ_{t=1..T} γt(j)   s.t. Σ_{k=1..M} bj(k) = 1,  1 ≤ j ≤ N
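Not in the slides: a single Baum-Welch re-estimation step in Python (NumPy assumed), written for one observation sequence and without the scaling needed for long sequences.

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One Baum-Welch re-estimation of (A, B, π) from a single sequence O."""
    N, M, T = len(pi), B.shape[1], len(O)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                                     # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):                            # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()

    # xi[t, i, j] and gamma[t, i] as defined on the previous slides.
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()
    gamma = alpha * beta / likelihood

    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(M):
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, likelihood

# Casino model re-estimated from one (short) observation sequence.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.5, 0.5], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
O = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
for _ in range(5):
    A, B, pi, likelihood = baum_welch_step(A, B, pi, O)
    print(likelihood)    # P(O|λ) is non-decreasing from one iteration to the next
```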

Page 49: Baum-Welch Method (cont'd)

The re-estimated model λ̄ = {Ā, B̄, π̄} may imply:

1 P(O|λ̄) < P(O|λ): model λ is more likely than λ̄

2 P(O|λ̄) = P(O|λ): λ defines a critical point of the likelihood function

3 P(O|λ̄) > P(O|λ): model λ̄ is more likely than λ
   1 iteratively use λ̄ as the new model
     ⇒ the re-estimation procedure will improve the probability of observing O
   2 stop when some limiting point is reached

The final result is called the maximum likelihood estimate of the HMM.

Issue: avoiding local maxima

Page 50: Baum-Welch Method (cont'd)

In order to maximize P(O|λ), one can use Baum's auxiliary function

Q(λ, λ̄) = Σ_Q P(Q|O, λ) log[P(O, Q|λ̄)]

maximized over λ̄, which leads to increased likelihood, i.e.:

max_λ̄ [Q(λ, λ̄)] ⇒ P(O|λ̄) ≥ P(O|λ)

In the end, the likelihood function converges to a critical point.

Page 51: Part III - Case Study

Page 52: Outline

6 Information extraction
  Context
  Model
  Solution

7 Face recognition

Page 53: Context [SMR99]

Imagine you have a library of computer science papers at your disposal. What kind of information could you extract from their headers:

title

author

names

affiliation

address

notes

email

. . .

How could we tag such words?

Page 54: Model

Each element of this list will be considered as a class that we want to extract. Each class is represented by one or more states in the model. Each state emits words from a class-specific distribution.

[Figure: HMM over header classes (start, title, author, affiliation, address, note, email, date, abstract, keyword, pubnum, end) with learned transition probabilities between them]

We can learn by training:

1 class-specific distribution

2 state transition probabilities

Page 55: Solution

Label elements from headers with classes. In order to achieve this goal:

1 treat each word from the header as an observation

2 recover the most-likely state sequence with the Viterbi algorithm

3 assign the states to the words as their class tag

Page 56: Outline

6 Information extraction

7 Face recognition
  Face recognition
  Model
  Solution

Page 57: Context [NH]

Imagine you have a list of staff members. An identification card containing information such as the name, the function and the face is associated with each member. How could you develop a system that automatically recognizes the workers as they enter the company building? More precisely, how is it possible to recognize elements such as the:

hair

forehead

eyes

nose

mouth

and find matches with the saved information?

Page 58: Model

As these elements appear in a fixed order, we can define the following HMM model:

[Figure: left-right HMM SH → SF → SE → SN → SM with self-loop probabilities a11 . . . a55 and forward transitions a12, a23, a34, a45]

where the states mean:

SH : the hair

SF : the forehead

SE : the eyes

SN : the nose

SM : the mouth

Page 59: Model (cont'd)

Let's define:

W : the image width

H: the image height

L: the segment height

P: the height of the overlapping between the segments

T: the number of segments constituting the image, given by

   T = (H − L) / (L − P) + 1

which must be chosen very carefully.

Each staff member is represented by an HMM face model in DB.

[Figure: face image of width W and height H, divided into horizontal segments of height L with overlap P]
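As a quick, purely illustrative check of the formula (the values of H, L and P below are hypothetical, not from the slides):

```python
def num_segments(H, L, P):
    """T = (H - L) / (L - P) + 1, assuming (H - L) is a multiple of (L - P)."""
    return (H - L) // (L - P) + 1

print(num_segments(H=100, L=10, P=5))   # 19 segments for these example values
```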

Page 60: Solution: Training

Face segments are converted into 2D-DCT coefficients, which form the observation vector O

Discrete Cosine Transform: transforms an image from the spatial domain to the frequency domain

Each HMM is trained with 5 different instances of the same face

[Figure: training pipeline: training data → block extraction → feature extraction → model initialization → model re-estimation → convergence test (loop until YES) → model parameters]
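Not from the slides: a rough Python sketch (NumPy and SciPy assumed) of the feature-extraction step, slicing a face image into overlapping segments and keeping low-frequency 2D-DCT coefficients; in the discrete-HMM setting these vectors would still have to be quantized into symbols from V.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(image, L=10, P=5, n_coeffs=6):
    """Slice a grayscale image into overlapping horizontal segments and keep the
    lowest-frequency 2D-DCT coefficients of each segment as its feature vector."""
    H = image.shape[0]
    features = []
    top = 0
    while top + L <= H:
        segment = image[top:top + L, :]
        coeffs = dctn(segment, norm="ortho")                    # 2D type-II DCT
        features.append(coeffs[:n_coeffs, :n_coeffs].ravel())   # low-frequency block
        top += L - P                                            # overlap of P rows
    return np.array(features)                                   # one vector per segment

# Hypothetical 100 x 96 grayscale face image (random stand-in for a real photo).
rng = np.random.default_rng(0)
obs = dct_features(rng.random((100, 96)))
print(obs.shape)    # (19, 36): 19 segments, 36 DCT coefficients each
```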

Page 61: Solution: Recognition

O is obtained as in the training phase

The matching face f is such that:

P(O|λf) = max_n P(O|λn)

[Figure: recognition pipeline: test image → block extraction → feature extraction → probability computation against each model λ1 . . . λN → maximum selection → recognized model]
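Not in the slides: a toy Python sketch (NumPy assumed) of this recognition rule; the per-person models and the test sequence below are made-up placeholders, and P(O|λn) is computed with the forward procedure.

```python
import numpy as np

def log_likelihood(model, O):
    """log P(O|λ) via the forward procedure."""
    A, B, pi = model
    alpha = pi * B[:, O[0]]
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]
    return np.log(alpha.sum())

# Hypothetical database of per-person models λn = (A, B, π) over a shared
# observation alphabet; both the models and the test sequence are placeholders.
models = {
    "alice": (np.array([[0.9, 0.1], [0.2, 0.8]]),
              np.array([[0.7, 0.3], [0.2, 0.8]]),
              np.array([0.6, 0.4])),
    "bob":   (np.array([[0.5, 0.5], [0.5, 0.5]]),
              np.array([[0.1, 0.9], [0.8, 0.2]]),
              np.array([0.5, 0.5])),
}
O = [0, 0, 1, 0]
recognized = max(models, key=lambda name: log_likelihood(models[name], O))
print(recognized)    # the model λf with the highest P(O|λf)
```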

Page 62: Solution: Results

Horizontal lines show classes identified with the Viterbi algorithm

Crossed images represent incorrect classifications

Page 63: Part IV - Conclusion

Page 64: Outline

8 Summary

9 References

Page 65: Summary

We have seen that HMMs:

strongly rely on statistical theory

can be used in many fields

the chosen model highly influences the results

initialization and training are not easy

training is unsupervised

Page 66: Outline

8 Summary

9 References

Page 67: References

[Rab90] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77, pages 267–296, San Francisco, CA, USA, 1990. Morgan Kaufmann Publishers Inc.

[SMR99] Kristie Seymore, Andrew McCallum, and Roni Rosenfeld. Learning hidden Markov model structure for information extraction. In AAAI 99 Workshop on Machine Learning for Information Extraction, 1999.

[NH] A. Nefian and M. Hayes. Hidden Markov models for face recognition. In ICASSP98, pages 2721–2724, 1998.

[Bil02] Jeff Bilmes. What HMMs can do. Technical Report UWEETR-2002-0003, Dept of EE, University of Washington, January 2002.

Page 68: References (cont'd)

[Oco06] Daniel Ocone. Hidden Markov models. In Discrete and Probabilistic Models in Biology, 2006. Chapter 8.

[Die02] Thomas G. Dietterich. Machine learning for sequential data: A review. In Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pages 15–30, London, UK, 2002. Springer-Verlag.

Page 69:

Thank you for your attention.

Any questions?
