An introduction to Hidden Markov Models
Arnaud Hubaux
Computer Science Institute, FUNDP - Namur
March 20, 2007
Academic year 2006-2007
Arnaud Hubaux An introduction to Hidden Markov Models
Outline
Part I: Concepts
Part II: Basic problems
Part III: Case study
Part IV: Conclusion
Part I
Concepts
Outline
1 Introduction
  Overview
  Stochastic process
  Markov chain
2 HMM
Overview
First described in statistical papers in the mid-1960s
stochastic process → Markov process → HMM
One of the first applications was speech recognition, in the mid-1970s
Applications to the analysis of biological sequences (DNA) started in the mid-1980s
Nowadays they are used in:
1 gesture and body motion recognition
2 optical character recognition
3 bioinformatics
4 information extraction
5 . . .
Stochastic process
Definition ([Bil02])
A discrete-time stochastic process is a collection {Xt : 1 ≤ t ≤ T, t ∈ IN} of random variables ordered by the discrete time index t.
In general, the distribution of each variable Xt can be arbitrary and different for each t.
There may also be arbitrary conditional independence relationships between different subsets of variables of the process.
Markov chain: Definition
Definition ([Bil02])
A collection of discrete-valued random variables {Qt : t ≥ 1} forms an nth-order Markov chain if Qt depends only on the n previous states, i.e.

P[Qt = qt | Qt−1 = qt−1, Qt−2 = qt−2, . . . , Q1 = q1] = P[Qt = qt | Qt−1 = qt−1, Qt−2 = qt−2, . . . , Qt−n = qt−n]

∀q1, . . . , qt and n < t. In particular, a chain satisfying

P[Qt = qt | Qt−1 = qt−1, Qt−2 = qt−2, . . . , Q1 = q1] = P[Qt = qt | Qt−1 = qt−1]

is a first-order Markov chain.
Markov chain: Definition (cont'd)
The event {Qt = qi} is interpreted as the chain being in state qi at time t
The event {Qt = qi, Qt+1 = qj} is the transition from state qi to state qj at time t
A discrete-time homogeneous Markov chain can be seen as a finite state automaton (FSA) with conditional probabilities on the transitions

Definition
A Markov process is a stochastic process that has the Markov property.
The term Markov chain is used to mean a discrete-time Markov process.
Markov chain: Transition probabilities
The statistical evolution of a Markov chain is determined by the state transition probabilities:

aij(t) = P[Qt = qj | Qt−1 = qi]

⇒ a function of the state succession and of the current time

If the chain is time-independent, i.e. aij(t) does not depend on the time t, the chain is (time-)homogeneous and:

∀t : aij(t) = aij

In a homogeneous chain, the (stochastic) transition matrix is A, where ∀i, j:

aij = (A)ij

is the probability of going from state i to state j, with

∀i : ∑j aij = 1 ∧ aij ≥ 0
Markov chain: Example
Given the transition matrix:
A = | 0.2 0.4 0.4 |
    | 0.1 0.7 0.2 |
    | 0.5 0.5 0.0 |
the FSA corresponding to this homogeneous first-order Markov chain is:
[Figure: three-state FSA with states S1, S2, S3; each arc is labelled with the corresponding entry of A]
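The chain above can be simulated directly. A minimal sketch in Python (state indices 0, 1, 2 stand for S1, S2, S3; the helper names are illustrative, not from the slides):

```python
import random

# Transition matrix of the example homogeneous first-order Markov chain.
A = [[0.2, 0.4, 0.4],
     [0.1, 0.7, 0.2],
     [0.5, 0.5, 0.0]]

def sample_next(state, A, rng):
    """Draw the next state index according to row `state` of A."""
    r, acc = rng.random(), 0.0
    for j, p in enumerate(A[state]):
        acc += p
        if r < acc:
            return j
    return len(A[state]) - 1  # guard against rounding

def simulate(A, start, steps, seed=0):
    """Generate a state trajectory of the given length."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(sample_next(path[-1], A, rng))
    return path
```

Since a33 = 0, a simulated trajectory never stays in S3 for two consecutive steps.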
Outline
1 Introduction
2 HMM
  Definitions
  Basic problems
  Implementation problems
Definitions: Introductory example
Let us consider a dishonest casino where you can play betting games such as coin tossing. Tosses are supposed to be independent. Imagine now that the casino has two coins:

the first coin, c1, is fair

the second coin, c2, is unfair, with a probability of head (H) of 0.3 and of tail (T) of 0.7

The coin flipped is changed at every play with a probability of 0.1, to avoid arousing the players' suspicion.
Note that each coin has a probability of 0.5 of being chosen for the first play.
How is it possible to model such a problem?
How can we, given an observed result, determine the coins that have been tossed?
Definitions: Notations
Let us define:
1 A set S = {S1, S2, . . . , SN} of N states, where the state at time t is noted qt
2 An alphabet V = {v1, v2, . . . , vM} of size M corresponding to the output symbols
3 A state transition probability distribution matrix A = {aij} where
  aij = P[qt = Sj | qt−1 = Si], 1 ≤ i, j ≤ N
4 An observation symbol probability distribution matrix B = {bj(k)} where
  bj(k) = P[vk at t | qt = Sj], 1 ≤ j ≤ N, 1 ≤ k ≤ M
5 An initial state distribution vector π = {πi} where
  πi = P[q1 = Si], 1 ≤ i ≤ N
Definitions: HMM
Definition ([Rab90])
A complete specification of an HMM consists of:

1 the probability measures A, B and π, noted
  λ = {A, B, π}
  where λ is called an HMM model
2 the observation symbol alphabet V
3 the model parameters N and M
Definitions: Observation sequence
Definition ([Rab90])
An HMM can be used to generate an observation sequence

O = O1O2 . . . OT

where Ot ∈ V, 1 ≤ t ≤ T, as follows:

1 choose q1 = Si according to π
2 set t = 1
3 choose Ot = vk according to bi(k) of state Si
4 transit to qt+1 = Sj according to aij of state Si
5 if t < T then set t = t + 1 and go to 3, else end
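These five steps translate directly into a short sampler; a sketch (function names are illustrative, not from the slides):

```python
import random

def draw(dist, rng):
    """Sample an index from a discrete probability distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1  # guard against rounding

def generate(A, B, pi, T, seed=0):
    """Follow the five steps above: pick q1 from pi, then alternately
    emit Ot from b_q(k) and move to the next state via a_qj."""
    rng = random.Random(seed)
    q = draw(pi, rng)                 # step 1: choose q1 according to pi
    obs = []
    for _ in range(T):                # steps 2-5
        obs.append(draw(B[q], rng))   # emit Ot according to b_q(k)
        q = draw(A[q], rng)           # transit according to a_qj
    return obs
```

With a fully deterministic model (0/1 transition and emission probabilities) the sampler is easy to check: starting in state 0 and alternating states, it emits 0, 1, 0, 1, . . . regardless of the seed.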
Definitions: Introductory example (cont'd)
For the previous example, we have:

1 the number of states N = 2

2 the alphabet V = {H, T} and thus M = 2

3 λ = {A, B, π} with

A = | 0.9 0.1 |   B = | 0.5 0.5 |   π = | 0.5 |
    | 0.1 0.9 |       | 0.3 0.7 |       | 0.5 |

where A is the transition matrix associated with the switches between c1 and c2, B is the observation symbol probability matrix and π the initial state probability vector.
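Encoded in Python, the casino model looks as follows (a sketch; state 0 is c1, state 1 is c2, symbol 0 is H, symbol 1 is T):

```python
# Casino model: state 0 = fair coin c1, state 1 = biased coin c2;
# symbols: 0 = head (H), 1 = tail (T).
A  = [[0.9, 0.1],   # keep c1 / switch to c2
      [0.1, 0.9]]   # switch to c1 / keep c2
B  = [[0.5, 0.5],   # c1 is fair
      [0.3, 0.7]]   # c2: P(H) = 0.3, P(T) = 0.7
pi = [0.5, 0.5]     # either coin equally likely at the first play

def check_stochastic(rows):
    """Every probability row must be non-negative and sum to 1."""
    return all(p >= 0 for row in rows for p in row) and \
           all(abs(sum(row) - 1.0) < 1e-12 for row in rows)

# Marginal probability of seeing a head on the first toss:
p_head = sum(pi[i] * B[i][0] for i in range(2))
```

`check_stochastic` verifies the row constraints from the transition-probability slide; `p_head` is the marginal 0.5 × 0.5 + 0.5 × 0.3 = 0.4 of a head on the first toss.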
Definitions: Representations of HMM
FSA: shows only the state transitions
[Figure: three-state FSA]

⇒ easy analysis of the HMM topology

Distribution dependence: shows the dependence between the observed symbols and the hidden states

[Figure: chain of hidden states . . . qt+1, qt+2, qt+3, qt+4, . . . , each emitting the corresponding observation Ot+1, Ot+2, Ot+3, Ot+4]

⇒ better view of the dependence between the distributions
Definitions: Representations of HMM (cont'd)
Time step evolution: shows the collection of states and the transitions between states at each successive time step

[Figure: trellis with states S1, S2, S3 replicated at time steps t1, t2, t3, . . .]

⇒ makes it possible to represent non-homogeneous Markov chains
Definitions: Models of HMM
Ergodic model: every state can be reached from any state in a single step, i.e. ∀i, j : aij ≠ 0

[Figure: fully connected three-state FSA]

Left-Right/Bakis model: as time increases, the state index increases, i.e. ∀i, j : j < i → aij = 0 and

πi = { 0 if i ≠ 1
     { 1 if i = 1

[Figure: left-right chain S1 → S2 → S3 → S4]
Definitions: HMM variants
Null transition model: some transitions produce no output, i.e. jump from one state to another without producing an observable symbol

[Figure: left-right chain S1 → S2 → S3 → S4 with additional null transitions]

Tied states model: tie states whose parameters are the same. This has the advantage of reducing the number of parameters to alter when training the model

State duration model: add an explicit state duration (instead of a simple self-loop)
. . .
Basic problems [Rab90]: Overview
1 Evaluation problem
  How to efficiently compute P(O|λ) for some given O and λ?
2 State sequence problem
  How to find the state sequence Q = q1q2 . . . qT which best matches some given O and λ?
3 Training problem
  How to adjust the parameters of λ to maximize P(O|λ) for a given O?
Implementation problems [Rab90]: Overview
Scaling
  How to avoid precision loss (computer limits) due to huge computations?
  → scaling procedures
Multiple observation sequences
  How to ensure that the observation sequences run through every state of the model?
  → use multiple observation sequences
Initial estimates
  How to initialise the values of A, B and π?
  → A, π: random or uniform
  → B: manual
Implementation problems (cont'd): Overview
Insufficient training
  How is it possible to ensure that the model has been sufficiently trained?
  → increase the size of the observation set
  → reduce the size of the model
  → segment the model training
Model choice
  How to choose the model architecture, size and observation symbols?
  → trial/error
  → best practice
Part II
Basic Problems
Outline
3 Evaluation problem
  Simple approach
  Forward/Backward procedure
4 State sequence problem
5 Training problem
Simple approach
Given O = O1O2 . . . OT and λ = {A, B, π}, enumerate every possible state sequence of length T.
For a given state sequence with q1 as initial state

Q = q1q2 . . . qT

the probability of observing O for this sequence is

P(O|Q, λ) = ∏_{t=1}^{T} P(Ot|qt, λ)

which is equal to

P(O|Q, λ) = bq1(O1) bq2(O2) . . . bqT(OT)

and the probability of the sequence itself is

P(Q|λ) = πq1 aq1q2 aq2q3 . . . aqT−1qT
Simple approach (cont’d)
The joint probability of O and Q, i.e. their simultaneous occurrence, is

P(O, Q|λ) = P(O|Q, λ) P(Q|λ)

From this we can derive

P(O|λ) = ∑_{all Q} P(O|Q, λ) P(Q|λ)
       = ∑_{q1q2...qT} πq1 bq1(O1) aq1q2 bq2(O2) . . . aqT−1qT bqT(OT)

Such an approach implies:

1 (2T − 1)N^T multiplications
2 N^T − 1 additions

where N^T is the number of possible state sequences

⇒ O(2T · N^T) complexity
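For tiny models the enumeration is easy to write down; a sketch (exponential in T, so only usable for very small T):

```python
from itertools import product

def brute_force_evaluate(A, B, pi, obs):
    """P(O|lambda) by summing P(O|Q)P(Q) over all N^T state sequences,
    exactly as in the expansion above (exponential cost)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0]][obs[0]]
        for t in range(1, T):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total
```

With the casino model of the introductory example and O = H, H this gives P(O|λ) = 0.168, summed over the four possible state sequences.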
Forward/Backward procedure: Forward variable definition
Definition
Consider the forward variable αt(i) defined as

αt(i) = P(O1O2 . . . Ot, qt = Si | λ)

which is the probability of observing O1O2 . . . Ot until time t and of being in state Si at time t, given the model λ
Forward/Backward procedure: Forward procedure algorithm
αt(i) can be computed inductively:

1 Initialization:
  α1(i) = πi bi(O1), 1 ≤ i ≤ N
2 Induction:
  αt+1(j) = [∑_{i=1}^{N} αt(i) aij] bj(Ot+1), 1 ≤ t ≤ T − 1, 1 ≤ j ≤ N
3 Termination:
  P(O|λ) = ∑_{i=1}^{N} αT(i)
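The three steps map directly onto code; a sketch:

```python
def forward(A, B, pi, obs):
    """Forward procedure: returns the full alpha table and P(O|lambda)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                       # initialization
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(T - 1):                   # induction
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * A[i][j]
                                  for i in range(N)) * B[j][obs[t + 1]]
    return alpha, sum(alpha[T - 1])          # termination
```

On the casino model with O = H, H this returns the same 0.168 as full enumeration, but at O(N²T) cost.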
Forward/Backward procedure: Forward procedure evaluation
This procedure requires:
1 N(N + 1)(T − 1) + N multiplications
2 N(N − 1)(T − 1) additions
⇒ O(N2T ) complexity
For N = 5 and T = 100:

                 Forward   Simple
# computations   ≈ 2500    ≈ 10^72
Forward/Backward procedure: Forward procedure lattice
The computation of αt(i) can be implemented in terms of:

states i

a lattice of observations t

[Figure: lattice with states 1 . . . N on the vertical axis and observation times t = 0, 1, 2, . . . , T on the horizontal axis]
Forward/Backward procedure: Backward variable definition
Definition
Consider the backward variable βt(i) defined as

βt(i) = P(Ot+1Ot+2 . . . OT | qt = Si, λ)

which is the probability of the partial observation sequence from t + 1 to T, given state Si at time t and the model λ
Forward/Backward procedure: Backward procedure algorithm
βt(i) can be computed inductively:

1 Initialization:
  βT(i) = 1, 1 ≤ i ≤ N
2 Induction:
  βt(i) = ∑_{j=1}^{N} aij bj(Ot+1) βt+1(j), t = T − 1, T − 2, . . . , 1, 1 ≤ i ≤ N
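A matching sketch for the backward pass; the consistency check at the end uses the identity P(O|λ) = ∑_i πi bi(O1) β1(i):

```python
def backward(A, B, obs):
    """Backward procedure: beta[t][i] = P(O_{t+1}...O_T | q_t = S_i, lambda)."""
    N, T = len(A), len(obs)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N                  # initialization
    for t in range(T - 2, -1, -1):           # induction, t = T-1, ..., 1
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta

def evaluate_backward(A, B, pi, obs):
    """P(O|lambda) recovered from the backward variables."""
    beta = backward(A, B, obs)
    return sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
```

On the casino model with O = H, H this agrees with the forward procedure's 0.168.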
Outline
3 Evaluation problem
4 State sequence problem
  Problem specification
  Individually most likely state method
  Viterbi algorithm
5 Training problem
Problem specification
Observation: there is more than one way of defining the optimal state sequence.
Solutions:

1 choose the individually most likely states qt
2 find the single best state sequence

Solution 1 maximizes the number of correct states by choosing the most likely state for each t.
Solution 2 is the most widely used method; the state sequence is determined with the Viterbi algorithm.
Individually most likely state method
Let us define the probability of being in state Si at time t, given a model λ and an observation sequence O:

γt(i) = P(qt = Si | O, λ)
      = αt(i) βt(i) / P(O|λ)
      = αt(i) βt(i) / ∑_{i=1}^{N} αt(i) βt(i)

where ∑_{i=1}^{N} γt(i) = 1.

The most likely state qt at time t is

qt = argmax_{1≤i≤N} [γt(i)], 1 ≤ t ≤ T

⇒ no regard to the probability of occurrence of the sequence of states
⇒ invalid state sequences may appear
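Combining the two passes gives the individually most likely states; a sketch (forward/backward repeated here so the block is self-contained):

```python
def forward(A, B, pi, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def most_likely_states(A, B, pi, obs):
    """gamma_t(i) = alpha_t(i) beta_t(i) / sum_i alpha_t(i) beta_t(i);
    pick argmax_i gamma_t(i) independently for each t."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    N = len(pi)
    states = []
    for t in range(len(obs)):
        gamma = [alpha[t][i] * beta[t][i] for i in range(N)]
        norm = sum(gamma)
        gamma = [g / norm for g in gamma]          # normalizes to 1
        states.append(max(range(N), key=lambda i: gamma[i]))
    return states
```

On the casino model, a run of four tails makes the biased coin the individually most likely state at every step.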
Viterbi algorithm: Introduction
Goal: find the best state sequence Q = q1q2 . . . qT for the corresponding observation O = O1O2 . . . OT.
Let us define

δt(i) = max_{q1q2...qt−1} P[q1q2 . . . qt−1, qt = Si, O1O2 . . . Ot | λ]

which is the highest probability along a single path, at time t, which accounts for the first t observations and ends in state Si, with induction step

δt+1(j) = [max_i δt(i) aij] bj(Ot+1)

where the arguments maximizing δt+1(j) for each t and j are stored in an array ψt(j)
Viterbi algorithm: Algorithm
1 Initialization:
  δ1(i) = πi bi(O1), 1 ≤ i ≤ N
  ψ1(i) = 0
2 Recursion:
  δt(j) = max_{1≤i≤N} [δt−1(i) aij] bj(Ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N
  ψt(j) = argmax_{1≤i≤N} [δt−1(i) aij], 2 ≤ t ≤ T, 1 ≤ j ≤ N
3 Termination:
  P* = max_{1≤i≤N} [δT(i)]
  q*T = argmax_{1≤i≤N} [δT(i)]
4 State sequence backtracking:
  q*t = ψt+1(q*t+1), t = T − 1, T − 2, . . . , 1
Viterbi algorithm: Algorithm (cont'd)
The first 3 steps are quite similar to the forward procedure, with the ∑ replaced by a max
The computation may also be implemented with a lattice
Viterbi algorithm: Example [Oco06]
Let us define:

the alphabet V = {a, b, c}

the model λ = {A, B, π} given by

A = | 0.3 0.3 0.4 |   B = | 0.25 0.35 0.40 |   π = | 0.2 |
    | 0.4 0.4 0.2 |       | 0.10 0.25 0.65 |       | 0.3 |
    | 0.1 0.6 0.3 |       | 0.50 0.45 0.05 |       | 0.5 |
Viterbi algorithm: Example (cont'd)
Goal: find the path q1, q2, q3 which best matches O = a, b, c
Solution:

Initialisation:
δ1(1) = 0.2 × 0.25 = 0.05
δ1(2) = 0.3 × 0.1 = 0.03
δ1(3) = 0.5 × 0.5 = 0.25
ψ1 = [0 0 0]

Recursion:
δ2(1) = max[0.05 × 0.3, 0.03 × 0.4, 0.25 × 0.1] × 0.35 = 0.00875
δ2(2) = max[0.05 × 0.3, 0.03 × 0.4, 0.25 × 0.6] × 0.25 = 0.0375
δ2(3) = max[0.05 × 0.4, 0.03 × 0.2, 0.25 × 0.3] × 0.45 = 0.03375
ψ2 = [3 3 3]
Viterbi algorithm: Example (cont'd)
Solution:

Recursion:
δ3(1) = max[0.00875 × 0.3, 0.0375 × 0.4, 0.03375 × 0.1] × 0.40 = 0.006
δ3(2) = max[0.00875 × 0.3, 0.0375 × 0.4, 0.03375 × 0.6] × 0.65 = 0.0131625
δ3(3) = max[0.00875 × 0.4, 0.0375 × 0.2, 0.03375 × 0.3] × 0.05 = 0.00050625
ψ3 = [2 3 3]

Termination:
P* = max[0.006, 0.0131625, 0.00050625] = 0.0131625
q*T = argmax[0.006, 0.0131625, 0.00050625] = 2
Viterbi algorithm: Example (cont'd)
Solution:

Backtracking:

[Figure: trellis over observations a, b, c with states S1, S2, S3; the surviving path is read off from the ψ arrays, with final values δ3 = (0.006, 0.0131625, 0.00050625)]

⇒ Maximum probability path: q1 = S3, q2 = S3 and q3 = S2
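The worked example can be checked with a short implementation of the four steps; a sketch (0-based indices: states 0, 1, 2 stand for S1, S2, S3 and symbols 0, 1, 2 for a, b, c):

```python
def viterbi(A, B, pi, obs):
    """Best state sequence and its probability (steps 1-4 above)."""
    N, T = len(pi), len(obs)
    delta = [[pi[i] * B[i][obs[0]] for i in range(N)]]   # initialization
    psi = [[0] * N]
    for t in range(1, T):                                # recursion
        drow, prow = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            prow.append(best)
            drow.append(delta[t - 1][best] * A[best][j] * B[j][obs[t]])
        delta.append(drow)
        psi.append(prow)
    p_star = max(delta[T - 1])                           # termination
    q = [delta[T - 1].index(p_star)]
    for t in range(T - 1, 0, -1):                        # backtracking
        q.insert(0, psi[t][q[0]])
    return p_star, q
```

With A, B, π and O = a, b, c as above, it returns P* = 0.0131625 and the path S3, S3, S2.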
Outline
3 Evaluation problem
4 State sequence problem
5 Training problem
  Problem specification
  HMM training
Problem specification
Observation: adjusting the model λ parameters to maximize the probability of the observed sequence is the most intricate of the three problems

Issue: there is no known way to solve the problem analytically

Solution: an iterative procedure that maximizes P(O|λ) by adjusting the λ parameters: the Baum-Welch method
(another alternative: gradient techniques)
Baum-Welch Method
Let us define

ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)

which is the probability of being in state Si at time t and in state Sj at time t + 1.
It can also be defined in terms of the forward/backward variables:

ξt(i, j) = P(qt = Si, qt+1 = Sj, O | λ) / P(O|λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / P(O|λ)
         = αt(i) aij bj(Ot+1) βt+1(j) / ∑_{i=1}^{N} ∑_{j=1}^{N} αt(i) aij bj(Ot+1) βt+1(j)
Baum-Welch Method (cont’d)
Let us define

γt(i) = ∑_{j=1}^{N} ξt(i, j)

which is the probability of being in state Si at time t for a given O and λ.
From this we can derive:

1 the expected number of transitions from Si, from 1 to T − 1, i.e.
  ∑_{t=1}^{T−1} γt(i)
2 the expected number of transitions from Si to Sj, from 1 to T − 1, i.e.
  ∑_{t=1}^{T−1} ξt(i, j)
Baum-Welch Method (cont’d)
From the previous formulas, we can derive re-estimations π', a'ij and b'j(k) of π, A and B:

1 the expected frequency (number of times) in state Si at time t = 1:

  π'i = γ1(i)   s.t. ∑_{i=1}^{N} π'i = 1

2 the ratio (expected number of transitions from Si to Sj) / (expected number of transitions from Si):

  a'ij = ∑_{t=1}^{T−1} ξt(i, j) / ∑_{t=1}^{T−1} γt(i)   s.t. ∑_{j=1}^{N} a'ij = 1, 1 ≤ i ≤ N

3 the ratio (expected number of times in Sj observing symbol vk) / (expected number of times in Sj):

  b'j(k) = ∑_{t=1, Ot=vk}^{T} γt(j) / ∑_{t=1}^{T} γt(j)   s.t. ∑_{k=1}^{M} b'j(k) = 1, 1 ≤ j ≤ N
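One full re-estimation pass can be sketched as follows (forward/backward included so the block stands alone; this is a single-sequence version without the scaling mentioned among the implementation problems):

```python
def forward(A, B, pi, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One re-estimation pass: compute xi and gamma from alpha/beta,
    then apply the three re-estimation formulas above."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[T - 1])
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return new_A, new_B, new_pi
```

Each pass keeps π', A' and B' stochastic and never decreases P(O|λ), which can be checked with the forward procedure.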
Baum-Welch Method (cont’d)
The re-estimated model λ' = {A', B', π'} may imply:

1 P(O|λ') < P(O|λ): model λ is more likely than λ'
2 P(O|λ') = P(O|λ): λ defines a critical point of the likelihood function
3 P(O|λ') > P(O|λ): model λ' is more likely than λ
  1 iteratively use λ' as the new model
    ⇒ the re-estimation procedure will improve the probability of observing O
  2 stop when some limiting point is reached
The final result is called the maximum likelihood estimate of the HMM
Issue: avoid local maxima
Baum-Welch Method (cont’d)
In order to maximize P(O|λ), one can use Baum's auxiliary function:

Q(λ, λ') = ∑_Q P(Q|O, λ) log[P(O, Q|λ')]

Maximizing over λ' leads to increased likelihood, i.e.:

max_{λ'} [Q(λ, λ')] ⇒ P(O|λ') ≥ P(O|λ)

In the end, the likelihood function converges to a critical point
Part III
Case Study
Outline
6 Information extraction
  Context
  Model
  Solution
7 Face recognition
Context [SMR99]
Imagine you have a library of computer science papers at your disposal. What kind of information could you extract from their headers?
title
author names
affiliation
address
notes
. . .
How could we tag such words?
Model
Each element of this list will be considered as a class that we want to extract. Each class is represented by one or more states in the model. Each state emits words from a class-specific distribution.
[Figure: HMM over header classes (start, title, author, affiliation, address, email, date, pubnum, keyword, note, abstract, end) with the learned transition probabilities on the arcs]
We can learn by training:
1 class-specific distribution
2 state transition probabilities
Solution
Label the elements from headers with classes. In order to achieve this goal:

1 treat each word from the header as an observation
2 recover the most likely state sequence with the Viterbi algorithm
3 assign the states to the words as their class tags
Outline
6 Information extraction
7 Face recognition
  Face recognition
  Model
  Solution
Context [NH]
Imagine you have a list of staff members.
An identification card containing information such as the name, the function and the face is associated with each member.
How could you develop a system that automatically recognizes the workers as they enter the company building?
More precisely, how is it possible to recognize elements such as the:
hair
forehead
eyes
nose
mouth
and find matches with the saved information?
Model
As these elements appear in a fixed order, we can define the following HMM model:
[Figure: left-right HMM SH → SF → SE → SN → SM, with self-loops a11 . . . a55 and forward transitions a12, a23, a34, a45]
Where states S mean:
SH : the hair
SF : the forehead
SE : the eyes
SN : the nose
SM : the mouth
Model (cont'd)
Let us define:

W: the image width
H: the image height
L: the segment height
P: the height of the overlap between segments
T: the number of segments constituting the image, given by:

T = (H − L)/(L − P) + 1

which must be chosen very carefully.

Each staff member is represented by an HMM face model in the DB.

[Figure: face image of width W and height H, scanned with overlapping horizontal segments of height L and overlap P]
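The segment count formula can be wrapped in a small helper; a sketch (the example values H = 128, L = 16, P = 12 are my own illustration, not from the slides):

```python
def segment_count(H, L, P):
    """T = (H - L) / (L - P) + 1, the number of overlapping
    horizontal segments covering an image of height H."""
    if L <= P:
        raise ValueError("segment height L must exceed the overlap P")
    if (H - L) % (L - P) != 0:
        raise ValueError("parameters do not tile the image height exactly")
    return (H - L) // (L - P) + 1
```

For instance, segment_count(128, 16, 12) gives (128 − 16)/(16 − 12) + 1 = 29 segments.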
Solution: Training
Face segments are converted into 2D-DCT coefficients, which form the observation vector O.
Discrete Cosine Transform: transforms an image from the spatial domain to the frequency domain.
Each HMM is trained with 5 different instances of the same face.
[Figure: training flow: training data → block extraction → feature extraction → model initialization → model re-estimation until convergence → model parameters]
Solution: Recognition
O is obtained as in the training phase.
The matching face f is such that:

λf = argmax_{1≤n≤N} P(O|λn)
[Figure: recognition flow: test image → block extraction → feature extraction → probability computation P(O|λ1), . . . , P(O|λN) → maximum selection → recognized model]
Solution: Results
Horizontal lines show classes identified with the Viterbi algorithm
Crossed images represent incorrect classifications
Part IV
Conclusion
Outline
8 Summary
9 References
Summary
We have seen that HMMs:

strongly rely on statistical theory
can be used in many fields
the choice of model highly influences the results
initialization and training are not an easy business
training is unsupervised
Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, volume 77, pages 267–296, San Francisco, CA, USA, 1990. Morgan Kaufmann Publishers Inc.

Kristie Seymore, Andrew McCallum, and Roni Rosenfeld. Learning hidden Markov model structure for information extraction. In AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.

A. Nefian and M. Hayes. Hidden Markov models for face recognition. In ICASSP '98, pages 2721–2724, 1998.

Jeff Bilmes. What HMM can do. Technical Report UWEETR-2002-0003, Dept. of EE, University of Washington, January 2002.
Daniel Ocone. Hidden Markov models. In Discrete and Probabilistic Models in Biology, chapter 8, 2006.

Thomas G. Dietterich. Machine learning for sequential data: A review. In Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pages 15–30, London, UK, 2002. Springer-Verlag.
Thank you for your attention.
Any questions?