A Tutorial on Hidden Markov Models

Transcript
  • Slide 1/22

    Introduction   Forward-Backward Procedure   Viterbi Algorithm   Baum-Welch Reestimation   Extensions

    A Tutorial on Hidden Markov Models
    by Lawrence R. Rabiner

    in Readings in speech recognition (1990)

    Marcin Marszałek

    Visual Geometry Group

    16 February 2009


    Figure:  Andrey Markov



  • Slide 3/22


    Signals and signal models

    Real-world processes produce signals, i.e., observable outputs:
    discrete (from a codebook) vs continuous
    stationary (with constant statistical properties) vs nonstationary
    pure vs corrupted (by noise)

    Signal models provide a basis for:
    signal analysis, e.g., simulation
    signal processing, e.g., noise removal
    signal recognition, e.g., identification

    Signal models can be

    deterministic – exploit some known properties of a signal

    statistical – characterize statistical properties of a signal

    Statistical signal models

    Gaussian processes
    Poisson processes
    Markov processes
    Hidden Markov processes


    Assumption

    Signal can be well characterized as a parametric random process, and the parameters of the stochastic process can be determined in a precise, well-defined manner


  • Slide 4/22


    Discrete (observable) Markov model

    Figure:  A Markov chain with 5 states and selected transitions

    N states: S_1, S_2, ..., S_N

    At each time instant t = 1, 2, ..., T the system changes (makes a transition) to state q_t


  • Slide 5/22


    Discrete (observable) Markov model

    For the special case of a first-order Markov chain

    P(q_t = S_j | q_{t-1} = S_i, q_{t-2} = S_k, ...) = P(q_t = S_j | q_{t-1} = S_i)

    Furthermore, we consider only processes where the right-hand side is time-independent, i.e., constant state transition probabilities

    a_{ij} = P(q_t = S_j | q_{t-1} = S_i),   1 ≤ i, j ≤ N

    where

    a_{ij} ≥ 0   and   \sum_{j=1}^{N} a_{ij} = 1
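    As a concrete illustration (not from the original slides), a valid row-stochastic transition matrix for N = 3 states could look like this in Python/NumPy; the values are made up:

        import numpy as np

        # Hypothetical 3-state transition matrix: A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
        A = np.array([[0.7, 0.2, 0.1],
                      [0.3, 0.5, 0.2],
                      [0.0, 0.4, 0.6]])

        # a_ij >= 0 and every row sums to 1
        assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)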


  • Slide 6/22


    Discrete hidden Markov model (DHMM)

    Figure:  Discrete HMM with 3 states and 4 possible outputs

    An observation is a probabilistic function of the state, i.e., an HMM is a doubly embedded stochastic process

    A DHMM is characterized by:
    N states S_j and M distinct observation symbols v_k (alphabet size)
    State transition probability distribution A
    Observation symbol probability distribution B
    Initial state distribution π


  • Slide 7/22


    Discrete hidden Markov model (DHMM)

    We define the DHMM as λ = (A, B, π):

    A = {a_{ij}},   a_{ij} = P(q_{t+1} = S_j | q_t = S_i),   1 ≤ i, j ≤ N
    B = {b_{ik}},   b_{ik} = P(O_t = v_k | q_t = S_i),   1 ≤ i ≤ N, 1 ≤ k ≤ M
    π = {π_i},   π_i = P(q_1 = S_i),   1 ≤ i ≤ N

    This allows us to generate an observation sequence O = O_1 O_2 ... O_T:

    1. Set t = 1, choose an initial state q_1 = S_i according to the initial state distribution π
    2. Choose O_t = v_k according to the symbol probability distribution in state S_i, i.e., b_{ik}
    3. Transit to a new state q_{t+1} = S_j according to the state transition probability distribution for state S_i, i.e., a_{ij}
    4. Set t = t + 1; if t ≤ T return to step 2, otherwise terminate
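    A minimal Python/NumPy sketch of this generation procedure, assuming A, B (with B[i, k] = b_{ik}) and π are given as arrays and observations are integer symbol indices; the function and variable names are illustrative, not from the slides:

        import numpy as np

        def generate(A, B, pi, T, seed=0):
            """Sample a state path and an observation sequence of length T from lambda = (A, B, pi)."""
            rng = np.random.default_rng(seed)
            N, M = B.shape
            q = rng.choice(N, p=pi)                  # step 1: draw the initial state from pi
            states, obs = [], []
            for _ in range(T):
                states.append(q)
                obs.append(rng.choice(M, p=B[q]))    # step 2: emit a symbol according to b_{q k}
                q = rng.choice(N, p=A[q])            # step 3: transit according to a_{q j}
            return states, obs                       # step 4 is the loop counter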

  • Slide 8/22


    Three basic problems for HMMs

    Evaluation   Given the observation sequence O = O_1 O_2 ... O_T and a model λ = (A, B, π), how do we efficiently compute P(O|λ), i.e., the probability of the observation sequence given the model?

    Recognition   Given the observation sequence O = O_1 O_2 ... O_T and a model λ = (A, B, π), how do we choose a corresponding state sequence Q = q_1 q_2 ... q_T which is optimal in some sense, i.e., best explains the observations?

    Training   Given the observation sequence O = O_1 O_2 ... O_T, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?


  • Slide 9/22


    Brute force solution to the evaluation problem

    We need P(O|λ), i.e., the probability of the observation sequence O = O_1 O_2 ... O_T given the model λ

    So we can enumerate every possible state sequence Q = q_1 q_2 ... q_T

    For a sample sequence Q

    P(O | Q, λ) = \prod_{t=1}^{T} P(O_t | q_t, λ) = \prod_{t=1}^{T} b_{q_t O_t}

    The probability of such a state sequence Q is

    P(Q | λ) = P(q_1) \prod_{t=2}^{T} P(q_t | q_{t-1}) = π_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}


  • Slide 10/22


    Brute force solution to the evaluation problem

    Therefore the joint probability

    P(O, Q | λ) = P(Q | λ) P(O | Q, λ) = π_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t} \prod_{t=1}^{T} b_{q_t O_t}

    By considering all possible state sequences

    P(O | λ) = \sum_{q_1, q_2, ..., q_T} π_{q_1} b_{q_1 O_1} \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t O_t}

    Problem: order of 2T N^T calculations

    N^T possible state sequences
    about 2T calculations for each sequence
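    For concreteness, the naive enumeration could be sketched as follows (illustrative Python, same array conventions as the earlier sketch); it is only feasible for tiny N and T, which is exactly why the forward procedure is needed:

        import itertools
        import numpy as np

        def brute_force_evaluation(A, B, pi, O):
            """P(O | lambda) by summing over all N^T state sequences."""
            N, T = A.shape[0], len(O)
            total = 0.0
            for Q in itertools.product(range(N), repeat=T):
                p = pi[Q[0]] * B[Q[0], O[0]]                # pi_{q1} b_{q1 O1}
                for t in range(1, T):
                    p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]  # a_{q_{t-1} q_t} b_{q_t O_t}
                total += p
            return total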



  • Slide 12/22


    Forward procedure

    Figure: Operations for computing the forward variable α_j(t+1)

    Figure: Computing α_j(t) in terms of a lattice
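    The forward variable is commonly defined as α_i(t) = P(O_1 O_2 ... O_t, q_t = S_i | λ), which is consistent with the γ formula used later in these slides. A minimal NumPy sketch of its inductive computation, under the same array conventions as above (this definition and the code are an assumption, not taken verbatim from the slides):

        import numpy as np

        def forward(A, B, pi, O):
            """alpha[t, i] = P(O_1 ... O_{t+1}, q_{t+1} = S_i | lambda), with 0-based t."""
            T, N = len(O), A.shape[0]
            alpha = np.zeros((T, N))
            alpha[0] = pi * B[:, O[0]]                      # initialization: pi_i b_{i O_1}
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # induction: (sum_i alpha_i(t) a_ij) b_{j O_{t+1}}
            return alpha                                    # note: P(O | lambda) = alpha[-1].sum()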


  • Slide 13/22

    Backward procedure

    Figure: Operations for computing the backward variable β_i(t)

    We define a backward variable β_i(t) as the probability of the partial observation sequence after time t, given state S_i at time t

    β_i(t) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ)

    This can be computed inductively as well

    β_i(T) = 1,   1 ≤ i ≤ N

    β_i(t-1) = \sum_{j=1}^{N} a_{ij} b_{j O_t} β_j(t),   2 ≤ t ≤ T
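    A matching NumPy sketch for the backward recursion (same conventions as the forward sketch; the loop uses the equivalent form β_i(t) = \sum_j a_{ij} b_{j O_{t+1}} β_j(t+1)):

        import numpy as np

        def backward(A, B, O):
            """beta[t, i] = P(O_{t+2} ... O_T | q_{t+1} = S_i, lambda), with 0-based t."""
            T, N = len(O), A.shape[0]
            beta = np.ones((T, N))                            # termination: beta_i(T) = 1
            for t in range(T - 2, -1, -1):
                beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])  # sum_j a_ij b_{j O_{t+1}} beta_j(t+1)
            return beta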


  • Slide 14/22

    Uncovering the hidden state sequence

    Unlike for evaluation, there is no single “optimal” sequence

    Choose states which are individually most likely (maximizes the number of correct states)
    Find the single best state sequence (guarantees that the uncovered sequence is valid)

    The first choice means finding argmax_i γ_i(t) for each t, where

    γ_i(t) = P(q_t = S_i | O, λ)

    In terms of forward and backward variables

    γ_i(t) = P(O_1 ... O_t, q_t = S_i | λ) P(O_{t+1} ... O_T | q_t = S_i, λ) / P(O | λ)

    γ_i(t) = α_i(t) β_i(t) / \sum_{j=1}^{N} α_j(t) β_j(t)
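    Given the α and β arrays from the sketches above, γ and the individually most likely states follow directly (illustrative helper, not from the slides):

        import numpy as np

        def state_posteriors(alpha, beta):
            """gamma[t, i] = alpha_i(t) beta_i(t) / sum_j alpha_j(t) beta_j(t)."""
            gamma = alpha * beta
            return gamma / gamma.sum(axis=1, keepdims=True)

        # Individually most likely state at each time step: state_posteriors(alpha, beta).argmax(axis=1)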


  • Slide 15/22

    Viterbi algorithm

    Finding the best single sequence means computing argmax_Q P(Q | O, λ), equivalent to argmax_Q P(Q, O | λ)

    The Viterbi algorithm (dynamic programming) defines δ_j(t), i.e., the highest probability of a single path of length t which accounts for the observations and ends in state S_j

    δ_j(t) = \max_{q_1, q_2, ..., q_{t-1}} P(q_1 q_2 ... q_t = j, O_1 O_2 ... O_t | λ)

    By induction

    δ_j(1) = π_j b_{j O_1},   1 ≤ j ≤ N
    δ_j(t+1) = [\max_i δ_i(t) a_{ij}] b_{j O_{t+1}},   1 ≤ t ≤ T - 1

    With backtracking (keeping the maximizing argument for each t and j) we find the optimal solution
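    A compact NumPy sketch of the δ recursion with backpointers (illustrative; same array conventions as the earlier sketches):

        import numpy as np

        def viterbi(A, B, pi, O):
            """Most likely state path argmax_Q P(Q, O | lambda) via the delta recursion with backtracking."""
            T, N = len(O), A.shape[0]
            delta = np.zeros((T, N))
            psi = np.zeros((T, N), dtype=int)               # psi[t, j] = argmax_i delta_i(t) a_ij
            delta[0] = pi * B[:, O[0]]                      # delta_j(1) = pi_j b_{j O_1}
            for t in range(1, T):
                scores = delta[t - 1][:, None] * A          # scores[i, j] = delta_i(t) a_ij
                psi[t] = scores.argmax(axis=0)
                delta[t] = scores.max(axis=0) * B[:, O[t]]  # delta_j(t+1) = [max_i ...] b_{j O_{t+1}}
            path = [int(delta[-1].argmax())]
            for t in range(T - 1, 0, -1):                   # backtrack through the stored argmaxes
                path.append(int(psi[t, path[-1]]))
            return path[::-1]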


  • Slide 16/22

    Backtracking

    Figure: Illustration of the backtracking procedure © G.W. Pulford


  • Slide 17/22

    Estimation of HMM parameters

    There is no known way to analytically solve for the model which maximizes the probability of the observation sequence

    We can choose λ = (A, B, π) which locally maximizes P(O|λ):
    gradient techniques
    Baum-Welch reestimation (equivalent to EM)

    We need to define ξ_ij(t), i.e., the probability of being in state S_i at time t and in state S_j at time t + 1

    ξ_ij(t) = P(q_t = S_i, q_{t+1} = S_j | O, λ)

    ξ_ij(t) = α_i(t) a_{ij} b_{j O_{t+1}} β_j(t+1) / P(O | λ)
            = α_i(t) a_{ij} b_{j O_{t+1}} β_j(t+1) / \sum_{i=1}^{N} \sum_{j=1}^{N} α_i(t) a_{ij} b_{j O_{t+1}} β_j(t+1)
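    In code, ξ can be assembled from the α and β arrays produced by the earlier sketches (illustrative helper; the normalization per time step plays the role of dividing by P(O|λ)):

        import numpy as np

        def pair_posteriors(alpha, beta, A, B, O):
            """xi[t, i, j] = alpha_i(t) a_ij b_{j O_{t+1}} beta_j(t+1), normalized by P(O | lambda)."""
            T, N = len(O), A.shape[0]
            x = np.zeros((T - 1, N, N))
            for t in range(T - 1):
                x[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
                x[t] /= x[t].sum()                      # denominator equals P(O | lambda)
            return x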


  • Slide 18/22

    Estimation of HMM parameters

    Figure: Operations for computing ξ_ij(t)

    Recall that γ_i(t) is the probability of being in state S_i at time t, hence

    γ_i(t) = \sum_{j=1}^{N} ξ_ij(t)

    Now if we sum over the time index t:

    \sum_{t=1}^{T-1} γ_i(t) = expected number of times that S_i is visited
                            = expected number of transitions from state S_i
    \sum_{t=1}^{T-1} ξ_ij(t) = expected number of transitions from S_i to S_j


  • Slide 19/22

    Baum-Welch Reestimation

    Reestimation formulas

    π̄_i = γ_i(1)

    ā_ij = \sum_{t=1}^{T-1} ξ_ij(t) / \sum_{t=1}^{T-1} γ_i(t)

    b̄_jk = \sum_{t: O_t = v_k} γ_j(t) / \sum_{t=1}^{T} γ_j(t)

    Baum et al. proved that if the current model is λ = (A, B, π) and we use the above to compute λ̄ = (Ā, B̄, π̄), then either
    λ̄ = λ, i.e., we are at a critical point of the likelihood function, or
    P(O|λ̄) > P(O|λ), i.e., model λ̄ is more likely

    If we iteratively reestimate the parameters we obtain a maximum likelihood estimate of the HMM

    Unfortunately this finds a local maximum and the surface can be very complex
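    Putting the pieces together, one reestimation step for a single observation sequence could be sketched as follows, using the γ and ξ arrays from the sketches above; M is the alphabet size, and the function name is illustrative:

        import numpy as np

        def baum_welch_step(gamma, xi, O, M):
            """One reestimation step lambda -> lambda_bar from gamma[t, i] and xi[t, i, j]."""
            pi_bar = gamma[0]                                          # pi_bar_i = gamma_i(1)
            A_bar = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # sum_t xi_ij(t) / sum_t gamma_i(t)
            O = np.asarray(O)
            B_bar = np.stack([gamma[O == k].sum(axis=0) for k in range(M)], axis=1)
            B_bar /= gamma.sum(axis=0)[:, None]                        # sum_{t: O_t = v_k} gamma_j(t) / sum_t gamma_j(t)
            return A_bar, B_bar, pi_bar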


  • Slide 20/22

    Non-ergodic HMMs

    Until now we have only considered ergodic (fully connected) HMMs:
    every state can be reached from any state in a finite number of steps

    Figure:  Ergodic HMM

    The left-right (Bakis) model is good for speech recognition:
    as time increases, the state index increases or stays the same
    can be extended to parallel left-right models

    Figure:  Left-right HMM

    Figure:  Parallel HMM
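    As a small illustration (not from the slides), the left-right constraint simply forbids transitions to lower-indexed states, e.g. an upper-triangular transition matrix with made-up values:

        import numpy as np

        # Hypothetical 4-state left-right (Bakis) transition matrix: no transitions backwards
        A_left_right = np.array([[0.6, 0.3, 0.1, 0.0],
                                 [0.0, 0.7, 0.2, 0.1],
                                 [0.0, 0.0, 0.8, 0.2],
                                 [0.0, 0.0, 0.0, 1.0]])
        assert np.allclose(A_left_right.sum(axis=1), 1.0)
        assert np.allclose(np.tril(A_left_right, -1), 0.0)   # strictly lower triangle is zero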


  • Slide 21/22

    Gaussian HMM (GMMM)

    HMMs can be used with continuous observation densities. We can model such densities with Gaussian mixtures:

    b_j(O) = \sum_{m=1}^{M} c_{jm} N(O; µ_{jm}, U_{jm})

    Then the reestimation formulas are still simple
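    For concreteness, such a mixture emission density could be evaluated as follows (illustrative sketch using SciPy; c_j, mu_j and U_j hold the mixture weights, means and covariances for one state j, names not from the slides):

        import numpy as np
        from scipy.stats import multivariate_normal

        def mixture_emission(O_t, c_j, mu_j, U_j):
            """b_j(O_t) = sum_m c_jm N(O_t; mu_jm, U_jm) for a single state j."""
            return sum(c * multivariate_normal.pdf(O_t, mean=mu, cov=U)
                       for c, mu, U in zip(c_j, mu_j, U_j))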


  • Slide 22/22

    More fun

    Autoregressive HMMs

    State Duration Density HMMs

    Discriminatively trained HMMs

    maximum mutual information instead of maximum likelihood

    HMMs in a similarity measure

    Conditional Random Fields can loosely be understood as a generalization of HMMs:
    constant transition probabilities are replaced with arbitrary functions that vary across the positions in the sequence of hidden states

    Figure: Random Oxford fields © R. Tourtelot
