
Hidden Markov Models

Steve Renals

Automatic Speech Recognition — ASR Lecture 5, 2 February 2009


Overview

Fundamentals of HMMs

Today:
- Statistical Speech Recognition
- HMM Acoustic Models
- Forward algorithm
- Viterbi algorithm

Thursday:
- Forward-backward training
- Extension to mixture models


Variability in speech recognition

Several sources of variation:

Size: Number of word types in vocabulary, perplexity

Style: Continuously spoken or isolated? Planned monologue or spontaneous conversation?

Speaker: Tuned for a particular speaker, or speaker-independent? Adaptation to speaker characteristics and accent

Acoustic environment: Noise, competing speakers, channel conditions (microphone, phone line, room acoustics)


Linguistic Knowledge or Machine Learning

Intense effort needed to derive and encode linguistic rules that cover all the language

Very difficult to take account of the variability of spoken language with such approaches

Data-driven machine learning: Construct simple models of speech which can be learned from large amounts of data (thousands of hours of speech recordings)


Statistical Speech Recognition


Fundamental Equation of Statistical Speech Recognition

If X is the sequence of acoustic feature vectors (observations) and W denotes a word sequence, the most likely word sequence W* is given by

W* = argmax_W P(W | X)

Applying Bayes' Theorem:

P(W | X) = p(X | W) P(W) / p(X) ∝ p(X | W) P(W)

W* = argmax_W p(X | W) P(W)

where p(X | W) is the acoustic model and P(W) is the language model.
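As a toy illustration of this decomposition (not from the original slides), the decoder's job reduces to maximising a sum of log-probabilities over candidate word sequences; the hypotheses and their scores below are invented for illustration:

```python
# Hypothetical per-hypothesis scores: log p(X | W) from the acoustic model
# and log P(W) from the language model (illustrative numbers only).
hypotheses = {
    "recognise speech": {"acoustic": -120.5, "lm": -4.2},
    "wreck a nice beach": {"acoustic": -118.9, "lm": -9.7},
}

# W* = argmax_W [log p(X | W) + log P(W)]; p(X) is constant over W and drops out.
best = max(hypotheses, key=lambda w: hypotheses[w]["acoustic"] + hypotheses[w]["lm"])
print(best)  # "recognise speech": the LM penalty outweighs the small acoustic gain
```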


Statistical speech recognition

Statistical models offer a statistical "guarantee": see the licence conditions of the best known automatic dictation system, for example:

"Licensee understands that speech recognition is a statistical process and that recognition errors are inherent in the process. Licensee acknowledges that it is licensee's responsibility to correct recognition errors before using the results of the recognition."


Hidden Markov Models


HMM Acoustic Model

[Figure: graphical model with hidden states s(t−1), s(t), s(t+1) and corresponding observed acoustic features x(t−1), x(t), x(t+1)]

Hidden state s and observed acoustic features x

p(X | W) = ∑_Q p(X | Q) P(Q | W)

which is often approximated by the most probable pronunciation sequence:

p(X | W) ≈ max_Q p(X | Q) P(Q | W)

Q is a sequence of pronunciations


Continuous Density HMM

[Figure: left-to-right HMM with initial state sI, emitting states s1, s2, s3 and final state sE; arcs labelled with transition probabilities P(s1 | sI), P(s1 | s1), P(s2 | s1), …, P(sE | s3), and each emitting state with its output density p(x | sj)]

[Figure: the same model with an observation sequence x1 … x6 aligned to the states]

Probabilistic finite state automaton

Parameters λ:

Transition probabilities: akj = P(sj | sk)

Output probability density function: bj(x) = p(x | sj)
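To make these parameters concrete, here is a minimal NumPy sketch of a 3-state left-to-right model of this shape (all values illustrative, not from the slides; entry and exit transitions are held in separate vectors, a convention the later sketches reuse):

```python
import numpy as np

N, D = 3, 2  # number of emitting states, feature dimension

# Transition matrix over emitting states: a[k, j] = P(s_j | s_k), left-to-right
# with self-loops; any probability mass missing from a row goes to the exit.
a = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 0.8]])
a_entry = np.array([1.0, 0.0, 0.0])   # P(s_j | s_I)
a_exit = np.array([0.0, 0.0, 0.2])    # P(s_E | s_j)

# Gaussian output parameters per state: mean vector and covariance matrix.
means = np.zeros((N, D))
covs = np.stack([np.eye(D)] * N)
```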


HMM Assumptions

[Figure: graphical model with hidden states s(t−1), s(t), s(t+1) and observed acoustic features x(t−1), x(t), x(t+1)]

1. Observation independence: An acoustic observation x is conditionally independent of all other observations given the state that generated it

2. Markov process: A state is conditionally independent of all other states given the previous state


Output distribution

[Figure: the left-to-right HMM from above, with each emitting state sj carrying its output density p(x | sj)]

Single multivariate Gaussian with mean µj, covariance matrix Σj:

bj(x) = p(x | sj) = N(x; µj, Σj)

M-component Gaussian mixture model:

bj(x) = p(x | sj) = ∑_{m=1}^{M} cjm N(x; µjm, Σjm)
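A minimal sketch of evaluating these two output densities with NumPy/SciPy (the parameter values are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.array([0.5, -1.0])

# Single multivariate Gaussian: b_j(x) = N(x; mu_j, Sigma_j)
mu_j, sigma_j = np.zeros(2), np.eye(2)
b_gauss = multivariate_normal.pdf(x, mean=mu_j, cov=sigma_j)

# M-component GMM: b_j(x) = sum_m c_jm N(x; mu_jm, Sigma_jm)
c = np.array([0.3, 0.7])                      # mixture weights, sum to 1
mus = np.array([[0.0, 0.0], [1.0, -1.0]])
b_gmm = sum(c[m] * multivariate_normal.pdf(x, mean=mus[m], cov=np.eye(2))
            for m in range(len(c)))
print(b_gauss, b_gmm)
```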


The three problems of HMMs

Working with HMMs requires the solution of three problems:

1. Likelihood: Determine the overall likelihood of an observation sequence X = (x1, …, xt, …, xT) being generated by an HMM

2. Decoding: Given an observation sequence and an HMM, determine the most probable hidden state sequence

3. Training: Given an observation sequence and an HMM, learn the best HMM parameters λ = {{ajk}, {bj()}}



Recursive algorithms on HMMs

Visualize the problem as a state-time trellis

[Figure: state-time trellis with states i, j, k replicated at times t−1, t, t+1, and arcs between states at successive times]


1. Likelihood: The Forward algorithm

Goal: determine p(X | λ)

Sum over all possible state sequences s1 s2 … sT that could result in the observation sequence X

Rather than enumerating each sequence, compute the probabilities recursively (exploiting the Markov assumption)

Forward probability, αt(sj): the probability of observing the observation sequence x1 … xt and being in state sj at time t:

αt(sj) = p(x1, …, xt, S(t) = sj | λ)


1. Likelihood: The Forward recursion

Initialization:

α0(sI) = 1
α0(sj) = 0 if sj ≠ sI

Recursion:

αt(sj) = ∑_{i=1}^{N} αt−1(si) aij bj(xt)

Termination:

p(X | λ) = αT(sE) = ∑_{i=1}^{N} αT(si) aiE
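A minimal NumPy sketch of this recursion (an illustration, not from the original slides). It assumes the output probabilities are precomputed as b[j, t] = bj(xt), with entry and exit transitions in separate vectors as in the parameter sketch above; probabilities are kept in the linear domain for clarity (see the log-domain note near the end):

```python
import numpy as np

def forward(a, a_entry, a_exit, b):
    """Forward probabilities and total likelihood p(X | lambda).

    a: (N, N) transition probabilities, a[i, j] = P(s_j | s_i)
    a_entry: (N,) entry probabilities P(s_j | s_I)
    a_exit: (N,) exit probabilities P(s_E | s_i)
    b: (N, T) precomputed output probabilities, b[j, t] = b_j(x_t)
    """
    N, T = b.shape
    alpha = np.zeros((N, T))
    alpha[:, 0] = a_entry * b[:, 0]          # initialisation: first frame absorbed
    for t in range(1, T):                    # recursion over the trellis
        alpha[:, t] = (alpha[:, t - 1] @ a) * b[:, t]
    p_X = alpha[:, T - 1] @ a_exit           # termination: alpha_T(s_E)
    return alpha, p_X
```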


Interim Summary

Framework for statistical speech recognition

HMM acoustic models

HMM likelihood computation: the Forward algorithm

Reading

Jurafsky and Martin (2008). Speech and Language Processing (2nd ed.): sections 6.1–6.5; 9.2; 9.4.
Gales and Young (2007). "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends in Signal Processing, 1 (3), 195–304: section 2.2.
Rabiner and Juang (1989). "An introduction to hidden Markov models", IEEE ASSP Magazine, 3 (1), 4–16.


Hidden Markov Models (part 2)

Steve Renals

Automatic Speech Recognition — ASR Lecture 6, 5 February 2009


Overview

Fundamentals of HMMs

Previously:
- Statistical Speech Recognition
- HMM Acoustic Models
- Forward algorithm

Today:
- Viterbi algorithm
- Forward-backward training
- Extension to mixture models




1. Likelihood: Forward Recursion

αt(sj) = p(x1, …, xt, S(t) = sj | λ)

[Figure: trellis fragment showing the forward recursion: αt−1(si), αt−1(sj), αt−1(sk) at time t−1 are combined through the transition probabilities and the output probability at time t to give αt(si)]


Viterbi approximation

Instead of summing over all possible state sequences, just consider the most likely

Achieve this by changing the summation to a maximisation in the recursion:

Vt(sj) = max_i Vt−1(si) aij bj(xt)

Changing the recursion in this way gives the likelihood of the most probable path

We need to keep track of the states that make up this path by keeping a sequence of backpointers to enable a Viterbi backtrace: the backpointer for each state at each time indicates the previous state on the most probable path


Viterbi Recursion

Likelihood of the most probable path

[Figure: trellis fragment showing Vt(si) computed as the maximum over Vt−1(si), Vt−1(sj), Vt−1(sk), each weighted by its transition probability and by the output probability at time t]


Viterbi Recursion

Backpointers to the previous state on the most probable path

[Figure: trellis fragment showing the backpointer btt(si) = sj stored when the maximising predecessor of Vt(si) is sj]


2. Decoding: The Viterbi algorithm

Initialization:

V0(sI) = 1
V0(sj) = 0 if sj ≠ sI
bt0(sj) = 0

Recursion:

Vt(sj) = max_{i=1}^{N} Vt−1(si) aij bj(xt)
btt(sj) = argmax_{i=1}^{N} Vt−1(si) aij bj(xt)

Termination:

P* = VT(sE) = max_{i=1}^{N} VT(si) aiE
s*T = btT(sE) = argmax_{i=1}^{N} VT(si) aiE
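A minimal NumPy sketch of the full algorithm, under the same assumed conventions as the forward sketch above (precomputed b[j, t], entry/exit vectors, linear domain for clarity):

```python
import numpy as np

def viterbi(a, a_entry, a_exit, b):
    """Most probable state path and its likelihood."""
    N, T = b.shape
    V = np.zeros((N, T))
    back = np.zeros((N, T), dtype=int)         # backpointers bt_t(s_j)
    V[:, 0] = a_entry * b[:, 0]
    for t in range(1, T):
        scores = V[:, t - 1][:, None] * a      # scores[i, j] = V_{t-1}(s_i) a_ij
        back[:, t] = np.argmax(scores, axis=0)
        V[:, t] = scores[back[:, t], np.arange(N)] * b[:, t]
    last = int(np.argmax(V[:, T - 1] * a_exit))
    p_star = V[last, T - 1] * a_exit[last]     # P* = V_T(s_E)
    path = [last]
    for t in range(T - 1, 0, -1):              # Viterbi backtrace
        path.append(int(back[path[-1], t]))
    return p_star, path[::-1]
```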


Viterbi Backtrace

Backtrace to find the state sequence of the most probable path

[Figure: trellis fragment showing the backtrace following the stored backpointers, e.g. btt(si) = sj, backwards from the final state]


3. Training: Forward-Backward algorithm

Goal: Efficiently estimate the parameters of an HMM λ from an observation sequence

Assume single Gaussian output probability distribution:

bj(x) = p(x | sj) = N(x; µj, Σj)

Parameters λ:

Transition probabilities aij, with ∑j aij = 1

Gaussian parameters for state sj: mean vector µj; covariance matrix Σj


Viterbi Training

If we knew the state-time alignment, then each observation feature vector could be assigned to a specific state

A state-time alignment can be obtained using the most probable path obtained by Viterbi decoding

Maximum likelihood estimate of aij, if C(si → sj) is the count of transitions from si to sj:

aij = C(si → sj) / ∑k C(si → sk)

Likewise, if Zj is the set of observed acoustic feature vectors assigned to state sj, we can use the standard maximum likelihood estimates for the mean and the covariance:

µj = ∑_{x∈Zj} x / |Zj|

Σj = ∑_{x∈Zj} (x − µj)(x − µj)^T / |Zj|
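A sketch of one such update in NumPy, assuming a hard alignment has already been obtained by Viterbi decoding and that every state is visited at least once:

```python
import numpy as np

def viterbi_reestimate(align, X, N):
    """One Viterbi-training update from a hard state-time alignment.

    align: (T,) int array, align[t] = state assigned to frame t
    X: (T, D) acoustic feature vectors; N: number of emitting states
    """
    T, D = X.shape
    counts = np.zeros((N, N))
    for t in range(T - 1):                           # C(s_i -> s_j)
        counts[align[t], align[t + 1]] += 1
    a = counts / counts.sum(axis=1, keepdims=True)   # normalise over destinations

    means = np.zeros((N, D))
    covs = np.zeros((N, D, D))
    for j in range(N):
        Z = X[align == j]                            # frames assigned to state j
        means[j] = Z.mean(axis=0)
        covs[j] = (Z - means[j]).T @ (Z - means[j]) / len(Z)
    return a, means, covs
```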


EM Algorithm

Viterbi training is an approximation: we would like to consider all possible paths

In this case, rather than having a hard state-time alignment, we estimate a probability

State occupation probability: the probability γt(sj) of occupying state sj at time t given the sequence of observations

We can use this for an iterative algorithm for HMM training: the EM algorithm

Each iteration has two steps:

E-step: estimate the state occupation probabilities (Expectation)

M-step: re-estimate the HMM parameters based on the estimated state occupation probabilities (Maximisation)


Backward probabilities

To estimate the state occupation probabilities it is useful to define (recursively) another set of probabilities, the Backward probabilities:

βt(sj) = p(xt+1, xt+2, …, xT | S(t) = sj, λ)

The probability of the future observations given that the HMM is in state sj at time t

These can be computed recursively (going backwards in time)

Initialisation:

βT(si) = aiE

Recursion:

βt(si) = ∑_{j=1}^{N} aij bj(xt+1) βt+1(sj)

Termination:

p(X | λ) = β0(sI) = ∑_{j=1}^{N} aIj bj(x1) β1(sj) = αT(sE)
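A NumPy sketch of the backward pass, under the same assumed conventions as the forward sketch above:

```python
import numpy as np

def backward(a, a_entry, a_exit, b):
    """Backward probabilities beta[j, t] = p(x_{t+1}, ..., x_T | S(t) = s_j)."""
    N, T = b.shape
    beta = np.zeros((N, T))
    beta[:, T - 1] = a_exit                    # initialisation: beta_T(s_i) = a_iE
    for t in range(T - 2, -1, -1):             # recursion, backwards in time
        beta[:, t] = a @ (b[:, t + 1] * beta[:, t + 1])
    # termination: p(X | lambda) = sum_j a_Ij b_j(x_1) beta_1(s_j)
    return beta, a_entry @ (b[:, 0] * beta[:, 0])
```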


Backward Recursion

βt(sj) = p(xt+1, xt+2, …, xT | S(t) = sj, λ)

[Figure: trellis fragment showing βt(si) computed from βt+1(si), βt+1(sj), βt+1(sk) via the transition probabilities aii, aij, aik and the output probabilities bi(xt+1), bj(xt+1), bk(xt+1)]


State Occupation Probability

The state occupation probability γt(sj) is the probability of occupying state sj at time t given the sequence of observations

Express it in terms of the forward and backward probabilities:

γt(sj) = P(S(t) = sj | X, λ) = (1 / αT(sE)) αt(sj) βt(sj)

recalling that p(X | λ) = αT(sE). Since

αt(sj) βt(sj) = p(x1, …, xt, S(t) = sj | λ) p(xt+1, xt+2, …, xT | S(t) = sj, λ)
= p(x1, …, xt, xt+1, xt+2, …, xT, S(t) = sj | λ)
= p(X, S(t) = sj | λ)

we have

P(S(t) = sj | X, λ) = p(X, S(t) = sj | λ) / p(X | λ)
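Given the alpha and beta arrays from the two sketches above, the state occupation probabilities are essentially one line of NumPy (same assumed conventions):

```python
def occupation_probs(alpha, beta, p_X):
    """gamma[j, t] = P(S(t) = s_j | X, lambda).

    alpha, beta: (N, T) arrays from the forward and backward sketches;
    p_X: total likelihood alpha_T(s_E) from the forward termination.
    """
    return alpha * beta / p_X        # each column (time step) sums to 1
```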


Re-estimation of Gaussian parameters

The sum of state occupation probabilities through time for a state may be regarded as a "soft" count

We can use this "soft" alignment to re-estimate the HMM parameters:

µj = ∑_{t=1}^{T} γt(sj) xt / ∑_{t=1}^{T} γt(sj)

Σj = ∑_{t=1}^{T} γt(sj) (xt − µj)(xt − µj)^T / ∑_{t=1}^{T} γt(sj)
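A NumPy sketch of this soft re-estimation (gamma as computed above; an illustration, not the slides' own code):

```python
import numpy as np

def reestimate_gaussians(gamma, X):
    """Soft re-estimation of state means and covariances.

    gamma: (N, T) state occupation probabilities; X: (T, D) feature vectors
    """
    occ = gamma.sum(axis=1)                        # "soft" counts per state
    means = (gamma @ X) / occ[:, None]             # weighted means
    covs = np.empty((len(occ), X.shape[1], X.shape[1]))
    for j in range(len(occ)):
        d = X - means[j]                           # (T, D) deviations from mu_j
        covs[j] = (gamma[j][:, None] * d).T @ d / occ[j]
    return means, covs
```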


Re-estimation of transition probabilities

Similarly to the state occupation probability, we can estimate ξt(si, sj), the probability of being in si at time t and sj at time t + 1, given the observations:

ξt(si, sj) = P(S(t) = si, S(t+1) = sj | X, λ)
= P(S(t) = si, S(t+1) = sj, X | λ) / p(X | λ)
= αt(si) aij bj(xt+1) βt+1(sj) / αT(sE)

We can use this to re-estimate the transition probabilities:

aij = ∑_{t=1}^{T} ξt(si, sj) / ∑_{k=1}^{N} ∑_{t=1}^{T} ξt(si, sk)
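A NumPy sketch of this update, accumulating ξ over time and normalising each row (same assumed conventions as the earlier sketches):

```python
import numpy as np

def reestimate_transitions(alpha, beta, a, b, p_X):
    """Transition re-estimation from xi_t(s_i, s_j)."""
    N, T = alpha.shape
    xi_sum = np.zeros((N, N))
    for t in range(T - 1):
        # xi_t[i, j] = alpha_t(s_i) a_ij b_j(x_{t+1}) beta_{t+1}(s_j) / p(X)
        xi_sum += np.outer(alpha[:, t], b[:, t + 1] * beta[:, t + 1]) * a / p_X
    return xi_sum / xi_sum.sum(axis=1, keepdims=True)
```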


Pulling it all together

Iterative estimation of HMM parameters using the EM algorithm. At each iteration:

E step: For all time-state pairs
1. Recursively compute the forward probabilities αt(sj) and backward probabilities βt(sj)
2. Compute the state occupation probabilities γt(sj) and ξt(si, sj)

M step: Based on the estimated state occupation probabilities, re-estimate the HMM parameters: mean vectors µj, covariance matrices Σj and transition probabilities aij

The application of the EM algorithm to HMM training is sometimes called the Forward-Backward algorithm
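Stitching the earlier sketches into one EM iteration (a sketch, assuming the functions forward, backward, occupation_probs, reestimate_gaussians and reestimate_transitions defined above are in scope, along with the entry/exit-vector conventions they use):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_iteration(a, a_entry, a_exit, means, covs, X):
    """One EM (forward-backward) update, composing the earlier sketches."""
    N = len(means)
    b = np.array([multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
                  for j in range(N)])              # b[j, t] = b_j(x_t)
    alpha, p_X = forward(a, a_entry, a_exit, b)    # E step
    beta, _ = backward(a, a_entry, a_exit, b)
    gamma = occupation_probs(alpha, beta, p_X)
    means, covs = reestimate_gaussians(gamma, X)   # M step
    a = reestimate_transitions(alpha, beta, a, b, p_X)
    return a, means, covs, p_X   # p_X: likelihood under the incoming parameters
```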


Extension to a corpus of utterances

We usually train from a large corpus of R utterances

If x_t^r is the tth frame of the rth utterance X^r, then we can compute the probabilities α_t^r(j), β_t^r(j), γ_t^r(sj) and ξ_t^r(si, sj) as before

The re-estimates are as before, except we must sum over the R utterances, e.g.:

µj = ∑_{r=1}^{R} ∑_{t=1}^{T} γ_t^r(sj) x_t^r / ∑_{r=1}^{R} ∑_{t=1}^{T} γ_t^r(sj)


Extension to Gaussian mixture model (GMM)

The assumption of a Gaussian distribution at each state is very strong; in practice the acoustic feature vectors associated with a state may be strongly non-Gaussian

In this case an M-component Gaussian mixture model is an appropriate density function:

bj(x) = p(x | sj) = ∑_{m=1}^{M} cjm N(x; µjm, Σjm)

Given enough components, this family of functions can model any distribution

Train using the EM algorithm, in which the component occupation probabilities are estimated in the E-step


EM training of HMM/GMM

Rather than estimating the state-time alignment, we estimate the component/state-time alignment, and component-state occupation probabilities γt(sj, m): the probability of occupying mixture component m of state sj at time t

We can thus re-estimate the mean of mixture component m of state sj as follows:

µjm = ∑_{t=1}^{T} γt(sj, m) xt / ∑_{t=1}^{T} γt(sj, m)

And likewise for the covariance matrices (mixture models often use diagonal covariance matrices)

The mixture coefficients are re-estimated in a similar way to transition probabilities:

cjm = ∑_{t=1}^{T} γt(sj, m) / ∑_{ℓ=1}^{M} ∑_{t=1}^{T} γt(sj, ℓ)


Doing the computation

The forward, backward and Viterbi recursions result in a long sequence of probabilities being multiplied

This can cause floating point underflow problems

In practice, computations are performed in the log domain (in which multiplies become adds)

Working in the log domain also avoids needing to perform the exponentiation when computing Gaussians
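A standard sketch of the trick (not from the slides): products of probabilities become sums of logs, and the sums inside the forward and backward recursions become log-sum-exp operations:

```python
import numpy as np

def log_sum_exp(log_ps):
    """log(sum_i exp(log_ps[i])), computed without underflow."""
    m = np.max(log_ps)
    return m + np.log(np.sum(np.exp(log_ps - m)))

# Forward recursion in the log domain, e.g.:
# log_alpha[j, t] = log_sum_exp(log_alpha[:, t-1] + log_a[:, j]) + log_b[j, t]
# For Viterbi, the max of log-scores replaces log_sum_exp.
```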


Summary: HMMs

HMMs provide a generative model for statistical speech recognition

Three key problems:
1. Computing the overall likelihood: the Forward algorithm
2. Decoding the most likely state sequence: the Viterbi algorithm
3. Estimating the most likely parameters: the EM (Forward-Backward) algorithm

Solutions to these problems are tractable due to the two key HMM assumptions:
1. Conditional independence of observations given the current state
2. Markov assumption on the states


References: HMMs

Jurafsky and Martin (2008). Speech and Language Processing (2nd ed.): sections 6.1–6.5; 9.2; 9.4. (Errata at http://www.cs.colorado.edu/~martin/SLP/Errata/SLP2-PIEV-Errata.html)

Gales and Young (2007). "The Application of Hidden Markov Models in Speech Recognition", Foundations and Trends in Signal Processing, 1 (3), 195–304: section 2.2.

Rabiner and Juang (1989). "An introduction to hidden Markov models", IEEE ASSP Magazine, 3 (1), 4–16.
