8/13/2019 Hidden Markov Model - ASR
Hidden Markov Modelling
Lecture # 10
Session 2003

- Introduction
- Problem formulation
- Forward-Backward algorithm
- Viterbi search
- Baum-Welch parameter estimation
- Other considerations
  - Multiple observation sequences
  - Phone-based models for continuous speech recognition
  - Continuous density HMMs
  - Implementation issues

6.345 Automatic Speech Recognition HMM 1
Information Theoretic Approach to ASR

[Figure: the noisy-communication-channel view of ASR. Speech generation (text generation followed by speech production) feeds a noisy communication channel; speech recognition consists of an acoustic processor, which produces the acoustic evidence A, followed by a linguistic decoder, which produces the word string W.]

Recognition is achieved by maximizing the probability of the linguistic string, W, given the acoustic evidence, A, i.e., choose the linguistic sequence Ŵ such that

    P(Ŵ|A) = max_W P(W|A)
Information Theoretic Approach to ASR

From Bayes' rule:

    P(W|A) = P(A|W) P(W) / P(A)

Hidden Markov modelling (HMM) deals with the quantity P(A|W). Change in notation:

    A → O
    W → λ
    P(A|W) → P(O|λ)
HMM: An Example

[Figure: three mugs (labelled 0, 1, 2), each containing state stones marked 1 and 2, and two urns (labelled 1 and 2), each containing black and white balls.]

- Consider 3 mugs, each with mixtures of state stones, 1 and 2
- The fractions for the i-th mug are ai1 and ai2, and ai1 + ai2 = 1
- Consider 2 urns, each with mixtures of black and white balls
- The fractions for the i-th urn are bi(B) and bi(W); bi(B) + bi(W) = 1
- The parameter vector for this model is:

    λ = {a01, a02, a11, a12, a21, a22, b1(B), b1(W), b2(B), b2(W)}
HMM: An Example (cont'd)

[Figure: the mug-and-urn model generating a sequence of ball draws.]

Observation Sequence: O = {B, W, B, W, W, B}
State Sequence: Q = {1, 1, 2, 1, 2, 1}

Goal: Given the model λ and the observation sequence O, how can the underlying state sequence Q be determined?
Elements of a Discrete Hidden Markov Model

- N: number of states in the model
  - states, s = {s1, s2, ..., sN}
  - state at time t, qt ∈ s
- M: number of observation symbols (i.e., discrete observations)
  - observation symbols, v = {v1, v2, ..., vM}
  - observation at time t, ot ∈ v
- A = {aij}: state transition probability distribution
  - aij = P(qt+1 = sj | qt = si),  1 ≤ i, j ≤ N
- B = {bj(k)}: observation symbol probability distribution in state j
  - bj(k) = P(vk at t | qt = sj),  1 ≤ j ≤ N, 1 ≤ k ≤ M
- π = {πi}: initial state distribution
  - πi = P(q1 = si),  1 ≤ i ≤ N

Notationally, an HMM is typically written as: λ = {A, B, π}
HMM: An Example (cont'd)

For our simple example:

    π = {a01, a02},  A = [a11 a12; a21 a22],  and  B = [b1(B) b1(W); b2(B) b2(W)]

State Diagram

[Figure: 2-state and 3-state state diagrams. The 2-state model has states 1 and 2 with transition probabilities a11, a12, a21, a22 and output distributions {b1(B), b1(W)} and {b2(B), b2(W)}.]
Generation of HMM Observations

1. Choose an initial state, q1 = si, based on the initial state distribution, π
2. For t = 1 to T:
   - Choose ot = vk according to the symbol probability distribution in state si, bi(k)
   - Transition to a new state qt+1 = sj according to the state transition probability distribution for state si, aij
3. Increment t by 1, return to step 2 if t ≤ T; else, terminate

[Figure: a chain of states q1, q2, ..., qT entered via a0i and linked by transitions aij, each state emitting an observation o1, o2, ..., oT according to bi(k).]
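The generation procedure above can be sketched in Python (not part of the original lecture; the function name and the toy model in the test are illustrative):

```python
import random

def sample_hmm(pi, A, B, T, seed=0):
    """Generate a state and observation sequence of length T from a
    discrete HMM. pi: initial distribution, A: transition rows,
    B: emission rows; states and symbols are integer indices."""
    rng = random.Random(seed)

    def draw(dist):
        # sample an index from a discrete distribution
        r, acc = rng.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if r < acc:
                return i
        return len(dist) - 1

    q = draw(pi)                 # step 1: initial state from pi
    states, obs = [], []
    for _ in range(T):           # step 2: emit, then transition
        states.append(q)
        obs.append(draw(B[q]))   # o_t ~ b_q(k)
        q = draw(A[q])           # q_{t+1} ~ a_{q j}
    return states, obs
```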
Representing the State Diagram by a Trellis

[Figure: a trellis over states s1, s2, s3 and times 0 through 4, with solid arcs for emitting transitions and dashed arcs for null transitions.]

The dashed line represents a null transition, where no observation symbol is generated.
Three Basic HMM Problems

1. Scoring: Given an observation sequence O = {o1, o2, ..., oT} and a model λ = {A, B, π}, how do we compute P(O|λ), the probability of the observation sequence?
   ==> The Forward-Backward Algorithm
2. Matching: Given an observation sequence O = {o1, o2, ..., oT}, how do we choose a state sequence Q = {q1, q2, ..., qT} which is optimum in some sense?
   ==> The Viterbi Algorithm
3. Training: How do we adjust the model parameters λ = {A, B, π} to maximize P(O|λ)?
   ==> The Baum-Welch Re-estimation Procedures
Computation of P(O|λ)

    P(O|λ) = Σ_{all Q} P(O, Q|λ)

    P(O, Q|λ) = P(O|Q, λ) P(Q|λ)

Consider the fixed state sequence: Q = q1 q2 ... qT

    P(O|Q, λ) = bq1(o1) bq2(o2) ... bqT(oT)
    P(Q|λ) = πq1 aq1q2 aq2q3 ... aqT-1qT

Therefore:

    P(O|λ) = Σ_{q1,q2,...,qT} πq1 bq1(o1) aq1q2 bq2(o2) ... aqT-1qT bqT(oT)

Calculation requires on the order of 2T·N^T operations (there are N^T such sequences). For N = 5, T = 100: 2·100·5^100 ≈ 10^72 computations!
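The brute-force sum above can be written directly, which makes the exponential cost obvious (a toy sketch, not from the lecture; only feasible for tiny N and T):

```python
from itertools import product

def p_obs_brute_force(pi, A, B, obs):
    """P(O|lambda) by summing P(O,Q|lambda) over all N^T state
    sequences Q -- exponential in T, for illustration only."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0]][obs[0]]              # pi_{q1} b_{q1}(o1)
        for t in range(1, T):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]  # a_{q_{t-1} q_t} b_{q_t}(o_t)
        total += p
    return total
```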
The Forward Algorithm

Let us define the forward variable, αt(i), as the probability of the partial observation sequence up to time t and state si at time t, given the model, i.e.

    αt(i) = P(o1 o2 ... ot, qt = si | λ)

It can easily be shown that:

    α1(i) = πi bi(o1),  1 ≤ i ≤ N

    P(O|λ) = Σ_{i=1}^{N} αT(i)

By induction:

    αt+1(j) = [ Σ_{i=1}^{N} αt(i) aij ] bj(ot+1),  1 ≤ t ≤ T-1, 1 ≤ j ≤ N

Calculation is on the order of N²T. For N = 5, T = 100: about 100·5² = 2500 computations, instead of 10^72.
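The forward recursion above translates line for line into code (a minimal sketch, not from the lecture; the model in the test is illustrative):

```python
def forward(pi, A, B, obs):
    """Forward algorithm: alpha[t][i] = P(o_1..o_t, q_t = s_i | lambda).
    Returns (alpha, P(O|lambda)); O(N^2 T) instead of O(N^T)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                        # initialization
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(T - 1):                    # induction
        for j in range(N):
            s = sum(alpha[t][i] * A[i][j] for i in range(N))
            alpha[t + 1][j] = s * B[j][obs[t + 1]]
    return alpha, sum(alpha[T - 1])           # termination: sum_i alpha_T(i)
```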
Forward Algorithm Illustration

[Figure: a trellis column at time t with states s1, ..., si, ..., sN; each αt(i) feeds αt+1(j) through the transition probabilities a1j, ..., aij, ..., aNj.]
The Backward Algorithm

Similarly, let us define the backward variable, βt(i), as the probability of the partial observation sequence from time t+1 to the end, given state si at time t and the model, i.e.

    βt(i) = P(ot+1 ot+2 ... oT | qt = si, λ)

It can easily be shown that:

    βT(i) = 1,  1 ≤ i ≤ N

and:

    P(O|λ) = Σ_{i=1}^{N} πi bi(o1) β1(i)

By induction:

    βt(i) = Σ_{j=1}^{N} aij bj(ot+1) βt+1(j),  t = T-1, T-2, ..., 1,  1 ≤ i ≤ N
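The backward recursion is symmetric to the forward one (again an illustrative sketch, not from the lecture):

```python
def backward(A, B, obs):
    """Backward algorithm: beta[t][i] = P(o_{t+1}..o_T | q_t = s_i, lambda)."""
    N, T = len(A), len(obs)
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        beta[T - 1][i] = 1.0                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):            # induction, t = T-1, ..., 1
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta
```

Scoring via the backward pass, P(O|λ) = Σ_i π_i b_i(o1) β_1(i), agrees with the forward pass on the same toy model.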
Backward Procedure Illustration

[Figure: a trellis column at time t+1 with states s1, ..., sj, ..., sN; βt(i) collects βt+1(j) through the transition probabilities ai1, ..., aij, ..., aiN.]
Finding Optimal State Sequences

One criterion chooses states, qt, which are individually most likely. This maximizes the expected number of correct states.

Let us define γt(i) as the probability of being in state si at time t, given the observation sequence and the model, i.e.

    γt(i) = P(qt = si | O, λ)

    Σ_{i=1}^{N} γt(i) = 1,  ∀t

Then the individually most likely state, qt*, at time t is:

    qt* = argmax_{1 ≤ i ≤ N} γt(i),  1 ≤ t ≤ T

Note that it can be shown that:

    γt(i) = αt(i) βt(i) / P(O|λ)
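Combining the two passes gives the state posteriors γt(i) directly (a self-contained sketch that inlines the forward and backward recursions; not from the lecture):

```python
def posterior_states(pi, A, B, obs):
    """gamma[t][i] = P(q_t = s_i | O, lambda)
                   = alpha_t(i) beta_t(i) / P(O|lambda)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    for i in range(N):                        # forward pass
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(T - 1):
        for j in range(N):
            alpha[t + 1][j] = B[j][obs[t + 1]] * sum(
                alpha[t][i] * A[i][j] for i in range(N))
    for t in range(T - 2, -1, -1):            # backward pass
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    p = sum(alpha[T - 1])                     # P(O|lambda)
    return [[alpha[t][i] * beta[t][i] / p for i in range(N)]
            for t in range(T)]
```

Each row of γ sums to one, matching Σ_i γt(i) = 1 on the slide.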
Finding Optimal State Sequences

The individual optimality criterion has the problem that the optimum state sequence may not obey state transition constraints.

Another optimality criterion is to choose the state sequence which maximizes P(Q, O|λ); this can be found by the Viterbi algorithm.

Let us define δt(i) as the highest probability along a single path, at time t, which accounts for the first t observations, i.e.

    δt(i) = max_{q1,q2,...,qt-1} P(q1 q2 ... qt-1, qt = si, o1 o2 ... ot | λ)

By induction:

    δt+1(j) = [ max_i δt(i) aij ] bj(ot+1)

To retrieve the state sequence, we must keep track of the state sequence which gave the best path, at time t, to state si. We do this in a separate array ψt(i).
The Viterbi Algorithm

1. Initialization:

    δ1(i) = πi bi(o1),  1 ≤ i ≤ N
    ψ1(i) = 0

2. Recursion:

    δt(j) = max_{1 ≤ i ≤ N} [δt-1(i) aij] bj(ot),  2 ≤ t ≤ T, 1 ≤ j ≤ N
    ψt(j) = argmax_{1 ≤ i ≤ N} [δt-1(i) aij],  2 ≤ t ≤ T, 1 ≤ j ≤ N

3. Termination:

    P* = max_{1 ≤ i ≤ N} [δT(i)]
    qT* = argmax_{1 ≤ i ≤ N} [δT(i)]

4. Path (state-sequence) backtracking:

    qt* = ψt+1(qt+1*),  t = T-1, T-2, ..., 1

Computation is on the order of N²T.
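The four steps above can be sketched as follows (illustrative code, not from the lecture; the model in the test is a toy):

```python
def viterbi(pi, A, B, obs):
    """Viterbi algorithm: returns the best state sequence and its
    probability. delta[t][j] is the best path score ending in s_j at
    time t; psi[t][j] records the argmax predecessor for backtracking."""
    N, T = len(pi), len(obs)
    delta = [[0.0] * N for _ in range(T)]
    psi = [[0] * N for _ in range(T)]
    for i in range(N):                        # 1. initialization
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):                     # 2. recursion
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
    q = [0] * T                               # 3. termination
    q[T - 1] = max(range(N), key=lambda i: delta[T - 1][i])
    for t in range(T - 2, -1, -1):            # 4. backtracking
        q[t] = psi[t + 1][q[t + 1]]
    return q, delta[T - 1][q[T - 1]]
```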
The Viterbi Algorithm: An Example

[Figure: a 3-state model with transition probabilities (including null transitions) and output probabilities P(a) and P(b) for each state, scored against the observation sequence O = {a a b b}; the trellis shows the accumulated path probabilities at each node.]
The Viterbi Algorithm: An Example (cont'd)

Candidate path scores at each node (predecessor, score); null transitions are written as s,0:

         0          a            aa             aab             aabb
    s1   1.0        s1,a .4      s1,a .16       s1,b .016       s1,b .0016
    s2   s1,0 .2    s1,a .21     s1,a .084      s1,b .0144      s1,b .00144
                    s1,0 .08     s1,0 .032      s1,0 .0032      s1,0 .00032
                    s2,a .04     s2,a .042      s2,b .0168      s2,b .00336
    s3   s2,0 .02   s2,a .03     s2,a .0315     s2,b .0294      s2,b .00588
                    s2,0 .021    s2,0 .0084     s2,0 .00168     s2,0 .000336

Best (maximum) score at each trellis node:

         0      a       aa       aab      aabb
    s1   1.0    0.4     0.16     0.016    0.0016
    s2   0.2    0.21    0.084    0.0168   0.00336
    s3   0.02   0.03    0.0315   0.0294   0.00588
Matching Using the Forward-Backward Algorithm

Candidate path scores at each node (summed rather than maximized):

         0          a            aa             aab             aabb
    s1   1.0        s1,a .4      s1,a .16       s1,b .016       s1,b .0016
    s2   s1,0 .2    s1,a .21     s1,a .084      s1,b .0144      s1,b .00144
                    s1,0 .08     s1,0 .032      s1,0 .0032      s1,0 .00032
                    s2,a .04     s2,a .066      s2,b .0364      s2,b .0108
                    s2,0 .033    s2,0 .0182     s2,0 .0054      s2,0 .001256
    s3   s2,0 .02   s2,a .03     s2,a .0495     s2,b .0637      s2,b .0189

Forward probabilities at each trellis node:

         0      a       aa       aab      aabb
    s1   1.0    0.4     0.16     0.016    0.0016
    s2   0.2    0.33    0.182    0.0540   0.01256
    s3   0.02   0.063   0.0677   0.0691   0.020156
Baum-Welch Re-estimation

Baum-Welch re-estimation uses EM to determine ML parameters.

Define ξt(i, j) as the probability of being in state si at time t and state sj at time t+1, given the model and observation sequence:

    ξt(i, j) = P(qt = si, qt+1 = sj | O, λ)

Then:

    ξt(i, j) = αt(i) aij bj(ot+1) βt+1(j) / P(O|λ)

    γt(i) = Σ_{j=1}^{N} ξt(i, j)

Summing γt(i) and ξt(i, j) over time, we get:

    Σ_{t=1}^{T-1} γt(i) = expected number of transitions from si
    Σ_{t=1}^{T-1} ξt(i, j) = expected number of transitions from si to sj
Baum-Welch Re-estimation Procedures

[Figure: a trellis from time 1 to T highlighting times t and t+1; αt(i) at state si connects to βt+1(j) at state sj through the transition probability aij.]
Baum-Welch Re-estimation Formulas

    π̄i = expected number of times in state si at t = 1
        = γ1(i)

    āij = expected number of transitions from state si to sj
          / expected number of transitions from state si
        = Σ_{t=1}^{T-1} ξt(i, j) / Σ_{t=1}^{T-1} γt(i)

    b̄j(k) = expected number of times in state sj with symbol vk
           / expected number of times in state sj
         = Σ_{t=1, ot=vk}^{T} γt(j) / Σ_{t=1}^{T} γt(j)
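One full re-estimation iteration, combining the forward and backward passes with the formulas above, can be sketched as follows (illustrative, single-sequence code; not the lecture's implementation):

```python
def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch iteration on a single observation sequence.
    Returns re-estimated (pi, A, B)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    for i in range(N):                        # forward pass
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(T - 1):
        for j in range(N):
            alpha[t + 1][j] = B[j][obs[t + 1]] * sum(
                alpha[t][i] * A[i][j] for i in range(N))
    for t in range(T - 2, -1, -1):            # backward pass
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    p = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]                      # pi_i = gamma_1(i)
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B
```

Iterating this step, the slide's convergence result guarantees P(O|λ) never decreases.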
Baum-Welch Re-estimation Formulas

If λ = (A, B, π) is the initial model, and λ̄ = (Ā, B̄, π̄) is the re-estimated model, then it can be proved that either:

1. The initial model, λ, defines a critical point of the likelihood function, in which case λ̄ = λ, or
2. Model λ̄ is more likely than λ in the sense that P(O|λ̄) > P(O|λ), i.e., we have found a new model λ̄ from which the observation sequence is more likely to have been produced.

Thus we can improve the probability of O being observed from the model if we iteratively use λ̄ in place of λ and repeat the re-estimation until some limiting point is reached. The resulting model is called the maximum likelihood HMM.
Multiple Observation Sequences

Speech recognition typically uses left-to-right HMMs. These HMMs cannot be trained using a single observation sequence, because only a small number of observations are available to train each state. To obtain reliable estimates of model parameters, one must use multiple observation sequences. In this case, the re-estimation procedure needs to be modified.

Let us denote the set of K observation sequences as

    O = {O(1), O(2), ..., O(K)}

where O(k) = {o1(k), o2(k), ..., oTk(k)} is the k-th observation sequence.

Assuming that the observation sequences are mutually independent, we want to estimate the parameters λ so as to maximize

    P(O|λ) = Π_{k=1}^{K} P(O(k)|λ) = Π_{k=1}^{K} Pk
Multiple Observation Sequences (cont'd)

Since the re-estimation formulas are based on frequency of occurrence of various events, we can modify them by adding up the individual frequencies of occurrence for each sequence:

    āij = Σ_{k=1}^{K} Σ_{t=1}^{Tk-1} ξt^k(i, j) / Σ_{k=1}^{K} Σ_{t=1}^{Tk-1} γt^k(i)
        = Σ_{k=1}^{K} (1/Pk) Σ_{t=1}^{Tk-1} αt^k(i) aij bj(ot+1(k)) βt+1^k(j)
          / Σ_{k=1}^{K} (1/Pk) Σ_{t=1}^{Tk-1} αt^k(i) βt^k(i)

    b̄j(ℓ) = Σ_{k=1}^{K} Σ_{t=1, ot(k)=vℓ}^{Tk} γt^k(j) / Σ_{k=1}^{K} Σ_{t=1}^{Tk} γt^k(j)
          = Σ_{k=1}^{K} (1/Pk) Σ_{t=1, ot(k)=vℓ}^{Tk} αt^k(j) βt^k(j)
            / Σ_{k=1}^{K} (1/Pk) Σ_{t=1}^{Tk} αt^k(j) βt^k(j)
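The training objective for K independent sequences, P(O|λ) = Π_k P(O^(k)|λ), is just a product of per-sequence forward probabilities (a small sketch, not from the lecture):

```python
def joint_likelihood(pi, A, B, sequences):
    """P(O|lambda) for K mutually independent observation sequences:
    the product of the per-sequence forward probabilities P_k."""
    N = len(pi)

    def fwd(obs):
        # one forward pass, keeping only the current alpha column
        a = [pi[i] * B[i][obs[0]] for i in range(N)]
        for t in range(1, len(obs)):
            a = [B[j][obs[t]] * sum(a[i] * A[i][j] for i in range(N))
                 for j in range(N)]
        return sum(a)

    p = 1.0
    for obs in sequences:
        p *= fwd(obs)        # independence: multiply the P_k together
    return p
```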
Phone-based HMMs

Word-based HMMs are appropriate for small vocabulary speech recognition. For large vocabulary ASR, sub-word-based (e.g., phone-based) models are more appropriate.
Phone-based HMMs (cont'd)

The phone models can have many states, and words are made up from a concatenation of phone models.
Continuous Density Hidden Markov Models

A continuous density HMM replaces the discrete observation probabilities, bj(k), by a continuous PDF bj(x).

A common practice is to represent bj(x) as a mixture of Gaussians:

    bj(x) = Σ_{k=1}^{M} cjk N[x, μjk, Σjk],  1 ≤ j ≤ N

where cjk is the mixture weight,

    cjk ≥ 0 (1 ≤ j ≤ N, 1 ≤ k ≤ M), and Σ_{k=1}^{M} cjk = 1 (1 ≤ j ≤ N),

N is the normal density, and μjk and Σjk are the mean vector and covariance matrix associated with state j and mixture k.
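For a scalar observation, the mixture density above can be evaluated directly (an illustrative sketch; real systems use multivariate Gaussians with full or diagonal covariances):

```python
import math

def gmm_density(x, weights, means, variances):
    """Mixture-of-Gaussians emission density for a scalar observation:
    b_j(x) = sum_k c_jk N(x; mu_jk, sigma_jk^2)."""
    total = 0.0
    for c, mu, var in zip(weights, means, variances):
        total += c * math.exp(-(x - mu) ** 2 / (2 * var)) \
                   / math.sqrt(2 * math.pi * var)
    return total
```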
Acoustic Modelling Variations

- Semi-continuous HMMs first compute a VQ codebook of size M
  - The VQ codebook is then modelled as a family of Gaussian PDFs
  - Each codeword is represented by a Gaussian PDF, and may be used together with others to model the acoustic vectors
  - From the CD-HMM viewpoint, this is equivalent to using the same set of M mixtures to model all the states
  - It is therefore often referred to as a Tied Mixture HMM
- All three methods have been used in many speech recognition tasks, with varying outcomes
- For large-vocabulary, continuous speech recognition with a sufficient amount (i.e., tens of hours) of training data, CD-HMM systems currently yield the best performance, but with a considerable increase in computation
Implementation Issues

- Scaling: to prevent underflow
- Segmental K-means Training: to train observation probabilities by first performing Viterbi alignment
- Initial estimates of λ: to provide robust models
- Pruning: to reduce search computation
References

- X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice-Hall, 2001.
- F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
- L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.