Hidden Markov Model - ASR

Slide 1/33: Hidden Markov Modelling

    • Introduction
    • Problem formulation
    • Forward-Backward algorithm
    • Viterbi search
    • Baum-Welch parameter estimation
    • Other considerations: multiple observation sequences, phone-based models for continuous speech recognition, continuous density HMMs, implementation issues

    6.345 Automatic Speech Recognition, Lecture # 10, Session 2003

Slide 2/33: Information Theoretic Approach to ASR

    [Figure: ASR as a noisy communication channel. Speech generation (text generation producing W, then speech) feeds the channel; speech recognition consists of an acoustic processor (producing the acoustic evidence A) followed by a linguistic decoder (recovering W).]

    Recognition is achieved by maximizing the probability of the linguistic string, W, given the acoustic evidence, A, i.e., choose the linguistic sequence \hat{W} such that

        P(\hat{W} \mid A) = \max_{W} P(W \mid A)

Slide 3/33: Information Theoretic Approach to ASR (cont'd)

    From Bayes' rule:

        P(W \mid A) = \frac{P(A \mid W) \, P(W)}{P(A)}

    Hidden Markov modelling (HMM) deals with the quantity P(A \mid W). Change in notation:

        A \rightarrow O, \qquad W \rightarrow \lambda, \qquad P(A \mid W) \rightarrow P(O \mid \lambda)

Slide 4/33: HMM: An Example

    [Figure: three mugs of numbered state stones and two urns of black and white balls.]

    • Consider 3 mugs, each with mixtures of state stones, 1 and 2. The fractions for the i-th mug are a_{i1} and a_{i2}, and a_{i1} + a_{i2} = 1.
    • Consider 2 urns, each with mixtures of black and white balls. The fractions for the i-th urn are b_i(B) and b_i(W); b_i(B) + b_i(W) = 1.
    • The parameter vector for this model is:

        \lambda = \{a_{01}, a_{02}, a_{11}, a_{12}, a_{21}, a_{22}, b_1(B), b_1(W), b_2(B), b_2(W)\}

Slide 5/33: HMM: An Example (cont'd)

    [Figure: six successive draws from the mug-and-urn model, showing the hidden state chosen at each step and the ball observed.]

    Observation sequence: O = \{B, W, B, W, W, B\}
    State sequence:       Q = \{1, 1, 2, 1, 2, 1\}

    Goal: Given the model and the observation sequence O, how can the underlying state sequence Q be determined?

Slide 6/33: Elements of a Discrete Hidden Markov Model

    • N: number of states in the model
        states: s = \{s_1, s_2, \ldots, s_N\}; state at time t: q_t \in s
    • M: number of observation symbols (i.e., discrete observations)
        observation symbols: v = \{v_1, v_2, \ldots, v_M\}; observation at time t: o_t \in v
    • A = \{a_{ij}\}: state transition probability distribution
        a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i), \quad 1 \le i, j \le N
    • B = \{b_j(k)\}: observation symbol probability distribution in state j
        b_j(k) = P(v_k \text{ at } t \mid q_t = s_j), \quad 1 \le j \le N, \; 1 \le k \le M
    • \pi = \{\pi_i\}: initial state distribution
        \pi_i = P(q_1 = s_i), \quad 1 \le i \le N

    Notationally, an HMM is typically written as \lambda = \{A, B, \pi\}.
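To make the notation concrete, here is a minimal container for \lambda = \{A, B, \pi\} (a sketch of my own, not from the slides; the NumPy array shapes are my assumptions). The sketches after later slides assume this layout.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    """lambda = {A, B, pi} for a discrete HMM with N states and M symbols."""
    pi: np.ndarray  # (N,)   initial state distribution, pi_i = P(q_1 = s_i)
    A: np.ndarray   # (N, N) transition probabilities, a_ij = P(q_{t+1} = s_j | q_t = s_i)
    B: np.ndarray   # (N, M) emission probabilities, b_j(k) = P(v_k | q_t = s_j)
```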

Slide 7/33: HMM: An Example (cont'd)

    For our simple example:

        \pi = \{a_{01}, a_{02}\}, \quad
        A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
        B = \begin{pmatrix} b_1(B) & b_1(W) \\ b_2(B) & b_2(W) \end{pmatrix}

    [Figure: 2-state and 3-state state diagrams; each state i carries its output distribution \{b_i(B), b_i(W)\} and transition arcs a_{ij}.]

Slide 8/33: Generation of HMM Observations

    1. Choose an initial state, q_1 = s_i, based on the initial state distribution, \pi
    2. For t = 1 to T:
        • Choose o_t = v_k according to the symbol probability distribution in state s_i, b_i(k)
        • Transition to a new state q_{t+1} = s_j according to the state transition probability distribution for state s_i, a_{ij}
    3. Increment t by 1; return to step 2 if t \le T, else terminate

    [Figure: chain q_1 \rightarrow q_2 \rightarrow \cdots \rightarrow q_T with transition probabilities a_{0i}, a_{ij} and emissions o_1, o_2, \ldots, o_T drawn from b_i(k).]
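A minimal sketch of this generation procedure, assuming the NumPy layout above; the function name `generate` is illustrative, not from the slides.

```python
import numpy as np

def generate(pi, A, B, T, seed=None):
    """Sample a state sequence and an observation sequence of length T."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    q = rng.choice(N, p=pi)                  # step 1: q_1 ~ pi
    states, obs = [], []
    for _ in range(T):                       # step 2, for t = 1..T
        states.append(q)
        obs.append(rng.choice(M, p=B[q]))    # o_t = v_k with probability b_q(k)
        q = rng.choice(N, p=A[q])            # q_{t+1} = s_j with probability a_qj
    return states, obs
```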

Slide 9/33: Representing a State Diagram by a Trellis

    [Figure: trellis with states s_1, s_2, s_3 on the vertical axis and time 0 to 4 on the horizontal axis; arcs connect states at successive times.]

    The dashed line represents a null transition, where no observation symbol is generated.

Slide 10/33: Three Basic HMM Problems

    1. Scoring: Given an observation sequence O = \{o_1, o_2, \ldots, o_T\} and a model \lambda = \{A, B, \pi\}, how do we compute P(O \mid \lambda), the probability of the observation sequence?
        ==> The Forward-Backward Algorithm
    2. Matching: Given an observation sequence O = \{o_1, o_2, \ldots, o_T\}, how do we choose a state sequence Q = \{q_1, q_2, \ldots, q_T\} which is optimum in some sense?
        ==> The Viterbi Algorithm
    3. Training: How do we adjust the model parameters \lambda = \{A, B, \pi\} to maximize P(O \mid \lambda)?
        ==> The Baum-Welch Re-estimation Procedures

Slide 11/33: Computation of P(O | λ)

        P(O \mid \lambda) = \sum_{\text{all } Q} P(O, Q \mid \lambda), \qquad
        P(O, Q \mid \lambda) = P(O \mid Q, \lambda) \, P(Q \mid \lambda)

    Consider the fixed state sequence Q = q_1 q_2 \ldots q_T:

        P(O \mid Q, \lambda) = b_{q_1}(o_1) \, b_{q_2}(o_2) \cdots b_{q_T}(o_T)
        P(Q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}

    Therefore:

        P(O \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1} b_{q_1}(o_1) \, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)

    The calculation requires about 2T \cdot N^T operations (there are N^T such sequences). For N = 5, T = 100: 2 \cdot 100 \cdot 5^{100} \approx 10^{72} computations!

Slide 12/33: The Forward Algorithm

    Let us define the forward variable, \alpha_t(i), as the probability of the partial observation sequence up to time t and state s_i at time t, given the model, i.e.,

        \alpha_t(i) = P(o_1 o_2 \ldots o_t, q_t = s_i \mid \lambda)

    It can easily be shown that:

        \alpha_1(i) = \pi_i \, b_i(o_1), \quad 1 \le i \le N
        P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)

    By induction:

        \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N

    The calculation is on the order of N^2 T. For N = 5, T = 100: on the order of 100 \cdot 5^2 = 2500 computations, instead of 10^{72}.
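A direct transcription of this recursion into NumPy, under the same assumed layout (a sketch, not the lecture's code); each loop iteration computes \alpha_{t+1}(j) = [\sum_i \alpha_t(i) a_{ij}] \, b_j(o_{t+1}) as one vectorized step.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns alpha (T x N) and P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                         # induction over time
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                 # P(O|lambda) = sum_i alpha_T(i)
```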

Slide 13/33: Forward Algorithm Illustration

    [Figure: trellis from time 1 to T; \alpha_{t+1}(j) collects \alpha_t(i) from all states s_1, \ldots, s_N through the arcs a_{1j}, \ldots, a_{Nj}.]

Slide 14/33: The Backward Algorithm

    Similarly, let us define the backward variable, \beta_t(i), as the probability of the partial observation sequence from time t+1 to the end, given state s_i at time t and the model, i.e.,

        \beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = s_i, \lambda)

    It can easily be shown that:

        \beta_T(i) = 1, \quad 1 \le i \le N
        P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i \, b_i(o_1) \, \beta_1(i)

    By induction:

        \beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \; 1 \le i \le N
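The matching backward pass, again a sketch under the same assumed array layout as the forward sketch above.

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: returns beta (T x N)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                        # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                # t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```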

Slide 15/33: Backward Procedure Illustration

    [Figure: trellis from time 1 to T; \beta_t(i) collects \beta_{t+1}(j) from all successor states through the arcs a_{i1}, \ldots, a_{iN}.]

Slide 16/33: Finding Optimal State Sequences

    One criterion chooses the states, q_t, which are individually most likely; this maximizes the expected number of correct states. Let us define \gamma_t(i) as the probability of being in state s_i at time t, given the observation sequence and the model, i.e.,

        \gamma_t(i) = P(q_t = s_i \mid O, \lambda), \qquad \sum_{i=1}^{N} \gamma_t(i) = 1 \;\; \forall t

    Then the individually most likely state, q_t, at time t is:

        q_t = \arg\max_{1 \le i \le N} \gamma_t(i), \quad 1 \le t \le T

    Note that it can be shown that:

        \gamma_t(i) = \frac{\alpha_t(i) \, \beta_t(i)}{P(O \mid \lambda)}
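Reusing the forward and backward sketches above, the state posteriors and the individually most likely states follow in a few lines (again a sketch, with illustrative names):

```python
def posteriors(pi, A, B, obs):
    """gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda), plus argmax states."""
    alpha, prob = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    gamma = alpha * beta / prob
    return gamma, gamma.argmax(axis=1)   # individually most likely state per frame
```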

Slide 17/33: Finding Optimal State Sequences (cont'd)

    The individual optimality criterion has the problem that the optimum state sequence may not obey state transition constraints. Another optimality criterion is to choose the state sequence which maximizes P(Q, O \mid \lambda); this can be found by the Viterbi algorithm.

    Let us define \delta_t(i) as the highest probability along a single path, at time t, which accounts for the first t observations, i.e.,

        \delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \ldots q_{t-1}, q_t = s_i, o_1 o_2 \ldots o_t \mid \lambda)

    By induction:

        \delta_{t+1}(j) = \big[ \max_i \delta_t(i) \, a_{ij} \big] \, b_j(o_{t+1})

    To retrieve the state sequence, we must keep track of the state which gave the best path, at time t, to state s_i. We do this in a separate array, \psi_t(i).

Slide 18/33: The Viterbi Algorithm

    1. Initialization:
        \delta_1(i) = \pi_i \, b_i(o_1), \quad 1 \le i \le N; \qquad \psi_1(i) = 0
    2. Recursion:
        \delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(o_t), \quad 2 \le t \le T, \; 1 \le j \le N
        \psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}], \quad 2 \le t \le T, \; 1 \le j \le N
    3. Termination:
        P^* = \max_{1 \le i \le N} [\delta_T(i)], \qquad q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]
    4. Path (state-sequence) backtracking:
        q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1

    Computation is on the order of N^2 T.
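A sketch of these four steps in NumPy, for a model without null transitions (the trellis on slide 9 allows null transitions, but this statement of the algorithm does not treat them):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi algorithm: returns the best state path and its probability."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                 # 1. initialization
    for t in range(1, T):                        # 2. recursion
        scores = delta[t - 1][:, None] * A       #    delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]             # 3. termination
    for t in range(T - 1, 0, -1):                # 4. backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
```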

Slide 19/33: The Viterbi Algorithm: An Example

    [Figure: a three-state model (s_1, s_2, s_3) with null transitions, its transition probabilities and per-state output probabilities P(a), P(b), and the trellis of path scores for the input O = \{a\; a\; b\; b\}. The resulting scores are tabulated on the next slide.]

Slide 20/33: The Viterbi Algorithm: An Example (cont'd)

    Each cell lists the candidate extensions (previous state, emitted symbol or null transition "0") with their scores; the best one is kept and its predecessor recorded:

            0           a            aa            aab            aabb
    s1      1.0         s1,a .4      s1,a .16      s1,b .016      s1,b .0016
                        s1,0 .08     s1,0 .032     s1,0 .0032     s1,0 .00032
    s2      s1,0 .2     s1,a .21     s1,a .084     s1,b .0144     s1,b .00144
                        s2,a .04     s2,a .042     s2,b .0168     s2,b .00336
                        s2,0 .021    s2,0 .0084    s2,0 .00168    s2,0 .000336
    s3      s2,0 .02    s2,a .03     s2,a .0315    s2,b .0294     s2,b .00588

    Best path scores \delta_t(i) in the trellis:

            0       a       aa       aab       aabb
    s1      1.0     0.4     0.16     0.016     0.0016
    s2      0.2     0.21    0.084    0.0168    0.00336
    s3      0.02    0.03    0.0315   0.0294    0.00588

Slide 21/33: Matching Using the Forward-Backward Algorithm

    Same example, but candidate scores are summed rather than maximized:

            0           a            aa            aab            aabb
    s1      1.0         s1,a .4      s1,a .16      s1,b .016      s1,b .0016
                        s1,0 .08     s1,0 .032     s1,0 .0032     s1,0 .00032
    s2      s1,0 .2     s1,a .21     s1,a .084     s1,b .0144     s1,b .00144
                        s2,a .04     s2,a .066     s2,b .0364     s2,b .0108
                        s2,0 .033    s2,0 .0182    s2,0 .0054     s2,0 .001256
    s3      s2,0 .02    s2,a .03     s2,a .0495    s2,b .0637     s2,b .0189

    Forward scores \alpha_t(i) in the trellis:

            0       a       aa       aab       aabb
    s1      1.0     0.4     0.16     0.016     0.0016
    s2      0.2     0.33    0.182    0.0540    0.01256
    s3      0.02    0.063   0.0677   0.0691    0.020156

Slide 22/33: Baum-Welch Re-estimation

    Baum-Welch re-estimation uses EM to determine ML parameters. Define \xi_t(i, j) as the probability of being in state s_i at time t and state s_j at time t+1, given the model and observation sequence:

        \xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j \mid O, \lambda)

    Then:

        \xi_t(i, j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{P(O \mid \lambda)}, \qquad
        \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)

    Summing \gamma_t(i) and \xi_t(i, j) over time, we get:

        \sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from } s_i
        \sum_{t=1}^{T-1} \xi_t(i, j) = \text{expected number of transitions from } s_i \text{ to } s_j

Slide 23/33: Baum-Welch Re-estimation Procedures

    [Figure: trellis segment from time t-1 to t+2 highlighting \alpha_t(i), the arc a_{ij}, and \beta_{t+1}(j).]

Slide 24/33: Baum-Welch Re-estimation Formulas

        \bar{\pi}_i = \text{expected number of times in state } s_i \text{ at } t = 1 = \gamma_1(i)

        \bar{a}_{ij} = \frac{\text{expected number of transitions from } s_i \text{ to } s_j}{\text{expected number of transitions from } s_i}
                     = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}

        \bar{b}_j(k) = \frac{\text{expected number of times in } s_j \text{ with symbol } v_k}{\text{expected number of times in } s_j}
                     = \frac{\sum_{\substack{t=1 \\ o_t = v_k}}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
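One EM iteration for a single observation sequence, as a sketch reusing the `forward` and `backward` functions above (the vectorized \xi computation and helper names are my own, not the lecture's):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) re-estimation step for one observation sequence."""
    alpha, prob = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    gamma = alpha * beta / prob                      # gamma_t(i)
    # xi[t,i,j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                      # sum gamma where o_t = v_k
        new_B[:, k] = gamma[np.asarray(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```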

Slide 25/33: Baum-Welch Re-estimation Formulas (cont'd)

    If \lambda = (A, B, \pi) is the initial model, and \bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi}) is the re-estimated model, then it can be proved that either:
    1. The initial model, \lambda, defines a critical point of the likelihood function, in which case \bar{\lambda} = \lambda, or
    2. Model \bar{\lambda} is more likely than \lambda in the sense that P(O \mid \bar{\lambda}) > P(O \mid \lambda), i.e., we have found a new model from which the observation sequence is more likely to have been produced.

    Thus we can improve the probability of O being observed from the model if we iteratively use \bar{\lambda} in place of \lambda and repeat the re-estimation until some limiting point is reached. The resulting model is called the maximum likelihood HMM.

Slide 26/33: Multiple Observation Sequences

    Speech recognition typically uses left-to-right HMMs. These HMMs cannot be trained using a single observation sequence, because only a small number of observations are available to train each state. To obtain reliable estimates of model parameters, one must use multiple observation sequences. In this case, the re-estimation procedure needs to be modified.

    Let us denote the set of K observation sequences as

        O = \{O^{(1)}, O^{(2)}, \ldots, O^{(K)}\}

    where O^{(k)} = \{o_1^{(k)}, o_2^{(k)}, \ldots, o_{T_k}^{(k)}\} is the k-th observation sequence. Assuming that the observation sequences are mutually independent, we want to estimate the parameters so as to maximize

        P(O \mid \lambda) = \prod_{k=1}^{K} P(O^{(k)} \mid \lambda) = \prod_{k=1}^{K} P_k

Slide 27/33: Multiple Observation Sequences (cont'd)

    Since the re-estimation formulas are based on frequencies of occurrence of various events, we can modify them by adding up the individual frequencies of occurrence for each sequence:

        \bar{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^k(i, j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(i)}
                     = \frac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^k(i) \, a_{ij} \, b_j(o_{t+1}^{(k)}) \, \beta_{t+1}^k(j)}{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^k(i) \, \beta_t^k(i)}

        \bar{b}_j(\ell) = \frac{\sum_{k=1}^{K} \sum_{t:\, o_t^{(k)} = v_\ell} \gamma_t^k(j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j)}
                        = \frac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t:\, o_t^{(k)} = v_\ell} \alpha_t^k(j) \, \beta_t^k(j)}{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k} \alpha_t^k(j) \, \beta_t^k(j)}
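A sketch of the pooled update, reusing the earlier helpers. Because \gamma and \xi are already normalized by P_k, summing their numerators and denominators across sequences implements the formulas above; \pi is typically fixed for left-to-right models, so only A and B are pooled here.

```python
import numpy as np

def baum_welch_multi_step(pi, A, B, sequences):
    """One re-estimation step pooling counts over K independent sequences."""
    num_A = np.zeros_like(A); den_A = np.zeros(len(pi))
    num_B = np.zeros_like(B); den_B = np.zeros(len(pi))
    for obs in sequences:
        alpha, prob = forward(pi, A, B, obs)
        beta = backward(A, B, obs)
        gamma = alpha * beta / prob                  # includes the 1/P_k factor
        xi = (alpha[:-1, :, None] * A[None] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
        num_A += xi.sum(axis=0); den_A += gamma[:-1].sum(axis=0)
        for k in range(B.shape[1]):
            num_B[:, k] += gamma[np.asarray(obs) == k].sum(axis=0)
        den_B += gamma.sum(axis=0)
    return num_A / den_A[:, None], num_B / den_B[:, None]
```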

Slide 28/33: Phone-based HMMs

    Word-based HMMs are appropriate for small vocabulary speech recognition. For large vocabulary ASR, sub-word-based (e.g., phone-based) models are more appropriate.

Slide 29/33: Phone-based HMMs (cont'd)

    The phone models can have many states, and words are made up from a concatenation of phone models.

Slide 30/33: Continuous Density Hidden Markov Models

    A continuous density HMM replaces the discrete observation probabilities, b_j(k), by a continuous PDF, b_j(\mathbf{x}). A common practice is to represent b_j(\mathbf{x}) as a mixture of Gaussians:

        b_j(\mathbf{x}) = \sum_{k=1}^{M} c_{jk} \, \mathcal{N}[\mathbf{x}, \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk}], \quad 1 \le j \le N

    where c_{jk} is the mixture weight, with

        c_{jk} \ge 0 \;\; (1 \le j \le N, \; 1 \le k \le M) \quad \text{and} \quad \sum_{k=1}^{M} c_{jk} = 1 \;\; (1 \le j \le N),

    \mathcal{N} is the normal density, and \boldsymbol{\mu}_{jk} and \boldsymbol{\Sigma}_{jk} are the mean vector and covariance matrix associated with state j and mixture k.
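A sketch of this mixture density for one state, using SciPy's multivariate normal (the parameter layout is my assumption):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_emission(x, c_j, mu_j, sigma_j):
    """b_j(x) = sum_k c_jk N[x, mu_jk, Sigma_jk] for a single state j.

    c_j: (M,) mixture weights; mu_j: (M, D) means; sigma_j: (M, D, D) covariances.
    """
    return sum(c * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for c, mu, cov in zip(c_j, mu_j, sigma_j))
```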

Slide 31/33: Acoustic Modelling Variations

    • Semi-continuous HMMs first compute a VQ codebook of size M. The VQ codebook is then modelled as a family of Gaussian PDFs: each codeword is represented by a Gaussian PDF, and may be used together with others to model the acoustic vectors. From the CD-HMM viewpoint, this is equivalent to using the same set of M mixtures to model all the states; it is therefore often referred to as a Tied Mixture HMM.
    • All three methods have been used in many speech recognition tasks, with varying outcomes.
    • For large-vocabulary, continuous speech recognition with a sufficient amount (i.e., tens of hours) of training data, CD-HMM systems currently yield the best performance, but with a considerable increase in computation.

Slide 32/33: Implementation Issues

    • Scaling: to prevent underflow
    • Segmental K-means training: to train observation probabilities by first performing Viterbi alignment
    • Initial estimates of \lambda: to provide robust models
    • Pruning: to reduce search computation
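On the first point, a log-domain forward pass is one common way to prevent underflow; it is an alternative to the per-frame scaling the slide refers to, whose exact scheme is not shown here (this sketch reuses the array layout assumed earlier):

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(pi, A, B, obs):
    """Log-domain forward pass: returns log P(O | lambda) without underflow."""
    # Zero probabilities map to -inf, which logsumexp handles correctly.
    with np.errstate(divide="ignore"):
        log_A, log_B, log_pi = np.log(A), np.log(B), np.log(pi)
    log_alpha = log_pi + log_B[:, obs[0]]
    for t in range(1, len(obs)):
        # log alpha_{t}(j) = logsumexp_i(log alpha_{t-1}(i) + log a_ij) + log b_j(o_t)
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[:, obs[t]]
    return logsumexp(log_alpha)
```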

Slide 33/33: References

    • X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice-Hall, 2001.
    • F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
    • L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.

