Hidden Markov Model - ASR

Slide 1/33: Hidden Markov Modelling

    • Introduction
    • Problem formulation
    • Forward-Backward algorithm
    • Viterbi search
    • Baum-Welch parameter estimation
    • Other considerations: multiple observation sequences, phone-based models for continuous speech recognition, continuous density HMMs, implementation issues

    6.345 Automatic Speech Recognition, Lecture # 10, Session 2003

Slide 2/33: Information Theoretic Approach to ASR

    [Figure: ASR as a noisy communication channel. Speech generation (text generation producing W, then speech) feeds the channel; speech recognition consists of an acoustic processor (producing the acoustic evidence A) followed by a linguistic decoder (recovering W).]

    Recognition is achieved by maximizing the probability of the linguistic string, W, given the acoustic evidence, A, i.e., choose the linguistic sequence \hat{W} such that

        P(\hat{W} \mid A) = \max_{W} P(W \mid A)

Slide 3/33: Information Theoretic Approach to ASR (cont'd)

    From Bayes' rule:

        P(W \mid A) = \frac{P(A \mid W) \, P(W)}{P(A)}

    Hidden Markov modelling (HMM) deals with the quantity P(A \mid W). Change in notation:

        A \rightarrow O, \qquad W \rightarrow \lambda, \qquad P(A \mid W) \rightarrow P(O \mid \lambda)

Slide 4/33: HMM: An Example

    [Figure: three mugs of numbered state stones and two urns of black and white balls.]

    • Consider 3 mugs, each with mixtures of state stones, 1 and 2. The fractions for the i-th mug are a_{i1} and a_{i2}, and a_{i1} + a_{i2} = 1.
    • Consider 2 urns, each with mixtures of black and white balls. The fractions for the i-th urn are b_i(B) and b_i(W); b_i(B) + b_i(W) = 1.
    • The parameter vector for this model is:

        \lambda = \{a_{01}, a_{02}, a_{11}, a_{12}, a_{21}, a_{22}, b_1(B), b_1(W), b_2(B), b_2(W)\}

Slide 5/33: HMM: An Example (cont'd)

    [Figure: six successive draws from the mug-and-urn model, showing the hidden state chosen at each step and the ball observed.]

    Observation sequence: O = \{B, W, B, W, W, B\}
    State sequence:       Q = \{1, 1, 2, 1, 2, 1\}

    Goal: Given the model and the observation sequence O, how can the underlying state sequence Q be determined?

Slide 6/33: Elements of a Discrete Hidden Markov Model

    • N: number of states in the model
        states: s = \{s_1, s_2, \ldots, s_N\}; state at time t: q_t \in s
    • M: number of observation symbols (i.e., discrete observations)
        observation symbols: v = \{v_1, v_2, \ldots, v_M\}; observation at time t: o_t \in v
    • A = \{a_{ij}\}: state transition probability distribution
        a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i), \quad 1 \le i, j \le N
    • B = \{b_j(k)\}: observation symbol probability distribution in state j
        b_j(k) = P(v_k \text{ at } t \mid q_t = s_j), \quad 1 \le j \le N, \; 1 \le k \le M
    • \pi = \{\pi_i\}: initial state distribution
        \pi_i = P(q_1 = s_i), \quad 1 \le i \le N

    Notationally, an HMM is typically written as \lambda = \{A, B, \pi\}.
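To make the notation concrete, here is a minimal container for \lambda = \{A, B, \pi\} (a sketch of my own, not from the slides; the NumPy array shapes are my assumptions). The sketches after later slides assume this layout.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    """lambda = {A, B, pi} for a discrete HMM with N states and M symbols."""
    pi: np.ndarray  # (N,)   initial state distribution, pi_i = P(q_1 = s_i)
    A: np.ndarray   # (N, N) transition probabilities, a_ij = P(q_{t+1} = s_j | q_t = s_i)
    B: np.ndarray   # (N, M) emission probabilities, b_j(k) = P(v_k | q_t = s_j)
```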

Slide 7/33: HMM: An Example (cont'd)

    For our simple example:

        \pi = \{a_{01}, a_{02}\}, \quad
        A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad
        B = \begin{pmatrix} b_1(B) & b_1(W) \\ b_2(B) & b_2(W) \end{pmatrix}

    [Figure: 2-state and 3-state state diagrams; each state i carries its output distribution \{b_i(B), b_i(W)\} and transition arcs a_{ij}.]

Slide 8/33: Generation of HMM Observations

    1. Choose an initial state, q_1 = s_i, based on the initial state distribution, \pi
    2. For t = 1 to T:
        • Choose o_t = v_k according to the symbol probability distribution in state s_i, b_i(k)
        • Transition to a new state q_{t+1} = s_j according to the state transition probability distribution for state s_i, a_{ij}
    3. Increment t by 1; return to step 2 if t \le T, else terminate

    [Figure: chain q_1 \rightarrow q_2 \rightarrow \cdots \rightarrow q_T with transition probabilities a_{0i}, a_{ij} and emissions o_1, o_2, \ldots, o_T drawn from b_i(k).]
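A minimal sketch of this generation procedure, assuming the NumPy layout above; the function name `generate` is illustrative, not from the slides.

```python
import numpy as np

def generate(pi, A, B, T, seed=None):
    """Sample a state sequence and an observation sequence of length T."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    q = rng.choice(N, p=pi)                  # step 1: q_1 ~ pi
    states, obs = [], []
    for _ in range(T):                       # step 2, for t = 1..T
        states.append(q)
        obs.append(rng.choice(M, p=B[q]))    # o_t = v_k with probability b_q(k)
        q = rng.choice(N, p=A[q])            # q_{t+1} = s_j with probability a_qj
    return states, obs
```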

Slide 9/33: Representing a State Diagram by a Trellis

    [Figure: trellis with states s_1, s_2, s_3 on the vertical axis and time 0 to 4 on the horizontal axis; arcs connect states at successive times.]

    The dashed line represents a null transition, where no observation symbol is generated.

Slide 10/33: Three Basic HMM Problems

    1. Scoring: Given an observation sequence O = \{o_1, o_2, \ldots, o_T\} and a model \lambda = \{A, B, \pi\}, how do we compute P(O \mid \lambda), the probability of the observation sequence?
        ==> The Forward-Backward Algorithm
    2. Matching: Given an observation sequence O = \{o_1, o_2, \ldots, o_T\}, how do we choose a state sequence Q = \{q_1, q_2, \ldots, q_T\} which is optimum in some sense?
        ==> The Viterbi Algorithm
    3. Training: How do we adjust the model parameters \lambda = \{A, B, \pi\} to maximize P(O \mid \lambda)?
        ==> The Baum-Welch Re-estimation Procedures

Slide 11/33: Computation of P(O | λ)

        P(O \mid \lambda) = \sum_{\text{all } Q} P(O, Q \mid \lambda), \qquad
        P(O, Q \mid \lambda) = P(O \mid Q, \lambda) \, P(Q \mid \lambda)

    Consider the fixed state sequence Q = q_1 q_2 \ldots q_T:

        P(O \mid Q, \lambda) = b_{q_1}(o_1) \, b_{q_2}(o_2) \cdots b_{q_T}(o_T)
        P(Q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}

    Therefore:

        P(O \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1} b_{q_1}(o_1) \, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)

    The calculation requires about 2T \cdot N^T operations (there are N^T such sequences). For N = 5, T = 100: 2 \cdot 100 \cdot 5^{100} \approx 10^{72} computations!

Slide 12/33: The Forward Algorithm

    Let us define the forward variable, \alpha_t(i), as the probability of the partial observation sequence up to time t and state s_i at time t, given the model, i.e.,

        \alpha_t(i) = P(o_1 o_2 \ldots o_t, q_t = s_i \mid \lambda)

    It can easily be shown that:

        \alpha_1(i) = \pi_i \, b_i(o_1), \quad 1 \le i \le N
        P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)

    By induction:

        \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(o_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N

    The calculation is on the order of N^2 T. For N = 5, T = 100: on the order of 100 \cdot 5^2 = 2500 computations, instead of 10^{72}.
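A direct transcription of this recursion into NumPy, under the same assumed layout (a sketch, not the lecture's code); each loop iteration computes \alpha_{t+1}(j) = [\sum_i \alpha_t(i) a_{ij}] \, b_j(o_{t+1}) as one vectorized step.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns alpha (T x N) and P(O | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # alpha_1(i) = pi_i b_i(o_1)
    for t in range(1, T):                         # induction over time
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                 # P(O|lambda) = sum_i alpha_T(i)
```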

Slide 13/33: Forward Algorithm Illustration

    [Figure: trellis from time 1 to T; \alpha_{t+1}(j) collects \alpha_t(i) from all states s_1, \ldots, s_N through the arcs a_{1j}, \ldots, a_{Nj}.]

Slide 14/33: The Backward Algorithm

    Similarly, let us define the backward variable, \beta_t(i), as the probability of the partial observation sequence from time t+1 to the end, given state s_i at time t and the model, i.e.,

        \beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = s_i, \lambda)

    It can easily be shown that:

        \beta_T(i) = 1, \quad 1 \le i \le N
        P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i \, b_i(o_1) \, \beta_1(i)

    By induction:

        \beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \; 1 \le i \le N
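The matching backward pass, again a sketch under the same assumed array layout as the forward sketch above.

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: returns beta (T x N)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                        # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                # t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```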

Slide 15/33: Backward Procedure Illustration

    [Figure: trellis from time 1 to T; \beta_t(i) collects \beta_{t+1}(j) from all successor states through the arcs a_{i1}, \ldots, a_{iN}.]

Slide 16/33: Finding Optimal State Sequences

    One criterion chooses the states, q_t, which are individually most likely; this maximizes the expected number of correct states. Let us define \gamma_t(i) as the probability of being in state s_i at time t, given the observation sequence and the model, i.e.,

        \gamma_t(i) = P(q_t = s_i \mid O, \lambda), \qquad \sum_{i=1}^{N} \gamma_t(i) = 1 \;\; \forall t

    Then the individually most likely state, q_t, at time t is:

        q_t = \arg\max_{1 \le i \le N} \gamma_t(i), \quad 1 \le t \le T

    Note that it can be shown that:

        \gamma_t(i) = \frac{\alpha_t(i) \, \beta_t(i)}{P(O \mid \lambda)}
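Reusing the forward and backward sketches above, the state posteriors and the individually most likely states follow in a few lines (again a sketch, with illustrative names):

```python
def posteriors(pi, A, B, obs):
    """gamma_t(i) = alpha_t(i) beta_t(i) / P(O | lambda), plus argmax states."""
    alpha, prob = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    gamma = alpha * beta / prob
    return gamma, gamma.argmax(axis=1)   # individually most likely state per frame
```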

Slide 17/33: Finding Optimal State Sequences (cont'd)

    The individual optimality criterion has the problem that the optimum state sequence may not obey state transition constraints. Another optimality criterion is to choose the state sequence which maximizes P(Q, O \mid \lambda); this can be found by the Viterbi algorithm.

    Let us define \delta_t(i) as the highest probability along a single path, at time t, which accounts for the first t observations, i.e.,

        \delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \ldots q_{t-1}, q_t = s_i, o_1 o_2 \ldots o_t \mid \lambda)

    By induction:

        \delta_{t+1}(j) = \big[ \max_i \delta_t(i) \, a_{ij} \big] \, b_j(o_{t+1})

    To retrieve the state sequence, we must keep track of the state which gave the best path, at time t, to state s_i. We do this in a separate array, \psi_t(i).

Slide 18/33: The Viterbi Algorithm

    1. Initialization:
        \delta_1(i) = \pi_i \, b_i(o_1), \quad 1 \le i \le N; \qquad \psi_1(i) = 0
    2. Recursion:
        \delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(o_t), \quad 2 \le t \le T, \; 1 \le j \le N
        \psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}], \quad 2 \le t \le T, \; 1 \le j \le N
    3. Termination:
        P^* = \max_{1 \le i \le N} [\delta_T(i)], \qquad q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]
    4. Path (state-sequence) backtracking:
        q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1

    Computation is on the order of N^2 T.
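A sketch of these four steps in NumPy, for a model without null transitions (the trellis on slide 9 allows null transitions, but this statement of the algorithm does not treat them):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi algorithm: returns the best state path and its probability."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                 # 1. initialization
    for t in range(1, T):                        # 2. recursion
        scores = delta[t - 1][:, None] * A       #    delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]             # 3. termination
    for t in range(T - 1, 0, -1):                # 4. backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
```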

Slide 19/33: The Viterbi Algorithm: An Example

    [Figure: a three-state model (s_1, s_2, s_3) with null transitions, its transition probabilities and per-state output probabilities P(a), P(b), and the trellis of path scores for the input O = \{a\; a\; b\; b\}. The resulting scores are tabulated on the next slide.]

Slide 20/33: The Viterbi Algorithm: An Example (cont'd)

    Each cell lists the candidate extensions (previous state, emitted symbol or null transition "0") with their scores; the best one is kept and its predecessor recorded:

            0           a            aa            aab            aabb
    s1      1.0         s1,a .4      s1,a .16      s1,b .016      s1,b .0016
                        s1,0 .08     s1,0 .032     s1,0 .0032     s1,0 .00032
    s2      s1,0 .2     s1,a .21     s1,a .084     s1,b .0144     s1,b .00144
                        s2,a .04     s2,a .042     s2,b .0168     s2,b .00336
                        s2,0 .021    s2,0 .0084    s2,0 .00168    s2,0 .000336
    s3      s2,0 .02    s2,a .03     s2,a .0315    s2,b .0294     s2,b .00588

    Best path scores \delta_t(i) in the trellis:

            0       a       aa       aab       aabb
    s1      1.0     0.4     0.16     0.016     0.0016
    s2      0.2     0.21    0.084    0.0168    0.00336
    s3      0.02    0.03    0.0315   0.0294    0.00588

Slide 21/33: Matching Using the Forward-Backward Algorithm

    Same example, but candidate scores are summed rather than maximized:

            0           a            aa            aab            aabb
    s1      1.0         s1,a .4      s1,a .16      s1,b .016      s1,b .0016
                        s1,0 .08     s1,0 .032     s1,0 .0032     s1,0 .00032
    s2      s1,0 .2     s1,a .21     s1,a .084     s1,b .0144     s1,b .00144
                        s2,a .04     s2,a .066     s2,b .0364     s2,b .0108
                        s2,0 .033    s2,0 .0182    s2,0 .0054     s2,0 .001256
    s3      s2,0 .02    s2,a .03     s2,a .0495    s2,b .0637     s2,b .0189

    Forward scores \alpha_t(i) in the trellis:

            0       a       aa       aab       aabb
    s1      1.0     0.4     0.16     0.016     0.0016
    s2      0.2     0.33    0.182    0.0540    0.01256
    s3      0.02    0.063   0.0677   0.0691    0.020156

Slide 22/33: Baum-Welch Re-estimation

    Baum-Welch re-estimation uses EM to determine ML parameters. Define \xi_t(i, j) as the probability of being in state s_i at time t and state s_j at time t+1, given the model and observation sequence:

        \xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j \mid O, \lambda)

    Then:

        \xi_t(i, j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}{P(O \mid \lambda)}, \qquad
        \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)

    Summing \gamma_t(i) and \xi_t(i, j) over time, we get:

        \sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from } s_i
        \sum_{t=1}^{T-1} \xi_t(i, j) = \text{expected number of transitions from } s_i \text{ to } s_j

Slide 23/33: Baum-Welch Re-estimation Procedures

    [Figure: trellis segment from time t-1 to t+2 highlighting \alpha_t(i), the arc a_{ij}, and \beta_{t+1}(j).]

Slide 24/33: Baum-Welch Re-estimation Formulas

        \bar{\pi}_i = \text{expected number of times in state } s_i \text{ at } t = 1 = \gamma_1(i)

        \bar{a}_{ij} = \frac{\text{expected number of transitions from } s_i \text{ to } s_j}{\text{expected number of transitions from } s_i}
                     = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}

        \bar{b}_j(k) = \frac{\text{expected number of times in } s_j \text{ with symbol } v_k}{\text{expected number of times in } s_j}
                     = \frac{\sum_{\substack{t=1 \\ o_t = v_k}}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
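One EM iteration for a single observation sequence, as a sketch reusing the `forward` and `backward` functions above (the vectorized \xi computation and helper names are my own, not the lecture's):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch (EM) re-estimation step for one observation sequence."""
    alpha, prob = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    gamma = alpha * beta / prob                      # gamma_t(i)
    # xi[t,i,j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                      # sum gamma where o_t = v_k
        new_B[:, k] = gamma[np.asarray(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```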

Slide 25/33: Baum-Welch Re-estimation Formulas (cont'd)

    If \lambda = (A, B, \pi) is the initial model, and \bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi}) is the re-estimated model, then it can be proved that either:
    1. The initial model, \lambda, defines a critical point of the likelihood function, in which case \bar{\lambda} = \lambda, or
    2. Model \bar{\lambda} is more likely than \lambda in the sense that P(O \mid \bar{\lambda}) > P(O \mid \lambda), i.e., we have found a new model from which the observation sequence is more likely to have been produced.

    Thus we can improve the probability of O being observed from the model if we iteratively use \bar{\lambda} in place of \lambda and repeat the re-estimation until some limiting point is reached. The resulting model is called the maximum likelihood HMM.

Slide 26/33: Multiple Observation Sequences

    Speech recognition typically uses left-to-right HMMs. These HMMs cannot be trained using a single observation sequence, because only a small number of observations are available to train each state. To obtain reliable estimates of model parameters, one must use multiple observation sequences. In this case, the re-estimation procedure needs to be modified.

    Let us denote the set of K observation sequences as

        O = \{O^{(1)}, O^{(2)}, \ldots, O^{(K)}\}

    where O^{(k)} = \{o_1^{(k)}, o_2^{(k)}, \ldots, o_{T_k}^{(k)}\} is the k-th observation sequence. Assuming that the observation sequences are mutually independent, we want to estimate the parameters so as to maximize

        P(O \mid \lambda) = \prod_{k=1}^{K} P(O^{(k)} \mid \lambda) = \prod_{k=1}^{K} P_k

Slide 27/33: Multiple Observation Sequences (cont'd)

    Since the re-estimation formulas are based on frequencies of occurrence of various events, we can modify them by adding up the individual frequencies of occurrence for each sequence:

        \bar{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^k(i, j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(i)}
                     = \frac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^k(i) \, a_{ij} \, b_j(o_{t+1}^{(k)}) \, \beta_{t+1}^k(j)}{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k - 1} \alpha_t^k(i) \, \beta_t^k(i)}

        \bar{b}_j(\ell) = \frac{\sum_{k=1}^{K} \sum_{t:\, o_t^{(k)} = v_\ell} \gamma_t^k(j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k} \gamma_t^k(j)}
                        = \frac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t:\, o_t^{(k)} = v_\ell} \alpha_t^k(j) \, \beta_t^k(j)}{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k} \alpha_t^k(j) \, \beta_t^k(j)}
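A sketch of the pooled update, reusing the earlier helpers. Because \gamma and \xi are already normalized by P_k, summing their numerators and denominators across sequences implements the formulas above; \pi is typically fixed for left-to-right models, so only A and B are pooled here.

```python
import numpy as np

def baum_welch_multi_step(pi, A, B, sequences):
    """One re-estimation step pooling counts over K independent sequences."""
    num_A = np.zeros_like(A); den_A = np.zeros(len(pi))
    num_B = np.zeros_like(B); den_B = np.zeros(len(pi))
    for obs in sequences:
        alpha, prob = forward(pi, A, B, obs)
        beta = backward(A, B, obs)
        gamma = alpha * beta / prob                  # includes the 1/P_k factor
        xi = (alpha[:-1, :, None] * A[None] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
        num_A += xi.sum(axis=0); den_A += gamma[:-1].sum(axis=0)
        for k in range(B.shape[1]):
            num_B[:, k] += gamma[np.asarray(obs) == k].sum(axis=0)
        den_B += gamma.sum(axis=0)
    return num_A / den_A[:, None], num_B / den_B[:, None]
```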

Slide 28/33: Phone-based HMMs

    Word-based HMMs are appropriate for small vocabulary speech recognition. For large vocabulary ASR, sub-word-based (e.g., phone-based) models are more appropriate.

Slide 29/33: Phone-based HMMs (cont'd)

    The phone models can have many states, and words are made up from a concatenation of phone models.

Slide 30/33: Continuous Density Hidden Markov Models

    A continuous density HMM replaces the discrete observation probabilities, b_j(k), by a continuous PDF, b_j(\mathbf{x}). A common practice is to represent b_j(\mathbf{x}) as a mixture of Gaussians:

        b_j(\mathbf{x}) = \sum_{k=1}^{M} c_{jk} \, \mathcal{N}[\mathbf{x}, \boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk}], \quad 1 \le j \le N

    where c_{jk} is the mixture weight, with

        c_{jk} \ge 0 \;\; (1 \le j \le N, \; 1 \le k \le M) \quad \text{and} \quad \sum_{k=1}^{M} c_{jk} = 1 \;\; (1 \le j \le N),

    \mathcal{N} is the normal density, and \boldsymbol{\mu}_{jk} and \boldsymbol{\Sigma}_{jk} are the mean vector and covariance matrix associated with state j and mixture k.
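A sketch of this mixture density for one state, using SciPy's multivariate normal (the parameter layout is my assumption):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_emission(x, c_j, mu_j, sigma_j):
    """b_j(x) = sum_k c_jk N[x, mu_jk, Sigma_jk] for a single state j.

    c_j: (M,) mixture weights; mu_j: (M, D) means; sigma_j: (M, D, D) covariances.
    """
    return sum(c * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for c, mu, cov in zip(c_j, mu_j, sigma_j))
```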

Slide 31/33: Acoustic Modelling Variations

    • Semi-continuous HMMs first compute a VQ codebook of size M. The VQ codebook is then modelled as a family of Gaussian PDFs: each codeword is represented by a Gaussian PDF, and may be used together with others to model the acoustic vectors. From the CD-HMM viewpoint, this is equivalent to using the same set of M mixtures to model all the states; it is therefore often referred to as a Tied Mixture HMM.
    • All three methods have been used in many speech recognition tasks, with varying outcomes.
    • For large-vocabulary, continuous speech recognition with a sufficient amount (i.e., tens of hours) of training data, CD-HMM systems currently yield the best performance, but with a considerable increase in computation.

Slide 32/33: Implementation Issues

    • Scaling: to prevent underflow
    • Segmental K-means training: to train observation probabilities by first performing Viterbi alignment
    • Initial estimates of \lambda: to provide robust models
    • Pruning: to reduce search computation
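On the first point, a log-domain forward pass is one common way to prevent underflow; it is an alternative to the per-frame scaling the slide refers to, whose exact scheme is not shown here (this sketch reuses the array layout assumed earlier):

```python
import numpy as np
from scipy.special import logsumexp

def forward_log(pi, A, B, obs):
    """Log-domain forward pass: returns log P(O | lambda) without underflow."""
    # Zero probabilities map to -inf, which logsumexp handles correctly.
    with np.errstate(divide="ignore"):
        log_A, log_B, log_pi = np.log(A), np.log(B), np.log(pi)
    log_alpha = log_pi + log_B[:, obs[0]]
    for t in range(1, len(obs)):
        # log alpha_{t}(j) = logsumexp_i(log alpha_{t-1}(i) + log a_ij) + log b_j(o_t)
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[:, obs[t]]
    return logsumexp(log_alpha)
```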

Slide 33/33: References

    • X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice-Hall, 2001.
    • F. Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
    • L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.

