Hidden Markov Models

Page 1: Hidden Markov Models

Hidden Markov Models

IP notice: slides from Dan Jurafsky

Page 2: Hidden Markov Models

Outline

Markov Chains
Hidden Markov Models
Three Algorithms for HMMs
• The Forward Algorithm
• The Viterbi Algorithm
• The Baum-Welch Algorithm (EM)

Applications:
• The Ice Cream Task
• Part-of-Speech Tagging

Page 3: Hidden Markov Models

Chomsky Grammars

Distinguish grammatical English from ungrammatical English:
John thinks Sara hit the boy
* The hit thinks Sara John boy
John thinks the boy was hit by Sara
Who does John think Sara hit?
John thinks Sara hit the boy and the girl
* Who does John think Sara hit the boy and?
John thinks Sara hit the boy with the bat
What does John think Sara hit the boy with?
Colorless green ideas sleep furiously.
* Green sleep furiously ideas colorless.

Page 4: Hidden Markov Models

Acceptors and Transformers

Chomsky's grammars are about which utterances are acceptable
Other research programs are aimed at transforming utterances:
Translate an English sentence into Japanese…
Transform a speech waveform into transcribed words…
Compress a sentence, summarize a text…
Transform a syntactic analysis into a semantic analysis…
Generate a text from a semantic representation…

Page 5: Hidden Markov Models

Strings and Trees

Early on, trees were realized to be a useful tool in describing what is grammatical:
A sentence is a noun phrase (NP) followed by a verb phrase (VP)
A noun phrase is a determiner (DT) followed by a noun (NN)
A noun phrase is a noun phrase (NP) followed by a prepositional phrase (PP)
A PP is a preposition (IN) followed by an NP

A string is acceptable if it has an acceptable tree …
Transformations may take place at the tree level …

[Parse-tree sketch: S → NP VP; NP → NP PP; PP → IN NP]

Page 6: Hidden Markov Models

Natural Language Processing

1980s: Many tree-based grammatical formalisms
1990s: Regression to string-based formalisms
Hidden Markov Models (HMMs), Finite-State Acceptors (FSAs) and Transducers (FSTs)
N-gram models for accepting sentences [e.g., Jelinek 90]
Taggers and other statistical transformations [e.g., Church 88]
Machine translation [e.g., Brown et al. 93]
Software toolkits implementing generic weighted FST operations [e.g., Mohri, Pereira & Riley 00]

Page 7: Hidden Markov Models

Natural Language Processing

2000s: Emerging interest in tree-based probabilistic models
Machine translation [Wu 97, Yamada & Knight 02, Melamed 03, Chiang 05, …]
Summarization [Knight & Marcu 00, …]
Paraphrasing [Pang et al. 03, …]
Question answering [Echihabi & Marcu 03, …]
Natural language generation [Bangalore & Rambow 00, …]

Page 8: Hidden Markov Models

FSAs and FSTs

[Figure-only slide]

Page 9: Hidden Markov Models

Finite-State Transducer (FST)

[Diagram: a transducer with states q, q2, q3, q4, qfinal and arcs labeled k:ε, n:N, i:AY, g:ε, h:ε, t:T. Original input: k n i g h t. The transformation is shown step by step on the following slides.]

Pages 10–15: Finite-State (String) Transducer

[Diagram, animated across these slides: the transducer reads the input k n i g h t one symbol at a time, stepping through the states q, q2, q3, q4, qfinal and emitting N (for n), AY (for i), and T (for t), with k, g, and h mapped to ε. The original input "knight" is thereby transformed into the output "N AY T".]
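As a quick illustration, here is a minimal Python sketch (not from the slides) of a transducer like the one pictured. The state names and arc labels follow the diagram, but the exact wiring of the ε-arcs for g and h is an assumption.

```python
# A dict of (state, input) -> (next_state, output) encodes the arcs.
# Arc labels follow the diagram; the g/h wiring is a guess (self-loops on q4).
ARCS = {
    ("q", "k"): ("q2", ""),        # k : epsilon
    ("q2", "n"): ("q3", "N"),      # n : N
    ("q3", "i"): ("q4", "AY"),     # i : AY
    ("q4", "g"): ("q4", ""),       # g : epsilon
    ("q4", "h"): ("q4", ""),       # h : epsilon
    ("q4", "t"): ("qfinal", "T"),  # t : T
}

def transduce(word, start="q", final="qfinal"):
    """Run the transducer over `word`; return the output symbols or None if rejected."""
    state, output = start, []
    for ch in word:
        if (state, ch) not in ARCS:
            return None                     # no matching arc: input rejected
        state, sym = ARCS[(state, ch)]
        if sym:
            output.append(sym)
    return output if state == final else None

print(transduce("knight"))   # ['N', 'AY', 'T']
```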

Page 16: Hidden Markov Models

Definitions

A weighted finite-state automaton (WFSA):
An FSA with probabilities on the arcs
The probabilities leaving any state must sum to one

A Markov chain (or observable Markov model):
A special case of a WFSA in which the input sequence uniquely determines which states the automaton will go through

Markov chains can't represent inherently ambiguous problems
Useful for assigning probabilities to unambiguous sequences

Page 17: Hidden Markov Models

Weighted Finite-State Transducer

FST: an FSA whose state transitions are labeled with both input and output symbols.
A weighted transducer puts weights on transitions in addition to the input and output symbols.
Weights may encode probabilities, durations, penalties, ...
Used in speech recognition.
Tutorial at http://www.cs.nyu.edu/~mohri/pub/hbka.pdf

Page 18: Hidden Markov Models

Markov chain for weather

Page 19: Hidden Markov Models

Markov chain for words

Page 20: Hidden Markov Models

Markov chain = "first-order observable Markov model"

A set of states Q: q1, q2, …, qN; the state at time t is qt
Transition probabilities: a set of probabilities A = a01 a02 … an1 … ann
Each aij represents the probability of transitioning from state i to state j
Distinguished start and end states

$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$

$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$
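To make the row-sum constraint concrete, here is a minimal sketch (with illustrative numbers, not those in the figure) of a transition matrix stored as nested dicts, plus a check that each row sums to one.

```python
# Illustrative 3-state weather chain with transition probabilities A[i][j] = a_ij.
# Per the constraint above, every row of A must sum to 1.
Q = ["cold", "hot", "warm"]                       # q1, q2, q3

A = {
    "cold": {"cold": 0.5, "hot": 0.2, "warm": 0.3},
    "hot":  {"cold": 0.2, "hot": 0.5, "warm": 0.3},
    "warm": {"cold": 0.1, "hot": 0.3, "warm": 0.6},
}

for i in Q:
    row_sum = sum(A[i].values())
    assert abs(row_sum - 1.0) < 1e-9, f"row {i} sums to {row_sum}, not 1"
```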

Page 21: Hidden Markov Models

Markov chain = "first-order observable Markov model"

Markov assumption: the current state depends only on the previous state

$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$

Page 22: Hidden Markov Models

Another representation for the start state

Instead of a start state: a special initial probability vector π
An initial distribution over the probability of start states

Constraints:

$\pi_i = P(q_1 = i), \quad 1 \le i \le N$

$\sum_{j=1}^{N} \pi_j = 1$

Page 23: Hidden Markov Models

The weather model, using π

Page 24: Hidden Markov Models

The weather model: specific example

Page 25: Hidden Markov Models

Markov chain for weather

What is the probability of 4 consecutive warm days?
The sequence is warm-warm-warm-warm, i.e., the state sequence is 3-3-3-3

$P(3, 3, 3, 3) = \pi_3\, a_{33}\, a_{33}\, a_{33} = 0.2 \times (0.6)^3 = 0.0432$
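The same arithmetic in Python, using the values given on the slide:

```python
# P(warm, warm, warm, warm) = pi_warm * a_warm_warm ** 3
pi_warm, a_warm_warm = 0.2, 0.6
p = pi_warm * a_warm_warm ** 3
print(round(p, 4))   # 0.0432
```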

Page 26: Hidden Markov Models

How about?

Hot hot hot hot
Cold hot cold hot

What does the difference in these probabilities tell you about the real-world weather info encoded in the figure?

Page 27: Hidden Markov Models

HMM for Ice Cream

You are a climatologist in the year 2799, studying global warming.
You can't find any records of the weather in Baltimore, MD for the summer of 2008.
But you find Jason Eisner's diary, which lists how many ice creams Jason ate every day that summer.
Our job: figure out how hot it was.

Page 28: Hidden Markov Models

Hidden Markov Model

For Markov chains, the output symbols are the same as the states:
See hot weather → we're in state hot

But in named-entity or part-of-speech tagging (and speech recognition and other things):
The output symbols are words
But the hidden states are something else
• Part-of-speech tags
• Named-entity tags

So we need an extension!
A Hidden Markov Model is an extension of a Markov chain in which the input symbols are not the same as the states.
This means we don't know which state we are in.

Page 29: Hidden Markov Models

Hidden Markov Models

Page 30: Hidden Markov Models

Assumptions

Markov assumption:

$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$

Output-independence assumption:

$P(o_t \mid O_1^{t-1}, q_1^{t}) = P(o_t \mid q_t)$

Page 31: Hidden Markov Models

Eisner task

Given: ice-cream observation sequence: 1, 2, 3, 2, 2, 2, 3, …
Produce: weather sequence: H, C, H, H, H, C, …

Page 32: Hidden Markov Models

HMM for ice cream

Page 33: Hidden Markov Models

Different types of HMM structure

Bakis = left-to-right Ergodic = fully-connected

Page 34: Hidden Markov Models

The Three Basic Problems for HMMs

Problem 1 (Evaluation): Given the observation sequence O = (o1 o2 … oT) and an HMM model λ = (A, B), how do we efficiently compute P(O | λ), the probability of the observation sequence given the model?

Problem 2 (Decoding): Given the observation sequence O = (o1 o2 … oT) and an HMM model λ = (A, B), how do we choose a corresponding state sequence Q = (q1 q2 … qT) that is optimal in some sense (i.e., best explains the observations)?

Problem 3 (Learning): How do we adjust the model parameters λ = (A, B) to maximize P(O | λ)?

Jack Ferguson at IDA in the 1960s

Page 35: Hidden Markov Models

Problem 1: computing the observation likelihood

Given the following HMM:
How likely is the sequence 3 1 3?

Page 36: Hidden Markov Models

How to compute likelihood

For a Markov chain, we just follow the states 3 1 3 and multiply the probabilities
But for an HMM, we don't know what the states are!

So let's start with a simpler situation:
Computing the observation likelihood for a given hidden state sequence
Suppose we knew the weather and wanted to predict how much ice cream Jason would eat,
i.e. P(3 1 3 | H H C)
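The equations on the next two slides were figures that did not survive the transcript; under the Markov and output-independence assumptions stated earlier (and writing the start state explicitly), they show the standard factorization:

```latex
% Likelihood of 3 1 3 given the known state sequence H H C
% (output independence: each observation depends only on its state)
P(3\,1\,3 \mid H\,H\,C) = P(3 \mid H)\, P(1 \mid H)\, P(3 \mid C)

% Joint probability of the observations and the state sequence
% (Markov assumption for the transitions)
P(3\,1\,3,\ H\,H\,C) = P(H \mid \text{start})\, P(H \mid H)\, P(C \mid H)
                       \,\cdot\, P(3 \mid H)\, P(1 \mid H)\, P(3 \mid C)
```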

Page 37: Hidden Markov Models

Computing the likelihood of 3 1 3 given a hidden state sequence

Page 38: Hidden Markov Models

Computing the joint probability of the observation and state sequence

Page 39: Hidden Markov Models

Computing the total likelihood of 3 1 3

We would need to sum over:
Hot hot cold
Hot hot hot
Hot cold hot
….

How many possible hidden state sequences are there for this sequence?
How about in general for an HMM with N hidden states and a sequence of T observations? N^T (here, 2^3 = 8)

So we can't just do a separate computation for each hidden state sequence.

Page 40: Hidden Markov Models

Instead: the Forward algorithm

A kind of dynamic programming algorithm
Just like Minimum Edit Distance
Uses a table to store intermediate values

Idea:
Compute the likelihood of the observation sequence
By summing over all possible hidden state sequences
But doing this efficiently
• By folding all the sequences into a single trellis

Page 41: Hidden Markov Models

The forward algorithm

The goal of the forward algorithm is to compute

$P(o_1, o_2, \ldots, o_T, q_T = q_F \mid \lambda)$

We'll do this by recursion

Page 42: Hidden Markov Models

The forward algorithm

Each cell of the forward algorithm trellis, $\alpha_t(j)$:
Represents the probability of being in state j
After seeing the first t observations
Given the automaton

Each cell thus expresses the following probability:

$\alpha_t(j) = P(o_1, o_2, \ldots, o_t, q_t = j \mid \lambda)$
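Since the recursion, trellis, and algorithm figures on the next slides did not survive the transcript, here is a minimal Python sketch of the forward recursion, $\alpha_t(j) = \sum_i \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)$. The dict-based parameter layout and the example numbers are illustrative assumptions, not the figure's values.

```python
def forward(obs, states, pi, A, B):
    """Total observation likelihood P(O | lambda) via the forward recursion.
    alpha[t][j] = P(o_1 .. o_t, q_t = j | lambda); pi, A, B are plain dicts.
    A distinguished final state is omitted for simplicity."""
    alpha = [{j: pi[j] * B[j][obs[0]] for j in states}]            # initialization
    for t in range(1, len(obs)):                                   # recursion
        alpha.append({j: sum(alpha[t - 1][i] * A[i][j] for i in states) * B[j][obs[t]]
                      for j in states})
    return sum(alpha[-1][j] for j in states)                       # termination

# Illustrative (made-up) ice-cream parameters: H = hot day, C = cold day
states = ["H", "C"]
pi = {"H": 0.8, "C": 0.2}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
print(forward([3, 1, 3], states, pi, A, B))
```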

Page 43: Hidden Markov Models

The Forward Recursion

Page 44: Hidden Markov Models

The Forward Trellis

Page 45: Hidden Markov Models

We update each cell

Page 46: Hidden Markov Models

The Forward Algorithm

Page 47: Hidden Markov Models

Decoding

Given an observation sequence: 3 1 3
And an HMM
The task of the decoder: to find the best hidden state sequence

Given the observation sequence O = (o1 o2 … oT) and an HMM model λ = (A, B), how do we choose a corresponding state sequence Q = (q1 q2 … qT) that is optimal in some sense (i.e., best explains the observations)?

Page 48: Hidden Markov Models

Decoding

One possibility:
For each hidden state sequence Q
• HHH, HHC, HCH, …
Compute P(O | Q)
Pick the highest one

Why not? N^T

Instead:
The Viterbi algorithm
Is again a dynamic programming algorithm
Uses a trellis similar to the Forward algorithm's

Page 49: Hidden Markov Models

Viterbi intuition

We want to compute the joint probability of the observation sequence together with the best state sequence:

$\max_{q_0, q_1, \ldots, q_T} P(q_0, q_1, \ldots, q_T, o_1, o_2, \ldots, o_T, q_T = q_F \mid \lambda)$

Page 50: Hidden Markov Models

Viterbi Recursion

Page 51: Hidden Markov Models

The Viterbi trellis

Page 52: Hidden Markov Models

Viterbi intuition

Process the observation sequence left to right
Filling out the trellis
Each cell:

$v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$
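A minimal Python sketch of this recursion with backpointers, using the same dict layout assumed in the forward sketch above (illustrative, not the slide's pseudocode):

```python
def viterbi(obs, states, pi, A, B):
    """Best hidden state sequence via v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t),
    keeping backpointers so the best path can be traced right to left."""
    v = [{j: pi[j] * B[j][obs[0]] for j in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        v.append({})
        backptr.append({})
        for j in states:
            best_i = max(states, key=lambda i: v[t - 1][i] * A[i][j])
            v[t][j] = v[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
            backptr[t][j] = best_i
    last = max(states, key=lambda j: v[-1][j])          # best final state
    path = [last]
    for t in range(len(obs) - 1, 0, -1):                # follow the backpointers
        path.append(backptr[t][path[-1]])
    return list(reversed(path)), v[-1][last]

# e.g. viterbi([3, 1, 3], states, pi, A, B) with the parameters from the forward sketch
```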

Page 53: Hidden Markov Models

Viterbi Algorithm

Page 54: Hidden Markov Models

Viterbi backtrace

Page 55: Hidden Markov Models

Training an HMM

Forward-backward or Baum-Welch algorithm (Expectation Maximization)

Backward probability:

$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$

$\beta_T(i) = a_{i,F}, \quad 1 \le i \le N$

Page 56: Hidden Markov Models

function FORWARD-BACKWARD(observations of len T, output vocabulary V, hidden state set Q) returns HMM = (A, B)

initialize A and B
iterate until convergence
    E-step
    M-step
return A, B
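A minimal Python sketch with the same shape: the E-step computes forward/backward probabilities and expected counts, and the M-step re-estimates the parameters. The dict layout matches the earlier sketches and is an assumption, and a fixed iteration count stands in for a real convergence test.

```python
def forward_backward(obs, states, vocab, pi, A, B, iterations=10):
    """Baum-Welch sketch for a single observation sequence, no smoothing."""
    for _ in range(iterations):
        T = len(obs)
        # E-step: forward probabilities alpha and backward probabilities beta
        alpha = [{j: pi[j] * B[j][obs[0]] for j in states}]
        for t in range(1, T):
            alpha.append({j: sum(alpha[t - 1][i] * A[i][j] for i in states) * B[j][obs[t]]
                          for j in states})
        beta = [dict() for _ in range(T)]
        beta[T - 1] = {i: 1.0 for i in states}
        for t in range(T - 2, -1, -1):
            beta[t] = {i: sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in states)
                       for i in states}
        total = sum(alpha[T - 1][j] for j in states)     # P(O | lambda)
        gamma = [{i: alpha[t][i] * beta[t][i] / total for i in states} for t in range(T)]
        xi = [{(i, j): alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / total
               for i in states for j in states} for t in range(T - 1)]
        # M-step: re-estimate pi, A, B from the expected counts
        pi = {i: gamma[0][i] for i in states}
        A = {i: {j: sum(x[i, j] for x in xi) / sum(g[i] for g in gamma[:-1])
                 for j in states} for i in states}
        B = {j: {v: sum(g[j] for t, g in enumerate(gamma) if obs[t] == v) /
                    sum(g[j] for g in gamma) for v in vocab} for j in states}
    return A, B
```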

Page 57: Hidden Markov Models

Hidden Markov Models for Part-of-Speech Tagging

Page 58: Hidden Markov Models

Part-of-speech tagging

8 (ish) traditional English parts of speech:
Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
This idea has been around for over 2000 years (Dionysius Thrax of Alexandria, c. 100 B.C.)
Called: parts of speech, lexical categories, word classes, morphological classes, lexical tags, POS
We'll use POS most frequently
Assuming that you know what these are

Page 59: Hidden Markov Models

POS examples

N    noun         chair, bandwidth, pacing
V    verb         study, debate, munch
ADJ  adjective    purple, tall, ridiculous
ADV  adverb       unfortunately, slowly
P    preposition  of, by, to
PRO  pronoun      I, me, mine
DET  determiner   the, a, that, those

Page 60: Hidden Markov Models

POS tagging example

WORD   tag
the    DET
koala  N
put    V
the    DET
keys   N
on     P
the    DET
table  N

Page 61: Hidden Markov Models

POS Tagging

Words often have more than one POS: back
The back door = JJ
On my back = NN
Win the voters back = RB
Promised to back the bill = VB

The POS tagging problem is to determine the POS tag for a particular instance of a word.

These examples from Dekang Lin

Page 62: Hidden Markov Models

POS tagging as a sequence classification task

We are given a sentence (an "observation" or "sequence of observations"):
Secretariat is expected to race tomorrow
She promised to back the bill

What is the best sequence of tags that corresponds to this sequence of observations?

Probabilistic view:
Consider all possible sequences of tags
Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.

Page 63: Hidden Markov Models

Getting to HMM

We want, out of all sequences of n tags t1…tn, the single tag sequence such that P(t1…tn | w1…wn) is highest:

$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$

The hat ^ means "our estimate of the best one"
argmax_x f(x) means "the x such that f(x) is maximized"

Page 64: Hidden Markov Models

Getting to HMM

This equation is guaranteed to give us the best tag sequence
But how to make it operational? How to compute this value?

Intuition of Bayesian classification:
Use Bayes rule to transform it into a set of other probabilities that are easier to compute
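The equations on the next two slides ("Using Bayes Rule", "Likelihood and prior") were figures that did not survive the transcript; the standard decomposition they walk through is:

```latex
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)
            = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)}
            = \operatorname*{argmax}_{t_1^n}
              \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\;
              \underbrace{P(t_1^n)}_{\text{prior}}
            \approx \operatorname*{argmax}_{t_1^n}
              \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```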

Page 65: Hidden Markov Models

Using Bayes Rule

Page 66: Hidden Markov Models

Likelihood and prior

Page 67: Hidden Markov Models

Two kinds of probabilities (1)

Tag transition probabilities P(ti | ti-1)
Determiners are likely to precede adjectives and nouns:
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
• But P(DT|JJ) to be low

Compute P(NN|DT) by counting in a labeled corpus:

$P(NN \mid DT) = \frac{C(DT, NN)}{C(DT)}$

Page 68: Hidden Markov Models

Two kinds of probabilities (2)

Word likelihood probabilities P(wi | ti)
VBZ (3sg pres verb) is likely to be "is"
Compute P(is|VBZ) by counting in a labeled corpus:

$P(is \mid VBZ) = \frac{C(VBZ, is)}{C(VBZ)}$
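Both kinds of counts are easy to collect; here is a minimal sketch over a tiny hypothetical tagged corpus (the corpus and variable names are made up for illustration):

```python
from collections import Counter

# Hypothetical tagged corpus: one list of (word, tag) pairs per sentence.
corpus = [[("the", "DT"), ("yellow", "JJ"), ("hat", "NN")],
          [("that", "DT"), ("flight", "NN"), ("is", "VBZ"), ("late", "JJ")]]

tag_bigrams, tag_counts, word_tag = Counter(), Counter(), Counter()
for sent in corpus:
    tags = [t for _, t in sent]
    tag_counts.update(tags)                 # C(tag)
    tag_bigrams.update(zip(tags, tags[1:])) # C(tag_{i-1}, tag_i)
    word_tag.update(sent)                   # C(tag, word)

# Transition: P(NN | DT) = C(DT, NN) / C(DT)
p_nn_given_dt = tag_bigrams[("DT", "NN")] / tag_counts["DT"]
# Emission: P(is | VBZ) = C(VBZ, is) / C(VBZ)
p_is_given_vbz = word_tag[("is", "VBZ")] / tag_counts["VBZ"]
print(p_nn_given_dt, p_is_given_vbz)
```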

Page 69: Hidden Markov Models

POS tagging: likelihood and prior

Page 70: Hidden Markov Models

An example: the verb "race"

Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

How do we pick the right tag?

Page 71: Hidden Markov Models

Disambiguating "race"

Page 72: Hidden Markov Models

P(NN|TO) = .00047
P(VB|TO) = .83
P(race|NN) = .00057
P(race|VB) = .00012
P(NR|VB) = .0027
P(NR|NN) = .0012

P(VB|TO) P(race|VB) P(NR|VB) = .00000027
P(NN|TO) P(race|NN) P(NR|NN) = .00000000032

So we (correctly) choose the verb reading
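The same comparison in Python, using the values listed above:

```python
# Products compared on this slide; the larger one wins.
p_vb = 0.83 * 0.00012 * 0.0027       # P(VB|TO) P(race|VB) P(NR|VB)
p_nn = 0.00047 * 0.00057 * 0.0012    # P(NN|TO) P(race|NN) P(NR|NN)
print(p_vb, p_nn, p_vb > p_nn)       # ~2.7e-07  ~3.2e-10  True -> choose VB
```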

Page 73: Hidden Markov Models

Transitions between the hidden states of the HMM, showing the A probabilities

Page 74: Hidden Markov Models

B observation likelihoods for the POS HMM

Page 75: Hidden Markov Models

The A matrix for the POS HMM

Page 76: Hidden Markov Models

The B matrix for the POS HMM

Page 77: Hidden Markov Models

Viterbi intuition: we are looking for the best 'path'

[Lattice diagram, repeated across build steps: the sentence "promised to back the bill" (positions S1–S5), with candidate tags for each word drawn from VBD, VBN, TO, VB, JJ, NN, RB, DT, NNP; the decoder searches for the best path through this lattice.]

Slide from Dekang Lin

Page 78: Hidden Markov Models

Viterbi example

Page 79: Hidden Markov Models

Outline

Markov Chains
Hidden Markov Models
Three Algorithms for HMMs
• The Forward Algorithm
• The Viterbi Algorithm
• The Baum-Welch Algorithm (EM)

Applications:
• The Ice Cream Task
• Part-of-Speech Tagging
Next time: Named Entity Tagging

