Pair HMMs and edit distance
Ristad & Yianilos
Special meeting Wed 4/14
• What: Evolving and Self-Managing Data Integration Systems
• Who: AnHai Doan, Univ. of Illinois at Urbana-Champaign
• When: Wednesday, April 14, 2004 @ 11am (food at 10:30am)
• Where: Sennott Square Building, room 5317
Special meeting 4/28 (last class)
• First International Joint Conference on Information Extraction, Information Integration, and Sequential Learning
• 10:30-11:50 am, Wean Hall 4601
• All project proposals have been accepted as paper abstracts, and you’re all invited to present for 10min (including questions)
Pair HMMs – Ristad & Yianilos
• HMM review
  – notation
  – inference (forward algorithm)
  – learning (forward-backward & EM)
• Pair HMMs
  – notation
  – generating edit strings
  – distance metrics (stochastic, Viterbi)
  – inference (forward)
  – learning (forward-backward & EM)
• Results from the R&Y paper
  – K-NN with trained distance, hidden prototypes
  – problem: phoneme strings => words
• Advanced pair HMMs
  – adding state (e.g. for affine gap models)
  – Smith-Waterman?
  – CRF training?
last week
today
HMM Notation
• s_t: HMM state after emitting x_1 ... x_t, s_t ∈ {l_1, ..., l_K}
• x_u ... x_t: a substring of x^T
• x_1 ... x_t: a prefix of x^T
• x_1 x_2 ... x_T: input string, aka x^T
• ε(l, x): emission probability, Pr(x_t = x | s_t = l)
• δ(l'→l): transition probability, Pr(s_{t+1} = l | s_t = l')
• θ: parameters of the HMM
HMM Example
[figure: a two-state HMM with transitions Pr(1→1), Pr(1→2), Pr(2→1), Pr(2→2)]

Emission probabilities Pr(1→x):
  d 0.3
  h 0.5
  b 0.2

Emission probabilities Pr(2→x):
  a 0.3
  e 0.5
  o 0.2

Sample output: x^T = heehahaha, s^T = 122121212
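To make the example concrete, here is a minimal sketch that scores a (string, state path) pair under this model. The emission tables are taken from the slide; the transition and start probabilities are made-up placeholders, since the slide names Pr(1→1), Pr(1→2), Pr(2→1), Pr(2→2) but gives no numbers.

```python
# Emission tables from the slide; transitions/start are assumed values.
trans = {(1, 1): 0.2, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.4}  # assumed
start = {1: 0.5, 2: 0.5}                                       # assumed
emit = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}

def joint_prob(x, s, start):
    """Pr(x, s): product of start, transition, and emission probabilities
    along the given state path s for the emitted string x."""
    p = start[s[0]] * emit[s[0]].get(x[0], 0.0)
    for t in range(1, len(x)):
        p *= trans[(s[t - 1], s[t])] * emit[s[t]].get(x[t], 0.0)
    return p

# The sample output from the slide:
p = joint_prob("heehahaha", [1, 2, 2, 1, 2, 1, 2, 1, 2], start)
```

A path that asks a state to emit a symbol outside its table (e.g. "d" from state 2) gets probability zero.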
HMM Inference

[trellis figure: rows l = 1, ..., K; columns t = 1, ..., T, one per input letter x_1 x_2 x_3 ... x_T]

Key point: Pr(s_t = l) depends only on δ(l'→l) and s_{t−1}, so you can propagate probabilities forward, one column at a time.
HMM Inference – Forward Algorithm
[trellis figure: rows l = 1, ..., K; columns x_1 x_2 x_3 ... x_T]

α(l, t) ≡ Pr(x^t ∧ s_t = l)
        = Σ_{l'} Pr(x^{t−1} ∧ s_{t−1} = l') · Pr(l'→l) · Pr(l emits x_t)
        = Σ_{l'} α(l', t−1) · δ(l'→l) · ε(l, x_t)
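The recurrence above can be sketched directly in code. This is a minimal forward pass; the toy states, transitions, emissions, and start distribution are illustrative assumptions, not values from the slides.

```python
# Forward algorithm: alpha[t][l] = Pr(x_1..x_t, s_t = l),
# computed column by column using the recurrence above.
def forward(x, states, start, delta, eps):
    alpha = [{l: start[l] * eps[l].get(x[0], 0.0) for l in states}]
    for t in range(1, len(x)):
        alpha.append({
            l: sum(alpha[t - 1][lp] * delta[(lp, l)] for lp in states)
               * eps[l].get(x[t], 0.0)
            for l in states})
    return alpha

# Toy parameters (assumed):
states = [1, 2]
start = {1: 0.5, 2: 0.5}
delta = {(1, 1): 0.2, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.4}
eps = {1: {"h": 0.5, "d": 0.3, "b": 0.2}, 2: {"e": 0.5, "a": 0.3, "o": 0.2}}

alpha = forward("hee", states, start, delta, eps)
total = sum(alpha[-1].values())  # Pr(x^T) = sum_l alpha(l, T)
```

Summing the last column marginalizes out the final state, giving the probability of the whole string.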
HMM Learning - EM
Expectation maximization:
– Find expectations, i.e. Pr(s_t = l) for t = 1, ..., T
  • forward algorithm + ε
  • hidden variables are the states s at times t = 1, ..., T
– Maximize probability of parameters given expectations:
  • replace #(l'→l)/#(l') with a weighted version of the counts
  • replace #(l→x)/#(l) with a weighted version
HMM Inference
[trellis figure: rows l = 1, ..., K; columns t = 1, ..., T over x_1 x_2 x_3 ... x_T]

Forward algorithm: computes probabilities α(l, t) based on information in the first t letters of the string; it ignores “downstream” information.
HMM Inference
[trellis figure: rows l = 1, ..., K; columns x_1 x_2 x_3 ... x_T]

β(l, i) ≡ Pr(x_{i+1} ... x_T | s_i = l)
        = Σ_{l'} Pr(l→l') · Pr(l' emits x_{i+1}) · Pr(x_{i+2} ... x_T | s_{i+1} = l')
        = Σ_{l'} δ(l→l') · ε(l', x_{i+1}) · β(l', i+1)
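The backward recurrence can be sketched the same way as the forward pass. The toy parameters below are the same illustrative assumptions as before, not values from the slides; as a sanity check, combining β at position 1 with the start and emission probabilities should reproduce Pr(x^T).

```python
# Backward algorithm: beta[i][l] = Pr(x_{i+1}..x_T | s_i = l),
# filled right-to-left using the recurrence above.
def backward(x, states, delta, eps):
    T = len(x)
    beta = [dict() for _ in range(T)]
    beta[T - 1] = {l: 1.0 for l in states}  # empty suffix has probability 1
    for i in range(T - 2, -1, -1):
        beta[i] = {l: sum(delta[(l, lp)] * eps[lp].get(x[i + 1], 0.0)
                          * beta[i + 1][lp] for lp in states)
                   for l in states}
    return beta

# Toy parameters (assumed):
states = [1, 2]
start = {1: 0.5, 2: 0.5}
delta = {(1, 1): 0.2, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.4}
eps = {1: {"h": 0.5, "d": 0.3, "b": 0.2}, 2: {"e": 0.5, "a": 0.3, "o": 0.2}}

beta = backward("hee", states, delta, eps)
# Sanity check: sum_l Pr(s_1 = l) eps(l, x_1) beta[0][l] = Pr(x^T)
total = sum(start[l] * eps[l].get("h", 0.0) * beta[0][l] for l in states)
```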
HMM Learning - EM
Expectation maximization:
– Find expectations, i.e. Pr(s_t = l) for t = 1, ..., T
  • forward-backward algorithm
  • hidden variables are the states s at times t = 1, ..., T
– Maximize probability of parameters given expectations:
  • replace #(l'→l)/#(l') with a weighted version of the counts
  • replace #(l→x)/#(l) with a weighted version
Pair HMM Notation
• r_k = ⟨t_k, v_k⟩: hidden property associated with edit z_k, r_k ∈ {1..T} × {1..V}
• z = z_1 ... z_n: a string of edits, z ∈ E*
• x, y: input strings, aka x^T, y^V
• ε(e): emission probability, also written ε(a, b) for e = ⟨a, b⟩
• E: the emissions/edits — substitutions ⟨a, b⟩, deletions ⟨a, −⟩, insertions ⟨−, b⟩, plus END
• θ: parameters of the pair HMM
Pair HMM Example
State 1 edit probabilities:
  e      Pr(e)
  <a,a>  0.10
  <e,e>  0.10
  <h,h>  0.10
  <a,->  0.05
  <h,t>  0.05
  <-,h>  0.01
  ...    ...
Pair HMM Example
State 1 edit probabilities:
  e      Pr(e)
  <a,a>  0.10
  <e,e>  0.10
  <h,h>  0.10
  <e,->  0.05
  <h,t>  0.05
  <-,h>  0.01
  ...    ...

Sample run: z^6 = <h,t>, <e,e>, <e,e>, <h,h>, <e,->, <e,e>. Strings x, y produced by z^6: x = heehee, y = teehe.
Notice that x, y is also produced by z^4 (the first four edits) followed by <e,e>, <e,->, and by many other edit strings.
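The claim above is easy to check mechanically: reading off the left components of an edit string (skipping "-") gives x, and the right components give y.

```python
# Recover the string pair (x, y) produced by an edit string:
# "-" on either side means no character is consumed on that side.
def apply_edits(z):
    x = "".join(a for a, b in z if a != "-")
    y = "".join(b for a, b in z if b != "-")
    return x, y

z = [("h", "t"), ("e", "e"), ("e", "e"), ("h", "h"), ("e", "-"), ("e", "e")]
print(apply_edits(z))   # ('heehee', 'teehe')

# Another edit string yielding the same pair (last two edits swapped):
z2 = z[:4] + [("e", "e"), ("e", "-")]
print(apply_edits(z2))  # ('heehee', 'teehe')
```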
Distances based on pair HMMs
• EDIT(x^T, y^V): the set of edit strings z^n that produce the pair x^T, y^V

• Pr(x^T, y^V | θ) = Σ_{z^n ∈ EDIT(x^T, y^V)} Pr(z^n | θ)

• d_stochastic(x^T, y^V) = −log Pr(x^T, y^V | θ)

• d_viterbi(x^T, y^V) = −log max_{z^n ∈ EDIT(x^T, y^V)} Pr(z^n | θ)
Pair HMM Inference
α(t, v) ≡ Pr(x^t, y^v)
        = Pr(x_t, y_v) · α(t−1, v−1) + Pr(x_t, −) · α(t−1, v) + Pr(−, y_v) · α(t, v−1)
        = ε(x_t, y_v) · α(t−1, v−1) + ε(x_t, −) · α(t−1, v) + ε(−, y_v) · α(t, v−1)

[each cell α(t, v) depends on its neighbors α(t−1, v−1), α(t−1, v), and α(t, v−1)]
Dynamic programming is possible: fill out matrix left-to-right, top-down
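The DP above can be sketched for a single-state pair HMM; filling the matrix and taking the negative log gives the stochastic edit distance. The edit probabilities in `eps` are illustrative placeholders over a tiny alphabet, not trained values from the paper.

```python
import math

# Pair-HMM forward DP: alpha[t][v] = Pr(x^t, y^v), filled
# left-to-right, top-down exactly as in the recurrence above.
def pair_forward(x, y, eps):
    T, V = len(x), len(y)
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(T + 1):
        for v in range(V + 1):
            if t and v:  # substitution edit <x_t, y_v>
                alpha[t][v] += eps.get((x[t-1], y[v-1]), 0.0) * alpha[t-1][v-1]
            if t:        # deletion edit <x_t, ->
                alpha[t][v] += eps.get((x[t-1], "-"), 0.0) * alpha[t-1][v]
            if v:        # insertion edit <-, y_v>
                alpha[t][v] += eps.get(("-", y[v-1]), 0.0) * alpha[t][v-1]
    return alpha[T][V]

# Placeholder edit probabilities over alphabet {a, b} (assumed):
eps = {("a", "a"): 0.4, ("b", "b"): 0.4, ("a", "b"): 0.05, ("b", "a"): 0.05,
       ("a", "-"): 0.02, ("b", "-"): 0.02, ("-", "a"): 0.02, ("-", "b"): 0.02}

d_match = -math.log(pair_forward("ab", "ab", eps))
d_mismatch = -math.log(pair_forward("ab", "ba", eps))
```

With these parameters the matching pair is more probable, so its stochastic distance is smaller; replacing the sum of the three terms with a max would give the Viterbi distance instead.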
Pair HMM Inference
α(t, v) ≡ Pr(x^t, y^v)
        = ε(x_t, y_v) · α(t−1, v−1) + ε(x_t, −) · α(t−1, v) + ε(−, y_v) · α(t, v−1)

[trellis figure: rows v = 1, ..., V over y; columns t = 1, ..., T over x_1 x_2 x_3 ... x_T]
Pair HMM Inference
[trellis figure: rows v = 1, ..., V; columns t = 1, ..., T]

One difference from the basic HMM: after i emissions of the pair HMM, we do not know the position in the matrix — the same emission count i can land in different cells (e.g. i = 1, 2, 3 along different paths).
Pair HMM Inference: Forward-Backward
β(i, j) ≡ Pr(x_{i+1} ... x_T, y_{j+1} ... y_V | position (i, j))
        = ε(x_{i+1}, y_{j+1}) · β(i+1, j+1) + ε(x_{i+1}, −) · β(i+1, j) + ε(−, y_{j+1}) · β(i, j+1)

Pr(r_k = (i, j), r_{k+1} = (i', j')) is obtained by combining the two passes, as in the basic HMM: α(i, j) · Pr(edit taking (i, j) to (i', j')) · β(i', j')

[trellis figure: rows v = 1, ..., V; columns t = 1, ..., T]
Multiple states
[figure: a pair HMM with states 1, 2, 3, each with its own edit-probability table, e.g. state 1: <a,a> 0.10, <e,e> 0.10, <h,h> 0.10, <a,-> 0.05, <h,t> 0.05, <-,h> 0.01, ...]

[trellis figure: one layer of cells (rows v = 1, ..., V; columns t = 1, ..., T) per state l]
An extension: multiple states
conceptually, add a “state” dimension to the model
EM methods generalize easily to this setting
Back to R&Y paper...
• They consider “coarse” and “detailed” models, as well as mixtures of the two.
• The coarse model is like a back-off model: edit operations are merged into equivalence classes (e.g. based on equivalence classes of characters).
• They test by learning a distance for K-NN with an additional latent variable.
K-NN with latent prototypes
test example y (a string of phonemes)
possible prototypes x (known word pronunciations)

[figure: dictionary words w_1, w_2, ..., w_K, each with pronunciations x_1, x_2, x_3, ..., x_m, matched against y by the learned phonetic distance]
K-NN with latent prototypes
The method needs (x, y) pairs to train a distance; to handle this, an additional level of EM is used to pick the “latent prototype” x to pair with each y.
Hidden prototype K-NN
Experiments
•E1: on-line pronunciation dictionary
•E2: subset of E1 with corpus words
•E3: dictionary from training corpus
•E4: dictionary from training + test corpus (!)
•E5: E1 + E3