Pair HMMs and edit distance
Ristad & Yianilos
Special meeting Wed 4/14
• What: Evolving and Self-Managing Data Integration Systems
• Who: AnHai Doan, Univ. of Illinois at Urbana-Champaign
• When: Wednesday, April 14, 2004 @ 11am (food at 10:30am)
• Where: Sennott Square Building, room 5317
Special meeting 4/28 (last class)
• First International Joint Conference on Information Extraction, Information Integration, and Sequential Learning
• 10:30-11:50 am, Wean Hall 4601
• All project proposals have been accepted as paper abstracts, and you’re all invited to present for 10min (including questions)
Pair HMMs – Ristad & Yianilos
• HMM review
  – notation
  – inference (forward algorithm)
  – learning (forward-backward & EM)
• Pair HMMs
  – notation
  – generating edit strings
  – distance metrics (stochastic, Viterbi)
  – inference (forward)
  – learning (forward-backward & EM)
• Results from the R&Y paper
  – K-NN with trained distance, hidden prototypes
  – problem: phoneme strings => words
• Advanced pair HMMs
  – adding state (e.g. for affine gap models)
  – Smith-Waterman?
  – CRF training?
last week
today
HMM Notation
• s_t: HMM state after emitting x_1 ... x_t, s_t ∈ {l_1, ..., l_K}
• x_u ... x_t: a substring of x^T
• x_1 ... x_t: a prefix of x^T
• x_1 x_2 ... x_T: input string, aka x^T
• ε(l, x): emission probability, Pr(x_t = x | s_t = l)
• δ(l'→l): transition probability, Pr(s_{t+1} = l | s_t = l')
• θ: parameters of the HMM
HMM Example
[figure: a two-state HMM with transitions Pr(1→1), Pr(1→2), Pr(2→1), Pr(2→2)]

Emission probabilities Pr(1→x):
  d 0.3
  h 0.5
  b 0.2

Emission probabilities Pr(2→x):
  a 0.3
  e 0.5
  o 0.2

Sample output: x^T = heehahaha, s^T = 122121212
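To make the example concrete, here is a minimal sketch that scores a (string, state path) pair under this model. The emission tables are taken from the slide; the transition and start probabilities are made-up placeholders, since the slide names Pr(1→1), Pr(1→2), Pr(2→1), Pr(2→2) but gives no numbers.

```python
# Emission tables from the slide; transitions/start are assumed values.
trans = {(1, 1): 0.2, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.4}  # assumed
start = {1: 0.5, 2: 0.5}                                       # assumed
emit = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}

def joint_prob(x, s, start):
    """Pr(x, s): product of start, transition, and emission probabilities
    along the given state path s for the emitted string x."""
    p = start[s[0]] * emit[s[0]].get(x[0], 0.0)
    for t in range(1, len(x)):
        p *= trans[(s[t - 1], s[t])] * emit[s[t]].get(x[t], 0.0)
    return p

# The sample output from the slide:
p = joint_prob("heehahaha", [1, 2, 2, 1, 2, 1, 2, 1, 2], start)
```

A path that asks a state to emit a symbol outside its table (e.g. "d" from state 2) gets probability zero.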
HMM Inference

[trellis figure: rows l = 1, ..., K; columns t = 1, ..., T, one per input letter x_1 x_2 x_3 ... x_T]

Key point: Pr(s_t = l) depends only on δ(l'→l) and s_{t−1}, so you can propagate probabilities forward, one column at a time.
HMM Inference – Forward Algorithm
[trellis figure: rows l = 1, ..., K; columns x_1 x_2 x_3 ... x_T]

α(l, t) ≡ Pr(x^t ∧ s_t = l)
        = Σ_{l'} Pr(x^{t−1} ∧ s_{t−1} = l') · Pr(l'→l) · Pr(l emits x_t)
        = Σ_{l'} α(l', t−1) · δ(l'→l) · ε(l, x_t)
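The recurrence above can be sketched directly in code. This is a minimal forward pass; the toy states, transitions, emissions, and start distribution are illustrative assumptions, not values from the slides.

```python
# Forward algorithm: alpha[t][l] = Pr(x_1..x_t, s_t = l),
# computed column by column using the recurrence above.
def forward(x, states, start, delta, eps):
    alpha = [{l: start[l] * eps[l].get(x[0], 0.0) for l in states}]
    for t in range(1, len(x)):
        alpha.append({
            l: sum(alpha[t - 1][lp] * delta[(lp, l)] for lp in states)
               * eps[l].get(x[t], 0.0)
            for l in states})
    return alpha

# Toy parameters (assumed):
states = [1, 2]
start = {1: 0.5, 2: 0.5}
delta = {(1, 1): 0.2, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.4}
eps = {1: {"h": 0.5, "d": 0.3, "b": 0.2}, 2: {"e": 0.5, "a": 0.3, "o": 0.2}}

alpha = forward("hee", states, start, delta, eps)
total = sum(alpha[-1].values())  # Pr(x^T) = sum_l alpha(l, T)
```

Summing the last column marginalizes out the final state, giving the probability of the whole string.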
HMM Learning - EM
Expectation maximization:
– Find expectations, i.e. Pr(s_t = l) for t = 1, ..., T
  • forward algorithm + ε
  • hidden variables are the states s at times t = 1, ..., T
– Maximize probability of parameters given expectations:
  • replace #(l'→l)/#(l') with a weighted version of the counts
  • replace #(l→x)/#(l) with a weighted version
HMM Inference
[trellis figure: rows l = 1, ..., K; columns t = 1, ..., T over x_1 x_2 x_3 ... x_T]

Forward algorithm: computes probabilities α(l, t) based on information in the first t letters of the string; it ignores “downstream” information.
HMM Inference
[trellis figure: rows l = 1, ..., K; columns x_1 x_2 x_3 ... x_T]

β(l, i) ≡ Pr(x_{i+1} ... x_T | s_i = l)
        = Σ_{l'} Pr(l→l') · Pr(l' emits x_{i+1}) · Pr(x_{i+2} ... x_T | s_{i+1} = l')
        = Σ_{l'} δ(l→l') · ε(l', x_{i+1}) · β(l', i+1)
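The backward recurrence can be sketched the same way as the forward pass. The toy parameters below are the same illustrative assumptions as before, not values from the slides; as a sanity check, combining β at position 1 with the start and emission probabilities should reproduce Pr(x^T).

```python
# Backward algorithm: beta[i][l] = Pr(x_{i+1}..x_T | s_i = l),
# filled right-to-left using the recurrence above.
def backward(x, states, delta, eps):
    T = len(x)
    beta = [dict() for _ in range(T)]
    beta[T - 1] = {l: 1.0 for l in states}  # empty suffix has probability 1
    for i in range(T - 2, -1, -1):
        beta[i] = {l: sum(delta[(l, lp)] * eps[lp].get(x[i + 1], 0.0)
                          * beta[i + 1][lp] for lp in states)
                   for l in states}
    return beta

# Toy parameters (assumed):
states = [1, 2]
start = {1: 0.5, 2: 0.5}
delta = {(1, 1): 0.2, (1, 2): 0.8, (2, 1): 0.6, (2, 2): 0.4}
eps = {1: {"h": 0.5, "d": 0.3, "b": 0.2}, 2: {"e": 0.5, "a": 0.3, "o": 0.2}}

beta = backward("hee", states, delta, eps)
# Sanity check: sum_l Pr(s_1 = l) eps(l, x_1) beta[0][l] = Pr(x^T)
total = sum(start[l] * eps[l].get("h", 0.0) * beta[0][l] for l in states)
```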
HMM Learning - EM
Expectation maximization:
– Find expectations, i.e. Pr(s_t = l) for t = 1, ..., T
  • forward-backward algorithm
  • hidden variables are the states s at times t = 1, ..., T
– Maximize probability of parameters given expectations:
  • replace #(l'→l)/#(l') with a weighted version of the counts
  • replace #(l→x)/#(l) with a weighted version
Pair HMM Notation
• r_k = ⟨t_k, v_k⟩: hidden property associated with edit z_k, r_k ∈ {1..T} × {1..V}
• z = z_1 ... z_n: a string of edits, z ∈ E*
• x, y: input strings, aka x^T, y^V
• ε(e): emission probability, also written ε(a, b) for e = ⟨a, b⟩
• E: the emissions/edits — substitutions ⟨a, b⟩, deletions ⟨a, −⟩, insertions ⟨−, b⟩, plus END
• θ: parameters of the pair HMM
Pair HMM Example
State 1 edit probabilities:
  e      Pr(e)
  <a,a>  0.10
  <e,e>  0.10
  <h,h>  0.10
  <a,->  0.05
  <h,t>  0.05
  <-,h>  0.01
  ...    ...
Pair HMM Example
State 1 edit probabilities:
  e      Pr(e)
  <a,a>  0.10
  <e,e>  0.10
  <h,h>  0.10
  <e,->  0.05
  <h,t>  0.05
  <-,h>  0.01
  ...    ...

Sample run: z^6 = <h,t>, <e,e>, <e,e>, <h,h>, <e,->, <e,e>. Strings x, y produced by z^6: x = heehee, y = teehe.
Notice that x, y is also produced by z^4 (the first four edits) followed by <e,e>, <e,->, and by many other edit strings.
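The claim above is easy to check mechanically: reading off the left components of an edit string (skipping "-") gives x, and the right components give y.

```python
# Recover the string pair (x, y) produced by an edit string:
# "-" on either side means no character is consumed on that side.
def apply_edits(z):
    x = "".join(a for a, b in z if a != "-")
    y = "".join(b for a, b in z if b != "-")
    return x, y

z = [("h", "t"), ("e", "e"), ("e", "e"), ("h", "h"), ("e", "-"), ("e", "e")]
print(apply_edits(z))   # ('heehee', 'teehe')

# Another edit string yielding the same pair (last two edits swapped):
z2 = z[:4] + [("e", "e"), ("e", "-")]
print(apply_edits(z2))  # ('heehee', 'teehe')
```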
Distances based on pair HMMs
• EDIT(x^T, y^V): the set of edit strings z^n that produce the pair x^T, y^V

• Pr(x^T, y^V | θ) = Σ_{z^n ∈ EDIT(x^T, y^V)} Pr(z^n | θ)

• d_stochastic(x^T, y^V) = −log Pr(x^T, y^V | θ)

• d_viterbi(x^T, y^V) = −log max_{z^n ∈ EDIT(x^T, y^V)} Pr(z^n | θ)
Pair HMM Inference
α(t, v) ≡ Pr(x^t, y^v)
        = Pr(x_t, y_v) · α(t−1, v−1) + Pr(x_t, −) · α(t−1, v) + Pr(−, y_v) · α(t, v−1)
        = ε(x_t, y_v) · α(t−1, v−1) + ε(x_t, −) · α(t−1, v) + ε(−, y_v) · α(t, v−1)

[each cell α(t, v) depends on its neighbors α(t−1, v−1), α(t−1, v), and α(t, v−1)]
Dynamic programming is possible: fill out matrix left-to-right, top-down
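The DP above can be sketched for a single-state pair HMM; filling the matrix and taking the negative log gives the stochastic edit distance. The edit probabilities in `eps` are illustrative placeholders over a tiny alphabet, not trained values from the paper.

```python
import math

# Pair-HMM forward DP: alpha[t][v] = Pr(x^t, y^v), filled
# left-to-right, top-down exactly as in the recurrence above.
def pair_forward(x, y, eps):
    T, V = len(x), len(y)
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(T + 1):
        for v in range(V + 1):
            if t and v:  # substitution edit <x_t, y_v>
                alpha[t][v] += eps.get((x[t-1], y[v-1]), 0.0) * alpha[t-1][v-1]
            if t:        # deletion edit <x_t, ->
                alpha[t][v] += eps.get((x[t-1], "-"), 0.0) * alpha[t-1][v]
            if v:        # insertion edit <-, y_v>
                alpha[t][v] += eps.get(("-", y[v-1]), 0.0) * alpha[t][v-1]
    return alpha[T][V]

# Placeholder edit probabilities over alphabet {a, b} (assumed):
eps = {("a", "a"): 0.4, ("b", "b"): 0.4, ("a", "b"): 0.05, ("b", "a"): 0.05,
       ("a", "-"): 0.02, ("b", "-"): 0.02, ("-", "a"): 0.02, ("-", "b"): 0.02}

d_match = -math.log(pair_forward("ab", "ab", eps))
d_mismatch = -math.log(pair_forward("ab", "ba", eps))
```

With these parameters the matching pair is more probable, so its stochastic distance is smaller; replacing the sum of the three terms with a max would give the Viterbi distance instead.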
Pair HMM Inference
α(t, v) ≡ Pr(x^t, y^v)
        = ε(x_t, y_v) · α(t−1, v−1) + ε(x_t, −) · α(t−1, v) + ε(−, y_v) · α(t, v−1)

[trellis figure: rows v = 1, ..., V over y; columns t = 1, ..., T over x_1 x_2 x_3 ... x_T]
Pair HMM Inference
[trellis figure: rows v = 1, ..., V; columns t = 1, ..., T]

One difference from the basic HMM: after i emissions of the pair HMM, we do not know the position in the matrix — the same emission count i can land in different cells (e.g. i = 1, 2, 3 along different paths).
Pair HMM Inference: Forward-Backward
β(i, j) ≡ Pr(x_{i+1} ... x_T, y_{j+1} ... y_V | position (i, j))
        = ε(x_{i+1}, y_{j+1}) · β(i+1, j+1) + ε(x_{i+1}, −) · β(i+1, j) + ε(−, y_{j+1}) · β(i, j+1)

Pr(r_k = (i, j), r_{k+1} = (i', j')) is obtained by combining the two passes, as in the basic HMM: α(i, j) · Pr(edit taking (i, j) to (i', j')) · β(i', j')

[trellis figure: rows v = 1, ..., V; columns t = 1, ..., T]
Multiple states
[figure: a pair HMM with states 1, 2, 3, each with its own edit-probability table, e.g. state 1: <a,a> 0.10, <e,e> 0.10, <h,h> 0.10, <a,-> 0.05, <h,t> 0.05, <-,h> 0.01, ...]

[trellis figure: one layer of cells (rows v = 1, ..., V; columns t = 1, ..., T) per state l]
An extension: multiple states
conceptually, add a “state” dimension to the model
EM methods generalize easily to this setting
Back to R&Y paper...
• They consider “coarse” and “detailed” models, as well as mixtures of the two.
• The coarse model is like a back-off model: edit operations are merged into equivalence classes (e.g. based on equivalence classes of characters).
• They test by learning a distance for K-NN with an additional latent variable.
K-NN with latent prototypes
test example y (a string of phonemes)
possible prototypes x (known word pronunciations)

[figure: dictionary words w_1, w_2, ..., w_K, each with pronunciations x_1, x_2, x_3, ..., x_m, matched against y by the learned phonetic distance]
K-NN with latent prototypes
The method needs (x, y) pairs to train a distance; to handle this, an additional level of EM is used to pick the “latent prototype” x to pair with each y.
Hidden prototype K-NN
Experiments
•E1: on-line pronunciation dictionary
•E2: subset of E1 with corpus words
•E3: dictionary from training corpus
•E4: dictionary from training + test corpus (!)
•E5: E1 + E3