Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation
Yin-Wen Chang (MIT), Michael Collins (Columbia University)
EMNLP 2011 reading
About the presenter
• Name: Yoh Okuno
• Software Engineer at a Web company
• Interests: NLP, Machine Learning, Data Mining
• Skills: C/C++, Python, Hadoop, etc.
• Weblog: http://d.hatena.ne.jp/nokuno/
Decoding in Phrase-based SMT
• Decoding in SMT is NP-hard
– Approximate search: beam search
– Exact search: ILP (Integer Linear Programming)
• The paper proposes Lagrangian relaxation combined with efficient dynamic programming
Phrase-based SMT Model
• Reordering makes the problem complicated
• Uses a 3-gram language model

f(y) = h(e(y)) + Σ_{k=1..L} g(p_k) + Σ_{k=1..L−1} η·δ(t(p_k), s(p_{k+1}))
          (LM)      (translation)        (distortion)

• output: y = ⟨p_1 p_2 ... p_L⟩
• phrase: p_k = (s, t, e)
• distortion: δ(t, s) = |t + 1 − s|
• η: negative constant
• x: input sentence
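As a concrete reading of the scoring function, here is a minimal Python sketch; the `Phrase` tuple and the `lm_score` / `phrase_score` callables are hypothetical stand-ins for the trigram LM h and the phrase table g, not the paper's actual implementation.

```python
from typing import Callable, List, NamedTuple, Tuple

class Phrase(NamedTuple):
    s: int                 # start position of the translated input span
    t: int                 # end position of the translated input span
    e: Tuple[str, ...]     # English words produced by this phrase

def distortion(t: int, s: int) -> int:
    # δ(t, s) = |t + 1 − s|
    return abs(t + 1 - s)

def f(y: List[Phrase],
      lm_score: Callable[[Tuple[str, ...]], float],   # h: trigram LM score
      phrase_score: Callable[[Phrase], float],        # g: phrase translation score
      eta: float) -> float:                           # η: negative constant
    # h(e(y)): language model score of the concatenated output
    e_y = tuple(w for p in y for w in p.e)
    score = lm_score(e_y)
    # Σ g(p_k): phrase translation scores
    score += sum(phrase_score(p) for p in y)
    # η Σ δ(t(p_k), s(p_{k+1})): distortion penalty between adjacent phrases
    score += eta * sum(distortion(y[k].t, y[k + 1].s) for k in range(len(y) - 1))
    return score
```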
Decoding with constraints
• Our purpose: solve argmax_{y∈Y} f(y)
• Define y(i) = the number of times input word x_i is translated in y
1. Each word in the input is translated exactly once: y(i) = 1 for all i
2. Distortion limit: δ(t(p_k), s(p_{k+1})) ≤ d for all k
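Both constraints are easy to state as checks on a derivation; a sketch below, reusing the hypothetical `Phrase` tuple from the previous sketch (positions are 1-based).

```python
def satisfies_constraints(y: List[Phrase], N: int, d: int) -> bool:
    # Constraint 1: each of the N input words is translated exactly once.
    counts = [0] * (N + 1)                    # counts[i] = y(i), 1-based
    for p in y:
        for i in range(p.s, p.t + 1):
            counts[i] += 1
    if any(c != 1 for c in counts[1:]):
        return False
    # Constraint 2: distortion between adjacent phrases is within the limit d.
    return all(abs(y[k].t + 1 - y[k + 1].s) <= d for k in range(len(y) - 1))
```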
Exact dynamic programming
• Use states (w1, w2, b, r):
– w1, w2: trigram language model context words
– b: bit string recording which input words have been translated
– r: end position of the previous phrase
• Yet this is intractable: the bit string b alone takes 2^N values for an N-word input
Decoding based on Lagrangian Relaxation
• Consider a broader set Y′ and solve argmax_{y∈Y′} f(y)
• Y′ uses the looser constraint: Σ_{i=1..N} y(i) = N
• That means N words are translated in total
Efficient Dynamic Programming
• Use states (w1, w2, n, r) or (w1, w2, n, l, m, r)
– n: number of translated words
– (l, m): range of the previously translated words
• Transition = one phrase translation p_k = (s, t, e)
Applying Lagrangian Relaxation
• Solve the relaxed problem plus the constraints:
argmax_{y∈Y′} f(y) such that y(i) = 1 for all i
• Apply the Lagrangian method:
L(u, y) = f(y) + Σ_i u(i)·(y(i) − 1)
• Dual objective and dual problem:
L(u) = max_{y∈Y′} L(u, y),   min_u L(u)
Decoding by subgradient method
Intuitive interpretation
• The Lagrange multiplier u(i) penalizes or rewards input word i so that it is translated exactly once
• Update: u_t(i) = u_{t−1}(i) − α_t·(y_t(i) − 1)
– Decrease u(i) if y(i) > 1
– Increase u(i) if y(i) = 0
– Do nothing if y(i) = 1
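The full subgradient loop is then short. The sketch below assumes a hypothetical `decode_relaxed(u)` that runs the efficient DP over Y′ under the modified scores f(y) + Σ_i u(i)·y(i), and a simple 1/t step size, which is an assumption rather than the paper's schedule.

```python
def subgradient_decode(N: int, decode_relaxed, max_iter: int = 200):
    # decode_relaxed(u) is assumed to return the argmax over Y' of
    # f(y) + sum_i u(i) * y(i)  (the efficient DP from the previous slides).
    u = [0.0] * (N + 1)                      # one multiplier per input word
    for t in range(1, max_iter + 1):
        y = decode_relaxed(u)
        counts = [0] * (N + 1)               # counts[i] = y_t(i)
        for p in y:
            for i in range(p.s, p.t + 1):
                counts[i] += 1
        # If every word is translated exactly once, y is feasible and
        # provably optimal for the original problem.
        if all(counts[i] == 1 for i in range(1, N + 1)):
            return y
        alpha = 1.0 / t                      # hypothetical step-size schedule
        for i in range(1, N + 1):
            # u_t(i) = u_{t-1}(i) - alpha_t * (y_t(i) - 1)
            u[i] -= alpha * (counts[i] - 1)
    return None                              # no certificate within max_iter
```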
Input: dadurch können die qualität und die regelmäßige postzustellung auch weiterhin sichergestellt werden .
(intermediate hypotheses from earlier iterations, each translating some input words more than once or not at all)
Output: in that way, the quality and the regular distribution should continue to be guaranteed.
Experimental summary
• Language: German-to-English translation
• Corpus: Europarl data (1,824 sentences)
• The proposed method finds exact solutions for 99% of the sentences
• Average run time is 120 seconds
• Moses makes search errors on 4 to 18% of the sentences
Table 1: iterations and convergence
• 97% of the examples converge within 120 iterations
Table 4: ILP/LP are too slow
Table 5: Moses search errors
Table 7: BLEU does not improve
Conclusion
• Described an exact decoding algorithm for SMT using Lagrangian relaxation
• The proposed method finds exact solutions for 99% of the samples, within 120 seconds on average
• Future work: apply Lagrangian relaxation to training algorithms for SMT
Any Questions?
Transition for DP
• Define a transition as one phrase translation p_k = (s, t, e), where e = (e_1, ..., e_M):

(w1, w2, n, l, m, r) → (w1′, w2′, n′, l′, m′, r′)

(w1′, w2′) = (e_{M−1}, e_M) if M > 1
(w1′, w2′) = (w2, e_1) if M = 1

n′ = n + t − s + 1
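A sketch of this transition, reusing the hypothetical `Phrase` tuple from earlier; the slide only defines the context-word and count updates, so the updates to (l, m, r) below (span and end position of the phrase just added) are my assumption.

```python
def transition(state, p: Phrase):
    # state = (w1, w2, n, l, m, r); p = (s, t, e) with e = (e_1, ..., e_M)
    w1, w2, n, l, m, r = state
    M = len(p.e)
    # New trigram context: the last two output words after appending e.
    if M > 1:
        w1_new, w2_new = p.e[M - 2], p.e[M - 1]
    else:
        w1_new, w2_new = w2, p.e[0]
    # n' = n + t - s + 1: the count grows by the length of the input span.
    n_new = n + p.t - p.s + 1
    # Assumed updates: (l', m') = span of the new phrase, r' = its end position.
    return (w1_new, w2_new, n_new, p.s, p.t, p.t)
```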