Edinburgh MT lecture 7: phrase-based MT

Post on 25-Jul-2015

Phrase-Based Translation

The IBM Models

•Fertility probabilities.

•Word translation probabilities.

•Distortion probabilities.

•Some problems:

•Weak reordering model -- output is not fluent.

•Many decisions -- many things can go wrong.

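IBM Model 4

Chinese: 虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Gloss: Although north wind howls , but sky still very clear .
English: However , the sky remained clear under the strong north wind .

[Figure: IBM Model 4 applied to this sentence pair. Intermediate strings shown: "虽然 北 风 呼啸 , 天空 天空 依然 清澈 。" and "north wind strong , the sky remained clear .", with "However" and "under the" placed separately.]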

Tradeoffs: Modeling v. Learning

Model         Lexical translation   Local ordering dependency   Fertility   Convex   Tractable exact inference
IBM Model 1   ✔                     ✘                           ✘           ✔        ✔
HMM           ✔                     ✔                           ✘           ✘        ✔
IBM Model 4   ✔                     ✔                           ✔           ✘        ✘

Lesson: Trade exactness for expressivity

What are some things this model doesn’t account for?

Phrase-based Models

Chinese: 虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Gloss: Although north wind howls , but sky still very clear .

Phrase translations (before reordering): However the strong north wind , the sky remained clear under .
Output (after reordering): However , the sky remained clear under the strong north wind .

p(English, alignment | Chinese) = p(segmentation) · p(translations) · p(reorderings)

distortion = 6

Phrase-based Models

•Segmentation probabilities.

•Phrase translation probabilities.

•Distortion probabilities.

•Some problems:

•Weak reordering model -- output is not fluent.

•Many decisions -- many things can go wrong.

Phrase-based Models

•Segmentation probabilities: fixed (uniform)

•Phrase translation probabilities.

•Distortion probabilities: fixed (decaying)
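With segmentation fixed to uniform and distortion fixed to a decaying function of the jump size, only the phrase translation probabilities remain to be learned. The sketch below shows one way the three factors could combine when scoring a single derivation. The phrase probabilities, the decay base `alpha`, and the distortion convention |start of this phrase - end of previous phrase - 1| are illustrative assumptions, not values taken from the lecture.

```python
import math

def derivation_log_prob(phrases, phrase_table, alpha=0.5):
    """Score one derivation.

    phrases: list of (src_start, src_end, src_phrase, tgt_phrase) in target
    order, with inclusive 0-based source spans.
    phrase_table: dict mapping (src_phrase, tgt_phrase) to a probability.
    alpha: assumed decay base for the distortion penalty.
    """
    log_p = 0.0      # uniform segmentation is a constant, so it is dropped
    prev_end = -1    # position just before the first source word
    for start, end, src, tgt in phrases:
        log_p += math.log(phrase_table[(src, tgt)])  # p(translations)
        jump = abs(start - prev_end - 1)              # distortion distance
        log_p += jump * math.log(alpha)               # decaying p(reorderings)
        prev_end = end
    return log_p

# Hypothetical toy phrase table and two derivations of the same source span.
phrase_table = {("北 风", "north wind"): 0.6, ("呼啸", "howls"): 0.5}
monotone = [(0, 1, "北 风", "north wind"), (2, 2, "呼啸", "howls")]
swapped = [(2, 2, "呼啸", "howls"), (0, 1, "北 风", "north wind")]
print(derivation_log_prob(monotone, phrase_table))
print(derivation_log_prob(swapped, phrase_table))
```

The monotone derivation pays no distortion penalty, while the swapped one is penalized for every position it jumps over, which is what "decaying" means here.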

Learning p(Chinese|English)

•Reminder: (nearly) every problem comes down to computing either:

•Sums: MLE or EM (learning)

•Maximum: most probable (decoding)

Recap: Expectation Maximization

•Arbitrarily select a set of parameters (say, uniform).

•Calculate expected counts of the unseen events.

•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

•Iterate.

•Guaranteed that likelihood is monotonically nondecreasing.

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
However , the sky remained clear under the strong north wind .

[Figure: the sentence pair drawn under several different alignments, combined as p(...) + p(...) + p(...).]

Marginalize: sum all alignments containing the link

Divide by sum of all possible alignments

We have to sum over exponentially many alignments!
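To make the cost concrete, here is a minimal brute-force sketch of that computation for a word-based model of the Model 1 kind, where every word on one side picks one word on the other. The function name and the toy probabilities are hypothetical. Uniform alignment probabilities and the length term appear in both the numerator and the denominator, so the sketch omits them.

```python
from itertools import product

def link_posterior_brute_force(f_words, e_words, t, i, j):
    """p(a_i = j | F, E), computed by enumerating every alignment.

    t[(f, e)] is an assumed word translation probability p(f | e).
    """
    numer = 0.0
    denom = 0.0
    # Each of the len(f_words) words picks one of len(e_words) positions,
    # so there are len(e_words) ** len(f_words) alignments to visit.
    for a in product(range(len(e_words)), repeat=len(f_words)):
        p = 1.0
        for f, pos in zip(f_words, a):
            p *= t[(f, e_words[pos])]
        denom += p              # sum of all possible alignments
        if a[i] == j:
            numer += p          # alignments containing the link
    return numer / denom

# Hypothetical toy probabilities.
t = {("north", "北"): 0.7, ("north", "风"): 0.1,
     ("wind", "北"): 0.2, ("wind", "风"): 0.8}
print(link_posterior_brute_force(["north", "wind"], ["北", "风"], t, 0, 0))
```

The loop visits J^I alignments for I words on one side and J on the other, which is already out of reach for the 12-token example sentence.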

EM for Model 1

Probability of an alignment:

p(F, A \mid E) = p(I \mid J) \prod_{i=1}^{I} p(a_i = j) \, p(f_i \mid e_j)

p(I \mid J) is observed, p(a_i = j) is uniform, and the product factors across words.

Posterior probability of a link:

p(a_i = j \mid F, E) = \frac{p(a_i = j, F \mid E)}{p(F \mid E)}

For the link between 北 and north, the numerator is the marginal probability of alignments containing the link, and the denominator is the marginal probability of all alignments:

= \frac{\sum_{a \in A:\, \text{北} \leftrightarrow \text{north}} p(\text{north} \mid \text{北}) \cdot p(\text{rest of } a)}{\sum_{c \in \text{Chinese words}} \sum_{a \in A:\, c \leftrightarrow \text{north}} p(\text{north} \mid c) \cdot p(\text{rest of } a)}

= \frac{p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\, \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c) \cdot \sum_{a \in A:\, c \leftrightarrow \text{north}} p(\text{rest of } a)}

The inner sums over p(rest of a) are identical, so they cancel:

= \frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}
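Because the link posterior reduces to a ratio of word translation probabilities within one sentence pair, the E-step never has to enumerate alignments. A minimal sketch of EM for Model 1 under that simplification follows. The function name, the toy corpus, and the iteration count are assumptions made for illustration.

```python
from collections import defaultdict

def model1_em(corpus, iterations=10):
    """Estimate t[(f, e)] = p(f | e) with EM.

    corpus: list of (f_words, e_words) sentence pairs.
    """
    f_vocab = {f for f_words, _ in corpus for f in f_words}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of each (f, e) link
        total = defaultdict(float)   # expected count of each e
        for f_words, e_words in corpus:
            for f in f_words:
                # Denominator of the posterior: sum over words in the pair.
                z = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    post = t[(f, e)] / z          # link posterior
                    count[(f, e)] += post
                    total[e] += post
        # M-step: renormalize the expected counts.
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t

# Hypothetical two-sentence corpus.
corpus = [(["north", "wind"], ["北", "风"]), (["north"], ["北"])]
t = model1_em(corpus)
print(t[("north", "北")], t[("wind", "风")])
```

Each E-step costs time proportional to the product of the two sentence lengths, in contrast to the exponential enumeration.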

EM for Phrase-Based

•Model parameters: p(E phrase|F phrase)

•All we need to do is compute expectations:

p(a_{i,i'} = \langle j, j' \rangle \mid F, E) = \frac{p(a_{i,i'} = \langle j, j' \rangle, F \mid E)}{p(F \mid E)}

p(F|E) sums over all possible phrase alignments, which are one-to-one by definition.

Chinese: 虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Gloss: Although north wind howls , but sky still very clear .
English: However , the sky remained clear under the strong north wind .

[Figure: the sentence pair with "However" set apart from the rest of the English sentence, ", the sky remained clear under the strong north wind ."]

Can we compute this quantity?

How many 1-to-1 alignments are there of the remaining 8 Chinese and 8 English words?

Recap: Expectation Maximization

•Arbitrarily select a set of parameters (say, uniform).

•Calculate expected counts of the unseen events.

•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

•Iterate.

•Guaranteed that likelihood is monotonically nondecreasing.

Computing expectations from a phrase-based model, given a sentence pair, is #P-complete (by reduction from counting perfect matchings; DeNero & Klein, 2008).
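The link to perfect matchings can be seen in a simplified setting where every phrase is a single word: the sum over one-to-one alignments of the product of link scores is then the permanent of the score matrix. The brute-force sketch below is only an illustration of that quantity, with a hypothetical function name. It is not the reduction used in the cited paper.

```python
from itertools import permutations
from math import factorial, prod

def sum_over_one_to_one_alignments(score):
    """Sum, over every one-to-one alignment of n items to n items, of the
    product of link scores score[i][j]. This is the matrix permanent."""
    n = len(score)
    return sum(
        prod(score[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )

# With every link score equal to 1, the sum just counts the alignments.
ones = [[1] * 8 for _ in range(8)]
print(sum_over_one_to_one_alignments(ones), factorial(8))
```

With all scores set to 1 the function counts the one-to-one alignments of 8 items to 8 items, which is 8! = 40320. The enumeration grows factorially, and no polynomial-time algorithm is known for the general permanent.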

argmax_a p(a | f, e) is also hard

Now What?

•Option #1: approximate expectations

•Restrict computation to some tractable subset of the alignment space (arbitrarily biased).

•Markov chain Monte Carlo (slow).

Now What?

•Change the problem definition

•We already know how to learn word-to-word translation models efficiently.

•Idea: learn word-to-word alignments, extract most probable alignment, then treat it as observed.

•Learn phrase translations consistent with word alignments.

•Decouples alignment from model learning -- is this a good thing?

Phrase Extraction

English: I open the box
Japanese: watashi wa hako wo akemasu

[Figure: word alignment grid for this sentence pair, with one candidate phrase pair highlighted per slide.]

Candidate phrase pairs (✘ marks a pair that is rejected):

akemasu / open
watashi wa / I
watashi / I ✘
hako wo / box
hako wo / the box
hako wo / open the box ✘
hako wo akemasu / open the box
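The accepted and rejected pairs can be reproduced with the usual consistency check: a phrase pair is kept if it contains at least one alignment link and no link crosses its boundary. The sketch below assumes a word alignment for the example (I with watashi and wa, open with akemasu, box with hako and wo, "the" unaligned), since the alignment grid itself is not recoverable from the transcript. The function name and the `max_len` limit are also assumptions.

```python
def extract_phrases(src, tgt, links, max_len=4):
    """Return all phrase pairs consistent with a word alignment.

    links: set of (src_index, tgt_index) pairs.
    """
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(t1 + max_len, len(tgt))):
                    # Links with both ends inside the candidate pair.
                    inside = [(s, t) for s, t in links
                              if s1 <= s <= s2 and t1 <= t <= t2]
                    # Links with exactly one end inside: these break it.
                    crossing = [(s, t) for s, t in links
                                if (s1 <= s <= s2) != (t1 <= t <= t2)]
                    if inside and not crossing:
                        pairs.append((" ".join(src[s1:s2 + 1]),
                                      " ".join(tgt[t1:t2 + 1])))
    return pairs

src = ["watashi", "wa", "hako", "wo", "akemasu"]
tgt = ["I", "open", "the", "box"]
# Assumed alignment: (Japanese index, English index).
links = {(0, 0), (1, 0), (2, 3), (3, 3), (4, 1)}
for pair in extract_phrases(src, tgt, links):
    print(" / ".join(pair))
```

Under this assumed alignment, "watashi / I" is rejected because "wa" is also linked to "I", and "hako wo / open the box" is rejected because "open" is linked to "akemasu", which lies outside the Japanese span.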

Phrasal Translation Estimation

•Option #1 (EM over restricted space)

•Align with a word-based model.

•Compute expectations only over alignments consistent with the alignment grid.

•Option #2 (Non-global estimation)

•View phrase pairs as observed, irrespective of context or overlap.
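Under Option #2, once phrase pairs are treated as observed, estimation reduces to counting. A minimal sketch using relative frequencies follows. The function name and the list of extracted pairs are hypothetical, and relative frequency is one common choice rather than something the slide specifies.

```python
from collections import Counter

def estimate_phrase_table(extracted_pairs):
    """Relative-frequency estimate p(e_phrase | f_phrase).

    extracted_pairs: list of (f_phrase, e_phrase) pairs, one entry per
    extraction, counted irrespective of context or overlap.
    """
    pair_counts = Counter(extracted_pairs)
    f_counts = Counter(f for f, _ in extracted_pairs)
    return {(f, e): c / f_counts[f] for (f, e), c in pair_counts.items()}

# Hypothetical extractions pooled from several sentence pairs.
pairs = [("hako wo", "box"), ("hako wo", "the box"),
         ("hako wo", "the box"), ("akemasu", "open")]
for (f, e), p in estimate_phrase_table(pairs).items():
    print(f, "/", e, p)
```

Because overlapping phrase pairs from the same sentence are all counted, the result is a heuristic estimate rather than the maximum of a well-defined likelihood.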

Decoding

We want to solve this problem:

e^* = \arg\max_{e} p(e \mid f)

Q: how many English sentences are there?

北 风 呼啸 。

O(2^n) segmentations
O(5^n) substitutions
O(n!) permutations
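A quick calculation shows how fast these three factors grow. The sketch below reads the constant 5 as an assumed number of translation options per word, and uses the exact count 2^(n-1) for segmentations, which is within the O(2^n) bound. The substitution and permutation counts are upper bounds reached when every word is its own phrase.

```python
from math import factorial

def search_space_size(n, options_per_word=5):
    """Rough size of each factor of the search space for n source words."""
    segmentations = 2 ** (n - 1)           # boundary or not between neighbors
    substitutions = options_per_word ** n  # upper bound: one phrase per word
    permutations = factorial(n)            # upper bound: one phrase per word
    return segmentations, substitutions, permutations

for n in (4, 12):
    print(n, search_space_size(n))
```

For the four-token example 北 风 呼啸 。 this gives 8 segmentations, 625 substitutions, and 24 permutations. For the twelve-token sentence used elsewhere in the lecture, the factorial term alone is 479001600.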

Key Idea

Dynamic Programming

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

[Figure: search graph built over the source sentence, one slide per step. State labels shown, in order of appearance: "START Although", "crystal clear", "START However", "wind shrieked", "wind screamed", "north wind", "the sky", "shrieked ,", ", yet", "sky ,", "clear .", "still quite", "blue ."]

Although the northern wind shrieked across the sky, but was still very clear.

Approximation: Pruning

Idea: prune states by accumulated path length

Solution: Group states by number of covered words.
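Grouping states by the number of covered source words is the basis of stack decoding. The sketch below is a deliberately small version under several assumptions: a hypothetical toy phrase table, no language model, no recombination of equivalent states, a decaying distortion penalty with assumed base `alpha`, and histogram pruning that keeps the best `beam_size` hypotheses per stack.

```python
import math

def stack_decode(src, phrase_table, beam_size=5, alpha=0.5, max_phrase_len=3):
    """Beam-search decoder with one stack per number of covered source words.

    phrase_table: dict mapping a tuple of source words to a list of
    (target_string, probability) options.
    """
    n = len(src)
    # A hypothesis is (log_prob, coverage_bitmask, last_end, output_phrases).
    stacks = [[] for _ in range(n + 1)]
    stacks[0].append((0.0, 0, -1, ()))
    for covered in range(n):
        # Prune: expand only the best beam_size hypotheses in this stack.
        stacks[covered].sort(key=lambda h: h[0], reverse=True)
        for log_p, mask, last_end, out in stacks[covered][:beam_size]:
            for start in range(n):
                for end in range(start, min(start + max_phrase_len, n)):
                    span_mask = ((1 << (end - start + 1)) - 1) << start
                    if mask & span_mask:
                        continue  # overlaps words already translated
                    phrase = tuple(src[start:end + 1])
                    for tgt, p in phrase_table.get(phrase, []):
                        jump = abs(start - last_end - 1)
                        score = log_p + math.log(p) + jump * math.log(alpha)
                        stacks[covered + end - start + 1].append(
                            (score, mask | span_mask, end, out + (tgt,)))
    if not stacks[n]:
        return None  # no full-coverage hypothesis survived
    best = max(stacks[n], key=lambda h: h[0])
    return " ".join(best[3]), best[0]

# Hypothetical toy phrase table.
table = {
    ("北", "风"): [("north wind", 0.6)],
    ("北",): [("north", 0.5)],
    ("风",): [("wind", 0.5)],
    ("呼啸",): [("howls", 0.4), ("shrieked", 0.3)],
}
print(stack_decode(["北", "风", "呼啸"], table))
```

Hypotheses in the same stack have translated the same number of source words, so their scores are more comparable than those of hypotheses at different stages of completion. A fuller decoder would also add an estimate of future cost and a language model score.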

•Some (not all) key ingredients in Google Translate:

•Phrase-based translation models

•... Learned heuristically from word alignments

•... Coupled with a huge language model

•... And decoding w/ severe pruning heuristics