Why Generative Models Underperform Surface Heuristics
UC Berkeley Natural Language Processing
John DeNero, Dan Gillick, James Zhang, and Dan Klein
Overview: Learning Phrases

Heuristic pipeline:
Sentence-aligned corpus → Directional word alignments → Intersected and grown word alignments → Phrase table (translation model)

Example phrase table entries:
cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
…
Overview: Learning Phrases

Alternative pipeline:
Sentence-aligned corpus → Phrase-level generative model → Phrase table (translation model)
• An early successful phrase-based SMT system [Marcu & Wong '02]
• Challenging to train
• Underperforms the heuristic approach
Outline
I) Generative phrase-based alignment
   - Motivation
   - Model structure and training
   - Performance results
II) Error analysis
   - Properties of the learned phrase table
   - Contributions to increased error rate
III) Proposed improvements
Motivation for Learning Phrases

Translate!
Input sentence: J ' ai un chat .
Output sentence: I have a spade .
Motivation for Learning Phrases

[Word alignment grid: "appelle un chat un chat" aligned to "call a spade a spade"]

Word-aligned pairs:
appelle ||| call
chat un chat ||| spade a spade
Motivation for Learning Phrases

[Word alignment grid: "appelle un chat un chat" aligned to "call a spade a spade"]

All extracted phrase pairs:
appelle ||| call
appelle un ||| call a
appelle un chat ||| call a spade
un ||| a (×2)
un chat ||| a spade (×2)
un chat un ||| a spade a
chat ||| spade (×2)
chat un ||| spade a
chat un chat ||| spade a spade

… appelle un chat un chat …
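The heuristic pipeline extracts every sub-phrase pair consistent with the word alignment. A minimal sketch of that extraction, assuming the standard consistency criterion (no alignment link may cross a phrase boundary), a maximum phrase length of 3, and a hypothetical one-to-one monotone alignment for the idiom:

```python
def extract_phrases(f_words, e_words, alignment, max_len=3):
    """Extract all phrase pairs consistent with a word alignment.

    A pair (f[i:j], e[k:l]) is consistent if every alignment link
    touching either span falls inside both spans.
    """
    pairs = set()
    for i in range(len(f_words)):
        for j in range(i + 1, min(i + max_len, len(f_words)) + 1):
            for k in range(len(e_words)):
                for l in range(k + 1, min(k + max_len, len(e_words)) + 1):
                    links = [(fi, ei) for fi, ei in alignment
                             if i <= fi < j or k <= ei < l]
                    if links and all(i <= fi < j and k <= ei < l
                                     for fi, ei in links):
                        pairs.add((" ".join(f_words[i:j]),
                                   " ".join(e_words[k:l])))
    return pairs

f = "appelle un chat un chat".split()
e = "call a spade a spade".split()
# assumed monotone 1-1 alignment for the idiom
align = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
pairs = extract_phrases(f, e, align)
for fp, ep in sorted(pairs):
    print(f"{fp} ||| {ep}")
```

This recovers the nine distinct pairs listed above (duplicates such as "un ||| a" collapse into one entry of the set).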
A Phrase Alignment Model Compatible with Pharaoh

les chats aiment le poisson frais . ||| cats like fresh fish .
Training Regimen That Respects Word Alignment

[Alignment grids for "les chats aiment le poisson frais ." / "cats like fresh fish .": a phrase segmentation consistent with the word alignment is retained; one whose phrase boundaries cross alignment links is rejected (marked X)]
Only 46% of training sentences contributed to training.
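The regimen above keeps only phrase pairs that respect the word alignment, which is why many sentences contribute nothing. A minimal consistency check, assuming a hypothetical word alignment for the example sentence pair (the actual grid from the slide is not recoverable):

```python
def is_consistent(f_span, e_span, alignment):
    """A phrase pair is consistent with a word alignment iff no link
    crosses a phrase boundary: every link touching either span must
    fall inside both spans."""
    i, j = f_span
    k, l = e_span
    touching = [(fi, ei) for fi, ei in alignment
                if i <= fi < j or k <= ei < l]
    return bool(touching) and all(i <= fi < j and k <= ei < l
                                  for fi, ei in touching)

# "les chats aiment le poisson frais ." / "cats like fresh fish ."
# assumed word alignment (0-indexed French, English positions)
align = [(0, 0), (1, 0), (2, 1), (3, 3), (4, 3), (5, 2), (6, 4)]

# "le poisson frais" / "fresh fish": all links stay inside both spans
print(is_consistent((3, 6), (2, 4), align))  # True
# "poisson" / "fish" alone fails, because "le" also links to "fish"
print(is_consistent((4, 5), (3, 4), align))  # False
```

Under this criterion, any segmentation forced to split "le poisson" is discarded; a sentence with no surviving segmentation contributes nothing to training.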
Performance Results

[Plot: BLEU (36–40) vs. EM iterations (0–4) on 25k and 100k sentence corpora, against heuristically generated parameters]
Performance Results

[Bar chart, BLEU: Heuristic (100k) 39.0, Heuristic (50k) 38.5, Heuristic (25k) 38.3, Learned (100k) 38.8]

Lost training data is not the whole story: learned parameters trained on 4x the data still underperform the heuristic.
Outline
I) Generative phrase-based alignment
   - Model structure and training
   - Performance results
II) Error analysis
   - Properties of the learned phrase table
   - Contributions to increased error rate
III) Proposed improvements
Example: Maximizing Likelihood with Competing Segmentations

Training corpus:
French: carte sur la table    English: map on the table
French: carte sur la table    English: notice on the chart

A uniform phrase table:
carte ||| map ||| 0.5
carte ||| notice ||| 0.5
carte sur ||| map on ||| 0.5
carte sur ||| notice on ||| 0.5
carte sur la ||| map on the ||| 0.5
carte sur la ||| notice on the ||| 0.5
sur ||| on ||| 1.0
la ||| the ||| 1.0
sur la ||| on the ||| 1.0
sur la table ||| on the table ||| 0.5
sur la table ||| on the chart ||| 0.5
la table ||| the table ||| 0.5
la table ||| the chart ||| 0.5
table ||| table ||| 0.5
table ||| chart ||| 0.5

Likelihood computation for "carte sur la table": each of the 7 segmentations scores 0.25, so each training pair has likelihood 0.25 × 7 / 7 = 0.25.
Example: Maximizing Likelihood with Competing Segmentations

A degenerate (determinized) solution:
carte ||| map ||| 1.0
carte sur ||| notice on ||| 1.0
carte sur la ||| notice on the ||| 1.0
sur ||| on ||| 1.0
sur la ||| on the ||| 1.0
sur la table ||| on the table ||| 1.0
la ||| the ||| 1.0
la table ||| the table ||| 1.0
table ||| chart ||| 1.0

Likelihood of the "map on the table" pair: 1.0 × 2 / 7 ≈ 0.28 > 0.25
Likelihood of the "notice on the chart" pair: 1.0 × 2 / 7 ≈ 0.28 > 0.25
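The arithmetic above can be checked by enumerating segmentations directly. This is a sketch of the slide's worked example only, not the paper's full model: it assumes monotone word-for-word segmentations with a uniform prior over the 7 segmentations of a four-word sentence (maximum phrase length 3):

```python
from fractions import Fraction

def segmentations(n, max_len=3):
    """All ways to split n positions into contiguous spans of length <= max_len."""
    if n == 0:
        return [[]]
    out = []
    for k in range(1, min(max_len, n) + 1):
        for rest in segmentations(n - k, max_len):
            out.append([k] + rest)
    return out

def likelihood(f_words, e_words, phi, max_len=3):
    """Average, over monotone segmentations, of the product of phrase
    translation probabilities (uniform weight per segmentation)."""
    segs = segmentations(len(f_words), max_len)
    total = Fraction(0)
    for seg in segs:
        p, pos = Fraction(1), 0
        for k in seg:
            fp = " ".join(f_words[pos:pos + k])
            ep = " ".join(e_words[pos:pos + k])
            p *= Fraction(phi.get((fp, ep), 0))
            pos += k
        total += p
    return total / len(segs)

half = Fraction(1, 2)
uniform = {
    ("carte", "map"): half, ("carte", "notice"): half,
    ("carte sur", "map on"): half, ("carte sur", "notice on"): half,
    ("carte sur la", "map on the"): half, ("carte sur la", "notice on the"): half,
    ("sur", "on"): 1, ("la", "the"): 1, ("sur la", "on the"): 1,
    ("sur la table", "on the table"): half, ("sur la table", "on the chart"): half,
    ("la table", "the table"): half, ("la table", "the chart"): half,
    ("table", "table"): half, ("table", "chart"): half,
}
determinized = {
    ("carte", "map"): 1, ("carte sur", "notice on"): 1,
    ("carte sur la", "notice on the"): 1, ("sur", "on"): 1,
    ("sur la", "on the"): 1, ("sur la table", "on the table"): 1,
    ("la", "the"): 1, ("la table", "the table"): 1, ("table", "chart"): 1,
}

f = "carte sur la table".split()
print(likelihood(f, "map on the table".split(), uniform))          # 1/4
print(likelihood(f, "notice on the chart".split(), determinized))  # 2/7
```

Every segmentation scores 0.25 under the uniform table, while the determinized table makes two segmentations per pair score 1.0 and the rest 0, giving 2/7 ≈ 0.286 > 0.25. EM therefore prefers the degenerate solution.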
EM Training Significantly Decreases Entropy of the Phrase Table

[Histogram: percent of French phrases (0–40%) per entropy bin (0–.01, .01–.5, .5–1, 1–1.5, 1.5–2, >2) for the learned vs. heuristic tables]

After EM training, 10% of French phrases have deterministic translation distributions.
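The quantity being histogrammed is the Shannon entropy of each French phrase's translation distribution. A minimal sketch with illustrative numbers (not taken from the paper; the heuristic distribution loosely echoes the "degré" example later in the deck):

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a translation distribution P(e | f)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Illustrative distributions for a single French phrase.
heuristic = {"degree": 0.49, "level": 0.38, "extent": 0.02,
             "amount": 0.02, "other": 0.09}
learned = {"degree": 0.9999, "level": 0.0001}

print(entropy(heuristic))  # well above 1 bit: several plausible translations
print(entropy(learned))    # near 0: effectively deterministic
```

Phrases in the 0–.01 entropy bin are exactly those with (near-)deterministic distributions like `learned` above.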
Effect 1: Useful Phrase Pairs Are Lost Due to Critically Small Probabilities

In 10k translated sentences, no phrase pair with weight less than 10^-5 was used by the decoder.

[Bar chart: effective table size (thousands of phrases, 0–400), heuristic vs. learned]
Effect 2: Determinized Phrases Override Better Candidates During Decoding

Source:    the situation varies to an enormous degree
Heuristic: the situation varie d ' une immense degré
Learned:   the situation varie d ' une immense caractérise

Translations of "degré":
              φ_H     φ_EM
degree        0.49    0.64
level         0.38    0.26
extent        0.02    0.01
amount        0.02    ~0

Translations of "caractérise":
                φ_H     φ_EM
characterizes   0.49    0.001
characterized   0.21    0.001
features        0.05    ~0
degree          ~0      0.998
Effect 3: Ambiguous Foreign Phrases Become Active During Decoding

Deterministic phrases can be used by the decoder at no cost.

[Example: translations for the French apostrophe]
Outline
I) Generative phrase-based alignment
   - Model structure and training
   - Performance results
II) Error analysis
   - Properties of the learned phrase table
   - Contributions to increased error rate
III) Proposed improvements
Motivation for Reintroducing Entropy to the Phrase Table
1. Useful phrase pairs are lost due to critically small probabilities.
2. Determinized phrases override better candidates.
3. Ambiguous foreign phrases become active during decoding.
Reintroducing Lost Phrases

[Bar chart: BLEU (36.5–39, 25k sentences) for the Learned, Heuristic, and Interpolated tables]

Interpolation yields up to a 1.0 BLEU improvement.
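The interpolation fix can be sketched as a linear mixture of the two tables. The mixing weight (0.5) and the toy entries below are illustrative assumptions, not the paper's settings:

```python
def interpolate(phi_em, phi_h, lam=0.5):
    """Linear interpolation of learned (EM) and heuristic phrase tables.

    Pairs driven to (near-)zero probability by EM are reintroduced
    with mass from the heuristic table.
    """
    pairs = set(phi_em) | set(phi_h)
    return {p: lam * phi_em.get(p, 0.0) + (1 - lam) * phi_h.get(p, 0.0)
            for p in pairs}

phi_em = {("degré", "degree"): 0.64, ("degré", "level"): 0.26}
phi_h = {("degré", "degree"): 0.49, ("degré", "level"): 0.38,
         ("degré", "extent"): 0.02}  # pair lost by EM survives here
mixed = interpolate(phi_em, phi_h)
print(mixed[("degré", "extent")])  # the lost pair regains nonzero mass
```

Any pair present in either table gets a nonzero score in the mixture, so phrases pruned by EM become usable again at decoding time.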
Smoothing Phrase Probabilities

Smoothing reserves probability mass for unseen translations based on the length of the French phrase.

[Bar chart: BLEU (36.5–39, 25k sentences) for the Learned, Heuristic, and Smoothed tables]
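One way to realize length-based smoothing is to discount each observed distribution and set aside the freed mass for unseen translations, reserving more for longer French phrases (which are sparser). The slide only states the idea; the linear schedule and the 0.1-per-word rate below are illustrative assumptions:

```python
def smooth(translations, f_phrase, reserve_per_word=0.1):
    """Discount a French phrase's observed translation distribution
    and reserve the freed probability mass for unseen translations.
    The reserved share grows with the French phrase length."""
    n_words = len(f_phrase.split())
    reserved = min(0.9, reserve_per_word * n_words)
    discounted = {e: p * (1 - reserved) for e, p in translations.items()}
    return discounted, reserved

dist, reserved = smooth({"degree": 0.998, "level": 0.002}, "degré")
print(reserved)   # small reserve for a one-word phrase
_, reserved3 = smooth({"on the table": 1.0}, "sur la table")
print(reserved3)  # larger reserve for a three-word phrase
```

Like interpolation, this undoes the determinization: no translation keeps probability 1.0, so deterministic phrases no longer ride through the decoder for free.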
Conclusion
• Generative phrase models determinize the phrase table via the latent segmentation variable.
• A determinized phrase table introduces errors at decoding time.
• Modest improvement can be realized by reintroducing phrase table entropy.
Questions?