NTT-UT SMT System for NTCIR-9 PatentMT
Katsuhito Sudoh, Kevin Duh, Hajime Tsukada, Masaaki Nagata
NTT Communication Science Laboratories
Kyoto, Japan
Xianchao Wu, Takuya Matsuzaki, Jun’ichi Tsujii
The University of Tokyo
Tokyo, Japan
Overview

English-Japanese (rank: 1st)
  • single system: Pre-ordering, Big LM, WFST decode
  • additional: Sys. Comb. + U-Tokyo forest-to-tree
Japanese-English (rank: 5th)
  • single system: Pre-ordering, WA Adaptation
  • additional: Sys. Comb. + U-Tokyo HPBMT
Chinese-English (rank: 9th)
  • additional: Sys. Comb. + U-Tokyo HPBMT

English-Japanese: better than RBMT even in subjective evaluation!
→ Today’s Focus!
Head Finalization for En-Ja Pre-ordering

• Isozaki et al. (WMT 2010)
• Move heads to the right-hand side on the HPSG tree
  • English HPSG parser “Enju” (U-Tokyo)
• Pseudo-word insertion for Japanese particles
  • uses predicate-argument structure from Enju
• Determiner (a/an/the) deletion
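The head-movement step can be sketched as a recursive transform over a parse tree. Below is a minimal illustration on a toy tuple-encoded tree; the actual system operates on HPSG trees produced by Enju, so `head_finalize` and the `(head_index, children)` encoding here are assumptions for the sketch:

```python
# A minimal sketch of head finalization on a toy parse tree.
# Each internal node is (head_index, children); head_index marks which
# child is the syntactic head. Moving every head child to the rightmost
# position yields Japanese-like head-final word order.
# (Toy data structure for illustration; the real system uses HPSG
# trees from the Enju parser.)

def head_finalize(node):
    """Recursively move the head child of every node to the last position."""
    if isinstance(node, str):          # leaf: a word
        return node
    head_idx, children = node
    children = [head_finalize(c) for c in children]
    head = children.pop(head_idx)
    children.append(head)              # head goes to the right-hand side
    return (len(children) - 1, children)

def yield_words(node):
    """Read off the leaf words left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [w for c in children for w in yield_words(c)]

# "lost my wallet": the head "lost" precedes its object in English
tree = (0, ["lost", (1, ["my", "wallet"])])
print(yield_words(head_finalize(tree)))   # → ['my', 'wallet', 'lost']
```

After the transform the verb follows its object, matching Japanese verb-final order.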
Head Finalization Example

  I lost my wallet in the airport yesterday
    (head)             (head)

• Move heads
• Remove a, an, the
• Insert pseudo-particles for subjects & objects (_va0, _va2)

  I _va0 yesterday airport in my wallet _va2 lost
  私 は  昨日    空港  で 私 の 財布   を   なくし た

→ Monotone translation!
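The two lexical steps above (determiner deletion and pseudo-particle insertion) can be sketched as a pass over the head-finalized tokens. The argument roles are hand-annotated here as an illustrative stand-in; the real system reads them from Enju's predicate-argument structure:

```python
# Toy sketch of the remaining pre-ordering steps: determiner deletion and
# pseudo-particle insertion. Each token carries an argument role (here
# hand-annotated; the real system derives roles from Enju). _va0 follows
# subjects and _va2 follows objects, mimicking Japanese particles は and を.

DETERMINERS = {"a", "an", "the"}

def insert_particles(tokens):
    """tokens: list of (word, role) with role in {'subj', 'obj', None}."""
    out = []
    for word, role in tokens:
        if word.lower() in DETERMINERS:
            continue                       # determiner deletion
        out.append(word)
        if role == "subj":
            out.append("_va0")             # pseudo-particle for subjects
        elif role == "obj":
            out.append("_va2")             # pseudo-particle for objects
    return out

# tokens already in head-final order
tokens = [("I", "subj"), ("yesterday", None), ("the", None),
          ("airport", None), ("in", None), ("my", None),
          ("wallet", "obj"), ("lost", None)]
print(" ".join(insert_particles(tokens)))
# → I _va0 yesterday airport in my wallet _va2 lost
```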
Japanese Big LM

• Word 5-gram LM trained on 300M Japanese sentences
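For intuition, the estimation step of a word n-gram LM reduces to counting. A minimal maximum-likelihood sketch (real large-scale LMs use smoothing such as Kneser-Ney and dedicated toolkits; the corpus here is invented for illustration):

```python
# Minimal sketch of estimating a word 5-gram LM by maximum likelihood.
# Only the counting step is shown; production LMs apply smoothing.
from collections import Counter

N = 5

def count_ngrams(sentences, n=N):
    grams, contexts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] * (n - 1) + words + ["</s>"]
        for i in range(len(padded) - n + 1):
            gram = tuple(padded[i:i + n])
            grams[gram] += 1
            contexts[gram[:-1]] += 1
    return grams, contexts

def prob(grams, contexts, gram):
    """MLE P(w_n | w_1..w_{n-1}); zero if the context is unseen."""
    c = contexts[gram[:-1]]
    return grams[gram] / c if c else 0.0

corpus = [["特許", "を", "出願", "する"], ["特許", "を", "取得", "する"]]
grams, contexts = count_ngrams(corpus)
g = ("<s>",) * 4 + ("特許",)
print(prob(grams, contexts, g))   # → 1.0: both sentences start with 特許
```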
WFST-based Monotone Decoding

• MT becomes monotone after pre-ordering
• Efficient decoding by a cascade of WFSTs:
  phrase segmentation → phrase translation → word segmentation → LM
• Efficient on-the-fly composition
• ~3x faster than Moses PBMT
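Once the input is monotone, decoding through the composed cascade amounts to dynamic programming over source positions. The following toy decoder shows that effect; the phrase table, scores, and the stand-in LM are invented for illustration and are not the actual WFST implementation:

```python
# Toy monotone phrase-based decoder: DP over source positions, which is
# what the composed cascade (segmentation -> translation -> word
# segmentation -> LM) computes. Phrase table and LM are hypothetical.

phrase_table = {  # source phrase (pre-ordered English) -> [(Japanese, log prob)]
    ("I", "_va0"): [("私 は", -0.2)],
    ("my",): [("私 の", -0.3)],
    ("wallet", "_va2"): [("財布 を", -0.4)],
    ("lost",): [("なくし た", -0.2)],
}

def lm_logprob(words):
    return -0.1 * len(words)   # stand-in for a real 5-gram LM score

def decode(source, max_len=3):
    # best[i] = (logprob, output words) covering source[:i]
    best = {0: (0.0, [])}
    for i in range(1, len(source) + 1):
        for j in range(max(0, i - max_len), i):
            if j not in best:
                continue
            src = tuple(source[j:i])
            for tgt, lp in phrase_table.get(src, []):
                t = tgt.split()
                score = best[j][0] + lp + lm_logprob(t)
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + t)
    return best.get(len(source), (None, None))[1]

print(decode(["I", "_va0", "my", "wallet", "_va2", "lost"]))
# → ['私', 'は', '私', 'の', '財布', 'を', 'なくし', 'た']
```

Because no reordering hypotheses are needed, the search is linear in the source length, which is one source of the speedup over standard PBMT decoding.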
Generalized MBR-based System Combination

• Duh et al. (IJCNLP 2011)
• Hypothesis selection over the N-best lists of M systems
• Optimized for RIBES+BLEU
• System-independent “agreement” features
  • sub-components of RIBES & BLEU
• Ranking-SVM-like pairwise training
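The core selection idea can be sketched as plain MBR over the pooled N-best lists: pick the candidate with the highest expected agreement with all other candidates. A simple n-gram overlap stands in here for the paper's RIBES+BLEU sub-component features and learned weights, so this is a simplification of the actual method:

```python
# Sketch of MBR-style hypothesis selection over pooled N-best lists:
# choose the candidate with the highest average n-gram agreement with
# the other candidates. A BLEU-like overlap replaces the learned
# RIBES+BLEU feature combination of the real system.
from collections import Counter

def ngram_overlap(hyp, other, n=2):
    """Fraction of hyp's 1..n-grams also found in other."""
    matched, total = 0, 0
    for k in range(1, n + 1):
        h = Counter(tuple(hyp[i:i + k]) for i in range(len(hyp) - k + 1))
        o = Counter(tuple(other[i:i + k]) for i in range(len(other) - k + 1))
        matched += sum((h & o).values())   # clipped n-gram matches
        total += sum(h.values())
    return matched / total if total else 0.0

def mbr_select(candidates):
    """candidates: token lists pooled from M systems' N-best lists."""
    def expected_gain(hyp):
        others = [c for c in candidates if c is not hyp]
        return sum(ngram_overlap(hyp, o) for o in others) / len(others)
    return max(candidates, key=expected_gain)

pool = [
    "i lost my wallet".split(),
    "i lost my wallet yesterday".split(),
    "my wallet was lost".split(),
]
print(mbr_select(pool))   # → ['i', 'lost', 'my', 'wallet']
```

The selected hypothesis is the "consensus" output, which is why diversity among the combined systems matters: disagreements are averaged out rather than propagated.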
EJ Auto-Eval Results

  System              BLEU (%)   RIBES (%)
  HPBMT Baseline        31.66      72
  F2S (U-Tokyo)         27.99      68.61
  PreOrder (WFST)       36.83      77.29
  PO+BigLM (Moses)      38.81      77.82
  GMBR Sys. Comb.       39.48      78.13
EJ Subj.-Eval Results

  System              Adequacy   Acceptability (%)
  HPBMT Baseline        2.60        n/a
  PreOrder (WFST)       3.56        47
  GMBR Sys. Comb.       3.67        69
  RBMT6-1               3.51        66
What we found...

• Head Finalization worked QUITE well!
  • simple but effective for EJ translation
  • monotone translation is relatively easy?
• Further improved by GMBR Sys. Comb.
  • system variance (diversity) is important?
Conclusion

• State-of-the-art EJ translation
  • even better than RBMT!
• ... but moderate results in JE/CE
  • JE pre-ordering, CE adaptation
That’s It!

Acknowledgments
• PatentMT organizers, for organizing this great task!
• Prof. Hideki Isozaki, for Head Finalization
• Dr. Takaaki Hori and Dr. Shinji Watanabe, for WFST decoding