Page 1: Edinburgh's Neural Machine Translation Systems

Edinburgh’s Neural Machine Translation Systems

Barry Haddow

University of Edinburgh

October 27, 2016


Page 2: Edinburgh's Neural Machine Translation Systems

Collaborators

Rico Sennrich, Alexandra Birch


Page 3: Edinburgh's Neural Machine Translation Systems

Edinburgh’s WMT Results Over the Years

[Figure: BLEU on newstest2013 (EN→DE), 2013–2016. Phrase-based SMT: 20.3, 20.9, 20.8, 21.5; syntax-based SMT: 19.4, 20.2, 22, 22.1; neural MT (2016): 24.7.]


Page 6: Edinburgh's Neural Machine Translation Systems

Neural Machine Translation: Encoder-Decoder with Attention

[Image: Philipp Koehn]
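The image (not reproduced in this transcript) shows the standard encoder-decoder network with an attention mechanism. As a rough illustration of the attention step only, here is a minimal NumPy sketch of additive (Bahdanau-style) attention; all names and dimensions are illustrative assumptions, not Edinburgh's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(enc_states, dec_state, W_a, U_a, v_a):
    """One attention step: score each source position against the current
    decoder state, then form a weighted context vector."""
    # enc_states: (src_len, enc_dim); dec_state: (dec_dim,)
    scores = np.tanh(dec_state @ W_a.T + enc_states @ U_a.T) @ v_a  # (src_len,)
    alpha = softmax(scores)         # attention weights over source positions
    context = alpha @ enc_states    # weighted sum of encoder states
    return context, alpha

# toy example: 5 source positions, hidden size 8, attention size 6
rng = np.random.default_rng(0)
enc_states = rng.normal(size=(5, 8))
dec_state = rng.normal(size=8)
W_a, U_a, v_a = rng.normal(size=(6, 8)), rng.normal(size=(6, 8)), rng.normal(size=6)
context, alpha = additive_attention(enc_states, dec_state, W_a, U_a, v_a)
print(alpha.round(3), context.shape)
```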

Page 7: Edinburgh's Neural Machine Translation Systems

Neural versus Phrase-based MT

phrase-based SMT: learn segment–segment correspondences from bitext

→ training is multistage pipeline of heuristics

→ strong independence assumptions

→ “fixed” trade-off between features

neural MT: learn a mathematical function on vectors from bitext

→ end-to-end trained model

→ output conditioned on full source text and target history

→ non-linear dependence on information sources
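For clarity, the conditioning claim above corresponds to the standard NMT factorization (not written on the slide, added here for reference): each target word is predicted from the full source sentence x and the entire target history,

$$ p(y \mid x) = \prod_{t=1}^{|y|} p(y_t \mid y_{<t},\, x) $$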


Page 9: Edinburgh's Neural Machine Translation Systems

Innovations in Edinburgh’s WMT16 Systems

1 Subword models to allow translation of rare/unknown words
→ needed since networks have a small, fixed vocabulary

2 Back-translated monolingual data as additional training data
→ allows us to make use of extensive monolingual resources

3 Combination of left-to-right and right-to-left models
→ reduces the “label bias” problem (see the reranking sketch after this list)

4 Pervasive dropout
→ a technical device to improve training with small data sets
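As a rough illustration of item 3, here is a minimal sketch of reranking an n-best list with a right-to-left model: the R2L model is trained on reversed target sentences, so each hypothesis is reversed before rescoring, and the two model scores are interpolated. The scorer functions below are placeholder assumptions, not the actual WMT16 pipeline.

```python
# Minimal sketch: combine left-to-right and right-to-left model scores
# to rerank an n-best list. `score_l2r` / `score_r2l` stand in for real
# NMT scoring functions (assumed, not Edinburgh's actual API).

def rerank(nbest, score_l2r, score_r2l, weight=0.5):
    """nbest: list of tokenized hypotheses for one source sentence.
    The R2L model sees reversed target sentences, so we reverse each
    hypothesis before scoring it with that model."""
    def combined(hyp):
        return (1 - weight) * score_l2r(hyp) + weight * score_r2l(hyp[::-1])
    return max(nbest, key=combined)

# toy usage with dummy scorers (log-probabilities per hypothesis)
if __name__ == "__main__":
    nbest = [["das", "ist", "gut"], ["das", "ist", "guten"]]
    l2r = lambda h: -0.5 * len(h)   # placeholder log-prob
    r2l = lambda h: -0.4 * len(h)   # placeholder log-prob
    print(rerank(nbest, l2r, r2l))
```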


Page 10: Edinburgh's Neural Machine Translation Systems

Problem with Word-level Models

they charge a carry-on bag fee.
sie erheben eine Hand|gepäck|gebühr.

Neural MT architectures have a small and fixed vocabulary
translation is an open-vocabulary problem:

productive word formation (example: compounding)
names (may require transliteration)


Page 11: Edinburgh's Neural Machine Translation Systems

Why Subword Models?

transparent translations: many translations are semantically/phonologically transparent
→ translation via subword units is possible

morphologically complex words (e.g. compounds):

solar system (English)
Sonnen|system (German)
Nap|rendszer (Hungarian)

named entities:
Barack Obama (English; German)
Барак Обама (Russian)
バラク・オバマ (ba-ra-ku o-ba-ma) (Japanese)

cognates and loanwords:
claustrophobia (English)
Klaustrophobie (German)
Клаустрофобия (Russian)


Page 12: Edinburgh's Neural Machine Translation Systems

Byte-pair Encoding

Start with maximally split (i.e. characters)

Use statistics to identify groups to merge

Proceed until a pre-determined number of merge operations has been learnt

Examples:

system      sentence
source      health research institutes
reference   Gesundheitsforschungsinstitute
word-level  Forschungsinstitute
subword     Gesundheits|forsch|ungsin|stitute

source      rakfisk
reference   ракфиска (rakfiska)
word-level  rakfisk → UNK → rakfisk
subword     rak|f|isk → рак|ф|иска (rak|f|iska)
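A minimal sketch of the merge-learning loop described above, following the standard BPE algorithm (Sennrich et al., 2016); the toy corpus and variable names are illustrative.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# start maximally split: each word is a sequence of characters
corpus = {'lower': 2, 'lowest': 1, 'newer': 3, 'wider': 1}
vocab = {' '.join(word): freq for word, freq in corpus.items()}

num_merges = 5  # pre-determined number of merge operations
for _ in range(num_merges):
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)   # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print('merged', best)
print(list(vocab))
```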


Page 13: Edinburgh's Neural Machine Translation Systems

Subword Models

translation of many rare/unknown words is transparent

subword units allow open-vocabulary NMT without back-off model

substantial gains in translation quality, especially for rare words


Page 14: Edinburgh's Neural Machine Translation Systems

Monolingual Training Data

why monolingual data for phrase-based SMT?
relax independence assumptions ✓
more training data ✓
more appropriate training data (domain adaptation) ✓

why monolingual data for neural MT?
relax independence assumptions ✗
more training data ✓
more appropriate training data (domain adaptation) ✓


Page 15: Edinburgh's Neural Machine Translation Systems

Monolingual Data in NMT

encoder-decoder already conditions on previous target words

no architecture change required to learn from monolingual data


Page 16: Edinburgh's Neural Machine Translation Systems

Monolingual Training Instances

Problem: we have no source context for monolingual training instances

Solutions: two methods to deal with missing source context:

empty/dummy source context → danger of unlearning conditioning on the source
produce a synthetic source sentence via back-translation → gives an approximation of the source context
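A hedged sketch of the back-translation option: use a reverse-direction (target→source) model to translate target-side monolingual text into synthetic source sentences, then mix the synthetic pairs with the real bitext. `translate_t2s` is a placeholder for a trained reverse-direction NMT model, not a specific API.

```python
# Sketch of building synthetic parallel data via back-translation.
# Everything here is illustrative, not Edinburgh's exact pipeline.

def back_translate(mono_target, translate_t2s):
    """Pair each monolingual target sentence with a synthetic source."""
    return [(translate_t2s(t), t) for t in mono_target]

def build_training_data(bitext, mono_target, translate_t2s):
    """Real parallel data plus synthetic pairs, mixed for training."""
    synthetic = back_translate(mono_target, translate_t2s)
    return bitext + synthetic

if __name__ == "__main__":
    bitext = [("ein kleines Haus", "a small house")]
    mono = ["the house is old"]
    dummy_t2s = lambda t: "<synthetic German for: %s>" % t  # stand-in model
    print(build_training_data(bitext, mono, dummy_t2s))
```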


Page 18: Edinburgh's Neural Machine Translation Systems

Evaluation: WMT 15 English↔German

[Figure: BLEU on WMT15. English→German: syntax-based 24.4, parallel 23.6, +monolingual 24.6, +synthetic 26.5. German→English: PBSMT 29.3, parallel 26.7, +synthetic 30.4, +synth-ens4 31.6.]


Page 19: Edinburgh's Neural Machine Translation Systems

Why is monolingual data helpful?

Domain adaptation effect

Reduces over-fitting

Improves fluency


Page 20: Edinburgh's Neural Machine Translation Systems

Putting it all together: WMT16 Results

[Figure: WMT16 BLEU (0–40) for EN→CS, EN→DE, EN→RO, EN→RU, CS→EN, DE→EN, RO→EN, and RU→EN, comparing four configurations: parallel data, +synthetic data, +ensemble, and +R2L reranking.]
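As a rough illustration of the "+ensemble" configuration above: several independently trained models are decoded jointly by averaging their next-word distributions at each step. The `DummyModel` class and its `next_word_probs` method are illustrative stand-ins, not a real toolkit API.

```python
import numpy as np

class DummyModel:
    """Stand-in for a trained NMT model (illustrative assumption)."""
    def __init__(self, seed, vocab_size=10):
        self.rng = np.random.default_rng(seed)
        self.vocab_size = vocab_size

    def next_word_probs(self, src, prefix):
        # a real model would condition on src and prefix; here we fake it
        logits = self.rng.normal(size=self.vocab_size)
        e = np.exp(logits - logits.max())
        return e / e.sum()

def ensemble_step(models, src, prefix):
    """Average the models' next-word distributions (probability space)."""
    return np.mean([m.next_word_probs(src, prefix) for m in models], axis=0)

def greedy_decode(models, src, eos_id=0, max_len=5):
    prefix = []
    for _ in range(max_len):
        probs = ensemble_step(models, src, prefix)
        next_id = int(np.argmax(probs))
        prefix.append(next_id)
        if next_id == eos_id:
            break
    return prefix

models = [DummyModel(seed) for seed in range(3)]
print(greedy_decode(models, src=["ein", "kleines", "Haus"]))
```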


Page 21: Edinburgh's Neural Machine Translation Systems

Human Pairwise Ranking Scores

direction  BLEU rank  human rank
EN→CS      1 of 9     1 of 20
EN→DE      1 of 11    1 of 15
EN→RO      2 of 10    1–2 of 12
EN→RU      1 of 8     2–5 of 12
CS→EN      1 of 4     1 of 12
DE→EN      1 of 6     1 of 10
RO→EN      2 of 5     2 of 7
RU→EN      3 of 6     5 of 10

NB: Human rankings include online systems, and for en→cs, extra systems from the tuning task


Page 22: Edinburgh's Neural Machine Translation Systems

NMT vs. PBMT: An extended test

Training and test drawn from the UN corpus: multi-parallel, 11M lines; Arabic, Chinese, English, French, Russian, Spanish

Apply BPE, but no monolingual data

NMT systems trained for 8 days

Evaluate using BLEU on 4000 sentences

[Junczys-Dowmunt et al., 2016]
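For reference, here is a minimal sentence-level sketch of the BLEU metric used in this evaluation, with simple +1 smoothing on the n-gram precisions; this is illustrative only, since evaluations like the one above use corpus-level BLEU over the whole test set.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with +1 smoothing (illustrative sketch)."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = sum(cand.values())
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # brevity penalty: punish candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_prec)

cand = "the cat sat on the mat".split()
ref = "the cat was sitting on the mat".split()
print(round(bleu(cand, ref), 3))
```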


Page 23: Edinburgh's Neural Machine Translation Systems

NMT vs. PBMT: UN data


Page 24: Edinburgh's Neural Machine Translation Systems

Conclusions and Outlook

Conclusions

neural MT is SOTA on many tasks

subword models and back-translated data contributed to success

NMT has gone from lab to deployment, much faster than expected

Problems and Opportunities

Inclusion of terminology, placeholders, markup

Interpretation and manual changes to models

Computer-aided and interactive MT

Incremental training

Extra knowledge sources (context, multimodal)

Sharing across languages, domains


Page 25: Edinburgh's Neural Machine Translation Systems

Acknowledgments

Some of the research presented here was conducted in cooperation with Samsung Electronics Polska sp. z o.o. - Samsung R&D Institute Poland.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 645452 (QT21).


Page 26: Edinburgh's Neural Machine Translation Systems

Thank you


Page 27: Edinburgh's Neural Machine Translation Systems

WMT16 Results (BLEU)

EN→DE: uedin-nmt 34.2, metamind 32.3, NYU-UMontreal 30.8, cambridge 30.6, uedin-syntax 30.6, KIT/LIMSI 29.1, KIT 29.0, uedin-pbmt 28.4, jhu-syntax 26.6

DE→EN: uedin-nmt 38.6, uedin-pbmt 35.1, jhu-pbmt 34.5, uedin-syntax 34.4, KIT 33.9, jhu-syntax 31.0

EN→CS: uedin-nmt 25.8, NYU-UMontreal 23.6, jhu-pbmt 23.6, cu-chimera 21.0, uedin-cu-syntax 20.9, cu-tamchyna 20.8, cu-TectoMT 14.7, cu-mergedtrees 8.2

CS→EN: uedin-nmt 31.4, jhu-pbmt 30.4, PJATK 28.3, cu-mergedtrees 13.3

RO→EN: uedin-pbmt 35.2, uedin-nmt 33.9, uedin-syntax 33.6, jhu-pbmt 32.2, LIMSI 31.0

EN→RO: QT21-HimL-SysComb 28.9, uedin-nmt 28.1, RWTH-SYSCOMB 27.1, uedin-pbmt 26.8, uedin-lmu-hiero 25.9, KIT 25.8, lmu-cuni 24.3, LIMSI 23.9, jhu-pbmt 23.5, usfd-rescoring 23.1

EN→RU: uedin-nmt 26.0, amu-uedin 25.3, jhu-pbmt 24.0, LIMSI 23.6, AFRL-MITLL 23.5, NYU-UMontreal 23.1, AFRL-MITLL-verb-annot 20.9

RU→EN: amu-uedin 29.1, NRC 29.1, uedin-nmt 28.0, AFRL-MITLL 27.6, AFRL-MITLL-contrast 27.0

(Slide highlighting marks Edinburgh NMT entries and system combinations with Edinburgh NMT.)


Page 30: Edinburgh's Neural Machine Translation Systems

WMT16: English→German

#  score   range  system
1   0.49   1      UEDIN-NMT
2   0.40   2      METAMIND
3   0.29   3      UEDIN-SYNTAX
4   0.17   4      NYU-MONTREAL
5  −0.01   5-10   ONLINE-B
   −0.01   5-10   KIT-LIMSI
   −0.02   5-10   CAMBRIDGE
   −0.02   5-10   ONLINE-A
   −0.03   5-10   PROMT-RULE
   −0.05   6-10   KIT
6  −0.14   11-12  JHU-SYNTAX
   −0.15   11-12  JHU-PBMT
7  −0.26   13-14  UEDIN-PBMT
   −0.33   13-15  ONLINE-F
   −0.34   14-15  ONLINE-G

(Slide callouts: “Neural MT”, “Neural components”.)


Page 33: Edinburgh's Neural Machine Translation Systems

Fluency and Adequacy (German→English)

system        Adequacy        Fluency
UEDIN-NMT      0.204  75.8     0.339  77.5
ONLINE-A       0.095  72.7     0.094  70.1
ONLINE-B       0.086  72.2     0.015  68.4
UEDIN-SYNTAX   0.065  71.5     0.141  71.8
KIT            0.062  71.4     0.192  72.7
UEDIN-PBMT     0.042  70.9     0.004  68.6
JHU-PBMT       0.019  70.5     0.084  70.5
ONLINE-G       0.009  70.2    −0.067  65.3
ONLINE-F      −0.204  64.0    −0.348  57.8
JHU-SYNTAX    −0.261  62.4    −0.237  62.5

[Bojar et al. WMT 2016]


Page 34: Edinburgh's Neural Machine Translation Systems

Neural MT and Phrase-based SMT

                          Neural MT           Phrase-based SMT
translation quality       ✓
model size                ✓
training time                                 ✓
model interpretability                        ✓
decoding efficiency       ✓                   ✓
toolkits                  ✓ (for simplicity)  ✓ (for maturity)
special hardware          GPU                 lots of RAM
