Neural Network Language Models for Candidate Scoring in Multi-System Machine Translation

Matīss Rikters
University of Latvia

COLING 2016, 6th Workshop on Hybrid Approaches to Translation
Osaka, Japan, December 11, 2016
Contents
1. Introduction
2. Baseline System
3. Example Sentence
4. Neural Network Language Models
5. Results
6. Related Publications
7. Future Plans
Chunking
– Parse sentences with the Berkeley Parser (Petrov et al., 2006)
– Traverse the syntax tree bottom-up, from right to left
– Add a word to the current chunk if:
  • the current chunk is not too long (sentence word count / 4)
  • the word is non-alphabetic or only one symbol long
  • the word begins a genitive phrase («of »)
– Otherwise, initialize a new chunk with the word
– If chunking results in too many chunks, repeat the process, allowing more (than sentence word count / 4) words per chunk

Translation with online MT systems
– Google Translate; Bing Translator; Yandex.Translate; Hugo.lv

12-gram language model
– DGT-Translation Memory corpus (Steinberger, 2011): 3.1 million Latvian legal-domain sentences
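The chunking pass above can be sketched in a few lines. This is a minimal illustration over the flat word sequence, standing in for the actual syntax-tree traversal; the condition names mirror the bullet points, but the exact rules of the implementation may differ.

```python
# Sketch of the right-to-left chunking pass (assumed simplification:
# flat word list instead of a Berkeley Parser tree traversal).

def chunk_sentence(words, max_factor=4):
    """Group words into chunks, traversing from right to left."""
    limit = max(1, len(words) // max_factor)   # sentence word count / 4
    chunks, current = [], []
    for word in reversed(words):
        keep_in_current = (
            len(current) < limit               # current chunk not too long
            or not word.isalpha()              # non-alphabetic token
            or len(word) == 1                  # only one symbol long
            or word.lower() == "of"            # begins a genitive phrase
        )
        if keep_in_current:
            current.insert(0, word)
        else:                                  # otherwise start a new chunk
            chunks.insert(0, current)
            current = [word]
    chunks.insert(0, current)
    return chunks

sentence = ("Recently there has been an increased interest in the automated "
            "discovery of equivalent expressions in different languages .").split()
for chunk in chunk_sentence(sentence):
    print(" ".join(chunk))
```

On this 17-token sentence the sketch yields chunks of at most four words each; the real system would additionally respect constituent boundaries from the parse tree.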
Baseline System
[Pipeline diagram: sentence tokenization → syntactic analysis → sentence chunking → translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → selection of the best chunks → sentence recomposition → output]
Sentence Chunking
Choose the best candidate
KenLM (Heafield, 2011) calculates the probability of a word based on the observed entry with the longest matching history $w_f^{n-1}$:

$$P(w_n \mid w_1^{n-1}) = P(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $P(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated from this probability: given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$:

$$PP(q) = 2^{-\frac{1}{N}\sum_{i=1}^{N} \log_2 q(x_i)}$$
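The perplexity formula is straightforward to compute once the model has assigned a probability to each token. A minimal illustration, with made-up token probabilities rather than actual KenLM output:

```python
import math

def perplexity(token_probs):
    """Perplexity over a test sample from per-token probabilities q(x_i)."""
    n = len(token_probs)
    log_sum = sum(math.log2(q) for q in token_probs)
    return 2 ** (-log_sum / n)

# Invented probabilities for a four-token sample:
print(perplexity([0.25, 0.5, 0.125, 0.25]))   # → 4.0
```

A lower perplexity means the model finds the text less surprising, which is why the system prefers the candidate translation with the lowest perplexity.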
Example sentence
Recently there has been an increased interest
in the automated discovery
of equivalent expressions in different languages .
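Each chunk above is translated by every online system, and the language model picks one candidate per chunk. A hedged sketch of that selection step; the per-token log2 probabilities and system names are invented stand-ins for real LM queries:

```python
# "Choose the best candidate": the chunk translation with the lowest
# language-model perplexity wins. All numbers below are illustrative.

def perplexity_from_logprobs(logprobs_base2):
    return 2 ** (-sum(logprobs_base2) / len(logprobs_base2))

candidates = {
    "system_a": [-1.0, -2.0, -1.5],   # per-token log2 probabilities (made up)
    "system_b": [-0.5, -1.0, -1.5],
    "system_c": [-3.0, -2.5, -2.0],
}
best = min(candidates, key=lambda s: perplexity_from_logprobs(candidates[s]))
print(best)   # the lowest-perplexity candidate
```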
Neural Language Models
• RWTHLM
  • CPU only
  • Feed-forward, recurrent (RNN) and long short-term memory (LSTM) NNs
• MemN2N
  • CPU or GPU
  • End-to-end memory network (RNN with attention)
• Char-RNN
  • CPU or GPU
  • RNNs, LSTMs and gated recurrent units (GRU)
  • Character level
Best Models
• RWTHLM: one feed-forward input layer with a 3-word history, followed by one linear layer of 200 neurons with a sigmoid activation function
• MemN2N: internal state dimension of 150, linear part of the state 75, number of hops set to six
• Char-RNN: 2 LSTM layers with 1,024 neurons each, dropout set to 0.5
Char-RNN
• A character level model works better for highly inflected languages with less data
• Requires Torch scientific computing framework + additional packages
• Can run on CPU, NVIDIA GPU or AMD GPU
• Intended for generating new text, modified to score new text
More in Andrej Karpathy’s blog
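The intuition behind the character-level advantage can be shown with a toy scorer: at the character level the model sees far more training events per parameter, so even tiny corpora give usable estimates for unseen inflected forms. A character-bigram model is used here as a deliberately crude stand-in for the Char-RNN; the corpus, smoothing constants, and scores are all invented for illustration.

```python
import math
from collections import Counter

def train_char_bigrams(corpus):
    """Count character bigrams; '^' marks the sentence start."""
    pairs, chars = Counter(), Counter()
    for sent in corpus:
        s = "^" + sent
        for a, b in zip(s, s[1:]):
            pairs[(a, b)] += 1
            chars[a] += 1
    return pairs, chars

def avg_logprob(text, pairs, chars, alpha=1.0, vocab=100):
    """Average add-one-smoothed log2 probability per character."""
    s = "^" + text
    logs = [
        math.log2((pairs[(a, b)] + alpha) / (chars[a] + alpha * vocab))
        for a, b in zip(s, s[1:])
    ]
    return sum(logs) / len(logs)

# Three invented inflected forms of a Latvian word as "training data":
pairs, chars = train_char_bigrams(["valoda", "valodas", "valodu"])
# An unseen inflected form still scores better than random characters:
print(avg_logprob("valodai", pairs, chars) > avg_logprob("xqzw", pairs, chars))
```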
Experiment Environment

Training
• Baseline KenLM and RWTHLM models: 8-core CPU with 16 GB of RAM
• MemN2N: GeForce Titan X (12 GB, 3,072 CUDA cores), 12-core CPU and 64 GB RAM
• Char-RNN: Radeon HD 7950 (3 GB, 1,792 cores), 8-core CPU and 16 GB RAM

Translation
• All models: 4-core CPU with 16 GB of RAM
Results
| System   | Perplexity | Training Corpus Size | Trained On | Training Time | BLEU  |
|----------|-----------:|---------------------:|------------|---------------|------:|
| KenLM    | 34.67      | 3.1M                 | CPU        | 1 hour        | 19.23 |
| RWTHLM   | 136.47     | 3.1M                 | CPU        | 7 days        | 18.78 |
| MemN2N   | 25.77      | 3.1M                 | GPU        | 4 days        | 18.81 |
| Char-RNN | 24.46      | 1.5M                 | GPU        | 2 days        | 19.53 |
General domain
[Figure: perplexity (left axis) and BLEU-HY / BLEU-BG with linear trend lines (right axis) plotted against training epoch]
Legal domain
[Figure: perplexity (left axis) and BLEU-BG / BLEU-HY with linear trend lines (right axis) plotted against training epoch]
Related Publications
• Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian." ACL-IJCNLP 2015, 4th HyTra Workshop
• Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016
• Matīss Rikters and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016
• Matīss Rikters. "K-Translate – interactive multi-system machine translation." Baltic DB&IS 2016
• Matīss Rikters. "Searching for the Best Translation Combination Across All Possible Variants." Baltic HLT 2016
Code on GitHub: https://github.com/M4t1ss
• Baseline system: http://ej.uz/ChunkMT
• Only the chunker + visualizer: http://ej.uz/chunker
• Interactive browser version: http://ej.uz/KTranslate
• With integrated usage of NN LMs: http://ej.uz/NNLMs
Future Work
• More enhancements for the chunking step
  – Try dependency parsing instead of constituency parsing
• Choose the best translation candidate with MT quality estimation
  – QuEst++ (Specia et al., 2015)
  – SHEF-NN (Shah et al., 2015)
• Add special processing of multi-word expressions (MWEs)
• Handle MWEs in neural machine translation systems
References
• Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA 2010, Denver, Colorado.
• Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
• Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. ACL, 2011.
• Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
• Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
• Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH 2010.
• Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of COLING-ACL 2006. ACL, 2006.
• Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic languages with factored models." Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 219, 125-132.
• Rikters, M., and I. Skadiņa. "Syntax-based multi-system machine translation." LREC 2016.
• Rikters, M., and I. Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016.
• Pal, Santanu, et al. "USAAR-DCU hybrid machine translation system for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
• Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL 2006 poster sessions. ACL, 2006.
• Shah, Kashif, et al. "SHEF-NN: Translation quality estimation with neural networks." Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015.
• Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level translation quality prediction with QuEst++." ACL-IJCNLP 2015 System Demonstrations, 2015.
• Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
• Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
Thank you!