Date post: | 22-Dec-2015 |
Machine Translation
A Presentation by:
Julie Conlonova,
Rob Chase,
and Eric Pomerleau
Overview
Language Alignment System
Datasets
Sentence-aligned sets for training (ex. The Hansards Corpus, European Parliamentary Proceedings Parallel Corpus)
A word-aligned set for testing and evaluation to measure accuracy and precision
Decoding
Language Alignment
Goal: Produce a word-aligned set from a sentence-aligned dataset
First step on the road toward Statistical Machine Translation
Example Problem:
The motion to adjourn the House is now deemed to have been adopted.
La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
IBM Models 1 and 2
Kevin Knight, A Statistical MT Tutorial Workbook, 1999
Each can be used separately to produce a word-aligned dataset.
EM Algorithm
Model 1 produces T-values based on normalized fractional counting of corresponding words.
Additionally, Model 2 uses A-values for “reverse distortion probabilities”: probabilities based on the positions of the words.
Training Data
European Parliament Proceedings Parallel Corpus 1996-2003
Aligned Languages:
English - French
English - Dutch
English - Italian
English - Finnish
English - Portuguese
English - Spanish
English - Greek
Training Data cont.
Eliminated:
Misaligned sentences
Sentences with 50 or more words
XML tags
Symbols and numerical characters other than commas and periods
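The filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the presenters' actual script: the names `clean_sentence` and `clean_corpus` are hypothetical, apostrophes are kept (an assumption, so that French forms such as s'ajourne survive), and a pair with an empty side after cleaning is treated as misaligned.

```python
import re

MAX_WORDS = 50  # sentences with 50 or more words are dropped

TAG_RE = re.compile(r"<[^>]+>")           # XML tags
JUNK_RE = re.compile(r"[^\w\s,.']|\d|_")  # symbols and digits; commas, periods, apostrophes stay

def clean_sentence(sentence):
    """Strip XML tags, then symbols and digits other than commas and periods."""
    no_tags = TAG_RE.sub(" ", sentence)
    no_junk = JUNK_RE.sub(" ", no_tags)
    return " ".join(no_junk.split())

def clean_corpus(pairs):
    """Yield cleaned (source, target) pairs, skipping empty or overlong ones."""
    for src, tgt in pairs:
        src, tgt = clean_sentence(src), clean_sentence(tgt)
        if not src or not tgt:
            continue  # assumption: an empty side means the pair is misaligned
        if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
            continue
        yield src, tgt
```

Real misalignment detection would need more than an emptiness check; the sketch only shows where such a test would sit.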
Ideally…
http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III
Variables contributing to the probability of a sentence:
Correlation between words in the source/target languages
Fertility of a word
Correlation between order of words in source sentence and order of words in target
A Translation Matrix
        Rob   Cat   is    Dog
Rob     1     0     0     0
Gato    0     1     0     0
es      0     0     .5    0
esta    0     0     .5    0
Perro   0     0     0     1
Building the Translation Matrix: Starting from alignments
Find the sentence alignment
If a word in the source aligns with a word in the target, then increment the translation matrix.
Normalize the translation matrix
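When word alignments are already known, the increment-then-normalize procedure might look like the sketch below. The function name and the `(source_words, target_words, links)` input format are assumptions made for illustration.

```python
from collections import defaultdict

def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links),
    where links is a list of (source_index, target_index) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0  # increment for each aligned word pair
    # Normalize each source word's row so that it sums to 1.
    matrix = {}
    for s, row in counts.items():
        total = sum(row.values())
        matrix[s] = {t: c / total for t, c in row.items()}
    return matrix
```

If "is" is aligned once with "es" and once with "esta", its row comes out as .5 and .5, as in the translation matrix shown earlier.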
Can’t find alignments
Most sentences in the Hansards corpus are 60 words long. There are many that can be over 100.
100^100 possible alignments
Counting
Rob is a boy.    Rob es nino.
Rob is tall.     Rob es alto.
Eric is tall.    Eric es alto.
… …
Base counts on co-occurrence, weighting based on sentence length.
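One way to read "weighting based on sentence length" is that each co-occurring word pair receives a count of 1 divided by the length of the target sentence. That reading is an assumption, and the function name is hypothetical; the sketch only illustrates the idea of co-occurrence counting.

```python
from collections import defaultdict

def cooccurrence_counts(pairs):
    """pairs: list of (source_sentence, target_sentence) strings.
    Returns counts[source_word][target_word], weighted by 1 / target length."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt in pairs:
        src_words, tgt_words = src.split(), tgt.split()
        weight = 1.0 / len(tgt_words)  # longer sentences give weaker evidence
        for s in src_words:
            for t in tgt_words:
                counts[s][t] += weight
    return counts

corpus = [
    ("Rob is a boy", "Rob es nino"),
    ("Rob is tall", "Rob es alto"),
    ("Eric is tall", "Eric es alto"),
]
counts = cooccurrence_counts(corpus)
```

On this toy corpus "is" accumulates more weight with "es" than with any other word, because the two co-occur in all three sentence pairs.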
Iterative Convergence
Use Expectation Maximization algorithm
Creates translation matrix
        Rob   Is    Tall  boy
Rob     .66   .33   .25   .25
es      .30   .66   .25   .25
alto    .2    .05   .5    0
nino    .2    .05   0     .5
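The iterative procedure can be sketched as a bare-bones IBM Model 1 trainer. This is a simplified illustration under stated assumptions: the NULL source word is omitted, `train_model1` is a hypothetical name, and the values it produces are not expected to match the table above.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """pairs: list of (english_words, foreign_words) token lists.
    Returns t[f][e], an estimate of P(f | e)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(dict)
    for es, fs in pairs:              # uniform start for co-occurring pairs
        for f in fs:
            for e in es:
                t[f][e] = 1.0 / len(f_vocab)
    for _ in range(iterations):
        count = defaultdict(float)    # fractional counts keyed by (f, e)
        total = defaultdict(float)    # total fractional count per e
        for es, fs in pairs:          # E-step: share each f among the e's
            for f in fs:
                norm = sum(t[f][e] for e in es)
                for e in es:
                    frac = t[f][e] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():   # M-step: renormalize per e
            t[f][e] = c / total[e]
    return t
```

Each pass redistributes the fractional counts toward pairs that explain the corpus consistently, so on the three-sentence example "alto" gradually becomes the preferred translation of "tall".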
Distorting the Sentence
Word order changes between languages
How is a sentence with 2 words distorted?
How is a sentence with 3 words distorted?
How is a sentence with …
To keep track of this information we use…
A tesseract!
(A quadruply nested default dictionary)
This could be a problem if there are more than 100 words in a sentence.
100x100x100x100 = too big for RAM and takes too much time
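A quadruply nested default dictionary could be built as in the sketch below. The index order (source length, target length, target position, source position) is an assumption chosen for illustration; the slides do not state which order was used.

```python
from collections import defaultdict

def make_distortion_table():
    """a[l][m][j][i]: value for target position j aligning to source
    position i, given source length l and target length m."""
    return defaultdict(
        lambda: defaultdict(
            lambda: defaultdict(
                lambda: defaultdict(float))))

a = make_distortion_table()
a[3][3][0][0] += 1.0   # entries are created only when first touched

# A dense table for lengths and positions up to 100 would need this many cells:
dense_cells = 100 ** 4
```

Because entries are created lazily, only the combinations actually seen in training take up memory, but a corpus with many long sentences still drives the table toward the dense size of 100,000,000 cells.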
Broad Look at MT
“The translation process can be described simply as:
1. Decoding the meaning of the source text, and
2. Re-encoding this meaning in the target language.”
- “Translation Process”, Wikipedia, May 2006
Decoding
How to go from the T-matrix and A-matrix to a word alignment?
There are several approaches…
Viterbi
If only doing alignment, much smaller memory and time requirements.
Returns optimal path.
T-Matrix probabilities function as the “emission” matrix
A-Matrix probabilities concerned with the positioning of words
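For IBM Models 1 and 2 the most probable alignment factorizes, so it can be found one target word at a time by picking the source position that maximizes the T-value times the A-value. The sketch below illustrates that idea only; the function name, the dictionary layouts, and the toy T-values are assumptions, and the presenters' decoder may be organized differently.

```python
def best_alignment(src_words, tgt_words, t, a=None):
    """Return, for each target position j, the best source position i.
    t[f][e] is a T-value; a maps (i, j, l, m) to an A-value.
    With a=None every position is equally likely (Model 1 behaviour)."""
    l, m = len(src_words), len(tgt_words)
    alignment = []
    for j, f in enumerate(tgt_words):
        def score(i):
            t_val = t.get(f, {}).get(src_words[i], 0.0)
            a_val = 1.0 if a is None else a.get((i, j, l, m), 0.0)
            return t_val * a_val
        alignment.append(max(range(l), key=score))  # ties go to the lowest i
    return alignment

# Toy T-values, invented for this example.
toy_t = {
    "Rob":  {"Rob": 0.7, "is": 0.2, "tall": 0.1},
    "es":   {"Rob": 0.2, "is": 0.7, "tall": 0.1},
    "alto": {"Rob": 0.1, "is": 0.1, "tall": 0.8},
}
links = best_alignment(["Rob", "is", "tall"], ["Rob", "es", "alto"], toy_t)
```

Only one row of scores is held at a time, which is in line with the small memory footprint noted above for alignment-only decoding.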
Decoding as a Translator
If no translated sentence is supplied, the program can act as a stand-alone translator instead of a word aligner.
However, while the Viterbi algorithm with pruning runs quickly for alignment decoding, the run time skyrockets when translating.
Greedy Hill Climbing
Knight & Koehn, What’s New in Statistical Machine Translation, 2003
Best first search
2-step look ahead to avoid getting stuck in most probable local maxima
Beam Search
Knight & Koehn, What’s New in Statistical Machine Translation, 2003
Optimization of Best First Search with heuristics and “beam” of choices
Exponential tradeoff when increasing the “beam” width
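A very small beam-search decoder is sketched below to show the role of the beam width. It makes strong simplifying assumptions that are not taken from the slides: translation is word by word in source order, and hypotheses are scored with T-values plus a hypothetical bigram language model (`bigram`) with a fixed floor for unseen word pairs.

```python
import math

def beam_translate(src_words, t, bigram, beam_width=3):
    """t[e] maps a source word to {target_word: probability};
    bigram maps (previous_word, word) to a language-model probability."""
    beam = [(0.0, ["<s>"])]  # (log-probability, words so far)
    for e in src_words:
        options = t.get(e) or {e: 1.0}  # unknown words are copied through
        candidates = []
        for score, words in beam:
            for f, p in options.items():
                lm = bigram.get((words[-1], f), 1e-6)  # floor for unseen bigrams
                candidates.append(
                    (score + math.log(p) + math.log(lm), words + [f]))
        # Keep only the best beam_width partial translations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][1][1:]  # best hypothesis, without the start marker
```

A wider beam keeps more partial translations alive at each step, which costs more time but reduces the chance of discarding the hypothesis that would have scored best in the end.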
Other Decoding Methods
Knight & Koehn, What’s New in Statistical Machine Translation, 2003
Finite State Transducer
Mapping between languages based on a finite automaton
Parsing
String to Tree Model
Problem: One to Many
Necessary to take all alignments over a certain probability in order to capture the “probability that e has fertility at least a given value”
Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999
Results
Study done in 2003 on word alignment error rates in Hansards corpus:
Model 2: 29.3% on 8K training sentence pairs, 19.5% on 1.47M training sentence pairs
Optimized Model 6: 20.3% on 8K training sentence pairs, 8.7% on 1.47M training sentence pairs
Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
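The alignment error rate used in that study compares predicted links A with hand-annotated "sure" links S and "possible" links P (with S contained in P): AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). A small sketch of the computation follows; the function name and the representation of links as (source_index, target_index) tuples are assumptions for illustration.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).
    Each argument is a collection of (source_index, target_index) links.
    Assumes predicted and sure are not both empty."""
    predicted, sure = set(predicted), set(sure)
    possible = set(possible) | sure  # sure links also count as possible
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))
```

Lower is better: a prediction that contains every sure link and nothing outside the possible links scores 0.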
Expected Accuracy
70% overall
Language performance:
Dutch
French
Italian, Spanish, Portuguese
Greek
Finnish
Possible Future Work
Given more time, we would have implemented IBM Model 3
Additionally uses n, p, and d parameters for weighted alignments:
n, fertility: number of words produced by one word
d, distortion
p, parameter involving words that aren’t involved directly
Invokes Model 2 for scoring
Another Possible Translation Scheme
Example-Based Machine Translation
Translation-by-Analogy
Can sometimes achieve better than the “gist” translations from other models
Why Is Improving Machine Translation Necessary?
A Chinese to English Translation
The End
Are there any questions/comments?