Advanced Techniques in NLP: Machine Translation III – Statistical MT (Jan 2009)
Transcript
Page 1

Advanced Techniques in NLP

Machine Translation III: Statistical MT

Page 2

Approaching MT

• There are many different ways of approaching the problem of MT.

• The choice of approach is complex and depends upon:
– Task requirements
– Human resources
– Linguistic resources

Page 3

Criterial Issues

• Do we want a translation system for one language pair or for many language pairs?

• Can we assume a constrained vocabulary or do we need to deal with arbitrary text?

• What resources exist for the languages that we are dealing with?

• How long will it take us to develop the resources and what human resources?

Page 4

Parallel Data

• Lots of translated text available: 100s of millions of words of translated text for some language pairs
– a book has a few 100,000s of words
– an educated person may read 10,000 words a day
– 3.5 million words a year
– 300 million in a lifetime

• Computers can see more translated text than humans read in a lifetime

• Machines can learn how to translate foreign languages.

[Koehn 2006]

Page 5

Statistical Translation

• Robust
• Domain independent
• Extensible
• Does not require language specialists
• Does require parallel texts
• Uses noisy channel model of translation

Page 6

Noisy Channel Model: Sentence Translation (Brown et al. 1990)

[Diagram: a source sentence passes through a noisy channel and emerges as the target sentence; decoding recovers the source sentence.]

Page 7

Statistical Modelling

• Learn P(f|e) from a parallel corpus
• Not sufficient data to estimate P(f|e) directly

[from Koehn 2006]

Page 8

The Problem of Translation

• Given a sentence T of the target language, seek the source sentence S from which a translator produced T, i.e. find S that maximises P(S|T)

• By Bayes' theorem, P(S|T) = P(S) * P(T|S) / P(T), whose denominator is independent of S.

• Hence it suffices to maximise P(S) * P(T|S)
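The decision rule can be sketched directly. The probability tables below are invented toy values, not output of any real model; `decode` simply picks the candidate source sentence maximising P(S) * P(T|S).

```python
# Toy language model P(S) and translation model P(T|S) for a fixed
# target sentence T = "Jean aime Marie".  All values are invented.
p_lm = {
    "John loves Mary": 0.4,
    "Mary loves John": 0.3,
    "loves John Mary": 0.01,
}
p_tm = {  # P(T | S)
    "John loves Mary": 0.2,
    "Mary loves John": 0.05,
    "loves John Mary": 0.2,
}

def decode(candidates):
    """Return the source sentence S maximising P(S) * P(T|S)."""
    return max(candidates, key=lambda s: p_lm[s] * p_tm[s])

print(decode(p_lm.keys()))  # → John loves Mary
```

Note that the denominator P(T) never appears: since it is the same for every candidate S, it cannot change which candidate wins.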

Page 9

The Three Components of a Statistical MT model

1. Method for computing language model probabilities (P(S))

2. Method for computing translation probabilities (P(T|S))

3. Method for searching amongst source sentences for one that maximises P(S) * P(T|S)

Page 10

A Statistical MT System

[Diagram: a Source Language Model supplies P(S) and a Translation Model supplies P(T|S); given a target sentence T, the Decoder searches for the source sentence S maximising P(S) * P(T|S), which is proportional to P(S|T).]

Page 11

Three Kinds of Model

[Diagram: Statistical Models divide into Language Models and Translation Models; each comes in word based, phrase based and syntax based variants.]

Page 12

Language Models based on N-Grams of Words

• General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1))

• Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1 s2) * ... * P(sn|s(n-2) s(n-1))

• Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1))
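As a concrete illustration, a bigram model can be estimated by maximum likelihood from a toy corpus; the `<s>` padding token and the corpus itself are assumptions made for the example.

```python
from collections import Counter

# Tiny corpus; each sentence is padded with a start symbol <s>.
corpus = [
    "<s> the cat sleeps".split(),
    "<s> the dog sleeps".split(),
    "<s> the horse eats".split(),
]

bigrams = Counter()   # counts of (previous word, word)
unigrams = Counter()  # counts of the conditioning word
for sent in corpus:
    for prev, word in zip(sent, sent[1:]):
        bigrams[(prev, word)] += 1
        unigrams[prev] += 1

def p_bigram(sentence):
    """P(s1 ... sn) ≈ product of MLE estimates P(si | s(i-1))."""
    p = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        p *= bigrams[(prev, word)] / unigrams[prev]
    return p

# P = P(the|<s>) * P(cat|the) * P(sleeps|cat) = 1 * 1/3 * 1
print(p_bigram("<s> the cat sleeps".split()))
```

A real system would smooth these estimates; unsmoothed MLE assigns probability zero to any unseen bigram.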

Page 13

Syntax Based Language Models

• Good syntax tree – good English
• Allows for long distance constraints
• Left sentence preferred by syntax based model

Page 14

Word-Based Translation Models

• Translation process is decomposed into smaller steps
• Each step is tied to words
• Based on IBM Models [Brown et al., 1993]

[from Koehn 2006]

Page 15

Word TM derived from Bitext

ENGLISH            FRENCH
the cat sleeps     le chat dort
the dog sleeps     le chien dort
the horse eats     le cheval mange

Page 16

le chat dort/the cat sleeps

        the  cat  dog  horse  sleeps  eats
le      I    I    -    -      I       -
chat    I    I    -    -      I       -
chien   -    -    -    -      -       -
cheval  -    -    -    -      -       -
dort    I    I    -    -      I       -
mange   -    -    -    -      -       -

Page 17

le chien dort/the dog sleeps

        the  cat  dog  horse  sleeps  eats
le      II   I    I    -      II      -
chat    I    I    -    -      I       -
chien   I    -    I    -      I       -
cheval  -    -    -    -      -       -
dort    II   I    I    -      II      -
mange   -    -    -    -      -       -

Page 18

le cheval mange/the horse eats

        the  cat  dog  horse  sleeps  eats
le      III  I    I    I      II      I
chat    I    I    -    -      I       -
chien   I    -    I    -      I       -
cheval  I    -    -    I      -       I
dort    II   I    I    -      II      -
mange   I    -    -    I      -       I

P(t|le) 3/9  1/9  1/9  1/9    2/9     1/9
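The tallies above amount to co-occurrence counting, which takes only a few lines:

```python
from collections import Counter

# Count co-occurrences over the three sentence pairs, then normalise
# to obtain P(t|s) for the French word s = "le".
bitext = [
    ("le chat dort", "the cat sleeps"),
    ("le chien dort", "the dog sleeps"),
    ("le cheval mange", "the horse eats"),
]

cooc = Counter()    # co-occurrence tallies, as in the tables
totals = Counter()  # total tallies per French word
for fr, en in bitext:
    for s in fr.split():
        for t in en.split():
            cooc[(s, t)] += 1
            totals[s] += 1

english = "the cat dog horse sleeps eats".split()
p_t_given_le = {t: cooc[("le", t)] / totals["le"] for t in english}
# P(the|le) = 3/9 and P(sleeps|le) = 2/9, matching the bottom row
print(p_t_given_le["the"], p_t_given_le["sleeps"])
```

Raw co-occurrence counts are only a starting point; the EM procedure on the following slides refines them into proper alignment-based estimates.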

Page 19

Parameter Estimation

• Based on counting occurrences within monolingual and bilingual data.

• For language model, we need only source language text.

• For translation model, we need pairs of sentences that are translations of each other.

• Use EM (Expectation Maximisation) Algorithm (Baum 1972) to optimize model parameters.

Page 20

EM Algorithm

• Word Alignments: for a sentence pair ("a b c", "x y z"), alignments are formed from arbitrary pairings of words from the two sentences and include (a.x, b.y, c.z), (a.z, b.y, c.x), etc.

• There is a large number of possible alignments, since we also allow, e.g., (ab.x, 0.y, c.z)

Page 21

EM Algorithm

1. Make an initial estimate of parameters. This can be used to compute the probability of any possible word alignment.

2. Re-estimate parameters by ranking each possible alignment by its probability according to the initial guess.

3. Repeated iterations assign ever greater probability to the set of sentences actually observed.

The algorithm leads to a local maximum of the probability of the observed sentence pairs as a function of the model parameters.
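The three steps can be sketched as a toy IBM Model 1 trainer over the bitext from the earlier slides: a uniform initial estimate, then repeated expectation over alignments and re-estimation of P(t|s).

```python
from collections import defaultdict

# Toy bitext: English source s, French target t, as on the tally slides.
bitext = [
    ("the cat sleeps".split(), "le chat dort".split()),
    ("the dog sleeps".split(), "le chien dort".split()),
    ("the horse eats".split(), "le cheval mange".split()),
]

target_vocab = {t for _, tgt in bitext for t in tgt}
# Step 1: uniform initial estimate of P(t|s)
prob = defaultdict(lambda: 1.0 / len(target_vocab))

for _ in range(10):
    count = defaultdict(float)   # expected count of (s, t) links
    total = defaultdict(float)   # expected count of s
    # Step 2 (E): fractional counts weighted by current alignment probability
    for src, tgt in bitext:
        for t in tgt:
            norm = sum(prob[(s, t)] for s in src)
            for s in src:
                c = prob[(s, t)] / norm
                count[(s, t)] += c
                total[s] += c
    # Step 3 (M): re-estimate P(t|s) from the expected counts
    for (s, t), c in count.items():
        prob[(s, t)] = c / total[s]

# EM has learned that "cat" translates as "chat", not "le" or "dort"
print(prob[("cat", "chat")] > prob[("cat", "le")])  # → True
```

On this pigeonhole data the lexicon sharpens quickly: "the" absorbs "le", "sleeps" absorbs "dort", and the remaining content words pair off.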

Page 22

Parameters for the IBM Translation Model

• Word Translation Probability P(t|s): probability that source word s is translated as target word t.

• Fertility P(n|s): probability that source word s is translated by n target words (0 ≤ n ≤ 25).

• Distortion P(i|j,l): probability that the source word at position j is translated by the target word at position i in a target sentence of length l.

Page 23

Experiment 1 (Brown et al. 1990)

• Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language.

• Considered the 9,000 most common words in each language.

• Assumptions (initial parameter values):
– each of the 9,000 target words equally likely as a translation of each of the source words
– each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words
– each target position equally likely given each source position and target length

Page 24

English: the

French  Probability        Fertility  Probability
le      .610               1          .871
la      .178               0          .124
l'      .083               2          .004
les     .023
ce      .013
il      .012
de      .009
à       .007
que     .007

Page 25

English: not

French       Probability   Fertility  Probability
pas          .469          2          .758
ne           .460          0          .133
non          .024          1          .106
pas du tout  .003
faux         .003
plus         .002
ce           .002
que          .002
jamais       .002

Page 26

English: hear

French    Probability      Fertility  Probability
bravo     .992             0          .584
entendre  .005             1          .416
entendu   .002
entends   .001

Page 27

Sentence Translation Probability

• Given a translation model for words, we can compute the translation probability of a sentence, taking the parameters into account.

• P(Jean aime Marie|John loves Mary) =
P(Jean|John) * P(1|John) * P(1|1,3) *
P(aime|loves) * P(1|loves) * P(2|2,3) *
P(Marie|Mary) * P(1|Mary) * P(3|3,3)
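With hypothetical parameter values, all invented purely for illustration, the product works out as:

```python
# Hypothetical parameters: word translation P(t|s), fertility P(1|s),
# and distortion P(i|j,l).  The real model estimates these from data.
t = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}
fert = {"John": 0.9, "loves": 0.9, "Mary": 0.9}        # P(1|s)
dist = {(1, 1, 3): 0.7, (2, 2, 3): 0.7, (3, 3, 3): 0.7}  # P(i|j,l)

pairs = [("Jean", "John"), ("aime", "loves"), ("Marie", "Mary")]
p = 1.0
for j, (f, e) in enumerate(pairs, start=1):
    # translation * fertility * distortion, one factor triple per word
    p *= t[(f, e)] * fert[e] * dist[(j, j, 3)]

print(round(p, 4))  # → 0.162
```

Even with generous toy parameters the probability is small; real sentence probabilities are tiny, which is why decoders work with log probabilities in practice.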

Page 28

Flaws in Word-Based Translation

• Model handles many:one translations P(ttt|s) but not one:many P(t|sss), e.g.
– Zeitmangel erschwert das Problem .
– lack of time makes more difficult the problem .

• Correct translation: Lack of time makes the problem more difficult.

• MT output: Time makes the problem.

[from Koehn 2006]

Page 29

Flaws in Word-Based Translation (2)

• Phrasal translation P(ttt|ssss), e.g. erübrigt sich / there is no point in
– Eine Diskussion erübrigt sich demnach .
– a discussion is made unnecessary itself therefore .
– Correct translation: Therefore, there is no point in a discussion.

• MT output: A debate turned therefore .

[from Koehn 2006]

Page 30

Flaws in Word-Based Translation (3)

• Syntactic transformations, e.g. object/subject reordering
– Den Vorschlag lehnt die Kommission ab
– the proposal rejects the commission off

• Correct translation: The commission rejects the proposal .

• MT output: The proposal rejects the commission.

[from Koehn 2006]

Page 31

Phrase Based Translation Models

• Foreign input is segmented into phrases.
• Phrases are any sequence of words, not necessarily linguistically motivated.
• Each phrase is translated into English.
• Phrases are reordered.

[from Koehn 2006]

Page 32

Syntax Based Translation Models

Page 33

Word Based Decoding: Searching for the Best Translation (Brown 1990)

• Maintain a list of hypotheses.
• Initial hypothesis: (Jean aime Marie | *)
• Search proceeds iteratively.
• At each iteration we extend the most promising hypotheses with additional words:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
Jean aime Marie | Jean(1) *

• Parenthesised numbers indicate the corresponding position in the target sentence
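A minimal sketch of this beam-style expansion, with an invented scored lexicon standing in for the real translation and language models:

```python
# Hypotheses pair English words with target positions; at each step the
# most promising hypotheses are extended with one more word.
target = ["Jean", "aime", "Marie"]
lexicon = {"Jean": [("John", 0.9), ("Jean", 0.1)],
           "aime": [("loves", 0.8), ("likes", 0.2)],
           "Marie": [("Mary", 0.9), ("Marie", 0.1)]}

hyps = [([], 1.0)]  # each hypothesis: (list of (english, position), score)
for j, f in enumerate(target, start=1):
    extended = []
    for words, score in hyps:
        for e, p in lexicon[f]:
            extended.append((words + [(e, j)], score * p))
    # keep only the most promising hypotheses (beam width 2)
    hyps = sorted(extended, key=lambda h: -h[1])[:2]

best_words, best_score = hyps[0]
print(best_words)  # → [('John', 1), ('loves', 2), ('Mary', 3)]
```

A real decoder also permits covering target words out of order (hence the `*` gaps on the slide); this sketch fixes the coverage order for brevity.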

Page 34

Phrase-Based Decoding

• Build translation left to right
– select foreign word(s) to be translated
– find English phrase translation
– add English phrase to end of partial translation

[Koehn 2006]
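These three steps can be sketched with a hypothetical phrase table for the demo sentence used later with Pharaoh; the sketch omits reordering for brevity.

```python
# Hypothetical phrase table; keys are foreign word tuples.
phrase_table = {
    ("das", "ist"): "this is",
    ("ein",): "a",
    ("kleines", "haus"): "small house",
}

def translate(foreign):
    """Greedy left-to-right phrase decoding (longest match first)."""
    words = foreign.split()
    output = []
    i = 0
    while i < len(words):
        # select foreign word(s) to be translated: try the longer span first
        for span in (2, 1):
            key = tuple(words[i:i + span])
            if key in phrase_table:
                # add the English phrase to the end of the partial translation
                output.append(phrase_table[key])
                i += span
                break
        else:
            i += 1  # no translation option: skip the word
    return " ".join(output)

print(translate("das ist ein kleines haus"))  # → this is a small house
```

A real phrase-based decoder scores many competing segmentations and orderings with the translation and language models rather than committing greedily.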

Page 35

Decoding Process

• one to many translation[Koehn 2006]

Page 36

Decoding Process

• many to one translation[Koehn 2006]

Page 37

Decoding Process

• translation finished[Koehn 2006]

Page 38

Hypothesis Expansion

• Start with an empty hypothesis
– e: no English words
– f: no foreign words covered
– p: probability 1

[Koehn 2006]

Page 39

Hypothesis Expansion

Page 40

Hypothesis Expansion

• further hypothesis expansion[Koehn 2006]

Page 41

Decoding Process

• adding more hypotheses leads to explosion of search space.

[Koehn 2006]

Page 42

Hypothesis Recombination

• Sometimes different choices of hypothesis lead to the same translation result.

• Such paths can be combined.[Koehn 2006]

Page 43

Hypothesis Recombination

• Drop the weaker path
• Keep a pointer from the weaker path to the surviving one

[Koehn 2006]

Page 44

Pruning

• Hypothesis recombination is not sufficient
– Heuristically discard weak hypotheses early

• Organise hypotheses in stacks, e.g. by
– same foreign words covered
– same number of foreign words covered (Pharaoh does this)
– same number of English words produced

• Compare hypotheses in stacks, discard bad ones
– histogram pruning: keep top n hypotheses in each stack (e.g., n = 100)
– threshold pruning: keep hypotheses that are at most α times the cost of the best hypothesis in the stack (e.g., α = 0.001)
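Both pruning schemes over one stack, sketched with invented hypothesis scores; here scores are probabilities, so the threshold keeps hypotheses within a factor α of the best one.

```python
# Each hypothesis is a (partial translation, score) pair; all scores
# are invented for the example.
stack = [("the house", 0.5), ("a house", 0.3),
         ("house the", 1e-5), ("house it", 1e-9)]

def prune(stack, n=2, alpha=0.001):
    """Histogram pruning (top n) plus threshold pruning (alpha x best)."""
    stack = sorted(stack, key=lambda h: -h[1])
    best = stack[0][1]
    stack = [h for h in stack if h[1] >= alpha * best]  # threshold pruning
    return stack[:n]                                    # histogram pruning

print(prune(stack))  # → [('the house', 0.5), ('a house', 0.3)]
```

Pruning makes the search approximate: a hypothesis discarded early might have led to the best complete translation, which is why the next slides add future cost estimates.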

Page 45

Hypothesis Stacks

• Organisation of hypotheses into stacks
– here: based on number of foreign words translated
– during translation all hypotheses from one stack are expanded
– expanded hypotheses are placed into stacks

[Koehn 2006]

Page 46

Comparing Hypotheses covering Same Number of Foreign Words

• A hypothesis that covers an easy part of the sentence is preferred
• Need to consider the future cost of uncovered parts
• Should take account of one to many translation

[Koehn 2006]

Page 47

Future Cost Estimation

• Use future cost estimates when pruning hypotheses
• For each uncovered contiguous span:
– look up the future cost of each maximal contiguous uncovered span
– add it to the actually accumulated cost of the translation option for pruning

[Koehn 2006]
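A sketch of the idea, assuming hypothetical per-span costs (negative log probabilities) for a short foreign sentence; the cheapest segmentation of each uncovered span is found by a small dynamic programme.

```python
foreign = ["das", "ist", "ein", "kleines", "haus"]
# Hypothetical cheapest translation cost for each span [b, e); a real
# system derives these from its phrase table and language model.
span_cost = {(0, 1): 0.8, (1, 2): 0.6, (0, 2): 1.0, (2, 3): 0.5,
             (3, 4): 1.0, (4, 5): 0.9, (3, 5): 1.5, (2, 5): 2.5}

def future_cost(covered):
    """Sum the cheapest cost over each maximal uncovered contiguous span."""
    cost, i, n = 0.0, 0, len(foreign)
    while i < n:
        if covered[i]:
            i += 1
            continue
        j = i
        while j < n and not covered[j]:
            j += 1  # [i, j) is now a maximal uncovered span
        # dynamic programme: cheapest segmentation of the span
        best = [0.0] + [float("inf")] * (j - i)
        for b in range(i, j):
            for e in range(b + 1, j + 1):
                if (b, e) in span_cost:
                    best[e - i] = min(best[e - i], best[b - i] + span_cost[(b, e)])
        cost += best[j - i]
        i = j
    return cost

# hypothesis covering only "ein": uncovered spans are [0,2) and [3,5)
print(future_cost([False, False, True, False, False]))  # → 2.5
```

The pruning comparison then uses accumulated cost plus this estimate, so a hypothesis that has only covered the easy words no longer looks artificially cheap.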

Page 48

Pharaoh

• A beam search decoder for phrase-based models
– works with various phrase-based models
– beam search algorithm
– time complexity roughly linear in input length
– good quality takes about 1 second per sentence
– very good performance in DARPA/NIST Evaluation
– freely available for researchers: http://www.isi.edu/licensed-sw/pharaoh/

• Coming soon: open source version of Pharaoh

Page 49

Pharaoh Demo

% echo 'das ist ein kleines haus' | pharaoh -f pharaoh.ini > out
Pharaoh v1.2.9, written by Philipp Koehn
a beam search decoder for phrase-based statistical machine translation models
(c) 2002-2003 University of Southern California
(c) 2004 Massachusetts Institute of Technology
(c) 2005 University of Edinburgh, Scotland
loading language model from europarl.srilm
loading phrase translation table from phrase-table, stored 21, pruned 0, kept 21
loaded data structures in 2 seconds
reading input sentences
translating 1 sentences
translated 1 sentences in 0 seconds

% cat out
this is a small house

Page 50

Brown Experiment 2

• Perform translation using 1000 most frequent words in the English corpus.

• 1,700 most frequently used French words in translations of sentences completely covered by 1000 word English vocabulary.

• 117,000 pairs of sentences completely covered by both vocabularies.

• Parameters of English language model from 570,000 sentences in English part.

Page 51

Experiment 2 contd

• 73 French sentences from elsewhere in the corpus were tested. Results were classified as:
– Exact – same as actual translation
– Alternate – same meaning
– Different – legitimate translation but different meaning
– Wrong – could not be interpreted as a translation
– Ungrammatical – grammatically deficient

• Corrections to the last three categories were made and keystrokes were counted

Page 52

Results

Category        # sentences   percent
Exact           4             5
Alternate       18            25
Different       13            18
Wrong           11            15
Ungrammatical   27            37

Total           73

Page 53

Results - Discussion

• According to Brown et al., the system performed successfully 48% of the time (first three categories).

• 776 keystrokes were needed to repair the output, versus 1916 keystrokes to generate all 73 translations from scratch.

• According to the authors, the system therefore reduces work by 60%.

Page 54

Issues

Automatic evaluation methods
• can computers decide what are good translations?

Phrase-based models
• what are the atomic units of translation?
• how are they discovered?
• currently the best method in statistical machine translation

Discriminative training
• what methods directly optimise translation performance?

Page 55

The Speculative (Koehn 2006)

Syntax-based transfer models
• how can we build models that take advantage of syntax?
• how can we ensure that the output is grammatical?

Factored translation models
• how can we integrate different levels of abstraction?

Page 56

Bibliography

• Statistical MT: Brown et al., "A Statistical Approach to Machine Translation", Computational Linguistics 16(2), 1990, pp. 79-85 (search "ACL Anthology")

• Koehn tutorial (see http://www.iccs.inf.ed.ac.uk/~pkoehn/)

