LREC 2008 Marrakech 29 May 2008 1

Caroline Lavecchia, Kamel Smaïli and David Langlois

LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France

Phrase-Based Machine Translation based on Simulated Annealing

Outline

• Statistical Machine Translation (SMT)
• Concept of inter-lingual triggers
• Our SMT system based on inter-lingual triggers
  – Word-based approach
  – Phrase-based approach using Simulated Annealing algorithm (SA)
• Experiments
• Conclusion

Introduction:

Given a source sentence S, find the best target sentence T* which maximizes the probability P(T|S):

$$T^* = \arg\max_T P(T \mid S)$$

Noisy channel approach:

$$T^* = \arg\max_T P(T) \, P(S \mid T)$$

where P(T) is the language model and P(S|T) is the translation model.
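The step from P(T|S) to the noisy channel form is an application of Bayes' rule. The short derivation below is the standard one and is not taken from the slides themselves:

```latex
\begin{align*}
T^* &= \arg\max_T P(T \mid S) \\
    &= \arg\max_T \frac{P(T)\, P(S \mid T)}{P(S)} \\
    &= \arg\max_T P(T)\, P(S \mid T)
\end{align*}
```

The denominator P(S) does not depend on T, so it can be dropped without changing the argmax.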

Approaches:

Word-based approach
– Translation process is done word-by-word
– IBM models (Brown et al., 1993)

Phrase-based approach (Och et al., 1999), (Yamada and Knight, 2001), (Marcu and Wong, 2002)
– Better MT system quality
– Advantages:
  • Explicitly models lexical units. Ex: rat de bibliothèque → bookworm
  • Easily captures local reordering. Ex: Tour Eiffel → Eiffel tower

A new translation model based on inter-lingual triggers:

• Current translation models are complex.
• Their estimation needs a lot of time and memory.

We propose a new translation model based on a simple concept: the triggers.

Triggers in statistical language modeling:

• A trigger is a set composed of a word and its best correlated words.
• Triggers are determined by computing Mutual Information (MI) between words on a monolingual corpus:

$$MI(x, y) = P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)}$$

• In statistical language modeling, triggers make it possible to increase the probability of triggered words given a triggering word.

Example: Gary Kasparov is a chess champion

Inter-lingual triggers:

• An inter-lingual trigger is a set composed of a source unit s and its best correlated target units t1, …, tn.
• Inter-lingual triggers are determined by computing Mutual Information (MI) between units on a bilingual aligned corpus:

$$MI(s, t) = P(s, t) \log \frac{P(s, t)}{P(s)\, P(t)}$$

Example:
Gary Kasparov is a chess champion
Gary Kasparov est un champion d’échecs

We hope to find possible translations of s among the set of its triggered target units t1, …, tn.
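As an illustration of how inter-lingual triggers could be computed from the MI formula, here is a minimal Python sketch. The slides do not say how P(s), P(t) and P(s, t) are estimated; this sketch assumes sentence-level co-occurrence counts (a unit counts once per aligned sentence pair), and the function name `interlingual_triggers` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def interlingual_triggers(sentence_pairs, k=2):
    """Return, for each source word, its k target words with the highest MI.

    Assumption: probabilities are relative frequencies over aligned
    sentence pairs (a word is counted once per sentence).
    """
    n = len(sentence_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every (source word, target word) pair seen in the same aligned pair.
        joint_count.update(product(src_words, tgt_words))

    scored = {}
    for (s, t), c in joint_count.items():
        p_st = c / n
        p_s, p_t = src_count[s] / n, tgt_count[t] / n
        # MI(s, t) = P(s, t) * log(P(s, t) / (P(s) * P(t)))
        mi = p_st * math.log(p_st / (p_s * p_t))
        scored.setdefault(s, []).append((t, mi))

    return {
        s: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]
        for s, pairs in scored.items()
    }


# Toy aligned corpus (invented for illustration only).
corpus = [
    ("champion d' échecs", "chess champion"),
    ("un champion", "a champion"),
    ("les échecs", "chess"),
    ("un chat", "a cat"),
]
triggers = interlingual_triggers(corpus, k=2)
print(triggers["échecs"])
```

On this toy corpus the best trigger of échecs is chess, because the two words always occur together; on real data the k best triggers of a word would serve as its candidate translations.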

1-To-1 and n-To-m triggers:

Source: Gary Kasparov est un champion d’échecs
Target: Gary Kasparov is a chess champion

1-To-1 triggers: 1 source word triggers 1 target word.

Gary → Gary
Gary → Kasparov
Kasparov → Gary
Kasparov → chess
échecs → chess
champion → champion

n-To-m triggers: n source words trigger m target words.

Gary Kasparov → Gary Kasparov
champion d’échecs → chess champion
un champion → a champion
Kasparov est un → is a chess
échecs → chess champion
champion → is a chess

SMT based on inter-lingual triggers:

How to make good use of inter-lingual triggers in order to estimate a translation model?

•Word-based translation model using 1-To-1 triggers

•Phrase-based translation model using n-To-m triggers

Word-based translation model using 1-To-1 triggers:

• For each source word, we keep its k best 1-To-1 triggers. We hope these constitute its potential translations.
• Translation model: we assign to each inter-lingual trigger a probability calculated as follows:

$$\forall s,\ \forall t_i \in Trig(s): \quad P(t_i \mid s) = \frac{MI(s, t_i)}{\sum_{t \in Trig(s)} MI(s, t)}$$

where Trig(s) is the set of triggers kept for the source word s.
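The normalisation of MI scores into translation probabilities can be sketched in a few lines of Python. This is only an illustration: the function name `trigger_probabilities`, the input layout (a dict of MI scores per source word) and the choice to drop non-positive MI values before normalising are assumptions, not details given in the slides.

```python
def trigger_probabilities(mi_scores, k=50):
    """Turn MI scores into P(t | s) over the k best triggers of each source word.

    mi_scores: {source_word: {target_word: MI(s, t)}} (hypothetical layout).
    Assumption: triggers with MI <= 0 are discarded so that the
    normalised values form a proper probability distribution.
    """
    table = {}
    for s, scores in mi_scores.items():
        positive = [(t, mi) for t, mi in scores.items() if mi > 0]
        best = sorted(positive, key=lambda pair: pair[1], reverse=True)[:k]
        total = sum(mi for _, mi in best)
        if total > 0:
            # P(t_i | s) = MI(s, t_i) / sum of MI(s, t) over the kept triggers
            table[s] = {t: mi / total for t, mi in best}
    return table


# Invented scores, for illustration only.
mi_scores = {"échecs": {"chess": 0.3, "champion": 0.1, "a": 0.05, "is": -0.02}}
model = trigger_probabilities(mi_scores, k=2)
print(model)
```

With k=2 only chess and champion are kept, and their probabilities sum to 1.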

Phrase-based translation model using n-To-m triggers:

Motivations:
– Most methods for learning phrase translations require word alignments
– All phrase pairs that are consistent with this word alignment are collected
  → phrases with no linguistic motivation
  → noisy phrases

Method for learning phrase translation:

1. Extract phrases from the source corpus
2. Determine potential translations of the source phrases by using n-To-m triggers
3. Start with 1-To-1 triggers to set a baseline MT system
4. Select an optimal subset of n-To-m triggers by Simulated Annealing algorithm

Phrase extraction:

Iterative process which selects phrases by grouping words with high Mutual Information (Zitouni et al., 2003).

Only those which improve the perplexity on the source corpus are kept.

→ pertinent source phrases

Learning potential phrase translation:

A source phrase can be translated by different target sequences of variable sizes.

Assumption: each source phrase of l words can be translated by a sequence of j target words, where j ∈ [l-Δl, l+Δl].

For each source phrase of length l, potential translations are:
• sets of n-To-m triggers with n = l and m ∈ [l-Δl, l+Δl]

Example:

Source phrase: porter plainte (l=2)

We assume that porter plainte can be translated by sequences of 1, 2 or 3 target words (Δl=1).

Potential translations of porter plainte:

2-To-1     2-To-2           2-To-3
press      press charges    can press charges
charges    can press        not press charges
easy       not press        you can press
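The length window [l-Δl, l+Δl] can be made concrete with a small Python sketch that enumerates the target word sequences of admissible length in one aligned target sentence. The function names are hypothetical, and clamping the minimum length to 1 is an assumption; in the method described here these candidates would then be ranked by MI to obtain the n-To-m triggers.

```python
def target_lengths(l, delta=1):
    """Admissible target lengths m for a source phrase of l words."""
    # Assumption: a translation has at least one word.
    return list(range(max(1, l - delta), l + delta + 1))


def candidate_targets(target_sentence, l, delta=1):
    """All contiguous target sequences whose length lies in [l - delta, l + delta]."""
    words = target_sentence.split()
    candidates = []
    for m in target_lengths(l, delta):
        for start in range(len(words) - m + 1):
            candidates.append(" ".join(words[start:start + m]))
    return candidates


# Source phrase "porter plainte" has l = 2, so m ranges over 1, 2 and 3.
lengths = target_lengths(2, delta=1)
candidates = candidate_targets("you can press charges", l=2, delta=1)
print(lengths)
print(candidates)
```

For the example sentence this yields sequences such as press, press charges and can press charges, matching the three columns of the table.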

General case:

All source phrases associated with their k potential translations constitute the set of n-To-m triggers.

We have to select the pertinent translations among the n-To-m triggers and discard the noisy ones.

Our problem: find an optimal subset of phrase translations which leads to the best MT performance.

It is unreasonable to try all possibilities! Proposed method: use the Simulated Annealing algorithm.

Simulated Annealing:

Technique applied to find an optimal solution to a combinatorial problem.

[Flowchart of the algorithm. Boxes: Initial temperature; Initial configuration; Perturb the configuration; Accept new configuration? (yes/no); Update current configuration; Terminate search? (yes/no); Adjust temperature; Stop.]

Algorithm applied to SMT:

1. Start with a high temperature T and a baseline word-based MT system using 1-To-1 triggers
2. Do
   a) Perturb the system from state i to state j by randomly adding a subset of n-To-m triggers into the current SMT system
   b) Evaluate the performance of the new system (Ej)
   c) If (Ej > Ei) then move from state i to state j; otherwise accept state j if random(P) < e^((Ej-Ei)/T), with P ∈ [0, 1]
   until the performance of the SMT system stops increasing
3. Decrease the temperature and go to step 2 until the performance of the system stops increasing
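The loop above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: `evaluate` is a hypothetical stand-in for decoding the development corpus and computing Bleu, the stopping rule uses a fixed number of perturbations per temperature and a minimum temperature instead of "until the performance stops increasing", and the sketch additionally remembers the best system seen. It uses the standard Metropolis acceptance rule for a score to be maximised, exp((Ej - Ei)/T), which is at most 1 when Ej ≤ Ei.

```python
import math
import random


def anneal_triggers(baseline, candidates, evaluate, t0=1.0, cooling=0.5,
                    batch=2, inner_steps=20, min_temp=1e-3, seed=0):
    """Select a subset of candidate triggers by simulated annealing.

    All parameter names and default values are assumptions for this sketch.
    """
    rng = random.Random(seed)
    current = set(baseline)
    e_current = evaluate(current)
    best, e_best = set(current), e_current
    temp = t0
    while temp > min_temp:
        for _ in range(inner_steps):
            pool = [c for c in candidates if c not in current]
            if not pool:
                break
            # Perturbation: randomly add a small subset of unused triggers.
            proposal = current | set(rng.sample(pool, min(batch, len(pool))))
            e_new = evaluate(proposal)
            # Always accept an improvement; otherwise accept with
            # probability exp((e_new - e_current) / temp) <= 1.
            if e_new > e_current or rng.random() < math.exp((e_new - e_current) / temp):
                current, e_current = proposal, e_new
                if e_current > e_best:
                    best, e_best = set(current), e_current
        temp *= cooling  # decrease the temperature
    return best, e_best


def toy_score(triggers):
    """Invented score: +1 per useful trigger, -1 per noisy one."""
    good = sum(1 for t in triggers if t.startswith("good"))
    noisy = sum(1 for t in triggers if t.startswith("noisy"))
    return good - noisy


best, score = anneal_triggers({"base"}, ["good1", "good2", "noisy1", "noisy2"], toy_score)
print(sorted(best), score)
```

Because the perturbation only adds triggers, the baseline entries are always retained; in a real setting each call to `evaluate` would be a full decoding run, so the number of perturbations per temperature is the main cost factor.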

[Figure: Simulated Annealing applied to the SMT system. The initial system (decoder taking text input and producing text output, with a translation model built from 1-To-1 triggers and a language model) gives Bleu_initial. A perturbation of the current system adds a subset of n-To-m triggers to the translation model and yields a new system scored Bleu_new. Case Bleu_new > Bleu_current: the new system is accepted. Case Bleu_new ≤ Bleu_current: acceptance is decided by the random(P) test, which depends on Bleu_new, Bleu_current and the temperature T.]

Subtitle corpora:

Subtitle parallel corpora built using Dynamic Time Warping algorithm (Lavecchia et al., 2007)

                      French    English
Train  Sentences           27523
       Words          191185     205785
       Singletons       7066       5400
       Vocabulary      14655      11718
Dev    Sentences            1959
       Words           13598      14739
Test   Sentences             756
       Words            5314        756

SA algorithm parameters:

• 1-To-1 triggers: each source word is associated with its 50 best target words
• n-To-m triggers:
  – 15860 source phrases
  – each source phrase is associated with its 30 best n-To-1, n-To-2 and n-To-3 inter-lingual triggers
• Initial temperature: 10^-4
• System perturbation: adding 10 potential translations of 10 source phrases

Initial system:

[Figure: Text input → Pharaoh decoder → Text output; the decoder uses a word translation model and a language model (1).]

Translation Model    tm     lm     d      w    Bleu
1-To-1 triggers      0.6    0.3    0.3    0    12.49
IBM M3 (2)           0.8    0.6    0.6    0    12.39

(1) Trigram model
(2) (Brown et al., 1993)

Final system:

[Figure: Text input → Pharaoh decoder → Text output; the decoder uses a phrase translation model and a language model (1).]

Translation Model          tm     lm     d      w    Bleu
optimal n-To-m triggers    0.6    0.3    0.3    0    14.14
Reference (2)              0.8    0.6    0.6    0    7.02

(1) Trigram model
(2) (Och, 2002)

Evaluation of the final system:

        Inter-lingual triggers    State of the art
        1-To-1    n-To-m          IBM3     Reference
Dev     12.49     14.14           12.39    7.02
Test    13.63     10.77           14.00    6.57

Lead of n-To-m triggers over 1-To-1 triggers not corroborated on the test corpus. Explanations:
- Over-fitting due to the small amount of data
- Corpora of different movie styles

Impact of over-fitting more important on the state-of-the-art systems.

Conclusion and future work:

• A new method for learning phrase translations
  1. Extract source phrases
  2. Find phrase translations using inter-lingual triggers
  3. Select the pertinent ones using SA algorithm
  Advantages: no word alignment + more pertinent phrase translations

• Experiments on movie subtitle corpora
  – More robust on sparse data than a state-of-the-art approach
  – Better translation quality in terms of Bleu score (+7pts dev., +4pts test)

• Improvement of our system
  – Classify movies
  – Integrate linguistic knowledge in the translation process
  – Consider inter-lingual triggers not only on word surface forms

