Page 1

Machine Translation: Challenges and Approaches

Nizar Habash

Post-doctoral Fellow

Center for Computational Learning Systems

Columbia University

Invited Lecture

CS 4705: Introduction to Natural Language Processing

Fall 2004

Page 2

Sounds like Faulkner?

http://www.ee.ucla.edu/~simkin/sounds_like_faulkner.html

It lay on the table a candle burning at each corner upon the envelope tied in a soiled pink garter two artificial flowers. Not hit a man in glasses.

Faulkner: William Faulkner, "The Sound and the Fury"

It was once a shade, which was in all beautiful weather under a tree and varied like the branches in the wind.

Machine Translation (by Systran) of: Es war einmal ein Schatten, der lag bei jedem schönen Wetter unter einem Baum und schwankte wie die Zweige im Wind. (Helmut Wördemann, "Der unzufriedene Schatten")

Page 3

Progress in MT: Statistical MT Example

From a talk by Charles Wayne, DARPA

2002:
insistent Wednesday may recurred her trips to Libya tomorrow for flying

Cairo 6-4 ( AFP ) - an official announced today in the Egyptian lines company for flying Tuesday is a company " insistent for flying " may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment .

2003:
Egyptair Has Tomorrow to Resume Its Flights to Libya

Cairo 4-6 (AFP) - said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flights to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya.

Human Translation:
Egypt Air May Resume its Flights to Libya Tomorrow

Cairo, April 6 (AFP) - An Egypt Air official announced, on Tuesday, that Egypt Air will resume its flights to Libya as of tomorrow, Wednesday, after the UN Security Council had announced the suspension of the embargo imposed on Libya.

Page 4

Road Map

• Why Machine Translation (MT)?

• Multilingual Challenges for MT

• MT Approaches

• MT Evaluation

Page 5

Why (Machine) Translation?

Languages in the world
• 6,800 living languages
• 600 with written tradition
• 95% of world population speaks 100 languages

Translation Market
• $8 Billion Global Market
• Doubling every five years

(Donald Barabé, invited talk, MT Summit 2003)

Page 6

Why Machine Translation?

• Full Translation
  – Domain specific
    • Weather reports
• Machine-aided Translation
  – Translation dictionaries
  – Translation memories
  – Requires post-editing
• Cross-lingual NLP applications
  – Cross-language IR
  – Cross-language Summarization

Page 7

Road Map

• Why Machine Translation (MT)?
• Multilingual Challenges for MT
  – Orthographic variations
  – Lexical ambiguity
  – Morphological variations
  – Translation divergences
• MT Paradigms
• MT Evaluation

Page 8

Multilingual Challenges

• Orthographic Variations
  – Ambiguous spelling
    • كتب الاولاد اشعارا vs. كَتَبَ الأَوْلادُ أَشْعَاراً ("the boys wrote poems"; undiacritized vs. diacritized)
  – Ambiguous word boundaries
• Lexical Ambiguity
  – Bank: بنك (financial) vs. ضفة (river)
  – Eat: essen (human) vs. fressen (animal)
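The ambiguous-spelling problem can be made concrete in a few lines. Ordinary Arabic text omits the short-vowel marks (harakat), so words that differ only in their vowels collapse to one written form. This is a minimal Python sketch that assumes the harakat are exactly the code points U+064B through U+0652. The three example words are written with escapes so that their diacritics stay explicit.

```python
# Arabic harakat, assumed here to be U+064B..U+0652: tanween (U+064B-U+064D),
# short vowels fatha/damma/kasra (U+064E-U+0650), shadda (U+0651), sukun (U+0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Drop short-vowel diacritics, as everyday Arabic spelling does."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# Three different words, fully diacritized (kaf, teh, beh plus vowels):
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba: "he wrote"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # kutiba: "it was written"
kutub = "\u0643\u064F\u062A\u064F\u0628"          # kutub: "books"

for word in (kataba, kutiba, kutub):
    # All three collapse to the same undiacritized string, U+0643 U+062A U+0628.
    print(strip_diacritics(word))
```

An MT system that reads undiacritized input sees only the stripped form. It must recover the intended word (verb, passive verb or plural noun) from context.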

Page 9

Multilingual Challenges: Morphological Variations

• Affixation vs. Root+Pattern
  write → written | كتب → مكتوب
  kill → killed | قتل → مقتول
  do → done | فعل → مفعول
• Tokenization
  and the cars
  والسيارات w+ Al+ syAr +At (conj + article + noun + plural)
  et les voitures
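Both phenomena on this slide can be sketched over Buckwalter transliteration. In that scheme ktb = كتب, w = و, Al = ال, and E = ع. The sketch below works under simplifying assumptions and is not a real Arabic analyzer. The function names and the `min_stem` threshold are hypothetical.

- `interdigitate` fills the numbered slots of a pattern with root consonants. For example, the participle pattern m12w3 turns ktb into mktwb ("written").
- `segment` greedily peels off only three affixes: the conjunction w+/f+, the article Al+ and the plural suffix +At.

```python
from typing import List


def interdigitate(root: str, pattern: str) -> str:
    """Insert root consonants into a root-and-pattern template.

    Digits in `pattern` name root positions (1 = first consonant, ...);
    every other character is copied through unchanged.
    """
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def segment(word: str, min_stem: int = 3) -> List[str]:
    """Greedily split clitics off a Buckwalter-transliterated word.

    Handles only the conjunctions w+/f+, the article Al+ and the plural
    suffix +At, and only while at least `min_stem` letters would remain.
    """
    prefixes, suffixes = [], []
    if word[:1] in ("w", "f") and len(word) - 1 >= min_stem:
        prefixes.append(word[0] + "+")
        word = word[1:]
    if word.startswith("Al") and len(word) - 2 >= min_stem:
        prefixes.append("Al+")
        word = word[2:]
    if word.endswith("At") and len(word) - 2 >= min_stem:
        suffixes.append("+At")
        word = word[:-2]
    return prefixes + [word] + suffixes


# One pattern (passive participle) applied to the slide's three roots.
for root in ("ktb", "qtl", "fEl"):
    print(root, "->", interdigitate(root, "m12w3"))  # mktwb, mqtwl, mfEwl

print(segment("wAlsyArAt"))  # ['w+', 'Al+', 'syAr', '+At']
```

A greedy rule like this is where ambiguous word boundaries bite. For example, `segment("wzyr")` returns `['w+', 'zyr']`, but wzyr (وزير, "minister") contains no conjunction. A practical segmenter needs a lexicon or statistics to make that decision.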

Page 10

Multilingual Challenges: Translation Divergences

• How languages map semantics to syntax
• 35% of sentences in TREC El Norte Corpus (Dorr et al 2002)
• Divergence Types
  – Categorial (X tener hambre → X be hungry) [98%]
  – Conflational (X dar puñaladas a Z → X stab Z) [83%]
  – Structural (X entrar en Y → X enter Y) [35%]
  – Head Swapping (X cruzar Y nadando → X swim across Y) [8%]
  – Thematic (X gustar a Y → Y like X) [6%]

Page 11

Translation Divergences: Conflation

English: I am not here
  be [I, here, not]

Arabic: لست هنا (I-am-not here)
  ليس [انا, هنا]

French: Je ne suis pas ici (I not be not here)
  être [Je, ici, ne pas]

Page 12

Translation Divergences: Categorial, Thematic and Structural

English: I am cold
  be [I, cold]

Arabic: انا بردان (I cold)

Hebrew: קר לי (cold for-me)

Spanish: tengo frío (I-have cold)
  tener [Yo, frío]

Page 13

Translation Divergences: Head Swap and Categorial

English: I swam across the river quickly
  swim [I, across [river], quickly]

Arabic: اسرعت عبور النهر سباحة (I-sped crossing the-river swimming)
  اسرع [انا, عبور [نهر], سباحة]

Page 14

Translation Divergences: Head Swap and Categorial

English: I swam across the river quickly
  swim [I, across [river], quickly]

Hebrew: חציתי את הנהר בשחיה במהירות (I-crossed obj river in-swim speedily)
  חצה [אני, את [נהר], ב [שחיה], ב [מהירות]]

Page 15

Translation Divergences: Head Swap and Categorial

[Figure: the English, Arabic and Hebrew trees from the previous two slides shown side by side, each node labeled with its part of speech (verb, noun, prep, adverb).]

Across the three trees, the English verb swim corresponds to the nouns سباحة and שחיה. The English adverb quickly corresponds to the Arabic verb اسرع and the Hebrew noun מהירות. The English preposition across corresponds to the Arabic noun عبور and the Hebrew verb חצה.

Page 16

Translation Divergences: Orthography + Morphology + Syntax

car [possessed-by: mom]

English: mom's car
Chinese: 妈妈的车 (mama de che)
Arabic: سيارة ماما (sayyArat mama)
French: la voiture de maman

Page 17

Road Map

• Why Machine Translation (MT)?
• Multilingual Challenges for MT
• MT Approaches
  – Gisting / Transfer / Interlingua
  – Statistical / Symbolic / Hybrid
  – Practical Considerations
• MT Evaluation

Page 18

MT Approaches: MT Pyramid

[Figure: the MT pyramid. Analysis climbs the source side (source word → source syntax → source meaning); generation descends the target side (target meaning → target syntax → target word). Marked approach: Gisting (word level).]

Page 19

MT Approaches: Gisting Example

Spanish: Sobre la base de dichas experiencias se estableció en 1988 una metodología.

Gist: Envelope her basis out speak experiences them settle at 1988 one methodology.

Human translation: On the basis of these experiences, a methodology was arrived at in 1988.
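Gisting is word-for-word lookup with no reordering and no syntax. The slide's gist shows its typical failure: "sobre" comes out as "Envelope" and "la" as "her", because a context-free lookup can pick the wrong sense. This is a minimal Python sketch. The `GLOSSES` dictionary is hypothetical, with one English gloss per Spanish word, chosen for illustration. The lookup logic is the same whichever senses the dictionary happens to list first.

```python
import re
from typing import Dict

# Hypothetical toy bilingual dictionary: one context-free gloss per word.
GLOSSES = {
    "sobre": "on", "la": "the", "base": "basis", "de": "of",
    "dichas": "said", "experiencias": "experiences", "se": "itself",
    "estableció": "established", "en": "in", "una": "a",
    "metodología": "methodology",
}


def gist(sentence: str, glosses: Dict[str, str]) -> str:
    """Word-for-word translation: no reordering, no syntax, no context."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    # Tokens missing from the dictionary (numbers, names, punctuation) pass through.
    return " ".join(glosses.get(tok.lower(), tok) for tok in tokens)


print(gist("Sobre la base de dichas experiencias se estableció en 1988 una metodología.", GLOSSES))
# on the basis of said experiences itself established in 1988 a methodology .
```

Even with well-chosen senses, "itself established" shows what gisting cannot do. It cannot restructure the Spanish reflexive "se estableció" into the English passive "was arrived at".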

Page 20

MT Approaches: MT Pyramid

[Figure: the MT pyramid. Analysis climbs the source side (source word → source syntax → source meaning); generation descends the target side (target meaning → target syntax → target word). Marked approaches: Gisting (word level) and Transfer (syntax level).]

Page 21

MT Approaches: Transfer Example

• Transfer Lexicon
  – Map source-language (SL) structure to target-language (TL) structure

poner [:subj X, :obj mantequilla, :mod en [:obj Y]] → butter [:subj X, :obj Y]

X puso mantequilla en Y → X buttered Y
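A transfer-lexicon entry is a pattern over a source dependency tree plus a template for the target tree. This is a minimal Python sketch of the single entry on this slide. The `Node` class, `transfer_poner_mantequilla` and `show` are all hypothetical names. A real transfer component would hold many entries and use a general tree matcher.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A dependency-tree node: a lemma plus children keyed by relation label."""
    lemma: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def transfer_poner_mantequilla(tree: Node) -> Optional[Node]:
    """Apply one hypothetical transfer-lexicon entry:

        poner(:subj X, :obj mantequilla, :mod en(:obj Y))  ->  butter(:subj X, :obj Y)

    Returns the English tree, or None if `tree` does not match the entry.
    """
    subj = tree.children.get("subj")
    obj = tree.children.get("obj")
    mod = tree.children.get("mod")
    if (tree.lemma == "poner" and subj is not None
            and obj is not None and obj.lemma == "mantequilla"
            and mod is not None and mod.lemma == "en" and "obj" in mod.children):
        # X and Y are carried over unchanged; the structure around them is rewritten.
        return Node("butter", {"subj": subj, "obj": mod.children["obj"]})
    return None


def show(node: Node) -> str:
    """Render a tree as lemma(:label child, ...)."""
    if not node.children:
        return node.lemma
    args = ", ".join(f":{rel} {show(kid)}" for rel, kid in node.children.items())
    return f"{node.lemma}({args})"


source = Node("poner", {
    "subj": Node("X"),
    "obj": Node("mantequilla"),
    "mod": Node("en", {"obj": Node("Y")}),
})
print(show(source))                              # poner(:subj X, :obj mantequilla, :mod en(:obj Y))
print(show(transfer_poner_mantequilla(source)))  # butter(:subj X, :obj Y)
```

In this sketch X and Y are leaves. In a full system they would be subtrees, translated recursively by other entries.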

Page 22

MT Approaches: MT Pyramid

[Figure: the MT pyramid. Analysis climbs the source side (source word → source syntax → source meaning); generation descends the target side (target meaning → target syntax → target word). Marked approaches: Gisting (word level), Transfer (syntax level) and Interlingua (meaning level).]

Page 23

MT Approaches: Interlingua Example: Lexical Conceptual Structure

(Dorr, 1993)

Page 24

MT Approaches: MT Pyramid

[Figure: the MT pyramid. Analysis climbs the source side (source word → source syntax → source meaning); generation descends the target side (target meaning → target syntax → target word). Marked approaches: Interlingua (meaning level), Gisting (word level) and Transfer (syntax level).]

Page 25

MT Approaches: MT Pyramid

[Figure: the MT pyramid. Analysis climbs the source side (source word → source syntax → source meaning); generation descends the target side (target meaning → target syntax → target word). Resources by level: Dictionaries/Parallel Corpora (word level), Transfer Lexicons (syntax level), Interlingual Lexicons (meaning level).]

Page 26

MT Approaches: Statistical vs. Symbolic

[Figure: the MT pyramid (analysis: source word → source syntax → source meaning; generation: target meaning → target syntax → target word), used to contrast statistical and symbolic approaches.]

Page 27

MT Approaches: Noisy Channel Model

Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
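The noisy-channel model treats a foreign sentence f as a distorted English sentence e. Translation then means recovering the most likely e:

ê = argmax_e P(e) · P(f | e)

Here the language model P(e) rewards fluent English and the translation model P(f | e) rewards faithfulness to f. This is a minimal Python sketch of the decision rule over an explicit candidate list. The candidates and every probability below are invented toy numbers, assumed for illustration. Real decoders search an enormous hypothesis space instead of enumerating it.

```python
import math

# Input f (fixed): "Maria puso la mantequilla en el pan."
# Toy, invented probabilities, for illustration only.
LM = {  # P(e): fluency of each English candidate
    "Maria put the butter on the bread": 0.020,
    "Maria buttered the bread": 0.030,
    "Maria put on the bread the butter": 0.001,
}
TM = {  # P(f | e): how well each candidate explains the Spanish input
    "Maria put the butter on the bread": 0.30,
    "Maria buttered the bread": 0.10,
    "Maria put on the bread the butter": 0.30,
}


def decode(candidates):
    """Return argmax_e of log P(e) + log P(f | e) over the given candidates."""
    return max(candidates, key=lambda e: math.log(LM[e]) + math.log(TM[e]))


print(decode(LM))  # Maria put the butter on the bread
```

The two models divide the labor. The third candidate is as faithful as the winner (same P(f | e)) but loses on fluency. The second is fluent but explains the input less well under these toy numbers.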

Page 28

MT Approaches: IBM Model (Word-based Model)

http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
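IBM Model 1 is the simplest of the IBM word-based models. It learns word-translation probabilities t(f | e) from sentence pairs alone, using expectation maximization over all possible word alignments. This is a compact Python sketch. It assumes whitespace-tokenized input and omits the NULL word for brevity. The three-sentence German-English corpus is a textbook-style toy example, not data from this lecture.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """EM for IBM Model 1. `pairs` is a list of (foreign, english) token lists.

    Returns t[(f, e)] = P(f | e). The NULL word is omitted for brevity.
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of count for f among all e in the pair
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm1(corpus)
foreign_words = {f for fs, _ in corpus for f in fs}
for e in ("the", "house", "book", "a"):
    print(e, "->", max(foreign_words, key=lambda f: t[(f, e)]))
# one line each: the -> das, house -> Haus, book -> Buch, a -> ein
```

No sentence pair is aligned by hand. The correct word pairs emerge only because "das" co-occurs with "the" more consistently than with "house" or "book". This is also how a translation dictionary can be extracted from a parallel corpus.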

Page 29

MT Approaches: Statistical vs. Symbolic vs. Hybrid

[Figure: the MT pyramid (analysis: source word → source syntax → source meaning; generation: target meaning → target syntax → target word), used to contrast statistical, symbolic and hybrid approaches.]

Page 30

MT Approaches: Statistical vs. Symbolic vs. Hybrid

[Figure: the MT pyramid (analysis: source word → source syntax → source meaning; generation: target meaning → target syntax → target word), used to contrast statistical, symbolic and hybrid approaches.]

Page 31

MT Approaches: Hybrid Example: GHMT

• Generation-Heavy Hybrid Machine Translation
• Lexical transfer but NO structural transfer

Maria puso la mantequilla en el pan.

Spanish tree: poner [:subj Maria, :obj mantequilla, :mod en [:obj pan]]

After lexical transfer: {lay, locate, place, put, render, set, stand} [:subj Maria, :obj {butter, bilberry}, :mod {on, in, into, at} [:obj {bread, loaf}]]
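GHMT does lexical transfer but no structural transfer. As a result, each node of the Spanish tree simply receives every English translation listed on the slide, and the generator has to cope with their cross product. This is a minimal Python sketch that enumerates those lexical combinations for the slide's example. It covers the same tree shape only; the structural expansion on the following slides adds further candidates. The string notation is illustrative and is not GHMT's internal representation.

```python
from itertools import product

# English options per node of poner(:subj Maria, :obj mantequilla, :mod en(:obj pan)),
# as listed on the slide.
VERBS = ["lay", "locate", "place", "put", "render", "set", "stand"]
OBJECTS = ["butter", "bilberry"]
PREPS = ["on", "in", "into", "at"]
PREP_OBJECTS = ["bread", "loaf"]

candidates = [
    f"{verb}(:subj Maria, :obj {obj}, :mod {prep}(:obj {pobj}))"
    for verb, obj, prep, pobj in product(VERBS, OBJECTS, PREPS, PREP_OBJECTS)
]
print(len(candidates))  # 112 = 7 * 2 * 4 * 2
print(candidates[0])    # lay(:subj Maria, :obj butter, :mod on(:obj bread))
```

Even this small sentence yields over a hundred lexical variants before any restructuring. This is why GHMT depends on target-language statistical models to rank the output.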

Page 32

MT Approaches: Hybrid Example: GHMT

• LCS-driven Expansion
• Conflation Example

BUTTER_V [CAUSE GO]: Agent MARIA, Goal BREAD
PUT_V [CAUSE GO]: Agent MARIA, Theme BUTTER_N, Goal BREAD

Categorial Variation: BUTTER_N ↔ BUTTER_V

Page 33

MT Approaches: Hybrid Example: GHMT

• Structural Overgeneration
  put [Maria, butter, on [bread]]
  lay [Maria, butter, at [loaf]]
  render [Maria, butter, into [loaf]]
  butter [Maria, bread]
  bread [Maria, butter]
  …

Page 34

MT Approaches: Hybrid Example: GHMT

Target Statistical Resources

• Structural N-gram Model
  – Long-distance
  – Lexemes
• Surface N-gram Model
  – Local
  – Surface-forms

buy [John, car [a, red]]

John bought a red car
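The two models look at different word pairs. This is a Python sketch that extracts both kinds of bigram for the slide's example. It assumes the tree is stored as nested (lexeme, dependents) tuples; that representation and the function names are hypothetical.

- The structural model pairs heads with their dependents as lexemes. It therefore sees (buy, car) even though "car" is three words away from "bought" in the string.
- The surface model sees only adjacent inflected forms, such as (bought, a).

```python
# The slide's example tree buy(John, car(a, red)) as (lexeme, dependents) tuples.
TREE = ("buy", [("John", []), ("car", [("a", []), ("red", [])])])


def structural_bigrams(node):
    """(head, dependent) lexeme pairs: adjacent in the tree, not in the string."""
    head, deps = node
    pairs = []
    for dep in deps:
        pairs.append((head, dep[0]))
        pairs.extend(structural_bigrams(dep))
    return pairs


def surface_bigrams(sentence):
    """Adjacent word pairs in the inflected surface string."""
    words = sentence.split()
    return list(zip(words, words[1:]))


print(structural_bigrams(TREE))
# [('buy', 'John'), ('buy', 'car'), ('car', 'a'), ('car', 'red')]
print(surface_bigrams("John bought a red car"))
# [('John', 'bought'), ('bought', 'a'), ('a', 'red'), ('red', 'car')]
```

Counting such pairs over a parsed and a raw target-language corpus gives the two statistical resources listed above.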

Page 35

MT Approaches: Hybrid Example: GHMT

Linearization & Ranking

Maria buttered the bread -47.0841
Maria butters the bread -47.2994
Maria breaded the butter -48.7334
Maria breads the butter -48.835
Maria buttered the loaf -51.3784
Maria butters the loaf -51.5937
Maria put the butter on bread -54.128
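Ranking of this kind can be sketched with a surface bigram language model: score every candidate linearization by log probability, then sort. This is a minimal Python sketch with add-one smoothing. The three-sentence training corpus is invented for illustration, so the scores will not match the numbers above; only the mechanism is the point.

```python
import math
from collections import Counter

# Invented toy training corpus (not GHMT's data).
CORPUS = [
    "Maria buttered the bread",
    "John buttered the toast",
    "She put the butter on the bread",
]


def train_bigram_lm(sentences):
    """Count bigrams and their histories, with sentence-boundary markers."""
    bigrams, history, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            history[prev] += 1
    return bigrams, history, len(vocab) + 1  # +1 reserves mass for unseen words


def log_prob(sentence, lm):
    """Add-one smoothed bigram log probability of a whole sentence."""
    bigrams, history, v = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (history[prev] + v))
        for prev, word in zip(tokens, tokens[1:])
    )


def rank(candidates, lm):
    """Sort candidates from most to least probable."""
    return sorted(candidates, key=lambda s: log_prob(s, lm), reverse=True)


lm = train_bigram_lm(CORPUS)
for sentence in rank(["Maria breaded the butter", "Maria buttered the loaf",
                      "Maria buttered the bread"], lm):
    print(f"{sentence}  {log_prob(sentence, lm):.4f}")
```

With this corpus the order is "Maria buttered the bread", then "Maria buttered the loaf", then "Maria breaded the butter". Unseen bigrams such as "breaded the" receive only the smoothed minimum.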

Page 36

MT Approaches: Practical Considerations

• Resources Availability
  – Parsers and Generators
    • Input/Output compatibility
  – Translation Lexicons
    • Word-based vs. Transfer/Interlingua
  – Parallel Corpora
    • Domain of interest
    • Bigger is better
• Time Availability
  – Statistical training, resource building

Page 37

MT Approaches: Resource Poverty

No Parser? No Translation Dictionary?

Parallel Corpus
• Align with rich language
• Extract dictionary
• Parse rich side
• Infer parses
• Build a statistical parser

Page 38

Road Map

• Why Machine Translation (MT)?

• Multilingual Challenges for MT

• MT Approaches

• MT Evaluation

Page 39

MT Evaluation

• More art than science
• Wide range of Metrics/Techniques
  – interface, …, scalability, …, faithfulness, …, space/time complexity, … etc.
• Automatic vs. Human-based
  – Dumb Machines vs. Slow Humans

Page 40

MT Evaluation Metrics

• System-based Metrics
  Count internal resources: size of lexicon, number of grammar rules, etc.
  – easy to measure
  – not comparable across systems
  – not necessarily related to utility

(Church and Hovy 1993)

Page 41

MT Evaluation Metrics

• Text-based Metrics
  – Sentence-based Metrics
    • Quality: Accuracy, Fluency, Coherence, etc.
    • 3-point scale to 100-point scale
  – Comprehensibility Metrics
    • Comprehension, Informativeness,
    • x-point scales, questionnaires
    • most related to utility
    • hard to measure

Page 42

MT Evaluation Metrics

• Text-based Metrics (cont'd)
  – Amount of Post-Editing
    • number of keystrokes per page
    • not necessarily related to utility
• Cost-based Metrics
  – Cost per page
  – Time per page
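Counting real post-editing keystrokes requires logging an editor at work. One possible automatic stand-in is the Levenshtein edit distance: the minimum number of character insertions, deletions and substitutions that turn the MT output into its post-edited version. This Python sketch computes that proxy. It is a lower bound on actual keystrokes, since human editors also move the cursor and often retype whole words.

```python
def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance: minimum character insertions, deletions and
    substitutions that turn `hyp` (MT output) into `ref` (post-edited text)."""
    prev = list(range(len(ref) + 1))  # distances from "" to each prefix of ref
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # delete h
                curr[j - 1] + 1,         # insert r
                prev[j - 1] + (h != r),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("may resumed", "may resume"))  # 1
```

The same caveat as on the slide applies. A small distance does not guarantee a useful translation, and a large one may reflect only stylistic rewriting.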

Page 43

Human-based Evaluation Example: Accuracy Criteria

5 contents of original sentence conveyed (might need minor corrections)
4 contents of original sentence conveyed BUT errors in word order
3 contents of original sentence generally conveyed BUT errors in relationship between phrases, tense, singular/plural, etc.
2 contents of original sentence not adequately conveyed, portions of original sentence incorrectly translated, missing modifiers
1 contents of original sentence not conveyed, missing verbs, subjects, objects, phrases or clauses

Page 44

Human-based Evaluation Example: Fluency Criteria

5 clear meaning, good grammar, terminology and sentence structure
4 clear meaning BUT bad grammar, bad terminology or bad sentence structure
3 meaning graspable BUT ambiguities due to bad grammar, bad terminology or bad sentence structure
2 meaning unclear BUT inferable
1 meaning absolutely unclear

Page 45

Fluency vs. Accuracy

[Figure: plot with axes Accuracy and Fluency, locating con MT, FAHQ MT, Prof. MT and Info. MT.]

Page 46

Automatic Evaluation Example: Bleu Metric

• Bleu – BiLingual Evaluation Understudy (Papineni et al 2001)

– Modified n-gram precision with length penalty

– Quick, inexpensive and language independent

– Correlates highly with human evaluation

– Bias against synonyms and inflectional variations

Page 47

Automatic Evaluation Example: Bleu Metric

Test Sentence
  colorless green ideas sleep furiously

Gold Standard References
  all dull jade ideas sleep irately
  drab emerald concepts sleep furiously
  colorless immature thoughts nap angrily

Page 48

Automatic Evaluation Example: Bleu Metric

Test Sentence
  colorless green ideas sleep furiously

Gold Standard References
  all dull jade ideas sleep irately
  drab emerald concepts sleep furiously
  colorless immature thoughts nap angrily

Unigram precision = 4/5

Page 49

Automatic Evaluation Example: Bleu Metric

Test Sentence
  colorless green ideas sleep furiously

Gold Standard References
  all dull jade ideas sleep irately
  drab emerald concepts sleep furiously
  colorless immature thoughts nap angrily

Unigram precision = 4 / 5 = 0.8
Bigram precision = 2 / 4 = 0.5

$$\text{Bleu} = (a_1 a_2 \cdots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} = 0.6325 \quad (63.25 \text{ on a 0--100 scale})$$

Here a_i is the modified i-gram precision and n = 2. The brevity penalty is 1 in this example. The test sentence (5 words) is as long as the closest reference (5 words), so the penalty does not change the score.
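The computation above can be reproduced with a short Python sketch of Bleu, built from three steps:

1. Each candidate n-gram count is clipped to the largest number of times that n-gram occurs in any single reference.
2. The clipped precisions for n = 1..N are combined by geometric mean.
3. The result is multiplied by the brevity penalty BP = 1 if c > r, else exp(1 - r/c). Here c is the candidate length and r is the reference length closest to it.

This follows the definition of Papineni et al. in simplified form: a single sentence, uniform weights, no corpus-level aggregation and no smoothing. Setting N = 2 matches the slide's example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most as
    many times as it appears in any single reference."""
    cand = ngrams(candidate, n)
    max_ref = Counter()
    for ref in references:
        max_ref |= ngrams(ref, n)  # element-wise max of counts
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / max(sum(cand.values()), 1)


def bleu(candidate, references, max_n=4):
    """Sentence-level Bleu with uniform weights and a brevity penalty."""
    cand, refs = candidate.split(), [r.split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # geometric mean is zero (no smoothing in this sketch)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # r: the reference length closest to the candidate length (ties -> shorter)
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean


test = "colorless green ideas sleep furiously"
refs = [
    "all dull jade ideas sleep irately",
    "drab emerald concepts sleep furiously",
    "colorless immature thoughts nap angrily",
]
print(round(bleu(test, refs, max_n=2), 4))  # 0.6325
```

Clipping is what makes the precision "modified": a degenerate candidate such as "the the the", scored against "the cat", gets a unigram precision of 1/3 rather than 3/3. The bias against synonyms noted earlier is also visible here. "green" earns no credit even though two references use near-synonyms of it ("jade", "emerald").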

