Factored Translation Models for Small Data Problems
Experiments with Spanish, Czech and Chinese

Wade Shèn, Břooke Cowan, Ondrej Bojar and Christine Möran
MIT Lincoln + Computer Science AI Labs, Charles University
8/14/2006


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research


General Motivations
Challenges with Small Data

• Phrase-based MT relies on large data
– Learn "phrase" co-occurrence within a language
– Learn translation templates/phrases across languages

• Problems for phrase-based MT with small data
– Word alignment
– Hard to see enough phrases (coverage), especially in morphologically rich languages
– Tend to rely on shorter phrases
   Increased local agreement problems
   Increased long-distance coherence problems


Possible Advantages of Factored Models
Generalization over Morphology

• We can model morphological variation and phrase translation separately for better statistics: Translation + Generation

– Spanish Gender

          Masculine                 Feminine
Spanish   Él es un jugador rojo     Ella es una jugadora roja
English   he is a red player        she is a red player
Lemma     el ser un jugador roj     el ser un jugador roj
Morph     m 3p+sing m m m           f 3p+sing f f f

– Czech Case

          Nominative + Plural       Dative + Plural
Czech     černé kočky               černým kočkám
English   black cats                black cats
Lemma     černá kočka               černá kočka
Morph     nom+pl nom+pl             dat+pl dat+pl
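The split between translation and generation can be sketched in a few lines of Python. This is only a minimal illustration, not the system used in the experiments: the `GENERATION` table and the `generate` function are hypothetical names, and the entries are just the forms shown in the Spanish and Czech examples.

```python
# Hypothetical generation table: (lemma, morphology tag) -> surface form.
# Entries are taken from the Spanish and Czech examples on this slide.
GENERATION = {
    ("jugador", "m"): "jugador",
    ("jugador", "f"): "jugadora",
    ("roj", "m"): "rojo",
    ("roj", "f"): "roja",
    ("černá", "nom+pl"): "černé",
    ("černá", "dat+pl"): "černým",
    ("kočka", "nom+pl"): "kočky",
    ("kočka", "dat+pl"): "kočkám",
}


def generate(lemmas, morph_tags):
    """Re-generate surface forms from parallel lemma and morphology factors."""
    return [GENERATION[(lemma, tag)] for lemma, tag in zip(lemmas, morph_tags)]


print(generate(["černá", "kočka"], ["dat+pl", "dat+pl"]))
```

Under this view the translation model only needs statistics per lemma, while the choice of inflected form is delegated to a much smaller generation table.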


Factors as Type Checking
Long Range Phenomena and Divergence

• Long range dependencies can be modeled with latent factors
– Spanish: Verb – Subject Number Agreement

Spanish   Mi hija de dos años tiene catarro
Gloss     My daughter of two years has cold
          AGR: Subject: 3p+sing, verb: 3p+sing
Czech     Nachlazena je moje dvouletá dcera.
          AGR: verb: 3p+sing, Subject: 3p+sing

• Verb-Argument dependencies

Czech     Napsal zprávu o matčině domu na papír
Gloss     He wrote a message about mother's house on a paper
          verb selects noun: accusative

Czech     Našel zprávu o matčině domu na papíře
Gloss     He found a message about mother's house on a paper
          verb selects noun: locative


Phrase-Level Generalization

• Class-based divergences
– Chinese-English resultative constructions
   Similar pattern for large class of verbs (verb specific)

Chinese   你 要 答 破 吗 回
Gloss     you made it hit broken done
English   you broke it

• Longer distance movement dependencies
– Chinese-English Questions

Chinese   你 要 回答 [clause…] 吗
Gloss     you want reply [clause…] y/n-marker
English   would you like to reply to [clause…] ?
Tags      Pn (你), VModal (要), Part (吗): causes reordering


Large vs. Small Data
How generalizations may affect SMT Performance

• With large data sets these phenomena can be learned
– Language models should capture local agreement phenomena with enough data
– Long range agreement/coherence still problematic
– Generalization may still be better, but errors in analysis can limit the benefit

• Generalization may be advantageous for small data
– For example (Spanish/Czech agreement): can't learn every noun/adjective/determiner triple
– Situation for many real-world problems


Outline

• Motivations

• Experimental Design and Baselines
– Approaches
– Data Sets

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research


Data Sets and Baselines

Data Set        Translation Direction   Size                         Baseline w/ diff LMs (BLEU/Surface)
Full Europarl   English → Spanish       950k LM train, 700k bitext   3g 29.35, 4g 29.57, 5g 29.54
Euromini        English → Spanish       60k LM train, 40k bitext     3g 23.41, 3g (950k) 25.10
Czech WSJ       English → Czech         20k LM train, 20k bitext     3g 25.82 (four references)
IWSLT Chinese   Chinese → English       40k LM train, 40k bitext     4g 19.54 (seven references)


Using Factored Models
Approaches for Small-Data Tasks

• Factored models we tried
– Different levels of linguistic information modeled separately
   example: morphology vs. phrasal content
– Feature "checking" of existing phrasal models with LMs on factors
– Generalized factor-based distortion
   Phrases are likely to move distance X if the preceding word has tag Y

• Hypothesis: These models allow better utilization of limited training data

         Good (high likelihood)      Bad (low likelihood)
Words    I would like some donuts    I would like some big jump
POS      pn mod vb det np            pn mod vb det adj vb
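The "checking" idea, scoring the tag sequence of a hypothesis with a language model over factors, can be illustrated with a small add-one smoothed bigram model over POS tags. This is a sketch under stated assumptions: the function names and the three toy training sequences are invented for illustration and are not the models or data used in the experiments.

```python
import math
from collections import Counter


def train_bigram_lm(tag_sequences):
    """Count tag bigrams and their contexts, with sentence-boundary markers."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_prob(tags, model):
    """Add-one smoothed log-probability of a tag sequence."""
    bigrams, contexts, vocab_size = model
    padded = ["<s>"] + list(tags) + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


# Toy training data, invented for illustration only.
model = train_bigram_lm([
    "pn mod vb det np".split(),
    "pn vb det adj np".split(),
    "pn mod vb np".split(),
])
good = log_prob("pn mod vb det np".split(), model)
bad = log_prob("pn mod vb det adj vb".split(), model)
print(good > bad)
```

Because the model is estimated over a few dozen tags rather than the full vocabulary, it can penalize the "bad" tag sequence even if none of its words were seen in that order during training.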


Different Factored Approaches
Overview of Models Tried

• High Order Language Models
– Supervised analysis, explicit agreement: LMs over verbs/subject; LMs over nouns, determiners and adjectives
– Supervised analysis, long distance coherence: LMs over POS
– Unsupervised analysis, agreement/coherence: LMs over word classes

• Parallel Translation Models
– Supervised analysis, explicit agreement: parallel translation models over lemmas and morphology
– Unsupervised analysis, agreement/coherence: parallel translation models over word classes and surface


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish
– Morphology and Agreement Features (Brooke)
– Parallel Lemma and Morphology Translation (Wade)
– Scaling to Larger Corpora (Wade)

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Conclusions and Follow-on Research


Spanish Experiments
Language Models over Morphological Features

• NDA
– Noun/Determiner/Adjective agreement
– Generate only on N, D and A tags (don't-cares elsewhere)

• VNP
– Verb/Noun/Preposition selection agreement
– Generate on V, N or P

Model: from the surface word, generate and check the latent factors nda and vpn

N/D/A Features
Gender: masc, fem, common, none
Number: sing, plural, invariable, none

V/N/P Features
Number: sing, plural, invariable, none
Person: 1p, 2p, 3p, none
Prep-ID: Preposition, none


Spanish Experiments
Skipped LMs for Agreement

• Allow NULL factors to be generated
• Increase effective context length to model longer range dependencies

Model: from the surface word, generate the latent factors nda and vpn

Source Phrase   …gave the woman
Target Phrase   dio a la mujer

word   dio    a      la     mujer
nda    X      X      s+f    s+f
vpn    3+s    "a"    X      s
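The effect of skipping on context length can be shown with a short sketch. The function `factor_ngrams` is a hypothetical helper, and the assignment of the factor values to the words of "dio a la mujer" is an assumption read off the slide's diagram; only the idea of dropping NULL ("X") positions before forming n-grams comes from the slide.

```python
NULL = "X"  # don't-care factor value


def factor_ngrams(factors, order, skip_null):
    """Return the n-grams a factor LM would score.

    With skip_null=True, don't-care positions are dropped first, so each
    n-gram spans more surface words than its nominal order.
    """
    seq = [f for f in factors if f != NULL] if skip_null else list(factors)
    return [tuple(seq[i:i + order]) for i in range(len(seq) - order + 1)]


# vpn factors for "dio a la mujer" (assumed assignment of values to words).
vpn = ["3+s", "a", "X", "s"]

print(factor_ngrams(vpn, 3, skip_null=False))
print(factor_ngrams(vpn, 3, skip_null=True))
```

Without skipping, no trigram connects the verb's "3+s" to the noun's "s"; with skipping, a single trigram covers verb, preposition and noun.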


Spanish Agreement LMs
Experimental Results

• No Skipping (LM counts don't-care positions)

Data Set   Baseline   NDA     VPN     Both
EuroMini   23.41      24.47   24.33   24.54

• With Skipping

Data Set   Baseline   NDA+Skip   VPN+Skip
EuroMini   23.41      24.03      24.16

• No Skipping with all morphological features w/ and w/o POS

Data Set   Baseline   Morph   Morph+POS
EuroMini   23.41      24.66   24.25

• All models beat baseline
– Skipping doesn't seem to help
– Full morphology is best


Spanish Experiments
Parallel Lemma/Morphology Translation

• Factor surface into lemma and morphology features
• Translate both simultaneously
• Re-generate target surface form
• Apply LM on both surface and morphology features

Example (morphology = Person + Number + Gender + Case):
Surface "Me" → Analysis → Lemma "I" + "1ps+Acc" → Translation → Lemma "Yo" + "1ps+Acc" → Generation → Surface "Mi"

• Results:

Data Set             Baseline   Lemma
EuroMini + 950k LM   25.10      25.71


Scaling Up to Large Training
POS Language Models

• Full Train → Less/No Gain from richer features

[Figure: POS-LM vs. Baseline. x-axis: POS N-gram Order (3g to 9g); y-axis: BLEU Score (28 to 31; note the scale). Series: Baseline, POS-LM, Full Tags.]


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech
– Factored Word Alignment for Limited Data
– Rich Morphology and Tagged LMs
– Putting it Together: Parallel Translation

• Generalizing Lexical Distortion Models

• Models for Sparse Statistics in Chinese

• Analysis and Conclusions

• Follow-on Research


Factors for Coping with Limited Data
Better Word Alignment for Czech

• Word alignment is difficult when data is limited and morphology is rich
– Data: 20k bitext sentences, large vocabulary
– Contrast Set: 20k + 840k (out of domain) sentences
– Task: English → Czech

• Two methods to deal with limited data
– Stem Alignment
– Lemma Alignment

• Contrastive behavior for small and large data

Data Set    Word-Word   Stem-Lemma   Stem-Stem
20k Czech   25.17       25.23        25.82

Large Contrast: 24.99, 25.40


Czeching Rich Morphology with Tags
Tagged Czech Language Models

• Idea: Use morphologically rich POS tag sequences to "czech" target output generation

Example: Surface "cat" → Generation → "kočky" with tag N+acc; apply LM over the tags

• POS Information Configurations (Baseline: 25.82)

Configuration   Features                                                                   Size        Result
Full Tags       Feature 1, Feature 2, … (15 total)                                         1098 tags   27.04
CNG Tags        Case, Number+Gender on V, P, PP, N, A                                      707 tags    27.45
CNG+VP          CNG features; Person+Tense+Aspect (verbs); Lemma+Case (prepositions)       899 tags    27.62


Comparing with Larger Data Models
Tagged Czech Language Models

• Large vs. Small Data

Model      Data Set                          BLEU    Relative Improvement
Baseline   20k Czech                         25.82   –
Baseline   Large Contrast (20k + 840k OOD)   27.47   –
CNG+VP     20k Czech                         27.62   6.97%
CNG+VP     Large Contrast (20k + 840k OOD)   28.12   2.37%

• Tagged Language Models improve performance for small data significantly
– approaches large data performance
• Large Task also improves (but much less: 2.37% vs. 6.97%)
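The relative improvements follow directly from the BLEU scores on this slide. The short check below recomputes them; `relative_improvement` is just an illustrative helper name.

```python
def relative_improvement(baseline, system):
    """Relative BLEU gain of system over baseline, in percent."""
    return 100.0 * (system - baseline) / baseline


small = relative_improvement(25.82, 27.62)  # 20k Czech, baseline vs. CNG+VP
large = relative_improvement(27.47, 28.12)  # Large Contrast, baseline vs. CNG+VP
print(f"{small:.2f}% {large:.2f}%")  # prints "6.97% 2.37%"
```

The absolute gains are 1.80 BLEU on the small set and 0.65 BLEU on the large contrast set, so the tagged LM recovers most of the gap that the additional 840k out-of-domain sentences buy the baseline.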


Parallel Translation Models for Czech

• Motivation: Factored LM models seem to lose number information

Example: Surface "him" → POS Tag + CNG Features "3p+acc" (on Lemma) → Surface "ho"

Model                                   Result
Surface   Surface + POS   POS+CNG       25.94
Surface   Lemma + POS     POS+CNG       26.43

• Better than baseline, but worse than both CNG & CNG+VP


Outline

• Motivations

• Experimental Design and Baselines

• Models for Agreement in Spanish

• Coping with Rich Morphological Constraints in Czech

• Generalizing Lexical Distortion Models (Christine)
– Lexical Distortion Models
– Factor-based Distortion
– Results

• Models for Sparse Statistics in Chinese

• Analysis and Conclusions

• Follow-on Research


Generalized Distortion Modeling
Introduction to Distortion

• For each phrase pair we learn its likely placement relative to the previous phrase

• Orientations
– Monotone: word alignment point on top left
– Swap: word alignment point on top right
– Discontinuous: not monotone or swap

• Examples
– la casa roja → the red house
– D NN ADJ → D ADJ NN

[Figure: source/target alignment grid illustrating the Monotone, Swap and Discontinuous orientations]
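The three orientations can be read off a word alignment with a couple of set lookups. The sketch below assumes 0-based word indices, inclusive phrase spans, and a virtual alignment point before the sentence start; the function name and the word alignment for "la casa roja" are assumptions made for illustration.

```python
def orientation(alignment, src_span, tgt_span):
    """Classify a phrase pair's orientation relative to the previous target word.

    alignment: set of (source_index, target_index) word alignment points
    src_span, tgt_span: inclusive (start, end) word indices of the phrase pair
    """
    points = set(alignment) | {(-1, -1)}  # virtual point for the sentence start
    (f_start, f_end), (e_start, _) = src_span, tgt_span
    if (f_start - 1, e_start - 1) in points:
        return "monotone"       # alignment point on the top left
    if (f_end + 1, e_start - 1) in points:
        return "swap"           # alignment point on the top right
    return "discontinuous"      # neither monotone nor swap


# la casa roja -> the red house (assumed alignment: la-the, casa-house, roja-red)
alignment = {(0, 0), (1, 2), (2, 1)}

print(orientation(alignment, (0, 0), (0, 0)))  # "la" / "the"
print(orientation(alignment, (1, 1), (2, 2)))  # "casa" / "house"
print(orientation(alignment, (1, 2), (1, 2)))  # "casa roja" / "red house"
```

Note how the longer phrase pair "casa roja" / "red house" absorbs the noun-adjective swap and is itself monotone, which is one reason short phrases learned from small data make reordering harder.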


Factor-based Distortion Models

• A factor-based extension of lexicalized distortion
– Use of more general factors
   e.g. POSf-POSe, Lemma-Lemma

• Can model longer range dependencies
– More conditioning variables

• Motivating results
– Hard-coding a few factor-based rules (e.g. swap nouns and adjectives when translating from English to Spanish) led to improvements (Gispert et al. 2006)
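Why conditioning on factors helps with sparse data can be sketched by estimating orientation probabilities from the same events keyed two ways: by word pair and by POS pair. This is a toy illustration, not the estimator used in the experiments; the `estimate` function, the smoothing constant and the three example events are all invented.

```python
from collections import Counter, defaultdict

ORIENTATIONS = ("monotone", "swap", "discontinuous")


def estimate(events, smoothing=0.5):
    """Estimate P(orientation | key) from (key, orientation) events
    with additive smoothing."""
    counts = defaultdict(Counter)
    for key, orient in events:
        counts[key][orient] += 1
    model = {}
    for key, c in counts.items():
        total = sum(c.values()) + smoothing * len(ORIENTATIONS)
        model[key] = {o: (c[o] + smoothing) / total for o in ORIENTATIONS}
    return model


# Toy events (invented): ((source factor, target factor), orientation).
lexical = [
    (("casa", "house"), "swap"),
    (("mesa", "table"), "swap"),
    (("libro", "book"), "swap"),
]
# The same events keyed by POS pair instead of word pair.
factored = [(("NN", "NN"), orient) for _, orient in lexical]

p_lex = estimate(lexical)[("casa", "house")]["swap"]
p_pos = estimate(factored)[("NN", "NN")]["swap"]
print(p_lex, p_pos)
```

Each word pair is seen once, so its estimate stays close to the smoothing prior, while the POS-keyed model pools all three observations and is more confident about the swap.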


Factor-based Distortion
Spanish Experiments

• Lexicalized Distortion only

Europarl Lang   Pharaoh   Moses
En → De         18.15     18.85
Es → En         31.06     31.85
En → Es         31.46     32.37

• Factor-based Distortion on small data

System (Data Set / Result)
Baseline (No Lexical)
Baseline Lexical
Factored: POS-POS
Combined: Lexical + POS-POS

• Further Experiments
– Other factors
– Minimizing model parameters
– Combining different models



IWSLT Chinese
Experiments with Unsupervised Annotation

• Data: Travel-domain sentences, limited vocabulary, short sentences
• Task: Text and ASR translation, Chinese → English

• Can we use automatic word classes to learn general sequence constraints?

• First Experiment: LMs of varying orders over 2-gram (bigram) word classes

Model: from the surface word, generate the latent factor class

Source Phrase   总共 多少 钱 ?
Target Phrase   How much is it?
class           c1 c22 c3 c55


IWSLT Chinese
Alignment Templates for Translation

• Second Experiment: Extend class-based LM to the translation model

• Bigram word classes for source and target

• Translate alignment templates similar to [Och 98] + surface

• Apply LM to surface and class

[Figure: parallel translation of Word Class and Surface factors followed by Generation, with example words Me / I and Yo / Mi]


IWSLT Chinese
Autoclass Results

[Figure: BLEU Score (18 to 22.5; note the scale) vs. Class N-gram Order (3g to 9g). Series: Baseline, Class-LM, ClassTrans+LM.]

• Class-LM significantly better (p=0.05, ~1.0 BLEU)
• Class-Trans may be limited by synchronous PT constraint
– Start to address here, but not in time for eval



Conclusions and Future Work

• Factored approach can help with small data
– Large data tasks may need different factored approaches

• MIT/LL + CSAIL
– Continue experiments with morphology and coherence
– Fully asynchronous factor translation
– Apply techniques to other languages
   Extend existing LCTL experiments
– Syntax-driven reordering models (Brooke)

• Asynchronous Factors Translation (Hieu)

• Making use of verb sub-categorization information (Ondrej)

