Enriched translation model using morphology in MT Luong Minh Thang WING group meeting – 07 July,...

Enriched translation model using morphology in MT

Luong Minh Thang
WING group meeting – 07 July, 2009

04/11/23 1

Overview

• Brief recap on SMT & morphological analysis
• Motivation
• Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
• Experiments
• Conclusion


SMT overview – alignment

• Parallel data


These are , first and foremost , messages of concern at the economic and social problems that we are experiencing , in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens .

Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista , vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista , kaikkien kansalaistemme taholta .

• Alignment: one-to-many (1-M)

Target: Maria no daba una bofetada a la bruja verde

Source: NULL Mary did not slap the green witch

SMT overview – translation model

• Combine the 1-M and M-1 alignments (e.g. intersection plus growing heuristics) → M-M alignment
• Extract phrases from the M-M alignment → translation model (phrase table)
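A simplified sketch of phrase extraction from an M-M alignment, assuming the usual consistency criterion: no word inside a phrase pair may be aligned to a word outside it. The function name `extract_phrases`, the `max_len` limit and the toy alignment are illustrative assumptions, not taken from the slides, and unlike full extractors the sketch does not extend target spans over unaligned boundary words.

```python
from typing import List, Set, Tuple

# (source index, target index) alignment links
Alignment = Set[Tuple[int, int]]

def extract_phrases(src: List[str], tgt: List[str], align: Alignment,
                    max_len: int = 3) -> List[Tuple[str, str]]:
    """Return phrase pairs consistent with the alignment (simplified sketch)."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            t_pos = [t for (s, t) in align if s_start <= s <= s_end]
            if not t_pos:
                continue
            t_start, t_end = min(t_pos), max(t_pos)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span may link outside the source span.
            if all(s_start <= s <= s_end for (s, t) in align if t_start <= t <= t_end):
                pairs.append((" ".join(src[s_start:s_end + 1]),
                              " ".join(tgt[t_start:t_end + 1])))
    return pairs

# Hypothetical toy alignment: "did" is unaligned, "slap" links to three words.
src = ["Mary", "did", "not", "slap"]
tgt = ["Maria", "no", "daba", "una", "bofetada"]
align = {(0, 0), (2, 1), (3, 2), (3, 3), (3, 4)}
pairs = extract_phrases(src, tgt, align)
```

Here "slap" yields the pair ("slap", "daba una bofetada"), while "did not slap" is skipped only because its target span exceeds `max_len`.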


problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718
problems ||| ongelmasta ||| 0.352941 0.423077 0.000836237 0.0012435 2.718
…
problems ||| vaikeuksista ||| 0.0696946 0.105991 0.0124042 0.0130002 2.718
problems ||| vaikeuksien ||| 0.0410959 0.062069 0.000836237 0.0010174 2.718

Fields: English phrase e ||| foreign phrase f ||| translation and lexical probabilities (four scores), followed by the phrase penalty (2.718)
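Each phrase-table line can be split on the `|||` separator. A minimal sketch, assuming a Moses-style layout of four scores followed by the phrase penalty; reading positions 1 and 3 as translation probabilities and 2 and 4 as lexical probabilities follows the usual Moses ordering and is an assumption here, as is the helper name `parse_entry`.

```python
from typing import List, NamedTuple

class PhraseEntry(NamedTuple):
    english: str          # English phrase e
    foreign: str          # foreign phrase f
    scores: List[float]   # four probabilities + phrase penalty

def parse_entry(line: str) -> PhraseEntry:
    """Split 'e ||| f ||| scores' into its three fields."""
    english, foreign, scores = (field.strip() for field in line.split("|||")[:3])
    return PhraseEntry(english, foreign, [float(x) for x in scores.split()])

entry = parse_entry("problems ||| ongelmat ||| 0.372611 0.597858 0.114146 0.13882 2.718")
translation_probs = (entry.scores[0], entry.scores[2])  # assumed Moses ordering
lexical_probs = (entry.scores[1], entry.scores[3])      # assumed Moses ordering
phrase_penalty = entry.scores[4]                         # constant, exp(1) ≈ 2.718
```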

Recap – morphological analysis

• Morpheme: minimal meaning-bearing unit
– English: machine + s, present + ed, etc.
– Finnish: oppositio + kansa + n + edusta + ja (= opposition member of parliament)

• Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner, e.g. un/PRE + fortunate/STM + ly/SUF
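The `morpheme/TAG` notation above (tags PRE, STM, SUF) can be read back programmatically. The sketch below only parses that notation and rejoins the surface word; it is not Morfessor's own API, and `parse_analysis` and `surface_form` are hypothetical helper names.

```python
from typing import List, Tuple

def parse_analysis(analysis: str) -> List[Tuple[str, str]]:
    """Split 'un/PRE + fortunate/STM + ly/SUF' into (morpheme, tag) pairs."""
    pairs = []
    for token in analysis.split(" + "):
        morpheme, tag = token.strip().rsplit("/", 1)
        pairs.append((morpheme, tag))
    return pairs

def surface_form(pairs: List[Tuple[str, str]]) -> str:
    """Concatenate the morphemes back into the original word."""
    return "".join(morpheme for morpheme, _ in pairs)

analysis = parse_analysis("un/PRE + fortunate/STM + ly/SUF")
word = surface_form(analysis)
```

`analysis` holds [("un", "PRE"), ("fortunate", "STM"), ("ly", "SUF")] and `word` is "unfortunately".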


Motivation
• Problem:
– Morphologically complex languages have many word forms, e.g. ongelmat, ongelmasta, etc.
– Many word forms are rare and hard to align → incorrect entries in the normal (word-aligned) phrase table.
• Solution:
– Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
– Combine the word- and morpheme-aligned PTs in a proper way to produce an even better translation model.


Twin phrase-table (PT) construction

• Word level: GIZA++ word alignment → phrase extraction → PTw
• Morpheme level: morphological segmentation → GIZA++ morpheme alignment → phrase extraction → PTm → PTwm
• PT merging (PTw + PTwm) → decoding

problem/STM+ s/SUF ||| ongelma/STM+ t/SUF
problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
problems ||| vaikeuksista
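Assuming PTwm is obtained by rewriting PTm entries at the word level, that step amounts to gluing morphemes back into words. A minimal sketch, assuming that a trailing `+` on a token means the word continues with the next morpheme; the helper names are hypothetical, only phrases that end mid-word are rejected (phrases that start mid-word are not detected), and the scores attached to each entry are not handled.

```python
from typing import Optional

def morphemes_to_words(side: str) -> Optional[str]:
    """'vaikeu/STM+ ksi/SUF+ sta/SUF' -> 'vaikeuksista'."""
    words, current = [], ""
    for token in side.split():
        continues = token.endswith("+")                  # word goes on with next token
        morpheme = token.rstrip("+").rsplit("/", 1)[0]   # drop the /TAG part
        current += morpheme
        if not continues:
            words.append(current)
            current = ""
    if current:  # phrase stops in the middle of a word
        return None
    return " ".join(words)

def to_word_entry(entry: str) -> Optional[str]:
    """Convert one morpheme-level 'src ||| tgt' pair to word level."""
    src, tgt = (side.strip() for side in entry.split("|||")[:2])
    src_words, tgt_words = morphemes_to_words(src), morphemes_to_words(tgt)
    if src_words is None or tgt_words is None:
        return None
    return f"{src_words} ||| {tgt_words}"

converted = to_word_entry("problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF")
```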

Existing PT-merging methods

• Add-feature (Nakov, 2008; Chen et al., 2009): heuristic-driven
F1 = 1 if from 1st PT, 0.5 otherwise
F2 = 1 if from 2nd PT, 0.5 otherwise
F3 = 1 if from both PTs, 0.5 otherwise

• Interpolation (Wu & Wang, 2007): does not consider what each score “means”
– tran(f|e) = α * tran1(f|e) + (1 - α) * tran2(f|e)
– lex(f|e) = β * lex1(f|e) + (1 - β) * lex2(f|e)
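The three add-feature indicators can be computed directly from table membership. A minimal sketch: the slide does not say how the remaining scores of a pair found in both tables are chosen, so only the extra features are computed; `add_features` is a hypothetical name, and the tables are modeled as plain sets of (English, foreign) pairs that reuse this deck's example phrases.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (English phrase, foreign phrase)

def add_features(pt1: Set[Pair], pt2: Set[Pair]) -> Dict[Pair, Tuple[float, float, float]]:
    """Return (F1, F2, F3) for every pair in the union of both tables."""
    features = {}
    for pair in pt1 | pt2:
        in1, in2 = pair in pt1, pair in pt2
        f1 = 1.0 if in1 else 0.5            # from 1st PT
        f2 = 1.0 if in2 else 0.5            # from 2nd PT
        f3 = 1.0 if in1 and in2 else 0.5    # from both PTs
        features[pair] = (f1, f2, f3)
    return features

pt1 = {("problems", "vaikeuksista"), ("problems", "ongelmasta")}
pt2 = {("problems", "vaikeuksista"), ("problems", "ongelmat")}
feats = add_features(pt1, pt2)
```

The features only record where a pair came from, which is why the slide calls the method heuristic-driven.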

Our merging method – normalizing translation probabilities

tran1(e|f) = count1(e, f) / Σ_e' count1(e', f)
tran2(e|f) = count2(e, f) / Σ_e' count2(e', f)


problem + s ||| vaikeu + ksi + sta
problem + s ||| ongelma + sta

problem + s ||| ongelma + t
problem + s ||| ongelma + t
problem + s ||| ongelma + t
problem + s ||| vaikeu + ksi + sta

PTwm PTm

MLE
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5

tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25

Interpolation (ratio = 0.5)
tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25

Undesired translation! vaikeuksista ties with ongelmat, although ongelmat has the most supporting counts overall (3 of 6).
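The interpolation arithmetic can be reproduced from the example counts (two entries for "problems" in the first table, four in the second). A minimal sketch with per-table MLE followed by linear interpolation; `mle` and `interpolate` are illustrative names, and a pair missing from one table simply gets probability 0 there.

```python
from collections import Counter
from typing import Dict

def mle(counts: Counter) -> Dict[str, float]:
    """tran(e | f) = count(e, f) / sum over e' of count(e', f), for one fixed f."""
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

def interpolate(p1: Dict[str, float], p2: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """alpha * p1 + (1 - alpha) * p2; a missing entry counts as 0."""
    return {e: alpha * p1.get(e, 0.0) + (1 - alpha) * p2.get(e, 0.0)
            for e in set(p1) | set(p2)}

# Translations of "problems" in the two tables (counts from the example entries)
counts1 = Counter({"vaikeuksista": 1, "ongelmasta": 1})
counts2 = Counter({"ongelmat": 3, "vaikeuksista": 1})
merged = interpolate(mle(counts1), mle(counts2))
```

`merged` assigns 0.375 to both vaikeuksista and ongelmat and 0.25 to ongelmasta: the small first table weighs as much as the larger second one.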




Normalization

tran(e|f) = [count1(e, f) + count2(e, f)] / [Σ_e' count1(e', f) + Σ_e' count2(e', f)]




Normalization
tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17

Desired translation! (ongelmat now ranks first)

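Pooling counts before normalizing, as in the normalization formula, can be sketched the same way. This assumes both tables keep raw extraction counts, not only probabilities; `merge_by_counts` is a hypothetical name, and the toy counts are those of the example.

```python
from collections import Counter
from typing import Dict

def merge_by_counts(t1: Dict[str, Counter],
                    t2: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """tran(e|f) = [count1(e,f) + count2(e,f)] / [sum_e' count1(e',f) + sum_e' count2(e',f)]."""
    merged = {}
    for f in set(t1) | set(t2):
        # Add the per-translation counts of both tables (all counts are positive).
        pooled = t1.get(f, Counter()) + t2.get(f, Counter())
        total = sum(pooled.values())
        merged[f] = {e: c / total for e, c in pooled.items()}
    return merged

t1 = {"problems": Counter({"vaikeuksista": 1, "ongelmasta": 1})}
t2 = {"problems": Counter({"ongelmat": 3, "vaikeuksista": 1})}
merged = merge_by_counts(t1, t2)
best = max(merged["problems"], key=merged["problems"].get)
```

Because each table contributes in proportion to its counts, the table with more evidence for "problems" dominates and `best` is "ongelmat".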




Our merging method – full lexical probability interpolation

lex(vaikeuksista | problems) = w1
lex(ongelmasta | problems) = w2
lex(vaikeu + ksi + sta | problem + s) = m1
lex(ongelma + t | problem + s) = m3

Normal interpolation (ratio = 0.5)
lex(vaikeuksista | problems) = (w1 + m1)/2
lex(ongelmasta | problems) = (w2 + 0)/2
lex(ongelmat | problems) = (0 + m3)/2

Missing interpolated probabilities! A pair that appears in only one table gets 0 from the other table, so its lexical score is halved.

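PTw lexical model: P(vaikeuksista | problems), P(ongelmasta | problems)
PTm lexical model: P(vaikeu | problem), P(ongelma | problem), P(t | s), P(ksi | s), P(sta | s)

• Estimate lex(ongelma + sta | problem + s) using the PTm lexical model → m2
• Estimate lex(ongelmat | problems) using the PTw lexical model → w3

Full interpolation (ratio = 0.5)
lex(vaikeuksista | problems) = (w1 + m1)/2
lex(ongelmasta | problems) = (w2 + m2)/2
lex(ongelmat | problems) = (w3 + m3)/2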
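The slides do not show how lex(·) itself is computed from a lexical model. One common definition, assumed here, is the lexical weighting of phrase-based SMT: for each word of e, average its word-translation probabilities over the aligned words of f, then multiply over the words of e. The sketch also assumes a 1-1 morpheme alignment for the example pair and uses made-up probabilities (not values from the experiments); `lexical_weight` and `full_interpolation` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

def lexical_weight(e: List[str], f: List[str], align: Set[Tuple[int, int]],
                   w: Dict[Tuple[str, str], float]) -> float:
    """lex(e | f): product over e_i of the mean w(e_i | f_j) over aligned f_j.

    align holds (i, j) links from e_i to f_j; an unaligned e_i uses w(e_i | NULL).
    """
    score = 1.0
    for i, e_word in enumerate(e):
        links = [j for (ii, j) in align if ii == i]
        if links:
            score *= sum(w.get((e_word, f[j]), 0.0) for j in links) / len(links)
        else:
            score *= w.get((e_word, "NULL"), 0.0)
    return score

def full_interpolation(lex_w: float, lex_m: float, ratio: float = 0.5) -> float:
    """Interpolate word- and morpheme-level scores, both estimated, never 0 by absence."""
    return ratio * lex_w + (1 - ratio) * lex_m

# Hypothetical PTm lexical model entries w(morpheme_e | morpheme_f)
w_morph = {("ongelma", "problem"): 0.5, ("sta", "s"): 0.25}
m2 = lexical_weight(["ongelma", "sta"], ["problem", "s"], {(0, 0), (1, 1)}, w_morph)

w2 = 0.25  # hypothetical lex(ongelmasta | problems) from PTw
lex_ongelmasta = full_interpolation(w2, m2)
```

With these toy values m2 is 0.5 × 0.25 = 0.125 and the interpolated score is (0.25 + 0.125)/2 = 0.1875, instead of the halved 0.125 that normal interpolation would give.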


Experiments – dataset

• 2005 ACL shared task (Koehn & Monz, 2005)


Experiments – baselines

• w-system: uses PTw to translate at the word level

• m-system: uses PTm to translate at the morpheme level

• m-BLEU: BLEU computed with morphemes, rather than words, as token units
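m-BLEU is ordinary BLEU applied after segmenting both hypothesis and reference into morphemes. A minimal sentence-level sketch without smoothing (the standard metric accumulates n-gram counts over the whole corpus); `ngrams` and `bleu` are illustrative names, and the segmentations reuse the example ongelma + t and ongelma + sta.

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions times brevity penalty."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precision += math.log(clipped / total) / max_n
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision)

# Same hypothesis/reference, scored on word tokens vs morpheme tokens (unigrams only)
word_score = bleu(["ongelmat"], ["ongelmasta"], max_n=1)
morph_score = bleu(["ongelma", "t"], ["ongelma", "sta"], max_n=1)
```

At the word level the near-miss scores 0, while morpheme tokens give partial credit for the shared stem (unigram precision 1/2).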

Experiments – our system

• Improvements over the m-system and w-system are statistically significant according to the sign test (Collins et al., 2005)
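The slide does not spell out the test procedure. A generic exact two-sided sign test, assuming each test sentence has already been judged a win, loss or tie for one system against the other (ties discarded), could look like this; the per-sentence comparison used by Collins et al. (2005) is not reproduced, and `sign_test` is a hypothetical name.

```python
from math import comb

def sign_test(wins: int, losses: int) -> float:
    """Exact two-sided sign test p-value under H0: a win and a loss are equally likely."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = max(wins, losses)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_value = sign_test(wins=10, losses=0)
```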


Conclusion

Our contributions:
• Enrich the translation model without using additional data.
• Propose a principled way to merge phrase tables generated at different granularities.

Q & A

• Thank you!

