
A Syntax-Driven Bracketing Model for Phrase-Based Translation

Deyi Xiong, et al.

ACL 2009

Introduction

• Machine Translation
  – Chinese to English
  – Chinese: 把 7 月 11 日 設立 為 航海 節
  – An ideal case: to establish July 11 as Sailing Festival day

Wrong Linguistic Structure

• 航海 節 is a syntactic constituent, but the translation below splits it into "navigation" and "knots"

把 7 月 11 日 設立 為 航海 節

to set up for navigation on July 11 knots

A Naive Solution

• Employ syntactic constraints
  – Fully respect linguistic structures

A Naive Solution (2)

• Unfortunately, it damages performance
  – Non-syntactic translations are sometimes useful

把 今天 設立 為 航海 節

establish today as Sailing Festival day

Syntax-Driven Bracketing Model

• SDB model

• What matters most is whether a phrase is a translation unit
  – Regardless of whether it is syntactic or non-syntactic

• Includes, but is not limited to, constituent matching/violation

• Preserves the strength of the phrase-based system

Translation Unit

• A bracketable source phrase and its corresponding translation

• Bracketable
  – A source phrase is bracketable if
    • Its translation is contiguous
  – A pair of neighboring phrases is bracketable if
    • Their translations are contiguous after being combined
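
The contiguity condition can be sketched in code. This is a minimal sketch, not the authors' implementation: it assumes word alignments are given as (source index, target index) pairs and reads "contiguous" as the usual phrase-extraction consistency check, where unaligned target words inside the block are tolerated. The function name `is_bracketable` and the example alignment (a plausible reading of the 把 今天 設立 為 / establish today as example from these slides, with 把 left unaligned) are assumptions.

```python
from typing import Set, Tuple

# Word alignment as a set of (source_index, target_index) links (assumed format).
Alignment = Set[Tuple[int, int]]


def is_bracketable(alignment: Alignment, start: int, end: int) -> bool:
    """Return True if source span [start, end] (inclusive) has a contiguous translation."""
    targets = {t for s, t in alignment if start <= s <= end}
    if not targets:
        # Assumption: a span with no aligned word is treated as not bracketable.
        return False
    lo, hi = min(targets), max(targets)
    # The target block [lo, hi] must not contain a word aligned to a source
    # word outside the span; otherwise the translation is interrupted.
    return all(start <= s <= end for s, t in alignment if lo <= t <= hi)


# Hypothetical alignment: 把(0) 今天(1) 設立(2) 為(3) -> establish(0) today(1) as(2);
# 把 is assumed to be unaligned.
example = {(1, 1), (2, 0), (3, 2)}

print(is_bracketable(example, 0, 2))  # 把 今天 設立
print(is_bracketable(example, 2, 3))  # 設立 為
print(is_bracketable(example, 0, 3))  # 把 今天 設立 為
```

Under this assumed alignment the three calls print True, False and True: 設立 為 fails because "today", which lies between "establish" and "as", is aligned to a word outside the span.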

Translation Unit Examples

• Bracketable

把 今天 設立 為

establish today as

• 把 今天 設立 and 為 are bracketable

• 把 今天 設立 為 is bracketable

Translation Unit Examples

• Unbracketable

把 今天 設立 為

establish today as

• 設立 and 為 are unbracketable

• 設立 為 is unbracketable

Bracketing Instances Extraction

• Extract bracketable and unbracketable instances from training data
  – Aligned sentence pair + parsed source sentence

• Estimate whether a source phrase is bracketable at run time
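
Since the experiments use a maximum entropy trainer, the run-time estimate can be written in the standard maximum entropy form. The notation here is an assumption made for illustration: $b$ is the bracketing label, $s$ the source phrase, $T$ the source parse tree, $h_i$ the feature functions and $\theta_i$ their trained weights.

```latex
P(b \mid s, T) =
  \frac{\exp\left(\sum_i \theta_i \, h_i(b, s, T)\right)}
       {\sum_{b'} \exp\left(\sum_i \theta_i \, h_i(b', s, T)\right)},
\qquad b \in \{\text{bracketable}, \text{unbracketable}\}
```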


SDB Features

Rule Features

• Rule Features (RF)
  – CFG rule
  – Horizontal context

Rule Features (2)

S1: ADVP → AD
S2: VP → VV AS NP
S: VP → ADVP VP
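
A rule feature of this kind can be read off a parse tree by taking a node's label and the labels of its children. The sketch below assumes a tree encoded as nested tuples; the encoding, the helper names `node_label` and `cfg_rule`, the placeholder words `w1`..`w4` and the internal structure of the NP are all hypothetical. It reproduces the three rules of this slide (ADVP expanding to AD, VP to VV AS NP, and VP to ADVP VP).

```python
from typing import Tuple, Union

# A parse tree node is either a word (str) or a tuple (label, child, child, ...).
Tree = Union[str, Tuple]


def node_label(node: Tree) -> str:
    """Label of a node; a leaf word is its own label."""
    return node if isinstance(node, str) else node[0]


def cfg_rule(node: Tuple) -> str:
    """Render the CFG rule that expands `node`, e.g. 'VP -> VV AS NP'."""
    lhs, children = node[0], node[1:]
    return f"{lhs} -> {' '.join(node_label(c) for c in children)}"


# Placeholder words stand in for the actual Chinese tokens.
s1 = ("ADVP", ("AD", "w1"))
s2 = ("VP", ("VV", "w2"), ("AS", "w3"), ("NP", ("NN", "w4")))
s = ("VP", s1, s2)  # S is the phrase formed by the neighbors S1 and S2

rule_features = {"S1": cfg_rule(s1), "S2": cfg_rule(s2), "S": cfg_rule(s)}
print(rule_features)
```

The rule string captures the horizontal context: it records which sibling categories sit next to each other under the same parent.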

Path Features

• Path features (PF)
  – Path to roots
    • S1 to the root of S
    • S2 to the root of S
    • S to the root of the whole tree
  – Vertical context

Path Features (2)

S1: ADVP VP
S2: VP VP
S: VP IP
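
The three paths on this slide can be computed by walking from a node up to a chosen ancestor and collecting labels. This is a sketch under assumptions: the nested-tuple tree encoding, the function names, the placeholder words and the IP → NP VP top of the tree are hypothetical and only chosen so that the output matches the label sequences shown here.

```python
from typing import List, Optional, Tuple


def path_down(root: Tuple, target: Tuple) -> Optional[List[str]]:
    """Labels on the path from `root` down to `target` (matched by identity)."""
    if root is target:
        return [root[0]]
    for child in root[1:]:
        if isinstance(child, tuple):
            below = path_down(child, target)
            if below is not None:
                return [root[0]] + below
    return None


def path_feature(node: Tuple, ancestor: Tuple) -> str:
    """Labels from `node` up to `ancestor`, joined by spaces."""
    down = path_down(ancestor, node)
    if down is None:
        raise ValueError("node is not inside ancestor")
    return " ".join(reversed(down))


# Hypothetical tree; w0..w4 are placeholder words.
s1 = ("ADVP", ("AD", "w1"))
s2 = ("VP", ("VV", "w2"), ("AS", "w3"), ("NP", ("NN", "w4")))
s = ("VP", s1, s2)
tree = ("IP", ("NP", ("NN", "w0")), s)

path_features = {
    "S1": path_feature(s1, s),   # S1 to the root of S
    "S2": path_feature(s2, s),   # S2 to the root of S
    "S": path_feature(s, tree),  # S to the root of the whole tree
}
print(path_features)
```

Where the rule features look sideways at siblings, these paths look upward, which is the vertical context the slides refer to.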

Constituent Boundary Matching Features

• Constituent Boundary Matching Features (CBMF)
  – Exact match
    • Source phrase exactly covers the boundaries of its tree
  – Inside match
    • Source phrase covers a sequence of subtrees of its tree
  – Crossing match
    • Source phrase crosses a subtree boundary of its tree

Constituent Boundary Matching Features (3)

[Figure: examples of exact match, inside match and crossing match]
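
The three match types can be told apart from constituent spans alone. The sketch below is one reading of the definitions, not necessarily the authors' exact procedure: a phrase is an exact match if its span equals some constituent span, a crossing match if it partially overlaps some constituent, and an inside match otherwise (it then covers a sequence of complete sibling subtrees). The tree encoding, function names and placeholder words are hypothetical.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple]
Span = Tuple[int, int]  # (first word index, last word index), inclusive


def collect_spans(node: Tree, start: int, spans: Set[Span]) -> int:
    """Record the span of every subtree; return the index after the last word."""
    if isinstance(node, str):
        return start + 1
    pos = start
    for child in node[1:]:
        pos = collect_spans(child, pos, spans)
    spans.add((start, pos - 1))
    return pos


def boundary_match(spans: Set[Span], i: int, j: int) -> str:
    """Classify source phrase [i, j] as 'exact', 'inside' or 'crossing'."""
    if (i, j) in spans:
        return "exact"
    for a, b in spans:
        # Partial overlap: the constituent starts outside and ends inside
        # the phrase, or starts inside and ends outside.
        if a < i <= b < j or i < a <= j < b:
            return "crossing"
    return "inside"


# Hypothetical tree over placeholder words w0..w4.
tree = ("IP",
        ("NP", ("NN", "w0")),
        ("VP",
         ("ADVP", ("AD", "w1")),
         ("VP", ("VV", "w2"), ("AS", "w3"), ("NP", ("NN", "w4")))))

spans: Set[Span] = set()
collect_spans(tree, 0, spans)

print(boundary_match(spans, 1, 4))  # the outer VP
print(boundary_match(spans, 2, 3))  # VV AS, two complete siblings
print(boundary_match(spans, 0, 1))  # NP plus ADVP, cuts into the outer VP
```

On this toy tree the three calls print exact, inside and crossing respectively.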

Integration into Phrase-based MT

• The SDB model estimates the probability that a source phrase is bracketable
  – Whether it can be translated as a unit

• Integrated into a BTG MT system
  – Bracketing Transduction Grammar (Wu, 1997)

Straight: 把 今天 設立 為 → establish today as

Inverted: 把 今天 設立 為 → as establish today
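
The straight and inverted orders can be illustrated with a small merge function. This is a simplified sketch of the two BTG combination orders only, with no scoring or search; the function name `btg_merge` and the split of the example into 把 今天 設立 and 為 are assumptions based on the earlier translation unit slides.

```python
from typing import List


def btg_merge(left: List[str], right: List[str], order: str) -> List[str]:
    """Combine the translations of two neighboring source phrases."""
    if order == "straight":
        return left + right   # target order follows source order
    if order == "inverted":
        return right + left   # target order is swapped
    raise ValueError(f"unknown order: {order}")


left = ["establish", "today"]  # assumed translation of 把 今天 設立
right = ["as"]                 # assumed translation of 為

straight = " ".join(btg_merge(left, right, "straight"))
inverted = " ".join(btg_merge(left, right, "inverted"))
print(straight)
print(inverted)
```

The two printed strings are "establish today as" and "as establish today", matching the straight and inverted outputs shown on this slide.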

Experiment

• Models compared
  – Baseline: BTG system
  – XP+ (Marton and Resnik, 2008)
    • NP, VP, PP, ADVP, ...
    • Penalizes each violation of the syntactic boundaries (soft constraint)
  – UniSDB
    • Only S features
  – BiSDB
    • S1, S2 and S features

Experiment (2)

• Chinese parser
  – Lexicalized PCFG parser (Xiong et al., 2005)

• Parallel corpus
  – FBIS corpus

• Word alignment
  – GIZA++

• Four-gram language model
  – Built with SRILM
  – Xinhua section of the English Gigaword corpus

• Maximum Entropy (ME) trainer
  – Zhang, 2004

Result

• SDB receives the largest feature weight
  – This implies its impact on the decoder

[Feature weights shown on the slide: baseline features (common to phrase-based systems), followed by the XP+ and SDB features]
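
Why a large feature weight translates into impact on the decoder can be seen in a toy log-linear score. The sketch assumes the SDB probability enters the decoder as one more log-domain feature next to the baseline features, which is what having a "feature weight" suggests. All feature names, values and weights below are hypothetical and are not the weights reported in the experiments.

```python
import math
from typing import Dict


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (log-probabilities)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights: the SDB feature is given the largest one.
weights = {"lm": 0.3, "tm": 0.2, "sdb": 0.5}


def hypothesis_score(p_bracketable: float) -> float:
    """Score a hypothesis whose source phrase has the given SDB probability."""
    features = {"lm": -4.0, "tm": -2.0, "sdb": math.log(p_bracketable)}
    return loglinear_score(features, weights)


likely_unit = hypothesis_score(0.9)
unlikely_unit = hypothesis_score(0.1)
print(likely_unit > unlikely_unit)
```

With identical language model and translation model values, the hypothesis whose source phrase is judged more likely to be bracketable scores higher, and the gap grows with the weight on the SDB feature.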

Result (2)

• NIST MT-05 test set
  – Improvement of 1.67 BLEU over the baseline
  – Improvement of 0.59 BLEU over XP+

Result (3)

• On top of CBMF, adding rule and path features achieves further improvement

• BiSDB is consistently better than UniSDB
  – Inner contexts (S1 and S2) are useful

XP+ and SDB

• Same
  – Both consider syntactic constituents

• Different
  – XP+ only penalizes non-syntactic source phrases
  – SDB can encourage a non-syntactic source phrase if the phrase is bracketable


Conclusion

• The SDB model predicts whether a source phrase can be translated as a unit

• Appropriate constituent violations are helpful
  – Because the model then better inherits the strength of the phrase-based approach

