
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Description:
Abstract of Aaron Han's Presentation: The main topic of this presentation is the evaluation of machine translation. With the rapid development of machine translation (MT), MT evaluation becomes more and more important for telling whether systems are making progress. Traditional human judgments are time-consuming and expensive. On the other hand, the existing automatic MT evaluation metrics have several weaknesses: they perform well on certain language pairs but poorly on others, which we call the language-bias problem; they consider no linguistic information (leading to low correlation with human judgments) or too many linguistic features (making them difficult to replicate), which we call the extremism problem; and they rely on incomprehensive factors (e.g. precision only). To address these problems, he has developed several automatic evaluation metrics: designing tunable parameters for the language-bias problem; using concise linguistic features for the linguistic extremism problem; and designing augmented factors. Experiments on the ACL-WMT corpora show that the proposed metrics yield higher correlation with human judgments. The proposed metrics have been published at top international conferences, e.g. COLING and MT SUMMIT. The evaluation work is closely related to similarity measurement, so it can be further developed for other areas such as information retrieval, question answering, and search. A brief introduction to some of his other research will also be given, covering Chinese named entity recognition, word segmentation, and multilingual treebanks, which have been published in the Springer LNCS and LNAI series. Suggestions and comments are much appreciated, and opportunities for further collaboration would be very welcome.
Transcript
Page 1: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Aaron L.-F. Han

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory

University of Macau, Macau S.A.R., China

2013.08 @ CUHK, Hong Kong

Email: hanlifengaaron AT gmail DOT com

Homepage: http://www.linkedin.com/in/aaronhan

Page 2: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

The importance of machine translation (MT) evaluation

Automatic MT evaluation metrics introduction 1. Lexical similarity

2. Linguistic features

3. Metrics combination

Designed metric: LEPOR Series 1. Motivation

2. LEPOR Metrics Description

3. Performances on international ACL-WMT corpora

4. Publications and Open source tools

Other research interests and publications

Page 3: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Eagerness of people of different nationalities to communicate with each other

– Promotes translation technology

• Rapid development of machine translation

– Machine translation (MT) began as early as the 1950s (Weaver, 1955)

– Big progress since the 1990s due to the development of computers (storage capacity and computational power) and enlarged bilingual corpora (Marino et al., 2006)

Page 4: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Some recent works of MT research:

– Och (2003) presents MERT (Minimum Error Rate Training) for log-linear SMT

– Su et al. (2009) use the Thematic Role Templates model to improve the translation

– Xiong et al. (2011) employ the maximum-entropy model, etc.

– Data-driven methods, including example-based MT (Carl and Way, 2003) and statistical MT (Koehn, 2010), have become the main approaches in the MT literature.

Page 5: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• How well do MT systems perform, and are they making progress?

• Difficulties of MT evaluation

– language variability results in no single correct translation

– natural languages are highly ambiguous, and different languages do not always express the same content in the same way (Arnold, 2003)

Page 6: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Traditional manual evaluation criteria:

– intelligibility (measuring how understandable the sentence is)

– fidelity (measuring how much information the translated sentence retains as compared to the original) by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966)

– adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility) by the Defense Advanced Research Projects Agency (DARPA) of the US (White et al., 1994)

Page 7: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Problems of manual evaluation:

– Time-consuming

– Expensive

– Unrepeatable

– Low agreement (Callison-Burch et al., 2011)

Page 8: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

2.1 Lexical similarity

2.2 Linguistic features

2.3 Metrics combination

Page 9: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Precision-based

Bleu (Papineni et al., 2002 ACL)

• Recall-based

ROUGE (Lin, 2004 WAS)

• Precision and Recall

Meteor (Banerjee and Lavie, 2005 ACL)

Page 10: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Word-order based

NKT_NSR (Isozaki et al., 2010 EMNLP), Port (Chen et al., 2012 ACL), ATEC (Wong et al., 2008 AMTA)

• Word-alignment based

AER (Och and Ney, 2003 J.CL)

• Edit distance-based

WER (Su et al., 1992 COLING), PER (Tillmann et al., 1997 EUROSPEECH), TER (Snover et al., 2006 AMTA)

Page 11: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Language model

LM-SVM (Gamon et al., 2005 EAMT)

• Shallow parsing

GLEU (Mutton et al., 2007 ACL), TerrorCat (Fishel et al., 2012 WMT)

• Semantic roles

Named entity, morphological, synonymy, paraphrasing, discourse representation, etc.

Page 12: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• MTeRater-Plus (Parton et al., 2011 WMT)

– Combine BLEU, TERp (Snover et al., 2009) and Meteor (Banerjee and Lavie, 2005; Lavie and Denkowski, 2009)

• MPF & WMPBleu (Popovic, 2011 WMT)

– Arithmetic mean of F score and BLEU score

• SIA (Liu and Gildea, 2006 ACL)

– Combine the advantages of n-gram-based metrics and loose-sequence-based metrics

Page 13: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• LEPOR: an automatic machine translation evaluation metric combining Length Penalty, Precision, n-gram Position difference Penalty, and Recall.

Page 14: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Weaknesses in existing metrics:

– perform well on certain language pairs but poorly on others, which we call the language-bias problem;

– consider no linguistic information (leading to low correlation with human judgments) or too many linguistic features (making them difficult to replicate), which we call the extremism problem;

– rely on incomprehensive factors (e.g. BLEU focuses on precision only).

– What to do?

Page 15: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• To address some of the existing problems:

– Design tunable parameters to address the language-bias problem;

– Use concise or optimized linguistic features for the linguistic extremism problem;

– Design augmented factors.

Page 16: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Sub-factors:

• LP = exp(1 − r/c) if c < r;  1 if c = r;  exp(1 − c/r) if c > r   (1)

• r: length of the reference sentence

• c: length of the candidate (system-output) sentence
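
A minimal Python sketch of the length penalty in Eq. (1); the function name and the guard for empty candidates are my own additions, not part of the metric definition:

```python
import math

def length_penalty(c_len, r_len):
    """Length penalty LP (Eq. 1): penalizes candidates shorter or longer than the reference."""
    if c_len == 0:   # guard added for this sketch; not part of the original formula
        return 0.0
    if c_len < r_len:
        return math.exp(1 - r_len / c_len)
    if c_len > r_len:
        return math.exp(1 - c_len / r_len)
    return 1.0
```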

Page 17: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• NPosPenal = exp(−NPD)   (2)

• NPD = (1 / Length_output) × Σ_{i=1}^{Length_output} |PD_i|   (3)

• PD_i = |MatchN_output − MatchN_ref|   (4)

• MatchN_output: position of the matched token in the output sentence

• MatchN_ref: position of the matched token in the reference sentence
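
A small sketch of Eqs. (2)-(4), assuming the caller has already aligned the tokens and collected one (MatchN_output, MatchN_ref) position pair per matched token; how unmatched tokens contribute to NPD is not shown here:

```python
import math

def npos_penalty(match_positions, length_output):
    """Eqs. (2)-(4): average absolute position difference of matched tokens,
    turned into a multiplicative penalty via exp(-NPD)."""
    if length_output == 0:
        return 1.0  # no output tokens: treat as no positional penalty (sketch choice)
    npd = sum(abs(out_pos - ref_pos) for out_pos, ref_pos in match_positions) / length_output
    return math.exp(-npd)
```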

Page 18: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Fig. 1. N-gram word alignment algorithm

Page 19: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Fig. 2. Example of n-gram word alignment

Page 20: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Fig. 3. Example of NPD calculation

Page 21: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• N-gram precision and recall:

• P_n = #matched n-grams / #n-gram chunks in the system output   (5)

• R_n = #matched n-grams / #n-gram chunks in the reference   (6)

• HPR = Harmonic(αR_n, βP_n) = (α + β) / (α/R_n + β/P_n)   (7)
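
A sketch of Eqs. (5)-(7); alpha and beta are the tunable weights, and the default values below are purely illustrative:

```python
def harmonic_pr(matched, out_ngrams, ref_ngrams, alpha=1.0, beta=1.0):
    """Eqs. (5)-(7): n-gram precision, recall, and their weighted harmonic mean."""
    precision = matched / out_ngrams if out_ngrams else 0.0
    recall = matched / ref_ngrams if ref_ngrams else 0.0
    if precision == 0.0 or recall == 0.0:
        return 0.0  # score collapses to zero when either factor is zero
    return (alpha + beta) / (alpha / recall + beta / precision)
```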

Page 22: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Fig. 4. Example of bigram matching

Page 23: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• LEPOR Metrics:

• LEPOR = LP × NPosPenal × Harmonic(αR, βP)   (8)

• hLEPOR = Harmonic(w_LP·LP, w_NPosPenal·NPosPenal, w_HPR·HPR) = (Σ_{i=1}^{n} w_i) / (Σ_{i=1}^{n} w_i / Factor_i) = (w_LP + w_NPosPenal + w_HPR) / (w_LP/LP + w_NPosPenal/NPosPenal + w_HPR/HPR)   (9)

• nLEPOR = LP × NPosPenal × exp(Σ_{n=1}^{N} w_n log HPR_n)   (10)
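
To make the three combination schemes concrete, here is a hedged sketch of Eqs. (8)-(10), assuming the sub-factor scores have already been computed; all weights are tunable and the defaults are illustrative only:

```python
import math

def lepor(lp, npos_penal, hpr):
    """Eq. (8): simple product of the sub-factors."""
    return lp * npos_penal * hpr

def hlepor(lp, npos_penal, hpr, w_lp=1.0, w_np=1.0, w_hpr=1.0):
    """Eq. (9): weighted harmonic mean of the sub-factors."""
    return (w_lp + w_np + w_hpr) / (w_lp / lp + w_np / npos_penal + w_hpr / hpr)

def nlepor(lp, npos_penal, hpr_by_order, weights):
    """Eq. (10): geometric combination of HPR over n-gram orders 1..N."""
    return lp * npos_penal * math.exp(
        sum(w * math.log(h) for w, h in zip(weights, hpr_by_order)))
```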

Page 24: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Example, employment of linguistic features:

Fig. 5. Example of n-gram POS alignment

Fig. 6. Example of NPD calculation

Page 25: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Combination with linguistic features:

• hLEPOR_final = (1 / (w_hw + w_hp)) × (w_hw · hLEPOR_word + w_hp · hLEPOR_POS)   (11)

• hLEPOR_POS and hLEPOR_word apply the same algorithm to the POS sequence and the word sequence, respectively.
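
A one-function sketch of Eq. (11); the group weights w_hw and w_hp are tunable, and the equal defaults here are only an illustration:

```python
def hlepor_final(hlepor_word, hlepor_pos, w_hw=1.0, w_hp=1.0):
    """Eq. (11): weighted average of word-level and POS-level hLEPOR scores."""
    return (w_hw * hlepor_word + w_hp * hlepor_pos) / (w_hw + w_hp)
```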

Page 26: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• When there are multiple references:

• Select the alignment that results in the minimum NPD score.

Fig. 7. N-gram alignment with multiple references

Page 27: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• How reliable is the automatic metric?

• Evaluation criteria for evaluation metrics:

– Human judgments are currently the gold standard to approach.

• Correlation with human judgments:

– System-level correlation

– Segment-level correlation

Page 28: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• System-level correlation:

• Spearman rank correlation coefficient:

– ρ_XY = 1 − (6 Σ_{i=1}^{n} d_i²) / (n(n² − 1))   (12)

– X = {x_1, …, x_n}, Y = {y_1, …, y_n}

• Pearson correlation coefficient:

– ρ_XY = Σ_{i=1}^{n} (x_i − μ_x)(y_i − μ_y) / ( √(Σ_{i=1}^{n} (x_i − μ_x)²) · √(Σ_{i=1}^{n} (y_i − μ_y)²) )   (13)
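
The two system-level correlation coefficients of Eqs. (12)-(13) as a minimal Python sketch (no tie handling for the Spearman ranks):

```python
import math

def spearman_rho(rank_x, rank_y):
    """Eq. (12): Spearman rank correlation from two equally long rank lists."""
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Eq. (13): Pearson correlation between metric scores x and human scores y."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mu_x) ** 2 for xi in x)
    var_y = sum((yi - mu_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)
```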

Page 29: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Segment-level Kendall’s tau correlation:

• τ = (num concordant pairs − num discordant pairs) / total pairs   (14)

• The segment unit can be a single sentence or a fragment containing several sentences.
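
A sketch of Eq. (14) that compares a metric's segment scores against human scores pair by pair; ties are simply skipped here, which is one of several possible conventions:

```python
def kendall_tau(metric_scores, human_scores):
    """Eq. (14): (concordant - discordant) / total over all segment pairs."""
    concordant = discordant = 0
    for i in range(len(metric_scores)):
        for j in range(i + 1, len(metric_scores)):
            m_diff = metric_scores[i] - metric_scores[j]
            h_diff = human_scores[i] - human_scores[j]
            if m_diff * h_diff > 0:
                concordant += 1
            elif m_diff * h_diff < 0:
                discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0
```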

Page 30: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Performance on the ACL-WMT 2011 corpora

• Two translation directions:

– English-to-other (Spanish, German, French, Czech)

– Other-to-English

• System-level metrics:

– LEPOR_A = (1 / num_sent) × Σ_{i=1}^{num_sent} LEPOR_i   (15)

– LEPOR_B = LP × NPosPenal × Harmonic(αR, βP)   (16)
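
A short sketch of the two system-level variants: Eq. (15) averages sentence-level scores, while Eq. (16) reuses the Eq. (8) combination; how the factor values for LEPOR_B are pooled over the whole test set is left to the caller here (an assumption of the sketch):

```python
def lepor_system_a(sentence_scores):
    """Eq. (15): system score as the arithmetic mean of sentence-level LEPOR."""
    return sum(sentence_scores) / len(sentence_scores)

def lepor_system_b(lp, npos_penal, hpr):
    """Eq. (16): same product form as Eq. (8), fed with system-level factor values."""
    return lp * npos_penal * hpr
```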

Page 31: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Table 1. System-level Spearman correlation with human judgment on the WMT11 corpora

Page 32: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Performance on the ACL-WMT 2013 corpora

• Two translation directions:

– English-to-other (Spanish, German, French, Czech, and Russian)

– Other-to-English

Page 33: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• System-level & sentence-level

– LEPOR_v3.1: hLEPOR, nLEPOR_baseline

– hLEPOR = (1 / num_sent) × Σ_{i=1}^{num_sent} hLEPOR_i   (17)

– hLEPOR_final = (1 / (w_hw + w_hp)) × (w_hw · hLEPOR_word + w_hp · hLEPOR_POS)   (18)

– nLEPOR = LP × NPosPenal × exp(Σ_{n=1}^{N} w_n log HPR_n)   (19)

Page 34: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Tables 2 & 3. System-level Pearson correlation with human judgment

Page 35: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Tables 4 & 5. Segment-level Kendall's tau correlation with human judgment

Page 36: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors

– Aaron L.-F. Han, Derek F. Wong and Lidia S. Chao. Proceedings of COLING 2012: Posters, pages 441–450, Mumbai, India.

• Language-independent Model for Machine Translation Evaluation with Reinforced Factors

– Aaron L.-F. Han, Derek Wong, Lidia S. Chao, Liangye He, Yi Lu, Junwen Xing, Xiaodong Zeng. Proceedings of MT Summit 2013. Nice, France.

Page 37: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
Page 38: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
Page 39: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Language independent MT evaluation-LEPOR: https://github.com/aaronlifenghan/aaron-project-lepor

• MT evaluation with linguistic features-hLEPOR: https://github.com/aaronlifenghan/aaron-project-hlepor

• English-French Phrase tagset mapping and application in unsupervised MT evaluation-HPPR: https://github.com/aaronlifenghan/aaron-project-hppr

• Unsupervised English-Spanish MT evaluation-EBLEU: https://github.com/aaronlifenghan/aaron-project-ebleu

• Projects Homepage: https://github.com/aaronlifenghan

Page 40: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• My research interests:

– Natural Language Processing

– Signal Processing

– Machine Learning

– Artificial Intelligence

– Pattern Recognition

• My past research works:

– Machine Translation Evaluation, Word Segmentation, Entity Recognition, Multilingual Treebanks

Page 41: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Other publications:

• A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task

– Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yi Lu, Yervant Ho, Yiming Wang, Zhou Jiaji. Proceedings of the ACL 2013 Eighth Workshop on Statistical Machine Translation (ACL-WMT 2013), 8-9 August 2013, Sofia, Bulgaria.

– ACL-WMT13 METRICS TASK: Our metrics are language independent

English-vs-other (French, Spanish, Czech, German, Russian)

Can be used at both the system level and the segment level

The official results show our metrics have advantages compared to others.

Page 42: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical Modeling

– Aaron Li-Feng Han, Yi Lu, Derek F. Wong, Lidia S. Chao, Yervant Ho, Anson Xing. Proceedings of the ACL 2013 Eighth Workshop on Statistical Machine Translation (ACL-WMT 2013), 8-9 August 2013, Sofia, Bulgaria.

– ACL-WMT13 QUALITY ESTIMATION TASK (no reference translation): Task 1.1: sentence level EN-ES quality estimation

Task 1.2: system selection, EN-ES, EN-DE, new

Task 2: word-level QE, EN-ES, binary classification, multi-class classification, new

We design novel EN-ES POS tagset mapping and metric EBLEU in task 1.1.

We explore Naïve Bayes and Support Vector Machines in task 1.2.

We achieve the highest F1 score in task 2 using Conditional Random Field.

Page 43: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Designed POS tagset mapping of Spanish (TreeTagger) to the universal tagset (Petrov et al., 2012)

Page 44: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• EBLEU = MLP × exp(Σ_{n=1}^{N} w_n log H(αR_n, βP_n))

– MLP = exp(1 − s/h) if h < s;  exp(1 − h/s) if h ≥ s

– P_n = #common n-gram chunks / #n-gram chunks in the target sentence

– R_n = #common n-gram chunks / #n-gram chunks in the source sentence
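
A hedged sketch of EBLEU as reconstructed above (reference-free: precision and recall are computed between target and source n-gram chunks); the chunk extraction itself is not shown, and the weights are illustrative:

```python
import math

def modified_length_penalty(h_len, s_len):
    """MLP: compares hypothesis (target) length h with source length s."""
    return math.exp(1 - s_len / h_len) if h_len < s_len else math.exp(1 - h_len / s_len)

def ebleu(mlp, precisions, recalls, weights, alpha=1.0, beta=1.0):
    """EBLEU: MLP times the weighted geometric mean of harmonic(P_n, R_n) over n."""
    total = 0.0
    for w, p, r in zip(weights, precisions, recalls):
        h = (alpha + beta) / (alpha / r + beta / p)  # H(alpha*R_n, beta*P_n)
        total += w * math.log(h)
    return mlp * math.exp(total)
```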

Page 45: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Bayes' rule:

– p(c_i | x_1, x_2, …, x_n) = p(x_1, x_2, …, x_n | c_i) · p(c_i) / p(x_1, x_2, …, x_n)

• SVM: find the point with the smallest margin to the hyperplane, then maximize that margin:

– argmax_{w,b} { min_n [ label · (wᵀx + b) ] · (1 / ‖w‖) }

• Conditional random fields:

– P(Y | X) ∝ exp( Σ_{e∈E,k} λ_k f_k(e, Y|_e, X) + Σ_{v∈V,k} μ_k g_k(v, Y|_v, X) )
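
Not the authors' actual WMT13 systems, but a minimal sketch of how off-the-shelf implementations of two of these classifiers could be applied to quality-estimation feature vectors; the feature design and data loading are assumptions:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

def train_and_predict(X_train, y_train, X_test):
    """Fit a Naive Bayes model and a linear max-margin classifier on QE features."""
    nb_pred = GaussianNB().fit(X_train, y_train).predict(X_test)
    svm_pred = LinearSVC().fit(X_train, y_train).predict(X_test)
    return nb_pred, svm_pred
```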

Page 46: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Designed features for CRF & NB algorithms

Page 47: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

ACL-WMT13 word level quality estimation task results

Page 48: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
Page 49: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation

– Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho, Shuo Li, Lynn Ling Zhu. In GSCL 2013. LNCS Vol. 8105, Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.

– German Society for Computational Linguistics (oral presentation): To facilitate future research in unsupervised induction of syntactic structures

We design French-English phrase tagset mapping

We propose a universal phrase tagset

Phrase tags extracted from the French Treebank and the English Penn Treebank

Explore the employment of the proposed mapping in unsupervised MT evaluation

Page 50: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Designed phrase tagset mapping for English and French

Page 51: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Evaluation based on parsing information from syntactic treebanks

Page 52: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Convert the word sequence into universal phrase tagset sequence

Page 53: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• HPPR = Har(w_Ps · N1PsDif, w_Pr · N2Pre, w_Rc · N3Rec)

– N1PsDif = (1/n) Σ_i N1PsDif_i

– N2Pre = exp(Σ_{n=1}^{N2} w_n log P_n)

– N3Rec = exp(Σ_{n=1}^{N3} w_n log R_n)

Page 54: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• A Study of Chinese Word Segmentation Based on the Characteristics of Chinese

– Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho, Lynn Ling Zhu, Shuo Li. Accepted. In GSCL 2013. LNCS Vol. 8105, Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.

– German Society for Computational Linguistics (poster paper): No word boundary in the Chinese expression

Chinese word segmentation is a difficult problem

Word segmentation is crucial to the word alignment in machine translation

We discuss the characteristics of Chinese and design optimized features

We formulate some problems and issues in Chinese word segmentation

Page 55: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
Page 56: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Automatic Machine Translation Evaluation with Part-of-Speech Information

– Aaron Li-Feng Han, Derek F. Wong, Lidia S. Chao, Yervant Ho. In TSD 2013. Plzen, Czech Republic. LNAI Vol. 8082, pp. 121-128. Volume Editors: I. Habernal and V. Matousek. Springer-Verlag Berlin Heidelberg.

– Text, Speech and Dialogue 2013 (oral presentation): We explore the unsupervised machine translation evaluation method

We design hLEPOR algorithm for the first time

We explore the POS usage in unsupervised MT evaluation

Experiments are performed on English vs French, German

Page 57: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
Page 58: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

– Aaron Li-Feng Han, Derek Fai Wong and Lidia Sam Chao. In Proceedings of LP&IIS. M.A. Klopotek et al. (Eds.): IIS 2013, LNCS Vol. 7912, pp. 57–68, Warsaw, Poland. Springer-Verlag Berlin Heidelberg.

– Intelligent Information System 2013 (oral presentation): Named entity recognition is important in IR, MT, text analysis, etc.

Chinese named entity recognition is more difficult due to the lack of word boundaries

We compare the performance of different algorithms: NB, CRF, SVM, ME

We analyze the characteristics of personal, location, and organization names respectively

We show the performance of different features and select the optimal ones.

Page 59: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools
Page 60: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• Ongoing and further works:

– The combination of translation and evaluation, tuning the translation model using evaluation metrics

– Evaluation models from the perspective of semantics

– The exploration of unsupervised evaluation models, extracting features from source and target languages

Page 61: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• The evaluation work is closely related to similarity measurement. So far I have applied it only to MT evaluation, but it can be further developed in other areas:

– information retrieval

– question answering

– searching

– text analysis

– etc.

Page 62: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

Q & A. Thanks for your attention!

Aaron L.-F. Han, 2013.08

Page 63: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• 1. Weaver, Warren: Translation. In William Locke and A. Donald Booth, editors, Machine Translation of Languages: Fourteen Essays. John Wiley and Sons, New York, pages 15-23 (1955)

• 2. Marino, B. Jose, Rafael E. Banchs, Josep M. Crego, Adria de Gispert, Patrik Lambert, Jose A. Fonollosa, Marta R. Costa-jussa: N-gram based machine translation. Computational Linguistics, Vol. 32, No. 4, pp. 527-549, MIT Press (2006)

• 3. Och, F. J.: Minimum Error Rate Training for Statistical Machine Translation. In Proceedings of ACL 2003, pp. 160-167 (2003)

• 4. Su, Hung-Yu and Chung-Hsien Wu: Improving Structural Statistical Machine Translation for Sign Language With Small Corpus Using Thematic Role Templates as Translation Memory. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 7, September (2009)

• 5. Xiong, D., M. Zhang, H. Li: A Maximum-Entropy Segmentation Model for Statistical Machine Translation. IEEE Transactions on Audio, Speech, and Language Processing, Volume 19, Issue 8, pp. 2494-2505 (2011)

• 6. Carl, M. and A. Way (eds): Recent Advances in Example-Based Machine Translation. Kluwer Academic Publishers, Dordrecht, The Netherlands (2003)

Page 64: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• 7. Koehn, P.: Statistical Machine Translation. (University of Edinburgh), Cambridge University Press (2010)

• 8. Arnold, D.: Why translation is difficult for computers. In Computers and Translation: A translator's guide. Benjamins Translation Library (2003)

• 9. Carroll, J. B.: An experiment in evaluating the quality of translation. In Pierce, J. (Chair), Languages and machines: computers in translation and linguistics. A report by the Automatic Language Processing Advisory Committee (ALPAC), Publication 1416, Division of Behavioral Sciences, National Academy of Sciences, National Research Council, pages 67-75 (1966)

• 10. White, J. S., O'Connell, T. A., and O'Mara, F. E.: The ARPA MT evaluation methodologies: Evolution, lessons, and future approaches. In Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA 1994), pp. 193-205 (1994)

• 11. Su, Keh-Yih, Wu Ming-Wen and Chang Jing-Shin: A New Quantitative Quality Measure for Machine Translation Systems. In Proceedings of the 14th International Conference on Computational Linguistics, pages 433-439, Nantes, France, July (1992)

Page 65: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• 12. Tillmann, C., Stephan Vogel, Hermann Ney, Arkaitz Zubiaga, and Hassan Sawaf: Accelerated DP Based Search For Statistical Translation. In Proceedings of the 5th European Conference on Speech Communication and Technology (EUROSPEECH-97) (1997)

• 13. Papineni, K., Roukos, S., Ward, T. and Zhu, W. J.: BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL 2002, pages 311-318, Philadelphia, PA, USA (2002)

• 14. Doddington, G.: Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research (HLT 2002), pages 138-145, San Diego, California, USA (2002)

• 15. Turian, J. P., Shen, L. and Melamed, I. D.: Evaluation of machine translation and its evaluation. In Proceedings of MT Summit IX, pages 386-393, New Orleans, LA, USA (2003)

• 16. Banerjee, S. and Lavie, A.: Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of ACL-WMT, pages 65-72, Prague, Czech Republic (2005)

Page 66: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• 17. Denkowski, M. and Lavie, A.: Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of ACL-WMT, pages 85-91, Edinburgh, Scotland, UK (2011)

• 18. Snover, M., Dorr, B., Schwartz, R., Micciulla, L. and Makhoul, J.: A study of translation edit rate with targeted human annotation. In Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA), pages 223-231, Boston, USA (2006)

• 19. Chen, B. and Kuhn, R.: AMBER: A modified BLEU, enhanced ranking metric. In Proceedings of ACL-WMT, pages 71-77, Edinburgh, Scotland, UK (2011)

• 20. Bicici, E. and Yuret, D.: RegMT system for machine translation, system combination, and evaluation. In Proceedings of ACL-WMT, pages 323-329, Edinburgh, Scotland, UK (2011)

• 21. Shawe-Taylor, J. and N. Cristianini: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)

• 22. Wong, B. T-M and Kit, C.: Word choice and word position for automatic MT evaluation. In Workshop: MetricsMATR of the Association for Machine Translation in the Americas (AMTA), short paper, 3 pages, Waikiki, Hawai'i, USA (2008)

Page 67: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• 23. Isozaki, H., Hirao, T., Duh, K., Sudoh, K., and Tsukada, H.: Automatic evaluation of translation quality for distant language pairs. In Proceedings of the 2010 Conference on EMNLP, pages 944-952, Cambridge, MA (2010)

• 24. Talbot, D., Kazawa, H., Ichikawa, H., Katz-Brown, J., Seno, M. and Och, F.: A Lightweight Evaluation Framework for Machine Translation Reordering. In Proceedings of the Sixth ACL-WMT, pages 12-21, Edinburgh, Scotland, UK (2011)

• 25. Song, X. and Cohn, T.: Regression and ranking based optimisation for sentence level MT evaluation. In Proceedings of ACL-WMT, pages 123-129, Edinburgh, Scotland, UK (2011)

• 26. Popovic, M.: Morphemes and POS tags for n-gram based evaluation metrics. In Proceedings of ACL-WMT, pages 104-107, Edinburgh, Scotland, UK (2011)

• 27. Popovic, M., Vilar, D., Avramidis, E. and Burchardt, A.: Evaluation without references: IBM1 scores as evaluation metrics. In Proceedings of ACL-WMT, pages 99-103, Edinburgh, Scotland, UK (2011)

• 28. Petrov, S., Leon Barrett, Romain Thibaux, and Dan Klein: Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st ACL, pages 433-440, Sydney, July (2006)

Page 68: CUHK intern PPT. Machine Translation Evaluation: Methods and Tools

• 29. Callison-Burch, C., Koehn, P., Monz, C. and Zaidan, O. F.: Findings of the 2011 Workshop on Statistical Machine Translation. In Proceedings of ACL-WMT, pages 22-64, Edinburgh, Scotland, UK (2011)

• 30. Callison-Burch, C., Koehn, P., Monz, C., Peterson, K., Przybocki, M. and Zaidan, O. F.: Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation. In Proceedings of ACL-WMT, pages 17-53, PA, USA (2010)

• 31. Callison-Burch, C., Koehn, P., Monz, C. and Schroeder, J.: Findings of the 2009 Workshop on Statistical Machine Translation. In Proceedings of ACL-WMT, pages 1-28, Athens, Greece (2009)

• 32. Callison-Burch, C., Koehn, P., Monz, C. and Schroeder, J.: Further meta-evaluation of machine translation. In Proceedings of ACL-WMT, pages 70-106, Columbus, Ohio, USA (2008)

• 33. Avramidis, E., Popovic, M., Vilar, D., Burchardt, A.: Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features. In Proceedings of the Sixth Workshop on Statistical Machine Translation (ACL-WMT), pages 65-70, Edinburgh, Scotland, UK (2011)

