Statistical Machine Translation of Texts with Misspelled Words Nicola Bertoldi, Mauro Cettolo,...


Statistical Machine Translation of Texts with Misspelled Words

Nicola Bertoldi, Mauro Cettolo, Marcello Federico
FBK - Fondazione Bruno Kessler, Trento, Italy

ACL 2010

Outline

◆ Introduction
◆ System
◆ Data
◆ Evaluation
◆ Conclusions

Introduction

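◆ Non-word error: the misspelling is not a valid word of the language (e.g. "exellent" for "excellent").

◆ Real-word error: the misspelling is itself a valid word, so the error can be detected only from context.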
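The distinction between the two error classes can be checked mechanically against a lexicon. The Python sketch below assumes a small hypothetical lexicon and that the intended word is known; the "their"/"there" pair is an illustrative example, not one taken from the slides.

```python
# Toy lexicon (hypothetical); a real checker would use a full dictionary.
LEXICON = {"we", "had", "just", "come", "in", "from", "the", "room", "was",
           "excellent", "staff", "friendly", "around", "there", "their"}


def error_type(intended: str, typed: str, lexicon: set = LEXICON) -> str:
    """Classify a typed token against the word the writer intended."""
    if typed.lower() == intended.lower():
        return "correct"
    # Non-word error: the typed string is not in the lexicon.
    if typed.lower() not in lexicon:
        return "non-word"
    # Real-word error: the typed string is another valid word.
    return "real-word"


print(error_type("excellent", "exellent"))  # non-word
print(error_type("there", "their"))         # real-word
```

A real-word error passes any dictionary lookup, which is why it can only be caught by modeling context.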

Six different typing-error operations:

◆ Substitution
Target: [We] had just come in from Australia.
Error: [Ww] had just come in from Australia.

◆ Insertion
Target: is a good place to stay, if you are looking for a hotel [around] LAX airport.
Error: is a good place to stay, if you are looking for a hotel [arround] LAX airport.

◆ Deletion
Target: The room was [excellent] but the hallway was [filthy].
Error: The room was [exellent] but the hallway was [filty].

◆ Transposition
Target: The staff was [friendly].
Error: The staff was [freindly].

◆ Run-On
Target: I saw a teacher[.] who cares?
Error: I saw a teacher[ ] who cares?

◆ Split
Target: [We] had just come in from Australia.
Error: [W e] had just come in from Australia.
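Each of the six operations is a single edit at a character position. The Python sketch below reproduces the slide examples; the function names are hypothetical, and modeling run-on as deleting the boundary mark (the period in the slide's example) is an assumption based on that one example.

```python
def substitute(s: str, i: int, c: str) -> str:
    """Substitution: replace the character at position i with c."""
    return s[:i] + c + s[i + 1:]


def insert(s: str, i: int, c: str) -> str:
    """Insertion: add c before position i."""
    return s[:i] + c + s[i:]


def delete(s: str, i: int) -> str:
    """Deletion: drop the character at position i."""
    return s[:i] + s[i + 1:]


def transpose(s: str, i: int) -> str:
    """Transposition: swap the characters at positions i and i + 1."""
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]


def run_on(s: str, i: int) -> str:
    """Run-on: remove the boundary mark at position i so two units merge."""
    return delete(s, i)


def split(s: str, i: int) -> str:
    """Split: insert a space at position i, breaking one word in two."""
    return insert(s, i, " ")


print(substitute("We", 1, "w"))   # Ww
print(insert("around", 2, "r"))   # arround
print(delete("excellent", 2))     # exellent
print(transpose("friendly", 2))   # freindly
s = "I saw a teacher. who cares?"
print(run_on(s, s.index(".")))    # I saw a teacher who cares?
print(split("We", 1))             # W e
```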


System

Step 1.

Step 2.

Step 3.

Step 4.

Step 5. The confusion network (CN) in (e) is translated with the Moses decoder (Koehn et al., 2007).
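A confusion network is a sequence of columns, each holding alternative words with probabilities, so it compactly encodes many candidate corrections of one input. The Python sketch below illustrates only the data structure on a hypothetical network for the split error "W e had just come in"; the probabilities and the `*EPS*` marker for an empty arc are assumptions, and it does not reproduce Moses' input format. In Moses, CN paths are scored jointly with the translation and language models, so the output need not follow the per-column argmax shown here.

```python
from itertools import product
from math import prod

EPS = "*EPS*"  # hypothetical marker for an empty arc (the word is skipped)

# Toy CN: each column lists alternatives with probabilities summing to 1
# (made-up values).
cn = [
    {"We": 0.7, "W": 0.3},
    {EPS: 0.7, "e": 0.3},
    {"had": 1.0},
    {"just": 1.0},
    {"come": 1.0},
    {"in": 1.0},
]


def best_path(cn):
    """Most probable path: columns are independent, so take each column's argmax."""
    words = [max(col, key=col.get) for col in cn]
    return " ".join(w for w in words if w != EPS)


def all_paths(cn):
    """Yield every path through the CN together with its probability."""
    for choice in product(*(col.items() for col in cn)):
        words = [w for w, _ in choice if w != EPS]
        yield " ".join(words), prod(p for _, p in choice)


print(best_path(cn))  # We had just come in
for sentence, p in sorted(all_paths(cn), key=lambda x: -x[1]):
    print(f"{p:.2f}  {sentence}")
```

The number of paths grows multiplicatively with the column sizes, which is why a decoder searches the network rather than translating each path separately.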


Data

Evaluation Data

Non-word Noise
Words in the text are randomly replaced according to a list of 4,100 frequent non-word errors provided by Wikipedia.

Real-word Noise
Real-word errors are introduced automatically using another Wikipedia list of frequently misused words.
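Non-word and real-word noise share one mechanism: look up each token in a list of known errors and, with some probability, swap in one of its listed variants. The Python sketch below assumes tiny hypothetical stand-ins for the two Wikipedia lists and a per-token replacement probability `rate`; how the paper sets its noise levels is not stated in these slides.

```python
import random

# Toy stand-ins (hypothetical) for the Wikipedia lists; the real
# non-word list has 4,100 entries.
NON_WORD = {"excellent": ["exellent", "excelent"], "friendly": ["freindly"]}
REAL_WORD = {"there": ["their"], "lose": ["loose"]}


def inject_list_noise(tokens, misspellings, rate, rng):
    """Replace each listed token by one of its variants with probability `rate`."""
    noisy = []
    for tok in tokens:
        options = misspellings.get(tok.lower())
        if options and rng.random() < rate:
            noisy.append(rng.choice(options))  # corrupted token
        else:
            noisy.append(tok)                  # left unchanged
    return noisy


rng = random.Random(0)  # fixed seed for reproducible noise
sent = "The room was excellent and the staff was friendly".split()
print(inject_list_noise(sent, NON_WORD, rate=0.5, rng=rng))
print(inject_list_noise("I will wait over there".split(), REAL_WORD, rate=1.0, rng=rng))
# ['I', 'will', 'wait', 'over', 'their']
```

Passing `NON_WORD` or `REAL_WORD` selects the noise source while the injection logic stays the same.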

Random-word Noise
The original text is corrupted by randomly replacing, inserting, and deleting characters.
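Random-word noise needs no error list. The Python sketch below assumes that each letter is corrupted independently with probability `rate` by one of the three edits chosen uniformly, and that spaces and punctuation are never touched so word boundaries survive; all three choices are illustrative assumptions, not the paper's documented procedure.

```python
import random
import string


def random_char_noise(text, rate, rng, alphabet=string.ascii_lowercase):
    """Corrupt each letter with probability `rate` by a random replace,
    insert, or delete; non-letters (spaces, punctuation) are left intact."""
    out = []
    for ch in text:
        if not ch.isalpha() or rng.random() >= rate:
            out.append(ch)  # keep the character
            continue
        op = rng.choice(("replace", "insert", "delete"))
        if op == "replace":
            out.append(rng.choice([c for c in alphabet if c != ch]))
        elif op == "insert":
            out.append(ch)
            out.append(rng.choice(alphabet))  # extra character after ch
        # op == "delete": append nothing
    return "".join(out)


rng = random.Random(42)
print(random_char_noise("the staff was friendly", rate=0.1, rng=rng))
```

Raising `rate` yields the increasing noise levels the evaluation calls for; seeding the generator makes each corrupted test set reproducible.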


Evaluation


Conclusions

◆ This paper addressed the issue of automatically translating written texts that are corrupted by misspelling errors.

◆ The enhanced MT system was tested on texts corrupted with increasing levels of noise from three different sources: random, non-word, and real-word errors.

◆ The impact of misspelling errors on MT performance depends on the noise rate, but not on the noise source.
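The slides do not say how the noise rate is measured. One plausible measure, assumed here, is the character edit distance between clean and noisy text divided by the clean length, which applies equally to all three noise sources. The Python sketch below uses plain Levenshtein distance, so a transposition counts as two edits (the Damerau variant would count it as one).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def noise_rate(clean: str, noisy: str) -> float:
    """Character edits separating noisy from clean text, per clean character."""
    return edit_distance(clean, noisy) / max(len(clean), 1)


print(edit_distance("friendly", "freindly"))  # 2
print(noise_rate("The staff was friendly.", "The staff was freindly."))
```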
