© 2010 The MITRE Corporation. All rights reserved
Sherri Condon, Dan Parvaz, John Aberdeen, Christy Doran, Andrew Freeman and Marwan Awad
The MITRE Corporation
Evaluation of Machine Translation Errors in English and Iraqi Arabic
Approved for Public Release:10-101174. Distribution Unlimited
LREC 2010
© 2008 The MITRE Corporation. All rights reserved
Preview
Methods
– DARPA Speech Translation
– HTER and annotation process
– Annotation categories
Iraqi Arabic to English (I→E) Errors
– Polarity errors
– Pronoun errors
– Copula errors
English to Iraqi Arabic (E→I) Errors
– Subject pronoun inflection errors
– Word order errors
– “Other” errors
Summary and Conclusions
DARPA Speech Translation Systems
2-way communication for English and Iraqi Arabic
– Military domains and use cases
– Checkpoint, facility inspection, civil affairs, training, medical
– Funded 4 speech translation systems (labeled A-D)
Evaluations conducted by NIST and MITRE
– Live evaluations with military users and Iraqi speakers
– Offline evaluations using recordings of military users and Iraqi speakers
Error analyses use translations of text transcriptions from offline recordings
– Exclude errors from speech recognition
Evaluation Data
Samples from 2 evaluations
– June 2008
– November 2008
Translations from 4 systems
Subset of offline inputs
Number of Translations Annotated
Translation Direction      June 2008   November 2008
English to Iraqi Arabic    436         372
Iraqi Arabic to English    388         432
Error Analysis
THIS IS REALLY HARD!
– Errors depend on what’s correct
– But no single correct translation
Automated measures of translation quality like Translation Error Rate (TER) are not diagnostic
– Scores based on changes needed to turn system output into reference translation (insertion, deletion, substitution, shift)
– Human TER (HTER) requires humans to create reference translations as close as possible to system output
We used HTER for error annotation
– Provides a maximally close correct translation
– TER alignment and annotation facilitates our annotation
Annotation Process
Customize reference translations
– NIST post-editing tool for HTER reference translations
– 4 reference translations for post-editors
Align and annotate translations with TER
– Annotators may change alignments
– Keep word classes aligned where possible
Annotate TER errors
– Identify major word classes of errors
– Quantify polarity and speech act errors
– Exclude minor errors
Sample Annotation
ID Reference Output TER Realign Annotate
70 and and
70 is I @ ssa*
70 this this
70 stuff stuff
70 was D S null
70 stolen stolen
70 from from
70 the the
70 market market
*substituted speech act (takes priority over “word order” annotation)
Annotations
Null [synonyms, articles, some prepositions/inflections]
Word Order [= TER ‘shift’]
Polarity (negative to positive or positive to negative)
Substituted Speech Act (e.g., question to statement)
Untranslated (transliterated, “???”)
Verb (deleted, inserted, and substituted)
Noun (same)
Pronoun (same)
Pronoun-Verb Complex [for English contractions and Arabic verbs with subject inflection only] (same)
Verb Person Inflection [substitute Arabic subject inflection]
Other [adjectives, prepositions, conjunctions] (same)
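One way to picture the annotation scheme is as a fixed label set plus a tally, sketched below. The label strings, the helper names `all_labels` and `proportions`, and the choice to leave null annotations out of the denominator are assumptions made for illustration. The slides do not specify how the annotations were stored or counted.

```python
from collections import Counter

# Categories that take a single label.
SINGLE_LABELS = ("null", "word order", "polarity", "substituted speech act",
                 "untranslated", "verb person inflection")
# Word classes that are further split by edit type ("same" on the slide).
WORD_CLASSES = ("verb", "noun", "pronoun", "pronoun-verb complex", "other")
EDIT_TYPES = ("deleted", "inserted", "substituted")


def all_labels():
    """Every label an annotator could assign under this hypothetical encoding."""
    labels = list(SINGLE_LABELS)
    labels += [f"{wc} {et}" for wc in WORD_CLASSES for et in EDIT_TYPES]
    return labels


def proportions(annotations):
    """Share of each non-null label among the non-null annotations."""
    unknown = set(annotations) - set(all_labels())
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counted = [a for a in annotations if a != "null"]
    if not counted:
        return {}
    return {label: n / len(counted) for label, n in Counter(counted).items()}


print(proportions(["null", "polarity", "verb deleted", "verb deleted"]))
```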
June I→E: Proportions of TER Error Types
[Figure: bar chart of the proportion of each TER error type (Deletions, Insertions, Substitutions, Word Order) for Systems A, B, C and D; y-axis from 0.000 to 0.450]
June I→E: Proportions of Word Class Errors
[Figure: bar chart of the proportion of errors in each word class (Pronouns, Pro-V Complex, Verbs, Nouns, Other) for Systems A, B, C and D; y-axis from 0.000 to 0.350]
I→E: Polarity Errors
Transcript: عندهم و متدرب جندي ثالثين حالياً عنديخفيفة أسلحة
MT: I don’t have at the moment thirty soldier trained and they have light weapons
Ref: I have at the moment thirty trained soldiers and they have light weapons
Transcript: و مستعجلين كنا وقت عندنا كان ما الله و التشيكنا ما سوينا فما بالعجل ناسيشتغلون محتاجين قبل عليهم
MT: no and god we do not have time we were in a hurry and we need people to work hurry up so we did nothing we checked them before
Ref: no we did not have time we were in a hurry and we need people to work immediately so we did not check them before
System     A  B  C  D
Frequency  2  2  2  1
June I→E Pronoun Issues: Subjects
Frequencies of pronouns (19%) and nouns (17%) are nearly equal, yet pronoun errors are about twice as frequent as noun errors
In Iraqi Arabic pronominal subjects are expressed only as verb inflection
– MT: was bitten by a scorpion
– Ref: he was bitten by a scorpion
But some contrasts are neutralized
– /iftahamit/ إفتهمت understand+past+1st or 2nd person singular subject: “I/you understood”
– MT: you see his symptoms
– Ref: I saw his symptoms
I→E Pronoun Issues: Insertions
Subject pronouns (few)
– MT: those people they store them in this complex
– Ref: those people store them in this complex
Resumptive pronouns (frequent)
– MT: it is about three kilometers from the point the checkpoint that he ran away from it
– Ref: it is about three kilometers from the point the checkpoint that he ran away from
– MT: the area is four streets that will probably restrict it
– Ref: the area is four streets that we will probably surround
These are non-null only if they might cause confusion, e.g., garden paths
I→E Pronoun Issues: Gender
Iraqi Arabic does not have a neuter gender
Many examples with it instead of he or she
– MT: are taking care of it god willing and hopefully it will get better a little bit more
– Ref: we are taking care of him god willing and hopefully he will get better soon
– MT: of course I mean it is in good condition
– Ref: of course I mean she is in good condition
Only one example of he instead of it
– MT: he civilian house consists of three rooms
– Ref: it is a civilian house consisting of three rooms
I→E Verbs: English be vs. Arabic “be”
English be serves several functions
– They are eating at the restaurant (progressive)
– The car was driven by a teenager (passive)
– Sam is my brother (copula: identity)
– Julia is brilliant (copula: attribution)
Arabic copula is not used in present tense
– MT: no sir all the family in the house
– Ref: no sir all the family is in the house
– MT: but the problem those lazy and sleep on the at night
– Ref: but the problem is they are lazy and sleep at night
Many errors with be are more complex errors
Proportion of be in June I→E Verb Errors
[Figure: bar chart of the proportion of verb errors involving be for Systems A, B, C and D; y-axis from 0.000 to 0.600]
June E→I: Proportions of TER Error Types
[Figure: bar chart of the proportion of each TER error type (Deletions, Insertions, Substitutions, Word Order) for Systems A, B, C and D; y-axis from 0.000 to 0.550]
June E→I: Proportions of Word Class Errors
[Figure: bar chart of the proportion of errors in each word class (Pronouns, Verb Person Inflection, Verbs, Nouns, Other) for Systems A, B, C and D; y-axis from 0.000 to 0.350]
E→I: Subject Verb Agreement Inflection
With an expressed subject, subject inflection on the verb that does not agree may cause confusion
Source: my marines are going to search the house
Ref: رح مالتي البيت ونفتشيالمارينز
MT: رح ا مالتي فتشالبيت ألمارينز
Ref: AlmArynz mAlty rH yft$wn Albyt
Ref: the-Marines my will 3m-search-pl the-house
MT: AlmArynz mAlty rH >ft$ Albyt
MT: the-Marines my will 1s-search the-house
Special annotation for these errors: Verb Person Inflection
Relatively high frequency, except in rule-based system
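The romanized lines in these examples (symbols such as `$`, `>`, `<`, `*`, `&`, `E` and `p`) appear to follow the Buckwalter transliteration scheme; the slides do not name it, so that is an assumption here. Under that assumption, a small sketch like the following maps a transliterated line back to Arabic script. The helper name `buckwalter_to_arabic` is illustrative, and the table covers letters only, not short-vowel diacritics.

```python
# Buckwalter letter symbols listed in Unicode order (assumed scheme).
_BLOCK_1 = "'|>&<}AbptvjHxd*rzs$SDTZEg"  # maps to U+0621 .. U+063A
_BLOCK_2 = "fqklmnhwYy"                  # maps to U+0641 .. U+064A

BUCKWALTER = {c: chr(0x0621 + i) for i, c in enumerate(_BLOCK_1)}
BUCKWALTER.update({c: chr(0x0641 + i) for i, c in enumerate(_BLOCK_2)})


def buckwalter_to_arabic(text):
    """Replace each Buckwalter letter with its Arabic character.

    Characters outside the table (spaces, hyphens, digits) pass through unchanged.
    """
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


# Reference and MT transliterations from the example above.
print(buckwalter_to_arabic("AlmArynz mAlty rH yft$wn Albyt"))
print(buckwalter_to_arabic("AlmArynz mAlty rH >ft$ Albyt"))
```

Comparing the two outputs makes the single differing verb form easy to spot: the reference carries third-person plural inflection and the MT output carries first-person singular.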
E→I: Pronominal Subject Inflection on Verbs
Pronoun errors occur when subject inflection does not match the source subject pronoun
Source: I might need to tell my commander I am stopping you
Ref: الزم مالي أيمكن للمسؤول وقفكأقول
MT: الزم مالي تيمكن للمسؤول وقفنقول
Ref: ymkn lAzm >qwl llms&wl mAly >wqfk
Ref: maybe must 1st-sg-say to-the-official my 1st-sg-stop-2nd-sg
MT: ymkn lAzm tqwl llms&wl mAly nwqfk
MT: maybe must 2m/3f-say to-the-official my 1st-pl-stop-2nd-sg
Number errors usually annotated as ‘null’ (green font)
Person errors dramatically change meaning (red font)
E→I: Both Subject and Verb are Incorrect
With pronominal subject unexpressed, a single verb may incorporate more than one significant error
Source: we will record inside who it belongs to
Ref: رح منو سجلنإحنا مال هو جوة
MT: رح مال قعديإحنا منو جوة
Ref: <HnA rH nsjl jwp hw mAl mnw
Ref: we will we-record inside he possession whom
MT: <HnA rH yqEd jwp mnw mAl
MT: we will he-sits inside whom possession
Special annotation for Pronoun-Verb Complex
Should count as both pronoun and verb error
Low frequency
E→I: Word Order Errors: Noun-Adjective
Slightly more word order errors in E→I vs. I→E
In both directions, a significant proportion of these reverse noun head and modifier order
Source: they have additional supplies
Ref: إضافية التجهيزاتعندهم
MT: التجهيزات إضافيعند
Ref: Endhm AltjhyzAt <DAfyp
Ref: at+them det+supplies additional+fem
MT: End <DAfy AltjhyzAt
MT: with additional det+supplies
E→I: Word Order Errors: Noun-Noun
This is the Arabic noun-noun modification known as the construct or idafa
Source: How does your source know this?
Ref: الشيء مالتك المصدرشلون بهذا عرف
MT: أعرفهذا مصدرمالتك شلون
Ref: $lwn AlmSdr mAltk Erf bh*A Al$y’
Ref: how det-source poss-2sm 3s-know in-this det-thing
MT: $lwn mAltk mSdr >Erf h*A
MT: how poss-2sm source 1s-know this
E→I: Word Order Errors in Idafa
40% of November 2008 E→I word order errors are wrong idafa order
Source: How does your source know this?
Ref: أشوف علمود مالتكم محطةإجيت الكهرباء
MT: الكهرباء أشوف علمود مالتكم المحطةإجيت
Ref: <jyt Elmwd >$wf mHTp AlkhrbA’ mAltkm
Ref: came+1s in-order-to see+1s station-of det+electricity poss-2p
MT: <jyt Elmwd >$wf AlkhrbA’ AlmHTp mAltkm
MT: came+1s in-order-to see+1s det+electricity det+station poss-2p
E→I: “Other” Errors from Phrasal Verbs
Phrasal verbs are frequently treated as verbs plus prepositions
Source: we have to go through the detaining process
Ref: الحجز نسويالزم عملية
MT: عنطريق الزم العملية نروح الحجز
Ref: lAzm nswy Emlyp AlHjz
Ref: must 1pl-do +def-process the-detention
MT: lAzm nrwH En Tryq AlHjz AlEmlyp
MT: must 1pl-go from road the-detention the-process
English source "to go through" roughly means "to do from start to finish"
MT translated it as “motion through” or "to take a certain route"
This is a type of word sense error
E→I: “Other” Multiword Expression Errors
23% of “Other” errors involve multiword expressions in the November 2008 corpus
Source: we can give you funds to where you can go out and buy the materials
Ref: الفلوس ن ننطيك المواد علمودقدر وتشتري تطلع
MT: المواد وين الفلوس أقدر وتشتري تطلع
Ref: nqdr nnTyk Alflws Elmwd tTlE wt$try AlmwAd
Ref: can+1p 1p+give+2ms det+money in-order-to 2ms+go-up and+2ms+buy det+material
MT: >qdr Alflws wyn tTlE wt$try AlmwAd
MT: can+1s det+money where 2ms+go-up and+2ms+buy det+material
June I→E: Error Type Proportions by Word Class
Error Type    Pronouns                Verbs                   Nouns                   Other                   Total
              To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic
Deletion      0.109       0.065       0.121       0.026       0.038       0.036       0.056       0.086       0.323       0.214
Insertion     0.057       0.034       0.043       0.024       0.017       0.026       0.043       0.047       0.160       0.132
Substitution  0.095       0.059       0.090       0.077       0.048       0.119       0.115       0.132       0.348       0.387
Total         0.261       0.158       0.253       0.127       0.103       0.181       0.214       0.266       0.831       0.732
November I→E: Error Type Proportions by Word Class
Error Type    Pronouns                Verbs                   Nouns                   Other                   Total
              To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic   To English  To Arabic
Deletion      0.112       0.039       0.102       0.022       0.046       0.031       0.075       0.053       0.334       0.144
Insertion     0.047       0.028       0.039       0.035       0.007       0.039       0.039       0.079       0.131       0.182
Substitution  0.102       0.024       0.087       0.072       0.052       0.103       0.081       0.153       0.323       0.352
Total         0.261       0.092       0.228       0.129       0.105       0.173       0.195       0.284       0.789       0.678
Total June    0.261       0.158       0.253       0.127       0.103       0.181       0.214       0.266       0.831       0.732
I→E: Other Error Proportions
               June 2008               November 2008
Error Type     To English  To Arabic   To English  To Arabic
Word Order     0.139       0.170       0.171       0.166
Pro-V Complex  0.013       0.007       0.003       0.000
Verb Person    n/a         0.090       n/a         0.155
Polarity       0.009       0.002       0.017       0
Speech Act     0.006       0           0.019       0
Untranslated   0.001       0           0.001       0
Total          0.169       0.269       0.211       0.321
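The totals in this table read as the complement of the word-class totals in the June and November error type tables, so that each direction's proportions cover all counted errors. That reading is an inference from the numbers, not something the slides state. A short check under that assumption, with the helper name `coverage` chosen for illustration:

```python
# (word-class total, other-category total) per evaluation and direction,
# copied from the "Total" cells of the tables in the slides.
TOTALS = {
    ("June 2008", "To English"): (0.831, 0.169),
    ("June 2008", "To Arabic"): (0.732, 0.269),
    ("November 2008", "To English"): (0.789, 0.211),
    ("November 2008", "To Arabic"): (0.678, 0.321),
}


def coverage(word_class_total, other_total):
    """Sum of the two totals, rounded to the three decimals used on the slides."""
    return round(word_class_total + other_total, 3)


for key, (word_class_total, other_total) in TOTALS.items():
    print(key, coverage(word_class_total, other_total))
```

Each sum lands within 0.001 of 1, which is consistent with rounding in the reported cells.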
Error Frequencies and BLEU Scores
Direction  System  June TER Errors  November TER Errors  June Non-null Errors*  November Non-null Errors*  June BLEU Scores  November BLEU Scores
I to E     A       292              269                  176 / 1.81             180 / 1.67                 .469              .516
I to E     B       355              354                  240 / 2.58             223 / 2.06                 .446              .471
I to E     C       287              279                  166 / 1.71             175 / 1.64                 .484              .502
I to E     D       291              229                  189 / 1.94             146 / 1.35                 .475              .500
E to I     A       353              225                  179 / 1.64             134 / 1.44                 .341              .363
E to I     B       408              222                  203 / 1.86             132 / 1.42                 .305              .327
E to I     C       246              144                  116 / 1.06             87 / 0.94                  .339              .378
E to I     D       233              221                  115 / 1.05             104 / 1.12                 .325              .369
*raw frequency/normalized per input
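The normalized figures can be approximated by dividing each raw count by the number of annotated inputs per system. The sketch below assumes the annotated translations reported under Evaluation Data were split evenly across the 4 systems; the slides do not say so. The assumption reproduces several cells (for example 176 / 1.81 and 134 / 1.44) but not all of them (for example System B in June I to E), so the actual per-system counts may have differed. The name `per_input` and the dictionary keys are illustrative.

```python
# Number of translations annotated, from the Evaluation Data slide.
ANNOTATED = {
    ("I_to_E", "June"): 388,
    ("I_to_E", "November"): 432,
    ("E_to_I", "June"): 436,
    ("E_to_I", "November"): 372,
}
N_SYSTEMS = 4  # systems A-D


def per_input(raw_errors, direction, evaluation):
    """Non-null errors per input, assuming an even split of inputs across systems."""
    inputs_per_system = ANNOTATED[(direction, evaluation)] / N_SYSTEMS
    return round(raw_errors / inputs_per_system, 2)


print(per_input(176, "I_to_E", "June"))      # System A, June, I to E
print(per_input(134, "E_to_I", "November"))  # System A, November, E to I
```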
Conclusions
Linguistic differences will always challenge translation systems
Some differences are difficult even for high frequency expressions like the copula
– The need to insert lexemes not present in the source
– Or to remove lexemes that are present in the source
– These are characteristics of multiword expressions
Discourse context is needed for deictic elements like pronouns
– Iraqi Arabic speakers know whether the speaker is referring to “I” or “you” from the context
– Knowing whether to translate Arabic “he” or “she” as “it” requires knowledge of the referent of the pronoun
Future Work
Compute relative weight of error types
– Compare to human judgments collected by NIST
– Compute regression tests
Compare July 2007 with November 2008 translations
Additional subcategories of errors
Word Sense Ambiguities
June 2008
– I→E averaged .021
– E→I averaged .032
These are low compared to Vilar et al. (2006)
After analysis of November E→I “Other” errors, annotators were more sensitive to a broader class of word sense errors
– November E→I is about 10%
– Comparable to Vilar et al. (2006)
November I→E word sense analysis is incomplete
Inter-Annotator Reliability
English annotation performed by 3 native speakers
– June 2008 annotated independently
– November 2008 each annotated twice and differences resolved
3 Arabic annotators
– 2 non-native speakers and 1 native speaker
– Half annotated by each non-native speaker
– All annotations reviewed by native speaker
– Differences resolved