Linguistic Evaluation of Support Verb Construction Translations by OpenLogos and Google Translate


LINGUISTIC EVALUATION OF SUPPORT VERB CONSTRUCTIONS BY OPENLOGOS AND GOOGLE TRANSLATE

ANABELA BARREIRO, INESC-ID
KUTZ ARRIETA, Oracle
JOHANNA MONTI, University of Sassari
WANG LING, CMU-IST
BRIGITTE ORLIAC, Logos Institute
FERNANDO BATISTA, INESC-ID, ISCTE-IUL
SUSANNE PREUß, Saarland University
ISABEL TRANCOSO, INESC-ID, IST

Language Resources and Evaluation Conference 26-31 May, Reykjavik, Iceland

Outline

• Introduction
  – Towards Hybrid Machine Translation
  – OpenLogos and Google Translate Models
• Evaluation Task
  – Corpus and Datasets
  – Quantitative Results
  – Linguistic Evaluation Details
• Current Work
  – Semantico-Syntactic Knowledge Integration into SMT
• Conclusions and Future Work

Introduction

• MT GOAL
  – researchers aim for robust MT systems that can produce high-quality translations
• CURRENT PROBLEMS
  – translations produced by widely used MT systems still show unfortunate errors that require significant post-editing effort
  – there is a lack of periodic qualitative evaluation efforts involving MT systems of different natures
  – state-of-the-art quality metrics and quality estimation have targeted human-factors tasks (post-editing time and effort), but NOT the diagnosis of fine-grained linguistic errors to improve syntactic structure and meaning

• CURRENT TREND
  – produce systems that combine linguistic resources and analysis with statistical techniques, leading to linguistically enhanced SMT models
• OUR MOTIVATION
  – the belief that an effective method to advance MT research is to bring different approaches together, comparing them and measuring which modules need improvement
  – to our knowledge, no major effort has been made to combine the strengths of different MT approaches with the purpose of overcoming known weaknesses on the basis of a joint linguistic evaluation of those weaknesses

• OUR GOALS
  – advance hybrid MT, starting by understanding different approaches, their weaknesses and strengths
  – perform a systematic fine-grained linguistic analysis of the performance of individual models
  – as a first exercise towards these goals, evaluate the performance of RBMT and SMT when dealing with a very specific linguistic phenomenon: support verb constructions

Towards Hybrid Machine Translation

• A current trend in MT research is the creation of hybrid MT (HMT) models that combine linguistic knowledge with statistical techniques
• HMT systems attempt to combine rule-based MT (RBMT) systems [Scott, 2003] with data-driven MT systems, such as phrase-based statistical MT (SMT) [Koehn, 2007]
• System combination often leads to improvements in translation quality, as different systems tend to address different translation challenges
• It is still not obvious which HMT approach will be the most efficient and will lead to higher-quality translation in the long run

• SMT models learn generalizations of the translation process using parallel corpora
  – they tend to perform better than RBMT when parallel corpora are abundant (English-Mandarin)
  – when parallel corpora are scarce (Spanish-Basque), they have insufficient data to learn generalizations [Labaka, 2007]
• Morphologically rich languages require more data to learn accurate translations
  – SMT models for morphologically rich languages have been proposed [Chahuneau, 2013]
  – RBMT systems with manually encoded morphology are an alternative for resource-poor languages


• Some methods to combine RBMT with SMT:
  – combine the translations of the same text by two different systems [Eisele, 2008] [Heafield, 2011]
  – use data-driven techniques to improve RBMT systems
    • [Eisele, 2008] uses phrase pair extraction in phrase-based SMT to extract phrasal translations used to improve the coverage of an RBMT system
    • a similar method using example-based MT for the same end has been proposed [Sanchez, 2009]
  – use statistical post-editing methods to improve RBMT translation quality [Elming, 2006] [Simard, 2007] [Dugast, 2007] [Terumasa, 2007]
  – use RBMT systems to enhance data-driven approaches
    • [Shirai, 1997] uses an example-based MT system [Brown, 1996] to create an initial translation template, and an RBMT system to translate individual words and phrases according to this template


The OpenLogos Model

• is an open source copy of the commercial Logos System
• addresses morphology, syntax, and semantics; has robust parsers, sets of semantico-syntactic rules, terminology sets and tools
• pattern-based methodology
  – closer in spirit to the SMT approach, with the advantage of including semantic knowledge/understanding
• uses an intermediate language (SAL) to encode linguistic information and process text
  – SAL contributes to the high translation quality of OpenLogos (OL) and lessens one of the main problems in SMT (the sparseness of linguistic examples)
• its linguistic knowledge databases have not been developed for over 10 years

The Google Translate Model

• one of the most widely used online MT systems
• this SMT system benefits from the large amount of parallel data that Google collects from the web
  – in March 2014, it was set to account for 80 language pairs
• translation quality is highly dependent on the language pair, with better results for close language pairs (Portuguese and Spanish) and for languages for which large amounts of parallel data are available
• closed system; no semantic understanding component is known to exist in Google Translate (GT)

Evaluation Task: Corpus

• sentences containing 100 support verb constructions (SVC) extracted from the news and the Internet
• SVC: a multiword or complex predicate formed by a semantically weak verb and a predicate noun/adjective/adverb [Barreiro, 2008]
  – make a presentation: support verb make + predicate noun presentation
  – make it simple: support verb make + predicate adjective simple

• Why SVC?
  – studied systematically within the Lexicon-Grammar Theory
    • the scientific study of SVC eliminates subjectivity concerns for the evaluation task
  – occur abundantly in texts
  – recognized and processed computationally
    • in general and specific-purpose corpora
    • for several languages
  – most MT systems still fail at addressing the compositional aspect of multiword units
    • when translated incorrectly, SVC have a negative impact on the understandability and quality of translations


• Why SVC?
  – SVC can be non-contiguous (the individual elements that compose the unit are placed apart in the sentence), with a smaller or greater number of inserts
    • an insert is any word in between elements of the multiword other than an article before a predicate noun
    • example: we are taking a growing interest in
  – non-contiguous SVC are extremely difficult to align in SMT, remaining one of the key cross-language challenges for MT
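
The notion of an insert can be made concrete with a small sketch. The function below is hypothetical (`find_svc`, `max_gap`, and the article list are assumptions made for illustration, not part of OpenLogos or of the evaluation setup). It matches exact surface forms only and discounts every article in the gap, which is a simplification: an article that belongs to an inserted noun phrase would be discounted too.

```python
from typing import List, Optional, Tuple

# Assumption: a tiny English article list is enough for this illustration.
ARTICLES = {"a", "an", "the"}


def find_svc(
    tokens: List[str], verb: str, predicate: str, max_gap: int = 4
) -> Optional[Tuple[int, int, List[str]]]:
    """Return (verb_index, predicate_index, inserts) for the first match, else None.

    The verb and the predicate may be separated by up to `max_gap` tokens.
    Tokens in the gap that are not articles are reported as inserts.
    """
    lowered = [t.lower() for t in tokens]
    for i, tok in enumerate(lowered):
        if tok != verb:
            continue
        # Look ahead for the predicate within the allowed window.
        for j in range(i + 1, min(len(lowered), i + 2 + max_gap)):
            if lowered[j] == predicate:
                gap = tokens[i + 1:j]
                inserts = [t for t in gap if t.lower() not in ARTICLES]
                return i, j, inserts
    return None


example = "we are taking a growing interest in".split()
print(find_svc(example, verb="taking", predicate="interest"))
# (2, 5, ['growing'])
```

A contiguous SVC such as make a presentation yields an empty insert list, because the only token in the gap is an article.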


Support Verb Construction Types in Our Corpus

Nominal Support Verb Construction (NSVC): make a presentation
Adjectival Support Verb Construction (ADJSVC): be meaningful
Non-contiguous nominal (NON-CONT NSVC): have [ADV+ADJ-particularly good] links
Prepositional nominal (PREPNSVC): give an illustration of
Non-contiguous prepositional nominal (NON-CONT PREPNSVC): be the [ADJ-immediate] cause of
Idiomatic nominal (IDIOM NSVC): set in motion, place at risk, go on strike
Idiomatic prepositional nominal (IDIOM PREPNSVC): earn an income of
Non-contiguous idiomatic nominal (NON-CONT IDIOM NSVC): hold [NP-the option] in place, be of [ADJ-practical] value
Non-contiguous idiomatic prepositional nominal (NON-CONT IDIOM PREPNSVC): give [PRO-us] a [bird’s-eye] view of, be [ADV-clearly] at odds with, open talks [May 14] with

Non-contiguous adjectival (NON-CONT ADJSVC): be [ADV-extremely] selective
Prepositional adjectival (PREPADJSVC): be known as; be involved in
Non-contiguous prepositional adjectival (NON-CONT PREPADJSVC): fall [ADV-so far] short of

Evaluation Task: Setup

• each SVC was annotated according to the SVC taxonomy
• the SVC corpus was translated into FR, GE, IT, PT and ES, using the OL and the GT systems
• native linguists evaluated the SVC translation quality for each target language and classified each translation according to a binary evaluation metric:
  – OK
  – ERR (agreement, morphology-related or other problems, such as incorrect prepositions or wrong word order)
• a comprehensive qualitative evaluation of mistranslations according to the different types of SVC was provided
• neither system was trained for the task; the texts were not domain-specific

Quantitative Results

Lang. pair  System  OK  ERR  Agreem.  Other
EN-FR       GT      64  32   4        -
            OL      51  48   1        -
EN-GE       GT      37  46   3        14
            OL      60  33   1        6
EN-IT       GT      61  31   -        8
            OL      43  52   -        5
EN-PT       GT      68  27   5        -
            OL      41  58   1        -
EN-ES       GT      51  41   6        2
            OL      25  70   3        2

Results for translation of the 100 SVC in our corpus into FR, GE, IT, PT, and ES with the OL and the GT MT systems
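
The figures in the table can be cross-checked with a few lines of Python. This is only a sketch: the dictionary layout and helper names are assumptions, and a "-" cell is read as 0. Each row sums to 100, which suggests that the four columns partition the 100 SVC; that reading is inferred from the numbers rather than stated on the slide.

```python
# (language pair, system) -> (OK, ERR, Agreem., Other); "-" is read as 0.
RESULTS = {
    ("EN-FR", "GT"): (64, 32, 4, 0),
    ("EN-FR", "OL"): (51, 48, 1, 0),
    ("EN-GE", "GT"): (37, 46, 3, 14),
    ("EN-GE", "OL"): (60, 33, 1, 6),
    ("EN-IT", "GT"): (61, 31, 0, 8),
    ("EN-IT", "OL"): (43, 52, 0, 5),
    ("EN-PT", "GT"): (68, 27, 5, 0),
    ("EN-PT", "OL"): (41, 58, 1, 0),
    ("EN-ES", "GT"): (51, 41, 6, 2),
    ("EN-ES", "OL"): (25, 70, 3, 2),
}


def rows_sum_to(results, total=100):
    """Check that every row accounts for exactly `total` SVC."""
    return all(sum(counts) == total for counts in results.values())


def mean_ok(results, system):
    """Average number of OK translations for one system across language pairs."""
    ok = [counts[0] for (_, s), counts in results.items() if s == system]
    return sum(ok) / len(ok)


def best_system(results, pair):
    """System with the higher OK count for a given language pair."""
    return max(("GT", "OL"), key=lambda s: results[(pair, s)][0])


print(rows_sum_to(RESULTS))
for system in ("GT", "OL"):
    print(system, mean_ok(RESULTS, system))
for pair in ("EN-FR", "EN-GE", "EN-IT", "EN-PT", "EN-ES"):
    print(pair, best_system(RESULTS, pair))
```

Under these assumptions, OL has the higher OK count only for EN-GE, while GT is ahead for the four Romance target languages.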

Linguistic Evaluation: EN-GE

• OL translates more SVC correctly than GT
• incorrect translations (for both systems) concern:
  – word choice, incl. most prepositions: lexical (L)
  – word order, incl. incorrect clause segmentation: order (O)
  – word form, incl. choice between bare infinitive and to-infinitive: morphology (M)
  – missing word, mainly auxiliary and main verb: ellipsis (E)
• GT has more lexical, morphology and missing-word errors than OL
• GT lexical coverage is poor with respect to contiguous SVC
• GT does not translate the GE verb split well (even after reordering)

Linguistic Evaluation: EN-FR/IT/PT/ES

• GT translates more SVC correctly than OL
• most translation errors by both systems involved:
  – incorrect lexical choice for some or all of the elements of the SVC (non-translation or literal translation)
  – wrong agreement (subject-verb, subject-predicate adjective)
  – non-contiguous and idiomatic SVC
  – less idiomatic SVC: problems with (i) prepositions; (ii) literal translation of the support verb; and (iii) wrong lexical choice for the predicate noun
  – preposition and determiner assignment, which requires minor post-editing corrections (e.g., prepositional adjectival SVC)

Linguistic Evaluation: Conclusions

• In general, SVC problems by GT were more structural, while SVC problems by OL were more lexical
• OL would easily translate contiguous and non-contiguous SVC correctly, provided they were added to its dictionary and rule database
• OL is able to resolve the SVC internal modifiers better than GT, which drops some of the source meaning in the translation
• OL use of linguistic knowledge in its structural analysis is a powerful feature that can make OL performance for the Romance languages as satisfactory as that for GE
• Higher quality translation can be achieved if we combine:
  – OL ability to translate different surface structures of a sentence
  – GT rich word selection, powered by sophisticated statistical methods to extract knowledge from large volumes of parallel data

Proposal for Semantico-Syntactic Knowledge Integration into SMT

• In the OL system, linguistic elements are represented in a semantico-syntactic abstraction language (SAL) with ontological properties
• SAL represents the heart of OL, accounting for its effectiveness in parsing and semantic understanding [Scott, 2003] [Barreiro et al., 2011] [Barreiro et al., 2014]
  – http://www.l2f.inesc-id.pt/~abarreiro/openlogos-tutorial/INDEX.HTM
• SAL is hierarchical, made up of supersets, sets and subsets
• SAL knowledge is encoded in the lexicon, both in the dictionary entries and in the rules
• Bilingual dictionaries with SAL knowledge are available at:
  – http://metanet4u.l2f.inesc-id.pt
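
One way to picture the superset/set/subset hierarchy is as a code that can be left underspecified at the lower levels, so that a single rule written at superset level covers every word classified beneath it. The sketch below is purely illustrative: `SalCode`, its field names, and the labels such as "superset-1" are placeholders and do not reproduce the actual SAL inventory or the OpenLogos data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SalCode:
    """Hypothetical three-level code: superset > set > subset."""

    word_class: str
    superset: str
    set_: Optional[str] = None
    subset: Optional[str] = None

    def covers(self, other: "SalCode") -> bool:
        """True if this (possibly underspecified) code subsumes `other`."""
        if (self.word_class, self.superset) != (other.word_class, other.superset):
            return False
        if self.set_ is None:
            return True  # superset-level code covers everything below it
        if self.set_ != other.set_:
            return False
        return self.subset is None or self.subset == other.subset


# Placeholder labels, not real SAL categories.
rule_level = SalCode("N", "superset-1")
word_level = SalCode("N", "superset-1", "set-1a", "subset-1a-i")

print(rule_level.covers(word_level))  # True
print(word_level.covers(rule_level))  # False
```

Matching at the most general level that still disambiguates is one plausible reading of how an abstraction language can lessen the sparseness of linguistic examples: fewer rules are needed than with one rule per surface word.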

• In OL, all natural language (NL) input sentences are converted into SAL patterns, which represent the semantico-syntactic and morphological features of each word
• SAL elements interact with semantico-syntactic rules called SEMTAB rules, which
  – represent the meaning of words on the basis of their association with other words (context)
  – disambiguate the meanings of words in the source text by identifying the syntactic structures underlying each meaning
  – provide the target language equivalents of each identified meaning of a source language word
  – are conceptual and encode deep structure relations


• SEMTAB rules are called after dictionary look-up and during the execution of target transfer rules (TRAN rules) to solve ambiguity problems (verb dependencies) and multiwords, overriding the default dictionary transfer
• When a sentence is being parsed by TRAN, OL sends the SAL patterns to the SEMTAB database to look for a rule match
• If a rule exists for a linguistic string, TRAN uses that rule and overrides the dictionary transfer for that string
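
The override behaviour described above can be sketched as a lookup that prefers a contextual rule over the default dictionary transfer. This is a toy model, not the OpenLogos implementation: the real system matches SAL patterns rather than surface words, and the names `DICTIONARY`, `SEMTAB`, and `transfer` are assumptions. The entries reuse the slide example apply paint to (PT: aplicar tinta a / pintar).

```python
from typing import Dict, List, Tuple

# Default word-by-word transfer (toy EN-PT entries taken from the slide example).
DICTIONARY: Dict[str, str] = {"apply": "aplicar", "paint": "tinta", "to": "a"}

# Contextual rules keyed by a token sequence; here the paraphrased translation.
SEMTAB: Dict[Tuple[str, ...], str] = {("apply", "paint", "to"): "pintar"}


def transfer(tokens: List[str]) -> List[str]:
    """Translate tokens, letting the longest matching rule override the dictionary."""
    output: List[str] = []
    i = 0
    while i < len(tokens):
        best = None
        for pattern, target in SEMTAB.items():
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                if best is None or len(pattern) > len(best[0]):
                    best = (pattern, target)
        if best is not None:
            output.append(best[1])  # rule match: override dictionary transfer
            i += len(best[0])
        else:
            # No rule: fall back to the dictionary, or pass the word through.
            output.append(DICTIONARY.get(tokens[i], tokens[i]))
            i += 1
    return output


print(transfer(["apply", "paint", "to"]))  # ['pintar']
print(transfer(["apply", "paint"]))        # ['aplicar', 'tinta']
```

When no rule matches, the output is the plain dictionary transfer, which mirrors the fallback order described in the bullets above.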


• A string can maintain the SVC structure or be paraphrased
  – apply paint to
  – PT: aplicar tinta a / pintar
• The SEMTAB rule applies to different surface structures of the SVC and any insert specified in the rule
  – they applied immediately red paint (immediately) to
  – PT: aplicaram imediatamente tinta vermelha a


• As long as the SEMTAB rule exists in the database, OL can correctly process and translate all the SVC in our corpus that were translated incorrectly (by OL and GT)
• The OL method can overcome the structural problems presented by SMT for both contiguous and non-contiguous SVC, regardless of how far apart their elements occur in the sentence
• The OL methodology applies to any type of multiword and allows the translation of other context-sensitive challenges


Conclusions and Future Work

• Multiwords (SVC) are responsible for most translation errors
  – researchers need to develop approach-independent, systematic linguistic quality evaluation metrics with phased error categorization tasks, where specific linguistic phenomena (such as SVC) can be evaluated individually in stages by MT expert linguists
• fine-grained error categorization can contribute to more controlled and systematic evaluation tasks
• evaluation needs to target each group of linguistic errors and identify which system has more difficulties translating each type of linguistic challenge (paradigmatic evaluation)

• evaluation tasks require the construction of corpora to test grammatical correctness, addressing individual linguistic phenomena
  – different types of multiwords, relative constructions, passives, pronouns, determiners, locative prepositions, etc.
• TOWARDS HYBRIDIZATION
  – the question “how effectively can rule-based and statistical MT be combined?” can only be answered after linguistic quality evaluation metrics are developed and validated by the MT community
    • no effective hybridization can take place before linguistic evaluation of the results provided by different approaches is successfully accomplished


Thank you!

This research was supported by FCT (Fundação para a Ciência e Tecnologia) through grant SFRH/BPD/91446/2012 and project PEst-OE/EEI/LA0021/2013.

Date post:	24-Apr-2015
Upload:	anabela-barreiro