Machine Translation MÖSG vt 2004

Date post: 21-Mar-2016
Page 1: Machine Translation MÖSG vt 2004

Machine Translation
MÖSG vt 2004

Anna Sågvall Hein

Page 2: Machine Translation MÖSG vt 2004

@Anna Sågvall Hein, MÖSG 2004

Can computers translate?

Not a simple yes or no
• depends on the text
• the purpose of the translation
• the required quality

Page 3: Machine Translation MÖSG vt 2004

Classical problems with MT

• unrealistic expectations
• bad translations
• difficulties in integrating MT in the work flow
  – the Ericsson case

Page 5: Machine Translation MÖSG vt 2004

Basic translation strategies

• direct translation
• transfer-based translation
• statistical translation
• combined strategies

Page 6: Machine Translation MÖSG vt 2004

Direct translation, 1

• no intermediary sentence structure

• the most important language component is a translation dictionary

• translation proceeds mostly word by word, or phrase by phrase

• translation problems are handled more or less case by case by means of specific rules
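
As a rough illustration of the strategy above, here is a minimal Python sketch of direct translation: word-by-word lookup in a translation dictionary, plus one specific rule handling a multi-word unit. The dictionary entries, the phrase rule, its English rendering, and the name `translate_direct` are all assumptions made for this example; they do not describe any real system.

```python
# Toy Swedish-English translation dictionary (illustrative entries only).
DICTIONARY = {
    "extrapoleringen": "the extrapolation",
    "går": "goes",
    "till": "to",
    "så": "so",
    "här": "here",
}

# Case-by-case rules: multi-word units that must not be translated word by word.
# The English rendering is an assumption made for this sketch.
PHRASE_RULES = {
    ("går", "till", "så", "här"): "is done like this",
}


def translate_direct(sentence, use_rules=True):
    """Translate left to right with no intermediary sentence structure."""
    words = sentence.rstrip(".").lower().split()
    output = []
    i = 0
    while i < len(words):
        matched = False
        if use_rules:
            # Try the longest phrase rule that starts at position i.
            for source, target in sorted(PHRASE_RULES.items(), key=lambda r: -len(r[0])):
                if tuple(words[i:i + len(source)]) == source:
                    output.append(target)
                    i += len(source)
                    matched = True
                    break
        if not matched:
            # Fall back to the dictionary; unknown words are copied unchanged.
            output.append(DICTIONARY.get(words[i], words[i]))
            i += 1
    text = " ".join(output)
    return text[0].upper() + text[1:] + "."
```

Without the phrase rule, `translate_direct("Extrapoleringen går till så här.", use_rules=False)` gives the word-by-word output seen in the Systran example (Ex. 4); with the rule, the particle verb is handled as a unit.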

Page 7: Machine Translation MÖSG vt 2004

Direct translation, 2

• quality
  – typically browsing quality
  – depends on
    • the quality of the translation dictionary
    • the coverage of the translation rules
  – editing quality may be achieved
• problems with
  – ambiguity
  – inflection
  – word order
  – structural differences

Page 8: Machine Translation MÖSG vt 2004

Advanced classical approach (Tucker 1987)

• source text dictionary lookups and morphological analysis
• identification of homographs
• identification of compounds
• identification of noun and verb phrases
• processing of idioms

Page 9: Machine Translation MÖSG vt 2004

Advanced approach, cont.

• processing of prepositions
• subject-predicate identification
• syntactic ambiguity identification
• synthesis and morphological processing of target text
• rearrangement of words and phrases in target text

Page 10: Machine Translation MÖSG vt 2004

Feasibility of the direct translation strategy

Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a sentence grammar and an intermediary structure?

Page 11: Machine Translation MÖSG vt 2004

SYSTRAN

SYStem TRANslation
• developed in the US by Peter Toma
• first version 1969 (Ru-En)
• EC bought the rights to Systran in 1976
• Systran SA, France, is the current owner of the rights to Systran
• currently 18 language pairs, excl. Swedish
• Swedish-->English is being introduced, starting in June 2004 (http://babelfish.altavista.com/)

Page 12: Machine Translation MÖSG vt 2004

Systran, cont.

• more than 1,600,000 dictionary units
• 20 domain dictionaries
• daily use by EC translators, administrators of the European institutions
• originally a direct translation strategy
  – see H&S
• today more of a transfer-based strategy

Page 13: Machine Translation MÖSG vt 2004

Ex. 1: fairly good translation / Systran sv-en

"Enskilda företagare som inte bildat bolag klassificeras hit."

"Individual entrepreneurs that have not formed companies are classified here."

The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.

Page 14: Machine Translation MÖSG vt 2004

Ex. 2: word order problem / Systran sv-en

"När byarna kontaktades hade de inte ens utsatts för influensa."

"When the villages were contacted had they not even been exposed to flu."

The system has not identified the subject and predicate and therefore produces the wrong word order.

Page 15: Machine Translation MÖSG vt 2004

Ex. 3: ambiguity problem / Systran sv-en

"Vad kan vi lära av Arrawetestammen?"

"What can we faith of the Arawete?"

The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

Page 16: Machine Translation MÖSG vt 2004

Ex. 4: ambiguity problem / Systran sv-en

"Extrapoleringen går till så här."

"The extrapolation goes to so here."

The system does not know the particle verb gå till and therefore wrongly translates word by word.

Page 17: Machine Translation MÖSG vt 2004

Motivations for transfer-based translation

• lexical ambiguity
• structural differences

See further Ingo 91 (6), Wikholm (89)

Page 18: Machine Translation MÖSG vt 2004

Transfer-based translation, 1

• intermediary sentence structure
• provides a basis for the systematic handling of grammatical problems and lexical choices
• basic processes
  – analysis
  – transfer
  – generation (synthesis)

Page 19: Machine Translation MÖSG vt 2004

Transfer-based translation, 2

• knowledge-intensive
• language modules
  – dictionary and grammar of source language
  – transfer dictionary and transfer rules
  – dictionary and grammar of target language

Page 20: Machine Translation MÖSG vt 2004

Multra

• transfer-based translation engine
• high quality
• focus on restricted domains
• developed at Uppsala University

Page 22: Machine Translation MÖSG vt 2004

Multra formalisms

intermediary structure
  – feature structure
    • grammatical function & constituency
analysis grammar
  – procedural
transfer
  – unification based (Beskow 93)
synthesis
  – PATR-like style (Beskow 93)
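
To make "unification based" concrete, the following is a minimal Python sketch of unifying two feature structures represented as nested dictionaries. It is a generic textbook-style illustration, not the Multra formalism: it has no variables or shared (re-entrant) values, and the feature names in the usage note are assumptions.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if the two structures clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values: unification fails
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None
```

For example, `unify({"cat": "N", "agr": {"num": "sg"}}, {"agr": {"num": "sg", "def": "indef"}})` merges the two descriptions, while unifying a singular structure with `{"agr": {"num": "pl"}}` fails and returns `None`.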

Page 23: Machine Translation MÖSG vt 2004

Simplistic approach

• sentence splitting
• tokenisation
• handling capital letters
• dictionary look-up and lexical substitution
• copying unknown words, digits, signs of punctuation etc.
• formal editing
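
The steps listed above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the lexicon entries, the regular expressions, and all function names are made up for the example, and "formal editing" is reduced to removing the space before punctuation.

```python
import re

# Toy lexicon (illustrative entries only).
LEXICON = {"detta": "this", "filter": "filter", "ska": "must", "bytas": "be renewed"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    # Words (including digits) and single punctuation marks become tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_token(token):
    target = LEXICON.get(token.lower())
    if target is None:
        return token  # copy unknown words, digits and punctuation
    # Carry an initial capital over to the target word.
    return target.capitalize() if token[0].isupper() else target


def format_tokens(tokens):
    # Formal editing: join tokens, with no space before punctuation.
    text = " ".join(tokens)
    return re.sub(r"\s+([.,;:!?])", r"\1", text)


def translate_simple(text):
    return " ".join(
        format_tokens([translate_token(t) for t in tokenise(s)])
        for s in split_sentences(text)
    )
```

Calling `translate_simple("Detta filter ska bytas. ZF 590 ska bytas!")` substitutes the known words and copies `ZF`, `590` and the punctuation through unchanged.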

Page 24: Machine Translation MÖSG vt 2004

Ex. 1: Multra

Sv. I oljefilterhållaren sitter en överströmningsventil.

En. The oil filter retainer has an overflow valve.

(from the Scania corpus)

sitter – has
adv – subj
subj – obj

Page 25: Machine Translation MÖSG vt 2004

Ex. 2

Sv. Fyll på olja i växellådan.

En. Fill gearbox with oil.

(from the Scania corpus)

fyll på – fill
obj – adv
adv – obj
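
The structural changes in Ex. 1 and Ex. 2 can be written down as transfer rules that swap the verb and relabel grammatical functions. The Python sketch below is one possible encoding, assuming a flat dictionary as the intermediary structure and hand-written analyses; the rule format, the lemma keys and the name `transfer` are assumptions, not Multra's actual rule syntax. Lexical transfer of the nouns is left out.

```python
# Each transfer rule maps a source verb to a target verb and says how the
# grammatical functions of the source clause are relabelled in the target clause.
TRANSFER_RULES = {
    "sitta": {"verb": "have", "roles": {"adv": "subj", "subj": "obj"}},
    "fylla på": {"verb": "fill", "roles": {"obj": "adv", "adv": "obj"}},
}


def transfer(source):
    """Map a source-language feature structure to a target-language one."""
    rule = TRANSFER_RULES[source["verb"]]
    target = {"verb": rule["verb"]}
    for role, value in source.items():
        if role != "verb":
            # Functions not mentioned in the rule keep their label.
            target[rule["roles"].get(role, role)] = value
    return target
```

Given the analysis `{"verb": "sitta", "adv": "oljefilterhållaren", "subj": "överströmningsventil"}`, the rule turns the adverbial into the subject and the subject into the object, matching the role changes listed for Ex. 1.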

Page 26: Machine Translation MÖSG vt 2004

Ex. 3: Multra

Detta filter ska bytas med jämna mellanrum.

This filter must be renewed at regular intervals.

Lexical choices in the context

ska – must
byta – renew
med – at
jämna – regular
mellanrum – interval

Page 27: Machine Translation MÖSG vt 2004

Ex. 4: Multra

Beskrivningen gäller för automatväxellådor med beteckning ZF 4/HP500, 590 och 600.

The description applies to automatic gearboxes with the designations ZF 4/5HP500, 590 and 600.

gäller – applies to
beteckning – the designations

Page 28: Machine Translation MÖSG vt 2004

Feasibility of machine translation

• Re-use of translations
• Quality in relation to purpose
• Sublanguage
• Spell-checked and grammar-checked SL
• Controlled language
• Human-machine interaction
• Evaluation data and criteria

Page 29: Machine Translation MÖSG vt 2004

Re-use of previous translations

• translation memories
• translation dictionaries
• statistical machine translation

Page 30: Machine Translation MÖSG vt 2004

Re-use techniques, 1

• sentence alignment
  – linking source and target sentences pairwise
  – success rate close to 100 %
  – translation memories
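
Once sentences are aligned pairwise, the pairs can be stored and looked up again. The Python sketch below shows one simple way a translation memory lookup might work, using the standard library's `difflib.SequenceMatcher` for fuzzy matching. The two stored pairs are taken from the Multra examples; the threshold of 0.8 and the name `lookup` are assumptions for illustration.

```python
from difflib import SequenceMatcher

# A tiny translation memory of aligned sentence pairs.
MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("Detta filter ska bytas med jämna mellanrum.",
     "This filter must be renewed at regular intervals."),
]


def lookup(sentence, threshold=0.8):
    """Return (target, score) for the most similar stored source sentence,
    or None if no stored sentence reaches the threshold."""
    best = None
    for source, target in MEMORY:
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if best is None or score > best[1]:
            best = (target, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

An exact repetition of a stored sentence returns its translation with score 1.0; a near match is returned with a lower score so that a translator can check it, and an unrelated sentence returns `None`.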

Page 31: Machine Translation MÖSG vt 2004

Re-use techniques, 2

• word alignment
  – linking sub-sentence segments, typically source and target words and phrases, pairwise
  – large-scale processing
  – success rate close to 80 %
  – translation dictionaries
  – statistical machine translation

Page 32: Machine Translation MÖSG vt 2004

A word alignment example

Jag tar mittplatsen, som jag inte tycker om.

I take the middle seat, which I dislike.

jag – I
tar – take
mittplatsen – the middle seat
som – which
jag – I
inte tycker om – dislike

(from Tiedemann 2003)
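
Alignment links like these are the raw material for a translation dictionary. As a small sketch, assuming the links are already available as (source, target) pairs, the Python code below counts them and converts the counts into relative frequencies. The function name `build_dictionary` and the data layout are assumptions; real systems work over very large numbers of links.

```python
from collections import Counter, defaultdict

# Word-alignment links for the example sentence pair above.
LINKS = [
    ("jag", "I"), ("tar", "take"), ("mittplatsen", "the middle seat"),
    ("som", "which"), ("jag", "I"), ("inte tycker om", "dislike"),
]


def build_dictionary(links):
    """Count the links and turn the counts into relative frequencies."""
    counts = defaultdict(Counter)
    for source, target in links:
        counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }
```

With only one sentence pair every entry has a single translation with frequency 1.0; as more aligned text is added, competing translations share the probability mass.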

Page 33: Machine Translation MÖSG vt 2004

Statistical machine translation

• large-scale word alignment
  – raw translation dictionary
• direct translation using the dictionary
  – no translation rules
• smoothing the translation by means of a language model
  – statistically based
• decoding algorithm crucial
• Arabic – English
• Hindi – English
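
The interplay of a raw translation dictionary, a language model and a decoder can be shown with a toy Python example. Everything numeric here is invented for illustration: the translation probabilities, the bigram probabilities and the floor value are assumptions, and the exhaustive word-by-word search (with no reordering) stands in for the far more efficient search that real decoders need. The sentence is the beginning of the Systran Ex. 3.

```python
import itertools
import math

# Raw translation dictionary: P(target | source). Toy values.
TRANSLATIONS = {
    "vad": {"what": 1.0},
    "kan": {"can": 1.0},
    "vi": {"we": 1.0},
    "lära": {"faith": 0.6, "learn": 0.4},
}

# Bigram language model: P(word | previous word). Toy values; unseen bigrams
# get a small floor probability instead of zero.
BIGRAMS = {
    ("<s>", "what"): 0.2, ("what", "can"): 0.3, ("can", "we"): 0.3,
    ("we", "learn"): 0.1, ("we", "faith"): 0.0001,
}
FLOOR = 1e-6


def score(source_words, target_words):
    """Log-probability of a candidate: translation model plus language model."""
    total = 0.0
    previous = "<s>"
    for s, t in zip(source_words, target_words):
        total += math.log(TRANSLATIONS[s][t])
        total += math.log(BIGRAMS.get((previous, t), FLOOR))
        previous = t
    return total


def decode(sentence):
    """Exhaustive search over all word-by-word candidates (no reordering)."""
    source_words = sentence.lower().split()
    options = [list(TRANSLATIONS[w]) for w in source_words]
    best = max(itertools.product(*options), key=lambda c: score(source_words, c))
    return " ".join(best)
```

In this toy setup the dictionary alone prefers "faith" for lära, but the language model makes "we learn" far more likely than "we faith", so `decode("Vad kan vi lära")` picks the verb reading.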

Page 34: Machine Translation MÖSG vt 2004

Quality

• publishing quality
  – high-quality translation, good enough for publishing, typically after inspection and minor editing
• browsing quality
  – low-quality translation, comprehensible, typically not good enough for editing and publishing; may contain grammatical errors, errors in word order, and wrong words

Page 35: Machine Translation MÖSG vt 2004

Translation purposes

• translation
  – publishing quality
• browsing
  – browsing quality
• gisting
  – browsing quality
• drafting
  – publishing/browsing quality?
• cross-language information retrieval
  – browsing quality

Page 36: Machine Translation MÖSG vt 2004

MT as a cross-language communication tool

MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001)

Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.htm)

Page 37: Machine Translation MÖSG vt 2004

Restrictions on the input language

– sublanguage
  • text type
  • domain
– controlled language
– spell checked
– grammar checked

Page 38: Machine Translation MÖSG vt 2004

Typically

• general language – browsing quality

• restricted language – high quality

Page 39: Machine Translation MÖSG vt 2004

Spell checking and grammar checking

• If there are spelling errors or typos in the SL, dictionary search will fail
• If there are grammatical errors in the SL, grammatical analysis will fail

Where and how should spell and grammar checking be accounted for? Before or during the process?
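
One possible answer to "during the process" is to retry a failed dictionary search with the closest known word. The Python sketch below uses the standard library's `difflib.get_close_matches` for this; the lexicon entries, the cutoff of 0.8 and the name `lookup_with_correction` are assumptions for illustration, not a description of any existing system.

```python
from difflib import get_close_matches

# Toy lexicon (illustrative entries only).
LEXICON = {"filter": "filter", "olja": "oil", "växellådan": "the gearbox"}


def lookup_with_correction(word, cutoff=0.8):
    """Look a word up; on failure, retry with the closest lexicon entry."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    candidates = get_close_matches(key, list(LEXICON), n=1, cutoff=cutoff)
    if candidates:
        return LEXICON[candidates[0]]
    return None  # still unknown: the caller decides what to do
```

A misspelling such as "växelådan" is still mapped to the entry for "växellådan", while a word with no close neighbour in the lexicon returns `None`. Checking before translation instead would move the same correction step into a separate pre-editing pass.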

Page 40: Machine Translation MÖSG vt 2004

Controlled language

controlled vocabulary
  – full lexical coverage, e.g. Scania Swedish
controlled grammar
  – full grammatical coverage
language checker
  – e.g. Scania Checker
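
The vocabulary side of a language checker can be illustrated with a very small Python sketch: flag every word that is not in the approved vocabulary. The word list and the name `check_vocabulary` are hypothetical, and this says nothing about how the Scania Checker itself works; a real checker would also cover grammar.

```python
import re

# Hypothetical controlled vocabulary; a real one would list every approved term.
APPROVED = {"fyll", "på", "olja", "i", "växellådan", "detta", "filter", "ska", "bytas"}


def check_vocabulary(sentence):
    """Return the words of the sentence that are not in the approved vocabulary."""
    words = re.findall(r"\w+", sentence.lower())
    return [w for w in words if w not in APPROVED]
```

An empty result means full lexical coverage for that sentence; any word returned would have to be rephrased by the writer or added to the vocabulary.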

Page 41: Machine Translation MÖSG vt 2004

Human intervention

before
  – language checking
during
  – e.g. ambiguity resolution
after
  – post-editing

Page 42: Machine Translation MÖSG vt 2004

Evaluation of MT

• coverage (recall)
• quality (precision)
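
Precision and recall can be computed the same way for any set of proposed items compared with a reference set. The Python sketch below applies them to word-alignment links from the alignment example earlier in the lecture; that choice of unit, the proposed links and the function name are assumptions made for illustration.

```python
def precision_recall(proposed, reference):
    """Precision and recall of a set of proposed items against a reference set."""
    proposed, reference = set(proposed), set(reference)
    correct = len(proposed & reference)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Reference links from the word alignment example; the proposed links are invented.
REFERENCE = {("jag", "I"), ("tar", "take"), ("mittplatsen", "the middle seat"),
             ("som", "which"), ("inte tycker om", "dislike")}
PROPOSED = {("jag", "I"), ("tar", "take"), ("som", "which"), ("om", "dislike")}
```

Here three of the four proposed links are correct (precision 0.75) and three of the five reference links are found (recall 0.6), so `precision_recall(PROPOSED, REFERENCE)` separates how much was covered from how much of the output was right.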

Page 43: Machine Translation MÖSG vt 2004

Current trends in direct translation

re-use of translations
  – translation memories of sentences and sub-sentence units such as words, phrases and larger units
  – example-based translation
  – statistical translation

Will re-use of translations overcome the problems with the direct translation approach that were discussed above?

If so, how can the problems be handled?

Page 44: Machine Translation MÖSG vt 2004

Why machine translation?

• cheaper
• faster
• more consistent
• when it succeeds ...

Page 45: Machine Translation MÖSG vt 2004

Assignment: Hable Con Ella (en-sv)

• Make a general quality assessment of the translation.
• Suggest a possible use of a translation of this kind.
• Identify the steps that were taken in the translation.
• Specify the translation errors that were made and discuss them.
• Suggest improvements in the framework of the direct translation strategy.
• Justify them.
• Formalise them in a framework of your own choice.
• Discuss their general adequacy in the translation of Swedish to English.
