
What’s so hard about translation? Ed Kenschaft University of Maryland UMIACS, CLIP Lab.

Date post: 29-Jan-2016
Transcript
Page 1: What’s so hard about translation?

Ed Kenschaft
University of Maryland
UMIACS, CLIP Lab

Page 2: Translation Needs (1)

Assimilation
- News monitoring
- Intercepts, noisy documents

High recall, low precision

Page 3: Translation Needs (2)

Dissemination
- UN, EU
- Commercial documentation
- Bible translation

High recall & precision

Page 4: Translation Needs (3)

Emergency
- Military
- Medical
- Disaster relief

High precision, moderate recall
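
The three needs differ in how they trade precision against recall. As a rough illustration only, the sketch below treats a translation as a set of content items and scores it against a reference set. The set-based framing, the function name `precision_recall`, and the sample data are assumptions made for this example; the slides do not define how either quantity would be measured.

```python
def precision_recall(produced, reference):
    """Return (precision, recall) of produced items against reference items."""
    produced, reference = set(produced), set(reference)
    correct = produced & reference
    # Precision: share of what was produced that is correct.
    precision = len(correct) / len(produced) if produced else 0.0
    # Recall: share of the reference content that was recovered.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical content items, invented for illustration.
reference = {"earthquake", "north", "road", "closed"}
gist = {"earthquake", "north", "road", "closed", "bridge", "tomorrow"}
careful = {"earthquake", "road", "closed"}

print(precision_recall(gist, reference))     # covers everything, with some noise
print(precision_recall(careful, reference))  # nothing wrong, but one item missing
```

In this toy data the `gist` output resembles the assimilation case (full recall, lower precision), while `careful` resembles the emergency case (full precision, partial recall).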

Page 5: SIL International

- Faith-based Christian organization
- Partner with speakers of languages that have never been written down
- Purposes
  - preserve the language and culture
  - document the language for study
  - translate the Bible and community development materials
- Documented 1400+ languages in 70+ countries

Page 6: Challenges [1]

Ultra-low-density languages
- mostly unwritten
- no large (or small) parallel corpora
- no Bible for bootstrapping

Page 7: Challenges [2]

Untrained translators
- 6th grade education

One trained linguist for 10 languages

Page 8: Challenges [3]

Exceedingly rich domain of discourse
- approximates all of natural language

Genres
- historical narrative
- dialog
- poetry
- personal letters

Topics
- business, politics, sex, relationships, diet …
- no controlled vocabulary

Page 9: Challenges [4]

Demand for 100% accuracy/fluency
- Life-changing lessons
- Easy to misinterpret

Page 10: Challenges [5]

Nearly endless variety of target languages
- ~6800 languages
- ~1400 written, ~5400 unwritten
- ~half will survive next century
- ~2000-3000 remaining

Page 11: Linguistic Variation

- Phonological variation
- Morphological variation
  - three-boys-shot-arrows-at-the-gazelle
- Syntactic variation
  - grammatical markers (e.g. dual, causative)
  - discourse markers (e.g. topic/focus)
  - honorifics

Page 12: Cultural Variation

Cleanse me with hyssop, and I will be clean;
wash me, and I will be whiter than snow.
(Psalm 51:7, NIV)

- What is hyssop?
- What is snow?
- What does it mean to be white?

Page 13: Cultural Variation

Cleanse me with a plant indigenous to the lands of the ancient Near East, used in Jewish religious ceremonies, and I will be whiter than the precipitation that falls like rain when the weather is very cold, which indicates a state of moral purity.

Page 14: Intelligibility ≠ Fidelity (1)

Moses had horns.

Page 15: Intelligibility ≠ Fidelity (2)

Where there is no vision, the people perish. (Proverbs 29:18a, KJ21)

Page 16: Intelligibility ≠ Fidelity (2)

Where there is no vision, the people perish. (Proverbs 29:18a, KJ21)

When people do not accept divine guidance, they run wild. (Pr 29:18a, NLT)

Page 17: Waste of Time?

Can a computer solve all these problems?
- Not on your life

Can a computer replace a translator?
- Limited domains only

What can it do?
- Word-processing
- Data storage & analysis
- First draft?

Page 18: General Approach

- CAT vs. MT
- Linguistically informed systems
- Supervised learning
- Exploit all available resources
  - SL resources
  - Existing TL data

Page 19: Data Representation

Text encoding
- Unicode

Fonts
- Graphite

Interlinear text
- LinguaLinks, Toolbox, FieldWorks
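
One reason text encoding matters for languages written with many diacritics is that Unicode allows the same visible character to be stored as different code point sequences. The sketch below uses Python's standard `unicodedata` module to show this; the choice of the letter é and of NFC as the target form is an assumption made for illustration, not a recommendation taken from the slides.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE

# The two strings render alike but do not compare equal as stored.
print(decomposed == composed)  # False

# Normalizing to one form (here NFC) makes comparison and lookup reliable.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == composed)            # True
print(len(decomposed), len(nfc))  # 2 1
```

Applying one normalization form consistently before storing or comparing text keeps word lists and lexicon lookups from treating such variants as different words.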

Page 20: Elicitation & Analysis

Elicit syntactic & morphological data
- AVENUE, EXPEDITION

Elicit word lists for language survey
- WordSurv

Page 21: SL Resources

Related language adaptation
- CARLA

Projection across word alignment
- GIZA++, Multi-Align, Parser Projection
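
The idea behind projection across word alignment can be sketched in a few lines: annotations available for the source-language text are copied onto the target-language words they are aligned with. The example below is a minimal sketch, not the method of any tool named on the slide. The function `project_tags`, the `(source_index, target_index)` pair format, the part-of-speech labels, and the placeholder target tokens are all assumptions made for illustration.

```python
def project_tags(source_tags, alignment, target_length):
    """Copy each source token's tag to the target tokens aligned with it.

    alignment is a list of (source_index, target_index) pairs.
    Target tokens with no alignment link keep the tag None.
    """
    projected = [None] * target_length
    for src_idx, tgt_idx in alignment:
        # The first link to reach a target token wins; later links are ignored.
        if projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected


source_tokens = ["the", "boys", "shot", "arrows"]
source_tags = ["DET", "NOUN", "VERB", "NOUN"]
target_tokens = ["t1", "t2", "t3"]     # placeholder target-language words
alignment = [(1, 0), (3, 1), (2, 2)]   # "the" has no counterpart in the target

projected = project_tags(source_tags, alignment, len(target_tokens))
print(list(zip(target_tokens, projected)))
# [('t1', 'NOUN'), ('t2', 'NOUN'), ('t3', 'VERB')]
```

A real system would also need a policy for unaligned and many-to-one links; this sketch keeps the first link and leaves unaligned words untagged.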

Page 22: NLG

Rich interlingua
- TBTA (Tod Allman)

Statistical fluency enhancement
- (Sebastian Varges)

Page 23: Evaluation

Need for automation
- Multiplying documents
- Shortage of experts

BLEU
- How well does it work?
- What does it mean?

METEOR
- Stresses recall
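
The difference between a precision-oriented and a recall-oriented metric can be seen in a deliberately simplified sketch. The code below computes only clipped unigram precision (the basic ingredient of BLEU) and unigram recall (the quantity METEOR weights more heavily). It is not an implementation of either metric: it omits BLEU's higher-order n-grams and brevity penalty and METEOR's stemming, synonym matching, and ordering penalty. The function names and the sample candidate are assumptions made for illustration.

```python
from collections import Counter


def clipped_matches(candidate, reference):
    """Count candidate tokens found in the reference, clipped by reference counts."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    return sum(min(count, ref_counts[token]) for token, count in cand_counts.items())


def unigram_precision(candidate, reference):
    """Share of candidate tokens that are matched in the reference."""
    return clipped_matches(candidate, reference) / len(candidate) if candidate else 0.0


def unigram_recall(candidate, reference):
    """Share of reference tokens that are matched by the candidate."""
    return clipped_matches(candidate, reference) / len(reference) if reference else 0.0


reference = "where there is no vision the people perish".split()
candidate = "the people perish".split()  # hypothetical, overly short output

print(unigram_precision(candidate, reference))  # 1.0
print(unigram_recall(candidate, reference))     # 0.375
```

The short candidate scores perfectly on precision while missing most of the reference, which is the gap a recall-oriented metric is meant to expose; BLEU addresses the same weakness differently, through its brevity penalty.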

Page 24: The Limits of NLP

Who knows?
