
MT Challenges Ed Kenschaft University of Maryland kensch at umd.

Date post: 13-Jan-2016
Upload: robert-harrell
Transcript
Page 1: MT Challenges Ed Kenschaft University of Maryland kensch at umd.

MT Challenges

Ed Kenschaft

University of Maryland

kensch at umd

Page 2:

My Perspective

Software engineer
Linguistics student at UMD
Researcher in NLP group
Studied with SIL translators
Analyst with SIL software development

Page 3:

SIL International

Faith-based Christian organization
Partner with speakers of languages that have never been written down
Purposes
    preserve the language and culture
    document the language for study
    translate the Bible and community development materials
Documented 1400+ languages in 70+ countries

Page 4:

Challenges of Bible Translation

Ultra-low-density languages
Nearly endless variety of target languages
    2000-3000 remaining
Exceedingly rich domain of discourse
    approximates all of natural language
Demand for 100% accuracy/fluency
Cultural variation
Intelligibility ≠ Fidelity

Page 5:

Cultural Context

Cleanse me with hyssop, and I will be clean; wash me, and I will be whiter than snow. (Psalm 51:7, NIV)

What is hyssop?
What is snow?
What does it mean to be white?

Cleanse me with a plant indigenous to the lands of the ancient Near East, used in Jewish religious ceremonies, and I will be whiter than the precipitation that falls like rain when the weather is very cold, which indicates a state of moral purity.

Page 6:

Intelligibility ≠ Fidelity

Where there is no vision, the people perish. (Proverbs 29:18a, KJ21)

When people do not accept divine guidance, they run wild. (Pr 29:18a, NLT)

Page 7:

Waste of Time?

Can a computer replace a translator?
    Limited domains only
What can it do?
    Word-processing
    Data storage & analysis
    First draft?

Page 8:

General Approach

CAT vs. MT
Linguistically informed systems
Supervised learning
Exploit all available resources
    SL resources
    Existing TL data

Page 9:

Data Representation

Text encoding
    Unicode
Fonts
    Graphite
Interlinear text
    LinguaLinks, Toolbox, FieldWorks
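
The text-encoding and interlinear-text items above can be illustrated with a minimal sketch. It assumes Unicode NFC normalization as the storage convention and a simple word-level record with one gloss per morpheme. The names `InterlinearWord`, `normalize`, and `render` are hypothetical, and the record layout is not the file format of LinguaLinks, Toolbox, or FieldWorks.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


def normalize(text: str) -> str:
    """Return the NFC form, so 'o' + combining acute becomes a single 'ó'."""
    return unicodedata.normalize("NFC", text)


@dataclass
class InterlinearWord:
    """Hypothetical interlinear record: surface form, morphemes, glosses."""
    form: str
    morphemes: List[str]
    glosses: List[str]

    def __post_init__(self) -> None:
        # Each morpheme needs exactly one gloss for the lines to align.
        if len(self.morphemes) != len(self.glosses):
            raise ValueError("each morpheme needs exactly one gloss")
        self.form = normalize(self.form)
        self.morphemes = [normalize(m) for m in self.morphemes]


def render(word: InterlinearWord) -> str:
    """Lay out morphemes over glosses in padded, aligned columns."""
    widths = [max(len(m), len(g)) for m, g in zip(word.morphemes, word.glosses)]
    top = " ".join(m.ljust(w) for m, w in zip(word.morphemes, widths)).rstrip()
    bottom = " ".join(g.ljust(w) for g, w in zip(word.glosses, widths)).rstrip()
    return top + "\n" + bottom


# Spanish "cantó" typed with a combining accent (decomposed input).
word = InterlinearWord("canto\u0301", ["cant", "-o\u0301"], ["sing", "-3SG.PST"])
print(render(word))
```

Normalizing on input means that two visually identical transcriptions compare equal, which matters once the data is searched or aligned.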

Page 10:

Elicitation & Analysis

Elicit syntactic & morphological data
    AVENUE, EXPEDITION
Elicit word lists for language survey
    WordSurv

Page 11:

SL Resources

Related language adaptation
    CARLA
Projection across word alignment
    GIZA++, Multi-Align, Parser Projection
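
Projection across word alignment can be sketched as follows: given tags for the source-language (SL) tokens and a set of alignment links, each target-language (TL) token takes the tag of the SL token it is aligned to. This is a toy illustration under stated assumptions, not the method of the tools named above. The function `project_tags`, the tag set, and the alignment links are hypothetical, and the links are assumed to be available as (SL index, TL index) pairs however they were produced.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def project_tags(
    sl_tags: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    tl_length: int,
    unknown: str = "UNK",
) -> List[str]:
    """Copy SL tags onto TL positions through (sl_index, tl_index) links.

    A TL token aligned to several SL tokens takes the most frequent tag
    among them (ties go to the tag seen first); an unaligned TL token
    gets the `unknown` placeholder.
    """
    votes = [Counter() for _ in range(tl_length)]
    for sl_index, tl_index in alignment:
        votes[tl_index][sl_tags[sl_index]] += 1
    return [c.most_common(1)[0][0] if c else unknown for c in votes]


# SL sentence "the people perish" with hypothetical part-of-speech tags.
sl_tags = ["DET", "NOUN", "VERB"]
# Hypothetical links: SL "people" -> TL word 0, SL "perish" -> TL word 1.
# TL word 2 has no SL counterpart.
alignment = [(1, 0), (2, 1)]
print(project_tags(sl_tags, alignment, tl_length=3))
```

Unaligned TL tokens are marked rather than guessed, so a later step (or a human analyst) can see exactly where the SL resources gave no evidence.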

Page 12:

NLG

Rich interlingua
    TBTA (Tod Allman)
Statistical fluency enhancement
    (Sebastian Varges)
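
The slides do not describe how the statistical fluency enhancement works. As a generic illustration only, and not the approach attributed to Varges, one common idea is to score candidate TL renderings with a language model trained on existing TL text and keep the highest-scoring one. The sketch below assumes a bigram model with add-one smoothing; the functions `train_bigram`, `log_prob`, and `most_fluent` and the tiny corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Model = Tuple[Counter, Counter, int]


def _tokens(sentence: str) -> List[str]:
    """Whitespace tokens wrapped in sentence-boundary markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram(sentences: List[str]) -> Model:
    """Count bigram contexts and bigrams; also return the vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = _tokens(sentence)
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence: str, model: Model) -> float:
    """Add-one-smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = model
    tokens = _tokens(sentence)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def most_fluent(candidates: List[str], model: Model) -> str:
    """Pick the candidate the model scores as most probable."""
    return max(candidates, key=lambda c: log_prob(c, model))


# Tiny stand-in for "existing TL data".
model = train_bigram(["the people run wild", "the people perish"])
print(most_fluent(["people the perish", "the people perish"], model))
```

A model like this measures fluency only. It says nothing about fidelity to the source, which is the distinction the earlier slides draw.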

Page 13:

The Limits of NLP

Who knows?
    TMI-2004

