The Landscape of Irish Language Technology

Teresa Lynn

ADAPT Centre, Dublin City University

The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

www.adaptcentre.ie

Outline

o Irish Language

o Status of Irish language technology

o Minority languages and social media

o Current Irish LT projects at DCU

o Conclusion

Irish – a minority language

National Language

First Official Language

Census: 2016

Population: 4,761,865

Ability to speak: 1,761,420 people

Daily usage: 73,803 people
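As a rough illustration of why Irish is described as a minority language, the Census 2016 figures above can be turned into proportions. This is a minimal sketch: the variable and function names are assumptions made for illustration, and the only data used are the three figures quoted on the slide.

```python
# Census 2016 figures as quoted on the slide above.
population = 4_761_865
able_to_speak = 1_761_420
daily_speakers = 73_803


def percentage(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


# Share of the whole population that reports an ability to speak Irish.
share_able = percentage(able_to_speak, population)
# Share of the whole population that uses Irish daily.
share_daily = percentage(daily_speakers, population)
# Share of those able to speak Irish who use it daily.
daily_among_able = percentage(daily_speakers, able_to_speak)

print(f"Able to speak Irish:      {share_able:.1f}% of the population")
print(f"Daily usage:              {share_daily:.2f}% of the population")
print(f"Daily usage among 'able': {daily_among_able:.1f}%")
```

On these figures, roughly 37% of the population report being able to speak Irish, but only about 1.5% use it daily.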

Irish Language Features

Word Order = Verb Subject Object

English: `I saw the boy’

Irish: Chonaic mé an buachaill

Gloss: Saw I the boy

Irish Language Features

Vowel Harmony

Caithim – `I spend’

Casaim – `I turn’

Rithfinn – `I would run’

D’íosfainn – `I would eat’
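The four forms above show the suffix changing shape to agree with the last vowel of the stem: a slender vowel (e, i) takes -im / -finn, a broad vowel (a, o, u) takes -aim / -fainn. The sketch below illustrates that choice only. The function names and the suffix table are assumptions made for illustration, it covers just the two endings shown on the slide, and it does not handle the D’ prefix or any other initial mutation.

```python
# Slender and broad vowels, including their long (accented) forms.
SLENDER = set("eiéí")
BROAD = set("aouáóú")

# Hypothetical suffix table limited to the two endings on the slide.
SUFFIXES = {
    "present_1sg": {"slender": "im", "broad": "aim"},
    "conditional_1sg": {"slender": "finn", "broad": "fainn"},
}


def last_vowel_quality(stem: str) -> str:
    """Return 'slender' or 'broad' according to the last vowel in the stem."""
    for ch in reversed(stem.lower()):
        if ch in SLENDER:
            return "slender"
        if ch in BROAD:
            return "broad"
    raise ValueError(f"no vowel found in stem {stem!r}")


def inflect(stem: str, form: str) -> str:
    """Attach the suffix variant that agrees with the stem's last vowel."""
    return stem + SUFFIXES[form][last_vowel_quality(stem)]


print(inflect("caith", "present_1sg"))      # slender stem
print(inflect("cas", "present_1sg"))        # broad stem
print(inflect("rith", "conditional_1sg"))   # slender stem
print(inflect("íos", "conditional_1sg"))    # broad stem, D’ prefix not added
```

For an NLP tool this means that a single grammatical ending surfaces in more than one spelling, which a morphological analyser or stemmer has to account for.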

Some Terminology Issues

Irish = minority language (spoken by the minority)

Irish = low-resourced language (lacking language tools and resources)

BUT

Does “low-resourced” always mean “minority”?

Tagalog (Philippines)

• 21 million L1 speakers

• 50 million L2 speakers

Not a minority language…

…but is considered low-resourced

Some Terminology Issues

Irish = A minority European Language

Irish = A low-resourced European Language

Irish language technology survey

META-NET white paper series (Judge et al., 2012)

o EU-led study

o Survey of 31 EU languages

o Language resources and technologies

MT
excellent: none
good: English
moderate: French, Spanish
fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
excellent: none
good: English
moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analysis
excellent: none
good: English
moderate: Dutch, French, German, Italian, Spanish
fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
excellent: none
good: English
moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh

Examples of existing resources

o Speech synthesizer / Screen Reader

o Multiple electronic dictionaries, terminology DBs

o POS tagger / Morphological analyser / stemmer

o POS tagged corpus, Dependency treebank, Spoken Corpus, Parallel Data, Monolingual corpus (30 million words), Vicipéid (43k articles), DBpedia

o POS tagged Twitter corpus, POS-tagger for Irish tweets

o Chunking parser, statistical parser

o Basic CALL systems

o 2 x Machine Translation systems (one in use by Government translators)

Examples of unfunded contributions (Kevin Scannell)

o Spell-checker for Irish

o Grammar Checker for Irish

o Localisation of: GNU/Linux, Mozilla, OpenOffice, Gmail, Facebook, Twitter

o Web-corpus collection

o English-Irish SMT / Irish-Scots Gaelic SMT

o Indigenous Tweets site

o Irish Web crawler

o WordNet for Irish

o Code.org in Irish

o Predictive Text Tool for Irish

Language at Risk – in Digital Age

“Printing Press resulted in the extinction of many minority and regional languages”

Will technology have the same impact on Irish?

Language at Risk – in Digital Age

Need to ensure continuing language usage… through technology

o Edutainment packages

o Word processing tools

o Web page translation

o Search engines

o Games

o Social media

o Sociolinguistic study

o Track misuse

Source: http://www.leuphana.de/institute/ies/llt2015.html

Digital Strategy for the Irish Language 2017

Contributors:

o Teresa Lynn, Dublin City University

o John Judge, Dublin City University

o Elaine Uí Dhonnchadha, Trinity College Dublin

o Neasa Ní Chiaráin, Trinity College Dublin

o Ailbhe Ní Chasaide, Trinity College Dublin

Digital Strategy for the Irish Language 2017

Topics Covered:

o Linguistic Resources

o Corpora

o Knowledge Bases

o NLP Tools

o NLG Tools

o Speech Models

o Speech Synthesis

o Speech Recognition

o Spoken Dialogue Systems

o Machine Translation

o Information Retrieval

o State and Public Use

o CALL

o Disability and Access

o Synergies (Industry and Public)

Irish on Twitter

2 million Irish language tweets

Irish on Twitter

Source: indigenoustweets.com

an t-am seo an t7ain seo chugainn bei 2 ag partyáil le muintir Ráth Daingin! Hope youre not too scared #upthevillage

an t-am seo an tseachtain seo chugainn, beidh tú ag partyáil le muintir Ráth Daingin! Hope youre not too scared #upthevillage
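The two versions of the tweet above differ in only a few tokens: text-speak spellings in the first are replaced by standard Irish forms in the second. One simple way to sketch that kind of normalisation is a token lookup table. The code below is illustrative only and is not the method used in the projects described here: the lexicon entries are read off this one example, the names are hypothetical, and the word spacing of the raw tweet is assumed.

```python
# Hypothetical lookup table built from the single example tweet above.
# A usable lexicon would need many more entries and context-sensitivity:
# "2" is only "tú" in text-speak, not when it is a real number.
NORMALISATION_LEXICON = {
    "t7ain": "tseachtain",  # 7 = "seacht", so t7ain = tseachtain
    "bei": "beidh",
    "2": "tú",
}


def normalise(tweet: str) -> str:
    """Replace known non-standard tokens; leave everything else unchanged."""
    tokens = tweet.split()
    return " ".join(NORMALISATION_LEXICON.get(tok, tok) for tok in tokens)


raw = "an t-am seo an t7ain seo chugainn bei 2 ag partyáil le muintir Ráth Daingin!"
print(normalise(raw))
```

The sketch leaves the code-switched word "partyáil" and the English clause untouched, and it does not insert the comma that appears in the normalised version, which shows how much a plain dictionary lookup leaves unsolved for user-generated Irish text.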

Basque: 10,490,641 tweets

Kapampangan: 2,182,515 tweets

Kiswahili: 8,187,127 tweets

Welsh: 5,602,170 tweets

Irish: 1,718,687 tweets

Frisian: 905,259 tweets

Setswana: 787,990 tweets

Asturianu: 559,652 tweets

Hausa: 436,244 tweets

Yorùbá: 288,513 tweets

Ikinyarwanda: 355,397 tweets

March 2017

Current Irish LT Projects at DCU

o Tapadóir SMT project (PhD student – Meghan Dowling)

o European Language Resource Coordination

o Code-switching in Irish tweets

o Universal Dependencies for Irish

Current Irish LT Projects at DCU

GaelTech Project (2017-2021)

o Automatic Identification of Multiword Expressions (PhD student, Abigail Walsh)

o Irish User-Generated Content

o Dependency Treebank(s) expansion

Conclusion

The landscape of Irish language technology has improved… How?

Influenced Government Policy through:

o online use

o demand for technology

o empirically demonstrating evolution of language

o starting off with pilot systems and demonstrating the benefits of LT

o teaming up with other (similar) minority languages

o engaging with larger NLP projects (e.g. UD, COST Action)

o organising workshops for sharing knowledge/collaborations/networking

#GRMA

Go raibh maith agaibh

Thank you (pl)