
Morphologytbergkir/11711fa17/morphology-F...•Affixes(prefixes, suffixes, infixes, and...

Date post: 23-Jan-2021
Morphology 11-711 Algorithms for NLP 21 November 2017 – Part I (Some slides from Lori Levin, David Mortenson)
Transcript
Page 1

Morphology

11-711 Algorithms for NLP, 21 November 2017 – Part I

(Some slides from Lori Levin, David Mortenson)

Page 2

Types of Lexical and Morphological Processing

• Tokenization
  • Input: raw text
  • Output: sequence of tokens normalized for further processing

• Recognition
  • Input: a string of characters
  • Output: is it a legal word? (yes or no)

• Morphological Parsing
  • Input: a word
  • Output: an analysis of the structure of the word

• Morphological Generation
  • Input: an analysis of the structure of the word
  • Output: a word

Page 3

But first: What is a word?

• The things that are in the dictionary?
  • But how did the lexicographers decide what to put in the dictionary?

• The things between spaces and punctuation?
• The smallest unit that can be uttered in isolation?
  • You could say this word in isolation: Unimpressively
  • This one too: impress
  • But you probably wouldn’t say these in isolation, unless you were talking about morphology:
    • un
    • ive
    • ly

Page 4

So what is a word?

• Can get pretty tricky:
  • didn’t
  • would’ve
  • gonna
  • shoulda woulda coulda
  • Ima
  • blackboard (vs. school board)
  • baseball (vs. golf ball)
  • the person who left’s hat; Jim and Gregg’s apartment
  • acct.
  • LTI

Page 5

About 1000 pages. $139.99

You don’t have to read it.

The point is that it takes 1000 pages just to survey the issues related to what words are.

Page 6

So what is a word?

• It is up to you or the software you use for processing words.
• Take linguistics classes.
• Make good decisions in software design and engineering.

Page 7

Tokenization

Page 8

Tokenization

Input: raw text
Output: sequence of tokens normalized for easier processing.

Page 9-11

Tokenization

• Some Asian languages have obvious issues: [example text lost in extraction]
• But German too: noun-noun compounds: Gesundheitsversicherungsgesellschaften = Gesundheits-versicherungs-gesellschaften (health insurance companies)
• Spanish clitics: Dármelo = Dar-me-lo (to give me it)
• Even English has issues, to a smaller degree: Gregg and Bob’s house

Page 12

Tokenization

Input: raw text

Dr. Smith said tokenization of English is “harder than you’ve thought.” When in New York, he paid $12.00 a day for lunch and wondered what it would be like to work for AT&T or Google, Inc.

Output from Stanford Parser (http://nlp.stanford.edu:8080/parser/index.jsp) with part-of-speech tags:

Dr./NNP Smith/NNP said/VBD tokenization/NN of/IN English/NNP is/VBZ ``/`` harder/JJR than/IN you/PRP 've/VBP thought/VBN ./. ''/'' When/WRB in/IN New/NNP York/NNP ,/, he/PRP paid/VBD $/$ 12.00/CD a/DT day/NN for/IN lunch/NN and/CC wondered/VBD what/WP it/PRP would/MD be/VB like/JJ to/TO work/VB for/IN AT&T/NNP or/CC Google/NNP ,/, Inc./NNP ./.
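
To make the decisions behind output like this concrete, here is a minimal regex-based tokenizer sketch. The abbreviation list and token patterns are assumptions chosen for this one sentence, not the Stanford Parser's actual rules. It does not rewrite quotes as `` and '' tokens, and it does not split n't.

```python
import re

# Hypothetical, hand-picked abbreviation list; a real tokenizer needs far more.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Inc.", "acct."}

TOKEN_RE = re.compile(r"""
    \$                          # currency sign as its own token
  | \d+(?:\.\d+)?               # number, e.g. 12.00
  | [A-Za-z]+(?:&[A-Za-z]+)+    # name with an ampersand, e.g. AT&T
  | [A-Za-z]+\.                 # word plus period: possibly an abbreviation
  | '(?:ve|s|re|ll|d|m)\b       # clitic split off its host word
  | [A-Za-z]+                   # plain word
  | \S                          # any other single non-space character
""", re.VERBOSE)


def tokenize(text):
    """Split raw text into tokens, keeping known abbreviations whole."""
    text = text.replace("\u2019", "'")  # normalize the curly apostrophe
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if tok.endswith(".") and len(tok) > 1 and tok not in ABBREVIATIONS:
            # Not a known abbreviation: the period is sentence punctuation.
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens


print(tokenize("Dr. Smith paid $12.00 a day for lunch at AT&T."))
```

Every alternative in the pattern is a design decision of the kind the earlier slides discuss: whether "Dr." is one token, whether "$12.00" is one token or two, and where a clitic such as 've is cut.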

Page 13

Morphological Phenomena

Page 14

What is Linguistic Morphology?

• Morphology is the study of the internal structure of words.

• Derivational morphology. How new words are created from existing words.
  • [grace]
  • [[grace]ful]
  • [un[[grace]ful]]

• Inflectional morphology. How features relevant to the syntactic context of a word are marked on that word.
  • This example illustrates number (singular and plural) and tense (present and past).
  • Green indicates irregular. Blue indicates zero marking of inflection. Red indicates regular inflection.
  • This student walks.
  • These students walk.
  • These students walked.

• Compounding. Creating new words by combining existing words.
  • With or without spaces: surfboard, golf ball, blackboard

Page 15

Morphemes

• Morphemes. Minimal pairings of form and meaning.

• Roots. The “core” of a word that carries its basic meaning.
  • apple: ‘apple’
  • walk: ‘walk’

• Affixes (prefixes, suffixes, infixes, and circumfixes). Morphemes that are added to a base (a root or stem) to perform either derivational or inflectional functions.
  • un-: ‘NEG’
  • -s: ‘PLURAL’

Page 16

Language Typology

Page 17

Types of Languages

• In order of morphological complexity:
  • Isolating (or Analytic)
  • Fusional (or Inflecting)
  • Agglutinative
  • Polysynthetic
  • Others

Page 18

Isolating Languages: Chinese
Little morphology other than compounding

• Chinese inflection
  • few affixes (prefixes and suffixes):
    • 们 mén: 我们 wǒmén, 你们 nǐmén, 他们 tāmén, 同志们 tóngzhìmén
      plural: ‘we’, ‘you (pl.)’, ‘they’, ‘comrades, LGBT people’
    • “suffixes” that mark aspect: 着 -zhe ‘continuous aspect’
• Chinese derivation
  • 艺术家 yìshùjiā ‘artist’
• Chinese is a champion in the realm of compounding: up to 80% of Chinese words are actually compounds.

毒 dú ‘poison, drug’ + 贩 fàn ‘vendor’ → 毒贩 dúfàn ‘drug trafficker’

Page 19

Agglutinative Languages: Swahili
Verbs in Swahili have an average of 4-5 morphemes (http://wals.info/valuesets/22A-swa)

Swahili             English
m-tu a-li-lala      ‘The person slept’
m-tu a-ta-lala      ‘The person will sleep’
wa-tu wa-li-lala    ‘The people slept’
wa-tu wa-ta-lala    ‘The people will sleep’

• Words written without hyphens or spaces between morphemes.
• Orange prefixes mark noun class (like gender, except Swahili has nine instead of two or three).
  • Verbs agree with nouns in noun class.
  • Adjectives also agree with nouns.
  • Very helpful in parsing.
• Black prefixes indicate tense.
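
The verb forms in the table can be segmented with a few lines of code because each slot holds exactly one morpheme. The sketch below is a toy that covers only the four example verbs; the gloss labels (CL1, CL2, PAST, FUT) and the one-entry root list are assumptions made for illustration.

```python
# Prefix inventories read off the four example verbs; glosses are my shorthand.
AGREEMENT_PREFIXES = {"a": "CL1", "wa": "CL2"}   # agreement with m-tu / wa-tu
TENSE_PREFIXES = {"li": "PAST", "ta": "FUT"}
ROOTS = {"lala": "sleep"}


def analyze(verb):
    """Return every (segmentation, gloss) pair that the toy lexicon allows."""
    analyses = []
    for agr, agr_gloss in AGREEMENT_PREFIXES.items():
        if not verb.startswith(agr):
            continue
        rest = verb[len(agr):]
        for tense, tense_gloss in TENSE_PREFIXES.items():
            if not rest.startswith(tense):
                continue
            root = rest[len(tense):]
            if root in ROOTS:
                analyses.append((
                    f"{agr}-{tense}-{root}",
                    f"{agr_gloss}-{tense_gloss}-{ROOTS[root]}",
                ))
    return analyses


for written_form in ("alilala", "atalala", "walilala", "watalala"):
    print(written_form, analyze(written_form))
```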

Page 20

Turkish
Example of extreme agglutination
But most Turkish words have around three morphemes

uygarlaştıramadıklarımızdanmışsınızcasına
“(behaving) as if you are among those whom we were not able to civilize”

• uygar “civilized”
• +laş “become”
• +tır “cause to”
• +ama “not able”
• +dık past participle
• +lar plural
• +ımız first person plural possessive (“our”)
• +dan ablative case (“from/among”)
• +mış past
• +sınız second person plural (“y’all”)
• +casına finite verb → adverb (“as if”)
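
Agglutination here is plain concatenation: joining the listed morphemes in order gives back the written word. The short check below uses the morphemes and glosses from the slide; only the variable names are invented.

```python
# (form, gloss) pairs in the order given on the slide.
MORPHEMES = [
    ("uygar", "civilized"),
    ("laş", "become"),
    ("tır", "cause to"),
    ("ama", "not able"),
    ("dık", "past participle"),
    ("lar", "plural"),
    ("ımız", "first person plural possessive"),
    ("dan", "ablative case"),
    ("mış", "past"),
    ("sınız", "second person plural"),
    ("casına", "finite verb -> adverb"),
]

WORD = "uygarlaştıramadıklarımızdanmışsınızcasına"

# Concatenating the forms, with nothing added or changed, rebuilds the word.
rebuilt = "".join(form for form, _ in MORPHEMES)
print(rebuilt == WORD, len(MORPHEMES))
```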

Page 21

Operationalization

• operate (opus/opera + ate)
• ion
• al
• ize
• ate
• ion

Page 22

Fusional Languages: Spanish

             Singular                                Plural
             1st       2nd        3rd / formal 2nd   1st           2nd         3rd
Present      am-o      am-as      am-a               am-a-mos      am-áis      am-an
Imperfect    am-ab-a   am-ab-as   am-ab-a            am-áb-a-mos   am-ab-ais   am-ab-an
Preterit     am-é      am-aste    am-ó               am-a-mos      am-asteis   am-aron
Future       am-aré    am-arás    am-ará             am-are-mos    am-aréis    am-arán
Conditional  am-aría   am-arías   am-aría            am-aría-mos   am-aríais   am-arían

Page 23

Polysynthetic Languages: Yupik

• Polysynthetic morphologies allow the creation of full “sentences” by morphological means.
• They often allow the incorporation of nouns into verbs.
• They may also have affixes that attach to verbs and take the place of nouns.
• Yupik Eskimo
  untu-ssur-qatar-ni-ksaite-ngqiggte-uq
  reindeer-hunt-FUT-say-NEG-again-3SG.INDIC
  ‘He had not yet said again that he was going to hunt reindeer.’

Page 24

Root-and-Pattern Morphology: Arabic

• Root-and-pattern. A special kind of fusional morphology found in Arabic, Hebrew, and their cousins.
• Root usually consists of a sequence of consonants.
• Words are derived and, to some extent, inflected by patterns of vowels intercalated among the root consonants.
  • kitaab ‘book’
  • kaatib ‘writer; writing’
  • maktab ‘office; desk’
  • maktaba ‘library’
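
The four example words share the consonantal root k-t-b, so intercalation can be sketched as filling the consonant slots of a template. The template notation (a "C" for each root consonant) and the function name are assumptions made for this illustration, not a standard encoding.

```python
def interdigitate(root, pattern):
    """Fill each 'C' slot in the pattern with the next root consonant."""
    if pattern.count("C") != len(root):
        raise ValueError("pattern must have one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


ROOT = "ktb"
PATTERNS = {
    "CiCaaC": "book",
    "CaaCiC": "writer; writing",
    "maCCaC": "office; desk",
    "maCCaCa": "library",
}

for pattern, gloss in PATTERNS.items():
    print(interdigitate(ROOT, pattern), gloss)
```

The vowels and the root are interleaved, not concatenated, which is what makes this kind of morphology harder for purely left-to-right affix stripping.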

Page 25

Other Non-Concatenative Morphological Processes

Non-concatenative morphology involves operations other than the concatenation of affixes with bases.
• Infixation. A morpheme is inserted inside another morpheme instead of before or after it.
• Reduplication. Can be prefixing, suffixing, and even infixing.
  • Tagalog:
    • sulat (write, imperative)
    • susulat (reduplication) (write, future)
    • sumulat (infixing) (write, past)
    • sumusulat (infixing and reduplication) (write, present)
• Apophony, including the umlaut in English tooth → teeth; subtractive morphology, including the truncation in English nickname formation (David → Dave); and so on.
• Tone change; stress shift. And more...
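
The Tagalog forms above can be reproduced with two string operations. This is a minimal sketch that assumes the word starts with a single consonant followed by a vowel, as sulat does; the function names are hypothetical.

```python
VOWELS = "aeiou"


def first_vowel_index(word):
    """Index of the first vowel in the word."""
    return next(i for i, ch in enumerate(word) if ch in VOWELS)


def reduplicate(word):
    """Prefix a copy of the first consonant-vowel sequence: sulat -> susulat."""
    i = first_vowel_index(word)
    return word[: i + 1] + word


def infix_um(word):
    """Insert -um- before the first vowel: sulat -> sumulat."""
    i = first_vowel_index(word)
    return word[:i] + "um" + word[i:]


print(reduplicate("sulat"))            # future
print(infix_um("sulat"))               # past
print(infix_um(reduplicate("sulat")))  # present
```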

Page 26

Type-Token Curves

[Figure: type-token curves. x-axis: Tokens (0 to 10000); y-axis: Types (0 to 6000); one curve each for English, Arabic, Hocąk, Iñupiaq, and Finnish.]

Finnish is agglutinative. Iñupiaq is polysynthetic.

Types and Tokens: “I like to walk. I am walking now. I took a long walk earlier too.”

The type walk occurs twice. So there are two tokens of the type walk.

Walking is a different type that occurs once.
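
The counts in this example, and the points of a type-token curve, can be computed directly. The sketch below treats every distinct lowercased word form as a type, matching the slide's treatment of walk and walking as different types; the letters-only tokenization is a simplifying assumption.

```python
import re
from collections import Counter

TEXT = "I like to walk. I am walking now. I took a long walk earlier too."


def word_tokens(text):
    """Lowercased runs of letters; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_curve(tokens):
    """After each token, record (tokens seen so far, distinct types so far)."""
    seen, curve = set(), []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve


tokens = word_tokens(TEXT)
counts = Counter(tokens)
print(counts["walk"], counts["walking"])   # tokens of each type
print(type_token_curve(tokens)[-1])        # final (tokens, types) point
```

A language with richer morphology produces more distinct word forms for the same amount of text, so its curve climbs faster.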

Page 27

Morphological Processing

Page 28

Recognizing the words of a language

• Input: a string (from some alphabet)
• Output: is it a legal word? (yes or no)

Page 29

FSA for English Nouns

Lexicon:

Note: “fox” becomes plural by adding “es” not “s”. We will get to that later.

Page 30

Finite-State Automaton

• Q: a finite set of states
• q0 ∈ Q: a special start state
• F ⊆ Q: a set of final states
• Σ: a finite alphabet
• Transitions: qi → qj, labeled with s ∈ Σ*
• Encodes a set of strings that can be recognized by following paths from q0 to some state in F.
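
The definition above maps almost one-to-one onto code. The following is a minimal recognizer sketch with string-labeled arcs. The toy noun lexicon is an assumption (the slide's own lexicon did not survive extraction), and the recursion assumes there are no cycles of empty-label arcs.

```python
from dataclasses import dataclass, field


@dataclass
class FSA:
    start: str                                  # q0
    finals: set                                 # F
    arcs: dict = field(default_factory=dict)    # state -> [(label, next state)]

    def add_arc(self, src, label, dst):
        self.arcs.setdefault(src, []).append((label, dst))

    def accepts(self, string, state=None):
        """True if some path from the start state to a final state spells string."""
        state = self.start if state is None else state
        if string == "" and state in self.finals:
            return True
        return any(
            string.startswith(label) and self.accepts(string[len(label):], dst)
            for label, dst in self.arcs.get(state, [])
        )


# Hypothetical toy lexicon for English nouns.
nouns = FSA(start="q0", finals={"q1", "q2"})
for stem in ("cat", "dog", "aardvark"):
    nouns.add_arc("q0", stem, "q1")     # regular noun stem
nouns.add_arc("q1", "s", "q2")          # regular plural suffix
for word in ("goose", "geese", "mouse", "mice"):
    nouns.add_arc("q0", word, "q2")     # irregular forms take no suffix

print(nouns.accepts("cats"), nouns.accepts("geese"), nouns.accepts("gooses"))
```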

Page 31

FSA for English Adjectives

But note that this accepts words like “unbig”.

Big, bigger, biggest
Happy, happier, happiest, happily
Unhappy, unhappier, unhappiest, unhappily
Clear, clearer, clearest, clearly
Unclear, unclearly
Cool, cooler, coolest, coolly
Red, redder, reddest
Real, unreal, really

Page 32

FSA for English Derivational Morphology

How big do these automata get? Reasonable coverage of a language takes an expert about two to four months.

What does it take to be an expert? Study linguistics to get used to all the common and not-so-common things that happen, and then practice.

Page 33

Morphological Parsing

Input: a word
Output: the word’s stem(s) and features expressed by other morphemes.

Example:
geese → goose+N+Pl
gooses → goose+V+3P+Sg
dog → {dog+N+Sg, dog+V}
leaves → {leaf+N+Pl, leave+V+3P+Sg}

Page 34

Upper Side/Lower Side

upper side or underlying form: talk+Past
    ↕ FST
lower side or surface form: talked

Page 35

Finite State Transducers

• Q: a finite set of states
• q0 ∈ Q: a special start state
• F ⊆ Q: a set of final states
• Σ and Δ: two finite alphabets
• Transitions: qi → qj, labeled with s:t, where s ∈ Σ* and t ∈ Δ*
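
A transducer differs from an automaton only in that each arc carries a pair of labels. The sketch below is one possible encoding, not a standard library API; the class, its method names, and the two-arc example for talk+Past are assumptions. It also assumes the machine has no cycles of empty input labels.

```python
class FST:
    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        self.arcs = {}                  # state -> [(upper, lower, next state)]

    def add_arc(self, src, upper, lower, dst):
        self.arcs.setdefault(src, []).append((upper, lower, dst))

    def transduce(self, string, state=None):
        """All lower-side strings paired with the given upper-side string."""
        state = self.start if state is None else state
        results = [""] if string == "" and state in self.finals else []
        for upper, lower, dst in self.arcs.get(state, []):
            if string.startswith(upper):
                rest = self.transduce(string[len(upper):], dst)
                results.extend(lower + tail for tail in rest)
        return results

    def invert(self):
        """Swap the two sides of every arc, turning a generator into a parser."""
        inverse = FST(self.start, self.finals)
        for src, arcs in self.arcs.items():
            for upper, lower, dst in arcs:
                inverse.add_arc(src, lower, upper, dst)
        return inverse


fst = FST(start="q0", finals={"q2"})
fst.add_arc("q0", "talk", "talk", "q1")   # stem is the same on both sides
fst.add_arc("q1", "+Past", "ed", "q2")    # feature on top, suffix below

print(fst.transduce("talk+Past"))         # generation: upper to lower
print(fst.invert().transduce("talked"))   # parsing: lower to upper
```

Inversion is why one description of the morphology can serve both analysis and generation.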

Page 36

Morphological Parsing with FSTs

Note “same symbol” shorthand.

^ denotes a morpheme boundary.

# denotes a word boundary.

Page 37

English Spelling
Getting back to fox+s = foxes

Page 38

The E Insertion Rule as a FST

ε → e / {s, x, z} ^ __ s #

Generate a normally spelled word from an abstract representation of the morphemes:

Input: fox^s# (fox^εs#)
Output: foxes# (foxεes#)

Page 39

The E Insertion Rule as a FST

ε → e / {s, x, z} ^ __ s #

Parse a normally spelled word into an abstract representation of the morphemes:

Input: foxes# (foxεes#)
Output: fox^s# (fox^εs#)
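
Running the rule in the parsing direction is ambiguous on its own: foxes# could come from fox^s# with an inserted e, or from a hypothetical stem foxe plus s. A lexicon filters the candidates. The sketch below uses regular expressions, not an actual transducer, and the three-word lexicon is an assumption.

```python
import re

LEXICON = {"fox", "cat", "dog"}   # hypothetical stems


def parse(surface):
    """Map a surface form such as 'foxes#' to the analyses the lexicon allows."""
    candidates = []
    match = re.fullmatch(r"(.*[sxz])es#", surface)
    if match:                                  # undo e-insertion after s, x, z
        candidates.append((match.group(1), "^s#"))
    match = re.fullmatch(r"(.*[^sxz])s#", surface)
    if match:                                  # plain suffix, no e inserted
        candidates.append((match.group(1), "^s#"))
    match = re.fullmatch(r"(.+)#", surface)
    if match:                                  # no suffix at all
        candidates.append((match.group(1), "#"))
    return [stem + tail for stem, tail in candidates if stem in LEXICON]


print(parse("foxes#"), parse("cats#"), parse("fox#"))
```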

Page 40

Combining FSTs

[Figure: combined FSTs, with the two directions labeled “parse” and “generate”.]

Page 41

FST Operations

Input: fox+N+pl
Output: foxes#
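
One way to picture how the input above becomes the output is as a cascade of stages, each feeding the next. The sketch below composes ordinary Python functions, not real transducers; the stage names, the tag-to-suffix table, and the intermediate form fox^s# are assumptions based on the earlier slides.

```python
import re
from functools import reduce

SUFFIXES = {"N+sg": "#", "N+pl": "^s#"}   # hypothetical tag-to-suffix table


def lexical_to_intermediate(lexical):
    """fox+N+pl -> fox^s#"""
    stem, _, tags = lexical.partition("+")
    return stem + SUFFIXES[tags]


def e_insertion(intermediate):
    """Insert e between a stem ending in s, x, or z and the suffix s."""
    return re.sub(r"(?<=[sxz])\^(?=s#)", "^e", intermediate)


def drop_boundaries(intermediate):
    """Remove morpheme boundary symbols to get the surface spelling."""
    return intermediate.replace("^", "")


def compose(*stages):
    """Chain the stages left to right into a single function."""
    return lambda s: reduce(lambda acc, stage: stage(acc), stages, s)


generate = compose(lexical_to_intermediate, e_insertion, drop_boundaries)
print(generate("fox+N+pl"), generate("cat+N+pl"), generate("fox+N+sg"))
```

With true FSTs the composition can be computed once, ahead of time, giving a single machine that maps fox+N+pl to foxes# directly.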

Page 42

Language Type Comparison wrt FSTs

• Morphologies of all types can be analyzed using finite state methods.
• Some present more challenges than others:
  • Analytic languages. Trivial, since there is little or no morphology (other than compounding).
  • Agglutinating languages. Straightforward: finite state morphology was “made” for languages like this.
  • Polysynthetic languages. Similar to agglutinating languages, but with blurred lines between morphology and syntax.
  • Fusional languages. Easy enough to analyze using finite state methods as long as one allows “morphemes” to have lots of simultaneous meanings and one is willing to employ some additional tricks.
  • Root-and-pattern languages. Require some very clever tricks.

Page 43

Stemming (“Poor Man’s Morphology”)

Input: a word
Output: the word’s stem (approximately)

Examples from the Porter stemmer:
• -sses → -ss
• -ies → -i
• -ss → -ss
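
These suffix rules correspond to step 1a of the Porter algorithm, which tries the longest matching suffix first. The sketch below covers only that step (including its final rule, which drops a plain -s), so its output is not the full stemmer's output.

```python
def porter_step_1a(word):
    """Apply the first matching plural rule, longest suffix first."""
    if word.endswith("sses"):
        return word[:-2]      # -sses -> -ss
    if word.endswith("ies"):
        return word[:-2]      # -ies  -> -i
    if word.endswith("ss"):
        return word           # -ss   -> -ss (unchanged)
    if word.endswith("s"):
        return word[:-1]      # -s    -> (dropped)
    return word


for word in ("caresses", "ponies", "caress", "cats"):
    print(word, porter_step_1a(word))
```

Checking -ss before -s is what keeps a word like caress from losing its final s.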

Page 44

Word         Stem
no           no
noah         noah
nob          nob
nobility     nobil
nobis        nobi
noble        nobl
nobleman     nobleman
noblemen     noblemen
nobleness    nobl
nobler       nobler
nobles       nobl
noblesse     nobless
noblest      noblest
nobly        nobli
nobody       nobodi
noces        noce
nod          nod
nodded       nod
nodding      nod
noddle       noddl
noddles      noddl
noddy        noddi
nods         nod

Page 45

The Good News

• More than almost any other problem in computational linguistics, morphology is a solved problem (as long as you can afford to write rules by hand).
• Finite state methods provide a simple and powerful means of generating and analyzing words (as well as the phonological alternations that accompany word formation/inflection).
• Finite state morphology is one of the great successes of natural language processing.
• One brilliant aspect of using FSTs for morphology: the same code can handle both analysis and generation.

