Lexical Semantics and Word Sense Disambiguation
Announcements
• Midterm sample questions on website
• Next class: midterm review for part of the class. Post your wishes for topics for the review on Piazza
• HW1 grades out. Mean is 81. Nice going!
• Following topics: semantic parsing, then distributed semantics and word embeddings, neural nets.
Polysemy
• The bank is constructed from red brick
• I withdrew the money from the bank
• Are those the same sense?
• Or consider the following WSJ example
  • While some banks furnish sperm only to married women, others are less restrictive
• Which sense of bank is this?
  • Is it distinct from (homonymous with) the river bank sense?
  • How about the savings bank sense?
Polysemy
• A single lexeme with multiple related meanings (bank the building, bank the financial institution)
• Most non-rare words have multiple meanings
  • The number of meanings a word has is related to its frequency
  • Verbs tend more to polysemy
• Distinguishing polysemy from homonymy isn’t always easy (or necessary)
Metaphor and Metonymy
• Specific types of polysemy
• Metaphor:
  • Germany will pull Slovenia out of its economic slump.
  • I spent 2 hours on that homework.
• Metonymy
  • The White House announced yesterday.
  • This chapter talks about part-of-speech tagging
  • Bank (building) and bank (financial institution)
How do we know when a word has more than one sense?
• ATIS examples
  • Which flights serve breakfast?
  • Does America West serve Philadelphia?
• The “zeugma” test:
  • ?Does United serve breakfast and San Jose?
Synonyms
• Words that have the same meaning in some or all contexts.
  • filbert / hazelnut
  • couch / sofa
  • big / large
  • automobile / car
  • vomit / throw up
  • Water / H2O
• Two lexemes are synonyms if they can be successfully substituted for each other in all situations
  • If so they have the same propositional meaning
Synonyms
• But there are few (or no) examples of perfect synonymy.
  • Why should that be?
  • Even if many aspects of meaning are identical
  • Substitution still may not preserve acceptability, based on notions of politeness, slang, register, genre, etc.
• Example:
  • Water and H2O
Some more terminology
• Lemmas and wordforms
  • A lexeme is an abstract pairing of meaning and form
  • A lemma or citation form is the grammatical form that is used to represent a lexeme.
    • Carpet is the lemma for carpets
    • Dormir is the lemma for duermes.
  • Specific surface forms carpets, sung, duermes are called wordforms
• The lemma bank has two senses:
  • Instead, a bank can hold the investments in a custodial account in the client’s name
  • But as agriculture burgeons on the east bank, the river will shrink even more.
• A sense is a discrete representation of one aspect of the meaning of a word
Synonymy is a relation between senses rather than words
• Consider the words big and large
• Are they synonyms?
  • How big is that plane?
  • Would I be flying on a large or small plane?
• How about here:
  • Miss Nelson, for instance, became a kind of big sister to Benjamin.
  • ?Miss Nelson, for instance, became a kind of large sister to Benjamin.
• Why?
  • big has a sense that means being older, or grown up
  • large lacks this sense
Antonyms
• Senses that are opposites with respect to one feature of their meaning
• Otherwise, they are very similar!
  • dark / light
  • short / long
  • hot / cold
  • up / down
  • in / out
• More formally: antonyms can
  • define a binary opposition or be at opposite ends of a scale (long/short, fast/slow)
  • be reversives: rise/fall, up/down
Hyponymy
• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
  • car is a hyponym of vehicle
  • dog is a hyponym of animal
  • mango is a hyponym of fruit
• Conversely
  • vehicle is a hypernym/superordinate of car
  • animal is a hypernym of dog
  • fruit is a hypernym of mango

superordinate   vehicle   fruit   furniture   mammal
hyponym         car       mango   chair       dog
Hypernymy more formally
• Extensional:
  • The class denoted by the superordinate extensionally includes the class denoted by the hyponym
• Entailment:
  • A sense A is a hyponym of sense B if being an A entails being a B
• Hyponymy is usually transitive
  • (A hypo B and B hypo C entails A hypo C)
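The transitivity of hyponymy can be sketched with a small lookup table. The snippet below is only an illustration: the `HYPERNYM` dictionary is a hand-made toy hierarchy (not WordNet data), and the function names are hypothetical.

```python
# Toy hypernym links (hyponym -> its direct hypernym); hand-made, not WordNet data.
HYPERNYM = {
    "car": "vehicle",
    "dog": "mammal",
    "mammal": "animal",
    "mango": "fruit",
    "chair": "furniture",
}

def hypernym_chain(sense):
    """Return every ancestor of `sense`, nearest first."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain

def is_hyponym(a, b):
    """True if `a` is a direct or indirect hyponym of `b`."""
    return b in hypernym_chain(a)

print(hypernym_chain("dog"))        # ['mammal', 'animal']
print(is_hyponym("dog", "animal"))  # True: dog hypo mammal, mammal hypo animal
print(is_hyponym("animal", "dog"))  # False: the relation is not symmetric
```

Following the chain upward is what makes the relation transitive here: dog reaches animal only through mammal.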
II. WordNet
• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
• Versions for other languages are under development

Category    Unique Forms
Noun        117,097
Verb        11,488
Adjective   22,141
Adverb      4,601
WordNet
• Where it is:
  • https://wordnet.princeton.edu/
Format of WordNet Entries
WordNet Noun Relations
WordNet Verb Relations
WordNet Hierarchies
How is “sense” defined in WordNet?
• The set of near-synonyms for a WordNet sense is called a synset (synonym set); it’s their version of a sense or a concept
• Example: chump as a noun to mean
  • ‘a person who is gullible and easy to take advantage of’
• Each of the senses in the synset shares this same gloss
• Thus for WordNet, the meaning of this sense of chump is this list.
WordNet example
Word Sense Disambiguation
Word Sense Disambiguation (WSD)
• Given
  • a word in context,
  • a fixed inventory of potential word senses
• decide which sense of the word this is.
  • English-to-Spanish MT
    • Inventory is set of Spanish translations
  • Speech Synthesis
    • Inventory is homographs with different pronunciations like bass and bow
  • Automatic indexing of medical articles
    • MeSH (Medical Subject Headings) thesaurus entries
Two variants of WSD task
• Lexical Sample task
  • Small pre-selected set of target words
  • And inventory of senses for each word
• All-words task
  • Every word in an entire text
  • A lexicon with senses for each word
  • Sort of like part-of-speech tagging
    • Except each lemma has its own tagset
Approaches
• Supervised
• Semi-supervised
• Unsupervised
  • Dictionary-based techniques
  • Selectional Association
• Lightly supervised
  • Bootstrapping
  • Preferred Selectional Association
Supervised Machine Learning Approaches
• Supervised machine learning approach:
  • a training corpus of ?
  • used to train a classifier that can tag words in new text
  • Just as we saw for part-of-speech tagging, statistical MT.
• Summary of what we need:
  • the tag set (“sense inventory”)
  • the training corpus
  • A set of features extracted from the training corpus
  • A classifier
Supervised WSD 1: WSD Tags
• What’s a tag?
• WordNet: http://www.cogsci.princeton.edu/cgi-bin/webwn
WordNet Bass
The noun “bass” has 8 senses in WordNet
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with the lowest voice)
4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Inventory of sense tags for bass
Supervised WSD 2: Get a corpus
• Lexical sample task:
  • Line-hard-serve corpus - 4000 examples of each
  • Interest corpus - 2369 sense-tagged examples
• All words:
  • Semantic concordance: a corpus in which each open-class word is labeled with a sense from a specific dictionary/thesaurus.
    • SemCor: 234,000 words from Brown Corpus, manually tagged with WordNet senses
    • SENSEVAL-3 competition corpora - 2081 tagged word tokens
Supervised WSD 3: Extract feature vectors
• Weaver (1955)
  • If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words. […] But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word. […] The practical question is: “What minimum value of N will, at least in a tolerable fraction of cases, lead to the correct choice of meaning for the central word?”

• dishes
• bass

• washing dishes.
• simple dishes including
• convenient dishes to
• of dishes and
• free bass with
• pound bass of
• and bass player
• his bass while

• “In our house, everybody has a career and none of them includes washing dishes,” he says.
• In her tiny kitchen at home, Ms. Chen works efficiently, stir-frying several simple dishes, including braised pig’s ears and chicken livers with green peppers.
• Post quick and convenient dishes to fix when your in a hurry.
• Japanese cuisine offers a great variety of dishes and regional specialties

• We need more good teachers – right now, there are only a half a dozen who can play the free bass with ease.
• Though still a far cry from the lake’s record 52-pound bass of a decade ago, “you could fillet these fish again, and that made people very, very happy.” Mr. Paulson says.
• An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations again.
• Lowe caught his bass while fishing with pro Bill Lee of Killeen, Texas, who is currently in 144th place with two bass weighing 2-09.
Feature vectors
• A simple representation for each observation (each instance of a target word)
  • Vectors of sets of feature/value pairs
  • I.e. files of comma-separated values
• These vectors should represent the window of words around the target
  • How big should that window be?
Two kinds of features in the vectors
• Collocational features and bag-of-words features
  • Collocational
    • Features about words at specific positions near target word
    • Often limited to just word identity and POS
  • Bag-of-words
    • Features about words that occur anywhere in the window (regardless of position)
    • Typically limited to frequency counts
Examples
• Example text (WSJ)
  • An electric guitar and bass player stand off to one side not really part of the scene, just as a sort of nod to gringo expectations perhaps
  • Assume a window of +/- 2 from the target
Collocational
• Position-specific information about the words in the window
• guitar and bass player stand
  • [guitar, NN, and, CC, player, NN, stand, VB]
  • word_{n-2}, POS_{n-2}, word_{n-1}, POS_{n-1}, word_{n+1}, POS_{n+1}, …
  • In other words, a vector consisting of
  • [position n word, position n part-of-speech, …]
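As a rough sketch of how the collocational vector above could be built, the function below walks a +/-2 window around the target and records word and POS at each position. The POS tags are typed in by hand for illustration (not tagger output), and the `<pad>` marker for positions outside the sentence is an assumption of this sketch.

```python
def collocational_features(tagged, target_index, window=2):
    """Return [word, POS, word, POS, ...] for positions around the target.

    `tagged` is a list of (word, POS) pairs. Positions outside the sentence
    are filled with a hypothetical "<pad>" marker.
    """
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target word itself is not a feature
        i = target_index + offset
        word, pos = tagged[i] if 0 <= i < len(tagged) else ("<pad>", "<pad>")
        features.extend([word, pos])
    return features

# Hand-tagged window from the example sentence; "bass" is at index 2.
tagged = [("guitar", "NN"), ("and", "CC"), ("bass", "NN"),
          ("player", "NN"), ("stand", "VB")]
print(collocational_features(tagged, target_index=2))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
```

Because each feature is tied to a position, "guitar two words to the left" and "guitar one word to the right" would be different features.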
Bag-of-words
• Information about the words that occur within the window.
• First derive a set of terms to place in the vector.
• Then note how often each of those terms occurs in a given window.

Co-Occurrence Example
• Assume we’ve settled on a possible vocabulary of 12 words that includes guitar and player but not and and stand
• guitar and bass player stand
  • [0,0,0,1,0,0,0,0,0,1,0,0]
  • Which are the counts of words predefined as e.g.,
  • [fish, fishing, viol, guitar, double, cello, …]
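The co-occurrence vector above can be reproduced with a few lines of counting. This is a sketch under assumptions: only the first six vocabulary terms appear on the slide, so the remaining six (and the position of "player" at index 9, chosen to match the slide's vector) are hypothetical fillers.

```python
from collections import Counter

# First six terms are from the slide; the last six are hypothetical fillers,
# with "player" placed at index 9 so the output matches the slide's vector.
VOCAB = ["fish", "fishing", "viol", "guitar", "double", "cello",
         "pound", "rod", "sound", "player", "band", "fly"]

def bag_of_words(window_words, vocab=VOCAB):
    """Count how often each vocabulary term occurs in the window, ignoring position."""
    counts = Counter(window_words)
    return [counts[term] for term in vocab]

window = ["guitar", "and", "player", "stand"]  # +/-2 words around "bass"
print(bag_of_words(window))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```

Words outside the vocabulary ("and", "stand") simply contribute nothing to the vector.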
Classifiers
• Once we cast the WSD problem as a classification problem, then all sorts of techniques are possible
  • Naïve Bayes (the easiest thing to try first)
  • Decision lists
  • Decision trees
  • Neural nets
  • Support vector machines
  • Nearest neighbor methods…

Classifiers
• The choice of technique, in part, depends on the set of features that have been used
  • Some techniques work better/worse with features with numerical values
  • Some techniques work better/worse with features that have large numbers of possible values
    • For example, the feature the word to the left has a fairly large number of possible values
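Naïve Bayes
• $\hat{s} = \arg\max_{s \in S} p(s \mid V)$, or
  $$\hat{s} = \arg\max_{s \in S} \frac{p(V \mid s)\, p(s)}{p(V)}$$
• Where s is one of the senses S possible for a word w and V the input vector of feature values for w
• Assume features independent, so probability of V is the product of probabilities of each feature, given s, so
  $$p(V \mid s) = \prod_{j=1}^{n} p(v_j \mid s)$$
• p(V) is the same for any candidate s
• Then
  $$\hat{s} = \arg\max_{s \in S} p(s) \prod_{j=1}^{n} p(v_j \mid s)$$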
• How do we estimate p(s) and p(v_j|s)?
  • p(s_i) is the maximum likelihood estimate from a sense-tagged corpus (count(s_i, w_j) / count(w_j)) – how likely is bank to mean ‘financial institution’ over all instances of bank?
  • p(v_j|s) is the maximum likelihood estimate of each feature given a candidate sense (count(v_j, s) / count(s)) – how likely is the previous word to be ‘river’ when the sense of bank is ‘financial institution’?
• Calculate the score for each candidate sense and take the highest scoring sense as the most likely choice
  $$\hat{s} = \arg\max_{s \in S} p(s) \prod_{j=1}^{n} p(v_j \mid s)$$
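A minimal sketch of this decision rule follows. It makes several assumptions: the training examples are invented, features are treated as present/absent per instance, scores are summed in log space to avoid underflow, and add-one smoothing is swapped in for the plain maximum likelihood estimate so that an unseen feature does not zero out a sense.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (sense, [feature, ...]) pairs for one target word."""
    sense_counts = Counter()              # count(s)
    feature_counts = defaultdict(Counter) # count(v_j, s): instances of s containing v_j
    for sense, features in examples:
        sense_counts[sense] += 1
        for f in set(features):
            feature_counts[sense][f] += 1
    return sense_counts, feature_counts

def classify(features, sense_counts, feature_counts):
    """Return argmax_s [ log p(s) + sum_j log p(v_j | s) ]."""
    total = sum(sense_counts.values())

    def score(sense):
        n = sense_counts[sense]
        log_p = math.log(n / total)  # prior p(s)
        for f in features:
            # Add-one smoothing: an assumption of this sketch, not the plain MLE.
            log_p += math.log((feature_counts[sense][f] + 1) / (n + 2))
        return log_p

    return max(sense_counts, key=score)

# Invented toy training data for the target word "bass".
examples = [
    ("music", ["guitar", "player", "band"]),
    ("music", ["play", "guitar"]),
    ("fish", ["fishing", "pound", "lake"]),
    ("fish", ["caught", "fishing"]),
]
sense_counts, feature_counts = train(examples)
print(classify(["guitar", "player"], sense_counts, feature_counts))  # music
print(classify(["lake", "fishing"], sense_counts, feature_counts))   # fish
```

Dropping p(V) is safe here for the reason given above: it is the same for every candidate sense, so it cannot change which sense wins.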
Naïve Bayes Test
• On a corpus of examples of uses of the word line, naïve Bayes achieved about 73% correct
• Good?
Decision Lists: another popular method
• A case statement….

Learning Decision Lists
• Restrict the lists to rules that test a single feature (1-decision list rules)
• Evaluate each possible test and rank them based on how well they work.
• Glue the top-N tests together and call that your decision list.
Yarowsky
• On a binary (homonymy) distinction used the following metric to rank the tests
  $$\frac{P(\mathrm{Sense}_1 \mid \mathrm{Feature})}{P(\mathrm{Sense}_2 \mid \mathrm{Feature})}$$
• This gives about 95% on this test…
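One way to sketch the learning procedure is shown below. It ranks single-feature tests by the absolute value of the log of the ratio above, which is the form used in Yarowsky's decision-list work; treat that choice, the smoothing constant `alpha`, and the toy training examples as assumptions of this sketch rather than details taken from the slide.

```python
import math
from collections import Counter, defaultdict

def learn_decision_list(examples, sense1, sense2, top_n=10, alpha=0.1):
    """Rank single-feature tests for a binary sense distinction.

    examples: list of (sense, [feature, ...]). Returns up to `top_n`
    (score, feature, predicted_sense) rules, strongest first.
    """
    counts = defaultdict(Counter)
    for sense, features in examples:
        for f in set(features):
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        # P(sense1|f) / P(sense2|f) reduces to a ratio of counts; alpha is an
        # arbitrary smoothing constant that avoids division by zero.
        ratio = (c[sense1] + alpha) / (c[sense2] + alpha)
        rules.append((abs(math.log(ratio)), f, sense1 if ratio > 1 else sense2))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules[:top_n]

def apply_decision_list(rules, features, default):
    """Return the sense of the first rule whose feature is present (a case statement)."""
    for _, f, sense in rules:
        if f in features:
            return sense
    return default

# Invented toy examples for "bass".
examples = [
    ("fish", ["caught", "lake"]),
    ("fish", ["fishing", "pound"]),
    ("music", ["guitar", "player"]),
    ("music", ["play", "guitar"]),
]
rules = learn_decision_list(examples, "fish", "music")
print(rules[0][1], rules[0][2])                               # guitar music
print(apply_decision_list(rules, ["the", "lake"], "fish"))    # fish
```

The first matching rule decides, so the ordering of the list is the whole model.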
WSD Evaluations and baselines
• In vivo versus in vitro evaluation
• In vitro evaluation is most common now
  • Exact match accuracy
    • % of words tagged identically with manual sense tags
  • Usually evaluate using held-out data from same labeled corpus
    • Problems?
    • Why do we do it anyhow?
• Baselines
  • Most frequent sense
  • The Lesk algorithm
Most Frequent Sense
• WordNet senses are ordered in frequency order
• So “most frequent sense” in WordNet = “take the first sense”
• Sense frequencies come from SemCor
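The most-frequent-sense baseline and exact match accuracy are both a few lines of counting. The sketch below uses invented sense tags for illustration; in practice the frequencies would come from a sense-tagged corpus such as SemCor.

```python
from collections import Counter

def most_frequent_sense(training_tags):
    """Return the sense label that occurs most often in the training data."""
    return Counter(training_tags).most_common(1)[0][0]

def exact_match_accuracy(predicted, gold):
    """Fraction of tokens whose predicted sense equals the manual sense tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Invented sense tags for occurrences of "bass".
train_tags = ["fish", "music", "music", "music", "fish"]
gold = ["music", "fish", "music", "music"]

mfs = most_frequent_sense(train_tags)
baseline = [mfs] * len(gold)  # tag every test token with the same sense
print(mfs, exact_match_accuracy(baseline, gold))  # music 0.75
```

Any real classifier should be compared against this number, since a skewed sense distribution can make the baseline surprisingly strong.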
Ceiling
• Human inter-annotator agreement
  • Compare annotations of two humans
  • On same data
  • Given same tagging guidelines
• Human agreements on all-words corpora with WordNet style senses
  • 75%-80%
Unsupervised Methods WSD: Dictionary/Thesaurus methods
• The Lesk Algorithm
• Selectional Restrictions

Simplified Lesk
Original Lesk: pine cone
Corpus Lesk
• Add corpus examples to glosses and examples
• The best performing variant
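Simplified Lesk, named above, picks the sense whose dictionary gloss shares the most words with the context and falls back to the first (most frequent) sense when nothing overlaps. The sketch below is an illustration only: the two glosses are taken from the bass inventory earlier in these slides, while the stopword list, the sense labels, and the whitespace tokenization are simplifying assumptions.

```python
# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "for", "any", "with", "and", "in", "is", "to"}

def signature(text):
    """Lowercased content words of a gloss or context (whitespace tokenization)."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(context, glosses):
    """glosses: dict of sense -> gloss text, most frequent sense listed first."""
    context_words = signature(context)
    best_sense, best_overlap = next(iter(glosses)), 0  # default: first sense
    for sense, gloss in glosses.items():
        overlap = len(context_words & signature(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Glosses for two of the eight WordNet senses of "bass"; labels are hypothetical.
GLOSSES = {
    "bass(instrument)": "the member with the lowest range of a family of musical instruments",
    "bass(fish)": "nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes",
}
print(simplified_lesk("the bass is the lowest of the musical instruments in the band", GLOSSES))
# bass(instrument)
print(simplified_lesk("an edible freshwater bass", GLOSSES))
# bass(fish)
```

Corpus Lesk extends the same idea by adding labeled corpus examples to each sense's signature, which gives the overlap count more words to match against.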
Disambiguation via Selectional Restrictions
• “Verbs are known by the company they keep”
  • Different verbs select for different thematic roles
    wash the dishes (takes washable-thing as patient)
    serve delicious dishes (takes food-type as patient)
• Method: another semantic attachment in grammar
  • Semantic attachment rules are applied as sentences are syntactically parsed, e.g.
    VP → V NP
    V → serve <theme> {theme: food-type}
  • Selectional restriction violation: no parse
• But this means we must:
  • Write selectional restrictions for each sense of each predicate – or use FrameNet
    • Serve alone has 15 verb senses
  • Obtain hierarchical type information about each argument (using WordNet)
    • How many hypernyms does dish have?
    • How many words are hyponyms of dish?
• But also:
  • Sometimes selectional restrictions don’t restrict enough (Which dishes do you like?)
  • Sometimes they restrict too much (Eat dirt, worm! I’ll eat my hat!)
• Can we take a statistical approach?
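The filtering step can be sketched as a lookup against hypernym types. Everything in the snippet is hypothetical: the sense labels, the two-entry type hierarchy, and the restriction table are hand-made stand-ins for what WordNet and hand-written (or FrameNet-derived) restrictions would supply.

```python
# Hypothetical hypernym types for two senses of "dish".
HYPERNYMS = {
    "dish(plate)": ["crockery", "washable-thing"],
    "dish(meal)": ["food-type"],
}

# Hypothetical restrictions on the patient/theme of each verb.
PATIENT_RESTRICTION = {"wash": "washable-thing", "serve": "food-type"}

def senses_allowed(verb, noun_senses):
    """Keep only the noun senses whose hypernyms satisfy the verb's restriction."""
    required = PATIENT_RESTRICTION.get(verb)
    if required is None:
        return list(noun_senses)  # no restriction known: nothing is ruled out
    return [s for s in noun_senses if required in HYPERNYMS[s]]

dish_senses = ["dish(plate)", "dish(meal)"]
print(senses_allowed("wash", dish_senses))   # ['dish(plate)']
print(senses_allowed("serve", dish_senses))  # ['dish(meal)']
print(senses_allowed("like", dish_senses))   # both senses survive
```

The last call shows the "don't restrict enough" problem from the list above: a verb such as like places no useful constraint on its object, so both senses remain.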
Semi-supervised Bootstrapping
• What if you don’t have enough data to train a system…
• Bootstrap
  • Pick a word that you as an analyst think will co-occur with your target word in a particular sense
  • Grep through your corpus for your target word and the hypothesized word
  • Assume that the target tag is the right one

Bootstrapping
• For bass
  • Assume play occurs with the music sense and fish occurs with the fish sense

Sentences extracted using “fish” and “play”
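The seed-labeling step amounts to a filtered search over the corpus. The sketch below assumes exact, lowercased token matching and uses three shortened example sentences; the function name and the decision to leave sentences with no seed (or conflicting seeds) unlabeled are choices made for illustration.

```python
SEEDS = {"play": "music", "fish": "fish"}  # seed word -> assumed sense of "bass"

def seed_label(sentences, target="bass", seeds=SEEDS):
    """Label sentences containing the target and exactly one seed sense."""
    labeled, unlabeled = [], []
    for sentence in sentences:
        words = sentence.lower().split()
        if target not in words:
            continue  # not an instance of the target word
        senses = {sense for seed, sense in seeds.items() if seed in words}
        if len(senses) == 1:
            labeled.append((sentence, senses.pop()))
        else:
            unlabeled.append(sentence)  # no seed, or conflicting seeds
    return labeled, unlabeled

sentences = [
    "only a half a dozen can play the free bass with ease",
    "the record bass was the biggest fish in the lake",
    "an electric guitar and bass player stand off to one side",
]
labeled, unlabeled = seed_label(sentences)
print(labeled[0][1], labeled[1][1])  # music fish
print(len(unlabeled))                # 1
```

The third sentence stays unlabeled because "player" is not the seed "play"; a classifier trained on the labeled set is what later extends coverage to such cases.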
Where do the seeds come from?
1) Hand labeling
2) “One sense per discourse”:
  • The sense of a word is highly consistent within a document - Yarowsky (1995)
  • True for topic dependent words
  • Not so true for other POS like adjectives and verbs, e.g. make, take
  • Krovetz (1998) “More than one sense per discourse” argues it isn’t true at all once you move to fine-grained senses
3) One sense per collocation:
  • A word reoccurring in collocation with the same word will almost surely have the same sense.
Slide adapted from Chris Manning
Stages in the Yarowsky bootstrapping algorithm
Problems
• Given these general ML approaches, how many classifiers do I need to perform WSD robustly?
  • One for each ambiguous word in the language
• How do you decide what set of tags/labels/senses to use for a given word?
  • Depends on the application
WordNet Bass
• Tagging with this set of senses is an impossibly hard task that’s probably overkill for any realistic application
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with the lowest voice)
4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
Senseval History
• ACL-SIGLEX workshop (1997)
  • Yarowsky and Resnik paper
• SENSEVAL-I (1998)
  • Lexical Sample for English, French, and Italian
• SENSEVAL-II (Toulouse, 2001)
  • Lexical Sample and All Words
  • Organization: Kilgarriff (Brighton)
• SENSEVAL-III (2004)
• SENSEVAL-IV -> SEMEVAL (2007)
• SEMEVAL (2010)
• SEMEVAL 2017: http://alt.qcri.org/semeval2017/index.php?id=tasks
SLIDE ADAPTED FROM CHRIS MANNING
WSD Performance
• Varies widely depending on how difficult the disambiguation task is
• Accuracies of over 90% are commonly reported on some of the classic, often fairly easy, WSD tasks (pike, star, interest)
• Senseval brought careful evaluation of difficult WSD (many senses, different POS)
• Senseval 1: more fine grained senses, wider range of types:
  • Overall: about 75% accuracy
  • Nouns: about 80% accuracy
  • Verbs: about 70% accuracy
Summary
• Lexical Semantics
  • Homonymy, Polysemy, Synonymy
  • Thematic roles
• Computational resource for lexical semantics
  • WordNet
• Task
  • Word sense disambiguation
• Next: semantic parsing