ML4NLP: Introduction to Syntax and Parsing
CS 590 NLP
Dan Goldwasser, Purdue University
“I shot an elephant in my pajamas”
“How he got into my pajamas, I'll never know.”
Groucho Marx
“I shot an elephant in my pajamas.”
“I shot an elephant in the zoo.”
Parsing
Language is not just a stream of words; we want to represent linguistic structure!
• Two views:
– Constituency parsing: build a hierarchical phrase structure
– Dependency parsing: show dependencies between words (dependency = modifier or argument)
Dependency and constituent parsing
Constituency Parsing
• “The good old days…”: Write a program!

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP

NP → John
NP → Mary
N → binoculars
N → dog
V → saw
P → with
Det → a
Can you parse: “Mary saw John with binoculars”? How about: “Mary saw a dog with binoculars”?
Constituency Parsing
• “The good old days…”: Write a program!
• Can you treat natural and formal languages in the same way?
Constituency Parsing
“Fed raises interest rates 0.5% in effort to control inflation”
How many parsing options? Millions of possible parses in a broad-coverage grammar.
This explains the popularity of statistical methods in NLP: millions of options, but only a few are likely!
CFG
• Formally: a context-free grammar is G = (T, N, S, R)
– T: terminal symbols
– N: non-terminals
– S: start symbol
– R: production rules X → Y (where X is in N, and Y is a string over T and N)
• A grammar G generates a language L.
PCFG
• Formally: a probabilistic context-free grammar is G = (T, N, S, R, P)
– T: terminal symbols
– N: non-terminals
– S: start symbol
– R: production rules X → Y (where X is in N, and Y is a string over T and N)
– P: a probability function over R, normalized per non-terminal:
∀X ∈ N:  Σ_{X → Y ∈ R}  P(X → Y) = 1
PCFG Example
S → NP VP 1.0
NP → Det N 0.6
NP → NP PP 0.4
VP → V NP 0.6
VP → VP PP 0.4
PP → P NP 1.0

NP → John 0.3
NP → Mary 0.3
N → binoculars 0.2
N → dog 0.2
V → saw 0.2
P → with 0.4
Det → a 0.4
…
P(Tree) – The probability of a tree is the product of the probabilities of the rules used to generate it.
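As a minimal sketch (the dict encoding and tree format below are my own, not from the slides), the grammar above can be stored as a map from rules to probabilities, and P(Tree) computed as the product of the rules used:

```python
from math import prod

# The PCFG above, as a dict: (LHS, RHS) -> probability.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6, ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V", "NP")): 0.6,  ("VP", ("VP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("NP", ("John",)): 0.3,    ("NP", ("Mary",)): 0.3,
    ("N", ("binoculars",)): 0.2, ("N", ("dog",)): 0.2,
    ("V", ("saw",)): 0.2, ("P", ("with",)): 0.4, ("Det", ("a",)): 0.4,
}

def tree_prob(tree):
    """A tree is (label, child, ...); leaves are plain strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rules[(label, rhs)] * prod(
        tree_prob(c) for c in children if not isinstance(c, str))

# "Mary saw a dog": 1.0 * 0.3 * 0.6 * 0.2 * 0.6 * 0.4 * 0.2 = 0.001728
t = ("S", ("NP", "Mary"),
          ("VP", ("V", "saw"), ("NP", ("Det", "a"), ("N", "dog"))))
print(tree_prob(t))
```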
Constituency Parsing as Structured Learning
• Can you define PCFG parsing as a structured prediction problem?
– How would you define the prediction problem?
– What are the dependencies in the model?
– What are the parameters you need to learn?
– What are good features?
CKY Algorithm
• Dynamic programming algorithm for parsing
• Given a CFG G and a string w, determine: can G parse w?
• We assume G is in CNF:
– Each rule has at most 2 symbols on the right: A → B C, A → B, or A → a
• The algorithm maintains a triangular DP table.
– Bottom row: parse substrings of size 1
– Second row from the bottom: parse substrings of size 2
– …
– Top row: parse the entire sentence!
DP Triangular Table
            X1,5
        X1,4    X2,5
     X1,3   X2,4   X3,5
  X1,2   X2,3   X3,4   X4,5
X1,1  X2,2  X3,3  X4,4  X5,5
 w1    w2    w3    w4    w5

Table for a string w of length 5: cell Xi,j holds the non-terminals deriving words wi…wj.
Constructing the Triangular Table
Bottom row of the table for the string “baaba”:

{B}   {A,C}   {A,C}   {B}   {A,C}
 b      a       a      b      a

At each point, consider the possible rules (and, for a PCFG, their probabilities):

S → AB | BC
A → BA | a
B → CC | b
C → AB | a
CKY Algorithm
• Similar to Viterbi: keep backpointers to reconstruct the parse tree from the table.
• The rule-activation scoring function depends on the dependency assumptions:
– Look at the probability of the previous row's activations, and consider the conditional probability of the rule given the previous parses.
• Overall complexity: O(n³ · |G|)
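A minimal CKY recognizer sketch for the CNF grammar on the previous slide (S → AB | BC, A → BA | a, B → CC | b, C → AB | a); the dict encoding of the rules is my own:

```python
from itertools import product

# Binary rules: (B, C) -> {A : A -> B C}; lexical rules: terminal -> {A : A -> a}.
binary = {
    ("A", "B"): {"S", "C"},   # S -> AB, C -> AB
    ("B", "C"): {"S"},        # S -> BC
    ("B", "A"): {"A"},        # A -> BA
    ("C", "C"): {"B"},        # B -> CC
}
lexical = {"a": {"A", "C"}, "b": {"B"}}

def cky_recognize(words, start="S"):
    n = len(words)
    # table[i][j] = non-terminals deriving words[i..j] (inclusive, 0-indexed)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):              # bottom row: spans of length 1
        table[i][i] = set(lexical.get(w, ()))
    for length in range(2, n + 1):             # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):              # split into [i..k] and [k+1..j]
                for B, C in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((B, C), set())
    return start in table[0][n - 1]

print(cky_recognize(list("baaba")))            # True: S derives "baaba"
```

The three nested loops over spans and split points, times the grammar lookups, give the O(n³ · |G|) bound above.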
Option 2: Dependency Parsing
• Key idea: syntactic structure is represented as relations between lexical items, called dependencies.

Dependencies can be represented as a graph, where the nodes are words and the edges are dependencies, which are: (1) directional, (2) often typed.
[Diagram: dependency tree for “Root Mary ate a banana”: Root → ate (main verb); ate → Mary (Subj); ate → banana (Obj); banana → a (Det)]
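One plausible encoding of the tree above (mine, not the slides'): a list of directed, typed edges.

```python
# (head, dependent, type) triples for "Mary ate a banana"; label names are illustrative.
deps = [
    ("ROOT", "ate", "main-verb"),
    ("ate", "Mary", "subj"),
    ("ate", "banana", "obj"),
    ("banana", "a", "det"),
]
```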
Non-Projective Structure

Root Mary ate a banana today that was yellow
[The arc from “banana” to the relative clause “that was yellow” crosses the arc to “today”.]

Projective structure: no crossing edges.
Are those really needed?
• However, we will often assume projectivity:
– It makes life easier
– Non-projectivity doesn’t occur often
Dependency Parsing
• We need to answer two questions:
– How can you make parsing decisions? (i.e., inference)
– How do you learn the parameters to score these decisions?
Dependency Parsing
• Parser: for each word, choose which other word it depends on.
– You can choose to label these dependencies
• Constraints:
– Only one root
– No cycles
⇒ Essentially, force a tree structure
• Additional constraint: no crossing dependencies (projectivity)
Parsing Approaches
• Two competing approaches:
• Exact inference: mostly graph-based algorithms (e.g., spanning tree), but also ILP
• Approximate inference: linear-time transition-based parsers
• Transition-based parsers are very popular!
Greedy Transition-Based Dependency Parsing (Nivre ’03)
• The parser operates by maintaining two data structures:
– A stack and a buffer
• Parsing is done via a sequence of operations, pushing words from the buffer onto the stack and adding dependency edges over words in the stack:
– Shift: take the word at the front of the buffer and put it on top of the stack.
– Left/Right Arc: add a dependency edge between the top two words on the stack, and remove the dependent word from the stack.
• Parsing ends when the buffer is empty and only Root remains on the stack.
Stack                  Buffer            Action
[Root]                 I like lettuce    Shift
[Root] I               like lettuce      Shift
[Root] I like          lettuce           Left Arc (like → I)
[Root] like            lettuce           Shift
[Root] like lettuce                      Right Arc (like → lettuce)
[Root] like                              Right Arc (Root → like)
[Root]                                   (done)
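The trace above can be replayed with a minimal arc-standard sketch; `next_action` is a stand-in for the learned classifier discussed next, and all names here are my own:

```python
def parse(words, next_action):
    """Greedy arc-standard parsing: `next_action` inspects the parser state
    and returns "shift", "left-arc", or "right-arc"."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    while buffer or len(stack) > 1:
        action = next_action(stack, buffer)
        if action == "shift":          # front of buffer -> top of stack
            stack.append(buffer.pop(0))
        elif action == "left-arc":     # top of stack is head of the word below it
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        else:                          # "right-arc": word below is head of the top
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return arcs

# Replay the trace for "I like lettuce" with a scripted oracle:
script = iter(["shift", "shift", "left-arc", "shift", "right-arc", "right-arc"])
print(parse("I like lettuce".split(), lambda s, b: next(script)))
# -> [('like', 'I'), ('like', 'lettuce'), ('ROOT', 'like')]
```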
Learning for Dependency Parsing
• Learning a transition parser: use data to build a scoring function for parser operations.
– This should sound familiar…
• Break the data into a sequence of decisions, and train a “next-state” function.
– Local learning; (greedy) inference only at test time.
• Traditionally: SVM, logistic regression, …
– Essentially a multiclass classifier over the current state of the parser.
Learning for Dependency Parsing
• Which features would you consider?
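For instance (an illustrative sketch, not the exact templates from any particular paper), classic transition parsers used sparse indicator features over the words and POS tags near the top of the stack and the front of the buffer, plus conjunctions of them:

```python
def features(stack, buffer, pos):
    """Indicator features for the current parser state; `pos` maps word -> tag."""
    f = {}
    if stack:
        f[f"s1.word={stack[-1]}"] = 1.0
        f[f"s1.pos={pos.get(stack[-1], 'NONE')}"] = 1.0
    if len(stack) > 1:
        f[f"s2.word={stack[-2]}"] = 1.0
    if buffer:
        f[f"b1.word={buffer[0]}"] = 1.0
        f[f"b1.pos={pos.get(buffer[0], 'NONE')}"] = 1.0
    if stack and buffer:   # a conjunction feature
        f[f"s1.pos+b1.pos={pos.get(stack[-1], 'NONE')}+{pos.get(buffer[0], 'NONE')}"] = 1.0
    return f

print(features(["ROOT", "like"], ["lettuce"], {"like": "VB", "lettuce": "NN"}))
```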
Deep Learning for Dependency Parsing (Chen & Manning ’14)
From Syntax to Semantics
• The syntactic structure of a sentence captures some semantic properties (e.g., recall the PP-attachment problem).
• However, it does not account for meaning in a broad sense.
• Interesting question: what is a computational model for meaning?
What is the meaning of meaning?
Semantics
• We distinguish between:
• Lexical semantics: the meaning of words
• Compositional semantics: combine individual units to form the meaning of larger units
Applications
• Semantics is what we really care about:
– Question answering
– Intelligent information access
– Robot communication
– Summarization
– …
Deep vs. Shallow Semantics
• Surprisingly, we tend to believe that dogs understand much more!
• Similarly, shallow NLP performs surprisingly well!
Semantics
• We will look at two semantics problems:
– Formal semantic representation: find a mathematical representation of meaning
– Information extraction: the “machine reading” view; populate a DB of facts from text
Formal Models of Meaning
• A formal model for compositional semantics:
– Form the semantics of parents based on the semantics of the children
• We assume a dictionary of items:
– Constant symbols
– Functions
        S : Smokes(John)
       /  \
     NP    VP
    John  Smokes
Constants and Functions
• Constants:
– PurdueUniversity
– BarackObama
• Properties:
– Red(x), Small(x), …
• Relations:
– Love(x, y), PresidentOf(BarackObama, USA)
Generating a Meaning Representation
• We assume that syntactic representations and compositional semantics are highly dependent.
• Simple algorithm:
– Create a parse tree
– Find the semantic representation of words (leaf nodes)
– Combine the semantics of children into the parent node (bottom-up)
Semantic Parser
• Key idea: augment syntactic parsing with meaning!

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP

NP → John
NP → Mary
N → binoculars
N → dog
V → saw
P → with
Det → a
Semantic Parser
• Key idea: augment syntactic parsing with meaning!

S [SM] → NP [NPM] VP [VPM]      Apply(SM, NPM, VPM)
VP [VPM] → V [VM] NP [NPM]      Apply(VPM, VM, NPM)
NP [NPM] → N [NM]               Apply(NP, N)
…

N [john] → John
N [mary] → Mary
V [λx.saw(x)] → saw
This is sometimes called a lexicon.
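A toy rendering of such a lexicon and its bottom-up composition (my sketch; note the slide abbreviates the verb meaning as λx.saw(x), while a transitive verb is more precisely the curried λy.λx.saw(x,y) used below):

```python
# Lexicon: words paired with meanings (constants or functions).
lexicon = {
    "John": "john",
    "Mary": "mary",
    "saw":  lambda y: (lambda x: f"saw({x},{y})"),   # curried: object, then subject
}

# Composition rules: a parent's meaning applies one child's meaning to the other's.
def vp(verb, obj):              # VP[VPM] -> V[VM] NP[NPM]
    return lexicon[verb](lexicon[obj])

def s(subj, vp_meaning):        # S[SM] -> NP[NPM] VP[VPM]
    return vp_meaning(lexicon[subj])

print(s("Mary", vp("saw", "John")))   # -> saw(mary,john)
```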
Generating a Meaning Representation
Learning for Semantic Parsing
• As in syntactic parsing, there are many possible meaning derivations for a single sentence.
– Each could result in a different semantic representation!
• To help us disambiguate the meaning of a sentence, we can define a probabilistic parser:

N [john] → John 0.4
N [mary] → Mary 0.2
V [λx.saw(x)] → saw 0.9
Grounded Language Interpretation
• Compositionality: constructing meaning by composing the meaning of lower-level units.
– The lowest-level units (“leaves”) are typically constant symbols.
• Where do the symbols come from?
• We assume a world model, providing the relevant set of symbols:
– People on your smartphone, transactions in a DB, entities on Wikipedia, real-world objects (“pick up that block”)
• Analyzing the difficulty of semantic interpretation:
– Complexity of the input language, complexity of the set of symbols, complexity of their mapping
Grounded Language Interpretation
• “Pick up the green piece”
• “Pick up the green piece that’s next to the blue piece”
• “Pick up the green piece that’s shaped like a lettuce”
• “Pick up the green piece that’s at the left end of the bottom row.”
A Joint Model of Language and Perception for Grounded Attribute Learning. Matuszek et al., 2012
Grounded Language Interpretation
• Create a meaning representation capturing the mapping from language to percepts in the real world.
The (probability of the) truth value of these predicates depends on real-world grounding.
Grounded Language Interpretation
• Create a scoring function connecting the two representations.
Natural Language Communication with Robots. Bisk et al., 2016
[Figure: world representation]
Grounded Language Interpretation
• Two competing approaches:
– Create an explicit meaning representation
– Create a scoring function that ranks meaning representations (or their outcomes)
• We discussed this in the context of grounded representations (“real-world objects”).
– A similar discussion applies in different settings (e.g., DB access).
• Which one will be easier to learn? What kind of supervision effort is needed for either?
Scalingup
Voters go to the polls in four states on Tuesday, with Michigan the biggest prize for both parties. Donald J. Trump seeks to strengthen his position as the Republican front-runner, while his rivals look to slow his drive toward the nomination. For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test in his upstart campaign to derail Hillary Clinton. Here are some of the things we will be watching in the contests in Hawaii, Idaho, Michigan and Mississippi.
NYTimes article
Machine Reading
• A more realistic task: given unstructured text, create structured knowledge.
• Simple examples:
– Named Entity Recognition
• More complicated:
– Relationships between entities
IE Example
For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test in his upstart campaign to derail Hillary Clinton. Here are some of the things we will be watching in the contests in Hawaii, Idaho, Michigan and Mississippi.
Bernie Sanders is-a democrat
Bernie Sanders is-from Vermont
Relation Extraction
• We make a distinction between closed and open IE.
• Closed: focus on a small set of relations.
– Easy to think about as a supervised task
• Open: find all relations.
Relation Extraction
• Popular task: ACE 2003 defined 4 types:
– Role: member, owner, affiliate, client
– Part: subsidiary, physical part-of, set membership
– At: location, based-in, residence
– Social: parent, sibling, spouse
• Realistic settings: Freebase has thousands of relations!
Building Relation Extractors
• Simple pattern recognition
Hearst (1992)
Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.
What does “Gelidium” mean?
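The “Y such as X” construction is one of Hearst's patterns for extracting is-a (hyponym) relations. A deliberately naive regex sketch (real systems match over POS-tagged noun chunks, not raw strings):

```python
import re

# One Hearst pattern: "Y, such as X" suggests is-a(X, Y).
text = ("Agar is a substance prepared from a mixture of red algae, "
        "such as Gelidium, for laboratory or industrial use.")
m = re.search(r"(\w+ \w+), such as (\w+)", text)
if m:
    hypernym, hyponym = m.group(1), m.group(2)
    print(f"is-a({hyponym}, {hypernym})")   # -> is-a(Gelidium, red algae)
```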
Pattern-based Relation Extraction
Bootstrapping
• Simple idea (see the sketch below):
– Given a small seed set of relations (e.g., by mining patterns)
– And A LOT of unsupervised text
– Find mentions of the relations in the text
– Use the mentions to come up with new patterns!
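A self-contained sketch of that loop (toy corpus and plain string contexts; real systems score and filter patterns to limit semantic drift):

```python
import re

def find_patterns(corpus, pairs):
    """Contexts in which a known (x, y) pair co-occurs become new patterns."""
    patterns = set()
    for sent in corpus:
        for x, y in pairs:
            m = re.search(re.escape(x) + r"(.{1,30}?)" + re.escape(y), sent)
            if m:
                patterns.add(m.group(1))            # e.g. " was born in "
    return patterns

def match_patterns(corpus, patterns):
    """Applying a pattern to new text yields new candidate pairs."""
    pairs = set()
    for sent in corpus:
        for p in patterns:
            for m in re.finditer(r"(\w+)" + re.escape(p) + r"(\w+)", sent):
                pairs.add((m.group(1), m.group(2)))
    return pairs

def bootstrap(seed_pairs, corpus, rounds=3):
    pairs, patterns = set(seed_pairs), set()
    for _ in range(rounds):
        patterns |= find_patterns(corpus, pairs)    # pairs -> patterns
        pairs |= match_patterns(corpus, patterns)   # patterns -> new pairs
    return pairs, patterns

corpus = ["Einstein was born in Ulm.", "Mozart was born in Salzburg."]
pairs, patterns = bootstrap({("Einstein", "Ulm")}, corpus)
print(patterns)   # {' was born in '}
print(pairs)      # now also contains ('Mozart', 'Salzburg')
```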
Supervised Relation Extraction
• Given a sentence, find the list of entities, and predict whether there is a relation.
• Key problem: finding a good feature representation.
Zhou et al., 2005
Scaling up RE
• Key problem: realistic machine reading requires dealing with thousands of relations.
• Directly annotating for this task is not reasonable. How can we scale up?
• Key idea: distant supervision
– Similar to bootstrapping + learning
Distant Supervision
• Assume we have a collection of relations.
– Easy! (e.g., Freebase, Wikipedia, …)
• …and that if two entities appear in a relation, sentences containing these two entities will express this relationship.
• Use such sentences as noisy training data!
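A minimal sketch of how such noisy data is generated, with a toy stand-in for the KB (the relation instance follows the is-from example above):

```python
# Distant supervision: any sentence mentioning both entities of a known KB
# relation is treated as a (noisy) positive training example for that relation.
kb = {("Bernie Sanders", "Vermont"): "is-from"}    # tiny stand-in for Freebase

corpus = [
    "Senator Bernie Sanders of Vermont faces a crucial test.",
    "Bernie Sanders flew to Vermont on Tuesday.",  # noisy: does not assert is-from!
]

training_data = []
for sent in corpus:
    for (e1, e2), rel in kb.items():
        if e1 in sent and e2 in sent:
            training_data.append((sent, e1, e2, rel))

for example in training_data:
    print(example)
```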
Distant Supervision Example