
Administrivia CS388: Natural Language Processing Lecture ...gdurrett/courses/fa2019/lectures/lec12...

Date post: 07-Aug-2020
Page 1: Administrivia CS388: Natural Language Processing Lecture ...gdurrett/courses/fa2019/lectures/lec12 … · ‣Coordina@on is decomposed across a few arcs as opposed to being a single

CS388: Natural Language Processing

Greg Durrett

Lecture 12: Dependency I

[Title-slide figure labels: dependency, syntax, coordination]

Administrivia

‣ Project 1 graded, discussion at end of lecture
‣ Mini 2 due tonight
‣ Final project proposals due next Tuesday

Recall: Constituency

‣ Tree-structured syntactic analyses of sentences
‣ Nonterminals (NP, VP, etc.) as well as POS tags (bottom layer)
‣ Structure is defined by a CFG

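Recall: CKY

[Figure: example sentence "He wrote a long report on Mars" with NP and PP constituents marked]

‣ Find argmax_T P(T|x) = argmax_T P(T, x)
‣ Dynamic programming: chart maintains the best way of building symbol X over span (i, j)
‣ Loop over all split points k, apply rules X -> Y Z to build X in every possible way

[Figure: Cocke-Kasami-Younger chart cell: X over span (i, j) built from Y and Z at split point k]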
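
The chart recurrence above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the toy grammar, its rule probabilities, and the function name `cky` are all assumptions made up for the example, and the grammar is assumed to be binarized with no unary rules (so "ran" is tagged directly as VP).

```python
import math
from collections import defaultdict

def cky(words, lexicon, rules):
    """Viterbi CKY for a binarized PCFG (hypothetical toy setup).

    lexicon: dict word -> list of (symbol, log_prob)
    rules:   list of (X, Y, Z, log_prob) for binary rules X -> Y Z
    Returns chart[(i, j)] = {symbol: best log-probability of building it over words[i:j]}.
    """
    n = len(words)
    chart = defaultdict(dict)
    # Width-1 spans come straight from the lexicon.
    for i, w in enumerate(words):
        for sym, lp in lexicon[w]:
            chart[(i, i + 1)][sym] = lp
    # Build wider spans from narrower ones.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # loop over all split points
                left, right = chart[(i, k)], chart[(k, j)]
                for X, Y, Z, lp in rules:
                    if Y in left and Z in right:
                        score = lp + left[Y] + right[Z]
                        if score > cell.get(X, -math.inf):
                            cell[X] = score
    return chart

# Toy grammar (assumed for illustration only).
lexicon = {"the": [("DT", 0.0)], "dog": [("NN", 0.0)], "ran": [("VP", 0.0)]}
rules = [("NP", "DT", "NN", math.log(0.5)), ("S", "NP", "VP", 0.0)]
chart = cky(["the", "dog", "ran"], lexicon, rules)
best_s = chart[(0, 3)]["S"]  # log-probability of the best S over the whole sentence
```

The three nested loops over span start, span end, and split point are what give CKY its cubic dependence on sentence length (times the number of grammar rules).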


Recall: Top-down Parsing

‣ Dynamic programming version: (best way of building the span from i to j involves maxing over split point and a single label)
‣ Greedy top-down version: at each stage, predict split point k and label l
‣ Can score split points and also labels

Outline

‣ Dependency representation, contrast with constituency
‣ Projectivity
‣ Graph-based dependency parsers

Dependency Representation

Lexicalized Parsing

[Figure: lexicalized constituency tree for "the dog ran to the house", with nodes S(ran), NP(dog), VP(ran), PP(to), NP(house) over the preterminals DT(the) NN(dog) VBD(ran) TO(to) DT(the) NN(house)]


Dependency Parsing

[Figure: dependency arcs over "the dog ran to the house" (POS tags DT NN VBD TO DT NN), with a ROOT symbol]

‣ Dependency syntax: syntactic structure is defined by these arcs
‣ Head (parent, governor) connected to dependent (child, modifier)
‣ Each word has exactly one parent except for the ROOT symbol, dependencies must form a directed acyclic graph
‣ POS tags same as before, usually run a tagger first as preprocessing
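
One common way to store such a structure is a parent array, which makes the "exactly one parent" constraint automatic and leaves only acyclicity to check. The sketch below assumes this encoding (words 1-indexed, 0 standing for ROOT); the function name `is_valid_tree` and the encoding itself are illustrative choices, not something the slides prescribe.

```python
def is_valid_tree(heads):
    """Check that every word reaches ROOT without revisiting a word.

    heads[i] is the parent of word i (words are 1-indexed, 0 denotes ROOT);
    heads[0] is an unused placeholder.
    """
    n = len(heads) - 1
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            # A repeated node means a cycle; an out-of-range parent is malformed.
            if node in seen or not (0 <= heads[node] <= n):
                return False
            seen.add(node)
            node = heads[node]
    return True

# "the dog ran to the house": the->dog, dog->ran, ran->ROOT, to->ran, the->house, house->to
heads_ok = [None, 2, 3, 0, 3, 6, 4]
# Hypothetical malformed structure: words 1 and 2 point at each other.
heads_cycle = [None, 2, 1, 0]
```

Here `is_valid_tree(heads_ok)` holds, while `heads_cycle` is rejected because word 1 never reaches ROOT.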

Dependency Parsing

[Figure: dependency tree for "the dog ran to the house" (POS tags DT NN VBD TO DT NN) drawn as a hierarchy]

‣ Still a notion of hierarchy! Subtrees often align with constituents

Dependency Parsing

[Figure: labeled dependency arcs over "the dog ran to the house" (POS tags DT NN VBD TO DT NN); arc labels: det, nsubj, prep, pobj, det]

‣ Can label dependencies according to syntactic function
‣ Major source of ambiguity is in the structure, so we focus on that more (labeling separately with a classifier works pretty well)

Dependency vs. Constituency: PP Attachment

‣ Constituency: several rule productions need to change


the children ate the cake with a spoon

‣ Dependency: one word (with) assigned a different parent

Dependency vs. Constituency: PP Attachment

‣ More predicate-argument focused view of syntax
‣ “What’s the main verb of the sentence? What is its subject and object?” These questions are easier to answer under dependency parsing

Dependency vs. Constituency: Coordination

dogs in houses and cats

‣ Constituency: ternary rule NP -> NP CC NP
‣ Dependency: first item is the head

Dependency vs. Constituency: Coordination

dogs in houses and cats

[dogs in houses] and cats        dogs in [houses and cats]

‣ Coordination is decomposed across a few arcs as opposed to being a single rule production as in constituency
‣ Can also choose "and" to be the head
‣ In both cases, headword doesn’t really represent the phrase; constituency representation makes more sense

Stanford Dependencies

‣ Designed to be practically useful for relation extraction

[Figure: Standard vs. Collapsed representations of "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]


Dependency vs. Constituency

‣ Dependency is often more useful in practice (models predicate argument structure)
‣ Slightly different representational choices:
‣ PP attachment is better modeled under dependency
‣ Coordination is better modeled under constituency
‣ Dependency parsers are easier to build: no “grammar engineering”, no unaries, easier to get structured discriminative models working well
‣ Dependency parsers are usually faster
‣ Dependencies are more universal cross-lingually

Universal Dependencies

‣ Annotate dependencies with the same representation in many languages

http://universaldependencies.org/

[Figure: example annotations; panel labels: English, Bulgarian, Czech, Swiss]

Projectivity

[Figure: dependency tree for "the dog ran to the house" (POS tags DT NN VBD TO DT NN)]

‣ Any subtree is a contiguous span of the sentence <-> tree is projective

Projectivity

‣ Projective <-> no “crossing” arcs

dogs in houses and cats        the dog ran to the house

‣ Crossing arcs: [Figure: example sentence with crossing arcs; credit: Language Log]
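
The "no crossing arcs" condition is easy to test directly. The sketch below is illustrative only: it assumes a parent-array encoding (words 1-indexed, 0 for ROOT, `heads[0]` unused), and the non-projective example is a made-up arc pattern rather than a real sentence.

```python
def is_projective(heads):
    """Return True if no two arcs cross.

    heads[d] is the parent of word d (words are 1-indexed, 0 denotes ROOT);
    heads[0] is an unused placeholder.
    """
    # Represent each arc by its (left endpoint, right endpoint).
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the first.
            if a < c < b < d:
                return False
    return True

# "the dog ran to the house": the->dog, dog->ran, ran->ROOT, to->ran, the->house, house->to
projective = [None, 2, 3, 0, 3, 6, 4]
# Hypothetical 4-word pattern: arc 3->1 crosses arc 4->2.
crossing = [None, 3, 4, 0, 3]
```

The pairwise check is quadratic in sentence length, which is fine for a diagnostic; `is_projective(projective)` holds and `is_projective(crossing)` does not.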


Projectivity in other languages

‣ Swiss German example [Figure; credit: Pitler et al. (2013)]
‣ (Swiss German also has famous non-context-free constructions)


Projectivity

‣ Many trees in other languages are nonprojective
‣ Some other formalisms (that are harder to parse in), most useful one is 1-Endpoint-Crossing
‣ Number of trees produceable under different formalisms [Table; source: Pitler et al. (2013)]

Graph-Based Parsing


Defining Dependency Graphs

[Figure: dependency arcs over "ROOT the dog ran to the house"]

‣ Words in sentence x, tree T is a collection of directed edges (parent(i), i) for each word i
‣ Each word has exactly one parent. Edges must form a projective tree
‣ Log-linear CRF (discriminative):

$P(T \mid x) \propto \exp\left( \sum_i w^\top f(i, \mathrm{parent}(i), x) \right)$

‣ Example of a feature = I[head=to & modifier=house] (more in a few slides)
‣ Parsing = identify parent(i) for each word
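
To make the arc-factored model concrete, the sketch below scores a tree as a sum of per-arc scores and normalizes by brute force over all projective trees of a very short sentence. Everything here is an assumption for illustration: `arc_score[h][d]` stands in for the dot product of weights and features for the arc h -> d, ROOT is allowed several children, and exhaustive enumeration is only feasible for tiny inputs (real systems use dynamic programming instead).

```python
import itertools
import math

def is_tree(heads):
    """Every word must reach ROOT (index 0) without revisiting a word."""
    for i in range(1, len(heads)):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True

def is_projective(heads):
    """No two arcs may cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)

def tree_score(heads, arc_score):
    """Sum of arc scores over the edges (parent(i), i)."""
    return sum(arc_score[heads[d]][d] for d in range(1, len(heads)))

def tree_prob(heads, arc_score):
    """P(T|x) by explicit enumeration of all projective trees (tiny n only)."""
    n = len(heads) - 1
    z = 0.0
    for cand in itertools.product(range(n + 1), repeat=n):
        cand_heads = [None] + list(cand)
        if is_tree(cand_heads) and is_projective(cand_heads):
            z += math.exp(tree_score(cand_heads, arc_score))
    return math.exp(tree_score(heads, arc_score)) / z

# Two-word sentence with all arc scores zero: every tree is equally likely.
zeros = [[0.0] * 3 for _ in range(3)]
p_uniform = tree_prob([None, 0, 1], zeros)
```

With two words there are three projective trees (both words attached to ROOT, or one word attached to the other), so the uniform case gives each tree probability one third.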

Generalizing CKY

[Figure: over "wrote a long report on Mars", span (2, 5) headed by 4 = report and span (5, 7) headed by 5 = on combine into span (2, 7) headed by 4]

‣ DP chart with three dimensions: start, end, and head, start <= head < end
‣ new score = chart(2, 5, 4) + chart(5, 7, 5) + edge score(4 -> 5)
‣ score(2, 7, 4) = max(score(2, 7, 4), new score)
‣ Time complexity of this?
‣ Many spurious derivations: can build the same tree in many ways… need a better algorithm

Eisner’s Algorithm: O(n^3)

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT]

‣ Cubic-time algorithm
‣ Maintain two dynamic programming charts with dimension [n, n, 2]:
‣ Complete items: head is at “tall end”, may be missing children on tall side
‣ Incomplete items: arc from “tall” to “short” end, word on short end may also be missing children

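Eisner’s Algorithm: O(n^3)

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT]

‣ Complete item: all children are attached, head is at the “tall end”
‣ Incomplete item: arc from “tall end” to “short end”, may still expect children
‣ Take two adjacent complete items, add arc and build incomplete item
  [Diagram: complete item + complete item = incomplete item (arc in either direction)]
‣ Take an incomplete item, complete it (other case is symmetric)
  [Diagram: incomplete item + complete item = complete item]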


Eisner’s Algorithm: O(n^3)

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT, alongside the two combination-rule diagrams]

1) Build incomplete span
2) Promote to complete
3) Build incomplete span

Eisner’s Algorithm: O(n^3)

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT, alongside the two combination-rule diagrams]

4) Promote to complete

Eisner’s Algorithm: O(n^3)

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT]

‣ We’ve built left children and right children of "ran" as complete items
‣ Attaching to ROOT makes an incomplete item with left children, attaches with right children subsequently to finish the parse

Eisner’s Algorithm

[Figure: chart items over "ROOT the dog ran to the house"; legend: right complete, left complete, right incomplete, left incomplete]


Eisner’s Algorithm

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT]

‣ Eisner’s algorithm doesn’t have split point ambiguities like CKY does
‣ Left and right children are built independently, heads are edges of spans
‣ Charts are n x n x 2 because we need to track arc direction / left vs right

Eisner:

n5
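
The complete/incomplete item rules described on the preceding slides can be sketched as follows. This is a minimal first-order version written for illustration, not the course's reference code: the function name `eisner`, the score-matrix layout (`scores[h][m]` for arc h -> m, index 0 as ROOT), and the choice to let ROOT take several children are all assumptions.

```python
import math

def eisner(scores):
    """Highest-scoring projective tree under arc scores (illustrative sketch).

    scores[h][m] is the score of arc h -> m; index 0 is ROOT.
    Returns (heads, best_score), where heads[m] is the parent of word m.
    """
    N = len(scores)
    NEG = -math.inf
    # Item [s][t][d]: d = 0 means the head is at the right end t,
    # d = 1 means the head is at the left end s.
    comp = [[[0.0, 0.0] if s == t else [NEG, NEG] for t in range(N)] for s in range(N)]
    inc = [[[NEG, NEG] for _ in range(N)] for _ in range(N)]
    comp_bp = [[[None, None] for _ in range(N)] for _ in range(N)]
    inc_bp = [[[None, None] for _ in range(N)] for _ in range(N)]

    for width in range(1, N):
        for s in range(N - width):
            t = s + width
            # Two adjacent complete items plus a new arc give an incomplete item.
            best, r = max((comp[s][k][1] + comp[k + 1][t][0], k) for k in range(s, t))
            if s > 0:  # ROOT (index 0) never receives a parent
                inc[s][t][0], inc_bp[s][t][0] = best + scores[t][s], r
            inc[s][t][1], inc_bp[s][t][1] = best + scores[s][t], r
            # An incomplete item plus a complete item give a complete item.
            comp[s][t][0], comp_bp[s][t][0] = max(
                (comp[s][k][0] + inc[k][t][0], k) for k in range(s, t))
            comp[s][t][1], comp_bp[s][t][1] = max(
                (inc[s][k][1] + comp[k][t][1], k) for k in range(s + 1, t + 1))

    heads = [None] * N

    def backtrack(s, t, d, complete):
        if s == t:
            return
        if complete:
            k = comp_bp[s][t][d]
            if d == 0:
                backtrack(s, k, 0, True)
                backtrack(k, t, 0, False)
            else:
                backtrack(s, k, 1, False)
                backtrack(k, t, 1, True)
        else:
            k = inc_bp[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, k, 1, True)
            backtrack(k + 1, t, 0, True)

    backtrack(0, N - 1, 1, True)
    return heads, comp[0][N - 1][1]

# Toy scores for "ROOT the dog ran" (made up): favor the->dog, dog->ran, ran->ROOT.
toy = [[0.0] * 4 for _ in range(4)]
toy[2][1] = toy[3][2] = toy[0][3] = 5.0
best_heads, best_score = eisner(toy)
```

Each chart cell takes a maximum over at most n split positions and there are on the order of n^2 cells, which is where the cubic running time comes from. Replacing max with log-sum-exp would give the corresponding sum over projective trees.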

Building Systems

‣ Can implement decoding and marginal computation using Eisner’s algorithm to max/sum over projective trees
‣ Conceptually the same as inference/learning for sequential CRFs for NER, can also use margin-based methods

Features in Graph-Based Parsing

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT; features shown for f(i, parent(i), x) on the arc to -> house]

‣ Dynamic program exposes the parent and child indices
‣ McDonald et al. (2005): conjunctions of parent and child words + POS, POS of words in between, POS of surrounding words
‣ HEAD=TO & MOD=NN
‣ HEAD=TO & MOD-1=the
‣ HEAD=TO & MOD=house
‣ ARC_CROSSES=DT
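
A feature function of this kind can be sketched as a routine that returns sparse feature names for one candidate arc. The function names, the exact string templates, and the weight values below are assumptions for illustration; they mirror the four example features on the slide rather than the full template set of McDonald et al. (2005).

```python
def arc_features(words, tags, head, mod):
    """Sparse indicator features for the arc head -> mod (illustrative templates).

    words and tags include a ROOT entry at index 0.
    """
    feats = [
        f"HEAD={tags[head]}&MOD={tags[mod]}",        # head POS with modifier POS
        f"HEAD={tags[head]}&MOD-1={words[mod - 1]}",  # word just before the modifier
        f"HEAD={tags[head]}&MOD={words[mod]}",        # head POS with modifier word
    ]
    lo, hi = sorted((head, mod))
    for k in range(lo + 1, hi):
        feats.append(f"ARC_CROSSES={tags[k]}")        # POS of words in between
    return feats

def arc_score(weights, feats):
    """Dot product of the weight vector with the indicator features."""
    return sum(weights.get(f, 0.0) for f in feats)

words = ["ROOT", "the", "dog", "ran", "to", "the", "house"]
tags = ["ROOT", "DT", "NN", "VBD", "TO", "DT", "NN"]
feats = arc_features(words, tags, head=4, mod=6)  # arc to -> house
weights = {"HEAD=TO&MOD=NN": 1.5, "ARC_CROSSES=DT": 0.5}  # hypothetical weights
score = arc_score(weights, feats)
```

Because each feature depends only on the sentence and one (head, modifier) pair, all arc scores can be precomputed into a matrix before the dynamic program runs.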

Higher-Order Parsing

[Figure: "the dog ran to the house" (POS tags DT NN VBD TO DT NN) with ROOT; Koo and Collins (2009)]

‣ Track additional state during parsing so we can look at “grandparents” (and siblings). O(n^4) dynamic program or use approximate search

f(i, parent(i), parent(parent(i)), x)


Biaffine Neural Parsing

‣ Neural CRFs for dependency parsing: let c = LSTM embedding of i, p = LSTM embedding of parent(i). score(i, parent(i), x) = p^T U c

[Figure: Dozat and Manning (2017) architecture; LSTM looks at words and POS; embeddings are (num words x hidden size), arc scores are (num words x num words)]
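
The bilinear score p^T U c can be computed for every (parent, child) pair at once, turning a (num words x hidden size) matrix of embeddings into a (num words x num words) matrix of arc scores. The sketch below uses plain Python lists and made-up numbers, and it reuses one embedding per word for both roles, which is a simplification assumed here for brevity.

```python
def biaffine_scores(H, U):
    """scores[h][m] = H[h]^T U H[m] for every candidate (parent h, child m).

    H: num_words x hidden_size embeddings (stand-ins for LSTM outputs).
    U: hidden_size x hidden_size parameter matrix.
    """
    def bilinear(p, c):
        return sum(p[a] * U[a][b] * c[b]
                   for a in range(len(p)) for b in range(len(c)))
    return [[bilinear(p, c) for c in H] for p in H]

# Hypothetical embeddings for three tokens with hidden size 2.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
U = [[0.0, 2.0], [1.0, 0.0]]
S = biaffine_scores(H, U)
```

Because U is not symmetric, the score for h -> m generally differs from the score for m -> h, so the model can prefer one arc direction over the other. The resulting matrix can be passed to a tree decoder in place of feature-based arc scores.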

Evaluating Dependency Parsing

‣ UAS: unlabeled attachment score. Accuracy of choosing each word’s parent (n decisions per sentence)
‣ LAS: additionally consider label for each edge
‣ Log-linear CRF parser, decoding with Eisner algorithm: 91 UAS
‣ Higher-order features from Koo parser: 93 UAS
‣ Best English results with neural CRFs (Dozat and Manning): 95-96 UAS
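
Both metrics are per-word accuracies and can be computed in a few lines. The sketch assumes each word is represented as a (head index, label) pair; the function name, the "root" label, and the predicted parse below are made up for illustration.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for one sentence.

    gold, pred: lists of (head, label) pairs, one per word.
    """
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / n        # head and label
    return uas, las

# "the dog ran to the house"; heads are 1-indexed word positions, 0 is ROOT.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "prep"), (6, "det"), (4, "pobj")]
# Hypothetical prediction: "to" attached to "dog", and a wrong label on "house".
pred = [(2, "det"), (3, "nsubj"), (0, "root"), (2, "prep"), (6, "det"), (4, "dobj")]
uas, las = attachment_scores(gold, pred)
```

Here five of six heads are correct and four of six (head, label) pairs are correct. LAS can never exceed UAS, since a labeled match requires the head to match as well.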

HPSG

Pollard and Sag (1994), Zhou and Zhao (2019)

‣ Head-driven phrase structure grammar (HPSG): very complex grammar formalism which annotates large feature structures over the tree
‣ Very little work on HPSG in NLP

Parsing with “HPSG”

Zhou and Zhao (2019)

‣ Joint model of constituency and dependency combining ideas from Dozat + Manning and Stern et al.


Parsing with “HPSG”

Zhou and Zhao (2019)

‣ Slightly stronger results than Dozat + Manning, significantly better results on Chinese

Takeaways

‣ Dependency formalism provides an alternative to constituency, particularly useful in how portable it is across languages
‣ Dependency parsing also has efficient dynamic programs for inference
‣ CRFs + neural CRFs (again) work well

Proj 1 Results

Jiaming Chen: 82.46 F1
Po-Yi Chen: 82.02 F1
Ting-Yu Yen: 81.57 F1
All others < 81

‣ Word Pair features, larger window for POS tag extraction ([-2, 2])
‣ Also larger window and data shuffling in between epochs
‣ City gazetteer, generic date recognizer

Prakhar Singh: 81.54 F1

‣ Unregularized Adagrad worked best

