Page 1: Administrivia CS388: Natural Language Processing Lecture 1 ...gdurrett/courses/sp2021/lectures/lec1-4pp.pdf ‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry

CS388: Natural Language Processing
Lecture 1: Introduction

Greg Durrett

Administrivia

‣ Course website: http://www.cs.utexas.edu/~gdurrett/courses/sp2021/cs388.shtml

‣ Piazza: link on the course website

‣ My office hours: Tuesday 1pm-2pm, Wednesday 3:30pm-4:30pm

‣ TA: Xi Ye. See course website for OHs

‣ Lecture: Tuesdays and Thursdays 9:30am-10:45am

‣ Note: my OHs today are 12:30pm-1:30pm

‣ Gradescope: you should’ve gotten an email

Course Requirements

‣ 391L Machine Learning (or equivalent)

‣ 311 or 311H Discrete Math for Computer Science (or equivalent)

‣ Additional prior exposure to probability, linear algebra, optimization, linguistics, and NLP useful but not required

‣ Python experience

‣ Mini 1 is out now (due January 28), please look at it soon

‣ If this seems like it’ll be challenging for you, come and talk to me (this is smaller-scale than the projects, which are smaller-scale than the final project)

What’s the goal of NLP?

‣ Be able to solve problems that require deep understanding of text

‣ Example: dialogue systems

Siri, what’s your favorite kind of movie?

I like superhero movies!

What’s come out recently?

The Avengers


Question Answering

When was Abraham Lincoln born?

map to Birthday field

Name                 Birthday
Lincoln, Abraham     2/12/1809
Washington, George   2/22/1732
Adams, John          10/30/1735

February 12, 1809

How many visitor centers are there in Rocky Mountain National Park?

The park has a total of five visitor centers

five
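
The Lincoln example, answering a question by mapping it to the Birthday field of a table, can be sketched roughly as follows. This is a toy illustration only: the `TABLE` dictionary, the `answer_birthday` function, and the substring-matching rule are hypothetical stand-ins for the learned mapping a real system would need.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Toy knowledge base mirroring the Name/Birthday table on the slide.
TABLE = {
    "Abraham Lincoln": date(1809, 2, 12),
    "George Washington": date(1732, 2, 22),
    "John Adams": date(1735, 10, 30),
}

def answer_birthday(question):
    """Return the Birthday field for the first person named in the question.

    Hypothetical rule: a question containing "born" maps to the Birthday
    field, and the entity is found by exact substring match on the name.
    """
    lowered = question.lower()
    if "born" not in lowered:
        return None  # this sketch only handles birthday questions
    for name, birthday in TABLE.items():
        if name.lower() in lowered:
            return f"{MONTHS[birthday.month - 1]} {birthday.day}, {birthday.year}"
    return None

print(answer_birthday("When was Abraham Lincoln born?"))
```

The second example (counting visitor centers) is harder for a rule like this, because the answer has to be read out of free text rather than looked up in a structured field.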

Machine Translation

中共中央政治局7月30日召开会议,会议分析研究当前经济形势,部署下半年经济工作。

The Political Bureau of the CPC Central Committee held a meeting on July 30 to analyze and study the current economic situation and plan economic work in the second half of the year.

People’s Daily, August 10, 2020

The Political Bureau of the CPC Central Committee | July 30 | hold a meeting

Translate

Automatic Summarization

One of New America’s writers posted a statement critical of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

provide missing context

paraphrase to provide clarity

compress text

NLP Analysis Pipeline

Text → Text Analysis → Annotations → Applications

Annotations: syntactic parses, coreference resolution, entity disambiguation, discourse analysis

Applications: summarize, extract information, answer questions, identify sentiment, translate

‣ NLP is about building these pieces!

‣ All of these components are modeled with statistical approaches trained with machine learning


How do we represent language?

Text

Labels:
the movie was good → +
Beyoncé had one of the best videos of all time → subjective

Sequences/tags:
Tom Cruise stars in the new Mission Impossible film
Tom Cruise → PERSON, Mission Impossible → WORK_OF_ART

Trees:
I eat cake with icing (parse tree figure with nodes S, NP, VP, PP and tags VBZ, NN)
flights to Miami → λx.flight(x) ∧ dest(x)=Miami
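
The logical form λx.flight(x) ∧ dest(x)=Miami denotes the set of entities that are flights and whose destination is Miami. A minimal sketch of executing it against a toy database follows; the `Entity` records and their contents are invented for illustration and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str            # e.g. "flight" or "train"
    dest: Optional[str]  # destination city, if any

# Hypothetical database used only to give the predicate something to run on.
DATABASE = [
    Entity("F1", "flight", "Miami"),
    Entity("F2", "flight", "Austin"),
    Entity("T1", "train", "Miami"),
]

def flight(x):
    """Predicate flight(x): true when the entity is a flight."""
    return x.kind == "flight"

def dest(x):
    """Function dest(x): the entity's destination."""
    return x.dest

def flights_to_miami(x):
    """The logical form λx. flight(x) ∧ dest(x) = Miami as a Python predicate."""
    return flight(x) and dest(x) == "Miami"

# Executing the logical form means collecting every entity that satisfies it.
denotation = [e.name for e in DATABASE if flights_to_miami(e)]
print(denotation)
```

The point of the representation is that, once text has been mapped to such a form, answering the query is ordinary program execution.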

How do we use these representations?

Text → Text Analysis → Labels, Sequences, Trees → Applications

‣ Main question: What representations do we need for language? What do we want to know about it?

‣ Boils down to: what ambiguities do we need to resolve?

Applications

Tree transducers (for machine translation)

Extract syntactic features

Tree-structured neural networks

end-to-end models …

Why is language hard? (and how can we handle that?)

Language is Ambiguous!

‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

‣ >5 datasets in the last two years examining this problem and commonsense reasoning

‣ Referential ambiguity

The city council refused the demonstrators a permit because they advocated violence

The city council refused the demonstrators a permit because they feared violence


Language is Ambiguous!

example credit: Dan Klein

‣ Syntactic and semantic ambiguities: parsing needed to resolve these, but need context to figure out which parse is correct

Teacher Strikes Idle Kids (tagged N N V N or N V ADJ N)

Ban on Nude Dancing on Governor’s Desk (alternative parse trees over NP and PP nodes)

Iraqi Head Seeks Arms (Head: body/position; Arms: body/weapon)

Language is Really Ambiguous!

‣ There aren’t just one or two possibilities which are resolved pragmatically

‣ Combinatorially many possibilities, many you won’t even register as ambiguities, but systems still have to resolve them

il fait vraiment beau

Candidate translations: It is really nice out; It’s really nice; The weather is beautiful; It is really beautiful outside; He makes truly beautiful; It fact actually handsome
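
One way to see why the possibilities are combinatorial: the number of distinct binary bracketings of an n-word sentence is the Catalan number C(n-1), which grows exponentially in n. The sketch below counts bracketings with a small memoized recursion. It ignores grammar and node labels entirely (a simplifying assumption made here for illustration), so it counts unlabeled binary tree shapes, not grammatical parses.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_bracketings(n):
    """Number of binary tree shapes over a span of n words."""
    if n <= 1:
        return 1
    # Choose a split point: the left child covers k words, the right n - k.
    return sum(num_bracketings(k) * num_bracketings(n - k) for k in range(1, n))

for n in (2, 5, 10, 20):
    print(n, num_bracketings(n))
```

A 20-word sentence already has over a billion bracketings, which is why parsers rely on dynamic programming and statistical scoring instead of enumeration.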

What do we need to understand language?

‣ Lots of data!

slide credit: Dan Klein

What do we need to understand language?

‣ World knowledge: have access to information beyond the training data

DOJ greenlights Disney-Fox merger

DOJ: Department of Justice

greenlights: metaphor; “approves”

‣ What is a green light? How do we understand what “greenlighting” does?

‣ Need commonsense knowledge


What do we need to understand language?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

McMahan and Stone (2015), Golland et al. (2010)

What do we need to understand language?

‣ Linguistic structure

‣ …but computers probably won’t understand language the same way humans do

‣ However, linguistics tells us what phenomena we need to be able to deal with and gives us hints about how language works

Centering Theory, Grosz et al. (1995)

What techniques do we use? (to combine data, knowledge, linguistics, etc.)

A brief history of (modern) NLP

[Timeline figure spanning 1980, 1990, 2000, 2010, 2020. Labels: Largely rule-based, expert systems; earliest stat MT work at IBM; Penn treebank; Ratnaparkhi tagger; Collins vs. Charniak parsers; Sup: SVMs, CRFs, NER, Sentiment; Unsup: topic models, grammar induction; Semi-sup, structured prediction; Neural; Pretraining]


Supervised vs. Unsupervised

‣ Supervised techniques work well on very little data (even neural networks)

unsupervised learning

annotation (two hours!) → better system!

“Learning a Part-of-Speech Tagger from Two Hours of Annotation”, Garrette and Baldridge (2013)

‣ Fully unsupervised techniques have fallen out of favor

Pretraining

Peters et al. (2018), Devlin et al. (2019)

‣ Language modeling: predict the next word in a text: $P(w_i \mid w_1, \ldots, w_{i-1})$

P(w | I want to go to) = 0.01 Hawai’i, 0.005 LA, 0.0001 class

P(w | the acting was horrible, I think the movie was) = 0.1 bad, 0.001 good

‣ Model understands some sentiment?

use this model for other purposes

‣ Train a neural network to do language modeling on massive unlabeled text, fine-tune it to do {tagging, sentiment, question answering, …}
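
As a rough illustration of the language-modeling objective, the sketch below estimates next-word probabilities from bigram counts on a tiny invented corpus. It conditions on only the previous word rather than the full history, which is a simplifying assumption: the pretrained models this slide refers to are neural networks that condition on much more context and train on far more text.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; real language models train on massive unlabeled text.
corpus = [
    "the acting was horrible and the movie was bad",
    "the acting was great and the movie was good",
    "the plot was horrible and the movie was bad",
]

# Count how often each word follows each context word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[prev][word] += 1

def next_word_probs(prev):
    """Maximum-likelihood estimate of P(w | prev) from the bigram counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

probs = next_word_probs("was")
print(sorted(probs.items(), key=lambda kv: -kv[1]))
```

Because a bigram model sees only the word "was", it cannot use "the acting was horrible" to prefer "bad" over "good" for the movie; capturing that dependency is what the longer-context models are for.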

Interpretability

Wallace, Gardner, Singh, Interpretability Tutorial at EMNLP 2020

‣ When we have complex models, how do we understand their decisions?

‣ “Attribution”: understand what parts of the input contribute to a prediction

‣ Why was it class A instead of class B?

‣ What is the “counterfactual” scenario we are considering (the foil)?

I drank tea because I don’t like coffee

I drank tea because I was thirsty

(Jacovi and Goldberg, 2020)

‣ Dataset biases: does our data have flaws that prevent the model from doing the right thing?

‣ Probing: what representations get learned in deep models?
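
One simple form of attribution is leave-one-out: delete each input token in turn and measure how much the model's score changes. The sketch below applies this to a toy lexicon-based sentiment scorer. The scorer, its `WEIGHTS`, and the example sentence are hypothetical; the same loop could wrap any model that maps a token list to a score.

```python
# Hypothetical word weights for a toy sentiment "model".
WEIGHTS = {"good": 1.0, "great": 2.0, "bad": -1.0, "horrible": -2.0}

def score(tokens):
    """Toy model: sum of per-word sentiment weights (unknown words count 0)."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def leave_one_out(tokens):
    """Attribute to each token the full score minus the score without it."""
    full = score(tokens)
    return [
        (tok, full - score(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]

tokens = "the acting was horrible but the ending was good".split()
for tok, attribution in leave_one_out(tokens):
    print(f"{tok:10s} {attribution:+.1f}")
```

Here the implicit foil is "the same input with this token removed"; choosing a different counterfactual would generally give different attributions.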


Where are we?

‣ NLP consists of: analyzing and building representations for text, solving problems involving text

‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguistics to solve

‣ Knowing which techniques to use requires understanding dataset size, problem complexity, and a lot of tricks!

‣ NLP encompasses all of these things

NLP vs. Computational Linguistics

‣ NLP: build systems that deal with language data

‣ CL: use computational tools to study language

Hamilton et al. (2016)

NLP vs. Computational Linguistics

‣ Computational tools for other purposes: literary theory, political science…

Bamman, O’Connor, Smith (2013)

Outline

[Schedule figure; bracketed groups: ML and structured prediction for NLP; Neural nets]


Outline: Syntax + Semantics

Outline: Applications

Ethics

https://toxicdegeneration.allenai.org/

‣ E.g., “toxic degeneration”: systems can generate {racist, sexist, …} content

‣ We will touch on ethical issues throughout the course

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Make you a “producer” rather than a “consumer” of NLP tools

‣ Cover modern NLP problems encountered in the literature: what are the active research topics in 2021?

‣ The four assignments should teach you what you need to know to understand nearly any system in the literature (e.g.: state-of-the-art NER system = project 1 + mini 2 + BERT, basic MT system = project 2)

‣ Understand how to look at language data and approach linguistic phenomena


Assignments

‣ Two minis (10% each), two projects (20% each)

‣ Implementation-oriented, with an open-ended component to each

‣ Mini 1 (classification) is out NOW

‣ 1 week for minis, ~2 weeks per project, 5 “slip days” for automatic extensions

‣ Grading:

  ‣ Minis: largely graded based on code performance

  ‣ Projects: graded on a mix of code performance, writeup, extension

These projects require understanding of the concepts, ability to write performant code, and ability to think about how to debug complex systems. They are challenging, so start early!

Assignments

‣ Final project (40%)

‣ Groups of 2 preferred, 1 is possible

‣ (Brief!) proposal to be approved by me by the midpoint of the semester

‣ Written in the style and tone of an ACL paper

Conduct

A climate conducive to learning and creating knowledge is the right of every person in our community. Bias, harassment and discrimination of any sort have no place here. If you notice an incident that causes concern, please contact the Campus Climate Response Team: diversity.utexas.edu/ccrt

The College of Natural Sciences is steadfastly committed to enriching and transformative educational and research experiences for every member of our community. Find more resources to support a diverse, equitable and welcoming community within Texas Science and share your experiences at cns.utexas.edu/diversity

Survey (on Instapoll)

1. Name

2. Fill in: I am a [CS / ____] [PhD / masters / undergrad] in year [1 2 3 4 5+]

3. Write one reason you want to take this class or one thing you want to get out of it

4. One interesting fact about yourself, or what you like to do in your spare time

