
Recall: SQuAD CS388: Natural Language Processing gdurrett/courses/fa2019/... · Same entity Doc 3...

Date post: 02-Feb-2021
  • CS388: Natural Language Processing

    Greg Durrett

    Lecture 22: Question Answering 2

    Recall: SQuAD

    ‣ Single-document, single-sentence question-answering task where the answer is always a substring of the passage

    Rajpurkar et al. (2016)

    ‣ Predict start and end indices of the answer in the passage
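    Predicting start and end indices usually means scoring every token as a possible start and as a possible end, then decoding the best valid pair. The minimal sketch below assumes the scores are logits or log-probabilities, so adding them corresponds to multiplying start and end probabilities. The function name `best_span`, the length cap `max_len=15`, and the toy scores are illustrative assumptions, not taken from the lecture.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 15) -> Tuple[int, int]:
    """Return the (start, end) pair maximizing start_scores[i] + end_scores[j],
    restricted to i <= j < i + max_len (the end may not precede the start)."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        # only end positions at or after the start, at most max_len tokens long
        for j in range(i, min(i + max_len, len(end_scores))):
            if s + end_scores[j] > best_score:
                best, best_score = (i, j), s + end_scores[j]
    return best


# Hypothetical scores for a toy passage (not output from any real model).
passage = "Marie Curie was the first female recipient of the Nobel Prize".split()
start = [0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.3, 2.5, 0.4]
end = [0.0, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.5, 3.0]
i, j = best_span(start, end)
print(" ".join(passage[i:j + 1]))  # Nobel Prize
```

    The `i <= j` constraint matters. With start scores [0, 5] and end scores [5, 1], picking the two argmaxes independently would give an end before the start, but `best_span` returns (1, 1).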

    Recall: Bidirectional Attention Flow

    Seo et al. (2016)

    Each passage word now “knows about” the query
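    One way a passage word comes to “know about” the query is context-to-query attention. For each passage vector, the model takes a softmax over its similarities to all query vectors and computes a weighted sum of the query vectors. The sketch below shows only this one direction. It uses a plain dot product as the similarity, which is a simplification: BiDAF itself uses a trainable similarity and also has a query-to-context direction. The vectors are hypothetical toy values.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_to_query(passage: List[Vector], query: List[Vector]) -> List[Vector]:
    """For each passage vector h, softmax its similarities to every query vector u
    and return the attention-weighted sum of the query vectors."""
    dim = len(query[0])
    attended = []
    for h in passage:
        # Simplification: plain dot product; BiDAF uses a trainable w^T [h; u; h*u].
        sims = [sum(a * b for a, b in zip(h, u)) for u in query]
        weights = softmax(sims)
        attended.append([sum(w * u[k] for w, u in zip(weights, query)) for k in range(dim)])
    return attended


# Toy 2-d vectors (hypothetical): passage word 0 aligns with query word 0,
# passage word 1 with query word 1.
query = [[1.0, 0.0], [0.0, 1.0]]
passage = [[10.0, 0.0], [0.0, 10.0]]
print(context_to_query(passage, query))
```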

    Recall: QA with BERT

    Devlin et al. (2019)

    What was Marie Curie the first female recipient of? [SEP] One of the most famous people born in Warsaw was Marie …

    ‣ Predict start and end positions of answer in passage

    ‣ No need for crazy BiDAF-style layers
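    The input above is a single sequence with the question and passage joined by [SEP]. The sketch below packs such a sequence with segment ids (0 for the question side, 1 for the passage side), following the usual BERT setup. It then restricts answer positions to the passage tokens. The helper name `pack_qa_input` is hypothetical, and the whitespace tokens stand in for BERT's WordPiece subwords.

```python
from typing import List, Tuple


def pack_qa_input(question: List[str], passage: List[str]) -> Tuple[List[str], List[int], int]:
    """Build [CLS] question [SEP] passage [SEP] with segment ids
    (0 = question side, 1 = passage side) and the index of the first passage token."""
    tokens = ["[CLS]"] + question + ["[SEP]"] + passage + ["[SEP]"]
    segment_ids = [0] * (len(question) + 2) + [1] * (len(passage) + 1)
    passage_offset = len(question) + 2
    return tokens, segment_ids, passage_offset


# Whitespace tokens for illustration; real BERT inputs use WordPiece subwords.
q = "What was Marie Curie the first female recipient of ?".split()
p = "One of the most famous people born in Warsaw was Marie".split()
tokens, segments, offset = pack_qa_input(q, p)
# Valid start/end predictions are restricted to the passage positions:
valid_positions = range(offset, offset + len(p))
print(tokens[offset], tokens[valid_positions[-1]])  # One Marie
```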

  • Recall: SQuAD SOTA

    ‣ Harder QA settings are needed

    ‣ Performance is very saturated

    This Lecture

    ‣ Retrieval-based QA / multi-hop QA

    ‣ Problems in QA, especially related to answer type overfitting

    ‣ New QA frontiers

    Problems in QA

    Adversarial SQuAD

    ‣ SQuAD questions are often easy: “what was she the recipient of?” passage: “…recipient of Nobel Prize…”

    Jia and Liang (2017)

  • Adversarial SQuAD

    What was Marie Curie the first female recipient of? [SEP] … first female recipient of the Nobel Prize …

    ‣ BERT easily learns surface-level correspondences like this with self-attention


    ‣ Can we make them harder by adding a distractor answer in a very similar context?

    ‣ Take question, modify it to look like an answer (but it's not), then append it to the passage

    Adversarial SQuAD

    Jia and Liang (2017)

    ‣ Distractor “looks” more like the question than the right answer does, even if entities are wrong
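    A toy word-overlap scorer shows why a distractor can “look” more like the question than the real answer sentence does. This scorer is not any model from the lecture, only an illustration of a surface-level matching signal. Both sentences are hypothetical. The distractor uses a fictional person and award, in the spirit of the fake entities in Jia and Liang's distractors.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(question: str, sentence: str) -> int:
    """Number of distinct tokens shared by the question and a sentence."""
    return len(tokens(question) & tokens(sentence))


question = "What was Marie Curie the first female recipient of?"
# Hypothetical sentences: a correct one and a made-up distractor
# (fictional person and award) built from the question's own wording.
gold = "Marie Curie won the Nobel Prize in Physics in 1903."
distractor = "Jane Smith was the first female recipient of the Harlow Medal."

for sent in (gold, distractor):
    print(overlap(question, sent), sent)
# prints 3 for the gold sentence and 6 for the distractor
```

    Any model that leans on this kind of matching will prefer the distractor, even though its entities are wrong.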

    Weakness to Adversaries

    Jia and Liang (2017)

    ‣ Performance of basically every model drops to below 60% (when the model doesn't train on these)

    ‣ BERT variants also weak to these kinds of adversaries

    ‣ Unlike other adversarial models, we don’t need to customize the adversary to the model; this single sentence breaks every SQuAD model

  • Universal Adversarial “Triggers”

    Wallace et al. (2019)

    ‣ Adding “why how because to kill american people” causes SQuAD models to return this answer 10-50% of the time when given a “why” question

    ‣ Similar attacks on other question types like “who”

    ‣ Similar to Jia and Liang, but instead add the same adversary to every passage
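    The key difference from Jia and Liang is that the same trigger is inserted into every passage. Attack success is then the fraction of predictions that land inside the trigger. The sketch below measures that rate. The trigger string is the slide's. Prepending the trigger is an assumption of this sketch, and `toy_model` is a hypothetical stand-in for a real SQuAD model.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, passage)

# Trigger text as given on the slide.
TRIGGER = "why how because to kill american people."


def attack_success_rate(model: Callable[[str, str], str],
                        examples: List[Example], trigger: str = TRIGGER) -> float:
    """Insert the SAME trigger into every passage and return the fraction of
    examples whose predicted answer lies inside the trigger text."""
    hits = 0
    for question, passage in examples:
        # Prepending is an assumption of this sketch; the position is a design choice.
        answer = model(question, trigger + " " + passage)
        if answer and answer in trigger:
            hits += 1
    return hits / len(examples)


def toy_model(question: str, passage: str) -> str:
    """Hypothetical stand-in for a SQuAD model: for "why" questions it returns the
    four words after the first "because"; otherwise the last word of the passage."""
    words = passage.split()
    if question.lower().startswith("why") and "because" in words:
        k = words.index("because")
        return " ".join(words[k + 1:k + 5])
    return words[-1]


examples = [("Why did the team win?", "The team won because of its defense."),
            ("Who wrote the book?", "The book was written by Ann Lee.")]
print(attack_success_rate(toy_model, examples))  # 0.5
```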

    How to fix QA?

    ‣ Better models?

    ‣ Better datasets

    ‣ But a model trained on weak data will often still be weak to adversaries

    ‣ Training on Jia + Liang adversaries can help, but there are plenty of other similar attacks that it doesn't solve

    ‣ Same questions but with more distractors may challenge our models

    ‣ Harder QA tasks

    ‣ Ask questions which cannot be answered in a simple way

    ‣ Next up: retrieval-based QA models

    ‣ Afterwards: multi-hop QA and other QA settings

    Retrieval Models

    Open-domain QA

    ‣ SQuAD-style QA is very artificial, not really a real application

    ‣ Real QA systems should be able to handle more than just a paragraph of context: theoretically, they should work over the whole web?

    Q: What was Marie Curie the recipient of?

    Marie Curie was awarded the Nobel Prize in Chemistry and the Nobel Prize in Physics…

    Mother Teresa received the Nobel Peace Prize in…

    Curie received his doctorate in March 1895… Skłodowska received accolades for her early work…


    ‣ QA pipeline: given a question:

    ‣ Retrieve some documents with an IR system

    ‣ Zero in on the answer in those documents with a QA model

    ‣ This also introduces more complex distractors (bad answers) and should require stronger QA systems
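    The two-stage pipeline can be sketched as retrieve-then-read. The sketch below uses a plain unigram TF-IDF scorer as a stand-in for the IR system, which is a deliberate simplification. The `reader` argument is any hypothetical QA model that reads only the retrieved documents. The three documents paraphrase the slide's Marie Curie example.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_retrieve(question: str, docs: List[str], k: int = 2) -> List[int]:
    """Rank documents by summed tf * idf over the question's terms; return top-k indices."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in doc_counts for term in counts)  # document frequency
    q_terms = set(tokenize(question))

    def score(counts: Counter) -> float:
        return sum(counts[t] * math.log(n / df[t]) for t in q_terms if t in counts)

    return sorted(range(n), key=lambda i: score(doc_counts[i]), reverse=True)[:k]


def retrieve_then_read(question: str, docs: List[str],
                       reader: Callable[[str, List[str]], str], k: int = 2) -> str:
    """Stage 1: IR narrows the collection. Stage 2: a QA model reads only the top-k docs."""
    top = tfidf_retrieve(question, docs, k)
    return reader(question, [docs[i] for i in top])


docs = ["Marie Curie was awarded the Nobel Prize in Chemistry and the Nobel Prize in Physics",
        "Mother Teresa received the Nobel Peace Prize",
        "Curie received his doctorate in March 1895"]
print(tfidf_retrieve("What was Marie Curie the recipient of?", docs, k=1))  # [0]
```

    The retrieved documents still contain distractors such as “Nobel Peace Prize”, so the reader must be stronger than one that only ever sees the gold paragraph.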

    DrQA

    Chen et al. (2017)

    ‣ How often does the retrieved context contain the answer? (uses Lucene)

    ‣ Full retrieval results using a QA model trained on SQuAD: task is much harder
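    “How often does the retrieved context contain the answer?” can be measured as answer recall at k. This is the fraction of questions where at least one of the top-k passages contains the gold answer. The sketch below uses a case-insensitive string match, which is an assumption; the exact matching criterion used by Chen et al. is not stated here. The retrieval output is hypothetical.

```python
from typing import Sequence


def answer_recall_at_k(retrieved: Sequence[Sequence[str]], answers: Sequence[str], k: int) -> float:
    """Fraction of questions for which at least one of the top-k retrieved
    passages contains the gold answer string (case-insensitive)."""
    hits = sum(
        any(ans.lower() in passage.lower() for passage in passages[:k])
        for passages, ans in zip(retrieved, answers)
    )
    return hits / len(answers)


# Hypothetical retrieval output for two questions.
retrieved = [["Curie won the Nobel Prize in 1903", "Warsaw is in Poland"],
             ["The Oberoi family is an Indian family", "Its head office is in Delhi"]]
answers = ["Nobel Prize", "Delhi"]
print(answer_recall_at_k(retrieved, answers, k=1), answer_recall_at_k(retrieved, answers, k=2))  # 0.5 1.0
```

    This number is an upper bound for the full pipeline: if the answer is never retrieved, the reader cannot find it.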

    Retrieval with BERT

    Lee et al. (2019)

    ‣ Can we do better than a simple IR system?

    ‣ Encode the query with BERT, pre-encode all paragraphs with BERT, query is basically nearest neighbors
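    Once every paragraph is pre-encoded, answering a query reduces to a nearest-neighbor search over stored vectors. The sketch below scores paragraphs by inner product, which is an assumption about the similarity function. The 3-d vectors are hypothetical placeholders for BERT encodings computed offline.

```python
from typing import List

Vector = List[float]


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nearest_paragraphs(query_vec: Vector, index: List[Vector], k: int) -> List[int]:
    """Return indices of the k pre-encoded paragraphs with the highest inner product."""
    scores = [inner_product(query_vec, p) for p in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical 3-d "encodings"; a real index holds BERT vectors computed offline.
index = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [1.0, 0.1, 0.0]
print(nearest_paragraphs(query_vec, index, k=2))  # [0, 2]
```

    Only the query is encoded at test time. Over millions of paragraphs, this linear scan would typically be replaced by an approximate maximum-inner-product search library (FAISS is one example).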

    Problems

    Lee et al. (2019)

    ‣ Many SQuAD questions are not suited to the “open” setting because they’re underspecified

    ‣ SQuAD questions were written by people looking at the passage, which encourages a question structure that mimics the passage and doesn’t look like “real” questions

    ‣ Where did the Super Bowl take place?

    ‣ Which player on the Carolina Panthers was named MVP?

  • Natural Questions

    Kwiatkowski et al. (2019)

    ‣ Questions arose naturally, unlike SQuAD questions which were written by people looking at a passage. This makes them much harder

    ‣ Short answer F1s < 60, long answer F1s

  • HotpotQA

    Meet Corliss Archer is an American television sitcom that aired on CBS…

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Shirley Temple Black was an American actress, businesswoman, and singer…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    Doc 1

    Doc 2

    Same entity

    Doc 3

    Same entity

    Example picked from HotpotQA [Yang et al., 2018]

    ‣ Much longer and more convoluted questions

    Multi-hop Reasoning

    The Oberoi family is an Indian family that is famous for its involvement in hotels, namely through The Oberoi Group

    Question: The Oberoi family is part of a hotel company that has a head office in what city?

    The Oberoi Group is a hotel company with its head office in Delhi.

    Doc 1

    Doc 2

    Same entity

    Same entity

    Example picked from HotpotQA [Yang et al., 2018]

    This is an idealized version of multi-hop reasoning. Do models need to do this to do well on this task?


    Model can ignore the bridging entity and directly predict the answer

    High lexical overlap

    Multi-hop Reasoning

    Meet Corliss Archer is an American television sitcom that aired on CBS…

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Shirley Temple Black was an American actress, businesswoman, and singer…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    Doc 1

    Doc 2

    Same entity

    Doc 3

    Same entity

    No simple lexical overlap. …but only one government position appears in the context!

    Example picked from HotpotQA [Yang et al., 2018]

  • Investigation

    Can a model identify the answer with only a set of candidates?

    Can a model identify where the answer is in a single hop?

    Government position: Chief of Protocol, actress, singer

    Oberoi Family: Delhi

    Chen and Durrett (2019)

    Finding the answer directly

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Kaushik and Lipton (2018)

    Chief of Protocol, businesswoman … actress

    Meet Corliss Archer is an American television sitcom that aired on CBS…

    Shirley Temple Black was an American actress, businesswoman, and singer…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    Doc 1

    Doc 2

    Doc 3

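    No Context Baseline

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell?

    Candidates: Chief of Protocol, businesswoman, actress, ...

    [Diagram: a Question Encoder and an Answer Encoder; each candidate is scored by the dot product of the two encodings, giving Chief of Protocol 0.7, businesswoman 0.2, actress 0.1]

    Chen and Durrett (2019)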
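    The no-context baseline never reads the passage. It scores each candidate by the dot product between a question encoding and a candidate encoding, then normalizes the scores with a softmax. In the sketch below, the encodings are made-up vectors standing in for the trained encoders, so the resulting probabilities are illustrative and do not reproduce the figure's 0.7 / 0.2 / 0.1.

```python
import math
from typing import Dict, List


def no_context_probs(question_vec: List[float],
                     candidate_vecs: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each candidate by dot(question, candidate) and softmax the scores.
    Note that no passage text is used anywhere."""
    names = list(candidate_vecs)
    logits = [sum(q * a for q, a in zip(question_vec, candidate_vecs[n])) for n in names]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return {n: e / z for n, e in zip(names, exps)}


# Hypothetical encoder outputs for the Corliss Archer question and its candidates.
question_vec = [1.0, 0.5]
candidates = {"Chief of Protocol": [2.0, 1.0], "businesswoman": [1.0, 0.5], "actress": [0.5, 0.5]}
probs = no_context_probs(question_vec, candidates)
print(max(probs, key=probs.get))  # Chief of Protocol
```

    If a model like this does well, the question and candidate set alone carry most of the signal. That is the point the WikiHop results make.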

    [Figure: Results on WikiHop (accuracy) for NoContext, Entity-GCN, CFC, BAG, Majority-candidate, and BiDAF, grouped as state-of-the-art systems and weak baselines; bar labels 59.3, 67.4, 66.4, 64.8, 42.9, 38.8]

    ‣ SOTA models trained on this may be learning question-answer correspondences, not multi-hop reasoning as advertised

    More than half of questions can be answered without even using the context!

    Chen and Durrett (2019)


    Sentence Factored Model

    Find the answer by comparing each sentence with the question separately!

    Question: The Oberoi family is part of a hotel company that has a head office in what city?

    The Oberoi Group is a hotel company with its head office in Delhi.

    Doc 2

    Future Fibre Technologies a fiber technologies company…

    Doc 3

    The Oberoi family is an Indian family that is…

    Doc 1

    Chen and Durrett (2019)

    Sentence Factored Model

    The Oberoi family … what city?

    The Oberoi Group … in Delhi.

    Future Fibre Technologies is a fibre …

    The Oberoi family …

    [Diagram: the question is paired with each sentence, and each pair goes through its own BiDAF]

    Answer prediction: Delhi

    ‣ Softmax over all sentences is the only cross-sentence interaction

    Chen and Durrett (2019)
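    The factorization can be sketched as follows. Each sentence is read with the question on its own, and the only step that sees all sentences together is a softmax over their scores. In this sketch, `toy_sentence_scorer` is a hypothetical stand-in for the per-sentence BiDAF: it scores by word overlap and proposes the sentence's last word as the span. Scoring whole sentences rather than individual spans is also a simplification.

```python
import math
import re
from typing import Callable, List, Tuple


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text)


def toy_sentence_scorer(question: str, sentence: str) -> Tuple[str, float]:
    """Hypothetical per-sentence reader: score = question/sentence word overlap,
    proposed span = the sentence's last word."""
    q = {w.lower() for w in words(question)}
    s = words(sentence)
    return s[-1], float(len(q & {w.lower() for w in s}))


def sentence_factored_predict(question: str, sentences: List[str],
                              scorer: Callable[[str, str], Tuple[str, float]]):
    # Each sentence is read independently with the question...
    results = [scorer(question, s) for s in sentences]
    # ...and a softmax over the per-sentence scores is the only cross-sentence step.
    scores = [score for _, score in results]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    best = max(range(len(sentences)), key=lambda i: probs[i])
    return results[best][0], probs


question = "The Oberoi family is part of a hotel company that has a head office in what city?"
sentences = ["The Oberoi family is an Indian family that is famous for its involvement in hotels",
             "The Oberoi Group is a hotel company with its head office in Delhi",
             "Future Fibre Technologies is a fibre technologies company"]
answer, probs = sentence_factored_predict(question, sentences, toy_sentence_scorer)
print(answer)  # Delhi
```

    Nothing here links the Oberoi family to The Oberoi Group. The Delhi sentence wins on its own overlap with the question, which is the single-hop shortcut the slide describes.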

    [Figure: Results on HotpotQA (F1) for BiDAF++, QFE, GRN, DFGN, and Sentence Factored; bar labels 50.8, 69.7, 69.0, 68.1, 58.7]

    A simple single-sentence reasoning model can solve more than half of the questions on HotpotQA.

    Chen and Durrett (2019)

  • Other Work

    ‣ Min et al., ACL 2019, “Compositional Questions do not Necessitate Multi-hop Reasoning”

    ‣ Focuses just on HotpotQA

    ‣ Additionally tries to adversarially harden Hotpot against these attacks. Some limited success, but doesn't solve the problem

    Question Answering with Chains

    Chain Extractor

    QA model (BERT)

    Final Answer Span

    Reasoning Chain

    ‣ Maybe we can strengthen our models to avoid these weaknesses. Force them to explicitly extract a reasoning chain to make them better

    Q: What government position was held…

    Sent 1   Sent 2

    …She began her diplomatic career…

    A Kiss for Corliss was…

    Shirley Temple Black was a…

    Kiss and Tell is a comedy film in which 17-year-old Shirley Temple acts as Corliss Archer.

    As an adult, she served as Chief of Protocol of the United States

    Sent 5

    Chen et al. (2019)

    Question Answering with Chains

    Question: What government position was held by the woman who portrayed Corliss Archer in the film Kiss and Tell? Answer: Chief of Protocol

    Shirley Temple Black was an American actress, businesswoman, and diplomat…

    As an adult, she served as the Chief of Protocol of the United States…

    She began her diplomatic career in 1969, when she represented…

    Kiss and Tell is a film in which 17-year-old Shirley Temple acts as Corliss Archer.

    “A Kiss for Corliss” is a sequel to the film “Kiss and Tell”.

    It stars Shirley Temple in her final starring role…

    DOC 1

    DOC 2

    DOC 3

    Shared Entity

    Shirley Temple

    Corliss Archer

    Reasoning Chain 1

    In-Doc Coref

    ‣ Strong connection between the entities used here

    Chen et al. (2019)

    Question Answering with Chains

    Shared Entity

    Shirley Temple

    Reasoning Chain 2

    In-Doc Coref

    Kiss and Tell

    ‣ More speculative than the other chain but still leads to the answer


    Chen et al. (2019)

  • Chain Supervision

    ‣ Extract pseudogold chains based on:

    Shared Entity

    Shirley Temple

    Corliss Archer

    In-Doc Coref

    ‣ Within-document coreference: we don’t run a coreference system but instead link all sentences within a paragraph

    ‣ Shared entities: enable connections between different sources

    ‣ Given these chains, we learn a model to extract them. At test time, no annotations are needed

    Chen et al. (2019)
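    The two edge types above define a sentence graph. Sentences in the same paragraph are linked, standing in for coreference, and sentences that mention a shared entity are linked across sources. A pseudogold chain can then be read off as a path through this graph. The slide gives only the edge types, so the rest of the sketch below is assumed. It starts from sentences that mention a question entity, stops at the first sentence containing the answer string, and uses breadth-first search. The entity sets are hand-labeled for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def build_graph(sents: List[dict]) -> Dict[int, Set[int]]:
    """Link two sentences if they are in the same paragraph (the stand-in for
    within-document coreference) or mention a shared entity."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(sents))}
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            same_para = sents[i]["para"] == sents[j]["para"]
            shared = bool(sents[i]["entities"] & sents[j]["entities"])
            if same_para or shared:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def pseudogold_chain(sents: List[dict], question_entities: Set[str],
                     answer: str) -> Optional[List[int]]:
    """Breadth-first search from sentences mentioning a question entity to the
    nearest sentence containing the answer; returns sentence indices or None."""
    adj = build_graph(sents)
    starts = [i for i, s in enumerate(sents) if s["entities"] & question_entities]
    parent: Dict[int, Optional[int]] = {i: None for i in starts}
    queue = deque(starts)
    while queue:
        i = queue.popleft()
        if answer in sents[i]["text"]:
            chain: List[int] = []
            node: Optional[int] = i
            while node is not None:  # walk back to the start sentence
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for j in sorted(adj[i]):
            if j not in parent:
                parent[j] = i
                queue.append(j)
    return None


# Sentences from the slide; paragraph ids and entity sets are hand-labeled (hypothetical).
sents = [
    {"para": 0, "text": "Kiss and Tell is a film in which 17-year-old Shirley Temple acts as Corliss Archer.",
     "entities": {"Kiss and Tell", "Shirley Temple", "Corliss Archer"}},
    {"para": 1, "text": "Shirley Temple Black was an American actress, businesswoman, and diplomat.",
     "entities": {"Shirley Temple"}},
    {"para": 1, "text": "As an adult, she served as the Chief of Protocol of the United States.",
     "entities": {"United States"}},
    {"para": 2, "text": "A Kiss for Corliss is a sequel to the film Kiss and Tell.",
     "entities": {"A Kiss for Corliss", "Kiss and Tell"}},
]
print(pseudogold_chain(sents, {"Corliss Archer", "Kiss and Tell"}, "Chief of Protocol"))  # [0, 1, 2]
```

    The recovered path follows Reasoning Chain 1: the film sentence, then the shared entity Shirley Temple, then the in-paragraph link to the Chief of Protocol sentence.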

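    Chain Extraction and QA

    ‣ Paragraphs are encoded with BERT to compute sentence representations

    ‣ A pointer network selects a sequence of sentences

    ‣ A final BERT model then extracts an answer span from one or more chains

    [Diagram: BERT encodes the paragraphs into sentence representations; the pointer network emits sentence indices (e.g., s3, s7) and then STOP; the selected chains are passed to a final BERT that outputs the answer span (Ans)]

    Chen et al. (2019)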
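    The pointer network's job at decoding time can be sketched as a loop. At each step, the model scores every sentence index plus a STOP option given the chain so far, appends the best choice, and halts on STOP. The sketch decodes greedily; whether the original system uses greedy or beam decoding is not stated on the slide. `toy_score` is a hypothetical scorer that replaces attention over BERT sentence vectors.

```python
from typing import Callable, List, Sequence

STOP = -1  # sentinel index for the STOP option


def greedy_pointer_decode(num_sentences: int,
                          score_fn: Callable[[Sequence[int], int], float],
                          max_steps: int = 5) -> List[int]:
    """Greedy decoding for a pointer-style chain extractor: at each step score every
    sentence index and STOP given the chain so far, take the best, halt on STOP."""
    chain: List[int] = []
    for _ in range(max_steps):
        options = list(range(num_sentences)) + [STOP]
        best = max(options, key=lambda c: score_fn(chain, c))
        if best == STOP:
            break
        chain.append(best)
    return chain


def toy_score(chain: Sequence[int], candidate: int) -> float:
    """Hypothetical scorer that prefers s3, then s7, then STOP. A real model would
    compute these scores by attending from a decoder state to BERT sentence vectors."""
    plan = [3, 7, STOP]
    target = plan[len(chain)] if len(chain) < len(plan) else STOP
    return 1.0 if candidate == target else 0.0


print(greedy_pointer_decode(10, toy_score))  # [3, 7]
```

    `max_steps` guards against a scorer that never prefers STOP.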

    QA Results

    ‣ High performance on WikiHop (*past systems didn't use BERT) and Hotpot

    ‣ Also large gains on hard examples in HotpotQA (our model from part 1 could not find answers in a single hop)

    [Figure: WikiHop (English) accuracy for GCN, BAG, CFC, JDReader, DynSAN, and Ours, with bar labels 76.5, 71.4, 70.9, 70.6, 69.0, 67.6; HotpotQA (English) F1 for DecompRC, QFE, Ours, and DFGN, with bar labels 74.1, 69.7, 68.1, 69.6]

    ‣ Ongoing work: how can reasoning chains be taken below the sentence level and be more strongly tied to interpretable logical inference?

    New Types of QA

  • DROP

    Dua et al. (2019)

    ‣ Question types: subtraction, comparison (which did he visit first), counting and sorting (which kicker kicked more field goals)

    ‣ Invites ad hoc solutions (structure the model around predicting differences between numbers)

    ‣ One thread of research: let’s build QA datasets to help the community focus on modeling particular things
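    The “ad hoc” structure the slide mentions can be made concrete. For subtraction questions, a model can enumerate every pairwise difference between numbers in the passage and then only choose among them. The sketch below builds that candidate set. It is a hypothetical illustration, not Dua et al.'s model, and the football-style passage is made up.

```python
import itertools
import re
from typing import Set


def difference_candidates(passage: str) -> Set[float]:
    """All pairwise absolute differences between numbers mentioned in the passage:
    the candidate set a subtraction-specialized model would choose from."""
    nums = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", passage)]
    return {abs(a - b) for a, b in itertools.combinations(nums, 2)}


# Made-up passage in the style of DROP's football questions.
passage = "He kicked a 45-yard field goal in the first quarter and a 32-yard field goal later."
print(difference_candidates(passage))  # {13.0}
```

    Hard-coding answer types this way can score well on one dataset without modeling anything more general, which is why such solutions are called ad hoc.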

    MultiQA

    Talmor and Berant (2019)

    ‣ Maybe we should just look at lots of QA datasets instead?

    ‣ BERT trained on SQuAD gets

