Learning to reason by reading text and answering questions
Minjoon Seo, Natural Language Processing Group
University of Washington, May 26, 2017
@Kakao Brain
What is reasoning?
Simple Question Answering Model
What is “Hello” in French? → Bonjour.
Examples
• Most neural machine translation systems (Cho et al., 2014; Bahdanau et al., 2014)
  • Need a very high hidden state size (~1000)
  • No need to query the database (context) → very fast
• Most dependency and constituency parsers (Chen et al., 2014; Klein et al., 2003)
• Sentiment classification (Socher et al., 2013)
  • Classifying whether a sentence is positive or negative
• Most neural image classification systems
  • The question is always “What is in the image?”
• Most classification systems
Simple Question Answering Model
What is “Hello” in French? → Bonjour.
Problem: a parametric model has finite capacity.
“You can’t even fit a sentence into a single vector” - Dan Roth
QA Model with Context
English     French
Hello       Bonjour
Thank you   Merci
What is “Hello” in French? → Bonjour.
Context (Knowledge Base)
Examples
• WikiQA (Yang et al., 2015)
• QASent (Wang et al., 2007)
• WebQuestions (Berant et al., 2013)
• WikiAnswers (Wikia)
• Free917 (Cai and Yates, 2013)
• Many deep learning models with external memory (e.g. Memory Networks)
QA Model with Context
Eats: (Amphibian, insect), (insect, flower)
IsA: (Frog, amphibian), (Fly, insect)
What does a frog eat? → Fly
Context (Knowledge Base)
Something is missing…
QA Model with Reasoning Capability
Eats: (Amphibian, insect), (insect, flower)
IsA: (Frog, amphibian), (Fly, insect)
What does a frog eat? → Fly
Context (Knowledge Base)
First Order Logic: IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)
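The rule on this slide can be run directly as forward chaining over the knowledge-base tuples. A toy sketch (concept names lowercased for uniformity; not part of any actual system described here):

```python
# Forward chaining for the rule:
# IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)

isa = {("frog", "amphibian"), ("fly", "insect")}
eats = {("amphibian", "insect"), ("insect", "flower")}

def derive_eats(isa, eats):
    # Combine every pair of IsA facts with the existing Eats facts
    # to derive new Eats facts, as the rule prescribes.
    derived = set(eats)
    for a, b in isa:
        for c, d in isa:
            if (b, d) in eats:
                derived.add((a, c))
    return derived

print(("frog", "fly") in derive_eats(isa, eats))  # True
```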
Examples
• Semanticparsing• GeoQuery (Krishnamurthyetal.,2013;Artzi etal.,2015)
• Sciencequestions• AristoChallenge(Clarketal.,2015)• ProcessBank (Berant etal.,2014)
• Machinecomprehension• MCTest (Richardsonetal.,2013)
“Vague” line between non-reasoning QA and reasoning QA
• Non-reasoning:
  • The required information is explicit in the context
  • The model often needs to handle lexical/syntactic variations
• Reasoning:
  • The required information may not be explicit in the context
  • Need to combine multiple facts to derive the answer
• There is no clear line between the two!
If our objective is to “answer” difficult questions…
• We can try to make the machine more capable of reasoning (a better model)
OR
• We can try to make more information explicit in the context (more data)
QA Model with Reasoning Capability
Eats: (Amphibian, insect), (insect, flower)
IsA: (Frog, amphibian), (Fly, insect)
What does a frog eat? → Fly
Context (Knowledge Base)
First Order Logic: IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)
Who makes this? Tell me it’s not me…
Reasoning QA Model with Unstructured Data
What does a frog eat? → Fly
A frog is an example of an amphibian. Flies are one of the most common insects around us. Insects are good sources of protein for amphibians. …
Context in natural language
I am interested in…
• Natural language understanding
  • Natural language has diverse surface forms (lexically, syntactically)
• Learning to read text and reason by question answering (dialog)
  • Text is unstructured data
  • Deriving new knowledge from existing knowledge
• End-to-end training
  • Minimizing human effort
[Overview diagram, three axes: reasoning capability, NLU capability, end-to-end; publications placed along them: AAAI 2014, EMNLP 2015, ECCV 2016, CVPR 2017, ICLR 2017, ACL 2017, ICLR 2017]
[Axes diagram: reasoning capability, NLU capability, end-to-end; highlighting Geometry QA]
Geometry QA
In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD?
a) 2  b) 4  c) 6  d) 8  e) 10
[Diagram: circle O with diameter AC perpendicular to chord BD at E; radius 5, CE = 2]
Geometry QA Model
What is the length of BD? → 8
In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD.
First Order Logic
Local context / Global context
Method
• Learn to map the question to a logical form
• Learn to map the local context to a logical form
  • Text → logical form
  • Diagram → logical form
• The global context is already formal!
  • Manually defined
  • “If AB = BC, then ∠CAB = ∠ACB”
• Run a solver on all logical forms
  • We created a reasonable numerical solver
Mapping question/text to logical form
In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17
[Diagram: triangle ABC with segment DE parallel to side AC]
Text input → logical form:
IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))
Difficult to directly map text to a long logical form!
Mapping question/text to logical form
In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17
[Diagram: triangle ABC with segment DE parallel to side AC]
Our method: over-generate candidate literals from the text input, score each with a text score and a diagram score, then select a subset as the logical form.

Over-generated literals      Text score  Diagram score
IsTriangle(ABC)              0.96        1.00
Parallel(AC, DE)             0.91        0.99
Parallel(AC, DB)             0.74        0.02
Equals(LengthOf(DB), 4)      0.97        n/a
Equals(LengthOf(AD), 8)      0.94        n/a
Equals(LengthOf(DE), 5)      0.94        n/a
Equals(4, LengthOf(AD))      0.31        n/a
…

Selected subset:
IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))
Numerical solver
• Translate literals to numeric equations:

Literal                   Equation
Equals(LengthOf(AB), d)   (Ax − Bx)² + (Ay − By)² − d² = 0
Parallel(AB, CD)          (Ax − Bx)(Cy − Dy) − (Ay − By)(Cx − Dx) = 0
PointLiesOnLine(B, AC)    (Ax − Bx)(By − Cy) − (Ay − By)(Bx − Cx) = 0
Perpendicular(AB, CD)     (Ax − Bx)(Cx − Dx) + (Ay − By)(Cy − Dy) = 0

• Find the solution to the equation system
  • Use off-the-shelf numerical minimizers (Wales and Doye, 1997; Kraft, 1988)
• The numerical solver can choose not to answer a question
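A minimal sketch of the literal-to-equation translation in the table above, with the solver's objective written as a sum of squared residuals (the helper names are mine; the actual solver plugs these residuals into off-the-shelf numerical minimizers):

```python
# Each geometric literal becomes a residual over point coordinates;
# a residual of 0 means the literal is satisfied.

def equals_length(a, b, d):
    # Equals(LengthOf(AB), d): (Ax-Bx)^2 + (Ay-By)^2 - d^2 = 0
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 - d ** 2

def parallel(a, b, c, d):
    # Parallel(AB, CD): cross product of the direction vectors is 0
    return (a[0] - b[0]) * (c[1] - d[1]) - (a[1] - b[1]) * (c[0] - d[0])

def perpendicular(a, b, c, d):
    # Perpendicular(AB, CD): dot product of the direction vectors is 0
    return (a[0] - b[0]) * (c[0] - d[0]) + (a[1] - b[1]) * (c[1] - d[1])

def total_loss(residuals):
    # The solver minimizes the sum of squared residuals; a (near-)zero
    # loss means a consistent assignment of point coordinates was found.
    return sum(r ** 2 for r in residuals)

# Axis-aligned unit square A=(0,0), B=(0,1), C=(1,1), D=(1,0):
A, B, C, D = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)
loss = total_loss([
    equals_length(A, B, 1.0),   # AB has length 1
    parallel(A, B, D, C),       # AB is parallel to DC
    perpendicular(A, B, A, D),  # AB is perpendicular to AD
])
print(loss)  # 0.0 for this consistent configuration
```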
Dataset
• Training questions (67 questions, 121 sentences)
  • Seo et al., 2014
  • High school geometry questions
• Test questions (119 questions, 215 sentences)
  • We collected them
  • SAT (US college entrance exam) geometry questions
• We manually annotated the text parse of all questions
Results (EMNLP 2015)
[Bar chart, SAT score (%) from 0 to 60: Text only vs. Diagram only settings for Rule-based, GeoS, and the student average]
* 0.25 penalty for each incorrect answer
Demo (geometry.allenai.org/demo)
Limitations
• The dataset is small
• The required level of reasoning is very high
• A lot of manual effort (annotations, rule definitions, etc.)
• An end-to-end system is simply hopeless
• Collect more data?
• Change the task?
• Curriculum learning? (Do more hopeful tasks first?)
[Axes diagram: reasoning capability, NLU capability, end-to-end; highlighting Diagram QA]
Diagram QA
Q: The process of water being heated by sun and becoming gas is called
A: Evaporation
Is DQA a subset of VQA?
• Diagrams and real images are very different
  • Diagram components are simpler than real images
  • A diagram contains a lot of information in a single image
  • Diagrams are few (whereas real images are almost infinitely many)
Problem
What comes before second feed? → 8
Difficult to latently learn relationships
Strategy
What does a frog eat? → Fly
[Pipeline: diagram parsing produces a diagram graph, which question answering operates over]
Attention visualization
Results (ECCV 2016)
Method             Training data   Accuracy
Random (expected)  -               25.00
LSTM+CNN           VQA             29.06
LSTM+CNN           AI2D            32.90
Ours               AI2D            38.47
Limitations
• You can’t really call this reasoning…
  • Rather a matching algorithm
  • No complex inference involved
• You need a lot of prior knowledge to answer some questions!
  • E.g. “A fly is an insect”, “A frog is an amphibian”
Textbook QA: textbookqa.org (CVPR 2017)
[Axes diagram: reasoning capability, NLU capability, end-to-end; highlighting Machine Comprehension]
Machine Comprehension
Question Answering Task (Stanford Question Answering Dataset, 2016)
Q: Which NFL team represented the AFC at Super Bowl 50?
A: Denver Broncos
Why Neural Attention?
Q: Which NFL team represented the AFC at Super Bowl 50?
Attention allows a deep learning architecture to focus on the phrase of the context most relevant to the query, in a differentiable manner.
Our Model: Bi-directional Attention Flow (BiDAF)
[Overview: attention and modeling layers feed an MLP + softmax that predicts the answer span; here the span is “Barack Obama”, i.e. start index 0 and end index 1]
Context: Barack Obama is the president of the U.S.  Query: Who leads the United States?
(Bidirectional) Attention Flow
[BiDAF architecture diagram, bottom to top: a Character Embed Layer (Char-CNN) and a Word Embed Layer (GloVe) over context words x1…xT and query words q1…qJ; a Phrase Embed Layer (LSTMs) producing h1…hT and u1…uJ; an Attention Flow Layer computing Query2Context (max + softmax) and Context2Query (softmax) attention to produce g1…gT; a Modeling Layer (LSTMs) producing m1…mT; and an Output Layer predicting the answer Start (Dense + Softmax) and End (LSTM + Softmax)]
Char/Word Embedding Layers
Character and Word Embedding
• Word embedding is fragile against unseen words
• Char embedding can’t easily learn the semantics of words
• Use both!
• Char embedding as proposed by Kim (2015)
[Figure: the characters of “Seattle” pass through a CNN with max pooling; the result is concatenated with the word embedding of “Seattle” to form the final embedding vector]
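A rough sketch of this encoder (all dimensions and the random weights below are illustrative, not the paper's settings): embed the characters, convolve over the character sequence, max-pool over time, and concatenate the result with the word vector.

```python
import numpy as np

rng = np.random.default_rng(0)
char_dim, n_filters, width, word_dim = 8, 16, 3, 50

char_emb = rng.normal(size=(128, char_dim))       # one row per character code
conv_w = rng.normal(size=(n_filters, width * char_dim))

def char_cnn(word):
    # Look up character embeddings: (len(word), char_dim)
    chars = char_emb[[ord(c) % 128 for c in word]]
    # Slide a width-3 window over the characters and flatten each window
    windows = [chars[i:i + width].ravel() for i in range(len(chars) - width + 1)]
    # Convolution as a matrix product, then ReLU: (n_filters, n_windows)
    feats = np.maximum(conv_w @ np.array(windows).T, 0.0)
    # Max-pool over time, leaving one value per filter: (n_filters,)
    return feats.max(axis=1)

word_vec = rng.normal(size=word_dim)              # stand-in for a GloVe vector
embedding = np.concatenate([word_vec, char_cnn("Seattle")])
print(embedding.shape)  # (66,)
```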
Phrase Embedding Layer
Phrase Embedding Layer
• Inputs: the char/word embeddings of the query and context words
• Outputs: word representations aware of their neighbors (phrase-aware words)
• Apply a bidirectional RNN (LSTM) to both the query and the context
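A sketch of the idea, with a plain tanh RNN cell standing in for the LSTM and illustrative dimensions: run the sequence forwards and backwards, then concatenate the two hidden states at each position.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 4, 3
W, U = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

def rnn(xs):
    # Plain tanh RNN (stand-in for the LSTM): return all hidden states.
    h, out = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out

words = [rng.normal(size=d_in) for _ in range(5)]  # x1..xT (char/word embeddings)
fwd = rnn(words)                 # left-to-right pass
bwd = rnn(words[::-1])[::-1]     # right-to-left pass, re-aligned
# Each word representation now sees both its left and right neighbors.
phrase = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(len(phrase), phrase[0].shape)  # 5 (6,)
```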
Attention Layer
Attention Layer
• Inputs: phrase-aware context and query words
• Outputs: query-aware representations of the context words
• Context-to-query attention: for each (phrase-aware) context word, choose the most relevant word among the (phrase-aware) query words
• Query-to-context attention: choose the context word that is most relevant to any of the query words
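The two directions can be sketched over a shared similarity matrix S (T context words by J query words). A plain dot product stands in for the trainable similarity function, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
T, J, d = 6, 3, 4
H = rng.normal(size=(T, d))   # phrase-aware context words h1..hT
U = rng.normal(size=(J, d))   # phrase-aware query words u1..uJ

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

S = H @ U.T                                # (T, J) similarity matrix

# Context-to-query: each context word attends over the query words.
c2q = softmax(S, axis=1) @ U               # (T, d)

# Query-to-context: attend over context words by each word's best query match,
# then broadcast the single attended vector to every context position.
b = softmax(S.max(axis=1))                 # (T,)
q2c = np.tile(b @ H, (T, 1))               # (T, d)

# Query-aware context representations combining both directions.
G = np.concatenate([H, c2q, H * c2q, H * q2c], axis=1)
print(G.shape)  # (6, 16)
```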
Context-to-Query Attention (C2Q)
Q: Who leads the United States?
C: Barack Obama is the president of the USA.
For each context word, find the most relevant query word.
Query-to-Context Attention (Q2C)
While Seattle’s weather is very nice in summer, its weather is very rainy in winter, making it one of the most gloomy cities in the U.S. LA is…
Q: Which city is gloomy in winter?
Modeling Layer
Modeling Layer
• Attention layer: models interactions between the query and the context
• Modeling layer: models interactions among the (query-aware) context words via an RNN (LSTM)
• Division of labor: let the attention and modeling layers each focus solely on their own task
  • We experimentally show that this leads to better results than intermixing attention and modeling
Output Layer
Training
• Minimize the negative log probabilities of the true start index and the true end index:
  L = −(1/N) Σᵢ [ log p¹(yᵢ¹) + log p²(yᵢ²) ]
  where p¹ and p² are the predicted probability distributions of the start and end indices, and yᵢ¹ and yᵢ² are the true start and end indices of example i.
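A sketch of this objective for a single example (the logits and indices below are made up for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def span_loss(start_logits, end_logits, y_start, y_end):
    # Negative log probability of the true start index plus that of the
    # true end index, under the two predicted distributions.
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    return -(np.log(p_start[y_start]) + np.log(p_end[y_end]))

start_logits = np.array([0.1, 2.0, -1.0, 0.5])
end_logits = np.array([-0.5, 0.0, 3.0, 1.0])
print(span_loss(start_logits, end_logits, y_start=1, y_end=2))
```

The loss shrinks as the model puts more probability mass on the true indices, which is what gradient descent exploits.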
Previous work
• Using neural attention as a controller (Xiong et al., 2016)
• Using neural attention within an RNN (Wang & Jiang, 2016)
• Most of these attention mechanisms are uni-directional
• BiDAF (our model):
  • uses neural attention as a layer,
  • is separated from the modeling part (RNN),
  • is bidirectional
Image Classifier and BiDAF
[Side-by-side comparison: VGG-16 and BiDAF (ours), each drawn as a deep stack of layers]
Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)
• The most popular articles from Wikipedia
• Questions and answers from Turkers
• 90k train, 10k dev, ? test (hidden)
• The answer must lie in the context
• Two metrics: Exact Match (EM) and F1
SQuAD Results (http://stanford-qa.com) as of Dec 2 (ICLR 2017)
Now…
Ablations on dev data
[Bar chart, EM and F1 from 50 to 80: No Char Embedding, No Word Embedding, No C2Q Attention, No Q2C Attention, Dynamic Attention, Full Model]
Interactive Demo
http://allenai.github.io/bi-att-flow/demo
Attention Visualizations
Context: There are 13 natural reserves in Warsaw – among others, Bielany Forest, Kabaty Woods, Czerniaków Lake. About 15 kilometres (9 miles) from Warsaw, the Vistula river’s environment changes strikingly and features a perfectly preserved ecosystem, with a habitat of animals that includes the otter, beaver and hundreds of bird species. There are also several lakes in Warsaw – mainly the oxbow lakes, like Czerniaków Lake, the lakes in the Łazienki or Wilanów Parks, Kamionek Lake. There are lot of small lakes in the parks, but only a few are permanent – the majority are emptied before winter to clean them of plants and sediments.
Query: How many natural reserves are there in Warsaw?
[Attention heatmap: “many” attends to hundreds, few, among, 15, several, only, 13, 9; “natural” to natural, of; “reserves” to reserves; “are” to are (×5), includes; “Warsaw” to Warsaw (×3)]
Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi’s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the “golden anniversary” with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as “Super Bowl L”), so that the logo could prominently feature the Arabic numerals 50.
Query: Where did Super Bowl 50 take place?
[Attention heatmap: “Where” attends to at, the, at, Stadium, Levi, in, Santa, Ana; “Super” to Super (×5); “Bowl” to Bowl (×5); “50” to 50; “take place” to initiatives]
Embedding Visualization at Word vs Phrase Layers
[t-SNE plot: at the word embedding layer, “May”/“may” sits near the month names January, September, August, and July; at the phrase layer, modal uses (“effect and may result in”, “the state may not aid”, “of these may be more”) separate from month uses (“Opening in May 1852 at”, “debut on May 5,”, “from 28 January to 25”, “but by September had been”)]
How does it compare with feature-based models?
CNN/DailyMail Cloze Test (Hermann et al., 2015)
• Cloze test (predicting missing words)
• Articles from CNN/DailyMail
• Human-written summaries
• Missing words are always entities
• CNN: 300k article–query pairs
• DailyMail: 1M article–query pairs
CNN/DailyMail Cloze Test Results
Transfer Learning (ACL 2017)
Some limitations of SQuAD
[Axes diagram: reasoning capability, NLU capability, end-to-end; highlighting bAbI QA & Dialog]
bAbI QA & Dialog
Reasoning Question Answering
Dialog System
U: Can you book a table in Rome in Italian Cuisine
S: How many people in your party?
U: For four people please.
S: What price range are you looking for?
Dialog task vs QA
• A dialog system can be considered as a QA system:
  • The last user utterance is the query
  • All previous conversation is context to the query
  • The system’s next response is the answer to the query
• Poses a few unique challenges:
  • A dialog system requires tracking states
  • A dialog system needs to look at multiple sentences in the conversation
  • Building an end-to-end dialog system is more challenging
Our approach: Query-Reduction
<START> Sandra got the apple there. Sandra dropped the apple. Daniel took the apple there. Sandra went to the hallway. Daniel journeyed to the garden.
Q: Where is the apple?
Reduced query: Where is the apple? → Where is Sandra? → Where is Sandra? → Where is Daniel? → Where is Daniel? → Where is Daniel? → garden
A: garden
Query-Reduction Networks
• Reduce the query into an easier-to-answer query over a sequence of state-changing triggers (sentences), in vector space
[QRN unrolled over the story: each sentence (“Sandra got the apple there.”, “Sandra dropped the apple.”, “Daniel took the apple there.”, “Sandra went to the hallway.”, “Daniel journeyed to the garden.”) acts as a state-changing trigger that updates the reduced query: Where is the apple? → Where is Sandra? → Where is Sandra? → Where is Daniel? → Where is Daniel? → Where is Daniel?; the final state yields the answer: garden]
QRN Cell
[Cell diagram: the inputs are the sentence vector x_t and the query vector q_t; the update function α produces the update gate z_t, the reduction function ρ produces the candidate reduced query h̃_t, and the reduced query (hidden state) is updated as h_t = z_t · h̃_t + (1 − z_t) · h_{t−1}]
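A minimal single-layer QRN sketch in plain NumPy (the parameterization and dimensions are illustrative, not the paper's exact choices): the reduction function produces a candidate reduced query, the update function produces a scalar gate, and the reduced query is updated by gating.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_r = rng.normal(size=(d, 2 * d))   # reduction function weights
W_z = rng.normal(size=(1, 2 * d))   # update function (gate) weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrn(sentences, query):
    h = np.zeros(d)                     # initial reduced query
    for x in sentences:
        xq = np.concatenate([x, query])
        h_cand = np.tanh(W_r @ xq)      # candidate reduced query
        z = sigmoid(W_z @ xq)[0]        # local update gate in (0, 1)
        h = z * h_cand + (1.0 - z) * h  # gated update of the reduced query
    return h

story = [rng.normal(size=d) for _ in range(5)]  # sentence vectors
query = rng.normal(size=d)                      # e.g. "Where is the apple?"
print(qrn(story, query).shape)  # (4,)
```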
Characteristics of QRN
• The update gate can be considered as local attention
  • QRN chooses to consider/ignore each candidate reduced query
  • The decision is made locally (as opposed to global softmax attention)
• Subclass of recurrent neural networks (RNN)
  • Two inputs, a hidden state, a gating mechanism
  • Able to handle sequential dependency (attention cannot)
• Simpler recurrent update enables parallelization over time
  • The candidate hidden state (reduced query) is computed from the inputs only
  • The hidden state can be explicitly computed as a function of the inputs
Parallelization
• The candidate hidden states are computed from the inputs only, so they can be trivially parallelized
• The hidden state can be explicitly expressed as the geometric sum of the previous candidate hidden states
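The geometric-sum claim can be written out. Since each candidate reduced query h̃_i depends only on the inputs, unrolling the gated update h_t = z_t h̃_t + (1 − z_t) h_{t−1} with h_0 = 0 gives:

```latex
\mathbf{h}_t = \sum_{i=1}^{t} \Big( \prod_{j=i+1}^{t} (1 - z_j) \Big)\, z_i\, \tilde{\mathbf{h}}_i
```

so once all z_i and h̃_i are computed in parallel from the inputs, the hidden states follow from cumulative products rather than a step-by-step recurrence.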
Characteristics of QRN
• The update gate can be considered as local attention
• Subclass of recurrent neural networks (RNN)
• Simpler recurrent update enables parallelization over time
QRN sits between the neural attention mechanism and recurrent neural networks, taking advantage of both paradigms.
bAbI QA Dataset
• 20 different tasks
• 1k story–question pairs for each task (10k also available)
• Synthetically generated
• Many questions require looking at multiple sentences
• For end-to-end systems supervised by answers only
What’s different from SQuAD?
• Synthetic
• More than lexical/syntactic understanding
  • Different kinds of inference: induction, deduction, counting, path finding, etc.
• Reasoning over multiple sentences
• An interesting testbed towards developing complex QA systems (and dialog systems)
bAbI QA Results (1k) (ICLR 2017)
[Bar chart, Avg Error (%) from 0 to 60: LSTM, DMN+, MemN2N, GMemN2N, QRN (Ours)]
bAbI QA Results (10k)
[Bar chart, Avg Error (%) from 0 to 4.5: MemN2N, DNC, GMemN2N, DMN+, QRN (Ours)]
Dialog Datasets
• bAbI Dialog Dataset
  • Synthetic
  • 5 different tasks
  • 1k dialogs for each task
• DSTC2* Dataset
  • Real dataset
  • Evaluation metric differs from the original DSTC2: response generation instead of “state tracking”
  • Each dialog is 800+ utterances
  • 2,407 possible responses
bAbI Dialog Results (OOV)
[Bar chart, Avg Error (%) from 0 to 35: MemN2N, GMemN2N, QRN (Ours)]
DSTC2* Dialog Results
[Bar chart, Avg Error (%) from 0 to 70: MemN2N, GMemN2N, QRN (Ours)]
bAbI QA Visualization
z^l = local attention (update gate) at layer l
DSTC2 (Dialog) Visualization
z^l = local attention (update gate) at layer l
So…
[Axes diagram: reasoning capability, NLU capability, end-to-end] Is this possible?
[Axes diagram: reasoning capability, NLU capability, end-to-end] Or this?
So… what should we do?
• Disclaimer: completely subjective!
• Logic (reasoning) is discrete
  • Modeling logic with a differentiable model is hard
  • Relaxation: either hard to optimize or converges to a bad optimum (poor generalization)
  • Estimation: low-bias or low-variance methods have been proposed (Williams, 1992; Jang et al., 2017), but the improvements are not substantial
  • Big data: how much do we need? Exponentially much?
  • Perhaps a new paradigm is needed…
“If you got a billion dollars to spend on a huge research project, what would you like to do?”
“I'd use the billion dollars to build a NASA-size program focusing on natural language processing (NLP), in all of its glory (semantics, pragmatics, etc.).”
Michael Jordan, Professor of Computer Science, UC Berkeley
Towards Artificial General Intelligence…
Natural language is the best tool to describe and communicate “thoughts”.
Asking and answering questions is an effective way to develop deeper “thoughts”.