From Dialogue Systems to Social Chatbots: Reinforcement Learning, Seq2Seq, and back again.
SwissText keynote, 9th June 2017
Verena Rieser, Professor in Computer Science, Heriot-Watt University, Edinburgh, UK.
NLP Group @ HWU
Dr Simon Keizer, PDRA
Dr Jekaterina Novikova, PDRA
Dr Ondrej Dusek, PDRA
Dr Xingkun Liu, PDRA
Amanda Curry, PhD student
Xinnuo Xu, PhD student
Prof Verena Rieser, Director
Projects I am working on:
• Statistical Natural Language Generation (EPSRC DILIGENT project)
• Transfer Learning for Dialogue Systems (EPSRC MADRIGAL project)
• Automatic Quality Estimation for Output Generation (NVIDIA)
• Personalized Human-Robot Interaction (with EmoTech LTD)
• Amazon Alexa Challenge (Amazon)
• Sentiment Analysis for Arabic (SemEval’16 winner)
Current Research Projects
Talking Machines
The new Bots are coming… “Bots are the new apps” because they “fundamentally revolutionize how computing is experienced by everybody.”
Microsoft’s CEO Nadella
Source: MIC Jan 2015.
Market forecast
Overview
• Task-driven Statistical Dialogue Systems (SDS)
  • Reinforcement Learning and State Tracking
• Social Chatbots
  • Seq2Seq models
  • Deep RL?
• Future challenges
  • Evaluation?
  • Data?
  • Combining task-driven and social systems?
Shortcomings of Siri
SDS Architecture
e.g. Rieser & Lemon, Comp. Ling. 2011, ACL ’10, ’08, ’06
e.g. Rieser et al., ACL ’05, ’09, ’10, ’16, EMNLP ’12, ’15, EACL ’09, ’14
e.g. Boidin & Rieser, Interspeech ’09
An example of a frame
“Show me morning flights from Edinburgh to London on Tuesday.”
SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Edinburgh
      DATE: Tuesday
      TIME: ?
    DEST:
      CITY: London
      DATE: ?
      TIME: ?
Task representation and NLU
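A frame like the one on this slide can be sketched as a nested dictionary whose unfilled slots (the "?" entries) are `None`. The helper names `new_flight_frame`, `fill_slot` and `open_slots` are hypothetical and only illustrate the idea; this is not the representation used in any particular system.

```python
def new_flight_frame():
    """Return an empty flight frame; None marks an unfilled slot ("?")."""
    return {
        "SHOW": {
            "FLIGHTS": {
                "ORIGIN": {"CITY": None, "DATE": None, "TIME": None},
                "DEST": {"CITY": None, "DATE": None, "TIME": None},
            }
        }
    }


def fill_slot(frame, path, value):
    """Set the slot addressed by a tuple of keys, e.g. ("SHOW", "FLIGHTS", "DEST", "CITY")."""
    node = frame
    for key in path[:-1]:
        node = node[key]
    if path[-1] not in node:
        raise KeyError(path)
    node[path[-1]] = value


def open_slots(frame, prefix=()):
    """List the paths of all slots that are still unfilled."""
    slots = []
    for key, value in frame.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            slots.extend(open_slots(value, path))
        elif value is None:
            slots.append(path)
    return slots


# Fill in what the example utterance provides, as shown on the slide.
frame = new_flight_frame()
fill_slot(frame, ("SHOW", "FLIGHTS", "ORIGIN", "CITY"), "Edinburgh")
fill_slot(frame, ("SHOW", "FLIGHTS", "ORIGIN", "DATE"), "Tuesday")
fill_slot(frame, ("SHOW", "FLIGHTS", "DEST", "CITY"), "London")
remaining = open_slots(frame)
```

A dialogue manager could use `remaining` to decide which slot to ask about next.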
Dialogue Engineering: FSA with VoiceXML etc.
“A spoken dialogue system is a computer agent that interacts with humans by understanding and producing spoken language in a coherent way.” [Rieser & Lemon, Springer 2011]
Statistical Dialogue Systems (SDS)
• Planning
• Adaptation
• Robustness
Data-driven Machine Learning methods
Two main research areas:
1. Belief Monitoring using Partially Observable Markov Decision Processes (POMDPs), e.g. [Williams & Young, 2007].
2. Action Selection / Policy Optimisation using Reinforcement Learning, e.g. [Singh et al., 2002], [Rieser & Lemon, 2008, 2011].
Statistical Approaches to task-based dialogue
Reinforcement Learning
$$Q^{\pi}(s,a) = \sum_{s'} T^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]$$
Bellman optimality equation (1952), see [Sutton and Barto, 1998].
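The equation on this slide can be evaluated directly once the transition probabilities T, rewards R and state values V are known. The sketch below assumes they are stored as nested dictionaries; the two-state dialogue MDP and all of its numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def q_value(state, action, T, R, V, gamma=0.9):
    """Q(s, a) = sum over s' of T[s][a][s'] * (R[s][a][s'] + gamma * V[s'])."""
    return sum(
        prob * (R[state][action][next_state] + gamma * V[next_state])
        for next_state, prob in T[state][action].items()
    )


# Hypothetical toy MDP: from state "ask", the action "confirm" either
# completes the dialogue ("done") or falls back to "ask".
T = {"ask": {"confirm": {"done": 0.8, "ask": 0.2}}}
R = {"ask": {"confirm": {"done": 1.0, "ask": -1.0}}}
V = {"done": 0.0, "ask": 0.5}  # assumed state values under the current policy

q = q_value("ask", "confirm", T, R, V, gamma=0.9)
# 0.8 * (1.0 + 0.9 * 0.0) + 0.2 * (-1.0 + 0.9 * 0.5), i.e. about 0.69
```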
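Policy Optimisation for Stochastic Environments: Markov Decision Processes
[Figure: decision tree from state S_{t-1} with two machine actions.
a1_{t-1} “How many do you want?”: the user says “Three” (p=0.5, reward +1) or mumbles (p=0.5, reward -1).
a2_{t-1} “Do you want three?”: the user says “Yes” (p=0.8, reward +1) or hangs up (p=0.2, reward -10).]
Trade-off problem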
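The trade-off can be made concrete by computing the expected immediate reward of each machine action. The sketch below reads the figure labels as follows, which is an interpretation of the diagram rather than something the slide states in words: asking “How many do you want?” (a1) yields +1 or -1 with probability 0.5 each, and asking “Do you want three?” (a2) yields +1 with probability 0.8 or a hang-up worth -10 with probability 0.2.

```python
# (probability, reward) pairs per machine action, as read off the figure.
OUTCOMES = {
    "a1": [(0.5, 1.0), (0.5, -1.0)],   # "How many do you want?"
    "a2": [(0.8, 1.0), (0.2, -10.0)],  # "Do you want three?"
}


def expected_reward(outcomes):
    """Expected immediate reward of one action."""
    return sum(prob * reward for prob, reward in outcomes)


def best_action(table):
    """Action with the highest expected immediate reward."""
    return max(table, key=lambda action: expected_reward(table[action]))


choice = best_action(OUTCOMES)
```

Under this reading the confirmation question succeeds more often, but its rare failure is so costly that its expected reward (about -1.2) is lower than that of the open question (0).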
[Figure: belief update. Old belief over S_{t-1}; machine action a_{t-1} “How many do you want?”; observed data o_t “Three please”; new belief over S_t. Inference via Bayes' Rule.]
Belief Monitoring for Partially Observable Environments: POMDPs
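Belief monitoring via Bayes' rule can be sketched as the standard POMDP update b'(s') ∝ P(o | s', a) · Σ_s P(s' | s, a) · b(s). The state names, the identity transition model and the 0.8/0.2 recognition probabilities below are hypothetical; they are not taken from the slide.

```python
def update_belief(belief, action, observation, transition, obs_model):
    """Return the new belief after taking `action` and observing `observation`."""
    unnormalised = {}
    for s_next in belief:
        # Prediction step: probability of landing in s_next.
        predicted = sum(transition[action][s][s_next] * belief[s] for s in belief)
        # Correction step: weight by the observation likelihood.
        likelihood = obs_model[action][s_next].get(observation, 0.0)
        unnormalised[s_next] = likelihood * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Hidden state: how many items the user wants (assumed not to change).
ASK = "How many do you want?"
transition = {ASK: {"two": {"two": 1.0, "three": 0.0},
                    "three": {"two": 0.0, "three": 1.0}}}
# Assumed noisy speech recognition: "three please" is heard 80% of the time
# when the user wants three, and 20% of the time when they want two.
obs_model = {ASK: {"two": {"three please": 0.2, "two please": 0.8},
                   "three": {"three please": 0.8, "two please": 0.2}}}

old_belief = {"two": 0.5, "three": 0.5}
new_belief = update_belief(old_belief, ASK, "three please", transition, obs_model)
```

With a uniform old belief, the new belief in "three" rises to about 0.8 rather than 1.0, so the system keeps track of the remaining uncertainty instead of committing to a single hypothesis.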
A fully statistical system (2010)
• Not enough (annotated) data
  • Train in simulation (Rieser & Lemon, ACL 2006-2010)
  • Faster converging algorithms (Pietquin et al., 2010; Gasic et al., 2010)
  • Domain-transfer learning (Williams, 2013; Young et al., 2014)
• Interface with NLG
  • Mismatch between “what to say” and “how to say” it
  • Hierarchical learning (Rieser & Lemon, 2010; Dethlefs et al., 2011)
  • End-to-end neural architecture (Wen et al., 2016)
Challenges
Overview
• Task-driven Statistical Dialogue Systems (SDS)
  • Reinforcement Learning and State Tracking
• Social Chatbots
  • Seq2Seq models
  • Deep RL?
• Future challenges
  • Evaluation?
  • Data?
Amazon Alexa Challenge: HWU Team
Turing Test: “Exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human.” (Alan Turing 1950)
Amazon Alexa Challenge: “Converse coherently and engagingly with humans over popular topics and events for 20 minutes.” (Amazon 2016-2017)
ChatBots
AI vs. AI: CleverBot (Carpenter 2011)
Cleverbot, Rollo Carpenter (2011)
CleverBot (Carpenter 2011):
• n-gram models of question-answer pairs: P(x_i | x_{i-(n-1)}, …, x_{i-1})
• Trained on BIG data.
How far can you go with big data?
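The n-gram probability P(x_i | x_{i-(n-1)}, …, x_{i-1}) can be estimated by counting how often a word follows a given history. The following is a minimal maximum-likelihood sketch on a three-sentence toy corpus; the corpus and the function names are hypothetical, and nothing here is meant to describe how Cleverbot itself is implemented.

```python
from collections import Counter, defaultdict


def train_ngram(sentences, n=2):
    """Count how often each word follows each (n-1)-word history."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - (n - 1):i])
            counts[history][tokens[i]] += 1
    return counts


def ngram_prob(counts, history, word):
    """Maximum-likelihood estimate of P(word | history), without smoothing."""
    followers = counts.get(tuple(history), Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


corpus = ["how are you", "how are things", "how old are you"]
bigrams = train_ngram(corpus, n=2)
p_are_given_how = ngram_prob(bigrams, ["how"], "are")  # "are" follows "how" in 2 of 3 cases
```

Unseen histories get probability 0 here, which is one reason such models need very large amounts of data.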
• End-to-end systems.
• Learn from “raw” dialogue data (e.g. OpenSubtitles).
• No semantic or pragmatic annotation required.
Response Generation Systems
Input-output mapping
text-based
• Information Retrieval
  • Cleverbot, Xiaoice, Tay etc.
  • Banchs & Li, 2012; Yu et al., 2016: TickTock system.
• Statistical Machine Translation
  • Ritter et al., 2011: Data-Driven Response Generation in Social Media.
• Deep Neural Nets
  • Shang et al., 2015; Vinyals & Le, 2015; Sordoni et al., 2015.
End-to-End Architectures
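The information-retrieval approach in the list above can be illustrated with a very small sketch: find the stored message most similar to the user's input and return the response that followed it. Bag-of-words cosine similarity is an assumption made here for brevity, and the stored pairs are hypothetical; the cited systems use their own retrieval methods.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_response(message, pairs):
    """Return the stored response whose paired message best matches `message`."""
    query = Counter(message.lower().split())
    best = max(pairs, key=lambda pair: cosine(query, Counter(pair[0].lower().split())))
    return best[1]


# Hypothetical (message, response) pairs standing in for a dialogue corpus.
PAIRS = [
    ("how are you today", "i am fine thanks"),
    ("what is your name", "my name is bot"),
    ("do you like music", "yes i love jazz"),
]

reply = retrieve_response("tell me your name", PAIRS)
```

No annotation is needed beyond the raw message-response pairs, which is what makes this family of systems end-to-end.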
[Figure: Seq2seq model architecture]
Seq2Seq Models for End-to-End Systems
• Train on:
  1. Movie dialogue databases, e.g. OpenSubtitles.com (Banchs & Li, 2012)
  2. Twitter conversations (Ritter et al., 2011)
  3. Reddit (Al-Rfou et al., 2016)
  4. Ubuntu corpus (Lowe et al., SigDial 2015)
Neural Learning from Large Data Sets
Sample output from a neural net chatbot
# | Context c | Message m | Generated Response r
1 | noone can help this time its just i need a support system | well im here to support you. whenever you need me. | thank you, i appreciate it thank you
2 | nothin much, and how’s the book?! | its good but i’m only like halfway through cuz i don’t feel like reading. i’m so bored ... | that’s good! i have the book but i’m bored too.
3 | i know! what have you been up to? | nothing really. just enjoying summer and working. what about you? | that’s good! nothing much just listening to music and you?
4 | thanks, your way better though | i’m just good at expressing it that’s but trust me you are freakin amazing and waay better | i know, i don’t trust you, i love you too! thank you
5 | thanks. yeah i am good thanks! good, but angry! | sorry you’re angry, been there, i hope it dies down. | i hope so too, i’m not angry!!
6 | figures. bass. never ate bass. | bass is extremely mild with very few bones. my favorite after catfish | that’s one of the bones, chuck bass is mine too.
7 | today i will walk for miles. clear sky clear mind. | love walking. it’s too hot to walk this month. i haven’t moved. frozen. heat freeze. | it’s too hot for you to get up to honey if you have frozen yogurt to you.
8 | arresting people over “nasty tweets”? please. i’m not condoning what the lad did but arresting him is a tad silly. | it appears he may have sent death threats to others while all this was kicking off too. | i did! he was armed with nuclear threats? that’s what’s happening to you.
Table 5: Sample responses produced by the MT-based DCGM-II+CMM system.
information that conflicts either internally within the response itself, or is at odds with the context, as in examples 4-5. This is not unsurprising, since our model lacks mechanisms both for reflecting agent intent in the response and for maintaining consistency with respect to sentiment polarity. Longer context and message components may also result in responses that wander off-topic or lapse into incoherence as in 6-8, especially when relatively low frequency unigrams (“bass”, “threat”) are echoed in the response. In general, we expect that larger datasets and incorporation of more extensive contexts into the model will help yield more coherent results in these cases. Consistent representation of agent intent is outside the scope of this work, but will likely remain a significant challenge.

7 Conclusion
We have formulated a neural network architecture for data-driven response generation trained from social media conversations, in which generation of responses is conditioned on past dialog utterances that provide contextual information. We have proposed a novel multi-reference extraction technique allowing for robust automated evaluation using standard SMT metrics such as BLEU and METEOR. Our context-sensitive models consistently outperform both context-independent and context-sensitive baselines by up to 11% relative improvement in BLEU in the MT setting and 24% in the IR setting, albeit using a minimal number of features. As our models are completely data-driven and self-contained, they hold the potential to improve fluency and contextual relevance in other types of dialog systems.

Our work suggests several directions for future research. We anticipate that there is much room for improvement if we employ more complex neural network models that take into account word order within the message and context utterances. Direct generation from neural network models is an interesting and potentially promising next step. Future progress in this area will also greatly benefit from thorough study of automated evaluation metrics.
Sordoni A, Galley M, Auli M, Brockett C, Ji Y, Mitchell M, Nie JY, Gao J, Dolan B. A neural network approach to context-sensitive generation of conversational responses. NAACL 2015
trained on 127M Twitter context-message-response triples
Sample Output from a Neural Net chatbot
Problems with standard Seq2Seq
Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A Diversity-Promoting Objective Function for Neural Conversation Models.
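The diversity-promoting objective of the paper cited here replaces plain likelihood with a Maximum Mutual Information criterion; in its simplest form a candidate response T for a source S is scored as log p(T|S) − λ log p(T), which penalises responses that are likely regardless of the input. The sketch below only reranks a fixed candidate list, and the log-probabilities are hypothetical numbers rather than model outputs.

```python
def mmi_score(log_p_given_source, log_p_prior, lam=0.5):
    """MMI-style score: log p(T|S) - lam * log p(T)."""
    return log_p_given_source - lam * log_p_prior


def rerank(candidates, lam=0.5):
    """Pick the candidate text with the highest MMI-style score.

    `candidates` holds (text, log p(T|S), log p(T)) triples.
    With lam=0 this reduces to ordinary likelihood ranking.
    """
    return max(candidates, key=lambda c: mmi_score(c[1], c[2], lam))[0]


# Hypothetical scores: the generic reply is slightly more likely given the
# source, but also far more likely a priori.
CANDIDATES = [
    ("i don't know", -2.0, -1.0),
    ("i am reading a book about robots", -2.5, -6.0),
]

likelihood_choice = rerank(CANDIDATES, lam=0.0)
mmi_choice = rerank(CANDIDATES, lam=0.5)
```

With these numbers, likelihood ranking selects the generic "i don't know", while the MMI-style score prefers the more specific reply.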
Deep Reinforcement Learning (Li et al., 2016)
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao and Dan Jurafsky: Deep Reinforcement Learning for Dialogue Generation.
Reward modelling (Li et al., 2016)
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao and Dan Jurafsky: Deep Reinforcement Learning for Dialogue Generation.
Reward = 0.25 · EaseOfAnswering + 0.25 · InformationFlow + 0.5 · SemanticCoherence
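The reward on this slide is a weighted sum of three component scores. The sketch below only shows the combination step with the weights given above; how each component is computed is defined in the cited paper and is not reproduced here, and the component values in the example are hypothetical.

```python
# Weights as given on the slide.
WEIGHTS = {
    "ease_of_answering": 0.25,
    "information_flow": 0.25,
    "semantic_coherence": 0.5,
}


def combined_reward(components, weights=WEIGHTS):
    """Weighted sum of the per-turn reward components."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing reward components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical component scores for one generated turn.
turn_scores = {
    "ease_of_answering": 0.4,
    "information_flow": 0.8,
    "semantic_coherence": 0.6,
}
reward = combined_reward(turn_scores)  # 0.25*0.4 + 0.25*0.8 + 0.5*0.6
```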
Overview
• Task-driven Statistical Dialogue Systems (SDS)
  • Reinforcement Learning and State Tracking
• Social Chatbots
  • Seq2Seq models
  • Deep RL?
• Future challenges
  • Evaluation?
  • Data?
  • Combining task-driven and social systems?
• No clear indication of “success”.
• Currently evaluated turn-level:
  • E.g. BLEU, METEOR, etc.
  • Low correlation with human scores (Liu et al., 2016) (Novikova & Rieser, 2017)
• Current research:
  • Turn-level: Reference-less quality estimation (Dusek & Rieser, 2017)
  • System-level: Estimate customer ratings (Curry & Rieser, 2017)
Evaluation for Social Dialogue (Curry & Rieser, 2017)
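One way to see why word-overlap metrics can disagree with human judgements is to compute the clipped n-gram precision that BLEU is built on. The sketch below implements only that building block (no brevity penalty, no combination over n-gram orders), and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against reference responses."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


references = ["i am fine thanks"]
same_words = modified_precision("i am fine thanks", references)
different_words = modified_precision("doing great how about you", references)
```

The second candidate is a perfectly reasonable reply to "how are you?", yet it shares no words with the reference and scores 0. In open-ended dialogue many different responses are acceptable, so overlap with a single reference says little about quality.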
Pitfalls of Data
• Task-based SDS:
  • Reinforcement Learning with (PO)MDPs
  • Rely on Dialogue Acts to measure progress towards a goal.
• Response Generation Systems / ChatBot systems:
  • End-to-end systems, distributional semantics
  • ChatBots aim for “engaging strategies”
• Challenges:
  • Quality control, evaluation.
  • Clean data sets.
  • Integrating task-based systems and chatbots.
Summary: Data-driven Dialogue System
Thanks for listening!
Coming up: End-to-End Shared Challenge for NLG http://www.macs.hw.ac.uk/InteractionLab/E2E/