From Dialogue Systems to Social Chatbots: Reinforcement Learning, Seq2Seq, and back again.

SwissText keynote, 9th June 2017

Verena Rieser, Professor in Computer Science, Heriot-Watt University, Edinburgh, UK.

NLP Group @ HWU

Dr Simon Keizer, PDRA

Dr Jekaterina Novikova, PDRA

Dr Ondrej Dusek, PDRA

Dr Xingkun Liu, PDRA

Amanda Curry, PhD student

Xinnuo Xu, PhD student

Prof Verena Rieser, Director

Projects I am working on:

• Statistical Natural Language Generation (EPSRC DILIGENT project)
• Transfer Learning for Dialogue Systems (EPSRC MADRIGAL project)
• Automatic Quality Estimation for Output Generation (NVIDIA)
• Personalized Human Robot Interaction (with EmoTech LTD)
• Amazon Alexa Challenge (Amazon)
• Sentiment Analysis for Arabic (SemEval'16 winner)

Current Research Projects

Talking Machines

The new Bots are coming… "Bots are the new apps" because they "fundamentally revolutionize how computing is experienced by everybody."

Microsoft's CEO Nadella

Source: MIC Jan 2015.

Market forecast

Overview

• Task-driven Statistical Dialogue Systems (SDS)
  • Reinforcement Learning and State Tracking
• Social Chatbots
  • Seq2Seq models
  • Deep RL?
• Future challenges
  • Evaluation?
  • Data?
  • Combining task-driven and social systems?

Shortcomings of Siri

SDS Architecture

e.g. Rieser & Lemon, Comp. Ling. 2011, ACL '10, '08, '06

e.g. Rieser et al., ACL '05, '09, '10, '16, EMNLP '12, '15, EACL '09, '14

e.g. Boidin & Rieser, Interspeech '09

An example of a frame

"Show me morning flights from Edinburgh to London on Tuesday."

SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Edinburgh
      DATE: Tuesday
      TIME: ?
    DEST:
      CITY: London
      DATE: ?
      TIME: ?

Task representation and NLU
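A frame of this kind can be held as a nested dictionary whose unfilled slots are `None`. The sketch below is illustrative only: the slot names follow the slide, while `empty_frame`, `fill_slot` and `unfilled_slots` are hypothetical helpers and not part of any system described in the talk.

```python
from typing import Dict, List, Optional, Tuple

Frame = Dict[str, Dict[str, Optional[str]]]

def empty_frame() -> Frame:
    """Return a flight frame with every slot unfilled (None stands for '?')."""
    return {
        "ORIGIN": {"CITY": None, "DATE": None, "TIME": None},
        "DEST": {"CITY": None, "DATE": None, "TIME": None},
    }

def fill_slot(frame: Frame, role: str, slot: str, value: str) -> None:
    """Record a value that the NLU component extracted from the utterance."""
    frame[role][slot] = value

def unfilled_slots(frame: Frame) -> List[Tuple[str, str]]:
    """List the (role, slot) pairs the system could still ask about."""
    return [
        (role, slot)
        for role, slots in frame.items()
        for slot, value in slots.items()
        if value is None
    ]

# Reproduce the slot values shown on the slide.
frame = empty_frame()
fill_slot(frame, "ORIGIN", "CITY", "Edinburgh")
fill_slot(frame, "ORIGIN", "DATE", "Tuesday")
fill_slot(frame, "DEST", "CITY", "London")
print(unfilled_slots(frame))
```

The remaining unfilled slots are what a dialogue manager would use to decide which question to ask next.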

Dialogue Engineering: FSA with VoiceXML etc.

"A spoken dialogue system is a computer agent that interacts with humans by understanding and producing spoken language in a coherent way." [Rieser & Lemon, Springer 2011]

Statistical Dialogue Systems (SDS)

• Planning
• Adaptation
• Robustness

Data-driven Machine Learning methods

Two main research areas:

1. Belief Monitoring using Partially Observable Markov Decision Processes (POMDPs), e.g. [Williams & Young, 2007].
2. Action Selection / Policy Optimisation using Reinforcement Learning, e.g. [Singh et al., 2002], [Rieser & Lemon, 2008, 2011]

Statistical Approaches to task-based dialogue

Reinforcement Learning

$$Q^{\pi}(s,a) = \sum_{s'} T^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]$$

Bellman optimality equation (1952), see [Sutton and Barto, 1998].
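The equation can be evaluated directly once the transition probabilities T, rewards R and state values V are given as tables. The following is a minimal sketch under that assumption. The nested-dictionary layout, the function name `q_value`, the toy states and all numbers are made up for illustration.

```python
def q_value(T, R, V, s, a, gamma=0.9):
    """Compute Q(s, a) = sum over s' of T[s][a][s'] * (R[s][a][s'] + gamma * V[s'])."""
    return sum(
        prob * (R[s][a][next_state] + gamma * V[next_state])
        for next_state, prob in T[s][a].items()
    )

# Toy model (hypothetical): one state, one action, two possible successors.
T = {"s0": {"ask": {"s1": 0.5, "s2": 0.5}}}   # transition probabilities
R = {"s0": {"ask": {"s1": 1.0, "s2": -1.0}}}  # immediate rewards
V = {"s1": 2.0, "s2": 0.0}                    # values of the successor states

q = q_value(T, R, V, "s0", "ask", gamma=0.9)
print(round(q, 3))
```

Here the discount factor `gamma` weighs future value against immediate reward, as in the equation.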

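Policy Optimisation for Stochastic Environments: Markov Decision Processes

[Figure: decision tree starting in state S(t-1).
Action a1, "How many do you want?": the user answers "Three" (p=0.5, reward +1) or mumbles (p=0.5, reward -1).
Action a2, "Do you want three?": the user answers "Yes" (p=0.8, reward +1) or hangs up (p=0.2, reward -10).]

Trade-off problem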
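The expected immediate reward of each system action in this example can be worked out directly. The sketch assumes one particular reading of the diagram: the open question leads to "Three" (p=0.5, +1) or a mumble (p=0.5, -1), and the confirmation question leads to "Yes" (p=0.8, +1) or a hang-up (p=0.2, -10). The pairing of probabilities and rewards, and the names used below, are assumptions.

```python
# Each action maps to a list of (probability, reward) outcomes.
# The pairing is an assumed reading of the slide's diagram.
OUTCOMES = {
    "How many do you want?": [(0.5, 1.0), (0.5, -1.0)],
    "Do you want three?": [(0.8, 1.0), (0.2, -10.0)],
}

def expected_reward(outcomes):
    """One-step expected reward: sum of probability * reward."""
    return sum(p * r for p, r in outcomes)

def best_action(table):
    """Greedy choice over one-step expected rewards."""
    return max(table, key=lambda action: expected_reward(table[action]))

for action, outcomes in OUTCOMES.items():
    print(action, round(expected_reward(outcomes), 2))
print("greedy choice:", best_action(OUTCOMES))
```

A one-step comparison like this ignores what happens later in the dialogue. Accounting for those long-term consequences is what the value function in reinforcement learning is for.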

[Figure: belief update. Old belief over state S(t-1); machine action a(t-1), "How many do you want?"; observed data o(t), "Three please"; new belief over state S(t). Inference via Bayes' rule.]

Belief Monitoring for Partially Observable Environments: POMDPs
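The belief update behind this slide is the standard POMDP filter, b'(s') ∝ P(o | s', a) · Σ_s P(s' | s, a) · b(s). Below is a minimal sketch of it. The two user-goal states, the observation probabilities and the assumption that the user's goal does not change between turns are all invented for illustration.

```python
def belief_update(belief, action, observation, trans_prob, obs_prob):
    """Return the new belief b'(s') after taking `action` and seeing `observation`.

    trans_prob(next_state, state, action) -> P(s' | s, a)
    obs_prob(observation, next_state, action) -> P(o | s', a)
    """
    unnormalised = {}
    for next_state in belief:
        predicted = sum(
            trans_prob(next_state, state, action) * b for state, b in belief.items()
        )
        unnormalised[next_state] = obs_prob(observation, next_state, action) * predicted
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {state: value / total for state, value in unnormalised.items()}

# Hypothetical model: the user's goal stays fixed, and the recogniser hears
# the correct number 80% of the time.
def static_goal(next_state, state, action):
    return 1.0 if next_state == state else 0.0

def noisy_asr(observation, next_state, action):
    heard_goal = {"three": "wants_3", "two": "wants_2"}[observation]
    return 0.8 if heard_goal == next_state else 0.2

old_belief = {"wants_2": 0.5, "wants_3": 0.5}
new_belief = belief_update(old_belief, "ask_how_many", "three", static_goal, noisy_asr)
print({state: round(p, 2) for state, p in new_belief.items()})
```

The system never observes the user's goal directly. It only tracks a distribution over goals that sharpens as evidence comes in.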

A fully statistical system (2010)

• Not enough (annotated) data
  • Train in simulation (Rieser & Lemon, ACL 2006-2010)
  • Faster converging algorithms (Pietquin et al., 2010; Gasic et al. 2010)
  • Domain-transfer learning (Williams, 2013; Young et al. 2014)
• Interface with NLG.
  • Mismatch between "what to say" and "how to say" it.
  • Hierarchical learning (Rieser & Lemon, 2010; Dethlefs et al. 2011)
  • End-to-end neural architecture (Wen et al. 2016)

Challenges


Amazon Alexa Challenge: HWU Team

Turing Test: "Exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human." (Alan Turing 1950)

Amazon Alexa Challenge: "Converse coherently and engagingly with humans over popular topics and events for 20 minutes." (Amazon 2016-2017)

ChatBots

AI vs. AI: CleverBot (Carpenter 2011)

Cleverbot, Rollo Carpenter (2011)

CleverBot (Carpenter 2011):
• n-gram models of question-answer pairs: P(x_i | x_{i-(n-1)}, …, x_{i-1})
• Trained on BIG data.

How far can you go with big data?
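The n-gram probability P(x_i | x_{i-(n-1)}, …, x_{i-1}) is estimated from counts. Below is a minimal sketch for the bigram case (n = 2), using maximum-likelihood estimates on a toy corpus. It does not describe how Cleverbot is actually implemented. The corpus and function names are invented.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each token follows each previous token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def bigram_prob(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 for unseen contexts."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

# Toy corpus (hypothetical).
corpus = ["how are you", "how are things", "who are you"]
counts = train_bigram(corpus)
print(bigram_prob(counts, "are", "you"))
print(bigram_prob(counts, "how", "are"))
```

Without smoothing, any unseen word sequence gets probability zero, which is one reason such models need very large training sets.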

• End-to-end systems.
• Learn from "raw" dialogue data (e.g. OpenSubtitles).
• No semantic or pragmatic annotation required.

Response Generation Systems

Input-output mapping

text-based

• Information Retrieval
  • Cleverbot, Xiaoice, Tay etc.
  • Banchs & Li, 2012; Yu et al. 2016: TickTock system.
• Statistical Machine Translation
  • Ritter et al. 2011: Data-Driven Response Generation in Social Media.
• Deep Neural Nets
  • Shang et al., 2015; Vinyals & Le, 2015; Sordoni et al., 2015.

End-to-End Architectures
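The information-retrieval approach amounts to finding the stored message most similar to the user's input and returning the response that followed it. The sketch below uses bag-of-words cosine similarity. It is not the method of any of the cited systems, and the stored pairs are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def bag_of_words(text):
    return Counter(text.lower().split())

def retrieve_response(query, pairs):
    """Return the response paired with the stored message closest to the query."""
    query_bag = bag_of_words(query)
    best_message, best_response = max(
        pairs, key=lambda pair: cosine(query_bag, bag_of_words(pair[0]))
    )
    return best_response

# Hypothetical (message, response) pairs standing in for a dialogue corpus.
pairs = [
    ("how are you", "fine thanks"),
    ("what is your name", "i am a bot"),
]
print(retrieve_response("what is your name please", pairs))
```

A system like this can only return responses it has already seen, which is the limitation the translation-based and neural approaches try to overcome.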

[Figure: Seq2seq model architecture]

Seq2Seq Models for End-to-End Systems

• Train on:
  1. Movie dialogue databases, e.g. OpenSubtitles.com (Banchs & Li, 2012)
  2. Twitter conversations (Ritter et al., 2011)
  3. Reddit (Al-Rfou et al., 2016)
  4. Ubuntu corpus (Lowe et al., SigDial 2015)

Neural Learning from Large Data Sets

Sample output from a neural net chatbot

# | Context c | Message m | Generated Response r
1 | noone can help this time its just i need a support system | well im here to support you. whenever you need me. | thank you, i appreciate it thank you
2 | nothin much, and how's the book?! | its good but i'm only like halfway through cuz i don't feel like reading. i'm so bored ... | that's good! i have the book but i'm bored too.
3 | i know! what have you been up to? | nothing really. just enjoying summer and working. what about you? | that's good! nothing much just listening to music and you?
4 | thanks, your way better though | i'm just good at expressing it that's but trust me you are freakin amazing and waay better | i know, i don't trust you, i love you too! thank you
5 | thanks. yeah i am good thanks! good, but angry! | sorry you're angry, been there, i hope it dies down. | i hope so too, i'm not angry!!
6 | figures. bass. never ate bass. | bass is extremely mild with very few bones. my favorite after catfish | that's one of the bones, chuck bass is mine too.
7 | today i will walk for miles. clear sky clear mind. | love walking. it's too hot to walk this month. i haven't moved. frozen. heat freeze. | it's too hot for you to get up to honey if you have frozen yogurt to you.
8 | arresting people over "nasty tweets"? please. i'm not condoning what the lad did but arresting him is a tad silly. | it appears he may have sent death threats to others while all this was kicking off too. | i did! he was armed with nuclear threats? that's what's happening to you.

Table 5: Sample responses produced by the MT-based DCGM-II+CMM system.

… information that conflicts either internally within the response itself, or is at odds with the context, as in examples 4-5. This is not unsurprising, since our model lacks mechanisms both for reflecting agent intent in the response and for maintaining consistency with respect to sentiment polarity. Longer context and message components may also result in responses that wander off-topic or lapse into incoherence as in 6-8, especially when relatively low frequency unigrams ("bass", "threat") are echoed in the response. In general, we expect that larger datasets and incorporation of more extensive contexts into the model will help yield more coherent results in these cases. Consistent representation of agent intent is outside the scope of this work, but will likely remain a significant challenge.

7 Conclusion

We have formulated a neural network architecture for data-driven response generation trained from social media conversations, in which generation of responses is conditioned on past dialog utterances that provide contextual information. We have proposed a novel multi-reference extraction technique allowing for robust automated evaluation using standard SMT metrics such as BLEU and METEOR. Our context-sensitive models consistently outperform both context-independent and context-sensitive baselines by up to 11% relative improvement in BLEU in the MT setting and 24% in the IR setting, albeit using a minimal number of features. As our models are completely data-driven and self-contained, they hold the potential to improve fluency and contextual relevance in other types of dialog systems.

Our work suggests several directions for future research. We anticipate that there is much room for improvement if we employ more complex neural network models that take into account word order within the message and context utterances. Direct generation from neural network models is an interesting and potentially promising next step. Future progress in this area will also greatly benefit from thorough study of automated evaluation metrics.


Sordoni A, Galley M, Auli M, Brockett C, Ji Y, Mitchell M, Nie JY, Gao J, Dolan B. A neural network approach to context-sensitive generation of conversational responses. NAACL 2015

trained on 127M Twitter context-message-response triples

Sample Output from a Neural Net chatbot

Problems with standard Seq2Seq

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A Diversity-Promoting Objective Function for Neural Conversation Models.

Deep Reinforcement Learning (Li et al., 2016)

Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao and Dan Jurafsky: Deep Reinforcement Learning for Dialogue Generation.

Reward modelling (Li et al., 2016)

Reward = 0.25 EaseOfAnswering + 0.25 InformationFlow + 0.5 SemanticCoherence
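The combined reward is a weighted sum of three component scores. The sketch below shows only that combination step, with the weights from the slide. How each component is scored is outside its scope, and the function name and the example scores are hypothetical.

```python
def combined_reward(ease_of_answering, information_flow, semantic_coherence,
                    weights=(0.25, 0.25, 0.5)):
    """Weighted sum of the three reward components (weights taken from the slide)."""
    w_ease, w_flow, w_coherence = weights
    return (w_ease * ease_of_answering
            + w_flow * information_flow
            + w_coherence * semantic_coherence)

# Hypothetical component scores for one generated turn.
reward = combined_reward(ease_of_answering=-2.0,
                         information_flow=0.5,
                         semantic_coherence=-1.0)
print(reward)
```

Because semantic coherence carries half the weight, a response that is easy to answer but unrelated to the context still scores poorly.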

Overview

• Future challenges
  • Evaluation?
  • Data?
  • Combining task-driven and social systems?

• No clear indication of "success".
• Currently evaluated at turn level:
  • E.g. BLEU, METEOR, etc.
  • Low correlation with human scores (Liu et al. 2016) (Novikova & Rieser, 2017)
• Current research:
  • Turn-level: Reference-less quality estimation (Dusek & Rieser, 2017)
  • System-level: Estimate customer ratings (Curry & Rieser, 2017)

Evaluation for Social Dialogue (Curry & Rieser, 2017)
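Whether an automatic metric agrees with human judgement is commonly checked with a rank correlation between the metric's scores and the human ratings for the same outputs. The sketch below implements Spearman's rho for the simple case without tied values. The scores are invented toy numbers, not results from any of the cited studies.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman's rho for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two items")
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Toy data (hypothetical): metric scores and human ratings for four responses.
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_ratings = [2, 1, 4, 3]
print(spearman(metric_scores, human_ratings))
```

A value near zero, as in this toy case, means the metric's ranking tells us little about which responses humans prefer.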

Pitfalls of Data

• Task-based SDS:
  • Reinforcement Learning with (PO)MDPs
  • Rely on Dialogue Acts to measure progress towards a goal.
• Response Generation Systems / ChatBot systems:
  • End-to-end systems, distributional semantics
  • ChatBots aim for "engaging strategies"
• Challenges:
  • Quality control, evaluation.
  • Clean datasets.
  • Integrating task-based systems and chatbots.

Summary: Data-driven Dialogue System

Thanks for listening!

Coming up: End-to-End Shared Challenge for NLG http://www.macs.hw.ac.uk/InteractionLab/E2E/
