Understanding the User in Socialbot Conversations
Mari Ostendorf & the Sounding Board Team
University of Washington
The Sounding Board Team
Students: Hao Fang (EE), Hao Cheng (EE), Elizabeth Clark (CSE), Ari Holtzman (CSE), Maarten Sap (CSE)
Faculty Advisors: Mari Ostendorf (EE), Yejin Choi (CSE), Noah Smith (CSE)
Teams of university students try to build a socialbot that • converses coherently and engagingly with people • on popular topics and current events.
10M conversations with real users + new type of conversational AI → many new research problems
This talk: understanding the user includes user modeling
Types of Conversational AI Systems
Two dimensions: Accomplish Tasks (– / +) and Social Conversation (– / +)
o Chat Bot: chitchat; limited content to talk about
o Virtual Assistant: execute commands, answer questions; limited social back and forth
o Socialbot: 2-way social & information exchange
Roadmap
o The socialbot as a conversational gateway
o Sounding Board system overview
o Characteristics of real users
o User modeling – first steps
o Take-aways & open issues
The Socialbot as a Conversational Gateway
A Perspective on Socialbots
o A socialbot facilitates evolving user goals & interests
o Users (should) know they are talking to a bot
o Broad applications
  o Education: language learning, tutoring systems
  o Information exploration, interactive help & recommendations
  o Exercise/therapy coach, companion
Sounding Board: A Conversational Gateway to Online Content
Sounding Board
Issues Vary for Different Paradigms
Conversational AI system components, ASSISTANT vs. SOCIALBOT:
o Speech/language understanding: task intents, form filling vs. social & info intents
o Dialog management: narrow options & execute tasks (reward = timely task completion) vs. learn about interests & make suggestions (reward = user satisfaction)
o Back-end application: structured database vs. unstructured information
o Response generation: constrained domains vs. open domain
Sounding Board – System Overview
o Design philosophy
o Brief system overview
o Evaluation
Early Stage Challenges
o Software:
  o No experience with Alexa skill kits, built-in tools are more for speech-enabling an existing app
  o No existing dialog system to build on
o Data:
  o Task is open domain & users want current content → there was no good existing data for end-to-end training
  o Our initial system was sufficiently bad, we didn’t want to learn from early user conversations with it
What Makes Someone a Good Conversationalist?
o Have something interesting to say
o Show interest in what your partner says
These principles apply to a socialbot
Have something interesting to say
o Users react positively to learning something new
o ... and negatively to old or unpleasant news
SpaceX sends beer ingredients to International Space Station just in time for Christmas
Man Given 'Options' Before Cutting Dog's Head Off, Ga. Sheriff Says
Babies as young as 10 months can assess how much someone values a particular goal by observing how hard they are willing to work to achieve it …
Show interest in what the user says
o Users lose interest when they get too much content that they don’t care about
o Users like acknowledgment of their reactions & requests
o Some users need encouragement to express opinions
…but it can be annoying: “This article mentioned Google. Have you heard of Google?”
Design Philosophy
o Content-driven
  o Daily content mining, large & dynamic content collection
  o Knowledge graph
  o DM that promotes popular content, diverse sources (styles)
o User-centric
  o Language understanding that detects user sentiment
  o Dialog management (DM) that tries to learn user personality, handles rapid topic changes, tracks engagement, ….
  o Language generation with prosody-appropriate grounding
Multi-dimensional NLU Representation
o Questions: “What is your favorite color?”
o Topics: “Let’s talk about technology.”
o User Reactions: “That’s really interesting!”
o Commands: “Tell me a joke.”
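As a rough illustration of a multi-dimensional representation, the sketch below fills several independent slots (question, command, topic, reaction) for a single utterance. The slot names, cue lists and the `analyze` function are hypothetical and rule-based, chosen only to cover the four slide examples; they are not the Sounding Board implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue lists, chosen only to cover the slide's four examples.
QUESTION_WORDS = {"what", "who", "why", "how", "when", "where", "do", "did", "is", "are", "can"}
POSITIVE_CUES = ("interesting", "cool", "funny", "awesome")
NEGATIVE_CUES = ("boring", "creepy", "not true")
TOPIC_PREFIX = "let's talk about "
TELL_PREFIX = "tell me a "


@dataclass
class NLUFrame:
    """One utterance described along several dimensions at once."""
    text: str
    is_question: bool = False
    command: Optional[str] = None
    topic: Optional[str] = None
    reaction: Optional[str] = None  # "positive", "negative", or None


def analyze(utterance: str) -> NLUFrame:
    # Normalize case, curly apostrophes and edge punctuation
    # (ASR output usually carries no punctuation at all).
    text = utterance.lower().replace("\u2019", "'").strip(" .!?")
    words = text.split()
    frame = NLUFrame(text=utterance)

    # Question dimension: decided by the first word, not by punctuation.
    frame.is_question = bool(words) and words[0] in QUESTION_WORDS

    # Command and topic dimensions.
    if text.startswith(TOPIC_PREFIX):
        frame.command = "change_topic"
        frame.topic = text[len(TOPIC_PREFIX):]
    elif text.startswith(TELL_PREFIX):
        frame.command = "tell_" + text[len(TELL_PREFIX):]

    # Reaction dimension: simple substring cues.
    if any(cue in text for cue in POSITIVE_CUES):
        frame.reaction = "positive"
    elif any(cue in text for cue in NEGATIVE_CUES):
        frame.reaction = "negative"
    return frame
```

Because the slots are filled independently, one utterance can carry a command and a topic at the same time, which is the point of a multi-dimensional frame over a single intent label.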
Hierarchical Dialog Management
o Master (Global)
  o Rank topics, miniskills, content
  o Consider: topic coherence, user engagement, content availability
o Miniskills (Local)
  o greeting / goodbye / menu
  o probe user personality
  o discuss a news article / movie
  o tell a fact / thought / advice / joke
Negotiation
Thought
Fact
Movie
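The master's ranking step can be pictured with a small sketch. The scoring rule here is an assumption made for illustration: an engaged user is kept near the current topic, a disengaged user is steered away from it, and candidates with no available content are dropped. `Candidate` and `rank_candidates` are hypothetical names, not part of the actual system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    miniskill: str      # e.g. "fact", "movie", "joke"
    topic: str
    coherence: float    # relatedness to the current topic, in [0, 1]
    has_content: bool   # whether content is available for this candidate


def score(candidate: Candidate, engagement: float) -> float:
    """Assumed rule: favor coherent topics when the user is engaged
    (engagement near 1) and unrelated topics when the user is not."""
    return engagement * candidate.coherence + (1.0 - engagement) * (1.0 - candidate.coherence)


def rank_candidates(candidates: List[Candidate], engagement: float) -> List[Candidate]:
    # Content availability acts as a hard filter; the rest are sorted by score.
    available = [c for c in candidates if c.has_content]
    return sorted(available, key=lambda c: score(c, engagement), reverse=True)
```

A miniskill selected this way would then handle the local turn-by-turn exchange until control returns to the master.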
From Speech Acts to Natural Language
Speech acts and their realizations:
o GROUNDING: “I’m glad you like it!”
o INFORMNEWSTITLE: “I read this article from yesterday. UT Austin and Google AI use machine learning ….”
o REQUESTINPUT: “Have you read this news?”
o INSTRUCTSKIP: “You can say “next” to talk about other news.”
Response
Phrase Generation
Prosody Adjustment
UT Austin and Google AI use machine learning on data from NASA's Kepler Space Telescope to discover an eighth planetcircling a distant star.
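A minimal sketch of the phrase-generation step, assuming a template table keyed by speech act. The speech-act names and example phrases come from the slide; the `TEMPLATES` table, the slot names `date` and `title`, and the `realize` function are hypothetical. Prosody adjustment is not modeled here.

```python
import random
from typing import Dict, List, Optional, Tuple

# Assumed template table: one or more surface forms per speech act.
TEMPLATES: Dict[str, List[str]] = {
    "GROUNDING": ["I'm glad you like it!"],
    "INFORMNEWSTITLE": ["I read this article from {date}. {title}"],
    "REQUESTINPUT": ["Have you read this news?"],
    "INSTRUCTSKIP": ['You can say "next" to talk about other news.'],
}


def realize(acts: List[Tuple[str, Dict[str, str]]],
            rng: Optional[random.Random] = None) -> str:
    """Turn a sequence of (speech_act, slots) pairs into one response string."""
    rng = rng or random.Random(0)
    phrases = []
    for act, slots in acts:
        template = rng.choice(TEMPLATES[act])  # pick one surface form
        phrases.append(template.format(**slots))
    return " ".join(phrases)
```

Listing several templates per act would give surface variety without changing the dialog manager, which only deals in speech acts.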
Content Management
o Crawl online content
o Filter inappropriate & depressing content
o Index interesting & uplifting content
  o noun phrases, entities, meta-info
o Knowledge graph
  o updated daily
  o 80K entries, 300K topics
science astronomy
Knowledge Graph
UT Austin and Google AI use machine learning on data from NASA's Kepler Space Telescope … planet … distant star.
How does NASA organize a party? They plan-et!
Artificial intelligence in 2017 still can't truly understand humans
NASA … android device ... Google … Android device manager …
Janice Joplin was … fraternity brothers at UT Austin …
Shared entities linking the content: NASA, AI, Google, UT Austin
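One way to picture how shared entities link pieces of content is an inverted index from entity to content items. This is a sketch under that assumption only; `ContentIndex` and the content identifiers in the usage note are hypothetical, and the real knowledge graph also covers topics and meta-information not modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ContentIndex:
    """Links content items through the entities they mention."""

    def __init__(self) -> None:
        self._by_entity: Dict[str, Set[str]] = defaultdict(set)
        self._entities: Dict[str, Set[str]] = {}

    def add(self, content_id: str, entities: Iterable[str]) -> None:
        entity_set = set(entities)
        self._entities[content_id] = entity_set
        for entity in entity_set:
            self._by_entity[entity].add(content_id)

    def related(self, content_id: str) -> List[str]:
        """Other items sharing at least one entity, most shared entities first
        (ties broken alphabetically by content id)."""
        shared: Dict[str, int] = defaultdict(int)
        for entity in self._entities[content_id]:
            for other in self._by_entity[entity]:
                if other != content_id:
                    shared[other] += 1
        return sorted(shared, key=lambda c: (-shared[c], c))
```

For example, adding the Kepler article with entities NASA, Google, AI and UT Austin, and the NASA joke with entity NASA, makes the joke reachable from the article via `related`.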
Evaluation – Alexa Prize Protocol
o Users (judges) rate conversations on a 1-5 scale
o Duration of conversation (tie breaker)

Period                          Rated by       Rating   Duration
Finals (mid Nov 2017)           Judges         3.17     10:22
Post-finals (Nov 24 – Dec 24)   Alexa owners   3.65     4:30
Diagnostic Evaluation
o User ratings are expensive, noisy & sparse
  o user ratings have a very high variance (3.65 ± 1.40)
  o users may decline to rate the system (43% are rated)
  o conversations can have good and bad sections
o Users provide more information than is available from the final conversation rating & length
  o topic proposal, topic acceptance & rejection
  o reaction to content
Local Rewards: Combine Ratings & Responses
Caveats & Constraints
o ASR is imperfect
o No sentence segmentation or pause information
o Audio information is not available (i.e. no prosodic cues)
cause does that you’re gonna state that’s cool
I’m happy you liked that.
How about robots what’s up with robots?
Sorry. I'm actually not very familiar with robots what's up with robots.
Wow! You must be very smart! ….Yeah I did not.
Did you know that ….?
Characteristics of Real Users (from 10M Sounding Board Conversations)
o Observations
o Personality
o Implications
Observations: Users Vary
o Different interests, opinions on issues, sense of humor
o Interaction styles: terse vs. verbose, politeness, …
o Different goals: information seeking, opinion sharing, getting to know each other, adversarial
o Different ages
Content Preferences Vary
Did you know that Malaysian vampires are tiny monsters that burrow into people's heads and force them to talk about cats?
Amused
o That’s creepy.
o Oh you are so funny.
o Oh my god that’s funny.
Not amused
o That’s not true.
o Oh gods are you have to hear this.
o What the heck.
Cat lover
o Cats are my favorite animals.
o Let’s talk about cats.
Not really listening?
o Cool.
o Wow that’s interesting.
o That’s awesome.
Interaction Styles
Terse User
➟ Cool
➟ One excuses enough
➟ No thanks
➟ No
➟ No
➟ Sure
➟ Yeah
➟ Yeah
➟ No I didn’t know that
➟ No
➟ Yeah
➟ No
➟ No
➟ Yes
➟ Cool
➟ Yeah that’s cool
➟ No I didn’t
➟ No
➟ Yes
➟ No I didn’t know that
➟ No
➟ Yes
➟ Cool
➟ ...
Talkative User
➟ So you’re saying mean is a type 2 diabetic art is junk food
➟ No I’d rather go back to talking about vampires
➟ I love vampires the something that I really loved do you know and rice the author
➟ it is weird i don't understand
➟ Yes I would
➟ i don't know that's an interesting question and is it really true that garlic keeps vampires the wedding and what i
➟ what are they have their long fingernails for
➟ i think that that's probably true but i think it vampires are evil and they don't care about sustaining things for human be-...
Different User Goals
o Information seeking
o Opinion sharing
o Getting to know each other
i don't know that's an interesting question and is it really true that garlic keeps vampires the wedding and what i
Tell me more about that. What is that? Why?
What’s your favorite video game?
i think that that's probably true but i think it vampires are evil and they don't care about sustaining things for human be-
My favorite video game is minecraft
Adversarial User Examples
o Taboo topics
o Offensive language
o Venting
Alexa, are you naked?
@#%*#*!!.
I heard you mention unicorn. I'll do some research on that. How about we talk about some news about artificial intelligence?
You suck.
Unfortunately, I’m not sure I understood what you said.
I’m a chatbot.
yeah can i get my butt hampshire suspense are there was a cough sure stop
ASR is more challenging with children ….
No problem. Let me think. How about we chat about…
Often “can you repeat that?” isn’t going to yield a much better result.
And more content filtering is needed…
Let’s talk about Santa Claus!
You know what I realized the other day? Santa Claus is the most elaborate lie ever told.
User Personality
o User-centric topic suggestions
o Five-factor model (Costa & McCrae, 1992)
  o E.g., “Do you talk a lot?”
o Helps us understand how users interact with Sounding Board
https://www.verywellmind.com/the-big-five-personality-dimensions-2795422
Trends for Personality Types
o Personality correlates with user ratings
  o Extroverted, agreeable, open → higher ratings (p << .001)
o Topics brought up by users
  o Introverts (AI, food, cats), extroverts (news, fashion)
  o Open & imaginative (AI, time travel, aliens)
  o Low conscientiousness (pokemon, video games, minecraft)
Implications
o Content Management: flag rather than filter controversial content; multi-dimensional content index (e.g. ratings of user types)
o Spoken Language Understanding: age & dialect impact ASR; verbosity impacts NLP
o Dialogue Management: content ranking (sources, topic, entities); error handling, follow-up strategy
o Language Generation: politeness, repetition rephrasing
User Modeling – First Steps
o Content ranking problem
o User embedding model
Content Ranking Problem
o Predict user engagement with proposed content
o Content can be characterized based on:
  o Information source, broad topic, entities
  o Sentence embeddings
o User engagement characterized based on
  o User suggested topics
  o User accepted/rejected topics
  o User pos/neg reactions to the content
  o User reaction to the bot
User engagement (subdialog reward)
Types of Features to Use
o User-independent info
  o Relatedness to current topic (depending on engagement)
  o General popularity in dialogs with other users
o User-specific features
  o Engagement with related topics/sources earlier in the dialog
  o Age/personality factors reflected in language use
User-topic engagement data is sparse. User embeddings enable learning from similar users.
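To make "learning from similar users" concrete, here is a small sketch that predicts a user's engagement with a topic from the most similar users who have already seen that topic. The nearest-neighbor scheme, the cosine similarity and the names `cosine` and `predict_engagement` are all assumptions for illustration; the slides do not specify the prediction model.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_engagement(
    user_vec: Sequence[float],
    topic: str,
    others: List[Tuple[Sequence[float], Dict[str, float]]],
    k: int = 2,
) -> Optional[float]:
    """Similarity-weighted average of the engagement that the k most similar
    users showed for `topic`. Returns None when no usable neighbor exists."""
    scored = [
        (cosine(user_vec, vec), history[topic])
        for vec, history in others
        if topic in history
    ]
    top = sorted(scored, reverse=True)[:k]
    total = sum(max(sim, 0.0) for sim, _ in top)
    if total == 0.0:
        return None
    return sum(max(sim, 0.0) * eng for sim, eng in top) / total
```

Even a user who has never discussed a topic gets a prediction this way, as long as some user with a nearby embedding has.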
Predicting User Ratings of Conversations
o Task: predict the conversation-level user rating using linear regression with features that characterize
  o Topic-initiation strategy and topic coherence
  o Agent dialog acts & language use
  o User characteristics (verbosity) & language use
o Finding: the best performance is obtained with user characteristics alone

Features
Topic      .198
Agent      .256
User       .301
All        .295
Conversational Style – User Vectors
User bag of words* → LDA → vector of “topic” probabilities
10 LDA clusters – frequent words reflect:
• People interacting in specific modes [jokes, music, quiz]
• Politeness (would_like, can_I)
• Interest in Alexa (what_is, your, favorite)
• Positive engagement (cool, funny, interesting)
• Self-oriented user (I_think, I_like, I_am)
• Interest in video games
* And frequent bigrams
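The input to the LDA step, a per-user bag of words extended with frequent bigrams, might be built roughly as follows. The underscore-joined bigram tokens mirror the slide's examples (would_like, can_I); the function name, the lowercasing and the way the frequent-bigram set is supplied are assumptions, and fitting LDA itself is not shown.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def user_bag_of_words(
    utterances: Iterable[str],
    frequent_bigrams: Set[Tuple[str, str]],
) -> Counter:
    """Count unigrams plus any bigram listed in `frequent_bigrams`,
    pooled over all of one user's utterances."""
    bag: Counter = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        bag.update(tokens)
        # Adjacent token pairs become single "first_second" tokens.
        for first, second in zip(tokens, tokens[1:]):
            if (first, second) in frequent_bigrams:
                bag[first + "_" + second] += 1
    return bag
```

The resulting counts for each user would form one "document" for the topic model.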
Towards a better unsupervised BOW model
o Is perplexity the right objective for learning user vectors?
o Need tricks to make it work, e.g. drop frequent words (yes, no, yeah, ….)
o A better objective: user re-identification
\[
\min_{\{u\}} \sum_{i,j} \left[\, 1 + d(u_i^1, u_i^2) - d(u_i^1, u_j^1) \,\right]_+
\]
Distance to self: d(u_i^1, u_i^2). Distance to others: d(u_i^1, u_j^1).
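The re-identification objective, a hinge loss that pulls a user's "distance to self" below the "distance to others" by a margin of 1, can be sketched as below. Two points are assumptions here: that the superscripts 1 and 2 denote vectors computed from two disjoint samples of the same user's text, and that d is cosine distance. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def reid_loss(first_views: List[Sequence[float]],
              second_views: List[Sequence[float]]) -> float:
    """Sum over users i and other users j of
    max(0, 1 + d(u_i^1, u_i^2) - d(u_i^1, u_j^1))."""
    loss = 0.0
    for i, u_i1 in enumerate(first_views):
        d_self = cosine_distance(u_i1, second_views[i])
        for j, u_j1 in enumerate(first_views):
            if i == j:
                continue
            d_other = cosine_distance(u_i1, u_j1)
            loss += max(0.0, 1.0 + d_self - d_other)
    return loss
```

The loss reaches zero only when every user's two views are closer to each other than to any other user's view by at least the margin; in training, the vectors would come from a learned embedding model rather than being fixed as here.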
Cats are my favorite animals.
Let’s talk about cats.
Alexa, what’s your favorite singer?
Experiments on Twitter Users
o Task: given a small set of example users, find other users with similar interests
o Learn embeddings from user tweets
o Experiments with 16 groups, find match out of 43k

Results (Jaech et al., NAACL 2018)
Model               MRR
word2vec            846
LDA                 501
Re-ID               24
W2V init → Re-ID    12
Evaluating User Embeddings
o Can we do a preliminary assessment of the user embeddings without full system implementation & user testing?
o Plans for using existing data:
  o Content engagement prediction accuracy
  o Conversation-level rating prediction
  o User response generation with context-aware language model
o Work in progress…
Sounding Board – Summary
o The socialbot as a conversational gateway:
  o Facilitate evolving user goals & interests
  o Learn new facts, explore information, share opinions
o Critical system components
  o Tracking user intent and engagement
  o Managing dynamic content (social chat knowledge)
o 10M conversations with real users + new type of conversational AI → many new research problems
User-Related Socialbot Take-Aways
o User-driven information exploration brings out user variation
o Understanding the user includes:
  o what they just said (intent, sentiment)
  o who they are (interests, interaction style)
o User modeling has implications for all dialog system components (& evaluation)
Some Open Issues
o User-dependent reward function
o Dialog policy learning with user embeddings
o User response generation with context-aware language model for a user simulator for reinforcement learning
Thank You
Prosody – What’s that?
o It’s not what you say, but how you say it
o Intonation, pausing, duration lengthening… (attributes of the acoustic signal)
o Which communicate
  o User intent, sentiment, sarcasm, …
  o Socialbot empathy, enthusiasm, topic change,…