Applications of Dialogue Agents
• Conversational agents are useful for:
– Booking airline flights
– Answering questions
– Electronic Customer Relationship Management (e-CRM) systems
Characteristics of Dialogue
• Turns and Utterances
  – Dialogue is characterized by turn-taking.
  – Overlap between speakers is small (less than 5%).
  – Speaker transitions occur at utterance boundaries.
  – Boundaries can be identified by cue words (e.g., “well”, “and”, “so”).
• Grounding
  – Speaker and hearer must establish common ground (the set of things mutually believed).
  – Done via attention, acknowledgement, contribution, demonstration, and display.
• Conversational Implicature
  – Utterance interpretation relies on more than sentence meaning; it requires drawing inferences.
  – A: “What day in May did you want to travel?”
  – C: “I need to be there for a meeting from the 12th to the 15th.”
  – C does not name a day; A must infer that C wants to travel in time to arrive by the 12th.
Dialogue Acts
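• Speech Acts:
  – Locutionary act: uttering a sentence with a particular meaning
  – Illocutionary act: the act performed in saying it (e.g., asking, answering, promising)
  – Perlocutionary act: the effect produced on the hearer's feelings, thoughts, or actions (e.g., persuading)
• Dialogue Acts / Conversational Moves
  – Extend speech acts with conversational functions such as grounding, answering, and agreeing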
• Dialogue Act Markup in Several Layers (DAMSL) architecture
  – Dialogue act tagging scheme
  – Hierarchical tag set
  – Codes levels of dialogue information, e.g., forward-looking function and backward-looking function
  – Focused on task-oriented dialogue
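A DAMSL-style annotation can be pictured as a record that carries separate forward-looking and backward-looking layers for each utterance. This is a minimal sketch under stated assumptions: the tag names are only a small illustrative subset of the DAMSL categories, while the `category/subtag` path notation, the `is_valid` helper, and the `TaggedUtterance` class are hypothetical, and validation only checks membership in that subset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Illustrative subset of the DAMSL hierarchy (not the full tag set):
# top-level category -> subtags.
FORWARD_LOOKING: Dict[str, Set[str]] = {
    "statement": {"assert", "reassert", "other-statement"},
    "influencing-addressee-future-action": {"open-option", "action-directive"},
    "info-request": set(),
    "committing-speaker-future-action": {"offer", "commit"},
}
BACKWARD_LOOKING: Dict[str, Set[str]] = {
    "agreement": {"accept", "accept-part", "maybe", "reject-part", "reject", "hold"},
    "understanding": {"signal-non-understanding", "signal-understanding"},
    "answer": set(),
}


def is_valid(tag: str, hierarchy: Dict[str, Set[str]]) -> bool:
    """Accept a top-level category ("answer") or a path ("statement/assert")."""
    top, _, sub = tag.partition("/")
    return top in hierarchy and (sub == "" or sub in hierarchy[top])


@dataclass
class TaggedUtterance:
    speaker: str
    text: str
    forward: List[str] = field(default_factory=list)   # what the utterance sets up next
    backward: List[str] = field(default_factory=list)  # how it relates to earlier turns

    def __post_init__(self) -> None:
        # Reject any tag that is not in the (subset) hierarchy for its layer.
        for tag in self.forward:
            if not is_valid(tag, FORWARD_LOOKING):
                raise ValueError(f"unknown forward-looking tag: {tag}")
        for tag in self.backward:
            if not is_valid(tag, BACKWARD_LOOKING):
                raise ValueError(f"unknown backward-looking tag: {tag}")


# The A/C exchange from the "Characteristics of Dialogue" slide, tagged in both layers.
dialogue = [
    TaggedUtterance("A", "What day in May did you want to travel?",
                    forward=["info-request"]),
    TaggedUtterance("C", "I need to be there for a meeting from the 12th to the 15th.",
                    forward=["statement/assert"], backward=["answer"]),
]
for u in dialogue:
    print(u.speaker, u.forward, u.backward)
```

Keeping the two layers as separate lists reflects the point of the scheme: a single utterance such as C's reply can both answer the previous turn and assert new information.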
Automatic Interpretation of Dialogue Acts
• Two types of models:
– Plan Inference Models
– Cue-Based Models
Plan Inference Rules
• Rule-based techniques consisting of manually crafted rule sets
• Rules designed for “AI planning”
  – Model how the hearer will handle speaker requests
  – Also called action schemas
• Each action schema includes constraints, preconditions, effects, and a body
• Based on BDI models (Allen, 1995)
  – Belief, Desire, Intention
  – Belief modeled using KNOW and KNOWIF
  – Desire modeled using WANT
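The four parts of an action schema can be sketched as a small data structure. This is a minimal STRIPS-style sketch, not Allen's formalism: facts are plain strings, `KNOW(...)` facts stand in for the agent's beliefs, and the BOOK-FLIGHT schema, its predicates, the bindings, and the `ground`/`unmet`/`apply` helpers are all hypothetical. Unmet preconditions are returned instead of simply failing, because in plan-based models they become subgoals (for instance, asking the client for the departure date).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Bindings = Dict[str, str]
State = FrozenSet[str]


def ground(templates: Tuple[str, ...], b: Bindings) -> State:
    """Substitute variable bindings into templates such as 'Flight({F})'."""
    return frozenset(t.format(**b) for t in templates)


@dataclass(frozen=True)
class ActionSchema:
    name: str
    constraints: Tuple[str, ...]    # static facts required for the schema to apply at all
    preconditions: Tuple[str, ...]  # facts required before acting (candidate subgoals)
    effects: Tuple[str, ...]        # facts that hold after acting
    body: Tuple[str, ...]           # sub-actions that carry the action out

    def unmet(self, b: Bindings, state: State) -> State:
        """Return preconditions not yet true; a planner could adopt them as subgoals."""
        if not ground(self.constraints, b) <= state:
            raise ValueError(f"constraints of {self.name} are not satisfied")
        return ground(self.preconditions, b) - state

    def apply(self, b: Bindings, state: State) -> State:
        """Add the effects to the state once every precondition holds."""
        missing = self.unmet(b, state)
        if missing:
            raise ValueError(f"unmet preconditions: {sorted(missing)}")
        return state | ground(self.effects, b)


# Hypothetical flight-booking schema; KNOW(...) facts model the agent's beliefs.
BOOK_FLIGHT = ActionSchema(
    name="BOOK-FLIGHT",
    constraints=("Agent({A})", "Flight({F})", "Client({C})"),
    preconditions=("KNOW({A}, depart-date({F}))", "KNOW({A}, origin({F}))",
                   "KNOW({A}, destination({F}))"),
    effects=("Flight-Booked({A}, {C}, {F})",),
    body=("Make-Reservation({A}, {F}, {C})",),
)

b = {"A": "agent", "C": "client", "F": "f1"}  # hypothetical bindings
state = frozenset({"Agent(agent)", "Flight(f1)", "Client(client)",
                   "KNOW(agent, origin(f1))", "KNOW(agent, destination(f1))"})
print(sorted(BOOK_FLIGHT.unmet(b, state)))  # ['KNOW(agent, depart-date(f1))']
state = state | {"KNOW(agent, depart-date(f1))"}  # e.g. the client has answered
print("Flight-Booked(agent, client, f1)" in BOOK_FLIGHT.apply(b, state))  # True
```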
Plan Inference Rules
Can you give me a list of flights from Atlanta?
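Step 1 (surface form of the request): S.REQUEST(S,H,InformIf(H,S,CanDo(H,Give(H,S,LIST))))
Step 2 (effect of the surface request): B(H,W(S,InformIf(H,S,CanDo(H,Give(H,S,LIST)))))
Step 3 (action-effect: the effect of InformIf(H,S,P) is KnowIf(S,P)): B(H,W(S,KnowIf(S,CanDo(H,Give(H,S,LIST)))))
Step 4 (know-desire: wanting to know whether P suggests wanting P): B(H,W(S,CanDo(H,Give(H,S,LIST))))
Step 5 (precondition-action: CanDo is a precondition of Give): B(H,W(S,Give(H,S,LIST)))
Step 6 (body-action: H recognizes an indirect request by S): REQUEST(S,H,Give(H,S,LIST))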
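The inference chain for “Can you give me a list of flights from Atlanta?” can be reproduced by a toy forward-chaining sketch. Assumptions: terms are nested Python tuples; only the three rules this example needs (action-effect, know-desire, precondition-action) are encoded, as exact pattern matches rather than the plausibility-weighted inference of real plan-based systems; the function names are hypothetical; and the final body-action step writes the recognized act speaker-first, as REQUEST(S,H,ACT), following the argument order of the surface act S.REQUEST(S,H,...).

```python
from typing import List, Optional, Tuple, Union

Term = Union[str, Tuple]  # atoms are strings; compound terms are (functor, *args)


def show(t: Term) -> str:
    """Render a term in the slide's notation, e.g. Give(H,S,LIST)."""
    if isinstance(t, str):
        return t
    return f"{t[0]}({','.join(show(a) for a in t[1:])})"


def action_effect(x: Term) -> Optional[Term]:
    """Wanting InformIf(A,B,P) done suggests wanting its effect KnowIf(B,P)."""
    if isinstance(x, tuple) and x[0] == "InformIf":
        _, _, b, p = x
        return ("KnowIf", b, p)
    return None


def know_desire(x: Term) -> Optional[Term]:
    """Wanting KnowIf(B,P) suggests wanting P itself."""
    if isinstance(x, tuple) and x[0] == "KnowIf":
        return x[2]
    return None


def precondition_action(x: Term) -> Optional[Term]:
    """Wanting CanDo(A,ACT), a precondition of ACT, suggests wanting ACT done."""
    if isinstance(x, tuple) and x[0] == "CanDo":
        return x[2]
    return None


RULES = [action_effect, know_desire, precondition_action]


def infer(surface_act: Tuple) -> List[Term]:
    kind, s, h, x = surface_act
    assert kind == "S.REQUEST", "this sketch only handles surface requests"
    # Effect of the surface request: H believes S wants x.
    chain: List[Term] = [surface_act, ("B", h, ("W", s, x))]
    while True:
        for rule in RULES:
            y = rule(x)
            if y is not None:
                x = y
                chain.append(("B", h, ("W", s, x)))
                break
        else:  # no rule fired: stop chaining
            break
    # Body-action: B(H,W(S,ACT)) is the body of REQUEST(S,H,ACT).
    chain.append(("REQUEST", s, h, x))
    return chain


give = ("Give", "H", "S", "LIST")
utterance = ("S.REQUEST", "S", "H", ("InformIf", "H", "S", ("CanDo", "H", give)))
for i, step in enumerate(infer(utterance), start=1):
    print(f"Step {i}: {show(step)}")
```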
Plan Inference Rules
• Advantages
  – Extremely powerful
  – Combines rich knowledge structures and planning techniques
  – Can capture both direct and indirect uses of dialogue acts (e.g., “Can you give me a list of flights?” understood as a request)
• Disadvantages
  – Time-consuming and labor-intensive
  – Accounting for all possible reasoning makes this approach AI-complete
Cue-based Interpretation
• Supervised machine learning techniques
• Trained on hand-labeled dialogue corpora
  – Use cues (linguistic features) to identify dialogue act types
  – Word features:
    • “please”, “would you” → REQUEST
  – Conversational structure:
    • “yeah” after a proposal → AGREEMENT
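The slide does not commit to a particular learner, so this sketch swaps in multinomial naive Bayes with add-one smoothing to show how word cues and a conversational-structure cue combine in one supervised model. The training utterances, the `prev=` feature for the previous act, the ACKNOWLEDGE label, and the class and function names are invented for illustration; a real tagger would be trained on a hand-labeled corpus with far richer features.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

Example = Tuple[str, str, str]  # (utterance, previous dialogue act, label)


def cues(utterance: str, prev_act: str) -> List[str]:
    """Word features plus one conversational-structure feature (the previous act)."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return [f"w={w}" for w in words] + [f"prev={prev_act}"]


class NaiveBayesTagger:
    """Multinomial naive Bayes over cue features, with add-one smoothing."""

    def fit(self, data: List[Example]) -> "NaiveBayesTagger":
        self.label_counts: Counter = Counter()
        self.feature_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for utterance, prev_act, label in data:
            self.label_counts[label] += 1
            for f in cues(utterance, prev_act):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, utterance: str, prev_act: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.label_counts.items():
            # log P(label) + sum of log P(feature | label), add-one smoothed
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in cues(utterance, prev_act):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny invented training set; a real system would use a hand-labeled corpus.
TRAIN: List[Example] = [
    ("would you please book it", "NONE", "REQUEST"),
    ("please send the list", "NONE", "REQUEST"),
    ("yeah", "PROPOSAL", "AGREEMENT"),
    ("yeah sounds good", "PROPOSAL", "AGREEMENT"),
    ("yeah", "STATEMENT", "ACKNOWLEDGE"),
    ("uh huh", "STATEMENT", "ACKNOWLEDGE"),
    ("the flight leaves at noon", "NONE", "STATEMENT"),
]

tagger = NaiveBayesTagger().fit(TRAIN)
print(tagger.predict("would you send it please", "NONE"))  # REQUEST
print(tagger.predict("yeah", "PROPOSAL"))                   # AGREEMENT
print(tagger.predict("yeah", "STATEMENT"))                  # ACKNOWLEDGE
```

The last two calls show the structural cue at work: the same word “yeah” is tagged AGREEMENT after a proposal but ACKNOWLEDGE after a statement, because only the `prev=` feature differs.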
Cue-based Interpretation
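• Decision Tree Models
  – Shriberg et al. (1998)
  – Trained decision trees to differentiate statements, yes-no questions, wh-questions, and declarative questions
• HMM Models
  – Woszczyna and Waibel (1994)
  – Build Markov models of dialogue act (speech act) probabilities
  – Similar to n-gram models; use Bayes’ rule to find the most likely dialogue act sequence D given the observed conversational evidence C (P(C) is constant and can be dropped):
    $D^* = \arg\max_D P(D \mid C) = \arg\max_D P(C \mid D)\,P(D)$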
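Under the usual HMM assumptions, $P(D)$ is approximated by a bigram model over acts, $\prod_i P(d_i \mid d_{i-1})$, and $P(C \mid D)$ by per-utterance likelihoods $\prod_i P(c_i \mid d_i)$; the Viterbi algorithm then finds $D^*$ without enumerating every act sequence. In this minimal sketch the three acts, the coarse cue labels (`aux-initial`, `yeah`, `other`), and all probabilities are invented toy values, not estimates from Woszczyna and Waibel (1994); a real model would derive $P(c_i \mid d_i)$ from word n-grams, prosody, and similar features.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, Dict[str, float]]


def viterbi(cues: Sequence[str], acts: Sequence[str], start: Dict[str, float],
            trans: Probs, emit: Probs) -> List[str]:
    """Most likely act sequence D* = argmax_D P(C|D) P(D) under a bigram HMM."""
    # best[t][d]: log-probability of the best act sequence ending in act d at time t
    best = [{d: math.log(start[d]) + math.log(emit[d][cues[0]]) for d in acts}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(cues)):
        best.append({})
        back.append({})
        for d in acts:
            # Best predecessor act p for d, including the bigram term log P(d | p)
            p = max(acts, key=lambda q: best[t - 1][q] + math.log(trans[q][d]))
            best[t][d] = (best[t - 1][p] + math.log(trans[p][d])
                          + math.log(emit[d][cues[t]]))
            back[t][d] = p
    # Trace back from the best final act
    path = [max(acts, key=lambda d: best[-1][d])]
    for t in range(len(cues) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Invented toy parameters (all nonzero so every log is defined).
ACTS = ("STATEMENT", "YN-QUESTION", "YES-ANSWER")
START = {"STATEMENT": 0.5, "YN-QUESTION": 0.4, "YES-ANSWER": 0.1}
TRANS = {  # P(next act | previous act)
    "STATEMENT": {"STATEMENT": 0.6, "YN-QUESTION": 0.3, "YES-ANSWER": 0.1},
    "YN-QUESTION": {"STATEMENT": 0.2, "YN-QUESTION": 0.1, "YES-ANSWER": 0.7},
    "YES-ANSWER": {"STATEMENT": 0.6, "YN-QUESTION": 0.3, "YES-ANSWER": 0.1},
}
EMIT = {  # P(cue | act)
    "STATEMENT": {"aux-initial": 0.1, "yeah": 0.2, "other": 0.7},
    "YN-QUESTION": {"aux-initial": 0.8, "yeah": 0.05, "other": 0.15},
    "YES-ANSWER": {"aux-initial": 0.05, "yeah": 0.7, "other": 0.25},
}

print(viterbi(["aux-initial", "yeah", "other"], ACTS, START, TRANS, EMIT))
# ['YN-QUESTION', 'YES-ANSWER', 'STATEMENT']
```

With these toy numbers, “yeah” following an auxiliary-initial question is decoded as YES-ANSWER, whereas “yeah” following an `other` utterance is decoded as STATEMENT: the bigram term lets context override the local cue.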
Cue-based Interpretation
• Advantages
  – A data-driven approach is less time-consuming.
  – The availability of large corpora and modern computing power makes machine learning methods highly efficient.
• Disadvantages
  – Not as sophisticated and accurate as the plan inference approach.
Evolution of Conversation Agents
• ELIZA
  – Weizenbaum (1966)
  – Simple dialogue manager
  – Matches the user's previous sentence against a set of conditions (patterns)
• PARRY
  – Colby et al. (1971)
  – Paranoid agent with emotional states and delusions
    • Emotions included anger, fear, etc.
• BDI Model
  – Cohen and Perrault (1979)
  – Still prevalent due to high accuracy
• Machine Learning
  – 1990s to present
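ELIZA's matching of the user's last sentence against a set of conditions can be illustrated with a few regular-expression rules. Weizenbaum's DOCTOR script used ranked keywords with decomposition and reassembly rules; this minimal sketch keeps only the core idea (match a pattern, reflect pronouns, fill a response template), and its rules, pronoun table, and function names are invented, not Weizenbaum's originals.

```python
import re

# Hypothetical rules in the spirit of ELIZA (not the original DOCTOR script):
# (pattern over the user's sentence, response template)
RULES = [
    (re.compile(r"\bi need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.IGNORECASE), "Why do you say your {0}?"),
]
DEFAULT = "Please go on."  # used when no condition matches

# Swap first and second person so the echoed fragment reads naturally.
REFLECT = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECT.get(w.lower(), w) for w in fragment.split())


def respond(sentence: str) -> str:
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:  # first matching condition wins
        m = pattern.search(text)
        if m:
            return template.format(reflect(m.group(1)))
    return DEFAULT


print(respond("I need a vacation."))   # Why do you need a vacation?
print(respond("My mother hates me"))   # Why do you say your mother hates you?
print(respond("The weather is nice"))  # Please go on.
```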
Multimodal Agents
• REA (Bickmore & Cassell, 2004)
  – Developed at the MIT Media Lab
  – Embodied agent
    • “Human” agents are considered more trustworthy (Kiesler & Sproull, 1997).
  – Designed to act as a real estate agent
  – Rule-based system
Multimodal Agents
• COMIC (Foster & Oberlander, 2006)
  – Animated embodied agents
  – Use machine learning algorithms to build agent models
  – Models trained on a corpus of video recordings of conversations
  – Models consider speech, facial expressions, body language, and discussion context
References
• Allen, J. (1995). Natural Language Understanding. Benjamin Cummings, Menlo Park, CA.
• Bickmore, T., & Cassell, J. (2004). Social Dialogue with Embodied Conversational Agents. In J. van Kuppevelt, L. Dybkjaer, & N. Bernsen (Eds.), Natural, Intelligent and Effective Interaction with Multimodal Dialogue Systems. New York: Kluwer Academic.
• Colby, K. M., Weber, S., & Hilf, F. D. (1971). Artificial Paranoia. Artificial Intelligence, 2(1), 1-25.
• Foster, M. E., & Oberlander, J. (2006). Data-driven Generation of Emphatic Facial Displays. In Proceedings of EACL 2006.
• Kiesler, S., & Sproull, L. (1997). 'Social' Human-Computer Interaction. In B. Friedman (Ed.), Human Values and the Design of Computer Technology (pp. 191-199). Stanford, CA: CSLI Publications.
• Shriberg, E., Bates, R., et al. (1998). Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech? Language and Speech, 41(3-4), 439-487.
• Weizenbaum, J. (1966). ELIZA – A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM, 9(1), 36-45.
• Woszczyna, M., & Waibel, A. (1994). Inferring Linguistic Structure in Spoken Language. In Proceedings of ICSLP-94 (pp. 847-850).