Transcript
Page 1: Towards a Method For Evaluating Naturalness in Conversational Dialog Systems

Victor Hung, Miguel Elvir, Avelino Gonzalez & Ronald DeMara
Intelligent Systems Laboratory, University of Central Florida

IEEE International Conference on Systems, Man, and Cybernetics
San Antonio, Texas, October 12, 2009

Page 2: Agenda

Introduction
Background
Approach
Project LifeLike

Page 3: Introduction

Evaluating interactive conversation agents
  Cannot rely solely on quantitative methods
  Subjectivity in ‘naturalness’
  No general method to judge how well a conversation agent performs
The pivotal focus is defining naturalness: how well a chatbot can maintain a natural conversation flow
The LifeLike virtual avatar project serves as a backdrop; the aim is to provide it with a suitable validation and verification method

Page 4: Background: Early Systems

Declarative knowledge to process data
Explicitly defined rules
Constrained knowledge
Limited capacity to assess and adapt
Goal-oriented and data-driven behavior
ALICEbot

Page 5: Background: Naturalness

Automatic Speech Recognition
Context retrieval experimentation
Intelligent tutoring
  Adaptive Control of Thought
Knowledge Acquisition agents
  Quality of the information received
  Conversation length metric
  ALICE-based bots

Page 6: Background: Recent Advances

Sentence-based template matching
Simple conversational memory
  CMU’s Julia, Extempo’s Erin
  Interaction occurs in a reactive manner
Wlodzislaw et al
  Development of cognitive modules and human interface realism
  Ontologies, concept description vectors, semantic memory models, CYC

Page 7: Background: Recent Advances

Becker and Wachsmuth
  Representation and actuation of coherent emotional states
Lars et al
  Model for sustainable conversation
  Awareness of the human users and the conversation topics
  Relies on textual input, similar to ELIZA
  Use of natural language processing for reasoning about human speech

Page 8: Background: Conclusion

Breadth of research using chatbots
Focus on creating more sophisticated interpretative conversational modules
A need exists for generalizable metrics
Conversational agents are widely experimented with, but the field lacks a basic framework for universal performance comparison

Page 9: Approach: Previous Approaches

Mix of quantitative and qualitative measures
Subjective aspects are measured with human user questionnaires
Semeraro et al’s bookstore chatbot
  Seven characteristics: impression, command, effectiveness, navigability, ability to learn, ability to aid, comprehension (summarized in the sketch below)
  Does not provide statistical conclusiveness; serves only as a general indicator of performance
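
A minimal sketch of how such questionnaire results might be summarized, assuming hypothetical 1-to-5 Likert responses for the seven characteristics listed above; the response data are invented for illustration.

```python
from statistics import mean

# Hypothetical 1-to-5 Likert responses, one list per user, ordered by the
# seven characteristics from Semeraro et al.'s questionnaire.
characteristics = ["impression", "command", "effectiveness", "navigability",
                   "ability to learn", "ability to aid", "comprehension"]
responses = [
    [4, 3, 4, 5, 3, 4, 4],
    [3, 4, 4, 4, 2, 3, 4],
    [5, 4, 3, 4, 3, 4, 5],
]

# Average each characteristic across users; a coarse, general indicator only.
for i, name in enumerate(characteristics):
    print(f"{name}: {mean(user[i] for user in responses):.1f}")
```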

Page 10: Approach: Previous Approaches

Shawar and Atwell’s universal chatbot evaluation system
  Applied to an ALICE-based Afrikaans conversation agent
  Dialog efficiency
  Dialog quality: responses judged as reasonable, weird but understandable, or nonsensical (see the sketch below)
  Users’ satisfaction, qualitatively measured
Proper assessment ultimately rests on how successfully the agent accomplishes its intended goals
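
A sketch of the dialog-quality part of this scheme, assuming a hypothetical hand-labelled transcript: each agent response is categorized as reasonable, weird but understandable, or nonsensical, and the category proportions summarize dialog quality.

```python
from collections import Counter

# Hypothetical hand-assigned labels for the agent's responses in one dialog.
labels = [
    "reasonable", "reasonable", "weird but understandable",
    "reasonable", "nonsensical", "reasonable",
]

# Report the share of responses falling into each quality category.
counts = Counter(labels)
for category in ("reasonable", "weird but understandable", "nonsensical"):
    print(f"{category}: {counts[category] / len(labels):.0%}")
```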

Page 11: Approach: Previous Approaches

Evaluation of naturalness is similar to general chatbot assessment
Rzepka et al’s 1-to-10 scale metrics
  Degree of naturalness
  Degree of willingness to continue the conversation
  Human judges used these measures to evaluate a conversation agent’s utterances
No concrete baseline for naturalness, but relative measurements of naturalness between dialog agents are possible (see the sketch below)
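
A small sketch of such a relative comparison, assuming hypothetical judge ratings on Rzepka et al's two 1-to-10 scales; without an absolute baseline, only the difference between agents is meaningful.

```python
from statistics import mean

# Hypothetical 1-to-10 ratings from four human judges for two dialog agents,
# on the two scales used by Rzepka et al.: naturalness and willingness to continue.
ratings = {
    "agent_A": {"naturalness": [6, 7, 5, 6], "willingness to continue": [5, 6, 6, 5]},
    "agent_B": {"naturalness": [4, 5, 4, 5], "willingness to continue": [4, 4, 5, 3]},
}

# Average each scale per agent; the comparison is relative, not absolute.
for agent, scales in ratings.items():
    summary = ", ".join(f"{scale}: {mean(scores):.1f}" for scale, scores in scales.items())
    print(f"{agent} -> {summary}")
```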

Page 12: Approach: Chatbot Objectives

Walker et al’s PARAdigm for DIalogue System Evaluation (PARADISE)
  Dialog performance relates to the experience of the interaction (the means)
  Task success is concerned with the utility of the dialog exchange (the ends)
Objectives
  Perform better than other dialog system solutions
  Interact in a way similar to a human-to-human (naturalness) interaction

Page 13: Approach: Task Success

Measure of goal satisfaction
Attribute-value matrix
  Derived from PARADISE
  Expected vs. actual
Task success (κ) computed as the percentage of correct responses (see the sketch below)
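
A minimal sketch of this computation, assuming a hypothetical attribute-value matrix: the expected values come from the evaluation scenario, the actual values from what the agent conveyed, and task success is the fraction of attributes that match. Note that this follows the simplified reading on the slide (percentage correct) rather than the chance-corrected kappa statistic used in the full PARADISE framework.

```python
# Hypothetical attribute-value matrices for one information-seeking dialog.
expected = {"topic": "admissions", "deadline": "July 1", "contact": "advising office"}
actual = {"topic": "admissions", "deadline": "July 1", "contact": "registrar"}

def task_success(expected, actual):
    """Fraction of attributes whose actual value matches the expected one."""
    correct = sum(1 for attr, value in expected.items() if actual.get(attr) == value)
    return correct / len(expected)

print(f"Task success: {task_success(expected, actual):.0%}")  # two of three attributes match
```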

Page 14: Approach: Performance Function

Derived from PARADISE
Total effectiveness: performance = α·N(κ) − Σ_i w_i·N(c_i)
  Task success (κ) weighted by α
  Dialog costs (c_i) weighted by w_i
  The function N applies Z-score normalization to balance κ against the costs c_i
(A sketch of this computation follows below.)
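
A sketch of this performance function under assumed data; the cost metrics (number of turns, elapsed seconds), weights, and sample values below are invented for illustration, and the Z-score normalization is taken over a small pool of evaluation sessions.

```python
import statistics

def z_normalize(value, sample):
    """Z-score normalization N(x) = (x - mean) / stdev, computed over a sample of dialogs."""
    return (value - statistics.mean(sample)) / statistics.pstdev(sample)

def performance(kappa, costs, kappa_sample, cost_samples, alpha=1.0, weights=None):
    """PARADISE-style total effectiveness: alpha * N(kappa) - sum_i w_i * N(c_i)."""
    weights = weights or [1.0] * len(costs)
    success_term = alpha * z_normalize(kappa, kappa_sample)
    cost_term = sum(w * z_normalize(c, sample)
                    for w, c, sample in zip(weights, costs, cost_samples))
    return success_term - cost_term

# Hypothetical pool of evaluation sessions used for normalization:
# task success values, dialog lengths in turns, and elapsed times in seconds.
kappa_sample = [0.55, 0.70, 0.80, 0.65]
turns_sample = [12, 18, 9, 15]
seconds_sample = [140, 210, 95, 180]

# Score one session (kappa = 0.80, 9 turns, 95 seconds) with equal cost weights.
score = performance(kappa=0.80, costs=[9, 95],
                    kappa_sample=kappa_sample,
                    cost_samples=[turns_sample, seconds_sample],
                    alpha=1.0, weights=[0.5, 0.5])
print(f"Normalized performance: {score:.2f}")
```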

Page 15: Approach: Proposed System

Task success
Dialog costs
  Efficiency: resource consumption (quantitative)
  Quality: actual conversational content (quantitative or qualitative)
(A sketch of this decomposition follows below.)
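
As an illustration of this decomposition, a hypothetical record type that keeps efficiency, resource-consumption, and quality costs alongside task success; such a record would feed the weighted performance function described on the previous slide.

```python
from dataclasses import dataclass

@dataclass
class DialogCosts:
    """Hypothetical container for the cost categories named on the slide."""
    turns: int                # efficiency (quantitative)
    elapsed_seconds: float    # resource consumption (quantitative)
    quality_score: float      # quality of the conversational content (judged or derived)

@dataclass
class SessionEvaluation:
    task_success: float       # fraction of correct responses, as on the Task Success slide
    costs: DialogCosts

# Example session record with invented values.
session = SessionEvaluation(
    task_success=0.75,
    costs=DialogCosts(turns=14, elapsed_seconds=160.0, quality_score=0.8),
)
print(session)
```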

Page 16: Questions

