Research Challenges for Spoken Language Dialog Systems

Julie Baca, Ph.D.

Center for Advanced Vehicular Systems

Mississippi State University

Computer Science Graduate Seminar

November 27, 2002

Overview

Define dialog systems
Describe research issues
Present current work
Give conclusions and discuss future work

What is a Dialog System?

Current commercial voice products require adherence to a “command and control” language, e.g.,
  User: “Plan Route”

Such interfaces are not robust to variations from the fixed words and phrases.

What is a Dialog System?

Dialog systems seek to provide a natural conversational interaction between the user and the computer system, e.g.,
  User: “Is there a way I can get to Canal Street from here?”

Domains for Dialog Systems

Travel reservation
Weather forecasting
In-vehicle driver assistance
On-line learning environments

Dialog Systems: Information Flow

Must model two-way flow of information:
  User-to-system
  System-to-user

Dialog System

Research Issues

Many fundamental problems must be solved for these systems to mature.

Three general areas include:
  Automatic Speech Recognition (ASR)
  Natural Language Processing (NLP)
  Human-Computer Interaction (HCI)

NLP Issue for Dialog Systems: Semantics

Must assess meaning, not just syntactic correctness.
Therefore, must handle ungrammatical inputs, e.g.,
  “The … nearest … station is … is there a gas station nearby?”

NLP Issue: Semantic Representation 1

For NLP, use semantic grammars.
Semantic frame with slots and fillers:
  <destination> -> <prep> <place>
  <prep> -> “nearest”
  <place> -> “gas station”
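
The slot-and-filler idea above can be sketched in a few lines of Python. This is only an illustration, not the system described in the talk: the phrase lists and the function name `fill_destination_frame` are hypothetical. The sketch spots slot fillers and ignores every other word, which is one simple way to stay robust to hesitations and ungrammatical input.

```python
# Hypothetical phrase lists; a real semantic grammar would be far larger.
PREP_PHRASES = ("nearest", "closest")
PLACE_PHRASES = ("gas station", "restaurant", "video store")


def fill_destination_frame(utterance):
    """Return a <destination> frame; each slot takes the first phrase
    from its list that occurs anywhere in the utterance."""
    text = utterance.lower()
    frame = {"prep": None, "place": None}
    for phrase in PREP_PHRASES:
        if phrase in text:
            frame["prep"] = phrase
            break
    for phrase in PLACE_PHRASES:
        if phrase in text:
            frame["place"] = phrase
            break
    return frame


# Words outside the phrase lists (including the hesitation) are ignored.
print(fill_destination_frame("Where is the, uh, nearest gas station?"))
# {'prep': 'nearest', 'place': 'gas station'}
```

Slots that find no filler stay `None`, which gives a dialog manager something concrete to ask about.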

NLP Issue: Semantic Representation 2

Must also represent: “How do I get from Canal Street to Royal Street?”
  <directions> -> <start> <destination>
  <destination> -> <prep> <place>
  <place> -> <street_name> | <business>
  <street_name> -> “Canal St” | “Royal St”
  <prep> -> <to_prep> | <near_prep>
  <near_prep> -> “nearest” | “closest”
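
A grammar like this can be held as a plain table and checked with a small recursive matcher. The sketch below is a toy recognizer, not a robust parser. The slide does not define `<start>`, `<to_prep>`, or `<business>`, so the expansions given for them here (and the helper rule `from_prep`) are assumptions made only so the example runs. `<prep>` is read as a choice between `<to_prep>` and `<near_prep>`.

```python
GRAMMAR = {
    "directions": [["start", "destination"]],
    "start": [["from_prep", "place"]],      # assumption: not defined on the slide
    "destination": [["prep", "place"]],
    "place": [["street_name"], ["business"]],
    "street_name": [["canal st"], ["royal st"]],
    "business": [["gas station"]],          # assumption: not defined on the slide
    "prep": [["to_prep"], ["near_prep"]],
    "from_prep": [["from"]],                # assumption: hypothetical helper rule
    "to_prep": [["to"]],                    # assumption: not defined on the slide
    "near_prep": [["nearest"], ["closest"]],
}


def match(symbol, words, pos):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:               # terminal phrase, possibly multi-word
        phrase = symbol.split()
        if words[pos:pos + len(phrase)] == phrase:
            yield pos + len(phrase)
        return
    for alternative in GRAMMAR[symbol]:
        ends = [pos]
        for part in alternative:
            ends = [e2 for e in ends for e2 in match(part, words, e)]
        yield from ends


def accepts(symbol, utterance):
    """True if the whole utterance can be derived from `symbol`."""
    words = utterance.lower().split()
    return len(words) in match(symbol, words, 0)


print(accepts("directions", "from Canal St to Royal St"))  # True
print(accepts("destination", "nearest gas station"))       # True
print(accepts("directions", "to Royal St"))                # False
```

Unlike the keyword-spotting style of frame filling, this matcher rejects any input with extra or missing words, which is the brittleness that robust parsing is meant to address.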

NLP Issue: Semantic Representation 3

Two approaches:
  Hand-craft the grammar for the application, using robust parsing to understand meaning [1,2]. Problem: time, expense.
  Use a statistical approach, generating initial rules and using annotated tree-banked data to discover the full rule set [3,4]. Problem: requires annotated training data.

ASR/NLP Issue: Reducing Errors

Most systems use a loose coupling of ASR and NLP.
Try earlier integration of semantics with the recognizer.
Incorporate dialog “state” into the underlying statistical model.
Problems:
  Increases search space
  Training data
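
One way to picture the role of dialog state is N-best rescoring: each recognizer hypothesis is re-ranked with a prior that depends on the current state. Note that this is a late, loosely coupled stand-in, simpler than the early integration the slide proposes. The state names, intents, and probabilities below are all invented for illustration.

```python
import math

# Hypothetical P(intent | dialog state); in practice this would be
# estimated from annotated dialogs, which is the training-data problem.
STATE_PRIOR = {
    "awaiting_destination": {"set_destination": 0.8, "play_music": 0.2},
    "idle": {"set_destination": 0.5, "play_music": 0.5},
}


def rescore(nbest, state):
    """Return the hypothesis maximizing recognizer log score + log P(intent | state).

    nbest: list of (text, intent, recognizer_log_score) tuples.
    """
    prior = STATE_PRIOR[state]
    return max(nbest, key=lambda h: h[2] + math.log(prior[h[1]]))


nbest = [
    ("play route", "play_music", -4.0),
    ("plan route", "set_destination", -4.5),
]
print(rescore(nbest, "idle")[0])                  # play route
print(rescore(nbest, "awaiting_destination")[0])  # plan route
```

With a flat prior the recognizer's top choice wins; once the system is waiting for a destination, the state prior is enough to flip the ranking.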

NLP Issue: Resolving Meaning Using Context

Must maintain knowledge of the conversational context.
After a request for the nearest gas station, the user says, “What is it close to?”
  Resolving “it”: anaphora
Another follow-up by the user: “How about … restaurant?”
  Resolving “…” with “nearest”: ellipsis
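
The ellipsis case can be sketched as slot inheritance: any slot the follow-up leaves empty is filled from the previous request's frame. This is a minimal sketch, assuming frames are plain dictionaries with hypothetical `prep` and `place` slots. It is not the resolution method of any particular system.

```python
def resolve_ellipsis(previous_frame, new_frame):
    """Fill slots left empty (None) in new_frame with values from previous_frame."""
    resolved = dict(previous_frame)
    resolved.update(
        {slot: value for slot, value in new_frame.items() if value is not None}
    )
    return resolved


previous = {"prep": "nearest", "place": "gas station"}
follow_up = {"prep": None, "place": "restaurant"}  # the follow-up names only a place
print(resolve_ellipsis(previous, follow_up))
# {'prep': 'nearest', 'place': 'restaurant'}
```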

Resolving Meaning: Discourse Analysis

To resolve such requests, the system must track the context of the conversation.

This is typically handled by a discourse analysis component in the Dialog Manager.

Dialog Manager: Discourse Analysis

Anaphora resolution approach: use a focus mechanism, assuming the conversation has a focus [5].
For our example, “gas station” is the current focus.
But how about: “I’m at Food Max. How do I get to a gas station close to it and a video store close to it?”
Problem: resolving the two “its”.
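
A single-focus mechanism can be sketched as follows. This is a deliberately naive illustration, not the approach of [5]: the class name `FocusTracker` is hypothetical, the focus is simply the last entity registered, and only the bare word “it” is replaced (an “it” with punctuation attached is left alone).

```python
class FocusTracker:
    """Keeps the most recently mentioned entity as the conversational focus."""

    def __init__(self):
        self.focus = None

    def mention(self, entity):
        """Record an entity as the new focus of the conversation."""
        self.focus = entity

    def resolve(self, utterance):
        """Replace each standalone word 'it' with the current focus, if any."""
        if self.focus is None:
            return utterance
        words = [self.focus if w.lower() == "it" else w for w in utterance.split()]
        return " ".join(words)


tracker = FocusTracker()
tracker.mention("the gas station")
print(tracker.resolve("What is it close to?"))
# What is the gas station close to?
```

Because there is only one focus slot, every “it” in an utterance receives the same referent, so this sketch cannot express a reading of the Food Max request in which the two pronouns refer to different things.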

Dialog System

Dialog Manager: Clarification

Often cannot satisfy a request in one iteration.
The previous example may require clarification from the user:
  “Do you want to go to the gas station first?”

HCI Issue: System vs. User Initiative

What level of control do you provide the user in the conversation?

Mixed Initiative

Total system initiative provides low usability.

Total user initiative introduces higher error rate.

Thus, a mixed-initiative approach, balancing usability and error rate, is taken most often.

Allowing the user to adapt the level explicitly has also shown merit [6].

ASR/HCI Issue: Error Handling

How to handle possible errors?
Assign a confidence score to the result of the recognizer.
For results with a lower confidence score, request clarification or revert to system-oriented initiative.
Can incorporate dialog state in computing the confidence score [7].
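
The policy above can be sketched as a threshold rule. The two thresholds, their default values, and the action names are assumptions chosen for illustration. The slide only says that lower-confidence results should trigger clarification or a fall-back to system initiative, not where the cut-offs lie.

```python
def next_action(confidence, accept_at=0.8, clarify_at=0.5):
    """Map a recognizer confidence score in [0, 1] to a dialog action.

    The thresholds are hypothetical defaults and would need tuning on real data.
    """
    if confidence >= accept_at:
        return "accept"
    if confidence >= clarify_at:
        return "request_clarification"
    return "system_initiative"


for score in (0.9, 0.6, 0.2):
    print(score, next_action(score))
# 0.9 accept
# 0.6 request_clarification
# 0.2 system_initiative
```

A state-dependent variant could pass different thresholds per dialog state, for example a stricter `accept_at` before an irreversible action.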

HCI Issue: Response Generation

How to present response to user in a way that minimizes cognitive load?

Varies depending on whether output is speech-only or speech/visual.
  Speech-only output must respect user short-term memory limitations, e.g., lists must be short, timed appropriately, and allow repetition.
  Speech/visual output must be complementary, e.g., importance of redundancy and timing.
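
For the speech-only guideline above, one simple tactic is to read long result lists in short groups and pause for the user between groups. The sketch below shows only the grouping step. The group size of three is an assumption, not a figure from the talk, and the station names are placeholders.

```python
def chunk_for_speech(items, max_items=3):
    """Split a list into short groups so each spoken turn stays brief."""
    return [items[i:i + max_items] for i in range(0, len(items), max_items)]


stations = ["station A", "station B", "station C", "station D"]
print(chunk_for_speech(stations))
# [['station A', 'station B', 'station C'], ['station D']]
```

Each group can then be spoken as one turn, with a prompt such as “more, or repeat?” between turns to support repetition.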

HCI Issue: Evaluating Dialog Systems

How to compare and evaluate dialog systems?
PARADISE (PARAdigm for DIalogue System Evaluation) provides a standard framework [8].

PARADISE: Evaluating Dialog Systems

Task success
  Was the necessary information exchanged?
Efficiency/Cost
  Number of dialog turns, task completion time
Qualitative
  ASR rejections, timeouts, helps
Usability
  User satisfaction with ASR, task ease, interaction pace, system response
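
The measures above are commonly combined into a single score of the form performance = alpha * N(task success) - sum_i w_i * N(cost_i), where N is z-score normalization across dialogs. The sketch below assumes that form. The weights and data are made up for illustration; in the framework the weights are fitted by regressing user satisfaction on the measures rather than chosen by hand.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Normalize values to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical (non-zero deviation).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def performance(task_success, costs, alpha, weights):
    """Per-dialog score: alpha * N(success) - sum_i w_i * N(cost_i).

    task_success: one success measure per dialog.
    costs: dict mapping cost name -> list with one value per dialog.
    weights: dict mapping cost name -> weight w_i.
    """
    success_n = z_scores(task_success)
    costs_n = {name: z_scores(vals) for name, vals in costs.items()}
    return [
        alpha * s - sum(weights[name] * costs_n[name][d] for name in costs)
        for d, s in enumerate(success_n)
    ]


# Three hypothetical dialogs: more success and fewer turns score higher.
scores = performance(
    task_success=[1.0, 0.5, 0.0],
    costs={"turns": [10, 20, 30]},
    alpha=0.5,
    weights={"turns": 0.5},
)
print(scores)
```

Normalizing first lets measures on different scales, such as a success rate and a turn count, be weighted against each other.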

Current Work
