
SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Date post: 17-Mar-2016
Page 1: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Intelligent Software Lab., POSTECH
Prof. Gary Geunbae Lee

Page 2: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

This Tutorial

Introduction to Spoken Dialog Systems (SDS) for Human-Robot Interaction (HRI)
  Brief introduction to SDS
  Language-processing oriented, not signal-processing oriented
  Mainly based on papers from ACL, NAACL, HLT, ICASSP, INTERSPEECH, ASRU, SLT, SIGDIAL, CSL, SPECOM, and IEEE TASLP

Page 3: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

OUTLINE

INTRODUCTION

AUTOMATIC SPEECH RECOGNITION

SPOKEN LANGUAGE UNDERSTANDING

DIALOG MANAGEMENT

CHALLENGES & ISSUES
  MULTI-MODAL DIALOG SYSTEM
  DIALOG SIMULATOR

DEMOS

REFERENCES

Page 4: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

INTRODUCTION

Page 5: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Human-Robot Interaction (in Movie)

Page 6: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Human-Robot Interaction (in Real World)

Page 7: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

What is HRI?

Human-robot interaction (HRI) is the study of interactions between people and robots. HRI is multidisciplinary with contributions from the fields of human-computer interaction, artificial intelligence, robotics, natural language understanding, and social science.

The basic goal of HRI is to develop principles and algorithms to allow more natural and effective communication and interaction between humans and robots.

Source: Wikipedia (http://en.wikipedia.org/wiki/Human_robot_interaction)

Page 8: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Area of HRI
  Vision
  Speech
    • Signal Processing
    • Speech Recognition
    • Speech Understanding
    • Dialog Management
    • Speech Synthesis
  Haptics
  Emotion
  Learning

Page 9: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SPOKEN DIALOG SYSTEM (SDS)

Page 10: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SDS APPLICATIONS

Tele-service
Car navigation
Home networking
Robot interface

Page 11: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Talk, Listen and Interact

Page 12: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

AUTOMATIC SPEECH RECOGNITION

Page 13: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SCIENCE FICTION Eagle Eye (2008, D.J. Caruso)

Page 14: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

AUTOMATIC SPEECH RECOGNITION

A process by which an acoustic speech signal is converted into a set of words [Rabiner et al., 1993]

[Figure: a learning algorithm trained on examples (x, y) maps input x (speech) to output y (words).]

Page 15: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

NOISY CHANNEL MODEL

GOAL
  Find the most likely sequence W of "words" in language L given the sequence of acoustic observation vectors O
  Treat the acoustic input O as a sequence of individual observations: O = o1, o2, o3, …, ot
  Define a sentence as a sequence of words: W = w1, w2, w3, …, wn

$$\hat{W} = \arg\max_{W \in L} P(W \mid O)$$

$$\hat{W} = \arg\max_{W \in L} \frac{P(O \mid W)\,P(W)}{P(O)} \quad \text{(Bayes rule)}$$

$$\hat{W} = \arg\max_{W \in L} P(O \mid W)\,P(W) \quad \text{(Golden rule)}$$
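As a rough illustration of the golden rule, the sketch below scores two candidate word sequences by adding a log acoustic likelihood and a log language-model probability, then keeps the higher-scoring one. The candidate sentences and every score are made-up toy values, and `decode` is a hypothetical helper name, not part of any toolkit.

```python
# Toy log-scores (assumed values, for illustration only).
ACOUSTIC_LOGP = {  # log P(O | W)
    "recognize speech": -10.0,
    "wreck a nice beach": -9.5,
}
LM_LOGP = {  # log P(W)
    "recognize speech": -4.0,
    "wreck a nice beach": -8.0,
}


def decode(candidates, acoustic_logp, lm_logp):
    """Return argmax_W P(O|W) P(W), computed as a sum in the log domain."""
    return max(candidates, key=lambda w: acoustic_logp[w] + lm_logp[w])


best = decode(list(LM_LOGP), ACOUSTIC_LOGP, LM_LOGP)
print(best)
```

P(O) is left out because it is the same for every candidate W, which is why the golden rule can drop it.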

Page 16: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

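TRADITIONAL ARCHITECTURE

[Figure: traditional ASR architecture. Speech signals (O) pass through Feature Extraction and Decoding to give a word sequence (W), e.g. "버스 정류장이 어디에 있나요?" ("Where is the bus stop?"). Decoding searches a network (Network Construction) built from the Acoustic Model (HMM Estimation on a Speech DB), the Pronunciation Model (G2P), and the Language Model (LM Estimation on Text Corpora).]

$$\hat{W} = \arg\max_{W \in L} P(O \mid W)\,P(W)$$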

Page 17: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

TRADITIONAL PROCESSES

Page 18: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

FEATURE EXTRACTION

Mel-Frequency Cepstral Coefficients (MFCCs) are a popular choice [Paliwal, 1992]
  Frame size: 25 ms / Frame rate: 10 ms
  39 features per 10 ms frame
    Absolute: log frame energy (1) and MFCCs (12)
    Delta: first-order derivatives of the 13 absolute coefficients
    Delta-Delta: second-order derivatives of the 13 absolute coefficients

[Figure: MFCC pipeline. x(n) -> Preemphasis / Hamming window -> FFT (Fast Fourier Transform) -> Mel-scale filter bank -> log|.| -> DCT (Discrete Cosine Transform) -> MFCC (12 dimensions). Frames a1, a2, a3, ... are 25 ms windows taken every 10 ms.]
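The framing arithmetic and the 13 + 13 + 13 = 39 feature layout can be sketched as follows. This is only an illustration under stated assumptions: the 16 kHz sampling rate is assumed (the slide gives none), the deltas use a simple central difference rather than the regression window that real front ends typically use, and `frame_count` and `add_deltas` are hypothetical helper names.

```python
def frame_count(num_samples, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Number of full 25 ms frames when the window advances every 10 ms."""
    frame_len = sample_rate * frame_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 160 samples at 16 kHz
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // shift


def _delta(seq):
    """Central difference over time; edge frames reuse their nearest neighbour."""
    out = []
    for t in range(len(seq)):
        prev = seq[max(t - 1, 0)]
        nxt = seq[min(t + 1, len(seq) - 1)]
        out.append([(n - p) / 2.0 for n, p in zip(nxt, prev)])
    return out


def add_deltas(static):
    """Stack absolute, delta, and delta-delta coefficients for every frame."""
    d1 = _delta(static)
    d2 = _delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Five dummy frames of 13 absolute coefficients (log energy + 12 MFCCs).
frames = [[float(t + d) for d in range(13)] for t in range(5)]
features = add_deltas(frames)
print(frame_count(16000), len(features[0]))
```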

Page 19: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

ACOUSTIC MODEL

Provides P(O|Q) = P(features|phone)
Modeling units [Bahl et al., 1986]
  Context-independent: phoneme
  Context-dependent: diphone, triphone, quinphone
    pL-p+pR: left-right context triphone
Typical acoustic model [Juang et al., 1986]
  Continuous-density Hidden Markov Model: $\lambda = (A, B, \pi)$
  Distribution: Gaussian mixture

  $$b_j(x_t) = \sum_{k=1}^{K} c_{jk}\, N(x_t; \mu_{jk}, \Sigma_{jk})$$

  HMM topology: 3-state left-to-right model for each phone, 1-state for silence or pause

[Figure: a codebook of mixture components feeding the state output density b_j(x).]

Page 20: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

PRONUNCIATION MODEL

Provides P(Q|W) = P(phone|word)
Word lexicon [Hazen et al., 2002]
  Maps legal phone sequences into words according to phonotactic rules
  G2P (grapheme to phoneme): generates a word lexicon automatically
  A word may have multiple pronunciations
Example: tomato
  P([towmeytow]|tomato) = P([towmaatow]|tomato) = 0.1
  P([tahmeytow]|tomato) = P([tahmaatow]|tomato) = 0.4

[Figure: pronunciation network for "tomato": [t] -> [ow] (0.2) or [ah] (0.8) -> [m] -> [ey] (0.5) or [aa] (0.5) -> [t] -> [ow]; all other transitions have probability 1.0.]
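The pronunciation probabilities in the tomato example follow from multiplying the transition probabilities along each path of the network. A minimal sketch, assuming the network can be flattened into a list of stages (the `STAGES` layout and the function name are illustrative choices, not from the slides):

```python
import math
from itertools import product

# Each stage maps a phone to the probability of taking that branch.
STAGES = [
    {"t": 1.0},
    {"ow": 0.2, "ah": 0.8},
    {"m": 1.0},
    {"ey": 0.5, "aa": 0.5},
    {"t": 1.0},
    {"ow": 1.0},
]


def pronunciations(stages):
    """Enumerate every path and return {phone string: P(phones | word)}."""
    result = {}
    for path in product(*(stage.items() for stage in stages)):
        phones = "".join(phone for phone, _ in path)
        result[phones] = math.prod(prob for _, prob in path)
    return result


table = pronunciations(STAGES)
for phones, prob in sorted(table.items()):
    print(phones, round(prob, 3))
```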

Page 21: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

LANGUAGE MODEL

Provides P(W), the probability of the sentence [Beaujard et al., 1999]
  We saw this was also used in the decoding process as the probability of transitioning from one word to another.
  Word sequence: W = w1, w2, w3, …, wn

  $$P(w_1 \dots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \dots w_{i-1})$$

  The problem is that we cannot reliably estimate the conditional word probabilities $P(w_i \mid w_1 \dots w_{i-1})$ for all words and all sequence lengths in a given language.

n-gram language model
  n-gram language models use the previous n-1 words to represent the history:

  $$P(w_i \mid w_1 \dots w_{i-1}) \approx P(w_i \mid w_{i-(n-1)} \dots w_{i-1})$$

  Bigrams are easily incorporated in a Viterbi search.

Page 22: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

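LANGUAGE MODEL

Example (Korean travel domain; glosses: 서울 Seoul, 부산 Busan, 대구 Daegu, 대전 Daejeon, 에서 "from", 세시 "three o'clock", 네시 "four o'clock", 출발 "departure", 도착 "arrival", 하는 "-ing", 기차 "train", 버스 "bus")

Finite State Network (FSN)
  [Figure: word network with the nodes (서울 | 부산), 에서, (세시 | 네시), 출발, (대구 | 대전), 도착, 하는, (기차 | 버스).]

Context Free Grammar (CFG)
  $time = 세시 | 네시 ;
  $city = 서울 | 부산 | 대구 | 대전 ;
  $trans = 기차 | 버스 ;
  $sent = $city ( 에서 $time 출발 | 출발 $city 도착 ) 하는 $trans

Bigram
  P( 에서 | 서울 ) = 0.2
  P( 세시 | 에서 ) = 0.5
  P( 출발 | 세시 ) = 1.0
  P( 하는 | 출발 ) = 0.5
  P( 출발 | 서울 ) = 0.5
  P( 도착 | 대구 ) = 0.9
  …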
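To make the bigram example concrete, the sketch below multiplies the listed bigram probabilities along one word sequence. It is an illustration only: the probability of the first word is not given on the slide, so the score is conditioned on it, and treating an unlisted bigram as probability zero (no smoothing) is an assumption.

```python
import math

# Bigram probabilities P(current | previous) as listed in the example.
BIGRAM = {
    ("서울", "에서"): 0.2,
    ("에서", "세시"): 0.5,
    ("세시", "출발"): 1.0,
    ("출발", "하는"): 0.5,
    ("서울", "출발"): 0.5,
    ("대구", "도착"): 0.9,
}


def bigram_logprob(words, bigram):
    """Sum of log P(w_i | w_{i-1}), conditioned on the first word."""
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        p = bigram.get((prev, cur), 0.0)  # assumption: unseen pairs get zero
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


# "서울 에서 세시 출발 하는" (roughly: "departing from Seoul at three o'clock")
sentence = ["서울", "에서", "세시", "출발", "하는"]
prob = math.exp(bigram_logprob(sentence, BIGRAM))
print(round(prob, 4))
```

Working in the log domain avoids underflow on long sentences, and it is the same form in which the language model score is added to the acoustic score during decoding.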

Page 23: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

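NETWORK CONSTRUCTION

Expanding every word to state level, we get a search network [Demuynck et al., 1997]

[Figure: search network construction for the Korean digits 일 ("one", phones I L), 이 ("two", phone I), 삼 ("three", phones S A M), and 사 ("four", phones S A). The Acoustic Model (HMM states per phone), the Pronunciation Model (phone sequence per word), and the Language Model (word probabilities P(일|x), P(이|x), P(삼|x), P(사|x)) are combined into one search network running from start to end. Intra-word transitions connect states inside a word, between-word transitions connect word ends to word starts, and the LM is applied at word transitions.]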

Page 24: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DECODING

Find $\hat{W} = \arg\max_{W \in L} P(W \mid O)$

Viterbi search: dynamic programming
Token passing algorithm [Young et al., 1989]
• Initialize all states with a token with a null history and the likelihood that it's a start state
• For each frame a_k
  - For each token t in state s with probability P(t) and history H
    - For each state r
      - Add a new token to r with probability P(t) · P_{s,r} · P_r(a_k) and history s.H
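The token-passing steps above can be sketched in a few lines. This is a minimal illustration on a made-up two-state model, not HTK's implementation: the state names, probabilities, and the function name `token_passing` are all assumptions, and the version here keeps only the best token arriving in each state (the Viterbi criterion).

```python
import math


def _log(p):
    """Log probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def token_passing(observations, states, start_p, trans_p, emit_p):
    """Return (best log-likelihood, best state history) for the observations."""
    # One token per state: (log-likelihood, history of visited states).
    tokens = {
        s: (_log(start_p[s]) + _log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_tokens = {}
        for r in states:
            # Pass every token from s to r; keep only the best arrival in r.
            arrivals = [
                (logp + _log(trans_p[s][r]) + _log(emit_p[r][obs]), hist + [r])
                for s, (logp, hist) in tokens.items()
            ]
            new_tokens[r] = max(arrivals, key=lambda tok: tok[0])
        tokens = new_tokens
    return max(tokens.values(), key=lambda tok: tok[0])


# Toy model (all numbers are invented for the example).
STATES = ["A", "B"]
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

best_logp, best_path = token_passing(["x", "y", "y"], STATES, START, TRANS, EMIT)
print(best_path, round(math.exp(best_logp), 6))
```

In a real recognizer the token history would record word boundaries rather than every state, and beam pruning would discard unlikely tokens at each frame.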

Page 25: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

HTK

Hidden Markov Model Toolkit (HTK)
  A portable toolkit for building and manipulating hidden Markov models [Young et al., 1996]

- HShell : User I/O & interaction with OS
- HLabel : Label files
- HLM : Language model
- HNet : Network and lattices
- HDic : Dictionaries
- HVQ : VQ codebooks
- HModel : HMM definitions
- HMem : Memory management
- HGrf : Graphics
- HAdapt : Adaptation
- HRec : Main recognition processing functions

Page 26: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SUMMARY

[Figure: ASR summary. A learning algorithm trained on examples (x, y) maps speech x to words y. The Acoustic Model, Pronunciation Model, and Language Model are combined by Search Network Construction, and Decoding searches the resulting network.]

Page 27: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Speech Understanding = Spoken Language Understanding (SLU)

Page 28: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SPEECH UNDERSTANDING (in general)

[Figure: a computer program takes a speech segment and can produce different kinds of understanding:
  Speaker ID / Language ID: Dave / English
  Sentiment / Opinion: Nervous
  Named Entity / Relation: LOC = pod bay, OBJ = door
  Topic / Intent: Control the Spaceship
  Summary: Open the doors.
  Syntactic / Semantic Role: Open=Verb, the=Det. ...
  Meaning Representation (SQL): select * from DOORS where ...]

Page 29: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SPEECH UNDERSTANDING (in SDS)

A process by which natural language speech is mapped to a frame structure encoding its meaning [Mori et al., 2008]

[Figure: a learning algorithm trained on examples (x, y) maps input x (speech or words) to output y (intentions).]

Page 30: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

LANGUAGE UNDERSTANDING

What is the difference between NLU and SLU?
  Robustness: noise and ungrammatical spoken language
  Domain-dependent: further deep-level semantics (e.g. Person vs. Cast)
  Dialog: dialog-history dependent, utterance-by-utterance analysis

Traditional approaches: natural language to SQL conversion

[Figure: a typical ATIS system (from [Wang et al., 2005]). Speech -> ASR -> Text -> SLU -> Semantic Frame -> SQL Generate -> SQL -> Database -> Response]

Page 31: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REPRESENTATION

Semantic frame (slot/value structure) [Gildea and Jurafsky, 2002]
  An intermediate semantic representation that serves as the interface between user and dialog system
  Each frame contains several typed components called slots. The type of a slot specifies what kind of fillers it is expecting.

"Show me flights from Seattle to Boston"

XML format:
  <frame name='ShowFlight' type='void'>
    <slot type='Subject'>FLIGHT</slot>
    <slot type='Flight'>
      <slot type='DCity'>SEA</slot>
      <slot type='ACity'>BOS</slot>
    </slot>
  </frame>

Hierarchical representation:
  ShowFlight
    Subject: FLIGHT
    Flight
      Departure_City: SEA
      Arrival_City: BOS

Semantic representation on the ATIS task: XML format and hierarchical representation [Wang et al., 2005]

Page 32: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SEMANTIC FRAME

Meaning representations for spoken dialog systems
  Slot type 1: Intent, Subject Goal, Dialog Act (DA)
    The meaning (intention) of an utterance at the discourse level
  Slot type 2: Component Slot, Named Entity (NE)
    The identifier of an entity such as a person, location, organization, or time. In SLU, it represents the domain-specific meaning of a word (or word group).

Ex) Find Korean restaurants in Daeyidong, Pohang

  <frame domain='RestaurantGuide'>
    <slot type='DA' name='SEARCH_RESTAURANT'/>
    <slot type='NE' name='CITY'>Pohang</slot>
    <slot type='NE' name='ADDRESS'>Daeyidong</slot>
    <slot type='NE' name='FOOD_TYPE'>Korean</slot>
  </frame>

Page 33: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

HOW TO SOLVE: Two Classification Problems

Dialog Act Identification
  Input: Find Korean restaurants in Daeyidong, Pohang
  Output: SEARCH_RESTAURANT

Named Entity Recognition
  Input: Find Korean restaurants in Daeyidong, Pohang
  Output: Korean = FOOD_TYPE, Daeyidong = ADDRESS, Pohang = CITY

Page 34: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

PROBLEM FORMALIZATION

Encoding:
  x is an input (word), y is an output (NE), and z is another output (DA).
  Vector x = {x1, x2, x3, …, xT}
  Vector y = {y1, y2, y3, …, yT}
  Scalar z

Goal: modeling the functions y = f(x) and z = g(x)

  x | Find | Korean      | restaurants | in | Daeyidong | , | Pohang | .
  y | O    | FOOD_TYPE-B | O           | O  | ADDRESS-B | O | CITY-B | O
  z | SEARCH_RESTAURANT
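Once y and z have been predicted, turning them into a semantic frame is a mechanical step. The sketch below shows one way to do it for the example above, as an illustration only: the dictionary layout of the frame and the helper name `build_frame` are assumptions, and the `-I` continuation tag for multi-word entities does not appear on the slide and is included only as a common convention.

```python
def build_frame(words, tags, dialog_act, domain="RestaurantGuide"):
    """Collect tagged words into named-entity slots and attach the dialog act."""
    slots = {}
    current = None
    for word, tag in zip(words, tags):
        if tag.endswith("-B"):            # beginning of an entity
            current = tag[:-2]
            slots[current] = word
        elif tag.endswith("-I") and current == tag[:-2]:
            slots[current] += " " + word  # assumed continuation tag
        else:                             # "O": outside any entity
            current = None
    return {"domain": domain, "DA": dialog_act, "NE": slots}


x = ["Find", "Korean", "restaurants", "in", "Daeyidong", ",", "Pohang", "."]
y = ["O", "FOOD_TYPE-B", "O", "O", "ADDRESS-B", "O", "CITY-B", "O"]
z = "SEARCH_RESTAURANT"

frame = build_frame(x, y, z)
print(frame)
```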

Page 35: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

CASCADE APPROACH I

Named Entity -> Dialog Act

[Figure: Automatic Speech Recognition -(x)-> Sequential Labeling (Named Entity / Frame Slot) -(x,y)-> Classification (Dialog Act / Intent) -(x,y,z)-> Dialog Management.
  Sequential labeling model: e.g. HMM, CRFs
  Classification model: e.g. MaxEnt, SVM]

Page 36: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

CASCADE APPROACH II

Dialog Act -> Named Entity
Improves NE, but not DA.

[Figure: Automatic Speech Recognition -(x)-> Classification (Dialog Act / Intent) -(x,z)-> Sequential Labeling (Named Entity / Frame Slot) -(x,y,z)-> Dialog Management. The predicted dialog act z selects among the sequential models.
  Classification model: e.g. MaxEnt, SVM
  Multiple sequential models: e.g. intent-dependent]

Page 37: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

JOINT APPROACH

Named Entity ↔ Dialog Act

[Figure: Automatic Speech Recognition -(x)-> Joint Inference over Sequential Labeling (Named Entity / Frame Slot) and Classification (Dialog Act / Intent) -(x,y,z)-> Dialog Management.
  Joint model: e.g. TriCRFs [Jeong and Lee, 2006]]

Page 38: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

MACHINE LEARNING FOR SLU

Relational Learning (RL) or Structured Prediction (SP) [Dietterich, 2002; Lafferty et al., 2004; Sutton and McCallum, 2006]
  Structured or relational patterns are important because they can be exploited to improve the prediction accuracy of our classifier
  Argmax search (e.g. Sum-Max, belief propagation, Viterbi, etc.)
  Basically, RL for language processing uses a left-to-right structure (a.k.a. linear-chain or sequence structure)
  Algorithms: CRFs, Max-Margin Markov Net (M3N), SVM for Independent and Structured Output (SVM-ISO), Structured Perceptron, etc.

Page 39: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

MACHINE LEARNING FOR SLU

Background: Maximum Entropy (a.k.a. logistic regression)
  Conditional and discriminative manner
  Unstructured! (no dependency in y)
  Dialog act classification problem

Conditional Random Fields [Lafferty et al., 2001]
  Structured versions of MaxEnt (argmax search in inference)
  Undirected graphical models
  Popular in language and text processing
  Linear-chain structure for practical implementation
  Named entity recognition problem

[Figure: graphical models. MaxEnt: a single output z connected to the input x. Linear-chain CRF: outputs y(t-1), y(t), y(t+1) chained left to right, each connected to its input x(t-1), x(t), x(t+1). The feature functions are labeled f_k, g_k, and h_k.]
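For the unstructured case, a MaxEnt dialog act classifier reduces to a softmax over summed feature weights. The sketch below is only an illustration: the weights are hand-set rather than trained, `REQUEST_ADDRESS` is an invented second label, and `maxent_probs` is a hypothetical helper name.

```python
import math

# Hand-set weights for (feature, label) pairs; a trained model would learn these.
WEIGHTS = {
    ("find", "SEARCH_RESTAURANT"): 1.5,
    ("restaurants", "SEARCH_RESTAURANT"): 2.0,
    ("where", "REQUEST_ADDRESS"): 2.0,
}
LABELS = ["SEARCH_RESTAURANT", "REQUEST_ADDRESS"]


def maxent_probs(features, weights, labels):
    """P(z | x) = exp(sum of active weights for z) / normalizer."""
    scores = {z: sum(weights.get((f, z), 0.0) for f in features) for z in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {z: math.exp(s - top) for z, s in scores.items()}
    total = sum(exps.values())
    return {z: v / total for z, v in exps.items()}


words = "Find Korean restaurants in Daeyidong , Pohang".lower().split()
probs = maxent_probs(words, WEIGHTS, LABELS)
predicted = max(probs, key=probs.get)
print(predicted, round(probs[predicted], 3))
```

A linear-chain CRF uses the same exponential form but normalizes over whole label sequences, which is why it needs an argmax search such as Viterbi at inference time.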

Page 40: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SUMMARY

Dialog Act Identification
  Input: Find Korean restaurants in Daeyidong, Pohang
  Output: SEARCH_RESTAURANT
  Solved by isolated (or independent) classifiers such as Naïve Bayes and MaxEnt

Named Entity Recognition
  Input: Find Korean restaurants in Daeyidong, Pohang
  Output: Korean = FOOD_TYPE, Daeyidong = ADDRESS, Pohang = CITY
  Solved by structured (or relational) classifiers such as HMM and CRFs

Page 41: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Coffee Break

Page 42: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG MANAGEMENT

Page 43: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG MANAGEMENT

A central component of a dialog system that produces system responses with external knowledge sources [McTear, 2004]

[Figure: a learning algorithm trained on examples (x, y) maps input x (words or intentions) to output y (system response).]

Page 44: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG MANAGEMENT

GOAL
  Answer your query (e.g., question and order) given the task domain
  It includes:
    Provide query results
    Ask for further slot information
    Confirm user utterance
    Notify invalid query
    Suggest the alternative
  Related to dialog complexity and task complexity

In practice
  Find the best system action a given the dialog state s

Page 45: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DESIGN ISSUES

Task complexity
  How hard is the task?
  How much domain knowledge does the system have?

[Figure: example tasks placed on a scale from simple to complex: Call Routing, Weather Information, Automatic Banking, Conversational English Tutoring, Collaborative Planning.]

Page 46: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DESIGN ISSUES

Dialog complexity
  What dialog phenomena are allowed
  Initiative strategies
    e.g., system-initiative vs. user-initiative vs. mixed-initiative
  Meta-dialogs: the dialog itself
    e.g., Could you hold on for a minute?
  Subdialogs: clarification/confirmation
    e.g., You selected KE airlines, is it right?
  Multiple dialog threads
    e.g., domain switching

Page 47: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG EXAMPLES

Example 1
  S: Where are you going?
  U: Gangnam, Seoul
  S: What kind of food would you like in Gangnam, Seoul?
  U: Please African food
  S: Sorry, there are no African restaurants. Please select another food.

  System-initiative
  Implicit confirmation
  Merely informs user of failed query
  Rigid dialog flow

Page 48: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG EXAMPLES

Example 2
  S: May I help you?
  U: I'd like to have African food in Gangnam, Seoul
  S: Sorry, there are no African restaurants.
  S: There are Korean, Japanese, Indian, and American restaurants in Gangnam.
  U: Please Korean restaurants.
  S: Would you like to find Korean restaurants?
  U: Yes, please

  Mixed-initiative
  Implicit/Explicit confirmation
  Suggests the alternative when query fails
  More natural dialog flow

Page 49: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG EXAMPLES

Example 3
  U: I'd like to have African food in Gangnam, Seoul
  S: Sorry, there are no African restaurants.
  S: How about American restaurants in Gangnam, Seoul?
  U: No, I don't like it.
  S: What is your favorite food?
  U: I like grilled and seasoned beef
  S: So, how about Korean restaurants?
  U: Good.

  Mixed-initiative
  Implicit/Explicit confirmation
  Recommends the alternative when query fails
  Most natural dialog flow

Page 50: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

DIALOG CONTROL

Finite-state based approach
  Input: single word or phrase
  State transition network (or graph)
  It allows every legal dialog flow that is pre-defined in the state diagram.

Frame-based approach
  Input: natural language with concept spotting
  Form-filling tasks to access an information source
  But the questions do not have to be asked in a predetermined sequence

Plan-based approach
  Input: unrestricted natural language
  The modeling of dialog as collaboration between intelligent agents to solve some problem or task
  For more complex tasks, such as negotiation and problem solving
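The frame-based approach can be sketched as a form that the system keeps filling until every required slot has a value. This is a minimal illustration, not taken from any of the systems cited here: the slot names, the action strings, and both helper functions are hypothetical.

```python
# Hypothetical form for a restaurant search task.
REQUIRED_SLOTS = ["CITY", "FOOD_TYPE"]


def update_frame(frame, user_slots):
    """Merge whatever slots the user supplied this turn, in any order."""
    merged = dict(frame)
    merged.update(user_slots)
    return merged


def next_system_action(frame):
    """Ask for the first missing slot, or query the database when the form is full."""
    for slot in REQUIRED_SLOTS:
        if slot not in frame:
            return f"request({slot})"
    return "query_database"


frame = {}
print(next_system_action(frame))                       # nothing filled yet
frame = update_frame(frame, {"FOOD_TYPE": "Korean"})   # user volunteers the food type first
print(next_system_action(frame))
frame = update_frame(frame, {"CITY": "Pohang"})
print(next_system_action(frame))
```

Unlike a finite-state design, nothing forces the user to answer in slot order: the system only asks about whatever is still missing.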

Page 51: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

KNOWLEDGE-BASED DM (KBDM)

Rule-based approaches
  Early KBDMs were developed with handcrafted rules (e.g., information state update).
  Simple example [Larsson and Traum, 2003]

Agenda-based approaches
  Recent KBDMs were developed with domain-specific knowledge and a domain-independent dialog engine.

Page 52: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

VoiceXML

What is VoiceXML?
  The HTML (XML) of the voice web
  The open standard markup language for voice applications
  VoiceXML resources: http://www.voicexml.org/

Can do
  Rapid implementation and management
  Integrated with the World Wide Web
  Mixed-initiative dialogue
  Simple dialogue implementation solution

Page 53: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

VoiceXML EXAMPLE

S: Say one of: Sports scores; Weather information; Log in.
U: Sports scores

  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <menu>
      <prompt>Say one of: <enumerate/></prompt>
      <choice next="http://www.example.com/sports.vxml"> Sports scores </choice>
      <choice next="http://www.example.com/weather.vxml"> Weather information </choice>
      <choice next="#login"> Log in </choice>
    </menu>
  </vxml>

Page 54: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

AGENDA-BASED DM

RavenClaw DM (CMU) [Bohus and Rudnicky, 2003]
  Uses hierarchical task decomposition
    A set of all possible dialogs in the domain
    Tree of dialog agents
    Each agent handles the corresponding part of the dialog task

Page 55: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

EXAMPLE-BASED DM (EBDM)

Example-based approaches [Lee et al., 2009]

Dialog Corpus
  Turn #1 (Domain=Building_Guidance)
  USER: 회의 실 이 어디 지 ? ("Where is the meeting room?")
    [Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = 회의실 (meeting room)]
  SYSTEM: 3 층에 교수회의실 , 2 층에 대회의실 , 소회의실이 있습니다 . ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.")
    [System Action = inform(Floor)]

Dialog Example (indexed by using semantic & discourse features)
  Domain = Building_Guidance
  Dialog Act = WH-QUESTION
  Main Goal = SEARCH-LOC
  ROOM-TYPE=1 (filled), ROOM-NAME=0 (unfilled)
  LOC-FLOOR=0, PER-NAME=0, PER-TITLE=0
  Previous Dialog Act = <s>, Previous Main Goal = <s>
  Discourse History Vector = [1,0,0,0,0]
  Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ? ("Where is ROOM_TYPE?")
  System Action = inform(Floor)

Dialog State Space
  Having the similar state:

  $$e^{*} = \arg\max_{e_i \in E} S(e_i, h)$$
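The retrieval step, picking the stored example whose state is most similar to the current one, can be sketched as below. This is an illustration under assumptions, not the similarity measure of [Lee et al., 2009]: similarity here is simply the fraction of matching features, the second example and its `guide(Room)` action are invented, and the function names are hypothetical.

```python
def similarity(example_state, query_state):
    """Fraction of features on which the two states agree (assumed measure)."""
    keys = set(example_state) | set(query_state)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if example_state.get(k) == query_state.get(k))
    return matches / len(keys)


def best_example(examples, query_state):
    """e* = argmax over stored examples of S(e_i, h)."""
    return max(examples, key=lambda e: similarity(e["state"], query_state))


EXAMPLES = [
    {   # features taken from the dialog example above
        "state": {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC",
                  "ROOM-TYPE": 1, "ROOM-NAME": 0},
        "action": "inform(Floor)",
    },
    {   # invented second example, for contrast only
        "state": {"dialog_act": "REQUEST", "main_goal": "GUIDE-LOC",
                  "ROOM-TYPE": 0, "ROOM-NAME": 1},
        "action": "guide(Room)",
    },
]

current = {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC",
           "ROOM-TYPE": 1, "ROOM-NAME": 0}
chosen = best_example(EXAMPLES, current)
print(chosen["action"])
```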

Page 56: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

STOCHASTIC DM

Supervised approaches [Griol et al., 2008]
  Find the best system action to maximize the conditional probability P(a|s) given the dialog state
  Based on supervised learning algorithms

MDP/POMDP-based approaches [Williams and Young, 2007]
  Find the optimal system action to maximize the reward function R(a|s) given the belief state
  Based on reinforcement learning algorithms

In general, a dialog state space is too large
  So, generalizing the current dialog state is important
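For the supervised case, the simplest possible estimate of P(a|s) is a relative frequency over an annotated corpus. The sketch below illustrates that idea only; it is not the model of [Griol et al., 2008], and the state labels, actions, and corpus are all invented. It also shows why generalizing states matters: an unseen state has no counts at all.

```python
from collections import Counter, defaultdict

# Invented (dialog state, system action) pairs standing in for an annotated corpus.
CORPUS = [
    ("food_unknown", "request(FOOD_TYPE)"),
    ("food_unknown", "request(FOOD_TYPE)"),
    ("food_unknown", "request(FOOD_TYPE)"),
    ("food_unknown", "inform(Restaurants)"),
    ("all_filled", "inform(Restaurants)"),
    ("all_filled", "inform(Restaurants)"),
]


def train(corpus):
    """Count how often each action follows each state."""
    counts = defaultdict(Counter)
    for state, action in corpus:
        counts[state][action] += 1
    return counts


def best_action(counts, state):
    """Return (argmax_a P(a|s), P(a|s)), or None for a state never seen in training."""
    if state not in counts:
        return None
    action, n = counts[state].most_common(1)[0]
    return action, n / sum(counts[state].values())


model = train(CORPUS)
print(best_action(model, "food_unknown"))
print(best_action(model, "never_seen"))
```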

Page 57: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

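Dialog as a Markov Decision Process

[Figure: dialog as an MDP [Williams and Young, 2007]. Components: User, Speech Understanding, State Estimator, Dialog Policy, Speech Generation. Symbols:
  s_u: user goal
  a_u: user dialog act
  ã_u: noisy estimate of the user dialog act
  s_d: dialog history
  s̃_m: machine state, composed of ã_u, s̃_u, and s̃_d
  a_m: machine dialog act (ã_m as received by the user)
  r(s̃_m, a_m): reward
Reinforcement learning optimizes the dialog policy with respect to the total reward R accumulated from the per-turn rewards r_k.]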

Page 58: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SUMMARY

[Figure: dialog management summary. A Dialog Model, built from a Dialog Corpus and using an External DB, maps input x (words or intentions) to output y (system response). Approaches: agenda-based, stochastic, and example-based.]

Page 59: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Demo
  Building guidance dialog
  TV program guide dialog
  Multi-domain dialog with chatting

Page 60: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

CHALLENGES & ISSUES: MULTI-MODAL DIALOG SYSTEM

Page 61: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

MULTI-MODAL DIALOG SYSTEM

[Figure: a learning algorithm trained on examples (x, y) maps input x (speech, gesture, face) to output y (system response).]

Page 62: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

MULTIMODAL DIALOG SYSTEM

A system which supports human-computer interaction over multiple different input and/or output modes
  Input: voice, pen, gesture, facial expression, etc.
  Output: voice, graphical output, etc.

Applications
  GPS
  Information guide system
  Smart home control
  Etc.

[Figure: voice plus pen input. Voice: "여기에서 여기로 가는 제일 빠른 길 좀 알려 줘." ("Tell me the fastest way from here to here."), with the pen marking the two locations.]

Page 63: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

MOTIVATION

Speech: the ultimate interface?
  (+) Interaction style: natural (use free speech)
      Natural repair process for error recovery
  (+) Richer channel: speaker's disposition and emotional state (if systems knew how to deal with that)
  (-) Input inconsistent (high error rates), hard to correct errors
      e.g., we may get a different result each time we speak the same words
  (-) Slow (sequential) output style: using TTS (text-to-speech)

How to overcome these weak points? Multimodal interface!

Page 64: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

ADVANTAGES Task performance and user preference

Migration of Human-Computer Interaction away from the desktop

Adaptation to the environment

Error recovery and handling

Special situations where mode choice helps

Page 65: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

TASK PERFORMANCE AND USER PREFERENCE

Task performance and user preference for multimodal over speech-only interfaces [Oviatt et al., 1997]
  10% faster task completion
  23% fewer words (shorter and simpler linguistic constructions)
  36% fewer task errors
  35% fewer spoken disfluencies
  90-100% user preference to interact this way

• Speech-only dialog system
  Speech: Bring the drink on the table to the side of bed
• Multimodal dialog system
  Speech: Bring this to here
  Pen gesture: [Figure: pen gestures marking the object and the destination]
  Easy, simplified user utterance!

Page 66: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

MIGRATION OF HCI AWAY FROM THE DESKTOP

Small portable computing devices
  Such as PDAs, organizers, and smart-phones
  Limited screen real estate for graphical output
  Limited input: no keyboard/mouse (arrow keys, thumbwheel)
  Complex GUIs not feasible
  Augment limited GUI with natural modalities such as speech and pen
    Use less space
    Rapid navigation over menu hierarchy

Other devices
  Kiosks, car navigation systems…
  No mouse or keyboard
  Speech + pen gesture

Page 67: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

ADAPTATION TO THE ENVIRONMENT

Multimodal interfaces enable rapid adaptation to changes in the environment
  Allow the user to switch modes
  Mobile devices that are used in multiple environments

Environmental conditions can be either physical or social
  Physical
    Noise: increases in ambient noise can degrade speech performance; switch to GUI or stylus pen input
    Brightness: bright light in an outdoor environment can limit the usefulness of a graphical display
  Social
    Speech may be easiest for a password, account number, etc., but in public places users may be uncomfortable being overheard; switch to GUI or keypad input

Page 68: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

ERROR RECOVERY AND HANDLING

Advantages for recovery and reduction of error:
  Users intuitively pick the mode that is less error-prone.
    Language is often simplified.
  Users intuitively switch modes after an error.
    The same problem is not repeated.

Multimodal error correction
  Cross-mode compensation (complementarity)
    Combining inputs from multiple modalities can reduce the overall error rate
    Multimodal interface has potentially

Page 69: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SPECIAL SITUATIONS WHERE MODE CHOICE HELPS

Users with disabilities
People with a strong accent or a cold
People with RSI
Young children or non-literate users
Other users who have problems handling the standard devices: mouse and keyboard

Multimodal interfaces let people choose their preferred interaction style depending on the actual task, the context, and their own preferences and abilities.

Page 70: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Demo
  Multimodal dialog in smart home domain
  English teaching dialog

Page 71: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

CHALLENGES & ISSUES: DIALOG SIMULATOR

Page 72: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SYSTEM EVALUATION

Real User Evaluation
  [Figure: real interaction between real users and the Spoken Dialog System]
  (+) Reflects the real world
  (-) High cost
  (-) Human factor: it loses objectivity

Page 73: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SYSTEM EVALUATION

Simulated User Evaluation
  [Figure: simulated interaction between the Spoken Dialog System and a Simulated User in a virtual environment]
  (+) Low cost
  (+) Consistent evaluation: it guarantees objectivity
  (-) Not the real world

Page 74: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

SYSTEM DEVELOPMENT

Exposing the Spoken Dialog System to diverse environments
  • Different users
  • Noises
  • Unexpected focus shift

Page 75: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

USER SIMULATION

Spoken Dialog System Simulated Users

Simulated User Input

System Output

Page 76: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

PROBLEMS

User simulation for spoken dialog systems involves four essential problems [Jung et al., 2009]
  User Intention Simulation
  User Utterance Simulation
  ASR Channel Simulation

[Figure: Simulated Users interacting with the Spoken Dialog System]

Page 77: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

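USER INTENTION SIMULATION

Goal
  Generating appropriate user intentions given the current dialog state
  P( user_intention | dialog_state )

Example
  U1: 근처에 중국집 가자 ("Let's go to a Chinese restaurant nearby.")
  S1: 행당동에 북경 , 아서원 , 시온반점이 있고 홍익동에 정궁중화요리 , 도선동에 양자강이 있습니다 . ("In Haengdang-dong there are Bukgyeong, Aseowon, and Sion Banjeom; in Hongik-dong, Jeonggung Junghwa Yori; and in Doseon-dong, Yangjagang.")
  U2: 삼성동에는 뭐가 있지 ? ("What is there in Samseong-dong?")

Semantic Frame (Intention)
  Dialog act: WH-QUESTION
  Main Goal: SEARCH-LOC
  Named Entity: LOC_ADDRESS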

Page 78: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

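USER UTTERANCE SIMULATION

Goal
  Generating natural language given the user intentions
  P( user_utterance | user_intention )

Semantic Frame (Intention)
  Dialog act: WH-QUESTION
  Main Goal: SEARCH-LOC
  Named Entity: LOC_ADDRESS

Generated utterances
  • 삼성동에는 뭐가 있지 ? ("What is there in Samseong-dong?")
  • 삼성동 쪽에 뭐가 있지 ? ("What is there around Samseong-dong?")
  • 삼성동에 있는 것은 뭐니 ? ("What are the things in Samseong-dong?")
  • …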

Page 79: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

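ASR CHANNEL SIMULATION

Goal
  Generating noisy utterances from a clean utterance at certain error rates
  P( utter_noisy | utter_clean, error_rate )

Clean utterance
  • 삼성동에는 뭐가 있지 ? ("What is there in Samseong-dong?")

Noisy utterances (simulated recognition errors: dropped particles, substituted words, a misrecognized place name)
  • 삼성동에 뭐 있니 ?
  • 삼정동에는 뭐가 있지 ?
  • 상성동 뭐 가니 ?
  • 삼성동에는 무엇이 있니 ?
  • …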
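A word-level version of such a channel can be sketched as follows. This is a deliberately simple stand-in, not the method of [Jung et al., 2009]: each word is corrupted independently with probability `error_rate`, the three error types are chosen uniformly, replacement words come from a small assumed vocabulary (which must contain at least two distinct words), and the English tokens are placeholders for the Korean example.

```python
import random


def simulate_asr_channel(words, error_rate, vocabulary, rng):
    """Return a noisy copy of `words` with substitutions, deletions, and insertions."""
    noisy = []
    for word in words:
        if rng.random() >= error_rate:
            noisy.append(word)                      # recognized correctly
            continue
        kind = rng.choice(["substitute", "delete", "insert"])
        if kind == "substitute":
            noisy.append(rng.choice([v for v in vocabulary if v != word]))
        elif kind == "insert":
            noisy.extend([word, rng.choice(vocabulary)])
        # "delete": the word is simply dropped
    return noisy


VOCAB = ["what", "is", "there", "in", "samseong-dong", "samjeong-dong", "where"]
clean = ["what", "is", "there", "in", "samseong-dong"]

rng = random.Random(0)  # fixed seed so a simulation run can be repeated
for rate in (0.0, 0.2, 0.5):
    print(rate, simulate_asr_channel(clean, rate, VOCAB, rng))
```

Sweeping the error rate while holding the simulated user fixed is what makes it possible to evaluate a dialog system consistently under controlled recognition conditions.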

Page 80: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

AUTOMATED DIALOG SYSTEM EVALUATION

[Jung et al., 2009]

Page 81: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Demo
  Self-learned dialog system
  Translating dialog system

Page 82: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REFERENCES

Page 83: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REFERENCES ASR (1/2)

L.R. Rabiner and B.H. Juang, 1993. Fundamentals of Speech Recognition, Prentice-Hall.

L.R. Bahl, P.F. Brown, P.V. de Souza, and R.L. Mercer, 1986. Maximum mutual information estimation of hidden Markov model parameters for speech recognition, Proceedings of 1986 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.49–52.

K.K. Paliwal, 1992. Dimensionality reduction of the enhanced feature set for the HMM-based speech recognizer, Digital Signal Processing, vol.2, pp.157–173.

B.H. Juang, S.E. Levinson, and M.M. Sondhi, 1986. Maximum likelihood estimation for multivariate mixture observations of Markov chains, IEEE Transactions on Information Theory, vol.32, no.2, pp.307–309.

T.J. Hazen, I.L. Hetherington, H. Shu, and K. Livescu, 2002. Pronunciation modeling using a finite-state transducer representation, Proceedings of the ISCA Workshop on Pronunciation Modeling and Lexicon Adaptation, pp.99–104.

Page 84: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REFERENCES ASR (2/2)

K. Demuynck, J. Duchateau, and D.V. Compernolle, 1997. A static lexicon network representation for cross-word context dependent phones, Proceedings of the 5th European Conference on Speech Communication and Technology, pp.143–146.

S.J. Young, N.H. Russell, and J.H.S Thornton, 1989. Token passing: a simple conceptual model for connected speech recognition systems. Technical Report CUED/F-INFENG/TR.38, Cambridge University Engineering Department.

S. Young, J. Jansen, J. Odell, D. Ollason, and P. Woodland, 1996. The HTK book. Entropics Cambridge Research Lab., Cambridge, UK.

HTK website: http://htk.eng.cam.ac.uk/

Page 85: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REFERENCES SLU

R. De Mori et al. Spoken Language Understanding for Conversational Systems. Signal Processing Magazine. 25(3):50-58. 2008.

Y. Wang, L. Deng, and A. Acero. 2005. Spoken Language Understanding: An introduction to the statistical framework. IEEE Signal Processing Magazine, 22(5):16-31.

D. Gildea, and D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245-288.

M. Jeong and G.G. Lee, 2006. Jointly predicting dialog act and named entity for spoken language understanding, IEEE/ACL workshop on SLT.

T. G. Dietterich, 2002. Machine learning for sequential data: A review. Caelli(Ed.) Structural, Syntactic, and Statistical Pattern Recognition.

J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional Random Fields: Probabilistic models for segmenting and labeling sequence data. ICML.

C. Sutton and A. McCallum, 2006. An introduction to conditional random fields for relational learning. In Introduction to Statistical Relational Learning. L. Getoor and B. Taskar, Eds. MIT Press.

Page 86: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REFERENCES DM

M. F. McTear, Spoken Dialogue Technology - Toward the Conversational User Interface: Springer Verlag London, 2004.

S. Larsson, and D. R. Traum, “Information state and dialogue management in the TRINDI dialogue move engine toolkit,” Natural Language Engineering, vol. 6, pp. 323-340, 2006.

D. Bohus, and A. Rudnicky, “RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda,” in Proc. of the European Conference on Speech, Communication and Technology, 2003, pp. 597-600.

D. Griol, L. F. Hurtado, E. Segarra et al., “A statistical approach to spoken dialog systems design and evaluation,” Speech Communication, vol. 50, no. 8-9, pp. 666-682, 2008.

J. D. Williams, and S. Young, “Partially observable Markov decision processes for spoken dialog systems,” Computer Speech and Language, vol. 21, pp. 393-422, 2007.

C. Lee, S. Jung, S. Kim et al., “Example-based Dialog Modeling for Practical Multi-domain Dialog System,” Speech Communication, vol. 51, no. 5, pp. 466-484, 2009.

Page 87: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

REFERENCES MULTI-MODAL DIALOG SYSTEM & DIALOG SIMULATOR

S. L. Oviatt, A. DeAngeli, and K. Kuhn, 1997. Integration and synchronization of input modes during multimodal human-computer interaction. In Proceedings of Conference on Human Factors in Computing Systems: CHI '97.

R. Lopez-Cozar, A. D. la Torre, J. C. Segura et al., “Assessment of dialogue systems by means of a new simulation technique,” Speech Communication, vol. 40, no. 3, pp. 387-407, 2003.

J. Schatzmann, B. Thomson, K. Weilhammer et al., “Agenda-based User Simulation for Bootstrapping a POMDP Dialogue System,” in Proc. of the Human Language Technology/North American Chapter of the Association for Computational Linguistics, 2007, pp. 149-152.

S. Jung, C. Lee, K. Kim et al., “Data-driven user simulation for automated evaluation of spoken dialog systems,” Computer Speech and Language, 2009.

Page 88: SPOKEN DIALOG SYSTEM FOR INTELLIGENT SERVICE ROBOTS

Thank You & Q&A

