Belief Updating in Spoken Dialog Systems
Page 1: Belief Updating  in Spoken Dialog Systems

Belief Updating in Spoken Dialog Systems
Dialogs on Dialogs Reading Group, June 2005

Dan Bohus, Carnegie Mellon University, January 2004

Page 2: Belief Updating  in Spoken Dialog Systems

Misunderstandings

- Misunderstandings are an important problem in spoken dialog systems: the system obtains an incorrect semantic interpretation of the user's utterance
- They affect 15-40% of turns
- They have a significant negative impact on the overall success rate

Page 3: Belief Updating  in Spoken Dialog Systems

Confidence annotation

- Use confidence scores to guard against potential misunderstandings
- Traditionally: scores from the speech recognition engine [Chase, Bansal, Cox, Kemp, etc.]; these focus on WER and are not tuned to the task at hand
- More recently: system-specific semantic confidence scores [Carpenter, Walker, San-Segundo, etc.], which integrate knowledge from different levels of the system: speech recognition, language understanding, dialog management

Page 4: Belief Updating  in Spoken Dialog Systems

Correction Detection

- Detect whether or not the user is trying to correct the system
- Related: aware-site detection
- Similar ML approaches, using multiple sources of knowledge [Litman, Swerts, Krahmer, etc.]

Page 5: Belief Updating  in Spoken Dialog Systems

Proposed: Belief Updating

S: Where are you flying from?
U: [CityName={Aspen/0.6; Austin/0.2}]
S: Did you say you wanted to fly out of Aspen?
U: [No/0.6] [CityName={Boston/0.8}]
→ [CityName={Aspen/?; Austin/?; Boston/?}]

- Integrate confidence annotation and correction detection in a unified framework for continuously tracking beliefs
- A "belief updating" problem:

  initial belief + system action + user response → updated belief

Page 6: Belief Updating  in Spoken Dialog Systems

Formally…

Given:
- an initial belief P_initial(C) over concept C
- a system action SA
- a user response R

construct an updated belief P_updated(C) that is as "accurate" as possible:

P_updated(C) ← f(P_initial(C), SA, R)
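To make the interface concrete, here is a minimal sketch (not the paper's learned model) of such an update function: a belief is a hypothesis-to-confidence table, and the boost/suppress factors are invented heuristics standing in for the learned f.

```python
# Minimal sketch of the belief-updating interface; the heuristic inside
# stands in for the learned function f and is NOT the paper's model.
from typing import Dict

Belief = Dict[str, float]  # hypothesis -> confidence, e.g. {"Aspen": 0.6}

def update_belief(initial: Belief, system_action: str,
                  response: Dict) -> Belief:
    """P_updated(C) <- f(P_initial(C), SA, R)."""
    updated = dict(initial)
    if updated and system_action in ("EC", "IC", "ICT"):
        top = max(updated, key=updated.get)
        if response.get("confirms"):        # user said "yes"
            updated[top] *= 4.0             # illustrative boost
        elif response.get("disconfirms"):   # user said "no"
            updated[top] *= 0.25            # illustrative suppression
    # Merge any newly recognized values, e.g. {"Boston": 0.8}
    for value, conf in response.get("new_values", {}).items():
        updated[value] = updated.get(value, 0.0) + conf
    total = sum(updated.values())
    return {v: p / total for v, p in updated.items()}

# The Aspen/Austin/Boston exchange from the previous slide:
belief = {"Aspen": 0.6, "Austin": 0.2}
belief = update_belief(belief, "EC",
                       {"disconfirms": True, "new_values": {"Boston": 0.8}})
```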

Page 7: Belief Updating  in Spoken Dialog Systems


Examples

Page 8: Belief Updating  in Spoken Dialog Systems


Examples - continued

Page 9: Belief Updating  in Spoken Dialog Systems

Outline

- Introduction
- Data
- A simplified version of the problem; approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

Page 10: Belief Updating  in Spoken Dialog Systems

Data

- Collected in an experiment with RoomLine, a phone-based, mixed-initiative system for making conference room reservations
- The system was equipped with explicit and implicit confirmations
- Corpus statistics:
  - 46 participants
  - 449 sessions, 8278 turns
  - 13.5% misunderstandings [9.8% / 22.5%]
  - 25.6% WER [19.6% / 39.5%]
  - 11362 concept updates

Page 11: Belief Updating  in Spoken Dialog Systems

System actions and concept updates

- Explicit and implicit confirmations
- Start time: explicit confirmation / grounding [EC]
- Date: implicit confirmation / grounding [IC]

Page 12: Belief Updating  in Spoken Dialog Systems

System actions and concept updates (continued)

- Date: implicit confirmation / grounding [IC]
- Start time: implicit confirmation / grounding [IC]
- End time: implicit confirmation / task [ICT]
- [ICT] = implicit confirmations performed through task actions

Page 13: Belief Updating  in Spoken Dialog Systems

# of Conflicting Hypotheses

- Fewer than 3% of concept updates involve more than 1 hypothesis
- The system was not using multiple recognition hypotheses
- [Future work: regenerate multiple hypotheses in batch]

Page 14: Belief Updating  in Spoken Dialog Systems

Outline

- Introduction
- Data
- A simplified version of the problem; approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

Page 15: Belief Updating  in Spoken Dialog Systems

A Simplified Version

- Given that only 3% of updates involve more than 1 hypothesis, update the belief in the top hypothesis after implicit and explicit confirmations
- Instead of P_updated(C) ← f(P_initial(C), SA, R),
  do ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R), for SA ∈ {EC, IC, ICT}

Page 16: Belief Updating  in Spoken Dialog Systems

Approach

- Use machine learning
- Dataset: concept updates for ECs, ICs, and ICTs
- Features:
  - initial confidence score ConfTop_initial(C)
  - system action (SA)
  - user response (R)
- Target: updated confidence score ConfTop_updated(C); the data is labeled, so we have a binary target

Page 17: Belief Updating  in Spoken Dialog Systems

Outline

- Introduction
- Data
- A simplified version of the problem; approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

Page 18: Belief Updating  in Spoken Dialog Systems

User behaviors

- Study of user behaviors in response to ICs and ECs:
  - can inform feature selection and feature development
  - provides insight into where the difficulties are
  - can inform potential strategy refinements

Page 19: Belief Updating  in Spoken Dialog Systems

User responses to ECs

Transcripts:
              YES                   NO                    Other
  CORRECT     1097 [94.2% of cor]   8                     62
  INCORRECT   3                     202 [69.9% of inc]    84

Decoded:
              YES                   NO                    Other
  CORRECT     1016 [87.3% of cor]   11                    137
  INCORRECT   2                     171 [59.2% of inc]    116

~10% of EC responses are "Other" (146 of 1456 in the transcripts; analyzed on the next slide)

Page 20: Belief Updating  in Spoken Dialog Systems

"Other" Responses to ECs

"Eyeball" estimates (out of 146 responses):
- ~70% simply repeat the correct concept value (that should come in as a handy feature)
- ~10% change the conversation focus
- ~10% are turn-overtaking issues (maybe inhibit barge-in until Antoine finishes his thesis)
- ~10% other

Page 21: Belief Updating  in Spoken Dialog Systems

User responses to ICs

Transcripts:
              YES                  NO                   Other
  CORRECT     166 [31.3% of cor]   38                   326
  INCORRECT   15                   75 [31.5% of inc]    148

Decoded:
              YES                  NO                   Other
  CORRECT     151 [28.5% of cor]   20                   369
  INCORRECT   16                   62 [26.1% of inc]    160

Page 22: Belief Updating  in Spoken Dialog Systems

Users Don't Always Correct ICs

Actually, they corrected in 45% of the cases:

              User does not correct   User corrects
  CORRECT     557                     1
  INCORRECT   126 [55% of incor]      104 [45% of incor]

Even if we knew exactly when users correct, we would still have (126+1)/788 = 16% error (the 126 uncorrected misunderstandings plus the 1 corrected correct value).

So what do users do when they don't correct?
- They may actually correct partially
- They may completely ignore the error (if it is non-essential)
- They may readjust to accommodate the task

Page 23: Belief Updating  in Spoken Dialog Systems

More questions…

- Understand this "ignore" phenomenon better
- Impact on task success? IC correction rate: 49% (successful tasks) vs. 41% (unsuccessful)
- Fixed vs. more "flexible" scenarios
- Impact of prompt length on P(user will correct)?
- "Essential" vs. "non-essential" concepts?

Page 24: Belief Updating  in Spoken Dialog Systems

Outline

- Introduction
- Data
- A simplified version of the problem; approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

Page 25: Belief Updating  in Spoken Dialog Systems

Which ML technique?

- We need good probability outputs
  - Margins produced by discriminant classifiers are inadequate
  - We want calibrated probability scores: conf = 0.85 should mean that in 85% of the cases with conf = 0.85 the concept is right
  - So evaluate on a soft metric [I'll contradict myself later!]
- Step-wise logistic regression:
  - sample-efficient
  - performs feature selection
  - good soft-metric performance (optimizes the average log-likelihood of the data)

Page 26: Belief Updating  in Spoken Dialog Systems

Data: Features

For each system action {EC, IC, ICT}:
- Initial confidence score
- Other indicators of the current state:
  - how well the dialog has been going
  - which concept we are talking about
  - how far back this concept was acquired
- Features on the user response:
  - confirmation and disconfirmation markers
  - acoustic/prosodic: f0 (min, max, range, maxslope, etc.) plus normalized versions
  - number of words; turn length (secs)
  - concept information: expected / repeated / new concepts and grammar slots
  - confidence
  - barge-in and timeout info
  - lexical features (preselected by mutual information with the target or with confirm/disconfirm markers)
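A sketch of how one such feature vector might be assembled for a single concept update; all field and helper names below are hypothetical, chosen only to mirror the feature list above, not the system's actual schema.

```python
# Hypothetical feature assembly for one concept update; the field names
# are illustrative, not the system's actual feature schema.
def extract_features(update):
    """Build a flat feature dict for one (concept, turn) update."""
    resp = update["response"]
    f0 = resp["f0"]  # prosodic statistics for the user turn
    return {
        # current-state indicators
        "conf_top_initial":     update["conf_top_initial"],
        "system_action":        update["system_action"],  # EC / IC / ICT
        "concept":              update["concept"],
        "turns_since_acquired": update["turns_since_acquired"],
        # user-response features
        "has_yes_marker":       resp["has_yes_marker"],
        "has_no_marker":        resp["has_no_marker"],
        "f0_min":               f0["min"],
        "f0_max":               f0["max"],
        "f0_range":             f0["max"] - f0["min"],
        "num_words":            resp["num_words"],
        "turn_length_secs":     resp["turn_length_secs"],
        "repeats_concept":      resp["repeats_concept"],
        "barge_in":             resp["barge_in"],
        "timeout":              resp["timeout"],
    }
```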

Page 27: Belief Updating  in Spoken Dialog Systems

Results

- We actually use a 1-level logistic model tree (sketched below):
  - split on answer_type ∈ {yes, no, other, no_parse}
  - perform step-wise logistic regression on the 4 leaves
  - p-entry = 0.05, p-reject = 0.30, BIC stopping criterion
- Also tried a full-blown model tree; results are similar, maybe marginally worse
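A rough sketch of this setup, assuming scikit-learn, numpy, and a pandas DataFrame with an answer_type column and numerically encoded features; a greedy BIC-based forward selection stands in for the p-entry/p-reject step-wise procedure from the slide.

```python
# Sketch of a 1-level logistic model tree: one step-wise logistic
# regression per answer_type leaf (df is a pandas DataFrame with
# numeric feature columns). BIC-based forward selection stands in
# for the p-entry/p-reject rules.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bic(model, X, y):
    p = np.clip(model.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    k = X.shape[1] + 1                        # weights + intercept
    return k * np.log(len(y)) - 2 * log_lik

def forward_select(X, y, candidates):
    """Greedily add features while BIC keeps improving."""
    chosen, best_bic = [], np.inf
    improved = True
    while improved:
        improved = False
        for feat in [f for f in candidates if f not in chosen]:
            cols = chosen + [feat]
            m = LogisticRegression(max_iter=1000).fit(X[cols], y)
            b = bic(m, X[cols], y)
            if b < best_bic:
                best_bic, best_feat, improved = b, feat, True
        if improved:
            chosen.append(best_feat)
    return chosen

def fit_model_tree(df, features, target="label"):
    leaves = {}
    for answer_type, leaf in df.groupby("answer_type"):
        cols = forward_select(leaf, leaf[target], features)
        model = LogisticRegression(max_iter=1000).fit(leaf[cols], leaf[target])
        leaves[answer_type] = (cols, model)
    return leaves
```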

Page 28: Belief Updating  in Spoken Dialog Systems

Explicit Confirmation

                   HARD (error rate)   SOFT (avg. log-likelihood)
  Initial          31.1%               -0.5076
  Heuristic        8.6%                -0.1943
  LMT (CV)         3.7%                -0.1160
  LMT (training)   2.9%                -0.0851

[Figure: bar charts of error rate (%) and average log-likelihood for Initial, Heuristic, LMT(CV), and LMT(training)]

Page 29: Belief Updating  in Spoken Dialog Systems

Implicit Confirmation

                    HARD (error rate)   SOFT (avg. log-likelihood)
  Initial           31.4%               -0.6217
  Heuristic         24.0%               -0.6736
  LMT (CV)          19.6%               -0.4521
  LMT (training)    18.8%               -0.4124
  Oracle Baseline   16.1%               -

[Figure: bar charts of error rate (%) and average log-likelihood for Initial, Heuristic, LMT(CV), and LMT(training)]

Page 30: Belief Updating  in Spoken Dialog Systems

Outline

- Introduction
- Data
- A simplified version of the problem; approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

Page 31: Belief Updating  in Spoken Dialog Systems

What can Logistic Regression / Avg-LL do for you?

- D = {d1, d2, d3, d4, …}, with di ∈ {0, 1}
- Likelihood of the data: P(D) = ∏i P(di | xi)
- Express the density as P(d=1 | x) = 1 / (1 + exp(-w·x))
  (you can actually derive this form by starting from Gaussian class-conditional densities P(x | d))
- Find the parameters w that maximize P(D):
  argmax P(D) = argmax ∏i P(di | xi) = argmin ∑i -log P(di | xi)
- Hence we maximize the average log-likelihood of the data
- But what does that mean?
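The quantity reported as the SOFT metric in the results tables above can be computed directly; a small sketch (numpy assumed):

```python
# Average log-likelihood of binary labels under predicted probabilities;
# this is the SOFT metric from the results tables (higher is better).
import numpy as np

def avg_log_likelihood(p, d):
    """p: predicted P(d=1|x) per example; d: 0/1 labels."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return np.mean(d * np.log(p) + (1 - d) * np.log(1 - p))

# A well-calibrated, confident model scores close to 0:
print(avg_log_likelihood(np.array([0.9, 0.8, 0.1]), np.array([1, 1, 0])))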

Page 32: Belief Updating  in Spoken Dialog Systems

Loss function in Logistic Regression

- The log-likelihood loss function (illustrated for d = 1 over the points 0.01, 0.1, 0.7, 0.8, 1)
- If d = 1, then P(d=1) = 0.01 is ten times worse than P(d=1) = 0.1, but P(d=1) = 0.7 costs about the same as P(d=1) = 0.8
  (concretely, -log 0.01 ≈ 4.61 vs. -log 0.1 ≈ 2.30, while -log 0.7 ≈ 0.36 vs. -log 0.8 ≈ 0.22)
- Things are mirrored for d = 0
- This does not match the "threshold" model commonly used to engage actions

Page 33: Belief Updating  in Spoken Dialog Systems

A New Loss Function: T2

- A loss function that better matches our domain: T2 (or even T3)
- Optimize argmax ∑ T2(P(di=c | xi))
- But T2 is not differentiable and not convex

[Figure: for d = 1, a two-step function of p with thresholds t1, t2 and costs C1, C2; for d = 0, the mirrored shape with costs C3, C4]
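A sketch of what such a step function might look like. The slide maximizes ∑ T2; equivalently one can minimize a step cost like the one below. The exact step heights and orientation are an assumption here, since the slide conveys them only through the lost figure.

```python
# Assumed shape for a T2-style step cost: for a correct concept (d=1) we
# pay c1 if confidence lands below t1, c2 between t1 and t2, and nothing
# above t2; mirrored for d=0 with costs c3, c4. This reading of the lost
# figure is an assumption, not the paper's definition.
def t2_cost(p, d, t1=0.3, t2=0.7, c1=2.0, c2=1.0, c3=1.0, c4=2.0):
    if d == 1:                      # concept value is correct
        if p < t1:  return c1       # severe: confident rejection
        if p < t2:  return c2       # mild: undecided region
        return 0.0
    else:                           # concept value is incorrect
        if p > t2:  return c4       # severe: confident acceptance
        if p > t1:  return c3       # mild: undecided region
        return 0.0
```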

Page 34: Belief Updating  in Spoken Dialog Systems

Smoothed version

- A loss function that better matches our domain: T2 (or even T3)
- Optimize argmax ∑ SmoothT2(P(di=c | xi))
- Differentiable! But still not convex… multiple local maxima

SmoothT2(p) = σ1(p) + σ2(p), where σi(p) = 1 / (1 + exp(ki (p - θi)))
with the ks and θs chosen accordingly

[Figure: for d = 1, the smoothed two-step curve with thresholds t1, t2 and costs C1, C2]
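A sketch of the smoothed objective for the d = 1 case; the steepness k and the identification of the θs with the thresholds t1, t2 are illustrative choices, since the slide only says the constants are "chosen accordingly".

```python
# Smooth two-step objective for d=1: a sum of two sigmoids that rises
# near t1 and again near t2. k controls steepness; a negative k makes
# each sigmoid increase with p. The specific constants are illustrative.
import numpy as np

def sigma(p, k, theta):
    return 1.0 / (1.0 + np.exp(k * (p - theta)))

def smooth_t2(p, t1=0.3, t2=0.7, k=-30.0):
    # sigma(p, k<0, theta) goes from 0 to 1 as p crosses theta
    return sigma(p, k, t1) + sigma(p, k, t2)

# Rises from ~0 (below t1) through ~1 (between) to ~2 (above t2):
for p in (0.1, 0.5, 0.9):
    print(p, round(float(smooth_t2(p)), 3))
```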

Page 35: Belief Updating  in Spoken Dialog Systems

Costs & Thresholds

- Costs: where from?
  - "expert" knowledge
  - derived from data (might be tricky)
- Thresholds: where from?
  - fixed, or
  - actually optimized at the same time: SmoothT2 = SmoothT2(w, th1, th2) is differentiable in th1 and th2, so we can do a gradient search for them (sketched below)
- This calibrates in one step both the belief updating model and the thresholds, to minimize the loss
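A sketch of that joint search, reusing smooth_t2 from the previous block and treating the weights w and the two thresholds as one parameter vector; scipy's BFGS is assumed as the optimizer, with multiple restarts because the objective is not convex.

```python
# Joint gradient search over (w, t1, t2): maximize the smoothed objective
# by minimizing its negative. Non-convex, so restart from several points.
import numpy as np
from scipy.optimize import minimize

def neg_objective(params, X, d):
    w, t1, t2 = params[:-2], params[-2], params[-1]
    p = 1.0 / (1.0 + np.exp(-X @ w))            # model confidence
    reward = np.where(d == 1,
                      smooth_t2(p, t1, t2),      # reward high p when correct
                      smooth_t2(1 - p, t1, t2))  # reward low p when incorrect
    return -np.sum(reward)

def fit(X, d, restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        x0 = np.concatenate([rng.normal(size=X.shape[1]), [0.3, 0.7]])
        res = minimize(neg_objective, x0, args=(X, d), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best.x[:-2], best.x[-2], best.x[-1]   # w, t1, t2
```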

Page 36: Belief Updating  in Spoken Dialog Systems

Questions: What Next?

- ICT: can we do anything there? It looks really tough
- Push for better performance:
  - add more features?
  - debug the models more; eliminate singularities
  - why doesn't the model tree do better?
- Push for better understanding:
  - what are the other interesting questions?
- Optimize for the new loss function
- Further in the future: look at the full belief updating problem

Page 37: Belief Updating  in Spoken Dialog Systems


Thank You!

Page 38: Belief Updating  in Spoken Dialog Systems

Encoding System Actions

- For each concept update, define a system action signature <IC, ICT, EC, REQ>:
  - IC: implicit confirm [grounding]
  - ICT: implicit confirm [task]
  - EC: explicit confirm
  - REQ: request
- Each variable can take 1 of 4 values:
  - 0
  - C (action happens on the concept of interest)
  - OC (action happens on some other concept)
  - C&OC (action happens both on the concept of interest and some other concept)
- Only certain combinations are valid and appear in the data
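A small sketch of this encoding; the type and value names are illustrative, not the system's actual representation.

```python
# Sketch of the <IC, ICT, EC, REQ> action signature. Each slot records
# whether that action type touched the concept of interest (C), some
# other concept (OC), both (C_AND_OC), or did not occur (NONE).
from enum import Enum
from typing import NamedTuple

class Scope(Enum):
    NONE = "0"
    C = "C"            # on the concept of interest
    OC = "OC"          # on some other concept
    C_AND_OC = "C&OC"  # on both

class ActionSignature(NamedTuple):
    ic: Scope    # implicit confirm [grounding]
    ict: Scope   # implicit confirm [task]
    ec: Scope    # explicit confirm
    req: Scope   # request

# e.g. an explicit confirmation of this concept while requesting another:
sig = ActionSignature(Scope.NONE, Scope.NONE, Scope.C, Scope.OC)
```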

