Page 1: Statistical Spoken Dialogue Systems and the …mi.eng.cam.ac.uk/~sjy/presentations/SSDS-Challenges.pdfDialogue Systems Group Machine Intelligence Laboratory Cambridge University Engineering

Dialogue Systems Group Machine Intelligence Laboratory Cambridge University Engineering Department Cambridge, UK

Steve Young

Statistical Spoken Dialogue Systems and the Challenges for Machine Learning

Page 2

Dialog System Architecture

[Figure: system pipeline. Understanding: ASR -> Semantic Decoder (turn level) -> Belief Tracker (dialogue level). Dialog Manager: Dialog Policy, connected to the Database/Application. Generation: Response Planner (dialogue level) -> Message Generator (turn level) -> TTS. Other labels: User. Data passed between stages: Recognition Hypotheses, Belief State, System Actions, System Response.]

Page 3

Understanding: ASR -> Beliefs

[Figure: network that maps ASR output to a belief state, repeated for each slot s. Recoverable elements: ASR hypotheses with probabilities p1 and p2, each passed through a word embedding (WE) and a CNN, scaled by its probability (x p1, x p2) and summed (+); the last system act, also embedded (WE); LSTM layers; a SoftMax output giving Ps(v). Panel labels: "Per Turn Semantic Decoding" and "Per Utterance Belief Tracking".]

Belief State = Concatenation of Slot Probability Vectors
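The statement "Belief State = Concatenation of Slot Probability Vectors" can be illustrated with a minimal sketch. The slot names, value lists and raw scores below are hypothetical and chosen only for illustration; in the slide the per-slot scores would come from the CNN/LSTM network, which is not reproduced here.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability vector (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical slots and values (assumption, not taken from the slides).
SLOT_VALUES = {
    "food": ["chinese", "italian", "none"],
    "price": ["cheap", "expensive", "none"],
}

def belief_state(slot_scores):
    """Concatenate the per-slot distributions P_s(v) into one flat vector."""
    belief = []
    for slot in SLOT_VALUES:  # fixed slot order
        belief.extend(softmax(slot_scores[slot]))
    return belief

# Made-up network outputs for one turn.
scores = {"food": [2.0, 0.5, 0.1], "price": [1.5, 0.2, 0.3]}
b = belief_state(scores)
```

Each slot contributes a block that sums to one, so the belief vector has one entry per slot value and grows with the number of slots.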

Page 4

Using a CNN to Extract Lexical Features

CNN is the key component: it scans each utterance applying convolution windows of 1, 2, 3, 4, … words

Slide convolution filter k of length l over utterance:

$$c_i = \tanh\left( f_{kl} \cdot w_{i:i+l-1} + b \right)$$

[Figure: the example utterance "I am looking for a cheap hotel near here" (word positions 1 to 9) is convolved to give feature values c_1 … c_9. A max is taken over positions for each filter, giving a grid of values r_{kl} (r_11 … r_54) indexed by filter number k and window size l, with f_43 marked as one example filter. The values are summed (+) into the sentence representation r_1 … r_5.]
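The convolution and max step on this slide can be sketched in a few lines. This is a minimal illustration, not the group's implementation: the function names `conv_features` and `max_pool`, the 2-D toy embeddings and the filter weights are all assumptions, and no padding is applied, so an utterance of n words gives n - l + 1 feature values.

```python
import math

def conv_features(words, filt, bias):
    """c_i = tanh(f . w_{i:i+l-1} + b) for every window of l word vectors."""
    dim = len(words[0])
    l = len(filt) // dim  # window length in words
    feats = []
    for i in range(len(words) - l + 1):
        # Concatenate the l word vectors in the window into one flat vector.
        window = [x for w in words[i:i + l] for x in w]
        feats.append(math.tanh(sum(f * x for f, x in zip(filt, window)) + bias))
    return feats

def max_pool(feats):
    """Keep the strongest response of the filter over all positions."""
    return max(feats)

# Toy 2-D embeddings for a 4-word utterance (made-up values).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filt = [0.5, -0.5, 0.5, 0.5]  # one filter of window length l = 2
feats = conv_features(words, filt, 0.0)
r = max_pool(feats)
```

Repeating this for several filters and window lengths gives one pooled value per (k, l) pair, which is what the grid of r values in the figure appears to show.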

Page 5

Understanding: ASR -> Beliefs

[Figure: the belief-tracking network diagram of Page 3, shown again.]

Belief State = Concatenation of Slot Probability Vectors

Henderson, M., et al. (2014). Word-Based Dialog State Tracking with Recurrent Neural Networks. SigDial 2014, Philadelphia, PA.
Rojas-Barahona, L., et al. (2016). Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding. Coling, Osaka, Japan.
Mrksic, N., et al. (2016). Neural Belief Tracker: Data-Driven Dialogue State Tracking. arXiv:1606.03777

Page 6

Generation: actions -> words

Need to convert abstract system actions to natural language, e.g.

inform(name=“The Peking”, food=“chinese”) -> “The Peking serves chinese food”

[Figure: SC-LSTM in training, conditioned on inform(<name>, <food>): the input tokens <s>, <name>, serves, <food> produce the output tokens <name>, serves, <food>, food.]

Page 7

Generation: actions -> words

Need to convert abstract system actions to natural language, e.g.

inform(name=<name>, food=<food>) -> “<name> serves <food> food”

Solution: delexicalise the training data, and train a conditional LSTM

[Figure: SC-LSTM when running, conditioned on request(<food>); only the output tokens “you” and “like?” survive in the transcript.]
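Delexicalisation itself is simple string substitution, sketched below under the assumption that slot values appear verbatim in the training sentence. The helper names `delexicalise` and `relexicalise` are illustrative, and the second restaurant is a made-up example; the conditional LSTM that learns the templates is not shown.

```python
def delexicalise(text, slots):
    """Replace each slot value found in the text by a <slot> placeholder."""
    for name, value in slots.items():
        text = text.replace(value, f"<{name}>")
    return text

def relexicalise(template, slots):
    """Fill the <slot> placeholders of a generated template with real values."""
    for name, value in slots.items():
        template = template.replace(f"<{name}>", value)
    return template

# Training side: strip the specific values from the example on the slide.
train_slots = {"name": "The Peking", "food": "chinese"}
template = delexicalise("The Peking serves chinese food", train_slots)

# Running side: reuse the same template for a different (hypothetical) venue.
sentence = relexicalise(template, {"name": "Nando's", "food": "portuguese"})
```

Because the generator only ever sees placeholders, one training sentence covers every venue and food type in the database.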

Page 8

Semantically constrained LSTM

[Figure: SC-LSTM cell. An LSTM cell (gates i, f, o and cell c) takes the word w_t and the previous hidden state h_{t-1} and outputs h_t, producing the word sequence. A semantic conditioning cell with gate r updates the dialog act vector from d_{t-1} to d_t, starting from the system dialog act.]

Page 9

Dialog Manager

[Figure: the belief state b holds distributions over Domain (Weather, Other), Location (Local, Maine) and Weather Condition (Temp, Rain, Wind); the policy π maps b to an action a.]

Actions: request, confirm, inform, execute, etc.

1. Belief state b encodes the state of the dialog, including all relevant history.

2. Belief state is updated every turn of the dialog.

3. The policy determines the best action to make at each turn via a mapping from the belief state b to actions a.

4. Every dialog ends with a reward: +ve for success, -ve for failure. Plus a weak -ve reward for every turn to encourage brevity.

5. Reinforcement Learning is used to find the best policy.
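Point 4 can be made concrete with a small return calculation. The numeric values (+20 for success, -20 for failure, -1 per turn) are assumptions for illustration only; the slides give the signs but not the magnitudes.

```python
def dialogue_return(num_turns, success,
                    success_reward=20.0,   # assumed value
                    failure_reward=-20.0,  # assumed value
                    turn_penalty=-1.0):    # assumed value
    """Undiscounted total reward for one dialogue.

    Every turn costs a small penalty to encourage brevity, and the final
    outcome adds a positive or negative reward.
    """
    if num_turns < 1:
        raise ValueError("a dialogue has at least one turn")
    final = success_reward if success else failure_reward
    return num_turns * turn_penalty + final

short_success = dialogue_return(7, True)
long_success = dialogue_return(12, True)
failure = dialogue_return(4, False)
```

A policy that maximises the expected value of this quantity prefers short successful dialogues over long ones, and any success over failure.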

Page 10

Reinforcement Learning

Policy:
$$\pi(b,a) : \mathbb{R}^n \times A \to [0,1]$$

Reward (NB: no discounting):
$$R = \sum_{\tau=1}^{T} r(b_\tau, a_\tau)$$

Problem: find
$$\pi^* = \arg\max_\pi \left\{ E[R \mid \pi] \right\}$$

Page 11

Policy Representation

• Gaussian Processes: data efficient, includes explicit confidence on Q-value. Can support large n, but action space |A| limited.

• Deep Neural Networks: scale well on both n and |A|, but no built-in confidence measure and poor convergence properties.

$$\pi(b,a) : \mathbb{R}^n \times A \to [0,1]$$ with n ~ 20 - 100 and |A| ~ 200+

Page 12

Training Data

• Ideally, train directly on interactions with real users, but
  ✦ training even a small domain may require around 5k dialogues (many in exploration mode)
  ✦ reward signal is hard to measure (see later)

• In practice, train in stages
  ✦ initialise with corpus data
  ✦ train/test on user simulator
  ✦ tune on real users

Page 13

Optimisation Algorithms

• Policy Iteration
  ✦ GP Sarsa
  ✦ Deep Q-learning

• Policy Gradient
  ✦ Natural Actor Critic

• “Black box” methods
  ✦ Trust Regions

Page 14

1. NN policy: 1 common 32 node tanh hidden layer. Action outputs encoded via 2 softmax output partitions and 6 sigmoid partitions

2. Pre-trained (using SL for NN and prior for GP) on 720 dialogs from Cambridge restaurant domain.

3. Optimised (using RL) on 5000 simulated dialogues
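The network shape in point 1 can be sketched as a forward pass. This is only an illustration of "one shared tanh hidden layer, partitioned outputs": the belief size, the sizes of the two softmax partitions, the treatment of each sigmoid partition as a single unit, and the random weights are all assumptions, and no training (SL or RL) is shown.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(x, weights, biases):
    """Dense layer without activation: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def policy_forward(belief, hidden, output, softmax_sizes, n_sigmoid):
    h = [math.tanh(v) for v in affine(belief, *hidden)]  # shared tanh layer
    z = affine(h, *output)
    parts, pos = [], 0
    for size in softmax_sizes:  # each softmax partition is normalised separately
        parts.append(softmax(z[pos:pos + size]))
        pos += size
    gates = [sigmoid(v) for v in z[pos:pos + n_sigmoid]]
    return parts, gates

rng = random.Random(0)
N_BELIEF, N_HIDDEN = 20, 32             # 32 hidden nodes as on the slide; 20 is assumed
SOFTMAX_SIZES, N_SIGMOID = [5, 4], 6    # partition sizes are hypothetical
hidden = init_layer(N_BELIEF, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, sum(SOFTMAX_SIZES) + N_SIGMOID, rng)
parts, gates = policy_forward([0.05] * N_BELIEF, hidden, output,
                              SOFTMAX_SIZES, N_SIGMOID)
```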

SL 94.5%
SL+RL 98.2%

NN Policy trained and tested on-line with real users.

[Figure: Simulation Results, NAC trained Neural Net Policy vs GP Policy.]

[Figure: Real User Results.]

Su, P-H, et al., Continuously Learning Neural Dialogue Management, arXiv:1606.02689

Page 15

Curse of Dimensionality

[Figure: the Belief Space grows along two axes, Domain Complexity and Multiple Domains. Four examples:]

Single domain, simple types (Restaurant Domain)
“I am looking for a cheap italian restaurant.”
action=search venue=restaurant price=cheap food=italian

Multi-domain, simple types (Restaurant Domain, Calendar Domain)
“Book a table at Nando’s after my meeting with Bill.”
action=book venue=restaurant name=Nando’s when=??
action=lookup event=meeting attendee=Bill

Single domain, complex types
“Book a table at 7:45pm tomorrow.”
action=book venue=restaurant when={time(19:45), date(today+1)}

Multi-domain, complex types
“Book a table at Nando’s for 7:45pm tomorrow and invite Bill and John”
action=book venue=restaurant name=Nando’s when={time(19:45), date(today+1)}
action=create event=meeting attendees={“Bill”, “John”}

Page 16

Bayesian Committee Machines

Assume M independent policies and a common belief state

[Figure: the common belief state b feeds the policies for Domain 1, Domain 2, …, Domain i, which give Q_1, Q_2, …, Q_i. These are combined as $$\hat{Q} = f(Q_1, \ldots, Q_i, \ldots)$$ and the action is chosen by $$\arg\max_a \{ \hat{Q}(b,a) \}$$.]

r(b,a): distribute reward to all committee members, scaled by contribution to the actual selected action

Page 17

Example using GP-RL:

$$\hat{Q} \sim \mathcal{N}\left(Q, \Sigma^Q\right)$$

where

$$Q = \Sigma^Q \sum_{i=1}^{M} \left(\Sigma_i^Q\right)^{-1} Q_i$$

$$\Sigma^Q = \left[ \sum_{i=1}^{M} \left(\Sigma_i^Q\right)^{-1} - \text{const} \right]^{-1}$$

Three domains trained from scratch on line, both individually and in parallel:

• Hotel info
• Restaurant info
• Laptop product guide

[Figure: Reward against Number of Training Dialogues, comparing "Laptop domain trained in parallel with Hotels and Restaurants" with "Laptop domain trained in isolation".]

M. Gasic et al (2015). "Policy Committee for Adaptation in Multi-domain Spoken Dialogue Systems." IEEE ASRU 2015, Scottsdale, AZ.
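The committee combination can be sketched for a single (b, a) pair with scalar estimates: each member contributes its mean weighted by its precision (inverse variance), and the combined variance is the inverse of the summed precisions minus the constant term. The function name `bcm_combine` and the example numbers are assumptions; the constant is left as a plain parameter because the slide only writes "const".

```python
def bcm_combine(means, variances, const=0.0):
    """Combine M Gaussian Q-value estimates for one (b, a) pair.

    means[i] and variances[i] are the mean and variance reported by
    committee member i; const is the correction term from the slide.
    """
    precision = sum(1.0 / v for v in variances) - const
    if precision <= 0.0:
        raise ValueError("combined precision must be positive")
    var = 1.0 / precision
    mean = var * sum(m / v for m, v in zip(means, variances))
    return mean, var

# Two equally confident members: the result is their average.
mean_eq, var_eq = bcm_combine([1.0, 3.0], [1.0, 1.0])

# The first member is more confident, so the result moves towards 1.0.
mean_w, var_w = bcm_combine([1.0, 3.0], [1.0, 4.0])
```

The combined variance is smaller than that of either member, which is how the committee expresses increased confidence when domains agree.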

Page 18

Domain Complexity

[Figure: a calendar dialogue unrolled over seven turns, with belief states b1 … b7, actions a1 … a7 chosen by the policy π calendar, and rewards r1 … r7. Turns 3 to 5 (b3, a3 … b5, a5) are marked as the GetTime sub-dialogue.]

System: How can I help?
User: Fix a meeting
System: Who with?
User: Bill
System: What time?
User: 5.30
System: Was that 9.30?
User: No, 5.30
System: 5.30pm?
User: Yes
System: Ok meeting at 5.30pm with Bill?
User: Yes
System: Meeting is scheduled

Page 19

Hierarchical Reinforcement Learning

[Figure: the seven-turn calendar dialogue of Page 18, split across two policies. π calendar covers turns 1, 2, 6 and 7 (rewards r1 r2 and r6 r7); the GetTime sub-dialogue over turns 3 to 5 (b3, a3 … b5, a5) is handled by a separate policy π time, whose rewards r3, r4, r5 are summed (+ +).]

Page 20

Hierarchical Deep Reinforcement Learning

[Figure: two-level controller. At the top meta-level, DQNθ maps the belief state b_t to the next subgoal g_t. At the subgoal level (e.g. GetTime), DQNλ maps b_t, b_{t+1}, …, b_{t+N} together with g_t to the actions a_t, a_{t+1}, …, a_{t+N}. At time t+N, DQNθ selects the next subgoal g_{t+N}.]

T. Kulkarni et al (2016). "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation." arXiv:1604.06057.

Page 21

Measuring Success

Task success is not always obvious….

[Figure: the seven-turn calendar dialogue of Page 18 (“How can I help?” … “Meeting is scheduled”), with belief states b1 … b7, actions a1 … a7 and rewards r1 … r7 under π calendar.]

….so probably ok

Page 22

Measuring Success

However, what about the problematic weather query?

[Figure: a four-turn dialogue with belief states b1 … b4, actions a1 … a4 and rewards r1 … r4, labelled π calendar.]

System: How can I help?
User: How’s the weather in Maine
System: It’ll be fine all day in the Bay area.
User: No, Maine
System: I know your name Steve, it’s “Steve”.
User: I want the weather in Maine!
System: I don’t believe it’s raining right now.

Page 23

On-line Reward Estimation

[Figure: the turns of a dialogue (b1, a1 … b4, a4 with rewards r1 … r4) provide Episodic Dialogue Features. An LSTM encoder maps them to a 64-D embedding, and a GP-based Reward Estimator turns the embedding into the Estimated Reward Signal. If the estimator has low confidence, the User is prompted for feedback: “good” or “bad”.]
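The "if low confidence then prompt for user feedback" rule can be sketched as a small decision function. The estimator itself (LSTM encoder plus GP) is not modelled here: `p_success` and `variance` stand in for its output, and the function name, the 0.5 decision point and the `max_variance` threshold are assumptions for illustration.

```python
def estimate_reward(p_success, variance, ask_user, max_variance=0.1):
    """Return (dialogue_was_good, user_was_asked).

    p_success and variance are hypothetical outputs of a reward estimator.
    ask_user is a callable returning "good" or "bad"; it is only called
    when the estimator is not confident enough.
    """
    if variance > max_variance:
        return ask_user() == "good", True
    return p_success >= 0.5, False

# Confident estimate: the user is not disturbed.
confident = estimate_reward(0.9, 0.01, ask_user=lambda: "bad")

# Uncertain estimate: the user's answer overrides the prediction.
uncertain = estimate_reward(0.9, 0.5, ask_user=lambda: "bad")
```

Asking only in the uncertain case keeps the number of feedback prompts low while still giving the estimator labelled examples where it needs them most.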

Page 24

On-line Reward and Policy Learning

Page 25

On-line Reward and Policy Learning

P-H. Su et al (2016). "On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems." ACL 2016, Berlin.

Page 26

Summary

• POMDPs and Reinforcement Learning provide a powerful mathematical framework for decision making in intelligent conversational agents.

• DNNs provide a flexible building block for all stages of the dialogue system pipeline, though training is often problematic.

• Unrestricted conversation is challenging but there are several promising approaches to managing complexity.

• For commercially deployed systems, the user is a tremendous untapped resource, and Reinforcement Learning provides the framework for exploiting it.

Page 27

Credits

All members of the Cambridge Dialogue Systems Group, past and present:

Milica Gasic, Catherine Breslin, Pawel Budzianowski, Matt Henderson, Filip Jurcicek, Simon Keizer, Dongho Kim, Fabrice Lefevre, Francois Mairesse, Nikola Mrksic, Lina Rojas Barahona, Jost Schatzmann, Matt Stuttle, Martin Szummer, Eddy Su, Blaise Thomson, Pirros Tsiakoulis, Stefan Ultes, David Vandyke, Karl Weilhammer, Shawn Wen, Jason Williams, Hui Ye, Kai Yu