Dialogue Management Ling575 Discourse and Dialogue May 18, 2011.

Dialog Management Types

Finite-State Dialog Management
Frame-based Dialog Management
  Initiative
  VoiceXML
  Design and evaluation
Information State Management
  Dialogue Acts
    Recognition & generation
Statistical Dialogue Management (POMDPs)

Finite-State Management

Pros and Cons

Advantages
  Straightforward to encode
  Clear mapping of interaction to model
  Well-suited to simple information access
  System initiative
Disadvantages
  Limited flexibility of interaction
    Constrained input: single item
    Fully system controlled
    Restrictive dialogue structure, order
  Ill-suited to complex problem-solving
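A minimal sketch of the finite-state idea, assuming a hypothetical three-question travel dialogue (the state names, prompts and slots are illustrative, not from the slides). Each state asks one question, accepts one item, and moves to a fixed successor, which shows both the simplicity and the rigidity listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    prompt: str      # what the system says in this state
    slot: str        # the single item this state collects
    next_state: str  # fixed successor; the user cannot change the order


# Hypothetical states for illustration only.
STATES = {
    "ask_origin": State("Which city are you leaving from?", "origin", "ask_dest"),
    "ask_dest": State("Where are you going?", "destination", "ask_date"),
    "ask_date": State("What day do you want to travel?", "date", "done"),
}


def run_dialogue(answers, start="ask_origin"):
    """Walk the state machine, consuming exactly one user answer per state."""
    collected = {}
    transcript = []
    state_name = start
    answers = iter(answers)
    while state_name != "done":
        state = STATES[state_name]
        transcript.append(state.prompt)
        collected[state.slot] = next(answers)  # one item per turn, system-led
        state_name = state.next_state
    return collected, transcript
```

Calling `run_dialogue(["Seattle", "Boston", "Friday"])` fills the three slots in the order the system dictates; a user who volunteers the date first cannot be accommodated without adding states.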

Frame-based Dialogue Management

Finite-state too limited, stilted, irritating

More flexible dialogue

Frame-based Dialogue Management

Essentially form-filling
  User can include any/all of the pieces of form
  System must determine which entered, remain
System may have multiple frames
  E.g. flights vs restrictions vs car vs hotel
  Rules determine next action, question, information presentation
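The form-filling loop can be sketched as follows. This is an illustration under stated assumptions: the slot names, questions and regular-expression patterns are hypothetical stand-ins for a real grammar or language-understanding component.

```python
import re

# Hypothetical flight frame: slot name -> question asked while the slot is empty.
FLIGHT_FRAME = {
    "origin": "Which city are you leaving from?",
    "destination": "Where are you flying to?",
    "date": "What day do you want to travel?",
}

# Toy patterns standing in for a real grammar (assumption for illustration).
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)", re.IGNORECASE),
    "destination": re.compile(r"\bto (\w+)", re.IGNORECASE),
    "date": re.compile(r"\bon (\w+)", re.IGNORECASE),
}


def fill_slots(frame_values, utterance):
    """Record every slot the utterance mentions, in whatever order it appears."""
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            frame_values[slot] = match.group(1)
    return frame_values


def next_action(frame_values):
    """Rule: ask about the first unfilled slot; otherwise present information."""
    for slot, question in FLIGHT_FRAME.items():
        if slot not in frame_values:
            return question
    return "Searching flights from {origin} to {destination} on {date}.".format(
        **frame_values
    )
```

Unlike the finite-state version, one utterance such as "A flight to Boston from Seattle" fills two slots at once, and the system only asks for what remains.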

Frames and Initiative

Mixed initiative systems:
  A) User/System can shift control arbitrarily, any time
    Difficult to achieve
  B) Mix of control based on prompt type

Prompts:
  Open prompt: ‘How may I help you?’
    Open-ended, user can respond in any way
  Directive prompt: ‘Say yes to accept call, or no otherwise.’
    Stipulates user response type, form

Initiative, Prompts, Grammar

Prompt type tied to active grammar
  System must recognize suitable input
  Restrictive vs open-ended
Shift from restrictive to open
  Tune to user: Novice vs Expert

Dialogue Management: Confirmation

Miscommunication common in SDS
  “Error spirals” of sequential errors
    Highly problematic
  Recognition, recovery crucial
Confirmation strategies can detect, mitigate
  Explicit confirmation:
    Ask for verification of each input
  Implicit confirmation:
    Include input information in subsequent prompt

Confirmation Strategies: Explicit

Confirmation Strategies: Implicit

Pros and Cons

Grounding of user input
  Weakest grounding insufficient
    I.e. continued attention, next relevant contribution
  Explicit: highest: repetition
  Implicit: demonstration, display
Explicit:
  Pro: easier to correct; Con: verbose, awkward, non-
Implicit:
  Pro: more natural, efficient; Con: less easy to correct

Rejection

System recognition confidence is too low
System needs to reprompt
  Often repeatedly
Out-of-vocabulary, out-of-grammar inputs
Strategies: Progressive prompting
  Initially: ‘rapid reprompting’: ‘What?’, ‘Sorry?’
  Later: increasing detail

Progressive prompting
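Rejection with progressive prompting might look like the sketch below. The prompt wording and the confidence threshold are assumptions chosen for illustration; a deployed system would tune both.

```python
# Hypothetical prompt ladder: terse at first, more detailed after each rejection.
REPROMPTS = [
    "Sorry?",
    "Sorry, I didn't catch that. What city are you leaving from?",
    "Please say just the name of the departure city, for example 'Seattle'.",
]

CONFIDENCE_THRESHOLD = 0.5  # assumed value, not from the slides


def respond(confidence, rejections_so_far):
    """Return (accepted, reprompt); reject when ASR confidence is too low."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return True, None
    # Climb the ladder, staying on the most detailed prompt once it is reached.
    level = min(rejections_so_far, len(REPROMPTS) - 1)
    return False, REPROMPTS[level]
```

The first rejection produces a rapid reprompt; repeated rejections move to prompts that spell out what the active grammar expects.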

VoiceXML

W3C standard for simple frame-based dialogues
  Fairly common in commercial settings
Construct forms, menus
  Forms get field data
    Using attached prompts
    With specified grammar (CFG)
    With simple semantic attachments

Simple VoiceXML Example
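The example from the original slide is not preserved in this transcript. As a stand-in, here is a small illustrative VoiceXML 2.0 form held in a Python string and inspected with the standard library. The form id, field names and prompt wording are made up. For brevity the fields use VoiceXML's built-in `boolean` and `digits` types rather than a custom CFG.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Illustrative document only; not the example shown in the lecture.
VXML_DOC = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="accept_call">
    <field name="accept" type="boolean">
      <prompt>Say yes to accept the call, or no otherwise.</prompt>
    </field>
    <field name="callback" type="digits">
      <prompt>What number should we call back?</prompt>
    </field>
  </form>
</vxml>
"""


def list_fields(vxml_text):
    """Return (field name, built-in type, prompt text) for each form field."""
    root = ET.fromstring(vxml_text.strip())
    ns = {"v": VXML_NS}
    fields = []
    for field in root.findall("./v:form/v:field", ns):
        prompt = field.find("v:prompt", ns)
        fields.append((field.get("name"), field.get("type"), prompt.text.strip()))
    return fields
```

Each `field` is a slot of the frame, and its attached `prompt` is what the interpreter plays while that slot is still empty.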

Frame-based Systems: Pros and Cons

Advantages
  Relatively flexible input: multiple inputs, orders
  Well-suited to complex information access (air)
  Supports different types of initiative
Disadvantages
  Ill-suited to more complex problem-solving
    Form-filling applications

Dialogue Manager Tradeoffs

Flexibility vs Simplicity/Predictability
System vs User vs Mixed Initiative
Order of dialogue interaction
Conversational “naturalness” vs Accuracy
Cost of model construction, generalization, learning, etc

Dialog Systems Design

User-centered design approach:
  Study user and task:
    Interview users; record human-human interactions; systems
  Build simulations and prototypes:
    Wizard-of-Oz systems (WOZ): Human replaces system
      Can assess issues in partial system; simulate errors, etc
  Iteratively test on users:
    Redesign prompts (email subdialog)
    Identify need for barge-in

SDS Evaluation

Goal: Determine overall user satisfaction
  Highlight systems problems; help tune
Classically: Conduct user surveys
User evaluation issues:
  Expensive; often unrealistic; hard to get real user to do
Create model correlated with human satisfaction

Criteria:
  Maximize task success
    Measure task completion: % subgoals; Kappa of frame values
  Minimize task costs
    Efficiency costs: time elapsed; # turns; # error correction turns
    Quality costs: # rejections; # barge-in; concept error rate
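The Kappa task-success measure corrects raw agreement for chance: κ = (P(A) − P(E)) / (1 − P(E)). The sketch below computes it from a confusion matrix of frame values. Estimating chance agreement P(E) from the column totals is one common formulation and is an assumption here, as is the layout of the matrix.

```python
def kappa(confusion):
    """Kappa for a square confusion matrix of slot-value counts.

    confusion[i][j] counts slots where one source gave value i and the
    other gave value j; the diagonal holds the agreements.
    """
    total = sum(sum(row) for row in confusion)
    # P(A): observed proportion of agreement.
    p_agree = sum(confusion[i][i] for i in range(len(confusion))) / total
    # P(E): agreement expected by chance, from column proportions (assumption).
    column_totals = [sum(col) for col in zip(*confusion)]
    p_chance = sum((t / total) ** 2 for t in column_totals)
    return (p_agree - p_chance) / (1 - p_chance)
```

A system that always matches the reference frame scores 1.0; one that matches only as often as chance predicts scores 0.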

PARADISE Model

Compute user satisfaction with questionnaires
Extract task success and costs measures from corresponding dialogs
  Automatically or manually
Perform multiple regression:
  Assign weights to all factors of contribution to user satisfaction (Usat)
  Task success, Concept accuracy key
Allows prediction of accuracy on new dialog w/Q&A
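The regression step can be sketched with the standard library alone. This is a minimal illustration, not the PARADISE authors' implementation: it z-score normalizes each measure so the weights are comparable, then solves the least-squares normal equations. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(column):
    """Rescale one measure to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def solve(a, b):
    """Solve a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(features, satisfaction):
    """Least-squares weights, intercept first, over z-normalized measures.

    features: one tuple per dialogue, e.g. (kappa, number_of_turns).
    satisfaction: the questionnaire score for each dialogue.
    """
    columns = [z_normalize(list(col)) for col in zip(*features)]
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, satisfaction)) for i in range(k)]
    return solve(xtx, xty)
```

A positive weight on task success and negative weights on cost measures would match the intuition behind the criteria above; the fitted weights can then score new dialogs from their logged measures.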

Information State Dialogue Management

Problem: Not every task is equivalent to form-filling

Real tasks require:
  Proposing ideas, refinement, rejection, grounding, clarification, elaboration, etc
Information state models include:
  Information state
  Dialogue act interpreter
  Dialogue act generator
  Update rules
  Control structure

Information State Systems

Information state:
  Discourse context, grounding state, intentions, plans.
Dialogue acts:
  Extension of speech acts, to include grounding acts
  Request-inform; Confirmation
Update rules
  Modify information state based on DAs
    When a question is asked, answer it
    When an assertion is made, add information to context, grounding state
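The two update rules above can be sketched as a function over a small state record. The field names and act labels are hypothetical; a real information state would also track intentions and plans.

```python
from dataclasses import dataclass, field


@dataclass
class InformationState:
    context: dict = field(default_factory=dict)   # facts established so far
    grounded: list = field(default_factory=list)  # content acknowledged so far
    agenda: list = field(default_factory=list)    # system acts still to perform


def update(state, act, content):
    """Apply one update rule, selected by the incoming dialogue act."""
    if act == "QUESTION":
        # Rule: when a question is asked, plan to answer it.
        state.agenda.append(("ANSWER", content))
    elif act == "ASSERT":
        # Rule: when an assertion is made, add it to context and ground it.
        key, value = content
        state.context[key] = value
        state.grounded.append(key)
        state.agenda.append(("ACKNOWLEDGE", key))
    return state
```

The dialogue act interpreter would supply `act` and `content`, and the generator would realize whatever the rules place on the agenda.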

Information State Architecture

Simple ideas, complex execution

Dialogue Acts

Extension of speech acts
  Adds structure related to conversational phenomena
    Grounding, adjacency pairs, etc
Many proposed tagsets
  Verbmobil: acts specific to meeting scheduling domain
  DAMSL: Dialogue Act Markup in Several Layers
    Forward looking functions: speech acts
    Backward looking function: grounding, answering
  Conversation acts:
    Add turn-taking and argumentation relations

Verbmobil DA

18 high-level tags

Dialogue Act Interpretation

Automatically tag utterances in dialogue

Some simple cases:
  YES-NO-Q: Will breakfast be served on USAir 1557?
  Statement: I don’t care about lunch.
  Command: Show me flights from L.A. to Orlando
Is it always that easy?
  Can you give me the flights from Atlanta to Boston?
    Syntactic form: question; Act: request/command
  Yeah.
    Depends on context: Y/N answer; agreement; back-channel

Dialogue Act Ambiguity

Indirect speech acts

Dialogue Act Recognition

How can we classify dialogue acts?
Sources of information:
  Word information:
    Please, would you: request; are you: yes-no question
    N-gram grammars
  Prosody:
    Final rising pitch: question; final lowering: statement
    Reduced intensity: Yeah: agreement vs backchannel
  Adjacency pairs:
    Y/N question, agreement vs Y/N question, backchannel
    DA bi-grams
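A toy rule-based tagger shows how word cues and the previous act combine. The cue lists and tag names are illustrative assumptions, far smaller than a trained n-gram model, and prosody is left out entirely.

```python
def classify(utterance, previous_act=None):
    """Guess a dialogue act from opening words and the preceding act."""
    text = utterance.lower().strip()
    words = text.rstrip("?.!").split()
    if len(words) == 1 and words[0] in {"yeah", "yes", "uh-huh"}:
        # Adjacency pair: after a yes-no question this is an answer;
        # elsewhere treat it as a backchannel (prosody would be needed
        # to separate backchannel from agreement).
        if previous_act == "YES-NO-QUESTION":
            return "YES-ANSWER"
        return "BACKCHANNEL"
    if text.startswith(("please", "would you", "can you", "show me")):
        # Question syntax, but the act is a request or command.
        return "REQUEST"
    if text.startswith(("are you", "is ", "will ", "do you")):
        return "YES-NO-QUESTION"
    return "STATEMENT"
```

The request check runs before the question check so that "Can you give me the flights from Atlanta to Boston?" is tagged by its function rather than its syntactic form.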

Task & Corpus

Goal:
  Identify dialogue acts in conversational speech
Spoken corpus: Switchboard
  Telephone conversations between strangers
  Not task oriented; topics suggested
  1000s of conversations recorded, transcribed, segmented

Dialogue Act Tagset

Cover general conversational dialogue acts
  No particular task/domain constraints
Original set: ~50 tags
  Augmented with flags for task, conversation management
    220 tags in labeling: some rare
Final set: 42 tags, mutually exclusive
  SWBD-DAMSL
  Agreement: K=0.80 (high)
1,155 conversations labeled: split into train/test

Common Tags

Statement & Opinion: declarative +/- op
Question: Yes/No & Declarative: form, force
Backchannel: Continuers like uh-huh, yeah
Turn Exit/Abandon: break off, +/- pass
Answer: Yes/No, follow questions
Agreement: Accept/Reject/Maybe

Probabilistic Dialogue Models

HMM dialogue models
  States = Dialogue acts; Observations: Utterances
    Assume decomposable by utterance
  Evidence from true words, ASR words, prosody
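With dialogue acts as hidden states and utterances as observations, the best act sequence can be found by Viterbi decoding. The sketch below works in log space; all probabilities in the example are made-up numbers for illustration, not corpus estimates.

```python
import math


def viterbi(likelihoods, transitions, initial):
    """Most probable dialogue-act sequence for a series of utterances.

    likelihoods: one dict per utterance, act -> P(evidence | act).
    transitions: transitions[prev][cur] = P(cur | prev), a DA bigram model.
    initial: initial[act] = P(act) for the first utterance.
    """
    acts = list(initial)
    # best[a] = (log probability of the best path ending in a, that path)
    best = {
        a: (math.log(initial[a]) + math.log(likelihoods[0][a]), [a]) for a in acts
    }
    for obs in likelihoods[1:]:
        new_best = {}
        for cur in acts:
            score, path = max(
                (best[prev][0] + math.log(transitions[prev][cur]), best[prev][1])
                for prev in acts
            )
            new_best[cur] = (score + math.log(obs[cur]), path + [cur])
        best = new_best
    return max(best.values())[1]


# Made-up numbers: "Are you flying today?" followed by "Yeah".
INITIAL = {"QUESTION": 0.5, "ANSWER": 0.1, "BACKCHANNEL": 0.4}
TRANSITIONS = {
    "QUESTION": {"QUESTION": 0.1, "ANSWER": 0.8, "BACKCHANNEL": 0.1},
    "ANSWER": {"QUESTION": 0.4, "ANSWER": 0.2, "BACKCHANNEL": 0.4},
    "BACKCHANNEL": {"QUESTION": 0.4, "ANSWER": 0.1, "BACKCHANNEL": 0.5},
}
LIKELIHOODS = [
    {"QUESTION": 0.8, "ANSWER": 0.1, "BACKCHANNEL": 0.1},
    {"QUESTION": 0.05, "ANSWER": 0.45, "BACKCHANNEL": 0.5},
]
```

In this toy setting "Yeah" looks slightly more like a backchannel in isolation, yet `viterbi(LIKELIHOODS, TRANSITIONS, INITIAL)` labels it an answer because the DA bigram strongly favors an answer after a question.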

DA Classification - Prosody

Features:
  Duration, pause, pitch, energy, rate, gender
  Pitch accent, tone
Results:
  Decision trees: 5 common classes
    45.4% (baseline: 16.6%)

Prosodic Decision Tree

DA Classification - Words

Words
  Combines notion of discourse markers and collocations: e.g. uh-huh=Backchannel
  Contrast: true words, ASR 1-best, ASR n-best
Results:
  Best: 71% true words, 65% ASR 1-best

DA Classification - All

Combine word and prosodic information
  Consider case with ASR words and acoustics
  Prosody classified by decision trees
    Incorporate decision tree posteriors in model for P(f|d)
  Slightly better than raw ASR

Integrated Classification

Focused analysis
  Prosodically disambiguated classes
    Statement/Question-Y/N and Agreement/Backchannel
    Prosodic decision trees for agreement vs backchannel
      Disambiguated by duration and loudness
  Substantial improvement for prosody+words
    True words: S/Q: 85.9% -> 87.6%; A/B: 81.0% -> 84.7%
    ASR words: S/Q: 75.4% -> 79.8%; A/B: 78.2% -> 81.7%
    More useful when recognition is iffy

Many Variants

Maptask: (13 classes)
  Serafin & Di Eugenio 2004
    Latent Semantic analysis on utterance vectors
    Text only
    Game information; No improvement for DA history
  Surendran & Levow 2006
    SVMs on term n-grams, prosody
    Posteriors incorporated in HMMs
      Prosody, sequence modeling improves
MRDA: Meeting tagging: 5 broad classes

Observations

DA classification can work on open domain
  Exploits word model, DA context, prosody
  Best results for prosody+words
  Words are quite effective alone, even ASR
Questions:
  Whole utterance models? More fine-grained
  Longer structure, long term features

Detecting Correction Acts

Miscommunication is common in SDS
  Utterances after errors misrecognized >2x as often
  Frequently repetition or paraphrase of original input
Systems need to detect, correct
Corrections are spoken differently:
  Hyperarticulated (slower, clearer) -> lower ASR conf.
  Some word cues: ‘No’, ‘I meant’, swearing..
Can train classifiers to recognize with good acc.

Generating Dialogue Acts

Generation neglected relative to recognition
Stent (2002) model: Conversation acts, Belief model
  Develops update rules for content planning, e.g.
    If user releases turn, system can do ‘TAKE-TURN’ act
    If system needs to summarize, use ASSERT act
  Identifies turn-taking as key aspect of dialogue gen.

Generating Confirmation

Simple systems use fixed confirmation strategy
  Implicit or explicit
More complex systems can select dynamically
  Use information state and features to decide
    Likelihood of error: Low ASR confidence score
      If very low, can reject
    Sentence/prosodic features: longer, initial pause, pitch range
    Cost of error:
      Book a flight vs looking up information
Markov Decision Process models more detailed
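Dynamic selection can be sketched as a rule over ASR confidence and the cost of an error. The thresholds, cost labels and prompt wording below are assumptions for illustration; a Markov Decision Process would learn such a policy rather than hard-code it.

```python
def choose_confirmation(asr_confidence, error_cost,
                        reject_below=0.3, explicit_below=0.7):
    """Pick 'reject', 'explicit' or 'implicit'. Thresholds are assumed values."""
    if asr_confidence < reject_below:
        return "reject"  # too unreliable to confirm at all
    if error_cost == "high" or asr_confidence < explicit_below:
        return "explicit"  # e.g. booking a flight, or a shaky recognition
    return "implicit"  # e.g. looking up information with a confident result


def render(strategy, value, next_question):
    """Turn the chosen strategy into a system prompt (hypothetical wording)."""
    if strategy == "reject":
        return "Sorry, I didn't catch that. Could you repeat it?"
    if strategy == "explicit":
        return f"Did you say {value}?"
    # Implicit: fold the recognized value into the next prompt.
    return f"OK, {value}. {next_question}"
```

A confident recognition of "Boston" in a lookup task yields an implicit confirmation, while the same recognition in a booking task is confirmed explicitly because a mistake costs more.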
