Stochastic Language Generation for Spoken Dialog Systems

Alice Oh

[email protected]

School of Computer Science
Language Technologies Institute

Carnegie Mellon University

April 19, 2023 Speech Group 2

CarnegieMellon

Big Question

How can we design a good NLG system for spoken dialog systems?


What is NLG?

Natural Language Understanding (NLU): Text -> Semantic (Syntactic) Representation

Natural Language Generation (NLG): Semantic (Syntactic) Representation -> Text


Example of NLG

{
  act query
  content name
}
=> What is your full name?

(Carnegie Mellon Communicator)


Example of NLG

cat clause
process       type material
              effect-type creative
              lex ‘score’
              tense past
participants  agent    cat proper
                       head  cat person-name
                             first-name [lex ‘Michael’]
                             last-name [lex ‘Jordan’]
              created  cat np
                       cardinal [value 36]
                       definite no
                       head [lex ‘point’]

=> Michael Jordan scored 36 points.

(FUF/SURGE Elhadad and Robin, 1996)


What is a good NLG system?

High quality output? Write like Shakespeare? Talk like … the presidential candidates? Or… just produce grammatical sentences?

Reusable? Portable? What about development & maintenance?


What is a spoken dialog system?

A task-oriented human-computer interaction via natural spoken dialog
  CMU Communicator: complex travel planning system
  Jupiter: worldwide weather report

What is not a spoken dialog system?
  C-STAR speech-to-speech translation system
  American Airlines flight information system (IVR)


NLG in spoken dialog systems

Language is different from text-based applications
  Shorter in length
  Simpler in structure
  Less strict in following grammatical rules

Lexicon is domain-specific


Communicator Project

A spoken dialog system in which users engage in a telephone conversation with the system using natural language to solve a complex travel reservation task

Components
  Sphinx-II speech recognizer
  Phoenix semantic parser
  Domain agents
  Agenda-based dialog manager
  Stochastic natural language generator
  Festival domain-dependent Text-to-Speech (being integrated)

Want to know more? Call toll-free at 1-877-CMU-PLAN


Problem Statement

Problem: build a generation engine for a dialog system that can combine the advantages, as well as overcome the difficulties, of the two dominant approaches (template-based generation, and grammar rule-based NLG)

Our Approach: design a corpus-driven stochastic generation engine that takes advantage of the characteristics of task-oriented conversational systems. Some of those characteristics:
  Spoken utterances are much shorter in length
  There are well-defined subtopics within the task, so the language can be selectively modeled


Stochastic NLG: overview

Language Model: an n-gram language model of domain expert’s language built from a corpus of travel reservation dialogs

Generation: given an utterance class, randomly generates a set of candidate utterances based on the LM distributions

Scoring: based on a set of heuristics, scores the candidates and picks the best one

Slot filling: substitute slots in the utterance with the appropriate values in the input frame


Stochastic NLG: overview

[Figure: generation pipeline]

Tagged Corpora -> Language Models
Dialog Manager -> Input Frame:
  {
    act query
    content depart_time
    depart_date 20000501
  }
Generation (Input Frame + Language Models) -> Candidate Utterances:
  What time on {depart_date}?
  At what time would you be leaving {depart_city}?
Scoring -> Best Utterance
Slot Filling -> Complete Utterance:
  What time on Mon, May 8th?
Complete Utterance -> TTS



[Figure: system architecture. Components: Sphinx ASR, Phoenix Parser, Dialog Manager, Backend Modules, Stochastic NLG, Festival Synthesis. Data passed between components: speech signal, words, semantic frames, queries, data.]


Stochastic NLG: Corpora

Human-Human dialogs in travel reservations (CMU-Leah, SRI-ATIS/American Express dialogs)

Corpus   Speaker   # of Dialogs   # of Utterances   # of Words
CMU      Agent     39             970               12852
CMU      User      39             946               7848
SRI      Agent     68             2245              27695
SRI      User      68             2060              17995


Example

Utterances in Corpus:
  What time do you want to depart {depart_city}?
  What time on {depart_date} would you like to depart?
  What time would you like to leave?
  What time do you want to depart on {depart_date}?

Output (different from corpus):
  What time would you like to depart?
  What time on {depart_date} would you like to depart {depart_city}?
  *What time on {depart_date} would you like to depart on {depart_date}?
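
One way to see why outputs like these can differ from the corpus: a bigram model only remembers which token pairs occurred, so any utterance stitched together from observed pairs is possible. The sketch below is a minimal illustration under that assumption (the slides do not fix n for this example); `tokenize`, `bigrams`, `licensed` and the `<s>`/`</s>` markers are hypothetical names, not part of the Communicator code.

```python
import re

# The four corpus utterances and three outputs from the slide.
CORPUS = [
    "What time do you want to depart {depart_city}?",
    "What time on {depart_date} would you like to depart?",
    "What time would you like to leave?",
    "What time do you want to depart on {depart_date}?",
]
OUTPUTS = [
    "What time would you like to depart?",
    "What time on {depart_date} would you like to depart {depart_city}?",
    "What time on {depart_date} would you like to depart on {depart_date}?",
]


def tokenize(utterance):
    """Split into slot tags such as {depart_city}, words, and punctuation."""
    return re.findall(r"\{\w+\}|\w+|[?.!,]", utterance)


def bigrams(tokens):
    """Consecutive token pairs, padded with start and end markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))


# Every bigram observed anywhere in the corpus.
SEEN = {pair for utt in CORPUS for pair in bigrams(tokenize(utt))}


def licensed(utterance):
    """True if every bigram of the utterance was observed in CORPUS."""
    return all(pair in SEEN for pair in bigrams(tokenize(utterance)))


for out in OUTPUTS:
    print(licensed(out), out in CORPUS, out)
```

All three outputs are licensed by the bigram table even though none is a corpus utterance, including the starred one with the repeated slot, which is why a separate scoring step is needed.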


Evaluation

[Figure: evaluation procedure. Transcription yields dialogs; batch-mode generation by the Stochastic NLG and the Template NLG yields dialogs with output S and dialogs with output T; the two sets go to comparative evaluation.]


Preliminary Evaluation

Batch-mode generation using two systems, comparative evaluation of output by human subjects

User Preferences (49 utterances total)

Weak preference for Stochastic NLG (p = 0.18)

subject   stochastic   templates   difference
1         41           8           33
2         34           15          19
3         17           32          -15
4         32           17          15
5         30           17          13
6         27           19          8
7         8            41          -33

average   27           21.29       5.71
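
The averages in the table can be rechecked from the per-subject counts. The sketch below only redoes that arithmetic; it does not reproduce the p-value, since the slide does not name the test used. Note that for subjects 5 and 6 the two counts sum to 47 and 46 rather than 49; the slide does not say why.

```python
# (subject, utterances preferred from stochastic NLG, from template NLG)
rows = [
    (1, 41, 8),
    (2, 34, 15),
    (3, 17, 32),
    (4, 32, 17),
    (5, 30, 17),
    (6, 27, 19),
    (7, 8, 41),
]

diffs = [s - t for _, s, t in rows]
n = len(rows)
avg_s = sum(s for _, s, _ in rows) / n
avg_t = sum(t for _, _, t in rows) / n
avg_d = sum(diffs) / n

print(round(avg_s, 2), round(avg_t, 2), round(avg_d, 2))
# prints: 27.0 21.29 5.71
```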


Stochastic NLG: Advantages

corpus-driven
easy to build (minimal knowledge engineering)
fast prototyping
minimal input (speech act, slot values)
natural output
leverages data-collecting/tagging effort


Open Issues

How big of a corpus do we need?
How much of it needs manual tagging?
How does the n in n-gram affect the output?
What happens to output when two different human speakers are modeled in one model?
Can we replace “scoring” with a search algorithm?

Extra Slides


Current Approaches

Traditional (rule-based) NLG
  hand-crafted generation grammar rules and other knowledge
  input: a very richly specified set of semantic and syntactic features
  Example*
    (h / |possible<latent|
       :domain (h2 / |obligatory<necessary|
                  :domain (e / |eat,take in|
                             :agent you
                             :patient (c / |poulet|))))
    => You may have to eat chicken

Template-based NLG
  simple to build
  input: a dialog act, and/or a set of slot-value pairs

* from a Nitrogen demo website, http://www.isi.edu/natural-language/projects/nitrogen/


Stochastic NLG can also be thought of as a way to automatically build templates from a corpus

If you set n equal to a large enough number, most utterances generated by LM-NLG will be exact duplicates of the utterances in the corpus.
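
The effect of n can be tried on a toy corpus. The sketch below is an assumption-laden illustration, not the Communicator implementation: `train` and `sample` are hypothetical helpers, the corpus is four utterances from an earlier slide, and sampling follows raw n-gram counts with no smoothing. Once the context is longer than every utterance, each context pins down the whole prefix, so every sample is a corpus utterance.

```python
import random
import re
from collections import Counter, defaultdict

CORPUS = [
    "What time do you want to depart {depart_city}?",
    "What time on {depart_date} would you like to depart?",
    "What time would you like to leave?",
    "What time do you want to depart on {depart_date}?",
]


def tokenize(utterance):
    """Split into slot tags, words, and punctuation."""
    return re.findall(r"\{\w+\}|\w+|[?.!,]", utterance)


def train(corpus, n):
    """Map each (n-1)-token context to a Counter of the tokens that follow it."""
    model = defaultdict(Counter)
    for utt in corpus:
        padded = ["<s>"] * (n - 1) + tokenize(utt) + ["</s>"]
        for i in range(n - 1, len(padded)):
            model[tuple(padded[i - n + 1:i])][padded[i]] += 1
    return model


def sample(model, n, rng, max_len=30):
    """Draw one utterance token by token from the n-gram counts."""
    context = ["<s>"] * (n - 1)
    out = []
    while len(out) < max_len:  # guard against loops such as repeated slots
        counts = model[tuple(context)]
        token = rng.choices(list(counts), weights=list(counts.values()))[0]
        if token == "</s>":
            break
        out.append(token)
        context = (context + [token])[1:]
    return out


rng = random.Random(0)
corpus_tokens = {tuple(tokenize(u)) for u in CORPUS}
for n in (2, 3, 12):
    model = train(CORPUS, n)
    samples = [tuple(sample(model, n, rng)) for _ in range(200)]
    novel = sum(s not in corpus_tokens for s in samples)
    print(f"n={n}: {novel} of {len(samples)} samples are not in the corpus")
```

With n=12 the count of novel samples is 0 for this corpus, because no utterance has more than 10 tokens; for small n the count depends on the random draws.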


Tagging

CMU corpus tagged manually
SRI corpus tagged semi-automatically using trigram language models built from CMU corpus


Tags

Utterance classes (29)
  query_arrive_city          inform_airport
  query_arrive_time          inform_confirm_utterance
  query_arrive_time          inform_epilogue
  query_confirm              inform_flight
  query_depart_date          inform_flight_another
  query_depart_time          inform_flight_earlier
  query_pay_by_card          inform_flight_earliest
  query_preferred_airport    inform_flight_later
  query_return_date          inform_flight_latest
  query_return_time          inform_not_avail
  hotel_car_info             inform_num_flights
  hotel_hotel_chain          inform_price
  hotel_hotel_info           other
  hotel_need_car
  hotel_need_hotel
  hotel_where

Attributes (24)
  airline            flight_num
  am                 hotel
  arrive_airport     hotel_city
  arrive_city        hotel_price
  arrive_date        name
  arrive_time        num_flights
  car_company        pm
  car_price          price
  connect_airline
  connect_airport
  connect_city
  depart_airport
  depart_city
  depart_date
  depart_time
  depart_tod


Stochastic NLG: Generation

Given an utterance class, randomly generates a set of candidate utterances based on the LM distributions

Generation stops when an utterance has a penalty score of 0 or the maximum number of iterations (50) has been reached

Average generation time: 75 msec for Communicator dialogs
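
The stopping rule above can be sketched as a small loop. This is an illustration only: `generate_best`, `sample_candidate` and `penalty` are hypothetical names, and the candidate source and penalty function are passed in as stubs rather than modeled on the real system.

```python
import itertools
from typing import Callable, List, Tuple

MAX_ITERATIONS = 50  # cap stated on the slide


def generate_best(
    sample_candidate: Callable[[], List[str]],
    penalty: Callable[[List[str]], int],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[List[str], int]:
    """Sample until a candidate has penalty 0 or the cap is reached.

    Returns the lowest-penalty candidate seen and its penalty.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    best = sample_candidate()
    best_penalty = penalty(best)
    for _ in range(max_iterations - 1):
        if best_penalty == 0:
            break  # a penalty-free utterance ends generation early
        candidate = sample_candidate()
        score = penalty(candidate)
        if score < best_penalty:
            best, best_penalty = candidate, score
    return best, best_penalty


# Stub candidate source: alternates between two fixed utterances.
stub = itertools.cycle([
    "At what time would you be leaving {depart_city} ?".split(),
    "What time on {depart_date} ?".split(),
])
best, score = generate_best(
    lambda: next(stub),
    lambda tokens: int("{depart_city}" in tokens),  # toy penalty
)
print(" ".join(best), score)
# prints: What time on {depart_date} ? 0
```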


Stochastic NLG: Scoring

Assign various penalty scores for
  unusual length of utterance (thresholds for too-long and too-short)
  slot in the generated utterance with an invalid (or no) value in the input frame
  a “new” and “required” attribute in the input frame that’s missing from the generated utterance
  repeated slots in the generated utterance

Pick the utterance with the lowest penalty (or stop generating at an utterance with 0 penalty)
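
A minimal sketch of such a penalty function follows. The slide gives the four rules but not their weights or thresholds, so every number here is an assumption: each violation costs 1, and `min_len`, `max_len` and the `required` argument (standing in for the "new" and "required" attributes) are hypothetical.

```python
import re
from collections import Counter

SLOT = re.compile(r"\{(\w+)\}")


def penalty(utterance, frame, required=(), min_len=3, max_len=20):
    """Sum of heuristic penalties; 0 means no rule fired."""
    score = 0
    n_tokens = len(utterance.split())
    if n_tokens < min_len or n_tokens > max_len:
        score += 1  # unusual length
    slots = Counter(SLOT.findall(utterance))
    for slot, count in slots.items():
        if not frame.get(slot):
            score += 1  # slot has no usable value in the input frame
        if count > 1:
            score += 1  # same slot repeated
    for attr in required:
        if attr not in slots:
            score += 1  # required attribute missing from the utterance
    return score


frame = {"act": "query", "content": "depart_time", "depart_date": "20000501"}
candidates = [
    "At what time would you be leaving {depart_city}?",
    "What time on {depart_date} would you like to depart on {depart_date}?",
    "What time on {depart_date}?",
]
scores = [penalty(c, frame, required=("depart_date",)) for c in candidates]
best = min(candidates, key=lambda c: penalty(c, frame, required=("depart_date",)))
print(scores, best)
# prints: [2, 1, 0] What time on {depart_date}?
```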


Stochastic NLG: Slot Filling

Substitute slots in the utterance with the appropriate values in the input frame

Example:
  What time do you need to arrive in {arrive_city}?
  => What time do you need to arrive in New York?
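
Slot filling amounts to a lookup-and-replace over the `{slot}` tags. A minimal sketch, assuming the frame is a plain dictionary and that values are inserted verbatim (`fill_slots` is a hypothetical helper; any value formatting, such as turning a date code into spoken form, is left out):

```python
import re

SLOT = re.compile(r"\{(\w+)\}")


def fill_slots(utterance, frame):
    """Replace each {slot} with its value from the frame."""
    def lookup(match):
        name = match.group(1)
        if name not in frame:
            raise KeyError(f"no value for slot {name!r} in input frame")
        return str(frame[name])
    return SLOT.sub(lookup, utterance)


print(fill_slots(
    "What time do you need to arrive in {arrive_city}?",
    {"arrive_city": "New York"},
))
# prints: What time do you need to arrive in New York?
```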


Stochastic NLG: Shortcomings

What might sound natural (imperfect grammar, intentional omission of words, etc.) for a human speaker may sound awkward (or wrong) for the system.

It is difficult to define utterance boundaries and utterance classes. Some utterances in the corpus may be a conjunction of more than one utterance class.

Factors other than the utterance class may affect the words (e.g., discourse history).

Some sophistication built into traditional NLG engines is not available (e.g., aggregation, anaphorization).


Evaluation

Must be able to evaluate generation independent of the rest of the dialog system

Comparative evaluation using dialog transcripts
  need more subjects
  8-10 dialogs; system output generated in batch mode by two different engines

Evaluation of human travel agent utterances
  Do users rate them well?
  Is it good enough to model human utterances?


