POSTECH Dialog-Based Computer Assisted Language Learning System


Intelligent Software Lab., POSTECH
Prof. Gary Geunbae Lee

Contents
Introduction
Methods
- DB-CALL System: Example-based Dialog Modeling, Feedback Generation, Translation Assistance, Comprehension Assistance
- Language Learner Simulation: User Simulation, Grammar Error Simulation

Discussion

RESEARCH BACKGROUND

BACKGROUND

• Globalization makes English ever more important as a world language
• Native-speaker tutors are extremely expensive
• Most language learning software is dedicated to pronunciation practice
• Dialog-based computer-assisted language learning (DB-CALL) can be an excellent solution

ISSUES

• A DB-CALL system should understand students' poor, non-native expressions
• A DB-CALL system should have high domain scalability to support various practical scenarios
• A DB-CALL system should provide educational functionality that helps students improve their linguistic ability

PREVIOUS WORKS ON DB-CALL
Let's Go (CMU, 02-04)
- Providing bus schedule information for CMU non-native students
- Adapting the acoustic and language models to non-native speakers
- Edit-distance based corrective feedback

PREVIOUS WORKS ON DB-CALL

SPELL (Edinburgh, 05)
- Restaurant domain
- Scenario-based virtual space
- Incorporating mal-rules into the ASR grammar

PREVIOUS WORKS ON DB-CALL

DEAL (KTH, 07)
- Trade domain
- Finite-state-network-based limited dialog management
- When learners get stuck, the system provides hints

POSTECH DB-CALL System

[Architecture diagram] A web crawler feeds two extractors. A description extractor builds an expression-description DB (Example 1 / Description 1, Example 2 / Description 2, ...). A parallel sentence extractor stores bilingual examples in an XML format:

<parallel><source>...</source><target>...</target></parallel>
<Alignment><s2t>...</s2t><t2s>...</t2s><composition>...</composition></Alignment>
<Additional><url>...</url></Additional>

Both resources back ESL dialog tutoring: as the tutor and user alternate turns, the system uses the user input to suggest expressions ("Try this expression") together with Korean/English example pairs and descriptions.

DB-CALL System

1. Example-based Dialog Modeling

INTRODUCTION
Spoken Dialog System
Applications: Human-Robot Interface, Telematics, Tutoring, ...

PROBLEM & GOAL PROBLEM

How to determine the next system action Knowledge-based approach

Plan recipe / ISU rule / Agenda Data-driven approach

Statistical approach Supervised Learning based on state approximation Reinforcement Learning based on MDP/POMDP

Example-based approach GOAL

To develop a simple and practical approach to dia-log modeling for multi-domain dialog systems

IDEA

Dialog State Space
Domain = Building_Guidance
Dialog Act = WH-QUESTION
Main Goal = SEARCH-LOC
ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled), LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0
Previous Dialog Act = <s>, Previous Main Goal = <s>
Discourse History Vector = [1,0,0,0,0]
Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ? ("Where is the ROOM_TYPE?")
System Action = inform(Floor)

Dialog Corpus
Turn #1 (Domain = Building_Guidance)
USER: 회의실이 어디지? ("Where is the meeting room?")
[Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = 회의실 (meeting room)]
SYSTEM: 3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다. ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.") [System Action = inform(Floor)]

Each dialog example is indexed using semantic & discourse features, and the system retrieves the example whose state is most similar to the current one:

e* = argmax_{e_i ∈ E} S(e_i, h)

where E is the example DB, S is the example-state similarity measure, and h is the current dialog state and history.

Lee et al., (2006), A Situation-based Dialogue Management using Dialogue Examples, IEEE ICASSP

ALGORITHM
Query Generation: Making an SQL statement using the discourse history and SLU results.
Example Search: Searching for semantically close dialog examples in the example DB given the current dialog state.
Example Selection: Selecting the best example to maximize the utterance similarity measure based on lexical and discourse information.
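To make the three steps concrete, here is a minimal Python sketch of the loop; the data structures, field names, and relaxation strategy are illustrative assumptions, not the actual POSTECH implementation.

```python
from difflib import SequenceMatcher

def generate_query(slu, history):
    """Step 1: turn SLU results + discourse history into search constraints."""
    return {"dialog_act": slu["dialog_act"],
            "main_goal": slu["main_goal"],
            "prev_act": history[-1]["dialog_act"] if history else "<s>"}

def search_examples(example_db, query):
    """Step 2: match on semantic/discourse keys; relax the discourse
    constraint if nothing matches (a simple relaxation strategy)."""
    hits = [e for e in example_db if all(e.get(k) == v for k, v in query.items())]
    if not hits:  # relaxation: drop the discourse feature and retry
        relaxed = {k: v for k, v in query.items() if k != "prev_act"}
        hits = [e for e in example_db if all(e.get(k) == v for k, v in relaxed.items())]
    return hits

def select_example(candidates, utterance):
    """Step 3: pick the candidate whose utterance pattern is lexically
    closest to the input, i.e. e* = argmax_{e_i in E} S(e_i, h)."""
    return max(candidates,
               key=lambda e: SequenceMatcher(None, e["pattern"], utterance).ratio(),
               default=None)
```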

[Architecture diagram: noisy input from ASR/SLU → Query Generation → Example Search → Example Selection → NLG with system templates, supported by the Example DB, Content DB, Discourse History, and a Relaxation Strategy]

EXPERIMENTAL RESULTS
Real user evaluation: 10 undergraduates
Evaluation Metrics:
- STR (Success Turn Rate) = # of successful turns / # of total turns
- TCR (Task Completion Rate) = # of successful dialogs / # of total dialogs
- AvgUserTurn = average number of user turns per dialog

Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM

System              | #Dialogs | AvgUserTurn | STR(%) | TCR(%)
Car Navigation      | 50       | 4.54        | 86.25  | 92.00
Weather Information | 50       | 4.46        | 89.01  | 94.00
EPG                 | 50       | 4.50        | 83.99  | 90.00
Chatbot             | 50       | 5.60        | 64.31  | -
Multi-domain        | 15       | 6.08        | 78.77  | 86.67

EXPERIMENTAL RESULTS

Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM

System              | Exact match | Partial match | No example
Car Navigation      | 50.22       | 44.49         | 5.29
Weather Information | 69.49       | 25.00         | 5.51
EPG                 | 58.33       | 37.22         | 4.45
Chatbot             | 50.71       | 14.29         | 35.00
Multi-domain        | 69.23       | 24.62         | 6.15

Example match rate (%) of each dialog system

ROBUST DIALOG MANAGEMENT
PROBLEM: How to overcome errors in the real world

ROBUST DIALOG MANAGEMENT
Error handling: recovering from ASR/SLU errors by interacting with the user at the conversational level
N-best support: estimating the current state under uncertainty

[Pipeline diagram: ASR (noise reduction, adaptation, N-best & lattice & confusion networks) → SLU (robust parsing, data-driven approaches) → DM (error handling, N-best support), with errors introduced at each stage]

Lee et al., (2008), Robust Dialog Management with N-best Hypotheses Using Dialog Examples and Agenda, ACL

GOAL & IDEA To increase the robustness of EBDM with prior

knowledge1) Error HandlingIf the system knows what the user will do next

Dynamic Help Generation

LOCATION

OFFICE PHONE NUMBER

ROOM ROLE

GUIDE

FOCUS NODE

NEXT_TASK

AgendaHelpS: Next, you can do the subtask 1) Asking the room's role, or 2)Asking the office phone num-ber, or 3) Selecting the desired room for navi-gation.

UtterHelpS: Next, you can say 1) “What is it?”, or 2) “What’s the phone number of [ROOM_NAME]?”, or 3) “ Let’s go there.

GOAL & IDEA To increase the robustness of EBDM with prior

knowledge2) N-best supportIf the system knows which subtask will be more probable next

Rescoring N-best hypotheses (h1~hn)

LOCATION

OFFICE PHONE NUMBER

FLOOR

ROOMNAME

h2

h1

h3

h4

Subtask System Utterance System Action

LOCATION The director’s room is Room No. 201.

Inform(RoomNumber)

N-best User Utterances Subtask P(hi|S)

U1 (h1) What are office rooms in this building?

ROOM NAME 0.2

U2 (h2) What is the floor? FLOOR 0.4

U3 (h3) Where is it? LOCATION 0.3

U4 (h4) What is the phone num-ber?

OFFICEPHONE NUMBER

0.5(More proba-

ble)
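A small sketch of how such rescoring could combine ASR confidence with the agenda prior; the interpolation weight and score combination are assumptions, since the slide only shows the agenda-based P(hi|S).

```python
# Illustrative agenda-based N-best rescoring (not the paper's exact formula).
def rescore_nbest(hypotheses, subtask_prior, alpha=0.5):
    """Combine each hypothesis's ASR confidence with the agenda's
    prior P(subtask | state), then rerank."""
    scored = [(alpha * h["asr_conf"]
               + (1 - alpha) * subtask_prior.get(h["subtask"], 0.0), h)
              for h in hypotheses]
    return [h for _, h in sorted(scored, key=lambda x: x[0], reverse=True)]

# Usage with the slide's example: the agenda makes OFFICE PHONE NUMBER the
# most probable next subtask, so h4 can win even if ASR ranked it lower.
nbest = [{"utt": "What are office rooms in this building?", "subtask": "ROOM NAME",           "asr_conf": 0.30},
         {"utt": "What is the floor?",                      "subtask": "FLOOR",               "asr_conf": 0.28},
         {"utt": "Where is it?",                            "subtask": "LOCATION",            "asr_conf": 0.25},
         {"utt": "What is the phone number?",               "subtask": "OFFICE PHONE NUMBER", "asr_conf": 0.17}]
prior = {"ROOM NAME": 0.2, "FLOOR": 0.4, "LOCATION": 0.3, "OFFICE PHONE NUMBER": 0.5}
best = rescore_nbest(nbest, prior)[0]
```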

ALGORITHM
[Architecture diagram: the N-best ASR words w1..wn become N-best SLU hypotheses u1..un; the EBDM's discourse interpretation maps them to candidate states s1..sn against the agenda graph (nodes V1..V9) and the focus stack, then picks the best node (ArgmaxNode) and the best example ej* among e1..ek (ArgmaxExample)]

EXPERIMENT SET-UP
Simulated user evaluation
- Test set: 1,000 simulated dialogs (< 20 user turns)
- Domain: intelligent robot for building guidance
- Using 5-best recognition hypotheses
Evaluation Metrics:
- TCR = # of successful dialogs / # of total dialogs
- AvgUserTurn = average number of user turns per dialog
- AvgScore = 20 * TCR - AvgUserTurn
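As an illustrative calculation with this metric (numbers invented): a system that completes 92% of its dialogs with an average of 4.5 user turns per dialog scores 20 * 0.92 - 4.5 = 13.9.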

EXPERIMENTAL RESULTS

Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)

[Line chart: average score (y-axis, 3-17) vs. WER (%) (x-axis, 0-50) for the P-E, P-ER, P-EA, and P-EAR methods]

Legend | Method
P-E    | Using only examples
P-ER   | Using examples + recovery
P-EA   | Using examples + agenda graph
P-EAR  | Using examples + agenda graph + recovery

The average score of different methods

EXPERIMENTAL RESULTS

Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)

[Line chart: average score (y-axis, 2-18) vs. n-best size (x-axis, 1-100) at WER 0, 10, and 20]

The average score of the P-EAR system according to n-best size

DEMO VIDEO: PC demo

DEMO VIDEO: Robot demo

2. Feedback Generation

INTRODUCTION
Recast Feedback
Tutoring process:
Tutor: What is the purpose of your trip?
User: My purpose business
Tutor: Sorry, I don't understand. What did you say? (clarification request)
System: Try this expression: "I am here on business" (recast feedback)
User: I am here on business (learner uptake)

INTRODUCTION
Expression Suggestion
Tutoring process:
Tutor: What is the purpose of your trip?
(TIMEOUT: the learner does not respond)
Tutor: Sorry, I can't hear you.
System: Try this expression: "I am here on business" (expression suggestion)
User: I am here on business (learner uptake)

PROBLEMS
How to recognize user intentions despite numerous errors in their utterances
- The mal-rule based technique used in previous studies doesn't work for low-level learners because of multiple errors
- Some utterances even seem to have a meaning that differs from what the learner intended to say
  Intended meaning: When does the bus leave?
  Learner's utterance: Which time I have to leave?
How to choose appropriate user intentions to suggest when a timeout expires
- The system should take the dialog context into consideration, as human tutors do
- Performing intention-based soft pattern matching to generate correct feedback

MATHODS Context-aware & Level-specific Intention

Recognition Intention-based pattern matching

[Architecture diagram: inside the intention recognizer, the learner's utterance is scored by level-specific utterance models (Level 1..N, each trained on level-specific data) together with a dialog state-based model; the recognized intention drives the dialog manager's dialog-state update, and an example search over the example expression DB supplies example expressions for pattern matching against the utterance to generate feedback]
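A minimal sketch of the combination step, assuming the level-specific utterance model and the dialog state-based model each yield a distribution over intentions; the probability tables, names, and interpolation weight below are invented for illustration.

```python
def recognize_intention(utt_probs, state_probs, lam=0.6):
    """Interpolate P(intent | utterance, level) with P(intent | dialog state)
    and return the highest-scoring intention."""
    intentions = set(utt_probs) | set(state_probs)
    return max(intentions,
               key=lambda i: lam * utt_probs.get(i, 0.0)
                             + (1 - lam) * state_probs.get(i, 0.0))

# A beginner's garbled "My purpose business" is ambiguous to the level-1
# utterance model, but the dialog state (the tutor just asked about the
# purpose of the trip) pushes the decision toward answer_purpose.
utt_probs = {"answer_purpose": 0.35, "ask_repeat": 0.33, "other": 0.32}
state_probs = {"answer_purpose": 0.70, "ask_repeat": 0.10, "other": 0.20}
print(recognize_intention(utt_probs, state_probs))  # -> answer_purpose
```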

EXPERIMENT SET-UP
Primitive data set
- Immigration domain
- 192 dialogs, 3,517 utterances (18.32 utterances/dialog)
Annotation
- Manually annotated each utterance with the speaker's intention and component slot-values
- Automatically annotated each utterance with the discourse information

EXPERIMENTAL RESULTS
[Chart: intention recognition accuracy of the utterance model vs. the hybrid model]

EXPERIMENTAL RESULTS
[Charts: level-specific vs. level-ignoring variants of the hybrid and utterance models]

Demo: POSTECH DB-CALL initial version 2008

3. Translation Assistance

Architecture & Example Format
[Architecture diagram: parallel sentence examples are extracted from the Web, indexed by a search engine, and exposed through an interface (function call) that the ESL dialog system and other applications query with an expression]

Example format:
<parallel><source>...</source><target>...</target></parallel>
<Alignment><s2t>...</s2t><t2s>...</t2s><composition>...</composition></Alignment>
<Additional><url>...</url></Additional>

Analysis
Building bilingual examples: word alignment
- Widely used in statistical machine translation (IBM Models 1-5, symmetrization heuristics)
- Word alignment gives a correspondence for each word/phrase in a given bilingual example
- Example word alignment toolkit: GIZA++
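Since GIZA++ itself is a black box here, the following toy IBM Model 1 EM sketch shows the kind of lexical translation table it estimates; the two-pair corpus is invented, and the real toolkit adds Models 2-5, HMM alignment, and symmetrization heuristics.

```python
from collections import defaultdict

corpus = [(["i", "am", "here"], ["나는", "여기", "있다"]),
          (["i", "go"],         ["나는", "간다"])]

t = defaultdict(lambda: 1.0)             # uniform init of t(f|e)
for _ in range(10):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        for f in tgt:                    # E-step: expected alignment counts
            z = sum(t[(f, e)] for e in src)
            for e in src:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():      # M-step: renormalize
        t[(f, e)] = c / total[e]

# After training, t[("나는", "i")] dominates its alternatives, because
# "나는" co-occurs with "i" in both sentence pairs.
```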

4. Comprehension Assistance

INTRODUCTION
Description Suggestion System
[Diagram: an expression-description DB built from an ESL podcast website feeds the dialog system]
English Expression-Description Example Suggestion System
- When the user asks about an unfamiliar English expression, the system presents its description to help understanding
- Pipeline: expression detection → recommended sentence and description

INTRODUCTION
Expression-Description Pair Extraction System
To present an expression example and its description, the system extracts expression-description pairs from the ESL podcast site.

Phrase       | Description
routine test | "... we mean it's a normal, regular test that the doctor runs many, many different times with different patients, not a special test."
treatment    | ""Treatment" is another word for what the doctor gives you or does to you to help you."

EXAMPLE
[Slides: podcast script excerpts shown with their extracted descriptions]

Language Learner Simulation

1. User Simulation

INTRODUCTION
User Simulation for Spoken Dialog Systems
- Developing a 'simulated user' who can replace real users
Applications
- Automated evaluation of spoken dialog systems: detecting potential flaws; predicting overall system behavior
- Learning dialog strategies in a reinforcement learning framework

PROBLEM & GOAL PROBLEM

How to model real user User Intention simulation User Surface simulation ASR channel simulation

GOAL Natural Simulation Diverse Simulation Controllable Simulation

IDEA – User Intention Simulation

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

Dialog is a sequence of behaviors, especially user intentions, so user intention simulation should take various discourse information into account: discourse factors + knowledge + events ...
[Diagram: alternating User/Sys turns across the dialog]

User Intention Simulation - Linear-Chain Conditional Random Field Model
Assumption: a user utterance has only one intention
- UI: user intention state; State = [dialog_act, main_goal, named_entities]
- DI: previous discourse information; system response + discourse history
[Model diagram: a linear chain of UI states across turns, each conditioned on its DI]

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
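A sketch of how such a linear-chain CRF could be set up with sklearn-crfsuite (our choice of library, not the paper's); the features and labels are simplified stand-ins for the DI and UI described above.

```python
import sklearn_crfsuite

def turn_features(di):
    """DI -> feature dict: previous system action plus discourse history."""
    feats = {"prev_sys_action": di["sys_action"]}
    feats.update({f"hist_{slot}": float(filled)
                  for slot, filled in di["history"].items()})
    return feats

# One training dialog: a sequence of DI feature dicts and UI labels.
X = [[turn_features({"sys_action": "<s>",           "history": {"room": 0}}),
      turn_features({"sys_action": "inform(Floor)", "history": {"room": 1}})]]
y = [["WH-QUESTION/SEARCH-LOC", "REQUEST/GUIDE"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
# Simulation draws the next intention given the running discourse context.
print(crf.predict(X)[0])
```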

ALGORITHM

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

User Surface Simulation
PROBLEM: How to generate a user surface utterance that expresses a given user intention
APPROACH: 2-phase user utterance generation
- Phase 1: candidate generation
- Phase 2: rescoring
[Diagram: the utterance model generates candidate utterances (phase 1), which are rescored to select the final utterances (phase 2)]

Phase 1 - Generation: Dialog_Act _X_ Main_Goal
[Model diagram: a chain of structure tags S1..S5 with tag-transition probabilities, each emitting a word W1..W5 with an emission probability]
- Structure tags: component slot names + part-of-speech tags
- S: a structure tag in the given space; W: a vocabulary word in the given space
A sketch of this generation scheme follows.
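A toy sketch of the tag-transition / word-emission generation, with invented probability tables keyed to a single intention; in the real system one model is trained per Dialog_Act _X_ Main_Goal combination.

```python
import random

trans = {"<s>": {"CITY_NAME": 0.7, "VERB": 0.3},       # structure-tag transitions
         "CITY_NAME": {"PARTICLE": 0.8, "VERB": 0.2},
         "PARTICLE": {"VERB": 1.0},
         "VERB": {"</s>": 1.0}}
emit = {"CITY_NAME": {"Seoul": 0.6, "Busan": 0.4},     # word emissions per tag
        "PARTICLE": {"to": 1.0},
        "VERB": {"go": 0.5, "navigate": 0.5}}

def sample(dist):
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point rounding

def generate_candidate():
    tag, words = "<s>", []
    while True:
        tag = sample(trans[tag])          # transition to the next structure tag
        if tag == "</s>":
            return words
        words.append(sample(emit[tag]))   # emit one word for this tag

candidates = [generate_candidate() for _ in range(5)]  # phase-1 output
```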

Phase 2 - Rescoring
PROBLEM: Rescoring and selecting the good utterances
Criteria: human-like utterances; natural word transitions
APPROACH: Structure- and Word-interpolated BLEU (SWB) score
- Note that evaluating system-generated utterances in utterance simulation and in machine translation is essentially the same task

SWB = β * Structure_Sequence_BLEU + (1 - β) * Word_Sequence_BLEU, where 0 ≤ β ≤ 1

We set β to 0.2: Korean is an agglutinative language, so word order is relatively free and the structural grammar can be weighted less.

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
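A sketch of SWB using NLTK's sentence-level BLEU as the underlying n-gram precision metric; NLTK and the smoothing choice are our assumptions, since the slides define SWB abstractly.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def swb(ref_words, hyp_words, ref_tags, hyp_tags, beta=0.2):
    """SWB = beta * BLEU over structure-tag sequences
           + (1 - beta) * BLEU over word sequences."""
    smooth = SmoothingFunction().method1   # short sequences need smoothing
    structure = sentence_bleu([ref_tags], hyp_tags, smoothing_function=smooth)
    word = sentence_bleu([ref_words], hyp_words, smoothing_function=smooth)
    return beta * structure + (1 - beta) * word

# Rescoring keeps the candidates closest to corpus utterances with the
# same intention (toy sequences reused from the phase-1 sketch).
ref_w = ["Seoul", "to", "go"]; ref_t = ["CITY_NAME", "PARTICLE", "VERB"]
hyp_w = ["Busan", "to", "go"]; hyp_t = ["CITY_NAME", "PARTICLE", "VERB"]
print(swb(ref_w, hyp_w, ref_t, hyp_t))
```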

ALGORITHM

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

ASR Channel Simulation
PROBLEM: How to simulate the ASR channel
- Knowledge-based vs. statistical approaches: it is difficult to collect speech data for the target domain
- The simulation should be WER-controllable
APPROACH: linguistic-knowledge-based simulation (sketched after the citation below)
- Step 1: Determine error positions
- Step 2: Generate error types for the error-marked words
- Step 3: Generate ASR errors (substitution, deletion, insertion)
- Step 4: Rescore and select an erroneous utterance

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
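An illustrative sketch of the four-step error injection; the error-type distribution, vocabulary, and substitution choice below are placeholders, not the paper's values.

```python
import random

ERROR_TYPES = {"sub": 0.55, "del": 0.30, "ins": 0.15}   # assumed distribution
VOCAB = ["the", "to", "what", "where", "floor"]          # toy insertion/sub pool

def simulate_asr(words, wer=0.2):
    out = []
    for w in words:
        if random.random() >= wer:                 # step 1: mark error positions
            out.append(w)
            continue
        etype = random.choices(list(ERROR_TYPES), ERROR_TYPES.values())[0]  # step 2
        if etype == "sub":                         # step 3: realize the error
            out.append(random.choice(VOCAB))
        elif etype == "ins":
            out.extend([random.choice(VOCAB), w])  # insert before the marked word
        # "del": drop the word entirely
    return out                                     # step 4 (rescoring) omitted here

print(simulate_asr("where is the meeting room".split(), wer=0.3))
```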

Error Type Distribution
Error types are determined based on published English speech recognition results (Greenberg et al., 2000); we assume that Korean speech recognition generally has a similar error distribution.

Error Generation
- Insertion error: insert a random word before the insertion-error mark
- Deletion error: simply delete the word
- Substitution error: based on a sequence alignment algorithm
  - Syllable- and phone-based alignment: select candidate words from a dictionary and align them with the dynamic alignment algorithm of Needleman and Wunsch (1970)
  - Compute the similarity score:

Similarity = α * Syllable_Alignment_Score + (1 - α) * Phoneme_Alignment_Score, where 0 ≤ α ≤ 1

[Vowel confusion matrix example]
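A minimal Needleman-Wunsch scorer showing how the two alignment levels can be mixed; the scoring constants and the jamo decomposition in the usage example are assumptions.

```python
def nw_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score between two symbol sequences."""
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[m][n]

def similarity(syl_a, syl_b, pho_a, pho_b, alpha=0.5):
    """alpha-weighted mix of syllable-level and phoneme-level alignment."""
    return alpha * nw_score(syl_a, syl_b) + (1 - alpha) * nw_score(pho_a, pho_b)

# e.g. compare 회의 with the candidate 회이 at the syllable level and at the
# phoneme (jamo) level:
print(similarity(["회", "의"], ["회", "이"],
                 ["ㅎ", "ㅚ", "ㅇ", "ㅢ"], ["ㅎ", "ㅚ", "ㅇ", "ㅣ"]))
```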

EXPERIMENT SET-UP
Korean car navigation dialog system
- SLU: Jeong and Lee (2006); DM: Lee et al. (2009)
- Word error rate: 0.0 ~ 0.4
- 5,000 dialog samples at each WER setting

EXPERIMENTAL RESULTS
[Result charts: intention simulation, utterance simulation, ASR channel simulation, and overall behavior prediction, each compared against real user data]

D-BLEU (Discourse BLEU) is a metric for measuring the naturalness of simulated dialogs in the sense of n-gram precision, based on the BLEU metric calculation.

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

2. Grammar Error Simulation

INTRODUCTION
Language learner simulation requires us to invent grammar error simulation on top of the general user simulation.
[Architecture diagram: the language learner simulator chains a user intention simulator, a user utterance simulator, a grammar errors simulator, and an ASR errors simulator; it talks to the dialog system, which chains non-native ASR, SLU, a dialog manager, a system utterance generator, and TTS]

REALISTIC ERROR
Original: He wants to go to a movie theater
He wants to to a movie theater (random corruption)
vs.
He want go to movie theater (realistic learner errors: agreement, preposition, article)

PROBLEMS
How to incorporate expert knowledge about the error characteristics of Korean language learners into the statistical model:
- Subject-verb agreement errors
- Omission errors of the preposition of prepositional verbs
- Omission errors of articles
- Etc.

MARKOV LOGIC NETWORK

Lee and Lee, (2009), Realistic Grammar Error Simulation Using Markov Logic, ACL

METHOD
The generation procedure involves three steps (see the sketch below):
1. Generating a probability distribution over error types for each word through MLN inference
2. Determining an error type by sampling the generated probabilities for each word
3. Creating an ill-formed output sentence by realizing the chosen error types
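A sketch of steps 2 and 3, taking the step-1 MLN output as given; the realization rules cover only the three error types shown here and are deliberately naive.

```python
import random

def realize(words, error_type_probs):
    out = []
    for word, probs in zip(words, error_type_probs):
        etype = random.choices(list(probs), probs.values())[0]  # step 2: sample
        if etype == "v_agr_sub":        # agreement error: naive "wants" -> "want"
            out.append(word.rstrip("s"))
        elif etype in ("prp_lex_del", "at_del"):  # drop preposition / article
            continue
        else:                            # "none": keep the word unchanged
            out.append(word)
    return " ".join(out)                 # step 3: realized ill-formed sentence

words = "He wants to go to a movie theater".split()
# Abbreviated step-1 output (MLN inference): most mass on "none" except
# where the learner model predicts an error.
probs = [{"none": 1.0}, {"v_agr_sub": 0.9, "none": 0.1},
         {"prp_lex_del": 0.7, "none": 0.3}, {"none": 1.0}, {"none": 1.0},
         {"at_del": 0.8, "none": 0.2}, {"none": 1.0}, {"none": 1.0}]
print(realize(words, probs))   # e.g. "He want go to movie theater"
```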

EXAMPLE
Input: He wants to go to a movie theater
Step 1 (inference): MLN inference yields, for each word, a probability distribution over error types (v_agr_sub, prp_lex_del, at_del, none, ...), e.g. wants: v_agr_sub 0.371 / none 0.449; to: prp_lex_del 0.284 / none 0.604; a: at_del 0.355 / none 0.506.
Step 2 (sampling): sampled types: none, v_agr_sub, prp_lex_del, none, none, at_del, none, none
Step 3 (realization): He want go to movie theater

EXPERIMENT SET-UP
Data Sets
- NICT JLE Corpus: the 167 error-annotated files divided into 3 level groups
  - Beginner (levels 1-4): 2,905
  - Intermediate (levels 5-6): 3,296
  - Advanced (levels 7-9): 2,752
Evaluation
- 10-fold cross validation performed for each group
- The validation results were added together across the rounds

EXPERIMENTAL RESULTS: Advanced
D_KL(Real || Proposed) = 0.068 vs. D_KL(Real || Baseline) = 0.122

EXPERIMENTAL RESULTS: Intermediate
D_KL(Real || Proposed) = 0.075 vs. D_KL(Real || Baseline) = 0.142

EXPERIMENTAL RESULTS: Beginner
D_KL(Real || Proposed) = 0.075 vs. D_KL(Real || Baseline) = 0.092

EXPERIMENTAL RESULTS
Human Judgment
- Evaluated 100 randomly chosen sentences, consisting of 50 sentences each from the real and the simulated data
- The test sentences were shuffled so that the human judges did not know whether the source of each sentence was real or simulated
- Two-level scale (0: Unrealistic, 1: Realistic)

Lee and Lee, (2009), Realistic Grammar Error Simulation Using Markov Logic, ACL

Q & A