POSTECH Dialog-Based Computer Assisted Language Learning System


Intelligent Software Lab., POSTECH
Prof. Gary Geunbae Lee

Contents
Introduction
Methods
- DB-CALL System: Example-based Dialog Modeling, Feedback Generation, Translation Assistance, Comprehension Assistance
- Language Learner Simulation: User Simulation, Grammar Error Simulation

Discussion

RESEARCH BACKGROUND

BACKGROUND

• Globalization makes English ever more important as a world language
• Native-speaker tutors are extremely expensive
• Most language learning software is dedicated to pronunciation practice
• Dialog-based computer-assisted language learning (DB-CALL) can be an excellent solution

ISSUES

• A DB-CALL system should understand students' poor, non-native expressions
• A DB-CALL system should have high domain scalability to support various practical scenarios
• A DB-CALL system should provide educational functionality that helps students improve their linguistic ability

PREVIOUS WORKS ON DB-CALL
Let's Go (CMU, 02-04)
- Providing bus schedule information for CMU non-native students
- Adapting the acoustic and language models to non-native speakers
- Edit-distance based corrective feedback

PREVIOUS WORKS ON DB-CALL

SPELL (Edinburgh, 05)
- Restaurant domain
- Scenario-based virtual space
- Incorporating mal-rules into the ASR grammar

PREVIOUS WORKS ON DB-CALL

DEAL (KTH, 07)
- Trade domain
- Finite-state-network-based limited dialog management
- When learners get stuck, the system provides hints

POSTECH DB-CALL System

[Architecture diagram] A web crawler feeds two extractors. A description extractor builds an expression-description DB (Example 1 / Description 1, Example 2 / Description 2, ...). A parallel sentence extractor stores bilingual examples in an XML format:

<parallel><source>...</source><target>...</target></parallel>
<Alignment><s2t>...</s2t><t2s>...</t2s><composition>...</composition></Alignment>
<Additional><url>...</url></Additional>

Both resources back ESL dialog tutoring: as the tutor and user alternate turns, the system uses the user input to suggest expressions ("Try this expression") together with Korean/English example pairs and descriptions.

DB-CALL System

1. Example-based Dialog Modeling

INTRODUCTION
Spoken Dialog System
Applications: Human-Robot Interface, Telematics, Tutoring, ...

PROBLEM & GOAL PROBLEM

How to determine the next system action Knowledge-based approach

Plan recipe / ISU rule / Agenda Data-driven approach

Statistical approach Supervised Learning based on state approximation Reinforcement Learning based on MDP/POMDP

Example-based approach GOAL

To develop a simple and practical approach to dia-log modeling for multi-domain dialog systems

IDEA

Dialog State Space
Domain = Building_Guidance
Dialog Act = WH-QUESTION
Main Goal = SEARCH-LOC
ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled), LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0
Previous Dialog Act = <s>, Previous Main Goal = <s>
Discourse History Vector = [1,0,0,0,0]
Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ? ("Where is the ROOM_TYPE?")
System Action = inform(Floor)

Dialog Corpus
Turn #1 (Domain = Building_Guidance)
USER: 회의실이 어디지? ("Where is the meeting room?")
[Dialog Act = WH-QUESTION] [Main Goal = SEARCH-LOC] [ROOM-TYPE = 회의실 (meeting room)]
SYSTEM: 3층에 교수회의실, 2층에 대회의실, 소회의실이 있습니다. ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.") [System Action = inform(Floor)]

Each dialog example is indexed using semantic & discourse features, and the system retrieves the example whose state is most similar to the current one:

e* = argmax_{e_i ∈ E} S(e_i, h)

where E is the example DB, S is the example-state similarity measure, and h is the current dialog state and history.

Lee et al., (2006), A Situation-based Dialogue Management using Dialogue Examples, IEEE ICASSP

ALGORITHM
Query Generation: Making an SQL statement using the discourse history and SLU results.
Example Search: Searching for semantically close dialog examples in the example DB given the current dialog state.
Example Selection: Selecting the best example to maximize the utterance similarity measure based on lexical and discourse information.
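To make the three steps concrete, here is a minimal Python sketch of the loop; the data structures, field names, and relaxation strategy are illustrative assumptions, not the actual POSTECH implementation.

```python
from difflib import SequenceMatcher

def generate_query(slu, history):
    """Step 1: turn SLU results + discourse history into search constraints."""
    return {"dialog_act": slu["dialog_act"],
            "main_goal": slu["main_goal"],
            "prev_act": history[-1]["dialog_act"] if history else "<s>"}

def search_examples(example_db, query):
    """Step 2: match on semantic/discourse keys; relax the discourse
    constraint if nothing matches (a simple relaxation strategy)."""
    hits = [e for e in example_db if all(e.get(k) == v for k, v in query.items())]
    if not hits:  # relaxation: drop the discourse feature and retry
        relaxed = {k: v for k, v in query.items() if k != "prev_act"}
        hits = [e for e in example_db if all(e.get(k) == v for k, v in relaxed.items())]
    return hits

def select_example(candidates, utterance):
    """Step 3: pick the candidate whose utterance pattern is lexically
    closest to the input, i.e. e* = argmax_{e_i in E} S(e_i, h)."""
    return max(candidates,
               key=lambda e: SequenceMatcher(None, e["pattern"], utterance).ratio(),
               default=None)
```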

[Architecture diagram: noisy input from ASR/SLU → Query Generation → Example Search → Example Selection → NLG with system templates, supported by the Example DB, Content DB, Discourse History, and a Relaxation Strategy]

EXPERIMENTAL RESULTS
Real user evaluation: 10 undergraduates
Evaluation Metrics:
- STR (Success Turn Rate) = # of successful turns / # of total turns
- TCR (Task Completion Rate) = # of successful dialogs / # of total dialogs
- AvgUserTurn = average number of user turns per dialog

Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM

System              | #Dialogs | AvgUserTurn | STR(%) | TCR(%)
Car Navigation      | 50       | 4.54        | 86.25  | 92.00
Weather Information | 50       | 4.46        | 89.01  | 94.00
EPG                 | 50       | 4.50        | 83.99  | 90.00
Chatbot             | 50       | 5.60        | 64.31  | -
Multi-domain        | 15       | 6.08        | 78.77  | 86.67

EXPERIMENTAL RESULTS

Lee et al., (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM

System              | Exact match | Partial match | No example
Car Navigation      | 50.22       | 44.49         | 5.29
Weather Information | 69.49       | 25.00         | 5.51
EPG                 | 58.33       | 37.22         | 4.45
Chatbot             | 50.71       | 14.29         | 35.00
Multi-domain        | 69.23       | 24.62         | 6.15

Example match rate (%) of each dialog system

ROBUST DIALOG MANAGEMENT
PROBLEM: How to overcome errors in the real world

ROBUST DIALOG MANAGEMENT
Error handling: recovering from ASR/SLU errors by interacting with the user at the conversational level
N-best support: estimating the current state under uncertainty

[Pipeline diagram: ASR (noise reduction, adaptation, N-best & lattice & confusion networks) → SLU (robust parsing, data-driven approaches) → DM (error handling, N-best support), with errors introduced at each stage]

Lee et al., (2008), Robust Dialog Management with N-best Hypotheses Using Dialog Examples and Agenda, ACL

GOAL & IDEA To increase the robustness of EBDM with prior

knowledge1) Error HandlingIf the system knows what the user will do next

Dynamic Help Generation

LOCATION

OFFICE PHONE NUMBER

ROOM ROLE

GUIDE

FOCUS NODE

NEXT_TASK

AgendaHelpS: Next, you can do the subtask 1) Asking the room's role, or 2)Asking the office phone num-ber, or 3) Selecting the desired room for navi-gation.

UtterHelpS: Next, you can say 1) “What is it?”, or 2) “What’s the phone number of [ROOM_NAME]?”, or 3) “ Let’s go there.

GOAL & IDEA To increase the robustness of EBDM with prior

knowledge2) N-best supportIf the system knows which subtask will be more probable next

Rescoring N-best hypotheses (h1~hn)

LOCATION

OFFICE PHONE NUMBER

FLOOR

ROOMNAME

h2

h1

h3

h4

Subtask System Utterance System Action

LOCATION The director’s room is Room No. 201.

Inform(RoomNumber)

N-best User Utterances Subtask P(hi|S)

U1 (h1) What are office rooms in this building?

ROOM NAME 0.2

U2 (h2) What is the floor? FLOOR 0.4

U3 (h3) Where is it? LOCATION 0.3

U4 (h4) What is the phone num-ber?

OFFICEPHONE NUMBER

0.5(More proba-

ble)
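A small sketch of how such rescoring could combine ASR confidence with the agenda prior; the interpolation weight and score combination are assumptions, since the slide only shows the agenda-based P(hi|S).

```python
# Illustrative agenda-based N-best rescoring (not the paper's exact formula).
def rescore_nbest(hypotheses, subtask_prior, alpha=0.5):
    """Combine each hypothesis's ASR confidence with the agenda's
    prior P(subtask | state), then rerank."""
    scored = [(alpha * h["asr_conf"]
               + (1 - alpha) * subtask_prior.get(h["subtask"], 0.0), h)
              for h in hypotheses]
    return [h for _, h in sorted(scored, key=lambda x: x[0], reverse=True)]

# Usage with the slide's example: the agenda makes OFFICE PHONE NUMBER the
# most probable next subtask, so h4 can win even if ASR ranked it lower.
nbest = [{"utt": "What are office rooms in this building?", "subtask": "ROOM NAME",           "asr_conf": 0.30},
         {"utt": "What is the floor?",                      "subtask": "FLOOR",               "asr_conf": 0.28},
         {"utt": "Where is it?",                            "subtask": "LOCATION",            "asr_conf": 0.25},
         {"utt": "What is the phone number?",               "subtask": "OFFICE PHONE NUMBER", "asr_conf": 0.17}]
prior = {"ROOM NAME": 0.2, "FLOOR": 0.4, "LOCATION": 0.3, "OFFICE PHONE NUMBER": 0.5}
best = rescore_nbest(nbest, prior)[0]
```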

ALGORITHM
[Architecture diagram: the N-best ASR words w1..wn become N-best SLU hypotheses u1..un; the EBDM's discourse interpretation maps them to candidate states s1..sn against the agenda graph (nodes V1..V9) and the focus stack, then picks the best node (ArgmaxNode) and the best example ej* among e1..ek (ArgmaxExample)]

EXPERIMENT SET-UP
Simulated user evaluation
- Test set: 1,000 simulated dialogs (< 20 user turns)
- Domain: intelligent robot for building guidance
- Using 5-best recognition hypotheses
Evaluation Metrics:
- TCR = # of successful dialogs / # of total dialogs
- AvgUserTurn = average number of user turns per dialog
- AvgScore = 20 * TCR - AvgUserTurn
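As an illustrative calculation with this metric (numbers invented): a system that completes 92% of its dialogs with an average of 4.5 user turns per dialog scores 20 * 0.92 - 4.5 = 13.9.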

EXPERIMENTAL RESULTS

Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)

[Line chart: average score (y-axis, 3-17) vs. WER (%) (x-axis, 0-50) for the P-E, P-ER, P-EA, and P-EAR methods]

Legend | Method
P-E    | Using only examples
P-ER   | Using examples + recovery
P-EA   | Using examples + agenda graph
P-EAR  | Using examples + agenda graph + recovery

The average score of different methods

EXPERIMENTAL RESULTS

Lee et al., (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL, (Submitted)

[Line chart: average score (y-axis, 2-18) vs. n-best size (x-axis, 1-100) at WER 0, 10, and 20]

The average score of the P-EAR system according to n-best size

DEMO VIDEO: PC demo

DEMO VIDEO: Robot demo

2. Feedback Generation

INTRODUCTION
Recast Feedback
Tutoring process:
Tutor: What is the purpose of your trip?
User: My purpose business
Tutor: Sorry, I don't understand. What did you say? (clarification request)
System: Try this expression: "I am here on business" (recast feedback)
User: I am here on business (learner uptake)

INTRODUCTION
Expression Suggestion
Tutoring process:
Tutor: What is the purpose of your trip?
(TIMEOUT: the learner does not respond)
Tutor: Sorry, I can't hear you.
System: Try this expression: "I am here on business" (expression suggestion)
User: I am here on business (learner uptake)

PROBLEMS
How to recognize user intentions despite numerous errors in their utterances
- The mal-rule based technique used in previous studies doesn't work for low-level learners because of multiple errors
- Some utterances even seem to have a meaning that differs from what the learner intended to say
  Intended meaning: When does the bus leave?
  Learner's utterance: Which time I have to leave?
How to choose appropriate user intentions to suggest when a timeout expires
- The system should take the dialog context into consideration, as human tutors do
- Performing intention-based soft pattern matching to generate correct feedback

MATHODS Context-aware & Level-specific Intention

Recognition Intention-based pattern matching

[Architecture diagram: inside the intention recognizer, the learner's utterance is scored by level-specific utterance models (Level 1..N, each trained on level-specific data) together with a dialog state-based model; the recognized intention drives the dialog manager's dialog-state update, and an example search over the example expression DB supplies example expressions for pattern matching against the utterance to generate feedback]
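A minimal sketch of the combination step, assuming the level-specific utterance model and the dialog state-based model each yield a distribution over intentions; the probability tables, names, and interpolation weight below are invented for illustration.

```python
def recognize_intention(utt_probs, state_probs, lam=0.6):
    """Interpolate P(intent | utterance, level) with P(intent | dialog state)
    and return the highest-scoring intention."""
    intentions = set(utt_probs) | set(state_probs)
    return max(intentions,
               key=lambda i: lam * utt_probs.get(i, 0.0)
                             + (1 - lam) * state_probs.get(i, 0.0))

# A beginner's garbled "My purpose business" is ambiguous to the level-1
# utterance model, but the dialog state (the tutor just asked about the
# purpose of the trip) pushes the decision toward answer_purpose.
utt_probs = {"answer_purpose": 0.35, "ask_repeat": 0.33, "other": 0.32}
state_probs = {"answer_purpose": 0.70, "ask_repeat": 0.10, "other": 0.20}
print(recognize_intention(utt_probs, state_probs))  # -> answer_purpose
```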

EXPERIMENT SET-UP
Primitive data set
- Immigration domain
- 192 dialogs, 3,517 utterances (18.32 utterances/dialog)
Annotation
- Manually annotated each utterance with the speaker's intention and component slot-values
- Automatically annotated each utterance with the discourse information

EXPERIMENTAL RESULTS
[Chart: intention recognition accuracy of the utterance model vs. the hybrid model]

EXPERIMENTAL RESULTS
[Charts: level-specific vs. level-ignoring variants of the hybrid and utterance models]

Demo: POSTECH DB-CALL initial version 2008

3. Translation Assistance

Architecture & Example Format
[Architecture diagram: parallel sentence examples are extracted from the Web, indexed by a search engine, and exposed through an interface (function call) that the ESL dialog system and other applications query with an expression]

Example format:
<parallel><source>...</source><target>...</target></parallel>
<Alignment><s2t>...</s2t><t2s>...</t2s><composition>...</composition></Alignment>
<Additional><url>...</url></Additional>

Analysis
Building bilingual examples: word alignment
- Widely used in statistical machine translation (IBM Models 1-5, symmetrization heuristics)
- Word alignment gives a correspondence for each word/phrase in a given bilingual example
- Example word alignment toolkit: GIZA++
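Since GIZA++ itself is a black box here, the following toy IBM Model 1 EM sketch shows the kind of lexical translation table it estimates; the two-pair corpus is invented, and the real toolkit adds Models 2-5, HMM alignment, and symmetrization heuristics.

```python
from collections import defaultdict

corpus = [(["i", "am", "here"], ["나는", "여기", "있다"]),
          (["i", "go"],         ["나는", "간다"])]

t = defaultdict(lambda: 1.0)             # uniform init of t(f|e)
for _ in range(10):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        for f in tgt:                    # E-step: expected alignment counts
            z = sum(t[(f, e)] for e in src)
            for e in src:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():      # M-step: renormalize
        t[(f, e)] = c / total[e]

# After training, t[("나는", "i")] dominates its alternatives, because
# "나는" co-occurs with "i" in both sentence pairs.
```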

4. Comprehension Assistance

INTRODUCTION
Description Suggestion System
[Diagram: an expression-description DB built from an ESL podcast website feeds the dialog system]
English Expression-Description Example Suggestion System
- When the user asks about an unfamiliar English expression, the system presents its description to help understanding
- Pipeline: expression detection → recommended sentence and description

INTRODUCTION
Expression-Description Pair Extraction System
To present an expression example and its description, the system extracts expression-description pairs from the ESL podcast site.

Phrase       | Description
routine test | "... we mean it's a normal, regular test that the doctor runs many, many different times with different patients, not a special test."
treatment    | ""Treatment" is another word for what the doctor gives you or does to you to help you."

EXAMPLE
[Slides: podcast script excerpts shown with their extracted descriptions]

Language Learner Simulation

1. User Simulation

INTRODUCTION
User Simulation for Spoken Dialog Systems
- Developing a 'simulated user' who can replace real users
Applications
- Automated evaluation of spoken dialog systems: detecting potential flaws; predicting overall system behavior
- Learning dialog strategies in a reinforcement learning framework

PROBLEM & GOAL PROBLEM

How to model real user User Intention simulation User Surface simulation ASR channel simulation

GOAL Natural Simulation Diverse Simulation Controllable Simulation

IDEA – User Intention Simulation

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

Dialog is a sequence of behaviors, especially user intentions, so user intention simulation should take various discourse information into account: discourse factors + knowledge + events ...
[Diagram: alternating User/Sys turns across the dialog]

User Intention Simulation - Linear-Chain Conditional Random Field Model
Assumption: a user utterance has only one intention
- UI: user intention state; State = [dialog_act, main_goal, named_entities]
- DI: previous discourse information; system response + discourse history
[Model diagram: a linear chain of UI states across turns, each conditioned on its DI]

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
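A sketch of how such a linear-chain CRF could be set up with sklearn-crfsuite (our choice of library, not the paper's); the features and labels are simplified stand-ins for the DI and UI described above.

```python
import sklearn_crfsuite

def turn_features(di):
    """DI -> feature dict: previous system action plus discourse history."""
    feats = {"prev_sys_action": di["sys_action"]}
    feats.update({f"hist_{slot}": float(filled)
                  for slot, filled in di["history"].items()})
    return feats

# One training dialog: a sequence of DI feature dicts and UI labels.
X = [[turn_features({"sys_action": "<s>",           "history": {"room": 0}}),
      turn_features({"sys_action": "inform(Floor)", "history": {"room": 1}})]]
y = [["WH-QUESTION/SEARCH-LOC", "REQUEST/GUIDE"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
# Simulation draws the next intention given the running discourse context.
print(crf.predict(X)[0])
```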

ALGORITHM

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

User Surface Simulation
PROBLEM: How to generate a user surface utterance that expresses a given user intention
APPROACH: 2-phase user utterance generation
- Phase 1: candidate generation
- Phase 2: rescoring
[Diagram: the utterance model generates candidate utterances (phase 1), which are rescored to select the final utterances (phase 2)]

Phase 1 - Generation: Dialog_Act _X_ Main_Goal
[Model diagram: a chain of structure tags S1..S5 with tag-transition probabilities, each emitting a word W1..W5 with an emission probability]
- Structure tags: component slot names + part-of-speech tags
- S: a structure tag in the given space; W: a vocabulary word in the given space
A sketch of this generation scheme follows.
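A toy sketch of the tag-transition / word-emission generation, with invented probability tables keyed to a single intention; in the real system one model is trained per Dialog_Act _X_ Main_Goal combination.

```python
import random

trans = {"<s>": {"CITY_NAME": 0.7, "VERB": 0.3},       # structure-tag transitions
         "CITY_NAME": {"PARTICLE": 0.8, "VERB": 0.2},
         "PARTICLE": {"VERB": 1.0},
         "VERB": {"</s>": 1.0}}
emit = {"CITY_NAME": {"Seoul": 0.6, "Busan": 0.4},     # word emissions per tag
        "PARTICLE": {"to": 1.0},
        "VERB": {"go": 0.5, "navigate": 0.5}}

def sample(dist):
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point rounding

def generate_candidate():
    tag, words = "<s>", []
    while True:
        tag = sample(trans[tag])          # transition to the next structure tag
        if tag == "</s>":
            return words
        words.append(sample(emit[tag]))   # emit one word for this tag

candidates = [generate_candidate() for _ in range(5)]  # phase-1 output
```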

Phase 2 - Rescoring
PROBLEM: Rescoring and selecting the good utterances
Criteria: human-like utterances; natural word transitions
APPROACH: Structure- and Word-interpolated BLEU (SWB) score
- Note that evaluating system-generated utterances in utterance simulation and in machine translation is essentially the same task

SWB = β * Structure_Sequence_BLEU + (1 - β) * Word_Sequence_BLEU, where 0 ≤ β ≤ 1

We set β to 0.2: Korean is an agglutinative language, so word order is relatively free and the structural grammar can be weighted less.

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
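A sketch of SWB using NLTK's sentence-level BLEU as the underlying n-gram precision metric; NLTK and the smoothing choice are our assumptions, since the slides define SWB abstractly.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def swb(ref_words, hyp_words, ref_tags, hyp_tags, beta=0.2):
    """SWB = beta * BLEU over structure-tag sequences
           + (1 - beta) * BLEU over word sequences."""
    smooth = SmoothingFunction().method1   # short sequences need smoothing
    structure = sentence_bleu([ref_tags], hyp_tags, smoothing_function=smooth)
    word = sentence_bleu([ref_words], hyp_words, smoothing_function=smooth)
    return beta * structure + (1 - beta) * word

# Rescoring keeps the candidates closest to corpus utterances with the
# same intention (toy sequences reused from the phase-1 sketch).
ref_w = ["Seoul", "to", "go"]; ref_t = ["CITY_NAME", "PARTICLE", "VERB"]
hyp_w = ["Busan", "to", "go"]; hyp_t = ["CITY_NAME", "PARTICLE", "VERB"]
print(swb(ref_w, hyp_w, ref_t, hyp_t))
```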

ALGORITHM

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

ASR Channel Simulation
PROBLEM: How to simulate the ASR channel
- Knowledge-based vs. statistical approaches: it is difficult to collect speech data for the target domain
- The simulation should be WER-controllable
APPROACH: linguistic-knowledge-based simulation (sketched after the citation below)
- Step 1: Determine error positions
- Step 2: Generate error types for the error-marked words
- Step 3: Generate ASR errors (substitution, deletion, insertion)
- Step 4: Rescore and select an erroneous utterance

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
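An illustrative sketch of the four-step error injection; the error-type distribution, vocabulary, and substitution choice below are placeholders, not the paper's values.

```python
import random

ERROR_TYPES = {"sub": 0.55, "del": 0.30, "ins": 0.15}   # assumed distribution
VOCAB = ["the", "to", "what", "where", "floor"]          # toy insertion/sub pool

def simulate_asr(words, wer=0.2):
    out = []
    for w in words:
        if random.random() >= wer:                 # step 1: mark error positions
            out.append(w)
            continue
        etype = random.choices(list(ERROR_TYPES), ERROR_TYPES.values())[0]  # step 2
        if etype == "sub":                         # step 3: realize the error
            out.append(random.choice(VOCAB))
        elif etype == "ins":
            out.extend([random.choice(VOCAB), w])  # insert before the marked word
        # "del": drop the word entirely
    return out                                     # step 4 (rescoring) omitted here

print(simulate_asr("where is the meeting room".split(), wer=0.3))
```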

Error Type Distribution
Error types are determined based on published English speech recognition results (Greenberg et al., 2000); we assume that Korean speech recognition generally has a similar error distribution.

Error Generation
- Insertion error: insert a random word before the insertion-error mark
- Deletion error: simply delete the word
- Substitution error: based on a sequence alignment algorithm
  - Syllable- and phone-based alignment: select candidate words from a dictionary and align them with the dynamic alignment algorithm of Needleman and Wunsch (1970)
  - Compute the similarity score:

Similarity = α * Syllable_Alignment_Score + (1 - α) * Phoneme_Alignment_Score, where 0 ≤ α ≤ 1

[Vowel confusion matrix example]
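A minimal Needleman-Wunsch scorer showing how the two alignment levels can be mixed; the scoring constants and the jamo decomposition in the usage example are assumptions.

```python
def nw_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score between two symbol sequences."""
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[m][n]

def similarity(syl_a, syl_b, pho_a, pho_b, alpha=0.5):
    """alpha-weighted mix of syllable-level and phoneme-level alignment."""
    return alpha * nw_score(syl_a, syl_b) + (1 - alpha) * nw_score(pho_a, pho_b)

# e.g. compare 회의 with the candidate 회이 at the syllable level and at the
# phoneme (jamo) level:
print(similarity(["회", "의"], ["회", "이"],
                 ["ㅎ", "ㅚ", "ㅇ", "ㅢ"], ["ㅎ", "ㅚ", "ㅇ", "ㅣ"]))
```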

EXPERIMENT SET-UP
Korean car navigation dialog system
- SLU: Jeong and Lee (2006); DM: Lee et al. (2009)
- Word error rate: 0.0 ~ 0.4
- 5,000 dialog samples at each WER setting

EXPERIMENTAL RESULTS
[Result charts: intention simulation, utterance simulation, ASR channel simulation, and overall behavior prediction, each compared against real user data]

D-BLEU (Discourse BLEU) is a metric for measuring the naturalness of simulated dialogs in the sense of n-gram precision, based on the BLEU metric calculation.

Jung et al., 2009, Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.

2. Grammar Error Simulation

INTRODUCTION
Language learner simulation requires us to invent grammar error simulation on top of the general user simulation.
[Architecture diagram: the language learner simulator chains a user intention simulator, a user utterance simulator, a grammar errors simulator, and an ASR errors simulator; it talks to the dialog system, which chains non-native ASR, SLU, a dialog manager, a system utterance generator, and TTS]

REALISTIC ERROR
Original: He wants to go to a movie theater
He wants to to a movie theater (random corruption)
vs.
He want go to movie theater (realistic learner errors: agreement, preposition, article)

PROBLEMS
How to incorporate expert knowledge about the error characteristics of Korean language learners into the statistical model:
- Subject-verb agreement errors
- Omission errors of the preposition of prepositional verbs
- Omission errors of articles
- Etc.

MARKOV LOGIC NETWORK

Lee and Lee, (2009), Realistic Grammar Error Simulation Using Markov Logic, ACL

METHOD
The generation procedure involves three steps (see the sketch below):
1. Generating a probability distribution over error types for each word through MLN inference
2. Determining an error type by sampling the generated probabilities for each word
3. Creating an ill-formed output sentence by realizing the chosen error types
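A sketch of steps 2 and 3, taking the step-1 MLN output as given; the realization rules cover only the three error types shown here and are deliberately naive.

```python
import random

def realize(words, error_type_probs):
    out = []
    for word, probs in zip(words, error_type_probs):
        etype = random.choices(list(probs), probs.values())[0]  # step 2: sample
        if etype == "v_agr_sub":        # agreement error: naive "wants" -> "want"
            out.append(word.rstrip("s"))
        elif etype in ("prp_lex_del", "at_del"):  # drop preposition / article
            continue
        else:                            # "none": keep the word unchanged
            out.append(word)
    return " ".join(out)                 # step 3: realized ill-formed sentence

words = "He wants to go to a movie theater".split()
# Abbreviated step-1 output (MLN inference): most mass on "none" except
# where the learner model predicts an error.
probs = [{"none": 1.0}, {"v_agr_sub": 0.9, "none": 0.1},
         {"prp_lex_del": 0.7, "none": 0.3}, {"none": 1.0}, {"none": 1.0},
         {"at_del": 0.8, "none": 0.2}, {"none": 1.0}, {"none": 1.0}]
print(realize(words, probs))   # e.g. "He want go to movie theater"
```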

EXAMPLE
Input: He wants to go to a movie theater
Step 1 (inference): MLN inference yields, for each word, a probability distribution over error types (v_agr_sub, prp_lex_del, at_del, none, ...), e.g. wants: v_agr_sub 0.371 / none 0.449; to: prp_lex_del 0.284 / none 0.604; a: at_del 0.355 / none 0.506.
Step 2 (sampling): sampled types: none, v_agr_sub, prp_lex_del, none, none, at_del, none, none
Step 3 (realization): He want go to movie theater

EXPERIMENT SET-UP
Data Sets
- NICT JLE Corpus: the 167 error-annotated files divided into 3 level groups
  - Beginner (levels 1-4): 2,905
  - Intermediate (levels 5-6): 3,296
  - Advanced (levels 7-9): 2,752
Evaluation
- 10-fold cross validation performed for each group
- The validation results were added together across the rounds

EXPERIMENTAL RESULTS: Advanced
D_KL(Real || Proposed) = 0.068 vs. D_KL(Real || Baseline) = 0.122

EXPERIMENTAL RESULTS: Intermediate
D_KL(Real || Proposed) = 0.075 vs. D_KL(Real || Baseline) = 0.142

EXPERIMENTAL RESULTS: Beginner
D_KL(Real || Proposed) = 0.075 vs. D_KL(Real || Baseline) = 0.092

EXPERIMENTAL RESULTS
Human Judgment
- Evaluated 100 randomly chosen sentences, consisting of 50 sentences each from the real and the simulated data
- The test sentences were shuffled so that the human judges did not know whether the source of each sentence was real or simulated
- Two-level scale (0: Unrealistic, 1: Realistic)

Lee and Lee, (2009), Realistic Grammar Error Simulation Using Markov Logic, ACL

Q & A