Date post: 19-Dec-2015
Speech and Language Technology For Dialog-based CALL Gary Geunbae Lee, POSTECH
Transcript
Page 1: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Speech and Language Technology for Dialog-based CALL

Gary Geunbae Lee, POSTECH

Page 2: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Outline

1 Introduction

2 Spoken Dialog Systems

3 DBCALL: Educational Error Handling

4 PESAA: Postech English Speaking Assessment and Assistant

5 Field Study

Page 3: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

INTRODUCTION

CHAPTER 1

Page 4: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

English Tutoring Methods

Traditional Approaches: <Classroom> <Textbook> <Multimedia>

CALL Approaches: <CMC> <ICALL>

Page 5: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Socio-Economic Effects

• Changing the current foreign language education system in public schools
  – From a vocabulary and grammar methodology
  – To speaking ability

• Significant reduction in private English education fees
  – Private English education fees in Korea reach up to 16 trillion won annually

• Expected overseas export
  – Japan, China, etc.

Page 6: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Interdisciplinary Research

NLP
• Dialog Management
• Error Detection
• Corrective Feedback

SLA
• Comprehensible Input and Output
• Corrective Feedback
• Attitude & Motivation

Evaluation
• Cognitive Effect
• Affective Effect

Page 7: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Second Language Acquisition Theory

• Input Enhancement
• Comprehensible input
• Provision of inputs with high frequency

• Immersion
• Authentic environment
• Direct form-meaning mapping

• Noticing & Attention
• Output hypothesis test
• Corrective feedback
• Affective factors

• Motivation
• Goal achievement & rewards
• Interest
• Importance of L2

Page 8: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Dialog-Based CALL (DB-CALL)


<Educational Robot>

<3D Educational Game>

Spoken Dialog System DB-CALL System

Page 9: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Existing DB-CALL Systems

Alelo
• Tactical language & culture training system
• Learn Iraqi Arabic by playing a fun video game
• Dedicated to serving the language and culture learning needs of the military

SPELL
• Learning English in functional situations such as going to a restaurant, expressing (dis-)likes, etc.
• The speech recogniser is programmed to recognise grammatical and some ungrammatical utterances

DEAL
• Learning Dutch in a flea market situation
• The model can also convey extra-linguistic signs such as lip-synching, frowning, nodding, and eyebrow movements

Page 10: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Video Demo

Page 11: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

SPOKEN DIALOG SYSTEMS

CHAPTER 2

Page 12: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

SPOKEN DIALOG SYSTEM (SDS)

Page 13: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

SDS APPLICATIONS

• Tele-service
• Car-navigation
• Home networking
• Robot interface

Page 14: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

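Automatic Speech Recognition (ASR)

Speech Signals (O) → Feature Extraction → Decoding → Word Sequence (W)

Example utterance: 버스 정류장이 어디에 있나요? ("Where is the bus stop?")

Knowledge sources used in decoding (combined by Network Construction):
• Acoustic Model: HMM Estimation from the Speech DB
• Pronunciation Model: G2P
• Language Model: LM Estimation from Text Corpora

\hat{W} = \arg\max_{W \in L} P(O \mid W) P(W)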
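The decoding rule above picks the word sequence with the highest product of acoustic likelihood P(O|W) and language-model prior P(W). The following is a minimal sketch of that argmax in log space. The candidate sentences and their scores are hypothetical toy values, not output of any real recognizer, and a real decoder searches a network rather than enumerating candidates.

```python
# Hypothetical toy scores (assumptions for illustration only).
acoustic_logprob = {            # log P(O | W)
    "where is the bus stop": -12.0,
    "wear is the bus stop": -11.5,
    "where is the bus top": -12.5,
}
lm_logprob = {                  # log P(W)
    "where is the bus stop": -4.0,
    "wear is the bus stop": -9.0,
    "where is the bus top": -8.0,
}


def decode(candidates):
    """Return the candidate W maximizing log P(O|W) + log P(W)."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])


best = decode(list(acoustic_logprob))
print(best)
```

Working in log space turns the product into a sum and avoids numerical underflow on long utterances. Note that the acoustically best candidate is not selected here because its language-model score is much lower.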

Page 15: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Spoken Language Understanding (SLU)

Overall architecture for semantic analyzer
(Figure components: Info. Source, Feature Extraction / Selection, Dialog Act Identification, Frame-Slot Extraction, Relation Extraction, Unification)

Semantic Frame Extraction (~ Information Extraction Approach)
1) Dialog act / Main action Identification ~ Classification
2) Frame-Slot Object Extraction ~ Named Entity Recognition
3) Object-Attribute Attachment ~ Relation Extraction

Examples of semantic frame structure

I like DisneyWorld.
Domain: Chat
Dialog Act: Statement
Main Action: Like
Object.Location=DisneyWorld

How to get to DisneyWorld?
Domain: Navigation
Dialog Act: WH-question
Main Action: Search
Object.Location.Destination=DisneyWorld

Page 16: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

JOINT APPROACH

Named Entity ↔ Dialog Act

Automatic Speech Recognition → x → Joint Model (e.g. TriCRFs) → x, y, z → Dialog Management

Joint Inference
• Classification (Dialog Act / Intent)
• Sequential Labeling (Named Entity / Frame Slot)

[Jeong and Lee, SLT2006] [Jeong and Lee, IEEE TASLP2008]

Page 17: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

HDP-HMM for Unsupervised Dialog Acts

Generative Story

β ~ GEM(α), ω ~ Dir(ω0)
for each hidden state k ∈ [1, 2, …]
    πk ~ DP(α', β)
    ϕk ~ Dir(ϕ0), θk ~ Dir(θ0)
for each dialog d
    λd ~ Beta(λ0)
    for time stamp t
        zt ~ Multi(π_{z_{t-1}})
        for each entity e
            ei ~ Multi(θ_{z_t})
        for each word w
            xi ~ Bern(λd) [select word type]
            if xi = 0: wi ~ Multi(ϕ_{z_t})
            else: wi ~ Multi(ω) [background LM]

(Figure: graphical model with nodes zt-1, zt, zt+1, wt,i, et,i, xt,i, πk, ϕk, θk, β, ω, λd, parameters α, α', ϕ0, θ0, ω0, λ0, and labels N, V, D.)

Page 18: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

CRF with Posterior Regularization for Unsupervised NER

Constraints for NER: Constraints Learning

UNLABELED CORPUS
Welcome to the New York City Bus Tour Center .
I want to buy tickets for me and my child .
What kind of tour would you like to take ?
We would like to go on a tour during the day .
We have two daytime tours: the Downtown Tour and the All Around Town Tour .
Which tour goes to the Statue of Liberty ?
…

DICT/DB/Web
BOARD_TYPE:Hop-on
BOARD_TYPE:Hop-off
PLACE:Times Square
PLACE:Empire State Building
PLACE:Chinatown
PLACE:Site of the World Trade Center
PLACE:Statue of Liberty
PLACE:Rockefeller Center
PLACE:Central Park
…

Heuristic Matching

HYPOTHESIS
# We would like to go on a tour during the day .
# -> null
0:1.000:We would like to go on a tour during the day .
# We have two daytime tours
# -> the Downtown Tour and the All Around Town Tour .
0:1.000:We have two daytime tours
# Which tour goes to the Statue of Liberty ?
# -> null
0:1.000:Which tour goes to the <PLACE>Statue of Liberty</PLACE> ?
# You can visit the Statue of Liberty on either tour .
# -> null
0:1.000:You can visit the <PLACE>Statue of Liberty</PLACE> on either tour .
…

Extract Features

LABELED FEATURES
Welcome O:1.000
W1=<s> O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001
W2=<s>,Welcome O:1.000
W3=_ O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001
W4=_ O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001
W5=_ O:0.997 PLACE-b:0.001 TOURS-b:0.002 GUIDE-b:0.001
W6=to O:1.000
W7=Welcome,to O:1.000
W8=the O:0.924 PLACE-b:0.005 PLACE-i:0.006 TOURS-b:0.001 TOURS-i:0.064
W9=Welcome,the O:1.000
…

CRF Model with PR

Page 19: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Vanilla EXAMPLE-BASED DM (EBDM)

Example-based approaches

Dialog Corpus
Turn #1 (Domain=Building_Guidance)
USER: 회의 실 이 어디 지 ? ("Where is the meeting room?")
[Dialog Act = WH-QUESTION]
[Main Goal = SEARCH-LOC]
[ROOM-TYPE = 회의실 ("meeting room")]
SYSTEM: 3 층에 교수회의실 , 2 층에 대회의실 , 소회의실이 있습니다 . ("The faculty meeting room is on the 3rd floor; the large and small meeting rooms are on the 2nd floor.")
[System Action = inform(Floor)]

Dialog Example (indexed by using semantic & discourse features)
Domain = Building_Guidance
Dialog Act = WH-QUESTION
Main Goal = SEARCH-LOC
ROOM-TYPE=1 (filled), ROOM-NAME=0 (unfilled)
LOC-FLOOR=0, PER-NAME=0, PER-TITLE=0
Previous Dialog Act = <s>, Previous Main Goal = <s>
Discourse History Vector = [1,0,0,0,0]
Lexico-semantic Pattern = ROOM_TYPE 이 어디 지 ? ("Where is ROOM_TYPE?")
System Action = inform(Floor)

Dialog State Space: retrieve the example having the similar state

e^* = \arg\max_{e_i \in E} S(e_i, h)

[Lee et al., SPECOM2009]
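The selection rule on this slide retrieves the stored dialog example whose state is most similar to the current one. Below is a minimal sketch of that argmax retrieval. The similarity function (exact match on dialog act and main goal plus Jaccard overlap of filled slots) is an assumption made for illustration, not the scoring used in the cited work, and the second and third examples, including their goals and system actions, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogExample:
    dialog_act: str
    main_goal: str
    filled_slots: frozenset   # names of slots already filled
    system_action: str


def similarity(example, state):
    """Assumed S(e_i, h): act match + goal match + slot-set Jaccard overlap."""
    score = 0.0
    score += 1.0 if example.dialog_act == state["dialog_act"] else 0.0
    score += 1.0 if example.main_goal == state["main_goal"] else 0.0
    union = example.filled_slots | state["filled_slots"]
    if union:
        score += len(example.filled_slots & state["filled_slots"]) / len(union)
    return score


def select_example(examples, state):
    """Return e* = argmax over stored examples of S(e_i, h)."""
    return max(examples, key=lambda e: similarity(e, state))


examples = [
    DialogExample("WH-QUESTION", "SEARCH-LOC", frozenset({"ROOM-TYPE"}), "inform(Floor)"),
    # The two entries below are hypothetical.
    DialogExample("WH-QUESTION", "SEARCH-PHONE", frozenset({"ROOM-NAME"}), "inform(PhoneNumber)"),
    DialogExample("REQUEST", "GUIDE-LOC", frozenset({"ROOM-NAME"}), "guide(Room)"),
]
state = {
    "dialog_act": "WH-QUESTION",
    "main_goal": "SEARCH-LOC",
    "filled_slots": frozenset({"ROOM-TYPE"}),
}
best = select_example(examples, state)
print(best.system_action)
```

The system action attached to the retrieved example is what the dialog manager would execute next.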

Page 20: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Error handling and N-best support

To increase the robustness of EBDM with prior knowledge

1) Error Handling

If the system knows what the user will do next

Dynamic Help Generation

LOCATION

OFFICE PHONE NUMBER

ROOM ROLE

GUIDE

FOCUS NODE

NEXT_TASK

AgendaHelp
S: Next, you can do the subtask 1) Asking the room's role, or 2) Asking the office phone number, or 3) Selecting the desired room for navigation.

UtterHelp
S: Next, you can say 1) “What is it?”, or 2) “What’s the phone number of [ROOM_NAME]?”, or 3) “Let’s go there.”

[Lee et al CSL2010]

Page 21: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Error handling and N-best support

To increase the robustness of EBDM with prior knowledge

2) N-best support
If the system knows which subtask will be more probable next: Rescoring N-best hypotheses (h1~hn)

(Figure: subtask nodes LOCATION, OFFICE PHONE NUMBER, FLOOR, ROOM NAME with hypotheses h1 to h4 attached.)

Subtask     System Utterance                        System Action
LOCATION    The director’s room is Room No. 201.    Inform(RoomNumber)

N-best User Utterances                               Subtask                P(hi|S)
U1 (h1) What are office rooms in this building?      ROOM NAME              0.2
U2 (h2) What is the floor?                           FLOOR                  0.4
U3 (h3) Where is it?                                 LOCATION               0.3
U4 (h4) What is the phone number?                    OFFICE PHONE NUMBER    0.5 (more probable)
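One way to read the table above: each N-best hypothesis is rescored with the probability of its subtask given the current dialog state, so a hypothesis that ASR ranked lower can win. The sketch below assumes a simple linear interpolation between an ASR confidence and P(hi|S). The interpolation, the weight, and the ASR confidences are assumptions for illustration; only the subtask probabilities come from the slide.

```python
def rescore(nbest, subtask_prob, weight=0.5):
    """Sort hypotheses by weight * ASR confidence + (1 - weight) * P(h|S).

    The interpolation scheme is an assumption, not the cited method.
    """
    scored = [
        (weight * asr_conf + (1.0 - weight) * subtask_prob[subtask], text)
        for text, subtask, asr_conf in nbest
    ]
    return sorted(scored, reverse=True)


# (utterance, subtask, hypothetical ASR confidence)
nbest = [
    ("What are office rooms in this building?", "ROOM NAME", 0.30),
    ("What is the floor?", "FLOOR", 0.25),
    ("Where is it?", "LOCATION", 0.20),
    ("What is the phone number?", "OFFICE PHONE NUMBER", 0.25),
]
# P(hi|S) values taken from the table.
subtask_prob = {
    "ROOM NAME": 0.2,
    "FLOOR": 0.4,
    "LOCATION": 0.3,
    "OFFICE PHONE NUMBER": 0.5,
}
ranked = rescore(nbest, subtask_prob)
print(ranked[0][1])
```

With these toy confidences the ASR-best hypothesis (h1) drops below h4 once the subtask probability is taken into account.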

Page 22: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Misunderstanding handling by Confirmation

(Figure components: User, ASR, SLU, User’s Actions, Dialog state hypotheses, Multiple Dialog States Representation, Confirmation Agent (misunderstanding handler), Confirmation Strategy, EBDM, DEDB, User Simulator; outputs: Confirmation, Task related system action; phases: Executing, Learning.)

[Kim et al SLT 2010]

Page 23: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

The Framework of ranking-based EBDM

(Figure components: Dialog Examples, User Intention (system intention), Scoring Module with Discourse Similarity, Relative Position, Dialog Act Features and Entity Constraint, Calculated Scores, RankSVM, EBDM, System Intention (user intention).)

[Noh et al IWSDS2011]

Page 24: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Dialog Simulation

User simulation for spoken dialog systems involves four essential problems

• User Intention Simulation
• User Utterance Simulation
• ASR Channel Simulation

(Figure: Spoken Dialog System and Simulated Users)

[Jung et al., CSL 2009]

Page 25: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

DIALOG STUDIO ARCHITECTURE

Steps: Design Step, Annotation Step, Language Synchronization Step, Training Step, Running Step

• Structures: Semantic Structure, Dialog Structure, Knowledge Structure
• Annotation tools: Semantic Annotator, Dialog Annotator, Knowledge Annotator, Knowledge Importer
• Corpora and sources: SLU Corpus, Dialog Corpus, Knowledge Source, Dialog Utterance Pool
• Trainers: ASR Trainer, SLU Trainer, DM Trainer, Knowledge Builder
• Models: ASR Model, SLU Model, Dialog Model, Knowledge Model
• Running components: ASR, SLU, DM

Legend: External Component, Dialog Studio Component, File

[Jung et al., SPECOM 2008]

Page 26: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Architecture of WOZ

(Figure: a human subject and a Wizard. User speech goes from the mic to the speaker; Wizard speech is produced from Text input through TTS (Network RPC). The User Screen provides User Character Control; the Wizard Screen provides NPCs Control.)

[Lee et al SLATE2011]

Page 27: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

User Screen (Mission)

Page 28: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

DBCALL: EDUCATIONAL ERROR HANDLING


CHAPTER 3

Page 29: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Global Errors

• Global errors are errors that affect overall sentence organization. They are likely to have a marked effect on comprehension. [1]

System: What is the purpose of your trip?

Learner: It’s ... I ... purpose business

System: Sorry, I didn’t understand. What did you say? You can say “I am here on business”

Learner: I am here on business

Intention: inform(trip-purpose)

Page 30: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Hybrid Model

• Robust to learners’ errors
  – Hybrid model combining utterance-based model and dialog context-based model

(Figure components: Learner’s Utterance; Level 1, Level 2, …, Level N Utterance Models with Level 1, Level 2, …, Level N Data; Dialog Manager, Dialog State, Dialog Context Model; Learner’s Intention.)

Lee, S., Lee, C., Lee, J., Noh, H., & Lee, G. G. (2010). Intention-based Corrective Feedback Generation using Context-aware Model. Proceedings of International Conference on Computer Supported Education.

Page 31: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Formulating the prediction as probabilistic inference:

Derivation steps: Chain rule, Bayes’ rule, Ignore invariants

Utterance Model: Maximum Entropy
Features:
• Word
• Part of speech

Dialog-Context Model: Enhanced K-Nearest Neighbors
Features:
• Previous system intention
• Previous user intention
• Current system intention
• A list of exchanged information
• Number of database query results

Page 32: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Dialog-Context Model

Dialog Corpus
Segment #2 (Domain = Fruit Store)
SYSTEM: Namsu, what would you like to buy today?
[Intention = Ask(Select_Item)]
USER: I’d like to buy some oranges
[Intention = Inform(Order_Fruit), ITEM_NAME = orange]
SYSTEM: How many oranges do you need?
[Intention = Ask(Order_Quantity)]
USER: I need three oranges
[Intention = Inform(Order_Quantity), NUM = three]

Dialog State (indexed by using semantic & discourse features)
Domain = Fruit_Store
Previous System Intention = Ask(Select_Item)
Previous User Intention = Inform(Order_Fruit)
System Intention = Ask(Order_Quantity)
Exchanged Information State = [ITEM_NAME = ‘orange’ (C), ITEM_QUANTITY = 3 (U)]
Number of DB query results = 0

Dialog State Space

User Intention = Inform(Order_Quantity)

Page 33: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Recast Feedback Generation

(Figure: User’s Utterance → Intention Recognition → Example Search in the Example Expression DB → Example Expressions → Pattern Matching → score > θ? Y: No Feedback; N: Feedback.)

Page 34: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Local Errors

• Local errors are errors that affect single elements in a sentence. [1]

System: What is the purpose of your trip?

Learner: I am here at business

System: On business

Learner: I am here on business

ErrorInfo: prep_sub(at/on)

[1] Ellis, R. (2008). The Study of Second Language Acquisition. 2nd ed. Oxford: OUP

Page 35: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Local Error Detector Architecture

(Figure components: Text, Grammatical Error Simulation, Erroneous Text, two N-gram LMs, ASR and ASR’, Merged Hypotheses, Grammaticality Checker, Error-type Classifier with Error Patterns and Error Frequency, Feedback.)

Lee, S., Noh, H., Lee, K., & Lee, G. G. (2011). Grammatical Error Detection for Corrective Feedback Provision in Oral Conversations. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco.

Page 36: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Two-Step Approach

• Data Imbalance Problem
  – Simply produce majority class
  – Or, high false positive rate
• Large number of error types
  – Makes model learning and selection procedure vastly complicated
• Grammaticality checking itself can be useful for some applications
  – Categorizing learners’ proficiency level
  – Generating implicit corrective feedback such as repetition, elicitation, and recast feedback

Grammatical Error Detection

Input                           I      am     here   at        business
1) Grammaticality Checking      0      0      0      1         0
2) Error Type Classification    None   None   None   PRP_LXC   None

Page 37: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Grammaticality Checker - Feature Extraction

Page 38: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Grammaticality Checker - Model Learning

• Binary Classification
  – Support Vector Machine
• Model Selection
  – Radial Basis Kernel
  – Search for C, γ which optimize:
    • Maximize F-score
      Subject to Precision > 0.90, False positive rate < 0.01
  – 5-fold cross-validation
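The model selection criterion above is a constrained search: among candidate (C, γ) pairs, keep only those meeting the precision and false-positive-rate limits, then take the one with the highest F-score. A minimal sketch follows, assuming the cross-validated confusion counts for each pair are already available. The counts and the grid values are hypothetical, and SVM training itself is left out.

```python
def metrics(tp, fp, tn, fn):
    """Return (precision, false positive rate, F-score) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, fpr, f_score


def select_model(results, min_precision=0.90, max_fpr=0.01):
    """results maps (C, gamma) -> (tp, fp, tn, fn) summed over the CV folds."""
    best_params, best_f = None, -1.0
    for params, counts in results.items():
        precision, fpr, f_score = metrics(*counts)
        feasible = precision > min_precision and fpr < max_fpr
        if feasible and f_score > best_f:
            best_params, best_f = params, f_score
    return best_params, best_f


# Hypothetical cross-validation outcomes for three grid points.
results = {
    (1.0, 0.01): (60, 2, 898, 40),     # feasible, lower recall
    (10.0, 0.1): (80, 20, 880, 20),    # precision 0.80: violates the constraint
    (100.0, 0.01): (70, 5, 895, 30),   # feasible, best F-score
}
best_params, best_f = select_model(results)
print(best_params, round(best_f, 3))
```

The middle grid point has the highest recall but is rejected, which mirrors the slide's preference for a low false positive rate in feedback to learners.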

Page 39: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Error Type Classification

• Error type information is useful for
  – Meta-linguistic feedback
  – Sophisticated learner model
• Simplest way
  – Choose the error type associated with the top ranked error pattern
  – Two flaws:
    • does not have a principled way to break tied error patterns
    • does not consider the error frequency
• Weighting according to error frequency
  – Score(e) = TS(e) + α * EF(e)
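The weighted score above can be sketched as follows. The slide does not expand TS; this sketch assumes TS(e) is the pattern-match score of error type e and EF(e) its relative frequency among learner errors. The candidate values, the value of α, and the type names other than prep_sub are hypothetical.

```python
def classify_error_type(candidates, alpha=0.1):
    """Return the error type e maximizing TS(e) + alpha * EF(e).

    candidates maps error type -> (TS, EF); both readings are assumptions.
    """
    return max(candidates, key=lambda e: candidates[e][0] + alpha * candidates[e][1])


candidates = {
    "prep_sub": (1.0, 0.30),    # tied on TS with prep_omit
    "prep_omit": (1.0, 0.12),
    "art_omit": (0.5, 0.40),
}
chosen = classify_error_type(candidates)
print(chosen)
```

Here the two top patterns tie on TS, and the frequency term breaks the tie, which addresses both flaws listed for the simplest approach.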

Page 40: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

GES: Grammar Error Simulator

(Figure components: Correct Sentences, Error Types, Grammatical Error Simulator, Incorrect Sentences, Automatic Speech Recognizer.)

<LM Adaptation & Grammatical Error Detection>

Page 41: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

GES Application

<Grammar Quiz Generation>

Page 42: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Markov Logic Network

• subject-verb agreement errors
• omission errors of prepositions
• omission errors of articles

He want go to movie theater

Sungjin Lee, Gary Geunbae Lee. Realistic grammar error simulation using Markov logic. Proceedings of the ACL 2009, Singapore, August 2009.
Sungjin Lee, Jonghoon Lee, Hyungjong Noh, Kyusong Lee, Gary Geunbae Lee. (2011) Grammatical Error Simulation for Computer-Assisted Language Learning, Knowledge-Based Systems

Page 43: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Grammar Error Simulation

• Realistic errors
  – Encoding characteristics of learners’ errors using Markov logic
    • Over-generalization of some rules of the L2
    • Lack of knowledge of some rules of the L2
    • Applying rules and forms of the first language into the L2

Page 44: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Overall Process

Page 45: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

NICT JLE Corpus

• Number of interviews
  – 167
• Number of sentences of interviewees
  – 8,316
• Average length of sentences
  – 15.59
• Number of total errors
  – 15,954

Error tag format: <n_num crr=“x”>...</n_num>
• n: POS (e.g. n = noun)
• num: Grammatical system (e.g. num = number)
• crr=“x”: Corrected form
• ...: Erroneous part

Example) I belong to two baseball <n_num crr=“teams”>team</n_num>
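The error tag in the example can be read mechanically: the element name carries the POS and the grammatical system, the crr attribute carries the corrected form, and the element content is the erroneous part. A minimal sketch with a regular expression follows. It assumes plain ASCII double quotes around the crr value (the slide shows typographic quotes) and non-nested tags; the function names are hypothetical.

```python
import re

# <pos_system crr="corrected">erroneous</pos_system>
TAG = re.compile(
    r'<(?P<pos>[a-z]+)_(?P<system>[a-z0-9]+) crr="(?P<crr>[^"]*)">'
    r'(?P<err>.*?)</(?P=pos)_(?P=system)>'
)


def extract_errors(sentence):
    """Return one dict per error tag: pos, system, crr, err."""
    return [m.groupdict() for m in TAG.finditer(sentence)]


def corrected(sentence):
    """Replace every tagged span with its corrected form."""
    return TAG.sub(lambda m: m.group("crr"), sentence)


def erroneous(sentence):
    """Strip the tags but keep the learner's original wording."""
    return TAG.sub(lambda m: m.group("err"), sentence)


s = 'I belong to two baseball <n_num crr="teams">team</n_num>'
print(extract_errors(s))
print(corrected(s))
print(erroneous(s))
```

Pairs of erroneous and corrected sentences obtained this way are the kind of data an error simulator or detector can be trained on.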

Page 46: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

PESAA: POSTECH ENGLISH SPEAKING ASSESSMENT & ASSISTANT


CHAPTER 4

Page 47: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

English oral proficiency assessment: International test

• Reading aloud
• Describing a picture
• Answering questions
• Proposing a solution or opinion

• Interview
• Talking on a topic
• Discussion

• Giving an opinion
• Talking on a subject
• Answering questions

Page 48: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

English oral proficiency assessment: Korean national test

• National English Ability Test (NEAT)

• Tasks
  – Answering short questions (communication)
  – Describing pictures (story telling)
  – Presentation
    • Describing figures, tables, and graphs
    • Introducing products or events
  – Giving an opinion (discussion)

Page 49: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

English oral proficiency assessment: General common tasks

• Giving an opinion / discussion

• Rubrics
  – Delivery
    • Pronunciation
    • Fluency (Prosody)
  – Language use
    • Grammar
    • Word choice
  – Topic development
    • Organization
    • Discourse
    • Contents

Page 50: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Requirements: Real environment

(Figure: the assessment tasks Reading aloud, Describing a picture, Answering questions, Proposing a solution or opinion, Interview, Talking on a topic, Discussion, Giving an opinion, Talking on a subject, and Answering questions, annotated with "Existing systems for read speech", "Spontaneous speech", "Text-independent input", and "NEAT".)

Page 51: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Training data collection

• SNU pronunciation/prosody

(Figure: annotation tiers: Speech waveform, Spectrogram / pitch contour, Word, PLU, Sentence stress)

Page 52: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

For Public Use

• Boston University radio news corpus
  – Speech from FM radio news announcers
  – 424 paragraphs (30,821 words)
  – ToBI labels (pitch accent stress)
  – 0.48 marked stress per word
  – PLU set: TIMIT phonetic labeling system

Page 53: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

• Aix-Marsec database

(Figure: Speech waveform, Spectrogram / pitch contour, Multi-level annotation)

Page 54: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Collecting Grammar Error Data: Picture description task

• From Korean learners of English
• Story telling based on pictures
• 80 students (5 tasks for each student)

Page 55: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Collecting Grammar Error Data: Error tagsets

• JLE Tagset
  – Consisting of 46 tags
  – Systematic tag structure
  – Some ambiguity caused by POS-specific error tag structure
• CLC Tagset
  – Widely used tagset including 76 tags
  – Systematic & taxonomic tag structure
  – The JLE ambiguity issue is resolved by the taxonomic tag structure
• NUCLE Tagset
  – 27 error tags
  – Quite arbitrary tag structure
• UIUC Tagset
  – Only for articles and prepositions

Page 56: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

PESAA: Pronunciation Feedback

(Figure: architecture in three parts: simulation part, recognition part, and error detection & feedback part. Components: User, Speech input, EPD, ASR, Word-level transcription, Forced Alignment, Actual pronunciation, Material, Orthographic pronunciation, Pronouncing Simulation, Error candidates, Comparison, Error Detection, Error information, Feedback Generation.)

Page 57: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Pronunciation Error simulation: Pronunciation Variants

Strike
• Canonical pronunciation / Native speaker’s pronunciation: [straik]
• Non-native speaker’s pronunciation: [sɨtɨraikɨ]

Page 58: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Pronunciation Error simulation: Learning context rules using Generalized TBL

(Figure: training loop with the labels Training input, Training reference, Majority choice / Context, Left-right ngram context, Iterative initialization, n := 0, n := n + 1, nth initial machine annotation, nth order initialization rules, Machine annotated data, Collect transformations, Best transformation, Apply, List of transformations, Merge transformations.)

Page 59: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Pronunciation Error simulation: Multi-tag Result

• Example Input
  – Input
    • Let’s go shopping
    • # L EH T S # G OW # SH AH P EH NG #

• Example Output
  – #/# L/L EH/EH T/T S/S #/# G/G OW/OW|AO #/# SH/SH AA/AH|AA P/P IH/IH NG/NG #/#
    • #/# L/L EH/EH T/T S/S #/# G/G OW/AO #/# SH/SH AA/AA P/P IH/EH NG/NG #/#
    • #/# L/L EH/EH T/T S/S #/# G/G OW/OW #/# SH/SH AA/AA P/P IH/EH NG/NG #/#
    • #/# L/L EH/EH T/T S/S #/# G/G OW/AO #/# SH/SH AA/AH P/P IH/EH NG/NG #/#
    • #/# L/L EH/EH T/T S/S #/# G/G OW/OW #/# SH/SH AA/AH P/P IH/EH NG/NG #/#
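A multi-tag sequence such as the example output packs several pronunciation variants into one line: a token like OW/OW|AO offers two alternatives after the slash. The sketch below expands such a line into every single-tag sequence with a Cartesian product. Reading the part before the slash as one label and the part after it as its alternatives is an assumption about the notation, and the shortened input string is for illustration only.

```python
from itertools import product


def expand(multi_tag):
    """Expand 'X/A|B' tokens into all sequences with one alternative each."""
    options = []
    for token in multi_tag.split():
        label, alternatives = token.split("/")
        options.append([f"{label}/{alt}" for alt in alternatives.split("|")])
    return [" ".join(combo) for combo in product(*options)]


# Shortened illustrative input with two ambiguous positions.
multi = "#/# G/G OW/OW|AO #/# SH/SH AA/AH|AA P/P #/#"
variants = expand(multi)
for v in variants:
    print(v)
```

Two ambiguous positions with two alternatives each give four candidate pronunciations, matching the four expanded lines listed under the example output.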

Page 60: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Pronunciation Error detection/feedback

(Figure: the Feedback decision combines Error candidate information, Feedback preference, Error confidence, Word ASR confidence, and Phoneme ASR confidence, and uses the Feedback DB to produce Feedback.)

Page 61: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Pronunciation Error detection/Feedback: Components

• Feedback preference
• Error confidence
• Phoneme ASR confidence
• Word ASR confidence

)|Pr( xr

),,|Pr( 11 rhef ),,|Pr( 1 rhxe

)|Pr( xh

Page 62: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

PESAA: Prosody Feedback

• Stress & Prosodic phrasing & boundary tone
  – Stress: existence of word/sentence stress for each syllable/word
  – Prosodic phrasing: location of phrase breaks
  – Boundary tone: type of boundary tone for each phrasal boundary

Page 63: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Sentence Stress Feedback: Architecture

(Figure components: Text, Text Analysis, Rules, Rule Application, Model Training, Sentence Stress Prediction Model, Predicted Sentence Stress; Speech Signal, Speech Analysis, Alignment, Model Training, Sentence Stress Detection Model, Detected Sentence Stress; Diff., Feedback.)

Page 64: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Sentence Stress Prediction

• Feature used
  – Position info: the number of phonemes in word, the number of syllables in word, …
  – Stress info: word stress, sentence stress (rule-based prediction), …
  – Lexical info: identity of word, identity of vowel
  – Part-of-speech info

Name       Description
S-basic    Content words
U-basic    Functional words
U-adhoc    Unclassified FW EX LS POS
U-aux      MD special cases
U-adv      RP special cases
S-frgn     FW foreign words
S-vb       Last VB in multiple verbs

Page 65: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Sentence Stress Detection

• Feature used
  – Duration info: duration of vowel, duration of syllable, normalized duration of word according to the number of syllables, …
  – Intensity info: energy of vowel (+delta)
  – F0 info: f0 of vowel (+delta)
  – MFCC info: mfcc of vowel (+delta, +delta-delta)
  – Lexical info: identity of vowel

Page 66: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Sentence Stress Feedback

• Adopting output probability
  – Feedback candidates: syllables in “predicted stress” with low or high output probability

(Figure: predicted stress and detected stress shown per syllable, marked as Stressed or Not stressed, for the sentence “It may be the most important appointment”.)

Page 67: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Sentence Stress Feedback: Snapshot

Page 68: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

PESAA: Grammar Feedback

(Figure: two parallel pipelines that both produce GE Feedback from User Input.
Spoken English: SPEECH, ASR/CN, Spoken GE Detector (SVM Training), Spoken GE Simulator (Soft Constraint Training), GE tagged Texts/Speech, Correct Sentences, GE Patterns.
Written English: TEXT, Written GE Detector (SVM Training), Written GE Simulator (Soft Constraint Training), GE tagged Texts, Correct Sentences, GE Patterns.)

Page 69: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Grammar Error detection: Snapshot – written input

Page 70: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Grammar Error detection: Snapshot – spoken input

Page 71: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

FIELD STUDY

CHAPTER 5

Page 72: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Field Study: Robot-Assisted Language Learning

1 Experimental Design

2 Cognitive Effects

3 Affective Effects

Sungjin Lee, Hyungjong Noh, Jonghoon Lee, Kyusong Lee, Gary Geunbae Lee, Seongdae Sagong, Moonsang Kim. (2011) On the Effectiveness of Robot-Assisted Language Learning, ReCALL Journal, Vol.23(1), SSCI.
Sungjin Lee, Changgu Kim, Jonghoon Lee, Hyungjong Noh, Kyusong Lee, Gary Geunbae Lee. Affective Effects of Speech-enabled Robots for Language Learning. Proceedings of the 2010 IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley, December 2010.
Sungjin Lee, Hyungjong Noh, Jonghoon Lee, Kyusong Lee, Gary Geunbae Lee. Cognitive Effects of Robot-Assisted Language Learning on Oral Skills. Proceedings of Interspeech Second Language Studies Workshop, Tokyo, Sep 2010.

Page 73: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

HRI Technology

Page 74: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

HRI Experimental Design

• Setting and participants
  – 24 elementary students
  – Ranging in age from 9 to 13
  – Divided into two groups (beginner, intermediate)

• Material and treatment
  – 68 lessons
    • 17 lessons for each level and theme
  – Simple to complex task
  – 2 hours a week extended over 8 weeks

Page 75: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

HRI Experimental Design

1) PC room
2) Pronunciation training room
3) Fruit and Vegetable store
4) Stationery store

Page 76: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Evaluation of Cognitive Effects

• Data collection and analysis
  – Evaluation method
    • Pre-test/Post-test
  – For the listening skills
    • 15 items for multiple choice question
    • Cronbach’s alpha
      – pre-test: 0.87, post-test: 0.66
  – For the speaking skills
    • 10 items for 1-on-1 interview
    • Cronbach’s alpha
      – pre-test: 0.93, post-test: 0.99
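Cronbach's alpha, reported above as the reliability of each test, is computed from item-level scores as α = k/(k−1) · (1 − Σ item variances / variance of total scores), where k is the number of items. A minimal sketch follows; the score matrices are hypothetical and are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per respondent, each row holding k item scores."""
    k = len(scores[0])
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three respondents whose item scores move together perfectly.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent)
print(alpha_consistent)

# Hypothetical data with some disagreement between items.
mixed = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
print(cronbach_alpha(mixed))
```

Perfectly consistent items give an alpha of 1; as items disagree more, alpha falls.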

Page 77: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

<Cognitive effects on oral skills for overall students>

Experiment Result

*p < .05

Page 78: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Evaluation of Affective Factors

• Data collection
  – Questionnaire (4-point scale without a neutral option)
• Data analysis
  – For satisfaction in using robots
    • Descriptive statistics
  – For interest in learning English, confidence with English, motivation for learning English
    • Pre-/Post-test

Affective Factor                   N†    R††
Satisfaction in using robots       10    0.73
Interest in learning English       16    0.93 (0.96)
Confidence with English            12    0.91 (0.90)
Motivation for learning English    14    0.91 (0.83)

† N = Number of questions; †† R = Cronbach’s alpha in the form of pre-test (post-test)

Page 79: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Effects on Affective Factors

(Figure: bar chart of Pre-test and Post-test scores on a 0 to 4 scale for Satisfaction in using robots, Interest in learning English, Confidence with English, and Motivation for learning English.)

Page 80: Gary Geunbae Lee, POSTECH. Outline 1 1 2 2 4 4 5 5 3 3.

Thank you