
    Ph.D. Thesis Proposal February 9th, 2005

    Learning the Structure of Task-Oriented Conversations from the Corpus

    Ananlada Chotimongkol
    LTI Ph.D. thesis proposal

    Thesis Committee:
      Alexander Rudnicky (Chair)
      William Cohen
      Carolyn Penstein Rose
      Gokhan Tur (AT&T Lab Research)


    Outline

    Introduction to the problem

    Approach

    Research program

    Summary


    Building a new dialog system

    [Figure: dialog system architecture. Components: Speech Recognizer, Natural Language Understanding, Dialog Manager, Domain Knowledge, Natural Language Generator, Speech Synthesizer. Example exchange: user "I would like to fly to Seattle tomorrow."; system "When would you like to leave?"]

    problem: approach : research program : summary


    Domain knowledge

    Steps in the task
      Specify the desired flight
      Search for flights that match the criteria
      Negotiate the flights
      Make a reservation
    Important information, keywords
      Destination, date, time, airlines, etc.
    Domain language: how people talk


    What is the problem?

    [Figure: the same dialog system architecture as on the slide above, annotated with the following problems]

    Can't reuse
    Time consuming
    May need an expert


    Research goal

    Reduce the human effort of acquiring domain knowledge when creating a dialog system in a new domain


    Observations

    Task-oriented conversations have a clear structure
      Reflects domain information, e.g. a task is divided into sub-tasks
      Has recurring patterns that are observable through the language


    Thesis statement

    Develop a learning system that is able to identify all necessary domain knowledge required by a dialog system in a task-oriented domain through the observation of human-human conversations

    Approach
      Identify the structure of task-oriented dialogs
      Learn the structure from observations


    Desired structure properties

    Sufficient
      Capture all domain knowledge required to carry out the task
    General (domain-independent)
      Can describe dialog in dissimilar domains and types
    Learnable
      Can be learned from data using a machine learning technique


    Previous Approaches

    Theory-oriented:
      Theory of Discourse Structure (Grosz and Sidner, 1986)
      Discourse Representation Theory (DRT) (Kamp and Reyle, 1993)
    Engineering-oriented:
      Plan-based theory (Allen and Perrault, 1980)
      The theory of Conversation Acts (Traum and Hinkelman, 1992)


    Outline

    Introduction to the problem

    Approach

    Form-based dialog structure

    Dialog structure learning

    Research Program

    Summary


    Form-based dialog structure

    Use a form-based dialog architecture to represent the structure of a dialog
      Concrete mapping between structure components and dialog system components
      Sufficient for an information-accessing task
      General enough to represent other types of task-oriented dialogs
        Through the analysis of dialogs
      Learnable from a corpus of human-human conversations
        Preliminary experiments on concept clustering


    Form-based structure components

    Task structure
      Domain information necessary for achieving the task goal
    Dialog mechanism
      The mechanisms that the participants use to advance the dialog toward the goal


    Task structure

    Data representation for domain information
      Task: a subset of dialogs that has a specific goal => a set of forms
      Sub-task: a step in a task that contributes toward a task goal => a form
      Concept: key information => a slot
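
The task / sub-task / concept mapping above can be pictured as nested data. The following is a minimal sketch, not the proposal's implementation: the class names `Task` and `Form`, and the slot names (taken loosely from the air travel keywords listed earlier), are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Form:
    """A sub-task: a named form whose slots hold concept values."""
    name: str
    # Slot name -> value; None means the slot has not been filled yet.
    slots: Dict[str, Optional[str]] = field(default_factory=dict)


@dataclass
class Task:
    """A task: a set of forms that together serve one goal."""
    goal: str
    forms: List[Form] = field(default_factory=list)


# Hypothetical air travel example; the slot names are illustrative only.
flight_query = Form(
    name="flight_information",
    slots={"destination": None, "date": None, "time": None, "airline": None},
)
air_travel = Task(goal="make a flight reservation", forms=[flight_query])
```

Each concept (destination, date, and so on) becomes one slot, each sub-task becomes one form, and the task is simply the collection of its forms.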


    Task structure example: Bus schedule enquiry domain

    1. Task (multiple tasks):
         Which bus runs between A and B?
         When will the bus X arrive?
    2. Sub-tasks: no further decomposition
    3. Concepts:
         Bus Number = {61C, 28X, ...}
         Location = {CMU, airport, ...}


    Task structure example: Map reading domain

    Task: draw a route on a map
    Sub-tasks:
      Draw a segment of a route
    Concepts:
      Landmark = {White_Mountain, Machete, ...}
      Orientation = {down, left, ...}
      Distance = {a couple of centimeters, an inch, ...}


    Dialogue mechanisms (form operators)

    Task-oriented operations
      Manipulate a form (data structure)
      Ex: init_form, fill_form
    Discourse-oriented operations
      Manage the flow of a conversation
      Ex: acknowledgement, greeting
    Domain independent
      Same consequence; only the operation parameters differ
        Fill city_name in flight_information form
        Fill landmark in line_segment form
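
To make the domain-independence point concrete, here is a rough sketch of the two task-oriented operators named above. The operator names `init_form` and `fill_form` come from the slide; their signatures, the dictionary representation of a form, and the extra slot names `date` and `orientation` are assumptions for illustration.

```python
from typing import Dict, List, Optional

Form = Dict[str, Optional[str]]


def init_form(slot_names: List[str]) -> Form:
    """Create a form with every slot empty."""
    return {name: None for name in slot_names}


def fill_form(form: Form, slot: str, value: str) -> Form:
    """Return a copy of the form with one slot filled."""
    if slot not in form:
        raise KeyError(f"unknown slot: {slot}")
    updated = dict(form)
    updated[slot] = value
    return updated


# The same operator serves two domains; only the parameters change.
flight_information = fill_form(init_form(["city_name", "date"]), "city_name", "Seattle")
line_segment = fill_form(init_form(["landmark", "orientation"]), "landmark", "white mountain")
```

Nothing in either function mentions flights or maps, which is the sense in which the operators are domain independent.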


    Bus schedule enquiry domain

    Form: Query_Departure_Time
      Depart_Location:
      Arrive_Location:
      Arrive_Time:
      Bus_Number:

    Form: Query_Departure_Time
      Depart_Location: forbes avenue
      Arrive_Location: the airport
      Arrive_Time:
      Bus_Number: 28X

    U2: fill_form_info: i wanted to take the 28X bus from /um/ DepLoc:[forbes avenue] to ArLoc:[the airport]
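
As an illustration of how an annotated utterance like U2 could drive a fill operation, the sketch below pulls `Tag:[value]` spans out of the utterance and writes them into the form. The tag-to-slot table and the function name are assumptions, and the regular expression handles only flat (non-nested) annotations. Because the bus number is not bracketed in U2, this sketch leaves `Bus_Number` empty.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from annotation tags to slot names.
TAG_TO_SLOT = {"DepLoc": "Depart_Location", "ArLoc": "Arrive_Location"}


def fill_from_annotation(form: Dict[str, Optional[str]], utterance: str) -> Dict[str, Optional[str]]:
    """Copy each Tag:[value] span into the slot that the tag maps to."""
    filled = dict(form)
    for tag, value in re.findall(r"(\w+):\[([^\[\]]*)\]", utterance):
        slot = TAG_TO_SLOT.get(tag)
        if slot in filled:
            filled[slot] = value
    return filled


empty_form = {
    "Depart_Location": None,
    "Arrive_Location": None,
    "Arrive_Time": None,
    "Bus_Number": None,
}
u2 = "i wanted to take the 28X bus from /um/ DepLoc:[forbes avenue] to ArLoc:[the airport]"
query = fill_from_annotation(empty_form, u2)
```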


    Map reading domain

    Form: Line_Segment
      Origin:
      Orientation:
      Distance:
      Path:
      Destination:

    GIVER89: fill_form_info: well go Orient:[straight up] from Ori:[the Mod:[top] of the Landmark:[white mountain]] 'til you're just Dest:[Mod:[beside] the Landmark:[golden beach]]
    FOLLOWER90: acknowledge: right,

    Form: Line_Segment
      Origin:
        Modifier: top
        Landmark: white mountain
      Orientation: straight up
      Distance:
      Path:
      Destination:
        Modifier: beside
        Landmark: golden beach


    Outline

    Introduction to the problem

    Approach

    Form-based dialog structure

    Dialog structure learning

    Research Program

    Contributions

    Thesis timeline


    The learning framework

    Goal: minimize human effort
    Use unsupervised learning when possible
      Incorporate information from existing knowledge sources
    If additional knowledge from a human is required
      Train an initial model with a small amount of annotated data
      Use unsupervised learning or active learning to explore un-annotated data that is informative
      A human can correct a mistake


    Learning problems

    Concept identification and clustering

    Form identification

    Operation classification


    Concept identification and clustering

    Goal: Identify concept words and group the similar ones into the same cluster
      City = {Pittsburgh, Boston, Austin, ...}
      Month = {January, February, March, ...}
    Assumption:
      Word boundaries, including compound word boundaries, are given


    Approach

    1. Identify potential concept members
         Filter out noise, function words
    2. Cluster similar words together
         Statistical-based: mutual information, Kullback-Leibler distance
         Knowledge-based: WordNet
    3. Select clusters that represent domain concepts
         Use the same criteria as in step 1, but work at the cluster level
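
Step 2 can be sketched as agglomerative clustering over word context distributions. This is a toy illustration under several assumptions that the slides do not state: each word is represented only by the smoothed distribution of the word preceding it, distance is the symmetric Kullback-Leibler divergence, clusters are merged by average linkage, and merging stops at an arbitrary threshold. The sentences and function names are invented.

```python
import math
from collections import Counter
from itertools import combinations


def preceding_distribution(word, sentences, vocab, alpha=0.1):
    """Additively smoothed distribution over the words that precede `word`."""
    counts = Counter(
        left for sent in sentences for left, right in zip(sent, sent[1:]) if right == word
    )
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p) for two distributions over the same vocabulary."""
    return sum((p[v] - q[v]) * math.log(p[v] / q[v]) for v in p)


def cluster_words(words, sentences, threshold=0.5):
    """Average-link agglomerative clustering; stop when the closest pair is too far."""
    vocab = sorted({w for sent in sentences for w in sent})
    dist = {w: preceding_distribution(w, sentences, vocab) for w in words}
    clusters = [[w] for w in words]

    def linkage(a, b):
        return sum(symmetric_kl(dist[x], dist[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


sentences = [
    "fly to boston in january".split(),
    "fly to austin in march".split(),
    "leave in february".split(),
]
clusters = cluster_words(["boston", "january", "austin", "march"], sentences)
```

On this toy input the city words share the preceding word "to" and the month words share "in", so the two groups separate cleanly. Real dialog data would need richer context features and a principled stopping criterion.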


    Concept clustering result

    Algorithm     Precision  Recall  SS    QS
    MI            0.82       0.41    0.72  0.60
    KL            0.83       0.42    0.73  0.61
    KL-single     0.70       0.33    0.59  0.49
    KL-complete   0.78       0.60    0.50  0.61
    KL-average    0.82       0.50    0.68  0.64


    Form-based dialog structure summary

    Concrete mapping between structure components and dialog system components
    Sufficient for an information-accessing task
    General enough to explain other types of task-oriented dialogs
      Through the analysis of dialogs
    Learnable from a corpus of human-human conversations
      Preliminary experiments on concept clustering


    Proposed research program

    Dialog structure analysis
      Is the scheme generalizable?
    Inter-annotator agreement experiment
      Is the scheme unambiguous?
    Improve concept clustering
      How can concepts best be identified?
    Form identification
      How are topics/forms identified?
    Operation classification
      How can operators be identified?


    Dialog structure analysis

    Goal: Verify that the proposed dialog structure generalizes to other task-oriented domains
    Analyze 2 more domains
      Tutoring domain (WHY Human Tutoring corpus)
      Meeting domain (CMU CALO Meeting corpus)


    Inter-annotator agreement

    Goal: Verify that the proposed dialog structure can be understood and applied by other annotators
    Evaluate with the kappa coefficient (K)

    $$K = \frac{P(A) - P(E)}{1 - P(E)}$$
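
A small worked example of the kappa computation, assuming the two-annotator (Cohen) formulation in which P(A) is the observed agreement rate and P(E) is the agreement expected by chance from each annotator's own label frequencies. The slides do not say which variant of kappa is intended, and the labels below are invented.

```python
from collections import Counter
from typing import Sequence


def kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    # P(A): fraction of items on which the annotators agree.
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from the two marginal label distributions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_agree - p_chance) / (1 - p_chance)


annotator_1 = ["fill_form", "fill_form", "acknowledge", "greeting"]
annotator_2 = ["fill_form", "acknowledge", "acknowledge", "greeting"]
k = kappa(annotator_1, annotator_2)
```

Here P(A) = 3/4 and P(E) = (2·1 + 1·2 + 1·1)/16 = 5/16, so K = (3/4 − 5/16)/(1 − 5/16) = 7/11.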


    Inter-annotator agreement experiments

    Two annotation tasks
      Task-structure identification
        Identify the structure of the task in the new domain
        Design domain-specific labels from the definition of dialog structure
      Dialog structure recognition
        Annotate dialogs for the task structure and the operations
    Two different types of task-oriented dialogs
      Air travel domain (information-accessing task)
      Map reading domain (command-and-control task)


    Improve concept clustering

    Goal: Improve the quality of the concept identification and clustering technique
    1. Combine concept identification features
         Develop the concept likelihood score
    2. Combine statistical-based clustering with knowledge-based clustering
         Revise the result from statistical-based clustering with information in the knowledge base
    3. Implement post-clustering selection


    Form Identification

    Goal: determine different types of forms that occur in the domain
    Assumption:
      A dialog may be annotated with concept labels


    Approach

    Segment a dialog into a sequence of sub-tasks (form boundary identification)
      Train a classifier on lexical cohesion (Hearst, 1994) and prosodic features
    Group together the sub-tasks that belong to the same form type
      Use unsupervised clustering based on cosine similarity
    Identify the set of slots associated with each form type
      Analyze a cluster of similar form instances
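
The grouping step can be illustrated with cosine similarity over bags of concept labels. This sketch assumes each sub-task segment is already reduced to the concept labels it contains, and it uses a simple one-pass greedy grouping with an arbitrary threshold as a stand-in for whatever clustering algorithm the thesis adopts. The tags `BusNumber` and `ArTime` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-labels vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_segments(segments: List[List[str]], threshold: float = 0.5) -> List[List[List[str]]]:
    """Put each segment into the first group whose seed segment is similar enough."""
    groups = []  # list of (seed vector, member segments)
    for seg in segments:
        vec = Counter(seg)
        for seed, members in groups:
            if cosine(vec, seed) >= threshold:
                members.append(seg)
                break
        else:
            groups.append((vec, [seg]))
    return [members for _, members in groups]


segments = [
    ["DepLoc", "ArLoc", "BusNumber"],
    ["BusNumber", "ArTime"],
    ["DepLoc", "ArLoc"],
    ["ArTime", "BusNumber"],
]
form_types = group_segments(segments)
```

Once segments are grouped, the union of concept labels within a group suggests the slots of that form type, which is the third step listed above.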


    Operation Classification

    Goal: Learn the expressions associated with each operation by classifying an utterance into a pre-defined set of operations
    Assumptions:
      A dialog may be annotated with concept labels
      The list of operation types is given
      Operation boundaries are known


    Supervised classification

    Features: words, concepts, prosody
    Markov model (Woszczyna and Waibel, 1994)
      States = operation types
      Emission probability
        Operation-dependent language model probability
        Decision tree probability for prosodic features
    Conditional random fields (Lafferty et al., 2001)
      Use the same model structure as the Markov model

    $$P(T \mid U) = \frac{1}{Z(U)} \exp\Big(\sum_j \lambda_j F_j(T, U)\Big)$$
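
For the Markov model variant, decoding the most likely sequence of operation types can be done with the Viterbi algorithm. The sketch below is illustrative only: each utterance is collapsed to a single invented cue symbol, and a small emission table stands in for the operation-dependent language model and the prosodic decision tree. All probabilities are made up.

```python
import math
from typing import Dict, List


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Most likely state sequence, computed in log space (all probabilities > 0)."""
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Best predecessor for state s at this position.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]


states = ["greeting", "fill_form", "acknowledge"]
start_p = {"greeting": 0.8, "fill_form": 0.1, "acknowledge": 0.1}
trans_p = {
    "greeting": {"greeting": 0.1, "fill_form": 0.8, "acknowledge": 0.1},
    "fill_form": {"greeting": 0.1, "fill_form": 0.4, "acknowledge": 0.5},
    "acknowledge": {"greeting": 0.1, "fill_form": 0.7, "acknowledge": 0.2},
}
emit_p = {
    "greeting": {"hello": 0.8, "content": 0.1, "ok": 0.1},
    "fill_form": {"hello": 0.1, "content": 0.8, "ok": 0.1},
    "acknowledge": {"hello": 0.1, "content": 0.1, "ok": 0.8},
}
decoded = viterbi(["hello", "content", "ok"], states, start_p, trans_p, emit_p)
```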


    Unsupervised learning and active learning

    1. Train an initial classifier from human-labeled data
    2. Apply the current classifier to an unlabeled operation
         (Unsupervised learning) If the confidence is high, add this instance and the predicted label to the training set
         (Active learning) If the confidence is low, ask a human to label this instance and then add it to the training set
    3. Train a new classifier on all labeled data (both machine-labeled and human-labeled)

    Steps 2-3 can be iterated
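
One pass through steps 1 to 3 might look like the sketch below. It is a simplification under stated assumptions: confidence is taken to be the top class probability, the threshold of 0.8 is arbitrary, and `toy_train` is an invented placeholder classifier used only so the example runs. The names `label_round`, `train`, and `ask_human` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Labeled = List[Tuple[str, str]]
Classifier = Callable[[str], Dict[str, float]]


def label_round(
    labeled: Labeled,
    unlabeled: List[str],
    train: Callable[[Labeled], Classifier],
    ask_human: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[Classifier, Labeled]:
    """Run steps 1-3 once and return the retrained classifier and the grown set."""
    classifier = train(labeled)                    # step 1
    grown = list(labeled)
    for utterance in unlabeled:                    # step 2
        probs = classifier(utterance)
        label = max(probs, key=probs.get)
        if probs[label] < threshold:
            label = ask_human(utterance)           # low confidence: active learning
        grown.append((utterance, label))           # high confidence: keep the prediction
    return train(grown), grown                     # step 3


def toy_train(data: Labeled) -> Classifier:
    """Placeholder: confident only about utterances already seen as 'acknowledge'."""
    ack_utterances = {u for u, label in data if label == "acknowledge"}

    def classify(utterance: str) -> Dict[str, float]:
        p_ack = 0.9 if utterance in ack_utterances else 0.5
        return {"acknowledge": p_ack, "fill_form": 1.0 - p_ack}

    return classify


seed = [("right", "acknowledge"), ("from forbes avenue", "fill_form")]
_, grown = label_round(seed, ["right", "to the airport"], toy_train, lambda u: "fill_form")
```

Calling `label_round` repeatedly, feeding `grown` back in as `labeled`, corresponds to iterating steps 2 and 3.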


    Classifier confidence score

    1. Difference in probabilities between the first rank and the second rank
    2. The entropy of the classifier output
         High entropy = low confidence

    $$H(T) = \sum_j p(T_j \mid U_i) \log \frac{1}{p(T_j \mid U_i)}$$
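
Both confidence scores are short to compute from a classifier's output distribution. A minimal sketch, assuming the classifier returns one probability per operation type for the utterance; the function names are illustrative.

```python
import math
from typing import Sequence


def margin_confidence(probs: Sequence[float]) -> float:
    """Score 1: probability of the top-ranked class minus that of the runner-up."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[1]


def entropy(probs: Sequence[float]) -> float:
    """Score 2: entropy of the output distribution (higher means less confident)."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)


peaked = [0.7, 0.2, 0.1]
flat = [1 / 3, 1 / 3, 1 / 3]
```

The peaked distribution has the larger margin and the smaller entropy, so both scores agree that it is the more confident prediction.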


    Thesis contributions

    A dialog structure framework that is sufficient, general and learnable, and has a concrete mapping between dialog structure components and dialog system behavior
    A machine learning technique for inferring the structure of a dialog from data with a limited amount of human supervision
      Reduce human effort in acquiring domain-specific information


    Thesis contributions (Cont.)

    An unsupervised algorithm that can identify and cluster domain concepts from un-annotated data
    An utterance-type classifier that is able to utilize unlabeled data through unsupervised learning and active learning
    A discourse segmentation algorithm that can identify the boundaries between similar-type sub-tasks and dissimilar-type sub-tasks


    Timeline

    [Gantt chart: research activities by month, January 2005 through April 2006 (Spring 2005, Summer 2005, Winter 2005, Spring 2006). Activities: dialog structure analysis; inter-annotator agreement experiment; concept identification and clustering; operation classification; form identification; thesis write-up.]


    Questions?


    References

    Grosz, B. and Sidner, C., Attention, intentions, and the structure of discourse, Computational Linguistics, Vol. 12, pp. 175-204, 1986.

    Kamp, H. and Reyle, U., From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory, Kluwer, Dordrecht, The Netherlands, 1993.

    Allen, J. and Perrault, R., Analyzing intention in utterances, Artificial Intelligence, Vol. 15, pp. 143-178, 1980.

    Traum, D. and Hinkelman, E., Conversation Acts in Task-Oriented Spoken Dialogue, Computational Intelligence, Vol. 8, No. 3, pp. 575-599, 1992.

    Hearst, M., Multi-paragraph segmentation of expository text, Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994.

    Woszczyna, M. and Waibel, A., Inferring linguistic structure in spoken language, Proceedings of ICSLP-1994, Yokohama, Japan, September, 1994.

    Lafferty, J., McCallum, A. and Pereira, F., Conditional random fields: Probabilistic models for segmenting and labeling sequence data, Proceedings of the 18th International Conference on Machine Learning, pp. 282-289, San Francisco, CA, 2001.

