Date post: | 22-Dec-2015 |
Category: |
Documents |
View: | 216 times |
Download: | 1 times |
Discourse Annotation for Improving Spoken Dialogue Systems
Joel Tetreault, Mary Swift, Preethum Prithviraj, Myroslava Dzikovska, James Allen University of RochesterDepartment of Computer ScienceACL Workshop on Discourse AnnotationJuly 25, 2004
Reference in Spoken Dialogue Resolving anaphoric expressions correctly is critical
in task-oriented domains Makes conversation easier for humans
Reference resolution module provides feedback to other components in system Ie. Incremental Parsing, Interpretation Module
Investigate how to improve RRM: Does deep semantic information provide an improvement
over syntactic approaches? Discourse Structure could be effective in reducing search
space of antecedents and improving accuracy (Grosz and Sidner, 1986)
Goal
Construct a linguistically rich parsed corpus to test algorithms and theories on reference in spoken dialogue, to provide overall system improvement Implicit roles
Paucity of empirical work on reference in spoken dialogue (Bryon and Stent 1998, Eckert & Strube, 2000; etc.)
Outline
Corpus Construction Parsing Monroe Domain Reference Annotation Dialogue Structure Annotation
Results Personal pronoun evaluation Dialogue Structure
Summary
Parsing Monroe Domain Domain: Monroe Corpus of 20 transcriptions (Stent, 2001) of
human subjects collaborating on Emergency Rescue 911 tasks Each dialogue was at least 10 minutes long, and most were over
300 utterances long Work presented here focuses on 5 of the dialogues 17 (1756
utterances) Goals: develop a corpus of sentences parsed with rich syntactic,
semantic, discourse information to: Improve TRIPS parser (Swift et al., 2004) Train statistical parser for comparison with existing parser Develop incremental parser (Stoness et al., 2004) Develop automated techniques for marking repairs
Parser information for Reference Rich parser output is helpful for discourse
annotation and reference resolution: Referring expressions identified (pronoun, NP, impros) Verb roles and temporal information (tense, aspect)
identified Noun phrases have semantic information associated
with them Speech act information (question, acknowledgment) Discourse markers (so, but) Semi-automatic annotation increases reliability
Monroe Corpus Example
UTT SPK:SA TEXTUtt53 S: TELL and so we're going to take an ambulance
from saint mary's hospital Utt54 U: TELL oh you never told me about the ambulancesUtt55 U: WH-QU how many do you have Utt56 S: TELL there's one at saint mary's hospital and
two at rochester general hospitalUtt57 U: IDENTIFY two Utt58 U: CONFIRM okay Utt59 S: TELL and we're going to take an ambulance from
saint mary's to east main street Utt60 S: CCA and that is as far as i have planned Utt61 U: CONFIRM okay Utt62A U: CONFIRM okay
TRIPS Parser
Broad-coverage, deep parser Uses bottom-up algorithm with CFG and
domain independent ontology combined with a domain model
Flat unscoped LF with events and labeled semantic roles based on FrameNet
Semantic information for noun phrases based on EuroWordNet
Semantics Example: “an ambulance” (TERM :VAR V213818
:LF (A V213818 (:* LF::LAND-VEHICLE W::AMBULANCE) :INPUT (AN AMBULANCE))
:SEM ($ F::PHYS-OBJ (SPATIAL-ABSTRACTION SPATIAL-POINT)
(GROUP -) (MOBILITY LAND-MOVABLE) (FORM ENCLOSURE) (ORIGIN ARTIFACT) (OBJECT-FUNCTION VEHICLE) (INTENTIONAL -) (INFORMATION -) (CONTAINER (OR + -))
(TRAJECTORY -)))
Semantic Representations for “Them” “and then send them to Strong Hospital” (TERM :VAR V3337536
:LF (PRO V3337536
(SET-OF (:* LF::REFERENTIAL-SEM THEM))
:SEM ($ F::PHYS-OBJ (F::MOBILITY F::MOVABLE)))
Corpus Construction Mark sentence status (ungrammatical, incomplete,
conjoined) and mark speech repairs Parse with domain-specific semantic restrictions for
better coverage Handcheck sentences, marking GOOD or BAD
Criteria for GOOD: both syntactic and semantic must be correct
Update parser to cover BAD cases Reparse and repeat handchecking
DataCollection
CorpusAnnotation
RunParser
ManualUpdate
ParserUpdate
Reparse &Merge
Current CoverageCorpus % Good Good Bad NA Total
S2 90.8% 325 34 37 405
S4 76.1% 246 78 61 388
S12 89.9% 151 17 21 189
S16 84.2% 298 56 29 383
S17 85.2% 311 54 26 392
Overall 84.1% 1331 239 174 1757
Reference Annotation
Annotated dialogues for reference w/undergraduate researchers (created a Java Tool: PronounTool)
Markables determined by LF terms Identification numbers determined by :VAR field of LF
term Used stand-off file to encode what each pronoun refers
to (refers-to) and the relation between pronoun and antecedent (relation)
Post-processing phase assigns an unique identification number to coreference chains
Also annotated coreference between definite noun phrases
Reference Annotation
Used slightly modified MATE scheme: pronouns divided into the following types: IDENTITY (Coreference) (278) FUNCTIONAL (20) PROPOSITON/D.DEXEIS (41) ACTION/EVENT (22) INDEXICAL (417) EXPLETIVE (97) DIFFICULT (5)
Dialogue Structure
How to integrate discourse structure into a reference module? Is it worth it?
Shallow techniques may work better: may not be necessary to get a fine embedding to improve reference resolution
Implemented QUD-based technique and Dialogue Act model (Eckert and Strube, 2000)
Annotated in a stand-off file
literal QUD
Questions Under Discussion (Craige Roberts, Jonathan Ginzburg) – questions or modals can be viewed as creating a discourse segment
Result – questions provide a shallow discourse structuring, but that maybe enough to improve performance
Entities in QUD main segment can be viewed as the topic Segment closed when question is answered (use ack
sequences, change in entities used) only entities from answer and entities in question are
accessible Can be used in TRIPS to reduce search space of entities –
set context size
QUD Annotation Scheme
Annotate: Start utterance End utterance Type (aside, repeated question, unanswered, nil)
QUD
Issue 1: easy to detect Q’s (use Speech-Act information), but how do you know Q is answered?
Cue words, multiple acknowledgements, changes in entities discussed provide strong clues that question is finishing
Issue 2: what is more salient to a QUD pronoun – the QUD topic or a more recent entity?
Dialogue Act Segmentation
Utterances that are not acknowledged by the listener may not be in common ground and thus not accessible to pronominal reference
Evaluation showed improvement for pronouns referring to abstract entities, and strong annotator reliability
Each utterance marked as I: contains content (initiation), A: acknowledgment, C: combination of the above
Results
Incorporating semantics into reference resolution algorithm (LRC) improves performance from 61.5% to 66.9% (CATALOG ’04)
Preliminary QUD results show an additional boost to 67.3% (DAARC ’04)
E&S Automated: 63.4% E&S Manual: 60.0%
Issues
Inter-annotator agreement for QUD annotation Segment ends are hardest to synch
Ungrammatical and fragmented utterances: Parse automatically or manually?
Small corpus size: need more data for statistical evaluations
Parser freeze? important for annotators to stay abreast of latest changes
Summary
Semi-automated parsing process to produce reliable discourse annotation
Discourse annotation done manually, but automated data helps guide manual annotation
Result: spoken dialogue corpus with rich linguistic data