Talk Schedule Question Answering from Email
Bryan Klimt
July 28, 2005
Project Goals
• To build a practical working question answering system for personal email
• To learn about the technologies that go into QA (IR, IE, NLP, MT)
• To discover which techniques work best and when
System Overview
Dataset
• 18 months of email (Sept 2003 to Feb 2005)
• 4799 emails total
• 196 are talk announcements
• hand labelled and annotated
• 478 questions and answers
A new email arrives…
• Is it a talk announcement?
• If so, we should index it.
Email Classifier
[Diagram: Email Data feeding a LogisticRegression classifier and a DecisionLogisticRegressionCombo classifier]
Classification Performance
• precision = 0.81
• recall = 0.66
• (previous work had better performance)
• Top features: abstract, bio, speaker, copeta, multicast, esm, donut, talk, seminar, cmtv, broadcast, speech, distinguish, ph, lectur, ieee, approach, translat, professor, award
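A logistic-regression classifier over bag-of-words features, as described above, can be sketched in a few lines. This is a hypothetical toy version (the vocabulary, training data, and learning-rate/epoch settings here are illustrative, not the system's actual ones):

```python
# Minimal pure-Python logistic regression over bag-of-words features.
# Toy sketch only: vocab, training texts, and hyperparameters are invented.
import math
from collections import Counter

def featurize(text, vocab):
    """Count occurrences of each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def train(X, y, epochs=200, lr=0.5):
    """Stochastic gradient descent on the logistic log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, x):
    """1 = talk announcement, 0 = other email."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0

vocab = ["abstract", "speaker", "bio", "talk", "lunch", "noon"]
train_texts = ["talk abstract speaker bio", "lunch at noon"]
X = [featurize(t, vocab) for t in train_texts]
w, b = train(X, [1, 0])
print(predict(w, b, featurize("speaker bio and abstract attached", vocab)))  # → 1
```

In the real system the features would be the stemmed tokens listed above rather than a hand-picked vocabulary.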
Annotator
• Use Information Extraction techniques to identify certain types of data in the emails:
– speaker names and affiliations
– dates and times
– locations
– lecture series and titles
Rule-based Annotator
• Combine regular expressions and dictionary lookups
• defSpanType date =: ...[re('^\d\d?$') ai(dayEnd)? ai(month)]...;
• matches “23rd September”
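A rough Python `re` equivalent of that rule (an assumed translation: the original uses Minorthird-style rule syntax, where `ai(dayEnd)` and `ai(month)` appear to be dictionary lookups for ordinal suffixes and month names):

```python
# Approximate Python translation of the rule-based date annotator above.
# DAY_END and MONTHS stand in for the dayEnd and month dictionaries.
import re

MONTHS = (r"(January|February|March|April|May|June|July|August"
          r"|September|October|November|December)")
DAY_END = r"(st|nd|rd|th)"

# one- or two-digit day, optional ordinal suffix, then a month name
DATE = re.compile(r"\b\d{1,2}" + DAY_END + r"?\s+" + MONTHS + r"\b")

m = DATE.search("The talk is on 23rd September at noon.")
print(m.group(0))  # → 23rd September
```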
Conditional Random Fields
• Probabilistic framework for labelling sequential data
• Known to outperform HMMs (relaxation of independence assumptions) and MEMMs (avoid “label bias” problem)
• Allow for multiple output features at each node in the sequence
Rule-based vs. CRFs
• Both results are much higher than in previous study
• For dates, times, and locations, rules are easy to write and perform extremely well
• For names, titles, affiliations, and series, rules are very difficult to write, and CRFs are preferable
Template Filler
• Creates a database record for each talk announced in the email
• This database is used by the NLP answer extractor
Filled Template
Seminar {
  title = “Keyword Translation from English to Chinese for Multilingual QA”
  name = Frank Lin
  time = 5:30pm
  date = Thursday, Sept. 23
  location = 4513 Newell Simon Hall
  affiliation =
  series =
}
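One way to represent such a filled template in code is a simple record type. This is a hypothetical sketch; the field names follow the slide, not the system's actual implementation:

```python
# Hypothetical record type for a filled seminar template.
from dataclasses import dataclass

@dataclass
class Seminar:
    title: str = ""
    name: str = ""
    time: str = ""
    date: str = ""
    location: str = ""
    affiliation: str = ""  # may stay empty, as in the example template
    series: str = ""

record = Seminar(
    title="Keyword Translation from English to Chinese for Multilingual QA",
    name="Frank Lin",
    time="5:30pm",
    date="Thursday, Sept. 23",
    location="4513 Newell Simon Hall",
)
print(record.location)  # → 4513 Newell Simon Hall
```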
Search Time
• Now the email is indexed
• The user can ask questions
IR Answer Extractor
“Where is Frank Lin’s talk?”
0.5055 3451.txt  search[468:473]: "frank"  search[2025:2030]: "frank"  search[474:477]: "lin"
0.1249 2547.txt  search[580:583]: "lin"
0.0642 2535.txt  search[2283:2286]: "lin"
• Performs a traditional IR (TF-IDF) search using the question as a query
• Determines the answer type from simple heuristics (“Where”->LOCATION)
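The two steps above, TF-IDF ranking plus a question-word heuristic, can be sketched as follows. This is a toy illustration (the document snippets, scoring details, and heuristic table are invented, not the system's actual ones):

```python
# Toy TF-IDF ranking plus a "Where -> LOCATION" answer-type heuristic.
# Documents and the ANSWER_TYPES table are illustrative inventions.
import math
from collections import Counter

docs = {
    "3451.txt": "frank lin talk 4513 newell simon hall",
    "2547.txt": "jessica lin seminar wean hall",
}

def tfidf_score(query, doc_text, all_docs):
    """Sum of tf * idf over query terms, with idf = log(N / df)."""
    counts = Counter(doc_text.split())
    n = len(all_docs)
    score = 0.0
    for term in query.lower().split():
        df = sum(1 for d in all_docs.values() if term in d.split())
        if df and counts[term]:
            score += counts[term] * math.log(n / df)
    return score

ANSWER_TYPES = {"where": "LOCATION", "when": "TIME", "who": "NAME"}

question = "Where is Frank Lin's talk?"
qtype = ANSWER_TYPES.get(question.split()[0].lower(), "UNKNOWN")
query = "frank lin talk"
ranked = sorted(docs, key=lambda d: tfidf_score(query, docs[d], docs),
                reverse=True)
print(qtype, ranked[0])  # → LOCATION 3451.txt
```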
NL Question Analyzer
• Uses the Tomita Parser to fully parse questions and translate them into a structured query language
• “Where is Frank Lin’s talk?”
• ((FIELD LOCATION) (FILTER (NAME “FRANK LIN”)))
NL Answer Extractor
• Simply executes the structured query produced by the Question Analyzer
• ((FIELD LOCATION) (FILTER (NAME “FRANK LIN”)))
• select LOCATION from seminar_templates where NAME=“FRANK LIN”;
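Executing that structured query against a seminar-template table can be sketched as below. This uses an in-memory SQLite database purely for illustration; the slides do not specify the actual storage layer, and the tuple encoding of the query is an assumption:

```python
# Hypothetical translation of ((FIELD LOCATION) (FILTER (NAME "FRANK LIN")))
# into SQL over a seminar_templates table; sqlite3 used for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seminar_templates (name TEXT, location TEXT)")
conn.execute("INSERT INTO seminar_templates VALUES (?, ?)",
             ("FRANK LIN", "4513 Newell Simon Hall"))

# (FIELD, [(filter_column, filter_value), ...])
structured = ("LOCATION", [("NAME", "FRANK LIN")])

field, filters = structured
where = " AND ".join(f"{col} = ?" for col, _ in filters)
sql = f"SELECT {field} FROM seminar_templates WHERE {where}"
row = conn.execute(sql, [v for _, v in filters]).fetchone()
print(row[0])  # → 4513 Newell Simon Hall
```

Column names come from the query itself, so in a real system they would need validation; only the filter values are passed as bound parameters here.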
Results
• NL Answer Extractor -> 0.870
• IR Answer Extractor -> 0.755
[Chart: Answer Accuracy (0 to 1), NL Answer Extractor vs. IR Answer Extractor]
Results
• Both answer extractors have similar (good) performance
• IR-based extractor
– easy to implement (1-2 days)
– better on questions w/ titles and names
– very bad on yes/no questions
• NLP-based extractor
– more difficult to implement (4-5 days)
– better on questions w/ dates and times
Examples
• “Where is the lecture on dolphin language?”
– NLP Answer Extractor: Fails to find any talk
– IR Answer Extractor: Finds the correct talk
– Actual Title: “Natural History and Communication of Spotted Dolphin, Stenella Frontalis, in the Bahamas”
• “Who is speaking on September 10?”
– NLP Extractor: Finds the correct record(s)
– IR Extractor: Extracts the wrong answer
– A talk at “10 am, November 10” ranks higher than one on “Sept 10th”
Future Work
• Add an annotation “feedback loop” for the classifier
• Add a planner module to decide which answer extractor to apply to each individual question
• Tune parameters for classifier and TF-IDF search engine
• Integrate into a mail client!
Conclusions
• Overall performance is good enough for the system to be helpful to end users
• Both rule-based and automatic annotators should be used, but for different types of annotations
• Both IR-based and NLP-based answer extractors should be used, but for different types of questions
DEMO