Extracting Information for Context-aware Meeting Preparation
Simon Scerri, Behrang Q. Zadeh, Maciej Dabrowski, Ismael Rivera
26.05.2014 LREC 2014. Reykjavik, Iceland.
General Objectives
LREC 2014, Wednesday, 28 May 2014
Information Extraction Targets
Information Items & their attributes: (Semi)Structured
• Email Messages
• Instant Messages
• Documents
• Calendar Events
• Folders
Item Titles, Descriptions & Content: Complex/Unstructured
• Keywords
• Action Items: Information Request & Task Request
Architecture
Keyword Extraction - Method
[Diagram: Keyword Extraction; General Text Processing; Indexing and Storage]
• Generic term extraction architecture
• Based on the assumption that similar terms appear in similar contexts
• Use the context of previously known terms to identify new terms
• Random Indexing for the construction of a vector space model (VSM) at reduced dimension
• Create a training set using the previously known terms
• Use a linear least-squares support vector machine (SVM)
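The Random Indexing step above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the dimension, the number of non-zero entries, the window size and all function names are assumptions, and the SVM training step is not shown.

```python
import math
import random
from collections import defaultdict

DIM = 300      # reduced dimensionality (assumed value, not from the slides)
NONZERO = 6    # number of +1/-1 entries per index vector (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary index vector for a context word, as {position: +1 or -1}."""
    rng = random.Random(word)  # seeding by the word keeps its vector reproducible
    positions = rng.sample(range(dim), nonzero)
    return {p: rng.choice((1, -1)) for p in positions}


def context_vectors(sentences, window=2, dim=DIM):
    """Add the index vectors of neighbouring words into each word's context vector."""
    vectors = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in index_vector(tokens[j], dim).items():
                        vectors[target][pos] += sign
    return vectors


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus: "parser" and "tagger" occur in identical contexts,
# so their context vectors coincide and the similarity is close to 1.0.
corpus = [
    "we train the parser on annotated text".split(),
    "we train the tagger on annotated text".split(),
]
vecs = context_vectors(corpus)
print(cosine(vecs["parser"], vecs["tagger"]))
```

In a setup like the one on the slide, the context vectors of previously known terms would serve as positive training examples for the classifier.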
MLTagger
• Mining technical terms (Expert Vocabulary) in semi-supervised manner (minimum user intervention)
• Train or Use Pre-Trained Models
• Input: a Sentence
• Tagger based on LIBLINEAR SVMs
• Includes POS tags, Dependency Structures
• Includes user feedback to identify relevant terms
• Output: Set of weighted terms
technology term / 1.3071887518221268
term tagger / 0.859136213710545
technology term tagger / 0.75647809808033
technology related terms / 0.38733521155619
terms / 0.3856395759054531
Dependency / 0.24820541872752222
identification of technology related terms / 0.22234662115108667
technology / 0.2218680207043609
technology / 0.20526909576693653
features including POS tags / 0.169229802088223
Dependency Structures / 0.1408195803257369
features including Part / 0.12821844123781564
Part of Speech tags / 0.10986616318102964
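A downstream consumer could rank and filter the weighted terms shown above. The sketch below parses a few of the "term / weight" pairs from the slide; the parsing helper and the cut-off value are hypothetical and only illustrate one way to use the output.

```python
def parse_weighted_terms(lines):
    """Parse 'term / weight' lines into (term, weight) pairs, highest weight first."""
    pairs = []
    for line in lines:
        term, _, weight = line.rpartition(" / ")
        if term:  # skip lines without the ' / ' separator
            pairs.append((term.strip(), float(weight)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# A subset of the output listed on the slide.
output = [
    "terms / 0.3856395759054531",
    "technology term / 1.3071887518221268",
    "Dependency Structures / 0.1408195803257369",
    "term tagger / 0.859136213710545",
]

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the slides
keywords = [term for term, weight in parse_weighted_terms(output) if weight >= THRESHOLD]
print(keywords)  # ['technology term', 'term tagger']
```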
Keyword Extraction - Evaluation
Evaluation over corpora of scientific papers
• Section A of ACL Anthology Reference Corpus
• Semantic Web Dog Food corpus
Evaluated datasets are available here: http://parsie.deri.ie/datasets/TTI/
Precision-Recall estimation
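The slides do not say how precision and recall were estimated. As a minimal sketch, assuming exact-match comparison between extracted terms and a gold-standard term list (both example sets below are made up for illustration):

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted terms against a gold set (exact match)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only, not from the evaluated corpora.
extracted = {"term tagger", "technology term", "terms", "dependency"}
gold = {"term tagger", "technology term", "part of speech tags"}
p, r = precision_recall(extracted, gold)
print(p, r)
```

Here two of the four extracted terms are correct (precision 0.5) and two of the three gold terms are found (recall 2/3).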
Action Item Extraction - Method
GATE Pipeline (English)
• Conditional Corpus
ANNIE IE System
• Tokeniser/NE Transducer/POS Tagger
Gazetteer Lookup
• Verbs (Actions, Activities, Modal verbs)
• Grammatical Person
JAPE Hand-coded Rules
• 62 rules in 16 phases
• Grammatical Person
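The idea of combining gazetteer lookups (verbs, modal verbs) with grammatical person in hand-coded rules can be illustrated outside GATE. The sketch below uses Python regular expressions instead of JAPE; the word lists and the three rules are hypothetical stand-ins and do not reproduce the 62 rules mentioned on the slide.

```python
import re

# Hypothetical mini-gazetteers; the slides do not list the actual entries.
ACTION_VERBS = r"(?:send|review|prepare|book|check|update|forward)"
MODALS = r"(?:can|could|will|would|should)"

RULES = [
    # modal + second person + action verb: "Could you send ..."
    ("task_request",
     re.compile(rf"\b{MODALS}\s+you\s+(?:please\s+)?{ACTION_VERBS}\b", re.I)),
    # imperative introduced by "please": "Please review ..."
    ("task_request",
     re.compile(rf"\bplease\s+{ACTION_VERBS}\b", re.I)),
    # question asking the addressee for information: "Do you know ...?"
    ("information_request",
     re.compile(r"\b(?:do|did|does|can|could)\s+you\s+(?:know|tell|have)\b.*\?", re.I)),
]


def extract_action_items(text):
    """Return (label, sentence) for each sentence matched by a rule (first match wins)."""
    items = []
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):
        for label, pattern in RULES:
            if pattern.search(sentence):
                items.append((label, sentence))
                break
    return items


message = ("Thanks for the notes. Could you send the agenda by Friday? "
           "Do you know who is attending?")
for label, sentence in extract_action_items(message):
    print(label, "->", sentence)
```

A phased rule engine such as JAPE can build later rules on annotations produced by earlier phases, which a flat list of regular expressions like this one does not capture.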
Action Item Extraction - Evaluation
Human vs Automatic Annotation
• > 100 email messages
• > 240 chat turns
• Confirmation of Extracted Action Items
• Marking False positives & False negatives (Missed Items)
Results
• F2-measure: 0.69
• Email only: 0.71
• IM only: 0.64
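For reference, the F2-measure is the F-beta score with beta = 2, which weights recall more heavily than precision. A small sketch of the standard formula follows; the precision and recall values in the example are illustrative and are not the figures behind the reported results.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only: precision 0.5, recall 0.8.
print(round(f_beta(0.5, 0.8), 3))          # 0.714 (F2)
print(round(f_beta(0.5, 0.8, beta=1), 3))  # 0.615 (F1, for comparison)
```

With the same precision and recall, F2 comes out higher than F1 here because recall is the larger of the two values.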
Extracted Items: Unified Representation
Future Work
Action Item Extraction
• Separation of pipelines
• Email & IM
• IM Pipeline: Abbreviation/TxtSpk replacement service
Keyword Extraction
• Iterative Learning Procedure (App Validation)
• Active Learning – k-nearest-neighbour Regression instead of SVM
• Chat-email histories to Extract Background Knowledge
• Application of Association Measures for Filtering Candidate terms
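The slides do not name a specific association measure. As one common example, pointwise mutual information (PMI) over adjacent word pairs could be used to score two-word candidate terms; the sketch below is an assumption about how such a filter might look, with a made-up toy corpus.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Pointwise mutual information (log base 2) for adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1, p_w2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus, for illustration only.
sentences = [
    ["random", "indexing", "works"],
    ["random", "indexing", "scales"],
    ["the", "model", "works"],
    ["the", "method", "scales"],
]
scores = pmi_scores(sentences)
# "random indexing" always co-occurs, so it scores higher than "indexing works".
print(scores[("random", "indexing")] > scores[("indexing", "works")])
```

PMI tends to favour rare pairs, so in practice it is usually combined with a minimum frequency threshold before candidates are filtered.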