Date posted: 29-Nov-2014
Funded under: FP7
Area: Language Technologies (ICT-2011.4.2)
Project reference: 288342
Coordinator: Marko Grobelnik
www.xlike.org
Aljaž Košmerlj, Jenya Belyaeva, Gregor Leban, Blaž Fortuna, Marko Grobelnik
Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia

ABSTRACT
We present a system for manually extracting structured event information from free-form newswire text. The extraction is performed on news articles preprocessed by services developed within the XLike project and is guided by suggestions the system produces using machine learning techniques. Results of testing with human annotators show that the system can produce meaningful data and suggest several avenues for improving it.

INPUT
Sets of articles about the same event from the Event Registry service (http://eventregistry.org).

INTERFACE
List of articles about the event and a list of entities (i.e. noun phrases) found in the articles.
Type of event described in the articles, either defined by the user or selected from suggestions.
List of filled and unfilled roles, defined by users, for the selected event type.
Entity role selection using a dropdown list, either in the text or in the entity list.
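To make the interface state above concrete, here is a minimal sketch of how one annotator's work on one event could be held in memory. The poster does not describe the system's internal data model, so the class `EventAnnotation`, its fields and methods, and the example role names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical data model; not taken from the actual system.
@dataclass
class EventAnnotation:
    """One annotator's work on one event."""

    event_id: str
    articles: List[str]  # articles about the event, as delivered by the input step
    entities: List[str]  # noun phrases found in those articles
    event_type: Optional[str] = None  # typed in by the user or picked from suggestions
    roles: Dict[str, Optional[str]] = field(default_factory=dict)  # role -> entity or None

    def set_event_type(self, event_type: str, role_names: List[str]) -> None:
        """Choose an event type and start with all of its roles unfilled."""
        self.event_type = event_type
        self.roles = {name: None for name in role_names}

    def assign(self, role: str, entity: str) -> None:
        """Fill a role with an entity, mirroring a dropdown selection."""
        if role not in self.roles:
            raise KeyError(f"role {role!r} is not defined for {self.event_type!r}")
        if entity not in self.entities:
            raise ValueError(f"{entity!r} is not among the extracted entities")
        self.roles[role] = entity

    def filled_roles(self) -> Dict[str, str]:
        return {role: entity for role, entity in self.roles.items() if entity is not None}

    def unfilled_roles(self) -> List[str]:
        return [role for role, entity in self.roles.items() if entity is None]


# Toy usage; the event, entities and role names are made up for illustration.
ann = EventAnnotation(
    event_id="event-1",
    articles=["Strong earthquake strikes off the coast of Chile"],
    entities=["Chile", "earthquake", "coast"],
)
ann.set_event_type("earthquake", ["location", "magnitude", "casualties"])
ann.assign("location", "Chile")
print(ann.filled_roles())    # {'location': 'Chile'}
print(ann.unfilled_roles())  # ['magnitude', 'casualties']
```

Keeping unfilled roles as explicit `None` entries lets the interface list both filled and unfilled roles for the selected event type from a single dictionary.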
PIPELINE
Suggestions generated by an SVM classifier built using the QMiner data analytics platform (http://qminer.ijs.si)

EVENT TYPE SUGGESTION
Built on a dataset of 100 events annotated into 5 event types (road accident, product launch, protest, earthquake and bombing) by an expert annotator
Features include concepts that Event Registry found in the event as well as bag-of-words features computed on article titles and the event summary
Leave-one-out testing classification accuracy: CA = 0.67

EVALUATION
11 annotators annotating the same 10 events
12.1% ± 3.1% of proposed entities annotated per event
6.2 ± 0.9 roles filled per event
Average pairwise event type agreement: 5.9 ± 2.0
Average pairwise Jaccard index of roles with the same annotation: 0.25 ± 0.09
Average number of successful event type suggestions per user: 6.6 ± 1.9
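The agreement figures in the evaluation are reported without formulas. The sketch below shows one plausible way to compute them, under stated assumptions: pairwise event type agreement is taken as the number of shared events on which two annotators chose the same type, and the role Jaccard index compares two annotators' sets of (role, entity) pairs only on events where both chose the same event type. The ± values are computed as sample standard deviations. The data layout and the toy annotations are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, Set, Tuple

# Assumed definitions and data layout; the poster does not spell out its formulas.
# annotations[annotator][event_id] = (event_type, {(role, entity), ...})
Annotation = Tuple[str, Set[Tuple[str, str]]]
Annotations = Dict[str, Dict[str, Annotation]]


def jaccard(a: Set, b: Set) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def type_agreement(annotations: Annotations) -> Tuple[float, float]:
    """Mean and sample standard deviation, over annotator pairs, of the number
    of shared events on which both annotators chose the same event type."""
    counts = []
    for u, v in combinations(sorted(annotations), 2):
        shared = annotations[u].keys() & annotations[v].keys()
        counts.append(sum(annotations[u][e][0] == annotations[v][e][0] for e in shared))
    return mean(counts), stdev(counts)  # stdev needs at least two pairs


def role_jaccard(annotations: Annotations) -> Tuple[float, float]:
    """Mean and sample standard deviation of the Jaccard index between two
    annotators' (role, entity) sets, over pairs and events where both chose
    the same event type (an assumed reading of "same annotation")."""
    scores = []
    for u, v in combinations(sorted(annotations), 2):
        for e in annotations[u].keys() & annotations[v].keys():
            type_u, roles_u = annotations[u][e]
            type_v, roles_v = annotations[v][e]
            if type_u == type_v:
                scores.append(jaccard(roles_u, roles_v))
    return mean(scores), stdev(scores)


# Toy data: three annotators, two events (all values are made up).
annotations: Annotations = {
    "a1": {"e1": ("earthquake", {("location", "Chile"), ("magnitude", "8.2")}),
           "e2": ("protest", {("location", "Kiev")})},
    "a2": {"e1": ("earthquake", {("location", "Chile")}),
           "e2": ("protest", {("location", "Kiev"), ("organizer", "opposition")})},
    "a3": {"e1": ("bombing", {("location", "Chile")}),
           "e2": ("protest", {("location", "Kiev")})},
}
m, s = type_agreement(annotations)
print(f"event type agreement: {m:.2f} ± {s:.2f}")
m, s = role_jaccard(annotations)
print(f"role Jaccard index: {m:.2f} ± {s:.2f}")
```

With 11 annotators there are 55 annotator pairs, so the pairwise averages in the poster would be taken over 55 values per metric under this reading.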
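The event type suggestion panel names the classifier (an SVM built with QMiner) and its features, but not its training settings. As a stand-in, and not QMiner's implementation, the sketch below trains a one-vs-rest linear SVM with the Pegasos stochastic subgradient method on bag-of-words features from the title and summary text plus one binary feature per concept, and scores it with leave-one-out accuracy as in the poster. The toy events, the `word:`/`concept:` feature prefixes, and the parameters `lam`, `epochs` and `seed` are illustrative assumptions; the toy accuracy is not comparable to the reported CA = 0.67.

```python
import random
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Stand-in for the QMiner SVM (assumption): one-vs-rest linear SVM via Pegasos.
Features = Dict[str, float]
# Hypothetical event record: (event type, article titles + event summary, concepts)
Event = Tuple[str, str, List[str]]


def featurize(text: str, concepts: List[str]) -> Features:
    """Bag-of-words counts over the text plus a binary feature per concept."""
    x: Features = defaultdict(float)
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        x["word:" + token] += 1.0
    for concept in concepts:
        x["concept:" + concept] = 1.0
    x["bias"] = 1.0  # constant feature standing in for an intercept term
    return dict(x)


def dot(w: Features, x: Features) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(data: List[Tuple[Features, int]], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> Features:
    """Linear SVM (labels +1/-1) trained with Pegasos hinge-loss subgradient steps."""
    rng = random.Random(seed)
    w: Features = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this example
            w = {f: (1.0 - eta * lam) * v for f, v in w.items()}  # regularization shrink
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(events: List[Event]) -> Dict[str, Features]:
    """One binary SVM per event type: that type versus all the others."""
    xs = [featurize(text, concepts) for _, text, concepts in events]
    labels = sorted({label for label, _, _ in events})
    return {
        c: train_binary_svm([(x, 1 if ev[0] == c else -1) for x, ev in zip(xs, events)])
        for c in labels
    }


def predict(models: Dict[str, Features], text: str, concepts: List[str]) -> str:
    """Suggest the event type whose classifier gives the highest score."""
    x = featurize(text, concepts)
    return max(sorted(models), key=lambda c: dot(models[c], x))


def leave_one_out_accuracy(events: List[Event]) -> float:
    """Train on all events but one, predict the held-out one, repeat for each."""
    correct = 0
    for i, (label, text, concepts) in enumerate(events):
        models = train_one_vs_rest(events[:i] + events[i + 1:])
        correct += predict(models, text, concepts) == label
    return correct / len(events)


# Toy data (made up; the poster's dataset has 100 events and 5 types).
events: List[Event] = [
    ("earthquake", "Strong earthquake hits coast", ["Earthquake", "Chile"]),
    ("earthquake", "Earthquake shakes city, tsunami warning", ["Earthquake", "Tsunami"]),
    ("protest", "Thousands protest in capital", ["Protest", "Demonstration"]),
    ("protest", "Protesters clash with police", ["Protest", "Police"]),
    ("product launch", "Company unveils new phone", ["Smartphone", "Product launch"]),
    ("product launch", "New tablet launch announced", ["Tablet computer", "Product launch"]),
]
print("leave-one-out CA:", leave_one_out_accuracy(events))
models = train_one_vs_rest(events)
print(predict(models, "Earthquake damages buildings", ["Earthquake"]))
```

Ranking the types by classifier score, rather than returning only the top one, would be a natural way to offer several suggestions in the interface.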