Date post: | 08-Jul-2015 |
Category: | Technology |
Upload: | kklo |
TIMELINE FROM NEWS
KK Lo
GOAL...
RELATED WORK
Topic Detection and Tracking
Temporal and Event Tagging
2 communities
Topic Detection and Tracking
tracking topics: classifying documents
discovering new topics
Events of interest
assume each article is an event
Problems
lack of details
publication date = time the event happened?
Temporal and Event Tagging
Tagging events and their temporal relationships
Problems
too many events...
[Figure: result obtained from the TARSQI toolkit, showing many tagged events]
MY SOLUTION
APPLY SUMMARIZATION TECHNIQUE AS
EVENT FILTERING
3 components
Prior Ranking
1. Sentence A
2. Sentence B
3. Sentence C
4. ...
Beginning sentences have a higher prior probability
[Plot axis: prior probability]
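One way to picture the prior ranking is as a distribution that decays with sentence position. The sketch below is only an illustration: the power-law shape, the function name `position_prior`, and the value of `alpha` are assumptions, not something stated on the slide.

```python
def position_prior(num_sentences, alpha=0.25):
    """Return a probability distribution that favors early sentences.

    The power-law form and the alpha value are assumptions made for
    illustration; the slides only say earlier sentences rank higher.
    """
    if num_sentences <= 0:
        return []
    # Sentence 1 gets weight 1, sentence 2 gets 2**-alpha, and so on.
    weights = [(i + 1) ** -alpha for i in range(num_sentences)]
    total = sum(weights)
    return [w / total for w in weights]


prior = position_prior(4)
for rank, p in enumerate(prior, start=1):
    print(f"sentence {rank}: prior = {p:.3f}")
```

A larger `alpha` would concentrate more of the prior mass on the first sentences.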
Grasshopper
A PageRank-like ranking algorithm
[Figure: graph of sentences s1 to s5 linked by cosine similarities]
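The PageRank-like step can be sketched as a random walk over the sentence graph that teleports back to the prior. This is a simplified illustration, not the full Grasshopper algorithm, which additionally re-ranks with absorbing random walks to encourage diversity. The function name, the `damping` value, and the toy similarity matrix are assumptions.

```python
def rank_sentences(similarity, prior, damping=0.85, iterations=100):
    """Rank sentences by the stationary scores of a random walk with teleport.

    similarity: square matrix of pairwise cosine similarities.
    prior: probability distribution over sentences (sums to 1).
    """
    n = len(similarity)
    # Row-normalize similarities into transition probabilities,
    # ignoring self-similarity on the diagonal.
    transition = []
    for i, row in enumerate(similarity):
        weights = [w if j != i else 0.0 for j, w in enumerate(row)]
        total = sum(weights)
        if total == 0:
            # An isolated sentence jumps according to the prior.
            transition.append(list(prior))
        else:
            transition.append([w / total for w in weights])

    scores = list(prior)
    for _ in range(iterations):
        scores = [
            damping * sum(scores[i] * transition[i][j] for i in range(n))
            + (1 - damping) * prior[j]
            for j in range(n)
        ]
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy graph: sentences 0-2 are similar to each other, sentence 3 is an outlier.
similarity = [
    [1.0, 0.8, 0.7, 0.0],
    [0.8, 1.0, 0.6, 0.0],
    [0.7, 0.6, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]
order, scores = rank_sentences(similarity, [0.25] * 4)
print(order)
```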
TARSQI Toolkit
explicit time
event instance
event-time link
event-event link
From TEXT to TimeML
Event Filtering
Events in TimeML
Appears in the top selected sentences?
YES: PICK
NO: BYE
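The filtering decision above can be sketched in a few lines. The `Event` record and its field names are hypothetical stand-ins for whatever is extracted from the TimeML output; only the rule (keep an event if its sentence is among the top selected ones) comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str        # hypothetical identifier for a tagged event
    sentence_index: int  # index of the sentence that contains the event


def filter_events(events, ranked_sentences, top_k=10):
    """Keep only events whose sentence is among the top_k ranked sentences."""
    selected = set(ranked_sentences[:top_k])
    return [e for e in events if e.sentence_index in selected]


events = [Event("e1", 0), Event("e2", 3), Event("e3", 5), Event("e4", 3)]
ranked = [3, 0, 7, 5, 1]  # sentence indices, best first (illustrative)
kept = filter_events(events, ranked, top_k=2)
print([e.event_id for e in kept])
```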
Temporal Reasoner
Find the (start, end) bounds for each event
[Figure: timeline marked 2008, Dec, 2009 with event1, event2, and event3]
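A minimal sketch of how (start, end) bounds might be tightened, assuming each event starts with a known or unbounded interval and only BEFORE links are available. This is not the reasoner used in the project; the function name, the link representation, and the example dates are assumptions.

```python
from datetime import date


def propagate_bounds(bounds, before_links):
    """Tighten (earliest, latest) bounds using 'a happens before b' links.

    bounds: dict mapping event name to (earliest, latest) dates.
    before_links: list of (a, b) pairs meaning a happens before b.
    """
    result = {name: list(pair) for name, pair in bounds.items()}
    changed = True
    while changed:
        changed = False
        for a, b in before_links:
            # b cannot start earlier than a's earliest possible time.
            if result[b][0] < result[a][0]:
                result[b][0] = result[a][0]
                changed = True
            # a cannot end later than b's latest possible time.
            if result[a][1] > result[b][1]:
                result[a][1] = result[b][1]
                changed = True
    return {name: tuple(pair) for name, pair in result.items()}


unknown = (date.min, date.max)
bounds = {
    "event1": (date(2008, 12, 1), date(2008, 12, 31)),  # anchored by explicit time
    "event2": unknown,                                  # no explicit time
    "event3": (date(2009, 1, 1), date(2009, 12, 31)),   # anchored by explicit time
}
tightened = propagate_bounds(bounds, [("event1", "event2"), ("event2", "event3")])
print(tightened["event2"])
```

An event that is linked to nothing keeps its unbounded interval, which is one way an anchoring failure can show up.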
RESULT?
Sentence Selection Quality
Special Thanks to
for the data and ROUGE =p
250-word summary from 25 documents with the DUC2007 data set
How can we represent 3320 events on a timeline?
Effect of Sentence Filtering
                            D0701A   D0720E
#Events before filtering      3320     1435
#Events after filtering         67       37
(choosing the top 10 sentences)
This shows that my approach is a failure
Time-Event Anchoring
                            D0701A   D0720E
#Events before filtering      3320     1435
#Failures before filtering    3085     1129
#Events after filtering         67       37
#Failures after filtering       49       29
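To compare the rows on the same scale, the failure counts can be turned into rates. The short sketch below uses only the counts from the table; the dictionary layout and the helper name are illustrative.

```python
# Counts copied from the Time-Event Anchoring table: (events, failures).
counts = {
    ("D0701A", "before filtering"): (3320, 3085),
    ("D0701A", "after filtering"): (67, 49),
    ("D0720E", "before filtering"): (1435, 1129),
    ("D0720E", "after filtering"): (37, 29),
}


def failure_rate(events, failures):
    """Fraction of events that could not be anchored to a time."""
    return failures / events


rates = {key: failure_rate(*pair) for key, pair in counts.items()}
for (doc_set, stage), rate in rates.items():
    print(f"{doc_set} {stage}: {rate:.1%}")
```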
WHY?
Unable to deduce the relationships for all pairs of events
TARSQI only supports a single document
e.g. with 50 tagged events, only 50 pairs of relations are tagged; there should be 50C2 = 1225
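The pair count in the example can be checked directly. The helper name is illustrative; the numbers are the ones from the slide.

```python
import math


def pair_coverage(num_events, tagged_pairs):
    """Return the number of possible event pairs and the fraction tagged."""
    total_pairs = math.comb(num_events, 2)
    return total_pairs, tagged_pairs / total_pairs


total_pairs, coverage = pair_coverage(50, 50)
print(f"{total_pairs} possible pairs, {coverage:.1%} tagged")
```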
LESSON LEARNED
3 areas
Topic Detection and Tracking
Temporal and Event Tagging
Automatic Summarization
my project
The limit of existing technology
cannot get enough information from the documents
The limit of temporal analysis
OR EVEN
cosine similarity with tf-idf weighting is computationally expensive
2.5 hrs for 867 sentences
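For scale, 867 sentences give 867 x 866 / 2 = 375,411 unique pairs. The sketch below shows one straightforward way to compute tf-idf cosine similarities: build each sparse vector once and fill only the upper triangle of the matrix. It is not the project's implementation; the whitespace tokenization and the idf formula are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build one sparse tf-idf vector (term -> weight) per sentence."""
    docs = [Counter(s.lower().split()) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in doc)
    return [{t: tf * math.log(n / df[t]) for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(sentences):
    """Compute each unique pair once and mirror it across the diagonal."""
    vectors = tfidf_vectors(sentences)
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim


sentences = [
    "the court ruled today",
    "the court ruled yesterday",
    "stocks fell sharply",
]
sim = similarity_matrix(sentences)
print(math.comb(867, 2))
```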
DUC2007 documents are hard to parse
different documents have different formats...
no standard date format...
some contain special characters that cause trouble for XML parsers...
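A defensive preprocessing step might help with both problems. The sketch below tries several date formats in turn and strips characters that XML 1.0 does not allow. The list of formats is a guess rather than the actual DUC2007 formats, and both function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed candidate formats; the real corpus may need others.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%B %d, %Y", "%d %B %Y", "%m/%d/%Y")

# Characters outside the ranges allowed by XML 1.0.
_INVALID_XML = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Ampersands that do not start an entity or character reference.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def parse_date(text):
    """Try each known format; return a date, or None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_for_xml(text):
    """Drop characters XML 1.0 forbids and escape bare ampersands."""
    text = _INVALID_XML.sub("", text)
    return _BARE_AMP.sub("&amp;", text)


print(parse_date("20081205"))
print(clean_for_xml("AT&T \x0b report"))
```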
Q & A