Post on 05-Aug-2015
TEMPORAL SUMMARIZATION OF EVENT-RELATED UPDATES IN WIKIPEDIA
Mihai Georgescu, Dang Duc Pham, Sergej Zerr, Nattiya Kanhabua, Stefan Siersdorfer, Wolfgang Nejdl
L3S Research Center, Leibniz University Hannover
Introduction
• Wikipedia is a free multilingual online encyclopedia covering a wide range of general and specific knowledge
• Most up-to-date encyclopedia
• One of the reasons driving editing and updating in Wikipedia is the occurrence of new events in the real world
• All updates are kept in an edit history
• Use the edit history of Wikipedia for extracting event-related information and then present it in a comprehensive way
• Events cause increased activity on Wikipedia
• Extract and present event-related information from the Wikipedia updates
[Diagram: an event linked to the entities it involves]
Peaks in update activity correlate with events
[Figure: Edit history for the Barack Obama article (monthly), Mar-04 to Jan-10, y-axis 0 to 1600. Annotated peaks:]
• Supported the Secure Fence Act
• Announced his candidacy, February 10, 2007
• Presidential campaign events
• November 4: Obama won the presidency
• Inauguration, January 20, 2009
• Won the 2009 Nobel Peace Prize
Motivation
Donald Rumsfeld’s resignation as US Secretary of Defense causes a burst of event-related updates
[Figure: update activity from Oct-01 to Jan-10, y-axis 0 to 800, with the burst annotated November 8, 2006]
Wikipedia Update
Difference between current version and previous version
• Previous Revision
• Current Revision
• Words Added, Words Removed
• Comment
• Section Title
• Timestamp
• Author
• Position
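As an illustration of the fields above, here is a minimal Python sketch of an update record and of how the added and removed words could be derived from two revisions. The class name `WikipediaUpdate`, the helper `word_diff`, the bag-of-words comparison, and the reading of "Position" as a sentence index are all assumptions made for this example; the slides do not say how the difference is computed.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class WikipediaUpdate:
    """One edit, described by the fields listed on the slide (hypothetical layout)."""
    timestamp: datetime
    author: str
    comment: str
    section_title: str
    position: int              # assumed: index of the edited sentence in the article
    words_added: List[str]
    words_removed: List[str]


def word_diff(previous: str, current: str) -> Tuple[List[str], List[str]]:
    """Return (words added, words removed) as a bag-of-words difference.

    Assumption: lowercase whitespace tokens; word order is ignored.
    """
    prev_counts = Counter(previous.lower().split())
    curr_counts = Counter(current.lower().split())
    added = sorted((curr_counts - prev_counts).elements())
    removed = sorted((prev_counts - curr_counts).elements())
    return added, removed
```

Calling `word_diff(previous_text, current_text)` on two consecutive revisions yields the two word lists that would populate `words_added` and `words_removed`.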
Pipeline for identifying and summarizing event-related information from Wikipedia updates
Entity → All Updates → Event-related updates detection → Event-related Updates → Event identification and summarization → Events and summaries
Event-Related Updates Detection
[Figure: monthly update activity, Mar-04 to Jan-10, y-axis 0 to 1600, annotated with the detection steps]
• Burst Detection: detect bursts in the update activity
• Classification: classify updates (SVM) (2616/10680)
• Output: Event-Related Updates
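The slides do not specify which burst detection method is used. As a hedged illustration only, the sketch below flags periods whose update count exceeds the mean by `k` standard deviations and merges consecutive flagged periods into one burst. The function name `detect_bursts`, the threshold rule, and the parameter `k` are assumptions for this example, not the authors' algorithm.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def detect_bursts(counts: List[int], k: float = 1.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of bursts.

    Assumption: a period is bursty when its count exceeds
    mean + k * (population standard deviation) of the whole series.
    """
    if not counts:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    bursts: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(counts):
        if count > threshold:
            if start is None:
                start = i              # a new burst begins here
        elif start is not None:
            bursts.append((start, i - 1))
            start = None
    if start is not None:              # series ends inside a burst
        bursts.append((start, len(counts) - 1))
    return bursts
```

Applied to a list of monthly update counts, each returned range marks one candidate event window.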
Temporal Summarization
• Time-based clustering
  • Burst detection (each burst corresponds to an event)
• Sentence identification
  • Weight = number of updates made to the sentence
  • Positions occupied in the updated revisions
• Text-based clustering
  • Incremental clustering with Jaccard similarity => Sentence Cluster
  • Cluster weight: aggregation of the member sentences' weights
  • Representative sentence
• Position-based clustering
  • Maximum gap of 10 sentences => Positions Cluster
  • Mapping Sentence Cluster to Positions Cluster
• Summarization as ranked sentences
  • Top M Sentence Clusters: Representative, Position Clusters
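The text-based clustering and ranking steps could look roughly like the following Python sketch. Several choices here are assumptions not stated on the slides: the similarity threshold of 0.5, comparing each new sentence against a cluster's current representative, summing member weights as the "aggregation", and picking the heaviest member as the representative. The names `SentenceCluster`, `cluster_sentences`, and `top_clusters` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class SentenceCluster:
    members: List[Tuple[str, int]] = field(default_factory=list)  # (sentence, weight)

    @property
    def weight(self) -> int:
        # Assumption: aggregate by summing member weights.
        return sum(w for _, w in self.members)

    @property
    def representative(self) -> str:
        # Assumption: the most frequently updated member represents the cluster.
        return max(self.members, key=lambda m: m[1])[0]


def cluster_sentences(sentences: List[Tuple[str, int]],
                      threshold: float = 0.5) -> List[SentenceCluster]:
    """Incrementally assign each sentence to the most similar cluster, or open a new one."""
    clusters: List[SentenceCluster] = []
    for text, weight in sentences:
        tokens = set(text.lower().split())
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(tokens, set(cluster.representative.lower().split()))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = SentenceCluster()
            clusters.append(best)
        best.members.append((text, weight))
    return clusters


def top_clusters(clusters: List[SentenceCluster], m: int) -> List[SentenceCluster]:
    """Rank clusters by weight and keep the top M."""
    return sorted(clusters, key=lambda c: c.weight, reverse=True)[:m]
```

The summary for one burst would then be the representative sentences of `top_clusters(clusters, m)`, in rank order.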
Temporal Summarization
1. Burst Detection (time similarity)
2. Sentence Extraction
3. Text Similarity
4. Spatial Similarity
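The spatial similarity step groups the positions that updated sentences occupy in the article. A minimal Python sketch follows, assuming that "maximum gap of 10 sentences" means two consecutive positions belong to the same cluster when they are at most 10 sentences apart. The function name `cluster_positions` and this reading of the gap rule are assumptions for illustration.

```python
from typing import List


def cluster_positions(positions: List[int], max_gap: int = 10) -> List[List[int]]:
    """Group sorted sentence positions; start a new cluster when the gap exceeds max_gap."""
    clusters: List[List[int]] = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough to the previous position
        else:
            clusters.append([pos])     # gap too large: open a new cluster
    return clusters
```

Each sentence cluster can then be mapped to the position cluster (or clusters) containing the positions of its member sentences.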