Delft University of Technology
Semantics + Filtering + Search = Twitcident
Exploring Information in Social Web Streams
Hypertext 2012, Milwaukee, WI – June 28
Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao
Web Information Systems, TU Delft, the Netherlands
200,000,000
number of tweets published per day
Pukkelpop 2011
People tweet about everything,
everywhere :-)
Pukkelpop 2011
81,000 tweets in four hours
became a tragedy
[Diagram: 200,000,000 tweets → Filtering → Search & Analytics → Useful tweets?]
Case: Nijmegen train accident
First tweet…
And then your train blasts off full of the anvils. #Nijmegen #veolia
First picture…
Astonishing! My train rams the platform at Nijmegen!
http://pic.twitter.com/QVVfJHyd
Traditional news media
A train rammed the anvils at Nijmegen.
Research Challenges
1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident?
2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets?
[Diagram: Twitter streams → Filtering (topic) → Search & Analytics (information need)]
Twitcident Pipeline
Automatic Filtering
Search & Analytics
Twitcident system
Incident detection
• Twitcident relies on Emergency Broadcasting Services for detecting incidents.
• In the Netherlands: P2000 communication network
Incident Profiling
• For an incident i, the profile P(i) is described as a set of tuples.
• Each tuple includes a facet-value pair (f, v) and its weight for the incident i.
Example profile:
• (Location, Netherlands): 0.4
• (Incident, Train accident): 0.5
• (Location, Nijmegen): 0.8
• (Organization, Veolia): 0.6
• (Incident, Crash): 1.0
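The incident profile above can be sketched as a mapping from facet-value pairs to weights. This is only an illustration: the names `FacetValue`, `IncidentProfile`, and `build_profile` are hypothetical and not part of Twitcident, and keeping the highest weight for a repeated pair is an assumption made here for simplicity.

```python
from typing import Dict, Iterable, Tuple

# A facet-value pair (f, v), e.g. ("Location", "Nijmegen").
FacetValue = Tuple[str, str]

# Hypothetical representation of P(i): each facet-value pair maps to its weight.
IncidentProfile = Dict[FacetValue, float]


def build_profile(triples: Iterable[Tuple[str, str, float]]) -> IncidentProfile:
    """Build a profile from (facet, value, weight) triples.

    Assumption: if a pair occurs more than once, the highest weight is kept.
    """
    profile: IncidentProfile = {}
    for facet, value, weight in triples:
        key = (facet, value)
        profile[key] = max(weight, profile.get(key, 0.0))
    return profile


# The example weights shown on the slide for the Nijmegen train accident.
nijmegen_profile = build_profile([
    ("Location", "Netherlands", 0.4),
    ("Incident", "Train accident", 0.5),
    ("Location", "Nijmegen", 0.8),
    ("Organization", "Veolia", 0.6),
    ("Incident", "Crash", 1.0),
])
```

A dictionary keyed by the pair keeps lookups cheap when tweets are later compared with the profile.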
Social Media Aggregation
• Collecting Twitter messages, pictures, and videos from Social Media Platforms, e.g. Twitter, PhotoBucket, Vimeo
Semantic Enrichment
• Named Entity Recognition
• Classification: Casualties, Damages, Risks…
• Linkage: External Resources
• Metadata extraction
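The enrichment steps listed above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: a small dictionary (`GAZETTEER`) stands in for a real named-entity recognizer, keyword sets (`CLASS_KEYWORDS`) stand in for a trained classifier, and the `enrich` function and its output keys are hypothetical. Linkage to external resources is not shown.

```python
import re
from typing import Dict, Set, Tuple

FacetValue = Tuple[str, str]

# Hypothetical gazetteer standing in for a real named-entity recognizer.
GAZETTEER: Dict[str, FacetValue] = {
    "nijmegen": ("Location", "Nijmegen"),
    "veolia": ("Organization", "Veolia"),
}

# Hypothetical keyword sets standing in for a trained classifier.
CLASS_KEYWORDS: Dict[str, Set[str]] = {
    "Casualties": {"injured", "wounded", "dead"},
    "Damages": {"damaged", "destroyed", "collapsed"},
    "Risks": {"danger", "smoke", "evacuate"},
}


def enrich(text: str) -> Dict[str, object]:
    """Attach entities, classes, and simple metadata to a tweet's text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    token_set = set(tokens)
    # Named entity recognition (toy): look tokens up in the gazetteer.
    entities = sorted({GAZETTEER[t] for t in tokens if t in GAZETTEER})
    # Classification (toy): a class applies if any of its keywords occurs.
    classes = sorted(c for c, kws in CLASS_KEYWORDS.items() if kws & token_set)
    # Metadata extraction: hashtags and URLs found in the raw text.
    hashtags = re.findall(r"#(\w+)", text)
    urls = re.findall(r"https?://\S+", text)
    return {"entities": entities, "classes": classes,
            "hashtags": hashtags, "urls": urls}


example = enrich("Two people injured as train hits platform "
                 "#Nijmegen #veolia http://example.org/photo")
```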
Filtering
• Which tweets are relevant to the incidents?
• Preprocessing: Language detection
• Semantic Filtering: Compare tweet with P(i)
• Semantic Filtering with News Context
• P’(i): P(i) complemented with f-v pairs from news
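One way to picture the comparison between a tweet and P(i) is a weighted overlap score. The sketch below is an assumption, not the scoring used by Twitcident: the functions `relevance_score`, `expand_with_news`, and `is_relevant`, the default weight given to news pairs, and the threshold are all hypothetical choices for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

FacetValue = Tuple[str, str]


def relevance_score(tweet_pairs: Set[FacetValue],
                    profile: Dict[FacetValue, float]) -> float:
    """Sum the profile weights of the facet-value pairs the tweet mentions."""
    return sum(w for pair, w in profile.items() if pair in tweet_pairs)


def expand_with_news(profile: Dict[FacetValue, float],
                     news_pairs: Iterable[FacetValue],
                     news_weight: float = 0.3) -> Dict[FacetValue, float]:
    """Return P'(i): a copy of P(i) complemented with pairs found in news.

    Assumption: news pairs get a fixed weight and never override P(i).
    """
    expanded = dict(profile)
    for pair in news_pairs:
        expanded.setdefault(pair, news_weight)
    return expanded


def is_relevant(tweet_pairs: Set[FacetValue],
                profile: Dict[FacetValue, float],
                threshold: float = 1.0) -> bool:
    """Keep a tweet if its score reaches the (hypothetical) threshold."""
    return relevance_score(tweet_pairs, profile) >= threshold


profile = {
    ("Location", "Nijmegen"): 0.8,
    ("Incident", "Crash"): 1.0,
    ("Organization", "Veolia"): 0.6,
}
tweet = {("Location", "Nijmegen"), ("Organization", "Veolia")}
expanded = expand_with_news(profile, [("Incident", "Derailment")])
```

With these example values the tweet matches two profile pairs, so its score is the sum of their weights, and a tweet mentioning only pairs that came from news can still score above zero against the expanded profile.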
Faceted Search
• Strategies (ranking)
• Frequency-based
• Time-sensitive
• Personalized
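The first two ranking strategies can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, ties are broken alphabetically, and the exponential decay with a one-hour half-life is a stand-in for whatever time weighting the system actually applies. The personalized strategy is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

FacetValue = Tuple[str, str]


def rank_by_frequency(tweets: Iterable[Set[FacetValue]]) -> List[FacetValue]:
    """Rank facet-values by how many tweets mention them."""
    counts = Counter(pair for pairs in tweets for pair in pairs)
    # Descending count, then alphabetical order to keep ties deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked]


def rank_time_sensitive(timed_tweets: Iterable[Tuple[float, Set[FacetValue]]],
                        now: float,
                        half_life: float = 3600.0) -> List[FacetValue]:
    """Rank facet-values so that recent mentions count more.

    Assumption: a mention loses half its weight every `half_life` seconds.
    """
    scores: Dict[FacetValue, float] = {}
    for timestamp, pairs in timed_tweets:
        decay = 0.5 ** ((now - timestamp) / half_life)
        for pair in pairs:
            scores[pair] = scores.get(pair, 0.0) + decay
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked]


tweets = [
    {("Location", "Nijmegen"), ("Incident", "Crash")},
    {("Location", "Nijmegen")},
    {("Organization", "Veolia")},
]
timed = [
    (0.0, {("Location", "Nijmegen")}),
    (0.0, {("Location", "Nijmegen")}),
    (7200.0, {("Incident", "Crash")}),
]
by_frequency = rank_by_frequency(tweets)
by_recency = rank_time_sensitive(timed, now=7200.0)
```

In the second example the single fresh mention of the crash outranks two older mentions of the location, which is the behavior a time-sensitive ranking is meant to produce.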
Real-time analytics
What type of things are mentioned in the tweets?
What aspects are mentioned over time? What do people report about over time?
Impact Area
Evaluation - Dataset
• Twitter corpus (TREC Microblog Track 2011)
• 16 million tweets (Jan. 24th – Feb. 8th, 2011)
• 4,766,901 tweets classified as English
• 6.2 million entity-extractions
• News (same time period)
• 62 RSS news feeds
• 13,959 news articles
• 357,559 entity-extractions
Evaluation: Tweet Filtering (1/2)
Semantic strategies outperform keyword-based filtering on all metrics.
Evaluation: Tweet Filtering (2/2)
The semantic strategy is more robust and achieves higher precision for complex topics.
Evaluation: Faceted Search (1/2)
The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.
Evaluation: Faceted Search (2/2)
The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.
Conclusions
• What we have done:
• Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web Streams
• What we have achieved:
• Better filtering of Twitter messages for a given incident.
• Better search for relevant information about an incident within the filtered messages.
Thank you!
Ke Tao @taubau
@wisdelft
http://twitcident.org