Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams


Talk by Ke Tao (from Web Information Systems, TU Delft) at 23rd ACM Conference on Hypertext and Social Media, June 28 2012, Milwaukee, WI, USA


Delft University of Technology

Semantics + Filtering + Search = Twitcident

Exploring Information in Social Web Streams

Hypertext 2012, Milwaukee, WI, June 28

Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao

Web Information Systems, TU Delft, the Netherlands


200,000,000: number of tweets published per day


Pukkelpop 2011

People tweet about everything, everywhere :-)


Pukkelpop 2011 became a tragedy

81,000 tweets in four hours

[Diagram labels: 200,000,000 tweets; Filtering; Search & Analytics; Useful tweets?]


Case: Nijmegen train accident


First tweet…

And then your train blasts off full of the anvils. #Nijmegen #veolia


First picture…

Astonishing! My train rams the platform at Nijmegen!

http://pic.twitter.com/QVVfJHyd


Traditional news media

A train rammed the anvils at Nijmegen.


Research Challenges

1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident?

2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets?

[Diagram labels: Twitter streams; Filtering (topic); Search & Analytics (information need)]


Twitcident Pipeline

Automatic Filtering

Search & Analytics


Twitcident system


Incident detection

• Twitcident relies on Emergency Broadcasting Services for detecting incidents.

• In the Netherlands: the P2000 communication network


Incident Profiling

• For an incident i, the profile P(i) is described as a set of tuples.

• Each tuple consists of a facet-value pair (f, v) and its weight with respect to the incident i.

Example profile (facet, value: weight):

• Location, Netherlands: 0.4

• Incident, Train accident: 0.5

• Location, Nijmegen: 0.8

• Organization, Veolia: 0.6

• Incident, Crash: 1.0
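The profile structure on this slide can be sketched as a mapping from facet-value pairs to weights. This is only an illustrative sketch: the names `Profile`, `strongest_pairs`, and `values_for_facet` are hypothetical and do not come from the Twitcident system; the weights are the example values shown on the slide.

```python
from typing import Dict, List, Tuple

# A facet-value pair (f, v), e.g. ("Location", "Nijmegen").
FacetValue = Tuple[str, str]
# An incident profile: each facet-value pair mapped to its weight.
Profile = Dict[FacetValue, float]

# Example values taken from the slide (Nijmegen train accident).
nijmegen_profile: Profile = {
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}


def strongest_pairs(profile: Profile, k: int) -> List[Tuple[FacetValue, float]]:
    """Return the k facet-value pairs with the highest weight (hypothetical helper)."""
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)[:k]


def values_for_facet(profile: Profile, facet: str) -> Dict[str, float]:
    """Return all values and weights recorded for one facet (hypothetical helper)."""
    return {value: weight for (f, value), weight in profile.items() if f == facet}


if __name__ == "__main__":
    print(strongest_pairs(nijmegen_profile, 2))
    print(values_for_facet(nijmegen_profile, "Location"))
```

Keying the mapping by the (facet, value) tuple keeps each pair unique within a profile, which matches the "set of tuples" description.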


Social Media Aggregation

• Collecting Twitter messages, pictures, and videos from Social Media platforms, e.g. Twitter, PhotoBucket, Vimeo


Semantic Enrichment

• Named Entity Recognition

• Classification: Casualties, Damages, Risks…

• Linkage: External Resources

• Metadata extraction


Filtering

• Which tweets are relevant to the incident?

• Preprocessing: Language detection

• Semantic Filtering: Compare tweet with P(i)

• Semantic Filtering with News Context

    • P'(i): P(i) complemented with facet-value pairs from news
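The two semantic filtering variants above can be sketched as follows. The slides do not say how a tweet is compared with P(i), so everything here is an assumption made for illustration: the scoring rule (sum of the weights of matching pairs), the merge rule for P'(i) (keep the larger weight), the threshold, the function names, and the news-derived pair in the usage example are all hypothetical.

```python
from typing import Dict, Set, Tuple

FacetValue = Tuple[str, str]
Profile = Dict[FacetValue, float]


def relevance(tweet_pairs: Set[FacetValue], profile: Profile) -> float:
    """Assumed score: sum of profile weights for pairs the tweet mentions."""
    return sum(weight for pair, weight in profile.items() if pair in tweet_pairs)


def extend_with_news(profile: Profile, news_pairs: Profile) -> Profile:
    """Build P'(i) by adding pairs from news (assumed rule: keep the larger weight)."""
    extended = dict(profile)
    for pair, weight in news_pairs.items():
        extended[pair] = max(weight, extended.get(pair, 0.0))
    return extended


def is_relevant(tweet_pairs: Set[FacetValue], profile: Profile, threshold: float) -> bool:
    """Keep a tweet when its score reaches a (hypothetical) threshold."""
    return relevance(tweet_pairs, profile) >= threshold


if __name__ == "__main__":
    p = {("Location", "Nijmegen"): 0.8, ("Incident", "Crash"): 1.0}
    # Made-up news-derived pair, for illustration only.
    news = {("Organization", "Veolia"): 0.6}
    tweet = {("Organization", "Veolia")}
    print(is_relevant(tweet, p, threshold=0.5))                          # False
    print(is_relevant(tweet, extend_with_news(p, news), threshold=0.5))  # True
```

In this sketch a tweet is assumed to have already passed semantic enrichment, so it arrives as a set of facet-value pairs rather than raw text.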


Faceted Search

• Ranking strategies

    • Frequency-based

    • Time-sensitive

    • Personalized
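Two of the ranking strategies listed above can be illustrated with a small sketch that orders facet-values for display in a faceted search interface. The slides name the strategies but not their formulas, so the details are assumptions: the frequency-based variant simply counts occurrences, and the time-sensitive variant uses an exponential decay with a hypothetical `half_life` parameter. The function names are made up for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

FacetValue = Tuple[str, str]


def rank_by_frequency(tweets: Iterable[Set[FacetValue]]) -> List[FacetValue]:
    """Rank facet-values by how many tweets mention them (ties broken alphabetically)."""
    counts = Counter(pair for pairs in tweets for pair in pairs)
    return [pair for pair, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank_time_sensitive(
    tweets: Iterable[Tuple[float, Set[FacetValue]]],
    now: float,
    half_life: float = 3600.0,
) -> List[FacetValue]:
    """Rank facet-values so that recent mentions count more (assumed exponential decay).

    Each tweet is a (timestamp_in_seconds, facet_value_pairs) tuple.
    """
    scores: Dict[FacetValue, float] = defaultdict(float)
    for timestamp, pairs in tweets:
        decay = 0.5 ** ((now - timestamp) / half_life)
        for pair in pairs:
            scores[pair] += decay
    return [pair for pair, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


if __name__ == "__main__":
    crash = ("Incident", "Crash")
    veolia = ("Organization", "Veolia")
    print(rank_by_frequency([{crash}, {crash}, {veolia}]))
    print(rank_time_sensitive([(0.0, {crash}), (0.0, {crash}), (36000.0, {veolia})], now=36000.0))
```

With these assumptions, a facet-value mentioned often but long ago can rank below one mentioned once very recently, which is the intended difference between the two strategies.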


Real-time analytics

• What type of things are mentioned in the tweets?

• What aspects are mentioned over time?

• What do people report about over time?

• Impact Area


Evaluation - Dataset

• Twitter corpus (TREC Microblog Track 2011)

    • 16 million tweets (Jan. 24th – Feb. 8th, 2011)

    • 4,766,901 tweets classified as English

    • 6.2 million entity extractions

• News (same time period)

    • 62 RSS news feeds

    • 13,959 news articles

    • 357,559 entity extractions


Evaluation - Tweet Filtering (1/2)

Semantic strategies outperform the keyword-based filtering regarding all metrics.


Evaluation - Tweet Filtering (2/2)

The semantic strategy is more robust and achieves higher precision for complex topics.


Evaluation - Faceted Search (1/2)

The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.


Evaluation - Faceted Search (2/2)

The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.


Conclusions

• What we have done:

    • Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web streams

• What we have achieved:

    • Better filtering of Twitter messages for a given incident.

    • Better search for relevant information about an incident within the filtered messages.


Thank you!

Ke Tao @taubau

@wisdelft

http://twitcident.org