Page 1: Mining Twitter for Real-Time Trend and Information Discovery

Mining Twitter for real-time trend and information discovery

Yahoo! Research Barcelona

Arkaitz Zubiaga

NLP & IR Group @ UNED

December 19th, 2011

Page 2

Motivation

Index

1 Motivation

2 Our Work (I): Classification of Trending Topics

3 Our Work (II): Real-Time Summarization of Events

4 Outlook

Arkaitz Zubiaga (UNED) Real-time mining of Twitter December 19th, 2011 2 / 43

Page 3

Twitter

Twitter is a microblogging service with over 200 million users.

Users share short messages of up to 140 characters (tweets).

Page 4

Twitter: following users

Unlike on Facebook, following is not reciprocal: a user can follow another without being followed back.

Page 5

Twitter: retweeting

Retweet: users can help spread others' tweets by re-posting them to their own followers.

Page 6

Twitter

Retweeting enables fast spread of messages.

Page 7

Increase of activity on Twitter

As of October 2011, Twitter received 250 million tweets per day.

Page 8

Variety of Twitter accounts

Page 9

Usefulness of Twitter

Twitter provides:

1 large amounts of data in real time,

2 from a wide variety of sources,

3 with the ability to spread rapidly.

Page 10

Twitter’s popularity

Twitter has gained widespread popularity as a tool for...

Page 11

Using Twitter for... following events

(1) Live-tweeting about and following events.

Page 12

Using Twitter for... helping others

(2) Helping others, as in natural disasters.

Page 13

Using Twitter for... finding out about news

and (3) Finding out about breaking news.

Page 14

Twitter in the media

Many researchers are analyzing tweets.

Page 15

Trends on Twitter

The news about the Japan earthquake broke on Twitter.

Page 16

Video: Japan earthquake on Twitter

Page 17

Research on Twitter

Most research on Twitter focuses on analyzing streams after they happened.

Very little research deals with the real-time analysis of streams.

Our goal: how can we mine Twitter streams to acquire real-time knowledge about events and trends?

Page 18

Our Work (I): Classification of Trending Topics

Page 19

Trending Topics on Twitter

Trending topics reflect the top conversations: those being discussed on Twitter more than usual.

Page 20

What produces trending topics?

What kinds of events trigger those trending topics?

Page 21

Typology of Trending Topics

News: Japan earthquake.

Current events: a soccer game.

Memes: funny and viral ideas.

Commemoratives: World AIDS Day.

Page 22

Goal

Find out the type of a trending topic as soon as it emerges.

Page 23

Dataset

1,036 unique trending topics, each with up to 1,500 associated tweets collected as soon as it started trending.

Manual classification of trending topics:

616 current events.

251 memes.

142 news.

27 commemoratives.
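The class distribution is skewed towards current events. A classifier that always predicted the majority type would be right on roughly 59.5% of trends drawn from this distribution. That figure gives some context for the 78.4% accuracy reported in the main findings. It is derived here from the counts above and is not a figure reported on the slides.

$$\text{majority-class accuracy} = \frac{616}{616 + 251 + 142 + 27} = \frac{616}{1036} \approx 0.595$$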

Page 24

Experiment Settings

Support Vector Machines (one-against-all)

500 trends for the training set.

10 runs.
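The experiment settings above can be sketched as a repeated random-split evaluation. This is a minimal illustration, not the authors' setup. The slides do not name the SVM implementation, kernel or parameters. The binary classifiers here are therefore linear SVMs trained with the Pegasos sub-gradient method, a swapped-in choice that keeps the example dependency-free. The function names, `lam`, `epochs` and the seeding scheme are all assumptions. One-against-all means training one SVM per trend type and predicting the type whose SVM gives the highest score.

```python
import random
from statistics import mean

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularized hinge loss. Labels y are +1/-1. A constant 1.0 feature
    is appended, so the last weight acts as a (regularized) bias."""
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violates = label * dot(w, x) < 1           # margin check on the current w
            w = [(1.0 - eta * lam) * wi for wi in w]    # shrink step from the L2 term
            if violates:                                # hinge-loss sub-gradient step
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_all(X, labels, **kwargs):
    """One binary SVM per class: that class (+1) against all the others (-1)."""
    return {c: train_binary_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}

def predict(models, x):
    """Return the class whose binary SVM gives the highest score."""
    xb = x + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))

def repeated_split_accuracy(X, labels, n_train=500, runs=10, seed=0):
    """Mean accuracy over `runs` random splits: `n_train` items for training,
    the rest for testing."""
    rng = random.Random(seed)
    scores = []
    for run in range(runs):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        models = train_one_vs_all([X[i] for i in train],
                                  [labels[i] for i in train], seed=run)
        hits = sum(predict(models, X[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)
```

On the 1,036-trend dataset, `repeated_split_accuracy(X, labels)` would train on 500 trends and test on the remaining 536 in each of 10 runs. Here `X` is a hypothetical list of per-trend feature vectors and `labels` holds the manual types.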

Page 25

Representation of Trending Topics

Two different representation approaches:

Twitter features: 15 straightforward, language-independent features that rely on the social spread of trends.

Bag-of-words: the text of the tweets, weighted by term frequency (TF).
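The bag-of-words representation can be sketched as follows, under a few assumptions. All tweets collected for a trending topic are pooled. Tokens are lowercased runs of Unicode word characters, so `#` and `@` are dropped. The tokenizer and function names are hypothetical, since the slides do not describe their preprocessing.

```python
import re
from collections import Counter

def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of Unicode word characters."""
    return re.findall(r"\w+", text.lower())

def tf_vector(tweets):
    """Term frequencies pooled over all tweets associated with one trending topic."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts

def to_matrix(topic_counts):
    """Map a list of per-topic Counters onto a shared, sorted vocabulary."""
    vocab = sorted(set().union(*topic_counts))
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for counts in topic_counts:
        row = [0.0] * len(vocab)
        for term, n in counts.items():
            row[index[term]] = float(n)
        rows.append(row)
    return vocab, rows

topic = tf_vector(["Goal for Uruguay! #CopaAmerica", "what a goal"])
print(topic["goal"])  # 2
```

The dense rows from `to_matrix` can be fed to any vector-based classifier, such as a one-against-all SVM.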

Page 26

Results

Page 27

Results

Page 28

Main findings

Trending topics can be categorized accurately (78.4%) using social features:

Outperforming the use of textual content.

Without making use of external data.

In real time, as the trending topic emerges.

Arkaitz Zubiaga, Damiano Spina, Víctor Fresno, and Raquel Martínez. 2011. Classifying trending topics: a typology of conversation triggers on Twitter. CIKM 2011.

Page 29

Our Work (II): Real-Time Summarization of Events

Page 30

Events on Twitter

When users live-tweet about events:

They produce vast amounts of tweets about events.

Users want to follow what others say.

Users cannot follow the overwhelming amounts of tweets.

Page 31

Stream summarization

Can we summarize streams of tweets in such a way that:

Users receive a reduced stream that they can follow?

Users do not miss any key sub-event that occurred during the event?

Page 32

Study of soccer games

Copa America 2011 (July 1-26, 2011):

26 soccer games.

11k-70k tweets per game.

Tweets are written in 30 languages.

Page 33

Gold Standard

Live reports gathered from Yahoo! Sports.

Yahoo! journalists provide annotations for:

Goals.

Penalties.

Red cards.

Disallowed goals.

Game starts, ends, stops & resumptions.

Page 34

Histogram of a Soccer Game

[Figure: tweet rate over time during one game (x-axis: time elapsed, as Unix timestamps; y-axis: tweet rate)]

Page 35

Summarization of soccer games

2-step summarization:

1 Sub-event detection.

2 Tweet selection.

[Figure: real-time tweets stream → Sub-event Detection → Tweet Selection → summary made of selected tweets]

Page 36

1st Step: Sub-event Detection

Increase [Zhao et al., 2011]: a sub-event is detected when the tweeting rate rises suddenly, to at least 1.7 times the previous rate.

Outliers: learns from the audience. Tweeting rates that are high compared to the rates seen so far (above the 90th percentile) are considered sub-events.
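The two detectors can be sketched over a series of per-window tweet counts. This follows the one-line descriptions on the slide, not the full method of Zhao et al. Several details are assumptions that the slides do not specify:

- Windows are a fixed 60 seconds long.
- The Increase rule compares each window only with the one immediately before it, and skips windows that follow an empty one.
- The Outliers rule compares each window against the nearest-rank 90th percentile of all earlier windows, after a warm-up of five windows.

All function names are hypothetical.

```python
import math
from collections import Counter

def tweet_rates(timestamps, window=60):
    """Number of tweets in each consecutive window (in seconds) since the first tweet."""
    start = min(timestamps)
    counts = Counter(int((t - start) // window) for t in timestamps)
    n_windows = int((max(timestamps) - start) // window) + 1
    return [counts[i] for i in range(n_windows)]

def increase_detector(rates, factor=1.7):
    """Flag window i if its rate is at least `factor` times the previous window's
    rate (windows following an empty window are skipped)."""
    return [i for i in range(1, len(rates))
            if rates[i - 1] > 0 and rates[i] >= factor * rates[i - 1]]

def percentile(values, q):
    """Nearest-rank q-th percentile (one of several common conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def outlier_detector(rates, q=90, min_history=5):
    """Flag window i if its rate exceeds the q-th percentile of all earlier rates."""
    return [i for i in range(min_history, len(rates))
            if rates[i] > percentile(rates[:i], q)]

rates = [10, 12, 11, 13, 12, 40, 15, 5, 9, 60]
print(increase_detector(rates))  # [5, 8, 9]
print(outlier_detector(rates))   # [5, 9]
```

In this toy series, the Increase rule also fires at window 8, where a quiet 5 becomes 9. The Outliers rule ignores that window because 9 is low compared with the game so far. This mirrors the precision trade-off between the two approaches.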

Page 37

1st Step: Results

Approach | P | R | F1 | # sub-events detected
Increase | 0.29 | 0.81 | 0.41 | 45.4
Outliers | 0.51 | 0.84 | 0.63 | 25.6

The increase-based approach detects more sub-events, with many false positives (it favors recall).

The outlier-based approach, which relies on outstandingly high tweeting rates, improves both precision and recall.
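As a check on how the table's columns relate, F1 is the harmonic mean of precision and recall. The Outliers row reproduces the reported value. The Increase row gives about 0.43 rather than 0.41, which may mean the scores were averaged per game; the slides do not say.

$$F_1 = \frac{2PR}{P + R}$$

$$F_1^{\text{Outliers}} = \frac{2 \times 0.51 \times 0.84}{0.51 + 0.84} = \frac{0.8568}{1.35} \approx 0.63, \qquad F_1^{\text{Increase}} = \frac{2 \times 0.29 \times 0.81}{0.29 + 0.81} = \frac{0.4698}{1.10} \approx 0.43$$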

Page 38

2nd Step: Tweet Selection

Each term appearing in tweets within a given timeframe is given a weight according to:

Frequency (TF).

Language models (Kullback-Leibler divergence, KLD).

These weights are used to choose a representative tweet: the tweet whose terms' weights add up to the highest value.
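Both weightings and the selection step can be sketched as follows. The slides give no formulas, so the following are all assumptions:

- The TF weight is a term's raw count within the timeframe.
- The KLD weight is p_window(w) · log(p_window(w) / p_background(w)). This is the term's contribution to the KL divergence between the timeframe's language model and a background model.
- The background model is built from earlier tweets, with add-one smoothing.
- Each distinct term counts once per tweet when scoring.

All names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of Unicode word characters."""
    return re.findall(r"\w+", text.lower())

def tf_weights(window_tweets):
    """TF weighting: raw count of each term in the current timeframe."""
    return Counter(term for tweet in window_tweets for term in tokenize(tweet))

def kld_weights(window_tweets, background_tweets):
    """Per-term KL contribution p_fg * log(p_fg / p_bg), where p_fg comes from the
    timeframe and p_bg from add-one-smoothed background tweets."""
    fg = tf_weights(window_tweets)
    bg = tf_weights(background_tweets)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values()) + len(vocab)
    weights = {}
    for term, n in fg.items():
        p_fg = n / fg_total
        p_bg = (bg[term] + 1) / bg_total
        weights[term] = p_fg * math.log(p_fg / p_bg)
    return weights

def select_tweet(window_tweets, weights):
    """Return the tweet whose distinct terms have the highest total weight."""
    return max(window_tweets,
               key=lambda tweet: sum(weights.get(t, 0.0) for t in set(tokenize(tweet))))

background = ["what a game", "a nice pass", "what a save"]
window = ["GOAL!!! Uruguay scores", "goal goal", "what a goal by uruguay", "I'm hungry"]
print(select_tweet(window, tf_weights(window)))               # what a goal by uruguay
print(select_tweet(window, kld_weights(window, background)))  # GOAL!!! Uruguay scores
```

In this toy example, TF favors the longer tweet because common words like "what" and "a" add weight. KLD gives those words negative weight, since they were already frequent before the sub-event.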

Page 39

2nd Step: Results

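Sub-event type (count) | Weighting | es (Spanish) | en (English) | pt (Portuguese)
Goals (54) | TF | 0.98 | 0.98 | 0.98
Goals (54) | KLD | 1.00 | 1.00 | 1.00
Penalties (2) | TF | 1.00 | 0.50 | 1.00
Penalties (2) | KLD | 1.00 | 0.50 | 1.00
Red cards (12) | TF | 0.75 | 0.75 | 0.83
Red cards (12) | KLD | 0.92 | 0.92 | 1.00
Disallowed goals (10) | TF | 0.40 | 0.50 | 0.40
Disallowed goals (10) | KLD | 0.40 | 0.50 | 0.30
Game starts (26) | TF | 0.73 | 0.74 | 0.79
Game starts (26) | KLD | 0.84 | 0.79 | 0.83
Game ends (26) | TF | 1.00 | 1.00 | 1.00
Game ends (26) | KLD | 1.00 | 1.00 | 1.00
Game stops & resumptions (63) | TF | 0.62 | 0.60 | 0.57
Game stops & resumptions (63) | KLD | 0.68 | 0.60 | 0.59
Overall | TF | 0.79 | 0.74 | 0.78
Overall | KLD | 0.84 | 0.77 | 0.82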

Page 40

Main findings

State-of-the-art text analysis methods generate accurate summaries:

With precision and recall values above 80% (100% for key sub-events).

In real time, as the game is being played.

In 3 different languages (es, en, pt).

Without the need for external data.

Damiano Spina, Arkaitz Zubiaga, Enrique Amigó, and Julio Gonzalo. Towards Real-Time Summarization of Events from Twitter Streams. To appear.

Page 41

Outlook

Page 42

Outlook

Work 1: Dig further into each type of trending topic in order to look for subtypes of trends.

Work 2: Evaluate the performance of the summarizer on other kinds of scheduled events (award ceremonies, keynote talks, ...).

Evaluate the novelty of the information garnered from tweets.

Page 43

Any Questions?
