
Twitter mining

Date post: 10-May-2015
Upload: magicpeach
Page 1: Twitter mining

Microblog (Twitter) mining

yutao


What is Twitter?

• Tweets are limited to 140 characters
• A hashtag (#) is placed before relevant keywords in a tweet
• RT means “retweet”, i.e. forwarding another user’s tweet
• An @ reference refers to a user’s screen name
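A minimal sketch of how these conventions can be pulled out of raw tweet text. The regular expressions are simplified assumptions (Twitter's own entity rules treat URLs, Unicode and edge cases more carefully), and the function name `parse_tweet` is hypothetical.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")        # '#keyword'
MENTION_RE = re.compile(r"@(\w+)")        # '@screen_name'
RT_RE = re.compile(r"^RT\s+@(\w+):?")     # 'RT @user:' prefix of a retweet


def parse_tweet(text: str) -> dict:
    """Extract the basic Twitter conventions from one tweet (simplified rules)."""
    rt = RT_RE.match(text)
    return {
        "within_limit": len(text) <= 140,       # classic 140-character limit
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),   # includes the retweeted user
        "retweet_of": rt.group(1) if rt else None,
    }


print(parse_tweet("RT @nytimes: Election results tonight #election @cnn"))
```

A production parser would also need to avoid matching e-mail addresses as mentions; that is left out to keep the sketch short.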


Why is it different?

• Very short
• Written in an informal style
• Social


What is Twitter, a social network or a news media? (WWW 2010)

• Following is mostly not reciprocated (not so “social”)
• Users talk about timely topics
• A few users reach a large audience directly
• Most users can reach a large audience quickly by word of mouth
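The first point can be measured as the reciprocity of the follow graph: the share of directed "follows" edges whose reverse edge also exists. A minimal sketch, assuming the graph is given as a set of (follower, followee) pairs; the toy data is invented.

```python
def reciprocity(edges: set[tuple[str, str]]) -> float:
    """Fraction of directed follow edges (a follows b) that are reciprocated."""
    if not edges:
        return 0.0
    mutual = sum(1 for a, b in edges if (b, a) in edges)
    return mutual / len(edges)


follows = {("alice", "bob"), ("bob", "alice"), ("carol", "bob"), ("dave", "bob")}
print(reciprocity(follows))
```

Here two of the four edges point both ways, so the result is 0.5; a low value on a real crawl is what "mostly not reciprocated" means.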


Early analysis


Analysis 1: Take the people out

• Krishnamurthy et al. (2008)
• Users were classified by their follower/following counts (absolute numbers and ratios)
• Means and mechanisms of engagement: Web (61.7%), mobile/text (7.5%), software (22.4%)
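A sketch of grouping users by their follower/following ratio. The cut-off values and the group names below are hypothetical illustrations, not the thresholds used by Krishnamurthy et al.

```python
def classify_user(followers: int, following: int,
                  high: float = 10.0, low: float = 0.1) -> str:
    """Group a user by follower/following ratio (thresholds are hypothetical)."""
    ratio = followers / max(following, 1)   # max() avoids division by zero
    if ratio >= high:
        return "broadcaster"       # far more followers than followees
    if ratio <= low:
        return "follow-heavy"      # follows many accounts, followed by few
    return "balanced"              # roughly reciprocal relationships


for name, fol, fing in [("news_site", 5000, 100), ("bot", 10, 2000), ("friend", 150, 120)]:
    print(name, classify_user(fol, fing))
```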


Analysis 2: Content Category

Four meta-categories:
• Daily chatter
• Conversations
• Information / URL sharing
• News reporting


Analysis 3: Measuring user influence

• Influence measures: indegree, retweets and mentions
• Strong correlation between retweet and mention influence
• The most connected users are not necessarily the most influential
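One common way to quantify how strongly two influence measures agree is Spearman's rank correlation between users' retweet counts and mention counts. The sketch below computes it from scratch (tied values get averaged ranks); the counts are made-up toy data, not figures from any study.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from largest (rank 1) to smallest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


retweets = [120, 45, 300, 8, 60]   # toy per-user counts
mentions = [100, 50, 280, 10, 40]
print(round(spearman(retweets, mentions), 3))
```

Comparing indegree ranks with retweet ranks the same way would show whether "most connected" and "most influential" diverge. The sketch does not handle the degenerate case where all values are equal.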


User influence


How to detect spam?

• Classification
• Content attributes: hashtags, trending topics, replies, mentions, HTTP links
• User behavior attributes: age of the user account
• Graph-based attributes
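A sketch of turning the three attribute groups into a per-account feature vector that any standard classifier (e.g. an SVM) could consume. The specific features, the `spam_features` name, and the follower/followee ratio used as the graph-based attribute are illustrative assumptions.

```python
import re
from datetime import datetime

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")


def spam_features(tweets: list[str], trending: set[str], created: datetime,
                  now: datetime, followers: int, followees: int) -> dict[str, float]:
    """Feature vector for one account; feature choices are illustrative."""
    n = max(len(tweets), 1)
    tags = [h.lower() for t in tweets for h in HASHTAG_RE.findall(t)]
    return {
        # content attributes
        "hashtags_per_tweet": len(tags) / n,
        "trending_tag_fraction": sum(h in trending for h in tags) / max(len(tags), 1),
        "reply_fraction": sum(t.startswith("@") for t in tweets) / n,
        "mentions_per_tweet": sum(len(MENTION_RE.findall(t)) for t in tweets) / n,
        "urls_per_tweet": sum(len(URL_RE.findall(t)) for t in tweets) / n,
        # user behavior attribute
        "account_age_days": (now - created).days,
        # graph-based attribute (assumed: follower/followee ratio)
        "follower_followee_ratio": followers / max(followees, 1),
    }


feats = spam_features(
    ["#iphone free http://x.co/a @bob", "@alice hi", "win #iphone #free http://x.co/b"],
    trending={"iphone"},
    created=datetime(2015, 5, 1), now=datetime(2015, 5, 10),
    followers=3, followees=900,
)
print(feats)
```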


Sentiment analysis

• Supervised classification
• Training data come from Twitter itself instead of being human-labeled
• Happy emoticons: “:-)”, “:)”, “=)”, “:D”, etc.
• Sad emoticons: “:-(”, “:(”, “=(”, “;(”, etc.
• Objective: newspapers and magazines such as “NY Times”
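A sketch of the idea: emoticons act as noisy labels, they are stripped from the text so the classifier cannot simply memorize them, and tweets from news accounts are labeled objective. A multinomial Naive Bayes model is then trained on the result. The choice of Naive Bayes, the `NEWS_ACCOUNTS` set, and the toy tweets are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

HAPPY = [":-)", ":)", "=)", ":D"]
SAD = [":-(", ":(", "=(", ";("]
NEWS_ACCOUNTS = {"nytimes"}  # hypothetical list of newspaper/magazine accounts


def distant_label(text: str, author: str):
    """Return (label, cleaned text) from emoticons/source, or None if ambiguous."""
    if author.lower() in NEWS_ACCOUNTS:
        return "obj", text
    pos = any(e in text for e in HAPPY)
    neg = any(e in text for e in SAD)
    if pos == neg:                  # no emoticon, or both kinds: skip the tweet
        return None
    for e in HAPPY + SAD:           # strip emoticons so they are not used as features
        text = text.replace(e, "")
    return ("pos" if pos else "neg"), text.strip()


def train_nb(examples):
    """Multinomial Naive Bayes counts from (label, text) pairs."""
    class_counts, word_counts = Counter(), defaultdict(Counter)
    for label, text in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text: str) -> str:
    """Most probable class with add-one (Laplace) smoothing; unseen words ignored."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in text.lower().split():
            if w in vocab:
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


raw = [("so happy today :)", "u1"), ("great game :D", "u2"),
       ("missed the bus :(", "u3"), ("sad rainy day =(", "u4"),
       ("Senate passes budget bill", "nytimes"), ("just woke up", "u5")]
training = [x for x in (distant_label(t, a) for t, a in raw) if x is not None]
model = train_nb(training)
print(predict_nb(model, "happy game"), predict_nb(model, "rainy bus"))
```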


Trend detection

• Bursty keyword detection
• Bursty keyword grouping
• Context extraction (e.g. PCA, SVD)
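The slide does not specify a burst detector; one simple assumed choice flags a keyword whose count in the current time window is many standard deviations above its mean over past windows. The z-score threshold, the +1 smoothing term, the minimum count and the toy counts are hypothetical; keyword grouping and context extraction are not shown.

```python
from collections import Counter
from statistics import mean, pstdev


def bursty_keywords(history: list[Counter], current: Counter,
                    z_threshold: float = 3.0, min_count: int = 5):
    """Return (word, z) pairs for words bursting in `current` versus `history` (non-empty)."""
    bursts = []
    for word, count in current.items():
        if count < min_count:           # ignore rare words
            continue
        past = [window[word] for window in history]      # missing words count as 0
        z = (count - mean(past)) / (pstdev(past) + 1.0)  # +1 smooths zero variance
        if z >= z_threshold:
            bursts.append((word, z))
    return sorted(bursts, key=lambda wz: wz[1], reverse=True)


history = [Counter({"lol": 50, "game": 10}),
           Counter({"lol": 48, "game": 12, "earthquake": 1}),
           Counter({"lol": 52, "game": 11})]
current = Counter({"lol": 51, "game": 12, "earthquake": 40})
print(bursty_keywords(history, current))
```

Frequent but steady words such as "lol" are not flagged, while a sudden jump for "earthquake" is.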


Twitter search (WSDM 2011)


The largest difference

• Twitter search orders results by time
• Search engines order results by relevance

• Social
• Time


Recommendation


Recommending content from information streams

• The filtering problem:
  – “I get 1000+ items in my stream daily but only have time to read 10 of them. Which ones should I read?”
• The discovery problem:
  – “There are millions of URLs posted daily on Twitter. Am I missing something important there outside my own Twitter stream?”


Recommending content from information streams

• Recency of content: items are interesting only for a short time after they are published
  – Always a “cold start” situation
• Explicit interaction among users
  – Users interact explicitly by subscribing or sharing
• User-generated content
  – People are content producers as well as consumers


URL Sources

• Considering all URLs is impossible, so candidates come from two sources:
  – FoF: URLs from followees-of-followees
  – Popular: URLs that are popular across the whole of Twitter


Topic relevance scores

• Topic profile of URLs
  – Term vectors are used as profiles
  – Built from the tweets that mention the URL
• Topic profile of users
  – Self-topic: content profile based on what I post
  – Followee-topic: content profile based on what my followees post
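A sketch of a topic relevance score as cosine similarity between a URL's term vector and a user's profile, assuming whitespace tokenization and raw term counts (a real system would likely use tf-idf weights and stop-word removal). The tweets are invented.

```python
import math
from collections import Counter


def term_vector(tweets: list[str]) -> Counter:
    """Bag-of-words profile built from a list of tweets."""
    return Counter(w for t in tweets for w in t.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


url_profile = term_vector(["new iphone app review", "iphone app store launch"])
self_topic = term_vector(["loving my iphone", "app of the day"])
followee_topic = term_vector(["election results tonight", "new app store"])
print(round(cosine(url_profile, self_topic), 3), round(cosine(url_profile, followee_topic), 3))
```

Either profile (self-topic or followee-topic) can be plugged in as the user side, and candidate URLs are then ranked by the score.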


Social network scores

• “Popular vote” among my followees-of-followees
  – People “vote” for a URL by tweeting it
  – Votes are weighted using the social network structure
  – URLs with more votes in total are assigned a higher score
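A sketch of the "popular vote": every followee-of-followee who tweeted a URL casts a vote, weighted here by log(1 + number of my followees who follow that voter). This weighting is an assumed example of "using social network structure", not the scheme from the original study; the data structures and the name `fof_vote_scores` are hypothetical.

```python
import math
from collections import defaultdict


def fof_vote_scores(me, follows, tweeted_urls):
    """
    Score URLs by weighted votes from my followees-of-followees.
    follows: user -> set of users they follow; tweeted_urls: user -> set of URLs tweeted.
    """
    # Support of each followee-of-followee = how many of my followees follow them.
    support = defaultdict(int)
    for followee in follows.get(me, set()):
        for fof in follows.get(followee, set()):
            if fof != me:
                support[fof] += 1
    scores = defaultdict(float)
    for voter, n in support.items():
        for url in tweeted_urls.get(voter, set()):
            scores[url] += math.log1p(n)   # hypothetical dampened vote weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


follows = {"me": {"a", "b"}, "a": {"x", "y"}, "b": {"x", "me"}}
tweeted = {"x": {"http://example.com/1"}, "y": {"http://example.com/2"}}
print(fof_vote_scores("me", follows, tweeted))
```

The log dampening keeps one heavily followed voter from dominating; a linear weight is an equally plausible alternative.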


Recommending Twitter users to follow

• Social graph
• User profile, built from:
  – The user himself
  – Followers
  – Followees


Microblog summarization


The phrase reinforcement algorithm

• Looking for the most commonly occurring phrases
  – Users tend to use similar words when describing a particular topic
  – RT


Hybrid TF-IDF summarization

• TF: the document is the entire collection of posts

• IDF: the document is a single post
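A sketch of hybrid TF-IDF summarization: term frequency is counted over the whole collection treated as one document, document frequency over individual posts, each post is scored by the sum of its word weights, and the top-scoring post is returned as the summary. Dividing by max(post length, `min_len`) so that long posts are not automatically favored is an assumption here, as are the threshold value and the toy posts.

```python
import math
from collections import Counter


def hybrid_tfidf_summary(posts: list[str], min_len: int = 10) -> str:
    """Return the highest-scoring post under hybrid TF-IDF weighting."""
    tokenized = [p.lower().split() for p in posts]
    all_words = [w for toks in tokenized for w in toks]
    tf = Counter(all_words)                                   # TF: whole collection = one document
    df = Counter(w for toks in tokenized for w in set(toks))  # DF: each post = one document
    total, n_posts = len(all_words), len(posts)

    def weight(w: str) -> float:
        return (tf[w] / total) * math.log2(n_posts / df[w])

    def score(toks: list[str]) -> float:
        # normalize by post length, but never by less than min_len (assumed threshold)
        return sum(weight(w) for w in toks) / max(min_len, len(toks))

    best = max(range(n_posts), key=lambda i: score(tokenized[i]))
    return posts[best]


posts = ["earthquake hits tokyo", "strong earthquake hits tokyo today",
         "lunch was good", "earthquake"]
print(hybrid_tfidf_summary(posts))
```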


Topic model


Content modeling on Twitter

• Surface word features: tf.idf cosine similarity, etc.
• Deeper natural language processing: parsing, parts of speech, coreference, etc.

Example tweet: “dats yur mom not me lol” (THE_REAL_SHAQ)


Content modeling on Twitter

• Surface word features: tf.idf cosine similarity, etc.
• Topic models, dimensionality reduction: Latent Dirichlet Allocation (LDA), LSA, etc.
• Supervised classification: Naïve Bayes, SVM, etc.
• #hashtags, emoticons, questions, etc.

Best model in ranking experiments: Labeled LDA


Content modeling with Labeled LDA

• Discover unlabeled topics: parameter K = 200 latent topic dimensions
• Model common labels: 500 to 1000 dimensions for hashtags, emoticons, etc.

Example topics (top words):
• obama president american america says country russia pope island
• I’m going go out gonna see im tonight sleep tomorrow about am night
• :) good day morning thanks have happy hope birthday
• :) can’t wait see one yay!!! cant tomorrow got !! next christmas (label: Smile :))
• #jobs featured manager sales engineer yahoo location senior (label: #jobs)


Content modeling with Labeled LDA

Example posts:
• “new muppetblog political commentary link”
• “@kermit heyy wanna catch a movie”
• “just ate a cookie #yummy”

[Figure: each word in the posts is assigned a topic (1 to 5) or a label (#yummy); the histogram of these assignments serves as a signature for a set of posts]
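Turning per-word topic assignments into a histogram signature is a simple counting step. A minimal sketch, assuming the assignments are already available from a trained Labeled LDA model (not shown); the lists below reuse the numbers printed on the slide purely as illustrative input, and `topic_signature` is a hypothetical name.

```python
from collections import Counter


def topic_signature(assignments):
    """Normalized histogram of topic/label assignments across a set of posts."""
    counts = Counter(a for post in assignments for a in post)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


posts = [[4, 1, 1, 1], [2, 2, 2, 3, 3], [5, 5, "#yummy", "#yummy"]]
sig = topic_signature(posts)
print(sig)
```

Signatures of two users (or two sets of posts) can then be compared with any vector similarity, such as cosine.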


Twitter content by category: Substance 27%, Status 12%, Style 38%, Social 23%

Example topics (top words):
• can make help if someone tell_me them anyone use makes any sense trying explain
• obama president american america says country russia pope island failed honduras
• haha lol :) funny :p omg hahaha yeah too yes thats ha wow cool lmao though kinda
• am still doing sleep so going tired bed awake supposed hell asleep early sleeping sleepy
• night sleep bed going off tomorrow bye tonight goodnight all im time now nite
• iphone new phone app mobile apple ipod blackberry touch pro store apps free android an
• up what's hit pick whats hey set twitter sign give catch when show first wats make
• im get dont gonna shit gotta wanna cuz damn ur make cant say cause bout ill mad tired


Characterizing Microblogs with Topic Models

Outline:
• Modeling Twitter content with topic models
• Characterizing, recommending and filtering


Characterizing users


TwitterRank: Finding Topic-sensitive Influential Twitterers

• Apply LDA to distill topics automatically
• Find topics in a twitterer’s content to represent her interests
  – Twitterer’s content = her aggregated tweets
• Twitterers with “following” relationships are more similar in the topics they are interested in than those without
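A sketch in the spirit of topic-specific TwitterRank: a PageRank-style random walk for one topic in which influence flows from a follower to the accounts they follow, transitions favor followees who tweet more and whose interest in the topic is similar, and random jumps land on users in proportion to their interest in the topic. The exact formulas, the damping value `gamma = 0.85`, the renormalization step, and the toy graph are assumptions; check the original TwitterRank paper for its definitions.

```python
def topic_twitterrank(follows, n_tweets, topic_prob, gamma=0.85, iters=50):
    """
    Topic-sensitive random walk over the follow graph (sketch).
    follows: user -> list of users they follow
    n_tweets: user -> number of tweets published
    topic_prob: user -> probability of the chosen topic in the user's LDA distribution
    """
    users = list(n_tweets)
    z = sum(topic_prob[u] for u in users)
    teleport = {u: topic_prob[u] / z for u in users}   # jump to topic-interested users
    rank = dict(teleport)
    for _ in range(iters):
        new = {u: (1 - gamma) * teleport[u] for u in users}
        for i in users:
            followees = follows.get(i, [])
            total = sum(n_tweets[j] for j in followees)
            if total == 0:
                continue                                 # follows nobody (or no tweets)
            for j in followees:
                sim = 1 - abs(topic_prob[i] - topic_prob[j])   # topical similarity of i and j
                new[j] += gamma * rank[i] * (n_tweets[j] / total) * sim
        s = sum(new.values())
        rank = {u: v / s for u, v in new.items()}        # renormalize leaked mass (assumption)
    return rank


follows = {"a": ["c"], "b": ["c"], "d": ["c", "b"]}
n_tweets = {"a": 10, "b": 10, "c": 10, "d": 10}
topic_prob = {"a": 0.8, "b": 0.7, "c": 0.9, "d": 0.1}
ranks = topic_twitterrank(follows, n_tweets, topic_prob)
print(max(ranks, key=ranks.get))
```

Running it once per LDA topic yields one influence ranking per topic, which is what makes the ranking "topic-specific".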


Topic-specific TwitterRank


Interesting applications

• Personalized and automatic social summarization of events in video
• Twitter can predict the stock market
• Predicting elections with Twitter
• Earthquake detection (time, location)


Thanks

Many pictures and slides come from the internet.

