Department of Electrical Engineering and Computer Science Twitter Trending Topic Classification Kathy Lee, Diana Palsetia, Ramanathan Narayanan, Md. Mostofa Ali Patwary, Ankit Agrawal, and Alok Choudhary ICDM 2011 Workshop on Optimization Based Methods for Emerging Data Mining Problems (OEDM'11)
Transcript
Page 1: Twitter Trending Topic Classification (users.eecs.northwestern.edu/~kml649/publication/Presentation_Twit…)

Department of Electrical Engineering and Computer Science

Twitter Trending Topic Classification

Kathy Lee, Diana Palsetia, Ramanathan Narayanan,

Md. Mostofa Ali Patwary, Ankit Agrawal, and Alok Choudhary

ICDM 2011 Workshop on Optimization Based Methods for Emerging Data Mining Problems (OEDM'11)

Page 2

Motivation

•  Information explosion
   •  200 million tweets per day*
•  Twitter provides trending topics
   •  Most popular topics that people tweet about
•  What is this trending topic about?
   •  Hashtags, names of individuals, words in other languages, etc.
   •  Is this person a musician, artist, politician, or sportsman?

Trending Topics (United States trends): Boone Logan, #MyYearofVIP, Barrett Jones, Outland, #itsalwayssunny, Ed Hochuli, Vaseline, Brett Keisel, #beyondsaredstraight, Gail Kim

* http://www.marketinggum.com/twitter-statistics-2011-updated-stats/

Page 3

Extended Motivation

Trending Topics (United States trends): Boone Logan, #MyYearofVIP, Barrett Jones, Outland, #itsalwayssunny, Ed Hochuli, Vaseline, Brett Keisel, #beyondsaredstraight, Gail Kim

Page 4

Our Goal: Classify Trending Topics

Trending Topics (United States trends): Boone Logan, #MyYearofVIP, Barrett Jones, Outland, #itsalwayssunny, Ed Hochuli, Vaseline, Brett Keisel, #beyondsaredstraight, Gail Kim

General Categories
•  Business
•  Health
•  Music
•  Politics
•  Sports
•  Science
•  Technology
•  ...

Page 5

•  Motivation
•  Method Overview
•  Data Set
•  Methods
•  Results
•  Conclusion

Page 6

System Architecture

[Figure: pipeline with four stages: Data Collection, Labeling, Data Modeling, Machine Learning; diagram labels include "Topics A" and "Topics B"]

•  Data Collection: trending topic + definition + tweets (example topic: "Lady gaga")
•  Labeling: each trending topic is assigned a category

   Trending Topic    Category
   Lady gaga         music
   burberry          fashion
   ipad              technology
   toy story 3       tv & movies
   superbowl         sports
   tornado           other news

•  Data Modeling: text-based modeling and network-based modeling
•  Machine Learning: text-based model and network-based model, each followed by validation

Page 8

Building Training Set

•  23000 trending topics (topics that trended February 2010 – July 2011)
•  Downloaded trend definition and tweets while each of the 23000 topics was trending
•  Random subset of 1000 topics
•  Removed topics without trend definitions
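The sampling and filtering steps can be sketched in a few lines of Python. This is only an illustration: the topic records, the `definition` field, and the fixed random seed are hypothetical and do not come from the slides.

```python
import random

def build_candidate_set(topics, sample_size=1000, seed=0):
    """Draw a random subset of topics, then drop those lacking a trend definition."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    subset = rng.sample(topics, min(sample_size, len(topics)))
    return [t for t in subset if t.get("definition")]

# Toy data (hypothetical): every third topic has an empty definition.
topics = [
    {"name": f"topic{i}", "definition": "" if i % 3 == 0 else f"about {i}"}
    for i in range(23000)
]
candidates = build_candidate_set(topics)
```

Because filtering happens after sampling, the candidate set is smaller than the 1000 sampled topics whenever some of them have no definition.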

Page 9

Labeling

•  2 annotators labeled each topic
•  3rd annotator intervened in case of disagreement
•  Removed topics that were labeled differently by all 3 annotators
•  768 trending topics in final training set
•  Found the 5 most similar topics for each of the 768 topics
•  Labeled 3005 topics in total
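The annotation rule (two annotators, a third as tie-breaker, removal when all three differ) could be expressed as the small function below. It is a sketch of the rule as stated on the slide; the function name and the use of `None` for "remove this topic" are assumptions made for illustration.

```python
def resolve_label(first, second, third=None):
    """Return the agreed label, or None if the topic should be removed.

    first, second: labels from the two annotators.
    third: label from the third annotator, consulted only on disagreement.
    """
    if first == second:
        return first
    if third is not None and third in (first, second):
        return third
    return None  # all three differ (or no third label is available)

agreed = resolve_label("music", "music")
tie_broken = resolve_label("music", "sports", "sports")
removed = resolve_label("music", "sports", "politics")
```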

Page 10

Distribution of training data

[Figure: bar chart, y-axis: Number of Topics (0 to 160). Bar values as extracted: 8, 25, 15, 22, 18, 19, 8, 24, 27, 75, 49, 13, 17, 148, 52, 73, 83, 92]

Page 11

•  Motivation
•  Method Overview
•  Data Set
•  Methods
   •  Text-based classification
   •  Network-based classification
•  Results
•  Conclusion

Page 12

[Figure: Trend Definition → Document; Tweets → Document]

Page 13

Text-based data classification

•  Bag-of-Words Text Classification
1.  Preprocessing
    •  Remove hyperlinks
2.  Apply string-to-word vector filter
    •  Remove symbols and stop words
    •  Transform tokens into TF-IDF (term-frequency inverse-document-frequency) weights
3.  Apply various classification models
    •  Naïve Bayes, Naïve Bayes Multinomial, and SVM
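Steps 1 and 2 can be illustrated with a standard-library-only sketch. It uses the textbook weight tf × log(N / df); the stop-word list is a tiny placeholder, and the exact tokenization and weighting applied by the authors' string-to-word vector filter may differ.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # tiny illustrative list

def tokenize(text):
    """Remove hyperlinks, lowercase, strip symbols, and drop stop words."""
    text = re.sub(r"https?://\S+", " ", text)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors

docs = [
    "The new iPad is out http://example.com/ipad",
    "Lady Gaga drops a new single!",
]
vectors = tfidf(docs)
```

A term that occurs in every document (here "new") gets weight 0, which is why TF-IDF favors terms that discriminate between topics.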

Page 15

Algorithm

•  Finds topic-specific influential users using social network information
   •  Friend-follower relationship, tweet time, number of tweets, etc.
•  Takes the top 300 influential users for each topic
•  Finds the 5 most similar topics using the common influential users between two topics
•  Classifies a topic using the categories of its similar topics
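The similarity search can be sketched as a ranking by set overlap. The sketch assumes the influential users of each topic are already available as sets of user IDs; how influence itself is computed is not shown here, and the topic names and user IDs in the toy data are made up.

```python
def most_similar_topics(target, influential_users, k=5):
    """Rank the other topics by the number of influential users shared with `target`."""
    target_users = influential_users[target]
    overlaps = [
        (len(target_users & users), topic)
        for topic, users in influential_users.items()
        if topic != target
    ]
    # Most shared users first; ties broken alphabetically for a stable result.
    overlaps.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(topic, shared) for shared, topic in overlaps[:k]]

# Toy data (hypothetical user IDs).
influential_users = {
    "macbook": {"u1", "u2", "u3", "u4"},
    "iwork": {"u1", "u2", "u3"},
    "mobileme": {"u1", "u2"},
    "superbowl": {"u9"},
}
similar = most_similar_topics("macbook", influential_users, k=2)
```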

Page 16

Network-based Classification: Topic-specific Influential Users*

[Figure: users X and Y, Topic A and Topic B]

X is more influential than Y on Topic A

* R. Narayanan, “Mining Text for Relationship Extraction and Sentiment Analysis,” Ph.D. dissertation, 2010.

Page 18

Network-based Classification: User Similarity Model*

[Figure: Topic A, Topic B, Topic C]

Topics A and B are more closely related than Topics A and C if |A_infl ∩ B_infl| > |A_infl ∩ C_infl|

* R. Narayanan, “Mining Text for Relationship Extraction and Sentiment Analysis,” Ph.D. dissertation, 2010.

Page 19

Network-based Classification: Topic “macbook” and 5 similar topics

[Figure: MACBOOK (class?) linked to its 5 similar topics. Numbers in the diagram: number of common influential users between topic “macbook” and the similar topic]

Similar Topic     Class of Similar Topic    # Common Influential Users
iwork             technology                11
magic trackpad    technology                11
#landsend         charity & deals           11
apple ipad        technology                11
mobileme          technology                10

technology = 11 + 11 + 11 + 10 = 43
charity&deals = 11
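The per-category totals for “macbook” follow from summing the common-user counts of its similar topics by category. A short sketch of that aggregation, using the five rows from the slide (the function name and tuple layout are illustrative choices):

```python
from collections import defaultdict

def category_scores(similar_topics):
    """Sum the common-influential-user counts of the similar topics per category."""
    scores = defaultdict(int)
    for _topic, category, common_users in similar_topics:
        scores[category] += common_users
    return dict(scores)

# (similar topic, class of similar topic, number of common influential users)
macbook_neighbors = [
    ("iwork", "technology", 11),
    ("magic trackpad", "technology", 11),
    ("#landsend", "charity & deals", 11),
    ("apple ipad", "technology", 11),
    ("mobileme", "technology", 10),
]
scores = category_scores(macbook_neighbors)
```

The resulting totals (43 for technology, 11 for charity & deals) form one row of the classifier input, with 0 for every category that has no similar topic.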

Page 20

Input to classifier

Topic            technology   charity&deals   books   music   fashion   tv&movies   …   Class
macbook          43           11              0       0       0         0           …   ?
queen_rowling    0            0               30      0       0         10          …   ?
lady_gaga        0            0               0       40      0         0           …   ?

Table with 768 rows and 19 columns

•  Run various classifiers
   •  C5.0, K-Nearest Neighbor, SVM, Logistic Regression

Page 22

Experimental Setup

•  TD: Trend Definition
•  Model(x, y): classifier model used to classify a document consisting of x tweets per topic using the y most frequent terms
   •  e.g., NBM(100,1000)
      •  Naïve Bayes Multinomial classifier
      •  Document containing 100 tweets
      •  1000 top frequent terms
•  WEKA and SPSS Modeler for classification
•  10-fold cross validation
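The Model(x, y) notation can be made concrete with a small sketch that builds one document per topic from x tweets and keeps only the y most frequent terms. Two details are assumptions, since the slides do not specify them: the first x tweets are taken, and term frequency is counted over all topic documents together.

```python
from collections import Counter

def build_document(tweets, x):
    """Concatenate the tokens of the first x tweets of one topic."""
    return [token for tweet in tweets[:x] for token in tweet.lower().split()]

def top_terms(documents, y):
    """Return the y most frequent terms across all topic documents."""
    counts = Counter(token for doc in documents for token in doc)
    return {term for term, _ in counts.most_common(y)}

def restrict(document, vocabulary):
    """Keep only the tokens that belong to the selected vocabulary."""
    return [token for token in document if token in vocabulary]

# Toy data (hypothetical tweets), with x = 2 and y = 2.
tweets_by_topic = {
    "ipad": ["ipad launch today", "new ipad ipad", "ignored third tweet"],
    "gaga": ["gaga new album"],
}
documents = {topic: build_document(tweets, x=2)
             for topic, tweets in tweets_by_topic.items()}
vocabulary = top_terms(documents.values(), y=2)
reduced = {topic: restrict(doc, vocabulary) for topic, doc in documents.items()}
```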

Page 23

Text-based Classification Results

[Figure: bar chart, y-axis: Accuracy (%) (0 to 70). Bar values as extracted: 53, 63.93, 65.36, 54, 61.76, 59.81, 44.5, 45.31, 42.83, 19.27]

Page 24

Network-based classification results

[Figure: bar chart, y-axis: Accuracy (%) (0 to 80)]

Classifier                Accuracy (%)
C5.0                      70.96
K-Nearest Neighbor        63.28
Support Vector Machine    54.34
Logistic Regression       53.45
ZeroR                     19.27
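The ZeroR value of 19.27% is consistent with always predicting the largest category: the training-data distribution chart shows 148 topics in its largest bar out of 768 in total. The sketch below reproduces that arithmetic; the category names are placeholders, because the chart's labels did not survive in this transcript.

```python
from collections import Counter

def zero_r_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Per-category topic counts as extracted from the training-data distribution chart.
category_sizes = [8, 25, 15, 22, 18, 19, 8, 24, 27, 75, 49, 13, 17, 148, 52, 73, 83, 92]
labels = [f"category_{i}" for i, size in enumerate(category_sizes) for _ in range(size)]
baseline = round(100 * zero_r_accuracy(labels), 2)
```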

Page 26

Key Contributions

•  Use of social network structure for topic classification
•  Good accuracy (65%) on text-based classification
   •  Tweets are not grammatically structured (noisy)
•  Network-based classifier (71%) outperforms text-based classifier

Page 27

Future Work

•  Integrate text-based classification and network-based classification
•  Multi-labeling
   •  Topics could fall under more than one category, e.g., news about a famous actor’s biography

Page 28

Questions?

Thank you!

