Date post: | 23-Jan-2015 |
Dynamic Multi-Faceted Topic Discovery in Twitter
Jan Vosecky
Di Jiang
Kenneth Wai-Ting Leung
Wilfred Ng
Representation
• Vector space model
  – Term vector sparseness issue
• Topic models
  – Latent topic vector better than VSM?
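To make the sparseness issue concrete, here is a minimal sketch of TF-IDF term vectors for tweet-length texts. The three toy tweets and the helper names (`tfidf_vectors`, `cosine`) are illustrative assumptions, not taken from the slides. Two tweets on a related subject that share no terms get a cosine similarity of exactly zero.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict (term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Raw term frequency times a smoothed inverse document frequency.
        vectors.append({t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy tweets (hypothetical): the first two are topically related but share no terms.
tweets = [
    ["libya", "rebels", "advance"],
    ["tripoli", "forces", "clash"],
    ["new", "phone", "launch"],
]
vecs = tfidf_vectors(tweets)
vocabulary = {t for doc in tweets for t in doc}
similarity_related = cosine(vecs[0], vecs[1])
```

Each vector has 3 non-zero entries out of a 9-term vocabulary, and `similarity_related` is zero even though both tweets concern the same events. A dense latent topic vector is one way to give such pairs a non-zero similarity.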
Topic Models
A latent topic in LDA
“Arab revolutions”
Libya    0.00040
Force    0.00020
Human    0.00010
Abuse    0.00010
Protect  0.00009
Secure   0.00008
War      0.00005
Execute  0.00004
A topic in Twitter?
• Not just words
• People talk about entities
  – Persons
  – Organizations
  – Locations
  – Time
  – …
Multi-faceted Topic Model
• Each topic consists of n facets
  – Elements of each facet ~ multinomial distribution
• Each document d is a distribution over topics
  – General terms, named entities and timestamp drawn from the respective facet of topic z
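The generative idea above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' exact model: the facet names, the probabilities, the timestamps, and the choice to draw a fresh topic z for every element (as in LDA) are all assumptions made for the example.

```python
import random

def sample(dist, rng):
    """Draw one outcome from a dict mapping outcome -> probability."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def generate_document(theta, topics, facet_counts, rng):
    """Generate a document with one list of elements per facet.

    theta:        topic -> probability (the document's topic distribution)
    topics:       topic -> facet -> (element -> probability)
    facet_counts: facet -> number of elements to draw for that facet
    """
    doc = {facet: [] for facet in facet_counts}
    for facet, n in facet_counts.items():
        for _ in range(n):
            z = sample(theta, rng)  # assumed: one topic draw per element
            doc[facet].append(sample(topics[z][facet], rng))
    return doc

# Hypothetical two-topic model with three facets each; all values are made up.
topics = {
    "arab_revolutions": {
        "terms": {"force": 0.5, "protect": 0.3, "war": 0.2},
        "locations": {"libya": 1.0},
        "time": {"2011-02": 0.6, "2011-03": 0.4},
    },
    "other": {
        "terms": {"phone": 1.0},
        "locations": {"tokyo": 1.0},
        "time": {"2012-09": 1.0},
    },
}
theta = {"arab_revolutions": 1.0, "other": 0.0}
doc = generate_document(theta, topics, {"terms": 3, "locations": 1, "time": 1},
                        random.Random(0))
```

Because `theta` puts all its mass on one topic here, every general term, location and timestamp in `doc` comes from that topic's facets.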
Multi-faceted Topic Model
Multi-faceted latent topic “Arab revolutions”
[Figure: one panel per facet: General terms, Persons, Locations, Organizations, Time]
Parameter Inference
• Scalability
  – Gibbs sampling and variational inference process data in a batch
• Online inference
  – Stochastic variational inference to process streaming data
  – Model continuously updated
  – Constant time to process a new doc
[Figure: doc doc doc doc → inference → doc doc doc doc → inference → …]
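The online scheme can be sketched with the generic stochastic variational inference update, in which a global parameter is blended with a noisy estimate computed from each mini-batch. This is a simplified sketch: the per-document local step is omitted, the expected counts per mini-batch are taken as given, and `corpus_size`, `tau0`, `kappa` and `eta` are illustrative values rather than settings from the slides.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Decaying step size rho_t = (tau0 + t) ** (-kappa)."""
    return (tau0 + t) ** (-kappa)

def batch_estimate(batch_counts, corpus_size, batch_size, eta=0.01):
    """Noisy estimate of the global parameter, as if the whole corpus
    looked like this mini-batch (prior eta plus rescaled counts)."""
    scale = corpus_size / batch_size
    return [eta + scale * c for c in batch_counts]

def svi_update(lam, lam_hat, rho):
    """Blend old parameter and new estimate: (1 - rho) * lam + rho * lam_hat."""
    return [(1.0 - rho) * old + rho * new for old, new in zip(lam, lam_hat)]

# Hypothetical stream: expected word counts for one topic in each mini-batch.
stream = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [3.0, 0.0, 0.0]]
lam = [1.0, 1.0, 1.0]
for t, counts in enumerate(stream):
    lam_hat = batch_estimate(counts, corpus_size=1000, batch_size=3)
    lam = svi_update(lam, lam_hat, step_size(t))
```

Each mini-batch costs the same amount of work regardless of how many documents came before, and the decaying step size lets later batches refine the model rather than overwrite it.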
Perplexity comparison: Online inference vs. Gibbs sampling
[Figure panels: K = 50 and K = 200]
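For reference, perplexity is the exponential of the negative average per-token log-likelihood, so lower is better. The sketch below uses that standard definition. How the held-out likelihood was computed for the comparison is not stated on the slide, so the inputs here are assumed to be given.

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-(total log-likelihood) / (total number of tokens))."""
    total_log_likelihood = sum(doc_log_likelihoods)
    total_tokens = sum(doc_lengths)
    return math.exp(-total_log_likelihood / total_tokens)

# Sanity check: a model that is uniform over 8 words has perplexity 8.
log_p = math.log(1.0 / 8.0)
uniform_perplexity = perplexity([4 * log_p, 2 * log_p], [4, 2])
```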
Tweet Clustering
[Figure panels: (a) Manually-labeled dataset, (b) Hashtag-labeled dataset. Clustering methods in each panel: DBSCAN, K-means, Direct. Legend entry: Vector space model (TF-IDF)]
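The slide does not define "Direct". One plausible reading, assumed here, is that each tweet is assigned directly to its most probable topic, with no separate clustering algorithm. A minimal sketch of that reading, with made-up topic vectors:

```python
from collections import defaultdict

def direct_clusters(topic_vectors):
    """Group document indices by the index of their highest-probability topic."""
    clusters = defaultdict(list)
    for doc_index, theta in enumerate(topic_vectors):
        dominant_topic = max(range(len(theta)), key=theta.__getitem__)
        clusters[dominant_topic].append(doc_index)
    return dict(clusters)

# Hypothetical per-tweet topic distributions over K = 2 topics.
clusters = direct_clusters([[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]])
```

Under this reading the number of clusters is at most K, the number of topics, whereas DBSCAN and K-means treat the topic vectors as ordinary feature vectors.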
Summary
• Model multi-faceted topics in microblogs
  – Entity-oriented and dynamic
• Online inference method
• Beneficial for downstream applications
Thank You!