Tweetool (0. 1 100 version) Final Report
Yilei Qian
Computer Science
University of Southern California
A Twitter Recommendation System based on Topic Modeling
Ideas
• Following too many accounts on Twitter
• Too much news every day
• Cannot find interesting and valuable news
• Don't know the names of the users they want to follow
• Need someone to recommend who to follow
• Need someone to recommend the hottest news
• Use topic modeling to re-rank all users
Traditional Method
Topic Modeling
• A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
• Commonly used in natural language processing.
Reference Papers:
Steyvers, M. and Griffiths, T., "Probabilistic Topic Models," Handbook of Latent Semantic Analysis.
Blei, D.M., Ng, A.Y. and Jordan, M.I., "Latent Dirichlet Allocation," The Journal of Machine Learning Research, 2003.
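For reference, the generative process behind LDA is shown below in the smoothed form, which puts a Dirichlet prior on each topic's word distribution. The notation is the standard one from the papers above, not something defined in this report:

- K is the number of topics (13 in this project).
- α and β are Dirichlet hyperparameters.
- N_d is the number of words in document d.

In Labeled LDA, θ_d is additionally restricted to the topics in document d's label set.

```latex
\begin{aligned}
\phi_k &\sim \mathrm{Dirichlet}(\beta), && k = 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), && \text{for each document } d \\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d), && n = 1,\dots,N_d \\
w_{d,n} &\sim \mathrm{Multinomial}(\phi_{z_{d,n}}), && n = 1,\dots,N_d
\end{aligned}
```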
Label-based LDA
Steps:
1. Build the LDA model
2. Train the model instance on the training documents
3. Run LDA on all the data using the trained model instance
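The three steps above can be sketched as a small Labeled LDA sampler. This assumes "Label-based LDA" means Labeled LDA: there is one topic per label, and each word of a training document may only be assigned to a topic in that document's label set.

The sketch makes several assumptions:

- The class and method names are hypothetical.
- The sampler is plain collapsed Gibbs sampling.
- Documents are already converted to integer word ids.

It illustrates the technique only and is not the Tweetool code.

```java
import java.util.Random;

/**
 * Hypothetical Labeled LDA sketch using collapsed Gibbs sampling (not the Tweetool code).
 * Topic k corresponds to label k; during training a token may only take a topic
 * from its document's label set.
 */
class LabeledLdaSketch {
    final int numTopics, vocabSize;
    final double alpha, beta;
    final int[][] topicWord;   // topicWord[k][w]: tokens of word w assigned to topic k
    final int[] topicTotal;    // topicTotal[k]: all tokens assigned to topic k
    private final Random rng;

    /** Step 1: build the (empty) model. */
    LabeledLdaSketch(int numTopics, int vocabSize, double alpha, double beta, long seed) {
        this.numTopics = numTopics;
        this.vocabSize = vocabSize;
        this.alpha = alpha;
        this.beta = beta;
        this.topicWord = new int[numTopics][vocabSize];
        this.topicTotal = new int[numTopics];
        this.rng = new Random(seed);
    }

    /** Step 2: train on documents (word ids) and their label sets (topic ids). */
    void train(int[][] docs, int[][] labels, int iterations) {
        int[][] z = new int[docs.length][];
        int[][] docTopic = new int[docs.length][numTopics];
        for (int d = 0; d < docs.length; d++) {  // random initial topics, drawn from the labels only
            z[d] = new int[docs[d].length];
            for (int n = 0; n < docs[d].length; n++) {
                int k = labels[d][rng.nextInt(labels[d].length)];
                z[d][n] = k;
                docTopic[d][k]++; topicWord[k][docs[d][n]]++; topicTotal[k]++;
            }
        }
        for (int it = 0; it < iterations; it++) {
            for (int d = 0; d < docs.length; d++) {
                for (int n = 0; n < docs[d].length; n++) {
                    int w = docs[d][n], old = z[d][n];
                    docTopic[d][old]--; topicWord[old][w]--; topicTotal[old]--;  // remove token
                    int k = sample(w, docTopic[d], labels[d]);
                    z[d][n] = k;
                    docTopic[d][k]++; topicWord[k][w]++; topicTotal[k]++;        // add it back
                }
            }
        }
    }

    /** Step 3: infer a topic distribution for a new document, keeping trained counts fixed. */
    double[] infer(int[] doc, int iterations) {
        int[] allTopics = new int[numTopics];
        for (int k = 0; k < numTopics; k++) allTopics[k] = k;
        int[] z = new int[doc.length];
        int[] docTopic = new int[numTopics];
        for (int n = 0; n < doc.length; n++) { z[n] = rng.nextInt(numTopics); docTopic[z[n]]++; }
        for (int it = 0; it < iterations; it++) {
            for (int n = 0; n < doc.length; n++) {
                docTopic[z[n]]--;
                z[n] = sample(doc[n], docTopic, allTopics);
                docTopic[z[n]]++;
            }
        }
        double[] theta = new double[numTopics];
        for (int k = 0; k < numTopics; k++) {
            theta[k] = (docTopic[k] + alpha) / (doc.length + numTopics * alpha);
        }
        return theta;
    }

    /** Picks k from allowed with p(k) proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V*beta). */
    private int sample(int w, int[] docTopic, int[] allowed) {
        double[] p = new double[allowed.length];
        double sum = 0;
        for (int i = 0; i < allowed.length; i++) {
            int k = allowed[i];
            p[i] = (docTopic[k] + alpha) * (topicWord[k][w] + beta) / (topicTotal[k] + vocabSize * beta);
            sum += p[i];
        }
        double u = rng.nextDouble() * sum;
        for (int i = 0; i < allowed.length; i++) {
            u -= p[i];
            if (u <= 0) return allowed[i];
        }
        return allowed[allowed.length - 1];  // guard against floating-point rounding
    }
}
```

In this project's setting, numTopics would presumably be 13, one per topic category listed later, and the labeled documents would come from the training file.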
Problems:
1. Punctuation marks, e.g. “”,.={}() …
2. Frequent words, e.g. I, you, …
3. Other noise
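Below is a minimal preprocessing sketch for these noise sources. It assumes tweets are cleaned before being fed to the model.

The class name and the short stop-word list are illustrative assumptions. "rt" is included only because retweet markers are frequent in tweets. Removing URLs and one-character tokens is one guess at what "other noise" covers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical tweet cleaner: strips links and punctuation marks, then drops stop words. */
class TweetPreprocessor {
    // Tiny illustrative stop-word list; a real system would use a much larger one.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "i", "you", "he", "she", "it", "we", "they", "a", "an", "the",
            "and", "or", "is", "are", "to", "of", "in", "on", "for", "rt"));

    static List<String> tokenize(String tweet) {
        String text = tweet.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ")      // drop links
                .replaceAll("[^\\p{L}\\p{N}]+", " ");  // punctuation, symbols, '#', '@' -> space
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            // skip empty strings, single characters and stop words
            if (t.length() < 2 || STOP_WORDS.contains(t)) continue;
            tokens.add(t);
        }
        return tokens;
    }
}
```

For example, `tokenize("I love the new iPhone!!! {RT} http://t.co/abc")` yields `[love, new, iphone]`.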
Result Generation
1. By angle
2. By distance
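The two scores can be sketched as follows, assuming "by angle" means cosine similarity and "by distance" means Euclidean distance between two users' topic vectors. This is an interpretation, since the exact formulas are not given here.

The class and method names are hypothetical. `rankByAngle` shows one way the "re-rank all users" idea could use the angle score.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers comparing users' topic vectors. */
class TopicSimilarity {
    /** "By angle": cosine of the angle between u and v; 1 means same direction. */
    static double cosine(double[] u, double[] v) {
        checkLengths(u, v);
        double dot = 0, nu = 0, nv = 0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            nu += u[i] * u[i];
            nv += v[i] * v[i];
        }
        if (nu == 0 || nv == 0) return 0;  // angle undefined for a zero vector; treat as unrelated
        return dot / (Math.sqrt(nu) * Math.sqrt(nv));
    }

    /** "By distance": Euclidean distance between u and v; 0 means identical. */
    static double euclidean(double[] u, double[] v) {
        checkLengths(u, v);
        double sum = 0;
        for (int i = 0; i < u.length; i++) {
            double d = u[i] - v[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Orders candidate indices by descending cosine similarity to the target user. */
    static List<Integer> rankByAngle(double[] target, double[][] candidates) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) order.add(i);
        order.sort((a, b) -> Double.compare(cosine(target, candidates[b]), cosine(target, candidates[a])));
        return order;
    }

    private static void checkLengths(double[] u, double[] v) {
        if (u.length != v.length) throw new IllegalArgumentException("vector lengths differ");
    }
}
```

The two measures answer slightly different questions:

- Cosine ignores vector length, so it compares the mix of topics.
- Euclidean distance also reacts to differences in magnitude.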
13-Dimension Topics
1. Art & Design
2. Book
3. Business
4. Charity
5. Entertainment
6. Family
7. Fashion
8. Food & Drink
9. Health
10. Music
11. News
12. Science & Technology
13. Sports
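One way to map the 13 topics onto vector positions is sketched below. It assumes a hypothetical convention in which each topic's list position (minus one) is its index in a user's topic vector. The enum and the `dominant` helper are illustrative names only.

```java
/** The 13 topic dimensions; ordinal() is assumed to be the index in a topic vector. */
enum Topic {
    ART_AND_DESIGN, BOOK, BUSINESS, CHARITY, ENTERTAINMENT, FAMILY, FASHION,
    FOOD_AND_DRINK, HEALTH, MUSIC, NEWS, SCIENCE_AND_TECHNOLOGY, SPORTS;

    /** Returns the topic with the largest weight in a 13-dimension vector (first one on ties). */
    static Topic dominant(double[] vector) {
        Topic[] all = values();
        if (vector.length != all.length) {
            throw new IllegalArgumentException("expected " + all.length + " dimensions");
        }
        int best = 0;
        for (int i = 1; i < vector.length; i++) {
            if (vector[i] > vector[best]) best = i;
        }
        return all[best];
    }
}
```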
Languages & Tools
• Web UI: HTML + AJAX (unfinished) + CSS (unfinished) + Twitter REST API
• Android UI: Java, Android 2.1 (unfinished)
• Server Side: Java 1.6, Servlet 2.0, Spring 3.0, Hibernate 3.3
• Twitter API: Twitter4j 2.2.1 (300 requests per hour)
• Server: Tomcat 7.0.8
• Database: MySQL 5.5
• Data Package: JSON
• Development Platform: Eclipse 3.4
• Total lines of code: 2000(+) + 2421 + 462 ≈ 5000(+)
• Subversion: http://tweetool-yilei.googlecode.com/svn/trunk/tweetool-yilei-read-only
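The Twitter API budget listed above (300 requests per hour) suggests throttling the crawler on the client side. Below is a minimal sliding-window limiter sketch. The class name and methods are hypothetical and are not part of Twitter4j. The current time is passed in explicitly so the logic is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window limiter: at most maxRequests calls within any
 * windowMillis span, e.g. new SlidingWindowRateLimiter(300, 3_600_000L).
 */
class SlidingWindowRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> sent = new ArrayDeque<>();  // timestamps of recent requests, oldest first

    SlidingWindowRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Permits and records a request at nowMillis if the window has room; otherwise returns false. */
    synchronized boolean tryAcquire(long nowMillis) {
        evictOld(nowMillis);
        if (sent.size() < maxRequests) {
            sent.addLast(nowMillis);
            return true;
        }
        return false;
    }

    /** Milliseconds until a request would be permitted (0 if one is permitted now). */
    synchronized long millisUntilAvailable(long nowMillis) {
        evictOld(nowMillis);
        if (sent.size() < maxRequests) return 0;
        return sent.peekFirst() + windowMillis - nowMillis;
    }

    /** Forgets requests that have left the window. */
    private void evictOld(long nowMillis) {
        while (!sent.isEmpty() && nowMillis - sent.peekFirst() >= windowMillis) sent.pollFirst();
    }
}
```

Before each API call, a crawler thread could call `tryAcquire(System.currentTimeMillis())`. If it returns false, the thread would sleep for `millisUntilAvailable(...)` and then try again.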
Architecture
Architecture diagram: DB, Twitterfetch, LLDA, Tweetool, Hibernate DAO
Work Flow
Work flow diagrams: Servlets, Mobile Device, HTML, ApplicationContext
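The layering in the diagrams (fetch tweets, run the topic model, persist through a DAO, serve results through servlets) can be sketched with plain Java interfaces. All type names here are hypothetical stand-ins for the diagram boxes:

- An in-memory map replaces the Hibernate DAO so the sketch runs without a database.
- A dot product stands in for whichever similarity the real system uses.
- In the real stack, the Spring ApplicationContext would presumably do the constructor wiring shown here by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interfaces mirroring the diagram boxes; names are illustrative.
interface TweetFetcher { List<String> fetchTweets(String user); }    // "Twitterfetch" role
interface TopicModel { double[] topicVector(List<String> tweets); }   // LLDA role
interface UserTopicDao {                                              // Hibernate DAO role
    void save(String user, double[] topicVector);
    Map<String, double[]> findAll();
}

/** In-memory stand-in for the DAO so the sketch runs without MySQL. */
class InMemoryUserTopicDao implements UserTopicDao {
    private final Map<String, double[]> store = new LinkedHashMap<>();
    public void save(String user, double[] v) { store.put(user, v.clone()); }
    public Map<String, double[]> findAll() { return Collections.unmodifiableMap(store); }
}

/** Service layer a servlet could call; dependencies are injected through the constructor. */
class RecommendationService {
    private final TweetFetcher fetcher;
    private final TopicModel model;
    private final UserTopicDao dao;

    RecommendationService(TweetFetcher fetcher, TopicModel model, UserTopicDao dao) {
        this.fetcher = fetcher;
        this.model = model;
        this.dao = dao;
    }

    /** One pass of the work flow for a user: fetch tweets, infer topics, persist. */
    void refreshUser(String user) {
        dao.save(user, model.topicVector(fetcher.fetchTweets(user)));
    }

    /** Other stored users, highest dot-product topic overlap first. */
    List<String> recommendFor(String user) {
        Map<String, double[]> all = dao.findAll();
        double[] me = all.get(user);
        if (me == null) return Collections.emptyList();
        List<String> others = new ArrayList<>(all.keySet());
        others.remove(user);
        others.sort((a, b) -> Double.compare(dot(me, all.get(b)), dot(me, all.get(a))));
        return others;
    }

    private static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }
}
```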
Distributed Crawler & Computing
Problems (endless T_T)
1. High noise in the topic model
• Few words, odd marks, abbreviations
2. Unfamiliar with the Twitter API; a lot of bugs
3. Transaction problems
4. Ugly UI
5. Poor performance
6. Not enough time; many functions are unfinished
7. The Tweetool system should be rebuilt!!!
Environment: 7,000+ users, 220,000+ tweets
Future Work
1. Try to finish it
2. Debug
3. Build a better training file
4. Add a feedback function
5. Better topic classification
Web UI (Design Version)
Android UI
Mock-ups: Main Menu (title and four function buttons); News Menu (title and a list of news items)