Date post: | 27-Jan-2015 |
Category: |
Education |
Upload: | mohammad-ali-abbasi |
Data Mining and Machine Learning in a Nutshell
Arizona State University, Data Mining and Machine Learning Lab
Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement
DATA MINING AND MACHINE LEARNING IN A NUTSHELL
LEARNING TO RECOGNIZE RELIABLE USERS AND CONTENT IN SOCIAL MEDIA WITH COUPLED MUTUAL REINFORCEMENT
Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/
SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
ARIZONA STATE UNIVERSITY
http://dmml.asu.edu/
About the paper
• Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement
– Jiang Bian, Georgia Institute of Technology
– Yandong Liu, Emory University
– Ding Zhou, Facebook Inc.
– Eugene Agichtein, Emory University
– Hongyuan Zha, Georgia Institute of Technology
• WWW 2009, April 20–24, 2009, Madrid, Spain.
Community Question Answering (CQA)
• A popular kind of forum where users pose questions for other users to answer
• Users can ask questions in natural language
• Is comparable to regular web search
Sample: Yahoo! Answers
• Introduction
What is the problem?
• Retrieve answers from a social media archive that holds a large amount of information
– The quality, accuracy, and comprehensiveness of the submitted questions and answers vary widely
– A large fraction of the content is not useful for answering queries
– Current approaches require large amounts of manually labeled data
CQA environment
• Users
• Questions
• Answers
The goal
• Identify
– High-quality answers
– High-quality questions
– High-reputation users
• Simultaneously
• With minimal manual labeling
The contribution of this paper
• Develops a semi-supervised coupled mutual reinforcement framework that calculates content quality and user reputation simultaneously and needs relatively few labeled examples to initialize the training process
• Is more effective at finding high-quality answers, questions, and users
• Improves the accuracy of search over CQA archives
Current approaches
• Rely on user reputation,
• OR require a large amount of supervision,
• OR focus on the network properties of the CQA site
• without considering the actual content of the information exchanged
How to rank?
• Current approaches:
– Content quality OR
– User reputation
• This paper:
– Content quality AND
– User reputation
Definitions
• Question Quality
– a question's effectiveness at attracting high-quality answers
• Answer Quality
– the responsiveness, accuracy, and comprehensiveness of the answer to a question
• Question Reputation
– the expected quality of the questions posted by a user
• Answer Reputation
– the expected quality of the answers posted by a user
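As a rough illustration of the two reputation definitions, a user's reputation can be estimated as the average quality of that user's posts. This is only a sketch: using the sample mean as the estimator, the `answer_reputation` function, and the quality scores below are all assumptions made for the example, not the paper's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def answer_reputation(answers: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Estimate each user's answer reputation as the mean quality of their answers.

    `answers` is a sequence of (user, quality) pairs with quality in [0, 1].
    """
    qualities = defaultdict(list)
    for user, quality in answers:
        qualities[user].append(quality)
    return {user: sum(scores) / len(scores) for user, scores in qualities.items()}


# Hypothetical quality scores, for illustration only.
sample_answers = [("ann", 0.5), ("ann", 1.0), ("bob", 0.25)]
reputation = answer_reputation(sample_answers)
```

The same averaging applies to question reputation when the pairs hold question quality scores instead.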
Model the problem
• Solution
Mutual reinforcement Principle
• Solution
Feature Space: X(Q), X(A), X(U)
• Solution
Learning quality and reputation (Coupled Mutual Reinforcement)
• P(x): probability of being “good”
• Model of P(x)
• B is the coefficient vector of the linear model and can be found by maximizing:
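The formulas on this slide did not survive extraction. The sketch below shows one common way to realize the description, under the assumption that P(x) is a logistic model (log-odds linear in the features) and that the coefficients B (written `beta` in the code) are found by maximizing the conditional log-likelihood. The gradient-ascent fitting routine, the function names, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def predict_good(beta: Sequence[float], x: Sequence[float]) -> float:
    """P(x): probability of being "good" under an assumed logistic model."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta, rows, labels) -> float:
    """Conditional log-likelihood of 0/1 labels given the features."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(b * xi for b, xi in zip(beta, x))
        total += y * z - math.log(1.0 + math.exp(z))
    return total


def fit(rows, labels, learning_rate: float = 0.5, steps: int = 200) -> List[float]:
    """Maximize the log-likelihood with plain gradient ascent."""
    beta = [0.0] * len(rows[0])
    for _ in range(steps):
        gradient = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            error = y - predict_good(beta, x)
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        beta = [b + learning_rate * g / len(rows) for b, g in zip(beta, gradient)]
    return beta


# Toy data: first feature is a constant bias term, second is a made-up feature.
rows = [[1.0, 0.2], [1.0, 0.9], [1.0, 0.4], [1.0, 0.8]]
labels = [0, 1, 0, 1]
beta = fit(rows, labels)
```

With these toy rows, items with a larger second feature receive a higher predicted probability of being "good" after fitting.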
Non-independent equations
• Conditional log-likelihood
• Objective function
CQA-MR Algorithm
• Solution
Experimental Setup: Data Collection
• Data collected from Yahoo! Answers through its API
• Use the TREC QA benchmark archive to crawl QA archives (http://trec.nist.gov/data.html)
• Get all available answers for each question
– 107,293 users
– 27,354 questions
– 224,617 answers
Evaluation Metrics
• Mean Reciprocal Rank (MRR)
– the reciprocal of the rank at which the first relevant answer was returned, or 0 if none of the top N results contained a relevant answer, averaged over all queries
• Precision at K
– for a given query, P(K) reports the fraction of answers ranked in the top K results that are labeled as relevant
• Mean Average Precision (MAP)
– the mean of the precision at K values calculated after each relevant answer was retrieved, averaged over all queries
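The three metrics can be sketched in a few lines. The function names and the relevance lists below are made up for illustration; each list holds, in ranked order, whether a returned answer was labeled relevant.

```python
from typing import List, Sequence


def reciprocal_rank(relevance: Sequence[bool], n: int) -> float:
    """1/rank of the first relevant answer in the top n results, or 0 if none."""
    for rank, is_relevant in enumerate(relevance[:n], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top k results that are labeled relevant."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of the precision values taken at each rank holding a relevant answer."""
    hits = 0
    precisions: List[float] = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Two hypothetical queries with their ranked relevance labels.
queries = [
    [False, True, False, True],
    [True, False, False, False],
]
mrr = mean([reciprocal_rank(q, n=4) for q in queries])
p_at_2 = mean([precision_at_k(q, k=2) for q in queries])
map_score = mean([average_precision(q) for q in queries])
```

Note that `average_precision` divides by the number of relevant answers actually retrieved, which follows the wording on the slide; some evaluations divide by the total number of relevant answers instead.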
User reputation methods
• Baseline
– users are ranked by "indegree" (number of answers posted)
• HITS
– users are ranked by their authority scores
• CQA-Supervised
– a classifier trained over the features separates users into those with "high" and "low" reputation
• CQA-MR
– user reputation is predicted by the mutual reinforcement algorithm
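The first two methods can be sketched on a small graph. The example assumes an edge runs from the asker to the user who answered, so that a user's indegree equals the number of answers posted; the edge direction, the user names, and the fixed iteration count are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (asker, answerer)


def indegree(edges: List[Edge]) -> Counter:
    """Baseline: count the answers each user posted."""
    return Counter(answerer for _, answerer in edges)


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges: List[Edge], iterations: int = 50):
    """Plain HITS power iteration; returns (hub, authority) score dictionaries."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of users whose questions were answered.
        authority = {node: 0.0 for node in nodes}
        for asker, answerer in edges:
            authority[answerer] += hub[asker]
        authority = normalize(authority)
        # Hub: sum of the authority scores of the users who answered.
        new_hub = {node: 0.0 for node in nodes}
        for asker, answerer in edges:
            new_hub[asker] += authority[answerer]
        hub = normalize(new_hub)
    return hub, authority


# Hypothetical interactions: bob answers three askers, dan answers one.
edges = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"), ("ann", "dan")]
answer_counts = indegree(edges)
hub_scores, authority_scores = hits(edges)
top_by_authority = max(authority_scores, key=authority_scores.get)
```

Both rankings put the most active answerer first on this toy graph; they can diverge on larger graphs because HITS also weighs who asked the questions.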
CQA Retrieval methods
• Baseline
– score computed as the difference between up votes and down votes
• GBrank
– does not include answer quality, question quality, or user reputation
• GBrank-HITS
– GBrank extended with user reputation calculated by the HITS algorithm
• GBrank-Supervised
– GBrank extended with the quality and reputation obtained by supervised learning
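The vote-based baseline is simple enough to sketch directly. The `Answer` class and the vote counts are hypothetical; only the scoring rule (up votes minus down votes) comes from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    text: str
    up_votes: int
    down_votes: int


def baseline_score(answer: Answer) -> int:
    """Baseline score: up votes minus down votes."""
    return answer.up_votes - answer.down_votes


def rank_by_votes(answers: List[Answer]) -> List[Answer]:
    """Order answers from highest to lowest baseline score."""
    return sorted(answers, key=baseline_score, reverse=True)


# Made-up answers for illustration.
candidates = [
    Answer("short reply", up_votes=2, down_votes=3),
    Answer("detailed reply", up_votes=7, down_votes=1),
    Answer("average reply", up_votes=4, down_votes=2),
]
ranked = rank_by_votes(candidates)
```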
Precision at K for the top contributors
• Experiments
Precision at K
• Experiments
Accuracy
• Experiments
Training Labels
• Experiments
Training Labels
• Experiments
Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining and Machine Learning Lab, Arizona State University. His research interests include Data Mining, Machine Learning, Social Computing, and Social Media Behavior Analysis.
http://www.public.asu.edu/~mabbasi2/