
Modeling User Interactions in Social Media

Date post: 23-Feb-2016
Transcript

Slide 1

Modeling User Interactions in Social Media
Eugene Agichtein
Emory University

Outline
- User-generated content
- Community Question Answering
- Contributor authority
- Content quality
- Asker satisfaction
- Open problems


Trends in search and social media
Search in the East: heavily influenced by social media (Naver, Baidu Knows, TaskCn, ...)

Search in the West: social media mostly indexed/integrated in search repositories

Two opposite trends in social media search:
- Moving towards point relevance (answers, knowledge search)
- Moving towards browsing experience, subscription/push model

How to integrate active engagement and contribution with passive viewing of content?

Social Media Today
Published: 4Gb/day
Social Media: 10Gb/day
Page views: 180-200Gb/day

Technorati+Blogpulse: ~120M blogs, ~2M posts/day

Twitter (since 11/07): ~2M users, ~3M msgs/day

Facebook/Myspace: 200-300M users, average 19 min/day

Yahoo Answers: 90M users, ~20M questions, ~400M answers
[From Andrew Tomkins/Yahoo!, SSM2008 Keynote]

People Helping People
- Naver: popularity reportedly exceeds web search
- Yahoo! Answers: some users answer thousands of questions daily
  - And get a t-shirt
- Open, quirky, information shared, not sold
- Unlike Wikipedia:
  - Chatty threads: opinions, support, validation
  - No core group of moderators to enforce quality

Where is the nearest car rental to Carnegie Mellon University?

Successful Search
Give up on magic. Look up the CMU address/zip code.
Google Maps query: car rental near:5000 Forbes Avenue Pittsburgh, PA 15213

Total time: 7-10 minutes, active work

Someone must know this
+0 minutes: 11pm
+1 minute
+36 minutes
+7 hours: perfect answer

Why would one wait hours?
- Rational thinking: effective use of time
- Unique information need
- Subjective/normative question
- Complex
- Human contact/community
- Multiple viewpoints

http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

Challenges in ____ing Social Media
- Estimating contributor expertise
- Estimating content quality
- Inferring user intent
- Predicting satisfaction: general, personalized
- Matching askers with answerers
- Searching archives
- Detecting spam

Work done in collaboration with:

Qi Guo

Yandong Liu

Abulimiti Aji

Thanks:

Prof. Hongyuan Zha
Jiang Bian
Yahoo! Research: ChaTo Castillo, Gilad Mishne, Aris Gionis, Debora Donato, Ravi Kumar
Pawel Jurczyk

Related Work
- Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure
- Kumar et al.: info diffusion in blogspace
- Harper et al., CHI 2008: answer quality
- Leskovec et al.: cascades, preferential attachment models
- Glance & Hurst: blogging
- Kraut et al.: community participation and retention
- SSM 2008 Workshop (Searching Social Media)
- Elsas et al., blog search, ICWSM 2008

Estimating Contributor Authority
[Figure: Users 1 to 6 linked to Questions 1 to 3 and Answers 1 to 6; a second panel shows Users 1 to 6 only]

P. Jurczyk and E. Agichtein, Discovering Authorities in Question Answer Communities Using Link Analysis (poster), CIKM 2007

[Figure labels: Hub (asker); Authority (answerer)]

Finding Authorities: Results

Qualitative Observations
HITS effective

HITS ineffective
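The authority slides apply HITS-style link analysis with askers as hubs and answerers as authorities. A minimal Python sketch of that idea follows, assuming the community is given as a list of hypothetical `(asker, answerer)` pairs, one per answer given; the function name `hits`, the fixed iteration count, and the L2 normalization are illustrative choices, not details taken from the CIKM 2007 poster.

```python
import math
from collections import defaultdict


def hits(edges, iterations=50):
    """HITS over a directed asker -> answerer graph.

    edges: iterable of (asker, answerer) pairs, one per answer given.
    Returns (hub, authority) dicts, each L2-normalized.
    """
    edges = list(edges)
    nodes = {user for edge in edges for user in edge}
    askers_of = defaultdict(list)     # answerer -> askers they answered
    answerers_of = defaultdict(list)  # asker -> users who answered them
    for asker, answerer in edges:
        answerers_of[asker].append(answerer)
        askers_of[answerer].append(asker)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the askers this user answered.
        auth = {n: sum(hub[a] for a in askers_of[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the users who answered this asker.
        hub = {n: sum(auth[b] for b in answerers_of[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Toy graph (hypothetical): u4 answers all three askers.
edges = [("u1", "u4"), ("u2", "u4"), ("u3", "u4"), ("u2", "u5"), ("u3", "u6")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # u4
```

On this toy graph, `u2` and `u3` also end up with higher hub scores than `u1`, because their questions drew answers from more (and better-scoring) answerers. Plain HITS also rewards anyone who answers many prolific askers, which is one way a link-only score can go wrong.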

Trolls

Estimating Content Quality
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, Finding High Quality Content in Social Media, WSDM 2008

[Figure slides; one panel is labeled "Community".]

...from all subsets, as follows:
- Average number of "stars" to questions by the same asker. [UQV]
- The punctuation density in the question's subject.
- The question's category (assigned by the asker).
- "Normalized Clickthrough": the number of clicks on the question thread, normalized by the average number of clicks for all questions in its category.
- Average number of "thumbs up" received by answers written by the asker of the current question. [UAV]
- Number of words per sentence.
- Average number of answers with references (URLs) given by the asker of the current question. [UA]
- Fraction of questions asked by the asker in which he opens the question's answers to voting (instead of picking the best answer by hand). [UQ]
- Average length of the questions by the asker. [UQ]
- The number of "best answers" authored by the user. [UAV]
- The number of days the user was active in the system. [U]
- "Thumbs up" received by the answers written by the asker of the current question, minus "thumbs down", divided by the total number of "thumbs" received. [UAV]
- "Clicks over Views": the number of clicks on a question thread divided by the number of times the question thread was retrieved as a search result (see [2]).
- The KL-divergence between the question's language model and a model estimated from a collection of questions answered by the Yahoo editorial team (available at http://ask.yahoo.com).
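Two of the question features above, punctuation density and the KL-divergence against an editorially answered question collection, are straightforward to sketch. This is a minimal Python illustration under stated assumptions, not the paper's code: lowercase word tokenization, unigram language models with add-one smoothing over the union vocabulary, and punctuation density taken as the share of characters in the subject that are punctuation. The function names and the tiny reference collection are hypothetical.

```python
import math
import re
import string
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_model(tokens, vocab, alpha=1.0):
    # Add-alpha (Laplace) smoothed unigram model over a fixed vocabulary.
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def question_kl(question, reference_questions, alpha=1.0):
    """KL(question model || reference-collection model), in nats."""
    q_tokens = tokenize(question)
    ref_tokens = [t for text in reference_questions for t in tokenize(text)]
    vocab = set(q_tokens) | set(ref_tokens)
    p = unigram_model(q_tokens, vocab, alpha)
    q = unigram_model(ref_tokens, vocab, alpha)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)


def punctuation_density(subject):
    # Assumption: fraction of characters in the subject that are punctuation.
    if not subject:
        return 0.0
    return sum(ch in string.punctuation for ch in subject) / len(subject)


reference = [
    "What is the difference between chemotherapy and radiation treatments?",
    "How does radiation therapy work?",
]
print(question_kl("What is radiation therapy?", reference)
      < question_kl("omg lol!!! anyone???", reference))  # True
print(punctuation_density("omg lol!!! anyone???"))  # 0.3
```

A low divergence means the question reads like the editorially answered ones; smoothing keeps the ratio finite for words the reference collection never uses.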

- Answer length.
- The number of words in the answer with a corpus frequency larger than c.
- The number of "thumbs up" minus "thumbs down" received by the answerer, divided by the total number of "thumbs" s/he has received. [UAV]
- The entropy of the trigram character-level model of the answer.
- The fraction of answers of the answerer that have been picked as best answers (either by the askers of such questions, or by community voting). [UAV]
- The number of unique words in the answer.
- Average number of abuse reports received by the answerer over all his/her questions and answers. [U]
- Average number of abuse reports received by the answerer over his/her answers. [UAV]
- The non-stopword word overlap between the question and the answer.
- The Kincaid [21] score of the answer.
- The average number of answers received by the questions asked by the asker of this answer. [QUA]
- The ratio between the length of the question and the length of the answer.
- The number of "thumbs up" minus "thumbs down" received by the answerer. [UAV]
- The average number of "thumbs" received by the answers to other questions asked by the asker of this answer. [QUAV]

Rating Dynamics

Editorial Quality != Popularity != Usefulness
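Three of the text-based answer features listed above can be sketched in a few lines of Python. The sketch reads "entropy of the trigram character-level model" as the empirical entropy of the answer's character-trigram distribution, counts overlap as distinct shared non-stopwords, and measures length in whitespace tokens; all three readings, the function names, and the deliberately tiny `STOPWORDS` set are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "are", "as", "between", "do", "for", "her", "i",
             "is", "it", "its", "of", "so", "the", "to", "what", "while", "you"}


def char_trigram_entropy(text):
    """Empirical entropy (bits) of the text's character-trigram distribution."""
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    if not trigrams:
        return 0.0
    n = len(trigrams)
    return -sum((c / n) * math.log2(c / n) for c in Counter(trigrams).values())


def content_words(text):
    # Lowercased word tokens with stopwords removed.
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def nonstopword_overlap(question, answer):
    """Number of distinct non-stopword words shared by question and answer."""
    return len(content_words(question) & content_words(answer))


def length_ratio(question, answer):
    """Question length divided by answer length, in whitespace tokens."""
    return len(question.split()) / max(len(answer.split()), 1)


q = "What is the difference between chemotherapy and radiation?"
a = "Radiation targets a local area, while chemotherapy is systemic."
print(char_trigram_entropy("abcd"))  # 1.0
print(nonstopword_overlap(q, a))     # 2
```

Repetitive or spammy answers tend to reuse the same trigrams, which lowers the entropy; the overlap count is a crude signal that an answer actually addresses the question.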

Yahoo! Answers: Time to Fulfillment
[Chart: Time to close a question (hours) for sample question categories. Axis: time to close (hours). Categories: 1. 2006 FIFA World Cup, 2. Optical, 3. Poetry, 4. Football (American), 5. Scottish Football (Soccer), 6. Medicine, 7. Winter Sports, 8. Special Education, 9. General Health Care, 10. Outdoor Recreation]

Predicting Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.

Satisfied: the asker has closed the question AND selected the best answer AND rated the best answer >= 3 stars
Else: Unsatisfied
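The satisfaction label above is a simple conjunction, which a short Python predicate makes explicit. The record type `QuestionOutcome` and its field names are hypothetical; they only stand for the three facts the definition checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionOutcome:
    # Hypothetical record of what the asker did with a question thread.
    closed_by_asker: bool
    best_answer_selected: bool
    best_answer_stars: Optional[int]  # asker's rating of the best answer, if any


def is_satisfied(outcome: QuestionOutcome) -> bool:
    """Satisfied iff the asker closed the question, picked a best answer,
    and rated that answer at least 3 stars; every other case is unsatisfied."""
    return (
        outcome.closed_by_asker
        and outcome.best_answer_selected
        and outcome.best_answer_stars is not None
        and outcome.best_answer_stars >= 3
    )


print(is_satisfied(QuestionOutcome(True, True, 4)))   # True
print(is_satisfied(QuestionOutcome(True, True, 2)))   # False
```

Note that a question left open or closed without a rating falls into "unsatisfied", so the negative class mixes genuinely unhappy askers with askers who simply never came back.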

Yandong Liu

Jiang Bian

Y. Liu, J. Bian, and E. Agichtein, Predicting Information Seeker Satisfaction in Community Question Answering, in SIGIR 2008

Motivation
- Save time: don't bother to post
- Suggest a good forum for information need
- Notify user when satisfactory answer contributed
- From relevance to information need fulfillment
- Explicit ratings from asker & community

ASP: Asker Satisfaction Prediction
[Diagram: Question, Answer, Asker History, Answerer History, Category, Text, Wikipedia, and News inputs feed a Classifier whose output is "asker is satisfied" or "asker is not satisfied"]

Datasets
Crawled from Yahoo! Answers in early 2008 (Thanks, Yahoo!)
Questions: 216,170
Answers: 1,963,615
Askers: 158,515
Categories: 100
% Satisfied: 50.7%
Available at http://ir.mathcs.emory.edu/shared

Dataset Statistics
Category                  #Q     #A      #A per Q   Satisfied   Avg asker rating   Time to close by asker
2006 FIFA World Cup(TM)   1194   35659   29.86      55.4%       2.63               47 minutes
Mental Health             151    1159    7.68       70.9%       4.30               1 day and 13 hours
Mathematics               651    2329    3.58       44.5%       4.48               33 minutes
Diet & Fitness            450    2436    5.41       68.4%       4.30               1.5 days

Asker satisfaction varies by category
#Q, #A, Time to close -> Asker Satisfaction

Satisfaction Prediction: Human Judges
Truth: asker's rating
A random sample of 130 questions
Researchers: Agreement: 0.82, F1: 0.45

Amazon Mechanical Turk: five workers per question. Agreement: 0.9, F1: 0.61. Best when at least 4 out of 5 raters agree.

ASP vs. Humans (F1)
Classifier         With Text   Without Text   Selected Features
ASP_SVM            0.69        0.72           0.62
ASP_C4.5           0.75        0.76           0.77
ASP_RandomForest   0.70        0.74           0.68
ASP_Boosting       0.67        0.67           0.67
ASP_NB             0.61        0.65           0.58
Best Human Perf    0.61
Baseline (naive)   0.66

ASP is significantly more effective than humans.
Human F1 is lower than the naive baseline!

Features by Information Gain
0.14219  Q: Asker's previous rating
0.13965  Q: Average past rating by asker
0.10237  UH: Member since (interval)
0.04878  UH: Average # answers for past Q
0.04878  UH: Previous Q resolved for the asker
0.04381  CA: Average rating for the category
0.04306  UH: Total number of answers received
0.03274  CA: Average voter rating
0.03159  Q: Question posting time
0.02840  CA: Average # answers per Q

Offline vs. Online Prediction
- Offline prediction: all features (question, answer, asker & category). F1: 0.77
- Online prediction: NO answer features; only asker history and question features (stars, #comments, sum of votes). F1: 0.74

Feature Ablation
                              Precision   Recall   F1
Selected features             0.80        0.73     0.77
No question-answer features   0.76        0.74     0.75
No answerer features          0.76        0.75     0.75
No category features          0.75        0.76     0.75
No asker features             0.72        0.69     0.71
No question features          0.68        0.72     0.70

Asker & question features are most important.
Answer quality / answerer expertise / category characteristics may not be important: caring or supportive answers are often preferred.

Satisfaction: Varying by Asker Experience
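The "Features by Information Gain" ranking earlier in this section scores each feature by how much knowing its value reduces uncertainty about the satisfied/unsatisfied label. A minimal Python sketch of that measure follows; it assumes discrete feature values (continuous features such as "member since" would need binning first), and the slides do not say which tool or binning produced the listed scores.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(label) - sum over values v of P(v) * H(label | feature = v).

    Assumes a discrete feature; bin continuous features before calling.
    """
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# A feature that perfectly separates a balanced binary label gains 1 bit.
print(information_gain(["hi", "hi", "lo", "lo"], [1, 1, 0, 0]))  # 1.0
```

On a roughly balanced label the maximum possible gain is about 1 bit, so the top scores of about 0.14 on the slide indicate useful but far from decisive individual features.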

Group together questions from askers with the same number of previous questions
Accuracy of prediction increases dramatically
Reaching F1 of 0.9 for askers with >= 5 questions

Personalized Prediction of Asker Satisfaction with info
Same information != same usefulness for different users!

Personalized classifier achieves surprisingly good accuracy (even with just 1 previous question!)

Simple strategy of grouping users by number of previous questions is more effective than other methods for users with a moderate amount of history

For users with >= 20 questions, textual features are more significant

Some Personalized Models

Satisfaction Prediction When Grouping Users by Age

Self-Selection: First Experience Crucial

[Figures: days as member vs. rating; # previous questions vs. rating]

Summary
- Asker satisfaction is predictable
- Can achieve higher than human accuracy by exploiting interaction history
- User's experience is important
- General model: one-size-fits-all; 2000 questions are enough for training the model
- Personalized satisfaction prediction helps with sufficient data (>= 1 previous interaction; text patterns become observable with >= 20 previous interactions)

Problems
- Sparsity: most users post only a single question
- Cold start problem
- CF: individualize content, no (visible) rating history (cf. Digg, where ratings are public)
- Subjective information needs


Subjectivity in CQA
How can we exploit the structure of CQA to improve question classification?

Case Study: Question Subjectivity Prediction
Subjective: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
Objective: What is the difference between chemotherapy and radiation treatments?

B. Li, Y. Liu, and E. Agichtein, CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in EMNLP 2008

Dataset Statistics (~1000 questions)
http://ir.mathcs.emory.edu/shared/
[Chart: Objective vs. Subjective]

Key Observations
Analysis of real questions in CQA is challenging:
- Typically complex and subjective
- Can be ill-phrased and vague
- Not enough annotated data

Idea: Can we utilize the inherent structure of the CQA interactions, and use unlabeled CQA data to improve classification performance?

(Speaker note: Include example of vague/ill-phrased subjective question and best answer (selected by asker). Ideally, from our labeled dataset.)

Natural Approach: Co-Training
- Introduced in: Combining labeled and unlabeled data with co-training, Blum and Mitchell, 1998
- Two views of the data
  - E.g.: content and hyperlinks in web pages
  - Provide complementary information
- Iteratively construct additional labeled data

Questions and Answers: Two Views
Example:
Q: Has anyone got one of those home blood pressure monitors? and if so what make is it and do you think they are worth getting?
A: My mom has one as she is diabetic so its important for her to monitor it she finds it useful.
Answer cues: "My mom", "she finds"
- Answers usually match/fit the question
- Askers can usually identify matching answers by selecting the best answer

CoCQA: A Co-Training Framework over Questions and Answers
[Diagram components: Labeled Data; classifiers CQ and CA; Unlabeled Data (Q, A) receiving +/- labels; Validation (holdout training data); Classify; Stop]
(Speaker note: Include one more box on lower right corner: after "stop" lights up, show box "apply final classifier on test data".)

Results Summary (columns: features used)
Method       Question        Question + Best Answer
Supervised   0.717           0.695
GE           0.712 (-0.7%)   0.717 (+3.2%)
CoCQA        0.731 (+1.9%)   0.745 (+7.2%)

CoCQA for varying amount of labeled data

Summary
User-generated content:
- Growing
- Important: impact on mainstream media, scholarly publishing, ...
- Can provide insight into information seeking and social processes
- Training data for IR, machine learning, NLP, ...
- Need to re-think quality, impact, usefulness

Current work
- Intelligently route a question to "good answerers"
- Improve web search ranking by incorporating CQA data
- Cost models for CQA-based question processing vs. other methods
- Dynamics of user feedback
- Discourse analysis

Takeaways
People specify their information need fully when they know humans are on the other end

Next generation of search must be able to cope with complex, subjective, and personal information needs

To move beyond relevance, must be able to model user satisfaction

CQA generates rich data to allow us (and other researchers) to study user satisfaction, interactions, intent for real users:
- Estimating contributor expertise [CIKM 2007]
- Estimating content quality [WSDM 2008]
- Inferring asker intent [EMNLP 2008]
- Predicting satisfaction [SIGIR 2008, ACL 2008]
- Matching askers with answerers
- Searching CQA archives [WWW 2008]
- Coping with spam [AIRWeb 2008]

Thank you!
http://www.mathcs.emory.edu/~eugene

Backup Slides

Question-Answer Features

Q: length, posting time
QA: length, KL divergence
Q: Votes
Q: Terms

User Features

U: Member since
U: Total points
U: #Questions
U: #Answers

Category Features
CA: Average time to close a question
CA: Average # answers per question
CA: Average asker rating
CA: Average voter rating
CA: Average # questions per hour
CA: Average # answers per hour

Category         #Q    #A    #A per Q   Satisfied   Avg asker rating   Time to close by asker
General Health   134   737   5.46       70.4%       4.49               1 day and 13 hours

Prediction Methods
- Heuristic: # answers
- Baseline: guess the majority class (satisfied)
- ASP (our system):
  - ASP_SVM: our system with the SVM classifier
  - ASP_C4.5: with the C4.5 classifier
  - ASP_RandomForest: with the RandomForest classifier
  - ASP_Boosting: with the AdaBoost algorithm combining weak learners
  - ASP_NaiveBayes: with the Naive Bayes classifier

Satisfaction Prediction: Human Performance (Cont'd): Amazon Mechanical Turk
Methodology
- Used the same 130 questions
- For each question, listed the best answer, as well as four other answers ordered by votes
- Five independent raters for each question
- Agreement: 0.9, F1: 0.61
- Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be satisfied (otherwise, labeled as unsatisfied)

Some Results
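The naive majority-class baseline scores an F1 of 0.66 because satisfied and unsatisfied questions are roughly balanced. Assuming F1 is computed for the "satisfied" class (the slides do not say) and using the dataset-wide 50.7% satisfied rate rather than the evaluation sample's own rate, a quick calculation lands close to the reported figure; the helper name below is hypothetical.

```python
def majority_baseline_f1(positive_rate: float) -> float:
    """F1 for the positive class when every question is predicted positive.

    Predicting "satisfied" for everything gives recall 1.0 and precision
    equal to the fraction of questions that really are satisfied.
    """
    precision, recall = positive_rate, 1.0
    return 2 * precision * recall / (precision + recall)


# 50.7% of the crawled questions were labeled satisfied (dataset slide).
print(round(majority_baseline_f1(0.507), 3))  # 0.673
```

This is why a human F1 of 0.61 can fall below a classifier that never says "unsatisfied": on a near-balanced label, constant positive prediction already earns an F1 near 2/3.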

Details of CoCQA implementation
- Base classifier: LibSVM
- Term frequency as term weight (also tried binary, TF*IDF)
- Select top K examples with highest confidence (margin value in SVM)
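The implementation details above (two views built from question text and answer text, with each view's most confident predictions on unlabeled data added to the training set) can be sketched as a short co-training loop. This is a minimal illustration, not the CoCQA code: it swaps LibSVM and its margin-based confidence for a tiny multinomial Naive Bayes whose posterior probability serves as the confidence, and it runs a fixed number of rounds instead of stopping on a validation set. All names (`NaiveBayes`, `co_train`) and the toy data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (stand-in for LibSVM)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # ignore tokens never seen in training
                    s += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = s
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def co_train(labeled, unlabeled, k=1, rounds=5):
    """labeled: [(question_tokens, answer_tokens, label)];
    unlabeled: [(question_tokens, answer_tokens)].
    Returns the question-view and answer-view classifiers and the grown pool."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        labels = [y for _, _, y in labeled]
        clf_q = NaiveBayes().fit([q for q, _, _ in labeled], labels)
        clf_a = NaiveBayes().fit([a for _, a, _ in labeled], labels)
        for clf, view in ((clf_q, 0), (clf_a, 1)):
            if not pool:
                break
            scored = []
            for i, example in enumerate(pool):
                probs = clf.predict_proba(example[view])
                label = max(probs, key=probs.get)
                scored.append((probs[label], i, label))
            scored.sort(reverse=True)  # most confident first
            chosen = scored[:k]
            for _, i, label in chosen:
                q, a = pool[i]
                labeled.append((q, a, label))
            chosen_idx = {i for _, i, _ in chosen}
            pool = [ex for i, ex in enumerate(pool) if i not in chosen_idx]
    labels = [y for _, _, y in labeled]
    return (NaiveBayes().fit([q for q, _, _ in labeled], labels),
            NaiveBayes().fit([a for _, a, _ in labeled], labels),
            labeled)


labeled = [
    ("what is the boiling point".split(), "it is 100 degrees".split(), "obj"),
    ("do you think it is worth it".split(), "i love mine totally worth it".split(), "subj"),
]
unlabeled = [
    ("what is the melting point".split(), "it is 0 degrees".split()),
    ("do you love your phone".split(), "i love mine".split()),
]
clf_q, clf_a, grown = co_train(labeled, unlabeled, k=1, rounds=3)
print([label for _, _, label in grown])  # ['obj', 'subj', 'obj', 'subj']
```

In each round the question-view classifier and then the answer-view classifier move their k most confident unlabeled examples into the shared labeled pool, so what one view is sure about becomes training data for the other.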

Feature Set
- Character 3-grams: has, any, nyo, yon, one
- Words: Has, anyone, got, mom, she, finds
- Words with character 3-grams
- Word n-grams (n
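The character 3-gram example is consistent with 3-grams taken inside each word: "Has anyone" yields has, any, nyo, yon, one. A minimal sketch of the "words with character 3-grams" feature set with term-frequency weights follows; lowercasing everything, skipping words shorter than three characters for the 3-gram part, and the `w:`/`c:` key prefixes are assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def char_trigrams(text):
    """Character 3-grams taken within each lowercased word.

    Words shorter than three characters contribute no 3-grams.
    """
    grams = []
    for word in re.findall(r"\w+", text.lower()):
        grams.extend(word[i:i + 3] for i in range(len(word) - 2))
    return grams


def tf_features(text):
    """Term-frequency weights over words plus character 3-grams."""
    words = re.findall(r"\w+", text.lower())
    features = Counter(f"w:{w}" for w in words)
    features.update(f"c:{g}" for g in char_trigrams(text))
    return features


print(char_trigrams("Has anyone"))  # ['has', 'any', 'nyo', 'yon', 'one']
```

Character 3-grams let a classifier match misspelled or inflected forms ("monitor", "monitors", "moniter") that share most of their 3-grams, which helps on informal CQA text.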

