Posted on 20-May-2020
Identifying Relevant Social Media Content: Leveraging Information Diversity and User Cognition
Munmun De Choudhury¹, Scott Counts² & Mary Czerwinski²
¹Rutgers, The State University of New Jersey
²Microsoft Research, Redmond
6/8/11 2
Modern Social Interactional Modes
Facebook, Slashdot, Engadget, Flickr, LiveJournal, Digg, YouTube, Blogger, MetaFilter, Reddit, MySpace, Orkut, Twitter
How do we identify the most “relevant” or “best” items on a topic from millions, or even billions, of units of social media content?
Discrete, regular and fixed sampling lattice
• Shannon‐Nyquist sampling theorem: “If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.”
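The theorem can be checked numerically. The following is a minimal sketch, assuming NumPy, that rebuilds a band-limited signal from samples spaced 1/(2B) seconds apart using Whittaker-Shannon (sinc) interpolation. The test signal, the choice B = 4 Hz, and the truncation of the infinite sum to a finite window of samples are illustrative assumptions, not part of the talk.

```python
import numpy as np

def sinc_reconstruct(samples, T, t, n0=0):
    """Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc((t - n*T) / T).

    samples[i] holds x((n0 + i) * T); np.sinc is the normalized sinc sin(pi*u)/(pi*u).
    """
    n = n0 + np.arange(len(samples))
    return float(np.sum(samples * np.sinc((t - n * T) / T)))

B = 4.0              # highest frequency present in x(t), in hertz (assumed for illustration)
T = 1.0 / (2.0 * B)  # sample spacing prescribed by the theorem: 1/(2B) seconds

def x(t):
    # Band-limited test signal: components at 1 Hz and 3 Hz, both below B.
    return np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.cos(2 * np.pi * 3.0 * t)

N = 4000                     # samples on each side of t = 0; truncates the infinite sum
n = np.arange(-N, N + 1)
samples = x(n * T)
t_query = 0.3                # a point that falls between sampling instants
approx = sinc_reconstruct(samples, T, t_query, n0=-N)
print(approx, float(x(t_query)))
```

The reconstruction error comes only from truncating the sum; it shrinks as N grows. This regular, fixed lattice is exactly what social media content lacks.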
Interfaces / tools                                          # Responses
Twitter website                                             50
Twitter clients, such as TweetDeck, Twitterrific, etc.      25
Search engines, such as Bing Social                         19
Third-party apps, such as the Twitter plugin for Google      9
These interfaces present information along a single dimension, but social media information is diverse.
Characteristics of social media: high dimensionality
Dimensions: Geography, Authority, Conversational nature, Thematic category
Information Diversity
[Simon 1971, Zaichkowsky 1985, Jost 2006]
“Goodness of a set”: using measures of human information processing
Engagement, Memory encoding, Interestingness, Informativeness
Dimensional Importance
• Survey-based feedback on the importance of different dimensions, referred to as “concentration parameters”.
  – Participants (11 ‘active’ Twitter users) were asked to rate each tweet dimension on a scale of 1 through 7, where 1 meant “not important at all” and 7 meant “highly important”.
  – The survey also allowed them to identify other dimensions they considered significant.
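The slides do not say how the 1-7 ratings become concentration parameters. One plausible reading, sketched here as an assumption, is to average each dimension's ratings across participants and normalize the means so they sum to 1. The function name `concentration_weights` and the example ratings are hypothetical, not the survey's data.

```python
from statistics import mean

def concentration_weights(ratings):
    """Turn 1-7 Likert ratings into dimension weights that sum to 1.

    ratings: dict mapping a dimension name to the list of participant ratings.
    Assumption: weight = mean rating of the dimension / sum of all mean ratings.
    """
    for dim, scores in ratings.items():
        if not scores or any(not 1 <= s <= 7 for s in scores):
            raise ValueError(f"ratings for {dim!r} must be non-empty and within 1..7")
    means = {dim: mean(scores) for dim, scores in ratings.items()}
    total = sum(means.values())
    return {dim: m / total for dim, m in means.items()}

# Hypothetical ratings from three participants (illustrative only).
example_ratings = {
    "geography": [2, 3, 2],
    "authority": [6, 7, 6],
    "conversational": [4, 5, 3],
    "thematic": [7, 6, 6],
}
print(concentration_weights(example_ratings))
```

Normalizing keeps the weights comparable across surveys of different sizes; any monotone mapping from ratings to weights would fit the slide's description equally well.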
Social Media Content Selection
• Every tweet ti is represented as a vector of its dimensions and their corresponding weights.
• We propose an iterative clustering technique for tweet set generation, based on entropy distortion minimization.
  – The sets are constructed given a sampling ratio ρ and a diversity parameter value ω.
  – The (sub)optimal set to be constructed is denoted ΨS*(ρ, ω).
• Start with a random tweet as a seed.
• Iteratively add a tweet ti from ΨS such that, on adding ti, the distortion (in terms of the L1‐norm) of the entropy of the set ΨS(i, ω) with respect to the specified diversity measure ω is minimal.
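A greedy version of these steps can be sketched as follows. The assumptions are loud ones. Each tweet is a dict of categorical values per dimension. The set entropy is a weighted mean of per-dimension Shannon entropies, normalized to [0, 1] by log(set size), and the weights play the role of the concentration parameters. Because that entropy is a scalar, its L1 distortion from ω reduces to |H - ω|. The names `select_tweets`, `set_entropy`, and the field names are hypothetical; this is an illustration of entropy-distortion greedy selection, not the authors' implementation.

```python
import math
import random
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of categorical values, divided by log(n) so it lies in [0, 1]."""
    n = len(values)
    if n <= 1:
        return 0.0
    counts = Counter(values)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)

def set_entropy(tweets, weights):
    """Weighted mean of per-dimension normalized entropies (weights = dimension importance)."""
    total = sum(weights.values())
    return sum(w * normalized_entropy([t[d] for t in tweets]) for d, w in weights.items()) / total

def select_tweets(corpus, rho, omega, weights, seed=None):
    """Greedily pick round(rho * len(corpus)) tweet indices whose set entropy tracks omega."""
    k = max(1, round(rho * len(corpus)))
    rng = random.Random(seed)
    remaining = list(range(len(corpus)))
    selected = [remaining.pop(rng.randrange(len(remaining)))]  # random seed tweet
    while len(selected) < k and remaining:
        current = [corpus[j] for j in selected]
        # Add the candidate whose inclusion gives the smallest distortion |H - omega|.
        best = min(remaining, key=lambda i: abs(set_entropy(current + [corpus[i]], weights) - omega))
        remaining.remove(best)
        selected.append(best)
    return selected

# Hypothetical toy corpus with two dimensions.
corpus = [{"topic": "a", "geo": "x"}] * 3 + [{"topic": "b", "geo": "y"}, {"topic": "c", "geo": "z"}]
weights = {"topic": 1.0, "geo": 1.0}
print(select_tweets(corpus, rho=0.6, omega=1.0, weights=weights, seed=0))
```

With ω near 0 the loop favors tweets that repeat the seed's attribute values; with ω near 1 it favors tweets that add new ones. A greedy pass like this gives no optimality guarantee, which matches the slide's “(sub)optimal” wording.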
How does this method compare to state‐of‐the‐art techniques?
Twitter Firehose, June 2010: 1.4 billion tweets in total
Quantitative evaluation framework
• We defined a set of baseline techniques using simplified versions of our proposed algorithm:
  – Random set (B1)
  – Randomly sampled diversity level (B2)
  – Equal weighting of tweet dimensions (B3)
• Two further baselines were used: “most recent” tweets (MR) and “most tweeted URL” (MTU), i.e., the tweets corresponding to URLs that were highly shared in the network.
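The two non-algorithmic baselines are simple enough to sketch. This version assumes each tweet is a dict with hypothetical `timestamp` and `url` fields. It ranks MTU tweets by how often their URL is shared across the whole collection and keeps duplicates of the same URL. The slides do not specify deduplication, so that choice is an assumption.

```python
from collections import Counter

def most_recent(tweets, k):
    """MR baseline: the k tweets with the latest timestamps."""
    return sorted(tweets, key=lambda t: t["timestamp"], reverse=True)[:k]

def most_tweeted_url(tweets, k):
    """MTU baseline: tweets whose URL is shared most often in the collection."""
    with_url = [t for t in tweets if t.get("url")]
    url_counts = Counter(t["url"] for t in with_url)
    # sorted() is stable, so ties keep their original order.
    return sorted(with_url, key=lambda t: url_counts[t["url"]], reverse=True)[:k]

# Hypothetical toy tweets (illustrative only).
tweets = [
    {"id": 1, "timestamp": 10, "url": "u1"},
    {"id": 2, "timestamp": 30, "url": "u2"},
    {"id": 3, "timestamp": 20, "url": "u1"},
    {"id": 4, "timestamp": 5, "url": None},
]
print([t["id"] for t in most_recent(tweets, 2)], [t["id"] for t in most_tweeted_url(tweets, 2)])
```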
Cognitive metrics
• Explicit measures. Explicit measures consisted of three 7‐point Likert-scale ratings made after reading each tweet set:
  – “interestingness”
  – “informativeness”
• Implicit measures.
  – Cognitive engagement [Czerwinski 2001]: ideally, if the information presented in a tweet sample is very engaging, the participant would underestimate the time taken to go through it.
  – Recognition memory for tweets already shown: related to encoding in long‐term memory [Sperling 1973, Smith 1979].
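The two implicit measures could be scored as follows. This is a sketch under assumptions, since the slides give no formulas. Engagement is taken as relative time underestimation, (actual - estimated) / actual, so positive values mean the participant underestimated the reading time. Recognition memory is taken as the fraction of old/new probe tweets answered correctly. Both function names are hypothetical.

```python
def time_underestimation(actual_seconds, estimated_seconds):
    """Relative underestimation of reading time; positive = felt shorter than it was."""
    if actual_seconds <= 0:
        raise ValueError("actual_seconds must be positive")
    return (actual_seconds - estimated_seconds) / actual_seconds

def recognition_accuracy(trials):
    """trials: list of (was_shown, said_seen) booleans; returns the fraction answered correctly."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(shown == said for shown, said in trials) / len(trials)

# Hypothetical responses (illustrative only).
print(time_underestimation(120, 90))
print(recognition_accuracy([(True, True), (True, False), (False, False), (False, False)]))
```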
Hypothesis I: Tweet sets generated by the proposed method will be better than those from the baseline methods.
Performance Evaluation (Contd.)
p-values per comparison (baseline vs. proposed method, PM)
Comparison   Interestingness   Informativeness   Cognitive engagement   Recognition memory
B1 vs. PM    0.002             0.009             0.007                  0.097
B2 vs. PM    0.027             0.117             0.011                  0.105
B3 vs. PM    0.241             0.351             0.138                  0.411
MR vs. PM    0.0003            <0.0001           0.003                  0.005
MTU vs. PM   0.061             0.171             0.004                  0.214
Testing for statistical significance: one‐tailed paired t‐test; significance level p < 0.1.
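A one-tailed paired t-test of this kind can be run with SciPy. This sketch assumes SciPy 1.6 or later, which added the `alternative` argument to `scipy.stats.ttest_rel`. The wrapper name and the ratings are hypothetical. Each position in the two lists must be the same participant rating a PM set and a baseline set.

```python
from scipy import stats

def one_tailed_paired_ttest(proposed, baseline, alpha=0.1):
    """Paired t-test of H1: mean(proposed - baseline) > 0.

    Returns (t statistic, one-tailed p-value, whether p < alpha).
    """
    if len(proposed) != len(baseline):
        raise ValueError("paired samples must have equal length")
    result = stats.ttest_rel(proposed, baseline, alternative="greater")
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)

# Hypothetical 7-point ratings from five participants (illustrative, not study data).
proposed = [5, 6, 7, 6, 5]
baseline = [3, 4, 5, 4, 4]
t, p, significant = one_tailed_paired_ttest(proposed, baseline)
print(t, p, significant)
```

Here the per-participant differences are [2, 2, 2, 2, 1]. Their mean is 1.8 and their standard error is 0.2, so t = 9.0 with 4 degrees of freedom.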
Hypothesis II: Participants will perceive the diversity of sets generated by our method more accurately than that of sets generated by the baselines.
Diversity Perception
            B1              B2              B3              Proposed Method
            d'    Error     d'    Error     d'    Error     d'    Error
ω = 0.1     2.8   20.6%     2.2   11.1%     2.1    8.8%     1.1    7.8%
ω = 0.6     1.7   47.5%     2.9   28.1%     3.3   20.8%     5.4   13.6%
ω = 0.9     5.1   20.6%     5.5   14.6%     6.1    9.5%     6.8    7.3%
Perceived diversity is more accurate for highly heterogeneous (ω = 0.9) and highly homogeneous (ω = 0.1) tweet samples. Diversity perception is better for our proposed method.
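The d' column is the standard signal-detection sensitivity index, d' = z(hit rate) - z(false-alarm rate). The slides do not define what counts as a hit or a false alarm in a diversity judgment, so the counts below are assumptions. The sketch uses the Python standard library's `statistics.NormalDist` and adds the log-linear correction so that perfect hit or false-alarm rates do not produce infinite z-scores.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' = z(hit rate) - z(false-alarm rate).

    Log-linear correction: add 0.5 to each count and 1 to each total,
    so rates of exactly 0 or 1 stay finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts (illustrative only).
print(d_prime(8, 2, 2, 8))
```

Higher d' means the participant separates the two kinds of trials more reliably. When hit rate equals false-alarm rate, d' is 0.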
Hypothesis III: Participants' responses will be affected by the level of diversity in the various tweet sets shown.
Impact of Diversity on User Response
Participant ratings on different cognitive aspects of information consumption appear to be higher for highly homogeneous and highly heterogeneous information samples.
Conclusions
• Content selection methodologies for large social spaces that incorporate cognitive metrics of content consumption can enable the design of better content exploration interfaces.
  – Information diversity is key.
  – Users appear to cognitively encode information better when presented with samples of high or low diversity.
Are there empirical bounds on which degrees of diversity in a sample best suit content consumption?
If so, can these entropy signatures guide the content selection methodology more effectively?
Qualitative evaluation
@Paramedic_Fla Some oil spill events from Monday, June 7, 2010 http://bit.ly/cRwfXn
@miamiauto Some oil spill events from Monday, June 7, 2010: A summary of events on Monday, June 7, Day 48 of the Gulf of Mexi... http://bit.ly/9HNG9Z
@franklanguage RT @DAYLEE F@CK that! Broken pipe is not NATURAL! RT @RayBeckermanFreedomWorks CEO, Calls Oil Spill Natural Disaster http://bit.ly/coUY4l
@Teasdallqrb Public offers 'helpful' ideas on containing BP oil spill ‐ NEWS.com.au
@_paigenesss RT @TEDchris: A Gulf oil spill picture I will never forget. http://twitpic.com/1toz8a
@LeiaOfAlderaan Citizen Speaks The Truth ON BP Gulf Oil Spill‐‐the Govt, BP Are Doing Nothing, There Are No Leaders Here http://bit.ly/BP‐Gulf‐Oil‐Spill
@Faustinagwlxo WOOW! NO WAY! so brutal! http://ilil.me/h MTV Movie Summer Jam WWDC Oil Spill Xtina Another Cinderella Story
@minxdeluxe RT @OliBarrett: Visualizing the BP Oil Spill http://www.ifitwasmyhome.com/
[Twitter search‐alike] Most Recent tweets | [Bing‐alike] Most tweeted URL‐containing tweets
@JosephAGallant Erin Brockovich to meet with fishermen who say oil spill dispersant used by BP made them sick. http://huff.to/aGVWIl #tcot #BP #oilspill
@dixie_patriot Oil spill cap catching about 10,000 barrels a day|LONDON ? BP's oil spill cap, designed to stop a huge leak from .. http://oohja.com/xeWhD
@MoCuad My heart breaks all over again, every time I'm reminded of the oil spill.
@NFGNL Looking for Liability in BP's Gulf Oil Spill: White Collar Watch examines the potential criminal and civil liab.. http://nyti.ms/9lUMaT
@jameelee How You Can Volunteer to Clean Up the Gulf of Mexico Oil Spill http://ow.ly/1V3cu
@conchkid Gulf;Oil Spill Many Federal Judges Have Links To Oil Industry http://bit.ly/9v45UT
@NewsOnGreen BP Oil Spill: Containment Cap To Be Replaced Next Month http://dlvr.it/1WDZ8
@TrinitySaveNeo Citizen Speaks The Truth ON BP Gulf Oil Spill‐‐the Govt, BP Are Doing Nothing, There Are No Leaders Here http://bit.ly/BP‐Gulf‐Oil‐Spill
Proposed Method (user‐weighted; ω = 0.1; ordered) | Proposed Method (user‐weighted; ω = 0.6; ordered)