Loren Terveen
Computer Science & Engineering
The University of Minnesota
August 2011
If You Build It? Benefits and Costs of Creating Your
Own Online Community
• Theory
• Simulation
• Lab studies
• Surveys
• Qualitative studies
• Build and learn (e.g., Google, Facebook, Wikipedia)
• Build To Learn
Background: ways of knowing
Build to learn
GroupLens Research
• Create new interaction / social computing techniques
• Do empirical, quantitative research
• Learn from what we and others build
• Data
• Experimental Control
To answer the kinds of research questions we like to ask, we need:
1. Learning from others’ data
2. Learning from our own data
3. Exercising experimental control
The rest of the talk
• Q&A systems
• Wikipedia
1. Learning from others’ data
WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance. Lam, S.K., Uduwage, A., Dong, Z., Sen, S., Musicant, D.R., Terveen, L., Riedl, J. WikiSym 2011.
NICE: Social translucence through UI intervention. A. Halfaker, B. Song, D. A. Stuart, A. Kittur and J. Riedl. Wikisym 2011.
Don't bite the Newbies: How Reverts Affect the Quantity and Quality of Wikipedia Work. A. Halfaker, A. Kittur and J. Riedl. Wikisym 2011.
Mentoring in Wikipedia: A Clash of Cultures. D. Musicant, Y. Ren, J. Johnson and J. Riedl. Wikisym 2011.
The Effects of Group Composition on Decision Quality in a Social Production Community, Lam, S.K., Karim, J., Riedl, J. Group 2010.
The Effects of Diversity on Group Productivity and Member Withdrawal in Online Volunteer Groups, Chen, J., Ren, Y., Riedl, J. CHI 2010.
rv you're dumb: Identifying Discarded Work in Wiki Article History, Ekstrand, M.D., Riedl, J.T. Wikisym 2009.
A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia, Halfaker, A., Kittur, N., Kraut, R., Riedl, J. Wikisym 2009.
Is Wikipedia Growing a Longer Tail?, Lam, S.K., Riedl, J. Group 2009.
Wikipedians are born, not made: a study of power editors on Wikipedia, Panciera, K., Halfaker, A., Terveen, L. Group 2009.
SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia, Cosley, D., Frankowski, D., Terveen, L., Riedl, J. IUI 2007.
Creating, Destroying, and Restoring Value in Wikipedia, Priedhorsky, R., Chen, J., Lam, S.K., Panciera, K., Terveen, L., Riedl, J. Group 2007.
GroupLens Wikipedia Research
WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance. Lam, S.K., Uduwage, A., Dong, Z., Sen, S., Musicant, D.R., Terveen, L., Riedl, J.
www.grouplens.org/node/466
http://www.nytimes.com/2011/01/31/business/media/31link.html?_r=1&src=busln
A topic generally restricted to teenage girls, like friendship bracelets, can seem short at four paragraphs when compared with lengthy articles on something boys might favor, like, toy soldiers or baseball cards, whose voluminous entry includes a detailed chronological history of the subject.
(BTW, it’s not about the friendship bracelets)
Trigger…
• Only 16% of new editors joining Wikipedia during 2009 identified themselves as women
• Women made only 9% of the edits by this cohort
• New women editors are more likely to stop editing and leave Wikipedia when their edits are reverted
• Topics of particular interest to women appear to get less (and poorer) coverage in Wikipedia
(Hmm… maybe Wikipedia has a low collective IQ!)
Come to Wikisym to get the details!
Findings
2. Learning from our own data
• MovieLens
• Cyclopath
GroupLens online communities
200 Union St SE
Lagoon Theatre
How do contributors to open content systems become contributors?
Inspired by…
Research Question
Wikipedians fill different niches than non-Wikipedians
Wikipedians branch out to new areas and topics as they mature
Wikipedians take on more “community work” as they mature
Becoming Wikipedian
Bryant, Forte, & Bruckman 2005
Qualitative study with nine participants self-reporting
Evidence for “becoming”?
Our goal: test these findings quantitatively
• Quantity of work
• Quality of work
• Nature of work
Are Wikipedians Born or Made?
A registered editor with 250+ edits over his/her lifetime
Wikipedian
If editors reach 250 edits within our data set, they are labeled Wikipedian from the
beginning
Data
• English Wikipedia dump (January 13, 2008)
• Edits from bots and other non-human means removed
We counted:
• Only registered editors
• Wikipedians (users with 250+ edits): 38K
• Non-Wikipedians: random sample of 38K
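The labeling rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Edit` record, its field names, and the `label_editors` helper are hypothetical and are not taken from the study's actual pipeline. Only the 250-edit threshold and the exclusion of bot and unregistered edits come from the slides.

```python
from collections import Counter
from typing import Iterable, NamedTuple

WIKIPEDIAN_THRESHOLD = 250  # from the slides: 250+ edits within the data set


class Edit(NamedTuple):
    editor: str        # hypothetical field: account name (or IP for anonymous edits)
    registered: bool   # False for unregistered (anonymous) edits
    is_bot: bool       # True for bot / other non-human edits


def label_editors(edits: Iterable[Edit],
                  threshold: int = WIKIPEDIAN_THRESHOLD) -> dict[str, str]:
    """Label every registered, human editor by total edit count in the data set.

    An editor who reaches the threshold anywhere in the data is labeled
    "Wikipedian" for the whole history, i.e. from their first edit onward.
    """
    counts = Counter(e.editor for e in edits if e.registered and not e.is_bot)
    return {
        editor: "Wikipedian" if n >= threshold else "non-Wikipedian"
        for editor, n in counts.items()
    }


if __name__ == "__main__":
    sample = (
        [Edit("alice", True, False)] * 300
        + [Edit("bob", True, False)] * 12
        + [Edit("FixBot", True, True)] * 1000
        + [Edit("203.0.113.7", False, False)] * 5
    )
    print(label_editors(sample))
```

Because the label is applied retroactively, early activity of eventual Wikipedians can be compared with that of editors who never reach the threshold.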
Edits per day per editor (“User days”)
(“Day 1”)
Quantity
Is a user’s fate sealed?
Wikipedians are: Born / Made
Measure: Persistent Word Revisions (PWRs)
• Proportion of words added that persist five revisions
Quality
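One way to make "proportion of words added that persist five revisions" concrete is sketched below. It is a deliberately rough approximation, not the metric as implemented in the study: it treats each revision as a bag of whitespace-separated words and ignores word position, so a surviving copy of a common word elsewhere in the article counts as persistence. The function names and the `window` parameter are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence

PERSIST_WINDOW = 5  # from the slides: added words must persist five revisions


def added_words(before: str, after: str) -> Counter:
    """Bag-of-words approximation of the words an edit added."""
    return Counter(after.split()) - Counter(before.split())


def persistence_ratio(revisions: Sequence[str], i: int,
                      window: int = PERSIST_WINDOW) -> Optional[float]:
    """Share of the words added in revision i still present `window` revisions later.

    Returns None when revision i has no predecessor, added no words,
    or the history is too short to look `window` revisions ahead.
    """
    if i < 1 or i + window >= len(revisions):
        return None
    added = added_words(revisions[i - 1], revisions[i])
    total = sum(added.values())
    if total == 0:
        return None
    later = Counter(revisions[i + window].split())
    surviving = sum((added & later).values())  # multiset intersection
    return surviving / total


if __name__ == "__main__":
    history = ["a b", "a b c d", "a b c d", "a b c", "a b c", "a b c", "a b c e"]
    print(persistence_ratio(history, 1))
```

A position-aware diff would be needed for anything beyond a sketch; the bag-of-words version only shows the shape of the calculation.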
Other quality metrics?
Wikipedians are: Born / Made
Conjecture: Wikipedians take on community maintenance work over time
Several ways to formalize:
• Editing in “talk” (and other) namespaces (Nope: still “born”)
• Referring to “community norms” (Wikipedia policies) to explain edits
Nature of Work
Community
Learning norms vs. learning to appeal to the norms?
Training: effective editing
Wikipedians are: Born / Made
• Common pattern: initial burst of activity, decline, steady state
• Wikipedians look different from day one
• Little evidence for “Becoming Wikipedian”: Wikipedians are born, not made
  • Can we reconcile?
• This is depressing!
• Possible responses:
  • Early interventions
  • Change the culture
  • Systemic initiatives, e.g., APS Wikipedia Initiative: http://www.psychologicalscience.org/index.php/members/aps-wikipedia-initiative
  • Accept the reality of the long tail
Summary of findings
We can’t ask Wikipedia users about our interpretations
What if the learning happened before users registered?
But: methodological worries
As of September 2009, we identified:
• 1172 “unambiguous” users
  • 268 of these users made some edits
• 440 “ambiguous” users
For unambiguous users:
• Day 1 = first time a user came to the site (not the day they registered)
Cyclopath: viewing and pre-registration activities are visible
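The "Day 1 = first visit" convention can be illustrated with a small re-indexing sketch. The `Event` record, its `kind` labels, and the `user_days` helper are hypothetical and stand in for whatever logs Cyclopath actually keeps. The point is only that viewing activity before registration receives day numbers earlier than the registration day.

```python
from datetime import date
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    user: str   # hypothetical identifier for an "unambiguous" user
    day: date   # calendar day on which the event happened
    kind: str   # illustrative labels such as "view", "register", "edit"


def user_days(events: Iterable[Event]) -> list[tuple[str, int, str]]:
    """Re-index events so that Day 1 is each user's first visit of any kind."""
    events = list(events)
    first_seen: dict[str, date] = {}
    for e in events:
        if e.user not in first_seen or e.day < first_seen[e.user]:
            first_seen[e.user] = e.day
    return [
        (e.user, (e.day - first_seen[e.user]).days + 1, e.kind)
        for e in events
    ]


if __name__ == "__main__":
    log = [
        Event("u1", date(2009, 5, 1), "view"),
        Event("u1", date(2009, 5, 3), "register"),
        Event("u1", date(2009, 5, 4), "edit"),
    ]
    for row in user_days(log):
        print(row)
```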
Same pattern as for Wikipedia
[Figure: bar chart of # of users (y-axis, 0 to 300) in two groups: Do Not Edit, Do Edit]
And few users edited before registration
Some viewing before registration
[Figure: histogram of # of users (y-axis, 0 to 800); x-axis bins: 0, 1-50, 51-100, 101-250, 251-500, 501-1000, 1001+; annotations: “A minute or two”, “<= 5 min.”, “<= 15”, “<= 30”, “<= 60”]
But amount of viewing before registration (or before editing) does not predict subsequent behavior
“Born, Not Made” still seems true
Followups
• Cyclopath user surveys – Wikisym 2011 paper
  • Why these patterns?
  • What ‘triggers’ initial contribution?
  • And how might we nurture ongoing participation?
• Cyclopath contextual interviews planned
3. Exercising Experimental Control
Motivating participation: How can we get more work done in open content systems?
Idea: match users with tasks they’re likely to be interested in and capable of doing
Requirements:
• Introduce task-matching algorithms/interfaces
• Assign users to different conditions
• Gather data necessary for evaluation
• Survey users
Research Question
• Get work done
• Nurture new users
• Serve community
Goals
Tools
• Recommender algorithms
• Interaction design
Theory
• Collective Effort Model
• Social Influence
Intelligent Task Routing
MovieLens
Task: Edit movie content
Four strategies to suggest movies to a user
• High Pred (individual value of outcomes): pick movies the system thinks the user will really like
• Rare Rated (lower effort for a given performance): pick movies the user has rated that few others have
• Needs Work (contribution matters to group): pick movies that are missing the most information
• Random (baseline): pick random movies
theory-based
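The four strategies can be read as four ranking rules over the same candidate set. The sketch below is a hedged illustration, not MovieLens code: the `Movie` fields, the function names, and the cutoff `k` are all assumptions introduced here, and the real system's predictions and "missing information" counts may be defined differently.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Movie:
    title: str
    predicted_rating: float  # assumed: system's predicted rating for this user
    rated_by_user: bool      # assumed: whether this user has rated the movie
    num_raters: int          # assumed: how many users have rated the movie
    missing_fields: int      # assumed: count of empty information fields


def high_pred(movies: Sequence[Movie], k: int) -> list[Movie]:
    """High Pred: movies the system thinks the user will really like."""
    return sorted(movies, key=lambda m: m.predicted_rating, reverse=True)[:k]


def rare_rated(movies: Sequence[Movie], k: int) -> list[Movie]:
    """Rare Rated: movies the user has rated that few others have."""
    rated = [m for m in movies if m.rated_by_user]
    return sorted(rated, key=lambda m: m.num_raters)[:k]


def needs_work(movies: Sequence[Movie], k: int) -> list[Movie]:
    """Needs Work: movies that are missing the most information."""
    return sorted(movies, key=lambda m: m.missing_fields, reverse=True)[:k]


def random_pick(movies: Sequence[Movie], k: int,
                rng: Optional[random.Random] = None) -> list[Movie]:
    """Random baseline: a uniform sample without replacement."""
    rng = rng or random.Random()
    return rng.sample(list(movies), min(k, len(movies)))


if __name__ == "__main__":
    catalog = [
        Movie("A", 4.8, False, 900, 0),
        Movie("B", 3.1, True, 4, 2),
        Movie("C", 2.0, True, 50, 7),
        Movie("D", 4.0, False, 10, 5),
    ]
    print([m.title for m in high_pred(catalog, 2)])
    print([m.title for m in rare_rated(catalog, 1)])
    print([m.title for m in needs_work(catalog, 2)])
```

Keeping each strategy as a plain function over the same inputs makes it straightforward to swap one in per experimental condition.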
• Assign MovieLens users to four groups, one per algorithm
• About 2,000 subjects, 200 contributors
• Count # editors, contributions, fields
The experiment
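A between-subjects design like this one can be sketched as a random assignment step plus a per-condition tally. The slides do not say how users were actually assigned, so the shuffle-then-round-robin scheme, the fixed seed, and the `(user_id, fields_filled)` edit records below are assumptions for illustration only.

```python
import random
from collections import Counter
from typing import Hashable, Iterable

CONDITIONS = ("HighPred", "RareRated", "NeedsWork", "Random")


def assign_conditions(user_ids: Iterable[Hashable], seed: int = 0) -> dict:
    """Shuffle users, then deal them round-robin into the four conditions."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    return {uid: CONDITIONS[i % len(CONDITIONS)] for i, uid in enumerate(shuffled)}


def summarize(assignment: dict, edits: Iterable[tuple]) -> dict:
    """Per condition: distinct editors, number of edits, and fields filled in.

    `edits` is an iterable of (user_id, fields_filled) pairs (hypothetical format).
    """
    editors = {c: set() for c in CONDITIONS}
    n_edits = Counter()
    n_fields = Counter()
    for uid, fields in edits:
        cond = assignment[uid]
        editors[cond].add(uid)
        n_edits[cond] += 1
        n_fields[cond] += fields
    return {
        c: {"editors": len(editors[c]), "edits": n_edits[c], "fields": n_fields[c]}
        for c in CONDITIONS
    }


if __name__ == "__main__":
    groups = assign_conditions(range(2000), seed=42)
    print(Counter(groups.values()))
    print(summarize(groups, [(0, 3), (0, 2), (1, 1)]))
```

Dealing round-robin after a shuffle keeps the group sizes equal, which simplifies comparing raw counts across conditions.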
Editing behavior by strategy
[Figure: bar chart of count (y-axis, 0 to 250) by metric (number of editors, number of edits, fields filled in) for each strategy: HighPred, RareRated, NeedsWork, Random]
• Rare rated: dominant
• Needs work: bang for buck
• Random: not bad here
• High prediction: lousy
• Task matching worked
• Familiarity of user with task was most helpful
  • Reduces effort
  • Increases value
• Note: we’ve tried this approach in Wikipedia and Cyclopath, too
  • Different issues
  • Generality
Summary of findings
MovieLens
• 14 years of continuous development
• Several complete software architecture / UI redos (and another needed!)
• 1 full-time software engineer
• Much graduate student time over the years
That’s great, but is there a catch?
• ~140K lines of code, in multiple languages
• 1 full-time software engineer
• Grad students: expectation they will spend 25-30% of their time on ‘development’ tasks
• Looming tasks:
  • UI redesign / reimplementation
  • Expanding geographic coverage
Cyclopath
• Significant resources devoted to development
• But: typically enables new experiments and/or builds the user community
• And: funding for these resources often came only due to the success of the system/community
Adding it up
• Fewer papers
• But: papers of a type that would be impossible otherwise
• We can investigate questions in different settings, applying different methods: cumulative science
Cycloplan (in collab. with Metropolitan Council)
• Planners can develop ideas informed by usage data (“What if I add a trail here?”)
• Planners can share plans with public
• Public can explore plans, give feedback (“How much would my route be improved with this trail?”)
• Public can share concerns directly to relevant officials
Participatory Crowdsourcing (in collab. with IBM)
• Citizens as sensors
• Continua of participation; incentives
Models for participation in open content systems
• Roles, privileges, processes: Nupedia vs. Wikipedia
• Models for volunteer participation
  • Initial vs. ongoing
Towards TMSP
The GroupLens Research Group, particularly:
• John Riedl
• Joe Konstan
• Reid Priedhorsky
• Dan Cosley
• Katie Panciera
And:
• Tom Erickson, IBM
Me: [email protected]: @lorenterveen
Thanks to…