Posted on 16-Dec-2015
Task Assignment Optimization in Crowdsourcing (and its applications to political campaigns)
Sihem Amer-Yahia, DR CNRS @ LIG
Sihem.Amer-Yahia@imag.fr
POLIWEB PEPS
Grenoble Dec 8th, 2014
POLIWEB 2014
                    Traditional Campaigning             Social Media Campaigning
Sources             Onsite campaign officers            Tweets, FB posts, text messages, dedicated websites
Report timeliness   Mixed (limited resources)           ✔
Report quality      ✔ (reliance on experts)             Mixed (noisy; rumor and misinformation)
Report cost         High (time spent and bus tours)     ✔
Barack Obama’s campaign, 2012 (http://www.theguardian.com/world/2011/apr/04/barack-obama-twitter-facebook-election)
• An article in The Guardian (Monday 4 April 2011) noted that this was the first U.S. presidential re-election campaign to use Twitter and Facebook for promotion.
• Twitter hashtag: #Obama2012
• Bill Clinton and George Bush spent the first phases of their campaigns on a nationwide bus tour.
• Obama was covering the country with less effort, speaking to 500,000 of his grassroots activists via a live teleconference.
Share your story on https://stories.barackobama.com/
Individual stories
Expressing Task Assignment in ECCO
• Input: tasks to complete, human workers
• Output: completed tasks
• Each task has skill/quality/cost requirements
• Each worker has human factors: skill, expected wage, acceptance ratio
• Desirable properties:
  – Task-centric: high-quality tasks (relevant workers), low cost
  – Worker-centric: balanced workload, good incentives (high pay, relevant tasks)
  – System-centric: low latency
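To make the task and worker attributes listed above concrete, here is a minimal Python sketch of the two records and a feasibility check. The class and field names (`Worker`, `Task`, `min_quality`, `budget`, `acceptance`) are hypothetical, and we assume that a group's quality is the sum of its members' skills and its cost the sum of their wages; ECCO may aggregate these differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    skill: float       # estimated skill, e.g. in [0, 1]
    wage: float        # expected wage for one task
    acceptance: float  # fraction of offered tasks the worker accepts


@dataclass(frozen=True)
class Task:
    name: str
    min_quality: float  # required aggregated (summed) skill of the assigned group
    budget: float       # maximum total wage the task can pay


def satisfies(task: Task, group: list[Worker]) -> bool:
    """True if the group's summed skill meets the quality bar within the budget."""
    total_skill = sum(w.skill for w in group)
    total_wage = sum(w.wage for w in group)
    return total_skill >= task.min_quality and total_wage <= task.budget


def expected_acceptances(group: list[Worker]) -> float:
    """Expected number of group members who accept (linearity of expectation)."""
    return sum(w.acceptance for w in group)


alice = Worker("alice", skill=0.7, wage=3.0, acceptance=0.9)
bob = Worker("bob", skill=0.5, wage=2.0, acceptance=0.6)
report = Task("egypt-report", min_quality=1.0, budget=5.0)
print(satisfies(report, [alice, bob]))  # True: skill 1.2 >= 1.0, wage 5.0 <= 5.0
print(expected_acceptances([alice, bob]))
```

The acceptance ratio matters because an assignment that looks feasible on paper still fails if too few of the chosen workers actually accept.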
Talk outline
1. Quick overview of existing crowdsourcing
2. Task assignment in ECCO
Crowdsourcing
• Crowdsourcing covers a variety of tasks:
  – Micro-tasks: data gathering (e.g., picture/video tagging), opinion solicitation (e.g., restaurant ratings)
  – Collaborative tasks: document editing (e.g., Wikipedia), creative design, fansubbing, solution outsourcing (e.g., Netflix contest)
• Existing systems:
  – Platforms: AMT, TurKit, InnoCentive, CrowdFlower, etc.
  – Crowd: volatile, asynchronous arrival/departure, various levels of attention/accuracy/expertise
• 3 primary processes:
  – Worker skill estimation
  – Worker-to-task assignment
  – Task accuracy evaluation
Challenges
• Who evaluates what, and how?
• How to estimate worker skills?
• How to assign tasks to workers?
• How to do all of the above efficiently?
• Magnified by:
  – Human factors
  – Scale
Related work
• Developing a dedicated platform for each campaign is costly
• Recent research addresses some of these challenges in silos, for specific cases: e.g., real-time crowdsourcing, highly volatile crowds, single worker skill
  – Active learning strategies for task accuracy improvement [Boim et al. 2012, Karger et al. 2011, Ramesh et al. 2012]
  – Worker-to-task assignment [Ho et al. 2012]
• Human involvement introduces uncertainty:
  – Worker availability
  – Worker wage: deviations even among persons with the same profile, due to workload and time
  – Worker skill: may decline with workload, change with motivation
Task Assignment Optimization in Knowledge-Intensive Crowdsourcing @ VLDBJ 2015 (to appear)
Joint work with: Senjuti Basu Roy (UW Tacoma), Gautam Das, Habibur Rahman, Saravanan Thirumuruganathan (UT Arlington)
Maximize task quality under task-centric and worker-centric constraints
• Objective: maximize the aggregated task value v_t
• Task quality constraint, over aggregated worker skills
• Task budget constraint, over aggregated worker wages
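One way to write a problem of this shape as an integer program is sketched below. It is a simplified formulation under our own assumptions, not the one from the paper: x_{w,t} = 1 if worker w is assigned to task t, y_t = 1 if task t counts as completed, s_w and c_w are a worker's skill and wage, and v_t, Q_t, C_t are a task's value, quality threshold, and budget. Skills and wages are aggregated by summation, each worker takes at most one task, and the weighted terms of the deck's actual objective (such as W2) are not modeled.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{t \in T} v_t\, y_t \\
\text{s.t.}\quad  & \sum_{w \in W} s_w\, x_{w,t} \;\ge\; Q_t\, y_t && \forall t \in T \quad \text{(task quality)}\\
                  & \sum_{w \in W} c_w\, x_{w,t} \;\le\; C_t       && \forall t \in T \quad \text{(task budget)}\\
                  & \sum_{t \in T} x_{w,t} \;\le\; 1                && \forall w \in W \quad \text{(worker workload)}\\
                  & x_{w,t},\ y_t \in \{0, 1\}
\end{aligned}
```

Because y_t can only be 1 when the assigned skills reach Q_t, maximizing the sum of v_t y_t rewards only tasks that are staffed well enough and within budget.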
Task Assignment Solution Overview
The task assignment problem is NP-hard (by reduction from the Multiple Knapsack Problem)
Our approach:
• Offline: index building for a workload of tasks
• Online: index maintenance when tasks occur
  – How to replace a worker who is not available or does not accept a task?
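A minimal sketch of the replacement question, assuming workers are hypothetical `(name, skill, wage)` tuples, that a task's quality is the summed skill of its group, and that only one worker is swapped. It simply picks the cheapest available worker who keeps the task feasible. This greedy rule is for illustration only; the index-maintenance approach described in this talk solves a marginal IP over clustered workers instead.

```python
from typing import Optional

Worker = tuple[str, float, float]  # hypothetical (name, skill, wage)


def replace_worker(group: list[Worker], dropped: Worker, pool: list[Worker],
                   min_quality: float, budget: float) -> Optional[Worker]:
    """Cheapest pool worker that restores the task's quality bar within its budget."""
    remaining = [w for w in group if w != dropped]
    base_skill = sum(skill for _, skill, _ in remaining)
    base_wage = sum(wage for _, _, wage in remaining)
    candidates = [
        w for w in pool
        if w not in group
        and base_skill + w[1] >= min_quality
        and base_wage + w[2] <= budget
    ]
    if not candidates:
        return None  # no single swap works; a larger re-assignment is needed
    # Prefer the lowest wage; break ties by higher skill.
    return min(candidates, key=lambda w: (w[2], -w[1]))


team = [("ana", 0.6, 3.0), ("ben", 0.5, 2.0)]
pool = [("cai", 0.3, 1.0), ("dee", 0.45, 1.5), ("eve", 0.8, 2.5)]
print(replace_worker(team, ("ben", 0.5, 2.0), pool, min_quality=1.0, budget=5.0))
# ('dee', 0.45, 1.5)
```

Here "cai" is too weak to restore the quality bar and "eve" would exceed the budget, so "dee" is the only valid single replacement.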
Optimal Solution for Offline Index Building
IP-based (integer programming)
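To show what an exact solution computes, the sketch below solves tiny instances by exhaustive enumeration instead of calling an IP solver. It assumes a simplified model: each worker is a hypothetical `(skill, wage)` pair, each task a `(value, quality threshold, budget)` triple, a worker takes at most one task, every task's summed wages must fit its budget, and a task contributes its value only if its summed skill reaches the threshold. Enumeration costs (|T| + 1)^|W| steps, which is why real workloads need an IP solver or the approximations that follow.

```python
from itertools import product

Worker = tuple[float, float]       # hypothetical (skill, wage)
Task = tuple[float, float, float]  # hypothetical (value, quality threshold, budget)


def optimal_assignment(workers: list[Worker], tasks: list[Task]) -> tuple[float, dict[int, list[int]]]:
    """Return the best total value and, for each completed task, the indices of its workers."""
    best_value, best_plan = 0.0, {}
    options = [None] + list(range(len(tasks)))  # None = worker stays unassigned
    for assignment in product(options, repeat=len(workers)):
        groups = {t: [w for w, a in enumerate(assignment) if a == t] for t in range(len(tasks))}
        # Task budget: summed wages of the assigned workers must fit the budget.
        if any(sum(workers[w][1] for w in g) > tasks[t][2] for t, g in groups.items()):
            continue
        # Task quality: a task is completed only if summed skill reaches its threshold.
        done = [t for t, g in groups.items() if sum(workers[w][0] for w in g) >= tasks[t][1]]
        value = sum(tasks[t][0] for t in done)
        if value > best_value:
            best_value, best_plan = value, {t: groups[t] for t in done}
    return best_value, best_plan


workers = [(0.6, 3.0), (0.5, 2.0), (0.9, 4.0)]
tasks = [(10.0, 1.0, 5.0), (4.0, 0.5, 2.0)]
print(optimal_assignment(workers, tasks))  # (10.0, {0: [0, 1]})
```

Both tasks would need worker 1, so the optimum completes only the more valuable task 0 with workers 0 and 1 and leaves worker 2 idle.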
Approximate Solution for Offline Index Building
• The objective function is submodular, and becomes monotonic when W2 = 0
• Contribution to index building:
  – A greedy deterministic algorithm with a 1 − 1/e approximation factor when the objective is submodular and monotonic
  – A greedy randomized algorithm with a 2/5 approximation factor when the objective is submodular (but not necessarily monotonic)
• Contribution to index maintenance:
  – Solve a marginal IP
  – Cluster workers to reduce the problem size
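The 1 − 1/e factor is the classic guarantee for greedily maximizing a monotone submodular function under a cardinality constraint (Nemhauser, Wolsey, and Fisher, 1978). The sketch below shows that textbook greedy on a stand-in objective chosen for illustration, weighted topic coverage of a team of at most k workers, which is monotone and submodular. The worker names, topics, and weights are hypothetical, and ECCO's own objective and constraints are different.

```python
def greedy_team(skills: dict[str, set[str]], weights: dict[str, float], k: int) -> list[str]:
    """Pick up to k workers, each time adding the one with the largest marginal coverage gain."""
    chosen: list[str] = []
    covered: set[str] = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for name, topics in skills.items():
            if name in chosen:
                continue
            gain = sum(weights[t] for t in topics - covered)  # marginal gain of adding `name`
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # no remaining worker adds value
        chosen.append(best)
        covered |= skills[best]
    return chosen


skills = {
    "ana": {"egypt", "politics"},
    "ben": {"politics"},
    "cai": {"cars", "climate"},
    "dee": {"egypt"},
}
weights = {"egypt": 3.0, "politics": 2.0, "cars": 1.0, "climate": 1.5}
print(greedy_team(skills, weights, k=2))  # ['ana', 'cai']
```

Submodularity is what makes the greedy choice reasonable: once "ana" covers Egypt and politics, "ben" and "dee" add nothing, so the second pick goes to the worker who covers new topics.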
Experimental Evaluation
• Quality experiments using multiple applications
  – Collaborative document editing: workers asked to produce reports on 5 different topics: 1) political unrest in Egypt, 2) NSA document leakage, 3) PlayStation games, 4) all-electric cars, and 5) global warming
• Scalability experiments
  – A collaborative crowd simulator
Quality Experiments
• A total of 230 workers hired on AMT
• A set of 8 multiple-choice questions per task, to assess skills
• Study conducted in multiple phases:
  – Phase 1: skill and cost of workers learned using a benchmark dataset
  – Phase 2: task assignment
  – Phase 3: completed tasks evaluated by crowd workers
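Phase 1 can be illustrated with the simplest possible estimator: a worker's skill is the fraction of the 8 benchmark questions they answer correctly. The answer key and responses below are hypothetical, and the study may have used a different (for example, smoothed or per-topic) estimator; how worker cost was learned is not described, so it is left out.

```python
def estimate_skill(answers: list[str], key: list[str]) -> float:
    """Fraction of benchmark questions answered correctly."""
    if len(answers) != len(key) or not key:
        raise ValueError("answers and key must be non-empty and of equal length")
    correct = sum(given == expected for given, expected in zip(answers, key))
    return correct / len(key)


key = list("ABCDABCD")  # hypothetical answer key for 8 multiple-choice questions
responses = {"w1": list("ABCDABDA"), "w2": list("ABCDABCD")}
skills = {worker: estimate_skill(ans, key) for worker, ans in responses.items()}
print(skills)  # {'w1': 0.75, 'w2': 1.0}
```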
AMT worker distributions (Egypt task)
Quality Assessment
• Completed reports rated on a scale of 1-5 by 150 AMT workers
• Compared to Benchmark and Online-greedy
Summary of Quality Experiments (from translation task)
• Higher affinity positively impacts quality
• A large group (beyond size 10) is less effective
• Region-based affinity is more effective than age-gender-based affinity
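The deck does not define affinity. One simple choice, assumed here, is the fraction of worker pairs in a group who share the chosen profile attributes. The sketch compares region-based and age-gender-based affinity for a hypothetical three-person team; the profiles and attribute names are illustrative only.

```python
from itertools import combinations

Profile = dict[str, str]  # hypothetical worker profile, e.g. {"region": "EU", "age": "18-29"}


def group_affinity(group: list[Profile], attributes: tuple[str, ...]) -> float:
    """Fraction of worker pairs that agree on every listed attribute."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0  # fewer than two workers: no pairs to compare
    agreeing = sum(all(a[attr] == b[attr] for attr in attributes) for a, b in pairs)
    return agreeing / len(pairs)


team = [
    {"region": "EU", "age": "18-29", "gender": "F"},
    {"region": "EU", "age": "30-44", "gender": "M"},
    {"region": "EU", "age": "18-29", "gender": "M"},
]
print(group_affinity(team, ("region",)))        # 1.0
print(group_affinity(team, ("age", "gender")))  # 0.0
```

The same team can score high on one affinity definition and low on another, which is why the choice of attributes matters when relating affinity to quality.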
Summary and Future Work
1. Crowdsourcing is a powerful paradigm for campaign reporting
2. Implicit reporting serves campaign awareness
3. Explicit reporting with recurring crowds opens new research opportunities for effective task assignment
4. Task assignment is effective when skill learning and task evaluation are possible
5. Research in this area may rely on a general-purpose crowdsourcing platform
Opportunities
                    Traditional Campaigning             Social Media Campaigning                              Future Campaigning?
Sources             Onsite campaign officers            Tweets, FB posts, text messages, dedicated websites   Online tweet analysis discovers and follows the course of the campaign
Report timeliness   Mixed (limited resources)           ✔                                                     ✔ (if tweet monitoring is accurate and timely)
Report quality      ✔ (reliance on experts)             Mixed (noisy; rumor and misinformation)               ✔ (via careful assignment)
Report cost         High (time spent and bus tours)     ✔                                                     ✔ (within budget)
Social media for large studies of behavior (http://www.sciencemag.org/content/346/6213/1063.summary?sid=084adb48-e72d-463d-81f6-cd71d3031d27)
• On 3 November 1948, the day after Harry Truman won the United States presidential elections, the Chicago Tribune published one of the most famous erroneous headlines in newspaper history: “Dewey Defeats Truman”
• The headline was informed by telephone surveys, which had inadvertently under-sampled Truman supporters
• Mounting evidence suggests that many of the forecasts and analyses being produced with social media analysis misrepresent the real world [1]
[1] Zeynep Tufekci: Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. ICWSM 2014
Using social media for large studies of behavior
• For instance, Instagram is “especially appealing to adults aged 18 to 29, African-American, Latinos, women, urban residents” [2]
• Pinterest is dominated by females, aged 25 to 34, with an average annual household income of $100,000 [3]
• These sampling biases are rarely corrected for (if they are even acknowledged)
• The population proxy effect has caused substantially incorrect estimates of political orientation on Twitter [4]
• Academic culture celebrates only positive findings; without seeing failed studies, we cannot assess the extent to which successful findings are the result of random chance. This issue has been observed when predicting political election outcomes with Twitter [5]
[2] M. Duggan, J. Brenner: The demographics of social media users. www.pewinternet.org/2013/02/14/the-demographics-of-social-media-users-2012/
[3] 13 ‘pinteresting’ facts about Pinterest users. www.pinterest.com/pin/234257618087475827/
[4] Raviv Cohen, Derek Ruths: Classifying Political Orientation on Twitter: It's Not Easy! ICWSM 2013
[5] H. Schoen et al.: Internet Res. 23, 528 (2013)
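One standard correction for sampling bias of this kind is reweighting, for example post-stratification: estimate the quantity within each demographic stratum, then combine the stratum estimates using known population shares rather than the sample's own proportions. The sketch below uses a hypothetical two-stratum sample; the strata, shares, and data are invented for illustration, and reweighting only corrects for the attributes used as strata, not for other ways platform users differ from the population.

```python
def poststratified_mean(sample: list[tuple[str, float]], population_share: dict[str, float]) -> float:
    """Weight each stratum's sample mean by that stratum's known share of the population."""
    by_stratum: dict[str, list[float]] = {}
    for stratum, value in sample:
        by_stratum.setdefault(stratum, []).append(value)
    missing = set(population_share) - set(by_stratum)
    if missing:
        raise ValueError(f"no observations for strata: {sorted(missing)}")
    # Shares are assumed to sum to 1 over the strata of interest.
    return sum(share * sum(by_stratum[s]) / len(by_stratum[s]) for s, share in population_share.items())


# Hypothetical sample that over-represents 18-29-year-olds (1.0 = supports candidate X).
sample = [("18-29", 1.0), ("18-29", 1.0), ("18-29", 1.0), ("18-29", 0.0), ("30+", 0.0), ("30+", 1.0)]
raw = sum(v for _, v in sample) / len(sample)
adjusted = poststratified_mean(sample, {"18-29": 0.2, "30+": 0.8})
print(round(raw, 3), round(adjusted, 3))  # 0.667 0.55
```

The raw mean is pulled toward the over-sampled young stratum; reweighting to the assumed population shares lowers the estimate accordingly.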
Share your story on obamacare.com