Page 1:

Rewarding Crowdsourced Workers

Panos Ipeirotis

New York University and Google

Joint work with: Jing Wang, Foster Provost, Josh Attenberg, and Victor Sheng;

Twitter: @ipeirotis

“A Computer Scientist in a Business School”

http://behind-the-enemy-lines.com

Page 2:

Example: Building a Web Page Classifier

Need a large number of labeled sites for training. Get people to look at sites and label them as:

G (general audience) PG (parental guidance) R (restricted) X (porn)

Cost/Speed Statistics
Undergrad intern: 200 websites/hr, cost: $15/hr
Mechanical Turk: 2,500 websites/hr, cost: $12/hr

Page 3:

Challenges

We do not know the true category for the objects
– Available only after (costly) manual inspection

We do not know quality of the workers

We want to label objects with their true categories
We want (need?) to know the quality of the workers

Page 4:

Expectation Maximization Estimation

Iterative process to estimate worker error rates:

1. Initialize "correct" label for each object (e.g., use majority vote)
2. Estimate error rates for workers (using "correct" labels)
3. Estimate "correct" labels (using error rates; weight worker votes according to quality)
4. Go to Step 2 and iterate until convergence
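A minimal sketch of this loop in Python (a simplified Dawid-Skene-style EM; the input layout `labels = {worker: {object: assigned_class}}` is a hypothetical choice for illustration, not from the talk):

```python
from collections import Counter, defaultdict

def em_labels(labels, classes, iters=20):
    """labels: {worker: {object: assigned_class}}.
    Returns (estimated correct label per object, per-worker confusion matrices)."""
    objects = {o for votes in labels.values() for o in votes}
    # Step 1: initialize "correct" labels with a majority vote
    truth = {o: Counter(v[o] for v in labels.values() if o in v).most_common(1)[0][0]
             for o in objects}
    conf = {}
    for _ in range(iters):
        # Step 2: estimate each worker's error rates against the current truth
        for w, votes in labels.items():
            counts = defaultdict(float)
            for o, lab in votes.items():
                counts[(truth[o], lab)] += 1
            conf[w] = {(t, lab): (counts[(t, lab)] + 1) /
                                 (sum(counts[(t, l)] for l in classes) + len(classes))
                       for t in classes for lab in classes}  # Laplace-smoothed
        # Step 3: re-estimate "correct" labels, weighting votes by worker quality
        for o in objects:
            score = {t: 1.0 for t in classes}
            for w, votes in labels.items():
                if o in votes:
                    for t in classes:
                        score[t] *= conf[w][(t, votes[o])]
            truth[o] = max(score, key=score.get)
        # Step 4: a fixed number of rounds stands in for a convergence test
    return truth, conf
```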

Page 5:

Challenge: From Confusion Matrices to Quality Scores

How to check if a worker is a spammer using the confusion matrix? (hint: error rate is not enough)

Confusion matrix for a spammer worker:
P[X → X] = 0.847%   P[X → G] = 99.153%
P[G → X] = 0.053%   P[G → G] = 99.947%

Confusion matrix for a good worker:
P[X → X] = 99.847%  P[X → G] = 0.153%
P[G → X] = 4.053%   P[G → G] = 95.947%

Page 6:

Challenge 1: Spammers are lazy and smart!

Confusion matrix for the spammer:
P[X → X] = 0%    P[X → G] = 100%
P[G → X] = 0%    P[G → G] = 100%

Confusion matrix for a good worker:
P[X → X] = 80%   P[X → G] = 20%
P[G → X] = 20%   P[G → G] = 80%

Spammers figure out how to fly under the radar…

In reality, we have 85% G sites and 15% X sites:
Error rate of spammer = 0% * 85% + 100% * 15% = 15%
Error rate of good worker = 20% * 85% + 20% * 15% = 20%

False negatives: Spam workers pass as legitimate

Page 7:

Challenge 2: Humans are biased!

Error rates for a legitimate (but biased) worker:
P[G → G]=20.0%   P[G → P]=80.0%   P[G → R]=0.0%    P[G → X]=0.0%
P[P → G]=0.0%    P[P → P]=0.0%    P[P → R]=100.0%  P[P → X]=0.0%
P[R → G]=0.0%    P[R → P]=0.0%    P[R → R]=100.0%  P[R → X]=0.0%
P[X → G]=0.0%    P[X → P]=0.0%    P[X → R]=0.0%    P[X → X]=100.0%

We have 85% G sites, 5% P sites, 5% R sites, 5% X sites:
Error rate of spammer (all G) = 0% * 85% + 100% * 15% = 15%
Error rate of biased worker = 80% * 85% + 100% * 5% = 73%

False positives: Legitimate workers appear to be spammers

(important note: bias is not just a matter of "ordered" classes)

Page 8:

Solution: Fix bias first, compute error rate afterwards

When the biased worker says G, it is 100% G
When the biased worker says P, it is 100% G
When the biased worker says R, it is 50% P, 50% R
When the biased worker says X, it is 100% X

Small ambiguity for the "R-rated" votes, but other than that, fine!

Error rates for the legitimate (but biased) worker, as before:
P[G → G]=20.0%   P[G → P]=80.0%   P[G → R]=0.0%    P[G → X]=0.0%
P[P → G]=0.0%    P[P → P]=0.0%    P[P → R]=100.0%  P[P → X]=0.0%
P[R → G]=0.0%    P[R → P]=0.0%    P[R → R]=100.0%  P[R → X]=0.0%
P[X → G]=0.0%    P[X → P]=0.0%    P[X → R]=0.0%    P[X → X]=100.0%
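The "fix bias first" step is just Bayes' rule applied to the worker's confusion matrix and the class priors; a sketch (the function name and input layout are mine):

```python
def soft_label(conf, priors, said):
    """P(true class | worker said `said`).
    conf[(true, said)] = P[true -> said]; priors = {class: P(class)}."""
    joint = {t: priors[t] * conf[(t, said)] for t in priors}
    z = sum(joint.values())
    return {t: p / z for t, p in joint.items()}

# The biased worker above: everything it calls "P" is really a "G" site.
priors = {"G": 0.85, "P": 0.05, "R": 0.05, "X": 0.05}
biased = {("G", "G"): 0.2, ("G", "P"): 0.8, ("G", "R"): 0.0, ("G", "X"): 0.0,
          ("P", "G"): 0.0, ("P", "P"): 0.0, ("P", "R"): 1.0, ("P", "X"): 0.0,
          ("R", "G"): 0.0, ("R", "P"): 0.0, ("R", "R"): 1.0, ("R", "X"): 0.0,
          ("X", "G"): 0.0, ("X", "P"): 0.0, ("X", "R"): 0.0, ("X", "X"): 1.0}
print(soft_label(biased, priors, "P"))  # {'G': 1.0, ...}: says P, means G
print(soft_label(biased, priors, "R"))  # 50% P, 50% R: the small ambiguity
```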

Page 9:

When the spammer says G, it is 25% G, 25% P, 25% R, 25% X
When the spammer says P, it is 25% G, 25% P, 25% R, 25% X
When the spammer says R, it is 25% G, 25% P, 25% R, 25% X
When the spammer says X, it is 25% G, 25% P, 25% R, 25% X
[note: assume equal priors]

The results are highly ambiguous. No information provided!

Error rates for the spammer:
P[G → G]=100.0%  P[G → P]=0.0%  P[G → R]=0.0%  P[G → X]=0.0%
P[P → G]=100.0%  P[P → P]=0.0%  P[P → R]=0.0%  P[P → X]=0.0%
P[R → G]=100.0%  P[R → P]=0.0%  P[R → R]=0.0%  P[R → X]=0.0%
P[X → G]=100.0%  P[X → P]=0.0%  P[X → R]=0.0%  P[X → X]=0.0%

Solution: Fix bias first, compute error rate afterwards

Page 10:

Expected Misclassification Cost

[*** Assume misclassification cost equal to 1; the solution generalizes to arbitrary misclassification costs across categories]

• High cost: probability spread across classes
• Low cost: probability mass concentrated in one class

Assigned Label   Corresponding "Soft" Label          Label Cost
Spammer: G       <G: 25%, P: 25%, R: 25%, X: 25%>    0.75
Good worker: P   <G: 100%, P: 0%, R: 0%, X: 0%>      0.0

(With 0/1 costs, the expected cost of a soft label p is 1 − Σ p_i²; for the uniform soft label, 1 − 4 · 0.25² = 0.75.)

Page 11:

Quality Score: A scalar measure of quality

• A spammer is a worker who assigns labels randomly, regardless of what the true class is.
• Quality score useful for ranking workers
• Unaffected by systematic biases
• Scalar, so no need to examine confusion matrices

QualityScore(Worker) = 1 − Cost(Worker) / Cost(Prior)
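In code, with the equal-cost convention from the previous slide, and taking the prior distribution itself as the spammer baseline (function names are mine):

```python
def expected_cost(soft):
    """Expected misclassification cost of one soft label under 0/1 costs:
    sum over pairs i != j of p_i * p_j, i.e. 1 - sum p_i^2."""
    return 1.0 - sum(p * p for p in soft.values())

def quality_score(worker_soft, priors):
    """~1.0 for a perfect worker; 0.0 for a worker whose soft labels
    are no more informative than the class priors (a spammer)."""
    return 1.0 - expected_cost(worker_soft) / expected_cost(priors)

spammer = {"G": 0.25, "P": 0.25, "R": 0.25, "X": 0.25}
print(expected_cost(spammer))  # 0.75, as in the table above
```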

Page 12:

Quality Score Challenges

• Thresholding has the wrong incentive structure:
  • Decent (but still useful) workers remain unused
  • If you are above the threshold, there is no need to improve
• Uncertainty: the quality score is not really a fixed number
• Fluctuations in payment are puzzling for workers
  • Best to have only increases in payment

Question: How to pay workers?

Page 13:

Two Types of Workers

Divide workers into two groups:
– Qualified workers: their quality satisfies the target quality level
– Unqualified workers: their quality fails to meet the target quality level

Page 14:

A Simple Pricing Model for Qualified Workers

– p: the price paid to all qualified workers
– fW(w): the pdf of the worker reservation-wage distribution
– FW(w): the cdf of the worker reservation-wage distribution
– R: the fixed price paid by the external client for each qualified object

Page 15:

Example

fW(w): LogNormal(3,1), selling price R = 50
(Figure: the reservation-wage density fW(w) and revenue as a function of the worker salary.)

Optimal Worker Salary = 21
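A sketch that approximately reproduces this number, assuming the platform maximizes expected profit per worker, (R − p) · FW(p): a worker participates whenever the offered price exceeds their reservation wage. The objective is my reading of the model, not stated on the slide:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import lognorm

R = 50.0
wage = lognorm(s=1.0, scale=np.exp(3.0))  # reservation wage ~ LogNormal(3, 1)

# Profit per worker: margin (R - p) times the fraction F_W(p) willing to work at p
res = minimize_scalar(lambda p: -(R - p) * wage.cdf(p),
                      bounds=(0.01, R), method="bounded")
print(f"Optimal worker salary ~ {res.x:.1f}")  # ~21.5 here; the slide reports 21
```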

Page 16:

The Value of Unqualified Workers

Binary classification (1:1 priors), fixed worker confusion matrix, "accept" threshold: classification cost <= 0.1

Number of workers:    1      3      5      7      9      11
Classification cost:  0.300  0.216  0.163  0.126  0.099  0.079

We need ~9 workers to achieve the required quality
Value of each such worker: 1/9 of a qualified one
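The table is consistent with plain majority voting over independent workers who each err 30% of the time; a sketch that reproduces it up to rounding (the symmetric confusion matrix is my assumption):

```python
from math import comb

def majority_error(n, eps=0.3):
    """P(the majority of n workers is wrong), for an odd number n of
    independent workers with symmetric error rate eps and 1:1 priors."""
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 5, 7, 9, 11):
    print(n, round(majority_error(n), 3))
# 1 0.3, 3 0.216, 5 0.163, 7 0.126, 9 0.099, 11 0.078
```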

Page 17:

A Pricing Model for Workers

– p: the price paid to all qualified workers
– fW(w): the pdf of the worker reservation-wage distribution
– FW(w): the cdf of the worker reservation-wage distribution
– Adjust for the presence of "unqualified" workers: each unqualified worker "counts" as 1/k of a qualified one
– R: the fixed price paid by the external client for each qualified object

Page 18:

Optimal Prices

(Figure: optimal prices p*, p*/3, and p*/9.)

Page 19:

Quality Score Challenges (recap of Page 12)

• Thresholding has the wrong incentive structure: decent workers remain unused, and above the threshold there is no need to improve
• Uncertainty: the quality score is not really a fixed number
• Fluctuations in payment are puzzling for workers; best to have only increases in payment

Question: How to pay workers?

Page 20:

Bayesian Estimates for Uncertainty

Worker A:
P[0 → 0] = Beta(2,1)     P[0 → 1] = Beta(1,2)
P[1 → 0] = Beta(1,2)     P[1 → 1] = Beta(2,1)

Worker B:
P[0 → 0] = Beta(101,1)   P[0 → 1] = Beta(1,101)
P[1 → 0] = Beta(1,101)   P[1 → 1] = Beta(101,1)

(Worker A's estimates rest on a single observation per class; Worker B's on a hundred, so B's quality is known with far less uncertainty.)
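A quick check of how different the two posteriors really are:

```python
from scipy.stats import beta

# Posterior over P[0 -> 0] for each worker: Beta(correct + 1, wrong + 1)
for name, a, b in [("Worker A", 2, 1), ("Worker B", 101, 1)]:
    post = beta(a, b)
    lo, hi = post.ppf(0.025), post.ppf(0.975)
    print(f"{name}: mean={post.mean():.3f}, 95% interval=[{lo:.3f}, {hi:.3f}]")
# Worker A: mean=0.667, interval=[0.158, 0.987]  (one observation: huge uncertainty)
# Worker B: mean=0.990, interval=[0.964, 1.000]  (a hundred observations: tight)
```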

Page 21:

Real-Time Payment and Reimbursement

Example of the piece-rate payment of a worker:

# Tasks:                     10   20   30   40   Infinity
Piece-rate payment (cents):  11   18   21   23   40

(Figure: piece-rate payment rising toward the fair payment as tasks accumulate; the remaining gap is a potential "bonus".)
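One way to get "only increases in payment": pay each task at a rate based on a pessimistic (lower-bound) quality estimate, and reimburse the difference retroactively as the estimate tightens. A minimal sketch; the 40-cent fair rate is from the table, but the payment rule itself is my assumption, not the talk's exact scheme:

```python
from scipy.stats import beta

FAIR_CENTS = 40.0  # fair per-task payment for a worker of known good quality

def pessimistic_quality(correct, wrong, conf=0.95):
    """Lower end of the credible interval for the worker's accuracy."""
    return beta(correct + 1, wrong + 1).ppf(1 - conf)

def pay_stream(answers):
    """Yield the payment after each task: piece rate plus retroactive
    reimbursement, so the cumulative payment only ever rises."""
    correct = wrong = 0
    paid = 0.0
    for ok in answers:
        correct += ok
        wrong += not ok
        owed = FAIR_CENTS * pessimistic_quality(correct, wrong) * (correct + wrong)
        payment = max(owed - paid, 0.0)  # never claw back earlier payments
        paid += payment
        yield payment
```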

Page 22:

(Figure build, after 10 tasks: payment vs. number of tasks, with the gap to the fair payment marked as a potential "bonus".)

Page 23:

(Figure build, after 20 tasks: each step shows the piece-rate payment plus a reimbursement toward the fair payment.)

Page 24:

(Figure build, after 30 tasks: payments and reimbursements continue to close the gap to the fair payment.)

Page 25:

(Figure build, after 40 tasks: the sequence of payments and reimbursements only ever increases toward the fair payment.)

Page 26:

Synthetic Experiment Setup

N = 10,000 tasks
R = 200 cents
Cost <= 0.01

Labeling process, workers:
– Arrival frequency: every 600 seconds
– Number per arrival: 10 workers
– Submission speed: 30 seconds per task

The evaluation criterion is Unit Time Profit.

Page 27:

(Synthetic) Experimental Setup

(Figure: scatter plots of worker confusion matrix vs. reservation wage, for two cases:)
– Quality level and reservation wage independently distributed
– Quality level and reservation wage positively correlated

Page 28:

Experimental Results

(Figure: average profit per second for quality-based pricing vs. uniform prices of 4.4, 5.7, 7.4, 9.5, and 12.5 cents, i.e., the 30%-70% quantiles of the wage distribution, under no correlation and positive correlation between quality and reservation wage. Quality-based pricing improves profit by 24.6% in the no-correlation case and by 159.6% in the positively correlated case.)

Page 29:

Workers reacting to bad rewards/scores

Score-based feedback leads to strange interactions:

The "angry, has-been-burnt-too-many-times" worker: "F*** YOU! I am doing everything correctly and you know it! Stop trying to reject me with your stupid 'scores'!"

The overachiever worker: "What am I doing wrong?? My score is 92% and I want to have 100%"

Page 30:

An unexpected connection…

National Academy of Sciences, Dec 2010 "Frontiers of Science" conference

"Your workers behave like my mice!"

Page 31:

"Your workers behave like my mice!"

Eh?

Page 32:

"Your workers want to use only their motor skills, not their cognitive skills."

Page 33:

The Biology Fundamentals

Brain functions are biologically expensive (20% of total energy consumption in humans).
Motor skills are more energy-efficient than cognitive skills (e.g., walking).
The brain tends to delegate easy tasks to the part of the neural system that handles motor skills.

Page 34:

An unexpected connection at the NAS "Frontiers of Science" conference

"Your workers want to use only their motor skills, not their cognitive skills."

Makes sense…

Page 35:

An unexpected connection at the NAS "Frontiers of Science" conference

"And here is how I train my mice to behave…"

Page 36:

The Mice Experiment

Cognitive: solve the maze, find the pellet
Motor: push the lever three times, a pellet drops

Page 37:

How to Train the Mice?

Confuse motor skills! Reward cognition!

"I should try this the moment I get back to my room…"

Page 38:

Punishing Workers' Motor Skills

Punish bad answers with frustration of motor skills (e.g., add delays between tasks):
– "Loading image, please wait…"
– "Image did not load, press here to reload"
– "404 error. Return the HIT and accept again"

→ Make this probabilistic to keep the feedback implicit
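A toy illustration of the probabilistic version (names and thresholds are hypothetical; the point is that the worker only ever sees a flaky site, never an explicit score):

```python
import random
import time

def maybe_frustrate(answer_looks_bad, p_punish=0.5):
    """After a suspect answer, sometimes inject a motor-skill annoyance
    (a fake loading delay) instead of explicit feedback."""
    if answer_looks_bad and random.random() < p_punish:
        # Shown to the worker as "Loading image, please wait..."
        time.sleep(random.uniform(2.0, 10.0))
```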

Page 39:

Rewarding (?) Cognitive Effort

Reward good answers by rewarding the cognitive part of the brain:
– Introduce variety
– Introduce novelty
– Give new tasks fast
– Show score improvements faster (but not the opposite)
– Show optimistic score estimates

Page 40:


Page 41:

Experiments

– Web page classification
– Image tagging
– Email & URL collection

Page 42:

Experimental Summary (I)

Spammer workers quickly abandon:
– No need to display scores, or to ban
– Low-quality submissions from ~60% to ~3%
– Half-life of low-quality workers from 100+ HITs to less than 5

Good workers unaffected:
– No significant effect on participation of workers with good performance
– Lifetime of participants unaffected
– Longer response times (even after removing the "intervention delays"; that was puzzling)

Page 43:

Experimental Summary (II)

Remember, the scheme was for training the mice…

15%-20% of the spammers start submitting good work!

Page 44:

Two key questions

Why was the response time slower for some good workers?
Why do some low-quality workers start working well?

Page 45:

System 1: "Automatic" actions
System 2: "Intelligent" actions

Page 46:


System 1 Tasks

Page 47:


System 2 Tasks

Page 48:

(Diagram: a two-state worker-management loop.)

Status: Usage of System 1 ("Automatic")
– Performing well? Stay on System 1.
– Not performing well? Disrupt, and engage System 2.

Status: Usage of System 2 ("Intelligent")
– Performing well? Check if System 1 can handle the task; remove the System 2 stimuli.
– Not performing well? Hell/slow ban: out.
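Read as a state machine, the loop looks roughly like this (a sketch of the diagram above, not the talk's implementation):

```python
from enum import Enum

class State(Enum):
    SYSTEM1 = "automatic"      # worker is coasting on motor skills
    SYSTEM2 = "intelligent"    # worker is cognitively engaged
    OUT = "hell/slow ban"

def step(state, performing_well):
    if state is State.SYSTEM1:
        # Fine to coast while quality holds; disrupt and engage System 2 otherwise
        return State.SYSTEM1 if performing_well else State.SYSTEM2
    if state is State.SYSTEM2:
        # If quality recovers, let System 1 take over again (remove stimuli);
        # if quality is bad even when engaged, slow-ban the worker out
        return State.SYSTEM1 if performing_well else State.OUT
    return State.OUT
```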

Page 49:

Thanks!

Q & A?

