Some Examples
Call for professional help Awards of 50,000 to 1,000,000 for each task
Office work platform
Microtask platform Over 30,000 tasks at the same time
Amazon Mechanical Turk
A micro-task marketplace Task prices are usually between 0.01
and 1 USD Easy-to-use interface
Amazon Mechanical Turk
Human Intelligence Task (HIT) Tasks hard for computers
Developer Prepay the money Publish HITs Get results
Worker Complete the HITs Get paid
A Survey of Mechanical Turk Survey of 1,000 Turkers (MTurk workers)
Two identical surveys (Oct. 2008 and Dec. 2008) Consistent results
Blog post:
A Computer Scientist in a Business School
Compare with Internet Demographics Use the data from ComScore
In summary, Turkers are
younger: portion aged 21-35 is 51% vs. 22% of Internet users
mainly female: 70% female vs. 50% female
lower income: 65% of Turkers have income < 60k/year vs. 45% of Internet users
smaller families: 55% of Turkers have no children vs. 40% of Internet users
Dataset Collection
Datasets are important in computer science!
In multimedia analysis Is there X in the image? Where is Y in the image?
In natural language processing What is the emotion of this sentence?
And in lots of other applications
Dataset Collection
Utility Annotation By Sorokin and Forsyth at UIUC Image analysis
Type keyword Select examples Click on landmarks Outline figures
Dataset Collection
Linguistic annotations (Snow et al. 2008) Word similarity
USD 0.2 to label 30 word pairs
Dataset Collection
Linguistic annotations (Snow et al. 2008) Affect recognition
USD 0.4 to label 20 headlines (140 labels)
Dataset Collection
Linguistic annotations (Snow et al. 2008) Textual entailment
If “Microsoft was established in Italy in 1985”, then “Microsoft was established in 1985” ?
Word sense disambiguation “a bass on the line” vs. “a funky bass line”
Temporal annotation Ran happens before fell:
“The horse ran past the barn fell”
Dataset Collection
Document relevance evaluation Alonso et al. (2008)
User rating collection Kittur et al. (2008)
Noun compound paraphrasing Nakov (2008)
Name resolution Su et al. (2007)
Quality
Multiple non-experts can beat experts
"Three cobblers with their wits combined surpass one Zhuge Liang" (Chinese proverb)
Black line: agreement among Turkers
Green line: single expert
Golden result: agreement among multiple experts
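The aggregation idea on this slide can be illustrated with a plain majority vote over several non-expert labels. This is only a minimal sketch: the item names and labels are made up for illustration, and the study cited on the slide may have combined labels differently.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label in a list of worker labels."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical labels from five Turkers for three items (illustrative data only)
turker_labels = {
    "item1": ["pos", "pos", "neg", "pos", "neg"],
    "item2": ["neg", "neg", "neg", "pos", "neg"],
    "item3": ["pos", "neg", "pos", "pos", "pos"],
}

# One aggregated label per item
aggregated = {item: majority_vote(votes) for item, votes in turker_labels.items()}
print(aggregated)
```

An odd number of labels per item avoids ties in the binary case.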
QoE Measurement
QoE (Quality of Experience) Subjective measure of user perception
Traditional approach: user studies with MOS (Mean Opinion Score) ratings (Bad -> Excellent)
Crowdsourcing with paired comparison
Diverse user input Easy to understand Interval scale scores can be calculated
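One way interval-scale scores can be derived from paired comparisons is the Bradley-Terry model. The slide does not say which model was used, so treat the choice of model, the function name bradley_terry, and the win counts below as assumptions made for illustration.

```python
import math

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the standard iterative (MM) update.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns log-strengths, which can be read as interval-scale scores.
    Assumes every item wins at least once and every pair was compared.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        norm = sum(new_p)
        p = [x / norm for x in new_p]  # normalize so strengths sum to 1
    return [math.log(x) for x in p]

# Hypothetical data: three quality settings, each pair compared 10 times
wins = [
    [0, 3, 1],
    [7, 0, 4],
    [9, 6, 0],
]
scores = bradley_terry(wins)
print(scores)
```

Only differences between the scores are meaningful, so the scale can be shifted freely, for example so that the lowest setting sits at zero.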
Iterative Tasks
TurKit: tools for iterative tasks on MTurk
Imperative programming paradigm Basic elements
Variable (a = b) Control (if else statement) Loop (for, while statement)
Turning MTurk into a programming platform that integrates human brainpower
Iterative Text Improvement A Wikipedia-like scenario
One Turker improves the text Other Turkers vote on whether the improvement is
valid
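The improve-then-vote loop can be sketched with ordinary variables, conditionals, and loops. This is not TurKit's actual API: the functions improve and vote below are hypothetical stand-ins for posting an improve-HIT and a vote-HIT, simulated locally so the sketch runs without MTurk.

```python
def iterative_improvement(text, improve, vote, rounds=3, voters=3):
    """Alternate improve and vote steps; keep a revision only if most voters prefer it."""
    for _ in range(rounds):
        candidate = improve(text)  # stands in for one improve-HIT
        ballots = [vote(text, candidate) for _ in range(voters)]  # vote-HITs
        if sum(ballots) > voters / 2:  # majority approves the change
            text = candidate
    return text

# Simulated workers (placeholders for real Turkers)
def improve(text):
    return text + " More detail."

def vote(old, new):
    # Prefer the longer text, but reject anything over 40 characters
    return len(old) < len(new) <= 40

result = iterative_improvement("A calculator.", improve, vote)
print(result)
```

In this toy run the first two revisions are accepted and the third is voted down for exceeding the length limit.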
Iterative Text Improvement Image description
Instructions for the improve-HIT
Please improve the description for this image
People will vote whether to approve your changes
Use no more than 500 characters
Instructions for the vote-HIT
Please select the better description for this image
Your vote must agree with the majority to be approved
Iterative Text Improvement Image description A partial view of a pocket calculator
together with some coins and a pen.
A view of personal items a calculator, and some gold and copper coins, and a round tip pen, these are all pocket and wallet sized item used for business, writing, calculating prices or solving math problems and purchasing items.
A close-up photograph of the following items:
* A CASIO multi-function calculator
* A ball point pen, uncapped
* Various coins, apparently European, both copper and gold
…Various British coins; two of £1 value, three of 20p value and one of 1p value. …
Iterative Text Improvement Image description
A close-up photograph of the following items:
A CASIO multi-function, solar powered scientific calculator.
A blue ball point pen with a blue rubber grip and the tip extended.
Six British coins; two of £1 value, three of 20p value and one of 1p value.
Seems to be a theme illustration for a brochure or document cover treating finance - probably personal finance.
Iterative Text Improvement Handwriting Recognition
Version 1 You (?) (?) (?) (work). (?) (?) (?) work (not)
(time). I (?) (?) a few grammatical mistakes. Overall your writing style is a bit too (phoney). You do (?) have good (points), but they got lost amidst the (writing). (signature)
Iterative Text Improvement Handwriting Recognition
Version 6 “You (misspelled) (several) (words). Please spell-check your work next time. I also notice a few grammatical mistakes. Overall your writing style is a bit too phoney. You do make some good (points), but they got lost amidst the (writing). (signature)”
Repeated Labeling
Crowdsourcing -> Multiple imperfect labelers Each worker is a labeler Labels are not always correct
Repeated labeling Improves supervised induction
Increases single-label accuracy Decreases the cost of acquiring training data
Repeated Labeling
Repeated labeling helps improve the overall quality when the accuracy of a single labeler is low.
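The effect of repeated labeling can be illustrated with a small calculation. The sketch below assumes binary labels, independent labelers who all share the same accuracy p, and aggregation by majority vote. These are simplifying assumptions, not details stated on the slide.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that the majority of n labels is correct.

    Assumes n is odd, labels are binary, and each labeler is independently
    correct with probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Compare a noisy labeler (p = 0.6) with a very accurate one (p = 0.99)
for p in (0.6, 0.99):
    print(p, [round(majority_accuracy(p, n), 4) for n in (1, 3, 5, 7)])
```

Under these assumptions a 0.6-accurate labeler gains several points of accuracy from extra labels, while a 0.99-accurate labeler has almost nothing left to gain.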
Selected Repeated Labeling Repeat-label the most uncertain points
Label uncertainty (LU) Whether the label distribution is stable Calculated from beta distribution
Model uncertainty (MU) Whether the model has high confidence
for the label Calculated from model predictions
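A label-uncertainty score based on the beta distribution can be sketched as follows. The formulation is an assumption made for illustration: with pos positive and neg negative labels, take a Beta(pos + 1, neg + 1) posterior over the true positive rate and use the smaller tail mass on either side of 0.5 as the uncertainty. The function names are hypothetical.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def label_uncertainty(pos, neg):
    """Smaller tail of the Beta(pos + 1, neg + 1) posterior around 0.5.

    Ranges from near 0 (labels agree strongly) to 0.5 (evenly split).
    """
    tail = beta_cdf_int(0.5, pos + 1, neg + 1)
    return min(tail, 1 - tail)

print(label_uncertainty(5, 0))  # unanimous labels: low uncertainty
print(label_uncertainty(3, 2))  # split labels: higher uncertainty
print(label_uncertainty(2, 2))  # evenly split: maximal uncertainty
```

Examples with the highest score would be the first candidates for another round of labeling.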
Selected Repeated Labeling Selected repeated labeling improves
the overall quality of the crowdsourcing approach.
GRR: no selected repeated labeling
MU: Model Uncertainty
LU: Label Uncertainty
LMU: integrate Label and Model Uncertainty
Incentive vs. Performance
High financial incentive -> high performance?
User studies (Mason and Watts 2009) Order images
Ex: choose the busiest image
Solve word puzzles
Incentive vs. Performance
Workers always want more
[Figure: how much workers think they deserve]
Workers are influenced by the amount they are paid
Pay little at first, and incrementally increase the payment