Task and Workflow Design II
KSE 652 Social Computing System Design and Analysis
Uichin Lee
Contents
• Turkomatic: divide and conquer strategy for performing more “challenging tasks” in M-Turk
• TurKontrol: decision-theoretic approach for work-flow control (e.g., how many improve/vote tasks?)
• Turkalytics: monitoring workers’ behavior remotely
Turkomatic: Automatic Recursive Task and Workflow Design for Mechanical Turk
CHI'11 WIP
Turkomatic
• Turkomatic interface accepts task requests written in natural language
• Subdivide phase:
– For each request, it posts a HIT to M-Turk, asking workers to break the task down into a set of logical subtasks
– Each subtask is then automatically reposted to M-Turk; a subtask can be further broken down
• Merge phase:
– Once all subtasks are completed, HITs are posted asking workers to combine subtask solutions into a coherent whole
• The end result will then be delivered to the requester
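The subdivide and merge phases can be read as a recursive procedure. The following is a minimal Python sketch, not Turkomatic's actual implementation: the `subdivide`, `solve`, and `merge` callables stand in for HITs answered by workers, and every name here (including the `max_depth` guard) is a hypothetical choice made for illustration.

```python
from typing import Callable, List

def run_task(task: str,
             subdivide: Callable[[str], List[str]],
             solve: Callable[[str], str],
             merge: Callable[[str, List[str]], str],
             max_depth: int = 3) -> str:
    """Recursively split a task, solve the leaves, and merge the results.

    Each callable is a placeholder for posting a HIT and waiting for a worker:
    - subdivide(task) returns subtasks, or [] if the task is simple enough
    - solve(task) returns a solution for a task that was not split
    - merge(task, solutions) combines subtask solutions into one answer
    max_depth is an assumed safety limit on recursion, not from the slides.
    """
    subtasks = subdivide(task) if max_depth > 0 else []
    if not subtasks:
        return solve(task)
    solutions = [run_task(s, subdivide, solve, merge, max_depth - 1)
                 for s in subtasks]
    return merge(task, solutions)


# Stand-in "workers" for a toy run (no real HITs are posted).
def toy_subdivide(task: str) -> List[str]:
    parts = task.split(" and ")
    return parts if len(parts) > 1 else []

def toy_solve(task: str) -> str:
    return f"[{task}]"

def toy_merge(task: str, solutions: List[str]) -> str:
    return " + ".join(solutions)

result = run_task("write intro and write body and write conclusion",
                  toy_subdivide, toy_solve, toy_merge)
print(result)
```

In a real deployment each callable would block until a worker submits the corresponding HIT, which is why the slides report hours of latency per phase.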
Subdivide Phase
• Decomposition of tasks, and the creation of solution elements
Divide and Merge
Evaluation
• Tasks:
– Producing a written essay in response to a prompt: “please write a five-paragraph essay on the topic of your choice”
– Solving an example SAT test: “Please solve the 16-question SAT located at http://bit.ly/SATexam”
– Payment: $0.10 to $0.40 per HIT
• Each “subdivide” or “merge” HIT received answers within 4 hours; solutions to the initial task were completed within 72 hours
• Essay: the final essay (about “university legacy admissions”) displayed a reasonably good understanding of the topic, yet the writing quality was mixed
• SAT: the task was divided into 12 subtasks (containing 1-3 questions); the score was 12/17
Decision-Theoretic Control of Crowd-Sourced Workflows
Peng Dai, Mausam, Daniel S. Weld
AAAI 2010
Motivation
• Iterative workflow (i.e., improve and vote) used in TurKit has the following problems:
– What is the optimal number of iterations?
– How many ballots (votes) should we use?
– How do answers change if the workers are more/less skilled?
Iterative workflow
TurKontrol: Computation Model
• Text α is improved to text α’ (after improve task)
• Given a pair (α, α’), a series of votes (b_k) can be received to judge which one is better
TurKontrol: Computation Model
• Text α: quality density function f_Q(q) (prior)
• A worker x takes an improvement job and submits α’
• Text α’ done by worker x: quality density function f_{Q’|q,x}(q’) (posterior)
• Quality density function of text α’
TurKontrol: Computation Model
• Voting:
– A series of n votes: b = b_1, b_2, …, b_n where b_i ∈ {1, 0}
– Posterior probability after n votes: f_{Q|b}(q) and f_{Q’|b}(q’)
• Difficulty:
– The closer the two results, the more difficult it is to judge
– d(q, q’) = 1 - |q - q’|^M where M is a constant, and d ∈ [0, 1]
• Accuracy (of a worker x):
– a_x(d) = ½ [1 + (1 - d)^r] where r is a knob for controlling the accuracy distribution
• If the i-th worker x_i has accuracy a_{x_i}(d), that accuracy is the probability that vote b_i is correct
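The difficulty and accuracy formulas can be checked numerically. This is a small Python sketch of just those two formulas; the default values of `M` and `r` are illustrative assumptions, not values from the paper.

```python
def difficulty(q: float, q_prime: float, M: float = 0.5) -> float:
    """d(q, q') = 1 - |q - q'|^M for qualities q, q' in [0, 1].

    Identical qualities give d = 1 (hardest to judge);
    maximally different qualities give d = 0 (easiest).
    """
    return 1.0 - abs(q - q_prime) ** M

def accuracy(d: float, r: float = 1.0) -> float:
    """a_x(d) = 1/2 * [1 + (1 - d)^r].

    Ranges from 0.5 (random guessing, d = 1) to 1.0 (d = 0).
    A larger r lowers accuracy at every intermediate difficulty.
    """
    return 0.5 * (1.0 + (1.0 - d) ** r)

d = difficulty(0.25, 0.5, M=0.5)   # |0.25 - 0.5|^0.5 = 0.5, so d = 0.5
print(d, accuracy(d, r=1.0), accuracy(d, r=2.0))
```

With these assumed parameters, a worker with `r = 1` votes correctly 75% of the time on this pair, while a worker with `r = 2` does so 62.5% of the time.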
TurKontrol: Computation Model
• For a given pair (α, α’), its posterior probabilities (Q, Q’) are f_{Q|b}(q) and f_{Q’|b}(q’)
• Given that we don’t know the worker, an average worker is used
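TurKontrol: Computation Model
• Improve task (cost c_imp): α → α’, with quality densities f_Q(q) and f_Q’(q’)
• Vote task (cost c_b): posteriors f_{Q|b}(q), f_{Q’|b}(q’) are updated to f_{Q|b+1}(q), f_{Q’|b+1}(q’)
• Utility function: utility as a function of quality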
TurKontrol: Computation Model
• Utility estimation of a pair (α, α’), for (1) an improve task and (2) a vote task
– (2) utility of a vote task
– (1) utility of an improve task
– U: utility function; c_b: vote cost; c_imp: improve cost
• Decision making:
– Three options: (a) vote, (b) improve, or (c) accept
– k-step lookahead: evaluate all sequences of k decisions, and find the sub-sequence with the highest utility
Numerical Results
• Convex utility function with max 1000
• Fixed cost (improve, vote) = (30, 10)
• Net utility: utility of submitted artifact - payment to workers
• TurKit: performs as many iterations as possible (max allowance 400)
• TurKontrol (2): lookahead of 2
• cf: accuracy of workers a_x(d) = ½ [1 + (1 - d)^r]
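The net-utility definition is simple arithmetic. The sketch below uses the fixed costs from the slide; the artifact utility of 800 and the task counts are made-up inputs for illustration only.

```python
C_IMP = 30  # cost of one improve task (from the slide)
C_B = 10    # cost of one vote (ballot) task (from the slide)

def net_utility(artifact_utility: float, n_improve: int, n_votes: int) -> float:
    """Utility of the submitted artifact minus total payment to workers."""
    payment = n_improve * C_IMP + n_votes * C_B
    return artifact_utility - payment

# Hypothetical run: 3 improve tasks and 6 votes, final artifact utility 800.
print(net_utility(800, n_improve=3, n_votes=6))  # 800 - (90 + 60) = 650
```

This makes the trade-off visible: an extra iteration only pays off if it raises the artifact's utility by more than its improve and vote costs.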
Turkalytics: Real-time Analytics for Human Computation
Paul Heymann and Hector Garcia-Molina
WWW'11
Basic Buyer human programming
• A human program generates forms, which are advertised through a marketplace.
• Workers look at posts, and then complete the forms for compensation.
Game Maker human programming
• The programmer writes a human program and a game.
• The game implements features to make it fun and difficult to cheat.
• The human program loads and dumps data from the game.
Human Processing programming
Human Processing programming
• Task description:
– Input, output, web forms, human driver, other information
– Human task instance
• Human drivers: interact with workers
– Functions: initialization (forms, games), retrieving results
– “Human Program” accesses workers via “human drivers”
• Recruiters: post task instances into the marketplaces (by working with marketplace drivers)
– Marketplace driver provides an interface to marketplaces
Turkalytics
• Challenge: collecting reliable data about the workers and the tasks they perform
• Why?
– If a task is not being completed, is it because no workers are seeing it? Is it because the task is currently being offered at too low a price?
– How does the task completion time break down?
– Do workers spend more time previewing tasks or doing them?
– Do they take long breaks?
– Which are the more “reliable” workers?
Interaction Model
• Search-Preview-Accept (SPA) model
Interaction Model
• Search-Continue-RapidAccept-Accept-Preview (SCRAP) model
– Continue: continue completing a task that was accepted but not submitted
– RapidAccept: accept the next task in a HITGroup w/o previewing it
Turkalytics Data Models
Turkalytics Architecture
• Client-side JavaScript (ta.js) sends log messages (JSON) to the Log Server (Ajax: POST)
• Log messages (JSON) flow from the Log Server to the Analysis Server
Implementation: client-side JavaScript
• Requester embeds a Turkalytics script (ta.js) into a HIT (when designing a HIT)
– Monitoring: detect relevant worker data and actions.
– Sending: log events by making image requests to the log server (ajax: POST)
Implementation: ta.js -- client-side JavaScript
• ta.js’s monitoring activities:
– Client Information: Worker’s screen resolution? What plugins are supported? Can ta.js set cookies?
– DOM Events: Over the course of a page view, the browser emits various events (e.g., load, submit, beforeunload, and unload events)
– Activity: listens on a second-by-second basis for the mousemove, scroll and keydown events to determine if the worker is active or inactive.
– Form Contents: examines forms on the page and their contents; logs initial form contents, incremental updates, and final state.
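The second-by-second activity check can be sketched as follows. This is an illustrative Python model of the idea, not code from ta.js: it assumes a one-second bucket counts as active when at least one mousemove, scroll, or keydown event falls inside it, and the function name and sample timestamps are hypothetical.

```python
from typing import Iterable, Tuple

def active_seconds(event_times: Iterable[float],
                   start: float, end: float) -> Tuple[int, int]:
    """Return (active_seconds, total_seconds) for one page view.

    A one-second bucket is counted as active if any input event
    (mousemove, scroll, keydown) was observed in it. Times are in
    seconds; events outside [start, end) are ignored.
    """
    total = int(end - start)
    active_buckets = {int(t - start) for t in event_times if start <= t < end}
    return len(active_buckets), total

# Made-up event timestamps for a 10-second page view.
events = [0.2, 0.7, 1.1, 5.4, 5.9, 9.0]
print(active_seconds(events, start=0.0, end=10.0))  # (4, 10)
```

Comparing the two numbers per worker is what allows active time to be contrasted with total time on a task.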
Implementation: log/analysis
• Log Server:
– Simple web app built on Google’s App Engine.
– Receives logging events from clients running ta.js and saves them to a data store.
• IP address, user agent, referer, etc.
• Analysis Server:
– Periodically polls the log server to download any new events that have been received
– Events are inserted into the DB, considering the following:
• Time constraints: data availability to analysis server
• Dependencies: if events are dependent on one another
• Incomplete input: if all events are not received yet
• Unknown input: what if unexpected input is received?
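One way to picture the dependency, incomplete-input, and unknown-input concerns is an ingester that buffers events until their prerequisites arrive and sets aside anything it does not recognize. The Python sketch below is an assumption-laden illustration, not the Turkalytics analysis server: the event types, the dependency table, and the `Ingester` class are all hypothetical.

```python
from collections import defaultdict

KNOWN_TYPES = {"load", "activity", "submit"}          # hypothetical event types
DEPENDS_ON = {"activity": "load", "submit": "load"}   # hypothetical dependencies

class Ingester:
    def __init__(self):
        self.db = []                      # stands in for the analysis database
        self.unknown = []                 # unexpected input, kept for inspection
        self.pending = defaultdict(list)  # session -> events awaiting a dependency
        self.seen = defaultdict(set)      # session -> event types already inserted

    def ingest(self, event: dict) -> None:
        etype, session = event.get("type"), event.get("session")
        if etype not in KNOWN_TYPES or session is None:
            self.unknown.append(event)    # unknown input: neither drop nor crash
            return
        needed = DEPENDS_ON.get(etype)
        if needed and needed not in self.seen[session]:
            self.pending[session].append(event)  # incomplete input: wait
            return
        self.db.append(event)
        self.seen[session].add(etype)
        # A newly inserted event may unblock buffered ones for this session.
        waiting, self.pending[session] = self.pending[session], []
        for buffered in waiting:
            self.ingest(buffered)

ing = Ingester()
for ev in [{"type": "submit", "session": "s1"},    # arrives before its "load"
           {"type": "load", "session": "s1"},
           {"type": "mystery", "session": "s1"}]:  # unexpected type
    ing.ingest(ev)
print([e["type"] for e in ing.db])  # ['load', 'submit']
print(len(ing.unknown))             # 1
```

Buffering rather than rejecting out-of-order events matters here because the analysis server only sees events in the order the polling happens to deliver them.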
Implementation: analysis
• Example log message (figure), annotated with: what type of data (event) is sent, the actual data for the given type, detailed info about the task, and the session ID
Experiments
• Tasks:
– Named Entity Recognition (NER): This task, posted in groups of 200 by a researcher in Natural Language Processing, asks workers to label words in a Wikipedia article if they correspond to people, organizations, locations, or demonyms. (2,000 HITs, 1 HIT Type, more than 500 workers.)
– Turker Count (TC): This task, posted once a week by a professor of business at U.C. Berkeley, asks workers to push a button, and is designed just to gauge how many workers are present in the marketplace. (2 HITs, 1 HIT Type, more than 1,000 workers each.)
– Create Diagram (CD): This task, posted by the authors, asked workers to draw diagrams for this paper based on hand-drawn sketches.
Experiments: origin of workers
• GeoLite City DB from MaxMind to geolocate all remote users by IP address
Experiments: worker characteristics
Experiments: states/actions
• RapidAccept is quite popular (Continue is rare)
Experiments: # previews
• Artificial recency for NER/CD (keep making them near the top in the list): NER and CD exhibit a less severe drop as opposed to TC
Experiments: activity vs. delay
• Average active and total seconds for each worker who completed the NER task (correlation 0.88)
Discussion
• Multi-tasking users? Activity vs. working time
• Privacy??
– We can collect as much as we can..
– How about Google Analytics? Any web pages that we visit can collect such information…
• False data injection?
• How can we better utilize the dataset?
– Re-designing existing tasks, pricing, etc. (or mining user behavior?)
Summary
• Turkomatic: divide and conquer strategy for performing more “challenging tasks” in M-Turk
• TurKontrol: decision-theoretic approach for work-flow control (e.g., how many improve/vote tasks?)
• Turkalytics: monitoring workers’ behavior remotely