
Task and Workflow Design II

KSE 652 Social Computing System Design and Analysis

Uichin Lee

Contents

• Turkomatic: divide and conquer strategy for performing more “challenging tasks” in M-Turk

• TurKontrol: decision-theoretic approach for work-flow control (e.g., how many improve/vote tasks?)

• Turkalytics: monitoring workers’ behavior remotely

Turkomatic: Automatic Recursive Task and Workflow Design for Mechanical Turk

CHI'11 WIP

Turkomatic

• Turkomatic interface accepts task requests written in natural language
• Subdivide phase:
– For each request, it posts a HIT to M-Turk, asking workers to break the task down into a set of logical subtasks
– Each subtask is then automatically reposted to M-Turk; a subtask can be further broken down
• Merge phase:
– Once all subtasks are completed, HITs are posted asking workers to combine subtask solutions into a coherent whole
• The end result is then delivered to the requester
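The subdivide and merge phases can be sketched as a recursive procedure. This is a minimal illustration, not Turkomatic's actual implementation: the three callbacks stand in for HITs answered by workers, and the names `solve`, `fake_subdivide`, `fake_work`, `fake_merge` and the `max_depth` limit are assumptions made for the example.

```python
from typing import Callable, List

def solve(task: str,
          subdivide: Callable[[str], List[str]],
          work: Callable[[str], str],
          merge: Callable[[str, List[str]], str],
          depth: int = 0,
          max_depth: int = 3) -> str:
    """Recursively subdivide a task, solve the leaves, and merge upward."""
    # Subdivide phase: ask a "worker" to break the task into subtasks.
    subtasks = subdivide(task) if depth < max_depth else []
    if not subtasks:
        # The task is simple enough to be done directly.
        return work(task)
    # Each subtask is reposted and may itself be broken down further.
    solutions = [solve(s, subdivide, work, merge, depth + 1, max_depth)
                 for s in subtasks]
    # Merge phase: ask a "worker" to combine the subtask solutions.
    return merge(task, solutions)

# Stand-ins for human workers (hypothetical, for illustration only).
def fake_subdivide(task: str) -> List[str]:
    plan = {"write essay": ["write intro", "write body", "write conclusion"]}
    return plan.get(task, [])

def fake_work(task: str) -> str:
    return f"<{task}>"

def fake_merge(task: str, solutions: List[str]) -> str:
    return " ".join(solutions)

essay = solve("write essay", fake_subdivide, fake_work, fake_merge)
print(essay)
```

In a real deployment each callback would post a HIT and wait for the answer, so the recursion would be driven by worker responses rather than by function returns.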

Subdivide Phase

• Decomposition of tasks, and the creation of solution elements

Divide and Merge


Evaluation

• Tasks:
– Producing a written essay in response to a prompt: “Please write a five-paragraph essay on the topic of your choice”
– Solving an example SAT test: “Please solve the 16-question SAT located at http://bit.ly/SATexam”
– Payment: $0.10 to $0.40 per HIT
• Each “subdivide” or “merge” HIT received answers within 4 hours; solutions to the initial task were completed within 72 hours
• Essay: the final essay (about “university legacy admissions”) displayed a reasonably good understanding of the topic, yet the writing quality is often mixed
• SAT: the task was divided into 12 subtasks (containing 1-3 questions each); the score was 12/17

Decision-Theoretic Control of Crowd-Sourced Workflows

Peng Dai, Mausam, Daniel S. Weld

AAAI 2010

Motivation

• The iterative workflow (i.e., improve and vote) used in TurKit raises the following questions:
– What is the optimal number of iterations?
– How many ballots (votes) should we use?
– How do answers change if the workers are more/less skilled?

Iterative workflow

TurKontrol: Computation Model

• Text α is improved to text α′ (after an improve task)
• Given a pair (α, α′), a series of votes ($b_k$) can be received to judge which one is better

TurKontrol: Computation Model

• Text α: quality density function $f_Q(q)$ (prior)
• A worker x takes an improvement job and submits α′
• Text α′ done by worker x: quality density function $f_{Q'|q,x}(q')$ (posterior)
• Quality density function of text α′
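The equation for the last bullet did not survive in this copy of the slides. One standard way to obtain the density of α′ from the two densities defined above is to marginalize the worker-conditional density over the prior on q. The following is a reconstruction under that assumption (and assuming quality values lie in [0, 1]), not a quotation of the slide:

```latex
f_{Q'}(q') = \int_0^1 f_{Q' \mid q, x}(q') \, f_Q(q) \, dq
```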

TurKontrol: Computation Model

• Voting:
– A series of n votes: $b = b_1, b_2, \ldots, b_n$ where $b_i \in \{1, 0\}$
– Posterior densities after n votes: $f_{Q|b}(q)$ and $f_{Q'|b}(q')$
• Difficulty:
– The closer the two results, the more difficult they are to judge
– $d(q, q') = 1 - |q - q'|^M$ where M is a constant, and $d \in [0, 1]$
• Accuracy (of a worker x):
– $a_x(d) = \tfrac{1}{2}[1 + (1 - d)^r]$ where r is a knob for controlling the accuracy distribution

If the i-th worker $x_i$ has accuracy $a_{x_i}(d)$,
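To make the difficulty and accuracy formulas concrete, the sketch below evaluates them and applies a single vote as a Bayesian update on a discretized quality grid. It is an illustration under stated assumptions, not the paper's exact procedure: the values `m=0.5` and `r=1.0`, the grid, the uniform priors, the convention that a vote of 1 means "α′ is better", and the choice to treat the two priors as independent are all assumptions, and `update_with_vote` is a hypothetical name.

```python
def difficulty(q, q_prime, m=0.5):
    """d(q, q') = 1 - |q - q'|^M; close qualities give difficulty near 1."""
    return 1.0 - abs(q - q_prime) ** m

def accuracy(d, r=1.0):
    """a_x(d) = 1/2 * [1 + (1 - d)^r]; ranges from 0.5 (guess) to 1.0."""
    return 0.5 * (1.0 + (1.0 - d) ** r)

def update_with_vote(prior_q, prior_qp, grid, vote, m=0.5, r=1.0):
    """Return posterior masses over the grid for Q and Q' after one vote.

    vote == 1 is assumed to mean the worker judged the improved text better.
    """
    n = len(grid)
    joint = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i, q in enumerate(grid):
        for j, qp in enumerate(grid):
            acc = accuracy(difficulty(q, qp, m), r)
            if qp > q:
                p_one = acc          # a correct worker votes 1
            elif qp < q:
                p_one = 1.0 - acc    # only a mistaken worker votes 1
            else:
                p_one = 0.5          # equal quality: a coin flip
            likelihood = p_one if vote == 1 else 1.0 - p_one
            joint[i][j] = prior_q[i] * prior_qp[j] * likelihood
            total += joint[i][j]
    post_q = [sum(joint[i]) / total for i in range(n)]
    post_qp = [sum(joint[i][j] for i in range(n)) / total for j in range(n)]
    return post_q, post_qp

def mean(grid, masses):
    return sum(g * w for g, w in zip(grid, masses))

grid = [i / 10 for i in range(11)]
uniform = [1 / 11] * 11
post_q, post_qp = update_with_vote(uniform, uniform, grid, vote=1)
print(mean(grid, post_q), mean(grid, post_qp))
```

After a vote in favor of α′, the expected quality of α′ rises above that of α; repeating the update with each new ballot gives the sequence of posteriors used by the controller.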

TurKontrol: Computation Model

• For a given pair (α, α′), the posterior densities of (Q, Q′) are $f_{Q|b}(q)$ and $f_{Q'|b}(q')$
• Given that we don’t know the worker, an average worker is used

TurKontrol: Computation Model

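[Figure: workflow diagram. An improve task (cost $c_{imp}$) turns α into α′, with quality densities $f_Q(q)$ and $f_{Q'}(q')$. Each vote task (cost $c_b$) on the pair (α, α′) updates them to $f_{Q|b}(q)$, $f_{Q'|b}(q')$ and then to $f_{Q|b+1}(q)$, $f_{Q'|b+1}(q')$. Inset: the utility function, plotted as utility versus quality.]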

TurKontrol: Computation Model

• Utility estimation of a pair (α, α′) for (1) an improve task and (2) a vote task
– (1) utility of an improve task
– (2) utility of a vote task
• Decision making:
– Three options: (a) vote, (b) improve, or (c) accept
– k-step lookahead: evaluate all sequences of k decisions, and find the sequence with the highest utility
• Notation: U: utility function; $c_b$: vote cost; $c_{imp}$: improve cost
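The k-step lookahead can be sketched as a plain enumeration over decision sequences. Only the enumeration reflects the slide; the estimator is a deliberately crude toy. In `toy_estimate`, each improve closes half of the remaining quality gap and a vote only costs money, whereas in TurKontrol the value of a vote comes from updating the quality beliefs. All names and numeric values here are hypothetical.

```python
from itertools import product

ACTIONS = ("vote", "improve", "accept")

def truncate_at_accept(seq):
    """Decisions after 'accept' never happen, so cut the sequence there."""
    out = []
    for action in seq:
        out.append(action)
        if action == "accept":
            break
    return tuple(out)

def lookahead(estimate_net_utility, k):
    """Evaluate every sequence of up to k decisions and return the best one."""
    candidates = {truncate_at_accept(s) for s in product(ACTIONS, repeat=k)}
    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates), key=estimate_net_utility)

def toy_estimate(seq, quality=0.5, c_imp=0.03, c_b=0.01):
    """Toy model (assumption): expected quality minus accumulated cost."""
    cost = 0.0
    for action in seq:
        if action == "improve":
            quality += 0.5 * (1.0 - quality)  # diminishing returns
            cost += c_imp
        elif action == "vote":
            cost += c_b                       # no belief update in this toy
    return quality - cost

best = lookahead(toy_estimate, k=2)
print(best, toy_estimate(best))
```

A controller built this way would execute only the first decision of the best sequence, observe the result, and then plan again.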

Numerical Results

• Convex utility function with max 1000
• Fixed cost (improve, vote) = (30, 10)
• Net utility: utility of the submitted artifact minus payment to workers
• TurKit: performs as many iterations as possible (max allowance 400)
• TurKontrol (2): lookahead of 2
• cf: accuracy of workers $a_x(d) = \tfrac{1}{2}[1 + (1 - d)^r]$

Turkalytics: Real-time Analytics for Human Computation

Paul Heymann and Hector Garcia-Molina

WWW'11

Basic Buyer human programming

• A human program generates forms, which are advertised through a marketplace.
• Workers look at posts, and then complete the forms for compensation.

Game Maker human programming

• The programmer writes a human program and a game.
• The game implements features to make it fun and difficult to cheat.
• The human program loads and dumps data from the game.

Human Processing programming

• Task description:
– Input, output, web forms, human driver, other information
– Human task instance
• Human drivers: interact with workers
– Functions: initialization (forms, games), retrieving results
– “Human Program” accesses workers via “human drivers”
• Recruiters: post task instances into the marketplaces (by working with marketplace drivers)
– Marketplace driver provides an interface to marketplaces

Turkalytics

• Challenge: collecting reliable data about the workers and the tasks they perform

• Why?
– If a task is not being completed, is it because no workers are seeing it? Is it because the task is currently being offered at too low a price?
– How does the task completion time break down?
– Do workers spend more time previewing tasks or doing them?
– Do they take long breaks?
– Which are the more “reliable” workers?

Interaction Model

• Search-Preview-Accept (SPA) model

Interaction Model

• Search-Continue-RapidAccept-Accept-Preview (SCRAP) model
– Continue: continue completing a task that was accepted but not submitted
– RapidAccept: accept the next task in a HITGroup without previewing it

Turkalytics Data Models

Turkalytics Architecture

[Figure: architecture diagram. Client-side JavaScript (ta.js) running in each worker's browser sends log messages (JSON) to the Log Server via Ajax POST; log messages (JSON) then flow from the Log Server to the Analysis Server.]

Implementation: client-side JavaScript

• Requester embeds a Turkalytics script (ta.js) into a HIT (when designing a HIT)
– Monitoring: detect relevant worker data and actions.
– Sending: log events by making image requests to the log server (ajax: POST)

Implementation: ta.js -- client-side JavaScript

• ta.js’s monitoring activities:
– Client Information: What is the worker’s screen resolution? What plugins are supported? Can ta.js set cookies?
– DOM Events: over the course of a page view, the browser emits various events (e.g., load, submit, beforeunload, and unload events)
– Activity: listens on a second-by-second basis for the mousemove, scroll and keydown events to determine if the worker is active or inactive.
– Form Contents: examines forms on the page and their contents; logs initial form contents, incremental updates, and final state.
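The second-by-second activity check can be sketched as bucketing event timestamps into one-second slots. This is an illustration in Python rather than the browser-side JavaScript of ta.js, and it assumes timestamps in seconds; the function name `activity_summary` and the sample timestamps are hypothetical.

```python
def activity_summary(event_times, load_time, unload_time):
    """Count the seconds of a page view that saw at least one input event.

    event_times: timestamps (in seconds) of mousemove/scroll/keydown events.
    A second is "active" if any event falls inside it, "inactive" otherwise.
    """
    total_seconds = int(unload_time - load_time)
    active_slots = {
        int(t - load_time)
        for t in event_times
        if load_time <= t < unload_time
    }
    return {
        "active_seconds": len(active_slots),
        "total_seconds": total_seconds,
    }

# Four events over a 10-second page view; two of them share the first second.
summary = activity_summary([0.2, 0.7, 1.1, 5.9], load_time=0.0, unload_time=10.0)
print(summary)
```

Keeping only one flag per second, instead of every raw event, bounds the amount of data logged per page view regardless of how much the worker moves the mouse.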

Implementation: log/analysis

• Log Server:
– Simple web app built on Google’s App Engine.
– Receives logging events from clients running ta.js and saves them to a data store.
  • IP address, user agent, referer, etc.
• Analysis Server:
– Periodically polls the log server to download any new events that have been received
– Events are inserted into the DB, considering the following:
  • Time constraints: data availability to the analysis server
  • Dependencies: whether events depend on one another
  • Incomplete input: what if not all events have been received yet?
  • Unknown input: what if unexpected input is received?

Implementation: analysis

[Figure: example log message with annotations marking the type of data (event) that is sent, the actual data for that type, detailed information about the task, and the session ID.]

Experiments

• Tasks:
– Named Entity Recognition (NER): This task, posted in groups of 200 by a researcher in Natural Language Processing, asks workers to label words in a Wikipedia article if they correspond to people, organizations, locations, or demonyms. (2,000 HITs, 1 HIT Type, more than 500 workers.)
– Turker Count (TC): This task, posted once a week by a professor of business at U.C. Berkeley, asks workers to push a button, and is designed just to gauge how many workers are present in the marketplace. (2 HITs, 1 HIT Type, more than 1,000 workers each.)
– Create Diagram (CD): This task, posted by the authors, asked workers to draw diagrams for this paper based on hand-drawn sketches.

Experiments: origin of workers

• GeoLite City DB from MaxMind to geolocate all remote users by IP address

Experiments: worker characteristics

Experiments: states/actions

• RapidAccept is quite popular (Continue is rare)

Experiments: # previews

• Artificial recency for NER/CD (the tasks are repeatedly kept near the top of the list): NER and CD exhibit a less severe drop than TC

Experiments: activity vs. delay

• Average active and total seconds for each worker who completed the NER task (correlation 0.88)
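A correlation such as the one quoted above is typically a Pearson coefficient over per-worker pairs of (active seconds, total seconds). The sketch below shows that computation, assuming Pearson is the measure used; the per-worker numbers are made up for illustration and are not taken from the Turkalytics data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-worker averages (seconds); not real measurements.
active_seconds = [30, 45, 60, 90]
total_seconds = [40, 70, 65, 120]
r = pearson(active_seconds, total_seconds)
print(round(r, 2))
```

A value close to 1 means that workers who take longer overall are also active for longer, rather than simply leaving the task idle.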

Discussion

• Multi-tasking users? Activity vs. working time
• Privacy?
– We can collect as much as we can...
– How about Google Analytics? Any web pages that we visit can collect such information...
• False data injection?
• How can we better utilize the dataset?
– Re-designing existing tasks, pricing, etc. (or mining user behavior?)

Summary

• Turkomatic: divide and conquer strategy for performing more “challenging tasks” in M-Turk

• TurKontrol: decision-theoretic approach for work-flow control (e.g., how many improve/vote tasks?)

• Turkalytics: monitoring workers’ behavior remotely