Crowdsourcing: Challenges & Opportunities in Web Science

Ujwal Gadiraju

Web Science Course, Sommersemester 2016-17

April 26th, 2016

Source: altamartv

Source: http://www.mission4636.org/

Dalila: I need Thomassin Apo please
Apo: Kenscoff Route: Lat: 18.495746829274168, Long: -72.31849193572998
Apo: This Area after Petion-Ville and Pelerin 5 is not on Google Map. We have no streets name
Apo: I know this place like my pocket
Dalila: thank God u was here

“just got emergency SMS, child delivery, USCG are acting, and the GPS coordinates of the location we got from the translators were 100% accurate!”

Mission 4636

● People from over 50 countries participated in relief efforts
● Free phone number 4636
● Maps about aid stations and food distribution centers
● Sustainability: Created 100 jobs

Ahead of the curve in all relief efforts!

HOW?!

A triumph of people working together and doing their small bits.

CONTENTS

➢ Crowdsourcing
  ○ Implicit vs. Explicit Data Collection
  ○ Intrinsic vs. Extrinsic Motivation
  ○ Microtask Crowdsourcing

➢ Quality Control Mechanisms
  ○ Gold Standard Questions
  ○ Qualification Tests & Pre-screening
  ○ Task Design
  ○ Worker Behavioral Metrics

➢ Applications in Web Science

Crowdsourcing - A Brief Introduction

“The whole is greater than the sum of its parts.”

- Aristotle

● Accumulating small contributions from each crowd worker to solve a bigger problem.

Another popular outcome of a crowdsourcing initiative!

Crowdsourcing - A Definition

“Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.”

-- Jeff Howe, 2006

Implicit vs. Explicit Data Collection

Implicit ⇒ When the crowd is unaware of what exactly their actions in given tasks are contributing to.

vs.

Explicit ⇒ When the crowd is fully aware of the goal they are trying to achieve by completing a given task.

Intrinsic vs. Extrinsic Motivation

Intrinsic ⇒ When the crowd is motivated by factors inherent to the task itself. For example, altruistic participation.

vs.

Extrinsic ⇒ When the crowd is motivated by factors external to the task. For example, monetary rewards.

More than fun and money. Worker Motivation in Crowdsourcing - A Study on Mechanical Turk. Kaufmann, Nicolas, Thimo Schulze, and Daniel Veit. AMCIS. Vol. 11. 2011.

Paid Microtask Crowdsourcing

Crowdsourcing Gone Awry

Example: Sochi Winter Olympics 2014 Mascot

Quality Control Mechanisms (1/2)

Challenges

○ Diverse pool of workers

○ Wide range of behavior

○ Various motivations

Ross, J., Irani, L., Silberman, M., Zaldivar, A. and Tomlinson, B. Who are the crowdworkers?: shifting demographics in mechanical turk. In CHI'10 Extended Abstracts on Human factors in computing systems. ACM.

Kazai, Gabriella, Jaap Kamps, and Natasa Milic-Frayling. The face of quality in crowdsourcing relevance labels: demographics, personality and labeling accuracy. Proceedings of CIKM’12. ACM.

Quality Control Mechanisms (2/2)

Gold-standard Questions

⇒ Relying on questions whose answers are known in advance to filter out low-quality workers.

Qualification Tests/Pre-screening

⇒ Relying on screening to predict crowd work quality.

Task Design & Behavioral Metrics

⇒ Using task design and worker behavior to ensure good quality.
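As a minimal sketch of gold-standard filtering: the function names, the data layout, and the default threshold of 1.0 (every gold question answered correctly) are assumptions made for illustration, not part of any platform's API.

```python
from collections import defaultdict

def gold_accuracy(answers, gold):
    """Return each worker's accuracy on the gold-standard questions only.

    answers: iterable of (worker_id, question_id, answer) tuples.
    gold: dict mapping a gold question_id to its known correct answer.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker_id, question_id, answer in answers:
        if question_id in gold:
            seen[worker_id] += 1
            if answer == gold[question_id]:
                correct[worker_id] += 1
    return {worker: correct[worker] / seen[worker] for worker in seen}

def trusted_workers(answers, gold, min_accuracy=1.0):
    """Keep workers whose gold accuracy reaches the assumed threshold."""
    accuracy = gold_accuracy(answers, gold)
    return {worker for worker, acc in accuracy.items() if acc >= min_accuracy}

# Hypothetical answers: g1 and g2 are gold questions, q7 is a regular question.
answers = [
    ("w1", "g1", "B"), ("w1", "g2", "A"), ("w1", "q7", "yes"),
    ("w2", "g1", "B"), ("w2", "g2", "C"), ("w2", "q7", "no"),
]
gold = {"g1": "B", "g2": "A"}
```

Lowering `min_accuracy` trades stricter filtering for a larger pool of retained workers.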

Oleson, David, et al. “Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing.” Human Computation (2011).

Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys. Ujwal Gadiraju, Ricardo Kawase, Stefan Dietze, and Gianluca Demartini. In CHI’15.

Survey Design

➢ CrowdFlower Platform to deploy survey

➢ Survey questions
  ○ Demographics
  ○ Educational & general background

➢ 34 Questions in total
  ○ Open-ended
  ○ Multiple Choice
  ○ Likert-type

➢ Responses from 1000 crowd workers
  ○ Monetary Compensation per worker: 0.2 USD

❏ Questions regarding previous tasks that were successfully completed
❏ 2 Attention-check questions
❏ Engage workers
❏ Gold-standard to separate Trustworthy/Untrustworthy workers (we found 568 trustworthy, 432 untrustworthy)

Analyzing Malicious Behavior in the Crowd

Based on the following aspects, we investigated the behavioral patterns of crowd workers.

I. eligibility of a worker to participate in a task

II. conformance to the pre-set rules

III. satisfying expected requirements fully

Malicious Workers

“workers with ulterior motives, who either simply sabotage a task, or provide poor responses in an attempt to quickly attain task completion for monetary gains”

➢ Typically adopted solution to prevent/flag malicious activity: Gold-Standard Questions

➢ Flourishing crowdsourcing markets, advances in malicious activity

Need to understand worker behavior and types of malicious activity.

Worker Behavioral Patterns

Ineligible Workers (IW)
Instruction: Please attempt this microtask ONLY IF you have successfully completed 5 microtasks previously.
Response: ‘this is my first task’

Fast Deceivers (FD)
E.g.: Copy-pasting same text in response to multiple questions, entering gibberish, etc.
Response: ‘What’s your task?’, ‘adasd’, ‘fgfgf gsd ljlkj’

Rule Breakers (RB)
Instruction: Identify 5 keywords that represent this task (separated by commas).
Response: ‘survey, tasks, history’, ‘previous task yellow’

Smart Deceivers (SD)
Instruction: Identify 5 keywords that represent this task (separated by commas).
Response: ‘one, two, three, four, five’

Gold Standard Preys (GSP)
These workers abide by the instructions and provide valid responses, but stumble at the gold-standard questions!
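Some of these patterns can be flagged automatically. The sketch below is illustrative only: both helper functions are hypothetical heuristics, not the manual annotation procedure used in the study. It checks the "5 keywords separated by commas" rule and detects text pasted into more than one answer.

```python
def follows_keyword_rule(response, expected=5):
    """True if the response is exactly `expected` non-empty, comma-separated keywords."""
    keywords = [keyword.strip() for keyword in response.split(",")]
    return len(keywords) == expected and all(keywords)

def repeated_responses(responses):
    """Return the normalized texts that a worker submitted more than once."""
    seen, repeated = set(), set()
    for text in responses:
        normalized = " ".join(text.lower().split())
        if normalized in seen:
            repeated.add(normalized)
        seen.add(normalized)
    return repeated
```

A purely syntactic check such as `follows_keyword_rule` accepts ‘one, two, three, four, five’, which illustrates why Smart Deceivers are harder to catch than Rule Breakers.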

Our Observations

We manually annotated each response from the 1000 workers.

➢ 568 workers passed the gold-standard: Trustworthy workers (TW)

➢ 432 workers failed to pass the gold-standard: Untrustworthy workers (UW)

➢ 335 trustworthy workers gave perfect responses: Elite workers

➢ 665 non-elite workers (233 TW, 432 UW) were manually classified into the different classes according to their behavioral patterns.
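The counts above are mutually consistent, which a few lines of arithmetic confirm. The variable names are chosen here for illustration.

```python
total_workers = 1000
trustworthy = 568      # passed the gold-standard questions
untrustworthy = 432    # failed the gold-standard questions
elite = 335            # trustworthy workers with perfect responses

# Trustworthy workers who were not elite, and all non-elite workers.
non_elite_trustworthy = trustworthy - elite
non_elite = non_elite_trustworthy + untrustworthy
```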

Distribution of Workers

Measuring the Maliciousness of Workers

Acceptability: “The acceptability of a response can be assessed based on the extent to which a response meets the priorly stated expectations.”

E.g.
Instruction: Identify 5 keywords that represent this task (separated by commas).
Response: ‘survey, tasks, history’ ⇒ ‘0’
Response: ‘previous, job, finding, authors, books’ ⇒ ‘1’

where n is the total number of responses from a worker and A_{r_i} represents the acceptability of response ‘i’

We consider only open-ended questions!
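The formula itself is not preserved in this transcript. One reading consistent with the surviving definitions is that a worker's maliciousness is the complement of their average acceptability, M = 1 - (1/n) Σ A_{r_i}. That form is an assumption, and the sketch below simply implements it for 0/1 acceptability values.

```python
def maliciousness(acceptability):
    """Assumed form: M = 1 - (1/n) * sum of acceptability over n open-ended responses.

    acceptability: list of 0/1 values, one per assessed response of a worker.
    """
    if not acceptability:
        raise ValueError("at least one assessed response is required")
    if any(value not in (0, 1) for value in acceptability):
        raise ValueError("acceptability values must be 0 or 1")
    return 1 - sum(acceptability) / len(acceptability)
```

Under this reading a worker with only acceptable responses scores 0, and a worker with none scores 1.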

Degree of Maliciousness of Crowd Workers

Figure: Degree of maliciousness of trustworthy (TW) and untrustworthy (UW) workers and their average task completion time (r=0.51).
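Assuming the reported r is Pearson's correlation coefficient (the slide does not say which coefficient was used), it could be computed from paired per-worker values as sketched below. The function is a plain-Python illustration, not the authors' analysis code.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)
```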

Tipping Point

“the first point at which a worker begins to exhibit malicious behavior after having provided an acceptable response”
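Read literally, this definition can be applied to a worker's responses in the order they were given. The sketch assumes each response has already been scored as acceptable (1) or not (0); the function name and the 0-based index convention are choices made here.

```python
def tipping_point(acceptability):
    """Return the 0-based index of the first unacceptable response (0) that
    follows at least one acceptable response (1), or None if there is none."""
    seen_acceptable = False
    for index, value in enumerate(acceptability):
        if value == 1:
            seen_acceptable = True
        elif seen_acceptable:
            return index
    return None
```

A worker who never gives an acceptable response has no tipping point under this reading, because the definition requires a preceding acceptable response.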

Task Design Guidelines

❏ Using the ‘Tipping Point’ for early detection of malicious activity.
❏ Using ‘Malicious Intent’ as a measure to discard unreliable responses from workers and improve the quality of results.
❏ Pre-screening to tackle Ineligible Workers (IW).
❏ Stringent and persistent validators and monitoring worker progress to tackle Fast Deceivers (FD) and Rule Breakers (RB).
❏ Psychometric approaches to tackle Smart Deceivers (SD).
❏ Post-processing to accommodate fair responses from Gold-standard Preys (GSP).

Application of Crowdsourcing in Web Science…

Ranking Buildings & Mining the Web for Popular Architectural Patterns

Ranking Buildings and Mining the Web for Popular Architectural Patterns. Ujwal Gadiraju, Stefan Dietze and Ernesto Diaz-Aviles. WebScience 2015, Oxford, UK.

Camillo Sitte

Main works are “an aesthetic criticism” of 19th century urbanism.

The whole is much more than the sum of its parts.

“City Planning according to artistic principles.”

Form follows function VS Ornamentalism

Louis Sullivan

Father of Modernism. Father of Skyscrapers.

“That life is recognizable in its expression,

That form ever follows function.

This is the law.”

Built Environment

Space Syntax

IMPLICATIONS

● Urban planning
● Impact of an architectural structure
● Identify needs for restructuring, adequate maintenance and trigger retrofit scenarios
● Predict impact of building projects

What do People Think About Buildings?

● (On the way)/(at) home, work, play.
● Buildings evoke feelings [1,2].
● Research has established that buildings shape the built environment.
● Built environment influences various aspects within a community.

[1]. Brain electrical responses to high-and low-ranking buildings. Oppenheim et al. Clinical EEG and Neuroscience, 2009.

[2]. Hippocampal contributions to the processing of architectural ranking. Oppenheim et al. NeuroImage, 2010.

Surveying Experts to establish Influential Factors

Building Types

- Skyscrapers
- Bridges
- Churches
- Halls
- Airports

Emerging factors:

● Historic importance
● Effect on/of the surroundings/built environment
● Materials used
● Size of the building/structure
● Personal experiences
● Level of Details

Emerging factors:

- Ease of access to airport
- Efficiency of movement/processing inside airport
- General design & Appearance

Crowdsourcing Ground Truth

● 5-point Likert Scale (Strongly Dislike - Strongly Like)

● Gold Standards and precautions to detect and curtail malicious workers or bots [1].

● Images presented with same resolution and dimensions [2].

● Avoid bias by using images from Wikimedia Commons.

● 18,500 trusted responses from 7,396 workers.

[1]. Understanding Malicious Behavior on Crowdsourcing Platforms - The Case of Online Surveys. Ujwal Gadiraju, Ricardo Kawase, Stefan Dietze and Gianluca Demartini. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. 2015.

[2]. “Size does matter: how image size affects aesthetic perception?” Chu, Wei-Ta, Yu-Kuang Chen, and Kuan-Ta Chen. In Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.
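The slide does not say how the Likert responses were aggregated into a ground-truth ranking. One simple possibility, shown purely as an assumption, is to map the 5-point scale to 1-5 and rank buildings by mean rating. The three intermediate labels and the building names are hypothetical, since only the scale's endpoints are given.

```python
from collections import defaultdict

# Only the endpoints come from the slide; the middle labels are assumed.
LIKERT = {
    "Strongly Dislike": 1, "Dislike": 2, "Neutral": 3,
    "Like": 4, "Strongly Like": 5,
}

def rank_by_mean_rating(responses):
    """Rank buildings by mean Likert score, best first.

    responses: iterable of (building, likert_label) pairs from trusted workers.
    """
    totals = defaultdict(lambda: [0, 0])  # building -> [score sum, response count]
    for building, label in responses:
        totals[building][0] += LIKERT[label]
        totals[building][1] += 1
    means = {b: score_sum / count for b, (score_sum, count) in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Hypothetical responses for two buildings.
responses = [
    ("Building A", "Strongly Like"), ("Building A", "Like"),
    ("Building B", "Neutral"), ("Building B", "Dislike"),
]
```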

Emerging Influential Factors

Processing Pipeline for Automated Ranking of Buildings

Crowdsourcing

Web Mining

● News Articles and Blogs
● Tweets
● Meta-data from Flickr images (title, description, tags, favorites, comments)

Automated Ranking-Workflow

Dataset Characteristics

Models for Ranking Buildings

● Based on perception-related metadata from relevant Flickr images.
● Sentic feature vectors using EmoLex.
● RankSVM to learn model(s).
● Feature selection for construction of different models.
● Best performing model: Weighted Model (weighted combination of feature vectors according to influential factors)
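In its common formulation, RankSVM reduces ranking to binary classification over differences of feature vectors. The sketch below shows only that pairwise transformation, not the authors' pipeline: the function name is hypothetical and the feature vectors stand in for the EmoLex-based ones. A linear SVM would then be trained on the resulting pairs.

```python
def pairwise_differences(features, scores):
    """Turn scored items into (difference_vector, label) training pairs.

    The label is +1 when the first item of a pair has the higher score and -1
    otherwise; pairs with equal scores express no preference and are skipped.
    """
    pairs = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if scores[i] == scores[j]:
                continue
            difference = [a - b for a, b in zip(features[i], features[j])]
            label = 1 if scores[i] > scores[j] else -1
            pairs.append((difference, label))
    return pairs

# Hypothetical two-dimensional feature vectors with crowd-derived scores.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
scores = [4.5, 3.0, 3.0]
pairs = pairwise_differences(features, scores)
```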

[Workflow diagram with the elements: Properties; Influential Factors; Ground Truth (Crowdsourcing); Ranking Models; Ranked List (top-k); CORRELATE; Well-perceived patterns for Architectural Structures]

DBpedia properties corresponding to Influential Factors

Caveat: Coverage of DBpedia properties w.r.t. influential factors is limited

SIZE

dbpedia-owl: runwayLength

dbpedia-owl: Length

dbprop: architectureStyle

dbprop: seatingCapacity

dbpedia: floorCount
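Properties like these can be retrieved with SPARQL. The sketch below only builds a query string and sends nothing over the network. It assumes the `dbo:` prefix for the `http://dbpedia.org/ontology/` namespace (the namespace the `dbpedia-owl:` prefix on this slide referred to), and `property_query` is a hypothetical helper name.

```python
DBO = "http://dbpedia.org/ontology/"

def property_query(property_name, limit=10):
    """Build a SPARQL query listing resources that carry a DBpedia ontology property."""
    if not property_name.isidentifier():
        raise ValueError("property name must be a plain identifier")
    return (
        f"PREFIX dbo: <{DBO}>\n"
        "SELECT ?resource ?value WHERE {\n"
        f"  ?resource dbo:{property_name} ?value .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

# Example: resources with a floor count, as a proxy for the SIZE factor.
query = property_query("floorCount", limit=5)
```

Given the caveat above, such a query returns values only for buildings whose DBpedia entry actually includes the property.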

Consolidation of Patterns

CHURCHES: Best-perceived Architectural Styles

● Gothic Revival
● Romanesque
● Gothic

Conclusions & Future Work

● Functionalism vs Ornamentalism?
● Correlating building rankings with structured data from the Web can help us to establish popular architectural patterns.
● Building type-specific methods are important.
● Multidimensional architectural patterns through regression of influential factors.
● Using Web Data (both social and structured) in order to fill in the missing gaps.

For example, buildings with x size, y uniqueness, z materials used, … are best perceived.

SUMMARY

➢ Crowdsourcing
  ○ Implicit vs. Explicit Data Collection
  ○ Intrinsic vs. Extrinsic Motivation
  ○ Microtask Crowdsourcing

➢ Quality Control Mechanisms
  ○ Gold Standard Questions
  ○ Qualification Tests & Pre-screening
  ○ Task Design
  ○ Worker Behavioral Metrics

➢ Applications in Web Science

Contact Details :

[email protected]

http://www.L3S.de

