Understanding and Defending Against Malicious Crowdsourcing (...gangw/poster_turf.pdf)

Date post:	02-Feb-2018
Category:	Documents
Upload:	lydang


Transcript


Understanding and Defending Against Malicious Crowdsourcing
University of California at Santa Barbara

Ben Y. Zhao (PI), Haitao Zheng (CoPI), Gang Wang (PhD student), SANDLab
http://sandlab.cs.ucsb.edu

1. Malicious Crowdsourcing
2. Understanding Crowdturfing
3. Defense: Machine Learning Classifiers
Adversarial Machine Learning

More accurate classifiers can be more vulnerable

[1] G. Wang, T. Wang, H. Zheng, B. Zhao. Man vs. machine: practical adversarial detection of malicious crowdsourcing workers. In Proc. of USENIX Security (2014)
[2] G. Wang, C. Wilson, X. Zhao, Y. Zhu, M. Mohanlal, H. Zheng, B. Zhao. Serf and turf: crowdturfing for fun and profit. In Proc. of WWW (2012)

Summary

New Threat: Malicious Crowdsourcing = Crowdturfing
+ Hire a large group of real Internet users for malicious attacks
+ Fake reviews, rumors, targeted spam
+ Most existing defenses (e.g., CAPTCHA) fail against real users

Research Questions
+ How does crowdturfing work? [2]
+ What are the scale, economics, and impact of crowdturfing campaigns? [2]
+ How to defend against crowdturfing? [1]

Crowdturfing Sites
+ Web services that recruit Internet users as workers (spam for $)
+ Connect workers to customers who want to run malicious campaigns

Key Players
+ Customers: pay to run a campaign
+ Workers: real users, spam for $
+ Target Networks: social networks, review sites

Scale and Revenue
+ Measurements of the two largest crowdturfing sites (in China)
  - ZBJ (zhubajie.com), five years
  - SDH (sandaha.com), two years
+ 18.5M tasks, 79K campaigns, 180K workers
+ Millions of dollars in revenue per month
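The aggregate figures above can be turned into rough per-campaign and per-worker averages. The short sketch below only divides the totals reported on the poster; it assumes the totals cover both sites combined and says nothing about how tasks are actually distributed across campaigns or workers.

```python
# Totals as reported on the poster (ZBJ and SDH combined, by assumption).
tasks = 18_500_000
campaigns = 79_000
workers = 180_000

# Simple averages; real distributions are likely far from uniform.
tasks_per_campaign = tasks / campaigns   # about 234
tasks_per_worker = tasks / workers       # about 103

print(f"{tasks_per_campaign:.0f} tasks per campaign on average")
print(f"{tasks_per_worker:.0f} tasks per worker on average")
```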

Crowdturfing around the World

ZBJ, SDH, Fiverr, Freelancer, MinuteWorkers, Myeasytasks, Microworkers, Shorttasks, Paisalive

Machine Learning (ML) vs. Crowdturfing
+ Simple methods (e.g., CAPTCHA, rate limits) do not work on real users
+ Machine learning: more sophisticated modeling of user behaviors
+ Perfect context to study adversarial machine learning
  - Human workers can adapt to evade classifiers
  - Crowdturfing admins can tamper with training data by changing worker behaviors

How Effective is an ML-based Detector?
+ Ground truth: 28K workers in crowdturfing campaigns on Weibo (Chinese Twitter)
+ Baseline users: 371K Weibo user accounts
+ 30 behavioral features
+ Classifiers: Random Forest, Decision Tree, SVM, Naive Bayes, Bayesian Network

[Figure: False Positive Rate and False Negative Rate (axis 0% to 60%) for each classifier: RF, Tree, SVMr, SVMp, NB, BN]

+ Random Forest is the most accurate (95% accuracy)
+ 99% accuracy on professional workers (>100 tasks)
+ How robust are those classifiers?
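The evaluation above is reported in terms of false positive rate, false negative rate, and accuracy. As a reminder of how these relate, here is a minimal sketch in plain Python. The label convention (1 = crowdturfing worker, 0 = baseline user) and the tiny example vectors are assumptions for illustration and are not data from the study.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate, accuracy).

    Assumed labels: 1 = crowdturfing worker, 0 = baseline user.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty classes to avoid division by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return fpr, fnr, acc


# Made-up example: 4 workers and 6 baseline users.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
fpr, fnr, acc = error_rates(y_true, y_pred)
print(f"FPR={fpr:.3f} FNR={fnr:.3f} accuracy={acc:.3f}")
```

Because baseline users far outnumber workers in a setting like this (371K versus 28K), accuracy alone can hide a high false negative rate, which is one reason to look at both error rates separately.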

[Figure: Training Data, Training (e.g., SVM), and Classifier, annotated with the Poisoning Attack and the Evasion Attack]

+ Evasion attack: individual workers change their behaviors to evade detection
  - Impact: changing a single feature lets 95% of workers evade detection
+ Poisoning attack: site admins tamper with training data to mislead classifier training
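To make the evasion attack concrete, the sketch below searches for a single feature change that flips a detector's decision. Everything here is a toy assumption: the rule-based `toy_detector`, the feature names `retweet_ratio` and `tweets_per_day`, and the thresholds are hypothetical and are not the features or classifiers used in the study.

```python
def toy_detector(account):
    """Hypothetical rule: flag accounts that mostly retweet and post very often."""
    return account["retweet_ratio"] > 0.8 and account["tweets_per_day"] > 20


def single_feature_evasion(detector, account, candidate_values):
    """Return the first (feature, value) change that evades the detector, or None.

    Only one feature is modified at a time; all others keep their original value.
    """
    for feature, values in candidate_values.items():
        for value in values:
            modified = dict(account, **{feature: value})
            if not detector(modified):
                return feature, value
    return None


worker = {"retweet_ratio": 0.95, "tweets_per_day": 40}
candidates = {
    "retweet_ratio": [0.9, 0.7],   # values the worker is willing to try
    "tweets_per_day": [30, 15],
}

change = single_feature_evasion(toy_detector, worker, candidates)
print("detected before change:", toy_detector(worker))
print("evading change:", change)
```

In this toy case, lowering `retweet_ratio` to 0.7 is enough on its own, which mirrors the single-feature evasion described above in spirit only.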

+ Machine learning classifiers are effective against current crowd-workers
+ Classifiers are highly vulnerable to adversarial attacks. Future work will focus on improving the robustness of ML classifiers

Example: Poisoning Attack
+ Inject mislabeled samples into the training data, which leads to a wrong classifier (e.g., inject benign accounts as “workers” in the training data)
+ Uniformly change workers' behavior by enforcing task policies, which makes it hard to train an accurate classifier
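The first poisoning strategy can be illustrated with a deliberately small model. The sketch below trains a one-feature threshold classifier (a decision stump) on clean data, then on data where benign accounts have been injected with the label "worker", and compares the false positive rate on clean benign accounts. The single integer feature, the synthetic data, and the choice of which benign accounts get injected are all assumptions for illustration; the study's classifiers and features are different.

```python
def train_stump(samples):
    """Pick threshold t minimizing training errors for the rule: x >= t means worker (1)."""
    candidates = sorted({x for x, _ in samples})

    def errors(t):
        return sum(1 for x, label in samples if (x >= t) != (label == 1))

    return min(candidates, key=errors)


def false_positive_rate(threshold, benign_values):
    """Fraction of benign accounts that the stump flags as workers."""
    flagged = sum(1 for x in benign_values if x >= threshold)
    return flagged / len(benign_values)


def poison(samples, benign_values, n_injected):
    """Add n_injected benign-looking accounts mislabeled as workers (label 1)."""
    injected = [(benign_values[i % len(benign_values)], 1) for i in range(n_injected)]
    return samples + injected


# Synthetic, well-separated data: benign scores 0..49, worker scores 50..99.
benign = list(range(0, 50))
workers = list(range(50, 100))
clean = [(x, 0) for x in benign] + [(x, 1) for x in workers]

# Assumption: the attacker injects benign accounts closest to the boundary.
borderline_benign = list(range(40, 50))

clean_threshold = train_stump(clean)
poisoned_threshold = train_stump(poison(clean, borderline_benign, 20))
clean_fpr = false_positive_rate(clean_threshold, benign)
poisoned_fpr = false_positive_rate(poisoned_threshold, benign)

print("clean:    threshold", clean_threshold, "FPR", clean_fpr)
print("poisoned: threshold", poisoned_threshold, "FPR", poisoned_fpr)
```

With 20 injected samples against 50 true workers (a ratio of 0.4), the learned threshold moves into the benign range and the false positive rate on clean benign accounts rises from 0.0 to 0.2 in this toy setup.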

[Figure: False Positive Rate versus Ratio of Injected Samples to Turfing Samples (0 to 1) for Tree, SVMp, RF, and SVMr]

[Figure: Customer, Crowdturfing Site, Crowd-workers, and Target Networks]

[Figure: Site Growth Over Time. Campaigns per Month and Dollars per Month (logarithmic axes) for ZBJ and SDH, Jan. 2008 to Jan. 2011]

[Figure: Model Training and Detection]
