Sybil In Online Social Networks (OSNs) - NDSS Symposium

Sybil In Online Social Networks (OSNs)

• Sybil (sɪbəl): fake identities controlled by attackers
  • Friendship is a precursor to other malicious activities
  • Does not include benign fakes (secondary accounts)

• Research has identified malicious Sybils on OSNs
  • Twitter [CCS 2010]
  • Facebook [IMC 2010]
  • Renren [IMC 2011], Tuenti [NSDI 2012]

Real-world Impact of Sybil (Twitter)

• Russian political protests on Twitter (2011)
  • 25,000 Sybils sent 440,000 tweets
  • Drown out the genuine tweets from protesters

[Figure: follower count over time, Jul-4 to Aug-1, y-axis 700K to 900K. Annotations: "July 21st"; "100,000 new followers in 1 day"; "4,000 new followers/day".]

Security Threats of Sybil (Facebook)

• Large Sybil population on Facebook
  • August 2012: 83 million (8.7%)

• Sybils are used to:
  • Share or send spam
  • Steal users' personal information
  • Generate fake likes and click fraud

[Figure labels: "50 likes per dollar"; "Malicious URL".]

Community-based Sybil Detectors

• Prior work on Sybil detectors
  • SybilGuard [SIGCOMM '06], SybilLimit [Oakland '08], SybilInfer [NDSS '09]

• Key assumption: Sybils form tight-knit communities
  • Sybils have difficulty "friending" normal users?

Do Sybils Form Sybil Communities?

• Measurement study on Sybils in the wild [IMC '11]
  • Study Sybils in Renren (Chinese Facebook)
  • Ground-truth data on 560K Sybils collected over 3 years
  • Sybil components: sub-graphs of connected Sybils

[Figure: plot of Sybil components; x-axis "Edges Between Sybils", y-axis "Edges To Normal Users", both with ticks from 1 to 10000.]

• Sybil components are internally sparse
  • Not amenable to community detection
  • New Sybil detection system is needed
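The notion of a Sybil component, and the two edge counts plotted for each one, can be sketched as follows. This is a minimal illustration, not the measurement code from the study: the function name `sybil_components`, the edge-list input format, and the toy graph are all assumptions made for the example.

```python
from collections import defaultdict


def sybil_components(edges, sybils):
    """Group Sybils into connected components of the Sybil-only subgraph.

    edges:  iterable of (u, v) undirected friendship links (hypothetical input)
    sybils: set of node ids known to be Sybils
    Returns a list of dicts holding each component's nodes, its number of
    Sybil-to-Sybil edges, and its number of edges to normal users.
    """
    sybil_adj = defaultdict(set)     # Sybil-to-Sybil adjacency
    normal_edges = defaultdict(int)  # per-Sybil count of edges to normal users
    for u, v in edges:
        if u in sybils and v in sybils:
            sybil_adj[u].add(v)
            sybil_adj[v].add(u)
        elif u in sybils:
            normal_edges[u] += 1
        elif v in sybils:
            normal_edges[v] += 1

    seen, components = set(), []
    for start in sorted(sybils):
        if start in seen:
            continue
        # Depth-first search restricted to Sybil-to-Sybil edges.
        stack, nodes = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            nodes.add(node)
            for nbr in sybil_adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        # Each internal edge is stored twice in the adjacency sets.
        internal = sum(len(sybil_adj[n]) for n in nodes) // 2
        external = sum(normal_edges[n] for n in nodes)
        components.append({"nodes": nodes, "sybil_edges": internal,
                           "normal_edges": external})
    return components


# Toy graph: s1 and s2 are linked to each other; s3 only links to a normal user.
toy_edges = [("s1", "s2"), ("s1", "a"), ("s2", "b"), ("s2", "c"), ("s3", "a")]
toy_components = sybil_components(toy_edges, {"s1", "s2", "s3"})
```

A component whose `sybil_edges` count is small relative to its `normal_edges` count corresponds to the internally sparse pattern described on this slide.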

Detect Sybils without Graphs

• Anecdotal evidence that people can spot Sybil profiles
  • 75% of friend requests from Sybils are rejected
  • Human intuition detects even slight inconsistencies in Sybil profiles

• Idea: build a crowdsourced Sybil detector
  • Focus on user profiles
  • Leverage human intelligence and intuition

• Open Questions
  • How accurate are users?
  • What factors affect detection accuracy?
  • How can we make crowdsourced Sybil detection cost-effective?

Outline

• Introduction
• User Study
• Feasibility Experiment
• Accuracy Analysis
• Factors Impacting User Accuracy
• Scalable Sybil Detection System
• Conclusion

Details in Paper

User Study Setup*

• User study with 2 groups of testers on 3 datasets
• 2 groups of users
  • Experts: our friends (CS professors and graduate students)
  • Turkers: crowdworkers from online crowdsourcing systems
• 3 ground-truth datasets of full user profiles
  • Renren: given to us by Renren Inc.
  • Facebook US and India: crawled
    • Sybil profiles: profiles banned by Facebook
    • Legitimate profiles: 2 hops from our own profiles

Data collection details
*IRB Approved

[Figure: tester interface. Labels: "Classifying Profiles"; "Browsing Profiles"; "Screenshot of Profile (Links Cannot be Clicked)"; "Real or fake? Why?"; "Navigation Buttons".]

Experiment Overview

Dataset         # Sybil  # Legit.  Test Group      # of Testers  Profiles per Tester
Renren          100      100       Chinese Expert  24            100
                                   Chinese Turker  418           10
Facebook US     32       50        US Expert       40            50
                                   US Turker       299           12
Facebook India  50       49        India Expert    20            100
                                   India Turker    342           12

Callout: more profiles per expert

Individual Tester Accuracy

[Figure: CDF (%) of accuracy per tester (%), with curves for Turker and Expert. Annotations: "Much Lower Accuracy"; "Excellent! 80% of experts have >80% accuracy!"]

• Experts prove that humans can be accurate
• Turkers need extra help…

Wisdom of the Crowd

• Is wisdom of the crowd enough?

• Majority voting
  • Treat each classification by each tester as a vote
  • Majority vote determines final decision of the crowd

• Results after majority voting (20 votes)
  • Both Experts and Turkers have almost zero false positives
  • Turkers' false negatives are still high
    • US (19%), India (50%), China (60%)

• False positive rates are excellent
• What can be done to improve turker accuracy?
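The majority-voting rule on this slide, and the false positive and false negative rates it produces, can be sketched in a few lines. This is an illustrative sketch only: the function names, the boolean vote encoding (True meaning "fake"), and the tie-breaking rule are assumptions, not details given in the slides.

```python
def majority_vote(votes):
    """Return True (Sybil) if strictly more than half of the votes say Sybil.

    votes: list of booleans, True meaning the tester judged the profile fake.
    Ties resolve to "not Sybil", an assumption that favors low false positives.
    """
    return sum(votes) * 2 > len(votes)


def error_rates(votes_by_profile, truth):
    """Compute (false_positive_rate, false_negative_rate) of the crowd decision.

    votes_by_profile: dict profile_id -> list of boolean votes
    truth:            dict profile_id -> True if the profile really is a Sybil
    """
    fp = fn = legit = sybil = 0
    for pid, votes in votes_by_profile.items():
        decision = majority_vote(votes)
        if truth[pid]:
            sybil += 1
            fn += not decision   # a Sybil the crowd failed to flag
        else:
            legit += 1
            fp += decision       # a legitimate profile flagged as Sybil
    return (fp / legit if legit else 0.0, fn / sybil if sybil else 0.0)
```

A false negative here is a Sybil that the crowd votes legitimate, which is the error the slide reports as remaining high for turkers.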

Eliminating Inaccurate Turkers

[Figure: majority-vote false negative rate (%) versus turker accuracy threshold (%), with curves for China, India, and US. Annotation: "Dramatic Improvement".]

Removing inaccurate turkers can effectively reduce false negatives!
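Filtering by an accuracy threshold before voting can be sketched as below. The sketch assumes each turker's accuracy has already been measured on profiles with known ground truth; that assumption, the function names, and the example numbers are illustrative and not taken from the slides.

```python
def filter_turkers(accuracy_by_turker, threshold):
    """Keep only turkers whose measured accuracy is at least `threshold` (0..1)."""
    return {t for t, acc in accuracy_by_turker.items() if acc >= threshold}


def filtered_majority(votes_by_turker, trusted):
    """Majority vote over one profile using only trusted turkers.

    votes_by_turker: dict turker_id -> True if that turker labeled the profile Sybil
    trusted:         set of turker ids allowed to vote
    Returns None when no trusted turker voted on the profile.
    """
    votes = [v for t, v in votes_by_turker.items() if t in trusted]
    if not votes:
        return None
    return sum(votes) * 2 > len(votes)


# Hypothetical example: two inaccurate turkers outvote two accurate ones.
accuracy = {"t1": 0.95, "t2": 0.55, "t3": 0.60, "t4": 0.85}
votes_on_sybil = {"t1": True, "t2": False, "t3": False, "t4": True}
everyone = set(accuracy)
trusted = filter_turkers(accuracy, 0.8)
```

In this made-up example the unfiltered vote ties and the Sybil is missed, while the vote restricted to `trusted` flags it.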

Outline

• Introduction
• User Study
• Scalable Sybil Detection System
  • System Design
  • Trace-driven Simulation
• Conclusion

A Practical Sybil Detection System

1. Scalability
  • Must scale to millions of users
  • High accuracy with low costs

2. Preserve user privacy when giving data to turkers

Key insight to designing our system
• Accuracy in the turker population is highly skewed
• Only 10% of turkers are > 90% accurate

[Figure: CDF (%) of turker accuracy (%).]

Details in Paper

System Architecture

[Figure: system architecture diagram. Labels: "Social Network Heuristics"; "User Reports"; "Flag Suspicious Users"; "Suspicious Profiles"; "All Turkers"; "Turker Selection"; "Accurate Turkers"; "Very Accurate Turkers"; "Rejected!"; "OSN Employees"; "Sybils".]

Crowdsourcing Layer
• Maximize utility of high-accuracy turkers
• Continuous quality control
• Locate malicious workers
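One way to read the two turker tiers in the architecture is as an escalating vote: accurate turkers vote first, and only profiles they disagree on are passed to very accurate turkers. The sketch below illustrates that reading. The escalation rule, the function name `classify_profile`, and the `low` and `high` bounds are assumptions made for illustration; the slides do not give these parameters.

```python
def classify_profile(lower_votes, get_upper_votes, low=0.2, high=0.8):
    """Two-tier crowd decision for one suspicious profile.

    lower_votes:     list of booleans from accurate turkers (True = Sybil)
    get_upper_votes: callable returning a list of booleans from very accurate
                     turkers; only called when the lower tier is inconclusive
    low, high:       hypothetical bounds on the Sybil-vote fraction; fractions
                     strictly between them count as controversial
    Returns (is_sybil, votes_used).
    """
    if not lower_votes:
        raise ValueError("need at least one lower-tier vote")
    fraction = sum(lower_votes) / len(lower_votes)
    if fraction >= high:
        return True, len(lower_votes)
    if fraction <= low:
        return False, len(lower_votes)
    # Controversial profile: spend votes from the scarcer, more accurate tier.
    upper_votes = get_upper_votes()
    if not upper_votes:
        raise ValueError("upper tier returned no votes")
    is_sybil = sum(upper_votes) * 2 > len(upper_votes)
    return is_sybil, len(lower_votes) + len(upper_votes)
```

Under this reading, clear-cut profiles cost only lower-tier votes, so the very accurate turkers are reserved for the cases where they matter most.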

Trace Driven Simulations

• Simulation on 2000 profiles
• Error rates drawn from survey data
• Calibrate 4 parameters to:
  • Minimize false positives & false negatives
  • Minimize votes per profile (minimize cost)

Results (Details in Paper)
• Average 6 votes per profile
• <1% false positives
• <1% false negatives

Results++
• Average 8 votes per profile
• <0.1% false positives
• <0.1% false negatives

[Figure labels: "Accurate Turkers"; "Very Accurate Turkers".]

Estimating Cost

• Estimated cost in a real-world social network: Tuenti
  • 12,000 profiles to verify daily
  • 14 full-time employees
  • Annual salary 30,000 EUR* (~$20 per hour): $2240 per day

• Crowdsourced Sybil Detection
  • 20 sec/profile, 8-hour day: 50 turkers
  • Facebook wage ($1 per hour): $400 per day

• Cost with malicious turkers
  • 25% of turkers are malicious
  • $504 per day

Augment existing automated systems

*http://www.glassdoor.com/Salary/Tuenti-Salaries-E245751.htm
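The daily figures on this slide can be reproduced with simple arithmetic. The sketch below is a reconstruction under stated assumptions: an 8-hour working day for both groups, and 6 votes per profile (the average from the simulation results), which is the value that yields exactly 50 turkers. The slide does not state how its numbers were derived, and the $504 figure for malicious turkers is not reproduced here.

```python
def staff_cost_per_day(workers, hourly_wage, hours_per_day=8):
    """Daily wage bill for `workers` people paid `hourly_wage` dollars per hour."""
    return workers * hourly_wage * hours_per_day


def turkers_needed(profiles_per_day, votes_per_profile, seconds_per_vote,
                   hours_per_day=8):
    """Number of full-day turkers needed to cast all votes (rounded up)."""
    total_seconds = profiles_per_day * votes_per_profile * seconds_per_vote
    seconds_per_turker = hours_per_day * 3600
    return -(-total_seconds // seconds_per_turker)  # ceiling division


employee_cost = staff_cost_per_day(14, 20)               # 14 employees at ~$20/hour
turkers = turkers_needed(12_000, 6, 20)                  # assumes 6 votes per profile
crowd_cost = staff_cost_per_day(turkers, 1)              # $1 per hour
```

With these inputs the employee cost comes to $2240 per day and the crowdsourced cost to $400 per day, matching the slide.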

Conclusion

• Designed a crowdsourced Sybil detection system
  • False positives and negatives <1%
  • Resistant to infiltration by malicious workers
  • Low cost

• Currently exploring prototypes in real-world OSNs

Questions?

Thank you!

Ground-truth Data Collection (Legit.)

• Facebook Crawl

[Figure: crawl pipeline. 8 seeds from lab → 1-hop friends → 86k 2-hop friends → random selection → 100 legitimate profiles (50 US, 50 India).]

Ground-truth Data Collection (Sybil)

• Facebook Crawl

[Figure: collection pipeline. Labels: "Users"; "Profile Pictures"; "Google Search By Image"; "Publicly Available Image"; "Do not consider Facebook links"; "If >90% of pictures on web"; "Suspicious Profiles"; "Suspicious Profiles Dataset"; "Banned by Facebook"; "Confirmed Sybils"; "573 Confirmed Sybils".]
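The "If >90% of pictures on web" rule used to flag suspicious profiles can be sketched as a simple fraction test. How each picture is looked up (for example via a reverse image search that ignores Facebook links) is outside this sketch; the function name, the boolean input, and the handling of profiles with no pictures are assumptions.

```python
def is_suspicious(picture_found_on_web, threshold=0.9):
    """Flag a profile when more than `threshold` of its pictures appear on the web.

    picture_found_on_web: list of booleans, one per profile picture, True if the
    picture was found on a non-Facebook web page (lookup method not shown).
    """
    if not picture_found_on_web:
        return False  # assumption: no pictures means no evidence either way
    found_fraction = sum(picture_found_on_web) / len(picture_found_on_web)
    return found_fraction > threshold
```

Per the slide, a flagged profile only counts as a confirmed Sybil once it has also been banned by Facebook, so this test is a first filter rather than a final label.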

Preserving User Privacy

• Showing profiles to crowdworkers raises privacy issues
• Solution: reveal profile information in context

[Figure labels: "Public Profile Information"; "Friend-Only Profile Information"; "Friends"; "Crowdsourced Evaluation" (shown twice).]

Survey Fatigue

[Figure: two panels, "US Experts" and "US Turkers", each plotting accuracy (%) and time per profile (s) against profile order. Annotations: "No fatigue"; "Fatigue matters".]

All testers speed up over time

Wisdom of the Crowd

• Treat each classification by each tester as a vote
• Majority vote determines final decision

Dataset         Test Group      False Positives  False Negatives
Renren          Chinese Expert  0%               3%
                Chinese Turker  0%               63%
Facebook US     US Expert       0%               10%
                US Turker       2%               19%
Facebook India  India Expert    0%               16%
                India Turker    0%               50%

Callouts: "Almost Zero False Positives"; "Experts Perform Okay"; "Turkers Miss Lots of Sybils"

• False positive rates are excellent
• Turkers need extra help against false negatives
• What can be done to improve accuracy?

Sybil Profile Difficulty

[Figure: average accuracy per Sybil (%) for Sybil profiles ordered by turker accuracy, with curves for Turker and Expert. Annotations: "Really difficult profiles"; "Experts perform well on most difficult Sybils".]

• Some Sybils are more stealthy
• Experts catch more tough Sybils than turkers

How Many Votes Do You Need?

[Figure: error rate (%) versus votes per profile (2 to 24), showing false negatives and false positives for China, India, and US.]

• Only need a few votes
• False positives reduce quickly
• Fewer votes = less cost

Individual Tester Accuracy

[Figure: CDF (%) of accuracy per tester (%), with curves for Chinese Turker and Chinese Expert. Annotations: "Much Lower Accuracy"; "Excellent! 80% of experts have >90% accuracy!"]

• Experts prove that humans can be accurate
• Turkers need extra help…

