
Errors, Misunderstandings, and Attacks · FP and FN Reports linked with EasyList changes 5,284 From...

Date post: 05-Jul-2020
Errors, Misunderstandings, and Attacks: Analyzing the Crowdsourcing Process of Ad-blocking Systems Mshabab Alrizah Sencun Zhu Xinyu Xing Gang Wang

Errors, Misunderstandings, and Attacks: Analyzing the Crowdsourcing Process of Ad-blocking Systems

Mshabab Alrizah, Sencun Zhu, Xinyu Xing, Gang Wang


Outline

• Background
• Objectives
• Methodology
• Datasets
• Analysis: FP & FN errors
• Analysis: Evasions
• Conclusion


Ad-Blocking System


Crowdsourcing and Ad-blocking Systems


Previous Work Studied …

• Relationships among Internet users, ad publishers, and ad blockers
• Economic ramifications of ad-blocking systems
• Different problems or complementary solutions
• Specific cases of ad blocking
  – e.g., tracker blocking, anti-adblocking


Yet…


• There remains a lack of deep understanding of:
  – Filter list effectiveness
  – The crowdsourcing functionality and contribution
  – The potential pitfalls and security vulnerabilities


Objectives

Provide an in-depth study of the dynamic changes of the filter list to answer the following key questions.

― Q1: How prevalent are the errors of missing real advertisements (false negative (FN) errors) and the errors of blocking legitimate content (false positive (FP) errors)?
― Q2: What are the primary sources of FP errors?
― Q3: How effective is crowdsourcing in detecting and mitigating FP and FN errors?
― Q4: How robust is the filter list against evasion attacks?


Objectives


― Q1: Prevalence of FN and FP errors
― Q2: Primary sources of FP errors
― Q3: Crowdsourcing effectiveness
― Q4: Robustness of the filter list


Methodology


Datasets: Collecting and Cleaning

Dataset D1
• Collect and track the dynamic changes of the filter list (EasyList).
  – Collected 117,683 versions of EasyList (2009 to 2018).
  – Cleaned the versions and extracted those created to correct FP and FN errors.
• Extract the filter rules added or removed and build a record for each rule.
  – Each record contains information about the rule (e.g., time of creation, time of deletion, EasyList versions).

Dataset D2
• Collect posts about FP and FN errors in the EasyList forum.
  – 23,240 topics with at least one report.
• Extract the reports from the posts and build a record for each report.
  – The report record contains information such as the contributor profile, the webpage that has the error, and the EasyList editor responses.
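The version-tracking step for Dataset D1 (extracting the rules added or removed between EasyList versions and keeping one record per rule) could be sketched as follows. This is a minimal illustration and not the authors' tooling: the `RuleRecord` schema, the function names, and the version identifiers are assumptions, and the parser simply treats lines starting with `!` or `[` as comments or headers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class RuleRecord:
    """Lifetime of one filter rule across list versions (hypothetical schema)."""
    rule: str
    added_in: str                      # version id where the rule first appears
    removed_in: Optional[str] = None   # version id where it disappears, if any


def parse_rules(list_text: str) -> Set[str]:
    """Return the filter rules of one version, skipping comments and the header."""
    rules = set()
    for line in list_text.splitlines():
        line = line.strip()
        # Simplification: "!" starts a comment, "[" starts the list header.
        if not line or line.startswith("!") or line.startswith("["):
            continue
        rules.add(line)
    return rules


def build_records(versions: Iterable[Tuple[str, str]]) -> List[RuleRecord]:
    """Walk (version_id, text) pairs in chronological order and record changes."""
    records: List[RuleRecord] = []
    open_records: Dict[str, RuleRecord] = {}   # rules currently in the list
    previous: Set[str] = set()
    for version_id, text in versions:
        current = parse_rules(text)
        for rule in sorted(current - previous):      # rules added in this version
            record = RuleRecord(rule=rule, added_in=version_id)
            open_records[rule] = record
            records.append(record)
        for rule in sorted(previous - current):      # rules removed in this version
            open_records.pop(rule).removed_in = version_id
        previous = current
    return records
```

A rule that is removed and later re-added gets a second record, so each record describes one continuous lifetime of the rule.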


Q1: Error Prevalence

• To answer the question, we need to know:
  1. The types of the errors
  2. The websites with the errors

Problem: Many reports have no evidence of correction.

Solution: Link reports with EasyList.

[Diagram labels: Filter's Record (Dataset D1), Report's Record (Dataset D2), Error Record (Dataset D2A)]
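The slides do not spell out the linking procedure, so the following is only one plausible sketch of joining a report record with a filter record to form an error record. Everything in it is an assumption made for illustration: the `Report` and `RuleChange` types, the substring match on the reported domain, and the 30-day window are hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Report:
    report_id: int
    domain: str            # site named in the forum report (hypothetical field)
    posted_at: datetime


@dataclass
class RuleChange:
    rule: str
    changed_at: datetime   # time of the list version that added or removed the rule


def link_report(report: Report, changes: List[RuleChange],
                window: timedelta = timedelta(days=30)) -> Optional[RuleChange]:
    """Return the earliest change after the report that mentions its domain."""
    candidates = [
        c for c in changes
        if report.posted_at <= c.changed_at <= report.posted_at + window
        and report.domain in c.rule   # assumed heuristic: plain substring match
    ]
    # Reports without any matching change stay unlinked (None).
    return min(candidates, key=lambda c: c.changed_at, default=None)
```

Leaving unmatched reports unlinked keeps the linked set small but reliable, which is in the spirit of the conservative approach mentioned under Limitations.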


Q2: Primary Sources of FP Errors


• Required knowledge:
  – The web page that has the FP error(s).
  – The element impacted.
  – The filter that caused the error.
  – The EasyList versions created to fix the error(s).
• Reproduced FPs using a Chrome extension.

[Diagram labels: FP Error record (Dataset D2A), Old EasyList Version, Webpage, Controller and checker, Dataset D2B]


Q3: Crowdsourcing Effectiveness


• Extracting the crowdsourcing behaviors from Dataset D2:
  – Reports
  – Type of report
  – Reporter profile
  – EasyList editor response
  – EasyList editor profile
  – Time of correction
  – Reason for rejection
  – …


Q4: Robustness of Filter-List


• Extract EasyList's behavior from Dataset D1:
  – Reasons for adding rules.
  – Syntax of rules.
  – Ad servers' domains.
  – Changes of ad element attributes.
  – …
• Extract the websites' behaviors from Dataset D2:
  – Reasons for FN errors.
  – Responses of the EasyList community.
  – …
• Study the reaction of ad networks:
  – Historical traffic information of the ad servers.
  – …
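Studying the syntax of rules requires telling the main rule types apart. The sketch below classifies one EasyList line using the standard Adblock Plus filter syntax (`!` for comments, `@@` for exception rules, `##` for element hiding, `#@#` for element hiding exceptions, and `$` for options such as `websocket`). The function names and category labels are assumptions made for this example, and the option parser is a simplification that does not handle regular-expression rules containing `$`.

```python
from typing import List


def classify_rule(line: str) -> str:
    """Classify one filter-list line by its Adblock Plus syntax."""
    line = line.strip()
    if not line or line.startswith("!") or line.startswith("["):
        return "comment"            # blank line, comment, or list header
    if "#@#" in line:
        return "hide-exception"     # element hiding exception
    if "##" in line or "#?#" in line:
        return "hide-element"       # element hiding rule
    if line.startswith("@@"):
        return "block-exception"    # exception (whitelist) rule for requests
    return "block-request"          # ordinary request-blocking rule


def rule_options(line: str) -> List[str]:
    """Return the options after the last '$' of a request rule, if any."""
    if classify_rule(line) not in ("block-request", "block-exception"):
        return []
    _, sep, tail = line.strip().rpartition("$")
    return tail.split(",") if sep else []
```

For example, `rule_options("||ads.example.com^$websocket,third-party")` yields the two options `websocket` and `third-party`, which is how WebSocket-related rules could be picked out of Dataset D1; `ads.example.com` is a placeholder domain.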


Datasets


Dataset D1
Dataset | # | Note
Cleaned EasyList versions | 55,607 | From November 30, 2009, to December 7, 2018
Added filter rules | 534,020 | In order to correct FP and FN errors
Removed filter rules | 448,479 | In order to correct FP and FN errors

Dataset D2
Dataset | # | Note
Reports of FN errors | 17,968 | From November 30, 2009, to December 7, 2018
Reports of FP errors | 5,272 | From November 30, 2009, to December 7, 2018

Dataset D2A: Linking EasyList Filter Rules with True Reports
Dataset | # | Note
FP and FN reports linked with EasyList changes | 5,284 | From November 30, 2009, to December 7, 2018

Dataset D2B: Reproducing FPs
Dataset | # | Note
True instances of FP errors | 570 | 2,203 webpages studied

Ad servers: traffic information
Dataset | # | Note
Historical traffic information of the ad servers | 567,293 | Traffic information of 6,903 ad-server domains over 4 years


Q1 Analysis: Error Prevalence

[Figure: Websites with FN and FP errors]


Q2 Analysis: Sources of FP Errors

Responsibility for the errors (the source of the error).

[Figure: timeline with t1 < t2 < t3. Labels: Non-Ad Content (created by the web designer), Filter Rule (created by the ad blocker), Web Designer's Fault, Ad-blocker's Fault]

[Figure: bar chart, 0% to 90%. Categories: Designer's Fault, Ad-blocker's Fault. Series: Block Request, Hide Element]


Q3 Analysis: Crowdsourcing Effectiveness

FP and FN error reports submitted by different categories of users


Title | # Reports (FP) | # Reports (FN) | Avg. of days (FP) | Avg. of days (FN) | SD (FP) | SD (FN)
Anonymous | 530 | 853 | 2.37 | 1.80 | 6.88 | 7.38
New Member | 371 | 307 | 3.94 | 9.31 | 8.77 | 21.09
Senior Member | 160 | 749 | 2.31 | 6.42 | 5.35 | 17.48
Developer | 83 | 99 | 1.80 | 16.30 | 5.52 | 31.08
Other Lists Editor | 105 | 603 | 1.65 | 2.65 | 3.86 | 11.02
Veteran | 255 | 751 | 1.95 | 5.34 | 5.17 | 14.31
Editor | 80 | 338 | 0.58 | 0.52 | 1.49 | 2.98
Total | 1,584 | 3,700 | 2.09 | 6.05 | 5.29 | 15.05

[Chart: False Positive Reports 30%, False Negative Reports 70%]
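The 30% / 70% split can be checked directly from the report counts in the table. The short script below is a worked example rather than part of the original slides; it uses only the per-class FP and FN counts shown above, and the variable names are arbitrary.

```python
# Report counts per user class, copied from the table: (FP, FN).
REPORTS = {
    "Anonymous": (530, 853),
    "New Member": (371, 307),
    "Senior Member": (160, 749),
    "Developer": (83, 99),
    "Other Lists Editor": (105, 603),
    "Veteran": (255, 751),
    "Editor": (80, 338),
}

total_fp = sum(fp for fp, _ in REPORTS.values())
total_fn = sum(fn for _, fn in REPORTS.values())
total = total_fp + total_fn

# Overall split between FP and FN reports.
fp_share = 100 * total_fp / total
fn_share = 100 * total_fn / total
print(f"FP: {total_fp} reports ({fp_share:.1f}%)")
print(f"FN: {total_fn} reports ({fn_share:.1f}%)")

# Share of each user class within the FP reports and within the FN reports.
for title, (fp, fn) in REPORTS.items():
    print(f"{title:20s} FP {100 * fp / total_fp:5.1f}%   FN {100 * fn / total_fn:5.1f}%")
```

The per-class counts sum to the Total row (1,584 FP and 3,700 FN reports), and the combined 5,284 matches the number of reports linked with EasyList changes in Dataset D2A.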


Contributions by Different Types of Users


[Pie chart: FN Reports]
• Anonymous 23%
• New Member 8%
• Senior Member 20%
• Developer 3%
• Other Lists Editor 17%
• Veteran 20%
• Editor 9%

[Pie chart: FP Reports]
• Anonymous 34%
• New Member 23%
• Senior Member 10%
• Developer 5%
• Other Lists Editor 7%
• Veteran 16%
• Editor 5%


Contributions of Different Types of Users

• For the Anonymous, New Member, Senior Member, and Veteran classes, the error type and website popularity are dependent.


Class | P-value of X-squared | Pearson correlation
Anonymous | 7.17E-24 | -0.05275
New Member | 9.16E-06 | -0.158028
Senior Member | 1.03E-07 | 0.09673657
Developer | 0.030464 | 0.061129
Other Lists Editor | 0.1235264 | 0.0363365
Veteran | 0.000166 | 0.1041299
Editor | 0.0611805 | 0.067597

• Anonymous and New Members contributed more to correcting FP errors than FN errors for lower-rank websites.
• Expert members tend to do the opposite.


Delay in Reporting FP Errors

[Figure: delay of reporting FP errors]


Q4 Analysis: Robustness of Filter-list against Evasion Attacks

15 different evasion attacks:

• More-Studied Attacks (4),

• Less-Studied Attacks (3), and

• Nonstudied Attacks (8).


More-Studied Attacks


Attacks and Our Findings

WebSockets.
• Since 2016, EasyList had blocked:
  – 291 websites.
  – 137 ad servers.

Anti-ad Blocker.
• Reaction:
  – Restricting content on the sites (paywalls, blocking the websites).
  – Redirecting the users to different websites or content.

Randomization of Ad Attributes and URLs.
• 15 websites using randomization.
• Facebook appeared most frequently.

Factoring Acceptable Ads List Sitekeys.
• Our datasets do not show any instance of this attack.


Less-Studied Attacks

Attacks and Our Findings

Changing Ad-Server Domains.
• For 52% of the ad servers, traffic activity disappeared within three days.
• 84% of these ad servers (the 52%) were blocked shortly after they were used.
• Ad servers with a long life: 61% were significantly influenced by the blocking.
• The EasyList community ran code to monitor the changes of ad servers (limited).

Changing Ad-Element Attributes.
• EasyList did not have the capability to automatically trace the changes of ad elements.
• Detection was manual: in 553 instances, EasyList changed the filters in response to this type of evasion.

Changing the Path of Ad Source.
• 644 websites changed the paths of their ad URLs.


Nonstudied Attacks

1. Exploiting Obsolete Whitelist Filters.

2. Using Generic Exception Rules (Whitelist Filters).

3. Exploiting False Positive Errors.

4. First-Party Content and Inline Script.

5. ISP Injecting Ads.

6. Background Redirection.

7. Exploiting WebRTC.

8. CSS Background Image Hack.


Domains in the whitelist filters were not monitored by EasyList.

EasyList counters anti-ad blockers or solves FP errors by using generic exception rules (GER). So ..?


Limitations and Future Work

• Limitations:
  – The dataset dates back to 2009. We could not find any data from before November 2009.
  – A conservative approach was used to link the reported errors to the EasyList updates.
    • Trade-off between the scale of the data and the accuracy of the analysis.
  – The Internet Archive data was limited.
• Future work:
  – Crowdsourcing mechanisms.
  – Dynamic analysis.
  – And more…


Conclusions

• An in-depth measurement study to reveal

― Q1: Prevalence of FP and FN errors

― Q2: Primary sources of FP errors

― Q3: Effectiveness of crowdsourcing in detecting and mitigating FP and FN errors

― Q4: Robustness of the filter list against evasion attacks

• Our findings are expected to help shed light on any future work to evolve ad blocking and/or to optimize crowdsourcing mechanisms.


Mshabab Alrizah

maa25@psu.edu

