Website Fingerprinting at Internet Scale

Andriy Panchenko¹, Fabian Lanze¹, Andreas Zinnen², Martin Henze³, Jan Pennekamp¹, Klaus Wehrle³, Thomas Engel¹

¹Interdisciplinary Centre for Security, Reliability and Trust (SnT), Luxembourg
²RheinMain University of Applied Sciences, Germany
³RWTH Aachen University, Germany

Background

Why people use Tor...

Privacy has become a general concern
Access to the Internet is censored in many countries

Website Fingerprinting

[Figure: a client connects through a network of onion routers (OR) to a server; the destination is marked '?']

Tor: The Onion Router
The most popular low-latency anonymization network
Many users rely on Tor to access unfiltered information

[Figure: Tor circuit from the client through an entry, a middle, and an exit onion router (OR) to the server; the destination is marked '?']

What is website fingerprinting?
Identifying the website a user accesses without breaking cryptography
The attacker is a passive observer
Features are based on packet size, direction, ordering, and timing

Website Fingerprinting - state of the art

A widely discussed, active topic in anonymity research

State-of-the-art approach: Wang et al. (USENIX Security '14)
k-Nearest Neighbor (k-NN) classifier
Manually selected features (e.g., bursts, unique packet lengths)
About 4,000 features
Recognition rates > 90%
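The k-NN idea can be illustrated with a minimal sketch. It is deliberately simplified: it uses a plain, unweighted L1 distance and a majority vote over the k nearest training traces, while the feature weighting and the feature set of Wang et al. are not reproduced. The names `l1_distance` and `knn_predict` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unweighted L1 (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Label `query` by majority vote among its k nearest training vectors."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: l1_distance(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) returns [(label, count)]; ties go to the label seen first.
    return votes.most_common(1)[0][0]
```

Every prediction scans the whole training set, which is one reason the computational cost of k-NN grows with the background set size.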

Two scenarios for evaluation:
Closed world: the user visits only a fixed set of websites
Open world: the attacker monitors a set of sites, and the user may also visit sites outside this set

Our method

Idea:
Don't try to guess which characteristics may be relevant
Use a representation that implicitly covers all characteristics

Our feature set: $(\underbrace{N_{in}, N_{out}, S_{in}, S_{out}}_{\text{basic properties}}, \underbrace{C_1, \dots, C_n}_{\text{cumulative features}})$
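A minimal sketch of how such a feature vector could be computed. The slides do not define every symbol, so the following are assumptions: N_in and N_out are packet counts per direction, S_in and S_out are byte totals per direction, the trace is a list of signed packet sizes (positive for incoming, negative for outgoing), and the cumulative curve is sampled over the packet index, as in the plot. With n = 100, the vector has 4 + 100 = 104 entries, matching the feature count reported for the method. `cumul_features` is a hypothetical name.

```python
from typing import List, Sequence


def cumul_features(trace: Sequence[int], n: int = 100) -> List[float]:
    """Return [N_in, N_out, S_in, S_out, C_1, ..., C_n] for one trace.

    ASSUMPTIONS (not taken from the slides): `trace` lists signed packet sizes
    in bytes, positive for incoming and negative for outgoing packets, and the
    cumulative curve is sampled over the packet index.
    """
    incoming = [p for p in trace if p > 0]
    outgoing = [-p for p in trace if p < 0]
    # Basic properties: packet counts and byte totals per direction.
    basic = [float(len(incoming)), float(len(outgoing)),
             float(sum(incoming)), float(sum(outgoing))]

    # Cumulative sum of signed packet sizes, starting at the origin.
    cum = [0.0]
    for p in trace:
        cum.append(cum[-1] + p)

    # Sample the piecewise-linear curve at n equidistant positions in (0, m],
    # so traces of any length yield exactly n cumulative features.
    m = len(trace)
    samples: List[float] = []
    for i in range(1, n + 1):
        x = i * m / n
        lo = min(int(x), max(m - 1, 0))  # left end of the segment containing x
        hi = min(lo + 1, m)              # right end (equals lo for an empty trace)
        samples.append(cum[lo] + (x - lo) * (cum[hi] - cum[lo]))
    return basic + samples
```

Interpolating rather than truncating or padding is what makes the output length independent of the trace length.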

[Figure, left: cumulative sum of packet sizes over packet number for two traces, showing the curves C(T1) and C(T2) and the values C_i sampled from each. Right (Example): feature value [kByte] over feature index for about.com and google.de]

A fixed number of distinctive characteristics from traces of varying length
Fingerprints can be visualized
Used as input for a Support Vector Machine (SVM)
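A minimal sketch of the classification step, assuming a binary task (monitored vs. unmonitored) and a linear SVM trained with the Pegasos stochastic sub-gradient method. The slide does not state which kernel, solver, or multi-class scheme the authors used; `train_linear_svm` and `svm_predict` are hypothetical names, and in practice a kernel SVM from an established library would be the usual choice.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 50,
                     seed: int = 0) -> List[float]:
    """Train a linear soft-margin SVM with Pegasos (stochastic sub-gradient descent).

    Labels must be +1 or -1. A constant 1.0 is appended to every vector so the
    bias is learned as an extra (also regularized) weight.
    """
    rng = random.Random(seed)
    data = [(list(map(float, x)) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Sub-gradient step: shrink w (gradient of the L2 regularizer) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and move towards y*x when the hinge loss is active.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Raw byte counts and cumulative sums differ widely in magnitude, so features would normally be scaled to a common range before training. Distinguishing many monitored pages from each other would additionally require a one-vs-rest or one-vs-one wrapper around the binary classifier.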

Layers of data representation

[Figure: Tor cells (Cell 1 to 5) are carried in TLS records (Record 1, Record 2), which are transmitted in TCP packets (Packet 1 to 3)]

Information source for feature extraction: Tor cells vs. TLS records vs. TCP packets
The choice has a practically negligible effect on classification accuracy
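As a rough illustration of moving between layers, the sketch below turns signed TLS record lengths into a cell-level trace of +1/-1 entries. It assumes a fixed cell size of 512 bytes (the exact size depends on the Tor link protocol version) and ignores TLS framing overhead, so the cell counts are approximations. `records_to_cells` and `CELL_SIZE` are hypothetical names.

```python
from typing import List, Sequence

CELL_SIZE = 512  # ASSUMED cell size in bytes; it differs between Tor link protocol versions


def records_to_cells(records: Sequence[int], cell_size: int = CELL_SIZE) -> List[int]:
    """Expand signed TLS record lengths into a +1/-1 sequence, one entry per cell.

    ASSUMPTION: a record of |size| bytes carries about |size| / cell_size cells
    (at least one); TLS framing overhead is ignored. Positive = incoming.
    """
    cells: List[int] = []
    for size in records:
        if size == 0:
            continue  # empty records carry no cells
        direction = 1 if size > 0 else -1
        count = max(1, round(abs(size) / cell_size))
        cells.extend([direction] * count)
    return cells
```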

Comparison with state of the art – classification

Closed world: accuracy [%] for the 100 most popular websites

Method                       90 instances   40 instances
k-NN (3736 features)         90.84          89.19
Our method (104 features)    91.38          92.03

Open world: foreground of 100 blocked websites, background of 9,000 popular websites

Method        TPR [%]   FPR [%]
k-NN          90.59     2.24
Our method    96.92     1.98

Comparison of computational performance

[Figure: average processing time [h] on a log scale (10^-4 to 10^3) over background set size (0 to 50,000) for k-NN, CUMUL, and CUMUL (parallelized)]

Computation time for 100 random monitored pages in the open world

Website fingerprinting in reality

Critique:
The data sets used are not representative!

Too small; only popular websites / index pages

Simplified assumptions, wrong metrics for evaluation

RND-WWW: How do people access the World Wide Web?
Sources: Twitter, Alexa-one-click, Googling the trends, Googling at random, Censored in China
> 120,000 web pages

Tor-Exit: Which pages do users actually access over Tor?
Monitor a Tor exit node ⇒ 211,148 web pages

Webpage fingerprinting at Internet scale

Question: Does the attack scale under realistic assumptions?

Which metric to evaluate?
Accuracy: fraction of correct classifications
True Positive Rate (TPR) / Recall: fraction of monitored pages detected
False Positive Rate (FPR): fraction of unmonitored pages falsely flagged as monitored

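Problem: misleading interpretation ⇒ base rate fallacy: when unmonitored pages vastly outnumber monitored ones, even a small FPR can produce more false alarms than correct detections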

Precision: probability that the classifier is correct, given that it has detected a monitored page
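The four metrics can be computed from open-world confusion counts as sketched below. Positives are visits to monitored pages and negatives are visits to background pages; `open_world_metrics` is a hypothetical name, and the example counts are made up for illustration.

```python
def open_world_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, recall (TPR), FPR and precision from open-world confusion counts.

    Positives are visits to monitored pages; negatives are visits to
    unmonitored (background) pages.
    """
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": ratio(tp, tp + fn),     # TPR: monitored visits detected
        "fpr": ratio(fp, fp + tn),        # background visits wrongly flagged
        "precision": ratio(tp, tp + fp),  # alarms that are actually correct
    }


# Hypothetical counts: 100 monitored visits, 10,000 background visits.
metrics = open_world_metrics(tp=90, fp=200, tn=9800, fn=10)
```

With these counts, accuracy is about 98% and the FPR is only 2%, yet fewer than a third of all alarms are correct (precision 90/290), which is exactly the misleading picture that accuracy and FPR alone can give.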

Focus of evaluation:
Precision and recall for increasing background set sizes
Random subset as foreground
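Why precision must be studied as the background grows can be sketched under a strong simplifying assumption: TPR and FPR stay constant regardless of the background size. With made-up rates (TPR = 0.9, FPR = 0.02) and 100 foreground visits, true detections stay fixed while expected false alarms grow linearly with the background size b. The b values are those used in the result plots; `expected_precision` and `precision_curve` are hypothetical names.

```python
from typing import Dict, Sequence


def expected_precision(tpr: float, fpr: float, n_fg: int, n_bg: int) -> float:
    """Expected precision when n_fg monitored and n_bg background visits are classified.

    ASSUMPTION: TPR and FPR do not depend on the background size (an idealization).
    """
    tp = tpr * n_fg   # expected correct detections
    fp = fpr * n_bg   # expected false alarms
    return tp / (tp + fp) if tp + fp else 0.0


def precision_curve(tpr: float, fpr: float, n_fg: int,
                    sizes: Sequence[int]) -> Dict[int, float]:
    """Expected precision for each background set size b in `sizes`."""
    return {b: expected_precision(tpr, fpr, n_fg, b) for b in sizes}


curve = precision_curve(0.9, 0.02, 100, [1000, 5000, 9000, 20000, 50000, 111884])
```

Under these assumptions, precision falls from about 0.82 at b = 1000 to below 0.04 at b = 111884, although the classifier's rates never change.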



Results for RND-WWW

[Figure: fraction of foreground pages [%] over recall (left) and over precision (right), for background set sizes b = 1000, 5000, 9000, 20000, 50000, and 111884]



Results for Tor-Exit

[Figure: fraction of foreground pages [%] over recall (left) and over precision (right), for background set sizes b = 1000, 5000, 9000, 20000, 50000, 111884, and 211148]


Answer: No.


Question: Is it at least possible for certain pages?


Minimum number of mistakenly confused pages
[Figure: fraction of foreground pages [%] over the number of webpage confusions (0 to 400), for b = 20,000, 50,000, and 100,000]

Not a single page is free of confusingly similar pages in a realistic universe.

How about fingerprinting websites? (1/2)

A website is a collection of web pages served under the same domain
Is it possible to fingerprint a website when only a subset of its pages is available for training?
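To train a website classifier rather than a webpage classifier, all pages of a site must share one class label. A minimal sketch, assuming the label is simply the URL's hostname with a leading "www." removed; `site_label` and `group_by_site` are hypothetical helpers, and a faithful mapping from hostnames to registered domains would consult the Public Suffix List.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlsplit


def site_label(url: str) -> str:
    """Map a page URL to a website label: its hostname without a leading 'www.'.

    Simplification: 'news.bbc.co.uk' and 'www.bbc.co.uk' get different labels.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def group_by_site(pages: Sequence[Tuple[str, object]]) -> Dict[str, List[object]]:
    """Group (url, trace) pairs so that all pages of a website share one label."""
    groups: Dict[str, List[object]] = defaultdict(list)
    for url, trace in pages:
        groups[site_label(url)].append(trace)
    return dict(groups)
```

Training on only some of the traces in each group and testing on traces of other pages of the same site corresponds to the question above.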

Experiment: 20 websites

[Figure: confusion matrices (color scale 0.0 to 1.0) for the 20 websites ALJAZEERA, AMAZON, BBC, CNN, EBAY, FACEBOOK, IMDB, KICKASS, LOVESHACK, RAKUTEN, REDDIT, RT, SPIEGEL, STACKOVERFLOW, TMZ, TORPROJECT, TWITTER, WIKIPEDIA, XHAMSTER, XNXX]

(a) only index pages   (b) different pages

How about fingerprinting websites? (2/2)

The transition of results from the closed world to the realistic open-world setting is typically not trivial
Website fingerprinting scales better than webpage fingerprinting

[Figure: two plots of precision and recall (0.0 to 1.0) over background set size (0 to 120,000)]

Summary

Our classifier with 104 features outperforms the state of the art
Alarming results obtained under simplified assumptions cannot be generalized
Webpage fingerprinting does not scale to realistic universe sizes, for any webpage
Website fingerprinting is not only more realistic but also significantly more effective
Previously drawn conclusions need to be reconsidered

Scripts and RND-WWW dataset: http://lorre.uni.lu/~andriy/zwiebelfreunde/

We are hiring!

Our lab within the Interdisciplinary Centre for Security, Reliability and Trust (Uni Luxembourg) is looking for PhD candidates and PostDocs in the area of anonymity and privacy

More information: http://secan-lab.uni.lu/jobs

Date post:	15-Nov-2021