Beauty and the Burst - USENIX

Post on 10-Jul-2020

1

Beauty and the Burst: Remote Identification of Encrypted Video Streams

Roei Schuster

Cornell Tech, Tel Aviv University

Vitaly Shmatikov

Cornell Tech

Eran Tromer

Columbia University, Tel Aviv University

2

Video traffic is interesting

3

Video traffic is encrypted

4

What can still be learned?

Video traffic is encrypted

5

victim

streaming service

Traffic analysis for video identification

7

victim

Metadata! packet times, sizes, …

streaming service

Traffic analysis for video identification

8

Victim is watching

“Beauty and the Beast”!

victim

Metadata! packet times, sizes, …

streaming service

Traffic analysis for video identification

9

Initial buffering, then “on”/“off” bursts

[Plot: packet size (bytes) vs. time (seconds)]

11

[RLLTBD ’11], [ARNL ’12], [MFWS ’13], …

Initial buffering, then “on”/“off” bursts

[Plot: packet size (bytes) vs. time (seconds)]

12

[RLLTBD ’11], [ARNL ’12], [MFWS ’13], …

Where do bursts come from?

Initial buffering, then “on”/“off” bursts

[Plot: packet size (bytes) vs. time (seconds)]

13

streaming service

Video representation on server

15

streaming service

Pulp Fiction

Die Hard

12 Monkeys

The Fifth Element

Die Hard II

Armageddon

Video representation on server

16

Pulp Fiction

Die Hard

12 Monkeys

The Fifth Element

Die Hard II

Armageddon

MPEG-DASH standard:

widely adopted by Netflix, YouTube, others

Video representation on server

17

Pulp Fiction

Die Hard

12 Monkeys

The Fifth Element

Die Hard II

Armageddon

segment2.m4s

segment3.m4s

segment4.m4s

segment1.m4s

video stored in segment-files

MPEG-DASH standard:

widely adopted by Netflix, YouTube, others

Video representation on server

18

Pulp Fiction

Die Hard

12 Monkeys

The Fifth Element

Die Hard II

Armageddon

MPEG-DASH standard: widely adopted by Netflix, YouTube, others

Video stored in segment files; segment = a few seconds of playback

segment1.m4s: 0-5 sec

segment2.m4s: 5-10 sec

segment3.m4s: 10-15 sec

segment4.m4s: 15-20 sec

Video representation on server

19

Client loop: buffer below threshold? yes: request next segment; no: keep checking

Server: segment1.m4s, segment2.m4s, segment3.m4s, segment4.m4s, segment5.m4s, segment6.m4s

DASH client-server interaction (simplified)

20

Client loop: buffer below threshold? yes: request next segment; no: keep checking

Server: segment1.m4s, segment2.m4s, segment3.m4s, segment4.m4s, segment5.m4s, segment6.m4s

Segment fetched every few seconds

DASH client-server interaction (simplified)

21

Client loop: buffer below threshold? yes: request next segment; no: keep checking

Server: segment1.m4s, segment2.m4s, segment3.m4s, segment4.m4s, segment5.m4s, segment6.m4s

Segment fetched every few seconds

Fetching causes a traffic burst

DASH client-server interaction (simplified)
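The simplified interaction above can be sketched as a small simulation. This is an illustration only: the function name `simulate_dash_client`, the 5-second segment length, the 15-second buffer threshold, the 0.5-second tick, and the instantaneous downloads are all assumptions, not details from the talk.

```python
def simulate_dash_client(segment_sizes, segment_seconds=5.0,
                         buffer_threshold=15.0, tick=0.5):
    """Return (time, bytes) bursts produced by a simplified DASH-like client.

    Every parameter value is an illustrative assumption. Downloads are
    treated as instantaneous to keep the sketch short.
    """
    if buffer_threshold <= 0 or tick <= 0:
        raise ValueError("buffer_threshold and tick must be positive")
    bursts = []
    buffered = 0.0       # seconds of video currently in the playback buffer
    clock = 0.0          # simulated wall-clock time in seconds
    next_segment = 0
    while next_segment < len(segment_sizes):
        if buffered < buffer_threshold:
            # Buffer below threshold: request the next segment (one burst).
            bursts.append((clock, segment_sizes[next_segment]))
            buffered += segment_seconds
            next_segment += 1
        else:
            # Buffer is full enough: stay idle while playback drains it.
            clock += tick
            buffered = max(0.0, buffered - tick)
    return bursts


if __name__ == "__main__":
    for when, size in simulate_dash_client([900, 400, 1200, 700, 300, 1000]):
        print(f"t={when:5.1f}s  burst of {size} bytes")
```

Under these assumptions the sketch emits a few back-to-back requests at the start (initial buffering) and then one request per segment duration, each carrying that segment's size.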

22

Different seconds of video require different numbers of bytes to encode

[Plot: “Iguana vs. Snakes” VBR; bitrate (bytes) vs. time (seconds)]

Variable bit rate encoding

23

Phases of Iguana vs Snakes in Bitrate

scenery, movement, tension rising

[Plot: bitrate (bits per second) vs. time (seconds)]

24

Phases of Iguana vs Snakes in Bitrate

tension peaking, iguana is still

[Plot: bitrate (bits per second) vs. time (seconds)]

25

Phases of Iguana vs Snakes in Bitrate

chase

[Plot: bitrate (bits per second) vs. time (seconds)]

26

Phases of Iguana vs Snakes in Bitrate

chase

iguana almost captured

[Plot: bitrate (bits per second) vs. time (seconds)]

27

Phases of Iguana vs Snakes in Bitrate

iguana safe, resting

[Plot: bitrate (bits per second) vs. time (seconds)]

29

Pulp Fiction

Die Hard

12 Monkeys

The Fifth Element

Die Hard II

Armageddon

Segment1.m4s: 0-5 sec

Segment2.m4s: 5-10 sec

Segment3.m4s: 10-15 sec

Segment4.m4s: 15-20 sec

Segment5.m4s: 20-25 sec

Variable bit rate → variable segment size

30

Variable segment size → variable burst size

[Plot: burst size (bytes) vs. time (seconds), showing buffering followed by on/off bursts]

32

segments

VBR pattern

stream time

burst sizes

content

MPEG-DASH leak
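The chain on this slide (content, VBR pattern, segments, burst sizes) implies that an observer who sees only packet times and sizes can approximate segment sizes by grouping packets into bursts. A minimal sketch of that grouping follows; the function name `extract_bursts` and the 0.5-second idle gap used to separate bursts are assumptions for illustration, not values from the talk.

```python
def extract_bursts(packets, idle_gap=0.5):
    """Group (timestamp_seconds, size_bytes) packets into bursts.

    A new burst starts whenever more than `idle_gap` seconds pass since the
    previous packet. Returns a list of (burst_start_time, total_bytes).
    The idle_gap value is an illustrative assumption.
    """
    bursts = []
    last_time = None
    for timestamp, size in sorted(packets):
        if last_time is None or timestamp - last_time > idle_gap:
            bursts.append([timestamp, 0])   # open a new burst
        bursts[-1][1] += size               # accumulate bytes in current burst
        last_time = timestamp
    return [(start, total) for start, total in bursts]


if __name__ == "__main__":
    trace = [(0.00, 1500), (0.01, 1500), (0.02, 300), (5.00, 1500), (5.01, 700)]
    print(extract_bursts(trace))
```

Each resulting total is only an estimate of a segment's size, since encryption and transport overhead are included in the byte counts.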

33

burst sizes

stream time

Does the pattern of burst (segment) sizes uniquely

characterize a title?

Can we learn a title’s identifying

pattern?

From a leak to a fingerprint

34

burst sizes

stream time

Does the pattern of burst (segment) sizes uniquely

characterize a title?

Can we learn a title’s identifying

pattern?

Diversity: empirically measure pairwise distances for 3500 downloaded and segmented YouTube titles

From a leak to a fingerprint

35

burst sizes

stream time

Does the pattern of burst (segment) sizes uniquely

characterize a title?

Consistency: empirically evaluate attacker’s measurement error bound

Can we learn a title’s identifying

pattern?

Diversity: empirically measure pairwise distances for 3500 downloaded and segmented YouTube titles

From a leak to a fingerprint

36

burst sizes

stream time

Does the pattern of burst (segment) sizes uniquely

characterize a title?

Consistency: empirically evaluate attacker’s measurement error bound

Can we learn a title’s identifying

pattern?

~20% of YouTube titles have fingerprints

Diversity: empirically measure pairwise distances for 3500 downloaded and segmented YouTube titles

From a leak to a fingerprint
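The diversity and consistency questions above can be combined into one check: a title's pattern is usable as a fingerprint if it is farther from every other title than the attacker's measurement error can explain. The sketch below is an illustration under assumptions: the slides do not name the distance metric, so an L1 distance over segment sizes is swapped in, and `titles_with_fingerprint` and the sample data are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences over the common prefix of two size lists.

    L1 is an assumed stand-in; the slides do not state the metric used.
    """
    return sum(abs(x - y) for x, y in zip(a, b))


def titles_with_fingerprint(patterns, error_bound):
    """Return titles whose pattern is unambiguous under a given error bound.

    patterns: dict mapping title -> list of segment sizes (bytes).
    A measurement within `error_bound` of one title cannot also be within
    `error_bound` of another title if the two patterns are more than
    2 * error_bound apart (triangle inequality).
    """
    unique = []
    for title, pattern in patterns.items():
        others = (p for t, p in patterns.items() if t != title)
        if all(l1_distance(pattern, p) > 2 * error_bound for p in others):
            unique.append(title)
    return sorted(unique)


if __name__ == "__main__":
    sample = {                       # hypothetical segment sizes
        "a": [100, 200, 300],
        "b": [100, 210, 300],
        "c": [500, 50, 900],
    }
    print(titles_with_fingerprint(sample, error_bound=20))
```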

37

victim network

Pulp Fiction

Die Hard

Armageddon attacker network

12 Monkeys

Attack overview

39

metadata

victim network

Pulp Fiction

Die Hard

Armageddon attacker network

12 Monkeys

Attack overview

40

detectors

metadata

training

victim network

Pulp Fiction

Die Hard

Armageddon attacker network

12 Monkeys

Attack overview

41

detectors

metadata

training

victim network

Pulp Fiction

Die Hard

Armageddon Armageddon attacker network

12 Monkeys

Attack overview

43

detectors

metadata

training

victim network

Pulp Fiction

Die Hard

Armageddon Armageddon attacker network

Victim is watching

“Armageddon”!

12 Monkeys

Attack overview

44

detectors

training

victim network

Pulp Fiction

Die Hard

12 Monkeys

Armageddon Armageddon attacker network

vantage point?

Attack details

metadata

45

bursts

Wi-Fi access points, proxies, routers, enterprise or national network censors, ISPs

on-path vantage

point

Scenario I: on-path attack

46

detectors

training

victim network

Pulp Fiction

Die Hard

12 Monkeys

Armageddon Armageddon attacker network

machine learning

Attack details

metadata

47

• Very good at learning high-level concepts that are hard to express formally (e.g., “traffic traces are similar”)

• Existing NN architectures very accurate on classification and detection problems

Deep neural networks

48

• Robust: can operate on noisy and coarse measurements

• Agnostic to protocol-specific attributes (e.g., QUIC vs. TLS)

• Can learn features other than burst patterns, e.g., arrival patterns of individual packets

• Can use multiple session representations, train on all at once

Advantages of neural networks

49

Each feature is a time series, sampled at 0.25-second intervals (example: bytes per second)

[Figure: example packets of 1500 and 300 bytes on a 0 to 1 second time axis, binned at 0.25-second intervals]

Features considered: downstream/upstream/total values of bytes per second, packets per second, average packet length, and burst sizes

Features
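The binning described on this slide can be sketched as follows. This is an assumption-laden illustration, not the authors' code: `time_series_features` is a hypothetical name, the input is a list of (timestamp, size) pairs for one traffic direction, and burst sizes are left out for brevity.

```python
import math


def time_series_features(packets, duration, interval=0.25):
    """Bin (timestamp_seconds, size_bytes) packets into fixed intervals.

    Returns three time series, one value per interval: bytes per second,
    packets per second, and average packet length. The 0.25-second default
    follows the slide; everything else is an illustrative assumption.
    """
    bins = math.ceil(duration / interval)
    byte_totals = [0] * bins
    packet_counts = [0] * bins
    for timestamp, size in packets:
        index = int(timestamp / interval)
        if 0 <= index < bins:            # ignore packets outside the window
            byte_totals[index] += size
            packet_counts[index] += 1
    return {
        "bytes_per_second": [b / interval for b in byte_totals],
        "packets_per_second": [c / interval for c in packet_counts],
        "average_packet_length": [
            b / c if c else 0.0 for b, c in zip(byte_totals, packet_counts)
        ],
    }


if __name__ == "__main__":
    trace = [(0.1, 1500), (0.2, 1500), (0.3, 300)]
    for name, series in time_series_features(trace, duration=1.0).items():
        print(name, series)
```

Running the same function separately on downstream, upstream, and combined packet lists would give the downstream/upstream/total variants listed above.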

50

detectors

training

victim network

Pulp Fiction

Die Hard

12 Monkeys

Armageddon Armageddon attacker network

neural net

Attack

On-path attacker

metadata

51

10 titles 100 1-minute sessions

18 titles 100 3-minute sessions + 3500 sessions of different other titles

10 titles 100 1.5-minute sessions

100 titles 100 1-minute sessions

Datasets and identification experiments

52

10 titles 100 1-minute sessions

18 titles 100 3-minute sessions + 3500 sessions of different other titles

10 titles 100 1.5-minute sessions

100 classes

100 titles 100 1-minute sessions

Datasets and identification experiments

53

10 titles 100 1-minute sessions

18 titles 100 3-minute sessions + 3500 sessions of different other titles

10 titles 100 1.5-minute sessions

100 classes

18+1=19 classes

open-world identification

100 titles 100 1-minute sessions

Datasets and identification experiments

54

10 titles 100 1-minute sessions

18 titles 100 3-minute sessions + 3500 sessions of different other titles

10 titles 100 1.5-minute sessions

100 classes

18+1=19 classes

10 classes

10 classes

open-world identification

100 titles 100 1-minute sessions

Datasets and identification experiments

55

10 titles 100 1-minute sessions

18 titles 100 3-minute sessions + 3500 sessions of different other titles

10 titles 100 1.5-minute sessions

100 classes

18+1=19 classes

10 classes

10 classes

98.5% accuracy

99.5% accuracy

98.6% accuracy

92.5% accuracy

open-world identification

100 titles 100 1-minute sessions

Datasets and identification experiments

56

Netflix (feature: total burst size)

YouTube (feature: total burst size)

Predicted label Predicted label

“unknown” class, 3500 samples

Empirical results: confusion matrices

57

Netflix (feature: total burst size)

YouTube (feature: total burst size)

Predicted label Predicted label

“unknown” class, 3500 samples

Exactly 2 false positives

No recurrent confusions (despite many same-series titles)

Empirical results: confusion matrices

58

0 false positives with 0.988 recall

0.0005 false positive rate with 0.93 recall

Netflix (feature: total burst size)

YouTube (feature: total burst size)

Tuning for precision
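The slide does not say how the trade-off between false positives and recall is tuned. One common approach, assumed here purely for illustration, is to accept the classifier's top prediction only when its confidence clears a threshold and to report "unknown" otherwise. The names `predict_with_threshold` and `recall_and_false_positive_rate` and the sample scores are hypothetical.

```python
def predict_with_threshold(scores, threshold):
    """Return the top-scoring label, or "unknown" if its score is too low.

    scores: dict mapping label -> confidence in [0, 1] (assumed to come from
    a classifier's output layer). Raising the threshold lowers the false
    positive rate at the cost of recall.
    """
    label, confidence = max(scores.items(), key=lambda item: item[1])
    return label if confidence >= threshold else "unknown"


def recall_and_false_positive_rate(samples, threshold):
    """samples: list of (true_label, scores). Returns (recall, fpr).

    Recall is measured over samples of known titles; the false positive
    rate is measured over samples whose true label is "unknown".
    """
    known = [(t, s) for t, s in samples if t != "unknown"]
    unknown = [(t, s) for t, s in samples if t == "unknown"]
    hits = sum(predict_with_threshold(s, threshold) == t for t, s in known)
    false_pos = sum(
        predict_with_threshold(s, threshold) != "unknown" for _, s in unknown
    )
    recall = hits / len(known) if known else 0.0
    fpr = false_pos / len(unknown) if unknown else 0.0
    return recall, fpr


if __name__ == "__main__":
    data = [                               # hypothetical classifier outputs
        ("Armageddon", {"Armageddon": 0.97, "Die Hard": 0.03}),
        ("Die Hard", {"Armageddon": 0.40, "Die Hard": 0.60}),
        ("unknown", {"Armageddon": 0.70, "Die Hard": 0.30}),
    ]
    for threshold in (0.5, 0.9):
        print(threshold, recall_and_false_positive_rate(data, threshold))
```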

59

detectors

training

victim network

Pulp Fiction

Die Hard

12 Monkeys

Armageddon Armageddon attacker network

vantage point?

Attack details

training

neural net

metadata

60

Wi-Fi access points, proxies, routers, enterprise or national network censors, ISPs

bursts

victim network

on-path vantage

point

Off-path attackers

61

bursts

victim network Off-path attackers

62

A visited webpage? A smartphone app?

bursts

victim network Off-path attackers

63

A visited webpage? A smartphone app?

bursts

victim network Off-path attackers

Example: checking Facebook feed while streaming “Armageddon”

68

A visited webpage? A smartphone app?

bursts

victim network

Web ad

Three-fold confinement: different device, browser process, sandboxed iframe

Off-path attackers

Example: checking Facebook feed while streaming “Armageddon”

69

Browser

neighbor

viewer Cross-device attack

70

JavaScript attacker client

Browser

attacker Web site

neighbor

viewer Cross-device attack

71

JavaScript attacker client

Browser

attacker Web site

neighbor

viewer Cross-device attack

messages

72

JavaScript attacker client

Congestion

Browser

attacker Web site

neighbor

viewer Cross-device attack

messages

73

JavaScript attacker client

Congestion

Browser

attacker Web site

bursts

neighbor

viewer Cross-device attack

messages

74

JavaScript attacker client

Congestion

Browser

attacker Web site

bursts

delays

neighbor

viewer Cross-device attack

messages

75

JavaScript attacker client

Noisy, coarse estimate of actual traffic bursts

Congestion

Browser

attacker Web site

bursts

delays

neighbor

viewer Cross-device attack

messages

76

[Plot: message delays and traffic burst sizes (scaled down); delay (milliseconds) vs. time (seconds)]

Delay-bursts

77

For each traffic burst, compute the aggregate delay induced. Use the resulting time series as input to the neural network.

[Plot: message delays and traffic burst sizes (scaled down); delay (milliseconds) vs. time (seconds)]

Delay-bursts

78

Delay-bursts vs. traffic bursts

delay-bursts time series: the delays induced by traffic bursts
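A rough sketch of turning measured message delays into a delay-bursts series follows. It is an illustration under assumptions: the off-path attacker does not see the victim's traffic bursts directly, so runs of above-baseline delays are used as a proxy for them. The name `delay_bursts`, the fixed baseline delay, and the 0.5-second gap between bursts are hypothetical.

```python
def delay_bursts(samples, baseline_ms, idle_gap=0.5):
    """Aggregate excess message delay into bursts.

    samples: list of (send_time_seconds, delay_milliseconds) for the
    attacker's own probe messages. Delay above `baseline_ms` is treated as
    congestion induced by the victim's traffic. Consecutive delayed samples
    no more than `idle_gap` seconds apart form one delay-burst.
    Returns a list of (burst_start_time, aggregate_excess_delay_ms).
    """
    bursts = []
    last_delayed = None
    for time_s, delay_ms in sorted(samples):
        excess = delay_ms - baseline_ms
        if excess <= 0:
            continue                        # no congestion seen in this sample
        if last_delayed is None or time_s - last_delayed > idle_gap:
            bursts.append([time_s, 0.0])    # open a new delay-burst
        bursts[-1][1] += excess
        last_delayed = time_s
    return [(start, total) for start, total in bursts]


if __name__ == "__main__":
    probes = [(0.0, 10), (0.1, 40), (0.2, 60), (0.3, 10), (5.0, 30), (5.1, 12)]
    print(delay_bursts(probes, baseline_ms=10))
```

The output is a noisy, coarse stand-in for the traffic-burst series, which is why the slides pair it with a classifier that tolerates noise.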

79

Accuracy: 0.965

false positive rate: 0.003, recall 0.933

1/10 cross-device attack: precision vs. recall

80

JavaScript detector code

Browser

attacker Web site

Cross-device attack

neighbor

viewer

81

attacker Web site

Cross-site attack

Streaming client

victim PC browser window

browser window

JavaScript detector code

82

• Modern streaming traffic characteristics

– Title bitrate pattern unique when sampled at few-seconds granularity

– Fetching at segment granularity (= every few seconds)

• Optimizes “quality of experience”, server load, and network bandwidth utilization

• However, information leakage is intrinsic…

Buffer below threshold? yes: fetch next segment; no: keep checking

Mitigating the DASH leak

83

• Further information and the paper:

https://beautyburst.github.io/

Thank you!