Analyzing Characteristic Host Access Patterns for Re-Identification of Web User Sessions

Dominik Herrmann, Christoph Gerber, Christian Banse, Hannes Federrath

Management of Information Security, University of Regensburg, Germany

NordSec, 27-29 October 2010, Aalto University, Espoo, Finland

behavior-based web user re-identification Christoph Gerber 2

agenda

•  problem description
•  relation to text-mining
•  case study and test setting
•  re-identification

problem description

•  small user group (e.g. users of a proxy server)
•  all HTTP requests are recorded
•  changing IP addresses / different surfing sessions

Requests as seen by the proxy server:

IP 1, User 1: www.wikipedia.de
IP 2, User 2: www-sec.uni-r.de
IP 2, User 2: www.cse.tkk.fi
IP 1, User 1: www.google.de

perspective of a proxy server

[Figure, shown in four build steps: hosts requested over time t in Session 1 (IP1), Session 2 (IP2), Session 3 (IP3) and Session 4 (IP4). Legend: each marker is an HTTP request to a certain host issued by a user with the given IP address (the legend example is IP address 4). Labels added step by step: "User 1" and "User 2"; then "User 1? User 2? someone else?"; then "aggregated session", "User 2", "User 1".]

modeling the classification problem

Example, Session 4 (IP4):

$s = (x_1^{1}, x_2^{1}, x_3^{2}, x_4^{1})$, where the subscript denotes the host and the superscript its access frequency

x_1: www.wikipedia.de
x_2: www-sec.uni-r.de
x_3: www.cse.tkk.fi
x_4: www.google.de

•  each session (s) consists of a multiset of requested hosts
•  each surfing session (s) is an instance of a class
•  each class represents a user
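The multiset model can be sketched in a few lines of Python. This is only an illustration: the request list, the host order and the helper name `session_vector` are assumptions, not taken from the study.

```python
from collections import Counter

# Hypothetical request log for one session: one hostname per HTTP request.
requests = [
    "www.wikipedia.de",
    "www-sec.uni-r.de",
    "www.cse.tkk.fi",
    "www.cse.tkk.fi",
    "www.google.de",
]

def session_vector(hosts, vocabulary):
    """Map a session (a multiset of hosts) to a frequency vector over a fixed host order."""
    counts = Counter(hosts)
    return [counts[h] for h in vocabulary]

vocabulary = ["www.wikipedia.de", "www-sec.uni-r.de", "www.cse.tkk.fi", "www.google.de"]
vector = session_vector(requests, vocabulary)
print(vector)
```

Each session then becomes one labeled instance, with the user as its class.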

classification of user sessions

[Figure: hosts requested over time t in Session 1 (IP1), Session 2 (IP2), Session 3 (IP3) and Session 4 (IP4). Labels: Training (User 1, User 2), Test, Classifier, aggregated session.]

similarity to text-mining problems

•  word frequency and host frequency both follow a power law

[Figure, two panels. Text retrieval: frequency (0 to 70000) over word ranking (1 to 96), based on http://www.cs.princeton.edu/introcs/data/bible.txt. User re-identification: frequency (0 to 70000) over host ranking (1 to 101).]
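A rank-frequency list like the one plotted for words and hosts can be computed as follows. This is a small sketch on made-up data; `rank_frequency` is a hypothetical helper, and the same function works for words in a text or hosts in a proxy log.

```python
from collections import Counter

def rank_frequency(items):
    """Return (rank, frequency) pairs, most frequent item first."""
    counts = Counter(items)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))

# Toy data (made up): a few hosts dominate, the rest appear once.
hosts = ["a"] * 8 + ["b"] * 4 + ["c"] * 2 + ["d", "e"]
ranking = rank_frequency(hosts)
print(ranking)
```

Plotting these pairs on logarithmic axes gives a roughly straight line when the data follows a power law.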

text-mining toolbox

•  multinomial naive Bayes (MNB)
•  vector transformations
   -  TF transformation
   -  IDF transformation
   -  cosine normalisation (N)

cf. Manning, Raghavan, Schütze: Introduction to Information Retrieval. Cambridge University Press 2009. [21]
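The three vector transformations can be sketched as below. The slide does not give the formulas, so this assumes the common textbook definitions: TF as log(1 + f), IDF as log(n / df), and cosine normalisation as scaling to unit Euclidean length. All function names are illustrative.

```python
import math

def tf(vector):
    """Sublinear term-frequency scaling: f -> log(1 + f)."""
    return [math.log(1 + f) for f in vector]

def idf_weights(training_vectors):
    """Per-host weight log(n / df), where df is the number of sessions containing the host."""
    n = len(training_vectors)
    dims = len(training_vectors[0])
    weights = []
    for j in range(dims):
        df = sum(1 for v in training_vectors if v[j] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights

def idf(vector, weights):
    """Scale each host frequency by its IDF weight."""
    return [f * w for f, w in zip(vector, weights)]

def normalise(vector):
    """Cosine normalisation: scale the vector to Euclidean length 1."""
    length = math.sqrt(sum(f * f for f in vector))
    return [f / length for f in vector] if length else list(vector)

def tf_n(vector):
    """TF transformation followed by cosine normalisation (TF-N)."""
    return normalise(tf(vector))

sessions = [[1, 1, 2, 1], [0, 3, 0, 1]]   # made-up host frequency vectors
weights = idf_weights(sessions)
print(tf_n(sessions[0]))
```

The IDF weights are learned from the training sessions and then applied unchanged to test sessions; TF and N need no training data.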

related work

•  Pang et al. (2007)
   -  re-identification of users in 802.11 wireless networks
•  Yang (2008)
   -  focus on fraud detection
•  Kumpost (2009)
   -  focus on re-identification of web users

test setting and case study

•  test users
•  local proxy server
•  host obfuscation
•  client/server architecture

key                          value
participants                 28
duration of study in days    57
number of HTTP requests      2,684,736
number of unique hosts       25,124

data acquisition

[Figure: users #1, #2, ..., #n, each with a local proxy server inside the scope of protection; HTTP traffic goes to the WWW; host obfuscation is applied to the study data, which is collected as aggregated study data.]

host obfuscation

[Figure: www.google.de and <salt> are fed into a hash function, repeated several times, yielding 134BC2D1F..0D.]

•  hashing of hostnames
•  + salt to prevent dictionary attacks
•  + iterations to prevent an attacker from building their own dictionary
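A salted, iterated hash of this kind might look as follows. The slides do not name the hash function or the iteration count, so SHA-256, 1000 rounds and the function name `obfuscate_host` are assumptions made for this sketch.

```python
import hashlib

def obfuscate_host(hostname: str, salt: bytes, iterations: int = 1000) -> str:
    """Hash a hostname together with a secret salt, then re-hash the digest repeatedly."""
    digest = hashlib.sha256(salt + hostname.encode("utf-8")).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(salt + digest).digest()
    return digest.hex().upper()

salt = b"example-salt"  # placeholder; a real salt would be random and kept secret
token = obfuscate_host("www.google.de", salt)
print(token)
```

The same hostname always maps to the same token, so access frequencies survive obfuscation while the hostnames themselves are hidden.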

user contribution on a daily basis

re-identification attack

•  attacker's view
   -  limited knowledge
   -  practical relevance
•  simulations
   -  for evaluating the driving factors
•  countermeasures

attacker's view (training)

•  Δt = 24h
•  decision to track a specific user u_t on day t
•  training with U_t classes on day t with S_t sessions

[Figure: hosts requested over time on day t by User 1 and User 2, used as training data for MNB.]

attacker's view (attack)

•  on day t+1, assigning each session s to a class u_t
•  evaluating the classification result for class c_u

[Figure: hosts requested over time; the day t sessions of User 1 and User 2 are the training data, the day t+1 sessions are the test data classified by MNB.]
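The training and attack steps can be sketched as a small multinomial naive Bayes classifier. This is not the study's implementation: it works on raw host counts, uses Laplace smoothing and ignores hosts never seen in training, all of which are assumptions. User and host names are made up.

```python
import math
from collections import Counter, defaultdict

def train_mnb(training_sessions):
    """training_sessions: list of (user, list_of_hosts) pairs observed on day t."""
    host_counts = defaultdict(Counter)   # user -> host -> frequency
    session_counts = Counter()           # user -> number of training sessions
    for user, hosts in training_sessions:
        host_counts[user].update(hosts)
        session_counts[user] += 1
    vocabulary = {h for counts in host_counts.values() for h in counts}
    return host_counts, session_counts, vocabulary

def classify(session_hosts, model):
    """Return the user whose class maximises the log posterior for the session."""
    host_counts, session_counts, vocabulary = model
    total_sessions = sum(session_counts.values())
    best_user, best_score = None, -math.inf
    for user, counts in host_counts.items():
        score = math.log(session_counts[user] / total_sessions)   # class prior
        denominator = sum(counts.values()) + len(vocabulary)       # Laplace smoothing
        for host, freq in Counter(session_hosts).items():
            if host in vocabulary:   # hosts never seen in training are ignored
                score += freq * math.log((counts[host] + 1) / denominator)
        if score > best_score:
            best_user, best_score = user, score
    return best_user

day_t = [
    ("user1", ["wikipedia", "wikipedia", "google", "news"]),
    ("user2", ["uni-r", "tkk", "tkk", "google"]),
]
model = train_mnb(day_t)
prediction = classify(["tkk", "google", "uni-r"], model)   # a session from day t+1
print(prediction)
```

Here the day t+1 session shares most of its hosts with user2's training session, so it is assigned to that class.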

prediction scheme of attacker's view

•  correctly classified by proxy server
   -  attacker successfully recognizes the user
   -  attacker successfully recognizes the absence of the user
•  wrong classification, error is detectable for proxy server
   -  more than one user was predicted to belong to class c_u
•  wrong classification, error not detectable for proxy server
   -  attacker detects absence of user, but user was online
   -  attacker wrongly recognizes the user
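The five outcomes can be told apart mechanically once the true users are known. The sketch below is one reading of the scheme, assuming at most one session per user and day (which is why more than one session assigned to the tracked class is a detectable error). The function name and outcome strings are illustrative.

```python
def evaluate_tracking(predictions, tracked_user):
    """predictions: list of (true_user, predicted_user) pairs for the sessions of day t+1."""
    assigned = [true for true, predicted in predictions if predicted == tracked_user]
    user_online = any(true == tracked_user for true, _ in predictions)
    if len(assigned) > 1:
        return "detectable error: more than one session assigned"
    if len(assigned) == 1:
        if assigned[0] == tracked_user:
            return "correct: user recognised"
        return "undetectable error: wrong user recognised"
    # No session was assigned to the tracked user's class.
    if user_online:
        return "undetectable error: user missed"
    return "correct: absence recognised"

outcome = evaluate_tracking([("u1", "u1"), ("u2", "u2")], "u1")
print(outcome)
```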

results from the attacker's view

•  user re-identification works
   -  60.5% correctly classified sessions
•  and can be improved by vector transformations
   -  73.1% by applying TF-N transformation
•  further improvements are possible
   -  77.6% by 'learning' the user habits
•  more improvements conceivable
   -  timing information
   -  filenames
   -  GET parameters
   -  destination ports
   -  ...

correctly classified sessions per vector transformation:

transformation   none    N       IDF     IDF-N   TF      TF-N    TF-IDF   TF-IDF-N
correct          60.5%   62.9%   65.0%   62.8%   56.0%   73.1%   66.1%    72.8%

simulations

•  simulation of simultaneous surfing sessions
   -  putting together the chronologically succeeding sessions
   -  always 28 users / session
•  in each experiment one parameter was modified
   -  session duration
   -  number of simultaneous users
   -  offset between last training and first test session
   -  number of consecutive training instances
•  each experiment was repeated 25 times

session duration

•  longer session times support re-identification

[Figure (SIM): proportion of correctly classified sessions (0 to 1) over session duration in minutes (1 to 10000, logarithmic axis).]

number of simultaneous users

•  the fewer simultaneous users, the better re-identification works

[Figure: proportion of correctly classified sessions (0 to 1) over number of concurrent users (0 to 30); one curve per session duration: 24 hours, 3 hours, 1 hour, 10 min.]

offset between test and training sessions

•  each user tends to act similarly at the same time of day

[Figure: proportion of correctly classified sessions (0 to 1) over offset between test and training in hours (0 to 160); curves labeled 3 hours and 1 hour.]

number of training instances

•  more training instances are better, but only a few are needed

[Figure: proportion of correctly classified sessions (0 to 1) over number of training instances (0 to 20); curves labeled 3 hours, 1 hour, 1 hour (48 hours train/test offset) and 10 min.]

countermeasures

•  using multiple, non-colluding proxy servers works
   -  but is not practicable (at this early stage)
•  more distribution schemes conceivable

[Figure: proportion of correctly classified (re-identified) sessions (0 to 1) over number of proxy servers (0 to 50); curves labeled 1 day and 3 hours.]
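Distributing requests over several proxies can be illustrated as follows. The slides do not say how requests were assigned to proxies, so picking a proxy uniformly at random per request is an assumption of this sketch, as are the function name and the host names.

```python
import random
from collections import Counter

def split_over_proxies(session_hosts, n_proxies, rng):
    """Send each request to a randomly chosen proxy; return what each proxy observes."""
    views = [Counter() for _ in range(n_proxies)]
    for host in session_hosts:
        views[rng.randrange(n_proxies)][host] += 1
    return views

session = ["wikipedia", "google", "tkk", "tkk", "uni-r", "google"]
views = split_over_proxies(session, n_proxies=3, rng=random.Random(0))
for i, view in enumerate(views, start=1):
    print(f"proxy {i}: {dict(view)}")
```

Each non-colluding proxy sees only a fragment of the host multiset, which gives a classifier less to work with.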

countermeasures

•  analyzing a part of the host frequency distribution
   -  keep the most popular hosts
   -  cannot prevent user re-identification

[Figure: access frequency (1 to 100000) over host ranking (1 to 100000), both axes logarithmic.]

[Figure: proportion of correctly classified (re-identified) sessions (0 to 1) over proportion of most popular hosts kept (0 to 0.5); curves labeled 1 day, 3 hours, 1 hour and 10 minutes.]
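Restricting sessions to the most popular hosts can be sketched like this. It is a minimal illustration on made-up data; `keep_popular_hosts` is a hypothetical helper, and measuring popularity by the total request count over all sessions is an assumption.

```python
from collections import Counter

def keep_popular_hosts(sessions, proportion):
    """Keep only requests to the most popular `proportion` of hosts across all sessions."""
    totals = Counter(h for hosts in sessions for h in hosts)
    n_keep = int(len(totals) * proportion)
    popular = {h for h, _ in totals.most_common(n_keep)}
    return [[h for h in hosts if h in popular] for hosts in sessions]

sessions = [["a", "a", "b"], ["a", "c", "d"]]   # made-up sessions
filtered = keep_popular_hosts(sessions, 0.25)
print(filtered)
```

The filtered sessions can then be fed to the same classifier to see how much accuracy remains.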

conclusion and discussion

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET http://www.ab.com/index.html HTTP/1.0" 200 2326

•  re-identification as a feasible attack
•  evaluated on a privacy-preserving case study
•  works well for small closed groups
•  not only relevant for proxy servers
•  improvements in using context information
•  improvements in gathering more realistic sessions

