
User Behavior Analysis in Wi-Fi network

Date post: 04-Feb-2016
User Behavior Analysis in Wi-Fi network. Anna Rosenberg. Supervisor: Orly Avner.
Page 1: User Behavior Analysis in Wi-Fi  network

USER BEHAVIOR ANALYSIS IN WI-FI NETWORK

Anna Rosenberg

Supervisor: Orly Avner

Page 2: User Behavior Analysis in Wi-Fi  network

Overview

The goal of this project: to analyze a Wi-Fi network’s APs to model the wireless clients using the network

The contributions of this project:
  analysis of Access Points
  the use of k-means and g-means algorithms for clustering the network’s users

Page 3: User Behavior Analysis in Wi-Fi  network

Previous Work

"Modeling client arrivals at access points in wireless campus-wide networks" (Maria Papadopouli, Haipeng Shen, Manolis Spanakis):
  models the arrival processes of clients at APs as time-varying Poisson processes with different arrival-rate functions
  analyzes the traffic load characteristics (e.g., bytes, number of packets, associations, distinct clients, type of clients)
  clusters the APs based on their visit arrivals and on the building type

Page 4: User Behavior Analysis in Wi-Fi  network

Previous Work

"Characterizing user behavior and network performance in a public wireless LAN" (Anand Balachandran, Geoffrey Voelker, Paramvir Bahl, and Venkat Rangan). In Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, 2002.

Their overall analysis of user behavior shows that:
  Users are evenly distributed across all APs, and user arrivals are correlated in time and space
  User arrivals into the network can be modeled by a two-state Markov-Modulated Poisson Process (MMPP)
  There is an implicit correlation between session duration and average data rates: longer sessions typically have very low data requirements, and most of the sessions with a high average data rate are very short

Page 5: User Behavior Analysis in Wi-Fi  network

Previous Work

"Modeling users’ mobility among Wi-Fi access points" (Minkyong Kim, David Kotz):
  Network messages were collected on the Dartmouth campus
  Modeling user movements between APs
  Clustering the APs based on their peak hour

Page 6: User Behavior Analysis in Wi-Fi  network

Data

Router (Sniffer)

Packets:
  MAC address of the access point
  MAC address of the user
  Source/Destination IP addresses
  Size of the packet
  The time it was received

Page 7: User Behavior Analysis in Wi-Fi  network

IEEE 802.11 Architecture

Cells (called Basic Service Set or BSS)
Base Station (called Access Point, or AP for short)
Access Points are connected through a backbone (called Distribution System or DS)

The examined network: 16 APs

Page 8: User Behavior Analysis in Wi-Fi  network

Arrival Rate at APs

AP1, AP8:

[Figure: plot of rate [B/min] for AP 1, averaging window = 0.5 hour; x-axis: Arrival Time [Hour/2], 0:00 to 24:00; y-axis: Rate [B/min], scale ×10^5]

[Figure: plot of rate [B/min] for AP 8, averaging window = 0.5 hour; x-axis: Arrival Time [Hour/2], 0:00 to 24:00; y-axis: Rate [B/min]]

Active from midday till the evening. Active only in the evening.

Page 9: User Behavior Analysis in Wi-Fi  network

Analyzing the arrival rate with different averaging windows

Page 10: User Behavior Analysis in Wi-Fi  network

Analyzing the arrival rate with different averaging windows

[Figure: plot of rate [B/min] for AP 1; x-axis: Arrival Time [hour]; y-axis: Rate [B/min], scale ×10^5; one curve per averaging window w = 0.1, 0.2, 0.25, 0.3, 0.35, 0.4]
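The effect of the averaging window can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `arrival_rate`, the `(time_in_hours, size_in_bytes)` packet format, and the sample packets are assumptions made for illustration, not the code used in the project.

```python
from collections import defaultdict

def arrival_rate(packets, window_hours):
    """Average byte rate [B/min] per time window for one AP.

    packets: iterable of (time_in_hours, size_in_bytes) pairs (assumed format).
    Returns {window_start_hour: rate in bytes per minute}.
    """
    bytes_per_window = defaultdict(int)
    for t_hours, size in packets:
        index = int(t_hours // window_hours)  # window the packet falls into
        bytes_per_window[index] += size
    window_minutes = window_hours * 60
    return {
        index * window_hours: total / window_minutes
        for index, total in sorted(bytes_per_window.items())
    }

# Hypothetical packets: (arrival time in hours, size in bytes)
packets = [(12.1, 600), (12.2, 900), (12.7, 1500), (18.4, 3000)]
rate_half_hour = arrival_rate(packets, 0.5)
rate_one_hour = arrival_rate(packets, 1.0)
```

A shorter window keeps short bursts visible, while a longer window smooths them into the surrounding idle time.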

Page 11: User Behavior Analysis in Wi-Fi  network

Users

3273 users

The transmission rate:

Page 12: User Behavior Analysis in Wi-Fi  network

Coherence with the time of lectures and breaks

Users are active during the breaks and inactive during the lectures, which last 50-55 minutes.

Page 13: User Behavior Analysis in Wi-Fi  network

Visit duration

How to define a visit?

We chose 30 minutes as the maximal inter-arrival time between two packets that can still be considered packets of one visit.
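The 30-minute rule can be sketched as follows. The function name `split_into_visits`, the use of seconds as the time unit, and the sample timestamps are assumptions for illustration; a gap of exactly 30 minutes is treated here as still belonging to the same visit.

```python
VISIT_GAP_SECONDS = 30 * 60  # the 30-minute threshold chosen above

def split_into_visits(timestamps, max_gap=VISIT_GAP_SECONDS):
    """Group one user's packet timestamps (in seconds) into visits.

    A new visit starts whenever the gap to the previous packet exceeds max_gap.
    Returns a list of (start, end) tuples.
    """
    visits = []
    for t in sorted(timestamps):
        if visits and t - visits[-1][1] <= max_gap:
            visits[-1] = (visits[-1][0], t)   # extend the current visit
        else:
            visits.append((t, t))             # open a new visit
    return visits

# Hypothetical packet timestamps of one user, in seconds since midnight
timestamps = [0, 600, 1500, 5000, 5100, 9000]
visits = split_into_visits(timestamps)
```

With these timestamps the user has three visits; the last one consists of a single packet and therefore has zero duration.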

Page 14: User Behavior Analysis in Wi-Fi  network

Features

The average characteristics:
  Average visit duration
  Average inter-arrival times between the visits
  Average traffic
  Number of visits
  Total number of days in the system
  The std of inter-arrival times
  The std of traffic
  The std of visit duration
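A minimal sketch of how such a per-user feature vector could be computed from a list of visits. Several details are assumptions, since the slides do not specify them: visits are `(start, end, bytes)` tuples with times in seconds, the inter-arrival time is measured from the end of one visit to the start of the next, and the population standard deviation is used.

```python
from statistics import mean, pstdev

def user_features(visits):
    """visits: list of (start, end, bytes) tuples for one user, sorted by start."""
    durations = [end - start for start, end, _ in visits]
    traffic = [size for _, _, size in visits]
    # Gap between the end of a visit and the start of the next one (assumed definition)
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(visits, visits[1:])]
    days = {int(start // 86400) for start, _, _ in visits}
    return {
        "avg_visit_duration": mean(durations),
        "std_visit_duration": pstdev(durations),
        "avg_inter_visit_time": mean(gaps) if gaps else 0.0,
        "std_inter_visit_time": pstdev(gaps) if gaps else 0.0,
        "avg_traffic": mean(traffic),
        "std_traffic": pstdev(traffic),
        "num_visits": len(visits),
        "num_days": len(days),
    }

# Hypothetical visits: (start [s], end [s], bytes transferred)
visits = [(0, 600, 1000), (4200, 4800, 3000), (90000, 91200, 2000)]
features = user_features(visits)
```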

Page 15: User Behavior Analysis in Wi-Fi  network

Features

No typical clusters can be found among the network’s users:

Av. inter-visit times vs. Av. visit duration

Av. inter-visit times vs. Number of visits

Page 16: User Behavior Analysis in Wi-Fi  network

Features

Av. traffic vs. Av. visit duration

Av. traffic vs. Number of visits

Page 17: User Behavior Analysis in Wi-Fi  network

Clustering

Unsupervised learning problem
Finding a structure in a collection of unlabeled data
Collection of objects which are “similar”
Distance measure

Page 18: User Behavior Analysis in Wi-Fi  network

K-Means Clustering

Features:
  Average visit duration
  Average inter-visit times
  Average traffic per packet
  Maximal distance between visits
  Minimal distance between visits
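For reference, k-means on such feature vectors can be sketched in plain Python as below. This is an illustrative version of Lloyd's algorithm with Euclidean distance, not the implementation used in the project; the deterministic farthest-first seeding and the toy two-dimensional points are assumptions. `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with Euclidean distance. Returns (centers, labels)."""
    # Farthest-first seeding (an assumption; the slides do not say how centers were chosen)
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break                                   # assignments stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                             # an empty cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Hypothetical, already normalized two-feature points
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (9.0, 9.0), (9.2, 8.8), (8.9, 9.1)]
centers, labels = kmeans(points, k=2)
```

Because the features listed above have very different scales, they would normally be normalized before the Euclidean distance is applied.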

Page 19: User Behavior Analysis in Wi-Fi  network

Results of K-Means Clustering K=2

Av. visit duration vs. Av. inter-visit times

Av. inter-visit times vs. Av. traffic per packet

Max. distance between visits vs. Min. distance between visits

Page 20: User Behavior Analysis in Wi-Fi  network

Results of K-Means Clustering K=3

Av. visit duration vs. Av. inter-visit times

Av. inter-visit times vs. Av. traffic per packet

Max. distance between visits vs. Min. distance between visits

Page 21: User Behavior Analysis in Wi-Fi  network

Results of K-Means Clustering K=4

Max. distance between visits vs. Min. distance between visits

Av. visit duration vs. Av. inter-visit times

Av. inter-visit times vs. Av. traffic per packet

Page 22: User Behavior Analysis in Wi-Fi  network

K-Means Clustering: conclusion

The k-means clustering algorithm based on average characteristics of the network’s users can’t produce any isolated clusters. We therefore conclude that an algorithm based on average characteristics can’t cluster the network’s users well.

Possible reasons for unsuccessful clustering:
  Using a feature set that doesn’t provide enough information about the system
  Not enough samples
  Using Euclidean distance

Page 23: User Behavior Analysis in Wi-Fi  network

G-Means Clustering Algorithm

The right number k of clusters to use is often not obvious
Based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution
The standard statistical significance level α: the desired probability of incorrectly splitting
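The Gaussian check at the core of g-means can be sketched with an Anderson-Darling test on one-dimensional data. This is a partial illustration only: it covers the test itself, not the projection of a cluster onto the axis between two candidate child centers or the splitting loop. The function names are hypothetical, and the critical value 1.8692 for α = 0.0001 is taken from memory of the g-means paper by Hamerly and Elkan, so it should be checked against that paper before use. `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling(values):
    """Corrected Anderson-Darling statistic for 1-D data with estimated mean and variance."""
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    std_normal = NormalDist()
    # Normal CDF of the standardized, sorted sample, clamped to avoid log(0)
    z = [min(max(std_normal.cdf((v - mu) / sigma), 1e-12), 1 - 1e-12)
         for v in sorted(values)]
    s = sum((2 * i + 1) * (math.log(z[i]) + math.log(1 - z[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1 + 4 / n - 25 / n ** 2)   # small-sample correction

def looks_gaussian(values, critical_value):
    """True if the Gaussian hypothesis is not rejected, so the cluster is kept unsplit."""
    return anderson_darling(values) <= critical_value

CRITICAL_VALUE = 1.8692  # assumed value for alpha = 0.0001; verify before use

# Deterministic toy data: exact normal quantiles, and two shifted copies of them
quantiles = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
unimodal = quantiles
bimodal = [q - 5 for q in quantiles] + [q + 5 for q in quantiles]
keep_unimodal = looks_gaussian(unimodal, CRITICAL_VALUE)
keep_bimodal = looks_gaussian(bimodal, CRITICAL_VALUE)
```

A smaller α raises the critical value, so fewer clusters are split and the final number of clusters drops.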

Page 24: User Behavior Analysis in Wi-Fi  network

G-means

A different feature set provides more data points

Each point consists of the following components:
  The visit duration
  The time between this visit and the previous visit
  Number of packets that were sent during the visit
  The average amount of data that was accessed during the visit

Normalize the data components to get proper results even with a simple Euclidean distance metric

50 users with maximal number of visits: 3457 points
Users with more than 10 visits: 572 users, 15105 points
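The slides do not say which normalization was applied. One common choice, shown here purely as an assumption, is to scale every component to zero mean and unit variance so that no single component dominates the Euclidean distance. The function name and the sample points are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(points):
    """Scale every component of the points to zero mean and unit variance."""
    columns = list(zip(*points))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]   # a constant column maps to 0
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]

# Hypothetical visit points:
# (duration [s], time since previous visit [s], packets sent, average data per packet [B])
points = [
    (600, 3600, 120, 400.0),
    (1200, 7200, 300, 800.0),
    (1800, 86400, 60, 1200.0),
]
normalized = zscore_columns(points)
```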

Page 25: User Behavior Analysis in Wi-Fi  network

G-means

The dependence of number of clusters on α:

[Figure: alpha vs. average number of clusters; x-axis: alpha, 0 to 0.1; y-axis: number of clusters, 0 to 250]

Page 26: User Behavior Analysis in Wi-Fi  network

G-means results

70 clusters, α = 0.0001

[Figure: user 136 labels; x-axis: 0 to 70]

[Figure: user 202 labels; x-axis: 0 to 70]

58 visits, 8788 packets, 30 clusters; the most common clusters: 11, 20, 29 and 35.

59 visits, 28777 packets, 31 clusters; the most common cluster: 30.

Page 27: User Behavior Analysis in Wi-Fi  network

Evaluation

Purity:

$$\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j|$$

$\Omega = \{\omega_1, \omega_2, \ldots, \omega_K\}$ - the set of clusters

$C = \{c_1, c_2, \ldots, c_J\}$ - the set of classes

N - the total number of samples

Example: the majority class and the number of members of the majority class for the three clusters are: x, 5 (cluster 1); o, 4 (cluster 2); and ◊, 3 (cluster 3). Purity is (1/17) × (5 + 4 + 3) ≈ 0.71
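The purity example can be checked with a short sketch. The slide only gives the majority counts and the total of 17 samples; the minority members placed in each cluster below are an assumption chosen to match those numbers, and the function name is hypothetical.

```python
from collections import Counter

def purity(clusters):
    """clusters: list of lists of class labels, one inner list per cluster."""
    total = sum(len(members) for members in clusters)
    # For each cluster, count the members of its majority class
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters)
    return majority / total

# 17 samples; majority classes x (5), o (4) and ◊ (3) as in the example above
clusters = [
    list("xxxxxo"),    # cluster 1
    list("xoooo◊"),    # cluster 2
    list("xx◊◊◊"),     # cluster 3
]
score = purity(clusters)
```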

Page 28: User Behavior Analysis in Wi-Fi  network

Evaluation

The dependence of the purity on α

[Figure: alpha vs. purity; x-axis: alpha, 0 to 0.1; y-axis: purity, 0 to 0.18]

Page 29: User Behavior Analysis in Wi-Fi  network

Evaluation

New Evaluation Measure: the level of possibility of representing each user by one typical cluster

$$E = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{M_i}$$

N - total number of users
$x_i$ - number of samples contained in the most common cluster of user i
$M_i$ - total number of samples of user i

Example: There are 3 users: x, o and ◊. The number of samples contained in the most common cluster and the total number of samples for the three users are: 5, 8 (user x); 4, 5 (user o); and 3, 4 (user ◊). E = (1/3) × (5/8 + 4/5 + 3/4) ≈ 0.725
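The measure E can be computed directly from the cluster label of each sample. In this sketch the function name and the concrete label values are hypothetical; only the counts (5 of 8, 4 of 5, 3 of 4) come from the example.

```python
from collections import Counter

def representativeness(labels_per_user):
    """labels_per_user: {user: [cluster label of each sample]}. Returns E."""
    ratios = []
    for labels in labels_per_user.values():
        most_common_count = Counter(labels).most_common(1)[0][1]   # x_i
        ratios.append(most_common_count / len(labels))             # x_i / M_i
    return sum(ratios) / len(ratios)                               # average over the N users

labels_per_user = {
    "x": [1] * 5 + [2, 3, 4],   # 5 of 8 samples in the most common cluster
    "o": [2] * 4 + [1],         # 4 of 5
    "◊": [3] * 3 + [2],         # 3 of 4
}
E = representativeness(labels_per_user)
```

E equals 1 only when every user's samples all fall into a single cluster, so low values mean that users cannot be represented by one typical cluster.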

Page 30: User Behavior Analysis in Wi-Fi  network

Evaluation

The dependence of the evaluation measure E on α

[Figure: evaluation measure E vs. alpha; x-axis: alpha, 0 to 0.1; y-axis: 0.04 to 0.14]

Page 31: User Behavior Analysis in Wi-Fi  network

G-Means Clustering: conclusion

The g-means clustering algorithm based on points that consist of the 4 characteristics described earlier can’t represent each user by one typical cluster. We therefore conclude that this algorithm can’t cluster the network’s users well.

Possible reasons for unsuccessful clustering:
  Using a feature set that doesn’t provide enough information about the system
  Not enough samples
  Using Euclidean distance

Page 32: User Behavior Analysis in Wi-Fi  network

Conclusions

The Access Points’ arrival rate is coherent with the time of lectures and breaks. The APs show low activity during the lectures and high activity during the breaks.

The k-means clustering algorithm based on average characteristics of the network’s users can’t produce any isolated clusters. We therefore conclude that an algorithm based on average characteristics can’t cluster the network’s users well.

The g-means clustering algorithm based on points that consist of the 4 characteristics described earlier can’t represent each user by one typical cluster. We therefore conclude that this algorithm can’t cluster the network’s users well.

Page 33: User Behavior Analysis in Wi-Fi  network

Future work

Select another subset of features
Use another clustering algorithm
Try to collect more data samples

Page 34: User Behavior Analysis in Wi-Fi  network

Questions

