Computer Networks 136 (2018) 137–154
An intelligent cyber security system against DDoS attacks in SIP networks
Murat Semerci a,∗, Ali Taylan Cemgil a, Bülent Sankur b
a Department of Computer Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey
b Department of Electrical and Electronics Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey
Article info
Article history:
Received 26 September 2017
Revised 11 January 2018
Accepted 25 February 2018
Available online 7 March 2018
Keywords:
Anomaly detection
Malicious user detection
DDoS
Mahalanobis distances
Sequence alignment kernel
Abstract
Distributed Denial of Service (DDoS) attacks are among the most frequently encountered cyber criminal activities in communication networks and can result in considerable financial and prestige losses for corporations or governmental organizations. Autonomous detection of a DDoS attack and identification of its sources is therefore essential for taking counter-measures. This study proposes an intelligent security system against DDoS attacks in communication networks that is composed of two components: a monitor for the detection of DDoS attacks and a discriminator for the detection of users in the system with malicious intents. A novel adaptive real-time change-point model that tracks the changes in Mahalanobis distances between sampled feature vectors in the monitored system accounts for possible DDoS attacks. A clustering model that runs over the similarity scores of behavioral patterns between the users is used to segregate the malicious from the innocent. The proposed model is deployed over a simulated telephone network that uses a Session Initiation Protocol (SIP) server. The performance of the models is evaluated on data generated by this high-throughput simulation environment.
© 2018 Elsevier B.V. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (M. Semerci).
https://doi.org/10.1016/j.comnet.2018.02.025
1389-1286/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Distributed Denial of Service (DDoS) attacks are one of the major cyber threats on communication networks. DDoS attacks occur very frequently because they are fairly simple and cheap to initiate, while their broad impact on users and service providers can potentially be severe. Such an attack incapacitates the victim server and renders it unable to provide services at all, or at the desired quality of service levels, to its subscribers. With the cost-effective deployment of cloud systems, DDoS attacks might affect the overall availability of services by targeting more than one server [1]. They can even be a tool for political struggle on a grander scale; a case in point is the set of DDoS attacks on Turkey's domain name servers by hacktivist groups in December 2015 [2]. As a more radical case, they can be exerted over smart power transmission grids, with potentially more catastrophic consequences [3]. Therefore, automatic detection of DDoS attacks and identification of malicious users are crucial for protecting the network entities and for non-degraded service continuity.

Telephone service providers follow the trend of changing their circuit-switched networks to packet-switched ones in view of the cost-effectiveness and maturity of Voice-over-IP (VoIP) technology. The most popular protocol for control signaling between communicating parties in VoIP is currently the Session Initiation Protocol (SIP) [4]. SIP is based on a simple, HTTP-like text-based request-response transaction model. It provides the basic signaling functionalities required for registering clients, checking their presence and on-line availability, exchanging their communication capabilities, and overall managing the sessions. With the deployment of 5G, VoIP is expected to be one of the major instruments for multimedia communication. The wide deployment of VoIP networks and the key importance of telephone networks have made the security of SIP servers extremely important.

VoIP networks are under a variety of cyber threats and the intensity of attacks seems only to be growing [5]. The attacks can be motivated by potential financial benefits, such as pilfering call charges or causing data leakage masqueraded as a stealth threat. Conversely, an attack may be part of a plan to cause financial losses to the service providers via heavy service disruption [6].

In this paper we introduce a novel real-time online intrusion detection and prevention system for communication networks, particularly for networks with SIP traffic. The proposed system both detects the presence of an attack and identifies the attackers. The system focuses on DDoS attacks that flood and suffocate a server with an excessive amount of requests. One clue for the occurrence of a DDoS attack is a marked change in the messaging traffic
patterns in the network. To this effect, we develop a change detec-
tion algorithm that monitors the network traffic intensities at the
server side. Significant changes in the characteristics of messaging
flows are interpreted as the onset or offset of a potential DDoS at-
tack. We assume tacitly that in a DDoS attack, the attackers are
always acting in a coordinated manner.
A novel aspect of the proposed change-point detection method
is that it relies on the adaptive tracking of Mahalanobis distances
between successive state vectors as a way to monitor abnormal
changes in messaging traffic. This enables the monitor to adapt it-
self to the normal traffic regime and/or to the diurnal or seasonal
variations while at the same time remaining sensitive to abnormal
changes. One advantage of our method is that it is model-free, that
is, it is an unsupervised approach to detect traffic anomalies. The
system makes use only of the observed messaging traffic type and
intensity, and does not require any additional information such as
tracebacks. An abnormal change in the traffic regime is declared if the Mahalanobis distance sequence of the state vectors in successive time windows exceeds a threshold function. This threshold value can be set to a constant as a function of system parameters or can be set adaptively. A preliminary version of the Mahalanobis-based anomaly detection algorithm was presented in a conference paper [7]. The first part of this paper presents an extension of that model with an adaptive thresholding function. Notice that the attacker identification, as described in the second part of the paper, was not part of the conference paper. The second novelty of our study is that the algorithm, besides detecting the occurrence of an attack, can also pinpoint the set of attackers. In other words, under certain realistic assumptions, it can discriminate between the messaging patterns of the attackers and those of the non-malicious, i.e., normal, users. Similarly, the attacker identification model runs in an unsupervised mode and is independent of the underlying attack model except for the assumption of attacker coordination. Performance results of the algorithm are studied under extensive network traffic and attack traffic simulations.
In Section 2 , we give a brief overview of cyber threats related
to SIP and proposed remedies for them. In Section 3 we define the
variables and symbols used to describe the time series correspond-
ing to the messaging history of the users and the state of the sys-
tem. In Section 4 , we introduce our change point monitor based
on Mahalanobis distances as an instrument to detect (D)DoS at-
tacks. The method for normal versus malicious user discrimination
is detailed in Section 5 . The performance of the proposed methods
is evaluated using simulation data and compared against those of
competitor algorithms in Section 6 . Finally, conclusions are drawn
in Section 7 where we also discuss the future work in the context
of IoT.
2. Literature review
In addition to the session layer attacks, telecommunication net-
works are also susceptible to a plethora of other threats below the
session layer [8] . Since these are discussed in detail elsewhere, in
this work, we focus solely on SIP-specific threats.
SIP attacks typically exploit vulnerabilities in the SIP protocol.
Signature-based attacks utilize properties of the SIP grammar, and
can be detected by pattern matching between ongoing traffic and
the set of signatures. In other words, this type of attack can be detected or even prevented by inspecting the steps that the attacker must follow. Non-signature-based threats, e.g., behavior-based attacks such as DDoS, are harder to detect. SIP threats can be roughly categorized into four groups [8]:
• Service Abuse Threats: These attacks include commercial abuse
of services to gain some financial benefit such as toll fraud or
billing avoidance.
• Eavesdropping, Interception and Modification Threats: These at-
tacks concentrate on illegally intervening to the call with the
goal of capturing sensitive information.
• Social Threats: These attacks exploit protocol shortcomings, misconfigurations or implementation bugs of the SIP server, and use these weaknesses to misrepresent the identity of malicious parties to the subscribers.
• (Distributed) Denial of Service ((D)DoS): These attacks focus on
the SIP server to prevent it from giving service to the sub-
scribers or to cause significant degradation in the quality of
network services. An attacker can achieve this by flooding the
server with SIP messages and depleting the network and server
resources, such as CPU, memory, and bandwidth. In a DoS attack, only one machine is involved in mounting the attack on the SIP server. If the attacks are simultaneously performed by many, possibly coordinated, machines, the attack becomes a DDoS attack. The botnet attack, where the attack is staged by many zombie machines controlled by a master node, is a well-known instance of DDoS.
There is a large variety of possible DDoS attacks, such as the Domain Name Server (DNS) attack and the fuzzing attack [9,10]. The DNS flooding attack wastes bandwidth resources by injecting fake addresses, tying up the call during address resolution, and causing unnecessary messaging traffic between the DNS and SIP servers. The fuzzing attack, on the other hand, wastes CPU time by forcing it to parse invalid SIP messages. DDoS attacks in SIP networks can be grouped into four classes: SIP message payload tampering, SIP message flow tampering, SIP message flooding, and finally exploiting SIP vulnerabilities, e.g., for toll fraud [11].

Many methods have been proposed to detect and prevent DDoS attacks in VoIP networks. For example, for the SIP message flooding varieties, an extended finite state machine (EFSM) can be designed for SIP transactions in order to monitor transaction anomalies [12]. Selected network traffic variables are tracked, and if an undefined transaction occurs or any traffic variable count exceeds a pre-determined threshold, a preventive action is triggered. A full protocol stack intrusion detection and prevention system for VoIP systems is proposed in [13]. This is a table-based system that collects and correlates data from different protocols on the communication stack, e.g., MAC addresses, IP addresses, subscriber IDs, and packet timestamps, into tuples. The decisions, such as dropping packets, are given by certain rules applied over these tuples.

In [14], the packets are labeled with respect to their transmission control protocol (TCP) flags. An alarm is raised if the packet counts in a time window deviate from the distribution fitted for the normal traffic. In another study, a naive Bayesian classifier has been constructed as a DDoS detector based on network traffic variables. In [7,15], a Bayesian change point model that detects traffic surges or dips, which possibly correspond to DDoS attacks, is proposed. The model is a hierarchical hidden Markov model that links the features extracted from SIP network traffic and server load to latent variables. One set of these variables tracks the hidden dynamics of the system and the others serve as change point indicators. The output of the model is the posterior probability of a change indication, which is calculated at fixed time intervals.

As for the SIP message payload tampering variety, an N-gram technique has been considered to detect the fuzzing attacks exploiting malformed SIP messages. In this case, based on a corpus of SIP messages, which contains both valid and malformed messages, 4-grams, i.e., sequential 4-byte blocks in SIP messages, are extracted. The 4-grams which exceed a given frequency threshold are designated as significant features and their occurrence count vectors are used as features to train classifiers [16]. An experimental study of applying 5 different machine learning models to detect DDoS
attacks in SIP-deployed networks has been conducted [17]. The authors have implemented a simulation environment in order to train and evaluate the performance of the models. The classifiers are trained with pre-generated training data collected from SIP message headers, which contain both attacks and normal traffic. The models are required to be re-trained whenever the network or service operating conditions change. The trained classifiers are evaluated in terms of accuracy and the time overhead required to run them on-line for each message. A recent study proposes using an autoregressive integrated moving average (ARIMA) time series model to classify normal traffic, DoS and DDoS attacks in IP networks [18]. The number of packets and the number of IP sources are tracked for each time unit and their ratios are stored. The local Lyapunov exponents are calculated for these ratios, and these values are compared with a threshold to discriminate malicious from non-malicious traffic.
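The 4-gram feature extraction described for [16] above can be illustrated in a few lines. A minimal sketch, not the authors' implementation; the toy corpus and the frequency threshold are illustrative placeholders:

```python
from collections import Counter

def four_grams(message: bytes) -> Counter:
    """Count the sequential 4-byte blocks (4-grams) of a SIP message."""
    return Counter(message[i:i + 4] for i in range(len(message) - 3))

def significant_grams(corpus, min_freq):
    """4-grams whose corpus-wide frequency exceeds min_freq become features."""
    total = Counter()
    for msg in corpus:
        total.update(four_grams(msg))
    return sorted(g for g, cnt in total.items() if cnt > min_freq)

def feature_vector(message: bytes, vocab):
    """Occurrence counts of the significant 4-grams for one message."""
    counts = four_grams(message)
    return [counts.get(g, 0) for g in vocab]

# Toy corpus; a real corpus would mix valid and malformed SIP messages.
corpus = [b"INVITE sip:[email protected] SIP/2.0", b"INVITE sip:[email protected] SIP/2.0"]
vocab = significant_grams(corpus, min_freq=1)
x = feature_vector(corpus[0], vocab)
```

The resulting count vectors serve as inputs to any off-the-shelf classifier, as in [16].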
A statistical anomaly detection model, to which our method has resemblances, was proposed in [19,20]. This method detects significant deviations in 3G mobile network traffic patterns based on a variant of the Kullback–Leibler (KL) divergence between two empirical distributions. The data samples collected for each observed feature within a time window are fitted into respective univariate histograms. Then, these empirical distributions are compared with reference distributions of the observed features based on the proposed divergence metric. If the distance of any of the inspected feature distributions to that of its corresponding reference exceeds an empirically set threshold, then an alarm is raised to declare a detected anomaly. A human expert gives the final decision on whether the detected anomaly is an attack or not.
The spread of intelligent mobile devices has resulted in a new facet of mobile botnets. The distributed characteristics of the mobile network (the capability to change IP addresses frequently) and the huge number of zombie devices easily hacked by malware make it hard to prevent DDoS attacks with conventional PC-centric solutions. Besides using the Internet for command propagation, the bot master can coordinate the zombies in some exceptional ways, such as Bluetooth communication or SMS/MMS messaging. Three different command and control architectures (coordination of zombies by the master) to start a mobile botnet DDoS attack are discussed in [21]. A recent study uses machine learning techniques to discriminate applications that are malware used in mobile botnets. The manifest files of Android Application Packages (APK) are processed to extract features. After some pre-processing steps, the selected features are used in training classifiers to detect the malware [22].

A detailed survey on the historical evolution of botnets is provided in [23]. A detailed review of network intrusion systems which are capable of detecting DDoS attacks, and of the specific methods used for detection, can be found in [24].
Analysis of time series for classification, prediction, change and outlier detection has been an active research topic for decades, with particular focus on financial markets [25]. Among the plethora of methods proposed one can mention: (i) methods that map the time series into a new feature space, such as spectral entropy, autocorrelation etc. [26]; (ii) kernel methods for time-series classification with emphasis on sequence alignment [27–29]; (iii) clustering time series with a combined distance function satisfying the triangle similarity, which is the cosine value between two vectors, and the dynamic time warping distance [30]; (iv) approaches fitting the data to a number of possible models, such as a hidden Markov model with dynamic time warping, or an autoregressive moving average model with dynamic time warping, and clustering the data based on the model instance with the best fit [31,32]; (v) singular spectral analysis, where the data is embedded, and the embedding matrix is decomposed and reconstructed into trend, noise and oscillatory components.

Metrics, which are functions to calculate distances between two entities in a set, can be used to detect anomalies in the network traffic, and in [33] two such information metrics have been proposed for DDoS attacks. Similarly, a DDoS detector which uses the Tsallis entropy has been proposed [34]. The Mahalanobis distance, based on the inverse covariance matrix, has been previously used in the detection of abnormal callers (outliers) by inspecting their SIP message flows [35]. In this study, however, we use an adaptively on-line trained variety of the Mahalanobis distance for a time series. We use the time series of Mahalanobis distances accompanying the input time series to detect DDoS attacks as well as to identify the malicious users from their messaging behavior.

One of the first IDS architectures that uses behavioral analysis to detect DDoS attacks and the malicious attackers was proposed in [36]. The attacking entities aiming for a distributed DoS attack are characterized by a common messaging pattern. This, however, cannot be represented by a rule-based system. The proposed system consists of three components: a sniffer to capture the packets, a preprocessor to extract informative features from the packets, and a classifier to detect the anomalies in the traffic.
3. Mathematical notation

We first introduce the notation specific to the communication control (e.g., SIP) messaging. Time is discrete, represented by the instants t = iΔ at which user behavior data is collected and then processed to output a feature vector. Δ is an observation interval, e.g., 1 s long, within which user messaging activities are monitored. A messaging activity observed at the server side is the arrival of one of the SIP messages (invite, bye, 200, etc.) from a user or the transmission of such a message to a user.

At the end of this interval, the r-th user's activity is denoted by the d-dimensional vector v_r, where d is the number of different SIP request or response message types taken into consideration. The vector v_r is an integer vector whose components correspond to the number of times each one of the d message types has occurred within the i-th time frame ((i − 1)Δ < t < iΔ). Not all users are active in each observation interval. An active user, for example the r-th one, is a registered user that has sent and/or received at least one SIP message within the given observation interval, and it is indicated by u_r, r = 1, ..., |U|, where |U| is the cardinality of this set.

Next, let us look into the details of the users' count vectors. A count vector results from the sum of the individual messaging activities of an active user. The r-th active user is assumed to run P_r > 0 messaging activities within the observation interval. Each messaging activity is represented by v_r^p, p = 1, ..., P_r, which is a unit vector with one component being 1 and the rest 0. Let us call this a message indicator vector, because it indicates which one of the d message types has occurred. Then v_r = Σ_{p=1}^{P_r} v_r^p; v_r is simply the count vector of messages sent by the r-th user, as shown in Fig. 1.

Finally, let us introduce the d-dimensional count vector x, called the state vector, that represents the collective activities of all |U| active users within a time frame. The state vector, which is the total message count vector from all users at the server side, is simply the sum of the active user count vectors, x = Σ_{r=1}^{|U|} v_r, and this is illustrated in Fig. 2.
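The construction of the indicator, count and state vectors above can be sketched as follows; the particular set of tracked SIP message types is a placeholder assumption:

```python
import numpy as np

# Hypothetical set of d SIP message types tracked at the server side.
MESSAGE_TYPES = ["invite", "bye", "200", "180", "ack"]
TYPE_INDEX = {m: i for i, m in enumerate(MESSAGE_TYPES)}
d = len(MESSAGE_TYPES)

def indicator(msg_type: str) -> np.ndarray:
    """Unit message indicator vector v_r^p: one component 1, the rest 0."""
    v = np.zeros(d, dtype=int)
    v[TYPE_INDEX[msg_type]] = 1
    return v

def user_count_vector(messages) -> np.ndarray:
    """v_r = sum_p v_r^p: per-user message counts over one observation interval."""
    return sum((indicator(m) for m in messages), start=np.zeros(d, dtype=int))

def state_vector(activity: dict) -> np.ndarray:
    """x = sum_r v_r: total message counts over all |U| active users."""
    return sum((user_count_vector(msgs) for msgs in activity.values()),
               start=np.zeros(d, dtype=int))

# Two active users in one interval: a completed call setup and a teardown.
activity = {"alice": ["invite", "180", "200", "ack"], "bob": ["invite", "bye"]}
x = state_vector(activity)
```

Here x counts two invites and one each of bye, 200, 180 and ack, exactly the summation depicted in Figs. 1 and 2.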
We have so far omitted any specific index to denote the time frames to avoid notational clutter. However, we will use the notation x_i, x_j ∈ ℝ^d to denote the server state vectors at the i-th and j-th observation intervals. These feature vectors, or server state vectors, can be used to monitor traffic regime changes in a network.

Let M be a d × d positive (semi-)definite matrix (M ∈ S_+ or M ∈ S_++). D_M(x_i, x_j) is the distance between the feature vectors x_i and x_j calculated over the metric matrix M. f(M | x_n : x_{n−k−1}) is a function of M defined over the time window of length k tracked between
Fig. 1. The r-th user count vector results from the accumulation of message indicator vectors (v_r = Σ_{p=1}^{P_r} v_r^p) in an observation interval.

Fig. 2. The server state vector is the sum of the user count vectors (x = Σ_{r=1}^{|U|} v_r).

Fig. 3. The time-stamped user message vector is the concatenation of the user unit message vector and the time it is sent (w_r^p ∈ ℝ^{d+1}).
feature vectors from x_{n−k−1} to x_n, that is, from time index n − k − 1 to time index n. D_ld(A, B) is a function defined over any two matrices, A and B, of the same dimensions.

Notice that up to this point we have neglected the time-stamp information, that is, the actual time instances t_r^1, ..., t_r^{P_r} within a generic Δ-long time frame at which the P_r messaging activities, say, of the r-th user, are occurring. We can incorporate this information by augmenting the dimensionality of the message indicator vector v_r^p by one, as follows: (w_r^p)^T = ((v_r^p)^T, t_r^p). Thus, w_r^p is the timestamp-enriched version of the message indicator vector v_r^p. Notice that w_r^p ∈ ℝ^{d+1} consists of the concatenation of the message indicator vector v_r^p and the time instance at which the message occurs, t_r^p, as given in Fig. 3.

Given the definitions above, any user u_r can be mapped to a time series, which can be represented as one of these two matrices: V_r = [v_r^1 | v_r^2 | ... | v_r^{P_r}] or W_r = [w_r^1 | w_r^2 | ... | w_r^{P_r}]. The kernel function that measures the similarity of any user pair (u_q, u_r) is represented as K(u_q, u_r). κ(w_r^{p_r}, w_q^{p_q}) is defined as the heat kernel that calculates the similarity between the time-stamped message vectors of any two users in the same interval: the p_r-th message of the r-th user and the p_q-th message of the q-th user. Using the user pair kernel functions, for that time interval, we can calculate the kernel matrix K of size |U| × |U|.
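The kernel computation can be sketched as follows. A minimal sketch only: the heat-kernel bandwidth σ and the mean aggregation of per-message scores into K(u_q, u_r) are illustrative assumptions, not the paper's exact sequence-alignment kernel:

```python
import numpy as np

def heat_kernel(w1, w2, sigma=1.0):
    """Heat (Gaussian) kernel between two time-stamped message vectors in R^{d+1}."""
    diff = np.asarray(w1, dtype=float) - np.asarray(w2, dtype=float)
    return float(np.exp(-diff @ diff / (2.0 * sigma ** 2)))

def user_pair_kernel(Wq, Wr, sigma=1.0):
    """K(u_q, u_r): assumed here to be the mean pairwise heat-kernel score
    between the two users' time-stamped messages within the interval."""
    return float(np.mean([heat_kernel(wq, wr, sigma) for wq in Wq for wr in Wr]))

def kernel_matrix(users, sigma=1.0):
    """|U| x |U| kernel matrix K with K[q, r] = K(u_q, u_r)."""
    n = len(users)
    K = np.empty((n, n))
    for q in range(n):
        for r in range(n):
            K[q, r] = user_pair_kernel(users[q], users[r], sigma)
    return K

# Three users, one (d+1)-dimensional time-stamped message each (d = 3).
users = [[[1, 0, 0, 0.5]], [[1, 0, 0, 0.5]], [[0, 1, 0, 9.0]]]
K = kernel_matrix(users)
```

By construction K is symmetric, and users with identical messaging behavior score 1 while dissimilar behaviors score near 0.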
4. Adaptive distance-based change point detection estimator

Feature instances extracted from adjacent intervals within the correlation length of a stationary process tend to have high statistical similarity. On the other hand, features originating from different generative processes, or from different sections of a non-stationary process, can be expected to have large pairwise distances. Based on this premise, a significant change in the distances between consecutive feature vectors in a time series can be interpreted as an indicator of a change in the data-generating process. The Hidden Markov Model (HMM) can capture these regime changes as a switching variable from one generator to another in the hidden layer. In the context of communication networks, such an abrupt change in the feature vectors corresponding to traffic intensity patterns and/or server resource utilization rates can be conjectured to signal a DDoS attack. A Distance-based Change Point Method (DCPM), as used in our work, first tracks the distances between sequential feature vectors and then computes the statistics of these distances to decide whether a change has occurred. A judicious choice of distance function can prove critical to the performance of machine learning algorithms. To this effect, one can use one of the well-known distance functions or attempt to learn a distance function specific to the problem at hand. In this work, we have opted to use a learning scheme for the Mahalanobis distance.
4.1. Mahalanobis distance

The Mahalanobis distance D_M between x_i, x_j ∈ ℝ^d can be calculated as in Eq. (1). The Mahalanobis distance is defined over symmetric positive semi-definite (PSD) d × d matrices (M ∈ S_+), and the choice of M can be made to account for the correlations between features and the differences between scales. The inverse of a full-rank sample covariance matrix, Σ, gives rise to a special case of the Mahalanobis distance (M ∈ S_++), which assumes the data is generated from a multivariate Gaussian distribution. Under this choice and the Gaussian assumption, it can be shown that M maps the data to uncorrelated, unit-variance Gaussian variables. Conversely, if the features follow a standard Gaussian distribution with uncorrelated components, then we have M = Σ = I.

D_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j)    (1)

Any symmetric positive semi-definite matrix can be factorized as M = A^T A such that A is an e × d projection matrix and e ≤ d. Thus, the relation below can be obtained:

D_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j)
             = (x_i − x_j)^T A^T A (x_i − x_j)
             = (A(x_i − x_j))^T A(x_i − x_j)
             = (Ax_i − Ax_j)^T (Ax_i − Ax_j)
             = ‖a_i − a_j‖_2^2 = D_E(a_i, a_j)
             = D_A(x_i, x_j)    (2)

where a_i = Ax_i is the projected vector and D_E is the Euclidean distance. Eq. (2) shows that the Mahalanobis distance in the feature space is equivalent to the Euclidean distance in a projected space.
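The equivalence in Eq. (2) can be checked numerically. A minimal sketch with a random projection matrix A standing in for a learned metric:

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance D_M(x_i, x_j) of Eq. (1)."""
    delta = xi - xj
    return float(delta @ M @ delta)

rng = np.random.default_rng(0)
d, e = 4, 3
A = rng.standard_normal((e, d))        # e x d projection matrix, e <= d
M = A.T @ A                            # any PSD metric factorizes as A^T A
xi, xj = rng.standard_normal(d), rng.standard_normal(d)

lhs = mahalanobis_sq(xi, xj, M)                # Mahalanobis distance under M
rhs = float(np.sum((A @ xi - A @ xj) ** 2))    # Euclidean distance after projection
# Eq. (2): lhs and rhs agree up to floating-point error.
```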
4.2. Distance-based change point model

The distance-based change detection is achieved by inspecting a sum of distances over a sliding window, called the moving distance,
Algorithm 1 Adaptive Online Distance-Based Change Point Detection Algorithm.
1: Initialize M_0 (default I).
2: Set k, λ, β and α (for ε_th).
3: repeat
4:   Inspect the SIP traffic in the time window of size k, and compute the count vector.
5:   if f(M_{n−1} | x_n : x_{n−k−1}) > ε_th then
6:     Raise alarm.
7:     Run the malicious user detector defined in Algorithm 4.
8:   end if
9:   Evaluate M*.
10:  Set M_{n−1} = M*.
11: until the flow ends
where the distances between the current feature vector and its immediate predecessors in a time frame of size k are summed. The result of the sliding-window sum is compared with a threshold value, ε_th, and an alarm is raised for the potential occurrence of a regime change. This step is followed by the malicious user discrimination algorithm, as detailed in Section 5. The main novelty of this method is that we learn the weight matrix M (called the Mahalanobis metric from now on) under a loss function, so that the detection algorithm adapts to inlier variations and trends in the traffic intensity to avoid false alarms. The inlier variations can be due to diurnal or week-day based changes or to short-lived sporadic flurries of call activity.

The moving distance over a k-sized time frame can be defined as a function of the symmetric positive definite matrix M ∈ S_++ as follows:

f(M | x_n : x_{n−k−1}) = Σ_{j=n−k−1}^{n−1} (x_n − x_j)^T M (x_n − x_j)    (3)
If the moving distance computed using the current Mahalanobis metric is above the threshold, f(M_{n−1} | x_n : x_{n−k−1}) > ε_th, then an alarm is raised. The Mahalanobis metric is updated periodically at each time interval under the loss function given below:

min_{M ∈ S_++}  f(M | x_n : x_{n−k−1}) + λ D_ld(M, M_{n−1}) + β D_ld(M, I)    (4)

In Eq. (4), the second and third terms, λ D_ld(M, M_{n−1}) and β D_ld(M, I), respectively, are regularization functions based on the logarithmic determinant (LogDet) divergence [37]. The LogDet function is a pseudo-metric that measures the distance between two matrices and is defined in Eq. (5). Detailed information about the LogDet function is given in the Appendix. The former regularizer forces the updated matrix to be as similar as possible to its predecessor. The latter forces it to be as close as possible to the identity matrix, to prevent it from converging to an irrelevant matrix and at the same time to induce sparsity. Thus, their relative weights can be gauged to trade off the update rate of the Mahalanobis metric against the aging of the effect of past measurements. The four parameters to be set are the sliding window size k (the time frame size), the two regularization cost weights, λ and β, and the thresholding parameter α. At the start of the algorithm, M_0 is initialized as the identity matrix, M_0 = I. Since the LogDet is a convex function of M, we are guaranteed to find the optimal positive definite matrix that minimizes the criterion in Eq. (4).

D_ld(M, M_{t−1}) = tr(M M_{t−1}^{−1}) − log det(M M_{t−1}^{−1}) − d    (5)
where tr(•) is the trace function for matrices.
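Eq. (5) translates directly to code; a minimal sketch:

```python
import numpy as np

def logdet_div(M, M_prev):
    """LogDet divergence of Eq. (5):
    D_ld(M, M_prev) = tr(M M_prev^{-1}) - log det(M M_prev^{-1}) - d."""
    d = M.shape[0]
    P = M @ np.linalg.inv(M_prev)
    _, logdet = np.linalg.slogdet(P)   # numerically stable log-determinant
    return float(np.trace(P) - logdet - d)
```

As expected for a (pseudo-)metric, D_ld(M, M) = 0 and the divergence is non-negative for positive definite arguments, which is what makes it usable as a regularizer in Eq. (4).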
The optimal Mahalanobis metric M* can be found by taking the derivative of Eq. (4) and setting it to zero:

M* = ( λ/(λ + β) M_{n−1}^{−1} + β/(λ + β) I + 1/(λ + β) Σ_{j=n−k−1}^{n−1} (x_n − x_j)(x_n − x_j)^T )^{−1}    (6)

This Mahalanobis metric update is repeated at each time index. The change detection algorithm is given in Algorithm 1.
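The loop of Algorithm 1 can be sketched as follows, combining the moving distance of Eq. (3) and the closed-form update of Eq. (6). For simplicity the sketch uses the constant threshold discussed in Section 4.3 rather than the χ² quantile, and the values of k, λ, β and c are placeholders:

```python
import numpy as np
from collections import deque

def moving_distance(M, x_n, window):
    """f(M | x_n : x_{n-k-1}): sum of Mahalanobis distances from x_n to the
    k previous state vectors in the sliding window (Eq. (3))."""
    return sum(float((x_n - x_j) @ M @ (x_n - x_j)) for x_j in window)

def metric_update(M_prev, x_n, window, lam, beta):
    """Closed-form minimizer of the regularized loss, Eq. (6)."""
    S = sum(np.outer(x_n - x_j, x_n - x_j) for x_j in window)
    d = len(x_n)
    return np.linalg.inv((lam * np.linalg.inv(M_prev) + beta * np.eye(d) + S)
                         / (lam + beta))

def monitor(state_vectors, k=10, lam=1.0, beta=0.1, c=1.0):
    """Adaptive online change-point loop of Algorithm 1 (sketch).
    Yields the time indices at which an alarm is raised."""
    d = len(state_vectors[0])
    M = np.eye(d)                        # M_0 = I
    eps_th = c * k * (d / 2.0) ** 2      # constant threshold, Section 4.3
    window = deque(maxlen=k)
    for n, x_n in enumerate(state_vectors):
        if len(window) == k:
            if moving_distance(M, x_n, window) > eps_th:
                yield n                  # alarm: possible regime change / DDoS
            M = metric_update(M, x_n, window, lam, beta)
        window.append(x_n)
```

On a synthetic stream whose state vectors jump abruptly, the first post-jump index triggers an alarm, after which the updated metric absorbs the new regime and the alarms stop.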
.3. Thresholding of the moving distances
The characteristics of the moving average of distances depend
n the traffic volume intensity, the dimension of the feature vec-
or, the size of the time frame etc., and hence it becomes critical
o set a threshold value judiciously to detect regime anomalies or
brupt changes. In this study we test comparatively two different
hreshold functions.
Experimental evidence has shown that we can approximate the
istribution of the moving sum of distances as a Chi-squared distri-
ution. It is then assumed that Mahalanobis distances are obtained
rom a Gaussian distribution such that μ = x n in the immediate
ast observation interval, and � = M
−1 . If y , which is the set of
bservations in the current sliding window, is a d -dimensional ran-
om vector drawn from a Gaussian distribution with a mean vec-
or μ and a d -rank covariance matrix �, then z = (y − x n ) � M (y −
n ) = (y − μ) � �−1 (y − μ) becomes Chi-Squared distributed with
-degrees of freedom.
Let z_i denote one of k independent, identically distributed random variables that follow a chi-square distribution, z_1 ∼ χ²_{d_1}, z_2 ∼ χ²_{d_2}, ..., z_k ∼ χ²_{d_k}. Due to the additive property of independent chi-squared variables, the sum of these random variables follows a chi-square distribution with d_1 + d_2 + ··· + d_k degrees of freedom. That is,

Z = z_1 + z_2 + ··· + z_k ∼ χ²_{d_1 + d_2 + ··· + d_k} (7)
Thus, the threshold of our anomaly detection model becomes ε_th = χ²_{α, k·d}, the upper α critical value of the chi-square distribution with k·d degrees of freedom. The α parameter is the probability of accepting a chance fluctuation as an anomaly. In other words, in the absence of an attack, the moving average of distances score, denoted by Z above, has a probability less than α of exceeding the threshold ε_th. The converse event of Z exceeding the threshold can be accepted as an anomaly with probability 1 − α. The value of α depends on the requirements of the system and is typically set by a human expert to some value such as α ∈ {0.1, 0.05, 0.02, 0.01}. This is a statistical approach based on the sum of observed distances.
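The χ² critical value above is available directly from SciPy; the sketch below assumes the standard `scipy.stats.chi2.ppf` inverse-CDF call (function name `chi2_threshold` is ours):

```python
from scipy.stats import chi2

def chi2_threshold(k, d, alpha=0.02):
    """Upper-tail critical value eps_th = chi^2_{alpha, k*d}: in the absence
    of an attack, P(Z > eps_th) = alpha for Z ~ chi-square with k*d dof."""
    return chi2.ppf(1.0 - alpha, df=k * d)
```

For the paper's typical setting (k = 5, d = 28), the threshold lies above the distribution mean k·d = 140, and shrinking α pushes it higher, i.e., the detector becomes more conservative.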
An alternate, empirically found constant threshold, which is a function of two system parameters,

ε_th = c · k · (d / 2)² (8)

is found to work equally well. This fixed threshold value only depends on the time frame size k, the number of dimensions d, and a constant c. As a plausible argument for the fact that the constant thresholding function works equally well, we observe that the same parameters (k and d) are also inherent in the χ² thresholding. Notice also that there is some liberty in adjusting this threshold by setting the constant c according to the requirements of the deployed system. A case in point could be to make the constant indexed by time periods, c_n, e.g., to account for seasonal trends.
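Under our reading of Eq. (8) (the grouping of the extracted formula is an assumption on our part), the constant threshold is a one-liner:

```python
def constant_threshold(k, d, c=1.0):
    """Empirical constant threshold of Eq. (8): eps_th = c * k * (d / 2)**2.

    k : time frame size, d : feature dimension, c : tunable constant.
    """
    return c * k * (d / 2) ** 2
```

For k = 5, d = 28 and c = 1 this gives 5 · 14² = 980, the same order of magnitude as the χ² critical value for k·d = 140 degrees of freedom, consistent with the observation that the two thresholds behave similarly.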
More importantly, even though the threshold is set to a constant, the system is still an adaptive model due to the adaptation inherent in the updates of the Mahalanobis metric. At each observation interval, the Mahalanobis metric is updated to accommodate the new distances between the observations. Therefore, whenever the threshold is exceeded, it means that the quadratic smoother could not smooth out the new measurement digressions, and therefore it is very likely to be an anomaly.

Fig. 4. All possible alignments of W_q = [w_q^1 | w_q^2 | w_q^3] and W_r = [w_r^1 | w_r^2].
5. Malicious user discrimination

If a detected anomaly is in fact a DDoS attack, the next task is to identify the set of malicious users that are presumably coordinating to mount a distributed attack. For this analysis, each subscriber's behavior history in the observation interval is represented as a time series, as given in Fig. 1. We process the time series using a similarity function so that subscribers with similar behavior patterns are clustered into the same group. We have proposed and evaluated two different attacker discrimination methods. The first is based on a global time series alignment kernel that makes use of both epoch differences and feature distances between message sequences. The second uses the user message count vectors at the end of periodic observation intervals, i.e., the information on message time instants is ignored. The pairwise similarity of any two users is calculated from their count vectors.
5.1. Sequence alignment kernel

We consider the ensemble of timestamped messages sent by a user within a time frame of k units, say (n − k − 1), ..., (n − 1), as message sequences. Each user's sequence can have a different number of messaging events, each event occurring at a different time instant. In other words, a user's message sequence or time series corresponds to the ensemble of messages sent by a registered terminal within the designated observation interval, each event being characterized by the type of SIP message and its timestamp. Our goal is to estimate the similarity of the messaging activities of the users via a kernel-based scheme. For this purpose, the message sequences must be aligned without pair repetition. The similarity between two sequences of possibly different lengths, i.e., numbers of messaging events, can be determined as the sum of the similarities of all their feasible alignments. Thus two sequences are more similar as a pair if their messaging types, e.g., invite or bye, and their occurrences in time resemble each other.
Let us assume the user time series, i.e., timestamped message sequences, (W_q, W_r) of the user pair (u_q, u_r), with W_q = [w_q^1 | w_q^2 | w_q^3] and W_r = [w_r^1 | w_r^2] having three and two messaging events, respectively. Fig. 4 shows an example of all possible alignments for these two sequences. In this specific example, there are 5 possible alignments, as follows:

• (w_q^1, w_r^1), (w_q^1, w_r^2), (w_q^2, w_r^2), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^2), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^2, w_r^2), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^3, w_r^1), (w_q^3, w_r^2)
A global alignment kernel has been proposed in [38], which uses dynamic programming to compute the similarity of all possible alignments of two sequences. We use a variation of this algorithm, where we employ a pairwise heat kernel that is based on the Mahalanobis distance and differences of time stamps.
5.1.1. Global sequence alignment kernel

Given the two message sequences W_q = [w_q^1 | w_q^2 | ... | w_q^{P_q}] and W_r = [w_r^1 | w_r^2 | ... | w_r^{P_r}] for the user pair (u_q, u_r) in a state space W, we set the doubly-indexed series T_{p_q, p_r} as T_{p_q, 0} = 0 for p_q = 1, ..., P_q, T_{0, p_r} = 0 for p_r = 1, ..., P_r, and T_{0, 0} = 1. We also assume that there is a function κ(w_q^{p_q}, w_r^{p_r}) to measure the similarity between the p_q-th signaling event of user u_q and the p_r-th signaling event of the other user u_r. Computing the terms recursively for (p_q, p_r) ∈ {1, ..., P_q} × {1, ..., P_r}, one has:

T_{p_q, p_r} = (T_{p_q, p_r−1} + T_{p_q−1, p_r−1} + T_{p_q−1, p_r}) κ(w_q^{p_q}, w_r^{p_r}) (9)

Finally, the unnormalized similarity between two users (u_q, u_r) is measured when the recursion has considered all possible alignments, that is:

K_unnormed(u_q, u_r) = T_{P_q, P_r} (10)
Once the kernel matrix for all user pairs has been obtained, we unit-diagonal normalize the |U| × |U| kernel matrix, where |U| is the number of active users in the system, in order to eliminate any scaling issues:

K(u_q, u_r) = K_unnormed(u_q, u_r) / ( √K_unnormed(u_q, u_q) · √K_unnormed(u_r, u_r) ), q, r = 1, ..., |U|

K(u_q, u_r) ∈ [0, 1] (11)

We will call this kernel the time series kernel.
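The recursion of Eqs. (9)-(11) can be sketched as follows (names are ours; with κ ≡ 1 the recursion simply counts the feasible alignments, e.g., 5 for the (3, 2)-event example above):

```python
import numpy as np

def alignment_kernel(seq_q, seq_r, kappa):
    """Unnormalized global alignment similarity, Eqs. (9)-(10).

    seq_q, seq_r : lists of events; kappa(a, b) is the pairwise event kernel.
    """
    Pq, Pr = len(seq_q), len(seq_r)
    T = np.zeros((Pq + 1, Pr + 1))
    T[0, 0] = 1.0                       # boundary condition T_{0,0} = 1
    for pq in range(1, Pq + 1):
        for pr in range(1, Pr + 1):
            T[pq, pr] = (T[pq, pr - 1] + T[pq - 1, pr - 1] + T[pq - 1, pr]) \
                        * kappa(seq_q[pq - 1], seq_r[pr - 1])
    return T[Pq, Pr]

def normalized_kernel(seq_q, seq_r, kappa):
    """Unit-diagonal normalization of Eq. (11)."""
    kqr = alignment_kernel(seq_q, seq_r, kappa)
    kqq = alignment_kernel(seq_q, seq_q, kappa)
    krr = alignment_kernel(seq_r, seq_r, kappa)
    return kqr / (np.sqrt(kqq) * np.sqrt(krr))
```

The dynamic program costs O(P_q · P_r) kernel evaluations per user pair, which is what makes summing over all exponentially many alignments tractable.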
5.1.2. Pairwise heat kernel

Each user in a time window can be represented in terms of her ordered timestamped message sequence. Recall that user sequences can have differing lengths and can consist of different types of messages.

A kernel function (pairwise heat function) for any two timestamped vectors, (w_q^{p_q})^T = ((v_q^{p_q})^T, t_q^{p_q}) and (w_r^{p_r})^T = ((v_r^{p_r})^T, t_r^{p_r}), is evaluated as:

κ(w_q^{p_q}, w_r^{p_r}) = exp(−γ D_M(v_q^{p_q}, v_r^{p_r}) − ρ |t_q^{p_q} − t_r^{p_r}|) (12)

D_M(v_q^{p_q}, v_r^{p_r}) = (v_q^{p_q} − v_r^{p_r})^T M (v_q^{p_q} − v_r^{p_r})

where M is the Mahalanobis metric evaluated at that observation interval as in Eq. (6). Note that κ(w_q^{p_q}, w_r^{p_r}) = 1 iff v_q^{p_q} = v_r^{p_r} and t_q^{p_q} = t_r^{p_r}. The coefficients γ and ρ determine the weights of the message type distance and the timing distance, respectively. In this study we have assumed γ = ρ = 1.
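Eq. (12) can be sketched directly; the event representation as a (count vector, timestamp) pair below is our modeling convention:

```python
import numpy as np

def heat_kernel(w_q, w_r, M, gamma=1.0, rho=1.0):
    """Pairwise heat kernel of Eq. (12); each event w = (v, t) is a
    message-type vector v together with its timestamp t."""
    v_q, t_q = w_q
    v_r, t_r = w_r
    diff = v_q - v_r
    d_m = diff @ M @ diff                # squared Mahalanobis distance D_M
    return np.exp(-gamma * d_m - rho * abs(t_q - t_r))
```

The kernel equals 1 only for events with identical type vectors and timestamps, and decays exponentially as either the type distance or the time gap grows.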
5.2. User distance kernel

A kernel matrix of pairwise user-to-user similarities can be created based on their Mahalanobis distances. User pairs have high similarity (close to 1) if their Mahalanobis distance is close to 0; conversely, if the pair similarity is small (close to 0), then their distance is large. The Mahalanobis distance kernel can be regarded as a variant of the Gaussian kernel.

Any two users, u_q and u_r, can be compared based on their messaging count vectors v_q, v_r ∈ ℝ^d, as follows:

K(u_q, u_r) = exp(−(v_q − v_r)^T M (v_q − v_r)) (13)
We will call this kernel simply the distance kernel. K(u_q, u_r) = 1 iff v_q = v_r. Note that this feature vector does not take into account the occurrence timing of the messages, but averages the messaging traffic in that interval. We would like to point out again the difference between the two ways of measuring user behavior differences. In Eq. (13), we consider the messaging events integrated over the observation interval, represented by the d-dimensional count vector of messaging events according to their types. In Eqs. (11) and (12), we calculate the difference of user behaviors by comparing and measuring distances, messaging event by messaging event, as they occur during the observation interval.
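The distance kernel of Eq. (13) is a one-line Gaussian-type kernel over count vectors (function name is ours):

```python
import numpy as np

def distance_kernel(v_q, v_r, M):
    """User distance kernel of Eq. (13) over per-interval message count vectors."""
    diff = v_q - v_r
    return np.exp(-(diff @ M @ diff))
```

Since it ignores timestamps, it is much cheaper than the alignment kernel: one quadratic form per user pair instead of a full dynamic program.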
5.3. Spectral clustering

A matrix of pairwise user-to-user similarities is created from the users' messages as in Eq. (11) or (13). The kernel matrix, K, then corresponds to a fully connected weighted adjacency graph, where the users are the vertices and the similarities are the edge costs. The adjacency matrix is expected to consist of two subgraphs: one representing the malicious users, characterized by similar behavior patterns, and the other representing the non-malicious users, with random-like behavior patterns. In order to partition this graph into these two subgraphs, we have used the normalized Laplacian spectral clustering algorithm. Such algorithms are conceived to find graph partitioning solutions in clustering problems. In the literature there are various spectral clustering algorithms. We have preferred normalized Laplacian spectral clustering because we want not only the similar nodes to be projected close to each other, but also the dissimilar nodes to be projected far from each other. The normalized spectral methods satisfy both of these criteria, as discussed in [39].

The degree of the q-th active user in the kernel matrix, which is the sum of all the weight entries related to the q-th active user at a given time frame, is evaluated as:

dg_q = Σ_{r=1}^{|U|} K_{q,r} (14)

where K_{q,r} = K(u_q, u_r).

The degree matrix D is a diagonal matrix whose diagonal elements contain the degree values dg_1, dg_2, ..., dg_{|U|}. The Laplacian matrix, L, is evaluated as in Eq. (15) and the spectral clustering algorithm is given in Algorithm 2.
Algorithm 2 Normalized Laplacian Spectral Clustering.
1: Given K, evaluate D and L, which are all in ℝ^{|U|×|U|}.
2: Compute the two eigenvectors, ψ_1 and ψ_2, of the two smallest eigenvalues 0 = λ_1 < λ_2 for the generalized eigenproblem LΨ = DΨΛ, where Λ is the diagonal matrix of eigenvalues λ_1, ..., λ_{|U|}.
3: Matricize the ψ_1 and ψ_2 vectors to obtain Ψ ∈ ℝ^{|U|×2}. Use the rows of Ψ as the new feature vectors in the mapped space, y ∈ ℝ². Apply 2-means clustering.
4: Return the cluster label vector C from 2-means clustering.

L = D − K (15)

where K is the |U| × |U| kernel matrix whose entries, K(u_q, u_r), are calculated as in Eq. (11) or (13).
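A simplified two-way version of Algorithm 2 can be sketched as follows. Instead of 2-means on the two eigenvectors, this sketch splits on the sign of the second generalized eigenvector (the Fiedler vector), which coincides with 2-means when the two clusters are well separated; the simplification and the function name are ours:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_bipartition(K):
    """Two-way normalized spectral cut over a kernel (adjacency) matrix K."""
    dg = K.sum(axis=1)                  # degrees, Eq. (14)
    D = np.diag(dg)
    L = D - K                           # Laplacian, Eq. (15)
    # generalized eigenproblem L psi = lambda D psi, eigenvalues ascending
    vals, vecs = eigh(L, D)
    fiedler = vecs[:, 1]                # eigenvector of the 2nd smallest eigenvalue
    return (fiedler > 0).astype(int)    # cluster labels in {0, 1}
```

On a kernel with two strongly intra-connected, weakly inter-connected blocks, the sign pattern of the Fiedler vector recovers the two blocks.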
5.4. Automatic identification of the malicious users cluster

The malicious users are conjectured to be characterized by repetitive and correlated behaviors, while the rest of the users are characterized by uncoordinated and diverse behaviors. Once the two clusters are obtained, the final task is that of distinguishing the attacker set.
For each of the two clusters, we compute the sample covariance matrix of the user message sequence vectors in that cluster. Since the malicious user cluster is assumed to consist of similar messaging behaviors, such message vectors are expected to be more strongly aligned along a few particular axes. In fact, in the extreme case when all messages in the cluster are of the same type, the sample covariance matrix would be the zero matrix. Therefore, we assign the cluster with significantly higher eigenvalue concentration to the malicious users. This algorithm, given in Algorithm 3, is based on the heuristic that malicious users must be somewhat coordinated to mount an attack, and therefore that their data vectors must concentrate along a few eigenvectors. Each cluster is assumed to contain at least two subscribers.

Algorithm 3 Cluster Selection Heuristics.
1: For the given cluster label vector C, determine the two clusters, C_1 and C_2.
2: For each of the two clusters, evaluate the sample covariance matrix of the projected message vectors.
3: if a cluster has a covariance matrix equal to 0 then
4: Return this cluster.
5: else
6: Evaluate the eigenvalues of the cluster covariance matrices.
7: Return the cluster with the highest eigenvalue concentration.
8: end if
Putting all of these steps together, the algorithm to detect the attackers is summarized in Algorithm 4.
Algorithm 4 Attacker Detection.
1: if the Global Sequence Kernel is used then
2: Set the weight parameters γ and ρ of the pairwise heat kernel.
3: Evaluate the kernel matrix K_unnormed such that ∀(u_q, u_r) ∈ U × U, we have K_{q,r} = K(u_q, u_r), where we use the time-stamped message sequences W_q, W_r of the q-th and r-th users in the given time interval, respectively, with the alignment kernel, and U is the set of active users.
4: Unit-diagonal normalize K_unnormed to obtain K.
5: end if
6: if the User Distance Kernel is used then
7: Evaluate the kernel matrix K such that ∀(u_q, u_r) ∈ U × U, we have K_{q,r} = K(u_q, u_r), where we use the total message count vectors v_q, v_r of the q-th and r-th users in the given time interval, respectively, with the distance kernel, and U is the set of active users.
8: end if
9: Apply the normalized Laplacian spectral clustering algorithm over K with # clusters = 2, as defined in Algorithm 2.
10: Use the cluster label vector C returned by the spectral clustering in the cluster selection heuristics as defined in Algorithm 3.
11: Return the selected cluster members as the set of attackers.
6. Experiments

As is often reported in the literature, we have also found that obtaining and getting the permission to use VoIP server datasets proves to be very problematic, mostly due to the privacy concerns of the subscribers and the commercial secrecy concerns of the telecommunication operators. Therefore, we have used simulated data sets to analyze the performance of the change point detection
Fig. 5. SIP network simulation framework.
model, detailed in Section 4, and of the malicious user identification algorithm, given in Section 5. An Asterisk-based PBX software, named Trixbox, is deployed as the SIP server in a virtual machine [40]. To mimic the traffic on a SIP server, we have built a probabilistic SIP network simulation system, which initiates calls between a number of probabilistically chosen users in real time [41,42]. An application that creates the user agents is deployed on another virtual machine. We have used the PJSIP open source library [43] and implemented the agents in the Python language. Lastly, NOVA V-Spy, a vulnerability scanning tool, is installed on a final virtual machine and is used to simulate DDoS attacks targeting the server [44]. An overview of the simulation environment is provided in Fig. 5. The proposed security system runs on the same machine as the SIP server, as represented by the gray box in Fig. 5.
The traffic simulator, based on a probabilistic model, generates real-time SIP messaging traffic among registered subscribers [41]. The probabilistic model is basically a library that initiates all permitted actions of subscribers in generating real-life SIP messaging traffic through a SIP server. Instances of subscriber actions are: the potential callees and callers (the social network), how likely a certain contact is to be called (the phone book), how often to become active (registration frequency to the SIP server), how long to wait before the next call (the call frequency), how likely to make a call (the call probability), how likely to answer an incoming call (the response probability), and how long to talk on the phone (the call duration). The parameters provided to the simulator determine the behavior of the probabilistic model and therefore, statistically, the actions of the subscribers. The environmental parameters of the simulator are the total number of subscribers the SIP server can serve and the number of social groups, where a social group is defined as a subset of subscribers that are more likely to interact with each other as compared to the rest. All subscribers are created as bots on the simulation machine and they all follow the legitimate messaging rules of the protocol. An existing subscriber bot is active as long as its registration on the SIP server has not expired; therefore only active bots can interact with each other.
Data are collected by inspecting each SIP packet that arrives at or is sent by the server. Counts of 14 SIP request and 14 SIP response packet types are recorded periodically for each time unit (assumed to be 1 s in our experiments), and the 28-dimensional vector, made up of packet counts per unit interval, constitutes the input data. The SIP message types, described in detail in RFC 3261 [4], for which we periodically record the counts are as follows:

• Requests: Register, Invite, Subscribe, Notify, Options, Ack, Bye, Cancel, Prack, Publish, Info, Refer, Message, Update
• Responses: 100, 180, 183, 200, 400, 401, 403, 404, 405, 481, 486, 487, 500, 603
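Building the 28-dimensional per-interval count vector can be sketched as follows; the fixed ordering of the dimensions below is our illustrative choice, not necessarily the ordering used in the paper:

```python
from collections import Counter

REQUESTS = ["REGISTER", "INVITE", "SUBSCRIBE", "NOTIFY", "OPTIONS", "ACK",
            "BYE", "CANCEL", "PRACK", "PUBLISH", "INFO", "REFER", "MESSAGE",
            "UPDATE"]
RESPONSES = ["100", "180", "183", "200", "400", "401", "403", "404", "405",
             "481", "486", "487", "500", "603"]
TYPES = REQUESTS + RESPONSES          # fixed ordering of the 28 dimensions

def count_vector(messages):
    """28-dimensional count vector from the SIP message types observed in
    one unit interval (one entry per request or response type)."""
    counts = Counter(messages)
    return [counts.get(t, 0) for t in TYPES]
```

One such vector per second forms the observation stream x_n fed to the change point detector.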
The experimental environment is controlled by two parameters: the intensity of the background traffic, that is, the normal user traffic, and the intensity of the DDoS attack traffic. In our simulation system, at any time there are 200 active registered subscribers. There are 5 levels of preset normal traffic intensity created collectively by the subscriber bots. The normal subscriber bots on the average generate a total of 5, 10, 20, 40 and 80 call attempts among themselves (0.025 to 0.4 messages/bot) in any observation (1 s) interval. We grade these background traffic intensities as levels from 1 to 5. Fig. 6 exhibits these traffic intensities for a simulation setting. Note that the gray tones in the plots are proportional to the message counts, so that the darker a region in the plot, the higher the number of messages of that type observed in that interval. White represents intervals with no messages, and intervals with a count higher than 200 messages are shown in pitch black.
During a DDoS attack, for a given setup, unless explicitly stated otherwise, 10 randomly selected users, that is 5 percent of the subscribers, play the role of attackers. During attacks, the attackers start sending messages more intensely to the SIP server. In the low-level attack setting, each attacker sends on the average 50 messages per unit interval, while in a high-level attack, their rate becomes 100 messages per second. In each run, ten DDoS attack sessions are simulated, consisting of attacks using the five types of messages (Invite, Register, Options, Cancel and Bye), each carried out once with low intensity and once with high intensity. The runs are repeated ten folds, such that in each fold the attacks occur in a different order and a different set of registered subscribers is selected to act as attackers. In Fig. 6 the darker regions correspond to attacks.
The experiments are executed in a 10-fold cross-validation setup. One dataset is used for determining the parameters of the distance change point model, and the remaining nine datasets are run with the estimated parameters. Recall that the distance-based change-point detector has three different parameters; we apply a grid search to find the best parameters (k = {5, 7, 9, 11}, λ = {1.0, 2.0, 4.0}, β = {1.0, 2.0, 4.0}) and an additional fourth one
Fig. 6. Illustration of traffic intensities generated by the simulator.
for χ² thresholding (α = {0.01, 0.02}). The default values are set for the parameters of the time series alignment kernel as γ = 1.0 and ρ = 1.0. For the ARIMA model, we perform an exhaustive search to find its optimal parameters p = {1, 2, 3, 5, 10}, d = {0, 1, 2} and q = {0, 1, 2, 3, 5, 10}.

Since we know the labels (the attack times and the identity of the attackers) in the simulated data, we can evaluate the performance of the proposed system in terms of the F-Measure. In the ideal case the F-measure would be 1, which can be obtained only when there are no falsely accused users (i.e., precision P = 1), all the attackers are identified correctly (i.e., recall R = 1), and all the change points are identified by the change point detector. The precision, recall and F-measure are evaluated as follows, for the case of malicious users:

Precision (P) = (# detected true malicious users) / (# all detected malicious users) (16)

Recall (R) = (# detected true malicious users) / (# all true malicious users) (17)

F-Measure (F) = 2 P R / (P + R) (18)
For change point detection performance, we can replace the arguments of P and R by detected true change points, all detected change points, and all true change points.
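Eqs. (16)-(18) amount to set intersections over detected and true items; a minimal sketch (function name is ours):

```python
def detection_scores(detected, truth):
    """Precision, recall and F-measure of Eqs. (16)-(18) over sets of
    detected and true malicious users (or change points)."""
    detected, truth = set(detected), set(truth)
    tp = len(detected & truth)
    p = tp / len(detected) if detected else 0.0
    r = tp / len(truth) if truth else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f
```

For example, accusing users {1, 2, 3} when the true attackers are {2, 3, 4} gives P = R = F = 2/3.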
We have experimented with the duration of the observation interval. Fig. 7 shows the effect of the length of the observation interval for the normal traffic setting. Not surprisingly, as the sampling interval increases, the messaging counts also go higher. A few words are in order to explain Fig. 7: the abscissa represents real time in seconds, while an observation is taken every T seconds, T = 1, 2, ..., 10. We used in the graphic the same gray tone as used to represent the observed count vector, hence the appearance of stretched bars. Furthermore, the longer the observation interval, the higher the number of messages seen for any type, and consequently the plots have darker areas.
6.1. Comparison with a competitor algorithm

Fig. 8 shows the performance of our algorithm in detecting the onset and offset instants of the DDoS attacks. This figure is illustrative in that, for each simulation traffic setting, the best parameters found via grid search are chosen for the models. The parameters used for the performance comparisons are given in each table. The ordinate lists the 28 types of SIP messages, the abscissa shows the time in seconds, and the levels of gray show the intensity of messages. The red lines indicate the change point instants found by the algorithms. The experiments demonstrate that both proposed methods of thresholding are successful in detecting the onset of attacks, but they may sometimes fail to detect the offset. The possible reason is that an attack causes an abrupt change against the background of normal user traffic; however, after the incoming message intensity subsides at the end of an attack, its aftershock effects linger on at the server side as server response messages. The ARIMA model often fails to raise alarms at the correct instances and it is also affected by short-time small fluctuations in the counts. Therefore, it gives incorrect onset and offset indications.

Attack Onset and Offset Determination: In our evaluations, we consider detecting the start and end instants of the attacks. Therefore, we measure the number of attacks for which the onset and offset are correctly detected, as well as the miss and false alarm probabilities (errors of the first and of the second type). For the ARIMA model, we look at the start and end points of a contiguous period which is detected as an anomaly. Since an attack engenders a different behavior in the network (an anomaly), the anomaly detector should be able to detect the start and the end of an attack. During the comparison, we use the start and end times of contiguous intervals detected as the change points for a fair comparison.

The 10-fold cross-validation performance scores of the distance change-point detector and the ARIMA based DDoS detector are given in Table 1. Both thresholding functions are deployed. The onset and offset times of the attacks are known and they are compared with the change-point times returned by the models. For the
Fig. 7. Illustration of traffic intensities as a function of observation interval. All traffic is generated at level 3.
Table 1
The performances of the change-point detectors for normal traffic for 1 sec-
ond (For constant thresholding k = 5 , λ = 1 , β = 4 , c = 1 , for χ 2 thresholding k =
5 , λ = 2 , β = 2 , α = 0 . 02 , for ARIMA p = 2 , d = 1 , q = 0 ).
Change-point detector Precision Recall F-Score
Constant-thresholding DCPM 0.70 ± 0.04 0.88 ± 0.07 0.79 ± 0.04
χ2 - Thresholding DCPM 0.81 ± 0.07 0.73 ± 0.10 0.77 ± 0.03
ARIMA 0.25 ± 0.11 0.15 ± 0.09 0.25 ± 0.04
ARIMA model, the change points are assumed to correspond to the
time instances where the alarms are raised (onset) and the alarms
are silenced (offset).
To assess the attack detection performance of the DCPM (Distance-based Change Point Method) algorithm vis-à-vis an alternative method, we have run simulation experiments with a method from the literature, an ARIMA-based DDoS detector [18]. The rationale for the choice of this competitor algorithm is that it was the only model we could find in the literature operating in an on-line and unsupervised mode. At this stage we use only one of the thresholding methods, namely χ² thresholding as in Eq. (7), since the two methods yield comparable results. Their comparative performances are given in Table 1. The proposed methods have higher performance scores than ARIMA. The main reason is that the ARIMA detector fails to behave consistently in the attack interval; it gives false onsets and offsets during an ongoing attack. The two DCPM variants give comparably close scores, but it should be noted that the parameters should be set with respect to system characteristics such as tolerance to false alarms or traffic intensity.
6.2. Effect of the observation interval length

Table 2 shows that increasing the observation interval improves the accuracy of the system; in fact the F-score increased by 10 points when the interval is augmented from 1 to 10 s. The obvious reason is that a longer observation interval makes the attack traffic statistics increasingly more distinct from the background. However, this improvement comes at the price of reduced time resolution,
where the onset and offset instances of the attack become proportionally blurred.

6.3. Effect of traffic intensity

Table 3 shows the effect of traffic intensity on the performance of the change point detector. Even though the F-scores of the detector running with the empirical and statistical thresholds are similar, their precision and recall scores differ. The χ² threshold detector has higher precision, resulting in fewer false alarms, but it may miss an attack more frequently. On the other hand, the constant thresholding is more successful in detecting an attack but it results in more false alarms. Not surprisingly, as the background traffic intensity increases, the detection performance decreases. Obviously the fluctuations in the normal traffic confound the attack traffic, which becomes less distinctive. Conversely, when the background traffic is low, the abrupt changes caused by the attacks are easier to detect. The optimal set of parameters should be sought for each traffic intensity.
6.4. Effect of overlapping attack intervals

Fig. 9 illustrates the flexibility of the proposed model. In this instance, the register attack is applied incrementally such that at events spaced in time by 80-90 s, a new set of 10 attackers starts an attack, and at the same time the intensity of their attack is increased by additional steps of 5 messages. For example, a set of 10 attackers starts at the 175th s with 5 register messages per second, resulting in a total of 50 register messages per second; then a different set of 10 attackers starts at the 255th s with 10 register messages per second, resulting in a total of 100 messages per second, etc. The final set of attackers sends 50 register messages per second per attacker. When we set λ = β = 1, for the fixed threshold model, the algorithm is able to detect the start and the end of the attacks when c ≤ 3. For the chi-square thresholding, the algorithm is able to detect the onsets and offsets when α > 0.005.
Fig. 8. The change points and alarms raised by the models. The first five attacks are low-level (50 messages per attacker), while the last five attacks are high-level (100
messages per attacker).
Table 2
The performances of the change-point detectors for normal traffic intensity for different sampling
rates (For constant thresholding k = 5 , λ = 1 , β = 4 , c = 1 , for χ 2 thresholding k = 5 , λ = 2 , β =
2 , α = 0 . 02 ).
Detector Score 1 s 2 s 3 s 5 s 10 s
Constant Precision 0.70 ± 0.04 0.72 ± 0.04 0.73 ± 0.04 0.74 ± 0.02 0.77 ± 0.01
Recall 0.88 ± 0.07 0.92 ± 0.04 0.97 ± 0.02 0.98 ± 0.01 0.99 ± 0.01
F-Score 0.79 ± 0.04 0.81 ± 0.04 0.83 ± 0.02 0.84 ± 0.01 0.87 ± 0.01
χ2 Precision 0.81 ± 0.07 0.83 ± 0.05 0.85 ± 0.04 0.87 ± 0.03 0.92 ± 0.02
Recall 0.73 ± 0.10 0.75 ± 0.04 0.76 ± 0.03 0.80 ± 0.04 0.84 ± 0.02
F-Score 0.77 ± 0.04 0.79 ± 0.04 0.81 ± 0.05 0.84 ± 0.02 0.88 ± 0.01
Table 3
The performances of the change-point detectors for different traffic intensity levels for 1 s (For
constant thresholding k = 5 , λ = 1 , β = 4 , c = 1 , for χ 2 thresholding k = 5 , λ = 2 , β = 2 , α = 0 . 02 ).
Detector Score Level 1 Level 2 Level 3 Level 4 Level 5
Constant Precision 0.77 ± 0.03 0.73 ± 0.05 0.70 ± 0.04 0.69 ± 0.06 0.68 ± 0.08
Recall 0.90 ± 0.03 0.88 ± 0.04 0.88 ± 0.07 0.85 ± 0.06 0.83 ± 0.07
F-Score 0.82 ± 0.04 0.8 ± 0.07 0.79 ± 0.04 0.77 ± 0.07 0.75 ± 0.07
χ2 Precision 0.88 ± 0.06 0.85 ± 0.04 0.81 ± 0.07 0.81 ± 0.08 0.79 ± 0.06
Recall 0.79 ± 0.03 0.78 ± 0.03 0.73 ± 0.10 0.72 ± 0.06 0.7 ± 0.08
F-Score 0.83 ± 0.05 0.81 ± 0.05 0.77 ± 0.04 0.76 ± 0.06 0.74 ± 0.08
Fig. 9. Register attacks increasing at incremental steps of 5.
6.5. Detection performance for time overlapped attacks
Fig. 10 shows the performance of the detector when the at-
tacks are overlapping. The vertical bars in the figure indicate the
detected onsets and offsets of the anomalous traffic when a fixed
threshold is used (k = 5, λ = β = 1, c = 1). The χ² thresholding
for α = 0 . 02 shows very similar performance. Each attack type is
executed twice with 10 different attackers each time. The first oc-
currences are with 5 messages per second and the second ones are
with 10 messages per second. For example, the first cancel attack
starts with 50 messages (5 cancel messages ∗ 10 attackers) per sec-
ond around 400th s and the second cancel attack starts with 100
messages per second around 850th s.
6.6. Effects of DCPM parameters
The λ and β parameters control the trade-off between aging and agility of the system. If the aging parameter λ is set to a value higher than β, the system is more resistant to the current change and is biased toward keeping its status quo. On the contrary, a higher β value means the system is unbiased toward any change, and the effect of past observations is eliminated sooner.
In the case of χ² thresholding, the α parameter determines the tolerance to false alarms. If it is set to a high value (e.g., α = 0.1), the algorithm is more likely to raise an alarm in case of an abrupt change even though it may not be caused by an attack. If it has a low value (e.g., α = 0.01), then the number of false alarms decreases. The c parameter plays a role similar to the α parameter of χ² thresholding, in that c determines the tolerance for false alarms. Setting it to low values (e.g., c = 0.5) may cause even a fluctuation of the normal traffic to raise an alarm. Conversely, for high values (e.g., c = 5), attacks might go undetected.
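The two alarm rules above can be sketched as follows. The exact form of the constant-threshold statistic is defined earlier in the paper and is not restated in this section, so the mean-plus-c-standard-deviations rule below is an illustrative assumption; the χ² rule follows from the squared Mahalanobis distance of a d-dimensional Gaussian being χ²-distributed with d degrees of freedom.

```python
# Hedged sketch of the two thresholding rules. The constant rule's exact form
# (mean + c * std over recent distances) is an assumption for illustration.
import numpy as np
from scipy.stats import chi2

def constant_alarm(d2_history, d2_new, c=1.0):
    # Alarm when the new squared distance exceeds the recent mean
    # by more than c standard deviations (assumed form).
    mu, sd = np.mean(d2_history), np.std(d2_history)
    return d2_new > mu + c * sd

def chi2_alarm(d2_new, dim, alpha=0.02):
    # Alarm when the squared Mahalanobis distance exceeds the
    # (1 - alpha) quantile of the chi-square law with `dim` dof.
    return d2_new > chi2.ppf(1.0 - alpha, df=dim)

history = [1.2, 0.8, 1.1, 0.9, 1.0]          # recent squared distances (toy)
print(constant_alarm(history, 4.0, c=1.0))   # large jump -> alarm
print(chi2_alarm(25.0, dim=8, alpha=0.02))   # deep in the tail -> alarm
```

Raising α (or lowering c) pulls the threshold down, which is exactly the higher false-alarm tolerance described above.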
Fig. 11 shows the Receiver Operating Characteristic (ROC) curves for the DCPMs with all parameters fixed other than c and α, which are the constant threshold coefficient and the significance level, respectively. Both c and α decrease as we traverse along the curves.
6.7. Performance of attacker identification methods

To assess attacker identification performance, we experiment with the two proposed spectral clustering methods, one based on the time-series kernel and the other on the distance kernel. As a competitor method for attacker identification, we use the time-series clustering method proposed in [30]. In the latter method, dynamic time warping distance is used for calculating the distances between time series having different lengths, and a one-nearest-neighbor network is thus extracted. The performances of these
Fig. 10. Overlapping mixed types of attack increasing at incremental steps of 5.
Fig. 11. The receiver operating characteristic curves of the distance-based change-point models as α goes from 0.1 to 0.01 in steps of 0.01 and c goes from 1.0 to 0.65 in steps of 0.05.
Table 4
The performance of different attacker identifiers. Constant and χ² stand for DCPMs run with constant and χ² thresholding, respectively. Distance, Time series and Clustering represent the distance kernel, the time-series kernel and the time-series clustering, respectively, for a sampling interval of 1 s and level 4 traffic, with the DCPM parameters the same as in Table 1.

Model                     Precision     Recall        F-Score
Constant - Distance       0.64 ± 0.13   0.50 ± 0.08   0.55 ± 0.10
Constant - Time series    0.68 ± 0.10   0.51 ± 0.10   0.57 ± 0.10
Constant - Clustering     0.49 ± 0.12   0.37 ± 0.07   0.40 ± 0.08
χ² - Distance             0.72 ± 0.07   0.57 ± 0.04   0.62 ± 0.05
χ² - Time series          0.77 ± 0.05   0.60 ± 0.06   0.66 ± 0.06
χ² - Clustering           0.50 ± 0.19   0.36 ± 0.13   0.40 ± 0.15
three attacker identification methods are given in Table 4. Notice that the attacker identification methods are run in an unsupervised setting. This is a viable approach since, most of the time in reality, labeled training data would not be available. This is due to either the changing characteristics of attack models, e.g., a zero-day attack, or the privacy and prestige concerns of the service providers.

Fig. 12 shows the normalized kernel matrices, calculated according to the time-series kernel and the distance kernel, respectively, as in Eqs. (11)–(13). These matrices represent the messaging behavior similarity of the set of 200 users, as it is set in this experiment. In Fig. 12, however, we plot the behavioral similarity of a subset of 25 users for clarity of illustration. 10 of the 200 users (though the behavior of only 25 − 10 = 15 of the normal users is plotted) mount a DDoS attack, as shown in Fig. 12a and 12b. The users with similar behavior patterns have kernel values close to 1 (dark cells), while the uncorrelated users have kernel values close to 0 (white cells). Note that for the similarity values between the attackers (the attacker-attacker cells), there are some gray shades implying a modest similarity. But the attacker-normal user cells are almost all completely white (close to 0), because malicious and innocent users have totally different behaviors. In summary, the attackers are closer to each other than to the normal users.

Fig. 12c shows the attacker group labeled according to the 2-means clustering and malicious-user differentiation heuristics in
a) The evaluated time series kernel matrix (γ = ρ = 1)
b) The evaluated distance kernel matrix
c) Both kernels result in the same spectral label matrix
Fig. 12. The difference between kernels in discriminating the malicious users. Note
that the plots show only 25 of the 200 users for clarity purposes. In the bottom
figure, the dark red cells correspond to the attacker pairs. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version
of this article.)
a) The time series alignment kernel mapping
b) The distance kernel mapping
Fig. 13. The mapping of the two kernel matrices in the projected 2-d space (s1, s2) after spectral clustering. The blue squares are the attackers and the red circles are the innocent users for the level 4 intensity traffic with the observation interval of 1 s. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Algorithm 3. The dark red cells correspond to attacker pairs; K_{q,r} is 1 (dark red) if (u_q, u_r) is an attacker pair. Both detectors correctly segregate the 10 malicious users from the remaining 15 innocent users.
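A minimal sketch of how a user-to-user similarity matrix like the one in Fig. 12b can arise is given below. The paper's exact distance kernel (Eqs. (11)–(13)) is not reproduced in this section, so a plain Gaussian (RBF) kernel over cumulative per-user message-count vectors is used here as an illustrative stand-in; the count matrix is hypothetical.

```python
# Illustrative stand-in for a distance kernel over per-user cumulative
# message-count vectors; the RBF form and the counts are assumptions.
import numpy as np

def distance_kernel(counts, gamma=0.5):
    """counts: (n_users, n_message_types) cumulative count matrix."""
    sq = np.sum(counts ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * counts @ counts.T  # pairwise squared dists
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Two "attackers" flooding one message type, two ordinary users.
counts = np.array([[9.0, 1.0, 0.0],
                   [8.0, 1.0, 0.0],
                   [1.0, 2.0, 1.0],
                   [0.0, 3.0, 1.0]])
K = distance_kernel(counts)
# The attacker-attacker cell is large (dark), attacker-normal cells are
# near zero (white), mirroring the block structure visible in Fig. 12.
print(round(K[0, 1], 3), K[0, 2])
```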
Fig. 13 shows how the same subscribers in Fig. 12 are mapped by the spectral clustering. Note that in the plots the number of points is less than 25. The reason is that some users overlap in the reduced 2-dimensional space.
The performance of the attacker identifier algorithms is given in Table 4. The identities of the attackers and of the normal users are known since we are using simulation data. The labels of the attackers returned by the models are compared with the true labels of the users, and performance scores are computed accordingly. The three attacker identifiers, two of which are the models proposed in this work (distance and time-series kernels) and the third of which is the time-series clustering model [30], are comparatively evaluated. The time-series kernel algorithm yields the best performance scores. The time-series clustering model [30] has the worst performance. The most likely reason for the lower performance of the latter is that this model uses Euclidean distance, hence does not use any data-driven weighting, for the calculation of dynamic time
Table 5
The processing times (in seconds) of the attacker identification methods, for an observation interval of 1 s with the settings in Table 1.

Intensity   Distance kernel   Time series      Clustering
Level 1     0.11 ± 0.07       19.81 ± 26.46    0.91 ± 1.03
Level 2     0.11 ± 0.07       21.20 ± 26.70    0.93 ± 1.05
Level 3     0.12 ± 0.07       22.02 ± 28.50    0.94 ± 1.12
Level 4     0.12 ± 0.08       26.89 ± 29.72    0.96 ± 1.45
Level 5     0.12 ± 0.08       29.40 ± 37.21    1.03 ± 1.94
Table 6
The processing times (in seconds) of the attacker identification methods for different observation intervals.

Sampling   Distance kernel   Time series        Clustering
1 s        0.12 ± 0.07       22.02 ± 28.50      0.94 ± 1.12
2 s        0.16 ± 0.12       126.33 ± 144.30    5.06 ± 7.84
3 s        0.16 ± 0.08       326.55 ± 446.37    20.38 ± 31.66
5 s        0.24 ± 0.13       −                  72.00 ± 95.23
10 s       0.76 ± 0.45       −                  162.30 ± 286.62
warping distance. The experimental results in Table 4 show that the two proposed methods have almost the same F-score values.
6.8. Time comparison of attacker identification methods

The time-series kernel approach used in attacker detection is more accurate than the distance kernel, but requires longer run times. The run times of the time-series clustering model are between those of the other two models, but it has significantly lower accuracy. Similar conclusions can be drawn for other traffic intensities and observation intervals. The average running times of the attacker identification methods are given in Table 5, where γ = ρ = 1 for the time-series kernel, and the order, which is the number of nearest neighbors to process during cluster-member candidate selection, is set to 1 for time-series clustering. The observation interval is taken as 1 s. The distance kernel does not have any parameters to be set. For each method, the number of clusters to be found is set to 2. The table shows that as the traffic rate increases, the running times of the identification methods also increase. Note that the running time of the time-series method is much higher than that of the other two models. The reason is that the computational load of the pairwise similarities in the time-series kernel increases proportionally to the number of active users and the number of messages sent by the users.
Table 6 shows the run times of the models with respect to the observation interval. The longer the interval, the longer the models take to identify the attackers. The running time of the time-series kernel increases exponentially, since the kernel is evaluated by the pairwise similarity calculation of the messages sent by the subscribers: the longer the interval, the more messages the subscribers send, and the more active subscribers there are. The running time of the time-series kernel is not evaluated for the 5 and 10 s intervals since it takes too much time.
7. Conclusions and future directions

This study has focused on the detection of DDoS attacks in SIP networks and on the identification of users coordinated in an attack. An adaptive cyber security monitor is developed consisting of two basic components: a change-point detector to alert the system of an ongoing attack and an identifier for the malicious user set.
The proposed change-point model tracks the Mahalanobis distance between the messaging counts in successive observation intervals. The rationale is that a marked (dis)similarity of sequential message-count vectors can uncover abrupt changes in the traffic pattern. High dissimilarity instances, i.e., Mahalanobis distances above a threshold, are labeled as candidate attacks. The setting of the threshold is critical to differentiate DDoS attacks from random fluctuations of the traffic. The proposed DCPM is capable of adapting to traffic variations thanks to the on-line estimation of the Mahalanobis metric, and hence yields significantly better performance compared to the results in the literature.
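The tracked quantity can be illustrated with a toy computation. The batch mean/covariance estimate below is a simplified stand-in for the paper's on-line metric estimation, and the message-count vectors are hypothetical.

```python
# Toy illustration of the squared Mahalanobis distance between a new
# message-count vector and a running summary of recent traffic. The batch
# mean/covariance below stands in for the paper's on-line estimation.
import numpy as np

def mahalanobis2(x, mu, cov):
    diff = x - mu
    return float(diff @ np.linalg.inv(cov) @ diff)

rng = np.random.default_rng(0)
# Calm traffic: counts of three message types per interval (toy Poisson rates).
normal = rng.poisson(lam=[20, 10, 5], size=(200, 3)).astype(float)
mu, cov = normal.mean(axis=0), np.cov(normal.T)

calm   = np.array([21.0, 9.0, 5.0])     # a typical interval: small distance
attack = np.array([120.0, 10.0, 5.0])   # flood in one message type: large distance
print(mahalanobis2(calm, mu, cov), mahalanobis2(attack, mu, cov))
```

The flood interval lands far above any reasonable threshold, while ordinary fluctuations stay small, which is the behavior the thresholding rules exploit.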
Identification of DDoS attackers is based on behavioral similarity in messaging sequences. Based on the premise that attackers act in a coordinated way while normal users show a much less structured messaging pattern, two corresponding clusters are conceived. The user-to-user similarity is measured by kernelizing their messaging time series. In the time-series kernel function we explicitly use the timestamps of the messaging events; in the distance kernel, we collapse the messaging activities within an observation interval into a cumulative count vector. The behavioral clusters are extracted using normalized Laplacian spectral clustering.
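The clustering step can be sketched as follows on a toy kernel matrix. A sign split on the eigenvector of the second-smallest Laplacian eigenvalue stands in for the full 2-means step, which is a common simplification for the two-cluster case; the kernel values are hypothetical.

```python
# Minimal normalized-Laplacian spectral clustering sketch with 2 clusters
# (attackers vs. normal users). The toy kernel matrix and the sign-split
# shortcut are illustrative assumptions.
import numpy as np

def spectral_labels(K):
    d = K.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(K)) - d_inv_sqrt @ K @ d_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                        # eigenvalues ascending
    fiedler = vecs[:, 1]                               # second-smallest eigenvector
    return (fiedler > 0).astype(int)                   # 2-way split

# Block-structured toy kernel: users 0-1 behave alike, users 2-4 behave alike.
K = np.array([[1.0, 0.9, 0.1, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8, 0.8],
              [0.1, 0.1, 0.8, 1.0, 0.8],
              [0.1, 0.1, 0.8, 0.8, 1.0]])
labels = spectral_labels(K)   # users 0-1 land in one cluster, 2-4 in the other
```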
The performance of the proposed system is tested over a simulated SIP network environment, which simulates transactions of ordinary subscribers and attackers. Depending on the intensity of the normal network traffic, the observation interval and the attack magnitude, our F-scores are more than 0.70 for the distance-based change-point models, which is much higher than that of the ARIMA model. The area under curve (AUC) values are 0.993 and 0.994 for χ² and constant thresholding, respectively.
The effects of observation window length, background traffic intensity and parameter settings for the proposed DCPM methods are discussed in detail. Longer observation windows result in more accurate attack detectors, but they come at the price of reduced onset/offset resolution. As one should expect, the intensity of background traffic has a diminishing effect on the performance of the proposed methods: the more fluctuations the traffic has, the lower the F-scores are, though they remain higher than 0.70. The parameters of the models should be calibrated to account for seasonal changes.
The attacker identification algorithms are also compared in detail. The time-series kernel has higher F-scores but a considerable running time. Longer observation intervals form longer time series, and the running times increase almost exponentially. Similarly, higher traffic intensity causes an increase in the running time. Even though the distance kernel has lower accuracy values, its running time is almost unaffected by the observation window interval or the traffic intensity. The reason is that each user is represented as a vector, and the number of operations is not affected by the window interval or the intensity.
This study can be advanced in several ways. First, in addition to the observed message traffic, one can use additional data sources, such as SIP server log registers or its resource usage, e.g., CPU load. Second, the distance change-point model compares only the last observation interval with the immediately preceding k frames to detect changes in the traffic. This can be extended to compare the most current m frames with the k frames in their past. We conjecture that the comparison of two groups of frames might diminish false alarms, that is, detected changes which are not DDoS attacks. Thirdly, though the time-series kernel has slightly higher performance, it takes a longer time to respond due to the cost of the kernel matrix computation. The distance kernel is faster, but it does not benefit from the occurrence-time information of the messages. A hybrid kernel might provide a more accurate detector than the distance kernel and a faster detector than the sequence alignment kernel. Finally, we have so far considered the costs of false negatives and false positives to be equal. From the point of view of operators that deploy SIP servers, these two costs are not equal, and this should be taken into consideration in setting the threshold for attacker detection. Delayed response to a DDoS attack and suffering degradation of quality of service should be weighed against taking
preventive action toward subscribers that may not all be malicious
users.
DDoS attacks may look deceptively simple, but they have
proved to be hard to prevent, and they will be one of the major
cyber security concerns with the spread of Internet-Of-Things (IoT)
devices. The capabilities of IoT devices and their security vulnera-
bilities (e.g. weak passwords or no protection mechanisms at all)
make them easy victims as zombies for botnet applications, such
as Mirai [45]. The botnets are also evolving and becoming adaptive against the deployed counter-measures. Thus, more research should be carried out, particularly on the detection of attack sources, to overcome the possible outages and network congestion on the horizon with the widespread adoption of IoT devices.
Acknowledgment
This study is partially funded by TEYDEB project number
3140701 “Realization of Anomaly Detection and Prevention with
Learning System Architectures, Quality Improvement, High Rate
Service Availability and Rich Services in a VoIP Firewall Product”,
by the Scientific and Technological Research Council Of Turkey
(TUBITAK). NOVA V-Gate and V-Spy are trademark cyber-security
products of NETAS.
This section focuses on the details of the LogDet function. The Kullback–Leibler (KL) divergence from distribution Q to distribution P, where p(x) and q(x) are their respective probability density functions and x \in \mathbb{R}^d, is calculated as:

D_{KL}(P \| Q) = \int p(x) \log\!\left( \frac{p(x)}{q(x)} \right) dx \qquad (19)

= E_P\!\left[ \log\!\left( \frac{P}{Q} \right) \right] \qquad (20)
Assuming that both p and q are multivariate Gaussian distributions with mean vectors \mu_p and \mu_q and covariance matrices \Sigma_p and \Sigma_q, respectively, one has:

p(x) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_p)^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_p)^\top \Sigma_p^{-1} (x - \mu_p) \right) \qquad (21)

q(x) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_q)^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_q)^\top \Sigma_q^{-1} (x - \mu_q) \right) \qquad (22)
Using definitions (21) and (22) in Eq. (19), one obtains:

D_{KL}(P \| Q) = E_P[\log P - \log Q] \qquad (23)

= \tfrac{1}{2} E_P\!\left[ -\log\det\Sigma_p - (x - \mu_p)^\top \Sigma_p^{-1} (x - \mu_p) + \log\det\Sigma_q + (x - \mu_q)^\top \Sigma_q^{-1} (x - \mu_q) \right] \qquad (24)

= \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} + \tfrac{1}{2} E_P\!\left[ -(x - \mu_p)^\top \Sigma_p^{-1} (x - \mu_p) + (x - \mu_q)^\top \Sigma_q^{-1} (x - \mu_q) \right] \qquad (25)

= \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} + \tfrac{1}{2} E_P\!\left[ -\mathrm{tr}\!\left( \Sigma_p^{-1} (x - \mu_p)(x - \mu_p)^\top \right) + \mathrm{tr}\!\left( \Sigma_q^{-1} (x - \mu_q)(x - \mu_q)^\top \right) \right] \qquad (26)

Here we use the identity a^\top B c = \mathrm{tr}(B c a^\top) for any a, c \in \mathbb{R}^d and B \in \mathbb{R}^{d \times d}. Since both the trace and integration are linear operators, we can proceed as follows:
D_{KL}(P \| Q) = \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} - \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_p^{-1} E_P[(x - \mu_p)(x - \mu_p)^\top] \right) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} E_P[(x - \mu_q)(x - \mu_q)^\top] \right) \qquad (27)

= \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} - \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_p^{-1} \Sigma_p \right) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} E_P[x x^\top - \mu_q x^\top - x \mu_q^\top + \mu_q \mu_q^\top] \right) \qquad (28)

= -\tfrac{1}{2} \log\frac{\det\Sigma_p}{\det\Sigma_q} - \tfrac{1}{2} \mathrm{tr}(I) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} E_P[x x^\top - \mu_q x^\top - x \mu_q^\top + \mu_q \mu_q^\top] \right) \qquad (29)

Note that \frac{\det\Sigma_p}{\det\Sigma_q} = \det(\Sigma_p \Sigma_q^{-1}), \mathrm{tr}(I) = d and E_P[x x^\top] = \Sigma_p + \mu_p \mu_p^\top. The Kullback–Leibler divergence becomes:
D_{KL}(P \| Q) = -\tfrac{1}{2} \log\det(\Sigma_p \Sigma_q^{-1}) - \tfrac{d}{2} + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \left( \Sigma_p + \mu_p \mu_p^\top - \mu_q \mu_p^\top - \mu_p \mu_q^\top + \mu_q \mu_q^\top \right) \right) \qquad (30)

= -\tfrac{1}{2} \log\det(\Sigma_p \Sigma_q^{-1}) - \tfrac{d}{2} + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \Sigma_p \right) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \left( \mu_p \mu_p^\top - \mu_q \mu_p^\top - \mu_p \mu_q^\top + \mu_q \mu_q^\top \right) \right) \qquad (31)

= -\tfrac{1}{2} \log\det(\Sigma_p \Sigma_q^{-1}) - \tfrac{d}{2} + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \Sigma_p \right) + \tfrac{1}{2} (\mu_p - \mu_q)^\top \Sigma_q^{-1} (\mu_p - \mu_q) \qquad (32)
Now, let us assume that the mean vectors of p and q are identical, \mu_p = \mu_q; then we can conclude that:

D_{KL}(P \| Q) = \tfrac{1}{2} \left( \mathrm{tr}(\Sigma_q^{-1} \Sigma_p) - \log\det(\Sigma_p \Sigma_q^{-1}) - d \right) \qquad (33)

= \tfrac{1}{2} \left( \mathrm{tr}(\Sigma_p \Sigma_q^{-1}) - \log\det(\Sigma_p \Sigma_q^{-1}) - d \right) \qquad (34)

= \tfrac{1}{2} D_{ld}(\Sigma_p, \Sigma_q) \qquad (35)
D_{ld}(\Sigma_p, \Sigma_q) is called the logarithmic determinant (LogDet) divergence and can be used as a pseudo-metric to measure the distance between two matrices. A metric function D : X × X → [0, ∞) defined on a set X must satisfy three basic conditions, as below, for any a, b, c ∈ X:

1. D(a, b) ≥ 0 and D(a, b) = 0 iff a = b (non-negativity and positive-definiteness)
2. D(a, b) = D(b, a) (symmetry)
3. D(a, c) ≤ D(a, b) + D(b, c) (triangle inequality)

D_{ld}(\Sigma_p, \Sigma_q) is a pseudo-metric on positive-definite matrices, \Sigma_p, \Sigma_q \in \mathbb{R}^{d \times d}, since it only guarantees non-negativity; the other two conditions do not necessarily hold.
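The identity in Eq. (35) can be checked numerically. In the sketch below, the closed form of Eq. (34) is compared against an independent Monte Carlo estimate of the KL divergence between two zero-mean Gaussians; the covariance matrices are arbitrary illustrative choices.

```python
# Numerical check of Eqs. (33)-(35): for two Gaussians with identical means,
# D_KL(P || Q) = (1/2) D_ld(Sigma_p, Sigma_q). The KL side is estimated by
# Monte Carlo so the two sides are computed independently.
import numpy as np

def logdet_div(Sp, Sq):
    # D_ld(Sp, Sq) = tr(Sp Sq^{-1}) - log det(Sp Sq^{-1}) - d, as in Eq. (34)
    M = Sp @ np.linalg.inv(Sq)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - Sp.shape[0]

def gauss_logpdf(x, S):
    # log density of N(0, S), evaluated row-wise for samples x of shape (n, d)
    inv = np.linalg.inv(S)
    _, ld = np.linalg.slogdet(S)
    quad = np.einsum('ij,jk,ik->i', x, inv, x)
    return -0.5 * (quad + ld + S.shape[0] * np.log(2.0 * np.pi))

Sp = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative covariances
Sq = np.array([[1.0, 0.0], [0.0, 1.5]])

rng = np.random.default_rng(7)
x = rng.multivariate_normal(np.zeros(2), Sp, size=200_000)
kl_mc = float(np.mean(gauss_logpdf(x, Sp) - gauss_logpdf(x, Sq)))  # E_P[log p - log q]
kl_closed = 0.5 * logdet_div(Sp, Sq)                               # Eq. (35)
print(kl_mc, kl_closed)   # the two estimates agree to sampling error
```

The check also makes the pseudo-metric property concrete: D_ld of a matrix with itself is exactly zero, while distinct matrices give a strictly positive value.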
References

[1] N. Raza, I. Rashid, F.A. Awan, Security and management framework for an organization operating in cloud environment, Ann. Telecommun. 72 (5) (2017) 325–333, doi:10.1007/s12243-017-0567-6.
[2] D. Bolton, Anonymous 'declares war' on Turkey, claims responsibility for recent massive cyberattacks, 2015, (http://www.independent.co.uk/life-style/gadgets-and-tech/news/anonymous-declares-war-on-turkey-opsis-russia-cyberattack-erdogan-a6784026.html), [Online; accessed 04-06-2017].
[3] B.B. Gupta, T. Akhtar, A survey on smart power grid: frameworks, tools, security issues, and solutions, Ann. Telecommun. 72 (9) (2017) 517–549, doi:10.1007/s12243-017-0605-4.
[4] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, SIP: session initiation protocol, RFC 3261, 2002. http://www.ietf.org/rfc/rfc3261.txt
[5] M. Cooney, IBM warns of rising VoIP cyber-attacks, 2016, (http://www.networkworld.com/article/3146095/security/ibm-warns-of-rising-voip-cyber-attacks.html), [Online; accessed 04-06-2017].
[6] C. Wilson, DDoS attacks targeting traditional telecom systems, 2012, (https://www.arbornetworks.com/blog/asert/ddos-attacks-targeting-traditional-telecom-systems/), [Online; accessed 04-06-2017].
[7] C. Yildiz, M. Semerci, T.Y. Ceritli, B. Kurt, B. Sankur, A.T. Cemgil, Change point detection for monitoring SIP networks, in: Proceedings of the European Conference on Networks and Communications (EuCNC2016), 2016.
[8] A.D. Keromytis, A comprehensive survey of voice over IP security research, IEEE Commun. Surv. Tutor. 14 (2) (2012) 514–537.
[9] D. Sisalem, J. Kuthan, S. Ehlert, Denial of service attacks targeting a SIP VoIP infrastructure: attack scenarios and prevention mechanisms, IEEE Network 20 (5) (2006) 26–31.
[10] E.Y. Chen, M. Itoh, Scalable detection of SIP fuzzing attacks, in: Proceedings of the Second International Conference on Emerging Security Information, Systems and Technologies (SECURWARE), 2008, pp. 114–119.
[11] S. Ehlert, D. Geneiatakis, T. Magedanz, Survey of network security systems to counter SIP-based denial-of-service attacks, Comput. Secur. 29 (2) (2010) 225–243.
[12] Z. Chen, R. Duan, The formal analysis of DoS attack to SIP based on the SIP extended finite state machines, in: Proceedings of the International Conference on Computational Intelligence and Software Engineering, 2010, pp. 1–4.
[13] N. Vrakas, C. Lambrinoudakis, An intrusion detection and prevention system for IMS and VoIP services, Int. J. Inf. Secur. 12 (3) (2013) 201–217, doi:10.1007/s10207-012-0187-0.
[14] R. Vijayasarathy, S.V. Raghavan, B. Ravindran, A system approach to network modeling for DDoS detection using a Naive Bayesian classifier, in: Proceedings of the Third International Conference on Communication Systems and Networks (COMSNETS), IEEE, 2011, pp. 1–10.
[15] C. Yildiz, T.Y. Ceritli, B. Kurt, B. Sankur, A.T. Cemgil, Attack detection in VoIP networks using Bayesian multiple change-point models, in: Proceedings of the Twenty Fourth Conference on Signal Processing and its Applications (SIU), 2016, pp. 1301–1304.
[16] M. Nassar, R. State, O. Festor, A framework for monitoring SIP enterprise networks, in: The Fourth International Conference on Network and System Security (NSS), 2010, pp. 1–8.
[17] Z. Tsiatsikas, D. Geneiatakis, G. Kambourakis, S. Gritzalis, Realtime DDoS detection in SIP ecosystems: machine learning tools of the trade, Springer International Publishing, Cham, pp. 126–139.
[18] S.M.T. Nezhad, M. Nazari, E.A. Gharavol, A novel DoS and DDoS attacks detection algorithm using ARIMA time series model and chaotic system in computer networks, IEEE Commun. Lett. 20 (4) (2016) 700–703.
[19] A. D'Alconzo, A. Coluccia, P. Romirer-Maierhofer, Distribution-based anomaly detection in 3G mobile networks: from theory to practice, Int. J. Netw. Manag. 20 (5) (2010) 245–269, doi:10.1002/nem.747.
[20] A. D'Alconzo, A. Coluccia, F. Ricciato, P. Romirer-Maierhofer, A distribution-based approach to anomaly detection and application to 3G mobile traffic, in: Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM), 2009, pp. 1–8, doi:10.1109/GLOCOM.2009.5425651.
[21] M. Anagnostopoulos, G. Kambourakis, S. Gritzalis, New facets of mobile botnet: architecture and evaluation, Int. J. Inf. Secur. 15 (5) (2016) 455–473, doi:10.1007/s10207-015-0310-0.
[22] G. Kirubavathi, R. Anitha, Structural analysis and detection of android botnets using machine learning techniques, Int. J. Inf. Secur. (2017) 1–15, doi:10.1007/s10207-017-0363-3.
[23] S.S. Silva, R.M. Silva, R.C. Pinto, R.M. Salles, Botnets: a survey, Comput. Netw. 57 (2) (2013) 378–403, doi:10.1016/j.comnet.2012.07.021.
[24] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, E. Vázquez, Anomaly-based network intrusion detection: techniques, systems and challenges, Comput. Secur. 28 (12) (2009) 18–28, doi:10.1016/j.cose.2008.08.003.
[25] M. Gupta, J. Gao, C.C. Aggarwal, J. Han, Outlier detection for temporal data: a survey, IEEE Trans. Knowl. Data Eng. 26 (9) (2014) 2250–2267.
[26] R.J. Hyndman, E. Wang, N. Laptev, Large-scale unusual time series detection, in: Proceedings of the IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, November 14–17, 2015, pp. 1616–1619.
[27] M. Cuturi, Fast global alignment kernels, in: Proceedings of the Twenty Eighth International Conference on Machine Learning (ICML), Bellevue, Washington, USA, June 28–July 2, 2011, pp. 929–936.
[28] K.R. Sivaramakrishnan, K. Karthik, C. Bhattacharyya, Kernels for large margin time-series classification, in: Proceedings of the International Joint Conference on Neural Networks, 2007, pp. 2746–2751.
[29] H. Chen, F. Tang, P. Tino, X. Yao, Model-based kernel for efficient time series analysis, in: Proceedings of the Nineteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), ACM, New York, NY, USA, 2013, pp. 392–400.
[30] X. Zhang, J. Liu, Y. Du, T. Lv, A novel clustering method on time series data, Expert Syst. Appl. 38 (9) (2011) 11891–11900.
[31] T. Oates, L. Firoiu, P. Cohen, Clustering time series with hidden Markov models and dynamic time warping, in: Proceedings of the IJCAI-99 Workshop on Neural, Symbolic, and Reinforcement Learning Methods for Sequence Learning, 1999.
[32] Y. Xiong, D.-Y. Yeung, Mixtures of ARMA models for model-based time series clustering, in: Proceedings of the IEEE International Conference on Data Mining, 2002.
[33] S. Behal, K. Kumar, Detection of DDoS attacks and flash events using novel information theory metrics, Comput. Netw. 116 (2017) 96–110, doi:10.1016/j.comnet.2017.02.015.
[34] B. Tellenbach, M. Burkhart, D. Schatzmann, D. Gugelmann, D. Sornette, Accurate network anomaly classification with generalized entropy metrics, Comput. Netw. 55 (15) (2011) 3485–3502, doi:10.1016/j.comnet.2011.07.008.
[35] J. Heo, E.Y. Chen, T. Kusumoto, M. Itoh, Statistical SIP traffic modeling and analysis system, in: Proceedings of the Tenth International Symposium on Communications and Information Technologies, 2010, pp. 1223–1228, doi:10.1109/ISCIT.2010.5665175.
[36] S. D'Antonio, M. Esposito, F. Oliviero, S.P. Romano, D. Salvi, Behavioral network engineering: making intrusion detection become autonomic, Annales des Télécommunications 61 (9) (2006) 1136–1148, doi:10.1007/BF03219885.
[37] J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, in: Proceedings of the Twenty Fourth International Conference on Machine Learning, New York, NY, USA, 2007, pp. 209–216.
[38] M. Cuturi, J.P. Vert, O. Birkenes, T. Matsui, A kernel for time series based on global alignment, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, 2007, pp. 413–416.
[39] U. Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (4) (2007) 395–416.
[40] Fonality, Trixbox business phone solutions, 2016, (https://www.fonality.com/trixbox), [Online; accessed 04-06-2017].
[41] B. Kurt, C. Yildiz, T.Y. Ceritli, M. Yamac, M. Semerci, B. Sankur, A.T. Cemgil, A probabilistic SIP network simulation system, in: Twenty Fourth Conference on Signal Processing and its Applications (SIU), IEEE, 2016, pp. 1049–1052.
[42] C. Yildiz, B. Kurt, T.Y. Ceritli, A.T. Cemgil, B. Sankur, BOUN-SIM API Reference, Technical Report, Department of Computer Engineering, Bogazici University, 2016.
[43] Teluu, PJSIP, 2005, (http://www.pjsip.org/), [Online; accessed 04-06-2017].
[44] NETAS, Nova V-Spy, 2016, (http://novacybersecurity.com/products/nova_vspy), [Online; accessed 04-06-2017].
[45] J. Gamblin, Mirai source code, 2016, (https://github.com/jgamblin/Mirai-Source-Code), [Online; accessed 04-06-2017].
Murat Semerci received his B.S. degrees from the Electrical and Electronics Engineering Department and the Department of Computer Engineering (Double Major Program), Bogazici University, in 2005, and his M.S. degrees from the Department of Computer Engineering, Bogazici University, in 2007 and from the Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, in 2010. Currently he is a Ph.D. student at Bogazici University. His research interests include machine learning, distance learning and kernel machines.

Ali Taylan Cemgil received his Ph.D. from Radboud University Nijmegen, the Netherlands, and worked as a Postdoctoral Researcher at Amsterdam University and the Signal Processing and Communications Lab., University of Cambridge, UK. He is currently an Associate Professor of Computer Engineering at Bogaziçi University, Istanbul, Turkey. His research interests are in Bayesian statistical methods, approximate inference, machine learning, and audio signal processing.

Bülent Sankur is presently at Bogazici University in the Department of Electrical-Electronic Engineering. His research interests are in the areas of digital signal processing, security and biometry, cognition and multimedia systems. He has held visiting positions at the University of Ottawa, Technical University of Delft, and Ecole Nationale Supérieure des Télécommunications, Paris. He was the chairman of several conferences (EUSIPCO'05, ICASSP'00, etc.) and a member of the administrative board of the European Signal Processing Association. Dr. Sankur is presently an associate editor for journals including Image and Video Computing, and Image and Video Processing.