Computer Networks 136 (2018) 137–154
An intelligent cyber security system against DDoS attacks in SIP networks
Murat Semerci a,∗, Ali Taylan Cemgil a, Bülent Sankur b
a Department of Computer Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey
b Department of Electrical and Electronics Engineering, Bogazici University, Bebek, Istanbul 34342, Turkey
Article info
Article history:
Received 26 September 2017
Revised 11 January 2018
Accepted 25 February 2018
Available online 7 March 2018
Keywords:
Anomaly detection
Malicious user detection
DDoS
Mahalanobis distances
Sequence alignment kernel
Abstract
Distributed Denial of Service (DDoS) attacks are among the most frequently encountered cyber criminal activities in communication networks and can result in considerable financial and prestige losses for corporations or governmental organizations. Autonomous detection of a DDoS attack and identification of its sources is therefore essential for taking counter-measures. This study proposes an intelligent security system against DDoS attacks in communication networks that is composed of two components: a monitor for the detection of DDoS attacks and a discriminator for the detection of users in the system with malicious intents. A novel adaptive real-time change-point model that tracks the changes in Mahalanobis distances between sampled feature vectors in the monitored system accounts for possible DDoS attacks. A clustering model that runs over the similarity scores of behavioral patterns between the users is used to segregate the malicious from the innocent. The proposed model is deployed over a simulated telephone network that uses a Session Initiation Protocol (SIP) server. The performance of the models is evaluated on data generated by this high-throughput simulation environment.
© 2018 Elsevier B.V. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (M. Semerci).
https://doi.org/10.1016/j.comnet.2018.02.025
1389-1286/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Distributed Denial of Service (DDoS) attacks are one of the major cyber threats on communication networks. DDoS attacks occur very frequently because they are fairly simple and cheap to initiate, while their broad impact on users and service providers can potentially be severe. Such an attack incapacitates the victim server and renders it unable to provide services at all, or at the desired quality of service levels, to its subscribers. With the cost-effective deployment of cloud systems, DDoS attacks might affect the overall availability of services by targeting more than one server [1]. They can even be a tool for political struggle on a grander scale; a case in point is the set of DDoS attacks on Turkey's domain name servers by hacktivist groups in December 2015 [2]. As a more radical case, they can be exerted over smart power transmission grids, with potentially more catastrophic consequences [3]. Therefore, automatic detection of DDoS attacks and identification of malicious users are crucial for protecting the network entities and for non-degraded service continuity.

Telephone service providers follow the trend of changing their circuit-switched networks to packet-switched ones in view of the cost-effectiveness and maturity of Voice-over-IP (VoIP) technology. The most popular protocol for control signaling between communicating parties in VoIP is currently the Session Initiation Protocol (SIP) [4]. SIP is based on a simple, HTTP-like text-based request-response transaction model. It provides the basic signaling functionalities required for registering clients, checking their presence and on-line availability, exchanging their communication capabilities, and overall managing the sessions. With the deployment of 5G, VoIP is expected to be one of the major instruments for multimedia communication. The wide deployment of VoIP networks and the key importance of telephone networks have made the security of SIP servers extremely important.

VoIP networks are under a variety of cyber threats and the intensity of attacks seems only to be growing [5]. The attacks can be motivated by potential financial benefits, such as pilfering call charges or causing data leakage masqueraded as a stealth threat. Conversely, an attack may be part of a plan to cause financial losses to the service providers via heavy service disruption [6].

In this paper we introduce a novel real-time online intrusion detection and prevention system for communication networks, particularly for networks with SIP traffic. The proposed system both detects the presence of an attack and identifies the attackers. The system focuses on DDoS attacks that flood and suffocate a server with an excessive amount of requests. One clue for the occurrence of a DDoS attack is a marked change in the messaging traffic
patterns in the network. To this effect, we develop a change detec-
tion algorithm that monitors the network traffic intensities at the
server side. Significant changes in the characteristics of messaging
flows are interpreted as the onset or offset of a potential DDoS at-
tack. We assume tacitly that in a DDoS attack, the attackers are
always acting in a coordinated manner.
A novel aspect of the proposed change-point detection method
is that it relies on the adaptive tracking of Mahalanobis distances
between successive state vectors as a way to monitor abnormal
changes in messaging traffic. This enables the monitor to adapt it-
self to the normal traffic regime and/or to the diurnal or seasonal
variations while at the same time remaining sensitive to abnormal
changes. One advantage of our method is that it is model-free, that
is, it is an unsupervised approach to detect traffic anomalies. The
system makes use only of the observed messaging traffic type and
intensity, and does not require any additional information such as
tracebacks. An abnormal change in the traffic regime is declared if the Mahalanobis distance sequence of the state vectors in successive time windows exceeds a threshold function. This threshold value can be set to a constant as a function of system parameters or can be set adaptively. A preliminary version of the Mahalanobis-based anomaly detection algorithm was presented in a conference paper [7]. The first part of this paper presents an extension of that model with an adaptive thresholding function. Notice that the attacker identification, as described in the second part of the paper, was not part of the conference paper. The second novelty of our study is that the algorithm, besides detecting the occurrence of an attack, can also pinpoint the set of attackers. In other words, under certain realistic assumptions, it can discriminate between the messaging patterns of the attackers and those of the non-malicious, i.e., normal, users. Similarly, the attacker identification model runs in an unsupervised mode and is independent of the underlying attack model except for the assumption of attacker coordination. Performance results of the algorithm are studied under extensive network traffic and attack traffic simulations.
In Section 2 , we give a brief overview of cyber threats related
to SIP and proposed remedies for them. In Section 3 we define the
variables and symbols used to describe the time series correspond-
ing to the messaging history of the users and the state of the sys-
tem. In Section 4 , we introduce our change point monitor based
on Mahalanobis distances as an instrument to detect (D)DoS at-
tacks. The method for normal versus malicious user discrimination
is detailed in Section 5 . The performance of the proposed methods
is evaluated using simulation data and compared against those of
competitor algorithms in Section 6 . Finally, conclusions are drawn
in Section 7 where we also discuss the future work in the context
of IoT.
2. Literature review
In addition to the session layer attacks, telecommunication net-
works are also susceptible to a plethora of other threats below the
session layer [8] . Since these are discussed in detail elsewhere, in
this work, we focus solely on SIP-specific threats.
SIP attacks typically exploit vulnerabilities in the SIP protocol.
Signature-based attacks utilize properties of the SIP grammar, and
can be detected by pattern matching between ongoing traffic and
the set of signatures. In other words, this type of attack can be detected or even prevented by inspecting the steps that the attacker must follow. Non-signature-based threats, e.g., behavior-based attacks such as DDoS, are harder to detect. SIP threats can be roughly categorized into four groups [8]:
• Service Abuse Threats: These attacks include commercial abuse
of services to gain some financial benefit such as toll fraud or
billing avoidance.
• Eavesdropping, Interception and Modification Threats: These at-
tacks concentrate on illegally intervening to the call with the
goal of capturing sensitive information.
• Social Threats: These attacks exploit protocol shortcomings, misconfigurations or implementation bugs of the SIP server, and use these weaknesses to misrepresent the identity of malicious parties to the subscribers.
• (Distributed) Denial of Service ((D)DoS): These attacks focus on
the SIP server to prevent it from giving service to the sub-
scribers or to cause significant degradation in the quality of
network services. An attacker can achieve this by flooding the
server with SIP messages and depleting the network and server
resources, such as CPU, memory, and bandwidth. In a DoS attack, only one machine is involved in mounting the attack on the SIP server. If the attacks are simultaneously performed by many, possibly coordinated, machines, the attack becomes a DDoS attack. The botnet attack, where the attack is staged by many zombie machines controlled by a master node, is a well-known instance of DDoS.
There is a large variety of possible DDoS attacks, such as the Domain Name Server (DNS) attack and the fuzzing attack [9,10]. The DNS flooding attack wastes bandwidth resources by injecting fake addresses, tying up the call during address resolution, and causing unnecessary messaging traffic between the DNS and SIP servers. The fuzzing attack, on the other hand, wastes CPU time by forcing it to parse invalid SIP messages. DDoS attacks in SIP networks can be grouped into four classes: SIP message payload tampering, SIP message flow tampering, SIP message flooding, and finally exploiting SIP vulnerabilities, e.g., for toll fraud [11].

Many methods have been proposed to detect and prevent DDoS attacks in VoIP networks. For example, for the SIP message flooding varieties, an extended finite state machine (EFSM) can be designed for SIP transactions in order to monitor transaction anomalies [12]. Selected network traffic variables are tracked, and if an undefined transaction occurs or any traffic variable count exceeds a pre-determined threshold, a preventive action is triggered. A full protocol stack intrusion detection and prevention system for VoIP systems is proposed in [13]. This is a table-based system that collects and correlates data from different protocols on the communication stack, e.g., MAC addresses, IP addresses, subscriber IDs, and packet timestamps, into tuples. The decisions, such as dropping packets, are given by certain rules applied over these tuples.

In [14], the packets are labeled with respect to their transmission control protocol (TCP) flags. An alarm is raised if the packet counts in a time window deviate from the distribution fitted for the normal traffic. In another study, a naive Bayesian classifier has been constructed as a DDoS detector based on network traffic variables. In [7,15], a Bayesian change point model that detects traffic surges or dips, which possibly correspond to DDoS attacks, is proposed. The model is a hierarchical hidden Markov model that links the features extracted from SIP network traffic and server load to latent variables. One set of these variables tracks the hidden dynamics of the system and the others serve as change point indicators. The output of the model is the posterior probability of a change indication, which is calculated at fixed time intervals.

As for the SIP message payload tampering variety, an N-gram technique has been considered to detect the fuzzing attacks exploiting malformed SIP messages. In this case, based on a corpus of SIP messages, which contains both valid and malformed messages, 4-grams, i.e., sequential 4-byte blocks in SIP messages, are extracted. The 4-grams which exceed a given frequency threshold are designated as significant features and their occurrence count vectors are used as features to train classifiers [16]. An experimental study of applying 5 different machine learning models to detect DDoS
attacks in SIP-deployed networks has been conducted [17]. The authors have implemented a simulation environment in order to train and evaluate the performance of the models. The classifiers are trained with pre-generated training data collected from SIP message headers, which contain both attacks and normal traffic. The models are required to be re-trained whenever the network or service operating conditions change. The trained classifiers are evaluated in terms of accuracy and the time overhead required to run them on-line for each message. A recent study proposes using an autoregressive integrated moving average (ARIMA) time series model to classify normal traffic, DoS and DDoS attacks in IP networks [18]. The number of packets and the number of IP sources are tracked for each time unit and their ratios are stored. The local Lyapunov exponents are calculated for these ratios, and these values are compared with a threshold to discriminate malicious from non-malicious traffic.
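The 4-gram feature extraction described for [16] above can be illustrated in a few lines. A minimal sketch, not the authors' implementation; the toy corpus and the frequency threshold are illustrative placeholders:

```python
from collections import Counter

def four_grams(message: bytes) -> Counter:
    """Count the sequential 4-byte blocks (4-grams) of a SIP message."""
    return Counter(message[i:i + 4] for i in range(len(message) - 3))

def significant_grams(corpus, min_freq):
    """4-grams whose corpus-wide frequency exceeds min_freq become features."""
    total = Counter()
    for msg in corpus:
        total.update(four_grams(msg))
    return sorted(g for g, cnt in total.items() if cnt > min_freq)

def feature_vector(message: bytes, vocab):
    """Occurrence counts of the significant 4-grams for one message."""
    counts = four_grams(message)
    return [counts.get(g, 0) for g in vocab]

# Toy corpus; a real corpus would mix valid and malformed SIP messages.
corpus = [b"INVITE sip:[email protected] SIP/2.0", b"INVITE sip:[email protected] SIP/2.0"]
vocab = significant_grams(corpus, min_freq=1)
x = feature_vector(corpus[0], vocab)
```

The resulting count vectors serve as inputs to any off-the-shelf classifier, as in [16].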
A statistical anomaly detection model, to which our method has resemblances, was proposed in [19,20]. This method detects significant deviations in 3G mobile network traffic patterns based on a variant of the Kullback–Leibler (KL) divergence between two empirical distributions. The data samples collected for each observed feature within a time window are fitted into respective univariate histograms. Then, these empirical distributions are compared with reference distributions of the observed features based on the proposed divergence metric. If the distance of any of the inspected feature distributions to that of its corresponding reference exceeds an empirically set threshold, then an alarm is raised to declare a detected anomaly. A human expert gives the final decision on whether the detected anomaly is an attack or not.
The spread of intelligent mobile devices has resulted in a new facet of mobile botnets. The distributed characteristics of the mobile network (the capability to change IP addresses frequently) and the huge number of zombie devices easily hacked by malware make it hard to prevent DDoS attacks with conventional PC-centric solutions. Besides using the Internet for command propagation, the bot master can coordinate the zombies in some exceptional ways, such as Bluetooth communication or SMS/MMS messaging. Three different command and control architectures (coordination of zombies by the master) to start a mobile botnet DDoS attack are discussed in [21]. A recent study uses machine learning techniques to discriminate applications that are malware used in mobile botnets. The manifest files of Android Application Packages (APK) are processed to extract features. After some pre-processing steps, the selected features are used in training classifiers to detect the malware [22].

A detailed survey on the historical evolution of botnets is provided in [23]. A detailed review of network intrusion systems which are capable of detecting DDoS attacks, and of the specific methods used for detection, can be found in [24].
Analysis of time series for classification, prediction, change and outlier detection has been an active research topic for decades, with particular focus on financial markets [25]. Among the plethora of methods proposed one can mention: (i) methods that map the time series into a new feature space, such as spectral entropy, autocorrelation etc. [26]; (ii) kernel methods for time-series classification with emphasis on sequence alignment [27–29]; (iii) clustering time series with a combined distance function satisfying the triangle similarity, which is the cosine value between two vectors, and the dynamic time warping distance [30]; (iv) approaches fitting the data to a number of possible models, such as a hidden Markov model with dynamic time warping, or an autoregressive moving average model with dynamic time warping, and clustering the data based on the model instance with the best fit [31,32]; (v) singular spectral analysis, where the data is embedded, and the embedding matrix is decomposed and reconstructed into trend, noise and oscillatory components.

Metrics, which are functions to calculate distances between two entities in a set, can be used to detect anomalies in the network traffic, and in [33] two such information metrics have been proposed for DDoS attacks. Similarly, a DDoS detector which uses the Tsallis entropy has been proposed [34]. The Mahalanobis distance, based on the inverse covariance matrix, has been previously used in the detection of abnormal callers (outliers) by inspecting their SIP message flows [35]. In this study, however, we use an adaptively on-line trained variety of the Mahalanobis distance for a time series. We use the time series of Mahalanobis distances accompanying the input time series to detect DDoS attacks as well as to identify the malicious users from their messaging behavior.

One of the first IDS architectures that uses behavioral analysis to detect DDoS attacks and the malicious attackers was proposed in [36]. The attacking entities aiming for a distributed DoS attack are characterized by a common messaging pattern. This, however, cannot be represented by a rule-based system. The proposed system consists of three components: a sniffer to capture the packets, a preprocessor to extract informative features from the packets, and a classifier to detect the anomalies in the traffic.
3. Mathematical notation

We first introduce the notation specific to the communication control (e.g., SIP) messaging. Time is discrete, represented by the instants t = iΔ at which user behavior data is collected and then processed to output a feature vector. Δ is an observation interval, e.g., 1 s long, within which user messaging activities are monitored. A messaging activity observed at the server side is the arrival of one of the SIP messages (invite, bye, 200, etc.) from a user or the transmission of such a message to a user.

At the end of this interval, the r-th user's activity is denoted by the d-dimensional vector v_r, where d is the number of different SIP request or response message types taken into consideration. The vector v_r is an integer vector whose components correspond to the number of times each one of the d message types has occurred within the i-th time frame ((i − 1)Δ < t < iΔ). Not all users are active in each observation interval. An active user, for example the r-th one, is a registered user that has sent and/or received at least one SIP message within the given observation interval, and it is indicated by u_r, r = 1, ..., |U|, where |U| is the cardinality of this set.

Next, let us look into the details of the users' count vectors. A count vector results from the sum of the individual messaging activities of an active user. The r-th active user is assumed to run P_r > 0 messaging activities within the observation interval. Each messaging activity is represented by v_r^p, p = 1, ..., P_r, which is a unit vector with one component being 1 and the rest 0. Let us call this a message indicator vector, because it indicates which one of the d message types has occurred. Then v_r = Σ_{p=1}^{P_r} v_r^p; v_r is simply the count vector of messages sent by the r-th user, as shown in Fig. 1.

Finally, let us introduce the d-dimensional count vector x, called the state vector, that represents the collective activities of all |U| active users within a time frame. The state vector, which is the total message count vector from all users at the server side, is simply the sum of the active user count vectors, x = Σ_{r=1}^{|U|} v_r, and this is illustrated in Fig. 2.
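The construction of the indicator, count and state vectors above can be sketched as follows; the particular set of tracked SIP message types is a placeholder assumption:

```python
import numpy as np

# Hypothetical set of d SIP message types tracked at the server side.
MESSAGE_TYPES = ["invite", "bye", "200", "180", "ack"]
TYPE_INDEX = {m: i for i, m in enumerate(MESSAGE_TYPES)}
d = len(MESSAGE_TYPES)

def indicator(msg_type: str) -> np.ndarray:
    """Unit message indicator vector v_r^p: one component 1, the rest 0."""
    v = np.zeros(d, dtype=int)
    v[TYPE_INDEX[msg_type]] = 1
    return v

def user_count_vector(messages) -> np.ndarray:
    """v_r = sum_p v_r^p: per-user message counts over one observation interval."""
    return sum((indicator(m) for m in messages), start=np.zeros(d, dtype=int))

def state_vector(activity: dict) -> np.ndarray:
    """x = sum_r v_r: total message counts over all |U| active users."""
    return sum((user_count_vector(msgs) for msgs in activity.values()),
               start=np.zeros(d, dtype=int))

# Two active users in one interval: a completed call setup and a teardown.
activity = {"alice": ["invite", "180", "200", "ack"], "bob": ["invite", "bye"]}
x = state_vector(activity)
```

Here x counts two invites and one each of bye, 200, 180 and ack, exactly the summation depicted in Figs. 1 and 2.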
We have so far omitted any specific index to denote the time frames to avoid notational clutter. However, we will use the notation x_i, x_j ∈ ℝ^d to denote the server state vectors at the i-th and j-th observation intervals. These feature vectors, or server state vectors, can be used to monitor traffic regime changes in a network.

Let M be a d × d positive (semi-)definite matrix (M ∈ S_+ or M ∈ S_++). D_M(x_i, x_j) is the distance between the feature vectors x_i and x_j calculated over the metric matrix M. f(M | x_n : x_{n−k−1}) is a function of M defined over the time window of length k tracked between
Fig. 1. The r-th user count vector results from the accumulation of message indicator vectors (v_r = Σ_{p=1}^{P_r} v_r^p) in an observation interval.

Fig. 2. The server state vector is the sum of the user count vectors (x = Σ_{r=1}^{|U|} v_r).

Fig. 3. The time-stamped user message vector is the concatenation of the user unit message vector and the time it is sent (w_r^p ∈ ℝ^{d+1}).
feature vectors from x_{n−k−1} to x_n, that is, from time index n − k − 1 to time index n. D_ld(A, B) is a function defined over any two matrices, A and B, of the same dimensions.

Notice that up to this point we have neglected the time-stamp information, that is, the actual time instances t_r^1, ..., t_r^{P_r} within a generic Δ-long time frame at which the P_r messaging activities, say, of the r-th user, are occurring. We can incorporate this information by augmenting the dimensionality of the message indicator vector v_r^p by one, as follows: (w_r^p)^T = ((v_r^p)^T, t_r^p). Thus, w_r^p is the timestamp-enriched version of the message indicator vector v_r^p. Notice that w_r^p ∈ ℝ^{d+1} consists of the concatenation of the message indicator vector v_r^p and the time instance at which the message occurs, t_r^p, as given in Fig. 3.

Given the definitions above, any user u_r can be mapped to a time series, which can be represented as one of these two matrices: V_r = [v_r^1 | v_r^2 | ... | v_r^{P_r}] or W_r = [w_r^1 | w_r^2 | ... | w_r^{P_r}]. The kernel function that measures the similarity of any user pair (u_q, u_r) is represented as K(u_q, u_r). κ(w_r^{p_r}, w_q^{p_q}) is defined as the heat kernel that calculates the similarity between the time-stamped message vectors of any two users in the same interval: the p_r-th message of the r-th user and the p_q-th message of the q-th user. Using the user pair kernel functions, for that time interval, we can calculate the kernel matrix K of size |U| × |U|.
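The kernel computation can be sketched as follows. A minimal sketch only: the heat-kernel bandwidth σ and the mean aggregation of per-message scores into K(u_q, u_r) are illustrative assumptions, not the paper's exact sequence-alignment kernel:

```python
import numpy as np

def heat_kernel(w1, w2, sigma=1.0):
    """Heat (Gaussian) kernel between two time-stamped message vectors in R^{d+1}."""
    diff = np.asarray(w1, dtype=float) - np.asarray(w2, dtype=float)
    return float(np.exp(-diff @ diff / (2.0 * sigma ** 2)))

def user_pair_kernel(Wq, Wr, sigma=1.0):
    """K(u_q, u_r): assumed here to be the mean pairwise heat-kernel score
    between the two users' time-stamped messages within the interval."""
    return float(np.mean([heat_kernel(wq, wr, sigma) for wq in Wq for wr in Wr]))

def kernel_matrix(users, sigma=1.0):
    """|U| x |U| kernel matrix K with K[q, r] = K(u_q, u_r)."""
    n = len(users)
    K = np.empty((n, n))
    for q in range(n):
        for r in range(n):
            K[q, r] = user_pair_kernel(users[q], users[r], sigma)
    return K

# Three users, one (d+1)-dimensional time-stamped message each (d = 3).
users = [[[1, 0, 0, 0.5]], [[1, 0, 0, 0.5]], [[0, 1, 0, 9.0]]]
K = kernel_matrix(users)
```

By construction K is symmetric, and users with identical messaging behavior score 1 while dissimilar behaviors score near 0.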
4. Adaptive distance-based change point detection estimator

Feature instances extracted from adjacent intervals within the correlation length of a stationary process tend to have high statistical similarity. On the other hand, features originating from different generative processes, or from different sections of a non-stationary process, can be expected to have large pairwise distances. Based on this premise, a significant change in the distances between consecutive feature vectors in a time series can be interpreted as an indicator of a change in the data-generating process. The Hidden Markov Model (HMM) can capture these regime changes as a switching variable from one generator to another in the hidden layer. In the context of communication networks, such an abrupt change in the feature vectors corresponding to traffic intensity patterns and/or server resource utilization rates can be conjectured to signal a DDoS attack. A Distance-based Change Point Method (DCPM), as used in our work, first tracks the distances between sequential feature vectors and then computes the statistics of these distances to decide whether a change has occurred. A judicious choice of distance function can prove critical to the performance of machine learning algorithms. To this effect, one can use one of the well-known distance functions or attempt to learn a distance function specific to the problem at hand. In this work, we have opted to use a learning scheme for the Mahalanobis distance.
4.1. Mahalanobis distance

The Mahalanobis distance D_M between x_i, x_j ∈ ℝ^d can be calculated as in Eq. (1). The Mahalanobis distance is defined over symmetric positive semi-definite (PSD) d × d matrices (M ∈ S_+), and the choice of M can be made to account for the correlations between features and the differences between scales. The inverse of a full-rank sample covariance matrix, Σ, gives rise to a special case of the Mahalanobis distance (M ∈ S_++), which assumes the data is generated from a multivariate Gaussian distribution. Under this choice and the Gaussian assumption, it can be shown that M maps the data to uncorrelated, unit-variance Gaussian variables. Conversely, if the features follow a standard Gaussian distribution with uncorrelated components, then we have M = Σ = I.

D_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j)    (1)

Any symmetric positive semi-definite matrix can be factorized as M = A^T A such that A is an e × d projection matrix and e ≤ d. Thus, the relation below can be obtained:

D_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j)
             = (x_i − x_j)^T A^T A (x_i − x_j)
             = (A(x_i − x_j))^T A(x_i − x_j)
             = (Ax_i − Ax_j)^T (Ax_i − Ax_j)
             = ‖a_i − a_j‖_2^2 = D_E(a_i, a_j)
             = D_A(x_i, x_j)    (2)

where a_i = Ax_i is the projected vector and D_E is the Euclidean distance. Eq. (2) shows that the Mahalanobis distance in the feature space is equivalent to the Euclidean distance in a projected space.
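The equivalence in Eq. (2) can be checked numerically. A minimal sketch with a random projection matrix A standing in for a learned metric:

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance D_M(x_i, x_j) of Eq. (1)."""
    delta = xi - xj
    return float(delta @ M @ delta)

rng = np.random.default_rng(0)
d, e = 4, 3
A = rng.standard_normal((e, d))        # e x d projection matrix, e <= d
M = A.T @ A                            # any PSD metric factorizes as A^T A
xi, xj = rng.standard_normal(d), rng.standard_normal(d)

lhs = mahalanobis_sq(xi, xj, M)                # Mahalanobis distance under M
rhs = float(np.sum((A @ xi - A @ xj) ** 2))    # Euclidean distance after projection
# Eq. (2): lhs and rhs agree up to floating-point error.
```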
4.2. Distance-based change point model

The distance-based change detection is achieved by inspecting a sum of distances over a sliding window, called the moving distance,
Algorithm 1 Adaptive Online Distance-Based Change Point Detection Algorithm.
1: Initialize M_0 (default I).
2: Set k, λ, β and α (for ε_th).
3: repeat
4:   Inspect the SIP traffic in the time window of size k, and compute the count vector.
5:   if f(M_{n−1} | x_n : x_{n−k−1}) > ε_th then
6:     Raise alarm.
7:     Run the malicious user detector defined in Algorithm 4.
8:   end if
9:   Evaluate M*.
10:  Set M_{n−1} = M*.
11: until the flow ends
where the distances between the current feature vector and its immediate predecessors in a time frame of size k are summed. The result of the sliding-window sum is compared with a threshold value, ε_th, and an alarm is raised for the potential occurrence of a regime change. This step is followed by the malicious user discrimination algorithm, as detailed in Section 5. The main novelty of this method is that we learn the weight matrix M (called the Mahalanobis metric from now on) under a loss function, so that the detection algorithm adapts to inlier variations and trends in the traffic intensity to avoid false alarms. The inlier variations can be due to diurnal or week-day based changes or to short-lived sporadic flurries of call activity.

The moving distance over a k-sized time frame can be defined as a function of the symmetric positive definite matrix M ∈ S_++ as follows:

f(M | x_n : x_{n−k−1}) = Σ_{j=n−k−1}^{n−1} (x_n − x_j)^T M (x_n − x_j)    (3)
If the moving distance computed using the current Mahalanobis metric is above the threshold, f(M_{n−1} | x_n : x_{n−k−1}) > ε_th, then an alarm is raised. The Mahalanobis metric is updated periodically at each time interval under the loss function given below:

min_{M ∈ S_++}  f(M | x_n : x_{n−k−1}) + λ D_ld(M, M_{n−1}) + β D_ld(M, I)    (4)

In Eq. (4), the second and third terms, λ D_ld(M, M_{n−1}) and β D_ld(M, I), respectively, are regularization functions based on the logarithmic determinant (LogDet) divergence [37]. The LogDet function is a pseudo-metric that measures the distance between two matrices and is defined in Eq. (5). Detailed information about the LogDet function is given in the Appendix. The former regularizer forces the updated matrix to be as similar as possible to its predecessor. The latter forces it to be as close as possible to the identity matrix, to prevent it from converging to an irrelevant matrix and at the same time to induce sparsity. Thus, their relative weights can be gauged to trade off the update rate of the Mahalanobis metric against the aging of the effect of past measurements. The four parameters to be set are the sliding window size k (the time frame size), the two regularization cost weights, λ and β, and the thresholding parameter α. At the start of the algorithm, M_0 is initialized as the identity matrix, M_0 = I. Since the LogDet is a convex function of M, we are guaranteed to find the optimal positive definite matrix that minimizes the criterion in Eq. (4).

D_ld(M, M_{t−1}) = tr(M M_{t−1}^{−1}) − log det(M M_{t−1}^{−1}) − d    (5)
where tr(•) is the trace function for matrices.
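Eq. (5) translates directly to code; a minimal sketch:

```python
import numpy as np

def logdet_div(M, M_prev):
    """LogDet divergence of Eq. (5):
    D_ld(M, M_prev) = tr(M M_prev^{-1}) - log det(M M_prev^{-1}) - d."""
    d = M.shape[0]
    P = M @ np.linalg.inv(M_prev)
    _, logdet = np.linalg.slogdet(P)   # numerically stable log-determinant
    return float(np.trace(P) - logdet - d)
```

As expected for a (pseudo-)metric, D_ld(M, M) = 0 and the divergence is non-negative for positive definite arguments, which is what makes it usable as a regularizer in Eq. (4).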
The optimal Mahalanobis metric M* can be found by taking the derivative of Eq. (4) and setting it to zero:

M* = ( λ/(λ + β) M_{n−1}^{−1} + β/(λ + β) I + 1/(λ + β) Σ_{j=n−k−1}^{n−1} (x_n − x_j)(x_n − x_j)^T )^{−1}    (6)

This Mahalanobis metric update is repeated at each time index. The change detection algorithm is given in Algorithm 1.
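The loop of Algorithm 1 can be sketched as follows, combining the moving distance of Eq. (3) and the closed-form update of Eq. (6). For simplicity the sketch uses the constant threshold discussed in Section 4.3 rather than the χ² quantile, and the values of k, λ, β and c are placeholders:

```python
import numpy as np
from collections import deque

def moving_distance(M, x_n, window):
    """f(M | x_n : x_{n-k-1}): sum of Mahalanobis distances from x_n to the
    k previous state vectors in the sliding window (Eq. (3))."""
    return sum(float((x_n - x_j) @ M @ (x_n - x_j)) for x_j in window)

def metric_update(M_prev, x_n, window, lam, beta):
    """Closed-form minimizer of the regularized loss, Eq. (6)."""
    S = sum(np.outer(x_n - x_j, x_n - x_j) for x_j in window)
    d = len(x_n)
    return np.linalg.inv((lam * np.linalg.inv(M_prev) + beta * np.eye(d) + S)
                         / (lam + beta))

def monitor(state_vectors, k=10, lam=1.0, beta=0.1, c=1.0):
    """Adaptive online change-point loop of Algorithm 1 (sketch).
    Yields the time indices at which an alarm is raised."""
    d = len(state_vectors[0])
    M = np.eye(d)                        # M_0 = I
    eps_th = c * k * (d / 2.0) ** 2      # constant threshold, Section 4.3
    window = deque(maxlen=k)
    for n, x_n in enumerate(state_vectors):
        if len(window) == k:
            if moving_distance(M, x_n, window) > eps_th:
                yield n                  # alarm: possible regime change / DDoS
            M = metric_update(M, x_n, window, lam, beta)
        window.append(x_n)
```

On a synthetic stream whose state vectors jump abruptly, the first post-jump index triggers an alarm, after which the updated metric absorbs the new regime and the alarms stop.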
.3. Thresholding of the moving distances
The characteristics of the moving average of distances depend
n the traffic volume intensity, the dimension of the feature vec-
or, the size of the time frame etc., and hence it becomes critical
o set a threshold value judiciously to detect regime anomalies or
brupt changes. In this study we test comparatively two different
hreshold functions.
Experimental evidence has shown that we can approximate the
istribution of the moving sum of distances as a Chi-squared distri-
ution. It is then assumed that Mahalanobis distances are obtained
rom a Gaussian distribution such that μ = x n in the immediate
ast observation interval, and � = M
−1 . If y , which is the set of
bservations in the current sliding window, is a d -dimensional ran-
om vector drawn from a Gaussian distribution with a mean vec-
or μ and a d -rank covariance matrix �, then z = (y − x n ) � M (y −
n ) = (y − μ) � �−1 (y − μ) becomes Chi-Squared distributed with
-degrees of freedom.
Let z_i denote one of k independent, identically distributed random variables that follow a chi-square distribution, z_1 ∼ χ²_{d_1}, z_2 ∼ χ²_{d_2}, ..., z_k ∼ χ²_{d_k}. Due to the additive property of independent chi-squared variables, the sum of these random variables follows a chi-square distribution with d_1 + d_2 + ··· + d_k degrees of freedom. That is,

Z = z_1 + z_2 + ··· + z_k ∼ χ²_{d_1 + d_2 + ··· + d_k} (7)
Thus, the threshold of our anomaly detection model becomes ε_th = χ²_{α, k·d}, the upper α critical value of the chi-square distribution with k·d degrees of freedom. The α parameter is the probability of accepting a chance fluctuation as an anomaly. In other words, in the absence of an attack, the moving average of distances score, denoted by Z above, has a probability less than α of exceeding the threshold ε_th. The converse event of Z exceeding the threshold can be accepted as an anomaly with probability 1 − α. The value of α depends on the requirements of the system and is typically set by a human expert to some value such as α ∈ {0.1, 0.05, 0.02, 0.01}. This is a statistical approach based on the sum of observed distances.
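The χ² critical value above is available directly from SciPy; the sketch below assumes the standard `scipy.stats.chi2.ppf` inverse-CDF call (function name `chi2_threshold` is ours):

```python
from scipy.stats import chi2

def chi2_threshold(k, d, alpha=0.02):
    """Upper-tail critical value eps_th = chi^2_{alpha, k*d}: in the absence
    of an attack, P(Z > eps_th) = alpha for Z ~ chi-square with k*d dof."""
    return chi2.ppf(1.0 - alpha, df=k * d)
```

For the paper's typical setting (k = 5, d = 28), the threshold lies above the distribution mean k·d = 140, and shrinking α pushes it higher, i.e., the detector becomes more conservative.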
An alternate, empirically found constant threshold, which is a function of two system parameters,

ε_th = c · k · (d / 2)² (8)

is found to work equally well. This fixed threshold value only depends on the time frame size k, the number of dimensions d, and a constant c. As a plausible argument for the fact that the constant thresholding function works equally well, we observe that the same parameters (k and d) are also inherent in the χ² thresholding. Notice also that there is some liberty in adjusting this threshold by setting the constant c according to the requirements of the deployed system. A case in point could be to make the constant indexed by time periods, c_n, e.g., to account for seasonal trends.
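Under our reading of Eq. (8) (the grouping of the extracted formula is an assumption on our part), the constant threshold is a one-liner:

```python
def constant_threshold(k, d, c=1.0):
    """Empirical constant threshold of Eq. (8): eps_th = c * k * (d / 2)**2.

    k : time frame size, d : feature dimension, c : tunable constant.
    """
    return c * k * (d / 2) ** 2
```

For k = 5, d = 28 and c = 1 this gives 5 · 14² = 980, the same order of magnitude as the χ² critical value for k·d = 140 degrees of freedom, consistent with the observation that the two thresholds behave similarly.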
More importantly, even though the threshold is set to a constant, the system is still an adaptive model due to the adaptation inherent in the updates of the Mahalanobis metric. At each observation interval, the Mahalanobis metric is updated to accommodate the new distances between the observations. Therefore, whenever the threshold is exceeded, it means that the quadratic smoother could not smooth out the new measurement digressions, and therefore it is very likely to be an anomaly.

Fig. 4. All possible alignments of W_q = [w_q^1 | w_q^2 | w_q^3] and W_r = [w_r^1 | w_r^2].
5. Malicious user discrimination

If a detected anomaly is in fact a DDoS attack, the next task is to identify the set of malicious users that are presumably coordinating to mount a distributed attack. For this analysis, each subscriber's behavior history in the observation interval is represented as a time series, as given in Fig. 1. We process the time series using a similarity function so that subscribers with similar behavior patterns are clustered into the same group. We have proposed and evaluated two different attacker discrimination methods. The first is based on a global time series alignment kernel that makes use of both epoch differences and feature distances between message sequences. The second uses the user message count vectors at the end of periodic observation intervals, i.e., the information on message time instants is ignored. The pairwise similarity of any two users is calculated from their count vectors.
5.1. Sequence alignment kernel

We consider the ensemble of timestamped messages sent by a user within a time frame of k units, say (n − k − 1), ..., (n − 1), as message sequences. Each user's sequence can have a different number of messaging events, each event occurring at a different time instant. In other words, a user's message sequence or time series corresponds to the ensemble of messages sent by a registered terminal within the designated observation interval, each event being characterized by the type of SIP message and its timestamp. Our goal is to estimate the similarity of the messaging activities of the users via a kernel-based scheme. For this purpose, the message sequences must be aligned without pair repetition. The similarity between two sequences of possibly different lengths, i.e., numbers of messaging events, can be determined as the sum of the similarities of all their feasible alignments. Thus two sequences are more similar as a pair if their messaging types, e.g., invite or bye, and their occurrences in time resemble each other.
Let us assume the user time series, i.e., timestamped message sequences, (W_q, W_r) of the user pair (u_q, u_r), with W_q = [w_q^1 | w_q^2 | w_q^3] and W_r = [w_r^1 | w_r^2] having three and two messaging events, respectively. Fig. 4 shows an example of all possible alignments for these two sequences. In this specific example, there are 5 possible alignments, as follows:

• (w_q^1, w_r^1), (w_q^1, w_r^2), (w_q^2, w_r^2), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^2), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^2, w_r^2), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^3, w_r^2)
• (w_q^1, w_r^1), (w_q^2, w_r^1), (w_q^3, w_r^1), (w_q^3, w_r^2)
A global alignment kernel has been proposed in [38], which uses dynamic programming to compute the similarity of all possible alignments of two sequences. We use a variation of this algorithm, where we employ a pairwise heat kernel that is based on the Mahalanobis distance and differences of time stamps.
5.1.1. Global sequence alignment kernel

Given the two message sequences W_q = [w_q^1 | w_q^2 | ... | w_q^{P_q}] and W_r = [w_r^1 | w_r^2 | ... | w_r^{P_r}] for the user pair (u_q, u_r) in a state space W, we set the doubly-indexed series T_{p_q, p_r} as T_{p_q, 0} = 0 for p_q = 1, ..., P_q, T_{0, p_r} = 0 for p_r = 1, ..., P_r, and T_{0, 0} = 1. We also assume that there is a function κ(w_q^{p_q}, w_r^{p_r}) to measure the similarity between the p_q-th signaling event of user u_q and the p_r-th signaling event of the other user u_r. Computing the terms recursively for (p_q, p_r) ∈ {1, ..., P_q} × {1, ..., P_r}, one has:

T_{p_q, p_r} = (T_{p_q, p_r−1} + T_{p_q−1, p_r−1} + T_{p_q−1, p_r}) κ(w_q^{p_q}, w_r^{p_r}) (9)

Finally, the unnormalized similarity between two users (u_q, u_r) is measured when the recursion has considered all possible alignments, that is:

K_unnormed(u_q, u_r) = T_{P_q, P_r} (10)
Once the kernel matrix for all user pairs has been obtained, we unit-diagonal normalize the |U| × |U| kernel matrix, where |U| is the number of active users in the system, in order to eliminate any scaling issues:

K(u_q, u_r) = K_unnormed(u_q, u_r) / ( √K_unnormed(u_q, u_q) · √K_unnormed(u_r, u_r) ), q, r = 1, ..., |U|

K(u_q, u_r) ∈ [0, 1] (11)

We will call this kernel the time series kernel.
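The recursion of Eqs. (9)-(11) can be sketched as follows (names are ours; with κ ≡ 1 the recursion simply counts the feasible alignments, e.g., 5 for the (3, 2)-event example above):

```python
import numpy as np

def alignment_kernel(seq_q, seq_r, kappa):
    """Unnormalized global alignment similarity, Eqs. (9)-(10).

    seq_q, seq_r : lists of events; kappa(a, b) is the pairwise event kernel.
    """
    Pq, Pr = len(seq_q), len(seq_r)
    T = np.zeros((Pq + 1, Pr + 1))
    T[0, 0] = 1.0                       # boundary condition T_{0,0} = 1
    for pq in range(1, Pq + 1):
        for pr in range(1, Pr + 1):
            T[pq, pr] = (T[pq, pr - 1] + T[pq - 1, pr - 1] + T[pq - 1, pr]) \
                        * kappa(seq_q[pq - 1], seq_r[pr - 1])
    return T[Pq, Pr]

def normalized_kernel(seq_q, seq_r, kappa):
    """Unit-diagonal normalization of Eq. (11)."""
    kqr = alignment_kernel(seq_q, seq_r, kappa)
    kqq = alignment_kernel(seq_q, seq_q, kappa)
    krr = alignment_kernel(seq_r, seq_r, kappa)
    return kqr / (np.sqrt(kqq) * np.sqrt(krr))
```

The dynamic program costs O(P_q · P_r) kernel evaluations per user pair, which is what makes summing over all exponentially many alignments tractable.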
5.1.2. Pairwise heat kernel

Each user in a time window can be represented in terms of her ordered timestamped message sequence. Recall that user sequences can have differing lengths and can consist of different types of messages.

A kernel function (pairwise heat function) for any two timestamped vectors, (w_q^{p_q})^T = ((v_q^{p_q})^T, t_q^{p_q}) and (w_r^{p_r})^T = ((v_r^{p_r})^T, t_r^{p_r}), is evaluated as:

κ(w_q^{p_q}, w_r^{p_r}) = exp(−γ D_M(v_q^{p_q}, v_r^{p_r}) − ρ |t_q^{p_q} − t_r^{p_r}|) (12)

D_M(v_q^{p_q}, v_r^{p_r}) = (v_q^{p_q} − v_r^{p_r})^T M (v_q^{p_q} − v_r^{p_r})

where M is the Mahalanobis metric evaluated at that observation interval as in Eq. (6). Note that κ(w_q^{p_q}, w_r^{p_r}) = 1 iff v_q^{p_q} = v_r^{p_r} and t_q^{p_q} = t_r^{p_r}. The coefficients γ and ρ determine the weights of the message type distance and the timing distance, respectively. In this study we have assumed γ = ρ = 1.
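Eq. (12) can be sketched directly; the event representation as a (count vector, timestamp) pair below is our modeling convention:

```python
import numpy as np

def heat_kernel(w_q, w_r, M, gamma=1.0, rho=1.0):
    """Pairwise heat kernel of Eq. (12); each event w = (v, t) is a
    message-type vector v together with its timestamp t."""
    v_q, t_q = w_q
    v_r, t_r = w_r
    diff = v_q - v_r
    d_m = diff @ M @ diff                # squared Mahalanobis distance D_M
    return np.exp(-gamma * d_m - rho * abs(t_q - t_r))
```

The kernel equals 1 only for events with identical type vectors and timestamps, and decays exponentially as either the type distance or the time gap grows.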
5.2. User distance kernel

A kernel matrix of pairwise user-to-user similarities can be created based on their Mahalanobis distances. User pairs have high similarity (close to 1) if their Mahalanobis distance is close to 0; conversely, if the pair similarity is small (close to 0), then their distance is large. The Mahalanobis distance kernel can be regarded as a variant of the Gaussian kernel.

Any two users, u_q and u_r, can be compared based on their messaging count vectors v_q, v_r ∈ ℝ^d, as follows:

K(u_q, u_r) = exp(−(v_q − v_r)^T M (v_q − v_r)) (13)
We will call this kernel simply the distance kernel. K(u_q, u_r) = 1 iff v_q = v_r. Note that this feature vector does not take into account the occurrence timing of the messages, but averages the messaging traffic in that interval. We would like to point out again the difference between the two ways of measuring user behavior differences. In Eq. (13), we consider the messaging events integrated over the observation interval, represented by the d-dimensional count vector of messaging events according to their types. In Eqs. (11) and (12), we calculate the difference of user behaviors by comparing and measuring distances, messaging event by messaging event, as they occur during the observation interval.
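The distance kernel of Eq. (13) is a one-line Gaussian-type kernel over count vectors (function name is ours):

```python
import numpy as np

def distance_kernel(v_q, v_r, M):
    """User distance kernel of Eq. (13) over per-interval message count vectors."""
    diff = v_q - v_r
    return np.exp(-(diff @ M @ diff))
```

Since it ignores timestamps, it is much cheaper than the alignment kernel: one quadratic form per user pair instead of a full dynamic program.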
5.3. Spectral clustering

A matrix of pairwise user-to-user similarities is created from the users' messages as in Eq. (11) or (13). The kernel matrix, K, then corresponds to a fully connected weighted adjacency graph, where the users are the vertices and the similarities are the edge costs. The adjacency matrix is expected to consist of two subgraphs: one representing the malicious users, characterized by similar behavior patterns, and the other representing the non-malicious users, with random-like behavior patterns. In order to partition this graph into these two subgraphs, we have used the normalized Laplacian spectral clustering algorithm. Such algorithms are conceived to find graph partitioning solutions in clustering problems. In the literature there are various spectral clustering algorithms. We have preferred normalized Laplacian spectral clustering because we want not only the similar nodes to be projected close to each other, but also the dissimilar nodes to be projected far from each other. The normalized spectral methods satisfy both of these criteria, as discussed in [39].

The degree of the q-th active user in the kernel matrix, which is the sum of all the weight entries related to the q-th active user at a given time frame, is evaluated as:

dg_q = Σ_{r=1}^{|U|} K_{q,r} (14)

where K_{q,r} = K(u_q, u_r).

The degree matrix D is a diagonal matrix whose diagonal elements contain the degree values dg_1, dg_2, ..., dg_{|U|}. The Laplacian matrix, L, is evaluated as in Eq. (15) and the spectral clustering algorithm is given in Algorithm 2.
Algorithm 2 Normalized Laplacian Spectral Clustering.
1: Given K, evaluate D and L, which are all in ℝ^{|U|×|U|}.
2: Compute the two eigenvectors, ψ_1 and ψ_2, of the two smallest eigenvalues 0 = λ_1 < λ_2 for the generalized eigenproblem LΨ = DΨΛ, where Λ is the diagonal matrix of eigenvalues λ_1, ..., λ_{|U|}.
3: Matricize the ψ_1 and ψ_2 vectors to obtain Ψ ∈ ℝ^{|U|×2}. Use the rows of Ψ as the new feature vectors in the mapped space, y ∈ ℝ². Apply 2-means clustering.
4: Return the cluster label vector C from 2-means clustering.

L = D − K (15)

where K is the |U| × |U| kernel matrix whose entries, K(u_q, u_r), are calculated as in Eq. (11) or (13).
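A simplified two-way version of Algorithm 2 can be sketched as follows. Instead of 2-means on the two eigenvectors, this sketch splits on the sign of the second generalized eigenvector (the Fiedler vector), which coincides with 2-means when the two clusters are well separated; the simplification and the function name are ours:

```python
import numpy as np
from scipy.linalg import eigh

def spectral_bipartition(K):
    """Two-way normalized spectral cut over a kernel (adjacency) matrix K."""
    dg = K.sum(axis=1)                  # degrees, Eq. (14)
    D = np.diag(dg)
    L = D - K                           # Laplacian, Eq. (15)
    # generalized eigenproblem L psi = lambda D psi, eigenvalues ascending
    vals, vecs = eigh(L, D)
    fiedler = vecs[:, 1]                # eigenvector of the 2nd smallest eigenvalue
    return (fiedler > 0).astype(int)    # cluster labels in {0, 1}
```

On a kernel with two strongly intra-connected, weakly inter-connected blocks, the sign pattern of the Fiedler vector recovers the two blocks.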
5.4. Automatic identification of the malicious users cluster

The malicious users are conjectured to be characterized by repetitive and correlated behaviors, while the rest of the users are characterized by uncoordinated and diverse behaviors. Once the two clusters are obtained, the final task is that of distinguishing the attacker set.
For each of the two clusters, we compute the sample covariance matrix of the user message sequence vectors in that cluster. Since the malicious user cluster is assumed to consist of similar messaging behaviors, such message vectors are expected to be more strongly aligned along a few particular axes. In fact, in the extreme case when all messages in the cluster are of the same type, the sample covariance matrix would be the zero matrix. Therefore, we assign the cluster with significantly higher eigenvalue concentration to the malicious users. This algorithm, given in Algorithm 3, is based on the heuristic that malicious users must be somewhat coordinated to mount an attack, and therefore that their data vectors must concentrate along a few eigenvectors. Each cluster is assumed to contain at least two subscribers.

Algorithm 3 Cluster Selection Heuristics.
1: For the given cluster label vector C, determine the two clusters, C_1 and C_2.
2: For each of the two clusters, evaluate the sample covariance matrix of the projected message vectors.
3: if a cluster has a covariance matrix equal to 0 then
4: Return this cluster.
5: else
6: Evaluate the eigenvalues of the cluster covariance matrices.
7: Return the cluster with the highest eigenvalue concentration.
8: end if
Putting all of these steps together, the algorithm to detect the attackers is summarized in Algorithm 4.
Algorithm 4 Attacker Detection.
1: if the Global Sequence Kernel is used then
2: Set the weight parameters γ and ρ of the pairwise heat kernel.
3: Evaluate the kernel matrix K_unnormed such that ∀(u_q, u_r) ∈ U × U, we have K_{q,r} = K(u_q, u_r), where we use the time-stamped message sequences W_q, W_r of the q-th and r-th users in the given time interval, respectively, with the alignment kernel, and U is the set of active users.
4: Unit-diagonal normalize K_unnormed to obtain K.
5: end if
6: if the User Distance Kernel is used then
7: Evaluate the kernel matrix K such that ∀(u_q, u_r) ∈ U × U, we have K_{q,r} = K(u_q, u_r), where we use the total message count vectors v_q, v_r of the q-th and r-th users in the given time interval, respectively, with the distance kernel, and U is the set of active users.
8: end if
9: Apply the normalized Laplacian spectral clustering algorithm over K with # clusters = 2, as defined in Algorithm 2.
10: Use the cluster label vector C returned by the spectral clustering in the cluster selection heuristics as defined in Algorithm 3.
11: Return the selected cluster members as the set of attackers.
6. Experiments

As is often reported in the literature, we have also found that obtaining and getting the permission to use VoIP server datasets proves to be very problematic, mostly due to the privacy concerns of the subscribers and the commercial secrecy concerns of the telecommunication operators. Therefore, we have used simulated data sets to analyze the performance of the change point detection
Fig. 5. SIP network simulation framework.
model, detailed in Section 4, and of the malicious user identification algorithm, given in Section 5. An Asterisk-based PBX software, named Trixbox, is deployed as the SIP server in a virtual machine [40]. To mimic the traffic on a SIP server, we have built a probabilistic SIP network simulation system, which initiates calls between a number of probabilistically chosen users in real time [41,42]. An application that creates the user agents is deployed on another virtual machine. We have used the PJSIP open source library [43] and implemented the agents in the Python language. Lastly, NOVA V-Spy, a vulnerability scanning tool, is installed on a final virtual machine and is used to simulate DDoS attacks targeting the server [44]. An overview of the simulation environment is provided in Fig. 5. The proposed security system runs on the same machine as the SIP server, as represented by the gray box in Fig. 5.
The traffic simulator, based on a probabilistic model, generates real-time SIP messaging traffic among registered subscribers [41]. The probabilistic model is basically a library that initiates all permitted actions of subscribers in generating real-life SIP messaging traffic through a SIP server. Instances of subscriber actions are: the potential callees and callers (the social network), how likely a certain contact is to be called (the phone book), how often to become active (registration frequency to the SIP server), how long to wait before the next call (the call frequency), how likely to make a call (the call probability), how likely to answer an incoming call (the response probability), and how long to talk on the phone (the call duration). The parameters provided to the simulator determine the behavior of the probabilistic model and therefore, statistically, the actions of the subscribers. The environmental parameters of the simulator are the total number of subscribers the SIP server can serve and the number of social groups, where a social group is defined as a subset of subscribers that are more likely to interact with each other as compared to the rest. All subscribers are created as bots on the simulation machine and they all follow the legitimate messaging rules of the protocol. An existing subscriber bot is active as long as its registration on the SIP server has not expired; therefore only active bots can interact with each other.
Data are collected by inspecting each SIP packet that arrives at or is sent by the server. Counts of 14 SIP request and 14 SIP response packet types are recorded periodically for each time unit (assumed to be 1 s in our experiments), and the 28-dimensional vector, made up of packet counts per unit interval, constitutes the input data. The SIP message types, described in detail in RFC 3261 [4], for which we periodically record the counts are as follows:

• Requests: Register, Invite, Subscribe, Notify, Options, Ack, Bye, Cancel, Prack, Publish, Info, Refer, Message, Update
• Responses: 100, 180, 183, 200, 400, 401, 403, 404, 405, 481, 486, 487, 500, 603
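Building the 28-dimensional per-interval count vector can be sketched as follows; the fixed ordering of the dimensions below is our illustrative choice, not necessarily the ordering used in the paper:

```python
from collections import Counter

REQUESTS = ["REGISTER", "INVITE", "SUBSCRIBE", "NOTIFY", "OPTIONS", "ACK",
            "BYE", "CANCEL", "PRACK", "PUBLISH", "INFO", "REFER", "MESSAGE",
            "UPDATE"]
RESPONSES = ["100", "180", "183", "200", "400", "401", "403", "404", "405",
             "481", "486", "487", "500", "603"]
TYPES = REQUESTS + RESPONSES          # fixed ordering of the 28 dimensions

def count_vector(messages):
    """28-dimensional count vector from the SIP message types observed in
    one unit interval (one entry per request or response type)."""
    counts = Counter(messages)
    return [counts.get(t, 0) for t in TYPES]
```

One such vector per second forms the observation stream x_n fed to the change point detector.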
The experimental environment is controlled by two parameters: the intensity of the background traffic, that is, the normal user traffic, and the intensity of the DDoS attack traffic. In our simulation system, at any time there are 200 active registered subscribers. There are 5 levels of preset normal traffic intensity created collectively by the subscriber bots. The normal subscriber bots on the average generate a total of 5, 10, 20, 40 and 80 call attempts among themselves (0.025 to 0.4 messages/bot) in any observation (1 s) interval. We grade these background traffic intensities as levels from 1 to 5. Fig. 6 exhibits these traffic intensities for a simulation setting. Note that the gray tones in the plots are proportional to the message counts, so that the darker a region in the plot, the higher the number of messages of that type observed in that interval. White represents intervals with no messages, and intervals with a count higher than 200 messages are shown in pitch black.
During a DDoS attack, for a given setup, unless explicitly stated otherwise, 10 randomly selected users, that is 5 percent of the subscribers, play the role of attackers. During attacks, the attackers start sending messages more intensely to the SIP server. In the low-level attack setting, each attacker sends on the average 50 messages per unit interval, while in a high-level attack, their rate becomes 100 messages per second. In each run, ten DDoS attack sessions are simulated, consisting of attacks using the five types of messages (Invite, Register, Options, Cancel and Bye), each carried out once with low intensity and once with high intensity. The runs are repeated ten folds, such that in each fold the attacks occur in a different order and a different set of registered subscribers is selected to act as attackers. In Fig. 6 the darker regions correspond to attacks.
The experiments are executed in a 10-fold cross-validation setup. One dataset is used for determining the parameters of the distance change point model, and the remaining nine datasets are run with the estimated parameters. Recall that the distance-based change-point detector has three different parameters; we apply a grid search to find the best parameters (k = {5, 7, 9, 11}, λ = {1.0, 2.0, 4.0}, β = {1.0, 2.0, 4.0}) and an additional fourth one
Fig. 6. Illustration of traffic intensities generated by the simulator.
for χ² thresholding (α = {0.01, 0.02}). The default values are set for the parameters of the time series alignment kernel as γ = 1.0 and ρ = 1.0. For the ARIMA model, we perform an exhaustive search to find its optimal parameters p = {1, 2, 3, 5, 10}, d = {0, 1, 2} and q = {0, 1, 2, 3, 5, 10}.

Since we know the labels (the attack times and the identity of the attackers) in the simulated data, we can evaluate the performance of the proposed system in terms of the F-Measure. In the ideal case the F-measure would be 1, which can be obtained only when there are no falsely accused users (i.e., precision P = 1), all the attackers are identified correctly (i.e., recall R = 1), and all the change points are identified by the change point detector. The precision, recall and F-measure are evaluated as follows, for the case of malicious users:

Precision (P) = (# detected true malicious users) / (# all detected malicious users) (16)

Recall (R) = (# detected true malicious users) / (# all true malicious users) (17)

F-Measure (F) = 2 P R / (P + R) (18)
For change point detection performance, we can replace the arguments of P and R by detected true change points, all detected change points, and all true change points.
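Eqs. (16)-(18) amount to set intersections over detected and true items; a minimal sketch (function name is ours):

```python
def detection_scores(detected, truth):
    """Precision, recall and F-measure of Eqs. (16)-(18) over sets of
    detected and true malicious users (or change points)."""
    detected, truth = set(detected), set(truth)
    tp = len(detected & truth)
    p = tp / len(detected) if detected else 0.0
    r = tp / len(truth) if truth else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f
```

For example, accusing users {1, 2, 3} when the true attackers are {2, 3, 4} gives P = R = F = 2/3.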
We have experimented with the duration of the observation interval. Fig. 7 shows the effect of the length of the observation interval for the normal traffic setting. Not surprisingly, as the sampling interval increases, the messaging counts also go higher. A few words are in order to explain Fig. 7: the abscissa represents real time in seconds, while an observation is taken every T seconds, T = 1, 2, ..., 10. We used in the graphic the same gray tone as used to represent the observed count vector, hence the appearance of stretched bars. Furthermore, the longer the observation interval, the higher the number of messages seen for any type, and consequently the plots have darker areas.
6.1. Comparison with a competitor algorithm

Fig. 8 shows the performance of our algorithm in detecting the onset and offset instants of the DDoS attacks. This figure is illustrative in that, for each simulation traffic setting, the best parameters found via grid search are chosen for the models. The parameters used for the performance comparisons are given in each table. The ordinate lists the 28 types of SIP messages, the abscissa shows the time in seconds, and the levels of gray show the intensity of messages. The red lines indicate the change point instants found by the algorithms. The experiments demonstrate that both proposed methods of thresholding are successful in detecting the onset of attacks, but they may sometimes fail to detect the offset. The possible reason is that an attack causes an abrupt change against the background of normal user traffic; however, after the incoming message intensity subsides at the end of an attack, its aftershock effects linger on at the server side as server response messages. The ARIMA model often fails to raise alarms at the correct instances and it is also affected by short-time small fluctuations in the counts. Therefore, it gives incorrect onset and offset indications.

Attack Onset and Offset Determination: In our evaluations, we consider detecting the start and end instants of the attacks. Therefore, we measure the number of attacks for which the onset and offset are correctly detected, as well as the miss and false alarm probabilities (errors of the first and of the second type). For the ARIMA model, we look at the start and end points of a contiguous period which is detected as an anomaly. Since an attack engenders a different behavior in the network (an anomaly), the anomaly detector should be able to detect the start and the end of an attack. During the comparison, we use the start and end times of contiguous intervals detected as the change points for a fair comparison.

The 10-fold cross-validation performance scores of the distance change-point detector and the ARIMA based DDoS detector are given in Table 1. Both thresholding functions are deployed. The onset and offset times of the attacks are known and they are compared with the change-point times returned by the models. For the
Fig. 7. Illustration of traffic intensities as a function of observation interval. All traffic is generated at level 3.
Table 1
The performances of the change-point detectors for normal traffic for 1 sec-
ond (For constant thresholding k = 5 , λ = 1 , β = 4 , c = 1 , for χ 2 thresholding k =
5 , λ = 2 , β = 2 , α = 0 . 02 , for ARIMA p = 2 , d = 1 , q = 0 ).
Change-point detector Precision Recall F-Score
Constant-thresholding DCPM 0.70 ± 0.04 0.88 ± 0.07 0.79 ± 0.04
χ2 - Thresholding DCPM 0.81 ± 0.07 0.73 ± 0.10 0.77 ± 0.03
ARIMA 0.25 ± 0.11 0.15 ± 0.09 0.25 ± 0.04
ARIMA model, the change points are assumed to correspond to the
time instances where the alarms are raised (onset) and the alarms
are silenced (offset).
To assess the attack detection performance of the DCPM (Distance-based Change Point Method) algorithm vis-à-vis an alternative method, we have run simulation experiments with a method from the literature, an ARIMA-based DDoS detector [18]. The rationale for the choice of this competitor algorithm is that it was the only model we could find in the literature operating in an on-line and unsupervised mode. At this stage we use only one of the thresholding methods, namely χ² thresholding as in Eq. (7), since the two methods yield comparable results. Their comparative performances are given in Table 1. The proposed methods have higher performance scores than ARIMA. The main reason is that the ARIMA detector fails to behave consistently in the attack interval; it gives false onsets and offsets during an ongoing attack. The two DCPM variants give comparably close scores, but it should be noted that the parameters should be set with respect to system characteristics such as tolerance to false alarms or traffic intensity.
6.2. Effect of the observation interval length

Table 2 shows that increasing the observation interval improves the accuracy of the system; in fact the F-score increased by 10 points when the interval is augmented from 1 to 10 s. The obvious reason is that a longer observation interval makes the attack traffic statistics increasingly more distinct from the background. However, this improvement comes at the price of reduced time resolution,
where the onset and offset instances of the attack become proportionally blurred.

6.3. Effect of traffic intensity

Table 3 shows the effect of traffic intensity on the performance of the change point detector. Even though the F-scores of the detector running with the empirical and statistical thresholds are similar, their precision and recall scores differ. The χ² threshold detector has higher precision, resulting in fewer false alarms, but it may miss an attack more frequently. On the other hand, the constant thresholding is more successful in detecting an attack but it results in more false alarms. Not surprisingly, as the background traffic intensity increases, the detection performance decreases. Obviously the fluctuations in the normal traffic confound the attack traffic, which becomes less distinctive. Conversely, when the background traffic is low, the abrupt changes caused by the attacks are easier to detect. The optimal set of parameters should be sought for each traffic intensity.
6.4. Effect of overlapping attack intervals

Fig. 9 illustrates the flexibility of the proposed model. In this instance, the register attack is applied incrementally such that at events spaced in time by 80-90 s, a new set of 10 attackers starts an attack, and at the same time the intensity of their attack is increased by additional steps of 5 messages. For example, a set of 10 attackers starts at the 175th s with 5 register messages per second, resulting in a total of 50 register messages per second; then a different set of 10 attackers starts at the 255th s with 10 register messages per second, resulting in a total of 100 messages per second, etc. The final set of attackers sends 50 register messages per second per attacker. When we set λ = β = 1, for the fixed threshold model, the algorithm is able to detect the start and the end of the attacks when c ≤ 3. For the chi-square thresholding, the algorithm is able to detect the onsets and offsets when α > 0.005.
Fig. 8. The change points and alarms raised by the models. The first five attacks are low-level (50 messages per attacker), while the last five attacks are high-level (100
messages per attacker).
Table 2
The performances of the change-point detectors for normal traffic intensity for different sampling
rates (For constant thresholding k = 5 , λ = 1 , β = 4 , c = 1 , for χ 2 thresholding k = 5 , λ = 2 , β =
2 , α = 0 . 02 ).
Detector Score 1 s 2 s 3 s 5 s 10 s
Constant Precision 0.70 ± 0.04 0.72 ± 0.04 0.73 ± 0.04 0.74 ± 0.02 0.77 ± 0.01
Recall 0.88 ± 0.07 0.92 ± 0.04 0.97 ± 0.02 0.98 ± 0.01 0.99 ± 0.01
F-Score 0.79 ± 0.04 0.81 ± 0.04 0.83 ± 0.02 0.84 ± 0.01 0.87 ± 0.01
χ2 Precision 0.81 ± 0.07 0.83 ± 0.05 0.85 ± 0.04 0.87 ± 0.03 0.92 ± 0.02
Recall 0.73 ± 0.10 0.75 ± 0.04 0.76 ± 0.03 0.80 ± 0.04 0.84 ± 0.02
F-Score 0.77 ± 0.04 0.79 ± 0.04 0.81 ± 0.05 0.84 ± 0.02 0.88 ± 0.01
Table 3
The performances of the change-point detectors for different traffic intensity levels for 1 s (For
constant thresholding k = 5 , λ = 1 , β = 4 , c = 1 , for χ 2 thresholding k = 5 , λ = 2 , β = 2 , α = 0 . 02 ).
Detector Score Level 1 Level 2 Level 3 Level 4 Level 5
Constant Precision 0.77 ± 0.03 0.73 ± 0.05 0.70 ± 0.04 0.69 ± 0.06 0.68 ± 0.08
Recall 0.90 ± 0.03 0.88 ± 0.04 0.88 ± 0.07 0.85 ± 0.06 0.83 ± 0.07
F-Score 0.82 ± 0.04 0.8 ± 0.07 0.79 ± 0.04 0.77 ± 0.07 0.75 ± 0.07
χ2 Precision 0.88 ± 0.06 0.85 ± 0.04 0.81 ± 0.07 0.81 ± 0.08 0.79 ± 0.06
Recall 0.79 ± 0.03 0.78 ± 0.03 0.73 ± 0.10 0.72 ± 0.06 0.7 ± 0.08
F-Score 0.83 ± 0.05 0.81 ± 0.05 0.77 ± 0.04 0.76 ± 0.06 0.74 ± 0.08
Fig. 9. Register attacks increasing at incremental steps of 5.
6.5. Detection performance for time overlapped attacks
Fig. 10 shows the performance of the detector when the at-
tacks are overlapping. The vertical bars in the figure indicate the
detected onsets and offsets of the anomalous traffic when a fixed
threshold is used (k = 5, λ = β = 1, c = 1). The χ² thresholding
for α = 0 . 02 shows very similar performance. Each attack type is
executed twice with 10 different attackers each time. The first oc-
currences are with 5 messages per second and the second ones are
with 10 messages per second. For example, the first cancel attack
starts with 50 messages (5 cancel messages ∗ 10 attackers) per sec-
ond around 400th s and the second cancel attack starts with 100
messages per second around 850th s.
6.6. Effects of DCPM parameters
The λ and β parameters control the trade-off between aging and agility of the system. If the aging parameter λ is set to a value higher than β, the system is more resistant to the current change and is biased toward keeping its status quo. On the contrary, a higher β value means the system is unbiased toward any change, and the effect of past observations is eliminated sooner.
In the case of χ² thresholding, the α parameter determines the tolerance to false alarms. If it is set to a high value (e.g., α = 0.1), the algorithm is more likely to raise an alarm in case of an abrupt change even though it may not be caused by an attack. If it has a low value (e.g., α = 0.01), then the number of false alarms decreases. The c parameter plays a role similar to the α parameter of χ² thresholding, in that c determines the tolerance for false alarms. Setting it to low values (e.g., c = 0.5) may cause even a fluctuation of the normal traffic to raise an alarm. Conversely, for high values (e.g., c = 5), attacks might go undetected.
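The two alarm rules above can be sketched as follows. The exact form of the constant-threshold statistic is defined earlier in the paper and is not restated in this section, so the mean-plus-c-standard-deviations rule below is an illustrative assumption; the χ² rule follows from the squared Mahalanobis distance of a d-dimensional Gaussian being χ²-distributed with d degrees of freedom.

```python
# Hedged sketch of the two thresholding rules. The constant rule's exact form
# (mean + c * std over recent distances) is an assumption for illustration.
import numpy as np
from scipy.stats import chi2

def constant_alarm(d2_history, d2_new, c=1.0):
    # Alarm when the new squared distance exceeds the recent mean
    # by more than c standard deviations (assumed form).
    mu, sd = np.mean(d2_history), np.std(d2_history)
    return d2_new > mu + c * sd

def chi2_alarm(d2_new, dim, alpha=0.02):
    # Alarm when the squared Mahalanobis distance exceeds the
    # (1 - alpha) quantile of the chi-square law with `dim` dof.
    return d2_new > chi2.ppf(1.0 - alpha, df=dim)

history = [1.2, 0.8, 1.1, 0.9, 1.0]          # recent squared distances (toy)
print(constant_alarm(history, 4.0, c=1.0))   # large jump -> alarm
print(chi2_alarm(25.0, dim=8, alpha=0.02))   # deep in the tail -> alarm
```

Raising α (or lowering c) pulls the threshold down, which is exactly the higher false-alarm tolerance described above.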
Fig. 11 shows the Receiver Operating Characteristic (ROC) curves for the DCPMs with all parameters fixed other than c and α, which are the constant threshold coefficient and the significance level, respectively. Both c and α decrease as we traverse along the curves.
6.7. Performance of attacker identification methods

To assess attacker identification performance, we experiment with the two proposed spectral clustering methods, one based on the time-series kernel and the other on the distance kernel. As a competitor method for attacker identification, we use the time-series clustering method proposed in [30]. In the latter method, dynamic time warping distance is used for calculating the distances between time series having different lengths, and a one-nearest-neighbor network is thus extracted. The performances of these
Fig. 10. Overlapping mixed types of attack increasing at incremental steps of 5.
Fig. 11. The receiver operating characteristic curves of the distance-based change-point models as α goes from 0.1 to 0.01 in steps of 0.01 and c goes from 1.0 to 0.65 in steps of 0.05.
Table 4
The performance of different attacker identifiers. Constant and χ² stand for DCPMs run with constant and χ² thresholding, respectively. Distance, Time series and Clustering represent the distance kernel, the time-series kernel and the time-series clustering, respectively, for a sampling interval of 1 s and level 4 traffic, with the DCPM parameters the same as in Table 1.

Model                     Precision     Recall        F-Score
Constant - Distance       0.64 ± 0.13   0.50 ± 0.08   0.55 ± 0.10
Constant - Time series    0.68 ± 0.10   0.51 ± 0.10   0.57 ± 0.10
Constant - Clustering     0.49 ± 0.12   0.37 ± 0.07   0.40 ± 0.08
χ² - Distance             0.72 ± 0.07   0.57 ± 0.04   0.62 ± 0.05
χ² - Time series          0.77 ± 0.05   0.60 ± 0.06   0.66 ± 0.06
χ² - Clustering           0.50 ± 0.19   0.36 ± 0.13   0.40 ± 0.15
three attacker identification methods are given in Table 4. Notice that the attacker identification methods are run in an unsupervised setting. This is a viable approach since, most of the time in reality, labeled training data would not be available. This is due to either the changing characteristics of attack models, e.g., a zero-day attack, or the privacy and prestige concerns of the service providers.

Fig. 12 shows the normalized kernel matrices, calculated according to the time-series kernel and the distance kernel, respectively, as in Eqs. (11)–(13). These matrices represent the messaging behavior similarity of the set of 200 users, as it is set in this experiment. In Fig. 12, however, we plot the behavioral similarity of a subset of 25 users for clarity of illustration. 10 of the 200 users (though the behavior of only 25 − 10 = 15 of the normal users is plotted) mount a DDoS attack, as shown in Fig. 12a and 12b. The users with similar behavior patterns have kernel values close to 1 (dark cells), while the uncorrelated users have kernel values close to 0 (white cells). Note that for the similarity values between the attackers (the attacker-attacker cells), there are some gray shades implying a modest similarity. But the attacker-normal user cells are almost all completely white (close to 0), because malicious and innocent users have totally different behaviors. In summary, the attackers are closer to each other than to the normal users.

Fig. 12c shows the attacker group labeled according to the 2-means clustering and malicious-user differentiation heuristics in
a) The evaluated time series kernel matrix (γ = ρ = 1)
b) The evaluated distance kernel matrix
c) Both kernels result in the same spectral label matrix
Fig. 12. The difference between kernels in discriminating the malicious users. Note
that the plots show only 25 of the 200 users for clarity purposes. In the bottom
figure, the dark red cells correspond to the attacker pairs. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version
of this article.)
a) The time series alignment kernel mapping
b) The distance kernel mapping
Fig. 13. The mapping of the two kernel matrices in the projected 2-d space (s1, s2) after spectral clustering. The blue squares are the attackers and the red circles are the innocent users for the level 4 intensity traffic with the observation interval of 1 s. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Algorithm 3. The dark red cells correspond to attacker pairs; K_{q,r} is 1 (dark red) if (u_q, u_r) is an attacker pair. Both detectors correctly segregate the 10 malicious users from the remaining 15 innocent users.
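A minimal sketch of how a user-to-user similarity matrix like the one in Fig. 12b can arise is given below. The paper's exact distance kernel (Eqs. (11)–(13)) is not reproduced in this section, so a plain Gaussian (RBF) kernel over cumulative per-user message-count vectors is used here as an illustrative stand-in; the count matrix is hypothetical.

```python
# Illustrative stand-in for a distance kernel over per-user cumulative
# message-count vectors; the RBF form and the counts are assumptions.
import numpy as np

def distance_kernel(counts, gamma=0.5):
    """counts: (n_users, n_message_types) cumulative count matrix."""
    sq = np.sum(counts ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * counts @ counts.T  # pairwise squared dists
    return np.exp(-gamma * np.maximum(d2, 0.0))

# Two "attackers" flooding one message type, two ordinary users.
counts = np.array([[9.0, 1.0, 0.0],
                   [8.0, 1.0, 0.0],
                   [1.0, 2.0, 1.0],
                   [0.0, 3.0, 1.0]])
K = distance_kernel(counts)
# The attacker-attacker cell is large (dark), attacker-normal cells are
# near zero (white), mirroring the block structure visible in Fig. 12.
print(round(K[0, 1], 3), K[0, 2])
```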
Fig. 13 shows how the same subscribers in Fig. 12 are mapped by the spectral clustering. Note that in the plots the number of points is less than 25. The reason is that some users overlap in the reduced 2-dimensional space.
The performance of the attacker identifier algorithms is given in Table 4. The identities of the attackers and of the normal users are known since we are using simulation data. The labels of the attackers returned by the models are compared with the true labels of the users, and performance scores are computed accordingly. The three attacker identifiers, two of which are the models proposed in this work (distance and time-series kernels) and the third of which is the time-series clustering model [30], are comparatively evaluated. The time-series kernel algorithm yields the best performance scores. The time-series clustering model [30] has the worst performance. The most likely reason for the lower performance of the latter is that this model uses Euclidean distance, hence does not use any data-driven weighting, for the calculation of dynamic time
Table 5
The processing times (in seconds) of the attacker identification methods, for an observation interval of 1 s with the settings in Table 1.

Intensity   Distance kernel   Time series      Clustering
Level 1     0.11 ± 0.07       19.81 ± 26.46    0.91 ± 1.03
Level 2     0.11 ± 0.07       21.20 ± 26.70    0.93 ± 1.05
Level 3     0.12 ± 0.07       22.02 ± 28.50    0.94 ± 1.12
Level 4     0.12 ± 0.08       26.89 ± 29.72    0.96 ± 1.45
Level 5     0.12 ± 0.08       29.40 ± 37.21    1.03 ± 1.94
Table 6
The processing times (in seconds) of the attacker identification methods for different observation intervals.

Sampling   Distance kernel   Time series        Clustering
1 s        0.12 ± 0.07       22.02 ± 28.50      0.94 ± 1.12
2 s        0.16 ± 0.12       126.33 ± 144.30    5.06 ± 7.84
3 s        0.16 ± 0.08       326.55 ± 446.37    20.38 ± 31.66
5 s        0.24 ± 0.13       −                  72.00 ± 95.23
10 s       0.76 ± 0.45       −                  162.30 ± 286.62
warping distance. The experimental results in Table 4 show that the two proposed methods have almost the same F-score values.
6.8. Time comparison of attacker identification methods

The time-series kernel approach used in attacker detection is more accurate than the distance kernel, but requires longer run times. The run times of the time-series clustering model are between those of the other two models, but it has significantly lower accuracy. Similar conclusions can be drawn for other traffic intensities and observation intervals. The average running times of the attacker identification methods are given in Table 5, where γ = ρ = 1 for the time-series kernel, and the order, which is the number of nearest neighbors to process during cluster-member candidate selection, is set to 1 for time-series clustering. The observation interval is taken as 1 s. The distance kernel does not have any parameters to be set. For each method, the number of clusters to be found is set to 2. The table shows that as the traffic rate increases, the running times of the identification methods also increase. Note that the running time of the time-series method is much higher than that of the other two models. The reason is that the computational load of the pairwise similarities in the time-series kernel increases proportionally to the number of active users and the number of messages sent by the users.
Table 6 shows the run times of the models with respect to the observation interval. The longer the interval, the longer the models take to identify the attackers. The running time of the time-series kernel increases exponentially, since the kernel is evaluated by the pairwise similarity calculation of the messages sent by the subscribers: the longer the interval, the more messages the subscribers send, and the more active subscribers there are. The running time of the time-series kernel is not evaluated for the 5 and 10 s intervals since it takes too much time.
7. Conclusions and future directions

This study has focused on the detection of DDoS attacks in SIP networks and on the identification of users coordinated in an attack. An adaptive cyber security monitor is developed consisting of two basic components: a change-point detector to alert the system of an ongoing attack and an identifier for the malicious user set.
The proposed change-point model tracks the Mahalanobis distance between the messaging counts in successive observation intervals. The rationale is that a marked (dis)similarity of sequential message-count vectors can uncover abrupt changes in the traffic pattern. High dissimilarity instances, i.e., Mahalanobis distances above a threshold, are labeled as candidate attacks. The setting of the threshold is critical to differentiate DDoS attacks from random fluctuations of the traffic. The proposed DCPM is capable of adapting to traffic variations thanks to the on-line estimation of the Mahalanobis metric, and hence yields significantly better performance compared to the results in the literature.
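The tracked quantity can be illustrated with a toy computation. The batch mean/covariance estimate below is a simplified stand-in for the paper's on-line metric estimation, and the message-count vectors are hypothetical.

```python
# Toy illustration of the squared Mahalanobis distance between a new
# message-count vector and a running summary of recent traffic. The batch
# mean/covariance below stands in for the paper's on-line estimation.
import numpy as np

def mahalanobis2(x, mu, cov):
    diff = x - mu
    return float(diff @ np.linalg.inv(cov) @ diff)

rng = np.random.default_rng(0)
# Calm traffic: counts of three message types per interval (toy Poisson rates).
normal = rng.poisson(lam=[20, 10, 5], size=(200, 3)).astype(float)
mu, cov = normal.mean(axis=0), np.cov(normal.T)

calm   = np.array([21.0, 9.0, 5.0])     # a typical interval: small distance
attack = np.array([120.0, 10.0, 5.0])   # flood in one message type: large distance
print(mahalanobis2(calm, mu, cov), mahalanobis2(attack, mu, cov))
```

The flood interval lands far above any reasonable threshold, while ordinary fluctuations stay small, which is the behavior the thresholding rules exploit.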
Identification of DDoS attackers is based on behavioral similarity in messaging sequences. Based on the premise that attackers act in a coordinated way while normal users show a much less structured messaging pattern, two corresponding clusters are conceived. The user-to-user similarity is measured by kernelizing their messaging time series. In the time-series kernel function we explicitly use the timestamps of the messaging events; in the distance kernel, we collapse the messaging activities within an observation interval into a cumulative count vector. The behavioral clusters are extracted using normalized Laplacian spectral clustering.
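The clustering step can be sketched as follows on a toy kernel matrix. A sign split on the eigenvector of the second-smallest Laplacian eigenvalue stands in for the full 2-means step, which is a common simplification for the two-cluster case; the kernel values are hypothetical.

```python
# Minimal normalized-Laplacian spectral clustering sketch with 2 clusters
# (attackers vs. normal users). The toy kernel matrix and the sign-split
# shortcut are illustrative assumptions.
import numpy as np

def spectral_labels(K):
    d = K.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(K)) - d_inv_sqrt @ K @ d_inv_sqrt   # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                        # eigenvalues ascending
    fiedler = vecs[:, 1]                               # second-smallest eigenvector
    return (fiedler > 0).astype(int)                   # 2-way split

# Block-structured toy kernel: users 0-1 behave alike, users 2-4 behave alike.
K = np.array([[1.0, 0.9, 0.1, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8, 0.8],
              [0.1, 0.1, 0.8, 1.0, 0.8],
              [0.1, 0.1, 0.8, 0.8, 1.0]])
labels = spectral_labels(K)   # users 0-1 land in one cluster, 2-4 in the other
```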
The performance of the proposed system is tested over a simulated SIP network environment, which simulates transactions of ordinary subscribers and attackers. Depending on the intensity of the normal network traffic, the observation interval and the attack magnitude, our F-scores are more than 0.70 for the distance-based change-point models, which is much higher than that of the ARIMA model. The area under curve (AUC) values are 0.993 and 0.994 for χ² and constant thresholding, respectively.
The effects of observation window length, background traffic intensity and parameter settings for the proposed DCPM methods are discussed in detail. Longer observation windows result in more accurate attack detectors, but they come at the price of reduced onset/offset resolution. As one should expect, the intensity of background traffic has a diminishing effect on the performance of the proposed methods: the more fluctuations the traffic has, the lower the F-scores are, though they remain higher than 0.70. The parameters of the models should be calibrated to account for seasonal changes.
The attacker identification algorithms are also compared in detail. The time-series kernel has higher F-scores but a considerable running time. Longer observation intervals form longer time series, and the running times increase almost exponentially. Similarly, higher traffic intensity causes an increase in the running time. Even though the distance kernel has lower accuracy values, its running time is almost unaffected by the observation window interval or the traffic intensity. The reason is that each user is represented as a vector, and the number of operations is not affected by the window interval or the intensity.
This study can be advanced in several ways. First, in addition to the observed message traffic, one can use additional data sources, such as SIP server log registers or its resource usage, e.g., CPU load. Second, the distance change-point model compares only the last observation interval with the immediately preceding k frames to detect changes in the traffic. This can be extended to compare the most current m frames with the k frames in their past. We conjecture that the comparison of two groups of frames might diminish false alarms, that is, detected changes which are not DDoS attacks. Thirdly, though the time-series kernel has slightly higher performance, it takes a longer time to respond due to the cost of the kernel matrix computation. The distance kernel is faster, but it does not benefit from the occurrence-time information of the messages. A hybrid kernel might provide a more accurate detector than the distance kernel and a faster detector than the sequence alignment kernel. Finally, we have so far considered the costs of false negatives and false positives to be equal. From the point of view of operators that deploy SIP servers, these two costs are not equal, and this should be taken into consideration in setting the threshold for attacker detection. Delayed response to a DDoS attack and suffering degradation of quality of service should be weighed against taking
preventive action toward subscribers that may not all be malicious
users.
DDoS attacks may look deceptively simple, but they have
proved to be hard to prevent, and they will be one of the major
cyber security concerns with the spread of Internet-Of-Things (IoT)
devices. The capabilities of IoT devices and their security vulnera-
bilities (e.g. weak passwords or no protection mechanisms at all)
make them easy victims as zombies for botnet applications, such
as Mirai [45]. The botnets are also evolving and becoming adaptive against the deployed counter-measures. Thus, more research should be carried out, particularly on the detection of attack sources, to overcome the possible outages and network congestion on the horizon with the widespread adoption of IoT devices.
Acknowledgment
This study is partially funded by TEYDEB project number
3140701 “Realization of Anomaly Detection and Prevention with
Learning System Architectures, Quality Improvement, High Rate
Service Availability and Rich Services in a VoIP Firewall Product”,
by the Scientific and Technological Research Council Of Turkey
(TUBITAK). NOVA V-Gate and V-Spy are trademark cyber-security
products of NETAS.
This section focuses on the details of the LogDet function. The Kullback–Leibler (KL) divergence from distribution Q to distribution P, where p(x) and q(x) are their respective probability density functions and x \in \mathbb{R}^d, is calculated as:

D_{KL}(P \| Q) = \int p(x) \log\!\left( \frac{p(x)}{q(x)} \right) dx \qquad (19)

= E_P\!\left[ \log\!\left( \frac{P}{Q} \right) \right] \qquad (20)
Assuming that both p and q are multivariate Gaussian distributions with mean vectors \mu_p and \mu_q and covariance matrices \Sigma_p and \Sigma_q, respectively, one has:

p(x) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_p)^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_p)^\top \Sigma_p^{-1} (x - \mu_p) \right) \qquad (21)

q(x) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_q)^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_q)^\top \Sigma_q^{-1} (x - \mu_q) \right) \qquad (22)
Using definitions (21) and (22) in Eq. (19), one obtains:

D_{KL}(P \| Q) = E_P[\log P - \log Q] \qquad (23)

= \tfrac{1}{2} E_P\!\left[ -\log\det\Sigma_p - (x - \mu_p)^\top \Sigma_p^{-1} (x - \mu_p) + \log\det\Sigma_q + (x - \mu_q)^\top \Sigma_q^{-1} (x - \mu_q) \right] \qquad (24)

= \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} + \tfrac{1}{2} E_P\!\left[ -(x - \mu_p)^\top \Sigma_p^{-1} (x - \mu_p) + (x - \mu_q)^\top \Sigma_q^{-1} (x - \mu_q) \right] \qquad (25)

= \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} + \tfrac{1}{2} E_P\!\left[ -\mathrm{tr}\!\left( \Sigma_p^{-1} (x - \mu_p)(x - \mu_p)^\top \right) + \mathrm{tr}\!\left( \Sigma_q^{-1} (x - \mu_q)(x - \mu_q)^\top \right) \right] \qquad (26)

Here we use the identity a^\top B c = \mathrm{tr}(B c a^\top) for any a, c \in \mathbb{R}^d and B \in \mathbb{R}^{d \times d}. Since both the trace and integration are linear operators, we can proceed as follows:
D_{KL}(P \| Q) = \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} - \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_p^{-1} E_P[(x - \mu_p)(x - \mu_p)^\top] \right) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} E_P[(x - \mu_q)(x - \mu_q)^\top] \right) \qquad (27)

= \tfrac{1}{2} \log\frac{\det\Sigma_q}{\det\Sigma_p} - \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_p^{-1} \Sigma_p \right) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} E_P[x x^\top - \mu_q x^\top - x \mu_q^\top + \mu_q \mu_q^\top] \right) \qquad (28)

= -\tfrac{1}{2} \log\frac{\det\Sigma_p}{\det\Sigma_q} - \tfrac{1}{2} \mathrm{tr}(I) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} E_P[x x^\top - \mu_q x^\top - x \mu_q^\top + \mu_q \mu_q^\top] \right) \qquad (29)

Note that \frac{\det\Sigma_p}{\det\Sigma_q} = \det(\Sigma_p \Sigma_q^{-1}), \mathrm{tr}(I) = d and E_P[x x^\top] = \Sigma_p + \mu_p \mu_p^\top. The Kullback–Leibler divergence becomes:
D_{KL}(P \| Q) = -\tfrac{1}{2} \log\det(\Sigma_p \Sigma_q^{-1}) - \tfrac{d}{2} + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \left( \Sigma_p + \mu_p \mu_p^\top - \mu_q \mu_p^\top - \mu_p \mu_q^\top + \mu_q \mu_q^\top \right) \right) \qquad (30)

= -\tfrac{1}{2} \log\det(\Sigma_p \Sigma_q^{-1}) - \tfrac{d}{2} + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \Sigma_p \right) + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \left( \mu_p \mu_p^\top - \mu_q \mu_p^\top - \mu_p \mu_q^\top + \mu_q \mu_q^\top \right) \right) \qquad (31)

= -\tfrac{1}{2} \log\det(\Sigma_p \Sigma_q^{-1}) - \tfrac{d}{2} + \tfrac{1}{2} \mathrm{tr}\!\left( \Sigma_q^{-1} \Sigma_p \right) + \tfrac{1}{2} (\mu_p - \mu_q)^\top \Sigma_q^{-1} (\mu_p - \mu_q) \qquad (32)
Now, let us assume that the mean vectors of p and q are identical, \mu_p = \mu_q; then we can conclude that:

D_{KL}(P \| Q) = \tfrac{1}{2} \left( \mathrm{tr}(\Sigma_q^{-1} \Sigma_p) - \log\det(\Sigma_p \Sigma_q^{-1}) - d \right) \qquad (33)

= \tfrac{1}{2} \left( \mathrm{tr}(\Sigma_p \Sigma_q^{-1}) - \log\det(\Sigma_p \Sigma_q^{-1}) - d \right) \qquad (34)

= \tfrac{1}{2} D_{ld}(\Sigma_p, \Sigma_q) \qquad (35)
D_{ld}(\Sigma_p, \Sigma_q) is called the logarithmic determinant (LogDet) divergence and can be used as a pseudo-metric to measure the distance between two matrices. A metric function D : X × X → [0, ∞) defined on a set X must satisfy three basic conditions, as below, for any a, b, c ∈ X:

1. D(a, b) ≥ 0 and D(a, b) = 0 iff a = b (non-negativity and positive-definiteness)
2. D(a, b) = D(b, a) (symmetry)
3. D(a, c) ≤ D(a, b) + D(b, c) (triangle inequality)

D_{ld}(\Sigma_p, \Sigma_q) is a pseudo-metric on positive-definite matrices, \Sigma_p, \Sigma_q \in \mathbb{R}^{d \times d}, since it only guarantees non-negativity; the other two conditions do not necessarily hold.
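The identity in Eq. (35) can be checked numerically. In the sketch below, the closed form of Eq. (34) is compared against an independent Monte Carlo estimate of the KL divergence between two zero-mean Gaussians; the covariance matrices are arbitrary illustrative choices.

```python
# Numerical check of Eqs. (33)-(35): for two Gaussians with identical means,
# D_KL(P || Q) = (1/2) D_ld(Sigma_p, Sigma_q). The KL side is estimated by
# Monte Carlo so the two sides are computed independently.
import numpy as np

def logdet_div(Sp, Sq):
    # D_ld(Sp, Sq) = tr(Sp Sq^{-1}) - log det(Sp Sq^{-1}) - d, as in Eq. (34)
    M = Sp @ np.linalg.inv(Sq)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - Sp.shape[0]

def gauss_logpdf(x, S):
    # log density of N(0, S), evaluated row-wise for samples x of shape (n, d)
    inv = np.linalg.inv(S)
    _, ld = np.linalg.slogdet(S)
    quad = np.einsum('ij,jk,ik->i', x, inv, x)
    return -0.5 * (quad + ld + S.shape[0] * np.log(2.0 * np.pi))

Sp = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative covariances
Sq = np.array([[1.0, 0.0], [0.0, 1.5]])

rng = np.random.default_rng(7)
x = rng.multivariate_normal(np.zeros(2), Sp, size=200_000)
kl_mc = float(np.mean(gauss_logpdf(x, Sp) - gauss_logpdf(x, Sq)))  # E_P[log p - log q]
kl_closed = 0.5 * logdet_div(Sp, Sq)                               # Eq. (35)
print(kl_mc, kl_closed)   # the two estimates agree to sampling error
```

The check also makes the pseudo-metric property concrete: D_ld of a matrix with itself is exactly zero, while distinct matrices give a strictly positive value.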
References

[1] N. Raza, I. Rashid, F.A. Awan, Security and management framework for an organization operating in cloud environment, Ann. Telecommun. 72 (5) (2017) 325–333, doi:10.1007/s12243-017-0567-6.
[2] D. Bolton, Anonymous 'declares war' on Turkey, claims responsibility for recent massive cyberattacks, 2015, (http://www.independent.co.uk/life-style/gadgets-and-tech/news/anonymous-declares-war-on-turkey-opsis-russia-cyberattack-erdogan-a6784026.html), [Online; accessed 04-06-2017].
[3] B.B. Gupta, T. Akhtar, A survey on smart power grid: frameworks, tools, security issues, and solutions, Ann. Telecommun. 72 (9) (2017) 517–549, doi:10.1007/s12243-017-0605-4.
[4] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, SIP: session initiation protocol, RFC 3261, 2002. http://www.ietf.org/rfc/rfc3261.txt
[5] M. Cooney, IBM warns of rising VoIP cyber-attacks, 2016, (http://www.networkworld.com/article/3146095/security/ibm-warns-of-rising-voip-cyber-attacks.html), [Online; accessed 04-06-2017].
[6] C. Wilson, DDoS attacks targeting traditional telecom systems, 2012, (https://www.arbornetworks.com/blog/asert/ddos-attacks-targeting-traditional-telecom-systems/), [Online; accessed 04-06-2017].
[7] C. Yildiz, M. Semerci, T.Y. Ceritli, B. Kurt, B. Sankur, A.T. Cemgil, Change point detection for monitoring SIP networks, in: Proceedings of the European Conference on Networks and Communications (EuCNC2016), 2016.
[8] A.D. Keromytis, A comprehensive survey of voice over IP security research, IEEE Commun. Surv. Tutor. 14 (2) (2012) 514–537.
[9] D. Sisalem, J. Kuthan, S. Ehlert, Denial of service attacks targeting a SIP VoIP infrastructure: attack scenarios and prevention mechanisms, IEEE Network 20 (5) (2006) 26–31.
[10] E.Y. Chen, M. Itoh, Scalable detection of SIP fuzzing attacks, in: Proceedings of the Second International Conference on Emerging Security Information, Systems and Technologies (SECURWARE), 2008, pp. 114–119.
[11] S. Ehlert, D. Geneiatakis, T. Magedanz, Survey of network security systems to counter SIP-based denial-of-service attacks, Comput. Secur. 29 (2) (2010) 225–243.
[12] Z. Chen, R. Duan, The formal analysis of DoS attack to SIP based on the SIP extended finite state machines, in: Proceedings of the International Conference on Computational Intelligence and Software Engineering, 2010, pp. 1–4.
[13] N. Vrakas, C. Lambrinoudakis, An intrusion detection and prevention system for IMS and VoIP services, Int. J. Inf. Secur. 12 (3) (2013) 201–217, doi:10.1007/s10207-012-0187-0.
[14] R. Vijayasarathy, S.V. Raghavan, B. Ravindran, A system approach to network modeling for DDoS detection using a Naive Bayesian classifier, in: Proceedings of the Third International Conference on Communication Systems and Networks (COMSNETS), IEEE, 2011, pp. 1–10.
[15] C. Yildiz, T.Y. Ceritli, B. Kurt, B. Sankur, A.T. Cemgil, Attack detection in VoIP networks using Bayesian multiple change-point models, in: Proceedings of the Twenty Fourth Conference on Signal Processing and its Applications (SIU), 2016, pp. 1301–1304.
[16] M. Nassar, R. State, O. Festor, A framework for monitoring SIP enterprise networks, in: The Fourth International Conference on Network and System Security (NSS), 2010, pp. 1–8.
[17] Z. Tsiatsikas, D. Geneiatakis, G. Kambourakis, S. Gritzalis, Realtime DDoS detection in SIP ecosystems: machine learning tools of the trade, Springer International Publishing, Cham, pp. 126–139.
[18] S.M.T. Nezhad, M. Nazari, E.A. Gharavol, A novel DoS and DDoS attacks detection algorithm using ARIMA time series model and chaotic system in computer networks, IEEE Commun. Lett. 20 (4) (2016) 700–703.
[19] A. D'Alconzo, A. Coluccia, P. Romirer-Maierhofer, Distribution-based anomaly detection in 3G mobile networks: from theory to practice, Int. J. Netw. Manag. 20 (5) (2010) 245–269, doi:10.1002/nem.747.
[20] A. D'Alconzo, A. Coluccia, F. Ricciato, P. Romirer-Maierhofer, A distribution-based approach to anomaly detection and application to 3G mobile traffic, in: Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM), 2009, pp. 1–8, doi:10.1109/GLOCOM.2009.5425651.
[21] M. Anagnostopoulos, G. Kambourakis, S. Gritzalis, New facets of mobile botnet: architecture and evaluation, Int. J. Inf. Secur. 15 (5) (2016) 455–473, doi:10.1007/s10207-015-0310-0.
[22] G. Kirubavathi, R. Anitha, Structural analysis and detection of android botnets using machine learning techniques, Int. J. Inf. Secur. (2017) 1–15, doi:10.1007/s10207-017-0363-3.
[23] S.S. Silva, R.M. Silva, R.C. Pinto, R.M. Salles, Botnets: a survey, Comput. Netw. 57 (2) (2013) 378–403, doi:10.1016/j.comnet.2012.07.021.
[24] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, E. Vázquez, Anomaly-based network intrusion detection: techniques, systems and challenges, Comput. Secur. 28 (12) (2009) 18–28, doi:10.1016/j.cose.2008.08.003.
[25] M. Gupta, J. Gao, C.C. Aggarwal, J. Han, Outlier detection for temporal data: a survey, IEEE Trans. Knowl. Data Eng. 26 (9) (2014) 2250–2267.
[26] R.J. Hyndman, E. Wang, N. Laptev, Large-scale unusual time series detection, in: Proceedings of the IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, November 14–17, 2015, pp. 1616–1619.
[27] M. Cuturi, Fast global alignment kernels, in: Proceedings of the Twenty Eighth International Conference on Machine Learning (ICML), Bellevue, Washington, USA, June 28–July 2, 2011, pp. 929–936.
[28] K.R. Sivaramakrishnan, K. Karthik, C. Bhattacharyya, Kernels for large margin time-series classification, in: Proceedings of the International Joint Conference on Neural Networks, 2007, pp. 2746–2751.
[29] H. Chen, F. Tang, P. Tino, X. Yao, Model-based kernel for efficient time series analysis, in: Proceedings of the Nineteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), ACM, New York, NY, USA, 2013, pp. 392–400.
[30] X. Zhang, J. Liu, Y. Du, T. Lv, A novel clustering method on time series data, Expert Syst. Appl. 38 (9) (2011) 11891–11900.
[31] T. Oates, L. Firoiu, P. Cohen, Clustering time series with hidden Markov models and dynamic time warping, in: Proceedings of the IJCAI-99 Workshop on Neural, Symbolic, and Reinforcement Learning Methods for Sequence Learning, 1999.
[32] Y. Xiong, D.-Y. Yeung, Mixtures of ARMA models for model-based time series clustering, in: Proceedings of the IEEE International Conference on Data Mining, 2002.
[33] S. Behal, K. Kumar, Detection of DDoS attacks and flash events using novel information theory metrics, Comput. Netw. 116 (2017) 96–110, doi:10.1016/j.comnet.2017.02.015.
[34] B. Tellenbach, M. Burkhart, D. Schatzmann, D. Gugelmann, D. Sornette, Accurate network anomaly classification with generalized entropy metrics, Comput. Netw. 55 (15) (2011) 3485–3502, doi:10.1016/j.comnet.2011.07.008.
[35] J. Heo, E.Y. Chen, T. Kusumoto, M. Itoh, Statistical SIP traffic modeling and analysis system, in: Proceedings of the Tenth International Symposium on Communications and Information Technologies, 2010, pp. 1223–1228, doi:10.1109/ISCIT.2010.5665175.
[36] S. D'Antonio, M. Esposito, F. Oliviero, S.P. Romano, D. Salvi, Behavioral network engineering: making intrusion detection become autonomic, Annales des Télécommunications 61 (9) (2006) 1136–1148, doi:10.1007/BF03219885.
[37] J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, in: Proceedings of the Twenty Fourth International Conference on Machine Learning, New York, NY, USA, 2007, pp. 209–216.
[38] M. Cuturi, J.P. Vert, O. Birkenes, T. Matsui, A kernel for time series based on global alignment, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, 2007, pp. 413–416.
[39] U. Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (4) (2007) 395–416.
[40] Fonality, Trixbox business phone solutions, 2016, (https://www.fonality.com/trixbox), [Online; accessed 04-06-2017].
[41] B. Kurt, C. Yildiz, T.Y. Ceritli, M. Yamac, M. Semerci, B. Sankur, A.T. Cemgil, A probabilistic SIP network simulation system, in: Twenty Fourth Conference on Signal Processing and its Applications (SIU), IEEE, 2016, pp. 1049–1052.
[42] C. Yildiz, B. Kurt, T.Y. Ceritli, A.T. Cemgil, B. Sankur, BOUN-SIM API Reference, Technical Report, Department of Computer Engineering, Bogazici University, 2016.
[43] Teluu, PJSIP, 2005, (http://www.pjsip.org/), [Online; accessed 04-06-2017].
[44] NETAS, Nova V-Spy, 2016, (http://novacybersecurity.com/products/nova_vspy), [Online; accessed 04-06-2017].
[45] J. Gamblin, Mirai source code, 2016, (https://github.com/jgamblin/Mirai-Source-Code), [Online; accessed 04-06-2017].
Murat Semerci received his B.S. degrees from the Electrical and Electronics Engineering Department and the Department of Computer Engineering (Double Major Program), Bogazici University, in 2005, and his M.S. degrees from the Department of Computer Engineering, Bogazici University, in 2007 and from the Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, in 2010. Currently he is a Ph.D. student at Bogazici University. His research interests include machine learning, distance learning and kernel machines.

Ali Taylan Cemgil received his Ph.D. from Radboud University Nijmegen, the Netherlands, and worked as a Postdoctoral Researcher at Amsterdam University and the Signal Processing and Communications Lab., University of Cambridge, UK. He is currently an Associate Professor of Computer Engineering at Bogaziçi University, Istanbul, Turkey. His research interests are in Bayesian statistical methods, approximate inference, machine learning, and audio signal processing.

Bülent Sankur is presently at Bogazici University in the Department of Electrical-Electronic Engineering. His research interests are in the areas of digital signal processing, security and biometry, cognition and multimedia systems. He has held visiting positions at the University of Ottawa, Technical University of Delft, and Ecole Nationale Supérieure des Télécommunications, Paris. He was the chairman of several conferences (EUSIPCO'05, ICASSP'00, etc.) and a member of the administrative board of the European Signal Processing Association. Dr. Sankur is presently an associate editor for journals including Image and Video Computing, and Image and Video Processing.