Applied Soft Computing 25 (2014) 1–14

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: www.elsevier.com/locate/asoc

Of daemons and men: A file system approach towards intrusion detection

G. Mamalakis, C. Diou, A.L. Symeonidis, L. Georgiadis
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece

Article history: Received 7 August 2013; Received in revised form 16 May 2014; Accepted 29 July 2014; Available online 17 September 2014.

Keywords: Intrusion detection systems; Information security; Machine learning; Data mining; File system; Anomaly detection.

Abstract

We present FI2DS, a file system, host-based anomaly detection system that monitors Basic Security Module (BSM) audit records and determines whether a web server has been compromised by comparing monitored activity generated from the web server to a normal usage profile. Additionally, we propose a set of features extracted from file-system-specific BSM audit records, as well as an IDS that identifies attacks based on a decision engine that employs one-class classification using a moving window on incoming data. We have used two different machine learning algorithms, Support Vector Machines (SVMs) and Gaussian Mixture Models (GMMs), and our evaluation is performed on real-world datasets collected from three web servers and a honeynet. Results are very promising, since FI2DS detection rates range between 91% and 95.9%, with corresponding false positive rates ranging between 8.1 × 10^-2% and 9.3 × 10^-4%. Comparison of FI2DS to another state-of-the-art filesystem-based IDS, FWRAP, indicates higher effectiveness of the proposed IDS in all three datasets. Within the context of this paper FI2DS is evaluated for the web daemon user; nevertheless, it can be directly extended to model any daemon user for both intrusion detection and postmortem analysis.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

1.1. About intrusion detection

In contemporary computer and communication networks almost everybody uses the Internet backbone to exchange personal or sensitive information on a daily basis. Through cellphones, laptops, net-pads and smart sensors, people exchange data on top of various types of applications like email services, web transactions, social networks, file transfers, etc. In tandem with this extreme growth of information exchange via the Internet, cyber-threats evolve to exploit the expanding attack surface. In order to protect end-users, specialized software and hardware solutions have been deployed (firewalls, antiviruses, spam detectors, Intrusion Detection Systems, sandboxes, etc.). It is common knowledge, though, that none of these solutions alone is enough to offer absolute protection; usually a combination of them is essential for providing an adequate level of cyber-protection.

* Corresponding author. Tel.: +30 2310 99 4379; fax: +30 2310 99 4370.
E-mail addresses: [email protected], [email protected] (G. Mamalakis), [email protected] (C. Diou), [email protected] (A.L. Symeonidis), [email protected] (L. Georgiadis).

Intrusion detection systems (IDS), known as the "computer world's burglar alarm", are based on the idea of identifying attacks when or after they occur, and fire an alarm or take some action (Intrusion Response Systems, IRS) according to their configuration. Different types of IDS have been proposed during the last two and a half decades. Network IDS (NIDS) [3,10,13,28,36] monitor network activity, while host-based IDS (HIDS) monitor data generated within a host, including command histories [15], system calls [2,6], function calls [26] and file system data [34]. Hybrid IDS [16,40] monitor both network and host activity. Misuse IDS [8,14] are trained with malicious data to identify attacks, while anomaly-based IDS [6,7] raise an alarm whenever monitored activity diverges from a normal usage profile. It is not uncommon to use distributed architectures of different IDS types in order to enhance the security perimeter of large computer networks.

1.2. Our approach

We have built a system that is able to distinguish the man from the daemon on a running server. In the following paragraphs we describe how such a system can be used to allow intrusion identification of compromised daemons.

As most experienced system administrators know, no matter how well configured, hardened and up-to-date a running system

http://dx.doi.org/10.1016/j.asoc.2014.07.026
1568-4946/© 2014 Elsevier B.V. All rights reserved.





offering network services is, it still has a chance of being compromised (it only takes a clumsy PHP developer to create an easily exploitable vulnerability even on the most protected web server). On a server system the attack surface consists mainly of the public services it offers (HTTP, FTP, SSH, etc.) and, furthermore, attackers exploiting a vulnerability on a service are usually rewarded with a remote shell running with the privileges of the exploited daemon user. Once this type of access is granted, attackers are allowed to run arbitrary commands as this daemon user. Hence, one way to identify when a daemon has been compromised is to monitor the daemon process for abnormal behaviour.

We formulate our objective as a machine learning problem and try to identify novel features that are informative enough for various HIDS anomaly detection algorithms. Within the context of this work we argue that this distinction of daemon-human behaviour can be accomplished, since daemons usually behave in a very specific and repetitive way, serving web pages from certain paths or delivering mails to pre-configured mailboxes. Attackers, on the other hand, after compromising the vulnerable daemon, usually perform actions like searching the system or network for further vulnerabilities, inspecting the system to identify its architecture and OS version, cleaning log files of their trails, downloading additional utilities to gain more privileges, defacing a web page and generally performing actions that deviate from the way a daemon usually behaves. We argue that this divergence is reflected in the overall behaviour of the daemon and therefore an IDS should be able to identify the few diverging actions of an attacker against the daemon's normal usage profile.

So, in this paper we present FI2DS, a multiprocessing, Python-based, file system IDS that is able to identify attacks by reading BSM audit records both on-line and off-line (from /dev/auditpipe and BSM binary files, respectively), generating feature vectors from monitored file system activity and, ultimately, employing alternative machine learning techniques on the generated feature vectors. FI2DS utilizes file system data because: (a) the file system stores most attackers' traces on all OS's, and (b) this storage is permanent (contrary to the system's memory, or CPU registers, for example). The best and easiest source for monitoring file system activity on the FreeBSD1 servers used for our experiments was FreeBSD's Audit System, which generates BSM records.2 Apart from FreeBSD, BSM is also available for Solaris and Mac OS X, hence our proposed IDS can run on those systems as well. Furthermore, MS Windows Advanced Auditing and Linux audit are mechanisms that generate audit records analogous to BSM, so FI2DS can be ported to support those OS's too. It is important to stress at this point that FI2DS can be used in parallel with other types of IDS (HIDS, NIDS, etc.) that are capable of identifying attacks not reflected on the file system. FI2DS practically adds an additional level of awareness regarding the usage of the file system by the running daemons.

In the context of this paper FI2DS reads audit records generated by the httpd daemon user (www), builds a normal usage profile and then monitors incoming activity for divergence through a moving window mechanism. We have employed both one-class Support Vector Machines and Gaussian Mixture Models (see Section 4) for novelty detection and have carried out experiments on datasets originating from three real-world web servers. The malicious activity dataset is inferred from commands gathered from a honeynet we have deployed just for this purpose. Our experimental results indicate that FI2DS achieves high detection rates with low corresponding false positive rates and, when compared to FWRAP [34], FI2DS outperforms it.

1 http://www.freebsd.org.
2 http://www.freebsd.org/doc/en/books/handbook/audit.html.


1.3. Related work

In her seminal paper [4], Denning expresses the idea that intrusions against computers and networks may be detected if we assume that computer and/or network usage activity can be automatically profiled, and that the trails of intrusions are present in this activity. Therefore, if a security-benign usage profile can be created for a monitored system, then all subsequent profiles created by the system's later activity can be compared with this baseline using some meaningful metric and, if great divergence is found, the system may fire an alarm to inform about the incident. Denning was the first to discuss how anomaly detection could be used for computer systems, and specifically for host-based intrusion detection. Her initial idea came to life with the deployment of the Multics Intrusion Detection and Alerting System (MIDAS) [30], an expert system whose rule base used the Production Based Expert System Toolset (P-BEST) and was developed by the National Computer Security Center (NCSC).

A few years later, Forrest et al. [5,6,9,39] demonstrated the notion of immune systems, inspired by her studies of natural immune systems. The key concept in her work was how to define the sense of self in UNIX processes so as to detect intrusions by identifying abnormalities through processes' deviations from self. Her representation of self in UNIX was through sequences of system calls. The approach of analysing system calls has been adopted by many other researchers [18,24,38,41], but as Wagner et al. describe in [37], HIDS based on sequences of system calls seem to be susceptible to mimicry attacks.

Even though a lot of research has been conducted on the subject of intrusion detection, only a few papers have dealt with anomaly-based IDS that identify attacks based on file system data, and none of them seems to use the BSM audit mechanism to collect them, as we do. Some of the HIDS that used MIT Lincoln Laboratory's DARPA '98 and '99 BSM datasets [20] have trained SVMs [2,42] or other one-class classifiers [12] for their detection mechanisms but, contrary to our proposed IDS, they have not used file system semantics in their features. Lastly, the legacy, discontinued EMERALD [27] eXpert-BSM [19] monitor implements an expert system IDS that uses BSM data as its input. But because eXpert-BSM is a misuse IDS, instead of being trained with benign data to form a norm and trying to identify attacks on newly arriving data by inspecting how closely they relate to the norm, it uses a malicious rule base in its expert system against which newly arriving data are probed.

One more interesting work, although more relevant to permanent data storage, is the one proposed by Stanton et al. [33]. Even though they follow a very different modeling approach and use different types of data (block-level data combined with file system data), their idea illustrates some features that are worth mentioning. They propose a 3-tiered anomaly-based IDS, the File and Block Surveillance System (FABS), that monitors file system and device controller data for abnormal behaviour. FABS builds a normal usage profile by studying sequences of events (disk accesses at the file and block level) and classifies as malicious all events that deviate from it. It uses C-Miner's [17] rule-based engine to build the IDS and proposes a GUI prototype (VisFlowConnect-SS) for visualisation.

Another interesting approach that uses file system information is that of Stolfo et al. [34], where the proposed system, File Wrapper Anomaly Detector (FWRAP), uses features extracted from the file system of a Linux system, through the use of a kernel module the authors wrote, in order to build a normal usage profile; FWRAP uses the information extracted from this module to detect attacks based on a Bayesian estimation technique. Although our general idea uses the same source of information as Stolfo's, the file system, our technique differs significantly. First of all, the feature set and the algorithms used are entirely different. Secondly, Stolfo et al. introduce a command entry in each feature vector, which



Table 1
BSM audit record printed with praudit.

header,131,11,access(2),0,Fri Jan 18 12:07:54 2013, + 393 ms
path,/lib/libjail.so.1
attribute,444,root,wheel,91,730199,2923288
argument,2,0x0,mode
subject,www,root,wheel,root,wheel,67997,0,0,0.0.0.0
return,success,0
trailer,131


renders the problem more related to command-histories IDS's [31]. Finally, the experiments focus on totally different aspects than ours: their approach is to build a normal usage profile by monitoring root's and other non-privileged human accounts' activity on a Linux workstation and then identify attacks spawned by one of these human users, whereas our approach is to build a normal usage profile by auditing a daemon user's file system activity (www in our case) and try to identify actions that a human user performs masqueraded as this daemon. Despite FI2DS and FWRAP differences, a comparison between FI2DS and FWRAP is meaningful, since FWRAP is currently the state-of-the-art IDS that is, at least, partially based on file system information directly extracted from the OS kernel. We therefore implemented FWRAP and used it as a baseline in the experiments presented in this paper.

Summarizing, our proposed IDS' novelties with respect to the state of the art are: (a) the feature space we have created to map file system activity, (b) the daemon-user-centric angle from which we look at the IDS problem, and (c) the source of our file-system dataset, which uses the BSM mechanism.

1.4. Paper outline

The rest of the paper is organized as follows: Section 1.3 discusses related work and the state of the art with respect to file system IDS's. Section 2 presents our IDS architecture and the way it operates. Section 3 describes the features we have selected to map file system activity, while Section 4 discusses the machine learning techniques we have used for building our detection engines in our experiments. These experiments, as well as their results along with comparisons with FWRAP and its algorithm, PAD, are presented in Section 5, followed by a brief discussion of our results (Section 6) and a conclusion with future work ideas in Section 7.

2. Proposed IDS architecture

Our IDS comprises several components (shown in Fig. 1) and works in three modes: training, ids and postmortem. In short, BSM records are collected, parsed and preprocessed, and are then used to generate a set of feature vectors. These are in turn used to build an anomaly detection model during training mode, or to assess the probability of attack during ids or postmortem mode.

The various components/modules of this system, as well as its modes of operation, are outlined in the following paragraphs. As will be shown in subsequent sections, the focus of this work is on adequately expressing the web server's file system activity as observed from BSM audit records. Appropriate selection of the features to be extracted allows development of highly effective anomaly detection models with high detection and low false alarm rates.

2.1. BSM audit records

BSM records are used as input to our system. Sun's Basic Security Module (BSM) [32] is a mechanism that allows for fine-grained auditing and is available for different OS's (Solaris, FreeBSD, Mac OS, etc.). The implementation we have used in this paper is OpenBSM, part of the TrustedBSD project,3 since the servers we have collected our data from were running the FreeBSD OS.

BSM audit records are generated by different OS facilities (kernel and user-land) and are handled by the auditd daemon. A sample audit record printed by FreeBSD's praudit utility is depicted in Table 1. Each record comprises a number of audit tokens, and each audit token describes a different aspect of the system. The audit record of Table 1 shows a different token in each line; the header, path, attribute and subject tokens are the ones that have been used in our feature generation process. An extended explanation of the BSM mechanism, its records and tokens can be found in [32].

3 http://www.trustedbsd.org.

The filtered record is parsed by the feature generation component (fgc) to form a feature vector; this procedure is explained in the following paragraphs.

From the header token we use the event attribute, which shows us the system call that was used when the file in question was accessed (access(2)). Hence, this part of the header token gives us the mode of access to the specific file. The path token contains only one attribute, the path (in our example /lib/libjail.so.1), which is the basis for forming a feature vector. From the subject token we consult the effective UID of the user in order to acquire only those records that are related to the specific daemon user (www in our example).
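As a rough illustration (a sketch, not the authors' implementation), the token fields used above can be pulled out of a praudit text record like the one in Table 1. The field positions below are inferred solely from that example record:

```python
# Sketch: extract the event, path and daemon user from a praudit(1)
# text record (one comma-separated token per line), as in Table 1.
# Field positions are inferred from the Table 1 example only.
def parse_record(record_text):
    tokens = {}
    for line in record_text.strip().splitlines():
        fields = line.split(",")
        tokens[fields[0]] = fields[1:]
    return {
        "event": tokens["header"][2],  # e.g. "access(2)": the mode of access
        "path": tokens["path"][0],     # the file that was accessed
        "user": tokens["subject"][0],  # daemon user ("www"); exact field
                                       # semantics follow the BSM subject token
    }

record = """header,131,11,access(2),0,Fri Jan 18 12:07:54 2013, + 393 ms
path,/lib/libjail.so.1
attribute,444,root,wheel,91,730199,2923288
argument,2,0x0,mode
subject,www,root,wheel,root,wheel,67997,0,0,0.0.0.0
return,success,0
trailer,131"""

print(parse_record(record))
```

In a deployment, records would of course be read in their binary form from /dev/auditpipe rather than as praudit text; this sketch only illustrates which token fields feed the feature generation.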

2.2. IDS components and running modes

The proposed IDS filters out BSM audit data unrelated to file system activity, creates feature vectors and generates an innocuous usage profile from standard usage data in order to identify attacks. This procedure is depicted in Fig. 1 and is implemented by the following IDS components:

The audit sensor component (asc) either listens on /dev/auditpipe for incoming BSM audit records or reads audit records from a binary BSM file. This module is used for data gathering and its output is passed directly to the input of the pre-processing component that follows.

The preprocessor component (ppc) collects the binary audit records, filters them based on the system's configuration and parses them in order to produce meaningful data objects to be used by the feature generation component. Filters are boolean expressions that are applied to each audit record and to each of its tokens to determine whether the record/token will be discarded or retained for further processing. These filters can range from very simple ones, like "is there a path token in the record", to more sophisticated ones, like "the path should contain the substring /etc and the user should be www or the time should be less than now", and provide a simple, yet very powerful, fine-grained control mechanism as to which audit records and tokens will be processed by the feature generation mechanism that follows. When FI2DS is running in training mode, ppc is responsible for generating a database from which the statistics are calculated during the feature generation procedure.
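The boolean-filter idea above can be sketched as composable predicates over a parsed record; the record layout and helper names here are hypothetical, purely to illustrate the mechanism:

```python
# Sketch: ppc-style filters as composable boolean predicates over a
# parsed audit record (a plain dict). Record layout and helper names
# are hypothetical; they only illustrate the filtering idea.
def has_path(record):
    return "path" in record

def make_filter(prefix, user):
    # Retain records whose path starts with `prefix` and whose user matches.
    def pred(record):
        return (has_path(record)
                and record["path"].startswith(prefix)
                and record["user"] == user)
    return pred

keep = make_filter("/etc", "www")
print(keep({"path": "/etc/rc.conf", "user": "www"}))   # True
print(keep({"path": "/tmp/sess_01", "user": "www"}))   # False
```

Because each filter is just a function from record to bool, arbitrary combinations (conjunctions, disjunctions, time predicates) compose naturally, which matches the range of filter expressions described above.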

The feature generation component (fgc) is responsible for the creation of feature vectors from the data obtained by ppc. The path and the mode of access of the audit records are used for querying the database, and statistics are computed that are used for the formation of the different features. In our experimental set-up, once a feature vector is created it is passed to the machine learning component for further processing. A detailed analysis of the features used in the web server case, discussed in this paper, is presented in Section 3.

Finally, the decision making component (dmc) is used (a) for training the machine learning model when the IDS runs in training mode,





Table 2
Statistics computed from the reference database.

Symbol  Description
N       Records in the database
Nf      Times file f has been accessed
Nmf     Times f has been accessed with mode m
Np      Times the parent directory p of file f has been accessed
Nmp     Accesses of files with parent p and mode m
Npp     Accesses of the parent of f's parent directory


Table 3
List of frequency features extracted in the proposed IDS.

Feature  Probability  Value
x1       Pr(f)        Nf/N
x2       Pr(f|p)      Nf/Np
x3       Pr(p)        Np/N
x4       Pr(p|pp)     Np/Npp
x5       Pr(m|f)      Nmf/Nf
x6       Pr(m|p)      Nmp/Np

Fig. 1. Our IDS architecture. The dashed line indicates the information flow in training mode, while the solid line is for the ids and postmortem modes.

(b) for generating an attack probability for feature vectors when FI2DS runs in postmortem or ids mode, and (c) for detecting attacks and taking some action upon intrusion identification when running in postmortem or ids mode. In our set-up, an attack probability is computed and assigned to each feature vector; this attack probability updates the attack score of a moving window, and an alarm is triggered when the mean attack score of the records contained in this window exceeds a certain threshold.
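The moving-window alarm logic described above can be sketched as follows; the window size and threshold here are arbitrary placeholders for illustration, not the paper's tuned values:

```python
# Sketch of the moving-window alarm: each feature vector receives an
# attack probability, and an alarm fires when the mean score inside a
# fixed-size window exceeds a threshold. Values are illustrative only.
from collections import deque

class MovingWindowAlarm:
    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # oldest score drops out
        self.threshold = threshold

    def update(self, attack_probability):
        """Add one record's score; return True if the alarm fires."""
        self.window.append(attack_probability)
        mean_score = sum(self.window) / len(self.window)
        return mean_score > self.threshold

alarm = MovingWindowAlarm(window_size=4, threshold=0.5)
scores = [0.1, 0.2, 0.9, 0.9, 0.9, 0.9]
print([alarm.update(s) for s in scores])
# -> [False, False, False, True, True, True]
```

Averaging over a window rather than alarming on single records is what lets a few benign-but-rare accesses (e.g. new temporary files) pass without raising an alarm, while a sustained run of high-probability records does trigger one.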

The various IDS components behave differently in each running mode. The three running modes (ids, training and postmortem) are analysed as follows:

Training mode: in this mode audit records are either read on-line (from the auditpipe) or off-line (from a file) through asc, and are subsequently parsed and pre-processed by ppc. The same component is responsible for generating a statistics database to be used for feature generation (Fig. 1). Next, ppc's output is passed to fgc, where feature vectors are created; the feature vectors are further passed to dmc for training. Once the training procedure is completed, the trained model is saved and used as the baseline of attack identification.

Ids mode: in this mode audit records are read on-line from the auditpipe; they are parsed and pre-processed by ppc and given to fgc for feature generation. fgc's output is passed into dmc for generating an attack score and identifying potential attacks.

Post-mortem mode: this mode is similar to ids mode. The only difference is that in this case audit records are read off-line from a binary file. The system can inform the analyst as to which audit records triggered an alarm.

3. Feature extraction

Each incoming BSM audit record is processed to extract a small number of features that capture information relevant to anomaly detection. We identify two types of features:

1. Frequency features: Relative frequency values that measure how frequently files and/or directories are accessed by the web daemon.

2. Binary features: Binary values that identify qualitative characteristics of file system directories (e.g. whether a parent directory contains configuration files or executables).

3.1. Frequency features

In order to generate frequency features, a large number of records (hundreds of thousands to millions) is initially collected from the monitored daemon so as to extract statistics on the access frequency of files and directories. Table 2 lists the values that are collected in the database, which are subsequently used to extract the frequency features indicated in Table 3. The use of probabilities Pr(f) and Pr(m|f) is intuitively justified: a record indicating access of a file f that is not normally accessed by the web daemon (at least not with mode m) is used as evidence to detect unusual behaviour. However, it is common for system daemons to create multiple temporary files and directories that have not previously appeared in the database. Features x1 and x5 alone cannot discriminate this type of normal operation from an attack and need to be complemented with additional features to avoid false alarms. The probabilities Pr(f|p), Pr(p) and Pr(m|p) provide additional information on how common it is to access the parent folder p of f, f within p, and files in p with mode m, respectively. Furthermore, Pr(p|pp) conveys information on the frequency of the parent folder with respect to its own parent folder; this can be discriminative in cases of normal behaviour with low Pr(f) and low Pr(p).
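As a rough sketch (not the authors' code), the Table 3 features can be computed directly from the Table 2 counts; the toy reference database below is invented for illustration:

```python
# Sketch: computing the frequency features x1..x6 of Table 3 from the
# counts of Table 2. The counts below form a toy reference database.
import os

def frequency_features(path, mode, db):
    p = os.path.dirname(path)    # parent directory of f
    pp = os.path.dirname(p)      # parent of f's parent
    N = db["N"]
    Nf = db["file"].get(path, 0)
    Np = db["dir"].get(p, 0)
    Npp = db["dir"].get(pp, 0)
    Nmf = db["file_mode"].get((path, mode), 0)
    Nmp = db["dir_mode"].get((p, mode), 0)
    safe = lambda a, b: a / b if b else 0.0  # unseen entries give 0
    return [
        safe(Nf, N),     # x1 = Pr(f)
        safe(Nf, Np),    # x2 = Pr(f|p)
        safe(Np, N),     # x3 = Pr(p)
        safe(Np, Npp),   # x4 = Pr(p|pp)
        safe(Nmf, Nf),   # x5 = Pr(m|f)
        safe(Nmp, Np),   # x6 = Pr(m|p)
    ]

db = {
    "N": 1000,
    "file": {"/var/www/index.html": 200},
    "dir": {"/var/www": 500, "/var": 800},
    "file_mode": {("/var/www/index.html", "access(2)"): 150},
    "dir_mode": {("/var/www", "access(2)"): 400},
}
print(frequency_features("/var/www/index.html", "access(2)", db))
# -> [0.2, 0.4, 0.5, 0.625, 0.75, 0.8]
```

Note how a brand-new temporary file under a frequently accessed parent yields x1 = x2 = x5 = 0 but non-zero x3, x4 and x6, which is exactly the discriminative effect of the parent-directory probabilities discussed above.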

3.2. Binary features

Apart from the frequency of file and directory access, there are additional characteristics of the accessed files and directories which are important for anomaly detection. A write attempt at a folder that contains library files is not common behaviour for a web daemon user and should be distinguished from write attempts at a folder used for file uploads, even if the access statistics are similar in both cases.




Table 4
List of binary features extracted in the proposed IDS.

Feature  Assertion
x7       Parent is a configuration folder
x8       Parent is a library folder
x9       Parent contains executable files
x10      Parent contains device nodes
x11      Parent is a temporary folder
x12      First time file is accessed with this mode




Thus, the frequency features of Section 3.1 are complemented with a set of binary features (i.e. with values in {0, 1}) that correspond to the truth value of assertions about the accessed file and its parent directory. These are summarized in Table 4.

Features x7–x11 are determined based on a predefined list of directories provided by the system administrator. For example, the administrator may indicate that directory /etc contains configuration files. If an audit record detects access to a file in a /etc subdirectory, then x7 is 1; otherwise it is 0. An exception to this pattern is x12, which has been used to emphasize previously unseen file access records.
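The binary features can be sketched as simple membership tests against administrator-provided directory lists; the lists below are illustrative examples, not the paper's configuration:

```python
# Sketch: the binary features x7..x12 of Table 4, derived from
# administrator-provided directory lists (illustrative examples only).
import os

CONFIG_DIRS = ["/etc", "/usr/local/etc"]
LIBRARY_DIRS = ["/lib", "/usr/lib"]
EXEC_DIRS = ["/bin", "/usr/bin", "/usr/sbin"]
DEV_DIRS = ["/dev"]
TMP_DIRS = ["/tmp", "/var/tmp"]

def under(parent, dirs):
    # True if `parent` equals one of `dirs` or lies beneath it
    return any(parent == d or parent.startswith(d + "/") for d in dirs)

def binary_features(path, mode, seen):
    """x7..x12 of Table 4; `seen` holds the (path, mode) pairs already
    present in the reference database."""
    parent = os.path.dirname(path)
    return [
        int(under(parent, CONFIG_DIRS)),   # x7: configuration folder
        int(under(parent, LIBRARY_DIRS)),  # x8: library folder
        int(under(parent, EXEC_DIRS)),     # x9: executable files
        int(under(parent, DEV_DIRS)),      # x10: device nodes
        int(under(parent, TMP_DIRS)),      # x11: temporary folder
        int((path, mode) not in seen),     # x12: first access with this mode
    ]

seen = {("/etc/passwd", "access(2)")}
print(binary_features("/etc/passwd", "open(2)", seen))
# -> [1, 0, 0, 0, 0, 1]
```

Concatenating these six values with the six frequency features of Section 3.1 yields the 12-dimensional vector consumed by the anomaly detection models of Section 4.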

4. Anomaly detection models

Anomaly detection in our IDS is based on unsupervised machine learning. A set of BSM records is collected during system operation and the extracted features are concatenated to form a set of 12-dimensional feature vectors xi, one for each record i. This collection is used to train a model that allows us to decide whether the following expression holds:

p(x|y = 0) < T    (1)

where x is the input BSM record feature vector, y = 1 if the record corresponds to an attack and 0 otherwise, and p(x|y = 0) is the probability density function (pdf) of normal operation in the feature space. The threshold T is used to determine whether x corresponds to unusual behaviour. Setting T to higher values leads to a higher detection rate, while setting it to lower values leads to a lower false positive rate. We have used two different approaches to solve problem (1), namely Gaussian Mixture Models and Support Vector Machines, to show that the detection results are not necessarily tied to the algorithm of the detection engine.

4.1. Anomaly detection with GMM

GMMs [23] estimate the target pdf as a mixture of multivariate normal pdfs, i.e.

p(x|y = 0) = ∑_{i=1}^{n} ai Ki exp( −(1/2) (x − μi)^T Σi^{−1} (x − μi) )   (2)

where Ki = (2π)^{−k/2} |Σi|^{−1/2} is a normalisation factor, n is the number of mixtures (distributions), the ai correspond to the mixture weights such that ∑_{i=1}^{n} ai = 1, while μi and Σi are the mean value and covariance matrix of the i-th mixture respectively. We may allow Σi to be a diagonal or full matrix, depending on the desired expressiveness of the model and the number of parameters that we are able to estimate with the available sample. Parameter estimation is achieved via expectation-maximization (EM) on a set of sample records.

The proposed prototype IDS has been implemented in Python and the scikit-learn package [25] was used for the implementation of GMM. A typical web server quickly collects thousands or millions of BSM audit records, so we selected full covariance matrix estimation. Selection of the number of mixtures n and threshold T is discussed in Section 5.
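As a concrete illustration of this setup, the sketch below fits a 10-mixture, full-covariance GMM on synthetic stand-in data and applies the log-probability threshold described in Section 5.1 (mean − 3 × std). It uses the current scikit-learn class GaussianMixture, which replaced the GMM class available when the prototype was written; the data and helper names are ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in training data for the 12-dimensional feature vectors of Section 4.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 12))

# n = 10 mixtures with full covariance matrices, as in the paper's setup.
gmm = GaussianMixture(n_components=10, covariance_type="full",
                      random_state=0).fit(X_train)

# Log-probability threshold of Eq. (8): log T = mean - 3 * std.
scores = gmm.score_samples(X_train)          # log p(x_i | y = 0) per record
log_T = scores.mean() - 3.0 * scores.std()

def is_anomalous(x):
    """Flag a record whose log-density falls below the threshold."""
    return gmm.score_samples(np.asarray(x).reshape(1, -1))[0] < log_T
```
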


4.2. Anomaly detection with SVM

For SVMs we use Schölkopf's one-class SVM algorithm [29]. In this case we do not directly estimate the pdf p(x|y = 0), but instead aim at producing a model f such that f ≥ 0 when (1) holds and f < 0 otherwise. This approach therefore solves a simpler problem than the one of Section 4.1, which involves full estimation of the p(x|y = 0) pdf. Function f is estimated by adapting the SVM binary classification algorithm to consider all available samples as members of one class and the origin as the only member of the opposite class.

One-class SVM uses

f(x) = sign(w · Φ(x) − ρ)   (3)

where Φ(x) is unknown, but its inner product is computed via a kernel

K(x, y) = Φ(x) · Φ(y)   (4)

that satisfies Mercer's conditions [1]. We wish to compute w and ρ such that the margin between the two classes is maximized, while the number of misclassification errors is minimized. After defining the optimization problem (details in [29]) the one-class SVM function becomes

f(x) = sign( ∑_{i=1}^{N} ai K(xi, x) − ρ )   (5)

where x is the input feature vector, the xi are the N samples used for training, the coefficients 0 ≤ ai ≤ 1/(νN) are determined during optimisation and ρ satisfies

ρ = ∑_j aj K(xj, xi)   (6)

for all i. The parameter ν controls how "tight" the separating bound will be around the target distribution and is thus related to the threshold T of Eq. (1). In our experiments we used the Radial Basis Function kernel, i.e.

K(x, y) = exp(−γ ‖x − y‖²)   (7)
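The corresponding decision step can be sketched with scikit-learn's OneClassSVM, using the parameter values reported in Section 5 (ν = 0.5, γ = 0.1, tolerance 0.001). The training data below is a synthetic stand-in, not the paper's dataset.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Stand-in training data for the 12-dimensional feature vectors.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 12))

# RBF kernel with nu = 0.5, gamma = 0.1 and tolerance 0.001, as in the experiments.
clf = OneClassSVM(kernel="rbf", nu=0.5, gamma=0.1, tol=0.001).fit(X_train)

# predict() returns +1 where f(x) >= 0 (normal) and -1 where f(x) < 0 (anomalous).
labels = clf.predict(np.vstack([np.zeros(12), np.full(12, 50.0)]))
```

A record far outside the training distribution (the vector of 50s) falls on the negative side of the separating bound and is labelled −1.
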

5. Experiments

The way we have created our training sets and test sets, as well as the methods we have employed to run our experiments along with their results, are presented in the paragraphs that follow.

5.1. Experiment setup

FI2DS is written in Python and its design and modules are presented in Section 2. In order to read BSM binary data, we used the pybsm4 library. As mentioned earlier, all machine learning algorithms were written using scikit-learn.

For our experiments with SVMs we have used a fixed setting for the threshold T of Eq. (1) (i.e. a fixed value of ν = 0.5). For GMM, we chose a full covariance matrix and selected 10 to be the number of mixtures. Furthermore, for numerical stability we have used log probabilities and selected the threshold T to be

log T = mean − 3 × std (8)

where mean is the average GMM log-probability density score of the training set with which the GMM one-class classifier has been trained (i.e. mean = (1/N) ∑_{i=1}^{N} log p(xi|y = 0)) and std is the corresponding standard deviation.

4 http://www.opensource.apple.com/source/OpenBSM/OpenBSM-21/test/pybsm.c.


We conducted four types of experiments to validate both our IDS and feature set performance. They answer the following questions, in this order: (1) Are the selected features expressive? How does a complex classifier – like SVM – behave on different training set sizes and methods of training-set selection? (2) If the proposed feature set is effective, are all the features necessary in order to distinguish the man from the daemon? (3) How complex is the IDS anomaly detection problem with the proposed features? How does a simpler classifier – like GMM – work on our feature space? (4) How would the proposed anomaly detection approach be applied in a real IDS and how effective would it be? The results were proven more than promising for both SVM and GMM classifiers. Discussion of the experiments performed, followed by an analysis of the results, is presented in Sections 5.3–5.5 and 5.7.

As one may infer, not all of our experiments can use the same datasets. Experiment set 1 uses different training sets of various sizes to show how training set selection affects detection and false positive rates. From the results of experiment set 1 we choose a representative training set for each server and use it on all other three experiments. Evaluation sets are the same for all experiments, but change slightly in experiment set 5 to become compatible with the moving window mechanism we employed (Section 2.2) to make our IDS suitable for on-line use. More information on our datasets is found in the following paragraphs.

5.2. The dataset

As we have already mentioned, our datasets have been created from three real-life web servers. Vergina hosts 27 sites, www ee hosts 4 sites and thmmy hosts 4 sites. Some of these sites have been created with CMS's, some are comprised of static HTML pages and others have been written with custom PHP code. The servers are running 3 different versions of FreeBSD (6-STABLE, 8-STABLE, 9-STABLE) and are all using apache5 as their web server. We followed no specific rules during the data collection, since all servers have different visiting patterns. We just started collecting data and stopped when the audit file's size was at least 1 GB large. Server thmmy reached this limit in almost a day, www ee needed 3 and vergina 2.

The only publicly available datasets used in IDS evaluations that contain BSM audit records are DARPA's '98 and '99 datasets from MIT's Lincoln Laboratories.6 Moreover, the operation of FI2DS presumes that BSM audit records originate from the daemon-user, in our case www, to identify file system specific abnormal behaviour. Unfortunately, as illustrated in [20,21], the only attacks against the web server (apache) in these datasets are DoS attacks, which are not related to file system activity. Furthermore, the web server – like most other services and system processes – on these datasets was running as user root, so there is no way of distinguishing between audit records coming from the web server and those that don't. Thus, a new dataset had to be created for the proper evaluation of our IDS. This was achieved by monitoring three of our Department's web servers, and the dataset creation process is described in the remainder of this section. The dataset that was used in the experiments can be found on-line.7 It is important to note at this point that the fact that FI2DS focuses on attacks that are related to the file system does not pose any restriction on using additional types of IDS (HIDS, NIDS, etc.) to extend the range of attacks that can be detected.

The main problem we had to solve when creating our datasets was the generation of malicious activity. The best way to create malicious activity is to know exactly how all attackers think

5 http://www.apache.org.6 http://www.ll.mit.edu/mission/communications/cyber/CSTcorpora/ideval/data/.7 ftp://ftp3.ee.auth.gr/pub/mamalos/dataset2.tar.xz.


and act, and emulate their behaviour by running various exploits against the monitoring servers in order to first compromise the host and then gain escalated privileges. Yet, unfortunately, successful exploits are not easy to write or to be found in our days, since most OS's are equipped with stack protections, heap protections, etc. Even if we intentionally created a vulnerability for apache and managed to write an exploit and shell-code for it, it would be difficult to run it on our monitoring servers, since each of them runs a different major version of FreeBSD, which means that a candidate exploit would have to be able to run on all FreeBSD versions, or we would have to write different versions of the exploit for each OS, thus increasing its already high complexity. Moreover, proper evaluation would require far more than one exploit, and writing more than one implies even higher complexity. Therefore, we decided to approach this problem from the point where the attacker has compromised the victim host and forth.

Attackers rarely stop activities on the compromised host right after successful intrusion. No matter how access was gained on the victim system (e.g. via an exploit with a shell-code, by taking advantage of some PHP programming error, via an SQL injection, etc.), attackers are expected to use the privileges of the daemon user towards meeting their objectives. Examples of objectives include gaining escalated privileges, installing a back-door, opening a covert channel, searching for additional information with respect to the victim's host, using the host as a hop to issue subsequent attacks on other systems, installing spam engines, installing bitcoin generators, etc. Most – if not all – of these actions involve some interaction with the file system. Hence, if we identified when the daemon user performs such actions, we would be able to infer that an attack has taken place and fire an alarm. This is equivalent to firing an alarm not when a burglar picks the lock or breaks the window, but rather once they step on the floor. Of course, this does not prevent FI2DS from identifying the moment of intrusion as long as it involves interaction with the file system. Effectively, what we decided to do in our experiments was to – at least – identify human actions over daemon's actions by monitoring the daemon's file system activity.

5.2.1. The honeynet

In order to understand how an attacker behaves and be able to emulate their behaviour, we have chosen to deploy a honeynet. First, we created two distinct, fully interactive honeypots running FreeBSD, that had six users with easy to guess passwords and were configured to log each user's input on an append-only file. Those systems were built inside FreeBSD jails, and were highly hardened in order to protect the system from further compromising that could lead to erasing the log file. Moreover, we set up a web server on each honeypot, configured to serve a popular CMS system having default admin usernames and passwords that were monitored too. After one month of operation there was no successful attempt on any of our honeypots, so the next honeypot implementation we decided to deploy was kippo,8 because it allows multiple passwords per user and is – at least – semi-interactive.

Kippo emulates a system running the SSH protocol and saves all successful and unsuccessful connection attempts – along with the usernames and passwords used – in its database. Once an attacker guesses a password correctly and logs into the system, all subsequent input and output is stored in kippo's database. Additionally, the user is granted super-user privileges inside the emulated system, which means that when an attacker logs into a kippo honeypot they already possess all available user privileges.

8 https://code.google.com/p/kippo/.


Table 5Datasets’ sizes used in our experiments: Normal usage set(nus), normal usage testset (nuts), malicious test set(mts).

Dataset nus nuts mts

vergina 522,257 150,000 3133

tdthOftfrarfimarmc

5

m4wcrc

cuoacotuT

toha

oaa(wtcfeetwi

www ee 1,399,474 150,000 2433thmmy 3,340,123 150,000 1035

Our honeynet comprises six kippo semi-interactive nodes plus the two initial FreeBSD-jail fully interactive nodes, that span three different subnets. Throughout a three month monitoring period, two of our kippo nodes were compromised, 27,160 connections have been established, 643 of which resulted in successful logins. On 21 of those sessions the attacker issued at least one command, from which 130 different lines of input have been identified. From this input, a set of 22 unique Unix commands has been extracted, forming 34 different commands containing specific paths. The main reason why the total amount of unique malicious commands is relatively small is because the attackers were already connected as root, so they had no motive in trying to investigate the system for further vulnerabilities (Local-to-Root, L2R) and/or try to circumvent its security perimeter. Hence, in order to enrich our malicious command set, we asked our sys-admins to login into one of our servers as user www and act as if they had compromised it, trying to gain root access. By inspecting their command histories we gathered 15 more commands to add to our list, so our total number of malicious commands became 49.

5.2.2. Training and evaluation sets

Due to different server configurations, not all malicious commands could be issued on all our servers, so in www ee we issued 46 of them, on thmmy we executed 47 of them and only in vergina were we able to execute all 49 of them. To produce our malicious dataset on each server, we configured them to log the BSM audit records that were generated when we ran the appropriate malicious commands-set as user www.

Apart from the malicious dataset, which served as our malicious test set (mts), we also created (a) a training set (ts) with normal usage data activity, and (b) a normal usage test set (nuts) to test how our system behaves on newly arrived normal activity, so as to be able to train and validate our IDS. Of course, since our data had been collected from real-world servers, there was no certainty that any of the audited systems had not been compromised during or before the data-gathering procedure, which means that our IDS performs unsupervised anomaly detection. Our final datasets are shown in Table 5.

So, overall, a training set, a malicious test set and a normal usage test set have been generated for each server. Due to the large size of our initial normal usage datasets (nus), only a sufficient subset has been used as our ts in our experiments to reduce training and anomaly detection complexity.

Table 5 shows that thmmy's mts is considerably smaller than the others'. This is because thmmy's server configuration is hardened, and non-root users (like www, who we used in our experiments) are unable to execute more than half of the malicious commands (24/47). When www runs these commands the system responds with "permission denied" errors (21/24) or "no such file or directory" errors (3/24). Even though we initially thought to filter out the commands failing to execute, we decided to keep them and explore further how FI2DS responds on hardened systems. These errors generate a standard set of BSM records on the server, regardless of the executed command. The inability to run these commands lowers the detection rate on the specific dataset in experiment sets 1–4, where the detection engine evaluates each record separately; but interestingly, as we see in experiment set 5, this inability does not


affect the real detection rate of FI2DS, which uses a moving window, proving its robustness to such phenomena.

As far as the test sets are concerned, the malicious part consists of the feature vectors generated by the malicious dataset, and each innocuous test set was populated with 150,000 feature vectors from each nus' tail, and was – of course – excluded from the training set. We have performed a number of experiments that used all nus as the innocuous test set (excluding training samples) for all servers and we achieved the exact same false positive rates, indicating that an innocuous test set of this size is sufficient and indicative.

5.2.3. Training and evaluation sets for FWRAP and PAD

As discussed earlier, for comparison purposes we implemented the FWRAP IDS [34], which was the only alternative we found in the related literature that is closest to our approach. As described in [34], FWRAP utilizes seven features for intrusion detection:

1. UID, which is the user ID running the process.
2. WD, which is the working directory of a user running the process.
3. CMD, which is the command line invoking the running process.
4. DIR, which is the parent directory of the touched file.
5. FILE, which is the name of the file being accessed.
6. PRE-FILE, which is the concatenation of the three previously accessed files.
7. FREQUENCY, which encodes the access frequency of files in the training records. This value is estimated from the training data and discretized into four categories:
(a) NEVER (for processes that do not touch any file).
(b) FEW (where a file had been accessed only once or twice).
(c) SOME (where a file had been accessed about 3–10 times).
(d) OFTEN (more than SOME).

In order to run FWRAP with our data, all datasets needed to be transformed to fit its algorithm, PAD [35]. From these features, WD and CMD had to be excluded during our transformation procedure because they did not exist in our dataset. Nevertheless, due to the nature of our daemon-related analysis, none of them would contribute during the intrusion detection process anyway, since their values would not vary. That is because: (a) the working directory (WD) of the user running the process would always be equal to the working directory of the apache process, and (b) the command line invoking the running process (CMD) would almost always be the path of the php command, with the exception of the few times that command httpd itself would run.

Moreover, as shown later in experiment set 3, we ran a comparison of PAD with our feature set and with its original feature set (described above). The problem we had to face was that some of our features (the frequency features in Table 3) are not categorical, whereas PAD uses categorical features. Therefore, for each such feature on each dataset, we found its maximum and minimum value and we divided the space between these two values into 20 labeled intervals of equal length. Then, in order to create the PAD-compatible dataset, all values of the specific feature were parsed and a new value was assigned to them that was equal to the label of the interval they belonged to.
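That binning step can be sketched as follows; the function is our own minimal reading of the transformation, with integer interval labels.

```python
def discretize(values, n_bins=20):
    """Map a continuous feature to the labels of n_bins equal-length
    intervals between its minimum and maximum, as done to make the
    frequency features categorical for PAD."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:                  # constant feature: a single label
        return [0] * len(values)
    # Clamp the maximum value onto the last interval.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```
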

5.3. Experiment set 1: model effectiveness and training set size

Experiment set 1 evaluates the effectiveness of the proposed system using one-class SVM as the anomaly detection mechanism, for different training set sizes and methods of training set feature vector selection. We selected the training set in two ways: (a) contiguous feature vectors, (b) randomly selected feature vectors. First, as we explained in the previous paragraph, for each server we selected 150,000 features from nus to form our nuts. Then, for

Fig. 2. Average fprs and drs on all datasets when selecting feature vectors contiguously.



Fig. 3. Average fprs and drs on all datasets when selecting feature vectors randomly.

a range of training set sizes and for each method of collection, we trained a one-class SVM and evaluated its false positive rate (fpr) based on the common nuts, and its detection rate (dr) based on mts. During the evaluation process each record was assessed separately

Fig. 4. Average, minimum, maximum, standard deviation and cl2opt values for thmmy when selecting feature vectors contiguously.


and an alarm was triggered if it deviated from the norm. The train size ranged from 10,000 to 140,000 feature vectors, and for each train size more than one experiment took place. Figs. 2 and 3 show the minimum dr and the mean and minimum fpr for all datasets for



features are found in at least 49% of C's elements. The lowest percentage is sufficiently close to the rest, thus one may argue that all of the selected features participate in the combinations that score

Table 6
Detection rates (dr), false positive rates (fpr) and training set sizes (tss) for SVM on all datasets.


different train sizes and for contiguously and randomly selected training sets respectively. Fig. 4 is a more analytic view of thmmy that depicts mean, max, min, std and cl2opt for fprs and drs for contiguously selected training sets. Metric cl2opt (close to optimal) for fprs is equal to the percentage of fprs that are less than or equal to 1.2 × min(fpr) for the specific train size, while for the drs it is equal to the percentage of the drs that are greater than or equal to 0.8 × max(dr) for the specific train size. For each training set size (tss), a different number of experiments (k) have taken place. k depends on tss and nus' length and is computed as follows: for our contiguous experiments and for a specific value of tss, we start from the beginning of nus and select tss records. For the next experiment of this size, we start at an offset of step(tss) and choose tss records again. For the n-th experiment of this size, we start at offset n × step(tss) and choose tss records, with max(n) = div(len(nus), step(tss)), which depends on nus' length and the specific tss. step(tss) is calculated as follows:

step(tss) = tss     if tss = 10,000
step(tss) = tss/2   if tss > 10,000

So, k = max(n) is the total number of experiments we ran for each different tss size, for both contiguously and randomly selected training sets.
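In code form, the offset and experiment-count rules above amount to:

```python
def step(tss):
    # Offset between the starting points of consecutive contiguous training sets.
    return tss if tss == 10_000 else tss // 2

def num_experiments(nus_len, tss):
    # k = max(n) = div(len(nus), step(tss))
    return nus_len // step(tss)
```

For example, on vergina (nus length 522,257, Table 5) a tss of 40,000 yields a step of 20,000 and therefore 26 experiments for that size.
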

As we can observe from Figs. 2 and 3, the behaviour of the detection engine is quite similar for both methods of selection for the same tss, with a slight exception of thmmy's minimum and mean fprs in the 100,000 and 110,000 region; in this region, both rates are suddenly rising a bit. So, for vergina and www ee on all train sizes the mean fpr is close to 10% and the minimum fpr is less than 10%; for thmmy, the corresponding rates are 20% and 10% respectively. But as we can see from Fig. 4 more specifically, fpr cl2opt is quite high for many different values of tss, which means that it is quite possible to score fprs close to its minimum for those sizes; a manifestation of this fact is shown in the following paragraphs, that describe the selection of the baseline training set for each dataset used in the subsequent experiments. Moreover, with the exception of thmmy, whose mean tprs are a bit lower than 90%, the other two servers' mean tprs are above 95%, for both methods of sample selection. Hence, despite thmmy's small divergence, we can certainly conclude that our model is definitely descriptive, as far as intrusion detection is concerned, for many different values of tss and for both methods of feature selection.

Throughout the rest of our experiments, we have selected one specific training set for each server that performs well, in order to form a baseline and be able to compare and discuss further results against it. As we see in Figs. 2 and 3 this does not affect generality, since good results can be achieved for almost all different sizes and methods of selection on each server. What is interesting is that dr minimums (Fig. 5) are very close to dr means, which means that when someone trains their IDS, they do not need to worry about drs, since they never fall lower than the minimum, which is already high enough. On the other hand, what they should be worried about is achieving an acceptably low fpr, since the desired dr is in essence guaranteed. Hence, the way we selected our baseline training sets can be used as the default method of choosing the appropriate training set for any server wishing to use FI2DS. We start by choosing a test set larger than the maximum desired tss (e.g. 150,000) and train an SVM one-class classifier using training sets of increasing sizes that start from the first element of nus; the training set size increases with a step of 10,000 feature vectors on each run. We stop our runs when a low false positive rate (<10%) has been achieved. As one may infer from Table 6, www ee's training set was selected from the first run, thmmy's from the third and vergina's from the fourth. Results (Table 6) show detection rates higher than 95% for all datasets and corresponding false positive rates that do not exceed

Fig. 5. Minimum detection rates (drs) on all datasets and for both methods of selection.

8.15%. One-class SVM was trained with a gamma value of 0.1, an RBF kernel and 0.001 tolerance.

In order to rule out the possibility that only one feature, or a small combination of our features, is highly informative by itself and is the primary reason why the machine learning novelty detector performed that well, Experiment set 2 was carried out.

5.4. Experiment set 2: feature effectiveness

Given the fact that the dimensionality of our feature space is relatively low, instead of using a sub-optimal feature selection method to validate our features, we simply decided to run experiments for all 2^12 − 1 possible combinations of features, using the SVM one-class classifier and the baseline training sets of Experiment set 1.
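Enumerating the 2^12 − 1 candidate feature subsets is straightforward; a sketch (each subset would then train and evaluate its own one-class SVM):

```python
from itertools import combinations

features = list(range(1, 13))  # x1 .. x12

# Every non-empty subset of the 12 features: 2**12 - 1 = 4095 combinations.
subsets = [c for r in range(1, 13) for c in combinations(features, r)]
```
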

Fig. 6 shows a histogram of how frequently a specific feature is found in the set of feature combinations that scored the highest drs while scoring the lowest fprs at the same time. For vergina and www ee, feature combinations that scored drs higher than 90% and fprs lower than 10% have been selected, while for thmmy 1000 feature combinations have been selected that scored the highest drs while having corresponding fprs lower than 10%. The intersection of those three sets that contain the combinations of features scoring the best for each server, denoted C, is comprised of 176 elements, from a total of 4095. In Fig. 6, integers on the x axis represent a different feature, while the y axis depicts the percentage of the elements of C that contain the specific feature.

From Fig. 6 we can see that all features are participating in the highest scoring combinations of features, at least to an extent. The x axis represents the feature number, using the numbering of Tables 3 and 4. The feature that is found in the fewest elements (22.16%) of C is x5 (Pr(m|f)) and the one that is found in all elements is x7 (parent contains configuration files). The rest of the

Dataset   dr      fpr    tss
vergina   98.21%  8.39%  40,000
www ee    95.84%  6.07%  10,000
thmmy     88.11%  6.69%  30,000


5.6. Experiment set 4: FI2DS vs FWRAP

Up until this point we have shown that FI2DS can be used as a file system IDS that scores high detection rates and low false positive

Table 7
Detection rates (dr), false positive rates (fpr) and training set sizes (tss) for GMM on all datasets.

Fig. 6. A histogram expressing in what percentage each feature is present within the feature sets that scored low fprs and high corresponding drs.

the highest results, at least to an extent, and therefore all are needed in order to form a good classifier.

Fig. 7, on the other hand, shows the mean detection rate and false positive rate for the elements of C that contain 1–12 features. One can clearly identify that the fpr is decreasing monotonically as more features are present in the feature vector, while the dr is initially decreasing but, after at least four features are present in the feature vector, it starts rising again. We see a small divergence for thmmy, where dr starts falling a bit after 10 features are present; but as we have explained before, due to its hardened configuration, almost half of the malicious commands return an error which is confusing the detection engine, so we cannot expect thmmy's behaviour to always agree with the norm, as we cannot always explain the norm from thmmy's point of view. On the other hand, the difference in behaviour of fpr to dr can be easily explained: when the feature space is comprised of fewer features (less than 3 in Fig. 7), the detection mechanism cannot be trained well and fpr is very high while dr is very high as well; this means that most monitored activity, malicious or not, is recognised as attack. When more features are added to the feature space, the IDS starts learning benign data more accurately, and so the dr initially decreases, but then rises again while the fpr keeps decreasing until all features are present and the feature space is more informative.

5.5. Experiment set 3: feature space validation

Results from Experiment set 1 indicate that the model implemented can perform well on all datasets. Results from Experiment set 2 depict that our feature selection process does not lead to superfluous features. What remains to be answered is whether (a) the selected features are mapping the input to a highly discriminative space – thus creating a relatively easy anomaly detection problem which more than one machine learning algorithm is able to solve – or if the machine learning problem is hard, yet SVM managed to solve it, and (b) if this feature space performs better than alternatives found in the related literature.

To gain insight regarding question (a), a second novelty detection algorithm was chosen, namely GMM, with the configuration explained in Section 5.1. We have used the baseline training sets for each server as explained in Experiment set 1.

We ran GMM for all our datasets, and as we can see from Table 7, the results clearly validate the effectiveness of the proposed IDS. fprs are significantly lower than the corresponding ones on SVM (Table 6) and for some datasets (thmmy, www ee) the fp rate is close


to or even lower than 1%; the lower fprs of GMM compared to those of SVM should not necessarily surprise us, as stated in [22]. Moreover, with the exception of the thmmy dataset, drs are very close to those of Table 6. It should be mentioned that, by raising the log-probability density threshold, one can raise the detection rate with a trade-off in false positive rate and vice versa, but we have chosen to pick the log-probability threshold without borrowing knowledge from mts in order to keep our IDS unsupervised.

Furthermore, to answer the comparison question (b), we implemented the algorithm used by FWRAP, PAD [35], and ran it with both feature sets – ours and FWRAP's – on all three servers, using the same training sets and evaluation sets that have been described earlier. A detailed explanation of how the existing datasets have been transformed to match PAD's needs has been presented in Section 5.2.3. With the results of these experiments, the ROC curves of Figs. 8 and 9 have been calculated, which belong to PAD running with our feature set and with FWRAP's respectively. As a comparison metric we have used the computed area under each ROC curve; as Figs. 8 and 9 depict, PAD's area when running with our feature set is greater than its area when running with FWRAP's feature set for all three datasets, indicating that our feature set makes PAD perform better than FWRAP's feature set does.
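The area under a ROC curve can be computed directly from per-record anomaly scores; a minimal sketch on hypothetical scores (the data below is illustrative, not from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical anomaly scores (higher = more anomalous); y = 1 marks records
# from the malicious test set (mts), y = 0 records from the normal one (nuts).
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.30, 0.20, 0.40, 0.90, 0.70, 0.35])

auc = roc_auc_score(y_true, scores)  # area under the ROC curve
```
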

What is equally interesting is that these results, apart from showing that our feature set is more descriptive with respect to intrusion detection than FWRAP's, implicitly strengthen the results referring to question (a) as well. That is because PAD is a very simple decision-making algorithm that performs well when using our feature set, and therefore another simple one-class classifier using our feature set can also perform well on all datasets, proving once more that our feature space is highly discriminative.
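The ROC-area metric used in these comparisons can be computed with scikit-learn, which the paper already relies on [25]. The scores below are synthetic stand-ins for a detector's per-record anomaly scores, not the paper's measurements:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Hypothetical anomaly scores: higher = more anomalous (illustrative data only)
scores_normal = rng.normal(0.0, 1.0, 1000)  # scores on normal records
scores_attack = rng.normal(2.0, 1.0, 200)   # scores on malicious records

y_true = np.concatenate([np.zeros(1000), np.ones(200)])
y_score = np.concatenate([scores_normal, scores_attack])

# Area under the ROC curve: a threshold-free summary used to compare detectors
auc = roc_auc_score(y_true, y_score)
print(0.5 < auc < 1.0)
```

A detector whose scores separate the two classes better yields a larger area, which is exactly the quantity compared between the two feature sets in Figs. 8 and 9.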

Table 7
Detection rates (dr), false positive rates (fpr) and training set sizes (tss) for GMM on all datasets.

Dataset   dr       fpr     tss
vergina   98.02%   1.98%   40,000
www ee    95.85%   0.65%   10,000
thmmy     83.38%   1.1%    30,000


Fig. 7. Mean fprs and drs scored by different numbers of features for each dataset.

Fig. 8. ROC curves of all datasets using PAD algorithm with our feature set.

Fig. 9. ROC curves of all datasets using the FWRAP IDS.

Table 8
Detection rates (dr), false positive rates (fpr) and training set sizes (tss) for FWRAP on all datasets.

Dataset   dr       fpr      tss
vergina   99.81%   20.12%   40,000
www ee    98.73%   13.33%   10,000
thmmy     96.33%   14.34%   30,000

…rates. Moreover, as we have explained in Section 5.2, there are no publicly available datasets that can be used with FI2DS, and therefore no direct comparison of FI2DS against another IDS can be made through scores on a common benchmark. This comparison issue has been addressed by implementing the FWRAP IDS, training and evaluating it on all our datasets, and calculating its detection rates, false positive rates (Table 8) and ROC curves (Fig. 9). Furthermore, we calculated the ROC curves for GMM (Fig. 10) on all servers with the same datasets in order to directly compare FI2DS with FWRAP.

As one can notice from Table 8, FWRAP achieves high detection rates, but also relatively high false positive rates, especially in comparison with the associated results of FI2DS running with SVM (Table 6) and GMM (Table 7). There was no threshold we could find

Fig. 10. ROC curves of all datasets using the GMM IDS.


Table 9
Best window size and threshold combinations with corresponding detection rates (dr) and false positive rates (fpr) on all datasets.

Dataset   Window size   Threshold   dr       fpr
vergina   6             51–60%      95.92%   0.081%
www ee    6             51–60%      95.74%   0.00093%
thmmy     6             51–60%      91.30%   0.02%

Table 10
t-statistic, p-value and training set sizes (tss) for t-test on FI2DS running with PAD vs FWRAP.

Table 11
t-statistic, p-value and training set sizes (tss) for t-test on FI2DS vs FWRAP (H1 : μFI2DS > μFWRAP).

we can conclude with high confidence (1.46 × 10^−281 ≤ p ≤ 1.37 × 10^−63) that FI2DS outperforms FWRAP on all datasets, since the

for FWRAP that would make the false positive rates fall any lower, and hence the peculiar endings of the related ROC curves (generated by scipy [11]). Moreover, from Figs. 9 and 10 it is apparent that the FI2DS ROC area for each server is greater than the corresponding FWRAP area, which clearly indicates that FI2DS performs better than FWRAP.

5.7. Experiment set 5: moving window evaluation

In all previous experiments, detection and false positive rates have been calculated for each record separately. This implies that if an attack consists of 800 records, 760 of which are labeled malicious, the aforementioned mechanism fires 760 different alarms. Since this behaviour is not practical at all for live systems, we added a moving window mechanism to FI2DS that fires an alarm whenever the percentage of malicious records it contains exceeds a certain threshold. In this method, each record is still screened for malicious intent, but the alarm-firing process is based on aggregation. We have deliberately chosen the simplest aggregate metric (percentage of malicious records to total number of records in the window) so as to see how the simplest moving window implementation would affect our detection mechanism.
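A minimal sketch of such a moving-window alarm, assuming per-record verdicts are already available from the classifier. The window size and the >50% threshold mirror the best combinations of Table 9, but the class API itself is hypothetical:

```python
from collections import deque

class MovingWindowAlarm:
    """Hypothetical sketch: fire when the fraction of malicious records
    among the last `size` records exceeds `threshold`."""

    def __init__(self, size=6, threshold=0.5):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def push(self, is_malicious: bool) -> bool:
        """Add one per-record verdict; return True if an alarm fires."""
        self.window.append(is_malicious)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        return sum(self.window) / len(self.window) > self.threshold

# 4 of the last 6 records malicious -> 66% > 50% threshold -> alarm
w = MovingWindowAlarm(size=6, threshold=0.5)
verdicts = [False, True, True, False, True, True]
print(any(w.push(v) for v in verdicts))
```

Aggregating verdicts this way collapses the hundreds of per-record alarms of a single attack into one alarm per window crossing.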

As far as the datasets used in this experiment are concerned, our ts have remained the same (baseline ts from Experiment set 1) but our mts and nuts have changed to meet the experiment's demands. We created an mts that is comprised of malicious and innocuous regions of audit records, where each malicious region contains a malicious command's audit records and each innocuous region contains 1000 audit records collected from nus. nuts, on the other hand, comprises the whole nus excluding the records used for ts. The moving window slides through these new mts and nuts and once it recognizes an attack it fires an alarm.

In this experiment set, moving window sizes (mws) ranging from 2 to 40 and various percentage thresholds (pt) ranging from 10% to 60% have been tested, and a total of 48 different combinations have been found that on all datasets scored tprs >90% with corresponding fprs <10^−3. From these 48 combinations, 10 scored the best results on all three datasets, and these are presented in Table 9. We used GMM as our detection mechanism with the configuration of Section 5.1, due to GMM's lower fprs compared to SVM's and PAD's running with our feature set; fprs play a very significant role, especially in systems like ours where there are lots of feature vectors in our sensor's input, thus forcing us to keep fprs as low as possible without significant decline in corresponding drs.

As we see in Table 9, fprs have fallen 2–4 orders of magnitude while maintaining high drs, even though we did not use any false positive reduction technique (e.g. training a second classifier to learn false positives, etc.). This time, thmmy's dataset scores significantly higher than it did in our previous examples, vindicating our decision not to exclude the commands that failed to execute from its mts.

6. Discussion

Throughout this series of experiments, FI2DS has been evaluated against various tests to assess it as an anomaly detection IDS. In these experiments, we have initially shown that the feature

Dataset   t-statistic   p-value          tss
vergina   81.20         0.0              40,000
www ee    37.66         3.47 × 10^−194   10,000
thmmy     35.60         3.29 × 10^−180   30,000

set presented in this paper can be utilized successfully for intrusion detection when the detection engine is based on one-class SVM, yielding high detection rates and low false positive rates. Additionally, we have shown that all members of our feature vector are to some degree necessary during the intrusion detection process, so none of them could be left out. Furthermore, we have argued that SVM can be replaced with less complex algorithms, like GMM and PAD, without deteriorating the detection performance of FI2DS, indicating that our feature space is highly discriminative with respect to intrusion detection.

For comparison purposes, FI2DS has been evaluated against FWRAP, an alternate file-system-based IDS from the relevant literature that introduces a different set of features (see Section 5.2.3) and a different decision engine [35]. The two IDS's have been compared based on detection rates, false positive rates and ROC areas, where FI2DS outperformed FWRAP in all comparisons, and the comparison results were validated for statistical significance. Furthermore, a ROC area comparison has been held between PAD running with our feature set and PAD running with FWRAP's feature set to directly compare the two feature sets, and the results of this comparison were in favour of our approach.

To further strengthen our comparison findings and test whether our results are statistically significant, we performed a t-test (hypothesis test) based on the ROC areas calculated during 1000 runs of FI2DS running with GMM, FWRAP, and FI2DS running with PAD on each server, using the same training and evaluation sizes as already mentioned. On each run, a new randomly selected contiguous training set and a new randomly selected evaluation set of appropriate size were created, and these sets were used by all three IDS's to calculate the corresponding ROC areas.

By t-testing the results of PAD and FWRAP using the following null and alternative hypotheses:

H0 : μPAD = μFWRAP
H1 : μPAD > μFWRAP    (9)

we received the results depicted in Table 10. As one can deduce with high certainty (level of significance 0 < p < 3.47 × 10^−194), the expected ROC area of FI2DS running with PAD is greater than the respective expected ROC area of FWRAP, meaning that our feature set is more descriptive with respect to intrusion detection than the feature set used by FWRAP.
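The one-sided two-sample t-test described above can be sketched with scipy [11]. The ROC-area samples below are synthetic stand-ins for the 1000 per-run areas, not the paper's measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical ROC areas over 1000 randomized train/evaluation splits
roc_pad = rng.normal(loc=0.97, scale=0.01, size=1000)    # PAD with our feature set
roc_fwrap = rng.normal(loc=0.90, scale=0.03, size=1000)  # FWRAP

# One-sided two-sample t-test: H0 mu_PAD = mu_FWRAP vs H1 mu_PAD > mu_FWRAP
t_stat, p_value = stats.ttest_ind(roc_pad, roc_fwrap, alternative="greater")
print(t_stat > 0 and p_value < 0.01)
```

A large positive t-statistic with a tiny p-value rejects H0 in favour of H1, which is how the values of Tables 10 and 11 are read.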

Moreover, by t-testing the results of FI2DS and FWRAP using the following null and alternative hypotheses and evaluating the results of Table 11:

H0 : μFI2DS = μFWRAP
H1 : μFI2DS > μFWRAP    (10)

Dataset   t-statistic   p-value          tss
vergina   51.15         1.46 × 10^−281   40,000
www ee    46.56         7.99 × 10^−253   10,000
thmmy     18.06         1.37 × 10^−63    30,000


expected ROC area of FI2DS is highly likely larger than the expected ROC area of FWRAP.

Finally, a real-life implementation of FI2DS was presented that uses a sliding window mechanism and achieves high detection rates with low corresponding false positive rates on all our servers; in the best case (see Table 9) it reached a 95.74% detection rate with an associated 0.00093% false positive rate.

7. Conclusion and further work

We have presented FI2DS, a File system Intrusion Detection System that performs anomaly detection based on file system BSM audit records. An audit sensor gathers incoming audit records and a pre-processing step filters them to obtain those that are relevant to the monitored daemon. A feature extraction step computes a set of features from each audit record, and a machine learning anomaly detection model determines whether each record corresponds to normal behaviour or whether an alarm should be raised.

Two categories of features are extracted, frequency and binary. Frequency features allow us to compute the probability of access of files and directories with specific modes, based on previously collected audit data. Binary features define qualitative characteristics of the accessed files and directories, such as whether they correspond to executable files, configuration files, library files, etc. For anomaly detection, two different unsupervised approaches are employed in FI2DS, namely one-class SVMs with RBF kernel and full covariance matrix GMMs.
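A toy illustration of the two feature categories, under the assumption of simple path-prefix rules; the paper's actual feature definitions are richer and are specified in its feature-extraction section:

```python
from collections import Counter

# Hypothetical sketch only: access counts stand in for previously
# collected audit data, and the prefix rules are illustrative.
access_counts = Counter()  # path -> number of accesses seen in training audit data

def frequency_feature(path: str) -> float:
    """Relative access frequency of a path, estimated from prior audit data."""
    total = sum(access_counts.values())
    return access_counts[path] / total if total else 0.0

def binary_features(path: str) -> list[int]:
    """Qualitative traits of the accessed path (illustrative categories only)."""
    return [
        int(path.startswith(("/bin", "/usr/bin", "/usr/sbin"))),  # executable area
        int(path.startswith("/etc")),                             # configuration
        int("/lib" in path),                                      # library
    ]

access_counts.update(["/etc/passwd", "/etc/passwd", "/usr/bin/perl"])
print(frequency_feature("/etc/passwd"), binary_features("/etc/passwd"))
```

The concatenation of such frequency and binary values per audit record forms the feature vector consumed by the one-class classifier.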

To experimentally validate our approach, we generated three datasets from high-traffic real-world web servers, hosting a total of 35 web sites. The results of our experiments have been compared with those of an alternate IDS (FWRAP) and have been proven to be better.

FI2DS uses a moving window to assess audit records and react to an attack. Results of our experiments show that the proposed approach is highly effective for intrusion detection on web servers, reaching high detection rates and low false positive rates. Furthermore, good results were obtained with different schemes, dictating that the proposed features are highly discriminative for FI2DS, so that even relatively simple machine learning algorithms perform well. At the same time, analysis with all possible feature subsets shows that all of the proposed features are useful, so feature redundancy is negligible.

FI2DS is the first IDS to use only the path attribute of BSM audit records for anomaly detection and the first one to introduce the specific feature space. Experimental results show that our approach is highly promising and encourage further research work in this direction. Examples include evaluation of FI2DS for other daemons beyond web servers, evaluation on data collected from real attack environments (War Games), false positive analysis and reduction, cluster analysis of the proposed feature space, and application of more sophisticated moving window algorithms in the decision engine.

Acknowledgments

We would like to thank the Department of Electrical and Computer Engineering of the Aristotle University of Thessaloniki, Greece, and their sys-admin team for allowing us to collect data from their servers, and Miltos Allamanis for his directions on how to use GMMs for our novelty detection problem.

References

[1] C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov. 2 (2) (1998) 121–167.


[2] W. Chen, S. Hsu, H. Shen, Application of SVM and ANN for intrusion detection, Comput. Oper. Res. 32 (10) (2005) 2617–2634.

[3] E. Corchado, Á. Herrero, Neural visualization of network traffic data for intrusion detection, Appl. Soft Comput. 11 (2) (2011) 2042–2056.

[4] D. Denning, An intrusion-detection model, IEEE Trans. Softw. Eng. 2 (1987) 222–232.

[5] S. Forrest, S. Hofmeyr, A. Somayaji, T. Longstaff, A sense of self for unix processes, in: IEEE Symposium on Security and Privacy, 1996 Proceedings, IEEE, 1996, pp. 120–128.

[6] S. Forrest, A. Perelson, L. Allen, R. Cherukuri, Self-nonself discrimination in a computer, in: IEEE Computer Society Symposium on Research in Security and Privacy, 1994 Proceedings, IEEE, 1994, pp. 202–212.

[7] P. Gogoi, B. Borah, D.K. Bhattacharyya, Anomaly detection analysis of intrusion data using supervised & unsupervised approach, J. Converg. Inf. Technol. 5 (1) (2010) 95–110.

[8] J. Hochberg, K. Jackson, C. Stallings, J. McClary, D. DuBois, J. Ford, NADIR: an automated system for detecting network intrusion and misuse, Comput. Secur. 12 (3) (1993) 235–248.

[9] S. Hofmeyr, S. Forrest, A. Somayaji, Intrusion detection using sequences of system calls, J. Comput. Secur. 6 (3) (1998) 151–180.

[10] S.-J. Horng, M.-Y. Su, Y.-H. Chen, T.-W. Kao, R.-J. Chen, J.-L. Lai, C.D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Syst. Appl. 38 (1) (2011) 306–313.

[11] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open Source Scientific Tools for Python, 2001, http://www.scipy.org/scipylib/citing.html.

[12] I. Kang, M. Jeong, D. Kong, A differentiated one-class classification method with applications to intrusion detection, Expert Syst. Appl. 39 (4) (2012) 3899–3905.

[13] G. Kou, Y. Peng, Z. Chen, Y. Shi, Multiple criteria mathematical programming for multi-class classification and application in network intrusion detection, Inf. Sci. 179 (4) (2009) 371–381.

[14] S. Kumar, E. Spafford, A pattern matching model for misuse intrusion detection, in: Proceedings of the 17th National Computer Security Conference, 1994.

[15] T. Lane, C. Brodley, Sequence matching and learning in anomaly detection for computer security, in: AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, 1997, pp. 43–49.

[16] W. Lee, S. Stolfo, P. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, J. Zhang, Real time data mining-based intrusion detection, in: DARPA Information Survivability Conference and Exposition II, 2001, DISCEX'01, Proceedings, vol. 1, IEEE, 2001, pp. 89–100.

[17] Z. Li, Z. Chen, S. Srinivasan, Y. Zhou, C-Miner: mining block correlations in storage systems, in: Proceedings of the 3rd USENIX Conference on File and Storage Technologies, vol. 186, USENIX Association, 2004.

[18] Y. Liao, V. Vemuri, Use of k-nearest neighbor classifier for intrusion detection, Comput. Secur. 21 (5) (2002) 439–448.

[19] U. Lindqvist, P. Porras, eXpert-BSM: a host-based intrusion detection solution for Sun Solaris, in: Proceedings of the 17th Annual Computer Security Applications Conference, ACSAC 2001, IEEE, 2001, pp. 240–251.

[20] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. Cunningham, et al., Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation, in: DARPA Information Survivability Conference and Exposition, DISCEX'00, Proceedings, vol. 2, IEEE, 2000, pp. 12–26.

[21] R. Lippmann, J. Haines, D. Fried, J. Korba, K. Das, The 1999 DARPA off-line intrusion detection evaluation, Comput. Netw. 34 (4) (2000) 579–595.

[22] L. Manevitz, M. Yousef, One-class document classification via neural networks, Neurocomputing 70 (7) (2007) 1466–1481.

[23] G.J. McLachlan, K.E. Basford, Mixture Models: Inference and Applications to Clustering, Statistics: Textbooks and Monographs, vol. 1, Dekker, New York, 1988.

[24] D. Mutz, F. Valeur, G. Vigna, C. Kruegel, Anomalous system call detection, ACM Trans. Inform. Syst. Secur. 9 (1) (2006) 61–93.

[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.

[26] S. Peisert, M. Bishop, S. Karin, K. Marzullo, Analysis of computer intrusions using sequences of function calls, IEEE Trans. Depend. Secure Comput. 4 (2) (2007) 137–150.

[27] P. Porras, P. Neumann, EMERALD: event monitoring enabling response to anomalous live disturbances, in: Proceedings of the 20th National Information Systems Security Conference, 1997, pp. 3–365.

[28] M. Roesch, et al., Snort: lightweight intrusion detection for networks, in: LISA, 1999, pp. 229–238.

[29] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, R. Williamson, Estimating the support of a high-dimensional distribution, Neural Comput. 13 (7) (2001) 1443–1471.

[30] M. Sebring, E. Shellhouse, M. Hanna, R. Whitehurst, Expert systems in intrusion detection: a case study, in: Proceedings of the 11th National Computer Security Conference, 1988, pp. 4–81.

[31] K. Sequeira, M. Zaki, ADMIT: anomaly-based data mining for intrusions, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2002, pp. 386–395.

[32] S. Soft, Sunshield Basic Security Module Guide, Sun Microsystems, 1995.

[33] P. Stanton, W. Yurcik, L. Brumbaugh, FABS: file and block surveillance system for determining anomalous disk accesses, in: Information Assurance Workshop, 2005, IAW'05, Proceedings from the Sixth Annual IEEE SMC, IEEE, 2005, pp. 207–214.

[34] S. Stolfo, S. Hershkop, L. Bui, R. Ferster, K. Wang, Anomaly detection in computer security and an application to file system accesses, Found. Intell. Syst. (2005) 14–28.

[35] S.J. Stolfo, F. Apap, E. Eskin, K. Heller, S. Hershkop, A. Honig, K. Svore, A comparative evaluation of two algorithms for windows registry anomaly detection, J. Comput. Secur. 13 (4) (2005) 659–693.

[36] C.-F. Tsai, C.-Y. Lin, A triangle area based nearest neighbors approach to intrusion detection, Pattern Recognit. 43 (1) (2010) 222–229.

[37] D. Wagner, P. Soto, Mimicry attacks on host-based intrusion detection systems, in: Proceedings of the 9th ACM Conference on Computer and Communications Security, ACM, 2002, pp. 255–264.

[38] W. Wang, X. Guan, X. Zhang, L. Yang, Profiling program behavior for anomaly intrusion detection based on the transition and frequency property of computer audit data, Comput. Secur. 25 (7) (2006) 539–550.

[39] C. Warrender, S. Forrest, B. Pearlmutter, Detecting intrusions using system calls: alternative data models, in: Proceedings of the 1999 IEEE Symposium on Security and Privacy, IEEE, 1999, pp. 133–145.

[40] J. Xu, C. Shelton, Intrusion detection using continuous time bayesian networks, J. Artif. Intell. Res. 39 (1) (2010) 745–774.

[41] D. Yeung, Y. Ding, Host-based intrusion detection using dynamic and static behavioral models, Pattern Recognit. 36 (1) (2003) 229–243.

[42] Z. Zhang, H. Shen, Application of online-training SVMs for real-time intrusion detection with different considerations, Comput. Commun. 28 (12) (2005) 1428–1442.

