Research Article
Fusion of Heterogeneous Intrusion Detection Systems for Network Attack Detection
Jayakumar Kaliappan,1 Revathi Thiagarajan,2 and Karpagam Sundararajan1
1 Computer Science and Engineering, Kamaraj College of Engineering and Technology, Tamilnadu 626 001, India
2 Information Technology, Mepco Schlenk Engineering College, Tamilnadu 626 005, India
Correspondence should be addressed to Jayakumar Kaliappan k jeyakumar1979yahoocoin
Received 31 March 2015; Revised 15 June 2015; Accepted 1 July 2015
Academic Editor: Juan M. Corchado
Copyright © 2015 Jayakumar Kaliappan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
An intrusion detection system (IDS) helps to identify different types of attacks in general, and the detection rate will be higher for some specific categories of attacks. This paper is designed on the idea that each IDS is efficient in detecting a specific type of attack. In the proposed Multiple IDS Unit (MIU), there are five IDS units, and each IDS follows a unique algorithm to detect attacks. Feature selection is done with the help of a genetic algorithm. The selected features of the input traffic are passed on to the MIU for processing. The decision from each IDS is termed the local decision. The fusion unit inside the MIU processes all the local decisions with the help of the majority voting rule and makes the final decision. The proposed system shows a very good improvement in detection rate and reduces the false alarm rate.
1 Introduction
An intrusion detection system (IDS) monitors the behavior of a given environment and identifies whether activities are malicious or legitimate. There are two common approaches to intrusion detection: misuse detection and anomaly detection. Misuse detection via signature verification compares a user's actions with the known signatures of attackers attempting to enter a system. It is useful for finding known intrusion types, but it cannot detect new attacks [1]. Anomaly detection identifies behavior that differs from well-known statistical patterns for users, systems, or networks. Machine learning techniques are used to capture the normal usage patterns and classify new behavior as either normal or anomalous. In spite of their capability to detect unknown attacks, anomaly detection systems result in a high false alarm rate [2]. Anomaly detection can be combined with signature verification to identify attacks.
Feature selection is the most crucial step in constructing any intrusion detection system [3]. A set of attributes or features that are identified to be the most effective is extracted in order to construct a suitable IDS. Identifying the features that are relevant to the learning algorithm is a challenge. In some cases, redundant features lead to noisy data that distract the learning algorithm, degrade the accuracy of the IDS, and slow down the training and testing processes. Feature selection has been shown to have a high impact on the performance of the classifiers. Experiments show that feature selection can reduce the building and testing time of a classifier.
Multiclassifier Systems (MCSs) focus on the grouping of classifiers with heterogeneous or homogeneous modeling backgrounds to give the final outcome. MCSs perform well when the data sample available for learning is very sparse. In the scarcity case, MCSs can use bootstrapping methods such as bagging or boosting [4]. MCSs allow training classifiers on partitions of a data set and combining their results using appropriate combination rules. Two canonical topologies are used in designing MCSs: parallel and serial. In the parallel topology, each classifier is supplied with the same input data, so that the final decision of the combined classifier is made on the basis of the outputs of each classifier obtained separately. Alternatively, in the serial (or conditional) topology, each classifier is applied in a certain order, implying some kind of grade or ordering over them.
Hindawi Publishing Corporation, The Scientific World Journal, Volume 2015, Article ID 314601, 8 pages, http://dx.doi.org/10.1155/2015/314601
The rest of the paper is organized as follows. Section 2 enumerates related works. The proposed methodology is dealt with in detail in Section 3, with the algorithms for training and testing multiple IDSs. Section 4 discusses the performance evaluation of the experiments in detail with the results. Section 5 presents the summary of the study.
2 Related Works
Thomas and Balakrishnan [5] have optimized the performance of IDS using fusion of multiple IDSs. The assignment of a weight for each IDS is outlined in their paper, and the weights are aggregated to take a correct decision. The DARPA 1999 data set used to evaluate the IDSs is outdated. It contains many redundant records, and so it affects classifier accuracy. In their method, binary values are used to decide attack or normal. Giacinto et al. [6] proposed a pattern-recognition approach based on the fusion of multiple classifiers for network intrusion detection. It provides a better tradeoff between generalization abilities and false alarm generation. Unfortunately, the performances of fusion rules on unknown attacks show no improvement over the results obtained by the individual networks. No fusion rule provides improvements on the performances of the neural network trained on the overall feature set that attains the same performance of oracle. Siraj et al. [7] proposed the Decision Engine of an Intelligent Intrusion Detection System (IIDS) that fuses information from different intrusion detection sensors using an artificial intelligence technique. Like neural networks, it cannot do self-learning and self-training. There is no functionality for customizing the standard attack. Parikh and Chen [8] proposed an ensemble of classifiers to combine data from various sources and reduce the cost of false alarms. The DLEARNIN and DCMS algorithms are used for this purpose. In their paper, sum and product rules are not used, and outputs are not directly compared. Giacinto et al. [9] proposed an unsupervised anomaly-based IDS. A combination of one-class classifiers is used in their work, each module being designed with distinct features for training. For high values of false alarm rate, the system gives a low detection rate. Li et al. [10] constructed a compact data set by clustering redundant data. Features are reduced from 41 to 19 using clustering, and the use of ant colony optimization improved the efficiency of intrusion detection. The combination of the critical features used in this method could not distinguish between attackers and normal users. Sung and Mukkamala [11] removed one feature at a time to carry out an experiment on SVM and neural networks. The KDDCup'99 data set has been used to verify this technique. For five-class classification, only the 19 most significant of the 41 features are used. Li et al. [12] proposed a wrapper-based feature selection algorithm to construct a lightweight IDS. They applied a modified Random Mutation Hill Climbing (RMHC) as the search strategy and a modified linear SVM as the evaluation criterion. This method speeds up the process of selecting features and gives a high detection rate for the IDS. Since the types of intruders are wide ranging in today's information era, the scope for designing an improved IDS is high, which motivates the proposed work.
3 The Proposed System
3.1 Motivation. With the advent of online business and social networks, the genuineness of the information available on the Internet has come into question. Many human and robot based intruders act aggressively to take advantage of this information. Also, the attacks on the Internet are nondeterministic in nature, which makes them very complex to detect and react to. Most present-day stand-alone intrusion detection systems are not capable of achieving a reasonably high detection rate and a low false alarm rate. Most of the existing works on IDSs detect a certain class of attack with improved accuracy while performing moderately for the other classes of attacks. A more reliable and accurate decision for a wider class of attacks can be obtained by combining the decisions of multiple intrusion detection systems.
Modern processors are fast enough that combining multiple IDSs is not a significant issue from the computation point of view, and best-of-breed solutions have been achieved earlier. A better analysis of existing data gathered by various individual IDSs can detect many attacks that currently go undetected. From the literature survey, it is learnt that the use of appropriate feature selection techniques simplifies the models to make them easier to interpret, shortens the training times, and enhances generalization by reducing overfitting. The challenges in designing and deploying IDSs are increasing due to the wider reach of Internet services and the nonavailability of a standard procedure for characterizing intruders.
3.2 The Proposed System Architecture. Anomaly-based IDSs identify abnormal or unusual behaviors on a network and tag them as attacks. They do not need any specific knowledge of the attacks. The disadvantage of this method is that it produces a larger number of false alarms. A signature-based IDS is well versed in detecting attacks that match a predefined pattern, and it produces very few false alarms. The fusion of signature-based and anomaly-based techniques is done for three main reasons. First, the false alarm rate should be minimal, and this is only possible with a signature-based IDS. Second, any IDS has to identify new attacks, and this is possible through anomaly-based techniques. Third, every IDS is efficient in detecting specific types of attack. For example, an anomaly-based IDS is suitable for detecting DOS and R2L type attacks, and a signature-based IDS is good for detecting U2R and PROBE, which can be inferred from Table 6. The fusion of signature-based and anomaly-based techniques will be able to detect more attacks with a lower false alarm rate. The proposed system consists of a Multiple IDS Unit (MIU), which contains five IDS units following five different algorithms.
The proposed system architecture is shown in Figure 1. It contains three phases of work. In the first phase, feature selection is done with the help of information gain (IG) and a genetic algorithm (GA). There are 41 features in total in the KDDCup'99 data set. Certain features are irrelevant or not needed for the IDS.
Figure 1: The proposed system architecture. The input (X) with features f1, f2, f3, f4, ..., f40, f41 passes through feature selection (information gain + genetic algorithm) into the MIU, which contains IDS-1 (SVM, anomaly-based), IDS-2 (IBK, anomaly-based), IDS-3 (J48, signature-based), IDS-4 (RandomForest, anomaly-based), and IDS-5 (BayesNet, signature-based). The decision fusion unit produces the final decision, output (Y).
Input: Feature set FS[ ]
Output: An array IG[ ] populated with the information gain value for each feature
Initialize i = 0
foreach (F in FS)
    IG[i] = IGR(F)
    i++
endfor
Algorithm 1: Information gain calculation.
Input: Binary chromosome[41]
Output: Information gain sum with feature count
Initialize igsum = 0, fcnt = 0
for (i = 0 to 40)
    if (chromosome[i] == 1) then
        igsum = igsum + IG[i]
        fcnt = fcnt + 1
    endif
endfor
Algorithm 2: Maximum information gain with minimum feature count algorithm.
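Algorithm 2 can be sketched in Python as follows. The function `igsum_and_fcnt` mirrors the pseudocode directly. The paper does not state how the two outputs are combined into a single fitness value, so the `fitness` function and its `penalty` weight are hypothetical, shown only to illustrate one way of rewarding a high information gain sum while penalizing a large feature count. The toy information gain values are also made up.

```python
from typing import Sequence, Tuple


def igsum_and_fcnt(chromosome: Sequence[int], ig: Sequence[float]) -> Tuple[float, int]:
    """Algorithm 2: sum the information gain of the selected features and count them."""
    if len(chromosome) != len(ig):
        raise ValueError("chromosome and IG array must have the same length")
    igsum = 0.0
    fcnt = 0
    for bit, gain in zip(chromosome, ig):
        if bit == 1:  # the feature at this position is selected
            igsum += gain
            fcnt += 1
    return igsum, fcnt


def fitness(chromosome: Sequence[int], ig: Sequence[float], penalty: float = 0.01) -> float:
    """Hypothetical scalarization (an assumption, not taken from the paper):
    reward a large igsum and penalize each selected feature by `penalty`."""
    igsum, fcnt = igsum_and_fcnt(chromosome, ig)
    return igsum - penalty * fcnt


# Toy example with four features instead of 41 (values are illustrative only).
toy_ig = [0.5, 0.0, 0.25, 0.125]
toy_chromosome = [1, 0, 1, 0]
toy_igsum, toy_fcnt = igsum_and_fcnt(toy_chromosome, toy_ig)
```

With this scalarization, dropping a feature whose information gain is below `penalty` always raises the fitness, which is one simple way to express "maximum gain with minimum feature count".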
When all 41 features of the input traffic are taken for processing, processing is delayed and the output is less accurate. Experimenting with all the combinations of the features is exponentially complex, since 41 features give 2^41 possible subsets. Hence only the relevant features are chosen with the help of the genetic algorithm (Algorithms 1 and 2). The selected features are given as input. The feature selection phase helps in drawing out the relevant features. This increases classifier accuracy and reduces computation time.
In the second phase, the output from the first phase (i.e., input traffic with the selected features alone) is given as input to the MIU, and the output is the local decision ($y_i$), which categorizes the input traffic (DOS, PROBE, U2R, R2L, or NORMAL). Five IDSs, each with a unique algorithm, are present in the MIU. The five IDS algorithms used are Support Vector Machines (SVM) [13], IBK, RandomForest, J48, and BayesNet. SVM, IBK, and RandomForest come under the category of anomaly-based IDS [1, 2]. J48 and BayesNet come under the category of signature-based IDS [1]. Every IDS algorithm in the MIU (Algorithm 3) receives the input traffic data record and classifies it, so that five outputs (local decisions) $y_1, y_2, \ldots, y_5$ are obtained for every input record.
In the third phase, the output from each IDS$_i$ in the MIU, considered as the local decision ($y_i$), is passed on to the categorization unit. The input traffic categories are divided into two groups: ATTACK and NOT A ATTACK. The traffic categories DOS, PROBE, U2R, and R2L are labeled as the
Algorithm MIU
Input: Input traffic data record; F, the set of all features
Output: Whether the traffic data record is ATTACK or NOT A ATTACK
Process:
(1) Find the information gain for each feature in F and store it in IG, following Algorithm 1.
(2) Select the features f'' with the genetic algorithm, using Algorithm 2 as the fitness function.
(3) Pass the input traffic data record with f'' into the classification algorithm (SVM), which returns the attack category for each input traffic data record.
(4) Repeat Step (3) with the other classification algorithms: IBK, J48, RandomForest, and BayesNet.
(5) For each input traffic data record there are now five local decisions y_1, y_2, ..., y_5 from the five classification algorithms.
(6) Each local decision y_i is labeled as yy_1 or yy_2, where yy_1 stands for ATTACK and yy_2 stands for NOT A ATTACK:
    If (y_i == "DOS" or y_i == "PROBE" or y_i == "U2R" or y_i == "R2L")
        Then y_i = yy_1
        Else y_i = yy_2
(7) For each input traffic data record, the decision from each of the five IDS units is either yy_1 or yy_2. Count the number of yy_1 and yy_2 decisions:
    If (count(yy_1) >= 3)
        Final decision = yy_1
    Else
        Final decision = yy_2
Algorithm 3: The proposed system algorithm.
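Figure 2: Fusion process. The local decisions y1 to y5 from IDS-1 to IDS-5 pass through the categorizer, which maps each of them to yy1 or yy2 (three yy1 and two yy2 in the depicted example). The fusion unit then outputs ATTACK or NOT ATTACK.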
ATTACK group. NORMAL is labeled as the NOT A ATTACK group. For example, if the output ($y_2$) from IDS-2 is PROBE, then it falls under the ATTACK group. The fusion process is depicted in Figure 2. The output $yy_i$ from the categorization unit for each local decision ($y_i$) is taken to the decision unit, and the global decision ($z$) is taken based on the majority voting rule. If at least 3 out of 5 outputs from the categorization unit suggest $yy_1$ (ATTACK), then the decision unit decides that the input traffic is of ATTACK type; otherwise it is NOT A ATTACK.
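The categorization and majority voting steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `categorize` and `fuse` and the `threshold` parameter are hypothetical, while the category labels and the 3-out-of-5 rule come from the text.

```python
from collections import Counter
from typing import Iterable

ATTACK = "ATTACK"            # yy_1 in the text
NOT_ATTACK = "NOT A ATTACK"  # yy_2 in the text
ATTACK_CATEGORIES = {"DOS", "PROBE", "U2R", "R2L"}


def categorize(local_decision: str) -> str:
    """Categorization unit: map a five-class local decision y_i to yy_1 or yy_2."""
    return ATTACK if local_decision.upper() in ATTACK_CATEGORIES else NOT_ATTACK


def fuse(local_decisions: Iterable[str], threshold: int = 3) -> str:
    """Decision unit: majority vote over the categorized local decisions.

    With five IDS units, a threshold of 3 means "at least 3 out of 5"."""
    votes = Counter(categorize(y) for y in local_decisions)
    return ATTACK if votes[ATTACK] >= threshold else NOT_ATTACK


# Three of the five units report some attack category, so the record is an ATTACK.
decision = fuse(["DOS", "PROBE", "NORMAL", "NORMAL", "R2L"])
```

Note that the units do not have to agree on the attack category: DOS, PROBE, and R2L all count as votes for ATTACK once they have passed through the categorizer.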
3.3 Feature Selection
3.3.1 Information Gain Ratio (IGR). Let $S$ be a set of training samples with their corresponding labels. Suppose there are $m$ classes, the training set contains $S_i$ samples of class $i$, and $S$ is the total number of samples in the training set. The expected information needed to classify a given sample is calculated by using the equation

\[ I(S_1, S_2, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S}. \tag{1} \]

Feature $F$ with values $f_1, f_2, \ldots, f_v$ can divide the training set into $v$ subsets $S_1, S_2, \ldots, S_v$, where $S_j$ is the subset which has the value $f_j$ for feature $F$. Furthermore, let $S_j$ contain $S_{ij}$ samples of class $i$. The entropy of the feature $F$ is

\[ E(F) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S} \, I(S_{1j}, \ldots, S_{mj}). \tag{2} \]

Information gain for $F$ can be calculated as

\[ \mathrm{IGR} = \mathrm{Gain}(F) = I(S_1, \ldots, S_m) - E(F). \tag{3} \]
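Equations (1) to (3) can be computed for a single feature as in the sketch below. It assumes the feature takes discrete values; continuous KDD features would first need discretization, which the text does not describe. The function names and the toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def expected_information(labels: Sequence[Hashable]) -> float:
    """Equation (1): I(S_1, ..., S_m) = -sum_i (S_i / S) * log2(S_i / S)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def feature_entropy(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Equation (2): weighted expected information of the subsets induced by F."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(sub) / total * expected_information(sub) for sub in subsets.values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Equation (3): Gain(F) = I(S_1, ..., S_m) - E(F)."""
    return expected_information(labels) - feature_entropy(values, labels)


# Toy data: four records, two classes (illustrative values only).
labels = ["attack", "attack", "normal", "normal"]
separating_feature = ["tcp", "tcp", "udp", "udp"]  # splits the classes perfectly
useless_feature = ["a", "b", "a", "b"]             # carries no class information

gain_separating = information_gain(separating_feature, labels)
gain_useless = information_gain(useless_feature, labels)
```

A feature that separates the classes perfectly recovers the full expected information of the labels, while a feature whose subsets have the same class mix as the whole set has zero gain.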
3.3.2 GA-Based Feature Selection. To reduce the dimensionality and to get better accuracy, the relevant features have to be selected. Feature selection is done using a genetic algorithm. The fitness function of the genetic algorithm is designed in such a way that the number of features selected is minimal and the sum of their information gain values is maximal. The genetic algorithm is designed to have a population size of 40. A binary chromosome of length 41 is constructed, with each bit representing a feature. This binary chromosome is given as input to the fitness function (Algorithm 2). The information gain values (IG) of the selected features (i.e., bits set to 1) are summed up to get the total information gain value (igsum). The total number of 1's set in the chromosome gives the feature count (fcnt). For example, consider the following chromosome:
11011100011110101100111001110110011010001
Here bit 5 is set (i.e., value = 1), which indicates that the 5th feature is selected for processing. In this chromosome, 24 bits are set in total, so the feature count (fcnt) is 24. The total information gain value (igsum) obtained by summing up the information gain (IG) of the 24 selected features is 0.37586. The genetic algorithm parameter values are listed in Table 1.
Table 1: Genetic algorithm parameters.
Modeling description    Setting
Population size         40
Selection technique     Roulette wheel
Crossover type          Uniform crossover
Crossover rate          0.5
Mutation rate           0.1
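The genetic operators named in Table 1 can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the crossover rate is read as the per-gene swap probability of uniform crossover, the mutation rate is read as a per-bit flip probability, and the fitness function is supplied by the caller and must return non-negative values with a positive total so that roulette wheel selection is well defined.

```python
import random
from typing import Callable, List, Sequence, Tuple

POPULATION_SIZE = 40    # Table 1
CHROMOSOME_LENGTH = 41  # one bit per KDDCup'99 feature
CROSSOVER_RATE = 0.5    # assumed: per-gene swap probability
MUTATION_RATE = 0.1     # assumed: per-bit flip probability


def decode(bits: str) -> List[int]:
    """Return the 1-based numbers of the features whose bit is set."""
    return [i + 1 for i, b in enumerate(bits) if b == "1"]


def roulette_select(population, fitnesses, rng: random.Random):
    """Pick one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def uniform_crossover(a: Sequence[int], b: Sequence[int],
                      rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap each gene between the two parents with probability CROSSOVER_RATE."""
    child_a, child_b = list(a), list(b)
    for i in range(len(child_a)):
        if rng.random() < CROSSOVER_RATE:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def mutate(chromosome: Sequence[int], rng: random.Random) -> List[int]:
    """Flip each bit independently with probability MUTATION_RATE."""
    return [1 - bit if rng.random() < MUTATION_RATE else bit for bit in chromosome]


def next_generation(population: List[List[int]],
                    fitness_fn: Callable[[Sequence[int]], float],
                    rng: random.Random) -> List[List[int]]:
    """Build one new population of the same size as the old one."""
    fitnesses = [fitness_fn(c) for c in population]
    offspring: List[List[int]] = []
    while len(offspring) < len(population):
        parent_a = roulette_select(population, fitnesses, rng)
        parent_b = roulette_select(population, fitnesses, rng)
        child_a, child_b = uniform_crossover(parent_a, parent_b, rng)
        offspring.extend([mutate(child_a, rng), mutate(child_b, rng)])
    return offspring[:len(population)]


# The example chromosome from the text: 41 bits, 24 of them set.
example = "11011100011110101100111001110110011010001"
selected = decode(example)
```

Decoding the example chromosome reproduces the feature count of 24 quoted in the text, with feature 5 among the selected features.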
Table 2 gives the feature combinations with high information gain obtained for the different attack types using the genetic algorithm. The features that are repeated most often in the list are selected for the experiment.
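Selecting the most frequently repeated features can be sketched as a simple frequency count over the per-attack feature lists. The paper does not state the cutoff it used, so the `min_count` threshold and the function name are hypothetical. The three lists are the ipsweep, nmap, and portsweep rows of Table 2.

```python
from collections import Counter
from typing import Dict, List


def frequent_features(combinations: Dict[str, List[int]], min_count: int) -> List[int]:
    """Return the features that appear in at least `min_count` of the combinations."""
    counts = Counter(f for features in combinations.values() for f in set(features))
    return sorted(f for f, n in counts.items() if n >= min_count)


# Three PROBE rows of Table 2 (feature numbers as listed there).
probe_rows = {
    "ipsweep": [2, 3, 5, 12, 13, 14, 16, 17, 21, 23, 24, 25, 28, 31, 32, 33, 37, 38],
    "nmap": [1, 2, 3, 5, 18, 21, 22, 28, 29, 31, 32, 34, 35, 36, 37],
    "portsweep": [3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41],
}

in_all_three = frequent_features(probe_rows, min_count=3)
in_at_least_two = frequent_features(probe_rows, min_count=2)
```

Lowering `min_count` trades a smaller feature set for broader coverage of attack patterns; applying the same count to all rows of Table 2 would be the natural extension.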
The proposed implementation steps are given in Algorithm 3.
4 Performance Evaluation and Results
4.1 NSL-KDD Data Set. One of the main drawbacks of the KDDCup'99 data set is the repetition of records, which biases the learning algorithms towards the repeated records. It thus prevents them from learning infrequent records, which are usually more harmful to networks, as in U2R and R2L attacks. In addition, the occurrence of these redundant records in the test set biases the performance results.
The NSL-KDD benchmark data set [14] has the following benefits over the KDDCup'99 data set:
(i) It does not include repeated records in the training set, and so the classifiers will not be biased towards more frequent records.
(ii) There are no duplicate records in the testing sets. Therefore, the performances of the learners are not biased.
(iii) The number of selected records from each difficulty level group is inversely proportional to the percentage of records in the original KDDCup'99 data set, which helps an accurate evaluation of different learning techniques. As a result, the classification rates of various machine learning methods vary in a wider range, which makes it more efficient to detect different types of attacks.
The sample distributions on the training and testing data sets with the corrected labels of the NSL-KDD data set are shown in Table 3.
4.2 Performance Evaluation Metrics. The performance of the proposed intrusion detection system is evaluated with the help of a confusion matrix. The classification performance of the IDS is measured by false alarm rate, detection rate, and accuracy. They can be calculated using the confusion matrix in Table 4. The confusion matrix is a 2 × 2 matrix where the rows represent the actual classes and the columns represent the predicted classes:
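\[
\begin{aligned}
\text{False Alarm Rate} &= \frac{FP}{TN + FP} \times 100, \\
\text{Detection Rate} &= \frac{TP}{TP + FN} \times 100, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \times 100.
\end{aligned}
\tag{4}
\]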
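The three measures in (4) follow directly from the four confusion matrix counts. The sketch below uses hypothetical names and made-up counts, and it guards against empty denominators, which the equations leave undefined.

```python
from typing import Dict


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the measures of equation (4), each as a percentage."""

    def percent(numerator: int, denominator: int) -> float:
        # Return 0.0 when a class is absent (assumption; (4) leaves this undefined).
        return 100.0 * numerator / denominator if denominator else 0.0

    return {
        "false_alarm_rate": percent(fp, tn + fp),
        "detection_rate": percent(tp, tp + fn),
        "accuracy": percent(tp + tn, tp + tn + fp + fn),
    }


# Made-up counts: 100 attack records (90 detected) and 100 normal records (5 flagged).
metrics = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

For these counts the false alarm rate is 5%, the detection rate is 90%, and the accuracy is 92.5%.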
In this section, the performance of the proposed intrusion detection system is studied with the help of an experiment. In this experiment, only the relevant features are selected, using the information gain algorithm and the genetic algorithm. The selected features and the training data set are given as input to the MIU, and the performance measures accuracy, detection rate, and false alarm rate are considered for evaluation. The results are tabulated and plotted as graphs.
4.3 Experiment Results. All experiments were performed on a Windows platform with an Intel Core 2 Duo CPU at 2.49 GHz and 2 GB RAM. Simulations and the analysis of experimental results were performed with the Weka machine learning tool [15] and Java.
Selected features are considered for training the fusion IDS in this experiment, and test data with 28.39% novel (new attack) data is taken.
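The share of novel attack data quoted above can be related to the counts in Table 3. One reading, which is an assumption rather than something the text states, is that the percentage is taken over the attack records of the test set and not over all test records. The short check below uses only the Table 3 counts; the variable names are hypothetical.

```python
# Test set counts and novel attack counts as listed in Table 3.
test_counts = {"Normal": 9866, "PROBE": 2421, "DOS": 7456, "U2R": 67, "R2L": 2734}
novel_counts = {"PROBE": 1315, "DOS": 1715, "U2R": 32, "R2L": 538}

attack_total = sum(n for label, n in test_counts.items() if label != "Normal")
novel_total = sum(novel_counts.values())

# Share of novel attacks among attack records versus among all test records.
share_of_attacks = 100.0 * novel_total / attack_total
share_of_all = 100.0 * novel_total / sum(test_counts.values())
```

Under this reading, 3600 novel records out of 12678 attack records give a share slightly below 28.4%, whereas the share over all 22544 test records would be close to 16%.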
From Table 5 it is inferred that, for the J48 classifier, there is a 57% reduction in testing time when considering 28 features instead of all features.
From Table 6 it is inferred that the detection rate and false alarm rate of intrusion detection systems with feature selection using a single classifier such as SVM, IBK, J48, RandomForest, or BayesNet are inferior to those of the fusion IDS unit. For example, for the U2R type of attack, the detection rate achieved by the SVM classifier is 86%, by IBK 83%, by J48 82.5%, and by BayesNet 80.5%. When a fusion IDS unit with multiple heterogeneous IDSs is used, a higher detection rate of 99% is achieved.
The false alarm rate (FAR) is substantially reduced when a fusion IDS unit with multiple heterogeneous IDSs is used. For example, the FAR found for the DOS attack type using SVM is 0.7%, IBK 0.3%, J48 0.1%, RandomForest 0.2%, and BayesNet 0.3%. When the fusion IDS is used, a FAR of 0.0% is achieved.
Detection rate (DTR) and false alarm rate (FAR) ofthe proposed system for the different types of attack using
Table 2: Most relevant features for each attack and information gain measures.
Attack type   Attack pattern     Igsum value   Various combinations of features giving high information gain value
PROBE         ipsweep            0.82          2 3 5 12 13 14 16 17 21 23 24 25 28 31 32 33 37 38
PROBE         nmap               0.27          1 2 3 5 18 21 22 28 29 31 32 34 35 36 37
PROBE         portsweep          0.58          3 4 10 24 27 29 34 35 36 37 41
PROBE         satan              0.75          1 3 5 11 15 19 23 24 25 27 28 29 30 31 32 35 39 40 41
PROBE         mscan              1.11          1 3 4 5 7 12 17 21 25 27 28 29 31 33 35 39 40 41
PROBE         saint              0.33          1 5 7 12 16 24 25 29 32 33 34 35 37 38 40
DOS           back               0.38          1 2 4 5 6 10 11 12 13 15 17 18 21 22 23 26 27 28 30 31 34 35 37 41
DOS           land               0.0009        1 2 3 4 7 13 18 25 29 35 38
DOS           neptune            7.73          1 3 4 5 6 7 13 15 17 19 20 26 28 29 30 31 33 34 35 38 39
DOS           pod                0.052         2 3 5 7 8 9 10 11 17 19 21 23 26 33 34 39 40
DOS           smurf              0.68          2 3 5 8 17 23 24 25 26 29 33 35 36 38 39
DOS           teardrop           0.27          3 4 5 6 8 10 13 23 24 25 26 32 34 35 36 37 39 40
U2R           buffer_overflow    0.0086        1 2 3 5 6 7 8 9 10 14 21 23 29 30 31 32 33 36 38 39 40
U2R           loadmodule         0.0058        1 2 3 4 7 8 14 27 36 39 40
U2R           rootkit            0.0035        3 6 9 11 13 14 16 17 18 23 28 31 32 33 34 35 37 39 41
R2L           guess_passwd       0.025         2 3 4 6 9 10 11 13 14 17 21 23 24 37 38 39 40 41
R2L           imap               0.0035        3 4 5 6 10 12 20 23 25 27 29 30 32 33 34 36 38 39 41
R2L           multihop           0.0024        3 4 10 12 13 14 16 17 18 19 22 26 27 30 35 37
R2L           phf                0.0021        3 4 6 8 9 10 13 14 19 28 29 36
R2L           spy                0.0003        2 3 4 5 9 15 18 22 16 39
R2L           warezclient        0.21          3 4 5 6 10 12 14 16 24 27 28 29 30 32 33 34 35 37 38 39 40 41
R2L           warezmaster        0.008         1 2 3 4 6 12 13 14 16 17 19 22 23 24 31 35 36 37 39
Normal        -                  11.96         1 2 3 4 5 6 7 15 23 24 14 15 19 20 21 23 25 26 27 28 30 32 33 34 36 37 38
Table 3: The sample distributions on the training and testing data sets with the corrected labels of NSL-KDD data set.
          Training data set                Testing data set
Class     Samples    Samples (%)           Samples    Samples (%)    Novel attack samples
Normal    13449      53.39                 9866       43.76          -
PROBE     2289       9.09                  2421       10.74          1315
DOS       9234       36.65                 7456       33.07          1715
U2R       11         0.04                  67         0.30           32
R2L       208        0.83                  2734       12.13          538
Total     25192      100                   22544      100            3600
Table 4: Confusion matrix.
                 Predicted attack       Predicted normal
Actual attack    True positive (TP)     False negative (FN)
Actual normal    False positive (FP)    True negative (TN)
True positive (TP): the number of records detected as attack that are actually attacks.
True negative (TN): the number of records detected as normal that are actually normal.
False positive (FP): the number of records detected as attack that are actually normal.
False negative (FN): the number of records detected as normal that are actually attacks.
selected features of the test data set of the KDDCup'99 data set are tabulated in Table 7. On average, a detection rate of 98.4% is achieved. The average false alarm rate achieved is 0.68%.
The experimental results of Thomas and Balakrishnan [5] are taken for a comparative study. Table 7 gives the detection rate of the proposed system and of the work of Thomas and Balakrishnan [5]. The detection rate for DOS is 64% in the previous work [5] and 99% for the proposed system. Similarly, for PROBE, U2R, and R2L the detection rate improves over the previous work [5], most notably for R2L, where it rises from 64% to 99%. The false alarm rate for DOS is 36.20% in the work of Thomas and Balakrishnan [5], but in the proposed work it is reduced to 1.0%, and for PROBE, U2R, and R2L the false alarm rate also decreases drastically.
Figures 3 and 4 present a comparative study of the detection rate and false alarm rate of the proposed and existing fusion methods.
5 Conclusion
The key idea behind the study is that any IDS is efficient in detecting some specific attack category. Different IDSs which
Table 5: Comparison of training and testing (built-in) time for different classifiers using all and selected features.
                Training data set                                                      Testing data set
Classifier      All features (s)   28 features (s)   Reduction in training time (%)   All features (s)   28 features (s)   Reduction in testing time (%)
BayesNet        0.86               0.47              59                               0.69               0.55              23
RandomForest    13.91              10.31             30                               12.88              10.07             24
J48             1.92               1.55              22                               1.69               0.94              57
IBK             0.30               0.15              67                               0.25               0.14              56
SVM             7.90               7.10              11                               126.00             121.0             4
Table 6: Detection rate and false alarm rate of each classifier for test data.
              Detection rate                                      False alarm rate
              Anomaly-based                Signature-based        Anomaly-based                Signature-based
Attack type   SVM    IBK    RandomForest   J48    BayesNet        SVM    IBK    RandomForest   J48    BayesNet
DOS           95.4   99.5   99.7           99.6   93.7            0.7    0.3    0.2            0.1    0.3
PROBE         98.1   97.7   98.2           98.1   98.0            0.8    0.3    0.1            0.1    1.2
U2R           86.0   83.0   86.0           82.5   80.5            0.1    0.2    0.1            0.1    0.8
R2L           94.3   94.0   95.5           95.2   90.4            1.6    1.1    0.7            0.6    2.3
Normal        94.1   97.2   98.5           98.5   92.1            3.8    1.8    1.2            1.3    2.2
Table 7: Comparison of detection rate and false alarm rate for the work of Thomas and Balakrishnan [5] and the proposed system for different attacks.
          Detection rate                                                False alarm rate
Attack    Thomas and Balakrishnan [5]   Proposed system (28 features)   Thomas and Balakrishnan [5]   Proposed system (28 features)
DOS       64                            99                              36.50                         1
PROBE     76                            99                              24.32                         1
U2R       92                            98                              8.10                          1.38
R2L       64                            99                              35.84                         1
Figure 3: Performance comparison on detection rate of the proposed work and the work of Thomas and Balakrishnan [5]. (Bar chart; x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: detection rate (%), 0 to 100; series: Thomas and Balakrishnan work [5], proposed system.)
are good at detecting different attacks are combined, and an MIU is formed. This paper uses only the relevant features of the input traffic data for processing, and a promising classification result is obtained from the MIU, which is
Figure 4: Performance comparison on false alarm rate of the proposed work and the work of Thomas and Balakrishnan [5]. (Bar chart; x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: false alarm rate, 0 to 40; series: Thomas and Balakrishnan work [5], proposed system.)
the fusion of heterogeneous IDSs. In comparison with the work of Thomas and Balakrishnan [5], a good improvement in the detection rate and false alarm rate is achieved. When the detection rate and false alarm rate of a single IDS unit are
compared with those of the fusion IDS unit, there is a vast improvement in performance. The feature selection done with the genetic algorithm has extracted the relevant features from the 41 features. As a result, training and testing speed improve and good accuracy is obtained. The binary interpretation of the anomaly score can be avoided in future work. The anomaly score can be normalized and multiplied by the respective weights, as in the basic probability assignments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.
[4] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.
[5] C. Thomas and N. Balakrishnan, "Improvement in intrusion detection with advances in sensor fusion," IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.
[6] G. Giacinto, F. Roli, and L. Didaci, "Fusion of multiple classifiers for intrusion detection in computer networks," Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.
[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, "Intrusion sensor data fusion in an intelligent intrusion detection system architecture," in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.
[8] D. Parikh and T. Chen, "Data fusion and cost minimization for intrusion detection," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.
[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.
[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, "An efficient intrusion detection system based on support vector machines and gradually feature removal method," Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.
[11] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.
[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, "Building lightweight intrusion detection system using wrapper-based feature selection mechanisms," Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.
[14] KDDCup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[15] Weka, Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.
2 The Scientific World Journal
The rest of the paper is organized as follows. Section 2 enumerates related works. The proposed methodology is described in Section 3, together with the algorithms for training and testing multiple IDSs. Section 4 discusses the performance evaluation of the experiments in detail, with the results. Section 5 concludes the study.
2 Related Works
Thomas and Balakrishnan [5] optimized the performance of IDS by fusing multiple IDSs. Their paper outlines how a weight is assigned to each IDS and how the weights are aggregated to reach a correct decision. The IDSs are evaluated on the DARPA 1999 data set, which is outdated and contains many redundant records that affect classifier accuracy. In their method, binary values are used to decide between attack and normal. Giacinto et al. [6] proposed a pattern-recognition approach based on the fusion of multiple classifiers for network intrusion detection. It provides a better tradeoff between generalization ability and false alarm generation. Unfortunately, on unknown attacks the fusion rules show no improvement over the results of the individual networks. No fusion rule improves on the performance of the neural network trained on the overall feature set, which attains the same performance as the oracle. Siraj et al. [7] proposed the Decision Engine of an Intelligent Intrusion Detection System (IIDS), which fuses information from different intrusion detection sensors using an artificial intelligence technique. Unlike neural networks, it cannot perform self-learning and self-training, and there is no functionality for customizing the standard attack. Parikh and Chen [8] proposed an ensemble of classifiers to combine data from various sources and reduce the cost of false alarms, using the DLEARNIN and DCMS algorithms. Their paper does not use sum and product rules, and outputs are not directly compared. Giacinto et al. [9] proposed an unsupervised anomaly-based IDS in which a combination of one-class classifiers is used to design each module, with distinct features for training. For high values of false alarm rate, the system gives a low detection rate. Li et al. [10] constructed a compact data set by clustering redundant data. Features are reduced from 41 to 19 using clustering, and the use of ant colony optimization improved the efficiency of intrusion detection. The combination of critical features used in this method could not distinguish attackers from normal users. Sung and Mukkamala [11] removed one feature at a time in experiments on SVM and neural networks, and the KDDCup'99 data set was used to verify the technique. For five-class classification, only the 19 most significant of the 41 features are used. Li et al. [12] proposed a wrapper-based feature selection algorithm to construct a lightweight IDS. They applied a modified Random Mutation Hill Climbing (RMHC) as the search strategy and a modified linear SVM as the evaluation criterion. This method speeds up feature selection and gives a high detection rate for IDS. Since the range of intruders keeps widening in today's information era, there is considerable scope for designing improved IDSs, which motivates the proposed work.
3 The Proposed System
3.1 Motivation. With the advent of online business and social networks, the genuineness of the information available on the Internet has come into question. Many human and robot-based intruders act aggressively to take advantage of this information. Also, attacks on the Internet are nondeterministic in nature, which makes detecting and reacting to them a very complex task. Most present-day stand-alone intrusion detection systems cannot achieve a reasonably high detection rate together with a low false alarm rate. Most existing IDSs detect a certain class of attack with improved accuracy while performing only moderately on the other classes of attack. Combining the decisions of multiple intrusion detection systems makes it possible to obtain a more reliable and accurate decision for a wider class of attacks.
Processors are now fast enough that combining multiple IDSs is not a major issue from a computational point of view, and best-of-breed solutions have been achieved earlier. A better analysis of the data already gathered by individual IDSs can detect many attacks that currently go undetected. The literature survey shows that appropriate feature selection techniques simplify models so that they are easier to interpret, shorten training times, and enhance generalization by reducing overfitting. The challenges in designing and deploying IDSs are increasing because of the wider reach of Internet services and the lack of a standard procedure for characterizing intruders.
3.2 The Proposed System Architecture. Anomaly-based IDSs identify abnormal or unusual behavior on a network and tag it as an attack. They do not need any specific knowledge. The disadvantage of this method is that it produces a larger number of false alarms. A signature-based IDS is well suited to detecting attacks that match a predefined pattern, and it produces very few false alarms. Signature-based and anomaly-based techniques are fused for three main reasons. First, the false alarm rate should be minimal, which is possible only with a signature-based IDS. Second, any IDS has to identify new attacks, which is possible through anomaly-based techniques. Third, every IDS is efficient in detecting specific types of attack. For example, an anomaly-based IDS is suitable for detecting DOS and R2L attacks, and a signature-based IDS is good at detecting U2R and PROBE, as can be inferred from Table 6. The fusion of signature-based and anomaly-based techniques can detect more attacks with a lower false alarm rate. The proposed system consists of a Multiple IDS Unit (MIU), which contains five IDS units following five different algorithms.
The proposed system architecture is shown in Figure 1. It contains three phases of work. In the first phase, feature selection is done with the help of information gain (IG) and a genetic algorithm (GA). There are 41 features in total in the KDDCup'99 data set. Certain features are irrelevant or not needed for the IDS.
Figure 1: The proposed system architecture. The input X (features f1, f2, ..., f41) passes through feature selection (information gain + genetic algorithm) into the MIU, which contains IDS-1 (SVM, anomaly-based), IDS-2 (IBK, anomaly-based), IDS-3 (J48, signature-based), IDS-4 (RandomForest, anomaly-based), and IDS-5 (BayesNet, signature-based). The decision fusion unit combines their outputs into the final decision, output Y.
Input: Feature set FS[ ]
Output: An array IG[ ] populated with the information gain value for each feature
Initialize i = 0
foreach (F in FS)
    IG[i] = IGR(F)
    i++
endfor

Algorithm 1: Information gain calculation.

Input: Binary chromosome[41]
Output: Information gain sum (igsum) with feature count (fcnt)
Initialize igsum = 0, fcnt = 0
for (i = 0 to 40)
    if (chromosome[i] == 1) then
        igsum = igsum + IG[i]
        fcnt = fcnt + 1
    endif
endfor

Algorithm 2: Maximum information gain with minimum feature count algorithm.
When all 41 features of the input traffic are taken for processing, processing is delayed and the output is inefficient. Experimenting with all combinations of the features is exponentially complex. Hence, only the relevant features are chosen with the help of the genetic algorithm (Algorithms 1 and 2), and the selected features are given as input. The feature selection phase helps draw out the relevant features, which increases classifier accuracy and reduces computation time.
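As a rough illustration of the genetic search described above, the sketch below wires together the settings the paper lists in Table 1 (population size 40, roulette wheel selection, uniform crossover, crossover rate 0.5, mutation rate 0.1). Several details are assumptions that the paper does not spell out: the scalar fitness `igsum - PENALTY * fcnt`, the `PENALTY` value, the reading of the crossover rate as a per-pair probability and of the mutation rate as a per-bit flip probability, the number of generations, and all function names.

```python
import random

N_FEATURES = 41        # feature count of KDDCup'99 (from the text)
POP_SIZE = 40          # Table 1
CROSSOVER_RATE = 0.5   # Table 1; assumed to be the per-pair crossover probability
MUTATION_RATE = 0.1    # Table 1; assumed to be the per-bit flip probability
PENALTY = 0.01         # assumption: cost charged for every selected feature


def fitness(chrom, ig):
    """Reward a high information gain sum and penalize the feature count."""
    igsum = sum(g for bit, g in zip(chrom, ig) if bit)
    fcnt = sum(chrom)
    # Clamp so that roulette wheel weights stay strictly positive.
    return max(igsum - PENALTY * fcnt, 1e-9)


def roulette_select(pop, scores, rng):
    """Pick one parent with probability proportional to its fitness."""
    return rng.choices(pop, weights=scores, k=1)[0]


def uniform_crossover(a, b, rng):
    """Swap each gene between the two parents with probability 0.5."""
    if rng.random() >= CROSSOVER_RATE:
        return a[:], b[:]
    c1, c2 = a[:], b[:]
    for i in range(len(a)):
        if rng.random() < 0.5:
            c1[i], c2[i] = b[i], a[i]
    return c1, c2


def mutate(chrom, rng):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [1 - bit if rng.random() < MUTATION_RATE else bit for bit in chrom]


def run_ga(ig, generations=30, seed=0):
    """Return the best binary chromosome found for the IG values in `ig`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]
    best = max(pop, key=lambda c: fitness(c, ig))
    for _ in range(generations):
        scores = [fitness(c, ig) for c in pop]
        nxt = []
        while len(nxt) < POP_SIZE:
            p1 = roulette_select(pop, scores, rng)
            p2 = roulette_select(pop, scores, rng)
            c1, c2 = uniform_crossover(p1, p2, rng)
            nxt.extend([mutate(c1, rng), mutate(c2, rng)])
        pop = nxt[:POP_SIZE]
        best = max(pop + [best], key=lambda c: fitness(c, ig))
    return best
```

Calling `run_ga(ig)` with a list of 41 information gain values returns a 0/1 list in which a 1 marks a selected feature. A different way of trading igsum against fcnt would change only `fitness`.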
In the second phase, the output of the first phase (i.e., the input traffic with the selected features alone) is given as input to the MIU, and the output is the local decision ($y_i$), which categorizes the input traffic (DOS, PROBE, U2R, R2L, and NORMAL). Five IDSs, each with a unique algorithm, are present in the MIU. The five IDS algorithms used are Support Vector Machines (SVM) [13], IBK, RandomForest, J48, and BayesNet. SVM, IBK, and RandomForest come under the category of anomaly-based IDS [1, 2]; J48 and BayesNet come under the category of signature-based IDS [1]. Every IDS algorithm in the MIU (Algorithm 3) receives the input traffic data record and classifies it, so that five outputs (local decisions) $y_1$, $y_2$, ..., $y_5$ are obtained for every input record.
In the third phase, the output from each IDS$_i$ in the MIU, considered as the local decision ($y_i$), is passed on to the categorization unit. The input traffic categories are divided into two groups: ATTACK and NOT A ATTACK. The traffic categories DOS, PROBE, U2R, and R2L are labeled as the ATTACK group, and NORMAL is labeled as the NOT A ATTACK group. For example, if the output ($y_2$) from IDS-2 is PROBE, then it falls under the ATTACK group. The fusion process is depicted in Figure 2. The output of the categorization unit, $yy_i$, for each local decision ($y_i$) is taken to the decision unit, and the global decision ($z$) is taken based on the majority voting rule. If at least 3 of the 5 outputs from the categorization unit suggest $yy_1$ (ATTACK), then the decision unit decides that the input traffic is of ATTACK type; otherwise it is NOT A ATTACK.

Algorithm MIU
Input: Input traffic data record; F, the set of all features
Output: Whether the traffic data record is ATTACK or NOT A ATTACK
Process:
(1) Find the information gain for each feature in F and store it in IG, following Algorithm 1.
(2) Select the features using the genetic algorithm, with Algorithm 2 as the fitness function.
(3) Pass the input traffic data record with the selected features f'' into the classification algorithm (SVM), which returns the attack category for each input traffic data record.
(4) Repeat Step (3) with the other classification algorithms: IBK, J48, RandomForest, and BayesNet.
(5) For each input traffic data record, there are now five local decisions y1, y2, ..., y5 from the five classification algorithms.
(6) Label each local decision y_i as yy1 or yy2, where yy1 stands for ATTACK and yy2 stands for NOT A ATTACK:
    If (y_i == "DOS" or y_i == "PROBE" or y_i == "U2R" or y_i == "R2L")
        Then y_i = yy1
        Else y_i = yy2
(7) For each input traffic data record, the decision from each of the five IDS units is either yy1 or yy2. Count the number of yy1 and yy2:
    If (count of yy1 >= 3)
        Final decision = yy1
    Else
        Final decision = yy2

Algorithm 3: The proposed system algorithm.

Figure 2: Fusion process. The local decisions y1 to y5 from IDS-1 to IDS-5 pass through the categorizer, which outputs yy1 or yy2 for each of them (yy1, yy1, yy1, yy2, yy2 in the depicted example); the fusion unit then decides ATTACK or NOT ATTACK.
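The categorization and majority-voting steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the `"NOT_ATTACK"` string are hypothetical, and the five local decisions are supplied by hand instead of coming from trained classifiers.

```python
from collections import Counter

# The four attack categories named in the paper; anything else counts as normal.
ATTACK_CLASSES = {"DOS", "PROBE", "U2R", "R2L"}


def categorize(local_decision):
    """Map a five-class local decision y_i to the binary label yy_i."""
    return "ATTACK" if local_decision in ATTACK_CLASSES else "NOT_ATTACK"


def fuse(local_decisions):
    """Majority vote over the categorized decisions of all IDS units."""
    votes = Counter(categorize(y) for y in local_decisions)
    # With five units, a strict majority means at least three ATTACK votes.
    return "ATTACK" if votes["ATTACK"] > len(local_decisions) / 2 else "NOT_ATTACK"


# Three of the five units report some attack class, so the record is flagged.
print(fuse(["DOS", "PROBE", "NORMAL", "NORMAL", "U2R"]))  # ATTACK
```

Note that the units do not have to agree on the attack class: DOS, PROBE, and U2R votes all count towards ATTACK, which is what the categorization unit achieves before voting.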
3.3 Feature Selection

3.3.1 Information Gain Ratio (IGR). Let $S$ be a set of training samples with their corresponding labels. Suppose there are $m$ classes, the training set contains $S_i$ samples of class $i$, and $S$ is the total number of samples in the training set. The expected information needed to classify a given sample is calculated as

$$I(S_1, S_2, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S}. \tag{1}$$

A feature $F$ with values $f_1, f_2, \ldots, f_v$ divides the training set into $v$ subsets $S_1, S_2, \ldots, S_v$, where $S_j$ is the subset that has the value $f_j$ for feature $F$. Furthermore, let $S_j$ contain $S_{ij}$ samples of class $i$. The entropy of the feature $F$ is

$$E(F) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S} \, I(S_{1j}, \ldots, S_{mj}). \tag{2}$$

The information gain for $F$ can be calculated as

$$\mathrm{IGR} = \mathrm{Gain}(F) = I(S_1, \ldots, S_m) - E(F). \tag{3}$$
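Equations (1) to (3) translate directly into a few lines of code. The sketch below assumes a categorical (already discretized) feature given as a list of values aligned with a list of class labels; the function names and the toy data are illustrative only.

```python
import math
from collections import Counter, defaultdict


def expected_information(labels):
    """Equation (1): I(S_1, ..., S_m) for a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def entropy_of_feature(values, labels):
    """Equation (2): E(F), the weighted information of the subsets induced by F."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    total = len(labels)
    return sum(len(g) / total * expected_information(g) for g in groups.values())


def gain(values, labels):
    """Equation (3): Gain(F) = I(S_1, ..., S_m) - E(F)."""
    return expected_information(labels) - entropy_of_feature(values, labels)


labels = ["attack", "attack", "normal", "normal"]
separating = ["tcp", "tcp", "udp", "udp"]   # splits the classes perfectly
uninformative = ["a", "b", "a", "b"]        # each subset is still a 50/50 mix
print(gain(separating, labels), gain(uninformative, labels))
```

A feature that separates the classes perfectly recovers all of the class information (one bit for two balanced classes), whereas a feature whose subsets keep the original class mix has zero gain.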
3.3.2 GA-Based Feature Selection. To reduce the dimensionality and obtain better accuracy, the relevant features have to be selected. Feature selection is done using a genetic algorithm. The fitness function is designed so that the number of selected features is minimized and the sum of their information gain values is maximized. The genetic algorithm is designed to have a population size of 40. A binary chromosome of length 41 is constructed, with each bit representing a feature. This binary chromosome is given as input to the fitness function (Algorithm 2). The information gain values (IG) of the selected features (i.e., bits set to 1) are summed to get the total information gain value (igsum). The number of 1s in the chromosome gives the feature count (fcnt). For example, consider the following chromosome:

11011100011110101100111001110110011010001

Here bit 5 is set (i.e., its value is 1), which indicates that the 5th feature is selected for processing. In this chromosome, 24 bits are set in total, so the feature count (fcnt) is 24. The total information gain value (igsum), obtained by summing the information gain (IG) of the 24 selected features, is 0.37586. The genetic algorithm parameter values are listed in Table 1.

Table 1: Genetic algorithm parameters.

| Modeling description | Setting |
|---|---|
| Population size | 40 |
| Selection technique | Roulette wheel |
| Crossover type | Uniform crossover |
| Crossover rate | 0.5 |
| Mutation rate | 0.1 |
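The chromosome example can be checked mechanically. The sketch below decodes the bit string quoted in the text into 1-based feature numbers and computes the two quantities of Algorithm 2. The helper names are hypothetical, and the per-feature IG values are not listed in the paper, so `fitness_terms` takes them as an argument instead of reproducing the reported igsum of 0.37586.

```python
# Chromosome quoted in the text; bit k (1-based) selects feature k.
CHROMOSOME = "11011100011110101100111001110110011010001"


def decode(chromosome):
    """Return the 1-based numbers of the features whose bit is set."""
    return [i + 1 for i, bit in enumerate(chromosome) if bit == "1"]


def fitness_terms(chromosome, ig):
    """Algorithm 2: sum of IG over the selected features, plus their count."""
    selected = decode(chromosome)
    igsum = sum(ig[f - 1] for f in selected)
    return igsum, len(selected)


selected = decode(CHROMOSOME)
print(len(CHROMOSOME), len(selected), 5 in selected)  # 41 24 True
```

The output confirms the figures given in the text: the chromosome has 41 bits, 24 of them are set, and feature 5 is among the selected ones.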
Table 2 gives the prominent feature combinations obtained for the different attack types using the genetic algorithm. The features that recur most often in the list are selected for the experiment.
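One simple way to find the features that recur most often is to count how many attack patterns include each feature. The sketch below does this for three PROBE rows copied from Table 2. The threshold `min_count` and the function names are assumptions; the paper does not state the cut-off it used to arrive at its final feature set.

```python
from collections import Counter

# Three PROBE rows copied from Table 2 (feature numbers are 1-based).
COMBINATIONS = {
    "ipsweep": [2, 3, 5, 12, 13, 14, 16, 17, 21, 23, 24, 25, 28, 31, 32, 33, 37, 38],
    "nmap": [1, 2, 3, 5, 18, 21, 22, 28, 29, 31, 32, 34, 35, 36, 37],
    "portsweep": [3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41],
}


def feature_frequency(combinations):
    """Count in how many attack patterns each feature appears."""
    counts = Counter()
    for features in combinations.values():
        counts.update(set(features))  # a set guards against repeated entries
    return counts


def frequent_features(combinations, min_count):
    """Features that appear in at least `min_count` patterns (assumed rule)."""
    counts = feature_frequency(combinations)
    return sorted(f for f, n in counts.items() if n >= min_count)


print(frequent_features(COMBINATIONS, min_count=3))  # [3, 37]
```

Running the same count over all rows of Table 2 with a suitable threshold would give a ranked candidate list of this kind.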
The proposed implementation steps are given in Algorithm 3.
4 Performance Evaluation and Results
4.1 NSL-KDD Data Set. One of the main drawbacks of the KDDCup'99 data set is the repetition of records, which biases learning algorithms towards the repeated records. It thus prevents them from learning the infrequent records, which are usually more harmful to networks, as in U2R and R2L attacks. In addition, the occurrence of these redundant records in the test set biases the performance results.

The NSL-KDD benchmark data set [14] has the following benefits over the KDDCup'99 data set:

(i) It does not include repeated records in the training set, so the classifiers will not be biased towards the more frequent records.
(ii) There are no duplicate records in the testing sets. Therefore, the performance of the learners is not biased.
(iii) The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDDCup'99 data set, which helps an accurate evaluation of different learning techniques. As a result, the classification rates of various machine learning methods vary over a wider range, which makes it more efficient to detect different types of attacks.

The sample distributions of the training and testing data sets, with the corrected labels of the NSL-KDD data set, are shown in Table 3.
4.2 Performance Evaluation Metrics. The performance of the proposed intrusion detection system is evaluated with the help of a confusion matrix. The classification performance of the IDS is measured by false alarm rate, detection rate, and accuracy, which can be calculated from the confusion matrix in Table 4. The confusion matrix is a 2 × 2 matrix in which the rows represent the actual classes and the columns the predicted classes:

$$\begin{aligned}
\text{False Alarm Rate} &= \frac{\mathrm{FP}}{\mathrm{TN} + \mathrm{FP}} \times 100, \\
\text{Detection Rate} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100, \\
\text{Accuracy} &= \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \times 100.
\end{aligned} \tag{4}$$
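The three measures in (4) can be computed from the four confusion-matrix counts as in the following sketch. The function name and the example counts are made up for illustration and are not results from the paper.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return the measures of equation (4), each as a percentage."""
    return {
        "false_alarm_rate": 100.0 * fp / (tn + fp),
        "detection_rate": 100.0 * tp / (tp + fn),
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts: 100 attack records and 100 normal records.
print(ids_metrics(tp=90, tn=95, fp=5, fn=10))
# {'false_alarm_rate': 5.0, 'detection_rate': 90.0, 'accuracy': 92.5}
```

The false alarm rate is normalized by the number of normal records and the detection rate by the number of attack records, so the two can be compared across data sets with different class balances.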
In this section, the performance of the proposed intrusion detection system is studied with the help of an experiment. In this experiment, only the relevant features are selected using the information gain algorithm and the genetic algorithm. The selected features and the training data set are given as input to the MIU, and accuracy, detection rate, and false alarm rate are the performance measures considered for evaluation. The results are tabulated and plotted as graphs.
4.3 Experiment Results. All experiments were performed on a Windows platform with an Intel Core 2 Duo CPU at 2.49 GHz and 2 GB of RAM. Simulations and the analysis of experimental results were performed using the Weka machine learning tool [15] and Java.
Selected features are considered for training the fusion IDS in this experiment, and test data with 28.39% novel (new attack) data is taken.
From Table 5, it is inferred that for the J48 classifier there is a 57% reduction in testing time when 28 features are considered instead of all features.
From Table 6, it is inferred that the detection rate and false alarm rate of intrusion detection systems with feature selection using a single classifier such as SVM, IBK, J48, RandomForest, or BayesNet are inferior to those of the fusion IDS unit. For example, for the U2R type of attack, the detection rate achieved by the SVM classifier is 86%, by IBK 83%, by J48 82.5%, and by BayesNet 80.5%. When a fusion IDS unit with multiple heterogeneous IDSs is used, a higher detection rate of 99% is achieved.
The false alarm rate (FAR) is reduced considerably when a fusion IDS unit with multiple heterogeneous IDSs is used. For example, the FAR for the DOS attack type is 0.7 using SVM, 0.3 using IBK, 0.1 using J48, 0.2 using RandomForest, and 0.3 using BayesNet. When the fusion IDS is used, a FAR of 0.0 is achieved.
Table 2: Most relevant features for each attack and information gain measures.

| Attack type | Attack pattern | Igsum value | Combination of features giving high information gain value |
|---|---|---|---|
| PROBE | ipsweep | 0.82 | 2, 3, 5, 12, 13, 14, 16, 17, 21, 23, 24, 25, 28, 31, 32, 33, 37, 38 |
| PROBE | nmap | 0.27 | 1, 2, 3, 5, 18, 21, 22, 28, 29, 31, 32, 34, 35, 36, 37 |
| PROBE | portsweep | 0.58 | 3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41 |
| PROBE | satan | 0.75 | 1, 3, 5, 11, 15, 19, 23, 24, 25, 27, 28, 29, 30, 31, 32, 35, 39, 40, 41 |
| PROBE | mscan | 1.11 | 1, 3, 4, 5, 7, 12, 17, 21, 25, 27, 28, 29, 31, 33, 35, 39, 40, 41 |
| PROBE | saint | 0.33 | 1, 5, 7, 12, 16, 24, 25, 29, 32, 33, 34, 35, 37, 38, 40 |
| DOS | back | 0.38 | 1, 2, 4, 5, 6, 10, 11, 12, 13, 15, 17, 18, 21, 22, 23, 26, 27, 28, 30, 31, 34, 35, 37, 41 |
| DOS | land | 0.0009 | 1, 2, 3, 4, 7, 13, 18, 25, 29, 35, 38 |
| DOS | neptune | 7.73 | 1, 3, 4, 5, 6, 7, 13, 15, 17, 19, 20, 26, 28, 29, 30, 31, 33, 34, 35, 38, 39 |
| DOS | pod | 0.052 | 2, 3, 5, 7, 8, 9, 10, 11, 17, 19, 21, 23, 26, 33, 34, 39, 40 |
| DOS | smurf | 0.68 | 2, 3, 5, 8, 17, 23, 24, 25, 26, 29, 33, 35, 36, 38, 39 |
| DOS | teardrop | 0.27 | 3, 4, 5, 6, 8, 10, 13, 23, 24, 25, 26, 32, 34, 35, 36, 37, 39, 40 |
| U2R | buffer_overflow | 0.0086 | 1, 2, 3, 5, 6, 7, 8, 9, 10, 14, 21, 23, 29, 30, 31, 32, 33, 36, 38, 39, 40 |
| U2R | loadmodule | 0.0058 | 1, 2, 3, 4, 7, 8, 14, 27, 36, 39, 40 |
| U2R | rootkit | 0.0035 | 3, 6, 9, 11, 13, 14, 16, 17, 18, 23, 28, 31, 32, 33, 34, 35, 37, 39, 41 |
| R2L | guess_passwd | 0.025 | 2, 3, 4, 6, 9, 10, 11, 13, 14, 17, 21, 23, 24, 37, 38, 39, 40, 41 |
| R2L | imap | 0.0035 | 3, 4, 5, 6, 10, 12, 20, 23, 25, 27, 29, 30, 32, 33, 34, 36, 38, 39, 41 |
| R2L | multihop | 0.0024 | 3, 4, 10, 12, 13, 14, 16, 17, 18, 19, 22, 26, 27, 30, 35, 37 |
| R2L | phf | 0.0021 | 3, 4, 6, 8, 9, 10, 13, 14, 19, 28, 29, 36 |
| R2L | spy | 0.0003 | 2, 3, 4, 5, 9, 15, 18, 22, 16, 39 |
| R2L | warezclient | 0.21 | 3, 4, 5, 6, 10, 12, 14, 16, 24, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41 |
| R2L | warezmaster | 0.008 | 1, 2, 3, 4, 6, 12, 13, 14, 16, 17, 19, 22, 23, 24, 31, 35, 36, 37, 39 |
| Normal | | 11.96 | 1, 2, 3, 4, 5, 6, 7, 15, 23, 24, 14, 15, 19, 20, 21, 23, 25, 26, 27, 28, 30, 32, 33, 34, 36, 37, 38 |

Table 3: The sample distributions on the training and testing data sets with the corrected labels of NSL-KDD data set.

| Class | Training: number of samples | Training: samples percentage (%) | Testing: number of samples | Testing: samples percentage (%) | Testing: number of novel attack samples |
|---|---|---|---|---|---|
| Normal | 13449 | 53.39 | 9866 | 43.76 | - |
| PROBE | 2289 | 9.09 | 2421 | 10.74 | 1315 |
| DOS | 9234 | 36.65 | 7456 | 33.07 | 1715 |
| U2R | 11 | 0.04 | 67 | 0.30 | 32 |
| R2L | 208 | 0.83 | 2734 | 12.13 | 538 |
| Total | 25192 | 100 | 22544 | 100 | 3600 |

Table 4: Confusion matrix.

| | Predicted attack | Predicted normal |
|---|---|---|
| Actual attack | True positive (TP) | False negative (FN) |
| Actual normal | False positive (FP) | True negative (TN) |

True positive (TP): the number of records detected as attack that are actually attacks. True negative (TN): the number of records detected as normal that are actually normal. False positive (FP): the number of records detected as attack that are actually normal. False negative (FN): the number of records detected as normal that are actually attacks.

Detection rate (DTR) and false alarm rate (FAR) of the proposed system for the different types of attack, using selected features of the test data set of the KDDCup'99 data set, are tabulated in Table 7. On average, a detection rate of 98.4% is achieved. The average false alarm rate achieved is 0.68%.
The experimental results of Thomas and Balakrishnan [5] are taken for a comparative study. Table 7 gives the detection rate of the proposed system and of the work of Thomas and Balakrishnan [5]. The detection rate for DOS is 64% in the previous work [5] and 99% for the proposed system. Similarly, for PROBE, U2R, and R2L there is a large improvement in detection rate compared with the previous work [5], particularly for R2L. Likewise, the false alarm rate for DOS is 36.20% in the work of Thomas and Balakrishnan [5], but in the proposed work the value is reduced to 1.0%; for PROBE, U2R, and R2L the false alarm rate has also decreased drastically.
Figures 3 and 4 present a comparative study of the detection rate and false alarm rate of the proposed and existing fusion methods.

Table 5: Comparison of training and testing (built-in) time for different classifiers using all and selected features.

| Classifier | Training, all features (s) | Training, 28 features (s) | Reduction in training (built-in) time (%) | Testing, all features (s) | Testing, 28 features (s) | Reduction in testing (built-in) time (%) |
|---|---|---|---|---|---|---|
| BayesNet | 0.86 | 0.47 | 59 | 0.69 | 0.55 | 23 |
| RandomForest | 13.91 | 10.31 | 30 | 12.88 | 10.07 | 24 |
| J48 | 1.92 | 1.55 | 22 | 1.69 | 0.94 | 57 |
| IBK | 0.30 | 0.15 | 67 | 0.25 | 0.14 | 56 |
| SVM | 7.90 | 7.10 | 11 | 126.00 | 121.0 | 4 |

Table 6: Detection rate and false alarm rate of each classifier for test data. SVM, IBK, and RandomForest are anomaly-based; J48 and BayesNet are signature-based.

| Attack type | Detection rate: SVM | Detection rate: IBK | Detection rate: RandomForest | Detection rate: J48 | Detection rate: BayesNet | False alarm rate: SVM | False alarm rate: IBK | False alarm rate: RandomForest | False alarm rate: J48 | False alarm rate: BayesNet |
|---|---|---|---|---|---|---|---|---|---|---|
| DOS | 95.4 | 99.5 | 99.7 | 99.6 | 93.7 | 0.7 | 0.3 | 0.2 | 0.1 | 0.3 |
| PROBE | 98.1 | 97.7 | 98.2 | 98.1 | 98.0 | 0.8 | 0.3 | 0.1 | 0.1 | 1.2 |
| U2R | 86.0 | 83.0 | 86.0 | 82.5 | 80.5 | 0.1 | 0.2 | 0.1 | 0.1 | 0.8 |
| R2L | 94.3 | 94.0 | 95.5 | 95.2 | 90.4 | 1.6 | 1.1 | 0.7 | 0.6 | 2.3 |
| Normal | 94.1 | 97.2 | 98.5 | 98.5 | 92.1 | 3.8 | 1.8 | 1.2 | 1.3 | 2.2 |

Table 7: Comparison of detection rate and false alarm rate for Thomas and Balakrishnan [5] work and proposed system for different attacks.

| Attack | Detection rate: Thomas and Balakrishnan [5] | Detection rate: proposed system (28 features) | False alarm rate: Thomas and Balakrishnan [5] | False alarm rate: proposed system (28 features) |
|---|---|---|---|---|
| DOS | 64 | 99 | 36.50 | 1 |
| PROBE | 76 | 99 | 24.32 | 1 |
| U2R | 92 | 98 | 8.10 | 1.38 |
| R2L | 64 | 99 | 35.84 | 1 |

Figure 3: Performance comparison on detection rate of proposed work and Thomas and Balakrishnan [5] work. (Bar chart; x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: detection rate (%), 0 to 100; series: Thomas and Balakrishnan work [5], proposed system.)

Figure 4: Performance comparison on false alarm rate of proposed work and Thomas and Balakrishnan [5] work. (Bar chart; x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: false alarm rate, 0 to 40; series: Thomas and Balakrishnan work [5], proposed system.)

5 Conclusion

The key idea behind the study is that any IDS is efficient in detecting some specific attack category. Different IDSs, each good at detecting different attacks, are combined to form an MIU. This paper uses only the relevant features of the input traffic data for processing, and a promising classification result is obtained from the MIU, which is the fusion of heterogeneous IDSs. In comparison with the work of Thomas and Balakrishnan [5], a good improvement in detection rate and false alarm rate is achieved. When the detection rate and false alarm rate of a single IDS unit are compared with those of the fusion IDS unit, there is a vast improvement in performance. The feature selection done with the genetic algorithm has extracted the relevant features from the 41 features. As a result, training and testing speed improve and good accuracy is obtained. The binary interpretation of the anomaly score can be avoided in future work: the anomaly score can be normalized and multiplied by the respective weights, as in basic probability assignments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.
[4] M. Woźniak, M. Graña, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.
[5] C. Thomas and N. Balakrishnan, "Improvement in intrusion detection with advances in sensor fusion," IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.
[6] G. Giacinto, F. Roli, and L. Didaci, "Fusion of multiple classifiers for intrusion detection in computer networks," Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.
[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, "Intrusion sensor data fusion in an intelligent intrusion detection system architecture," in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.
[8] D. Parikh and T. Chen, "Data fusion and cost minimization for intrusion detection," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.
[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.
[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, "An efficient intrusion detection system based on support vector machines and gradually feature removal method," Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.
[11] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.
[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, "Building lightweight intrusion detection system using wrapper-based feature selection mechanisms," Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.
[14] KDDCup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[15] Weka, Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.
The Scientific World Journal 3
MIU
Feature selection(information gain + genetic algorithm)
IDS-1
SVMAnomaly-based
IDS-2
IBKAnomaly-based
IDS-3
J48
Signature-based
IDS-4RandomforestAnomaly-based
IDS-5BayesNet
Signature-based
Decisionfusion unit
Input (X)f1 f2 f3 f4 middot middot middot f40 f41
Final decisionOutput (Y)
Figure 1 The proposed system architecture
Input Feature set FS [ ]Output An array IG [ ] populated with information gain value for each featureInitialize 119894 = 0foreach (119865 in FS)
IG [119894] = IGR(119865)119894++
endfor
Algorithm 1 Information gain calculation
Input Binary chromosome [41]Output Information gain sum with Feature countfor (119894 = 0 to 40)
if (chromosome [119894] == 1)then igsum = igsum + IG [119894]fcnt = fcnt + 1endif
endfor
Algorithm 2 Maximum information gain with minimum featurecount algorithm
When all the 41 features of the input traffic are takenfor processing there is a delay in processing and inefficientoutput is produced Experimenting with all the combinationsof the features is exponentially complex in nature Henceonly the relevant features are chosen with the help of geneticalgorithm (Algorithms 1 and 2) The selected features aregiven as input The feature selection phase will help in
drawing out the relevant features This increases classifieraccuracy and reduces computation speed
In the second phase the output from the first phase (ieinput traffic with selected feature alone) is given as an inputto the MIU and the output is the local decision (119910
119894) which
categorizes the input traffic (DOS PROBE U2R R2L andNORMAL) Five IDSs each with a unique algorithm arepresent in the MIU The five different types of IDS algo-rithms used are Support Vector Machines (SVM) [13] IBKRandomForest J48 and BayesNet SVM IBK and Random-Forest come under the category of anomaly-based IDS [1 2]J48 and BayesNet come under the category of signature-based IDS [1] Every IDS algorithm in theMIU (Algorithm 3)receives the input traffic data record and does the classifica-tion for every input record and five outputs (local decisions)1199101 1199102to 1199105are obtained
In the third phase the output from each IDS119894in MIU
considered as local decision (119910119894) is passed on to the catego-
rization unit The input traffic category is divided into twogroups ATTACK and NOT A ATTACK groups The trafficcategories DOS PROBE U2R and R2L are labeled as
4 The Scientific World Journal
Algorithm MIU
Input: input traffic data record; $F$, the set of all features
Output: whether the traffic data record is ATTACK or NOT A ATTACK
Process:
(1) Find the information gain for each feature in $F$ and store it in IG, following Algorithm 1.
(2) Select the features with the genetic algorithm, using Algorithm 2 as the fitness function.
(3) Pass the input traffic data record with $f''$ into the classification algorithm (SVM), which returns the attack category for each input traffic data record.
(4) Repeat Step (3) with the other classification algorithms: IBK, J48, RandomForest, and BayesNet.
(5) For each input traffic data record, there are now five local decisions $y_1, y_2, \ldots, y_5$ from the five classification algorithms.
(6) Label each local decision $y_i$ as $yy_1$ or $yy_2$, where $yy_1$ stands for ATTACK and $yy_2$ stands for NOT A ATTACK:
    If ($y_i$ == "DOS" or $y_i$ == "PROBE" or $y_i$ == "U2R" or $y_i$ == "R2L")
        Then $y_i = yy_1$
        Else $y_i = yy_2$
(7) For each input traffic data record, the decision from each of the five IDS units is either $yy_1$ or $yy_2$; count the number of $yy_1$ and $yy_2$:
    If (number of $yy_1$ ≥ 3)
        Final decision = $yy_1$
    Else
        Final decision = $yy_2$

Algorithm 3: The proposed system algorithm.
Figure 2: Fusion process. (Diagram: the local decisions $y_1$ to $y_5$ from IDS-1 to IDS-5 pass through the categorizer, which emits $yy_1$ or $yy_2$ for each; the fusion unit combines these into ATTACK or NOT ATTACK.)
ATTACK group. NORMAL is labeled as the NOT A ATTACK group. For example, if the output ($y_2$) from IDS 2 is PROBE, then it falls under the ATTACK group. The fusion process is depicted in Figure 2. The output of the categorization unit, $yy_i$, for each local decision ($y_i$) is passed to the decision unit, and the global decision ($z$) is made by the majority voting rule: if at least 3 of the 5 outputs from the categorization unit are $yy_1$ (ATTACK), the decision unit decides that the input traffic is of ATTACK type; otherwise it is NOT A ATTACK.
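The categorization and majority-voting steps described above can be sketched in a few lines. This is an illustrative reading of the prose (ATTACK wins when at least 3 of the 5 votes are ATTACK); the function names `categorize` and `fuse` are assumptions, not names from the paper.

```python
from typing import Iterable

ATTACK = "ATTACK"
NOT_ATTACK = "NOT A ATTACK"
ATTACK_CATEGORIES = {"DOS", "PROBE", "U2R", "R2L"}


def categorize(local_decision: str) -> str:
    """Map a five-class local decision y_i to the binary label yy_i."""
    return ATTACK if local_decision in ATTACK_CATEGORIES else NOT_ATTACK


def fuse(local_decisions: Iterable[str]) -> str:
    """Majority vote: ATTACK wins when it holds more than half of the votes."""
    labels = [categorize(y) for y in local_decisions]
    attack_votes = labels.count(ATTACK)
    return ATTACK if 2 * attack_votes > len(labels) else NOT_ATTACK


# Three of the five units report an attack category, so the global decision is ATTACK.
z = fuse(["PROBE", "PROBE", "NORMAL", "DOS", "NORMAL"])
```

Note that the units need not agree on which attack category they saw; PROBE and DOS both count as ATTACK votes once categorized.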
3.3. Feature Selection
3.3.1. Information Gain Ratio (IGR). Let $S$ be a set of training samples with their corresponding labels. Suppose there are $m$ classes, the training set contains $S_i$ samples of class $i$, and $S$ is the total number of samples in the training set. The expected information needed to classify a given sample is calculated using the equation
$$I(S_1, S_2, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S}. \tag{1}$$
A feature $F$ with values $f_1, f_2, \ldots, f_v$ can divide the training set into $v$ subsets $S_1, S_2, \ldots, S_v$, where $S_j$ is the subset which has the value $f_j$ for feature $F$. Furthermore, let $S_j$ contain $S_{ij}$ samples of class $i$. The entropy of the feature $F$ is
$$E(F) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S} \times I(S_{1j}, \ldots, S_{mj}). \tag{2}$$
The information gain for $F$ can be calculated as

$$\mathrm{IGR} = \mathrm{Gain}(F) = I(S_1, \ldots, S_m) - E(F). \tag{3}$$
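Equations (1) to (3) translate directly into code. The sketch below is a minimal illustration that assumes a discrete-valued feature (continuous features would first need to be discretized, which the text does not describe); the toy data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def expected_information(labels: Sequence[Hashable]) -> float:
    """Equation (1): I(S_1, ..., S_m) = -sum_i (S_i / S) * log2(S_i / S)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def feature_entropy(values: Sequence[Hashable],
                    labels: Sequence[Hashable]) -> float:
    """Equation (2): size-weighted expected information of the subsets S_j."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    total = len(labels)
    return sum(len(subset) / total * expected_information(subset)
               for subset in subsets.values())


def gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Equation (3): Gain(F) = I(S_1, ..., S_m) - E(F)."""
    return expected_information(labels) - feature_entropy(values, labels)


# Hypothetical toy data: one discrete feature and a two-class label.
protocol = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]
label = ["NORMAL", "NORMAL", "NORMAL", "DOS", "DOS", "DOS"]
g = gain(protocol, label)
```

Here the class distribution carries 1 bit of expected information, only the `udp` subset is mixed, so $E(F) = 1/3$ and the gain is $2/3$.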
3.3.2. GA-Based Feature Selection. To reduce the dimensionality and to improve accuracy, the relevant features have to be selected. Feature selection is done using a genetic algorithm. The fitness function of the genetic algorithm is designed in such a way that the number of features selected has to be minimum and
Table 1: Genetic algorithm parameters.

Modeling description    Setting
Population size         40
Selection technique     Roulette wheel
Crossover type          Uniform crossover
Crossover rate          0.5
Mutation rate           0.1
the sum of their information gain values should be maximum. The genetic algorithm is designed to have a population size of 40. A binary chromosome of length 41 is constructed, with each bit representing a feature. This binary chromosome is given as input to the fitness function (Algorithm 2). The information gain values (IG) of the selected features (i.e., bits set to 1) are summed to get the total information gain value (igsum). The total number of 1's in the chromosome gives the feature count (fcnt). For example, consider the following chromosome:

11011100011110101100111001110110011010001

Here bit 5 is set (i.e., value = 1), which indicates that the 5th feature is selected for processing. In this chromosome, 24 bits are set in total, so the feature count (fcnt) is 24. The total information gain value (igsum), obtained by summing the information gain (IG) of the 24 selected features, is 0.37586. The genetic algorithm parameter values are listed in Table 1.
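The two quantities fed to the fitness function can be read off the chromosome mechanically. The sketch below decodes the example chromosome from the text; how Algorithm 2 combines fcnt and igsum is not reproduced here, and the information gain values in `toy_ig` are hypothetical placeholders used only to exercise the function.

```python
from typing import List, Sequence

CHROMOSOME = "11011100011110101100111001110110011010001"


def selected_features(chromosome: str) -> List[int]:
    """Return the 1-based indices of the features whose bit is set."""
    return [i for i, bit in enumerate(chromosome, start=1) if bit == "1"]


def igsum(chromosome: str, ig: Sequence[float]) -> float:
    """Sum the information gain of the selected features (ig[0] is feature 1)."""
    if len(ig) != len(chromosome):
        raise ValueError("need one information gain value per bit")
    return sum(ig[i - 1] for i in selected_features(chromosome))


features = selected_features(CHROMOSOME)
fcnt = len(features)  # feature count: number of bits set to 1

# Hypothetical information gain values, not taken from the paper.
toy_ig = [0.01] * len(CHROMOSOME)
toy_total = igsum(CHROMOSOME, toy_ig)
```

For the example chromosome, `fcnt` comes out as 24 and feature 5 is among the selected indices, matching the description in the text.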
Table 2 gives the prominent feature combinations obtained for the different attack types using the genetic algorithm. The features that recur most often in the list are selected for the experiment.
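Picking the most frequently repeated features amounts to counting how many combinations each feature index appears in. The text does not state the cut-off used to arrive at the final feature set, so the `min_count` threshold below is an assumption; the three rows are copied from Table 2 for illustration.

```python
from collections import Counter
from typing import Dict, List

# Three rows copied from Table 2 (feature indices per attack pattern).
combinations: Dict[str, List[int]] = {
    "portsweep": [3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41],
    "land": [1, 2, 3, 4, 7, 13, 18, 25, 29, 35, 38],
    "loadmodule": [1, 2, 3, 4, 7, 8, 14, 27, 36, 39, 40],
}


def most_repeated(rows: Dict[str, List[int]], min_count: int) -> List[int]:
    """Return the features that appear in at least min_count combinations."""
    # set(row) ensures a feature is counted once per combination.
    counts = Counter(f for row in rows.values() for f in set(row))
    return sorted(f for f, n in counts.items() if n >= min_count)


# Assumed threshold: keep features shared by at least two of the three rows.
common = most_repeated(combinations, min_count=2)
```

Applied to all rows of Table 2 with a suitable threshold, the same counting would yield the reduced feature set used in the experiments.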
The proposed implementation steps are given in Algorithm 3.
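The genetic operators named in Table 1 (roulette wheel selection, uniform crossover, mutation) can be sketched as one generation step. Several details are assumptions rather than facts from the paper: the crossover rate is read as the probability that a parent pair is crossed, the mutation rate as a per-bit flip probability, and `placeholder_fitness` is a hypothetical stand-in for Algorithm 2.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]

POPULATION_SIZE = 40
CHROMOSOME_LENGTH = 41
CROSSOVER_RATE = 0.5  # assumed: probability that a parent pair is crossed
MUTATION_RATE = 0.1   # assumed: probability of flipping each bit


def roulette_select(population, fitnesses, rng) -> Chromosome:
    """Pick one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def uniform_crossover(a, b, rng) -> Tuple[Chromosome, Chromosome]:
    """Swap each bit position between the parents with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


def mutate(chromosome, rng) -> Chromosome:
    """Flip each bit independently with probability MUTATION_RATE."""
    return [1 - bit if rng.random() < MUTATION_RATE else bit
            for bit in chromosome]


def next_generation(population, fitness: Callable[[Chromosome], float], rng):
    """Produce a new population of the same size as the current one."""
    fitnesses = [fitness(c) for c in population]
    offspring: List[Chromosome] = []
    while len(offspring) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        if rng.random() < CROSSOVER_RATE:
            c1, c2 = uniform_crossover(p1, p2, rng)
        else:
            c1, c2 = list(p1), list(p2)
        offspring.extend([mutate(c1, rng), mutate(c2, rng)])
    return offspring[:len(population)]


def placeholder_fitness(chromosome: Chromosome) -> float:
    """Hypothetical fitness (not Algorithm 2): favors fewer selected features."""
    return 1.0 + chromosome.count(0)


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
              for _ in range(POPULATION_SIZE)]
new_population = next_generation(population, placeholder_fitness, rng)
```

Roulette wheel selection needs strictly positive fitness values, which is why the placeholder adds 1.0; a fitness built on igsum and fcnt would need the same care.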
4. Performance Evaluation and Results
4.1. NSL-KDD Data Set. One of the main drawbacks of the KDDCup'99 data set is the repetition of records, which biases the learning algorithms towards the repeated records. This prevents them from learning the infrequent records, such as U2R and R2L attacks, which are usually more harmful to networks. In addition, the occurrence of these redundant records in the test set biases the performance results.
The NSL-KDD benchmark data set [14] has the following benefits over the KDDCup'99 data set:

(i) It does not include repeated records in the training set, so the classifiers will not be biased towards the more frequent records.

(ii) There are no duplicate records in the testing sets. Therefore, the performance of the learners is not biased.

(iii) The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDDCup'99 data set, which helps an accurate evaluation of different learning techniques. As a result, the classification rates of various machine learning methods vary in a wider range, which makes it more efficient to detect different types of attacks.

The sample distributions on the training and testing data sets with the corrected labels of the NSL-KDD data set are shown in Table 3.
4.2. Performance Evaluation Metrics. The performance of the proposed intrusion detection system is evaluated with the help of a confusion matrix. The classification performance of an IDS is measured by false alarm rate, detection rate, and accuracy, which can be calculated from the confusion matrix in Table 4. The confusion matrix is a 2 × 2 matrix in which the rows represent the actual classes and the columns represent the predicted classes.
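$$\begin{aligned}
\text{False Alarm Rate} &= \frac{FP}{TN + FP} \times 100, \\
\text{Detection Rate} &= \frac{TP}{TP + FN} \times 100, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \times 100.
\end{aligned} \tag{4}$$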
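The three measures in (4) can be computed directly from the four cells of the confusion matrix. The sketch below is a minimal illustration; the class name and the example counts are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # attack records classified as attack
    fn: int  # attack records classified as normal
    fp: int  # normal records classified as attack
    tn: int  # normal records classified as normal

    def false_alarm_rate(self) -> float:
        """FP / (TN + FP) * 100."""
        return 100 * self.fp / (self.tn + self.fp)

    def detection_rate(self) -> float:
        """TP / (TP + FN) * 100."""
        return 100 * self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        """(TP + TN) / (TP + TN + FP + FN) * 100."""
        total = self.tp + self.tn + self.fp + self.fn
        return 100 * (self.tp + self.tn) / total


# Hypothetical counts: 100 attack records and 100 normal records.
cm = ConfusionMatrix(tp=90, fn=10, fp=5, tn=95)
```

With these counts the false alarm rate is 5.0, the detection rate is 90.0, and the accuracy is 92.5.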
In this section, the performance of the proposed intrusion detection system is studied experimentally. Only the relevant features are selected, using the information gain algorithm and the genetic algorithm. The selected features and the training data set are given as input to the MIU, and accuracy, detection rate, and false alarm rate are used for evaluation. The results are tabulated and plotted as graphs.
4.3. Experiment Results. All experiments were performed on a Windows platform with an Intel Core 2 Duo CPU at 2.49 GHz and 2 GB of RAM. Simulations and the analysis of experimental results were performed using the Weka machine learning tool [15] and Java.
Selected features are used for training the fusion IDS in this experiment, and test data containing 28.39% novel (new attack) data is used.
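The share of novel attack data can be cross-checked against the counts in Table 3. The sketch below assumes the share is measured over the attack records of the test set only, which is an interpretation rather than something the text states; under that reading the result is roughly 28.4%, while measured over the whole test set it is about 16%.

```python
# Counts from Table 3 (NSL-KDD testing data set).
test_total = 22544
test_normal = 9866
novel_attacks = 1315 + 1715 + 32 + 538  # PROBE + DOS + U2R + R2L

# Assumed denominator: attack records only.
attack_records = test_total - test_normal
novel_share_of_attacks = 100 * novel_attacks / attack_records

# Alternative denominator: every record in the test set.
novel_share_of_all = 100 * novel_attacks / test_total
```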
From Table 5, it is inferred that, for the J48 classifier, there is a 57% reduction in testing time when 28 features are used instead of all features.
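The text does not give the formula behind the reduction percentages. As a hedged cross-check, and assuming the J48 testing times in Table 5 read as 1.69 s (all features) and 0.94 s (28 features), the reported 57% appears consistent with the difference taken relative to the mean of the two times; relative to the all-features time alone the reduction would be about 44%. Both function names below are hypothetical.

```python
def reduction_vs_baseline(all_features_s: float, selected_s: float) -> float:
    """Percentage reduction relative to the all-features time."""
    return 100 * (all_features_s - selected_s) / all_features_s


def reduction_vs_mean(all_features_s: float, selected_s: float) -> float:
    """Percentage difference relative to the mean of the two times."""
    mean = (all_features_s + selected_s) / 2
    return 100 * (all_features_s - selected_s) / mean


# Assumed decimal placement for the J48 testing times in Table 5.
j48_all, j48_selected = 1.69, 0.94
vs_baseline = reduction_vs_baseline(j48_all, j48_selected)
vs_mean = reduction_vs_mean(j48_all, j48_selected)
```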
From Table 6, it is inferred that the detection rate and false alarm rate of intrusion detection systems with feature selection using a single classifier (SVM, IBK, J48, RandomForest, or BayesNet) are inferior to those of the fusion IDS unit. For example, for the U2R type of attack, the detection rate achieved by the SVM classifier is 86%, by IBK 83%, by J48 82.5%, and by BayesNet 80.5%. When a fusion IDS unit with multiple heterogeneous IDSs is used, a higher detection rate of 99% is achieved.
The false alarm rate (FAR) is reduced considerably when a fusion IDS unit with multiple heterogeneous IDSs is used. For example, the FAR for the DOS attack type is 0.7% using SVM, 0.3% using IBK, 0.1% using J48, 0.2% using RandomForest, and 0.3% using BayesNet. When the fusion IDS is used, a FAR of 0.0% is achieved.
Detection rate (DTR) and false alarm rate (FAR) ofthe proposed system for the different types of attack using
Table 2: Most relevant features for each attack and information gain measures.

Attack type  Attack pattern   Igsum value  Combination of features giving high information gain value
PROBE        ipsweep          0.82         2, 3, 5, 12, 13, 14, 16, 17, 21, 23, 24, 25, 28, 31, 32, 33, 37, 38
PROBE        nmap             0.27         1, 2, 3, 5, 18, 21, 22, 28, 29, 31, 32, 34, 35, 36, 37
PROBE        portsweep        0.58         3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41
PROBE        satan            0.75         1, 3, 5, 11, 15, 19, 23, 24, 25, 27, 28, 29, 30, 31, 32, 35, 39, 40, 41
PROBE        mscan            1.11         1, 3, 4, 5, 7, 12, 17, 21, 25, 27, 28, 29, 31, 33, 35, 39, 40, 41
PROBE        saint            0.33         1, 5, 7, 12, 16, 24, 25, 29, 32, 33, 34, 35, 37, 38, 40
DOS          back             0.38         1, 2, 4, 5, 6, 10, 11, 12, 13, 15, 17, 18, 21, 22, 23, 26, 27, 28, 30, 31, 34, 35, 37, 41
DOS          land             0.0009       1, 2, 3, 4, 7, 13, 18, 25, 29, 35, 38
DOS          neptune          7.73         1, 3, 4, 5, 6, 7, 13, 15, 17, 19, 20, 26, 28, 29, 30, 31, 33, 34, 35, 38, 39
DOS          pod              0.052        2, 3, 5, 7, 8, 9, 10, 11, 17, 19, 21, 23, 26, 33, 34, 39, 40
DOS          smurf            0.68         2, 3, 5, 8, 17, 23, 24, 25, 26, 29, 33, 35, 36, 38, 39
DOS          teardrop         0.27         3, 4, 5, 6, 8, 10, 13, 23, 24, 25, 26, 32, 34, 35, 36, 37, 39, 40
U2R          buffer_overflow  0.0086       1, 2, 3, 5, 6, 7, 8, 9, 10, 14, 21, 23, 29, 30, 31, 32, 33, 36, 38, 39, 40
U2R          loadmodule       0.0058       1, 2, 3, 4, 7, 8, 14, 27, 36, 39, 40
U2R          rootkit          0.0035       3, 6, 9, 11, 13, 14, 16, 17, 18, 23, 28, 31, 32, 33, 34, 35, 37, 39, 41
R2L          guess_passwd     0.025        2, 3, 4, 6, 9, 10, 11, 13, 14, 17, 21, 23, 24, 37, 38, 39, 40, 41
R2L          imap             0.0035       3, 4, 5, 6, 10, 12, 20, 23, 25, 27, 29, 30, 32, 33, 34, 36, 38, 39, 41
R2L          multihop         0.0024       3, 4, 10, 12, 13, 14, 16, 17, 18, 19, 22, 26, 27, 30, 35, 37
R2L          phf              0.0021       3, 4, 6, 8, 9, 10, 13, 14, 19, 28, 29, 36
R2L          spy              0.0003       2, 3, 4, 5, 9, 15, 18, 22, 16, 39
R2L          warezclient      0.21         3, 4, 5, 6, 10, 12, 14, 16, 24, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41
R2L          warezmaster      0.008        1, 2, 3, 4, 6, 12, 13, 14, 16, 17, 19, 22, 23, 24, 31, 35, 36, 37, 39
Normal                        11.96        1, 2, 3, 4, 5, 6, 7, 15, 23, 24, 14, 15, 19, 20, 21, 23, 25, 26, 27, 28, 30, 32, 33, 34, 36, 37, 38
Table 3: The sample distributions on the training and testing data sets with the corrected labels of the NSL-KDD data set.

         Training data set                Testing data set
Class    Number of    Samples             Number of    Samples           Number of novel
         samples      percentage (%)      samples      percentage (%)    attack samples
Normal   13449        53.39               9866         43.76             -
PROBE    2289         9.09                2421         10.74             1315
DOS      9234         36.65               7456         33.07             1715
U2R      11           0.04                67           0.30              32
R2L      208          0.83                2734         12.13             538
Total    25192        100                 22544        100               3600
Table 4: Confusion matrix.

                 Predicted attack       Predicted normal
Actual attack    True positive (TP)     False negative (FN)
Actual normal    False positive (FP)    True negative (TN)

True positive (TP): the number of records detected as attack that are actually attack.
True negative (TN): the number of records detected as normal that are actually normal.
False positive (FP): the number of records detected as attack that are actually normal.
False negative (FN): the number of records detected as normal that are actually attack.
selected features of the test data set of the KDDCup'99 data set are tabulated in Table 7. On average, a detection rate of 98.4% is achieved. The average false alarm rate achieved is 0.68%.
The experimental results of Thomas and Balakrishnan [5] are taken for a comparative study. Table 7 gives the detection rate of the proposed system and of the work of Thomas and Balakrishnan [5]. The detection rate for DOS is 64% in the previous work [5] and 99% for the proposed system. Similarly, for PROBE, U2R, and R2L, there is a marked improvement in detection rate over the previous work [5], particularly for R2L. Likewise, the false alarm rate for DOS is 36.20% in the work of Thomas and Balakrishnan [5], but in the proposed work it is reduced to 1.0%; for PROBE, U2R, and R2L, the false alarm rate has also decreased drastically.
Figures 3 and 4 present a comparative study of the detection rate and false alarm rate of the proposed and existing fusion methods.
5. Conclusion
The key idea behind the study is that any IDS is efficient in detecting some specific attack category. Different IDSs which
Table 5: Comparison of training and testing (built-in) time for different classifiers using all and selected features.

                Training data set                                Testing data set
Classifier      All features   28 features   Reduction in        All features   28 features   Reduction in
                (seconds)      (seconds)     training time (%)   (seconds)      (seconds)     testing time (%)
BayesNet        0.86           0.47          59                  0.69           0.55          23
RandomForest    13.91          10.31         30                  12.88          10.07         24
J48             1.92           1.55          22                  1.69           0.94          57
IBK             0.30           0.15          67                  0.25           0.14          56
SVM             7.90           7.10          11                  126.00         121.0         4
Table 6: Detection rate and false alarm rate of each classifier for test data.

              Detection rate                                        False alarm rate
              Anomaly-based                 Signature-based         Anomaly-based                 Signature-based
Attack type   SVM    IBK    RandomForest    J48    BayesNet         SVM    IBK    RandomForest    J48    BayesNet
DOS           95.4   99.5   99.7            99.6   93.7             0.7    0.3    0.2             0.1    0.3
PROBE         98.1   97.7   98.2            98.1   98.0             0.8    0.3    0.1             0.1    1.2
U2R           86.0   83.0   86.0            82.5   80.5             0.1    0.2    0.1             0.1    0.8
R2L           94.3   94.0   95.5            95.2   90.4             1.6    1.1    0.7             0.6    2.3
Normal        94.1   97.2   98.5            98.5   92.1             3.8    1.8    1.2             1.3    2.2
Table 7: Comparison of detection rate and false alarm rate for the work of Thomas and Balakrishnan [5] and the proposed system for different attacks.

          Detection rate                                  False alarm rate
Attack    Thomas and          Proposed system             Thomas and          Proposed system
          Balakrishnan [5]    (28 features)               Balakrishnan [5]    (28 features)
DOS       64                  99                          36.50               1
PROBE     76                  99                          24.32               1
U2R       92                  98                          8.10                1.38
R2L       64                  99                          35.84               1
Figure 3: Performance comparison on detection rate of the proposed work and the work of Thomas and Balakrishnan [5]. (Bar chart; x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: detection rate (%), 0 to 100; series: Thomas and Balakrishnan work [5], proposed system.)
are good at detecting different attacks are combined and an MIU is framed. This paper uses only the relevant features of the input traffic data for processing, and a promising classification result is obtained from the MIU, which is
Figure 4: Performance comparison on false alarm rate of the proposed work and the work of Thomas and Balakrishnan [5]. (Bar chart; x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: false alarm rate, 0 to 40; series: Thomas and Balakrishnan work [5], proposed system.)
the fusion of heterogeneous IDSs. In comparison with the work of Thomas and Balakrishnan [5], a good improvement in the detection rate and false alarm rate is achieved. When the detection rate and false alarm rate of a single IDS unit are
compared with those of the fusion IDS unit, there is a vast improvement in performance. The feature selection done with the genetic algorithm has extracted the relevant features from the 41 features. As a result, training and testing speed improve and good accuracy is obtained. The binary interpretation of the anomaly score can be avoided in future work: the anomaly score can be normalized and multiplied by the respective weights, as in basic probability assignments.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.
[4] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.
[5] C. Thomas and N. Balakrishnan, "Improvement in intrusion detection with advances in sensor fusion," IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.
[6] G. Giacinto, F. Roli, and L. Didaci, "Fusion of multiple classifiers for intrusion detection in computer networks," Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.
[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, "Intrusion sensor data fusion in an intelligent intrusion detection system architecture," in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.
[8] D. Parikh and T. Chen, "Data fusion and cost minimization for intrusion detection," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.
[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.
[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, "An efficient intrusion detection system based on support vector machines and gradually feature removal method," Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.
[11] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.
[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, "Building lightweight intrusion detection system using wrapper-based feature selection mechanisms," Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen, et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.
[14] KDD Cup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[15] Weka, Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] H-J Liao C-H Richard Lin Y-C Lin and K-Y TungldquoIntrusion detection system a comprehensive reviewrdquo Journalof Network and Computer Applications vol 36 no 1 pp 16ndash242013
[2] M H Bhuyan D K Bhattacharyya and J K Kalita ldquoNetworkanomaly detection methods systems and toolsrdquo IEEE Commu-nications Surveys amp Tutorials vol 16 no 1 pp 303ndash336 2014
[3] H G Kayacik A N Zincir-Heywood and M I HeywoodldquoSelecting features for intrusion detection A feature relevanceanalysis onKDD99 intrusion detection datasetsrdquo inProceedingsof the 3rd Annual Conference on Privacy Security and Trust(PST rsquo05) St Andrews Canada October 2005
[4] MWozniak M Grana and E Corchado ldquoA survey of multipleclassifier systems as hybrid systemsrdquo Information Fusion vol 16no 1 pp 3ndash17 2014
[5] C Thomas and N Balakrishnan ldquoImprovement in intrusiondetection with advances in sensor fusionrdquo IEEE Transactionson Information Forensics and Security vol 4 no 3 pp 542ndash5512009
[6] G Giacinto F Roli and L Didaci ldquoFusion ofmultiple classifiersfor intrusion detection in computer networksrdquo Pattern Recogni-tion Letters vol 24 no 12 pp 1795ndash1803 2003
[7] A Siraj R B Vaughn and S M Bridges ldquoIntrusion sensor datafusion in an intelligent intrusion detection system architecturerdquoin Proceedings of the Hawaii International Conference on SystemSciences pp 4437ndash4446 January 2004
[8] D Parikh and T Chen ldquoData fusion and cost minimization forintrusion detectionrdquo IEEE Transactions on Information Foren-sics and Security vol 3 no 3 pp 381ndash389 2008
[9] G Giacinto R Perdisci M Del Rio and F Roli ldquoIntrusiondetection in computer networks by a modular ensemble of one-class classifiersrdquo Information Fusion vol 9 no 1 pp 69ndash822008
[10] Y Li J Xia S Zhang J Yan X Ai and K Dai ldquoAn efficientintrusion detection system based on support vector machinesand gradually feature removal methodrdquo Expert Systems withApplications vol 39 no 1 pp 424ndash430 2012
[11] A Sung and S Mukkamala ldquoIdentifying important features forintrusion detection using support vector machines and neuralnetworksrdquo in Proceedings of the Symposium on Applications andthe Internet (SAINT rsquo03) pp 209ndash216 Orlando Fla USA
[12] Y Li J-L Wang Z-H Tian T-B Lu and C Young ldquoBuildinglightweight intrusion detection system using wrapper-basedfeature selection mechanismsrdquo Computers and Security vol 28no 6 pp 466ndash475 2009
[13] S-J Horng M-Y Su Y-H Chen et al ldquoA novel intrusiondetection system based on hierarchical clustering and supportvector machinesrdquo Expert Systems with Applications vol 38 no1 pp 306ndash313 2011
[14] KDDCup dataset 2014 httpkddicsuciedudatabaseskddcup99kddcup99html
[15] WekaWaikato environment for knowledge analysis (weka) ver-sion 36 2014 httpwwwcswaikatoacnzmlweka
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Table 1: Genetic algorithm parameters.

Modeling description    Setting
Population size         40
Selection technique     Roulette wheel
Crossover type          Uniform crossover
Crossover rate          0.5
Mutation rate           0.1
the sum of their information gain values should be maximum. The genetic algorithm is designed with a population size of 40. A binary chromosome of length 41 is constructed, with each bit representing a feature. This binary chromosome is given as input to the fitness function (Algorithm 2). The information gain (IG) values of the selected features (i.e., bits set to 1) are summed to get the total information gain value (igsum). The number of 1s in the chromosome gives the feature count (fcnt). For example, consider the following chromosome:
11011100011110101100111001110110011010001
Here, bit 5 is set (i.e., value = 1), which indicates that the 5th feature is selected for processing. In this chromosome, 24 bits are set in total, so the feature count (fcnt) is 24. The total information gain value (igsum), obtained by summing the information gain (IG) of the 24 selected features, is 0.37586. The genetic algorithm parameter values are listed in Table 1.
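As an illustration of the fitness evaluation just described, the following Python sketch decodes the example chromosome and computes fcnt and igsum. The function names `decode` and `fitness` and the uniform placeholder IG table are assumptions made for this sketch. Algorithm 2 and the real per-feature IG values are not reproduced here, so the igsum it prints will not match 0.37586.

```python
# Example chromosome from the text: one bit per feature, 41 features in total.
CHROMOSOME = "11011100011110101100111001110110011010001"


def decode(chromosome):
    """Return the 1-based indices of the features whose bit is set to 1."""
    return [i + 1 for i, bit in enumerate(chromosome) if bit == "1"]


def fitness(chromosome, info_gain):
    """Return (igsum, fcnt) for a chromosome.

    info_gain maps a 1-based feature index to its information gain (IG).
    """
    selected = decode(chromosome)
    igsum = sum(info_gain[i] for i in selected)
    fcnt = len(selected)
    return igsum, fcnt


# Hypothetical IG table: every feature gets the same placeholder value.
# Real values would come from the information gain computation on the data.
PLACEHOLDER_IG = {i: 0.01 for i in range(1, 42)}

selected = decode(CHROMOSOME)
igsum, fcnt = fitness(CHROMOSOME, PLACEHOLDER_IG)
print(len(CHROMOSOME), fcnt, 5 in selected)
```

With the real IG table substituted for `PLACEHOLDER_IG`, the same call would return the igsum that the genetic algorithm tries to maximize.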
Table 2 lists the best feature combinations obtained for the different attack types using the genetic algorithm. The features that recur most often across these lists are selected for the experiment.
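The selection of frequently recurring features can be sketched as a simple frequency count. The four feature lists below are copied from Table 2 (one attack pattern per category). The helper name `recurring_features` and the threshold `min_count=3` are assumptions for illustration only, since the exact rule used to arrive at the final feature set is not stated.

```python
from collections import Counter

# Feature lists copied from four rows of Table 2.
FEATURE_SETS = {
    "portsweep": [3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41],
    "land": [1, 2, 3, 4, 7, 13, 18, 25, 29, 35, 38],
    "loadmodule": [1, 2, 3, 4, 7, 8, 14, 27, 36, 39, 40],
    "phf": [3, 4, 6, 8, 9, 10, 13, 14, 19, 28, 29, 36],
}


def recurring_features(feature_sets, min_count):
    """Return the features that appear in at least min_count of the lists."""
    counts = Counter()
    for features in feature_sets.values():
        # set() ensures a feature is counted once per attack pattern.
        counts.update(set(features))
    return sorted(f for f, c in counts.items() if c >= min_count)


common = recurring_features(FEATURE_SETS, min_count=3)
print(common)
```

Running the count over all rows of Table 2 with a suitable threshold would give a ranked candidate list from which the final features can be chosen.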
The proposed implementation steps are given in Algorithm 3.
4 Performance Evaluation and Results
4.1 NSL-KDD Data Set. One of the main drawbacks of the KDDCup'99 data set is the repetition of records, which biases the learning algorithms towards the repeated records. It thus prevents them from learning the infrequent records, which are usually more harmful to networks, as in U2R and R2L attacks. In addition, the occurrence of these redundant records in the test set biases the performance results.

The NSL-KDD benchmark data set [14] has the following benefits over the KDDCup'99 data set:
(i) It does not include repeated records in the training set, so the classifiers are not biased towards the more frequent records.

(ii) There are no duplicate records in the testing sets. Therefore, the performance of the learners is not biased.

(iii) The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDDCup'99 data set, which supports an accurate evaluation of different learning techniques. As a result, the classification rates of various machine learning methods vary over a wider range, which makes it more efficient to detect different types of attacks.

The sample distributions of the training and testing data sets, with the corrected labels of the NSL-KDD data set, are shown in Table 3.
4.2 Performance Evaluation Metrics. The performance of the proposed intrusion detection system is evaluated with the help of a confusion matrix. The classification performance of the IDS is measured by false alarm rate, detection rate, and accuracy, which can be calculated from the confusion matrix in Table 4. The confusion matrix is a 2 × 2 matrix in which the rows represent the actual classes and the columns represent the predicted classes.
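$$
\begin{aligned}
\text{False Alarm Rate} &= \frac{FP}{TN + FP} \times 100, \\
\text{Detection Rate} &= \frac{TP}{TP + FN} \times 100, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \times 100.
\end{aligned}
\tag{4}
$$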
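The three measures in (4) follow directly from the four confusion-matrix counts. The Python sketch below shows the arithmetic. The function name `ids_metrics` and the counts in the example call are made up for illustration and are not results from the experiments.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return (false_alarm_rate, detection_rate, accuracy) in percent.

    tp, tn, fp, fn are the confusion-matrix counts defined in Table 4.
    """
    if tn + fp == 0 or tp + fn == 0:
        raise ValueError("need at least one actual normal and one actual attack record")
    false_alarm_rate = 100.0 * fp / (tn + fp)
    detection_rate = 100.0 * tp / (tp + fn)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return false_alarm_rate, detection_rate, accuracy


# Illustrative counts only: 100 actual attacks and 100 actual normal records.
far, dtr, acc = ids_metrics(tp=90, tn=95, fp=5, fn=10)
print(far, dtr, acc)
```

Note that the false alarm rate is normalized by the number of actual normal records and the detection rate by the number of actual attack records, so the two can be read independently of the class balance.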
In this section, the performance of the proposed intrusion detection system is studied experimentally. Only the relevant features are selected, using the information gain algorithm and the genetic algorithm. The selected features and the training data set are given as input to the MIU, and performance measures such as accuracy, detection rate, and false alarm rate are considered for evaluation. The results are tabulated and plotted as graphs.
4.3 Experiment Results. All experiments were performed on a Windows platform with an Intel Core 2 Duo CPU at 2.49 GHz and 2 GB of RAM. Simulations and the analysis of experimental results were performed using the Weka machine learning tool [15] and Java.

Selected features are considered for training the fusion IDS in this experiment, and test data containing 28.39% novel (new attack) data is used.
From Table 5, it is inferred that for the J48 classifier there is a 57% reduction in testing time when 28 features are used instead of all features.

From Table 6, it is inferred that the detection rate and false alarm rate of intrusion detection systems with feature selection using a single classifier such as SVM, IBK, J48, RandomForest, or BayesNet are inferior to those of the fusion IDS unit. For example, for the U2R attack type, the detection rate achieved by the SVM classifier is 86%, by IBK 83%, by J48 82.5%, and by BayesNet 80.5%. When a fusion IDS unit with multiple heterogeneous IDSs is used, a higher detection rate of 99% is achieved.

The false alarm rate (FAR) is reduced considerably when a fusion IDS unit with multiple heterogeneous IDSs is used. For example, the FAR for the DOS attack type is 0.7 using SVM, 0.3 using IBK, 0.1 using J48, 0.2 using RandomForest, and 0.3 using BayesNet. When the fusion IDS is used, an FAR of 0.0 is achieved.
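The fusion unit combines the local decisions of the individual IDS units with a majority voting rule. A minimal Python sketch of that rule is given below. The function name `majority_vote`, the label strings, and the example decisions are assumptions for illustration, and the sketch does not reproduce the full MIU pipeline.

```python
from collections import Counter


def majority_vote(local_decisions):
    """Fuse local IDS decisions into one final decision by simple majority.

    local_decisions is a list of labels, one per IDS unit. With five units
    and two labels ("attack" or "normal") a tie cannot occur. With more
    labels, a tie is resolved in favour of the label seen first.
    """
    if not local_decisions:
        raise ValueError("at least one local decision is required")
    label, _count = Counter(local_decisions).most_common(1)[0]
    return label


# Hypothetical local decisions for one connection record.
decisions = {
    "SVM": "attack",
    "IBK": "attack",
    "RandomForest": "normal",
    "J48": "attack",
    "BayesNet": "normal",
}
final_decision = majority_vote(list(decisions.values()))
print(final_decision)
```

Because each classifier tends to err on different records, a record misclassified by one or two units can still be labelled correctly by the fused decision, which is the intuition behind the lower false alarm rate reported for the fusion IDS.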
The detection rate (DTR) and false alarm rate (FAR) of the proposed system for the different attack types, using selected features of the test data set of the KDDCup'99 data set, are tabulated in Table 7. On average, a detection rate of 98.4% is achieved. The average false alarm rate achieved is 0.68%.

The experimental results of Thomas and Balakrishnan [5] are taken for a comparative study. Table 7 gives the detection rate of the proposed system and that of Thomas and Balakrishnan [5]. The detection rate for DOS is 64% in the previous work [5] and 99% for the proposed system. Similarly, for PROBE, U2R, and R2L, the detection rate improves compared with the previous work [5], most notably for R2L. The false alarm rate for DOS is 36.20% in the work of Thomas and Balakrishnan [5], whereas in the proposed work it is reduced to 1.0%. For PROBE, U2R, and R2L, the false alarm rate has also decreased drastically.

Figures 3 and 4 present a comparative study of the detection rate and false alarm rate of the proposed and existing fusion methods.

Table 2: Most relevant features for each attack and information gain measures.

Attack type  Attack pattern   Igsum value  Combination of features giving high information gain value
PROBE        ipsweep          0.82         2, 3, 5, 12, 13, 14, 16, 17, 21, 23, 24, 25, 28, 31, 32, 33, 37, 38
PROBE        nmap             0.27         1, 2, 3, 5, 18, 21, 22, 28, 29, 31, 32, 34, 35, 36, 37
PROBE        portsweep        0.58         3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41
PROBE        satan            0.75         1, 3, 5, 11, 15, 19, 23, 24, 25, 27, 28, 29, 30, 31, 32, 35, 39, 40, 41
PROBE        mscan            1.11         1, 3, 4, 5, 7, 12, 17, 21, 25, 27, 28, 29, 31, 33, 35, 39, 40, 41
PROBE        saint            0.33         1, 5, 7, 12, 16, 24, 25, 29, 32, 33, 34, 35, 37, 38, 40
DOS          back             0.38         1, 2, 4, 5, 6, 10, 11, 12, 13, 15, 17, 18, 21, 22, 23, 26, 27, 28, 30, 31, 34, 35, 37, 41
DOS          land             0.0009       1, 2, 3, 4, 7, 13, 18, 25, 29, 35, 38
DOS          neptune          7.73         1, 3, 4, 5, 6, 7, 13, 15, 17, 19, 20, 26, 28, 29, 30, 31, 33, 34, 35, 38, 39
DOS          pod              0.052        2, 3, 5, 7, 8, 9, 10, 11, 17, 19, 21, 23, 26, 33, 34, 39, 40
DOS          smurf            0.68         2, 3, 5, 8, 17, 23, 24, 25, 26, 29, 33, 35, 36, 38, 39
DOS          teardrop         0.27         3, 4, 5, 6, 8, 10, 13, 23, 24, 25, 26, 32, 34, 35, 36, 37, 39, 40
U2R          buffer_overflow  0.0086       1, 2, 3, 5, 6, 7, 8, 9, 10, 14, 21, 23, 29, 30, 31, 32, 33, 36, 38, 39, 40
U2R          loadmodule       0.0058       1, 2, 3, 4, 7, 8, 14, 27, 36, 39, 40
U2R          rootkit          0.0035       3, 6, 9, 11, 13, 14, 16, 17, 18, 23, 28, 31, 32, 33, 34, 35, 37, 39, 41
R2L          guess_passwd     0.025        2, 3, 4, 6, 9, 10, 11, 13, 14, 17, 21, 23, 24, 37, 38, 39, 40, 41
R2L          imap             0.0035       3, 4, 5, 6, 10, 12, 20, 23, 25, 27, 29, 30, 32, 33, 34, 36, 38, 39, 41
R2L          multihop         0.0024       3, 4, 10, 12, 13, 14, 16, 17, 18, 19, 22, 26, 27, 30, 35, 37
R2L          phf              0.0021       3, 4, 6, 8, 9, 10, 13, 14, 19, 28, 29, 36
R2L          spy              0.0003       2, 3, 4, 5, 9, 15, 18, 22, 16, 39
R2L          warezclient      0.21         3, 4, 5, 6, 10, 12, 14, 16, 24, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41
R2L          warezmaster      0.008        1, 2, 3, 4, 6, 12, 13, 14, 16, 17, 19, 22, 23, 24, 31, 35, 36, 37, 39
Normal                        11.96        1, 2, 3, 4, 5, 6, 7, 15, 23, 24, 14, 15, 19, 20, 21, 23, 25, 26, 27, 28, 30, 32, 33, 34, 36, 37, 38

Table 3: The sample distributions on the training and testing data sets with the corrected labels of the NSL-KDD data set.

         Training data set         Testing data set
Class    Samples   Samples (%)     Samples   Samples (%)   Novel attack samples
Normal   13449     53.39           9866      43.76         -
PROBE    2289      9.09            2421      10.74         1315
DOS      9234      36.65           7456      33.07         1715
U2R      11        0.04            67        0.30          32
R2L      208       0.83            2734      12.13         538
Total    25192     100             22544     100           3600

Table 4: Confusion matrix.

                 Predicted attack       Predicted normal
Actual attack    True positive (TP)     False negative (FN)
Actual normal    False positive (FP)    True negative (TN)

True positive (TP): the number of records detected as attack that are actually attacks.
True negative (TN): the number of records detected as normal that are actually normal.
False positive (FP): the number of records detected as attack that are actually normal.
False negative (FN): the number of records detected as normal that are actually attacks.
Table 5: Comparison of training and testing (built-in) time for different classifiers using all and selected features.

               Training data set                                   Testing data set
Classifier     All features (s)  28 features (s)  Reduction (%)    All features (s)  28 features (s)  Reduction (%)
BayesNet       0.86              0.47             59               0.69              0.55             23
RandomForest   13.91             10.31            30               12.88             10.07            24
J48            1.92              1.55             22               1.69              0.94             57
IBK            0.30              0.15             67               0.25              0.14             56
SVM            7.90              7.10             11               126.00            121.0            4

Table 6: Detection rate and false alarm rate of each classifier (anomaly-based and signature-based IDSs) for test data.

              Detection rate (%)                               False alarm rate (%)
Attack type   SVM    IBK    RandomForest  J48    BayesNet      SVM   IBK   RandomForest  J48   BayesNet
DOS           95.4   99.5   99.7          99.6   93.7          0.7   0.3   0.2           0.1   0.3
PROBE         98.1   97.7   98.2          98.1   98.0          0.8   0.3   0.1           0.1   1.2
U2R           86.0   83.0   86.0          82.5   80.5          0.1   0.2   0.1           0.1   0.8
R2L           94.3   94.0   95.5          95.2   90.4          1.6   1.1   0.7           0.6   2.3
Normal        94.1   97.2   98.5          98.5   92.1          3.8   1.8   1.2           1.3   2.2

Table 7: Comparison of detection rate and false alarm rate for the work of Thomas and Balakrishnan [5] and the proposed system for different attacks.

          Detection rate (%)                                   False alarm rate (%)
Attack    Thomas and Balakrishnan [5]  Proposed (28 features)  Thomas and Balakrishnan [5]  Proposed (28 features)
DOS       64                           99                      36.50                        1
PROBE     76                           99                      24.32                        1
U2R       92                           98                      8.10                         1.38
R2L       64                           99                      35.84                        1

Figure 3: Performance comparison on detection rate of the proposed work and the work of Thomas and Balakrishnan [5] (x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: detection rate (%)).

Figure 4: Performance comparison on false alarm rate of the proposed work and the work of Thomas and Balakrishnan [5] (x-axis: attack types DOS, PROBE, U2R, R2L; y-axis: false alarm rate).

5 Conclusion

The key idea behind the study is that any IDS is efficient in detecting some specific attack category. Different IDSs that are good at detecting different attacks are combined to form an MIU. This paper uses only the relevant features of the input traffic data for processing, and a promising classification result is obtained from the MIU, which is the fusion of heterogeneous IDSs. In comparison with the work of Thomas and Balakrishnan [5], a good improvement in the detection rate and false alarm rate is achieved. When the detection rate and false alarm rate of a single IDS unit are compared with those of the fusion IDS unit, there is a vast improvement in performance. Feature selection with the genetic algorithm has extracted the relevant features from the 41 features. As a result, training and testing speed improve and good accuracy is obtained. The binary interpretation of the anomaly score can be avoided in future work: the anomaly score can be normalized and multiplied by the respective weights, as in basic probability assignments.
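The future-work idea of replacing binary local decisions with normalized, weighted anomaly scores can be sketched as follows. This is only one possible reading: it uses min-max normalization and a plain weighted average rather than a full basic-probability-assignment scheme. The score ranges, weights, threshold, and function names are all hypothetical.

```python
def min_max_normalize(score, low, high):
    """Map a raw anomaly score from [low, high] to [0, 1], clipping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (score - low) / (high - low)))


def weighted_fusion(scores, ranges, weights, threshold=0.5):
    """Fuse per-IDS anomaly scores into one score and a final label.

    scores, ranges, and weights are dicts keyed by IDS name. The fused score
    is the weight-averaged normalized score. It is labelled "attack" when it
    reaches the (hypothetical) threshold.
    """
    total_weight = sum(weights[name] for name in scores)
    fused = sum(
        weights[name] * min_max_normalize(scores[name], *ranges[name])
        for name in scores
    ) / total_weight
    return fused, ("attack" if fused >= threshold else "normal")


# Hypothetical raw scores, score ranges, and weights for three IDS units.
scores = {"SVM": 0.8, "IBK": 3.0, "J48": 0.2}
ranges = {"SVM": (0.0, 1.0), "IBK": (0.0, 4.0), "J48": (0.0, 1.0)}
weights = {"SVM": 0.5, "IBK": 0.3, "J48": 0.2}

fused_score, label = weighted_fusion(scores, ranges, weights)
print(round(fused_score, 3), label)
```

Compared with majority voting, a scheme of this kind keeps the confidence of each unit instead of discarding it at the local-decision stage, at the cost of having to choose the weights and the threshold.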
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.

[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.

[4] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[5] C. Thomas and N. Balakrishnan, "Improvement in intrusion detection with advances in sensor fusion," IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.

[6] G. Giacinto, F. Roli, and L. Didaci, "Fusion of multiple classifiers for intrusion detection in computer networks," Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.

[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, "Intrusion sensor data fusion in an intelligent intrusion detection system architecture," in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.

[8] D. Parikh and T. Chen, "Data fusion and cost minimization for intrusion detection," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.

[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.

[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, "An efficient intrusion detection system based on support vector machines and gradually feature removal method," Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.

[11] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.

[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, "Building lightweight intrusion detection system using wrapper-based feature selection mechanisms," Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.

[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[14] KDDCup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[15] Weka: Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.
6 The Scientific World Journal
Table 2 Most relevant features for each attack and information gain measures
Attack type Attack pattern Igsum value Various combination of features giving high information gain value
PROBE
ipsweep 082 2 3 5 12 13 14 16 17 21 23 24 25 28 31 32 33 37 38nmap 027 1 2 3 5 18 21 22 28 29 31 32 34 35 36 37
portsweep 058 3 4 10 24 27 29 34 35 36 37 41satan 075 1 3 5 11 15 19 23 24 25 27 28 29 30 31 32 35 39 40 41mscan 111 1 3 4 5 7 12 17 21 25 27 28 29 31 33 35 39 40 41saint 033 1 5 7 12 16 24 25 29 32 33 34 35 37 38 40
DOS
back 038 1 2 4 5 6 10 11 12 13 15 17 18 21 22 23 26 27 28 30 31 34 35 37 41land 00009 1 2 3 4 7 13 18 25 29 35 38
neptune 773 1 3 4 5 6 7 13 15 17 19 20 26 28 29 30 31 33 34 35 38 39pod 0052 2 3 5 7 8 9 10 11 17 19 21 23 26 33 34 39 40smurf 068 2 3 5 8 17 23 24 25 26 29 33 35 36 38 39
teardrop 027 3 4 5 6 8 10 13 23 24 25 26 32 34 35 36 37 39 40
U2RBuffer overflow 00086 1 2 3 5 6 7 8 9 10 14 21 23 29 30 31 32 33 36 38 39 40loadmodule 00058 1 2 3 4 7 8 14 27 36 39 40
rootkit 00035 3 6 9 11 13 14 16 17 18 23 28 31 32 33 34 35 37 39 41
R2L
guess passwd 0025 2 3 4 6 9 10 11 13 14 17 21 23 24 37 38 39 40 41imap 00035 3 4 5 6 10 12 20 23 25 27 29 30 3233 34 36 38 39 41
multihop 00024 3 4 10 12 13 14 16 17 18 19 22 26 27 30 35 37phf 00021 3 4 6 8 9 10 13 14 19 28 29 36spy 00003 2 3 4 5 9 15 18 22 16 39
warezclient 021 3 4 5 6 10 12 14 16 24 27 28 29 30 32 33 34 35 37 38 39 40 41warezmaster 0008 1 2 3 4 6 12 13 14 16 17 19 22 23 24 31 35 36 37 39
Normal 1196 1 2 3 4 5 6 7 15 23 24 14 15 19 20 21 23 25 26 27 28 30 32 33 34 36 37 38
Table 3 The sample distributions on the training and testing data sets with the corrected labels of NSL-KDD data set
Class Training data set Testing data setNumber of samples Samples percentage () Number of samples Samples percentage () Number of novel attack samples
Normal 13449 5339 9866 4376 mdashPROBE 2289 909 2421 1074 1315DOS 9234 3665 7456 3307 1715U2R 11 004 67 030 32R2L 208 083 2734 1213 538
25192 100 22544 100 3600
Table 4 Confusion matrix
Predicted attack Predicted normalActual attack True positive (TP) False negative (FN)Actual normal False positive (FP) True negative (TN)True positive (TP) the number of attacks detected when it is actually attackTrue negative (TN) the number of normal detected when it is actuallynormalFalse positive (FP) the number of attacks detectedwhen it is actually normalFalse negative (FN) the number of normal detectedwhen it is actually attack
selected features of the test data set of KDDCuprsquo99 data setare tabulated in Table 7 On an average 984 of detectionrate is achievedThe average false alarm rate achieved is 068
The experimental results ofThomas and Balakrishnan [5]paper are taken for a comparative study Table 7 gives thedetection rate of the proposed system and the Thomas and
Balakrishnan [5] work The detection rate for DOS is 64 inprevious [5] work and it is 99 for the proposed system Sim-ilarly for PROBE U2R and R2L there is a high improvementin detection rate while comparing with previous work [5]Particularly for R2L there is improvement in the detectionrate Similarly the false alarm rate for DOS is 3620 in thework of Thomas and Balakrishnan [5] but in the proposedwork the value is minimized to 10 and for PROBE U2R andR2L also the false alarm rate value has decreased drastically
Figures 3 and 4 present a comparative study of detectionrate and false alarm rate of the proposed and existing fusionmethods
5 Conclusion
The key idea behind the study is that any IDS is efficient indetecting some specific attack category Different IDSs which
The Scientific World Journal 7
Table 5 Comparison of training and testing (built-in) time for different classifier using all and selected features
ClassifierTraining data set Testing data set
All features(seconds)
28 features(seconds)
Reduction in training(built-in) time ()
All features(seconds)
28 features(seconds)
Reduction intesting (built-in)
time ()BayesNet 086 047 59 069 055 23RandomForest 1391 1031 30 1288 1007 24J48 192 155 22 169 094 57IBK 030 015 67 025 014 56SVM 790 710 11 12600 1210 4
Table 6 Detection rate and false alarm rate of each classifier for test data
Attack typeDetection rate False alarm rate
Anomaly-based Signature-based Anomaly-based Signature-basedSVM IBK RandomForest J48 BayesNet SVM IBK RandomForest J48 BayesNet
DOS 954 995 997 996 937 07 03 02 01 03PROBE 981 977 982 981 980 08 03 01 01 12U2R 860 830 860 825 805 01 02 01 01 08R2L 943 940 955 952 904 16 11 07 06 23Normal 941 972 985 985 921 38 18 12 13 22
Table 7 Comparison of detection rate and false alarm rate for Thomas and Balakrishnan [5] work and proposed system for different attack
Attack Detection rate False alarm rateThomas and Balakrishnan [5] Proposed system (28 features) Thomas and Balakrishnan [5] Proposed system (28 features)
DOS 64 99 3650 1PROBE 76 99 2432 1U2R 92 98 810 138R2L 64 99 3584 1
100
0102030405060708090
Det
ectio
n ra
te (
)
DOS PROBE U2R R2L
Thomas and Balakrishnan work [5]Proposed system
Attack types
Figure 3 Performance comparison on detection rate of proposedwork andThomas and Balakrishnan [5] work
are good in detecting different attacks are combined togetherand an MIU is framed This paper uses only relevant featuresof the input traffic data for processing and the promisingclassification result is obtained from the MIU which is
40
35
30
25
20
15
10
5
0DOS PROBE U2R R2L
Thomas and Balakrishnan work [5]Proposed system
Attack types
False
alar
m ra
te
Figure 4 Performance comparison on false alarm rate of proposedwork andThomas and Balakrishnan [5] work
the fusion of heterogeneous IDSs In comparison with thework of Thomas and Balakrishnan [5] good improvementin the detection rate and false alarm rate is achieved Whenthe detection rate and false alarm rate of single IDS unit are
8 The Scientific World Journal
compared with fusion IDS unit there is a vast improvementin the performance The feature selection done with geneticalgorithm has extracted the relevant features from the 41 fea-tures As a result there is improvement in training and testingspeed and good accuracy foundThe binary interpretation ofanomaly score can be avoided in future work The anomalyscore can be normalized and multiplied with the respectiveweights used as in the basic probability assignments
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] H-J Liao C-H Richard Lin Y-C Lin and K-Y TungldquoIntrusion detection system a comprehensive reviewrdquo Journalof Network and Computer Applications vol 36 no 1 pp 16ndash242013
[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.
[4] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.
[5] C. Thomas and N. Balakrishnan, "Improvement in intrusion detection with advances in sensor fusion," IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.
[6] G. Giacinto, F. Roli, and L. Didaci, "Fusion of multiple classifiers for intrusion detection in computer networks," Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.
[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, "Intrusion sensor data fusion in an intelligent intrusion detection system architecture," in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.
[8] D. Parikh and T. Chen, "Data fusion and cost minimization for intrusion detection," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.
[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.
[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, "An efficient intrusion detection system based on support vector machines and gradually feature removal method," Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.
[11] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.
[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, "Building lightweight intrusion detection system using wrapper-based feature selection mechanisms," Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.
[14] KDD Cup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[15] Weka: Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.
Table 5: Comparison of training and testing (built-in) time for different classifiers using all features and the selected features.

| Classifier | Training, all features (seconds) | Training, 28 features (seconds) | Reduction in training (built-in) time (%) | Testing, all features (seconds) | Testing, 28 features (seconds) | Reduction in testing (built-in) time (%) |
|---|---|---|---|---|---|---|
| BayesNet | 0.86 | 0.47 | 59 | 0.69 | 0.55 | 23 |
| RandomForest | 13.91 | 10.31 | 30 | 12.88 | 10.07 | 24 |
| J48 | 1.92 | 1.55 | 22 | 1.69 | 0.94 | 57 |
| IBK | 0.30 | 0.15 | 67 | 0.25 | 0.14 | 56 |
| SVM | 7.90 | 7.10 | 11 | 126.00 | 121.0 | 4 |
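Table 6: Detection rate (DR, %) and false alarm rate (FAR, %) of each classifier (anomaly-based and signature-based IDS units) for test data.

| Attack type | DR SVM | DR IBK | DR RandomForest | DR J48 | DR BayesNet | FAR SVM | FAR IBK | FAR RandomForest | FAR J48 | FAR BayesNet |
|---|---|---|---|---|---|---|---|---|---|---|
| DOS | 95.4 | 99.5 | 99.7 | 99.6 | 93.7 | 0.7 | 0.3 | 0.2 | 0.1 | 0.3 |
| PROBE | 98.1 | 97.7 | 98.2 | 98.1 | 98.0 | 0.8 | 0.3 | 0.1 | 0.1 | 1.2 |
| U2R | 86.0 | 83.0 | 86.0 | 82.5 | 80.5 | 0.1 | 0.2 | 0.1 | 0.1 | 0.8 |
| R2L | 94.3 | 94.0 | 95.5 | 95.2 | 90.4 | 1.6 | 1.1 | 0.7 | 0.6 | 2.3 |
| Normal | 94.1 | 97.2 | 98.5 | 98.5 | 92.1 | 3.8 | 1.8 | 1.2 | 1.3 | 2.2 |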
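The two metrics reported in Table 6 are not defined in this section. The following is a minimal sketch, assuming the usual definitions (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)) and a binary view in which each record is either attack (1) or normal (0). The function names and the toy labels are illustrative only and are not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) for binary labels where 1 = attack, 0 = normal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # attack correctly flagged
        elif t == 1 and p == 0:
            fn += 1  # attack missed
        elif t == 0 and p == 1:
            fp += 1  # normal record flagged as attack (false alarm)
        else:
            tn += 1  # normal record correctly passed
    return tp, fn, fp, tn


def detection_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of attack records that were detected (assumed definition)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return 100.0 * tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of normal records that were flagged as attacks (assumed definition)."""
    _, _, fp, tn = confusion_counts(y_true, y_pred)
    return 100.0 * fp / (fp + tn) if (fp + tn) else 0.0


# Toy example with made-up labels (not KDD Cup data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(detection_rate(y_true, y_pred))    # 75.0
print(false_alarm_rate(y_true, y_pred))  # one false alarm out of six normal records
```

Under these assumed definitions, a per-attack-type figure such as those in Table 6 would be obtained by restricting the attack records to one category at a time.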
Table 7: Comparison of detection rate and false alarm rate between the work of Thomas and Balakrishnan [5] and the proposed system for different attacks.

| Attack | Detection rate (%), Thomas and Balakrishnan [5] | Detection rate (%), proposed system (28 features) | False alarm rate (%), Thomas and Balakrishnan [5] | False alarm rate (%), proposed system (28 features) |
|---|---|---|---|---|
| DOS | 64 | 99 | 36.50 | 1 |
| PROBE | 76 | 99 | 24.32 | 1 |
| U2R | 92 | 98 | 8.10 | 1.38 |
| R2L | 64 | 99 | 35.84 | 1 |
Figure 3: Performance comparison of the detection rate (%) of the proposed work and the work of Thomas and Balakrishnan [5], by attack type (DOS, PROBE, U2R, R2L).
Figure 4: Performance comparison of the false alarm rate of the proposed work and the work of Thomas and Balakrishnan [5], by attack type (DOS, PROBE, U2R, R2L).

are good at detecting different attacks are combined, and an MIU is formed. This paper uses only the relevant features of the input traffic data for processing, and a promising classification result is obtained from the MIU, which is the fusion of heterogeneous IDSs. Compared with the work of Thomas and Balakrishnan [5], the detection rate improves and the false alarm rate falls for every attack type (Table 7). When the detection rate and false alarm rate of a single IDS unit are compared with those of the fusion IDS unit, there is a marked improvement in performance. Feature selection with the genetic algorithm extracted the 28 relevant features from the original 41 features. As a result, training and testing times are reduced and good accuracy is obtained. In future work, the binary interpretation of the anomaly score can be avoided: the anomaly score can be normalized and multiplied by the respective weights, as in basic probability assignments.
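The majority voting rule used by the fusion unit and the score-based variant proposed as future work can be contrasted in a short sketch. It assumes five local detectors that each emit a binary local decision (1 = attack, 0 = normal) for majority voting. For the score-based variant it assumes raw anomaly scores with a known range per detector, min-max normalization, and per-detector weights that sum to 1. The score ranges, weights, and the 0.5 threshold are hypothetical placeholders; the paper does not specify them.

```python
from typing import Sequence, Tuple


def majority_vote(local_decisions: Sequence[int]) -> int:
    """Fuse binary local decisions: attack (1) if more than half vote attack."""
    if not local_decisions:
        raise ValueError("at least one local decision is required")
    return 1 if sum(local_decisions) * 2 > len(local_decisions) else 0


def min_max_normalize(score: float, lo: float, hi: float) -> float:
    """Map a raw anomaly score into [0, 1] given its assumed range [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def weighted_score_fusion(
    scores: Sequence[float],
    ranges: Sequence[Tuple[float, float]],
    weights: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> int:
    """Fuse normalized anomaly scores using per-detector weights (illustrative)."""
    if not (len(scores) == len(ranges) == len(weights)):
        raise ValueError("scores, ranges and weights must have equal length")
    fused = sum(
        w * min_max_normalize(s, lo, hi)
        for s, (lo, hi), w in zip(scores, ranges, weights)
    )
    return 1 if fused >= threshold else 0


# Five local decisions, e.g. from SVM, IBK, RandomForest, J48 and BayesNet.
print(majority_vote([1, 0, 1, 1, 0]))  # 1 (three of five vote attack)

# Made-up scores, ranges and weights, for illustration only.
scores = [0.9, 12.0, 0.2, 0.7, 3.0]
ranges = [(0.0, 1.0), (0.0, 20.0), (0.0, 1.0), (0.0, 1.0), (0.0, 10.0)]
weights = [0.3, 0.2, 0.2, 0.2, 0.1]
print(weighted_score_fusion(scores, ranges, weights))  # 1 (fused score is about 0.6)
```

With binary voting, a detector that is barely over its own threshold counts as much as one that is highly confident. The weighted variant keeps that confidence information, at the cost of having to choose the weights and the final threshold.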
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: a comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, "Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets," in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.
[4] M. Wozniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.
[5] C. Thomas and N. Balakrishnan, "Improvement in intrusion detection with advances in sensor fusion," IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.
[6] G. Giacinto, F. Roli, and L. Didaci, "Fusion of multiple classifiers for intrusion detection in computer networks," Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.
[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, "Intrusion sensor data fusion in an intelligent intrusion detection system architecture," in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.
[8] D. Parikh and T. Chen, "Data fusion and cost minimization for intrusion detection," IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.
[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.
[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, "An efficient intrusion detection system based on support vector machines and gradually feature removal method," Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.
[11] A. Sung and S. Mukkamala, "Identifying important features for intrusion detection using support vector machines and neural networks," in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.
[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, "Building lightweight intrusion detection system using wrapper-based feature selection mechanisms," Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.
[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., "A novel intrusion detection system based on hierarchical clustering and support vector machines," Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.
[14] KDD Cup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[15] Weka: Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.