
Research Article
Fusion of Heterogeneous Intrusion Detection Systems for Network Attack Detection

Jayakumar Kaliappan,1 Revathi Thiagarajan,2 and Karpagam Sundararajan1

1 Computer Science and Engineering, Kamaraj College of Engineering and Technology, Tamilnadu 626 001, India
2 Information Technology, Mepco Schlenk Engineering College, Tamilnadu 626 005, India

Correspondence should be addressed to Jayakumar Kaliappan k jeyakumar1979yahoocoin

Received 31 March 2015; Revised 15 June 2015; Accepted 1 July 2015

Academic Editor: Juan M. Corchado

Copyright © 2015 Jayakumar Kaliappan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

An intrusion detection system (IDS) helps to identify different types of attacks in general, and the detection rate will be higher for some specific category of attacks. This paper is designed on the idea that each IDS is efficient in detecting a specific type of attack. In the proposed Multiple IDS Unit (MIU), there are five IDS units, and each IDS follows a unique algorithm to detect attacks. Feature selection is done with the help of a genetic algorithm. The selected features of the input traffic are passed on to the MIU for processing. The decision from each IDS is termed the local decision. The fusion unit inside the MIU processes all the local decisions with the help of the majority voting rule and makes the final decision. The proposed system shows a very good improvement in detection rate and reduces the false alarm rate.

1. Introduction

An intrusion detection system (IDS) monitors the behavior of a given environment and identifies whether activities are malicious or legitimate. There are two common approaches to intrusion detection: misuse detection and anomaly detection. Misuse detection via signature verification compares a user's actions with the known signatures of attackers attempting to enter a system. It is useful for finding known intrusion types, but it cannot detect new attacks [1]. Anomaly detection identifies behavior that differs from well-known statistical patterns for users, systems, or networks. Machine learning techniques are used to capture the normal usage patterns and classify new behavior as either normal or anomalous. In spite of their capability in detecting unknown attacks, anomaly detection systems result in a high false alarm rate [2]. Anomaly detection can be combined with signature verification to identify attacks.

Feature selection is the most crucial step in constructing any intrusion detection system [3]. A set of attributes or features that are identified to be the most effective is extracted in order to construct a suitable IDS. Identifying the features that are relevant to the learning algorithm is a challenge.

In some cases, redundant features can lead to noisy data that distract the learning algorithm, degrade the accuracy of the IDS, and slow down the training and testing processes. Feature selection has been shown to have a high impact on the performance of classifiers. Experiments show that feature selection can reduce the building and testing time of a classifier.

Multiclassifier Systems (MCSs) focus on the grouping of classifiers with heterogeneous or homogeneous modeling backgrounds to give the final outcome. MCSs perform well when the data sample available for learning is very sparse. In the scarcity case, MCSs can use bootstrapping methods such as bagging or boosting [4]. MCSs allow training classifiers on partitions of a data set and combining their results using appropriate combination rules. Two canonical topologies are used in designing MCSs: parallel and serial. In the parallel topology, each classifier is supplied with the same input data, so that the final decision of the combined classifier is made on the basis of the outputs of each classifier obtained separately. Alternatively, in the serial (or conditional) topology, the classifiers are applied in a certain order, implying some kind of grade or ordering over them.

Hindawi Publishing Corporation, The Scientific World Journal, Volume 2015, Article ID 314601, 8 pages, http://dx.doi.org/10.1155/2015/314601


The rest of the paper is organized as follows. Section 2 enumerates related works. The proposed methodology is dealt with in detail in Section 3, along with the algorithms for training and testing multiple IDSs. Section 4 discusses the performance evaluation of the experiments in detail with the results. Section 5 presents the summary of the study.

2. Related Works

Thomas and Balakrishnan [5] optimized the performance of IDS using fusion of multiple IDSs. The assignment of a weight for each IDS is outlined in their paper, and the weights are aggregated to take a correct decision. The DARPA 1999 data set, which is outdated, is used to evaluate the IDSs. It contains many redundant records, which affects classifier accuracy. In their method, binary values are used to decide attack or normal. Giacinto et al. [6] proposed a pattern-recognition approach based on the fusion of multiple classifiers for network intrusion detection. It provides a better tradeoff between generalization abilities and false alarm generation. Unfortunately, the performances of the fusion rules on unknown attacks show no improvement over the results of the individual networks. No fusion rule improves on the performance of the neural network trained on the overall feature set, which attains the same performance as the oracle. Siraj et al. [7] proposed the Decision Engine of an Intelligent Intrusion Detection System (IIDS) that fuses information from different intrusion detection sensors using an artificial intelligence technique. Unlike neural networks, it cannot do self-learning and self-training. There is no functionality for customizing the standard attack. Parikh and Chen [8] proposed an ensemble of classifiers to combine data from various sources and reduce the cost of false alarms. The DLEARNIN and DCMS algorithms are used for this purpose. In their paper, sum and product rules are not used, and outputs are not directly compared. Giacinto et al. [9] proposed an unsupervised anomaly-based IDS. A combination of one-class classifiers is used in their work, with each module designed using distinct features for training. For high values of false alarm rate, the system gives a low detection rate. Li et al. [10] constructed a compact data set by clustering redundant data. Features are reduced from 41 to 19 using clustering, and the use of ant colony optimization improved the efficiency of intrusion detection. The combination of the critical features used in this method could not distinguish the attackers from normal users. Sung and Mukkamala [11] removed one feature at a time to carry out an experiment on SVM and neural network. The KDDCup'99 data set has been used to verify this technique. For five-class classification, only the 19 most significant of the 41 features are used. Li et al. [12] proposed a wrapper-based feature selection algorithm to construct a lightweight IDS. They applied a modified Random Mutation Hill Climbing (RMHC) as the search strategy and a modified linear SVM as the evaluation criterion. This method speeds up the process of selecting features and gives a high detection rate for IDS. Since the types of intruders are wide-ranging in today's information era, the scope for designing an improved IDS is high, which motivates the proposed work.

3. The Proposed System

3.1. Motivation. With the advent of online business and social networks, the genuineness of the information available on the Internet has come into question. Many human and robot-based intruders act aggressively to take advantage of this information. Also, attacks on the Internet are nondeterministic in nature, which makes detecting and reacting to them a very complex task. Most present-day stand-alone intrusion detection systems are not capable of achieving a reasonably high detection rate and a low false alarm rate. Most of the existing works on IDSs show distinct performance in detecting a certain class of attack with improved accuracy while performing moderately for the other classes of attacks. A more reliable and accurate decision for a wider class of attacks can be obtained by combining the decisions of multiple intrusion detection systems.

Processors now operate at very high speeds, so combining multiple IDSs is not a significant issue from the computation point of view, and best-of-breed solutions have been achieved earlier. A better analysis of existing data gathered by various individual IDSs can detect many attacks that currently go undetected. From the literature survey, it is learnt that appropriate feature selection techniques simplify the models to make them easier to interpret, shorten the training times, and enhance generalization by reducing overfitting. The challenges in designing and deploying IDS are increasing due to the wider reach of Internet services and the nonavailability of a standard procedure for characterizing intruders.

3.2. The Proposed System Architecture. Anomaly-based IDSs identify abnormal or unusual behaviors on a network and tag them as attacks. They do not need any specific knowledge. The disadvantage of this method is that it produces a larger number of false alarms. A signature-based IDS is well suited to detecting attacks that match a predefined pattern, and it produces very few false alarms. The fusion of signature-based and anomaly-based techniques is done for three main reasons. First, the false alarm rate should be minimal, which is only possible with a signature-based IDS. Second, any IDS has to identify new attacks, which is possible through anomaly-based techniques. Third, every IDS is efficient in detecting specific types of attack. For example, anomaly-based IDS is suitable for detecting DOS and R2L type attacks, and signature-based IDS is good for detecting U2R and PROBE, which can be inferred from Table 6. The fusion of signature-based and anomaly-based techniques will be able to detect more attacks with a lower false alarm rate. The proposed system consists of a Multiple IDS Unit (MIU), which contains five IDS units following five different algorithms.

The proposed system architecture is shown in Figure 1. It contains three phases of work. In the first phase, feature selection is done with the help of information gain (IG) and a genetic algorithm (GA). There are 41 features in total in the KDDCup'99 data set. Certain features are irrelevant or not needed for the IDS.


[Figure 1 placeholder. Input (X) with features f1, f2, f3, f4, ..., f40, f41 passes through feature selection (information gain + genetic algorithm) into the MIU. The MIU contains IDS-1 (SVM, anomaly-based), IDS-2 (IBK, anomaly-based), IDS-3 (J48, signature-based), IDS-4 (RandomForest, anomaly-based), and IDS-5 (BayesNet, signature-based), followed by the decision fusion unit, which produces the final decision, Output (Y).]

Figure 1: The proposed system architecture.

Input: Feature set FS[ ]
Output: An array IG[ ] populated with the information gain value for each feature
Initialize i = 0
foreach (F in FS)
    IG[i] = IGR(F)
    i++
endfor

Algorithm 1: Information gain calculation.

Input: Binary chromosome[41]
Output: Information gain sum (igsum) with feature count (fcnt)
Initialize igsum = 0, fcnt = 0
for (i = 0 to 40)
    if (chromosome[i] == 1) then
        igsum = igsum + IG[i]
        fcnt = fcnt + 1
    endif
endfor

Algorithm 2: Maximum information gain with minimum feature count algorithm.
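As an illustration only, Algorithm 2 can be sketched in Python as follows. The function name `igsum_and_fcnt`, the five-bit chromosome, and the information gain values are hypothetical and chosen just to exercise the loop; the paper itself uses 41-bit chromosomes and IG values computed by Algorithm 1.

```python
from typing import Sequence, Tuple


def igsum_and_fcnt(chromosome: str, ig: Sequence[float]) -> Tuple[float, int]:
    """Sum the information gain of the selected features and count them.

    chromosome: string of '0'/'1' characters, one per feature.
    ig: information gain value of each feature, in the same order.
    """
    if len(chromosome) != len(ig):
        raise ValueError("chromosome and IG array must have the same length")
    igsum = 0.0
    fcnt = 0
    for bit, gain in zip(chromosome, ig):
        if bit == "1":  # feature is selected
            igsum += gain
            fcnt += 1
    return igsum, fcnt


# Hypothetical toy input: 5 features instead of 41.
toy_ig = [0.5, 0.1, 0.2, 0.3, 0.4]
toy_igsum, toy_fcnt = igsum_and_fcnt("10110", toy_ig)
```

The two returned quantities are what the fitness function has to trade off: a larger igsum is better, and a smaller fcnt is better.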

When all 41 features of the input traffic are taken for processing, processing is delayed and the output is inefficient. Experimenting with all combinations of the features is exponentially complex. Hence only the relevant features are chosen with the help of the genetic algorithm (Algorithms 1 and 2). The selected features are given as input. The feature selection phase helps in drawing out the relevant features. This increases classifier accuracy and reduces computation time.

In the second phase, the output from the first phase (i.e., input traffic with the selected features alone) is given as input to the MIU, and the output is the local decision ($y_i$), which categorizes the input traffic (DOS, PROBE, U2R, R2L, and NORMAL). Five IDSs, each with a unique algorithm, are present in the MIU. The five IDS algorithms used are Support Vector Machines (SVM) [13], IBK, RandomForest, J48, and BayesNet. SVM, IBK, and RandomForest come under the category of anomaly-based IDS [1, 2]. J48 and BayesNet come under the category of signature-based IDS [1]. Every IDS algorithm in the MIU (Algorithm 3) receives the input traffic data record and classifies it, so five outputs (local decisions) $y_1$, $y_2$, ..., $y_5$ are obtained for every input record.

In the third phase, the output from each IDS$_i$ in the MIU, considered as the local decision ($y_i$), is passed on to the categorization unit. The input traffic categories are divided into two groups: ATTACK and NOT A ATTACK. The traffic categories DOS, PROBE, U2R, and R2L are labeled as


Algorithm MIU
Input: Input traffic data record; F, the set of all features
Output: Whether the traffic data record is ATTACK or NOT A ATTACK
Process:
(1) Find the information gain for each feature in F and store it in IG, following Algorithm 1.
(2) Select the features using the genetic algorithm with Algorithm 2 as the fitness function.
(3) Pass the input traffic data record with the selected features f'' into the classification algorithm (SVM), which returns the attack category for each input traffic data record.
(4) Repeat Step (3) with the other classification algorithms: IBK, J48, RandomForest, and BayesNet.
(5) For each input traffic data record there are now five local decisions y_1, y_2, ..., y_5 from the five classification algorithms.
(6) Each local decision y_i is labeled as yy_1 or yy_2, where yy_1 stands for ATTACK and yy_2 stands for NOT A ATTACK:
    If (y_i == "DOS" or y_i == "PROBE" or y_i == "U2R" or y_i == "R2L")
        Then y_i = yy_1
    Else
        y_i = yy_2
(7) For each input traffic data record, the decision from each of the five IDS units is either yy_1 or yy_2; count the number of yy_1 and yy_2:
    If (count of yy_1 >= 3)
        Final decision = yy_1
    Else
        Final decision = yy_2

Algorithm 3: The proposed system algorithm.
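Steps (6) and (7) of Algorithm 3 can be illustrated with a short Python sketch. It assumes the three-out-of-five majority rule stated in the text; the names `categorize` and `fuse` are hypothetical, and the local decisions are passed in as plain strings rather than produced by real classifiers.

```python
from collections import Counter
from typing import Sequence

# Traffic categories that count as an attack (Step 6).
ATTACK_CATEGORIES = {"DOS", "PROBE", "U2R", "R2L"}


def categorize(local_decision: str) -> str:
    """Map a five-class local decision to ATTACK or NOT A ATTACK."""
    if local_decision.upper() in ATTACK_CATEGORIES:
        return "ATTACK"
    return "NOT A ATTACK"


def fuse(local_decisions: Sequence[str]) -> str:
    """Majority voting over the categorized local decisions (Step 7)."""
    votes = Counter(categorize(d) for d in local_decisions)
    majority = len(local_decisions) // 2 + 1  # 3 when there are 5 IDS units
    return "ATTACK" if votes["ATTACK"] >= majority else "NOT A ATTACK"


# Three of the five IDS units report some attack category.
decision = fuse(["DOS", "PROBE", "NORMAL", "NORMAL", "U2R"])
```

Note that the vote is taken after categorization, so IDS units that disagree on the attack category (DOS versus PROBE, say) still vote together for ATTACK.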

[Figure 2 placeholder. The local decisions y1 to y5 from IDS-1 to IDS-5 enter the categorizer, which labels each one as yy1 or yy2; the fusion unit then outputs ATTACK or NOT ATTACK.]

Figure 2: Fusion process.

the ATTACK group, and NORMAL is labeled as the NOT A ATTACK group. For example, if the output ($y_2$) from IDS-2 is PROBE, then it falls under the ATTACK group. The fusion process is depicted in Figure 2. The output $yy_i$ from the categorization unit for each local decision ($y_i$) is taken to the decision unit, and the global decision ($z$) is taken based on the majority voting rule. If at least 3 out of 5 outputs from the categorization unit suggest $yy_1$ (ATTACK), then the decision unit decides that the input traffic is of ATTACK type; otherwise it is NOT A ATTACK.

3.3. Feature Selection

3.3.1. Information Gain Ratio (IGR). Let $S$ be a set of training samples with their corresponding labels. Suppose there are $m$ classes, the training set contains $S_i$ samples of class $i$, and $S$ is the total number of samples in the training set. The expected information needed to classify a given sample is calculated by using the equation

$$I(S_1, S_2, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2\left(\frac{S_i}{S}\right). \tag{1}$$

A feature $F$ with values $f_1, f_2, \ldots, f_v$ can divide the training set into $v$ subsets $S_1, S_2, \ldots, S_v$, where $S_j$ is the subset which has the value $f_j$ for feature $F$. Furthermore, let $S_j$ contain $S_{ij}$ samples of class $i$. The entropy of the feature $F$ is

$$E(F) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S} \times I(S_{1j}, \ldots, S_{mj}). \tag{2}$$

Information gain for $F$ can be calculated as

$$\mathrm{IGR} = \mathrm{Gain}(F) = I(S_1, \ldots, S_m) - E(F). \tag{3}$$
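Equations (1) to (3) can be sketched in Python for a single categorical feature as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the feature is assumed to be discrete (continuous traffic features would first need to be discretized), and the four-record data set is invented for the example. As written, equation (3) is the plain information gain, with no further normalization applied.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def expected_information(labels: Sequence[Hashable]) -> float:
    """Equation (1): entropy of the class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def feature_entropy(values: Sequence[Hashable],
                    labels: Sequence[Hashable]) -> float:
    """Equation (2): weighted entropy of the subsets induced by the feature."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    total = len(labels)
    return sum(len(subset) / total * expected_information(subset)
               for subset in subsets.values())


def gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Equation (3): information gain of the feature."""
    return expected_information(labels) - feature_entropy(values, labels)


# Hypothetical four-record data set with two classes.
labels = ["attack", "attack", "normal", "normal"]
informative = ["tcp", "tcp", "udp", "udp"]  # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]        # tells nothing about the class
```

Here the informative feature attains the full one bit of class entropy as its gain, while the uninformative feature has a gain of zero.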

3.3.2. GA-Based Feature Selection. To reduce the dimensionality and to get better accuracy, the relevant features have to be selected. Feature selection is done using a genetic algorithm. The fitness function of the genetic algorithm is designed in such a way that the number of features selected is minimum and the sum of their information gain values is maximum. The genetic algorithm is designed to have a population size of 40. A binary chromosome of length 41 is constructed, with each bit representing a feature. This binary chromosome is given as input to the fitness function (Algorithm 2). The information gain values (IG) of the selected features (i.e., bits set to 1) are summed up to get the total information gain value (igsum). The total number of 1s in the chromosome gives the feature count (fcnt). For example, consider the following chromosome:

11011100011110101100111001110110011010001

Here bit 5 is set (i.e., value = 1), which indicates that the 5th feature is selected for processing. In this chromosome 24 bits are set in total, so the feature count (fcnt) is 24. The total information gain value (igsum) obtained by summing up the information gain (IG) of the 24 selected features is 0.37586. The genetic algorithm parameter values are listed in Table 1.

Table 1: Genetic algorithm parameters.

Modeling description    Setting
Population size         40
Selection technique     Roulette wheel
Crossover type          Uniform crossover
Crossover rate          0.5
Mutation rate           0.1
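The following Python sketch shows one way a genetic algorithm with the Table 1 settings could be wired together. Several points are assumptions rather than details given in the paper: the scalar fitness `igsum - penalty * fcnt` (the text does not say how the two objectives are combined), the per-bit reading of the mutation rate, the per-pair reading of the crossover rate, the number of generations, and all function names.

```python
import random
from typing import List, Sequence

# Settings taken from Table 1.
POP_SIZE = 40
CROSSOVER_RATE = 0.5
MUTATION_RATE = 0.1


def selected_features(chromosome: Sequence[int]) -> List[int]:
    """Return the 1-based numbers of the features whose bit is set."""
    return [i + 1 for i, bit in enumerate(chromosome) if bit == 1]


def fitness(chromosome: Sequence[int], ig: Sequence[float],
            penalty: float) -> float:
    """ASSUMED scalar fitness: reward igsum, penalize the feature count."""
    igsum = sum(g for bit, g in zip(chromosome, ig) if bit == 1)
    fcnt = sum(chromosome)
    return igsum - penalty * fcnt


def roulette_select(population, scores, rng):
    """Roulette wheel selection; scores are shifted so all weights are positive."""
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]
    return rng.choices(population, weights=weights, k=1)[0]


def uniform_crossover(a, b, rng):
    """Swap each gene between the two parents with probability 0.5."""
    child_a, child_b = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < 0.5:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def mutate(chromosome, rng):
    """Flip each bit independently with probability MUTATION_RATE (assumed)."""
    return [1 - bit if rng.random() < MUTATION_RATE else bit
            for bit in chromosome]


def run_ga(ig: Sequence[float], generations: int = 30,
           penalty: float = 0.01, seed: int = 0) -> List[int]:
    """Return the best chromosome seen over all generations."""
    rng = random.Random(seed)
    n = len(ig)
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(POP_SIZE)]
    best = max(population, key=lambda c: fitness(c, ig, penalty))
    for _ in range(generations):
        scores = [fitness(c, ig, penalty) for c in population]
        next_pop = []
        while len(next_pop) < POP_SIZE:
            p1 = roulette_select(population, scores, rng)
            p2 = roulette_select(population, scores, rng)
            if rng.random() < CROSSOVER_RATE:
                c1, c2 = uniform_crossover(p1, p2, rng)
            else:
                c1, c2 = list(p1), list(p2)
            next_pop.extend([mutate(c1, rng), mutate(c2, rng)])
        population = next_pop[:POP_SIZE]
        candidate = max(population, key=lambda c: fitness(c, ig, penalty))
        if fitness(candidate, ig, penalty) > fitness(best, ig, penalty):
            best = candidate
    return best


# The example chromosome from the text, decoded to feature numbers.
example = [int(b) for b in "11011100011110101100111001110110011010001"]
example_features = selected_features(example)
```

Decoding the example chromosome gives 24 feature numbers, including feature 5, which agrees with the feature count stated in the text. With real IG values from Algorithm 1, `run_ga` would return a 41-bit list that can be decoded the same way.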

Table 2 gives the various eminent feature combinations obtained for different attack types using the genetic algorithm. The features that are most often repeated in the list are selected for the experiment.

The proposed implementation steps are given in Algorithm 3.

4. Performance Evaluation and Results

4.1. NSL-KDD Data Set. One of the main drawbacks of the KDDCup'99 data set is the repetition of records, which causes the learning algorithms to be biased towards the repeated records. This prevents them from learning infrequent records, which are usually more harmful to networks, as in U2R and R2L attacks. In addition, the occurrence of these redundant records in the test set will bias the performance results.

The NSL-KDD benchmark data set [14] has the following benefits over the KDDCup'99 data set:

(i) It does not include repeated records in the training set, so the classifiers will not be biased towards more frequent records.

(ii) There are no duplicate records in the testing sets. Therefore, the performance of the learners is not biased.

(iii) The number of selected records from each difficulty level group is inversely proportional to the percentage of records in the original KDDCup'99 data set, which helps an accurate evaluation of different learning techniques. As a result, the classification rates of various machine learning methods vary in a wider range, which makes it more efficient to detect different types of attacks.

The sample distributions on the training and testing data sets with the corrected labels of the NSL-KDD data set are shown in Table 3.

4.2. Performance Evaluation Metrics. The performance of the proposed intrusion detection system is evaluated with the help of a confusion matrix. The classification performance of the IDS is measured by false alarm rate, detection rate, and accuracy. They can be calculated using the confusion matrix in Table 4. The confusion matrix is a 2 × 2 matrix, where the rows represent the actual classes while the columns correspond to the predicted classes.

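$$\begin{aligned}
\text{False Alarm Rate} &= \frac{FP}{TN + FP} \times 100, \\
\text{Detection Rate} &= \frac{TP}{TP + FN} \times 100, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \times 100.
\end{aligned} \tag{4}$$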
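The three measures in (4) translate directly into code. The sketch below uses a hypothetical function name and invented confusion matrix counts purely to show the arithmetic, and it assumes both actual classes are present so that no denominator is zero.

```python
from typing import Dict


def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the measures of equation (4), all expressed in percent."""
    return {
        "false_alarm_rate": 100.0 * fp / (tn + fp),
        "detection_rate": 100.0 * tp / (tp + fn),
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts: 100 actual attacks and 100 actual normal records.
metrics = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these invented counts, 5 of the 100 normal records raise a false alarm and 90 of the 100 attacks are detected.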

In this section, the performance of the proposed intrusion detection system is studied with the help of an experiment. In this experiment, only the relevant features are selected using the information gain algorithm and the genetic algorithm. The selected features and the training data set are given as input to the MIU, and performance measures such as accuracy, detection rate, and false alarm rate are considered for evaluation. The results are tabulated and plotted as graphs.

4.3. Experiment Results. All experiments were performed on a Windows platform with an Intel Core 2 Duo CPU at 2.49 GHz and 2 GB of RAM. Simulations and the analysis of experimental results were performed using the Weka machine learning tool [15] and Java.

Selected features are considered for training the fusion IDS in this experiment, and test data with 28.39% of novel (new attack) data is taken.

From Table 5 it is inferred that, for the J48 classifier, there is a 57% reduction in testing time when considering 28 features instead of all features.

From Table 6 it is inferred that the detection rate and false alarm rate of intrusion detection systems with feature selection using a single classifier such as SVM, IBK, J48, RandomForest, or BayesNet are inferior to those of the fusion IDS unit. For example, for the U2R type of attack, the detection rate achieved by the SVM classifier is 86%, by IBK 83%, by J48 82.5%, and by BayesNet 80.5%. When a fusion IDS unit with multiple heterogeneous IDSs is used, a higher detection rate of 99% is achieved.

The false alarm rate (FAR) is reduced considerably when a fusion IDS unit with multiple heterogeneous IDSs is used. For example, the FAR found for the DOS attack type using SVM is 0.7, IBK is 0.3, J48 is 0.1, RandomForest is 0.2, and BayesNet is 0.3. When the fusion IDS is used, the FAR achieved is 0.0.

Detection rate (DTR) and false alarm rate (FAR) ofthe proposed system for the different types of attack using


Table 2: Most relevant features for each attack and information gain measures.

Attack type  Attack pattern   Igsum value  Various combinations of features giving high information gain value
PROBE        ipsweep          0.82         2, 3, 5, 12, 13, 14, 16, 17, 21, 23, 24, 25, 28, 31, 32, 33, 37, 38
PROBE        nmap             0.27         1, 2, 3, 5, 18, 21, 22, 28, 29, 31, 32, 34, 35, 36, 37
PROBE        portsweep        0.58         3, 4, 10, 24, 27, 29, 34, 35, 36, 37, 41
PROBE        satan            0.75         1, 3, 5, 11, 15, 19, 23, 24, 25, 27, 28, 29, 30, 31, 32, 35, 39, 40, 41
PROBE        mscan            1.11         1, 3, 4, 5, 7, 12, 17, 21, 25, 27, 28, 29, 31, 33, 35, 39, 40, 41
PROBE        saint            0.33         1, 5, 7, 12, 16, 24, 25, 29, 32, 33, 34, 35, 37, 38, 40
DOS          back             0.38         1, 2, 4, 5, 6, 10, 11, 12, 13, 15, 17, 18, 21, 22, 23, 26, 27, 28, 30, 31, 34, 35, 37, 41
DOS          land             0.0009       1, 2, 3, 4, 7, 13, 18, 25, 29, 35, 38
DOS          neptune          7.73         1, 3, 4, 5, 6, 7, 13, 15, 17, 19, 20, 26, 28, 29, 30, 31, 33, 34, 35, 38, 39
DOS          pod              0.052        2, 3, 5, 7, 8, 9, 10, 11, 17, 19, 21, 23, 26, 33, 34, 39, 40
DOS          smurf            0.68         2, 3, 5, 8, 17, 23, 24, 25, 26, 29, 33, 35, 36, 38, 39
DOS          teardrop         0.27         3, 4, 5, 6, 8, 10, 13, 23, 24, 25, 26, 32, 34, 35, 36, 37, 39, 40
U2R          buffer_overflow  0.0086       1, 2, 3, 5, 6, 7, 8, 9, 10, 14, 21, 23, 29, 30, 31, 32, 33, 36, 38, 39, 40
U2R          loadmodule       0.0058       1, 2, 3, 4, 7, 8, 14, 27, 36, 39, 40
U2R          rootkit          0.0035       3, 6, 9, 11, 13, 14, 16, 17, 18, 23, 28, 31, 32, 33, 34, 35, 37, 39, 41
R2L          guess_passwd     0.025        2, 3, 4, 6, 9, 10, 11, 13, 14, 17, 21, 23, 24, 37, 38, 39, 40, 41
R2L          imap             0.0035       3, 4, 5, 6, 10, 12, 20, 23, 25, 27, 29, 30, 32, 33, 34, 36, 38, 39, 41
R2L          multihop         0.0024       3, 4, 10, 12, 13, 14, 16, 17, 18, 19, 22, 26, 27, 30, 35, 37
R2L          phf              0.0021       3, 4, 6, 8, 9, 10, 13, 14, 19, 28, 29, 36
R2L          spy              0.0003       2, 3, 4, 5, 9, 15, 18, 22, 16, 39
R2L          warezclient      0.21         3, 4, 5, 6, 10, 12, 14, 16, 24, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41
R2L          warezmaster      0.008        1, 2, 3, 4, 6, 12, 13, 14, 16, 17, 19, 22, 23, 24, 31, 35, 36, 37, 39
Normal                        11.96        1, 2, 3, 4, 5, 6, 7, 15, 23, 24, 14, 15, 19, 20, 21, 23, 25, 26, 27, 28, 30, 32, 33, 34, 36, 37, 38

Table 3: The sample distributions on the training and testing data sets with the corrected labels of NSL-KDD data set.

         Training data set                 Testing data set
Class    Number of   Samples               Number of   Samples          Number of novel
         samples     percentage (%)        samples     percentage (%)   attack samples
Normal   13449       53.39                 9866        43.76            -
PROBE    2289        9.09                  2421        10.74            1315
DOS      9234        36.65                 7456        33.07            1715
U2R      11          0.04                  67          0.30             32
R2L      208         0.83                  2734        12.13            538
Total    25192       100                   22544       100              3600
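As a quick consistency check on Table 3, the share of novel attacks can be recomputed from the testing columns. The sketch below assumes that the share quoted in Section 4.3 is taken over the attack records of the test set (all test records minus the Normal ones), which is an interpretation rather than something the text states.

```latex
\[
\frac{\text{novel attack samples}}{\text{attack samples in the test set}}
  = \frac{3600}{22544 - 9866}
  = \frac{3600}{12678}
  \approx 0.284
\]
```

Under that reading, roughly 28.4% of the attack records in the test set are novel, in line with the figure quoted in Section 4.3.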

Table 4: Confusion matrix.

                 Predicted attack      Predicted normal
Actual attack    True positive (TP)    False negative (FN)
Actual normal    False positive (FP)   True negative (TN)

True positive (TP): the number of attacks detected when it is actually attack.
True negative (TN): the number of normal detected when it is actually normal.
False positive (FP): the number of attacks detected when it is actually normal.
False negative (FN): the number of normal detected when it is actually attack.

selected features of the test data set of the KDDCup'99 data set are tabulated in Table 7. On average, a detection rate of 98.4% is achieved. The average false alarm rate achieved is 0.68.

The experimental results of Thomas and Balakrishnan [5] are taken for a comparative study. Table 7 gives the detection rate of the proposed system and of the work of Thomas and Balakrishnan [5]. The detection rate for DOS is 64% in the previous work [5] and 99% for the proposed system. Similarly, for PROBE, U2R, and R2L there is a high improvement in detection rate compared with the previous work [5], particularly for R2L. Similarly, the false alarm rate for DOS is 36.20 in the work of Thomas and Balakrishnan [5], but in the proposed work the value is reduced to 1.0, and for PROBE, U2R, and R2L the false alarm rate has also decreased drastically.

Figures 3 and 4 present a comparative study of the detection rate and false alarm rate of the proposed and existing fusion methods.

5. Conclusion

The key idea behind the study is that any IDS is efficient in detecting some specific attack category. Different IDSs which


Table 5: Comparison of training and testing (built-in) time for different classifiers using all and selected features.

              Training data set                                        Testing data set
Classifier    All features  28 features  Reduction in training        All features  28 features  Reduction in testing
              (seconds)     (seconds)    (built-in) time (%)          (seconds)     (seconds)    (built-in) time (%)
BayesNet      0.86          0.47         59                           0.69          0.55         23
RandomForest  13.91         10.31        30                           12.88         10.07        24
J48           1.92          1.55         22                           1.69          0.94         57
IBK           0.30          0.15         67                           0.25          0.14         56
SVM           7.90          7.10         11                           126.00        121.0        4

Table 6: Detection rate and false alarm rate of each classifier for test data.

             Detection rate                                  False alarm rate
             Anomaly-based             Signature-based       Anomaly-based             Signature-based
Attack type  SVM   IBK   RandomForest  J48   BayesNet        SVM   IBK   RandomForest  J48   BayesNet
DOS          95.4  99.5  99.7          99.6  93.7            0.7   0.3   0.2           0.1   0.3
PROBE        98.1  97.7  98.2          98.1  98.0            0.8   0.3   0.1           0.1   1.2
U2R          86.0  83.0  86.0          82.5  80.5            0.1   0.2   0.1           0.1   0.8
R2L          94.3  94.0  95.5          95.2  90.4            1.6   1.1   0.7           0.6   2.3
Normal       94.1  97.2  98.5          98.5  92.1            3.8   1.8   1.2           1.3   2.2

Table 7: Comparison of detection rate and false alarm rate for Thomas and Balakrishnan [5] work and proposed system for different attacks.

         Detection rate                                    False alarm rate
Attack   Thomas and          Proposed system              Thomas and          Proposed system
         Balakrishnan [5]    (28 features)                Balakrishnan [5]    (28 features)
DOS      64                  99                           36.50               1
PROBE    76                  99                           24.32               1
U2R      92                  98                           8.10                1.38
R2L      64                  99                           35.84               1

[Figure 3 placeholder. Plot of detection rate (%), on a scale of 0 to 100, against attack types (DOS, PROBE, U2R, R2L) for the Thomas and Balakrishnan work [5] and the proposed system.]

Figure 3: Performance comparison on detection rate of proposed work and Thomas and Balakrishnan [5] work.

are good at detecting different attacks are combined together, and an MIU is framed. This paper uses only relevant features of the input traffic data for processing, and a promising classification result is obtained from the MIU, which is

[Figure 4 placeholder. Plot of false alarm rate, on a scale of 0 to 40, against attack types (DOS, PROBE, U2R, R2L).]

Figure 4: Performance comparison on false alarm rate of proposed work and Thomas and Balakrishnan [5] work.

the fusion of heterogeneous IDSs. In comparison with the work of Thomas and Balakrishnan [5], good improvement in the detection rate and false alarm rate is achieved. When the detection rate and false alarm rate of a single IDS unit are


compared with those of the fusion IDS unit, there is a vast improvement in the performance. The feature selection done with the genetic algorithm has extracted the relevant features from the 41 features. As a result, there is improvement in training and testing speed, and good accuracy is found. The binary interpretation of the anomaly score can be avoided in future work. The anomaly score can be normalized and multiplied with the respective weights, as in the basic probability assignments.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] H.-J. Liao, C.-H. Richard Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: a comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

[2] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Network anomaly detection: methods, systems and tools,” IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 303–336, 2014.

[3] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, “Selecting features for intrusion detection: a feature relevance analysis on KDD 99 intrusion detection datasets,” in Proceedings of the 3rd Annual Conference on Privacy, Security and Trust (PST '05), St. Andrews, Canada, October 2005.

[4] M. Wozniak, M. Grana, and E. Corchado, “A survey of multiple classifier systems as hybrid systems,” Information Fusion, vol. 16, no. 1, pp. 3–17, 2014.

[5] C. Thomas and N. Balakrishnan, “Improvement in intrusion detection with advances in sensor fusion,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 542–551, 2009.

[6] G. Giacinto, F. Roli, and L. Didaci, “Fusion of multiple classifiers for intrusion detection in computer networks,” Pattern Recognition Letters, vol. 24, no. 12, pp. 1795–1803, 2003.

[7] A. Siraj, R. B. Vaughn, and S. M. Bridges, “Intrusion sensor data fusion in an intelligent intrusion detection system architecture,” in Proceedings of the Hawaii International Conference on System Sciences, pp. 4437–4446, January 2004.

[8] D. Parikh and T. Chen, “Data fusion and cost minimization for intrusion detection,” IEEE Transactions on Information Forensics and Security, vol. 3, no. 3, pp. 381–389, 2008.

[9] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, “Intrusion detection in computer networks by a modular ensemble of one-class classifiers,” Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.

[10] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, and K. Dai, “An efficient intrusion detection system based on support vector machines and gradually feature removal method,” Expert Systems with Applications, vol. 39, no. 1, pp. 424–430, 2012.

[11] A. Sung and S. Mukkamala, “Identifying important features for intrusion detection using support vector machines and neural networks,” in Proceedings of the Symposium on Applications and the Internet (SAINT '03), pp. 209–216, Orlando, Fla, USA.

[12] Y. Li, J.-L. Wang, Z.-H. Tian, T.-B. Lu, and C. Young, “Building lightweight intrusion detection system using wrapper-based feature selection mechanisms,” Computers and Security, vol. 28, no. 6, pp. 466–475, 2009.

[13] S.-J. Horng, M.-Y. Su, Y.-H. Chen et al., “A novel intrusion detection system based on hierarchical clustering and support vector machines,” Expert Systems with Applications, vol. 38, no. 1, pp. 306–313, 2011.

[14] KDDCup dataset, 2014, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[15] Weka, Waikato environment for knowledge analysis (weka) version 3.6, 2014, http://www.cs.waikato.ac.nz/ml/weka/.


The rest of the paper is organized as follows. Section 2 enumerates related works. Section 3 describes the proposed methodology in detail, with the algorithms for training and testing multiple IDSs. Section 4 discusses the performance evaluation of the experiments along with the results. Section 5 summarizes the study.

2 Related Works

Thomas and Balakrishnan [5] optimized the performance of IDS using the fusion of multiple IDSs. Their paper outlines the assignment of a weight to each IDS, and the weights are aggregated to reach a decision. The IDSs are evaluated on the DARPA 1999 data set, which is outdated; it contains many redundant records, which affects classifier accuracy. In their method, binary values are used to decide between attack and normal. Giacinto et al. [6] proposed a pattern-recognition approach based on the fusion of multiple classifiers for network intrusion detection. It provides a better tradeoff between generalization ability and false alarm generation. Unfortunately, on unknown attacks the fusion rules show no improvement over the results of the individual networks. No fusion rule improves on the performance of the neural network trained on the overall feature set, which attains the same performance as the oracle. Siraj et al. [7] proposed the Decision Engine of an Intelligent Intrusion Detection System (IIDS), which fuses information from different intrusion detection sensors using an artificial intelligence technique. Unlike neural networks, it cannot perform self-learning and self-training, and there is no functionality for customizing the standard attack. Parikh and Chen [8] proposed an ensemble of classifiers to combine data from various sources and reduce the cost of false alarms, using the DLEARNIN and DCMS algorithms. In their paper, sum and product rules are not used, and outputs are not directly compared. Giacinto et al. [9] proposed an unsupervised anomaly-based IDS in which a combination of one-class classifiers is used to design each module, with distinct features for training. For high values of false alarm rate, the system gives a low detection rate.

Li et al. [10] constructed a compact data set by clustering redundant data. Features are reduced from 41 to 19 using clustering, and the use of ant colony optimization improved the efficiency of intrusion detection. The combination of critical features used in this method could not distinguish attackers from normal users. Sung and Mukkamala [11] removed one feature at a time to carry out experiments on SVM and neural networks. The KDDCup'99 data set was used to verify this technique. For five-class classification, only the 19 most significant of the 41 features are used. Li et al. [12] proposed a wrapper-based feature selection algorithm to construct a lightweight IDS. They applied a modified Random Mutation Hill Climbing (RMHC) as the search strategy and a modified linear SVM as the evaluation criterion. This method speeds up feature selection and gives a high detection rate for IDS. Since the types of intruders are broad in today's information era, there is considerable scope for designing an improved IDS, which motivates the proposed work.

3 The Proposed System

3.1 Motivation. With the advent of online business and social networks, the genuineness of the information available on the Internet has come into question. Many human and robot-based intruders act aggressively to exploit this information. Also, attacks on the Internet are nondeterministic in nature, which makes detecting and reacting to them a very complex task. Most present-day stand-alone intrusion detection systems cannot achieve a reasonably high detection rate together with a low false alarm rate. Most existing IDSs detect a certain class of attack with improved accuracy while performing only moderately on the other classes of attacks. A more reliable and accurate decision for a wider class of attacks can be obtained by combining the decisions of multiple intrusion detection systems.

Modern processors are very fast, so combining multiple IDSs is not a significant computational burden, and best-of-breed solutions have been achieved earlier. A better analysis of the existing data gathered by various individual IDSs can detect many attacks that currently go undetected. The literature survey shows that appropriate feature selection techniques simplify the models so that they are easier to interpret, shorten training times, and enhance generalization by reducing overfitting. The challenges in designing and deploying IDSs are increasing owing to the wider reach of Internet services and the lack of a standard procedure for characterizing intruders.

3.2 The Proposed System Architecture. Anomaly-based IDSs identify abnormal or unusual behavior on a network and tag it as an attack. They do not need any specific knowledge. The disadvantage of this method is that it produces a larger number of false alarms. A signature-based IDS is good at detecting attacks that match a predefined pattern, and it produces very few false alarms. Signature-based and anomaly-based techniques are fused for three main reasons. First, the false alarm rate should be minimal, which is only possible with a signature-based IDS. Second, any IDS has to identify new attacks, which is possible through anomaly-based techniques. Third, every IDS is efficient in detecting specific types of attack. For example, an anomaly-based IDS is suitable for detecting DOS and R2L attacks, and a signature-based IDS is good for detecting U2R and PROBE attacks, as can be inferred from Table 6. The fusion of signature-based and anomaly-based techniques will be able to detect more attacks with a lower false alarm rate. The proposed system consists of a Multiple IDS Unit (MIU), which contains five IDS units following five different algorithms.

The proposed system architecture is shown in Figure 1. It contains three phases of work. In the first phase, feature selection is done with the help of information gain (IG) and a genetic algorithm (GA). The KDDCup'99 data set has 41 features in total, and certain features are irrelevant or not needed for the IDS.


Figure 1: Proposed system architecture. MIU: IDS-1 (SVM, anomaly-based), IDS-2 (IBK, anomaly-based), IDS-3 (J48, signature-based), IDS-5 (BayesNet, signature-based), decision fusion unit.





    IG[i] = IGR(F); i++
  endfor
  ...
endfor

... i) which ...

... y1, y2, ..., y5 from five ...
... yi is labeled as yy1 or yy2
...
  Then yi = yy1
  Else yi = yy2
...
count the number of yy1 and yy2
If (yy1 > 3)
  ...


Figure: Categorizer; IDS-1 to IDS-5 with local decisions y1 to y5 (shown as yy1, yy1, yy1, yy2, yy2); fusion unit with outputs ATTACK and NOT ATTACK.
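The fusion unit is described as applying a majority voting rule to the five local decisions. A minimal Python sketch of that step follows. The names `fuse`, `ATTACK`, `NOT_ATTACK`, and `min_votes` are illustrative and do not come from the paper. The surviving algorithm fragment reads `If (yy1 > 3)`, but the exact comparison cannot be verified from this text, so the vote threshold is left as a parameter that defaults to a simple majority.

```python
ATTACK = "ATTACK"
NOT_ATTACK = "NOT ATTACK"


def fuse(local_decisions, min_votes=None):
    """Combine local IDS decisions into one final decision by voting.

    local_decisions: list of ATTACK / NOT_ATTACK labels, one per IDS.
    min_votes: number of ATTACK votes needed for a final ATTACK decision.
               Defaults to a simple majority (an assumption, see lead-in).
    """
    if not local_decisions:
        raise ValueError("at least one local decision is required")
    if min_votes is None:
        min_votes = len(local_decisions) // 2 + 1
    attack_votes = sum(1 for d in local_decisions if d == ATTACK)
    return ATTACK if attack_votes >= min_votes else NOT_ATTACK


# Five local decisions, matching the labels shown in the fusion figure
# (three of one label, two of the other).
votes = [ATTACK, ATTACK, ATTACK, NOT_ATTACK, NOT_ATTACK]
final_decision = fuse(votes)
```

With five voters the default threshold is three ATTACK votes; passing `min_votes=4` would instead model a stricter rule.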













\[ I(S_1, S_2, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S} \quad (1) \]

... $S_j$ contain $S_{ij}$ ...

\[ E(F) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S} \ast I(S_{1j}, \ldots, S_{mj}) \quad (2) \]

\[ \mathrm{IGR} = \mathrm{Gain}(F) = I(S_1, \ldots, S_m) - E(F) \quad (3) \]
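Equations (1) to (3) can be sketched in Python as follows. This is an illustrative implementation for discrete feature values only; the function names and the toy records are assumptions, and continuous KDDCup'99 features would need to be discretized first, a step not shown here.

```python
import math
from collections import Counter


def class_entropy(labels):
    """I(S_1, ..., S_m): entropy of the class distribution, as in (1)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def expected_entropy(feature_values, labels):
    """E(F): weighted entropy of the subsets induced by feature F, as in (2)."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(subset) / total * class_entropy(subset)
               for subset in subsets.values())


def information_gain(feature_values, labels):
    """Gain(F) = I(S_1, ..., S_m) - E(F), as in (3)."""
    return class_entropy(labels) - expected_entropy(feature_values, labels)


# Toy records (hypothetical): one feature separates the classes perfectly,
# the other carries no information about the class.
labels = ["attack", "attack", "normal", "normal"]
informative = ["tcp", "tcp", "udp", "udp"]
uninformative = ["a", "b", "a", "b"]

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Ranking the features by this gain and keeping the highest-scoring ones is one common way to use the measure; the ranking rule actually used in the paper is not recoverable from this section.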






11011100011110101100111001110110011010001
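The bit string above has 41 characters, the same as the number of KDDCup'99 features. Assuming it is a GA chromosome in which bit k marks whether feature k is selected (an interpretation, not something this section states), it can be decoded as in the sketch below. The function name `selected_features` is illustrative.

```python
def selected_features(chromosome):
    """Return the 1-based indices of the features whose bit is set to 1."""
    if set(chromosome) - {"0", "1"}:
        raise ValueError("chromosome must contain only '0' and '1'")
    return [i for i, bit in enumerate(chromosome, start=1) if bit == "1"]


chromosome = "11011100011110101100111001110110011010001"
features = selected_features(chromosome)
```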













\[ \ldots \ast 100, \qquad \ldots \ast 100, \qquad \ldots \ast 100 \quad (4) \]
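The terms of (4) other than the factor of 100 did not survive in this text. As a hedged illustration only, the sketch below uses the standard confusion-matrix definitions of detection rate and false alarm rate as percentages; these may differ from the expressions in (4). The function and argument names are illustrative, and the counts are made-up numbers, not results from the paper.

```python
def detection_rate(tp, fn):
    """Percentage of attack records that were flagged as attacks."""
    return 100.0 * tp / (tp + fn)


def false_alarm_rate(fp, tn):
    """Percentage of normal records that were wrongly flagged as attacks."""
    return 100.0 * fp / (fp + tn)


# Hypothetical counts for one attack class.
dr = detection_rate(tp=90, fn=10)
far = false_alarm_rate(fp=5, tn=95)
```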











PROBE   ipsweep    0.82     2 3 5 12 13 14 16 17 21 23 24 25 28 31 32 33 37 38
PROBE   nmap       0.27     1 2 3 5 18 21 22 28 29 31 32 34 35 36 37
DOS     back       0.38     1 2 4 5 6 10 11 12 13 15 17 18 21 22 23 26 27 28 30 31 34 35 37 41
DOS     land       0.0009   1 2 3 4 7 13 18 25 29 35 38
DOS     teardrop   0.27     3 4 5 6 8 10 13 23 24 25 26 32 34 35 36 37 39 40
        rootkit    0.0035   3 6 9 11 13 14 16 17 18 23 28 31 32 33 34 35 37 39 41
R2L     multihop   0.0024   3 4 10 12 13 14 16 17 18 19 22 26 27 30 35 37
R2L     phf        0.0021   3 4 6 8 9 10 13 14 19 28 29 36
R2L     spy        0.0003   2 3 4 5 9 15 18 22 16 39
Normal             1196     1 2 3 4 5 6 7 15 23 24 14 15 19 20 21 23 25 26 27 28 30 32 33 34 36 37 38




25192 100 22544 100 3600







5 Conclusion


















DOS     64   99   3650   1
PROBE   76   99   2432   1
U2R     92   98   810    138
R2L     64   99   3584   1

Figure: Detection rate (%) by attack type (DOS, PROBE, U2R, R2L).

Figure: False alarm rate by attack type (DOS, PROBE, U2R, R2L).







References


























