+ All Categories
Home > Documents > COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM...

COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM...

Date post: 14-Sep-2019
Category:
Upload: others
View: 5 times
Download: 0 times
Share this document with a friend
25
COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM AND ARTIFICIAL IMMUNE NETWORK ALGORITHM IN INTRUSION DETECTION SYSTEM AHMED A. A. ABUNADA A Project report submitted in partial fulfillment of the requirements for the award of the degree of Master of Science Computer (Information Security) Faculty of Computing Universiti Teknologi Malaysia JANUARY 2013
Transcript
Page 1: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM AND

ARTIFICIAL IMMUNE NETWORK ALGORITHM

IN INTRUSION DETECTION SYSTEM

AHMED A. A. ABUNADA

A Project report submitted in partial fulfillment of the

requirements for the award of the degree of

Master of Science Computer (Information Security)

Faculty of Computing

Universiti Teknologi Malaysia

JANUARY 2013

Page 2: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

iii

I dedicate this project to my respected and beloved my mother ‘Ibtesam Saed

Al-khateb’ and My father ‘Abdurabou Ahmed Abunada’, thank you for the

moral and financial support you have given me throughout my academic life.

To my respected supervisor, Dr. Anazida Zainal

To my beloved country, Palestine, Gaza

To all my brothers and sisters

To all my friends

Page 3: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

iv

ACKNOWLEDGEMENT

First and foremost, all praise and thanks are due to Allah, and peace and

blessings be upon his Messenger, Mohammed (Peace Be Upon Him). Next, I wish to

express my sincere appreciation to my main supervisor, Dr. Mrs. Anazida Zainal,

for encouragement, guidance, critics, and friendship.

.

Page 4: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

v

ABSTRACT

Intrusion Detection Systems (IDS) is special software developed in order to

protect the system against security threats and malware. IDS provides second line of

defense after rule based firewall. Unfortunately IDS with supervised learning

approach heavily rely on labeled training data and generally it fails to detect novel

attacks and produces high false alarm. Besides, data labeling is expensive and time

consuming. However, a systematic method which offers the capability to alleviate

this problem is through the use of unsupervised approaches, which is the basis for

this research. In addition to that, to investigate this phenomenon, a comparison

between two clustering algorithms based on an anomaly detection system IDS is

proposed. Related literature has given a direction towards comparing two clustering

algorithm which are Artificial Immune Network (AIN) and Fuzzy c-means (FCM).

The performance of those two clustering algorithm were measured based on false

positive rate, false negative rate, hit rate and detection. This study has evaluated and

analyzed AIN and FCM clustering algorithms. The finding shows that AIN gives

higher overall accuracy and hit rate. It also gives lower false alarms on both datasets

used in the study. Consistent good performances of AIN in clustering network traffic

data into respective classes has made AIN a promising clustering technique to be of

used in detection novel attack traffic in IDS.

Page 5: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

vi

ABSTRAK

Sistem Pengesan Pencerobohan (IDS) adalah sebuah perisian khas

yang dibangunkan untuk melindungi sistem dari ancaman keselamatan dan juga

malware. IDS menyediakan pertahanan dua lapisan kepada firewall berasaskan

peraturan. Malangnya IDS dengan pendekatan seliaan pembelajaran tersangat

bergantung kepada data latihan berlabel dan umumnya ia gagal untuk mengesan

serangan-serangan dan juga menghasilkan penggera palsu. Selain itu, pelabelan data

adalah mahal dan memakan masa. Walaupun begitu, kaedah sistematik yang

menawarkan keupayaan untuk mengatasi masalah ini adalah dengan menggunakan

pendekatan tanpa pengawasan di mana adalah asas kepada penyelidikan ini.

Tambahan pula, untuk menyiasat fenomena ini, perbandingan di antara dua

kelompok algoritma berdasarkan sistem pengesanan anomali diusulkan. Penulisan

berkaitan memberikan tunjuk arah terhadap perbandingan dua kelompok algoritma

iaitu Artificial Immune Network (AIN) dan juga Fuzzy C-Means (FCM). Prestasi

kedua-dua kelompok algoritma ini telah diukur berdasarkan nisbah positif palsu,

nisbah negatif palsu, nisbah terkena ancaman dan juga pengesanan. Kajian ini telah

menilai dan telah menganalisa algoritma kelompok AIN dan juga FCM. Hasil dari

penemuan menunjukkan bahawa AIN mendapat ketepatan keseluruhan dan juga

nisbah terkena ancaman yang tinggi. Ia juga mendapat penggera palsu yang rendah

dalam kedua-dua set data yang digunakan dalam kajian ini. Prestasi baik AIN yang

konsisten di dalam mengkelompokan data-data trafik rangkaian ke dalam kelas

masing-masing telh membuat AIN teknik mengkelompokan yang

memberangsangkan untuk kegunaan pengesanan trafik baru di dalam IDS.

Page 6: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

vii

TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLAREATION

DEDICATION

ACKNOWLEDGMENT

ABSTRACT

ABSTRAK

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATION

ii

iii

iv

v

vi

vii

xi

xiii

xvi

1 INTRODUCTION 1

1.1 Introduction 1

1.2 Problem Background 3

1.3 Problem Statement 5

1.4 Purpose of study 5

1.5 Objective of study 5

1.6 Scope of study 6

1.7 Significant of Study 6

1.8 Organization of Report

7

2 LITERATURE REVIEW 8

2.1 Introduction 8

2.2 Intrusion Detection System 8

2.2.1 Host-based Intrusion Detection System 12

Page 7: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

viii

2.2.2 Network-based Intrusion Detection System 13

2.2.3 Types of attack 13

2.2.4 Misuse Intrusion Detection System 14

2.2.4.1 Snort 15

2.2.5 Anomaly Intrusion Detection 16

2.2.5.1 Architecture of Network-based Anomaly 17

2.2.5.2 Components of Anomaly Detection 18

2.2.5.3 Techniques used to develop

Anomaly Detection 19

2.3 K-Means Clustering 24

2.3.1 K-Means clustering in Intrusion Detection 24

2.4 Self-Organizing Maps 25

2.4.1 Self-Organizing algorithm 27

2.4.2 Goodness of SOM Network Architecture 28

2.5 Fundamentals of Human Immune System 29

2.6 Artificial Immune System 32

2.6.1 Clonal Selection 33

2.6.2 Negative Selection 35

2.6.3 Immune Network Theory 36

2.6.3.1 Algorithm of Immune Network 38

2.7 Fuzzy C-means Clustering 42

2.7.1 Adaptive Neuro-fuzzy Inference System 45

2.8 Related work 46

2.9 Summary

51

3 RESEARCH METHODOLOGY 52

3.1 Introduction 52

3.2 Research Framework 52

3.3 Research Design 55

3.3.1 Phase 1: Dataset Preparation 55

3.3.2 Phase 2: Developing Artificial Immune

Network using Matlab 56

3.3.3 Phase 3 Developing Fuzzy c-means algorithm 56

Page 8: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

ix

3.3.4 Phase 4 Evaluation and interpretation for

Fuzzy c-means and Artificial Immune

Network algorithms 56

3.4 KDD CUP 99 Dataset 58

3.4.1 Data Samples Preparation 62

3.5 Summary

65

4 IMPLEMENTATION AND TESTING

FUZZYC-MEANS 66

4.1 Introduction 66

4.2 Experimental Setup 66

4.3 Implementation of Fuzzy c-means 67

4.4 Results of Fuzzy c-means (FCM) 67

4.4.1 Fuzzy c-means Using Data Test 1 69

4.4.2 Fuzzy c-means Using Data Test 2 77

4.5 Discussion and Overall Results 82

4.6 Summary

83

5 IMPELEMENATION AND TESTING ARTIFICIAL

IMMUNE NETWORK 84

5.1 Introduction 84

5.2 Experimental Setup 85

5.3 Implementation of artificial Immune Network 85

5.4 Results of Artificial Immune Network 87

5.4.1 Artificial Immune Network Using Data Test 1 87

5.4.2 Artificial Immune Network Using Data Test 2 92

5.5 Discussion and Overall Results 97

5.6 Comparison between FCM and AIN 98

5.7 Summary

103

6 CONCLUSION AND FUTURE WORK 104

6.1 Introduction 104

6.2 Project Achievements and Challenges 104

6.3 Future Work 105

Page 9: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

x

6.4 Summary

106

REFERENCE 107

APPENDIX A

110

Page 10: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

xi

LIST OF TABLES

TABLE NO. TITLE PAGE

2.1 Examples on supervised algorithms 20

2.2 Related studies 47

3.1 Formula used to calculate the performance 57

3.2 Variables for IDS data set 59

3.3 Best selected significant features obtained by

clustering algorithms in two different samples

of data. 61

The explanation of most 6 significant features 62

3.4

3.5 The distribution of the Normal and others attacks in

the data sample

63

3.6 The contain of sample Test1 63

3.7 The contain of sample Test2 64

4.1 Parameters used for FCM 67

4.2 Distribution of Data in Test 1 and test 2 sets. 68

4.3 Best few selected membership values

for Normal cluster 70

4.4 Accuracy of detection result using Test1-FCM 71

4.5 Confusion matrix for Test 1 datasets 72

4.6 Accuracy of detection result using Test2-FCM 77

4.7 Confusion matrix for Test 2 datasets 78

5.1 Best few selected Threshold 85

5.2 Distribution of Data in Test 1 and Test 2 sets. 87

5.3 Accuracy of detection result 87

5.4 Confusion matrix for Test 1 datasets using AIN 88

5.5 Accuracy of detection result using Test2 93

Page 11: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

xii

5.6 Confusion matrix for Test 2 datasets 93

5.7(a) Comparison result between FCM and AIN using Test1 99

5.7 (b) Comparison result between FCM and AIN using Test2 100

5.8(a) Comparison of overall performance measures

for FCM and AIN using dataset Test 1 101

5.8(b) Comparison of overall performance measures

for FCM and AIN using dataset Test 2 102

Page 12: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

xiii

LIST OF FIGURES

FIGURE NO. TITLE PAGE

2.1 Four possible outcomes of detection 9

2.2(a) Classifier of false positive and negative rate 10

2.2(b) Classifier of false positive and negative rate 11

2.2 A basic ROC graph 12

2.3 IDS taxonomy 17

2.4 Generic ANIDS functional architecture 24

2.5 K-means algorithm 28

2.6 SOM algorithm 29

2.7 SOM Network Architecture 31

2.8 The structure of immune system 31

2.9 B-cell and T-cell receptor for pattern recognition 34

2.10 Clonal Selection process 36

2.11 Immune Network theory 37

2.12 View on Idiotypic Immune Network 38

2.13 Algorithm of Immune Network 39

2.14(a) Dataset with three clusters 43

2.14(b) Corresponding network structure 54

2.15 Pseudo-code of Fuzzy c-Means 64

3.1 Research Framework 65

3.2 percentage of Sample Test1 68

3.3 percentage of Sample Test 2 69

4.1 Disabused of Test1 in percentage (%) 70

4.2 Membership of each data in Test 1 71

4.3 Three memberships for the detection accuracy 73

Page 13: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

xiv

4.4 Normal detection Accuracy using Test1-FCM 74

4.5 Detection Accuracy of Probe Traffic using

Test1-FCM 75

4.6 Detection Accuracy of DoS Traffic using

Test1-FCM 75

4.7 Detection Accuracy of U2R Traffic using

Test1-FCM 75

4.8 Detection Accuracy of R2L Traffic using

Test1-FCM 76

4.9 Overall Accuracy using Test1-FCM 77

4.10 Detection Accuracy of Normal Traffic using

Test2-FCM 79

4.11 Detection Accuracy of Probe Traffic using

Test2-FCM 80

4.12 Detection Accuracy of DoS Traffic using

Test2-FCM 80

4.13 Detection Accuracy of U2R Traffic using

Test2-FCM 81

4.14 Detection accuracy of R2L Traffic using

Test2-FCM 81

4.15 Overall Accuracy using Test2-FCM 82

4.16 Result comparison between Test1 And Test 2 83

5.1 Trade-off between the final number of output

cells(N) and the suppression threshold S σ . 86

5.2 Classification Accuracy for Normal data

using Test1-AIN 89

5.3 Detection accuracy of Probe Traffic using

Test1-AIN 90

5.4 Detection accuracy of DoS Traffic using

Test1-AIN 90

5.5 Detection accuracy of U2R Traffic using

Test1-AIN 91

5.6 Detection Accuracy of R2L Traffic using

Test1-AIN

91

5.7 Overall Accuracy using Data Test1-AIN 92

5.8 Detection Accuracy of Normal Traffic using

Test2-AIN

94

5.9 Detection Accuracy of Probe traffic using

Test2-AIN 95

Page 14: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

xv

5.10 Detection Accuracy of DoS Traffic using

Test2-AIN 95

5.11 Detection Accuracy of U2R Traffic using

Test2-AIN 96

5.12 Detection accuracy of R2L Traffic using

Test2-AIN 96

5.13 Overall Accuracy using Test2-AIN 97

5.14 Result comparison between Test1 And Test 2 89

Page 15: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

xvi

LIST OF ABBREVIATION

IDS - Intrusion detection system

AIN - Artificial Immune network

FCM - Fuzzy c-means

FP - False positive

FN - False Negative

HIDS - Host-based Intrusion detection system

NIDS - Network-based Intrusion detection system

ROC - Receiver Operating Characteristics

DoS - Denial of Service Attack

U2R - User to Root Attack

R2L - Remote to Local Attack

SOM - Self-Organizing Maps

aiNet Artificial Immune Network

Page 16: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

1

CHAPTER 1

INTRODUCTION

1.1 Introduction

Because of the growing utilization of computer networks in lots of facets of

our existence, the amount of weaknesses is also growing leading to the network

assets not available and split up the machine discretion, integrity and availability.

Makes use of pose a significant security threat for that stability and also the security

of knowledge within the network atmosphere. According to Qasem (2010), network

invasion attack involves an array of activities. It offers trying to destabilize the

network, attaining unauthorized use of files with rights, or mishandling and misusing

of software.

Intrusion Detection Systems (IDS’s) are security tools, like other measures

for example anti-virus programs, firewalls, and access control schemes, usually are

meant to strengthen the safety of knowledge and communication systems (Garcia-

Teodoro et al., 2009). An Intrusion Detection System is a vital element of the

computer and knowledge security framework between normal activities from the

system and actions that may be considered intrusive.

The objective of IDS would be to identify unauthorized use or accessibility

computer or network in the outdoors atmosphere by individuals who do not possess

the authority or access privileges to such systems. The primary purpose of Intrusion

Page 17: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

2

Detection would be to develop a system that may instantly scan the network activity

and identify such invasion attacks (Qasem, 2010). An IDS can be used to identify

several kinds of malicious actions that may compromise the safety and trust of the

computer or network. Including network attacks against vulnerable services, data

driven attacks on programs, host based attacks for example privilege escalation,

unauthorized logins and access to sensitive files, and malware.

There are two main intrusion detection approaches, misuse intrusion detection

system and anomaly intrusion detection system exist. The Misuse intrusion detection

is based on attack signatures, the detailed description of the sequence of actions

performed by the attacker. This approach provides the platform which allows the

detection of intrusions perfectly matching the signatures. On the other hand, the

misuse detection recognizes known attack patterns and uses well-defined patterns of

the attack.

The anomaly detection concentrates on the unusual activities of designs and

uses the standard behavior designs to recognize an intrusion. Many researchers

mentioned that the anomaly intrusion detection can solve the issues that misuse

detection cannot solve. Garcia-Teodoro et al., (2009) highlighted that the primary

advantage of anomaly detection approach is its ability to identify or detect previously

unknown intrusions. Panda and Patra (2005), mentioned that, it's a must have way of

discovering makes use of once the training information is unlabeled too for

discovering unknown kinds of makes use of. They further stated that, the technique

that satisfies this require is the anomaly detection and without supervision approach.

Meanwhile IDS’s can also be categorized according to the host system into two

types:

1. Host-based IDS (HIDS)

2. Network-based IDS(NIDS)

Page 18: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

3

The host-based IDS operates at the host level and monitors a single host machine

using the audit trails of the host operating system, whereas network-based IDS

operates at the network level and monitors any number of hosts on the network.

1.2 Problem Background

By dealing with very large number of data over networks, it is difficult to

classify them manually to detect possible intrusions. Labeled data could be acquired

by simulating intrusions, but this really is restricted to the group of known attacks

and can neglect to address new kinds of attacks that could occur later on.

Consequently of the limited ability in discovering unknown attacks, the recognition

product is not efficient in acquiring the network data (Qasem, 2010). Therefore, a

procedure for discovering makes use of once the information is unlabeled is required,

in addition to discovering new and unknown kinds of attacks.

Anomaly detection algorithms hold the advantage that they may identify new

types of intrusions as diversions from normal usage, Leon et at., (2004). Going by

this problem and given some normal data to train from, and given a totally new bit of

test data, invasion recognition formula is always to decide if test data take part in

“normal” order to be able to detect an anomalous behavior. Referring to this issue as

supervised anomaly detection because the models are produced only using the

normal behavior across the network. In comparison, without supervision anomaly

detection attempts to recognize anomalous behavior without needing any

understanding regarding the training data. However, both kinds of anomaly detection

schemes are stricken by maximum false alarms.

Qasem (2010) maintains that, in many conditions, labeled information is

unavailable and the time is right consuming and incredibly costly to label the

information by hand. Meanwhile, there's always an engaged alternation in normal

traffic designs as well as constantly emerging of novel attacks each one of these

Page 19: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

4

problems result in the supervised approach not practical solution for IDS. To resolve

these complaints, scientists proceed to focus on without supervision approach, for

example clustering because this without supervision approach doesn't rely on the

labeled data and it doesn't consume us just as much time as supervised needs.

Without a doubt, clustering will work for new novel attacks. Various without

supervision techniques happen to be suggested however the recognition rate of IDS

is quite insufficient compared to supervised approaches.

Bace and Mell (2001) suggest that, to manage to identify novel attacks,

anomaly-based Intrusion detection systems was recommended. The job starting with

modeling a range of normal or valid behavior, especially when the observed behavior

diverges from this model, then an anomaly is elevated. However, anomaly-based

IDSs are more likely to false positives that may be triggered by novel, but non-

malicious traffic, as it is difficult to make a model connected wonderful possible

normal traffic. These false positives generally are a considerable hindrance to

effective operators monitoring the NIDS, consequently of occasions wasted in

considering them. Single Percent false positive rate might trigger huge amounts of

bogus alerts particularly when run on the large volumes of traffic common in current

systems. This, according to Axelsson (2000), is known as the base rate fallacy.

Nonetheless, anomaly-based approach has remained an active part of research

interest and is the main focus of this research also.

Artificial immune system technique was introduced in late 90’s and it received a

lot of attention from researchers. The ability of immune system to protect human

body were adopted many algorithms such as Negative selection, Clonal selection and

immune network. Applications Artificial Immune system include that of computer

and internet security, network intrusion detection and computer viruses.

Immune network which is a clustering technique was founded by Jerne’s

idiotypic network theory (Jerne1974), which suggests that the immune systems looks

after a network of interconnected B-cells. In artificial immune network (AIN)

models, a B-cell population includes two sub-populations: the very first population

Page 20: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

5

as well as the cloned population. The very first set is created in the subset of raw

training data to create the B-cell network.

1.3 Problem Statement

Supervised techniques do suffer low detection accuracy and high false alarm

especially when dealing with novel attacks. Besides, labeling network traffic

instances is expensive and time consuming. Furthermore, a supervised technique can

be obsolete, especially when network traffic is dynamic. It warrants an updating of

reference model. Therefore, clustering often seen to be a better solution as it can deal

with changes.

1.4 Purpose of Study

In this research the performance of Fuzzy c-means algorithm (FCM) as well

as Artificial Immune Network algorithm(AIN) will be compared in terms of detection

accuracy , false alarms and hit rate. At the end of this comparison, an analysis of their

performances will be discussed and the algorithm that shows better performance will

be highlighted and recommended.

1.5 Objectives of Study

This research has the following objectives:

i. To study and investigate performance of Fuzzy c-Means algorithm in IDS .

ii. To study and investigate performance of Immune Network algorithm in IDS.

Page 21: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

6

iii. To compare the performance of Fuzzy c-Means algorithm and Immune

network algorithm in IDS.

1.6 Scope of Study

The scope of project is listed below:

i. Two clustering technique (Fuzzy c-means) and (Artificial Immune network)

will be used in this study.

ii. The data used in this study is from KDD Cup 1999 Intrusion Detection

dataset.

iii. The study intends to use two datasets which will comprise of 5,092 and

6,890 samples in order to retain actual distribution of KDD Cup 1999 data.

iv. Matlab will be used to code Fuzzy c-means (FCM), and Artificial Immune

Network (AIN) algorithms.

v. The classification will be based on five classes which are Normal , Probe,

DoS,U2R and R2L. as in works of (Abraham and Grosan, 2006, Zainal et al.,

2009), (Dutta, 2009)

vi. Performance will be evaluated based on detection accuracy , False positive

rate , False negative rate and Hit rate.

1.7 Significant of Study

This study evaluates the performance of two algorithms: Artificial Immune

Network clustering and Fuzzy c-means algorithm for the network-based IDS in terms

of detection accuracy, and false alarms by studying each one and investigate them to

show which one is more suitable to be used in IDS.

Page 22: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

7

1.8 Organization of Report

The thesis consists of 6 chapters. Chapter one describes the introduction,

background of the study, research objectives and questions, the scope of the study

and its primary objectives. The second chapter reviews available and related

literature on Intrusion detection Systems, Artificial Immune Network, supervised and

without supervision, Fuzzy c-means and clustering approaches. Chapter three

describes the study methodology along with the appropriate framework for the study.

The 4th chapter provides the results and analysis of the findings of the first

algorithm which is Fuzzy c-means (FCM). The 5th chapter provides the results and

analysis of the findings of the second algorithm which is Artificial Immune Network

(AIN) and the evaluation with Fuzzy c-means based on the detection Accuracy ,

False alarms and the Hit rate. Lastly, chapter 6 covers the conclusion and the future

works

Page 23: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

107

REFERENCES

Abraham, A.and Grosan, C. (2006). Evolving intrusion detection systems. Genetic

Systems Programming, 57-79.

Abraham, A., Grosan, C.and Ramos, V. (2006). Swarm intelligence in data mining,

Springer-Verlag New York Inc.

Aickelin, U., Greensmith, J.and Twycross, J. (2004). Immune system approaches to

intrusion detection–a review. Artificial Immune Systems, 316-329.

Amasyali, F.and Albayrak, S. (2003). Fuzzy c-means clustering on medical

diagnostic systems..

Budayan, C., Dikmen, I.and Birgonul, M. T. (2009). Comparing the performance of

traditional cluster analysis, self-organizing maps and fuzzy C-means method

for strategic grouping. Expert Systems with Applications, 36, 11772-11781.

Chandola, V., Banerjee, A.and Kumar, V. (2009). Anomaly detection: A survey.

ACM Computing Surveys (CSUR), 41, 15.

Chimphlee, W., Abdullah, A. H., Sap, M. N. M., Chimphlee, S.and Srinoy, S.

(2005). Unsupervised clustering methods for identifying rare events in

anomaly detection. a a, 2, 1.

Davis, J. J.and Clark, A. J. (2011). Data preprocessing for anomaly based network

intrusion detection: A review. Computers & Security.

Dutta, A. (2009). Fuzzy c-Means Classification of Multispectral Data Incorporating

Spatial Contextual Information by using Markov Random Field. M. Sc

Thesis, GFM, IIRS-ITC JEP.

Jiang, S. Y., Song, X., Wang, H., Han, J. J.and Li, Q. H. (2006). A clustering-based

method for unsupervised intrusion detections. Pattern Recognition Letters,

27, 802-810.

Kiang, M. Y.and Fisher, D. M. (2008). Selecting the right MBA schools–An

application of self-organizing map networks. Expert Systems with

Applications, 35, 946-955.

Page 24: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

108

Kim, J.and Bentley, P. J. (2001). An evaluation of negative selection in an artificial

immune system for network intrusion detection.. San Francisco, California,

USA, 1330-1337.

Laskov, P., Däussel, P., Schäafer, C.and And Rieck, K. ( 2005). Learnin intrusion

detection:supervised or unsupervised?. Lecture Notes in Computer Science,

Lippmann, R. P., Fried, D. J., Graf, I., Haines, J. W., Kendall, K. R., Mcclung, D.,

Weber, D., Webster, S. E., Wyschogrod, D.and Cunningham, R. K. (2000).

Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion

detection evaluation.. IEEE, 12-26 vol. 2.

Mehrotra, K., Mohan, C. K.and Ranka, S. (1997). Elements of artificial neural

networks, the MIT Press.

Nezakatolhoseini, M., Jabbehdari, S.and Pourmina, M. A. (2012). Analysis and

Performance Evaluation of Application Specific Processors for Network-

Based Intrusion Detection Systems. Advances in Computing and Information

Technology, 61-70.

Panda, M.and Patra, M. (2005). SOME Clustering Agorithm to enhance the

performance of network intrusion detection system. Journal of Theoretical

and Applied Information Technology.

Qasem, M. a. R. (2010). Anomaly intrusion detection system using immune network

with reduced network traffic features. Universiti Teknologi Malaysia, Faculty

of Computer Science and Information Systems.

Tavallaee, M., Bagheri, E., Lu, W.and Ghorbani, A. A. (2009). A detailed analysis of

the kdd cup 99 data set.

Velmurugan, T.and Santhanam, T. (2010). Computational complexity between k-

means and k-medoids clustering algorithms for normal and uniform

distributions of data points. Journal of Computer Science, 6, 363-368.

Yasami, Y., Khorsandi, S., Mozaffari, S.and Jalalian, A. (Year). An unsupervised

network anomaly detection approach by k-Means clustering & ID3

algorithms. In, 2008. IEEE, 398-403.

Zainal, A., Maarof, M. A.and Shamduddin, S. (2006). "Feature selection using rough

set in intrusion detection”, in Proc. IEEE TENCON, p.4.

Page 25: COMPARATIVE STUDY BETWEEN FUZZY C-MEANS ALGORITHM …eprints.utm.my/id/eprint/33102/5/AhmedAAAbunadaMFSKSM2013.pdfyang dibangunkan untuk melindungi sistem dari ancaman keselamatan

109

Zainal, A., Maarof, M. A.and Shamsuddin, S. M. (2009). Ensemble classifiers for

network intrusion detection system. Journal of Information Assurance and

Security, 4, 217-225.


Recommended