CHAPTER 5

DESIGN OF CONTEXT FREE GRAMMAR FOR

PACKET DATA ANALYSIS

This chapter proposes a rule-based approach that classifies network packet data as either normal or attack. The interrelationships among the features of the packet data (network traffic) are used to separate intrusive from normal packet data.

The chapter proposes a combination of rules that distinguishes normal from abnormal behavior patterns. The generated rules predict with high accuracy and a high level of confidence, and they apply repeatedly across the observed data. The rules also incorporate temporal relationships among events. Because the rule set is derived inductively from observed user activity, it is more sensitive to intrusions as they take place. The limitation of this methodology is that an unrecognized pattern of activity matches no rule and therefore cannot be flagged as anomalous.

5.1 NEED FOR RULES GENERATION

Since the features are a mixture of categorical and continuous values, statistical methods are not well suited and rule-based filters are more appropriate.

Cohen (1995) proposed a classification method called Repeated Incremental Pruning to Produce Error Reduction (RIPPER), a rule-learning technique that formulates a set of ‘if-then’ rules from data. It is an optimized version of Incremental Reduced Error Pruning (IREP), proposed by Furnkranz and Widmer. IREP combines separate-and-conquer rule learning with Reduced Error Pruning (REP). RIPPER is capable of formulating complex, threshold-based decisions. Separate-and-conquer is a sequential-covering rule-learning algorithm: it generates one rule at a time and eliminates all examples covered by the new rule. In IREP, the training data is first separated into a growing set and a pruning set. REP is a successful method for reducing the tree learned by a decision-tree learning system and can be adapted easily to a rule-learning system. While creating a rule, RIPPER searches the rule space for the best rule for the current growing set. After a rule is grown on the growing set, it is pruned on the pruning set, and the examples it covers are deleted from the training set (both the growing and the pruning sets). The remaining training data is separated again, and the process repeats until the stopping conditions are satisfied.
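The separate-and-conquer loop described above can be sketched as follows. This is a deliberately simplified illustration and not RIPPER itself: each rule is a single `attribute == value` test, the grow/prune split and the pruning step are omitted, and the helper names (`learn_one_rule`, `sequential_covering`) and the toy records are hypothetical.

```python
from collections import Counter

def learn_one_rule(examples, target):
    """Pick the single (attribute, value) test with the highest precision for `target`.

    Simplification: IREP/RIPPER grow multi-condition rules on a growing set and
    prune them on a pruning set; this sketch does neither.
    """
    covered = Counter()   # number of examples matched by each test
    correct = Counter()   # matched examples whose label is `target`
    for features, label in examples:
        for test in features.items():
            covered[test] += 1
            if label == target:
                correct[test] += 1
    if not correct:
        return None
    # Highest precision first; ties are broken by the number of correct matches.
    return max(correct, key=lambda t: (correct[t] / covered[t], correct[t]))

def sequential_covering(examples, target, min_precision=0.9):
    """Learn rules one at a time, removing the examples each new rule covers."""
    rules = []
    remaining = list(examples)
    while any(label == target for _, label in remaining):
        rule = learn_one_rule(remaining, target)
        if rule is None:
            break
        attr, value = rule
        matched = [(f, l) for f, l in remaining if f.get(attr) == value]
        precision = sum(l == target for _, l in matched) / len(matched)
        if precision < min_precision:
            break  # stopping condition: the best remaining rule is too weak
        rules.append(rule)
        # "Separate": drop everything the new rule covers, then "conquer" the rest.
        remaining = [(f, l) for f, l in remaining if f.get(attr) != value]
    return rules

# Toy records (hypothetical), each a (features, label) pair.
toy = [
    ({"service": "http", "flag": "SF"}, "normal"),
    ({"service": "smtp", "flag": "SF"}, "normal"),
    ({"service": "http", "flag": "S0"}, "attack"),
    ({"service": "private", "flag": "S0"}, "attack"),
    ({"service": "private", "flag": "REJ"}, "attack"),
]
rules = sequential_covering(toy, "attack")
print(rules)
```

On the toy data the loop first learns a rule on `flag`, removes the two records it covers, and then learns a second rule for the remaining attack record.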

RIPPER assumes that, in a set of records, one of the features is the class label used for classification, and that the classification algorithm can discriminate on the remaining feature values to describe each concept. A RIPPER rule combines attribute values (conditions) with a class label (consequence) and can be deployed in an IDS. Data mining programs can be applied to learn rules that accurately identify normal and intrusive activity. RIPPER was part of the DARPA assessment, and its authors reported its performance clearly. They concluded that detecting new intrusions requires anomaly detection and that, in general, RIPPER does not create effective signatures for a signature-detection tool.

Observing a traffic pattern helps in deciding whether the traffic is normal or an attack. In the same manner, the ground-truth IDS data set KDD Cup99 follows certain behavioral patterns across its categorical and continuous features. An intruder can be identified by behavior. If the behavior of a normal user and of an abnormal user can be characterized, a set of rules can be formulated for each. In this chapter, the IDS data set is analyzed and a set of rules is designed for both normal and attack packet data. Although these rules are tested explicitly on the ground-truth data set (KDD Cup99), they can be adapted to a new environment.

The greatest challenge lies in defining the thresholds for the rules, which are empirical and depend on the environment (Maheshkumar Sabhnani and Gursel Serpen 2003). Sabhnani and Serpen proposed generating heuristic rules by analyzing the signatures of attacks in the R2L class. Poorly chosen thresholds, however, lead to low detection rates for the targeted attacks. Data-based or signature-based heuristic rules can detect attacks in real time because they typically require very little execution time. Misuse detection therefore performs best when specific and comprehensive heuristic rules are defined for a specific application. Rule creation uses signature analysis to define a set of features, and automated techniques to establish the threshold values required to form the heuristic rules. The authors studied the warezmaster and warezclient attacks in terms of signatures, services used and significant features present in the KDD Cup99 data set, and described how the features and thresholds are chosen for the rules that detect these two attacks. Furthermore, signature-based heuristic rules are not influenced by noise in the training data set and detect the attacks with low false alarm rates.

Gunes Kayacik et al (2005) used the information gain procedure to identify, for each feature, the class label that it discriminates best. They reported that most of the features numbered 31 to 41 discriminate normal traffic and some attacks accurately. Moreover, 9 features contributed only modestly to intrusion detection, and features 20 and 21 made no contribution to intrusion detection at all.
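As a minimal sketch of the information gain measure mentioned above, the following computes the entropy of the class labels and the gain obtained by splitting on one categorical feature. The function names and the four toy records are illustrative assumptions; a continuous feature would first have to be discretized, which is not shown.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy after partitioning on a categorical feature."""
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Toy data (hypothetical): four records, two per class.
labels = ["normal", "normal", "attack", "attack"]
flag = ["SF", "SF", "S0", "S0"]          # separates the two classes perfectly
protocol = ["tcp", "tcp", "tcp", "tcp"]  # carries no information about the class

gain_flag = information_gain(flag, labels)
gain_protocol = information_gain(protocol, labels)
print(gain_flag, gain_protocol)
```

A feature whose gain is close to zero for every class, as `protocol` is here, corresponds to the features reported as making no contribution.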

The heuristic rules generated and discussed using RIPPER detect only remote-to-local (R2L) attacks (Maheshkumar Sabhnani and Gursel Serpen 2003).

The novelty of the proposed grammar method is that it is written at two levels (Raw Filter and Micro Filter) to enhance the performance of the IDS. In addition, the computational complexity of the proposed work is low because it requires only simple rules.

5.2 NORMAL AND ATTACK PACKET CATEGORIZATION

Normal and attack context free grammars are proposed to model and analyze network sequences. Grammars are well suited to expressing the interdependency of vulnerabilities and are particularly desirable for IDS alerts. The logical rules of a context free grammar make it more comprehensive. Examining traffic against a grammar can also reduce false positives and false negatives. The fixed form of a grammar makes recognition and inspection easier. In addition, the algorithmic complexity is linear in the length of the grammar (Yinqian Zhang et al 2008).

In the context free grammar method, the normal and attack grammars are first created offline to support IDS alerts. The grammars are then loaded into the proposed Multi Stage Filter, and the traffic data is analyzed by applying them. New grammars can be constructed for any type of network environment, so low false alarm rates can be achieved in due course. Variation among the grammars depends only on the features of the traffic data, which change the size of the grammar; hence the algorithmic complexity is O(n), where n is the length of the grammar. The constructed rules are the most appropriate for the given data set and can be extended to identify new records or to add new features. In signature-based rules, the significant features can be extracted by reviewing the attack dynamics.

C4.5 is a decision tree algorithm (Ron Kohavi and Ross Quinlan 1999) used to classify data with common attributes; each root-to-leaf path of the decision tree signifies a rule that classifies data according to the attributes. A node of a decision tree identifies the attribute by which the data is to be divided. Each edge leaving a node is labeled with a possible value of that node's attribute and connects the node either to another node or to a leaf, and each leaf is labeled with a decision value for classifying the data. C4.5 rules can also be used in misuse detection. The tree is pruned by the algorithm and the corresponding table is created. Finally, appropriate rules are formed for the normal and attack patterns.
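The way a decision tree yields rules can be illustrated with a short sketch that reads each root-to-leaf path as one if-then rule. The nested-dictionary tree below is hand-written for illustration only; it is not the output of C4.5, and the function name `tree_to_rules` is an assumption.

```python
def tree_to_rules(node, path=()):
    """Yield (conditions, decision) pairs, one per root-to-leaf path."""
    if not isinstance(node, dict):          # a leaf holds the decision value
        yield list(path), node
        return
    attribute = node["attribute"]           # the attribute tested at this node
    for value, child in node["branches"].items():
        # Each labeled edge adds one (attribute, value) condition to the rule.
        yield from tree_to_rules(child, path + ((attribute, value),))

# Hypothetical toy tree over two of the features used in this chapter.
toy_tree = {
    "attribute": "flag",
    "branches": {
        "SF": "normal",
        "S0": {
            "attribute": "land",
            "branches": {0: "neptune", 1: "land"},
        },
    },
}

rules = list(tree_to_rules(toy_tree))
for conditions, decision in rules:
    print(" and ".join(f"{a} == {v!r}" for a, v in conditions), "->", decision)
```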

In the proposed method, the feature ‘flag’ (labeled v4 in Table 3.2, with 11 different values) is chosen as the most discriminating feature for identifying normal packets. The association of ‘flag’ with ‘service’ (labeled v3 in Table 3.2, with 70 different values) is used to separate normal data from the rest of the data; this is called the Normal Signature Association (NSA). All associations observed for normal data are tabulated in Table 5.1. For example, a record whose ‘flag’ is ‘SH’ and whose ‘service’ is ‘time’ is treated as ‘normal’.

Table 5.1 Normal Signature Association

Flag (f)   Services (s)
RSTR       finger, telnet, http, smtp, X11, ssh
RSTO       finger, telnet, shell, pop_3, time, http, smtp, auth, IRC, ftp
OTH        ftp_data, telnet, http, smtp, ftp
REJ        private, urh_i, other, http, smtp, auth, IRC, X11
S0         domain, finger, telnet, time, http
S1         ftp_data, private, finger, telnet, shell, http, smtp, IRC, ftp
S2         ftp_data, other, http, smtp, ftp
S3         ftp_data, finger, telnet, time, http, smtp, IRC
SF         csnet_ns, domain_u, ftp_data, private, tftp_u, domain, finger,
           telnet, shell, eco_i, ecr_i, red_i, pop_3, tim_i, urh_i, urp_i,
           ntp_u, other, time, http, smtp, auth, IRC, X11, ssh, efs, ftp
SH         time
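A minimal sketch of how the association in Table 5.1 could be applied: the table is held as a mapping from flag to the set of services seen with it in normal traffic, and a record is treated as normal when its (flag, service) pair appears in the mapping. The names `NSA` and `is_normal` are illustrative, not part of the original work.

```python
# Table 5.1 as a mapping: flag -> services associated with it in normal data.
NSA = {
    "RSTR": {"finger", "telnet", "http", "smtp", "X11", "ssh"},
    "RSTO": {"finger", "telnet", "shell", "pop_3", "time", "http", "smtp",
             "auth", "IRC", "ftp"},
    "OTH": {"ftp_data", "telnet", "http", "smtp", "ftp"},
    "REJ": {"private", "urh_i", "other", "http", "smtp", "auth", "IRC", "X11"},
    "S0": {"domain", "finger", "telnet", "time", "http"},
    "S1": {"ftp_data", "private", "finger", "telnet", "shell", "http", "smtp",
           "IRC", "ftp"},
    "S2": {"ftp_data", "other", "http", "smtp", "ftp"},
    "S3": {"ftp_data", "finger", "telnet", "time", "http", "smtp", "IRC"},
    "SF": {"csnet_ns", "domain_u", "ftp_data", "private", "tftp_u", "domain",
           "finger", "telnet", "shell", "eco_i", "ecr_i", "red_i", "pop_3",
           "tim_i", "urh_i", "urp_i", "ntp_u", "other", "time", "http", "smtp",
           "auth", "IRC", "X11", "ssh", "efs", "ftp"},
    "SH": {"time"},
}

def is_normal(flag, service):
    """Return True when (flag, service) is a normal signature association."""
    return service in NSA.get(flag, set())

print(is_normal("SH", "time"))   # the example given in the text
print(is_normal("SH", "http"))   # not listed for SH in Table 5.1
```

A flag that does not appear in the table at all has no normal association, so every record carrying it falls through to the attack side.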

Likewise, based on the description and signature of each attack, an intrusion classification rule, called an Intrusion Signature Rule (ISR), has been developed for every attack. Table 5.2 lists the ISR for all attacks. Each rule selects the appropriate discriminating features and specific values of those features for classification. In the rules, ∧ denotes the AND function, ∨ denotes the OR function and ~= means ‘not equal to’.

Table 5.2 Intrusion Signature Rule

Back - DoS

The number of forward slashes in the URL sent to the server can be varied. The attacker may deliberately add a large number of forward slashes to the request, which consumes CPU time on the HTTP server and slows it down. The system recovers automatically when the attack stops.

The rule for the ‘back’ attack can be stated as: the requested ‘service’ is ‘http’ and ‘src_bytes’ is greater than a threshold (typically 5000). The choice of threshold depends on the working environment.

Formulated rule of ‘back’ attack:

back_signature: (service == ‘http’) ∧ (src_bytes > 5000)

Smurf - DoS

In a broadcast subnet environment, the attacker sends ICMP echo request packets to the broadcast addresses of many subnets, with the source address spoofed to be that of the intended victim. The participants are the attacker, the intermediary and the victim (the intermediary may also be a victim). The nodes that listen on these subnets respond by sending ICMP echo reply packets to the victim.

The rule for the ‘smurf’ attack can be stated as: the requested ‘service’ is ‘ecr_i’ and ‘src_bytes’ is equal to 1032. The threshold should be selected based on the minimum frame size of the subnet (LAN) under consideration.

Formulated rule of ‘smurf’ attack:

smurf_signature: (service == ‘ecr_i’) ∧ (src_bytes == 1032)

Teardrop - DoS

The teardrop attack exploits a bug in TCP/IP fragment reassembly by inserting false offset information into fragmented packets. During reassembly, the resulting oversized payloads or overlapping fragments can cause the system to crash.

The rule for the ‘teardrop’ attack can be stated as: the requested ‘service’ is ‘private’ and ‘src_bytes’ is equal to 28, the size of the malformed fragment.

Formulated rule of ‘teardrop’ attack:

teardrop_signature: (service == ‘private’) ∧ (src_bytes == 28)

Neptune - DoS

In the neptune attack, the attacker sends a flood of TCP SYN packets with a forged sender address. For each packet, the server opens a half-open connection by sending back a TCP SYN-ACK packet and waiting for a reply. Legitimate requests cannot connect to the server because it is saturated with half-open connections.

The rule for the ‘neptune’ attack can be stated as: the connection is not from/to the same host/port (‘land’ is 0) and the flag is one of RSTO, REJ or S0.

Formulated rule of ‘neptune’ attack:

neptune_signature: (land == 0) ∧ ((flag == RSTO) ∨ (flag == REJ) ∨ (flag == S0))


Pod - DoS

In the pod (ping of death) attack, malformed ping packets are sent to the victim machine, causing it to crash.

The rule for the ‘pod’ attack can be stated as: the requested ‘service’ is ‘ecr_i’ or ‘tim_i’ and the size of the malformed ping packet (‘src_bytes’) is equal to 1480 or 564, respectively.

Formulated rule of ‘pod’ attack:

pod_signature: ((service == ‘ecr_i’) ∧ (src_bytes == 1480)) ∨ ((service == ‘tim_i’) ∧ (src_bytes == 564))

Land - DoS

In the land attack, an attacker sends a spoofed SYN packet in which the source address and the destination address are identical.

The rule for the ‘land’ attack can be stated as: the connection is from/to the same host/port (‘land’ is 1) and the flag is one of RSTO, REJ or S0.

Formulated rule of ‘land’ attack:

land_signature: (land == 1) ∧ ((flag == RSTO) ∨ (flag == REJ) ∨ (flag == S0))


Buffer_overflow - U2R

A buffer overflow occurs when a program is given more input data than the buffer allocated for it can hold.

The rule for the ‘buffer_overflow’ attack can be stated as: the requested ‘service’ is ‘telnet’ or ‘ftp_data’ with the ‘SF’ flag, ‘dst_host_same_srv_rate’ is 1 and ‘dst_host_diff_srv_rate’ is 0.

Formulated rule of ‘buffer_overflow’ attack:

cond_1: ((service == ‘telnet’) ∨ (service == ‘ftp_data’)) ∧ (flag == SF)
cond_2: (dst_host_same_srv_rate == 1) ∧ (dst_host_diff_srv_rate == 0)
buffer_overflow_signature: (cond_1 ∧ cond_2)

Loadmodule - U2R

The loadmodule attack resets the Internal Field Separator (IFS) for a normal user, which allows an unauthorized user to gain root access on the local machine.

The rule for the ‘loadmodule’ attack can be stated as: the requested ‘service’ is ‘telnet’ or ‘ftp_data’, with thresholds on the features ‘diff_srv_rate’, ‘dst_host_same_src_port_rate’, ‘dst_host_srv_diff_host_rate’ and ‘dst_host_same_srv_rate’.

Formulated rule of ‘loadmodule’ attack:

cond_1: (service == ‘telnet’) ∧ (dst_host_same_srv_rate >= .6) ∧ (dst_host_same_src_port_rate >= .15) ∧ (dst_host_srv_diff_host_rate <= .3)
cond_2: (service == ‘ftp_data’) ∧ (diff_srv_rate ~= 0) ∧ (dst_host_same_srv_rate == 1) ∧ (dst_host_srv_diff_host_rate >= 1)
loadmodule_signature: (cond_1 ∨ cond_2)


Rootkit - U2R

A rootkit is a collection of programs with which an attacker maintains access to a machine once it has been compromised. The attacker installs one or more components of the rootkit (such as a sniffer or versions of login, su and other programs with backdoors) to retain access.

The rule for the ‘rootkit’ attack can be stated as: the requested ‘service’ is ‘telnet’ and ‘dst_host_same_src_port_rate’ (percentage of connections from the port services to the destination host) is equal to 0.

Formulated rule of ‘rootkit’ attack:

rootkit_signature: (service == ‘telnet’) ∧ (dst_host_same_src_port_rate == 0)

Perl - U2R

In the perl attack, a perl script sets the user id to root and a root shell is created.

The rule for the ‘perl’ attack can be stated as: the requested ‘service’ is ‘telnet’, ‘num_root’ is 2 and ‘num_file_creations’ is 2.

Formulated rule of ‘perl’ attack:

perl_signature: (service == ‘telnet’) ∧ (num_root == 2) ∧ (num_file_creations == 2)


Ftp_write - R2L

In the ftp_write attack, writable FTP directories are detected and used to acquire a local login, which is then used for unauthorized or illegal operations such as writing files that provide access to the computer.

The rule for the ‘ftp_write’ attack can be stated as: the requested ‘service’ is ‘login’, ‘ftp’ or ‘ftp_data’, with the corresponding threshold on ‘dst_host_same_src_port_rate’ (percentage of connections from the port services to the destination host).

Formulated rule of ‘ftp_write’ attack:

cond_1: (service == ‘login’) ∧ (dst_host_same_src_port_rate >= .5)
cond_2: (service == ‘ftp’) ∧ (dst_host_same_src_port_rate >= 1)
cond_3: (service == ‘ftp_data’) ∧ (dst_host_same_src_port_rate >= 1)
ftp_write_signature: (cond_1 ∨ cond_2 ∨ cond_3)

Guess_passwd - R2L

The attacker attempts to guess the password of the guest account via telnet.

The rule for the ‘guess_passwd’ attack can be stated as: the requested ‘service’ is ‘telnet’, ‘src_bytes’ does not exceed a threshold (typically 128) and the service error rates are non-zero. The choice of threshold depends on the working environment.

Formulated rule of ‘guess_passwd’ attack:

guess_passwd_signature: (service == ‘telnet’) ∧ (src_bytes <= 128) ∧ (dst_host_serror_rate ~= 0) ∧ (dst_host_srv_serror_rate ~= 0)


Imap - R2L

The imap attack makes use of a buffer overflow in the Imap server on Linux. The remote attacker uses root privileges to access mail folders and to perform file handling on behalf of the user logging in. The buffer overflow exists after login even though the privileges are discarded.

The rule for the ‘imap’ attack can be stated as: the requested ‘service’ is ‘imap4’ and the feature ‘count’ is greater than zero. A non-zero value of ‘count’ indicates that the attacker is performing file operations on the imap server.

Formulated rule of ‘imap’ attack:

imap_signature: (service == ‘imap4’) ∧ (count > 0)

Multihop - R2L

In the multihop attack, the initial attack traffic going into or coming out of the network router is used to launch further attacks and can make a Denial of Service harder to detect.

The rule for the ‘multihop’ attack can be stated as: the requested ‘service’ is ‘ftp_data’ or ‘telnet’ and the flag is ‘SF’, with a threshold on the feature ‘dst_host_same_src_port_rate’.

Formulated rule of ‘multihop’ attack:

multihop_signature: ((service == ‘ftp_data’) ∨ (service == ‘telnet’)) ∧ (flag == SF) ∧ (dst_host_same_src_port_rate >= 1)


Phf - R2L

The phf attack exploits a poorly written CGI script that permits a client to execute commands with the authorization level of the http server by issuing an http request.

The rule for the ‘phf’ attack can be stated as: the requested ‘service’ is ‘http’, ‘src_bytes’ is less than 64 and ‘dst_bytes’ is greater than 4096.

Formulated rule of ‘phf’ attack:

phf_signature: (service == ‘http’) ∧ (src_bytes < 64) ∧ (dst_bytes > 4096)

Spy - R2L

In the spy attack, an attacker breaks into a compromised machine several times to collect important information, seeking secret data files or reading the user's personal mail while avoiding detection.

The rule for the ‘spy’ attack can be stated as: the requested ‘service’ is ‘telnet’ and ‘dst_host_serror_rate’ (percentage of connections to the destination host that have ‘SYN’ errors) is .22.

Formulated rule of ‘spy’ attack:

spy_signature: (service == ‘telnet’) ∧ (dst_host_serror_rate == .22)


Warezclient - R2L

In the warezclient attack, an authorized user downloads, during an FTP connection, illegal “warez” software that was placed on the server earlier through a warezmaster attack. The files are downloaded from hidden directories or from directories that are not generally accessible to guest users on the FTP server.

The rule for the ‘warezclient’ attack can be stated as: the requested ‘service’ is ‘ftp_data’, ‘ftp’ or ‘other’ and the user has successfully logged in to the network (logged_in == 1).

Formulated rule of ‘warezclient’ attack:

warezclient_signature: ((service == ‘ftp_data’) ∨ (service == ‘ftp’) ∨ (service == ‘other’)) ∧ (logged_in == 1)

Ipsweep - Probe

An ipsweep attack is a surveillance sweep to find out which hosts are listening on the network. The usual method is to send ICMP ping packets to every possible address inside a subnet and to observe which machines respond.

The rule for the ‘ipsweep’ attack can be stated as: ‘dst_bytes’ is 0 and the number of connections to the same host as the current connection in the past 2 seconds (the ‘count’ feature) is 1.

Formulated rule of ‘ipsweep’ attack:

ipsweep_signature: (dst_bytes == 0) ∧ (count == 1)


Nmap - Probe

Nmap (network mapper) is a common tool for performing network scans. It supports different types of port scan, including SYN, FIN and ACK scanning with both TCP and UDP, as well as ICMP (ping) scanning. The nmap program also lets the user specify which ports to scan, how long to wait between ports, and whether the ports are scanned sequentially or in random order.

The rule for the ‘nmap’ attack can be stated as: the protocol type is ‘udp’, ‘tcp’ or ‘icmp’ with a particular flag value (the port scan options include SYN, FIN and ACK scanning).

Formulated rule of ‘nmap’ attack:

cond_1: (protocol_type == ‘udp’) ∧ (flag == S1)
cond_2: ((protocol_type == ‘tcp’) ∨ (protocol_type == ‘tcp’)) ∧ (flag == SH)
nmap_signature: (cond_1 ∨ cond_2)

Portsweep - Probe

A portsweep is a surveillance sweep through many ports to find out which services are running on a single host.

The rule for the ‘portsweep’ attack can be stated as: the protocol type is ‘tcp’, ‘dst_host_count’ (number of connections to the same destination host as the current connection in the past 2 seconds) is at least 255 and ‘dst_host_same_src_port_rate’ (percentage of connections from the port services to the destination host) is not equal to 0.

Formulated rule of ‘portsweep’ attack:

portsweep_signature: (protocol_type == ‘tcp’) ∧ (dst_host_count >= 255) ∧ (dst_host_same_src_port_rate ~= 0)


Satan - Probe

Satan is a widely available tool that probes a network for security vulnerabilities and misconfigurations and looks for well-known weaknesses. It is occasionally used by administrators and frequently used by attackers to look for vulnerabilities on a network by gathering information.

The rule for the ‘satan’ attack can be stated as: the flag value (used to scan the ports) is either ‘SF’ or ‘REJ’ and the number of connections to the same host as the current connection in the past 2 seconds (the ‘count’ feature) is non-zero.

Formulated rule of ‘satan’ attack:

satan_signature: ((flag == SF) ∨ (flag == REJ)) ∧ (count > 0)
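As an illustration of how rules of this kind can be applied to a connection record, the sketch below encodes five of the signatures from Table 5.2 as Python predicates over a dictionary of feature values. The dictionary layout, the `SIGNATURES` registry and the `match_attacks` helper are assumptions made for the example; the feature names and thresholds are the ones given in the table.

```python
def back(r):
    return r["service"] == "http" and r["src_bytes"] > 5000

def smurf(r):
    return r["service"] == "ecr_i" and r["src_bytes"] == 1032

def neptune(r):
    return r["land"] == 0 and r["flag"] in {"RSTO", "REJ", "S0"}

def land(r):
    return r["land"] == 1 and r["flag"] in {"RSTO", "REJ", "S0"}

def phf(r):
    return r["service"] == "http" and r["src_bytes"] < 64 and r["dst_bytes"] > 4096

# Registry of the signatures encoded above (a subset of Table 5.2).
SIGNATURES = {"back": back, "smurf": smurf, "neptune": neptune,
              "land": land, "phf": phf}

def match_attacks(record):
    """Return the names of all encoded signatures that the record satisfies."""
    return [name for name, rule in SIGNATURES.items() if rule(record)]

# Hypothetical records, not taken from KDD Cup99.
large_http = {"service": "http", "flag": "SF", "land": 0,
              "src_bytes": 6000, "dst_bytes": 100}
half_open = {"service": "private", "flag": "S0", "land": 0,
             "src_bytes": 0, "dst_bytes": 0}

print(match_attacks(large_http))
print(match_attacks(half_open))
```

Because the thresholds are environment dependent, as the table notes, they would normally be held as parameters rather than literals.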

5.3 DESIGN OF CONTEXT FREE GRAMMAR FOR PACKET

DATA ANALYSIS

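Hence, it is proposed to develop the signature of normal traffic and of each attack in the form of a Context Free Grammar (CFG). The ISRs for all attack types are translated into CFG. The general CFG is given by:

\[
\begin{aligned}
B^{e} &\rightarrow B^{e} \vee B^{t} \mid B^{t} \\
B^{t} &\rightarrow B^{t} \wedge B^{f} \mid B^{f} \\
B^{f} &\rightarrow (B^{e}) \mid \mathit{var}\,.\,\mathit{op}\,.\,\mathit{value}
\end{aligned}
\tag{5.1}
\]

The service sets $FS_i$, one for each flag of Table 5.1 and written with the numeric service codes, are:

\[
\begin{aligned}
FS_{1} &= \{28, 33, 53, 54, 63, 68\} \\
FS_{2} &= \{28, 33, 34, 40, 49, 53, 54, 56, 62, 70\} \\
FS_{3} &= \{12, 33, 53, 54, 70\} \\
FS_{4} &= \{21, 43, 47, 53, 54, 56, 62, 63\} \\
FS_{5} &= \{27, 28, 33, 49, 53\} \\
FS_{6} &= \{12, 21, 28, 33, 34, 53, 54, 62, 70\} \\
FS_{7} &= \{12, 47, 53, 54, 70\} \\
FS_{8} &= \{12, 28, 33, 49, 53, 54, 62\} \\
FS_{9} &= \{10, 11, 12, 21, 23, 27, 28, 33, 34, 36, 37, 38, 40, 42, 43, 44, 45, 47, 49, 53, 54, 56, 62, 63, 68, 69, 70\} \\
FS_{10} &= \{49\}
\end{aligned}
\]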

5.3.1 Context Free Grammar of Normal Packet data

The grammar used to classify normal packet data is called the Normal Association Grammar (NAG). In the proposed method, the NAG plays a key role in distinguishing normal packet data from attacks. The normal behavior rule is represented as

\[
\begin{aligned}
B_{no}^{f,F_i} &\rightarrow \mathit{flag} = F_i \\
B_{no}^{e,s_j} &\rightarrow \mathit{service} = s_j, \quad s_j \in FS_i \\
B_{no}^{e} &\rightarrow B_{no}^{f,F_i} \wedge B_{no}^{e,s_j}
\end{aligned}
\tag{5.2}
\]

where $F_i$ is the $i$-th flag value and $FS_i$ is the set of services associated with it.

The notation for an attack is $B_{a}^{x,i}$, where x is ‘f’, ‘t’ or ‘e’, denoting the Boolean factors, terms and expressions of the CFG, and ‘i’ indexes the Boolean conditions. Table 5.3 shows the NAG for normal packets.

Table 5.3 Normal Association Grammar

5.3.2 Context Free Grammar of Attack Packet data

Similarly, an intrusion grammar, called the Attack Signature Grammar (ASG), has been developed for each attack. Table 5.4 shows the ASG for the four classes of attack; it plays the key role in discriminating the specific attack types. The grammars for all classes of attack have been developed and tested experimentally. The numerical values used for the continuous features of the ASG are threshold values chosen heuristically. Likewise, the categorical features are represented by their discrete values.

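Table 5.4 Attack Signature Grammar

DoS: back (bk)
\[
\begin{aligned}
B_{bk}^{f,1} &\rightarrow v_{3} = 53, \qquad B_{bk}^{f,2} \rightarrow v_{5} > 5000 \\
B_{bk}^{e} &\rightarrow B_{bk}^{f,1} \wedge B_{bk}^{f,2}
\end{aligned}
\]

DoS: land (ld)
\[
\begin{aligned}
B_{ld}^{f,1} &\rightarrow v_{7} = 1, \qquad B_{ld}^{f,2} \rightarrow v_{4} = 3, \qquad B_{ld}^{f,3} \rightarrow v_{4} = 5, \qquad B_{ld}^{f,4} \rightarrow v_{4} = 6 \\
B_{ld}^{e} &\rightarrow B_{ld}^{f,1} \wedge (B_{ld}^{f,2} \vee B_{ld}^{f,3} \vee B_{ld}^{f,4})
\end{aligned}
\]

DoS: neptune (ne)
\[
\begin{aligned}
B_{ne}^{f,1} &\rightarrow v_{7} = 0, \qquad B_{ne}^{f,2} \rightarrow v_{4} = 3, \qquad B_{ne}^{f,3} \rightarrow v_{4} = 5, \qquad B_{ne}^{f,4} \rightarrow v_{4} = 6 \\
B_{ne}^{e} &\rightarrow B_{ne}^{f,1} \wedge (B_{ne}^{f,2} \vee B_{ne}^{f,3} \vee B_{ne}^{f,4})
\end{aligned}
\]

DoS: smurf (sf)
\[
\begin{aligned}
B_{sf}^{f,1} &\rightarrow v_{3} = 37, \qquad B_{sf}^{f,2} \rightarrow v_{5} = 1032 \\
B_{sf}^{e} &\rightarrow B_{sf}^{f,1} \wedge B_{sf}^{f,2}
\end{aligned}
\]

DoS: pod (pd)
\[
\begin{aligned}
B_{pd}^{f,1} &\rightarrow v_{3} = 37, \qquad B_{pd}^{f,2} \rightarrow v_{5} = 1480, \qquad B_{pd}^{f,3} \rightarrow v_{3} = 42, \qquad B_{pd}^{f,4} \rightarrow v_{5} = 564 \\
B_{pd}^{e} &\rightarrow (B_{pd}^{f,1} \wedge B_{pd}^{f,2}) \vee (B_{pd}^{f,3} \wedge B_{pd}^{f,4})
\end{aligned}
\]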


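DoS: teardrop (td)
\[
\begin{aligned}
B_{td}^{f,1} &\rightarrow v_{3} = 21, \qquad B_{td}^{f,2} \rightarrow v_{5} = 28 \\
B_{td}^{e} &\rightarrow B_{td}^{f,1} \wedge B_{td}^{f,2}
\end{aligned}
\]

U2R: loadmodule (lm)
\[
\begin{aligned}
B_{lm}^{f,1} &\rightarrow v_{3} = 33, \qquad B_{lm}^{f,2} \rightarrow v_{34} \geq 0.6, \qquad B_{lm}^{f,3} \rightarrow v_{36} \geq 0.15, \qquad B_{lm}^{f,4} \rightarrow v_{37} \leq 0.3 \\
B_{lm}^{t,1} &\rightarrow B_{lm}^{f,1} \wedge B_{lm}^{f,2} \wedge B_{lm}^{f,3} \wedge B_{lm}^{f,4} \\
B_{lm}^{f,5} &\rightarrow v_{3} = 12, \qquad B_{lm}^{f,6} \rightarrow v_{30} \neq 0, \qquad B_{lm}^{f,7} \rightarrow v_{34} = 1, \qquad B_{lm}^{f,8} \rightarrow v_{37} \geq 1 \\
B_{lm}^{t,2} &\rightarrow B_{lm}^{f,5} \wedge B_{lm}^{f,6} \wedge B_{lm}^{f,7} \wedge B_{lm}^{f,8} \\
B_{lm}^{e} &\rightarrow B_{lm}^{t,1} \vee B_{lm}^{t,2}
\end{aligned}
\]

U2R: buffer_overflow (bo)
\[
\begin{aligned}
B_{bo}^{f,1} &\rightarrow v_{4} = 10, \qquad B_{bo}^{f,2} \rightarrow v_{34} = 1, \qquad B_{bo}^{f,3} \rightarrow v_{35} = 0 \\
B_{bo}^{f,4} &\rightarrow v_{3} = 33, \qquad B_{bo}^{f,5} \rightarrow v_{3} = 12 \\
B_{bo}^{e} &\rightarrow (B_{bo}^{f,4} \vee B_{bo}^{f,5}) \wedge B_{bo}^{f,1} \wedge B_{bo}^{f,2} \wedge B_{bo}^{f,3}
\end{aligned}
\]

U2R: rootkit (rk)
\[
\begin{aligned}
B_{rk}^{f,1} &\rightarrow v_{3} = 33, \qquad B_{rk}^{f,2} \rightarrow v_{36} = 0 \\
B_{rk}^{e} &\rightarrow B_{rk}^{f,1} \wedge B_{rk}^{f,2}
\end{aligned}
\]

U2R: perl (pl)
\[
\begin{aligned}
B_{pl}^{f,1} &\rightarrow v_{3} = 33, \qquad B_{pl}^{f,2} \rightarrow v_{16} = 2, \qquad B_{pl}^{f,3} \rightarrow v_{17} = 2 \\
B_{pl}^{e} &\rightarrow B_{pl}^{f,1} \wedge B_{pl}^{f,2} \wedge B_{pl}^{f,3}
\end{aligned}
\]

R2L: phf (ph)
\[
\begin{aligned}
B_{ph}^{f,1} &\rightarrow v_{3} = 53, \qquad B_{ph}^{f,2} \rightarrow v_{5} < 64, \qquad B_{ph}^{f,3} \rightarrow v_{6} > 4096 \\
B_{ph}^{e} &\rightarrow B_{ph}^{f,1} \wedge B_{ph}^{f,2} \wedge B_{ph}^{f,3}
\end{aligned}
\]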


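R2L: guess_passwd (gp)
\[
\begin{aligned}
B_{gp}^{f,1} &\rightarrow v_{3} = 33, \qquad B_{gp}^{f,2} \rightarrow v_{5} \leq 128, \qquad B_{gp}^{f,3} \rightarrow v_{38} \neq 0, \qquad B_{gp}^{f,4} \rightarrow v_{39} \neq 0 \\
B_{gp}^{e} &\rightarrow B_{gp}^{f,1} \wedge B_{gp}^{f,2} \wedge B_{gp}^{f,3} \wedge B_{gp}^{f,4}
\end{aligned}
\]

R2L: warezmaster (wm)
\[
\begin{aligned}
B_{wm}^{f,1} &\rightarrow v_{3} = 70, \qquad B_{wm}^{f,2} \rightarrow v_{35}\,.\,\mathit{op}\,.\,0, \qquad B_{wm}^{f,3} \rightarrow v_{37}\,.\,\mathit{op}\,.\,0 \\
B_{wm}^{t,1} &\rightarrow B_{wm}^{f,1} \wedge B_{wm}^{f,2} \wedge B_{wm}^{f,3} \\
B_{wm}^{f,4} &\rightarrow v_{3} = 12, \qquad B_{wm}^{f,5} \rightarrow v_{38}\,.\,\mathit{op}\,.\,0.01, \qquad B_{wm}^{f,6} \rightarrow v_{40}\,.\,\mathit{op}\,.\,0.05 \\
B_{wm}^{t,2} &\rightarrow B_{wm}^{f,4} \wedge B_{wm}^{f,5} \wedge B_{wm}^{f,6} \\
B_{wm}^{e} &\rightarrow B_{wm}^{t,1} \vee B_{wm}^{t,2}
\end{aligned}
\]

R2L: imap (im)
\[
\begin{aligned}
B_{im}^{f,1} &\rightarrow v_{3} = 35, \qquad B_{im}^{f,2} \rightarrow v_{23} > 0 \\
B_{im}^{e} &\rightarrow B_{im}^{f,1} \wedge B_{im}^{f,2}
\end{aligned}
\]

R2L: multihop (mh)
\[
\begin{aligned}
B_{mh}^{f,1} &\rightarrow v_{3} = 12, \qquad B_{mh}^{f,2} \rightarrow v_{3} = 33, \qquad B_{mh}^{f,3} \rightarrow v_{4} = 10, \qquad B_{mh}^{f,4} \rightarrow v_{36} \geq 1 \\
B_{mh}^{e} &\rightarrow (B_{mh}^{f,1} \vee B_{mh}^{f,2}) \wedge B_{mh}^{f,3} \wedge B_{mh}^{f,4}
\end{aligned}
\]

R2L: warezclient (wc)
\[
\begin{aligned}
B_{wc}^{f,1} &\rightarrow v_{3} = 70, \qquad B_{wc}^{f,2} \rightarrow v_{3} = 33, \qquad B_{wc}^{f,3} \rightarrow v_{3} = 47, \qquad B_{wc}^{f,4} \rightarrow v_{12} = 1 \\
B_{wc}^{e} &\rightarrow (B_{wc}^{f,1} \vee B_{wc}^{f,2} \vee B_{wc}^{f,3}) \wedge B_{wc}^{f,4}
\end{aligned}
\]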


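R2L: ftp_write (fw)
\[
\begin{aligned}
B_{fw}^{f,1} &\rightarrow v_{3} = 70, \qquad B_{fw}^{f,2} \rightarrow v_{36} \geq 1, \qquad B_{fw}^{t,1} \rightarrow B_{fw}^{f,1} \wedge B_{fw}^{f,2} \\
B_{fw}^{f,3} &\rightarrow v_{3} = 12, \qquad B_{fw}^{f,4} \rightarrow v_{36} \geq 1, \qquad B_{fw}^{t,2} \rightarrow B_{fw}^{f,3} \wedge B_{fw}^{f,4} \\
B_{fw}^{f,5} &\rightarrow v_{3} = 41, \qquad B_{fw}^{f,6} \rightarrow v_{36} \geq 0.5, \qquad B_{fw}^{t,3} \rightarrow B_{fw}^{f,5} \wedge B_{fw}^{f,6} \\
B_{fw}^{e} &\rightarrow B_{fw}^{t,1} \vee B_{fw}^{t,2} \vee B_{fw}^{t,3}
\end{aligned}
\]

R2L: spy (sy)
\[
\begin{aligned}
B_{sy}^{f,1} &\rightarrow v_{3} = 33, \qquad B_{sy}^{f,2} \rightarrow v_{38} = 0.22 \\
B_{sy}^{e} &\rightarrow B_{sy}^{f,1} \wedge B_{sy}^{f,2}
\end{aligned}
\]

Probe: satan (sa)
\[
\begin{aligned}
B_{sa}^{f,1} &\rightarrow v_{23} > 0, \qquad B_{sa}^{f,2} \rightarrow v_{4} = 10, \qquad B_{sa}^{f,3} \rightarrow v_{4} = 5 \\
B_{sa}^{e} &\rightarrow B_{sa}^{f,1} \wedge (B_{sa}^{f,2} \vee B_{sa}^{f,3})
\end{aligned}
\]

Probe: portsweep (ps)
\[
\begin{aligned}
B_{ps}^{f,1} &\rightarrow v_{2} = 2, \qquad B_{ps}^{f,2} \rightarrow v_{32} \geq 255, \qquad B_{ps}^{f,3} \rightarrow v_{36} \neq 0 \\
B_{ps}^{e} &\rightarrow B_{ps}^{f,1} \wedge B_{ps}^{f,2} \wedge B_{ps}^{f,3}
\end{aligned}
\]

Probe: ipsweep (is)
\[
\begin{aligned}
B_{is}^{f,1} &\rightarrow v_{6} = 0, \qquad B_{is}^{f,2} \rightarrow v_{23} = 1 \\
B_{is}^{e} &\rightarrow B_{is}^{f,1} \wedge B_{is}^{f,2}
\end{aligned}
\]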


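Probe: nmap (nm)
\[
\begin{aligned}
B_{nm}^{f,1} &\rightarrow v_{2} = 3, \qquad B_{nm}^{f,2} \rightarrow v_{4} = 7, \qquad B_{nm}^{t,1} \rightarrow B_{nm}^{f,1} \wedge B_{nm}^{f,2} \\
B_{nm}^{f,3} &\rightarrow v_{2} = 2, \qquad B_{nm}^{f,4} \rightarrow v_{2} = 3, \qquad B_{nm}^{f,5} \rightarrow v_{4} = 11 \\
B_{nm}^{e,1} &\rightarrow (B_{nm}^{f,3} \vee B_{nm}^{f,4}) \wedge B_{nm}^{f,5} \\
B_{nm}^{e} &\rightarrow B_{nm}^{t,1} \vee B_{nm}^{e,1}
\end{aligned}
\]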
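The grammar entries of Table 5.4 can be evaluated mechanically. The sketch below is one possible encoding, under the assumption that a record is available as a mapping from feature index i to the value of v_i: a factor is a `var.op.value` test, a term is a conjunction and an expression is a disjunction, with a parenthesized sub-expression represented by nesting. The class names `Factor`, `Term` and `Expr` are hypothetical; the two example signatures (pod and land) use the codes listed in the table.

```python
import operator
from dataclasses import dataclass
from typing import Mapping, Tuple, Union

# Relational operators of the factors; "~=" is the 'not equal' of Table 5.2.
OPS = {"==": operator.eq, "~=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}

@dataclass(frozen=True)
class Factor:                    # B^f -> var.op.value
    var: int                     # feature index i of v_i
    op: str
    value: float

@dataclass(frozen=True)
class Term:                      # B^t -> B^t AND B^f | B^f
    parts: Tuple["Node", ...]

@dataclass(frozen=True)
class Expr:                      # B^e -> B^e OR B^t | B^t
    parts: Tuple["Node", ...]

Node = Union[Factor, Term, Expr]

def evaluate(node: Node, v: Mapping[int, float]) -> bool:
    """Evaluate a grammar node against the feature values v[i]."""
    if isinstance(node, Factor):
        return OPS[node.op](v[node.var], node.value)
    if isinstance(node, Term):
        return all(evaluate(p, v) for p in node.parts)
    return any(evaluate(p, v) for p in node.parts)

# pod: (v3 = 37 AND v5 = 1480) OR (v3 = 42 AND v5 = 564)
pod = Expr((
    Term((Factor(3, "==", 37), Factor(5, "==", 1480))),
    Term((Factor(3, "==", 42), Factor(5, "==", 564))),
))

# land: v7 = 1 AND (v4 = 3 OR v4 = 5 OR v4 = 6)
land = Term((
    Factor(7, "==", 1),
    Expr((Factor(4, "==", 3), Factor(4, "==", 5), Factor(4, "==", 6))),
))

print(evaluate(pod, {3: 42, 5: 564}))
print(evaluate(land, {7: 1, 4: 5}))
```

Evaluation visits each factor at most once, which matches the statement that the cost is linear in the length of the grammar.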

5.4 ARCHITECTURE OF THE PROPOSED IDS MODEL

The purpose of the proposed approach is to generate an efficient signature-based CFG that classifies normal and abnormal activity in a large volume of IDS packet data. Figure 5.1 shows the proposed IDS, which is based on a Multi Stage Filter (MSF). The basic concept of this approach is to discriminate network packet behavior into normal and abnormal over multiple levels of testing. Each level contributes to decreasing the false alarm rate, which is the main concern of this approach. The normal signature and the various intrusion signatures of the KDD Cup99 data have been analyzed using the NAG and ASG rules.

Initially, the MSF receives the packet data with all features. The NAG classification system discriminates only the behavior of normal patterns. One or more further levels of the MSF then separate the normal packet data from the attack packet data accurately.

[Figure: network packets enter the Raw Filter (RF). Normal packets pass through Micro Filter Stage-1 (MF-1) up to Micro Filter Stage-N (MF-N); attack packets from the RF and from the Micro Filters are sent to the Attack Analyzer (AA).]

Figure 5.1 Block diagram of Multi Stage Filter

Initially, the greater part of the normal and attack packet data is separated in the Raw Filter (RF) by means of the NAG. The RF has the following two outputs.

- NOrmal Nomenclature Data (NOND): the greatest number of normal packet data together with a small amount of attack packet data, which is directed to the Micro Filter (MF).
- Attack Nomenclature Data (AND): the maximum quantity of attack packet data together with a very small quantity of normal packet data, which is sent to the Attack Analyzer (AA).

In the AA, the AND is classified attack by attack, by means of the ASG, to detect the specific attack type. The NOND, however, still contains a small number of attack packets and must be filtered again. This filtering is performed by the MF of the MSF, whose internal architecture is shown in Figure 5.2. In the MF, attack patterns are removed, and as many attacks as possible are filtered out using the Minimum Attack Signature Grammar (MASG). Table 5.5 shows the MASG for the attacks used in the proposed method. The numerical values used for the continuous features of the MASG are likewise threshold values chosen heuristically. Each MF has four internal divisions, one per attack class (DoS, U2R, R2L and Probe), that separate the attack packets. The attack packet data separated by the MF is directed to the AA for attack classification by means of the ASG.

[Figure: network packets enter the Raw Filter (RF). The Attack Nomenclature Data (AND) goes to the Attack Analyzer (AA). The NOrmal Nomenclature Data (NOND) passes through the Micro Filter (MF), which contains a DoS attack microfilter (Back, Pod, ...), a U2R attack microfilter (Rootkit, Perl, ...), an R2L attack microfilter (Imap, Phf, ...) and a Probe attack microfilter (Satan, Nmap, ...). Each microfilter passes normal packets on and sends attack packets to the Attack Analyzer.]

Figure 5.2 Internal architecture of MSF

The signatures of some attack packet data closely resemble those of normal data. Such attacks cannot be discriminated using the RF and a single stage of MF, so additional MFs are needed to filter them. Attacks with similar signatures are discriminated in subsequent MF stages using a new set of MASG; otherwise, normal packet data may be classified as attack because of the similarity in signatures, which affects the false positives. Depending on the false alarm rate, the network environment and the attack types, ‘N’ stages of Micro Filters are used in the MSF to attain efficient classification.

Table 5.5 Minimum Attack Signature Grammar

For each attack, the expression B^e is built from the factors listed.

Attack                 Factor   Feature   Value
teardrop (td)          f,1      v3        21
                       f,2      v5        28
buffer_overflow (bo)   f,1      v3        33
                       f,2      v17       1
nmap (nm)              f,1      v3        36
                       f,2      v5        8
pod (pd)               f,1      v3        37
                       f,2      v5        1480
guess_passwd (gp)      f,1      v3        33
                       f,2      v6        179
                       f,3      v6        2053
portsweep (ps)         f,1      v4        6
                       f,2      v7        0
                       f,3      v4        1
smurf (sf)             f,1      v3        37
                       f,2      v5        520
                       f,3      v5        1032
                       f,4      v5        18
neptune (ne)           f,1      v3        33
                       f,2      v38       1
                       f,3      v4        1
                       f,4      v39       0.9
ipsweep (is)           f,1      v3        37
                       f,2      v5        520
                       f,3      v5        1032
                       f,4      v5        18

The pseudo-code for processing the packet data into normal and

attack type is shown in Figure 5.3.

Algorithm filterPacketData
// Input: P: set of packet data; F: set of flags; S: set of services
// Output: the filtered packet data
// Method: P is filtered by F and S, and the N_M packet data is output
N_R ← ∅, N_M ← ∅
for each packet data p_j ∈ P
    N_R ← N_R ∪ rawFilter(p_j, F, S);
end
for each packet data p_j ∈ N_R
    N_M ← N_M ∪ microFilter(p_j, A);
end
return N_M

Algorithm rawFilter(p, F, S)
// Input: p: incoming packet data; F: set of flags; S: set of services
// Output: the raw filtered packet data
// Method: p is filtered by F and S, and the p_R packet data is output
p_R ← ∅, f_i ∈ F, s_i ∈ S
for each rule r_i
    r_i ← (f_i ∧ s_i) | f_i ≠ RSTOS0
    if p satisfies r_i
        p_R ← p_R ∪ p;
    end
end
return p_R

Algorithm microFilter(p_R, A)
// Input: p_R: raw filtered packet data; A: set of attack rules
// Output: the micro filtered packet data
// Method: p_R is filtered to output the p_M or p_A packet data
p_A ← ∅, p_M ← ∅, r_i ∈ A
for each rule r_i
    if p_R satisfies r_i
        p_A ← p_A ∪ p_R;
        p_M ← diff(p_R, p_A);    // packets of p_R that are not in p_A
    end
end

Figure 5.3 Algorithm for categorizing the packet data
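The flow of Figure 5.3 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the original implementation: packets are dictionaries, the normal associations are a flag-to-services mapping in the manner of Table 5.1, each Micro Filter stage is a list of attack predicates, and all function names and the three toy packets are assumptions.

```python
def raw_filter(packet, normal_assoc):
    """RF: True when the packet's (flag, service) pair is a normal association."""
    return packet["service"] in normal_assoc.get(packet["flag"], set())

def micro_filter(packet, attack_rules):
    """MF: True when any attack rule of this stage matches the packet."""
    return any(rule(packet) for rule in attack_rules)

def filter_packet_data(packets, normal_assoc, stages):
    """Split packets into (normal, attack) following RF -> MF-1 ... MF-N."""
    nond, attack = [], []
    for p in packets:
        # RF output: NOND continues to the Micro Filters, AND goes to the analyzer.
        (nond if raw_filter(p, normal_assoc) else attack).append(p)
    for attack_rules in stages:
        kept = []
        for p in nond:
            (attack if micro_filter(p, attack_rules) else kept).append(p)
        nond = kept
    return nond, attack

def smurf(p):
    """Smurf signature from Table 5.2, used here as a single MF rule."""
    return p["service"] == "ecr_i" and p["src_bytes"] == 1032

# Toy inputs (hypothetical): a two-row excerpt of Table 5.1 and three packets.
normal_assoc = {"SF": {"http", "ecr_i"}, "S0": {"http"}}
packets = [
    {"id": 1, "flag": "SF", "service": "http", "src_bytes": 200},
    {"id": 2, "flag": "SF", "service": "ecr_i", "src_bytes": 1032},
    {"id": 3, "flag": "S0", "service": "private", "src_bytes": 0},
]

normal, attack = filter_packet_data(packets, normal_assoc, stages=[[smurf]])
print([p["id"] for p in normal], [p["id"] for p in attack])
```

Packet 2 shows why the Micro Filter is needed: its (flag, service) pair is a normal association, so it passes the RF and is only caught by the attack rule of the first stage.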

About 5 million packet records, covering normal traffic and possible attacks, are available. Only 50 percent of them have been chosen for analysis and experimentation, because the remaining, redundant packet data contribute little to the performance of the proposed method. For the experiments, this 50 percent has been grouped into eight categories, each containing normal packet data as well as attack packet data.

5.5 SUMMARY

This chapter, which is the core of the work, has presented an improved rule set for the detection of intruders. An MSF has also been designed to keep track of the session of packets as dictated by the rule set. The performance is observed to be very promising compared with the existing methods explored in the previous chapter. The next chapter presents the results and observations.

Date post:	22-Apr-2020