CHAPTER 5
DESIGN OF CONTEXT FREE GRAMMAR FOR
PACKET DATA ANALYSIS
This chapter proposes a rule based approach that classifies network
packet data as either normal or attack. The interrelationships among the
features of the packet data (network traffic) are used to separate intrusions
from normal packet data.
A combination of rules that distinguishes normal from abnormal
behavior patterns is proposed in this chapter. The generated rules predict
with high accuracy, give a high level of trust in the system and apply
repeatedly across the observed data. The generated rules also incorporate
temporal relationships among events. Because the rule set is derived
inductively by observing user activities, better sensitivity in detecting
breaches is assured. The limitation of this methodology is that an
unrecognized pattern of activity cannot be flagged as anomalous, because it
does not match any rule.
5.1 NEED FOR RULES GENERATION
Since the features are a mixture of categorical and continuous
values, statistical methods are not well suited and rule based filters are more
appropriate.
Cohen (1995) suggested a classification method called Repeated
Incremental Pruning to Produce Error Reduction (RIPPER), a rule learning
technique that is an optimized version of IREP (Incremental Reduced Error
Pruning), proposed by Furnkranz and Widmer, and that formulates a set of
‘if-then’ rules from data. IREP combines separate-and-conquer rule learning
with Reduced Error Pruning (REP). RIPPER is capable of formulating complex,
threshold based decisions. Separate-and-conquer is a sequential-covering rule
learning algorithm in which the training data is separated into a growing set
and a pruning set. The algorithm generates one rule at a time and eliminates
all examples covered by the new rule. REP is a successful method for pruning
the tree learned by a decision tree learning system and can be simply adapted
to a rule learning system. While creating a rule, RIPPER searches the rule
space for the best rule for the present growing set. After a rule is grown on
the growing set, it is pruned on the pruning set, and the examples it covers
are deleted from the training set (growing and pruning sets). The left over
training data is separated again, and the process repeats until the stopping
conditions are satisfied.
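
The separate-and-conquer loop described above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions and is not Cohen's RIPPER: every rule is a single attribute-value test, pruning is reduced to a precision check on the pruning set, and the names `covers`, `precision` and `learn_rules` as well as the record layout are hypothetical.

```python
def covers(rule, record):
    """A rule is a single (attribute, value) test in this simplified sketch."""
    attribute, value = rule
    return record[attribute] == value


def precision(rule, data, target):
    """Fraction of the records covered by the rule that carry the target label."""
    covered = [r for r in data if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["label"] == target for r in covered) / len(covered)


def learn_rules(data, target, min_precision=0.5):
    """Sequential covering: grow a rule, check it on the pruning set, drop covered records."""
    remaining = list(data)
    rules = []
    while any(r["label"] == target for r in remaining):
        # Deterministic 2:1 split into growing and pruning sets (an assumption).
        grow = [r for i, r in enumerate(remaining) if i % 3 != 2]
        prune = remaining[2::3] or grow
        candidates = sorted({(a, r[a]) for r in grow if r["label"] == target
                             for a in r if a != "label"})
        if not candidates:
            break
        best = max(candidates, key=lambda c: precision(c, grow, target))
        # Stop when the best rule does not hold up on the pruning set.
        if precision(best, prune, target) < min_precision:
            break
        rules.append(best)
        # "Separate": remove every record covered by the accepted rule.
        remaining = [r for r in remaining if not covers(best, r)]
    return rules
```

Called on a list of dictionaries that each hold feature values and a `label` key, `learn_rules(records, "attack")` returns the accepted `(attribute, value)` tests in the order in which they were learned.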
RIPPER assumes that, for a set of records, one of the features
is the class label used for classification, and that the classification
algorithm can use the feature values to describe each concept. A RIPPER
rule contains a combination of attribute values (conditions) and a class
label (consequence) and can be deployed in an IDS. Data mining programs
can thus be applied to learn rules that accurately identify normal and
intrusive activities. RIPPER was part of the DARPA assessment, and the
authors report its performance clearly. The authors concurred that anomaly
detection has to be employed in order to detect new intrusions and that, in
general, it does not create effective signatures for a signature detection tool.
Observing a traffic pattern can help in deciding whether the
traffic is normal or an attack. In the same manner, the ground truth IDS
data base KDD Cup99 is observed to follow certain behavioral patterns
in its categorical and continuous features. Any intruder can be characterized
by their behavior. If the behavior of a normal user and of an abnormal user
can be characterized, a set of rules can be formulated for each. In this
chapter, the IDS data set is analyzed and a set of rules is designed for both
normal and attack packet data. Though these rules are tested explicitly on the
ground truth data set (KDD Cup99) for IDS, they can easily be adapted to a
new environment.
The main challenge arises when thresholds are defined for the
rules, since they are empirical and depend on the environment (Maheshkumar
Sabhnani and Gursel Serpen 2003). The authors proposed generating heuristic
rules by analyzing the signatures of attacks in the R2L class. However,
poorly chosen rule thresholds lead to low detection of the targeted attacks.
Data or signature based heuristic rules can detect attacks in real time, as
they typically require very little execution time. Misuse detection
performance is therefore highest when specific and comprehensive heuristic
rules are defined for a specific application. The creation of rules employs
signature analysis to define a set of features, and automated techniques to
establish the threshold values required to form the heuristic rules. They
studied the warezmaster and warezclient attacks in terms of signatures,
services used and significant features present in the KDD Cup99 dataset. In
addition, they described how the features and thresholds are determined for
the rules that detect the warezmaster and warezclient attacks. Furthermore,
signature based heuristic rules are not influenced by noise in the training
dataset and detect the attacks with low false alarm rates.
Gunes Kayacik et al (2005) identified the most discriminative
class label for each feature using the information gain measure.
They suggested that the majority of the features between 31 and 41 are
used to discriminate normal traffic and some attacks accurately. Moreover,
9 features contributed only modestly to intrusion detection, and features 20
and 21 took no part in intrusion detection.
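
As an illustration of the information gain measure mentioned above, the following sketch computes the gain of one categorical feature with respect to the class label. It is a minimal example, not the procedure of the cited authors; the function names are hypothetical, and continuous features would first have to be discretized, which is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

A feature whose values separate the classes completely has a gain equal to the entropy of the labels, whereas a feature that takes the same value on every record has a gain of zero.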
The heuristic rules generated and discussed using RIPPER detect
only remote-to-local (R2L) attacks (Maheshkumar Sabhnani and Gursel
Serpen 2003).
The novelty of the proposed method is that the grammar is written
at two levels (Raw Filter and Micro Filter) to enhance the performance of
the IDS. The computational complexity of the proposed work is also low,
since it requires only simple rules.
5.2 NORMAL AND ATTACK PACKET CATEGORIZATION
Normal and attack context free grammars are proposed to model
and analyze the network sequences. Grammars are well suited to express the
interdependency of vulnerabilities and are particularly desirable for IDS
alerts. The logical rules of a context free grammar make it more
comprehensive. Grammar based examination can also reduce false positives
and false negatives. The fixed structure of a grammar makes recognition and
inspection easier. Also, the algorithmic complexity is linear in the length
of the grammar (Yinqian Zhang et al 2008).
Initially, in the context free grammar method, the normal and attack
grammars are created off line to facilitate IDS alerts. The grammars are then
loaded into the proposed Multi Stage Filter, and the traffic data is analyzed
by applying them. New grammars can be constructed for any type of network
environment, so that low false alarm rates can be achieved in due course. A
variation in the grammars depends only on the features of the traffic data,
which change the size of the grammar; hence the algorithmic complexity is
O(n), where n is the length of the grammar. The constructed rules are the
most appropriate for the given dataset and can be extended to identify new
records or to add new features. In signature based rules, the significant
features can be extracted by reviewing the attack dynamics.
C4.5 is a decision tree algorithm (Ron Kohavi and Ross
Quinlan 1999) used to classify data with common attributes; each path
through the decision tree signifies a rule which classifies data according to
the attributes. A node of a decision tree identifies an attribute by which the
data is to be divided. Each edge leaving a node is labeled with a possible
value of the attribute in the parent node and connects either two nodes or a
node and a leaf; the leaves are labeled with a decision value for
classification of the data. C4.5 rules can also be used in misuse detection.
The tree is pruned by the algorithm and the table is created. Finally,
appropriate rules are formed for the normal and attack patterns.
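
The way a decision tree yields classification rules can be sketched as follows. The tree below is a hand-written toy and not the output of C4.5; its nested-dictionary layout and the function name `tree_to_rules` are assumptions made for illustration. Every root-to-leaf path becomes one rule whose conditions are the edge labels along the path.

```python
def tree_to_rules(node, conditions=()):
    """Yield (conditions, decision) for every root-to-leaf path of the tree."""
    if not isinstance(node, dict):
        # A leaf carries the decision value for the records that reach it.
        yield list(conditions), node
        return
    attribute = node["attribute"]
    for value, child in node["branches"].items():
        # Each edge adds one (attribute, value) condition to the rule.
        yield from tree_to_rules(child, conditions + ((attribute, value),))


# Hypothetical toy tree: inner nodes test an attribute, leaves hold a decision.
toy_tree = {
    "attribute": "service",
    "branches": {
        "ecr_i": {"attribute": "flag",
                  "branches": {"SF": "attack", "REJ": "normal"}},
        "http": "normal",
    },
}

rules = list(tree_to_rules(toy_tree))
```

Each element of `rules` pairs a list of conditions with a decision, for example the conditions `service = ecr_i` and `flag = SF` with the decision `attack`.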
The feature ‘flag’ (labeled as v4 in Table 3.2, with 11 different
values) has been chosen as the most discriminative feature for identifying
normal packets in the proposed method. The association of ‘flag’ with
‘service’ (labeled as v3 in Table 3.2, with 70 different values) is used to
separate the normal data from the rest of the data; this is called the
Normal Signature Association (NSA). All associations for normal data are
tabulated in Table 5.1. For example, the association of the feature ‘flag’
with the value ‘SH’ and the feature ‘service’ with the value ‘time’ treats
the data as ‘normal’.
Table 5.1 Normal Signature Association
Flag (f)   Services (s)
RSTR       finger, telnet, http, smtp, X11, ssh
RSTO       finger, telnet, shell, pop_3, time, http, smtp, auth, IRC, ftp
OTH        ftp_data, telnet, http, smtp, ftp
REJ        private, urh_i, other, http, smtp, auth, IRC, X11
S0         domain, finger, telnet, time, http
S1         ftp_data, private, finger, telnet, shell, http, smtp, IRC, ftp
S2         ftp_data, other, http, smtp, ftp
S3         ftp_data, finger, telnet, time, http, smtp, IRC
SF         csnet_ns, domain_u, ftp_data, private, tftp_u, domain, finger,
           telnet, shell, eco_i, ecr_i, red_i, pop_3, tim_i, urh_i, urp_i,
           ntp_u, other, time, http, smtp, auth, IRC, X11, ssh, efs, ftp
SH         time
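
The associations of Table 5.1 translate directly into a lookup structure. The sketch below is one possible encoding; the dictionary name `NSA` and the function `is_normal_association` are hypothetical, and a flag that is absent from the table is assumed to have no normal association.

```python
# Table 5.1 as a mapping from flag value to the set of associated services.
NSA = {
    "RSTR": {"finger", "telnet", "http", "smtp", "X11", "ssh"},
    "RSTO": {"finger", "telnet", "shell", "pop_3", "time", "http", "smtp",
             "auth", "IRC", "ftp"},
    "OTH": {"ftp_data", "telnet", "http", "smtp", "ftp"},
    "REJ": {"private", "urh_i", "other", "http", "smtp", "auth", "IRC", "X11"},
    "S0": {"domain", "finger", "telnet", "time", "http"},
    "S1": {"ftp_data", "private", "finger", "telnet", "shell", "http", "smtp",
           "IRC", "ftp"},
    "S2": {"ftp_data", "other", "http", "smtp", "ftp"},
    "S3": {"ftp_data", "finger", "telnet", "time", "http", "smtp", "IRC"},
    "SF": {"csnet_ns", "domain_u", "ftp_data", "private", "tftp_u", "domain",
           "finger", "telnet", "shell", "eco_i", "ecr_i", "red_i", "pop_3",
           "tim_i", "urh_i", "urp_i", "ntp_u", "other", "time", "http", "smtp",
           "auth", "IRC", "X11", "ssh", "efs", "ftp"},
    "SH": {"time"},
}


def is_normal_association(flag, service):
    """True when the (flag, service) pair appears in Table 5.1."""
    return service in NSA.get(flag, set())
```

For instance, `is_normal_association("SH", "time")` holds, while the pair (`SH`, `http`) is not a normal association.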
Likewise, based on the description and signature of the various
attacks, a separate intrusion classification rule, called an Intrusion
Signature Rule (ISR), has been developed for each attack. Table 5.2 shows
the ISR for all attacks, based on their descriptions and signatures. These
rules select appropriate discriminating features and some values of the
selected features for classification. The notations ∧ and ∨ denote the AND
function and the OR function respectively.
Table 5.2 Intrusion Signature Rule
Back-DoS
The number of forward slashes in the URL sent can be varied. The
attacker may intentionally add a large number of forward slashes in his/her
script, causing the CPU of the HTTP server to be over utilized. However, the
system recovers automatically when the attack stops.
Thus, the rule for this ‘back’ attack can be coined as:
The requested ‘service’ is ‘http’ and ‘src_bytes’ is greater than a
threshold (typically 5000). The selection of the threshold depends on the
working environment.
Formulated rule of ‘back’ attack:
back_signature: (service == ‘http’) ∧ (src_bytes > 5000)
Smurf-DoS
In a broadcasting subnet environment, the attacker sends ICMP echo
request packets to the broadcast address of many subnets, with the source
address spoofed to be that of the intended victim. The participants are the
attacker, the mediator and the victim (the mediator may also be a victim).
The nodes that listen on these subnets respond by sending ICMP echo reply
packets to the victim.
Thus, the rule for this ‘smurf’ attack can be coined as:
The requested ‘service’ is ‘ecr_i’ and the feature ‘src_bytes’ is equal to
1032. The threshold should be selected based on the minimum frame size of
the subnet (LAN) under consideration.
Formulated rule of ‘smurf’ attack:
smurf_signature: (service == ‘ecr_i’) ∧ (src_bytes == 1032)
Teardrop-DoS
The teardrop attack exploits a bug in TCP/IP fragmentation by
inserting false offset information into fragmented packets. During reassembly,
the resulting outsized payloads or overlapping fragments can cause the
system to crash.
Thus, the rule for this ‘teardrop’ attack can be coined as:
The requested ‘service’ is ‘private’ and the feature ‘src_bytes’ is equal
to 28, the confused fragment size.
Formulated rule of ‘teardrop’ attack:
teardrop_signature: (service == ‘private’) ∧ (src_bytes == 28)
Neptune-DoS
In the neptune attack, the attacker repeatedly sends a large number of
TCP/SYN packets with a fake sender address. For each packet, the server
opens a half-open connection by sending back a TCP/SYN-ACK packet and
waits for a reply. Legitimate requests cannot connect to the server because
it is saturated with half-open connections.
Thus, the rule for this ‘neptune’ attack can be coined as:
The connection is not from/to the same host/port (land value is ‘0’) and
the flag is one of RSTO, REJ and S0.
Formulated rule of ‘neptune’ attack:
neptune_signature: (land == 0) ∧ ((flag == RSTO) ∨ (flag == REJ) ∨
(flag == S0))
Pod-DoS
In the pod attack, malformed ping packets are sent to the victim
machine, causing it to crash.
Thus, the rule for this ‘pod’ attack can be coined as:
The requested ‘service’ is ‘ecr_i’ or ‘tim_i’ and the size of the
corresponding malformed ping packet (value of the feature ‘src_bytes’) is
equal to 1480 or 564 respectively, which leads to the crash of the system.
Formulated rule of ‘pod’ attack:
pod_signature: ((service == ‘ecr_i’) ∧ (src_bytes == 1480)) ∨
((service == ‘tim_i’) ∧ (src_bytes == 564))
Land-DoS
In the land attack, an attacker sends a spoofed SYN packet in which
the source address and the destination address are identical.
Thus, the rule for this ‘land’ attack can be coined as:
The connection is from/to the same host/port (land value is ‘1’) and the
flag is one of RSTO, REJ and S0.
Formulated rule of ‘land’ attack:
land_signature: (land == 1) ∧ ((flag == RSTO) ∨ (flag == REJ) ∨
(flag == S0))
Buffer_overflow-U2R
A buffer_overflow occurs when a program is given more input data
than the size of the buffer allocated to hold it.
Thus, the rule for this ‘buffer_overflow’ attack can be coined as:
The requested ‘service’ is ‘telnet’ or ‘ftp_data’ with the ‘SF’ flag, the
feature ‘dst_host_same_srv_rate’ is 1 and the feature
‘dst_host_diff_srv_rate’ is 0.
Formulated rule of ‘buffer_overflow’ attack:
cond_1: ((service == ‘telnet’) ∨ (service == ‘ftp_data’)) ∧ (flag == SF)
cond_2: (dst_host_same_srv_rate == 1) ∧ (dst_host_diff_srv_rate == 0)
buffer_overflow_signature: (cond_1 ∧ cond_2)
Loadmodule-U2R
The loadmodule attack resets the Internal Field Separator (IFS) for a
normal user, so that unauthorized users can gain root access on the local
machine.
Thus, the rule for this ‘loadmodule’ attack can be coined as:
The requested ‘service’ is ‘telnet’ or ‘ftp_data’, combined with threshold
values of the features ‘diff_srv_rate’, ‘dst_host_same_src_port_rate’,
‘dst_host_srv_diff_host_rate’ and ‘dst_host_same_srv_rate’.
Formulated rule of ‘loadmodule’ attack:
cond_1: (service == ‘telnet’) ∧ (dst_host_same_srv_rate >= .6) ∧
(dst_host_same_src_port_rate >= .15) ∧ (dst_host_srv_diff_host_rate <= .3)
cond_2: (service == ‘ftp_data’) ∧ (diff_srv_rate ~= 0) ∧
(dst_host_same_srv_rate == 1) ∧ (dst_host_srv_diff_host_rate >= 1)
loadmodule_signature: (cond_1 ∨ cond_2)
Rootkit-U2R
A rootkit is a collection of programs with which an attacker maintains
access to a machine once it has been compromised. The attacker installs one
or more components of a rootkit (such as a sniffer, or versions of login, su
and other programs with backdoors) to retain access.
Thus, the rule for this ‘rootkit’ attack can be coined as:
The requested ‘service’ is ‘telnet’ and the feature
‘dst_host_same_src_port_rate’ (percentage of connections from the port
services to the destination host) is equal to 0.
Formulated rule of ‘rootkit’ attack:
rootkit_signature: (service == ‘telnet’) ∧ (dst_host_same_src_port_rate == 0)
Perl-U2R
In the perl attack, a perl script sets the user id to root and a root
shell is created.
Thus, the rule for this ‘perl’ attack can be coined as:
The requested ‘service’ is ‘telnet’, the feature ‘num_root’ is 2 and the
feature ‘num_file_creations’ is 2.
Formulated rule of ‘perl’ attack:
perl_signature: (service == ‘telnet’) ∧ (num_root == 2) ∧
(num_file_creations == 2)
Ftp_write-R2L
In the ftp_write attack, writable FTP directories are detected and
used to acquire a local login, which is then used for unauthorized or illegal
operations such as writing files that could provide access to the computer.
Thus, the rule for this ‘ftp_write’ attack can be coined as:
The requested ‘service’ is ‘login’ or ‘ftp’ or ‘ftp_data’ and the feature
‘dst_host_same_src_port_rate’ (percentage of connections from the port
services to the destination host) reaches the corresponding threshold value.
Formulated rule of ‘ftp_write’ attack:
cond_1: (service == ‘login’) ∧ (dst_host_same_src_port_rate >= .5)
cond_2: (service == ‘ftp’) ∧ (dst_host_same_src_port_rate >= 1)
cond_3: (service == ‘ftp_data’) ∧ (dst_host_same_src_port_rate >= 1)
ftp_write_signature: (cond_1 ∨ cond_2 ∨ cond_3)
Guess_passwd-R2L
The attacker attempts to guess the password of the guest account
via telnet.
Thus, the rule for this ‘guess_passwd’ attack can be coined as:
The requested ‘service’ is ‘telnet’ and ‘src_bytes’ is at most a
threshold (typically 128), together with non-zero values of the service error
rate features. The selection of the threshold depends on the working
environment.
Formulated rule of ‘guess_passwd’ attack:
guess_passwd_signature: (service == ‘telnet’) ∧ (src_bytes <= 128) ∧
(dst_host_serror_rate ~= 0) ∧ (dst_host_srv_serror_rate ~= 0)
Imap-R2L
The Imap attack makes use of a buffer overflow in the Imap Linux
server. Remote attackers gain root privileges, access mail folders and
perform file handling on behalf of the user logging in. The buffer overflow
can be exploited even though the privileges are discarded after login.
Thus, the rule for this ‘imap’ attack can be coined as:
The requested ‘service’ is ‘imap4’ and the feature ‘count’ is greater than
zero. A non-zero value of count indicates that the attacker is performing
some file operations on the imap server.
Formulated rule of ‘imap’ attack:
imap_signature: (service == ‘imap4’) ∧ (count > 0)
Multihop-R2L
In the multihop attack, the initial attack traffic going into or coming
out of the network router is used to discover further attacks and is more
effective as an undetected Denial of Service.
Thus, the rule for this ‘multihop’ attack can be coined as:
The requested ‘service’ is ‘ftp_data’ or ‘telnet’ and the flag is ‘SF’, with
a threshold on the feature dst_host_same_src_port_rate.
Formulated rule of ‘multihop’ attack:
multihop_signature: ((service == ‘ftp_data’) ∨ (service == ‘telnet’)) ∧
(flag == SF) ∧ (dst_host_same_src_port_rate >= 1)
Phf-R2L
The Phf attack exploits a poorly written CGI script that permits a client
to execute commands with the authorization level of the http server by issuing
an http request.
Thus, the rule for this ‘phf’ attack can be coined as:
The requested ‘service’ is ‘http’, the feature ‘src_bytes’ is less than 64
and the feature ‘dst_bytes’ is greater than 4096.
Formulated rule of ‘phf’ attack:
phf_signature: (service == ‘http’) ∧ (src_bytes < 64) ∧ (dst_bytes > 4096)
Spy-R2L
In the spy attack, an attacker breaks into a compromised machine
several times to collect important information, seeking secret data files or
reading the user’s personal mail while avoiding detection.
Thus, the rule for this ‘spy’ attack can be coined as:
The requested ‘service’ is ‘telnet’ and the feature
‘dst_host_serror_rate’ (percentage of connections to the destination host
that have ‘SYN’ errors) is .22.
Formulated rule of ‘spy’ attack:
spy_signature: (service == ‘telnet’) ∧ (dst_host_serror_rate == .22)
Warezclient-R2L
In the warezclient attack, an authorized user downloads, during an FTP
connection, illegal “warez” software that has been placed on the server
previously through a warezmaster attack. The files are downloaded from
hidden directories or directories that are not generally accessible to guest
users on the FTP server.
Thus, the rule for this ‘warezclient’ attack can be coined as:
The requested ‘service’ is ‘ftp_data’ or ‘ftp’ or ‘other’ and the user has
successfully logged in to the network (logged_in == 1).
Formulated rule of ‘warezclient’ attack:
warezclient_signature: ((service == ‘ftp_data’) ∨ (service == ‘ftp’) ∨
(service == ‘other’)) ∧ (logged_in == 1)
Ipsweep-Probe
An Ipsweep attack is a surveillance sweep to find out which hosts are
listening on the network. The usual way is to send ICMP Ping packets to all
possible addresses inside a subnet and observe which machines respond.
Thus, the rule for this ‘ipsweep’ attack can be coined as:
The feature ‘dst_bytes’ is 0 and the number of connections to the same
host as the current connection in the past 2 seconds (‘count’ feature) is 1.
Formulated rule of ‘ipsweep’ attack:
ipsweep_signature: (dst_bytes == 0) ∧ (count == 1)
Nmap-Probe
Nmap (network mapper) is a common tool for performing network scans.
It supports different types of port scans, including SYN, FIN and ACK
scanning with both TCP and UDP, as well as ICMP (Ping) scanning. The
Nmap program also lets the user specify which ports to scan, how much time
to wait between ports, and whether the ports are scanned sequentially or in
random order.
Thus, the rule for this ‘nmap’ attack can be coined as:
The protocol type is ‘udp’ or ‘tcp’ or ‘icmp’ with a particular flag
value (port scan options include SYN, FIN and ACK scanning).
Formulated rule of ‘nmap’ attack:
cond_1: (protocol_type == ‘udp’) ∧ (flag == S1)
cond_2: ((protocol_type == ‘tcp’) ∨ (protocol_type == ‘tcp’)) ∧ (flag == SH)
nmap_signature: (cond_1 ∨ cond_2)
Portsweep-Probe
A portsweep is a surveillance sweep through many ports, used to find
which services are running on a single host.
Thus, the rule for this ‘portsweep’ attack can be coined as:
The protocol type is ‘tcp’, the feature ‘dst_host_count’ (number of
connections to the same destination host as the current connection in the
past 2 seconds) reaches 255 and the feature ‘dst_host_same_src_port_rate’
(percentage of connections from the port services to the destination host) is
not equal to 0.
Formulated rule of ‘portsweep’ attack:
portsweep_signature: (protocol_type == ‘tcp’) ∧ (dst_host_count >= 255) ∧
(dst_host_same_src_port_rate ~= 0)
Satan-Probe
Satan is a widely available tool that probes a network for security
vulnerabilities and misconfigurations and looks for well known weaknesses.
It is occasionally used by administrators and frequently used by attackers to
look for vulnerabilities on a network by gathering information.
Thus, the rule for this ‘satan’ attack can be coined as:
The flag value (used to scan the ports) is either ‘SF’ or ‘REJ’ and the
number of connections to the same host as the current connection in the past
2 seconds (‘count’ feature) has a non-zero value.
Formulated rule of ‘satan’ attack:
satan_signature: ((flag == SF) ∨ (flag == REJ)) ∧ (count > 0)
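
A few of the Intrusion Signature Rules of Table 5.2 can be written as Boolean predicates over a packet record, as sketched below. The dictionary keys follow the KDD Cup99 feature names used in the rules; the names `ISR` and `match_signatures` are hypothetical, only a subset of the attacks is shown, and the thresholds are the typical values quoted in the table, which depend on the working environment.

```python
HALF_OPEN_FLAGS = {"RSTO", "REJ", "S0"}

# Each signature is a predicate over a packet given as a dict of features.
ISR = {
    "back": lambda p: p["service"] == "http" and p["src_bytes"] > 5000,
    "smurf": lambda p: p["service"] == "ecr_i" and p["src_bytes"] == 1032,
    "teardrop": lambda p: p["service"] == "private" and p["src_bytes"] == 28,
    "neptune": lambda p: p["land"] == 0 and p["flag"] in HALF_OPEN_FLAGS,
    "land": lambda p: p["land"] == 1 and p["flag"] in HALF_OPEN_FLAGS,
    "pod": lambda p: (p["service"] == "ecr_i" and p["src_bytes"] == 1480)
                     or (p["service"] == "tim_i" and p["src_bytes"] == 564),
    "phf": lambda p: p["service"] == "http" and p["src_bytes"] < 64
                     and p["dst_bytes"] > 4096,
}


def match_signatures(packet):
    """Return the names of all signatures that the packet satisfies."""
    return [name for name, rule in ISR.items() if rule(packet)]
```

A packet that satisfies none of the predicates yields an empty list; whether such a packet is normal is decided by the normal association, not by these rules.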
5.3 DESIGN OF CONTEXT FREE GRAMMAR FOR PACKET
DATA ANALYSIS
Hence, it is proposed to develop the signatures for normal traffic and
for each attack in the form of a Context Free Grammar (CFG). The ISRs for
all attack types are translated into CFG. The general CFG is given by:
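\[
\begin{aligned}
B^{e} &\rightarrow B^{e} \vee B^{t} \mid B^{t} \\
B^{t} &\rightarrow B^{t} \wedge B^{f} \mid B^{f} \\
B^{f} &\rightarrow B^{e} \mid var\ .op.\ value
\end{aligned}
\qquad (5.1)
\]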
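
A Boolean grammar with expressions, terms and factors of this kind can be evaluated with a small recursive descent parser. The sketch below is an illustration only: the concrete syntax (`|` for OR, `&` for AND, MATLAB-style `~=` for inequality), the use of parentheses to nest an expression inside a factor, the restriction to numeric values and the function names are all assumptions, not part of the proposed method.

```python
import operator
import re

OPS = {"==": operator.eq, "~=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}
TOKEN = re.compile(r"\s*(==|~=|>=|<=|[><|&()]|[A-Za-z_]\w*|\d+(?:\.\d+)?|\.\d+)")


def tokenize(text):
    """Split a rule such as 'v3 == 53 & v5 > 5000' into tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, record):
    """Evaluate a Boolean rule against a record mapping variable names to numbers."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # Factor: a comparison "var op value" or a parenthesised expression.
        if peek() == "(":
            take()
            result = expression()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        var, op, value = take(), take(), take()
        return OPS[op](record[var], float(value))

    def term():
        # Term: factors joined by AND. The factor is always parsed first.
        result = factor()
        while peek() == "&":
            take()
            result = factor() and result
        return result

    def expression():
        # Expression: terms joined by OR.
        result = term()
        while peek() == "|":
            take()
            result = term() or result
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result
```

With numerically coded features, a rule such as `v3 == 53 & v5 > 5000` can then be checked against a record like `{"v3": 53, "v5": 6000}`.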
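$FS_{1} = \{28, 33, 53, 54, 63, 68\}$,
$FS_{2} = \{28, 33, 34, 40, 49, 53, 54, 56, 62, 70\}$,
$FS_{3} = \{12, 33, 53, 54, 70\}$,
$FS_{4} = \{21, 43, 47, 53, 54, 56, 62, 63\}$,
$FS_{5} = \{27, 28, 33, 49, 53\}$,
$FS_{6} = \{12, 21, 28, 33, 34, 53, 54, 62, 70\}$,
$FS_{7} = \{12, 47, 53, 54, 70\}$,
$FS_{8} = \{12, 28, 33, 49, 53, 54, 62\}$,
$FS_{9} = \{10, 11, 12, 21, 23, 27, 28, 33, 34, 36, 37, 38, 40, 42, 43, 44, 45,
47, 49, 53, 54, 56, 62, 63, 68, 69, 70\}$,
$FS_{10} = \{49\}$.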
5.3.1 Context Free Grammar of Normal Packet data
The grammar that is used to classify the normal packet data is
called the Normal Association Grammar (NAG). In the proposed method, the
NAG plays a key role in distinguishing the normal packet data from attacks.
The normal behavior rule is represented as
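\[
\begin{aligned}
B^{f,F}_{no,i} &\rightarrow \text{flag} = F_{i} \\
B^{e,s}_{no,i} &\rightarrow \text{service} = s_{j}, \quad s_{j} \in FS_{i} \\
B^{e}_{no} &\rightarrow B^{f,F}_{no,i} \wedge B^{e,s}_{no,i}
\end{aligned}
\qquad (5.2)
\]
where $F_{i}$ is the i-th flag value and $FS_{i}$ is the set of services
associated with it.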
The notation for the attack is represented as $B^{x}_{a,i}$, where
x is ‘f’ or ‘t’ or ‘e’, denoting Boolean factors, terms and expressions of
the CFG, ‘a’ identifies the attack and ‘i’ indexes the various Boolean
conditions. Table 5.3 shows the NAG for normal packets.
Table 5.3 Normal Association Grammar
5.3.2 Context Free Grammar of Attack Packet data
Similarly, a separate intrusion grammar, called an Attack Signature
Grammar (ASG), has been developed for each attack. Table 5.4 shows the
ASG for the four classes of attacks (DoS, U2R, R2L and Probe), which plays
the key role in discriminating the specific attack types. The grammars for all
classes of attack types have been developed and tested. The numerical values
used for the continuous features of the ASG are threshold values chosen in a
heuristic manner. Likewise, the categorical features are represented by their
discrete values.
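Table 5.4 Attack Signature Grammar

DoS: back (bk)
$B^{f}_{bk,1} \rightarrow v_{3} = 53$, $B^{f}_{bk,2} \rightarrow v_{5} > 5000$,
$B^{e}_{bk} \rightarrow B^{f}_{bk,1} \wedge B^{f}_{bk,2}$

DoS: land (ld)
$B^{f}_{ld,1} \rightarrow v_{7} = 1$, $B^{f}_{ld,2} \rightarrow v_{4} = 3$, $B^{f}_{ld,3} \rightarrow v_{4} = 5$, $B^{f}_{ld,4} \rightarrow v_{4} = 6$,
$B^{e}_{ld} \rightarrow B^{f}_{ld,1} \wedge (B^{f}_{ld,2} \vee B^{f}_{ld,3} \vee B^{f}_{ld,4})$

DoS: neptune (ne)
$B^{f}_{ne,1} \rightarrow v_{7} = 0$, $B^{f}_{ne,2} \rightarrow v_{4} = 3$, $B^{f}_{ne,3} \rightarrow v_{4} = 5$, $B^{f}_{ne,4} \rightarrow v_{4} = 6$,
$B^{e}_{ne} \rightarrow B^{f}_{ne,1} \wedge (B^{f}_{ne,2} \vee B^{f}_{ne,3} \vee B^{f}_{ne,4})$

DoS: smurf (sf)
$B^{f}_{sf,1} \rightarrow v_{3} = 37$, $B^{f}_{sf,2} \rightarrow v_{5} = 1032$,
$B^{e}_{sf} \rightarrow B^{f}_{sf,1} \wedge B^{f}_{sf,2}$

DoS: pod (pd)
$B^{f}_{pd,1} \rightarrow v_{3} = 37$, $B^{f}_{pd,2} \rightarrow v_{5} = 1480$, $B^{f}_{pd,3} \rightarrow v_{3} = 42$, $B^{f}_{pd,4} \rightarrow v_{5} = 564$,
$B^{e}_{pd} \rightarrow (B^{f}_{pd,1} \wedge B^{f}_{pd,2}) \vee (B^{f}_{pd,3} \wedge B^{f}_{pd,4})$

DoS: teardrop (td)
$B^{f}_{td,1} \rightarrow v_{3} = 21$, $B^{f}_{td,2} \rightarrow v_{5} = 28$,
$B^{e}_{td} \rightarrow B^{f}_{td,1} \wedge B^{f}_{td,2}$

U2R: loadmodule (lm)
$B^{f}_{lm,1} \rightarrow v_{3} = 33$, $B^{f}_{lm,2} \rightarrow v_{34} \geq 0.6$, $B^{f}_{lm,3} \rightarrow v_{36} \geq 0.15$, $B^{f}_{lm,4} \rightarrow v_{37} \leq 0.3$,
$B^{t}_{lm,1} \rightarrow B^{f}_{lm,1} \wedge B^{f}_{lm,2} \wedge B^{f}_{lm,3} \wedge B^{f}_{lm,4}$,
$B^{f}_{lm,5} \rightarrow v_{3} = 12$, $B^{f}_{lm,6} \rightarrow v_{30} \neq 0$, $B^{f}_{lm,7} \rightarrow v_{34} = 1$, $B^{f}_{lm,8} \rightarrow v_{37} \geq 1$,
$B^{t}_{lm,2} \rightarrow B^{f}_{lm,5} \wedge B^{f}_{lm,6} \wedge B^{f}_{lm,7} \wedge B^{f}_{lm,8}$,
$B^{e}_{lm} \rightarrow B^{t}_{lm,1} \vee B^{t}_{lm,2}$

U2R: buffer_overflow (bo)
$B^{f}_{bo,1} \rightarrow v_{3} = 12$, $B^{f}_{bo,2} \rightarrow v_{34} = 1$, $B^{f}_{bo,3} \rightarrow v_{35} = 0$, $B^{f}_{bo,4} \rightarrow v_{3} = 33$, $B^{f}_{bo,5} \rightarrow v_{4} = 10$,
$B^{e}_{bo} \rightarrow (B^{f}_{bo,1} \vee B^{f}_{bo,4}) \wedge B^{f}_{bo,5} \wedge B^{f}_{bo,2} \wedge B^{f}_{bo,3}$

U2R: rootkit (rk)
$B^{f}_{rk,1} \rightarrow v_{3} = 33$, $B^{f}_{rk,2} \rightarrow v_{36} = 0$,
$B^{e}_{rk} \rightarrow B^{f}_{rk,1} \wedge B^{f}_{rk,2}$

U2R: perl (pl)
$B^{f}_{pl,1} \rightarrow v_{3} = 33$, $B^{f}_{pl,2} \rightarrow v_{16} = 2$, $B^{f}_{pl,3} \rightarrow v_{17} = 2$,
$B^{e}_{pl} \rightarrow B^{f}_{pl,1} \wedge B^{f}_{pl,2} \wedge B^{f}_{pl,3}$

R2L: phf (ph)
$B^{f}_{ph,1} \rightarrow v_{3} = 53$, $B^{f}_{ph,2} \rightarrow v_{5} < 64$, $B^{f}_{ph,3} \rightarrow v_{6} > 4096$,
$B^{e}_{ph} \rightarrow B^{f}_{ph,1} \wedge B^{f}_{ph,2} \wedge B^{f}_{ph,3}$

R2L: guess_passwd (gp)
$B^{f}_{gp,1} \rightarrow v_{3} = 33$, $B^{f}_{gp,2} \rightarrow v_{5} \leq 128$, $B^{f}_{gp,3} \rightarrow v_{38} \neq 0$, $B^{f}_{gp,4} \rightarrow v_{39} \neq 0$,
$B^{e}_{gp} \rightarrow B^{f}_{gp,1} \wedge B^{f}_{gp,2} \wedge B^{f}_{gp,3} \wedge B^{f}_{gp,4}$

R2L: warezmaster (wm)
$B^{f}_{wm,1} \rightarrow v_{3} = 70$, $B^{f}_{wm,2} \rightarrow v_{35}\ op\ 0$, $B^{f}_{wm,3} \rightarrow v_{37}\ op\ 0$,
$B^{t}_{wm,1} \rightarrow B^{f}_{wm,1} \wedge B^{f}_{wm,2} \wedge B^{f}_{wm,3}$,
$B^{f}_{wm,4} \rightarrow v_{3} = 12$, $B^{f}_{wm,5} \rightarrow v_{38}\ op\ 0.01$, $B^{f}_{wm,6} \rightarrow v_{40}\ op\ 0.05$,
$B^{t}_{wm,2} \rightarrow B^{f}_{wm,4} \wedge B^{f}_{wm,5} \wedge B^{f}_{wm,6}$,
$B^{e}_{wm} \rightarrow B^{t}_{wm,1} \vee B^{t}_{wm,2}$

R2L: imap (im)
$B^{f}_{im,1} \rightarrow v_{3} = 35$, $B^{f}_{im,2} \rightarrow v_{23} > 0$,
$B^{e}_{im} \rightarrow B^{f}_{im,1} \wedge B^{f}_{im,2}$

R2L: multihop (mh)
$B^{f}_{mh,1} \rightarrow v_{3} = 12$, $B^{f}_{mh,2} \rightarrow v_{3} = 33$, $B^{f}_{mh,3} \rightarrow v_{4} = 10$, $B^{f}_{mh,4} \rightarrow v_{36} \geq 1$,
$B^{e}_{mh} \rightarrow (B^{f}_{mh,1} \vee B^{f}_{mh,2}) \wedge B^{f}_{mh,3} \wedge B^{f}_{mh,4}$

R2L: warezclient (wc)
$B^{f}_{wc,1} \rightarrow v_{3} = 70$, $B^{f}_{wc,2} \rightarrow v_{3} = 33$, $B^{f}_{wc,3} \rightarrow v_{3} = 47$, $B^{f}_{wc,4} \rightarrow v_{12} = 1$,
$B^{e}_{wc} \rightarrow (B^{f}_{wc,1} \vee B^{f}_{wc,2} \vee B^{f}_{wc,3}) \wedge B^{f}_{wc,4}$

R2L: ftp_write (fw)
$B^{f}_{fw,1} \rightarrow v_{3} = 70$, $B^{f}_{fw,2} \rightarrow v_{36} \geq 1$,
$B^{t}_{fw,1} \rightarrow B^{f}_{fw,1} \wedge B^{f}_{fw,2}$,
$B^{f}_{fw,3} \rightarrow v_{3} = 12$, $B^{f}_{fw,4} \rightarrow v_{36} \geq 1$,
$B^{t}_{fw,2} \rightarrow B^{f}_{fw,3} \wedge B^{f}_{fw,4}$,
$B^{f}_{fw,5} \rightarrow v_{3} = 41$, $B^{f}_{fw,6} \rightarrow v_{36} \geq 0.5$,
$B^{t}_{fw,3} \rightarrow B^{f}_{fw,5} \wedge B^{f}_{fw,6}$,
$B^{e}_{fw} \rightarrow B^{t}_{fw,1} \vee B^{t}_{fw,2} \vee B^{t}_{fw,3}$

R2L: spy (sy)
$B^{f}_{sy,1} \rightarrow v_{3} = 33$, $B^{f}_{sy,2} \rightarrow v_{38} = 0.22$,
$B^{e}_{sy} \rightarrow B^{f}_{sy,1} \wedge B^{f}_{sy,2}$

Probe: satan (sa)
$B^{f}_{sa,1} \rightarrow v_{23} > 0$, $B^{f}_{sa,2} \rightarrow v_{4} = 10$, $B^{f}_{sa,3} \rightarrow v_{4} = 5$,
$B^{e}_{sa} \rightarrow B^{f}_{sa,1} \wedge (B^{f}_{sa,2} \vee B^{f}_{sa,3})$

Probe: portsweep (ps)
$B^{f}_{ps,1} \rightarrow v_{2} = 2$, $B^{f}_{ps,2} \rightarrow v_{32} \geq 255$, $B^{f}_{ps,3} \rightarrow v_{36} \neq 0$,
$B^{e}_{ps} \rightarrow B^{f}_{ps,1} \wedge B^{f}_{ps,2} \wedge B^{f}_{ps,3}$

Probe: ipsweep (is)
$B^{f}_{is,1} \rightarrow v_{6} = 0$, $B^{f}_{is,2} \rightarrow v_{23} = 1$,
$B^{e}_{is} \rightarrow B^{f}_{is,1} \wedge B^{f}_{is,2}$

Probe: nmap (nm)
$B^{f}_{nm,1} \rightarrow v_{2} = 3$, $B^{f}_{nm,2} \rightarrow v_{4} = 7$,
$B^{t}_{nm,1} \rightarrow B^{f}_{nm,1} \wedge B^{f}_{nm,2}$,
$B^{f}_{nm,3} \rightarrow v_{2} = 2$, $B^{f}_{nm,4} \rightarrow v_{2} = 3$, $B^{f}_{nm,5} \rightarrow v_{4} = 11$,
$B^{e}_{nm,1} \rightarrow (B^{f}_{nm,3} \vee B^{f}_{nm,4}) \wedge B^{f}_{nm,5}$,
$B^{e}_{nm} \rightarrow B^{t}_{nm,1} \vee B^{e}_{nm,1}$

In the warezmaster entry, op denotes the relational operator of the general
production $B^{f} \rightarrow var\ .op.\ value$.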
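
One way to hold an attack grammar of this form in a program is to store each expression as a list of terms (joined by OR), each term as a list of factors (joined by AND), and each factor as a (feature, operator, value) triple. The sketch below encodes four attacks in this way, using the numerically coded features v3, v5, v30, v34, v36 and v37; the container name `ASG`, the functions `matches` and `classify`, and the flattening into OR-of-AND form are assumptions made for illustration. Rules that combine an AND with a nested OR, such as land or neptune, would first have to be expanded into this form.

```python
import operator

OPS = {"=": operator.eq, "~=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}

# expression = OR over terms, term = AND over factors,
# factor = (feature, relational operator, value)
ASG = {
    "back": [[("v3", "=", 53), ("v5", ">", 5000)]],
    "smurf": [[("v3", "=", 37), ("v5", "=", 1032)]],
    "pod": [[("v3", "=", 37), ("v5", "=", 1480)],
            [("v3", "=", 42), ("v5", "=", 564)]],
    "loadmodule": [
        [("v3", "=", 33), ("v34", ">=", 0.6), ("v36", ">=", 0.15),
         ("v37", "<=", 0.3)],
        [("v3", "=", 12), ("v30", "~=", 0), ("v34", "=", 1),
         ("v37", ">=", 1)],
    ],
}


def matches(expression, packet):
    """True when at least one term has all of its factors satisfied."""
    return any(all(OPS[op](packet[var], value) for var, op, value in term)
               for term in expression)


def classify(packet):
    """Return the names of the attack grammars that accept the packet."""
    return [name for name, expression in ASG.items()
            if matches(expression, packet)]
```

Because every grammar is plain data, the thresholds can be changed for a different network environment without touching the evaluation code.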
5.4 ARCHITECTURE OF THE PROPOSED IDS MODEL
The purpose and focus of the proposed approach is to generate an
efficient signature based CFG that classifies normal and abnormal
activities in a large volume of IDS packet data. Figure 5.1 shows the
proposed IDS based on the Multi Stage Filter (MSF). The basic concept of this
approach is to discriminate the behavior of network packets into normal and
abnormal over multiple levels of testing. Each level of testing contributes to
decreasing the false alarm rate, which is the main focus of this approach.
The normal and the various intrusion signatures of the KDD Cup99 data have
been analyzed using NAG and ASG rules.
Initially, MSF receives the packet data with all features. The NAG
classification system is used to discriminate only the behavior of normal
patterns. One or more further levels of MSF then separate the normal packet
data from the attack packet data accurately.
[Figure: network packets enter the Raw Filter (RF); normal packets pass on
through Micro Filter Stage-1 (MF-1) to Micro Filter Stage-N (MF-N); attack
packets from the RF and from the MF stages go to the Attack Analyzer (AA).]
Figure 5.1 Block diagram of Multi Stage Filter
Initially, the greater part of the normal and attack packet data is
separated in the Raw Filter (RF) by means of NAG. The following are the two
outputs of RF.
    NOrmal Nomenclature Data (NOND): the greatest number of normal
    packet data with a small amount of attack packet data, which is
    directed to the Micro Filter (MF).
    Attack Nomenclature Data (AND): the maximum quantity of attack
    packet data with a very small quantity of normal packet data,
    which is sent to the Attack Analyzer (AA).
In AA, individual attack classification is carried out on AND to
detect the specific attack type by means of ASG. However, the NOND
contains a small number of attack packet data and should be filtered
again. This filtering is carried out by the MF of MSF, whose internal
architecture is shown in Figure 5.2. In MF, the attack patterns are
removed, and the greatest possible quantity of attacks is filtered out using
the Minimum Attack Signature Grammar (MASG). Table 5.5 shows the MASG for
the attacks that are used in the proposed method. The numerical values used
for the continuous features of MASG are likewise threshold values chosen in
a heuristic manner. Each MF has four internal divisions that are used to
separate attack packets. The separated attack packet data of the MF is
directed to AA for attack classification by means of ASG.
[Figure: network packets pass through the Raw Filter (RF), which sends Attack
Nomenclature Data (AND) to the Attack Analyzer (AA) and NOrmal Nomenclature
Data (NOND) to the Micro Filter (MF). The MF comprises DoS (Back, Pod, ...),
U2R (Rootkit, Perl, ...), R2L (Imap, Phf, ...) and Probe (Satan, Nmap) attack
microfilters, each with a Normal output and an output to the Attack Analyzer.]
Figure 5.2 Internal architecture of MSF
The signatures of some attack packet data are very similar to those
of normal data. Hence, the discrimination cannot be accomplished using RF
and a single stage of MF, and additional MFs are essential to filter such
attack packet data. Attacks with similar signatures are discriminated in
subsequent MF stages based on a new set of MASG; otherwise, normal
packet data may be classified as attack due to the similarity in signatures,
which will affect the false positive rate. Depending on the false alarm rate,
the network environment and the attack types, ‘N’ stages of Micro Filters are
used in MSF to attain efficient classification.
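Table 5.5 Minimum Attack Signature Grammar

teardrop (td)
$B^{f}_{td,1} \rightarrow v_{3} = 21$, $B^{f}_{td,2} \rightarrow v_{5} = 28$

buffer_overflow (bo)
$B^{f}_{bo,1} \rightarrow v_{3} = 33$, $B^{f}_{bo,2} \rightarrow v_{17}\ op\ 1$

nmap (nm)
$B^{f}_{nm,1} \rightarrow v_{3} = 36$, $B^{f}_{nm,2} \rightarrow v_{5}\ op\ 8$

pod (pd)
$B^{f}_{pd,1} \rightarrow v_{3} = 37$, $B^{f}_{pd,2} \rightarrow v_{5} = 1480$

guess_passwd (gp)
$B^{f}_{gp,1} \rightarrow v_{3} = 33$, $B^{f}_{gp,2} \rightarrow v_{6}\ op\ 179$, $B^{f}_{gp,3} \rightarrow v_{6}\ op\ 2053$

portsweep (ps)
$B^{f}_{ps,1} \rightarrow v_{4} = 6$, $B^{f}_{ps,2} \rightarrow v_{27}\ op\ 0$, $B^{f}_{ps,3} \rightarrow v_{4} = 1$

smurf (sf)
$B^{f}_{sf,1} \rightarrow v_{3} = 37$, $B^{f}_{sf,2} \rightarrow v_{5}\ op\ 520$, $B^{f}_{sf,3} \rightarrow v_{5}\ op\ 1032$, $B^{f}_{sf,4} \rightarrow v_{5}\ op\ 18$

neptune (ne)
$B^{f}_{ne,1} \rightarrow v_{3} = 33$, $B^{f}_{ne,2} \rightarrow v_{38}\ op\ 1$, $B^{f}_{ne,3} \rightarrow v_{4} = 1$, $B^{f}_{ne,4} \rightarrow v_{39}\ op\ 0.9$

ipsweep (is)
$B^{f}_{is,1} \rightarrow v_{3} = 37$, $B^{f}_{is,2} \rightarrow v_{5}\ op\ 520$, $B^{f}_{is,3} \rightarrow v_{5}\ op\ 1032$, $B^{f}_{is,4} \rightarrow v_{5}\ op\ 18$

Here op denotes the relational operator of the general production
$B^{f} \rightarrow var\ .op.\ value$, and the expression $B^{e}$ of each
attack is built from the listed factors with the productions in (5.1).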
The pseudo-code for processing the packet data into normal and
attack type is shown in Figure 5.3.
Algorithm PacketDataFilter
// Input: P: Set of packet data; F: Set of flags; S: Set of services
// Output: The filtered packet data
// Method: P is filtered by F and S and output N_M packet data
    N_R ← ∅, N_M ← ∅
    for each packet data p_j ∈ P
        N_R ← N_R ∪ rawFilter(p_j, F, S);
    for each packet data p_j ∈ N_R
        N_M ← N_M ∪ microFilter(p_j, A);
    end
    return N_M

Algorithm rawFilter(p, F, S)
// Input: p: Incoming packet data; F: Set of flags; S: Set of services
// Output: The raw filtered packet data
// Method: p is filtered by F and S and output p_R packet data
    p_R ← ∅, f_i ∈ F, s_i ∈ S
    for each rule r_i
        r_i ← f_i ∧ s_i, f_i ∈ {RSTO, ..., S0}
        if p satisfies r_i
            p_R ← p_R ∪ p;
        end
    end
    return p_R

Algorithm microFilter(p_R, A)
// Input: p_R: raw filtered packet data; A: Set of attack rules
// Output: The micro filtered packet data
// Method: p_R is filtered to output p_M or p_A packet data
    p_A ← ∅, p_M ← ∅, r_i ∈ A
    for each rule r_i
        if p satisfies r_i
            p_A ← p_A ∪ p;
            p_M ← diff(p_R, not p_A);
        end
    end
Figure 5.3 Algorithm for categorizing the packet data
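
The flow of Figure 5.3 can also be expressed as a short runnable sketch. It is an interpretation under assumptions and not the implementation used in the experiments: packets are dictionaries of features, the normal associations are a mapping from flag to services, each micro filter stage is a dictionary of attack predicates, and the function names are hypothetical.

```python
def raw_filter(packet, normal_associations):
    """True when the (flag, service) pair of the packet is a normal association."""
    return packet["service"] in normal_associations.get(packet["flag"], set())


def micro_filter(packet, attack_rules):
    """Return the names of the attack rules of one stage matched by the packet."""
    return [name for name, rule in attack_rules.items() if rule(packet)]


def multi_stage_filter(packets, normal_associations, stages):
    """Split packets into (normal, to_attack_analyzer) using RF and N MF stages."""
    to_analyzer = []
    candidates = []
    # Raw Filter: packets without a normal association go to the Attack Analyzer.
    for packet in packets:
        if raw_filter(packet, normal_associations):
            candidates.append(packet)
        else:
            to_analyzer.append(packet)
    # Micro Filter stages: remove remaining packets that match an attack rule.
    for attack_rules in stages:
        kept = []
        for packet in candidates:
            if micro_filter(packet, attack_rules):
                to_analyzer.append(packet)
            else:
                kept.append(packet)
        candidates = kept
    return candidates, to_analyzer
```

The list `stages` holds one rule dictionary per micro filter stage, so adding a further stage amounts to appending another dictionary.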
There are 5 million packet data available, covering normal traffic
and the possible attacks. Only 50 percent of the packet data have been
chosen for analysis and experimentation, since the remaining redundant
packet data do not contribute much to the performance of the proposed
method. For the experiments, this 50 percent of the packet data has been
grouped into eight categories, and each category has normal packet data as
well as attack packet data.
5.5 SUMMARY
This chapter, being the core of the work, has presented an improved
rule set for the detection of intruders. An MSF has also been designed to
keep track of the session of packets as dictated by the rule set. The
performance is observed to be very promising compared with the existing
methods that were explored in the previous chapter. The next chapter
presents the results and observations.