Network Intrusion Detection Using Random Forests
Jiong Zhang
Mohammad Zulkernine
School of Computing
Queen's University
Kingston, Ontario, Canada
PST2005, Jiong Zhang and Mohammad Zulkernine

Outline
• Motivation
• Intrusion detection system
• Data mining meets intrusion detection
• Proposed architecture
• Challenges and solutions
• Experimental results
• Conclusion and future work

Motivation
• An intrusion prevention system (firewall) cannot prevent all attacks.
[Figure: intruders, the Internet, a firewall, and a victim]

Motivation (contd.)
• Statistical data for intrusions
  • Total losses of 2004 (reported): $141,496,560. Source: FBI survey for Year 2004
  • 50% of security breaches are undetected. Source: FBI Statistics for Year 2000

Intrusion Detection Techniques
• Misuse Detection
  • Extracts patterns of known intrusions
  • Cannot detect novel intrusions
  • Has low false positive rate
• Anomaly Detection
  • Builds profiles for normal activities
  • Uses the deviations from the profiles to detect attacks
  • Can detect unknown attacks
  • Has high false positive rate
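
The contrast between the two techniques can be sketched in a few lines of Python. This is only an illustration: the signature strings, the connections-per-second profile, and the three-sigma threshold are all assumptions made for the example, not details from the talk.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attacks (illustrative only).
KNOWN_SIGNATURES = {"/etc/passwd", "' OR 1=1"}

def misuse_detect(payload: str) -> bool:
    """Misuse detection: flag a payload that contains a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def build_profile(normal_values):
    """Anomaly detection, step 1: summarize normal activity."""
    return mean(normal_values), stdev(normal_values)

def anomaly_detect(value, profile, threshold=3.0):
    """Anomaly detection, step 2: flag large deviations from the profile."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma

# Assumed measurements of normal traffic, e.g. connections per second.
profile = build_profile([10, 12, 11, 9, 13, 10, 11])
```

A payload with no known signature passes `misuse_detect` even if it is malicious, which is the novel-intrusion gap; `anomaly_detect` flags any unusual value, benign or not, which is where the higher false positive rate comes from.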

Network Intrusion Detection System (NIDS)
• Monitors network traffic to detect intrusions
• Monitors more targets on a network
• Detects some attacks that host-based systems miss
• Does not affect network operations

Current NIDS
• Many current NIDSs (like Snort):
  • Rule-based
  • Unable to detect novel attacks
  • High maintenance cost

Rule Based vs. Data Mining
• Rule based systems: Intrusion Data → Security Experts → Rules
• Data mining based systems: Labeled Data → Data Mining Engine → Patterns

Data Mining Meets Intrusion Detection
• Extract patterns of intrusions for misuse detection
• Build profiles of normal activities for anomaly detection
• Build classifiers to detect attacks
• Some IDSs have successfully applied data mining techniques in intrusion detection

Proposed Architecture
[Figure: Architecture of the proposed NIDS, divided into an on-line part and an off-line part. Components: Networks, Sensors, On-line Pre-Processors, Database (On line), Detector, Alarmer, Off-line Pre-processor, Database (Off line), Data Set, Pattern Builder. Labeled flows: Packets, Audited data, Feature vectors, Training data, Patterns, Alarms.]

Random Forests
• Among the most accurate of the current data mining algorithms
• Runs efficiently on large data sets with many features
• Gives estimates of which features are important
• No nominal data problem (nominal features are handled directly)
• Resists over-fitting as more trees are added
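
As a rough illustration of the idea behind the algorithm, here is a deliberately tiny forest in plain Python. It is a sketch under strong simplifying assumptions: each "tree" is a single-split stump rather than a fully grown tree, the toy rows and the `normal`/`attack` labels are invented for the example, and the names `train_stump`, `train_forest`, and `predict` are hypothetical. What it does share with random forests is the combination of bootstrap sampling, a random subset of `mtry` features per split, and majority voting.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left label, right label)

def train_forest(rows, labels, n_trees=25, mtry=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        feats = rng.sample(range(n_features), mtry)     # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Invented toy data: two numeric features per connection.
rows = [(1, 1), (2, 2), (1, 2), (2, 1), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = ["normal"] * 4 + ["attack"] * 4
forest = train_forest(rows, labels)
```

Because every stump sees a different bootstrap sample and feature subset, individual stumps can be wrong while the vote is still right, which is the intuition behind the robustness claims on this slide.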

Imbalanced Intrusion
• Problems
  • Higher error rate for minority intrusions
  • Some minority intrusions are more dangerous
  • Need to improve the performance for the minority intrusions
• Proposed Solution
  • Down-sample the majority intrusions and over-sample the minority intrusions
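
One simple way to realize the proposed solution is to resample every class to a common size. The sketch below assumes that scheme; the slide does not say which sampling levels were used, so the `target` parameter, the function name `rebalance`, and the toy class names are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def rebalance(rows, labels, target, seed=0):
    """Resample every class to exactly `target` rows.

    Classes with at least `target` rows are down-sampled without
    replacement; smaller classes are over-sampled with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)            # down-sample
        else:
            extra = rng.choices(members, k=target - len(members))
            chosen = members + extra                        # over-sample
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Invented toy data: 90 majority rows and 10 minority rows.
rows = [(i,) for i in range(100)]
labels = ["majority"] * 90 + ["minority"] * 10
new_rows, new_labels = rebalance(rows, labels, target=20)
counts = Counter(new_labels)
```

Down-sampling the majority class also shrinks the training set, which fits the later claim that sampling reduces the time needed to build patterns.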

Feature Selection
• Essential for improving detection rate
• Reduces the computational cost
• Many NIDSs select features by intuition or the domain knowledge

Feature Selection over the KDD’99 Dataset
• Calculate variable importance using random forests.
• Select the 38 most important features in detection.
[Figure: bar chart of variable importance per KDD’99 feature; axes: Feature, Importance (ticks from -10 to 15)]
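
Random forests estimate variable importance by permuting one feature at a time and measuring how much accuracy drops. The sketch below shows that idea in a simplified form: it permutes columns of a single data set and scores a generic `model` callable, whereas random forests do this per tree on out-of-bag samples. The toy data, the stand-in model, and the helper names are assumptions made for the example.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            permuted = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, column)]
            drops.append(base - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def top_k(importances, k):
    """Indices of the k most important features, most important first."""
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    return order[:k]

# Invented toy data: feature 0 decides the label, feature 1 is noise.
rows = [(i % 2, (i * 7) % 5) for i in range(20)]
labels = [r[0] for r in rows]
model = lambda r: r[0]  # stand-in for a trained classifier
scores = permutation_importance(model, rows, labels)
selected = top_k(scores, 1)
```

With 41 features scored this way, `top_k(scores, 38)` would correspond to the selection step described on the slide.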

Some Features
• The two most important features
  • Feature 3. service type, such as http, telnet, and ftp
  • Feature 23. count, # connections to the same host as the current one during past two seconds
• The three least important features
  • Feature 7. land, 1 if connection is from/to the same host/port; 0 otherwise
  • Feature 20. num_outbound_cmds, # of outbound commands in an ftp session
  • Feature 21. is_hot_login, 1 if the login belongs to the “hot” list; 0 otherwise

Parameter Optimization for Random Forests
• Optimize the parameter Mtry of random forests to improve detection rate.
• Choose 15 as the optimal value, which reaches the minimum of the oob error rate.
[Figure: Oob Error Rate (axis from 0.00165 to 0.00215) and Time (axis from 0 to 600) plotted against Mtry (5, 10, 15, 20, 25, 30, 35, 38)]
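
The selection step amounts to evaluating the oob error rate for each candidate Mtry and keeping the smallest. A minimal sketch follows, assuming a callable that trains a forest and returns its oob error. The candidate grid is taken from the Mtry axis of the plot; `toy_oob_error` is a made-up placeholder curve, not the measured values, and is shaped only so the example has a single minimum.

```python
def pick_mtry(candidates, oob_error):
    """Return the candidate with the lowest oob error, plus all errors."""
    errors = {m: oob_error(m) for m in candidates}
    best = min(errors, key=errors.get)
    return best, errors

# Grid read from the Mtry axis of the plot.
candidates = [5, 10, 15, 20, 25, 30, 35, 38]

def toy_oob_error(mtry):
    """Placeholder: in practice, train a forest with this mtry and
    return its out-of-bag error rate. These values are invented."""
    return (mtry - 15) ** 2 / 1000

best, errors = pick_mtry(candidates, toy_oob_error)
```

Since the plot also tracks build time, a real search might trade a slightly higher error for a faster model; this sketch considers the error alone.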

Performance Comparison on the KDD’99 Dataset
• Our approach provides lower overall error rate and cost compared to the best KDD’99 result.
• Feature selection can improve the performance of intrusion detection.
[Figure: two bar charts, each comparing Best KDD Result, Experiment without feature selection, and Experiment with feature selection. Chart 1: Overall Error Rate (axis from 6.95% to 7.35%). Chart 2: Cost of Misclassification (axis from 0.225 to 0.234).]
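
Both metrics in the charts can be computed from a confusion matrix. The sketch below assumes the cost of misclassification is the average per-example cost under a cost matrix, with rows as actual classes and columns as predicted classes. The two-class matrices are invented for illustration and are not the KDD’99 figures.

```python
def overall_error_rate(confusion):
    """Fraction of examples that fall off the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

def average_cost(confusion, cost):
    """Sum of count * cost over all cells, divided by the example count."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    weighted = sum(confusion[i][j] * cost[i][j]
                   for i in range(n) for j in range(n))
    return weighted / total

# Invented two-class example (rows: actual, columns: predicted).
confusion = [[90, 5],
             [2, 3]]
cost = [[0, 1],   # a false alarm costs 1
        [3, 0]]   # a missed attack costs 3

error = overall_error_rate(confusion)   # 7 of 100 examples misclassified
avg_cost = average_cost(confusion, cost)  # (5*1 + 2*3) / 100
```

The two metrics can disagree: a model with a slightly higher error rate may still have a lower cost if its mistakes fall in cheaper cells of the cost matrix.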

Conclusion and Future Work
• The random forests algorithm can help improve detection performance and select features.
• Sampling techniques can reduce the time to build patterns and increase the detection rate of minority intrusions.
• In future, we will focus on anomaly detection and a multiple classifier architecture.