Ariu - Workshop on Multiple Classifier Systems - 2011

A modular architecture for the analysis of HTTP payloads based on Multiple Classifiers

Davide Ariu

Giorgio Giacinto

Department of Electric and Electronic Engineering

University of Cagliari

Pattern Recognition and Applications Group http://prag.diee.unica.it

This research was sponsored by the Autonomous Region of Sardinia through a grant financed with the “Sardinia PO FSE 2007-2013” funds and provided according to the L.R. 7/2007

Naples, 17 June 2011

Outline

• Motivations

• The proposed system

• Experimental Setup and Results

• Conclusions


The objective

Design of an anomaly-based Intrusion Detection System for the protection of Web Servers and Web Applications.

The HTTP traffic toward the web servers is inspected by a multiple classifier system.



Why Web Applications?



Why Anomaly Detection?



A legitimate Payload...

GET /pra/ita/home.php HTTP/1.1
Host: prag.diee.unica.it
Accept: text/*, text/html
User-Agent: Mozilla/4.0

The first line of the payload is the Request Line; the lines that follow are the Request Headers.
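The split of a payload into its Request Line and Request Headers can be sketched in a few lines of Python. This is only an illustration: the function name `split_payload` and the dictionary layout are assumptions made here, not part of the presented system.

```python
def split_payload(payload: str) -> dict:
    """Split a raw HTTP request into its Request Line and Request Headers."""
    # The header section ends at the first empty line; a body, if any, follows it.
    head = payload.replace("\r\n", "\n").split("\n\n", 1)[0]
    lines = [line for line in head.split("\n") if line]
    parts = {"Request-Line": lines[0]}
    for line in lines[1:]:
        # Each header has the form "Name: value".
        name, _, value = line.partition(":")
        parts[name.strip()] = value.strip()
    return parts


payload = (
    "GET /pra/ita/home.php HTTP/1.1\r\n"
    "Host: prag.diee.unica.it\r\n"
    "Accept: text/*, text/html\r\n"
    "User-Agent: Mozilla/4.0\r\n\r\n"
)
parts = split_payload(payload)
```

Keeping the Request Line under its own key lets every line of the payload be treated uniformly later on.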

...and some attacks

• Long Request Buffer Overflow
HEAD / aaaaaaa…aaaaaaaaaaaa

• URL Decoding Error
GET /d/winnt/sys32/cmd.exe?/c+dir HTTP/1.0
Host: www
Connection: close



Why Payload Analysis?

• Detection of Web-based attacks is based on either
– Analysis of the Request-Line
  • Detects only attacks that exploit input-validation flaws, e.g. Spectrogram ([Song,2009]), HMM-Web ([Corona,2009])
– HTTP Payload Analysis
  • Takes into account the whole HTTP request, and thus can (in principle) detect any kind of attack



State of the Art - Payload Analysis

• Payl [Wang,2004]
– n-grams to represent byte statistics
• McPAD [Perdisci,2009]
– Ensemble of one-class SVMs trained on ν-grams
• Spectrogram [Song,2009]
– Ensemble of Markov Chains to analyze the Request-Line
• HMMPayl [Ariu,2011]
– Ensemble of HMMs to analyze sequences of bytes from the whole payload

None of the above techniques represents the structure of the payload



The proposed system - Basic Idea

• We propose to take into account the structure of HTTP payloads
– For each line of the payload, an ensemble of HMMs is used to model the sequences of bytes.
– The final decision is obtained by using the HMM outputs as features. The payload is thus classified by a one-class classifier trained on the outputs of the HMM ensembles.
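How one line of the payload could be turned into a single feature can be sketched as follows. The slides do not say how the bytes are fed to the HMMs nor how the outputs of one ensemble are combined, so everything specific here is an assumption: each HMM scores the whole line with the scaled forward algorithm, the score is the per-byte log-likelihood, and the ensemble output is the plain average. The helper `random_hmm` builds hypothetical untrained models only so that the example runs.

```python
import numpy as np


def log_likelihood(seq: bytes, start, trans, emit) -> float:
    """Scaled forward algorithm: log P(seq | HMM) for a discrete HMM over bytes.

    start: (S,) initial state probabilities
    trans: (S, S) transition probabilities, trans[i, j] = P(next=j | current=i)
    emit:  (S, 256) emission probabilities, one column per byte value
    """
    alpha = start * emit[:, seq[0]]
    log_p = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = (alpha @ trans) * emit[:, seq[t]]
        scale = alpha.sum()          # P(byte t | bytes before t)
        log_p += np.log(scale)
        alpha = alpha / scale        # rescale to avoid numerical underflow
    return float(log_p)


def ensemble_score(line: bytes, hmms) -> float:
    """Average per-byte log-likelihood over the ensemble (assumed combination rule)."""
    return float(np.mean([log_likelihood(line, *h) / len(line) for h in hmms]))


def random_hmm(rng, n_states=3):
    """Hypothetical untrained HMM with random parameters, for demonstration only."""
    start = rng.dirichlet(np.ones(n_states))
    trans = rng.dirichlet(np.ones(n_states), size=n_states)
    emit = rng.dirichlet(np.ones(256), size=n_states)
    return start, trans, emit


rng = np.random.default_rng(0)
ensembles = {
    "Request-Line": [random_hmm(rng) for _ in range(3)],
    "Host": [random_hmm(rng) for _ in range(3)],
}
request = {
    "Request-Line": b"GET /pra/index.php HTTP/1.1",
    "Host": b"prag.diee.unica.it",
}
# One feature per payload line: the input vector of the one-class classifier.
features = [ensemble_score(request[name], ensembles[name]) for name in ensembles]
```

Dividing by the line length keeps long and short lines on a comparable scale; this is a choice of the sketch, and the scores it produces are log-likelihoods rather than values in the range used in the slides.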



The proposed system - A scheme


[Scheme: the HTTP payload is split into its lines. Each line (Request-Line, User-Agent, Host, Accept-Encoding, Accept-Language) is scored by its own HMM ensemble; in the example the five scores are 0.62, -1, 0.53, 0.34 and 0.49. The scores are the input of a one-class classifier, whose output score or class label is the decision of the IDS.]

Example HTTP payload:

GET /pra/index.php HTTP/1.1
Host: prag.diee.unica.it
User-Agent: Mozilla/5.0
Accept-Encoding: gzip, deflate
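The last stage of the scheme, a one-class classifier fed with the HMM ensemble scores, can be illustrated with a single Gaussian fitted to legitimate traffic. This is a minimal sketch, not the implementation used in the experiments: the class name, the ridge term, the `reject_fraction` parameter and the synthetic training data are all assumptions introduced for the example.

```python
import numpy as np


class GaussianOneClass:
    """One-class classifier that fits a single Gaussian to legitimate feature vectors."""

    def fit(self, X, reject_fraction=0.01):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # A small ridge keeps the covariance invertible (assumption of this sketch).
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.inv_cov_ = np.linalg.inv(cov)
        # Threshold set so that `reject_fraction` of the training data is rejected.
        self.threshold_ = np.quantile(self.score(X), 1.0 - reject_fraction)
        return self

    def score(self, X):
        """Squared Mahalanobis distance from the training mean (higher = more anomalous)."""
        d = np.asarray(X, dtype=float) - self.mean_
        return np.einsum("ij,jk,ik->i", d, self.inv_cov_, d)

    def predict(self, X):
        """Return True for feature vectors flagged as anomalous."""
        return self.score(X) > self.threshold_


# Synthetic stand-in for the score vectors of legitimate requests (5 payload lines).
rng = np.random.default_rng(0)
train = rng.normal(loc=0.5, scale=0.1, size=(500, 5))
clf = GaussianOneClass().fit(train)

typical = [0.5, 0.5, 0.5, 0.5, 0.5]
outlier = [0.5, 0.5, -0.3, 0.5, 0.5]   # one line scored far from the training data
flags = clf.predict([typical, outlier])
```

The classifier can return either the raw score or the thresholded label, which matches the "output score or class label" of the scheme.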

Missing Features

• Each request typically does not contain all the headers

– Training phase: the value of the feature related to a missing header is set to the average value of that feature

– Testing phase: the value of the feature related to a missing header is set to -1
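The two rules for missing headers can be written down as a short sketch. Missing features are encoded here as NaN, and the training average is taken over the requests that do contain the header; both points are assumptions, since the slides only state "the average value".

```python
import numpy as np

MISSING_TEST_VALUE = -1.0  # value used at test time, as stated in the slides


def fill_training(X):
    """Training phase: replace each missing feature with the average of its column."""
    X = np.array(X, dtype=float)
    col_mean = np.nanmean(X, axis=0)     # average over requests that have the header
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X


def fill_testing(X):
    """Testing phase: replace each missing feature with -1."""
    X = np.array(X, dtype=float)
    X[np.isnan(X)] = MISSING_TEST_VALUE
    return X


nan = float("nan")
train = fill_training([[0.6, nan], [0.4, 0.2], [0.5, 0.4]])
test = fill_testing([[0.62, nan]])
```

A header that never appears in the training set would leave its column undefined in this sketch; such a feature would have to be dropped or handled separately.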



Experimental Setup - 1

• 2 Datasets of “Real” legitimate traffic
– DIEE, collected at the University of Cagliari
– GT, collected at Georgia Tech




• 3 Datasets of “Real” Attacks
– Generic, 66 Attacks
– Shell-code, 11 Attacks
– XSS-SQL Injection, 38 Attacks

• Training: 1 day of traffic
• Test: the remaining traffic plus attacks
– K-fold CV

• 4 One-class classification algorithms with default parameter settings
– Gauss: Gaussian distribution
– Mog: Mixture of Gaussians
– Parzen: Parzen density estimator
– SVM: SVM with RBF Kernel

• Performance evaluated using the “Partial AUC”
– Computed in the FP range [0,0.1]
– Normalized dividing by 0.1
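The Partial AUC described above (area under the ROC curve for a false positive rate in [0, 0.1], divided by 0.1) can be computed as in the sketch below. It assumes that higher scores mean "more anomalous" and counts an attack as detected only if it scores strictly above the legitimate sample acting as threshold; the function name and the handling of ties are choices of this example.

```python
import numpy as np


def partial_auc(scores_legit, scores_attack, max_fpr=0.1):
    """Area under the empirical ROC curve for FPR in [0, max_fpr], divided by max_fpr."""
    legit = np.sort(np.asarray(scores_legit, dtype=float))[::-1]   # descending
    attack = np.asarray(scores_attack, dtype=float)
    n = len(legit)
    # Detection rate on the k-th FPR step: fraction of attacks scoring
    # strictly above the k-th highest legitimate score.
    tpr = np.array([(attack > s).mean() for s in legit])
    # Part of each FPR step [(k-1)/n, k/n] that lies inside [0, max_fpr].
    left = np.arange(n) / n
    right = np.arange(1, n + 1) / n
    width = np.clip(np.minimum(right, max_fpr) - left, 0.0, None)
    return float((width * tpr).sum() / max_fpr)


legit_scores = np.arange(100, dtype=float)   # 100 legitimate requests
attack_scores = np.array([200.0, 95.5])      # 2 attacks
value = partial_auc(legit_scores, attack_scores)
```

With this normalization a detector that ranks every attack above all legitimate traffic scores 1, while a random detector scores about 0.05 (the area under the diagonal up to 0.1, divided by 0.1).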


Experimental Results - Partial AUC – DIEE Dataset


Experimental Results - Multiple HMM – DIEE Dataset – Shellcode Attacks


Experimental Results - Partial AUC – GT Dataset


Experimental Results - Comparison with similar IDS



Computational Cost



Conclusions

• We proposed an anomaly-based IDS for the protection of Web Servers and Web Applications

• We exploited the MCS paradigm
– To analyze the structure of the HTTP payload
– By combining the outputs through a One-class classifier

• Compared to similar systems, our proposal
– Provides high performance in attack detection
– Is fast



Thank You!
