A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni

Interdisciplinary Centre for ICT Security, Reliability, and Trust

University of Luxembourg, Luxembourg

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

What is malware?

Malware (malicious + software) is software that does malicious things without the victim’s knowledge.

Motivation

➢ More than 403 million new malware variants were created in 2011, a 41% increase over 2010.

➢ On average around 55,000 new malware samples were reported per day.

➢ The exponential growth of malware is a major threat to the software industry.

Problem Definition 1/2

❑ New malware has become very sophisticated.

❑ Malware evades traditional anti-virus signatures by using various obfuscation techniques.

❑ Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).

Problem Definition 2/2

❑ Scalability is a major problem in existing behavior-based malware detection techniques:

▪ The malware feature space grows in proportion to the number of samples under examination.

▪ The techniques are computationally very intensive.

Related Work 1/2

❑ The practicality and efficiency of behavior-based malware detection depend on:

• size of the feature space

• computational complexity

• overheads (e.g., pre-processing)

• detection accuracy

❑ Simple malware behavior models (e.g., n-gram, m-bag and k-tuple) generate huge feature spaces and require various pruning and parameter-tuning mechanisms.

Related Work 2/2

❑ Complex malware behavior models (e.g., system call dependency graphs) are computationally very intensive.

Behavior Modeling – An Overview

➢ Software programs perform actions on various operating system resources.

➢ An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).

➢ The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.

➢ For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
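The system-call-to-action mapping can be pictured as a simple lookup table. The sketch below is only an illustration: the table name `SYSCALL_TO_ACTION` and the helper `to_action` are hypothetical, and the three entries are the only pairs suggested by the slides.

```python
# Hypothetical lookup table; a real one would cover many more system calls.
SYSCALL_TO_ACTION = {
    "NtCreateProcess": "CreateProcess",
    "NtCreateProcessEx": "CreateProcess",
    "NtReadFile": "ReadFile",
}


def to_action(syscall):
    """Return the higher-level action for a system call, or None if unmapped."""
    return SYSCALL_TO_ACTION.get(syscall)


# Two differently named system calls collapse into the same action.
trace = ["NtCreateProcessEx", "NtReadFile", "NtCreateProcess"]
actions = [to_action(s) for s in trace]
print(actions)  # ['CreateProcess', 'ReadFile', 'CreateProcess']
```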

Operating System Resource Types

✓ File System

✓ Registry

✓ Process and Thread

✓ Network

✓ Synchronization

✓ Section

Bounded Feature Space Behavior Modeling (BOFM)

Malware feature: For each type of OS resource, the set of actions performed by malware on an instance of that resource type constitutes a feature of the malware.

➢ Example: Malware performs the CreateFile and DeleteFile actions on a file instance C:\foo.exe, and the DeleteFile action on another file instance C:\abc.dll.

This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to the file resource instances C:\foo.exe and C:\abc.dll, respectively.
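The example can be reproduced with a few lines of code. This is a minimal sketch, assuming a trace is available as (resource type, instance, action) tuples; that trace format and the function name `extract_features` are illustrative choices, not something given in the slides.

```python
from collections import defaultdict


def extract_features(events):
    """Group actions by resource instance, then keep only the action sets.

    `events` is an iterable of (resource_type, instance, action) tuples
    (an assumed trace format). Returns a set of
    (resource_type, frozenset_of_actions) features.
    """
    per_instance = defaultdict(set)
    for resource_type, instance, action in events:
        per_instance[(resource_type, instance)].add(action)
    return {(rtype, frozenset(actions))
            for (rtype, _instance), actions in per_instance.items()}


events = [
    ("file", r"C:\foo.exe", "CreateFile"),
    ("file", r"C:\foo.exe", "DeleteFile"),
    ("file", r"C:\abc.dll", "DeleteFile"),
]
features = extract_features(events)
print(len(features))  # 2
```

The two resulting features are {CreateFile, DeleteFile} and {DeleteFile}; the instance paths themselves are discarded.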

Properties of BOFM features 1/2

✓ Goal: To be more resilient to commonly used obfuscation techniques.

❖ Property 1: Regardless of the number of times an action is performed on an OS resource instance, it is counted only once in the final feature set.

E.g., the ReadFile action is performed several times on a file instance C:\Windows\...\sysfile2.dll; this behavior is modeled by the single BOFM feature {ReadFile}.

❖ Property 2: The sequence in which the malware performs the actions is ignored in feature construction.

E.g., the malware features {ReadFile, QueryFileInformation} and {QueryFileInformation, ReadFile} are considered identical.

Properties of BOFM features 2/2

❖ Property 3: Identical action sets performed on two different OS resource instances of the same type are modeled as a single feature.

E.g., the actions CreateFile and DeleteFile performed on two different file resource instances, C:\Windows\abc.dll and D:\Personel\foo.exe, are modeled as the single BOFM feature {CreateFile, DeleteFile}.
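The three properties can be checked against a set-based representation. The sketch below assumes the same (resource type, instance, action) trace format as an illustration; the function name `bofm_features` and the instance names `a.dll` and `b.exe` are hypothetical.

```python
def bofm_features(events):
    """Map (resource_type, instance, action) events to a set of
    (resource_type, frozenset_of_actions) features."""
    by_instance = {}
    for rtype, instance, action in events:
        by_instance.setdefault((rtype, instance), set()).add(action)
    return {(rtype, frozenset(acts)) for (rtype, _), acts in by_instance.items()}


base = [("file", "a.dll", "ReadFile"),
        ("file", "a.dll", "QueryFileInformation")]

# Property 1: repeating an action on the same instance changes nothing.
repeated = base + [("file", "a.dll", "ReadFile")] * 5
# Property 2: the order of the actions is ignored.
reordered = list(reversed(base))
# Property 3: the same action set on a second instance of the same type
# collapses into the same feature.
duplicated = base + [("file", "b.exe", action) for (_, _, action) in base]

same = all(bofm_features(trace) == bofm_features(base)
           for trace in (repeated, reordered, duplicated))
print(same)  # True
```

Using sets at both levels (actions per instance, then features per sample) is what makes padding a trace with repeated, reordered or duplicated operations ineffective in this sketch.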

Bounded Feature Space

Goal: Avoid malware feature space growth proportional to the number of samples under examination.

• Let $j$ denote an OS resource type.

• The total number $k_j$ of possible actions that malware may perform on an OS resource instance of type $j$ is a constant.

• The maximum number $m_j$ of possible features with regard to OS resource type $j$ is also a constant. Because a feature is a non-empty set of actions, $m_j = 2^{k_j} - 1$.

• The maximum number $N$ of possible features for all resource types is therefore always the constant $N = \sum_j m_j = \sum_j (2^{k_j} - 1)$.
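The bound can be computed directly from the per-type action counts. This is a sketch under one stated assumption: a feature is any non-empty subset of the k_j actions of resource type j, so that m_j = 2^k_j - 1. The function name `max_features` and the action counts below are hypothetical and are not the counts behind the slides.

```python
def max_features(action_counts):
    """Upper bound N on the feature space.

    Assumes a feature is any non-empty subset of the k_j actions of a
    resource type j, so m_j = 2**k_j - 1 and N is the sum over all types.
    """
    return sum(2 ** k - 1 for k in action_counts.values())


# Hypothetical action counts, kept tiny so the arithmetic stays visible.
toy_counts = {"file": 3, "registry": 2}
print(max_features(toy_counts))  # (2**3 - 1) + (2**2 - 1) = 10
```

The bound depends only on the action catalogue, not on how many samples are analyzed.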

OS Resource Types and Corresponding Actions

The total number of malware features (N) for these six OS resource types is 16,652.

Model Construction Workflow

Example feature vector
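One plausible encoding, shown here only as an assumption since the original figure is not available in this text, is a fixed-length binary vector with one position per possible feature. The helper names `feature_index` and `to_vector` and the two-action catalogue are hypothetical.

```python
from itertools import combinations


def feature_index(actions_by_type):
    """Enumerate every possible feature (non-empty action subset per
    resource type) in a fixed order and give each a vector position."""
    index = {}
    for rtype in sorted(actions_by_type):
        actions = sorted(actions_by_type[rtype])
        for size in range(1, len(actions) + 1):
            for subset in combinations(actions, size):
                index[(rtype, frozenset(subset))] = len(index)
    return index


def to_vector(features, index):
    """Binary vector: 1 where the sample exhibits the feature, else 0."""
    vector = [0] * len(index)
    for feature in features:
        vector[index[feature]] = 1
    return vector


# Hypothetical, deliberately tiny action catalogue.
catalogue = {"file": ["CreateFile", "DeleteFile"]}
index = feature_index(catalogue)
sample = {("file", frozenset({"CreateFile", "DeleteFile"})),
          ("file", frozenset({"DeleteFile"}))}
print(to_vector(sample, index))  # [0, 1, 1]
```

The vector length is fixed by the catalogue (three positions here), so adding more samples never adds columns.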

Detection Method

➢ Machine learning (ML) classification techniques are used to build the malware detection models.

➢ Logistic Regression (LR) and Support Vector Machine (SVM) are used in our experiments.

➢ The malware detection process involves two phases:

• Phase 1: model building

• Phase 2: model evaluation
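The two phases can be sketched with a small logistic regression trained by plain batch gradient descent. This is not the setup used in the experiments: the training routine, hyperparameters and toy vectors below are all assumptions made for illustration, and a real study would normally rely on an established ML library.

```python
import math


def predict_proba(x, weights, bias):
    """Probability that feature vector x belongs to the malware class."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(vectors, labels, epochs=200, lr=0.5):
    """Phase 1: fit weights and bias by batch gradient descent on log loss."""
    n_features = len(vectors[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(vectors, labels):
            error = predict_proba(x, weights, bias) - y
            grad_b += error
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
        weights = [w - lr * g / len(vectors) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(vectors)
    return weights, bias


def evaluate(vectors, labels, weights, bias):
    """Phase 2: fraction of held-out samples classified correctly."""
    hits = sum((predict_proba(x, weights, bias) >= 0.5) == bool(y)
               for x, y in zip(vectors, labels))
    return hits / len(vectors)


# Hypothetical toy data: binary feature vectors, 1 = malware, 0 = benign.
train_x = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
train_y = [1, 1, 0, 0]
test_x = [[1, 1, 0, 0], [0, 0, 1, 1]]
test_y = [1, 0]

weights, bias = train_logistic_regression(train_x, train_y)
accuracy = evaluate(test_x, test_y, weights, bias)
print(accuracy)  # 1.0
```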

Experimental Dataset

✓ A training set of 5000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.

Experimental Results

✓ SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% false-positive (FP) rate.

✓ Each balanced test set consists of 20 malware samples randomly selected from the pool of 300, plus the 20 benign samples.

✓ On the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
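For reference, the two reported quantities can be computed from confusion-matrix counts as follows. The sketch assumes the standard definitions (accuracy over all samples, FP rate over benign samples only); the slides do not spell out their definitions, and the counts below are hypothetical rather than the ones behind the reported results.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, false-positive rate) from confusion-matrix counts.

    tp/fn count malware samples flagged/missed; tn/fp count benign
    samples passed/flagged.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fp_rate = fp / (fp + tn)
    return accuracy, fp_rate


# Hypothetical counts for a 300-malware / 20-benign test set.
accuracy, fp_rate = detection_metrics(tp=297, fp=1, tn=19, fn=3)
print(f"{accuracy:.4f} {fp_rate:.2f}")  # 0.9875 0.05
```

With an unbalanced test set, a handful of benign samples dominates the FP rate while barely moving accuracy, which is one reason to also report results on balanced test sets.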

Comparison with Canali et al. (ISSTA 2012)

❑ Both achieve 99% detection accuracy.

❑ However:

▪ BOFM generated only 569 active features, whereas Canali et al. generated several million.

▪ Extracting malware features took 1.67 hours with BOFM, while Canali et al. took around 48 hours.

▪ Training the SVM classifier took 26 seconds and consumed only 200 MB of RAM, whereas Canali et al.’s approach consumed more than 1 GB of RAM to perform signature matching.

▪ BOFM is much more efficient and scalable.

Conclusion

✓ Malware evades traditional anti-virus signatures by using various obfuscation techniques.

✓ Behavior-based malware detection is an increasingly common solution.

✓ Scalability is a major problem in existing behavior-based malware detection techniques.

✓ We proposed a bounded feature space behavior modeling (BOFM) technique to address the scalability issue.

✓ BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.

✓ Benchmark: BOFM combined with SVM achieved 100% detection accuracy in less than a minute and with 200 MB of memory.

Feature Space Analysis

• Comparison of malware and benign feature spaces

• 57% of the malware features are unique to malware, which suggests that BOFM is a promising technique for modeling malware behavior.
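A comparison of this kind reduces to a set difference. The sketch below assumes the percentage is the share of malware features that never occur in benign samples, which is one reading of the slide; the function name `unique_share` and the feature identifiers are hypothetical placeholders for BOFM action sets.

```python
def unique_share(malware_features, benign_features):
    """Fraction of malware features that never occur in benign samples."""
    unique = malware_features - benign_features
    return len(unique) / len(malware_features)


# Hypothetical feature identifiers standing in for BOFM action sets.
malware = {"m1", "m2", "m3", "shared1"}
benign = {"shared1", "b1"}
print(unique_share(malware, benign))  # 0.75
```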

Brief Analysis of Interesting Features

➢ The ‘NotifyChangeKey’ action is used far more widely by malware samples than by benign samples (86% vs. 15%).

➢ The ‘DeleteKey’ action is also widely used by malware samples.

➢ Actions such as ‘OpenFile’, ‘GetFileAttributes’, ‘CreateMutex’ and ‘ReleaseMutex’ appear widely in both malware and benign samples.

Date post:	24-Dec-2014
Upload:	lionel-briand