CISC 879 - Machine Learning for Solving Systems Problems
Presented by: Akanksha Kaul
Dept of Computer & Information Sciences
University of Delaware
SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging
Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, Min Zhao
MOTIVATION
• Urgent need to detect malicious executables
• Major threats
  Metamorphic executables
    Reprogram themselves
    Capable of infecting two operating systems
  Polymorphic executables
    Disguise themselves as non-malicious code
  Unseen executables
Need of the Hour
• SBMDS: String Based Malware Detection System
• What exactly is this system about?
• Performs interpretable string analysis
Interpretable strings are the strings in a program that include both API execution calls and important semantic strings reflecting the intent and goal of the program writer.
Interpretable String?
• Example: worm “Nimda”
“html script language = ‘javascript’ window.open(‘readme.eml’)”
• Another example:
“&gameid= %s&pass=%s; myparentthreadid=%d; myguid=%s”
• But not all strings are interpretable, e.g.:
“!0&0h0m0o0t0y0”
“*3d%3dtgyhjij”
Major Steps to perform
• Constructing the interpretable strings by developing a feature parser.
• Performing feature selection to select informative strings.
• Using SVM ensemble with bagging to construct the classifier.
• Running the malware detector, which also predicts the exact type of the malware.
Step 1
• Develop a feature parser
39,838 executables collected from the Kingsoft Anti-Virus lab.
All executables are PE files.
Extract static features:
  API calls from the import table.
  Strings carrying semantic interpretation.
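The string half of the feature parser can be illustrated with a minimal sketch. It assumes that a "string" is a run of at least four printable ASCII bytes, the same convention as the Unix `strings` tool; the parser described in the paper may use different rules. The names `extract_strings`, `extract_strings_from_file` and `MIN_LEN` are hypothetical, and reading API calls from the import table would additionally need a PE parser, which is not shown.

```python
import re

# Assumption: a string is a run of at least MIN_LEN printable ASCII bytes.
MIN_LEN = 4
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list:
    """Return every printable ASCII run of at least MIN_LEN bytes, in file order."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def extract_strings_from_file(path: str) -> list:
    """Read an executable from disk and extract its printable strings."""
    with open(path, "rb") as fh:
        return extract_strings(fh.read())


# Toy byte sequence standing in for a PE file (not a real executable).
sample = b"\x00\x01MZ\x90\x00CreateFileA\x00\xff&gameid=%s&pass=%s\x00ab\x00"
strings_found = extract_strings(sample)
print(strings_found)
```

Short runs such as "MZ" and "ab" fall below the length threshold and are dropped, which already removes part of the noise before feature selection.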
SAMPLE (Backdoor-Redgirl.exe)
The string “‘%s’ goto delete” always implies that the malware may generate a “.bat” file to delete itself.
Step 2
• Feature Selection
Selects only the interpretable strings from the huge set of strings obtained in the previous step.
Assigns these strings as signatures of the PE files.
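The slides do not say how strings are ranked during feature selection. As one simple possibility, the sketch below scores each string by the difference between the fraction of malicious files and the fraction of benign files that contain it, and keeps the top-scoring ones. This scoring rule and the name `select_strings` are assumptions for illustration, not the paper's method.

```python
def select_strings(samples, labels, top_k):
    """Rank strings by how unevenly they occur in malicious vs. benign files.

    samples: list of sets of strings, one set per file
    labels:  list of 0/1 flags, 1 = malicious, 0 = benign
    """
    n_mal = sum(labels)
    n_ben = len(labels) - n_mal
    if n_mal == 0 or n_ben == 0:
        raise ValueError("both classes must be present")

    mal_count, ben_count = {}, {}
    for strings, is_malicious in zip(samples, labels):
        counts = mal_count if is_malicious else ben_count
        for s in strings:
            counts[s] = counts.get(s, 0) + 1

    def score(s):
        # Assumed criterion: absolute difference in per-class document frequency.
        return abs(mal_count.get(s, 0) / n_mal - ben_count.get(s, 0) / n_ben)

    vocabulary = set(mal_count) | set(ben_count)
    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(vocabulary, key=lambda s: (-score(s), s))
    return ranked[:top_k]


# Toy data: two malicious files followed by two benign files.
samples = [
    {"goto delete", "kernel32.dll", "readme.eml"},
    {"goto delete", "kernel32.dll"},
    {"kernel32.dll", "Open File"},
    {"kernel32.dll", "Open File"},
]
labels = [1, 1, 0, 0]
selected = select_strings(samples, labels, top_k=2)
print(selected)
```

A string such as "kernel32.dll" that appears equally often in both classes scores zero and is dropped, while strings seen in only one class rank highest.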
Step 3
• Using SVM to classify
Why SVM?
• SVMs have shown state-of-the-art results in classification problems.
Problem: the training complexity of an SVM depends on the size of the data set.
Problem
Training accuracy levels off once the size of the data set reaches 3000
Curse of Dimensionality?
• Problem caused by the exponential growth in the volume of the feature space as the number of dimensions increases.
• How does SVM deal with the “curse of dimensionality”?
• Solution: by using an SVM ensemble with bagging
• What are SVM ensemble and bagging?
3.1 SVM Ensemble with Bagging
• An ensemble is a set of classifiers whose individual decisions are combined in some way to classify new samples.
• The bagging technique is applied to the training set
“BAGGING” (Bootstrap AGGregating)
• Uniform sampling of the training data set, with replacement
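The bagging idea can be sketched as follows. Each ensemble member is trained on its own bootstrap sample (drawn uniformly with replacement) and the members vote on new samples. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM base learner used in the paper; the class `CentroidClassifier` and all function names are hypothetical, and the 2-D points are toy data rather than string features.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Stand-in base learner (nearest class centroid); the paper uses SVMs here."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        return min(sorted(self.centroids), key=lambda lab: sq_dist(self.centroids[lab]))


def bootstrap_sample(X, y, rng):
    """Draw len(X) training examples uniformly at random, with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def train_bagged_ensemble(X, y, n_members, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [
        CentroidClassifier().fit(*bootstrap_sample(X, y, rng))
        for _ in range(n_members)
    ]


def ensemble_predict(members, row):
    """Combine the members' individual decisions by majority vote."""
    votes = Counter(member.predict_one(row) for member in members)
    return votes.most_common(1)[0][0]


# Toy 2-D data: class 0 near the origin, class 1 near (5.5, 5.5).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
members = train_bagged_ensemble(X, y, n_members=5)
print(ensemble_predict(members, [0.5, 0.5]), ensemble_predict(members, [5.5, 5.5]))
```

Because every member sees a different resampled view of the training set, the ensemble can be trained on smaller per-member samples while the vote smooths out individual errors.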
3.2 Multi-Classification
• Various classes of malware.
• “MAJORITY VOTING” is used to combine the classifiers' decisions.
• When two different classes receive identical vote counts, the class with the smallest index is chosen.
1 = Backdoors
2 = Spyware
3 = Trojans
4 = Worms
0 = Benign files
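The voting rule can be written in a few lines. The sketch assumes each ensemble member contributes exactly one class index as its vote; how the paper's SVMs produce those votes is not shown, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

# Class indices as listed on the slide.
CLASS_NAMES = {0: "Benign files", 1: "Backdoors", 2: "Spyware", 3: "Trojans", 4: "Worms"}


def majority_vote(votes):
    """Return the class index with the most votes; ties go to the smallest index."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


# Classes 1 and 3 both receive two votes, so the smaller index (1) wins.
winner = majority_vote([3, 1, 3, 1, 4])
print(winner, CLASS_NAMES[winner])
```

Note that with this numbering a tie involving class 0 is always resolved in favor of "Benign files".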
Step 4: Malware Detection
• Unknown variants of malware are used as input.
• Decide whether a file is malicious or not.
• Determine which class the malware belongs to.
System Architecture
1. Feature Parser
2. Feature Selection
3. SVM Ensemble Classifier
4. Malware Detector
Reason Why I Chose This Paper
• Comparisons with popular anti-virus software.
Points of comparison:
1. Detecting known variants of malware.
2. Detecting unknown variants.
3. Efficiency (detection time).
4. Number of false positive detections.
Conclusion
• This system has already been incorporated into the scanning tool of a commercial anti-virus product.
• The name of the anti-virus product is not disclosed.