Polymorphic Malware Detection

Connor Schnaith, Taiyo Sogawa
9 April 2012

Motivation
• “5000 new malware samples per day”
  o David Perry, Trend Micro

• Large variance between attacks
• Polymorphic attacks
  o Variants perform the same function
  o Altered immediate values or addressing
  o Added extraneous instructions

• Current detection methods are insufficient
  o Signature-based matching is not accurate
  o Behavior-based detection requires human analysis and engineering
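The failure of signature matching on polymorphic variants can be sketched with a toy example. The instruction listings and the `send_mail` call are hypothetical, and the `normalize` step only undoes the two transformations shown here (junk NOPs and changed immediates). It is not a real detection technique.

```python
import re

# Hypothetical instruction listings: a known sample and a polymorphic variant that
# performs the same function but changes an immediate and inserts junk NOPs.
ORIGINAL = ["mov eax, 0x10", "add eax, ebx", "call send_mail"]
VARIANT = ["nop", "mov eax, 0x20", "nop", "add eax, ebx", "call send_mail"]


def signature_match(signature, listing):
    """Naive signature scan: the exact instruction sequence must appear contiguously."""
    n = len(signature)
    return any(listing[i:i + n] == signature for i in range(len(listing) - n + 1))


def normalize(listing):
    """Drop NOPs and replace hex immediates with a placeholder token."""
    return [re.sub(r"0x[0-9a-fA-F]+", "IMM", ins) for ins in listing if ins != "nop"]


print(signature_match(ORIGINAL, VARIANT))                        # False
print(signature_match(normalize(ORIGINAL), normalize(VARIANT)))  # True
```

Register renaming or instruction reordering would defeat this normalization again. That arms race is why the approaches below look at behavior rather than bytes.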

Malware Families
• Classified into related clusters (families)
  o Tracking of development
  o Correlating information
  o Identifying new variants
• Based on similarity of code
• Notable families: Koobface, Bredolab, PoisonIvy, Conficker (7 million infected)

Source: Carrera, Ero, and Peter Silberman. "State of Malware: Family Ties." Media.blackhat.com. 2010. Web. 7 Apr. 2012. <https://media.blackhat.com/bh-eu-10/presentations/Carrera_Silberman/BlackHat-EU-2010-Carrera-Silberman-State-of-Malware-slides.pdf>.

~300 samples of malware with 60% similarity threshold
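The slides do not say how Carrera and Silberman measure code similarity. As an assumed stand-in, the sketch below uses Jaccard similarity over opcode bigrams. Any two samples at or above the 60% threshold are linked, and each connected group is reported as a family (single-linkage clustering via union-find). The sample names and opcode sequences are hypothetical.

```python
from itertools import combinations

# Hypothetical opcode sequences for three samples.
SAMPLES = {
    "a1": ["push", "mov", "call", "xor", "ret"],
    "a2": ["push", "mov", "call", "xor", "ret", "nop"],
    "b1": ["lea", "cmp", "jne", "int", "ret"],
}


def ngrams(seq, n=2):
    """Set of contiguous n-grams (as tuples) from an opcode sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """|a & b| / |a | b|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples, threshold=0.6, n=2):
    """Single-linkage clustering: link pairs with similarity >= threshold,
    then return the connected components as sorted lists of names."""
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    grams = {name: ngrams(seq, n) for name, seq in samples.items()}
    for a, b in combinations(samples, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in samples:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


print(cluster(SAMPLES))  # [['a1', 'a2'], ['b1']]
```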

Current Research
• Techniques for identifying malicious behavior
  o Mining and clustering
  o Building behavior trees
• Industry
  o ThreatFire and Sana Security are developing behavior-based malware detection

Design challenges
• Discerning malicious portions of code
  o Dynamic program slicing
  o Accounting for control-flow dependencies
• Reliable automation
  o Must work reliably without human intervention
  o Minimal false positives
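Dynamic program slicing keeps only the executed steps that a point of interest depends on. The sketch below computes a backward slice over a hypothetical execution trace. It follows data dependencies only and deliberately omits the control-flow dependencies listed above as a design challenge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One executed instruction: locations it writes (defs) and reads (uses)."""
    text: str
    defs: frozenset
    uses: frozenset


def step(text, defs, uses):
    return Step(text, frozenset(defs), frozenset(uses))


# Hypothetical trace: only steps 0, 2 and 4 matter for what gets sent.
TRACE = [
    step("buf = read_file(path)", {"buf"}, {"path"}),
    step("counter = counter + 1", {"counter"}, {"counter"}),
    step("key = decrypt(buf)", {"key"}, {"buf"}),
    step("log = format(counter)", {"log"}, {"counter"}),
    step("send(sock, key)", set(), {"sock", "key"}),
]


def backward_slice(trace, criterion):
    """Indices of steps that the criterion step depends on via data dependencies.
    Control-flow dependencies are left out of this sketch."""
    needed = set(trace[criterion].uses)
    in_slice = {criterion}
    for i in range(criterion - 1, -1, -1):
        if trace[i].defs & needed:      # this step produced a value we still need
            in_slice.add(i)
            needed -= trace[i].defs     # that value is now accounted for
            needed |= trace[i].uses     # but this step's own inputs are now needed
    return sorted(in_slice)


print(backward_slice(TRACE, 4))  # [0, 2, 4]
```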

Holmes: Main Ideas
• Two major tasks
  o Mining significant behaviors from a set of samples
  o Synthesizing an optimally discriminative specification from multiple sets of samples
• Key distinction in approach
  o "Positive" set: malicious samples
  o "Negative" set: benign samples
  o Malware is fully described by the positive set, while not fully described by the negative set

Main ideas: behavior mining
• Extracts portions of the dependence graphs of programs in the positive set that correspond to behaviors significant to the programs' intent
• The algorithm determines which behaviors are significant (using information gain, described below)
• Can be thought of as contrasting the graphs of positive programs against the graphs of negative programs, and extracting the subgraphs that provide the best contrast

Main ideas: behavior mining
• A "behavior" is a data dependence graph
• G = (V, E, α, β)
  o V is the set of vertices, which correspond to operations (system calls)
  o E is the set of edges, which correspond to dependencies between operations
  o α is the labeling function that associates each vertex with the operation it represents
  o β is the labeling function that associates each edge with the logic formula describing its dependency
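The four components of G = (V, E, α, β) map naturally onto plain containers. This is a minimal sketch, not Holmes's actual representation. The self-replication behavior, vertex names, and the `buffer` argument are hypothetical. `NtReadFile` and `NtWriteFile` are Windows native API names used here only as operation labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Invocation = Dict[str, str]   # argument name -> argument value of one call
Edge = Tuple[str, str]        # (source vertex, target vertex)
Formula = Callable[[Invocation, Invocation], bool]


@dataclass
class BehaviorGraph:
    """G = (V, E, alpha, beta) as plain Python containers."""
    V: Set[str]                # vertices, one per operation
    E: Set[Edge]               # dependencies between operations
    alpha: Dict[str, str]      # vertex -> operation it represents
    beta: Dict[Edge, Formula]  # edge -> logic formula over the two invocations

    def is_well_formed(self) -> bool:
        """Every edge joins known vertices, and both labelings are total."""
        return (all(u in self.V and v in self.V for u, v in self.E)
                and set(self.alpha) == self.V
                and set(self.beta) == self.E)


# Hypothetical self-replication behavior: bytes read from the program's own
# image are later written to another file.
replicate = BehaviorGraph(
    V={"read_self", "write_copy"},
    E={("read_self", "write_copy")},
    alpha={"read_self": "NtReadFile", "write_copy": "NtWriteFile"},
    beta={("read_self", "write_copy"):
          lambda src, dst: dst["buffer"] == src["buffer"]},
)

print(replicate.is_well_formed())  # True
```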

Main ideas: behavior mining
• A program P exhibits a behavior G if it can produce an execution trace T with the following properties
  o Every operation in the behavior corresponds to an operation invocation in T whose arguments satisfy certain logical constraints
  o The logic formula on each edge connecting two behavior operations is satisfied by the corresponding pair of operation invocations in T
• Must capture information flow in dependence graphs
  o Two key characteristics
    i. The path taken by the data through the program
    ii. The security labels assigned to the data source and the data sink
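The "exhibits" definition can be checked by brute force on a tiny trace. The sketch tries every injective mapping of behavior vertices onto trace events. It requires the operation names to match, each vertex constraint to hold on the event's arguments, and each edge formula to hold on the mapped (earlier, later) pair. The operation names, argument keys, and the encoding of constraints as Python predicates are hypothetical. The search is exponential and only meant to make the definition concrete.

```python
from itertools import permutations

# Hypothetical behavior: the program's own path is written into the autostart key.
OPS = {"src": "get_own_path", "sink": "set_registry_value"}
CONSTRAINTS = {
    "src": lambda args: True,
    "sink": lambda args: args["key"].endswith(r"\CurrentVersion\Run"),
}
EDGES = {("src", "sink"): lambda a, b: b["data"] == a["path"]}


def exhibits(trace, ops, constraints, edges):
    """trace: list of (operation name, argument dict) in execution order."""
    vertices = list(ops)
    for picks in permutations(range(len(trace)), len(vertices)):
        m = dict(zip(vertices, picks))  # vertex -> trace index
        if not all(trace[m[v]][0] == ops[v] and constraints[v](trace[m[v]][1])
                   for v in vertices):
            continue
        if all(m[u] < m[w] and f(trace[m[u]][1], trace[m[w]][1])
               for (u, w), f in edges.items()):
            return True
    return False


MALICIOUS = [
    ("get_own_path", {"path": r"C:\tmp\evil.exe"}),
    ("open_file", {"path": r"C:\tmp\notes.txt"}),
    ("set_registry_value", {"key": r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
                            "data": r"C:\tmp\evil.exe"}),
]
BENIGN = [
    ("get_own_path", {"path": r"C:\apps\editor.exe"}),
    ("set_registry_value", {"key": r"HKCU\SOFTWARE\Editor\Settings", "data": "dark-mode"}),
]

print(exhibits(MALICIOUS, OPS, CONSTRAINTS, EDGES))  # True
print(exhibits(BENIGN, OPS, CONSTRAINTS, EDGES))     # False
```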

Security Label                 Description
NameOfSelf                     The name of the currently executing program
IsRegistryKeyForBootList       A Windows registry key listing software set to start on boot
IsRegistryKeyForWindows        A registry key that contains configuration settings for the operating system
IsSystemDirectory              The Windows system directory
IsRegistryKeyForBugfix         The Windows registry key containing the list of installed bugfixes and patches
IsRegistryKeyForWindowsShell   The Windows registry key controlling the shell
IsDevice                       A named kernel device
IsExecutableFile               An executable file
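Labels on a flow's data source and sink can be combined into simple rules. The sketch below assigns a few of the labels from the table using hypothetical matching rules. Only the well-known Run key and System32 paths are hard-coded, and files inside System32 are treated as carrying IsSystemDirectory. It then flags a flow when its (source label, sink label) pair matches a rule. The real label definitions in Holmes may differ.

```python
# Hypothetical labeling rules; real label definitions may differ.
RUN_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"
SYSTEM_DIR = r"C:\Windows\System32"
DEVICE_PREFIX = "\\\\.\\"  # Win32 device namespace prefix: \\.\


def security_labels(resource, self_name):
    """Assign security labels to a file path or registry key (case-insensitive)."""
    low, labels = resource.lower(), set()
    if low == self_name.lower():
        labels.add("NameOfSelf")
    if low.startswith(RUN_KEY.lower()):
        labels.add("IsRegistryKeyForBootList")
    # The directory itself and anything inside it.
    if low == SYSTEM_DIR.lower() or low.startswith(SYSTEM_DIR.lower() + "\\"):
        labels.add("IsSystemDirectory")
    if low.startswith(DEVICE_PREFIX):
        labels.add("IsDevice")
    if low.endswith(".exe"):
        labels.add("IsExecutableFile")
    return labels


# Hypothetical rules: (source label, sink label) pairs whose flow is suspicious.
RULES = [("NameOfSelf", "IsSystemDirectory"), ("NameOfSelf", "IsRegistryKeyForBootList")]


def flags_flow(source, sink, self_name, rules=RULES):
    """True if data flowing from source to sink matches any labeled rule."""
    src, dst = security_labels(source, self_name), security_labels(sink, self_name)
    return any(s in src and d in dst for s, d in rules)


SELF = r"C:\tmp\evil.exe"
print(flags_flow(SELF, r"C:\Windows\System32\svch0st.exe", SELF))               # True
print(flags_flow(r"C:\tmp\notes.txt", r"C:\Windows\System32\svch0st.exe", SELF))  # False
```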

Main ideas: behavior mining
• Information gain is used to determine whether a behavior is significant; behaviors that are not significant are ignored when constructing the dependence graph
• Information gain is defined in terms of Shannon entropy: it measures how much knowing whether a graph contains a behavior increases the accuracy of deciding whether that graph G is in G+ or G-
• Shannon entropy
  o H(G+ ∪ G-) corresponds to the uncertainty that a graph G belongs to G+ or G-
  o Partitioning G+ ∪ G- into smaller subsets decreases that uncertainty
  o The partition is computed by testing each graph for the candidate behavior (a subgraph isomorphism check)
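Written out, the quantities above take the following form. This assumes Holmes uses the standard (Quinlan-style) definition of information gain, with the convention 0 log 0 = 0 and H of an empty set equal to 0. S_g is the set of graphs that contain the candidate behavior g.

```latex
\begin{aligned}
S &= G^{+} \cup G^{-}, \qquad
S_g = \{\, G \in S : g \text{ is a subgraph of } G \,\}, \qquad
S_{\bar g} = S \setminus S_g \\
H(X) &= -\,p_X \log_2 p_X - (1 - p_X)\log_2(1 - p_X), \qquad
p_X = \frac{|X \cap G^{+}|}{|X|} \\
\mathrm{Gain}(S, g) &= H(S) - \frac{|S_g|}{|S|}\, H(S_g) - \frac{|S_{\bar g}|}{|S|}\, H(S_{\bar g})
\end{aligned}
```

Worked example: with |G+| = |G-| = 4, p = 1/2 and H(S) = 1 bit. If g occurs in all four positive graphs and in no negative graph, then S_g and S_ḡ are both pure, so both conditional entropies are 0 and Gain(S, g) = 1 bit, the largest possible value.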

Main ideas: behavior mining
• A significant behavior g is a subgraph of a dependence graph in G+ such that Gain(G+ ∪ G-, g) is maximized
• Information gain is used as the quality measure to guide the behavior mining process
• Some non-significant actions can be mistakenly selected as significant
  o These actions may or may not throw off the algorithm that determines whether the program is malicious
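Selecting the behavior that maximizes Gain(G+ ∪ G-, g) can be sketched with a simplification. Each dependence graph and each candidate behavior is a set of labeled edges, so "contains g" becomes a subset test rather than a true subgraph isomorphism check. Enumerating candidates is also left out. The edge labels are hypothetical and loosely echo the Ldpinch behaviors discussed next.

```python
from math import log2


def entropy(pos, neg):
    """Shannon entropy (bits) of a set with pos positive and neg negative graphs."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def gain(positives, negatives, g):
    """Gain(G+ ∪ G-, g): entropy reduction from splitting on 'contains behavior g'.
    Containment is a subset test on edge sets (simplified subgraph isomorphism)."""
    pos_in = sum(g <= G for G in positives)
    neg_in = sum(g <= G for G in negatives)
    pos_out, neg_out = len(positives) - pos_in, len(negatives) - neg_in
    total = len(positives) + len(negatives)
    remainder = ((pos_in + neg_in) * entropy(pos_in, neg_in)
                 + (pos_out + neg_out) * entropy(pos_out, neg_out)) / total
    return entropy(len(positives), len(negatives)) - remainder


def most_significant(positives, negatives, candidates):
    """The candidate behavior g that maximizes the information gain."""
    return max(candidates, key=lambda g: gain(positives, negatives, g))


# Hypothetical labeled edges (source operation, sink operation).
LEAK = ("read_bugfix_key", "send_network")
AUTOSTART = ("get_own_path", "set_run_key")
OPEN = ("open_file", "read_file")

POSITIVES = [frozenset({LEAK, AUTOSTART, OPEN}), frozenset({LEAK, OPEN}),
             frozenset({LEAK, AUTOSTART})]
NEGATIVES = [frozenset({OPEN}), frozenset({OPEN, AUTOSTART})]
CANDIDATES = [frozenset({e}) for e in (LEAK, AUTOSTART, OPEN)]

print(most_significant(POSITIVES, NEGATIVES, CANDIDATES) == frozenset({LEAK}))  # True
```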

Main ideas: behavior mining
• Significant behaviors mined from the malware Ldpinch
  o Leaking bugfix information over the network
  o Adding a new entry to the system autostart list
  o Bypassing the firewall to allow malicious traffic
• Could say any program that exhibits all three of these behaviors should be flagged as malicious
  o This statement is too specific
    i. It doesn't account for variations within a family
    ii. Smaller subsets of these behaviors, even ones that include only one of these actions, can still be malicious
    iii. Need discriminative specifications

Main ideas: discriminative specifications
• Groups behaviors into characteristic subsets
  o A program matches the specification if it matches all of the behaviors in a subset
  o "Discriminative" in that it matches the malicious programs but not the benign ones
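The matching rule and the "discriminative" property can be sketched directly. Here a specification is a list of behavior subsets, and a program matches when it exhibits every behavior of at least one subset. The behavior names loosely follow the Ldpinch example, but the samples are hypothetical. The example shows why requiring all three behaviors is too specific.

```python
def matches(spec, behaviors):
    """spec is a list of behavior subsets; a program matches if it exhibits
    every behavior of at least one subset."""
    return any(subset <= behaviors for subset in spec)


def is_discriminative(spec, malicious, benign):
    """True if spec matches every malicious sample and no benign sample."""
    return (all(matches(spec, b) for b in malicious)
            and not any(matches(spec, b) for b in benign))


# Hypothetical samples, described by the behaviors each one exhibits.
MALICIOUS = [
    frozenset({"leak_bugfix_info", "add_autostart", "bypass_firewall"}),
    frozenset({"leak_bugfix_info"}),
    frozenset({"add_autostart", "bypass_firewall"}),
]
BENIGN = [frozenset({"add_autostart"}), frozenset()]

TOO_SPECIFIC = [frozenset({"leak_bugfix_info", "add_autostart", "bypass_firewall"})]
SPEC = [frozenset({"leak_bugfix_info"}), frozenset({"add_autostart", "bypass_firewall"})]

print(is_discriminative(TOO_SPECIFIC, MALICIOUS, BENIGN))  # False: misses variants
print(is_discriminative(SPEC, MALICIOUS, BENIGN))          # True
```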

Main ideas: discriminative specifications
• Each subset of behaviors induces a cluster of samples
  o Malicious and benign samples are organized into these clusters
  o Goal: find an optimal clustering that places the malicious samples in the positive subset and the benign samples in the negative subset

Main ideas: discriminative specifications
• Three-part algorithm
  o Formal concept analysis
  o Simulated annealing
  o Constructing optimal specifications
• Formal concept analysis
  o O is a cluster of samples
  o A is the set of mined behaviors exhibited by every sample in O
  o A concept is the pair (A, O)

Set of concepts: {c1, c2, c3, ..., cN}
Behavior specification: S(c1, c2, c3, ..., cN)

Main ideas: discriminative specifications
Formal Concept Analysis (continued)
• Begins by constructing concepts and computing the pairwise intersections of their intent (behavior) sets
• Repeated until a fixpoint is reached and no new concepts can be constructed
• When the algorithm terminates, the result is an explicit listing of all of the sample clusters that can be specified in terms of one or more mined behaviors
• Goal is to find {c1, c2, c3, ..., cN} such that S(c1, c2, c3, ..., cN) is optimal (based on a threshold)
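The fixpoint construction can be sketched in a few lines. The sketch starts from each sample's behavior set, closes the collection under pairwise intersection, and then computes each intent's extent. Under these assumptions the result is every concept with a non-empty extent. The sample names and behaviors are hypothetical.

```python
def concepts(samples):
    """samples: name -> frozenset of mined behaviors.
    Returns every (intent A, extent O) concept with a non-empty extent."""
    intents = set(samples.values())
    while True:  # close under pairwise intersection until a fixpoint
        new = {a & b for a in intents for b in intents} - intents
        if not new:
            break
        intents |= new
    return [
        (A, frozenset(name for name, behaviors in samples.items() if A <= behaviors))
        for A in intents
    ]


# Hypothetical mined behaviors per sample.
SAMPLES = {
    "m1": frozenset({"leak", "autostart"}),
    "m2": frozenset({"leak", "firewall"}),
    "b1": frozenset({"autostart"}),
}

for A, O in concepts(SAMPLES):
    print(sorted(A), sorted(O))
```

The printed order depends on set iteration order. The concept ({leak}, {m1, m2}) is the kind of cluster that a later step can select as part of a specification.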

Main ideas: discriminative specifications
Simulated annealing
• Probabilistic technique for finding an approximate solution to a global optimization problem
• At each step, a candidate solution i is examined and one of its neighbors j is selected for comparison
• The algorithm moves to j with some probability, which depends on how much better or worse j is and on the current temperature
• A cooling parameter (temperature) T is reduced throughout the process; when it reaches a minimum, the process stops
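The slides do not give Holmes's acceptance rule or cooling schedule. The generic sketch below assumes the common Metropolis rule: always accept an improvement, and accept a worse neighbor with probability exp(-delta / T). It also assumes geometric cooling. The toy problem, minimizing (x - 7)² over the integers 0..20, is hypothetical.

```python
import math
import random


def simulated_annealing(initial, neighbor, cost, t_start=10.0, t_min=1e-3,
                        cooling=0.95, steps_per_temp=20, seed=0):
    """Minimize cost(x), returning the best solution seen."""
    rng = random.Random(seed)
    current = best = initial
    t = t_start
    while t > t_min:                     # stop once the temperature hits its minimum
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            delta = cost(candidate) - cost(current)
            # Metropolis rule: take improvements; take worse moves with prob. exp(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = candidate
                if cost(current) < cost(best):
                    best = current
        t *= cooling                     # geometric cooling
    return best


def step(x, rng):
    """Neighbor of x: move one unit left or right, staying within 0..20."""
    return min(20, max(0, x + rng.choice((-1, 1))))


def cost(x):
    return (x - 7) ** 2


print(simulated_annealing(0, step, cost))
```

Early on, the high temperature lets the search accept worse moves and escape local minima. As T falls, it behaves more and more like greedy descent.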

Main ideas: discriminative specifications
Constructing Optimal Specifications
• Inputs: a threshold t, a set containing positive and negative samples, and a set of behaviors mined by the previous process
• Called SpecSynth
  o Constructs the full set of concepts
  o Removes redundant concepts
  o Runs simulated annealing until convergence, then returns the best solution
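The outline of SpecSynth above can be sketched under several labeled assumptions. Where SpecSynth uses simulated annealing, this sketch swaps in exhaustive search, which is feasible only for a handful of concepts. "Redundant" is hypothetically taken to mean concepts matching no positive sample. The objective is also assumed: cover as many positives as possible while matching at most a fraction t of the negatives, breaking ties by fewer false positives and then fewer concepts. The samples are hypothetical.

```python
from itertools import combinations


def all_concepts(samples):
    """Close per-sample behavior sets under intersection (formal concept analysis);
    return (intent, extent) pairs, skipping the empty intent, which matches everything."""
    intents = set(samples.values())
    while True:
        new = {a & b for a in intents for b in intents} - intents
        if not new:
            break
        intents |= new
    return [(A, frozenset(n for n, s in samples.items() if A <= s)) for A in intents if A]


def spec_synth(t, positives, negatives):
    """Return the intents of the best concept set under the assumed objective."""
    pos, neg = frozenset(positives), frozenset(negatives)
    # Hypothetical redundancy rule: drop concepts that match no positive sample.
    concepts = [(A, O) for A, O in all_concepts({**positives, **negatives}) if O & pos]
    best, best_key = [], None
    for k in range(1, len(concepts) + 1):
        for chosen in combinations(concepts, k):  # exhaustive stand-in for annealing
            covered = frozenset().union(*(O for _, O in chosen))
            fp = len(covered & neg)
            if neg and fp / len(neg) > t:
                continue
            key = (len(covered & pos), -fp, -k)
            if best_key is None or key > best_key:
                best, best_key = [A for A, _ in chosen], key
    return best


POSITIVES = {
    "m1": frozenset({"leak", "autostart"}),
    "m2": frozenset({"leak", "firewall"}),
    "m3": frozenset({"autostart", "firewall"}),
}
NEGATIVES = {
    "b1": frozenset({"autostart"}),
    "b2": frozenset({"firewall", "update"}),
}

print(sorted(sorted(A) for A in spec_synth(0.0, POSITIVES, NEGATIVES)))
# [['autostart', 'firewall'], ['leak']]
```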

Holmes: Mining and Clustering

Evaluation and Results: Holmes
• Used six malware families to develop specifications
• Tested the final product against 19 malware families
• Collected 912 malware samples and 49 benign samples

Holmes Continued
• Experiments carried out over varying threshold values (t)
• System accuracy is highly sensitive to t
• Perhaps only efficient for a specific subset of malware

Holmes Scalability
• Worst-case complexity is exponential
• Behaviors of repeated executions (Stration and Delf) took 12-48 hours to analyze
• Scalability for Holmes is a nightmare!

“scary and scaled”

USENIX
• The Advanced Computing Systems Association (Unix Users Group)
• 2009 article: automatic behavior matching
  o Behavior graphs (slices)
  o Tracking data and control dependencies
  o Matching functions
  o Performance evaluations

Source: Kolbitsch, Clemens. "Effective and Efficient Malware Detection at the End Host." Usenix Security Symposium (2009). Web. 8 Apr. 2012. <http://www.iseclab.org/papers/usenix_sec09_slicing.pdf>.

USENIX: Producing Behavior Graphs
• Instruction log
  o Traces instruction dependencies
  o Slicing doesn't reflect stack manipulation
• Memory log
  o Records the memory locations accessed

Partial behavior graph of Netsky (Kolbitsch et al.)

USENIX: Behavior Slices to Functions
• Use the instruction and memory logs to determine input arguments
• Identify repeated instructions as loops
• Include memory read functions
• The resulting functions can then be compared to known malware
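Determining input arguments from the logs can be sketched with a simplified rule, which is an assumption here and not necessarily the exact rule Kolbitsch et al. use. A location is an input if the slice reads it before any instruction in the slice has written it. The sliced log, addresses, and register names are hypothetical.

```python
# Hypothetical sliced log: each entry is (locations read, locations written),
# in execution order, for one instruction kept by the slice.
SLICE = [
    (("mem:0x1000",), ("eax",)),          # mov eax, [0x1000]
    (("eax", "mem:0x1004"), ("eax",)),    # add eax, [0x1004]
    (("eax",), ("mem:0x2000",)),          # mov [0x2000], eax
    (("mem:0x2000",), ("ebx",)),          # mov ebx, [0x2000]
]


def input_locations(slice_log):
    """Locations read before any instruction in the slice has written them; these
    are treated as the input arguments of the extracted function."""
    written, inputs = set(), []
    for reads, writes in slice_log:
        for loc in reads:                 # an instruction reads before it writes
            if loc not in written and loc not in inputs:
                inputs.append(loc)
        written.update(writes)
    return inputs


print(input_locations(SLICE))  # ['mem:0x1000', 'mem:0x1004']
```

The final read of mem:0x2000 is not an input, because the slice itself produced that value one instruction earlier.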

Evaluation
• Six families used for development (mostly mass-mailing worms)

Expanded test set

Performance Evaluation
• Installed Internet Explorer, Firefox, Thunderbird, PuTTY, and Notepad on a Windows XP test machine
• Single-core 1.8 GHz Pentium 4 processor, 1 GB RAM

USENIX Limitations
• Evading the system emulator
  o The USENIX detector uses the QEMU emulator
  o Delays
  o Time-triggered behavior
  o Command and control mechanisms
• Modifying the algorithm's behavior
  o A more fundamental change, but the modified malware cannot be detected using the same signatures
• End-host based system
  o Cannot track network activity

Questions/Discussion
