Polymorphic Malware Detection
Connor Schnaith, Taiyo Sogawa
9 April 2012
Motivation
• “5000 new malware samples per day”
  • -- David Perry of Trend Micro
• Large variance between attacks
• Polymorphic attacks
  • Perform the same function
  • Altered immediate values or addressing
  • Added extraneous instructions
• Current detection methods insufficient
  • Signature-based matching not accurate
  • Behavioral-based detection requires human analysis and engineering
Malware Families
• Classified into related clusters (families)
  • Tracking of development
  • Correlating information
  • Identifying new variants
• Based on similarity of code
  • Koobface
  • Bredolab
  • PoisonIvy
  • Conficker (7 million infected)
Source: Carrera, Ero, and Peter Silberman. "State of Malware: Family Ties." Media.blackhat.com. 2010. Web. 7 Apr. 2012. <https://media.blackhat.com/bh-eu-10/presentations/Carrera_Silberman/BlackHat-EU-2010-Carrera-Silberman-State-of-Malware-slides.pdf>.
~300 samples of malware with 60% similarity threshold
Current Research
• Techniques for identifying malicious behavior
  • Mining and clustering
  • Building behavior trees
• Industry
  • ThreatFire and Sana Security developing behavioral-based malware detection
Design challenges
• Discerning malicious portions of code
o Dynamic program slicing
o Accounting for control flow dependencies
• Reliable automation
o Must be reliable without human intervention
o Minimal false positives
Holmes: Main Ideas
• Two major tasks
o Mining significant behaviors from a set of samples
o Synthesizing an optimally discriminative specification from multiple sets of samples
• Key distinction in approach
o "positive" set - malicious
o "negative" set - benign
o Malware: fully described in the positive set, while not fully described in the negative set
Main Ideas: behavior mining
• Extracts portions of the dependence graphs of programs from the positive set that correspond to behaviors that are significant to the programs’ intent.
• The algorithm determines what behaviors are significant (next slide)
• Can be thought of as contrasting the graphs of positive programs against the graphs of negative programs, and extracting the subgraphs that provide the best contrast.
Main ideas: behavior mining
• A "behavior" is a data dependence graph
• G = (V, E, a, B)
o V is the set of vertices that correspond to operations (system calls)
o E is the set of edges, which correspond to dependencies between operations
o a is the labeling function that associates nodes with the operations they represent
o B is the labeling function that associates the edges with the logic that represents the dependencies
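The tuple G = (V, E, a, B) can be sketched as a small data structure. This is only an illustrative encoding, not the representation used by Holmes: the class name, the method names, and the two example system calls are assumptions chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorGraph:
    """Illustrative container for G = (V, E, a, B); names are hypothetical."""
    vertices: set = field(default_factory=set)     # V: operation nodes
    edges: set = field(default_factory=set)        # E: (src, dst) dependencies
    op_label: dict = field(default_factory=dict)   # a: vertex -> operation
    dep_label: dict = field(default_factory=dict)  # B: edge -> dependency logic

    def add_operation(self, v, operation):
        self.vertices.add(v)
        self.op_label[v] = operation

    def add_dependency(self, src, dst, constraint):
        # An edge may only connect operations that are already in V.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added first")
        self.edges.add((src, dst))
        self.dep_label[(src, dst)] = constraint


# Example: a registry key is opened, then a value is written through its handle.
g = BehaviorGraph()
g.add_operation(1, "NtOpenKey")
g.add_operation(2, "NtSetValueKey")
g.add_dependency(1, 2, "handle returned by 1 == handle argument of 2")
```

Here the edge label is kept as a plain string; a real system would need a machine-checkable formula.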
Main ideas: behavior mining
• A program P exhibits a behavior G if it can produce an execution trace T with the following properties
o Every operation in the behavior corresponds to an operation invocation and its arguments satisfy certain logical constraints
o The logic formula on edges connecting behavior operations is satisfied by a corresponding pair of operation invocations in the trace
• Must capture information flow in dependence graphs
o Two key characteristics
  - the path taken by the data in the program
  - security labels assigned to the data source and the data sink
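The "exhibits" check above can be sketched as a brute-force search over a recorded trace. This is a minimal sketch under assumptions: the trace encoding, the operation names, and the requirement that a dependency source occurs before its sink in the trace are all hypothetical choices made for illustration, and the exhaustive search is only practical for tiny examples.

```python
from itertools import permutations


def exhibits(trace, nodes, edges):
    """Return True if some assignment of trace events to behavior nodes
    satisfies every node constraint and every edge formula.

    trace: list of (operation_name, args_dict) invocations
    nodes: {node_id: (operation_name, arg_predicate)}
    edges: {(src, dst): predicate(src_args, dst_args)}
    """
    node_ids = list(nodes)
    for picked in permutations(range(len(trace)), len(node_ids)):
        assign = dict(zip(node_ids, picked))
        nodes_ok = all(
            trace[assign[n]][0] == op and pred(trace[assign[n]][1])
            for n, (op, pred) in nodes.items()
        )
        if not nodes_ok:
            continue
        edges_ok = all(
            assign[s] < assign[d]  # assumption: source occurs before sink
            and pred(trace[assign[s]][1], trace[assign[d]][1])
            for (s, d), pred in edges.items()
        )
        if edges_ok:
            return True
    return False


# Hypothetical trace: a key on the boot list is opened and then written to.
trace = [
    ("read_file", {"path": "self.exe", "out": "buf1"}),
    ("open_key", {"key": "bootlist", "out": "h1"}),
    ("set_value", {"handle": "h1", "data": "buf1"}),
]
nodes = {
    "a": ("open_key", lambda args: args["key"] == "bootlist"),
    "b": ("set_value", lambda args: True),
}
edges = {("a", "b"): lambda src, dst: src["out"] == dst["handle"]}
result = exhibits(trace, nodes, edges)
```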
Security Label                  Description
NameOfSelf                      The name of the currently executing program
IsRegistryKeyForBootList        A Windows registry key listing software set to start on boot
IsRegistryKeyForWindows         A registry key that contains configuration settings for the operating system
IsSystemDirectory               The Windows system directory
IsRegistryKeyForBugfix          The Windows registry key containing the list of installed bugfixes and patches
IsRegistryKeyForWindowsShell    The Windows registry key controlling the shell
IsDevice                        A named kernel device
IsExecutableFile                Executable file
Main ideas: behavior mining
• Information gain is used to determine if a behavior is significant. A behavior that is not significant is ignored when constructing the dependency graph
• Information gain is defined in terms of Shannon entropy: it measures how much additional information a behavior provides for determining whether a graph G is in G+ or G-
• Shannon entropy
o H(G+ U G-) corresponds to the uncertainty that a graph G belongs to G+ or G-
o Partition G+ and G- into smaller subsets to decrease that uncertainty
o The partitioning is done with subgraph isomorphism tests (does a graph contain a candidate subgraph?)
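The entropy and gain computation can be sketched with the textbook definitions. This assumes the standard two-class form of information gain, with the split taken on whether a graph contains a candidate subgraph g; the function names and the counts in the example are made up for illustration.

```python
import math


def entropy(pos, neg):
    """Shannon entropy in bits of a set with `pos` malicious and `neg` benign graphs."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(pos_with, neg_with, pos_total, neg_total):
    """Entropy reduction from splitting G+ U G- by whether a graph contains g.

    pos_with / neg_with: graphs in G+ / G- that contain the subgraph g.
    """
    total = pos_total + neg_total
    n_with = pos_with + neg_with
    n_without = total - n_with
    before = entropy(pos_total, neg_total)
    after = (n_with / total) * entropy(pos_with, neg_with) + (
        n_without / total
    ) * entropy(pos_total - pos_with, neg_total - neg_with)
    return before - after


# g appears in all 4 malicious graphs and in no benign graph: perfect split.
perfect = information_gain(4, 0, 4, 4)   # 1.0 bit
# g appears in half of each set: the split tells us nothing.
useless = information_gain(2, 2, 4, 4)   # 0.0 bits
```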
Main ideas: behavior mining
• A significant behavior g is a subgraph of a dependence graph in G+ such that:
  Gain(G+ U G-, g) is maximized
• Information gain is used as the quality measure to guide the behavior mining process
• Some non-significant actions can be mistaken for significant ones
o These actions may or may not throw off the algorithm that determines if the program is malicious
Main ideas: behavior mining
• Significant behaviors mined from malware Ldpinch
o Leaking bugfix information over the network
o Adding a new entry to the system autostart list
o Bypassing firewall to allow for malicious traffic
• Could say any program that exhibits all three of these behaviors should be flagged malicious
o This is too specific of a statement
  i. Doesn't account for variations within a family
  ii. It is known that smaller subsets of behaviors that only include one of these actions could still be malicious
  iii. Need discriminative specifications
Main ideas: discriminative specifications
• Creates clusters of behaviors, each of which can be classified as a characteristic subset
o Program matches specification if it matches all of the behaviors in a subset
o "Discriminative" in that it matches the malicious but not the benign programs
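The matching rule above can be sketched in a few lines. The behavior names and the two subsets in the example specification are hypothetical, loosely based on the Ldpinch behaviors; they are not taken from the Holmes results.

```python
def matches_specification(observed_behaviors, specification):
    """A program matches if it exhibits ALL behaviors of at least one subset.

    specification: iterable of behavior subsets (sets of behavior names).
    """
    observed = set(observed_behaviors)
    return any(subset <= observed for subset in specification)


# Hypothetical specification with two characteristic subsets.
spec = [
    frozenset({"leak_bugfix_info", "add_autostart_entry"}),
    frozenset({"bypass_firewall"}),
]
malicious = {"add_autostart_entry", "leak_bugfix_info", "open_window"}
benign = {"add_autostart_entry", "open_window"}

print(matches_specification(malicious, spec))  # True
print(matches_specification(benign, spec))     # False
```

Adding an autostart entry alone does not trigger a match, which is what lets the specification stay discriminative against benign installers.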
Main ideas: discriminative specifications
• Each subset of behaviors induces a cluster of samples
o Mined malicious and benign samples are organized into these clusters
o Goal: find an optimal clustering technique to organize the malicious into the positive subset and the benign into negative subset
Main ideas: discriminative specifications
• Three part algorithm
o Formal concept analysis
o Simulated annealing
o Constructing optimal specifications
• Formal concept analysis
o O is a cluster of samples
o A is the set of mined behaviors in O
o A concept is the pair (A, O)
Set of concepts: {c1, c2, c3, ... , cN}
Behavior specification: S(c1, c2, c3, ... , cN)
Main ideas: discriminative specifications
Formal Concept Analysis (continued)
• Begins by constructing all concepts and computes pairwise intersection of the intent sets of these concepts
• Repeated until a fixpoint is reached and no new concepts can be constructed
• When algorithm terminates, left with an explicit listing of all of the sample clusters that can be specified in terms of one or more mined behaviors
• Goal is to find {c1, c2, c3, ... , cN} such that S(c1, c2, c3, ... , cN) is optimal (based on threshold)
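The fixpoint construction can be sketched as follows. This is a simplified illustration, assuming each sample starts as its own intent set and that empty intents are discarded (the slide speaks of "one or more mined behaviors"); the function name and sample data are hypothetical.

```python
def build_concepts(sample_behaviors):
    """Close the samples' behavior sets under pairwise intersection.

    sample_behaviors: {sample_name: set of mined behaviors}
    Returns a list of concepts (A, O): A is the intent (behaviors),
    O is the extent (every sample exhibiting all behaviors in A).
    """
    intents = {frozenset(b) for b in sample_behaviors.values() if b}
    changed = True
    while changed:  # repeat until no new intent appears (fixpoint)
        changed = False
        for a in list(intents):
            for b in list(intents):
                common = a & b
                if common and common not in intents:
                    intents.add(common)
                    changed = True
    concepts = []
    for intent in intents:
        extent = frozenset(
            name for name, behaviors in sample_behaviors.items()
            if intent <= behaviors
        )
        concepts.append((intent, extent))
    return concepts


samples = {
    "m1": {"b1", "b2"},
    "m2": {"b2", "b3"},
    "m3": {"b1", "b3"},
}
concepts = build_concepts(samples)
by_intent = dict(concepts)
```

In this example the three original behavior sets produce three further single-behavior intents, for instance {"b2"} with the cluster {"m1", "m2"}.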
Main ideas: discriminative specifications
Simulated annealing
• Probabilistic technique for finding an approximate solution to a global optimization problem
• At each step, a candidate solution i is examined and one of its neighbors j is selected for comparison
• The algorithm moves to j with some probability
• A cooling parameter T is reduced throughout the process; when it reaches a minimum, the process stops
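A generic simulated annealing loop might look like the sketch below. The Metropolis acceptance rule exp(-delta / T), the geometric cooling schedule, and all parameter values are assumptions; the slide only says the move is taken "with some probability". The toy cost function stands in for the specification quality measure.

```python
import math
import random


def simulated_annealing(initial, neighbor, cost,
                        t_start=1.0, t_min=1e-3, cooling=0.95, seed=0):
    """Minimize `cost` starting from `initial`; returns the best state seen."""
    rng = random.Random(seed)
    current = best = initial
    t = t_start
    while t > t_min:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
        if cost(current) < cost(best):
            best = current
        t *= cooling  # cooling parameter T is reduced each step
    return best


# Toy problem: find an integer close to 3 by stepping +/- 1.
def toy_cost(x):
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


found = simulated_annealing(20, toy_neighbor, toy_cost)
```

Tracking the best state separately means the returned solution is never worse than the starting point, even if the walk wanders late in the run.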
Main ideas: discriminative specifications
Constructing Optimal Specifications
• Inputs: threshold t, a set containing positive and negative samples, and a set of behaviors mined with the previous process
• Called SpecSynth
o Constructs full set of concepts
o Removes redundant concepts
o Runs simulated annealing until convergence, then returns the best solution
Holmes: Mining and Clustering
Evaluation and Results: Holmes
• Used six malware families to develop specifications
• Tested final product against 19 malware families
• Collected 912 malware samples and 49 benign samples
Holmes Continued
• Experiments carried out over varying threshold values (t)
• Demonstrates high sensitivity to system accuracy
• Perhaps only efficient for a specific subset of malware
Holmes Scalability
• Worst-case complexity is exponential
• Behaviors of repeated executions (Stration and Delf) took 12-48 hours to analyze
• Scalability for Holmes is a nightmare!
“scary and scaled”
USENIX
• The Advanced Computing Systems Association
• (Unix Users Group)
• 2009 article: automatic behavior matching
o Behavior graphs (slices)
o Tracking data and control dependencies
o Matching functions
o Performance evaluations
Source: Kolbitsch, Clemens. "Effective and Efficient Malware Detection at the End Host." Usenix Security Symposium (2009). Web. 8 Apr. 2012. <http://www.iseclab.org/papers/usenix_sec09_slicing.pdf>.
USENIX: Producing Behavior Graphs
• Instruction log
o Trace instruction dependencies
o Slicing doesn't reflect stack manipulation
• Memory log
o Access memory locations
Partial behavior graph of Netsky (Kolbitsch et al)
USENIX: Behavior Slices to Functions
• Use instruction and memory log to determine input arguments
• Identify repeated instructions as loops
• Include memory read functions
• We can now compare to known malware
Evaluation
Six families used for development (mostly mass-mailing worms)
Expanded test set
Performance Evaluation
• Installed Internet Explorer, Firefox, Thunderbird, PuTTY, and Notepad on Windows XP test machine
• Single-core, 1.8 GHz, 1GB RAM, Pentium 4 processor
USENIX Limitations
• Evading system emulator
o USENIX detector uses QEMU emulator
o delays
o time-triggered behavior
o command and control mechanisms
• Modifying the algorithm's behavior
o A more fundamental change, but cannot be detected using same signatures
• End-host based system
o Cannot track network activity
Questions/Discussion