Impeding Malware Analysis Using
Conditional Code Obfuscation
Paper by: Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee
Conference: Network and Distributed System Security Symposium (NDSS), 2008
Presented by: LIU Limin
Outline
– Introduction
– Conditional Code Obfuscation
– Implications
– Implementation and Evaluation
– Discussion
Introduction
Hundreds of new malware samples appear every day.
– Trojans, rootkits, worms, viruses, backdoors, …
Automated malware analysis is becoming increasingly important.
– Static analysis
– Dynamic analysis
– State-of-the-art analyzers
Malware Analysis
Offense
– Polymorphism, metamorphism, and opaque predicates
– Trigger-based behavior (time bombs, logic bombs, bot commands, etc.)
Defense
– Static analysis
– Dynamic analysis
– Input-oblivious analyzers (dynamic multi-path exploration, forced execution)
Obfuscation
Obfuscations that can easily be applied to existing code are a threat.
Conditional Code Obfuscation: a simple, automated, and transparent obfuscation against powerful input-oblivious analyzers.
Conditional Code Snippets
E.g. 1:
    cmd = get_command(sock);
    if (strcmp(cmd, "startkeylogger") == 0) {
        log_keys();
    }
E.g. 2:
    n = get_day_of_month();
    if ((n > 10) && (n < 20)) {
        attack();
    }
Obfuscated example snippet
Original code:
    cmd = get_command(sock);
    if (strcmp(cmd, "startkeylogger") == 0) {
        log_keys();
    }
Obfuscated code (hash is a one-way function):
    cmd = get_command(sock);
    if (hash(cmd) == H) {   /* here, H = hash("startkeylogger") */
        decrypt_function(encr_log_keys, cmd);
        encr_log_keys();    /* encrypted log_keys */
    }
General Obfuscation Mechanism
Hash properties
– Pre-image resistance: given Hc, it is infeasible to find c such that Hash(c) = Hc.
– Second pre-image resistance: given c, it is hard to find a different c′ for which Hash(c′) = Hc.
Candidate conditions
– Equality operators: '==', strcmp, strncmp, memcmp, …
– Unsupported operators: '>', '<', …
Conditional code
– Code that is executed only when a candidate condition is satisfied.
Automation using Static Analysis
Finding Conditional Code
– Identify candidate conditions
  Construct a CFG for each function.
  Identify basic blocks that end in conditional branches.
  Select the candidate conditions that contain equality operators.
– Find the corresponding conditional code
  Intra-procedural: basic blocks that are control dependent on the condition's true outcome.
  Inter-procedural: the set of functions that are reachable only when the condition is satisfied.
Automation using Static Analysis
Handling Common Conditional Code
– When the same code depends on more than one candidate condition, duplicate it and encrypt each copy separately under its own condition.
Simplifying Compound Constructs
– Operators such as && and || combine more than one simple condition.
– Break compound conditions into semantically equivalent but simpler conditions.
Consequences to Existing Analyzers
Path exploration and input discovery
– Construct constraints for each path (e.g., X == c).
Input discovery (EXE)
– Discover inputs from the constraints using symbolic execution.
– The obfuscated constraint becomes "Hash(X) == Hc".
– Reversing the hash function to recover X is infeasible.
Consequences to Existing Analyzers
Forcing execution
– Forces execution along a specific path without solving the constraints.
– Without the key, the block cannot be decrypted correctly and the program crashes.
Static analysis
– The behavior is concealed inside the encrypted block.
Attacks
Brute Force and Dictionary Attacks
– Constraint: Hash(X) = Hc
  Find possible values of X that satisfy the equation above.
  Domain(X): the set of all values that X may take during execution.
  t: the time taken to test a single value of X, i.e., the hash computation time.
  Brute-force attempt: time = $|\mathrm{Domain}(X)| \cdot t$. If X is n bits long, the attack requires $2^n \cdot t$ time.
Implementation
Platform: Linux
Input: C/C++ source; Output: ELF binary
Four phases:
– Front-end code parsing phase
– Analysis/transformation phase
– Code generation phase
– Encryption phase
Two levels:
– Binary level: the decrypted code must be directly executable
– Intermediate code level: data type information is available
Analysis phase
Candidate Condition Replacement
– Identify candidate conditions and their conditional code
– Hash function: SHA-256
Decipher Routine
– Encryption algorithm: AES with 256-bit keys
Decryption Key and Markers
– Key(X) = Hash(X|N), where N is a nonce.
– Markers: let the encryption phase locate the exact position of the corresponding code in the resulting binary file.
Encryption phase
– Identify the code blocks that need encryption.
– Extract the encryption key Kc.
– Replace Kc and End_marker() with NOP instructions.
– Calculate the size of the block to be encrypted.
– Place the size as the argument to the call to Decipher.
– Encrypt the block with the key Kc.
Experimental Evaluation
The system is evaluated by determining how many manually identified trigger-based malicious behaviors were automatically and completely obfuscated.
Three levels of obfuscation strength:
– Strong: strings
– Medium: integers
– Weak: boolean flags
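To make the strength levels concrete, the brute-force formula time = |Domain(X)| · t can be evaluated for each input type. The figures below assume t = 1 µs per hash test and, for strings, an 8-character string over the 95 printable ASCII characters; both are illustrative assumptions, not measurements from the paper.

```latex
\begin{aligned}
\text{boolean flag:}\quad & 2^{1}\,t = 2\,\mu\text{s}\\
\text{32-bit integer:}\quad & 2^{32}\,t \approx 4.3\times10^{9}\times10^{-6}\,\text{s} \approx 4.3\times10^{3}\,\text{s} \approx 1.2\ \text{hours}\\
\text{8-char string:}\quad & 95^{8}\,t \approx 6.6\times10^{15}\times10^{-6}\,\text{s} \approx 6.6\times10^{9}\,\text{s} \approx 210\ \text{years}
\end{aligned}
```

A dictionary attack can shrink the string case dramatically when the trigger is a likely word, so the string figure is an upper bound rather than a guarantee.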
Strengths
Malware authors can modify their programs to strengthen the obfuscation.
– Introducing more candidate conditions
  Query for resources and compare them against specific names.
  Replace operators such as <, >, or != with ==.
– Increasing the size of the concealed code
  Incorporate triggers that encapsulate more execution behavior.
– Increasing the input domains
  Use variables with larger domains (e.g., strings) or larger integer types.
Weaknesses
Limited types of conditions
– Only equality checks are supported.
The input domain may be very small in some cases
– 32-bit or 64-bit integers.
Possible ways to defeat
– Analyzers equipped with decryptors that reduce the key search space by taking the input domain into account, e.g., the return value of, or an argument receiving data from, a system call such as gettimeofday.
– Input-aware analysis
  Collection mechanisms capture the binary's interactions with its environment.
Conclusion
An obfuscation scheme that can be applied automatically to malware programs.
The obfuscation conceals trigger-based malicious behavior from state-of-the-art malware analyzers.
Experiments show that the scheme can conceal a large fraction of malicious triggers.