Post on 28-Dec-2015
Heng Yin, Dawn Song, Manuel Egele, Christopher Kruegel, and Engin Kirda
Presentation by
Mridula Menon N.
INDEX
INTRODUCTION
CONTRIBUTION
OVERVIEW OF APPROACH
DESIGN & IMPLEMENTATION
TAINT-GRAPH-BASED MALWARE DETECTION AND ANALYSIS
EVALUATION
CONCLUSION
INTRODUCTION
Malicious software provided by reputable vendors performs undesirable actions that violate users' privacy. E.g.: Google Desktop, Sony media player.
Traditional Malware Detection:
Signature-based: cannot detect new malware or variants.
Heuristics-based: high false positives and high false negatives.
End-to-end approach to automatically identify the fundamental trait of malicious information access and processing behavior of a given program.
Monitor and record the information access and processing behavior of the sample in the test cases: use whole-system, fine-grained taint tracking. (operating-system-aware taint analysis)
CONTRIBUTION
Observed that the fundamental trait of privacy-breaching malware lies in its access to and processing of sensitive information, and proposed an end-to-end automatic approach to classify and detect malware using this information access and processing behavior.
Designed and developed a critical component of Panorama: a whole-system, fine-grained, operating-system-aware, dynamic taint tracking system to monitor and investigate the unknown sample.
In their extensive experiments, the system detected all the malware samples (keyloggers, password sniffers, stealth backdoors, rootkits and spyware) and had very few false positives.
DESIGN AND IMPLEMENTATION
Hardware-level Dynamic Taint Tracking
OS-Aware Taint Tracking
Automated Testing and Taint Graph Generation
1) Hardware-level Dynamic Taint Tracking
They monitor the whole system execution in a processor emulator and dynamically instrument code to keep track of how tainted data propagates during program execution.
Implemented Panorama on QEMU. The approach can deal with multiple processes, memory swapping and disks.
Shadow Memory: Stores the taint status, organized for efficient memory usage.
Taint Sources: All sensitive information that is introduced into the system in the automated tests is marked as a taint source.
Taint Propagation: Monitor CPU instructions and DMA operations.
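The shadow memory idea can be illustrated with a minimal sketch. It assumes a sparse, page-indexed store that only allocates space for tainted bytes. The class name ShadowMemory, the 4096-byte page size and the use of string labels are all assumptions made for illustration, not details taken from Panorama.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


class ShadowMemory:
    """Sparse per-byte taint store: space is used only for tainted bytes."""

    def __init__(self):
        self._pages = {}  # page number -> {offset in page: set of taint labels}

    def taint(self, addr, labels):
        """Attach taint labels to the byte at addr."""
        page, off = divmod(addr, PAGE_SIZE)
        self._pages.setdefault(page, {}).setdefault(off, set()).update(labels)

    def untaint(self, addr):
        """Clear the taint of the byte at addr and drop pages that become empty."""
        page, off = divmod(addr, PAGE_SIZE)
        offsets = self._pages.get(page)
        if offsets is not None:
            offsets.pop(off, None)
            if not offsets:
                del self._pages[page]

    def labels(self, addr):
        """Return the taint labels of the byte at addr (empty if untainted)."""
        page, off = divmod(addr, PAGE_SIZE)
        return frozenset(self._pages.get(page, {}).get(off, ()))

    def allocated_pages(self):
        return len(self._pages)


shadow = ShadowMemory()
shadow.taint(0x1000, {"password"})
shadow.taint(0x1001, {"password"})
```

Untainted memory costs nothing in this layout, which is the point of keeping the structure sparse.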
Taint Propagation
For arithmetic instructions, the result is tainted if and only if any byte of the operands is tainted.
For data movement instructions and DMA operations, the destination is tainted if and only if the source is tainted.
Special cases:
Constant function: E.g., “xor eax, eax” always yields zero, so the result is untainted.
Table lookup: Updated rule: if any byte used to calculate the address of a memory location is tainted, then the result of a memory read using this address is tainted as well.
Control flow evasion: While examining Windows kernel code, taint tracking was found to stop at a keystroke Unicode conversion routine called _xxxInternalToUnicode. Solved by instrumenting an instruction within the function: the instrumentation checks the taint status of the function's input parameter and propagates it to the output parameter.
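The propagation rules above can be sketched as small functions. This is an illustrative model only: taint is reduced to one boolean per byte, the function names are hypothetical, and the byte-by-byte copy for data movement is an assumption about granularity.

```python
def propagate_arithmetic(*operand_taints):
    """Result is tainted iff any byte of any operand is tainted.

    Each operand is a list of per-byte booleans.
    """
    return any(any(op) for op in operand_taints)


def propagate_move(source_taint):
    """Destination takes the taint status of the source, byte by byte."""
    return list(source_taint)


def propagate_xor(dst_reg, src_reg, dst_taint, src_taint):
    """Constant function: xor of a register with itself is always zero."""
    if dst_reg == src_reg:
        return False  # untaint the result
    return propagate_arithmetic(dst_taint, src_taint)


def propagate_load(address_taint, memory_taint):
    """Table lookup rule: a tainted address taints the value read through it."""
    if any(address_taint):
        return [True] * len(memory_taint)
    return propagate_move(memory_taint)
```

For example, propagate_xor("eax", "eax", [True], [True]) gives an untainted result, while the same call with two different registers keeps the taint.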
2) OS-Aware Taint Tracking
Resolving process and module information.
Resolving filesystem and network information.
Identifying the code under analysis and its actions.
Resolving process and module information
Developed a kernel module called module notifier.
Load this module into the guest operating system to collect the updated memory map information.
All the information is passed on to Panorama through a predefined I/O port.
To ensure the authenticity of the messages that Panorama receives from the module notifier, we check the program counter of the instruction that is responsible for sending this message.
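One way to picture the authenticity check is a range test on the program counter. The sketch below assumes that a message is accepted only when the sending instruction lies inside the memory range where the module notifier is loaded. The function name, parameters and example addresses are hypothetical.

```python
def is_authentic(pc, notifier_base, notifier_size):
    """Accept a message only if the sending instruction is inside the notifier.

    pc            -- program counter of the instruction that wrote to the I/O port
    notifier_base -- assumed load address of the module notifier
    notifier_size -- assumed size of its code region in bytes
    """
    return notifier_base <= pc < notifier_base + notifier_size


# Hypothetical addresses, for illustration only.
NOTIFIER_BASE = 0xF8A00000
NOTIFIER_SIZE = 0x2000
```

A write to the I/O port issued from any other address, such as code belonging to the sample, would be rejected under this assumption.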
Resolving filesystem and network information
Filesystem: Integrated a disk forensic tool called “The Sleuth Kit” (TSK) into Panorama for gathering filesystem information.
Network: When tainted data is sent out, check the packet header to find out which connection it belongs to.
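Reading the connection from a packet header can be sketched as follows. The example assumes a raw IPv4 packet that starts at the IP header (no link-layer frame) and extracts the addresses, protocol and, for TCP or UDP, the ports. The function name connection_of and the sample addresses are illustrative assumptions.

```python
import socket
import struct


def connection_of(packet: bytes):
    """Return (protocol, src_ip, src_port, dst_ip, dst_port) for an IPv4 packet.

    Ports are None for protocols other than TCP (6) and UDP (17).
    """
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    protocol = packet[9]
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port = dst_port = None
    if protocol in (6, 17):
        # TCP and UDP both start with source port, then destination port.
        src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return protocol, src_ip, src_port, dst_ip, dst_port


# Hand-built 20-byte IPv4 header followed by a minimal TCP header.
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.2"), socket.inet_aton("192.0.2.7"),
)
tcp_header = struct.pack("!HH", 49152, 80) + bytes(16)
sample_packet = ip_header + tcp_header
```

The resulting tuple is what identifies the connection that a tainted outgoing packet belongs to.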
Identifying the code under analysis and its actions
Two cases:
The sample under analysis dynamically generates new code: Taint the complete code segment of the sample under analysis, using a special label.
The given code calls a piece of trusted code in order to perform tainted operations on its behalf: We record the current value of the stack pointer, together with the current thread identifier.
Automated Testing
The system defines nine different types of taint sources: text, password, HTTP, HTTPS, ICMP, FTP, document, and directory.
The communication between the test engine and the taint engine is via an intercepted registry writing API.
Taint Graph Generation
Generate one graph for each taint source, each with a different label.
For each taint source, the taint propagation originating from this source forms a directed graph (taint graph).
A taint graph can be represented as g = (V, E), where V is a set of vertices and E is a set of directed edges connecting vertices.
Each vertex is labeled with a (type, value) pair, where value is the unique name that identifies the vertex.
The type of a vertex is defined in a hierarchical form:
type ::= taint_source | os_object
taint_source ::= text | password | HTTP | HTTPS | FTP | ICMP | document | directory
os_object ::= process | module | network | file
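The graph definition and the vertex type hierarchy can be modeled with a small data structure. This is a sketch under assumptions: the class names Vertex and TaintGraph are hypothetical, and the example vertices "winlogon.exe" and "sample.dll" are placeholder names. Only the log file path comes from the figure caption.

```python
from dataclasses import dataclass, field

TAINT_SOURCES = {"text", "password", "HTTP", "HTTPS", "FTP", "ICMP",
                 "document", "directory"}
OS_OBJECTS = {"process", "module", "network", "file"}


@dataclass(frozen=True)
class Vertex:
    type: str   # a taint_source or os_object type name
    value: str  # unique name that identifies the vertex

    def __post_init__(self):
        if self.type not in TAINT_SOURCES | OS_OBJECTS:
            raise ValueError(f"unknown vertex type: {self.type}")


@dataclass
class TaintGraph:
    source: Vertex                            # the taint source this graph is for
    edges: set = field(default_factory=set)   # set of (src, dst) vertex pairs

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    @property
    def vertices(self):
        found = {self.source}
        for src, dst in self.edges:
            found.update((src, dst))
        return found


# Hypothetical propagation chain for a password taint source.
password = Vertex("password", "password")
logon = Vertex("process", "winlogon.exe")
sample = Vertex("module", "sample.dll")
log_file = Vertex("file", r"c:\ginalog.log")

graph = TaintGraph(source=password)
graph.add_edge(password, logon)
graph.add_edge(logon, sample)
graph.add_edge(sample, log_file)
```

Making Vertex frozen lets vertices be stored in sets, so V falls out of E plus the source without a separate bookkeeping step.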
Figure 2: An example of a taint graph. This graph reflects the procedure for Windows user authentication. While a password thief is running in the background, it catches the password and saves it to its log file “c:\ginalog.log”.
TAINT-GRAPH-BASED MALWARE DETECTION AND ANALYSIS
Taint-Graph-Based Malware Detection
Taint-Graph-Based Malware Analysis
Taint-Graph-Based Malware Detection
Policies enforced on the taint graphs:
Anomalous information access behavior
Anomalous information leakage behavior
Excessive information access behavior
Taint-Graph-Based Malware Analysis
Given a taint graph, the first step is to check this graph for the presence of a node that corresponds to the sample under analysis.
If such a node is present, we learn that the sample has accessed certain tainted input data, because the test cases are designed such that input data is sent to trusted applications, but never to the sample under analysis.
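The first analysis step can be sketched as a search over the edges of a taint graph. The sketch assumes vertices are plain (type, value) tuples and that the sample appears as a process or module vertex. The function names and the example vertex names are hypothetical, apart from the log file path from the figure caption.

```python
from collections import deque


def find_sample_node(edges, sample_name):
    """Return the vertex for the sample under analysis, or None if absent."""
    for src, dst in edges:
        for vertex in (src, dst):
            if vertex[0] in ("process", "module") and vertex[1] == sample_name:
                return vertex
    return None


def downstream(edges, start):
    """Return every vertex reachable from start along directed edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


# Hypothetical taint graph for a password taint source.
edges = [
    (("password", "password"), ("process", "winlogon.exe")),
    (("process", "winlogon.exe"), ("module", "sample.dll")),
    (("module", "sample.dll"), ("file", r"c:\ginalog.log")),
]

node = find_sample_node(edges, "sample.dll")
```

If the node is found, following its outgoing edges with downstream shows where the sample sent the tainted data, here the log file.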
EVALUATION
Two of these false positives were personal firewall programs. The third false positive was a browser accelerator.
Malware Detection
Case Study
Here, %INST DIR% represents “c:\Program Files\Google\Google Desktop Search”, and %TEMP% is “c:\Documents and Settings\user\Local Settings\Temporary Internet Files”.
Malware Analysis
CONCLUSION
Whole-system, fine-grained taint analysis to discern the fine-grained information access and processing behaviour of unknown code.
Panorama yields zero false negatives and very few false positives.