Malware analysis and its issues
The average malicious binary is not interesting
- Repetitive code
- Repetitive techniques
- Self-taught developers
- Limited interests
Wouldn't it be neat to see at one glance roughly what a binary is about?
Limitations of contemporary automated malware analysis
Obfuscation
Self-modifying code
Byte code and virtual machines
Dynamic API loading
Asynchronous code
Object oriented code
Sandbox detection
Missing dependencies/components
Need for interaction
Time based evasion
Missing input values
Multiple execution paths
Incompatibilities
Static Dynamic
Multiple execution paths
Common sandboxes are fairly limited in their analysis capabilities of multi-purpose malware
In almost all cases they are totally useless for analyzing benign binaries
Packer / Evasion
Setup
Call home
might or might not be analyzed
Indicators for packers
benign
targeted
random
EP section name abnormal
EP section entropy too high/low
Use of TLS sections
API calls / KB ratio
Section count too low
Imphash missing
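
The indicators above can be sketched as a small checklist. This is only an illustration: the function names, the list of usual entry point section names and every threshold are assumptions and do not come from the talk or from graphity. The sketch takes already extracted values as input so it runs without a PE parser; in practice they would come from a library such as pefile.

```python
import math
from collections import Counter

# Assumed list of unremarkable entry point (EP) section names.
COMMON_EP_SECTIONS = {".text", "CODE", ".code", "INIT", "PAGE"}


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(data).values())


def packer_indicators(ep_section_name, ep_section_data, section_count,
                      api_count, file_size_bytes, has_tls, imphash):
    """Return the names of all packer indicators that fire for one sample.

    All thresholds below are illustrative assumptions.
    """
    hits = []
    if ep_section_name not in COMMON_EP_SECTIONS:
        hits.append("ep_section_name_abnormal")
    entropy = shannon_entropy(ep_section_data)
    if entropy > 7.0 or entropy < 1.0:
        hits.append("ep_section_entropy")
    if has_tls:
        hits.append("tls_section")
    # Few imported APIs relative to file size hints at a hidden import table.
    if api_count / max(file_size_bytes / 1024, 1) < 0.1:
        hits.append("api_calls_per_kb_low")
    if section_count < 3:
        hits.append("section_count_low")
    if not imphash:
        hits.append("imphash_missing")
    return hits
```

A single hit says little on its own; the list is meant to be read as a set of hints that a human analyst weighs together.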
No big data
No clustering
For sure no machine learning
No binary diffing
No serious math
No software licenses ^^
Help in static analysis
Persisting of analysis results
Small to medium scale sample sets
Tool that's easy to handle and extendable
Metrics
Creative indicator extraction
So yeah.. I used radare2
Radare2 accessed through r2pipe, scripted from Python
Available for free
Disassemble (and assemble for) many different architectures
Debug with local native and remote debuggers (gdb, rap, webui, r2pipe, winedbg, windbg)
Run on Linux, *BSD, Windows, OSX, Android, iOS, Solaris and Haiku
Perform forensics on filesystems and data carving
Be scripted in Python, Javascript, Go and more
Support collaborative analysis using the embedded webserver
Visualize data structures of several file types
Patch programs to uncover new features or fix vulnerabilities
Use powerful analysis capabilities to speed up reversing
Aid in software exploitation
Scalable
Scriptable
GUI-free
Great support
Quick bug fixes
With splendid reasoning
Can analyze entire binaries
Provides
- functions and cross references
- symbols
- strings
- basic PE information
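
A minimal sketch of pulling this information out of radare2 through r2pipe. The helper names collect_overview and overview_of_file are made up for this example and are not part of graphity. The radare2 commands used (aaa, aflj, iij, izj, ij) exist, but the JSON field names read below are assumptions that may differ between radare2 versions.

```python
def collect_overview(r2):
    """Collect functions, imports, strings and the file type from a session.

    r2 is any object offering cmd() and cmdj(), for example the object
    returned by r2pipe.open(path).
    """
    r2.cmd("aaa")                      # run radare2's automatic analysis
    functions = r2.cmdj("aflj") or []  # function list as JSON
    imports = r2.cmdj("iij") or []     # imported symbols as JSON
    strings = r2.cmdj("izj") or []     # strings from data sections as JSON
    info = r2.cmdj("ij") or {}         # general file information as JSON
    return {
        # Field names are assumptions, hence the defensive .get() calls.
        "functions": [f.get("name") for f in functions],
        "imports": [i.get("name") for i in imports],
        "strings": [s.get("string") for s in strings],
        "bintype": info.get("bin", {}).get("bintype"),
    }


def overview_of_file(path):
    """Open path with r2pipe, collect the overview and close the session."""
    import r2pipe  # needs radare2 and the r2pipe Python package installed

    r2 = r2pipe.open(path)
    try:
        return collect_overview(r2)
    finally:
        r2.quit()
```

Passing the session in as a parameter keeps the collection logic testable without a radare2 installation.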
Graphity
Python project built on
- radare2 / r2pipe
- NetworkX
- pyplot
- pefile
- Neo4j
graphity
graphityOut
graphityFunc
graphityUtil
Published at https://github.com/GDATAAdvancedAnalytics/r2graphity
Binary Cartography
Function call graphs
Function cross references within code section
- References to function offsets
- References to code w/o function
- Outside executable section(s)
Nodes: functions
Edges: calls, handler functions
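
A sketch of how such a graph could be assembled from a function list and raw cross references. It uses plain dictionaries to stay dependency-free, whereas graphity itself builds on NetworkX. The function name build_call_graph, the input shapes and the tag strings are assumptions made for this example.

```python
def build_call_graph(functions, xrefs, code_range):
    """Build a call graph and collect references that hit no function start.

    functions:  {start address: size in bytes}
    xrefs:      iterable of (from address, to address)
    code_range: (start, end) of the executable section, end exclusive
    Returns (graph, stray): graph maps each function to the set of functions
    it references, stray lists (caller, target, kind) for everything else.
    """
    graph = {start: set() for start in functions}
    stray = []
    low, high = code_range

    def owner(addr):
        """Return the start of the function containing addr, or None."""
        for start, size in functions.items():
            if start <= addr < start + size:
                return start
        return None

    for src, dst in xrefs:
        caller = owner(src)
        if caller is None:
            continue  # reference does not originate in a known function
        if dst in functions:
            graph[caller].add(dst)            # reference to a function offset
        elif low <= dst < high:
            stray.append((caller, dst, "code_without_function"))
        else:
            stray.append((caller, dst, "outside_executable_section"))
    return graph, stray
```

The stray list is kept on purpose: references into code that no function covers often point at places the disassembler missed.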
Strings
String parsing
Evaluation: ASCII, cross references, character frequency analysis
String list detection
- string length + alignment
- string following w/o cross reference
Fitting strings into the graph
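
A rough sketch of the string evaluation step, combining the three criteria named above. The character set treated as common, the minimum length and the 0.8 ratio are assumptions for illustration, and evaluate_string is a hypothetical helper, not a graphity function.

```python
import string

PRINTABLE = set(string.printable)
# Assumed set of characters that dominate meaningful strings such as
# file names, paths, URLs and format strings.
COMMON = set(string.ascii_letters + string.digits + " ._-/\\:%")


def evaluate_string(candidate, xref_count, min_length=4, min_common_ratio=0.8):
    """Score one string candidate and decide whether to keep it."""
    if not candidate:
        return {"ascii": False, "referenced": False,
                "common_ratio": 0.0, "keep": False}
    is_ascii = all(ch in PRINTABLE for ch in candidate)
    referenced = xref_count > 0
    # Crude character frequency check: share of commonly seen characters.
    common_ratio = sum(ch in COMMON for ch in candidate) / len(candidate)
    keep = (is_ascii and len(candidate) >= min_length
            and (referenced or common_ratio >= min_common_ratio))
    return {"ascii": is_ascii, "referenced": referenced,
            "common_ratio": common_ratio, "keep": keep}
```

A string that code references is kept even when it looks odd, since the cross reference itself is evidence that it is real data.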
APIs
Cross references on symbols
Indirect calls
- parsing for mov/lea
- disassembling further
- call and jmp considered xref
Thunk pruning
Dynamic loading
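
One plausible reading of thunk pruning, sketched on a dictionary-based call graph: a thunk is a tiny function that only jumps to an imported API, so calls to the thunk are rewritten as calls to the API and the thunk node is dropped. How graphity actually does this is not shown in the slides; the function name and data shapes are assumptions.

```python
def prune_thunks(call_graph, thunks):
    """Return a copy of call_graph with thunk nodes folded into their API.

    call_graph: {function name: set of callee names}
    thunks:     {thunk function name: name of the API it jumps to}
    """
    pruned = {}
    for func, callees in call_graph.items():
        if func in thunks:
            continue  # the thunk itself disappears from the graph
        # Redirect every call to a thunk straight to the API behind it.
        pruned[func] = {thunks.get(callee, callee) for callee in callees}
    return pruned
```

After pruning, the API appears as a direct neighbor of the function that uses it, which is what later pattern matching on API calls relies on.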
Callbacks / Handler Functions
"Top-down"
Disassemble upwards
Check the push instructions for function cross references
Add edge and tag
Currently only CreateThread and SetWindowsHookEx, because context
"Bottom-up"
Sweep for nodes without inbound edges
Check for cross references within functions
Add edge and tag
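
The "bottom-up" sweep can be sketched as follows, again on plain dictionaries. The name link_orphans, the input shapes and the "handler" tag are assumptions for this example; the slides do not show the real implementation.

```python
def link_orphans(call_graph, references):
    """Connect functions nobody calls to the functions that reference them.

    call_graph: {function: {callee: edge tag}}, modified in place
    references: {function: set of functions whose address appears in its body}
    Returns the list of (referencing function, orphan) edges that were added.
    """
    has_inbound = {callee
                   for callees in call_graph.values()
                   for callee in callees}
    orphans = [func for func in call_graph if func not in has_inbound]

    added = []
    for orphan in orphans:
        for func, refs in references.items():
            if func != orphan and orphan in refs:
                # The address is used as data, e.g. passed as a callback.
                call_graph[func][orphan] = "handler"
                added.append((func, orphan))
    return added
```

The entry point also has no inbound edges, but since no function references it, the sweep leaves it alone.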
Backdoor: Win32/Redsip.A
https://github.com/citizenlab/malware-indicators/blob/master/file-indicators.csv
Binary Whispering
API call gadgets
"pattern matching" of APIs
Iterate nodes
Iterate neighbors
If feasible, further iterations
Problems:
- indirect function calls
- bigger call gadgets lower hit chances
- human analyst to draw final conclusions
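
A sketch of the gadget search: iterate the nodes, gather the APIs each one calls, and widen the search to neighbors until the whole pattern is covered or the depth limit is reached. Following only outgoing calls, the name find_gadgets and the example pattern are assumptions, not details taken from graphity.

```python
def find_gadgets(call_graph, api_calls, pattern, max_depth=1):
    """Find functions around which every API of pattern is called.

    call_graph: {function: set of called functions}
    api_calls:  {function: set of API names called directly}
    pattern:    iterable of API names that together form the gadget
    Returns a list of (function, depth at which the pattern was complete).
    """
    pattern = set(pattern)
    hits = []
    for func in call_graph:
        seen = {func}
        frontier = {func}
        found = set()
        for depth in range(max_depth + 1):
            for node in frontier:
                found |= pattern & api_calls.get(node, set())
            if found == pattern:
                hits.append((func, depth))
                break
            # Move on to neighbors that have not been visited yet.
            frontier = {callee
                        for node in frontier
                        for callee in call_graph.get(node, ())} - seen
            seen |= frontier
            if not frontier:
                break
    return hits
```

For example, a pattern such as OpenProcess, VirtualAllocEx, WriteProcessMemory and CreateRemoteThread would flag candidate code injection routines. A hit is only a pointer for the analyst, in line with the problems listed above.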
Why metrics?
Measuring things is fun
Lack of metrics for sophistication
Lack of metrics for complexity
IOCs suck - they ain't no metrics that aren't cheaply tricked
Little ability to measure suspiciousness
Little ability to measure benign-ness
Graph Measurement
Numbers: simplified representation, allow for distance measurement, help finding outliers and anomalies
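
To make the idea concrete, a call graph can be reduced to a short vector of numbers and two binaries compared by the distance between their vectors. The four metrics chosen here and the use of Euclidean distance are assumptions for illustration; the slides do not state which measurements graphity takes.

```python
import math


def graph_metrics(call_graph):
    """Reduce a call graph {function: set of callees} to a numeric vector.

    Returns (node count, edge count, density, average out-degree).
    """
    nodes = len(call_graph)
    edges = sum(len(callees) for callees in call_graph.values())
    # Density of a directed graph: edges divided by possible edges.
    density = edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0
    average_out_degree = edges / nodes if nodes else 0.0
    return (nodes, edges, density, average_out_degree)


def metric_distance(first, second):
    """Euclidean distance between two metric vectors (Python 3.8+)."""
    return math.dist(first, second)
```

Raw counts dominate the distance in this form, so a real comparison would normalize each metric first. Samples whose vectors sit far away from the rest of a set are the outliers worth a closer look.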
Good Papers
"Jackdaw: Towards Automated Reverse Engineering of Large Datasets of Binaries", Polino, Scorti, Maggi, Zanero
https://iseclab.org/media/uploads/zotero/Polino%20et%20al_2015_Jackdaw.pdf
"Distributing the Reconstruction of High-Level Intermediate Representation for Large Scale Malware Analysis", Matrosov, Rodionov, Barbosa, Branco
https://github.com/REhints/BlackHat_2015/blob/master/slides_BHUS_2015.pdf
"Automated Reverse Engineering", Halvar Flake
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-flake.pdf