Forecasting Malware Capabilities From Cyber Attack Memory Images

Omar Alrawi*, Moses Ike*, Matthew Pruett, Ranjita Pai Kasturi, Srimanta Barua, Taleb Hirani, Brennan Hill, Brendan Saltaformaggio

    Georgia Institute of Technology

Abstract

The remediation of ongoing cyber attacks relies upon timely malware analysis, which aims to uncover malicious functionalities that have not yet executed. Unfortunately, this requires repeated context switching between different tools and incurs a high cognitive load on the analyst, slowing down the investigation and giving attackers an advantage. We present Forecast, a post-detection technique to enable incident responders to automatically predict capabilities which malware have staged for execution. Forecast is based on a probabilistic model that allows Forecast to discover capabilities and also weigh each capability according to its relative likelihood of execution (i.e., forecasts). Forecast leverages the execution context of the ongoing attack (from the malware's memory image) to guide a symbolic analysis of the malware's code. We performed extensive evaluations, with 6,727 real-world malware and futuristic attacks aiming to subvert Forecast, showing the accuracy and robustness in predicting malware capabilities.

    1 Introduction

Cyber attack response requires countering staged malware capabilities (i.e., malicious functionalities which have not yet executed) to prevent further damages [1], [2]. Unfortunately, predicting malware capabilities post-detection remains manual, tedious, and error-prone. Currently, analysts must repeatedly carry out multiple triage steps. For example, an analyst will often load the binary into a static disassembler and perform memory forensics, to combine static and dynamic artifacts. This painstaking process requires context switching between binary analysis and forensic tools. As such, it incurs a high cognitive load on the analyst, slowing down the investigation and giving the attackers an advantage.

*Authors contributed equally.

To automate incident response, symbolic execution is promising for malware code exploration, but lacks the prior attack execution state, which may not be re-achievable after-the-fact (e.g., concrete inputs from C&C activity). Environment-specific conditions, such as expected C&C commands, limit dynamic and concolic techniques (e.g., [3]–[14]) from predicting inaccessible capabilities. In addition, these techniques depend on dissecting a standalone malware binary or running it in a sandbox. However, malware are known to delete their binary or lock themselves to only run on the infected machine (hardware locking). Worse still, researchers found that fileless malware incidents (i.e., malware that resides only in memory) continue to rise [1], [15], [16].

Having access to the right execution context is necessary to guide malware into revealing its capabilities. Malware internally gather inputs from environment-specific sources, such as the registry, network, and environment variables, in order to make behavior decisions [11], [17], [18]. Therefore, an ideal and practical input formulation for malware can be adapted from this internal execution state in memory bearing the already-gathered input artifacts. It turns out that anti-virus and IDS already collect memory images of a malicious process after detecting it [19]–[21]. A malware memory image contains this internal concrete execution state unique to the specific attack instance under investigation.

During our research, we noticed that if we can animate the code and data pages in a memory image, and perform a forward code exploration from that captured snapshot, then we can re-use these early concrete execution data to infer the malware's next steps. Further, by analyzing how these concrete inputs induce paths during code exploration, we can predict which paths are more likely to execute capabilities based on the malware's captured execution state. Based

on this idea, we propose seeding the symbolic exploration of a malware's pre-staged paths with concrete execution state obtained via memory image forensics. Through this, we overcome the previous painstaking and cognitively burdensome process that an analyst must undertake.

We present Forecast, a post-detection technique to enable incident responders to forecast what capabilities are possible from a captured memory image. Forecast ranks each discovered capability according to its probability of execution (i.e., forecasts) to enable analysts to prioritize their remediation workflows. To calculate this probability, Forecast weighs each path's relative usage of concrete data. This approach is based on a formal model of the degree of concreteness (or DC(s)) of a memory image execution state (s). Starting from the last instruction pointer (IP) value in the memory image, Forecast explores each path by symbolically executing the CPU semantics of each instruction. During this exploration, Forecast models how the mixing of symbolic and concrete data influences path generation and selection. Based on this mixing, a "concreteness" score is calculated for each state along a path to derive forecast percentages for each discovered capability. DC(s) also optimizes symbolic analysis by dynamically adapting loop bounds, handling symbolic control flow, and pruning paths to reduce path explosion.

To automatically identify each capability, we

developed several modular capability analysis plugins: Code Injection, File Exfiltration, Dropper, Persistence, Key & Screen Spying, Anti-Analysis, and C&C URL Connection. Each plugin defines a given capability in terms of API sequences, their arguments, and how their input and output constraints connect each API. Forecast plugins are portable and can easily be extended to capture additional capabilities based on the target system's APIs. It is worth noting that Forecast's analysis only requires a forensic memory image, allowing it to work for fileless malware, making it well-suited for incident response.

We evaluated Forecast with memory images of
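To illustrate the plugin idea (a sketch of our own, not Forecast's actual implementation, which also matches argument constraints), a minimal capability plugin might declare a capability as an ordered API sequence and scan a path's recovered API trace for it. The trace format and function name below are hypothetical:

```python
# Hypothetical sketch of a capability-analysis plugin: a capability is
# declared as an ordered API sequence, and a recovered path's API trace
# matches if the sequence appears in order (other calls may interleave).

CODE_INJECTION = ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]

def matches_capability(api_trace, signature):
    """Return True if `signature` appears as an in-order subsequence of the trace."""
    it = iter(api_trace)
    return all(api in it for api in signature)

# Example trace recovered along one explored path (hypothetical).
trace = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
         "Sleep", "CreateRemoteThread"]
print(matches_capability(trace, CODE_INJECTION))  # True
```

The real plugins go further: they also query the constraints on each API's input and output parameters, which a sequence match alone cannot capture.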

6,727 real-world malware (including packed and unpacked) covering 274 families. Forecast renders accurate capability forecasts compared to reports produced manually by human experts. Further, we show that Forecast is robust against futuristic attacks that aim to subvert Forecast. We show that Forecast's post-detection forecasts are accurately induced by early concrete inputs. We empirically compared Forecast to S2E [6], angr [22], and Triton [23] and found that Forecast outperforms them in identifying capabilities and reducing path explosion. Forecast is available online at: https://cyfi.ece.gatech.edu/.

    2 Overview

This section presents the challenges and benefits of combining the techniques of memory image forensics and symbolic analysis. Using the DarkHotel incident [2] as a running example, we will show how incident responders can leverage Forecast to expedite their investigation and remediate a cyber attack.

Running Example - DarkHotel APT. DarkHotel is an APT that targets C-level executives through spear phishing [2]. Upon infection, DarkHotel deletes its binary from the victim's file system, communicates with a C&C server, injects a thread into Windows Explorer, and ultimately exfiltrates reconnaissance data. When an IDS detects anomalous activities on an infected host, an end-host agent captures the suspicious process memory (i.e., DarkHotel's), terminates its execution, and generates a notification. At this point, incident responders must quickly understand DarkHotel's capabilities from the different available forensic sources (network logs, event logs, memory snapshot, etc.) to prevent further damages.

Dynamic techniques [11]–[14] may require an active

C&C, which may have been taken down, to induce a malware binary to reveal its capabilities. Because DarkHotel only resides in memory, these techniques, which work by running the malware in a sandbox, cannot be applied.¹ With only the memory image, an analyst can use a forensic tool, such as Volatility [24], to "carve out" the memory image code and data pages. Based on the extracted code pages, symbolic analysis can simulate the malware execution in order to explore all potential paths. Unfortunately, existing symbolic tools require a properly formatted binary and are not optimized to work with memory images [7], [22], [23].

Ideally, an analyst can manually project these code

fragments into symbolic analysis and source concrete values from the data pages to tell which code branch leads to a capability. However, this back-and-forth process of "stitching up" code with extracted memory artifacts involves context switching between symbolic execution and the forensic tool. This places a very high cognitive burden on the analyst. An analyst must also handle challenges such as path explosion, API call simulation [4], [22], [25]–[27], and concretizing API arguments (e.g., attacker's URL), which may not be statically accessible in the memory image. Lastly, an analyst must manually inspect APIs along each path to infer high-level capabilities.

¹Forensic memory images are not re-executable due to being "amputated" from the original operating system and hardware.


[Figure 1 (workflow diagram): (1) a Memory Image Parser recovers the execution context (CODE and DATA pages; CPU state: EAX, EBX, ECX, EDX, EIP, ESP, ESI, EDI, EFLAGS); (2)–(4) context-aware memory forensics feeds a probabilistic symbolic analysis: augmented symbolic exploration, probability assignment, and capability-relevant paths with API & argument constraints; (5) capability plugin analysis; (6) capability forecast, showing C&C URL, File Exfiltration, and Code Injection with forecast percentages 31%, 15%, and 54%.]

Figure 1: Forecast workflow. A memory image is used to reconstruct the original execution state. Concrete data is utilized to explore code paths while API constraints are analyzed against plugins to forecast capabilities.

2.1 Hybrid Incident Response

Incident responders rely on memory forensics to identify attack artifacts in memory images. However, memory forensics alone, which is largely based on signatures, misses important data structures due to high false negatives [21]. On the other hand, symbolic execution can explore code in the forward direction, but suffers from issues such as path explosion [22]. To address these limitations, Forecast combines symbolic execution and memory forensics through a feedback loop to tackle the shortcomings of both techniques.

Context-Aware Memory Forensics. Symbolic analysis provides code exploration context to accurately identify data artifacts that are missed by memory forensics. For example, traditional forensic parsing of DarkHotel's memory image missed C&C URL strings because they are obfuscated via a custom encoding scheme. However, subsequent symbolic analysis of the instructions that reference those bytes as arguments, such as a strncpy API, allowed Forecast to correctly identify and utilize these data artifacts in the memory image. Moreover, targeted malware may employ tactics that aim to subvert Forecast, using anti-forensics and anti-symbolic-analysis, which we carefully considered in our design and evaluation.

Memory image forensics provides concrete inputs that can help symbolic analysis perform address concretization, control flow resolution, and loop bounding. In addition, memory forensics identifies loaded library addresses in memory, which allows Forecast to perform library function simulation.

Path Probability. Given a memory image, the goal is to utilize available concrete data to explore potential code paths and forecast capabilities along them. By analyzing how different paths are induced by concrete memory image data, Forecast can derive the probability that a path will reach a capability relative to other paths. Forecast computes this probability based on modeling how concrete and symbolic data operations are influencing path generation and selection. Forecast also leverages this probability metric as a heuristic in pruning paths with the least concrete data.

    2.2 Incident Response with Forecast

Forecast identifies capabilities originating from a malware memory image in an automated pipeline. To demonstrate this, we simulated DarkHotel's attack and memory capture, which involved setting up an IDS with DarkHotel's network signature and executing the Advanced Persistent Threat (APT). Following detection, the IDS signals the end-host agent to capture the DarkHotel process memory. We then input this memory image to Forecast for analysis. In 459 seconds, Forecast reveals DarkHotel's capabilities: a C&C communication (i.e., mse.vmmnat.com), a file exfiltration (i.e., of host information), and a code injection (i.e., into Windows Explorer).

There are six stages for processing a forensic memory image, shown in Figure 1. (1) Forecast forensically parses the memory image and reconstructs the prior execution context by loading the last CPU and memory state into a symbolic environment for analysis. In analyzing the memory image, Forecast inspects the loaded libraries to identify the exported function names and addresses. Next, (2) Forecast proceeds to explore the possible paths, leveraging available concrete data in the memory image to concretize path constraints. (3) Forecast models and weighs how each path is induced by concrete data and assigns a probability to each generated path. (4) Forecast then uses this probability as a weight to adapt loop bounds and prune false paths, allowing Forecast to narrow in on the induced capability-relevant paths. (5) Forecast matches identified APIs to a repository of capability analysis plugins to report capabilities to an analyst. Finally, (6) Forecast identifies three capabilities and derives their forecast percentages from the path probabilities as 31%, 15%, and 54%, respectively.

The first path matches the Code Injection plugin.

This path contains the APIs: VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread, which are used in process injection. Analyzing the argument constraints leading to these APIs reveals explorer.exe as the target process. The second path matches the File Exfiltration plugin. This path contains the APIs

getaddrinfo, SHGetKnownFolderPath, WriteFile, Socket, and Send. Forecast inspects their arguments' constraints to determine that the malware writes host information to a file, which it sends over the network. The File Exfiltration plugin concretizes the argument of SHGetKnownFolderPath to reveal the file location identifier: FOLDERID_LocalAppData. The third path matches the C&C Communication plugin, which reveals a sequence of network APIs including InternetOpenUrlA. The plugin queries the API constraints and concretizes InternetOpenUrlA's argument, then reports that DarkHotel makes an HTTP request to the mse.vmmnat.com domain.

Given these forecast reports, an incident responder

learns from the captured memory snapshot that DarkHotel will communicate with mse.vmmnat.com, steal host data, and inject into Windows Explorer. This will prompt the analyst to block the URL and clean up the affected Explorer process, mitigating further damages. Forecast empowers the analyst to quickly and efficiently respond to threats by alleviating the cognitive burden and context switching required to manually obtain the same results.

    3 System Architecture

Forecast is a post-detection cyber incident response technique for forecasting capabilities in malware memory images. It only requires a memory image as input. The output of Forecast is a text report of each discovered capability (e.g., code injection), a forecast percentage, and the target of the capability (e.g., injected process).

Reconstructing Execution Context. Forecast parses the memory image to extract the execution state (e.g., code pages, loaded APIs, register values, etc.) to be used to reconstruct the process context. Static analysis of the code pages is used to initialize symbolic exploration. It explores each path beginning from the last IP in the reconstructed process context.

Forecast symbolically executes the CPU semantics of the disassembled code pages until an undecidable control flow is encountered. To resolve this, Forecast recursively follows the code blocks to resolve new CFG paths. When a library call is reached, Forecast simulates and symbolizes the call (discussed in §3.2). Library call simulation introduces symbolic data for each explored state, thus increasing the possibility of state explosion. However, the DC(s) model (discussed next) provides optimization metrics that enable Forecast to dynamically adapt parameters for loop bounding, symbolic control flow, and path pruning.

3.1 Modeling Concreteness to Guide Capability Forecasting

Forecast models how available concrete data in a memory image induces capability-relevant paths using the degree of concreteness model (DC(s)). Degree of concreteness is a property of execution states which encapsulates the "mixing" of symbolic and concrete operations. Symbolic operations (Sym_Ops) make use of symbolic variables, such as arithmetic involving symbolic operands. Concrete operations (Con_Ops) do not make use of symbolic variables. Sym_Ops and Con_Ops are intrinsic to every state transition. A state transition happens each time a basic block is executed along an explored path. Based on the ratio of Sym_Ops to Con_Ops, there exists an associated degree of concreteness (DC(s)) value, which measures how concrete or symbolic the current execution state is.

Forecasting is based on malware's use of pre-staged

concrete data to execute a set of capabilities. Under DC(s), paths that increasingly utilize concrete states are more likely to reach a set of capabilities. As a result, Forecast assigns DC(s) scores to states by modeling their cumulative usage of concrete data. This DC(s) score is then used to derive the probability, Pprob(s), that a path will reach a capability relative to other paths. At the end of exploration, the paths where capabilities are found are analyzed based on their Pprob(s) to compute forecast percentages of identified capabilities.

In addition to deriving forecasts, DC(s) detects

conditions that trigger path explosion (e.g., rapid path splitting due to symbolic control flows), and makes performance improvements, including pruning false states based on the degree of concreteness of every active state (discussed in §3.2).

Formulation of DC(s). For DC(s) to forecast capabilities, it must summarize two key features: (1) the rate of change in the ratio of symbolic operations to all operations, with respect to state transitions, and (2) the cumulative state conditions from a starting exploration state j to a target state n. We normalize DC(s) with respect to the number of states explored in our model. This bounds its value between 0.0 and 1.0, which describes the current state mixing. Formally, we define a state transition set τn, which is a set of ordered states from sj to sn:

    τn := {sj, sj+1, sj+2, ..., sn}    (1)

where state sj is the first state generated from a memory image and 0 ≤ j ≤ n, n ∈ ℤ. Transitioning from state si−1 to si involves executing every operation (All_Opsi−1) in the basic block BBi−1 at state si−1. The states in τn are ordered based on the basic block ordering, i.e., the basic block BBi maps to state si, and executing BBi

[Figure 2 (worked example):

(a) Symbolic exploration for the control-flow graph, memory, and register values from the memory image.

    BB1 @ 0x403280:      mov eax, ecx | mov ecx, 5 | jmp 0x40374D
    BB2:                 mov edx, [0x732460] | cmp edx, 0 | jnz 0x403787
    BB3:                 mov edx, eax | add edx, 3 | mov eax, 0x732470 | mov eax, [eax] | add eax, 1
    BB4:                 push eax | xor esi, esi | push esi | call 0x4042AD
    BB5 (not explored):  lea eax, [0x732468] | mov eax, [eax] | jmp 0x40385B

    Memory View:   0x732460: AA 23 BF CA | 0x732464: SYMBOLIC | 0x732468: SYMBOLIC | 0x73246C: F1 EC 2B 32 | 0x732470: SYMBOLIC
    Register View: EAX: 0x732468 | EBX: SYMBOLIC | ECX: SYMBOLIC | EDX: 0x2000 | EIP: 0x403280 | ESP: 0x28FECC | ESI: 0x4000

(b) Value derivation for degree of concreteness (DC(s)). Let state si be the current state after basic block BBi is executed, and let DC(si) be the degree of concreteness at state si:

    DC(s1) = 1 − (1/3)/1                     = 1 − 0.333/1 = 0.67
    DC(s2) = 1 − (1/3 + 0/3)/2               = 1 − 0.333/2 = 0.83
    DC(s3) = 1 − (1/3 + 0/3 + 4/5)/3         = 1 − 1.133/3 = 0.62
    DC(s4) = 1 − (1/3 + 0/3 + 4/5 + 1/4)/4   = 1 − 1.383/4 = 0.65

(c) Plot of cumulative ratio vs. states. (d) Plot of DC(s) vs. states.]
Figure 2: Forecast recovers context from the process memory image, including the memory values and register values for the captured state in (a). Using the degree of concreteness (DC(s)) formula, (b) calculates the values for each transition state. Figure (c) plots the cumulative ratio of Sym_Ops to All_Ops accumulated across state transitions. Figure (d) plots the degree of concreteness (DC(s)) across state transitions in the symbolic exploration.

transitions the program's context to BBi+1 and state si+1. The set All_Opsi is partitioned into 2 disjoint sets, Sym_Opsi and Con_Opsi, such that:

    Sym_Opsi ∪ Con_Opsi = All_Opsi    (2)

and

    Sym_Opsi ∩ Con_Opsi = ∅    (3)

For a state sn, we define the DC(sn) function as follows:

    DC(sn) = 1 − ( Σ_{i=j}^{n} |Sym_Opsi| / |All_Opsi| ) / |τn|    (4)

where |Sym_Opsi| is the cardinality of the Sym_Ops performed to reach state si and |All_Opsi| is the cardinality of All_Ops performed to reach state si. Further, |τn| is the cardinality of the state transitions from state sj to sn.

Tracking the cumulative ratio of Sym_Opsi to All_Opsi for each state transition enables us to calculate DC(s) instantaneously without iterating through the previous states sj to sn. An extended form of DC(s) that allows us to calculate its instantaneous value is given as follows:

    DC(sn) = 1 − (δ/δT) Cumul_Ratio(sn)    (5)

where, for all transition states T, Cumul_Ratio(sn) is the sum of the states' ratios for states sj to sn, and defined as:

    ∀si ∈ T : Cumul_Ratio(sn) := Σ_{i=j}^{n} |Sym_Opsi| / |All_Opsi|    (6)

An Example of DC(s) Computation. Figure 2 is a working example to show the computation of DC(s). Figure 2a depicts a recovered CFG and memory and register values from the memory image. Symbolic execution starts at basic block BB1 and ends at BB4. We annotate each basic block to show which instructions are Sym_Ops based on the register or memory values when the basic block is being executed. Notice that because register edx and memory address 0x732460 at BB2 have concrete values, only one branch is taken by the conditional jump instruction at the end of BB2. For this reason, BB5 is not explored. Symbolic data can be introduced by I/O-related function calls and calls to functions that are simulated based on Forecast's function models. Such function calls create symbolic variables within the memory dump, which causes a mixing of symbolic and concrete data.

Following along with Figure 2a, Figure 2b computes DC(s) for each state (basic block) transition. For example, DC(s1) = 0.67 when we transition to state s2; it then increases to 0.83 as we transition from s2 to s3. For each DC(si) value derived in Figure 2b, we plot them against the transition states in Figure 2d. Figure 2c plots the Cumul_Ratio(si) for each state (shown in black). The instantaneous Cumul_Ratio(sn) function is a straight line (Cumul_Ratio(sn) = mT) drawn from the origin to the point sn ∈ T, where m is the slope. The derivative of Cumul_Ratio(sn) = mT gives the instantaneous DC(sn) (Equation 5).
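As a sanity check on the arithmetic, the DC(s) values from Figure 2 can be reproduced in a few lines of Python; the per-block operation counts are taken from the worked example, and the function name is ours:

```python
# Reproduce the Figure 2 DC(s) derivation: each executed basic block
# contributes a ratio of symbolic operations to all operations, and
# DC(s_n) is one minus the cumulative ratio normalized by the number
# of state transitions so far (Equation 4).

def dc_values(op_counts):
    """op_counts: list of (sym_ops, all_ops) per executed basic block."""
    dcs, cumul = [], 0.0
    for t, (sym, all_ops) in enumerate(op_counts, start=1):
        cumul += sym / all_ops          # Cumul_Ratio(s_t), Equation (6)
        dcs.append(1 - cumul / t)       # DC(s_t), Equation (4)
    return dcs

# (Sym_Ops, All_Ops) for BB1..BB4 in Figure 2a.
blocks = [(1, 3), (0, 3), (4, 5), (1, 4)]
print([round(v, 2) for v in dc_values(blocks)])  # [0.67, 0.83, 0.62, 0.65]
```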

Path Probability. Given m current states, the path probability of a path p, with current state s, is derived by dividing s's DC(s) by the summation of the DC(s)

Algorithm 1: The Degree of Concreteness (DC(s))
Input: PATHS: explored program paths in a memory image
Output: DC(s) for every state s along every path ∈ PATHS

    for each path p ∈ PATHS do
        ▷ Initialize Cumul_Ratio for the explored path p
        Cumul_Ratio ← 0
        T ← 0
        ▷ Compute DC(s) for each state s generated along p
        for each state s ∈ SuccessorStates(p) do
            ▷ Get Sym_Ops and All_Ops
            Num_all_ops ← GetNumAllOps(s)
            Num_sym_ops ← GetNumSymOps(s)
            ▷ Calculate the ratio of Sym_Ops to All_Ops for state s
            Sym_Ratio ← Num_sym_ops / Num_all_ops
            ▷ Update Cumul_Ratio along the explored path
            Cumul_Ratio ← Cumul_Ratio + Sym_Ratio
            ▷ Count the transition, then compute DC(s) for state s
            T ← T + 1
            DC(s) ← 1 − (Cumul_Ratio / T)
        end for
    end for

of all m states. This bounds its value between 0.0 and 1.0, and is given as follows:

    Pprob(sx) = DC(sx) / Σ_{i=1}^{m} DC(si),  where m = |{All Current States}|    (7)
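Equation (7) is a straightforward normalization of the active states' DC(s) scores; a minimal sketch (function name ours):

```python
# Normalize the DC(s) scores of all currently active states into path
# probabilities (Equation 7): each state's score divided by the sum.

def path_probabilities(dc_scores):
    total = sum(dc_scores)
    return [dc / total for dc in dc_scores]

# DC(s) scores for three hypothetical active states.
probs = path_probabilities([0.67, 0.83, 0.62])
print([round(p, 2) for p in probs])  # [0.32, 0.39, 0.29]
```

By construction the probabilities sum to 1.0, which is what lets Forecast report the discovered capabilities as forecast percentages.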

Algorithmic Approach to DC(s). In order to derive DC(s), Forecast uses Algorithm 1. Cumul_Ratio is the cumulative ratio of symbolic operations to all operations, and T is the total state transitions in terms of basic blocks. For each explored path p in the memory image, DC(s) is calculated for every state s generated and executed along the path p.

3.2 DC(s)-Guided Symbolic Analysis

Forecast uses DC(s) to optimize symbolic execution multi-path exploration by bounding loops, concretizing addresses for symbolic control flow, and pruning paths. Neglecting these parameters impacts soundness and performance [27], [28]. State-of-the-art tools [6], [22], [23] rely on hard-coded thresholds to balance the trade-off between coverage and soundness. These techniques mostly focus on finding bugs in non-malicious code. Choosing an informed threshold is application-specific and may require a manual investigation. Yet, unlike finding bugs, malware employ adversarial means to vary these issues at run-time, hence a hard-coded or manual threshold will be limiting. However, by modeling the changing concrete state of an exploration, Forecast can dynamically adapt these (otherwise application-specific) thresholds at run-time. DC(s) embodies this automated adaptability to optimize exploration. We evaluate these features against adversarial symbolic analysis tactics in §4.

Adapting Loop Bounds. Forecast optimizes loops by forcing a bound only when DC(s) indicates a heavily symbolic state over time (specifically, when DC(s) drops below 0.10 after 10 state transitions). This optimization precisely measures how much a loop is affecting a state to decide when to bound it. We observe that, unlike harmless loops, explosion-causing loops converge DC(s) to 0.10 after two or more transitions.
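The loop-bounding trigger described above can be sketched as a simple predicate over a state's transition count and its DC(s) score; the threshold values come from the text, the function name and parameters are ours:

```python
# Decide whether to force a bound on a loop: trigger only when the state
# has become heavily symbolic over time, i.e. DC(s) has dropped below
# 0.10 after at least 10 state transitions (thresholds from the text).

def should_bound_loop(dc_score, num_transitions,
                      dc_threshold=0.10, min_transitions=10):
    return num_transitions >= min_transitions and dc_score < dc_threshold

print(should_bound_loop(0.05, 12))  # True  (heavily symbolic, past warm-up)
print(should_bound_loop(0.05, 3))   # False (too early to judge)
print(should_bound_loop(0.60, 50))  # False (state is mostly concrete)
```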

On-Demand State Pruning. When performance is overwhelmed by heavy state symbolism, Forecast prioritizes states for pruning by selecting the worst performers. Under DC(s), this selection is trivial since every state has a DC(s) score, which is used to prune states with heavy symbolic footprints. In §4.6, we found on-demand pruning drove Forecast toward more concrete paths than tools which prune paths via a hard-coded threshold, leading to Forecast exploring deeper in selected paths.
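One way to picture on-demand pruning: when the active-state frontier exceeds a budget, the states with the lowest DC(s) scores are dropped first. The budget, state representation, and names here are our own illustration:

```python
# Prune the worst-scoring states when the frontier grows too large:
# keep the `budget` states with the highest DC(s) (most concrete).

def prune_states(states, budget):
    """states: list of (state_id, dc_score); returns the survivors."""
    if len(states) <= budget:
        return states
    return sorted(states, key=lambda s: s[1], reverse=True)[:budget]

frontier = [("s1", 0.67), ("s2", 0.12), ("s3", 0.83), ("s4", 0.05)]
survivors = prune_states(frontier, budget=2)
print([sid for sid, _ in survivors])  # ['s3', 's1']
```

Because the score is recomputed at every state transition, the pruning decision tracks the exploration's current mix of symbolic and concrete data rather than a fixed threshold.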

Stack Backtrace Analysis. False successor paths often arise in symbolic analysis. Forecast examines the return addresses on the stack in a memory image to identify false paths, i.e., function returns which do not conform to previously established targets in the call stack. Specifically, the stack backtrace enables Forecast to verify flow-correctness by comparing the stack pointer and return addresses in the backtrace with those computed after executing a return instruction.

Address Concretization. Forecast uses the memory image data space to concretize symbolic indices to a tractable range. In addition, we observed that false states perform illegal index accesses (indices beyond the mapped code/data space of a process). Forecast uses this indicator to prune such states. Further, Forecast's analysis is transparent to address space layout randomization (ASLR) because ASLR is done at process load, before execution.

Library Function Simulation. Forecast analyzes the libraries present in the memory image to identify the exported functions. Identified functions are hooked to redirect the symbolic exploration to a simulated procedure. Forecast also handles dynamic library loading by calls to the LoadLibrary functions. If a library is loaded during symbolic exploration, Forecast creates a new section in memory for the loaded library. Once a call to GetProcAddress is reached, a new address is allocated in the library's memory section and hooked, then this address is returned. Any calls made to this address will be redirected to the correct simulated procedure.
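A toy model of this LoadLibrary/GetProcAddress handling (a pure-Python sketch, not the actual hooking machinery; class and function names are invented for illustration):

```python
# Dynamically resolved exports get fresh hooked addresses; any call to a
# hooked address dispatches to the corresponding simulated procedure.
class LibrarySimulator:
    def __init__(self, base=0x10000000):
        self.next_addr = base
        self.hooks = {}                       # address -> simulated procedure

    def load_library(self, name):
        # Models creating a new in-memory section for the loaded library.
        return name

    def get_proc_address(self, lib, func, simprocs):
        addr = self.next_addr                 # allocate an address in the
        self.next_addr += 0x10                # library's memory section
        self.hooks[addr] = simprocs[func]     # hook it to the simulation
        return addr

    def call(self, addr, *args):
        return self.hooks[addr](*args)        # redirected to the simulation

sim = LibrarySimulator()
lib = sim.load_library("wininet.dll")
addr = sim.get_proc_address(lib, "InternetOpenUrlA",
                            {"InternetOpenUrlA": lambda url: ("handle", url)})
print(sim.call(addr, "http://example.com"))  # ('handle', 'http://example.com')
```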

3.3 Forecasting Malware Capabilities

To characterize high-level capabilities, we focus on contextualizing a malware's API functionality by analyzing the constraints on their input and output parameters. Forecast analyzes the symbolic constraints on the input and output parameters of each API to "connect the dots" between APIs. Analyzing APIs used by malware is useful for identifying its capabilities because a malware's behavior stems from its API calls and data flow [11]–[13], [29]. Specifying a unique trace involves identifying the first (source) and last (sink) API in the sequence. While analyzing API data flow is not novel [30], previous work relies on dynamic taint-tracking [11], [14], [29], which can hardly be applied here. To tackle this, we leverage a constraint matching technique introduced by [5] to model malware's decision making. Our approach is based on the formulation that for a given API trace to embody a capability, the path constraints on the input of each succeeding API, starting from the sink, can be matched to the output constraints of at least one preceding API. When a sink is encountered, Forecast performs a

call-based backward slice to record all call instructions such that, for each instruction, there is a data flow from at least one of its operands to the input argument of the sink. If the extracted slice includes a corresponding source, Forecast proceeds to match the constraints on the input of every succeeding call, starting from the sink, to the output of any preceding call. Note that traditional system call/API tracing often misses malware capabilities due to a lack of contextual connection between observed APIs. Instead, Forecast uses the constraints on the API parameters in this call-based backward slice to precisely connect the data flow between the APIs to infer capabilities. Put simply: the constraints encapsulate only the relevant data flow between sources and sinks.

Figure 3 illustrates this analysis on AveMaria, a

Trojan that steals Firefox cookie files. AveMaria infects by replacing the code of Svchost, a Windows service, with its own code, a code injection capability known as process hollowing. AveMaria also takes screenshots to spy on the user's screen. The shaded boxes are the relevant APIs in the trace and their key arguments. The dotted line matches the input constraints on an argument of a latter API to the output constraints of at least one preceding API. The analysis starts when a sink is identified (e.g., SetThreadContext for AveMaria's Code Injection) and the entire trace is recovered by a call-based backward slice. The numbers 1, 2, etc. show the constraint matching steps, starting from the sink and walking backwards to a source. In AveMaria's File Exfiltration, the constraints on the input file (buf_3)

    Figure 3: API Constraints Analysis of AveMaria.

exfiltrated by send are matched with the constraints on buf_2, an output argument of ReadFile. Next, the constraints on the file handle (hFile) of ReadFile are matched with the constraints on the output of OpenFile. When these constraints are matched from a send to socket, Forecast reports a File Exfiltration.
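The backward constraint-matching walk just described can be sketched as follows (a simplified model: sets of symbolic names stand in for real constraint ASTs, and the trace below mirrors the File Exfiltration chain from Figure 3):

```python
# From the sink backwards, each API's input constraints must match the
# output constraints of at least one preceding API in the sliced trace.
def matches(trace):
    """trace: ordered list of (api, input_constraints, output_constraints),
    ending at the sink; trace[0] is the source."""
    for i in range(len(trace) - 1, 0, -1):      # walk back from the sink
        _, inputs, _ = trace[i]
        for c in inputs:
            # every input must be produced by some preceding API's output
            if not any(c in outs for _, _, outs in trace[:i]):
                return False
    return True

ave_maria = [
    ("socket",   set(),             {"sock"}),
    ("OpenFile", set(),             {"hFile"}),
    ("ReadFile", {"hFile"},         {"buf_2"}),
    ("send",     {"sock", "buf_2"}, set()),     # sink
]
print(matches(ave_maria))  # True: the File Exfiltration trace connects
```

A trace whose sink consumes a value no earlier API produced fails the match and is not reported as a capability.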

Capability Analysis Plugin. A plugin specifies different ways that a given capability is to be identified.2 It lists one or more API sequences, their key arguments, and how constraints on their input and output parameters connect to each other. We develop plugins to identify 7 specific malware capabilities. Analysts can easily extend these plugins to specify additional capabilities by reviewing the API documentation of the target operating system. Next, we describe each capability, showing how a plugin can specify it.
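As one way to picture such a specification, a plugin could be written declaratively like this (a hypothetical schema we use for illustration; the paper does not prescribe this exact form):

```python
# Hypothetical declarative plugin: names the source/sink APIs and which
# parameter constraints must connect, here for File Exfiltration.
FILE_EXFILTRATION = {
    "capability": "File Exfiltration",
    "source": ["OpenFile", "socket"],
    "sink": ["send", "HttpSendRequest"],
    "links": [
        # (producer API, output param) must match (consumer API, input param)
        (("OpenFile", "return:hFile"), ("ReadFile", "in:hFile")),
        (("ReadFile", "out:buffer"),   ("send",     "in:buffer")),
    ],
}

def sinks_of(plugin):
    # The engine watches for these APIs to trigger the backward slice.
    return set(plugin["sink"])

print(sorted(sinks_of(FILE_EXFILTRATION)))  # ['HttpSendRequest', 'send']
```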

1. File Exfiltration. Malware sends stolen information from an infected host by uploading a file to its drop site. This is done by using the OpenFile and ReadFile APIs to copy data into a buffer, followed by use of the send or HttpSendRequest network API.3 The plugin matches the constraints on the buffer written to by ReadFile with the buffer of data sent by send or HttpSendRequest. Figure 3 shows Forecast's analysis of AveMaria's file exfiltration.

2. Code Injection. Malware injects its code into a victim process to run under the target process ID. This is done by the OpenProcess or CreateProcess APIs, followed by WriteProcessMemory (process hollowing) and/or CreateRemoteThread (PE or DLL Injection).

2 Several plugins could be defined for one capability to capture different possible ways that malware exhibit that capability.

3 We refer to APIs with multiple variants (A, ExA, W, and ExW) by the base API name, but our plugins cover all variants.

The plugin matches the input constraints on the process handle used by these APIs. Figure 3 shows Forecast's analysis of AveMaria's code injection.

3. Dropper. Malware writes a file to disk and changes its attributes for execution. The plugin matches the constraints on the file handle returned by CreateFile with the file handle input passed to WriteFile, as well as the file name passed to CreateFile, SetFileAttributes, and CreateProcess.

4. Key & Screen Spying. Malware records keystrokes and screenshots of a user's computer. To detect key spying, the plugin matches the constraints on the window handle passed to RegisterHotKey and GetMessage and checks if WH_KEYBOARD was passed to SetWindowsHook to monitor keystrokes. For screenshots, the plugin checks if a device context handle returned by GetDC or GetWindowDC is passed to CreateCompatibleBitmap. Figure 3 shows this analysis for AveMaria's screen spying.

5. Persistence. Malware make registry entries to maintain persistence across reboots. The persistence plugin compares the constraints on the registry key handle returned by RegCreateKey or RegSetValue with the input to RegSetValue. We also specify the keys and subkeys that malware commonly use with these APIs, such as HKLM, HKCU, Run, and ControlSet.

6. Anti-analysis. Malware check for analysis environments and tools to determine whether they should hide their behavior. This can be done by checking for debuggers with OutputDebugString, IsDebuggerPresent, or CheckRemoteDebuggerPresent. VM checks look for running services by using CreateToolhelp32Snapshot or EnumProcesses, or by invoking cpuid to check for virtual CPUs. The plugin checks for usage of these APIs.

7. C&C Communication. This plugin checks the arguments of socket (af is an IP address), InternetOpenUrl (lpszUrl is a domain), and IWinHttpRequest::Open (lpszServerName is a domain or IP) to determine which servers are contacted. For domains that are represented by constant values or stored in memory (e.g., obtained from an external source such as a file or socket), the plugin can successfully extract the domain. If the domain is from an external source and had not been stored in memory at the time of the memory capture, the plugin is unable to determine its concrete value. In the case of domains generated algorithmically, Forecast builds constraints on the bytes of the domain, seeds Z3 with the concrete execution data, and attempts to solve the constraints.

To develop these plugins, we manually analyzed 50

    samples and compiled many relevant API traces andtheir key arguments, similar to what an analyst would

do. Since there are a finite number of ways malware can exhibit a given capability, we can expect to model most of those methods. In doing this, we observed that there could be variations in API traces for the same capability, but the key APIs are always present. In addition, some APIs perform the same function and hence can be interchanged. For example, WriteVirtualMemory can be interchanged for WriteProcessMemory in the process hollowing example in Figure 3. Furthermore, this approach is resilient to noisy API calls that malware authors may mix into their capability function. We provide additional details about the constraints for each plugin in Appendix A, Table 7.
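The C&C plugin's three outcomes for a domain argument (concrete, external and unresolvable, or partially constrained) can be sketched as follows; the byte-filling "solver" is a pure-Python stand-in for Z3 seeded with concrete execution data:

```python
# Sketch of C&C domain handling: concrete in-memory values are extracted
# directly; fully symbolic values (received after the capture point) cannot
# be concretized; partially constrained bytes go to the solver.
def extract_domain(arg_bytes):
    """arg_bytes: list of ints (concrete byte) or None (symbolic byte)."""
    if all(b is not None for b in arg_bytes):
        return bytes(arg_bytes).decode()       # fully concrete: report it
    if all(b is None for b in arg_bytes):
        return None                            # external, unresolvable
    # Partially constrained (e.g., algorithmically generated): stand-in
    # for asking the solver for a satisfying assignment of the free bytes.
    solved = [b if b is not None else ord("a") for b in arg_bytes]
    return bytes(solved).decode() + " (solver-assisted)"

print(extract_domain([ord(c) for c in "hopto.org"]))  # 'hopto.org'
print(extract_domain([None] * 9))                     # None
```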

    Capability Forecasts Percentages. The pathswhere capabilities are found are known as capabilitypaths or CP aths. Forecast considers these paths toderive forecast percentages for discovered capabilities.For each capability cx along a path x, Forecastreports a forecast Ccast(cx) as a percentage. Ccast(cx)is derived from path probabilities of all CP aths, andmeasures the probability that cx will be executedrelative to other capabilities. Let the cardinality ofCpaths be m. A forecast is given as follows:

    {∀i ∈ CP aths : Ccast(cx) =Pprob(x)

    m∑i=1

    Pprob(i)×100} (8)
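Equation (8) is a straightforward normalization, which can be expressed in code as follows (the probability values below are illustrative, not from the paper):

```python
# Each capability's forecast is its path probability normalized over all
# capability paths (CPaths), expressed as a percentage.
def forecast_percentages(path_probs):
    """path_probs: {capability: P_prob(path where it was found)}."""
    total = sum(path_probs.values())
    return {c: round(p / total * 100) for c, p in path_probs.items()}

print(forecast_percentages({"Code Injection": 0.40,
                            "C&C Communication": 0.28,
                            "File Exfiltration": 0.12}))
# {'Code Injection': 50, 'C&C Communication': 35, 'File Exfiltration': 15}
```

By construction the reported percentages sum to 100 (up to rounding), which is what makes them comparable across capabilities of one sample.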

    4 Evaluation

Forecast builds upon several angr [22] features, including exploration techniques, SimProcedures, and state plugins. Our focus is on Windows malware since they are most prevalent, but our methodology could be ported to other platforms.

Experiment Setup. Our experiment mimics a real-world deployment where a host-based security tool captures a memory image of malware once an IDS detects malicious network activity. Our testbed is comprised of (1) an Ubuntu 14.04 machine (with 40GB RAM and a 4-core 2.7GHz CPU) running Forecast, (2) a Windows 7 machine executing malware, and (3) an IDS system running SNORT. We collected the alert network signatures of each malware to configure SNORT. IDS alerts during the malware's execution trigger the capture of a process memory image4, which is sent to Forecast. We profiled all captured memory images and observed that 83% were taken while the malware was polling on I/O, such as a network socket.

4 WinDBG memory capture also collects pages swapped to disk.

Malware     | C&C Comm | File Exfil. | Code Inj. | Dropper  | Key&Screen Spy | Persistence | Anti-analysis | OFP | OFN
Bokbok      | 38% 2 2  | 5% 3 3      | 57% 1 1   | -        | -              | -           | -             | 0   | 0
AcridRain   | 23% 3 3  | 19% 4 4     | -         | 28% 2 2  | -              | 30% 1 1     | -             | 0   | 0
AthenaGo    | -        | 11% 4 4     | -         | 22% 3 2  | -              | 33% 2 3     | 34% 1 1       | 2   | 0
Rokrat      | 30% 1 1  | 26% 2 2     | 22% 3 3   | -        | 17% 4 4        | -           | 15% 5 5       | 0   | 0
AdamLocker  | 22% 3 3  | 0 4 ∅       | 45% 1 1   | -        | -              | 33% 2 2     | -             | 0   | 1
Marap       | -        | 46% 3 3     | 40% 1 1   | -        | 14% 2 2        | -           | -             | 0   | 0
ATI         | -        | -           | -         | 41% 2 2  | -              | 42% 1 1     | 17% 3 3       | 0   | 0
TeslaAgent  | 11% 4 4  | 14% 3 3     | 32% 1 1   | -        | 13% 2 3        | 30% 3 3     | -             | 0   | 0
Andromeda   | 25% 2 2  | -           | 14% 3 3   | -        | -              | 61% 1 1     | -             | 0   | 0
AveMaria    | 28% 3 4  | 29% 2 2     | 28% 4 3   | -        | 25% 1 1        | 0 3 ∅       | -             | 2   | 1
Aveo        | 22% 3 3  | -           | -         | 40% 1 1  | -              | 38% 2 2     | 0 4 ∅         | 0   | 1
7Honest     | -        | 16% 3 3     | 51% 1 1   | 11% 4 4  | -              | 22% 2 2     | -             | 0   | 0
Abaddon     | -        | 26% 2 2     | -         | -        | -              | 84% 1 1     | -             | 0   | 0
AVCrypt     | 51% 1 1  | -           | -         | -        | -              | 19% 3 3     | 30% 2 2       | 0   | 0

Table 1: Capability Forecasts of 14 Select Recent Samples. Each capability cell lists PF OM OF. PF: Forecast percentage, OM: Ground truth manual ordering, OF: Forecast ordering, OFP: Ordering false positives, OFN: Ordering false negatives.

4.1 Evaluating Capability Forecasts

Table 1 presents the capability forecasts of 14 recent samples5 we manually collected ground truth for. Forecast output 49 distinct capability forecasts. Manual analysis validated 45 of them; we found 4 false positives (FP) and 3 false negatives (FN), for an accuracy of 86.5%. FPs were due to over-approximating symbolic constraints when simulating undocumented APIs such as RtlCreateUserThread. The FNs were due to rare unresolved symbolic targets.

Ground Truth. Validating each forecast involves two checks: (1) the presence or absence of the identified capability, and (2) the accuracy of the forecast percentage. For ground truth on the presence or absence of a capability, we leveraged malware reports from security vendors [31], [32] and our own manual analysis. We also used the MITRE ATT&CK Framework [33] for our initial ground truth mappings. To validate our ground truth forecast percentages,

(i.e., rank each outcome according to the "difficulty" or "constraints required" of arriving at that outcome), we modeled the difficulty metric of executing capabilities from the memory image capture point based on the number of branch constraints to reach a given capability. We can obtain this metric via manual analysis of the memory image since we know the addresses of the individual capabilities. Using Bokbot as an example, Table 1 shows its 3 capabilities: Code Injection, C&C Communication, and File Exfiltration. For these, Forecast reports forecast percentages of 57%, 38%, and 5% respectively (listed in the PF columns of Table 1). Based on manual analysis of its memory image, the number of branch constraints to reach these capabilities are 166, 195, and 257,

5 Their hashes are presented in Table 8 in Appendix A.

respectively. Thus, Code Injection is less difficult to reach and hence has the highest forecast.

Next, we validate capability ordering. We assign an

increasing number, starting at 1, to each capability identified by manual checking (defined as OM) and ordered by increasing difficulty. We assign an increasing number to each capability identified by Forecast (OF) up to the number of identified capabilities. For Bokbot, both manual checking and Forecast report an ordering of 1, 2, and 3 for Code Injection, C&C Communication, and File Exfiltration respectively.
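The ordering comparison amounts to ranking capabilities by descending forecast and checking the ranks against the manual ordering; a minimal sketch using Bokbot's reported numbers:

```python
# Rank capabilities by descending forecast percentage (O_F) and compare
# against the manual difficulty ordering (O_M).
def ordering(forecasts):
    ranked = sorted(forecasts, key=lambda c: forecasts[c], reverse=True)
    return {c: i + 1 for i, c in enumerate(ranked)}

bokbot = {"Code Injection": 57, "C&C Communication": 38, "File Exfiltration": 5}
o_f = ordering(bokbot)
o_m = {"Code Injection": 1, "C&C Communication": 2, "File Exfiltration": 3}
print(o_f == o_m)  # True: Forecast's ordering matches manual checking
```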

As shown in Table 1, because Forecast's forecast for Bokbok's Code Injection is the highest (i.e., 57%), Code Injection's ordering, or OF, is 1. Similarly, the ordering by manual checking, or OM, is also 1, which validates Forecast's forecast for Bokbot's Code Injection. In another example, Forecast's prediction for AthenaGo's Dropper is 22%, which is the second highest forecast (i.e., OF is 2). However, manual checking shows Persistence as the second highest instead, resulting in an FP for AthenaGo (listed in the OFP column of Table 1). Forecast missed Aveo's Anti-analysis capability, resulting in an FN (listed in the OFN column) and a forecast of 0 (PF column).

Overall, Persistence reported the highest forecast

percentages, as high as 84% for Abaddon. We found that most malware persist via infecting the registry. Conversely, File Exfiltration reports the lowest forecasts, as low as 5% for Bokbok. Reasonably, File Exfiltration can be seen as an "end goal" capability, which malware deploy in deep code under several constraints. By integrating capability analysis plugins, Forecast was able to automatically identify them.

C&C Communication. Table 1 shows 7 C&C domains identified with 1 FP. We focused on WinINET's APIs such as InternetOpenUrl and socket. In particular, we concretized their domain and IP address arguments. Forecast revealed Rokrat and

Malware   | Packer              | Paths | Steps  | Const. | Leaves | Time (s) | DC(s) | C&C | Exfil. | Inject | Drop | Spy | Persist | Anti-Analy. | FP | FN
Marap     | UPX (Type-I)        | 227   | 465.95 | 25.74  | 3.01   | 97.39    | 0.94  |     | ✓      | ✓      |      | ✓   |         |             | 0  | 0
AVCrypt   | UPX (Type-I)        | 59    | 184.69 | 23.53  | 2.00   | 27.91    | 0.84  | ✓   |        |        |      |     | ✓       | ✓           | 0  | 0
ATI       | UPX (Type-I)        | 115   | 179.44 | 19.89  | 3.17   | 56.90    | 0.83  |     |        |        | ✓    |     | ✓       | ✓           | 0  | 0
RokRat    | ASPack (Type-III)   | 595   | 265.68 | 14.05  | 1.99   | 143.54   | 0.93  | ✓   | ✗      | ✓      |      | ✓   |         | ✓           | 0  | 1
AcridRain | ASPack (Type-III)   | 1410  | 330.39 | 26.82  | 2.84   | 247.47   | 0.88  | ✓   | ✓      |        | ✓    |     | ✗       |             | 0  | 1
AthenaGo  | ASPack (Type-III)   | 677   | 371.39 | 26.48  | 2.03   | 193.44   | 0.92  |     | ✓      |        | ✓    |     | ✓       | ✓           | 0  | 0
RokRat    | Armadillo (Type-VI) | 732   | 56.39  | 18.19  | 2.96   | 139.31   | 0.68  | ✓   | ✓      | ✗      |      | ✓   |         | ✓           | 0  | 1
AcridRain | Armadillo (Type-VI) | 338   | 226.30 | 23.70  | 3.42   | 93.34    | 0.84  | ✓   | ✓      |        | ✓    |     | ✗       |             | 0  | 1
AthenaGo  | Armadillo (Type-VI) | 701   | 55.21  | 18.17  | 2.66   | 107.42   | 0.67  |     | ✓      |        | ✗    |     | ✓       | ✓           | 0  | 1

Table 2: Packed malware evaluation results based on the packer taxonomy found in Ugarte-Pedrero et al. [34]. The first three samples are the UPX-packed samples from Table 1; ✓: capability identified, ✗: capability missed (FN).

AVCrypt's usage of dropbox.com and TOR (bxp44w3qwwrmuupc.onion), respectively. TeslaAgent uses a hardcoded IP address (45.77.35.239) and a Gmail account ([email protected]) to communicate externally. Aveo communicates with a .it domain, vacanzaimmobiliare.it. We found that this server is hosting a vacation website and is likely compromised.

Code Injection. Forecast reports 8 Code Injection forecasts with 1 FP. Explorer and Svchost are the most common Windows programs injected into. 7Honest, Bokbot, and AveMaria hollow into Svchost by invoking CreateProcess with a CREATE_SUSPENDED flag and thereafter swap the code pages with WriteProcessMemory and SetThreadContext. TeslaAgent and Andromeda inject into Explorer using the VirtualAlloc and WriteProcessMemory API sequences.

Dropper. Forecast reports 5 Dropper forecasts with no FPs or FNs. 7Honest and AthenaGo drop additional files in the AppData and ProgramData directories and manipulate their permissions using SetFileAttributes. AcridRain drops a WinDDecode.exe executable in AppData. We determined it was a custom decoder for its C&C. Aveo drops .dat executables in system32.

Key & Screen Spying. We focused on detecting keyloggers and screen captures based on the Key Hooks and GDI API toolkit. Forecast reported 4 Key & Screen Spying forecasts with 1 FP. RokRat and TeslaAgent used the GetAsyncKeyState and RegisterHotKey APIs to obtain key presses. AveMaria invoked screen capture using a sequence of GetDesktopWindow, GetWindowDC, and CreateCompatibleBitmap.

Anti-Analysis. Forecast reports 4 Anti-analysis forecasts with 1 FN. RokRat and AthenaGo performed network checks via InternetCheckConnectionA. AVCrypt uses IsDebuggerPresent, OutputDebugString, and CheckRemoteDebuggerPresent to check for debuggers. To check for VMs, ATI issues cpuid calls to obtain hardware platform information.

    4.2 Packed Malware

We evaluated Forecast's robustness against packers using the taxonomy proposed by Ugarte-Pedrero et al. [34]. In fact, 3 of the 14 samples from Table 1 are packed by UPX, which is a Type-I packer. We include those three samples in our packer robustness evaluation as a reference, as shown in Table 2. Our evaluation also looks at three additional families using two different types of packers, namely ASPack (Type-III) and Armadillo (Type-VI), giving us a total of 9 samples.

Type-I through Type-IV packers fully unpack the malware code in memory before executing the malicious code [34]. For completeness, we evaluate Forecast against ASPack, a Type-III packer, where layered unpacking routines are not sequential, leaving junk code and data in memory from earlier layers. In Table 2, Forecast explores an average of 894 paths per sample with a high final DC(s) (mostly concrete). Additionally, Forecast identifies almost every capability found in Table 1, except the exfiltration (Exfil.) and persistence (Persist) capabilities for RokRat and AcridRain, respectively. We mark those missed capabilities as false negatives (FN) in Table 2.

Type-V and Type-VI packers unpack malicious code incrementally using different memory frames. We evaluate Forecast against Armadillo with CopyMem-II protection, which incrementally unpacks and executes code at a memory-page granularity. Forecast explores an average of 590 paths per sample with an average final DC(s) of 0.73, which is lower than for the Type-I and Type-III packers. Moreover, Forecast identifies all the capabilities in Table 1 except the code injection (Inject), persistence (Persist), and dropper (Drop) capabilities found in RokRat, AcridRain, and AthenaGo, respectively. These results empirically show the effect of incremental unpacking on Forecast's capability to analyze malware, which is rooted in the memory artifacts that are visible during malware capture. We discuss these limitations in §6.

Malware Family:         All Samples | browsefox | coinminer | xtrat | autoit | expiro | bifrose | darkkomet | rebhip | dprotect | llac | delf
Total Samples:          6,727 | 200 | 161 | 57 | 161 | 3,428 | 69 | 163 | 80 | 398 | 68 | 65
C&C URL:                30.5% | 51% | 47% | 39% | 32% | 17%
File Exfiltration:      11.3% | 12% | 17% | 8%
Code Injection:         32.7% | 25% | 44%
Dropper:                41.0% | 37% | 23% | 23% | 11% | 44% | 33% | 37% | 26% | 10%
Persistence:            55.2% | 52% | 60% | 67% | 61% | 63% | 57% | 46%
Key&Screen Spy:         24.4% | 40% | 33%
Anti-analysis:          29.4% | 29% | 34% | 39%
Avg. Explore time (s):  291 | 218 | 234 | 196 | 124 | 310 | 128 | 326 | 285 | 227 | 134 | 420
Avg. APIs per path:     26 | 21 | 18 | 12 | 9 | 17 | 29 | 13 | 45 | 28 | 13 | 11
Avg. States generated:  1,638 | 1,196 | 1,267 | 3,450 | 950 | 1,471 | 4,601 | 670 | 897 | 1,136 | 823 | 1,568
DC(s) of final states:  0.21 | 0.34 | 0.21 | 0.29 | 0.39 | 0.28 | 0.18 | 0.43 | 0.43 | 0.32 | 0.41 | 0.31

Table 4: Average Capability Forecasts and Metrics, featuring the 11 most prevalent malware families. Capability rows are sparse: they list values only for families exhibiting that capability.

    4.3 Tactics To Subvert Forecast

Category     | Samples | Paths | Steps | C/P   | Leaves | Flags
No Hash      | 10      | 2.00  | 21.50 | 3.00  | 16.00  | 100%
Hash-Guarded | 10      | 74.70 | 45.15 | 19.00 | 3.40   | 100%
Tigress      | 2,311   | 4.02  | 58.65 | 8.47  | 3.84   | 97%

Table 3: Averaged results of the symbolic obfuscation evaluation. C/P denotes constraints per path.

Malware authors who are aware of Forecast may try to adapt advanced tactics to subvert our capability exploration. To evaluate Forecast against targeted attacks, we follow the set of obfuscation benchmarks proposed by Banescu et al. [35], [36]. These anti-analysis benchmarks are broken into two sets: a set of 10 hash-guarded programs that simulate license checking (Hash-Guarded) and a set of 2,311 Tigress-obfuscated programs (Tigress). Table 3 presents the results for three experiments, namely baseline (No Hash), Hash-Guarded, and Tigress. For the Hash-Guarded programs, we created a Forecast plugin that triggers when the license check is correct (captured Flag). For the No Hash programs, Forecast found the flag and explored 2 paths with an average of 21.5 steps per path, 3 constraints, and 16 leaf nodes per constraint AST.

For the Hash-Guarded programs, Forecast found all flags and explored an average of 74.7 paths, with 45.15 steps and 19 constraints per path, and 3.4 leaves per constraint. For the Tigress-obfuscated programs, the code performs various transformations on the input and compares the derived value against an expected value that represents the correct license key.6 The results show that 97% of the flags were found and an average of 4.02 paths were explored, with an average of 58.65 steps per path, 8.47 constraints per path, and 3.84 leaves per constraint. These results empirically demonstrate that Forecast is resilient against adversarial obfuscation attacks targeting symbolic execution.

6 We excluded Tigress programs which crashed or did not print the flag during a natural execution with the correct argument.

4.4 Large-Scale Analysis

We show that Forecast is effective when applied to a larger set of memory images from 6,727 malware samples (covering 274 different malware families). Table 4 summarizes Forecast's capability forecasts for the 6,727 samples and highlights metrics for the top 11 most prevalent malware families in our dataset. The highest capability forecasts were recorded for Persistence and Dropper. We observed that over 70% of all 6,727 samples have Dropper and Persistence capabilities. When averaged, Persistence reports a 55.2% overall forecast, peaking at 67% for the Bifrose family. Our experiment revealed that the Bifrose family enters several registry Run keys in both the HKLM* and HKCU* registry directories, an aggressive means to force persistence across reboots compared to other families. Bifrose samples also drop a .dat executable file in Windows\System32 and connect to a no-ip.com domain C&C. The Dropper capability reported 41.0% overall, peaking at 44% for the Expiro family. The lowest forecasts were File Exfiltration, with 11.3% overall.

We observe fairly low variance between the highs and lows of forecasts within each family. Digging deeper, this is due to samples in the same family reusing the same features (e.g., dropper filenames). Samples in the Browsefox adware family drop an executable with a consistent file name format of "Expance.exe" in the ProgramData directory. Our investigation found that it installs extensions to browsers to display ads, earning the attacker ad revenue. The Xtrat family of remote access trojans displays similar patterns of C&C domain names, namely to.org; concrete examples are zapto.org and hopto.org.

Exploration Metrics. Table 4 reveals interesting observations about the metrics reported by each malware family. The average exploration time for one memory image is 291 seconds, which shows that Forecast is efficient as an offline investigation tool. Forecast revealed an average of 26 unique APIs per

memory image and generated 1,638 states on average per sample. The Bifrose family reported the largest number of states per sample (4,601), while Darkkomet generated the lowest (670 states per sample). The average DC(s) for end states was 0.21, which

indicates that states toward the end were more symbolic than concrete. Bifrose reported the lowest DC(s) of 0.18, indicating a very symbolic ending. This was the general observation for most C&C-based malware, since simulating socket calls introduces more symbolic data, causing DC(s) to drop. Darkkomet and Rebhip tied for the highest DC(s) with 0.43. This confirms the correlation between DC(s) and cumulative state coverage. Samples in the Delf family reported the maximum exploration time (420 seconds on average), which explains their high average state count (1,568).

4.5 Pre-Staged Concrete Input

Recall that when no concrete input data exists, pure symbolic analysis will explore all paths. The DC(s) model assumes that following paths that involve pre-staged concrete data in the memory image focuses Forecast on the most urgent payloads. We empirically evaluated this assumption with controlled experiments on 2 malicious and 3 benign programs: (1) LokiRAT, a remote access trojan, (2) xTBot, an IRC-based malware, (3) netstat, (4) ipconfig, and (5) arp. These were chosen because their source code is publicly available and their behavior for concrete inputs can be determined.7 We analyzed their source code and compiled binaries to establish the ground truth set of paths that selected concrete inputs will cause the program to take. For LokiRAT and xTBot, we determined all specific paths that the malware could take when it receives certain commands from its C&C server. For netstat, ipconfig, and arp, we determined all specific paths that the programs could take when executed with a set of command-line flags. Figure 4 illustrates an example. Table 5 shows these programs and each of the

concrete data we investigated. For netstat, ipconfig, and arp, we executed each program with the command-line flags shown in Table 5 and took a memory image when main was called to ensure the flags exist in the memory image as concrete data. For these experiments, we obtained 9 memory images (3 command-line flags for each of the 3 programs). For LokiRAT and xTBot, we executed each sample, injected each selected C&C command, and captured memory images as soon as they received each C&C command (6 in total). The intuition here is that Forecast should produce the same paths as each ground truth set for the corresponding memory images.

7 Forecast did not have access to the ground truth.

Figure 4: LokiRAT ground truth. P_TRUTH-regnewkey, P_TRUTH-message, and P_TRUTH-rename represent the ground truth set of paths for each LokiRAT C&C command (regnewkey, message, rename).

Malware  | C&C Cmd      | Ground Truth Paths | Forecast Paths | TP | FP | FN | Acc (%)
LokiRAT  | regnewkey    | 4  | 5  | 4  | 1 | 0 | 80
LokiRAT  | message      | 4  | 4  | 4  | 0 | 0 | 100
LokiRAT  | rename       | 2  | 2  | 2  | 0 | 0 | 100
XTBot    | .ntstats     | 1  | 1  | 1  | 0 | 0 | 100
XTBot    | .netinfo     | 2  | 2  | 2  | 0 | 0 | 100
XTBot    | .sysinfo     | 28 | 30 | 27 | 2 | 1 | 90.0

Benign   | Argument     | Ground Truth Paths | Forecast Paths | TP | FP | FN | Acc (%)
netstat  | -a           | 3  | 3  | 3  | 0 | 0 | 100
netstat  | -e           | 3  | 3  | 3  | 0 | 0 | 100
netstat  | -r           | 2  | 2  | 2  | 0 | 0 | 100
ipconfig | -release     | 4  | 4  | 4  | 0 | 0 | 100
ipconfig | -renew       | 6  | 5  | 5  | 0 | 1 | 83.3
ipconfig | no-flag      | 19 | 18 | 16 | 2 | 1 | 84.2
arp      | -a           | 6  | 6  | 6  | 0 | 0 | 100
arp      | -d 10.1.1.1  | 8  | 7  | 7  | 0 | 1 | 87.5
arp      | -s :cf:b8:20 | 11 | 12 | 10 | 1 | 1 | 83.3

Table 5: Exploration Based on Pre-Staged Input.

Table 5 shows that, for the malware, Forecast discovered 40 out of 41 ground truth paths, with 3 FPs and 1 FN, giving an accuracy of 95.0%. For the benign programs, Forecast discovered 56 out of 62 ground truth paths, with 3 FPs and 4 FNs, giving an accuracy of 93.1%. We found that the FP results were caused by short paths that were pruned when they accessed illegal memory. FNs were caused by symbolic IP values due to unconstrained jump targets. Overall, Forecast attained an accuracy of 94.0%. This shows that Forecast's exploration of memory images using pre-staged inputs is accurate.
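The per-row accuracies in Table 5 are consistent with TP divided by the larger of the ground-truth and explored path counts; note this formula is our reconstruction from the table's values, not one the paper states explicitly:

```python
# Reconstructed accuracy metric: TP / max(ground-truth paths, explored paths),
# checked against several rows of Table 5.
def row_accuracy(gt_paths, found_paths, tp):
    return 100.0 * tp / max(gt_paths, found_paths)

print(round(row_accuracy(4, 5, 4), 1))     # 80.0  (LokiRAT regnewkey)
print(round(row_accuracy(28, 30, 27), 1))  # 90.0  (XTBot .sysinfo)
print(round(row_accuracy(6, 5, 5), 1))     # 83.3  (ipconfig -renew)
```

The reported per-program aggregates (95.0%, 93.1%, 94.0%) then match the mean of these row accuracies.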

4.6 Comparing Existing Techniques

We empirically compared Forecast with S2E [6], angr [22], and Triton [23]. We found that Forecast outperforms them at identifying malware capabilities based on the coverage of capability paths (i.e., code paths where at least one capability is found). Since they cannot take a memory image as input, with the exception of angr, we provided the malware binary and configured them to start from an equivalent IP as Forecast. We used 50 samples for this experiment.8

Tool        | Exploration technique | Explored paths | Identified capabilities | Path explosion instances | Explore time (s) | Basic blocks covered
Forecast    | Data-Guided           | 877            | 32                      | 28                       | 301              | 12,488
angr [22]   | Pure Symbolic         | 1,292          | 11                      | 521                      | 236              | 14,567
S2E [6]     | Concolic              | 602            | 7                       | 57                       | 98               | 10,007
Triton [23] | Concolic              | 229            | 3                       | N/A                      | 522              | 4,309

Table 6: Forecast Compared to Existing Techniques.

As shown in Table 6, Forecast identified more than twice the capabilities compared to angr, S2E, and Triton. Forecast explored as many as 877 paths per sample on average. By leveraging prior execution state to optimize paths, only 28 paths were terminated due to path explosion, compared to 521 by angr and 57 by S2E. Although angr explored the most paths (1,292), it terminated 521 due to path explosion. We observed that angr could not concretize paths when faced with early symbolic control flow, causing state explosion. The exploration time for angr was relatively low (236s) because many paths quickly became unconstrained and terminated. Forecast reported a higher runtime of 301s due to the overhead of computing probability scores for each state.

S2E requires symbolic variables to be manually

induced for multi-path exploration. When we initially tested S2E with malware, we traced only a single path. However, to enable S2E to explore multiple paths, we symbolized the arguments of the malware's local functions and only traced paths that originated from the malware code. This led to an exploration of 602 paths, where 57 became unconstrained and terminated. S2E had the fastest average runtime (98s) because it executes code natively on the CPU. Triton uses a per-input iterative approach to code exploration, hence the path explosion metric is not applicable. To trace multiple paths with Triton, we manually pushed new constraints to each path predicate, but Triton was heavily hindered by input requirements to explore new paths. Triton traced 229 paths on average, 3 of which identified capabilities. Due to its iterative nature and instruction-level emulation, it incurred the highest runtime of 522 seconds.

5 Related Work

Prior work uses symbolic execution for various applications, including test case generation [8]–[10], [27], [37]–[40], vulnerability detection [9], [41], [42], and enhancing dynamic malware analysis [3], [5], [43], but often relies on simplistic heuristics to optimize symbolic execution. FuzzBall [44] initializes the program states to

8 Hashes and capability addresses are in Table 9 in Appendix A.

concretize constraints, while MAYHEM [27] applies on-line and off-line concolic execution to manage path exploration. However, Forecast reduces path explosion by using the DC(s) framework to identify capability-relevant paths. Additionally, Forecast does not require an intact binary file or prior knowledge of a program's input and environment, which avoids restrictive assumptions for symbolic execution.

For malware applications, prior works use full-system emulation [4], dynamic analysis [5], [45], and Win API simulation [46] to identify malware capabilities. Yadegari et al. [47] study the robustness of symbolic analysis techniques against malware obfuscation. In contrast, Forecast is a post-detection approach that combines both symbolic analysis and memory forensics to identify staged malware capabilities. Prior work on memory forensics focuses on kernel objects [48], [49], access patterns to kernel objects [50]–[54], and dynamic memory traces [55], [56] to detect and remediate rootkit malware. DSCRETE [57] leverages memory image code reuse for interpreting single data structures. Similarly, for mobile security, prior works [58]–[61] analyze a mobile application's memory to recover artifacts related to recent activities. However, Forecast relies on memory artifacts to contextualize malware behavior through symbolic analysis and surgically analyzes a single target malicious process.

Provenance-based investigation techniques are also

    related to Forecast. NoDoze [62] and Hassan etal. [63] utilize Windows and Linux system events toprioritize alerts through a network diffusion approachusing temporal ordering. Similarly, HOLMES [64]correlates suspicious events by examining informationflows and TARDIS [65] identifies compromised websitesthrough a spatial-temporal approach to present attacktactics for analysts. Attack2Vec [66] uses system eventembedding to derive emerging attack tactics.Forecast uses the DC(s) model to predict in-progressmalware capabilities using a similar network diffusionapproach [62]–[64] but instead identifies relevant pathsbased on the execution context of a malware.
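The network-diffusion idea these systems share can be sketched with a toy anomaly score that propagates along a temporally ordered event path, where rare causal edges keep the score high. The event names and frequencies below are invented for illustration; this is not NoDoze's or Forecast's actual scoring.

```python
# Toy network-diffusion-style alert scoring over a causal event path:
# the anomaly score is the product of edge rarities (1 - benign frequency),
# so paths built from rare events stay highly ranked. All event names and
# frequencies here are invented for illustration.

EDGE_FREQ = {  # (src, dst) -> frequency of this edge in benign history
    ("winword.exe", "cmd.exe"): 0.01,       # rare: Word spawning a shell
    ("cmd.exe", "powershell.exe"): 0.20,
    ("explorer.exe", "cmd.exe"): 0.90,      # common, benign
}

def path_score(path):
    """Anomaly score of a temporally ordered event path."""
    score = 1.0
    for edge in zip(path, path[1:]):
        score *= 1.0 - EDGE_FREQ[edge]
    return score

suspicious = path_score(["winword.exe", "cmd.exe", "powershell.exe"])
benign = path_score(["explorer.exe", "cmd.exe", "powershell.exe"])
# the Word-spawned chain scores far higher than the explorer-spawned one
```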

    6 Limitations and Discussion

Subverting Symbolic Analysis. An adversary may target the symbolic execution component of Forecast by exploiting path explosion, path divergence, and constructing complex constraints. In §4.3, we turned to the published literature on symbolic analysis benchmarks [35] and found that Forecast is robust against these attacks. However, we acknowledge that a novel attack, not considered in the literature, may subvert Forecast's results.

Subverting Memory Artifacts. An adversary may target memory acquisition or memory artifacts to subvert Forecast. The memory acquisition depends on the IDS, which Forecast has no control over. It is reasonable to assume that the IDS will detect and capture a malware sample while it is executing malicious routines (which produced the detected signature). To tamper with memory artifacts, an adversary can obfuscate code segments, use a non-standard stack layout, or insert junk code/data. Forecast was shown to be resilient to junk code/data produced by Type-III packers in §4.2. If Forecast is affected by an attack that subverts code analysis, Forecast could be extended to handle specific memory manipulation attacks by porting IDA microcodes to flatten obfuscated code structures [67].

Virtualization-Based (VM) Packing. Generally, like any symbolic exploration framework, Forecast cannot explore capabilities in packed code unless it is unpacked. As our evaluation in §4.2 shows, Forecast can handle Type-I, Type-III, and Type-VI packers as outlined in Ugarte-Pedrero et al. [34]. Some packers use virtualization to convert programs into bytecode and use an interpreter to run the bytecode. Due to the complexity of virtualization, Forecast cannot handle such techniques, which account for less than 2% of packed malware [34].
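Why virtualization-based packing defeats code exploration can be seen from a minimal dispatch-loop interpreter: the program's real branches become data-driven jumps inside one loop, so an analysis engine sees only the interpreter's control flow. The bytecode format below is invented for illustration.

```python
# Toy virtualization-based "packer": original control flow is encoded as
# bytecode, and the only native control flow an analysis engine sees is
# this dispatch loop; branch targets are data. Opcodes are invented.

PUSH, ADD, JNZ, HALT = 0, 1, 2, 3

def vm(bytecode):
    """Minimal dispatch-loop interpreter; returns top of stack at HALT."""
    stack, pc = [], 0
    while True:
        op = bytecode[pc]
        if op == PUSH:
            stack.append(bytecode[pc + 1]); pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b); pc += 1
        elif op == JNZ:                      # data-driven branch target
            pc = bytecode[pc + 1] if stack.pop() != 0 else pc + 2
        elif op == HALT:
            return stack.pop()

# (PUSH 2)(PUSH 3) ADD -> 5; JNZ jumps to offset 10 because 5 != 0
code = [PUSH, 2, PUSH, 3, ADD, JNZ, 10, PUSH, 0, HALT, PUSH, 1, HALT]
result = vm(code)  # returns 1 (the branch-taken path)
```

Recovering the original branch structure requires first lifting the bytecode, which is why such packers fall outside Forecast's scope.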

Adversarial Aware Attacks. An adversary who is aware of Forecast can influence the analysis via two factors: memory frame replacement and pointer obfuscation. First, memory frame replacement can subvert Forecast for specific samples using Armadillo with CopyMem-II protection, due to its iterative unpacking and execution sequence. This unpacks code at a memory frame-level granularity, which limits the visibility of the malicious code to the most recently unpacked memory frame. This artifact is evident in our evaluation in Table 2. Second, pointer obfuscation creates additional overhead for the symbolic execution engine, which drops the degree of concreteness (DC(s)) metric. An attacker can heavily utilize pointer obfuscation by relying on a unique seed in memory to deobfuscate pointers.

    Heavy obfuscation of memory artifacts can and does affect the performance and stability of the malware, which may not be in the favor of the malware operator. Not surprisingly, Ugarte-Pedrero et al. [34] find such heavy obfuscation in only 1.8% of in-the-wild malware. Finally, we emphasize that the quality of the memory capture is dependent on the detection tool, independent of Forecast. Forecast is a post-detection approach that relies on a forensic memory capture to perform capability prediction.
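The seed-based pointer obfuscation discussed above can be sketched in a few lines, in the spirit of Windows' EncodePointer: pointers are stored XOR-ed with an in-memory secret, so an engine that cannot concretize the seed sees every decoded pointer as symbolic. This is an illustrative model with an invented address, not malware code.

```python
# Illustrative seed-based pointer obfuscation (in the spirit of Windows'
# EncodePointer): pointers live in memory XOR-ed with an in-memory secret.
# A symbolic engine that cannot concretize SEED sees every decoded
# pointer, and thus every dereference through it, as symbolic.

import secrets

SEED = secrets.randbits(64)        # unique per-process secret in memory

def encode_ptr(ptr: int) -> int:
    return ptr ^ SEED              # what is actually stored

def decode_ptr(obf: int) -> int:
    return obf ^ SEED              # concrete only if SEED is concrete

ptr = 0x140001050                  # hypothetical code address
stored = encode_ptr(ptr)
restored = decode_ptr(stored)      # equals ptr, but only given the seed
```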

    7 Conclusion

Forecast overcomes the high cognitive burden on an analyst by forecasting future malware capabilities. Forecast integrates memory forensics and symbolic analysis in a feedback loop to efficiently explore malware with context. Our evaluation has shown that Forecast produces accurate forecasts of capabilities.

    Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and feedback. We also thank Professor Nolen Scaife for his guidance while shepherding this paper. This work was supported, in part, by ONR under Award N00014-19-1-2179 and NSF under Award 1755721. Any opinions, findings, and conclusions in this paper are those of the authors and do not necessarily reflect the views of our sponsors or collaborators.

    References

[1] Fileless attacks against enterprise networks, https://securelist.com/fileless-attacks-against-enterprise-networks/77403/, 2017.

[2] The Darkhotel APT: A Story of Unusual Hospitality, https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2018/03/08070903/darkhotel_kl_07.11.pdf, 2014.

[3] U. Bayer, A. Moser, C. Kruegel, and E. Kirda, "Dynamic analysis of malicious code," Journal in Computer Virology, vol. 2, no. 1, 2006.

[4] D. Brumley, C. Hartwig, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, D. Song, and H. Yin, "Bitscope: Automatically dissecting malicious binaries," Technical Report CS-07-133, School of Computer Science, Carnegie Mellon University, 2007.

[5] A. Moser, C. Kruegel, and E. Kirda, "Exploring Multiple Execution Paths for Malware Analysis," in Proceedings of the 28th Symposium on Security and Privacy (Oakland), Oakland, CA, May 2007.

[6] V. Chipounov, V. Kuznetsov, and G. Candea, "S2E: A platform for in-vivo multi-path analysis of software systems," in Proceedings of the 16th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Newport Beach, CA, Mar. 2011.

[7] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim, "QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing," in Proceedings of the 27th USENIX Security Symposium (Security), Baltimore, MD, Aug. 2018.

[8] K. Sen, D. Marinov, and G. Agha, "CUTE: A concolic unit testing engine for C," in Proceedings of the ACM SIGSOFT Software Engineering Notes, Lisbon, Portugal, Sep. 2005.

[9] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler, "EXE: Automatically generating inputs of death," ACM Transactions on Information and System Security, vol. 12, no. 2, 2008.


[10] P. Godefroid, N. Klarlund, and K. Sen, "DART: Directed automated random testing," in Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Chicago, IL, Jun. 2005.

[11] P. M. Comparetti, G. Salvaneschi, E. Kirda, C. Kolbitsch, C. Kruegel, and S. Zanero, "Identifying dormant functionality in malware programs," in Proceedings of the 31st Symposium on Security and Privacy (Oakland), Oakland, CA, May 2010.

[12] L. Martignoni, E. Stinson, M. Fredrikson, S. Jha, and J. C. Mitchell, "A layered architecture for detecting malicious behaviors," in International Workshop on Recent Advances in Intrusion Detection, Springer, 2008, pp. 78–97.

[13] K. A. Roundy and B. P. Miller, "Hybrid analysis and control of malware," in Proceedings of the 13th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), Ottawa, Canada, Sep. 2010.

[14] C. Kolbitsch, T. Holz, C. Kruegel, and E. Kirda, "Inspector gadget: Automated extraction of proprietary gadgets from malware binaries," in Proceedings of the 31st Symposium on Security and Privacy (Oakland), Oakland, CA, May 2010.

[15] Non-Malware Attacks and Ransomware Take Center Stage in 2016, https://www.carbonblack.com/wp-content/uploads/2016/12/16_1214_Carbon_Black-_Threat_Report_Non-Malware_Attacks_and_Ransomware_FINAL.pdf, 2016.

[16] Q. Wang, W. U. Hassan, D. Li, K. Jee, X. Yu, K. Zou, J. Rhee, Z. Chen, W. Cheng, C. Gunter, and H. Chen, "You are what you do: Hunting stealthy malware via data provenance analysis," in Proceedings of the 2020 Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2020.

[17] M. I. Sharif, A. Lanzi, J. T. Giffin, and W. Lee, "Impeding malware analysis using conditional code obfuscation," in Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2008.

[18] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna, "Efficient detection of split personalities in malware," in Proceedings of the 17th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2010.

[19] FireEye: Endpoint Forensics, https://www.fireeye.com/products/mir-endpoint-forensics.html, [Accessed: 2018-02-28].

[20] B. D. Carrier and J. Grand, "A hardware-based memory acquisition procedure for digital investigations," Digital Investigation, vol. 1, 2004.

[21] S. Vömel and F. C. Freiling, "A survey of main memory acquisition and analysis techniques for the windows operating system," Digital Investigation, 2011.

[22] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna, "SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis," in Proceedings of the 37th Symposium on Security and Privacy (Oakland), San Jose, CA, May 2016.

[23] Triton: A Dynamic Symbolic Execution Framework, SSTIC, 2015, pp. 31–54.

[24] Volatility: Open Source Memory Forensics Framework, https://www.volatilityfoundation.org, 2019.

[25] Y. Li, Z. Su, L. Wang, and X. Li, "Steering symbolic execution to less traveled paths," in Proceedings of the 2013 Annual ACM SIGPLAN International Conference on Object Oriented Programming, Systems, Languages & Applications (OOPSLA), Indianapolis, IN, Oct. 2013.

[26] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea, "Efficient state merging in symbolic execution," ACM SIGPLAN Notices, vol. 47, no. 6, pp. 193–204, 2012.

[27] S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley, "Unleashing MAYHEM on Binary Code," in Proceedings of the 33rd Symposium on Security and Privacy (Oakland), San Francisco, CA, May 2012.

[28] T. Avgerinos, A. Rebert, S. K. Cha, and D. Brumley, "Enhancing symbolic execution with veritesting," in Proceedings of the 36th International Conference on Software Engineering (ICSE), Hyderabad, India, May 2014.

[29] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, "Effective and Efficient Malware Detection at the End Host," in Proceedings of the 18th USENIX Security Symposium (Security), Montreal, Canada, Aug. 2009.

[30] H. Lim, "Detecting Malicious Behaviors of Software through Analysis of API Sequence k-grams," Computer Science and Information Technology, vol. 4, no. 3, pp. 85–91, 2016.

[31] Malpedia: Free and Open Malware Reverse Engineering Resource offered by Fraunhofer FKIE, https://malpedia.caad.fkie.fraunhofer.de, [Accessed: 2019-01-28].

[32] Malware Archaeology: Malware Discovery, Education, Training, Active Defense, Detection and Response, https://www.malwarearchaeology.com/analysis, [Accessed: 2019-01-28].

[33] MITRE ATT&CK Framework: A globally-accessible knowledge base of adversary tactics and techniques based on real-world observations, https://attack.mitre.org/software/, [Accessed: 2019-04-20].

[34] X. Ugarte-Pedrero, D. Balzarotti, I. Santos, and P. G. Bringas, "SoK: Deep Packer Inspection: A Longitudinal Study of the Complexity of Run-Time Packers," in Proceedings of the 36th Symposium on Security and Privacy (Oakland), San Jose, CA, May 2015.

[35] S. Banescu, C. Collberg, V. Ganesh, Z. Newsham, and A. Pretschner, "Code obfuscation against symbolic execution attacks," in Proceedings of the 32nd Annual Computer Security Applications Conference (ACSAC), 2016.

[36] S. Banescu, C. Collberg, V. Ganesh, Z. Newsham, and A. Pretschner, Obfuscation benchmarks, 2016. [Online]. Available: https://github.com/tum-i22/obfuscation-benchmarks.

[37] J. C. King, "Symbolic execution and program testing," Communications of the ACM, vol. 19, no. 7, 1976.

[38] R. S. Boyer, B. Elspas, and K. N. Levitt, "SELECT — a formal system for testing and debugging programs by symbolic execution," ACM SIGPLAN Notices, vol. 10, no. 6, pp. 234–245, 1975.

[39] W. E. Howden, "DISSECT — A symbolic evaluation and program testing system," IEEE Transactions on Software Engineering, no. 4, pp. 266–278, 1978.

[40] C. Cadar and D. Engler, "Execution generated test cases: How to make systems code crash itself," in Proceedings of the International SPIN Workshop on Model Checking of Software, San Francisco, CA, Aug. 2005.


[41] C. Cadar, P. Godefroid, S. Khurshid, C. S. Păsăreanu, K. Sen, N. Tillmann, and W. Visser, "Symbolic execution for software testing in practice: Preliminary assessment," in Proceedings of the 33rd International Conference on Software Engineering (ICSE), Honolulu, HI, May 2011.

[42] V. Chipounov, V. Georgescu, C. Zamfir, and G. Candea, "Selective Symbolic Execution," in Proceedings of the 5th Workshop on Hot Topics in System Dependability (HotDep), Estoril, Portugal, Jun. 2009.

[43] D. Brumley, C. Hartwig, Z. Liang, J. Newsome, D. Song, and H. Yin, "Automatically identifying trigger-based behavior in malware," in Botnet Detection, Springer, 2008, pp. 65–88.

[44] L. Martignoni, S. McCamant, P. Poosankam, D. Song, and P. Maniatis, "Path-exploration lifting: Hi-fi tests for lo-fi emulators," in Proceedings of the 17th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), London, UK, Mar. 2012.

[45] F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su, "X-Force: Force-Executing Binary Programs for Security Applications," in Proceedings of the 23rd USENIX Security Symposium (Security), San Diego, CA, Aug. 2014.

[46] R. Baldoni, E. Coppa, D. C. D'Elia, and C. Demetrescu, "Assisting Malware Analysis with Symbolic Execution: A Case Study," in Proceedings of the International Conference on Cyber Security Cryptography and Machine Learning (CSCML), Israel, Jun. 2017.

[47] B. Yadegari and S. Debray, "Symbolic Execution of Obfuscated Code," in Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS), Denver, Colorado, Oct. 2015.

[48] M. Carbone, W. Cui, L. Lu, W. Lee, M. Peinado, and X. Jiang, "Mapping kernel objects to enable systematic integrity checking," in Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS), Chicago, Illinois, Nov. 2009.

[49] W. Cui, M. Peinado, Z. Xu, and E. Chan, "Tracking Rootkit Footprints with a Practical Memory Analysis System," in Proceedings of the 21st USENIX Security Symposium (Security), Bellevue, WA, Aug. 2012.

[50] J. Rhee, R. Riley, Z. Lin, X. Jiang, and D. Xu, "Data-Centric OS kernel malware characterization," IEEE Transactions on Information Forensics and Security, vol. 9, 2014.

[51] B. Dolan-Gavitt, T. Leek, J. Hodosh, and W. Lee, "Tappan zee (north) bridge: Mining memory accesses for introspection," in Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS), Berlin, Germany, Oct. 2013.

[52] Z. Lin, X. Zhang, and D. Xu, "Automatic reverse engineering of data structures from binary execution," in Proceedings of the 17th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2010.

[53] A. Slowinska, T. Stancescu, and H. Bos, "Howard: A Dynamic Excavator for Reverse Engineering Data Structures," in Proceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2011.

[54] Q. Feng, A. Prakash, H. Yin, and Z. Lin, "Mace: High-coverage and robust memory analysis for commodity operating systems," in Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC), 2014.

[55] M. Polino, A. Scorti, F. Maggi, and S. Zanero, "Jackdaw: Towards Automatic Reverse Engineering of Large Datasets of Binaries," in Proceedings of the Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), Milan, IT, Jul. 2015.

[56] Z. Xu, J. Zhang, G. Gu, and Z. Lin, "Autovac: Automatically extracting system resource constraints and generating vaccines for malware immunization," in Proceedings of the 33rd International Conference on Distributed Computing Systems (ICDCS), 2013.

[57] B. Saltaformaggio, Z. Gu, X. Zhang, and D. Xu, "DSCRETE: Automatic Rendering of Forensic Information from Memory Images via Application Logic Reuse," in Proceedings of the 23rd USENIX Security Symposium (Security), San Diego, CA, Aug. 2014.

[58] B. Saltaformaggio, R. Bhatia, X. Zhang, D. Xu, and G. G. Richard III, "Screen after previous screens: Spatial-temporal recreation of android app displays from memory images," in Proceedings of the 25th USENIX Security Symposium (Security), Austin, TX, Aug. 2016.

[59] R. Bhatia, B. Saltaformaggio, S. J. Yang, A. Ali-Gombe, X. Zhang, D. Xu, and G. G. Richard III, ""Tipped Off by Your Memory Allocator": Device-Wide User Activity Sequencing from Android Memory Images," in Proceedings of the 2018 Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2018.

[60] B. Saltaformaggio, R. Bhatia, Z. Gu, X. Zhang, and D. Xu, "GUITAR: Piecing Together Android App GUIs from Memory Images," in Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS), Denver, Colorado, Oct. 2015.

[61] B. Saltaformaggio, R. Bhatia, Z. Gu, X. Zhang, and D. Xu, "VCR: App-Agnostic Recovery of Photographic Evidence from Android Device Memory Images," in Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS), Denver, Colorado, Oct. 2015.

[62] W. U. Hassan, S. Guo, D. Li, Z. Chen, K. Jee, Z. Li, and A. Bates, "NoDoze: Combatting Threat Alert Fatigue with Automated Provenance Triage," in Proceedings of the 2019 Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, Feb. 2019.

[63] W. U. Hassan, A. Bates, and D. Marino, "Tactical provenance analysis for endpoint detection and response systems," in Proceedings of the 41st Symposium on Security and Privacy (Oakland), Online Conference, May 2020.

[64] S. M. Milajerdi, R. Gjomemo, B. Eshete, R. Sekar, and V. Venkatakrishnan, "Holmes: Real-time APT detection through correlation of suspicious information flows," in Proceedings of the 40th Symposium on Security and Privacy (Oakland), San Francisco, CA, May 2019.

[65] R. P. Kasturi, Y. Sun, R. Duan, O. Alrawi, E. Asdar, V. Zhu, Y. Kwon, and B. Saltaformaggio, "TARDIS: Rolling back the clock on CMS-targeting cyber attacks," in Proceedings of the 41st Symposium on Security and Privacy (Oakland), Online Conference, May 2020.

[66] Y. Shen and G. Stringhini, "Attack2vec: Leveraging temporal word embeddings to understand the evolution of cyberattacks," in Proceedings of the 28th USENIX Security Symposium (Security), Santa Clara, CA, Aug. 2019.

[67] R. Rolles, RolfRolles/HexRaysDeob, https://github.com/RolfRolles/HexRaysDeob, Jun. 2018.


A Appendix: Additional Technical Material

Capability Plugin: File Exfiltration
Tracked APIs (Reverse Order): Send(socket, buf); Socket(socket); ReadFile(hFile, buf); OpenFile(lpFname)
Tracked Parameters: socket
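To illustrate how a capability plugin of this shape could be applied, the sketch below matches a forecasted API call list against the File Exfiltration pattern in reverse order (Send back to OpenFile), requiring a tracked parameter (here the buffer handle) to link adjacent calls. The matching logic and the trace are our invention, not the system's implementation.

```python
# Illustrative matcher for a plugin shaped like "File Exfiltration":
# walk a forecasted call list newest-to-oldest (Send -> Socket ->
# ReadFile -> OpenFile) and require a tracked parameter (the buffer)
# to link Send with the ReadFile that filled it. This matching logic
# is our sketch, not the system's implementation.

PATTERN = [           # (api, linking argument), matched in reverse order
    ("Send", "buf"),
    ("Socket", None),
    ("ReadFile", "buf"),
    ("OpenFile", None),
]

def matches_exfiltration(calls):
    """calls: chronological list of (api, {arg: value}) tuples."""
    linked_buf, idx = None, 0
    for api, args in reversed(calls):
        want_api, link = PATTERN[idx]
        if api != want_api:
            continue                      # unrelated call in between
        if link == "buf":
            if linked_buf is None:
                linked_buf = args["buf"]  # remember Send's buffer
            elif args["buf"] != linked_buf:
                continue                  # ReadFile filled another buffer
        idx += 1
        if idx == len(PATTERN):
            return True
    return False

trace = [
    ("OpenFile", {"lpFname": "C:\\secrets.docx"}),
    ("ReadFile", {"hFile": 4, "buf": 0x5000}),
    ("Socket", {}),
    ("Send", {"socket": 3, "buf": 0x5000}),
]
found = matches_exfiltration(trace)  # True: a linked exfiltration chain
```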

Sample Hash  Capture IP  Capability End Address(es)
f9c6db5331051aa487b706f0616f3287a40a27606bfddc804b3c4684d4203717  0x140005057  0x140008060
59b9d061ff78c240e1e0e8135d9be482e0fe788186b6cb940f56c67798a862df  0x14000515b  0x140008152
1eed6b168c2cd7701bf3a2aa6a30cf014cae9bc6ae813ef7356c5c6bc8ad6d18  0x1400050e7  0x1400080ec
471de9132673ec513b5c7c06a4bc1f67a7e91c6c8c7def55e9e03131ac5fb400  0x40109d  0x401374, 0x77244bb4
153fb1b9cd5dabffa3d123c4ac91abae46546db7447140df7b4aa1f2d3e8f59e  0x1400010e6  0x1400010f1, 0x77994bb4
ffec8e4a80182eb507489bcabd368d42489bf1ec871542c131df04c068d01a76  0x14000221f  0x140002516, 0x78204bb4
baa0f9e799a3d46ccb04c9d4520a69e58383b2d88aad8746f9214eaa8d3a06f3  0x14000f2b1  0x140011380, 0x14000f29e
ff64690b250faa9b1902b945f543a7b4ff9560cb562c0b18f3798538cc28178c  0x14000c2ac  0x14000fa10, 0x14000c299
dc616f2f6b1856454412ea608b96d3d6d7ab719684b6d04f0a79cf9228477d4f  0x140001408  0x1400013fd, 0x140001054
f2f9696ffea5b8cf3c1bf860a3d0704033b7693199cf097367a052144b0c350f  0x14000b089  0x1400234a9, 0x77314bb4
3025bf51ac1f1571e3f49ee1836d44f0cfd9bfcf6e39731f6fea0ddde33925a1  0x1400012d4  0x140007690, 0x78714bb4
e82b6a27a1aec373983f189cd422f1eeb336f1f493db341df5d090a4946feae8  0x14000159a  0x140012987, 0x7fefef911fd, 0x77984bb4, 0x14000de82, 0x14000de5f, 0x14000de8d
51c5668f052bbfb4ca9670413a240c8214264839211119543b28f90f86504edc  0x14000136c  0x1400043b5
f055f75abb82c9500b3f2cf64f6b546105177599b718304b3fc569e932533087  0x14000be19  0x14000e360, 0x14000be06
c06b359921a385efbf8ce33bd875a797d89f88c575fe640173429ce5a10b45ae  0x140001c56  0x140002e0e, 0x77ba4bb4
54b49a2faef8b8a6b8ef9bd96a44575403025e8c422ef8817d8cba6ea0344945  0x14000ec06  0x140010bc7, 0x14000ebf3
a785bc5be1fd3e9f6997f558a4e613b973769cc43c6e7b738158354b66390d06  0x1400012b5  0x140004daa, 0x78114bb4
fee18f402375b210fc7b89e29084fb8e478d5ee0f0cdb85d4618d14abb2e5197  0x14000faa9  0x140011e80, 0x14000fa96
f85abdfa7e8931686bbbb9bb0dd2e12ca10f28b8b1b7be2890eb19023c52232a  0x1400242b8  0x1400242c3, 0x77314bb4
ec72f1af9119754195a77cd890cc9e5ee1e555e9ef89fe2e535ee3e4ce2132cf  0x140011f10  0x140016387, 0x140011efd
a5c8d9df73b2ff360d22e879b678d323bbccd81cb9e0ef45cce4aaf4e37c7f27  0x140011516  0x14001163a, 0x77644bb4
58f9504b59b40dfbff5e3093af0a39def00b449c499ef3e7c0880ac986575f76  0x14000fda9  0x140012180, 0x14000fd96
6130a8c7595f6d9abc3dba157e8bd7596b11c9903296060e52d764a8719d7b84  0x140023d5a  0x77d0a358, 0x140023156, 0x77ba4bb4, 0x77d03e18, 0x140025d80
c9b27cbdc1b4258cd4103b3847e7de9c52985289ce4bd61323d69bf9c1e2a8c0  0x1400025bc  0x14000343c, 0x77874bb4
1edfad978a9e4beb24c2f51e9cf12424d415f5e9b5292279ac47b9f650495b31  0x140001041  0x14000174e, 0x140001045, 0x77e1a358, 0x1400016a0, 0x140001437
cab869f98ba3fe1948d2b48fa76fa4767fa7f31e28f3be2b34572ab0c63f942a  0x14000f8cd  0x140011ca0, 0x14000f8ba
600845916e82b6de80f9ff1d6a0553ff98bce6f41dc6029343821f095072fcee  0x14000a6c9  0x14001bbb9, 0x77644bb4
c2bda34d3ac4844ea377aa87b115b94019b98919de7d153029865efc969fd46d  0x140002a5e  0x76f53e18, 0x140002a62
b59c3d14968a9d7d90baa0df624339aa977dc98e5de1c7f6b71bef23606db769  0x14000d20d  0x14000f1a0, 0x14000d1fa
2dbd5d77540a1470459d74906d1668ae49fb275d834976fae1f31bbe74d8e168  0x140003df6  0x140010ba0, 0x76cd4bb4
e47b4147f8a51511b087f90ae07a4d0650b17a6ca2be5a7b19ad1c3f058fb15f  0x4038db  0x406a8b, 0x7fefbb1580a, 0x405b7a, 0x405b7f, 0x406a7f
6dabcf4ce36360826b381a80a7bd34d0df6612f37528e0086009a87bbc16ee57  0x401724  0x7fefdaa99e2, 0x7fefdad811e
b1dee4864ee0d67afb4889cdb0efe1ea54e1005debeb9ef4b4541848c23750c8  0x14000b079  0x14001f7e9, 0x77984bb4
f3fb1b8bd66a67e9f5e00895fb1fee886764c1fc65def4b0104eb7408973ee40  0x140004750  0x778c3e18, 0x77644bb4
b3f91bd440d63ff0b3a28e3fc444714088dc8f30160a6e5f8073594f7d9a6aa6  0x1400016ae  0x1400172b0, 0x7feff5d11fd, 0x77244bb4, 0x140010cdb, 0x140010ce9, 0x140010cb2
d12899958f7adc1be6a3f540f5a25a6ea5eb024dba018d7d3d0a1808df970323  0x140004ac2  0x140004c00
ae210c336cdfec7f7f523fa5b910981e2896f53184b3863621629e81cc0607ed  0x14000a747  0x78263e18, 0x14000a732
35b8a197bd6642f62af2b809ba72d8d7cc4ac18879f10cffeb8f2df66db93746  0x14000a6cd  0x14001f869, 0x140005d7e, 0x14000370e
c9ee386c3d2b8230d95870ce3391aa8a4890169a0fe021a5562d3735f2466160  0x140001238  0x140001243
db8caaf17e1e9afa4a64b7e6a57d07a2eb6669edaed70daced81295ea183da9f  0x14000233c  0x140002347
355341b710fe7f121df4c5fcfc32de9da5a5e2003f0869fcbb7a47f92f2471f2  0x40369d  0x403a64, 0x40146e, 0x4014f8
fba0cc427658445f0ca78d6a263c5b9a9714e99e733ffe25ba719c9b39b98664  0x14000a685  0x140019af9, 0x77ba4bb4
23c7eee980ca21ac8597bd6eb2147e4bfc1941490db87f276a13146914ea5637  0x140003957  0x1400075ac, 0x140003945, 0x1400074d6, 0x1400074fe, 0x14000721f
4f998e4290bdf67dc4a1e75ed739eb57defda3c329b6b07f29b3b6c771a8b3ea  0x1400010ae  0x1400032d9, 0x77ba4bb4
a238ccc209980719927c777fc9f16866403cb9d58c0c847b9cd92ece0d46e725  0x14000226c  0x14000ab18, 0x14000225a, 0x14000aa42, 0x14000aa6a, 0x14000a79f
17ecabd73e1eb5f7a7f6b35b0c48d3fcf5f73f65aef34993726439d7d27da849  0x14000254b  0x140002556
5c9e92f6b45b0cb098838e5db6623067396f066704f9c909b31d234bfaf74458  0x100005642  0x10000c259
697256960cdded3229b0f2f99b593751d3862774dc7c5cabdbbf769beadd263f  0x2000032cb  0x20000da50
c0be7a344a863894890127e61851838037bd9d076423bfc8296cfd6e01d66f6b  0x14000f939  0x140011d10, 0x14000f926
656ac5ec110c5f8ce68ce1962d6b2cbd47ee6ce20a181c88bb1e5481793f0578  0x140001c70  0x140001c81, 0x14000133a
    Table 9: Malware Samples And Parameters Used In The Empirical Evaluation (§4.6).
