Reflash: practical ActionScript3 instrumentation with RABCDAsm

Jarkko Turkulainen
F-Secure

[email protected]

June 16, 2016

Abstract

Adobe Flash has been declared dead for almost ten years now. But it is still here, installed on almost every computing device on Earth, and it is getting more attention because of the alarming rate of zero-day vulnerabilities we have been witnessing over the past few years. Even though the Flash platform is so widespread, it still lacks comprehensive binary analysis tools.

In this paper we try to address this shortcoming with a set of tools and techniques for dynamic ActionScript3 (AS3) instrumentation and analysis. The techniques described in the paper cover generic AS3 opcode instrumentation and stack trace generation, and finally a toolchain for processing the generated data outside the AS3 execution environment is presented.

The presented toolchain consists of a service component that listens on active network connections and attempts to inject an instrumentation module into flash files on the wire. The instrumentation module produces a generic stack trace that is sent back to the service for offline analysis. The last component of the framework is a set of tools for building and manipulating a SQL database of stack events. The paper shows that the presented client/server architecture is scalable and relatively stable in a hostile execution environment.

I. Introduction

The Adobe Flash platform is a very common target for malicious software. It has a huge install base, it is still enabled by default on most browsers, and as a very complicated interpreted software platform it also has many bugs. Writing software for the Flash platform is relatively fast, and its deep interaction with other Web technologies, such as JavaScript and VBScript, offers attractive opportunities for malicious actors.

It is quite common to include Flash components in so-called Exploit Kits, either as a form of payload delivery mechanism, or as a main target for vulnerability exploitation. As a part of an Exploit Kit (EK), the malicious Flash can automate exploitation, obfuscate the EK actions, collect statistics and steal personal information from target systems.

Reverse engineering a complicated software stack including web pages, JavaScript, Flash and possibly other interpreted languages, ending up at a physical CPU through a software bug, is a challenging and time-consuming task. Flash files usually contain several layers of embedded Flash files to further obfuscate their purpose. The nature of such interpreted environments offers good opportunities for complicated code obfuscation, but on the other hand, it also lacks the control available in native execution environments.

To complicate matters even further, Flash is usually run in a Web environment, and it depends on external resources, such as JavaScript and other Web content. This again creates a twofold situation. On the one hand, having the entire Internet as a computing resource creates reverse engineering problems that are theoretically very difficult to solve [1], but on the other hand it also rules out some problems that exist on native platforms, such as timing attacks, because timing network connections is not reliable.

The research described in this paper concentrates exclusively on the Flash part of this complete picture. One of the challenges the author had to face was the almost complete absence of binary analysis and debugging tools for the Flash platform. For native platforms, such as Intel x86/x64, a multitude of good binary research tools already exists, but not so for the Flash platform. As an interpreted execution environment, Flash relies heavily on the native execution runtime libraries, which do offer very good debugging support, but in order to use them, one would have to compile the software with debugging information. Once compiled, the code can no longer directly manipulate its instruction stream, which means that Flash software without debugging information is out of reach of debuggers and other profiling tools. This again is a clear difference from native platforms, where one loses a lot of information without debugging symbols, but it is still possible to manipulate the code, for example to insert a breakpoint instruction (INT3 on Intel) and catch the exception with external debugger tools.

One of the main motivations for the author was to create a set of tools analogous to the familiar debugging tools of native platforms. The very basic requirement for such a tool is the ability to run without any debugging information. This is absolutely required, since malicious software usually leaves out such reverse engineering aids. The other major requirement is stability and efficiency, because malicious software quite often pushes the limits of execution and uses exotic platform features.

The proof of concept of this research is called Reflash. It is a set of tools for analyzing binary Flash files without any debugging information. Reflash grew out of curiosity for a quite promising platform. It is not limited to reverse engineering use, although that is its main purpose.

Reflash can run Flash files written in ActionScript, targeting the Adobe ActionScript Virtual Machine version 2.

i. ActionScript 2/3

The language used to write software for the Flash platform is called ActionScript. Its current version is 3. ActionScript is an object-oriented language originally designed by Gary Grossman [4]. ActionScript is a dialect of ECMAScript, a standardized scripting language specification. Other known dialects include JavaScript and JScript.

Reflash operates on a lower level, using the underlying Virtual Machine instruction set and features. However, some parts of the instrumentation module are written in ActionScript, for convenience.

The underlying parsing engine RABCDAsm [3] supports ActionScript 2 and 3, but programs written in version 2 have not been tested in the context of Reflash.

ii. Adobe AVM2

The execution environment for ActionScript software is the ActionScript Virtual Machine, or AVM for short. For ActionScript 2/3, the execution environment is called AVM version 2, or AVM2 for short.

AVM2 is a stack-based machine, unlike most native platforms and also some interpreted platforms such as Dalvik [5]. The AVM2 instruction set consists of up to 256 different opcodes, most of them handling data values on the stack, such as arithmetic operations.

The AVM2 execution environment consists of a method body (instructions), data stack, heap, local registers and scope stack. Of these, the most relevant parts for Reflash are the actual instructions, the data stack and the local registers.

Instructions are a stream of binary operation codes (opcodes), such as ADD, CALL or JUMP. AVM2 instructions are high-level opcodes, meaning they can operate directly on ActionScript language constructs, such as classes and arrays.


The data stack is a memory area used for arithmetic operations, call parameters and other data. The AVM2 stack operates on the familiar concept of pushing and popping data. There is no direct way to manipulate the stack pointer.

Local registers are quite similar to the stack, but instead of indirect data manipulation, data is moved in and out of local registers with specific opcodes (GETLOCAL/SETLOCAL). Both the stack and the local registers can hold any type of value.

The heap is a data area managed by the runtime, and it cannot be addressed directly from the AVM2. The only way to observe data in the heap is through the objects created on the heap, such as new classes.

For a more comprehensive introduction to AVM2, Adobe's AVM2 overview is a good reference [2].

iii. Reflash

The Reflash toolkit consists of several components. In the following sections Reflash refers both to the instrumentation engine and to the overall concept. If there is a need to distinguish the two, it will be indicated clearly. When referring to the instrumentation engine, the term Reflash executable is used.

• Reflash executable is the actual underlying instrumentation engine. It is implemented as a standalone executable that can also be used independently from the other components.

• Instrument is an instrumentation module injected by the Reflash executable into all analyzed flash files. It is responsible for generating the stack trace of the instrumented flash program.

• Proxy is a service component acting as an HTTP proxy. It attempts to capture any flash file requests and executes the Reflash executable before returning the actual flash content to the client.

• Dbtool is used for building a SQL database from the stack trace. There are also some useful features in Dbtool, for example it can run YARA [10] over the database or produce a human-readable decompilation from the stack trace.

• Replay is a graphical frontend for the SQL database. It presents a debugger-like interface that can be used for analyzing the database.

• reflasher.py is a driver script that binds together Proxy and Dbtool for automatic stack data collection.

In the following sections, some of the above components are described in detail.

II. Reflash executable

Reflash is the main component of the toolkit. It is a standalone executable, written in the D language [6]. The choice of programming language was natural, since Reflash interfaces with the RABCDAsm [3] classes directly. Reflash has an integrated disassembler, assembler and generic flash file injection mechanism. The purpose of flash file injection is to have a flexible approach for writing instrumentation modules.

i. Opcode instrumentation mechanism

When Reflash is instructed to perform opcode instrumentation, it disassembles all instructions in a given flash file and replaces selected opcodes with generated assembly code that collects the stack arguments and transfers them to an instrumentation module written in ActionScript. The instrumentation decision is made by consulting a user-supplied configuration file where the user can define a list of regular expressions that are evaluated against the opcode names during the disassembly. In the most generic setup, only one hook needs to be defined, the wildcard ".*".
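The selection step can be illustrated with a short Python sketch. It is not taken from Reflash (which is written in D); the pattern list, the helper names and the choice to match each pattern against the whole opcode name are assumptions made for illustration.

```python
import re

# Hypothetical hook list, in the spirit of a user-supplied configuration file.
HOOK_PATTERNS = ["method_entry", "call.*", "construct.*"]

def compile_hooks(patterns):
    """Compile each configured pattern once, before disassembly starts."""
    return [re.compile(p) for p in patterns]

def should_instrument(opcode_name, hooks):
    """Return True when any pattern matches the whole opcode name (assumed)."""
    return any(h.fullmatch(opcode_name) for h in hooks)

hooks = compile_hooks(HOOK_PATTERNS)
opcodes = ["getlocal0", "pushscope", "callproperty", "add", "constructprop"]
selected = [name for name in opcodes if should_instrument(name, hooks)]
print(selected)  # ['callproperty', 'constructprop']
```

With the single wildcard pattern ".*" every opcode name matches, which corresponds to the most generic setup.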

Opcode names correspond directly to the names presented in the Adobe AVM2 document [2]. In addition to the normal opcodes, there is one pseudo opcode, method_entry, available for instrumentation. This hooks the method prolog, but instead of stack arguments, it collects arguments from local registers according to the protocol described in [2]: the this pointer is always in the first local register, and the rest of the arguments are placed in the following local registers.

If an opcode is selected for instrumentation, certain preconditions need to be determined before the actual code injection:

• Number of stack items to be evaluated. This can be reliably determined from static compile-time information, such as the actual operation the opcode is performing (for example, ADD always adds the two top values on the stack) or the number of arguments for function calls.

• How many of the stack items are object arguments and how many are function arguments. Object arguments are usually the this pointer, namespace and/or name information for runtime multinames [2] and other runtime-specific stack values, or for example arithmetic operation arguments. Function arguments are arguments for method calls and class constructors. Object arguments are never modified, but function arguments can be modified by the instrumentation module. There is always at least one object argument.

• Session context for the opcode. The session consists of a symbolic user-defined prefix (usually just a single character, such as "s"), a number identifying the flash file run in the session, a unique method identifier and an opcode index. This information is later sent back from the instrumentation for connecting a specific dynamically generated stack event to statically generated data, such as the disassembly.

• Available local registers. The instrumentation needs to be aware of the local registers used by the code to avoid collisions with the original code. This information is always available statically in the flash file method body structure.

• Instrumentation module API. The instrumentation code needs to refer to the ActionScript module through a specific API that consists of a user-defined package name, a fixed class Instrument and a fixed method name InstrumentStack. The method InstrumentStack takes three parameters: the session context, the number of object arguments and an array of stack items. InstrumentStack returns an array consisting of function arguments. If an argument was not modified, the corresponding item in the array is undefined.

With the information defined above, a generic form of opcode instrumentation can be presented as the following listing:

     1  _start:
     2      setlocal A+1
     3      setlocal A+2
     4      ..
     5      setlocal A+N
     6      getlocal A+N
     7      ..
     8      getlocal A+2
     9      getlocal A+1
    10      newarray [N]
    11      setlocal A
    12      getlex PACKAGE:Instrument
    13      pushstring SESSION_ID
    14      pushint X
    15      getlocal A
    16      callproperty :InstrumentStack (3)
    17      setlocal A
    18      getlocal A+N
    19      ..
    20      getlocal A
    21      callproperty :pop (0)
    22      dup
    23      pushundefined
    24      ifeq L2
    25      jump L3
    26  L2:
    27      pop
    28      getlocal A+2
    29  L3:
    30      getlocal A
    31      callproperty :pop (0)
    32      dup
    33      pushundefined
    34      ifeq L4
    35
    36      jump L5
    37  L4:
    38      pop
    39      getlocal A+1
    40  L5:
    41      OPCODE

Listing 1: Generic instrumentation opcodes


Explanation of the code listing line by line:

• lines 2-5 Store stack items to local registers. Symbol A refers to the first available local register, used later for saving the stack items as an array. Symbol N is the total number of stack items determined statically.

• lines 6-9 Push the local registers back to the stack in reverse order.

• line 10 Create an array of N items.

• line 11 Store the created stack item array to local register A.

• line 12 Get PACKAGE.Instrument, where PACKAGE is a user-defined package name corresponding to the package name used in the instrumentation ActionScript module. Instrument is a fixed class name.

• line 13 Push the session identifier to the stack.

• line 14 Push the number of object arguments to the stack.

• line 15 Push the stack item array to the stack.

• line 16 Call InstrumentStack with three parameters (session identifier, object argument count, stack item array).

• line 17 Store the returned array in A.

• line 18 Start pushing the object arguments from local registers to the stack, up to X. In this procedure, there is no need for any type checking.

• lines 20-21 After X object arguments, pop the first function argument from the returned array to the stack.

• line 22 Duplicate the stack item.

• line 23 Push undefined to the stack.

• line 24 If the stack items are equal, branch to L2.

• line 25 If the stack items are not equal, branch to L3, leaving the stack item intact.

• line 27 Pop undefined from the stack.

• line 28 Push the function argument from the local register.

• lines 30-39 Repeat the steps at lines 20-28, ending with the last function argument A+1.

• line 41 Execute the original opcode.
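The effect of the generated block on the data stack can be modelled outside the VM. The following Python sketch is only a simulation under stated assumptions: the stack is a list whose last element is the top, UNDEFINED stands in for the AS3 undefined value, the hook and the session string are hypothetical, and the hook is assumed to return the function arguments in reverse order because the generated code consumes the array with pop().

```python
UNDEFINED = object()  # stand-in for the AS3 `undefined` value

def instrument_opcode(stack, n_items, n_object_args, hook, session):
    """Simulate the generic instrumentation block for one opcode."""
    # Lines 2-5: pop N items into registers A+1 .. A+N (A+1 holds the old top).
    regs = [stack.pop() for _ in range(n_items)]
    # Lines 6-10: push them back in original order and pack them into an array.
    items = list(reversed(regs))
    # Lines 12-17: call the hook and keep the returned array.
    returned = list(hook(session, n_object_args, items))
    # Lines 18-19: object arguments always come back from the registers.
    for k in range(n_object_args):
        stack.append(regs[n_items - 1 - k])
    # Lines 20-39: each function argument is taken from the returned array
    # unless the hook left it undefined; then the register value is restored.
    for k in range(n_object_args, n_items):
        candidate = returned.pop()
        original = regs[n_items - 1 - k]
        stack.append(original if candidate is UNDEFINED else candidate)
    return stack

def replace_last_arg(session, n_object_args, items):
    """Hypothetical hook: replace only the last function argument."""
    result = [UNDEFINED] * (len(items) - n_object_args)
    result[0] = "patched"  # index 0 is popped last, so it is the last argument
    return result

stack = ["below", "this", "arg1", "arg2"]
print(instrument_opcode(stack, 3, 1, replace_last_arg, "s-0:4:16"))
# ['below', 'this', 'arg1', 'patched']
```

Items below the N evaluated stack items are never touched, and unmodified arguments keep their original values and types because they are restored from the registers rather than from the returned array.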

This method of moving stack arguments as an array was originally devised by Jeong Wook Oh with his tool FlashHacker [7]. The reason why stack items have to be recycled using local registers is to retain the types. The first version of Reflash executed NEWARRAY and saved the full array returned by InstrumentStack directly to the stack, without any type coercion, which led to obscure failures. The second revision did some manual type coercion for values indicated by the debug Flash Player, but it proved to be futile. The current implementation is very careful to store only modified function arguments to the stack from the returned array; all other items are restored from local registers.

ii. Opcode relocations

When the Reflash executable inserts instrumented code blocks into the instruction stream, it has to relocate some of the branch targets and exception handler targets in the method body. This procedure is straightforward: if the target is after the inserted code block, it needs to be adjusted by the size of the inserted block. This needs to be done for all branch targets, after each instrumentation. The following diagram illustrates the procedure:

Figure 1: Branch target relocations
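The relocation rule can also be written down as a small Python sketch. It assumes, for simplicity, that targets are stored as instruction indices rather than byte offsets, and that a branch pointing at the instrumented opcode itself should land at the start of the inserted block; the function name and the sample opcodes are illustrative only.

```python
def insert_block(code, targets, index, block):
    """Insert `block` before code[index] and relocate targets that follow it."""
    new_code = code[:index] + block + code[index:]
    # A target equal to `index` now points at the start of the inserted block,
    # so only strictly later targets are shifted by the block size.
    new_targets = [t + len(block) if t > index else t for t in targets]
    return new_code, new_targets

code = ["getlocal0", "pushscope", "callproperty", "pop", "returnvoid"]
targets = [2, 4]                      # e.g. branch and exception handler targets
block = ["hook_a", "hook_b", "hook_c"]  # placeholder for the generated hook code

new_code, new_targets = insert_block(code, targets, 2, block)
print(new_targets)               # [2, 7]
print(new_code[new_targets[1]])  # returnvoid
```

Because the adjustment has to be repeated after every inserted block, a method body with many hooks pays this cost once per instrumented opcode.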


iii. Flash injection mechanism

The flash injection mechanism is a generic mechanism for including the Instrument in the analyzed flash file. It is implemented using the built-in RABCDAsm engine [3]. When the analyzed flash file is being disassembled on disk, the injected flash is also disassembled to a subdirectory and its class files are included with the RABCDAsm include directive.

The injection mechanism is completely agnostic about the underlying API, so it cannot do any verification of the interoperation between the instrumentation engine and the injected flash code. That is completely the responsibility of the user.

    #include "wrutrofsoudkqvr.script.asasm"
    #include "g/cdxjkyjrlav.script.asasm"
    #include "g/iepfknfpdjngrrr.script.asasm"
    #include "g/ekxapdubdwhxqmz.script.asasm"
    ..
    #include "../sub/Instrument.script.asasm"

Listing 2: Example includes

iv. Metadata

In order to later correlate the stack trace to the originally disassembled code, specific metadata is generated by the Reflash executable. This metadata is a simple stream disassembly of the original code.

    0-0:4:wrutrofsoudkqvr/instance/init:
    00000000 getlocal 0
    00000001 pushscope
    00000002 pushbyte 0
    00000003 setlocal 11
    00000004 pushbyte 0
    00000005 setlocal 7
    ..

Listing 3: Example disassembly metadata

In the above listing, 0-0 is the Session identifier. The left-side column is an opcode index in the unique method body 4, wrutrofsoudkqvr/instance/init. Later, when the Instrument returns a stack event, it can be correlated to the metadata for presenting contextual background for the event.
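As an illustration of how such metadata could be indexed for later correlation, the following Python sketch parses text in the shape of the example above. The exact file layout is inferred from that single example, so the regular expressions and the lookup key (session, method identifier, opcode index kept as text) are assumptions rather than the actual Dbtool logic.

```python
import re

# Assumed header shape: "<session>:<method id>:<method name>:"
HEADER = re.compile(r"^(?P<session>[^:]+):(?P<method>\d+):(?P<name>.+):$")
# Assumed opcode line shape: eight-digit index followed by the opcode text.
OPCODE = re.compile(r"^(?P<index>[0-9A-Fa-f]{8})\s+(?P<text>.+)$")

def parse_metadata(lines):
    """Map (session, method id, opcode index) to the disassembled opcode text."""
    table = {}
    session = method = None
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            session, method = header["session"], int(header["method"])
            continue
        opcode = OPCODE.match(line)
        if opcode and session is not None:
            table[(session, method, opcode["index"])] = opcode["text"]
    return table

sample = """\
0-0:4:wrutrofsoudkqvr/instance/init:
00000000 getlocal 0
00000001 pushscope
00000002 pushbyte 0
00000003 setlocal 11
..
""".splitlines()

table = parse_metadata(sample)
print(table[("0-0", 4, "00000003")])  # setlocal 11
```

A stack event carrying the same session, method identifier and opcode index can then be looked up directly in the table.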

v. Performance

For evaluating Reflash executable performance, we selected three files based on file size and features. These files were instrumented with four different configurations. The execution time was measured with the Unix time command.

Small file is a malicious file of 717 bytes. It does nothing other than load an additional flash payload from the network using the method flash.display.Loader::load(). This is a fairly typical procedure used by flash components in Exploit Kits and other malware.

Typical file represents the most common type of flash malware. It is a compressed file with a size of 69.5 KB, consisting of 10138 opcodes. It loads two embedded flash files with flash.display.Loader::loadBytes(). The final flash payload attempts to exploit a vulnerability in the Flash Player. Embedded payloads are extracted and prepared with various manipulations over ByteArrays.

Large file is a benign game file of 13.0 MB. The file contains a total of 730608 opcodes.

Configurations represent sets of instrumented opcodes.

First configuration is a very minimalistic set of hooks instrumenting only CALL instructions.

Second configuration is a conservative set consisting of the following regular expressions:

    "method_entry",
    "call.*",
    "init.*",
    "setprop.*",
    "construct.*"

Listing 4: Configuration 2

Third configuration is a comprehensive set of hooks instrumenting arithmetic operations, bitwise manipulations and other typical opcodes in addition to the second set.

Fourth configuration hooks only the pseudo instruction method_entry, and is used for evaluating the raw performance.


Table 1: Small file

    Conf  Instrumented/total opcodes  Size        Time
    1     3/59                        3611 bytes  0.0 s
    2     8/59                        3702 bytes  0.0 s
    3     14/59                       3776 bytes  0.0 s

Table 2: Typical file

    Conf  Instrumented/total opcodes  Size    Time
    1     74/10138                    73.0K   0.1 s
    2     263/10138                   74.8K   0.1 s
    3     3487/10138                  96.2K   0.5 s

Table 3: Large file

    Conf  Instrumented/total opcodes  Size    Time
    1     9763/730608                 13.5M   5.79 s
    2     97714/730608                18.5M   44.5 s
    3     267455/730608               26.8M   238.0 s
    4     1742/730608                 12.8M   3.6 s

The results show that the Reflash executable performs well in typical situations, but the performance starts to deteriorate when the number of instrumented opcodes is in the range 5000-10000. The execution time appears to grow linearly, which indicates there is room for optimization in the instrumentation algorithm itself. The underlying parsing engine RABCDAsm [3] performs reasonably well even in extreme situations, as we can see from large file configuration 4.

It should be noted that this section discusses only the performance of executing the static instrumentation. The performance of instrumented files is discussed in the following section.

III. Instrument module

The Instrument module is the code injected by the Reflash executable. It is written in ActionScript and compiled as a standard flash file.

As with any Internet content, flash is designed to run untrusted content over the network. This places some difficult restrictions on the data collection. For example, we cannot access the disk directly for efficient data logging. The Flash player also runs out of memory and CPU time if the data collection requires too many resources. All this makes the Instrument data collection a challenging task.

Typically, similar applications collect data on disk and do some preprocessing over the data, such as presenting textual information in readable form. Because disk access is restricted and any preprocessing requires VM resources, we have developed a method for minimizing the amount of processing inside the VM. Stack argument processing has been divided into two parts: first the arguments are collected, packed in binary format and transferred over a binary TCP connection. After that the data is processed outside the VM.

i. Argument collection

Arguments are transferred from the instrumentation hook as an array of stack items. This array is rearranged to a new array consisting of:

Table 4: Returned Array

    Index      Item
    0          Session
    1          Type of argument 0
    2          Argument 0 as ByteArray
    3          Type of argument 1
    4          Argument 1 as ByteArray
    N*2 - 1    Type of argument N
    N*2        Argument N as ByteArray

All arguments are presented as ByteArray because the AS3 ByteArray method writeObject is used for formatting the argument in Action Message Format [8].


The argument collection can be presented as the following ActionScript-like pseudo code:

    function GetArguments(array, session)
    {
        var len:int = array.length;
        var ret:Array = new Array((len*2)+1);

        ret[0] = session;
        for (var i:int = 0; i < len; i++)
        {
            // A fresh ByteArray per argument, so entries do not share one buffer
            var ba:ByteArray = new ByteArray();
            ba.writeObject(array[i]);

            ret[(i*2)+1] = ArgumentType;
            ret[(i*2)+2] = ba;
        }
        return ret;
    }

Listing 5: Argument collection

session in the above listing is the session context prepared by the Reflash executable for the particular hook. ArgumentType is the class name as returned by getQualifiedClassName().
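On the receiving side, the flat array has to be split back into a session and its typed arguments. The following Python sketch shows one way to do that once the outer AMF object has already been decoded into a plain list; the function name, the session string and the byte values are illustrative placeholders, not actual Reflash output.

```python
def unpack_event(event):
    """Split [session, type0, arg0, type1, arg1, ...] into its parts."""
    if len(event) % 2 != 1:
        raise ValueError("expected a session followed by (type, argument) pairs")
    session = event[0]
    # Odd indexes hold the class names, even indexes the serialized arguments.
    pairs = list(zip(event[1::2], event[2::2]))
    return session, pairs

# Hypothetical decoded event with two arguments (payload bytes are placeholders).
event = ["s-0:4:16", "flash.utils::ByteArray", b"", "int", b"\x04\x00"]
session, pairs = unpack_event(event)
print(session)   # s-0:4:16
print(pairs[1])  # ('int', b'\x04\x00')
```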

The Instrument module tracks the visited code locations based on the session identifier to limit the amount of collected stack data. If the number of visits to a particular location exceeds a preconfigured threshold (by default 20), data is not collected. This is to prevent exhausting the AVM2 resources, while still providing some details about loops. With the default settings, it is still possible to get an idea of what the target is running inside loops.
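The visit limit can be sketched in a few lines of Python. The class and method names below are hypothetical; only the per-location counting and the default threshold of 20 come from the description above.

```python
from collections import Counter

class VisitLimiter:
    """Count visits per session identifier and stop collecting past a limit."""

    def __init__(self, threshold=20):
        self.threshold = threshold
        self.visits = Counter()

    def should_collect(self, session):
        """Record one visit; return False once the threshold is exceeded."""
        self.visits[session] += 1
        return self.visits[session] <= self.threshold

limiter = VisitLimiter()
results = [limiter.should_collect("s-0:4:16") for _ in range(25)]
print(sum(results))  # 20
```

Each code location has its own counter, so a hot loop stops producing events without silencing locations that are visited only a few times.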

In the actual implementation, only safe types are written directly with writeObject. Safe types include final classes, like String and Integer, and classes from the package flash.*. This safety measure is to prevent infinite loops in class getters. If the class is not safe to write, explicit coercion to ByteArray is tried, and if that fails, a zero-sized ByteArray is written to the return array.

ii. Logging

After preparing the return Array, the Instrument module packs up the Array again with writeObject() and then sends the packed object over a binary TCP connection to the Proxy module. In effect, there is no separate log format or protocol. The log simply consists of a series of AMF-packed data objects that can later be unpacked with standard tools.

iii. loadBytes instrumentation

In addition to raw data logging, the Instrument module can also manipulate call arguments. This feature is utilized in one particular use case, namely instrumentation of embedded flash content. Many malware samples use embedded flash files as a form of obfuscation, so it is essential for Reflash to detect this and run the Reflash executable also on any detected embedded flash content; otherwise it will lose its visibility into the overall execution.

Embedded flash content loading is supported in ActionScript with the method flash.display.Loader::loadBytes, which loads content from a ByteArray. In practice, this method is always executed with the CALL opcode, so the absolute minimum configuration should include all calls. The Instrument module detects CALL instructions with a flash.display.Loader object and a ByteArray argument. If this condition is present, it sends the ByteArray over an HTTP connection back to the Proxy module for instrumentation. After receiving the instrumented content back, the ByteArray argument is replaced with the new content.
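The detection and replacement logic can be modelled with a short Python sketch. It is a simplification under several assumptions: the object is identified by its qualified class name, the function arguments are listed in call order for readability, UNDEFINED stands in for the AS3 undefined value that marks an unmodified argument, and reinstrument is a hypothetical callable representing the round trip to the Proxy.

```python
UNDEFINED = object()  # stand-in for the AS3 `undefined` value

def replace_loadbytes_argument(object_class, function_args, reinstrument):
    """Return replacements for function arguments; UNDEFINED means unchanged."""
    result = [UNDEFINED] * len(function_args)
    is_loader = object_class == "flash.display::Loader"
    if is_loader and function_args and isinstance(function_args[0], (bytes, bytearray)):
        # Only the embedded content is swapped; other arguments stay untouched.
        result[0] = reinstrument(bytes(function_args[0]))
    return result

def fake_reinstrument(data):
    """Hypothetical stand-in for sending the content to the Proxy and back."""
    return b"INSTRUMENTED:" + data

out = replace_loadbytes_argument("flash.display::Loader", [b"CWS-embedded", None], fake_reinstrument)
print(out[0])               # b'INSTRUMENTED:CWS-embedded'
print(out[1] is UNDEFINED)  # True
```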

This concept is simple in principle, but quite challenging to implement in the current Flash Players. Flash Player usually runs AS3 code in a single thread, and all networking is asynchronous, based on callbacks. Because the ByteArray content needs to be replaced synchronously, there is no definitive way to accomplish this task. The current implementation of Instrument uses the AS3 ExternalInterface, which enables synchronous remote procedure calls from AVM2. The embedded flash content is wrapped as an argument to a JavaScript function in the context of the browser, and the JS code can then perform a synchronous HTTP request to the Proxy.


iv. Performance

The performance of instrumented files is not trivial to measure, because there are too many details involved. For that reason, a simple test file was prepared. The test file runs a series of simple loops, copying bytes from one ByteArray to another:

    var ba:ByteArray = new ByteArray();
    var bb:ByteArray = new ByteArray();

    var len:int = ba.length;

    var start:Number = getTimer();
    var i:int;
    var j:int;
    var x:int;

    for (i = 0; i < N; i++)
    {
        ba.position = 0;
        bb.position = 0;
        for (j = 0; j < len; j++)
        {
            x = ba.readByte();
            bb.writeByte(x);
        }
    }
    ..
    [repeated Round times]
    ..
    for (i = 0; i < N; i++)
    {
        ba.position = 0;
        bb.position = 0;
        for (j = 0; j < len; j++)
        {
            x = ba.readByte();
            bb.writeByte(x);
        }
    }

    trace(getTimer() - start);

Listing 6: Performance test loop

The fixed number N is the threshold for the Instrument loop counter. This test procedure was repeated 10, 100, 1000 and 10000 times, producing the following results when measuring the execution time with getTimer():

Table 5: Instrumented code performance test

    Rounds   Instrumented    Time
    10       96/375          40 ms
    100      726/2985        375 ms
    1000     7030/29095      3.6 s
    10000    70030/290095    37.1 s

The test run with 10000 rounds was also a good stress test for the Reflash executable. The instrumented main class contained a method body with around 1.2 million instructions. As expected, the execution time grows linearly.
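The linear growth can be checked directly from the measurements in Table 5. The short Python sketch below divides each execution time by its number of rounds; the helper name `ms_per_round` is ours and the times are simply the table values converted to milliseconds.

```python
# Execution times from Table 5, converted to milliseconds.
measurements = {10: 40.0, 100: 375.0, 1000: 3600.0, 10000: 37100.0}


def ms_per_round(rounds_to_ms):
    """Return the average time per round for each measurement."""
    return {rounds: ms / rounds for rounds, ms in rounds_to_ms.items()}


per_round = ms_per_round(measurements)
for rounds, cost in sorted(per_round.items()):
    print(f"{rounds:>6} rounds: {cost:.2f} ms per round")
```

The cost per round stays between roughly 3.6 ms and 4.0 ms over three orders of magnitude, which is consistent with linear scaling.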

IV. Proxy module

The Proxy module has two operational modes: a sandbox mode, which only serves the initial flash file to the browser, and a live mode, which passes all connections through. When the Proxy detects a request for a flash file, it runs the Reflash executable and returns the instrumented content instead of the original flash file.

In addition to the flash instrumentation, the Proxy collects the AMF-packed log generated by the Instrument module.

The Proxy is implemented as an inline mitmproxy script [9].
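The core decision the Proxy makes on each response can be sketched in a few lines of Python. This is a minimal illustration, not the Reflash source: `looks_like_swf`, `rewrite_response` and the `instrument` callable (standing in for a run of the Reflash executable) are hypothetical names, and detecting flash content by its three-byte file signature is an assumption about how detection could be done. In a mitmproxy script, a function like this would typically be called from the `response` hook on the response body.

```python
# SWF file signatures: uncompressed, zlib-compressed, LZMA-compressed.
SWF_SIGNATURES = (b"FWS", b"CWS", b"ZWS")


def looks_like_swf(body: bytes) -> bool:
    """Return True when the body starts with a known SWF signature."""
    return body[:3] in SWF_SIGNATURES


def rewrite_response(body: bytes, instrument) -> bytes:
    """Return instrumented content for SWF bodies and leave the rest alone.

    `instrument` is a hypothetical callable standing in for a run of the
    Reflash executable on the original file.
    """
    if looks_like_swf(body):
        return instrument(body)
    return body
```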

V. Dbtool

Dbtool is a Python module responsible for parsing the AMF-packed binary log, populating a SQL database from the parsed stack events, and other tasks such as producing readable reports from the database. In addition to stack events, other data, such as the metadata produced by the Reflash executable, is written to the database.


    nw22/instance/nw22/get_skotina:
    {
        [00005586] (propvoid) ::writeBytes
        (
            obj: ByteArray: ''
            arg: ByteArray: '\xeb\x12X1\xc9f[...]
            arg: Integer: '0'
            arg: Integer: '1426'
        )
        [00005587] (propvoid) ::writeMultiByte
        (
            obj: ByteArray: '\xeb\x12X1\xc9f[...]
            arg: String: 'uqmyijenjr'
            arg: String: 'iso-8859-1'
        )
        [00005588] (propvoid) ::writeByte
        (
            obj: ByteArray: '\xeb\x12X1\xc9f[...]
            arg: Integer: '34'
        )

Listing 7: Example database report
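To make the database step concrete, the following Python sketch stores stack events in SQLite and renders them in a layout similar to Listing 7. It is an illustration under stated assumptions: the two-table schema, the function names `build_db` and `report`, the decimal zero-padded event number and the sample row are all hypothetical and are not taken from Dbtool.

```python
import sqlite3


def build_db(events):
    """Create an in-memory database from (seq, method, opcode, name, items)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE stack_event ("
        "seq INTEGER PRIMARY KEY, method TEXT, opcode TEXT, name TEXT)"
    )
    db.execute(
        "CREATE TABLE stack_item ("
        "seq INTEGER, pos INTEGER, role TEXT, type TEXT, value TEXT)"
    )
    for seq, method, opcode, name, items in events:
        db.execute("INSERT INTO stack_event VALUES (?, ?, ?, ?)",
                   (seq, method, opcode, name))
        for pos, (role, type_, value) in enumerate(items):
            db.execute("INSERT INTO stack_item VALUES (?, ?, ?, ?, ?)",
                       (seq, pos, role, type_, value))
    db.commit()
    return db


def report(db, method):
    """Render all events of one method as readable text."""
    lines = [f"{method}:", "{"]
    rows = db.execute(
        "SELECT seq, opcode, name FROM stack_event "
        "WHERE method = ? ORDER BY seq",
        (method,),
    ).fetchall()
    for seq, opcode, name in rows:
        lines.append(f"    [{seq:08d}] ({opcode}) ::{name}")
        lines.append("    (")
        items = db.execute(
            "SELECT role, type, value FROM stack_item "
            "WHERE seq = ? ORDER BY pos",
            (seq,),
        )
        for role, type_, value in items:
            lines.append(f"        {role}: {type_}: '{value}'")
        lines.append("    )")
    lines.append("}")
    return "\n".join(lines)


# Illustrative sample row only; real rows come from the parsed AMF log.
events = [
    (5588, "nw22/instance/nw22/get_skotina", "propvoid", "writeByte",
     [("obj", "ByteArray", ""), ("arg", "Integer", "34")]),
]
text = report(build_db(events), "nw22/instance/nw22/get_skotina")
print(text)
```

Keeping stack items in their own table lets a frontend query individual values without re-parsing the binary log.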

VI. Replay

Replay is a graphical frontend for the SQL database prepared by Dbtool. With the stack events and metadata in the SQL database, it can present a coherent view of AS3 execution. Replay was modelled after popular debugging tools and presents a disassembly view, a stack view and a hex data view. Some of the features included in Replay:

• Present a stream disassembly of the current stack event's method.
• Search textual data over stack items and disassembly.
• Step forward and backward through the stack events.
• Go to a specific stack item.
• View the selected stack item in hex view.
• Save data from hex view to disk.
• Set breakpoints on stack events, run, single step.
• Run YARA [10] over stack items and disassembly.
• IDA-style [11] disassembly navigation using ENTER and ESC keys.

Figure 2: Replay graphical database frontend
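Because the stack events are already recorded, stepping in both directions reduces to moving an index over an ordered list. The Python sketch below illustrates that idea; the class `EventCursor` and its methods are hypothetical and do not describe the Replay implementation.

```python
class EventCursor:
    """Hypothetical cursor over a non-empty, ordered list of stack events."""

    def __init__(self, events):
        self.events = list(events)
        self.index = 0
        self.breakpoints = set()  # indices of events to stop at

    def current(self):
        return self.events[self.index]

    def step(self, delta=1):
        """Move forward (delta > 0) or backward (delta < 0), clamped to the log."""
        self.index = max(0, min(len(self.events) - 1, self.index + delta))
        return self.current()

    def run(self):
        """Advance until the next breakpoint index or the last event."""
        while self.index < len(self.events) - 1:
            self.index += 1
            if self.index in self.breakpoints:
                break
        return self.current()
```

Stepping backward needs no re-execution here, which is the main practical difference from a live debugger.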


VII. Future development

Reflash is still very early in its development, and there are many things that can be improved. One of the weak points in the current implementation is the instrumentation of embedded flash content. Relying on ExternalInterface is somewhat fragile, because there is no guarantee that it will remain synchronous in any given situation. It appears to be reliable, but that can change with any Flash Player release.

There is no obvious solution to this problem, because it essentially requires transforming the asynchronous nature of AS3 into synchronous behaviour for that particular case. If ExternalInterface suddenly stops working, one possible solution could be to include the instrumented code in the Instrument module. That would require a two-step approach: first detect the attempt to load embedded content, then instrument the content, rebuild a new Instrument module embedding the new content and finally restart the execution. That is somewhat problematic to fully automate, and it cannot be done for polymorphic payloads.

Another possible solution to the embedded content problem could be a method devised by Timo Hirvonen with his tool Sulo [12]. Sulo instruments the Flash Player outside the AS3 context, so it can see all calls to loadBytes no matter how many layers of embedding there are. On the other hand, it is still unknown whether it is possible to feed back modified flash using the Sulo tracing approach.

The next target for development is the Instrument logging, which can never be efficient enough. Here, too, the Sulo approach could be interesting. Instead of sending the AMF-packed data over a TCP connection, it could be fetched directly from the AVM2 using Sulo.

There are also some interesting things that could be done with the Replay frontend. In the current implementation, there is no execution logic interception - it is just a dummy database frontend. It should be perfectly possible to implement an AVM2 emulation for stack data manipulation. In that scenario, only the instructions that require the AVM2 runtime, such as class constructors and method calls, would be instrumented. The resulting stack values could be migrated to the emulation, thus creating a hybrid, or assisted, emulator. Functionality that is not trivial to implement would run on a real Flash Player, but all trivial operations would be emulated over the database. An example of such trivial functionality is a simple ADD operation. It is not necessary to instrument that, thus saving precious AVM2 CPU cycles.
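The assisted emulation idea can be illustrated with a toy operand stack in Python. This is a sketch under loud assumptions: the function `run_hybrid`, the tuple instruction format and the convention that the operand of a runtime-dependent opcode is the number of stack items it consumes are all hypothetical, and only numeric `add` is modelled, not the full AVM2 semantics.

```python
def run_hybrid(instructions, recorded):
    """Execute (opcode, operand) tuples on an emulated operand stack.

    Trivial opcodes are emulated locally. Any other opcode takes its
    result from `recorded`, an iterable standing in for the values that
    an instrumented run on a real Flash Player logged to the database.
    """
    stack = []
    recorded = iter(recorded)
    for opcode, operand in instructions:
        if opcode == "pushbyte":
            stack.append(operand)
        elif opcode == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            # Runtime-dependent opcode: drop the consumed stack items and
            # push the value recorded for this opcode.
            for _ in range(operand):
                stack.pop()
            stack.append(next(recorded))
    return stack
```

For example, `run_hybrid([("pushbyte", 2), ("pushbyte", 3), ("add", None)], [])` needs no recorded values at all, while a trailing `("callproperty", 1)` would consume the sum and take its result from the log.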

One interesting future development possibility is to replace the Reflash stack data generation backend with Mozilla Shumway [13]. In that setup, the stack trace generation would be implemented directly in the Shumway runtime. The role of the Reflash executable would be reduced to providing only contextual information and metadata to the database, and Shumway would generate the stack items. That should provide better performance and stability, as there is no instrumentation involved. This approach sounds very promising in principle, but there would be a lot of compatibility issues with current malware and exploit kits. They tend to require very specific and realistic environments.

Finally, there is an interesting, albeit somewhat speculative, analogy with instrumentation platforms for native code, such as Intel Pin [14]. With that approach, Reflash would instrument the entire method body instead of individual opcodes. There is already the pseudo opcode method_entry, which could be extended to provide emulated instrumentation for the entire method body. Because AS3 code cannot directly access the opcode level, intermediate code could be provided beforehand by the static instrumentation code. With this intermediate code, the method instrumentation code could emulate stack manipulation and proxy all opcodes requiring the flash runtime, such as object manipulation on the heap and all native methods. This could provide better performance and coverage.


VIII. Related work

The basic concept of dynamic instrumentation of flash files for malware analysis was introduced by Jeong Wook Oh in his 2012 presentation, AVM Inception. The original presentation introduced a concept of class hooking, and he later also released a tool called FlashHacker [7] that uses the RABCDAsm [3] tools for instrumenting flash files, creating call traces and other manipulations. Reflash can be considered a continuation of that work, extending and automating the approach.

F-Secure Sulo by Timo Hirvonen is also somewhat related to what Reflash is doing. The emphasis of Reflash is on data postprocessing, and it could be beneficial to start using the Sulo approach for data collection instead of flash file instrumentation. The trade-off would be losing the portability of the Reflash AS3 instrumentation, but there would be huge benefits in performance.

IX. Conclusion

This paper shows that large-scale instrumentation of flash files is not only possible, but also a practical solution for dynamic ActionScript analysis. The novel client/server architecture used in the solution also opens up other possibilities for even more efficient stack trace collection in the future.

References

[1] TrendLabs Security Intelligence Blog. How Exploit Kit Operators are Misusing Diffie-Hellman Key Exchange (2015).

[2] Adobe Systems Inc. ActionScript Virtual Machine 2 (AVM2) Overview (2007). https://www.adobe.com/content/dam/Adobe/en/devnet/actionscript/articles/avm2overview.pdf

[3] Vladimir Panteleev. Robust ABC (ActionScript Bytecode) [Dis-]Assembler. https://github.com/CyberShadow/RABCDAsm

[4] Gary Grossman, Emmy Huang. ActionScript 3.0 overview (2006). https://www.adobe.com/devnet/actionscript/articles/actionscript3_overview.html

[5] Dan Bornstein. Dalvik VM Internals (2008). https://sites.google.com/site/io/dalvik-vm-internals

[6] The D programming language. https://dlang.org/

[7] Jeong Wook Oh. FlashHacker ActionScript Bytecode instrumentation framework. https://github.com/ohjeongwook/FlashHacker

[8] Adobe Systems Inc. Action Message Format (2013). http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/amf/pdf/amf-file-format-spec.pdf

[9] mitmproxy, the Man-In-The-Middle proxy. https://mitmproxy.org/

[10] YARA, The pattern matching swiss knife for malware researchers. https://virustotal.github.io/yara/

[11] IDA, a multi-processor disassembler and debugger. https://www.hex-rays.com/products/ida/

[12] Timo Hirvonen. Sulo: Dynamic instrumentation tool for Adobe Flash Player built on Intel Pin. https://github.com/F-Secure/Sulo

[13] Mozilla. Shumway: HTML5 technology experiment that explores building a faithful and efficient renderer for the SWF file format without native code assistance. https://github.com/mozilla/shumway

[14] Intel. Pin: A Dynamic Binary Instrumentation Tool. https://software.intel.com/en-us/articles/pintool
