
Self-encrypting Code to Protect Against Analysis and Tampering

Jan Cappaert, Nessim Kisserli, Dries Schellekens, and Bart Preneel

Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT/SCD-COSIC

Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium

{jcappaer,nkisserl,dschelle,preneel}@esat.kuleuven.be

Abstract

Confidentiality and data authenticity are two basic concepts in security. The first guarantees secrecy of a message, while the latter protects its integrity. This paper examines the use of encryption to secure software against static analysis and tampering attacks. We present the concept of code encryption, which offers confidentiality, and a method to create code dependencies that implicitly protect integrity. For the latter we propose several dependency schemes, based on a static call graph, that allow runtime code decryption simultaneous with code verification. If code is modified statically or dynamically, other code will be decrypted incorrectly, producing a corrupted executable.

1 Introduction

From the early 60s until the 80s, application security was mostly addressed by designing secure hardware, such as ATM terminals or set-top boxes. Since the 90s, however, secure software has gained much interest due to its low cost and flexibility. Nowadays we are surrounded by software applications, which we use for web banking, communication, e-voting, and more. As a side effect, threats such as piracy, reverse engineering, and tampering emerge. These threats try to exploit critical and poorly protected software. This illustrates the importance of thorough threat analysis (e.g. STRIDE [13]) and of new software protection schemes to protect software from analysis and tampering attacks. This paper provides a technique to protect against the last two threats, namely reverse engineering and tampering.

For decades encryption has provided the means to hide information. Originally it served for encrypting letters or communications, but it quickly became a technique to secure all critical data, whether for short-term transmission or long-term storage. While software enterprises offer commercial tools for software protection, an arms race is going on between software programmers and the people attacking software. Although encryption is one of the best understood information hiding techniques, encryption of software is still an open research area. In this paper we examine the use of self-encrypting code as a means of software protection.

Section 1 introduces our motivation and Section 2 describes software security and its threats. Section 3 gives a brief overview of related research, while Section 4 elaborates on code encryption. In Section 5 we present a framework that facilitates the generation of tamper-resistant programs and show some empirical results as a proof of concept. Section 6 explains several attacks and possible countermeasures. Finally, Section 7 summarises this paper and outlines some conclusions.

2 Software security and threats

One of a company's biggest concerns is that its software falls prey to reverse engineering. A secret algorithm that is extracted and reused by a competitor can have major consequences for software companies. Secret keys, confidential data, and security-related code are likewise not intended to be analysed, extracted, stolen, or corrupted. Even if legal measures such as patents and cybercrime laws are in place, reverse engineering remains a considerable threat to software developers and security experts.

Often software is not only analysed, but also tampered with. In a branch jamming attack, for example, an attacker simply replaces a conditional jump with an unconditional one, forcing a specific branch to execute even when it is not supposed to under those conditions. Such attacks can have a major impact on applications that involve licensing, billing, or even e-voting.
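A branch jamming attack can be sketched on raw x86 machine code. The byte values below use the standard x86 short-jump encodings (0x74 for je, 0xEB for jmp); the surrounding instructions are an illustrative, hypothetical licence check:

```python
# Sketch of a branch jamming attack on x86 machine code (illustrative bytes):
# patching the single opcode byte of a conditional jump turns it into an
# unconditional jump, so the "licensed" branch is always taken.
code = bytearray(
    b"\x3c\x01"              # cmp al, 1        ; licence check result
    b"\x74\x05"              # je  +5 (licensed); taken only if check passed
    b"\x90\x90\x90\x90\x90"  # ... failure path (NOPs as filler) ...
)

JE, JMP = 0x74, 0xEB         # short 'je' vs short 'jmp' opcodes
idx = code.index(JE)         # locate the conditional jump
code[idx] = JMP              # jam the branch: always jump to the success path
assert code[2] == JMP
```

One flipped byte suffices, which is why the paper treats targeted static tampering as such a serious threat.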

Before actually changing the code in a meaningful way, one always needs to understand the internals of a program. Changing a program at random places cannot guarantee the correct working of the application after modification. Several papers present the idea of self-verifying code [2, 12], which is able to detect any changes to critical code. These schemes, however, do not protect against analysis of the code. In this paper we try to counter analysis and tampering attacks simultaneously.

We can distinguish two main categories of analysis techniques: static analysis and dynamic analysis. Static analysis is applied to non-executing code, e.g. disassembly or decompilation [5]. Dynamic analysis is performed while the code is executed. It is typically easier to obstruct static analysis than to protect the code against dynamic attacks.

In this paper we focus on software-only solutions because of their low cost and flexibility. Code encryption is clearly useful if encrypted code can be sent to a secure coprocessor [22]. But when such a component is not available, as in most current systems, it becomes less obvious how to tackle this problem. As opposed to a black-box system, where the attacker is only able to monitor the I/O of a process, an environment where the attacker has full privileges behaves like a white box, where everything can be monitored. Chow et al. [4] call this a white-box environment and propose a method to hide a key within an encryption algorithm.

3 Related research

There are three major threats to software: piracy, reverse engineering, and tampering. Collberg et al. [9] give a compact overview of techniques to protect against these threats. Software watermarking, for example, aims at protecting software reactively against piracy. It generally embeds hidden, unique information into an application such that it can be proved that a certain software instance belongs to a certain individual or company. When this information is unique for each instance, one can trace copied software to its source unless the watermark is destroyed. The second group, code obfuscation, protects against reverse engineering. This technique consists of one or more program transformations that change a program in such a way that its functionality remains the same but analysing its internals becomes very hard. A third group of techniques aims to make software 'tamper-proof', also called tamper-resistant. As this paper investigates protection mechanisms against malicious analysis and tampering, we will not elaborate on software watermarking.

3.1 Code obfuscation

As software is distributed worldwide, it becomes harder and harder to control from a distance. This means that attackers can often analyse, copy, and change it at will. Companies, however, have been inventing techniques to make this analysis harder. These techniques range from small tricks to counter debugging, such as code stripping, to complex control flow and data flow transformations that try to hide a program's internals. This hiding addresses the security objective of confidentiality. For example, when Java bytecode was shown to be susceptible to decompilation – yielding the original source code – researchers began investigating techniques to protect against this [7, 8, 15]. Protection of low-level code against reverse engineering has been addressed as well [25, 19].

3.2 Self-modifying code

While code obfuscation aims to protect code against both static and dynamic analysis, there exists another technique to protect against code analysis, namely self-modifying code. This technique offers the possibility to generate code at runtime instead of transforming it statically. In practice, however, self-modifying code is largely limited to the realm of viruses and malware. Nevertheless, some publications consider self-modifying code as a technique to protect against static and dynamic analysis. Madou et al. [16], for example, consider dynamic code generation. They propose a technique where functions are constructed prior to their first call at runtime. Furthermore, clustering is proposed such that a common template can be used to construct each function in a cluster, performing a minimal amount of changes. To protect these constant 'edits' against dynamic analysis, the authors propose the use of a pseudo-random number generator (PRNG).

Our runtime decryption technique is equivalent to code generation, except that the decryption key can rely on other code rather than on a PRNG. Furthermore, re-encryption minimises the visibility of code during execution, while Madou et al. do not explicitly protect a function template after the function has executed.

3.3 Tamper resistance

Protecting code against tampering can be considered as the problem of data authenticity, where in this context 'data' refers to the program code. In '96, Aucsmith [1] illustrated in his paper a scheme to implement tamper-resistant software. His technique protects against both analysis and tampering. For this, he uses small, armoured code segments, also called integrity verification kernels (IVKs), to verify code integrity. These IVKs are protected through encryption and digital signatures such that they are hard to modify. Furthermore, these IVKs can communicate with each other and across applications through an integrity verification protocol. Many papers in the field of tamper resistance base their techniques on one or more of Aucsmith's ideas.

Several years later, Chang et al. [2] proposed a scheme based on software guards. Their protection scheme relies on a complex network of software guards which mutually verify each other's integrity and that of the program's critical sections. A software guard is defined as a small piece of code performing a specific task, e.g. checksumming or repairing. When checksumming code detects a modification, repair code is able to undo the malicious tamper attempt. The security of the scheme relies partially on hiding the obfuscated guard code and on the complexity of the guard network. A year later, Horne et al. [12] elaborated on the same idea and proposed 'testers', small hashing functions that verify the program at runtime. These testers can be combined with embedded software watermarks to produce a unique, watermarked, self-checking program. Other related research is oblivious hashing [3], which interweaves hashing instructions with program instructions and is able to prove whether a program operated correctly or not.

Recently, Ge et al. [10] published a paper on control flow based obfuscation. Although the authors presented their work as a contribution to obfuscation, the control flow information is protected with an Aucsmith-like tamper resistance scheme.

4 Code encryption

The following sections give an overview of dynamic code encryption, that is, encrypting binary code at runtime. This is often also covered by the terms self-modifying or self-generating code. Encryption generally assures the confidentiality of data. In the context of binary code, this technique mainly protects against static analysis. For example, several encryption techniques are used by polymorphic viruses [21] and polymorphic shellcode [6]. In this way, they are able to bypass intrusion detection systems, virus scanners, and other pattern-matching interception tools. The following sections present several methods of encrypting code at runtime.

4.1 Bulk encryption

If a program is encrypted completely with a single routine, we call it bulk encryption. The decryption routine is usually prepended to the encrypted body. At runtime this routine decrypts the body and then transfers control to it. The decryption routine can either consult an embedded key or fetch one dynamically (e.g. from user input or from the operating system). The main advantage of such a mechanism is that as long as the program is encrypted, its internals are hidden and therefore protected against static analysis. Another advantage is that the encrypted body makes it hard for an attacker to statically change bits in a meaningful way. Changing a single bit will result in one or more bit flips in the decrypted code, modifying one or more instructions, which might crash the program or cause other unintended behaviour due to binary code's brittleness. In summary, a simple construction such as bulk encryption has certain desirable properties:

• it protects the code against static analysis and forces an attacker to perform a dynamic attack;

• as long as the code is encrypted, it is protected against targeted tampering;

• it has very limited overhead in size and performance, as the decryption is done all at once.

However, as all code is decrypted simultaneously, the scheme is inherently weak. An attacker simply waits for the decryption to occur before dumping the process image to disk in clear form for analysis.
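The bulk-encryption stub described above can be sketched as follows. This is an illustrative Python simulation, not the paper's implementation: the toy XOR routine stands in for a real cipher, and the embedded key and program body are hypothetical.

```python
KEY = b"embedded-key"  # hypothetical embedded key consulted by the stub

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy XOR 'cipher' standing in for a real encryption routine."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Protection time: the whole program body is encrypted in bulk.
body = b"RESULT = 40 + 2"          # stand-in for the program's code
encrypted_body = xor_bytes(body, KEY)
assert encrypted_body != body      # internals hidden from static inspection

# Run time: the prepended stub decrypts the body, then transfers control to it.
namespace = {}
exec(compile(xor_bytes(encrypted_body, KEY), "<body>", "exec"), namespace)
assert namespace["RESULT"] == 42   # the decrypted body executed correctly
```

The weakness is visible in the sketch: once `xor_bytes` has run, the entire cleartext body exists in memory at once, ready to be dumped.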

4.2 Partial encryption

In contrast to bulk encryption, where program code is decrypted all at once, one can increase the granularity and decrypt small parts at runtime. Shiva [17] is a binary encryptor that uses obfuscation, anti-debugging techniques, and multi-layer encryption to protect ELF binaries. However, to the best of our knowledge it still encrypts large code blocks, albeit one at a time, and thus exposes large portions of code at runtime. Viega et al. [24] provide a method in C to write self-modifying programs that decrypt a function at runtime. While implementing self-modifying code at a high level is not straightforward (no address information is known before compilation), their proposed solution is easy to use as it is based on predefined macros, an external encryption program, and a four-step build phase, which goes as follows:

• initial build: the code is instrumented to collect the required address information;

• the actual address information is generated by executing the instrumented executable;

• final build: the software is built and the necessary encryption routines are put in place;

• an external encryption program uses the address information to encrypt the function that should be initially hidden.

Figure 1 illustrates how a function is decrypted at runtime. A function, cipher, is used to modify (decrypt) a code block, code. The code block key is read and used as the key.

This scheme overcomes the weakness of revealing all code at runtime, as it offers the possibility to decrypt only the necessary parts instead of the whole body, as bulk encryption usually does. The disadvantage is a slight increase in overhead due to multiple calls to the decryption routine.

Figure 1: A basic scheme for function decryption where correct decryption of a function, called code, depends on another function's code, called key. The code that performs this operation is called cipher.
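The Figure 1 construction can be sketched as follows. This is an illustrative simulation, not the paper's code: 'memory' holds each function's bytes, and cipher reads the key function's bytes and XORs them over the code block in place (XOR standing in for the real cipher).

```python
# Sketch of Figure 1: cipher 'reads' the key function's bytes and uses
# them to 'modify' (decrypt or encrypt) another function's bytes in place.
memory = {
    "key":  bytearray(b"def key_fn(): ..."),  # cleartext of the key function
    "code": bytearray(16),                    # block to be protected (below)
}

def cipher(target: str, key_src: str) -> None:
    """XOR the target block with the key function's bytes."""
    key = memory[key_src]                     # the 'read' arrow in Figure 1
    blk = memory[target]                      # the 'modify' arrow in Figure 1
    for i in range(len(blk)):
        blk[i] ^= key[i % len(key)]

plaintext = b"secret function!"               # 16 bytes of pretend code
memory["code"][:] = plaintext
cipher("code", "key")                         # encrypt at protection time
assert bytes(memory["code"]) != plaintext
cipher("code", "key")                         # decrypt at run time
assert bytes(memory["code"]) == plaintext     # XOR is its own inverse
```

Because the key is another function's code, any change to that function silently corrupts the decryption, which is exactly the implicit integrity protection the paper builds on.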

5 Function encryption framework

5.1 Basic principle

For our code encryption framework we rely on function encryption and code dependencies, using the principle of encrypting functions mentioned above. We define a new kind of software guard, which decrypts (D) or encrypts (E) the code of a function a using the code of another function b. Using parameters a and b, decryption can be expressed as a = Db(A), where A is the encrypted function a, and encryption as A = Eb(a). Furthermore, we would like the guard to have the following properties:


• if one bit is modified in b, then one or more bits in a should change; and

• if one bit is modified in A, then one or more bits should be modified in a after decryption.

Many functions meet these requirements. For the first requirement a cryptographic function with b as key could be used. For example, Viega et al. [24] use the stream cipher RC4, where the key is the code of another function. The advantage of an additive stream cipher is that encryption and decryption are the same computation, and thus the same code. This also holds for certain block ciphers, such as DES [18], but not for all (e.g. AES); using a suitable mode of operation, like counter mode (which turns the block cipher into a stream cipher), can overcome this inconvenience. However, the key size of symmetric cryptographic algorithms is limited, e.g. to 128 or 256 bits; RC4, for example, allows a key of up to 256 bytes. This means that any modification to b beyond the first 256 bytes will not cause any change to a, which violates our first requirement. Therefore we need some kind of compression function that maps a variable-length code block to a fixed-length string, which is then used as the key for the encryption routine. Possible functions are checksum functions, e.g. CRC32, or cryptographic hash functions [18].
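The hash-then-encrypt construction can be sketched as follows. The pairing of SHA-256 as the compression function with textbook RC4 is an assumed illustration (the paper's own experiments used a custom XOR scheme); RC4 is implemented directly since it is only a few lines.

```python
import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; encryption and decryption are the same operation."""
    S = list(range(256))
    j = 0
    for i in range(256):                       # key-scheduling algorithm
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for b in data:                             # pseudo-random generation
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def derive_key(code_b: bytes) -> bytes:
    """Compression step: variable-length code block -> fixed-length key."""
    return hashlib.sha256(code_b).digest()

code_b = b"\x55\x48\x89\xe5" * 200    # pretend code of b, well over 256 bytes
a = b"cleartext of function a"
A = rc4(derive_key(code_b), a)        # A = E_b(a)
assert rc4(derive_key(code_b), A) == a         # a = D_b(A)

# A single-bit change in b, even beyond byte 256, now changes the key:
tampered = bytearray(code_b)
tampered[700] ^= 1
assert rc4(derive_key(bytes(tampered)), A) != a
```

Without the hash step, the flipped bit at offset 700 would fall outside RC4's 256-byte key and go undetected, which is exactly the violation of the first property described above.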

Using the code of b to decrypt A can be seen as an implicit way of creating tamper resistance; modifying b will result in an incorrect hash value, i.e. an incorrect encryption key, and consequently incorrect decryption of A. Furthermore, flipping a bit in A will flip at least one bit in a; in the case of an additive stream cipher a bit change in the ciphertext will occur at the same location in the plaintext, while the error propagation for block ciphers depends on the mode of operation used. This might be sufficient to make most applications crash due to binary code's brittleness. For example, a single bit flip in the clear code might change the opcode of an instruction, resulting not only in an incorrect instruction being executed but also in desynchronising the following instructions [14], which will most likely crash the program. Changing one of the operands of an instruction will cause incorrect or unpredictable program behaviour.

Another advantage of this scheme is that the key is computed at runtime (relying on other code), which means the key is not hard-coded in the binary and is therefore hard to find through static analysis (e.g. entropy scanning [20]). The main disadvantage is performance: loading a fixed-length cryptographic key is usually more compact and faster than computing one at runtime, which may involve calculating a hash value. Furthermore, the key setup of a symmetric cryptographic algorithm will also have a performance impact.

Although we believe that cryptographic hash functions and stream ciphers are more secure, for our experiments we used a self-designed XOR-based scheme – which satisfies our two properties – to minimise the cost in speed and size after embedding the software guards.

5.2 Dependency schemes

With this basic function encryption method we can now build a network of code dependencies that makes it hard to change code statically or dynamically. We propose three schemes based on call graph information that make functions depend on each other such that static and dynamic tampering becomes difficult.

Scheme 1 Initially all callees (the called functions) are encrypted, except main(), which has to be in the clear when the program transfers control to it. A function is decrypted before its call, and the decryption key is based on the code of the caller (the calling function).

Note that in the above case, once a function is decrypted it stays so and is susceptible to static analysis, e.g. if a user forces a dump of the process' memory space.

Scheme 2 Initially all callees are encrypted. Their caller calls a guard to decrypt them just before they are called and to re-encrypt them when they return. Again, the decryption key is based on the code of the calling function. This makes tampering with the caller without affecting the callees very difficult.

We remark that if a function is only decrypted before it is called and re-encrypted after it returns, then the code of all callers on the call path (the path in the call graph leading to the called function) will be in cleartext.

Scheme 3 Initially all callees are encrypted. Each caller decrypts its callee before the call and re-encrypts it after it returns. Additionally, the callee encrypts its caller upon being called and decrypts it before returning.

In this last case the maximum number of functions in cleartext during execution is minimised. However, guard code is implicitly considered to be in cleartext as well.

The memory layout of a function call protected according to scheme 3 is sketched in Figure 2. A function is called through the following steps:

1. a guard is called to decrypt the callee;

2. control is transferred to the callee;

3. the callee calls a guard to encrypt its caller;

4. the callee executes;

5. before returning, the callee calls a guard to decrypt the caller's code;

6. control is transferred back to the caller;

Figure 2: Memory layout of scheme 3: 1, 3, 5, and 7 are guard calls; 2 and 6 are control transfers.

7. the caller calls a guard to re-encrypt the callee's code.

All functions that call a guard to decrypt or encrypt another function use their own code as key material. It can be shown that in this case tampering will always be detected ('detected' here means that incorrect execution and undesired behaviour will appear):

• If a function is tampered with while it is encrypted, it will itself decrypt to a modified version, and all callees of this function will be decrypted incorrectly (and their callees as well, etc.); with b′ denoting the corrupted key code, Db′(A) ≠ a.

• If a function is tampered with while it is decrypted, yielding a modified version b′, this will result in incorrect decryption of the callees that are decrypted after this moment in time; again Db′(A) ≠ a.

Note that it is also possible to generate schemes based on heuristic information, as other software guards [2] do, where the owner specifies which code is critical and where the protection techniques focus on that part only. Our schemes, however, are applied to the whole call graph, thus protecting the whole binary.
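The propagation of tampering through a dependency scheme can be sketched with a small simulation of scheme 1. The call graph, function bodies, and SHA-256/XOR guard are all illustrative stand-ins, not the paper's implementation; the dictionaries are iterated in call order (main before its callees), which the construction below guarantees.

```python
import hashlib

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy additive cipher: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Hypothetical program: main() calls f(), f() calls g().
call_graph = {"main": ["f"], "f": ["g"], "g": []}
clear = {"main": b"code of main", "f": b"code of f", "g": b"code of g"}

# Protection time (scheme 1): each callee encrypted under a hash of its
# caller's cleartext code; main() must start in the clear.
enc = {"main": clear["main"]}
for caller, callees in call_graph.items():
    for callee in callees:
        enc[callee] = xor_stream(
            clear[callee], hashlib.sha256(clear[caller]).digest())

def run(image):
    """Decrypt each callee with its caller's already-decrypted code."""
    dec = {"main": image["main"]}
    for caller, callees in call_graph.items():
        for callee in callees:
            dec[callee] = xor_stream(
                image[callee], hashlib.sha256(dec[caller]).digest())
    return dec

assert run(enc) == clear        # untampered image decrypts correctly

# Flip a single bit in main(): every function downstream decrypts to garbage.
bad = dict(enc)
bad["main"] = bytes([enc["main"][0] ^ 1]) + enc["main"][1:]
corrupted = run(bad)
assert corrupted["f"] != clear["f"] and corrupted["g"] != clear["g"]
```

The final assertion shows the cascade the text describes: the corrupted f() is itself used as key material for g(), so the damage propagates down the whole call chain.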

Consider a program P and its modified version P′; then we define the time cost Ct and the space cost Cs as

Ct(P, P′) = T(P′) / T(P)

Cs(P, P′) = S(P′) / S(P)

where T(X) is the execution time of program X and S(X) its size. Table 1 gives an overview of the performance cost of schemes 1, 2, and 3 applied to basic implementations of common UNIX commands. The command du reports how much disk space files occupy, tar is an archiving utility, and wc is a program that counts words in a file. We clearly notice that wc suffers the largest performance loss when protected by scheme 3. This is due to its numerous loops, which also contain many guard calls to decrypt or re-encrypt code. Calling more guards outside the loops would speed up the program, but it would reveal functions longer than necessary at runtime. Scheme 1 appears to run faster than the original program. This might be a result of extensive caching of code and data fragments (our code is treated as data when we decrypt it). The space cost ranged from 1.031 to 1.170, an increase of 3 to 17% over the original program size. This expansion is proportional to the number of guard calls and the size of the guard code.

Program   Scheme 1   Scheme 2   Scheme 3
du        0.899      3.612      8.364
tar       0.822      1.339      2.783
wc        0.989      39.715     91.031

Table 1: Performance cost Ct when using self-modifying code with dependency schemes.
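A worked reading of Table 1 through the cost definitions, with a hypothetical baseline runtime for wc (the ratios are from the table; the absolute times are assumed for illustration):

```python
# Ct(P, P') = T(P') / T(P): a ratio of runtimes, so a value of 91.031
# means the protected wc runs about 91x slower under scheme 3.
T_original = 0.50       # assumed baseline runtime of wc, in seconds
T_scheme3 = 45.5155     # assumed protected runtime consistent with Table 1
Ct = T_scheme3 / T_original
assert abs(Ct - 91.031) < 1e-9   # matches the wc / scheme 3 table entry

# A cost below 1, as for wc under scheme 1 (Ct = 0.989), means the
# protected program actually ran slightly faster than the original.
assert 0.989 < 1
```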

5.3 Scheme restrictions

Despite the simplicity of our schemes, generic programs confront each scheme with specific difficulties. Some of these are easy to solve, others harder. An overview is given below.

Loops Scheme 1 poses a problem when a function call is nested within a loop and the function is decrypted prior to the call: during subsequent iterations the function would be decrypted multiple times, resulting in incorrect code. Therefore, we propose placing the decryption routine outside the loop. Schemes 2 and 3 do not encounter this problem, as they always re-encrypt the called function after it returns. However, placing the encryption and decryption routines outside the loop can still be considered for performance reasons. This implies the code will be decrypted, and thus unprotected, for as long as the loop is running, but it reduces overhead.

Recursion For all three schemes, care needs to be taken with recursion. If a function calls itself (a pure recursive call), it should – according to our scheme definitions – decrypt itself, although it is already in cleartext. Therefore, we decrypt a recursive function only once, namely when it is called by another function. We can extend this to recursive cycles, where a group of functions together forms a recursion.

Multiple callers If a function a is called by different callers bi, one could choose to encrypt the callee a with the cleartext code of only one of the callers, e.g. chosen based on profiling information. The function that calls the particular callee most often could then serve as the key to decrypt it. However, when another caller is modified, this will not result in incorrect decryption of a. Therefore we require that the decryption of A rely on all bi. The problem is that when a is called, only one bi might be in the clear. Getting all bi in cleartext requires a number of guard calls that decrypt the paths from the actual caller to the key code functions. Afterwards, all the decrypted functions should be re-encrypted to reduce visibility in memory. The maximum number of decryptions required to get the key code in cleartext for a pair of callers bx and by is lx + ly, where lx and ly are the nesting levels of bx and by, respectively, relative to main(). The same number of encryptions is needed to re-encrypt all these functions after the target function is decrypted. In the case of n callers, we need ∑i li guards (for i = 1..n) in the worst case to decrypt all callers bi and then another n guards to decrypt A. To overcome this overhead we propose to rely on the encrypted code of the callers, namely all Bi, and to decrypt A even before control is given to any bi. For this we only need n extra guards instead of n + ∑i li, and any change in any caller will still be propagated to the callee.

Other options involve:

• not encrypting functions which have multiple callers (the most trivial solution), which violates our schemes;

• inlining the callee; but the callee may call other functions itself, which only shifts the problem, because the callees of the inlined callee will have multiple callers after inlining;

• encrypting with other (possibly encrypted) code as the key code, e.g. the encryption code itself (see also Section 6.1);

• modifying the guard such that it foresees a correction value c which compensates for the hash of a function b2 when function a is encrypted with b1, yielding hash(b1) = hash(b2) ⊕ c. However, this value also makes it easier for attackers to modify code: all they have to do is compute the new hash and compensate for it in the correction value.
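The correction-value option, and the attack on it, can be sketched as follows (illustrative hash and caller bodies; the truncation to 16 bytes is an assumed key size):

```python
import hashlib

def h(code: bytes, size: int = 16) -> bytes:
    """Stand-in compression function, truncated to the cipher's key size."""
    return hashlib.sha256(code).digest()[:size]

def xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

b1 = b"code of caller b1"
b2 = b"code of caller b2"

# Correction value: make b2's key match b1's, i.e. hash(b1) = hash(b2) XOR c.
c = xor(h(b1), h(b2))
assert xor(h(b2), c) == h(b1)        # either caller now yields the same key

# The weakness noted above: after patching b2, an attacker simply
# recomputes c so the corrected key is unchanged and decryption still works.
b2_tampered = b2 + b" /* patched */"
c_new = xor(h(b1), h(b2_tampered))
assert xor(h(b2_tampered), c_new) == h(b1)
```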

As an example, consider Figure 3, where error print() can be called by errf() and perrf(). Relying on their clear code would require decrypting main() as well as counter() in order to decrypt error print(). When relying on encrypted code, however, we can decrypt error print() just before one of its callers is called, relying on their encrypted code.
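The guard-count formulas can be made concrete with the Figure 3 example. Reading the call tree, errf() sits at nesting level 1 below main() and perrf() at level 2 (via counter()); this reading of the levels is an assumed interpretation.

```python
# Guard counts for error_print() and its two callers in Figure 3.
levels = {"errf": 1, "perrf": 2}   # l_i: nesting level relative to main()
n = len(levels)                    # n = 2 callers

# Relying on cleartext caller code: decrypt every call path first
# (sum of l_i guards), then n more guard calls to decrypt error_print().
worst_case_cleartext = n + sum(levels.values())
assert worst_case_cleartext == 5

# Relying on the encrypted caller code B_i instead: n guards suffice.
worst_case_encrypted = n
assert worst_case_encrypted == 2
```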

5.4 Code encryption as an addition to code verification

Our guards, which modify code depending on other code, offer several advantages over the software guards proposed by Chang et al. [2], which only verify (or repair) code:

• confidentiality: as long as code remains encrypted in memory, it is protected against analysis attacks. With a good code dependency scheme it is feasible to ensure that only a minimal number of code blocks is present in memory in decrypted form;

• tamper resistance: together with a good dependency scheme, our guards offer protection against any tampering attempt. If a function is tampered with, statically or even dynamically, the program will generate corrupted code when this function is executed and will most likely eventually crash due to illegal instructions. Furthermore, even if the modification generates executable code, the change will be propagated to other functions, resulting in erroneous code.

1 main {wc.c 124}
2     errf {wc.c 32}
4         error_print {wc.c 20}
10    counter {wc.c 105}
12        perrf {wc.c 44}
14            error_print ... {4}
16    getword {wc.c 75}
19        isword {wc.c 62}
23    report {wc.c 55}
25        report ... {23}

Figure 3: Static call graph and tree of the UNIX word count command wc. The reduced static call tree was produced with GNU's cflow [11].

In some cases, programmers might opt forself-checking code instead of self-encryptingcode, based on some of the following disadvan-tages:

• implicit reaction to tampering: if a verified code section is tampered with, the program will crash (provided the executed parts rely on the modified section). However, crashing is not very user-friendly. In the case of software guards [2, 12], detection of tampering can be handled more gracefully, for example by triggering another routine that exits the program after a random delay, by calling repair code that fixes the modified code, or by a hybrid scheme that combines both techniques;

• limited hardware support: self-modifying code requires memory pages to be executable and writable at the same time. However, some operating systems enforce a W^X policy as a mechanism to make the exploitation of security vulnerabilities more difficult. This means a memory page is either writable (data) or executable (code), but not both. Depending on the operating system, different approaches exist to bypass – legally – the W^X protection: using mprotect(), the system call for modifying the flags of a memory page, to explicitly mark memory readable and executable (e.g. used by OpenBSD), or setting a special flag in the binary (e.g. in the case of PaX). A bypass mechanism will most likely always exist to allow for special software such as a JVM that translates Java bytecode to native code on the fly.

6 Attacks and improvements

6.1 Inlining of guard code

If the implementation of a dependency scheme (see also Section 4.2) consists of a single instance of the guard code and numerous calls to it, an attacker can modify the guard or crypto code to write all decrypted content to another file or memory region. To prevent an attacker from having to attack only this single instance, inlining the entire guard could thwart this attack and force an attacker to modify all instances of the guard code at runtime, as all nested guard code will initially be encrypted. However, a disadvantage of this inlining is code expansion. Compact encryption routines might keep the spatial cost relatively low, but implementations of secure cryptographic functions are not always small.

Program   Scheme 1   Scheme 2   Scheme 3
du        1.088      1.379      1.753
tar       1.213      1.484      2.219
wc        0.458      2.210      2.800

Table 2: Space cost Cs when using self-modifying code with dependency schemes after inlining all guards.

Table 2 shows that wc almost tripled in size after inlining guards according to scheme 3. The performance cost jumped from 91.031 to 1379.71, which was our worst-case result after inlining guards. The program tar ran only 35 times slower after inlining guards as specified by scheme 3. Further optimisation of the guard code and the cryptographic algorithms should contribute to a lower space cost and, as a consequence, a smaller performance penalty, since some guard code has to be decrypted by other guards, and so on.

6.2 Hardware-assisted circumvention attack

In 2005, van Oorschot et al. [23] published a hardware-assisted attack that circumvents self-verifying code mechanisms. The attack exploits differences between data reads and instruction fetches, which is feasible because current computer architectures distinguish between data and code. When instructions are verified (e.g. checksummed or hashed) they are treated as data, but when instructions are fetched for execution they are treated as code. The attack consists of duplicating each memory page: one page contains the original code, while another contains tampered code. A modified kernel intercepts every data read and redirects it to the page containing the original code, while the code that gets executed is the modified one.

Our protection scheme is different, however. Redirecting the data reads to a page with unmodified code will still result in a correct hash to decrypt the next function, but the attack implies that code is in the clear and thus can be modified. The only blocks in cleartext, however, are the guard code and main(), and only these can be modified using van Oorschot's attack. If the decryption routines are not inlined, an attacker could simply modify the crypto code (e.g. to redirect all cleartext generated at runtime); even if the integrity of the decryption routine is verified, this will not be detected due to the redirection of the data reads. When cryptographic routines are inlined, extending this attack to the inlined decryption routines would require duplicating every memory page as soon as a function gets decrypted, and modifying its decrypted body dynamically. This whole attack implies intercepting 'data writes' that modify code, using them as a trigger to dynamically copy a memory page, and modifying code dynamically. Such an attack, however, is identical to a dynamic analysis attack, where functions get decrypted, decryption routines are identified and all clear code is intercepted, allowing an attacker to rebuild the application without protection code and to modify it afterwards statically.

6.3 Increasing granularity

Our scheme is built on top of static call graph information and therefore uses functions as building blocks. Our implementation also uses function pointers, which can be addressed at a high level (e.g. in C). Implementing self-verifying or self-modifying code, however, can work at any granularity if implemented carefully. The only rule that must be respected is that code should be in cleartext form (correct binary code instructions, part of the original program) whenever it is executed.

With inline assembly (asm() in gcc) we can inline assembly labels in the C code. Just as with function pointers, their scope is global. However, these labels can be placed anywhere in a function, unlike function pointers, which by definition only occur at the beginning of a function and must be defined before they can be used. A further benefit of using labels is the elimination of the initial build phase which gathered address information: providing the right addresses to the guard code is done by the assembler, which simply replaces the labels with the corresponding addresses.

If one increases the granularity and encrypts parts of functions, the guards can be integrated into the program's control flow, which makes the network of guards even harder to analyse, especially when they are inlined. However, we believe that such a fine-grained structure induces much more overhead: the code blocks to be encrypted will be much smaller than the added code, and more guards will be required to cover the whole program code. Hence it is important to trade off the use of these guards, perhaps focusing on some critical parts of the program while avoiding 'hot spots' such as frequently executed code.

7 Conclusions

This paper presents a new type of software guard which is able to encipher code at runtime, relying on other code as key information. This technique offers confidentiality of code, a property that previously proposed software guards [2, 12] did not offer. As code is used as a key to decrypt other code, it becomes possible to create code dependencies which make the program more tamper-resistant. We therefore propose three dependency schemes built on static call graph information. These schemes ensure that an introduced modification is propagated through the rest of the program, forcing the application to work incorrectly or exit prematurely. As a proof of concept we implemented our technique in C and applied it to some small C programs.

Acknowledgements

This work was supported in part by the Research Foundation - Flanders (FWO), the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT), the Interdisciplinary Institute for Broadband Technology (IBBT), and the Concerted Research Action (GOA) Ambiorics 2005/11 of the Flemish Government.

References

[1] D. Aucsmith. Tamper resistant software: an implementation. Information Hiding, Lecture Notes in Computer Science, 1174:317–333, 1996.

[2] H. Chang and M. J. Atallah. Protecting software codes by guards. ACM Workshop on Digital Rights Management (DRM 2001), LNCS 2320:160–175, 2001.

[3] Y. Chen, R. Venkatesan, M. Cary, R. Pang, S. Sinha, and M. Jakubowski. Oblivious hashing: a stealthy software integrity verification primitive. In Information Hiding, 2002.

[4] S. Chow, P. Eisen, H. Johnson, and P. van Oorschot. A white-box DES implementation for DRM applications. In Proceedings of the 2nd ACM Workshop on Digital Rights Management (DRM 2002), November 2002.

[5] C. Cifuentes and K. Gough. Decompilation of binary programs. Software – Practice & Experience, 25(7):811–829, 1995.

[6] CLET team. Polymorphic shellcode engine using spectrum analysis. http://www.phrack.org/phrack/61/p61-0x09_Polymorphic_Shellcode_Engine.txt.

[7] C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report #148, Department of Computer Science, The University of Auckland, 1997.

[8] C. Collberg, C. Thomborson, and D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. In Principles of Programming Languages (POPL '98), pages 184–196, 1998.

[9] C. S. Collberg and C. Thomborson. Watermarking, tamper-proofing, and obfuscation – tools for software protection. IEEE Transactions on Software Engineering, 28:735–746, August 2002.

[10] J. Ge, S. Chaudhuri, and A. Tyagi. Control flow based obfuscation. In DRM '05: Proceedings of the 5th ACM Workshop on Digital Rights Management, pages 83–92, 2005.

[11] GNU. GNU cflow. http://www.gnu.org/software/cflow/.

[12] B. Horne, L. R. Matheson, C. Sheehan, and R. E. Tarjan. Dynamic self-checking techniques for improved tamper resistance. In Proceedings of the Workshop on Security and Privacy in Digital Rights Management 2001, pages 141–159, 2001.

[13] M. Howard and D. C. LeBlanc. Writing Secure Code, Second Edition. Microsoft Press, 2002.

[14] C. Linn and S. Debray. Obfuscation of executable code to improve resistance to static disassembly. In CCS '03: Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 290–299, 2003.

[15] D. Low. Java Control Flow Obfuscation. Master's thesis, University of Auckland, New Zealand, 1998.

[16] M. Madou, B. Anckaert, P. Moseley, S. Debray, B. De Sutter, and K. De Bosschere. Software protection through dynamic code mutation. In J. Song, T. Kwon, and M. Yung, editors, The 6th International Workshop on Information Security Applications (WISA 2005), volume LNCS 3786, pages 194–206. Springer-Verlag, August 2006.

[17] N. Mehta and S. Clowes. Shiva – ELF Executable Encryptor. Secure Reality. http://www.securereality.com.au/.

[18] A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, Inc., 1997.

[19] Scut, Team Teso. Burneye – x86/Linux ELF Relocatable Object Obfuscator.

[20] A. Shamir and N. van Someren. Playing "Hide and Seek" with stored keys. Financial Cryptography '99, LNCS 1648:118–124, 1999.

[21] Symantec. Understanding and Managing Polymorphic Viruses. http://www.symantec.com/avcenter/reference/striker.pdf.

[22] J. D. Tygar and B. Yee. Dyad: A system for using physically secure coprocessors. In IP Workshop Proceedings, 1994.

[23] P. C. van Oorschot, A. Somayaji, and G. Wurster. Hardware-assisted circumvention of self-hashing software tamper resistance. IEEE Transactions on Dependable and Secure Computing, 2(2):82–92, 2005.

[24] J. Viega and M. Messier. Secure Programming Cookbook for C and C++. O'Reilly Media, Inc., 2003.

[25] G. Wroblewski. General Method of Program Code Obfuscation. PhD thesis, Wroclaw University of Technology, Institute of Engineering Cybernetics, 2002.


