
Recovery of Variables and Heap Structure in x86 Executables Gogul Balakrishnan Thomas Reps...

Date post: 11-Jan-2016
Upload: brett-houston
Transcript
Page 1

Recovery of Variables and Heap Structure in x86 Executables

Gogul Balakrishnan, Thomas Reps

University of Wisconsin

Page 2

Overview

• Introduction
• Challenges
• Background
• Recovering A-locs via Iteration
• An Abstraction for Heap-Allocated Storage
• Experiments

Page 3

Introduction

• The Need to Analyze Executables
  – What You See Is Not What You eXecute

• Many Obstacles in Analyzing Executables
  – Data Objects are Not Easily Identifiable
  – Absence of Symbol Table & Debugging Information
  – Determining the Memory Addresses of Data Objects
  – Difficult to Track the Flow of Data through Memory
  – Challenging to get useful information about the heap

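e.g., memset(password, '\0', len); free(password);
  An optimizing compiler may delete the memset call as a dead store, because password is freed immediately afterwards, so the sensitive data can remain in memory even though the source code clears it.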

Page 4

Challenges(1/3)

• Recovering Variable-like Entities
  – The layout of memory is known at compile time or assembly time (IDAPro's approach)
  – y is accessed only indirectly, through eax: to recover y, the set of values that eax holds at instruction 5 needs to be determined.

void main() {
    int x, y;
    x = 1;
    y = 2;
    return;
}

proc main
1  mov ebp, esp
2  sub esp, 8
3  mov [ebp-8], 1
4  mov eax, ebp
5  mov [eax-4], 2
6  add esp, 8
7  retn
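Why the value of eax matters can be seen in a deliberately tiny C sketch that replays instructions 1 to 5 while tracking each register as "entry esp plus a constant offset". This is only an illustration under that assumption (real VSA tracks sets of values per memory-region), and the names RegVal, address_written_at_5, and variable_at are hypothetical.

```c
#include <assert.h>

/* Hypothetical register model: a value is either unknown or
   "(esp at procedure entry) + offset". */
typedef struct {
    int known;   /* 1 if the value is (entry esp) + offset */
    long offset; /* byte offset from the entry value of esp */
} RegVal;

/* Replays instructions 1-4 and returns the stack offset that
   instruction 5 ("mov [eax-4], 2") writes to. */
long address_written_at_5(void) {
    RegVal esp = {1, 0}, ebp = {0, 0}, eax = {0, 0};

    ebp = esp;             /* 1: mov ebp, esp                          */
    esp.offset -= 8;       /* 2: sub esp, 8 (frame occupies [-8, -1])  */
                           /* 3: mov [ebp-8], 1 writes offset -8        */
    eax = ebp;             /* 4: mov eax, ebp                          */
    assert(eax.known);     /* eax must hold a known frame address here */
    return eax.offset - 4; /* 5: mov [eax-4], 2                        */
}

/* Maps a frame offset to a source variable, assuming the layout
   x = [-8, -5] and y = [-4, -1] implied by instructions 2, 3 and 5. */
char variable_at(long offset) {
    if (offset >= -8 && offset <= -5) return 'x';
    if (offset >= -4 && offset <= -1) return 'y';
    return '?';
}
```

Here address_written_at_5() yields -4 and variable_at(-4) maps it to y; without knowing what eax holds, the store at instruction 5 could not be attributed to any variable.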

Page 5

Challenges(2/3)

• Granularity of Recovered Variable-like Entities
  – Affects the complexity and accuracy of subsequent analyses

• The Structure of Heap-Allocated Objects
  – Only the Size of the Allocated Block is Known
  – Using an Abstraction-Refinement Algorithm

Page 6

Challenges(3/3)

• Resolving Virtual-Function Calls
  – With the one-variable-per-malloc-site abstraction, a write into a heap object can only be a weak update, so a definite link between an object and its virtual-function table is never established.
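The effect of weak updates can be shown with a minimal sketch, assuming that value-sets of a vtable-pointer field are encoded as bitmasks and that the field of a freshly allocated block starts out as TOP (unknown). VSet, TOP, VT_A, strong_update, and weak_update are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical encoding: a value-set of a vtable-pointer field is a bitmask
   over a small universe of addresses; TOP means "any value". */
typedef uint32_t VSet;
#define TOP  ((VSet)0xFFFFFFFFu)
#define VT_A ((VSet)1u << 0)   /* hypothetical address of one vtable */

/* Strong update: the abstract cell stands for exactly one concrete cell,
   so the stored value replaces the old one. */
VSet strong_update(VSet old, VSet stored) {
    (void)old;
    return stored;
}

/* Weak update: the abstract cell may stand for many concrete cells (e.g.,
   one summary node per malloc site), so the old values must be kept. */
VSet weak_update(VSet old, VSet stored) {
    return old | stored;
}
```

weak_update(TOP, VT_A) is still TOP, so a summary node never records that the object's vtable pointer is definitely VT_A; strong_update(TOP, VT_A) would.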

Page 7

Background(1/6)

• Abstract Locations (A-locs)
  – Memory Region
    • A Set of Disjoint Memory Areas
    • Represents a Group of Locations that have Similar Runtime Properties
  – Abstract Locations
    • Locations between two addresses/offsets in a Memory-Region
    • Addresses & Offsets are Statically Determined
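An a-loc, as defined above, can be represented as a memory-region plus a statically determined byte range. The following is a hypothetical encoding (Aloc, find_aloc, and the IDA-style names var_8 and var_4 used below are assumptions) showing how an offset is mapped to the a-loc that covers it.

```c
#include <stddef.h>

/* Hypothetical a-loc: a memory-region identifier plus the statically
   determined byte range [start, end] it covers. */
typedef struct {
    const char *name;
    int region;  /* e.g., 0 = global region, 1 = activation record of main */
    long start;  /* first byte offset (inclusive) */
    long end;    /* last byte offset (inclusive) */
} Aloc;

/* Returns the a-loc of `region` whose byte range contains `offset`,
   or NULL if no a-loc covers it. */
const Aloc *find_aloc(const Aloc *table, size_t n, int region, long offset) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].region == region &&
            table[i].start <= offset && offset <= table[i].end)
            return &table[i];
    }
    return NULL;
}
```

With a table holding var_8 = [-8, -5] and var_4 = [-4, -1] in region 1, find_aloc returns var_4 for offset -4, and returns NULL for the same offset in a different region.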

Page 8

Background(2/6)

• Abstract Locations (cont’d)

proc main
      mov ebp, esp
      sub esp, 40
      mov ecx, 0
      lea eax, [ebp-40]
L1:   mov [eax], 1
      mov [eax+4], 2
      add eax, 8
      inc ecx
      cmp ecx, 5
      jl L1
      mov eax, [ebp-36]
      add esp, 40
      retn

Page 9

Background(3/6)

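• Value-Set Analysis (VSA)
  – Combined Numeric-Analysis & Pointer-Analysis
  – Over-Approximation of the values that each a-loc holds at each program point
  – Value-Set
    • The Set of Addresses and Numeric Values
    • An N-tuple of strided intervals of the form s[l, u], one per memory-region (N: the number of memory-regions)
    • Components are ordered as (Global Region, Procedure Region, …)
    • (1[0, 9], ∅) versus (∅, 8[-40, -8]): the values 0 to 9 in the global region versus the offsets -40 to -8, in steps of 8, in the procedure's region

A strided interval s[l, u] denotes the integers from l to u spaced s apart, e.g., 8[-40, -8] = {-40, -32, -24, -16, -8}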
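A strided interval can be represented directly by its three integers. The sketch below (StridedInterval, si_contains, and si_size are hypothetical names) assumes a well-formed interval in which u - l is a multiple of s, with s = 0 used for singletons.

```c
#include <assert.h>
#include <stdbool.h>

/* s[l, u]: the integers l, l+s, l+2s, ..., u. */
typedef struct {
    long s, l, u;
} StridedInterval;

/* v is in s[l, u] iff l <= v <= u and (v - l) is a multiple of s. */
bool si_contains(StridedInterval si, long v) {
    if (v < si.l || v > si.u) return false;
    if (si.s == 0) return v == si.l;
    return (v - si.l) % si.s == 0;
}

/* Number of concrete values denoted by the strided interval. */
long si_size(StridedInterval si) {
    assert(si.l <= si.u);
    return si.s == 0 ? 1 : (si.u - si.l) / si.s + 1;
}
```

For 8[-40, -8], si_size returns 5, and si_contains accepts -24 but rejects -20.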

Page 10

Background(4/6)

• Value-Set Analysis (cont’d)
  – The Value-Set of eax at L1 is (∅, 8[-40, -8])
    • eax holds the offsets {-40, -32, -24, -16, -8} in the procedure's region
    • These are the starting addresses of field x of each element of p


typedef struct { int x, y; } Point;

int main() {
    int i;
    Point p[5];
    for (i = 0; i < 5; ++i) {
        p[i].x = 1;
        p[i].y = 2;
    }
    return p[0].y;
}
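The value-set of eax at L1 can be sanity-checked by running the loop concretely. The sketch below (eax_offsets_at_L1 is a hypothetical name) records eax - ebp each time control reaches L1 and summarizes the recorded offsets by the smallest strided interval containing them: the minimum, the maximum, and the gcd of the distances from the minimum. It is a concrete check of one run, not VSA itself.

```c
#include <stdlib.h>

typedef struct { long s, l, u; } StridedInterval;  /* s[l, u] */

static long gcd(long a, long b) {
    a = labs(a);
    b = labs(b);
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

/* Replays the loop, recording eax - ebp each time control reaches L1,
   and returns the smallest strided interval containing those offsets. */
StridedInterval eax_offsets_at_L1(void) {
    long offs[5];
    int count = 0;
    long eax = -40;                          /* lea eax, [ebp-40]          */
    for (long ecx = 0; ecx < 5; ecx++) {     /* inc ecx; cmp ecx, 5; jl L1 */
        offs[count++] = eax;                 /* L1: eax points at p[ecx].x */
        eax += 8;                            /* add eax, 8                 */
    }
    StridedInterval si = { 0, offs[0], offs[0] };
    for (int i = 1; i < count; i++) {
        if (offs[i] < si.l) si.l = offs[i];
        if (offs[i] > si.u) si.u = offs[i];
    }
    for (int i = 0; i < count; i++)
        si.s = gcd(si.s, offs[i] - si.l);    /* common spacing of offsets  */
    return si;
}
```

The result is 8[-40, -8], the same strided interval that VSA reports for eax at L1.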

Page 11

Background(5/6)

• Aggregate Structure Identification (ASI)
  – Can Distinguish between Accesses to Different Parts of the Same Aggregate
  – Aggregate is broken up into smaller parts (atoms)
  – Data-Access Constraint Language (DAC)
    • Specifying Data-Access Pattern in the Program
    • DataRef: Reference to a set of sequences of bytes
    • UnifyConstraint: Flow of Data in the Program

Page 12

Background(6/6)

• Aggregate Structure Identification (cont’d)
  – Data-Access Constraint Language (DAC)
    • DataRef[l : u] refers to bytes l through u of DataRef
    • DataRef\n views DataRef as an array of n elements
  – ASI DAG

e.g., P[0:11]\3 refers to P[0:3], P[4:7], or P[8:11]

[Figure: ASI DAG for p and return_main]

p[0:39]\5[0:3] ≈ const_1[0:3];
p[0:39]\5[4:7] ≈ const_2[0:3];
return_main[0:3] ≈ p[4:7]

These constraints encode the stores p[i].x = 1 and p[i].y = 2 and the statement return p[0].y.
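The DAC notation DataRef[l : u]\n[a : b] reduces to simple offset arithmetic when the n elements have equal size. The sketch below uses hypothetical helper names (ByteRange, asi_element, asi_element_field) and assumes that the byte range divides evenly into n elements.

```c
#include <assert.h>

typedef struct { long lo, hi; } ByteRange;  /* inclusive byte range */

/* For DataRef[l : u]\n, returns the bytes of element i (0-based),
   assuming the range splits into n equal-length elements. */
ByteRange asi_element(long l, long u, long n, long i) {
    long len = u - l + 1;
    assert(n > 0 && len % n == 0 && 0 <= i && i < n);
    long elem = len / n;
    ByteRange r = { l + i * elem, l + (i + 1) * elem - 1 };
    return r;
}

/* For DataRef[l : u]\n[a : b], returns bytes a..b within element i. */
ByteRange asi_element_field(long l, long u, long n, long i, long a, long b) {
    ByteRange e = asi_element(l, u, n, i);
    assert(0 <= a && a <= b && e.lo + b <= e.hi);
    ByteRange r = { e.lo + a, e.lo + b };
    return r;
}
```

asi_element(0, 11, 3, 2) gives bytes [8, 11], one of the choices for P[0:11]\3, and asi_element_field(0, 39, 5, 4, 4, 7) gives [36, 39], the y field of p[4].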

Page 13

Recovering A-locs via Iteration

• Problems of VSA
  – Can only Represent a Contiguous Sequence of Memory Locations
  – Cannot Detect Internal Substructure

• Basic Idea
  1. VSA is used to obtain memory-access patterns in the executable;
  2. ASI is used as a heuristic to determine a set of a-locs according to the memory-access patterns obtained from the information recovered by VSA;
  3. the a-locs produced by ASI are used in the next round of VSA, and the two steps are repeated.

[Figure: IDAPro supplies the initial a-locs; ASI and VSA iterate to produce the Final Value-Sets]

Page 14

Recovering A-locs via Iteration
• Generating Data-Access Constraints from Value-Sets

<Algorithm 1: SI2ASI>
Input: (r, s[l, u], length)
Output: (ASI Ref, Boolean)

if s[l, u] is a singleton (i.e., s[l, l]) then
    return <"r[l : l+length-1]", true>
else
    size ← max(s, length)
    n ← ⌈(u - l + size) / size⌉
    ref ← "r[l : u+size-1]\n[0 : size-1]"
    return <ref, (s = size)>
end if

Here r[l : u+size-1] is the actual byte range, and n, the number of array elements, is the number of size-byte elements needed to cover it.

e.g., (AR_main, 8[-40, -8], length) => AR_main[-40:-1]\5[0:7], i.e.,
AR_main[-40:-33][0:7], AR_main[-32:-25][0:7], AR_main[-24:-17][0:7], AR_main[-16:-9][0:7], AR_main[-8:-1][0:7]
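Algorithm 1 can be transcribed into C almost line for line. In this sketch (si2asi, ceil_div, and StridedInterval are hypothetical names), the ASI reference is produced as a string, the backslash stands for ASI's array operator, and the element count is computed as the number of size-byte elements needed to cover the byte range [l : u+size-1]; that choice is an assumption that reproduces the slide's example.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { long s, l, u; } StridedInterval;  /* s[l, u] */

/* Ceiling division for a > 0, b > 0. */
static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* Writes the ASI reference for a `length`-byte access at the offsets
   s[l, u] of memory-region r into buf and returns the exactness flag. */
bool si2asi(const char *r, StridedInterval si, long length,
            char *buf, size_t bufsz) {
    if (si.l == si.u) {                            /* singleton s[l, l] */
        snprintf(buf, bufsz, "%s[%ld:%ld]", r, si.l, si.l + length - 1);
        return true;
    }
    long size = si.s > length ? si.s : length;     /* max(s, length)    */
    long n = ceil_div(si.u - si.l + size, size);   /* array elements    */
    snprintf(buf, bufsz, "%s[%ld:%ld]\\%ld[0:%ld]",
             r, si.l, si.u + size - 1, n, size - 1);
    return si.s == size;
}
```

For region AR_main, 8[-40, -8], and a 4-byte access, it produces AR_main[-40:-1]\5[0:7] and reports the reference as exact.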

Page 15

Recovering A-locs via Iteration
• Generating Data-Access Constraints from Value-Sets (operands of the form [reg1 + reg2])

<Algorithm 2>
if s1[l1, u1] or s2[l2, u2] is a singleton then
    return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
end if
if s1 ≥ (u2 - l2 + length) then
    baseSI ← s1[l1, u1]
    indexSI ← s2[l2, u2]
else if s2 ≥ (u1 - l1 + length) then
    baseSI ← s2[l2, u2]
    indexSI ← s1[l1, u1]
else
    return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
end if
<baseRef, exactRef> ← SI2ASI(r, baseSI, stride(baseSI))
if exactRef is false then
    return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
else
    return concat(baseRef, SI2ASI('', indexSI, length))
end if

The two stride tests determine the base register: an operand whose stride is at least the span of the other operand's accesses supplies the base address, and the other operand supplies the index address within each element, so the resulting reference follows row-major order.

e.g., eax : (1[0, 9], ∅) and ecx : (∅, 16[-160, -16])
For a one-byte access [ecx+eax] => AR[-160:-1]\10[0:15][0:9]\10[0:0]
i.e., ten 16-byte rows, of which bytes 0 through 9 of each row are accessed one byte at a time.
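Algorithm 2 can be sketched on top of a direct transcription of SI2ASI. The strided-interval sum ⊕ is implemented here as gcd(s1, s2)[l1 + l2, u1 + u2], the usual over-approximation of the sum; the function names are hypothetical, and returning the index reference's exactness flag is an assumption, since the slide does not say which flag concat returns.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { long s, l, u; } StridedInterval;   /* s[l, u] */

static long gcd(long a, long b) {
    a = labs(a);
    b = labs(b);
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

/* s1[l1, u1] ⊕ s2[l2, u2] = gcd(s1, s2)[l1 + l2, u1 + u2] */
static StridedInterval si_add(StridedInterval a, StridedInterval b) {
    StridedInterval r = { gcd(a.s, b.s), a.l + b.l, a.u + b.u };
    return r;
}

/* Algorithm 1 (SI2ASI), transcribed directly. */
static bool si2asi(const char *r, StridedInterval si, long length,
                   char *buf, size_t bufsz) {
    if (si.l == si.u) {
        snprintf(buf, bufsz, "%s[%ld:%ld]", r, si.l, si.l + length - 1);
        return true;
    }
    long size = si.s > length ? si.s : length;
    long n = (si.u - si.l + size + size - 1) / size;  /* ceil((u-l+size)/size) */
    snprintf(buf, bufsz, "%s[%ld:%ld]\\%ld[0:%ld]",
             r, si.l, si.u + size - 1, n, size - 1);
    return si.s == size;
}

/* Algorithm 2 for an operand [reg_a + reg_b] whose registers hold a and b. */
bool si2asi_pair(const char *r, StridedInterval a, StridedInterval b,
                 long length, char *buf, size_t bufsz) {
    StridedInterval base, index;
    if (a.l == a.u || b.l == b.u)                 /* a singleton operand */
        return si2asi(r, si_add(a, b), length, buf, bufsz);
    if (a.s >= b.u - b.l + length) {              /* a strides over all of b */
        base = a; index = b;
    } else if (b.s >= a.u - a.l + length) {       /* b strides over all of a */
        base = b; index = a;
    } else {
        return si2asi(r, si_add(a, b), length, buf, bufsz);
    }
    char base_ref[64], index_ref[64];
    if (!si2asi(r, base, base.s, base_ref, sizeof base_ref))
        return si2asi(r, si_add(a, b), length, buf, bufsz);
    /* Assumption: the combined flag is the index reference's flag. */
    bool exact = si2asi("", index, length, index_ref, sizeof index_ref);
    snprintf(buf, bufsz, "%s%s", base_ref, index_ref);   /* concat */
    return exact;
}
```

Called with ecx = 16[-160, -16], eax = 1[0, 9], and a one-byte access, it builds AR[-160:-1]\10[0:15][0:9]\10[0:0] regardless of operand order; when neither stride covers the other operand, it falls back to SI2ASI on the sum.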

Page 16

Recovering A-locs via Iteration
• Interpreting Indirect Memory-References
  – Lookup Algorithm
    • NodeDesc: <name, length>
      name: the name associated with the ASI tree node; length: the length of that node
    • NodeDescList: An Ordered List of NodeDesc, e.g., [nd1, nd2, …, ndn]
    • Three Operations

Operation               Output
GetChildren(aloc)       List of Child Nodes
GetRange(start, end)    List of Nodes with offsets in the given range [start, end]
GetArrayElements(m)     List of Nodes with m elements

Page 17

Recovering A-locs via Iteration
• Lookup Algorithm Examples

e.g., Lookup p[0:39]\5[0:3]
  GetChildren(p)      = [<a3, 4>, <a4, 4>, <i2, 32>]
  GetRange(0, 39)     = [<a3, 4>, <a4, 4>, <i2, 32>]
  GetArrayElements(5) = [<a3, 4>, <a4, 4>], [<a5, 4>, <a6, 4>]
  GetRange(0, 3)      = [<a3, 4>, <a5, 4>]

The access therefore resolves to a-locs a3 and a5, the x fields of p[0] and of the remaining four elements.
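GetRange can be sketched over a NodeDescList by accumulating node lengths to obtain each node's offset. The sketch assumes that a node is returned only when all of its bytes lie within [start, end]; get_range is a hypothetical name.

```c
#include <stddef.h>

/* NodeDesc as on the slide: the name of an ASI tree node and its length. */
typedef struct {
    const char *name;
    long length;
} NodeDesc;

/* Copies into `out` the nodes of the ordered list whose bytes lie entirely
   within [start, end]; node i begins at the sum of the lengths of the nodes
   before it.  Returns the number of nodes copied. */
size_t get_range(const NodeDesc *list, size_t n, long start, long end,
                 NodeDesc *out) {
    size_t k = 0;
    long offset = 0;
    for (size_t i = 0; i < n; i++) {
        long first = offset;
        long last = offset + list[i].length - 1;
        if (first >= start && last <= end)
            out[k++] = list[i];
        offset += list[i].length;
    }
    return k;
}
```

On [<a3, 4>, <a4, 4>, <i2, 32>], get_range with the range [0, 39] returns all three nodes; applied with the range [0, 3] to each element list [<a3, 4>, <a4, 4>] and [<a5, 4>, <a6, 4>], it returns a3 and a5, matching the example.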

Page 18

An Abstraction for Heap-Allocated Storage

• Previous Abstraction
  – All of the nodes allocated at a given allocation site s are folded together into a single summary node n_s.

• Recency Abstraction
  – Allows VSA & ASI to recover information about virtual-function tables
  – Uses Two Memory-Regions per allocation site s
    • MRAB[s]: Most-Recently-Allocated Block
    • NMRAB[s]: Non-Most-Recently-Allocated Blocks
    • count: how many concrete blocks the memory-region represents (MRAB[s].count, NMRAB[s].count)
      – SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞], [1, ∞], [2, ∞]}
    • size: over-approximation of the size of the block (MRAB[s].size, NMRAB[s].size)
  – Because MRAB[s] represents at most one concrete block, writes to it can be strong updates.

Page 19

An Abstraction for Heap-Allocated Storage

• Operation
  – absEnv[s]: MRAB[s] / NMRAB[s] → <count, size, alocEnv>
  – alocEnv = a-loc → ValueSet
  – Allocation site s transforms absEnv to absEnv’:
    • absEnv’(MRAB[s]) = <[0,1], size, a-loc.Value-Set>
    • absEnv’(NMRAB[s]).count = absEnv(NMRAB[s]).count + absEnv(MRAB[s]).count
    • absEnv’(NMRAB[s]).size = absEnv(NMRAB[s]).size ∪ absEnv(MRAB[s]).size
    • absEnv’(NMRAB[s]).alocEnv = absEnv(NMRAB[s]).alocEnv ∪ absEnv(MRAB[s]).alocEnv
  – In other words, the previously most-recently-allocated block is folded into NMRAB[s], and MRAB[s] now describes the newly allocated block.
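The allocation-site transformer can be sketched in C for the count and size components only (alocEnv is omitted). Counts are kept in SmallRange by mapping each sum to the smallest SmallRange element that contains it, and sizes are represented as an interval hull, which over-approximates ∪. allocate, allows_strong_update, and the other names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define INF LONG_MAX   /* stands for ∞ in count ranges */

typedef struct { long lo, hi; } Range;                    /* count [lo, hi] */
typedef struct { bool empty; long lo, hi; } SizeSet;      /* sizes in bytes */
typedef struct { Range count; SizeSet size; } RegionInfo; /* alocEnv omitted */
typedef struct { RegionInfo mrab, nmrab; } SiteEnv;       /* MRAB[s], NMRAB[s] */

/* Smallest element of SmallRange = {[0,0], [0,1], [1,1], [0,∞], [1,∞], [2,∞]}
   that contains [lo, hi] (requires lo <= hi). */
static Range to_small_range(long lo, long hi) {
    Range r = { lo < 2 ? lo : 2, hi <= 1 ? hi : INF };
    return r;
}

/* Abstract addition of two count ranges, kept inside SmallRange. */
static Range count_add(Range a, Range b) {
    long hi = (a.hi == INF || b.hi == INF) ? INF : a.hi + b.hi;
    return to_small_range(a.lo + b.lo, hi);
}

/* Interval hull of two size sets: an over-approximation of their union. */
static SizeSet size_union(SizeSet a, SizeSet b) {
    if (a.empty) return b;
    if (b.empty) return a;
    SizeSet r = { false, a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Executing allocation site s with a request of nbytes: the old MRAB[s] is
   folded into NMRAB[s], and MRAB[s] describes the new block. */
SiteEnv allocate(SiteEnv env, long nbytes) {
    SiteEnv out;
    out.nmrab.count = count_add(env.nmrab.count, env.mrab.count);
    out.nmrab.size = size_union(env.nmrab.size, env.mrab.size);
    out.mrab.count = (Range){ 0, 1 };              /* [0,1], as on the slide */
    out.mrab.size = (SizeSet){ false, nbytes, nbytes };
    return out;
}

/* In this sketch, a store may be a strong update only when the region
   stands for at most one concrete block. */
bool allows_strong_update(RegionInfo r) {
    return r.count.hi <= 1;
}
```

After three allocations at the same site, NMRAB[s].count has become [0, ∞] while MRAB[s].count is still [0, 1], so in this sketch only MRAB[s] admits strong updates.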

Page 20

An Abstraction for Heap-Allocated Storage

Page 21

Experiments

• Environments

• Software

OS Compiler Language Target Files

Windows Visual Studio 6.0

C++ .obj

Page 22

Experiments

• Results of Virtual-Function Call Resolution

Page 23

Experiments

• Results of A-loc Identification
  – Comparing the Results of the Algorithm with Debugging Information

The structure of 87% of the local variables is correct

Page 24

Experiments

• Results of A-loc Identification

The structure of 72% of the objects in the heap is correct

Page 25

Q & A

