2/9/2009 1

Binary Analysis and Rewriting

Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar
Stony Brook University

Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song
UC Berkeley

Binary Rewriting for Protecting Applications

• Basic approach: instrument the OS and application to enforce policies that protect the application from a hostile OS
• Why binary rewriting?
  – Versatile: can enforce a wide range of properties
    – Low-level: memory pages, instructions/operands, …
    – Higher-level: fine-grained (data-structure-level) memory isolation, policies on callable functions and their parameters, …
    – Global: information flow, control-flow integrity, …
  – Wide applicability:
    – COTS and legacy applications available only in binary form
    – The application and all library code can be analyzed and rewritten
    – Works across programs written in many high-level languages
    – Can handle low-level code written in assembly

Binary Rewriting Today

• Relies on dynamic rewriting
  – Each basic block is rewritten just before its first execution
  – Benefit: side-steps the challenges of static rewriting, e.g., accurate disassembly
• Drawbacks
  – High overheads for the problems of interest to us: 400% to 4000% for taint tracking
  – Difficulty reasoning about higher-level properties: limited visibility (a single basic block) constrains the classes of properties that can be reasoned about
  – Targets a single instruction set (usually x86)
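The sketch below models a code cache keyed by basic-block address. It roughly illustrates rewriting each block just before its first execution. Everything in it is a hypothetical toy. Blocks are small integers, `translate_block` only counts how often the rewriter runs, and `CACHE_SIZE` is arbitrary. A real dynamic rewriter would instead disassemble the block and emit instrumented machine code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 64   /* hypothetical: number of distinct block addresses modelled */

/* One cache entry: whether the block at this address has been rewritten yet. */
typedef struct {
    bool translated;
    int  translated_id;   /* stands in for a pointer to the rewritten code */
} CacheEntry;

static CacheEntry cache[CACHE_SIZE];
static int translations = 0;   /* how many times the (expensive) rewriter ran */

/* Hypothetical rewriter: a real system would disassemble the block,
   insert instrumentation, and emit new code into the code cache. */
static int translate_block(size_t addr) {
    translations++;
    return (int)addr * 100;   /* fake handle for the rewritten block */
}

/* Called whenever control reaches a basic block: rewrite it on first
   execution only, then reuse the cached translation. */
static int dispatch(size_t addr) {
    assert(addr < CACHE_SIZE);
    CacheEntry *e = &cache[addr];
    if (!e->translated) {
        e->translated_id = translate_block(addr);
        e->translated = true;
    }
    return e->translated_id;
}

/* Simulate an execution that visits blocks in the given order. */
static void run_trace(const size_t *blocks, size_t n) {
    for (size_t i = 0; i < n; i++)
        (void)dispatch(blocks[i]);
}
```

Consider a loop that revisits blocks 1 and 2: it triggers only one translation per distinct block. Note that `translate_block` sees a single block at a time. This mirrors the limited visibility listed among the drawbacks.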

Our Approach

• Develop novel methods based on static analysis to overcome the drawbacks of today's techniques
• Many research challenges:
  – Robust and scalable static analysis of low-level code produced by different compilers (or hand-written assembly)
  – Accurate disassembly of binary code: indirect control-flow transfers, non-standard call/return conventions, mingling of data and code, …
  – Accurate reasoning about key properties: dynamic taint analysis

Robust and Scalable Static Analysis of Low-Level Code

Static Analysis of Low-Level Code

• Scalability relies on modularity
  – Analyze functions individually, then compose the results
  – Avoids repeated analysis of the same code (especially libraries)
• Strength comes from accurate treatment of local variables
• Challenges in low-level binary code
  – Difficult to identify parameter passing in optimized code: missing pushes, parameters passed in registers, …
  – Difficult to distinguish local-variable accesses from other memory accesses: caller/callee-saved registers, stack-pointer conventions, …

Static Analysis of Low-Level Code (contd)

• To solve these challenges, previous approaches
  – make optimistic assumptions or rely on compiler idioms
  – often fail on optimized code and/or large programs
  – don't work for other compilers or for hand-written assembly
• Our solution: develop a new static analysis that
  – uses systematic analysis instead of assumptions or heuristics about parameters, passing conventions, caller/callee-saved registers, …
  – verifies the assumptions it does need to make: preservation of the stack pointer across calls, whether a return goes back to the caller, etc.
  – accurately tracks local variables by analyzing the values held in registers and on the stack

Stack Analysis

• Identify well-formed functions
• Associate each with its scope and activation record
• No assumptions about
  – parameters and return values
  – caller- and callee-saved registers
  – use of base pointers

[Figure: activation record of ƒ, with ESP pointing to the return address]

Abstract Interpretation for Stack Analysis

Each register and stack slot is mapped to an abstract value of the form base + [lo, hi]. The base is symbolic: Base_SP is the stack pointer on entry and Base_BP is the base pointer on entry. The bracket is an interval of byte offsets from that base. Analyzing the prologue of <ƒ>:

```asm
push %ebp
mov  %esp, %ebp
sub  $16, %esp
```

– On entry: ESP = Base_SP + [0,0], EBP = Base_BP + [0,0]
– After push %ebp: ESP = Base_SP + [-4,-4], and the activation-record slot at offset -4 holds Base_BP + [0,0]
– After mov %esp, %ebp: EBP = Base_SP + [-4,-4]
– After sub $16, %esp: ESP = Base_SP + [-20,-20]

[Figure: lattice values and activation record of <ƒ> after each instruction]
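The sketch below expresses this abstract domain in C: a symbolic base plus an offset interval, with transfer functions for the three prologue instructions. It is a minimal sketch under stated assumptions. The names (`AbsVal`, `State`, `push_ebp`, `join`, and so on) are hypothetical, and only stores to an exact Base_SP-relative address are tracked. A sound analysis would also have to invalidate slots on stores to imprecise addresses.

```c
#include <assert.h>
#include <stdbool.h>

/* Symbolic bases used on the slides; TOP means "unknown". */
typedef enum { BASE_SP, BASE_BP, TOP } Base;

/* Abstract value: base + [lo, hi], an interval of byte offsets. */
typedef struct { Base base; long lo, hi; } AbsVal;

static AbsVal mk(Base b, long lo, long hi) { AbsVal v = {b, lo, hi}; return v; }
static const AbsVal TOP_VAL = {TOP, 0, 0};

/* Adding a constant shifts the whole interval. */
static AbsVal add_const(AbsVal v, long c) {
    if (v.base == TOP) return v;
    return mk(v.base, v.lo + c, v.hi + c);
}

/* Lattice join: same base -> interval hull; different bases -> unknown. */
static AbsVal join(AbsVal a, AbsVal b) {
    if (a.base != b.base || a.base == TOP) return TOP_VAL;
    return mk(a.base, a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi);
}

/* Hypothetical fixed-size model: slot[k] is the cell at Base_SP - k. */
#define NSLOTS 64
typedef struct {
    AbsVal esp, ebp;
    AbsVal slot[NSLOTS];
    bool   slot_set[NSLOTS];
} State;

static State entry_state(void) {
    State s = {0};
    s.esp = mk(BASE_SP, 0, 0);
    s.ebp = mk(BASE_BP, 0, 0);
    return s;
}

/* push %ebp: ESP -= 4, then store EBP at the new top of stack
   (only when that address is an exact Base_SP offset). */
static void push_ebp(State *s) {
    s->esp = add_const(s->esp, -4);
    if (s->esp.base == BASE_SP && s->esp.lo == s->esp.hi) {
        long k = -s->esp.lo;
        assert(k >= 0 && k < NSLOTS);
        s->slot[k] = s->ebp;
        s->slot_set[k] = true;
    }
}

static void mov_esp_ebp(State *s) { s->ebp = s->esp; }                 /* mov %esp, %ebp */
static void sub_esp(State *s, long c) { s->esp = add_const(s->esp, -c); } /* sub $c, %esp */
```

Running `push_ebp`, `mov_esp_ebp`, and `sub_esp(&s, 16)` from `entry_state()` reproduces the values listed above. Those values are ESP = Base_SP + [-20,-20], EBP = Base_SP + [-4,-4], and Base_BP + [0,0] saved at offset -4.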

Stack Analysis (contd)

```asm
f:
    push %ebp
    mov  %esp, %ebp
    sub  $16, %esp
    mov  8(%ebp), %eax
    add  $3, %eax
    mov  %eax, 8(%ebp)
    movl $7, -12(%ebp)
    mov  12(%ebp), %edx
    mov  %edx, -8(%ebp)
    leave
    ret
```

[Figure: caller frame (args) and callee frame (locals) of <f>, annotated with abstract values.
– In the body: EBP = Base_SP + [-4,-4] and ESP = Base_SP + [-20,-20].
– EAX = arg1 + [3,3] and EDX = arg2 + [0,0].
– The first argument slot holds arg1 + [3,3], the local at -12(%ebp) holds 7, and the local at -8(%ebp) holds arg2 + [0,0].
– After leave: EBP = Base_BP + [0,0] and ESP = Base_SP + [0,0].]

Function Summaries from Stack Analysis

• Summaries record:
  – the change in ESP as a result of executing the function
  – the number of incoming parameters
  – the changes in registers and parameters as a result of executing the function
• For function <f>:
  – ESP unchanged
  – 2 incoming arguments
  – EAX, EDX, and the first parameter change as shown before; other registers and parameters are unchanged
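Such a summary can be reused at call sites as a per-location table of the form "new value = old value of some location + constant offset". The sketch below is hypothetical throughout. The reduced location set, the `Summary` layout, and the use of concrete numbers instead of abstract values are illustration choices, not the talk's data structures. The entries for <f> are transcribed from the summary above.

```c
#include <assert.h>

/* Locations a summary talks about (a hypothetical, reduced set). */
typedef enum { L_EAX, L_EDX, L_EBX, L_ARG1, L_ARG2, NLOC } Loc;

/* Effect on one location: new value = old value of `src` + `offset`. */
typedef struct { Loc src; long offset; } Effect;

typedef struct {
    long   esp_delta;      /* change in ESP across the call */
    int    n_params;       /* number of incoming stack parameters */
    Effect effect[NLOC];   /* post-state of each location */
} Summary;

/* Summary of <f>: EAX = arg1 + 3, EDX = arg2, arg1 += 3, rest unchanged. */
static Summary summary_of_f(void) {
    Summary s;
    s.esp_delta = 0;
    s.n_params = 2;
    for (int l = 0; l < NLOC; l++) {          /* default: location unchanged */
        s.effect[l].src = (Loc)l;
        s.effect[l].offset = 0;
    }
    s.effect[L_EAX]  = (Effect){L_ARG1, 3};
    s.effect[L_EDX]  = (Effect){L_ARG2, 0};
    s.effect[L_ARG1] = (Effect){L_ARG1, 3};
    return s;
}

/* Apply a summary at a call site instead of re-analyzing the callee.
   Every effect reads the pre-call values, so update from a copy. */
static void apply_summary(const Summary *s, long vals[NLOC], long *esp) {
    long before[NLOC];
    for (int l = 0; l < NLOC; l++) before[l] = vals[l];
    for (int l = 0; l < NLOC; l++)
        vals[l] = before[s->effect[l].src] + s->effect[l].offset;
    *esp += s->esp_delta;
}
```

Because every effect reads the pre-call state, the order in which locations are updated does not matter. This is what makes a summary composable at each call site.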

Analysis Time

[Figure: analysis time (seconds) versus program size (K instructions), with data points labeled XMMS and Apache]

Static Disassembly of Binary Code

Background: Disassembly Techniques

• Linear sweep algorithm
  – Start at the program entry point and disassemble instructions sequentially
  – Key assumption: all instructions appear one after the next, without any gaps
  – Violated in most code (presence of data or padding)
• Recursive traversal algorithm
  – After a control-flow transfer instruction (CTI), proceed to disassemble the target address
  – For conditional CTIs and non-CTIs, proceed to disassemble the next instruction
  – Key problems: code reached only through indirect CTIs; functions that don't return in the usual way
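The difference between the two algorithms can be seen on a hypothetical toy instruction set (not x86). In the example, a data byte sits right after an unconditional jump. Linear sweep decodes that data byte as a jump, which swallows the real instruction that follows. Recursive traversal only follows fall-through edges and direct targets. Indirect CTIs, the key problem named above, are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy ISA: 0x01 NOP; 0x02 t = JMP t; 0x03 t = JCC t
   (conditional); 0x04 RET. Targets are absolute one-byte addresses. */
enum { NOP = 0x01, JMP = 0x02, JCC = 0x03, RET = 0x04 };

/* Length of the instruction at pc, or 0 if the byte is not a valid opcode
   or the instruction would run past the end of the buffer. */
static size_t insn_len(const unsigned char *code, size_t n, size_t pc) {
    size_t len;
    switch (code[pc]) {
    case NOP: case RET: len = 1; break;
    case JMP: case JCC: len = 2; break;
    default: return 0;
    }
    return pc + len <= n ? len : 0;
}

/* Linear sweep: decode back-to-back from `start`, assuming no gaps. */
static void linear_sweep(const unsigned char *code, size_t n, size_t start, bool *insn) {
    size_t pc = start;
    while (pc < n) {
        size_t len = insn_len(code, n, pc);
        if (len == 0) { pc++; continue; }   /* undecodable byte: skip it */
        insn[pc] = true;
        pc += len;
    }
}

/* Recursive traversal: follow fall-through and direct CTI targets from `entry`. */
static void recursive_traversal(const unsigned char *code, size_t n, size_t entry, bool *insn) {
    size_t work[256];
    size_t top = 0;
    assert(n <= 256);                        /* bounds the worklist below */
    work[top++] = entry;
    while (top > 0) {
        size_t pc = work[--top];
        while (pc < n && !insn[pc]) {
            size_t len = insn_len(code, n, pc);
            if (len == 0) break;             /* not decodable: stop this path */
            insn[pc] = true;
            unsigned char op = code[pc];
            if (op == JMP || op == JCC) {
                assert(top < 256);
                work[top++] = code[pc + 1];  /* direct target */
            }
            if (op == JMP || op == RET) break;   /* no fall-through */
            pc += len;
        }
    }
}
```

For the bytes `JMP 3 | 0x02 (data) | NOP | RET`, linear sweep marks instructions at offsets 0, 2, and 4. It misses the NOP at 3. Recursive traversal marks 0, 3, and 4.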

Our Approach for Disassembly

• Assumption: no code obfuscation
• Non-assumptions:
  – function prologue and epilogue patterns
  – compiler idioms or (lack of) optimizations
• Approach:
  – Use recursive traversal
  – Use stack analysis to compute/verify return targets
  – Develop new analysis techniques to determine the targets of indirect control-flow transfers

Our Approach: Type Inference

• Key insight: code pointer values don't undergo arithmetic or other transformations
  – Implication: values assigned to code pointers must represent indirect CTI targets
• Achieves much better results than data-flow analysis
  – Avoids the global def-use problem, which is very hard to solve for low-level languages
• Compute a set $\overline{C}$ of possible code addresses and a set $\underline{C}$ of definite code addresses
  – Code at addresses in $\underline{C}$ can be safely disassembled
  – Code at addresses not in $\overline{C}$ can be safely relocated
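The type-inference idea can be sketched as unification over a toy IR, with three rules:
– a copy forces two locations to share a type
– an indirect call marks its operand's type as code pointer
– arithmetic marks a type as data

Constants assigned to code-pointer-typed locations then form a candidate set of indirect CTI targets, in the spirit of the possible-code-address set. This is a hypothetical, flow-insensitive simplification. The IR, `infer_code_targets`, and the conflict rule are all assumptions. Real binaries reuse registers for values of different types, which a practical analysis must handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVARS 16   /* hypothetical: registers/memory cells in the toy IR */

/* MOVI: dst = imm;  MOV: dst = src;  ADDI: dst += imm;  ICALL: call *src */
typedef enum { MOVI, MOV, ADDI, ICALL } Op;
typedef struct { Op op; int dst, src; long imm; } Insn;

static int parent[NVARS];   /* union-find over type variables */

static int find(int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];   /* path halving */
        v = parent[v];
    }
    return v;
}

static void unite(int a, int b) {
    int ra = find(a), rb = find(b);
    if (ra != rb) parent[ra] = rb;
}

/* Collect constants that flow, by copying only, into a location used as an
   indirect-call target. Returns how many were written to `targets`, or -1
   if a code-pointer type also undergoes arithmetic (a type conflict). */
static int infer_code_targets(const Insn *prog, size_t n, long *targets, int max) {
    bool is_code[NVARS] = {false}, is_arith[NVARS] = {false};
    for (int v = 0; v < NVARS; v++) parent[v] = v;

    for (size_t i = 0; i < n; i++)            /* copies unify types */
        if (prog[i].op == MOV) unite(prog[i].dst, prog[i].src);

    for (size_t i = 0; i < n; i++) {          /* uses constrain types */
        if (prog[i].op == ICALL) is_code[find(prog[i].src)] = true;
        if (prog[i].op == ADDI)  is_arith[find(prog[i].dst)] = true;
    }
    for (int v = 0; v < NVARS; v++)
        if (is_code[v] && is_arith[v]) return -1;

    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == MOVI && is_code[find(prog[i].dst)]) {
            assert(count < max);
            targets[count++] = prog[i].imm;
        }
    }
    return count;
}
```

As a usage example, take `r1 = 0x401000; r2 = r1; call *r2; r3 = 8; r3 += 4`. The sketch reports exactly one candidate target, 0x401000. The constant 8 is classified as data because its location undergoes arithmetic.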

Static Disassembly: Preliminary Results

Analysis of the disassembler on the 'ls' binary:

| Analysis | Disassembled code | Reachable code not disassembled |
|---|---|---|
| Recursive traversal | 2.7% | 85% |
| Compiler idioms and heuristics | 87% | 1% |
| Function pointer analysis | 88% | 0% |

Static Disassembly: Preliminary Results (contd)

The gap in dhclient is due to an incomplete implementation of the handling of global arrays.

| Application | Size (KB) | Disassembled code | Reachable code not disassembled |
|---|---|---|---|
| pdftops | 14 | 97% | 0% |
| chroot | 26 | 85% | 0% |
| chmod | 39 | 87% | 0% |
| cat | 43 | 92% | 0% |
| ls | 96 | 88% | 0% |
| dhclient | 411 | 81% | 4% |

DTA++: Improving the Accuracy of Dynamic Taint Analysis

Under-tainting and Over-tainting

Results vary based on which values are considered to depend on others:

• Too few dependencies lead to under-tainting
• Too many dependencies lead to over-tainting

Basic Idea

• Data dependencies
  – Taint propagates from the operands to the output of an operation
• Control dependencies
  – Variables assigned within a conditional branch receive taint from the operands of the condition
  – Commonly omitted in DTA, leading to under-tainting
• Key idea in DTA++: propagate taint only for those control dependencies that would otherwise cause under-tainting (culprit implicit flows)
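The two kinds of propagation rules can be sketched in a few lines of C. This minimal sketch assumes taint labels are bitmasks over input bytes, which is a hypothetical encoding. `taint_binop` is the data-dependency rule. `taint_assign` adds the branch condition's taint only for branches flagged as culprits. Passing `true` for every tainted branch would correspond roughly to indiscriminate control-flow propagation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Taint label: bit i set means "depends on input byte i" (hypothetical encoding). */
typedef uint32_t Taint;

/* Data dependency: the result of an operation is tainted by all its operands. */
static Taint taint_binop(Taint a, Taint b) { return a | b; }

/* Assignment inside a conditional branch. Conventional DTA keeps only the
   data taint; a DTA++-style rule also adds the condition's taint, but only
   when the branch has been identified as a culprit implicit flow. */
static Taint taint_assign(Taint data, Taint cond, bool branch_is_culprit) {
    return branch_is_culprit ? (data | cond) : data;
}
```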

Intuition: Information Flow

Under-tainting occurs when the control-flow state represents (almost) all of the information in the inputs.

```c
#include <stdio.h>
/* next_in() is not defined on the slide; assumed here to read one input character. */
static char next_in(void) { return (char)getchar(); }
int main(void) {
    char output[256];
    char input = next_in();
    long len = 0;
    if (input == '{') {      /* only the branch outcome depends on input */
        output[0] = '\\';    /* constants: conventional DTA leaves these untainted */
        output[1] = '{';
        len = 2;
    }
    fwrite(output, 1, (size_t)len, stdout);
    return 0;
}
```
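The under-tainting in this example becomes concrete when the code runs with a shadow taint flag per variable. The sketch below is an illustrative model, not DTA++ itself. `run_example`, the `Result` shadow layout, and the single hard-coded culprit rule are assumptions. With plain data-flow rules, `output[0]`, `output[1]`, and `len` end up untainted. Yet together they reveal whether the input was '{'.

```c
#include <assert.h>
#include <stdbool.h>

/* Program state plus one shadow taint flag per variable/byte. */
typedef struct {
    char output[256];
    bool output_t[256];
    long len;
    bool len_t;
} Result;

/* Run the example on a tainted `input`. propagate_control == false models
   conventional DTA; true applies a DTA++-style rule for the culprit
   branch `input == '{'`. */
static Result run_example(char input, bool propagate_control) {
    Result r = {0};
    bool input_t = true;           /* the input character is tainted */
    bool cond_t = input_t;         /* taint of the condition `input == '{'` */
    if (input == '{') {
        bool extra = propagate_control && cond_t;
        r.output[0] = '\\'; r.output_t[0] = extra;   /* constants carry no data taint */
        r.output[1] = '{';  r.output_t[1] = extra;
        r.len = 2;          r.len_t = extra;
    }
    return r;
}
```

The sketch taints only assignments on the taken path. Effects of the untaken path are outside its scope.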

Offline Rule Generation

• Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
• Approach: find these locations in advance, and construct new taint propagation rules from them
• Assumption: we are given test inputs that demonstrate the under-tainting

Architecture Overview

[Figure, offline analysis: sample tainted input → conventional DTA (with extra propagation) → execution trace → under-tainting diagnosis → implicit-flow branches → rule generation → DTA++ propagation rules (correct propagation information).]

[Figure, general use: general tainted input → conventional DTA plus extra propagation using the DTA++ rules → trace (or other analysis).]

Under-tainting Detection Predicate

• Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
• Implementation: count how many other inputs could take the same execution path as t (using symbolic execution)
  – Few or none → φ(t) = true
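For a single-byte input, the detection predicate can be approximated by brute force. Enumerate all 256 values and count those that satisfy every recorded branch outcome. Exhaustive enumeration here stands in for the symbolic execution mentioned above. The `Branch` representation and the `threshold` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One recorded branch over a single input byte x: compare x with k
   using `op`; `taken` is the outcome observed in the trace. */
typedef enum { CMP_EQ, CMP_LT } CmpOp;
typedef struct { CmpOp op; unsigned char k; bool taken; } Branch;

static bool branch_holds(const Branch *b, unsigned x) {
    bool cond = (b->op == CMP_EQ) ? (x == b->k) : (x < b->k);
    return cond == b->taken;
}

/* Number of byte values that follow the same path (brute force in place
   of symbolic execution). */
static unsigned count_same_path(const Branch *path, size_t n) {
    unsigned count = 0;
    for (unsigned x = 0; x < 256; x++) {
        bool ok = true;
        for (size_t i = 0; i < n && ok; i++) ok = branch_holds(&path[i], x);
        count += ok;
    }
    return count;
}

/* phi(t): few or no *other* inputs take this path. The traced input itself
   always follows it, hence the c - 1. `threshold` is a hypothetical knob. */
static bool phi(const Branch *path, size_t n, unsigned threshold) {
    unsigned c = count_same_path(path, n);
    assert(c >= 1);
    return c - 1 <= threshold;
}
```

For the slide's `input == '{'` branch taken, only one byte value follows the path, so φ holds. The same branch not taken admits 255 values, so it does not flag an implicit flow.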

Search for Culprit Branches

• Search through prefixes of a trace to find the shortest one satisfying φ: the last instruction in that prefix is the culprit
• To minimize calls to φ, use binary search
• After finding one culprit, remove it and repeat the search to find others
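Binary search over prefix lengths works if φ is monotone in the prefix. That is, once a prefix contains a culprit implicit flow, every longer prefix does too. This monotonicity is an assumption of the sketch. `find_culprit`, `find_all_culprits`, and the toy predicate `demo_phi` are hypothetical names. A real φ would run the path-counting check on the trace prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* phi over the prefix of length `len` (instructions 0..len-1), ignoring
   instructions marked in `removed`. Assumed monotone in `len`. */
typedef bool (*PrefixPred)(size_t len, const bool *removed, void *ctx);

/* Binary search for the shortest prefix satisfying phi; its last
   instruction is the culprit. Returns -1 if the full trace does not satisfy phi. */
static long find_culprit(size_t n, const bool *removed, PrefixPred phi, void *ctx) {
    if (n == 0 || !phi(n, removed, ctx)) return -1;
    size_t lo = 1, hi = n;                 /* shortest length lies in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (phi(mid, removed, ctx)) hi = mid; else lo = mid + 1;
    }
    return (long)lo - 1;
}

/* Find culprits one at a time, removing each before searching again. */
static size_t find_all_culprits(size_t n, bool *removed, PrefixPred phi, void *ctx,
                                size_t *out, size_t max) {
    size_t k = 0;
    long c;
    while (k < max && (c = find_culprit(n, removed, phi, ctx)) >= 0) {
        out[k++] = (size_t)c;
        removed[c] = true;
    }
    return k;
}

/* Toy predicate for demonstration: a prefix "contains a culprit implicit
   flow" iff a not-yet-removed index listed in ctx lies inside it. */
typedef struct { const size_t *culprits; size_t count; size_t calls; } DemoCtx;

static bool demo_phi(size_t len, const bool *removed, void *ctx) {
    DemoCtx *d = ctx;
    d->calls++;
    for (size_t i = 0; i < d->count; i++)
        if (d->culprits[i] < len && !removed[d->culprits[i]]) return true;
    return false;
}
```

Consider a 1,000-instruction trace with culprits at indices 42 and 700. The sketch finds 42 first, then 700, using at most 23 calls to φ in total rather than one call per prefix.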

Experiment Setup

• Subject programs: 8 Windows word-processing applications in binary form
• Input: tainted plain text from a virtual keyboard
• Convert and save the text in RTF or HTML
  – RTF: “Taint it: {” → “Taint it: \{”
  – HTML: “Taint it: <” → “Taint it: &lt;”

Results: Performance

| Program, output format | # of culprit implicit flows detected & fixed | Time for diagnosis |
|---|---|---|
| WordPad, RTF | 1 | 0.26 s |
| MS Word 2003, RTF | 24 | 31 min 5.26 s |
| AbiWord, HTML | 1 | 14.29 s |
| AngelWriter, HTML | 3 | 0.63 s |
| AurelEdit, RTF | 1 | 0.76 s |
| VNU Editor, RTF | 1 | 0.34 s |
| IntelliEdit, RTF | 1 | 0.40 s |
| CryptEdit, RTF | 1 | 0.23 s |

Measuring Over-tainting

• After saving the file, count the number of tainted bytes in system memory
  – Tainted branches were also counted (see the paper)
• Four levels of propagation:
  – Original: vanilla DTA (has under-tainting)
  – Optimal: fix a single instruction manually
  – DTA++: targeted control-flow propagation
  – DYTAN*: indiscriminate control-flow propagation (similar to Clause et al.)


Over-tainting Measurements


Questions?

Related Work

• IDA Pro
• VSA
• NaCl
• TIE
• BIRD

