Reversing, better William Whistler REcon 2010
Page 1: William Whistler REcon 2010.  Who am I  Long-time reverser  Studied Computer Science at Oxford  Enjoys reversing challenges ▪ T2 2007 winner, etc.

Reversing, better

William Whistler

REcon 2010

Introduction

Who am I
▪ Long-time reverser
▪ Studied Computer Science at Oxford
▪ Enjoys reversing challenges
  ▪ T2 2007 winner, etc.

Purpose of this presentation
▪ To describe the concepts behind a new application for reversing called REvealer

Introduction

Let’s model things like ‘real’ engineers
▪ Allows us to ask the model questions
▪ Include everything we discover
  ▪ Both through manual and automated techniques

Gradually move to higher-level concepts
▪ But still allow drilling down to the exact detail

Dynamic Taint Analysis

Step through, instruction by instruction

If the input to something is tainted, the output is tainted

{define eax as tainted}

mov ebx, eax

{tainted: eax, ebx}

add edx, ecx

{tainted: eax, ebx}

mov eax, edx

{tainted: ebx}

add edx, ebx

{tainted: ebx, edx}
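
The propagation rule above can be sketched in a few lines of Python. This is only an illustration, not REvealer's implementation: the `(mnemonic, destination, source)` tuple format and the `step` and `run` helpers are assumptions made for the example, and only register-to-register `mov` and `add` are modelled.

```python
# Hypothetical instruction model: (mnemonic, destination, source).
TRACE = [
    ("mov", "ebx", "eax"),
    ("add", "edx", "ecx"),
    ("mov", "eax", "edx"),
    ("add", "edx", "ebx"),
]

def step(tainted, instr):
    """Return the tainted set after executing one instruction."""
    op, dst, src = instr
    result = set(tainted)
    if op == "mov":
        # mov overwrites dst, so dst takes exactly the taint of src.
        result.discard(dst)
        if src in tainted:
            result.add(dst)
    elif op == "add":
        # add reads both operands: dst keeps its taint and gains src's.
        if src in tainted:
            result.add(dst)
    else:
        raise ValueError(f"unmodelled instruction: {op}")
    return result

def run(trace, initial):
    """Return the tainted set before the trace and after each instruction."""
    states = [set(initial)]
    for instr in trace:
        states.append(step(states[-1], instr))
    return states

states = run(TRACE, {"eax"})
for instr, state in zip(TRACE, states[1:]):
    print("{} {}, {}".format(*instr), "->", sorted(state))
```

The printed sets follow the annotations on the slide: `mov eax, edx` clears the taint on eax because its old value is overwritten by an untainted one.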

Dynamic Taint Analysis

Used by vulnerability researchers to find where data from the network is used

But useful in other reversing contexts too


Dynamic Taint Analysis

Dynamic, so deals with one execution path
▪ No problems with loops, etc.

Exact memory addresses are available at use

mov eax, dword ptr [esi+ecx]

▪ Just look up esi and ecx to get the exact memory address being read

Some complications, but still very useful

Multiple Taint Sources

More than one taint source

Analysis done simultaneously

Locations can be tainted by more than one source

{eax tainted red} {ecx tainted green}

mov ebx, eax

{red: eax, ebx} {green: ecx}

add edx, ecx

{red: eax, ebx} {green: ecx, edx}

mov eax, edx

{red: ebx} {green: eax, ecx, edx}

add edx, ebx

{red: ebx, edx} {green: eax, ecx, edx}
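
Tracking several sources at once only requires each location to carry a set of labels instead of a single flag. The sketch below assumes the same hypothetical `(mnemonic, destination, source)` tuples as a stand-in for real decoded instructions; `step` and `by_source` are illustrative names, not part of any tool described in the talk.

```python
def step(taint, instr):
    """taint maps a location to the frozenset of source labels reaching it."""
    op, dst, src = instr
    src_labels = taint.get(src, frozenset())
    new = dict(taint)
    if op == "mov":
        # dst is overwritten: it now carries exactly the labels of src.
        new[dst] = src_labels
    elif op == "add":
        # dst depends on its old value and on src: union the labels.
        new[dst] = taint.get(dst, frozenset()) | src_labels
    else:
        raise ValueError(f"unmodelled instruction: {op}")
    return new

def by_source(taint):
    """Invert the map: source label -> sorted list of tainted locations."""
    out = {}
    for location, labels in taint.items():
        for label in labels:
            out.setdefault(label, []).append(location)
    return {label: sorted(locations) for label, locations in out.items()}

TRACE = [
    ("mov", "ebx", "eax"),
    ("add", "edx", "ecx"),
    ("mov", "eax", "edx"),
    ("add", "edx", "ebx"),
]

taint = {"eax": frozenset({"red"}), "ecx": frozenset({"green"})}
for instr in TRACE:
    taint = step(taint, instr)
result = by_source(taint)
print(result)
```

After the last instruction edx carries both labels, which is the "tainted by more than one source" case from the slide.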

Multiple Taint Sources

Let’s define every external input to our code as a taint source

CPU timestamp, file reads, network buffers, API results…

Useful to see where values have come from when looking at code for the first time

Static Taint Analysis

We want to deal with more than one particular run through

Ideally all possible!

So we go static instead

But now we have some problems…

Static Taint Analysis

What does it access?
▪ Indirect memory reads/writes
  ▪ mov eax, dword ptr [esi+ecx]

Where does it go?
▪ Conditional jumps
  ▪ jz
▪ Indirect jumps/calls
  ▪ jmp eax
▪ Self-modifying code
▪ Loops
▪ Exceptions

Path Explosion Problem

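100: test eax, eax
101: jz 103
102: …
103: test ecx, ecx
104: jz 106
105: …
106: test ebx, ebx
107: jz 109
108: …
109: …

[Diagram: tree of execution paths. Block 100-101 forks to 103-104 (jump taken) or 102-103-104 (fall through); each of those forks again to 106-107 or 105-106-107, so the number of paths doubles at every conditional jump.]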

Paths

Recombining paths
▪ “OR” taint status of each location?
▪ Store different possibilities somehow?

Let’s go with the former for now.

Which ones to recombine?
▪ Obvious choice: by EIP?

Paths

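100: test eax, eax
101: jz 103
102: …
103: test ecx, ecx
104: jz 106
105: …
106: test ebx, ebx
107: jz 109
108: …
109: …

[Diagram: the same code with paths recombined by EIP. Block 100-101 leads to 102 or directly to 103-104; 102 rejoins at 103-104; 103-104 leads to 105 or directly to 106-107; 105 rejoins at 106-107.]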
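
Recombining by EIP with an "OR" of the taint status amounts to a standard worklist analysis that takes the union of tainted sets wherever paths meet. The sketch below is a minimal version under stated assumptions: the instructions at 102, 105 and 108 are invented stand-ins for the "…" placeholders on the slide, flags are not modelled, and `analyse`, `transfer` and `successors` are hypothetical helper names.

```python
PROGRAM = {
    100: ("test", "eax", "eax"),
    101: ("jz", 103),
    102: ("mov", "ebx", "eax"),  # assumed stand-in for the slide's "…"
    103: ("test", "ecx", "ecx"),
    104: ("jz", 106),
    105: ("mov", "edx", "ebx"),  # assumed stand-in for the slide's "…"
    106: ("test", "ebx", "ebx"),
    107: ("jz", 109),
    108: ("mov", "eax", "ecx"),  # assumed stand-in for the slide's "…"
    109: ("end",),
}

def successors(addr, instr):
    if instr[0] == "jz":
        return [instr[1], addr + 1]  # jump taken, fall through
    if instr[0] == "end":
        return []
    return [addr + 1]

def transfer(tainted, instr):
    """Taint effect of one instruction (flags are ignored in this sketch)."""
    if instr[0] == "mov":
        _, dst, src = instr
        without_dst = tainted - {dst}
        return without_dst | {dst} if src in tainted else without_dst
    return tainted

def analyse(program, entry, initial):
    """Return, per address, the locations that may be tainted on entry."""
    state_in = {entry: frozenset(initial)}
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        out = transfer(state_in[addr], program[addr])
        for nxt in successors(addr, program[addr]):
            # Paths meeting at the same EIP are merged by OR (set union).
            merged = state_in.get(nxt, frozenset()) | out
            if merged != state_in.get(nxt):
                state_in[nxt] = merged
                worklist.append(nxt)
    return state_in

result = analyse(PROGRAM, 100, {"eax"})
for addr in sorted(result):
    print(addr, sorted(result[addr]))
```

Each address is revisited only when its merged set grows, so the cost is bounded by the number of addresses and locations rather than by the eight distinct paths through the three branches. The price is precision: at 109 eax is reported as possibly tainted even though the path through 108 overwrites it.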

Paths

But why be so strict?
▪ Be user-guided, not fully automated

When is keeping paths separate useful?
▪ Self-modifying code
▪ Loop unrolling
▪ Obfuscations
  ▪ Intra-instruction jumps

Storing results

We want to ask what’s tainted at any point the user chooses, not just termination

Repeating the whole analysis is slow

Let’s store what we learn along the way
▪ Store all tainted sets at each point?
▪ Store differences?
▪ Store effects?

Storing by difference

Store what changes for each taint source after each instruction

Query at any point by working back up

Can either query for the full set or a subset of locations

{eax tainted red} {ecx tainted green}

mov ebx, eax

{red: +ebx} {green: no change}

add edx, ecx

{red: no change} {green: +edx}

mov eax, edx

{red: -eax} {green: +eax}

add edx, ebx

{red: +edx} {green: no change}
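
A query against difference records can be answered by walking backwards from the chosen point until every requested location has been decided. The sketch below encodes the deltas from this slide; the `(added, removed)` pair layout and the `query` function are assumptions made for illustration.

```python
INITIAL = {"red": {"eax"}, "green": {"ecx"}}

# One record per instruction: source label -> (added, removed).
DELTAS = [
    ("mov ebx, eax", {"red": ({"ebx"}, set())}),
    ("add edx, ecx", {"green": ({"edx"}, set())}),
    ("mov eax, edx", {"red": (set(), {"eax"}), "green": ({"eax"}, set())}),
    ("add edx, ebx", {"red": ({"edx"}, set())}),
]

def query(source, after, locations=None):
    """Locations tainted by `source` after the first `after` instructions.

    With `locations` given, only those are resolved and the backwards
    walk stops as soon as all of them have been decided.
    """
    tainted, decided = set(), set()
    for _, delta in reversed(DELTAS[:after]):
        added, removed = delta.get(source, (set(), set()))
        # The most recent change to a location wins; older ones are ignored.
        tainted |= added - decided
        decided |= added | removed
        if locations is not None and locations <= decided:
            return tainted & locations
    tainted |= INITIAL.get(source, set()) - decided
    return tainted if locations is None else tainted & locations

print(sorted(query("red", 4)))
print(sorted(query("green", 4)))
print(sorted(query("red", 4, {"eax"})))
```

The subset query for eax stops at `mov eax, edx`, two records back, without touching the rest of the history.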

Storing by effects

mov ebx, eax
▪ If eax was tainted, ebx is now tainted
▪ If eax was untainted, ebx is now untainted

add ebx, eax
▪ If eax was tainted, ebx is now tainted
▪ If eax was untainted, ebx remains as it was

Flags updated in a similar way

Storing by effects

Uses the principle of “lazy evaluation”

If we only do specific queries, this is much more efficient

Allows us to consider intermediary locations as ‘internal’ taint sources
▪ e.g. ask what’s tainted by a function parameter
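
One way to read "storing effects" is that each instruction records which inputs each of its outputs depends on, and nothing is evaluated until a query arrives. The sketch below is an assumption-laden illustration of that idea: `EFFECTS`, `origins` and `tainted_by` are hypothetical names, and only `mov` and `add` between registers are covered. The optional `start` argument treats an intermediate point as the origin, which is what allows intermediary locations to act as internal taint sources.

```python
# Effect of an instruction: written location -> locations it depends on.
EFFECTS = {
    "mov": lambda dst, src: {dst: {src}},
    "add": lambda dst, src: {dst: {dst, src}},
}

TRACE = [
    ("mov", "ebx", "eax"),
    ("add", "edx", "ecx"),
    ("mov", "eax", "edx"),
    ("add", "edx", "ebx"),
]

def origins(trace, location, after, start=0):
    """Locations at point `start` that `location` at point `after` depends on."""
    wanted = {location}
    for op, dst, src in reversed(trace[start:after]):
        effect = EFFECTS[op](dst, src)
        resolved = set()
        for loc in wanted:
            # Locations the instruction did not write pass through unchanged.
            resolved |= effect.get(loc, {loc})
        wanted = resolved
    return wanted

def tainted_by(trace, location, after, initial):
    """Source labels reaching `location`, given label -> initial locations."""
    deps = origins(trace, location, after)
    return {label for label, locations in initial.items() if locations & deps}

sources = {"red": {"eax"}, "green": {"ecx"}}
print(sorted(origins(TRACE, "edx", 4)))
print(sorted(tainted_by(TRACE, "edx", 4, sources)))
print(sorted(origins(TRACE, "edx", 4, start=2)))
```

Only the locations relevant to the question are ever followed, which is the lazy-evaluation benefit the slide mentions for specific queries.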

Slicing

See what’s involved in calculating something, not just the taint sources used

Plenty of practical uses
▪ Where was this buffer encrypted?
▪ Where was this checksum calculated?

Can even hide the instructions we’re not interested in as a nice visualization

Slicing

mov eax, 100
mov ebx, 200
mov ecx, 300
mov edx, 400
add eax, ebx
add ecx, edx
add eax, ecx

Slice for eax (as an input to the final add):

mov eax, 100
mov ebx, 200
add eax, ebx

Slice for ecx (as an input to the final add):

mov ecx, 300
mov edx, 400
add ecx, edx
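
A backward slice over straight-line code can be computed by walking up from the point of interest and keeping every instruction that writes a location still being tracked. The sketch below uses the slide's seven instructions; the tuple encoding and the `reads` and `backward_slice` helpers are assumptions for illustration only.

```python
PROGRAM = [
    ("mov", "eax", 100),
    ("mov", "ebx", 200),
    ("mov", "ecx", 300),
    ("mov", "edx", 400),
    ("add", "eax", "ebx"),
    ("add", "ecx", "edx"),
    ("add", "eax", "ecx"),
]

def reads(instr):
    """Registers an instruction reads (immediates contribute nothing)."""
    op, dst, src = instr
    used = {src} if isinstance(src, str) else set()
    if op == "add":
        used.add(dst)  # add also reads its destination
    return used

def backward_slice(program, location, before):
    """Indices of instructions that contribute to `location` as it stands
    just before the instruction at index `before`."""
    wanted = {location}
    kept = []
    for index in range(before - 1, -1, -1):
        instr = program[index]
        dst = instr[1]
        if dst in wanted:
            kept.append(index)
            wanted.discard(dst)
            wanted |= reads(instr)
    return sorted(kept)

for register in ("eax", "ecx"):
    print(register)
    for index in backward_slice(PROGRAM, register, before=6):
        op, dst, src = PROGRAM[index]
        print(f"  {op} {dst}, {src}")
```

Slicing eax after the final add (`before=7`) keeps all seven instructions, since by then eax depends on both halves of the computation.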

Data flow

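mov eax, 100
mov ebx, 200
mov ecx, 300
mov edx, 400
add eax, ebx
add ecx, edx
add eax, ecx

[Diagram: data-flow graph for the code above, read from the final instruction back to its inputs]

add eax, ecx
  eax from: add eax, ebx
    eax from: mov eax, 100
    ebx from: mov ebx, 200
  ecx from: add ecx, edx
    ecx from: mov ecx, 300
    edx from: mov edx, 400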

Symbolic Execution

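mov eax, 100
mov ebx, 200
mov ecx, 300
mov edx, 400
add eax, ebx
add ecx, edx
add eax, ecx

[Diagram: the same data-flow graph, annotated with the value carried on each edge]

add eax, ecx gives eax=1000
  eax=300 from: add eax, ebx
    eax=100 from: mov eax, 100
    ebx=200 from: mov ebx, 200
  ecx=700 from: add ecx, edx
    ecx=300 from: mov ecx, 300
    edx=400 from: mov edx, 400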

Symbolic Execution

Why restrict ourselves to the “route” taken by information?

Emulate instructions when values are known

As well as storing the taint effects of each instruction, store an evaluation function too
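
An evaluation function per instruction can be as simple as "compute the result when every input is known, otherwise mark the output unknown". The sketch below illustrates that on the earlier seven-instruction example; the tuple encoding and `evaluate` are hypothetical, and 32-bit wraparound is the only x86 detail modelled.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

PROGRAM = [
    ("mov", "eax", 100),
    ("mov", "ebx", 200),
    ("mov", "ecx", 300),
    ("mov", "edx", 400),
    ("add", "eax", "ebx"),
    ("add", "ecx", "edx"),
    ("add", "eax", "ecx"),
]

def evaluate(program, known=None):
    """Emulate instructions whose inputs are known; return known values.

    A register missing from the result is unknown at the end.
    """
    values = dict(known or {})
    for op, dst, src in program:
        operand = src if isinstance(src, int) else values.get(src)
        if op == "mov":
            result = operand
        elif op == "add":
            current = values.get(dst)
            if current is None or operand is None:
                result = None  # an unknown input makes the output unknown
            else:
                result = (current + operand) & MASK
        else:
            raise ValueError(f"unmodelled instruction: {op}")
        if result is None:
            values.pop(dst, None)
        else:
            values[dst] = result
    return values

final = evaluate(PROGRAM)
print(final)

# esi is never assigned, so eax becomes unknown after the add.
partial = evaluate([("mov", "eax", 1), ("add", "eax", "esi")])
print(partial)
```

Dropping a register as soon as one input is unknown is the simplest policy; keeping a symbolic expression instead would be the natural next step, at the cost of more bookkeeping.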

Symbolic Execution

Allows us to deal with some (but not all) branches and indirect jumps/calls
▪ Opaque predicates

Similar to certain compiler optimizations (constant folding)

For merged paths, similar choice to before
▪ If the incoming values are different, just forget it
▪ Find something more complicated to keep all possibilities

Abstract Interpretation

Why restrict ourselves to exact values?

More limited information can still be useful

Many possibilities
▪ Even/odd
  ▪ Or other modular congruences
▪ Positive/negative
▪ Alphanumeric or not
▪ Ranges

Abstract Interpretation

Instructions can pass non-exact value information

Can be specialized for specific cases (as with the imul here)

{eax totally unknown}

imul eax, eax, 2

{eax is even}

inc eax

{eax is odd}
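
The multiply-by-two and increment steps on this slide correspond to a parity domain with three abstract values. The sketch below is a minimal illustration: the `("mul_const", reg, constant)` and `("inc", reg)` tuples are a made-up encoding, and any other instruction is conservatively treated as producing an unknown parity.

```python
EVEN, ODD, UNKNOWN = "even", "odd", "unknown"

def flip(parity):
    return {EVEN: ODD, ODD: EVEN}.get(parity, UNKNOWN)

def abstract_step(state, instr):
    """state maps a register to EVEN, ODD or UNKNOWN (missing = UNKNOWN)."""
    op, dst, *operands = instr
    current = state.get(dst, UNKNOWN)
    new = dict(state)
    if op == "mul_const":
        # Specialised rule: an even constant forces an even result whatever
        # the input was; an odd constant preserves the input's parity.
        # 32-bit wraparound does not matter because 2**32 is even.
        new[dst] = EVEN if operands[0] % 2 == 0 else current
    elif op == "inc":
        new[dst] = flip(current)
    else:
        new[dst] = UNKNOWN  # unmodelled instruction: assume nothing
    return new

state = {}
history = []
for instr in [("mul_const", "eax", 2), ("inc", "eax")]:
    state = abstract_step(state, instr)
    history.append(state["eax"])
print(history)
```

The specialised multiply rule is what turns a totally unknown eax into a known-even one; a generic rule of "unknown in, unknown out" would learn nothing here.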

Backwards Abstract Interpretation

Requires “inverse” evaluation functions for instructions
▪ Many will not be possible or only provide limited information

Allows the user to ask “what restrictions must hold in order for a value to match”
▪ Meeting the restrictions does not guarantee that the value matches (these are not weakest preconditions)
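
As a small illustration of inverse evaluation functions, the sketch below runs a target value backwards through a doubling step followed by an increment on 32-bit values. The operation names and the `inputs_for` helper are assumptions for the example. These two inverses happen to be exact; for many real instructions an inverse can only return a necessary condition, which is the limitation the slide describes.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def inverse_inc(target):
    """Inputs x with (x + 1) mod 2**32 == target."""
    return {(target - 1) & MASK}

def inverse_double(target):
    """Inputs x with (2 * x) mod 2**32 == target."""
    if target % 2:
        return set()  # doubling can never produce an odd value
    half = target >> 1
    # The top bit of x is shifted out, so two inputs give the same result.
    return {half, half | 0x80000000}

INVERSES = {"inc": inverse_inc, "double": inverse_double}
FORWARD = {
    "inc": lambda value: (value + 1) & MASK,
    "double": lambda value: (value * 2) & MASK,
}

def inputs_for(ops, target):
    """Walk the operations backwards, collecting every possible input."""
    candidates = {target}
    for op in reversed(ops):
        candidates = set().union(*(INVERSES[op](c) for c in candidates))
    return candidates

def run_forward(ops, value):
    for op in ops:
        value = FORWARD[op](value)
    return value

ops = ["double", "inc"]
print(sorted(hex(x) for x in inputs_for(ops, 7)))
print(inputs_for(ops, 8))
```

Asking for 8 returns an empty set: the result of doubling then incrementing is always odd, so no input can match.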

Path constraints

Every branch choice depends on values
▪ e.g. a jz will branch one way if the z flag is set, the other if not

We can spread this back up in the same way
▪ And use this information to improve the analysis

Allows path-specific “input crafting”
▪ “What must be true for this code to be reached”

Useful against control-flow obfuscation

Further links with compiler optimizations

More complex solving

So far, no use of constraint solvers or computer algebra systems

Every input was considered “independent” of other calculations

But that’s not always the case
▪ e.g. polynomial equations

So we can use the external systems to help
▪ But we can reduce the complexity of queries by using our analysis information

Abstraction

So far, no functions – entirely interprocedural

No assumptions about code structure used
▪ e.g. stack frames, functions always return, etc.

Haven’t been able to deal with APIs or loops

Simple solution: think of them as instructions

Abstraction: Functions

Bunch existing instructions together

Not just for labelling and visualizations; allows function-specific improvements

Perform costly optimizations only once

Allows custom inverse implementations
▪ e.g. MD5 cracking

Abstraction: APIs

Recognize the API based on call address
▪ Can use a symbolic base for modules

Define what we know about behaviour
▪ e.g. possible return values

External modelling
▪ e.g. for writing/reading pipes or atoms

Can check for anti-debug/emulation too
▪ Overwritten API code
▪ Using undefined OS behaviour

Abstraction: Loops

The loop becomes an instruction with all potentially read locations as inputs and all potentially written locations as outputs
▪ Can be refined further by checking internal constraints

A few cases have easy models
▪ e.g. repeated adds = multiplication

Otherwise, do what we can
▪ Anything too complicated is just defined as tainted by all potential inputs with no evaluation

Again, the user can improve the analysis manually
▪ Unroll iterations, add evaluators, etc.

Abstraction

Gradually forming more expressive units
▪ Becoming increasingly higher-level
▪ More flexible than decompilation

Can always zoom back in to the details

Reflects what’s happening in the analyst’s mind
▪ Figuring out components and how they fit together
▪ Easier to manage and navigate than just by function

Memory

Automated static analysis tools can’t afford to miss bugs
▪ False positives are a necessary evil for them

Whenever there’s ambiguity, they have to assume the worst

We can allow the user to focus on the cases they consider interesting
▪ The ‘normal’ cases for some RE tasks
▪ The ‘edge’ cases for others (e.g. vuln research)

Memory

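Considering when accesses overlap
▪ A form of alias analysis

mov [eax], 123
mov [ebx], 456
mov ecx, [eax]

[Diagram: eax and ebx each point to an unknown location (“?”); the value loaded into ecx is also unknown (“?”), because it depends on whether the two pointers refer to the same location.]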

Memory

Pointers to objects don’t just magically appear
▪ Based on results from allocation functions

So compare the taint sources!

Not immune to deliberate obfuscation
▪ e.g. a loop searching memory, like egg hunters in shellcode

… but still useful in many cases

Memory

mov [eax], 123
mov [ebx], 456
mov ecx, [eax]

[Diagram: eax and ebx are both derived from esi; each points to an unknown location (“?”).]

Sharing a taint source: prioritize the equal case

Memory

mov [eax], 123
mov [ebx], 456
mov ecx, [eax]

[Diagram: eax is derived from esi and ebx from edi; each points to an unknown location (“?”).]

No shared taint source: prioritize the unequal case
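
The heuristic from the last two slides reduces to a set intersection on the taint sources of the two pointers. The sketch below is a deliberately small illustration for the three-instruction example only; `prioritised_case` and `candidates_for_ecx` are hypothetical names, and the result is a ranking of possibilities, not a proof about aliasing.

```python
def prioritised_case(sources, ptr_a, ptr_b):
    """Guess whether two pointer registers are more likely equal or unequal.

    `sources` maps a register to the set of taint sources it derives from.
    """
    shared = sources[ptr_a] & sources[ptr_b]
    return "equal" if shared else "unequal"

def candidates_for_ecx(sources):
    """Possible values of ecx after
           mov [eax], 123 / mov [ebx], 456 / mov ecx, [eax]
       most likely first."""
    if prioritised_case(sources, "eax", "ebx") == "equal":
        return [456, 123]  # ebx probably overwrote the slot eax points to
    return [123, 456]      # the two stores probably hit different slots

shared_source = {"eax": {"esi"}, "ebx": {"esi"}}
separate_sources = {"eax": {"esi"}, "ebx": {"edi"}}

print(prioritised_case(shared_source, "eax", "ebx"), candidates_for_ecx(shared_source))
print(prioritised_case(separate_sources, "eax", "ebx"), candidates_for_ecx(separate_sources))
```

Both candidates are kept in either case, so the user can still overrule the ordering when the heuristic guesses wrong.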

Dynamic

Perform runs to collect example values
▪ Can be guided by our analysis
  ▪ Input values
  ▪ Which values to collect
  ▪ Where to intercept

Useful for still-unresolved indirect calls/jumps
▪ Also useful to just display for the user!

Improve memory heuristics

Timing and hit count information also useful
▪ Much like profiling

Virtualizing Packers

Opcode handlers become instructions

Suddenly we have a disassembler and debugger with no additional effort
▪ Can compare routines between VM and native implementation

Could be useful for Java or Flash VMs too
▪ Needs further investigation

No Intermediate Language?

Haven’t mentioned any sort of Intermediate Language

The whole thing can be thought of as an extensible IL

Share defined abstractions between projects
▪ For library code, etc.

Principles

Integrate techniques
▪ Share information as much as possible

Stay interactive
▪ Both for analysis and visualization

Allow flexible abstraction
▪ Reflect what happens in the analyst’s mind

Theoretical extensions

Type recovery
▪ C++ classes

Dealing with new obfuscation methods
▪ White-box cryptography

Functional programming?

Implementation - REvealer

Under heavy development
▪ Release likely in early 2011

GUI
▪ Useful visualization of all this

Dynamic extensions
▪ Hypervisor based?

Other information sources
▪ Debugging symbols

Extensible… somehow!
▪ Python? Lua?

Special Thanks

Rolf Rolles
Sean Heelan
Cyril Paciullo
Shawn Zivontsis
Jonathan Kay
Sérgio Silva
Tareq Saade
Daniel Hauenstein
Jason Geffner

Thank you for listening!

Mailing list: http://revealer.co.uk

E-mail: [email protected]

Twitter: http://twitter.com/WillWhistler

Other useful links:
▪ http://reddit.com/r/ReverseEngineering
▪ http://ref.x86asm.net

