Paradyn Project
Paradyn / Dyninst Week
Madison, Wisconsin
April 12-14, 2010
Safe and Efficient Instrumentation
Andrew Bernat
Binary Instrumentation
• Instrumentation modifies the original code
  • Moves original code
  • Allocates new memory
  • Overwrites original code
• This affects the behavior of:
  • Moved code
  • Code that references moved code
  • Code that references changed memory
• And can cause incorrect execution
Sensitivity Models
• A program is sensitive to a particular modification if that modification changes the program’s behavior
• Current binary instrumenters rely on fixed sensitivity models
  • And may fail to preserve behavior
• Compensating for sensitivity imposes overhead

Original instruction → compensating sequence:
    call printf → push $(ret_addr); jmp printf
    ret → pop %eax; call compensate_ret_addr; jmp %eax

[Figure label: Safe and Efficient Approach]
Efficiency vs Sensitivity
[Figure: efficiency plotted against sensitivity. Sensitivity labels: Malware, Optimized Code, Conventional Code. Plotted: Pin, Valgrind, …; Dyninst; Safe and Efficient Approach.]
How do we do this?
• Formalization of code relocation
  • Visible behavior
  • Instruction sensitivity
  • External sensitivity
• Implementation in Dyninst
  • Analysis phase
  • Transformation phase
• Analysis and performance results
Three Questions
• What program behavior do we wish to preserve?
• How does modification affect instructions?
• How do instructions change program behavior?
Approach
• Preserve visible behavior
  • Relationship of input to output
• Identify sensitive instructions
  • Those whose behavior is changed
• Only compensate for externally sensitive instructions
  • Those whose sensitivity affects visible behavior
Visible Behavior
• Intuition: we can change anything that does not affect the output of the program

[Figure: the Original Binary maps input X to output Y; the Instrumented Binary maps input X + A to output Y + B. Labels: Instrumentation Input, Instrumentation Output.]
Sensitivity
• What does instrumentation change?
  • Addresses of instructions
  • Contents of memory
  • Shape of the address space
• Directly affected instructions:
  • Access the PC (and are moved)
  • Read modified memory
  • Test allocated memory
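The three kinds of directly affected instruction listed above can be sketched as a small classifier. This is a minimal illustration, not Dyninst code: the `Insn` record, its fields, and `direct_sensitivity` are hypothetical names, and the facts about each instruction are supplied by hand rather than derived from a real decoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction record; not a Dyninst type."""
    text: str
    addr: int
    reads_pc: bool = False                 # e.g. call pushes its return address
    reads_addrs: frozenset = frozenset()   # memory addresses the insn reads
    tests_allocation: bool = False         # probes whether memory is allocated


def direct_sensitivity(insn, moved_addrs, modified_memory):
    """Return the reasons (possibly none) why insn is directly affected."""
    reasons = set()
    if insn.reads_pc and insn.addr in moved_addrs:
        reasons.add("pc")                  # accesses the PC and was moved
    if insn.reads_addrs & modified_memory:
        reasons.add("memory")              # reads memory the instrumenter changed
    if insn.tests_allocation:
        reasons.add("address-space")       # observes the shape of the address space
    return reasons


call = Insn("call worker", 0x1000, reads_pc=True)
add = Insn("add %eax, %ebx", 0x1005)
load = Insn("mov (%esi), %eax", 0x1007, reads_addrs=frozenset({0x2000}))
moved = {0x1000, 0x1005, 0x1007}           # every instruction was relocated
patched = {0x2000}                         # one overwritten memory location
```

Here only `call` and `load` are reported as sensitive; the moved `add` neither reads the PC nor touches modified memory.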
Sensitivity Examples
Call/Return Pair:
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Jumptable:
jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
get_pc_thunk:
    mov (%esp), %ebx
    ret

Self-Unpacking Code (Simplified):
protect:
    call initialize
    <data buffer>
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)
Sensitivity Is Not Enough
• An instruction is externally sensitive if it causes a visible change in behavior
  • Approximation: or if it changes control flow
• This requires:
  • The sensitive instruction must produce different values
  • These differences must reach an instruction that affects output (or control flow)
  • … and change its behavior
Program Modification
[Figure: the Original Binary is turned into the Modified Binary. Labels: Analysis, Compensation, Code, Original Code, Relocated Code.]
Analysis Phase
• Identify sensitive instructions
  • InstructionAPI: used and defined sets
• Determine affected instructions
  • DepGraphAPI: forward slice
• Analyze effects of modification
  • SymEval: symbolic expansion of the slice
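The first two steps, used/defined sets feeding a forward slice, can be sketched as follows. This is a simplified illustration and does not call InstructionAPI or DepGraphAPI: `forward_slice` is a hypothetical helper, the used and defined sets are written by hand, stack-pointer updates are omitted, the location name `[ret]` stands for the stack slot holding the return address, and only straight-line code is handled.

```python
def forward_slice(insns, start):
    """Indices of instructions that may be affected by insns[start].

    Each entry of insns is (text, used, defined): the locations the
    instruction reads and writes. Straight-line code only.
    """
    tainted = set(insns[start][2])      # locations written by the sensitive insn
    affected = [start]
    for i in range(start + 1, len(insns)):
        _text, used, defined = insns[i]
        if tainted & set(used):
            affected.append(i)          # reads a tainted value
            tainted |= set(defined)     # and propagates it to what it writes
        else:
            tainted -= set(defined)     # a clean write kills the taint
    return affected


# Jumptable example in execution order, with hand-written (assumed) sets.
JUMPTABLE = [
    ("push %ebp", {"ebp"}, set()),
    ("mov %esp, %ebp", {"esp"}, {"ebp"}),
    ("call get_pc_thunk", set(), {"[ret]"}),
    ("mov (%esp), %ebx", {"[ret]"}, {"ebx"}),
    ("ret", {"[ret]"}, set()),
    ("add $(offset), %ebx", {"ebx"}, {"ebx"}),
    ("mov (%ebx, %eax, 4), %ecx", {"ebx", "eax"}, {"ecx"}),
    ("jmp *%ecx", {"ecx"}, set()),
]

slice_from_call = forward_slice(JUMPTABLE, 2)
```

Starting from the `call`, the sketch reaches the thunk's `mov`, then the `add`, the indexed `mov`, and the `jmp`. It also reports the thunk's `ret`, which reads the same return address.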
Analysis Example: Call/Return Pair
Call/Return Pair:
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Sensitivity: call (moved, uses PC)

Slice:
    call
    ret

Symbolic Expansion:
    call:
    ret:
Analysis Example: Jumptable
Jumptable:
jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
get_pc_thunk:
    mov (%esp), %ebx
    ret

Sensitivity: call (moved, uses PC)

Slice:
    call
    mov (%esp), %ebx
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx

Symbolic Expansion:
    call:
    mov:
    add:
    mov:
    jmp:
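One way to picture the symbolic expansion step is to track, for each value in the slice, how many copies of the relocation distance d it carries. The sketch below is an assumption-laden illustration and not SymEval: `Val`, `add`, and `externally_sensitive` are hypothetical names, all code is assumed to move by the same distance d, and data is assumed not to move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Val:
    expr: str    # the value in the original binary, as a symbolic expression
    shift: int   # how many copies of the relocation distance d are added


def add(v, term):
    """Adding a constant or an unaffected register keeps the same shift."""
    return Val(f"{v.expr} + {term}", v.shift)


def externally_sensitive(kind, v):
    """kind is 'jump' (v is a branch target) or 'load' (v is a read address)."""
    if v.shift == 0:
        return False         # same value as in the original binary
    if kind == "jump":
        return v.shift != 1  # expr + d is the relocated copy of the target
    return True              # data did not move, so expr + d reads elsewhere


ret_addr = Val("ret_addr", 1)            # pushed by a relocated call

# Call/return pair: ret jumps to the pushed value.
call_ret = externally_sensitive("jump", ret_addr)

# Jumptable: mov (%esp), %ebx; add $(offset), %ebx; mov (%ebx, %eax, 4), %ecx
ebx = add(ret_addr, "offset")
table_entry = add(ebx, "4*eax")
jumptable = externally_sensitive("load", table_entry)
```

Under these assumptions the call/return pair is sensitive but not externally sensitive, while the jumptable load reads from a shifted address and needs compensation.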
Analysis Example: Unpacking Code
Self-Unpacking Code (Simplified):
protect:
    call initialize
    <data buffer>
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)

Sensitivity: call (moved, uses PC)

Slice:
    call initialize
    pop %esi
    mov (%esi, %ebx, 4), %eax
    call unpack
    …

Symbolic Expansion:
    call:
    pop:
    mov:
Compensation Phase
• Generates the relocated code
• Current instrumenter approach:
  • Treat each instruction individually
  • May miss optimization opportunities
• New approach: group transformation
  • Derived from Dyninst heuristics
Instruction Transformation
• Emulate each externally sensitive instruction
  • Replace some instructions (e.g., calls) with sequences
• Some sequences impose high overhead
  • E.g., run-time compensation

Original instruction → replacement sequence:
    call printf → push $(orig_ret_addr); jmp printf
    ret → pop %eax; call compensate_ret_addr; jmp %eax
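The per-instruction replacements above can be written as a small rewriting function. This is a text-level sketch only, not how Dyninst generates code: `emulate` is a hypothetical helper, instructions are plain strings, and the original return address is passed in as a placeholder name.

```python
def emulate(insn, orig_ret_addr="orig_ret_addr"):
    """Return the replacement sequence for one instruction, as text."""
    op, _, operand = insn.partition(" ")
    if op == "call":
        # Push the ORIGINAL return address, then transfer control.
        return [f"push $({orig_ret_addr})", f"jmp {operand}"]
    if op == "ret":
        # Translate the popped address at run time before jumping to it.
        return ["pop %eax", "call compensate_ret_addr", "jmp %eax"]
    return [insn]  # not externally sensitive: keep as is


relocated = []
for original in ["add %eax, %ebx", "call printf", "ret"]:
    relocated.extend(emulate(original))
```

Three original instructions become six, which is the overhead the slide refers to; the `ret` sequence adds a run-time call on every return.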
Group Transformation
• Emulate the behavior of a group of instructions
  • Motivating example: compiler thunk functions
• Open questions:
  • Which instructions are included in the group?
  • How is the replacement sequence determined?
• Current status: hand-crafted templates

Original group → replacement:
    call ebx_thunk, with ebx_thunk: mov (%esp), %ebx; ret → mov $(ret_addr), %ebx
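A hand-crafted template for the thunk case might look like the sketch below. It is an illustration under assumptions rather than the Dyninst implementation: `thunk_register` and `group_transform` are hypothetical names, code is represented as text, and only the exact two-instruction thunk body from the slide is recognized.

```python
import re

# Template for the first instruction of a get-PC thunk: mov (%esp), %REG
THUNK_LOAD = re.compile(r"mov \(%esp\), (%e[a-z]{2})")


def thunk_register(body):
    """Return the register a get-PC thunk loads, or None if body is not one."""
    if len(body) == 2 and body[1] == "ret":
        match = THUNK_LOAD.fullmatch(body[0])
        if match:
            return match.group(1)
    return None


def group_transform(code, functions):
    """Replace each call to a recognized thunk with a single mov."""
    out = []
    for insn in code:
        op, _, target = insn.partition(" ")
        reg = thunk_register(functions.get(target, [])) if op == "call" else None
        out.append(f"mov $(ret_addr), {reg}" if reg else insn)
    return out


functions = {"get_pc_thunk": ["mov (%esp), %ebx", "ret"]}
jumptable = [
    "push %ebp",
    "mov %esp, %ebp",
    "call get_pc_thunk",
    "add $(offset), %ebx",
    "mov (%ebx, %eax, 4), %ecx",
    "jmp *%ecx",
]
relocated = group_transform(jumptable, functions)
```

The call and the thunk body collapse into one `mov` of the original return address, so the group costs less than emulating the `call` and `ret` separately.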
Transformation: Call/Return Pair
Original Code
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret

Relocated Code
main:
    push %ebp
    mov %esp, %ebp
    …
    call worker
    …
    leave
    ret
worker:
    push %ebp
    mov %esp, %ebp
    …
    ret
Transformation: Jumptable
Original Code
jumptable:
    push %ebp
    mov %esp, %ebp
    call get_pc_thunk
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
get_pc_thunk:
    mov (%esp), %ebx
    ret

Relocated Code
jumptable:
    push %ebp
    mov %esp, %ebp
    mov $(ret_addr), %ebx
    add $(offset), %ebx
    mov (%ebx, %eax, 4), %ecx
    jmp *%ecx
Transformation: Unpacking Code
Original Code
protect:
    call initialize
    <data buffer>
    …
initialize:
    pop %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)

Relocated Code
protect:
    jmp initialize
    <data buffer>
    …
initialize:
    mov $(ret_addr), %esi
    mov $(unpack_base), %edi
    mov $0x0, %ebx
loop_top:
    mov (%esi, %ebx, 4), %eax
    call unpack
    mov %eax, (%edi, %ebx, 4)
    inc %ebx
    cmp %ebx, $0x42
    jnz loop_top
    jmp $(unpacked_base)
Results
Percentage of PC-Sensitive Instructions (32-bit, GCC, static analysis)

Type of Binary        % PC Sensitive    % Externally Sensitive    % Unanalyzable
Executable (a.out)    9.0%              0.01%                     0.59%
Library (.so)         7.9%              0.55%                     0.72%

Instrumentation Overhead (go, 32-bit, 12.3s base time)

Current Dyninst: 23.4s (90.2%)
Safe and Efficient Algorithm: 16.3s (32.5%)
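The overhead percentages follow from the run times relative to the 12.3s base time. A short check, where `overhead_percent` is a name introduced only for this illustration:

```python
def overhead_percent(instrumented_s, base_s):
    """Extra run time as a percentage of the uninstrumented run time."""
    return (instrumented_s - base_s) / base_s * 100.0


BASE_S = 12.3
current_dyninst = overhead_percent(23.4, BASE_S)      # rounds to 90.2
safe_and_efficient = overhead_percent(16.3, BASE_S)   # rounds to 32.5
```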
Future Work
• Memory sensitivity and compensation
  • Improved pointer analysis
  • Useful user intervention?
• Investigate group transformations
  • Widen range of input binaries
  • Expand supported platforms
Questions?
ASProtect code loop
8049756: call 8049761

8049761: mov EDX, ECX
8049763: pop EDI
8049764: push EAX
8049765: pop ESI
8049766: add EDI, 2183
804976c: mov ESI, EDI
804976e: push 0
8049773: jz 804977c

8049779: adc DH, 229

804977c: pop EBX
804977d: mov EAX, 2015212641

8049782: mov ECX, EBX(EDI)
8049785: jmp 804979c

804979c: add ECX, 1586986316
80497a2: xor ESI, 314333756
80497a8: xor ECX, 594915733
80497ae: jmp 80497c3

80497c3: sub ECX, 594948778
80497c9: sub ESI, 64260
80497ce: push ECX, ESP
80497cf: mov EAX, 884377321
80497d4: pop EBX(EDI)
80497d7: jmp 80497ed

80497ed: adc AL, 100
80497f0: sub EBX, 1595026050
80497f6: xor EAX, 34778
80497fb: add EBX, 1595026046
8049801: call 804980c

804980c: mov AX, 2783
8049810: pop ESI
8049811: cmp EBX, 4294965344
8049817: jnz 8049834

804981d: or ESI, 839181910
8049823: jmp 8049847

8049834: mov ESI, 1287570375
8049839: jmp 8049782
Emulation Examples
Original                     Emulated
add %eax, %ebx               add %eax, %ebx
jnz 0xf3e                    jnz 0xe498d3
call fprintf                 push $804391; jmp fprintf
mov (%esi, %ebx, 4), %eax    lea (%esi, %ebx, 4), %eax; call mem_addr_translate; mov (%eax), %eax
ret                          pop %eax; call addr_translate; jmp %eax