Date post: 17-Dec-2015
The College of WILLIAM & MARY

Mimimorphism: A New Approach to Binary Code Obfuscation

Zhenyu Wu, Steven Gianvecchio, Mengjun Xie
Advisor: Dr. Haining Wang

Malware Propagation & Detection

Internet & Ubiquitous Computing
◦ Billions of networked computers
◦ Playground for malware

Suppression Techniques
◦ Static analysis
    ◦ Low latency, high throughput
    ◦ Widely used, IDS deployable
◦ Dynamic analysis

The Game of Hide and Seek

Un-obfuscated
◦ Binary in plain form
Oligomorphism
◦ Simple transformation (XOR)
Polymorphism
◦ Compression and encryption
Metamorphism
◦ Meta transformation (P-code)
State of the Art
◦ Control-flow encryption
◦ Byte frequency manipulation

Unique substring
◦ Segments of the binary
Algorithmic detection
◦ Built-in transformations
Statistical analysis
◦ Anomalies in code body
Advanced pattern matching
◦ N-gram signatures
Semantic analysis
◦ Persistent high-level fingerprints

Fugitive On The Run

[Image: "WANTED" poster offering a $5,000,000 reward]

Polymorphism
◦ Compression & Encryption
◦ Nobody looks like a small dark box!

Metamorphism
◦ Reordering Components
◦ Cannot evade feature detections

Control Flow Encryption
◦ Prevent feature analysis
◦ Increases suspicion

The Real Player
◦ Assume other people’s identity (Mimicry)

Fugitive On The Run

Lessons Learned:
◦ Evasion without obfuscating features
◦ Evasion by refusing inspection
◦ Evasion by mimicking
    ◦ Obfuscates original features
    ◦ Open to inspection, but disguised against detection

Binary Executable Mimicry

Mimimorphism:
◦ A reversible transformation of an executable that produces output statically resembling benign programs
◦ Characteristics:
    ◦ Completely erases features of the original binary
    ◦ High-order statistics match those of benign executables
    ◦ Transformed payload consists of “meaningful” control flows that closely resemble those of benign executables

Mimic Functions

Text Steganography Technique
◦ Transforms the input data and produces mimicry output that assumes the statistical and grammatical (structural) properties of another type of data
◦ Originally proposed by Peter Wayner as a means to transport sensitive data under harsh surveillance
    ◦ Novel use of Huffman coding

Mimic Functions

Huffman Coding
◦ Digesting
    ◦ Builds a Huffman tree according to the symbol frequency
◦ Encoding
    ◦ Removes redundancies of the input data using a given Huffman tree
◦ Decoding
    ◦ Recovers the original data from the “condensed” data by emitting symbols according to the original Huffman tree

[Figure: Huffman tree for “mass” with codes s = 1, m = 00, a = 01; “mass” (32 bits) encodes to 000111 (6 bits)]
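The three Huffman steps above (digesting, encoding, decoding) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are made up for this example, and the 0/1 labels on the branches depend on tie-breaking, so the individual codes may differ from the slide's figure while the code lengths stay the same.

```python
import heapq
from collections import Counter

def build_codes(sample):
    """Digest: derive a prefix-free code table from symbol frequencies."""
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(Counter(sample).items()))]
    heapq.heapify(heap)
    tick = len(heap)                      # unique tie-breaker for merged nodes
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w0 + w1, tick, (left, right)))
        tick += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: a single symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    """Encode: replace each symbol by its (shorter) bit string."""
    return "".join(codes[s] for s in text)

def decode(bits, codes):
    """Decode: emit a symbol each time the bits read so far form a code."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

codes = build_codes("mass")
bits = encode("mass", codes)
print(codes, bits, decode(bits, codes))
```

The most frequent symbol ("s") receives a one-bit code and the other two receive two-bit codes, which gives the 6-bit encoding of the 32-bit input.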

Mimic Functions

What if we decode a piece of random data?
◦ Produces “meaningless” data, but
    ◦ The output exhibits a symbol frequency similar to the digest, and
    ◦ The input data can be recovered by Huffman encoding

Regular Mimic Function
◦ Learn: Build a Huffman tree from sample text
◦ Mimicry: Huffman decode the (randomized) input
◦ Recover: Huffman encode
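The learn / mimicry / recover cycle can be tried out with a fixed code table. The sketch below assumes the three-symbol table of the "mass" example (s = 1, m = 00, a = 01) and uses Python's `random` module as a stand-in for the randomized input; `mimic` and `recover` are illustrative names only.

```python
import random

# Assumed code table, taken from the "mass" example.
CODES = {"s": "1", "m": "00", "a": "01"}

def mimic(bits, codes):
    """Mimicry: Huffman-decode arbitrary bits into symbols of the sample."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out), cur              # cur holds an unfinished trailing code

def recover(text, leftover, codes):
    """Recover: Huffman-encode the mimicry text to get the input bits back."""
    return "".join(codes[s] for s in text) + leftover

rng = random.Random(0)
payload = "".join(rng.choice("01") for _ in range(4000))
text, tail = mimic(payload, CODES)

share_s = text.count("s") / len(text)
print(text[:40], round(share_s, 2))
```

With random input, a symbol whose code is n bits long appears with probability close to 2^-n, so "s" makes up about half of the output regardless of how often it occurred in the sample.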

Mimic Functions

Insufficiencies
◦ Produces illegible, garbled text
◦ Frequency distributions follow a 2^-n distribution

High-order Mimic Function
◦ Captures interdependencies
    ◦ Builds multiple Huffman trees
    ◦ One for each unique symbol prefix
◦ Produces “sensible” text with much more “natural” symbol frequency distributions

[Figure: Huffman “forest”, with one Huffman tree per symbol prefix (e.g., “chi”, “ins”, “rou”)]
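A high-order mimic function can be sketched by keeping one code table per symbol prefix. The example below is a simplified illustration under several assumptions that are not in the slides: the sample is wrapped around so every prefix has a successor, a prefix with only one possible successor consumes no bits, and the final code is zero-padded when the payload runs out. All names are hypothetical.

```python
import heapq
import random
from collections import Counter, defaultdict

def huffman(freqs):
    """Return {symbol: bit string}; a lone symbol gets the empty code."""
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        w0, _, a = heapq.heappop(heap)
        w1, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w0 + w1, tick, (a, b)))
        tick += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def digest(sample, order):
    """Build the forest: one code table per (order - 1)-symbol prefix."""
    k = order - 1
    wrapped = sample + sample[:k]         # wrap so every prefix has a successor
    counts = defaultdict(Counter)
    for i in range(len(sample)):
        counts[wrapped[i:i + k]][wrapped[i + k]] += 1
    return {ctx: huffman(c) for ctx, c in counts.items()}

def mimic(bits, forest, seed, limit=100_000):
    """Decode payload bits, switching trees as the prefix changes."""
    out, pos, k = seed, 0, len(seed)
    while pos < len(bits):
        if len(out) > limit:
            raise RuntimeError("sample has no branching prefixes")
        inverse = {c: s for s, c in forest[out[-k:]].items()}
        cur = ""
        while cur not in inverse:
            cur += bits[pos] if pos < len(bits) else "0"   # zero-pad the tail
            pos += 1
        out += inverse[cur]
    return out

def recover(text, forest, k):
    """Re-encode each symbol with the tree of the prefix that precedes it."""
    return "".join(forest[text[i - k:i]][text[i]] for i in range(k, len(text)))

SAMPLE = "mimic functions mimic the statistics of the sample text that the functions digest "
ORDER = 3
forest = digest(SAMPLE, ORDER)
rng = random.Random(1)
payload = "".join(rng.choice("01") for _ in range(200))
text = mimic(payload, forest, SAMPLE[:ORDER - 1])
recovered = recover(text, forest, ORDER - 1)[:len(payload)]
print(text[:60])
```

Every three-symbol window of the output also occurs in the sample, which is why higher orders read as more "sensible"; the price is a forest that grows with the number of distinct prefixes.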

Mimicry Text Sample

Mimicry of Peter Wayner’s paper
◦ Produced by a 6th order mimic function

Each of these historical reason, I don’t recommend using gA(t) to choose the safe. These one-to-one encoded with n leaves and punctuation. The starting every intended to find the same order mimic files. A Method is to break the trees by constructing the mimics the path down the most even though, offer no way that is, in this paper. Figure will not overflow memory. These produced by truncating letter. This need to handle n-th ordered compartment of nonsense words cannot bear any resemblance to B because this task is a Huffman showed in [1], [2], [3] among others.

Mimimorphism

The Challenge: Machine Language Mimicking
◦ Machine language consists of instructions and control flows
    ◦ Each instruction has a strict format to follow
    ◦ Machines never make a “typo” or use the wrong “tense”!
◦ A mimic function has no knowledge of instructions
    ◦ Often makes mistakes when generating instructions
    ◦ Has a low success rate at creating mimicry control flows

Our Solution
◦ Integrate a custom assembler / disassembler
◦ Help the mimic function understand the language

Mimimorphism: Digesting

[Diagram: executable binaries (the mimicry target) are disassembled and fed to the high-order instruction mimic function, which builds an instruction Huffman forest (the mimicry digest) from control flows such as PUSH, DEC, MOV, XOR.]

Mimimorphism: Digesting

[Diagram sequence: each instruction disassembled from the executable binary (e.g., MOV) is parsed into a COMMON_INST structure (Inst. Prefixes: atomic op., repeat, operand size, etc.; ModR/M: Mod / Reg. / R/M; SIB: Scale / Idx. / Base; Displacement). The instruction is recorded in the instruction Huffman tree that belongs to the current instruction prefix (e.g., XOR, PUSH, DEC), and its fields are recorded in the instruction encoding template for MOV, which holds one Huffman tree per field (Inst. Prefix: 16bit, REP; ModR/M: EAX, ECX, EDX, …; SIB / Displacement: 2x8+16, 3x4+0). The instruction prefix then advances to PUSH, DEC, MOV. Repeating this for all instructions (MOV, INC, PUSH, CMP, XCHG, JMP, CALL, POP, …) yields the mimimorphic digest.]

Mimimorphism: Encoding

[Diagram: binary data is randomized with a PRNG, passed through the high-order instruction mimic function using the mimicry digest, and assembled into mimicry binaries.]

Mimimorphism: Encoding

[Diagram sequence: bits of the binary data (01001001100101010001010010001001) select an instruction (MOV) from the instruction Huffman tree that the mimicry digest holds for the current instruction prefix (XOR, PUSH, DEC). Further bits select field values (16bit, ECX, 3x4+0) from the Huffman trees in the MOV instruction encoding template (Inst. Prefix: 16bit, REP; ModR/M: EAX, ECX, EDX, …; SIB / Displacement: 2x8+16, 3x4+0). The selected values fill a COMMON_INST structure (Inst. Prefixes, ModR/M, SIB, Displacement), the instruction is emitted, and the instruction prefix advances to PUSH, DEC, MOV.]
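The encoding walk-through above (payload bits first pick an instruction, then fill in its encoding template field by field) can be sketched as follows. The code tables are hypothetical stand-ins chosen to resemble the fragments visible in the diagrams, there is only one instruction tree instead of one per instruction prefix, and the output is symbolic; a real engine would hand the result to the assembler.

```python
# Hypothetical code tables; the slides only show fragments of the real trees.
INSTRUCTION_TREE = {"00": "MOV", "01": "INC", "1": "PUSH"}
REGISTERS = {"0": "EAX", "10": "ECX", "11": "EDX"}
TEMPLATES = {
    "MOV": [("prefix", {"0": "16bit", "1": "REP"}),
            ("modrm", REGISTERS),
            ("sib", {"0": "2x8+16", "1": "3x4+0"})],
    "INC": [("modrm", REGISTERS)],
    "PUSH": [("modrm", REGISTERS)],
}

def take(bits, pos, table):
    """Walk one prefix-free table, consuming payload bits from position pos."""
    cur = ""
    while cur not in table:
        if pos >= len(bits):
            raise ValueError("ran out of payload bits mid-symbol")
        cur += bits[pos]
        pos += 1
    return table[cur], pos

def emit_instruction(bits, pos=0):
    """Encoding: bits choose the mnemonic, then each field of its template."""
    mnemonic, pos = take(bits, pos, INSTRUCTION_TREE)
    fields = {}
    for name, table in TEMPLATES[mnemonic]:
        fields[name], pos = take(bits, pos, table)
    return mnemonic, fields, pos

def recover_bits(mnemonic, fields):
    """Decoding: look the mnemonic and fields up again to get the bits back."""
    bits = {v: k for k, v in INSTRUCTION_TREE.items()}[mnemonic]
    for name, table in TEMPLATES[mnemonic]:
        bits += {v: k for k, v in table.items()}[fields[name]]
    return bits

mnemonic, fields, used = emit_instruction("000101")
print(mnemonic, fields, used, recover_bits(mnemonic, fields))
```

Because every choice is made by walking a prefix-free table, the same tables run in the opposite direction during decoding, which is what makes the transformation reversible.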

Mimimorphism: Decoding

[Diagram: mimicry binaries are disassembled and passed through the high-order instruction mimic function using the mimicry digest; the PRNG step is then reversed to recover the binary data.]

Experimental Setup

Training
◦ Select 100 Windows XP system files as the mimicry target
    ◦ They represent typical legitimate binaries
◦ Trained using 7th and 8th order mimimorphic engines
    ◦ Most control flow basic blocks have 7-8 instructions

Evaluations
◦ Statistical Anomaly Tests
    ◦ Kolmogorov-Smirnov Test & Entropy Test
◦ Semantic Detection Test
    ◦ Control Flow Fingerprinting

Evaluation Results

Statistical Tests
◦ Kolmogorov-Smirnov Test
    ◦ Maximum difference between byte frequency distributions
    ◦ Legitimate: 0.074±0.045; Mimimorphic: 0.093±0.006
◦ Entropy Test
    ◦ Measurement of predictability (or randomness) of data
    ◦ Legitimate: 6.353±0.258; Mimimorphic: 6.528±0.021

[Chart: Kolmogorov-Smirnov and entropy results for legitimate and mimimorphic binaries]
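Both statistics are easy to compute over raw bytes. The sketch below assumes that entropy is measured in bits per byte and that the Kolmogorov-Smirnov statistic is the largest gap between the cumulative byte-value distributions of two files; the slides do not say which reference distribution was used, so the two-sample form here is an assumption.

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy in bits per byte (0 = constant, 8 = uniform)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def ks_statistic(a, b):
    """Largest gap between the cumulative byte-value distributions of a and b."""
    ca, cb = Counter(a), Counter(b)
    fa = fb = gap = 0.0
    for value in range(256):
        fa += ca[value] / len(a)
        fb += cb[value] / len(b)
        gap = max(gap, abs(fa - fb))
    return gap

uniform = bytes(range(256))
constant = b"\x00" * 256
print(byte_entropy(uniform), byte_entropy(constant))
print(ks_statistic(uniform, constant))
```

Compressed or encrypted payloads sit close to 8 bits per byte, so the reported values of about 6.4 to 6.5 for both groups indicate that mimimorphic output does not stand out as random data.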

Evaluation Results

Semantic Tests
◦ Control Flow Fingerprinting
    ◦ Statically analyzes executables (with a special disassembler) and extracts control flow patterns
    ◦ Detects malware by matching characteristic control flow patterns (i.e., shared fingerprints)
◦ Between the original binary and mimimorphic instances
    ◦ Shared fingerprints: the lower the better
    ◦ Only 1 out of 100 instances shares a single fingerprint (out of hundreds of thousands of fingerprints)

Evaluation Results

Semantic Tests
◦ Between mimimorphic and legitimate binaries
    ◦ Shared fingerprints: the higher the better
    ◦ 7th order mimimorphic instances:
        ◦ Average 1856.46±372.5 (72.93 benign files)
        ◦ Minimum 1057 (44 files); Maximum 3321 (92 files)
    ◦ 8th order mimimorphic instances:
        ◦ Average 11407.99±912.42 (81.37 benign files)
        ◦ Minimum 9606 (70 files); Maximum 14216 (91 files)

Evaluation Results

Semantic Tests
◦ A sample mimicry control flow pattern
    ◦ Reproduced by a 7th order mimimorphic instance

Limitations & Discussions

Application Constraints
◦ Memory consumption: 600MB for 7th order and 1.2GB for 8th order mimimorphic transformation
    ◦ Disk-based on-demand digest storage
◦ Size increase: 20x inflation for 7th order and 30x for 8th order mimimorphic transformation
    ◦ Typical malware is less than 100KB
    ◦ Mimimorphism results in 2~3MB files

Conclusion

We propose mimimorphism as a novel binary obfuscation technique
◦ Enhances high order mimic functions with a custom assembler / disassembler
◦ Achieves evasion by disguise rather than by refusing inspection
◦ Effective against both statistical anomaly detection and semantic fingerprinting tests

Limitations & Discussions

Robustness against other approaches
◦ Automatic n-gram detection
    ◦ Typical x86 instruction length: 2.1~2.8 bytes
    ◦ 8th order mimimorphism can approach 16-gram mimicry
    ◦ Existing n-gram detection algorithms can hardly scale up to
◦ Static semantic analysis
    ◦ Mimimorphism does not target specific detection techniques
    ◦ Focuses on reproducing features from benign programs
    ◦ Immune to lower order signature detection
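The 16-gram figure follows from simple arithmetic: an 8th order engine reproduces runs of 8 instructions, and each instruction is a little over 2 bytes on average. The short sketch below spells this out, assuming the 2.1~2.8 range is measured in bytes; the function name is illustrative.

```python
def mimicry_window_bytes(order, avg_instruction_len):
    """Approximate byte span covered by `order` consecutive instructions."""
    return order * avg_instruction_len

low = mimicry_window_bytes(8, 2.1)
high = mimicry_window_bytes(8, 2.8)
print(f"8th order covers roughly {low:.1f} to {high:.1f} bytes")
```

An n-gram detector would therefore need windows of about 16 bytes or more before it sees byte sequences that the mimicry target does not also contain.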

Limitations & Discussions

Robustness against other approaches
◦ Deep syntactic analysis
    ◦ Fails to exactly reproduce high-level syntactic features:
        ◦ 45% of “functions” do not have a matching prologue and epilogue
        ◦ Many jump instructions cross function boundaries
    ◦ Detectable program-level anomalies
        ◦ Not all programs follow conventions
        ◦ Could lead to false positives


Questions?

Limitations & Discussions

The Problem of the Unpacker
◦ Mimimorphic transformation does not provide a solution for hiding the unpacker
◦ However, we believe unpackers do benefit from using mimimorphism
    ◦ The unpacker is the weakness of polymorphism because it is easy to “spot”: the rest of the payload is not executable!
    ◦ All mimimorphic payload is “executable”, so separating unpacker code from the payload becomes non-trivial

Mimimorphism: Decoding

[Diagram: mimicry binaries are disassembled and passed through the high-order instruction mimic function using the mimicry digest; the PRNG step is then reversed to recover the binary data.]

[Diagram sequence: each instruction of the mimicry binary (e.g., MOV) is disassembled into a COMMON_INST structure (Inst. Prefixes: atomic op., repeat, operand size, etc.; ModR/M: Mod / Reg. / R/M; SIB: Scale / Idx. / Base; Displacement). Looking the instruction up in the instruction Huffman tree for the current instruction prefix (XOR, PUSH, DEC) yields decoded bits (00), and looking its field values (16bit, ECX, 3x4+0) up in the instruction encoding template yields further bits (0101). The instruction prefix then advances to PUSH, DEC, MOV, and the decoded bits accumulate (0100100110010101).]

