
Syntia: Synthesizing the Semantics of Obfuscated Code

Tim Blazytko Moritz Contag Cornelius Aschermann Thorsten Holz

Ruhr-Universität Bochum

August 17, 2017

Code obfuscation

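original code: $\vec{I} = (i_1, \dots, i_n) \rightarrow \vec{O} = (o_1, \dots, o_m)$

obfuscated code (internals unknown): $\vec{I} = (i_1, \dots, i_n) \rightarrow \vec{O} = (o_1, \dots, o_m)$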

semantics-preserving transformation

DRM systems, software protection systems, malware

Tim Blazytko (RUB) 2 / 25

Mixed Boolean-Arithmetic

$x + y + z = (((x \oplus y) + ((x \land y) \ll 1)) \lor z) + (((x \oplus y) + ((x \land y) \ll 1)) \land z)$

hard to simplify symbolically (NP-complete)
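The identity above can be sanity-checked numerically. It holds because x ⊕ y is the carry-less sum of x and y and (x ∧ y) << 1 supplies the carries, so their sum is x + y; the same argument gives (s ∨ z) + (s ∧ z) = s + z. The following Python sketch assumes 64-bit two's-complement wrap-around and evaluates both sides on random inputs; random testing builds confidence but is not a proof.

```python
import random

MASK64 = (1 << 64) - 1  # assume 64-bit two's-complement wrap-around


def mba(x: int, y: int, z: int) -> int:
    """Right-hand side of the identity, evaluated modulo 2**64."""
    s = ((x ^ y) + ((x & y) << 1)) & MASK64  # (x XOR y) + 2 * (x AND y) == x + y
    return ((s | z) + (s & z)) & MASK64      # (s OR z) + (s AND z) == s + z


def plain(x: int, y: int, z: int) -> int:
    return (x + y + z) & MASK64


rng = random.Random(0)
for _ in range(10_000):
    x, y, z = rng.getrandbits(64), rng.getrandbits(64), rng.getrandbits(64)
    assert mba(x, y, z) == plain(x, y, z)
print("identity held on all 10,000 random samples")
```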


Virtual Machine-based obfuscation

Native code → VM entry: switch from native to VM context

Bytecode: 5b 60 97 84 66 d8 aa 11 22

Fetch → Decode → Execute

Handler table: handler_add8, handler_mul16, handler_not8, …, handler_sub32

Obfuscated code is interpreted by a virtual CPU.
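A toy interpreter makes the fetch-decode-execute loop concrete. This is a hypothetical sketch: the stack-based design, the opcode values, and the extra handlers handler_push8 and handler_halt are assumptions for illustration; only the handler names handler_add8, handler_mul16, handler_not8 and handler_sub32 come from the slide. Real VM-based obfuscators typically use considerably more complex designs.

```python
from typing import Callable, Dict, List


class VM:
    """Minimal virtual CPU state: bytecode, virtual instruction pointer, operand stack."""

    def __init__(self, bytecode: bytes) -> None:
        self.code = bytecode
        self.vip = 0
        self.stack: List[int] = []
        self.running = True


def handler_push8(vm: VM) -> None:
    vm.stack.append(vm.code[vm.vip])  # 8-bit immediate operand follows the opcode
    vm.vip += 1


def handler_add8(vm: VM) -> None:
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append((a + b) & 0xFF)


def handler_mul16(vm: VM) -> None:
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append((a * b) & 0xFFFF)


def handler_not8(vm: VM) -> None:
    vm.stack.append(~vm.stack.pop() & 0xFF)


def handler_sub32(vm: VM) -> None:
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append((a - b) & 0xFFFFFFFF)


def handler_halt(vm: VM) -> None:
    vm.running = False


# Hypothetical opcode -> handler mapping (the "handler table").
HANDLER_TABLE: Dict[int, Callable[[VM], None]] = {
    0x01: handler_push8,
    0x02: handler_add8,
    0x03: handler_mul16,
    0x04: handler_not8,
    0x05: handler_sub32,
    0xFF: handler_halt,
}


def run(bytecode: bytes) -> List[int]:
    vm = VM(bytecode)  # VM entry: switch from native to VM context
    while vm.running:
        opcode = vm.code[vm.vip]  # fetch
        vm.vip += 1
        handler = HANDLER_TABLE[opcode]  # decode via the handler table
        handler(vm)  # execute
    return vm.stack


# push 5, push 3, add8, not8, halt  ->  ~(5 + 3) & 0xFF
print(run(bytes([0x01, 5, 0x01, 3, 0x02, 0x04, 0xFF])))  # [247]
```

Each handler only sees the VM state, which is why a deobfuscator can study handlers one at a time, as in the evaluation on VMProtect and Themida.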

Related work

Yadegari et al. use taint analysis and symbolic execution for deobfuscation (S&P 2015)

Banescu et al. introduce code obfuscation against symbolic execution attacks (ACSAC 2016)

Contributions

orthogonal approach to traditional techniques

learn the code's semantics based on its I/O behavior

generic approach for trace simplification via program synthesis


Syntactic versus semantic complexityRAX = { ( ( ( ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 
0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + ( - ( ( ( ( ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 
0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + 0x5483B8CA ) * 0xA7E08A93 ) + 0x4FC311F5 ) 0 32, 0x0 32 64 }

Semantically, the expression above is simply: RAX = ((M3 * M2) ^ M4)


Symbolic execution and program synthesis

                 simple semantics            complex semantics
syntax           symbolic     synthesis      symbolic     synthesis
simple           ✓            ✓              ✓            ✗
complex          ✗            ✓              ✗            ✗


Approach

Simplification of instruction traces:

1. dissection of the trace into trace windows
2. random sampling of each trace window
3. synthesis of each trace window


Trace dissection

Split the trace at indirect control-flow transfers.

Full trace:

mov rax, 0x8
add rax, rbx
jmp rdx
inc rax
ret
mov rdx, 0x1
ret

mov rax, 0x8
add rax, rbx
jmp rdx

Trace window 1

inc rax
ret

Trace window 2

mov rdx, 0x1
ret

Trace window 3
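The dissection step can be sketched in Python, assuming the trace is available as a list of Intel-syntax instruction strings. The classification is a simplifying assumption: ret, and jmp or call with a register or memory operand, count as indirect, while direct jumps such as jmp 0x4f2eee do not end a window. A real implementation would classify instructions with a disassembler rather than by string matching.

```python
from typing import List


def is_indirect_transfer(insn: str) -> bool:
    """Heuristic: ret, or jmp/call whose operand is not an immediate address."""
    parts = insn.split(maxsplit=1)
    mnemonic = parts[0].lower()
    if mnemonic == "ret":
        return True
    if mnemonic in ("jmp", "call") and len(parts) == 2:
        operand = parts[1].strip().lower()
        is_immediate = operand.startswith("0x") or operand.isdigit()
        return not is_immediate
    return False


def dissect(trace: List[str]) -> List[List[str]]:
    """Split an instruction trace into windows ending at indirect control-flow transfers."""
    windows: List[List[str]] = []
    current: List[str] = []
    for insn in trace:
        current.append(insn)
        if is_indirect_transfer(insn):
            windows.append(current)
            current = []
    if current:  # trailing instructions without a final transfer
        windows.append(current)
    return windows


trace = ["mov rax, 0x8", "add rax, rbx", "jmp rdx", "inc rax", "ret", "mov rdx, 0x1", "ret"]
for i, window in enumerate(dissect(trace), start=1):
    print(f"Trace window {i}: {window}")
```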


Random sampling

mov rax, [rbp + 0x8]
add rax, rcx
mov [rbp + 0x8], rax
add [rbp + 0x8], rdx

inputs: $\vec{I}$ = (M1, rcx, rdx), where M1 is the memory value read from [rbp + 0x8]
outputs: O1 (rax) and O2 (the memory value at [rbp + 0x8] after the window)

M1    rcx   rdx   O1    O2
2     5     7     7     14
1     7     10    8     18
6     10    15    16    31
120   27    0     147   147
…     …     …     …     …
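Sampling this window could look like the Python sketch below: a tiny emulator of the four instructions that records (inputs, outputs) pairs. It assumes 64-bit wrap-around arithmetic and models memory as a dictionary keyed by the address expression; the function names are hypothetical, and drawing uniformly random 64-bit inputs is an assumption (the table above shows small example values).

```python
import random
from typing import List, Tuple

MASK64 = (1 << 64) - 1  # 64-bit registers wrap around


def run_window(m1: int, rcx: int, rdx: int) -> Tuple[int, int]:
    """Emulate the trace window; returns (O1, O2) = (rax, qword at [rbp + 0x8])."""
    mem = {"rbp+0x8": m1}                                # M1: memory input
    rax = mem["rbp+0x8"]                                 # mov rax, [rbp + 0x8]
    rax = (rax + rcx) & MASK64                           # add rax, rcx
    mem["rbp+0x8"] = rax                                 # mov [rbp + 0x8], rax
    mem["rbp+0x8"] = (mem["rbp+0x8"] + rdx) & MASK64     # add [rbp + 0x8], rdx
    return rax, mem["rbp+0x8"]


Sample = Tuple[Tuple[int, int, int], Tuple[int, int]]


def sample_window(n: int, seed: int = 0) -> List[Sample]:
    """Draw n random input vectors (M1, rcx, rdx) and record the outputs."""
    rng = random.Random(seed)
    samples: List[Sample] = []
    for _ in range(n):
        inputs = (rng.getrandbits(64), rng.getrandbits(64), rng.getrandbits(64))
        samples.append((inputs, run_window(*inputs)))
    return samples


print(run_window(2, 5, 7))     # (7, 14), first row of the table
print(run_window(120, 27, 0))  # (147, 147)
```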


Synthesis of trace windows

M1    rcx   rdx   O1    O2
2     5     7     7     14
1     7     10    8     18
6     10    15    16    31
120   27    0     147   147
…     …     …     …     …

We synthesize each output separately:

O1 = M1 + rcx

O2 = (M1 + rcx) + rdx


Program synthesis

probabilistic optimization problem

guided search towards more promising program candidates

based on Monte Carlo Tree Search (MCTS)

General idea

Input: I/O samples from program P

1. generate a candidate program P′ (based on prior knowledge)
2. compare the I/O behavior of P′ to P
3. backpropagation: update the knowledge used to generate the next candidate


Running example

We want to synthesize

$f(a, b) := (a + b) \bmod 2^3$

The set of I/O samples is

a   b   O
2   2   4
5   3   0
3   0   3


Context-free grammar

U → U + U | U ∗ U | a | b

non-terminal symbols: U

a terminal symbol for each input: {a, b}

sentences of the grammar are candidate programs: a + b

intermediate programs contain non-terminal symbols: U + U

U ⇒ U + U ⇒ U + b ⇒ a + b


Which intermediate program is more promising?

1. derive a random program candidate from the intermediate program

2. compare I/O behavior to the original program

U ∗ U ⇒ · · · ⇒ ((a + a) ∗ (b ∗ a)) ⇒ $g(a, b) := ((a + a) \ast (b \ast a)) \bmod 2^3$

a   b   O∗
2   2   0
5   3   6
3   0   0

U + U ⇒ · · · ⇒ (a + (b + b)) ⇒ $h(a, b) := (a + (b + b)) \bmod 2^3$

a   b   O+
2   2   6
5   3   3
3   0   3

We will come back to this in a few minutes.
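Reading the modulus as 2^3 = 8 (3-bit arithmetic), the target f and the two derived candidates g and h reproduce the sample tables exactly. A small Python check (the function names follow the slides; the code itself is only an illustration):

```python
MOD = 2 ** 3  # 3-bit arithmetic


def f(a: int, b: int) -> int:  # target program
    return (a + b) % MOD


def g(a: int, b: int) -> int:  # candidate derived from U * U
    return ((a + a) * (b * a)) % MOD


def h(a: int, b: int) -> int:  # candidate derived from U + U
    return (a + (b + b)) % MOD


inputs = [(2, 2), (5, 3), (3, 0)]
print([f(a, b) for a, b in inputs])  # [4, 0, 3]
print([g(a, b) for a, b in inputs])  # [0, 6, 0]
print([h(a, b) for a, b in inputs])  # [6, 3, 3]
```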

Measuring output similarity

How close is the I/O behavior to the original program?

output similarity is represented by a score

score 1.0: equivalent output behavior for all samples

arithmetic mean of different similarity metrics defines the score

We compare

how close two values are numerically (arithmetic distance)

in how many bits two values differ (Hamming distance)

if two values are in the same range (leading/trailing zeros/ones)


Example: Hamming distance and leading zeros

$\mathrm{similarity}(O, O') := \frac{\mathrm{hamming}(O, O') + \mathrm{lz}(O, O')}{2}$

U ∗ U: g(a, b)

O   O∗   hamming   lz     similarity
4   0    0.67      0      0.335
0   6    0.34      0      0.17
3   0    0.34      0.34   0.34

⇒ average similarity: 0.28

U + U: h(a, b)

O   O+   hamming   lz     similarity
4   6    0.67      1.0    0.835
0   3    0.34      0.34   0.34
3   3    1.0       1.0    1.0

⇒ average similarity: 0.73
⇒ the program candidate derived from U + U is more promising
⇒ the next program candidate is more likely to be derived from U + U than from U ∗ U
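The two metrics can be made concrete for 3-bit values. The per-metric definitions below are an assumption reconstructed from the table values, not stated on the slides: Hamming similarity is 1 − (number of differing bits)/3, and leading-zero similarity is 1 − |lz(O) − lz(O′)|/3. With exact fractions the averages come out as 5/18 ≈ 0.28 and 13/18 ≈ 0.72; the slide's 0.73 results from averaging the already rounded per-sample values. The other metrics (arithmetic distance, trailing zeros, leading and trailing ones) are omitted here.

```python
from fractions import Fraction
from typing import List

BITS = 3


def hamming_sim(x: int, y: int) -> Fraction:
    """1 - (number of differing bits) / BITS."""
    return 1 - Fraction(bin(x ^ y).count("1"), BITS)


def leading_zeros(x: int) -> int:
    return BITS - x.bit_length()


def lz_sim(x: int, y: int) -> Fraction:
    """1 - |lz(x) - lz(y)| / BITS."""
    return 1 - Fraction(abs(leading_zeros(x) - leading_zeros(y)), BITS)


def similarity(o: int, o_prime: int) -> Fraction:
    return (hamming_sim(o, o_prime) + lz_sim(o, o_prime)) / 2


def average_similarity(outputs: List[int], candidate_outputs: List[int]) -> Fraction:
    scores = [similarity(o, c) for o, c in zip(outputs, candidate_outputs)]
    return sum(scores) / len(scores)


O = [4, 0, 3]       # f(a, b) on the samples
O_mul = [0, 6, 0]   # g(a, b), derived from U * U
O_add = [6, 3, 3]   # h(a, b), derived from U + U
print(average_similarity(O, O_mul))  # 5/18
print(average_similarity(O, O_add))  # 13/18
```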


Evaluation

simplification of Mixed Boolean-Arithmetic

Tigress Obfuscator

synthesis of arithmetic VM instruction handlers

commercial versions of VMProtect and Themida

ROP gadget analysis

Verification: all synthesis results have been verified by manual reverse engineering.


Mixed Boolean-Arithmetic

int p10(int v0, int v1, int v2, int v3, int v4) {
    int r = ((~v0) - v4);
    return r;
}

generated 500 random expressions

two stages of arithmetic encoding

synthesized 448 expressions (90%) in the first run

4 seconds per synthesis task


Probabilistic synthesis behavior

Figure: number of synthesized expressions (y-axis, 0 to 500) over the number of synthesis runs (x-axis, 0 to 10)


Arithmetic VM instruction handler

mov r15, 0x200
xor r15, 0x800
mov rbx, rbp
add rbx, 0xc0
mov rbx, qword ptr [rbx]
mov r13, 1
mov rcx, 0
mov r15, rbp
add r15, 0xc0
or rcx, 0x88
add rbx, 0xb
mov r15, qword ptr [r15]
or r12, 0xffffffff80000000
sub rcx, 0x78
movzx r10, word ptr [rbx]
xor r12, r13
add r12, 0xffff
add r15, 0
mov r8, rbp
sub rcx, 0x10
or r12, r12
or rcx, 0x800
movzx r11, word ptr [r15]
xor rcx, 0x800
mov r12, r15
add r8, 0
xor r12, 0xf0
mov rbx, 0x58
add r11, rbp
xor rbx, 0x800
and r12, 0x20
add rbx, 0x800
mov r11, qword ptr [r11]
add rbx, 1
and r12, r9
mov rdx, 1
xor r10d, dword ptr [r8]
sub r9, r11
pushfq
xor rbx, 0xf0
xor rbx, 0x800
and rdx, r8
mov r12, rbp
xor rdx, 0x20
sub rbx, 4
add r11, 0x2549b044
or rbx, 0x78
and rdx, r10
mov rax, 0
add r12, 0x42
mov r15, rdx
xor r10d, dword ptr [r12]
sub r15, 0x800
or rdx, 0x400
mov rsi, 0x200
mov r14, rbp
sub rsi, rsi
mov rdi, rbp
mov r8, 0x400
sub rsi, r9
sub r8, rsi
add r14, 0
add rsi, rax
and r8, 0x88
xor rsi, r14
mov rsi, rbp
add rdi, 0xc0
sub r8, rdi
add r8, 0x78
add rsi, 4
mov rcx, 0x200
mov rdi, qword ptr [rdi]
add dword ptr [rsi], 0x2549b044
xor rcx, 0xf0
add rcx, r10
add rdi, 6
mov r8, 0x400
mov ax, word ptr [rdi]
mov r8, 1
mov rsi, rbp
and rcx, 8
sub rcx, 1
mov rcx, rdi
add rsi, 0x29
or rcx, 8
mov r8, rsi
add rcx, 4
mov r13b, byte ptr [rsi]
cmp r13b, 0xd2
jbe 0x4f2c1e
and r8, r13
or rcx, r13
or rcx, 4
mov rbx, rbp
or rcx, 4
sub rcx, 0x400
add rax, rbp
or rcx, 0x80
add rcx, 0x80
add rbx, 0x5a
add r8, 1
or r8, 0x78
add word ptr [rbx], r10w
mov r15, rax
sub r15, rax
pop r9
mov rcx, rbp
add rcx, 0xc0
mov rcx, qword ptr [rcx]
add rcx, 8
movzx r10, word ptr [rcx]
mov r9, rbp
add r9, 0
xor r10d, dword ptr [r9]
and rdi, 0xffffffff80000000
sub r13, 0xf0
mov rsi, 0
sub r13, 0x20
mov rbx, rbp
or r13, 0x88
and rcx, 8
mov r8, 0x58
add rbx, 0xc0
mov rbx, qword ptr [rbx]
sub rcx, 0x20
add rdi, 0x80
sub r13, 0x10
add rbx, 8
mov si, word ptr [rbx]
or r9, 0xffff
sub r9, 1
mov r9, rbp
mov r12, 0x58
add r9, 0
sub r13, 0x80
mov r15, r13
or rcx, r12
xor esi, dword ptr [r9]
mov r10, rbp
add r10, 0xcc
sub r15, 0x20
xor esi, dword ptr [r10]
xor r13, 0x90
add rdi, 0x10
mov r14, rsi
mov rdx, rbp
add rdx, 0
add dword ptr [rdx], esi
xor r12, 1
mov r13, r15
or r14, r14
mov rax, rbp
and rcx, r13
add rax, 4
sub r8, -0x80000000
add r13, 0xffff
and rcx, 0x20
mov r10, rbp
add r13, r15
add r14, r8
add r10, 0x89
xor word ptr [r10], si
xor rdx, r11
mov rsi, rbp
sub rdx, rbx
and rax, 0x40
or rbx, 0xf0
add rsi, 0x5a
mov r8, rcx
movzx rsi, word ptr [rsi]
mov rax, 0x200
mov r14, rbp
and rax, rdx
and rcx, 0x20
add r14, 0x89
or rax, 0x40
xor si, 0x7a28
add rdx, 0x78
add rdx, 0x20
movzx r14, word ptr [r14]
mov rcx, 0x58
add rsi, rbp
xor rax, rdx
add r8, 0x80
mov r15, rsi
add r14, rbp
add r8, r15
mov rbx, 0
and rdx, 0x10
mov r14, qword ptr [r14]
add qword ptr [rsi], r14
pushfq
xor r11, r14
add r15, r14
mov r13, 0x12
mov r8, 0
and r14, 0x88
and r13, 0x40
add r13, 1
mov rdx, rbp
mov r14, 0x200
add rdx, 0xc0
add r11, r14
or r15, 0x88
mov rdx, qword ptr [rdx]
add rdx, 0xa
add r11, 0x78
mov r8b, byte ptr [rdx]
cmp r8b, 0
je 0x4f2ede
mov rdx, rbp
or r11, 0x40
and r15, 1
xor r11, 0x10
add rdx, 0xc0
or r14, 4
mov r15, 0x12
mov rdx, qword ptr [rdx]
sub r11, r8
add rdx, 4
or r11, 0x80
mov r8w, word ptr [rdx]
mov r14, r8
add r8, rbp
xor r13, 4
pop r10
mov qword ptr [r8], r10
jmp 0x4f2eee
xor rsi, 0x88
xor rbx, 0xffffffff80000000
add rsi, 0x78
mov r10b, 0x68
mov r9, 0x12
or rbx, r10
and r15, 0x78
mov r14, rbp
or r9, 8
add r14, 0x29
xor rbx, rdi
and r15, 0x3f
or byte ptr [r14], r10b
mov rax, 0x58
mov r8, rbp
sub rsi, 0x78
add r8, 0x127
mov rdi, rbx
xor rbx, 0x3f
mov r8, qword ptr [r8]
xor rsi, 1
mov rax, rbp
add r15, 0x3f
or r15, 0xffffffff80000000
and rsi, r9
add rax, 0xc0
add rdi, r14
or rsi, 1
mov rax, qword ptr [rax]
and rdi, 0x7fffffff
add rax, 2
sub rsi, 4
or rbx, rsi
movzx rax, word ptr [rax]
mov r9, rbp
mov r13, 0x200
mov r10, 0x58
add r9, 0
or r10, 0x20
add eax, dword ptr [r9]
xor r10, 0x40
add eax, 0x3f505c07
add r15, 0x88
mov r12, rbp
or rdi, 0x90
add r12, 0
or rbx, 0x80
add rdi, 0xf0
mov r13, 0x400
add dword ptr [r12], eax
and rsi, r8
or r10, 8
and rbx, 0x20
and rax, 0xffff
mov r11, 0
add r13, r8
or rbx, 1
shl rax, 3
add r8, rax
or rbx, r15
sub r15, 0x10
or r11, r13
mov rbx, qword ptr [r8]
mov rdx, rbp
sub r13, 0x80
add rdx, 0xc0
add qword ptr [rdx], 0xd
jmp rbx

Synthesized semantics: u64 res = M13 + M14


Arithmetic VM instruction handler

                              VMProtect   Themida
# unique trace windows        449         106
# instructions per window     49          258
# inputs per window           2           15
# outputs per window          2           10
# synthesis tasks             1,123       1,092
I/O sampling time (s)         118         60
synthesis time per task (s)   3.7         9.1

VMProtect: 194 out of 196 handlers (98%)

Themida: 34 out of 36 handlers (no I/O samples for 2 handlers)


ROP gadget analysis

inc eax
pop ebp
ret

78 unique gadgets

3 inputs and 2 outputs on average

found partial semantics for 97% of the gadgets

synthesized 91% of the 178 outputs

Synthesis results:

O1 = eax + 1
O2 = esp + 4


Limitations

trace window boundaries

semantic complexity

non-deterministic functions

point functions (e.g., hash comparisons)

confusion and diffusion (cryptography)


Conclusion

traditional deobfuscation techniques are limited by the code's syntactic complexity

program synthesis is limited by the code’s semantic complexity

⇒ succeeds where traditional approaches fail

introduced a generic approach for trace simplification

demonstrated that program synthesis is applicable to real-world obfuscated code


References I

[1] Cameron B. Browne et al. 'A Survey of Monte Carlo Tree Search Methods'. In: IEEE Transactions on Computational Intelligence and AI in Games (2012).


Monte Carlo tree search (MCTS): Introduction

general game playing, Computer Go

reinforcement learning

does not require much domain knowledge

efficient tree search for exponential decision trees

based on random walks and Monte Carlo simulations

synthesis as stochastic optimization problem


Monte Carlo tree search (MCTS): Algorithm

1. node selection

select best child node (exploration vs. exploitation trade-off)

2. node expansion

derive new game states

3. simulation

random playouts; a score represents the node's quality

4. backpropagation

update the path’s quality


Monte Carlo tree search (MCTS): Visualization

Figure: MCTS algorithm [1]. One iteration consists of selection and expansion (tree policy), simulation (default policy), and backpropagation.


Selection: upper confidence bound for trees (UCT)

$\overline{X}_j + C \sqrt{\frac{\ln n}{n_j}}$

average child reward: $\overline{X}_j$

number of simulations (parent node): n

number of simulations (child node): nj

exploration-exploitation constant: C
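A minimal Python sketch of UCT-based child selection. The Node class, treating unvisited children as infinitely promising, and the default C = √2 are assumptions (common choices in the MCTS literature), not details taken from the slides. The example shows the trade-off: a large C favors the rarely visited child, a small C favors the child with the higher average reward.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    total_reward: float = 0.0
    visits: int = 0
    children: List["Node"] = field(default_factory=list)

    @property
    def avg_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


def uct(child: Node, parent_visits: int, c: float) -> float:
    """Average child reward plus the exploration term C * sqrt(ln n / n_j)."""
    if child.visits == 0:
        return math.inf  # try every child at least once
    return child.avg_reward + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    return max(parent.children, key=lambda ch: uct(ch, parent.visits, c))


a = Node(total_reward=6.0, visits=8)  # average 0.75, well explored
b = Node(total_reward=1.0, visits=2)  # average 0.50, rarely explored
parent = Node(visits=10, children=[a, b])
print(select_child(parent, c=math.sqrt(2)) is b)  # True: exploration wins
print(select_child(parent, c=0.1) is a)           # True: exploitation wins
```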


Selection: Simulated Annealing UCT (SA-UCT)

$\overline{X}_j + T \sqrt{\frac{\ln n}{n_j}}$

dynamic parameter: $T = C \cdot \frac{N - i}{N}$

exploration-exploitation constant: C

maximal MCTS rounds: N

current MCTS round: i

Focus shifts to exploitation over time.
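With T in place of C, the exploration weight shrinks linearly from C in the first round to 0 in the last, so late in the search children are ranked by their average reward alone. A minimal Python sketch; the function names are hypothetical.

```python
import math


def temperature(c: float, i: int, n_rounds: int) -> float:
    """T = C * (N - i) / N: starts at C and decays linearly to 0."""
    return c * (n_rounds - i) / n_rounds


def sa_uct(avg_reward: float, parent_visits: int, child_visits: int,
           c: float, i: int, n_rounds: int) -> float:
    """UCT score with the constant C replaced by the round-dependent T."""
    t = temperature(c, i, n_rounds)
    return avg_reward + t * math.sqrt(math.log(parent_visits) / child_visits)


N = 100
for i in (0, 50, 100):
    print(i, temperature(c=2.0, i=i, n_rounds=N))
# 0 2.0
# 50 1.0
# 100 0.0
```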


Synthesis tree

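Figure: synthesis tree. The root U is expanded into intermediate programs written in reverse Polish notation (e.g. U U * for U ∗ U and U U + for U + U); deeper nodes such as U b +, U a + and U U U * + are derived from these, down to terminal programs over a and b.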


Grammar components

addition, multiplication

unary/binary minus

signed/unsigned division

signed/unsigned remainder

logical and arithmetic shifts

unary/binary bitwise operations

zero/sign extend

extract

concat


Expression derivation

U U U ∗ + ⇔ (U + (U ∗ U))

As a tree: the root + has the children U and ∗; the ∗ node has the children U and U.

apply a random production rule to the top-most, right-most U
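The derivation step can be sketched on an expression tree. The slide does not define "top-most, right-most" precisely; this Python sketch assumes it means the shallowest U, with ties broken toward the right, which is consistent with the derivation U ⇒ U + U ⇒ U + b ⇒ a + b from the grammar slide. The tuple representation and all function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple, Union

# Hypothetical representation: "U", "a", "b" are leaves; (op, left, right) is an inner node.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]
Path = Tuple[int, ...]  # 0 = left child, 1 = right child

PRODUCTIONS: List[Expr] = [("+", "U", "U"), ("*", "U", "U"), "a", "b"]


def find_top_right_u(expr: Expr) -> Optional[Path]:
    """Breadth-first search: shallowest U first, ties broken toward the right."""
    level: List[Tuple[Path, Expr]] = [((), expr)]
    while level:
        for path, node in reversed(level):  # right-most node of this depth first
            if node == "U":
                return path
        next_level: List[Tuple[Path, Expr]] = []
        for path, node in level:
            if isinstance(node, tuple):
                next_level.append((path + (0,), node[1]))
                next_level.append((path + (1,), node[2]))
        level = next_level
    return None  # no U left: expr is a terminal program


def replace_at(expr: Expr, path: Path, new: Expr) -> Expr:
    if not path:
        return new
    op, left, right = expr
    if path[0] == 0:
        return (op, replace_at(left, path[1:], new), right)
    return (op, left, replace_at(right, path[1:], new))


def derive_step(expr: Expr, rng: random.Random) -> Expr:
    """Apply a random production rule to the top-most, right-most U."""
    path = find_top_right_u(expr)
    if path is None:
        return expr
    return replace_at(expr, path, rng.choice(PRODUCTIONS))


def to_rpn(expr: Expr) -> str:
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"{to_rpn(left)} {to_rpn(right)} {op}"


def to_infix(expr: Expr) -> str:
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_infix(left)} {op} {to_infix(right)})"


e: Expr = ("+", "U", ("*", "U", "U"))
print(to_rpn(e), "<=>", to_infix(e))  # U U U * + <=> (U + (U * U))
print(find_top_right_u(e))            # (0,): the left operand of +
print(to_infix(derive_step(e, random.Random(0))))
```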


Random playout

Algorithm

Input: set of I/O samples S

1. randomly derive a terminal expression T from the current node
2. reward := 0
3. for all $(\vec{I}, O) \in S$:
   3.1 evaluate the terminal expression: $O' := T(\vec{I})$
   3.2 reward := similarity(O, O′) + reward
4. return reward / |S|
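The playout can be sketched in Python for the running example (grammar U → U + U | U ∗ U | a | b, arithmetic modulo 2^3). Two assumptions are labeled in the code: a depth limit forces terminal symbols so that random derivation terminates, and the similarity is the mean of Hamming and leading-zero similarity with per-bit definitions assumed here rather than taken from the slides. All names are hypothetical.

```python
import random
from typing import List, Tuple, Union

BITS = 3
MOD = 1 << BITS  # 3-bit arithmetic, i.e. modulo 2**3
Expr = Union[str, Tuple[str, "Expr", "Expr"]]  # "U", "a", "b" or (op, left, right)
Sample = Tuple[Tuple[int, int], int]           # ((a, b), O)


def random_terminal_expr(expr: Expr, rng: random.Random,
                         depth: int = 0, max_depth: int = 3) -> Expr:
    """Replace every U by a random derivation of U -> U + U | U * U | a | b."""
    if expr == "U":
        # Assumed depth limit: beyond max_depth only a or b may be chosen.
        choice = rng.choice(["a", "b"] if depth >= max_depth else ["+", "*", "a", "b"])
        if choice in ("a", "b"):
            return choice
        return (choice,
                random_terminal_expr("U", rng, depth + 1, max_depth),
                random_terminal_expr("U", rng, depth + 1, max_depth))
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op,
                random_terminal_expr(left, rng, depth + 1, max_depth),
                random_terminal_expr(right, rng, depth + 1, max_depth))
    return expr  # terminal symbol a or b


def evaluate(expr: Expr, a: int, b: int) -> int:
    if expr == "a":
        return a % MOD
    if expr == "b":
        return b % MOD
    op, left, right = expr
    x, y = evaluate(left, a, b), evaluate(right, a, b)
    return (x + y) % MOD if op == "+" else (x * y) % MOD


def similarity(o: int, o_prime: int) -> float:
    """Assumed metric: mean of Hamming and leading-zero similarity on BITS-bit values."""
    hamming = 1 - bin(o ^ o_prime).count("1") / BITS
    lz = 1 - abs(o.bit_length() - o_prime.bit_length()) / BITS
    return (hamming + lz) / 2


def random_playout(node_expr: Expr, samples: List[Sample], rng: random.Random) -> float:
    terminal = random_terminal_expr(node_expr, rng)  # 1. derive terminal expression T
    reward = 0.0                                     # 2. reward := 0
    for (a, b), o in samples:                        # 3. for all (I, O) in S
        o_prime = evaluate(terminal, a, b)           # 3.1 O' := T(I)
        reward += similarity(o, o_prime)             # 3.2 accumulate similarity
    return reward / len(samples)                     # 4. return reward / |S|


S = [((2, 2), 4), ((5, 3), 0), ((3, 0), 3)]  # running example, f(a, b) = (a + b) mod 2**3
rng = random.Random(0)
print(random_playout(("+", "U", "U"), S, rng))
print(random_playout(("*", "U", "U"), S, rng))
```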


Backpropagation

Algorithm

Input: current node n

1. while n ≠ root:
   1.1 update the node's average reward
   1.2 increment the node's playout count
   1.3 n := n.parent
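A minimal Python sketch of this loop. It stops at the root exactly as the pseudocode states; since UCT selection at the root uses the root's simulation count n, an implementation would have to track that count elsewhere or extend the loop to include the root, which the slide does not specify. The Node class and function name are hypothetical, and the function assumes root is an ancestor of the starting node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    parent: Optional["Node"] = None
    avg_reward: float = 0.0
    playouts: int = 0


def backpropagate(n: Node, reward: float, root: Node) -> None:
    """Propagate a playout reward from n up the path, stopping at the root."""
    while n is not root:
        # 1.1 update the node's average reward (running mean including the new reward)
        n.avg_reward = (n.avg_reward * n.playouts + reward) / (n.playouts + 1)
        n.playouts += 1  # 1.2 increment the node's playout count
        n = n.parent     # 1.3 move one level up


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, 1.0, root)
backpropagate(leaf, 0.0, root)
print(leaf.avg_reward, leaf.playouts, root.playouts)  # 0.5 2 0
```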


