Syntia: Synthesizing the Semantics of Obfuscated Code
Tim Blazytko Moritz Contag Cornelius Aschermann Thorsten Holz
Ruhr-Universität Bochum
August 17, 2017
Code obfuscation
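[Figure: a program maps inputs \vec{I} = (i_1, \ldots, i_n) to outputs \vec{O} = (o_1, \ldots, o_m); after obfuscation its internals are unknown (?), but it maps the same inputs to the same outputs]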
semantics-preserving transformation
DRM systems, software protection systems, malware
Tim Blazytko (RUB) 2 / 25
Mixed Boolean-Arithmetic
x + y + z
= (((x ⊕ y) + ((x ∧ y) << 1)) ∨ z) + (((x ⊕ y) + ((x ∧ y) << 1)) ∧ z)
hard to simplify symbolically (NP-complete)
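The equivalence between x + y + z and its Mixed Boolean-Arithmetic encoding can be checked empirically. The following Python sketch evaluates both forms on random inputs; the function names and the 64-bit register width are assumptions made for illustration only.

```python
import random

MASK = (1 << 64) - 1  # assumed register width of 64 bits


def plain(x, y, z):
    """The original expression x + y + z, truncated to the register width."""
    return (x + y + z) & MASK


def mba(x, y, z):
    """The Mixed Boolean-Arithmetic encoding of the same sum."""
    s = ((x ^ y) + ((x & y) << 1)) & MASK  # equals x + y
    return ((s | z) + (s & z)) & MASK      # equals s + z


def agree_on_random_inputs(trials=1000, seed=0):
    """Return True if both expressions agree on `trials` random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.getrandbits(64) for _ in range(3))
        if plain(x, y, z) != mba(x, y, z):
            return False
    return True
```

Random testing of this kind only observes I/O behavior; it never has to simplify the expression symbolically.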
Virtual Machine-based obfuscation
[Figure: VM architecture]
VM Entry: switch from native to VM context
Native Code
Bytecode: 5b 60 97 84 66 d8 aa 11 22
Fetch, Decode, Execute
Handler Table: handler_add8, handler_mul16, handler_not8, …, handler_sub32
obfuscated code is interpreted by a virtual CPU
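The fetch, decode, execute loop of such a virtual CPU can be sketched in a few lines of Python. The handler names come from the slide; the stack-based design, the opcode numbers, and the PUSH opcode are hypothetical and do not describe any real protector.

```python
def handler_add8(stack):
    """Pop two values and push their 8-bit sum."""
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFF)


def handler_mul16(stack):
    """Pop two values and push their 16-bit product."""
    b, a = stack.pop(), stack.pop()
    stack.append((a * b) & 0xFFFF)


def handler_not8(stack):
    """Pop one value and push its 8-bit bitwise complement."""
    stack.append(~stack.pop() & 0xFF)


PUSH = 0x00  # hypothetical opcode: push the following byte onto the stack
HANDLER_TABLE = {0x01: handler_add8, 0x02: handler_mul16, 0x03: handler_not8}


def run(bytecode):
    """Interpret the bytecode and return the final VM stack."""
    stack, pc = [], 0
    while pc < len(bytecode):
        opcode = bytecode[pc]             # fetch
        pc += 1
        if opcode == PUSH:                # decode
            stack.append(bytecode[pc])
            pc += 1
        else:
            HANDLER_TABLE[opcode](stack)  # execute via the handler table
    return stack
```

For example, `run(bytes([0x00, 2, 0x00, 3, 0x01]))` pushes 2 and 3 and dispatches to `handler_add8`. An analyst who sees only the native interpreter loop has to recover what each handler computes.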
Related work
Yadegari et al. use taint analysis and symbolic execution for deobfuscation (S&P 2015)
Banescu et al. introduce code obfuscation against symbolic execution attacks (ACSAC 2016)
Contributions
orthogonal approach to traditional techniques
learn the code’s semantics based on its I/O behavior
generic approach for trace simplification via program synthesis
Syntactic versus semantic complexity
RAX = { ( ( ( ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 
0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + ( - ( ( ( ( ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 
0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + 0x5483B8CA ) * 0xA7E08A93 ) + 0x4FC311F5 ) 0 32, 0x0 32 64 }
RAX = ((M3 * M2) ^ M4)
Symbolic execution and program synthesis
                 semantics: simple        semantics: complex
syntax           symbolic   synthesis     symbolic   synthesis
simple           ✓          ✓             ✓          ✗
complex          ✗          ✓             ✗          ✗
Approach
Simplification of instruction traces
1. dissecting the trace into trace windows
2. random sampling of each trace window
3. synthesis of trace windows
Trace dissection
Split at indirect control-flow transfers
Full trace:
    mov rax, 0x8
    add rax, rbx
    jmp rdx
    inc rax
    ret
    mov rdx, 0x1
    ret
Trace window 1:
    mov rax, 0x8
    add rax, rbx
    jmp rdx
Trace window 2:
    inc rax
    ret
Trace window 3:
    mov rdx, 0x1
    ret
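Splitting a trace at indirect control-flow transfers can be sketched as follows. This is a minimal illustration that works on textual instructions; the function names and the regular expression used to recognize register operands are assumptions, and a real tool would classify operands with a disassembler.

```python
import re

# Assumed naming pattern for x86 registers such as rdx, r10 or eax.
_REGISTER = re.compile(r"^(r[a-z0-9]+|e[a-z]{2})$")


def is_indirect_transfer(instruction):
    """Return True for ret and for jmp/call through a register or memory."""
    parts = instruction.split(None, 1)
    if not parts:
        return False
    mnemonic = parts[0]
    if mnemonic == "ret":
        return True
    if mnemonic in ("jmp", "call") and len(parts) == 2:
        operand = parts[1].strip()
        return bool(_REGISTER.match(operand)) or "[" in operand
    return False


def dissect(trace):
    """Split a list of instructions into trace windows."""
    windows, current = [], []
    for instruction in trace:
        current.append(instruction)
        if is_indirect_transfer(instruction):
            windows.append(current)  # the transfer closes the window
            current = []
    if current:
        windows.append(current)
    return windows
```

Applied to the seven-instruction trace from the slide, `dissect` yields the same three windows.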
Random sampling
    mov rax, [rbp + 0x8]
    add rax, rcx
    mov [rbp + 0x8], rax
    add [rbp + 0x8], rdx
inputs: \vec{I} = (M1, rcx, rdx)
outputs: O1, O2
M1    rcx   rdx   O1    O2
2     5     7     7     14
1     7     10    8     18
6     10    15    16    31
120   27    0     147   147
...   ...   ...   ...   ...
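Random sampling treats the trace window as a black box: random inputs go in, outputs are recorded. In the sketch below the four instructions are modeled by hand in Python, which is an assumption made to keep the example self-contained; a real implementation would execute the machine code itself. The 64-bit width and the mapping of O1 to rax and O2 to the memory cell are likewise assumptions that agree with the table.

```python
import random

MASK = (1 << 64) - 1  # assumed 64-bit registers and memory cells


def trace_window(m1, rcx, rdx):
    """Hand-written model of the four instructions; returns (O1, O2)."""
    rax = m1                   # mov rax, [rbp + 0x8]
    rax = (rax + rcx) & MASK   # add rax, rcx
    mem = rax                  # mov [rbp + 0x8], rax
    mem = (mem + rdx) & MASK   # add [rbp + 0x8], rdx
    return rax, mem            # O1 = rax, O2 = memory cell (assumed order)


def sample(n, seed=0):
    """Return n pairs of (inputs, outputs) for random inputs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        inputs = tuple(rng.getrandbits(64) for _ in range(3))
        rows.append((inputs, trace_window(*inputs)))
    return rows
```

Calling `trace_window(2, 5, 7)` reproduces the first row of the table.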
Synthesis of trace windows
M1    rcx   rdx   O1    O2
2     5     7     7     14
1     7     10    8     18
6     10    15    16    31
120   27    0     147   147
...   ...   ...   ...   ...
We synthesize each output separately:
O1 = M1 + rcx
O2 = (M1 + rcx) + rdx
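To make "synthesize each output separately" concrete, the sketch below searches for an expression that matches one output column of the sample table. It uses plain brute-force enumeration over + and *, which is a deliberately simple stand-in and not the MCTS-based search described in this talk; all names are illustrative.

```python
from itertools import product

# The four sample rows from the table: ((M1, rcx, rdx), (O1, O2)).
SAMPLES = [
    ((2, 5, 7), (7, 14)),
    ((1, 7, 10), (8, 18)),
    ((6, 10, 15), (16, 31)),
    ((120, 27, 0), (147, 147)),
]
NAMES = ("M1", "rcx", "rdx")


def candidates(depth):
    """Yield (text, function) pairs for expressions up to the given depth."""
    if depth == 0:
        for index, name in enumerate(NAMES):
            yield name, (lambda env, i=index: env[i])
        return
    yield from candidates(depth - 1)
    subs = list(candidates(depth - 1))
    for (lt, lf), (rt, rf) in product(subs, repeat=2):
        yield f"({lt} + {rt})", (lambda env, a=lf, b=rf: a(env) + b(env))
        yield f"({lt} * {rt})", (lambda env, a=lf, b=rf: a(env) * b(env))


def synthesize(output_index, max_depth=2):
    """Return the first expression matching every sample for one output."""
    for depth in range(max_depth + 1):
        for text, func in candidates(depth):
            if all(func(inp) == out[output_index] for inp, out in SAMPLES):
                return text
    return None
```

Enumeration like this grows exponentially with expression depth, which is why a guided search is needed for realistic grammars.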
Program synthesis
probabilistic optimization problem
guided search towards more promising program candidates
based on Monte Carlo Tree Search (MCTS)
General idea
Input: I/O samples from program P
generate candidate program P′ (based on prior knowledge)
compare the I/O behavior of P′ to P
backpropagation
Running example
We want to synthesize
f(a, b) := (a + b) mod 2^3
The set of I/O samples is
a   b   O
2   2   4
5   3   0
3   0   3
Context-free grammar
U → U + U | U ∗ U | a | b
non-terminal symbols: U
a terminal symbol for each input: {a, b}
sentences of the grammar are candidate programs: a + b
intermediate programs contain non-terminal symbols: U + U
U ⇒ U + U ⇒ U + b ⇒ a + b
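A random derivation in this grammar can be sketched as below. As in the derivation shown on the slide, the right-most U is replaced in each step. Two details are assumptions of this sketch: binary rules are parenthesized so that the resulting string is unambiguous, and after `max_steps` steps only terminal rules are used so that the derivation is guaranteed to finish.

```python
import random

RULES = ["(U + U)", "(U * U)", "a", "b"]  # U → U + U | U ∗ U | a | b
TERMINALS = ["a", "b"]


def derive(rng, max_steps=8):
    """Derive a sentence from U and return all intermediate programs."""
    program = "U"
    steps = [program]
    while "U" in program:
        rules = RULES if len(steps) <= max_steps else TERMINALS
        head, _, tail = program.rpartition("U")  # split at the right-most U
        program = head + rng.choice(rules) + tail
        steps.append(program)
    return steps
```

Every element of the returned list except the last still contains a U and is therefore an intermediate program; the last element is a candidate program.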
Which intermediate program is more promising?
1. derive a random program candidate from the intermediate program
2. compare I/O behavior to the original program
U ∗ U ⇒ · · · ⇒ ((a + a) ∗ (b ∗ a)) ⇒ g(a, b) := ((a + a) ∗ (b ∗ a)) mod 2^3
a   b   O∗
2   2   0
5   3   6
3   0   0
U + U ⇒ · · · ⇒ (a + (b + b)) ⇒ h(a, b) := (a + (b + b)) mod 2^3
a   b   O+
2   2   6
5   3   3
3   0   3
We come back to this in a few minutes.
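The output columns of both candidates can be reproduced directly. The sketch reads the modulus as 2^3 = 8, which the sample outputs require (for example, a = 5 and b = 3 give 0 for the original program).

```python
MOD = 2 ** 3
SAMPLES = [(2, 2), (5, 3), (3, 0)]  # the (a, b) inputs from the running example


def f(a, b):
    """The program to be synthesized."""
    return (a + b) % MOD


def g(a, b):
    """Random candidate derived from U * U."""
    return ((a + a) * (b * a)) % MOD


def h(a, b):
    """Random candidate derived from U + U."""
    return (a + (b + b)) % MOD


def outputs(func):
    """Evaluate a candidate on every sample input."""
    return [func(a, b) for a, b in SAMPLES]
```

Here `outputs(f)` is `[4, 0, 3]`, `outputs(g)` is `[0, 6, 0]` and `outputs(h)` is `[6, 3, 3]`, matching the O, O∗ and O+ columns.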
Measuring output similarity
How close is the I/O behavior to the original program?
output similarity is represented by a score
score 1.0: equivalent output behavior for all samples
arithmetic mean of different similarity metrics defines the score
We compare
how close two values are numerically (arithmetic distance)
in how many bits two values differ (Hamming distance)
if two values are in the same range (leading/trailing zeros/ones)
Example: Hamming distance and leading zeros
similarity(O, O′) := (hamming(O, O′) + lz(O, O′)) / 2
U ∗ U: g(a, b)
O   O∗   hamming   lz     similarity
4   0    0.67      0      0.335
0   6    0.34      0      0.17
3   0    0.34      0.34   0.34
⇒ average similarity: 0.28
U + U: h(a, b)
O   O+   hamming   lz     similarity
4   6    0.67      1.0    0.835
0   3    0.34      0.34   0.34
3   3    1.0       1.0    1.0
⇒ average similarity: 0.73
⇒ the program candidate derived from U + U is more promising
⇒ the next generated program candidate is more likely to be based on U + U than on U ∗ U
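The two metrics of this example can be sketched as follows. The slides do not spell out how each metric is normalized; the definitions below (one minus the differing fraction of a 3-bit value) are an assumption chosen because they reproduce the slide's figures up to rounding.

```python
WIDTH = 3  # assumed bit width, since the running example computes mod 2^3


def hamming(o, o_prime):
    """Similarity in [0, 1]: 1 minus the fraction of differing bits."""
    differing = bin((o ^ o_prime) & ((1 << WIDTH) - 1)).count("1")
    return 1 - differing / WIDTH


def leading_zeros(value):
    """Number of leading zero bits of a WIDTH-bit value."""
    return WIDTH - value.bit_length()


def lz(o, o_prime):
    """Similarity in [0, 1] based on the difference in leading zeros."""
    return 1 - abs(leading_zeros(o) - leading_zeros(o_prime)) / WIDTH


def similarity(o, o_prime):
    """Arithmetic mean of the two metrics."""
    return (hamming(o, o_prime) + lz(o, o_prime)) / 2


def average_similarity(pairs):
    """Average similarity over (O, O') pairs."""
    return sum(similarity(o, p) for o, p in pairs) / len(pairs)
```

With the pairs from the two tables, the candidate derived from U + U scores clearly higher than the one derived from U ∗ U, so the search would favor U + U.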
Evaluation
simplification of Mixed Boolean-Arithmetic
Tigress Obfuscator
synthesis of arithmetic VM instruction handlers
commercial versions of VMProtect and Themida
ROP gadget analysis
Verification
All synthesis results have been verified by manual reverse engineering.
Mixed Boolean-Arithmetic
int p10(int v0, int v1, int v2, int v3, int v4) {
    int r = ((~v0) - v4);
    return r;
}
generated 500 random expressions
two stages of arithmetic encoding
synthesized 448 expressions (90%) in the first run
4 seconds per synthesis task
Probabilistic synthesis behavior
[Plot: number of synthesized expressions (y-axis, 0 to 500) against number of synthesis runs (x-axis, 0 to 10)]
Arithmetic VM instruction handler
mov r15, 0x200
xor r15, 0x800
mov rbx, rbp
add rbx, 0xc0
mov rbx, qword ptr [rbx]
mov r13, 1
mov rcx, 0
mov r15, rbp
add r15, 0xc0
or rcx, 0x88
add rbx, 0xb
mov r15, qword ptr [r15]
or r12, 0xffffffff80000000
sub rcx, 0x78
movzx r10, word ptr [rbx]
xor r12, r13
add r12, 0xffff
add r15, 0
mov r8, rbp
sub rcx, 0x10
or r12, r12
or rcx, 0x800
movzx r11, word ptr [r15]
xor rcx, 0x800
mov r12, r15
add r8, 0
xor r12, 0xf0
mov rbx, 0x58
add r11, rbp
xor rbx, 0x800
and r12, 0x20
add rbx, 0x800
mov r11, qword ptr [r11]
add rbx, 1
and r12, r9
mov rdx, 1
xor r10d, dword ptr [r8]
sub r9, r11
pushfq
xor rbx, 0xf0
xor rbx, 0x800
and rdx, r8
mov r12, rbp
xor rdx, 0x20
sub rbx, 4
add r11, 0x2549b044
or rbx, 0x78
and rdx, r10
mov rax, 0
add r12, 0x42
mov r15, rdx
xor r10d, dword ptr [r12]
sub r15, 0x800
or rdx, 0x400
mov rsi, 0x200
mov r14, rbp
sub rsi, rsi
mov rdi, rbp
mov r8, 0x400
sub rsi, r9
sub r8, rsi
add r14, 0
add rsi, rax
and r8, 0x88
xor rsi, r14
mov rsi, rbp
add rdi, 0xc0
sub r8, rdi
add r8, 0x78
add rsi, 4
mov rcx, 0x200
mov rdi, qword ptr [rdi]
add dword ptr [rsi], 0x2549b044
xor rcx, 0xf0
add rcx, r10
add rdi, 6
mov r8, 0x400
mov ax, word ptr [rdi]
mov r8, 1
mov rsi, rbp
and rcx, 8
sub rcx, 1
mov rcx, rdi
add rsi, 0x29
or rcx, 8
mov r8, rsi
add rcx, 4
mov r13b, byte ptr [rsi]
cmp r13b, 0xd2
jbe 0x4f2c1e
and r8, r13
or rcx, r13
or rcx, 4
mov rbx, rbp
or rcx, 4
sub rcx, 0x400
add rax, rbp
or rcx, 0x80
add rcx, 0x80
add rbx, 0x5a
add r8, 1
or r8, 0x78
add word ptr [rbx], r10w
mov r15, rax
sub r15, rax
pop r9
mov rcx, rbp
add rcx, 0xc0
mov rcx, qword ptr [rcx]
add rcx, 8
movzx r10, word ptr [rcx]
mov r9, rbp
add r9, 0
xor r10d, dword ptr [r9]
and rdi, 0xffffffff80000000
sub r13, 0xf0
mov rsi, 0
sub r13, 0x20
mov rbx, rbp
or r13, 0x88
and rcx, 8
mov r8, 0x58
add rbx, 0xc0
mov rbx, qword ptr [rbx]
sub rcx, 0x20
add rdi, 0x80
sub r13, 0x10
add rbx, 8
mov si, word ptr [rbx]
or r9, 0xffff
sub r9, 1
mov r9, rbp
mov r12, 0x58
add r9, 0
sub r13, 0x80
mov r15, r13
or rcx, r12
xor esi, dword ptr [r9]
mov r10, rbp
add r10, 0xcc
sub r15, 0x20
xor esi, dword ptr [r10]
xor r13, 0x90
add rdi, 0x10
mov r14, rsi
mov rdx, rbp
add rdx, 0
add dword ptr [rdx], esi
xor r12, 1
mov r13, r15
or r14, r14
mov rax, rbp
and rcx, r13
add rax, 4
sub r8, -0x80000000
add r13, 0xffff
and rcx, 0x20
mov r10, rbp
add r13, r15
add r14, r8
add r10, 0x89
xor word ptr [r10], si
xor rdx, r11
mov rsi, rbp
sub rdx, rbx
and rax, 0x40
or rbx, 0xf0
add rsi, 0x5a
mov r8, rcx
movzx rsi, word ptr [rsi]
mov rax, 0x200
mov r14, rbp
and rax, rdx
and rcx, 0x20
add r14, 0x89
or rax, 0x40
xor si, 0x7a28
add rdx, 0x78
add rdx, 0x20
movzx r14, word ptr [r14]
mov rcx, 0x58
add rsi, rbp
xor rax, rdx
add r8, 0x80
mov r15, rsi
add r14, rbp
add r8, r15
mov rbx, 0
and rdx, 0x10
mov r14, qword ptr [r14]
add qword ptr [rsi], r14
pushfq
xor r11, r14
add r15, r14
mov r13, 0x12
mov r8, 0
and r14, 0x88
and r13, 0x40
add r13, 1
mov rdx, rbp
mov r14, 0x200
add rdx, 0xc0
add r11, r14
or r15, 0x88
mov rdx, qword ptr [rdx]
add rdx, 0xa
add r11, 0x78
mov r8b, byte ptr [rdx]
cmp r8b, 0
je 0x4f2ede
mov rdx, rbp
or r11, 0x40
and r15, 1
xor r11, 0x10
add rdx, 0xc0
or r14, 4
mov r15, 0x12
mov rdx, qword ptr [rdx]
sub r11, r8
add rdx, 4
or r11, 0x80
mov r8w, word ptr [rdx]
mov r14, r8
add r8, rbp
xor r13, 4
pop r10
mov qword ptr [r8], r10
jmp 0x4f2eee
xor rsi, 0x88
xor rbx, 0xffffffff80000000
add rsi, 0x78
mov r10b, 0x68
mov r9, 0x12
or rbx, r10
and r15, 0x78
mov r14, rbp
or r9, 8
add r14, 0x29
xor rbx, rdi
and r15, 0x3f
or byte ptr [r14], r10b
mov rax, 0x58
mov r8, rbp
sub rsi, 0x78
add r8, 0x127
mov rdi, rbx
xor rbx, 0x3f
mov r8, qword ptr [r8]
xor rsi, 1
mov rax, rbp
add r15, 0x3f
or r15, 0xffffffff80000000
and rsi, r9
add rax, 0xc0
add rdi, r14
or rsi, 1
mov rax, qword ptr [rax]
and rdi, 0x7fffffff
add rax, 2
sub rsi, 4
or rbx, rsi
movzx rax, word ptr [rax]
mov r9, rbp
mov r13, 0x200
mov r10, 0x58
add r9, 0
or r10, 0x20
add eax, dword ptr [r9]
xor r10, 0x40
add eax, 0x3f505c07
add r15, 0x88
mov r12, rbp
or rdi, 0x90
add r12, 0
or rbx, 0x80
add rdi, 0xf0
mov r13, 0x400
add dword ptr [r12], eax
and rsi, r8
or r10, 8
and rbx, 0x20
and rax, 0xffff
mov r11, 0
add r13, r8
or rbx, 1
shl rax, 3
add r8, rax
or rbx, r15
sub r15, 0x10
or r11, r13
mov rbx, qword ptr [r8]
mov rdx, rbp
sub r13, 0x80
add rdx, 0xc0
add qword ptr [rdx], 0xd
jmp rbx
u64 res = M13 + M14
Arithmetic VM instruction handler
                               VMProtect   Themida
#unique trace windows          449         106
#instructions per window       49          258
#inputs per window             2           15
#outputs per window            2           10
#synthesis tasks               1,123       1,092
I/O sampling time (s)          118         60
synthesis time per task (s)    3.7         9.1
VMProtect: 194 out of 196 handlers (98%)
Themida: 34 out of 36 handlers (no I/O samples for 2 handlers)
ROP gadget analysis
    inc eax
    pop ebp
    ret
78 unique gadgets
3 inputs and 2 outputs on average
found partial semantics for 97% of the gadgets
synthesized 91% of the 178 outputs
Synthesis results:
O1 = eax + 1
O2 = esp + 4
Limitations
trace window boundaries
semantic complexity
non-deterministic functions
point functions (e.g., hash comparisons)
confusion and diffusion (cryptography)
Conclusion
traditional deobfuscation techniques are limited by the code’s syntactic complexity
program synthesis is limited by the code’s semantic complexity
⇒ succeeds where traditional approaches fail
introduced a generic approach for trace simplification
demonstrated that program synthesis is applicable to real-world obfuscated code
References I
[1] Cameron B. Browne et al. ‘A Survey of Monte Carlo Tree Search Methods’. In: IEEE Transactions on Computational Intelligence and AI in Games (2012).
Monte Carlo tree search (MCTS)
Introduction
general game playing, Computer Go
reinforcement learning
does not require much domain knowledge
efficient tree search for exponential decision trees
based on random walks and Monte Carlo simulations
synthesis as stochastic optimization problem
Monte Carlo tree search (MCTS)
Algorithm
1. node selection
   select best child node (exploration vs. exploitation trade-off)
2. node expansion
   derive new game states
3. simulation
   random playouts
   a score represents the node’s quality
4. backpropagation
   update the path’s quality
Monte Carlo tree search (MCTS)
Visualization
Figure: MCTS algorithm [1]. The four phases are Selection, Expansion, Simulation and Backpropagation; the tree policy covers selection and expansion, the default policy covers simulation.
Selection
Upper confidence bound for trees (UCT)
\bar{X}_j + C \sqrt{\frac{\ln n}{n_j}}
average child reward: \bar{X}_j
number of simulations (parent node): n
number of simulations (child node): n_j
exploration-exploitation constant: C
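The UCT value is the average child reward plus an exploration bonus that shrinks as the child is visited more often. A minimal sketch follows; the function names are illustrative, and giving unvisited children an infinite score is a common convention assumed here rather than something stated on the slide.

```python
import math


def uct(avg_reward, parent_visits, child_visits, c):
    """UCT score: average reward plus C * sqrt(ln n / n_j)."""
    if child_visits == 0:
        return math.inf  # assumed convention: try unvisited children first
    return avg_reward + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_child(children, parent_visits, c):
    """Return the index of the child with the highest UCT score.

    `children` is a list of (average reward, number of simulations) pairs.
    """
    scores = [uct(reward, parent_visits, visits, c) for reward, visits in children]
    return scores.index(max(scores))
```

With C = 0 the selection is purely greedy; a larger C favors rarely visited children.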
Selection
Simulated Annealing UCT (SA-UCT)
\bar{X}_j + T \sqrt{\frac{\ln n}{n_j}}
dynamic parameter: T = C \frac{N - i}{N}
exploration-exploitation constant: C
maximal MCTS rounds: N
current MCTS round: i
Focus shifts to exploitation over time.
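SA-UCT replaces the constant C by a parameter that decays linearly with the MCTS round. The sketch below reads the dynamic parameter as T = C · (N − i) / N; the function names are illustrative.

```python
import math


def temperature(c, max_rounds, current_round):
    """Dynamic parameter T = C * (N - i) / N."""
    return c * (max_rounds - current_round) / max_rounds


def sa_uct(avg_reward, parent_visits, child_visits, c, max_rounds, current_round):
    """SA-UCT score; child_visits must be greater than zero."""
    t = temperature(c, max_rounds, current_round)
    return avg_reward + t * math.sqrt(math.log(parent_visits) / child_visits)
```

In round 0 the score equals plain UCT with constant C; in the final round T is 0 and only the average reward counts, which is the shift towards exploitation.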
Synthesis tree
[Figure: synthesis tree rooted at U. Each node is an intermediate program in postfix notation, for example U U *, U U +, U b +, U U U + +, U a +, U U * a +, b a + and U U U * +.]
Grammar components
addition, multiplication
unary/binary minus
signed/unsigned division
signed/unsigned remainder
logical and arithmetic shifts
unary/binary bitwise operations
zero/sign extend
extract
concat
Expression derivation
U U U ∗ + ⇔ (U + (U ∗ U))
[Figure: expression tree with root +, whose children are U and ∗; ∗ has two U children]
apply a random production rule to the top-most, right-most U
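The postfix representation and the choice of which U to expand can be sketched as follows. The code interprets "top-most, right-most" as the U with the smallest tree depth, with ties broken towards the right; that reading, like all function names, is an assumption of this sketch.

```python
OPERATORS = ("+", "*")


def to_infix(program):
    """Convert a postfix token list into a fully parenthesized string."""
    stack = []
    for token in program:
        if token in OPERATORS:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {token} {right})")
        else:
            stack.append(token)
    return stack[0]


def token_depths(program):
    """Return the expression-tree depth of every token (root has depth 0)."""
    depths = [0] * len(program)
    stack = []  # each entry lists the token indices of one subtree
    for index, token in enumerate(program):
        if token in OPERATORS:
            right, left = stack.pop(), stack.pop()
            for i in left + right:
                depths[i] += 1  # both subtrees move one level down
            stack.append(left + right + [index])
        else:
            stack.append([index])
    return depths


def select_u(program):
    """Index of the top-most U, preferring the right-most on ties."""
    depths = token_depths(program)
    positions = [i for i, token in enumerate(program) if token == "U"]
    return min(positions, key=lambda i: (depths[i], -i))


def expand(program, rule):
    """Replace the selected U by the tokens of a production rule."""
    index = select_u(program)
    return program[:index] + rule + program[index + 1:]
```

For the slide's example, `to_infix("U U U * +".split())` gives `(U + (U * U))`, and `select_u` picks the U directly below the + node.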
Random playout
Algorithm
Input: set of I/O samples S
1. randomly derive terminal expression T from current node
2. reward := 0
3. for all (\vec{I}, O) ∈ S
   3.1 evaluate terminal expression O′ := T(\vec{I})
   3.2 reward := similarity(O, O′) + reward
4. return reward / |S|
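The playout algorithm above can be sketched for the running example over a, b, + and ∗ with outputs taken mod 2^3. Several parts are assumptions made to keep the sketch short: nodes are infix strings, the similarity function is a Hamming-only stand-in, the derivation is cut off after `max_steps` steps, and Python's `eval` is used to evaluate the expression that the sketch itself generates.

```python
import random

MOD = 2 ** 3
WIDTH = 3


def similarity(o, o_prime):
    """Stand-in score: 1 minus the fraction of differing bits."""
    return 1 - bin(o ^ o_prime).count("1") / WIDTH


def random_terminal_expression(node, rng, max_steps=8):
    """Randomly replace every U in the node until none is left."""
    expression, steps = node, 0
    while "U" in expression:
        if steps < max_steps:
            rules = ["(U + U)", "(U * U)", "a", "b"]
        else:
            rules = ["a", "b"]  # force termination (assumption of this sketch)
        head, _, tail = expression.rpartition("U")
        expression = head + rng.choice(rules) + tail
        steps += 1
    return expression


def playout(node, samples, rng):
    """Return the average similarity of one random playout."""
    terminal = random_terminal_expression(node, rng)   # step 1
    reward = 0.0                                       # step 2
    for (a, b), expected in samples:                   # step 3
        value = eval(terminal, {"__builtins__": {}}, {"a": a, "b": b})
        actual = value % MOD                           # step 3.1
        reward += similarity(expected, actual)         # step 3.2
    return reward / len(samples)                       # step 4
```

A node that is already the correct terminal expression, such as `(a + b)`, receives the maximal reward of 1.0 on the samples of the running example.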