  • Syntia: Synthesizing the Semantics of Obfuscated Code

    Tim Blazytko Moritz Contag Cornelius Aschermann Thorsten Holz

    Ruhr-Universität Bochum

    August 17, 2017

  • Code obfuscation

    $\vec{I} = (i_1, \dots, i_n)$, $\vec{O} = (o_1, \dots, o_m)$

    ?: $\vec{I} = (i_1, \dots, i_n)$, $\vec{O} = (o_1, \dots, o_m)$

    semantics-preserving transformation

    DRM systems, software protection systems, malware

    Tim Blazytko (RUB) 2 / 25

  • Mixed Boolean-Arithmetic

    x + y + z(((x y) + ((x y)

  • Virtual Machine-based obfuscation

    VM Entry: switch from native to VM context

    Native Code

    Bytecode: 5b 60 97 84 66 d8 aa 11 22

    Fetch → Decode → Execute

    Handler Table: handler_add8, handler_mul16, handler_not8, handler_sub32

    The obfuscated code is interpreted by a virtual CPU.
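The fetch, decode, and execute cycle of such a virtual CPU can be sketched as a small stack machine. This is only a toy model: the handler names come from the slide, but the opcode numbers, the `PUSH` opcode, the stack-based design, and the handler semantics are assumptions made for illustration, not the layout of any real protector.

```python
# Toy VM interpreter. Opcode numbers, PUSH, and handler semantics are hypothetical.

def handler_add8(stack):
    b, a = stack.pop(), stack.pop()
    stack.append((a + b) & 0xFF)

def handler_mul16(stack):
    b, a = stack.pop(), stack.pop()
    stack.append((a * b) & 0xFFFF)

def handler_not8(stack):
    stack.append(~stack.pop() & 0xFF)

def handler_sub32(stack):
    b, a = stack.pop(), stack.pop()
    stack.append((a - b) & 0xFFFFFFFF)

# Handler table: maps an opcode byte to the handler that implements it.
HANDLER_TABLE = {
    0x01: handler_add8,
    0x02: handler_mul16,
    0x03: handler_not8,
    0x04: handler_sub32,
}
PUSH = 0x00  # hypothetical opcode: push the following byte onto the stack

def run_vm(bytecode):
    stack, vip = [], 0  # vip: virtual instruction pointer
    while vip < len(bytecode):
        opcode = bytecode[vip]            # fetch
        vip += 1
        if opcode == PUSH:                # decode
            stack.append(bytecode[vip])
            vip += 1
        else:
            HANDLER_TABLE[opcode](stack)  # execute
    return stack

# push 200, push 100, add8, not8
RESULT = run_vm(bytes([0x00, 200, 0x00, 100, 0x01, 0x03]))
```

An analyst who only sees the interpreter loop has to recover what each handler computes before the bytecode becomes readable, which is the problem the following slides address.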

  • Related work

    Yadegari et al. use taint analysis and symbolic execution for deobfuscation (S&P 2015)

    Banescu et al. introduce code obfuscation against symbolic execution attacks (ACSAC 2016)

    Contributions

    orthogonal approach to traditional techniques

    learn the code's semantics based on its I/O behavior

    generic approach for trace simplification via program synthesis

  • Syntactic versus semantic complexityRAX = { ( ( ( ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 
0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + ( - ( ( ( ( ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 
0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xB03CEE0B ) ) + ( - ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0xA3665E57 ) * 0xB03CEE0B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + ( { ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) 0 32, ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + ( - ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) ) + 0x6DB7E0E ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? 
( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + ( { ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) 0 32, ( ( ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) + 0x4FC311F5 ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) ) + 0x4FC311F5 ) ) * 0x55BE239B ) + ( - ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) ) + 0x5C99A1A9 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } * { ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [0:32] 0 32, ( ( { ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + ( ( ( ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF0 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0x581F756D ) + 0xB03CEE0A ) | ( ( - ( ( ( @32[ ( RBP_init + 0xFFFFFFFFFFFFFFF8 ) ] * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) ) + 0xB03CEE0A ) ) * 0x55BE239B ) + 0x5C99A1A9 ) * 0xA7E08A93 ) 0 32, 0x0 32 64 } + 0x1 ) [31:32]? ( 0xFFFFFFFF,0x0 ) ) 32 64 } ) [0:32] + 0x5483B8CA ) * 0xA7E08A93 ) + 0x4FC311F5 ) 0 32, 0x0 32 64 }

    RAX = ((M3 * M2) ^ M4)

  • Symbolic execution and program synthesis

                      semantics: simple         semantics: complex
    syntax            symbolic   synthesis      symbolic   synthesis
    simple            ✓          ✓              ✓          ✗
    complex           ✗          ✓              ✗          ✗

  • Approach

    Simplification of instruction traces

    1. dissecting the trace into trace windows
    2. random sampling of each trace window
    3. synthesis of trace windows

  • Trace dissection

    Split at indirect control-flow transfers

    mov rax, 0x8
    add rax, rbx
    jmp rdx
    inc rax
    ret
    mov rdx, 0x1
    ret

    Trace window 1

    mov rax, 0x8
    add rax, rbx
    jmp rdx

    Trace window 2

    inc rax
    ret

    Trace window 3

    mov rdx, 0x1
    ret
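A minimal sketch of this splitting step, applied to the seven-instruction trace from the slide. The helper names and the textual test for an indirect transfer (a `ret`, or a `jmp`/`call` whose operand is not an immediate address) are assumptions for illustration; a real implementation would work on decoded instructions rather than strings.

```python
def is_indirect_transfer(insn):
    """Heuristic on the instruction text (an assumption, not a full decoder)."""
    parts = insn.split(None, 1)
    mnemonic = parts[0]
    operand = parts[1].strip() if len(parts) > 1 else ""
    if mnemonic == "ret":
        return True
    if mnemonic in ("jmp", "call"):
        # Direct targets are immediates such as 0x4f2eee.
        return not operand.startswith("0x")
    return False

def dissect(trace):
    """Split a trace into windows, each ending at an indirect transfer."""
    windows, current = [], []
    for insn in trace:
        current.append(insn)
        if is_indirect_transfer(insn):
            windows.append(current)
            current = []
    if current:  # trailing instructions without a closing transfer
        windows.append(current)
    return windows

TRACE = [
    "mov rax, 0x8", "add rax, rbx", "jmp rdx",
    "inc rax", "ret",
    "mov rdx, 0x1", "ret",
]
WINDOWS = dissect(TRACE)
```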

  • Random sampling

    mov rax, [rbp + 0x8]
    add rax, rcx
    mov [rbp + 0x8], rax
    add [rbp + 0x8], rdx

    inputs: $\vec{I}$ = (M1, rcx, rdx)
    outputs: O1, O2

    M1    rcx   rdx   O1    O2
    2     5     7     7     14
    1     7     10    8     18
    6     10    15    16    31
    120   27    0     147   147
    ...   ...   ...   ...   ...
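The sampling step can be sketched by emulating the four instructions by hand and feeding them random inputs. The mapping of O1 to the final value of `rax` and of O2 to the final memory cell at `[rbp + 0x8]` is an assumption, chosen because it reproduces the rows of the table; the function names and the 64-bit wrap-around are likewise illustrative.

```python
import random

MASK64 = (1 << 64) - 1

def trace_window(m1, rcx, rdx):
    """Hand-written emulation of the window; returns (O1, O2)."""
    rax = m1                      # mov rax, [rbp + 0x8]
    rax = (rax + rcx) & MASK64    # add rax, rcx
    mem = rax                     # mov [rbp + 0x8], rax
    mem = (mem + rdx) & MASK64    # add [rbp + 0x8], rdx
    return rax, mem               # assumed: O1 = rax, O2 = memory cell

def sample_io(num_samples, rng):
    """Observe the window as a black box on random 64-bit inputs."""
    samples = []
    for _ in range(num_samples):
        inputs = tuple(rng.getrandbits(64) for _ in range(3))
        samples.append((inputs, trace_window(*inputs)))
    return samples

SAMPLES = sample_io(20, random.Random(0))
```

In the setting of the talk the outputs would come from executing the real trace window on each input; the emulation here only stands in for that execution.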

  • Synthesis of trace windows

    M1    rcx   rdx   O1    O2
    2     5     7     7     14
    1     7     10    8     18
    6     10    15    16    31
    120   27    0     147   147
    ...   ...   ...   ...   ...

    We synthesize each output separately:

    O1 = M1 + rcx

    O2 = (M1 + rcx) + rdx


  • Program synthesis

    probabilistic optimization problem

    guided search towards more promising program candidates

    based on Monte Carlo Tree Search (MCTS)

    General idea

    Input: I/O samples from program P

    generate a candidate program P' (based on prior knowledge)

    compare the I/O behavior of P' to P

    backpropagation

  • Running example

    We want to synthesize

    $f(a, b) := (a + b) \bmod 2^3$

    The set of I/O samples is

    a   b   O
    2   2   4
    5   3   0
    3   0   3

  • Context-free grammar

    U → U + U | U * U | a | b

    non-terminal symbols: U

    a terminal symbol for each input: {a, b}

    sentences of the grammar are candidate programs: a + b

    intermediate programs contain non-terminal symbols: U + U

    U ⇒ U + U ⇒ U + b ⇒ a + b

  • Which intermediate program is more promising?

    1. derive a random program candidate from the intermediate program

    2. compare I/O behavior to the original program

    U * U ⇒ ((a + a) * (b * a)), $g(a, b) := ((a + a) * (b * a)) \bmod 2^3$

    a   b   O*
    2   2   0
    5   3   6
    3   0   0

    U + U ⇒ (a + (b + b)), $h(a, b) := (a + (b + b)) \bmod 2^3$

    a   b   O+
    2   2   6
    5   3   3
    3   0   3

    We come back to this in a few minutes.
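The two candidate tables can be reproduced with a few lines of Python. This sketch assumes that the modulus is 2^3 = 8 and that the operator lost in the transcript is multiplication; both readings are the ones that reproduce the tabulated outputs.

```python
MOD = 2 ** 3  # assumed reading of the modulus

def f(a, b):
    """Target function of the running example."""
    return (a + b) % MOD

def g(a, b):
    """Random candidate derived from U * U."""
    return ((a + a) * (b * a)) % MOD

def h(a, b):
    """Random candidate derived from U + U."""
    return (a + (b + b)) % MOD

INPUTS = [(2, 2), (5, 3), (3, 0)]
TARGET = [f(a, b) for a, b in INPUTS]
OUT_G = [g(a, b) for a, b in INPUTS]
OUT_H = [h(a, b) for a, b in INPUTS]
```

Neither candidate is correct, but `h` already agrees with the target on the last sample, which is the kind of partial agreement a similarity score is meant to capture.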

  • Measuring output similarity

    How close is the I/O behavior to the original program?

    output similarity is represented by a score

    score 1.0: equivalent output behavior for all samples

    arithmetic mean of different similarity metrics defines the score

    We compare

    how close two values are numerically (arithmetic distance)

    in how many bits two values differ (Hamming distance)

    if two values are in the same range (leading/trailing zeros/ones)


  • Example: Hamming distance and leading zeros

    $\mathrm{similarity}(O, O') := \frac{\mathrm{hamming}(O, O') + \mathrm{lz}(O, O')}{2}$

    U * U: g(a, b)

    O   O*   hamming   lz     similarity
    4   0    0.67      0      0.335
    0   6    0.34      0      0.17
    3   0    0.34      0.34   0.34

    average similarity: 0.28

    U + U: h(a, b)

    O   O+   hamming   lz     similarity
    4   6    0.67      1.0    0.835
    0   3    0.34      0.34   0.34
    3   3    1.0       1.0    1.0

    average similarity: 0.73

    the program candidate derived from U + U is more promising

    the next generated program candidate is more likely to be based on U + U than on U * U
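The scores can be approximated with the sketch below. The slides do not spell out how the two metrics are normalized, so the formulas here are assumptions: the Hamming similarity is taken as the fraction of equal bits and the leading-zeros similarity as one minus the normalized difference in leading-zero counts, both over 3-bit values. With exact thirds the averages come out at about 0.28 and 0.72; the slide reports 0.28 and 0.73 because it rounds each one-third up to 0.34.

```python
WIDTH = 3  # outputs of the running example are 3-bit values

def hamming_similarity(x, y, width=WIDTH):
    """Assumed normalization: fraction of bit positions that agree."""
    differing = bin((x ^ y) & ((1 << width) - 1)).count("1")
    return 1.0 - differing / width

def leading_zeros(x, width=WIDTH):
    return width - x.bit_length()  # valid for 0 <= x < 2**width

def lz_similarity(x, y, width=WIDTH):
    """Assumed normalization: closeness of the leading-zero counts."""
    return 1.0 - abs(leading_zeros(x, width) - leading_zeros(y, width)) / width

def similarity(x, y, width=WIDTH):
    return (hamming_similarity(x, y, width) + lz_similarity(x, y, width)) / 2

def average_similarity(expected, actual):
    pairs = list(zip(expected, actual))
    return sum(similarity(o, c) for o, c in pairs) / len(pairs)

TARGET = [4, 0, 3]                                  # outputs of f
SCORE_G = average_similarity(TARGET, [0, 6, 0])     # candidate from U * U
SCORE_H = average_similarity(TARGET, [6, 3, 3])     # candidate from U + U
```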

  • Evaluation

    simplification of Mixed Boolean-Arithmetic

    Tigress Obfuscator

    synthesis of arithmetic VM instruction handlers

    commercial versions of VMProtect and Themida

    ROP gadget analysis

    Verification

    All synthesis results have been verified by manual reverse engineering.

  • Mixed Boolean-Arithmetic

    int p10(int v0, int v1, int v2, int v3, int v4) {
        int r = ((~v0) - v4);
        return r;
    }

    generated 500 random expressions

    two stages of arithmetic encoding

    synthesized 448 expressions (90%) in the first run

    4 seconds per synthesis task


  • Probabilistic synthesis behavior

    Figure: number of synthesized expressions (y-axis, 0 to 500) against the number of synthesis runs (x-axis, 0 to 10).

  • Arithmetic VM instruction handler

    mov r15, 0x200; xor r15, 0x800; mov rbx, rbp; add rbx, 0xc0; mov rbx, qword ptr [rbx]; mov r13, 1; mov rcx, 0; mov r15, rbp; add r15, 0xc0; or rcx, 0x88; add rbx, 0xb; mov r15, qword ptr [r15]; or r12, 0xffffffff80000000; sub rcx, 0x78; movzx r10, word ptr [rbx]; xor r12, r13; add r12, 0xffff; add r15, 0; mov r8, rbp; sub rcx, 0x10; or r12, r12; or rcx, 0x800; movzx r11, word ptr [r15]; xor rcx, 0x800; mov r12, r15; add r8, 0; xor r12, 0xf0; mov rbx, 0x58; add r11, rbp; xor rbx, 0x800; and r12, 0x20; add rbx, 0x800; mov r11, qword ptr [r11]; add rbx, 1; and r12, r9; mov rdx, 1; xor r10d, dword ptr [r8]; sub r9, r11; pushfq; xor rbx, 0xf0; xor rbx, 0x800; and rdx, r8; mov r12, rbp; xor rdx, 0x20; sub rbx, 4; add r11, 0x2549b044; or rbx, 0x78; and rdx, r10; mov rax, 0; add r12, 0x42

    mov r15, rdx; xor r10d, dword ptr [r12]; sub r15, 0x800; or rdx, 0x400; mov rsi, 0x200; mov r14, rbp; sub rsi, rsi; mov rdi, rbp; mov r8, 0x400; sub rsi, r9; sub r8, rsi; add r14, 0; add rsi, rax; and r8, 0x88; xor rsi, r14; mov rsi, rbp; add rdi, 0xc0; sub r8, rdi; add r8, 0x78; add rsi, 4; mov rcx, 0x200; mov rdi, qword ptr [rdi]; add dword ptr [rsi], 0x2549b044; xor rcx, 0xf0; add rcx, r10; add rdi, 6; mov r8, 0x400; mov ax, word ptr [rdi]; mov r8, 1; mov rsi, rbp; and rcx, 8; sub rcx, 1; mov rcx, rdi; add rsi, 0x29; or rcx, 8; mov r8, rsi; add rcx, 4; mov r13b, byte ptr [rsi]; cmp r13b, 0xd2; jbe 0x4f2c1e; and r8, r13; or rcx, r13; or rcx, 4; mov rbx, rbp; or rcx, 4; sub rcx, 0x400; add rax, rbp; or rcx, 0x80; add rcx, 0x80; add rbx, 0x5a

    add r8, 1; or r8, 0x78; add word ptr [rbx], r10w; mov r15, rax; sub r15, rax; pop r9; mov rcx, rbp; add rcx, 0xc0; mov rcx, qword ptr [rcx]; add rcx, 8; movzx r10, word ptr [rcx]; mov r9, rbp; add r9, 0; xor r10d, dword ptr [r9]; and rdi, 0xffffffff80000000; sub r13, 0xf0; mov rsi, 0; sub r13, 0x20; mov rbx, rbp; or r13, 0x88; and rcx, 8; mov r8, 0x58; add rbx, 0xc0; mov rbx, qword ptr [rbx]; sub rcx, 0x20; add rdi, 0x80; sub r13, 0x10; add rbx, 8; mov si, word ptr [rbx]; or r9, 0xffff; sub r9, 1; mov r9, rbp; mov r12, 0x58; add r9, 0; sub r13, 0x80; mov r15, r13; or rcx, r12; xor esi, dword ptr [r9]; mov r10, rbp; add r10, 0xcc; sub r15, 0x20; xor esi, dword ptr [r10]; xor r13, 0x90; add rdi, 0x10; mov r14, rsi; mov rdx, rbp; add rdx, 0; add dword ptr [rdx], esi; xor r12, 1; mov r13, r15

    or r14, r14; mov rax, rbp; and rcx, r13; add rax, 4; sub r8, -0x80000000; add r13, 0xffff; and rcx, 0x20; mov r10, rbp; add r13, r15; add r14, r8; add r10, 0x89; xor word ptr [r10], si; xor rdx, r11; mov rsi, rbp; sub rdx, rbx; and rax, 0x40; or rbx, 0xf0; add rsi, 0x5a; mov r8, rcx; movzx rsi, word ptr [rsi]; mov rax, 0x200; mov r14, rbp; and rax, rdx; and rcx, 0x20; add r14, 0x89; or rax, 0x40; xor si, 0x7a28; add rdx, 0x78; add rdx, 0x20; movzx r14, word ptr [r14]; mov rcx, 0x58; add rsi, rbp; xor rax, rdx; add r8, 0x80; mov r15, rsi; add r14, rbp; add r8, r15; mov rbx, 0; and rdx, 0x10; mov r14, qword ptr [r14]; add qword ptr [rsi], r14; pushfq; xor r11, r14; add r15, r14; mov r13, 0x12; mov r8, 0; and r14, 0x88; and r13, 0x40; add r13, 1; mov rdx, rbp

    mov r14, 0x200; add rdx, 0xc0; add r11, r14; or r15, 0x88; mov rdx, qword ptr [rdx]; add rdx, 0xa; add r11, 0x78; mov r8b, byte ptr [rdx]; cmp r8b, 0; je 0x4f2ede; mov rdx, rbp; or r11, 0x40; and r15, 1; xor r11, 0x10; add rdx, 0xc0; or r14, 4; mov r15, 0x12; mov rdx, qword ptr [rdx]; sub r11, r8; add rdx, 4; or r11, 0x80; mov r8w, word ptr [rdx]; mov r14, r8; add r8, rbp; xor r13, 4; pop r10; mov qword ptr [r8], r10; jmp 0x4f2eee; xor rsi, 0x88; xor rbx, 0xffffffff80000000; add rsi, 0x78; mov r10b, 0x68; mov r9, 0x12; or rbx, r10; and r15, 0x78; mov r14, rbp; or r9, 8; add r14, 0x29; xor rbx, rdi; and r15, 0x3f; or byte ptr [r14], r10b; mov rax, 0x58; mov r8, rbp; sub rsi, 0x78; add r8, 0x127; mov rdi, rbx; xor rbx, 0x3f; mov r8, qword ptr [r8]; xor rsi, 1; mov rax, rbp

    add r15, 0x3f; or r15, 0xffffffff80000000; and rsi, r9; add rax, 0xc0; add rdi, r14; or rsi, 1; mov rax, qword ptr [rax]; and rdi, 0x7fffffff; add rax, 2; sub rsi, 4; or rbx, rsi; movzx rax, word ptr [rax]; mov r9, rbp; mov r13, 0x200; mov r10, 0x58; add r9, 0; or r10, 0x20; add eax, dword ptr [r9]; xor r10, 0x40; add eax, 0x3f505c07; add r15, 0x88; mov r12, rbp; or rdi, 0x90; add r12, 0; or rbx, 0x80; add rdi, 0xf0; mov r13, 0x400; add dword ptr [r12], eax; and rsi, r8; or r10, 8; and rbx, 0x20; and rax, 0xffff; mov r11, 0; add r13, r8; or rbx, 1; shl rax, 3; add r8, rax; or rbx, r15; sub r15, 0x10; or r11, r13; mov rbx, qword ptr [r8]; mov rdx, rbp; sub r13, 0x80; add rdx, 0xc0; add qword ptr [rdx], 0xd; jmp rbx

    u64 res = M13 + M14

  • Arithmetic VM instruction handler

                                   VMProtect   Themida
    #unique trace windows          449         106
    #instructions per window       49          258
    #inputs per window             2           15
    #outputs per window            2           10
    #synthesis tasks               1,123       1,092
    I/O sampling time (s)          118         60
    synthesis time per task (s)    3.7         9.1

    VMProtect: 194 out of 196 handlers (98%)

    Themida: 34 out of 36 handlers (no I/O samples for 2 handlers)


  • ROP gadget analysis

    inc eax
    pop ebp
    ret

    78 unique gadgets

    3 inputs and 2 outputs on average

    found partial semantics for 97% of the gadgets

    synthesized 91% of the 178 outputs

    Synthesis results:

    O1 = eax + 1
    O2 = esp + 4

  • Limitations

    trace window boundaries

    semantic complexity

    non-deterministic functions

    point functions (e.g., hash comparisons)

    confusion and diffusion (cryptography)


  • Conclusion

    traditional deobfuscation techniques are limited by the code's syntactic complexity

    program synthesis is limited by the code's semantic complexity

    program synthesis succeeds where traditional approaches fail

    introduced a generic approach for trace simplification

    demonstrated that program synthesis is applicable to real-world obfuscated code


  • References I

    [1] Cameron B. Browne et al. A Survey of Monte Carlo Tree Search Methods. In: IEEE Transactions on Computational Intelligence and AI in Games (2012).

  • Monte Carlo tree search (MCTS): Introduction

    general game playing, Computer Go

    reinforcement learning

    does not require much domain knowledge

    efficient tree search for exponential decision trees

    based on random walks and Monte Carlo simulations

    synthesis as stochastic optimization problem


  • Monte Carlo tree search (MCTS): Algorithm

    1. node selection

    select best child node (exploration vs. exploitation trade-off)

    2. node expansion

    derive new game states

    3. simulation

    random playouts

    a score represents the node's quality

    4. backpropagation

    update the path's quality

  • Monte Carlo tree search (MCTS)Visualization

    Figure: MCTS algorithm [1]. The four phases are Selection, Expansion, Simulation, and Backpropagation; Selection and Expansion follow the TreePolicy, Simulation follows the DefaultPolicy.

  • Selection: Upper confidence bound for trees (UCT)

    $\overline{X}_j + C \sqrt{\frac{\ln n}{n_j}}$

    average child reward: $\overline{X}_j$

    number of simulations (parent node): $n$

    number of simulations (child node): $n_j$

    exploration-exploitation constant: $C$
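A small sketch of UCT-based child selection, reading the selection term as the average child reward plus C times the square root of (ln n / n_j). The tuple layout of a child, the function names, and the value C = 1.0 are assumptions for illustration; the two rewards are the average similarities from the running example.

```python
import math

def uct(avg_reward, parent_visits, child_visits, c):
    """Average reward plus an exploration bonus that shrinks with child visits."""
    return avg_reward + c * math.sqrt(math.log(parent_visits) / child_visits)

def select_child(children, parent_visits, c):
    """children: list of (name, average reward, visit count) tuples."""
    return max(children, key=lambda child: uct(child[1], parent_visits, child[2], c))

# Both children have been simulated once; rewards taken from the running example.
CHILDREN = [("U * U", 0.28, 1), ("U + U", 0.73, 1)]
BEST = select_child(CHILDREN, parent_visits=2, c=1.0)
```

With equal visit counts the exploration bonus is identical for both children, so the child with the higher average reward wins; a rarely visited child can still be chosen later because its bonus grows relative to its siblings.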

  • Selection: Simulated Annealing UCT (SA-UCT)

    $\overline{X}_j + T \sqrt{\frac{\ln n}{n_j}}$

    dynamic parameter: $T = C \frac{N - i}{N}$

    exploration-exploitation constant: $C$

    maximal MCTS rounds: $N$

    current MCTS round: $i$

    Focus shifts to exploitation over time.
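The annealing effect can be sketched as follows, reading the dynamic parameter as T = C (N - i) / N, which is the reading consistent with the focus shifting to exploitation over time. Function names and the sample values are assumptions for illustration.

```python
import math

def temperature(c, max_rounds, current_round):
    """T = C * (N - i) / N: equals C in round 0 and 0 in the final round."""
    return c * (max_rounds - current_round) / max_rounds

def sa_uct(avg_reward, parent_visits, child_visits, c, max_rounds, current_round):
    t = temperature(c, max_rounds, current_round)
    return avg_reward + t * math.sqrt(math.log(parent_visits) / child_visits)

# The exploration bonus of the same node fades as the search progresses.
EARLY = sa_uct(0.4, 10, 2, c=1.5, max_rounds=100, current_round=0)
LATE = sa_uct(0.4, 10, 2, c=1.5, max_rounds=100, current_round=100)
```

In the last round the score reduces to the average reward alone, so selection becomes purely exploitative.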

  • Synthesis tree

    U

    U U * U U +

    U b + U U U + + U a +

    U U * a + b a +

    U U U * +

    a b


  • Grammar components

    addition, multiplication

    unary/binary minus

    signed/unsigned division

    signed/unsigned remainder

    logical and arithmetic shifts

    unary/binary bitwise operations

    zero/sign extend

    extract

    concat


  • Expression derivation

    U U U * + ≙ (U + (U * U))

        +
       / \
      U   *
         / \
        U   U

    apply a random production rule to the top-most, right-most U
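One derivation step might look like the sketch below. Expressions are modelled as nested tuples instead of the postfix strings on the slide, and "top-most, right-most" is interpreted as the non-terminal with the smallest depth, preferring the right-most one on ties; both the representation and this interpretation are assumptions.

```python
import random

# Productions of the example grammar: U -> U + U | U * U | a | b
PRODUCTIONS = [("+", "U", "U"), ("*", "U", "U"), "a", "b"]

def find_u(expr, depth=0, path=()):
    """Yield (depth, path) for every non-terminal U in the expression tree."""
    if expr == "U":
        yield depth, path
    elif isinstance(expr, tuple):
        for index in (1, 2):  # index 0 holds the operator
            yield from find_u(expr[index], depth + 1, path + (index,))

def replace_at(expr, path, new):
    if not path:
        return new
    parts = list(expr)
    parts[path[0]] = replace_at(expr[path[0]], path[1:], new)
    return tuple(parts)

def derive_step(expr, rng):
    """Apply one random production to the top-most, right-most U."""
    candidates = list(find_u(expr))
    if not candidates:
        return None  # expression is already terminal
    # Smallest depth first; among equal depths the right-most path wins.
    _, path = min(candidates, key=lambda c: (c[0], tuple(-i for i in c[1])))
    return replace_at(expr, path, rng.choice(PRODUCTIONS))

def to_infix(expr):
    if isinstance(expr, tuple):
        return f"({to_infix(expr[1])} {expr[0]} {to_infix(expr[2])})"
    return expr

EXPR = ("+", "U", ("*", "U", "U"))
NEXT = derive_step(EXPR, random.Random(1))
```

For `EXPR` the left operand of `+` is the only non-terminal at depth 1, so it is the one that gets rewritten while the `U * U` subtree stays untouched.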

  • Random playout

    Algorithm

    Input: set of I/O samples S

    1. randomly derive a terminal expression T from the current node
    2. reward := 0
    3. for all $(\vec{I}, O) \in S$:
       3.1 evaluate the terminal expression: $O' := T(\vec{I})$
       3.2 reward := similarity(O, O') + reward
    4. return reward / |S|
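The playout steps can be sketched for the running example as follows. Several simplifications are assumptions of this sketch: the derivation always starts from the bare non-terminal U rather than from an arbitrary tree node, a depth cap forces termination, values are 3 bits wide, and the similarity is a stand-in that uses only the fraction of equal bits.

```python
import random

WIDTH = 3
MASK = (1 << WIDTH) - 1

def random_terminal_expression(rng, depth=0, max_depth=4):
    """Randomly derive a terminal expression from U -> U + U | U * U | a | b."""
    rules = ["a", "b"] if depth >= max_depth else ["+", "*", "a", "b"]
    rule = rng.choice(rules)
    if rule in ("a", "b"):
        return rule
    return (rule,
            random_terminal_expression(rng, depth + 1, max_depth),
            random_terminal_expression(rng, depth + 1, max_depth))

def evaluate(expr, a, b):
    if expr == "a":
        return a & MASK
    if expr == "b":
        return b & MASK
    op, left, right = expr
    x, y = evaluate(left, a, b), evaluate(right, a, b)
    return (x + y) & MASK if op == "+" else (x * y) & MASK

def similarity(expected, actual):
    """Stand-in metric (assumption): fraction of equal bits."""
    return 1.0 - bin((expected ^ actual) & MASK).count("1") / WIDTH

def random_playout(samples, rng):
    expr = random_terminal_expression(rng)        # step 1
    reward = 0.0                                  # step 2
    for (a, b), expected in samples:              # step 3
        actual = evaluate(expr, a, b)             # step 3.1
        reward += similarity(expected, actual)    # step 3.2
    return reward / len(samples), expr            # step 4

SAMPLES = [((2, 2), 4), ((5, 3), 0), ((3, 0), 3)]
REWARD, EXPR = random_playout(SAMPLES, random.Random(7))
```

The returned reward always lies between 0 and 1, and reaches 1.0 exactly when the derived expression reproduces every sampled output.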

  • Backpropagation

    Algorithm

    Input: current node n

    1. while n ≠ root:
       1.1 update the node's average reward
       1.2 increment the node's playout count
       1.3 n := n.parent
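A minimal sketch of this loop, following the slide literally, so the root itself is not updated. The `Node` class, its field names, and the incremental running-mean update are assumptions for illustration.

```python
class Node:
    """Hypothetical tree node holding the statistics used during selection."""

    def __init__(self, parent=None):
        self.parent = parent
        self.average_reward = 0.0
        self.playouts = 0

def backpropagate(node, reward):
    """Propagate a playout reward from `node` up to, but excluding, the root."""
    while node.parent is not None:  # n != root
        # 1.1 update the running mean without storing individual rewards
        node.average_reward += (reward - node.average_reward) / (node.playouts + 1)
        node.playouts += 1          # 1.2
        node = node.parent          # 1.3

ROOT = Node()
CHILD = Node(parent=ROOT)
LEAF = Node(parent=CHILD)
backpropagate(LEAF, 1.0)
backpropagate(LEAF, 0.0)
```

After the two calls both `LEAF` and `CHILD` have seen two playouts with an average reward of 0.5. A full implementation would probably also track the root's simulation count, since the selection formula needs the parent's count for the root's children.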
