
Neural Reverse Engineering of Stripped Binaries · 2020. 6. 14. · Neural Reverse Engineering of...

Date post: 19-Apr-2021
Neural Reverse Engineering of Stripped Binaries Yaniv David, Uri Alon, Eran Yahav Technion, Israel
Transcript
Page 1

Neural Reverse Engineering of Stripped Binaries

Yaniv David, Uri Alon, Eran Yahav
Technion, Israel

Page 2

Reverse Engineering (RE) Binaries: What, Why & How?

Page 3

RE – What & Why?

Malware?

Bug? find & fix it

Page 4

RE – How? Disassemblers

    call getaddrinfo
    mov rax, [rbp-30h]
    mov rdx, [rbp-50h]
    mov rdx, cs:688588d
    mov [rax], rdx
    mov rax, [rbp-30h]
    mov rdx, [rbp-580h]
    mov [rax+8], rdx
    mov rax, [rbp-30h]
    call strerror
    sub rdx, rax
    idiv [rbp-28h]
    call setsockopt
    mov rdx, [rax]
    mov eax, [rbx+40h]
    cdqe


No Names

No Types

Page 6

RE – How? Modern Disassemblers


Where to start?

Page 8

Progress in Other Domains

Page 9

Progress in the Source Code Domain

http://jsnice.org

Page 10

Progress in the Source Code Domain

https://code2vec.org - code2vec: Learning Distributed Representations of Code

Page 11

Un-Stripping Procedure Names


Start at the right place

Page 13

Translate: Assembly Procedure → English

Page 14

Sequence-To-Sequence (seq2seq) Models

• A basic approach:
  • LSTM encoder
  • LSTM decoder

[Figure: an encoder-decoder translating "cómo estás" into "how are you"]

• LSTM with attention & Transformers are state of the art for seq2seq tasks (machine translation, speech recognition, etc.)

Page 15

Binary Syntax Is Very Local

    call getaddrinfo
    mov rax, [rbp-30h]
    mov rdx, [rbp-50h]
    mov rdx, cs:688588d
    mov [rax], rdx
    mov rax, [rbp-30h]
    mov rdx, [rbp-580h]
    mov [rax+8], rdx
    mov rax, [rbp-30h]
    call strerror
    sub rdx, rax
    idiv [rbp-28h]
    call setsockopt
    mov rdx, [rax]
    mov eax, [rbx+40h]
    cdqe


• Global offsets are local to the executable

• Register allocation is local to the instruction or basic block (BB)

• Stack offsets are local to the procedure

Page 17

Finding Prediction Anchors

    call getaddrinfo
    mov rdx, cs:qword_68858
    mov rax, [rbp-30h]
    mov rdx, [rbp-50h]
    mov [rax], rdx
    mov rax, [rbp-30h]
    mov rdx, [rbp-580h]
    mov [rax+8], rdx
    mov rax, [rbp-30h]
    call strerror
    sub rdx, rax
    idiv [rbp-28h]
    call setsockopt
    mov rdx, [rax]
    mov eax, [rbp+var3C]
    cdqe

call getaddrinfo…

call strerror…

call setsockopt…

Not enough data and context

Focus On Calls


Combine binary program analysis with machine learning to find a sweet spot

Page 19

Augmented Call Sites as Learning Features

Page 20

Using API Calls

API calls:

    … call getaddrinfo
    … call strerror
    … call setsockopt

Calling Conventions + Library information

Reconstructed API Call Sites:

    … setsockopt(rdi,rsi,rdx,rcx,r8)
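The reconstruction step can be illustrated with a minimal Python sketch. It assumes the System V AMD64 calling convention (integer and pointer arguments passed in rdi, rsi, rdx, rcx, r8, r9) and a small hand-written arity table standing in for the library information; `API_ARITY` and `reconstruct_call_site` are hypothetical names, not part of the tool described in the slides.

```python
from typing import Optional

# System V AMD64: the first six integer/pointer arguments, in order.
SYSV_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

# Hypothetical "library information": API name -> number of arguments.
API_ARITY = {
    "getaddrinfo": 4,
    "strerror": 1,
    "setsockopt": 5,
    "socket": 3,
}


def reconstruct_call_site(instruction: str) -> Optional[str]:
    """Turn 'call <api>' into '<api>(reg, ...)' when the API is known."""
    parts = instruction.split()
    if len(parts) != 2 or parts[0] != "call":
        return None
    name = parts[1]
    arity = API_ARITY.get(name)
    if arity is None or arity > len(SYSV_ARG_REGS):
        return None  # unknown API, or arguments spill to the stack
    return f"{name}({','.join(SYSV_ARG_REGS[:arity])})"


if __name__ == "__main__":
    for ins in ["call getaddrinfo", "call strerror", "call setsockopt"]:
        print(reconstruct_call_site(ins))
```

Stack-passed and floating-point arguments are ignored here; a real implementation would need the full ABI rules.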

Page 21

Augmenting Call Sites

setsockopt(rdi,rsi,rdx,rcx,r8)

    call socket(...)
    mov [rbp-58h], rax
    mov rax, [rbp-58h]
    mov rdi, rax

    mov rsi, 1

    mov r8, 4

In C: setsockopt(sock_var, 1, …, …, 4)
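As a rough illustration of how argument registers can be tied back to their sources, the sketch below runs a simple forward pass over the straight-line instructions that precede the call. This is a simplification of slicing, not the authors' analysis: it only understands `mov` and `call`, matches memory operands textually (no aliasing), and assumes the System V AMD64 set of caller-saved registers. The names `augment_call_site`, `CALLER_SAVED` and `IMMEDIATE` are illustrative.

```python
import re
from typing import Dict, List

# Registers a call may overwrite under the System V AMD64 convention.
CALLER_SAVED = {"rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"}
# Immediates such as 1, 4 or 30h (must start with a digit, so "ah" is not matched).
IMMEDIATE = re.compile(r"\d[0-9a-fA-F]*h?")


def augment_call_site(preceding: List[str], arg_regs: List[str]) -> Dict[str, str]:
    """Map each argument register to a concrete value, RET, or '…' (unknown)."""
    state: Dict[str, str] = {}  # register or memory operand -> symbolic value
    for ins in preceding:
        if ins.startswith("call"):
            for reg in CALLER_SAVED:
                state.pop(reg, None)  # clobbered by the callee
            state["rax"] = "RET"      # rax now holds the return value
            continue
        m = re.fullmatch(r"mov\s+([^,]+),\s*(.+)", ins)
        if m is None:
            continue  # other instructions are ignored in this sketch
        dst, src = m.group(1).strip(), m.group(2).strip()
        if IMMEDIATE.fullmatch(src):
            state[dst] = src
        elif src in state:
            state[dst] = state[src]
        else:
            state.pop(dst, None)      # source unknown, so destination is too
    return {reg: state.get(reg, "…") for reg in arg_regs}


if __name__ == "__main__":
    block = [
        "call socket(...)",
        "mov [rbp-58h], rax",
        "mov rax, [rbp-58h]",
        "mov rdi, rax",
        "mov rsi, 1",
        "mov r8, 4",
    ]
    site = augment_call_site(block, ["rdi", "rsi", "rdx", "rcx", "r8"])
    print("setsockopt(" + ",".join(site.values()) + ")")
```

On the slide's instructions, the first argument traces back to the return value of `socket`, and the second and fifth arguments resolve to the constants 1 and 4.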

Page 22

Augmenting Call Sites

Using concrete or abstracted values (listed from most to least informative):

1. Concrete value (Integer, Enum, String)
2. ARG – procedure argument
3. GLOBAL – pointer to a global variable
4. RET – a return value from a call
5. STACK – pointer to stack memory
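One way to read the "less informative" ordering is as a priority rule: when the analysis yields several candidate abstractions for an argument, keep the one highest in the list. The sketch below encodes that reading; the `Kind` enum and `most_informative` helper are illustrative names, and the strict top-to-bottom priority is an assumption drawn from the slide's ordering.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Kind(IntEnum):
    """Abstraction kinds; a lower number means more informative (assumed)."""
    CONCRETE = 0
    ARG = 1
    GLOBAL = 2
    RET = 3
    STACK = 4


def most_informative(candidates: Iterable[Kind]) -> Optional[Kind]:
    """Pick the most informative kind, or None for an empty candidate set."""
    candidates = list(candidates)
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # A register that may hold either a stack pointer or a procedure argument.
    print(most_informative([Kind.STACK, Kind.ARG]).name)
```

With candidates STACK and ARG, the helper keeps ARG, since ARG ranks above STACK in the list.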

Page 23

Pointer-Aware Slicing of Call Site Args

Call site: getaddrinfo(rdi,rsi,rdx,rcx)

    mov rdi, rax
    mov rax, [rbp-68h]
    mov [rbp-68h], rdi

[Figure: value (V) and pointer (P) sets tracked along the slice: V(rdi), P([rdi]), V(rax), P([rax]), V(rbp), P([rbp-68h]), ∅]

Page 24

Augmenting Call Site Arguments

Call site: getaddrinfo(rdi,rsi,rdx,rcx)

    mov rdi, rax
    mov rax, [rbp-68h]
    mov [rbp-68h], rdi

[Figure: abstract labels attached along the slice: STACK, ARG, ARG | ∅, STACK | ARG, ARG | ∅]


Page 26

Augmenting Call Site Arguments

Call site: getaddrinfo(ARG,rsi,rdx,rcx)

    mov rdi, rax

[Figure: abstract labels along the slice: STACK, ARG, ARG | ∅, STACK | ARG, ARG | ∅; the first argument (rdi) is replaced by ARG]

Page 27

Augmented Control Flow Graph

    … call …
    … call socket
    … call printf
    … call setsockopt
    … call close
    … call printf

Page 28

Augmented Control Flow Graph

    setsockopt(RET,0,10,STK,4)
    socket(2,1,0)
    printf(GLOBAL,…)
    close(…)
    ...
    printf(GLOBAL,…)
    ...

Useful for training seq2seq or GNN models

Page 29

Extracting Paths From the ACFG

Extract simple paths (no loops)

    setsockopt(RET,0,10,STK,4)
    socket(2,1,0)
    printf(GLOBAL,…)
    close(…)
    ...
    printf(GLOBAL,…)

    setsockopt(RET,1,2,STK,4)
    getaddrinfo(ARG,ARG,STK,STK)
    socket(…)
    bind(…)
    listen(…)
    memset(STK,0,48)
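Extracting simple paths can be sketched as a depth-first search that never revisits a node already on the current path. The graph below is a made-up toy: the node labels reuse call sites from the slides, but the edges (including the back edge that forms a loop) are assumptions for illustration only, and `simple_paths` is a hypothetical helper.

```python
from typing import Dict, List


def simple_paths(graph: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Enumerate maximal paths from `start` in which no node repeats."""
    paths: List[List[str]] = []

    def walk(node: str, prefix: List[str]) -> None:
        path = prefix + [node]
        # Skip successors already on the path: this is what cuts loops.
        successors = [n for n in graph.get(node, []) if n not in path]
        if not successors:
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    walk(start, [])
    return paths


if __name__ == "__main__":
    # Toy augmented CFG; edges are illustrative, not taken from the slides.
    acfg = {
        "socket(2,1,0)": ["setsockopt(RET,0,10,STK,4)", "printf(GLOBAL,…)"],
        "setsockopt(RET,0,10,STK,4)": ["close(…)", "socket(2,1,0)"],  # back edge
        "printf(GLOBAL,…)": ["close(…)"],
        "close(…)": [],
    }
    for p in simple_paths(acfg, "socket(2,1,0)"):
        print(" -> ".join(p))
```

The number of simple paths can grow exponentially with graph size, so a practical extractor would likely cap path count or length.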

Page 30

Our Approach: [Set-Of-Seq]-To-Seq

    setsockopt(RET,1,2,STK,4)
    getaddrinfo(ARG,ARG,STK,STK)
    socket(…)
    bind(…)
    listen(…)
    memset(STK,0,48)

→ create server socket

Page 31

Evaluation
Implementation: Nero

Page 32

Evaluation Corpus

GNU software repository

Remove Duplications

67,246 Labeled Procedures

Strip

Strip & Obfuscate APIs

8:1:1 Package-Based Split
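A package-based split keeps every procedure of a given package in the same set, so that training and test data do not share code from one package. The sketch below shows one way to do this; it applies the 8:1:1 ratio to the number of packages, which is an assumption (the slides do not say whether the ratio counts packages or procedures), and `package_split` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Procedure = Tuple[str, str]  # (package name, procedure name)


def package_split(
    procedures: Sequence[Procedure],
    ratios: Tuple[int, int, int] = (8, 1, 1),
    seed: int = 0,
) -> List[List[Procedure]]:
    """Split into train/validation/test so that no package spans two sets."""
    by_package: Dict[str, List[str]] = defaultdict(list)
    for package, name in procedures:
        by_package[package].append(name)

    packages = sorted(by_package)          # deterministic starting order
    random.Random(seed).shuffle(packages)  # reproducible shuffle

    total = sum(ratios)
    cut1 = len(packages) * ratios[0] // total
    cut2 = len(packages) * (ratios[0] + ratios[1]) // total
    groups = [packages[:cut1], packages[cut1:cut2], packages[cut2:]]
    return [[(p, n) for p in group for n in by_package[p]] for group in groups]


if __name__ == "__main__":
    # Placeholder data: ten made-up packages with two procedures each.
    data = [(f"pkg{i}", f"proc{j}") for i in range(10) for j in range(2)]
    train, valid, test = package_split(data)
    print(len(train), len(valid), len(test))
```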

Page 33

Executable Obfuscation Types

• String encoding/encryption

• Code obfuscations (opaque predicates, etc.)

• Commercial (known) / home-made packers

• Header manipulation => API calls not visible

Page 34

Simulating Header Manipulation

• Zeroing '.dynstr' removes imported libraries & procedure names

Page 35

Evaluation Results

Stripped

Model                     Prec    Rec     F1
LSTM-text                 22.32   21.16   21.72
Transformer-text          25.45   15.97   19.64
Debin [He et al. 2018]    34.86   32.54   33.66
Nero-LSTM                 39.94   38.89   39.40
Nero-Transformer          41.54   38.64   40.04

Stripped & Obfuscated API Calls

Model                     Prec    Rec     F1
LSTM-text                 15.46   14.00   14.70
Transformer-text          18.41   12.24   14.70
Debin [He et al. 2018]    32.10   28.76   30.09
Nero-LSTM                 39.12   31.40   34.83
Nero-Transformer          36.50   32.25   34.24

[He et al. 2018] "Debin: Predicting Debug Information in Stripped Binaries", CCS'18

Page 36

Ablation Study

Components                             Prec    Rec     F1
Only Calls → LSTM                      23.45   24.56   24.04
Augmented Call Sites → LSTM            36.05   31.77   33.77
Paths → Only Calls → LSTM              29.84   24.08   26.65
Paths → Augmented Call Sites → LSTM    39.94   38.89   39.40

Page 37

Prediction Examples

Model                     Prediction
Ground Truth              read file check new watcher get user groups install signal handlers
Debin [He et al. 2018]    bt open read index display signal setup
LSTM-text                 <unk> check opt close stdin <unk>
Transformer-text          Ipmi disable coredump <unk> config file ipmi regfree
Nero-LSTM                 vfs read file check file get ip groups install handlers
Nero-Transformer          read file system list check state get user groups install signal

Page 38

Qualitative Evaluation

Error Type                         Package     Ground Truth          Predicted Name
Programmers Vs English Language    wget        i18n_initialize       i18n_init
                                   direvent    split_cfg_path        split_config_path
                                   gzip        add_env_opt           add_option
Data Structure Name Missing        gtypist     get_best_speed        get_list_item
                                   wget        ftp_parse_winnt_ls    parse_tree
                                   gzip        abort_gzip_signal     fatal_signal_handler
Verb Replaced                      units       read_units            parse
                                   findutils   share_file_fopen      add_file
                                   mcsim       display_help          show_help


Measured F1 is actually a lower bound
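To see why exact-match scoring understates quality, consider scoring a prediction against the ground truth by comparing name subtokens. The sketch below is one plausible formulation, assuming names are split on underscores and compared case-insensitively; the slides do not spell out the metric, so `subtokens` and `prf1` are illustrative rather than the evaluation code behind the reported numbers.

```python
from collections import Counter
from typing import List, Tuple


def subtokens(name: str) -> List[str]:
    """Split a procedure name into lowercase subtokens on underscores."""
    return [t for t in name.lower().split("_") if t]


def prf1(truth: str, predicted: str) -> Tuple[float, float, float]:
    """Subtoken-level precision, recall and F1 for a single name."""
    t, p = Counter(subtokens(truth)), Counter(subtokens(predicted))
    true_pos = sum((t & p).values())  # multiset intersection
    precision = true_pos / sum(p.values()) if p else 0.0
    recall = true_pos / sum(t.values()) if t else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for truth, pred in [
        ("split_cfg_path", "split_config_path"),
        ("display_help", "show_help"),
    ]:
        print(truth, pred, prf1(truth, pred))
```

Under this scoring, `split_config_path` matches only two of the three subtokens of `split_cfg_path`, and `show_help` only one of the two in `display_help`, even though a human reader would accept both predictions.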

Page 40

Takeaway Messages

Use Augmented Call Sites as Learning Features


Translate: Assembly Procedure → English

