PhD Thesis Defense

Presented by Marcel Boehme

Model-Based Whitebox Fuzzing for Program Binaries

Marcel Böhme, Thuan Pham, Abhik Roychoudhury

ASE 2016 September 3-7, 2016

Singapore

Presented by Thuan Pham

Vulnerabilities in file-processing programs

[Figure: bar chart of the number of CVE-assigned vulnerabilities in file-processing programs per year, 2007 to 2016 (2016 counted up to 30/8). Source: US National Vulnerability Database. Bar labels in extraction order: 315, 399, 328, 352, 304, 310, 199, 203, 343, 169.]

Challenge

• Generating test cases that expose vulnerabilities in file-processing software is challenging!

• Input files are highly structured:
  • They have both syntactic and semantic relationships
  • They involve compression/decompression algorithms
  • They carry integrity constraints, e.g., checksums

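File as a Tree

[Figure: a PNG file drawn as a tree. The PNG root has a Signature and a list of CHUNKS; each CHUNK consists of the data fields length, type, DATA and CRC. Two example chunks are shown, with payloads "xxx yyy…" and "zzz txt…". Legend: data chunk, data field, integrity constraint (the "length of" and "crc of" relations).]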
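
The two integrity constraints in the tree can be made concrete with a small sketch. It assumes the standard PNG chunk layout (4-byte big-endian length, 4-byte type, data, CRC-32 over type and data); the helper names build_chunk and parse_chunk are illustrative and do not come from the talk.

```python
import struct
import zlib


def build_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Serialize one chunk as length | type | data | CRC."""
    # "length of": the length field counts only the data bytes.
    length = struct.pack(">I", len(data))
    # "crc of": the CRC covers the type and data fields, not the length.
    crc = struct.pack(">I", zlib.crc32(chunk_type + data) & 0xFFFFFFFF)
    return length + chunk_type + data + crc


def parse_chunk(buf: bytes, offset: int = 0):
    """Return (type, data, crc_ok, next_offset) for the chunk at offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    chunk_type = buf[offset + 4 : offset + 8]
    data = buf[offset + 8 : offset + 8 + length]
    (stored_crc,) = struct.unpack_from(">I", buf, offset + 8 + length)
    crc_ok = stored_crc == (zlib.crc32(chunk_type + data) & 0xFFFFFFFF)
    return chunk_type, data, crc_ok, offset + 12 + length


text_chunk = build_chunk(b"tEXt", b"hello")
end_chunk = build_chunk(b"IEND")
```

Changing any data byte without updating both the length and CRC fields breaks one of the two relations, which is what makes such files hard to fuzz blindly.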

1. (Model-Based) Blackbox Fuzzing
2. Whitebox Fuzzing

Blackbox Fuzzing

[Figure: blackbox fuzzing turns a seed input into mutated inputs. All three mutated inputs shown are rejected by the program.]
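
Why blind mutation gets rejected can be illustrated with a toy experiment. The sketch assumes a PNG-style chunk (length, type, data, CRC-32) and a stand-in parser named accepts; both are illustrative and are not the programs from the talk.

```python
import random
import struct
import zlib


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build a well-formed chunk: length | type | data | CRC."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def accepts(chunk: bytes) -> bool:
    """Toy parser: accept only if the length and CRC fields are consistent."""
    if len(chunk) < 12:
        return False
    (length,) = struct.unpack_from(">I", chunk, 0)
    if len(chunk) != 12 + length:
        return False
    (stored,) = struct.unpack_from(">I", chunk, 8 + length)
    return stored == (zlib.crc32(chunk[4 : 8 + length]) & 0xFFFFFFFF)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Blind mutation: XOR one random byte with a random non-zero mask."""
    pos = rng.randrange(len(seed))
    mutated = bytearray(seed)
    mutated[pos] ^= rng.randrange(1, 256)
    return bytes(mutated)


rng = random.Random(0)
seed = make_chunk(b"tEXt", b"comment\x00hello")
mutants = [mutate(seed, rng) for _ in range(1000)]
rejected = sum(not accepts(m) for m in mutants)
```

Every mutant fails either the length check or the CRC check, so none of them reaches the code behind the parser.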

Model-Based Blackbox Fuzzing

[Figure: a model-based blackbox fuzzer (Peach, Spike, …) takes a seed input and an input model and produces mutated inputs. One mutated input passes all checks; the other two satisfy some checks.]
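
A minimal sketch of how an input model helps mutated inputs pass integrity checks: mutate only the payload, then recompute the dependent fields. It assumes a PNG-style chunk; serialize, passes_integrity_checks and model_mutate are hypothetical names and do not reflect Peach's or Spike's actual APIs.

```python
import random
import struct
import zlib


def serialize(ctype: bytes, data: bytes) -> bytes:
    """Emit length | type | data | CRC with both relations recomputed."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def passes_integrity_checks(chunk: bytes) -> bool:
    """Check the length-of and crc-of relations of a single chunk."""
    if len(chunk) < 12:
        return False
    (length,) = struct.unpack_from(">I", chunk, 0)
    if len(chunk) != 12 + length:
        return False
    (stored,) = struct.unpack_from(">I", chunk, 8 + length)
    return stored == (zlib.crc32(chunk[4 : 8 + length]) & 0xFFFFFFFF)


def model_mutate(ctype: bytes, data: bytes, rng: random.Random) -> bytes:
    """Mutate the payload only, then let the model fix up length and CRC."""
    payload = bytearray(data)
    if payload:
        payload[rng.randrange(len(payload))] ^= rng.randrange(1, 256)
    if rng.random() < 0.5:
        payload.append(rng.randrange(256))  # also vary the payload size
    return serialize(ctype, bytes(payload))


rng = random.Random(0)
seed_chunk = serialize(b"tEXt", b"comment\x00hello")
mutants = [model_mutate(b"tEXt", b"comment\x00hello", rng) for _ in range(1000)]
```

All mutants keep the integrity constraints intact. Whether they also satisfy deeper, value-specific checks in the program is a separate matter.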

Model-Based Blackbox Fuzzing (MoBF)

MoBF struggles to generate specific values for data fields!

Probability of generating the correct value(s) for:
• One 32-bit data field: 1/2^32
• Two 32-bit data fields: 1/2^64
• Three 32-bit data fields: 1/2^96
• …
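
The arithmetic behind these figures, as a short sketch. It assumes each field is guessed uniformly at random and independently, and that exactly one value per field is correct; the function names are illustrative.

```python
from fractions import Fraction


def hit_probability(num_fields: int, bits_per_field: int = 32) -> Fraction:
    """Chance that one random guess matches the required value in every field."""
    return Fraction(1, 2 ** (bits_per_field * num_fields))


def expected_attempts(num_fields: int, bits_per_field: int = 32) -> int:
    """Mean number of independent guesses until the first hit (1 / p)."""
    return 2 ** (bits_per_field * num_fields)


one_field = hit_probability(1)
three_fields = hit_probability(3)
attempts_for_one = expected_attempts(1)  # 4294967296 guesses on average
```

Each additional 32-bit field multiplies the expected number of attempts by 2^32.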

Whitebox Fuzzing

[Figure: the seed input is marked symbolic and dynamic symbolic execution generates new inputs aimed at (potential) crash locations. Of the generated inputs shown, two are rejected, one is benign and one crashes the program.]

NDSS’08, ICSE’09, ASPLOS’11, ICSE’15 …
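
A toy sketch of the idea: run the program on a concrete input, record the branch conditions over input bytes, then negate the last one and solve for a new input. A real whitebox fuzzer hands the path condition to a constraint solver; here the "solver" handles only byte-equality constraints, and target and whitebox_fuzz are hypothetical names.

```python
def target(data: bytes, trace: list) -> str:
    """Toy program: reaches the 'crash' only for one specific 4-byte value."""
    for index, expected in enumerate(b"tRNS"):
        taken = index < len(data) and data[index] == expected
        # Log each branch over an input byte: (byte index, constant, outcome).
        trace.append((index, expected, taken))
        if not taken:
            return "rejected"
    return "crash"


def whitebox_fuzz(seed: bytes, max_runs: int = 16):
    """Flip the last failed branch of each run until the crash is reached."""
    data = bytearray(seed)
    for run in range(1, max_runs + 1):
        trace = []
        if target(bytes(data), trace) == "crash":
            return bytes(data), run
        index, expected, _ = trace[-1]  # the branch that rejected the input
        if index >= len(data):
            data.extend(b"\x00" * (index + 1 - len(data)))
        data[index] = expected  # "solve" the negated condition directly
    return None, max_runs


crashing_input, runs = whitebox_fuzz(b"abcd")
```

The specific 4-byte value is found in five runs, one per branch plus the final crashing run, whereas random guessing would hit it with probability 1/2^32.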

Whitebox Fuzzing (WF)

• WF comfortably generates specific values for data fields
• WF easily gets bogged down by the large space of invalid inputs while
  • adding missing data chunk(s), or
  • enforcing integrity constraints such as checksums, size-of, offset-of, …

Motivating Example: a PNG file triggers a crash in VLC media player

• Requires an optional data chunk
• Requires specific values for some data fields

MoBF and WF are very unlikely to generate the crashing input if the selected seed file does not have the optional tRNS data chunk.

Observation & Solution

• A missing data chunk can be obtained from other seed inputs in the test suite
• Or it can be instantiated directly from the input model

[Figure: data chunk transplantation. An input file with a missing part receives the missing data chunk from the test suites or the input model, yielding a new file that has the necessary part.]
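
Data chunk transplantation can be sketched as follows, assuming the standard PNG layout (8-byte signature followed by length | type | data | CRC chunks). The functions crack, stitch and transplant are hypothetical helpers, and the chunk payloads are placeholders rather than valid image data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def crack(png: bytes) -> list:
    """Split a PNG byte string into (type, data) pairs."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, chunks = len(PNG_SIGNATURE), []
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunks.append((png[pos + 4 : pos + 8], png[pos + 8 : pos + 8 + length]))
        pos += 12 + length
    return chunks


def stitch(chunks: list) -> bytes:
    """Re-serialize chunks, recomputing every length and CRC field."""
    out = [PNG_SIGNATURE]
    for ctype, data in chunks:
        crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
        out.append(struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc))
    return b"".join(out)


def transplant(receiver: bytes, donor: bytes, wanted: bytes, before: bytes) -> bytes:
    """Copy the first `wanted` chunk of donor into receiver, ahead of `before`."""
    fragment = next(c for c in crack(donor) if c[0] == wanted)
    chunks = crack(receiver)
    position = next(i for i, c in enumerate(chunks) if c[0] == before)
    chunks.insert(position, fragment)
    return stitch(chunks)


receiver = stitch([(b"IHDR", bytes(13)), (b"IDAT", b"x"), (b"IEND", b"")])
donor = stitch([(b"IHDR", bytes(13)), (b"tRNS", b"\x00\x01"), (b"IDAT", b"y"), (b"IEND", b"")])
new_file = transplant(receiver, donor, wanted=b"tRNS", before=b"IDAT")
```

The receiver now contains the tRNS chunk it was missing, placed before IDAT, and its integrity fields stay consistent because stitching recomputes them.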

Model-Based Whitebox Fuzzing

Augmented MoBF (MoBF + Transplantation), built on Peach Fuzzer, a production-quality MoBF:
• Handles missing data chunks by data chunk transplantation
• Enforces integrity checks

Selective and Targeted Whitebox Fuzzing, built on Hercules (ICSE’15), which scales to WMP and Adobe Reader:
• Guides data chunk transplantation
• Explores deep paths
• Generates specific values causing program crashes

What does the input model look like?

XML-based Input Model (Peach Fuzzer)

• Data model for a generic data chunk
• Data model for PNG image files

[Figure: Peach XML listings of both data models, annotated "inherits common data fields & relationships".]
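
As a rough analogue in Python (this is not Peach's XML syntax, and all class and field names are hypothetical), inheriting common data fields and relationships from a generic chunk model might look like this:

```python
import struct
import zlib


class Chunk:
    """Generic data chunk model: length | type | data | CRC."""

    chunk_type = b"\x00\x00\x00\x00"

    def __init__(self, data: bytes = b""):
        self.data = data

    def serialize(self) -> bytes:
        body = self.chunk_type + self.data
        length = struct.pack(">I", len(self.data))            # size-of relation
        crc = struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)  # checksum relation
        return length + body + crc


class TrnsChunk(Chunk):
    """Specific chunk: inherits every field and relation, overrides the type."""

    chunk_type = b"tRNS"


class PngFile:
    """File-level model: a signature followed by a sequence of chunks."""

    signature = b"\x89PNG\r\n\x1a\n"

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def serialize(self) -> bytes:
        return self.signature + b"".join(c.serialize() for c in self.chunks)


trns_bytes = TrnsChunk(b"\x00\x01").serialize()
png_bytes = PngFile([TrnsChunk(b"\x00\x01")]).serialize()
```

The specific chunk declares only what differs from the generic one, so the length and checksum relations are defined once.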

[Figure: fuzzer pipeline with the components Test suite, Input Model, File Cracker, Generator + Mutator and Mutated File.]

• File Cracker: decomposes a file into data elements (data chunks & data fields)
• Generator + Mutator: integrity constraints are enforced

Peach Fuzzer + Transplantation

[Figure: pipeline with the components Test suite, Input Model, Modified File Cracker, Fragment Pool, File Stitcher, Symbolic Execution and Mutated File.]

• What to transplant?
• Where to transplant?

Answer: Crucial IF Statements

Crucial IF Statements

[Figure: code extracted from LibPNG.]

A crucial IF statement:
• Only one of its branches has been taken
• Its condition depends on the presence of a data chunk in the input file

Detecting Crucial IF Statements

• Step 1. Mark the input file (partially) symbolic
• Step 2. Concolically execute the program along one path, the same path as the concrete input
• Step 3. Collect the branch conditions of IF statements at which only one branch has been taken (e.g., if_2)
• Step 4. Use symbolic-execution-based taint analysis and the input model to analyse those branch conditions (e.g., at if_2) and validate crucial IF statements

[Figure: an execution path through if_1, if_2 and if_3.]
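
Steps 3 and 4 can be sketched on a toy program. The sketch replaces taint analysis with a manual annotation of which chunk type each condition reads; BranchLog, process and the branch identifiers are hypothetical and only mirror the if_1, if_2, if_3 naming above.

```python
from collections import defaultdict


class BranchLog:
    """Records which outcomes each IF statement showed during one run."""

    def __init__(self):
        self.outcomes = defaultdict(set)  # branch id -> {True, False} seen
        self.depends_on = {}              # branch id -> chunk type read, or None

    def branch(self, branch_id, condition, chunk=None):
        self.outcomes[branch_id].add(bool(condition))
        self.depends_on[branch_id] = chunk  # stand-in for taint analysis
        return condition


def process(chunk_types, log):
    """Toy file consumer that loops over the chunk types of one input."""
    for ctype in chunk_types:
        if log.branch("if_1", ctype == b"IHDR"):
            pass  # header handling
        if log.branch("if_2", ctype == b"tRNS", chunk=b"tRNS"):
            pass  # transparency handling, never reached by this input
        if log.branch("if_3", ctype == b"IEND"):
            pass  # end-of-file handling
    if log.branch("if_4", len(chunk_types) > 100):
        pass  # one-sided, but unrelated to the presence of a chunk


def one_sided(log):
    """Step 3: IF statements at which only one branch has been taken."""
    return sorted(b for b, seen in log.outcomes.items() if len(seen) == 1)


def crucial_ifs(log):
    """Step 4: keep one-sided IFs whose condition depends on a data chunk."""
    return {b: log.depends_on[b] for b in one_sided(log) if log.depends_on[b]}


log = BranchLog()
process([b"IHDR", b"IDAT", b"IEND"], log)
candidates = one_sided(log)
crucial = crucial_ifs(log)
```

Here if_2 is the only crucial IF statement, and its annotation names the chunk type (tRNS) that the input is missing.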

Evaluation - Subjects & Input Models

• 9 subject programs
• 6 input models (one-time effort: 34 hrs)

Evaluation - Effectiveness of MoWF

Program    Advisory ID    Input Model  #Seed files
VLC 2.0.7  OSVDB-95632    PNG          10
VLC 2.0.3  CVE-2012-5470  PNG          10
LTP 1.5.4  CVE-2011-3328  PNG          10
XNV 1.98   Unknown-1      PNG          10
XNV 1.98   Unknown-2      PNG          10
XNV 1.98   Unknown-3      PNG          10
WMP 9.0    Unknown-4      WAV          10
WMP 9.0    CVE-2014-2671  WAV          10
WMP 9.0    CVE-2010-0718  MIDI         10
AR 9.2     CVE-2010-2204  PDF          10
RP 1.0     CVE-2010-3000  FLV          10
MP 0.35    CVE-2011-0502  MIDI         10
OV 1.04    CVE-2010-0688  ORB          10

[The table also has result columns for MoWF, Peach and Hercules; their per-row marks are not preserved in the transcript.]

Time bound: 24 hrs

Evaluation - Seed Input Dependence

Program    Advisory ID    Input Model  #Seed files
VLC 2.0.7  OSVDB-95632    PNG          0
VLC 2.0.3  CVE-2012-5470  PNG          0
LTP 1.5.4  CVE-2011-3328  PNG          0
XNV 1.98   Unknown-1      PNG          0
XNV 1.98   Unknown-2      PNG          0
XNV 1.98   Unknown-3      PNG          0
WMP 9.0    Unknown-4      WAV          0
WMP 9.0    CVE-2014-2671  WAV          0
WMP 9.0    CVE-2010-0718  MIDI         0
AR 9.2     CVE-2010-2204  PDF          0
RP 1.0     CVE-2010-3000  FLV          0
MP 0.35    CVE-2011-0502  MIDI         0
OV 1.04    CVE-2010-0688  ORB          0

[The table also has a result column for Hercules++; its per-row marks are not preserved in the transcript.]

70%: no seed file is needed

Related Work

• Grammar-based whitebox fuzzing (PLDI’08)

Grammar-based Whitebox Fuzzing (GWF)

[Figure: grammar-based whitebox fuzzing (PLDI’08). A context-free solver takes a regular expression and a context-free grammar and answers SAT/UNSAT, so that the generated inputs (here JavaScript files) conform to the grammar.]

MoWF vs GWF

• The regular expression used by GWF is much weaker than a full path condition: it cannot encode a simple arithmetic constraint such as "x < y"

• MoWF maintains the full path condition and has no impact on the soundness and completeness of the whitebox fuzzing technique

• MoWF leverages a file-format input model, which is more expressive yet simpler than a context-free grammar. It can comfortably handle integrity constraints such as length-of, offset-of and checksums
