PhD Thesis Defense
Presented by Marcel Boehme
Model-Based Whitebox Fuzzing for Program Binaries
Marcel Böhme, Thuan Pham, Abhik Roychoudhury
ASE 2016 September 3-7, 2016
Singapore
Presented by Thuan Pham
Vulnerabilities in file-processing programs
[Bar chart: number of CVE-assigned vulnerabilities in file-processing programs by year, 2007 to 2016 (2016 counted up to 30/8); y-axis from 0 to 400. Source: US National Vulnerability Database. Bar labels as extracted: 315, 399, 328, 352, 304, 310, 199, 203, 343, 169.]
Challenge
•Generating test cases to expose vulnerabilities in file-processing software is challenging!
•Inputs are highly structured
• Both syntactic and semantic relationships
• Compression/decompression algorithms
• Integrity constraints, e.g., checksums
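File as a Tree
[Figure: a PNG file drawn as a tree. The root PNG consists of a Signature and CHUNKS; CHUNKS expands into a CHUNK followed by further CHUNKS; each CHUNK (a data chunk) consists of the data fields length, type, DATA and CRC, shown with sample values (xxx yyy…, zzz txt…). Integrity constraints are drawn as "length of" and "crc of" links between these fields.]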
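The tree maps directly onto bytes. As a minimal sketch, assuming the standard PNG chunk layout (a 4-byte big-endian length, a 4-byte type, the data, then a CRC-32 computed over type and data), the following Python parses a file into chunks and checks both integrity constraints. The helper names `make_chunk` and `parse_chunks` and the sample payload are illustrative and do not come from the talk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one chunk as length | type | data | CRC(type + data)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def parse_chunks(blob):
    """Return [(type, data), ...]; raise ValueError if a constraint is violated."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("bad signature")
    chunks, pos = [], len(PNG_SIGNATURE)
    while pos < len(blob):
        if pos + 8 > len(blob):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        chunk_type = blob[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(blob):
            raise ValueError("length field exceeds file size")  # length-of constraint
        data = blob[pos + 8:end]
        (crc,) = struct.unpack(">I", blob[end:end + 4])
        if crc != zlib.crc32(chunk_type + data) & 0xFFFFFFFF:
            raise ValueError("CRC mismatch")  # crc-of constraint
        chunks.append((chunk_type, data))
        pos = end + 4
    return chunks


# Toy file: signature plus two chunks (not a decodable image).
sample = PNG_SIGNATURE + make_chunk(b"tEXt", b"key\x00value") + make_chunk(b"IEND", b"")
print(parse_chunks(sample))
```

Changing any byte of a chunk without updating its length and CRC makes `parse_chunks` raise, which is the behaviour a fuzzer has to work around.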
1. (Model-Based) Blackbox Fuzzing 2. Whitebox Fuzzing
Blackbox Fuzzing
[Figure: a seed input goes through Blackbox Fuzzing, which produces mutated inputs; each of the three mutated inputs is marked "Rejected!".]
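Why the mutants are rejected can be seen on a single chunk. The sketch below is a toy experiment, not the setup used in the talk: it assumes a PNG-style chunk (length, type, data, CRC-32 over type and data), flips one random bit per mutant, and counts how many mutants still satisfy the length and CRC checks. The names `chunk_is_valid` and `flip_random_bit` are hypothetical helpers.

```python
import random
import struct
import zlib


def make_chunk(chunk_type, data):
    """Serialize one chunk as length | type | data | CRC(type + data)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def chunk_is_valid(chunk):
    """Check the length-of and crc-of constraints of a single chunk."""
    if len(chunk) < 12:
        return False
    (length,) = struct.unpack(">I", chunk[:4])
    if len(chunk) != 12 + length:
        return False
    (crc,) = struct.unpack(">I", chunk[8 + length:])
    return crc == zlib.crc32(chunk[4:8 + length]) & 0xFFFFFFFF


def flip_random_bit(blob, rng):
    """Model-unaware mutation: flip one bit anywhere in the chunk."""
    out = bytearray(blob)
    pos = rng.randrange(len(out) * 8)
    out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


rng = random.Random(0)
seed = make_chunk(b"tEXt", b"Comment\x00fuzz me")
mutants = [flip_random_bit(seed, rng) for _ in range(1000)]
accepted = sum(chunk_is_valid(m) for m in mutants)
print(f"{accepted} of {len(mutants)} mutants accepted")  # prints "0 of 1000 mutants accepted"
```

No mutant survives: a flip in the length field breaks the size check, and a flip anywhere else breaks the CRC, because CRC-32 detects every single-bit error.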
Model-Based Blackbox Fuzzing
[Figure: a seed input and an input model are fed to Model-Based Blackbox Fuzzing (Peach, Spike …), which produces mutated inputs; one mutated input is labelled "Pass all checks" and two are labelled "Satisfy some checks".]
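A model-aware mutator avoids the integrity rejections because it knows which fields are derived from others. The following sketch assumes a PNG-style chunk and a hypothetical `model_based_mutate` helper: it mutates only the data field and then recomputes the length and CRC, in the spirit of the relations and fixups an input model describes. It is not Peach's or Spike's actual implementation.

```python
import random
import struct
import zlib


def make_chunk(chunk_type, data):
    """Serialize one chunk; length and CRC are derived from type and data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def chunk_is_valid(chunk):
    """Check the length-of and crc-of constraints of a single chunk."""
    if len(chunk) < 12:
        return False
    (length,) = struct.unpack(">I", chunk[:4])
    if len(chunk) != 12 + length:
        return False
    (crc,) = struct.unpack(">I", chunk[8 + length:])
    return crc == zlib.crc32(chunk[4:8 + length]) & 0xFFFFFFFF


def model_based_mutate(chunk, rng):
    """Mutate the data field only, then re-derive length and CRC."""
    (length,) = struct.unpack(">I", chunk[:4])
    chunk_type = chunk[4:8]
    data = bytearray(chunk[8:8 + length])
    if data and rng.random() < 0.5:
        data[rng.randrange(len(data))] = rng.randrange(256)  # overwrite one byte
    else:
        data.append(rng.randrange(256))                      # grow the field
    return make_chunk(chunk_type, bytes(data))


rng = random.Random(0)
seed = make_chunk(b"tEXt", b"Comment\x00fuzz me")
mutants = [model_based_mutate(seed, rng) for _ in range(1000)]
passing = sum(chunk_is_valid(m) for m in mutants)
print(f"{passing} of {len(mutants)} mutants pass the integrity checks")
```

Every mutant passes the length and CRC checks by construction. Whether it also satisfies deeper, value-specific checks in the parser is a separate question, and it is left to chance here.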
Model-Based Blackbox Fuzzing (MoBF)
MoBF struggles to generate specific values for data fields!
Probability of generating the correct value(s) for
• One 32-bit data field: 1/2^32
• Two 32-bit data fields: 1/2^64
• Three 32-bit data fields: 1/2^96
• …
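The numbers follow from a simple model, stated here as an assumption: each field is drawn uniformly and independently at random, and exactly one value per field is correct. Under that model the probability for k fields of 32 bits is 1/2^(32k), and the expected number of attempts until the first hit is 2^(32k). The function names below are illustrative.

```python
from fractions import Fraction


def hit_probability(num_fields, bits=32):
    """Chance that all fields get their single correct value in one attempt."""
    return Fraction(1, 2 ** (bits * num_fields))


def expected_attempts(num_fields, bits=32):
    """Mean of the geometric distribution, 1/p, for the probability above."""
    return 2 ** (bits * num_fields)


for k in (1, 2, 3):
    p = hit_probability(k)
    print(f"{k} field(s): p = 1/2^{32 * k}, expected attempts = {expected_attempts(k)}")
    assert p * expected_attempts(k) == 1
```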
Whitebox Fuzzing
[Figure: the seed input is marked symbolic and passed to Dynamic Symbolic Execution, which targets (potential) crash locations; the generated inputs are labelled Rejected!, Rejected!, Benign and Crash!.]
NDSS’08, ICSE’09, ASPLOS’11, ICSE’15 …
Whitebox Fuzzing (WF)
WF comfortably generates specific values for data fields
WF easily gets bogged down by the large space of invalid inputs while
• adding missing data chunk(s) or
• enforcing integrity constraints like checksums, size-of, offset-of …
Motivating Example
A PNG file triggers a crash in VLC media player
Requires an optional data chunk
Requires specific values for some data fields
MoBF & WF are very unlikely to generate the crashing input IF the selected seed file does not have the optional tRNS data chunk
Observation & Solution
•A missing data chunk can be obtained from other seed inputs in the test suite
•OR it can be directly instantiated from the input model
[Figure: Data chunk Transplantation. An input file with a missing part, together with the test suites and the input model, yields a new file having the necessary part.]
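Transplantation can be sketched on PNG-style chunks. The code below is an illustration under several assumptions: the files are toy chunk sequences rather than decodable images, the missing chunk is tRNS, and the insertion point is fixed to "just before the first IDAT" (where the PNG format places tRNS). The tool described in the talk decides what and where to transplant by analysis; the helper names `split_chunks`, `join_chunks` and `transplant` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one chunk; length and CRC are recomputed from type and data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def split_chunks(blob):
    """Crack a file into (type, data) fragments; CRCs are not re-verified here."""
    chunks, pos = [], len(PNG_SIGNATURE)
    while pos + 12 <= len(blob):
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        chunks.append((blob[pos + 4:pos + 8], blob[pos + 8:pos + 8 + length]))
        pos += 12 + length
    return chunks


def join_chunks(chunks):
    """Stitch fragments back into a file, enforcing length and CRC fields."""
    return PNG_SIGNATURE + b"".join(make_chunk(t, d) for t, d in chunks)


def transplant(receiver, donor, wanted=b"tRNS", before=b"IDAT"):
    """Copy the `wanted` chunk from donor into receiver, ahead of `before`."""
    recv = split_chunks(receiver)
    types = [t for t, _ in recv]
    if wanted in types:
        return receiver                      # nothing is missing
    fragment = next((c for c in split_chunks(donor) if c[0] == wanted), None)
    if fragment is None:
        raise LookupError("donor does not contain the wanted chunk")
    if before not in types:
        raise ValueError("receiver has no anchor chunk to insert before")
    recv.insert(types.index(before), fragment)
    return join_chunks(recv)


# Toy inputs: placeholder payloads, for illustrating chunk order only.
receiver = join_chunks([(b"IHDR", bytes(13)), (b"PLTE", bytes(3)),
                        (b"IDAT", b""), (b"IEND", b"")])
donor = join_chunks([(b"IHDR", bytes(13)), (b"PLTE", bytes(3)),
                     (b"tRNS", b"\x7f"), (b"IDAT", b""), (b"IEND", b"")])
patched = transplant(receiver, donor)
print([t.decode() for t, _ in split_chunks(patched)])
# prints ['IHDR', 'PLTE', 'tRNS', 'IDAT', 'IEND']
```

Because the file is re-serialized through `make_chunk`, the transplanted chunk arrives with consistent length and CRC fields.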
Model-Based Whitebox Fuzzing
Augmented MoBF (MoBF + Transplantation)
•Handles missing data chunks by data chunk transplantation
•Enforces integrity checks
•Built on Peach Fuzzer (production-quality MoBF)
Selective and Targeted Whitebox Fuzzing
•Guides data chunk transplantation
•Explores deep paths
•Generates specific values causing program crashes
•Built on Hercules (ICSE’15), which scales to WMP and Adobe Reader
What does the input model look like?
XML-based Input Model (Peach Fuzzer)
[Figure: two XML listings. The first is the data model for a generic data chunk. The second is the data model for PNG image files, which inherits the common data fields & relationships.]
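As a rough Python analogue of such a model (this is not Peach's XML syntax, and all class and field names are hypothetical), a generic chunk can declare the common fields and their relationships once, and a specific chunk type can inherit them and override only what differs.

```python
import struct
import zlib
from dataclasses import dataclass, field


@dataclass
class GenericChunk:
    """Generic data chunk: common fields plus their relationships."""
    chunk_type: bytes = b"\x00\x00\x00\x00"
    data: bytes = b""

    def serialize(self):
        length = struct.pack(">I", len(self.data))          # size-of relation
        crc = zlib.crc32(self.chunk_type + self.data) & 0xFFFFFFFF
        return length + self.chunk_type + self.data + struct.pack(">I", crc)  # checksum


@dataclass
class TrnsChunk(GenericChunk):
    """Specific chunk: inherits fields and relationships, fixes the type."""
    chunk_type: bytes = b"tRNS"


@dataclass
class PngFile:
    """File model: a signature followed by a sequence of chunks."""
    chunks: list = field(default_factory=list)

    def serialize(self):
        return b"\x89PNG\r\n\x1a\n" + b"".join(c.serialize() for c in self.chunks)


chunk = TrnsChunk(data=b"\x01\x02")
print(chunk.serialize().hex())
print(len(PngFile([chunk]).serialize()))
```

The point of the inheritance is that the length and checksum rules are written once in the generic chunk and apply unchanged to every derived chunk type.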
[Figure: Peach Fuzzer workflow with the components File Cracker and Generator + Mutator; the Test suite and the Input Model are the inputs and a Mutated File is the output.]
•File Cracker: decomposes a file into data elements (data chunks & data fields)
•Integrity constraints are enforced
Peach Fuzzer + Transplantation
[Figure: extended workflow with the components Modified File Cracker, Fragment Pool, File Stitcher and Symbolic Execution; the Test suite and the Input Model are the inputs and a Mutated File is the output.]
•What to transplant?
•Where to transplant?
Answer: Crucial IF Statements
Crucial IF Statements
[Figure: code extracted from LibPNG]
A Crucial IF Statement
- Only one branch has been taken
- It depends on the presence of a data chunk in the input file
Detecting Crucial IF Statements
• Step 1. Mark the input file (partially) symbolic
• Step 2. Concolically execute the program along one path, the same path as the concrete input
• Step 3. Collect the branch conditions of IF statements at which only one branch has been taken (e.g., if_2)
• Step 4. Use symbolic-execution-based taint analysis & the input model to analyse the branch conditions (at if_2) to validate crucial IF statements
[Figure: the executed path passes through if_1, if_2 and if_3.]
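Steps 3 and 4 can be sketched on a recorded branch trace. This is a simplified stand-in, not the tool's implementation: the trace format, the function names and the `depends_on_chunk` table (which plays the role of the taint-analysis result combined with the input model) are all assumptions made for illustration.

```python
from collections import defaultdict


def one_sided_branches(trace):
    """Step 3: IF statements for which only one direction was observed.

    `trace` is a sequence of (branch_id, taken) events from one concrete run.
    """
    outcomes = defaultdict(set)
    for branch_id, taken in trace:
        outcomes[branch_id].add(taken)
    return sorted(b for b, seen in outcomes.items() if len(seen) == 1)


def crucial_ifs(trace, depends_on_chunk):
    """Step 4 stand-in: keep one-sided IFs whose condition depends on whether
    a data chunk is present, according to a precomputed (hypothetical) table."""
    return [b for b in one_sided_branches(trace) if depends_on_chunk.get(b, False)]


# Hypothetical run: if_1 and if_3 sit in a loop and go both ways, if_2 does not.
trace = [("if_1", True), ("if_2", False), ("if_3", True),
         ("if_1", False), ("if_3", False)]
depends_on_chunk = {"if_1": False, "if_2": True, "if_3": False}

print(one_sided_branches(trace))             # prints ['if_2']
print(crucial_ifs(trace, depends_on_chunk))  # prints ['if_2']
```

The untaken side of each reported IF marks program behaviour that the current seed cannot reach without the chunk in question, which makes it a natural guide for transplantation.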
Evaluation - Subjects & Input Models
9 subject programs
6 input models (one-time effort: 34 hrs)
Evaluation - Effectiveness of MoWF
Program    Advisory ID     Input Model  #Seed files
VLC 2.0.7  OSVDB-95632     PNG          10
VLC 2.0.3  CVE-2012-5470   PNG          10
LTP 1.5.4  CVE-2011-3328   PNG          10
XNV 1.98   Unknown-1       PNG          10
XNV 1.98   Unknown-2       PNG          10
XNV 1.98   Unknown-3       PNG          10
WMP 9.0    Unknown-4       WAV          10
WMP 9.0    CVE-2014-2671   WAV          10
WMP 9.0    CVE-2010-0718   MIDI         10
AR 9.2     CVE-2010-2204   PDF          10
RP 1.0     CVE-2010-3000   FLV          10
MP 0.35    CVE-2011-0502   MIDI         10
OV 1.04    CVE-2010-0688   ORB          10
[Result columns MoWF, Peach and Hercules: per-row marks not recoverable]
Time bound: 24 hrs
Evaluation - Seed Input Dependence
Program    Advisory ID     Input Model  #Seed files
VLC 2.0.7  OSVDB-95632     PNG          0
VLC 2.0.3  CVE-2012-5470   PNG          0
LTP 1.5.4  CVE-2011-3328   PNG          0
XNV 1.98   Unknown-1       PNG          0
XNV 1.98   Unknown-2       PNG          0
XNV 1.98   Unknown-3       PNG          0
WMP 9.0    Unknown-4       WAV          0
WMP 9.0    CVE-2014-2671   WAV          0
WMP 9.0    CVE-2010-0718   MIDI         0
AR 9.2     CVE-2010-2204   PDF          0
RP 1.0     CVE-2010-3000   FLV          0
MP 0.35    CVE-2011-0502   MIDI         0
OV 1.04    CVE-2010-0688   ORB          0
[Result column Hercules++: per-row marks not recoverable]
70% No seed file is needed
Related Work
Grammar-based whitebox fuzzing (PLDI’08)
Grammar-based Whitebox Fuzzing (GWF)
[Figure: GWF workflow (PLDI’08). The seed input is a JavaScript file. Grammar-Based Whitebox Fuzzing passes a Regular Expression and the Context-Free Grammar to a Context-Free Solver, which answers SAT/UNSAT; the generated inputs conform to the grammar.]
MoWF vs GWF
•The Regular Expression (GWF) is much weaker than the full Path Condition: it cannot encode a simple arithmetic constraint like "x<y"
•MoWF maintains the full Path Condition and has no impact on the soundness and completeness of the Whitebox Fuzzing technique
•MoWF leverages a file-format input model, which is more expressive yet simpler than a Context-Free Grammar. It can comfortably handle integrity constraints like length-of, offset-of and checksums