Date post: | 13-Apr-2017 |
Category: | Software |
Upload: | lastline-inc |
• Co-founder and Chief Scientist at Lastline, Inc.
  – Lastline offers protection against zero-day threats and advanced malware
• Professor in Computer Science at UC Santa Barbara
  – many systems security papers in academic conferences
• Part of Shellphish
Who are we?
• PhD Student at UC Santa Barbara
  – research focused primarily on binary security and embedded devices
• Part of Shellphish
  – team leader of Shellphish's effort in the DARPA Cyber Grand Challenge
• Doesn't like peanut butter
Who are we?
- firmware
- binary analysis
- angr
What are we talking about?
The “Internet of Things”
Embedded software is everywhere
• Embedded Linux and user-space programs
• Custom OS and custom programs combined together in a binary blob
  – typically, the binary is all that you get
  – and, sometimes, it is not easy to get this off the device
What is on embedded devices?
Binary analysis, noun | bi·na·ry anal·y·sis | \ˈbī-nə-rē ə-ˈna-lə-səs\
1. The process of automatically deriving properties about the behavior of binary programs
2. Including static binary analysis and dynamic binary analysis
Binary Analysis
• Program verification
• Program testing
• Vulnerability excavation
• Vulnerability signature generation

• Reverse engineering
• Vulnerability excavation
• Exploit generation
Goals of Binary Analysis
– reason over multiple (all) execution paths
– can achieve excellent coverage
– precision versus scalability trade-off
  • very precise analysis can be slow and not scalable
  • too much approximation leads to wrong results (false positives)
– often works on abstract program model
  • for example, binary code is lifted to an intermediate representation
Static Binary Analysis
– examine individual program paths
– very precise
– coverage is (very) limited
– sometimes hard to properly run program
  • hard to attach debugger to embedded system
  • when code is extracted and emulated, what happens with calls to peripherals?
Dynamic Binary Analysis
• Get the binary code
• Binaries lack significant information present in source
• Often no clear library or operating system abstractions
  o where to start the analysis from?
  o hard to handle environment interactions
Challenges of Static Binary Analysis
From Source to Binary Code
[Figure: source code → compile → link → strip → binary code. Information lost along the way: type info, function names, variable names, jump targets]
• (Linux) system call interface is great
  – you know what the I/O routines are
    • important to understand what user can influence
  – you have typed parameters and return values
  – lets the analysis focus on (much smaller) main program
• OS is not there or embedded in binary blob
  – heuristics to find I/O routines
  – open challenge to find mostly independent components
Missing OS and Library Abstractions
• Library functions are great
  – you know what they do and can write a “function summary”
  – you have typed parameters and return values
  – lets the analysis focus on (much smaller) main program
• Library functions are embedded (like static linking)
  – need heuristics to rediscover library functions
  – IDA FLIRT (Fast Library Identification and Recognition Technology)
  – more robustness based on looking for control flow similarity
Missing OS and Library Abstractions
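A minimal sketch of signature-based library rediscovery, in the spirit of the byte-pattern heuristics mentioned above. The signature names and byte patterns are made up for illustration and are not real FLIRT data; `None` stands for a wildcard byte (for example, a relocated address).

```python
from typing import List, Optional

# Hypothetical signature database: function name -> prologue byte pattern.
# None marks a wildcard byte. These patterns are invented for this sketch.
SIGNATURES = {
    "strcmp_like": [0x55, 0x89, 0xE5, None, None, 0x8A, 0x01],
    "memcpy_like": [0x55, 0x89, 0xE5, 0x57, 0x56, None, 0xF3, 0xA4],
}


def matches(code: bytes, offset: int, pattern: List[Optional[int]]) -> bool:
    """Return True if `pattern` matches `code` at `offset` (None = any byte)."""
    if offset + len(pattern) > len(code):
        return False
    return all(p is None or code[offset + i] == p for i, p in enumerate(pattern))


def identify(code: bytes, offset: int) -> Optional[str]:
    """Return the name of the first signature that matches at `offset`, if any."""
    for name, pattern in SIGNATURES.items():
        if matches(code, offset, pattern):
            return name
    return None


# A tiny made-up "firmware blob": one NOP, then bytes that fit strcmp_like.
blob = bytes([0x90, 0x55, 0x89, 0xE5, 0x12, 0x34, 0x8A, 0x01, 0xC3])
found = {}
for off in range(len(blob)):
    name = identify(blob, off)
    if name is not None:
        found[off] = name
print(found)
```

Exact byte matching breaks as soon as the compiler or its flags change, which is why the slide points to control-flow similarity as the more robust option.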
• Memory safety vulnerabilities
  – buffer overrun
  – out of bounds reads (Heartbleed)
  – write-what-where
• Authentication bypass (backdoors)
• Actuator control!
Types of Vulnerabilities
Linux embedded device: HTTP server for management and video monitoring, with a known backdoor.
Backdoor!!!
➔ Username: 3sadmin
➔ Password: 27988303
Heffner, Craig. "Finding and Reversing Backdoors in Consumer Firmware." EELive! (2014).
Motivating Example
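To make the shape of such a backdoor concrete, here is a hypothetical Python reconstruction. The slides do not show the device's code, so the function name, the `USERS` table, and the control flow are assumptions; only the two hard-coded credentials come from the slide.

```python
import hmac

# Hypothetical user database for the legitimate path (assumption for this sketch).
USERS = {"admin": "correct horse battery staple"}

# Hard-coded credentials as reported for the device in the motivating example.
BACKDOOR_USER = "3sadmin"
BACKDOOR_PASSWORD = "27988303"


def authenticate(username: str, password: str) -> bool:
    """Return True if the caller should reach the authenticated state."""
    # Backdoor: a plain comparison against constants baked into the binary.
    if username == BACKDOOR_USER and password == BACKDOOR_PASSWORD:
        return True
    # Legitimate path: depends on data that is not part of the code itself.
    expected = USERS.get(username)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), password.encode())


print(authenticate("3sadmin", "27988303"))
print(authenticate("admin", "wrong password"))
```

The backdoor branch is the interesting one for analysis: the input that satisfies it can be read off the code alone, without knowing anything stored on the device.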
Authentication Bypass
[Figure: authentication flow with nodes Prompt, Authentication, Success, and Failure, plus a Backdoor node (e.g. strcmp()) annotated "Hard to find."]
Authentication Bypass
[Figure: flow with only Prompt and Success; the Authentication step is marked "Missing!"]
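Modeling Authentication Bypass
[Figure: authentication flow with nodes Prompt, Authentication, Success, Failure, and Backdoor (e.g. strcmp()); one part of the flow is annotated "Hard to find." and another "Easier to find!"]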
Input Determinism
[Figure: authentication flow with nodes Prompt, Authentication, Success, Failure, and Backdoor (e.g. strcmp())]
Can we determine the input needed to reach the success function, just by analyzing the code?
The answer is NO
Input Determinism
[Figure: authentication flow with nodes Prompt, Authentication, Success, Failure, and Backdoor (e.g. strcmp())]
Can we determine the input needed to reach the success function, just by analyzing the code?
The answer is YES
Modeling Authentication Bypass
[Figure: authentication flow with nodes Prompt, Authentication, Success, Failure, and Backdoor (e.g. strcmp()), annotated "Easier to find!"]
But how?
• Without OS/ABI information:
• With ABI information:
Finding “Authenticated Point”
EXEC()
Using Binary Analysis to Hunt for Vulnerabilities
[Figure: analysis pipeline. Inputs: Program, Security policies. Components: Static Analysis, Symbolic Execution, Security Policy Checker. Output: POCs]
angr: A Binary Analysis Framework
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
"How do I trigger path X or condition Y?"
Symbolic Execution
"How do I trigger path X or condition Y?"
- Dynamic analysis
  - Input A? No. Input B? No. Input C? …
  - Based on concrete inputs to application.
- (Concrete) static analysis
  - "You can't"/"You might be able to"
  - Based on various static techniques.
We need something slightly different.
Symbolic Execution
"How do I trigger path X or condition Y?"
1. Interpret the application.
2. Track "constraints" on variables.
3. When the required condition is triggered, "concretize" to obtain a possible input.
Symbolic Execution
Constraint solving:
❏ Conversion from set of constraints to set of concrete values that satisfy them.
❏ NP-complete, in general.
Constraints
x >= 10
x < 100
→ x = 42
Symbolic Execution
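The three steps (interpret, track constraints, concretize) can be sketched on a toy program. Everything here is an assumption made for illustration: the program is a hand-written decision tree rather than real binary code, and the "solver" simply tries every value of a small unsigned integer, which is not how a real constraint solver works.

```python
from typing import Optional

# Toy program as a decision tree. A node is either a label (str) or a tuple
# (description, condition, subtree_if_true, subtree_if_false).
PROGRAM = (
    "x >= 10", lambda x: x >= 10,
    ("x < 100", lambda x: x < 100, "SUCCESS", "FAIL"),
    "FAIL",
)


def paths_to(node, target: str, constraints=()):
    """Yield the list of (description, predicate) constraints for every
    path through the tree that ends in `target`."""
    if isinstance(node, str):
        if node == target:
            yield list(constraints)
        return
    desc, cond, if_true, if_false = node
    # Taken branch: the condition itself becomes a path constraint.
    yield from paths_to(if_true, target, constraints + ((desc, cond),))
    # Not-taken branch: the negated condition becomes a path constraint.
    negated = ("not (" + desc + ")", lambda x, c=cond: not c(x))
    yield from paths_to(if_false, target, constraints + (negated,))


def concretize(constraints, bits: int = 8) -> Optional[int]:
    """Brute-force stand-in for a solver: try every `bits`-wide unsigned value."""
    for candidate in range(2 ** bits):
        if all(cond(candidate) for _, cond in constraints):
            return candidate
    return None


path = next(paths_to(PROGRAM, "SUCCESS"))
print([desc for desc, _ in path])
print(concretize(path))
```

The search returns the smallest satisfying value; any other value that meets both constraints, such as the slide's x = 42, is an equally valid concretization. The brute-force loop also hints at the cost: the search space doubles with every extra input bit.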
Symbolic Execution - Pros and Cons
Pros
- Precise
- No false positives (with correct environment model)
- Produces directly-actionable inputs
Cons
- Not scalable
  - constraint solving is NP-complete
  - path explosion
[Figure: path explosion, comparing "Our Case" with the "Worst-Case"]
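A small sketch of why path explosion hurts, assuming the worst case in which every branch is independent of the others and feasible in both directions. The function name is hypothetical.

```python
from itertools import product


def count_paths(num_branches: int) -> int:
    """Enumerate every true/false outcome of `num_branches` sequential,
    independent two-way branches and count the resulting paths."""
    return sum(1 for _ in product((True, False), repeat=num_branches))


for n in (1, 4, 16):
    print(n, count_paths(n))
```

Each additional branch doubles the number of paths, so a loop whose body branches once per iteration already produces one path per possible iteration history.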
angr: A Binary Analysis Framework
Control-Flow Graph
Data-Flow Analysis
Value-Set Analysis
Static Analysis Routines
Symbolic Execution Engine
Binary Loader
angr
Example
    mov rax, 0x400000
    mov rbx, 0

    cmp rbx, 0x1024
    ja _OUT
    cmp [rax+rbx], 1337
    je _OUT
    add rbx, 4

What is rbx in the yellow square? (The yellow square highlights a point in the slide's code figure.)
Symbolic execution: state explosion
Naive static analysis: "anything"
Range analysis: "< 0x1024"
Can we do better?
• Memory access checks
• Type inference
• Variable recovery
• Range recovery
• Wrapped-interval analysis
• Value-set analysis
• Abstract interpretation
Value Set Analysis
Value Set Analysis - Strided Intervals
4[0x100, 0x120],32
Notation: Stride[Low, High],Size

Values represented:
0x100 0x10c 0x118
0x104 0x110 0x11c
0x108 0x114 0x120
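A minimal sketch of the strided-interval notation as a Python data structure. The class and method names are assumptions made for illustration (they are not angr's API), and wrap-around at the bit width is deliberately ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StridedInterval:
    stride: int  # distance between consecutive values (0 for a single value)
    low: int     # smallest value
    high: int    # largest value
    bits: int    # size of the value in bits

    def values(self) -> List[int]:
        """Enumerate every concrete value the interval represents."""
        if self.stride == 0:
            return [self.low]
        return list(range(self.low, self.high + 1, self.stride))

    def __contains__(self, value: int) -> bool:
        if not self.low <= value <= self.high:
            return False
        if self.stride == 0:
            return value == self.low
        return (value - self.low) % self.stride == 0

    def __str__(self) -> str:
        return f"{self.stride}[{self.low:#x}, {self.high:#x}],{self.bits}"


si = StridedInterval(4, 0x100, 0x120, 32)
print(si)
print([hex(v) for v in si.values()])
print(0x10c in si, 0x10a in si)
```

Four small integers stand in for nine concrete values here; the same four integers could equally describe millions of values, which is what makes the representation attractive for static analysis.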
Example
    mov rax, 0x400000
    mov rbx, 0

    cmp rbx, 0x1024
    ja _OUT
    cmp [rax+rbx], 1337
    je _OUT
    add rbx, 4

What is rbx in the yellow square?
1. 1[0x0, 0x0],64
2. 4[0x0, 0x4],64
3. 4[0x0, 0x8],64
4. 4[0x0, 0xc],64
5. 4[0x0, ∞],64 (Widen)
6. 4[0x0, 0x1024],64 (Narrow)
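The six numbered steps can be reproduced with a small fixed-point loop. This is a sketch under stated assumptions, not angr's implementation: intervals are plain `(stride, low, high)` tuples, a single value uses stride 0 (the slide writes it as stride 1), widening starts once four states have been recorded, and the only narrowing applied is the loop guard `rbx <= 0x1024` implied by `cmp rbx, 0x1024; ja _OUT`.

```python
import math
from math import gcd

LIMIT = 0x1024  # bound implied by `cmp rbx, 0x1024; ja _OUT`


def join(a, b):
    """Over-approximate the union of two (stride, low, high) intervals."""
    (s1, l1, h1), (s2, l2, h2) = a, b
    stride = gcd(gcd(s1, s2), abs(l1 - l2))
    return (stride, min(l1, l2), max(h1, h2))


def add_const(si, c):
    """Abstract effect of `add rbx, c`."""
    stride, low, high = si
    return (stride, low + c, high + c)


def widen(old, new):
    """If the upper bound is still growing, jump straight to infinity."""
    stride, low, high = new
    return (stride, low, math.inf if high > old[2] else high)


def narrow(si, upper):
    """Refine with the guard value <= upper, keeping the stride alignment."""
    stride, low, high = si
    if high > upper:
        high = upper - ((upper - low) % stride) if stride else low
    return (stride, low, high)


def analyze(widen_after: int = 4):
    entry = (0, 0, 0)  # mov rbx, 0
    state = entry
    history = [state]
    while True:
        # Merge the loop entry edge with the back edge (rbx + 4).
        candidate = join(entry, add_const(state, 4))
        if len(history) >= widen_after:
            candidate = widen(state, candidate)
        if candidate == state:  # fixed point reached
            break
        state = candidate
        history.append(state)
    history.append(narrow(state, LIMIT))
    return history


for step, (stride, low, high) in enumerate(analyze(), start=1):
    print(step, stride, low, high)
```

Widening trades precision for termination: without it the loop would need more than a thousand iterations to stabilize, and narrowing then recovers the bound that the guard makes obvious.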
• CB: vulnerable program
• RB: patched program
• POV: exploit
• Cyber Reasoning System
The Cyber Grand Challenge!
The Shellphish CRS
[Figure: CRS architecture. Components: Autonomous vulnerability scanning, Autonomous service resiliency, Autonomous processing, Autonomous patching. Data flowing between them: CB, PCAP, Test cases, Proposed POVs, Proposed RBs, POV, RB]
- ipython-accessible
- powerful analyses
- versatile
- well-encapsulated
- open and expandable
- architecture "independent"
Angr
Angr Mini-howto
# ipython
In [1]: import angr, networkx
In [2]: binary = angr.Project("/some/binary")
In [3]: cfg = binary.analyses.CFG()
In [4]: networkx.draw(cfg.graph)

In [5]: explorer = binary.factory.path_group()  # newer angr releases: binary.factory.simulation_manager()
In [6]: explorer.explore(find=0xc001b000)
➔ http://angr.io
➔ https://github.com/angr
➔ [email protected]
Pull requests, issues, questions, etc super-welcome! Let's bring on the next generation of binary analysis!
Angr - Open source!
Birthday: September 2013
Total lines of code: 59950
Total commits: ALMOST 9000!! (actually ~6000)
Value Set Analysis - Strided Intervals
4[0x100, 0x110],32 + 1 = 4[0x101, 0x111],32
4[0x100, 0x110],32 >> 1 = 2[0x80, 0x88],31
2[0x80, 0x88],31 << 1 = 1[0x100, 0x110],32
4[0x100, 0x110],32 ⋃ 4[0x102, 0x112],32 = 2[0x100, 0x112],32
4[0x100, 0x110],32 ⋂ 3[0x100, 0x110],32 = 12[0x100, 0x10c],32
WIDEN (4[0x100, 0x110],32 ⋃ 4[0x100, 0x112],32) = 4[0x100, ∞],32
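Two of the operations above, adding a constant and taking a union, can be checked with a short sketch. The tuple representation and function names are assumptions for illustration; the bit width and wrap-around are ignored, and the gcd-based union is one common way to over-approximate, not necessarily the exact rule angr uses.

```python
from math import gcd


def si_add_const(si, c):
    """Adding a constant shifts both bounds and keeps the stride."""
    stride, low, high = si
    return (stride, low + c, high + c)


def si_union(a, b):
    """Over-approximate union: the stride must divide both strides and the
    offset between the two lower bounds."""
    (s1, l1, h1), (s2, l2, h2) = a, b
    stride = gcd(gcd(s1, s2), abs(l1 - l2))
    return (stride, min(l1, l2), max(h1, h2))


def concretize(si):
    """Set of concrete values represented by (stride, low, high)."""
    stride, low, high = si
    return set(range(low, high + 1, stride)) if stride else {low}


a = (4, 0x100, 0x110)
b = (4, 0x102, 0x112)

shifted = si_add_const(a, 1)
union = si_union(a, b)
print(shifted[0], hex(shifted[1]), hex(shifted[2]))
print(union[0], hex(union[1]), hex(union[2]))

# Soundness check: the abstract union covers every concrete value.
print(concretize(a) | concretize(b) <= concretize(union))
```

The final check is the property that matters for a sound analysis: an abstract operation may add values that cannot occur, but it must never drop one that can.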