Solvers for Software Reliability and Security VIJAY GANESH MIT 2011 Monday, May 2, 2011
Transcript
Page 1: Solvers for Software Reliability and Securityvganesh/talks/vijayganesh-2011...From Reliability Problem to Solvers 7 Formal Methods Program Analysis Automatic Testing Program Synthesis

Solvers for Software Reliability and Security

VIJAY GANESH
MIT
2011

Monday, May 2, 2011


The Software Reliability Problem

• Software is error-prone

• Significant and increasing costs

• Foundational research problem and opportunity


What is at the Core?
Logic Abstractions of Computation

Program Reasoning
• Formal Methods
• Program Analysis
• Automatic Testing
• Program Synthesis

Logics (Boolean, ...)


Why Logic for Program Reasoning
Logic Abstractions of Computation

Logic Formula: Declarative View
Imperative Code: Operational View

• Logic provides abstractions of computation

• Easy to work with abstractions

• Compact representation of desired properties


Reliability through Logical Reasoning
Engineering, Usability, Novelty

[Diagram: Program and Specification → Program Reasoning Tool → Logic Formulas → Solver → SAT/UNSAT → Program is Correct? or Generate Counterexamples (Test cases)]


What is at the Core?
The SAT/SMT Problem

• Rich logics (Modular arithmetic, Arrays, Strings, ...)
• NP-complete, PSPACE-complete, ...
• Practical, scalable, usable, automatic
• Enable novel software reliability approaches

[Diagram: Logic Formula, e.g., (q ∨ p ∨ ¬r) ∧ (q ∨ ¬p ∨ r) ∧ ..., → Solver → SAT / UNSAT]
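To make the SAT/UNSAT interface concrete, here is a minimal sketch that decides the two-clause formula from this slide by exhaustive search. The clause encoding and the function name `solve` are illustrative assumptions; real SAT solvers use far more sophisticated search than this brute-force loop.

```python
from itertools import product

# A clause is a list of (variable, polarity) literals; polarity False means negated.
# This encodes (q OR p OR NOT r) AND (q OR NOT p OR r) from the slide.
FORMULA = [
    [("q", True), ("p", True), ("r", False)],
    [("q", True), ("p", False), ("r", True)],
]

def solve(cnf):
    """Return a satisfying assignment as a dict, or None if the CNF is UNSAT."""
    variables = sorted({name for clause in cnf for name, _ in clause})
    # Brute force: try every combination of truth values (exponential in general).
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[n] == pol for n, pol in clause) for clause in cnf):
            return assignment
    return None

model = solve(FORMULA)
print("SAT" if model is not None else "UNSAT", model)
```

The same function reports UNSAT for a contradictory input such as `[[("x", True)], [("x", False)]]`.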


So, What’s New?
From Reliability Problem to Solvers

Program Reasoning
• Formal Methods
• Program Analysis
• Automatic Testing
• Program Synthesis

SAT/SMT Solvers

• More than 1000x improvement in 10 years

• Enabled completely new techniques

• Super-charged existing techniques


And, The Research Story is....

[Chart: formula sizes handled by solvers, marked at 1,000, 10,000, 100,000 and 1,000,000 constraints, over the years 1998, 2001, 2004, 2007, 2010]

• Supercharge existing techniques
• Bounded model-checking
• Concolic testing
• Program analysis

• Solver-based languages
• Compiler optimizations using solvers
• Solver-based debuggers
• Solver-based type systems
• Solver-based concurrency bugfinding
• Solver-based synthesis


My Contributions
STP & HAMPI Solvers

Program Reasoning
• Formal Methods
• Program Analysis
• Automatic Testing
• Program Synthesis

STP & HAMPI Solvers

• Can handle real-world formulas with millions of constraints

• Enabled completely new techniques (e.g., Concolic testing)

• Enable testing of million-line codebases

• Super-charged existing techniques (e.g., Hardware bounded MC)

• Future is bright: Multicore, programming language, runtime systems


And, The Research Story is ...

[Chart: formula sizes from 100,000 constraints to 1,000,000 constraints over the timeline 2005, 2009, Today]

• STP
• Enabled Concolic Testing
• EXE by Engler et al.
• BAP/BitBlaze by Song et al.
• Model checking by Dill et al.

• Solver-based languages (Alloy team)
• Solver-based debuggers
• Solver-based type systems
• Solver-based concurrency bugfinding

• HAMPI: String Solvers
• Ardilla by Ernst et al.
• Kudzu & Kaluza by Song et al.
• Klee by Engler et al.
• George Candea’s Cloud 9 tester
• STP + HAMPI used in 100+ projects


Key Contributions

Name | Key Concept | Impact | Pubs
STP Bit-vector & Array Solver [1,2] | Abstraction-refinement for Solving | Concolic Testing | CAV 2007, CCS 2006, TISSEC 2008
HAMPI String Solver [1] | App-driven Bounding for Solving | Analysis of Web Apps | ISSTA 2009 [3], TOSEM 2011 (Invited/in submission)
(Un)Decidability results for Strings | Insights from Practical Applications | First results for strings+length | In submission

1. 100+ research projects use STP and HAMPI
2. STP won the SMTCOMP 2006 and 2010 competitions for bit-vector solvers
3. ACM Best Paper Award 2009


Rest of the Talk

• STP Bit-vector and Array Solver
  • Why Bit-vectors and Arrays?
  • How does STP scale: Abstraction-refinement
  • Impact: Concolic testing
  • Experimental Results

• HAMPI String Solver
  • Why Strings?
  • How does HAMPI scale: Bounding
  • Impact: String-based program analysis
  • Experimental Results

• Future Work
  • Multicore SAT
  • SAT-based Languages
  • Auto-tuning Solvers
  • Advice-based Solvers


STP Bit-vector & Array Solver

[Diagram: Program Expressions, e.g., (x = z+2 OR mem[i] + y <= 01), → STP Solver → SAT / UNSAT]

• Bit-vector or machine arithmetic
• Arrays for memory
• C/C++/Java expressions
• NP-complete
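Bit-vector (machine) arithmetic differs from integer arithmetic because results wrap around at the word size. The sketch below illustrates this with an assumed 8-bit width; the names `bv_add` and `find_overflow_witness` are hypothetical, and the exhaustive loop merely stands in for what a bit-vector solver would do symbolically.

```python
WIDTH = 8                    # assumed word size, chosen small for illustration
MASK = (1 << WIDTH) - 1      # 0xFF

def bv_add(x, y):
    """Unsigned addition modulo 2**WIDTH, as in machine arithmetic."""
    return (x + y) & MASK

def find_overflow_witness():
    """Search for x with x + 1 < x, which is impossible over the integers."""
    for x in range(MASK + 1):
        if bv_add(x, 1) < x:
            return x
    return None

print(find_overflow_witness())  # the only 8-bit witness is the maximum value
```

Over the mathematical integers the constraint x + 1 < x is unsatisfiable, whereas over 8-bit vectors it has exactly one solution.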


Program Reasoning & STP
Why Bit-vectors and Arrays

• STP logic tailored for software reliability applications

• Support symbolic execution/program analysis

C/C++/Java/... | Bit-vectors and Arrays
Int Var, Char Var | 32 bit variable, 8 bit variable
Arithmetic operation (x+y, x-y, x*y, x/y, ...) | Arithmetic function (x+y, x-y, x*y, x/y, ...)
assignments: x = expr; | equality: x = expr;
if conditional: if(cond) x = expr1 else x = expr2 | if-then-else construct: x = if(cond) expr1 else expr2
inequality | inequality predicate
Memory read/write: x = *ptr + i; | Array read/write: ptr[]; x = Read(ptr,i);
Structure/Class | Serialized bit-vector expressions
Function | Symbolic execution
Loops | Bounding


How to Automatically Crash Programs?
Concolic Execution & STP

Problem: Automatically generate crashing tests given only the code

[Diagram: Automatic Tester. Program → Symbolic Execution Engine with Implicit Spec → Formulas → STP → SAT/UNSAT → Crashing Tests]


How to Automate Testing?
Concolic Execution & STP

Structured input processing code: PDF Reader, Movie Player, ...

  Buggy_C_Program(int* data_field, int len_field) {
    int * ptr = malloc(len_field*sizeof(int));
    int i; //uninitialized

    while (i++ < process(len_field)) {
      //1. Integer overflow causing NULL deref
      //2. Buffer overflow
      *(ptr+i) = process_data(*(data_field+i));
    }
  }

Equivalent Logic Formula derived using symbolic execution:

  data_field, mem_ptr : ARRAY;
  len_field : BITVECTOR(32); //symbolic
  i, j, ptr : BITVECTOR(32); //symbolic
  ..
  mem_ptr[ptr+i] = process_data(data_field[i]);
  mem_ptr[ptr+i+1] = process_data(data_field[i+1]);
  ..

• Formula captures computation
• Tester attaches formula to capture spec

  //INTEGER OVERFLOW QUERY
  0 <= j <= process(len_field);
  ptr + i + j = 0?
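One way an integer overflow can arise in code like this is in the size computation `len_field*sizeof(int)` passed to `malloc`. The sketch below is a hedged illustration, not the tester's actual query: it assumes `sizeof(int)` is 4, models the multiplication as unsigned fixed-width arithmetic, and uses a brute-force search at a reduced 16-bit width as a stand-in for a solver call. The function names are hypothetical.

```python
SIZEOF_INT = 4  # assumed size of int in bytes

def alloc_bytes(len_field, width):
    """Bytes requested by malloc(len_field * sizeof(int)) in width-bit unsigned arithmetic."""
    return (len_field * SIZEOF_INT) % (1 << width)

def find_wrapping_length(width):
    """Brute-force stand-in for a solver query: smallest len_field whose size computation wraps."""
    for len_field in range(1 << width):
        if alloc_bytes(len_field, width) < len_field * SIZEOF_INT:
            return len_field
    return None

# Exhaustive search is only feasible at a small width; 16 bits is used here.
witness16 = find_wrapping_length(16)
print(witness16, alloc_bytes(witness16, 16))

# The same wrap-around at 32 bits, checked directly for one candidate input.
print(alloc_bytes(1 << 30, 32))
```

A solver answers the same kind of question at full 32-bit width without enumerating inputs, and its satisfying assignment becomes the crashing test case.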


How STP Works
Bird’s Eye View: Translate to SAT

[Diagram: Bit-vector & Array Formula, e.g., (x = z+2 OR mem[i] + y <= 01) ..., → STP (Translate To SAT → Boolean SAT Solver) → SAT / UNSAT]

Why Translate to SAT?
• Both theories NP-complete
• Non-SAT approaches didn’t work
• Translation to SAT leverages solid engineering


How STP Works
Rich Theories cause MEM Blow-up

[Diagram: Bit-vector & Array Formula, e.g., (x = z+2 OR mem[i] + y <= 01) ..., → STP (Translate To SAT → Boolean SAT Solver) → SAT / UNSAT]

Formula Growth
• Making information explicit
• Space cost
• Time cost


Explicit Information causes Blow-up
Array Memory Read Problem

• Array Aliasing is implicit
• Need to make information explicit during solving
• Cannot be avoided

Logic Formula derived using symbolic execution:

  data_field, mem_ptr : ARRAY;
  len_field : BITVECTOR(32); //symbolic
  i, j, ptr : BITVECTOR(32); //symbolic
  ..
  mem_ptr[ptr+i] = process_data(data_field[i]);
  mem_ptr[ptr+i+1] = process_data(data_field[i+1]);
  ..
  if(ptr+i = ptr+i+1) then mem_ptr[ptr+i] = mem_ptr[ptr+i+1];

  //INTEGER OVERFLOW QUERY
  0 <= j <= process(len_field);
  ptr + i + j < ptr?


How STP Works
Array-read MEM Blow-up Problem

• Problem: O(n^2) axioms added, where n is the number of read indices
• Lethal if n is large: say, n = 100,000; # of axioms is 10 Billion

Array reads:
  Read(Mem,i0) = expr0
  Read(Mem,i1) = expr1
  Read(Mem,i2) = expr2
  ...
  Read(Mem,in) = exprn

Reads replaced by variables:
  v0 = expr0
  v1 = expr1
  ...
  vn = exprn

Array axioms added (Formula Growth):
  (i0 = i1) => (v0 = v1)
  (i0 = i2) => (v0 = v2)
  ...
  (i1 = i2) => (v1 = v2)
  ...
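The quadratic growth can be checked with a short sketch that generates one axiom per unordered pair of read indices. The function names are illustrative. Counting unordered pairs gives n(n-1)/2 axioms, roughly 5 billion for n = 100,000, which is the same order of magnitude as the n^2 = 10 billion figure quoted on the slide.

```python
from itertools import combinations

def array_read_axioms(n):
    """Yield one axiom string per unordered pair of distinct read indices."""
    for a, b in combinations(range(n), 2):
        yield f"(i{a} = i{b}) => (v{a} = v{b})"

def axiom_count(n):
    """Closed form for the number of unordered index pairs."""
    return n * (n - 1) // 2

for axiom in array_read_axioms(3):
    print(axiom)

print(axiom_count(100_000))
```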


How STP Works
The Array-read Solution

• Key Observation
  • Most indices don’t alias in practice
  • Exploit locality of memory access in typical programs
  • Need only a fraction of array axioms for equivalence

Array reads:
  Read(Mem,i0) = expr0
  Read(Mem,i1) = expr1
  Read(Mem,i2) = expr2
  ...
  Read(Mem,in) = exprn

Reads replaced by variables:
  v0 = expr0
  v1 = expr1
  ...
  vn = exprn

Array axioms needed:
  (i0 = i1) => (v0 = v1)


STP Key Conceptual Contribution
Abstraction-refinement Principle

[Diagram: Input Formula → Abstraction Step → Abstracted Formula → Boolean SAT Solver → Check Answer → Correct Answer, with a Refinement loop from Check Answer back to the solver input]


How STP Works
What to Abstract & How to Refine?

Abstraction:
1. Less essential parts
2. Causes MEM blow-up

Refinement:
1. Guided
2. Must remember

Abstraction manages formula growth hardness

Refinement manages search-space hardness


How STP Works
Abstraction-refinement for Array-reads

[Pipeline: Input → Substitutions → Simplifications → Linear Solving → Array Abstraction → Conversion to SAT → Boolean SAT Solver → Refinement Loop → Result]

1. Input:
   Read(A,i0)=0 Read(A,i1)=1 … Read(A,in)=10,000 Θ(i0,i1)

2. After substitutions, simplifications and linear solving:
   Read(A,i0)=0 Read(A,i1)=1 … Read(A,in)=10,000 Θ’(i0,i1)
   i0 = i1

3. Array abstraction (abstracted input, array axioms dropped):
   v0=0 v1=1 … vn=10,000 Θ’(i0,i1)

4. The SAT solver returns an assignment:
   i0=0, i1=0, v0=0, v1=1
   The input formula is false in this assignment.

5. Refinement: add the axiom that is falsified:
   (i0=i1) => v0=v1

6. Result: UNSAT
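The walk-through can be sketched as a small refinement loop. This is an illustration under stated assumptions, not STP's implementation: indices range over a tiny explicit domain, a brute-force search stands in for the Boolean SAT solver, each read `Read(A, i_k)` must equal the constant `values[k]`, and `theta` is a hypothetical extra constraint on the indices.

```python
from itertools import product

def solve_with_refinement(values, theta, domain):
    """values[k] is the constant that Read(A, i_k) must equal; theta constrains the indices.

    Returns (indices, axioms_added); indices is None when the formula is UNSAT.
    """
    n = len(values)
    axioms = set()  # each pair (a, b) stands for the axiom (i_a = i_b) => (v_a = v_b)

    def abstract_model():
        # Brute-force stand-in for the SAT solver on the abstracted formula:
        # only theta and the axioms added so far are enforced.
        for idx in product(domain, repeat=n):
            if theta(idx) and all(idx[a] != idx[b] or values[a] == values[b]
                                  for a, b in axioms):
                return idx
        return None

    while True:
        idx = abstract_model()
        if idx is None:
            return None, axioms          # abstraction is UNSAT, so the input is UNSAT
        # Check the candidate against the original formula: find falsified axioms.
        violated = [(a, b) for a in range(n) for b in range(a + 1, n)
                    if idx[a] == idx[b] and values[a] != values[b]]
        if not violated:
            return idx, axioms           # candidate is a genuine solution
        axioms.add(violated[0])          # refine with one falsified axiom and retry

# Three reads with distinct values, and a constraint forcing i0 = i1.
result, added = solve_with_refinement([0, 1, 2], lambda i: i[0] == i[1], range(3))
print(result, added)
```

In this run a single axiom, the one relating i0 and i1, is enough to establish UNSAT, so the other pairwise axioms are never generated.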


STP vs. Other Solvers

Testcase (Formula Size) | Result | Z3 (sec) | Yices (sec) | STP (sec)
610dd9c (~15K) | SAT | TimeOut | MemOut | 37
Grep65 (~60K) | UNSAT | 0.3 | TimeOut | 4
Grep84 (~69K) | SAT | 176 | TimeOut | 18
Grep106 (~69K) | SAT | 130 | TimeOut | 227
Blaster4 (~262K) | UNSAT | MemOut | MemOut | 10
Testcase20 (~1.2M) | SAT | MemOut | MemOut | 56
Testcase21 (~1.2M) | SAT | MemOut | MemOut | 43

* All experiments on 3.2 GHz, 512 Kb cache
* MemOut: 3.2 GB (Memory used by STP much smaller), TimeOut: 1800 seconds
* Examples obtained from Dawn Song at Berkeley, David Molnar at Berkeley and Dawson Engler at Stanford
* Experiments conducted in 2007


STP vs. Other Leading Solvers

[Bar chart: Time in Seconds (axis from 0 to 4000) for Boolector, MSAT and STP]

STP vs. Boolector & MathSAT on 615 SMTCOMP 2007 - 2010 examples

* All experiments on 2.4 GHz, 1 GB RAM
* Timeout: 500 seconds/example


Impact of STP

• Enabled existing SE technologies to scale
  • Bounded model checkers, e.g., Chang and Dill

• Easier to engineer SE technologies
  • Formal tools (ACL2+STP) for verifying Crypto, Smith & Dill

• Enabled new SE technologies
  • Concolic testing (EXE, Klee, ...) by Engler et al., Binary Analysis by Song et al.


Impact of STP: Notable Projects

Category | Research Project | Project Leader/Institution
Formal Methods | ACL2 Theorem Prover + STP | Eric Smith & David Dill/Stanford
Formal Methods | Verification-aware Design Checker | Jacob Chang & David Dill/Stanford
Formal Methods | Java PathFinder Model Checker | Mehlitz & Pasareanu/NASA
Program Analysis | BitBlaze & WebBlaze | Dawn Song et al./Berkeley
Program Analysis | BAP | David Brumley/CMU
Automatic Testing, Security | Klee, EXE | Engler & Cadar/Stanford
Automatic Testing, Security | SmartFuzz | Molnar & Wagner/Berkeley
Automatic Testing, Security | Kudzu | Saxena & Song/Berkeley
Hardware Bounded Model-checking (BMC) | Blue-spec BMC | Katelman & Dave/MIT
Hardware Bounded Model-checking (BMC) | BMC | Haimed/NVIDIA

• Enabled Concolic Testing
• 100+ reliability and security projects


Impact of STP
http://www.metafuzz.com

Program Name | Lines of Code | Number of Bugs Found | Team
Mplayer | ~900,000 | Hundreds | David Molnar/Berkeley & Microsoft Research
Evince | ~90,000 | Hundreds | David Molnar/Berkeley & Microsoft Research
Unix Utilities | 1000s | Dozens | Dawson Engler et al./Stanford
Crypto Hash Implementations | 1000s | Verified | Eric Smith & David Dill/Stanford


Rest of the Talk

• STP Bit-vector and Array Solver
  • Why Bit-vectors and Arrays?
  • How does STP scale: Abstraction-refinement
  • Impact: Concolic testing
  • Experimental Results

• HAMPI String Solver
  • Why Strings?
  • How does HAMPI scale: Bounding
  • Impact: String-based program analysis
  • Experimental Results

• Future Work
  • Multicore SAT
  • SAT-based Languages


HAMPI String Solver

[Diagram: String Expressions → HAMPI Solver → SAT / UNSAT]

• X = concat(“SELECT...”,v) AND (X ∈ SQL_grammar)
• JavaScript and PHP Expressions
• Web applications, SQL queries
• NP-complete


What is the theory of Strings?

PHP/JavaScript/C++... | Theory of Strings
Var a; $a = 'name' | Var a : 12; //String variable of bounded-size; a = 'name'
a." is " | Concat(a, " is ");
substr(a,1,3) | sub-string extraction
assignments/strcmp: a = string_expr; | equality: a = string_expr;
Sanity check using regular expression RE | a in RE
Expression in a suitable Language (e.g., SQL) | a in SQL

• Capture String Expressions in PHP, JavaScript, Perl, C/C++/Java

• Support symbolic execution/program analysis


Hampi Use-case
String Operations in PHP, JavaScript, ...

[Diagram: String Program and Specification → Program Reasoning Tool → Logic Formulas → HAMPI → SAT/UNSAT → Program is Correct? or Generate Tests]


Hampi Use-case
SQL Injection Vulnerabilities

[Diagram: Malicious SQL Query → Buggy PHP/JavaScript → Backend DataBase → Unauthorized Database Results]


Hampi Use-case
SQL Injection Vulnerabilities

Buggy Script:

  if (input in regexp("[0-9]+"))
    query := "SELECT m FROM messages WHERE id='" + input + "'"

• input passes validation (regular expression check)

• query is syntactically-valid SQL

• query can potentially contain an attack substring (e.g., 1' OR '1' = '1)

The regular expression should be: "^[0-9]+$"
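The difference between the unanchored and the anchored check can be reproduced with Python's `re` module. This is an illustrative sketch: the function names are hypothetical, and `re.search` is used to model the script's substring-style validation.

```python
import re

ATTACK = "1' OR '1'='1"

def weak_check(s):
    """Unanchored check: succeeds if any substring of s is a run of digits."""
    return re.search(r"[0-9]+", s) is not None

def strict_check(s):
    """Anchored check from the slide: the whole input must be digits."""
    return re.search(r"^[0-9]+$", s) is not None

def build_query(s):
    """String concatenation as performed by the buggy script."""
    return "SELECT m FROM messages WHERE id='" + s + "'"

print(weak_check(ATTACK), strict_check(ATTACK))
print(build_query(ATTACK))
```

In Python specifically, `$` also matches just before a trailing newline, so `re.fullmatch(r"[0-9]+", s)` is the stricter choice when porting this check.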


Hampi Use-case
SQL Injection Vulnerabilities

Hampi finds an attack input: v ≔ 1' OR '1'='1
SELECT msg FROM messages WHERE topicid='1' OR '1'='1'

Input String:

  Var v : 12;

SQL Grammar:

  cfg SqlSmall := "SELECT " [a-z]+ " FROM " [a-z]+ " WHERE " Cond;
  cfg Cond := Val "=" Val | Cond " OR " Cond;
  cfg Val := [a-z]+ | "'" [a-z0-9]* "'" | [0-9]+;

SQL Query:

  val q := concat("SELECT msg FROM messages WHERE topicid='", v, "'");

SQLI attack conditions:

  assert q in SqlSmall;            // "q is a valid SQL query"
  assert q contains "OR '1'='1'";  // "q contains an attack vector"

  assert v in [0-9]+;


Hampi Key Contribution: Bounded Logics
Testing, Vulnerability Detection, ...

• Finding satisfying assignment is key

• Short assignments are sufficient

• Hence, bounding strings is sufficient

• Bounded logics are easier to decide


Hampi Key Conceptual Contribution
Bounding, expressiveness and efficiency

Li | Complexity of ∅ = L1 ∩ ... ∩ Ln | Current Solvers
Context-free | Undecidable | n/a
Regular | PSPACE-complete | Quantified Boolean Logic
Bounded | NP-complete | SAT (Efficient in practice)


How Hampi Works
Bird’s Eye View: Strings into Bit-vectors

Find a 4-char string v:
• (v) is in E
• (v) contains ()()

Hampi input:

  var v : 4;
  cfg E := "()" | E E | "(" E ")";
  val q := concat("(", v, ")");
  assert q in E;
  assert q contains "()()";

[Diagram: Hampi input → Normalizer → STP Encoder → Bit-vector Constraints → STP → Bit-vector Solution → STP Decoder → String Solution]

String Solution: v = )()(


How Hampi Works
Unroll Bounded CFGs into Regular Exp.

Bound(E,6): ([()() + (())]) + ()[()() + (())] + [()() + (())]()

Bound Auto-derived
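The bounded language can be checked by enumeration. The sketch below is an illustration rather than Hampi's unrolling algorithm: it assumes that the grammar E generates exactly the non-empty balanced parenthesis strings, lists all members of length 6, and then searches for every 4-character v satisfying both assertions. The function names are hypothetical.

```python
from itertools import product

def in_E(s):
    """Membership in E := "()" | E E | "(" E ")", i.e. non-empty balanced parentheses."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0 and len(s) > 0

def bound(length):
    """All strings of exactly the given length that belong to E."""
    candidates = ("".join(p) for p in product("()", repeat=length))
    return sorted(s for s in candidates if in_E(s))

def find_strings(var_len, needle):
    """All v of length var_len such that "(" + v + ")" is in E and contains needle."""
    result = []
    for chars in product("()", repeat=var_len):
        q = "(" + "".join(chars) + ")"
        if in_E(q) and needle in q:
            result.append("".join(chars))
    return result

print(bound(6))
print(find_strings(4, "()()"))
```

The five strings returned by `bound(6)` are the same ones denoted by the regular expression on the slide, and the slide's solution v = )()( is among the strings returned by `find_strings`.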


How Hampi Works
Converting Regular Exp. into Bit-vectors

( v ) ∈ ()[()() + (())] + [()() + (())]() + ([()() + (())])

Formula Φ1 ∨ Formula Φ2 ∨ Formula Φ3

Encode regular expressions recursively:
• Alphabet { (, ) } → 0, 1
• constant → bit-vector constant
• union + → disjunction ∨
• concatenation → conjunction ∧
• Kleene star * → conjunction ∧
• Membership, equality → equality

B[0]=0 ∧ B[1]=1 ∧ { B[2]=0 ∧ B[3]=1 ∧ B[4]=0 ∧ B[5]=1 ∨ …
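
The recursive encoding can be sketched in Python for this example, using one bit per character ('(' as 0, ')' as 1). Each helper returns a predicate over the 6-bit vector B. The helper names and the brute-force search that stands in for STP are assumptions for illustration only, not Hampi's implementation.

```python
from itertools import product

ENC = {"(": 0, ")": 1}
DEC = {0: "(", 1: ")"}


def const(s, off):
    """Constant string at a fixed offset -> conjunction of bit equalities."""
    return lambda B: all(B[off + i] == ENC[c] for i, c in enumerate(s))


def union(*fs):
    """Regex union -> disjunction."""
    return lambda B: any(f(B) for f in fs)


def concat(*fs):
    """Concatenation of fixed-position parts -> conjunction."""
    return lambda B: all(f(B) for f in fs)


def four(off):
    """The sub-expression [()() + (())] placed at offset off."""
    return union(const("()()", off), const("(())", off))


phi1 = concat(const("()", 0), four(2))                    # ()[()() + (())]
phi2 = concat(four(0), const("()", 4))                    # [()() + (())]()
phi3 = concat(const("(", 0), four(1), const(")", 5))      # ([()() + (())])
in_e = union(phi1, phi2, phi3)

frame = concat(const("(", 0), const(")", 5))              # q = "(" v ")"
contains = union(*(const("()()", off) for off in range(3)))


def solve():
    """Try every 6-bit vector; decode bits 1..4 back into the string v."""
    return ["".join(DEC[b] for b in B[1:5])
            for B in product((0, 1), repeat=6)
            if frame(B) and in_e(B) and contains(B)]


if __name__ == "__main__":
    print(solve())
```

Calling solve() returns every 4-character v that satisfies both assertions, including the v = )()( reported as the string solution.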

How Hampi Works
Decoder converts Bit-vectors to Strings

Bit-vector Solution → STP Decoder → String Solution: v = )()(

HAMPI: Result 1
Static SQL Injection Analysis

[Figure: Time To Solve (sec), axis ticks 0.01 to 1000, vs. Grammar Size (# of productions), axis ticks 1 to 100000]

• 1367 string constraints from Wasserman & Su [PLDI'07]
• Hampi scales to large grammars
• Hampi solved 99.7% of constraints in < 1 sec
• All solvable constraints had short solutions

HAMPI: Result 2
Security Testing

• Hampi used to build Ardilla security tester [Kiezun et al., ICSE'09]
• 60 new vulnerabilities on 5 PHP applications (300+ kLOC)
  • 23 SQL injection
  • 37 cross-site scripting (XSS)
  • 5 added to US National Vulnerability DB
• 46% of constraints solved in < 1 second per constraint
• 100% of constraints solved in < 10 seconds per constraint

HAMPI: Result 3
Comparison with Competing Tools

[Figure: average time (sec.), axis 0 to 25, vs. string size (characters), axis 0 to 50, for Hampi and CFGAnalyzer]

• HAMPI vs. CFGAnalyzer (U. Munich): HAMPI ~7x faster for strings of size 50+
• HAMPI vs. Rex (Microsoft Research): HAMPI ~100x faster for strings of size 100+
• HAMPI vs. DPRLE (U. Virginia): HAMPI ~1000x faster for strings of size 100+

Impact of Hampi: Notable Projects

Category | Research Project | Project Leader/Institution
Static Analysis | SQL-injection vulnerabilities | Wasserman & Su/UC Davis
Security Testing | Ardilla for PHP (SQL injections, cross-site scripting) | Kiezun & Ernst/MIT
Concolic Testing | Klee | Engler & Cadar/Stanford
Concolic Testing | SAGE | Godefroid/Microsoft Research
Concolic Testing | Kudzu | Saxena & Song/Berkeley
Concolic Testing | NoTamper | Bisht & Venkatakrishnan/U Chicago
New Solvers | Kaluza | Saxena & Song/Berkeley

Rest of the Talk

• STP Bit-vector and Array Solver
  • Why Bit-vectors and Arrays?
  • How does STP scale: Abstraction-refinement
  • Impact: Concolic testing
  • Experimental Results
• HAMPI String Solver
  • Why Strings?
  • How does HAMPI scale: Bounding
  • Impact: String-based program analysis
  • Experimental Results
• Future Work
  • Multicore SAT
  • SAT-based Languages

Current Parallel SAT Approaches
Won't Scale with more Nodes

• Portfolio or search-space split approach (ManySAT, pLingeling,...)
• Works ok on clusters
• Confirmed through experimentation:
  • 12x speedup on a 128 node cluster
  • Not close to linear speedup

[Figure: nodes labeled Heuristic 1 through Heuristic 5]

PSAT: Parallel SAT Approach
Partition SAT-Input into k Pieces

• Didn't work on clusters; much better prospects with multicore
• Latency much better on multicore than cluster
• Software engineering instances partition well
• Heuristics to minimize communication overhead
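
The slide does not say how PSAT splits its input. As one hypothetical illustration of why some instances partition well, the Python sketch below groups CNF clauses into pieces that share no variables, so each piece could in principle be solved on its own core with no communication. The function partition_cnf and the DIMACS-style integer literals are assumptions for this example; this is not PSAT's partitioning heuristic.

```python
from collections import defaultdict


def partition_cnf(clauses):
    """Group non-empty clauses into pieces that share no variables.

    Clauses are lists of non-zero ints (DIMACS style: -3 means "not x3").
    Hypothetical helper for illustration only.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Variables that occur in the same clause must land in the same piece.
    for clause in clauses:
        first = abs(clause[0])
        for lit in clause[1:]:
            parent[find(abs(lit))] = find(first)

    pieces = defaultdict(list)
    for clause in clauses:
        pieces[find(abs(clause[0]))].append(clause)
    return list(pieces.values())


if __name__ == "__main__":
    cnf = [[1, -2], [2, 3], [4, -5], [5]]
    for piece in partition_cnf(cnf):
        print(piece)
```

Real instances rarely split into fully independent pieces, which is where heuristics for minimizing communication between pieces would come in.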

Imperative Language With SAT-based Declarative Primitives

• Motivation:
  • Declarative can be more robust
  • Delegating the “how” to runtime
• Combine imperative and SAT-based declarative language
  • Efficient solvers evaluate and search
  • Solvers leverage multicores
• Examples
  • Squander by Milicevic, Rayside and Daniel Jackson (MIT)

Related Work

• Model Checking:
  • Abstraction-refinement (Ed Clarke et al.)
  • Bounding (Ed Clarke, Daniel Jackson et al.)
• Other SMT solvers
  • Unsat core based approximations (Randy Bryant et al.)
  • Z3, CVC3, Boolector, BAT, ...
• DPLL(T)
  • Tinelli, Nieuwenhuis and Oliveras

Conclusions

• Logic formulas can capture meta-properties of software
  • The right logical abstraction (bit-vector and arrays, strings,...)
• Exploit meta-properties in solving formulas efficiently
  • Locality, modularity,...
• The more SMT solving, the less program analysis
  • Automation, ease-of-use,...

Questions?

Contributions at a Glance
• STP* & HAMPI* (CAV 2007, TISSEC 2008, ISSTA 2009)
• Decidability/Undecidability results for strings (under submission)
• BuzzFuzz: Directed Whitebox Fuzzing (ICSE 2009)
• Concolic testers (JFuzz: NFM 2009)
• Solvers for integer linear arithmetic (FMCAD 2002, TACAS 2003)
• Retargetable compilers (DATE 1999)

Future Work
• Parallel SAT
• SAT-based programming languages
• Program hardening
• Solvers for rich theories (attribute grammars, floating-point)
• Auto-tuning SAT solvers
• Advice-based SAT solvers
• Unsound and incomplete solvers
• Solver-based concurrency bug-finding

* 100+ research projects use STP and HAMPI (NSF funding $600,000.00)
* STP won the SMTCOMP 2006 and 2010 competitions for bit-vector solvers
* HAMPI paper won ACM Best Paper Award 2009
