Open Source Model Checking Radu Grosu SUNY at Stony Brook

Post on 03-Jan-2016

Joint work with

X. Huang, S. Jain and S. A. Smolka

GCC Compiler

• Early stages: a modest C compiler.

- Translation: source code translated directly to RTL.

- Optimization: at low RTL level.

- High-level information lost: calls, structures, fields, etc.

• Nowadays: a full-blown, multi-language compiler generating code for more than 30 architectures.

- Input: C, C++, Objective-C, Fortran, Java and Ada.

- Tree-SSA: added GENERIC, GIMPLE and SSA ILs.

- Optimization: at GENERIC, GIMPLE, SSA and RTL levels.

- Verification: Tree-SSA API suitable for verification, too.

GCC Compilation Process

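[Figure: GCC compilation pipeline. C, C++ and Java files are read by the C, C++ and Java parsers into parse trees; Genericize produces the GENERIC AST; Gimplify produces the GIMPLE AST; Build CFG produces the SSA/GIMPLE CFG; the rest of the compiler produces RTL code; code generation produces object code.]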

C Program and its GIMPLE IL

C program:

    int main() {
      int a, b, c;
      a = 5;
      b = a + 10;
      c = a + foo(a, b);
      if (a > c)
        c = b++ / a + b * a;
      bar(a, b, c);
    }

GIMPLE IL:

    int main() {
      int a, b, c;
      int T1, T2, T3, T4;
      a = 5;
      b = a + 10;
      T1 = foo(a, b);
      T2 = a + T1;
      if (a <= T2) goto fi;   /* skip the body when the source condition a > c is false */
      T3 = b / a;
      T4 = b * a;
      c = T3 + T4;
      b = b + 1;
    fi:
      bar(a, b, c);
    }
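The lowering of a nested expression into GIMPLE-style three-address statements can be imitated in a few lines. The following is only a sketch, not GCC's gimplifier: it borrows Python's `ast` module as a stand-in parser, and the function name `lower` and the temporary names `T1`, `T2`, ... are illustrative assumptions.

```python
import ast

def lower(stmt):
    """Lower one assignment with a nested right-hand side into three-address statements."""
    tree = ast.parse(stmt).body[0]
    assert isinstance(tree, ast.Assign) and len(tree.targets) == 1
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    out, counter = [], [0]

    def fresh():
        # Hypothetical naming scheme for compiler temporaries.
        counter[0] += 1
        return f"T{counter[0]}"

    def atom(node):
        """Return a variable or constant holding node's value, emitting a temporary if needed."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        rhs = compound(node)
        temp = fresh()
        out.append(f"{temp} = {rhs}")
        return temp

    def compound(node):
        """Return a right-hand side with at most one operator or call over atoms."""
        if isinstance(node, ast.BinOp):
            left, right = atom(node.left), atom(node.right)
            return f"{left} {ops[type(node.op)]} {right}"
        if isinstance(node, ast.Call):
            args = ", ".join(atom(a) for a in node.args)
            return f"{node.func.id}({args})"
        raise ValueError("unsupported expression")

    target = tree.targets[0].id
    value = tree.value
    if isinstance(value, (ast.Name, ast.Constant)):
        out.append(f"{target} = {atom(value)}")
    else:
        out.append(f"{target} = {compound(value)}")
    return out

for line in lower("c = a + foo(a, b)"):
    print(line)
```

Each emitted statement has at most one operator or call on its right-hand side, which is the property that makes the later CFG and SSA passes simple to write.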

Associated GIMPLE CFG

[Figure: CFG of main, with each GIMPLE statement drawn as an expression tree under the FUNCTION DECL node. Declarations: int a, b, c, T1, T2, T3, T4.]

Entry

A: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; if (a > T2) goto B;

B (true branch): T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;

C (false branch): bar(a,b,c); return;

Exit

GCC Model Checking (GMC)

• GMC: a suite of analysis and verification tools we are developing for the Tree-SSA level of GCC. Currently:

– Intra-procedural slicer: inter-procedural slicing is in progress.

– Symbolic execution engine: for Boolean C programs.

– Interpreter: traverses the CFG using Tree-SSA iterators.

– Monte Carlo MC (GMC2): one-sided error (OSE) randomized algorithm for LTL MC.

• GMC2: a newly developed technique that uses the theory of geometric random variables, statistical hypothesis testing and random sampling of lassos.

LTL MC Finding Accepting Lassos

• Explore all lassos in the computation tree (CT).

• DDFS, SCC: time efficient. DFS: memory efficient.

[Figure: LTL computation tree (CT) of lassos, annotated with the recurrence diameter.]

Randomized Algorithms

• The next step taken by the algorithm may depend on a random choice (coin flip).

– Benefits: simplicity, efficiency, and symmetry breaking.

• Monte Carlo: may produce an incorrect result, but with bounded error probability.

– Example: election result prediction.

• Las Vegas: always gives the correct result, but its running time is a random variable.

– Example: randomized Quicksort.

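Monte Carlo Approach

• Explore $N(\epsilon, \delta)$ independent lassos in the CT, where $\epsilon$ is the error margin and $\delta$ the confidence ratio.

• Each lasso is sampled by a random walk: at a state with k successors, flip a k-sided coin.

[Figure: LTL computation tree (CT) of lassos, annotated with the recurrence diameter.]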

Bernoulli Random Variable Z (coin flip)

[Figure: example automaton with states 1 to 4 and its lassos, labeled with probabilities ½, ¼ and ⅛.]

• Probability mass function:

– $p(0) = P[Z=0] = q_Z = 7/8$

– $p(1) = P[Z=1] = p_Z = 1/8$

Geometric Random Variable

• Value of geometric RV $X$ with parameter $p_Z$:

– Number of independent lassos until success.

• Probability mass function:

– $p(N) = P[X = N] = q_Z^{N-1}\, p_Z$

• Cumulative distribution function:

– $F(N) = P[X \le N] = \sum_{i \le N} p(i) = 1 - q_Z^{N} = 1 - (1 - p_Z)^{N}$
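The closed form of the cumulative distribution can be checked numerically against the sum of the mass function. The sketch below uses exact rationals; the function names are illustrative, and the value 1/8 is the success probability from the Bernoulli slide.

```python
from fractions import Fraction

def geometric_pmf(n, p):
    """P[X = n]: the first success occurs on trial n."""
    return (1 - p) ** (n - 1) * p

def geometric_cdf(n, p):
    """P[X <= n]: at least one success within the first n trials."""
    return 1 - (1 - p) ** n

p_z = Fraction(1, 8)
for n in (1, 5, 10):
    total = sum(geometric_pmf(i, p_z) for i in range(1, n + 1))
    print(n, total == geometric_cdf(n, p_z))
```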

How Many Lassos?

• Requiring $1 - (1 - p_Z)^N = 1 - \delta$ yields:

$N = \ln(\delta) / \ln(1 - p_Z)$

• Lower bound on the number of trials $N$ needed to achieve success with confidence ratio $\delta$, that is, with probability at least $1 - \delta$.

What If pz Unknown?

• Requiring $p_Z \ge \epsilon$ yields:

$M = \ln(\delta) / \ln(1 - \epsilon) \ge N = \ln(\delta) / \ln(1 - p_Z)$

and therefore $P[X \le M] \ge 1 - \delta$.

• Lower bound on the number of trials $M$ needed to achieve success with confidence ratio $\delta$ and error margin $\epsilon$.
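To get a feel for the sample sizes involved, the bound can be evaluated directly. This is a minimal sketch: the function name `num_samples` is an assumption, and the result is rounded up to an integer, which the slides leave implicit.

```python
import math

def num_samples(delta, p):
    """Smallest integer n with (1 - p)**n <= delta, i.e. ln(delta) / ln(1 - p) rounded up."""
    return math.ceil(math.log(delta) / math.log(1.0 - p))

delta, eps = 0.1, 0.1
m = num_samples(delta, eps)   # bound computed from the error margin
n = num_samples(delta, 0.2)   # bound if the true p_Z were known to be 0.2
print(m, n)
```

Because the true success probability is at least the error margin, the bound computed from the margin is never smaller than the one computed from the true probability.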

Statistical Hypothesis Testing

• Null hypothesis $H_0$: $p_Z \ge \epsilon$

• Alternative hypothesis $H_1$: $p_Z < \epsilon$

• If no success after $M$ trials, then reject $H_0$.

• Type I error: $\alpha = P[X > M \mid H_0] < \delta$

• Since: $P[X \le M \mid H_0] \ge 1 - \delta$
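The bound on the type I error follows in one line from the geometric distribution. As a worked step, assuming the number of trials satisfies $M \ge \ln(\delta)/\ln(1-\epsilon)$ (in practice the quotient rounded up to an integer):

```latex
\alpha \;=\; P[X > M \mid H_0] \;=\; (1 - p_Z)^{M} \;\le\; (1 - \epsilon)^{M} \;\le\; \delta
\qquad \text{since } p_Z \ge \epsilon \text{ under } H_0 .
```

The inequality is strict whenever $p_Z > \epsilon$ or $M$ exceeds $\ln(\delta)/\ln(1-\epsilon)$.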

Monte Carlo Model Checking (MC2)

input: B = (Σ, Q, Q0, δ, F), ε, δ

    N = ln(δ) / ln(1 - ε)
    for (i = 1; i ≤ N; i++)
        if (RL(B) == 1) return (1, error-trace);
    return (0, "reject H0 with α = Pr[X > N | H0] < δ");

where RL(B) performs a uniform random walk through B to obtain a random lasso. (In the tuple defining B, δ is the transition relation; everywhere else δ is the confidence ratio.)
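The sampling loop itself is independent of how a lasso is obtained. The sketch below takes the lasso sampler as a parameter; `sample_lasso` is a hypothetical callable returning a pair (is_accepting, lasso), and the biased coin used in the demonstration is only a stand-in for RL(B).

```python
import math
import random

def mc2(sample_lasso, eps, delta):
    """Draw up to N lassos; stop at the first accepting one (a counterexample)."""
    n = math.ceil(math.log(delta) / math.log(1.0 - eps))
    for i in range(1, n + 1):
        accepting, lasso = sample_lasso()
        if accepting:
            return 1, lasso, i       # error trace found on sample i
    return 0, None, n                # reject H0 after n unsuccessful samples

# Stand-in sampler (an assumption): each lasso is accepting with probability 0.3.
rng = random.Random(0)
def coin():
    return rng.random() < 0.3, "some lasso"

print(mc2(coin, eps=0.1, delta=0.1))
```

Returning the sample index alongside the verdict makes it easy to compare the number of samples actually used with the worst-case bound N.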

GCC MC2 (GMC2)

• Input: a set of CFGs.

– Main function: a specifically designated CFG.

• Random walks in the Büchi automaton: generated on-the-fly.

– Initial state: of the main routine + bookkeeping information.

– Next state: choose process + call interpreter on its CFG.

– Processes: created by using the fork primitive.

– Optimization: interpreter returns only upon context switch.

• Lassos: detected by using a hierarchic hash table.

– Local variables: removed upon return from a procedure.

Program State

[Figure: structure of a program state, built up over two slides.]

• Program state: shared variables valuation (channels & semaphores) and a list of process states p1, p2, p3, …

• Process state: control state (CFG name, statement #) and data state (heap, global variables valuation, frame stack f1, f2, …).

• Frame: return control state and local variables valuation.
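One way to picture the state layout is as nested immutable records, so that a whole program state can serve as a hash-table key when detecting lassos. The sketch below is an assumption-laden illustration: all class and field names are hypothetical, the heap is omitted, and a flat Python dictionary stands in for the hierarchic hash table, whose actual design is not described here.

```python
from dataclasses import dataclass
from typing import Tuple

Valuation = Tuple[Tuple[str, int], ...]    # (variable, value) pairs
ControlState = Tuple[str, int]             # (CFG name, statement number)

@dataclass(frozen=True)
class Frame:
    return_control: ControlState
    local_vars: Valuation

@dataclass(frozen=True)
class ProcessState:
    control: ControlState
    global_vars: Valuation
    frames: Tuple[Frame, ...]

@dataclass(frozen=True)
class ProgramState:
    shared_vars: Valuation
    processes: Tuple[ProcessState, ...]

def make_state(a):
    """Build a one-process state whose only local variable a has the given value."""
    frame = Frame(return_control=("main", 0), local_vars=(("a", a),))
    proc = ProcessState(control=("main", 3), global_vars=(), frames=(frame,))
    return ProgramState(shared_vars=(("sem", 1),), processes=(proc,))

visited = {make_state(5): 1}               # state -> position on the random walk
print(make_state(5) in visited, make_state(6) in visited)
```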

Interpreter

• Interprets GIMPLE statements: according to their semantics. Interesting:

– Inter-procedural: call(), return(). Manipulate the frame stack.

• Catches and interprets: function calls to various modeling and concurrency primitives:

– Modeling: toss(), assert(). Nondeterminism and checks.

– Processes: fork(), … Manipulate the process list.

– Communication: send(), recv(). Manipulate shared vars. May involve a context switch.

Results: TCAS (GMC2)

    property                     rule  bugs  time  samples
    Safe Advisory Selection      1     no    0.23  1278
    Safe Advisory Selection      2     yes   0.03  147
    Best Advisory Selection      1     no    0.23  1278
    Best Advisory Selection      2     yes   0.04  206
    Avoid unnecessary Crossing   1     yes   0.01  36
    Avoid unnecessary Crossing   2     yes   0.03  180
    No. Crossing Adv. Selection  1     yes   0.01  27
    No. Crossing Adv. Selection  2     yes   0.01  8
    Optimal Advisory Selection   1     no    0.23  1278
    Optimal Advisory Selection   2     yes   0.06  217

DPh: Symmetric Fair Version (Deadlock freedom)

          GMC2                        Verisoft
    ph    time     samples  ce.len    time     states  trans
    4     0:00.07  2        12        0:00.61  16      37
    6     0:00.11  4        12        0:16.60  773     1171
    8     0:00.78  11       20        2:57.29  5431    8449
    10    0:02.17  31       24        10:41    17908   31433
    12    0:04.82  24       27        >2hr     N/A     N/A
    14    0:06.22  22       44        >2hr     N/A     N/A
    16    0:11.56  14       32        >2hr     N/A     N/A

Needham-Schroeder Protocol

    GMC2                  Verisoft       Genetic
    time    samples       time  states   time    errors
    6h 37'  10,682,639    >8h   N/A      2h 33'  3

• Quite sophisticated C implementation.

• However, of a sequential nature:

– Essentially executes only one round of a reactive system.

Related Work

• Software model checkers for concurrent C/C++:

– VeriSoft, Spin, Blast (Slam), Magic, C-Wolf. Bogor?

• Cooperative Bug Isolation [Liblit, Naik & Zheng]:

– Compile-time instrumentation. Distribute binaries/collect bugs.

– Statistical analysis to isolate erroneous code segments.

• Random interpretation [Gulwani & Necula]:

– Execute random paths and merge with random linear operators.

• Monte Carlo and abstract interpretation [Monniaux]:

– Analyze programs with probabilistic and nondeterministic input.

Conclusions

• Presented GMC2: a software MC for GCC based on Monte Carlo MC:

– At Tree-SSA level: applicable to C, C++, Ada, Java, etc.

– Open source: freely available for usage/critique/extension.

• Ongoing and Future Work: Create a software MC branch of GCC, which also includes:

– Automated abstraction/refinement/interpolation techniques.

– Currently we manually apply a form of bounded-range abstraction (e.g. in TCAS).

Talk Outline

1. Model Checking

2. Randomized Algorithms

3. LTL Model Checking

4. Probability Theory Primer

5. Monte Carlo Model Checking

6. Implementation & Results

7. Conclusions & Open Problem

Linear Temporal Logic

• LTL formula: made up inductively of

– atomic propositions $p$, boolean connectives $\neg$, $\wedge$, $\vee$

– temporal modalities $\mathbf{X}$ (neXt) and $\mathbf{U}$ (Until).

• Safety: “nothing bad ever happens”

E.g. $\mathbf{G}(\neg(pc_1 = cs \wedge pc_2 = cs))$ where $\mathbf{G}$ is a derived modality (Globally).

• Liveness: “something good eventually happens”

E.g. $\mathbf{G}(req \rightarrow \mathbf{F}\, serviced)$ where $\mathbf{F}$ is a derived modality (Finally).
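Both example properties can be evaluated on a lasso, viewed as a finite prefix followed by a cycle that repeats forever. The sketch below handles only these two formula shapes rather than general LTL; the function names and the dictionary/set encodings of states are assumptions made for illustration.

```python
def always(pred, prefix, cycle):
    """G pred on the infinite word prefix . cycle^omega (cycle must be non-empty)."""
    return all(pred(s) for s in prefix + cycle)

def always_responds(req, srv, prefix, cycle):
    """G(req -> F srv) on the infinite word prefix . cycle^omega."""
    if any(srv(s) for s in cycle):
        return True      # service recurs forever, so every request is answered
    if any(req(s) for s in cycle):
        return False     # a request inside the cycle is never answered
    # Requests occur only in the prefix and must be answered later in the prefix.
    return all(any(srv(t) for t in prefix[i:])
               for i, s in enumerate(prefix) if req(s))

# Safety: the two processes are never both in the critical section.
mutex = lambda s: not (s["pc1"] == "cs" and s["pc2"] == "cs")
prefix = [{"pc1": "idle", "pc2": "idle"}]
cycle = [{"pc1": "cs", "pc2": "idle"}, {"pc1": "idle", "pc2": "cs"}]
print(always(mutex, prefix, cycle))

# Liveness: a request in the prefix that is never serviced violates the property.
req = lambda s: "req" in s
srv = lambda s: "serviced" in s
print(always_responds(req, srv, [{"req"}], [set()]))
```

A lasso violating the safety property needs only a bad state anywhere on it, whereas a violation of the liveness property depends on what the cycle contains, which is why accepting cycles matter for LTL model checking.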

Model Checking

• $S$ is a nondeterministic/concurrent system.

• $\varphi$ is a temporal logic formula.

– in our case Linear Temporal Logic (LTL).

• Basic idea: intelligently explore $S$’s state space in an attempt to establish $S \models \varphi$.

LTL Model Checking

• Every LTL formula $\varphi$ can be translated to a Büchi automaton $B_\varphi$ such that $L(\varphi) = L(B_\varphi)$.

• Automata-theoretic approach:

$S \models \varphi$ iff $L(B_S) \subseteq L(B_\varphi)$ iff $L(B_S \times B_{\neg\varphi}) = \emptyset$

• Checking non-emptiness is equivalent to finding a reachable accepting cycle (lasso).

Emptiness Checking

• Checking non-emptiness is equivalent to finding an accepting cycle reachable from initial state (lasso).

• Double Depth-First Search (DDFS) algorithm can be used to search for such cycles, and this can be done on-the-fly!

[Figure: a sequence of states s1, s2, s3, …, sk-2, sk-1, sk, sk+1, sk+2, sk+3, …, sn, with the two searches labeled DFS1 and DFS2.]
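For comparison with the randomized approach, the double depth-first search can be sketched as the textbook nested DFS. This is a recursive illustration for small explicit graphs, not the implementation used in any particular tool; the function name and the `succ`/`accepting` callables are assumptions.

```python
def nested_dfs(init, succ, accepting):
    """Return True iff an accepting cycle is reachable from init."""
    visited1, visited2 = set(), set()

    def dfs2(s, seed):
        # Second search: look for a path from the accepting state seed back to itself.
        visited2.add(s)
        for t in succ(s):
            if t == seed:
                return True
            if t not in visited2 and dfs2(t, seed):
                return True
        return False

    def dfs1(s):
        # First search: explore successors, then start the second search in post-order.
        visited1.add(s)
        for t in succ(s):
            if t not in visited1 and dfs1(t):
                return True
        return accepting(s) and dfs2(s, s)

    return dfs1(init)

graph = {0: [1], 1: [2], 2: [1]}
print(nested_dfs(0, lambda s: graph[s], lambda s: s == 2))
```

Both visited sets can grow to the size of the reachable state space, which is the memory cost that sampling individual lassos avoids.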

Randomized Algorithms

• Huge impact on CS: (distributed) algorithms, complexity theory, cryptography, etc.

• The next step taken by the algorithm may depend on a random choice (coin flip).

• Benefits of randomization include simplicity, efficiency, and symmetry breaking.

Lassos Probability Space

• Sample space: lassos in $B_S \times B_{\neg\varphi}$

• Bernoulli random variable $Z$:

– Outcome = 1 if the randomly chosen lasso is accepting

– Outcome = 0 otherwise

• $p_Z = \sum_i p_i Z_i$ (expectation of an accepting lasso)

where $p_i$ is the probability of lasso $i$ (uniform random walk)
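For a small automaton the probability space can be enumerated exhaustively. The sketch below lists every lasso together with the probability a uniform random walk assigns to it, then sums the accepting ones. The four-state graph is hypothetical: it was chosen so that the result is 1/8, the value used on the Bernoulli slide, and is not claimed to be the automaton drawn there.

```python
from fractions import Fraction

def lassos(graph, init):
    """Yield (path, probability) per lasso; a path ends at its first repeated state."""
    stack = [([init], Fraction(1))]
    while stack:
        path, prob = stack.pop()
        succs = graph[path[-1]]
        for t in succs:
            p = prob / len(succs)          # uniform choice among successors
            if t in path:
                yield path + [t], p        # the walk closed a cycle
            else:
                stack.append((path + [t], p))

def accepting_probability(graph, init, accepting):
    """p_Z: total probability of lassos whose cycle contains an accepting state."""
    total = Fraction(0)
    for path, p in lassos(graph, init):
        start = path.index(path[-1])       # position where the cycle begins
        if any(s in accepting for s in path[start:]):
            total += p
    return total

graph = {1: [1, 2], 2: [4, 3], 3: [4, 1], 4: [4]}   # hypothetical example
for path, p in lassos(graph, 1):
    print(path, p)
print(accepting_probability(graph, 1, accepting={3}))
```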

Bernoulli Random Variable (coin flip)

• Value of Bernoulli RV Z:

Z = 1 (success) & Z = 0 (failure)

• Probability mass function:

p(1) = Pr[Z=1] = pz

p(0) = Pr[Z=0] = 1- pz = qz

• Expectation: E[Z] = pz

Statistical Hypothesis Testing

• Example: Given a fair and a biased coin.

– Null hypothesis H0 - fair coin selected.

– Alternative hypothesis H1 - biased coin selected.

• Hypothesis testing: Perform N trials.

– If number of heads is LOW, reject H0 .

– Else fail to reject H0 .

Statistical Hypothesis Testing

                         H0 is True                      H0 is False
    reject H0            Type I error w/prob. α          Correct to reject H0
    fail to reject H0    Correct to fail to reject H0    Type II error w/prob. β

Random Lasso (RL) Algorithm

input: Büchi automaton B;
output: samples a lasso; returns 1 if it is accepting, 0 if not;

    s := rInit(B); i := 1; f := 0;
    while (s ∉ HashTbl) {
        HashTbl(s) := i;
        if (acc(s,B)) f := i;
        s := rNext(s,B); i := i + 1; }
    if (HashTbl(s) > f) return 0 else return 1;
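A direct transcription of the random lasso routine over an explicit graph might look as follows. It is a sketch under assumptions: the automaton is given by hypothetical `init_states`, `succ` and `accepting` arguments, every state has at least one successor, and the return value follows the convention that 1 means an accepting lasso.

```python
import random

def random_lasso(init_states, succ, accepting, rng):
    """Walk at random until a state repeats; return (1 if the cycle is accepting else 0, path)."""
    s = rng.choice(init_states)
    index = {}          # plays the role of HashTbl: state -> step at which it was visited
    i, f = 1, 0         # f records the step of the most recent accepting state
    path = []
    while s not in index:
        index[s] = i
        path.append(s)
        if accepting(s):
            f = i
        s = rng.choice(succ(s))
        i += 1
    # The cycle starts at step index[s]; it is accepting iff an accepting
    # state was visited at or after that step.
    return (1 if index[s] <= f else 0), path + [s]

graph = {0: [1], 1: [2], 2: [1]}
rng = random.Random(0)
print(random_lasso([0], lambda s: graph[s], lambda s: s == 2, rng))
```

Only the states on the current walk are stored, so memory use is bounded by the length of one lasso rather than by the size of the state space.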

Correctness of MC2

Theorem: Given a Büchi automaton B, error margin ε, and confidence ratio δ, if MC2 rejects H0, then its type I error has probability

α = P[ X > M | H0 ] < δ

Complexity of MC2

Theorem: Given a Büchi automaton B having diameter D, error margin ε, and confidence ratio δ, MC2 runs in time $O(N \cdot D)$ and uses space $O(D)$, where $N = \ln(\delta) / \ln(1 - \epsilon)$.

Cf. DDFS, which runs in $O(2^{|S| + |\varphi|})$ time for $B = B_S \times B_{\neg\varphi}$.

Alternative Sampling Strategies

• Multilasso sampling: ignores backedges that do not lead to an accepting lasso.

[Figure: chain of states 0, 1, …, n-1, n, with $\Pr[L_n] = O(2^{-n})$.]

• Probabilistic systems: there is a natural way to assign a probability to a RL.

• Input partitioning: partition input into classes that trigger the same behavior (guards).