Open Source Model Checking Radu Grosu SUNY at Stony Brook

Joint work with

X. Huang, S. Jain and S. A. Smolka

GCC Compiler

• Early stages: A modest C compiler.

- Translation: source code translated directly to RTL.

- Optimization: at low RTL level.

- High level information lost: calls, structures, fields, etc.

• Nowadays: Full-blown, multi-language compiler generating code for more than 30 architectures.

- Input: C, C++, Objective-C, Fortran, Java and Ada.

- Tree-SSA: added GENERIC, GIMPLE and SSA ILs.

- Optimization: at GENERIC, GIMPLE, SSA and RTL levels.

- Verification: Tree-SSA API suitable for verification, too.

GCC Compilation Process

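[Figure: GCC compilation pipeline. C, C++ and Java files pass through the C, C++ and Java parsers to a parse tree; Genericize yields the GEN AST (GENERIC); Gimplify yields the GPL AST (GIMPLE); Build CFG yields the SSA/GPL CFG; Rest Comp yields RTL code; Code Gen yields object code.]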

C Program and its GIMPLE IL

int main() {
    int a,b,c;
    a = 5;
    b = a + 10;
    c = a + foo(a,b);
    if (a > c)
        c = b++/a + b*a;
    bar(a,b,c);
}

int main() {
    int a,b,c;
    int T1,T2,T3,T4;

    a = 5;
    b = a + 10;
    T1 = foo(a,b);
    T2 = a + T1;
    c = T2;
    if (a <= T2) goto fi;
    T3 = b / a;
    T4 = b * a;
    c = T3 + T4;
    b = b + 1;
fi:
    bar(a,b,c);
}

Associated GIMPLE CFG

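[Figure: GIMPLE CFG of main. A FUNCTION DECL entry node declares int a, T4, T3, T2, c, T1, b and points to the blocks below.]

Entry block: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; c = T2; if (a > T2) goto B; else goto C;

Block B (true branch): T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;

Block C (false branch, also reached from B): bar(a,b,c); return;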

GCC Model Checking (GMC)

• GMC: a suite of analysis and verification tools we are developing for the Tree-SSA level of GCC. Currently:

– Intra-procedural slicer: inter-procedural slicing is in progress.

– Symbolic execution engine: for Boolean C programs.

– Interpreter: traverses the CFG using Tree-SSA iterators.

– Monte Carlo MC (GMC2): one-sided error (OSE) randomized algorithm for LTL MC.

• GMC2: a newly developed technique that uses the theory of geometric random variables, statistical hypothesis testing and random sampling of lassos.

LTL MC: Finding Accepting Lassos

[Figure: lassos in the computation tree (CT), bounded in depth by the recurrence diameter.]

• Explore all lassos in the CT.

• DDFS, SCC: time efficient. DFS: memory efficient.

Randomized Algorithms

• The next step taken by the algorithm may depend on a random choice (coin flip).

– Benefits: simplicity, efficiency, and symmetry breaking.

• Monte Carlo: may produce an incorrect result, but with bounded error probability.

– Example: Election result prediction

• Las Vegas: always gives correct result but running time is a random variable.

– Example: Randomized Quick Sort

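Monte Carlo Approach

[Figure: lassos in the computation tree (CT), bounded in depth by the recurrence diameter; labels "LTL…" and "flip a k-sided coin".]

• Explore $N(\varepsilon, \delta)$ independent lassos in the CT.

• $\varepsilon$ is the error margin and $\delta$ the confidence ratio.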

Bernoulli Random Variable Z (coin flip)

[Figure labels: ¼, ⅛]

Probability mass function:

$p(1) = P[Z=1] = p_Z = 1/8$

$p(0) = P[Z=0] = q_Z = 7/8$

Geometric Random Variable

• Value of geometric RV X with parameter $p_z$:

– No. of independent lassos until success.

• Probability mass function:

– $p(N) = P[X = N] = q_z^{N-1} p_z$

• Cumulative Distribution Function:

– $F(N) = P[X \le N] = \sum_{i \le N} p(i) = 1 - q_z^{N} = 1 - (1 - p_z)^{N}$
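
The closed form of the cumulative distribution function follows from the finite geometric series. The intermediate steps, written out as a short derivation that uses only the symbols of this slide and the identity $1 - q_z = p_z$:

```latex
\[
F(N) \;=\; \sum_{i=1}^{N} q_z^{\,i-1}\, p_z
     \;=\; p_z \cdot \frac{1 - q_z^{N}}{1 - q_z}
     \;=\; 1 - q_z^{N}
     \;=\; 1 - (1 - p_z)^{N}
\]
```

As a check with one sample, $F(1) = 1 - q_z = p_z$, which is the probability that the very first lasso is a success.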

How Many Lassos?

• Requiring $1 - (1 - p_z)^{N} = 1 - \delta$ yields:

$N = \ln(\delta) / \ln(1 - p_z)$

• Lower bound on number of trials N needed to achieve success with confidence ratio δ.

What If pz Unknown?

• Requiring $p_z \ge \varepsilon$ yields:

$M = \ln(\delta) / \ln(1 - \varepsilon) \ge N = \ln(\delta) / \ln(1 - p_z)$

and therefore $P[X \le M] \ge 1 - \delta$

• Lower bound on number of trials M needed to achieve success with confidence ratio δ and error margin ε.
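
To make the bound concrete, here is a minimal C sketch that finds the smallest integer M with (1 − ε)^M ≤ δ, which is the integer form of M = ln(δ) / ln(1 − ε). The function name `samples_needed` is hypothetical and not part of GMC2. The loop multiplies instead of calling `log` only to keep the sketch free of a math-library dependency.

```c
#include <assert.h>

/* Smallest integer M such that (1 - epsilon)^M <= delta, i.e. the
 * integer ceiling of ln(delta) / ln(1 - epsilon).
 * Assumes 0 < epsilon < 1 and 0 < delta < 1. */
unsigned long samples_needed(double epsilon, double delta)
{
    assert(epsilon > 0.0 && epsilon < 1.0);
    assert(delta > 0.0 && delta < 1.0);

    /* miss = probability that m samples all fail when p_z = epsilon */
    double miss = 1.0;
    unsigned long m = 0;
    while (miss > delta) {
        miss *= 1.0 - epsilon;
        m++;
    }
    return m;
}
```

For example, ε = 0.1 and δ = 0.1 give 22 samples, and ε = δ = 0.01 give 459. The bound depends only on ε and δ, not on p_z itself.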

Statistical Hypothesis Testing

• Null hypothesis $H_0$: $p_z \ge \varepsilon$

• Alternative hypothesis $H_1$: $p_z < \varepsilon$

• If no success after M trials, then reject $H_0$

• Type I error: $\alpha = P[X > M \mid H_0] < \delta$

• Since: $P[X \le M \mid H_0] \ge 1 - \delta$

Monte Carlo Model Checking (MC2)

input: B = (Σ, Q, Q0, δ, F), ε, δ

N = ln(δ) / ln(1 − ε)

for (i = 1; i ≤ N; i++)

    if (RL(B) == 1) return (1, error-trace);

return (0, “reject H0 with α = Pr[ X > N | H0 ] < δ”);

where RL(B) performs a uniform random walk through B to obtain a random lasso.
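
The sampling loop above can be sketched in C as follows. This is a sketch under stated assumptions, not the GMC2 implementation: the sampler type `lasso_sampler`, the function `mc2`, and the toy sampler `kth_call_sampler` are hypothetical names, and the opaque `ctx` pointer stands in for the automaton B. The sampler is assumed to return true when the sampled lasso is accepting, matching RL(B) == 1 on the slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sampler: returns true if the sampled lasso is accepting. */
typedef bool (*lasso_sampler)(void *ctx);

/* Returns true as soon as an accepting lasso (error trace) is sampled.
 * Returns false after N unsuccessful samples, which corresponds to
 * rejecting H0. *used reports how many lassos were drawn. */
bool mc2(lasso_sampler rl, void *ctx, double epsilon, double delta,
         unsigned long *used)
{
    assert(epsilon > 0.0 && epsilon < 1.0);
    assert(delta > 0.0 && delta < 1.0);

    /* N = smallest integer with (1 - epsilon)^N <= delta, computed by
     * repeated multiplication instead of log(). */
    unsigned long n = 0;
    for (double miss = 1.0; miss > delta; miss *= 1.0 - epsilon)
        n++;

    for (*used = 1; *used <= n; (*used)++)
        if (rl(ctx))
            return true;   /* accepting lasso found */

    *used = n;
    return false;          /* no accepting lasso in N samples */
}

/* Toy sampler for illustration only: succeeds on the k-th call, where k
 * is stored in ctx; k == 0 means it never succeeds. */
bool kth_call_sampler(void *ctx)
{
    unsigned long *k = ctx;
    if (*k == 0)
        return false;
    return --(*k) == 0;
}
```

With ε = δ = 0.1 the loop draws at most 22 lassos. A sampler that succeeds on its third call stops the loop after three samples, while a sampler that never succeeds exhausts all 22.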

GCC MC2 (GMC2)

• Input: a set of CFGs.

– Main function: A specifically designated CFG.

• Random walks in the Büchi automaton: generated on-the-fly.

– Initial state: of the main routine + bookkeeping information.

– Next state: choose process + call interpreter on its CFG.

– Processes: created by using the fork primitive.

– Optimization: interpreter returns only upon context switch.

• Lassos: detected by using a hierarchic hash table.

– Local variables: removed upon return from a procedure.

Program State

• Shared variables valuation (channels & semaphores)

• List of process states p1, p2, p3, …

– Control state: CFG name, statement #

– Data state

Program State (process state in detail)

• Shared variables valuation (channels & semaphores)

• List of process states p1, p2, p3, …

– Control state

– Data state: heap, global variables valuation, frame stack f1, f2, …

– Each frame: return control state, local variables valuation
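
One way to picture the state layout of these two diagrams is as nested C structures. The sketch below is an assumption-laden illustration, not GMC2's actual data structures: all type names, field names, and the fixed array bounds are hypothetical, variables are simplified to `int`, and the heap is omitted. The helpers `enter_call` and `leave_call` show how a frame stack makes local variables disappear on return.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_VARS = 4, MAX_FRAMES = 8, MAX_PROCS = 4 };  /* arbitrary bounds */

typedef struct {
    int cfg;    /* which CFG (function) the process is executing */
    int stmt;   /* statement number within that CFG */
} ControlState;

typedef struct {
    ControlState ret;          /* return control state */
    int locals[MAX_VARS];      /* local variables valuation */
} Frame;

typedef struct {
    ControlState control;
    int globals[MAX_VARS];     /* global variables valuation (heap omitted) */
    Frame frames[MAX_FRAMES];  /* frame stack f1, f2, ... */
    size_t depth;              /* number of frames in use */
} ProcessState;

typedef struct {
    int shared[MAX_VARS];      /* shared variables: channels and semaphores */
    ProcessState procs[MAX_PROCS];
    size_t nprocs;
} ProgramState;

/* Procedure call: push a frame that remembers where to resume. */
void enter_call(ProcessState *p, int callee_cfg)
{
    assert(p->depth < MAX_FRAMES);
    Frame *f = &p->frames[p->depth++];
    f->ret.cfg = p->control.cfg;
    f->ret.stmt = p->control.stmt + 1;
    for (int i = 0; i < MAX_VARS; i++)
        f->locals[i] = 0;
    p->control.cfg = callee_cfg;
    p->control.stmt = 0;
}

/* Procedure return: pop the frame, which also discards its locals. */
void leave_call(ProcessState *p)
{
    assert(p->depth > 0);
    p->control = p->frames[--p->depth].ret;
}
```

Because the locals live only inside a frame, two program states that differ only in the locals of an already returned procedure compare equal, which is what lasso detection needs.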

Interpreter

• Interprets GIMPLE statements: according to their semantics. Interesting:

– Inter-procedural: call(), return(). Manipulate the frame stack.

• Catches and interprets: function calls to various modeling and concurrency primitives:

– Modeling: toss(), assert(). Nondeterminism and checks.

– Processes: fork(), … Manipulate the process list.

– Communication: send(), recv(). Manipulate shared vars. May involve a context switch.

Results: TCAS

                                      GMC2
property                       rule   bugs   time   sampl
Safe Advisory Selection        1      no     0.23   1278
                               2      yes    0.03   147
Best Advisory Selection        1      no     0.23   1278
                               2      yes    0.04   206
Avoid unnecessary Crossing     1      yes    0.01   36
                               2      yes    0.03   180
No. Crossing Adv. Selection    1      yes    0.01   27
                               2      yes    0.01   8
Optimal Advisory Selection     1      no     0.23   1278
                               2      yes    0.06   217

DPh: Symmetric Fair Version (Deadlock freedom)

      GMC2                        Verisoft
ph    time      sampl   ce.len    time      states   trans
4     0:00.07   2       12        0:00.61   16       37
6     0:00.11   4       12        0:16.60   773      1171
8     0:00.78   11      20        2:57.29   5431     8449
10    0:02.17   31      24        10:41     17908    31433
12    0:04.82   24      27        >2hr      N/A      N/A
14    0:06.22   22      44        >2hr      N/A      N/A
16    0:11.56   14      32        >2hr      N/A      N/A

Needham-Schroeder Protocol

GMC2                   Verisoft          Genetic
time     sampl         time    states    time     errors
6h 37'   10,682,639    >8h     N/A       2h 33'   3

• Quite sophisticated C implementation.

• However, of a sequential nature:

- Essentially executes only one round of a reactive system

Related Work

• Software model checkers for concurrent C/C++:

– VeriSoft, Spin, Blast (Slam), Magic, C-Wolf. Bogor?

• Cooperative Bug Isolation [Liblit, Naik & Zheng]:

– Compile-time instrumentation. Distribute binaries/collect bugs.

– Statistical analysis to isolate erroneous code segments.

• Random interpretation [Gulwani & Necula]:

– Execute random paths and merge with random linear operators.

• Monte Carlo and abstract interpretation [Monniaux]:

– Analyze programs with probabilistic and nondeterministic input.

Conclusions

• Presented GMC2: a software MC for GCC based on Monte Carlo MC:

– At Tree-SSA level: applicable to C, C++, Ada, Java, etc.

– Open source: freely available for usage/critique/extension.

• Ongoing and Future Work: Create a software MC branch of GCC, which also includes:

– Automated abstraction/refinement/interpolation techniques.

– Currently we manually apply a form of bounded-range abstraction (e.g. in TCAS).

Talk Outline

1. Model Checking

2. Randomized Algorithms

3. LTL Model Checking

4. Probability Theory Primer

5. Monte Carlo Model Checking

6. Implementation & Results

7. Conclusions & Open Problem

Linear Temporal Logic

• LTL formula: made up inductively of

– atomic propositions p, boolean connectives $\wedge$, $\vee$, $\neg$

– temporal modalities X (neXt) and U (Until).

• Safety: “nothing bad ever happens”

E.g. $G(\neg(pc_1 = cs \wedge pc_2 = cs))$ where G is a derived modality (Globally).

• Liveness: “something good eventually happens”

E.g. $G(req \rightarrow F\, serviced)$ where F is a derived modality (Finally).

Model Checking

• S is a nondeterministic/concurrent system.

• $\varphi$ is a temporal logic formula.

– in our case Linear Temporal Logic (LTL).

• Basic idea: intelligently explore S’s state space in an attempt to establish $S \models \varphi$.

LTL Model Checking

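• Every LTL formula $\varphi$ can be translated to a Büchi automaton $B_\varphi$ such that $L(\varphi) = L(B_\varphi)$

• Automata-theoretic approach:

$S \models \varphi$ iff $L(B_S) \subseteq L(B_\varphi)$ iff $L(B_S \times B_{\neg\varphi}) = \emptyset$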

• Checking non-emptiness is equivalent to finding a reachable accepting cycle (lasso).

Emptiness Checking

• Checking non-emptiness is equivalent to finding an accepting cycle reachable from initial state (lasso).

• Double Depth-First Search (DDFS) algorithm can be used to search for such cycles, and this can be done on-the-fly!

[Figure: a lasso over the states s1, s2, s3, …, sk-2, sk-1, sk, sk+1, sk+2, sk+3, …, sn.]

Randomized Algorithms

• Huge impact on CS: (distributed) algorithms, complexity theory, cryptography, etc.

• The next step taken by the algorithm may depend on a random choice (coin flip).

• Benefits of randomization include simplicity, efficiency, and symmetry breaking.

Lassos Probability Space

• Sample Space: lassos in $B_S \times B_{\neg\varphi}$

• Bernoulli random variable Z:

– Outcome = 1 if randomly chosen lasso accepting

– Outcome = 0 otherwise

• $p_Z = \sum_i p_i Z_i$ (expectation of an accepting lasso), where $p_i$ is the lasso probability (uniform random walk)

Bernoulli Random Variable (coin flip)

• Value of Bernoulli RV Z:

Z = 1 (success) & Z = 0 (failure)

• Probability mass function:

p(1) = Pr[Z=1] = pz

p(0) = Pr[Z=0] = 1- pz = qz

• Expectation: E[Z] = pz

• Example: Given a fair and a biased coin.

– Null hypothesis H0 - fair coin selected.

– Alternative hypothesis H1 - biased coin selected.

• Hypothesis testing: Perform N trials.

– If number of heads is LOW, reject H0 .

– Else fail to reject H0 .

                     H0 is True                      H0 is False
reject H0            Type I error w/prob. α          Correct to reject H0
fail to reject H0    Correct to fail to reject H0    Type II error w/prob. β

Random Lasso (RL) Algorithm

input: Büchi automaton B;

output: sample lasso; return 0 if accepting; 1 if not;

(1) s := rInit(B); i := 1; f := 0;

(2) while (s ∉ HashTbl) {

(3)     HashTbl(s) := i;

(4)     if (acc(s,B)) f := i;

(5)     s := rNext(s,B); i := i + 1; }

(6) if (HashTbl(s) ≤ f) return 0 else return 1;
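
The random walk above can be sketched in C on a small, explicitly stored automaton. Everything here is an illustrative assumption rather than GMC2 code: the `Buchi` struct, its fixed bounds, and the function name `random_lasso_accepting` are hypothetical; there is a single initial state instead of rInit(B); an array of visit indices replaces the hash table; and `rand()` picks the successor. To stay clear of any numeric return code, the function returns a boolean that is true exactly when the sampled lasso is accepting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { MAX_STATES = 16, MAX_SUCC = 4 };   /* arbitrary bounds */

typedef struct {
    int initial;                       /* single initial state, for brevity */
    int num_succ[MAX_STATES];          /* out-degree of each state */
    int succ[MAX_STATES][MAX_SUCC];    /* successor lists */
    bool accepting[MAX_STATES];
} Buchi;

/* Walks randomly from the initial state until a state repeats, then
 * reports whether the cycle of the resulting lasso contains an
 * accepting state. */
bool random_lasso_accepting(const Buchi *b)
{
    int visit[MAX_STATES] = {0};  /* 0 = unvisited, else position on the walk */
    int s = b->initial;
    int i = 1;                    /* position of the current state */
    int f = 0;                    /* position of the last accepting state seen */

    while (visit[s] == 0) {
        visit[s] = i;
        if (b->accepting[s])
            f = i;
        assert(b->num_succ[s] > 0);               /* every state needs a successor */
        s = b->succ[s][rand() % b->num_succ[s]];  /* random next state */
        i++;
    }
    /* s closes the cycle; the lasso is accepting iff the last accepting
     * state was visited at or after the point where the cycle starts. */
    return visit[s] <= f;
}
```

On the automaton 0 → 1 → 2 → 1 the walk is forced, so the result does not depend on the random choices: the lasso is accepting when state 2 is accepting (it lies on the cycle), and not accepting when only state 0 is (it lies on the stem). The walk stores at most one entry per state on the current lasso.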

Correctness of MC2

Theorem: Given a Büchi automaton B, error margin ε, and confidence ratio δ, if MC2 rejects H0, then its type I error has probability

α = P[ X > M | H0 ] < δ

Complexity of MC2

Theorem: Given a Büchi automaton B having diameter D, error margin ε, and confidence ratio δ, MC2 runs in time O(N∙D) and uses space O(D), where N = ln(δ) / ln(1- ε)

Cf. DDFS, which runs in $O(2^{|S|+|\varphi|})$ time for $B = B_S \times B_{\neg\varphi}$.

Alternative Sampling Strategies

[Figure: chain of states 0, 1, …, n-1, n.]

• Multilasso sampling: ignores backedges that do not lead to an accepting lasso.

$\Pr[L_n] = O(2^{-n})$

• Probabilistic systems: there is a natural way to assign a probability to a RL.

• Input partitioning: partition input into classes that trigger the same behavior (guards).
