Context-bounded model checking of concurrent software
Shaz Qadeer
Microsoft Research
Joint work with:
• Jakob Rehof, Microsoft Research
• Dinghao Wu, Princeton University
Concurrent software
• Operating systems, device drivers
• Databases, web servers, browsers, GUIs, ...
• Modern languages: C#, Java
[Figure: Thread 1 through Thread 4 running on Processor 1 and Processor 2]
Concurrency is increasingly important
• New classes of concurrent software
  – Web services
  – Workflows
• Single-chip multiprocessors are an architectural inflexion point
  – Software running on these chips will be even more concurrent
Reliable concurrent software?
• Correctness Problem
  – does the program behave correctly for all inputs and all interleavings?
• Bugs due to concurrency are insidious
  – non-deterministic, timing dependent
  – difficult to detect, reproduce, eliminate
  – coverage from testing very poor
Analysis of concurrent programs is difficult (1)
• Finite-data single-procedure program
  – n lines
  – m states for global data variables
• 1 thread
  – $n \cdot m$ states
• K threads
  – $n^K \cdot m$ states
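To make the growth concrete, here is a worked instance. The values n = 100, m = 10, and K = 4 are illustrative assumptions chosen for round arithmetic, not figures from the talk.

```latex
\begin{align*}
\text{1 thread:}  \quad & n \cdot m = 100 \cdot 10 = 10^{3} \text{ states} \\
\text{4 threads:} \quad & n^{K} \cdot m = 100^{4} \cdot 10 = 10^{9} \text{ states}
\end{align*}
```

Each extra thread multiplies the count by another factor of n, because every combination of per-thread program counters is a distinct state.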
Analysis of concurrent programs is difficult (2)
• Finite-data program with procedures
  – n lines
  – m states for global data variables
• 1 thread
  – Infinite number of states
  – Can still decide assertions in $O(n \cdot m^3)$
  – SLAM, ESP, BLAST implement this algorithm
• K ≥ 2 threads
  – Undecidable! (Ramalingam 00)
Context-bounded verification of concurrent software
[Figure: an execution drawn as three contexts separated by two context switches]
Analyze all executions with a small number of context switches!
Why context-bounded analysis?
• Many subtle concurrency errors are manifested in executions with a small number of contexts
• Context-bounded analysis can be performed efficiently
KISS: A static checker for concurrent software
• An implementation of context-bounded analysis
  – Technique to use any sequential checker to perform context-bounded concurrency analysis
• Has found a number of concurrency errors in NT device drivers
[Figure: Concurrent program P → KISS → Sequential program Q → Sequential checker → either "No error found" or "Error in Q indicates error in P"]
[Figure builds: the same pipeline with the sequential checker instantiated as SDV, PREfix, and ESP in turn]
Inside a static checker for sequential programs
int x, y, z;

void foo() {
    if (x > y) { y = x; }
    if (y > z) { z = y; }
    assert(x <= z);
}

• Symbolically analyze all paths
• Check the assertion for each path
• Interprocedural analysis
  – e.g., PREfix, ESP, SLAM, BLAST
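The idea of checking the assertion on every path can be imitated, very roughly, by running `foo` on every input in a small range. This is a minimal sketch and not how PREfix, ESP, SLAM, or BLAST work: those tools reason symbolically, while the code below enumerates concrete values. The function names and the range bounds are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Body of foo() from the slide, run on explicit arguments instead of
 * globals. Returns whether the final assertion x <= z holds. */
bool foo_assertion_holds(int x, int y, int z) {
    if (x > y) { y = x; }   /* afterwards y >= x */
    if (y > z) { z = y; }   /* afterwards z >= y */
    return x <= z;
}

/* Hypothetical helper: try every initial valuation in [lo, hi]^3.
 * Between them these inputs exercise all four paths through foo(). */
bool check_all_inputs(int lo, int hi) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++)
                if (!foo_assertion_holds(x, y, z))
                    return false;
    return true;
}
```

Calling `check_all_inputs(-8, 8)` covers 17^3 valuations. A symbolic checker reaches the same conclusion for all integers by reasoning about the four paths directly.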
KISS strategy
• Q encodes executions of P with small number of context switches
  – instrumentation introduces lots of extra paths to mimic context switches
• Leverage all-path analysis of sequential checkers
KISS features
• KISS trades off soundness for scalability
• Cost of analyzing a concurrent program P = cost of analyzing a sequential program Q
  – Size of Q asymptotically same as size of P
• Unsoundness is precisely quantifiable
  – for a 2-thread program, explores all executions with up to two context switches
  – for an n-thread program, explores up to 2n-2 context switches
• Allows any sequential checker to analyze concurrency
Experimental Evaluation of KISS
Driver Stopping Error in Bluetooth Driver (1 KLOC)
DispatchRoutine() {
    int t;
    if (!de->stopping) {
        AtomicIncr(&de->count);
        assert(!driverStopped);
        // do useful work
        // …
        t = AtomicDecr(&de->count);
        if (t == 0)
            SetEvent(&de->stopEvent);
    }
}

PnpStop() {
    int t;
    de->stopping = T;
    t = AtomicDecr(&de->count);
    if (t == 0)
        SetEvent(&de->stopEvent);
    WaitEvent(&de->stopEvent);
    driverStopped = T;
}
Failing interleaving:

Thread 1 (DispatchRoutine):
    int t;
    if (!de->stopping) {
Thread 2 (PnpStop):
    int t;
    de->stopping = T;
    t = AtomicDecr(&de->count);
    if (t == 0)
        SetEvent(&de->stopEvent);
    WaitEvent(&de->stopEvent);
    driverStopped = T;
Thread 1 (DispatchRoutine, resumed):
        AtomicIncr(&de->count);
        assert(!driverStopped);
        // do useful work
        // …
        t = AtomicDecr(&de->count);
        if (t == 0)
            SetEvent(&de->stopEvent);
    }

Assertion fails!
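The interleaving above uses two context switches: DispatchRoutine is preempted once, PnpStop runs, and DispatchRoutine resumes. A minimal sketch of that search as a sequential program follows. It is not the KISS instrumentation itself. The struct, the function names, the split of DispatchRoutine into four steps, and the initial value `count = 1` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device extension fields used on the slide. */
typedef struct {
    bool stopping;
    bool stop_event;      /* models de->stopEvent being signaled */
    bool driver_stopped;
    int  count;
} DeviceExt;

/* PnpStop run as one uninterrupted context. If the event is not yet
 * signaled, WaitEvent would block, so the thread makes no more progress. */
static void pnp_stop(DeviceExt *de) {
    de->stopping = true;
    int t = --de->count;                  /* AtomicDecr */
    if (t == 0) de->stop_event = true;    /* SetEvent */
    if (!de->stop_event) return;          /* blocked in WaitEvent */
    de->driver_stopped = true;
}

/* Runs DispatchRoutine as four steps and schedules all of PnpStop just
 * before step k (0..3). Returns true if assert(!driverStopped) fails. */
bool assertion_fails(int k) {
    assert(k >= 0 && k <= 3);
    DeviceExt de = { false, false, false, 1 };   /* count = 1 is an assumption */
    for (int step = 0; step < 4; step++) {
        if (step == k) pnp_stop(&de);
        switch (step) {
        case 0: if (de.stopping) return false; break;       /* if (!de->stopping) */
        case 1: de.count++; break;                           /* AtomicIncr */
        case 2: if (de.driver_stopped) return true; break;   /* the assertion */
        case 3: if (--de.count == 0) de.stop_event = true; break;
        }
    }
    return false;
}
```

Under these assumptions only `k = 1` fails the assertion, which is the preemption point shown on the slide: after the `stopping` check and before `AtomicIncr`.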
IRP Cancellation Error in Packet Driver (2.5 KLOC)

DispatchRoutine(IRP *irp) {
    …
    irp->CancelRoutine = PacketCancelRoutine;
    Enqueue(irp);
    IoMarkIrpPending(irp);
    …
}

IoCancelIrp(IRP *irp) {
    IoAcquireCancelSpinLock();
    if (irp->CancelRoutine) {
        (irp->CancelRoutine)(irp);
    }
    …
}

PacketCancelRoutine(IRP *irp) {
    …
    Dequeue(irp);
    IoCompleteRequest(irp);
    IoReleaseCancelSpinLock();
    …
}

Failing interleaving:

Thread 1 (DispatchRoutine):
    …
    irp->CancelRoutine = PacketCancelRoutine;
    Enqueue(irp);
Thread 2 (IoCancelIrp):
    IoAcquireCancelSpinLock();
    if (irp->CancelRoutine) {
        // inline PacketCancelRoutine(irp)
        …
        Dequeue(irp);
        IoCompleteRequest(irp);
        IoReleaseCancelSpinLock();
Thread 1 (DispatchRoutine, resumed):
    IoMarkIrpPending(irp);

Error: An irp should not be marked pending after it has been completed!
Data-race Conditions in DDK Sample Drivers
• Device extension shared among threads
• Data-races on device extension fields
• 18 sample DDK drivers
  – Range 0.5-9.2 KLOC
  – Total 70 KLOC
• Each field checked separately with a resource limit of 20 minutes and 800 MB
• Two threads: each calls a nondeterministically chosen dispatch routine
Driver              KLOC   # Fields   # Races
Tracedrv             0.5          3         0
Moufiltr             1.0         14         0
Kbfiltr              1.1         15         0
Imca                 1.1          5         1
Startio              1.1          9         0
Toaster/toastmon     1.4          8         1
Diskperf             2.4         16         0
1394diag             2.7         18         0
1394vdev             2.8         18         1
Fakemodem            2.9         39         6
Toaster/bus          5.0         30         0
Serenum              5.9         41         2
Toaster/func         6.6         24         5
Mouclass             7.0         34         1
Kbdclass             7.4         36         1
Mouser               7.6         34         1
Fdc                  9.2         92         9
Total: 30 races
Keep It Simple and Sequential
• Context-bounded analysis by leveraging existing sequential checkers
• Validates the hypothesis that many concurrency errors require few context switches to show up
However…
• Hard limit on number of explored contexts
  – e.g., two context switches for a concurrent program with two threads
• Case study: Concurrent transaction management code written in C# (Naik-Rehof 04)
  – Analyzed by the Zing model checker after automatically translating to the Zing input language
  – Found three bugs each requiring between three and four context switches
Is a tuning knob possible?
• Given a concurrent boolean program P, does P go wrong by failing an assertion?
  – Undecidable
• Given a concurrent boolean program P and a positive integer c, does P go wrong by failing an assertion via an execution with at most c contexts?
  – Decidable
• Given a concurrent boolean program P with unbounded fork-join parallelism and a positive integer c, does P go wrong by failing an assertion via an execution with at most c contexts?
  – Decidable
[Figure: an execution drawn as three contexts separated by two context switches]
Problem:
• Unbounded computation possible within each context!
• Unbounded execution depth and reachable state space
• Different from bounded-depth model checking
Sequential pushdown system
• Global store g, valuation to global variables
• Local store l, valuation to local variables
• Stack s, sequence of local stores
• State (g, s)
• Transition relation: $(g, s) \rightarrow (g', s')$

Reachability problem for sequential pushdown system
Given (g, s), is there s' such that $(g, s) \rightarrow^* (\mathit{error}, s')$?
Aggregate state
• Set of stacks ss
• Aggregate state $(g, ss) = \{ (g, s) \mid s \in ss \}$
• $\mathit{Reach}(g, ss) = \{ (g', s') \mid \exists s \in ss.\ (g, s) \rightarrow^* (g', s') \}$
• $\mathit{Reach}(g, ss, g') = \{ s' \mid (g', s') \in \mathit{Reach}(g, ss) \}$
Theorem (Buchi, Schwoon 00)
• If ss is regular, then Reach(g, ss, g’) is regular.
• If ss is given as a finite automaton A, then a finite automaton A’ for Reach(g, ss, g’) can be constructed from A in polynomial time.
Algorithm
Problem: Given (g, s), is there s' such that $(g, s) \rightarrow^* (\mathit{error}, s')$?
Solution: Compute an automaton for $\mathit{Reach}(g, \{s\}, \mathit{error})$ and report an error if its language is nonempty.
Concurrent pushdown system
• Global store g, valuation to global variables
• Local store l, valuation to local variables
• Stack s, sequence of local stores
• State $(g, s_1, s_2)$

Transition relation:
• If $(g, s_1) \rightarrow (g', s_1')$ in thread 1, then $(g, s_1, s_2) \rightarrow_1 (g', s_1', s_2)$
• If $(g, s_2) \rightarrow (g', s_2')$ in thread 2, then $(g, s_1, s_2) \rightarrow_2 (g', s_1, s_2')$
Reachability problem for concurrent pushdown system
Given $(g, s_1, s_2)$, are there $s_1'$ and $s_2'$ such that $(g, s_1, s_2)$ reaches $(\mathit{error}, s_1', s_2')$ via an execution with at most c contexts?
Aggregate transition relation
• If $ss_1' = \mathit{Reach}_1(g, ss_1, g')$, then $(g, ss_1, ss_2) \rightarrow_1 (g', ss_1', ss_2)$
• If $ss_2' = \mathit{Reach}_2(g, ss_2, g')$, then $(g, ss_1, ss_2) \rightarrow_2 (g', ss_1, ss_2')$
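Algorithm: 2 threads, c contexts
[Figure: search tree of depth c rooted at the aggregate state $(g, \{s_1\}, \{s_2\})$; every node has outgoing edges labeled 1 and 2, one label per thread]
Compute the set of reachable aggregate states. Report an error if $(g, ss_1, ss_2)$ is reachable and g = error, $ss_1$ is nonempty, and $ss_2$ is nonempty.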
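The effect of the context bound can be seen on a finite-state toy. The sketch below is not the aggregate-state algorithm of this slide: it has no stacks and explores concrete states by recursion, with each context switch using up one unit of the bound. The two-thread program, the `State` layout, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { ERROR = 3 };                          /* distinguished error global store */

typedef struct { int g; int pc[2]; } State;  /* global store, one pc per thread */

/* Hypothetical two-thread program:
 *   thread 0: pc 0: if g == 0 then g = 1;   pc 1: if g == 2 then g = ERROR
 *   thread 1: pc 0: if g == 1 then g = 2
 * Takes one enabled transition of thread tid, or returns false. */
static bool step(State *s, int tid) {
    if (tid == 0 && s->pc[0] == 0 && s->g == 0) { s->g = 1;     s->pc[0] = 1; return true; }
    if (tid == 0 && s->pc[0] == 1 && s->g == 2) { s->g = ERROR; s->pc[0] = 2; return true; }
    if (tid == 1 && s->pc[1] == 0 && s->g == 1) { s->g = 2;     s->pc[1] = 1; return true; }
    return false;
}

/* Is ERROR reachable when thread tid runs now and contexts_left contexts
 * remain, counting the current one? */
static bool explore(State s, int tid, int contexts_left) {
    if (s.g == ERROR) return true;
    if (contexts_left == 0) return false;
    /* Option 1: context switch, which ends the current context. */
    if (explore(s, 1 - tid, contexts_left - 1)) return true;
    /* Option 2: stay in the current context and take one more step. */
    State next = s;
    return step(&next, tid) && explore(next, tid, contexts_left);
}

/* Does the toy program reach ERROR via an execution with at most c contexts? */
bool goes_wrong(int c) {
    assert(c >= 0);
    State init = { 0, { 0, 0 } };
    return explore(init, 0, c) || explore(init, 1, c);
}
```

In this toy the only failing execution is thread 0, then thread 1, then thread 0 again, so `goes_wrong(2)` is false and `goes_wrong(3)` is true. The bound limits the number of contexts, not the number of steps inside a context.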
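Complexity: 2 threads, c contexts
[Figure: search tree of depth c rooted at the aggregate state $(g, \{s_1\}, \{s_2\})$; every node has outgoing edges labeled 1 and 2]
• Depth of tree = context bound c
• Branching factor bounded by $G \times 2$ (G = # of global stores)
• Number of edges bounded by $(G \times 2)^{c+1}$
• Each edge computable in polynomial time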
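A worked instance of the edge bound, reading it as $(2G)^{c+1}$ and assuming one level of tree edges per context. The values G = 4 and c = 3 are illustrative assumptions, not figures from the talk.

```latex
\begin{align*}
\text{edges} \;\le\; \sum_{i=1}^{c} (2G)^{i}
  &= 8 + 64 + 512 = 584 \\
  &\le (2G)^{c+1} = 8^{4} = 4096
\end{align*}
```

The bound is exponential in the context bound c but, for fixed c, polynomial in G, and each edge costs one polynomial-time Reach computation.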
Unbounded fork-join parallelism
• Fork operation: x = fork
• Join operation: join(x)
• Copy thread identifier from one variable to another
Algorithm: unbounded fork-join parallelism, c contexts
• At most c threads may perform a transition
• Reduce to previously solved problem with c threads and c contexts
  – Nondeterministically pick c forked threads for execution
Context-bounded analysis of concurrent software
• Many subtle concurrency errors are manifested in executions with few context switches
  – Experience with KISS on Windows drivers
  – Experience with Zing on transaction manager
• Algorithms for context-bounded analysis are more efficient than those for unbounded analysis
  – Reducibility to sequential checking with KISS
  – Decidability of assertion checking for concurrent boolean programs