
CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.

Date post: 26-Mar-2015

CHESS : Systematic Testing of Concurrent Programs

Madan Musuvathi, Shaz Qadeer

Microsoft Research

Page 2: Testing multithreaded programs is HARD

• Specific thread interleavings expose subtle errors
  • Testing often misses these errors
• Even when found, errors are hard to debug
  • No repeatable trace
  • Source of the bug is far away from where it manifests

Page 3: Concurrency is a real problem

• Windows 2000 hot fixes
  • Concurrency errors are the most common defects among “detectable errors”
  • Incorrect synchronization and protocol errors are the most common defects among all coding errors
• Windows Server 2003 late-cycle defects
  • Synchronization errors are second in the list, next to buffer overruns
• Race conditions can result in security exploits

Page 4: Current practice

• Concurrency testing == Stress testing
• Example: testing a concurrent queue
  • Create 100 threads performing queue operations
  • Run for days/weeks
  • Pepper the code with sleep( random() )
• Stress increases the likelihood of rare interleavings
  • Makes any error found hard to debug

Page 5: CHESS: Unit testing for concurrency

• Example: testing a concurrent queue
  • Create 1 reader thread and 1 writer thread
  • Exhaustively try all thread interleavings
• Run the test repeatedly on a specialized scheduler
  • Explore a different thread interleaving each time
  • Use model checking techniques to avoid redundancy
• Check for assertion failures and deadlocks in every run
  • The error trace is repeatable
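To make "exhaustively try all thread interleavings" concrete, the following is a minimal sketch and not CHESS itself. It assumes a toy scenario in which two threads each perform a non-atomic increment, modeled as a read step followed by a write step. The names allInterleavings, runIncrementScenario, and findLostUpdates are hypothetical. Every schedule is a plain list of thread ids, so replaying a failing schedule reproduces the same failure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A schedule is the sequence of thread ids chosen at each step.
using Schedule = std::vector<int>;

// Recursively extend `prefix` with every thread that still has steps left.
static void extend(std::vector<int>& remaining, Schedule& prefix,
                   std::vector<Schedule>& out) {
    bool done = true;
    for (std::size_t t = 0; t < remaining.size(); ++t) {
        if (remaining[t] == 0) continue;
        done = false;
        --remaining[t];
        prefix.push_back(static_cast<int>(t));
        extend(remaining, prefix, out);
        prefix.pop_back();
        ++remaining[t];
    }
    if (done) out.push_back(prefix);  // all threads finished: one full schedule
}

// Enumerate every interleaving of threads with the given step counts.
std::vector<Schedule> allInterleavings(std::vector<int> stepsPerThread) {
    std::vector<Schedule> out;
    Schedule prefix;
    extend(stepsPerThread, prefix, out);
    return out;
}

// Toy scenario (an assumption for illustration): each of two threads does
// step 0 = read the shared counter, step 1 = write back (value read) + 1.
// The result depends only on the schedule, so every run is repeatable.
int runIncrementScenario(const Schedule& s) {
    int counter = 0;
    int local[2] = {0, 0};
    int pc[2] = {0, 0};
    for (int t : s) {
        if (pc[t] == 0) local[t] = counter;
        else counter = local[t] + 1;
        ++pc[t];
    }
    return counter;
}

// Collect every schedule whose final counter is not 2 (a lost update).
std::vector<Schedule> findLostUpdates() {
    std::vector<Schedule> bad;
    for (const Schedule& s : allInterleavings({2, 2}))
        if (runIncrementScenario(s) != 2) bad.push_back(s);
    return bad;
}
```

In this model there are six interleavings, and four of them lose an update. A stress test would have to hit one of them by chance; the enumeration visits each exactly once.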

Page 6: Systematic Stress Testing Using CHESS

[Diagram: the tester provides a test scenario, TestScenario() { … }, which the program runs as While(not done) { TestScenario() }. CHESS sits between the program and the Win32 API, above the kernel (threads, scheduler, synchronization objects).]

• CHESS runs the scenario in a loop
  • Every run takes a different interleaving
  • Every run is repeatable

Page 7: Conditions on Test Scenario

• Test scenario should terminate in all interleavings
• Test scenario should be idempotent
  • Free all resources (handles, memory, …)
  • Clear the hardware state
• Key observation:
  • Existing stress tests already have these properties
  • Because they are run repeatedly, forever

Page 8: Perturb the System as Little as Possible

[Diagram: the program's test loop, While(not done) { TestScenario() }, runs on top of CHESS, which sits between the program and the Win32 API; below the Win32 API is the kernel (threads, scheduler, synchronization objects).]

• Detour Win32 API calls
  • To control and introduce nondeterminism
• Run the system as is
  • On the actual OS, hardware
  • Using system threads, synchronization
• Advantages
  • Avoid reporting false errors
  • Easy to add to existing test frameworks
  • Use existing debuggers

Page 9: Implementation details

• Handle all the Win32 synchronization mechanisms
  • Critical sections, locks, semaphores, events, …
  • Threadpools
  • Asynchronous procedure calls
  • Timers
  • IO Completions
• No modification to the kernel scheduler / Win32 library
• CHESS drives the system along a desired interleaving by ‘hijacking’ the scheduler

Page 10: Controlling the Scheduling Nondeterminism

• Nondeterministic choices for the scheduler
  • Determine when to context switch
  • On context switch, pick the next runnable thread to run
  • On resource release, wake up one of the waiting threads
• Hijack these choices from the scheduler
  • Ensure at most one thread is runnable
  • No thread is waiting on a resource
  • At chosen schedule points, block the current thread while waking the next thread
• Emulate program execution on a uniprocessor with context switches only at synchronization points
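The idea of keeping at most one thread runnable and handing control over only at chosen schedule points can be sketched with standard C++ threads. This is an illustrative assumption, not how CHESS is built: CHESS detours Win32 calls, whereas here each thread calls a hypothetical ReplayScheduler explicitly before and after every step. The class and the function runUnderSchedule are invented names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Replays a fixed schedule: entry i names the thread allowed to run step i.
class ReplayScheduler {
public:
    explicit ReplayScheduler(std::vector<int> schedule)
        : schedule_(std::move(schedule)) {}

    // Schedule point: block the caller until the schedule selects it.
    void waitForTurn(int tid) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] {
            return pos_ < schedule_.size() && schedule_[pos_] == tid;
        });
    }

    // End the current step and wake whichever thread is scheduled next.
    void endStep() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++pos_;
        }
        cv_.notify_all();
    }

private:
    std::vector<int> schedule_;
    std::size_t pos_ = 0;
    std::mutex m_;
    std::condition_variable cv_;
};

// Run `numThreads` real threads under `schedule` and return the order in
// which their steps actually executed.
std::vector<int> runUnderSchedule(const std::vector<int>& schedule,
                                  int numThreads) {
    std::vector<int> steps(numThreads, 0);
    for (int t : schedule) ++steps[t];

    ReplayScheduler sched(schedule);
    std::vector<int> trace;  // only touched by the thread holding the turn
    std::vector<std::thread> threads;
    for (int tid = 0; tid < numThreads; ++tid) {
        threads.emplace_back([&, tid] {
            for (int i = 0; i < steps[tid]; ++i) {
                sched.waitForTurn(tid);
                trace.push_back(tid);  // the "step" of the test scenario
                sched.endStep();
            }
        });
    }
    for (std::thread& th : threads) th.join();
    return trace;
}
```

Because only the selected thread is ever past its schedule point, the observed trace equals the requested schedule on every run, regardless of how the operating system schedules the underlying threads.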

Page 11: Partial-order reduction

• Many thread interleavings are equivalent
  • Accesses to separate memory locations by different threads can be reordered
• Avoid exploring equivalent thread interleavings
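How much redundancy there is can be seen by counting equivalence classes directly. The sketch below is a simplified model and not the CHESS algorithm: it assumes every step is a write to one named shared variable, treats two steps as dependent when they belong to the same thread or write the same variable, and identifies an equivalence class by the relative order of all dependent pairs. Program, Counts, and countEquivalenceClasses are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Per thread: the shared variable written at each step (all steps are writes).
using Program = std::vector<std::vector<char>>;
using Event = std::pair<int, int>;  // (thread, step index)
using Signature = std::vector<std::pair<Event, Event>>;

struct Counts {
    std::size_t interleavings = 0;  // every complete schedule
    std::size_t classes = 0;        // schedules that are not equivalent
};

// Two events conflict if they share a thread or write the same variable.
static bool dependent(const Program& p, const Event& a, const Event& b) {
    return a.first == b.first || p[a.first][a.second] == p[b.first][b.second];
}

// The order of every dependent pair identifies the equivalence class:
// swapping adjacent independent events never changes this signature.
static Signature signatureOf(const Program& p, const std::vector<Event>& trace) {
    Signature sig;
    for (std::size_t i = 0; i < trace.size(); ++i)
        for (std::size_t j = i + 1; j < trace.size(); ++j)
            if (dependent(p, trace[i], trace[j]))
                sig.emplace_back(trace[i], trace[j]);
    std::sort(sig.begin(), sig.end());
    return sig;
}

// Depth-first enumeration of all interleavings, recording each signature.
static void explore(const Program& p, std::vector<int>& pc,
                    std::vector<Event>& trace, std::set<Signature>& seen,
                    Counts& counts) {
    bool finished = true;
    for (int t = 0; t < static_cast<int>(p.size()); ++t) {
        if (pc[t] == static_cast<int>(p[t].size())) continue;
        finished = false;
        trace.emplace_back(t, pc[t]);
        ++pc[t];
        explore(p, pc, trace, seen, counts);
        --pc[t];
        trace.pop_back();
    }
    if (finished) {
        ++counts.interleavings;
        seen.insert(signatureOf(p, trace));
    }
}

Counts countEquivalenceClasses(const Program& p) {
    Counts counts;
    std::vector<int> pc(p.size(), 0);
    std::vector<Event> trace;
    std::set<Signature> seen;
    explore(p, pc, trace, seen, counts);
    counts.classes = seen.size();
    return counts;
}
```

For two threads that each write x and then y, the six interleavings fall into four classes; for two threads that touch different variables, both interleavings are equivalent and only one needs to be explored.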

Page 12: Partial-order reduction in CHESS

• Algorithm:
  • Assume the program is data-race free
  • Context switch only at synchronization points
  • Check for data-races in each execution
• Theorem:
  • If the algorithm terminates without reporting races, then the program has no assertion failures

Page 13: Executions on Multi-cores

• CHESS checks for data-races
• If a Test Scenario manifests a bug on a multi-core machine, then CHESS will
  • Either report a data-race
  • Or the bug
• CHESS systematically enumerates all sequentially consistent executions
  • Any data-race free multi-core execution is equivalent to a sequentially consistent execution

Page 14: State space explosion

[Figure: state-space graph for two threads, one executing x = 1; y = 1; and the other executing x = 2; y = 2;. Each node is labeled with the current (x, y) values, starting from 0,0 and passing through values such as 1,0, 2,0, 1,1, 1,2, 2,1 and 2,2 depending on the interleaving.]

Page 15: State space explosion

[Figure: n threads, each executing k steps, for example x = 1; … y = 1; and x = 2; … y = 2;]

• Number of executions = $O(n^{nk})$
• Exponential in both n and k
  • Typically: n < 10, k > 100
• Limits scalability to large programs (large k)
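As a worked check of the bound, assume every step is a possible context-switch point and no thread ever blocks (a simplification not stated on the slide). Then each execution is a way of filling nk positions with thread ids, using each id exactly k times, so the exact count is a multinomial coefficient, and it is bounded by letting each position pick any of the n threads:

```latex
\[
  \#\text{executions}(n,k) \;=\; \frac{(nk)!}{(k!)^{n}} \;\le\; n^{nk},
  \qquad
  \text{e.g. } n = 2,\ k = 2:\quad \frac{4!}{2!\,2!} = 6 \;\le\; 2^{4} = 16 .
\]
```

Already for n = 2 and k = 10 the exact count is 184756, which illustrates why large k is the limiting factor.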

Page 16: Bounding execution depth

• Works very well for message-passing programs
  • Limit the number of message exchanges
  • Message processing code executed atomically
  • Can go ‘deep’ in the state space
• Does not work for multithreaded programs
  • Even toy programs can have a large number of steps (shared-variable accesses)

Page 17: Iterative context bounding

[Figure: one thread runs

    x = 1;
    if (p != 0) {
        x = p->f;
    }

while a second thread runs

    p = 0;

The interleaving shown executes x = 1; if (p != 0) {, then p = 0;, then x = p->f; }. The two context switches in it are labeled “preemption” and “non-preemption”.]

Page 18: Iterative context-bounding algorithm

• The scheduler has a budget of c preemptions
  • Nondeterministically choose the preemption points
• Resort to non-preemptive scheduling after c preemptions
• Once all executions explored with c preemptions
  • Try with c+1 preemptions
• Iterative context-bounding has desirable properties
  • Property 0: Easy to implement
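A rough illustration of how little code the idea needs is given below. It is a sketch under stated assumptions, not the CHESS implementation: the program under test is a hand-written model of the two-thread example from the previous slide (x = 1; if (p != 0) x = p->f; racing with p = 0;), every statement is treated as a schedule point, and a switch away from a thread that still has steps left counts as a preemption. State, step, search, and minPreemptionsToBug are hypothetical names.

```cpp
#include <cassert>

// Model of the example: thread 0 has three steps, thread 1 has one.
struct State {
    bool pNull = false;        // has "p = 0;" executed?
    bool checkPassed = false;  // did "if (p != 0)" see a non-null p?
    bool crashed = false;      // did "x = p->f;" dereference null?
    int pc[2] = {0, 0};        // next step of each thread
};

static const int kSteps[2] = {3, 1};

// Execute the next step of thread t.
static void step(State& s, int t) {
    if (t == 1) {
        s.pNull = true;                         // p = 0;
    } else if (s.pc[0] == 1) {
        s.checkPassed = !s.pNull;               // if (p != 0)
    } else if (s.pc[0] == 2 && s.checkPassed && s.pNull) {
        s.crashed = true;                       // x = p->f; with p == 0
    }                                           // step 0 is x = 1; (harmless)
    ++s.pc[t];
}

// Depth-first search over schedules that use at most `budget` preemptions.
static bool search(State s, int current, int budget) {
    if (s.crashed) return true;
    for (int t = 0; t < 2; ++t) {
        if (s.pc[t] == kSteps[t]) continue;  // thread t has finished
        // Switching away from a thread that could still run is a preemption.
        bool preempts = current >= 0 && t != current &&
                        s.pc[current] < kSteps[current];
        if (preempts && budget == 0) continue;
        State next = s;
        step(next, t);
        if (search(next, t, budget - (preempts ? 1 : 0))) return true;
    }
    return false;
}

// Try c = 0, 1, 2, ... and report the first bound that exposes the bug,
// or -1 if none does up to maxBound.
int minPreemptionsToBug(int maxBound) {
    for (int c = 0; c <= maxBound; ++c)
        if (search(State{}, -1, c)) return c;
    return -1;
}
```

With a budget of zero the search only runs each thread to completion in either order and finds nothing; with a budget of one it finds the switch between the null check and the dereference.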

Page 19: Property 1: Polynomial state space

• Terminating program with fixed inputs and deterministic threads
  • n threads, k steps each, c preemptions
• Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\big((n^{2}k)^{c} \cdot n!\big)$
• Exponential in n and c, but not in k

[Figure: the step sequences of the threads (x = 1; … y = 1; and x = 2; … y = 2;) are cut at the chosen preemption points into atomic blocks, which are then reordered.]

• Choose c preemption points
• Permute n+c atomic blocks

Page 20: Property 2: Deep exploration possible with small bounds

• A context-bounded execution has unbounded depth
  • A thread may execute an unbounded number of steps within each context
• Even a context-bound of zero yields complete terminating executions

Page 21: Property 3: Finds the ‘simplest’ error trace

• Finds the smallest number of preemptions to the error
• Number of preemptions is a better metric of error complexity than execution length

Page 22: Property 4: Coverage metric

• If search terminates with context-bound of c, then any remaining error must require at least c+1 preemptions
• Intuitive estimate for
  • The complexity of the bugs remaining in the program
  • The chance of their occurrence in practice

Page 23: Property 5: Lots of bugs with small number of preemptions

• A non-blocking implementation of the work-stealing queue algorithm
  • Bounded circular buffer accessed concurrently by readers and stealers
• Developer provided
  • Test harness
  • Three buggy variations of the program
• Each bug found with at most 2 preemptions
  • Executions with 35 preemptions are possible!

Page 24: Context-bounding + Partial-order reduction

• Algorithm:
  • Assume the program is data-race free
  • Context switch only at synchronization points
  • Explore executions with c preemptions
  • Check for data-races in each execution
• Theorem:
  • If the algorithm terminates without reporting races, then the program has no assertion failures reachable with c preemptions
  • Requires that a thread can block only at synchronization points
  • Proof (Musuvathi-Q, PLDI 2007)

Page 25: Bugs found

                               Max Num    Bugs Reachable with Preemption Count
Program                KLOC    Threads    0    1    2    3    Total
Bluetooth               0.4    3          0    1    0    0    1
Work-Stealing Queue     1.3    3          0    1    2    0    3
Transaction Manager     7.0    2          0    0    2    1    3
APE                    18.9    4          2    1    1    -    4
Dryad Channels         16.0    5          1    5    1    -    7

Page 26

    // Function called by a worker thread
    // of RChannelReaderImpl
    void RChannelReaderImpl::AlertApplication(RChannelItem* item)
    {
        // Notify Application

        // XXX: Preempt here for the bug
        EnterCriticalSection(&m_baseCS);
        // process before exit
        LeaveCriticalSection(&m_baseCS);
    }

    // Function called by the main thread
    void TestChannel(WorkQueue* workQueue, ...)
    {
        // Creating a channel
        // allocates worker threads
        RChannelReader* channel =
            new RChannelReaderImpl(..., workQueue);

        // ... do work here

        channel->Close();
        // wrong assumption that channel->Close()
        // waits for worker threads to be finished

        delete channel;
        // BUG: deleting the channel when
        // worker threads still have a valid
        // reference to the channel
    }
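The slides do not say how this bug was repaired. One conventional way to make the caller's assumption true is for the close operation to wait for the worker threads before the object can be destroyed. The sketch below illustrates that pattern with standard C++ threads rather than the Win32 primitives in the original; Channel, close, alertApplication, and runChannelOnce are hypothetical stand-ins, not Dryad code.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class Channel {
public:
    // Creating a channel starts its worker threads.
    explicit Channel(int numWorkers) {
        for (int i = 0; i < numWorkers; ++i)
            workers_.emplace_back([this] { alertApplication(); });
    }

    // Unlike the buggy version, close() blocks until every worker has exited,
    // so no worker can still hold a reference to the channel afterwards.
    void close() {
        for (std::thread& w : workers_)
            if (w.joinable()) w.join();
    }

    // Joining again in the destructor keeps deletion safe even if the
    // caller forgot to call close().
    ~Channel() { close(); }

    int alerts() {
        std::lock_guard<std::mutex> guard(baseCS_);
        return alerts_;
    }

private:
    // Called on a worker thread; the mutex plays the role of m_baseCS.
    void alertApplication() {
        std::lock_guard<std::mutex> guard(baseCS_);
        ++alerts_;
    }

    std::mutex baseCS_;
    int alerts_ = 0;
    std::vector<std::thread> workers_;  // declared last, started in the body
};

// Mirrors TestChannel: create, close, then delete the channel.
int runChannelOnce(int numWorkers) {
    Channel* channel = new Channel(numWorkers);
    channel->close();              // now really waits for the workers
    int seen = channel->alerts();
    delete channel;                // safe: no worker is still running
    return seen;
}
```

With this ordering, a preemption just before the worker takes the lock only delays close(); it can no longer let the delete run first.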


Page 31: Facts about Dryad error trace

• Long error trace but requires only one preemption
  • Depth-bounding cannot find it without a lot of luck
• The error trace has 6 non-preempting context switches
  • It is important to leave the number of non-preempting context switches unbounded
• This error (and the other 6 errors) in Dryad remained in spite of careful regression testing and months of production use


Page 33: Coverage vs. Context-bound

[Figure: coverage plotted against the context bound]

Page 34: Dryad (coverage vs. time)

[Figure: Dryad coverage plotted against time]

Page 35: Current CHESS applications (work in progress)

• Dryad (library for distributed dataflow programming)
• Singularity/Midori (OS in managed code)
• User-mode drivers
• Cosmos (distributed file system)
• SQL database

Page 36: Conclusion

• Concurrency is important
  • Building robust concurrent software is still a challenge
  • Lack of debugging and testing tools
• CHESS: Concurrency unit-testing
  • Exhaustively try all interleavings
  • Attempt to seamlessly integrate with existing test frameworks
  • Provide replay capability
• Iterative context-bounding algorithm key to the design

