A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs
Sebastian Burckhardt, Microsoft Research
Pravesh Kothari, Indian Institute of Technology, Kanpur
Santosh Nagarakatte, University of Pennsylvania
Madanlal Musuvathi, Microsoft Research
What is Concurrency Testing?
Whether a test finds a bug depends on
◦ the configuration
◦ the inputs
◦ the schedule
Concurrency bugs are bugs that surface only for some schedules
The Concurrency Testing Problem
◦ How to cover buggy schedules as best we can?
◦ Testing all schedules is infeasible!
Idea: Randomize the Schedule
Parent
  void* p = 0;
  CreateThd(child);
  p = malloc(…);
Child
  Init();
  DoMoreWork();
  p->f ++;
1. Instrument code with calls to insert random delays
2. If we are lucky, delay exposes bugs
3. But: how long to delay? where not to delay?
Parent
  void* p = 0;
  RandDelay();
  CreateThd(child);
  RandDelay();
  p = malloc(…);
Child
  Init();
  RandDelay();
  DoMoreWork();
  RandDelay();
  p->f ++;
What is a Randomized Algorithm?
A randomized algorithm:
◦ Not “an algorithm that makes nondeterministic choices”
◦ But an algorithm using a random source with a precisely defined distribution
A probabilistic guarantee:
◦ Not “a guarantee that doesn’t always hold”
◦ But a lower bound on the probability of success
What we did / Talk Outline
1. Define bug depth in such a way that common bugs have low depth
2. Develop PCT algorithm (probabilistic concurrency testing), a randomized scheduling algorithm with a good probabilistic guarantee to find bugs of low depth
3. Build it into Cuzz, a concurrency fuzzing tool that improves the efficiency of stress testing
BUG DEPTH
Part I
Bug Depth
Bug Depth = the number of ordering constraints a schedule has to satisfy to find the bug.
More constraints means more things have to go “just right” to find the bug.
Conjecture: many typical bugs have low depth.
Let’s look at 3 examples.
Ordering Violation Example: A Bug of Depth 1
Bug depth = the number of ordering constraints sufficient to find the bug.
All schedules that satisfy the one ordering constraint find the bug: the child’s p->f ++ runs before the parent’s p = malloc().
Parent Thread
  …
  start(child);
  p = malloc();
  …
Child Thread
  …
  do_init();
  p->f ++;
  …
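The depth-1 example can be checked by brute force. The sketch below is illustrative only: it enumerates every interleaving of the two threads, discards those in which the child runs before it is started, and counts the schedules that satisfy the single ordering constraint. The helper names (`interleavings`, `is_feasible`, `hits_bug`) are assumptions made for this illustration and do not come from the talk.

```python
from itertools import combinations

# Steps of the depth-1 example, one list per thread, in program order.
PARENT = ["start(child)", "p = malloc()"]
CHILD = ["do_init()", "p->f++"]

def interleavings(parent, child):
    """Yield every schedule that preserves each thread's program order."""
    total = len(parent) + len(child)
    for parent_slots in combinations(range(total), len(parent)):
        p_iter, c_iter = iter(parent), iter(child)
        yield [next(p_iter) if i in parent_slots else next(c_iter)
               for i in range(total)]

def is_feasible(schedule):
    # The child cannot execute anything before the parent has started it.
    return schedule.index("start(child)") < schedule.index("do_init()")

def hits_bug(schedule):
    # The single ordering constraint: the dereference precedes the malloc.
    return schedule.index("p->f++") < schedule.index("p = malloc()")

feasible = [s for s in interleavings(PARENT, CHILD) if is_feasible(s)]
buggy = [s for s in feasible if hits_bug(s)]
print(len(feasible), "feasible schedules,", len(buggy), "expose the bug")
```

Exactly one constraint separates the buggy schedules from the rest, which is what depth 1 means here.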
Atomicity Violation Example: A Bug of Depth 2
Bug depth = the number of ordering constraints sufficient to find the bug.
All schedules that satisfy both ordering constraints find the bug: the child’s p = null runs after the parent’s p != null check and before the parent’s p->f++.
Parent Thread
  p = malloc();
  start(child);
  …
  if (p != null)
    p->f++;
  …
Child Thread
  …
  p = null;
  …
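The atomicity violation can also be explored on a toy model. The following sketch is only an illustration under simplifying assumptions: each statement is one atomic step, the elided code ("…") is ignored, and the function `outcome` is a hypothetical interpreter for the shared pointer written for this example.

```python
from collections import Counter
from itertools import combinations

PARENT = ["p = malloc()", "start(child)", "if (p != null)", "p->f++"]
CHILD = ["p = null"]

def interleavings(parent, child):
    """Yield every schedule that preserves each thread's program order."""
    total = len(parent) + len(child)
    for parent_slots in combinations(range(total), len(parent)):
        p_iter, c_iter = iter(parent), iter(child)
        yield [next(p_iter) if i in parent_slots else next(c_iter)
               for i in range(total)]

def outcome(schedule):
    """Run one schedule on a tiny model of the shared pointer p."""
    p, started, guard_passed = None, False, False
    for step in schedule:
        if step == "p = malloc()":
            p = {"f": 0}
        elif step == "start(child)":
            started = True
        elif step == "p = null":
            if not started:
                return "infeasible"  # child ran before it was started
            p = None
        elif step == "if (p != null)":
            guard_passed = p is not None
        elif step == "p->f++" and guard_passed:
            if p is None:
                return "crash"       # null dereference after a passed check
            p["f"] += 1
    return "ok"

counts = Counter(outcome(s) for s in interleavings(PARENT, CHILD))
print(dict(counts))
```

The only crashing schedule places `p = null` between the check and the dereference, so two ordering constraints must hold at once.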
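Deadlock Example: A Bug of Depth 2
Bug depth = the number of ordering constraints sufficient to find the bug.
All schedules that satisfy both ordering constraints find the bug: the parent’s Lock(A) runs before the child’s Lock(A), and the child’s Lock(B) runs before the parent’s Lock(B).
Parent Thread
  …
  Lock(A);
  …
  Lock(B);
  …
Child Thread
  …
  Lock(B);
  …
  Lock(A);
  …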
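A small state-space search shows which schedules of the lock example end in deadlock. This is a sketch under stated assumptions, not the talk's tooling: locks are non-reentrant, the elided code is ignored, and a thread is assumed to release everything it holds once it has taken both locks. The `explore` function and the `PROGRAM` table are hypothetical names.

```python
# Lock acquisition order per thread, as in the example.
PROGRAM = {"parent": ["A", "B"], "child": ["B", "A"]}

def explore(pc, owner, trace, results):
    """Depth-first search over all schedules; record how each one ends."""
    runnable = [t for t in PROGRAM if pc[t] < len(PROGRAM[t])]
    if not runnable:
        results.append(("done", trace))
        return
    # A thread is enabled only if the lock it wants next is free.
    enabled = [t for t in runnable if owner.get(PROGRAM[t][pc[t]]) is None]
    if not enabled:
        results.append(("deadlock", trace))
        return
    for t in enabled:
        lock = PROGRAM[t][pc[t]]
        new_pc = {**pc, t: pc[t] + 1}
        new_owner = {**owner, lock: t}
        if new_pc[t] == len(PROGRAM[t]):
            # Assumption: a finished thread releases all of its locks.
            new_owner = {l: o for l, o in new_owner.items() if o != t}
        explore(new_pc, new_owner, trace + [(t, lock)], results)

results = []
explore({"parent": 0, "child": 0}, {}, [], results)
deadlocks = [trace for kind, trace in results if kind == "deadlock"]
print(len(results), "schedules,", len(deadlocks), "deadlock")
```

Every deadlocking trace has each thread holding its first lock before the other thread asks for it, which corresponds to the two ordering constraints of this depth-2 bug.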
THE PCT ALGORITHM
Part II
PCT Algorithm: Randomly Assign & Change Thread Priorities
Input:
  int k;         // no. of steps - guessed from previous runs
  int d;         // target bug depth - randomly chosen
State:
  int pri[];     // thread priorities
  int change[];  // when to change priorities
  int stepCnt;   // current step count
PCT::Init() {
  stepCnt = 0;
  foreach tid
    pri[tid] = rand() + d;        // initial priorities are all >= d
  for( i=0; i<d-1; i++ )
    change[i] = 1 + rand() % k;   // change points lie in 1..k, because stepCnt is incremented before it is compared
}
PCT::RandDelay( tid ) {
  stepCnt ++;
  if stepCnt == change[i] for some i
    pri[tid] = i;                 // lowered priorities are all < d
  if (tid is not highest pri enabled thread)
    spin;
}
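The priority scheme can be tried out on a toy program. The sketch below is a minimal simulation, not the Cuzz implementation, and it makes several assumptions: threads are modelled as lists of step labels, a thread counts as enabled until it runs out of steps (no blocking), initial priorities are a random permutation of d, d+1, …, and a priority change is applied right after the step that hits a change point. `PROGRAM`, `pct_schedule`, `finds_bug`, and `measure` are hypothetical names.

```python
import random

# Hypothetical program with a depth-2 bug: it fires only when thread B's
# "clear" lands between thread A's "check" and "use".
PROGRAM = {"A": ["check", "use"], "B": ["clear"]}

def pct_schedule(program, d, k, rng):
    """Return one schedule, a list of (thread, step), chosen PCT-style."""
    threads = list(program)
    initial = list(range(d, d + len(threads)))  # distinct priorities >= d
    rng.shuffle(initial)
    pri = dict(zip(threads, initial))
    change = [rng.randint(1, k) for _ in range(d - 1)]  # d-1 change points
    pc = {t: 0 for t in threads}
    schedule, step_cnt = [], 0
    while True:
        enabled = [t for t in threads if pc[t] < len(program[t])]
        if not enabled:
            return schedule
        tid = max(enabled, key=lambda t: pri[t])  # highest priority runs
        schedule.append((tid, program[tid][pc[tid]]))
        pc[tid] += 1
        step_cnt += 1
        for i, point in enumerate(change):
            if step_cnt == point:
                pri[tid] = i  # drop below every initial priority

def finds_bug(schedule):
    return [step for _, step in schedule] == ["check", "clear", "use"]

def measure(runs, seed=0):
    """Fraction of runs whose schedule exposes the bug."""
    rng = random.Random(seed)
    hits = sum(finds_bug(pct_schedule(PROGRAM, d=2, k=3, rng=rng))
               for _ in range(runs))
    return hits / runs

print(measure(6000))
```

In this toy program the bug is found only when A starts with the higher priority and the single change point falls on step 1, which happens with probability 1/2 · 1/3 = 1/6.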
The PCT Guarantee
Given a program with
◦ n threads (~tens)
◦ k steps (~millions)
◦ a bug of depth d (1,2)
Each run, PCT finds the bug with a probability p of at least
$p \ge \frac{1}{n \cdot k^{d-1}}$
(this is a worst-case guarantee)
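To get a feel for the bound, it helps to plug in numbers. The sketch below evaluates the lower bound 1/(n·k^(d−1)) and, assuming that separate runs draw independent random choices, computes how many runs are enough to find the bug with a chosen confidence. The function names are hypothetical, and the sample values (10 threads, one million steps) are only illustrative picks within the ranges given above.

```python
import math

def pct_lower_bound(n, k, d):
    """Worst-case per-run probability of finding a depth-d bug."""
    return 1.0 / (n * k ** (d - 1))

def runs_needed(p, confidence):
    """Smallest number of independent runs that finds the bug with
    probability >= confidence, given per-run success probability p."""
    if p >= 1.0:
        return 1
    # Solve 1 - (1 - p)^r >= confidence for the integer r.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

for d in (1, 2):
    bound = pct_lower_bound(n=10, k=10**6, d=d)
    print("d =", d, "bound =", bound, "runs for 99% =", runs_needed(bound, 0.99))
```

For depth 1 the bound does not depend on k at all; each extra unit of depth costs a factor of k in the worst case.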
THE CUZZ TOOL & RESULTS
Part III
How it Works
Intercept at synchronization points
◦ Detour win32 synchronization calls
◦ Optionally instrument data accesses
◦ No manual instrumentation required
[Diagram components: Program; Cuzz Randomized Algorithm; Win32 API; Kernel Scheduler; binary instrumentation for data accesses (optional)]
Some Results
Practice Beats Worst-Case
Measured Probability often significantly better than worst-case guaranteed probability
Why Does Practice Beat Worst-Case?
Worst-case guarantee applies to hardest-to-find bug of given depth
If bugs can be found in multiple ways, probabilities add up!
Example: Increasing the number of threads helps:
[Figure: Measured Probability (y-axis, 0 to 0.02) vs. Number of Threads (x-axis: 2, 3, 5, 9, 17, 33, 65)]
Internal Tool Status
The Cuzz tool is available internally at Microsoft
We are working with several product groups that actively use Cuzz to improve their stress testing
DEMO
Demo Conclusion
Measure probabilities on cluster
◦ Without Cuzz: 1 Fail in 238,820 runs, ratio = 0.000004817
◦ With Cuzz: 12 Fails in 320 runs, ratio = 0.0375
◦ Resource Savings: factor 7,800
1 day of stress testing = 11 seconds of Cuzz testing
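The savings factor and the "11 seconds" figure follow from simple arithmetic. The sketch below takes the two failure ratios as reported above as its inputs; the variable names are chosen for this illustration.

```python
# Failure ratios as reported in the demo.
ratio_without_cuzz = 0.000004817
ratio_with_cuzz = 12 / 320

# How many times more likely a single run is to fail under Cuzz.
savings_factor = ratio_with_cuzz / ratio_without_cuzz

# One day of plain stress testing, divided by the rounded factor of 7,800.
seconds_per_day = 24 * 60 * 60
equivalent_seconds = seconds_per_day / 7800

print(round(savings_factor), round(equivalent_seconds, 1))
```

Rounded to the nearest hundred, the ratio of the two failure rates gives the quoted factor, and a day divided by that factor is about 11 seconds.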
Conclusions
◦ Bug depth is a useful metric to focus testing efforts
◦ Systematic randomization improves concurrency testing
◦ No reason not to use Cuzz
Thank You For Your Attention.