IBM Research, IBM Haifa Labs
© 2007 IBM Corporation
Cross-Entropy Based Testing
Hana Chockler, Benny Godlin, Eitan Farchi, Sergey Novikov
IBM Research, Haifa, Israel
The problem: how to test for rare problems in large programs?
o Testing involves running the program many times, hoping to find the problem.
o If a problem appears only in a small fraction of the runs, it is unlikely to be found during random executions.
o It is like searching for a needle in a haystack.
The main idea: use the cross-entropy method!
The cross-entropy method is a widely used approach to estimating probabilities of rare events (Rubinstein).
The cross-entropy method - motivation
The problem:
o There is a probability space S with probability distribution f and a performance function P defined on it.
o A rare event e is that P(s) > r, for some s ∈ S and some threshold r.
o How can we estimate the probability of e?
(Illustration: a point s in the space S at which the rare event e occurs; this happens very rarely under f.)
The naïve idea
Generate a big enough sample from the probability space and compute the probability of the rare event from the inputs in the sample.
This won't work: for very rare events, even a very large sample does not reflect the probability correctly.
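To make this concrete, here is a minimal sketch of the naive estimator; the event probability (10^-6) and sample size (10,000) are made-up illustrative numbers, not from the slides.

```java
import java.util.Random;

// Minimal sketch of the naive approach: estimate the probability of a rare
// event from a finite sample. With p = 1e-6, a sample of 10,000 runs almost
// always contains zero occurrences, so the estimate comes out as 0.
public class NaiveEstimate {
    static double estimate(double trueP, int sampleSize, long seed) {
        Random rng = new Random(seed);      // fixed seed for reproducibility
        int hits = 0;
        for (int i = 0; i < sampleSize; i++) {
            if (rng.nextDouble() < trueP) hits++;   // did the rare event occur?
        }
        return (double) hits / sampleSize;  // empirical probability estimate
    }

    public static void main(String[] args) {
        // Expected number of hits is only 0.01, so the estimate is almost
        // certainly 0, far from the true value of 1e-6.
        System.out.println(estimate(1e-6, 10_000, 42));
    }
}
```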
The cross-entropy method
Wishful thinking: if we had a distribution that gives the good inputs (w.r.t. the performance function) probability 1, we would be all set. But we don't have such a distribution.
So we try to approximate it in iterations, every time trying to come a little closer:
o In each iteration, we generate a sample of some (large) size.
o We update the parameters (the probability distribution) so that we get a better sample in the next iteration.
Formal definition of cross-entropy
In information theory, the cross entropy of two probability distributions p and q (closely related to the Kullback-Leibler divergence) measures the average number of bits needed to identify an event from a set of possibilities if a coding scheme based on a given distribution q is used, rather than the "true" distribution p.
The cross entropy for two distributions p and q over the same discrete probability space is defined as follows:
H(p,q) = - Σ_x p(x) log(q(x))
Note that this is not really a distance, because it is not symmetric.
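As a quick illustration, the definition can be computed directly; evaluating it in both directions shows the asymmetry. The distributions p and q below are made-up example values.

```java
// Direct computation of H(p, q) = -sum_x p(x) log q(x) for two discrete
// distributions given as probability arrays. Example values are made up.
public class CrossEntropy {
    static double crossEntropy(double[] p, double[] q) {
        double h = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0) {               // terms with p(x) = 0 contribute 0
                h -= p[i] * Math.log(q[i]);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.9, 0.1};
        System.out.println(crossEntropy(p, q));  // ~1.204 nats
        System.out.println(crossEntropy(q, p));  // ~0.693 nats: not symmetric
    }
}
```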
The cross-entropy method for optimization problems
In optimization problems, we are looking for inputs that maximize the performance function. The main problem is that this maximum is unknown beforehand. The stopping point is when the sample has a small relative standard deviation.
The method was successfully applied to a variety of graph optimization problems [Rubinstein]:
o MAX-CUT
o Traveling salesman
o ...
Illustration
(Figure: the performance function plotted twice: under the uniform distribution at the starting point, and under the updated distribution after an iteration.)
The setting in graphs
In graph problems, we have the following:
o The space is all paths in the graph G.
o A performance function f gives each path a value.
o We are looking for a path that maximizes f.
In each iteration, we choose the best part Q of the sample. The probability update formula for an edge e = (v,w) is:
f'(e) = (#paths in Q that use e) / (#paths in Q that go via v)
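The update formula can be sketched directly. Paths here are arrays of node ids; this representation is illustrative, not the tool's actual one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the edge-probability update: for each edge e = (v, w),
// f'(e) = (#paths in Q that use e) / (#paths in Q that go via v),
// where Q is the elite (best) part of the sample.
public class EdgeUpdate {
    static Map<String, Double> update(List<int[]> elite) {
        Map<String, Integer> edgeUse = new HashMap<>();
        Map<Integer, Integer> nodeUse = new HashMap<>();
        for (int[] path : elite) {
            for (int i = 0; i + 1 < path.length; i++) {
                edgeUse.merge(path[i] + "->" + path[i + 1], 1, Integer::sum);
                nodeUse.merge(path[i], 1, Integer::sum);  // paths going via v
            }
        }
        Map<String, Double> prob = new HashMap<>();
        for (Map.Entry<String, Integer> e : edgeUse.entrySet()) {
            int v = Integer.parseInt(e.getKey().split("->")[0]);
            prob.put(e.getKey(), (double) e.getValue() / nodeUse.get(v));
        }
        return prob;
    }

    public static void main(String[] args) {
        // Elite sample Q: two paths through a small DAG.
        Map<String, Double> p = update(List.of(new int[]{0, 1, 2}, new int[]{0, 2}));
        System.out.println(p.get("0->1"));  // 0.5: one of two paths via node 0
        System.out.println(p.get("1->2"));  // 1.0: the only path via node 1
    }
}
```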
Cross-entropy for testing
A program is viewed as a graph. Each decision point is a node in the graph. Decision points can result from any non-deterministic or otherwise not predetermined decisions: concurrency, coin tossing, inputs.
The performance function is defined according to the bug that we want to find.
o More on that later ...
Our implementation
We focus on concurrent programs.
o A program under test is represented as a graph, with nodes being the synchronization points.
o Edges are possible transitions between nodes.
o The graph is assumed to be a DAG: all loops are unwound.
o The graph is constructed on-the-fly during the executions.
o The initial probability distribution is uniform among edges.
o We collect a sample of several hundred executions.
o We adjust the probabilities of edges according to the formula.
o We repeat the process until the sample has a very small relative standard deviation (1-5%).
Note: this works only if there is a correct locking policy.
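The stopping test at the end of the loop can be sketched as a relative-standard-deviation check. The 1-5% threshold is the one mentioned above; the rest of the names are illustrative.

```java
// Sketch of the stopping criterion: iterate until the relative standard
// deviation (std dev / mean) of the sample's performance values falls
// below a small threshold, e.g. the 1-5% range mentioned above.
public class StoppingCriterion {
    static double relativeStdDev(double[] values) {
        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;
        double var = 0;
        for (double v : values) var += (v - mean) * (v - mean);
        var /= values.length;                 // population variance
        return Math.sqrt(var) / mean;
    }

    static boolean converged(double[] values, double threshold) {
        return relativeStdDev(values) < threshold;
    }

    public static void main(String[] args) {
        // Nearly identical performance values: converged at a 5% threshold.
        System.out.println(converged(new double[]{100, 101, 99, 100}, 0.05));
        // Widely spread values: keep iterating.
        System.out.println(converged(new double[]{1, 100, 3, 50}, 0.05));
    }
}
```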
Dealing with loops
Unwinding all loops creates a huge graph. Problems with huge graphs:
o Takes more space to represent.
o Takes more time to converge.
We assume that most of the time, we are doing the same thing on subsequent iterations of the loop.
We introduce a modulo parameter. It reduces the size of the graph dramatically, but also loses information. There is a balance between a too-small and a too-large modulo parameter that is found empirically.
For instance, modulo 2 creates two nodes for each location inside the loop, one for even and one for odd iterations (i mod 2):

for i = 1 to 100 do
  sync node;
end for

becomes two graph nodes: "sync node (odd)" and "sync node (even)".
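Under this scheme, a synchronization point inside a loop maps to one of m graph nodes according to the iteration counter. A minimal sketch with illustrative naming:

```java
// Sketch: with modulo parameter m, a synchronization point inside a loop
// becomes m graph nodes, one per residue class of the iteration counter.
// For m = 2 this yields the odd/even split shown above.
public class ModuloNode {
    static String nodeId(String location, int iteration, int m) {
        return location + "#" + (iteration % m);
    }

    public static void main(String[] args) {
        System.out.println(nodeId("sync", 3, 2));  // sync#1 (odd iterations)
        System.out.println(nodeId("sync", 4, 2));  // sync#0 (even iterations)
        // Iterations 3 and 5 collapse to the same node, losing the
        // distinction between them: this is the information trade-off.
        System.out.println(nodeId("sync", 5, 2));  // sync#1
    }
}
```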
Bugs and performance functions

bug                  | performance function
---------------------|-------------------------------------
buffer overflow      | number of elements in the buffer
deadlock             | number of locks
data race            | number of accessed shared resources
testing error paths  | number of error paths taken

Note that we can also test for patterns, not necessarily bugs.
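For instance, the buffer-overflow row of the table could be realized by scoring an execution by the maximum buffer occupancy it reached. A minimal sketch; the trace representation is made up.

```java
// Illustrative performance function for the buffer-overflow case: score an
// execution trace by the maximum number of elements the buffer held at any
// point. Higher-scoring executions steer sampling toward overflow.
public class BufferPerformance {
    static int performance(int[] occupancyTrace) {
        int max = 0;
        for (int v : occupancyTrace) max = Math.max(max, v);
        return max;
    }

    public static void main(String[] args) {
        // A trace of buffer sizes observed during one execution.
        System.out.println(performance(new int[]{0, 1, 2, 1, 3, 2}));  // 3
    }
}
```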
Implementation – in Java for Java
(Architecture diagram: the instrumented program under test; Stopper, Decider, Evaluator, and Updater components; a probability distribution table stored on disk.)
Experimental results
o We ran ConCEnter on several examples with buffer overflow and with deadlocks.
o The bugs were very rare and did not manifest themselves in random testing. ConCEnter found the bugs successfully.
o The method requires significant tuning: the modulo parameter, the smoothing parameter, correct definition of the performance function, etc.

Example: A-B-push-pop. Each thread has one of two types, A or B; under random scheduling, the probability of stack overflow is exponentially small.

myName = A  // or B, there are two types
loop:
  if (top_of_stack = myName) pop;
  else push(myName);
end loop;
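The schedule dependence can be seen in a deterministic simulation of the example (a sketch: each step runs one thread, which pops if its own letter is on top of the stack and pushes it otherwise).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Deterministic simulation of A-B-push-pop: at each step the scheduled
// thread pops if its own letter is on top of the stack, else pushes it.
// A strictly alternating schedule grows the stack on every step, but such
// schedules are exponentially unlikely under random scheduling, which is
// why the overflow is so hard to hit by chance.
public class PushPopSim {
    static int depthAfter(String schedule) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char t : schedule.toCharArray()) {
            if (!stack.isEmpty() && stack.peek() == t) stack.pop();
            else stack.push(t);
        }
        return stack.size();
    }

    public static void main(String[] args) {
        System.out.println(depthAfter("ABABABAB"));  // 8: stack grows each step
        System.out.println(depthAfter("AABB"));      // 0: pushes cancel out
    }
}
```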
Future work
o Automatic tuning.
o Making ConCEnter plug-and-play for some predefined bugs.
o Replay: can we use distance from a predefined execution as a performance function? (This works already.)
o Second best: what if there are several areas in the graph where the maximum is reached?
o What are the restrictions on the performance function in order for this method to work properly? (It seems that the function should be smooth enough.)
Related work
Testing (nothing specifically targeted to rare bugs):
o Random testing
o Stress testing
o Noise makers
o Coverage estimation
o Bug-specific heuristics
o Genetic algorithms
o ...
Cross-entropy applications (cross-entropy is useful in many areas):
o Buffer allocation, neural computation, DNA sequence alignment, scheduling, graph problems, ...