
Search-Based Testing

Phil McMinn

University of Sheffield, UK

Overview

How and why Search-Based Testing works

Examples

Search-Based Test Data Generation

Future Directions

Testability Transformation

Input Domain Reduction

Empirical and Theoretical Studies

Temporal, Functional, Structural

Applications in Mutation Testing

Test suite prioritisation

Acknowledgements

The material in some of these slides has kindly been provided by:

Mark Harman (KCL/UCL)

Joachim Wegener (Berner & Mattner)

Conventional Testing

manual design of test cases / scenarios ... the hard part!

laborious, time-consuming

tedious

difficult! (where are the faults?)

Random Test Data Generation

[Figure: inputs sampled at random across the input domain]

Search-Based Testing is an automated search of a potentially large input space

the search is guided by a problem-specific 'fitness function'

the fitness function guides the search to the test goal

Fitness Function

The fitness function scores different inputs to the system according to the test goal:

which ones are 'good' (that we should develop/evolve further)

which ones are useless (that we can forget about)

Fitness-guided search

[Figure: a fitness landscape, with fitness plotted against input; the search follows rising fitness towards the test goal]

First publication on SBST

Webb Miller and David Spooner. Automatic Generation of Floating-Point Test Data. IEEE Transactions on Software Engineering, 1976.

Miller was the winner of the 2009 Accomplishment by a Senior Scientist Award of the International Society for Computational Biology, and one of the 2009 Time 100 Scientists and Thinkers

Publications since 1976

[Chart: number of SBST publications since 1976. Source: SEBASE publications repository, http://www.sebase.org]

International Search-Based Testing Workshop

co-located with ICST; 4th event at ICST 2011 next year

check websites for submission deadlines etc.:

http://sites.google.com/site/icst2011workshops/

http://sites.google.com/site/icst2011/

International Symposium on Search-Based Software Engineering

Benevento, Italy, 7th-9th September 2010: www.ssbse.org

2011: co-location with FSE, Szeged, Hungary

Phil McMinn, General Chair

Myra Cohen and Mel Ó Cinnéide, Program Chairs

Fitness Functions

Often easy to define: we often define metrics already, and they need not be complex

Daimler Temporal Testing

[Figure: vehicle deceleration plotted over time t, between tmin and tmax, showing the optimal point in time for triggering the airbag igniter]

Fitness = duration

J. Wegener and M. Grochtmann. Verifying timing constraints of real-time systems by means of evolutionary testing. Real-Time Systems, 15(3):275– 298, 1998.
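As a sketch of what such a temporal fitness function amounts to (the system under test and its inputs here are hypothetical placeholders, not the Daimler code), fitness can simply be the measured execution time, which the search maximises to hunt for timing-constraint violations:

```python
import time

def duration_fitness(system_under_test, inputs):
    """Fitness for temporal testing: the measured execution duration.

    Maximised to search for worst-case execution times (e.g. an airbag
    igniter triggered too late); minimised to search for best-case times.
    `system_under_test` is a placeholder for the real task under test.
    """
    start = time.perf_counter()
    system_under_test(*inputs)
    return time.perf_counter() - start
```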

Conventional testing: manual design of test cases / scenarios is laborious, time-consuming, tedious and difficult! (where are the faults?)

Search-Based Testing: automatic. It may sometimes be time-consuming, but it is not a human's time being consumed

Search-Based Testing: a good fitness function will lead the search to the faults

Generating vs Checking

Conventional software testing research: write a method to construct test cases

Search-Based Testing: write a fitness function to determine how good a test case is

Daimler Autonomous Parking System

[Figure: the parking scenario is defined by the inputs psi, gap, dist2space, space length and space width; each simulated manoeuvre is assigned a fitness]

[Figure: evolved parking scenarios at Generations 0, 10 and 20: later generations contain critical situations and, finally, a collision]

Test setup, usual approach to testing: manually generated input situations are fed through the simulation environment to the software under test, and the output of the test is inspected

Test setup, Search-Based Testing approach: a search algorithm automatically generates inputs for the software under test in the simulation environment; outputs (converted to fitness values) feed back into the search in a feedback loop

O. Bühler and J. Wegener. Evolutionary functional testing. Computers & Operations Research, 2008.

O. Buehler and J. Wegener. Evolutionary functional testing of an automated parking system. In International Conference on Computer, Communication and Control Technologies and the 9th International Conference on Information Systems Analysis and Synthesis, Orlando, Florida, USA, 2003.

Structural testing

The fitness function analyses the outcome of decision statements and the values of variables in predicates

More later ...

Assertion testing

assertion condition: speed < 150mph

fitness function: f = 150 - speed

fitness is minimised; if f reaches zero or less, a fault has been found

Bogdan Korel and Ali M. Al-Yami. Assertion-Oriented Automated Test Data Generation. ICSE 1996, pp. 71-80.
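A minimal sketch of this fitness function (the 150mph bound comes from the slide; the surrounding search is omitted):

```python
def assertion_fitness(speed):
    """Fitness for the assertion `speed < 150`.

    f = 150 - speed is minimised by the search; if f reaches zero or
    less, the input violates the assertion and a fault has been found.
    """
    return 150 - speed

assert assertion_fitness(149) > 0   # assertion still holds
assert assertion_fitness(150) <= 0  # assertion violated: fault found
```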

Search Techniques

Hill Climbing

[Figure: the search climbs the fitness landscape, repeatedly moving to a better neighbouring input]

No better solution in the neighbourhood: stuck at a local optimum

Hill Climbing - Restarts

[Figure: the climb is restarted from fresh random points, allowing the search to escape local optima]
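A minimal sketch of hill climbing with random restarts, assuming the ingredients discussed later (a fitness function to minimise, a neighbourhood, and a way to sample random inputs) are supplied as callables:

```python
def hill_climb_with_restarts(fitness, neighbours, random_input, restarts=10):
    """Minimising hill climb; restarting from fresh random points
    allows the search to escape local optima."""
    best = None
    for _ in range(restarts):
        current = random_input()
        while True:
            # Move to the best near neighbour, if it improves on the
            # current point.
            candidate = min(neighbours(current), key=fitness)
            if fitness(candidate) >= fitness(current):
                break  # no better solution in the neighbourhood
            current = candidate
        if best is None or fitness(current) < fitness(best):
            best = current
    return best
```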

Simulated Annealing

[Figure: the search's trajectory across the fitness landscape]

Worse solutions are temporarily accepted, allowing the search to escape local optima
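A sketch of the acceptance rule that distinguishes simulated annealing from hill climbing (the neighbour function and cooling schedule here are illustrative assumptions):

```python
import math
import random

def simulated_annealing(fitness, neighbour, start,
                        temperature=100.0, cooling=0.95, steps=1000):
    """Minimising simulated annealing: worse solutions are temporarily
    accepted with probability exp(-delta / temperature), which shrinks
    as the temperature cools."""
    current = start
    for _ in range(steps):
        candidate = neighbour(current)
        delta = fitness(candidate) - fitness(current)
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            current = candidate  # accept improvements, sometimes worsenings
        temperature *= cooling
    return current
```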

Evolutionary Algorithms

[Figure: a population of candidate inputs spread across the fitness landscape]

inspired by Darwinian evolution and the concept of survival of the fittest

Crossover

Mutation

Crossover

Parent 1: (a, b, c, d) = (10, 10, 20, 40)

Parent 2: (a, b, c, d) = (20, -5, 80, 80)

Crossing over after the second gene exchanges the tails of the two parents:

Offspring 1: (10, 10, 80, 80)

Offspring 2: (20, -5, 20, 40)

Mutation

(a, b, c, d) = (10, 10, 20, 40)

a randomly selected gene is given a new value, e.g. d: 40 becomes 20, giving (10, 10, 20, 20)
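The two operators might be sketched as follows (gene ranges and the mutation rate are illustrative assumptions):

```python
import random

def single_point_crossover(parent1, parent2):
    """Exchange the genes after a randomly chosen crossover point.
    With the point after the second gene, parents [10, 10, 20, 40] and
    [20, -5, 80, 80] yield [10, 10, 80, 80] and [20, -5, 20, 40]."""
    point = random.randint(1, len(parent1) - 1)
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(individual, low=-100, high=100, rate=0.25):
    """Replace each gene with a random new value with probability `rate`."""
    return [random.randint(low, high) if random.random() < rate else gene
            for gene in individual]
```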

Evolutionary Testing

The evolutionary cycle: Selection, Crossover, Mutation, Fitness Evaluation, Insertion ... End?

Fitness evaluation involves the execution of the test cases, with monitoring to collect the information the fitness function needs
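A compact sketch of one way this cycle can be arranged (population size, selection scheme and termination are illustrative choices, not a specific tool's setup):

```python
import random

def evolutionary_testing(fitness, random_individual, crossover, mutate,
                         pop_size=50, generations=100):
    """Selection -> crossover -> mutation -> fitness evaluation ->
    insertion, repeated until the budget ends or the goal is reached.
    Fitness is minimised; 0 means the test goal has been covered."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break  # test goal covered
        parents = population[:pop_size // 2]  # selection: the fitter half
        offspring = []
        while len(offspring) < pop_size // 2:
            child1, child2 = crossover(*random.sample(parents, 2))
            offspring += [mutate(child1), mutate(child2)]
        # Insertion: offspring replace the less fit half.
        population = parents + offspring[:pop_size // 2]
    return min(population, key=fitness)
```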

Which search method?

Depends on the characteristics of the search landscape

Some landscapes are hard for some searches but easy for others ... and vice versa ...

more on this later...

Ingredients for Search-Based Testing

Representation: a method of encoding all possible inputs. Usually straightforward, since inputs are already in data structures

Fitness Function: a transformation of the test goal into a numerical function, whose values indicate how 'good' an input is

Neighbourhood: part of our understanding of the problem; we need to know our near neighbours

More search algorithms

Tabu Search

Estimation of Distribution Algorithms

Particle Swarm Optimisation

Ant Colony Optimisation

Genetic Programming

SBST Surveys & Reviews

Phil McMinn: Search-based software test data generation: a survey. Software Testing, Verification and Reliability 14(2): 105-156, 2004

Wasif Afzal, Richard Torkar and Robert Feldt: A Systematic Review of Search-based Testing for Non-Functional System Properties. Information and Software Technology, 51(6):957-976, 2009

Shaukat Ali, Lionel Briand, Hadi Hemmati and Rajwinder Panesar-Walawege: A Systematic Review of the Application and Empirical Investigation of Search-Based Test-Case Generation. IEEE Transactions on Software Engineering, To appear, 2010

Getting started in SBSE

M. Harman and B. Jones: Search-based software engineering. Information and Software Technology, 43(14):833-839, 2001.

M. Harman: The Current State and Future of Search Based Software Engineering, In Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), 20-26 May, Minneapolis, USA (2007)

D. Whitley: An overview of evolutionary algorithms: Practical issues and common pitfalls. Information and Software Technology, 43(14):817–831, 2001.

More applications of Search-Based Testing

Mutation Testing

[Figure: an original program, two first-order mutants derived from it, and a higher order mutant combining both changes]

Fewer Equivalent Mutants

A. J. Offutt. Investigations of the software testing coupling effect. ACM Transactions on Software Engineering and Methodology, 1(1):5-20, Jan. 1992.

Finding Good HOMs

Due to the large number of potential HOMs, finding the ones that are most valuable is hard

We want:

Subtle mutants: HOMs that are hard to kill, corner cases where undiscovered faults reside

Reduced test effort: HOMs that subsume as many first-order mutants as possible

Subtle Mutants - Fitness Function

fitness = the ratio of the killability of the HOM to the killability of its constituent FOMs

> 1: the HOM is weaker than its FOMs

< 1: the HOM is stronger than its FOMs

= 0: a potentially equivalent mutant

Y. Jia and M. Harman. Constructing subtle faults using higher order mutation testing. In 8th International Working Conference on Source Code Analysis and Manipulation (SCAM 2008), Beijing, China, 2008. IEEE Computer Society.
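Interpreting 'killability' as the fraction of the test suite that kills a mutant, the ratio might be computed as below; a sketch under that assumption (Jia and Harman's exact formulation may differ in detail), which also assumes at least one constituent FOM is killable:

```python
def killability(kill_set, suite_size):
    """Fraction of the test suite that kills a mutant."""
    return len(kill_set) / suite_size

def hom_fitness(hom_kill_set, fom_kill_sets, suite_size):
    """Ratio of the HOM's killability to that of its constituent FOMs:
    > 1 weaker than its FOMs, < 1 stronger (subtler), 0 potentially
    equivalent."""
    # Tests that kill any of the constituent first-order mutants.
    foms_killed = set.union(*fom_kill_sets)
    return (killability(hom_kill_set, suite_size) /
            killability(foms_killed, suite_size))
```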

Time aware test suite prioritisation

K. R. Walcott, M. L. Soffa, G. M. Kapfhammer, and R. S. Roos. Time aware test suite prioritization. In Proc. ISSTA, pages 1–11, 2006.

Suppose a 12 minute time budget

[Figure: average % of faults detected, comparing an ordering by the number of faults that can be detected, an ordering by time only, and an intelligent heuristic search, which finds the same number of faults in less time]

Fitness Function

The tester is unlikely to know the location of faults, so we need to estimate how likely a test is to find defects

the % of code coverage is used to estimate a suite's potential, set against the time taken to execute the test suite

Results

Mutations used to seed faults
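A sketch of such a fitness function for a candidate ordering (the 12 minute budget comes from the earlier slide; `coverage` and `duration` are assumed lookup tables built by profiling each test):

```python
def prioritisation_fitness(ordering, coverage, duration, budget=12 * 60):
    """Score a test ordering under a time budget (seconds).

    Code coverage stands in for fault-detection ability, since the
    tester does not know where the faults are; tests that no longer
    fit within the budget contribute nothing.
    """
    covered, elapsed = set(), 0.0
    for test in ordering:
        if elapsed + duration[test] > budget:
            break
        elapsed += duration[test]
        covered |= coverage[test]  # structural entities this test covers
    return len(covered)  # maximised by the search
```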

Multi-objective Search

Instead of combining the objectives into one fitness function, handle them as distinct goals: Time and Faults

[Figure: candidate solutions plotted against Fitness Function A and Fitness Function B; the non-dominated solutions trace out the Pareto front]

The Pareto Front
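The Pareto front can be characterised with a simple dominance check; a sketch, assuming all objectives are minimised and each solution is a tuple of objective values (e.g. time and faults missed):

```python
def dominates(a, b):
    """True if `a` Pareto-dominates `b`: at least as good on every
    objective and strictly better on at least one."""
    return (all(x <= y for x, y in zip(a, b)) and
            any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """The non-dominated solutions."""
    return [s for s in solutions
            if not any(dominates(other, s)
                       for other in solutions if other != s)]
```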

Some Results

Shin Yoo and Mark Harman. Pareto Efficient Multi-Objective Test Case Selection. Proceedings of the ACM International Symposium on Software Testing and Analysis (ISSTA 2007), pp. 140-150.

Three objectives

Other applications of Search-Based Testing

Combinatorial Interaction Testing

GUI Testing

M.B. Cohen, M.B. Dwyer and J. Shi. Interaction testing of highly-configurable systems in the presence of constraints. International Symposium on Software Testing and Analysis (ISSTA), London, July 2007, pp. 129-139.

X. Yuan, M.B. Cohen and A.M. Memon, GUI Interaction Testing: Incorporating Event Context, IEEE Transactions on Software Engineering, to appear

Other applications of Search-Based Testing

Stress testing

State machine testing

L. C. Briand, Y. Labiche, and M. Shousha. Stress testing real-time systems with genetic algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2005), pages 1021-1028, Washington DC, USA, 2005. ACM Press.

K. Derderian, R. Hierons, M. Harman, and Q. Guo. Automated unique input output sequence generation for conformance testing of FSMs. The Computer Journal, 39:331-344, 2006.

Search-Based Structural Test Data Generation

Covering a structure

[Figure: a control flow graph with a TARGET node; fitness evaluation reveals that the test data executes the 'wrong' path]

Analysing control flow

The outcomes at key decision statements matter: these are the decisions on which the target is control dependent

Approach Level

[Figure: approach level = 2, 1 and 0 at successive control-dependent decisions on the path to the TARGET; the search minimises this value]

Roy P. Pargas, Mary Jean Harrold and Robert Peck: Test-Data Generation Using Genetic Algorithms. Software Testing, Verification and Reliability, 9(4): 263-282 (1999)

Joachim Wegener, André Baresel, Harmen Sthamer: Evolutionary test environment for automatic structural testing. Information & Software Technology 43(14): 841-854 (2001)

Analysing predicates

The approach level alone gives only coarse values. For a predicate such as a == b, the inputs (a = 50, b = 0), (a = 45, b = 5), (a = 40, b = 10), (a = 35, b = 15), (a = 30, b = 20), (a = 25, b = 25) are getting 'closer' to making it true, but the approach level cannot tell them apart

Branch distance

Associate a distance formula with each kind of relational predicate:

a = 50, b = 0: branch distance = 50
a = 45, b = 5: branch distance = 40
a = 40, b = 10: branch distance = 30
a = 35, b = 15: branch distance = 20
a = 30, b = 20: branch distance = 10
a = 25, b = 25: branch distance = 0

getting 'closer' to being true

Branch distances for relational predicates

Nigel Tracey, John Clark and Keith Mander. The Way Forward for Unifying Dynamic Test-Case Generation: The Optimisation-Based Approach. 1998.

Bogdan Korel. Automated Software Test Data Generation. IEEE Trans. Software Eng. 1990

Webb Miller & D. Spooner. Automatic Generation of Floating-Point Test Data. IEEE Trans. Software Eng. 1976
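One common formulation of these distance formulae, in the style of Tracey et al. (K is the constant added when a predicate is 'just' false; the exact table varies between authors):

```python
K = 1  # penalty constant for a predicate that is just false

def branch_distance(op, lhs, rhs):
    """Distance to making `lhs op rhs` true: zero when it already is,
    otherwise a measure of how close it is to becoming true."""
    if op == '==':
        return abs(lhs - rhs)
    if op == '!=':
        return 0 if lhs != rhs else K
    if op == '<':
        return 0 if lhs < rhs else lhs - rhs + K
    if op == '<=':
        return 0 if lhs <= rhs else lhs - rhs
    if op == '>':
        return 0 if lhs > rhs else rhs - lhs + K
    if op == '>=':
        return 0 if lhs >= rhs else rhs - lhs
    raise ValueError(op)

# The slide's example, for the predicate a == b:
assert branch_distance('==', 50, 0) == 50
assert branch_distance('==', 25, 25) == 0
```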

Putting it all together

[Figure: nested decisions 'if a >= b', 'if b >= c', 'if c >= d'; the TARGET lies at the end of the true branches, with false branches leading away from it]

Fitness = approach level + normalised branch distance

TARGET MISSED at 'if a >= b': approach level = 2, branch distance = b - a

TARGET MISSED at 'if b >= c': approach level = 1, branch distance = c - b

TARGET MISSED at 'if c >= d': approach level = 0, branch distance = d - c

The normalised branch distance lies between 0 and 1, and indicates how close the current approach level is to being penetrated

Normalisation Functions

Since the 'maximum' branch distance is generally unknown, we need a non-standard normalisation function

Baresel (2000): norm(d) = 1 - alpha^(-d), with alpha = 1.001

Arcuri (2010): norm(d) = d / (d + beta), with beta = 1
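In code, the two normalisation functions and the combined fitness look like this (the formulae are as published by Baresel and Arcuri; the worked value reproduces the IGUANA example later in the deck):

```python
def normalise_baresel(d, alpha=1.001):
    """Baresel (2000): 1 - alpha**(-d), mapping any distance into [0, 1)."""
    return 1 - alpha ** -d

def normalise_arcuri(d, beta=1.0):
    """Arcuri (2010): d / (d + beta), also mapping into [0, 1)."""
    return d / (d + beta)

def fitness(approach_level, branch_distance):
    """Fitness = approach level + normalised branch distance (minimised)."""
    return approach_level + normalise_baresel(branch_distance)

# Approach level 0 with branch distance 10 gives
# 1 - 1.001**-10 = 0.009945..., as in the worked example below.
assert abs(fitness(0, 10) - 0.009945) < 1e-4
```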

Alternating Variable Method: 'Probe' moves

void fn( input1, input2, input3 .... )

each input variable is probed in turn with a small decrease and a small increase, looking for a direction of improvement

Alternating Variable Method: Accelerated hill climb

[Figure: fitness plotted against an input variable's value; once a probe finds a direction of improvement, ever larger moves are made in that direction]

Alternating Variable Method

1. Randomly generate a start point: a=10, b=20, c=30

2. 'Probe' moves on a: a=9, b=20, c=30 and a=11, b=20, c=30 ... no effect

3. 'Probe' moves on b: a=10, b=19, c=30 ... improved branch distance

4. Accelerated moves in the direction of improvement
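A sketch of the whole procedure, minimising a fitness function over a vector of integer inputs (the restart policy and iteration cap are illustrative assumptions):

```python
def avm(fitness, start, max_iterations=1000):
    """Alternating Variable Method (after Korel): probe each variable
    with +/-1 moves, then make accelerated moves, doubling the step,
    in any direction that improves fitness."""
    point = list(start)
    for _ in range(max_iterations):
        if fitness(point) == 0:
            break  # target covered
        improved = False
        for i in range(len(point)):
            for direction in (-1, 1):
                probe = point[:]
                probe[i] += direction  # probe move
                if fitness(probe) < fitness(point):
                    step = direction
                    while fitness(probe) < fitness(point):
                        point = probe  # accelerated moves: double the step
                        step *= 2
                        probe = point[:]
                        probe[i] += step
                    improved = True
                    break
        if not improved:
            break  # local optimum: in practice, restart from a random point
    return point
```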

Key Publications

Bogdan Korel. Automated Software Test Data Generation. IEEE Transactions on Software Engineering, 1990. (The alternating variable method)

S. Xanthakis, C. Ellis, C. Skourlas, A. Le Gall, S. Katsikas, and K. Karapoulios. Application of genetic algorithms to software testing (Application des algorithmes génétiques au test des logiciels). In 5th International Conference on Software Engineering and its Applications, pages 625-636, Toulouse, France, 1992. (Evolutionary structural test data generation)

A search-based test data generator tool

IGUANA: Input Generation Using Automated Novel Algorithms

IGUANA (Java) drives the test object (C code compiled to a DLL) through the Java Native Interface. The search algorithm generates inputs for the instrumented test object; information flowing back from the test object is used for fitness computation

A function for testing

[Example test object: if (a == b) { if (b == c) { return 1; /* TARGET */ } }]

Test Object Preparation

1. Parse the code and extract the control dependence graph: 'which decisions are key for the execution of individual structural targets?'

2. Instrument the code, for monitoring control flow and the values of variables in predicates

3. Map the inputs (a, b, c) to a vector. Straightforward in many cases; inputs composed of dynamic data structures are harder to compose

Kiran Lakhotia, Mark Harman and Phil McMinn. Handling Dynamic Data Structures in Search-Based Testing. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2008), Atlanta, Georgia, USA, July 12-16 2008, pp. 1759-1766, ACM Press.

Instrumentation

Each branching condition is replaced by a call to the function node(...); the instrumentation should only observe the program and not alter its behaviour

The first parameter is the control flow graph node ID of the decision statement

The second parameter is a boolean condition that replicates the structure in the original program (i.e. including short-circuiting)

Relational predicates are replaced with functions that compute branch distances

The instrumentation tells us which decision nodes were executed and their outcomes (branch distances)

Therefore we can find the decision at which control flow diverged from a target for an input ... compute the approach level from the control dependence graph ... and look up the branch distance ... giving the fitness value for the input

[Example: for the test object above, with input <20, 20, 30>, the instrumentation records:

NODE | true distance | false distance
1 | 0 | 1
2 | 10 | 0
4 | 20 | 0
....

Execution diverged at node 2: approach level 0, branch distance 10, fitness = 0.009945219]
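The shape of that instrumentation can be sketched in a few lines (Python standing in for the instrumented C; the names node and distance_eq are illustrative, not IGUANA's actual API):

```python
branch_distances = {}  # node id -> (distance to true, distance to false)

def node(node_id, true_dist, false_dist):
    """Stand-in for the call that replaces each branching condition:
    records branch distances for the decision node and returns the
    original boolean outcome, so behaviour is unchanged."""
    branch_distances[node_id] = (true_dist, false_dist)
    return true_dist == 0

def distance_eq(lhs, rhs):
    """Branch distances for `lhs == rhs` as (to true, to false)."""
    return abs(lhs - rhs), (0 if lhs != rhs else 1)

def test_object(a, b, c):
    # Instrumented form of: if a == b: if b == c: return 1  (the TARGET)
    if node(1, *distance_eq(a, b)):
        if node(2, *distance_eq(b, c)):
            return 1
    return 0

test_object(20, 20, 30)
# branch_distances == {1: (0, 1), 2: (10, 0)}: control flow diverged
# at node 2 with branch distance 10, matching the trace above.
```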

Testability Transformation

The 'Flag' Problem

Program Transformation

Programs will inevitably have features that heuristic searches handle less well

Testability transformation: change the program to improve test data generation ... whilst preserving test adequacy

Mark Harman, Lin Hu, Rob Hierons, Joachim Wegener, Harmen Sthamer, André Baresel and Marc Roper. Testability Transformation. IEEE Transactions on Software Engineering, 30(1): 3-16, 2004.

Nesting

Phil McMinn, David Binkley and Mark Harman. Empirical Evaluation of a Nesting Testability Transformation for Evolutionary Testing. ACM Transactions on Software Engineering and Methodology, 18(3), Article 11, May 2009.

Testability Transformation

Note that the programs are no longer equivalent ... but we don't care, so long as the test data we get is still adequate

Nesting & Local Optima

Results: Industrial & Open source code

[Chart: change in success rate after applying the transformation (%), on a -100 to 100 scale, for nested branches]

Dependent & Independent Predicates

Independent: predicates influenced by disjoint sets of input variables, e.g. 'a == 0' and 'b == 0'. These can be optimised in parallel

Dependent: predicates influenced by non-disjoint sets of input variables, e.g. 'a == b' and 'b == c'. Interactions between the predicates inhibit parallel optimisation

[Chart: change in success rate after applying the transformation (%), for nested branches with dependent predicates, and for those with independent and some dependent predicates]

When not preserving program equivalence can go wrong

we are testing to cover structure ... but the structure is the problem

so we transform the program ... but this alters the structure

so we need to be careful: are we still testing according to the same criterion?

Input Domain Reduction

Mark Harman, Youssef Hassoun, Kiran Lakhotia, Phil McMinn and Joachim Wegener. The Impact of Input Domain Reduction on Search-Based Test Data Generation. The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Cavtat, Croatia, September 3-7 2007, pp. 155-164, ACM Press.

Effect of Reduction

Three input variables, each ranging over -100,000 ... 100,000: approximately 10^16 possible input combinations

After reduction to a single relevant variable, just 200,001 values remain

Variable Dependency Analysis

Empirical Study

Case studies: Defroster, F2, Gimp, Spice, Tiff

Studied the effects of reduction with: Random Search, the Alternating Variable Method and Evolutionary Testing

Effect on Random Testing

Three input variables, each ranging over -49 ... 50

Probability of executing the target: reduction shrinks the domain from 100 x 100 x 100 to 100 x 100 x 1, leaving the chance of randomly hitting the target unchanged

Results with Random Testing

Results with AVM

Effect on AVM

Saves probe moves (and thus wasteful fitness evaluations) on irrelevant variables:

void fn( irrelevant1, irrelevant2, irrelevant3, required1 .... )

after reduction, probe moves are only made on the variables that remain

Effect on ET

Saves mutations on irrelevant variables

Mutations concentrated on the variables that matter

Likely to speed up the search

Results with ET

Conclusions for Input Domain Reduction

Variable dependency analysis can be used to reduce input domains

This can reduce search effort for the AVM and ET

Perhaps surprisingly, there is no overall change for random search

Which search algorithm?

Empirical Study

Case studies: Bibclean, Defroster, F2, Eurocheck, Gimp, Space, Spice, Tiff, Totinfo

Mark Harman and Phil McMinn. A Theoretical and Empirical Study of Search Based Testing: Local, Global and Hybrid Search. IEEE Transactions on Software Engineering, 36(2), pp. 226-247, 2010.

760 branches in ~5 kLOC

Interesting branches

20 98

[Chart: success rate (%) per branch, comparing Hill Climbing (the AVM) with Evolutionary Testing. Wins for the AVM on branches including f2 F2 (11F), space space_seqrotrg (17T, 22T), spice clip_to_circle (49T, 62T, 68F), spice cliparc (22T, 24T, 63F) and totinfo InfoTbl (14F, 21F, 29T, 29F, 35T, 35F)]

[Chart: average number of fitness evaluations per branch (log scale, 1 to 100,000). Further wins for the AVM on branches including gimp_rgb_to_hsl (4T), gimp_rgb_to_hsv (5F), gimp_rgb_to_hsv4 (11F), gimp_rgb_to_hsv_int (10T), gradient_calc_bilinear_factor (8T), gradient_calc_conical_asym_factor (3F), gradient_calc_conical_sym_factor (3F), gradient_calc_spiral_factor (3F), cliparc (13F, 15T, 15F) and TIFF_SetSample (5T)]

When does the AVM win?

[Figure: fitness plotted against input for two example landscapes]

[Chart: success rate per branch on check_ISBN (23F, 27T, 29T, 29F) and check_ISSN (23F, 27T, 29T, 29F). Wins for Evolutionary Testing over Hill Climbing]

When does ET win?

The branches in question were part of a routine for validating ISBN/ISSN strings; when a valid character is found, a counter variable is incremented

Evolutionary algorithms incorporate a population of candidate solutions and crossover: crossover enables valid characters to be crossed over into different ISBN/ISSN strings

Schemata

1010100011110000111010
1111101010000000101011
0001001010000111101011

Subsets of useful genes, e.g. substrings of 1's in a binary all-ones problem

The schema theory predicts that schemata of above-average fitness will proliferate in subsequent generations of the evolutionary search

Crossover of fit schemata, e.g. 1**1 and *11*, can lead to even fitter, higher-order schemata such as 1111

Royal Roads

landscape structures where there is a 'red carpet' for crossovers to follow

The Genetic Algorithm Royal Road

S1: 1111****************************
S2: ****1111************************
S3: ********1111********************
S4: ************1111****************
S5: ****************1111************
S6: ********************1111********
S7: ************************1111****
S8: ****************************1111
S9: 11111111************************
S10: ********11111111****************
S11: ****************11111111********
S12: ************************11111111
S13: 1111111111111111****************
S14: ****************1111111111111111
S15: 11111111111111111111111111111111

When Crossover Helps

[Figure: a building-block hierarchy. Individuals P1-P4 each contain 1 valid character; crossover yields Q1 and Q2, each containing 2 valid characters; a further crossover yields R1 and R2, which contain 4 valid characters and execute the target]

[Chart: success rate on the check_ISBN and check_ISSN branches, comparing Evolutionary Testing with the Headless Chicken Test]

Headless Chicken Test

T. Jones. Crossover, macromutation and population-based search. Proc. ICGA '95, Morgan Kaufmann, 1995, pp. 73-80.

Investigations into Crossover

Royal Roads

HIFF

Real Royal Roads

Ignoble Trails

M. Mitchell, S. Forrest, and J. H. Holland. The royal road for genetic algorithms: Fitness landscapes and GA performance. Proc. 1st European Conference on Artificial Life, MIT Press, 1992, pp. 245-254.

R. A. Watson, G. S. Hornby, and J. B. Pollack. Modeling building-block interdependency. Proc. PPSN V, Springer, 1998, pp. 97-106.

T. Jansen and I. Wegener, “Real royal road functions - where crossover provably is essential,” Discrete Applied Mathematics, vol. 149, pp. 111–125, 2005.

J. N. Richter, A. Wright, and J. Paxton, “Ignoble trails - where crossover is provably harmful,” Proc. PPSN X. Springer, 2008, pp. 92–101.

Evolutionary Testing Schemata

{(a, b, c) | a = b}: (50, 50, 25), (100, 100, 10), ...

{(a, b, c) | a > 0}: (50, 10, 25), (100, -50, 10), ...

Crossover of good schemata

subschema {(a, b, c) | a = b} + subschema {(a, b, c) | b ≥ 100} → superschema {(a, b, c) | a = b ∧ b ≥ 100}

each subschema is a building block

{(a, b, c) | a = b ∧ b ≥ 100} + {(a, b, c) | c ≤ 10} → covering schema {(a, b, c) | a = b ∧ b ≥ 100 ∧ c ≤ 10}

What types of program and program structure enable Evolutionary Testing to perform well through crossover, and how?

1. Large numbers of conjuncts in the input condition, e.g. {(a, b, c...) | a = b ∧ b ≥ 100 ∧ c ≤ 10 ...: each conjunct represents a 'sub' test data generation problem that can be solved independently and combined with other partial solutions

2. The conjuncts should reference disjoint sets of variables: with non-disjoint conjuncts such as {(a, b, c, d ...) | a = b ∧ b = c ∧ c = d ..., solving each conjunct independently does not necessarily result in an overall solution

P. McMinn. How Does Program Structure Impact the Effectiveness of the Crossover Operator in Evolutionary Testing? Proc. Symposium on Search-Based Software Engineering, 2010.

Progressive Landscape

[Figure: fitness plotted against input]

Crossover - Conclusions

1. Large numbers of conjuncts in the input condition 2. Conjuncts should reference disjoint sets of variables

Crossover lends itself to programs/units that process large data structures (e.g. strings, arrays), resulting in input condition conjuncts with disjoint variables ... or to units that require large sequences of method calls to move an object into a required state, e.g. testing for a full stack: push(...), push(...), push(...)

Other Theoretical Work

A. Arcuri, P. K. Lehre, and X. Yao, “Theoretical runtime analyses of search algorithms on the test data generation for the triangle classification problem,” SBST workshop 2008, Proc. ICST 2008. IEEE, 2008, pp. 161–169.

A. Arcuri, “Longer is better: On the role of test sequence length in software testing,” Proc. ICST 2010. IEEE, 2010.

Future directions...

The Oracle Problem

Determining the correct output for a given input is called the oracle problem

Software engineering research has devoted much attention to automated oracles ... but many systems do not have automated oracles

Human Oracle Cost

Typically the responsibility falls on the human; reducing the human effort involved, the human oracle cost, remains an important problem

Test data generation and human oracle cost

Quantitative: generate test data that maximises coverage but minimises the number of test cases; reduce the size of test cases

A. Leitner, M. Oriol, A. Zeller, I. Ciupa, and B. Meyer. Efficient unit test case minimization. ASE 2007, pp. 417-420. ACM.

M. Harman, S. G. Kim, K. Lakhotia, P. McMinn, and S. Yoo. Optimizing for the number of tests generated in search based test data generation with an application to the oracle cost problem. 3rd Search-Based Software Testing workshop (SBST 2010), 2010. IEEE digital library.

Qualitative: e.g. how easily can the scenario comprising generated test data be understood, so that the output can be evaluated?

Phil McMinn, Mark Stevenson and Mark Harman. Reducing Qualitative Human Oracle Costs associated with Automatically Generated Test Data. Proceedings of the 1st International Workshop on Software Test Output Validation (STOV 2010), to appear.

Calendar program

Takes two dates (represented by 3 integers each) and finds the number of days between the dates

Some example dates generated: -4048/-10854/-29141, 10430/3140/6733, 3063/31358/8201

Machine-generated test data tends not to fit the operational profile of a program particularly well

Seeding knowledge

Programmer test cases, e.g. 16/1/2010, 2/1/2009, 2/32/2010, used as the starting point of the search process

In what ways can operational profile knowledge be obtained?

Programmers: likely to have run the program with a sanity check; these inputs can be seeded to bias the test data generation process

The Program: identifier names, code comments, sanitisation routines

Identifier names

Give clues to the sort of inputs that might be expected:

dayOfTheWeek

country_of_origin

url

Can be analysed in conjunction with large-scale natural language lexicons such as WordNet

Sanitisation Routines

Defensive programming routines might be used to ‘correct’ a program’s own test data

Test Data Re-use

Program similarity and test data re-use: code structure, identifier names and code comments can be compared using clone detection and plagiarism detection techniques

Will these techniques work?

Will they compromise fault-finding capability?

web: http://www.dcs.shef.ac.uk/~phil

email: p.mcminn@sheffield.ac.uk

twitter: @philmcminn

Questions & Discussion