Posted on 20-Jun-2015
Mutation Testing: Ruby Edition
Chris Sinjakli
Testing is a good thing
But how do we know our tests are good?
Code coverage is a start
But it can give a “good” score with really dreadful tests
Really dreadful tests

    class Adder
      def self.add(x, y)
        return x - y
      end
    end

    describe Adder do
      it "should add the two arguments" do
        Adder.add(1, 1) # no expectation: the result is never checked
      end
    end

Coverage: 100%
Usefulness: 0
A contrived example
But how could we detect it?
Mutation Testing!
“Who watches the watchmen?”
If you can change the code and no test fails, either the code is never run or the tests are wrong.
How?
1. Run test suite
2. Change code (mutate)
3. Run test suite again
If the tests now fail, the mutant dies. Otherwise, it survives.
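The three steps can be sketched in plain Ruby. This is a toy under stated assumptions, not how Mutant or PIT work internally: the code under test is a source string, the mutation is a textual operator swap, a "test suite" is just a lambda returning true or false, and `mutate`, `build`, `passes?` and `run_mutation` are hypothetical helpers. It uses a correct `add` (unlike the deliberately broken one on the slides) so the suite passes before mutation.

```ruby
# Toy mutation-testing loop. The helpers and the string-based mutation are
# hypothetical illustrations, not the API of any real tool.

ORIGINAL = <<~RUBY
  def add(x, y)
    x + y
  end
RUBY

# Step 2 helper: swap the first occurrence of an operator to create a mutant.
def mutate(source, from, to)
  source.sub(from, to)
end

# Load one version of the code into a fresh anonymous class; return an instance.
def build(source)
  klass = Class.new
  klass.class_eval(source)
  klass.new
end

# A suite is a lambda that returns true when all of its checks pass.
# A StandardError raised by the code under test counts as a failure.
def passes?(suite, source)
  suite.call(build(source)) ? true : false
rescue StandardError
  false
end

def run_mutation(suite, original, mutant)
  # Step 1: the suite must pass against the unmutated code first.
  raise ArgumentError, "suite fails on the original code" unless passes?(suite, original)

  # Step 3: run the suite again; a failing suite kills the mutant.
  passes?(suite, mutant) ? :survived : :killed
end

WEAK_SUITE   = ->(adder) { adder.add(1, 1); true } # runs the code, checks nothing
STRONG_SUITE = ->(adder) { adder.add(1, 1) == 2 }  # checks the result

MUTANT = mutate(ORIGINAL, "x + y", "x - y")

puts run_mutation(WEAK_SUITE, ORIGINAL, MUTANT)   # prints survived
puts run_mutation(STRONG_SUITE, ORIGINAL, MUTANT) # prints killed
```

The assertion-free suite lets the mutant live, exactly like the "really dreadful" test: it executes every line but never looks at the answer.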
Going with our previous example

    class Adder
      def self.add(x, y)
        return x - y
      end
    end

    describe Adder do
      it "should add the two arguments" do
        Adder.add(1, 1)
      end
    end
Let’s change something
Going with our previous example

    class Adder
      def self.add(x, y)
        return x + y
      end
    end

    describe Adder do
      it "should add the two arguments" do
        Adder.add(1, 1)
      end
    end
This still passes
Success
We know something is wrong
So what? It caught a really rubbish test
How about something slightly less obvious?
Slightly less obvious (and I mean slightly)
    class ConditionChecker
      def self.check(a, b)
        if a && b
          return 42
        else
          return 0
        end
      end
    end

    describe ConditionChecker do
      it "should return 42 when both arguments are true" do
        ConditionChecker.check(true, true).should == 42
      end

      it "should return 0 when both arguments are false" do
        ConditionChecker.check(false, false).should == 0
      end
    end

Coverage: 100%
Usefulness: >0
But still wrong
Mutate
Slightly less obvious (and I mean slightly)
    class ConditionChecker
      def self.check(a, b)
        if a || b # mutated: && became ||
          return 42
        else
          return 0
        end
      end
    end

    describe ConditionChecker do
      it "should return 42 when both arguments are true" do
        ConditionChecker.check(true, true).should == 42
      end

      it "should return 0 when both arguments are false" do
        ConditionChecker.check(false, false).should == 0
      end
    end
Passing tests
Mutation testing caught our mistake
:D
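The surviving mutant points straight at the missing case: nothing checks what happens when exactly one argument is true. The sketch below is plain Ruby (no RSpec, so it runs on its own); `MutatedChecker` is a hypothetical name for the mutant, and the `[true, false]` case is an illustrative addition to the two cases from the slides.

```ruby
class ConditionChecker
  def self.check(a, b)
    if a && b
      42
    else
      0
    end
  end
end

# Hypothetical stand-in for the mutant the tool generated.
class MutatedChecker
  def self.check(a, b)
    if a || b # the mutation: && became ||
      42
    else
      0
    end
  end
end

# The two cases from the slides, plus one mixed case.
CASES = [
  [[true, true], 42],
  [[false, false], 0],
  [[true, false], 0]
].freeze

# Return the cases whose expected value the checker does not produce.
def failing_cases(checker)
  CASES.reject { |args, expected| checker.check(*args) == expected }
end

p failing_cases(ConditionChecker) # prints []
p failing_cases(MutatedChecker)   # prints [[[true, false], 0]]
```

The original still passes every case, while the mutant fails the mixed one, so with that extra test the mutant is killed.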
Useful technique
But still has its flaws
The downfall of mutation testing (equivalent mutants)
    index = 0
    while index != 100 do
      do_stuff
      index += 1
    end

Mutates to

    index = 0
    while index < 100 do
      do_stuff
      index += 1
    end
But the programs are equivalent, so no test will fail: there is no possible test that can “kill” the mutant.
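A quick check of why the two loops are indistinguishable: `index` starts at 0 and grows by exactly 1, so it reaches 100 without ever skipping past it, and `!=` and `<` stop at the same moment. The hypothetical `iterations` helper below counts loop-body executions for each condition; the counter stands in for the slide's `do_stuff`.

```ruby
# Count how many times the loop body runs for a given loop condition.
def iterations(&condition)
  calls = 0
  index = 0
  while condition.call(index)
    calls += 1 # stands in for do_stuff
    index += 1
  end
  calls
end

puts iterations { |i| i != 100 } # original condition: prints 100
puts iterations { |i| i < 100 }  # mutated condition: prints 100
```

Both versions run the body exactly 100 times with the same sequence of `index` values, so there is nothing observable for a test to catch.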
Also (potentially), mutants can cause:
• Infinite loops
• More memory to be used
• Compile/run-time errors (tools should minimise these)
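A common guard against infinite-loop mutants is a time limit on each test run. A minimal sketch using Ruby's standard `Timeout` module, assuming a hypothetical mutant of the earlier loop where `index += 1` became `index -= 1` (so `index != 100` never becomes false); `mutant_loop` and `run_with_limit` are illustrative names, not part of any tool.

```ruby
require "timeout"

# Hypothetical mutant: the increment became a decrement, so index moves
# away from 100 forever and the loop never terminates.
def mutant_loop
  index = 0
  index -= 1 while index != 100
  index
end

# Run a block under a time limit; a timeout is reported as its own outcome,
# which a tool can treat as "the mutant was detected".
def run_with_limit(seconds)
  Timeout.timeout(seconds) { yield }
  :finished
rescue Timeout::Error
  :timed_out
end

puts run_with_limit(0.2) { mutant_loop } # prints timed_out
puts run_with_limit(1) { 1 + 1 }         # prints finished
```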
How bad is it?
• Good paper assessing the problem [SZ10]
• Studied 7 widely used, “large” projects
• Found:
  – About 15 minutes to manually assess one mutation
  – 45% of uncaught mutations are equivalent
  – Better-tested project -> worse signal-to-noise ratio
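The last finding follows from simple arithmetic: equivalent mutants always survive, so the better a suite gets at killing the other mutants, the larger the share of survivors that are equivalent. The numbers below are made up for illustration (they are not from [SZ10]): 100 mutants, 10 of them equivalent.

```ruby
# Hypothetical numbers only: 100 mutants, 10 of which are equivalent.
TOTAL_MUTANTS = 100
EQUIVALENT    = 10

# Fraction of surviving mutants that are equivalent (pure noise), given how
# many non-equivalent mutants the suite kills (at most 90 here).
def noise_ratio(killed)
  survivors = TOTAL_MUTANTS - killed
  EQUIVALENT.fdiv(survivors)
end

puts noise_ratio(50).round(2) # weaker suite, 10 of 50 survivors: prints 0.2
puts noise_ratio(85).round(2) # stronger suite, 10 of 15 survivors: prints 0.67
```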
Can we detect the equivalents?
• Not in the general case [BA82]
• Some specific cases can be detected
  – Using compiler optimisation techniques [BS79]
  – Using mathematical constraints [DO91]
  – Using changes in line coverage [SZ10]
• All are heuristics: I haven't seen any that claims to detect every equivalent mutant
Tools
Some Ruby, then a Java one I liked
Ruby
• Looked into Heckle
• Seemed unmaintained (nothing since 2009)
• Then I saw...
• Mutant seems to be the new favourite
• Runs in Rubinius (1.8 or 1.9 mode)
• Only supports RSpec
• Easy to set up
    rvm install rbx-head
    rvm use rbx-head
    gem install mutant
• And easy to use:

    mutate "ClassName#method_to_test" spec
Java
• Loads of tools to choose from
• Bytecode vs source mutation
• Will look at PIT (seems like one of the better ones)
PIT - pitest.org
• Works with “everything”
  – Command line
  – Ant
  – Maven
• Bytecode-level mutations (faster)
• Very customisable
  – Exclude classes/packages from mutation
  – Choose which mutations you want
  – Timeouts
• Makes pretty HTML reports (line/mutation coverage)
Summary
• Can point at weak areas in your tests
• At the same time, can be prohibitively noisy
• Try it and see
Questions?
References
• [BA82] - T. A. Budd and D. Angluin. Two notions of correctness and their relation to testing. Acta Informatica, 18(1):31-45, November 1982.
• [BS79] - D. Baldwin and F. Sayward. Heuristics for determining equivalence of program mutations. Research report 276, Department of Computer Science, Yale University, 1979.
• [DO91] - R. A. DeMillo and A. J. Offutt. Constraint-based automatic test data generation. IEEE Transactions on Software Engineering, 17(9):900-910, September 1991.
• [SZ10] - D. Schuler and A. Zeller. (Un-)Covering Equivalent Mutants. Third International Conference on Software Testing, Verification and Validation (ICST), pages 45-54. April 2010.
Also interesting
• [AHH04] - K. Adamopoulos, M. Harman and R. M. Hierons. How to Overcome the Equivalent Mutant Problem and Achieve Tailored Selective Mutation Using Co-evolution. Genetic and Evolutionary Computation – GECCO 2004, pages 1338-1349, 2004.