© improuv GmbH Agile Leadership. http://improuv.com
Live and Let Die
Robust Testing the Fancy Way
Seb Heglmeier
Daniel Zappold
About us
improuv is an agile consultancy. We support enterprises on their agile journey.
Our expertise is Scrum, Kanban, and Lean & Agile software development.
Sebastian Heglmeier
@SebHeglmeier
github.com/CandleLightDoener
Daniel Zappold
@dzappold
github.com/dzappold
Mutation Testing in a Nutshell
A common problem:
OMG we have a BUG in production!!
• How could that happen?
• I thought there were enough tests!
• Are your tests any good at all?
Philosophical question
Quis custodiet ipsos custodes?
…
Who will guard the guards?
A real life problem
• Research at University of Gothenburg & Volvo [2]: a car parking algorithm was
not as robust as expected when sensor measurements are a bit off
“Covered” doesn’t mean “well tested”
source: www.pitest.org
So …
How did they find these bad real-life scenarios
which were not covered by tests?
Mutation Testing ☺
Generate a mutant by applying one well-defined change (mutation
operation) to your production code (byte code) and let your test suites run
against it. If none of your tests turns red, this change was
not covered. This might be bad (it depends).
Note: there are many mutants per class (typically 1 to 3 mutants per LOC)
[Figure: original code next to the 4 mutants generated from it]
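The idea can be sketched by hand in plain Java. This is only an illustration, not tool output: the class `MutantDemo`, the method `isAdult` and the age rule are hypothetical, and a real tool would apply the change to byte code rather than to a copied method.

```java
import java.util.function.IntPredicate;

// Hypothetical example: all names and the age rule are assumptions for illustration.
class MutantDemo {
    // Original production code.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: exactly one change, ">=" became ">".
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Weak test suite: never probes the boundary value 18.
    static boolean weakSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Stronger test suite: additionally checks both sides of the boundary.
    static boolean strongSuitePasses(IntPredicate impl) {
        return weakSuitePasses(impl) && impl.test(18) && !impl.test(17);
    }

    public static void main(String[] args) {
        System.out.println("weak suite green on mutant:   "
                + weakSuitePasses(MutantDemo::isAdultMutant));
        System.out.println("strong suite green on mutant: "
                + strongSuitePasses(MutantDemo::isAdultMutant));
    }
}
```

Both suites are green on the original code and both execute the changed line, so line coverage looks the same. Only the stronger suite turns red on the mutant.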
Examples of Typical Mutation Operators
                          Original            Mutated
• Conditionals
  • Change slightly       <                   <=
  • Negate                ==                  !=
  • Remove                if(a==b)            if(true), if(false)
• Math mutators
  • Negate                +                   -
                          &                   |
• Return values
  • Negate or replace     some Object         null
                          null                throw RuntimeException
• Skip void calls         someVoidMethod()    … [not called]
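Two of these operators, written out by hand as a rough sketch. The class `OperatorDemo` and its methods are hypothetical; a mutation tool would generate such variants automatically instead of keeping them as separate methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: names and the account logic are assumptions for illustration.
class OperatorDemo {
    final List<String> auditLog = new ArrayList<>();

    void audit(String message) {
        auditLog.add(message);
    }

    // Original production code.
    int deposit(int balance, int amount) {
        audit("deposit " + amount);
        return balance + amount;
    }

    // Math mutator: "+" replaced by "-".
    int depositMathMutant(int balance, int amount) {
        audit("deposit " + amount);
        return balance - amount;
    }

    // Void-call mutator: the call to audit() is skipped.
    int depositVoidCallMutant(int balance, int amount) {
        return balance + amount;
    }
}
```

A test that only deposits an amount of 0 cannot tell the math mutant from the original, and a test that never looks at `auditLog` cannot tell the void-call mutant from the original.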
Mutation Testing: why like this and not differently?
a) stupid small mistake
b) Murphy’s law (mistakes couple)
c) big problem
→ Typical Mutation Operators mimic exactly these small mistakes
source: wikipedia
History of Mutation Testing
• Proposed by Richard J. Lipton in 1971 (winner of 2014 Knuth Prize) as a
better way to measure the quality of your tests
• The first tool implementation was part of a PhD thesis in 1980
More than 40 years old and you call that “fancy”?
• Recent availability of massive computing power has led to a resurgence
• Test automation and byte code manipulation speed things up further
Example: Gilded Rose
One badly written method which updates
the quality of (magical) items based on their
respective dates of expiry.
Some mutants survived
[Figure: mutation report for the code, annotated with the number of generated mutants per line]
This means: we could just alter the code like this ... and get away with it?
Mutation Test Results
KILLED
• A mutant is killed if at least one test fails when run against it (the tests detect the mutated code)
• This shows that the tests notice this particular change
SURVIVED
• No test failed when run against the mutant, so the change went unnoticed
TIMED OUT
• The mutant caused the program to loop forever or get stuck
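How a runner could arrive at these three verdicts is sketched below with plain `java.util.concurrent`. This is an assumption-laden illustration and not how PIT is implemented: `MutantRunner`, `classify` and the hand-written mutants are hypothetical, and the "test suite" is a single boolean check.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: names and structure are assumptions, not a real tool's API.
class MutantRunner {
    enum Result { KILLED, SURVIVED, TIMED_OUT }

    // Original, for reference: counts the decrements needed to reach zero.
    static int stepsToZero(int n) {
        int steps = 0;
        while (n > 0) { n--; steps++; }
        return steps;
    }

    // Mutant A: ">" became ">=", so the loop runs once too often.
    static int boundaryMutant(int n) {
        int steps = 0;
        while (n >= 0) { n--; steps++; }
        return steps;
    }

    // Mutant B: the decrement is gone, so the loop never ends on its own.
    // It polls the interrupt flag only so that this demo can stop the thread afterwards.
    static int stuckMutant(int n) {
        int steps = 0;
        while (n > 0 && !Thread.currentThread().isInterrupted()) { steps++; }
        return steps;
    }

    // Runs a suite (true = all tests green) against a mutant, with a time limit.
    static Result classify(Callable<Boolean> suiteAgainstMutant, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread worker = new Thread(task);
            worker.setDaemon(true); // a stuck mutant must not keep the JVM alive
            return worker;
        });
        Future<Boolean> run = executor.submit(suiteAgainstMutant);
        try {
            boolean allGreen = run.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return allGreen ? Result.SURVIVED : Result.KILLED;
        } catch (TimeoutException e) {
            return Result.TIMED_OUT;
        } catch (InterruptedException | ExecutionException e) {
            return Result.KILLED; // an exception in a test counts as a red test
        } finally {
            executor.shutdownNow(); // interrupts a worker that is still running
        }
    }
}
```

For example, `classify(() -> MutantRunner.boundaryMutant(3) == 3, 2000)` reports the mutant as killed, while the weaker check `boundaryMutant(3) > 0` lets the same mutant survive.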
What to do with survivors?
• Manual analysis of the surviving mutant
• Is the mutant’s behaviour equivalent to the original code? -> That’s OK!
• Otherwise: try to write a red test that detects the mutant, then fix it
[3]
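A small hand-made example of an equivalent mutant, as a sketch only. `EquivalentMutantDemo` and `floorAtZero` are hypothetical names; the brute-force comparison is just a sanity check over a finite range, not a proof of equivalence.

```java
// Hypothetical example: names are assumptions for illustration.
class EquivalentMutantDemo {
    // Original: never report a negative value.
    static int floorAtZero(int quality) {
        if (quality < 0) {
            return 0;
        }
        return quality;
    }

    // Mutant: "<" became "<=". For quality == 0 both branches return 0,
    // so no test can ever tell this mutant from the original.
    static int floorAtZeroMutant(int quality) {
        if (quality <= 0) {
            return 0;
        }
        return quality;
    }

    // Compares original and mutant for every value in [from, to].
    // Assumes to < Integer.MAX_VALUE so that the loop counter cannot overflow.
    static boolean agreeOnRange(int from, int to) {
        for (int q = from; q <= to; q++) {
            if (floorAtZero(q) != floorAtZeroMutant(q)) {
                return false;
            }
        }
        return true;
    }
}
```

A survivor like this needs no new test. Spotting that it is equivalent is the manual part.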
Conclusion on Mutation Testing
• It’s very powerful at detecting hidden defects even in well tested code
• It is computationally expensive (not that big of an issue I find)
• Detecting equivalent mutants is still “research in progress” and involves
manual work (depending on your code: that could be a killer)
• Proper TDD would help avoid these kinds of (potential) bugs
• BUILDING quality in (instead of TESTING quality in)
References for Mutation Testing
• [1] https://www.researchgate.net/publication/4200521_Is_mutation_an_appropriate_tool_for_testing_experiments
• [2] https://www.researchgate.net/publication/274079891_Early_Verification_and_Validation_According_to_ISO_26262_by_Combining_Fault_Injection_and_Mutation_Testing
• [3] https://www.st.cs.uni-saarland.de/publications/files/gruen-mutation-2009.pdf
• PIT Mutation Testing: http://pitest.org
• Gilded Rose: https://github.com/emilybache/GildedRose-Refactoring-Kata
Property Based Testing in a Nutshell
Are you familiar with something like this?
• Losing overview with too many (similar) tests?
• Did we choose proper examples for our tests in order to fulfill our
specification?
What is Property Based Testing?
• Moving from examples to properties
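The difference can be sketched in plain Java without any library. Everything here is hypothetical (`ExamplesVsProperties`, the `reverse` function under test, the hand-rolled random strings); a property-based testing library would supply the input generation and reporting.

```java
import java.util.Random;

// Hypothetical example: names and the function under test are assumptions.
class ExamplesVsProperties {
    // Function under test.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Example-based: one hand-picked input with one expected output.
    static boolean exampleHolds() {
        return reverse("abc").equals("cba");
    }

    // Property: for ANY string, reversing twice gives the original back
    // and the length is preserved.
    static boolean propertyHolds(String s) {
        return reverse(reverse(s)).equals(s) && reverse(s).length() == s.length();
    }

    // Hand-rolled input generation: lowercase strings of length 0 to 19.
    static String randomString(Random random) {
        int length = random.nextInt(20);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Checks the property against many generated inputs.
    static boolean checkProperty(int trials, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < trials; i++) {
            if (!propertyHolds(randomString(random))) {
                return false;
            }
        }
        return true;
    }
}
```

The example pins down one concrete case. The property states a rule and leaves the choice of inputs to the generator.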
Why Property Based Testing?
• Readability, understandability
• One property replaces many tests
• Better testing
• Lots of combinations you’d never test by hand
• Less time spent on diagnosis
• Failures minimized automagically (also available for Java since junit-quickcheck 0.6, released 11 March 2016)
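What "minimizing a failure" (shrinking) means can be sketched for integers as follows. This is a naive hand-rolled strategy under stated assumptions, not the algorithm junit-quickcheck uses; `ShrinkDemo` and `shrink` are hypothetical names.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of shrinking: names and strategy are assumptions.
class ShrinkDemo {
    // Given an input for which the property fails, searches for an input
    // closer to zero for which it still fails.
    static int shrink(int failing, IntPredicate property) {
        int current = failing;
        boolean improved = true;
        while (improved) {
            improved = false;
            // Candidates: zero, half the value, one step towards zero.
            int[] candidates = { 0, current / 2, current - Integer.signum(current) };
            for (int candidate : candidates) {
                boolean closerToZero = Math.abs((long) candidate) < Math.abs((long) current);
                if (closerToZero && !property.test(candidate)) {
                    current = candidate;
                    improved = true;
                    break;
                }
            }
        }
        return current;
    }
}
```

With the property `x < 100` and a random failing input such as 8734, this search ends at 100, which is far easier to diagnose than the original counterexample.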
Gilded Rose
• Let’s look at the requirements to spot a first property
Gilded Rose – Quality is Never Negative
• Let’s write our first property based test
Gilded Rose – Oops ...
• Our test fails ... but was shrunk to ...
Gilded Rose – Wrong Input Data
• Limit your input
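The effect of limiting the input can be sketched without the kata code or a testing library. Everything below is an assumption for illustration: `updatedQuality` is a simplified stand-in for the rule for ordinary items, not the kata's `updateQuality`, and the ranges for `sellIn` are chosen arbitrarily.

```java
import java.util.Random;

// Hypothetical sketch: a simplified stand-in, not the Gilded Rose kata code.
class QualityNeverNegative {
    // Ordinary item: quality drops by 1 per day, by 2 once the sell-by date
    // has passed, and is only lowered while it is above zero.
    static int updatedQuality(int sellIn, int quality) {
        if (quality > 0) {
            quality--;
        }
        if (sellIn - 1 < 0 && quality > 0) {
            quality--;
        }
        return quality;
    }

    // Property: "quality is never negative" after an update.
    // Returns the first counterexample as {sellIn, quality}, or null if none is found.
    static int[] findCounterexample(Random random, int minQuality, int maxQuality, int trials) {
        for (int i = 0; i < trials; i++) {
            int sellIn = random.nextInt(41) - 20; // -20..20
            int quality = minQuality + random.nextInt(maxQuality - minQuality + 1);
            if (updatedQuality(sellIn, quality) < 0) {
                return new int[] { sellIn, quality };
            }
        }
        return null;
    }
}
```

Generating qualities from -50 to 50 produces counterexamples, but every one of them starts with an already negative quality, which is wrong input data rather than a bug. Restricting the generated quality to 0..50 makes the property hold.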
Gilded Rose – Respect Max Quality
• Limit your test data to correct input
Should be valid for every item
Gilded Rose – Quality is Never Negative
Independent of Item Name
Gilded Rose – Writing Our Own Generator
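A generator for valid items could look roughly like the sketch below. It deliberately avoids the junit-quickcheck dependency, where a generator would extend the library's own generator base class instead. `ItemGenerator`, the nested `Item` class and the `sellIn` range are assumptions for illustration; the item names and the quality limits (0 to 50, Sulfuras fixed at 80) follow the kata requirements.

```java
import java.util.Random;

// Hypothetical sketch: a hand-rolled generator, not a junit-quickcheck Generator.
class ItemGenerator {
    // Minimal item with the three fields the kata works on.
    static final class Item {
        final String name;
        final int sellIn;
        final int quality;

        Item(String name, int sellIn, int quality) {
            this.name = name;
            this.sellIn = sellIn;
            this.quality = quality;
        }
    }

    static final String SULFURAS = "Sulfuras, Hand of Ragnaros";

    static final String[] NAMES = {
        "Aged Brie",
        SULFURAS,
        "Backstage passes to a TAFKAL80ETC concert",
        "Elixir of the Mongoose"
    };

    // Produces only items that are valid according to the requirements:
    // quality between 0 and 50, except for the legendary item, which is always 80.
    static Item next(Random random) {
        String name = NAMES[random.nextInt(NAMES.length)];
        int sellIn = random.nextInt(61) - 30; // -30..30, an arbitrary choice
        int quality = name.equals(SULFURAS) ? 80 : random.nextInt(51);
        return new Item(name, sellIn, quality);
    }
}
```

The special case for Sulfuras shows why generators can become complex: the rules for valid input have to be encoded in the generator itself.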
Coverage after first property.
Potential Pitfalls
• Using wrong test data set
• Re-implementing a function
• Can be difficult to determine what properties you should use
• Forces you to think carefully about what a function is doing
• Some generators are complex
Conclusion on Property Based Testing
• Writing good properties can be hard
• Writing good generators can be even harder
• PBTs are more general
• One property-based test can replace many example-based tests.
• PBTs can reveal overlooked edge cases
• Nulls, negative numbers, weird strings, etc.
• PBTs ensure deep understanding of requirements
• Property-based tests force you to think! ☹/☺
• Example-based tests are still helpful though!
• Easier to understand for newcomers, better for TDDing
References for PBT
• JUnit Quickcheck: https://github.com/pholser/junit-quickcheck
• Title Slide: http://www.teachthis.com.au/products/view-resource/link/Properties-of-Living-Things/id/4008/
• Gilded Rose: https://github.com/emilybache/GildedRose-Refactoring-Kata