BW10 Session 6/5/2013 3:45 PM
"Enhancing Developer Productivity with Code Forensics"
Presented by:
Anthony Voellm Google, Inc.
Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073 888‐268‐8770 ∙ 904‐278‐0524 ∙ [email protected] ∙ www.sqe.com
Anthony Voellm Google, Inc.
At Google Anthony Voellm is focused on delivering performance, reliability, and security to the Google Compute Engine, Google App Engine, Google Cloud SQL, and Google Cloud BigQuery while also innovating new offerings. His experience ranges from kernel and database engines to image processing and graphics. Anthony is an avid inventor who holds seven technology patents. Prior to joining Google in 2011, Anthony held multiple roles at Microsoft leading the Windows reliability, security, and privacy test teams. Anthony has taught performance testing to more than 2,000 people worldwide and given dozens of informative talks on software fundamentals and the cloud. He writes a technology blog on software fundamentals.
Enhancing Developer Productivity with Code Forensics: Applications of behavioral analysis and developer assessment to improve productivity
Presented at Better Software Conference West - Las Vegas - June 5th, 2013
Anthony F. Voellm
Google Cloud Security, Performance and Test Manager
[email protected] / G+ / @p3rfguy
The hypothesis:
Today - one size fits all testing:
• Do everything
• Do nothing
• Best guess
• Static code analysis
Tomorrow - the right amount of tests, based on:
• Skills / knowledge
• Experience
• State of mind
• Behavior
Overview
• Part 1 - The backdrop
• Part 2 - The big question
• Part 3 - Measure, Measure, Measure
• Part 4 - Where is the science?
• Part 5 - The path forward
The backdrop
All developers are the same - right?
Chevy Nova is a great name for a car in English. In Spanish, however, "no va" means "it doesn't go."
Internationalization
We live in a world of bugs:
• Reliability bugs
• Internationalization bugs
• Security bugs
• Performance bugs
• Logical bugs
• Accessibility bugs
...
int x[10];
x[10] = -1;   /* out-of-bounds write: valid indices are 0..9 */
...
Where do bugs come from?
Humans... The Google Brain - NYTimes article
... in the future machines.
How do bugs happen?
Fridays… :)
How do bugs happen?
Test Blog - "... If you're tired, angry or frustrated for instance (like Patriots fans this morning) then you're almost guaranteed to make some careless mistakes. ..."
How do bugs happen?
College mentee of mine -"Cut and paste is wicked."
How do bugs happen?
"How do fixes become bugs" paper (2011) - "...the bug-fixing process can also introduce errors... Developers and reviewers for incorrect fixes usually do not have enough knowledge..."
How do bugs happen?
NIST Study - "Software is error-ridden in part because of its growing complexity. The size of software products is no longer measured in thousands of lines of code, but in millions. Software developers already spend approximately 80 percent of development costs on identifying and correcting defects, and yet few products of any type other than software are shipped with such high levels of errors."
Speaking of complexity...
From - "How do fixes become bugs" paper
The cost of bugs
• Time
  o 25%+ of developers' time is spent fixing bugs
  o A 1-line fix commonly takes 1+ hours of testing
• Money
  o ~$60 billion (9 zeros) to the US economy each year - in 2002!
• Reputation
  o 10% error rate on critical security fixes
Reputation … this might be the most important.
A reputation takes years to develop and only minutes to destroy.
The big question
One size fits all? With all the evidence that humans are the root cause of bugs *and* we all have different levels of skill... why do we all test the same?
From - "How do fixes become bugs" paper
Types of testing (the time sink grows down the list)
• All unit tests (15 minutes or less should be the target)
  o The most basic tests, with all layers stripped away
• All integration / system tests (1 hour or less)
  o Use multiple features together
• All performance tests (8 hours or less)
  o Micro-benchmarks (fio, iperf, ...)
  o Industry benchmarks (SPEC CPU, TPC-C, Hibernate, ...)
• All reliability tests (days)
  o Longhaul
  o Leak detection tools
• All security tests (weeks)
  o Smart fuzzers
  o Static code analyzers like Coverity
HURRAY!
"…time pressure prevents testers from conducting thorough regression tests before releasing the fix."*
How do we choose what to run?
• Today:
  o Run everything
  o Run nothing
  o Selectively run based on complex change-to-test associations
  o Best guess based on developer caution
How do we choose what to run?
• Tomorrow:
  o Run based on developer skill
  o Run based on familiarity with the code base
  o Run based on the complexity of the code
  o Run based on the type of bug being fixed
  o Run based on behavioral analysis of the code
Measure, Measure, Measure
What should we measure?
Artifacts
• What is the frequency of check-ins by the developer?
• How often has this developer checked in a [severe] bug?
• Do bugs trail check-ins?
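As a minimal sketch of mining these artifacts, the questions above can be answered from version-control history. The record shape below (developer, timestamp, number of bugs later linked to the check-in) is a hypothetical simplification; real data would come from a source-control and bug-tracking system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical check-in records: (developer, timestamp, linked_bug_count)
checkins = [
    ("alice", datetime(2013, 5, 1), 0),
    ("alice", datetime(2013, 5, 3), 1),
    ("bob",   datetime(2013, 5, 2), 0),
    ("alice", datetime(2013, 5, 8), 0),
]

def checkin_stats(records):
    """Per-developer check-in count and fraction of check-ins that bugs trail."""
    stats = defaultdict(lambda: {"checkins": 0, "buggy": 0})
    for dev, _ts, bugs in records:
        stats[dev]["checkins"] += 1
        if bugs > 0:
            stats[dev]["buggy"] += 1
    return {dev: {"checkins": s["checkins"],
                  "bug_rate": s["buggy"] / s["checkins"]}
            for dev, s in stats.items()}

print(checkin_stats(checkins))
# alice: 3 check-ins with bug_rate ~0.33; bob: 1 check-in with bug_rate 0.0
```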
Behavior and skill
[Figure: the volume of code written by a Freshman, Sophomore, Junior, and Senior developer vs. a Guru - the guru writes far less code to accomplish the same task]
Measures
• Is the check-in in code the developer is "familiar" with?
• Knowledge of the code reviewer.
• Peer ranking on how people feel about your level of expertise.
• What is the size of the check-in?
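A minimal sketch of the "familiarity" measure above: score a check-in by the fraction of its files the developer has touched before. The file names and the set-based history representation are illustrative assumptions, not a real system's API.

```python
def familiarity(dev_history, changed_files):
    """Fraction of files in a check-in the developer has edited before.
    dev_history: set of file paths previously touched by the developer.
    changed_files: list of file paths in the current check-in."""
    if not changed_files:
        return 1.0  # an empty check-in is trivially "familiar"
    touched = sum(1 for f in changed_files if f in dev_history)
    return touched / len(changed_files)

# Hypothetical history: developer has previously edited two files.
history = {"src/db.c", "src/net.c"}
print(familiarity(history, ["src/db.c", "src/ui.c"]))  # 0.5
```

A real score would likely also weight by how recently and how often each file was touched, not just whether it was ever edited.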
Emotions
http://people.brandeis.edu/~sekuler/eegERP.html
Measures
• Is it the day before a weekend?
• Is the time of day of the check-in unusual for the developer?
• Bug debt
Cyclomatic complexity
http://en.wikipedia.org/wiki/Cyclomatic_complexity
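Cyclomatic complexity can be approximated directly from source. The sketch below is a simplified take for Python code: it counts decision points in the AST and adds one, which matches McCabe's definition for a single-entry, single-exit routine. It deliberately ignores some constructs (e.g. `with`, `assert`) for brevity.

```python
import ast

# Node types treated as decision points (a simplification).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes len(values) - 1 extra branch points
            decisions += len(node.values) - 1
    return 1 + decisions

snippet = """
def classify(x):
    if x < 0:
        return "neg"
    for i in range(x):
        if i % 2 == 0 and i > 2:
            return "even"
    return "other"
"""
print(cyclomatic_complexity(snippet))  # 5: two ifs, one for, one 'and', plus 1
```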
Measures
• Does the developer write complex code?
• How layered is the code?
• Does the developer write unit tests?
• What percentage of the check-in is covered by tests?
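The last measure above can be sketched as a set intersection: given the lines a check-in changed and the lines the test run exercised, compute the overlap. The `(file, line)` pair representation is an assumption; real tooling would derive these sets from a diff and a coverage report.

```python
def checkin_coverage(changed_lines, covered_lines):
    """Percent of a check-in's changed lines exercised by tests.
    Both arguments are sets of (file, line_number) pairs."""
    if not changed_lines:
        return 100.0  # nothing changed, nothing left uncovered
    hit = changed_lines & covered_lines
    return 100.0 * len(hit) / len(changed_lines)

# Hypothetical diff and coverage data.
changed = {("db.c", 10), ("db.c", 11), ("ui.c", 5), ("ui.c", 6)}
covered = {("db.c", 10), ("db.c", 11), ("ui.c", 5)}
print(checkin_coverage(changed, covered))  # 75.0
```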
Where is the science?
Studies...
Empirical investigation of software product line quality
Researcher: Katerina Goseva-Popstajanova
Lane Department of Computer Science and Electrical Engineering
West Virginia University
***Special thanks to Katerina Goseva-Popstajanova who presented at GTAC2013 and graciously allowed me to use these slides. You can see her full talk here - http://www.youtube.com/watch?v=fiG-SdNcjTE
The following slides are based on the paper
"A longitudinal study of post-release faults in an evolving, open-source software product line"
by T. Devine, K. Goseva-Popstajanova, S. Krishnan, and R. R. Lutz
Submitted to a journal, currently under review
Open source product line: basics
Eclipse can be treated as a SPL • Currently consists of fourteen different members that
share main components and are set apart by variable components
• Considered four products: Classic, C/C++, Java and JavaEE
• Large size: these four products consist of over 125,000 files and 20 million LoC
• Evolving product line: considered seven releases• Goals: assessment and prediction of post-release faults
Open source product line: basics
Release       Year  Classic  C/C++  Java  JavaEE  Total  Total  Faulty
                    Pkgs     Pkgs   Pkgs  Pkgs    KLoC   Pkgs   Pkgs
2.0           2002  34       -      -     -       773    34     26
2.1           2003  41       -      -     -       1054   41     37
3.0           2004  76       -      -     -       1756   76     70
3.3 Europa    2007  85       62     103   185     3988   185    148
3.4 Ganymede  2008  89       62     105   200     4291   200    152
3.5 Galileo   2009  77       61     104   188     3913   188    120
3.6 Helios    2010  77       61     105   206     4262   206    103
Different degrees of reuse
[Figure: reuse breakdown across the Europa, Ganymede, Galileo, and Helios releases]
Metrics

Code metrics:
• LoC
• Statements
• Percent Branch Statements
• Method Call Statements
• Percent Lines with Comments
• Classes and Interfaces
• Methods per Class
• Average Statements per Method
• Max Complexity
• Average Complexity
• Max Block Depth
• Average Block Depth
• Statements at Block Level n (0, 1, ... 9)

Change metrics:
• Revisions
• Refactorings
• Bugfixes
• Authors
• LoC Added / Max LoC Added / Average LoC Added
• LoC Deleted / Max LoC Deleted
• Codechurn / Max Codechurn / Average Codechurn
• Max Changeset / Average Changeset
• Age / Weighted Age
Assessment: Evolution through releases
1. Does quality, measured by the number of post-release faults for the packages in each release, consistently improve as the SPL matures?
Post-release fault density decreases as the product line matures through releases
Assessment: Post-release fault distribution
2. Do the majority of faults reside in a small subset of packages?
For each release, from 66% to 93% of post-release faults were located in 20% of packages, with average around 81%
Prediction: What features are good predictors?
7. Are some features better indicators of the number of post-release faults in a package than others?
Feature selection via stepwise regression selected from 1 to 16 features out of 112 features
Prediction: What features are good predictors?
Of the fifteen features appearing in more than a quarter of models, only four are static code metrics
Prediction: What features are good predictors?
Change metrics (correlation 0.726 - 0.768)
• total and maximum number of bugfixes
• total authors
• total code churn
• total revisions

Static code metrics (correlation 0.610 - 0.683)
• maximum statements at block level one
• maximum and total statements at block level four
• maximum method call statements
The path forward
Use the human factor to ship faster...
• Create [AI] models that account for...
  o Experience
  o Knowledge of the code base
  o Code complexity
  o Measures of behavior
  o ... and much more
• Use the models as part of check-in and code health...
  o Developers with less risky profiles run fewer tests
  o Developers with higher-risk profiles run more tests
• Let automated systems running in parallel be the safety net.
Human factors success - Blint! By Erick Fejta
Let productivity soar!
End - Questions?
Name: Anthony F. Voellm (aka Tony)
Contact: [email protected]
Blog: http://perfguy.blogspot.com
G+: http://goo.gl/mPXcX
Twitter: @p3rfguy
Appendix
Abstract:
This talk will present data and findings on how behavioral analysis and developer assessment can be applied to improving productivity. Just imagine an engineering system that could recognize rushed check-ins, "grade" developer knowledge, and use that data to speed up development - "Congrats Jane, you know this code well ... no check-in test gate for you." The approach has been motivated by looking at today's test systems, tools, and processes and recognizing these are designed around the premise that all developers are created equal. Studies have shown developer error rates can vary widely and have a number of root causes - mind set of the developer at the time the code was written, experience level, amount of code in a check-in, complexity of the code, and much more. This talk will introduce a number of metrics and concepts such as cyclomatic complexity and Digital Code Forensics, and demonstrate how even modest application of the approach can speed up development. This is the bleeding edge of engineering productivity.
The message:
Not all developers have the same experience or skill level, and we can use this to improve the speed of development. Speed up the better developers, and slow down the less precise. We don't need a one-size-fits-all policy; however, we do need to base the decisions on data.
References
• http://en.wikipedia.org/wiki/Software_bug#cite_note-1
• http://www.cs.unm.edu/~forrest/classes/readings/HowDoFixesBecomeBugs.pdf
• http://blog.utest.com/the-software-testing-mindset/2012/02/
• http://www.cse.buffalo.edu/~mikeb/Billions.pdf
• http://software-testing-zone.blogspot.com/2008/12/why-are-bugsdefects-in-software.html
• http://www.itbusinessedge.com/cm/community/features/guestopinions/blog/battling-software-defects-one-developer-at-a-time/?cs=39611
• http://istqbexamcertification.com/what-is-the-psychology-of-testing/
• http://ubuntuforums.org/archive/index.php/t-1582847.html
• http://sqa.stackexchange.com/questions/545/how-does-a-testers-perspective-toward-software-differ-from-a-developers
• http://software-testing-zone.blogspot.com/2009/04/software-testing-diplomacy-deal.html