Page 1: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar

MSc Software Testing and Maintenance
MSc Prófun og viðhald hugbúnaðar

Fyrirlestrar 41 & 42 (Lectures 41 & 42)
Comparing Bug Finding Tools

Dr Andy Brooks

Can you detect me?

Page 2: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Case Study (Dæmisaga)

Reference
Comparing Bug Finding Tools with Reviews and Tests,
Stefan Wagner, Jan Jürjens, Claudia Koller, and Peter Trischberger,
Institut für Informatik, Technische Universität München, 2005
http://www4.in.tum.de/publ/papers/SWJJCKPT05.pdf

Page 3: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


1. Introduction

• Software quality assurance accounts for around 50% of the development time.

• Defect-detection techniques need to be improved and costs reduced.

• There are a number of automated static analysis tools called bug finding tools.

• Faults are the cause of failures in code.

Page 4: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Problem

1. Which kinds of defects are found by bug finding tools, reviews, and testing?

2. Are the same or different defects found?
   • How much overlap is there between the different techniques?

3. Do the static analysis tools produce too many false positives?

• Bug reports that are not actually bugs...

1. Introduction

Page 5: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Results
1. Bug finding tools detect only a subset of the kinds of defects that reviews find.
2. The tools are better regarding the bug patterns they are programmed for.
3. Testing finds completely different defects than bug finding tools.
4. Bug finding tools produce many more false positives than true positives.
5. Results of applying bug finding tools vary according to the project being studied.

1. Introduction

Page 6: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Consequences

1. Testing or reviews cannot be substituted by bug finding tools.

2. Bug finding tools could be usefully run before conducting reviews.

3. The false positive ratio from bug finding tools needs to be lowered to realise reductions in defect-detection effort.

4. Tools should be more tolerant of programming style and design.

1. Introduction

Page 7: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Experimental Setup

• Five Java projects
  – 4 industrial (telecoms company O2)
    • web information systems
  – 1 university
    • Technische Universität München
  – projects in use or in final testing
  – projects have an interface to a relational database
• Java bug finding tools and testing were applied to all 5 projects.

1. Introduction

Page 8: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Experimental Setup

• A review was applied to only one project.
• Reports from the bug finding tools were classified as true and false positives by experienced developers.
• Defects were classified by:
  – severity
  – type

1. Introduction

Page 9: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Techniques used by tools

• Bug patterns are based on experience and known pitfalls in a programming language.

• Readability is checked based on coding guidelines and standards.

• Dataflow and control-flow analysis.
• Code annotations to allow extended static checking/model checking (sketched below).
  – code annotation tools ignored in this study

2. Bug Finding Tools
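
The study ignored annotation-based tools, but a minimal sketch helps show the idea. This hypothetical class uses JSR 305 style nullness annotations (a later convention than the 2005-era tools examined here, and it requires the JSR 305 jar on the classpath); the annotations give a checker extra facts about intent to verify against.

    import javax.annotation.CheckForNull;
    import javax.annotation.Nonnull;

    // Hypothetical example: annotation-assisted static checking.
    public class AccountLookup {

        @CheckForNull  // callers are expected to handle a null result
        public String findOwner(@Nonnull String accountId) {
            // A checker can warn if a caller passes a possibly-null accountId,
            // or dereferences the returned value without a null check.
            return accountId.isEmpty() ? null : "owner-of-" + accountId;
        }
    }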

Page 10: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


The Java bug finding tools (example patterns sketched below)
• FindBugs Version 0.8.1
  – bug patterns & dataflow analysis
    • can detect unused variables
  – analyses bytecode
• PMD Version 1.8
  – coding standards
    • can detect empty try/catch blocks
    • can detect classes with high cyclomatic complexity
• QJ Pro Version 2.1
  – uses over 200 rules
    • can detect overly long variable names
    • can detect an imbalance between code and comment lines

2. Bug Finding Tools
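
To make two of the patterns above concrete, here is a small hypothetical Java fragment (class and method names invented for illustration). A bug pattern checker would typically flag the assigned-but-unused local variable and the empty catch block.

    import java.io.FileReader;
    import java.io.IOException;

    public class PatternExamples {

        public int countCharacters(String path) {
            int unused = 42;                   // initialised but never used
            int count = 0;
            try (FileReader reader = new FileReader(path)) {
                while (reader.read() != -1) {
                    count++;
                }
            } catch (IOException e) {
                // empty catch block: the failure is silently swallowed
            }
            return count;
        }
    }

FindBugs would look for such patterns in the compiled bytecode, PMD in the source.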

Page 11: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


3. Projects
• Project A
  – online shop
  – software in use for 6 months
  – 1066 Java classes, over 58 KLOC
• Project B
  – pay for goods
  – not operational at time of study
  – 215 Java classes, over 24 KLOC
• Project C
  – frontend for file converter
  – software in use for 3 months
  – over 3 KLOC and JSP code

Page 12: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


3. Projects
• Project D
  – data manager
  – J2EE application
  – 572 classes, over 34 KLOC
• EstA
  – non-industrial, requirements editor
  – not extensively used
  – over 4 KLOC

Page 13: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


4.1 General
• Bug finding tools used on all 5 projects.
• Black-box and white-box testing of all 5 projects.
• One review (Project C).
• Techniques used completely independently.
• Warnings from the tools are called positives and experienced developers classified them as true positives or false positives.

4. Approach

Page 14: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


4.1 General
• Validity threats include:
  – one review is not representative of reviews
  – only 3 bug finding tools were used
    • there are many more and results might be different
  – testing of the mature projects did not reveal many faults
    • too little data to make accurate statistical inferences
  – only 5 projects were analysed
    • more experiments are necessary

4. Approach

Page 15: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


4.2 Defect Categorisation

1. Defects that lead to a crash.
2. Defects that cause a logical failure.
3. Defects with insufficient error handling.
4. Defects that violate the principles of structured programming.
5. Defects that reduce code maintainability.

4. Approach

Page 16: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar

Table 1, Section 5 Analysis (table not reproduced in this transcript)

* over all projects

Page 17: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations

• Most of the true positives are Category 5.
  – code maintainability
• Different tools find different positives.
  – only one defect type was found across all tools*
• FindBugs is the only tool to find positives across all defect categories 1 through 5.
• FindBugs detects the largest number of defect types, QJ Pro the fewest.

5.1 Bug Finding Tools

Page 18: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations
• True positive detection is diverse.
  – For the defect type common to all tools, FindBugs finds only 4 true positives, PMD finds 29, and QJ Pro finds 30.
• FindBugs and PMD have lower false positive ratios than QJ Pro.
  – Because all warnings have to be examined, QJ Pro is not efficient.

5.1 Bug Finding Tools

FindBugs   PMD    QJ Pro   Total
  0.47     0.31    0.96     0.66

Table 2. Average ratios of false positives for each tool and in total.
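
A note on Table 2: the ratio is presumably the fraction of a tool's warnings that turn out not to be real defects, i.e. false positives / (true positives + false positives). With hypothetical numbers, a tool issuing 100 warnings of which 47 are spurious has a ratio of 0.47, so nearly half the effort of examining its warnings is spent on non-defects.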

Page 19: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations
• Efficiency of tools varied across projects.
  – For the Category 1 defect (“Database connection not closed”), FindBugs issued true positives for projects B and D but 46 false positives for project A.
  – Detection rates of true positives decrease for projects A and D for the other two tools (ignoring Category 5 defects).
• Recommending a single tool is difficult.
  – QJ Pro is the least efficient.
  – FindBugs and PMD should be used in combination.
    • FindBugs finds many different defect types.
    • PMD has accurate results for Category 5 defects.

5.1 Bug Finding Tools

Page 20: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


5.2 Bug Finding Tools vs. Review
• An informal review was performed on project C with three developers.
  – no preparation
  – code author was a reviewer
  – code inspected at the review meeting
  – 19 different types of defects were found

This variable is initialised but not used.

Page 21: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar

Section 5.2 (table not reproduced in this transcript)

Page 22: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations
• All defect types found by the tools* were also found by the review of project C:
  – “Variable initialised but not used”
    • The tools found 7 defects.
    • The review found only one.
  – “Unnecessary if clause” (sketched below)
    • The review found 8 defects.
      – An if-clause with no further computation.
      – 7 defects required investigation of program logic.
    • The tools found only one.
      – The if-clause with no further computation.

5.2 Bug Finding Tools vs Review
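
A minimal, hypothetical sketch of the one instance the tools could catch: an if-clause with no further computation. The other instances required reasoning about the surrounding program logic, which pattern-based tools do not attempt. Class and method names are invented.

    public class UnnecessaryIfExample {

        public String describe(int count) {
            if (count == 0) {
                // no further computation in this branch: the check contributes nothing
            }
            return "count=" + count;
        }
    }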

Page 23: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations
• But 17 additional defect types were found in the review, some of which could have been found by tools but were not (see the sketch below):
  – “Database connection is not closed” was not found by the tools.
  – FindBugs is generally able to detect “String concatenated inside loop with ‘+’” but did not.
    • to avoid creating unnecessary and unreferenced String objects
• Defect types such as “Wrong result” cannot be found by static tools but can be found in a review by manually executing a test case through the code.

5.2 Bug Finding Tools vs Review
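
Hypothetical sketches of the two defect types named above; the JDBC URL, table name, and method names are placeholders, not taken from the study's projects.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class ReviewFindingsExamples {

        // "Database connection is not closed": nothing ever calls close(), and an
        // exception would leak the connection as well.
        public int countOrders() throws SQLException {
            Connection connection = DriverManager.getConnection("jdbc:example://localhost/shop");
            Statement statement = connection.createStatement();
            ResultSet results = statement.executeQuery("SELECT COUNT(*) FROM orders");
            results.next();
            return results.getInt(1);   // connection, statement and result set stay open
        }

        // "String concatenated inside loop with '+'": each iteration creates new,
        // soon-unreferenced String objects.
        public String joinNames(String[] names) {
            String joined = "";
            for (String name : names) {
                joined = joined + name + ", ";
            }
            return joined;
        }
    }

A try-with-resources statement (or, in 2005-era Java, a finally block) would close the connection even on the exceptional path, and a StringBuilder would accumulate the names without the intermediate String objects.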

Page 24: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations
• By finding more defect types, the review of project C can be thought of as more successful than any tool.
• Perhaps it is beneficial to use a bug finding tool first because automated static analysis is cheap.
  – But bug finding tools produce many false positives, and the work involved in assessing a positive as false might outweigh the benefits of automatic static analysis.

5.2 Bug Finding Tools vs Review

Page 25: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


5.3 Bug Finding Tools vs. Testing
• Several hundred test cases were executed.
• Black-box test cases were based on the textual specifications and the experience of the testers (see the sketch below):
  – equivalence partitioning
  – boundary value analysis
• White-box test cases involved path testing.
  – Path selection criteria are not specified.
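
The study does not list its test cases, so here is a small, self-contained JUnit 5 sketch of how equivalence partitioning and boundary value analysis turn a specification into black-box test cases. The function under test, its specification, and the chosen values are all invented for illustration.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;

    import org.junit.jupiter.api.Test;

    class BoundaryValueSketchTest {

        // Spec (hypothetical): quantities 1..99 earn a 5% discount, 100 and above
        // earn 10%, anything below 1 is invalid.
        static double discountFor(int quantity) {
            if (quantity < 1) {
                throw new IllegalArgumentException("quantity must be at least 1");
            }
            return quantity >= 100 ? 0.10 : 0.05;
        }

        @Test
        void representativeOfTheValidPartition() {
            assertEquals(0.05, discountFor(50));    // any value from the 1..99 class
        }

        @Test
        void boundaryValues() {
            assertEquals(0.05, discountFor(1));     // lower boundary of the valid class
            assertEquals(0.05, discountFor(99));    // just below the 10% threshold
            assertEquals(0.10, discountFor(100));   // first value in the 10% class
        }

        @Test
        void invalidPartitionIsRejected() {
            assertThrows(IllegalArgumentException.class, () -> discountFor(0));
        }
    }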

Page 26: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


5.3 Bug Finding Tools vs. Testing

• A coverage tool checked test set quality.
  – Coverage was high apart from project C.
  – “In all the other projects, class coverage was nearly 100%, method coverage was also in that area and line coverage lay between 60% and 93%.”
• No stress tests were executed.
  – This “might have changed the results significantly”.
• Defects were found only for project C and project EstA.
  – Other projects were “probably too mature”.

Page 27: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Page 28: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


Observations and Interpretations

• Dynamic testing found defects in Categories 1, 2, and 3, but not 4 or 5.
  – Category 5 defects are not detectable by dynamic testing.
• Dynamic testing of project C and project EstA found completely different defects to those found by the bug finding tools.

• Stress testing might have revealed the database connections that were not closed.

• “Therefore, we again recommend using both techniques in a project.”

5.3 Bug Finding Tools vs. Testing

Page 29: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


5.4 Defect Removal Efficiency

• The total number of defects is unknown but can be estimated using all the defects found so far.

• Without regard to severity of defect, efficiency is poor for tests and good for the bug finding tools.

(Only 1 defect found in common: between Review and Tools.)
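
Defect removal efficiency is the share of the estimated total defect population that a technique finds. A sketch with hypothetical numbers (not the study's figures): if 80 distinct defects are known across all techniques and a bug finding tool found 60 of them, its efficiency estimate is 60 / 80 = 0.75, whereas a test campaign that found only 8 of them scores 8 / 80 = 0.10.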

Page 30: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


5.4 Defect Removal Efficiency

• With regard to severity of defect, tests and reviews are “far more efficient in finding defects of the categories 1 and 2 than the bug finding tools”.

Page 31: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


6. Discussion

• The results are not too surprising:
  – Static tools, with no model checking capabilities, are limited and cannot verify program logic.
  – Reviews and tests can verify program logic.
• Perhaps surprising is that there was not a single defect detected both by the tools and testing.
  – Few defects, however, were found during testing since most of the projects were mature and already in operation. This may explain the lack of overlap.

Page 32: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


6. Discussion
• “A rather disillusioning result is the high ratio of false positives that are issued by the tools.”
  – The benefits of automated detection are outweighed by the need to manually determine a positive is false.

• No cost/benefit analysis performed in this study.

Page 33: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


6. Discussion
• Some bug finding tools make use of additional annotations that permit some checks of logic.
  – The number of false positives could be reduced.
  – Category 1 and 2 defect detection could be increased.
  – But savings could be outweighed by the need to add annotations to the source code.

Page 34: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


8. Conclusions

• The work is not a comprehensive empirical study and provides only “first indications” of the effectiveness of bug finding tools relative to other techniques.
  – Further experimental work is needed.
  – Cost/benefit models need to be built.

Page 35: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


8. Conclusions

• Bug finding tools find:
  – different defects than testing
  – a subset of the types a review finds
• Bug finding tool effectiveness varied from project to project.
  – Probably because of different programming style and design in use.
• Andy asks: how should we incorporate the idea of maintainability into static analysis tools?

Page 36: MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar


8. Conclusions

• If the number of false positives were much lower, it would be safe to recommend using bug finding tools, reviews and testing in a combined approach.
  – “It probably costs more time to resolve the false positives than is saved by the automation using the tools.”

Looks like another false positive and another two minutes of my time wasted...

