A Dissertation entitled
A Cost Effective Methodology for Quantitative Evaluation of
Software Reliability using Static Analysis
by
Walter W. Schilling, Jr.
Submitted as partial fulfillment of the requirements for the Doctor of Philosophy in Engineering
Advisor: Dr. Mansoor Alam
Graduate School
The University of Toledo
December 2007
The University of Toledo
College of Engineering
I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY
SUPERVISION BY Walter W. Schilling, Jr.
ENTITLED A Cost Effective Methodology for Quantitative Evaluation of Software
Reliability using Static Analysis
BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF DOCTOR OF PHILOSOPHY IN ENGINEERING
_____________________________________________________________________
Dissertation Advisor: Dr. Mansoor Alam
Recommendation concurred by:
_______________________________________ Dr. Mohsin Jamali
_______________________________________ Dr. Vikram Kapoor
_______________________________________ Dr. Henry Ledgard
_______________________________________ Dr. Hilda Standley
_______________________________________ Mr. Michael Mackin
_______________________________________ Mr. Joseph Ponyik
_____________________________________________________________________
Dean, College of Engineering
Committee on Final Examination
Copyright © 2007
All Rights Reserved. This document is copyrighted material. Under copyright
law, no parts of this document may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise without prior written permission of the author.
Typeset using the LaTeX documentation system with the MiKTeX package developed
by Christian Schenk.
All trademarks are the property of their respective holders and are hereby ac-
knowledged.
An Abstract of
A Cost Effective Methodology for Quantitative Evaluation of Software
Reliability using Static Analysis
Walter W. Schilling, Jr.
Submitted in partial fulfillment of the requirements for the
Doctor of Philosophy in Engineering
The University of Toledo
December 2007
Software reliability represents an increasing risk to overall system reliability. As
systems have become larger and more complex, mission critical and safety critical
systems have increasingly had functionality controlled exclusively through software.
This change has resulted in a shift of the root cause of systems failure from hardware
to software. Market forces have encouraged projects to reuse existing software as well
as purchase COTS solutions. This has made the usage of existing reliability models
difficult. Traditional software reliability models require significant testing data to be
collected during development. If this data is not collected in a disciplined manner
or is not made available to software engineers, these modeling techniques can not be
applied. It is imperative that practical reliability modeling techniques be developed
to address these issues. This dissertation puts forth a practical method for estimating
software reliability.
The proposed software reliability model combines static analysis of existing source
code modules, functional testing with execution path capture, and a series of Bayesian
Belief Networks. Static analysis is used to detect faults within the source code which
may lead to failure. Code coverage is used to determine which paths within the source
code are executed as well as the execution rate. The Bayesian Belief Networks combine
these parameters and estimate the reliability for each method. A second series of
Bayesian Belief Networks then combines the data for each method to determine the
overall reliability for the system.
In order to use this model, the SOSART tool is developed. This tool serves as
a reliability modeling tool and a bug finding meta tool suitable for comparing the
results of different static analysis tools.
Verification of the model is presented through multiple experimental instances.
Validation is first demonstrated through the application to a series of Open Source
software packages. A second validation is provided using the Tempest Web Server,
developed by NASA Glenn Research Center.
Dedication
I would like to dedicate this dissertation to my wife Laura, whose help and assis-
tance have allowed me to persevere through the struggles of doctoral studies.
Acknowledgments
I would like to take this opportunity to express my thanks to the many persons
who have assisted my research as a doctoral student.
First and foremost, I would like to recognize the software companies whom I am
indebted to for the usage of their tools in my research. In particular, this includes
Gimpel Software, developer of the PC-Lint static analysis tool; Fortify Software,
developer of the Fortify SCA security analysis tool; Programming Research Limited,
developer of the QAC, QAC++, and QAJ tools; and SofCheck, developer of the
SofCheck static analysis tools. Through their academic licensing programs, I was able
to use these tools at greatly reduced costs for my research, and without their support,
it would have been virtually impossible to conduct successful experimentation.
Second, I would like to acknowledge the Ohio Supercomputing Center in Colum-
bus, OH. My research included a grant of computing on the supercomputing cluster
in Columbus which was used for reliability research and web reliability calculation.
I am indebted to the NASA Glenn Research Center in Cleveland and the Flight
Software Engineering Branch, specifically Michael Mackin, Joseph Ponyik, and Kevin
Carmichael. Their assistance during my one summer on site proved vital in the
development of this practical and applied software reliability model.
I am also indebted to the Ohio Space Grant Consortium, located in Cleveland,
Ohio. Their doctoral fellowship supported my graduate studies and made it possible
for me to complete this work.
I am also indebted to Dr. Mohsin Jamali, Dr. Vikram Kapoor, Dr. Henry
Ledgard, Dr. Hilda Standley, and Dr. Afzal Upal from the University of Toledo who
at various times served on my dissertation committee. As a last statement, I would
like to express my gratitude to my advisor, Dr. Mansoor Alam.
Contents
Abstract iv
Dedication vi
Acknowledgments vii
Contents ix
List of Figures xiv
List of Tables xvii
1 Introduction and Key Contributions 1
1.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Dissertation Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Software Field Failure Case Studies 12
2.1 COTS System Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Embedded System Failures . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Hard Limits Exceeded . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Survey of Static Analysis Tools and Techniques 28
3.1 General Purpose Static Analysis Tools . . . . . . . . . . . . . . . . . 31
3.1.1 Commercial Tools . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1.2 Academic and Research Tools . . . . . . . . . . . . . . . . . . 37
3.2 Security Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.1 Commercial Tools . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.2 Academic and Research Tools . . . . . . . . . . . . . . . . . . 43
3.3 Style Checking Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.1 Academic Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Teaching Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.1 Commercial Tools . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.2 Academic Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4 Proposed Software Reliability Model 48
4.1 Understanding Faults and Failures . . . . . . . . . . . . . . . . . . . 49
4.1.1 Classifying Software Faults . . . . . . . . . . . . . . . . . . . . 51
4.2 What Causes a Fault to Become a Failure . . . . . . . . . . . . . . . 53
4.2.1 Example Statically Detectable Faults . . . . . . . . . . . . . . 53
4.2.2 When Does a Fault Manifest Itself as a Failure . . . . . . . . . 58
4.3 Measuring Code Coverage . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3.1 The Static Analysis Premise . . . . . . . . . . . . . . . . . . . 70
5 Static Analysis Fault Detectability 83
5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.2.1 Basic Fault Detection Capabilities . . . . . . . . . . . . . . . . 88
5.2.2 The Impact of False Positives and Style Rules . . . . . . . . . 91
5.3 Applicable Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 Bayesian Belief Network 96
6.1 General Reliability Model Overview . . . . . . . . . . . . . . . . . . . 97
6.2 Developed Bayesian Belief Network . . . . . . . . . . . . . . . . . . . 98
6.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.2 Confirming Fault Validity and Determining Fault Risk . . . . 99
6.2.3 Assessing Fault Manifestation Likelihood . . . . . . . . . . . . 103
6.2.4 Determining Reliability . . . . . . . . . . . . . . . . . . . . . . 108
6.3 Multiple Faults In a Code Block . . . . . . . . . . . . . . . . . . . . . 110
6.4 Combining Code Blocks to Obtain Net Reliability . . . . . . . . . . . 114
7 Method Combinatorial Network 117
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.2 Problems with Markov Models . . . . . . . . . . . . . . . . . . . . . . 122
7.3 BBNs and the Cheung Model . . . . . . . . . . . . . . . . . . . . . . 123
7.4 Method Combinatorial BBN . . . . . . . . . . . . . . . . . . . . . . . 124
7.5 Experimental Model Validation . . . . . . . . . . . . . . . . . . . . . 127
7.6 Extending the Bayesian Belief Network . . . . . . . . . . . . . . . . . 129
7.7 Extended Network Verification . . . . . . . . . . . . . . . . . . . . . . 131
7.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8 The Software Static Analysis Reliability Tool 135
8.1 Existing Software Reliability Analysis Tools . . . . . . . . . . . . . . 135
8.2 Sosart Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.3 SOSART Implementation Metrics . . . . . . . . . . . . . . . . . . . . 140
8.4 External Software Packages Used within SOSART . . . . . . . . . . . 143
8.5 SOSART Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.5.1 Commencing Analysis . . . . . . . . . . . . . . . . . . . . . . 144
8.5.2 Obtaining Program Execution Profiles . . . . . . . . . . . . . 149
8.5.3 Importing Static Analysis Warnings . . . . . . . . . . . . . . . 154
8.5.4 Historical Database . . . . . . . . . . . . . . . . . . . . . . . . 158
8.5.5 Exporting Graphics . . . . . . . . . . . . . . . . . . . . . . . . 161
8.5.6 Printing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.5.7 Project Saving . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.5.8 GUI Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.5.9 Reliability Report . . . . . . . . . . . . . . . . . . . . . . . . . 164
9 Model Validation using Open Source Software 171
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.2 STREW Metrics and GERT . . . . . . . . . . . . . . . . . . . . . . . 172
9.3 Real Estate Program Analysis . . . . . . . . . . . . . . . . . . . . . . 174
9.4 JSUnit Program Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.5 Jester Program Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.6 Effort Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
10 Model Validation using Tempest 185
10.1 Tempest Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
10.2 Evaluating Tempest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.2.1 Java Web Tester . . . . . . . . . . . . . . . . . . . . . . . . . 189
10.2.2 Initial Experimental Setup . . . . . . . . . . . . . . . . . . . . 191
10.2.3 Final Experiment Setup . . . . . . . . . . . . . . . . . . . . . 193
10.2.4 Measured Reliability . . . . . . . . . . . . . . . . . . . . . . . 196
10.2.5 Analyzing the Source code for Statically Detectable Faults . . 198
10.2.6 SOSART Reliability Assessment . . . . . . . . . . . . . . . . . 200
11 Conclusions and Future Directions 202
11.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
11.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Bibliography 210
A Fault Taxonomy 239
B SOSART Requirements 243
B.1 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 243
B.2 Development Process requirements . . . . . . . . . . . . . . . . . . . 246
B.3 Implementation Requirements . . . . . . . . . . . . . . . . . . . . . . 247
List of Figures
2-1 Source code which caused AT&T Long Distance Outage. . . . . . . . 22
4-1 The Relationship between Faults and Failures. . . . . . . . . . . . . . 51
4-2 Venn Diagram classifying software bugs. . . . . . . . . . . . . . . . . 52
4-3 Source code exhibiting uninitialized variable. . . . . . . . . . . . . . . 54
4-4 source code exhibiting statically detectable faults. . . . . . . . . . . . 55
4-5 PC Lint for buffer overflow.c source file. . . . . . . . . . . . . . . . . 55
4-6 Source exhibiting loop overflow and out of bounds array access. . . . 57
4-7 Source exhibiting statically detectable mathematical error. . . . . . . 63
4-8 GNU gcov output from testing prime number source code. . . . . . . 64
4-9 Control flow graph for calculate distance to next prime number method. . 65
4-10 Source exhibiting uninitialized variable. . . . . . . . . . . . . . . . . . 66
4-11 Control flow graph for do walk method. . . . . . . . . . . . . . . . . . 67
4-12 Source code which determines if a timer has expired. . . . . . . . . . 71
4-13 Translation of timer expiration routine from C to Java. . . . . . . . . 72
4-14 Flowchart for check timer routine.. . . . . . . . . . . . . . . . . . . . 73
4-15 gcov output for functional testing of timer routine. . . . . . . . . . . 77
4-16 Modified timer source code to output block trace. . . . . . . . . . . . 79
4-17 Rudimentary trace output file. . . . . . . . . . . . . . . . . . . . . . . 80
4-18 gdb script for generating path coverage output trace. . . . . . . . . . 81
6-1 BBN for faults, code coverage, and reliability. . . . . . . . . . . . . . 99
6-2 BBN relating two statically detectable faults. . . . . . . . . . . . . . . 111
6-3 BBN Combining four statically detectable faults. . . . . . . . . . . . 115
6-4 BBN Network to combine multiple blocks with multiple faults. . . . . 116
7-1 A program flow graph. . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7-2 Program flow graph with internal loop. . . . . . . . . . . . . . . . . . 121
7-3 Basic BBN for modeling a Markov Model. . . . . . . . . . . . . . . . 125
7-4 A program flow graph. . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7-5 Extended BBN for modeling a Markov Model. . . . . . . . . . . . . . 130
7-6 Extended BBN allowing up to n nodes to be assessed for reliability . 131
7-7 Extended program flow graph. . . . . . . . . . . . . . . . . . . . . . . 132
8-1 Analysis menu used to import Java source code files. . . . . . . . . . 145
8-2 Summary panel for imported class.. . . . . . . . . . . . . . . . . . . . 146
8-3 Source Code Panel for program. . . . . . . . . . . . . . . . . . . . . . 147
8-4 Basic Activity Diagram for program. . . . . . . . . . . . . . . . . . . 148
8-5 Basic Tracepoint Panel for program. . . . . . . . . . . . . . . . . . . 149
8-6 Java Tracer command line usage. . . . . . . . . . . . . . . . . . . . . 150
8-7 Java Tracer execution example. . . . . . . . . . . . . . . . . . . . . . 150
8-8 XML file showing program execution for HTTPString.java class. . . . 152
8-9 Execution trace within the SOSART tool. . . . . . . . . . . . . . . . 153
8-10 Taxonomy assignment panel. . . . . . . . . . . . . . . . . . . . . . . . 155
8-11 A second taxonomy definition panel. . . . . . . . . . . . . . . . . . . 155
8-12 Imported Static Analysis Warnings. . . . . . . . . . . . . . . . . . . . 166
8-13 Static Analysis Verification panel. . . . . . . . . . . . . . . . . . . . . 167
8-14 Static Analysis fault report. . . . . . . . . . . . . . . . . . . . . . . . 168
8-15 Analyze menu of SOSART tool. . . . . . . . . . . . . . . . . . . . . . 168
8-16 SOSART Program Configuration Panel. . . . . . . . . . . . . . . . . 169
8-17 SOSART Reliability Report Panel. . . . . . . . . . . . . . . . . . . . 169
8-18 Textual Export of Reliability report for analyzed source. . . . . . . . 170
10-1 Flow diagram showing relationship between Tempest, experiments, and laptop
web browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
10-2 Web tester GUI Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
10-3 OCARNet Lab topology. . . . . . . . . . . . . . . . . . . . . . . . . . 192
10-4 Network topology for test setup. . . . . . . . . . . . . . . . . . . . . . 194
10-5 DummyLogger.java class. . . . . . . . . . . . . . . . . . . . . . . . . . 195
10-6 Modified NotFoundException.java file. . . . . . . . . . . . . . . . . . 196
List of Tables
1.1 The cost of Internet Downtime . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Software Failure Root Cause Descriptions . . . . . . . . . . . . . . . . 13
3.1 Summary of static analysis tools . . . . . . . . . . . . . . . . . . . . . 30
4.1 Relationship between faults and failures in different models . . . . . . 49
4.2 Discrete Paths through sample function . . . . . . . . . . . . . . . . . 74
4.3 Execution Coverage of various paths . . . . . . . . . . . . . . . . . . 76
4.4 Execution Coverage of various paths . . . . . . . . . . . . . . . . . . 80
5.1 Static Analysis Fault Categories for Validation Suite . . . . . . . . . . 85
5.2 Summary of fault detections. . . . . . . . . . . . . . . . . . . . . . . . 89
5.3 Static Analysis Detection Rate by Tool Count . . . . . . . . . . . . . 90
5.4 Correlation between warning tool detections. . . . . . . . . . . . . . . 90
5.5 Static Analysis Tool False Positive and Stylistic Rule Detections . . . 91
5.6 Correlation between false positive and stylistic rule detections . . . . 92
5.7 Percentage of warnings detected as valid based upon tool and warning. 95
6.1 Bayesian Belief Network State Definitions . . . . . . . . . . . . . . . 100
6.2 Network Combinatorial States . . . . . . . . . . . . . . . . . . . . . . 112
6.3 Network Worst Case Combinatorial Probabilities . . . . . . . . . . . 112
6.4 Network Typical Combinatorial Probabilities . . . . . . . . . . . . . . 113
7.1 Bayesian Belief Network States Defined for Reliability . . . . . . . . . 126
7.2 Bayesian Belief Network States Defined for Execution Rate . . . . . . 126
7.3 Markov Model Parameter Ranges . . . . . . . . . . . . . . . . . . . . 128
7.4 Differences between the Markov Model reliability values and the BBN
Predicted Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.5 Test Error ranges and counts . . . . . . . . . . . . . . . . . . . . . . . 129
7.6 Test error relative to normal distribution . . . . . . . . . . . . . . . . 129
7.7 Extended Bayesian Belief Network States Defined for Execution Rate 131
7.8 Markov Model Parameter Ranges . . . . . . . . . . . . . . . . . . . . 133
7.9 Differences between the Markov Model reliability values and the BBN
Predicted Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.10 Test Error ranges and counts . . . . . . . . . . . . . . . . . . . . . . . 134
7.11 Test error relative to normal distribution . . . . . . . . . . . . . . . . 134
8.1 SOSART Development Metrics . . . . . . . . . . . . . . . . . . . . . 141
8.2 SOSART Overview Metrics . . . . . . . . . . . . . . . . . . . . . . . 142
8.3 A listing of SOSART supported static analysis tools. . . . . . . . . . 154
9.1 Real Estate Overview Metrics . . . . . . . . . . . . . . . . . . . . . . 174
9.2 RealEstate Class Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.3 Real Estate STREW Metrics Reliability Parameters . . . . . . . . . . 176
9.4 RealEstate Class Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.5 JSUnit Overview Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.6 JSUnit Class Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.7 JSUnit STREW Metrics Reliability Parameters . . . . . . . . . . . . 179
9.8 JSUnit Static Analysis Findings . . . . . . . . . . . . . . . . . . . . . 179
9.9 Jester Overview Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.10 Jester Class Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.11 Jester STREW Metrics Reliability Parameters . . . . . . . . . . . . . 181
9.12 Jester Static Analysis Findings . . . . . . . . . . . . . . . . . . . . . 182
9.13 Software Complete Review Effort Estimates . . . . . . . . . . . . . . 183
9.14 Software Reliability Modeling Actual Effort . . . . . . . . . . . . . . 184
10.1 Tempest Overview Metrics . . . . . . . . . . . . . . . . . . . . . . . . 188
10.2 Tempest Class Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.3 Tempest Configuration Parameters . . . . . . . . . . . . . . . . . . . 189
10.4 Tempest Test Instance Configurations . . . . . . . . . . . . . . . . . . 193
10.5 Tempest Field Measured Reliabilities . . . . . . . . . . . . . . . . . . 197
10.6 Tempest Rule Violation Count with All Rules Enabled . . . . . . . . 198
10.7 Tempest Rule Violation Densities with All Rules Enabled . . . . . . . 199
10.8 Static Analysis Rule Configuration Metrics . . . . . . . . . . . . . . . 199
10.9 Tempest Rule Violation Count with Configured Rulesets. . . . . . . . 200
10.10 Tempest Estimated Reliabilities using SOSART . . . . . . . . . . . 200
A.1 SoSART Static Analysis Fault Taxonomy . . . . . . . . . . . . . . . . 239
Chapter 1
Introduction and Key
Contributions
“The most significant problem facing the data processing business to-
day is the software problem that is manifested in two major complaints:
Software is too expensive and software is unreliable.”[Mye76]
This statement, leading into Myers book Software Reliability: Principles and
Practices was written in 1976. Yet, even today, this statement is equally valid, and
thus, it forms a perfect introductory quotation for this dissertation.
The problems of software reliability are not new. The first considerations of soft-
ware reliability, in fact, were made during the late 1960s as system downtime began
to become a significant issue[Sho84]. The earliest effort towards the development of
a reliability model was a Markov birth-death model developed in 1967[Sho84]. How-
ever, beyond a few specialized application environments, software failure was regarded
as an unavoidable nuisance. This has been especially true in the area of embedded
systems, where cost and production factors often outweigh quality and reliability
issues.
Increased systemic reliance upon software has begun to change this attitude, and
software reliability is beginning to be significantly considered in each new product
development. Software is becoming increasingly responsible for the safety-critical
functionality in the medical, transportation, and nuclear energy fields. The software
content of embedded systems doubles every 18 months[Hot01], and many automotive
products, telephones, routers, and consumer appliances incorporate 100 KB or more
of firmware. The latest airplanes under development contain over 5 million lines of
code, and even older aircraft contain upwards of 1 million lines of code[Sha06]. A
study of airworthiness directives indicated that 13 out of 33 issued for the period
1984-1994, or 39%, were directly related to software problems[Sho96]. The medical
field faces similar complexity and reliability problems. 79% of medical device recalls
can be attributed to software defects[Ste02]. In today’s consumer electronic products,
software has become the principal source for reliability problems, and it is reported
that software driven outages exceed hardware outages by a factor of ten[EKN98].
Aside from the inconvenience and potential safety hazards related to software fail-
ures, there is a huge economic impact as well. A 2002 study by the National Institute
of Standards and Technology found that software defects cost the American economy $59.5 billion
annually [Tas02]. For the fiscal year 2003, the Department of Defense is estimated to
have spent $21 billion on software development. Upwards of $8 billion, or 40% of the
total, was spent on reworking software due to quality and reliability issues[Sch04b].
Table 1.1: The cost of downtime per hour in 2000[Pat02]
Brokerage operations | $6,450,000
Credit card authorization | $2,600,000
Ebay | $225,000
Amazon.com | $180,000
Package shipping services | $150,000
Home shopping channel | $113,000
Catalog sales center | $90,000
Airline reservation center | $89,000
Cellular service activation | $41,000
On-line network fees | $25,000
ATM service fees | $14,000
But it is not only these large systems that have enormous economic costs. Delays in
the Denver airport automated luggage system due to software problems cost $1.1 mil-
lion per day[Nta97]. A single automotive software error led to a recall of 2.2 million
vehicles and expenses in excess of $20 million[LB05].
In the embedded systems world, software development costs have skyrocketed to
the point that modern embedded software typically costs between $15 and $30 per line
of source code[Gan01]. But, the development costs can be considered small when one
considers the economic cost for system downtime, which ranges from several thousand
dollars per hour to several million dollars per hour depending upon the organization,
as is shown in Table 1.1.
These economic costs, coupled with the associated legal liabilities, have made soft-
ware reliability an area of extreme importance. That being said, economic pressures,
including decreased time to market, stock market return on investment demands, and
shortages of skilled programmers have led to business decisions being made that may
work against increasing reliability. In many projects, software reuse has been consid-
ered as a mechanism to solve both the problems of decreased delivery schedules and
increasing software cost. However, software reliability has not necessarily increased
with reuse. The Ariane 5[JM97]1 and Therac-25[LT93] failures can be directly related
to improper software reuse.
In today’s software development environment, the development of even a rela-
tively simple software system is often partitioned to multiple contractors or vendors,
typically referred to as suppliers. Each supplier is responsible for delivering their
piece of software. Each piece is then integrated to form a final product. Over the
course of a product lifecycle, a given vendor may release hundreds of versions of their
component. Each time a new release is made, the integration team must make a de-
cision as to whether or not it is safe to integrate the given component into the overall
product. Unfortunately, there is often little concrete knowledge to base this decision
upon, and the exponential growth in releases by COTS vendors has been shown to
cause a reduction in net software reliability over time[MMZC06].
The Capability Maturity Model (CMM) was developed by Carnegie Mellon Uni-
versity to assist in assessing the capabilities of government contractors to deliver
software on time and on budget. As such, software development companies are as-
sessed on a scale of 1 to 5 relative to their development capabilities, with 5 being
the best and 1 being the lowest. However, due to many issues, the usage of CMM
assessment has been problematic in filtering capable companies from incapable com-
panies, as is discussed in O’Connell and Saiedian [OS00] and [Koc04]. It has also had
problems related to inconsistent assessment by different assessors, as is documented
1 The Ariane 5 failure is discussed in further detail in Chapter 2 of this dissertation.
in Saiedian and Kuzara[SK95]. Thus, an engineer cannot rely on CMM assessments
to determine if it is acceptable to integrate a new release of a software component
into a product.
1.1 The Problem
Traditional software reliability models require significant data collection during
development and testing, including the operational time between failures, the severity
of the failures, and other metrics. This data is then applied to the project to determine
if an adequate software reliability has been achieved. While these methods have been
applied successfully to many projects, there are often occasions where the failure data
has not been collected in an adequate fashion to obtain relevant results. This is often
the case when reusing software which has been developed previously or purchasing
COTS components for usage within a project. In the reuse scenario, the development
data may have been lost or never collected. In the case of COTS software, the requisite
development data may be proprietary and unavailable to the customer.
This poses a dilemma for a software engineer wishing to reuse a piece of software or
purchase a piece of software from a vendor. Internal standards for release vary greatly,
and from the outside, it is impossible to know where on the software reliability curve
the existing software actually stands. One company might release early on the curve,
resulting in more failures occurring in the field, whereas another company might
release later in the curve, resulting in fewer field defects. Further complicating this
decision are licensing issues. In traditional software reliability models, whenever a
failure occurs it is nearly immediately repaired and a new version is released. With
COTS and open source code, this immediate response generally does not occur, and
the software must be used “as is”. Licensing agreements may also restrict a customer
from fixing a known defect which leads to a failure or risk support termination if such
a fix is attempted.
Independent verification of third party software can be used as a mechanism to
aid in assessing externally developed software. However, independent verification of
third party code can be a costly and time-consuming endeavor. Huisman et al. [HJv00]
report that the verification of one Java class, namely the Java Vector class, required
nearly 700 man hours of effort.
This leads to the following engineering problem:
How can one economically ensure that software delivered from an ex-
ternal vendor is of sufficient reliability to ensure that the final delivered
product obtains an acceptable level of reliability?
It is this very problem that this dissertation intends to address.
1.2 Key Contributions
Thus far, we have discussed the problems of software reliability and ensuring that
reused software is of acceptable quality. A key emphasis has been on the aspect
of the economic costs of software failure. In this section, our intent is to outline the
key contributions to the software engineering body of knowledge that this dissertation
provides as a partial solution to the problem of software reliability modeling of reused
and COTS software.
In order to address the needs of software practitioners, any developed model must
be readily understood, easily applied, and generate no more than a small increase
in development costs. Without meeting these criteria, the likelihood of adoption by
development organizations is decreased.
Static analysis can operate in a manner similar to that of a compiler. An un-
derstanding of compiler usage is a required skill for all software engineers developing
embedded systems software, and thus, by using static analysis tools to detect stati-
cally detectable faults within source code modules, the additional knowledge required
of a practitioner to apply this model is minimized. Static analysis tools offer yet
another benefit. Because the interface to static analysis tools is similar to that of a
compiler, the operation of static analysis tools can be highly automated through the
usage of a build process. This automation not only adds to the repeatability of the
analysis, but also serves to obtain a cost effective analysis.
Research contributions of this dissertation are:
1. A demonstration of the effectiveness of static analysis tools.
This dissertation shows that for the Java programming language, through the
usage of an appropriate set of static analysis tools, a significant number of
commonly encountered programming mistakes can be detected and analyzed.
2. The design of a Bayesian Belief Network which relates software reliability to
statically detectable faults and program execution profiles.
In carrying out this research, a Bayesian Belief Network was developed using
expert knowledge which relates the risk of failure to statically detected faults.
3. The design of a Bayesian Belief Network which relates method execution rates
and reliability and obtains results comparable to those obtained through Markov
Modeling.
This network serves to combine the reliabilities of multiple methods based upon
their execution frequency and their individual reliability measures.
4. The design and development of a new software tool, SOSART, which allows the
application of the described model to existing software modules.
The SOSART tool simplifies the application of this model to existing source
code by providing the user a convenient interface to analyze detected faults as
well as capture execution profiles.
5. The successful demonstration of the reliability model through two different sets
of experiments which shows that the reliability model developed here can accu-
rately predict software reliability.
In the first set of experiments, readily available Open Source software is an-
alyzed for reliability using the STREW metrics suite. The software is then
re-analyzed using the SOSART method and the results are compared. The
second experiment applies the model to an existing medium scale embedded
systems project and compares the results to actual reliability data obtained
from program operation.
1.3 Dissertation Overview
Chapter 1 has provided a synopsis of the software reliability problem and a jus-
tification for the research that follows. Next, the problem statement is clearly and
succinctly stated. An overview of the key contributions of this research to the software
engineering body of knowledge is presented as well.
Chapter 2 provides case studies of software failure. Understanding how software
fails in the field is vital in order to create a practical software reliability model which
is applicable to today’s competitive software development environments. In this case,
the failures discussed have been chosen to be ones which occurred due to a software
fault which was statically detectable through the usage of the appropriate static
analysis tools. The discussion of these failures reinforces the justification given in Chapter 1
for the research objectives.
Chapter 3 provides an overview of static analysis tools. The research upon which
this dissertation is based relies heavily upon the capabilities of static analysis tools
to reliably detect injected faults at the source code level. In order to understand
the assessment capabilities of static analysis tools, this chapter provides a detailed
overview of the static analysis tools which are currently available.
Chapter 4 presents the key areas of contribution of our research by presenting key
concepts for our reliability model, an analysis of faults, and how certain faults lead to
failures. Then we discuss how to measure code coverage. Code coverage plays a major
role both in achieving software reliability through testing and in revealing
faults which manifest themselves as failures. This chapter then concludes with the
details of our proposed software reliability model, which combines static analysis of
existing source code modules, black-box testing of the existing source code module
while observing code coverage, and a series of Bayesian Belief Networks to combine
the data into a meaningful reliability measure.
Chapter 5 discusses the results of our experimentation with static analysis tools
to determine their effectiveness at detecting commonly injected source code faults.
While extensive studies have been done for the C, C++, and Ada languages, there
have been very few studies on the effectiveness of Java static analysis tools. This
research works with the Java language. This chapter presents both the experiment
design as well as the results of the experiment, indicating that static analysis can be
an effective mechanism for detecting faults which may cause run time failures in the
Java language.
Chapters 6 and 7 discuss in detail the developed Bayesian Belief Networks which
combine the results of static analysis execution with program execution traces ob-
tained during limited testing. In general, there are three sets of Bayesian Belief
Networks within the model. One Bayesian belief network combines information to de-
termine the risk of failure from a single statically detectable fault. A second Bayesian
Belief network combines statically detectable faults which are co-located within a
single code block to obtain the reliability of the given block. A third Bayesian Belief
Network is then used to combine the code blocks together to assess the net reliability
for each method.
Chapter 8 discusses the Software Static Analysis Reliability Toolkit (SOSART)
which has been designed to allow the usage of the proposed model in assessing the
reliability of existing software packages. The SOSART tool, developed entirely in the
Java language, interacts with the static analysis tools to allow statically detectable
faults to be combined with execution traces and analyzed using the proposed relia-
bility model. Without such a tool, the usage of this model is impractical due to the
large data processing overhead.
Chapter 9 discusses the experiments conducted on Open Source software pack-
ages which used the STREW metrics suite to assess reliability. In this chapter, the
methodology used is discussed, as well as the results of applying the methodology.
The chapter concludes with a discussion on the effort to apply this model in compar-
ison to 100% source code review.
Chapter 10 discusses the application of this model to a specific embedded reusable
component. In this case, the large scale component is assessed for its reliability and
then the reliability of this component is compared with the reliability obtained from
actual field execution and testing. This chapter highlights the economic advantages
of this model versus a complete peer review of the source code for reliability purposes.
This chapter also demonstrates this model’s overall effectiveness.
Chapter 11 summarizes the work and provides some suggestions for future work.
Chapter 2
Software Field Failure Case
Studies1
“The lessons of cogent case histories are timeless, and hence a height-
ened awareness among today’s designers of classic pitfalls of design logic
and errors of design judgment can help prevent the same mistakes from
being repeated in new designs.”[Pet94]
Before one can understand how to make software reliable, one must understand
how software fails. Leveson [Lev94] and Holloway [Hol99] both advocate that in order
to understand how to model risk for future programs, it is important to study and
understand past failures. Therefore, we begin with an in-depth study of computer
systems failures, specifically of embedded systems. The failures are summarized in
Table 2.1.
1 Portions of this chapter appeared in Schilling [Sch05].
Table 2.1: Software Failure Root Cause Descriptions
Program | Cause | Is fault statically detectable?
Air Traffic Communications | Failure to reboot communications system. Reboot required by defect in usage of GetTickCount() API call of Win32 subsystem. | Possibly
Mars Spirit Rover | File system expanded beyond available RAM. | Yes
2003 US Blackout | Race condition within event handling system, allowing two threads to simultaneously write to a data structure. | Yes
Patriot Missile Failure | Error induced in algorithmic computation due to rounding of decimal number. | Yes
STS-2 | Uninitialized code jump. | Yes
Ariane 5 | Unhandled exception raised by variable overflow in typecast, resulting in computer shutdown. | Yes
Milstar 3 | Improper filter coefficient entered into configuration tables. | Possibly
Mars PathFinder | Priority inversion between tasks resulted in reset of CPU. | Possibly
USS Yorktown | Improper user input resulted in division by zero. This caused cascading error which shut down the ship's propulsion system. | Yes
Clementine | Software error resulted in stuck open thruster. | Possibly
GeoSat | Improper sign in telemetry table resulted in momentum and torque being applied in the wrong direction. | Possibly
Near Earth Asteroid Rendezvous Spacecraft | Transient lateral acceleration of the craft exceeded firmware threshold, resulting in shutdown of thruster. Craft entered safe mode, but race conditions in software occurred, leading to unnecessary thruster firing and a loss of fuel. | Possibly
AT&T Long Distance Outages | Missing break statement in case statement. | Yes
Sony, Hitachi, Philips, RCA Television Failures | Buffer overflow in microprocessor software caused by two extra bits in transmission stream. | Yes
Diebold ATM Failure | Nachi worm infected Diebold ATM machines. | Yes
TacSat-2 satellite launch delay | Improper sign in equation. | Possibly
It is important to note that there are significant levels of commonality amongst
these failures. Many of these failures are directly attributable to the reuse of previ-
ously developed software with insufficient testing. Furthermore, many of these failures
can be attributed to a fault which can be readily and easily detected with existing
static analysis tools.
2.1 COTS System Failure
Air Traffic Control System Failure
On Tuesday September 14, 2004 at about 5 p.m. Pacific Time, air traffic con-
trollers lost contact with 400 airplanes which were flying in the Southwestern United
States. Planes were still visible on radar, but all voice communications between the
air traffic controllers and pilots was lost. This loss of communication was caused by
the failure of the Voice Switching and Control System (VSCS) which integrates all
air traffic controller communications into a single system. The failures lasted three
hours, and resulted in 800 flights being disrupted across the Western United States.
In at least five instances, planes violated minimum separation distances for planes
flying at high altitudes, but luckily no fatalities occurred[Bro04] [Gep04].
The cause was the failure of a newly integrated system enhancement intended
to provide operational availability of 0.9999999[Mey01]. Soon after installation, a
problem was detected within the VCSU installation, as the system crashed after 49.7
days of operation. After rebooting the system, operations returned to normal. As a
workaround, the FAA instituted a maintenance reboot every 30 days to the system.
This mandatory reboot was necessitated by a design flaw within the software.
The internal software of the VCSU relies on a 32-bit counter which counts down from
2^32 to 0 as the software operates. This takes 2^32 ms, or 49.7103 days, to complete a
countdown[Cre05]. This tick, part of the Win32 system and accessed by a call to the
GetTickCount() API, is used to provide a periodic pulse to the system. When the
counter reaches 0, the tick counter wraps around from 0 to 4,294,967,296, and in
doing so, the periodic pulse fails to occur. In the case of the Southern California air
traffic control system, the mandatory reboot did not occur and the tick count reached
zero, shutting down the system in its entirety. The backup system attempted to start,
but was unable to handle the load of traffic, resulting in the complete communications
failure[Wal04] [Gep04].
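The arithmetic behind the 49.7-day window is straightforward to reproduce. The sketch below is an illustration of this defect class only, not the VSCS source; the constants, variable names, and the broken deadline check are assumptions chosen for demonstration. It shows why a 32-bit millisecond tick exhausts its range after roughly 49.7 days, and how a deadline comparison that treats the tick as a non-wrapping quantity silently stops firing once the counter wraps.

// Illustration only (not the VSCS code): the 49.7-day rollover of a 32-bit
// millisecond tick, and a deadline check that breaks once the counter wraps.
public class TickRollover {
    static final int PERIOD_MS = 10;

    public static void main(String[] args) {
        // 2^32 distinct millisecond values last roughly 49.71 days.
        System.out.printf("2^32 ms = %.4f days%n", Math.pow(2, 32) / 1000 / 3600 / 24);

        int lastPulse = -6;   // unsigned value 4,294,967,290: six ticks before the wrap
        int now = 4;          // ten milliseconds later, after the 32-bit counter has wrapped

        // Broken: the deadline is kept as a non-wrapping 64-bit value, but the tick wrapped,
        // so "now" never appears to reach it and the periodic pulse silently stops.
        long deadline = Integer.toUnsignedLong(lastPulse) + PERIOD_MS;   // 4,294,967,300
        boolean brokenDue = Integer.toUnsignedLong(now) >= deadline;     // false

        // Wrap-safe: subtract first so the two's-complement wrap cancels out, then compare.
        boolean safeDue = (now - lastPulse) >= PERIOD_MS;                // true

        System.out.println("broken check says pulse due:    " + brokenDue);
        System.out.println("wrap-safe check says pulse due: " + safeDue);
    }
}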
USS Yorktown Failure
On September 21, 1997, the USS Yorktown, the prototype for United States Navy
Smart Ship program, suffered a catastrophic software failure while steaming in the
Atlantic. As a result of an invalid user input, the entire ship was left dead in the water
for two hours and forty five minutes while a complete reboot of systems occurred.
The software consisted of two major segments, a Windows NT user interface
portion and a monitoring and control surveillance system portion. Windows NT
was mandated by the Navy’s IT-21 report, “Information Technology for the 21st
Century”[wir98]. The monitoring and control surveillance system code was developed
in Ada[Sla98a]. In order to meet Navy schedules and deployment deadlines for the
Smart Ship program, installation was completed in eight months[Sla98a]2.
The failure of the Yorktown system began with a petty officer entering an incorrect
calibration value into the Remote Database Manager. This zero entry resulted in a
divide by zero, causing the database system to crash. Through the ATM network, the
crash propagated to all workstations on the network. Restoration of the ship required
rebooting of all computers[Sla98b].
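The failure class is simple to illustrate. The sketch below is a hypothetical Java fragment, not the Yorktown code (which was Ada running alongside Windows NT), and the method and field names are invented. It shows how an unvalidated operator entry of zero becomes a division-by-zero exception that escapes the data-entry layer, and how a boundary check contains it instead.

// Hypothetical sketch of the failure class; names are illustrative, not from the
// Yorktown software. An unvalidated zero entry used as a divisor throws an
// exception that propagates out of the data layer.
public class CalibrationEntry {

    // Unchecked version: an operator entry of zero raises ArithmeticException here,
    // and the exception escapes into whatever called the data layer.
    static int scaleUnchecked(int rawReading, int calibrationDivisor) {
        return rawReading / calibrationDivisor;
    }

    // Defensive version: reject the invalid entry at the input boundary instead.
    static int scaleChecked(int rawReading, int calibrationDivisor) {
        if (calibrationDivisor == 0) {
            throw new IllegalArgumentException("calibration divisor must be non-zero");
        }
        return rawReading / calibrationDivisor;
    }

    public static void main(String[] args) {
        int operatorEntry = 0;   // the invalid value typed at the console
        try {
            System.out.println(scaleChecked(1250, operatorEntry));
        } catch (IllegalArgumentException e) {
            System.out.println("entry rejected: " + e.getMessage());
        }
        // scaleUnchecked(1250, operatorEntry) would instead throw ArithmeticException
        // and propagate upward, the analogue of the cascading failure described above.
    }
}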
Diebold ATM Lockups
In August of 2003, automated teller machines at two financial institutions failed
due to the Nachi worm in the first confirmed case of malicious code penetrating
ATM machines. The ATM machines operated on top of the Windows XP Embedded
2The typical schedule for such a development effort is approximately three years [Sla98a].
operating system which was vulnerable to an RPC DCOM security bug exploited by
Nachi and the Blaster virus. In this case, a patch had been available from Microsoft
for over a month, but had not been installed on the machines due to an in-depth set
of regression tests required by Diebold before deploying software patches[Pou03].
The Nachi worm spread through a buffer overflow vulnerability within Microsoft
Windows. In this particular instance, an infected machine would transmit a set of
packets to a remote machine using port 135. When the packets were processed a
buffer overflow would occur on the receiving machine crashing the RPC service and
allowing the worm to propagate to the second machine[McA].
2.2 Embedded System Failures
Midwest Power Failure
On August 14, 2003, the worst electrical outage in North American history
occurred, as 50 million customers in eight US states and Canada lost power. The
economic impact of this failure has been estimated to range between $4.5 and $10
billion[ELC04]. While there were many contributing factors to the outage, buried
within the causes was the failure of a monitoring alarm system [Pou04a].
First Energy’s energy management system used to monitor the state of its elec-
trical grid failed silently. This system had over 100 installations worldwide [Jes04]
and the software was approximately four million lines of C code[Pou04b]. A thorough
analysis of the Alarm and Event processing routines comprised of approximately one
million lines of C and C++ code yielded a subtle race condition which allowed two
asynchronous threads to obtain write access to a common data structure. Once this
happened, the data structure was corrupted and the alarm application went into an
infinite loop. This caused a queue overflow on the server hosting the alarm process,
resulting in a crash. A backup system kicked in, but it too was overwhelmed by the
number of unprocessed messages and it also soon crashed[Pou04b][Pou04a].
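The defect class reproduces readily in miniature. The sketch below is illustrative Java, not the C/C++ alarm code, and the field names are invented. Two threads perform unsynchronized read-modify-write updates on a shared structure, and most runs end with lost updates, the same silent corruption of shared state described above.

// Miniature illustration of the defect class (not the alarm system code): two threads
// write a shared structure with no synchronization, breaking an invariant that
// single-threaded testing would never expose.
public class AlarmQueueRace {
    // A stand-in for the shared data structure; no lock guards these fields.
    static class SharedRecord {
        int writesByThreadA;
        int writesByThreadB;
        int totalWrites;      // invariant: should equal the sum of the two fields above
    }

    public static void main(String[] args) throws InterruptedException {
        SharedRecord shared = new SharedRecord();

        Runnable writerA = () -> {
            for (int i = 0; i < 100_000; i++) {
                shared.writesByThreadA++;   // read-modify-write, not atomic
                shared.totalWrites++;       // second unguarded write to the same structure
            }
        };
        Runnable writerB = () -> {
            for (int i = 0; i < 100_000; i++) {
                shared.writesByThreadB++;
                shared.totalWrites++;
            }
        };

        Thread a = new Thread(writerA);
        Thread b = new Thread(writerB);
        a.start(); b.start();
        a.join(); b.join();

        // Expected: totalWrites == 200,000. Lost updates routinely break the invariant.
        System.out.println("A=" + shared.writesByThreadA + " B=" + shared.writesByThreadB
                + " total=" + shared.totalWrites);
    }
}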
Patriot Missile Interception Failure
On February 25, 1991, an incoming Iraqi Scud missile struck an American Army
barracks. An American Patriot missile battery in Dhahran, Saudi
Arabia was tasked with intercepting incoming missiles, but due to a software defect,
this did not occur[Arn00].
The Patriot missile defense system operated by detecting airborne objects which
potentially match the electronic signature for a foreign object[Car92]. In calculating
the location to apply the range gate, the Patriot computer system used two funda-
mental data items: time, expressed as a 24 bit integer, and velocity expressed as a
24-bit fixed point decimal number. The prediction algorithm used in the source code
multiplied the current time stamp by 1/10 when calculating the next location.
In order to fit into the 24 bit registers of the Patriot computer system, the binary
expansion for 1/10 was truncated to 0.00011001100110011001100, yielding an error of
0.0000000000000000000000011001100... binary, or approximately 0.000000095 deci-
mal. This error was cumulative with the time since the unit was initialized[Arn00].
The Alpha battery had been operating without reset for over 100 consecutive
hours. At 100 hours, the time inaccuracy was 0.3433 seconds. The incoming Iraqi
Scud missile cruised at 1,676 metres per second, and during this time error, it had
traveled more than one half of a kilometer [Car92]. The battery did not engage
the Scud missile, and an Army barracks was hit, resulting in 28 deaths and over 100
injuries.
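The magnitude of the drift follows directly from the figures above. The sketch below is a back-of-the-envelope reconstruction, not the Patriot code, assuming the constant 1/10 is chopped after 23 fractional bits (an assumption that reproduces the 0.000000095 error quoted above). It accumulates the representation error over 100 hours of 0.1-second ticks and converts the resulting clock error into tracking distance at the quoted Scud velocity.

// Back-of-the-envelope reconstruction of the Patriot timing drift. Illustrative only.
public class PatriotDrift {
    public static void main(String[] args) {
        double exact = 0.1;
        // Chop the binary expansion of 1/10 after 23 fractional bits (assumption that
        // matches the 0.000000095 error quoted above).
        double chopped = Math.floor(exact * (1 << 23)) / (1 << 23);
        double errorPerTick = exact - chopped;              // about 9.5e-8 seconds per tick

        long ticks = 100L * 3600 * 10;                      // 100 hours of 0.1-second ticks
        double clockError = errorPerTick * ticks;           // about 0.34 seconds accumulated
        double missDistance = clockError * 1676;            // Scud velocity in metres per second

        System.out.printf("error in chopped 1/10: %.3e%n", errorPerTick);
        System.out.printf("clock error at 100 h:  %.4f s%n", clockError);
        System.out.printf("tracking offset:       %.0f m%n", missDistance);
    }
}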
STS-2 Simulation Failure
In October of 1981, a catastrophic space shuttle software failure was discovered
during a training scenario. In this training scenario, the objective of the crew was to
abort the launch and land in Spain. This required the dumping of excess fuel from
the orbiter. When the crew initiated the abort, all four redundant flight computers
failed and became unresponsive. The displays each showed a large X, indicating that
the display I/O routines were not receiving data.
The failure was traced to a single fault, namely a single, common routine used to
dump fuel from the orbiter. In this particular scenario, the routine had been invoked
once during the simulated ascent, and then was canceled. Later on, the routine was
called again. However, not all variables had been re-initialized. One of the variables
was used to compute the offset address for a “GOTO” statement. This caused the
code to branch to an invalid address, resulting in the simultaneous lockup of all four
computers. A complete analysis of the shuttle code found 17 other instances of this
fault[Lad96].
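The underlying mechanism, state that survives a cancelled invocation and poisons the next one, can be sketched briefly. The fragment below is an illustrative Java analogue, not the shuttle flight code; Java has no computed GOTO, so a stale table index stands in for the offset address, and all names are invented.

// Illustrative analogue of the STS-2 defect (not the shuttle code): state left over
// from a cancelled invocation is reused on the next call. A stale table index here
// stands in for the stale GOTO offset in the shuttle software.
public class FuelDumpSequencer {
    private static final Runnable[] STEPS = {
            () -> System.out.println("open dump valves"),
            () -> System.out.println("start dump timer"),
            () -> System.out.println("close dump valves")
    };

    private int nextStep;   // survives between invocations; never reset on cancel

    void runDump(boolean cancelMidway) {
        // BUG: nextStep should be reset to 0 here for every fresh invocation.
        while (nextStep < STEPS.length) {
            STEPS[nextStep].run();
            nextStep++;
            if (cancelMidway && nextStep == 2) {
                return;       // operator cancels; nextStep keeps its stale value
            }
        }
        // A later invocation resumes at the stale index, skipping the steps that
        // establish a valid starting state.
    }

    public static void main(String[] args) {
        FuelDumpSequencer seq = new FuelDumpSequencer();
        seq.runDump(true);    // first, cancelled invocation
        System.out.println("-- second invocation --");
        seq.runDump(false);   // resumes mid-sequence instead of restarting
    }
}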
Ariane 5 Maiden Launch Failure
On June 4, 1996, the maiden flight of the Ariane 5 rocket occurred. 39 seconds
into flight, a self destruct mechanism on the rocket activated, destroying the launcher
and all four payload satellites. The root cause for the failure was traced to a stat-
ically detectable programming error where a 64 bit floating point number was cast
into a 16 bit integer. As the code was implemented in Ada, this caused a runtime
exception which was not handled properly, resulting in the computer shutting down
and the ultimate loss of the vehicle and payload. Had this occurred in a less strongly
typed language, the program most likely would have continued executing without
incident[Hat99a]. In reviewing the source code for the module, it was found that
there were at least three other instances of unprotected typecasts present within the
code which could have caused a similar failure [Lio96].
The portion of the guidance software which failed in Ariane 5 had actually been
reused from Ariane 4 without review and integration testing[Lio96]. On the Ariane
4 program, a software “hack” had been implemented to keep a pre-launch alignment
task executing after lift-off[Gle96]. This had the effect of saving time whenever a
hold occurred. When the Ariane 5 software was developed, no one removed this
unnecessary feature. Thus, even after launch was initiated, pre-launch alignment
calculations occurred[And96]. The variable that overflowed was actually in the pre-
launch alignment calculations and served no purpose after launch.
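A Java analogue of the failing conversion is shown below; it is an illustration, not the Ariane flight code, which was written in Ada, and the variable names are invented. In Ada the overflowing conversion raised an unhandled exception; the Java narrowing cast instead wraps silently, which illustrates the language contrast noted above.

// Java analogue of the failing conversion (the flight code was Ada): a 64-bit value
// narrowed to 16 bits. Ada raised an unhandled exception; the Java cast wraps silently.
public class HorizontalBiasCast {

    // Unguarded narrowing: values outside the 16-bit range wrap silently.
    static short toOperand(double horizontalBias) {
        return (short) horizontalBias;
    }

    // Guarded version: detect the overflow and handle it instead of shutting down.
    static short toOperandChecked(double horizontalBias) {
        if (horizontalBias > Short.MAX_VALUE || horizontalBias < Short.MIN_VALUE) {
            throw new ArithmeticException("value out of 16-bit range: " + horizontalBias);
        }
        return (short) horizontalBias;
    }

    public static void main(String[] args) {
        double inFlightValue = 65_000.0;   // larger than the trajectories the code was written for
        System.out.println("unguarded cast: " + toOperand(inFlightValue));   // wraps to -536
        try {
            toOperandChecked(inFlightValue);
        } catch (ArithmeticException e) {
            System.out.println("guarded cast:   " + e.getMessage());
        }
    }
}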
Clementine Mission Failure
While in operation, the Clementine orbiter was classified by management as highly
successful. However, for all the success the orbiter was achieving, significant prob-
lems were also occurring, as over 3000 floating point exceptions were detected during
execution[Gan02b] and ground controllers were required to perform hardware resets
on the vehicle at least 16 times[Lee94].
All of these problems came to a climax in May, 1994 when another floating point
exception occurred. Ground controllers attempted to revive the system by sending
software reset commands to the craft, but these were ignored. After 20 minutes, a
hardware reset was successful at bringing the Clementine probe back on line [Lee94].
However, all of the vehicle's hydrazine propellant had been expended[Gan02b].
Software for the Clementine orbiter was developed through a spiral model pro-
cess, resulting in iterative releases. Code was developed using both the C and Ada
languages. In order to protect against random thruster firing, designers had im-
plemented a software thruster timeout. However, this protection was defeated by
a hangup within the firmware. Evidence from the vehicle indicates that following
the exception, the microprocessor locked up due to a software defect. While in a
hung-up state, the processor erroneously turned on a thruster, dump-
ing fuel and imparting an 80 RPM spin on the craft. This thruster fired until the
hydrazine propellant was exhausted[Gan02b]. The software to control the built in
watchdog mechanism which could have detected the runaway source code was not
implemented due to time constraints [Lee94].
Milstar Launch Failure
A $433.1 million Lockheed Martin Titan IV B rocket was launched on April 30,
1999, from Cape Canaveral Air Station in Florida [Hal99] carrying the third Milstar
satellite destined for Geosynchronous orbit[Pav99]. The first nine minutes of the
launch were nominal. However, very quickly afterward an anomaly was detected in
the flight path as the vehicle was unstable in the roll orientation.
The instability against the roll axis returned during the second engine burn causing
excess roll commands that saturated pitch and yaw controls, making them unstable.
This resulted in vehicle tumble until engine shutdown. The vehicle then could not
obtain the correct velocity or the correct transfer orbit. Trying to stabilize the ve-
hicle, the RCS system exhausted all remaining propellant[Pav99]. The vehicle again
tumbled when the third engine firing occurred, resulting in the vehicle being launched
into a low elliptical orbit instead of the intended geostationary orbit[Pav99].
An accident investigation board determined that a single filter constant in the
Inertial Navigation Unit (INU) software file caused the error. In one filter table,
instead of −1.992476, a value of −0.1992476 had been entered. Thus, the values for
roll detection were effectively zeroed causing the instability within the roll system.
These tables were not under configuration management through a version control
system (VCS). This incorrect entry had been made in February of 1999, and had
gone undetected by both the independent quality assurance processes as well as the
independent validation and verification organizations [Neu99][Pav99].
As a footnote to this incident, and paralleling the Ariane 5 failure, the roll filter
which was mis-configured was not even necessary for correct flight behavior. The
filter had been requested early in the development of the first Milstar satellite when
there was a concern that fuel sloshing within the Milstar Satellite might effect launch
trajectory. Subsequently, this filter was deemed unnecessary. However, it was left in
place for consistency purposes [Pav99].
Long Distance System Crash
On January 15 of 1990, AT&T experienced the largest long distance outage on
record. 60 thousand people were left without telephone service as 114 switching nodes
within the AT&T system failed. All told, a net total of between $60 to $75 million was
lost. The culprit for this failure was a missing line of code within a failure recovery
routine[Dew90].
...
switch (message)
{
    case INCOMING_MESSAGE:
        if (sending_switch == OUT_OF_SERVICE)
        {
            if (ring_write_buffer == EMPTY)
                send_in_service_to_smm(3B);
            else
                break; /* Whoops */
        }
        process_incoming_message();
        break;
    ...
}
do_optional_database_work();
...
Figure 2-1: Source code which caused AT&T Long Distance Outage[Hat99b]
When a single node crashed on the system, a message indicating that the given
node was out of service was sent to adjacent nodes so that the adjacent node could
route the traffic around the failed node. However, because of a misplaced “break”
statement within a C "case" construct, the neighboring nodes themselves crashed upon
receiving an out of service message from an adjacent node. The second node then
transmitted an out of service message to its adjacent nodes, resulting in the domino
failure of the long distance network[Dew90].
NASA Mars Rover Watchdog Reset
On January 21, 2004 Jet Propulsion Laboratory Engineers struggled to estab-
lish communications with the Mars Spirit Rover. After 18 days of nearly flawless
operation on the Martian surface, a serious anomaly with the robotic vehicle had
developed, rendering the vehicle inoperable. While initial speculation on the Rover
failure pointed toward a hardware failure, the root cause for the failure turned out to
be software. The executable code that had been loaded into the Rover at launch had
serious shortcomings. A new code was uploaded via radio to the rover during flight.
In doing so, a new directory structure was uploaded into the file system while leaving
the old system intact.
Eventually, the Rover attempted to allocate more files than RAM would allow,
raising an exception and resulting in a diagnostic code being written to the file system
before rebooting the system. This scenario continued, causing the rover to get stuck
in an infinite series of resets[Hac04].
Mars Pathfinder File System Overflow
Mars PathFinder touched down on the surface of Mars on July 4, 1997, but
all communication was suddenly lost on September 27, 1997. The system began
encountering periodic total system resets, resulting in lost data and the cancellation
of any pending ground commands[Woe99]. The problem was quickly traced to a
watchdog reset[Gan02a] caused by a priority inversion within the system. New code
was uploaded to PathFinder, and the spacecraft recovered to complete a highly
successful mission[Ree97].

The Mars PathFinder failure prompted significant research in real time tasking,
leading to the development of the Java PathFinder [HP00] and JLint [Art01] static
analysis tools.
GEOSAT Sign Inversion
The United States Navy launched the 370 kg GeoSAT Follow-On (GFO) satellite
on February 10, 1998, from Vandenberg Air Force Base on board a Taurus rocket.
Immediately following launch, there were serious problems with attitude control as
the vehicle simply tumbled in space. Subsequent analysis of the motion equations
programmed into the vehicle indicated that momentum and torque were being applied
in the wrong direction. Somewhere in the development of the vehicle, the sign of a
coefficient had been inverted[Hal03], resulting in forces being applied in the direction
opposite of what was necessary.
TacSat-2 Sign Inversion
The December 2006 launch of the TacSat-2 satellite on board an Air Force Minotaur I
rocket was delayed due to software issues. While the investigation is not complete,
indications are that the problem may be related to a missing minus sign within
a mathematical equation. If TacSat-2 had been launched, the software defect would
have prevented the satellite's attitude control system from turning the solar panels
closer than 45 degrees to the sun's rays, resulting in an eventual loss of power to the
satellite.
2.3 Hard Limits Exceeded
Near Earth Asteroid Rendezvous Spacecraft Threshold Exceeded
The NASA Near Earth Asteroid Rendezvous (NEAR) mission was launched from
Cape Canaveral on board a Delta-2 rocket on February 17, 1996. Its main mission was
to rendezvous with the asteroid 433 Eros[SWA+00]. Problems occurred on December
20, 1998, when the spacecraft was to fire its main engine in order to place the vehicle
in orbit around Eros. The engine started successfully, but the burn was aborted almost
immediately. Communications with the spacecraft were lost for 27 hours[Gan00], during
which the spacecraft performed 15 automatic momentum dumps, fired its thrusters
several thousand times, and burned 96 kg of propellant[Hof99].
The root cause of the engine abort was quickly discovered. Sensors on board the
spacecraft had detected a transient lateral acceleration which exceeded a defined
constant in the control software. The software did not appropriately filter the input,
and thus the engine was shut down. The spacecraft then executed a set of automated
scripts intended to place the craft into safe mode. These scripts, however, did not
properly start the reaction control wheels used to control attitude in the absence of
thrusters[Gan00]. A set of race conditions occurred and several untested exception
handlers executed, both exacerbated by low batteries. Over the next few hours,
7900 seconds of thruster firings were logged before the craft reached sun-safe
mode[Gan00].

As a part of the investigation, some of the 80,000 lines of source code were inspected,
and 17 faults were discovered[Hof99]. Complicating the situation was the fact that
there turned out to be two different versions of flight software 1.11, one onboard the
craft and one readily available on the ground, as the flight code had been stored on a
network server in an uncontrolled environment.
Television Sound Loss
The television transmission standard has changed little since its initial development
during the 1930s. Recently, there has been a trend toward increased usage of
Extended Data Services (XDS), such as closed captioning, automatic time of day
synchronization, program guides, and other information broadcast digitally during
the vertical blanking interval. XDS services are decoded on the television set by a
microcontroller[Sop01].

Television transmission standards are strictly regulated so that television receivers
can easily be mass produced cheaply. Problems occur, however, when a transmission
is out of specification. In one particular instance, a device generating the digital data
stream for closed captioning on a transmitter had a periodic yet random failure
whereby two extra bits would be erroneously inserted into the data stream[Sop01].
On certain models, receiving these erroneous bits caused a buffer overflow within the
software, resulting in a complete loss of video image, the color tint being set to
maximum green, or muted audio. In each case, the only mechanism for recovery was
to unplug the television set and allow the microcontroller to reset itself to the default
settings[Sop01].
Chapter 3
Survey of Static Analysis Tools and Techniques
“Static Program analysis consists of automatically discovering proper-
ties of a program that hold for all possible execution paths of the program.”[BV03]
Static analysis of source code is a technique commonly used during implementation
and review to detect software implementation errors. Static analysis has been
shown to reduce software defects by a factor of six [XP04], as well as to detect 60% of
post-release failures[Sys02]. Static analysis has been shown to outperform other quality
assurance methods, such as model checking[Eng05][ME03]. Static analysis can
detect errors such as buffer overflows and security vulnerabilities[VBKM00], memory
leaks[Rai05], timing anomalies (race conditions, deadlocks, and livelocks)[Art01], further
security vulnerabilities[LL05], and other common programming mistakes. Faults
caught with static analysis tools early in the development cycle, before testing
commences, can be 5 to 10 times cheaper to repair than faults found at a later phase[Hol04].

1Portions of this chapter appeared in Schilling and Alam[SA06c].
Static analysis of source code does not represent new technology. Static analysis
tools are highly regarded within certain segments of industry for being able to
quickly detect software faults [Gan04]. Static analysis is routinely used in mission
critical source code development, such as in the aircraft[Har99] and rail transit[Pol] domains.
Robert Glass reports that static analysis can remove upwards of 91% of errors within
source code [Gla99b] [Gla99a]. It has also been found effective at detecting pieces
of dead or unused source code in embedded systems [Ger04] and buffer overflow
vulnerabilities[LE01]. Richardson[Ric00] and Giessen[Gie98] provide overviews of the
concept of static analysis, including the philosophy and practical issues related to
its use.
Recent papers dealing with static analysis tools have shown a statistically signif-
icant relationship between the faults detected during automated inspection and the
actual number of field failures occurring in a specific product[NWV+04]. Static anal-
ysis has been used to determine testing effort by Ostrand et al.[OWB04]. Nagappan
et al. [NWV+04] and Zheng et al.[ZWN+06] discuss the application of static analysis
to large scale industrial projects, while Schilling and Alam[SA05b] cite the benefits of
using static analysis in an academic setting. Integration of static analysis tools into a
software development process has been discussed by Schilling and Alam[SA06c] and
Barriault and Lalo[BL06].
Table 3.1: Summary of static analysis tools

Software Tool | Domain | Responsible Party | Languages Checked | Platforms
CGS | Academic | NASA | C | Linux
Checkstyle | Academic | Open source, hosted on SourceForge | Java | OS independent
CodeSonar | Commercial | GrammaTech | C, C++ | Windows
CodeSurfer | Commercial | GrammaTech | C, C++ | Windows
Coverity Prevent | Commercial | Coverity, Inc. | C, C++ | Linux, UNIX, Windows, Mac OS X
CQual | Academic | University of California at Berkeley (GPL) | C, C++ | UNIX, Linux
ESC-Java | Academic | Software Engineering with Applied Formal Methods Group, Department of Computer Science, University College Dublin | Java | Linux, Mac OS X, Windows, Solaris
ESP | Commercial | Microsoft | C, C++ | Windows
FindBugs | Academic | University of Maryland | Java | Any JVM compatible platform
FlawFinder | GPL | David A. Wheeler | C, C++ | UNIX
Fortify Source Code Analysis (SCA) | Commercial | Fortify Software | Java, C, C++ | Windows, Solaris, Linux, Mac OS X, HP-UX, IBM AIX
Gauntlet | Academic | US Military Academy | Java | Windows
ITS4 | Commercial | Cigital | C, C++ | Linux, Solaris, Windows
Java PathFinder | Academic | NASA Ames | Java | Any JVM compatible platform
JiveLint | Commercial | Sureshot Software | Java | Windows
JLint | Academic | Konstantin Knizhnik, Cyrille Artho | Java | Windows, Linux
JPaX | Academic | NASA | Java | Not documented
Klocwork K7 | Commercial | Klocwork | Java, C, C++ | Sun Solaris, Linux, Windows
Lint4j | Academic | jutils.com | Java | Any JDK system
MOPS | Academic | University of California, Berkeley | C | UNIX
PC-Lint / FlexLint | Commercial | Gimpel Software | C, C++ | DOS, Windows, OS/2, UNIX (FlexLint only)
PMD | Academic | Available from SourceForge with BSD license | Java | Any JVM compatible platform
Polyspace C Verifier | Commercial | Polyspace | Ada, C, C++ | Windows, UNIX
PREfix / PREfast | Commercial | Microsoft | C, C++, C# | Windows
QAC / QAC++ | Commercial | Programming Research Limited | C, C++ | Windows, UNIX
RATS | Academic | Secure Software | C, C++ | Windows, UNIX
Safer C Toolkit | Commercial | Oakwood Computing | C | Windows, Linux
SLAM | Academic | Microsoft | C | Windows
SofCheck Inspector for Java | Commercial | SofCheck | Java | Windows, UNIX, Linux
Splint | Academic | University of Virginia, Department of Computer Science | C | Windows, UNIX, Linux

Static analysis tools have two important characteristics: soundness and completeness.
A static analysis tool is defined to be complete if it detects all faults present
within a given source code module. A static analysis tool is deemed to be sound if it
never gives a spurious warning. A static analysis tool is said to generate a false posi-
tive if a spurious warning is detected within source code. A static analysis tool is said
to generate a false negative if a fault is missed during analysis. In practice, nearly
all static analysis tools are unsound and incomplete, as most tools generate false
positives and false negatives[Art01]. A discussion on the importance of soundness is
provided by Xie et al.[XNHA05] and Godefroid[God05].
For all of the advantages of static analysis tools, there have been very few inde-
pendent comparison studies between tools. Rutar et al. [RAF04] compare the results
of using the FindBugs, JLint, and PMD tools on Java source code. Forristal[For05]
compares 12 commercial and open source tools for effectiveness, but the analysis is
based only on security aspects and security scanners, not the broader range of static
analysis tools available. Lu et al.[LLQ+05], as well as Meftah[Mef05], propose
benchmark suites for bug detection tools, but these do not specifically target static
analysis tools.
In order to evaluate which static analysis tools have the potential for usage in
software reliability modeling, it was important to obtain information about the
currently existing tools. Table 3.1 provides a summary of the tools which are discussed
in the following sections.
3.1 General Purpose Static Analysis Tools
3.1.1 Commercial Tools
Lint
Lint[Joh78] is one of the earliest and most widely used static analysis tools for the C
and C++ languages. Lint checks programs for a large set of syntax and semantic
errors. Newer versions of Lint include value tracking, which can detect subtle
initialization and value misuse problems; inter-function value tracking, which tracks
values across function calls during analysis; strong type checking; user-defined semantic
checking; usage verification, which can detect unused macros, typedefs, classes,
members, and declarations; and flow verification for uninitialized variables. Lint can also
handle the verification of common safer programming subsets, including the MISRA
(Motor Industry Software Reliability Association) C Standards [MIS04] [MIS98] and
the Scott Meyers Effective C++ Series of Standards [Mey92]. Lint also supports code
portability checks which can be used to verify that there are no known portability
issues with a given set of source code[Rai05]. A handbook on using Lint to verify C
programs has been written by Darwin[Dar88].
In addition to the basic Lint tool, several add-on companion programs exist to
aid in the execution of the Lint program. ALOA[Hol04] automatically collects a set
of metrics from the Lint execution which can be used to aid in quality analysis of
source code. ALOA provides an overall lint score, which is a weighted sum of all Lint
warnings encountered, as well as breakdowns by source code module of the number
and severity of faults discovered.
QAC, QAC++, QAJ
QAC, and its companion tools QA C++, QAJ, and QA Fortran, have been de-
veloped by Programming Research Limited. Each tool is a deep flow static analyzer
tailored to the given languages. These tools are capable of detecting language im-
plementation errors, inconsistencies, obsolescent features and programming standard
transgressions through code analysis. Version 2.0 of the tool issues over 800 warn-
ing and error messages, including warnings regarding non-portable code constructs,
overly complex code, or code which violates the ISO/IEC 14882:2003 Programming
languages – C++ standard[ISO03]. Code which relies upon unspecified, undefined,
or implementation defined behavior will also be appropriately flagged.
The QAC and QAC++ family of tools are capable of validating several different
coding standards. QAC can validate the MISRA C coding standard [MIS98] [MIS04],
while QAC++ can validate against the High Integrity C++ Coding standard[Pro].
Polyspace C Verifier
The Polyspace Ada Verifier was developed as a result of the Ariane 501 launch fail-
ure and can analyze large Ada programs and reliably detect runtime errors. Polyspace
C++ and Polyspace C verifiers have subsequently been developed to analyze these
languages[VB04].
The Polyspace tools rely on a technique referred to as abstract interpretation.
Abstract interpretation is a theory which formally constructs approximations of the
semantics of programming languages. It extends data-flow analysis by providing a
theoretical framework for mathematically justifying data-flow analyzers, designing
new data-flow analyses, and handling particular infinite sets of properties[Pil03].
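To make the idea concrete, the sketch below illustrates interval abstraction, one of the abstract domains commonly used by such analyzers: each variable is represented by a lower and upper bound rather than a concrete value. This is a toy illustration only, not the Polyspace algorithm; the interval_t type and the helper functions are hypothetical.

#include <stdio.h>

/* Toy interval domain: each variable is abstracted by the range [lo, hi]. */
typedef struct { long lo; long hi; } interval_t;

/* Abstract addition: the sum of two intervals contains every concrete sum. */
static interval_t interval_add(interval_t a, interval_t b)
{
    interval_t r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Abstract check: could a value drawn from this interval ever be zero? */
static int may_be_zero(interval_t x)
{
    return (x.lo <= 0) && (x.hi >= 0);
}

int main(void)
{
    interval_t a = { 1, 10 };            /* a is known to lie in [1, 10]    */
    interval_t b = { -5, 5 };            /* b is only known to lie in [-5, 5] */
    interval_t sum = interval_add(a, b); /* therefore sum lies in [-4, 15]  */

    printf("sum in [%ld, %ld]\n", sum.lo, sum.hi);
    if (may_be_zero(sum))
    {
        printf("warning: a division by sum could be a division by zero\n");
    }
    return 0;
}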
Polyspace C analysis has suffered from scalability issues. Venet and Brat[VB04]
indicate that the C Verifier was limited to analyzing 20 to 40 KLOC in a given
instance, and this analysis took upwards of 8 to 12 hours to obtain 20% of the total
warnings within the source code. This requires overnight runs and batch processing,
making it difficult for software developers to determine whether their changes have
corrected the discovered problems[BDG+04]. Zitser et al. [ZLL04] discuss a case
in which Polyspace executed for four days to analyze a 145,000 LOC program before
aborting with an internal error. This level of performance is problematic for large
programs. Aliasing has also posed a significant problem for the Polyspace tool[BK03].
PREfix and PREfast
PREfix operates as a compile time tool that simulates a C or C++ program in
operation, catching runtime problems before the program actually executes, as well
as matching against a list of common logic and syntactic errors. PREfix performs extensive
deep flow static analysis and requires significant installation of both client and server
packages in order to operate. Current versions include both a database server and a
graphical user interface and are typically integrated into the main build process.
PREfix mainly targets memory errors such as uninitialized memory, buffer overflows,
NULL-pointer de-references, and memory leaks. PREfix is a path-sensitive
static analyzer that employs symbolic evaluation of execution paths. Path sensitivity
ensures that the program paths analyzed are only those that can actually be taken
during execution, which helps to reduce the number of false positives found during
static analysis. Path sensitivity, however, does cause problems: exponential path
blowup due to control constructs, and potentially infinite paths due to loops, can make
exhaustive analysis impractical. To avoid this problem, PREfix explores only a
representative set of paths, which can be configured by the user[BPS00]. PREfix running
time therefore scales linearly with program size due to the fixed cutoff on the number
of paths[Rai05].
The development of PREfix has shown that the bulk of errors come from interactions
between two or more procedures; thus, for maximum effectiveness, static
analyzers should be interprocedural[Rai05]. PREfix has also shown that in commercial
C and C++ code, approximately 90% of errors are attributable to the interaction
of multiple functions. Furthermore, these problems are only revealed under rare error
conditions.
PREfast is a simpler tool developed by Microsoft based upon the results of applying
PREfix to a large number of internal developments. This tool performs a
simpler intra-procedural analysis that detects fewer defects, has a higher noise rate,
and only generates local XML output files.
Nagappan and Ball[NB05] discuss the usage of PREfix and PREfast at Microsoft.
Microsoft has also developed the PREsharp defect detection tool for C#. PRE-
sharp performs analysis equivalent to PREfast on C# code.
SofCheck
SofCheck Inspector is a static analysis tool produced by SofCheck, Inc. for ana-
lyzing Java source code. The tool is designed to detect a large array of programming
errors including the misuse of pointers, array indices which go out of bounds, buffer
overruns, numeric overflows, numeric wraparounds, dimensional unit mismatch, stor-
age leaks, and the improper use of Application Programming Interfaces.
SofCheck works by thoroughly characterizing each element of the program in terms
of its inputs, outputs, heap allocations, preconditions, and postconditions. Preconditions
are based upon what needs to be present to prevent a run-time failure, and
postconditions are based upon every possible output when the element is executed.
Output is then provided as annotated source code in browsable HTML format. Included
within the annotated source code is the characterization of each method. SofCheck
also includes a built-in history system which allows regression testing on source code
to be conducted, such that the tool can verify that fixed faults are completely
removed and that no new faults are introduced.
Based upon the company's materials, SofCheck Inspector averages an analysis speed
of approximately 1000 lines per minute, depending upon the CPU speed, available RAM,
and complexity of the source code. Currently, SofCheck Inspector only works with
the Java programming language; however, work is underway to develop versions
compatible with Ada, C, C++, and C#.
KlocWork K7
Klocwork was derived from a Nortel Networks tool to evaluate massive code bases
used in telephone switches. As such, it efficiently detects access problems, denial of
service vulnerabilities, buffer overflows, injection flaws, DNS spoofing, ignored return
values, mobile code security injection flaws, broken session management, insecure
storage, cross-site scripting, unvalidated user input, improper error handling, and
broken access control. K7 can perform analysis based on Java source code and bytecodes,
which allows even third party libraries to be analyzed for possible defects.
CodeSonar
CodeSonar from GrammaTech, Inc. is a deep flow static analyzer for C and
C++ source code. The tool is capable of detecting many common programming
errors, including null-pointer de-references, divide-by-zeros, buffer overruns, buffer
underruns, double-frees, use-after-frees, and frees of non-heap memory.

CodeSurfer[Gra00][AT01] has been used by Ganapathy et al.[GJC+03] to detect
the presence of buffer overflows within C source code. In this work, CodeSurfer was
extended through the use of supplemental plugins to generate, analyze, and solve
constraints within the implemented source code. This extended tool was applied to
the wu-ftpd daemon as well as the sendmail program.
JiveLint
JiveLint by Sureshot Software is a static analysis tool for the Java programming
language. JiveLint has three fundamental goals: to improve source code quality by
pointing out dangerous source code constructs; to improve readability, maintainability
and debugging through enforced coding and naming conventions; and to communicate
knowledge of how to write high quality code. JiveLint is a stand-alone Windows
application which does not require the Java language to be installed and is Windows
95/98/ME/2000/NT/XP compatible. As a very modestly priced commercial product,
very little information on the analysis techniques is available for JiveLint.
3.1.2 Academic and Research Tools
C Global Surveyor
The NASA C Global Surveyor (CGS) project was intended to develop an efficient
static analysis tool. The tool was brought about to overcome deficiencies in existing
static analysis tools, such as the Polyspace C Verifier, which suffers significantly from
scalability issues[VB04]. This tool would then be used to reduce the occurrence of
runtime errors within NASA developed software.
To improve performance, CGS was designed to allow distributed processing for
larger projects, in which the analysis can run on multiple computers. CGS results are
reported to a centralized SQL database. While CGS can analyze any ISO C program,
its analysis algorithms have been precisely tuned for the Mars PathFinder programs.
This tuning results in a warning rate of less than 10%[VB04].
CGS has been applied to several NASA Jet Propulsion Laboratory projects, in-
cluding the Mars PathFinder mission (135K lines of code) and the Deep Space One
mission (280K lines of code). CGS is currently being extended to handle C++ pro-
grams for the Mars Science Laboratory mission. This requires significant advances in
the analysis of pointers in the context of dynamic data structures.
ESP
ESP is a method developed by the Program Analysis group at the Center for
Software Excellence of Microsoft for static detection of protocol errors in large C/C++
programs. ESP requires the user to develop a specification for the high level protocol
that the existing source code is intended to satisfy. The tool then compares the
behavior of the source code as implemented with the requisite specification. The
output is either a guarantee that the code satisfies the protocol, or a browsable list
of execution traces that lead to violations of the protocol.
ESP has been used by Microsoft to verify the I/O properties of the GNU C
compiler, approximately 150,000 lines of C code. ESP has also been used to validate
the Windows OS kernel for security vulnerabilities.
JLint
JLint is a static analysis program for the Java language initially written by Kon-
stantin Knizhnik and extended by Cyrille Artho[AB01][Art01]. JLint checks Java
code through the use of data flow analysis, abstract interpretation, and the construc-
tion of lock graphs.
JLint is designed as two separate programs which interact with each other during
analysis, the AntiC syntax analyzer and the JLint semantic analyzer[KA].
JLint has been applied to space exploration software by NASA Ames Research
Center and shown to be effective in all applications thus far. Details of this experience
are provided in Artho and Havelund[AH04]. Rutar et al. [RAF04] compare JLint
with several other analysis tools for Java. JLint has also been applied to large scale
industrial programs in [AB01].
Lint4j
Lint4j (“Lint for Java”) is a static analyzer that detects locking and threading
issues, performance and scalability problems, and checks complex contracts such as
Java serialization by performing type, data flow, and lock graph analysis. In many
regards, Lint4j is quite similar in scope to JLint. The checks within Lint4j represent
the most common problems encountered while implementing products designed for
performance and scalability. General areas of problems detected are based upon those
found in Monk et al.[MKBD00], Bloch[Blo01], Allen[All02], Larman et al.[LG99],
and Gosling et al.[GJSB00]. Lint4j is written in pure Java and will therefore
execute on any platform on which the Java JDK or JRE 1.4 has been installed.
Java PathFinder
The Java PathFinder (JPF) program is a static analysis model checking tool
developed by the Robust Software Engineering Group (RSE) at NASA Ames Research
Center and available under Open Source Licensing agreement from Sourceforge. This
software is an explicit-state model checker which analyzes Java bytecode classes for
deadlocks, assertion violations and general linear-time temporal logic properties. The
user can provide custom property classes and write listener-extensions to implement
other property checks, such as race conditions. JPF uses a custom Java Virtual
Machine to simulate execution of the programs during the analysis phase.
Java PathFinder has been applied by NASA Ames to several projects. Havelund
and Pressburger [HP00] discuss the general application of an early version of the Java
PathFinder tool. Brat et al. [BDG+04] provide a detailed description of the results
of applying Java PathFinder to Martian Rover software.
ESC-Java
The Extended Static Checker for Java (ESC/Java) was developed at the Compaq
Systems Research Center (SRC) as a tool for detecting common errors in Java pro-
grams such as null dereference errors, array bounds errors, type cast errors, and race
conditions. ESC/Java is neither sound nor complete.
ESC/Java uses program verification technology and includes an annotation
language which programmers can use to express design decisions as light-weight
specifications. ESC/Java checks each class and each routine separately, allowing
ESC/Java to be applied to code that references libraries without the need for library
source code.
The initial version of ESC/Java supported the Java 1.2 language set. ESC/Java2
is based upon the initial ESC/Java tool but has been modernized to support JML and
Java 1.4, as well as to support checking frame conditions, annotations containing
method calls, and additional static checks.
ESC/Java has proven very successful at analyzing programs which include
annotations from the very beginning. However, adding ESC/Java annotations to an
existing program has proven to be an error prone and daunting task. To alleviate some of
these difficulties, Compaq Systems Research Center developed a tool, referred to as
Houdini, to aid in annotating source code. Houdini infers a set of candidate ESC/Java
annotations for a given program. ESC/Java is then run on each candidate annotation
to verify or refute the validity of each candidate assumption, generating the
appropriate warnings as necessary. The Houdini tool is described in
Flanagan and Leino[FL01].
FindBugs
FindBugs is a lightweight static analysis tool for the Java language with a reputation
for uncovering common errors in Java programs[CM04]. FindBugs automatically
detects common programming mistakes through the use of “bug patterns”, which
are code idioms that commonly represent mistakes in software. General usage of the
FindBugs tool is described in Grindstaff[Gri04a]. In addition to the built in detectors
of the FindBugs program, the tool can be extended through the development of
customized detectors, as described in Grindstaff[Gri04b].
In practice, the rate of false warnings reported by FindBugs is generally less than
50%. Rutar et al. [RAF04] compare the results of using FindBugs versus other Java
tools and report similar results. Wagner et al. [WJKT05] generally concur with this
assessment as well.
3.2 Security Tools
3.2.1 Commercial Tools
ITS4
ITS4 is a static vulnerability scanner for the C and C++ languages developed by
Cigital. The tool was developed as a replacement for a series of grep scans on source
code used to detect security vulnerabilities as part of Cigital’s consulting practice.
Output from the tool includes a complete report of results as well as suggested fixes
for each uncovered vulnerability[VBKM00].
The analysis performed is quite fast from a performance standpoint, with an
analysis of the 57,000 LOC of sendmail-8.9.3 taking approximately 6 seconds. ITS4,
however, does suffer from its simplistic nature resulting in a significant number of
false positives. ITS4 has been applied to the Linux Kernel by Majors[Maj03].
Fortify SCA
Fortify SCA is a static analysis tool produced by Fortify Software aimed at aiding
in the validation of software from a security perspective. The core of the tool includes
the Fortify Global Analysis Engine. This consists of five different static analysis en-
gines which find violations of secure coding guidelines. The Data Flow analyzer is
responsible for tracking tainted input across application architecture tiers and pro-
gramming language boundaries. The Semantic Analyzer detects usage of functions
deemed to be vulnerable as well as the context of their usage. The control flow an-
alyzer tracks the sequencing of programming operations with the intent of detecting
incorrect coding constructs. The Configuration Analyzer detects vulnerabilities in
the interactions between the structural configuration of the program and the code
architecture. The Structural Analyzer identifies security vulnerabilities brought on
by the chosen code structures. These engines can also be expanded by writing custom
rules.
While for the purposes of this dissertation the focus is on Java, Fortify SCA supports
an extensive set of programming languages, including ASP.NET, C/C++, C#, Java,
JSP, PL/SQL, T-SQL, VB.NET, XML, and other .NET languages.
3.2.2 Academic and Research Tools
LCLint and SPLint
LCLint was a product of the MIT Lab for Computer Science and the DEC Research
Center and was designed to analyze a C program which has been annotated with
additional LCL formal specifications within the source code. In addition to detecting
many of the standard syntactical issues, LCLint detects violations of abstraction
boundaries, undocumented uses of global variables, undocumented modification of
state visible to clients, and missing initialization of an actual parameter or use of an
uninitialized formal parameter[EGHT94].
Splint is the successor to LCLint, as the focus was changed to include secure pro-
grams. The name is extracted from “SPecification Lint” and “Secure Programming
Lint”. Splint extends LCLint to include checking for de-referencing a null pointer, us-
ing possibly undefined storage or returning storage that is not properly defined, type
mismatches, violations of information hiding, memory management errors, danger-
ous aliasing, modifications and global variable uses inconsistent with specified inter-
faces, problematic control flow (likely infinite loops), fall through cases or incomplete
switches, suspicious statements, buffer overflow vulnerabilities, dangerous macro
implementations or invocations, and violations of customized naming conventions[EL03].
Splint has been compared to other dynamic tools in Hewett and DiPalma [HD03].
LCLint and Splint can only analyze C source code.
Flawfinder
Flawfinder was developed by David A. Wheeler to analyze C and C++ source
code for potential security flaws. Flawfinder is given a listing of target files to be
processed and generates a list of potential security flaws sorted on the basis of their
risk. As with most static analysis tools, Flawfinder generates
both false positives and false negatives as it scans the given source code[Whe04].
RATS
The Rough Auditing Tool for Security (RATS) is a basic lexical analysis tool for
C and C++, similar in operation to ITS4 and Flawfinder. As implied by its name,
RATS only performs a rough analysis of source code for security vulnerabilities and
will not find all errors. It is also hampered by flagging a significant number of false
positives[CM04].
SLAM
The SLAM project from Microsoft is designed to allow the safety verification of
C code. The Microsoft tool accomplishes this by placing a strong emphasis upon
verifying API usage rules. SLAM does not require the programmer to annotate
the source program, and it minimizes false positive error messages through a process
known as “counterexample-driven refinement”. The SLAM project is intended to check
temporal safety properties.

SLAM has been extensively used within Microsoft for the verification of Windows
XP device drivers; driver behavior, including the usage of kernel API calls, has been
checked using this tool[BR02]. The SLAM analysis engine is the core of Microsoft's
Static Driver Verifier (SDV), available in beta form as part of the Windows Software
Developers Kit.
MOPS
MOPS (MOdelchecking Programs for Security properties) was a tool developed
by Hao Chen in collaboration with David Wagner to find security bugs in C programs
and to verify compliance with rules of defensive programming. MOPS was targeted
towards developers of security-critical programs and auditors reviewing the security
of existing C code. MOPS was designed to check for violations of temporal safety
properties that dictate the order of operations in a sequence.
3.3 Style Checking Tools
3.3.1 Academic Tools
PMD
PMD, like JLint and FindBugs, is a static analysis tool for Java. However, unlike
these other tools, it does not contain a dataflow component as part of its analysis. In-
stead, it searches for stylistic conventions which occur in suspicious locations[RAF04].
PMD also includes the capability to detect near-duplicate code[Jel04].
PMD allows users to create extensions to the tool to detect additional bug pat-
terns. New bug patterns can be written in either Java or XPath[RAF04].
PMD is mainly concerned with infelicities of design or style. As such, it has a low
hit rate for detecting bugs. Furthermore, enabling all rule sets in PMD generates a
significant amount of noise relative to the number of real issues.
Checkstyle
Checkstyle is a Java style analyzer that verifies whether Java source code is compliant
with predefined stylistic rules. Similar to PMD, Checkstyle has a very low hit rate for
detecting bugs within Java software. However, it does spare code reviewers the tedious
effort of verifying coding standards compliance.
3.4 Teaching Tools
3.4.1 Commercial Tools
Safer C Toolkit
The Safer C toolset (SCT) was developed by Oakwood Computing Associates
based upon extensive analysis of the failure modes of C code and the 1995 publi-
cation Safer C: Developing for High-Integrity and Safety-Critical Systems
[Hat95], as well as feedback from teaching 2500 practicing engineers the concepts of
safer programming subsets. The key intent was to provide a tool which was both
educational to the user and practical for use with development projects.
3.4.2 Academic Tools
Gauntlet
The Gauntlet tool for Java has been developed by the United States Military
Academy for use in an introductory Information Technology course. The intent of
the tool was to act as a pre-compiler, statically analyzing the source code before it
is sent to the Java compiler and translating the top 50 common errors into layman’s
terms for the students. Gauntlet was developed based upon four years of background
teaching students introductory programming using Java[FCJ04].
Chapter 4
Proposed Software Reliability Model
This dissertation has thus far provided justification for a new software reliability
model. The first chapter provided a brief introduction to the problem as well as an
overview of the key objectives for this research. The second chapter provided numerous
case studies showing that catastrophic system failure can be attributed to software
faults. The third chapter introduced the concept of static analysis and provided a
literature survey of currently existing static analysis tools for the C, C++, and
Java languages. This chapter presents the relevant details of the proposed software
reliability model.
As software does not suffer from age related failure in the traditional sense, all
faults which lead to failure are present when the software is released. In a theoreti-
cal sense, if all faults can be detected in the released software, and these faults can
1Portions of this chapter appeared in Schilling and Alam[SA05a][SA06d][SA06b].
then be assigned a probability of manifesting themselves during software operation,
an appropriate estimation of the software reliability can be obtained. The difficulty
is reliably detecting the software faults and assigning the appropriate failure prob-
abilities. It is understood that it is impossible to prove that a computer program
is correct, as this problem is equivalent to the unsolvable Halting problem[Sip97].
However, it is believed that it is possible to develop a reliability model based upon
static analysis, limited testing, and a series of Bayesian Belief Networks which can be
used for assessing the reliability of existing modules.
4.1 Understanding Faults and Failures
It is often the case that the terms fault and failure are used interchangeably. This
is incorrect, as each term has a distinct and specific meaning. Unfortunately, sources
are not in agreement on the relationship between the two. Different models for this
relationship are shown in Table 4.1.
Table 4.1: Relationship between faults and failures in different models

Source | Model
ANSI / IEEE 729-1983 | error ⇒ fault ⇒ failure
Fenton | error ⇒ fault ⇒ failure
Shooman | fault ⇒ error ⇒ failure
IEC 1508 | fault ⇒ error ⇒ failure
Hatton | error ⇒ fault or defect or bug ⇒ failure
Nagappan, Ball, and Zeller[NBZ06] | defect ⇒ failure
Schilling | human error ⇒ fault ⇒ failure
For the purposes of this dissertation, a human makes a mistake during software
development, resulting in a software fault being injected into the source code. The
fault represents a static property of the source code. Faults are initiated by a software
developer making an error, either through omission or through some other developer
action.
The development of software is a labor intensive process, and as such, program-
mers make mistakes, resulting in faults being injected during development into each
and every software product. The majority of faults are injected during the imple-
mentation phase[NIS06]. The injection rate varies with the developer, the implementation
language chosen, and the software development process used. Boland[Bol02] reports
that the rate is approximately one defect for every ten lines of code developed,
and Hatton[Hat95] reports the best software as having approximately five defects
per thousand lines of code. These injected defects are removed through the software
development process, principally through review and testing.
A software failure is a dynamic property and represents an unexpected departure of
the software package from expected operational characteristics. If a piece of software
never executes, then it can not cause a failure. Software failures occur due to the
presence of one or more software faults being activated through a certain set of input
stimuli[Pai]. Any fault can potentially cause a failure of a software package, but not
all faults will cause a failure, as is shown graphically in Figure 4-1. Adams[Ada84],
as well as Fenton, Pfleeger, and Glass[FPG94] and Wagner[Wag04], indicate that,
on average, one third of all software faults manifest themselves as a failure only once
every 5000 years of execution, and only two percent of all faults lead to a MTTF of less
than 50 years. Downtime is not evenly distributed either, as it is suggested that about
90 percent of the downtime comes from at most 10 percent of the faults. From this, it
Figure 4-1: The Relationship between Faults and Failures.
follows that finding and removing a large number of defects does not necessarily yield
the highest reliability. Instead, it is important to focus on the faults that have a short
MTTF associated with them. Malaiya et al. [MLB+94] indicate that rarely executed
modules, such as error handlers and exception handlers, while rarely executed, are
notoriously difficult to test, and are highly critical to the resultant reliability for the
system.
4.1.1 Classifying Software Faults
Gray[Gra86] classifies software faults into two different categories, Bohrbugs and
Heisenbugs. Bohrbugs represent permanent design faults within software. Provided
that proper testing occurs during product development, most Bohrbugs can be
detected and easily removed from the product. Heisenbugs represent a class of
temporary faults which are random and intermittent in their occurrence. Typical sources
of Heisenbugs include memory exhaustion, race conditions and other timing related
issues, and exception handling.
Vaidyanathan and Trivedi [VT01] have extended this initial classification to include
a third category of software faults, “aging-related faults”. These faults are similar to
Heisenbugs in that they are random in occurrence; however, they are typically brought
on by prolonged execution of a given software program.
Figure 4-2: Venn Diagram classifying software bugs.
Grottke and Trivedi[GT05] have further refined the Vaidyanathan and Trivedi
model to better reflect the nature of software bugs. In this classification, there are
two major classes of software faults, Bohrbugs and Mandelbugs. Bohrbugs are
faults that are easily isolated and that manifest themselves consistently under a
well-defined set of conditions. Mandelbugs, the complementary set of bugs to Bohrbugs,
are faults whose activation and/or error propagation are complex. Typically,
Mandelbugs are difficult to isolate, as the failures they cause are not systematically
reproducible. Mandelbugs are divided into two subcategories, Heisenbugs and
aging-related bugs. Heisenbugs are faults that cease to cause failures or that manifest
themselves differently when one attempts to probe or isolate them. Aging-related
bugs are faults that lead to the accumulation of errors either inside the running
application or in its system-internal environment, resulting in an increased failure
rate and/or degraded performance with increasing time. This classification scheme is
shown graphically through the Venn diagram in Figure 4-2.
4.2 What Causes a Fault to Become a Failure
4.2.1 Example Statically Detectable Faults
The key to understanding software reliability based upon static analysis is to
understand what causes a fault to manifest itself as a failure and to be able to predict
which faults will likely manifest themselves as a failure.
In terms of faults and their density, little has been published classifying faults
by their frequency of occurrence. One of the most thorough studies of this was published
by Hatton[Hat95]. The QAC Clinic from Japan's Toyo Software Company[QAC98]
details the top five errors in Japanese embedded systems programming; however,
this publication does not provide failure rate information.
1: int32_t foo(int32_t a)
2: {
3: int32_t b;
4: if (a > 0)
5: {
6: b = a;
7: }
8: return ((b) ? 1 : 0);
9: }
Figure 4-3: Source code exhibiting uninitialized variable.
Uninitialized variables pose a significant problem for embedded source code. In
the ISO C language[ISO90][ISO99], by default, variables are not automatically
initialized when defined. For an automatic variable which is allocated either
on the stack or within a processor register, the value which was previously in that
location will be the value of the variable. Figure 4-3 shows an example of a function
which has a potential uninitialized variable. If a > 0, b is initialized to the value of a.
However, if a ≤ 0, the value of b is indeterminate, and therefore the return value of
the function is also indeterminate. This fault is statically detectable, yet it occurs
once every 250 lines of source code in Japanese programs and once every 840 lines
of code in US programs[QAC98]. If this function is executed with a ≤ 0, the resulting
behavior is entirely unpredictable.
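The fault in Figure 4-3 can be removed by guaranteeing that b receives a defined value on every path. The fragment below is a minimal sketch of one such repair; initializing b at its declaration is only one of several reasonable fixes.

#include <stdint.h>

int32_t foo(int32_t a)
{
    int32_t b = 0;           /* b now has a defined value on every path */

    if (a > 0)
    {
        b = a;
    }
    return ((b) ? 1 : 0);    /* well defined even when a <= 0 */
}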
Figure 4-4 provides another example of source code which contains statically
detectable faults.
1: typedef unsigned short uint16_t;
2: void update_average(uint16_t current_value);
3:
4: #define NUMBER_OF_VALUES_TO_AVERAGE (11u)
5:
6: static uint16_t data_values[NUMBER_OF_VALUES_TO_AVERAGE];
7: static uint16_t average = 0u;
8:
9: void update_average(uint16_t current_value)
10: {
11: static uint16_t array_offset = 0u;
12: static uint16_t data_sums = 0u;
13:
14: array_offset = ((array_offset++) % NUMBER_OF_VALUES_TO_AVERAGE);
15: data_sums -= data_values[array_offset];
16: data_sums += current_value;
17: average = (data_sums / NUMBER_OF_VALUES_TO_AVERAGE);
18: data_values[array_offset] = current_value;
19: }
Figure 4-4: Source code exhibiting statically detectable faults.
--- Module: buffer_overflow.c
array_offset = ((array_offset++) % NUMBER_OF_VALUES_TO_AVERAGE);
_
"*** LINT: buffer_overflow.c(14) Warning 564: variable
’array_offset’ depends on
order of evaluation [MISRA Rule 46]"
Figure 4-5: PC-Lint output for the buffer_overflow.c source file.
The intent of the code is to calculate the running average of an array of variables.
Variables are stored in a circular buffer data_values of length
NUMBER_OF_VALUES_TO_AVERAGE, defined at compile time to be 11. The average
of the stored values is kept in the variable average, the sum of all data values is stored
in the variable data_sums, and the current offset into the circular buffer is stored in
array_offset. Each time the routine is called, a 16 bit value is passed in representing
the current value that is to be added to the average. The array offset is incremented,
the previous value is removed from the data_sums variable, the new value is added
to the array and to data_sums, and the updated average is stored in average.
However, there are several potential problems with this simple routine associated with
the array_offset variable. The intent of the source code is to increment the offset by
one and then perform a modulus operation on this offset to place it within the range
of 0 to 10. Based upon the behavior of the compiler, this may or may not be the case.
If array_offset = 10, the resulting value of array_offset can be either 0 or 11. The value
will be 0 if the postfix increment operator (++) is executed before the modulus operation
occurs. However, if the compiler chooses to implement the logic so that the postfix
increment occurs after the modulus operation, the array_offset variable will have a
value of 11.
If array_offset is set to 11, the execution of line 15 results in an out of bounds
access for the array. In the C language, reading from outside of the array will not, in
general, cause a processor exception. However, the value read is invalid, and if the
offset value being subtracted is larger than the current data_sums value, the unsigned
data_sums variable will wrap around to a very large value. Line 18 may result in the
average value being overwritten. Depending upon how the compiler organizes RAM,
the average variable may be the next variable in RAM following the data_values array.
If this is the case, writing to data_values[11] will result in the average value being
overwritten. The exact behavior will depend upon the compiler word alignment, array
padding, and other implementation behaviors. This behavior can vary from one
compiler to another, from one compiler version to another, or depend upon compiler
options passed on the command line, especially if compiler optimization is used.
From a software reliability standpoint, the probability of failure associated with
this construct is easy to verify through testing. So long as the code has been exercised
through this transition point and proper behavior has been obtained, proper behavior
will continue until the code is recompiled, a different compiler version is used, or the
compiler is changed. Code constructs like this, however, do make code portability
difficult.
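For reference, one way the routine of Figure 4-4 could be rewritten so that it no longer depends on the order of evaluation of the postfix increment is sketched below; this is an illustrative repair, not the production fix.

#include <stdint.h>

#define NUMBER_OF_VALUES_TO_AVERAGE (11u)

static uint16_t data_values[NUMBER_OF_VALUES_TO_AVERAGE];
static uint16_t average = 0u;

void update_average(uint16_t current_value)
{
    static uint16_t array_offset = 0u;
    static uint16_t data_sums = 0u;

    /* Increment and wrap in a single well-defined expression; array_offset
       now always stays within the range 0 to 10. */
    array_offset = (uint16_t)((array_offset + 1u) % NUMBER_OF_VALUES_TO_AVERAGE);

    data_sums = (uint16_t)(data_sums - data_values[array_offset]);
    data_sums = (uint16_t)(data_sums + current_value);
    average = (uint16_t)(data_sums / NUMBER_OF_VALUES_TO_AVERAGE);
    data_values[array_offset] = current_value;
}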
Figure 4-6 exhibits another statically detectable fault, namely the potential to
1: #define NUMBER_OF_CONNECTIONS (18u)
2:
3: void ListenerClass::cmdServer (ListenerClass *This)
4: {
5: int i;
6: ProcesorConnection nCon[NUMBER_OF_CONNECTIONS];
7: ProcesorConnection *slot = nCon;
8:
9: // Init the connections
10: for ( i = 0; i < NUMBER_OF_CONNECTIONS; i++ )
11: {
12: nCon[i].setOwner((void*)This);
13: }
14:
15: while ( _instance->cmdServe.accept( *slot ) == OK )
16: {
17: // Validate connection
18: if ( This->validateClient( slot ) == ERROR )
19: {
20: // Connection rejected
21: slot->close_connection();
22: continue;
23: }
24: // Find a new unused slot
25: slot = nCon;
26: while ((slot != (&nCon[NUMBER_OF_CONNECTIONS])) && (slot->isActive()))
27: {
28: slot++;
29: }
30: if ( slot == (&nCon[NUMBER_OF_CONNECTIONS]) )
31: {
32: log (10, "Command listener overloaded", 0,0,0,0,0,0);
33: }
34: }
35:
36: for ( i = 0; i < NUMBER_OF_CONNECTIONS; i++ )
37: {
38: nCon[i].close_connection ();
39: }
40: }
Figure 4-6: Source exhibiting loop overflow and out of bounds array access.
de-reference outside of an array. In this case, if all of the connection slots are busy,
an error message is printed noting this condition. The slot pointer will then be
pointing one element beyond the end of the array when the code returns to execute
line 15. When line 15 executes, a de-reference beyond the end of the array will
occur. Depending upon the behavior of the cmdServe.accept(*slot) function, which
is unknown based upon the code segment provided, this can result in data corruption
outside of the given array. In a worst case scenario, this behavior could result in
an infinite loop which never terminates. The fault present within this code can be
detected by static analysis tools.
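A defensive repair is straightforward to sketch. One option, shown below using the identifiers from Figure 4-6, is simply to stop accepting connections when no free slot exists, so that slot can never be dereferenced past the end of nCon; this is an illustration of the idea rather than the actual fix applied to the system.

// Replacement for the slot search in Figure 4-6 (sketch only): find a new
// unused slot, and leave the accept loop when the array is full.
slot = nCon;
while ((slot != (&nCon[NUMBER_OF_CONNECTIONS])) && (slot->isActive()))
{
    slot++;
}
if (slot == (&nCon[NUMBER_OF_CONNECTIONS]))
{
    log (10, "Command listener overloaded", 0,0,0,0,0,0);
    break;   // stop accepting rather than dereferencing beyond the array
}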
4.2.2 When Does a Fault Manifest Itself as a Failure
In order to use static analysis for reliability prediction, it is important to under-
stand what causes these faults to become failures. In reliability growth modeling, one
of the most important parameters is known as the fault exposure ratio (FER). The
parameter represents the average detectability of a fault within software, or in other
words, the relationship between faults and failures. Naixin and Malaiya[NM96] and
von Mayrhauser and Srimani[MvMS93] discuss both the calculation of this parame-
ter as well as its meaning to a software module. This parameter, however, is entirely
black-box based, and does not help relate faults to failures at a detailed level.
There are many reasons why a fault lies dormant and does not manifest itself as
a failure. The first, and most obvious, deals with code coverage. If a fault is never
executed, it cannot lead to failure. While this is intuitively obvious, determining
whether a fault can be executed can be quite complicated and require significant
analysis.
Figure 4-7 provides an example of this complexity. The intent of the code
is to calculate the distance from a current number to the next prime number. These
types of calculations are often used in random number generation, such as in the linear
congruential random number generator, which uses the equation

I_k = (a × I_{k−1} + c) mod m    (4.1)
with a representing the multiplier, c representing the increment, and m representing
the modulus.
Depending upon the exact algorithm used, the values selected for a and c may depend
upon the distance from m to the next prime number.
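As a concrete illustration, equation (4.1) can be implemented in a few lines of C; the sketch below uses the widely published Numerical Recipes constants (a = 1664525, c = 1013904223, m = 2^32), which are shown only as an example.

#include <stdint.h>

/* Linear congruential generator implementing I_k = (a * I_{k-1} + c) mod m.
   With m = 2^32, the modulus is obtained for free from uint32_t wrap-around. */
static uint32_t lcg_state = 1u;   /* I_0, the seed */

uint32_t lcg_next(void)
{
    lcg_state = (1664525u * lcg_state) + 1013904223u;
    return lcg_state;
}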
There is, however, a statically detectable problem with this implementation. The
code begins on line 6 by checking to make certain that the number is greater than 0.
If this is the case, the code steps through a set of if and else if clauses, looking
for the smallest prime number which is greater than the value passed in. Once this
has been found, the t_next_prime_number variable is set to this value, and on line 69
the calculation

t_return_value = t_next_prime_number − p_number    (4.2)

is executed. However, there is a problem. If

p_number ≥ 127,    (4.3)

then

t_next_prime_number = 0,    (4.4)

resulting in

t_return_value < 0.    (4.5)
Since t_return_value is a uint8_t, however, t_return_value will take on a very large
positive value. It is important to note that this statically detectable fault escaped
testing even though 100% statement coverage had been achieved, as is shown in Figure
4-8.
There are several different ways we can assess the probability of this fault
manifesting itself. If we base the probability on the size of the input space, there are 256
possible inputs to the function, ranging from 0 to 255. The failure will occur any
time

127 ≤ p_number < 256    (4.6)

resulting in

p_f = 129/256 = 0.50390625.    (4.7)
However, if the input domain is limited to the domain D such that

D = {x ∈ N : x ≤ 127}    (4.8)

then the failure probability is reduced to

p_f = 1/128 = 0.0078125.    (4.9)
If the input domain is further reduced to the domain D such that

D = {x ∈ N : x ≤ 100}    (4.10)

then

p_f = 0/101 = 0.    (4.11)
This complexity partly explains why faults lie dormant for such long periods and why
many faults only surface when a change is made to the software. If an initial program
using this software never passes a value greater than 100 to the routine, it will never
fail. But if a change is made and a value of 128 can now be passed in, the failure is
more likely to surface. A change in value to include up to 255 for the input virtually
guarantees that the failure will surface.
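This behavior is easy to demonstrate with a small harness around the function of Figure 4-7; the sketch below is illustrative only, but it shows how the unsigned subtraction wraps for an input of 127.

#include <stdio.h>
#include <stdint.h>

uint8_t calculate_distance_to_next_prime_number(uint8_t p_number);

int main(void)
{
    /* 113 falls into the "< 127" branch, so the result is the expected 127 - 113 = 14. */
    printf("f(113) = %u\n", (unsigned)calculate_distance_to_next_prime_number(113u));

    /* 127 falls through every branch, t_next_prime_number remains 0, and the
       subtraction wraps in the uint8_t return value: 0 - 127 becomes 129. */
    printf("f(127) = %u\n", (unsigned)calculate_distance_to_next_prime_number(127u));

    return 0;
}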
If we instead measure failure in terms of the number of paths through the program
which can cause failure, a control flow graph can be generated for the source code, as
is shown in Figure 4-9. From this graph, the static path count through the method
can be calculated. In this case, there are 33 distinct paths through the function; one
path will cause a failure, resulting in p_f = 1/33 ≈ 0.0303 if all paths are assumed to
execute equally.
Uninitialized variables, an example of which is given in Figure 4-10, represent
the second most prevalent problem in Japanese source code. In this case, there are
seven distinct paths through the source code, yielding a static path count of 7. Six
of these paths do not contain any statically detectable faults. However, the seventh
path fails to initialize a function pointer, resulting in the program jumping to an
unknown address and most likely crashing. Assuming that each of these paths has an
equal probability of executing, p_f = 1/7 ≈ 0.1428.
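Figure 4-10 is not reproduced here, but the shape of the fault it illustrates is easy to sketch. The function below is a hypothetical example of the same class of problem, an uninitialized function pointer, and is not the code from the figure.

#include <stdio.h>

typedef void (*handler_t)(void);

static void handle_start(void) { printf("start\n"); }
static void handle_stop(void)  { printf("stop\n");  }

void dispatch(int event)
{
    handler_t handler;            /* no default value assigned */

    if (event == 1)
    {
        handler = handle_start;
    }
    else if (event == 2)
    {
        handler = handle_stop;
    }
    /* For any other event, handler is never initialized. */

    handler();                    /* jumps to an indeterminate address for other events */
}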
The very presence of these problems may or may not immediately result in a
failure. Returning a larger than expected number from a mathematical function may
not immediately result in a software failure. A random jump to a pointer address in
C will most likely result in an immediate and noticeable failure; overwriting the stack
return address is likely to result in the same behavior. Thus, for a fault to manifest
itself as a failure, the code which contains the fault must first execute, and then the
result of the fault must be used in a manner that will result in a failure occurring.
1: #include <stdint.h>
2: /* This routine will calculate the distance from the current number to the next prime number. */
3: uint8_t calculate_distance_to_next_prime_number(uint8_t p_number) {
4: int8_t t_next_prime_number = 0;
5: uint8_t t_return_value = 0;
6: if (p_number > 0) {
7: if (p_number < 2)
8: {t_next_prime_number = 2; }
9: else if (p_number < 3)
10: { t_next_prime_number = 3; }
11: else if (p_number < 5)
12: { t_next_prime_number = 5; }
13: else if (p_number < 7)
14: { t_next_prime_number = 7; }
15: else if (p_number < 11)
16: { t_next_prime_number = 11; }
17: else if (p_number < 13)
18: { t_next_prime_number = 13; }
19: else if (p_number < 17)
20: { t_next_prime_number = 17; }
21: else if (p_number < 19)
22: { t_next_prime_number = 19; }
23: else if (p_number < 23)
24: { t_next_prime_number = 23; }
25: else if (p_number < 29)
26: { t_next_prime_number = 29; }
27: else if (p_number < 31)
28: { t_next_prime_number = 31; }
29: else if (p_number < 37)
30: { t_next_prime_number = 37; }
31: else if (p_number < 41)
32: { t_next_prime_number = 41; }
33: else if (p_number < 43)
34: { t_next_prime_number = 43; }
35: else if (p_number < 47)
36: { t_next_prime_number = 47; }
37: else if (p_number < 53)
38: { t_next_prime_number = 53; }
39: else if (p_number < 59)
40: { t_next_prime_number = 59; }
41: else if (p_number < 61)
42: { t_next_prime_number = 61; }
43: else if (p_number < 67)
44: { t_next_prime_number = 67; }
45: else if (p_number < 71)
46: { t_next_prime_number = 71; }
47: else if (p_number < 73)
48: { t_next_prime_number = 73; }
49: else if (p_number < 79)
50: { t_next_prime_number = 79; }
51: else if (p_number < 83)
52: { t_next_prime_number = 83; }
53: else if (p_number < 89)
54: { t_next_prime_number = 89; }
55: else if (p_number < 97)
56: { t_next_prime_number = 97; }
57: else if (p_number < 101)
58: { t_next_prime_number = 101; }
59: else if (p_number < 103)
60: { t_next_prime_number = 103; }
61: else if (p_number < 107)
62: { t_next_prime_number = 107; }
63: else if (p_number < 109)
64: { t_next_prime_number = 109; }
65: else if (p_number < 113)
66: { t_next_prime_number = 113; }
67: else if (p_number < 127)
68: { t_next_prime_number = 127; }
69: t_return_value = t_next_prime_number - p_number;
70: }
71: return t_return_value;
72: }
Figure 4-7: Source exhibiting statically detectable mathematical error.
File ‘prime_number_example1.c’
Lines executed:100.00% of 69
prime_number_example1.c:creating ‘prime_number_example1.c.gcov’
-: 1:#include <stdint.h>
-: 2:uint8_t calculate_distance_to_next_prime_number(uint8_t p_number);
function calculate_distance_to_next_prime_number called 1143 returned 100% blocks executed 100%
1143: 3:uint8_t calculate_distance_to_next_prime_number(uint8_t p_number) {
1143: 4: int8_t t_next_prime_number = 0;
1143: 5: uint8_t t_return_value;
1143: 6: if (p_number > 0) {
1142: 7: if (p_number < 2)
118: 8: {t_next_prime_number = 2; }
1024: 9: else if (p_number < 3)
119: 10: {t_next_prime_number = 3;}
905: 11: else if (p_number < 5)
220: 12: {t_next_prime_number = 5;}
685: 13: else if (p_number < 7)
152: 14: {t_next_prime_number = 7;}
533: 15: else if (p_number < 11)
134: 16: {t_next_prime_number = 11;}
399: 17: else if (p_number < 13)
55: 18: {t_next_prime_number = 13;}
344: 19: else if (p_number < 17)
72: 20: {t_next_prime_number = 17;}
272: 21: else if (p_number < 19)
23: 22: {t_next_prime_number = 19;}
249: 23: else if (p_number < 23)
29: 24: {t_next_prime_number = 23;}
220: 25: else if (p_number < 29)
55: 26: {t_next_prime_number = 29;}
165: 27: else if (p_number < 31)
17: 28: {t_next_prime_number = 31;}
148: 29: else if (p_number < 37)
29: 30: {t_next_prime_number = 37;}
119: 31: else if (p_number < 41)
13: 32: {t_next_prime_number = 41;}
106: 33: else if (p_number < 43)
6: 34: {t_next_prime_number = 43;}
100: 35: else if (p_number < 47)
14: 36: {t_next_prime_number = 47;}
86: 37: else if (p_number < 53)
16: 38: {t_next_prime_number = 53;}
70: 39: else if (p_number < 59)
13: 40: {t_next_prime_number = 59;}
57: 41: else if (p_number < 61)
8: 42: {t_next_prime_number = 61;}
49: 43: else if (p_number < 67)
6: 44: {t_next_prime_number = 67;}
43: 45: else if (p_number < 71)
5: 46: {t_next_prime_number = 71;}
38: 47: else if (p_number < 73)
6: 48: {t_next_prime_number = 73;}
32: 49: else if (p_number < 79)
5: 50: {t_next_prime_number = 79;}
27: 51: else if (p_number < 83)
4: 52: {t_next_prime_number = 83;}
23: 53: else if (p_number < 89)
8: 54: {t_next_prime_number = 89;}
15: 55: else if (p_number < 97)
5: 56: {t_next_prime_number = 97;}
10: 57: else if (p_number < 101)
1: 58: {t_next_prime_number = 101;}
9: 59: else if (p_number < 103)
2: 60: {t_next_prime_number = 103;}
7: 61: else if (p_number < 107)
2: 62: {t_next_prime_number = 107;}
5: 63: else if (p_number < 109)
2: 64: {t_next_prime_number = 109;}
3: 65: else if (p_number < 113)
1: 66: {t_next_prime_number = 113;}
2: 67: else if (p_number < 127)
2: 68: {t_next_prime_number = 127;}
1142: 69: t_return_value = t_next_prime_number - p_number;
-: 70: }
1143: 71: return t_return_value;
-: 72:}
Figure 4-8: GNU gcov output from testing prime number source code.
Figure 4-9: Control flow graph for calculate_distance_to_next_prime_number method.
1 #include "interface.h"
2
3 static uint16_t test_active_flags;
4 static uint16_t test_done_flags;
5
6 void do_walk(void) {
7 uint8_t announce_param;
8 function_ptr_type test_param;
9 if (TEST_BIT(test_active_flags, DIAG_TEST)) {
10 if (check_for_expired_timer(TIME_IN_SPK_TEST) == EXP) {
11 start_timer();
12 if (TEST_BIT(test_active_flags, RF_TEST)) {
13 announce_param = LF_MESSAGE;
14 test_param = LF_TEST;
15 SETBIT_CLRBIT(test_active_flags, LF_TEST, RF_TEST);
16 } else if (TEST_BIT(test_active_flags, LF_TEST)) {
17 announce_param = LR_MESSAGE;
18 test_param = LR_TEST;
19 SETBIT_CLRBIT(test_active_flags, LR_TEST, LF_TEST);
20 } else if (TEST_BIT(test_active_flags, LR_TEST)) {
21 announce_param = RR_MESSAGE;
22 test_param = RR_TEST;
23 SETBIT_CLRBIT(test_active_flags, RR_TEST, LR_TEST);
24 } else if ((TEST_BIT(test_active_flags, RR_TEST)) &&
25 (get_ap_state(AUK_STATUS) != UNUSED_AUK)) {
26 announce_param = SUBWOOFER_MESSAGE;
27 test_param = AUX1_TEST;
28 SETBIT_CLRBIT(test_active_flags, SUBWOOFER1_TEST, RR_TEST);
29 } else {
30 announce_param = EXIT_TEST_MESSAGE;
31 CLRBIT(test_active_flags, DIAG_TEST);
32 SETBIT(test_done_flags, DIAG_TEST);
33 }
34 make_announcements(announce_param);
35 *test_param();
36 }
37 }
38 }
Figure 4-10: Source exhibiting uninitialized variable.
Figure 4-11: Control flow graph for do_walk method.
4.3 Measuring Code Coverage
The first and most important factor in determining whether a fault will become
a failure is source code coverage. If a fault is never encountered during execution,
it cannot result in a failure. In many software systems, especially embedded
systems, the percentage of code which routinely executes is actually quite small, and
the majority of the execution time is spent covering the same lines over and over
again. Embedded systems are also typically designed around a few repetitive tasks that execute
periodically at similar rates. Thus, with limited testing covering the normal use cases
for the system, information about the “normal” execution paths can be obtained.
There are many different metrics and measurements associated with code coverage.
Kaner [Kan95] lists 101 different coverage metrics that are available. Statement
Coverage (also known as line coverage or segment coverage) measures whether each
executable statement is encountered. Block coverage is an extension of statement
coverage in which the unit of code is a sequence of non-branching statements.
Decision Coverage (also known as branch coverage, all-edges coverage, basis path
coverage, or decision-decision-path testing) reports whether boolean expressions
tested in control structures evaluated to both true and false. Condition Coverage
reports whether every possible combination of boolean sub-expressions occurred.
Condition/Decision Coverage is a hybrid measure composed of the union of condition
coverage and decision coverage. Path Coverage (also known as predicate coverage)
reports whether each of the possible paths in each function has been followed, a path
being a unique sequence of branches from the function entry to the exit. Data Flow
Coverage, a variation of path coverage, considers the sub-paths from variable
assignments to subsequent references of the variables. Function Coverage reports
whether each function or procedure was executed; it is useful during preliminary
testing to assure at least some coverage in all areas of the software. Call Coverage
(also known as call pair coverage) reports whether each function call has been made.
Loop Coverage measures whether each loop body is executed zero times, exactly once,
and more than once (consecutively). Race Coverage reports whether multiple threads
execute the same code at the same time and is used to detect failures to synchronize
access to resources. In many cases, coverage definitions overlap each other: decision
coverage includes statement coverage, Condition/Decision Coverage includes Decision
Coverage and Condition Coverage, Path Coverage includes Decision Coverage, and
Predicate Coverage includes Path Coverage and Multiple Condition Coverage [Cor05].
Marick [Mar99] cites some of the misuses of code coverage metrics. A certain
level of code coverage is often mandated by the software development process when
evaluating the effectiveness of the testing phase, and this mandated level varies widely.
Extreme Programming advocates endorse 100% method coverage in order to ensure
that all methods are invoked at least once, though exceptions are given for small
functions which are smaller than the test cases would be [Agu02][JBl]. Piwowarski,
Ohba, and Caruso [POC93] indicate that 70% statement coverage is necessary to
ensure sufficient test case coverage, that 50% statement coverage is insufficient to
exercise the module, and that going beyond 70%-80% is not cost effective. Hutchins
et al. [HFGO94] indicate that even 100% coverage is not necessarily a good indication
of testing adequacy: though more faults are discovered at 100% coverage than at 90%
or 95% coverage, faults can still remain undetected even when testing has reached
100% coverage.
There has been significant study of the relationship between code coverage and
the resulting reliability of the source code. Garg [Gar95] [Gar94] and Del Frate
[FGMP95] indicate that there is a strong correlation between code coverage obtained
during testing and software reliability, especially in larger programs. The exact extent
of this relationship, however, is unknown.
4.3.1 The Static Analysis Premise
The fundamental premise behind this model is that the resulting software relia-
bility can be related to the statically detectable faults present within the source code,
the number of paths which lead to the execution of the statically detectable faults,
and the rate of execution for each segment of the software.
To model reliability, the source code is first divided into statement blocks. A
statement block represents a contiguous block of source code instructions which is
uninterrupted by a conditional statement. By using this organization, the source
code is translated into a set of blocks connected by decisions. Statically detectable
faults can then be assigned into the appropriate block.
Figure 4-12 provides example source code for an embedded system timer routine
which verifies if a timer has or has not expired. Figure 4-13 shows a language transla-
tion of the code into Java. This source code can be decomposed into the block format
shown in Figure 4-14.
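A minimal sketch of this decomposition is given below (in Java; the class and field names are illustrative only and are not the data structures used by the tools described later). Each statement block records the statically detectable faults assigned to it, and decisions connect blocks into a graph.

import java.util.ArrayList;
import java.util.List;

// Illustrative structures only: a contiguous run of statements (a block),
// the faults assigned to it, and the decisions that connect blocks.
class StatementBlock {
    final String name;                                     // e.g. "A", "B", ... as in Figure 4-14
    final List<String> staticFaults = new ArrayList<>();   // statically detectable faults in this block
    final List<Decision> outgoing = new ArrayList<>();     // decisions leaving this block

    StatementBlock(String name) {
        this.name = name;
    }
}

class Decision {
    final StatementBlock target;   // block executed when this branch is taken
    final boolean taken;           // true branch or false branch of the condition

    Decision(StatementBlock target, boolean taken) {
        this.target = target;
        this.taken = taken;
    }
}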
1 #include <stdint.h>
2 typedef enum {FALSE, TRUE} boolean;
3
4 extern uint32_t get_current_time(void);
5
6 typedef struct {
7 uint32_t starting_time; /* Starting time for the system */
8 uint32_t timer_delay; /* Number of ms to delay */
9 boolean enabled; /* True if timer is enabled */
10 boolean periodic_timer; /* TRUE if the timer is periodic. */
11 } timer_ctrl_struct;
12
13 boolean has_time_expired(timer_ctrl_struct p_timer)
14 {
15 boolean t_return_value = FALSE;
16 uint32_t t_current_time;
17 if (p_timer.enabled == TRUE)
18 {
19 t_current_time = get_current_time();
20 if ((t_current_time > p_timer.starting_time) &&
21 ((t_current_time - p_timer.starting_time) > p_timer.timer_delay))
22 {
23 /* The timer has expired. */
24 t_return_value = TRUE;
25 }
26 else if ((t_current_time < p_timer.starting_time) &&
27 ((t_current_time + (0xFFFFFFFFu - p_timer.starting_time)) > p_timer.timer_delay))
28 {
29 /* The timer has expired and wrapped around. */
30 t_return_value = TRUE;
31 }
32 else
33 {
34 /* The timer has not yet expired. */
35 t_return_value = FALSE;
36 }
37 if (t_return_value == TRUE)
38 {
39 if (p_timer.periodic_timer == TRUE )
40 {
41 p_timer.starting_time = t_current_time;
42 }
43 else
44 {
45 p_timer.enabled = FALSE;
46 p_timer.starting_time = 0;
47 p_timer.periodic_timer = FALSE;
48 }
49 }
50 }
51 else
52 {
53 /* Timer is not enabled. */
54 }
55 return t_return_value;
56 }
Figure 4-12: Source code which determines if a timer has expired.
The reliability for each block is assessed using a Bayesian Belief Network, which is
described in detail in the next chapter. The Bayesian Belief network uses an analysis
of the fault locations, fault characteristics, historical data from past projects, fault
taxonomy data, and other parameters to determine if the given fault is either a valid
statically detected fault or a false positive fault.
1 public class sample_timer {
2 public class timer_ctrl_struct {
3 public int starting_time; /* Starting time for the system */
4 public int timer_delay; /* Number of ms to delay */
5 public boolean enabled; /* True if timer is enabled */
6 public boolean periodic_timer; /* TRUE if the timer is periodic. */
7 };
8
9 boolean has_time_expired(timer_ctrl_struct p_timer) {
10 boolean t_return_value = false;
11 int t_current_time;
12 if (p_timer.enabled == true) {
13 t_current_time = (int)(System.currentTimeMillis() % 0xFFFFFFFF);
14 if ((t_current_time > p_timer.starting_time)
15 && ((t_current_time - p_timer.starting_time) > p_timer.timer_delay)) {
16 /* The timer has expired. */
17 t_return_value = true;
18 } else if ((t_current_time < p_timer.starting_time)
19 && ((t_current_time + (0xFFFFFFFF - p_timer.starting_time)) > p_timer.timer_delay)) {
20 /* The timer has expired and wrapped around. */
21 t_return_value = true;
22 } else {
23 /* The timer has not yet expired. */
24 t_return_value = false;
25 }
26 if (t_return_value == true) {
27 if (p_timer.periodic_timer == true) {
28 p_timer.starting_time = t_current_time;
29 } else {
30 p_timer.enabled = false;
31 p_timer.starting_time = 0;
32 p_timer.periodic_timer = false;
33 }
34 }
35 } else {
36 /* Timer is not enabled. */
37 }
38 return t_return_value;
39 }
40 }
Figure 4-13: Translation of timer expiration routine from C to Java. Note that nothing has changed other than the implementation language. The algorithm is exactly the same.
Figure 4-14: Flowchart for check_timer routine.
Integration of Code Coverage Into the Model
Simply using faults to model reliability is insufficient, for the faults must be acti-
vated through execution before manifesting themselves as a failure.
Table 4.2: Discrete Paths through sample function

Path     Uniform Path  Uniform Conditional                Uniform Path with  Uniform Conditional
         Execution     Logic                              Value Tracking     with Value Tracking
A        .10           .5000                              1/6 = 0.166        0.5000
B→C      .10           .5 · .75 · .75 · .5 = .1406        1/6 = 0.166        .5 · .75 · .75 · 1.0 = 0.2812
B→D      .10           .5 · .75 · .25 · .5 = .0468        0/6 = 0.000        .5 · .75 · .25 · 0.0 = 0.0000
B→E      .10           .5 · .25 · .5 = .0625              0/6 = 0.000        .5 · .25 · 0.0 = 0.0000
B→C→F    .10           .5 · .75 · .75 · .5 · .5 = .0703   0/6 = 0.000        .5 · .75 · .75 · 0.0 · .5 = 0.0000
B→D→F    .10           .5 · .75 · .25 · .5 · .5 = .0234   1/6 = 0.166        .5 · .75 · .25 · 1.0 · .5 = 0.0468
B→E→F    .10           .5 · .25 · .5 · .5 = .0312         1/6 = 0.166        .5 · .25 · 1.0 · .5 = 0.0625
B→C→G    .10           .5 · .75 · .75 · .5 · .5 = .0703   0/6 = 0.000        .5 · .75 · .75 · 0.0 · .5 = 0.0000
B→D→G    .10           .5 · .75 · .25 · .5 · .5 = .0234   1/6 = 0.166        .5 · .75 · .25 · 1.0 · .5 = 0.0468
B→E→G    .10           .5 · .25 · .5 · .5 = .0312         1/6 = 0.166        .5 · .25 · 1.0 · .5 = 0.0625
The simplest method for establishing code coverage using the model given would
be to assume that all paths through the method are executed with equal probability.
For example, the function diagrammed in Figure 4-14 has ten possible paths through
the source code, as is shown in Table 4.2. Using this simplest method, each path would
have a probability p_p = 1/10 = 0.10 of executing. From this method, we can then
calculate a reliability for the given function. However, it is known empirically that
this assumption of uniform path coverage is incorrect. Many functions contain fault
tolerance logic which rarely executes. Other functions contain source code which,
given the calling parameters, is never executed.
The next natural refinement of this probabilistic assignment would be to look at
the discrete decisions which cause the execution of each path through the source code.
To use this methodology, we assume that each conditional statement used to make a
decision has an equal probability of being true or false. Thus, the statement

if (p_timer.enabled == TRUE)

has a probability of p_i = .50 of taking the if condition and a probability p_e = .50 of
taking the else condition. Using this same logic, the statement

if ((t_current_time > p_timer.starting_time) &&
    ((t_current_time - p_timer.starting_time) > p_timer.timer_delay))

has a probability of

p_i = p(C1 = TRUE) · p(C2 = TRUE) = 0.5 · 0.5 = 0.25    (4.12)

of taking the if condition and a probability of .75 of taking the else condition. We
refer to this measure as uniform conditional logic, and when applying it to Figure
4-14, we obtain the third column in Table 4.2.
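Under this assumption, the probability of a path is simply the product of the assumed probabilities of the decisions taken along it. A minimal sketch follows (in Java; the method and class names are illustrative, and the hard-coded values are the branch probabilities for the B→C path of Figure 4-14).

public class UniformConditionalLogic {
    // Probability of a path = product of the assumed probabilities of each
    // decision taken along it (0.5 per simple condition, 0.25/0.75 for the
    // two-term && conditions in the timer routine).
    static double pathProbability(double... branchProbabilities) {
        double p = 1.0;
        for (double branch : branchProbabilities) {
            p *= branch;
        }
        return p;
    }

    public static void main(String[] args) {
        // Path B -> C: timer enabled (.5), first compound condition false (.75),
        // second compound condition false (.75), t_return_value == TRUE false (.5).
        System.out.println(pathProbability(0.5, 0.75, 0.75, 0.5));  // 0.140625
    }
}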
While this method does generate a valid distribution for each path through the
source code, it fails to take into account the dependencies present within the function.
For example, blocks D and E of the source code execute the statement
“t_return_value = TRUE;”. Thus, if block D or E is ever visited within the function,
then we are guaranteed to visit block F or G. Block C sets “t_return_value = FALSE;”;
thus, if we ever visit block C, we are guaranteed not to visit block F or G. Making
these changes yields the fourth and fifth columns in Table 4.2.
One problem with value tracking is that it can be difficult to reliably track variable
dependencies if different variables are used but the values are assigned elsewhere in the
code, or if there are multiple aliases used to refer to a single variable.
This method is also problematic in that paths which encounter fewer decisions
have a higher probability of executing. It is known from studying developed source
code, however, that this assumption is not always true. In many instances, the first
logical checks within a function are for fault tolerance purposes (e.g., checking for
NULL pointers or invalid data parameters), and since these conditions rarely occur,
the paths resulting from this logic are rarely executed. Other methods that can be
used include fuzzy logic reasoning for path probabilities and other advanced
techniques which weight paths based upon their intent.
Table 4.3: Execution Coverage of various paths

Path                                                                      Coverage
A                                                                         0/2537 = 0
B→C ∪ B→D ∪ B→E ∪ B→C→F ∪ B→D→F ∪ B→E→F ∪ B→C→G ∪ B→D→G ∪ B→E→G           2537/2537 = 1.0000
B→C                                                                       2415/2537 = 0.9519
B→D→F ∪ B→E→F                                                             9/2537 = 0.0035
B→D→G ∪ B→E→G                                                             113/2537 = 0.0445
B→D→F ∪ B→D→G                                                             5/2537 = 0.0020
B→E→F ∪ B→E→G                                                             117/2537 = 0.0461
These theoretical methods also lack an important connection to the implemented
source code. None of the methods profiled thus far takes into account the user-provided
data, which has the greatest effect on which paths actually execute. Depending
upon the user's preferences and use cases, the actual path behavior may vary
greatly from the theoretical values. Figure 4-15 provides output from the GNU gcov
tool showing a simple usage of the routine diagrammed in Figure 4-14. From this
information, we can construct the experimental probability of each block being executed
and set up a system of equations from this information, as is shown in Equation 4.13.
100.00% of 17 source lines executed in file check_timer.c
100.00% of 10 branches executed in file check_timer.c
90.00% of 10 branches taken at least once in file check_timer.c
100.00% of 1 calls executed in file check_timer.c
Creating check_timer.c.gcov.
#include <stdint.h>
typedef enum {FALSE, TRUE} boolean;
extern uint32_t get_current_time(void);
typedef struct {
uint32_t starting_time; /* Starting time for the system */
uint32_t timer_delay; /* Number of ms to delay */
boolean enabled; /* True if timer is enabled */
boolean periodic_timer; /* TRUE if the timer is periodic. */
} timer_ctrl_struct;
boolean has_time_expired(timer_ctrl_struct *p_timer)
2537 {
2537 boolean t_return_value = FALSE;
2537 uint32_t t_current_time;
2537 if (p_timer->enabled == TRUE)
branch 0 taken = 0%
{
2537 t_current_time = get_current_time();
call 0 returns = 100%
2537 if ((t_current_time > p_timer->starting_time) &&
branch 0 taken = 6%
branch 1 taken = 95%
((t_current_time - p_timer->starting_time) > p_timer->timer_delay))
{
/* The timer has expired. */
117 t_return_value = TRUE;
branch 0 taken = 100%
}
2420 else if ((t_current_time < p_timer->starting_time) &&
branch 0 taken = 94%
branch 1 taken = 97%
((t_current_time + (0xFFFFFFFFu - p_timer->starting_time)) > p_timer->timer_delay))
{
/* The timer has expired and wrapped around. */
5 t_return_value = TRUE;
branch 0 taken = 100%
}
else
{
/* The timer has not yet expired. */
2415 t_return_value = FALSE;
}
2537 if (t_return_value == TRUE)
branch 0 taken = 95%
{
122 if (p_timer->periodic_timer == TRUE )
branch 0 taken = 7%
{
113 p_timer->starting_time = t_current_time;
branch 0 taken = 100%
}
else
{
9 p_timer->enabled = FALSE;
9 p_timer->starting_time = 0;
9 p_timer->periodic_timer = FALSE;
}
}
}
else
{
/* Timer is not enabled. */
}
2537 return t_return_value;
}
Figure 4-15: gcov output for functional testing of timer routine.
p_{B→D→F} + p_{B→E→F} = 9/2537
p_{B→D→G} + p_{B→E→G} = 113/2537
p_{B→D→F} + p_{B→D→G} = 5/2537
p_{B→E→F} + p_{B→E→G} = 117/2537    (4.13)
However, this set of equations does not have a unique solution. Thus, the information
captured by gcov alone is not entirely suitable for determining which paths are executed
during limited testing.
There are many tools that have been developed to aid in code coverage analysis,
both commercial and open source, besides the gcov program. A detailed discussion
of Java code coverage tools is available in [Agu02]. However, none of the existing
analysis tools supports branch coverage metrics, requiring the development of our
own tool to measure branch coverage of Java programs.
It is possible to obtain better information on the branch coverage for the function
if a subtle change is made to the source code. This conceptual change involves
placing a log point in each block of source code. For this primitive example, this
is accomplished through a simple printf statement which prints a letter (A through G)
corresponding to the block of code executing. Upon exit from the function, a
newline is printed, indicating that the given trace has completed. This modified code
is shown in Figure 4-16. In this figure, lines retain their initial numbering scheme
from the original code; code which has been added is denoted with a ** symbol.
** #include <stdio.h>
1 #include <stdint.h>
2 typedef enum {FALSE, TRUE} boolean;
3
4 extern uint32_t get_current_time(void);
5
6 typedef struct {
7 uint32_t starting_time; /* Starting time for the system */
8 uint32_t timer_delay; /* Number of ms to delay */
9 boolean enabled; /* True if timer is enabled */
10 boolean periodic_timer; /* TRUE if the timer is periodic. */
11 } timer_ctrl_struct;
12
13 boolean has_time_expired(timer_ctrl_struct p_timer)
14 {
15 boolean t_return_value = FALSE;
16 uint32_t t_current_time;
17 if (p_timer.enabled == TRUE)
18 {
** printf("B");
19 t_current_time = get_current_time();
20 if ((t_current_time > p_timer.starting_time) &&
21 ((t_current_time - p_timer.starting_time) > p_timer.timer_delay))
22 {
23 /* The timer has expired. */
** printf("E");
24 t_return_value = TRUE;
25 }
26 else if ((t_current_time < p_timer.starting_time) &&
27 ((t_current_time + (0xFFFFFFFFu - p_timer.starting_time)) > p_timer.timer_delay))
28 {
29 /* The timer has expired and wrapped around. */
** printf("D");
30 t_return_value = TRUE;
31 }
32 else
33 {
34 /* The timer has not yet expired. */
** printf("C");
35 t_return_value = FALSE;
36 }
37 if (t_return_value == TRUE)
38 {
39 if (p_timer.periodic_timer == TRUE )
40 {
** printf("G");
41 p_timer.starting_time = t_current_time;
42 }
43 else
44 {
** printf("F");
45 p_timer.enabled = FALSE;
46 p_timer.starting_time = 0;
47 p_timer.periodic_timer = FALSE;
48 }
49 }
50 }
51 else
52 {
** printf("A");
53 /* Timer is not enabled. */
54 }
** printf("\n");
55 return t_return_value;
56 }
Figure 4-16: Modified timer source code to output block trace.
BC
BC
. . .
BC
BEF
BC
. . .
BC
BEG
BC
. . .
BC
BDF
Figure 4-17: Rudimentary trace output file.
By compiling and executing this modified code, a trace file matching that shown
in Figure 4-17 can be captured, providing the behavioral trace for the program. By
postprocessing this file using an AWK or Perl script, it is possible to determine the
number of unique paths executed through the function as well as their occurrence counts,
as is shown in Table 4.4. Notice that the actual path counts observed during limited
testing are significantly different from the theoretical path probabilities. One drawback
to this method is that trace logging adds significant overhead to program execution,
which may affect reliability if used in a hard real-time system.
Table 4.4: Execution Coverage of various paths
(the last four columns are repeated from Table 4.2)

Path     Coverage  Percentage           Uniform Path  Uniform Conditional  Uniform Path with  Uniform Conditional
         Count     Execution            Execution     Logic                Value Tracking     with Value Tracking
A        0         0/2537 = .0000       .10           .5000000             0.166              .500000
B→C      2415      2415/2537 = .9519    .10           .1406250             0.166              .281250
B→D      0         0/2537 = .0000       .10           .0468750             0.000              .000000
B→E      0         0/2537 = .0000       .10           .0625000             0.000              .000000
B→E→F    7         7/2537 = .0028       .10           .0312500             0.166              .062500
B→E→G    110       110/2537 = .0433     .10           .0312500             0.166              .062500
B→D→G    3         3/2537 = .0012       .10           .0234375             0.166              .046875
B→D→F    2         2/2537 = .0008       .10           .0234375             0.166              .046875
B→C→F    0         0/2537 = .0000       .10           .0703125             0.166              .000000
B→C→G    0         0/2537 = .0000       .10           .0703125             0.166              .000000
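As an illustration of the postprocessing step described above, the sketch below performs the equivalent counting in Java rather than AWK or Perl (the file name, class name, and output format are illustrative only).

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class TraceCounter {
    public static void main(String[] args) throws IOException {
        // Each line of the trace file is one path through the function, e.g. "BC" or "BEF".
        Map<String, Integer> pathCounts = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get("trace.txt"))) {
            if (!line.isEmpty()) {
                pathCounts.merge(line, 1, Integer::sum);
            }
        }
        // Print each observed path with its occurrence count, as in Table 4.4.
        pathCounts.forEach((path, count) -> System.out.println(path + " " + count));
    }
}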
Through the use of the GNU debugger (GDB), it is possible to obtain
the same path coverage information without modifying the original source code. This
is accomplished through the use of breakpoints combined with output logging. To
accomplish this, a debug script is created which defines a breakpoint for each block.
When one of these breakpoints is reached, an appropriate output message is displayed,
and then the program continues execution. This generates the same output format
as the printf method applied previously, but requires no changes to the source code.
The script used for this is shown in Figure 4-18. One disadvantage of this method,
however, is that it is impossible to generate an explicit output when block A is
encountered, as block A does not contain any executable code upon which a
breakpoint can be placed.
file a.exe
break check_timer.c:19
commands
silent
printf "B"
continue
end
break check_timer.c:35
commands
silent
printf "C"
continue
end
break check_timer.c:24
commands
silent
printf "E"
continue
end
break check_timer.c:30
commands
silent
printf "D"
continue
end
break check_timer.c:45
commands
silent
printf "F"
continue
end
break check_timer.c:41
commands
silent
printf "G"
continue
end
break check_timer.c:55
commands
silent
printf "\n"
continue
end
set height 0
run
quit
Figure 4-18: gdb script for generating path coverage output trace.
In the event that the breakpoint method described above is inappropriate for
obtaining the execution trace, there are several other methods that can be used. In
certain applications, it is not feasible for the debugger to interrupt the program's
execution; delays introduced by a debugger might cause the program to change its
behavior drastically, or perhaps fail, even when the code itself is correct. In this
situation, GDB supports a feature referred to as tracepoints. Using GDB's trace
and collect commands, one can specify locations in the program, called tracepoints,
and arbitrary expressions to evaluate when those tracepoints are reached. The
tracepoint facility can only be used with remote targets. As a final (and extremely
difficult) method, a logic analyzer can be connected to the address bus, so long as the
microprocessor on the system has an accessible address bus. By setting the triggering
system appropriately, the logic analyzer can trigger on the desired instruction
addresses, and by storing this information in the analyzer's buffer, a coverage trace
can be created. This, however, is by far the most difficult method for obtaining path
coverage.
Chapter 5

Static Analysis Fault Detectability

(Portions of this chapter appeared in Schilling and Alam [SA07b].)
Chapter 3 of this dissertation provided an overview of existing static analysis tools,
their capabilities, and their availability. However, as the goal of our
research is to use static analysis to estimate the software reliability of existing software
packages, it is imperative that the real-world detection capabilities for existing static
analysis tools be investigated. As with all fields of software engineering, static analysis
tools are constantly evolving, incorporating new features and detection capabilities,
executing on different platforms, and resolving development bugs. Therefore, the
analysis of capabilities must both be current and relevant.
For all of the advantages of static analysis tools, there have been very few indepen-
dent comparison studies of Java static analysis tools. Rutar et al. [RAF04] compare
the results of using FindBugs, JLint, and PMD on Java source code. This study,
however, is somewhat flawed: first, it only looks at open source tools and does not
investigate the performance of commercial tools; second, while the study itself was
conducted on five mid-sized programs, there was no attempt to specifically analyze
the capabilities of each tool when it comes to detecting statically detectable faults.
Forristal[For05] compares 12 commercial and open source tools for effectiveness, but
the analysis is based only on security aspects and security scanners, not the broader
range of static analysis tools available. Furthermore, while the study included tools
which tested for Java faults, the majority of the study was aimed at C and C++
analysis tools, so it is unclear how applicable the results are to the Java programming
language.
Thus, a pivotal need for the success of our modeling is to experimentally determine
the capabilities of existing static analysis tools.
5.1 Experimental Setup
In order to analyze the effectiveness of Java static analysis tools, an experiment
was created which would allow an assessment of the effectiveness of multiple static
analysis tools when applied against a standard set of source code modules. The basic
steps for this experiment involved:
1. Determining the scope of faults that would be included within the validation
suite.
2. Creating test cases which represented manifestations of the faults.
3. Running the analysis tools on validation suite.
4. Combining the tool results using the SoSART tool developed for this purpose.
85
5. Analyzing the results.
Table 5.1: Static Analysis Fault Categories for Validation Suite

Aliasing errors
Array boundary errors
Deadlocks and other synchronization errors
Infinite loop conditions
Logic faults
Mathematical faults
Null de-referencing faults
Uninitialized variables
Altogether, the validation package consisted of approximately 1200 lines of code,
broken down into small segments, each demonstrating a single fault. Injected faults
were broken into eight different categories based upon mistakes which commonly
occur during software development. The categories are shown in Table 5.1.
Aliasing errors represented errors which occurred due to multiple references
aliasing a single variable instance. Array out of bounds errors were designed to check
that static analysis tools were capable of detecting instances in which an array
reference falls outside the valid boundaries of the given array. Several different
mechanisms for indexing outside of an array were tested, including off-by-one errors
when iterating through a loop and fixed errors in which a reference definitively
outside the range of the array occurs. One test case involved de-referencing a
zero-length array, and one test involved an out-of-range reference into a
two-dimensional array. Deadlocks and synchronization errors were tested using
standard examples of code which suffered from deadlocks, livelocks, and other
synchronization issues. An attempt was also made to locate infinite loops using
static analysis. Several examples of commonly injected infinite loop scenarios were
developed based upon PSP historical data. One example used a vacuous truth for the
while condition, whereas others tested the case in which a local variable is used to
calculate the index value yet the value never changes once inside the loop construct.
In the area of logic, test cases were developed for six subareas.
Case statement tests were designed to validate that case statements missing break
statements were detected, that impossible case statements (i.e., case statements whose
values could not be generated) were detected, and that dead code within case
statements was detected. Operator precedence tests exercised the ability of the static
analysis tools to detect code which exhibited problems with operator precedence and
might produce outcomes differing from those intended. Logic conditions which always
evaluate in the same manner were also tested, as well as incorrect string comparison
usage.
Mathematical analysis consisted of two major areas, namely division by zero
detection and numerical overflow and underflow. Division by zero was tested for both
integer and floating point numbers. Null variable dereferences were tested using a set
of logic which resulted in a null reference being dereferenced. Lastly, a set of test
conditions verified the ability of the static analysis tools to detect uninitialized
variables.
Code was developed in Eclipse and, except in cases where explicit faults were desired,
was free of all warnings at compilation time. In cases where the Eclipse tool or
the Java compiler issued a warning indicating an errant construct, this information
was logged for future comparison. In many cases, it was found that even though
Eclipse provided an indication that a statically detectable error was present,
the tool still compiled the class file.
Once the analysis suite had been developed, it was placed under configuration
management and archived in a CVS repository. The suite was re-reviewed in a PSP
style review, specifically looking for faults within the test suite as well as other im-
provements that could be made. While the initial analysis only included nine tools,
a tenth tool was obtained and included in the experiment.
Following the development of the validation files, an automated process for exe-
cuting the analysis tools was developed. This ensured that the analysis of this suite
(as well as subsequent file sets) could occur in an automated and uniform manner.
An Apache Ant build file was created which automatically invoked the static analy-
sis tools. The output from the analysis tools was then combined using the Software
Static Analysis Reliability Toolkit (SOSART) [SA06a]. The SOSART tool acted as a
static analysis metadata tool as well as providing a visual environment for
reviewing faults.
When running the static analysis tools, each tool was run with all warnings and rules
enabled. While this maximized the number of warnings generated and resulted in a
significant number of false positives and other nuisance warnings, it also maximized
the potential for each tool to detect the seeded faults.
5.2 Experimental Results
The experiment consisted of analyzing our validation suite using ten different Java
static analysis tools. Five of the tools used were open source or other readily available
static analysis tools. The other five tools included in this experiment represented
commercially available static analysis tools. Due to licensing and other contractual
issues with the commercial tools, the results have been obfuscated and all the tools
will simply be referred to as Tool 1 through Tool 10.
In analyzing the results, three fundamental pieces of data from each tool were
sought. The first goal was to determine if the tool itself issued a warning which
would lead a trained software engineer to detect the injected fault within the source
code. This was the first and most significant objective, for if the tools are not able to
detect real-world fault examples, then the tools will be of little benefit to practitioners.
However, beyond detecting injected faults, we were also interested in the other faults
that the tool found within our source code. These findings can be divided into two
categories: valid faults which may pose a reliability issue for the source code, and false
positive warnings. By definition, false positive warnings encompass both faults which
cannot lead to failure given the implementation, as well as warnings which detect a
problem that is unrelated to a potential failure. Using this definition, all stylistic
warnings are considered to be invalid, for a stylistic warning by its very nature cannot
lead to failure.
5.2.1 Basic Fault Detection Capabilities
The first objective of this experiment was to determine which of the tools actually
detected the injected faults. This accomplished by reviewing the tool outputs in
SOSART and designating those tools which successfully detected the injected static
89
Table 5.2: Summary of fault detection. A 1 indicates the tool detected the injected fault.Tool
Count Eclipse 1 2 3 4 5 6 7 8 9 10
Array Out of Bounds 1 1 0 0 0 0 0 0 0 0 0 1 0Array Out of Bounds 2 0 0 0 0 0 0 0 0 0 0 0 0Array Out of Bounds 3 2 0 1 0 0 0 0 0 0 0 1 0Array Out of Bounds 4 3 0 1 1 0 0 0 0 0 0 1 0Deadlock 1 2 0 1 0 0 0 0 1 0 0 0 0Deadlock 2 3 0 1 0 0 1 0 1 0 0 0 0Deadlock 3 1 0 1 0 0 0 0 0 0 0 0 0Infinite Loop 1 3 0 1 0 0 0 0 1 1 0 0 0Infinite Loop 2 1 0 1 0 0 0 0 0 0 0 0 0Infinite Loop 3 2 0 1 1 0 0 0 0 0 0 0 0Infinite Loop 4 2 0 1 0 0 1 0 0 0 0 0 0Infinite Loop 5 0 0 0 0 0 0 0 0 0 0 0 0Infinite Loop 6 1 0 0 0 0 0 0 0 0 0 1 0Infinite Loop 7 2 0 1 0 0 0 0 0 0 0 1 0logic 1 3 0 1 0 0 0 0 1 1 0 0 0logic 2 2 1 1 0 0 0 0 0 0 0 0 0logic 3 0 0 0 0 0 0 0 0 0 0 0 0logic 4 1 0 1 0 0 0 0 0 0 0 0 0logic 5 1 0 1 0 0 0 0 0 0 0 0 0logic 6 1 0 1 0 0 0 0 0 0 0 0 0logic 7 1 0 1 0 0 0 0 0 0 0 0 0logic 8 0 0 0 0 0 0 0 0 0 0 0 0logic 9 0 0 0 0 0 0 0 0 0 0 0 0logic 10 0 0 0 0 0 0 0 0 0 0 0 0logic 11 1 0 0 0 0 0 0 0 0 0 0 1logic 12 2 0 1 0 0 0 0 0 0 0 1 0logic 13 2 0 1 0 0 0 0 0 0 0 1 0logic 14 4 0 1 0 0 0 0 0 1 1 0 1logic 15 3 0 1 0 0 0 0 0 1 0 0 1Math 1 2 0 0 1 0 0 0 0 0 0 1 0Math 2 2 0 0 1 0 0 0 0 0 0 1 0Math 3 1 0 0 0 0 0 0 0 0 0 1 0Math 4 0 0 0 0 0 0 0 0 0 0 0 0Math 5 3 0 0 0 0 0 0 1 1 0 0 1Math 6 3 0 0 0 0 0 0 1 1 0 0 1Math 7 3 0 0 0 0 0 0 1 1 0 0 1Math 8 4 0 0 0 0 1 0 1 1 0 0 1Math 9 0 0 0 0 0 0 0 0 0 0 0 0Math 10 1 0 0 0 0 0 0 0 0 0 1 0Math 11 1 0 0 0 0 0 0 0 0 0 1 0Math 12 1 0 0 0 0 0 0 0 0 0 1 0Math 13 1 0 0 0 0 0 0 0 0 0 1 0Math 14 1 0 0 0 0 0 0 0 0 0 1 0Math 15 1 0 0 0 0 0 0 0 0 0 1 0Math 16 1 0 0 0 0 0 0 0 0 0 1 0Null Dereferences 1 3 0 1 1 0 1 0 0 0 0 0 0Null Dereferences 2 0 0 0 0 0 0 0 0 0 0 0 0Null Dereferences 3 1 0 0 0 0 1 0 0 0 0 0 0Uninitialized Variable 1 1 1 0 0 0 0 0 0 0 0 0 0Uninitialized Variable 2 2 1 0 0 0 0 0 1 0 0 0 0Total Detected 3 21 5 0 5 0 9 8 1 17 7Percent Detected 6% 42% 10% 0% 1% 0% 18% 16% 2% 34% 14%
faults. These results are shown in Table 5.2. In each column, a 1 is present if the
given tool detected the fault. A 0 is present if the tool did not provide a meaningful
warning which would indicate the presence of a fault within the source code.
Beyond determining which tools detected the faults, we were also interested in
knowing which faults were detected by multiple tools. Rutar et al. [RAF04] indicated
that they found little overlap between tools in their research. We wanted to see if
this held true for our results. The simplest way of accomplishing this was to count
the number of tools which detected a given fault. This result is shown in the count
column of Table 5.3. In summary, of the 50 statically detectable faults present within
the validation suite, 22 of them, or 44% of the injected faults, were detected by
two or more static analysis tools.

Table 5.3: Static Analysis Detection Rate by Tool Count

Number of faults detected by 0 tools             9    18.0%
Number of faults detected by 1 tool             19    38.0%
Number of faults detected by 2 tools            11    22.0%
Number of faults detected by 3 tools             9    18.0%
Number of faults detected by 4 or more tools     2     4.0%
Table 5.4: Correlation between warning tool detections.

           Tool 1    Tool 2    Tool 3   Tool 4    Tool 5   Tool 6    Tool 7    Tool 8    Tool 9    Tool 10
Eclipse   -0.0443   -0.0842    N/A     -0.0842    N/A      0.1008   -0.1102   -0.036    -0.1813   -0.1019
Tool 1     1         0.1215    N/A      0.1215    N/A      0.0232    0.0707    0.1678   -0.183    -0.1098
Tool 2               1         N/A      0.1111    N/A     -0.1561   -0.1454   -0.0476    0.1829   -0.1345
Tool 3                         N/A      N/A       N/A      N/A       N/A       N/A       N/A       N/A
Tool 4                                  1         N/A      0.1908    0.0363   -0.0476   -0.2392    0.0576
Tool 5                                            N/A      N/A       N/A       N/A       N/A       N/A
Tool 6                                                     1         0.6475   -0.0669   -0.3362    .4111
Tool 7                                                               1         0.3273   -0.3132    .7673
Tool 8                                                                         1        -0.1025    .3541
Tool 9                                                                                   1        -.2896
Tool 10                                                                                             1
In order to see if there is a relationship between different tools, the correlation
between tool results was calculated. Perfect correlation between tools, in which case
the tools detected exactly the same set of faults, would be represented by a value of
1. No correlation between detections would be captured as a value of 0. Perfect
negative correlation, in which every fault detected by the first tool is not detected by
the second tool and every fault not detected by the first tool is detected by the second
tool, would be represented as a value of -1. While the results of Table 5.4 do show
some correlation between tools, the only correlation of significance (and admittedly a
low significance) is between tools 6 and 7. This correlation indicates that tools 6 and 7
are capable of detecting similar types of faults.
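The values in Table 5.4 are correlations between binary detection vectors, with one entry per injected fault set to 1 when the tool detected it. A minimal sketch of that calculation follows (in Java; Pearson's correlation applied to 0/1 vectors, which is equivalent to the phi coefficient; the example vectors are hypothetical and are not the experimental data).

public class DetectionCorrelation {
    // Pearson correlation of two equally sized vectors; applied to 0/1
    // detection vectors this is the phi coefficient.
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < n; i++) {
            cov  += (x[i] - meanX) * (y[i] - meanY);
            varX += (x[i] - meanX) * (x[i] - meanX);
            varY += (y[i] - meanY) * (y[i] - meanY);
        }
        // If a tool detected nothing, its variance is zero and the result is NaN,
        // corresponding to the N/A entries in Table 5.4.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical detection vectors for two tools over five injected faults.
        double[] toolA = {1, 0, 1, 1, 0};
        double[] toolB = {1, 0, 0, 1, 0};
        System.out.println(correlation(toolA, toolB));
    }
}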
Table 5.5: Static Analysis Tool False Positive and Stylistic Rule Detections

Test Case    Eclipse  Tool 1  Tool 2  Tool 3  Tool 4  Tool 5  Tool 6  Tool 7  Tool 8  Tool 9  Tool 10
Aliasing Error 0 0 2 13 0 5 20 0 41 0 0Array Out of Bounds 1 0 0 0 0 0 0 9 0 7 0 0Array Out of Bounds 2 0 0 0 2 0 0 7 0 9 0 0Array Out of Bounds 3 0 0 0 1 0 0 27 0 24 0 0Array Out of Bounds 4 0 0 0 1 0 0 7 0 5 0 0Deadlock 1 0 0 2 1 0 0 5 1 5 0 0Deadlock 2 0 0 0 2 0 0 11 0 11 0 0Deadlock 3 0 0 0 4 0 0 0 0 10 0 0Infinite Loop 1 0 0 0 1 0 1 11 0 13 0 0Infinite Loop 2 0 1 3 1 0 0 13 0 10 0 0Infinite Loop 3 0 1 2 2 0 0 15 0 11 0 0Infinite Loop 4 0 0 0 2 0 0 8 0 8 0 0Infinite Loop 5 0 0 0 0 0 0 7 0 7 0 0Infinite Loop 6 0 0 0 0 0 0 7 0 7 0 0Infinite Loop 7 0 0 0 2 0 0 11 0 9 0 0logic 1 0 1 0 4 0 0 29 0 31 0 0logic 2 0 0 0 4 0 0 19 0 14 0 0logic 3 0 0 0 0 0 0 9 0 9 0 0logic 4 0 0 0 4 0 0 19 0 15 0 0logic 5 0 0 0 1 0 0 4 0 5 0 0logic 6 0 0 0 5 0 0 4 0 5 0 0logic 7 0 0 0 5 0 0 3 0 4 0 0logic 8 0 1 0 4 0 0 8 0 7 0 0logic 9 0 0 0 5 0 0 8 0 7 0 0logic 10 0 0 0 4 0 0 10 0 7 0 0logic 11 0 0 0 4 0 0 6 0 5 0 0logic 12 0 0 0 3 0 0 8 0 8 0 0logic 13 0 0 0 3 0 0 10 0 11 0 0logic 14 0 0 0 4 0 0 9 0 6 0 0logic 15 0 0 0 4 0 0 3 0 4 0 0Math 1 0 0 0 0 0 0 7 0 5 0 0Math 2 0 0 0 0 0 0 6 0 6 0 0Math 3 0 0 0 0 0 0 8 0 5 0 0Math 4 0 0 0 0 0 0 6 0 6 1 0Math 5 0 0 0 2 0 0 9 0 8 0 0Math 6 0 0 0 5 0 0 9 0 9 0 0Math 7 0 0 0 2 0 0 10 0 9 0 0Math 8 0 0 0 1 0 0 10 0 11 0 0Math 9 0 0 0 1 0 0 5 0 3 0 0Math 10 0 0 0 1 0 0 2 0 1 0 0Math 11 0 0 0 1 0 0 11 0 8 1 0Math 12 0 0 0 2 0 0 7 0 5 0 0Math 13 0 0 0 1 0 0 12 0 9 0 0Math 14 0 0 0 2 0 0 8 0 6 0 0Math 15 0 0 0 0 0 0 13 0 8 0 0Math 16 0 0 0 0 0 0 7 0 6 0 0Null Dereferences 1 0 0 1 3 0 1 0 0 20 1 0Null Dereferences 2 0 0 1 3 0 0 13 0 11 0 1Null Dereferences 3 0 0 2 3 0 0 21 0 16 0 0uninitialized variables 1 0 0 0 0 0 0 8 0 8 0 0uninitialized variables 2 0 0 0 2 0 0 6 0 5 0 0uninitialized variables 3 0 0 0 2 0 0 4 0 4 0 0Total 0 4 13 117 0 7 489 1 484 3 1
5.2.2 The Impact of False Positives and Style Rules
As was stated previously, when executing the static analysis tools, each and every
rule was enabled for the static analysis tools. As would be expected, this method
generated a significant number of false positive warnings which needed to be filtered
before the valid warnings could be addressed. Our purpose for this analysis, however,
was to attempt to understand the relationship between false positives and the overall
detection of faults. Table 5.5 provides raw information relating to the false positives
92
and stylistic warning issued by each of the tools during our analysis.
Table 5.6: Correlation between false positive and stylistic rule detections

           Tool 1   Tool 2   Tool 3   Tool 4   Tool 5   Tool 6   Tool 7   Tool 8   Tool 9   Tool 10
Eclipse     N/A      N/A      N/A      N/A      N/A      N/A      N/A      N/A      N/A      N/A
Tool 1      1        0.43     0.07     N/A     -0.05     0.34    -0.04     0.23    -0.07     0.49
Tool 2               1        0.23     N/A      0.37     0.25     0.36     0.36     0.03    -0.05
Tool 3                        1        N/A      0.66     0.22    -0.08     0.55    -0.1      0.11
Tool 4                                 N/A      N/A      N/A      N/A      N/A      N/A      N/A
Tool 5                                          1        0.21    -0.03     0.69     0.07    -0.03
Tool 6                                                   1       -0.11     0.73    -0.16    -0.03
Tool 7                                                            1       -0.09    -0.03    -0.02
Tool 8                                                                     1        0.07    -0.05
Tool 9                                                                              1       -0.03
Tool 10                                                                                      1
From the raw data, two tools, Tools 6 and 8, had an extremely high rate of false
positive and stylistic warnings, with totals of 489 and 484 instances respectively.
Tool 3 was also somewhat elevated, with 117 false positive detections. This
relationship was then confirmed by performing a statistical correlation calculation
between the tools, as is shown in Table 5.6.
This correlation, coupled with anecdotal evidence collected during the analysis, led
to a further investigation of the warnings generated. In reviewing the data shown in
Table 5.7, a significant portion of the false positives were generated by two warnings,
Tool 8 Rule 1, with 399 instances, and Tool 6 Rule 4 with 375 instances. Together,
these two warnings constituted nearly two-thirds of the false positive warnings ob-
served. In reviewing the warning documentation, both of these warnings referred to
a stylistic violation of combining spaces and tabs together within source code, which
by the definition of our experiment was a false positive because it could not directly
result in a program failure.
5.3 Applicable Conclusions
This study indicates that there is a large variance in the detection capabilities
between tools. The ability of a given tool to detect an injected fault varied between
0% and 42%. This does not mean that the tools that did not detect our injected
faults were ineffective. Rather, it simply means that they were not appropriate tools
for detecting the faults injected in our experiment.
This experiment also showed a significant correlation between the false positive
warnings produced by different tools. However, it was also found that a significant
portion of the false positive detections came from two rules which essentially detected
the same condition. It is believed that through proper configuration of the tools and
filtering of the reported rules, it may be possible to significantly diminish the impact
of the false positive problem.
Based on our results, a correlation between tools and the faults which they detect
was observed: 44% of the injected faults were detected by two or more static analysis
tools. This seems to contradict the results of the Rutar experiment, in which no
significant correlation was found. The experiment reported here, however, used a
greater variety of tools and, more importantly, included commercially developed tools
which may be more capable than the open source tools used in their experiment.
Furthermore, the methods were slightly different, in that their method involved
starting with existing large projects and applying the tools to them, whereas our
experiment used a smaller validation suite to test fault discovery. We can conclude
from our experiment that it is possible to use multiple independent tools and
correlation to help reduce the impact of false positive detections when running static
analysis tools.
In common with the Rutar experiment, we conclude that it is still necessary to
use multiple static analysis tools in order to effectively detect all statically detectable
faults. Each tool tested appeared to detect a varying subset of injected faults, and
while there was an overlap between tools, we are not yet able to effectively characterize
the minimum set of tools necessary to ensure adequate fault detection.
As a last observation, even though every attempt was made to control the style of
the source code, including using Eclipse and its built-in style formatting tools, style
warnings detected by the tools were still in great abundance during the experiment.
These false positives, as has been noted by Hatton [Hat07], reduce the signal-to-noise
ratio of the analysis. Stylistic warnings by their very nature cannot directly lead to
program failure and therefore need to be carefully filtered when reviewing reliability
and failure probability.
Table 5.7: Percentage of warnings detected as valid based upon tool and warning.
(Each half of the table lists: Tool, Warning, # of valid instances, # of invalid instances, % valid.)
Tool Warning instances instances % valid Tool Warning instances instances % valid1 1 1 0 100 6 1 18 0 1001 2 1 0 100 6 2 1 0 1001 3 1 0 100 6 3 1 0 1001 4 1 0 100 6 4 0 375 01 5 1 0 100 6 5 0 2 01 6 1 0 100 6 6 1 0 1001 7 2 0 100 6 7 28 0 1001 8 0 3 0 6 6 1 0 1001 9 2 0 100 6 9 0 11 01 10 3 0 100 6 10 0 20 01 11 4 0 100 6 11 2 0 1001 12 8 0 100 6 12 4 3 57.141 13 3 0 100 6 13 3 20 13.041 14 1 0 100 6 14 2 3 401 15 2 0 100 6 15 1 0 1001 16 2 0 100 6 16 1 3 252 1 0 1 0 6 17 2 0 1002 2 2 0 100 6 18 1 0 1002 3 4 0 100 6 19 0 21 02 4 5 1 83.33 6 20 0 1 02 5 3 8 27.27 6 21 0 8 02 6 1 6 14.29 6 22 1 0 1003 1 0 44 0 6 23 0 7 03 2 0 1 0 6 24 0 4 03 3 0 4 0 6 25 0 3 03 4 1 1 50 6 26 4 0 1003 5 0 1 0 6 27 0 2 03 6 2 40 4.76 6 28 51 1 98.083 7 0 4 0 6 29 0 3 03 8 1 0 100 6 30 0 1 03 9 0 2 0 6 31 3 0 1003 10 1 0 100 6 32 0 3 03 11 5 0 100 6 33 1 0 1003 12 0 1 0 6 34 3 0 1003 13 0 2 0 7 1 2 0 1003 14 0 1 0 7 2 4 0 1003 15 3 18 14.29 7 3 2 0 1004 1 1 0 100 7 4 0 1 04 2 1 0 100 7 5 0 1 04 3 1 0 100 7 6 1 0 1004 4 1 0 100 7 7 2 0 1004 5 1 0 100 8 1 0 399 04 6 1 0 100 8 2 1 0 1004 7 1 0 100 8 3 1 0 1004 8 1 0 100 8 4 0 3 05 1 0 1 0 8 5 0 26 05 2 0 2 0 8 6 1 42 2.3310 1 1 0 100 8 7 0 1 010 2 1 0 100 8 8 1 0 10010 3 1 0 100 8 9 0 16 010 4 0 1 0 8 10 0 1 010 5 4 0 100 8 11 1 0 10010 6 1 0 100 10 9 1 0 10010 7 3 0 100 10 10 1 0 10010 8 4 0 100 10 11 2 0 100Summary: 84 142 37.16 149 981 13.18Overall 233 1123 17.18
Chapter 6
Bayesian Belief Network
Bayesian Belief Networks (BBNs) are powerful tools which have been found to be
useful for numerous applications when general behavioral trends are known but the
data being analyzed is uncertain or incomplete. BBNs, through their usage of causal
directed acyclic graphs, offer an intuitive visual representation for expert opinions yet
also provide a sound mathematical basis[Ana04].
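The underlying inference is ordinary Bayes' rule. As a purely illustrative example with hypothetical numbers (not drawn from the model or its data): suppose 30% of the warnings produced by a particular rule are historically valid, so P(valid) = 0.3, and a second independent tool flags the same location for 60% of valid warnings but only 10% of false positives. Then

P(valid | corroborated) = P(corroborated | valid) · P(valid) / P(corroborated)
                        = (0.6 · 0.3) / (0.6 · 0.3 + 0.1 · 0.7)
                        = 0.18 / 0.25
                        = 0.72.

A BBN chains many such updates across the nodes of a graph rather than applying a single rule in isolation.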
Bayesian Belief Networks have found wide acceptance within the medical field and
other areas. Haddaway et al. [Had99] provide an overview of existing BBN software
packages as well as an extensive analysis of projects which have successfully used
BBNs. Within the software engineering field, many different projects have used
Bayesian Belief Networks to solve common software engineering problems. Laskey
et al. [LAW+04] discuss the usage of Bayesian Belief Networks for the analysis of
computer security. The quality of software architectures has been assessed using a
BBN by van Gurp and Bosch [vGB99] and by Neil and Fenton [NF96]. Software
reliability has been assessed using Bayesian Belief Networks by Gran and
Helminen [GH01] and by Pai [Pai01][PD01].
6.1 General Reliability Model Overview
The fundamental premise behind this model is that the resulting software relia-
bility can be related to the number of statically detectable faults present within the
source code, the paths which lead to the execution of the statically detectable faults,
and the rate of execution of each path within the software package.
To model reliability, the source code is first divided into a set of methods or
functions. Each method or function is then divided further into a set of statement
blocks and decisions. A statement block represents a contiguous set of source code
instructions uninterrupted by a conditional statement. By using this organization,
the source code is translated into a set of blocks connected by decisions. Statically
detectable faults are then assigned to the appropriate block based upon their location
in the code.
Once the source code has been decomposed into blocks, the output from the
appropriate static analysis tools is linked into the decomposed source code. In order
to predict the reliability, the probability of execution for each branch must be
determined. This is accomplished by combining the theoretical program paths with
the actual program paths observed through execution trace capture during limited
testing. The testing consists of a set of black box tests or functional tests which are
observed at the white box level. For each method, a reliability for each block is
assigned based upon the output of a Bayesian Belief Network relating reliability to
the statically detectable faults, code coverage during limited testing, and the code
structure for the routine.
6.2 Developed Bayesian Belief Network
6.2.1 Overview
In order to accurately assess both the validity of a detected static analysis fault
and the likelihood of its manifestation, a Bayesian Belief Network has been developed.
This network, shown in Figure 6-1, incorporates historical data as well as program
execution traces to predict the probability that a given statically detectable fault
will cause a program failure.
The Bayesian Belief network can effectively be divided into three main segments.
The upper left half of the Bayesian Belief Network handles attributes related to the
validity and risk associated with a statically detected fault. The upper right half of
the Bayesian Belief Network assesses the probability that a given fault is exposed
through program execution. The bottom segment combines the results and provides
an overall estimate of the reliability for the given statically detectable fault.
Figure 6-1: Bayesian Belief Network relating statically detectable faults, code coverage
during limited testing, and the resulting net software reliability.

As is required in a Bayesian Belief Network, each continuous variable must be
converted into a discrete state value. Based upon the work of Neil and Fenton [NF96],
the majority of the variables are assigned the states of “Very High”, “High”, “Medium”,
“Low”, and “Very Low”. In certain cases, an optional state of “None” exists. This
state is generally not used in probabilistic calculations unless a variable is specifically
observed to have a value of none.
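A minimal sketch of such a discretization is shown below (in Java; the cut points are arbitrary placeholders, since the model deliberately leaves the mapping from percentages to state values to the domain).

public class StateDiscretizer {
    enum State { VERY_LOW, LOW, MEDIUM, HIGH, VERY_HIGH }

    // Map a continuous value in [0, 1] onto the five discrete BBN states.
    // The cut points below are placeholders; the model expects them to be
    // chosen from historical data for the domain in question.
    static State discretize(double value) {
        if (value < 0.2) return State.VERY_LOW;
        if (value < 0.4) return State.LOW;
        if (value < 0.6) return State.MEDIUM;
        if (value < 0.8) return State.HIGH;
        return State.VERY_HIGH;
    }

    public static void main(String[] args) {
        System.out.println(discretize(0.95));   // VERY_HIGH
    }
}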
6.2.2 Confirming Fault Validity and Determining Fault Risk
By definition, all static analysis tools have the capability of generating false pos-
itives. Some tools have a low false positive rate, while other tools have a high false
positive rate. Determining the validity of the fault is therefore the first required step
in assessing whether or not a statically detectable fault will cause a failure. Once the
validity of a given statically detectable fault has been assessed, the probability of the
fault manifesting itself as an immediate failure given program execution needs to be
assessed. Thus, the upper left segment of the reliability network concerns itself with
determining the likelihood that a given statically detectable fault is either valid or a
false positive and assessing the fault risk assuming it is a valid statically detectable
fault.

Table 6.1: Bayesian Belief Network State Definitions

Node Name                                  State Values
False Positive Rate                        Very High, High, Medium, Low, Very Low
Method Clustering                          Valid Cluster Present, Invalid Cluster Present, No Cluster Present, Unknown Cluster Present
File Clustering                            Valid Cluster Present, Invalid Cluster Present, No Cluster Present, Unknown Cluster Present
Independent Correlated                     Yes, No
Fault Validity                             Valid, False Positive
Immediate Failure Risk                     Very High, High, Medium, Low, Very Low, None
Maintenance Risk                           Yes, No
Fault Risk                                 Very High, High, Medium, Low, Very Low, None
Percentage Paths Through Block Executed    Very High, High, Medium, Low, Very Low
Test Confidence                            Very High, High, Medium, Low, Very Low
Fault Exposure Potential                   Very High, High, Medium, Low, Very Low
Distance from Nearest Path                 Adjacent, Near, Far, Very Far, None
Nearest Path Execution Percentage          Very High, High, Medium, Low, Very Low, None
Fault Execution Potential                  Very High, High, Medium, Low, Very Low, None
Net Fault Exposure                         Very High, High, Medium, Low, Very Low, None
Code Execution                             Block Executed, Method Executed, Block Reachable, Block Unreachable
Estimated Reliability                      Perfect, Very High, High, Medium, Low, Very Low
Tested Reliability                         Very High, High, Medium, Low, Very Low
Fault Failed                               Yes, No
Net Reliability                            Perfect, Very High, High, Medium, Low, Very Low
Calibrated Net Reliability                 Perfect, Very High, High, Medium, Low, Very Low
Output Color                               Red, Orange, Yellow, Green
Each detected fault type naturally has a raw false positive rate based upon the
algorithms and implementation. Certain static analysis faults are nearly always a
false positive. Therefore, the states of “Very High”, “High”, “Medium”, “Low”,
and “Very Low” have been selected to represent the false positive rate for the given
statically detectable fault. The model itself does not prescribe a specific translation
from percentages into state values, as this translation may change with the domain.
However, it is expected that this value will be collected from historical analysis of
previous projects.
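As one illustration of such a translation, a simple mapping from an observed historical false positive percentage to the five state values is sketched below. The threshold values used here are purely hypothetical assumptions chosen for illustration; the model itself leaves the cut points to be derived from the historical project data described above.

// Illustrative sketch only: the thresholds (80%, 60%, 40%, 20%) are hypothetical and
// would in practice be replaced with values derived from historical fault data.
public final class FalsePositiveRateMapper {

    public enum Rate { VERY_HIGH, HIGH, MEDIUM, LOW, VERY_LOW }

    /** Maps an observed false positive fraction (0.0 to 1.0) onto a discrete BBN state. */
    public static Rate toState(double falsePositiveFraction) {
        if (falsePositiveFraction >= 0.80) return Rate.VERY_HIGH;
        if (falsePositiveFraction >= 0.60) return Rate.HIGH;
        if (falsePositiveFraction >= 0.40) return Rate.MEDIUM;
        if (falsePositiveFraction >= 0.20) return Rate.LOW;
        return Rate.VERY_LOW;
    }
}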
The validity of a static analysis fault is also impacted by the clustering of faults.
Kremenek et al.[KAYE04] indicate that there is a strong correlation between code
locality and either valid or invalid faults. The rationale behind this clustering is that
programmers tend to make the same mistakes, and these mistakes will tend to be
localized at the method, class, file, or package level. However, clustering can also
occur if a tool enters into a run-away state and generates a significant number of
false positives. The clustering states can therefore be represented as “Valid Cluster
Present”, “Invalid Cluster Present” , “No Cluster Present”, and “Unknown Cluster
Present”. By default, when an analysis is first performed, all statically detectable
faults which are part of a cluster will be initialized to the state value “Unknown
Cluster Present”, indicating that the cluster has been shown to be neither valid nor
invalid. However, as the Software Engineer inspects the results and observes static
analysis faults to be valid or invalid, the cluster will shift to the appropriate states of
“Valid Cluster Present” or “Invalid Cluster Present” depending upon the results of
the inspection. This model recognizes two types of clustering, clustering at the file
level and clustering at the method level.
Another input node contains information on whether the static analysis fault has
been correlated with a fault reported by a second tool at the same location. The usage of multiple
static analysis tools allows an extra degree of confidence that the detected static
analysis fault is valid, for if two independent tools have detected a comparable fault
at the same location, then there is a better chance that both faults are valid. This
statement assumes that the algorithms used to detect the fault are truly independent.
Even though Rutar et al.[RAF04] and Wagner[WJKT05] did not find a significant
overlap in rule checking capabilities between tools, these experiments only used a
limited set of static analysis tools. Since these articles were published, several new
tools were introduced to the commercial marketplace. Our experimental results,
described in Chapter 5 and published in Schilling and Alam [SA07b], indicate that
there is at least some form of correlation between tools when the faults represent
the same taxonomical definition. The Independently Correlated state can have a
value of “Yes” or “No” depending on whether the statically detectable fault has been
independently correlated or not.
From these nodes, an overall summary node can be obtained, referred to as the
fault validity node. This node can contain the states “Valid” or “False Positive”,
depending on whether the fault is believed to be valid or a false positive instance.
When an analysis is first conducted, this node is estimated based upon the input
states and their observed values. However, as the software engineer begins to inspect
the static analysis warnings, the instances of this node will be observed to be either
“Valid” or “False Positive” as appropriate.
The immediate failure node represents whether the fault that has been detected is
likely to cause an immediate failure. A fault which, for example, indicates that a jump
to a null pointer may occur or that the return stack will be overwritten has a very
high probability of resulting in an immediate failure, and thus, this value will reflect
this case. However, a fault which is detected due to an operator precedence issue may
not be assigned as significant a value. Thus, for each statically detectable fault,
there is a potential that the given fault will result in a failure if the code is executed.
This node probability is directly defined by the characteristics of the fault detected,
and can be represented as “Very High”, “High”, “Medium”, “Low”, “Very Low”, and
“None”. In this case, the “None” state is reserved for static analysis warnings of a
stylistic nature, such as the usage of tabs instead of spaces to indent code. While
these can be considered to be valid warnings, these faults can not directly lead to
program failure.
While there are certain statically detectable faults which do not directly lead to
a failure, there are cases in which a statically detectable fault may represent a fault
which will manifest itself through maintenance. For example, it is deemed to be good
coding practice to enclose all if, else, while, do, and other constructs within opening
and closing brackets. While not doing this does not directly lead to a failure, it can
lead to failures as maintenance occurs on the code segment. Thus, the maintenance
risk state can be set to “Yes” or “No”, indicating whether or not the given fault is a
maintenance risk. The maintenance risk only applies to those faults which are marked
to be valid by the fault validity portion of the network.
These parameters all feed into the fault risk node. This node represents the
risk associated with a given fault and can take on the states “Very High”, “High”,
“Medium”, “Low”, “Very Low”, and “None”.
6.2.3 Assessing Fault Manifestation Likelihood
In order for a fault to result in a failure, the fault itself must be executed in
a manner which will stimulate the fault to fail. Thus, the right upper half of the
reliability Bayesian Belief Network deals with code coverage of the code block during
program execution.
Code Execution Classification
The first subnetwork assumes that the code block with the statically detectable
fault has executed during testing. If the code block has been executed, the probability
of a fault manifesting itself can be related to the number of discrete paths through the
block which have been executed versus the theoretical number of paths through the
block. A fault that is detected may only occur if certain conditions are present, and
these conditions may only be present if a certain execution path has been followed.
By increasing the number of paths executed, the likelihood of a fault manifesting
itself is diminished. However, as has been noted by Hutchins et al.[HFGO94], even
full path coverage is insufficient to guarantee that a fault will not manifest itself, for
the fault may be data dependent based upon a parameter passed into the method1.
There are four principal states that a code block with a static analysis warning
can be in, namely “Block Executed”, “Method Executed”, “Block Reachable”, or
“Block Unreachable”. If a given code block has been executed, this means that at
least one path of execution has traversed the given code block during the testing
period, resulting in the node having the value “Block Executed”. A second state for
this node occurs when the method that contains the statically detectable fault has
been executed but the specific code block has not been executed by at least one path
1Tracing the entire program state is the basis for Automatic Anomaly detection, as has been used in the DIDUCE tool[HL02] and the AMPLE[DLZ05] tools.
through the method, resulting in the state “Method Executed”. This would indicate
that the state values for the class or the method parameters passed in have not been
set properly to allow the execution of this path.
The last two states effectively deal with whether or not the code block containing
the statically detectable fault is reachable. For a Java method which has private scope
or a C method which has static visibility, this variable can only be “Block Reachable”
if there exists a direct call to the given method or function from within the scope of
the compilation unit. Otherwise, the method itself can not execute, and the value will
be “Block Unreachable”. For a Java method which has public or protected scope, or
a C method which has external linkage, it must be assumed that there is the potential
for the method to execute, and thus, by default, the node will have a value of “Block
Reachable”. “Block Unreachable” truly represents a rare state.
Fault Exposure Potential
The “Test Confidence” node serves to provide the capability to define the con-
fidence in the testing that has been used to obtain execution profiles. The testing
which is referred to reflects limited testing of the module for which the reliability is
being assessed. In the case of a new version of a software component delivered from a
vendor, this would, at best, reflect black box testing of the interface or functional testing
of the module. However, through the usage of execution trace capture, a white box
view of the component and the paths taken is obtained. This parameter allows the
evaluating engineer to adjust their confidence in the testing results based upon the
expected usage of the module in the field. As the engineer performing the reliability
analysis has more confidence that the results match what will be seen by a produc-
tion module, this value will be increased, reflecting less variance between the observed
coverage and the actual field coverage. Less confidence would indicate that more of
the unexecuted paths within the module would be expected to execute in the field.
The Fault Exposure potential relates the percentage of test paths covered and the
test confidence. When the test confidence is lowest and the percentage of executed
paths through the code block is lowest, this value will be highest. The value will be
lowest if the test confidence is very high and the percentage of executed paths is also
very high. In general, decreased test confidence will result in more variance in the
calculated network percentages as well.
The “Percent Paths Through Block Executed” node indicates what percentage of
the paths which pass through the code block containing the statically detectable fault
have been executed during testing. Based on an appropriate translation which scales
the number of paths through the code block relative to the percentages executed, this
node will have a value of “Very High”, “High”, “Medium”, “Low”, or “Very Low”.
Fault Execution Potential
The second subnetwork is based upon the premise that the code block containing
the statically detectable fault has not executed, but the containing method has exe-
cuted. In this case, the likelihood of this code executing can be related to the distance
to the nearest executed path as well as percentage of execution paths represented by
the nearest path.
The distance to the nearest path is measured in terms of the number of deci-
sions between the given code block containing a statically detectable fault and the
nearest executed path. Without additional knowledge, it is impossible to predict the
probability that a decision will result in a given outcome. Thus, the number of de-
cisions between the nearest executed path and the static analysis fault is effectively
governed by a binomial distribution. This node can be represented by the states “Adjacent”, in
which the nearest executed path is only one decision away from the static analysis
fault, “Near”, “Far”, and “Very Far”. “None” is a placeholder state which is used to
indicate that the method itself has never been executed during program testing.
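To make the distance reasoning concrete, consider one simple reading of the assumption above, offered purely for illustration: if each of the d intervening decisions is treated as an even two-way split, the chance that an execution reaching the nearest executed path continues on to the block is roughly

P(\text{block reached} \mid \text{nearest path reached}) \approx \left(\tfrac{1}{2}\right)^{d},
\qquad d = 1 \;\Rightarrow\; 0.5 \ (\text{Adjacent}), \qquad d = 4 \;\Rightarrow\; 0.0625

The dissertation's network encodes this influence through conditional probability tables rather than through this closed-form expression, so the figures above are only indicative of the trend.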
The percentage of paths through the code block node represents the percentage of
paths through the method which pass through the code block containing the statically
detectable fault. In the event that the method has not been executed, no assumption
about the probability of any given path executing can be made. All paths must
be assigned an equal probability of executing. Thus, this variable represents the
percentage of paths which lead into this program block. This node can have the
values “Very High”, “High”, “Medium”, “Low”, and “Very Low”.
The “Nearest Path Execution Percentage” node reflects the percentage of net exe-
cution paths which have gone through the nearest node. As this percentage increases,
there is an indication that more of the execution paths through the code are within
a few decisions of this code block. Since there are more execution paths nearby, the
likelihood of reaching this code block is increased each time a nearby path is executed,
for only a few decisions may be required to be different in order to reach this location.
The Nearest Path Execution Percentage states are “Very High”, “High”, “Medium”,
“Low”, “Very Low”, and “None”.
The Fault Execution Potential node represents the potential of a code block con-
taining a statically detectable fault to be executed. As the distance from the nearest
path is lowest and the nearest execution path percentage is highest and the test con-
fidence is lowest, this value will be highest. These values will decrease as the distance
from the nearest execution path increases.
Net Fault Exposure Node
The net fault exposure node is switched based upon whether the code block has
been executed or not. If the code has been executed, this value will mirror that of
the Fault Exposure Potential node. If the code has not been executed, then this
parameter will reflect that of the Fault Execution Potential. This node can have the
values of “Very High”, “High”, “Medium”, “Low”, “Very Low”, and “None”.
6.2.4 Determining Reliability
Reliability for the code block is determined by combining the Fault Risk and the
Net Fault Exposure nodes together to form the Estimated Reliability Node. As the
fault risk increases and the net fault exposure increases, the overall reliability for the
code block will decrease. The net reliability for the block can therefore be expressed as
“Perfect”, “Very High”, “High”, “Medium”, “Low”, and “Very Low”. By default, a
code block which has no statically detectable faults shall have an Estimated Reliability
of “Perfect”.
Net reliability
The Fault Failed node will reflect whether or not the given fault has led to failure
during testing. Values for this node can be “Yes” or “No”, with the default value
being “No” unless an observed fault occurs.
In the event that a statically detected fault has actually failed during the limited
testing period, the estimated values using the Bayesian Belief network are replaced
with actual reliability values from testing. The actual Tested Reliability node reflects
the observed reliability of the software as it is related to this specific fault. Values
which can occur include “Very High”, “High”, “Medium”, “Low”, and “Very Low”.
The net Reliability node serves as a switch between the Tested Reliability observed
if a failure occurs and the Estimated Reliability node. In the event that a failure has
occurred, the value here will represent that of the Tested Reliability Node. Otherwise,
the Estimated reliability node will be mirrored.
Calibrated Net reliability
The Calibrated Net reliability node allows the user to calibrate the output of the
basic network relative to the actual system being analyzed. In essence, this node
serves as a constant multiplier to either increase or decrease the reliability measures
in order that the appropriate final values are obtained. While this capability exists
within the model, all testing thus far has used this node simply as a pass-through node
in which no change is made to the output probabilities from the Net Reliability node.
6.3 Multiple Faults In a Code Block
Determining the overall reliability for a code block is straightforward if there is
only a single statically detectable fault within the given code block. In this case, the
reliability of the code block would simply be the value output by the “Calibrated
Net Reliability” node of the Bayesian Belief Network. However, if there are multiple
statically detectable faults present, further processing is necessary to determine the
reliability of the given code block.
In traditional reliability modeling with two faults, the probability of failure can
be expressed as
P(F) = P_f(F_1) + P_f(F_2) - P_f(F_1) \cdot P_f(F_2) \qquad (6.1)
where Pf(F1) represents the probability that the first fault will fail on any given
execution and Pf(F2) represents the probability that the second fault will fail. If
the failure events of the two faults are assumed to be mutually exclusive (the two
faults never fail on the same execution), the probability of failure for the system can
be reduced to

P(F) = P_f(F_1) + P_f(F_2) \qquad (6.2)
In this case,
P(F_1|F_2) = P(F_2|F_1) = 0 \qquad (6.3)
However, if a system has two fully dependent faults such that

P(F_1|F_2) = P(F_2|F_1) = 1 \qquad (6.4)
the probability of failure for the system can be reduced simply to
P(F) = P_f(F_1) = P_f(F_2). \qquad (6.5)
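As a small worked example of these relationships, with values chosen purely for illustration, suppose each of two faults fails on 1% of executions, so that P_f(F_1) = P_f(F_2) = 0.01. Then

\begin{aligned}
\text{independent (6.1):} &\quad P(F) = 0.01 + 0.01 - (0.01)(0.01) = 0.0199 \\
\text{mutually exclusive (6.2):} &\quad P(F) = 0.01 + 0.01 = 0.02 \\
\text{fully dependent (6.4, 6.5):} &\quad P(F) = 0.01
\end{aligned}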
This core concept of independence must be translated into the Bayesian belief net-
work provided and manifested in a meaningful manner. For organizational purposes,
statically detected faults are grouped and referenced using the taxonomy defined in
Appendix A. We have extended the Common Weakness Enumeration[MCJ05][MB06]
taxonomy. Thus, if two statically detectable faults are categorized into the same clas-
sification, it is assumed that they are multiple instances of the same core fault.
Figure 6-2: Simple Bayesian belief Network combining two statically detectable faults.
If the faults are not of the same type, then an estimation of the combinatorial
effect of their reliabilities must be obtained. In a traditional reliability model, this
is obtained by multiplying the reliabilities together for the two faults. However, this
model will use another Bayesian Belief Network system, as is shown in Figure 6-2.
Table 6.2: Network Combinatorial States
Node Name                        State Values
Previous Typical Reliability     Perfect, Very High, High, Medium, Low, Very Low
Previous Worst Case Reliability  Perfect, Very High, High, Medium, Low, Very Low
New Fault Reliability            Perfect, Very High, High, Medium, Low, Very Low
Next Typical Reliability         Perfect, Very High, High, Medium, Low, Very Low
Next Worst Case Reliability      Perfect, Very High, High, Medium, Low, Very Low
Table 6.3: Network Worst Case Combinatorial Probabilities
Previous Worst Case  New Fault     Next Worst Case Reliability
Reliability          Reliability   (Perfect / Very High / High / Average / Low / Very Low)
Perfect      Perfect      1 0 0 0 0 0
Perfect      Very High    0 1 0 0 0 0
Perfect      High         0 0 1 0 0 0
Perfect      Average      0 0 0 1 0 0
Perfect      Low          0 0 0 0 1 0
Perfect      Very Low     0 0 0 0 0 1
Very High    Perfect      0 1 0 0 0 0
Very High    Very High    0 1 0 0 0 0
Very High    High         0 0 1 0 0 0
Very High    Average      0 0 0 1 0 0
Very High    Low          0 0 0 0 1 0
Very High    Very Low     0 0 0 0 0 1
High         Perfect      0 0 1 0 0 0
High         Very High    0 0 1 0 0 0
High         High         0 0 1 0 0 0
High         Average      0 0 0 1 0 0
High         Low          0 0 0 0 1 0
High         Very Low     0 0 0 0 0 1
Average      Perfect      0 0 0 1 0 0
Average      Very High    0 0 0 1 0 0
Average      High         0 0 0 1 0 0
Average      Average      0 0 0 1 0 0
Average      Low          0 0 0 0 1 0
Average      Very Low     0 0 0 0 0 1
Low          Perfect      0 0 0 0 1 0
Low          Very High    0 0 0 0 1 0
Low          High         0 0 0 0 1 0
Low          Average      0 0 0 0 1 0
Low          Low          0 0 0 0 1 0
Low          Very Low     0 0 0 0 0 1
Very Low     Perfect      0 0 0 0 0 1
Very Low     Very High    0 0 0 0 0 1
Very Low     High         0 0 0 0 0 1
Very Low     Average      0 0 0 0 0 1
Very Low     Low          0 0 0 0 0 1
Very Low     Very Low     0 0 0 0 0 1
In this network, the nodes have the states shown in Table 6.2. In a basic system
with one fault present, the Previous Typical and Worst Case Reliability values will
be initialized to “Perfect” and the New Fault reliability value will be initialized to the
reliability value obtained by a single instance of the static analysis reliability network.
The Next Typical and Next Worst case Reliabilities will be calculated based upon
Table 6.4: Network Typical Combinatorial Probabilities
Previous Typical     New Fault     Next Typical Reliability
Reliability          Reliability   (Perfect / Very High / High / Average / Low / Very Low)
Perfect      Perfect      1 0 0 0 0 0
Perfect      Very High    0.5 0.5 0 0 0 0
Perfect      High         0 1 0 0 0 0
Perfect      Average      0 0.5 0.5 0 0 0
Perfect      Low          0 0 1 0 0 0
Perfect      Very Low     0 0 0.5 0.5 0 0
Very High    Perfect      0.5 0.5 0 0 0 0
Very High    Very High    0 1 0 0 0 0
Very High    High         0 0.5 0.5 0 0 0
Very High    Average      0 0 1 0 0 0
Very High    Low          0 0 0.5 0.5 0 0
Very High    Very Low     0 0 0 1 0 0
High         Perfect      0 1 0 0 0 0
High         Very High    0 0.5 0.5 0 0 0
High         High         0 0 1 0 0 0
High         Average      0 0 0.5 0.5 0 0
High         Low          0 0 0 1 0 0
High         Very Low     0 0 0 0.5 0.5 0
Average      Perfect      0 0.5 0.5 0 0 0
Average      Very High    0 0 1 0 0 0
Average      High         0 0 0.5 0.5 0 0
Average      Average      0 0 0 1 0 0
Average      Low          0 0 0 0.5 0.5 0
Average      Very Low     0 0 0 0 1 0
Low          Perfect      0 0 1 0 0 0
Low          Very High    0 0 0.5 0.5 0 0
Low          High         0 0 0 1 0 0
Low          Average      0 0 0 0.5 0.5 0
Low          Low          0 0 0 0 1 0
Low          Very Low     0 0 0 0 0.5 0.5
Very Low     Perfect      0 0 0.5 0.5 0 0
Very Low     Very High    0 0 0 1 0 0
Very Low     High         0 0 0 0.5 0.5 0
Very Low     Average      0 0 0 0 1 0
Very Low     Low          0 0 0 0 0.5 0.5
Very Low     Very Low     0 0 0 0 0 1
the conditional probabilities shown in Table 6.3 and Table 6.4.
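The conditional probability tables above follow a regular pattern: the worst case state is simply the lower (less reliable) of the two input states, while the typical state is centered on the midpoint of the two, splitting its probability evenly between the two neighboring states when the midpoint falls between them. A minimal sketch of that rule is shown below; the ordinal encoding of the states is our own illustration and is not part of the SOSART implementation.

// States ordered from best (0 = Perfect) to worst (5 = Very Low).
enum Rel { PERFECT, VERY_HIGH, HIGH, AVERAGE, LOW, VERY_LOW }

final class CombinatorialRule {
    /** Worst case (Table 6.3): the less reliable of the two inputs. */
    static Rel worstCase(Rel previous, Rel newFault) {
        return Rel.values()[Math.max(previous.ordinal(), newFault.ordinal())];
    }

    /**
     * Typical case (Table 6.4): centered on the midpoint of the two inputs.
     * When the midpoint falls halfway between two states, the probability is
     * split 0.5 / 0.5 between them, so this sketch returns a small distribution.
     */
    static java.util.Map<Rel, Double> typicalCase(Rel previous, Rel newFault) {
        int sum = previous.ordinal() + newFault.ordinal();
        java.util.Map<Rel, Double> result = new java.util.LinkedHashMap<>();
        if (sum % 2 == 0) {
            result.put(Rel.values()[sum / 2], 1.0);
        } else {
            result.put(Rel.values()[sum / 2], 0.5);
            result.put(Rel.values()[sum / 2 + 1], 0.5);
        }
        return result;
    }
}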
This core network is expanded as necessary to combine all statically detectable
faults together. For each additional fault present within the system, there will be
one additional instance of this network, with the Previous Typical and Worst case
reliability values cascading from the previous instance of the network, as is shown in
Figure 6-3.
6.4 Combining Code Blocks to Obtain Net Reliability
Once the reliability has been obtained for each code block within the method, it
is possible to obtain the overall reliability for each method. To do this, another set of
instances of the combinatorial network defined in Figure 6-2 is created. There will be
one combinatorial network created for each code block within the method. Eventually,
this will result in a complete network similar to that shown in Figure 6-4. This figure
shows the combination of 4 statically detectable faults present on two different code
blocks, which represents a very simple network. The majority of analyzed networks
typically contain upwards of 100 statically detectable faults present on 50 or more
code blocks, yielding the reliability of a single method.
Figure 6-3: Simple Bayesian belief Network combining four statically detectable faults.
Figure 6-4: Method combinatorial network showing the determination of the reliability for a network with two blocks and four statically detectable faults.
Chapter 7
Method Combinatorial Network1
7.1 Introduction
Markov Models have long been used in the study of systems reliability. As such,
they have also been applied to software reliability modeling. Publications by Musa
[MIO90], Lyu [Lyu95], Rook [Roo90], The Reliability Analysis Center [Cen96], Grot-
tke [Gro01], Xie [Xie91], Gokhale and Trivedi [GT97], and Trivedi [Tri02] all in-
clude extensive discussion on the usage of Markov Models for the calculation of Soft-
ware Reliability. One of the most commonly used models is that which has been
presented by Cheung[Che80]. In this model, a finite Markov chain with an absorbing
state is used to represent the execution of a software program. Each node of the
Markov model represents a group of executable statements having a single point of
entry and a single point of exit. The probability assigned to each edge represents the
probability that execution will follow the given path to the next node.
1Portions of this chapter have appeared in Schilling [Sch07].
Figure 7-1: A program flow graph.
Figure 7-1 represents a basic program flow graph. In this case, there is one node
within the flow graph which makes a decision (S1), and two possible execution paths
based upon that decision, (S2) and (S3) respectively. Reaching state (S4) indicates
that program execution has completed successfully. The transitions t1,2 and t1,3 repre-
sent the probability that program execution will follow the path S1 → S2 and S1 → S3
respectively.
By constructing a matrix P representing the transition probabilities for the pro-
gram, the average number of times that each state is visited can be obtained. Again
using the program flow exhibited in Figure 7-1, this matrix can be represented as
P = \begin{bmatrix} 0 & t_{1,2} & t_{1,3} & 0 \\ 0 & 0 & 0 & t_{2,4} \\ 0 & 0 & 0 & t_{3,4} \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (7.1)
where
t_{1,2} + t_{1,3} = 1.0 \qquad (7.2)
t_{2,4} = 1.0 \qquad (7.3)
t_{3,4} = 1.0 \qquad (7.4)
The average number of times each statement set is executed can be calculated
from the fundamental matrix. A Markov chain with states S1, S2, . . . , Sn, where Sn
is an absorbing state and all other states are transient, can be partitioned into the
relationship
P = \begin{bmatrix} Q & C \\ O & 1 \end{bmatrix} \qquad (7.5)
where
Q is an n − 1 by n − 1 matrix representing the transitional state probabilities,
C is a column vector, and
O is a row vector of n − 1 zeros.
The kth step transition probability matrix can be expressed as
P^k = \begin{bmatrix} Q^k & C' \\ O & 1 \end{bmatrix} \qquad (7.6)
and will converge as k approaches infinity.
The fundamental matrix for the system can be defined as
M = (I - Q)^{-1} \qquad (7.7)
Returning to the initial problem, if one assigns the values t1,2 = .5 and t1,3 = .5
to the system, the matrix P has the values
P = \begin{bmatrix} 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (7.8)
Q = \begin{bmatrix} 0 & 0.5 & 0.5 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad (7.9)
M = (I - Q)^{-1} = \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 0 & 0.5 & 0.5 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \right)^{-1} = \begin{bmatrix} 1 & 0.5 & 0.5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (7.10)
From this information, it can be concluded that on average, for each execution of
the program, node S2 will be visited 0.5 times and node S3 will be visited 0.5 times.
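The fundamental matrix computation above can be sketched in a few lines of code. The implementation below is our own illustration, not code from any tool described in this dissertation; it approximates M = (I - Q)^{-1} through the series I + Q + Q^2 + ..., which converges because Q contains only transient states, and then reads the expected visit counts for the example flow graph from the first row of M.

// Approximates the fundamental matrix M = (I - Q)^{-1} via the series I + Q + Q^2 + ...
final class FundamentalMatrix {

    static double[][] compute(double[][] q, int terms) {
        int n = q.length;
        double[][] m = identity(n);
        double[][] power = identity(n);
        for (int k = 0; k < terms; k++) {
            power = multiply(power, q);   // power becomes Q^(k+1)
            m = add(m, power);            // accumulate the series
        }
        return m;
    }

    static double[][] identity(int n) {
        double[][] id = new double[n][n];
        for (int i = 0; i < n; i++) id[i][i] = 1.0;
        return id;
    }

    static double[][] add(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) c[i][j] = a[i][j] + b[i][j];
        return c;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++) c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        // Q from equation (7.9): S1 branches to S2 or S3 with probability 0.5 each.
        double[][] q = { { 0, 0.5, 0.5 }, { 0, 0, 0 }, { 0, 0, 0 } };
        double[][] m = compute(q, 50);
        // Expected visits starting from S1: m[0][1] = 0.5 (S2) and m[0][2] = 0.5 (S3).
        System.out.printf("V(S2) = %.2f, V(S3) = %.2f%n", m[0][1], m[0][2]);
    }
}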
Figure 7-2: Program flow graph with internal loop.
If the control flow is modified slightly, as is shown in Figure 7-2, the impact of the looping
construct of state S2 can be considered. If t2,2 = 0.75, then
M = \begin{bmatrix} 1 & 2.0 & 0.5 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (7.11)
indicating that on average node S2 will be visited 2.0 times and node S3 will be visited
0.5 times. If t2,2 = 0.99, then
M = \begin{bmatrix} 1 & 50 & 0.5 \\ 0 & 100 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (7.12)
indicating that on average node S2 will be visited 50 times and node S3 will be visited
0.5 times.
This capability can then be used to calculate the reliability of the program using
the relationships
R = \prod_{j} R_j^{V_j} \qquad (7.13)

\ln R = \sum_{j} V_j \ln R_j \qquad (7.14)

R = \exp\left( \sum_{j} V_j \ln R_j \right). \qquad (7.15)
where
Rj represents the reliability of node j, and
Vj represents the average number of times node j is executed.
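Once the visit counts and node reliabilities are known, the relationship in equations (7.13) through (7.15) can be evaluated directly. The short sketch below is an illustration under the assumptions of the Cheung model rather than code from SOSART; the visit counts are those computed for Figure 7-1 above, and the node reliabilities of 0.999 are hypothetical values chosen for the example.

// Computes R = exp( sum_j V_j * ln R_j ) from equation (7.15). Illustration only.
final class CheungReliability {

    static double netReliability(double[] visits, double[] reliabilities) {
        double lnR = 0.0;
        for (int j = 0; j < visits.length; j++) {
            lnR += visits[j] * Math.log(reliabilities[j]);   // V_j * ln R_j
        }
        return Math.exp(lnR);
    }

    public static void main(String[] args) {
        // Hypothetical values for the flow graph of Figure 7-1: S2 and S3 are each
        // visited 0.5 times on average, each with an assumed reliability of 0.999.
        double[] visits        = { 0.5, 0.5 };
        double[] reliabilities = { 0.999, 0.999 };
        System.out.printf("Net reliability = %.6f%n", netReliability(visits, reliabilities));
        // Prints approximately 0.999000, i.e. exp(0.5 ln 0.999 + 0.5 ln 0.999).
    }
}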
7.2 Problems with Markov Models
Markov models present two distinct problems as the number of nodes increases. First,
solving a Markov model requires extensive mathematical computation to calculate the
fundamental matrix M = (I - Q)^{-1}; matrix inversion is approximately an O(n^3) operation,
resulting in extensive computation being necessary to compute the net reliability for the system.
Second, Markov Models also require accurate estimations for transition probabili-
ties and reliability values to be determined in order to construct the given model. In
many cases, it is not possible to accurately estimate these reliability values with an
appropriate degree of confidence in order for the Markov Model to be applied. What
is needed is a more general approach which, while providing reasonably accurate re-
sults, does not necessarily require the degree of precision necessary to use a Markov
model.
7.3 BBNs and the Cheung Model
Based on the rationale provided, our intent is to develop a system of Bayesian
Belief Networks which can be used to reliably predict the outcomes for a Markov
Model. The specific model to be represented using a BBN is the reliability
model proposed by Cheung[Che80]. The Cheung model relates the net reliability
of the system to three factors, namely the number of nodes within a program, the
reliability of each node within the program, and the frequency of execution for each
node.
The number of nodes within a program represents a structural parameter of the
software being analyzed. It is not uncommon for a real world software project to
be comprised of several thousand routines, each of which would be represented as a
single node within the model.
The reliability of each node represents the probability that when a given node
executes, execution will continue to completion without failure. Reliability can either
be measured experimentally or estimated using one of numerous techniques.
The execution frequency for each node represents, on average, how many times
the given node will be executed when the program is run. It can also represent the
number of times a method is invoked per unit of time. This value can either be
obtained experimentally or through static flow analysis of the software program. In
the first case, the program use case influences the results, which may result in a
more accurate reliability measurement. This is especially true if a common piece of
software is used in multiple environments, as the reliability may be vastly different
depending upon the execution environment. However, the second case provides a
better representation for software failure, in which a significant portion of failures
can be attributed to rarely executing exception handling routines.
7.4 Method Combinatorial BBN
The basic BBN relating the reliability of two Markov model nodes is shown in
Figure 7-3. The nodes Reliability A and Reliability B represent the reliability of
the two program segments. The nodes Coverage A and Coverage B represent the average
number of executions for the given node in one unit time period. The net reliability
is directly related to the reliability of each of the two nodes as well as the execution
rate for those nodes.
In order to use this BBN, it is necessary that the continuous values for reliability
and execution frequency be translated into discrete values which can then be further
processed. Since reliability values are often quite high, usually .9 or higher, it is often
Figure 7-3: Basic BBN for modeling a Markov Model.
more convenient to discuss reliability in terms of unreliability, which can be expressed
as
U = 1 − R (7.16)
where R represents the reliability of the system and U represents the resulting un-
reliability of the system. As most systems typically have multiple nines within the
reliability value, the U value will typically consist of several zeros after the decimal
point followed by the first significant digit. This being the case, let
U = −1 · log10(1 − R) (7.17)
With this translation, each increase in the value of U by 1 represents a decrease by a
factor of 10 of the unreliability of the system.
As a general statement, the reliability for any properly tested system will be at
least 0.99. Mission critical or safety critical avionics systems require failure rates of
Table 7.1: Bayesian Belief Network States Defined for Reliability
State Name   Abbreviation   R                        U
Perfect      P              0.99999 ≤ R              5 < U
Very High    VH             0.9999 ≤ R < 0.99999     4 ≤ U < 5
High         H              0.999 ≤ R < 0.9999       3 ≤ U < 4
Medium       M              0.99 ≤ R < 0.999         2 ≤ U < 3
Low          L              0.9 ≤ R < 0.99           1 ≤ U < 2
Very Low     VL             0.0 ≤ R < 0.9            U < 1
less than 10^-9 failures per hour of operation[Tha96]. Software, in general, by its very
essence is typically limited to a minimum failure rate of 10^-4 failures per hour of
operation. Specialized techniques, such as N-version programming, can be applied to
improve this figure, but even the best software typically has a minimum failure rate of
10^-5[Tha96] failures per hour of operation, or four orders of magnitude greater than
that which is required for mission critical systems deployment. Based on this concept,
the states shown in Table 7.1 have been defined for the Bayesian Belief Network.
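Under these definitions, the translation from a continuous reliability value to a discrete state is a direct application of equation (7.17) and Table 7.1. The sketch below is an illustration of that mapping rather than code from the SOSART implementation; for example, R = 0.9995 gives U of about 3.3 and therefore the state "High".

// Maps a continuous reliability value onto the discrete states of Table 7.1
// using U = -log10(1 - R) from equation (7.17). Illustrative sketch only.
final class ReliabilityState {

    enum State { PERFECT, VERY_HIGH, HIGH, MEDIUM, LOW, VERY_LOW }

    static State fromReliability(double r) {
        if (r >= 1.0) return State.PERFECT;      // avoid log10(0)
        double u = -1.0 * Math.log10(1.0 - r);
        if (u >= 5.0) return State.PERFECT;      // 0.99999 <= R
        if (u >= 4.0) return State.VERY_HIGH;    // 0.9999  <= R < 0.99999
        if (u >= 3.0) return State.HIGH;         // 0.999   <= R < 0.9999
        if (u >= 2.0) return State.MEDIUM;       // 0.99    <= R < 0.999
        if (u >= 1.0) return State.LOW;          // 0.9     <= R < 0.99
        return State.VERY_LOW;                   // R < 0.9
    }

    public static void main(String[] args) {
        System.out.println(fromReliability(0.9995));   // prints HIGH (U is about 3.3)
    }
}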
Table 7.2: Bayesian Belief Network States Defined for Execution Rate
State Name   Abbreviation   V                       log10(V)
Very High    VH             31.6 ≤ V                1.5 ≤ log10(V)
High         H              3.16 ≤ V < 31.6         0.5 ≤ log10(V) < 1.5
Medium       M              0.316 ≤ V < 3.16        −0.5 ≤ log10(V) < 0.5
Low          L              0.0316 ≤ V < 0.316      −1.5 ≤ log10(V) < −0.5
Very Low     VL             V < 0.0316              log10(V) < −1.5
Referring back to the relationship
\ln R = \sum_{j} V_j \ln R_j \qquad (7.18)
it can be observed that the execution rate for the node is just as significant in the
net reliability value as the reliability of the nodes. A factor of ten difference in a
given Vj value will impact the net reliability by one order of magnitude. Because the
execution rate can vary significantly, and is not bounded by an upper or lower bound,
the values for the execution rate are best expressed in terms of the log10 value of the
execution rate. This behavior results in the state definitions shown in Table 7.2.
7.5 Experimental Model Validation
In order to validate the results of this model, an experiment was set up which
would use the existing Markov model to generate test cases. These test cases would
then be fed into the Bayesian Belief Network and evaluated against the expected
results from the Markov Model.
Figure 7-4: A program flow graph.
To accomplish this, a MatLab script was created which evaluated a Markov model
simulation using the Cheung[Che80] model and the program flow which is shown in
Figure 7-4. For simplicity, R1 and R4 were fixed at 1, indicating that there was no
probability of failure for the entry and exit nodes. R2 and R3 were independently
varied between the values of 0 and .999999 with a median value of .999. The param-
eters t3,2, t2,3, t3,4, and t2,4 were also varied independently. Altogether, this resulted
in a total of 47730 test vectors being generated and the value ranges shown in Table
7.3.
Table 7.3: Markov Model Parameter Ranges
Parameter   −1 × log10 r2   −1 × log10 r3   v2         v3         Net Reliability (Rnet)
Average     3.045           3.045           79.71      79.72      0.9548
Median      3               3               0.9999     0.9999     0.9914
STD         1.748           1.748           194.9      194.9      0.07225
Min         0               0               0.000012   0.000012   0
Max         6               6               997.1      997.1      0.999999
To evaluate the accuracy of the Bayesian Belief Network, a Java application was
developed using the EBayes[Coz99] core. This application used the same input pa-
rameters that the MatLab script used. The outputs of the Bayesian Belief network
were then compared with the expected values from the Markov model, creating error
values. Comparisons with the Markov model were done in the U domain, as this
allowed an accurate assessment of error across all magnitudes. This resulted in the
derived results shown in Table 7.4.
While the raw error values are important and indicate that the average error is
less than .5, or one half of the resolution of the model, a more thorough analysis of
the error can be obtained by looking at the number of test instances and the error
Table 7.4: Differences between the Markov Model reliability values and the BBN Predicted Values
Average 0.4156
Median 0.3799
STD 0.3212
Min 0.000031
Max 3.099
for those instances. Table 7.5 shows the number of test instances in which the error
fell within the documented bounds. 96.70% of the test cases had an error of less than
1.0 relative to the value calculated by the Markov Model in the U domain.
Table 7.5: Test Error ranges and counts
Error (E)               E < 0.1   E < 0.25   E < 0.5   E < 0.75   E < 1.0
Count                   5032      13284      28508     43078      46156
Percent of test cases   10.54%    27.83%     59.73%    90.25%     96.70%
The error in the Bayesian Belief network is normally distributed over the data
range, as is shown in Table 7.6.
Table 7.6: Test error relative to normal distribution
Z value             0.5     1.0     2.0     3.0
Percentage          34.3%   72.1%   96.6%   98.7%
Normal Percentage   38.3%   68.2%   95.4%   99.7%
7.6 Extending the Bayesian Belief Network
While the network presented previously has been shown to be effective at per-
forming accurate reliability calculations, the network itself suffers from significant
limitations. Because the network can only compare two nodes at once, the program
being analyzed must either be limited to two nodes or two “non-perfect” nodes. This
limits the model itself to be a proof of concept model which can be used in academic
and theoretical settings.
Figure 7-5: Extended BBN for modeling a Markov Model.
However, by making a slight modification to the Bayesian Belief Network, it is
possible to extend the model so that it has broader application. This extension, shown
in Figure 7-5, incorporates a node which combines the net execution rate for the two
nodes. This value is scaled in the same manner as the execution rates for the two
input nodes.
By adding this additional node to the Belief network and assigning the appropri-
ate conditional probabilities, a virtually infinite number of execution nodes can be
assessed by structuring them in a manner similar to that shown in Figure 7-6. To
handle the case where the number of network nodes is not equal to a
power of two, it is necessary to add a phantom state to the BBN for execution rate
power of two, it is necessary to add a phantom state to the BBN for execution rate
which indicates that the given node never executes and the output of the network
should only be dependent upon the other node values. This results in the modified
states for the execution rate as is shown in Table 7.7.
Figure 7-6: Extended BBN allowing up to n nodes to be assessed for reliability.
Table 7.7: Extended Bayesian Belief Network States Defined for Execution Rate
State Name   Abbreviation   V                       log10(V)
Very High    VH             31.6 ≤ V                1.5 ≤ log10(V)
High         H              3.16 ≤ V < 31.6         0.5 ≤ log10(V) < 1.5
Medium       M              0.316 ≤ V < 3.16        −0.5 ≤ log10(V) < 0.5
Low          L              0.0316 ≤ V < 0.316      −1.5 ≤ log10(V) < −0.5
Very Low     VL             0 < V < 0.0316          log10(V) < −1.5
Never        N              V = 0                   N/A
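The structural idea behind Figure 7-6 can be sketched independently of the conditional probability tables themselves: program nodes are paired, each pair feeds one instance of the extended BBN, and the combined outputs are paired again until a single net value remains, with phantom nodes in the "Never" execution state used as padding when a level has an odd number of nodes. The sketch below, an illustration of ours rather than code from the SOSART implementation, shows only this pairing structure and leaves the two-node combination abstract.

import java.util.ArrayList;
import java.util.List;

// Illustrates the cascading structure of Figure 7-6: nodes are combined pairwise,
// padding with "Never executes" phantom nodes so every level pairs up evenly.
// The two-node combination itself is left abstract; in the dissertation it is the
// extended Bayesian Belief Network of Figure 7-5.
final class PairwiseCascade {

    interface Combiner<T> {
        T combine(T a, T b);        // e.g. evaluate the extended BBN for one pair of nodes
        T phantomNeverExecutes();   // padding node whose execution rate state is "Never"
    }

    static <T> T reduce(List<T> nodes, Combiner<T> combiner) {
        List<T> level = new ArrayList<>(nodes);
        while (level.size() > 1) {
            if (level.size() % 2 != 0) {
                level.add(combiner.phantomNeverExecutes());   // pad odd-sized levels
            }
            List<T> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                next.add(combiner.combine(level.get(i), level.get(i + 1)));
            }
            level = next;                                     // cascade one level upward
        }
        return level.get(0);
    }
}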
7.7 Extended Network Verification
In order to validate the results of this model, an experiment was set up which
would use the existing Markov model to generate test cases. These test cases would
then be fed into the Bayesian Belief Network and evaluated against the expected
results from the Markov Model.
Figure 7-7: Extended program flow graph.
To accomplish this, a MatLab script was created which evaluated a Markov model
simulation using the Cheung[Che80] model and the program flow which is shown in
Figure 7-7. For simplicity, R1 and R4 were fixed at 1, indicating that there was no
probability of failure for the entry and exit nodes. R2 and R3 were independently
varied between the values of 0 and .999999 with a median value of .999. The param-
eters t3,2, t2,3, t3,4, and t2,4 were also varied independently. Altogether, this resulted
in a total of 47730 test vectors being generated and the value ranges shown in Table
7.8.
Table 7.8: Markov Model Parameter Ranges
Parameter   −1 × log10 r2   −1 × log10 r3   v2         v3         Net Reliability (Rnet)
Average     3.045           3.045           79.71      79.72      0.9548
Median      3               3               0.9999     0.9999     0.9914
STD         1.748           1.748           194.9      194.9      0.07225
Min         0               0               0.000012   0.000012   0
Max         6               6               997.1      997.1      0.999999
To evaluate the accuracy of the Bayesian Belief Network, a Java application was
developed using the EBayes[Coz99] core. This application used the same input pa-
rameters that the MatLab script used. The outputs of the Bayesian Belief network
were then compared with the expected values from the Markov model, creating error
values. Comparisons with the Markov model were done in the U domain, as this
allowed an accurate assessment of error across all magnitudes. This resulted in the
derived results shown in Table 7.9.
Table 7.9: Differences between the Markov Model reliability values and the BBN Predicted Values
Average 0.4156
Median 0.3799
STD 0.3212
Min 0.000031
Max 3.099
While the raw error values are important and indicate that the average error is
less than .5, or one half of the resolution of the model, a more thorough analysis of
the error can be obtained by looking at the number of test instances and the error
for those instances. Table 7.10 shows the number of test instances in which the error
fell within the documented bounds. 96.70% of the test cases had an error of less than
1.0 relative to the value calculated by the Markov Model in the U domain.
Table 7.10: Test Error ranges and counts
Error (E)               E < 0.1   E < 0.25   E < 0.5   E < 0.75   E < 1.0
Count                   5032      13284      28508     43078      46156
Percent of test cases   10.54%    27.83%     59.73%    90.25%     96.70%
The error in the Bayesian Belief network is normally distributed over the data
range, as is shown in Table 7.11.
Table 7.11: Test error relative to normal distribution
Z value             0.5     1.0     2.0     3.0
Percentage          34.3%   72.1%   96.6%   98.7%
Normal Percentage   38.3%   68.2%   95.4%   99.7%
7.8 Summary
This chapter has demonstrated that Bayesian Belief Networks can be used as
a substitute for complete Markov Models when one is assessing the reliability of a
software package. The network presented is capable of estimating the output of a
Markov Model for software reliability within one order of magnitude when using five
Bayesian Belief nodes per pair of program nodes.
It is certainly possible to improve the accuracy by increasing the number of states
used when converting the continuous reliability and execution probability variables
into discrete states. However, for systems in which the exact reliability parameters
are not known, the resolution provided by this network is sufficient to provide an
estimate of the net reliability. One area for future research certainly is to analyze the
effect of increasing the number of states relative to the increased precision that would
be obtained by this method.
Chapter 8
The Software Static Analysis
Reliability Tool1
In order to use the proposed software reliability model, a Software Static Analysis
Reliability Tool (SOSART) has been developed. The SOSART tool combines static
analysis results, coverage metrics, and source code into a readily understandable
interface and uses this information for reliability calculation. However, before discussing the details
of the SOSART tool, it is important to understand the capabilities and limitations
of currently existing reliability tools.
8.1 Existing Software Reliability Analysis Tools
In the study of software reliability, many tools have been
developed to allow assessment and analysis. This section is intended to provide a
1Portions of this chapter have appeared in Schilling and Alam[SA06a].
brief overview of the existing tools and their capabilities in order that they can be
compared with the SOSART analysis tool developed as part of this research.
The Statistical Modeling and Estimation of Reliability Functions for Software
(SMERFS) [EKN98] [Wal01] estimates reliability of software systems using a black-
box approach. It provides a range of reliability models. The original tool used a tex-
tual based interface and operated predominantly in the UNIX environment. However,
the latest version of SMERFS, SMERFS3[Sto05], operates in a Windows environment
and includes a Graphical User Interface. SMERFS3 also supports extended function-
ality in the form of additional models for both hardware and software reliability. One
important feature missing from SMERFS is the capability to automatically collect
data. This can make it difficult to use with larger projects.
The Computer-Aided Software Reliability Estimation [SU99] (CASRE) is quite
similar to SMERFS in that it is also a black-box tool for software reliability estimation.
CASRE supports many of the same models supported by SMERFS. CASRE operates
in a Windows environment and does have a GUI. One significant feature included
in CASRE is the ability to calculate the linear combination of multiple models, allowing the
construction of more complex models.
The Automated Test Analysis for C (ATAC)[HLL94] evaluates the efficacy of
software tests using code coverage metrics. This represents the first tool discussed
that is a white-box tool. The command line tool uses a specialized compiler (atacCC)
which instruments compiled binaries to collect run-time trace information. The run-
time trace file records block coverage, decision coverage, c-use, p-use, etc. The tool
assesses test completeness and visually displays lines of code not exercised by tests.
ATAC only functions on C code.
Software Reliability and Estimation Prediction Tool (SREPT)[RGT00] allows the
assessment of software reliability across multiple lifecycle stages. Early reliability pre-
dictions are achieved through static complexity metric modeling and later estimates
include testing failure data. SREPT can estimate reliability as soon as the software's
architecture has been developed, and the tool can also be used to estimate release
times based on project data and other criteria.
The Reliability of Basic and Ultra-reliable Software Tool (ROBUST)[Den99][LM95]
supports five different software reliability growth (SRG) models. Two of the four
models can be used with static metrics for estimation during the early stages of de-
velopment, while one model includes test coverage metrics. ROBUST operates on
data sets of failure times, intervals, or coverage. Data may be displayed in text form
or in an appropriate graphical form.
The “Good-Enough” Reliability Tool (GERT)[DZN+04] provides a means of calcu-
lating software reliability estimates and of quantifying the uncertainty in the estimate.
The tool combines static source code metrics with dynamic test coverage information.
The estimate and the confidence interval is built using the Software Testing and Re-
liability Early Warning (STREW)[Nag05] in-process metric suite. GERT provides
color-coded feedback on the thoroughness of the testing effort relative to prior suc-
cessful projects. GERT is available as an open source plug-in under the Common
Public License (CPL) for the open source Eclipse development environment. GERT
has been extended to Version 2 [SDWV05], providing additional metrics as well as
better data matching.
Thus far, each of the tools discussed has lacked the ability to interact with static
analysis tools. The AWARE tool[SWX05] [HW06], developed by North Carolina State
University, however, does interact with static analysis tools. Developed as a plug-in
for the Eclipse development environment, AWARE interfaces with the Findbugs static
analysis tool. However, whereas the key intent of the tools discussed previously is to
directly aid in software reliability assessment, the AWARE tool is intended to help
software engineers in prioritizing statically detectable faults based upon the likelihood
of them being either valid or a false positive. AWARE is also limited in that it only
supports the Findbugs static analysis tool, and the user interface consists of a basic
textual listing of faults displayed in an Eclipse environment.
8.2 SOSART Concept
To effectively use the model previously developed for all but the smallest of pro-
grams requires the development of an appropriate analysis tool. This tool will be re-
sponsible for integrating source code analysis, test watchpoint generation, and static
analysis importation.
The first responsibility for the SoSART tool is to act as a bug finding meta tool
which automatically combines and correlates statically detectable faults from different
static analysis tools. It should be noted that though, while many examples given
previously in this dissertation were examples of statically detectable C faults, the
general software reliability model is not intended to be language dependent. As such,
the first application for the tool involves an analysis of a Java application. Thus, the
tool itself must be constructed in a manner to allow multiple programming languages
to be imported if the appropriate parsers are constructed.
Beyond being a meta tool, however, SoSART is also an execution trace analysis
tool. The Schilling and Alam model requires detailed coverage information for each
method in order to assess the reliability of a given software package. The SoSART
tool thus includes a customized execution trace recording system which captures and
analyzes execution traces to generate branch execution metrics. While similar in
nature to the ATAC tool discussed previously in Section 8.1, the SOSART trace tool
does not require code instrumentation during the compile phase. Instead, for Java
programs, it interfaces with the Java Platform Debugger Architecture (JPDA) and
the Java Debug Interface (JDI). This allows any Java program to be analyzed without
recompilation or modification to the source code.
SoSART also includes a complete language parser and analyzer which has been
implemented using the ANTLR[Par] toolkit. The parser breaks the source code into
the fundamental structural elements for the model, namely classes, methods, state-
ment blocks, and decisions. A class represents the highest level of organization and
consists of one or more methods. A statement block represents a continuous set of
source code instructions uninterrupted by a conditional statement. One or more state-
ment blocks, coupled with the appropriate conditional decisions, makes up a method.
When determining the path coverage, the SoSART tool uses the parsed information
to determine which execution traces match a given path through a given method.
As a side effect of parsing, SoSART is also capable of generating limited structural
metrics.
The SoSART user interface consists of two portions, a command line tool and
a graphic user interface. The command line toolkit allows users to execute Java
programs on top of the SOSART system while it collects coverage information for
analysis usage.
SoSART also includes a graphical user interface built using the JGraph toolkit[Ben06].
This Graphical User Interface allows the generation of pseudo-UML Activity diagrams
for each method, as well as displaying the static faults, branch coverage, and struc-
tural metrics for each method. Statically detectable faults are identified through a
four color scheme, with green characterizing a statement block with no known stati-
cally detectable faults, yellow indicating a slight risk of failure within that statement
block due to a statically detectable fault, orange indicating an increased risk over yel-
low, and red indicating a serious potential for failure within the code segment. Color
coding uses gradients to indicate both the most significant risk identified as well as
the typical risk. The overall reliability is calculated by combining the observed ex-
ecution paths with the potential execution paths and the statically detectable fault
locations.
General software requirements for the SOSART tool are provided in Appendix B.
8.3 SOSART Implementation Metrics
Development of the SOSART tool, per the development process requirements,
used a process derived from the PSP process for most areas of development. This was
applied for all areas of the tool which were not considered to be “research intensive”;
the excluded areas were the development of the specific Bayesian Belief Networks, the
development of the ANTLR parser (which was viewed as a learning process), and other
similar areas.
Altogether, the effort expended in the development of the SOSART tool is provided
in Table 8.1. Effort has been recorded to the nearest quarter hour.
Table 8.1: SOSART Development Metrics
Category                              Effort (Hours)   Percentage Total
Planning                              92.25            9.23%
High Level Design                     262              26.24%
High Level Design Review              37.25            3.74%
Detailed Design                       154.5            15.47%
Detailed Design Review                39.25            3.93%
Implementation(1)                     224.75           22.5%
Code Review                           29.5             2.96%
Testing                               113.5            11.37%
Post Mortem and Project Tracking      26.75            2.68%
Debugging                             18.5             1.85%
Total                                 998.25           100%
(1) This data item combines both the implementation and compile phases of a PSP project.
Table 8.2 provides performance metrics regarding the implementation of the SOSART
tool in terms of the actual design and implementation complexity. With the exception
of several parameters that were contained within autogenerated code, all implemen-
tation metrics are within standard accepted ranges. Using the value of 30359 LOC
for the tool and the relationship
\mathrm{productivity} = \frac{\mathrm{LOC}}{\mathrm{time}} \qquad (8.1)
the productivity was 30.41 lines of code per hour. This number is extremely high
relative to the typical 10-12 LOC per hour expected for commercial grade production
code, but this can be explained by the nature and composition of the project. First,
Table 8.2: SOSART Overview Metrics
Metric Description                          Core SOSART   Path Trace   Total
Number of Packages                          20            1            21
Total Lines of Code                         29182         1177         30359
Non-ANTLR Lines of Code                     14424         1177         15601
Number of Classes                           167           13           180
Number of Static Methods                    151           9            160
Number of Attributes                        571           51           622
Number of Overridden Methods                105           13           118
Number of Static Attributes                 48            12           60
Number of Methods                           1825          152          1977
Number of Defined Interfaces                25            2            27
Average McCabe Cyclomatic Complexity        3.55          2.13         N/A
Maximum Cyclomatic Complexity               171*/31       17           31
Average Nesting Depth                       1.74          1.53         N/A
Maximum Nesting Depth                       13*/9         5            9
Depth of Inheritance (Average)              2.641         1.30         N/A
Depth of Inheritance (Maximum)              7             3            7
* Complexity contained within an ANTLR autogenerated code module.
since the tool was an experimentally developed tool, the amount of time spent devel-
oping test plans and executing test plans was significantly less than would be expected
for a production grade project. Overall, this would result in a decrease in produc-
tivity if a professional grade development process were being followed. Second, the
usage of the ANTLR tool automatically generated a significant portion of the source
code. The ANTLR package itself contains 14758 lines of code. While a significant
amount of development was required to create the language description for ANTLR,
if these lines of code are removed from consideration, the productivity drops to 15.62
LOC per hour, or much closer to industry accepted values. Third and finally, while
reviews of code were conducted on the material as it was constructed, no significant
peer reviews occurred. Properly reviewing a program of this size, assuming a review
rate of 100 LOC per hour, would require an additional 144 hours of effort, bringing
the net total effort to 1142 hours, and the effective productivity to 12.6 LOC / hour.
Bug tracking for the SOSART tool was handled using the SourceForge bug track-
ing system. Any bug which was discovered after testing and the completion of integra-
tion into the development tip was tracked using the bug tracking database. Overall,
18 post-release defects were uncovered and tracked in this manner, resulting in a
post-release defect rate of approximately 0.6 defects per KLOC. This number is extremely low, and
it is suspected that further significant defects will be uncovered as the tool is further
used in the field by different researchers.
8.4 External Software Packages Used within SOSART
In order to streamline development of the SOSART analysis tool, as well as ensure
an appropriate level of quality in the final delivered tool, three external components
were used within the development of the SOSART tool, namely the ANTLR parser
generator, the JGraph graphing routines, and the EBAYES Bayesian belief engine.
Another Tool for Language Recognition (ANTLR)[Par] is a language tool which
provides the required framework for constructing recognizers, compilers, and transla-
tors. The tool uses a grammar definition which contains Java, C#, Python, or C++
actions. ANTLR was chosen because of its availability under the BSD license as well
as a readily available grammar definition for the Java Programming language which
can be readily expanded upon. The ANTLR software was used to generate the parser
for the Java input code, appropriately separating Java files into classes and methods
as well as partitioning methods into code blocks and decisions.
JGraph[Ben06] is an open source graph visualization library developed entirely in
the Java language. It is fully Swing compatible in both its visual interface as well as
its design paradigm, and can run on any JVM 1.4 or later. JGraph is used principally
in the user interface area of the SOSART tool, allowing the visualization of method
flows as well as managing the layout of the UML activity diagrams.
EBayes[Coz99] is an optimized Java engine for the calculation of Bayesian Net-
works. Its goal was to develop an engine small enough and efficient enough to perform
Bayesian Network analysis on small embedded systems microprocessors. The engine
is derived from the JavaBayes[Coz01] system which was used for the conceptual gen-
eration of the Bayesian belief networks used in this research and within the SOSART
tool. The EBayes engine was used to calculate the Bayesian belief network values
within the SOSART tool.
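
To make the role of the belief engine concrete, the following minimal sketch (which does not use the EBayes API, and whose probabilities are purely hypothetical) shows the kind of discrete Bayesian update such an engine performs when combining a prior risk estimate for a code block with an observed static analysis fault.

// Minimal illustration (not the EBayes API) of a discrete Bayesian update:
// given a prior over a "block reliability" variable and a conditional
// probability of observing a static analysis fault in each state, compute
// the posterior marginal.  All numbers are hypothetical.
public class TinyBayesExample {
    public static void main(String[] args) {
        String[] states = {"High", "Medium", "Low"};
        double[] prior       = {0.70, 0.20, 0.10};   // P(Reliability)
        double[] pFaultGiven = {0.05, 0.30, 0.80};   // P(fault observed | Reliability)

        // Bayes' rule: posterior(i) is proportional to prior(i) * P(evidence | i)
        double[] posterior = new double[states.length];
        double norm = 0.0;
        for (int i = 0; i < states.length; i++) {
            posterior[i] = prior[i] * pFaultGiven[i];
            norm += posterior[i];
        }
        for (int i = 0; i < states.length; i++) {
            posterior[i] /= norm;
            System.out.printf("P(Reliability = %s | fault) = %.3f%n", states[i], posterior[i]);
        }
    }
}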
8.5 SOSART Usage
8.5.1 Commencing Analysis
The general operation for the SOSART tool begins with the user obtaining a pre-
viously existing module to analyze. The module should compile successfully without
any significant compiler warnings. Assuming the GUI version of the tool is being used, the user starts SOSART and, upon successful start, is presented with a standard graphical menu from which the appropriate selections can be made.
Normal operation begins with the user selecting the Java files that are to be imported
and analyzed using the analysis menu, as is shown in Figure 8-1. Any number of files
can be selected for importation using the standard file dialog box so long as the
files reside within the same directory path. In addition to allowing the importation
of multiple files at one time, it is also possible to import multiple sets of files by
importing each set individually. The tool itself will protect against a single file being
imported multiple times into the project based upon the Java file name.
Figure 8-1: Analysis menu used to import Java source code files.
As the source files are imported into the tool, several things occur. First and foremost, each class is parsed and separated into its constituent methods. Summary data about each class is generated and used to populate the class summary panel, as is shown in Figure 8-2.
In addition to the summary panel, the key user interface for each imported class consists of a tabbed display, with one tab showing the source code, as is shown in Figure 8-3. Importation also generates a basic UML activity diagram or control flow diagram for each method, as is shown in Figure 8-4.

Figure 8-2: Summary panel for imported class.
Figure 8-3: Source Code Panel for program.
Figure 8-4: Basic Activity Diagram for program.
8.5.2 Obtaining Program Execution Profiles
In order to obtain the program execution profile for the given source code module,
it is necessary to obtain from the structure the line numbers which represent the start
of each code segment. The SOSART tool provides such capability. The watchpoints
are generated as a textual file by the SOSART GUI, and then these are fed to the
command line profiler while actually executes the program modules and obtains the
execution traces.
Figure 8-5: Basic Tracepoint Panel for program.
For the source code module loaded in Figure 8-3, there are many locations which
represent the start of a code block. Examples include Line 199, Line 216, Line 227,
etc. In order to obtain a listing of these locations, the tracepoint window is opened
from the analysis window. The Generate Tracepoints button will, based upon the
parsed source code, generate a listing of tracepoint locations, as is shown in Figure
8-5.
Once the tracepoint locations have been defined, the Java Path Tracer portion of
the SOSART tool can be invoked from the command line. This tool uses a set of
command line parameters to indicate the classpath for the program which is to be
executed, the trace output file, the path output file, the tracepoint file which defines
the locations that are to be observed for program execution, as well as other requisite
parameters. These parameters are shown in detail in Figure 8-6, which represents the
manual page displayed when executing the tool from the command line if an improper
set of parameters is provided.
wws@WWS-Ubunto:~/GRC_Tempest/TempestSrc$ java -jar JavaPathTracer.jar
<class> missing Usage: java Trace <options>
<class> <args> <options> are:
-classpath <path> This parameter sets up the classpath for the JVM running as a JDB process.
-traceoutputfile <filename> Outputs a complete execution trace to a given file. Warning: Files may be large.
-pathoutputfile <filename> This is the output file that is to be used for placing the xml output of the paths traversed.
-tracepointfile <filename> This option will set up the given tracepoint file, indicating which tracepoints are to be logged.
-showtracepointsetup This option will turn on the display of the tracepoints as they are set.
-periodicpathoutput <rate> This option will enable periodic output of the paths traversed. The rate is given in seconds.
-stampedperiodicfilename This option will enable periodic output of the paths traversed. The rate is given in seconds.
-maxloopdepth This option sets the maximum number of times a loop will be recorded when tracing. The default is 0
which results in infinite tracing through loops.
-help Print this help message
<class> is the program to trace <args> are the arguments to <class>
Figure 8-6: Java Tracer command line usage.
By supplying the appropriate tracepoint parameters on the command line, the tool
can be invoked to analyze a given program set. In the case of the example program,
this results in the command line log as is shown in Figure 8-7.
wws@WWS-Ubunto:~/GRC_Tempest/TempestSrc$ java -jar JavaPathTracer.jar -pathoutputfile demo_tracer.xml -tracepointfile demo_trace.txt
-showtracepointsetup -periodicpathoutput 60 -maxloopdepth 1 -stampedperiodicfilename gov.nasa.grc.ewt.server.Tempest 9000 noauth
nolog nodebug nopersist
Deferring breakpoint gov.nasa.grc.ewt.server.HTTPString:1007.
It will be set after the class is loaded.
Deferring breakpoint gov.nasa.grc.ewt.server.HTTPString:1011.
It will be set after the class is loaded.
Deferring breakpoint gov.nasa.grc.ewt.server.HTTPString:1015.
...
Deferring breakpoint gov.nasa.grc.ewt.server.HTTPString:996.
It will be set after the class is loaded.
main
Starting Tempest Java $Revision: 1.3.2.1.4.2 $ at Tuesday, 08 May 2007 11:24:35 EDT
thread 1 about to accept on port 9000
thread 4 about to accept on port 9000
thread 3 about to accept on port 9000
thread 6 about to accept on port 9000
thread 5 about to accept on port 9000Set deferred breakpoint gov.nasa.grc.ewt.server.HTTPString:996
Set deferred breakpoint gov.nasa.grc.ewt.server.HTTPString:993
Set deferred breakpoint gov.nasa.grc.ewt.server.HTTPString:989
...
Set deferred breakpoint gov.nasa.grc.ewt.server.HTTPString:1007
thread 1 accepted a connection from 192.168.0.152 192.168.0.152
thread 7 accepted a connection from 192.168.0.152 192.168.0.152
thread 8 accepted a connection from 192.168.0.151 192.168.0.151
thread 8 about to accept on port 9000
thread 1 accepted a connection from 192.168.0.151 192.168.0.151
thread 7 accepted a connection from 192.168.0.103 192.168.0.103
wws@WWS-Ubunto:~/GRC_Tempest/TempestSrc$
Figure 8-7: Java Tracer execution example.
In this particular instance, the most important output is the gathered XML trace information showing how many times each of the methods was executed and which paths through the method were taken. A short example of this is shown in Figure 8-8. This figure represents the execution traces obtained during approximately one minute (60046 ms, to be exact) of program execution. This textual trace representation can then be imported into the SOSART tool, as is shown in Figure 8-9. In addition to the number of executions for each path being shown numerically and by color, the relative execution frequency is shown graphically through thicker and thinner execution path traces. Those paths which are taken more often have a thicker execution trace line, and those which are executed less often have a thinner one.
<Program name="Test Program" >
<StartTime value="1178637874426"/>
<CurrentTime value="1178637934472"/>
<ClockTimeDuration value="60046"/>
<SourceFile filename="HTTPString.java" >
<Method filename="HTTPString.java" methodName="<init>" methodType="normal" >
<ExecutionPath method="<init>" path="<init>" pathcount="11" /></Method>
<Method filename="HTTPString.java" methodName="_getResponseMessage" methodType="normal" >
<ExecutionPath method="_getResponseMessage" path="_getResponseMessage 693 705 887" pathcount="11" />
<ExecutionPath method="_getResponseMessage" path="_getResponseMessage 693 790 887" pathcount="1" /></Method>
<Method filename="HTTPString.java" methodName="checkAuthorization" methodType="normal" >
<ExecutionPath method="checkAuthorization" path="checkAuthorization 569 611 634 637" pathcount="11" /></Method>
<Method filename="HTTPString.java" methodName="findMatch" methodType="normal" >
<ExecutionPath method="findMatch" path="findMatch 1109 1124" pathcount="1" /></Method>
<Method filename="HTTPString.java" methodName="getLocalFile" methodType="normal" >
<ExecutionPath method="getLocalFile" path="getLocalFile 1066 1072 1078" pathcount="10" /></Method>
<Method filename="HTTPString.java" methodName="process" methodType="normal" >
<ExecutionPath method="process" path="process
199 227 235 253 256 259 275 312 333 460 462 476
491 499 515 517 527 529 546 552 555 558 562" pathcount="3" />
<ExecutionPath method="process" path="process
199 227 235 253 256 259 275 312 333 460 462 491
499 515 517 527 533 546 552 555 558 562" pathcount="4" />
<ExecutionPath method="process" path="process
199 227 235 253 256 259 275 312 333 460 462 491
499 562" pathcount="1" />
<ExecutionPath method="process" path="process
199 227 235 253 256 259 275 312 333 470 488 499
515 517 527 533 546 552 555 558 562" pathcount="2" />
</Method>
<Method filename="HTTPString.java" methodName="processString" methodType="normal" >
<ExecutionPath method="processString" path="processString 1029 1037 1042 1045 1052 1057" pathcount="1" />
<ExecutionPath method="processString" path="processString 1029 1037 1042 1045 1057" pathcount="2" />
<ExecutionPath method="processString" path="processString 1029 1037 1042 1045 546 552 555 558 562"
pathcount="1" />
</Method>
<Method filename="HTTPString.java" methodName="replaceString" methodType="normal" >
<ExecutionPath method="replaceString" path="replaceString 1085 1096 1099 1057" pathcount="1" />
<ExecutionPath method="replaceString" path="replaceString 1085 1113 1124" pathcount="2" />
<ExecutionPath method="replaceString" path="replaceString 1085 1119 1124" pathcount="2" />
</Method>
<Method filename="HTTPString.java" methodName="sendContentHeaders" methodType="normal" >
<ExecutionPath method="sendContentHeaders" path="sendContentHeaders 942 949 952 960 964 970 973"
pathcount="11" /></Method>
<Method filename="HTTPString.java" methodName="sendExtraHeader" methodType="normal" >
<ExecutionPath method="sendExtraHeader" path="sendExtraHeader 982 989 996" pathcount="10" /></Method>
<Method filename="HTTPString.java" methodName="sendFirstHeaders" methodType="normal" >
<ExecutionPath method="sendFirstHeaders" path="sendFirstHeaders 893 896 901 909 916 923 932"
pathcount="11" /></Method>
<Method filename="HTTPString.java" methodName="sendResponse" methodType="normal" >
<ExecutionPath method="sendResponse" path="sendResponse 1007 1015" pathcount="11" /></Method>
<Method filename="HTTPString.java" methodName="setAddress" methodType="normal" >
<ExecutionPath method="setAddress" path="setAddress 187" pathcount="11" /></Method>
<Method filename="HTTPString.java" methodName="verifyClient" methodType="normal" >
<ExecutionPath method="verifyClient" path="verifyClient 644 675" pathcount="11" /></Method>
</SourceFile>
</Program>
Figure 8-8: XML file showing program execution for HTTPString.java class.
Figure 8-9: Execution trace within the SOSART tool. Note that each path through the code which has been executed is designated by a different color.
8.5.3 Importing Static Analysis Warnings
In order for SOSART to perform its intended purpose and act as a static analysis metadata tool, the tool must support the capability to import and analyze statically detectable faults. Because of the wide variety of static analysis tools on the market, and the many different manners in which they can be executed, the SOSART tool does not automatically invoke the static analysis tools. Instead, it is expected that the static analysis tools will be executed independently of the SOSART tool, through the code compilation process or by another external tool, prior to analyzing the source code with SOSART.
Table 8.3: A listing of SOSART supported static analysis tools.

Software Tool                        Domain      Responsible Party
ESC-Java                             Academic    Software Engineering with Applied Formal Methods Group, Department of Computer Science, University College Dublin
FindBugs                             Academic    University of Maryland
Fortify Source Code Analysis (SCA)   Commercial  Fortify Software
JiveLint                             Commercial  Sureshot Software
Klocwork K7                          Commercial  Klocwork
Lint4j                               Academic    jutils.com
JLint*                               Academic    Konstantin Knizhnik, Cyrille Artho
PMD                                  Academic    Available from SourceForge with BSD License
QAJ                                  Commercial  Programming Research Limited

*Modified to generate an XML output file.
Once the analysis tools have been run, importation of the faults begins by importing the source code that is to be analyzed into the SOSART tool, as has been described previously. Once this has been completed, the statically detectable faults detected by each of the executed static analysis tools are imported into the program. The static analysis tools supported by SOSART are given in Table 8.3.
As each statically detectable warning is imported, it is compared with the warn-
ings which have been previously imported into the SOSART tool and assigned a
taxonomical definition. If a warning has not been previously assigned a taxonomical
Figure 8-10: Taxonomy assignment panel.
Figure 8-11: A second taxonomy definition panel.
definition, a dialog box prompts the user to assign the fault to an appropriate definition. An example of this dialog box is shown in Figure 8-10. In this particular instance, the first instance of a PMD warning against a method having multiple return statements has been imported. Using the SOSART taxonomy, this fault can be classified as a General Logic Problem, as there is no specific categorization defined for this type of fault. In general, this fault exhibits a very low potential for immediate failure, though there is a maintenance risk associated with it, as methods which have multiple returns can be more difficult to maintain over time than methods which contain only a single return statement.
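
The hypothetical method below is representative of the code this rule flags. It behaves correctly, which is why the immediate failure risk is low, but the additional exit points complicate later maintenance.

// Hypothetical example of the kind of code PMD flags with its
// "multiple return statements" rule.  The logic is correct, so the
// immediate failure risk is low; the extra exit points simply make
// the method harder to modify safely later.
public class MultipleReturns {
    static String classifyStatus(int code) {
        if (code >= 500) {
            return "server error";
        }
        if (code >= 400) {
            return "client error";
        }
        if (code >= 300) {
            return "redirect";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(classifyStatus(404));   // prints "client error"
    }
}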
In the case of the fault shown in Figure 8-11, this represents a PMD fault where the programmer may have confused the Java equality operator (==) with the assignment operator (=). At a high level, this represents a Medium risk of failure; upon further review, certain cases may be found to carry a more significant risk. This fault is assigned to the taxonomical definition “Comparison and Assignment confused”, as this best represents the nature of the fault. A representative instance is sketched below.
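
The following hypothetical fragment shows the pattern in question. In Java the construct compiles only when the variable is boolean; when it does compile, the condition silently assigns rather than compares.

// Hypothetical fragment illustrating the "comparison and assignment
// confused" fault class: the programmer probably intended a comparison
// (==) but wrote an assignment (=).
public class AssignmentVsEquality {
    public static void main(String[] args) {
        boolean authorized = false;

        // Intended: if (authorized == true).  The single '=' assigns true
        // to the flag, so the condition always succeeds and the guarded
        // branch runs even for unauthorized callers.
        if (authorized = true) {
            System.out.println("access granted");   // always printed
        }
    }
}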
Once all faults have been imported and assigned to the appropriate taxonomical
definitions, the Activity Diagrams / Control flow diagrams for each method are up-
dated to include the statically detectable faults. This results in a display similar to
that shown in Figure 8-12. This display shows the respective anticipated reliability
for each block as a color code. Red indicates the most significant risk of failure,
while green indicates that the code segment is relatively safe from the risk of failure.
Orange and yellow reflect intermediate risks in between green and red, with yellow
being slightly more risky than green and orange being slightly less risky than red. In addition to marking a fault as valid or invalid, the fault verification panel described below can also be used to modify the immediate failure risk, the maintenance risk, whether the fault failed during testing, and the fault-related reliability values which were set by the taxonomy when the fault was imported.
Each code segment block and listing of static analysis warnings can have a two
color gradient present. One color of the gradient represents the typical risk associated
with the given code block. The second color represents the maximum risk associated
with that code block. For example, a code block which contains both green and
orange colors indicates that the code block typically has very little risk associated
with it, but there is the potential that under certain circumstances, a significantly
high amount of risk exists. These colors are driven by the Bayesian Belief Network
described previously in Chapter 6.
By clicking on the static analysis listing, the software engineer can open a panel
which can then be used to mark a fault as either “Valid”, “Invalid”, or “Unverified”,
as is shown in Figure 8-13. Faults which are deemed to be invalid will be displayed with a grey background the next time the panel is opened. Faults which are valid
but have no risk will have a green background, and faults which are valid but have
a higher risk associated with them will be displayed with an appropriate background
color of either green, yellow, orange, or red, depending upon their inherent risk.
In addition to invoking this display panel from the method activity diagram /
program data flow diagrams, this same display panel can be invoked from the report
menu option. However, when invoking this display from the report menu, there are a few differences. The report menu option has two potential selection values, namely the class level faults option and the all file faults option. Whereas clicking on the code
segment on the activity diagram only shows the faults which are related to the given
code block, selecting the item from the menu displays either those faults which are
detectable at the class level (and thus, are not assigned to a given method) or all
statically detectable faults within the file. In either case, the behavior of the panel is the same; only the set of faults displayed differs.
The report menu also allows the user to view report data about the project and
the distribution of statically detectable faults. For example, the report shown in
Figure 8-14 provides a complete report of the statically detectable faults which are
present within the HTTPString.java file of this project. The first three columns deal
with warning counts. The first column shows the number of faults of each type which
have been detected in the overall file. The second column indicates the number of
faults which have been deemed to be valid upon inspection of the fault, and the third
column indicates the number of faults which have been deemed to be invalid based
upon project inspection. The percent valid column indicates the percentage of faults of each type which have been determined to be valid upon inspection. The number of statements column counts the executable statements found within this source code module. The last two columns calculate the density of the warnings detected and of those found valid, relative to the number of statements.
This report can be generated for three different sets of data. The first set, de-
scribed previously, generates this report for the currently opened file. This same
report can also be generated at the project scope which encompasses all files that
are being analyzed. Finally, this report can be generated based upon historical data,
which encompasses all previously analyzed projects.
8.5.4 Historical Database
In order to allow the appropriate data retention and historical profiling so that the
SOSART tool improves over time, SOSART includes a built-in historical database
system. When used properly, the database system allows the user to store past
information regarding the validity of previously detected faults over multiple projects.
Furthermore, to allow project families to be created, the database system can be
segmented to allow different database sets to be used based on the project. Even
though the system is referred to as a database, the current SOSART implementation
does not actually require the usage of a separately installed database, such as MySQL.
The analyze menu contains many of the parameters necessary to use the historical
database capabilities to analyze a given project, as is shown in Figure 8-15. From this
menu, the user can load and save the historical database to a given file. This allows
the user to control which historical database is used for assigning validity values to the
statically detectable faults. This capability also allows separate historical databases to
be kept, preventing violations of Non-Disclosure agreements when analyzing software
projects.
As a user is analyzing a module with SOSART, the static analysis warnings
that are manipulated are not automatically transferred into the historical database.
Instead, the user must explicitly force the analyzed warnings to transfer into the
database. This is done for two reasons. First, it prevents the database from being contaminated by erroneous entries made while learning the tool. Second, and most importantly, it allows the user to delay the transfer into the database until a project has been fully completed. Large projects may require more than one execution of the SOSART tool in order to fully complete the analysis, and it
is best for the transfer of warnings into the database to be delayed until the project
is completed. This is a commonly used paradigm for metrics collection tools within
software engineering.
The analyze menu also has the capability to allow the user to clear the database
of all previously analyzed faults. In general, it is expected that this capability would
rarely be used, but it is supported to allow the database to be reset if new program
families are analyzed or there is some other desire to reset the historical data back to
its initial values.
The Program Configuration Panel, shown in Figure 8-16, also allows configuration
related to the historical database to be performed. The SOSART program, based
upon its default configuration, will automatically load a given database upon program
start. This is a basic feature of the program. However, it is possible to change which
database is loaded based upon the user's preferences. It is also possible to configure
whether or not the database is automatically saved upon program exit. Under most
circumstances, it is desired for all changes to the historical database to be saved when
the program is exited. However, there may be certain circumstances where this is not
the appropriate action to take based upon the analysis being performed.
It is also possible to configure the tool to automatically transfer all analyzed
faults into the historical database upon program exit. While this does prevent user errors in transferring the data to the historical database, it can also lead to corruption of the database if a person is analyzing a
project that is not intended to be kept for future reference.
The Randomize Database on Load feature allows the user to randomize the pri-
mary key used within the database when a given database is loaded. By design, the
key used to uniquely identify a statically detectable fault includes the file name, the
line number, the static analysis tool which detected the fault, and the fault which
was detected. If the database is randomized, an additional random value is appended
into the key definition. This allows for multiple instances of the same warning in
subsequent revisions of a given source code module to be considered unique. Ran-
domization also, to some extent, obfuscates the data, which may be important based
upon conditions of an NDA.
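
As a minimal sketch (hypothetical code, not SOSART's actual implementation), a key of the kind described above could be constructed as follows, with the random component appended only when randomization is enabled.

import java.util.UUID;

// Hypothetical sketch of a fault key built from the fields the text
// describes: file name, line number, reporting tool, and fault identifier,
// with an optional random suffix appended when "Randomize Database on
// Load" is enabled.  The rule name used below is purely illustrative.
public final class FaultKey {
    static String build(String file, int line, String tool, String fault, boolean randomize) {
        String key = file + ":" + line + ":" + tool + ":" + fault;
        return randomize ? key + ":" + UUID.randomUUID() : key;
    }

    public static void main(String[] args) {
        System.out.println(build("HTTPString.java", 199, "PMD", "MultipleReturns", false));
        System.out.println(build("HTTPString.java", 199, "PMD", "MultipleReturns", true));
    }
}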
8.5.5 Exporting Graphics
In order to allow the graphics which are generated by SOSART to be imported
into external reports and other documents, SOSART supports the exportation of
program diagrams into graphics files. In order to avoid issues with royalties and
patent infringement, the Portable Network Graphic Format (PNG) was selected as
the only exportation format natively included with SOSART. The PNG format is a
raster image format intended to serve as a replacement for the GIF format. As such,
it is an Open Extensible Image Format with Lossless Compression. Another reason
for selecting the PNG format is that it was readily supported by the JGraph utilities
used for Graph manipulation.
Graphics exportation is only available if the currently selected tab on the display
is a Method activity diagram. The summary panels can not be exported, and the
code listing can not be exported in this manner. The resolution of the exportation
is effected by the image zoom as well. A larger zoom factor will result in a large
image, while a smaller zoom factor will result in a smaller image with less resolution.
Complex graphics may result in very large file sizes when exported.
8.5.6 Printing
In order to support the creation of hardcopy printouts of the analysis conducted
with SOSART, a standard print capability has been integrated with the tool. Printing is accessed from the file menu and opens a standard series of GUI print dialogs. Options
to be selected by the user include scaling features, which allow the graphs to be
printed to a normal size, to fit a given page size, to a specific scale size, or to fit a
certain number of pages.
Other print configuration parameters include the capability to print either the
current graph or all loaded graphs. If the current graph is selected, only the currently
viewed method graph, summary panel, or source code panel will be printed. If all
loaded graphs is selected, then the summary panel, source code panel, and all graphs
associated with the current method will be printed.
8.5.7 Project Saving
In order to facilitate the analysis of larger Java projects which cannot be readily
analyzed in one sitting, as well as to protect the person doing the analysis from
random machine failure and retain results for future consultation, the SOSART tool
offers several mechanisms that can be used to load and save projects.
As with most analysis tools, SOSART offers the user the capability to create a new
project, save the current project, or load a previously saved project. When creating
a new project, the currently open project will first be closed before a new project is
created. The new project will not have any imported Java files, static warnings, or
execution traces loaded.
Saving a project will store all details related to the project, including loaded
source code modules, method activity diagrams, imported static analysis faults, and
execution traces. All data files are stored in XML using the JavaBeans XML persistence mechanism.
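
The JavaBeans XML persistence mechanism is provided by the java.beans.XMLEncoder and XMLDecoder classes. The following minimal sketch shows the mechanism in isolation; the ProjectSettings bean is a hypothetical stand-in for SOSART's actual project objects.

import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Minimal sketch of JavaBeans XML persistence (java.beans.XMLEncoder /
// XMLDecoder).  "ProjectSettings" is a hypothetical bean, not one of
// SOSART's real classes.
public class XmlPersistenceDemo {
    public static class ProjectSettings {            // must be a public bean
        private String projectName = "";
        public String getProjectName() { return projectName; }
        public void setProjectName(String name) { this.projectName = name; }
    }

    public static void main(String[] args) throws IOException {
        ProjectSettings settings = new ProjectSettings();
        settings.setProjectName("Tempest analysis");

        // Serialize the bean's properties to an XML file.
        try (XMLEncoder out = new XMLEncoder(
                new BufferedOutputStream(new FileOutputStream("project.xml")))) {
            out.writeObject(settings);
        }

        // Reload the bean from the XML file.
        try (XMLDecoder in = new XMLDecoder(
                new BufferedInputStream(new FileInputStream("project.xml")))) {
            ProjectSettings loaded = (ProjectSettings) in.readObject();
            System.out.println(loaded.getProjectName());
        }
    }
}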
Because of the large size of XML files created when saving an entire project, a
secondary method has been created to store projects. With this method, only the
static analysis warnings and their modified validity values and risks are stored for
future re-importation. This mechanism offers an extremely large savings in terms of
file size and also results in a tremendous performance improvement when loading a
large project. Using this mechanism, a user works on a project, assessing the risks associated with it. When the time comes to save the project, only the static
analysis warnings are saved. To again work on the project, the user must re-import
the Java files and program execution traces before reloading the statically detectable
faults which were previously saved.
8.5.8 GUI Capabilities
The SOSART GUI interface supports all common graphical behaviors related to
image size, including cascading of images, tiling of images, and zooming.
Zooming of images is principally used when viewing method activity diagrams.
There are three principal mechanisms for performing zoom operations. The “Zoom
In” and “Zoom Out” features zoom the graphic in or out by a factor of two, depending
on which menu item is selected. The zoom dialog box allows the user to zoom to one
of eight pre-configured zoom values, namely 200%, 175%, 150%, 125%, 100%, 75%,
50%, or 25%, as well as offering a drag bar which can set the zoom value to any
intermediate zoom factor.
Because of the possibility that there may be multiple Java files imported into one
analysis project, the SOSART tool includes the capability to tile horizontally and
vertically, as well as cascade the opened files. These behaviors follow standard GUI
practices.
8.5.9 Reliability Report
The key functionality required by the SOSART tool is the ability to estimate the
reliability of a given program in a manner that can be used by a typical practicing
engineer. This is accomplished via the Reliability Report Panel, an example of which
is shown in Figure 8-17.
The reliability report panel provides the user with the appropriate details relative
to the given reliability of the loaded modules and execution traces. The display itself
is also color coded, with green indicating very good reliability values, and yellow,
orange, and red indicating lesser reliability values. In the particular example provided
in Figure 8-17, the net anticipated reliability is 0.999691.
In reviewing the report further, however, it can be seen that this high reliability is achieved in part because the modules themselves very rarely execute. To facilitate these detailed reviews, the
reliability report can be exported to a text file for external review and processing.
Figure 8-18 represents a portion of the complete textual report detailing the reliability
for the report shown in Figure 8-17.
Figure 8-12: Imported Static Analysis Warnings.
Figure 8-13: Static Analysis Verification panel.
Figure 8-14: Static Analysis fault report.
Figure 8-15: Analyze menu of SOSART tool.
Figure 8-16: SOSART Program Configuration Panel.
Figure 8-17: SOSART Reliability Report Panel.
Method: setAddress
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: setBuffer
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: process
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: checkAuthorization
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: verifyClient
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: _getResponseMessage
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: sendFirstHeaders
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: sendContentHeaders
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: sendExtraHeader
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: sendResponse
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: processString
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: getLocalFile
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Method: replaceString
Posterior marginal for RawReliability: 1.000 0.000 0.000 0.000 0.000 0.000
Posterior marginal for CalibratedReliability: 1.000 0.000 0.000 0.000 0.000 0.000
##########################################################################################
Method Reliability Values
setAddress Posterior marginal for CoverageA: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
setAddress Posterior marginal for ReliabilityA: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
setBuffer Posterior marginal for CoverageB: Very Often 0.000 Often 0.000 Normal 0.000
Rarely 0.000 Very Rarely 1.000 Never 0.000
setBuffer Posterior marginal for ReliabilityB: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
process Posterior marginal for CoverageA: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
process Posterior marginal for ReliabilityA: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
checkAuthorization Posterior marginal for CoverageB: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
checkAuthorization Posterior marginal for ReliabilityB: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
verifyClient Posterior marginal for CoverageA: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
verifyClient Posterior marginal for ReliabilityA: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
_getResponseMessage Posterior marginal for CoverageB: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
_getResponseMessage Posterior marginal for ReliabilityB: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
sendFirstHeaders Posterior marginal for CoverageA: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
sendFirstHeaders Posterior marginal for ReliabilityA: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
sendContentHeaders Posterior marginal for CoverageB: Very Often 0.000 Often 0.000 Normal 1.000
Rarely 0.000 Very Rarely 0.000 Never 0.000
sendContentHeaders Posterior marginal for ReliabilityB: Perfect 1.000 Very High 0.000 High 0.000
Medium 0.000 Low 0.000 Very Low 0.000
.
.
.
##########################################################################################
##########################################################################################
Final Results...
setAddress:setBuffer:process:checkAuthorization:verifyClient:_getResponseMessage:sendFirstHeaders:sendContentHeaders
Posterior marginal for CoverageA: Very Often 0.000 Often 0.059 Normal 0.941 Rarely 0.000 Very Rarely 0.000 Never 0.000
sendExtraHeader:sendResponse:processString:getLocalFile:replaceString:verifyClient:_getResponseMessage:sendFirstHeaders:...
Posterior marginal for CoverageB: Very Often 0.000 Often 0.030 Normal 0.970 Rarely 0.000 Very Rarely 0.000 Never 0.000
setAddress:setBuffer:process:checkAuthorization:verifyClient:_getResponseMessage:sendFirstHeaders:sendContentHeaders
Posterior marginal for ReliabilityA: Perfect 0.047 very High 0.421 High 0.457 Medium 0.073 Low 0.002 Very Low 0.000
setAddress:setBuffer:process:checkAuthorization:verifyClient:_getResponseMessage:sendFirstHeaders:sendContentHeaders:...
Posterior marginal for NetCoverage: Very Often 0.000 Often 0.096 Normal 0.904 Rarely 0.000 Very Rarely 0.000 Never 0.000
setAddress:setBuffer:process:checkAuthorization:verifyClient:_getResponseMessage:sendFirstHeaders:sendContentHeaders:...
Posterior marginal for NetReliability: Perfect 0.015 very High 0.230 High 0.529 Medium 0.202 Low 0.022 Very Low 0.001
Net Anticipated Reliability: 0.999691
##########################################################################################
Figure 8-18: Textual Export of Reliability report for analyzed source.
Chapter 9
Model Validation using Open
Source Software
9.1 Introduction
In order to provide the first set of experimental validations for the SoSART model, a series of experiments was conducted in which the reliability of an open source software package was calculated using the SoSART model and then compared with a reliability analysis using the STREW [Nag05] metrics. This series of experiments covered three open source programs: the RealEstate academic game developed by North Carolina State University and used for software engineering education, the open source JSUnit program available from SourceForge, and the Jester program, also available from SourceForge.
Programs were selected with several criteria in mind. First, due to the complexities of using the SoSART analysis tool, smaller projects needed to be analyzed, as the tool suffered from technical difficulties when larger applications were analyzed. While the intent of the model was to be applicable up to approximately 20 KLOC, significant tool problems developed as programs larger than 5 KLOC were analyzed. Second, in order to apply the STREW model, a set of JUnit test scripts was required, since the STREW metrics correlate expected software reliability with implementation and testing metrics. Third, and finally, access to a set of pseudo-requirements or other program documentation was required in order to estimate the number of program requirements and, from that, the resulting software reliability.
9.2 STREW Metrics and GERT
The STREW metrics suite is a set of development metrics defined by Nagappan
et al. [Nag05] [NWVO04] [NWV03] which have been shown to be effective at esti-
mating the software reliability from metric relationships. The suite combines a number of measurable parameters, including:
1. Number of Test Cases
2. The number of Source Lines of Code (SLOC)
3. The Number of Test Lines of Code (TLOC)
4. The Number of Requirements
5. The Number of Test Assertions
6. The Number of Source Classes
7. The Number of Conditionals
8. The Number of Test Classes
Based on these metrics, the reliability of software can be estimated using the
equation
Reliability = C0 + C1 · R1 + C2 · R2 − C3 · R3 + C4 · R4 (9.1)
where
R1 = Number of Test Cases / SLOC
R2 = Number of Test Cases / Number of Requirements
R3 = Test Lines of Code / SLOC
R4 = Number of Assertions / SLOC
and
C0, C1, C2, C3, C4 represent calibration constants.
The confidence interval, CI is calculated using the relationship
CI = Zα/2 · √( R(1 − R) / n )        (9.2)
where
R represents the calculated reliability
Zα/2 represents the upper α/2 quantile of the standard normal distribution for the desired confidence interval, and
n is the number of test cases provided by the project[DZN+04].
The STREW metrics are supported by the GERT [DZN+04] toolkit.
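
As a worked sketch of Equations 9.1 and 9.2, the following fragment computes the STREW reliability estimate and its confidence interval from raw metric counts. The metric values and the calibration constants C0 through C4 are hypothetical placeholders; in practice the constants are obtained by regression over historical project data, and Zα/2 = 1.96 corresponds to a 95% confidence level.

// Sketch of the STREW reliability estimate (Eq. 9.1) and its confidence
// interval (Eq. 9.2).  All metric counts and the calibration constants
// below are hypothetical placeholders.
public class StrewEstimate {
    public static void main(String[] args) {
        // Example metric counts (hypothetical project)
        double testCases = 220, sloc = 1250, tloc = 900,
               requirements = 30, assertions = 600;

        double r1 = testCases / sloc;
        double r2 = testCases / requirements;
        double r3 = tloc / sloc;
        double r4 = assertions / sloc;

        // Hypothetical calibration constants C0..C4
        double c0 = 0.90, c1 = 0.20, c2 = 0.001, c3 = 0.05, c4 = 0.10;
        double reliability = c0 + c1 * r1 + c2 * r2 - c3 * r3 + c4 * r4;

        // 95% confidence interval: Z(alpha/2) = 1.96, n = number of test cases
        double z = 1.96;
        double ci = z * Math.sqrt(reliability * (1.0 - reliability) / testCases);

        System.out.printf("R = %.4f, CI = +/- %.4f%n", reliability, ci);
    }
}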
9.3 Real Estate Program Analysis
The Real Estate program is an example program developed as a part of the open
seminar project at NCSU. The program, developed by a series of graduate students, documents the entire software development process for a simple game built using an agile development approach. A complete suite of unit tests was constructed using JUnit, and
these test cases are readily available with the source code. The Real Estate program
was developed in the Java language and has the overview metrics provided in Tables 9.1 and 9.2.
Table 9.1: Real Estate Overview Metrics
Metric Description                            Value
Number of Packages 2
Total Lines of Code (LOC) 1250
Number of Static Methods (NSM) 14
Number of Classes 40
Number of Attributes (NOF) 80
Number of Overridden Methods (NORM) 6
Number of Static Attributes (NSF) 25
Number of Methods (NOM) 236
Number of Defined Interfaces 4
Table 9.2: RealEstate Class Metrics
Name NSM NOF NORM NSF NOM LOC DIT*
Card.java 0 0 0 2 3 1 4
CardCell.java 0 1 0 0 3 1 4
Cell.java 0 3 1 0 9 1 4
Die.java 0 0 0 0 1 2 4
FreeParkingCell.java 0 0 0 0 2 4 3
GameBoard.java 0 4 0 0 13 4 3
GameBoardFull.java 0 0 0 0 1 7 3
GameMaster.java 1 8 0 2 41 11 2
GoCell.java 0 0 1 0 3 13 2
GoToJailCell.java 0 0 0 0 2 14 2
JailCard.java 0 1 0 0 4 18 2
JailCell.java 0 0 0 1 2 18 2
MoneyCard.java 0 3 0 0 4 23 1
MovePlayerCard.java 0 2 0 0 4 25 1
Player.java 0 9 1 0 34 28 1
PropertyCell.java 0 5 1 0 11 38 6
RailRoadCell.java 2 0 1 3 3 46 5
TradeDeal.java 0 3 0 0 7 75 1
UtilityCell.java 1 0 1 2 3 169 1
BuyHouseDialog.java 0 3 0 1 7 1 4
CCCellInfoFormatter.java 0 0 1 0 1 1 4
CellInfoFormatterTest.java 0 0 0 0 2 1 4
ChanceCellInfoFormatter.java 0 0 0 1 1 1 4
FreeParkingCellInfoFormatter.java 0 0 0 1 1 4 3
GameBoardUtil.java 5 0 0 0 0 10 3
GoCellInfoFormatter.java 0 0 0 1 1 14 2
GotoJailCellInfoFormatter.java 0 0 0 1 1 14 2
GUICell.java 0 3 0 1 7 16 2
GUIRespondDialog.java 0 2 0 1 3 16 2
GUITradeDialog.java 0 6 0 1 4 17 2
InfoFormatter.java 2 0 0 1 0 17 2
InfoPanel.java 0 0 0 1 1 18 2
JailCellInfoFormatter.java 0 0 0 1 1 19 2
Main.java 2 0 0 0 0 19 2
MainWindow.java 0 6 0 1 32 21 2
PlayerPanel.java 0 13 0 1 18 34 1
PropertyCellInfoFormatter.java 0 0 0 0 1 39 6
RRCellInfoFormatter.java 0 0 0 0 1 50 5
TestDiceRollDialog.java 0 4 0 1 2 59 1
UtilCellInfoFormatter.java 0 0 0 0 1 92 1
UtilDiceRoll.java 1 4 0 1 3 114 1
*Depth of Inheritance Tree
While the RealEstate program represents a slightly different domain than the
intended domain for this software model, it does provide a readily available, convenient package which can be used as a proof-of-concept application for the tool and model.
In order to provide a baseline reliability estimate for the RealEstate program, the
GERT analysis tool [DZN+04] [Nag05] was invoked from within the Eclipse environ-
ment. While this tool was intended to directly calculate the reliability of the software
given the input parameters, compatibility issues between the tool and available Eclipse
platforms and Java Run Time Environments limited application of the tool to data
collection, and the reliability was calculated externally. From the STREW metrics,
the reliability for the RealEstate program was calculated to range between 0.9185
and 1.000, with an expected reliability of 0.9712, as is shown in Table 9.3.
Table 9.3: Real Estate STREW Metrics Reliability Parameters
Parameter                        Value
Estimated Reliability 0.9712
Confidence Interval 0.0526
Maximum Reliability 1
Minimum Reliability 0.9185
After the assessment using the GERT tool was completed, the source code was
statically analyzed using seven independent static analysis tools which were supported
by the SOSART tool.1 This resulted in the detection of 889 statically detectable faults, 214 of which were deemed to be valid faults upon review with the SOSART tool,
as is shown in Table 9.4.
Based on the imported static analysis faults, an estimated reliability value was
calculated using the model and the assumption that all methods will be executed at a uniform normal rate. This resulted in the first reliability estimate of
0.8945.
Once this reliability was obtained, the program was executed and execution traces
were captured with the SOSART tool, providing accurate data on the branch coverage
under the tested use case. This data was then imported into the SOSART tool and the
reliability was re-calculated (assuming a medium confidence in the testing performed)
and found to be 0.9753. This reliability jump can be attributed to the fact that the methods which were the most unreliable within the system were rarely (if ever) called, and the methods which were most often traversed are the most reliable within the
1While previous research included 10 static analysis tools, licensing issues only allowed 8 tools to be used for this portion of experimentation.
Table 9.4: RealEstate Static Analysis Findings
Filename  Total  SA 1  SA 2  SA 3  SA 4  SA 5  SA 6  SA 7  SA 8
(each entry gives Valid / Invalid counts)
BuyHouseDialog.java  12 / 19  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  12 / 19  0 / 0  0 / 0
Card.java  4 / 1  0 / 0  0 / 0  0 / 0  2 / 0  0 / 0  2 / 1  0 / 0  0 / 0
CardCell.java  3 / 3  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  3 / 3  0 / 0  0 / 0
Cell.java  5 / 4  0 / 0  0 / 1  0 / 0  0 / 0  0 / 0  5 / 3  0 / 0  0 / 0
CellInfoFormatter.java  1 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  1 / 0  0 / 0  0 / 0
ChanceCellInfoFormatter.java  0 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 2  0 / 0  0 / 0
Die.java  2 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  2 / 2  0 / 0  0 / 0
FreeParkingCell.java  1 / 1  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  1 / 1  0 / 0  0 / 0
FreeParkingCellInfoFormatter.java  0 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 2  0 / 0  0 / 0
GameBoard.java  8 / 44  0 / 0  4 / 28  0 / 0  0 / 0  0 / 0  4 / 16  0 / 0  0 / 0
GameBoardFull.java  0 / 41  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 41  0 / 0  0 / 0
GameBoardUtil.java  2 / 18  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  2 / 18  0 / 0  0 / 0
GameMaster.java  13 / 127  1 / 0  3 / 82  2 / 1  0 / 0  1 / 0  6 / 44  0 / 0  0 / 0
GoCell.java  2 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  2 / 2  0 / 0  0 / 0
GoCellInfoFormatter.java  0 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 2  0 / 0  0 / 0
GotoJail.java  1 / 4  0 / 0  0 / 2  0 / 0  0 / 0  0 / 0  1 / 2  0 / 0  0 / 0
GotoJailCellInfoFormatter.java  0 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 2  0 / 0  0 / 0
GUICell.java  8 / 12  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  8 / 12  0 / 0  0 / 0
GUIRespondDialog.java  8 / 6  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  8 / 6  0 / 0  0 / 0
GUITradeDialog.java  15 / 22  0 / 0  0 / 0  1 / 0  0 / 0  0 / 0  14 / 22  0 / 0  0 / 0
InfoFormatter.java  4 / 9  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  4 / 9  0 / 0  0 / 0
InfoPanel.java  1 / 3  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  1 / 3  0 / 0  0 / 0
JailCard.java  2 / 6  0 / 0  0 / 2  0 / 0  0 / 0  0 / 0  2 / 4  0 / 0  0 / 0
JailCell.java  3 / 2  0 / 0  0 / 0  0 / 0  1 / 0  0 / 0  2 / 2  0 / 0  0 / 0
JailCellFormatter.java  0 / 2  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 2  0 / 0  0 / 0
Main.java  1 / 32  0 / 0  0 / 0  0 / 0  0 / 0  0 / 6  1 / 18  0 / 8  0 / 0
mainWindow.java  17 / 37  0 / 0  0 / 0  1 / 0  0 / 0  1 / 0  13 / 37  2 / 0  0 / 0
MoneyCard.java  1 / 7  0 / 0  0 / 2  0 / 0  0 / 0  0 / 0  1 / 5  0 / 0  0 / 0
MovePlayerCard.java  4 / 17  0 / 0  0 / 10  0 / 0  0 / 0  0 / 0  4 / 7  0 / 0  0 / 0
Player.java  23 / 111  1 / 0  9 / 62  0 / 0  0 / 0  0 / 0  13 / 49  0 / 0  0 / 0
PlayerPanel.java  22 / 19  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  22 / 19  0 / 0  0 / 0
PropertyCell.java  5 / 16  0 / 0  1 / 4  0 / 0  0 / 0  0 / 0  4 / 12  0 / 0  0 / 0
PropertyCellInfoFormatter.java  0 / 13  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 13  0 / 0  0 / 0
RailRoadCell.java  7 / 7  0 / 0  0 / 3  0 / 0  1 / 0  0 / 0  6 / 4  0 / 0  0 / 0
RealEstateGUI.java  28 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  28 / 0  0 / 0  0 / 0
RespondDialog.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
RRCellInfoFormatter.java  0 / 11  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 11  0 / 0  0 / 0
TestDiceRollDialog.java  1 / 27  0 / 0  0 / 2  0 / 0  0 / 1  0 / 0  1 / 24  0 / 0  0 / 0
TradeDeal.java  0 / 11  0 / 0  0 / 3  0 / 0  0 / 0  0 / 0  0 / 8  0 / 0  0 / 0
TradeDialog.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
UtilCellInfoFormatter.java  0 / 11  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 11  0 / 0  0 / 0
UtilDiceRoll.java  3 / 11  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  3 / 11  0 / 0  0 / 0
UtilityCell.java  7 / 9  0 / 0  1 / 4  0 / 0  0 / 0  0 / 0  6 / 5  0 / 0  0 / 0
Total  214 / 675  2 / 0  18 / 205  4 / 1  4 / 1  2 / 6  182 / 454  2 / 8  0 / 0
system.
These results, which are within the confidence interval calculated from the STREW metrics, provide a preliminary proof of concept for the validity of using
statically detectable faults and the defined Bayesian Belief Network for assessing the
reliability of software.
9.4 JSUnit Program Analysis
JSUnit is an open-source unit testing framework which allows the testing of client-side JavaScript programs. Development began in 2001, and currently there are more than 275 subscribed members and over 10,000 downloads. The tool is developed in Java and is available from SourceForge. Code metrics for JSUnit are provided in Table
9.5 and Table 9.6.
Table 9.5: JSUnit Overview Metrics
Metric Description                            Value
Number of Packages 2
Total Lines of Code 582
Number of Static Methods 19
Number of Classes 19
Number of Attributes 36
Number of Overridden Methods 3
Number of Static Attributes 32
Number of Methods 139
Number of Defined Interfaces 0
Table 9.6: JSUnit Class Metrics
Name NSM NOF NORM NSF NOM LOC DIT*
AllTests.java 2 0 0 0 0 23 1
ArgumentsConfiguration.java 0 5 0 0 7 19 2
ArgumentsConfigurationTest.java 0 0 0 0 3 21 3
Configuration.java 1 0 0 7 13 61 1
ConfigurationException.java 0 2 0 0 3 5 3
ConfigurationTest.java 0 0 0 0 4 9 3
DistributedTest.java 0 1 0 1 3 36 3
DistributedTestTest.java 0 1 0 0 5 19 3
DummyHttpRequest.java 0 1 0 0 51 54 1
EndToEndTestSuite.java 1 0 0 0 0 4 3
EnvironmentVariablesConfiguration.java 0 0 0 0 6 6 2
EnvironmentVariablesConfigurationTest.java 0 1 0 0 4 20 3
JsUnitServer.java 1 7 2 0 24 86 2
JsUnitServlet.java 1 0 0 1 0 1 3
PropertiesConfigurationTest.java 0 0 0 0 4 23 3
PropertiesFileConfiguration.java 0 2 1 1 8 15 2
ResultAcceptorServlet.java 0 0 1 0 1 8 4
ResultAcceptorTest.java 0 2 0 0 10 50 3
ResultDisplayerServlet.java 0 0 1 0 1 15 4
StandaloneTest.java 0 3 0 1 8 48 3
StandaloneTestTest.java 0 0 0 0 3 7 4
Suite.java 1 0 0 0 0 9 3
TestCaseResult.java 2 4 0 3 13 28 1
TestCaseResultBuilder.java 0 0 0 0 3 16 1
TestCaseResultTest.java 0 0 0 0 6 40 3
TestCaseResultWriter.java 0 1 0 6 5 33 1
TestRunnerServlet.java 0 0 1 0 3 13 4
TestSuiteResult.java 4 9 0 0 27 79 1
TestSuiteResultBuilder.java 0 1 0 0 6 37 1
TestSuiteResultTest.java 0 2 0 0 8 33 3
TestSuiteResultWriter.java 0 1 0 12 8 42 1
Utility.java 10 0 0 1 0 34 1
*Depth of Inheritance Tree
Following the procedure applied previously, the reliability of the JSUnit program
was assessed using the STREW metrics, resulting in the data shown in Table 9.7.
Reliability was estimated to range between 0.6478 and 0.9596, with a typical reliability
value of 0.8037.
Table 9.7: JSUnit STREW Metrics Reliability Parameters
Parameter                        Value
Estimated Reliability 0.8037
Confidence Interval 0.1559
Maximum Reliability 0.9596
Minimum Reliability 0.6478
Using the same 8 static analysis tools, the source code was analyzed for statically
detectable faults, as is shown in Table 9.8. A total of 480 statically detectable faults
were discovered, 220 of which were deemed to be valid and of varying risks.
Table 9.8: JSUnit Static Analysis Findings
Filename  Total  SA 1  SA 2  SA 3  SA 4  SA 5  SA 6  SA 7  SA 8
(each entry gives Valid / Invalid counts)
TestRunnerServlet.java  1 / 1  1 / 0  0 / 0  0 / 0  0 / 0  0 / 1  0 / 0  0 / 0  0 / 0
ResultDisplayServlet.java  2 / 2  0 / 0  0 / 0  2 / 0  0 / 0  0 / 2  0 / 0  0 / 0  0 / 0
ResultAcceptorServer.java  0 / 1  0 / 0  0 / 0  0 / 0  0 / 0  0 / 1  0 / 0  0 / 0  0 / 0
JSUnitServlet.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
Utility.java  21 / 16  0 / 0  4 / 0  3 / 0  0 / 0  4 / 0  8 / 16  0 / 0  2 / 0
ArgumentsConfiguration.java  18 / 14  0 / 0  0 / 0  5 / 0  0 / 0  0 / 5  13 / 9  0 / 0  0 / 0
Configuration.java  33 / 17  0 / 0  0 / 0  6 / 0  0 / 0  0 / 5  17 / 10  0 / 2  10 / 0
DistributedTest.java  10 / 28  0 / 0  0 / 0  2 / 0  0 / 0  2 / 5  2 / 23  0 / 0  4 / 0
EnvironmentVariablesConfiguration.java  0 / 1  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 1  0 / 0  0 / 0
TestSuiteResultBuilder.java  12 / 18  0 / 0  0 / 0  4 / 0  0 / 0  1 / 3  5 / 15  0 / 0  2 / 0
PropertiesFileConfiguration.java  6 / 3  0 / 0  0 / 0  0 / 0  0 / 0  1 / 0  3 / 3  0 / 0  2 / 0
TestCaseResultsBuilder.java  5 / 6  0 / 0  0 / 0  2 / 0  0 / 0  0 / 0  3 / 6  0 / 0  0 / 0
StandaloneTest.java  22 / 21  0 / 0  0 / 0  3 / 0  0 / 0  1 / 1  16 / 20  0 / 0  2 / 0
TestCaseResultWriter.java  13 / 19  0 / 0  0 / 0  2 / 0  0 / 0  0 / 6  11 / 13  0 / 0  0 / 0
TestSuiteResult.java  31 / 30  5 / 1  0 / 0  9 / 0  0 / 0  0 / 0  17 / 29  0 / 0  0 / 0
ConfigurationExample.java  1 / 1  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  1 / 1  0 / 0  0 / 0
TestCaseResult.java  13 / 6  0 / 0  0 / 0  2 / 0  0 / 0  0 / 0  11 / 6  0 / 0  0 / 0
TestSuiteResultsWriter.java  6 / 30  0 / 0  0 / 0  1 / 0  0 / 0  0 / 0  5 / 30  0 / 0  0 / 0
JsUnitServer.java  26 / 46  2 / 0  0 / 0  3 / 0  1 / 0  2 / 2  16 / 44  0 / 0  2 / 0
Total  220 / 260  8 / 1  4 / 0  44 / 0  1 / 0  11 / 31  128 / 226  0 / 2  24 / 0
These results were then imported into the SOSART analysis tool which calculated
an anticipated reliability of 0.9082 if each and every method were executed at a uniform normal rate. Adding execution traces to the model reduced the reliability
value to 0.8102. These values are slightly higher than would be expected by the
STREW metrics calculations, but are within the range of acceptable values.
9.5 Jester Program Analysis
Java Jester is an open source tool available from SourceForge which is intended to aid Extreme Programming development by finding code which is inadequately covered during testing. Jester uses a technique referred to as mutation testing to automatically inject errors into the source and determine whether those errors are detected by the developed test cases. Jester is developed in Java and can test Java code. Code
metrics for Jester are provided in Table 9.9 and Table 9.10.
Table 9.9: Jester Overview Metrics
Metric Description                            Value
Number of Packages 1
Total Lines of Code 835
Number of Static Methods 16
Number of Classes 27
Number of Attributes 63
Number of Overridden Methods 8
Number of Static Attributes 15
Number of Methods 135
Number of Defined Interfaces 15
Table 9.10: Jester Class Metrics
Name NSM NOF NORM NSF NOM LOC DIT*
ConfigurationException.java 0 0 0 0 2 2 4
FileBasedClassIterator.java 0 3 0 0 4 35 1
FileBasedClassSourceCodeChanger.java 0 8 0 0 8 45 1
IgnoreList.java 0 1 1 1 3 26 1
IgnoreListDocument.java 0 3 1 1 10 54 1
IgnorePair.java 0 2 3 0 7 8 1
IgnoreRegion.java 0 2 1 0 4 5 1
JesterArgumentException.java 0 0 0 0 1 1 3
MainArguments.java 1 4 0 2 7 46 1
RealClassTestTester.java 0 2 0 0 3 24 1
RealCompiler.java 1 1 0 0 2 11 1
RealConfiguration.java 0 2 0 1 13 31 1
RealLogger.java 0 1 0 0 2 10 1
RealMutationsList.java 0 2 0 1 5 40 1
RealProgressReporter.java 0 2 0 0 5 8 1
RealProgressReporterUI.java 1 2 0 0 3 40 6
RealReport.java 0 10 1 0 20 76 1
RealTestRunner.java 1 2 0 0 2 16 1
RealXMLReportWriter.java 0 1 0 0 2 12 1
ReportItem.java 0 5 1 0 6 54 1
SimpleCodeMangler.java 0 2 0 0 6 18 1
SimpleIntCodeMangler.java 1 0 0 0 4 29 2
SourceChangeException.java 0 0 0 0 2 2 3
TestRunnerImpl.java 3 1 0 6 10 101 1
TestTester.java 3 3 0 3 2 61 1
TwoStringSwappingCodeMangler.java 0 4 0 0 2 18 2
Util.java 5 0 0 0 0 62 1
*Depth of Inheritance Tree
The same procedure as was used for the RealEstate and JSUnit programs was
applied to the Jester program to estimate system reliability, resulting in the data
shown in Table 9.11. Reliability was estimated to range between 0.8046 and 0.9775, with a typical reliability value of 0.8911.
Table 9.11: Jester STREW Metrics Reliability Parameters
Parameter                        Value
Estimated Reliability 0.8911
Confidence Interval 0.0865
Maximum Reliability 0.9775
Minimum Reliability 0.8046
Using the same 8 static analysis tools, the source code was analyzed for statically
detectable faults, as is shown in Table 9.12. A total of 652 statically detectable faults
were discovered, 144 of which were deemed to be valid and of varying risks.
These results were then imported into the SOSART analysis tool which calculated
an anticipated reliability of 0.9024 if each and every method were executed at a uniform normal rate. Execution traces for the program were obtained by running the acceptance test suite included within the source code module, as this test set was deemed to be representative of the desired use case for the program. Adding execution traces to the model increased the reliability value slightly to 0.9067. These values
are slightly higher than would be expected by the STREW metrics calculations, but
are within the range of acceptable values.
Table 9.12: Jester Static Analysis Findings
Filename  Total  SA 1  SA 2  SA 3  SA 4  SA 5  SA 6  SA 7  SA 8
(each entry gives Valid / Invalid counts)
ClassIterator.java  0 / 1  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 1  0 / 0  0 / 0
ClassSourceCodeChanger.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
ClassTestTester.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
CodeMangler.java  0 / 1  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 1  0 / 0  0 / 0
Compiler.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
Configuration.java  0 / 9  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 9  0 / 0  0 / 0
ConfigurationException.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
FileBasedClassIterator.java  2 / 12  0 / 0  2 / 3  0 / 0  0 / 0  0 / 0  0 / 9  0 / 0  0 / 0
FileBasedClassSourceCodeChanger.java  6 / 29  0 / 0  0 / 13  0 / 1  0 / 0  0 / 5  6 / 10  0 / 0  0 / 0
FileExistenceChecker.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
FileVisitor.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
IgnoreList.java  0 / 17  0 / 0  0 / 3  0 / 0  0 / 0  0 / 0  0 / 14  0 / 0  0 / 0
IgnoreListDocument.java  5 / 52  0 / 0  2 / 25  0 / 0  1 / 0  0 / 0  2 / 27  0 / 0  0 / 0
IgnorePair.java  0 / 17  0 / 0  0 / 11  0 / 0  0 / 0  0 / 0  0 / 6  0 / 0  0 / 0
IgnoreRegion.java  0 / 9  0 / 0  0 / 3  0 / 0  0 / 0  0 / 0  0 / 6  0 / 0  0 / 0
JesterArgumentException.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
Logger.java  1 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  1 / 0  0 / 0  0 / 0
MainArguments.java  7 / 35  0 / 0  1 / 15  2 / 1  0 / 1  1 / 0  3 / 18  0 / 0  0 / 0
MutationMarker.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
MutationsList.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
ProgressReporter.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
RealClassTester.java  4 / 7  0 / 0  4 / 2  0 / 0  0 / 0  0 / 0  0 / 5  0 / 0  0 / 0
RealCompiler.java  4 / 6  0 / 0  4 / 1  0 / 0  0 / 0  0 / 1  0 / 4  0 / 0  0 / 0
RealConfiguration.java  6 / 23  1 / 0  1 / 5  0 / 0  1 / 0  0 / 0  3 / 18  0 / 0  0 / 0
RealLogger.java  3 / 7  0 / 0  0 / 0  0 / 0  0 / 0  3 / 1  0 / 6  0 / 0  0 / 0
RealMutationsList.java  7 / 34  0 / 0  1 / 3  0 / 0  0 / 0  0 / 2  6 / 29  0 / 0  0 / 0
RealProgressReporter.java  0 / 10  0 / 0  0 / 9  0 / 0  0 / 0  0 / 0  0 / 1  0 / 0  0 / 0
RealProgressReporterUI.java  15 / 14  0 / 0  0 / 6  0 / 0  0 / 0  0 / 1  15 / 7  0 / 0  0 / 0
RealReport.java  11 / 35  0 / 1  0 / 11  0 / 0  0 / 0  1 / 0  10 / 23  0 / 0  0 / 0
RealTestRunner.java  0 / 7  0 / 0  0 / 0  0 / 0  0 / 0  0 / 2  0 / 5  0 / 0  0 / 0
RealXMLReportWriter.java  3 / 15  0 / 0  2 / 4  0 / 0  0 / 0  0 / 0  1 / 11  0 / 0  0 / 0
Report.java  3 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  3 / 0  0 / 0  0 / 0
SimpleCodeMangler.java  1 / 8  0 / 0  0 / 3  0 / 0  0 / 0  0 / 0  1 / 5  0 / 0  0 / 0
SimpleIntCodeMangler.java  2 / 11  0 / 0  0 / 2  0 / 0  0 / 0  0 / 0  2 / 9  0 / 0  0 / 0
SourceChangeException.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
TestRunner.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
TestRunnerImpl.java  33 / 42  1 / 0  0 / 0  3 / 1  5 / 1  4 / 8  20 / 32  0 / 0  0 / 0
TestTester.java  10 / 54  1 / 2  2 / 2  0 / 0  1 / 0  0 / 10  6 / 40  0 / 0  0 / 0
TwoStringSwappingCodeManager.java  1 / 14  0 / 0  1 / 0  0 / 0  0 / 0  0 / 0  0 / 14  0 / 0  0 / 0
Util.java  20 / 39  0 / 0  4 / 0  0 / 0  0 / 0  9 / 5  7 / 34  0 / 0  0 / 0
XMLReportWriter.java  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0  0 / 0
Total  144 / 508  3 / 3  24 / 121  5 / 3  8 / 2  18 / 35  86 / 344  0 / 0  0 / 0
9.6 Effort Analysis
The previous sections of this chapter have discussed the accuracy of the
reliability model. Without a cost effective mechanism for applying the model,
however, its practical value is difficult to establish. This section therefore
analyzes the effort required to apply the model relative to other means of
ensuring the reliability of reused source modules.
One of the oldest and thus far most effective mechanisms for ensuring the reliabil-
ity of reused components is a complete formal review of the implementation of the
program. For code reviews to achieve their maximum effectiveness, the review rate for
the peer review meeting should be approximately 100 lines of code per hour[Gla79].
Furthermore, effective peer reviews require 3 to 4 meeting attendees. Thus, the ef-
fort required to completely review a source code package can be estimated using the
equation
E_{CR} = \frac{LOC}{RR} \cdot N_R \qquad (9.3)
where

E_{CR} represents the total effort necessary to review the source code package,

LOC represents the count of the lines of code within the package,

RR represents the review rate for the source code package in LOC per unit of time, and

N_R represents the number of reviewers of the source code package.
For the three projects analyzed in this chapter, the effort can be estimated to be
between 1047.6 and 2250 minutes, as is shown in Table 9.13.
Table 9.13: Software Complete Review Effort Estimates

                                          Real Estate    JSUnit    Jester
Lines of Code                                    1250      1033       582
Estimated Effort to Analyze (Minutes)             750     619.8     349.2
Persons Required                                    3         3         3
Net Effort (Man-Minutes)                         2250    1859.4    1047.6
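As a worked instance of Equation 9.3, consider the Real Estate package from Table 9.13, reviewed at the 100 LOC per hour rate cited above by three reviewers; the same arithmetic yields the JSUnit and Jester figures.

E_{CR} = \frac{1250\ \mathrm{LOC}}{100\ \mathrm{LOC/hour}} \cdot 3 = 12.5\ \mathrm{hours} \cdot 3 = 37.5\ \text{man-hours} = 2250\ \text{man-minutes}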
The effort required to undertake the reliability analysis using the SOSART method
was measured during development to allow comparison with the estimated effort for
a complete source code review, resulting in the data shown in Table 9.14. The data
is broken into two portions: the effort required for the static analysis tools to analyze
the source code modules, and the effort required to analyze the reliability of the
program. The second portion includes the effort required to review the faults detected
by the static analysis tools using the SOSART tool as well as the time required to execute
limited module testing. These results clearly indicate that this method is cost effective
relative to a complete code review.
Table 9.14: Software Reliability Modeling Actual Effort

                                            Real Estate    JSUnit    Jester
Static Analysis Execution Time (Minutes)           8.33      5.93      8.27
SOSART Review (Minutes)                          546.03    319.65     218.8
Net Effort (Minutes)                             554.36    325.58    227.07
Effective Review Rate (LOC / Hour)                  135       190       154
Difference (Minutes)                               1695      1533       820
Savings                                            226%      247%      234%
Chapter 10
Model Validation using Tempest
The validation for the software reliability model presented previously relied upon
comparing the reliability calculated using the SOSART tool with the reliability cal-
culated through the STREW metrics method. While these experiments provided
a preliminary proof of concept for the model, a more extensive experiment using a
more appropriate software package was necessary. In this experiment, the
given software package would be operated in an experimental fixture and the reliabil-
ity of the software would be measured directly. These reliability values would then
be compared with the values obtained through the usage of the SOSART tool.
This chapter describes the experiment which was used to validate the reliability
model. The first section describes the Tempest Web Server software which was used
to validate the software reliability model. The second section describes the setup
which was used to evaluate the Tempest software from a reliability standpoint. The
third section of this chapter discusses the results of measuring the reliability of the
Tempest software in an experimental environment. The fourth section of this chapter
discusses the process used to experimentally estimate the reliability of the Tempest
software using the software reliability model and the SoSART tool. The fifth and
final section of this chapter discusses the economic costs which are incurred by
using this methodology.
10.1 Tempest Software
The Tempest web server, developed by members of the NASA Glenn Research
Center Flight Software Engineering Branch, was analyzed using the SOSART
tool. It is an embedded real-time HTTP web server, accepting requests from standard
browsers running on remote clients and returning HTML files. It is capable of serving
Java applets, CORBA content, virtual reality (VRML) content, audio files, video files,
and other material. NASA uses Tempest for the remote control and monitoring of real,
physical systems via inter/intra-nets. The initial version of Tempest was developed for
VxWorks using the C programming language and occupied approximately 34 kB of ROM.
Subsequently, the code has been ported to the Java language and can execute on any
machine capable of running the Sun Java Virtual Machine[YP99].
By its nature, Tempest represents platform technology which is intended to be
integrated into other products, as is shown in Figure 10-1. Tempest has been used for
various space applications and communications applications, distance learning[YB98],
Virtual Interactive Classrooms, and other Real Time Control applications[Dan97].
Future intended uses for Tempest include enabling near real-time communications
between earth-bound scientists and their physical/biological experiments on-board
Figure 10-1: Flow diagram showing the relationship between Tempest, controlled experiments, and the laptop web browsers[YP99].
space vehicles, mitigating astronaut risks resulting from cardiovascular alterations, and
developing new teaching aids that enable students and teachers to
perform experiments. Since being developed, Tempest has received several awards,
notably the Team 2000 FLC Award for Excellence in Technology Transfer, the 1999
Research and Development 100 Award, and the 1998 NASA Software of the Year
Award.
Tempest is implemented using the Java language. While the Java language does
support object oriented implementation and design, the Tempest web server is con-
structed in a more structural manner, as is shown in Table 10.1. Standard class based
metrics are provided in Table 10.2.
As an embedded web server, Tempest allows the user a significant number of
Table 10.1: Tempest Overview Metrics

Metric Description                          Value
Number of Packages 2
Total Lines of Code 2200
Number of Static Methods 6
Number of Classes 14
Number of Attributes 42
Number of Overridden Methods 2
Number of Static Attributes 48
Number of Methods 51
Number of Defined Interfaces 0
Table 10.2: Tempest Class Metrics

Name                      Static    Attributes  Overridden  Weighted Methods  Static      Number of  Lines of  Depth of
                          Methods               Methods     per Class         Attributes  Methods    Code      Inheritance Tree
ContentTag.java           0         1           0           41                0           2          211       1
DummyLogger.java          1         0           0           1                 0           0          0         1
GetString.java            0         0           0           2                 0           1          16        2
HeadString.java           0         0           0           1                 0           1          5         2
HTTPFile.java             0         12          0           33                0           6          145       1
HTTPString.java           1         21          0           155               32          12         797       1
Logger.java               0         3           1           14                0           3          74        2
MessageBuffer.java        0         1           0           5                 0           2          21        1
NotFoundException.java    0         1           1           3                 0           3          7         3
ObjectTag.java            0         1           0           50                0           4          300       1
PostString.java           0         0           0           2                 0           1          12        2
RuntimeFlags.java         0         0           0           8                 4           8          16        1
SomeClass.java            0         1           0           6                 0           4          27        1
Tempest.java              2         1           0           97                12          4          552       1
TimeRFC1123.java          2         0           0           2                 0           0          17        1
configuration parameters which can be passed to the software on the command line
when an execution instance is started. Command line options are defined in Table
10.3.
10.2 Evaluating Tempest
To evaluate the reliability of the Tempest software and the accuracy of the pro-
posed software reliability model, an experiment was constructed using the Tempest
software and the Java Web Tester software package. In essence, one machine was
configured as a Tempest web server and was given a test web site to serve. A second
machine was configured to use the Java Web Tester software package. This tool al-
Table 10.3: Tempest Configuration Parameters

port number: This option controls which port Tempest listens on for client requests. The standard HTTP port is 80, the default. UNIX users note that you must have root privileges to open a server on port 80 (or any server socket below port 1024). If you open Tempest on, say, port 9900, the browser URL will be appended with :9900 (e.g. http://www.somesite.com:9900/index.html).

auth or noauth: These options control the client and user checks. These checks are performed against the contents of the CLIENTS.SYS file and the USERS.SYS file. With "noauth", anyone can access the server from any machine. With "auth" the user will be challenged for a valid ID and password (which appear in the USERS.SYS file) and the user must be accessing Tempest from a client computer allowed in the CLIENTS.SYS file.

log or nolog: These options control log file creation. The log file consists of a series of messages showing who accessed the server, when, and from where. Be sure that you have a Tempest/LOG directory even if there are no files in it. An example message is: (Tempest/LOG/TempestLog.0) User adam access from 24.55.242.190 at Saturday, 30 March 2002 05:46:35 EST. Note that the log file is not written until Tempest is stopped.

debug or nodebug: These options control the display of debugging messages from Tempest in the window (e.g. DOS window) used to start Tempest.

persist or nopersist: These options control the type of connection established between Tempest and the client. Normally, server-client connections are stateless, non-persistent connections after Tempest delivers the information requested by the client. In persistent connections, Tempest maintains the connection with a client after the client's request is satisfied. Persistent connections are selected to reduce client-server transaction times when the transactions are many. The disadvantage of persistent connections is that ports are consumed and not available to other clients. The timeout for a persistent connection is 6 seconds. The use of a persistent connection also depends on browser support for this feature.
lows the user to configure a set of web sites which are to be periodically monitored
for their reliability.
10.2.1 Java Web Tester
The Java Web Tester software was previously developed for research into the re-
liability of web servers, as is detailed in Schilling and Alam[SA07a]. This tool was
developed in the Java Programming language and allows the user to verify connec-
tivity with a set of existing websites.
The tool consists of three tools bundled into a single jar file. The first tool, a
GUI based tool, is used to configure the website tester and can be used for short
duration tests. The GUI allows the user to configure the remote site which is to be
used as a test site, the port used to connect to the remote site, and the test rate. The test
rate determines how often the remote site is polled for a connection and subsequently
downloaded. Test rates can range between 1 and 3600 seconds. The tool also allows
the remote server to be pinged before an attempt is made to open the HTTP connection.
Figure 10-2: Web tester GUI Tool
The web testing tool also allows the user to compare the downloaded file with a previously
downloaded file. The intent of this is to detect downloads in which the connection
is successful yet the actual material downloaded is corrupted. When comparing files,
an entry can be flagged either when the downloaded file is identical to the previously
downloaded file or when it differs from that file, depending upon how the
tool is configured.
The web site testing tool is not limited in the number of test sites that can
be tested. In testing the tool, up to 100 sites were tested simultaneously without
performance degradation. When enabled to run, each test operates as its own Java
Thread, thus preventing the behavior of one remote site from affecting other sites
being monitored.
The second portion of the web tester tool is a command line tool which allows
previously developed test configurations to be executed from a command shell.
This allows web testing to run in the background, either via a UNIX script or a
cron job, without requiring a visible graphical user interface. During extended
duration tests, this is the preferred method to use.

The third portion of the tool, also a command line tool, post-processes the results
from the other segments of the web testing tool and creates summary data reports
on website connectivity.
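For illustration only, the following minimal Java sketch shows the general pattern described above: each monitored site is polled from its own thread at a fixed rate, the page body is downloaded, and successes and failures are logged. The class name, URL, and timing values here are hypothetical and are not taken from the actual Java Web Tester source.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical illustration of a periodic connectivity test in the spirit of
// the Java Web Tester: one thread per monitored site, polling at a fixed rate.
public class SiteMonitor implements Runnable {
    private final String siteUrl;      // site to poll
    private final long periodSeconds;  // test rate, e.g. 1 to 3600 seconds

    public SiteMonitor(String siteUrl, long periodSeconds) {
        this.siteUrl = siteUrl;
        this.periodSeconds = periodSeconds;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(siteUrl).openConnection();
                conn.setConnectTimeout(10000);
                int code = conn.getResponseCode();
                try (InputStream in = conn.getInputStream()) {
                    while (in.read() != -1) { /* download and discard the body */ }
                }
                System.out.println(siteUrl + " OK, HTTP " + code);
            } catch (Exception e) {
                System.out.println(siteUrl + " FAILED: " + e.getMessage());
            }
            try {
                Thread.sleep(periodSeconds * 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop monitoring this site
            }
        }
    }

    public static void main(String[] args) {
        // One thread per monitored site, as in the tool described above.
        new Thread(new SiteMonitor("http://192.168.1.10:9000/index.html", 60)).start();
    }
}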
10.2.2 Initial Experimental Setup
The experiment began with setting up the Tempest software to serve a set of
test web pages in the University of Toledo OCARNet lab. For the purposes of this
experiment, two machines were isolated from the rest of the lab (and the rest of the
University of Toledo domain) through a standard commercial firewall and hub setup.
Two Linux workstations were set up in the OCARNet lab using the topology shown
in Figure 10-3. One machine executed the Tempest web server software,
serving a sample web site. The second machine automatically polled the web
server once a minute and downloaded a series of web pages from the sample web
site. The successes and failures were logged and stored.
In the first instance of testing, this setup executed continuously without software
failure for 2 calendar months. However, this setup was flawed in several minor ways.

Figure 10-3: OCARNet Lab topology.

First, the Tempest software was only operated using one set of configuration
parameters. Second, and more importantly, the testing used very low bandwidth
utilization and did not stress the software from a performance perspective.
As part of the general web reliability study conducted by Schilling and Alam[SA07a],
a second test of the reliability of the software was conducted using a similar setup.
In this study, the web server operated using two different sets of configuration pa-
rameters. However, the results related to the Tempest web server ended up being
flawed in that the computer used was inappropriately configured for the experiment
and suffered from significant performance problems which unfortunately influenced
the testing results1.
A third reliability study, using the OCARNet equipment and a new Windows XP
machine, was conducted. In this case, four different instances of the Tempest web
server executed simultaneously on one machine. Each of the four test instances ran
with a different set of user configurable parameters, representing four different use cases
for the Tempest web server. This test executed for one week before being abandoned
due to performance problems with the machine and due to required anti-virus software
and Windows Update features which could not be circumvented.
10.2.3 Final Experiment Setup
The final experimental setup used two Linux machines running in an independent
environment away from the OCARNet Lab. The experimental setup began by creat-
ing the network topology shown in Figure 10-4. In this topology, two Linux machines
were separated from the rest of the network by a router, effectively isolating them
from all traffic except for the web traffic between machines. One of the machines
served a set of web pages over the network. The second machine executed the Java
Web tester software.
Table 10.4: Tempest Test Instance Configurations
Test Instance   TCP/IP Port   Command Line Options
1               9000          noauth nolog nodebug nopersist
2               9001          noauth nolog nodebug persist
3               9002          noauth nolog debug nopersist
4               9003          noauth log nodebug nopersist
1 While the Schilling and Alam[SA07a] article does include a Tempest Web server within its results, this is a separate machine. All data from the flawed experiment was removed before the analysis of results was presented in that paper.
Figure 10-4: Network topology for test setup.
The machine executing the Tempest web server actually executed four different
instances of the web server in four different Unix processes. Each instance ran a dif-
ferent use case, manifested through the usage of different command line parameters,
as is shown in Table 10.4. While running multiple instances of Tempest simulta-
neously had been impossible under the Windows XP operating environment due to
resource constraints, the combination of a Dual Core microprocessor and the usage
of the Ubuntu Linux Operating System allowed for all four instances of Tempest to
run simultaneously without resource conflict or other performance problems.
Source Code Modifications
The experiment began by adding one class to the source code package, namely the Dum-
myLogger class, which is provided in Figure 10-5. This class provides a single
package gov.nasa.grc.ewt.server;

public class DummyLogger {
    public static void LogAccess()
    {
        // This routine does nothing, as we do not log anything.
    }
}
Figure 10-5: DummyLogger.java class.
static method which can be called by any class within the Tempest project. This is
necessary due to a limitation of the JDI interface used by the SoSART tool. Under
certain circumstances, there will be method paths that have code blocks either
optimized out of the final Java byte code or conditionals which do not contain ex-
ecutable code. In order for path tracing to function in a reliable fashion, each
code segment must contain at least one executable line on which a watchpoint
can be placed. To facilitate this, each code block, as parsed by
the SoSART tool, was appended with a call to the DummyLogger.LogAccess static
method. While this technically modifies the source code, the insertion was deemed not
to significantly change the behavior of the system, yet it did allow more accurate anal-
ysis with the SoSART tool and the JDI interface upon which it relies. An example
of changed code is given in Figure 10-6.
Once the code modification was completed and appropriately archived into the
local configuration management system, a clean build of the source code from the
archive occurred. In this operation, all existing class files and generated modules
were removed and rebuilt by the javac compiler. This ensured that any and all
remnants of the previous structure were removed.
package gov.nasa.grc.ewt.server;

class NotFoundException extends Exception {
    String message;

    public NotFoundException() {
        super();
**      DummyLogger.LogAccess();
    }

    public NotFoundException(String s) {
        super(s);
**      DummyLogger.LogAccess();
        message = new String(s);
    }

    public String toString() {
**      DummyLogger.LogAccess();
        return message;
    }
}
Figure 10-6: Modified NotFoundException.java class, showing lines added to call the DummyLogger routine.
The code was then imported into the SoSART analysis tool. This importation was
used to generate a set of tracepoints which would be used to log the program execution
profile under each of the four use cases that would be tested. The tracepoints were
saved to a text file.
10.2.4 Measured Reliability
The net goal for this series of experiments was to experimentally estimate the
reliability of the Tempest Web Server under four different use cases. In order to
do this, four instances of the Tempest Web Server were configured to
serve the same material. Each instance ran a different configuration. Over a 25 hour
period, the machines were tested for operation, and the number of failures and the
mean time between failures were recorded for each test case. This resulted in the data
provided in Table 10.5. Because the first three use cases did not fail during the
initial testing period, the test was subsequently extended to 168 hours. However,
the result remained substantially unchanged after 168 hours, as the first three use
cases still did not fail under test conditions.
Table 10.5: Tempest Field Measured Reliabilities
Configuration   Operational Uptime (hours)   Number of Failures   MTBF (hours)
1               25.0                         0                    *
2               25.0                         0                    *
3               25.0                         0                    *
4               14.3                         3                    4.77
* Can not be calculated, as no failures occurred.
Assuming an exponential probability density function, for the fourth use case, the
failure rate λ can be estimated through the equation
MTBF = \frac{1}{\lambda} \qquad (10.1)
which results in a λ of 0.2096. This can be translated into a reliability value of 0.8109
at one hour of program execution.
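Concretely, using the measured MTBF of 4.77 hours for the fourth configuration and the exponential reliability function R(t) = e^{-\lambda t}:

\lambda = \frac{1}{MTBF} = \frac{1}{4.77\ \mathrm{hours}} \approx 0.2096\ \mathrm{hour}^{-1}, \qquad R(1\ \mathrm{hour}) = e^{-0.2096} \approx 0.8109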
For the other examples, we must estimate the reliability based upon the fact that
there was no failure in the system. Using the relationship described in Hamlet and
Voas[HV93], the reliability for the first three use cases can be estimated to be 0.9840
at a 90% confidence level.
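For reference, a standard form of this zero-failure estimation is given below; the specific number of failure-free executions used to arrive at the 0.9840 figure is not restated here.

C = 1 - R^{N} \quad \Longrightarrow \quad R \geq (1 - C)^{1/N}

where N is the number of failure-free test executions and C is the desired confidence level.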
Using this reliability as a calculation basis, and assuming an exponential proba-
bility density function, the MTBF for the software can be estimated to be 62.5 hours,
resulting in a 33% probability of a failure occurring by the 25 hour experimental cutoff.
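The quoted failure probability follows directly from the exponential model and the 62.5 hour MTBF:

P(\text{failure by } 25\ \mathrm{hours}) = 1 - e^{-25/62.5} = 1 - e^{-0.4} \approx 0.33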
10.2.5 Analyzing the Source Code for Statically Detectable Faults
The initial intent when analyzing the Tempest source code was to start by enabling
all static analysis rules on all tools. Thus, every potential rule violation would be
output, and every warning could be pulled into the SoSART tool. This would
allow SoSART to have a complete picture of the occurrence rate for each rule, and
all tools could be correlated in the optimal fashion.
To allow for deterministic repeatability, the analysis process was automated using
an Apache Ant build script which automatically invoked each of the 10 static analysis
tools.
Table 10.6: Tempest Rule Violation Count with All Rules Enabled

Tool                      1     2     3     4     5      6     7      8     9    10
ContentTag.java           0     1    33     2    25    334    14    243    14    13
DummyLogger.java          0     1     2     1     0     12     0      5     0     1
GetString.java            7     3     3     2     3     52     1     25     0     1
HTTPFile.java             0     9    32     4    28    263    33    189    14    15
HTTPString.java           4    31    79     9   134   1464    85   1074    32    38
HeadString.java           5     1     3     2     0     30     1     13     0     1
Logger.java               0    10     8     2    13    130     6     97     4     3
MessageBuffer.java        0     5    12     0     0     63     3     38     0     1
NotFoundException.java    1     3     2     1     0     27     1     16     0     1
ObjectTag.java            3     1    25     4    79    514    23    355    22    19
PostString.java           8     4     3     2     0     37     2     20     6     2
RuntimeFlags.java         4     1     9     4     0     65     4     41     0     0
Tempest.java             40    10    60     7   145    873    34    656    44    21
TimeRFC1123.java          0     1    11     1     0     48     1     25     2     1
SomeClass.java            0     4     9     3     2     72     4     45     0     4
Total                    72    85   291    44   429   3984   212   2842   138   121
Using this approach, however, had one significant drawback. Because all of the
tools had all rules enabled, a significant number of violations were
flagged, as is shown in Table 10.6. The 8218 warnings flagged by the analysis tools
unfortunately overwhelmed the internal SoSART database engine, and this complete
Table 10.7: Tempest Rule Violation Densities with All Rules Enabled

File                      LOC      1      2      3      4      5      6      7      8      9     10
ContentTag.java           300   0.000  0.003  0.110  0.006  0.083  1.11   0.046  0.810  0.046  0.043
DummyLogger.java            1   0.000  1.000  2.000  1.000  0.000  12.0   0.000  5.00   0.000  1.000
GetString.java             16   0.437  0.187  0.187  0.125  0.187  3.25   0.062  1.56   0.000  0.062
HTTPFile.java             145   0.000  0.062  0.220  0.027  0.193  1.81   0.227  1.30   0.096  0.103
HTTPString.java           797   0.005  0.038  0.099  0.011  0.168  1.83   0.106  1.34   0.040  0.047
HeadString.java             5   1.000  0.200  0.600  0.400  0.000  6.00   0.200  2.60   0.000  0.200
Logger.java                74   0.000  0.135  0.108  0.027  0.175  1.75   0.081  1.31   0.054  0.040
MessageBuffer.java         21   0.000  0.238  0.571  0.000  0.000  3.00   0.142  1.80   0.000  0.047
NotFoundException.java      7   0.142  0.428  0.285  0.142  0.000  3.85   0.142  2.28   0.000  0.142
ObjectTag.java            300   0.010  0.003  0.083  0.013  0.263  1.71   0.076  1.18   0.073  0.063
PostString.java            12   0.666  0.333  0.250  0.166  0.000  3.08   0.166  1.66   0.500  0.166
RuntimeFlags.java          16   0.250  0.062  0.562  0.250  0.000  4.06   0.250  2.56   0.000  0.000
Tempest.java              552   0.072  0.018  0.108  0.012  0.262  1.58   0.061  1.18   0.079  0.038
TimeRFC1123.java           17   0.000  0.058  0.647  0.058  0.000  2.82   0.058  1.47   0.117  0.058
SomeClass.java             27   0.000  0.148  0.333  0.111  0.074  2.66   0.148  1.66   0.000  0.148
Total                    1989   0.036  0.042  0.146  0.022  0.215  2.00   0.106  1.42   0.069  0.060
analysis could not occur. On average, 4.1317 warnings were issued for each line of
code present within the software, as is shown in Table 10.7.
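This density follows directly from the totals of Tables 10.6 and 10.7:

\frac{8218\ \text{warnings}}{1989\ \text{lines of code}} \approx 4.13\ \text{warnings per line of code}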
Table 10.8: Static Analysis Rule Configuration Metrics

Tool                    Overall      1      2      3      4    5      6      7      8     9     10
Total Rules Detected       1094     85     21    207    276    *    176     90     39    19    181
Rules Disabled              620      2      0    156    252    *     73     55     14     0     68
Rules Enabled               484     83     21     51     24    *    103     45     25    19    113
Percent Disabled          56.7%   2.4%   0.0%  75.4%  91.3%    *  41.5%  61.1%  35.9%  0.0%  37.6%

* For this tool, it was impossible to obtain the metrics, as the documentation did not provide a complete warning listing.
To avoid this problem, it was necessary to configure each tool independently in or-
der to filter out those warnings which would not be capable of causing a direct system
failure. The exercise was conducted using the methodology described in Schilling and
Alam[SA06c]. All in all, once all ten of the tools were properly configured, 56.7% of
the rules had been disabled as being either stylistic in nature or otherwise represent-
ing faults which would not result in a system failure based upon the characteristics
of the detected fault. As is shown in Table 10.8, the percentage of rules disabled was
different for each tool, ranging from 0.0% to 91.3%.
Once each of the tools had been properly configured and inappropriate warnings
had been removed from analysis, the static analysis tools were re-executed using the
newly created configuration profiles, resulting in a total of 1867 warnings being issued
by the tools, as is shown in Table 10.9.
Table 10.9: Tempest Rule Violation Count with Configured Rulesets

Tool                      1     2     3     4     5      6     7     8     9    10
ContentTag.java           0     1     4     0    25     85     2     2    14     4
DummyLogger.java          0     1     0     0     0      1     0     0     0     0
GetString.java            2     3     0     1     3     23     0     0     0     0
HTTPFile.java             0     9     6     0    28     63    13     1    14     4
HTTPString.java           3    31    12     0   134    374     0     7    32     5
HeadString.java           1     1     0     1     0     12     0     0     0     0
Logger.java               0    10     0     1    13     26     1     1     4     3
MessageBuffer.java        0     5     0     0     0      8     1     1     0     1
NotFoundException.java    1     3     0     0     0      4     0     0     0     0
ObjectTag.java            3     1    10     1    79    166     0     5    22     2
PostString.java           3     4     0     1     0     14     0     0     6     0
RuntimeFlags.java         4     1     0     4     0     13     4     0     0     0
Tempest.java             39    10     9     5   145    199     5     6    44     8
TimeRFC1123.java          0     1     1     0     0     12     1     1     2     1
SomeClass.java            0     4     2     0     2     20     0     1     0     1
Total                    56    85    44    14   429   1020    27    25   138    29
10.2.6 SOSART Reliability Assessment
Once the statically detectable faults had been detected by the static analysis tools,
these outputs were analyzed using the SOSART tool and assessed for their validity.
Of the 1967 warnings detected, 456, or 23.1%, were deemed to be valid faults which
had the potential of causing some form of systemic operational degradation which
might result in a software failure.
Table 10.10: Tempest Estimated Reliabilities using SOSART

Option Set                            SOSART Estimated Reliability
1  noauth nolog nodebug nopersist     0.9898
2  noauth nolog nodebug persist       0.9898
3  noauth nolog debug nopersist       0.9898
4  noauth log nodebug nopersist       0.9757
Using the SOSART tool, and assigning all execution paths the likelihood of “Nor-
mal” for their execution rate, the base reliability was estimated
to be 0.8427. Using the program execution profiles generated by limited evaluations of
the usage options provided in Table 10.4, a set of estimated reliabilities was obtained,
as is shown in Table 10.10. It is important to note that the first three use cases result
in exactly the same reliability estimation. This is because the exe-
cution profiles upon which the estimates are based are virtually identical. Execution
profiles for use cases 1 and 2 differ by only 9 execution points out of a total of 234
execution points, and do not include any different method invocations. Furthermore,
each of these differences can be attributed to a single logical change within a method,
in that in the first profile one branch is taken for a given decision while in the second
profile a different branch of execution occurs. When making the same comparison
between the first execution profile and the third execution profile, there is a net total
of 41 branch locations which differ out of a total of 301. However, the exact
same methods are invoked as in the first two profiles.
In the fourth case, which has a lower reliability score, there are 69 execution point
locations which differ between the first and the fourth execution profiles. More
importantly, the fourth execution profile includes six method invocations in
classes which are not used at all in the first two execution profiles. Thus, it can
clearly be justified that the first three execution profiles, given the granularity of
measurement for this experiment, will have identical reliability values, while the fourth
use case will have a different reliability estimation due to the difference in execution
profiles.
Once all four reliabilities had been calculated, a comparison between the two mech-
anisms could be made. For the first three use cases, the reliability estimated by
SOSART and the reliability calculated from field testing were quite similar,
with values of 0.9898 and 0.9840 respectively, representing a 0.5% difference. For the
fourth use case, SOSART estimated a reliability of 0.9757 while the actual
measured reliability was 0.8109, a difference of 16.89%.
Chapter 11
Conclusions and Future Directions
11.1 Conclusions
The problem of software reliability is vast and ever growing. As more and more
complex electronic devices rely further upon software for fundamental functionality,
the impact of software failure becomes greater. Market forces, however, have made
it more difficult to measure software reliability through traditional means. The reuse
of previously developed components, the emergence of open source software, and the
purchase of externally developed software have made delivering a reliable final product more
difficult.
The first chapter of this dissertation emphasized the need to investigate software
reliability. This need is urgent as the cost of failure to the American economy is
estimated to be $59.5 billion annually and is certain to grow, as the quantity of
embedded software in everyday items doubles every 18 months. Software reliability
problems have surpassed hardware problems as the principal source of system failure by at
least a factor of ten.
To provide justification for this study, it was important to analyze past failures. Post-
mortem analysis of failure has been a common mechanism in other engineering fields,
yet it has generally been lacking in the area of software engineering. Because of this
failure to obtain a historical perspective, there are many failure modes which
repeatedly recur in different products. To this end, numerous case studies of failure
were presented to introduce the subject. Part of this presentation included whether
software static analysis tools would have been capable of detecting the fault and thus
preventing the failure. It was found that in a significant number of cases the fault
that ultimately led to the failure of the system was statically detectable.
This analysis led to the conceptual proposal to establish a relationship between
software reliability and statically detectable faults. While static analysis cannot
guarantee completeness in its analysis, it has been shown to be quite effective at detecting
faults.
It is with this background that an approach to modeling software reliability was
presented which targets the estimation of reliability for existing software. Traditional
software reliability models require significant data collection during development and
testing, including the operational time between failures, the severity of the failures,
code coverage during testing, and other metrics. In the case of COTS software pur-
chases or open source code, this development data is often not readily available,
leaving a software engineer little information to make an informed decision regarding
the reliability of a software module. This reliability model does not suffer from this
limitation as it only requires black box testing and static analysis of the source code
to estimate reliability. The reliability is calculated through a Bayesian Belief Network
incorporating the path coverage obtained during limited testing, the structure of the
source code, and results from multiple static analysis tools combined using a meta
tool.
Next, it was necessary to establish that static analysis tools can effectively find
faults within a Java program. This was established through the development of a
validation suite which proved that static analysis tools can be effective at finding
faults seeded within a validation suite. Overall, ten different analysis tools were used
to find 50 seeded faults, and 82% of the faults were detectable by one or more tools.
More importantly, 44% of the faults were detected by two or more static analysis
tools, indicating that multiple tools may aid in the reduction of false positives from
static analysis executions.
The static analysis tool effectiveness experiment also emphasized the importance
of proper tool configuration. In this case, the number of valid warnings was dwarfed
by the number of false positives detected which were incapable of causing a software
failure. However, it was also found that these false positives were often limited to a
small number of rules which could easily be disabled.
In order for this reliability model to be applied to software, a reliability toolkit was
necessary. Therefore, a Software Static Analysis Reliability Toolkit (SOSART) was
constructed to allow the user to apply the reliability model to non-trivial projects.
An overview of the requirements for this tool as well as the tool usage was provided.
Proof of concept validation for the reliability model was presented in two exper-
iments. In the first experiment, the results of the SOSART reliability model were
compared with the results from the STREW metrics reliability model. In all cases,
the SOSART estimates were determined to be within the confidence interval for the
accuracy of the STREW metrics model, and typically less than 2% away from the es-
timate of the STREW metrics. In the second experiment, the results of applying the
SOSART method to an existing software project were presented. In this experiment,
an existing program was assessed for its operational reliability on the basis of four dif-
ferent sets of configuration parameters. Three of the parameter sets were found to have
identical reliabilities while the fourth was found to have a lower reliability value. This
was both predicted by the SOSART tool and validated by the experimental results.
While the SOSART results exhibit slightly larger error than would be desired, this
can be explained by the highly non-linear nature of software execution.
These experiments provided the required proof of concept for the validity of this
reliability model. From a conceptual standpoint, it is possible to provide a high
level estimate of software reliability from static analysis tool execution coupled with
execution profiling. Furthermore, through an effort assessment, the method is shown
to be cost effective in that the effort required to apply this method is less than what
would be expected for a complete code review of source code.
11.2 Future Directions
This dissertation has put forth a proof of concept experiment that static analysis
can be used to estimate the reliability of a given software package. However, while
this work has been successful as a proof of concept, there are many areas which need
to be further investigated.
First, in terms of the Bayesian Belief Network, we realize that our network is
highly simplified and may be improved in accuracy through additional parameters
and additional resolution of the given parameters.
While we considered clustering effects at the method and file level, it is known
that clustering occurs at the package and project levels as well. What we do not
know is the relationship between clustering at the various levels. For example, is
clustering at the method level more indicative of a valid cluster versus clustering at
the package level? How does clustering change as a module undergoes revision by
multiple development teams?
The impact of independent validation also needs further assessment. While our
results indicated that multiple tools often did detect the same faults, we also saw
multiple false positives being detected. Is the assumption that was made re-
garding the independence of algorithms truly valid? While we would like to believe
that independently developed commercial tools should be independent, the research of
Knight and Leveson [KL86] [KL90] indicates that n-version programming may not result
in as much independence as would be anticipated, for while the versions are developed
independently, the algorithms are not as diverse as would be expected. This
concern needs to be addressed for the algorithms and implementations
used in static analysis tools.
Our Bayesian Belief Network also simplifies the concept of maintenance risk. It
is known that certain code constructs, such as missing braces for if constructs, often
result in implementation failures as a module is maintained. What we do not know
is, qualitatively, the risk that this poses over time. Our model simply indicates that a
coded construct either is or is not a maintenance risk. Yet this parameter more
than likely has some degree of variability associated with it and should be modeled
in the same manner as the Immediate Failure Risk. Clearly, additional research using
lessons learned databases and difference analysis of existing program modules across
revisions, as field defects are fixed, may be capable of addressing this issue.
Similar areas for research exist on the code coverage side of the Bayesian Belief
Network. We know that at a high level the assumptions used to generate this model
are valid, but in many cases, the exact relationship has not been definitively shown
through empirical study.
The SOSART tool clearly needs additional performance tuning and development.
Due to technical limitations, it was incapable of analyzing programs much larger than
approximately 2 KLOC. While that was acceptable for a proof of concept application
in which smaller programs were used, it is imperative that the tool be capable of
efficiently and reliably analyzing larger programs as well. It may be advisable for
the GUI used for fault analysis to be separated from the mathematical model used to
calculate reliability, thus saving memory and allowing for distributed analysis.
The model also needs significant analysis in terms of its granularity. While in
this experiment the software programs typically exhibited what would be considered
relatively low reliabilities, the tool itself suffered from numerical granularity problems
in calculation which seemingly preclude its use with higher reliability modules.
This may be an effect of the network itself, or it may be an effect of the definitions
placed upon the model by the application.
Further experimental validation is also necessary in order to determine the rela-
tionships present in the model. While each of the software packages assessed in this
dissertation has used the same network coefficients, it is highly probable that the
relationships between nodes are not necessarily the same for all developed software.
The work of Nagappan et al.[NBZ06] indicates that there is no single set of predictive
metrics which can be used to estimate field failure rates. We believe that these con-
clusions also apply to our model, and that there may be multiple relationships which
are specific to the project domain or project family. While our model is targeted at
Embedded Systems, the bulk of the validation occurred with non-embedded appli-
cation programs. It is expected that with proper assessment in different domains as
well as the appropriate calibration using the built in calibration parameters for the
network, more accurate results can be obtained.
Another area of research is to look at the impact of statically detectable faults on
subsequent software releases. As has been discussed in Schilling and Alam[SA06c], it
is often a risk management issue which decides whether known statically detectable
faults are removed between releases of a given project. It may be possible to relate the
change in software reliability between releases with the change in statically detectable
faults given the appropriate analysis.
Lastly, for static analysis (or any other software engineering method) to be com-
mercially acceptable, it must be cost effective. While this research looked at cost in
terms of time, this is not the complete picture. There are direct monetary costs asso-
ciated with static analysis unrelated to effort, including licensing fees, tool configu-
ration, training exercises, and others. In order for this method to be viable in the
commercial software engineering environment, it must be applicable to the needs
of practicing software engineers. This has been one of the goals of this research. The
proof of concept model presented here appears promising, although it has only been
evaluated in the academic arena.
Bibliography
[AB01] C. Artho and A. Biere. Applying Static Analysis to Large-Scale, Multi-
threaded Java Programs. In Proceedings of the 13th Australian Software
Engineering Conference, pages 68–75, Canberra, Australia, 2001. IEEE
Computer Society Press.
[Ada84] Edward N. Adams. Optimizing preventive service of software products.
IBM J. Research and Development, 28(1):2–14, January 1984.
[Agu02] Joy M. Agustin. JBlanket: Support for Extreme Coverage in Java Unit
Testing. Technical Report 02-08, University of Hawaii at Manoa, 2002.
[AH04] Cyrille Artho and Klaus Havelund. Applying JLint to space exploration
software. In VMCAI, pages 297–308, 2004.
[All02] Eric Allan. Bug Patterns in Java. Apress, September 2002. ISBN: 1-
59059-061-9.
[Ana04] Charles River Analytics. About Bayesian Belief Networks. Charles River
Analytics, Inc., 625 Mount Auburn Street Cambridge, MA 02138, 2004.
[And96] Tom Anderson. Ariane 501. E-mail on safety critical mailing list., July
1996.
[Arn00] Douglas N. Arnold. The Patriot Missile Failure. Website, August 2000.
[Art01] C. Artho. Finding Faults in Multi-Threaded Programs. Master’s thesis,
Federal Institute of Technology, 2001.
[AT01] Paul Anderson and Tim Teitelbaum. Software inspection using
CodeSurfer . In WISE01: Proceedings of the First Workshop on In-
spection in Software Engineering, July 2001.
[BDG+04] Guillaume Brat, Doron Drusinsky, Dimitra Giannakopoulou, Allen Gold-
berg, Klaus Havelund, Mike Lowry, Corina Pasareanu, Arnaud Venet,
Willem Visser, and Rich Washington. Experimental evaluation of verifi-
cation and validation tools on Martian Rover software. Form. Methods
Syst. Des., 25(2-3):167–198, 2004.
[Ben06] David Benson. JGraph and JGraph Layout Pro User Manual, December
2006.
[BK03] Guillaume Brat and Roger Klemm. Static analysis of the Mars explo-
ration rover flight software. In Proceedings of the First International
Space Mission Challenges for Information Technology, pages 321–326,
2003.
[BL06] Steve Barriault and Marc Lalo. Tutorial: How to statically ensure soft-
ware reliability. Embedded Systems Design, 19(5), 2006.
[Blo01] Joshua Bloch. Effective Java programming Language Guide. Sun Mi-
crosystems, Inc., Mountain View, CA, USA, 2001.
[Bol02] Phillip J. Boland. Challenges in software reliability and testing. In Third
International Conference on Mathematical Methods in Reliability Method-
ology and Practice, 2002.
[BPS00] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A static
analyzer for finding dynamic programming errors. Software Practice and
Experience, 30(7):775–802, 2000.
[BR02] Thomas Ball and Sriram K. Rajamani. The SLAM project: debugging
system software via static analysis. In POPL, pages 1–3, 2002.
[Bro04] Matthew Broersma. Microsoft server crash nearly causes 800-plane pile
up. Techworld, 2004.
[BV03] Guillaume Brat and Arnaud Venet. Static program analysis using Ab-
stract Interpretation. Unpublished tutorial, Proceedings ASE’03: 18th
IEEE International Conference on Automated Software Engineering, Oc-
tober 6-10 2003.
[Car92] Ralph V. Carlone. GAO report: Patriot missile defense - software problem
led to system failure at Dhahran, Saudi Arabia. Technical report, General
Accounting Office, February 1992. GAO/IMTEC-92-26.
[Cen96] Reliability Analysis Center. Introduction to Software Reliability: A state
of the Art Review. Reliability Analysis Center (RAC), 1996.
[Che80] Roger C. Cheung. A user-oriented software reliability model. IEEE
Transactions on Software Engineering, 6(2):118–125, March 1980.
[CM04] Brian Chess and Gary McGraw. Static analysis for security. IEEE Secu-
rity & Privacy, 2(6):76–79, November-December 2004.
[Cor05] Steve Cornett. Code coverage analysis. Website, December 2005.
http://www.bullseye.com/coverage.
[Coz99] Fabio Gagliardi Cozman. Embedded Bayesian Networks.
http://www.cs.cmu.edu/˜javabayes/EBayes/Doc/, 1999.
[Coz01] Fabio G. Cozman. The JavaBayes System. ISBA Bulletin, 7(4):16–21,
2001.
[Cre05] Jack W. Crenshaw. Time to re-evaluate Windows CE. Embedded Systems
Programming, 18(2):9–14, 2005.
[Dan97] Carl Daniele. Embedded web technology: Internet technology applied to
real-time system control. Research & Technology 1997, 1997.
[Dar88] Ian F. Darwin. Checking C Programs with Lint. O’Reilly and Associates,
Inc,, 103 Morris Street, Suite A Sebastopol, CA 95472, October 1988.
[Den99] Jason Denton. Accurate Software Reliability Estimation. Master’s thesis,
Colorado State University, 1999.
[Dew90] Philip Elmer Dewitt. Ghost in the machine. Time, pages 58–59, January
29 1990.
[DLZ05] Valentin Dallmeier, Christian Lindig, and Andreas Zeller. Evaluating a
lightweight defect localization tool. In Workshop on the Evaluation of
Software Defect Detection Tools, June 12 2005.
[DZN+04] Martin Davidsson, Jiang Zheng, Nachiappan Nagappan, Laurie Williams,
and Mladen Vouk. GERT: An empirical reliability estimation and testing
feedback tool. In 15th International Symposium on Software Reliability
Engineering (ISSRE’04), pages 269–280, 2004.
[EGHT94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint:
A tool for using specifications to check code. In Proceedings of the ACM
SIGSOFT ’94 Symposium on the Foundations of Software Engineering,
pages 87–96, 1994.
[EKN98] William Everett, Samuel Keene, and Allen Nikora. Applying software
reliability engineering in the 1990s. IEEE Transactions on Reliability,
47(3):372–378, September 1998.
[EL03] Davie Evans and David Larochelle. Splint Manual. Secure Programming
Group University of Virginia Department of Computer Science, June 5
2003.
[ELC04] The Economic Impacts of the August 2003 Blackout. Technical report,
Electricity Consumers Resource Council, 2004.
[Eng05] Dawson R. Engler. Static analysis versus model checking for bug finding.
In CONCUR, page 1, 2005.
[FCJ04] Thomas Flowers, Curtis A. Carver, and James Jackson. Empowering stu-
dents and building confidence in novice programmers through Gauntlet.
In 34th ASEE IEEE Frontiers in Education Conference, 2004.
[FGMP95] Fabio Del Frate, Praerit Garg, Aditya P. Mathur, and Alberto Pasquini.
On the correlation between code coverage and software reliability. In
Proceedings of the Sixth International Symposium on Software Reliability
Engineering, pages 124–132, 1995.
[FL01] Cormac Flanagan and K. Rustan M. Leino. Houdini, an Annotation
Assistant for ESC/Java. In Proceedings of the International Symposium
of Formal Methods Europe on Formal Methods for Increasing Software
Productivity, pages 500–517, London, UK, 2001. Springer-Verlag.
[For05] Jeff Forristal. Source-code assessment tools kill bugs dead. Secure En-
terprise, December 2005.
[FPG94] Norman Fenton, Shari Lawrence Pfleeger, and Robert L. Glass. Science
and substance: A challenge to software engineers. IEEE Softw., 11(4):86–
95, 1994.
[Gan00] Jack Ganssle. Crash and burn: Disasters and what we can learn from
them. Embedded Systems Programming, November 2000.
[Gan01] Jack Ganssle. The best ideas for developing better firmware faster. Tech-
nical Presentation, 2001.
[Gan02a] Jack Ganssle. Born to fail. Embedded Systems Programming, 2002.
[Gan02b] Jack Ganssle. Codifying good software design. Embedded.com, August
2002.
[Gan04] Jack Ganssle. When disaster strikes. Embedded Systems Programming,
November 11 2004.
[Gar94] Praerit Garg. Investigating coverage-reliability relationship and sensitiv-
ity of reliability to errors in the operational profile. In CASCON ’94:
Proceedings of the 1994 conference of the Centre for Advanced Studies on
Collaborative research, page 19. IBM Press, 1994.
[Gar95] Praerit Garg. On code coverage and software reliability. Master’s thesis,
Purdue University, Department of Computer Sciences, May 1995.
[Gep04] Linda Geppert. Lost radio contact leaves pilots on their own. IEEE
Spectrum, November 2004.
[Ger04] Andy German. Software static code analysis lessons learned. Crosstalk,
16(11):13–17, 2004.
[GH01] Bjorn Axel Gran and Atte Helminen. A Bayesian Belief Network for
Reliability Assessment. In SAFECOMP ’01: Proceedings of the 20th
International Conference on Computer Safety, Reliability and Security,
pages 35–45, London, UK, 2001. Springer-Verlag.
[Gie98] Dirk Giesen. Philosophy and practical implementation of static analyzer
tools. Technical report, QA Systems Technologies, 1998.
[GJC+03] Vinod Ganapathy, Somesh Jha, David Chandler, David Melski, and
David Vitek. Buffer overrun detection using linear programming and
static analysis. In Proceedings of the 10th ACM conference on Computer
and Communications Security, pages 345–354, New York, NY, USA,
2003. ACM Press.
[GJSB00] James Gosling, Bill Joy, Guy L. Steele, and Gilad Bracha. The Java
Language Specification. Java series. Second edition, 2000.
[Gla79] Robert L. Glass. Software Reliability Guidebook. Prentice-Hall, Engle-
wood Cliffs, NJ, 1979.
[Gla99a] Robert L. Glass. Inspections - some surprise findings. Communications
ACM, 42(4):17–19, 1999.
[Gla99b] Robert L. Glass. The realities of software technology payoffs. Communi-
cations ACM, 42(2):74–79, 1999.
[Gle96] James Gleick. A bug and a crash. New York Times Magazine, December
1996.
[God05] Patrice Godefroit. The soundness of bugs is what matters. In Proceed-
ings of BUGS’2005 (PLDI’2005 Workshop on the Evaluation of Software
Defect Detection Tools), Chicago, IL, June 2005.
[Gra86] Jim Gray. Why do computers stop and what can be done about it? Proc.
5th Symp. on Reliability in Distributed Software and Database Systems,
pages 3–12, 1986.
[Gra00] Codesurfer technology overview: Dependence graphs and program slic-
ing. Technical report, GrammaTech, 2000.
[Gri04a] Chris Grindstaff. Findbugs, part 1: Improve the quality of your code
why and how to use findbugs. IBM DeveloperWorks, May 2004.
[Gri04b] Chris Grindstaff. Findbugs, part 2: Writing custom detectors how to
write custom detectors to find application-specific problems. IBM Devel-
operWorks, May 2004.
[Gro01] Michael Grottke. Software Reliability Model Study. Technical Report
IST-1999-55017, PETS, January 2001.
[GT97] Swapna Gokhale and Kishor Trivedi. Structure-based software reliability
prediction. In Proc. of Advanced Computing (ADCOMP), Chennai, India,
1997.
[GT05] Michael Grottke and Kishor S. Trivedi. A classification of software faults.
In Supplementary Proceedings 16th IEEE International Symposium on
Software Reliability Engineering, Chicago, Illinois, 8-11 November 2005.
[Hac04] Mark Hachman. NASA: DOS glitch nearly killed mars rover. Extreme-
Tech, August 2004.
[Had99] P. Haddaway. An overview of some recent developments in Bayesian
problem solving techniques, 1999.
[Hal99] Todd Halvorson. Air Force Titan 4 rocket program suffers another failure.
Florida Today, May 8 1999.
[Hal03] Christopher D. Hall. When spacecraft wont point. In 2003 AAS/AIAA
Astrodynamics Specialists Conference, Big Sky, Montana, August 2003.
[Har99] K. J. Harrison. Static code analysis on the C-130J Hercules safety-critical
software. Technical report, Aerosystems International, UK, 1999.
[Hat95] Les Hatton. Safer C: Developing for High-Integrity and Safety-Critical
Systems. McGraw-Hill, January 1995.
[Hat99a] Les Hatton. Ariane 5: A smashing success. Software Testing and Quality
Engineering, 1(2), 1999.
[Hat99b] Les Hatton. Software faults and failures: Avoiding the avoidable and
living with the rest. Draft text from “Safer Testing” Course, December
1999.
[Hat07] Les Hatton. Language subsetting in an industrial context: a compari-
son of MISRA C 1998 and MISRA C 2004. Information and Software
Technology, 49(1):475–482, May 2007.
[HD03] Elise Hewett and Paul DiPalma. A survey of static and dynamic analyzer
tools. In Proceedings 1st CSci 780 Symposium on Software Engineering,
The College of William and Mary, December 15-16 2003.
[HFGO94] Monica Hutchins, Herb Foster, Tarak Goradia, and Thomas Ostrand.
Experiments of the effectiveness of dataflow and controlflow based test
adequacy criteria. In ICSE ’94: Proceedings of the 16th International
Conference on Software Engineering, pages 191–200, Los Alamitos, CA,
USA, 1994. IEEE Computer Society Press.
[HJv00] Marieke Huisman, Bart Jacobs, and Joachim van den Berg. A case study
in class library verification: Java’s Vector Class. Technical Report CSI-
R0007, 2000.
[HL02] Sudheendra Hangal and Monica S. Lam. Tracking down software bugs
using automatic anomaly detection. In Proceedings of the 24th Interna-
tional Conference on Software Engineering, May 2002.
[HLL94] Joseph R. Horgan, Saul London, and Michael R. Lyu. Achieving software
quality with testing coverage measures. IEEE Computer, 27(9):60–69,
September 1994.
[Hof99] Eric J. Hoffman. The NEAR rendezvous burn anomaly of december 1998.
Technical report, Johns Hopkins University, 1999.
[Hol99] C. Michael Holloway. From bridges to rockets: Lessons for software sys-
tems. In Proceedings of the 17th International System Safety Conference,
pages 598–607, August 1999.
[Hol04] Ralf Holly. Lint metrics and ALOA. C/C++ Users Journal, pages 18–22,
June 2004.
[Hot01] Chris Hote. Run-time error detection through semantic analysis: A
breakthrough solution to todays software testing inadequacies in auto-
motive. Technical report, Polyspace Technologies, September 2001.
[HP00] Klaus Havelund and Thomas Pressburger. Model checking JAVA pro-
grams using JAVA PathFinder. STTT, 2(4):366–381, 2000.
[HV93] Dick Hamlet and Jeff Voas. Faults on its sleeve: amplifying software
reliability testing. In ISSTA ’93: Proceedings of the 1993 ACM SIGSOFT
international symposium on Software testing and analysis, pages 89–98,
New York, NY, USA, 1993. ACM Press.
[HW06] Sarah Heckman and Laurie Williams. Automated adaptive ranking and
filtering of static analysis alerts. In 17th International Symposium on
Software Reliability Engineering, Raleigh, NC, November 2006.
[ISO90] International Standard ISO/IEC9899 - Programming Languages - C, De-
cember 1990. ISO/IEC9899-1990.
[ISO99] International Standard ISO/IEC9899 - Programming Languages - C, De-
cember 1999. ISO/IEC9899-1999.
[ISO03] ISO/IEC 14882 Programming languages C++ (Langages de programma-
tion C++). Technical report, International Standard ISO/IEC, Ameri-
can National Standards Institute, 25 West 43rd Street, New York, New
York 10036, October 15 2003.
[JBl] Jblanket. Online at. http://csdl.ics.hawaii.edu/Tools/JBlanket/.
[Jel04] Rick Jelliffe. Mini-review of Java bug finders. The O’Reilly Network,
March 15 2004.
[Jes04] Anick Jesdanun. GE energy acknowledges blackout bug. The Associated
Press, February 2004.
[JM97] Jean-Marc Jezequel and Bertrand Meyer. Design by contract: The lessons
of Ariane. Computer, 30(1):129–130, 1997.
[Joh78] S.C. Johnson. Lint, a C Program Checker. Unix Programmer’s Man-
ual 65, AT&T Bell Laboratories, 1978.
[KA] Konstantin Knizhnik and Cyrille Artho. JLint manual.
[Kan95] Cem Kaner. Software negligence and testing coverage. Technical report,
Florida Tech, 1995.
[KAYE04] Ted Kremenek, Ken Ashcraft, Junfeng Yang, and Dawson Engler. Corre-
lation exploitation in error ranking. In SIGSOFT ’04/FSE-12: Proceed-
ings of the 12th ACM SIGSOFT / Twelfth International Symposium on
Foundations of Software Engineering, pages 83–93, New York, NY, USA,
2004.
[KL86] J. C. Knight and N. G. Leveson. An experimental evaluation of the
assumption of independence in multiversion programming. IEEE Trans-
actions on Software Engineering, 12(1):96–109, 1986.
[KL90] John C. Knight and Nancy G. Leveson. A reply to the criticisms of the
Knight & Leveson experiment. SIGSOFT Softw. Eng. Notes, 15(1):24–
35, 1990.
[Koc04] Christopher Koch. Bursting the CMM hype. Software Quality, March 1
2004.
[Lad96] Peter B. Ladkin. Excerpt from the Case Study of The Space Shuttle
Primary Control System. Excerpted 12 August 1996, Communications
of the ACM 27(9), September 1984, p886, August 1996.
[LAW+04] Kathryn Laskey, Ghazi Alghamdi, Xun Wang, Daniel Barbara, Tom
Shackelford, Ed Wright, and Julie Fitzgerald. Detecting threatening be-
havior using bayesian networks. In Proceedings of the Conference on
Behavioral Representation in Modeling and Simulation, 2004.
[LB05] Marc Lalo and Steve Barriault. Maximizing software reliability and de-
veloper’s productivity in automotive: Run-time errors, MISRA, and se-
mantic analysis. Technical report, Polyspace Technologies, 2005.
[LE01] David Larochelle and David Evans. Statically detecting likely buffer over-
flow vulnerabilities. In USENIX Security Symposium, pages 177–190,
Washington, D. C., August 13-17 2001.
[Lee94] S. C. Lee. How Clementine really failed and what NEAR can learn. Johns Hopkins University Applied Physics Laboratory Memorandum, May 26 1994.
[Lev94] Nancy G. Leveson. High-pressure steam engines and computer software.
IEEE Computer, pages 65–73, October 1994.
[LG99] Craig Larman and Rhett Guthrie. Java 2 Performance and Idiom Guide.
1999.
[Lio96] J. L. Lions. Ariane 5 flight 501 failure report by the inquiry board.
Technical report, CNES, 1996.
[LL05] V. Benjamin Livshits and Monica S. Lam. Finding security vulnerabili-
ties in Java applications with static analysis. In 14th USENIX Security
Symposium, 2005.
[LLQ+05] Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou.
Bugbench: Benchmarks for evaluating bug detection tools. In Proceedings
of the Workshop on the Evaluation of Software Defect Detection Tools,
June 2005.
[LM95] Naixin Li and Y. K. Malaiya. ROBUST: a next generation software re-
liability engineering tool. In Proceedings of the Sixth International Sym-
posium on Software Reliability Engineering, pages 375–380, Toulouse,
France, October 24–27 1995.
[LT93] Nancy Leveson and Clark S. Turner. An investigation of the Therac-25
accidents. IEEE Computer, 26(7):18–41, 1993.
[Lyu95] Michael R. Lyu, editor. Handbook of Software Reliability Engineering.
Number ISBN 0-07-039400-8. McGraw-Hill publishing, 1995.
[Maj03] Dayle G. Majors. An investigation of the call integrity of the Linux
System. In Fast Abstract ISSRE, 2003.
[Mar99] Brian Marick. How to misuse code coverage. Technical report, Reliable
Software Technologies, 1999.
[MB06] Robert A. Martin and Sean Barnum. A status update: The common
weaknesses enumeration. In NIST Static Analysis Summit, Gaithersburg,
MD, June 29 2006.
[McA] McAfee. W32/nachi.worm.
[MCJ05] Robert A. Martin, Steven M. Christey, and Joe Jarzombek. The case for
common flaw enumeration. Technical report, MITRE Corporation, 2005.
[ME03] M. Musuvathi and D. Engler. Some lessons from using static analysis
and software model checking for bug finding, 2003.
[Mef05] Barmak Meftah. Benchmarking bug detection tools. In Workshop on the
Evaluation of Software Defect Detection Tools, June 2005.
[Mey92] Scott (Scott Douglas) Meyers. Effective C++: 50 specific ways to improve
your programs and designs. Addison-Wesley professional computing se-
ries. Addison Wesley Professional, 75 Arlington Street, Suite 300 Boston,
MA 02116, 1992.
[Mey01] Sleighton Meyer. Harris Corporation completes acceptance test of FAA's voice switching and control system (VSCS) upgrade. Corporate Press Release, November 2001.
[MIO90] John D. Musa, Anthony Iannino, and Kazuhira Okumoto. Software Re-
liability: Measurement, Prediction, Application. McGraw-Hill, Inc., New
York, NY, USA, Professional edition, 1990.
[MIS98] MISRA-C guidelines for the use of the C language in critical systems. The Motor Industry Software Reliability Association, 1998.
[MIS04] MISRA-C:2004 guidelines for the use of the C language in critical systems. The Motor Industry Software Reliability Association, October 2004.
[MKBD00] Eric Monk, J. Paul Keller, Keith Bohnenberger, and Michael C. Daconta.
Java Pitfalls: Time-Saving Solutions and Workarounds to Improve Pro-
grams (Paperback). John Wiley & Sons, 2000.
[MLB+94] Y.K. Malaiya, N. Li, J. Bieman, R. Karcich, and B. Skibbe. The relation-
ship between test coverage and reliability. In Proc. Int. Symp. Software
Reliability Engineering, pages 186–195, November 1994.
[MMZC06] Kevin Mattos, Christine Moreira, Mark Zingarelli, and Denis Coffey. The
effect of rapid commercial off-the-shelf (COTS) software insertion on the
software reliability of large-scale undersea combat systems. In 17th Inter-
national Symposium on Software Reliability Engineering, Raleigh, North
Carolina, November 2006.
[MvMS93] Y.K. Malaiya, A. von Mayrhauser, and P.K. Srimani. An examination
of fault exposure ratio. IEEE Transactions on Software Engineering,
19(11):1087–1094, November 1993.
[Mye76] Glenford J. Myers. Software Reliability: Principles and Practices. John
Wiley & Sons, 1976.
[Nag05] Nachiappan Nagappan. A software testing and reliability early warning
(STREW) metric suite. PhD thesis, 2005. Chair-Laurie A. Williams.
[NB05] Nachiappan Nagappan and Thomas Ball. Static analysis tools as early
indicators of pre-release defect density. In International Conference on
Software Engineering, (ICSE 2005)., 2005.
[NBZ06] Nachiappan Nagappan, Thomas Ball, and Andreas Zeller. Mining metrics
to predict component failures. In International Conference on Software
Engineering, Shanghai, China, May 2006.
[Neu99] Peter G. Neumann. The risks digest. Online Digest of Computing Failures
and Risks, September 15 1999.
[NF96] Martin Neil and Norman Fenton. Predicting software quality using
Bayesian Belief Networks. In Proc 21st Annual Software Eng Workshop,
pages 217–230, NASA Goddard Space Flight Centre, December 1996.
[NIS06] NIST. Source Code Analysis Tool Functional Specification. Technical re-
port, National Institute of Standards and Technology Information Tech-
nology Laboratory Software Diagnostics and Conformance Testing Divi-
sion, September 15 2006.
[NM96] Li Naixin and Y.K. Malaiya. Fault exposure ratio estimation and ap-
plications. In Seventh International Symposium on Software Reliability
Engineering (ISSRE ’96) p. 372, 1996.
[Nta97] Simeon Ntafos. The cost of software failures. In Proceedings of IASTED
Software Engineering Conference, pages 53–57, November 1997.
[NWV03] Nachiappan Nagappan, Laurie Williams, and Mladen Vouk. Towards a
metric suite for early software reliability assessment. In FastAbstract in
Supplementary Proceedings, International Symposium on Software Reli-
ability Engineering, 2003.
[NWV+04] Nachiappan Nagappan, Laurie Williams, Mladen Vouk, John Hudepohl,
and Will Snipes. A preliminary investigation of automated software in-
spection. In IEEE International Symposium on Software Reliability En-
gineering, pages 429–439., 2004.
[NWVO04] Nachiappan Nagappan, Laurie Williams, Mladen Vouk, and Jason Os-
borne. Initial results of using in-process testing metrics to estimate soft-
ware reliability. Technical report, North Carolina State University, 2004.
[OS00] Emilie O’Connell and Hossein Saiedian. Can you trust software capability
evaluations? Computer, 33(2):28–35, 2000.
[OWB04] Thomas J. Ostrand, Elaine J. Weyuker, and Robert M. Bell. Using static
analysis to determine where to focus dynamic testing effort. In Second In-
ternational Workshop on Dynamic Analysis, Edinburgh, Scotland, 2004.
[Pai] Ganesh J. Pai. A survey of software reliability models. A Project Report
CS 651: Dependable Computing.
[Pai01] Ganesh J Pai. Combining bayesian belief networks with fault trees to
enhance software reliability analysis. In Proceedings of the IEEE Inter-
national Symposium on Software Reliability Engineering, November 2001.
[Par] Terence Parr. Antlr parser generator. http://www.antlr.org.
[Pat02] David A. Patterson. A simple way to estimate the cost of downtime. In
LISA ’02: Proceedings of the 16th USENIX conference on System admin-
istration, pages 185–188, Berkeley, CA, USA, 2002. USENIX Association.
[Pav99] J. G. Pavlovich. Formal report of investigation of the 30th April 1999 Ti-
tan IV B/Centaur TC-14/Milstar-3 (B-32) Space Launch Mishap. Tech-
nical report, U.S. Air Force, 1999.
[PD01] Ganesh J. Pai and Joanne Bechta Dugan. Enhancing Software Relia-
bility Estimation Using Bayesian Networks and Fault Trees. In ISSRE
FastAbstracts 2001. Chillarege, 2001.
[Pet94] Henry Petroski. Design Paradigms: Case Histories of Error and Judge-
ment in Engineering. Cambridge University Press, 1994.
[Pil03] Daniel Pilaud. Finding run time errors without testing in embedded
systems. Minatec, 2003. Keynote Address by Daniel Pilaud, Chairman
PolySpace Technologies, http://www.polyspace.com.
[POC93] Paul Piwowarski, Mitsuru Ohba, and Joe Caruso. Coverage measurement
experience during function test. In ICSE ’93: Proceedings of the 15th
international conference on Software Engineering, pages 287–301, Los
Alamitos, CA, USA, 1993. IEEE Computer Society Press.
[Pol] PolySpace for C++. Product Brochure.
[Pou03] Kevin Poulsen. Nachi worm infected Diebold ATMs. The Register,
November 25th 2003.
[Pou04a] Kevin Poulsen. Software bug contributed to blackout. SecurityFocus,
February 2004.
[Pou04b] Kevin Poulsen. Tracking the blackout bug. The Register, April 2004.
[Pro] The Programming Research Group. High Integrity C++ Coding Standard Manual, 2.2 edition.
[QAC98] QAC Clinic. Available Online, 1998. Available from
http://www.toyo.co.jp/ss/customersv/doc/qac clinic1.pdf.
[RAF04] Nick Rutar, Christian B. Almazan, and Jeffrey S. Foster. A comparison
of bug finding tools for Java. In Proceedings of the 15th IEEE Symposium
on Software Reliability Engineering, Saint-Malo, France, November 2004.
[Rai05] Abhishek Rai. On the role of static analysis in operating system checking
and runtime verification. Technical report, Stony Brook University, May
2005. Technical Report FSL-05-01.
[Ree97] Glenn E Reeves. What really happened on Mars. E-mail discussion of
the failure of the Mars Pathfinder spacecraft, December 1997.
[RGT00] S. Ramani, S. Gokhale, and K. S. Trivedi. Software reliability estimation
and prediction tool. Performance Evaluation, 39:37–60, 2000.
[Ric00] Debra J. Richardson. Static analysis. ICS 224: Software Testing and
Analysis Class Notes, Spring 2000.
[Roo90] Paul Rook, editor. Software Reliability Handbook. Centre for Software
Reliability, City University, London, U.K., 1990.
[SA05a] Walter Schilling and Mansoor Alam. A methodology for estimating soft-
ware reliability using limited testing. In Supplemental Proceedings ISSRE
2005: The 16th International Symposium on Software Reliability Engi-
neering, Chicago, IL, November 2005. IEEE Computer Society and IEEE
Reliability Society.
[SA05b] Walter Schilling and Mansoor Alam. Work In Progress - Measuring the ROI Time for Static Analysis. In 2005 Frontiers in Education, Indianapolis, IN, October 2005. IEEE Computer Society / ASEE.
[SA06a] Walter Schilling and Mansoor Alam. The software static analysis
reliability toolkit. In Supplemental Proceedings ISSRE 2006: The 17th
International Symposium on Software Reliability Engineering, Raleigh,
NC, November 2006. IEEE Computer Society and IEEE Reliability So-
ciety.
[SA06b] Walter Schilling and Mansoor Alam. Estimating software reliability with
static analysis technique. In Proceedings of the 15th International Confer-
ence on Software Engineering and Data Engineering (SEDE-2006), Los
Angeles, California, 2006. International Society for Computers and their
Applications (ISCA).
[SA06c] Walter Schilling and Mansoor Alam. Integrate static analysis into a
software development process. Embedded Systems Design, 19(11):57–66,
November 2006.
[SA06d] Walter Schilling and Mansoor Alam. Modeling the reliability of existing
software using static analysis. In Proceedings of the 2006 IEEE Inter-
national Electro/Information Technology Conference, East Lansing, MI,
2006. IEEE Region IV.
[SA07a] Walter Schilling and Mansoor Alam. Measuring the reliability of exist-
ing web servers. In Proceedings of the 2007 IEEE International Elec-
tro/Information Technology Conference, Chicago, IL, 2007. IEEE Region
IV.
[SA07b] Walter W. Schilling and Mansoor Alam. Evaluating the Effectiveness of
Java Static Analysis Tools. In Proceedings of the International Conference
on Embedded Systems and Applications, Las Vegas, NV, June 2007.
[Sch04a] Walter Schilling. Issues affecting the readiness of the Java language for usage in safety critical real time systems. Submitted to fulfill partial requirements for Special Topics: Java EECS8980-001, Dr. Gerald R. Heuring, Instructor, May 2004.
[Sch04b] Katherine V. Schinasi. Stronger Management Practices Are Needed to Improve DOD's Software-Intensive Weapon Acquisitions. Technical report, Government Accounting Office, 2004.
[Sch05] Walter Schilling. Embedded systems software reliability. In NASA / Ohio
Space Grant Consortium 2004-2005 Annual Student Research Symposium
Proceedings XIII, Cleveland, Ohio, 2005. Ohio Space Grant Consortium.
[Sch07] Walter Schilling. Relating software reliability to execution rates using
bayesian belief networks. In NASA / Ohio Space Grant Consortium 2006-
2007 Annual Student Research Symposium Proceedings XV, Cleveland,
Ohio, 2007. Ohio Space Grant Consortium.
[SDWV05] Michele Strom, Martin Davidson, Laurie Williams, and Mladen Vouk. The "Good Enough" Reliability Tool (GERT) - Version 2. In Supplementary Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering (ISSRE 2005), Chicago, Illinois, 8-11 November 2005.
[Sha06] Lui Sha. The complexity challenge in modern avionics software. In Na-
tional Workshop on Aviation Software Systems: Design for Certifiably
Dependable Systems, 2006.
[Sho84] Martin L. Shooman. Software Reliability: a historical perspective. IEEE
Transactions on Reliability, R-33(1), April 1984.
[Sho96] Martin L. Shooman. Avionics software problem occurrence rates. In The
Seventh International Symposium on Software Reliability Engineering,
1996.
[Sip97] Michael Sipser. Introduction to the Theory of Computation. PWS Pub-
lishing Company, 20 Park Plaza, Boston, MA 02116-4324, 1997.
[SK95] Hossein Saiedian and Richard Kuzara. SEI Capability Maturity Model’s
impact on contractors. Computer, 28(1):16–26, 1995.
[Sla98a] Gregory Slabodkin. Control-system designers say newer version could
have prevented LAN crash. GCN, December 14 1998.
[Sla98b] Gregory Slabodkin. Software glitches leave navy smart ship dead in the
water. GCN, July 13 1998.
[Sop01] Joe Sopko. CTC195, 197, 203, No Sound. National Electronic Service
Dealers Association of Ohio Newsletter, page 10, June 2001.
[Ste02] Henry Stewart. Meeting FDA requirements for validation of medical device software, September 2002. Briefing Advertisement.
[Sto05] Walt Stoneburner. Software reliability overview. SMERFS Website, 2005.
[SU99] Curt Smith and Craig Uber. Experience report on early software reli-
ability prediction and estimation. In 10th International Symposium on
Software Reliability Engineering, November 1999.
[SWA+00] Donald Savage, Helen Worth, Diane E. Ainsworth, George Diller, and
Keith Takahashi. The Near Earth Asteroid Rendezvous: A Guide to the
Mission, the Spacecraft, and the People. NASA, 2000.
[SWX05] Sarah E. Smith, Laurie Williams, and Jun Xu. Expediting Program-
mer AWAREness of Anomalous Code. In Supplemental Proceedings 16th
International Symposium on Software Reliability Engineering, Chicago,
Illinois, 2005.
[Sys02] QA Systems. Overview large Java project code quality analysis. Technical
report, QA Systems, 2002.
[Tas02] Gregory Tassey. The economic impacts of inadequate infrastructure for
software testing. Technical Report RTI 7007.011, National Institute of
Standards and Technology, May 2002.
[Tha96] Henrik Thane. Safe and reliable computer control systems: Concepts and methods. Technical report, Mechatronics Laboratory, Department of Machine Design, Royal Institute of Technology (KTH), S-100 44 Stockholm, Sweden, 1996.
[Tri02] Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing and
Computer Science Applications. John Wiley and Sons, Inc., 2002.
[VB04] Arnaud Venet and Guillaume Brat. Precise and efficient static array
bound checking for large embedded C programs. In PLDI ’04: Proceed-
ings of the ACM SIGPLAN 2004 conference on Programming language
design and implementation, pages 231–242. ACM Press, 2004.
[VBKM00] J. Viega, J. T. Bloch, Y. Kohno, and G. McGraw. ITS4: A static vul-
nerability scanner for C and C++ code. In ACSAC ’00: Proceedings of
the 16th Annual Computer Security Applications Conference, page 257,
Washington, DC, USA, 2000. IEEE Computer Society.
[vGB99] Jilles van Gurp and Jan Bosch. Using Bayesian Belief Networks in As-
sessing Software Designs. In ICT Architectures ’99, November 1999.
[VT01] Kalyanaraman Vaidyanathan and Kishor S. Trivedi. Extended classi-
fication of software faults based on aging. In The 12th International
Symposium on Software Reliability Engineering, 2001.
[Wag04] Stefan Wagner. Efficiency analysis of defect-detection techniques. Technical Report TUMI-0413, Institut für Informatik, Technische Universität München, 2004.
[Wal01] Dolores R. Wallace. Practical software reliability modeling. In Proceed-
ings 26th Annual NASA Goddard Software Engineering Workshop, pages
147–155, November 27-29 2001.
[Wal04] Matthew L. Wald. Maintenance lapse blamed for air traffic control problem. New York Times, September 2004.
[Whe04] David A. Wheeler. Flawfinder, May 30 2004. Manual Page.
[wir98] Sunk by Windows NT. wired.com, July 1998.
[WJKT05] Stefan Wagner, Jan Jürjens, Claudia Koller, and Peter Trischberger. Comparing bug finding tools with reviews and tests. In Proceedings of Testing of Communicating Systems: 17th IFIP TC6/WG 6.1 International Conference, TestCom 2005, Montreal, Canada, May - June 2005. Springer-Verlag GmbH.
[Woe99] Jack J. Woehr. A conversation with Glenn Reeves: Really remote debug-
ging for real-time systems. Dr. Dobb’s Journal, November 1999.
[Xie91] M. Xie. Software Reliability Modeling. World Scientific Publishing Com-
pany, Singapore, 1991.
[XNHA05] Yichen Xie, Mayur Naik, Brian Hackett, and Alex Aiken. Soundness and
its role in bug detection systems. In Proceedings of Workshop on the
Evaluation of Software Defect Detection Tools (BUGS’05), June 2005.
[XP04] Shu Xiao and Christopher H. Pham. Performing high efficiency source
code static analysis with intelligent extensions. In APSEC, pages 346–
355, 2004.
[YB98] David York and Maria Babula. Virtual interactive classroom: A new
technology for distance learning developed. Research & Technology 1998,
1998.
[YP99] David York and Joseph Ponyik. New Web Server - the Java version of
Tempest - Produced. Research and Technology, 1999.
[ZLL04] Misha Zitser, Richard Lippmann, and Tim Leek. Testing static analysis
tools using exploitable buffer overflows from open source code. SIGSOFT
Software Engineering Notes, 29(6):97–106, 2004.
[ZWN+06] Jiang Zheng, Laurie Williams, Nachiappan Nagappan, Will Snipes,
John P. Hudepohl, and Mladen A. Vouk. On the value of static analysis
for fault detection in software. IEEE Transactions on Software Engineer-
ing, 32(4):240–253, 2006.
Appendix A
Fault Taxonomy
Table A.1: SoSART Static Analysis Fault Taxonomy
Num. Categorization Name Description
100 Input Validation General Input Validation problem
101 Input Validation Path Manipulation Allowing user input to control paths used by the application may enable an attacker to access otherwise protected files.
102 Input Validation Cross Site Scripting (Basic XSS) 'Basic' XSS involves a complete lack of cleansing of any special characters, including the most fundamental XSS elements such as < and >.
103 Input Validation Resource Injection Allowing user input to control resource identifiers might enable an attacker to access or modify otherwise protected system resources.
104 Input Validation OS Command Injection Command injection problems are a subset of injection problems, in which the process is tricked into calling external processes of the attacker's choice through the injection of control-plane data into the data plane. Also called "shell injection".
105 Input Validation SQL Injection SQL injection attacks are another instantiation of injection attack, in which SQL commands are injected into data-plane input in order to effect the execution of predefined SQL commands.
200 Range Errors General Range Error Problem
201 Range Errors Stack overflow A stack overflow condition is a buffer overflow condition, where the buffer being overwritten is allocated on the stack (i.e., is a local variable or, rarely, a parameter to a function).
202 Range Errors Heap overflow A heap overflow condition is a buffer overflow, where the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as the POSIX malloc() call.
203 Range Errors Format string vulnerability Format string problems occur when a user has the ability to control or write completely the format string used to format data in the printf style family of C/C++ functions.
204 Range Errors Improper Null Termination The product does not properly terminate a string or array with a null character or equivalent terminator. Null termination errors frequently occur in two different ways. An off-by-one error could cause a null to be written out of bounds, leading to an overflow. Or, a program could use a strncpy() function call incorrectly, which prevents a null terminator from being added at all. Other scenarios are possible.
205 Range Errors Array Length problem Array length [%2d,%3d] is or may be less than zero.
206 Range Errors Index out of Bounds Index [%2d,%3d] is or may be out of array bounds.
300 API Abuse General API Abuse Problem
301 API Abuse Heap Inspection Using realloc() to resize buffers that store sensitive information can leave the sensitive information exposed to attack because it is not removed from memory.
302 API Abuse Often Misused: String Management Functions that manipulate strings encourage buffer overflows.
400 Security Features General Security Feature Problem
401 Security Features Hard-Coded Password Storing a password in plain text may result in a system compromise.
500 Time and State General Time and State Problem
501 Time and State Time-of-check Time-of-use race condition Time-of-check, time-of-use race conditions occur when, between the time in which a given resource (or its reference) is checked and the time that resource is used, a change occurs in the resource to invalidate the results of the check.
502 Time and State Unchecked Error Condition Ignoring exceptions and other error conditions may allow an attacker to induce unexpected behavior unnoticed.
600 Code Quality General Code Quality problem
601 Code Quality Memory leak Most memory leaks result in general software reliability problems, but if an attacker can intentionally trigger a memory leak, the attacker might be able to launch a denial of service attack (by crashing the program) or take advantage of other unexpected program behavior resulting from a low memory condition.
602 Code Quality Unrestricted Critical Resource Lock A critical resource can be locked or controlled by an attacker, indefinitely, in a way that prevents access to that resource by others, e.g. by obtaining an exclusive lock or mutex, or modifying the permissions of a shared resource. Inconsistent locking discipline can lead to deadlock.
603 Code Quality Double Free Calling free() twice on the same value can lead to a buffer overflow.
604 Code Quality Use After Free Use after free errors sometimes have no effect and other times cause a program to crash.
605 Code Quality Uninitialized variable Most uninitialized variable issues result in general software reliability problems, but if attackers can intentionally trigger the use of an uninitialized variable, they might be able to launch a denial of service attack by crashing the program.
606 Code Quality Unintentional pointer scaling In C and C++, one may often accidentally refer to the wrong memory due to the semantics of when math operations are implicitly scaled.
607 Code Quality Improper pointer subtraction The subtraction of one pointer from another in order to determine size is dependent on the assumption that both pointers exist in the same memory chunk.
608 Code Quality Null Dereference Using the NULL value of a dereferenced pointer as though it were a valid memory address.
700 Encapsulation General Encapsulation problem
701 Encapsulation Private Array-Typed Field Returned From A Public Method The contents of a private array may be altered unexpectedly through a reference returned from a public method.
702 Encapsulation Public Data Assigned to Private Array-Typed Field Assigning public data to a private array is equivalent to giving public access to the array.
703 Encapsulation Overflow of static internal buffer A non-final static field can be viewed and edited in dangerous ways.
704 Encapsulation Leftover Debug Code Debug code can create unintended entry points in an application. Output on System.out or System.err. Some programmers debug code with a debugger, some use printouts on System.out and System.err. Some printouts may by mistake not be removed when the debug session is over. This rule flags output using System.out.XX(), System.err.XX(), and Exception.printStackTrace().
1000 Operator Precedence General Operator Priority Problem
1001 Operator Precedence Logical Operator Precedence Problem May be wrong assumption about logical operators precedence.
1002 Operator Precedence Shift Operator Precedence Problem May be wrong assumption about shift operator priority.
1003 Operator Precedence Bit Operator Precedence Problem May be wrong assumption about bit operation priority.
1100 Object Oriented Problems General Object Oriented Problem
1101 Object Oriented Problems Incomplete Override This error condition indicates that there is an overridden method but other associated methods have not been overridden. Override both Object.equals() and Object.hashCode(). Some containers depend on both hash code and equals when storing objects [3]. Method %2m is not overridden by method with the same name of derived class %3c.
1102 Object Oriented Problems Component shadowing uncovered Component A in class B shadows one in base class C.
1103 Object Oriented Problems Run method not overridden The run method should be overridden when extending the Thread class [7]. This rule does not warn when the class is abstract.
1104 Object Oriented Problems Class Comparison Problem Rule 1034 - Compare classes using getClass() in non-final equals method. When comparing objects for equality, use the .getClass() method to make sure that the objects are of exactly the same type. Checking types with the instanceof operator only breaks the required equivalence relation [3] when a subclass has redefined the equals method.
1105 Object Oriented Problems Finalizer Behavior Problem A finalizer implementing only the default behavior is unnecessary. The general contract of finalize is that it is invoked if and when the virtual machine has determined that there is no longer any means by which this object can be accessed by any thread that has not yet died. Also, the finalize method is never invoked more than once by a Java virtual machine for any given object. A finalizer should always call the superclass' finalizer.
1106 Object Oriented Problems Bad Inheritance Path A class has been derived in an inappropriate manner.
1200 Logic ProblemsGeneral Logic Prob-lem
1201 Logic ProblemsMissing Body inSwitch Statement
Suspicious SWITCH without body
1202 Logic Problems Missing Break State-ment
Possible miss of BREAK before CASE/DEFAULT. Nobreak statement found. The flow of control fall into thecase of default statement below. Is this the intention oris there a break statement missing? If this was deliber-ate then place a comment immediately before the nextcase of default statement. Add a comment containingthe substring: ”fall through” or ”falls through”.
1203 Logic ProblemsLogic always executedthe same path
If condition always follows the same execution path dueto logic. Comparison always produces the same result.
1204 Logic Problems Suspicious ElseBranch Association
May be wrong assumption about ELSE branch associa-tion
1205 Logic Problems Suspicious If BranchAssociation
May be wrong assumption about IF body. if statementwith empty body. If statements directly followed by asemicolon is usually an error.
1206 Logic Problems Missing If Statement ELSE without IF
1207 Logic ProblemsUnreachable State-ment
The defined statement can not be reached.
1208 Logic ProblemsSuspicious Loop BodyAssociation
May be wrong assumption about loop bod. while state-ment with empty body. while statements directly fol-lowed by a semi-colon is usually an error. for statementwith empty body. for statements directly followed by asemi-colon is usually an error.
1209 Logic Problems Suspicious Case state-ment
Suspicious CASE/DEFAULT
1210 Logic Problems Missing While No WHILE for DO
1211 Logic Problems Improper string comparison Compare strings as object references. String comparison using == or != operator. Avoid using the == or != operator to compare strings. Use String.equals() or String.compareTo() when testing strings for equality. [1] (An illustrative sketch follows this table.)
1213 Logic Problems Missing Default Block
No default case in switch block. There was no ”default:”case in the switch block. Is this the intention? It isprobably better to include the default case with an ac-companying comment. If the flow of control should neverreach the default case, use an assertion for early error de-tection:
1214 Logic Problems True False boolean lit-eral test
Equality operation on ’true’ boolean literal. Avoidingequality operations on the ’true’ boolean literal savessome byte code instructions and in most cases it improvesreadability. Inequality operation on ’false’ boolean lit-eral. This rule flags inequality operations on the ’false’literal which is also an identity operation
1215 Logic Problems Dead Code DetectedThis rule flags dead code. Only ’if (false)’, ’while (false)’and ’for(..;false;..)’ are detected.
1216 Logic Problems Comparison Problem Compared expressions can be equal only when both ofthem are 0.
1217 Logic ProblemsCase value can not beproduced
Switch case constant %2d cant be produced by switchexpression.
1218 Logic Problems Zero operand Zero operand for %2s operation.
1219 Logic Problems Result always zero Result of operation %2s is always 0.
1220 Logic Problems Loop variable defini-tions
Something is wrong with the handling of a variablewithin a loop.
1300 Comments General CommentsProblem
1301 Comments Unclosed Comment Unclosed comments.
1302 Comments Nested Comment Nested comments.
1303 Comments Missing javadoc tagEach compilation unit should have an @author javadoctag.
1304 Comments Unknown Javadoc tagA javadoc tag not part of the standard tags [6] was found.Is this the purpose or a spelling mistake?
1400 Exception handlingGeneral ExceptionHandling Problem
1401 Exception handlingGeneral ExceptionHandling Problem
1402 Exception handling Suspicious Catch Suspicious CATCH/FINALLY
1403 Exception handlingCatch / Throw toogeneral
Catched exception too general. Exceptions should behandled as specific as possible. Exception, Runtime Ex-ception, Error and Throwable are too general [8]. Run-time Exception is inappropriate to catch except at thetop level of your program. Runtime Exceptions usuallyrepresents programming errors and indicate that some-thing in your program is broken. Catching Exceptionor Throwable is therefore also inappropriate since thatcatch clause will also catch Runtime Exception. Pre-fer throwing subclasses instead of the general exceptionclasses. Exception, Runtime Exception, Throwable andError are considered too general
1404 Exception handling Empty Block Empty catch block. Exceptions should be handled in thecatch block.
1406 Exception handling Return in FinallyBlock
Avoid using the return statement in a finally block.
1500 Syntax Error General Syntax Error Uncategorized syntax errors.
1501 Syntax Error Missing colon No ':' after CASE. No ';' after FOR initialization part.
1502 Syntax Error Assignment Problem May be ’=’ used instead of ’==’
1503 Syntax ErrorEscape SequenceProblem
May be incorrect escape sequence
1504 Syntax ErrorInteger constant prob-lem
May be ’l’ is used instead of ’1’ at the end of integerconstant
1600 ImportGeneral Import Prob-lem
1601 Import Unused importUnused Import Avoid importing types that are neverused.
1602 ImportExplicit import ofjava.lang classes.
Explicit import of java.lang classes. All types in thejava.lang package are implicitly imported. There’s noneed to import them explicitly.
1603 Import Wildcard Import
Wildcard import. Demand import declarations are notallowed in the compilation unit. Enumerating the im-ported types explicitly makes it very clear to the readerfrom what package a type come from when used in thecompilation unit.
1604 Import Duplicate ImportDuplicate import. The imported type has already beenimported.
1605 ImportImporting classes fromcurrent package.
Classes from the same package are imported by default.There’s no need to import them explicitly.
1700 PackagingGeneral Packagingproblem
1701 Packaging No Package foundNo package declaration found. Try to structure yourclasses in packages.
1800 Style General Style Problem
1801 StyleSpace Tab Indentationproblem
Mixed indentation. Both spaces and tabs have been usedto indent the source code. If the file is indented with oneeditor it might look unindented in another if the tab sizeis not exactly the same in both editors. Prefer spacesonly.
1802 Style Wrong modifier order
Wrong order of modifiers. The order of modifiers shouldaccording to [1] be: Class modifiers: public protected pri-vate abstract static final strictfp Field modifiers: publicprotected private static final transient volatile Methodmodifiers: public protected private abstract static finalsynchronized native strictfp
1803 Style80 Character LineLength Exceeded
Avoid lines longer than 80 characters, since they’re nothandled well by many terminals and tools [5]. Printersusually have a 80 character line limit. The printed copywill be hard to read when lines are wrapped.
1804 Style Assert is reserved key-word in JDK 1.4
Prefer using another name to improve portability.
1805 Style Declare ParametersFinal
Assigning a value to the formal parameters can confusemany programmers because the formal parameter may beassociated with the actual value passed to the method.Confusion might also arise if the method is long and thevalue of the parameter is changed. If you do not intendto assign any value to the formal parameters you canstate this explicitly by declaring them final.
1806 Style Suspicious MethodName
The method has almost the same name as Ob-ject.hashCode, Object.equals or Object.finalize. Maybethe intention is to override one of these?
1900 Synchronization General Synchronization problem
1901 Synchronization Broken Double-checked locking idiom Double-Checked Locking is widely cited and used as an efficient method for implementing lazy initialization in a multithreaded environment. Unfortunately, this idiom does not work in the presence of either optimizing compilers or shared memory multiprocessors [2].
1902 Synchronization Potential DeadlockLoop %2d: invocation of synchronized method %3m cancause deadlock
1903 Synchronization Synchronized methodoverwritten
Synchronized method %2m is overridden by non-synchronized method of derived class
1904 Synchronization Unsynchronized callsMethod %2m can be called from different threads and isnot synchronized
1905 SynchronizationVariable volatilityproblem
Field %2u of class %3c can be accessed from differentthreads and is not volatile
1906 Synchronization Potential Race Condi-tion
Value of lock %2u is changed outside synchronization orconstructor. Value of lock %2u is changed while (poten-tially) owning it
2000 Variables General VariableProblem
2001 Variables Unused VariableAn unused local variable may indicate a flaw in the pro-gram.
2002 Variables Comparison TypeProblem
Comparison of short with char.
2003 Variables Typecast problem Maybe type cast is not correctly applied.
2004 VariablesTruncation results indata loss
Data can be lost as a result of truncation to %2s.
2100 ConstructorGeneral ConstructorProblem
2101 ConstructorSuperclass constructornot called
A class which extends a class does not call the superclassconstructor.
Appendix B
SOSART Requirements
B.1 Functional Requirements
Requirement Rationale
F.1
The SOSART analysis tool shall be capable of loading any Java 1.4.2 compliant source code module and creating UML based activity diagrams / control flow diagrams for each method within the source code module.
The first project to be analyzed requires Java support.
F.2
Based upon a loaded source code module, SOSART shall be capable of generating watchpoints which can be used to collect execution profiles. Watchpoints shall be located at the entry to each and every code block, as well as at all return statements.
Execution traces require indications of the location for the start of each code block, and this must be obtained by structurally analyzing the source code module. (An illustrative instrumentation sketch follows this requirements table.)
F.3
The SOSART tool shall be capable of visualizing execution traces which have been captured by the tool. Visualization shall occur on the activity diagram / flow diagram representation of the program.
Aids in understanding the execution flow and its relationship to the various code blocks within the method.
F.3.1
The number of times a given path has been executed shall be displayed on the activity diagram when an execution path is added to the display.
Usability and understanding of the executed paths.
F.4
The SOSART tool shall have the capability to store, external to the program, a historical database which represents all warnings which have been analyzed during program execution.
Allows historical trending to be used to improve the accuracy of the model.
F.4.1
Transfer to the historical database shall be at the operator's command.
Prevents erroneous analysis results from being transferred into the historical database.
F.4.2
The SOSART tool shall allow the operator to clear the historical database as is necessary.
This allows projects of entirely different scope to be analyzed without data contamination by projects from different domains.
F.4.3
The SOSART tool shall allow the operator to store and retrieve different historical databases as is necessary.
This allows projects from different domains to be analyzed independently.
F.5
SOSART shall be capable of calculating Cyclomatic Complexity and Static Path count on a per method basis.
These are basic metrics which should be readily available when analyzing COTS and other developed software.
F.6
SOSART shall be capable of importing static analysis warnings and displaying the warnings on the generated activity diagrams / control flow diagrams.
Visualization of static analysis warnings relative to the execution profile obtained during program execution.
F.6.1
SOSART shall be capable of interfacing at minimum with the following Java static analysis tools: 1. JLint (see note 1), 2. ESC/Java, 3. FindBugs, 4. Fortify SCA, 5. PMD, 6. QAJ, 7. Lint4J, 8. JiveLint, 9. Klocwork K7.
These are commonly available static analysis tools for Java which have been shown to be reliable and thus serve as a starting set for this analysis.
F.7
The SOSART GUI tool shall support commonly existing document interface behavior, including but not limited to image zooming, tiling, and printing of generated graphics.
These are standard GUI behaviors expected in a completed analysis tool.
F.7.1
SOSART shall automatically relay graphics displays when necessary, but shall also have a button to force the graphics to be relayed using the built-in algorithm.
Allows the user to force a relay of the display if a program failure occurs which prevents the proper display of the activity diagram graph.
F.8
The SOSART tool shall provide the capability to export generated graphics into a standard file format for graphics.
This allows generated graphics to be imported into reports and other documents as is necessary.
F.9
SOSART shall allow the user to save projects and generated metrics for future usage.
Ease of use for long term projects.
F.10
The SOSART tool shall allow the user to save analyzed static analysis warnings separate from the project. Warnings saved as such shall be recoverable with all attributes set to the values modified by the user during analysis.
This will allow larger projects to be analyzed which may not be storable in their complete format due to limitations of XML persistence within Java.
F.11
SOSART shall use a configuration file which shall store configuration parameters for the tool across multiple projects.
Allows the user to store common parameters and commands within a configuration file so that they do not need to be set when invoking the tool from the command line.
F.11.1
Configuration data shall be stored in the XML format.
XML is a standard markup language.
F.12
SOSART shall categorize imported static analysis warnings based upon a defined taxonomy.
Required for the proper characterization of warnings.
F.12.1
SOSART shall support the CWE and SAMATE taxonomies as well as a custom developed taxonomy for SOSART.
CWE and SAMATE are two existing taxonomies for static analysis warnings. However, these taxonomies target security as opposed to the larger domain of static analysis tools.
F.12.2
All static analysis warnings shall be categorized into the appropriate taxonomy upon importation if the categorization has not already occurred.
This allows the most efficient categorization of faults to the taxonomy definitions, as they are categorized by the user when a new instance is detected.
F.12.3
Fault taxonomy assignments shall be viewable within the SOSART tool by the operator.
This allows the user to view existing taxonomy assignments as is necessary.
F.13
The SOSART tool shall calculate the estimated software reliability using the reliability model developed by Schilling and Alam [SA06d] [SA06b] [SA05a].
This is one of the fundamental purposes for the tool.
F.13.1
The reliability shall be shown in a textual format which can be saved to a file.
Allows storage of reliability and importation into external reports.
F.14
The SOSART tool shall be capable of generating reports based upon the statically detectable faults imported into the tool.
Basic core functionality required for the tool to be a metadata tool.
F.14.1
The SOSART tool shall provide historical reports, project level reports, and file level reports.
These represent the three major classifications of faults supported by SOSART, as a fault is either historical in nature (in that it is from a previous project), part of a project (which consists of multiple files), or part of a file.
F.15
The SOSART tool shall allow for faults which are not directly attributable to a given method to be stored at the class level.
Certain static analysis faults may be located in such a manner that they are not related to a given method but are directly connected with the class declaration. While these faults do not directly play into this reliability model, they should be kept for metrics purposes.
F.16
The SOSART tool shall allow all file faults to be viewed in a listing separate from the individual method displays.
Under certain circumstances, it may be beneficial to visualize faults as a list instead of on the activity diagrams.
F.17
SOSART shall be capable of exporting data into XML format for importation into external programs.
XML is a standard language for interface between data systems.
F.18
The SOSART toolset shall contain a trace generator capable of logging branch execution traces in a manner which can be imported into the SOSART tool.
This is necessary in order to log execution traces for model usage.
1. Currently, the JLint tool does not include support for XML output. In order to interface properly with the SOSART tool, JLint will need to be improved to include XML output.
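As referenced in requirement F.2, the following sketch shows what a watchpoint-instrumented method might look like. This is a hypothetical illustration only: the Tracer class, its watchpoint() method, and the numbering scheme are invented for the example and are not the actual SOSART trace generator interface.

// Hypothetical illustration of requirement F.2: the entry to every code block
// and every return statement emits a watchpoint record that can later be
// imported as an execution profile.
public class TracedExample {

    static int max(int a, int b) {
        Tracer.watchpoint(1);        // entry to the method body block
        if (a > b) {
            Tracer.watchpoint(2);    // entry to the "then" block
            Tracer.watchpoint(3);    // return statement reached
            return a;
        }
        Tracer.watchpoint(4);        // return statement reached
        return b;
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7));
    }
}

// Minimal stand-in for a trace logger; a real generator would persist the
// execution counts in a form the analysis tool can read back in.
class Tracer {
    static void watchpoint(int id) {
        System.out.println("WP " + id);
    }
}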
B.2 Development Process Requirements
Requirement Rationale
D.1
The SOSART tool, when practical, shall be developed using the Personal Software Process.
The PSP represents a well documented approach to quality software development for small, individual projects. This is essential to ensure a reliably delivered analysis tool. However, being that this is a research tool being developed using other researchers' code as modules, it may not be feasible to follow a strict PSP process during development.
D.1.1
Development data shall be collected using the Software Process Dashboard.
Reliably following the PSP is simplified by the usage of tools which automatically collect the requisite data. By doing this, individual mistakes can be reduced.
D.2
Design documentation for the SOSART tool shall be created using UML design format.
UML represents a standard approach for software design which is readily understood by practitioners.
D.3
Source code developed for the SOSART tool shall be verified for coding standards compliance using the Checkstyle tool.
Enforcement of coding standards during development has been shown to reduce the number of defects in a final delivered product. Checkstyle helps to prevent common Java programming mistakes as well as ensuring consistent style.
D.3.1
The SOSART tool shall compile without any compiler warnings in the hand-coded segments.
This ensures that any compiler detected problems have been removed. While it is desirable to remove compiler warnings from the automatically generated code, this may not be feasible given the limitations of automatic code generation.
D.4
SOSART source code, design documentation, and other materials shall be kept under version management at all times through development.
Appropriate software engineering practice.
D.4.1
SOSART shall use the CVS version management system for all configuration management practices.
CVS is readily available as an open source project and is well supported and extremely extensible.
D.5
SOSART shall be released through the SourceForge site.
SourceForge is a readily available distribution site which is commonly used for Open Source programs.
B.3 Implementation Requirements
Requirement Rationale
I.1
The SOSART tool shall be implemented using the Java programming language.
Java allows for easy development of a GUI, is portable, and is an appropriate language for research based tool development (see note 1).
I.2
Source code parsing shall be accomplished through the use of the ANTLR parser.
The ANTLR parser is a readily available tool which can be easily distributed. It is also well documented and has an extensive record of successful usage.
I.3
The SOSART tool shall not use any constructs which would limit the portability of the tool to a given environment.
It is important to develop a tool which is portable across multiple development platforms.
I.4
Java 1.4.2 shall be used for tool development.
Many higher end UNIX systems do not support Java versions newer than 1.4.2, and thus would be unable to run the tool if newer Java constructs are used.
1. This is in contrast to embedded systems development, in which Java is generally not an appropriate language for reasons explained in Schilling [Sch04a].