Software Metrics and Defect Prediction
Ayşe Başar Bener
Problem 1
How to tell if the project is on schedule and within budget? Earned-value charts.
Problem 2
How hard will it be for another organization to maintain this software? McCabe complexity.
Problem 3
How to tell when the subsystems are ready to be integrated? Defect density metrics.
Problem Definition
• Software development lifecycle: Requirements, Design, Development, Test (takes ~50% of overall time)
• Detect and correct defects before delivering software.
• Test strategies: expert judgment, manual code reviews, oracles/predictors as secondary tools
Defect Prediction
• 2-class classification problem:
    Non-defective if error = 0
    Defective if error > 0
• Two things needed:
    Raw data: source code
    Software metrics -> static code attributes
Static Code Attributes
    void main()
    {
        //This is a sample code
        //Declare variables
        int a, b, c;
        // Initialize variables
        a=2;
        b=5;
        //Find the sum and display c if greater than zero
        c=sum(a,b);
        if (c < 0)
            printf("%d\n", a);
        return;
    }
    int sum(int a, int b)
    {
        // Returns the sum of two numbers
        return a+b;
    }
Module   LOC   LOCC   V   CC   Error
main()   16    4      5   2    2
sum()    5     1      3   1    0
LOC: lines of code
LOCC: lines of commented code
V: number of unique operands and operators
CC: cyclomatic complexity
Error: number of defects
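As a rough illustration of how such attributes can be extracted, the sketch below counts lines, comment lines, and an approximate cyclomatic complexity for the sum() module. It is a minimal sketch, not the tool used to produce the table: the function names are hypothetical, the one-statement-per-line layout of sum() is an assumption, and the complexity estimate simply adds one to the number of decision points found by a regular expression.

```python
import re
from typing import Tuple

# Decision points that add a path: branching keywords and short-circuit operators.
DECISION_PATTERN = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\|")


def strip_line_comment(line: str) -> str:
    """Drop a trailing // comment (naive: does not handle // inside strings)."""
    return line.split("//", 1)[0]


def count_loc_and_comments(source: str) -> Tuple[int, int]:
    """Return (non-blank lines, lines that contain a // comment)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = sum(1 for ln in lines if "//" in ln)
    return len(lines), comment_lines


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + number of decision points."""
    code = "\n".join(strip_line_comment(ln) for ln in source.splitlines())
    return 1 + len(DECISION_PATTERN.findall(code))


# Assumed layout of the sum() module from the sample code.
SUM_SOURCE = """int sum(int a, int b)
{
    // Returns the sum of two numbers
    return a+b;
}"""

loc, locc = count_loc_and_comments(SUM_SOURCE)
cc = approx_cyclomatic_complexity(SUM_SOURCE)
print(loc, locc, cc)  # 5 1 1
```

Under these assumptions the output agrees with the LOC, LOCC, and CC values listed for sum(). Comments are stripped before counting decision points, since a comment such as "display c if greater than zero" would otherwise be counted as a branch.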
Research on Defect Prediction
• Defect prediction using machine learning techniques
• How effectively can we estimate defect density?
    Regression models
    First classification, then regression
• Defect prediction in multi-version software
• Defect prediction in embedded software
B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 11-12, 2007.
A.D. Oral and A. Bener, "Defect Prediction for Embedded Software", ISCIS 2007, Ankara, Turkey, November 9-11, 2007.
E. Ceylan, O. Kutlubay and A. Bener, "Software Defect Identification Using Machine Learning Techniques", EUROMICRO SEAA, Dubrovnik, Croatia, August 28 - September 1, 2006.
B. Turhan and O. Kutlubay, "Mining Software Data", Data Mining and Business Intelligence Workshop in ICDE'07, İstanbul, April 2007.
O. Kutlubay, B. Turhan and A. Bener, "A Two-Step Model for Defect Density Estimation", EUROMICRO SEAA, Lübeck, Germany, August 2007.
Y. Kastro and A. Bener, "A Defect Prediction Method for Software Versioning", Software Quality Journal (in print).
O. Kutlubay, B. Turhan and A. Bener, "Software Defect Density Estimation Using Static Code Attributes: A Two Step Model", Eng. App. of AI (under review).
Constructing Predictors
• Baseline: Naive Bayes. Why? Best reported results so far (Menzies et al., 2007).
• Remove assumptions and construct different models:
    Independent attributes -> multivariate distribution
    Attributes of equal importance -> weighted Naive Bayes
B. Turhan and A. Bener, "Software Defect Prediction: Heuristics for Weighted Naïve Bayes", ICSOFT 2007, Barcelona, Spain, July 2007.
B. Turhan, "Software Defect Prediction Modeling", IDOESE 2007, Madrid, Spain, September 2007.
"Yazılım Hata Kestirimi için Kaynak Kod Ölçütlerine Dayalı Bayes Sınıflandırması" [Bayesian Classification Based on Source Code Metrics for Software Defect Prediction], UYMS 2007, Ankara, September 2007.
B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 2007.
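Weighted Naive Bayes
Naive Bayes:
$$g_i(x) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log(P(C_i))$$
Weighted Naive Bayes:
$$g_i(x) = -\frac{1}{2}\sum_{j=1}^{d} w_j\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log(P(C_i))$$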
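The two discriminants differ only in the per-attribute weight w_j, which the following sketch makes concrete. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-attribute means, standard deviations, priors, and weights are all hypothetical values chosen to show that weighting can change the predicted class.

```python
import math
from typing import Dict, Optional, Sequence


def discriminant(x: Sequence[float], means: Sequence[float],
                 stds: Sequence[float], prior: float,
                 weights: Optional[Sequence[float]] = None) -> float:
    """g_i(x) = -1/2 * sum_j w_j * ((x_j - m_ij) / s_j)^2 + log(P(C_i)).

    With weights=None every w_j is 1, which gives plain Naive Bayes.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(w * ((xj - m) / s) ** 2
                for xj, m, s, w in zip(x, means, stds, weights))
    return -0.5 * total + math.log(prior)


def classify(x: Sequence[float], class_means: Dict[str, Sequence[float]],
             stds: Sequence[float], priors: Dict[str, float],
             weights: Optional[Sequence[float]] = None) -> str:
    """Pick the class whose discriminant is largest."""
    scores = {label: discriminant(x, class_means[label], stds,
                                  priors[label], weights)
              for label in class_means}
    return max(scores, key=scores.get)


# Hypothetical parameters for two attributes (for example LOC and CC).
class_means = {"non-defective": [10.0, 2.0], "defective": [40.0, 6.0]}
stds = [10.0, 2.0]                      # s_j, shared across classes
priors = {"non-defective": 0.9, "defective": 0.1}
module = [30.0, 3.0]                    # attribute vector x of one module

plain = classify(module, class_means, stds, priors)
weighted = classify(module, class_means, stds, priors, weights=[1.8, 0.2])
print(plain, weighted)  # non-defective defective
```

With equal weights the low prior of the defective class dominates. Giving more weight to the first attribute, on which the module sits closer to the defective mean, flips the decision.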
Datasets
Name   # Features   # Modules   Defect Rate (%)
CM1    38           505         9
PC1    38           1107        6
PC2    38           5589        0.6
PC3    38           1563        10
PC4    38           1458        12
KC3    38           458         9
KC4    38           125         40
MW1    38           403         9
Performance Measures
Confusion matrix (rows: predicted, columns: actual defects):
                  Actual: no   Actual: yes
Predicted: no     A            B
Predicted: yes    C            D
Accuracy: (A+D)/(A+B+C+D)
Pd (Hit Rate): D / (B+D)
Pf (False Alarm Rate): C / (A+C)
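The three measures can be computed directly from the four cells of the confusion matrix. The sketch below is a minimal illustration; the function name and the cell counts are hypothetical and are not taken from any of the datasets.

```python
from typing import Tuple


def performance(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Return (accuracy, pd, pf) from confusion-matrix cells A, B, C, D.

    A: predicted no,  actual no      B: predicted no,  actual yes
    C: predicted yes, actual no      D: predicted yes, actual yes
    """
    accuracy = (a + d) / (a + b + c + d)
    hit_rate = d / (b + d)            # Pd: share of defective modules found
    false_alarm_rate = c / (a + c)    # Pf: share of clean modules flagged
    return accuracy, hit_rate, false_alarm_rate


# Hypothetical counts: 100 modules, 10 of them defective.
accuracy, pd_rate, pf_rate = performance(a=90, b=8, c=0, d=2)
print(accuracy, pd_rate, pf_rate)  # 0.92 0.2 0.0
```

In this made-up case accuracy is 92% even though only 2 of the 10 defective modules are detected, which is why Pd and Pf are reported alongside it when defect rates are low.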
Results: InfoGain & GainRatio
        WNB+IG (%)        WNB+GR (%)        IG+NB (%)
Data    pd   pf   bal     pd   pf   bal     pd   pf   bal
CM1     82   39   70      82   39   70      83   32   74
PC1     69   35   67      69   35   67      40   12   57
PC2     72   15   77      66   20   72      72   15   77
PC3     80   35   71      81   35   72      60   15   70
PC4     88   27   79      87   24   81      92   29   78
KC3     80   27   76      83   30   76      48   15   62
KC4     77   35   70      78   35   71      79   33   72
MW1     70   38   66      68   34   67      44   7    60
Avg     77   31   72      77   32   72      65   20   61
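The bal column is not defined on the Performance Measures slide. Assuming it is the balance measure of Menzies et al. (2007), it is the normalized distance from the ideal point (pd = 1, pf = 0), and the per-dataset values in the table are consistent with that reading:

$$\mathrm{bal} = 1 - \frac{\sqrt{(0 - pf)^2 + (1 - pd)^2}}{\sqrt{2}}$$

For example, CM1 under WNB+IG has pd = 0.82 and pf = 0.39:

$$\mathrm{bal} = 1 - \frac{\sqrt{0.39^2 + 0.18^2}}{\sqrt{2}} = 1 - \frac{\sqrt{0.1845}}{1.4142} \approx 1 - 0.304 = 0.696 \approx 70\%$$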
Results: Weight Assignments
ICSOFT’07
WC (within-company) vs CC (cross-company) Data?
• When to use WC or CC?
• How much data do we need to construct a model?
ICSOFT’07
Thank You
http://softlab.boun.edu.tr