Revisiting the Impact of Classification Techniques on the Performance of
Defect Prediction Models
Baljinder Ghotra
Ahmed E. Hassan
Shane McIntosh
Quality assurance teams have limited resources
Personnel Schedules
2
Executing all test suites takes too long
3
Often release several times in one day!
Defect models can help QA teams to allocate limited resources effectively
4
Defect prediction model
Defect models are trained using historical data to predict the defect-prone modules
5
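[Figure: historical changes to modules a, b, and c, each described by the reason for change, the changed modules, and the developer responsible, feed the defect prediction model; a new change ("New!") arrives for prediction]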
Defect models are trained using historical data to predict the defect-prone modules
6
[Figure: the defect prediction model labels modules a and b as low risk and module c as high risk]
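The train-then-predict idea on this slide can be sketched in a few lines. This is a minimal illustration, not the models used in the study: it fits a logistic regression with plain gradient descent on toy historical data, then labels new modules as low or high risk. The two features (scaled churn and scaled prior defect count), the toy values, and the 0.5 cut-off are all assumptions made for the example.

```python
import math

def predict_risk(weights, bias, x):
    """Return the predicted probability that a module is defect-prone."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=5000, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_risk(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(n_features):
            weights[j] -= lr * grad_w[j] / len(rows)
        bias -= lr * grad_b / len(rows)
    return weights, bias

def risk_label(probability, cutoff=0.5):
    """Map a probability to the two labels shown on the slide (cutoff is assumed)."""
    return "High risk" if probability >= cutoff else "Low risk"

# Toy historical data (hypothetical): [scaled churn, scaled prior defects].
history = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.0], [0.8, 0.7], [0.9, 0.6], [0.7, 0.9]]
was_defective = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(history, was_defective)
risk_a = predict_risk(weights, bias, [0.1, 0.1])    # a quiet module
risk_c = predict_risk(weights, bias, [0.85, 0.8])   # a heavily changed module
print("a:", risk_label(risk_a))
print("c:", risk_label(risk_c))
```

Swapping `train_logistic` for another learner changes the classification technique while leaving the rest of the pipeline untouched, which is the comparison the following slides are about.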
Defect models are trained using various techniques
7
Simple techniques
Advanced techniques
Decision Trees + Logistic Regression
Logistic Model Trees (LMT)
Most classification techniques produce models that achieve similar performance?
8
[Figure: Decision Trees, Logistic Model Trees (LMT)]
The performance of 17 of 22 studied techniques is indistinguishable
Benchmarking classification models for software defect prediction
S. Lessmann, B. Baesens, C. Mues, S. Pietsch [TSE 2008]
Limitations of the prior work
9
Overlapping statistical ranks
Noisy data
Limited scope
Do most techniques produce models with similar performance, when we use:
10
Non-overlapping statistical ranks (instead of overlapping statistical ranks)
Clean data (instead of noisy data)
Expanded scope (instead of limited scope)
Do most techniques produce models with similar performance, when we use:
11
Non-overlapping statistical ranks
Expanded scope
Clean data
Our approach to study the impact of classification techniques on defect models
13
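Train and test models using different techniques
Performance scores for each technique
Rank techniques using statistical clustering
Repeat 100 times
[Figure: a table of performance scores per technique (a, b, ..., z) is turned into a table of ranked techniques (e.g. rank 1: z, …; rank 2: a, b, …; rank 3: …)]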
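The loop on this slide (train and test with each technique, record a score, repeat) can be sketched as follows. This is an illustration under stated assumptions: the random 2/3 vs 1/3 split, the two trivial "techniques", the toy data, and the use of accuracy as the score are stand-ins chosen to keep the example short; they are not the validation scheme, learners, or measure used in the study.

```python
import random

def repeated_holdout(techniques, rows, labels, score_fn, repetitions=100, seed=0):
    """Return {technique name: [one score per repetition]} using random splits."""
    rng = random.Random(seed)
    scores = {name: [] for name in techniques}
    indices = list(range(len(rows)))
    cut = (2 * len(rows)) // 3  # assumed split: 2/3 training, 1/3 testing
    for _ in range(repetitions):
        rng.shuffle(indices)
        train, test = indices[:cut], indices[cut:]
        for name, fit in techniques.items():
            model = fit([rows[i] for i in train], [labels[i] for i in train])
            predicted = [model(rows[i]) for i in test]
            actual = [labels[i] for i in test]
            scores[name].append(score_fn(actual, predicted))
    return scores

# Two deliberately trivial, hypothetical techniques that share one interface:
# fit(train_rows, train_labels) returns a function row -> predicted label.
def fit_majority(train_rows, train_labels):
    majority = int(sum(train_labels) * 2 >= len(train_labels))
    return lambda row: majority

def fit_threshold(train_rows, train_labels):
    # Predict "defective" when the first feature exceeds its training mean.
    mean = sum(r[0] for r in train_rows) / len(train_rows)
    return lambda row: int(row[0] > mean)

def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Toy data: one hypothetical metric per module; large values are defective.
rows = [[v] for v in range(30)]
labels = [int(v >= 15) for v in range(30)]

scores = repeated_holdout(
    {"majority": fit_majority, "threshold": fit_threshold},
    rows, labels, accuracy, repetitions=100,
)
for name, values in scores.items():
    print(name, round(sum(values) / len(values), 3))
```

The resulting per-technique score lists are the input that a ranking step would consume.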
Unfortunately, some projects yield poorer results than others
14
[Figure: AUC values (axis from 0.5 to 0.9) per project: CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4]
Performance values rarely overlap!
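For readers unfamiliar with the AUC measure plotted here, a minimal sketch of how it can be computed: AUC equals the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one, with ties counted as one half. The function name and the sample values are chosen for illustration only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one module of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# 1 = defective, 0 = clean; scores are predicted risk values.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ordering, which matches the 0.5 to 0.9 range on the plot axis.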
Non-overlapping ranks using a double Scott-Knott test
15
[Figure: double Scott-Knott test. 1st run, once per project (Project 1, Project 2, ..., Project M): the input is the mean AUC values ("10x") of each technique (Technique 1, Technique 2, ..., Technique N), and the output is a table that groups techniques T1 to T10 into ranks 1 to 4, e.g. for Project 1: rank 1: T3, T7, T8; rank 2: T2, T10; rank 3: T1, T4, T6; rank 4: T5, T9. 2nd run: the per-project tables are combined into one final table of ranks 1 to 4.]
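The two-pass idea can be sketched in code. Note the substitution: the real Scott-Knott procedure decides each split with a statistical test, whereas this sketch accepts a split whenever the two group means differ by at least a hypothetical threshold (`min_gap`). It also takes a single mean AUC per technique per project. Function names, thresholds, and the toy AUC values are all assumptions for illustration.

```python
from statistics import mean

def split_ranks(values, min_gap):
    """Group {name: value} into ordered clusters, best (largest value) first.

    Simplified stand-in for Scott-Knott: split the sorted names where the
    between-group sum of squares is largest, keep the split only if the two
    group means differ by at least min_gap, then recurse into both halves.
    """
    names = sorted(values, key=values.get, reverse=True)

    def recurse(group):
        if len(group) < 2:
            return [group]
        overall = mean(values[n] for n in group)
        best_cut, best_ss = None, -1.0
        for cut in range(1, len(group)):
            left, right = group[:cut], group[cut:]
            m_left = mean(values[n] for n in left)
            m_right = mean(values[n] for n in right)
            ss = (len(left) * (m_left - overall) ** 2
                  + len(right) * (m_right - overall) ** 2)
            if ss > best_ss:
                best_cut, best_ss = cut, ss
        left, right = group[:best_cut], group[best_cut:]
        gap = mean(values[n] for n in left) - mean(values[n] for n in right)
        if gap < min_gap:
            return [group]  # difference too small: keep as one rank
        return recurse(left) + recurse(right)

    return recurse(names)

def to_rank(clusters):
    """Turn ordered clusters into {name: rank}, with rank 1 the best."""
    return {name: rank
            for rank, cluster in enumerate(clusters, start=1)
            for name in cluster}

def double_ranking(auc_by_project, auc_gap=0.05, rank_gap=0.5):
    """First pass per project on mean AUC, second pass on the resulting ranks."""
    per_project = [to_rank(split_ranks(aucs, auc_gap))
                   for aucs in auc_by_project.values()]
    techniques = list(per_project[0])
    # Negate the mean rank so that "larger is better" holds in the second pass.
    mean_rank = {t: -mean(r[t] for r in per_project) for t in techniques}
    return split_ranks(mean_rank, rank_gap)

# Toy mean AUC values (hypothetical) for four techniques on two projects.
auc_by_project = {
    "Project 1": {"T1": 0.80, "T2": 0.79, "T3": 0.65, "T4": 0.64},
    "Project 2": {"T1": 0.70, "T2": 0.71, "T3": 0.60, "T4": 0.58},
}
final_ranks = double_ranking(auc_by_project)
print(final_ranks)
```

Ranking within each project first means that a project with uniformly low AUC values does not drag down the techniques that did comparatively well on it.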
17
Non-overlapping test: Most techniques have similar performance
Rank  Technique
1     Ad+NB, EM, RBFs, …
2     Rsub+SMO, J48, …
Similar to the prior work, techniques are grouped into 2 distinct ranks
Do most techniques produce models with similar performance, when we use:
18
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Expanded scope
Clean data
Clean NASA dataset: Cleaning criteria of prior work
20
Data Quality: Some Comments on the NASA Software Defect Datasets
M. Shepperd, Q. Song, Z. Sun, C. Mair [TSE 2013]
Identical cases
Missing values
Constraint violations
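The three cleaning criteria listed above can be sketched as a single filtering pass. This is an illustration only: the column names (`loc`, `comment_lines`, `defective`), the sample rows, and the example constraint are hypothetical and are not taken from the cited study.

```python
def clean_modules(rows, constraints):
    """Drop rows with missing values, constraint violations, or identical twins."""
    seen = set()
    kept = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing values
        if not all(check(row) for check in constraints):
            continue  # constraint violations
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # identical cases: keep only the first occurrence
        seen.add(key)
        kept.append(row)
    return kept

# Hypothetical module records.
rows = [
    {"loc": 120, "comment_lines": 10, "defective": 1},
    {"loc": 120, "comment_lines": 10, "defective": 1},    # identical case
    {"loc": 40, "comment_lines": None, "defective": 0},   # missing value
    {"loc": 15, "comment_lines": 30, "defective": 0},     # violates constraint
    {"loc": 60, "comment_lines": 5, "defective": 0},
]

# Hypothetical integrity constraint: comments cannot exceed total lines.
constraints = [lambda r: r["comment_lines"] <= r["loc"]]

cleaned = clean_modules(rows, constraints)
print(len(rows), "->", len(cleaned))
```

Missing values are checked first so that a constraint never has to compare against `None`.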
Clean NASA dataset: Many distinct ranks of techniques
21
Rank  Technique
1     LMT, SL, …
2     KNN, RBFs, …
3     J48, K-means, …
4     SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression
Do most techniques produce models with similar performance, when we use:
22
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Expanded scope
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks
Another dataset: The PROMISE corpus
24
Another dataset: Four significant ranks of techniques
25
Rank  Technique
1     LMT, SL, …
2     KNN, RBFs, …
3     J48, K-means, …
4     SMO, Ridor, …
Unlike the prior work, techniques are grouped into 4 distinct ranks
Top performers are LMT and logistic regression
Do most techniques produce models with similar performance, when we use:
26
Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks
Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks
Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks
Classification technique matters!
27
[Figure: Decision Trees, Logistic Model Trees (LMT)]
Low-cost suggestion: Experiment with the available techniques
28
6,618 packages are available on CRAN
148 packages are available in package explorer