Ghotra icse

transcript

Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models

Baljinder Ghotra

Ahmed E. Hassan

Shane McIntosh

Quality assurance teams have limited resources

Personnel and schedules

Executing all test suites takes too long

Often release several times in one day!

Defect models can help QA teams to allocate limited resources effectively

Defect prediction

Defect models are trained using historical data to predict the defect-prone modules

Reason for change

Changed modules

Developer responsible

Defect prediction model


[Figure: modules a, b, and c, with one marked "New!", sorted by the model into low risk (a, b) and high risk]

Defect models are trained using various techniques

Simple techniques

Advanced techniques

Decision Trees

Logistic Regression

Logistic Model Trees (LMT)

Do most classification techniques produce models that achieve similar performance?

Decision Trees Logistic Model Trees (LMT)

The performance of 17 of the 22 studied techniques is indistinguishable

Benchmarking Classification Models for Software Defect Prediction

S. Lessmann, B. Baesens, C. Mues, S. Pietsch [TSE 2008]

Limitations of the prior work

Overlapping statistical ranks

Noisy data

Limited scope

Do most techniques produce models with similar performance, when we use:

Non-overlapping statistical ranks

Clean data

Expanded scope


Our approach to study the impact of classification techniques on defect models

Train and test models using different techniques

Rank techniques using statistical clustering

Performance scores for each technique

[Table: Rank (1, 2, 3) and Technique (z, …; a, b, …)]

Repeat 100 times
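The performance score shown on the following slides is the AUC. Below is a minimal sketch of how one score per technique could be computed and then averaged over repeated runs. The function names and the pairwise formulation are illustrative assumptions, not the authors' tooling.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen defective module
    (label 1) receives a higher score than a randomly chosen clean module
    (label 0). Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean module")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(runs):
    """Average AUC over repeated runs; each run is a (labels, scores) pair."""
    return mean(auc(labels, scores) for labels, scores in runs)
```

For example, `mean_auc` could be called once per technique with the 100 repetitions as `runs`, giving one score per technique to feed into the ranking step.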

Unfortunately, some projects yield poorer results than others

Performance values rarely overlap!

Non-overlapping ranks using a double Scott-Knott test

[Figure: double Scott-Knott test. 1st run: within each project (Project 1, Project 2, ..., Project M), Technique 1 to Technique N are clustered by their 10x mean AUC values into a per-project rank table. 2nd run: the per-project ranks are clustered again into a final rank table. Example rank table from the figure: rank 1: T2, T5, T7; rank 2: T1, T10; rank 3: T3, T4, T6; rank 4: T8, T9.]
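The two-pass idea can be sketched in a few lines. This is a simplified Scott-Knott-style partitioning, not the real test. It assumes one mean AUC per technique per project, and it replaces the significance test with fixed thresholds (`auc_gap` and `rank_gap`, both hypothetical parameters).

```python
from statistics import mean


def best_split(values):
    """Cut point that maximises the between-group sum of squares."""
    overall = mean(values)
    best_i, best_b = 1, -1.0
    for i in range(1, len(values)):
        left, right = values[:i], values[i:]
        b = (len(left) * (mean(left) - overall) ** 2
             + len(right) * (mean(right) - overall) ** 2)
        if b > best_b:
            best_i, best_b = i, b
    return best_i


def cluster(items, min_gap):
    """Recursively partition (name, score) pairs that are sorted best first.

    ASSUMPTION: the real Scott-Knott test decides whether to split with a
    statistical test; here a group is split only when the two halves differ
    in mean by at least min_gap.
    """
    if len(items) < 2:
        return [items]
    scores = [s for _, s in items]
    i = best_split(scores)
    if abs(mean(scores[:i]) - mean(scores[i:])) < min_gap:
        return [items]
    return cluster(items[:i], min_gap) + cluster(items[i:], min_gap)


def rank(scores_by_name, min_gap, higher_is_better=True):
    """Map each name to a rank, where rank 1 is the best group."""
    items = sorted(scores_by_name.items(), key=lambda kv: kv[1],
                   reverse=higher_is_better)
    groups = cluster(items, min_gap)
    return {name: r
            for r, group in enumerate(groups, start=1)
            for name, _ in group}


def double_rank(auc_by_project, auc_gap, rank_gap):
    """1st run: rank within each project. 2nd run: cluster the ranks."""
    per_project = [rank(aucs, auc_gap) for aucs in auc_by_project.values()]
    techniques = per_project[0].keys()
    mean_ranks = {t: mean(p[t] for p in per_project) for t in techniques}
    # Lower mean rank is better, so sort ascending in the second run.
    return rank(mean_ranks, rank_gap, higher_is_better=False)
```

Because the second run works on ranks rather than raw AUC values, a project where every technique scores poorly no longer drags a technique's position down.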


Non-overlapping test: Most techniques have similar performance

Rank 1: Ad+NB, EM, RBFs, …

Rank 2: Rsub+SMO, J48, …

Similar to the prior work, techniques are grouped into 2 distinct ranks

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Clean data

Expanded scope

Clean NASA dataset: Cleaning criteria of prior work

Data Quality: Some Comments on the NASA Software Defect Datasets

M. Shepperd, Q. Song, Z. Sun, C. Mair [TSE 2013]

Identical cases

Missing values

Constraint violations
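The three criteria above can be sketched as a single filtering pass. The row layout (a list of dicts) and the example constraint are hypothetical; the actual integrity checks are the ones defined in the cited Shepperd et al. paper.

```python
def clean(rows, constraints):
    """Drop rows with missing values, constraint violations, or duplicates.

    rows: list of dicts, one per module (hypothetical layout).
    constraints: list of functions that return True when a row is valid.
    """
    seen = set()
    kept = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing values
        if not all(check(row) for check in constraints):
            continue  # constraint violation
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # identical case
        seen.add(key)
        kept.append(row)
    return kept


# Illustrative constraint: a module cannot have more comment lines than lines.
example_constraints = [lambda r: r["loc"] >= r["comment_lines"]]
```

Missing values are checked first so that a constraint never has to compare against `None`.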

Clean NASA dataset: Many distinct ranks of techniques

Rank 1: LMT, SL, …

Rank 2: KNN, RBFs, …

Rank 3: J48, K-means, …

Rank 4: SMO, Ridor, …

Unlike the prior work, techniques are grouped into 4 distinct ranks

Top performers are LMT and logistic regression

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks

Expanded scope

Another dataset: The PROMISE corpus

Another dataset: Four significant ranks of techniques

Rank 1: LMT, SL, …

Rank 2: KNN, RBFs, …

Rank 3: J48, K-means, …

Rank 4: SMO, Ridor, …

Unlike the prior work, techniques are grouped into 4 distinct ranks

Top performers are LMT and logistic regression

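Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks

Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks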

Classification technique matters!

Decision Trees Logistic Model Trees (LMT)

Low-cost suggestion: Experiment with the available techniques

6,618 packages are available on CRAN

148 packages are available in package explorer

shanemcintosh@acm.org
