Ghotra icse

Post on 17-Aug-2015


Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models

Baljinder Ghotra

Ahmed E. Hassan

Shane McIntosh

Quality assurance teams have limited resources

Personnel, schedules

Executing all test suites takes too long

Often release several times in one day!

Defect models can help QA teams to allocate limited resources effectively

[Figure: defect prediction model]

Defect models are trained using historical data to predict the defect-prone modules
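As a rough illustration of training on historical data, the sketch below fits a logistic regression model (one of the techniques named in this talk) with plain gradient descent. The function names `train_logistic` and `predict_risk`, the two module metrics, and the toy history are all hypothetical and are not taken from the study.

```python
import math

def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted defect probability
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Average the gradient over all modules and take one step.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_risk(model, x):
    """Return the predicted probability that a module is defect-prone."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical history: [lines changed (hundreds), developers involved]
history = [[0.1, 1], [0.2, 1], [0.3, 2], [2.5, 4], [3.0, 5], [1.8, 3]]
was_defective = [0, 0, 0, 1, 1, 1]

model = train_logistic(history, was_defective)
high = predict_risk(model, [2.75, 4.5])   # resembles past defective modules
low = predict_risk(model, [0.15, 1])      # resembles past clean modules
```

A QA team could sort new modules by the predicted probability and spend its limited effort on the top of the list.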

[Figure: historical data (reason for change, changed modules, developer responsible) is fed to a defect prediction model, which labels modules a and b as low risk and module c as high risk.]

Defect models are trained using various techniques

Simple techniques

Advanced techniques

Decision Trees + Logistic Regression = Logistic Model Trees (LMT)

Do most classification techniques produce models that achieve similar performance?

[Figure: Decision Trees and Logistic Model Trees (LMT)]

The performance of 17 of the 22 studied techniques is indistinguishable

Benchmarking classification models for software defect prediction

S. Lessmann, B. Baesens, C. Mues, S. Pietsch [TSE 2008]

Limitations of the prior work


Overlapping statistical ranks

Noisy data

Limited scope

Do most techniques produce models with similar performance when we use:

Non-overlapping statistical ranks (prior work: overlapping statistical ranks)

Clean data (prior work: noisy data)

Expanded scope (prior work: limited scope)


Our approach to study the impact of classification techniques on defect models

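Train and test models using different techniques

Repeat 100 times

Performance scores for each technique

Rank techniques using statistical clustering

[Figure: a table of performance scores per technique (a, b, ..., z) is the input to the clustering step, which outputs a table of ranks (1, 2, 3) with the techniques in each rank.]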

Unfortunately, some projects yield poorer results than others

[Figure: AUC per project, with an axis from 0.5 to 0.9. Projects: CM1, JM1, KC1, KC3, KC4, MW1, PC1, PC2, PC3, PC4.]

Performance values rarely overlap!
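For readers unfamiliar with the metric on the plot, AUC is the probability that a randomly chosen defective module receives a higher risk score than a randomly chosen clean one, so 0.5 matches random guessing and 1.0 is a perfect ordering. The sketch below computes it by direct pairwise comparison; the function name `auc` and the sample scores are illustrative only.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # defective module ranked above clean one
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0])
constant = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
```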

Non-overlapping ranks using a double Scott-Knott test

[Figure: double Scott-Knott test. For each project (Project 1, Project 2, ..., Project M), a first Scott-Knott run takes 10 mean AUC values per technique (Technique 1, Technique 2, ..., Technique N) and groups the techniques into ranks. A second Scott-Knott run takes the per-project ranks and produces the final ranks.]

Example ranks shown in the figure:

Project 1 (1st run): rank 1: T3, T7, T8; rank 2: T2, T10; rank 3: T1, T4, T6; rank 4: T5, T9

Project 2 (1st run): rank 1: T2, T5, T7; rank 2: T1, T10; rank 3: T3, T4, T6; rank 4: T8, T9

Project M (1st run): rank 1: T2, T10; rank 2: T1, T7, T8; rank 3: T3, T4, T6; rank 4: T5, T9

2nd run: rank 1: T2, T5; rank 2: T1, T7, T10; rank 3: T3, T4, T6; rank 4: T8, T9
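The two-stage flow can be sketched as follows. This is not the Scott-Knott test itself: the real test decides whether to split a group with a likelihood-ratio significance test, whereas this stand-in splits whenever the two group means differ by at least a fixed threshold (`min_gap`, an assumption made to keep the example short). All function names, thresholds, and AUC values are hypothetical.

```python
from statistics import mean

def best_split(ordered):
    """Index of the two-way split with the largest between-group sum of squares."""
    overall = mean(m for _, m in ordered)
    best_i, best_b = None, -1.0
    for i in range(1, len(ordered)):
        left = [m for _, m in ordered[:i]]
        right = [m for _, m in ordered[i:]]
        b = (len(left) * (mean(left) - overall) ** 2
             + len(right) * (mean(right) - overall) ** 2)
        if b > best_b:
            best_i, best_b = i, b
    return best_i

def cluster(ordered, min_gap):
    """Recursively split while the group means differ by at least min_gap."""
    if len(ordered) < 2:
        return [ordered]
    i = best_split(ordered)
    left, right = ordered[:i], ordered[i:]
    # Stand-in for the significance test (assumption, not Scott-Knott's test).
    if abs(mean(m for _, m in left) - mean(m for _, m in right)) < min_gap:
        return [ordered]
    return cluster(left, min_gap) + cluster(right, min_gap)

def rank_techniques(values, min_gap, higher_is_better=True):
    """Map technique name -> rank (1 is best) from {name: [values]}."""
    ordered = sorted(((name, mean(v)) for name, v in values.items()),
                     key=lambda t: t[1], reverse=higher_is_better)
    ranks = {}
    for rank, group in enumerate(cluster(ordered, min_gap), start=1):
        for name, _ in group:
            ranks[name] = rank
    return ranks

def double_ranking(auc_by_project, auc_gap=0.05, rank_gap=1.0):
    # First run: one ranking per project, computed from its AUC values.
    first = [rank_techniques(p, auc_gap) for p in auc_by_project.values()]
    # Second run: per-project ranks become the input (lower rank is better).
    rank_lists = {}
    for ranks in first:
        for name, r in ranks.items():
            rank_lists.setdefault(name, []).append(r)
    return rank_techniques(rank_lists, rank_gap, higher_is_better=False)

auc_by_project = {
    "P1": {"A": [0.80, 0.82], "B": [0.79, 0.81], "C": [0.60, 0.62]},
    "P2": {"A": [0.75, 0.77], "B": [0.76, 0.74], "C": [0.55, 0.57]},
}
final_ranks = double_ranking(auc_by_project)
```

Ranking within each project first keeps a project with uniformly low AUC values from dragging down every technique evaluated on it.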


Non-overlapping test: Most techniques have similar performance

Rank 1: Ad+NB, EM, RBFs, …

Rank 2: Rsub+SMO, J48, …

Similar to the prior work, techniques are grouped into 2 distinct ranks

Do most techniques produce models with similar performance when we use:

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Expanded scope

Clean data


Clean NASA dataset: Cleaning criteria of prior work

Data Quality: Some Comments on the NASA Software Defect Datasets

M. Shepperd, Q. Song, Z. Sun, C. Mair [TSE 2013]

Identical cases

Missing values

Constraint violations
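The three criteria can be read as a filtering pass over the dataset. The sketch below is one way such a pass might look; the column names, the sample rows, the single constraint (comment lines cannot exceed total lines), and the choice to drop offending rows are all assumptions for illustration and are not the exact rules of the cited work.

```python
def clean_dataset(rows, constraints):
    """Drop rows with missing values, constraint violations, or exact repeats."""
    seen = set()
    kept = []
    for row in rows:
        # Missing values: any metric without a recorded value.
        if any(value is None for value in row.values()):
            continue
        # Constraint violations: any integrity check that fails.
        if not all(check(row) for check in constraints):
            continue
        # Identical cases: keep only the first copy of a repeated row.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

# Hypothetical integrity check on hypothetical metric names.
constraints = [lambda r: r["loc_comments"] <= r["loc_total"]]

rows = [
    {"loc_total": 100, "loc_comments": 10, "defective": 0},
    {"loc_total": 100, "loc_comments": 10, "defective": 0},    # identical case
    {"loc_total": 50, "loc_comments": None, "defective": 1},   # missing value
    {"loc_total": 20, "loc_comments": 30, "defective": 0},     # violation
    {"loc_total": 80, "loc_comments": 5, "defective": 1},
]
cleaned = clean_dataset(rows, constraints)
```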

Clean NASA dataset: Many distinct ranks of techniques

Rank 1: LMT, SL, …

Rank 2: KNN, RBFs, …

Rank 3: J48, K-means, …

Rank 4: SMO, Ridor, …

Unlike the prior work, techniques are grouped into 4 distinct ranks

Top performers are LMT and logistic regression

Do most techniques produce models with similar performance when we use:

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Expanded scope

Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks


Another dataset: The PROMISE corpus

Another dataset: Four significant ranks of techniques

Rank 1: LMT, SL, …

Rank 2: KNN, RBFs, …

Rank 3: J48, K-means, …

Rank 4: SMO, Ridor, …

Unlike the prior work, techniques are grouped into 4 distinct ranks

Top performers are LMT and logistic regression

Do most techniques produce models with similar performance when we use:

Non-overlapping statistical ranks: Yes, techniques are grouped into 2 distinct ranks

Expanded scope: No, similar to the clean data study, techniques are grouped into 4 distinct ranks

Clean data: No, unlike the prior work, techniques are grouped into 4 distinct ranks

Classification technique matters!

[Figure: Decision Trees and Logistic Model Trees (LMT)]

Low-cost suggestion: Experiment with the available techniques

6,618 packages are available on CRAN

148 packages are available in package explorer

shanemcintosh@acm.org