Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier

Date post: 11-Feb-2017

Feng Zhang Quan Zheng Ying Zou Ahmed E. Hassan

Defect prediction

[Diagram: past data (training) is used to build the model, which is then applied to new data (target).]

Within-project defect prediction

Historical data may not be available

Other projects as training data

Cross-project defect prediction

[Diagram labels: training project (software metrics, defect data), supervised classifier, target project (software metrics), defect proneness.]

Heterogeneity across projects (ICSM 2013)

Our Previous Solution (MSR 2014)

How About Using Unsupervised Classifiers?

[Diagram labels: training project (software metrics, defect data), unsupervised classifier, target project (software metrics), defect proneness, heterogeneity.]

Initial attempts using K-means were not very successful.

[Figure panels: short distance; long distance; connections; social network.]

Far away in distance but may be connected!

Connection is more important than distance.
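
The contrast between distance and connection can be illustrated with a small sketch. It is not taken from the slides: the points, the `max_hop` threshold, and the function names are all hypothetical. Two points at opposite ends of a chain are far apart in Euclidean distance, yet they are linked through a sequence of short hops, while a nearer point can remain unconnected.

```python
import math
from collections import deque

def connected(points, start, goal, max_hop):
    """Breadth-first search over a graph in which two points are linked
    when their Euclidean distance is at most max_hop (hypothetical rule)."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == goal:
            return True
        for j in range(len(points)):
            if j not in seen and math.dist(points[i], points[j]) <= max_hop:
                seen.add(j)
                queue.append(j)
    return False

# Hypothetical data: a chain of six points spaced 1.0 apart, plus one
# point that is closer to the chain's start but has no short hop to it.
chain = [(float(x), 0.0) for x in range(6)]
points = chain + [(2.0, 4.0)]  # index 6

end_to_end = math.dist(points[0], points[5])    # distance along the chain ends
to_isolated = math.dist(points[0], points[6])   # distance to the isolated point

print(end_to_end > to_isolated)         # the chain end is farther away...
print(connected(points, 0, 5, 1.0))     # ...but it is connected
print(connected(points, 0, 6, 1.0))     # the nearer point is not
```

A purely distance-based method would group point 0 with the isolated point before it groups it with the far end of the chain; a connectivity-based view does the opposite.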

Are defective software entities connected to each other?

Within-community and cross-community connections

[Figure: connections within a community are stronger; connections across communities are weaker.]

Defective entities tend to connect to other defective entities.

Our connectivity-based unsupervised approach

Consider each entity (file/class) as a node

Step 1. Compute software metrics

Step 2. Build a graph based on the similarity

Step 3. Make a bipartition on the graph

Step 4. Label the defective cluster

Defective Clean

17 lines of R code are provided in the paper
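
The four steps above can be sketched in Python as follows. This is a dependency-free illustration under stated assumptions, not the R code from the paper. It assumes that metrics are z-scored per column, that similarity is the dot product of the normalized metric vectors with negative values and self-similarity set to zero, and that the cluster with the larger average metric values is the defective one. None of these choices is stated on the slides. The bipartition uses the sign of the second eigenvector of the normalized graph Laplacian, approximated here by power iteration with deflation instead of a full eigendecomposition. All function names and the toy data are hypothetical.

```python
import math

def zscore_columns(rows):
    """Step 1 (assumed form): z-score each metric column."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        out_cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*out_cols)]

def similarity_graph(norm):
    """Step 2 (assumed form): dot-product similarity between entities,
    with negative weights and self-loops set to zero."""
    n = len(norm)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(norm[i], norm[j]))
                w[i][j] = max(0.0, dot)
    return w

def second_eigenvector(w, iters=500):
    """Step 3: approximate the second eigenvector of the normalized Laplacian.
    Runs power iteration on (M + I) / 2 with M = D^-1/2 W D^-1/2 while
    removing the known top eigenvector sqrt(d). Assumes positive degrees."""
    n = len(w)
    s = [math.sqrt(sum(row)) for row in w]
    s_len = math.sqrt(sum(x * x for x in s))
    u = [x / s_len for x in s]
    y = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(y, u))
        y = [a - proj * b for a, b in zip(y, u)]
        my = [sum(w[i][j] * y[j] / (s[i] * s[j]) for j in range(n))
              for i in range(n)]
        y = [(a + b) / 2 for a, b in zip(my, y)]
        length = math.sqrt(sum(a * a for a in y))
        y = [a / length for a in y]
    return [a / b for a, b in zip(y, s)]

def label_defective(rows):
    """Steps 3 and 4: split by eigenvector sign, then mark the cluster with
    the larger mean row sum of normalized metrics as defective (assumption)."""
    norm = zscore_columns(rows)
    v = second_eigenvector(similarity_graph(norm))
    positive = [x > 0 for x in v]
    row_sums = [sum(r) for r in norm]
    pos = [rs for rs, p in zip(row_sums, positive) if p]
    neg = [rs for rs, p in zip(row_sums, positive) if not p]
    if pos and neg and sum(pos) / len(pos) < sum(neg) / len(neg):
        positive = [not p for p in positive]
    return positive

# Hypothetical toy data: two metrics per entity (for example, size and complexity).
metrics = [[100, 10], [120, 12], [110, 11], [10, 1], [20, 2], [15, 3]]
predicted = label_defective(metrics)
print(predicted)  # True marks entities placed in the cluster labeled defective
```

On this toy input the three entities with large metric values end up in the cluster labeled defective. No defect data is used at any point, which is what makes the approach unsupervised.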

Looks simple? Does it really work?

Research questions

RQ1. How does the spectral clustering based classifier perform in cross-project defect prediction?

RQ2. Does the spectral clustering based classifier perform well in within-project defect prediction?

Subject projects (Total: 26)

AEEEM (5 projects): Equinox, JDT, Lucene, Mylyn, PDE

NASA (11 projects): CM1, JM1, KC3, MC1, MC2, MW1, PC1, PC2, PC3, PC4, PC5

PROMISE (10 projects): Ant, Camel, Ivy, Jedit, Log4j, Lucene, POI, Tomcat, Xalan, Xerces

Classifiers for comparison (Total: 9)

Unsupervised

1. K-means clustering (KM)

2. Partition around medoids (PAM)

3. Fuzzy C-means (FCM)

4. Neural-gas (NG)

Supervised

1. Random forest (RF)

2. Naïve Bayes (NB)

3. Logistic regression (LR)

4. Decision tree (DT)

5. Logistic model tree (LMT)

RQ1. How does the spectral clustering based classifier perform in cross-project defect prediction?

[Diagram: for each dataset (NASA, AEEEM, PROMISE), the average AUC is computed; classifiers are then ranked with the Scott-Knott test.]
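
For readers unfamiliar with the measure, AUC can be read as the probability that a randomly chosen defective entity receives a higher score than a randomly chosen clean one, with ties counted as half. The sketch below computes it directly from that definition and then averages over projects. The function names, the toy scores, and the reading of "average AUC" as a plain mean over projects are assumptions made for illustration.

```python
def auc(scores, labels):
    """Pairwise AUC: fraction of (defective, clean) pairs in which the
    defective entity has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean entity")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_auc(per_project):
    """Mean AUC over (scores, labels) pairs, one pair per project."""
    values = [auc(scores, labels) for scores, labels in per_project]
    return sum(values) / len(values)

# Hypothetical predictions for two projects (1 = defective, 0 = clean).
project_a = ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
project_b = ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])

print(auc(*project_a))                       # 0.75
print(auc(*project_b))                       # 1.0
print(average_auc([project_a, project_b]))   # 0.875
```

The pairwise form is quadratic in the number of entities; it is used here because it mirrors the definition, not because it is the fastest way to compute AUC.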

RQ1. Results (cross-project)

[Figure: classifiers grouped into Rank 1 to Rank 4. Red text: unsupervised. Blue text: supervised.]

Our approach can compete with the supervised classifiers under study, and is sometimes even better.

RQ2. Does the spectral clustering based classifier perform well in within-project defect prediction?

[Diagram: each project is randomly split into two halves (50% / 50%). One half is used for training and the other for testing to obtain an AUC; the roles are then swapped to obtain a second AUC. Classifiers are ranked with the Scott-Knott test.]

…(500 random splits, thus 1,000 evaluations)
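
The evaluation protocol can be sketched as a repeated two-way split. This is an illustrative outline only: `repeated_half_splits`, `dummy_evaluate`, and the seed are hypothetical, and the placeholder callback stands in for fitting a classifier on one half and computing its AUC on the other.

```python
import random

def repeated_half_splits(n_items, n_splits, evaluate, seed=0):
    """Randomly split item indices into two halves n_splits times.
    Each split yields two evaluations: train on one half and test on
    the other, then swap the roles."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    results = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        half = n_items // 2
        first, second = indices[:half], indices[half:]
        results.append(evaluate(first, second))   # train on first, test on second
        results.append(evaluate(second, first))   # roles swapped
    return results

def dummy_evaluate(train_idx, test_idx):
    """Placeholder: a real version would fit a classifier on train_idx and
    return its AUC on test_idx. Here it only checks the halves are disjoint."""
    assert not set(train_idx) & set(test_idx)
    return len(test_idx)

results = repeated_half_splits(n_items=10, n_splits=500, evaluate=dummy_evaluate)
print(len(results))  # 1000
```

With 500 random splits and two evaluations per split, the loop produces the 1,000 evaluations mentioned above.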

RQ2. Results (within-project)

Rank 1 (Gold): Random forest

Rank 2 (Silver): Logistic regression, Spectral clustering, Logistic model tree, Naïve Bayes

Rank 3 (Bronze): Fuzzy C-means

Our approach can achieve performance similar to that of the supervised classifiers, except random forest.

Summary

Feng Zhang (http://www.feng-zhang.com)

