Page 1: Transfer defect learning

Transfer Defect Learning

Jaechang Nam, The Hong Kong University of Science and Technology, China

Sinno Jialin Pan, Institute for Infocomm Research, Singapore

Sunghun Kim, The Hong Kong University of Science and Technology, China

Page 2: Transfer defect learning

Defect Prediction

• Hassan@ICSE`09, Predicting Faults Using the Complexity of Code Changes

• D'Ambros et al.@MSR`10, An Extensive Comparison of Bug Prediction Approaches

• Rahman et al.@FSE`12, Recalling the "Imprecision" of Cross-Project Defect Prediction

• Hata et al.@ICSE`12, Bug Prediction Based on Fine-grained Module Histories

• …

[Diagram: Program → Prediction model (machine learning) → Future defects]


Page 4: Transfer defect learning

Training prediction model

Training set:

M1   M2   …   M19   M20   Class
11    5   …    53    78   Buggy
 …    …   …     …     …   …
 1    1   …     3     9   Clean

Test set:

M1   M2   …   M19   M20   Class
 2    1   …     2     8   ?
 …    …   …     …     …   …
13    6   …    45    69   ?
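As a concrete illustration of the slide above, here is a minimal sketch in Python of training such a prediction model; logistic regression is the classifier used later in the evaluation, while the tiny in-memory dataset (four of the twenty metrics) is a made-up stand-in for a real project history.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Training set: each row is a module's metric vector (M1, M2, M19, M20),
# labeled Buggy or Clean as in the table above.
X_train = np.array([[11, 5, 53, 78],
                    [ 1, 1,  3,  9]])
y_train = ["Buggy", "Clean"]

# Test set: modules whose labels ("?") we want to predict.
X_test = np.array([[ 2, 1,  2,  8],
                   [13, 6, 45, 69]])

model = LogisticRegression().fit(X_train, y_train)
print(model.predict(X_test))  # e.g. ['Clean' 'Buggy']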

Page 5: Transfer defect learning

Cross prediction model

[Diagram: Source project (training set) → prediction model → Target project (test set)]


Page 7: Transfer defect learning

Cross-project Defect Prediction

"Training data is often not available, either because a company is too small or it is the first release of a product."

Zimmermann et al.@FSE`09, Cross-project Defect Prediction

"For many new projects we may not have enough historical data to train prediction models."

Rahman, Posnett, and Devanbu@FSE`12, Recalling the "Imprecision" of Cross-Project Defect Prediction

Page 8: Transfer defect learning

Cross-project defect prediction

• Zimmermann et al.@FSE`09
  – "We ran 622 cross-project predictions and found only 3.4% actually worked."

[Pie chart: worked 3.4%, not worked 96.6%]

Page 9: Transfer defect learning

Cross-company defect prediction

• Turhan et al.@ESEJ`09
  – "Within-company data models are still the best"

[Bar chart: avg. F-measure (0 to 0.4) for cross-company prediction, cross-company prediction with a NN filter, and within-company prediction]

Page 10: Transfer defect learning

Cross-project defect prediction

• Rahman, Posnett, and Devanbu@FSE`12

[Bar chart: avg. F-measure (0 to 0.6) for cross-project and within-project prediction]

Page 11: Transfer defect learning

Cross prediction results

[Bar chart: F-measure (0 to 0.7) of cross- vs. within-project prediction for Equinox, JDT, and Lucene]

Page 12: Transfer defect learning

Approaches of Transfer Defect Learning

[Diagram: the three building blocks of the approach: Normalization, TCA, and TCA+]

Page 13: Transfer defect learning

Approaches of Transfer Defect Learning

• Normalization: data preprocessing for training and test data
• TCA (Transfer Component Analysis): a state-of-the-art transfer learning algorithm
• TCA+: TCA adapted for cross-project defect prediction, plus decision rules to select a suitable data normalization option

Page 14: Transfer defect learning

Data Normalization

• Adjust all feature values to the same scale
  – e.g., make mean = 0 and std = 1

• Known to help classification algorithms improve prediction performance [Han et al., 2012].

Page 15: Transfer defect learning

Normalization Options

• N1: min-max normalization (max=1, min=0) [Han et al., 2012]

• N2: z-score normalization (mean=0, std=1) [Han et al., 2012]

• N3: z-score normalization using only the source mean and standard deviation

• N4: z-score normalization using only the target mean and standard deviation
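A minimal Python sketch of these options follows; N1 and N2 are standard, while applying one dataset's statistics to both source and target in N3/N4 is an assumed interpretation of the slide.

import numpy as np

def normalize(src, tgt, option):
    # Normalize source/target feature matrices (rows = instances) with one
    # of the options above. N3/N4 reuse one dataset's mean/std for both
    # datasets (assumed interpretation).
    src, tgt = np.asarray(src, float), np.asarray(tgt, float)
    if option == "N1":   # min-max per dataset: values in [0, 1]
        f = lambda x: (x - x.min(0)) / (x.max(0) - x.min(0))
        return f(src), f(tgt)
    if option == "N2":   # z-score per dataset: mean 0, std 1
        f = lambda x: (x - x.mean(0)) / x.std(0)
        return f(src), f(tgt)
    if option == "N3":   # z-score with *source* statistics for both
        m, s = src.mean(0), src.std(0)
        return (src - m) / s, (tgt - m) / s
    if option == "N4":   # z-score with *target* statistics for both
        m, s = tgt.mean(0), tgt.std(0)
        return (src - m) / s, (tgt - m) / s
    return src, tgt      # NoN: no normalization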

Page 16: Transfer defect learning

Approaches of Transfer Defect Learning

• Normalization: data preprocessing for training and test data
• TCA (Transfer Component Analysis): a state-of-the-art transfer learning algorithm
• TCA+: TCA adapted for cross-project defect prediction, plus decision rules to select a suitable data normalization option

Page 17: Transfer defect learning

Transfer Learning

[Diagram: in traditional machine learning (ML), a separate learning system is trained for each task; in transfer learning, knowledge from the source learning system is transferred to the target learning system]

Page 21: Transfer defect learning

A Common Assumption in Traditional ML

• Training and test data are drawn from the same distribution
  – Cross prediction violates this assumption; transfer learning relaxes it

Pan and Yang@TKDE`10, A Survey on Transfer Learning

Page 24: Transfer defect learning

Transfer Component Analysis

• Unsupervised transfer learning
  – Target project labels are not known.

• Source and target must share the same feature space.

• Reduces the distribution difference between training and test datasets.

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
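For readers curious about the mechanics, below is a compact NumPy sketch of TCA with a linear kernel; the eigenproblem follows the formulation in Pan et al.@TNN`10, but the kernel choice, variable names, and fixed parameters here are illustrative rather than the paper's full recipe.

import numpy as np

def tca(Xs, Xt, dim=5, mu=1.0):
    # Map source (Xs) and target (Xt) instances into a shared
    # dim-dimensional space where their distributions are closer (MMD).
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([np.asarray(Xs, float), np.asarray(Xt, float)])
    K = X @ X.T                               # linear kernel matrix
    # L encodes the Maximum Mean Discrepancy between source and target.
    e = np.vstack([np.full((ns, 1), 1.0 / ns), np.full((nt, 1), -1.0 / nt)])
    L = e @ e.T
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    # Transfer components: leading eigenvectors of (mu*I + K L K)^-1 K H K.
    M = np.linalg.solve(mu * np.eye(n) + K @ L @ K, K @ H @ K)
    vals, vecs = np.linalg.eig(M)
    W = np.real(vecs[:, np.argsort(-np.real(vals))[:dim]])
    Z = K @ W                                 # embedded instances
    return Z[:ns], Z[ns:]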

Page 25: Transfer defect learning

Transfer Component Analysis (cont.)

• Feature extraction approach
  – Dimensionality reduction
  – Projection

• Map original data into a lower-dimensional feature space
  – C.f. Principal Component Analysis (PCA)

[Illustration: data in a 2-dimensional feature space projected onto a 1-dimensional feature space]

Page 31: Transfer defect learning

Transfer Component Analysis (cont.)

[Scatter plot: target domain data vs. source domain data]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis

Page 32: Transfer defect learning

Transfer Component Analysis (cont.)

[Scatter plots: the same data projected by PCA vs. by TCA]

Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis


Page 34: Transfer defect learning

Preliminary Results using TCA

[Bar chart: F-measure (0 to 0.8) of Baseline, NoN, N1, N2, N3, and N4 for Safe → Apache and Apache → Safe]

*Baseline: cross-project defect prediction without TCA and normalization

Prediction performance of TCA varies according to different normalization options!

Page 35: Transfer defect learning

Approaches of Transfer Defect Learning

• Normalization: data preprocessing for training and test data
• TCA (Transfer Component Analysis): a state-of-the-art transfer learning algorithm
• TCA+: TCA adapted for cross-project defect prediction, plus decision rules to select a suitable data normalization option

Page 36: Transfer defect learning

TCA+: Decision rules

• Find a suitable normalization option for TCA

• Steps
  – #1: Characterize a dataset
  – #2: Measure similarity between source and target datasets
  – #3: Apply decision rules

Page 37: Transfer defect learning

#1: Characterize a dataset

[Diagram: Dataset A and Dataset B, each shown as a set of instances with the pairwise Euclidean distances d_ij (d1,2, d1,3, d1,5, d2,6, d3,11, …) between them]

DIST_A = {d_ij : 1 ≤ i < n, 1 < j ≤ n, i < j}

Page 38: Transfer defect learning

#2: Measure Similarity between source and target

• Minimum (min) and maximum (max) values of DIST

• Mean and standard deviation (std) of DIST

• The number of instances
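A small Python sketch of this characteristic vector follows; pairwise Euclidean distance for d_ij matches the previous slide, while the function and field names are illustrative.

import numpy as np
from itertools import combinations

def characterize(X):
    # Characteristic vector of a dataset: statistics over all pairwise
    # Euclidean distances d_ij, plus the number of instances.
    X = np.asarray(X, float)
    dist = [float(np.linalg.norm(a - b)) for a, b in combinations(X, 2)]
    return {"min": min(dist), "max": max(dist),
            "mean": float(np.mean(dist)), "std": float(np.std(dist)),
            "n": len(X)}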

Page 39: Transfer defect learning

#3: Decision Rules

• Rule #1
  – Mean and std are the same → NoN

• Rule #2
  – Max and min are different → N1 (max=1, min=0)

• Rules #3, #4
  – Std and # of instances are different → N3 or N4 (src/tgt mean=0, std=1)

• Rule #5
  – Default → N2 (mean=0, std=1)
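Wiring the rules together gives a sketch like the one below; the slide does not give the tolerances behind "same" and "different", so the tol-based test and the N3/N4 tie-break here are illustrative assumptions, not the paper's exact rules.

def select_normalization(src_c, tgt_c, tol=0.1):
    # Pick a normalization option from two characteristic vectors as
    # returned by characterize(). The tolerance-based notion of
    # "same"/"different" and the N3/N4 tie-break are assumptions.
    same = lambda a, b: abs(a - b) <= tol * max(abs(a), abs(b), 1e-9)
    if same(src_c["mean"], tgt_c["mean"]) and same(src_c["std"], tgt_c["std"]):
        return "NoN"                                       # Rule #1
    if not same(src_c["max"], tgt_c["max"]) and not same(src_c["min"], tgt_c["min"]):
        return "N1"                                        # Rule #2
    if not same(src_c["std"], tgt_c["std"]) and src_c["n"] != tgt_c["n"]:
        return "N3" if src_c["n"] > tgt_c["n"] else "N4"   # Rules #3, #4
    return "N2"                                            # Rule #5: default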

Page 40: Transfer defect learning

EVALUATION


Page 41: Transfer defect learning

Experimental Setup

• 8 software subjects

• Machine learning algorithm
  – Logistic regression

ReLink (Wu et al.@FSE`11)
  Projects: Apache, Safe, ZXing
  # of metrics (features): 26 (source code)

AEEEM (D'Ambros et al.@MSR`10)
  Projects: Apache Lucene (LC), Equinox (EQ), Eclipse JDT, Eclipse PDE UI, Mylyn (ML)
  # of metrics (features): 61 (source code, churn, entropy, …)

Page 42: Transfer defect learning

Experimental Design

[Diagram: a project's data split into a training set (50%) and a test set (50%)]

Within-project defect prediction

Page 43: Transfer defect learning

Experimental Design

[Diagram: Source project (training set) → Target project (test set)]

Cross-project defect prediction

Page 44: Transfer defect learning

Experimental Design

[Diagram: Source project (training set) → TCA/TCA+ → Target project (test set)]

Cross-project defect prediction with TCA/TCA+
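Putting the pieces together, an end-to-end sketch of this pipeline might look as follows; logistic regression and the F-measure come from the experimental setup, while normalize, characterize, select_normalization, and tca are the illustrative sketches from earlier slides, not the authors' released code.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def cross_project_tca_plus(Xs, ys, Xt, yt):
    # Train on a source project and predict defects in a target project:
    # choose a normalization option, apply TCA, then fit and score.
    option = select_normalization(characterize(Xs), characterize(Xt))
    Xs_n, Xt_n = normalize(Xs, Xt, option)
    Zs, Zt = tca(Xs_n, Xt_n, dim=5)
    model = LogisticRegression(max_iter=1000).fit(Zs, ys)
    return f1_score(yt, model.predict(Zt), pos_label="Buggy")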

Page 45: Transfer defect learning

RESULTS


Page 46: Transfer defect learning

ReLink Result

[Bar chart: F-measure (0 to 0.8) of Baseline, TCA, TCA+, and Within for Safe → Apache, Apache → Safe, and Safe → ZXing]

*Baseline: cross-project defect prediction without TCA/TCA+

Page 47: Transfer defect learning

ReLink Result (F-measure)

Source → Target    Baseline   TCA    TCA+   Within (target → target)
Safe → Apache      0.52       0.64   0.64   0.64
ZXing → Apache     0.69       0.64   0.72   0.64
Apache → Safe      0.49       0.72   0.72   0.62
ZXing → Safe       0.59       0.70   0.64   0.62
Apache → ZXing     0.46       0.45   0.49   0.33
Safe → ZXing       0.10       0.42   0.53   0.33
Average            0.49       0.59   0.61   0.53

*Baseline: cross-project defect prediction without TCA/TCA+

Page 48: Transfer defect learning

AEEEM Result

[Bar chart: F-measure (0 to 0.7) of Baseline, TCA, TCA+, and Within for JDT → EQ, PDE → LC, and PDE → ML]

*Baseline: cross-project defect prediction without TCA/TCA+

Page 49: Transfer defect learning

AEEEM Result (F-measure)

Source → Target    Baseline   TCA    TCA+   Within (target → target)
JDT → EQ           0.31       0.59   0.60   0.58
LC → EQ            0.50       0.62   0.62   0.58
ML → EQ            0.24       0.56   0.56   0.58
PDE → LC           0.33       0.27   0.33   0.37
EQ → ML            0.19       0.62   0.62   0.30
JDT → ML           0.27       0.56   0.56   0.30
LC → ML            0.20       0.58   0.60   0.30
PDE → ML           0.27       0.48   0.54   0.30
Average            0.32       0.41   0.41   0.42

Page 50: Transfer defect learning

Threats to Validity

• Systems are open-source projects.

• Experimental results may not be generalizable.

• Decision rules in TCA+ may not be generalizable.

Page 51: Transfer defect learning

Future Work

• Transfer defect learning across different feature spaces
  – e.g., ReLink → AEEEM, AEEEM → ReLink

• Local models using transfer learning

• Adapting transfer learning to other software engineering (SE) problems
  – e.g., knowledge from mailing lists → bug triage

Page 52: Transfer defect learning

Conclusion

• TCA+
  – TCA: makes the distributions of source and target similar
  – Decision rules to improve TCA
  – Significantly improved cross-project defect prediction performance

• Transfer learning in SE
  – Transfer learning may benefit other prediction and recommendation systems in SE domains.

