
Defect prediction using social network analysis on issue repositories

Reporter: Dandan Wang
Date: 04/18/2011

Basic information

• Conference: ICSSP 2011
• Authors
  – Serdar Bicer
    • Gerger Consulting, Istanbul, Turkey
  – Ayse Basar Bener
    • Ryerson University, Ted Rogers School of Information Technology Management, Toronto, Canada
  – Bora Caglayan
    • Bogazici University, Department of Computer Engineering, Istanbul, Turkey

Outline

1. Introduction
2. Methodology
3. Results
4. Conclusion

Introduction

• Objective
  – Overcome ceiling effects of defect predictors.
• Research question
  – What is the benefit of social network metrics on issue repositories to predict defects?
• Metrics
  – Social network metrics
  – Churn metrics
• Method
  – Naive Bayes (learning-based prediction model)

Methodology

• Dataset
• Communication structure in projects
• Metrics used
• Defect prediction model
• Performance measures

Dataset

• RTC
  – Years: 2007 and 2008
  – Team: large distributed team that used the Jazz platform
  – Version control system, issue repository
• Drupal
  – Years: 2009-2010
  – Team: large distributed team
  – Public CVS repository, issue repository (bug reports, feature requests, and other tasks)

Data extraction process for datasets

• Nodes in the graphs represent developers who commented on each file.
• Files were labeled as defective if they were modified after the snapshot date.

Communication structure in projects

• The RTC and Drupal projects are similar to each other in communication structure.
• Commenting on issues is the main task-related communication used by contributors in both projects. If a commit in the version control system is related to an issue, the issue number is written in the commit message.
• The Jazz framework automatically creates a connection from an issue to a change set; this is not available in Drupal.
• Issues are assigned to and owned by contributors.
• Other project members express their opinions by commenting on issues.
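The link between commits and issues described above can be recovered from commit messages. The following is a minimal sketch, not the authors' extraction tooling: it assumes issue numbers appear as `#<digits>` in commit messages (the slides do not state the actual convention), and the `commits` and `issue_comments` structures are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: issue numbers appear as "#<digits>" in commit messages.
ISSUE_REF = re.compile(r"#(\d+)")

def commenters_per_file(commits, issue_comments):
    """Map each file to the set of people who commented on issues linked to it.

    commits: iterable of (message, [file paths]) pairs (hypothetical shape).
    issue_comments: dict mapping issue number -> list of commenter names.
    """
    result = defaultdict(set)
    for message, files in commits:
        for ref in ISSUE_REF.findall(message):
            people = issue_comments.get(int(ref), [])
            for path in files:
                result[path].update(people)
    return dict(result)

# Toy data, invented for illustration only.
commits = [
    ("Fix menu rebuild, issue #101", ["menu.inc"]),
    ("Refactor cache (#102)", ["cache.inc", "menu.inc"]),
    ("Update docs", ["README.txt"]),
]
issue_comments = {101: ["alice", "bob"], 102: ["bob", "carol"]}
graph_nodes = commenters_per_file(commits, issue_comments)
```

Files touched only by commits without an issue reference, such as `README.txt` here, get no entry and therefore no communication graph.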

Metrics used

• The first six metrics were used in previous studies [22, 33, 44, 42].
• Diameter, Clustering Coefficient, Bridge Rate, and Characteristic Path Length are new metrics.
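Three of the new metrics have standard graph-theoretic definitions and can be sketched in a few lines. The code below assumes an undirected, unweighted graph stored as a dictionary of neighbor sets, takes the characteristic path length as the mean shortest-path length over reachable node pairs, and uses the average local clustering coefficient. The paper's exact definitions may differ, and Bridge Rate is left out because the slides do not define it.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def diameter_and_cpl(adj):
    """Return (diameter, characteristic path length) over reachable pairs."""
    lengths = []
    for node in adj:
        dist = bfs_distances(adj, node)
        lengths.extend(d for other, d in dist.items() if other != node)
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)

def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes with < 2 neighbors count as 0."""
    total = 0.0
    for node, nbs in adj.items():
        k = len(nbs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0

# Toy developer graph, invented for illustration.
adj = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
diameter, cpl = diameter_and_cpl(adj)
cc = clustering_coefficient(adj)
```

For this toy graph the longest shortest path has two edges (for example, from alice to dave).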

Defect prediction model

• Metrics
  – Social network metrics on issue repositories
• Algorithm
  – Naive Bayes data mining algorithm
• Validation
  – 10×10-fold cross-validation to eliminate sampling bias
  – Cost-benefit analysis (Weka software)
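The validation scheme can be illustrated with a small stand-in. The study used Weka; the sketch below is not Weka's implementation but a plain Gaussian Naive Bayes with ten shuffled repetitions of 10-fold cross-validation. It reports accuracy for brevity rather than the measures used in the study, and the toy data is invented.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means, and feature variances."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(y), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior."""
    def log_post(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_post(model[label]))

def repeated_cross_validation(X, y, repeats=10, folds=10, seed=0):
    """Run `repeats` shuffled rounds of `folds`-fold CV; return fold accuracies."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        for f in range(folds):
            test_idx = indices[f::folds]
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            model = fit_gaussian_nb([X[i] for i in train_idx],
                                    [y[i] for i in train_idx])
            hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
            accuracies.append(hits / len(test_idx))
    return accuracies

# Two well-separated toy classes with two features each.
X = [[i % 5, 1.0 + 0.1 * (i % 3)] for i in range(20)] + \
    [[10 + i % 5, 5.0 + 0.1 * (i % 3)] for i in range(20)]
y = [0] * 20 + [1] * 20
scores = repeated_cross_validation(X, y)
```

Reshuffling before each of the ten rounds is what makes the folds differ between repetitions, so no single unlucky split dominates the averaged result.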

Performance measures

• Widely used performance measures
  – Probability of detection (pd)
  – Probability of false alarms (pf)
  – Balance: higher balance values are better because their (pd, pf) points are closer to the ideal point (1, 0).
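These measures can be computed directly from a confusion matrix. The formulas are not in the transcript, so this sketch assumes the definitions commonly used in the defect prediction literature: pd = TP / (TP + FN), pf = FP / (FP + TN), and balance as one minus the normalized Euclidean distance from (pd, pf) to the ideal point (1, 0).

```python
import math

def pd_pf_balance(tp, fn, fp, tn):
    """Return (pd, pf, balance) from confusion-matrix counts."""
    pd = tp / (tp + fn)   # probability of detection (recall)
    pf = fp / (fp + tn)   # probability of false alarm
    # Distance to the ideal point (pd=1, pf=0), scaled by its maximum, sqrt(2).
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return pd, pf, balance

# Hypothetical counts: 50 defective files, 150 defect-free files.
pd_value, pf_value, bal = pd_pf_balance(tp=40, fn=10, fp=30, tn=120)
```

With these counts pd is 0.8 and pf is 0.2, so the point lies equally far from the ideal on both axes.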

Cost-benefit analysis

Cost curve

• The cost curve was proposed by Drummond and Holte to address the shortcomings of ROC curves. It is a visualization technique that shows a classifier's performance based on the cost of misclassification.
  – X axis: PC(+), the probability cost of the positive class, which combines the two misclassification costs and the class distribution into a single value.
  – Y axis: NEC, the normalized expected cost, which denotes the error rate.
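The two axes can be computed as follows. The slides give only the axis names, so this sketch assumes the standard cost-curve definitions: PC(+) = p(+)·C(−|+) / (p(+)·C(−|+) + p(−)·C(+|−)) and NEC = (1 − pd)·PC(+) + pf·(1 − PC(+)). The parameter names are illustrative.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """PC(+): class prior and misclassification costs folded into one value.

    p_pos: prior probability of the positive (defective) class.
    cost_fn: cost of missing a defective file, C(-|+).
    cost_fp: cost of a false alarm, C(+|-).
    """
    pos = p_pos * cost_fn
    neg = (1 - p_pos) * cost_fp
    return pos / (pos + neg)

def normalized_expected_cost(pd, pf, pc_pos):
    """NEC for a classifier with detection rate pd and false alarm rate pf."""
    fn_rate = 1 - pd
    return fn_rate * pc_pos + pf * (1 - pc_pos)

# Hypothetical setting: 20% defective files, a miss costs 4x a false alarm.
pc = probability_cost(p_pos=0.2, cost_fn=4, cost_fp=1)
nec = normalized_expected_cost(pd=0.8, pf=0.2, pc_pos=pc)
```

Each classifier traces a straight line from (0, pf) to (1, 1 − pd) in this space, so a lower line means lower expected cost at that operating condition.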

Results

• Prediction performance analysis
• T-test analysis: statistically significant
• Cost-benefit analysis

Cost curves for datasets

Beneficial outcomes

• Compared to churn metrics, our proposed model either considerably decreases high false alarm rates without compromising detection rates, or considerably increases low prediction rates without compromising low false alarm rates. In both cases overall prediction performance increases, which in turn decreases verification costs. We therefore recommend that practitioners collect social network metrics on issue repositories.

• We can interpret this result as showing that the structure of information flow in a developer communication network has a significant effect on code quality. Since our metrics are directly related to the network's topology, this model can help managers build developer networks more efficiently.

• We used only a recent part of the developer communication history to construct our model. Communication between project members begins at the start of a project and continues until its end, but in this study we did not collect the full communication history. This matters for software teams that began recording developer communication after the project started, because our proposed model can also be used for these kinds of projects.

Conclusion

• Reason: communication and coordination between developers are important, but patterns of interaction between developers have not been investigated for defect prediction.
• The main contribution of this study is the use of a new data source and new metrics in the area of defect prediction.
• Performance analysis
  – Churn metrics, social network metrics
  – pd, pf, balance
• Cost-benefit analysis
  – Social network metrics on issue repositories reduced the costs required for verification of prediction results and moved the results closer to the cost-adverse region of the ROC curve.

Thank you!

Q&A
