
A Case Study of Bias in Bug-Fix Datasets

Date post: 15-Apr-2017
Upload: sailqu
SAIL, School of Computing, Queen’s University, Kingston, Canada. A Case Study of Bias in Bug-Fix Datasets. Thanh H. D. Nguyen, Bram Adams, Ahmed E. Hassan.
Transcript
Page 1: A Case Study of Bias in Bug-Fix Datasets

SAIL, School of Computing, Queen’s University, Kingston, Canada

A Case Study of Bias in Bug-Fix Datasets

Thanh H. D. Nguyen, Bram Adams, Ahmed E. Hassan

Page 2: A Case Study of Bias in Bug-Fix Datasets

We need bug prediction

• Problem:
  • Quality-improvement resources are limited.
• Solution:
  • Bug prediction identifies defect-prone modules.

Our focus is data quality

Page 3: A Case Study of Bias in Bug-Fix Datasets

What if there is sample bias?

Page 4: A Case Study of Bias in Bug-Fix Datasets

We should consider bias in our studies

Stanford graduate student housing survey

Page 5: A Case Study of Bias in Bug-Fix Datasets
Page 6: A Case Study of Bias in Bug-Fix Datasets

Linkage bias [Bird et al. 2009]

Unlinked bugs have:
• Higher severity
• Less experienced developers
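One way to check a dataset for the linkage bias described above is to split bug reports into those referenced by a commit message and those that are not, then compare a property such as severity across the two groups. The sketch below is only an illustration under stated assumptions: the regular expression, the record fields (`id`, `severity`) and all the toy records are hypothetical and do not come from the study.

```python
import re
from collections import Counter

# Hypothetical convention: commit messages mention a bug as "#123" or "bug 123".
BUG_REF = re.compile(r"(?:#|bug\s+)(\d+)", re.IGNORECASE)


def linked_bug_ids(commit_messages):
    """Collect every bug ID that at least one commit message refers to."""
    ids = set()
    for message in commit_messages:
        ids.update(int(match) for match in BUG_REF.findall(message))
    return ids


def high_severity_share(bugs, linked_ids):
    """Map True (linked) / False (unlinked) to the fraction of high-severity bugs."""
    totals, high = Counter(), Counter()
    for bug in bugs:
        is_linked = bug["id"] in linked_ids
        totals[is_linked] += 1
        high[is_linked] += bug["severity"] == "high"
    return {group: high[group] / totals[group] for group in totals}


# Made-up toy records, for illustration only.
commits = ["Fix #1: null check", "Refactor parser", "bug 3 resolved"]
bugs = [
    {"id": 1, "severity": "low"},
    {"id": 2, "severity": "high"},
    {"id": 3, "severity": "high"},
    {"id": 4, "severity": "high"},
    {"id": 5, "severity": "low"},
]

linked = linked_bug_ids(commits)
shares = high_severity_share(bugs, linked)
print("linked bug IDs:", sorted(linked))
print("high-severity share, linked vs. unlinked:", shares[True], shares[False])
```

A large gap between the two shares would suggest that the linked subset is not a representative sample of all fixed bugs. A real analysis would also need a significance test and far more data than this toy input.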

Page 7: A Case Study of Bias in Bug-Fix Datasets


Page 8: A Case Study of Bias in Bug-Fix Datasets

Tagging bias: about 2/3 of all bug reports are not defects [Antoniol et al. 2008].

Page 9: A Case Study of Bias in Bug-Fix Datasets

Biases are threats to the validity of software quality studies

• Because of linkage bias, our models:
  • neglect higher-severity bugs.
  • neglect less experienced developers.
• Because of tagging bias, our models:
  • inaccurately count more bugs than actually existed.

Do biases really exist? How do biases affect our research?

Page 10: A Case Study of Bias in Bug-Fix Datasets


Page 11: A Case Study of Bias in Bug-Fix Datasets


Page 12: A Case Study of Bias in Bug-Fix Datasets

Near-ideal data:
• Linkage is enforced.
• Tagging is provided.

Page 13: A Case Study of Bias in Bug-Fix Datasets

Severity          ✔  −
Experience        ✔  ✔
Maturity          −  ✔
Release pressure  −  −
Collaboration     −  −

Conjecture: Biases are properties of the software process, not of missing links.

Do linkage biases exist in Jazz?

Page 14: A Case Study of Bias in Bug-Fix Datasets


Severity

Experience

Maturity

Release pressure

Collaboration

✔✔−−

Question: How do tagging biases affect our research?

Do tagging biases exist in Jazz?

Page 15: A Case Study of Bias in Bug-Fix Datasets

How do tagging biases affect our research?

Files  Defects + Tasks  Defects only
A      5                3
B      4                4
C      6                4
D      1                1

Defects only: not biased, which we should use.
Defects + Tasks: biased, which we normally use.

Spearman: .94
Pearson: .97

Conjecture: It might be OK to use biased data.
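The comparison on this slide rests on correlating per-file counts from the two datasets. As a minimal sketch, the code below computes both coefficients in plain Python. It assumes the slide's illustrative counts read as 5, 4, 6, 1 (defects + tasks) and 3, 4, 4, 1 (defects only) for files A to D. The function names are hypothetical, and ties are given average ranks, which is a common convention rather than something the slide specifies. The .94 and .97 on the slide are presumably computed on the study's full data, so this four-file toy input is not expected to reproduce them.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy per-file counts for files A, B, C, D as read from the slide's example.
defects_and_tasks = [5, 4, 6, 1]
defects_only = [3, 4, 4, 1]

print("Pearson: ", round(pearson(defects_and_tasks, defects_only), 2))
print("Spearman:", round(spearman(defects_and_tasks, defects_only), 2))
```

A high rank correlation means that both datasets order the files in nearly the same way, which is what matters when the goal is to rank modules by defect-proneness.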

Page 16: A Case Study of Bias in Bug-Fix Datasets
