Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)

Date post: 02-Jul-2015
Description:
Yida's FSE presentation.

Yida Tao, Jindae Kim, Sunghun Kim

Dept. of CSE, The Hong Kong University of Science and Technology

Chang Xu

State Key Lab for Novel Software Technology, Nanjing University

Automatic Program Repair

• Promising research progress
  • ClearView [1]: Prevent all 10 Firefox exploits
  • GenProg [2]: Fix 55/105 real bugs

[1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09
[2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12

“It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.”

- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code


#what-could-possibly-go-wrong

• Blackbox repair
• Increasing maintenance cost
• Vulnerable to attack

#program-out-of-control

- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
- A human study of patch maintainability. ISSTA’12
- Automatic patch generation learned from human-written patches. ICSE’13

Use automatically generated patches as debugging aids


Our Human Study

• Investigate the usefulness of generated patches as debugging aids

• Discuss the impact of patch quality on debugging performance

• Explore practitioners’ feedback on adopting automatic program repair


Methodology


Study setup: participants debug bugs, and a debugging aid is given to participants.

Debugging aid, one of:
• Low-quality generated patch
• High-quality generated patch
• Buggy method location

95 Participants

• Grad: 44 CS graduate students
• Engr: 28 industrial software engineers
• MTurk: 23 Amazon Mechanical Turk workers

44 Graduate students

• Between-group design
  • Three groups of 14, 15, and 15 students, one group per debugging aid (low-quality generated patch, high-quality generated patch, buggy method location)
• Onsite setting
  • Eclipse IDE
  • Supervised session

Remote participants (28 Engr + 23 MTurk)

• Within-group design across the three debugging aids (low-quality generated patch, high-quality generated patch, buggy method location)
• Online debugging system

Bug Selection Criteria

• Real bugs

• The bug has accepted patches written by developers

• Proper number of bugs

• The bug has generated patches with different quality

Automatic patch generation learned from human-written patches. Kim et al. ICSE’13

Auto-generated patch A (High-Quality Patch; avg. ranking from 85 devs and students: 1.6)

    for (int i = 0; i < parenCount; i++) {
        SubString sub = (SubString) parens.get(i);
        if (sub != null) {
            args[i+1] = sub.toString();
        }
    }

Auto-generated patch B (Low-Quality Patch; avg. ranking from 85 devs and students: 2.8)

    for (int i = 0; i < parenCount; i++) {
        SubString sub = (SubString) parens.get(i);
        args[parenCount+1] = new Integer(reImpl.leftContext.length);
    }

Participants submit 337 patches as their debugging outcome

# submitted patches w.r.t. debugging aid
Location: 109
LowQ: 112
HighQ: 116

# submitted patches w.r.t. bugs
Bug1: 66
Bug2: 74
Bug3: 59
Bug4: 76
Bug5: 62
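
As a quick consistency check on the counts above, the per-aid and per-bug breakdowns should each add up to the 337 submitted patches. A minimal Java sketch follows; the class and method names are illustrative and not part of the study.

```java
import java.util.Arrays;

final class SubmissionTotals {
    // Counts transcribed from the slide.
    static final int[] BY_AID = {109, 112, 116};      // Location, LowQ, HighQ
    static final int[] BY_BUG = {66, 74, 59, 76, 62}; // Bug1 .. Bug5

    /** Sums one breakdown of submitted patches. */
    static int total(int[] counts) {
        return Arrays.stream(counts).sum();
    }

    public static void main(String[] args) {
        System.out.println(total(BY_AID)); // prints 337
        System.out.println(total(BY_BUG)); // prints 337
    }
}
```

Both breakdowns partition the same set of submissions, so matching totals confirm that no submission was lost or double-counted in either view.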

Evaluation of debugging performance

Measures: correctness, debugging time, survey feedback

Patch Correctness

• Passing test cases
• Matching the semantics of original accepted patches
• 3 evaluators

Debugging Time

• Eclipse Plug-in
• Website Timer

Multiple Regression Analysis

• Independent variables
  • Debugging aids
  • Bugs
  • Participant types
  • Programming experience

correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4

debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4

Here x1 to x4 stand for the four independent variables listed above.
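
One way to read the debugging-aid term of the correctness model is through dummy coding. The sketch below assumes that Location is the baseline level and that LowQ and HighQ each get an indicator variable. The slides do not state the coding scheme or the exact model family, so this is an illustration only. The two coefficients are the ones reported on the results slides of this deck; the class, enum, and method names are hypothetical.

```java
final class AidCoding {
    enum Aid { LOCATION, LOW_Q, HIGH_Q }

    // Coefficients reported on the results slides for the correctness model.
    static final double COEF_HIGH_Q = 1.25;
    static final double COEF_LOW_Q = -0.55;

    /**
     * Contribution of the debugging-aid term to the model's predictor.
     * Assumption: Location is the baseline, so it contributes 0.
     */
    static double aidTerm(Aid aid) {
        switch (aid) {
            case HIGH_Q: return COEF_HIGH_Q;
            case LOW_Q:  return COEF_LOW_Q;
            default:     return 0.0; // assumed baseline level
        }
    }

    public static void main(String[] args) {
        for (Aid aid : Aid.values()) {
            System.out.println(aid + " -> " + aidTerm(aid));
        }
    }
}
```

Under this reading, a positive coefficient means the aid raises the predicted correctness relative to the baseline, and a negative one lowers it.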


Post-study Survey

• Helpfulness of debugging aids

• Difficulty of bugs

• Opinions on using generated patches as debugging aids


Results

1. High-quality patches significantly improve debugging correctness

% of correct patches: Location 48%, LowQ 33%, HighQ 71%

HighQ: positive coefficient = 1.25, p-value = 0.00 < 0.05

2. Low-quality patches can undermine debugging correctness

% of correct patches: LowQ 33% versus Location 48%

LowQ: negative coefficient = -0.55, p-value = 0.09

3. High-quality patches are more useful for difficult bugs

[Figure: % of correct patches for Location, LowQ, and HighQ on Bug1 to Bug5, shown against bug difficulty]

Bug1: Math-280
Bug2: Rhino-114493
Bug3: Rhino-192226
Bug4: Rhino-217379
Bug5: Rhino-76683

4. The type of debugging aid does not affect debugging time

[Figure: debugging time (min) for Location, LowQ, and HighQ]

5. Other factors’ impact on debugging performance

• Difficult bugs significantly slow down debugging
• Engr and MTurk are more likely to debug correctly
• Novices tend to benefit more from HighQ patches

6. Participants consider high-quality generated patches much more helpful than low-quality patches

[Figure: helpfulness of debugging aids, rated Very helpful / Helpful / Medium / Slightly helpful / Not helpful, for low-quality and high-quality generated patches]

Mann-Whitney U test: p-value = 0.001
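
For readers unfamiliar with the Mann-Whitney U test, the sketch below computes the U statistic for two groups of ordinal ratings, using average ranks for ties. The ratings in `main` are made up for illustration (1 = Not helpful, 5 = Very helpful) and are not the study's data; the code does not compute a p-value, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class MannWhitney {
    /** U statistic of sample a against sample b: rank sum of a minus n_a(n_a+1)/2. */
    static double uStatistic(double[] a, double[] b) {
        double[] all = new double[a.length + b.length];
        System.arraycopy(a, 0, all, 0, a.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        Arrays.sort(all);

        double rankSumA = 0.0;
        for (double v : a) {
            rankSumA += averageRank(all, v);
        }
        return rankSumA - a.length * (a.length + 1) / 2.0;
    }

    /** 1-based rank of v in the sorted pooled sample; tied values share their mean rank. */
    static double averageRank(double[] sorted, double v) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        return (first + 1 + last + 1) / 2.0;
    }

    public static void main(String[] args) {
        double[] highQ = {5, 4, 4, 3}; // hypothetical ratings
        double[] lowQ = {2, 3, 1, 2};  // hypothetical ratings
        System.out.println(uStatistic(highQ, lowQ)); // prints 15.5
        System.out.println(uStatistic(lowQ, highQ)); // prints 0.5
    }
}
```

The two U values always sum to the product of the group sizes (here 4 × 4 = 16); a U close to that maximum for one group indicates that its ratings are mostly higher than the other group's.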


Feedback


Quick starting point

• Point to the buggy area
• Brainstorm

“They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”

Confusing, incomplete, misleading

• Wrong lead, especially for novices
• Require further human perfection

“Generated patches would be good at recognizing obvious problems”
“…but may not recognize more involved defects.”

“Generated patches simplify the problem”
“…but they may over-simplify it by not addressing the root cause.”

“I would use generated patches as debugging aids, as they provide extra diagnostic information”
“…along with access to standard debugging tools.”

Threats to Validity

• Bugs and generated patches may not be representative
• Quality measure of generated patches may not generalize
• May not generalize to domain experts
• Possibility of blindly reusing generated patches
  • Mitigation: remove patches submitted in less than 1 minute
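
The one-minute rule above can be sketched as a simple filter over recorded debugging times. This is an illustration under assumptions: the slides do not describe how times were stored, so the sketch takes them as seconds in a `long[]`, and the class, constant, and method names are hypothetical.

```java
import java.util.Arrays;

final class QuickSubmissionFilter {
    // Threshold from the slide: submissions made in less than 1 minute are removed.
    static final long MIN_SECONDS = 60;

    /** Keeps only debugging times of at least one minute. */
    static long[] dropTooFast(long[] debuggingTimesSeconds) {
        return Arrays.stream(debuggingTimesSeconds)
                     .filter(t -> t >= MIN_SECONDS)
                     .toArray();
    }

    public static void main(String[] args) {
        long[] times = {30, 300, 59, 60}; // made-up example times in seconds
        System.out.println(Arrays.toString(dropTooFast(times))); // prints [300, 60]
    }
}
```

A submission of exactly 60 seconds is kept here, since the slide only excludes patches submitted in less than one minute.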

Takeaway

• Auto-generated patches can be useful as debugging aids
  • Participants fix bugs more correctly with auto-generated patches
• Quality control is required
  • Participants’ debugging correctness is compromised with low-quality generated patches
• Maximize the benefits
  • Difficult bugs
  • Novice developers