Open Source Java Bug Study: Understanding where help is needed Tim Halloran SSSG 6 Nov 2003 Carnegie Mellon University


Motivation: Why study open source Java bugs?

Technology: Chains of evidence (CoE), extra-linguistic program assurance
• Bureaucratic (mechanical) properties (e.g., Lock, Uniqueness)
• Fluid Assurance Tool

[Diagram: “Where is help needed” + “What can be assured” → positive impact]

Question: Can this have a positive impact on practice?

Approach: Study the defect reports on, and the code changes (fixes) made to, widely deployed open source Java projects

Goal: Determine empirically how useful CoE is, i.e., how common and how “costly” the defects that CoE could help prevent are


This talk
• Methodology
• Data collection
  • Selected Java projects
  • Tool data and limitations (and solutions)
• Variable creation
• Variable reduction
• Summary
• Questions & discussion


Methodology

(1) Bug selection: the focus today
• Data collection
• Variable creation
• Variable reduction/exploratory data analysis

(2) Expert analysis
• Sampling
• Inter-rater reliability?
• Expert judgment: Is the bug/fix bureaucratic? Could CoE have helped? Semantic category? Do we understand the bug/fix?
• Develop definitions (bureaucratic); example…

Results analysis


Data collection: 3 Java projects investigated

• Ant (64 kSLOC): A Java-based build tool
• Struts (40 kSLOC): Framework for building Java web applications based on a variation of the classic MVC design paradigm
• Tomcat (65 kSLOC Java): The official reference implementation of the Java Servlet and JavaServer Pages technologies (web server)

Selection: Widely used Java software (external validity?)


Data collection: Tool data used

Software defect (“bug”) data: off-line copy of the Apache Software Foundation (ASF) Bugzilla MySQL database
• Ant: 2,230 bugs (7-Sep-00 to 16-May-03)
• Struts: 1,473 bugs (19-Oct-00 to 16-May-03)
• Tomcat: 4,052 bugs (26-Aug-00 to 16-May-03)

Code changes: CVS commit logs
• Ant: 9,565 commits (13-Jan-00 to 4-Jun-03)
• Struts: 3,610 commits (31-May-00 to 4-Jun-03)
• Tomcat: 14,833 commits (10-Oct-99 to 4-Jun-03)


Data collection: Limitations of ASF tool data

Sources to link: Bugzilla bugs and CVS commit logs

Goal: Link each bug to the code changes made for it, adding code change information to the bug information

Problems
• No link from bug to commits
• Informal links from commits to bugs
• Informal identity management


Problem: No link from bug to commits


Data examples

------------------ CVS commit log 1272 at 2001-02-01 15:37:28 by Nico Seessle ------------------
Fixed Bug #378.
ExecuteOn (and Apply) have a default-value of false for their parallel-attribute.

Problem: Informal links from commits to bugs

Problem: Informal identity management

[email protected]@[email protected]@daedelus.apache.org

Commit Email Real name Bugzilla Id

Craig R. McClanahan [email protected]

[email protected]@[email protected]

Rob [email protected]


Solution, step 1: Manual identity determination

• Manually built up the identity of each project committer: 99 individuals identified
• Sources used: ASF web pages; Google, etc.; dates of actions; project mailing lists (headers noting real names)
• Very manual, but high confidence in the links: an “anchor” for linking bugs to commit logs
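The manual identity work amounts to building an alias table in which every identifier a committer uses (commit id, email address, real name, Bugzilla Id) maps to one person. The following is a minimal sketch of such a table, assuming string identifiers and case-insensitive matching. The class and method names (`IdentityRegistry`, `addAlias`, `resolve`) and the example aliases are hypothetical and are not taken from the study's tooling.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical alias table: every known identifier for a person maps to one canonical name. */
public class IdentityRegistry {
    private final Map<String, String> aliasToPerson = new HashMap<>();

    /** Normalize aliases so that case and surrounding whitespace do not matter. */
    private static String key(String alias) {
        return alias.trim().toLowerCase(Locale.ROOT);
    }

    /** Record that an alias (commit id, email, real name, or Bugzilla Id) belongs to a person. */
    public void addAlias(String person, String alias) {
        String previous = aliasToPerson.putIfAbsent(key(alias), person);
        // A manually built table should never map one alias to two people; treat that as an error.
        if (previous != null && !previous.equals(person)) {
            throw new IllegalArgumentException("alias " + alias + " already assigned to " + previous);
        }
    }

    /** Canonical person for an alias, or empty if the alias is unknown. */
    public Optional<String> resolve(String alias) {
        return Optional.ofNullable(aliasToPerson.get(key(alias)));
    }
}
```

For example, after `addAlias("Jane Developer", "jdev")` and `addAlias("Jane Developer", "jane@example.org")`, the call `resolve("JANE@example.org")` returns `Optional[Jane Developer]`. Rejecting conflicting aliases, rather than silently overwriting them, reflects the goal of a high-confidence “anchor”: a conflict signals a mistake that a researcher should resolve.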


Solution, step 2: Semi-automated linking of bugs to commits

Wrote Java code to assist in linking CVS commits to individual Bugzilla bugs:
• Extracts all numbers from the CVS commit log
• Checks whether each number is a bug for the project; the matches become the set of possible (extracted) bugs
• Checks whether the commit falls within the lifetime of the bug
• Checks whether the committer was “involved” with the bug; bugs passing both checks become the inferred set

If the extracted set matches the inferred set, the entry is made automatically; otherwise the researcher is shown all the information and asked to correct the inferred set (if necessary).
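The checks above can be sketched in Java as follows. This is a minimal illustration, not the study's code. The `Bug` and `Commit` records, the method names, and the exact lifetime rule (commit time between creation and resolution, inclusive, with no tolerance) are assumptions. Committer names are also assumed to be already resolved to one canonical name per person. Records require Java 16 or later.

```java
import java.time.LocalDateTime;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the commit-to-bug linking checks. */
public class BugLinker {
    /** Minimal bug view: id, lifetime (resolved is null while open), and people involved. */
    public record Bug(int id, LocalDateTime created, LocalDateTime resolved, Set<String> involved) {}

    /** Minimal commit view; committer is assumed to be a canonical person name. */
    public record Commit(String committer, LocalDateTime time, String log) {}

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    /** Extracted set: every number in the log that is a bug id for this project. */
    public static SortedSet<Integer> extractedSet(Commit commit, Map<Integer, Bug> bugs) {
        SortedSet<Integer> result = new TreeSet<>();
        Matcher m = NUMBER.matcher(commit.log());
        while (m.find()) {
            String digits = m.group();
            if (digits.length() > 9) continue; // too long to be a bug id; also avoids int overflow
            int n = Integer.parseInt(digits);
            if (bugs.containsKey(n)) result.add(n);
        }
        return result;
    }

    /** Inferred set: extracted bugs whose lifetime contains the commit and that involve the committer. */
    public static SortedSet<Integer> inferredSet(Commit commit, Set<Integer> extracted, Map<Integer, Bug> bugs) {
        SortedSet<Integer> result = new TreeSet<>();
        for (int id : extracted) {
            Bug bug = bugs.get(id);
            boolean afterCreation = !commit.time().isBefore(bug.created());
            boolean beforeResolution = bug.resolved() == null || !commit.time().isAfter(bug.resolved());
            if (afterCreation && beforeResolution && bug.involved().contains(commit.committer())) {
                result.add(id);
            }
        }
        return result;
    }

    /** Link automatically when both sets agree; otherwise a researcher must review. */
    public static boolean needsResearcher(Commit commit, Map<Integer, Bug> bugs) {
        SortedSet<Integer> extracted = extractedSet(commit, bugs);
        return !extracted.equals(inferredSet(commit, extracted, bugs));
    }
}
```

Applied to Struts bug 15799 from the automatic-link example (created 2003-01-04 15:12:17, RESOLVED 2003-02-06 00:36:48), Arron Bates's commit at 2003-02-05 16:26:11 with the log “Committed patch Bug15799, …” yields an extracted set of [15799]. It also yields an inferred set of [15799], so no researcher decision is needed. A log that mentions a non-bug number that happens to be a bug id, such as a result code, produces a mismatch and is routed to the researcher.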


Example: Automatic Link

"struts" bug 15799 found : created 2003-01-04 15:12:17
(15799) Bugzilla description: Nested tags picks up wrong bean for values
(15799) 2003-01-05 22:13:43 David Morris 4 1.0 Beta 3 1.1 Beta 3
(15799) 2003-02-04 21:03:34 James Mitchell 4 1.1 Beta 3 Nightly Build
(15799) 2003-02-05 02:40:54 James Turner 15 [email protected]
(15799) 2003-02-05 03:36:34 Ted Husted 4 Nightly Build 1.1 Beta 3
(15799) 2003-02-06 00:36:48 Arron Bates 8 NEW RESOLVED
(15799) 2003-02-06 00:36:48 Arron Bates 11 FIXED
------------------ CVS commit log 27541 at 2003-02-05 16:26:11 by Arron Bates ------------------
Committed patch Bug15799, reported and patched by David Morris.
IDEA also told me to remove a redundant class cast ( ...a fashionable thing to do it seems :)
Inferred set [15799] = [15799]

No decision required by researcher


Example: Manual Link

"tomcat" bug 207 found : created 2000-10-28 11:58:02
(207) Bugzilla description: mod_jk.conf-auto is not generated when tomcat is started BugRat Report#319
Not adding bug 207 to inferred set [:log time after bug lifetime:comitter not in bug group]
"tomcat" bug 660 found : created 2001-02-21 03:04:15
(660) Bugzilla description: Bad context on Authentication Form Page
Not adding bug 660 to inferred set [:log time after bug lifetime:comitter not in bug group]
"tomcat" bug 371 found : created 2000-12-22 20:24:31
(371) Bugzilla description: Webdav status code 207 not present in core/LocalStrings.properties BugRat Report#660
------------------ CVS commit log 13662 at 2001-03-15 12:15:21 by Marc Saegesser ------------------
Added 207 result code for WEBDAV.
PR: 660/Bugzilla 371
Submitted by: [email protected] (David F. Sklar)
Inferred set [371]
Link bug ids (c to clear)[207, 660, 371] 371

Decision required by researcher: 207 is a result code (not a bug reference), and 660 is the id from the pre-Bugzilla Jakarta bug system (BugRat).

MANUAL INPUT: the researcher enters 371


Noting and linking outside contributions: not done (yet)

• Linking contributions by non-committers to bug fixes (or enhancements) between CVS and Bugzilla
  • Committers often commit code changes contributed by non-committers
  • No standard approach in CVS logs to indicate such a contribution (informal references to known contributors)
  • Obscuring of email addresses (to fight spam) has hit open source logs, e.g., “Testcase submitted by: Martijn Kruithof <martijn at kruithof.xs4all.nl>”
• Linking contributor names to Bugzilla Ids would face the same issues noted for committers
  • Larger scale, and less “context” with which to manually build up a case linking identity to identifiers


Variable creation: Narrowing bug focus

Bugs | Ant | Struts | Tomcat | Total
total | 2,230 | 1,474 | 4,052 | 7,756
fixed | 886 | 711 | 1,302 | 2,899
w/java | 479 | 275 | 561 | 1,315

[Bar chart: total, fixed, and w/java bug counts for Ant, Struts, and Tomcat]

Open questions: Why the drop from total to fixed? From fixed to w/java?

Examined 20 bugs per project (60 total):

Project | Lost | Doc | Proc | Fixed-NIR
Ant | 1 | 13 | 3 | 3
Struts | 5 | 12 | 1 | 2
Tomcat | 9 | 1 | 2 | 8
Total | 15 | 26 | 6 | 13
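The drop-off questions can be made concrete by computing stage-to-stage retention from the narrowing table. The small sketch below uses only the counts shown above; the class and method names are illustrative. The Struts total is this table's 1,474 (the tool-data slide lists 1,473), which is the value consistent with the 7,756 total.

```java
import java.util.Locale;

/** Retention through the bug-narrowing funnel, using the counts from the narrowing table. */
public class BugFunnel {
    static final String[] PROJECTS = {"Ant", "Struts", "Tomcat"};
    // Per project: {total, fixed, w/java}
    static final int[][] COUNTS = {
        {2230, 886, 479},
        {1474, 711, 275},
        {4052, 1302, 561},
    };

    /** Sum of each stage over all projects: {total, fixed, w/java}. */
    static int[] columnTotals() {
        int[] sums = new int[3];
        for (int[] row : COUNTS) {
            for (int k = 0; k < 3; k++) sums[k] += row[k];
        }
        return sums;
    }

    static double percent(int part, int whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        for (int p = 0; p < PROJECTS.length; p++) {
            int[] c = COUNTS[p];
            System.out.printf(Locale.ROOT, "%-6s fixed/total = %.1f%%  w/java/fixed = %.1f%%%n",
                PROJECTS[p], percent(c[1], c[0]), percent(c[2], c[1]));
        }
        int[] t = columnTotals();
        System.out.printf(Locale.ROOT, "%-6s fixed/total = %.1f%%  w/java/fixed = %.1f%%%n",
            "Total", percent(t[1], t[0]), percent(t[2], t[1]));
    }
}
```

Run as-is, it prints `Ant    fixed/total = 39.7%  w/java/fixed = 54.1%` for Ant. Across all three projects, 37.4% of reported bugs were fixed, and 45.4% of the fixed bugs had Java changes. Struts has the lowest w/java share (38.7%).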


Variable creation: Per-bug variables

Variable | Description | Use/Transformation
Ptotal | Total # of people involved | Ppublic_LN = Log(Ptotal - Pkey + 1)
Pkey | # of project committers involved | Pcommit_BI (BI: 0 = no; 1 = yes)
Pasf | # of ASF members involved | Pasf_BI (any?)
Dtotal | Duration in days | –
Dtotal_nonlater | Duration in days excluding any time in LATER status | Dtotal_nonlater_LN
STATUSchange | # of changes (any type) to bug | STATUSchange_BI (>2 total)
DUPcount | # of duplicate bugs reported to this one | DUPcount_BI (any duplicates)
COMMcount | # of comments posted | –
COMMsize | Size in characters of all comments | COMMsize_LN
ATTACHcount | # of attachments (patches, images) posted | ATTACHcount_BI (any?)
ATTACHsize | Size in bytes of all attachments | –
REOPENEDcount | # of times bug was reopened after being closed | –
PRIORITYFinal | Programmer-assigned priority (Low, Med, High, Other) | PRIORITY_BI (not Other)
PRIORITYchanges | # of times the priority of the bug was changed | –
SEVERITYFinal | Programmer-assigned severity (Enhancement, Minor, Normal, Major, Critical) | SEVERITY_BI (> Normal)
SEVERITYchanges | # of times the severity of the bug was changed | –
JavaSLOCcount | # of lines of Java changed for bug fix | JavaSLOCcount
JavaCUcount | # of Java files (compilation units) changed for bug fix | JavaCUCount
JavaPKcount | # of Java packages changed for bug fix | JavaPKCount

Slide legend (highlighting not preserved): subsets; non-normal
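Several of the transformations in the table are simple enough to state as code. The following is a minimal sketch under stated assumptions, not the study's implementation. "Log" is taken to be the natural log, based on the `_LN` suffix. LATER is assumed to appear as a state value in a time-ordered history of the bug. The names `BugVariables`, `StateChange`, and the method names are hypothetical. Records require Java 16 or later.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.List;

/** Hypothetical helpers for a few of the per-bug transformations. */
public class BugVariables {
    /** Severity values in the order listed on the slide. */
    public enum Severity { ENHANCEMENT, MINOR, NORMAL, MAJOR, CRITICAL }

    /** One entry of a bug's state history; histories are assumed sorted by time. */
    public record StateChange(LocalDateTime at, String newState) {}

    /** Binary indicator (_BI): 1 = yes, 0 = no. */
    public static int bi(boolean condition) {
        return condition ? 1 : 0;
    }

    /** Ppublic_LN = ln(Ptotal - Pkey + 1): non-committers involved; +1 maps "none" to 0. */
    public static double publicLn(int pTotal, int pKey) {
        return Math.log(pTotal - pKey + 1);
    }

    /** SEVERITY_BI: 1 when severity is above Normal (Major or Critical). */
    public static int severityBi(Severity s) {
        return bi(s.compareTo(Severity.NORMAL) > 0);
    }

    /** STATUSchange_BI: 1 when there were more than 2 changes in total. */
    public static int statusChangeBi(int statusChanges) {
        return bi(statusChanges > 2);
    }

    /** Dtotal_nonlater: days from creation to closing, minus any time spent in LATER. */
    public static double nonLaterDays(LocalDateTime created, LocalDateTime closed, List<StateChange> history) {
        Duration later = Duration.ZERO;
        LocalDateTime laterStart = null;
        for (StateChange c : history) {
            if ("LATER".equals(c.newState())) {
                if (laterStart == null) laterStart = c.at(); // entering LATER
            } else if (laterStart != null) {
                later = later.plus(Duration.between(laterStart, c.at())); // leaving LATER
                laterStart = null;
            }
        }
        if (laterStart != null) {
            later = later.plus(Duration.between(laterStart, closed)); // still LATER when closed
        }
        Duration open = Duration.between(created, closed).minus(later);
        return open.getSeconds() / 86_400.0;
    }
}
```

For example, a bug open for 10 days that spent days 2 through 5 in LATER has a `nonLaterDays` value of 7.0. The binary indicators collapse skewed counts into yes/no, and the log transforms compress long tails. This reading of the "non-normal" legend is an interpretation, not something the slide states.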


Variable reduction: (preliminary) principal components analysis (loadings in parentheses)

• Factor 1 (Public interest): Public_LN (0.7), COMMsize_LN (0.6), DUPcount_BI (0.6), STATUSchanges_BI (0.3)
• Factor 2 (Java code changed): JavaCUchange (0.9), JavaPKchange (0.8), JavaSLOCchange (0.7)
• Factor 3 (Committer interest): Pcommit_BI (0.9), Pasf_BI (-0.9)
• Factor 4 (Effort/Time): Dtotal_nonLATER_LN (0.7), PRIORITY_BI (0.7), STATUSchanges_BI (0.6), SEVERITY_BI (-0.3)

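Bugs by # of people involved who are:

# involved | Committers | ASF members | Either
0 | 449 bugs | 651 bugs | 2 bugs
1 | 643 bugs | 633 bugs | 901 bugs
2 | 183 bugs | 27 bugs | 325 bugs
3 | 29 bugs | 4 bugs | 65 bugs
4 | 11 bugs | – | 17 bugs
5 | – | – | 5 bugs

Each column sums to 1,315, the number of fixed bugs with Java changes (w/java).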
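The slide does not say which statistics package, extraction method, or rotation produced the four factors. The hand-rolled sketch below therefore shows only the core idea behind a loading such as “Pcommit_BI (0.9), Pasf_BI (-0.9)”: the unrotated first principal component of the correlation matrix, found by power iteration. The class and method names are hypothetical. The code assumes no constant columns, and the sign of a component is arbitrary.

```java
/** Sketch: unrotated first principal component of standardized variables via power iteration. */
public class FirstComponent {
    /** Pearson correlation matrix of the columns of data[row][column]. */
    public static double[][] correlation(double[][] data) {
        int n = data.length, p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        }
        double[] rss = new double[p]; // root of the sum of squared deviations, per column
        for (double[] row : data) {
            for (int j = 0; j < p; j++) rss[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
        }
        for (int j = 0; j < p; j++) rss[j] = Math.sqrt(rss[j]);
        double[][] r = new double[p][p];
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                double s = 0;
                for (double[] row : data) s += (row[a] - mean[a]) * (row[b] - mean[b]);
                r[a][b] = s / (rss[a] * rss[b]);
            }
        }
        return r;
    }

    /** Loadings of the largest component: unit eigenvector scaled by sqrt(eigenvalue). */
    public static double[] firstLoadings(double[][] r, int iterations) {
        int p = r.length;
        double[] v = new double[p];
        double norm0 = 0;
        for (int i = 0; i < p; i++) {
            v[i] = 1.0 + i; // uneven start, so it is unlikely to be orthogonal to the answer
            norm0 += v[i] * v[i];
        }
        for (int i = 0; i < p; i++) v[i] /= Math.sqrt(norm0);
        double lambda = 0;
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[p];
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) w[i] += r[i][j] * v[j];
            }
            double norm = 0;
            for (double x : w) norm += x * x;
            norm = Math.sqrt(norm);
            lambda = norm; // ||R v|| for unit v approaches the largest eigenvalue
            for (int i = 0; i < p; i++) v[i] = w[i] / norm;
        }
        double[] loadings = new double[p];
        for (int i = 0; i < p; i++) loadings[i] = v[i] * Math.sqrt(lambda);
        return loadings;
    }
}
```

Two perfectly anti-correlated indicators, such as a pair of yes/no columns that always disagree, produce loadings of equal magnitude and opposite sign. This is the same pattern as Factor 3. Published factor solutions are often rotated (e.g., varimax), so this sketch will not reproduce the slide's numbers without the study's data and method.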


Summary

• We have a reasonable set of “synthetic” measures of some of the important characteristics of bugs and their fixes
  • How “costly” they are in several dimensions (time, public interest, etc.)
• Next step: Identify, via expert judges, the bugs for which CoE would have been effective
  • Combined with the results so far, this will provide some understanding of how


Questions & Discussion

Questions?

Issues:
• Approach to study
• Definitions: bureaucratic (mechanical) vs. functional program properties
• NetBeans data

