Partitioning Composite Code Changes to Facilitate Code Review
Yida Tao and Sunghun Kim
The Hong Kong University of Science and Technology
Atomic Code Change
• Fixed bug #123
Composite Code Change
• Fixed bug #12, #34, #56
• Removed duplicate code
• Add a feature
• Javadoc updated
• Difficult to review
• Likely to be rejected
Research Questions
• RQ1: Are composite code changes prevalent?
• RQ2: Can we propose an approach to improve the semantic atomicity of composite code changes?
• RQ3: Can our approach help developers better review composite code changes?
RQ1: Occurrence of composite code changes
• Data source
  • 4 open-source Java projects
• Revisions that changed >= 2 lines of code
• Commit logs and source code were manually inspected
Project        Time period                Revisions   Avg. cLOC   Avg. files
Ant            2010/04/27 -- 2012/03/05   137         26.1        2.0
Commons Math   2011/11/28 -- 2012/04/12   107         84.7        3.5
Xerces         2008/11/03 -- 2012/03/13   116         63.6        3.0
JFreeChart     2008/07/02 -- 2010/03/30   93          144.9       4.1
Total revisions: 453
Project        1 issue   2 issues   > 2 issues
Ant            92%       7%         1%
Commons Math   82%       12%        6%
Xerces         82%       13%        5%
JFreeChart     71%       18%        11%
8% - 29% of revisions address multiple issues
Approach
• Input: a composite code change, viewed as a set of changed statements
• Identify related statements using three criteria
  • Formatting
  • Dependency
  • Similarity
• Output: a partition of the change (each subset is a change-slice)
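The grouping step can be pictured as finding connected components over a "related" relation between changed statements. The snippet below is only a minimal sketch of that idea, not the authors' implementation: the statement identifiers and the `related_pairs` input are hypothetical, and it assumes the formatting, dependency, and similarity checks have already produced pairs of related statements.

```python
def partition_change(changed_statements, related_pairs):
    """Group changed statements into change-slices.

    Two statements land in the same slice when a chain of related
    pairs connects them (connected components via union-find).
    """
    parent = {s: s for s in changed_statements}

    def find(s):
        # Walk up to the representative, halving the path on the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in related_pairs:
        parent[find(a)] = find(b)

    slices = {}
    for s in changed_statements:
        slices.setdefault(find(s), set()).add(s)
    return {frozenset(group) for group in slices.values()}


# Hypothetical statement ids, for illustration only.
statements = ["s1", "s2", "s3", "s4", "s5"]
pairs = [("s1", "s2"), ("s2", "s3")]
change_slices = partition_change(statements, pairs)
```

With these inputs, `s1`, `s2`, and `s3` form one change-slice, while `s4` and `s5` each form a slice of their own.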
Approach: Formatting
• Formatting changes are identified with two differencing tools
  • Unix diff (text differencing)
  • ChangeDistiller* (AST differencing)
*http://www.ifi.uzh.ch/seal/research/tools/changeDistiller.html
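One plausible reading of this step is that a line flagged by text differencing but unchanged at the syntax level is a formatting-only change. The sketch below illustrates that reading under clear assumptions: it does not call ChangeDistiller, it uses a crude token comparison as a stand-in for AST differencing, and the function names are hypothetical.

```python
import difflib
import re

TOKEN = re.compile(r"\w+|[^\w\s]")


def tokens(line):
    """Split a line into word tokens and single punctuation characters,
    ignoring whitespace. A crude stand-in for AST comparison."""
    return TOKEN.findall(line)


def formatting_only_changes(old_lines, new_lines):
    """Return (old, new) line pairs that differ as text but not as tokens."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only same-sized replacements can be paired up line by line.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
            if old != new and tokens(old) == tokens(new):
                pairs.append((old, new))
    return pairs


old = ["int x=1;", "return x;"]
new = ["int x = 1;", "return x + 1;"]
formatting_pairs = formatting_only_changes(old, new)
```

Here only the first line pair is reported, since the second pair changes the token sequence and not just the spacing.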
Approach: Dependency
• IBM T.J. Watson Libraries for Analysis (WALA)*
• Inter-procedural backward static slicing
*http://wala.sourceforge.net/wiki/index.php/Main_Page
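Backward slicing collects everything a statement transitively depends on. The sketch below shows the idea on a toy dependency graph; it does not use WALA, the graph and statement ids are hypothetical, and treating two changed statements as related when one lies in the other's backward slice is an assumption made for illustration.

```python
def backward_slice(depends_on, criterion):
    """Return every statement that `criterion` transitively depends on,
    including `criterion` itself."""
    seen = set()
    stack = [criterion]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(depends_on.get(node, ()))
    return seen


def dependency_related(depends_on, changed):
    """Pairs (a, b) of changed statements where b is in the backward slice of a."""
    pairs = []
    for a in sorted(changed):
        slice_a = backward_slice(depends_on, a)
        for b in sorted(changed):
            if b != a and b in slice_a:
                pairs.append((a, b))
    return pairs


# Hypothetical graph: each statement maps to the statements it depends on.
depends_on = {"s2": ["s1"], "s4": ["s2", "s3"], "s5": ["s3"]}
changed = {"s1", "s4", "s5"}
related = dependency_related(depends_on, changed)
```

In this toy graph, `s4` reaches `s1` through `s2`, so the two changed statements are reported as related, while `s5` depends only on the unchanged `s3`.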
Approach: Similarity
• Example: “protect array entries against corruption by returning a clone”
• Same change type
• Similar delta
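The two similarity criteria can be illustrated with a small sketch. Everything here is an assumption for illustration: the change-type label, the character-level delta, and the 0.8 threshold are hypothetical, and the actual approach may define change types and deltas differently.

```python
from difflib import SequenceMatcher


def text_delta(old, new):
    """Return (removed_text, added_text) between two statement strings."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.append(old[i1:i2])
        if tag in ("insert", "replace"):
            added.append(new[j1:j2])
    return "".join(removed), "".join(added)


def similar(change_a, change_b, threshold=0.8):
    """Changes are (change_type, old, new) tuples. They count as similar
    when the change types match and both delta parts are close enough."""
    type_a, old_a, new_a = change_a
    type_b, old_b, new_b = change_b
    if type_a != type_b:
        return False
    removed_a, added_a = text_delta(old_a, new_a)
    removed_b, added_b = text_delta(old_b, new_b)
    return (
        SequenceMatcher(None, removed_a, removed_b).ratio() >= threshold
        and SequenceMatcher(None, added_a, added_b).ratio() >= threshold
    )


# Hypothetical changes in the spirit of "returning a clone".
c1 = ("statement_update", "return entries;", "return entries.clone();")
c2 = ("statement_update", "return values;", "return values.clone();")
c3 = ("statement_update", "int n = 0;", "int n = 1;")
```

Under these assumptions `c1` and `c2` are similar because both only add a `.clone()` call, whereas `c3` has the same change type but a different delta.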
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• An automatic partition is considered acceptable if it exactly matches the manual partition
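The exact-match criterion can be made concrete by comparing partitions as sets of sets, so that neither the order of slices nor the order of statements inside a slice matters. This is a minimal sketch with hypothetical statement ids; how the evaluators actually recorded their partitions is not stated on the slide.

```python
def as_partition(groups):
    """Normalize an iterable of statement groups into a set of frozensets."""
    return frozenset(frozenset(group) for group in groups)


def is_acceptable(automatic, manual):
    """True only when both partitions contain exactly the same slices."""
    return as_partition(automatic) == as_partition(manual)


manual = [["s1", "s2"], ["s3"]]
same_slices_reordered = [["s3"], ["s2", "s1"]]
over_split = [["s1"], ["s2"], ["s3"]]
```

A reordered but identical partition is accepted, while one that splits `s1` and `s2` into separate slices is not.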
Project        Acceptable # / Total #
Ant            8 / 11
Commons Math   10 / 19
Xerces         16 / 21
JFreeChart     20 / 27
Total          54 / 78 (69%)
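The figures in this table can be cross-checked with a little arithmetic. The sketch below uses only numbers reported on the slides: acceptable and total counts per project, revision counts, and the rounded share of revisions that address more than one issue. The variable names are illustrative.

```python
# (acceptable, total) composite changes per project, from the table above.
results = {
    "Ant": (8, 11),
    "Commons Math": (10, 19),
    "Xerces": (16, 21),
    "JFreeChart": (20, 27),
}
accepted = sum(a for a, _ in results.values())
total = sum(t for _, t in results.values())
overall_rate = round(100 * accepted / total)

# Inspected revisions and rounded share (%) addressing 2 or more issues (RQ1).
revisions = {"Ant": 137, "Commons Math": 107, "Xerces": 116, "JFreeChart": 93}
multi_issue_share = {"Ant": 8, "Commons Math": 18, "Xerces": 18, "JFreeChart": 29}
estimated_composite = {
    name: round(revisions[name] * multi_issue_share[name] / 100)
    for name in revisions
}
```

The totals come to 54 of 78, or 69% after rounding, and the counts estimated from the RQ1 percentages (11, 19, 21, 27) agree with the per-project totals in the table.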
Ant revision 943068 (24 changed LOC)
“Wrong assignment after I renamed the parameter. Unfortunately there doesn’t seem to be a testcase that catches the error.”
Later fixed in revision 943070
One of the two change-slices after partitioning
Preliminary User Study
• RQ3
  • Can our automatic partition help developers better review composite changes?
• Participants
  • 18 CS graduate students
• Task
  • Participants review 12 composite code changes
  • Answer a series of code review questions [1], e.g.,
    • “What is the consequence of removing the schemaType field?”
    • “What do changes in these files have in common?”
[1] “Questions programmers ask during software evolution tasks” Sillito et al. FSE 2006
Experimental Settings
• Treatments
  • Control group: review code changes by file
  • Experimental group: review code changes by partition
Results
[Figure: two plots, each comparing the "By file" and "By partition" groups; p = 0.01 for the first plot and p = 0.81 for the second]
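Summary
• Composite code changes: 8% - 29% of revisions address multiple issues
• Partition of the change (formatting, dependency, similarity): 69% of automatic partitions acceptable
• User study: review by file vs. review by partition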
Discussion
• Impact of unsatisfactory change partitions
• Balancing partition costs and benefits
Related Work
• Helping developers help themselves: Automatic decomposition of code review changesets. Barnett et al. ICSE 2015
  • The industrial perspective of composite code changes
• The impact of tangled code changes. Kim Herzig and Andreas Zeller. MSR 2013
• Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect Prediction and Localization. Nguyen et al. ISSRE 2013
  • How composite code changes affect defect prediction
Acknowledgment
• We thank all the students who participated in our user study.
• We especially thank Tao He and Hai Wan for their kind help in arranging the user study.
• We also thank Ananya Kanjilal for her comments on the paper draft.