Source Code Comprehension on Evolving Software: A Literature Survey
Yida Tao
Supervisor: Sunghun Kim
Motivation
Code Change Comprehension
• Code change comprehension is (Tao et al., FSE’12)
  • Frequently required
  • In major development activities, in particular the code-review process
• Bacchelli & Bird, ICSE’13
  • “…review and understand code they have not seen before may be more common that a developer working on new code”
  • “From interviews, no other code review challenge emerged as clearly as understanding the submitted change”
• References
  • How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
  • Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
Outline
• Code Change Comprehension
  • Program Differencing: describing code changes
  • Code Change Summarization: explaining code changes
  • Querying and Filtering: customization
Program Differencing
• Text Differencing
• Syntactic Differencing
• Semantic Differencing
Text Differencing
• Flat representation of a program
  • Sequence of strings
• Unix diff
  • Only outputs added/deleted lines; cannot detect modified lines
  • Hard to determine when a code fragment is moved upward or downward
• Ldiff (Canfora et al., ICSE’09)
  • An enhanced line differencing tool
• Limitations
  • Reports changes to *characters* only
  • No syntactic-structure information
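The limitations of line-based differencing can be illustrated with a minimal sketch. It uses Python's standard `difflib` rather than Unix diff itself, and the helper name `line_diff` and the sample lines are assumptions made for illustration only.

```python
import difflib

def line_diff(old_lines, new_lines):
    """Report deleted ('-') and added ('+') lines, in the spirit of a line-based diff."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Every non-equal region is flattened into deletions followed by additions.
        changes += ["-" + line for line in old_lines[i1:i2]]
        changes += ["+" + line for line in new_lines[j1:j2]]
    return changes

# A one-token modification shows up as a delete plus an add.
modified = line_diff(
    ["total = 0", "total += price", "return total"],
    ["total = 0", "total += price * qty", "return total"],
)

# A moved line also shows up as a delete plus an add, with no link between them.
moved = line_diff(["a()", "b()", "c()"], ["b()", "c()", "a()"])

print(modified)
print(moved)
```

In both cases the output is only a list of removed and added strings; nothing in it says that a line was modified or moved.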
Syntactic Differencing
• Structured representation of a program
  • Abstract syntax tree; XML
• ChangeDistiller (Fluri et al., TSE’07)
  • Tree differencing
    • Node: bigram string similarity
    • Control structure: subtree similarity
  • Output: tree edit script (insert, delete, move, update)
• XML differencing
  • srcML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
  • diffX (Al-Ekram et al., CASCON ’05)
• Limitations
  • Cannot describe how the behavior of a program is changed
  • Still reports differences for behavior-preserving changes
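Bigram string similarity for matching tree nodes can be sketched as a Dice coefficient over character bigrams. This is a generic illustration, not ChangeDistiller's implementation; the function names and the `MATCH_THRESHOLD` value are hypothetical.

```python
from collections import Counter

def bigrams(text):
    """Multiset of adjacent character pairs in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def bigram_similarity(a, b):
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total

MATCH_THRESHOLD = 0.6  # hypothetical cut-off, chosen only for this example

def best_match(node_text, candidates):
    """Return the most similar candidate, or None if none reaches the threshold."""
    best = max(candidates, key=lambda c: bigram_similarity(node_text, c), default=None)
    if best is not None and bigram_similarity(node_text, best) >= MATCH_THRESHOLD:
        return best
    return None

print(best_match("int count = 0;", ["int counter = 0;", "return x;"]))
```

A matched pair with similarity below 1.0 would become an update in the edit script; an unmatched node would become an insert or a delete.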
Semantic Differencing
• Semantic diff (Jackson and Ladd, ICSM’94)
  • Method-level
  • Variable dependencies comparison
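Comparing variable dependencies can be sketched for the simplest possible case: straight-line code made of single-target assignments. The sketch below is an assumption-laden illustration in Python, not the tool of Jackson and Ladd; `dependencies` and `semantic_diff` are hypothetical names.

```python
import ast

def dependencies(source):
    """Map each assigned variable to the set of input variables it depends on.

    Only straight-line code with simple `name = expression` statements is handled.
    """
    deps = {}
    for stmt in ast.parse(source).body:
        simple = (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                  and isinstance(stmt.targets[0], ast.Name))
        if not simple:
            raise ValueError("only simple assignments are supported")
        used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
        resolved = set()
        for name in used:
            # A name assigned earlier contributes its own inputs; otherwise it is an input.
            resolved |= deps.get(name, {name})
        deps[stmt.targets[0].id] = resolved
    return deps

def semantic_diff(old_source, new_source, outputs):
    """Return the outputs whose dependency sets differ between the two versions."""
    old, new = dependencies(old_source), dependencies(new_source)
    return {v: (old.get(v), new.get(v)) for v in outputs if old.get(v) != new.get(v)}

# Inlining a temporary leaves the dependencies of `result` unchanged.
refactored = semantic_diff("t = a + b\nresult = t * 2\n", "result = (a + b) * 2\n", ["result"])
# Reading a different input changes them.
changed = semantic_diff("result = a + b\n", "result = a + c\n", ["result"])
print(refactored, changed)
```

A comparison of this kind ignores a refactoring that a textual or syntactic diff would report, but it also misses changes that keep the same dependencies, such as replacing `a + b` with `a - b`.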
Semantic Differencing (cont.)
• JDiff (Apiwattanapong et al., ASE’04, 06)
  • Extended control-flow graph (ECFG)
  • Dynamic binding, class hierarchy, exception handling, etc.
Semantic Differencing (cont.)
• Differential symbolic execution (Person et al., FSE’08)
  • “Executing” a program using symbolic values
Code Change Summarization
• LSdiff (Kim and Notkin, ICSE’09)
  • Group related changes
  • Detect potential inconsistencies in a code change
Code Change Summarization (cont.)
• DeltaDoc (Buse and Weimer, ASE’10)
  • Symbolic execution: obtain path predicates for each statement in both versions
  • Identify statements that are added, deleted, or have a changed predicate
  • Summarization
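The second and third steps can be sketched under a strong assumption: the path predicates have already been computed by a symbolic executor (not shown) and are given as plain strings. The function names and the sentence templates are hypothetical and do not reproduce DeltaDoc's actual output format.

```python
def classify(old_predicates, new_predicates):
    """Both arguments map a statement to the path predicate that guards it."""
    added = [s for s in new_predicates if s not in old_predicates]
    deleted = [s for s in old_predicates if s not in new_predicates]
    changed = [s for s in new_predicates
               if s in old_predicates and old_predicates[s] != new_predicates[s]]
    return added, deleted, changed

def describe(old_predicates, new_predicates):
    """Turn the three categories into short sentences (illustrative wording only)."""
    added, deleted, changed = classify(old_predicates, new_predicates)
    lines = [f"If {new_predicates[s]}, do {s}" for s in added]
    lines += [f"If {old_predicates[s]}, no longer do {s}" for s in deleted]
    lines += [f"Do {s} if {new_predicates[s]} instead of if {old_predicates[s]}"
              for s in changed]
    return lines

old = {"log(err)": "err != null", "retry()": "attempts < 3"}
new = {"log(err)": "err != null && verbose", "abort()": "attempts >= 3"}
summary = describe(old, new)
print("\n".join(summary))
```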
Code Change Summarization (cont.)
• Multi-document summarization (Rastkar and Murphy, ICSE’13)
  • Linking evolutionary documents (commit log, issue tracking entries)
  • Finding the most informative sentences to extract to form a summary
    • Similarity between a sentence and the title of the enclosing document
    • Overlap between a sentence and the adjacent document
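The two sentence features can be sketched with simple word overlap. This is an assumption-based illustration: the Jaccard measure, the equal weighting of the two features, and all names are choices made here, not the approach of Rastkar and Murphy.

```python
import re

def words(text):
    """Lower-cased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def score(sentence, title, adjacent_document):
    """Sum of title similarity and overlap with the adjacent document (equal weights assumed)."""
    tokens = words(sentence)
    return jaccard(tokens, words(title)) + jaccard(tokens, words(adjacent_document))

def summarize(sentences, title, adjacent_document, k=1):
    """Extract the k highest-scoring sentences."""
    ranked = sorted(sentences, key=lambda s: score(s, title, adjacent_document), reverse=True)
    return ranked[:k]

issue_title = "Null pointer in parser"
commit_log = "parser crashes with null pointer on empty input"
issue_sentences = [
    "Thanks for the quick review",
    "Fix null pointer in parser when input is empty",
]
top = summarize(issue_sentences, issue_title, commit_log)
print(top)
```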
Code Change Summarization (cont.)
• Challenges
  • Evolutionary documents
    • Linkage might not be found (Bachmann et al., FSE’10; Wu et al., FSE’11)
    • Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10; Tao et al., FSE’12)
  • Automatically generated documents (LSdiff, DeltaDoc)
    • Verbosity
    • Uninteresting changes are identified, e.g., “all types that declared toString() added constructors” (Kim and Notkin, ICSE’09)
Querying and Filtering
• Specifying and detecting meaningful changes (Yu et al., ASE’11)
  • Normalize the program (user-specified) before differencing
  • Non-trivial to construct the query
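Normalizing before differencing can be sketched as a pipeline of user-chosen rewrite rules applied to both versions, followed by an ordinary line diff. The rules, the function names, and the regular expression for `//` comments are assumptions for illustration (the regex would also strip `//` inside string literals); this is not the query language of Yu et al.

```python
import difflib
import re

def strip_line_comments(line):
    """Drop a trailing // comment (naive: ignores string literals)."""
    return re.sub(r"//.*$", "", line)

def collapse_whitespace(line):
    return " ".join(line.split())

def normalize(lines, rules):
    """Apply each rule in order to every line and drop lines that become empty."""
    result = []
    for line in lines:
        for rule in rules:
            line = rule(line)
        if line:
            result.append(line)
    return result

def meaningful_diff(old_lines, new_lines, rules):
    """Line diff computed on the normalized versions only."""
    a, b = normalize(old_lines, rules), normalize(new_lines, rules)
    changes = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag != "equal":
            changes += ["-" + line for line in a[i1:i2]]
            changes += ["+" + line for line in b[j1:j2]]
    return changes

rules = [strip_line_comments, collapse_whitespace]
cosmetic = meaningful_diff(["int x = 1;  // counter", "return x;"],
                           ["int   x = 1;", "", "return x;"], rules)
real = meaningful_diff(["int x = 1;", "return x;"], ["int x = 2;", "return x;"], rules)
print(cosmetic, real)
```

Which rules count as "meaningless" is the user's decision, which is why constructing a good query is the hard part.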
Querying and Filtering (cont.)
• Filtering non-essential changes (Kawrykow and Robillard, ICSE’11)
  • Non-essential changes: rename-induced modifications, local variable extraction, trivial keyword modification, whitespace and documentation updates
  • ChangeDistiller (Fluri et al., TSE’07) + partial program analysis (Dagenais and Robillard, ICSE’08)
  • Goal: improving mining and recommendation accuracy rather than developers’ comprehension
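Two of the non-essential categories, whitespace updates and rename-induced modifications, can be approximated on single lines as sketched below. This is a text-level illustration with hypothetical names; the approach cited above works on ChangeDistiller's tree differences with type information, which this sketch does not attempt.

```python
import re

def apply_renames(line, renames):
    """Replace whole identifiers according to renames, in a single pass."""
    if not renames:
        return line
    pattern = r"\b(" + "|".join(re.escape(name) for name in renames) + r")\b"
    return re.sub(pattern, lambda match: renames[match.group(1)], line)

def is_non_essential(old_line, new_line, renames):
    """True if the lines differ only by whitespace or by the given renames."""
    old_tokens = apply_renames(old_line, renames).split()
    return old_tokens == new_line.split()

known_renames = {"cnt": "count"}  # assumed to come from a separate rename detection step
print(is_non_essential("int cnt = size(items);", "int count = size(items);", known_renames))
print(is_non_essential("x = 1;", "x = 2;", known_renames))
```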
Research Directions
• Program Differencing
  • Text differencing
  • Syntactic differencing
  • Semantic differencing
• Code Change Summarization
  • Rules and exceptions
  • Control-flow changes
  • Evolutionary documentation
• Querying and Filtering
  • Meaningful changes
  • Non-essential changes
• Source Code Changes
  • Work-item-based changes?
Work-item-based Changes
• Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
  • Very difficult to understand (Tao et al., FSE’12)
• Example: JFreeChart revision 1083
  • Trivial keyword removal
  • Bug fix
  • Formatting
Work-item-based Change Detection
• Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
  • Very difficult to understand (Tao et al., FSE’12)
  • Change decomposition
    • Program slicing (entity dependencies)
    • Pattern matching (similarities)
• A single work-item spreads across multiple code changes (e.g., 5 changes to finally fix a bug completely)
  • Change aggregation
    • Linkage to the same issue
    • Heuristics like time duration, commit authors, program dependencies, etc.
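Change aggregation can be sketched with two of the cues listed above: linkage to the same issue, and an author plus time-window heuristic for commits that mention no issue. The `Commit` record, the `#123` issue pattern, and the one-hour window are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Commit:
    rev: str
    author: str
    when: datetime
    message: str

ISSUE_ID = re.compile(r"#(\d+)")  # hypothetical issue-reference convention

def aggregate(commits, window=timedelta(hours=1)):
    """Group commits that likely belong to the same work-item."""
    groups = []
    by_issue = {}       # issue id -> group
    last_unlinked = {}  # author -> most recent group without an issue id
    for commit in sorted(commits, key=lambda c: c.when):
        match = ISSUE_ID.search(commit.message)
        if match:
            key = match.group(1)
            if key not in by_issue:
                by_issue[key] = []
                groups.append(by_issue[key])
            by_issue[key].append(commit)
            continue
        group = last_unlinked.get(commit.author)
        if group and commit.when - group[-1].when <= window:
            group.append(commit)  # same author, close in time
        else:
            group = [commit]
            last_unlinked[commit.author] = group
            groups.append(group)
    return groups

day = datetime(2013, 1, 1)
history = [
    Commit("r1", "alice", day.replace(hour=10), "Fix #42 parser crash"),
    Commit("r2", "alice", day.replace(hour=10, minute=30), "cleanup"),
    Commit("r3", "bob", day.replace(hour=11), "Really fix #42"),
    Commit("r4", "alice", day.replace(hour=10, minute=50), "more cleanup"),
]
grouped = [[c.rev for c in group] for group in aggregate(history)]
print(grouped)
```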
Research Directions
• Program Differencing
  • Text differencing
  • Syntactic differencing
  • Semantic differencing
• Code Change Summarization
  • Rules and exceptions
  • Control-flow changes
  • Evolutionary documentation
• Querying and Filtering
  • Meaningful changes
  • Non-essential changes
  • Work-item-specific changes
• Code Change Comprehension
  • Concrete execution
  • Work-item change detection
    • Change decomposition
    • Change aggregation
Explaining code changes with executions of co-changed test cases
• Test cases
  • Best documentation for source code
  • Test cases co-changed with source code
    • Documentation for code changes?
    • Mostly synchronous co-evolution of production and test code (Zaidman et al., Empirical Software Engineering’11)
• Differential test executions
  • Co-changed test cases T
  • Executing T on the old version P and new version P’
  • Comparing executions to explain changed behaviors
• From StackExchange: http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35
  • “Unit tests are one of the best sources of documentation for your system, and arguably the most reliable form”
  • “Unit tests are often the first thing you look at when trying to grasp what some piece of code does”
  • “They can also serve as a starting point for people new to the code base”
Research Directions
• Program Differencing
  • Text differencing
  • Syntactic differencing
  • Semantic differencing
• Code Change Summarization
  • Rules and exceptions
  • Control-flow changes
  • Evolutionary documentation
• Querying and Filtering
  • Meaningful changes
  • Non-essential changes
  • Work-item-specific changes
• Code Change Comprehension
  • Concrete execution
    • Co-changed test cases
    • Differential test execution
  • Work-item change detection
    • Change decomposition
    • Change aggregation