Program Comprehension through Dynamic Analysis
Visualization, evaluation, and a survey
Bas Cornelissen (et al.), Delft University of Technology
IPA Herfstdagen, Nunspeet, The Netherlands, November 26, 2008
Context
• Software maintenance
  – e.g., feature requests, debugging
  – requires understanding of the program at hand
  – up to 70% of effort is spent on the comprehension process
Support program comprehension
Definitions
Program Comprehension
• “A person understands a program when he or she is able to
  – explain the program, its structure, its behavior, its effects on its operational context, and its relationships to its application domain
  – in terms that are qualitatively different from the tokens used to construct the source code of the program.”
Definitions (cont’d)
Dynamic analysis
• The analysis of the properties of a running software system
• Typical workflow: an unknown system (e.g., open source) is instrumented (e.g., using AspectJ; a sketch follows below) and run against an execution scenario, yielding (too) much data
• Advantages
  – preciseness
  – goal-oriented
• Limitations
  – incompleteness
  – scenario-dependence
  – scalability issues
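As a hedged illustration of the instrumentation step, a minimal AspectJ tracing aspect; the aspect name and the org.example.target package are assumptions for illustration, not the actual tooling.

// A minimal AspectJ tracing sketch: every method execution in the
// (hypothetical) target package is logged with its nesting depth,
// producing the raw event stream that dynamic analysis works on.
public aspect TraceAspect {
    private int depth = 0;

    // Matches all method executions in the target package and its subpackages.
    pointcut traced(): execution(* org.example.target..*.*(..));

    before(): traced() {
        // thisJoinPoint is implicitly available in AspectJ advice.
        System.out.println(depth + "," + thisJoinPoint.getSignature());
        depth++;
    }

    after(): traced() {
        depth--;
    }
}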
Outline
1. Literature survey
2. Visualization I: UML sequence diagrams
3. Comparing reduction techniques
4. Visualization II: Extravis
5. Current work: Human factor
6. Concluding remarks
Literature survey
Why a literature survey?
• Numerous papers and subfields
  – last decade: many papers annually
• Need for a broad overview
  – keep track of current and past developments
  – identify future directions
• Existing surveys (4) do not suffice
  – scopes restricted
  – approaches not systematic
  – collective outcomes difficult to structure
Characterizing the literature
• Four facets
  – Activity: what is being performed/contributed?
    • e.g., architecture reconstruction
  – Target: to which languages/platforms is the approach applicable?
    • e.g., web applications
  – Method: which methods are used in conducting the activity?
    • e.g., formal concept analysis
  – Evaluation: how is the approach validated?
    • e.g., industrial study
Attribute framework
Characterization
Attribute frequencies
Survey results
• Least common activities
  – surveys, architecture reconstruction
• Least common target systems
  – multithreaded, distributed, legacy, web
• Least common evaluations
  – industrial studies, controlled experiments, comparisons
Visualization I: Sequence Diagrams
UML sequence diagrams
• Goal
  – visualize test case executions as sequence diagrams
  – provide insight into functionalities
  – accurate, up-to-date documentation
• Method (step 3 is sketched below)
  1. instrument the system and test suite
  2. execute the test suite
  3. abstract from “irrelevant” details
  4. visualize as sequence diagrams
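A hedged sketch of step 3: which details count as “irrelevant” is a design choice, so the predicates below (JDK internals, trivial accessors) are illustrative assumptions rather than the actual abstractions used.

import java.util.ArrayList;
import java.util.List;

class TraceAbstraction {
    // Keeps only the events whose signatures look application-relevant.
    static List<String> abstractTrace(List<String> signatures) {
        List<String> kept = new ArrayList<>();
        for (String s : signatures) {
            boolean library  = s.startsWith("java.") || s.startsWith("javax.");
            boolean accessor = s.contains(".get") || s.contains(".set");
            if (!library && !accessor)
                kept.add(s);
        }
        return kept;
    }
}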
Evaluation
• JPacman
  – Small program for educational purposes
  – 3 KLOC
  – 25 classes
• Task
  – Change requests
    • addition of “undo” functionality
    • addition of “multi-level” functionality
Evaluation (cont’d)
• Checkstyle
  – code validation tool
  – 57 KLOC
  – 275 classes
• Task
  – Addition of a new check
    • which types of checks exist?
    • what is the difference in terms of implementation?
Results
• Sequence diagrams are easily readable
  – intuitive due to chronological ordering
• Sequence diagrams aid in program comprehension
  – they support maintenance tasks
• Proper reductions/abstractions are difficult
  – e.g., reduce 10,000 events to 100 events, but at what cost?
Results (cont’d)
• Reduction techniques: open issues
  – which one is “best”?
    • which are most likely to lead to significant reductions?
    • which are the fastest?
    • which actually abstract from irrelevant details?
Comparing reduction techniques
Trace reduction techniques
• Input 1: large execution trace
  – up to millions of events
• Input 2: maximum output size
  – e.g., 100 for visualization through UML sequence diagrams
• Output: reduced trace
  – was the reduction successful?
  – how fast was the reduction performed?
  – has relevant data been preserved?
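This contract can be summarized in Java as follows; the interface name and the string representation of events are assumptions for illustration.

import java.util.List;

interface TraceReduction {
    // Should return at most maxOutputSize events when reduction succeeds.
    List<String> reduce(List<String> trace, int maxOutputSize);
}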
Example technique
Stack depth limitation [metrics-based filtering]
• requires two passes over the trace:
  1. determine the depth frequencies of all events
     – e.g., depth 0: 28,450; depth 1: 13,902; depth 2: 58,444; depth 3: 29,933; depth 4: 10,004; ...
  2. given the maximum output size (threshold), determine the maximum depth and discard all events above it
• example: a 200,000-event trace with a 50,000-event threshold
  – depths 0 and 1 together hold 28,450 + 13,902 = 42,352 events, which fits the threshold; including depth 2 would not
  – discarding all events beyond depth 1 yields a reduced trace of 42,352 events
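A minimal Java sketch of the two passes, assuming events are represented by their nesting depths; the class name is illustrative, not from the actual tooling.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StackDepthLimiter {
    static List<Integer> reduce(List<Integer> eventDepths, int threshold) {
        // Pass 1: determine the depth frequencies.
        TreeMap<Integer, Integer> freq = new TreeMap<>();
        for (int d : eventDepths)
            freq.merge(d, 1, Integer::sum);

        // Find the maximum depth whose cumulative event count still
        // fits within the threshold (TreeMap iterates depths in order).
        int maxDepth = -1, cumulative = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            if (cumulative + e.getValue() > threshold) break;
            cumulative += e.getValue();
            maxDepth = e.getKey();
        }

        // Pass 2: discard all events above the maximum depth.
        List<Integer> reduced = new ArrayList<>();
        for (int d : eventDepths)
            if (d <= maxDepth) reduced.add(d);
        return reduced;
    }
}

On the example above, reduce(trace, 50000) would select maximum depth 1 and return the 42,352 events at depths 0 and 1.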
How can we compare the techniques?
• Use:
  – a common context
  – common evaluation criteria
  – a common test set
Ensures a fair comparison
Approach
• Assessment methodology
  1. Context: need for high-level knowledge
  2. Criteria: reduction success rate; performance; information preservation
  3. Metrics: output size; time spent; preservation % per type
  4. Test set: five open source systems, one industrial
  5. Application: apply reductions using thresholds 1,000 through 1,000,000
  6. Interpretation: compare side-by-side
Techniques under assessment
• Subsequence summarization [summarization]
• Stack depth limitation [metrics-based]
• Language-based filtering [filtering]
• Sampling [ad hoc]
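As an illustration of the simplest of the four, a hedged sketch of sampling: keep every k-th event so that roughly the maximum output size remains. The class name is an assumption.

import java.util.ArrayList;
import java.util.List;

class Sampling {
    static List<String> reduce(List<String> trace, int maxOutputSize) {
        // Choose a step size so that at most maxOutputSize events survive.
        int step = Math.max(1, (int) Math.ceil((double) trace.size() / maxOutputSize));
        List<String> reduced = new ArrayList<>();
        for (int i = 0; i < trace.size(); i += step)
            reduced.add(trace.get(i));
        return reduced;
    }
}

Sampling reduces reliably, but it discards events regardless of their relevance, which is reflected in the assessment summary below.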
Assessment summary
                           Subseq.   Stack depth   Lang.-based   Sampling
                           summ.     limitation    filtering
Reduction success rate     o         o             --            +
Performance                --        o             o             --
Information preservation   +         o             o             --
Visualization II: Extravis
Extravis
• Execution Trace Visualizer
  – a collaboration with TU/e
• Goal
  – program comprehension through trace visualization
    • trace exploration, feature location, ...
  – address scalability issues
    • for millions of events, sequence diagrams are not adequate
Evaluation: Cromod
• Industrial system
  – regulates greenhouse conditions
  – 51 KLOC
  – 145 classes
• Trace
  – 270,000 events
• Task
  – analysis of fan-in/fan-out characteristics (a sketch follows below)
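A hedged sketch of how fan-in/fan-out could be derived from trace data: for each class, fan-in counts its distinct callers and fan-out its distinct callees. The Call representation is an illustrative assumption, not Extravis internals.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FanInOut {
    static class Call {
        final String caller, callee;
        Call(String caller, String callee) { this.caller = caller; this.callee = callee; }
    }

    // Fan-in: for each callee class, the set of distinct calling classes.
    static Map<String, Set<String>> fanIn(List<Call> calls) {
        Map<String, Set<String>> in = new HashMap<>();
        for (Call c : calls)
            in.computeIfAbsent(c.callee, k -> new HashSet<>()).add(c.caller);
        return in;
    }

    // Fan-out: for each caller class, the set of distinct called classes.
    static Map<String, Set<String>> fanOut(List<Call> calls) {
        Map<String, Set<String>> out = new HashMap<>();
        for (Call c : calls)
            out.computeIfAbsent(c.caller, k -> new HashSet<>()).add(c.callee);
        return out;
    }
}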
Evaluation: Cromod (cont’d)
Evaluation: JHotDraw
• Medium-size open source application
  – Java framework for graphics editing
  – 73 KLOC
  – 344 classes
• Trace
  – 180,000 events
• Task
  – feature location
    • i.e., relate functionality to source code or a trace fragment
Evaluation: JHotDraw (cont’d)
Evaluation: Checkstyle
• Medium-size open source system
  – code validation tool
  – 73 KLOC
  – 344 classes
  – Trace: 200,000 events
• Task
  – formulate a hypothesis
    • “a typical scenario comprises four main phases”
    • initialization; AST construction; AST traversal; termination
  – validate the hypothesis through trace analysis
Evaluation: Checkstyle (cont’d)
Current work: Human factor
Motivation
• Need for controlled experiments in general
  – measure the impact of (novel) visualizations
• Need for empirical validation of Extravis in particular
  – only anecdotal evidence thus far
Measure the usefulness of Extravis in software maintenance
• does the runtime information from Extravis help?
Experimental design
• Series of maintenance tasks
  – from high level to low level
  – e.g., overview, refactoring, detailed understanding
• Experimental group
  – ±10 subjects
  – Eclipse IDE + Extravis
• Control group
  – ±10 subjects
  – Eclipse IDE
Concluding remarks
Concluding remarks
• Program comprehension: an important subject
  – makes software maintenance more efficient
• Evaluation and comparison remain difficult
  – due to the human factor
• Many future directions
  – several of which have been addressed by this research
Want to participate in the controlled experiment?
• Prerequisites
  – at least two persons
  – knowledge of Java
  – (some) experience with Eclipse
  – no implementation knowledge of Checkstyle
  – two hours to spare between December 1 and 19
• Contact me:
  – during lunch, or
  – through email: [email protected]