Program Comprehension through Dynamic Analysis
Visualization, evaluation, and a survey
Bas Cornelissen (et al.), Delft University of Technology
IPA Herfstdagen, Nunspeet, The Netherlands, November 26, 2008
Context
• Software maintenance
  – e.g., feature requests, debugging
  – requires understanding of the program at hand
  – up to 70% of effort is spent on the comprehension process
Support program comprehension
Definitions
Program Comprehension
• “A person understands a program when he or she is able to
  – explain the program, its structure, its behavior, its effects on its operational context, and its relationships to its application domain
  – in terms that are qualitatively different from the tokens used to construct the source code of the program.”
Definitions (cont’d)
Dynamic analysis
• The analysis of the properties of a running software system
• Typical workflow: an unknown system (e.g., open source) is instrumented (e.g., using AspectJ; a sketch follows below) and run against an execution scenario, yielding (too) much data
• Advantages
  – preciseness
  – goal-oriented
• Limitations
  – incompleteness
  – scenario-dependence
  – scalability issues
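As a hedged illustration of the instrumentation step, a minimal AspectJ tracing aspect; the aspect name and the org.example.target package are assumptions for illustration, not the actual tooling.

// A minimal AspectJ tracing sketch: every method execution in the
// (hypothetical) target package is logged with its nesting depth,
// producing the raw event stream that dynamic analysis works on.
public aspect TraceAspect {
    private int depth = 0;

    // Matches all method executions in the target package and its subpackages.
    pointcut traced(): execution(* org.example.target..*.*(..));

    before(): traced() {
        // thisJoinPoint is implicitly available in AspectJ advice.
        System.out.println(depth + "," + thisJoinPoint.getSignature());
        depth++;
    }

    after(): traced() {
        depth--;
    }
}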
Outline
1. Literature survey
2. Visualization I: UML sequence diagrams
3. Comparing reduction techniques
4. Visualization II: Extravis
5. Current work: Human factor
6. Concluding remarks
Literature survey
Why a literature survey?
• Numerous papers and subfields
  – last decade: many papers annually
• Need for a broad overview
  – keep track of current and past developments
  – identify future directions
• Existing surveys (4) do not suffice
  – scopes restricted
  – approaches not systematic
  – collective outcomes difficult to structure
Characterizing the literature
• Four facets
  – Activity: what is being performed/contributed?
    • e.g., architecture reconstruction
  – Target: to which languages/platforms is the approach applicable?
    • e.g., web applications
  – Method: which methods are used in conducting the activity?
    • e.g., formal concept analysis
  – Evaluation: how is the approach validated?
    • e.g., industrial study
Attribute framework
Characterization
Attribute frequencies
Survey results
• Least common activities
  – surveys, architecture reconstruction
• Least common target systems
  – multithreaded, distributed, legacy, web
• Least common evaluations
  – industrial studies, controlled experiments, comparisons
Visualization I: Sequence Diagrams
UML sequence diagrams
• Goal
  – visualize test case executions as sequence diagrams
  – provide insight into functionalities
  – accurate, up-to-date documentation
• Method (step 3 is sketched below)
  1. instrument the system and test suite
  2. execute the test suite
  3. abstract from “irrelevant” details
  4. visualize as sequence diagrams
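A hedged sketch of step 3: which details count as “irrelevant” is a design choice, so the predicates below (JDK internals, trivial accessors) are illustrative assumptions rather than the actual abstractions used.

import java.util.ArrayList;
import java.util.List;

class TraceAbstraction {
    // Keeps only the events whose signatures look application-relevant.
    static List<String> abstractTrace(List<String> signatures) {
        List<String> kept = new ArrayList<>();
        for (String s : signatures) {
            boolean library  = s.startsWith("java.") || s.startsWith("javax.");
            boolean accessor = s.contains(".get") || s.contains(".set");
            if (!library && !accessor)
                kept.add(s);
        }
        return kept;
    }
}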
Evaluation
• JPacman
  – Small program for educational purposes
  – 3 KLOC
  – 25 classes
• Task
  – Change requests
    • addition of “undo” functionality
    • addition of “multi-level” functionality
Evaluation (cont’d)
• Checkstyle
  – code validation tool
  – 57 KLOC
  – 275 classes
• Task
  – Addition of a new check
    • which types of checks exist?
    • what is the difference in terms of implementation?
Results
• Sequence diagrams are easily readable
  – intuitive due to chronological ordering
• Sequence diagrams aid in program comprehension
  – they support maintenance tasks
• Proper reductions/abstractions are difficult
  – e.g., reduce 10,000 events to 100 events, but at what cost?
Results (cont’d)
• Reduction techniques: open issues
  – which one is “best”?
    • which are most likely to lead to significant reductions?
    • which are the fastest?
    • which actually abstract from irrelevant details?
Comparing reduction techniques
Trace reduction techniques
• Input 1: large execution trace
  – up to millions of events
• Input 2: maximum output size
  – e.g., 100 for visualization through UML sequence diagrams
• Output: reduced trace
  – was the reduction successful?
  – how fast was the reduction performed?
  – has relevant data been preserved?
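This contract can be summarized in Java as follows; the interface name and the string representation of events are assumptions for illustration.

import java.util.List;

interface TraceReduction {
    // Should return at most maxOutputSize events when reduction succeeds.
    List<String> reduce(List<String> trace, int maxOutputSize);
}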
Example technique
Stack depth limitation [metrics-based filtering]
• requires two passes over the trace:
  1. determine the depth frequencies of all events
     – e.g., depth 0: 28,450; depth 1: 13,902; depth 2: 58,444; depth 3: 29,933; depth 4: 10,004; ...
  2. given the maximum output size (threshold), determine the maximum depth and discard all events above it
• example: a 200,000-event trace with a 50,000-event threshold
  – depths 0 and 1 together hold 28,450 + 13,902 = 42,352 events, which fits the threshold; including depth 2 would not
  – discarding all events beyond depth 1 yields a reduced trace of 42,352 events
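A minimal Java sketch of the two passes, assuming events are represented by their nesting depths; the class name is illustrative, not from the actual tooling.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StackDepthLimiter {
    static List<Integer> reduce(List<Integer> eventDepths, int threshold) {
        // Pass 1: determine the depth frequencies.
        TreeMap<Integer, Integer> freq = new TreeMap<>();
        for (int d : eventDepths)
            freq.merge(d, 1, Integer::sum);

        // Find the maximum depth whose cumulative event count still
        // fits within the threshold (TreeMap iterates depths in order).
        int maxDepth = -1, cumulative = 0;
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            if (cumulative + e.getValue() > threshold) break;
            cumulative += e.getValue();
            maxDepth = e.getKey();
        }

        // Pass 2: discard all events above the maximum depth.
        List<Integer> reduced = new ArrayList<>();
        for (int d : eventDepths)
            if (d <= maxDepth) reduced.add(d);
        return reduced;
    }
}

On the example above, reduce(trace, 50000) would select maximum depth 1 and return the 42,352 events at depths 0 and 1.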
How can we compare the techniques?
• Use:
  – a common context
  – common evaluation criteria
  – a common test set
Ensures a fair comparison
Approach
• Assessment methodology
  1. Context: need for high-level knowledge
  2. Criteria: reduction success rate; performance; information preservation
  3. Metrics: output size; time spent; preservation % per type
  4. Test set: five open source systems, one industrial
  5. Application: apply reductions using thresholds 1,000 through 1,000,000
  6. Interpretation: compare side-by-side
Techniques under assessment
• Subsequence summarization [summarization]
• Stack depth limitation [metrics-based]
• Language-based filtering [filtering]
• Sampling [ad hoc]
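As an illustration of the simplest of the four, a hedged sketch of sampling: keep every k-th event so that roughly the maximum output size remains. The class name is an assumption.

import java.util.ArrayList;
import java.util.List;

class Sampling {
    static List<String> reduce(List<String> trace, int maxOutputSize) {
        // Choose a step size so that at most maxOutputSize events survive.
        int step = Math.max(1, (int) Math.ceil((double) trace.size() / maxOutputSize));
        List<String> reduced = new ArrayList<>();
        for (int i = 0; i < trace.size(); i += step)
            reduced.add(trace.get(i));
        return reduced;
    }
}

Sampling reduces reliably, but it discards events regardless of their relevance, which is reflected in the assessment summary below.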
Assessment summary
                           Subseq.   Stack depth   Lang.-based   Sampling
                           summ.     limitation    filtering
Reduction success rate     o         o             --            +
Performance                --        o             o             --
Information preservation   +         o             o             --
Visualization II: Extravis
Extravis
• Execution Trace Visualizer
  – a collaboration with TU/e
• Goal
  – program comprehension through trace visualization
    • trace exploration, feature location, ...
  – address scalability issues
    • for millions of events, sequence diagrams are not adequate
Evaluation: Cromod
• Industrial system
  – regulates greenhouse conditions
  – 51 KLOC
  – 145 classes
• Trace
  – 270,000 events
• Task
  – analysis of fan-in/fan-out characteristics (a sketch follows below)
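A hedged sketch of how fan-in/fan-out could be derived from trace data: for each class, fan-in counts its distinct callers and fan-out its distinct callees. The Call representation is an illustrative assumption, not Extravis internals.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FanInOut {
    static class Call {
        final String caller, callee;
        Call(String caller, String callee) { this.caller = caller; this.callee = callee; }
    }

    // Fan-in: for each callee class, the set of distinct calling classes.
    static Map<String, Set<String>> fanIn(List<Call> calls) {
        Map<String, Set<String>> in = new HashMap<>();
        for (Call c : calls)
            in.computeIfAbsent(c.callee, k -> new HashSet<>()).add(c.caller);
        return in;
    }

    // Fan-out: for each caller class, the set of distinct called classes.
    static Map<String, Set<String>> fanOut(List<Call> calls) {
        Map<String, Set<String>> out = new HashMap<>();
        for (Call c : calls)
            out.computeIfAbsent(c.caller, k -> new HashSet<>()).add(c.callee);
        return out;
    }
}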
Evaluation: Cromod (cont’d)
Evaluation: JHotDraw
• Medium-size open source application
  – Java framework for graphics editing
  – 73 KLOC
  – 344 classes
• Trace
  – 180,000 events
• Task
  – feature location
    • i.e., relate functionality to source code or a trace fragment
Evaluation: JHotDraw (cont’d)
Evaluation: Checkstyle
• Medium-size open source system
  – code validation tool
  – 73 KLOC
  – 344 classes
  – Trace: 200,000 events
• Task
  – formulate a hypothesis
    • “a typical scenario comprises four main phases”
    • initialization; AST construction; AST traversal; termination
  – validate the hypothesis through trace analysis
Evaluation: Checkstyle (cont’d)
Current work: Human factor
Motivation
• Need for controlled experiments in general
  – measure the impact of (novel) visualizations
• Need for empirical validation of Extravis in particular
  – only anecdotal evidence thus far
Measure the usefulness of Extravis in software maintenance
• does the runtime information from Extravis help?
Experimental design
• Series of maintenance tasks
  – from high level to low level
  – e.g., overview, refactoring, detailed understanding
• Experimental group
  – ±10 subjects
  – Eclipse IDE + Extravis
• Control group
  – ±10 subjects
  – Eclipse IDE
Concluding remarks
Concluding remarks
• Program comprehension: an important subject
  – makes software maintenance more efficient
• Evaluation and comparison remain difficult
  – due to the human factor
• Many future directions
  – several of which have been addressed by this research
Want to participate in the controlled experiment?
• Prerequisites
  – at least two persons
  – knowledge of Java
  – (some) experience with Eclipse
  – no implementation knowledge of Checkstyle
  – two hours to spare between December 1 and 19
• Contact me:
  – during lunch, or
  – through email: [email protected]