
Incremental Dynamic Impact Analysis for Evolving Software Systems

James Law
Computer Science Dept.
Oregon State University
Corvallis, OR
[email protected]

Gregg Rothermel
Computer Science Dept.
Oregon State University
Corvallis, OR
[email protected]

Abstract

Impact analysis – determining the potential effects of changes on a software system – plays an important role in helping engineers re-validate modified software. In previous work we presented a new impact analysis technique, PathImpact, for performing dynamic impact analysis at the level of procedures, and we showed empirically that the technique can be cost-effective in comparison to prominent prior techniques. A drawback of that approach as presented, however, is that when attempting to apply the technique to a new version of a system as that system and its test suite evolves, the process of recomputing the data required by the technique for that version can be excessively expensive. In this paper, therefore, we present algorithms that allow the data needed by PathImpact to be collected incrementally. We present the results of a controlled experiment investigating the costs and benefits of this incremental approach relative to the approach of completely recomputing prerequisite data.

1. Introduction

Successful software systems evolve, and as they evolve, the changes made to those systems can have unintended, expensive, or even disastrous effects [13]. Software change impact analysis is a technique for predicting the potential effects of changes before they are made, or measuring the potential effects of changes after they are made, with the goal of reducing the costs and risks associated with releasing the changed software into the field, and helping ensure that the system remains dependable and reliable. In particular, change impact analysis can be used to predict or identify portions of a system that will need to be retested as a result of changes. The results of this analysis can be used to determine whether the potential impact of changes on re-testing effort is low enough to allow the changes to be made, or to guide test engineers in determining how to allocate their regression testing efforts on the modified system.

The importance of the impact analysis problem has led many researchers to propose specific impact analysis techniques [2, 3, 14, 16, 21]. In [8] we present a new impact analysis technique, PathImpact, that differs from these previous impact analysis techniques in its reliance on dynamic program behavior information gathered for a specific set of executions (e.g., relative to operational profiles and/or specific test suites), coupled with its reliance on relatively high-level, low-cost (e.g., at the level of procedure invocations and returns) system execution information. Empirical comparisons of PathImpact to two other prominent techniques (transitive closure on call graphs and static slicing) [8] show that it can obtain significantly greater recall than the former, and significantly greater precision, with greater efficiency, than the latter.

Impact analysis can be expensive in terms of time and computational effort. PathImpact is lower-cost than many other impact analysis techniques due to its focus on relatively high-level execution information. For the purposes of this investigation we focus on procedure calls and returns. Even so, following system modifications — including both code modifications and changes made to test suites in response to those modifications — time constraints may make it difficult to re-collect the data needed by the algorithm. This can be the case even after only small modifications to code and test suites have been made. To employ PathImpact cost-effectively across entire system lifetimes, techniques are needed to efficiently update the data required by the algorithm as those systems and their test suites evolve.

In this paper we present such techniques. Our approaches take the form of recursive graph algorithms, which together handle the various types of program and test suite modifications that need to be accommodated to keep the data required for the application of PathImpact up-to-date and available without resorting to complete recomputation from scratch. Specifically, these algorithms respond to the addition, modification, or deletion of system components, or to the addition, modification, or deletion of test cases from the system's test suite. (We use the term "components" here generically to refer to parts of a software system, such as procedures, methods, or classes, rather than specifically in the sense of reusable code or COTS components.) We present the results of a controlled experiment investigating the costs and benefits of using this incremental approach, relative to the alternative approach of completely recomputing prerequisite data, that illustrates the tradeoffs between the approaches.

2. Background

In this section we review related work on impact analysis, and describe our PathImpact approach.

2.1 Impact Analysis and Related Work

Impact analysis techniques can be partitioned into two classes: those based on traceability analysis and those based on dependence analysis [3]. Our work focuses on the latter.

Dependence-analysis-based impact analysis techniques [9, 10, 11, 19] attempt to assess the effects of change on semantic dependencies between program entities, typically by identifying the syntactic dependencies that may signal the presence of such semantic dependencies [17]. The techniques used to identify these syntactic dependencies typically include static (e.g., [4, 22]) and/or dynamic (e.g., [1, 6]) slicing techniques. (For an overview of slicing techniques see [5, 20].) Other techniques using transitive closure on call graphs [3] attempt to approximate slicing-based techniques, while avoiding the costs of the dependence analyses that underlie those techniques.

Other common attempts to approximate dependence-based impact analysis approaches have involved expert judgement and code inspection; however, these are not easily automated. Moreover, expert predictions of the extent of change impact have been shown to be frequently incorrect [12], and performing impact analysis by inspecting source code can be prohibitively expensive [16].

2.2 The PathImpact Approach

Existing dependence-based approaches to impact analysis have both advantages and disadvantages. These advantages and disadvantages are discussed in detail in [8]; we summarize them here:

- Transitive closure on call graphs is relatively inexpensive, but it can also be inaccurate, claiming impact where none exists and failing to claim impact where it does exist.

- Static slicing can predict change impact conservatively (safely); however, by focusing on all possible program behaviors, it may return impact sets that are too large, or too imprecise relative to the expected operational profiles of a system, to be useful for maintainers.

- Dynamic slicing can predict impact relative to specific program executions or operational profiles, which may be useful for many maintenance tasks, but it sacrifices safety in the resulting impact assessment.

- Static and dynamic slicing techniques are relatively computationally expensive, requiring data dependence, control dependence, and alias analysis, whereas call-graph-based analysis is comparatively inexpensive.

- Finally, all three approaches rely on access to source code to determine the static calling structure or dependencies in that code, and require a relatively large effort to recompute the information needed to assess impact on subsequent releases of a software system.

These advantages and disadvantages motivated the creation of our impact analysis technique, PathImpact, which uses relatively low-cost instrumentation to obtain dynamic information about system execution, and from this information builds a representation of the system's behavior which it uses to estimate impact.

2.2.1 Summary of the Approach

The PathImpact approach relies on lightweight instrumentation to collect information from a running software system. The information is collected as multiple execution traces of the form shown in Figure 1. Each trace lists each method encountered, each return taken (symbol 'r'), and the system exit (symbol 'x'), in the order in which they occur during execution. These traces are processed by a compression algorithm called SEQUITUR [7] into a space-efficient whole-path directed acyclic graph, or whole-path DAG, such as the one shown in Figure 2. SEQUITUR, moreover, is an "online" algorithm: the sequence of traces can be processed as they are collected, obviating the need to store large amounts of trace data. The resulting whole-path DAG has been shown to achieve compression by a factor of approximately 20 (a 2GB trace reduced to a 100MB whole-path DAG [7]).

PathImpact walks the DAG to determine an impact set given a set of changes. The complete algorithm for performing this DAG walk is presented in [8], together with examples of its operation and empirical results on its use; due to space limitations we cannot present it in detail here. One way to visualize its operation, however, is to consider beginning at a changed component's node in the DAG, and ascending through the DAG performing recursive forward and backward in-order traversals at each node, stopping when any trace termination symbol is found. By

M B r A C D r E r r r r x M B G r r r x M B C F r r r r x

Figure 1. Multiple execution traces. Upper case letters denote component names, "r" denotes a return, and "x" denotes a system exit.

[Figure 2 depicts the whole-path DAG as a grammar over the terminals M, A, B, C, D, E, F, G, r, and x, reconstructed here from the traces of Figure 1:]

T -> 2 r A C D r E 1 1 3 G 4 3 C F 4 r x
1 -> r r
2 -> M B
3 -> x 2
4 -> 1 r

Figure 2. Whole-path DAG created from the execution traces in Figure 1.
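
As a sanity check on this construction, the grammar can be expanded back into the original token sequence. The following Java sketch (our illustration, not the authors' implementation) encodes the rules reconstructed in Figure 2 and regenerates the concatenated traces of Figure 1 by recursive substitution:

import java.util.*;

public class WholePathGrammar {
    // Grammar of Figure 2; upper-case letters, "r", and "x" are terminals.
    static final Map<String, String[]> RULES = Map.of(
        "T", "2 r A C D r E 1 1 3 G 4 3 C F 4 r x".split(" "),
        "1", new String[]{"r", "r"},
        "2", new String[]{"M", "B"},
        "3", new String[]{"x", "2"},
        "4", new String[]{"1", "r"});

    // Recursively substitute non-terminals until only terminals remain.
    static void expand(String symbol, StringBuilder out) {
        String[] production = RULES.get(symbol);
        if (production == null) {          // terminal: emit it
            out.append(symbol).append(' ');
            return;
        }
        for (String s : production) expand(s, out);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        expand("T", out);
        // Prints: M B r A C D r E r r r r x M B G r r r x M B C F r r r r x
        System.out.println(out.toString().trim());
    }
}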

traversing forward in the DAG we can find all program components which execute after the change and therefore could be affected by the change. By traversing backward in the DAG we can determine, by examination, which components execution returns into.

In the case of our example, when component A is changed, the resulting impact set is {M, A, C, D, E}: these are all the components that execution is determined, through the DAG walk, to have encountered or re-encountered subsequent to the changed component A. Note that B is not included in the impact set because it returns before A is called, and therefore could not be affected by any change in A. This can be determined by examining the sequence of nodes during the backwards traversal. Since B is immediately preceded by a return, B cannot be returned into, and is excluded from the impact set. This also can be seen by examining the original trace in Figure 1.
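
To make this concrete, the following Java sketch (our illustration, operating on a single uncompressed trace rather than on the DAG) computes the impact set for a changed component; for a suite, the per-trace results would be unioned. A forward scan collects every component executed after the change, and a backward scan, balancing returns against procedure entries, identifies the components into which execution returns:

import java.util.*;

public class TraceImpact {
    // Impact set of `changed` for one trace, per the DAG-walk semantics
    // described above, but applied to the uncompressed token sequence.
    static Set<String> impact(String[] trace, String changed) {
        Set<String> impacted = new LinkedHashSet<>();
        int pos = Arrays.asList(trace).indexOf(changed);
        if (pos < 0) return impacted;        // trace never executes the change
        impacted.add(changed);
        // Forward: everything executed after the change may be affected.
        for (int i = pos + 1; i < trace.length; i++)
            if (!trace[i].equals("r") && !trace[i].equals("x"))
                impacted.add(trace[i]);
        // Backward: a component is returned into unless a pending return
        // shows it finished before the change occurred.
        int pendingReturns = 0;
        for (int i = pos - 1; i >= 0; i--) {
            if (trace[i].equals("r")) pendingReturns++;
            else if (pendingReturns > 0) pendingReturns--; // returned before change
            else impacted.add(trace[i]);                   // change returns into it
        }
        return impacted;
    }

    public static void main(String[] args) {
        String[] t = "M B r A C D r E r r r r x".split(" ");
        System.out.println(impact(t, "A"));  // [A, C, D, E, M]
    }
}

On the first trace of Figure 1 this reproduces the impact set above, and correctly excludes B: the backward scan sees B only after a pending return has been consumed.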

2.2.2 Cost-Benefit Tradeoffs for PathImpact

PathImpact provides a different set of cost-benefit tradeoffs than impact analysis techniques based on transitive closure, static slicing, or dynamic slicing – a set which can potentially be beneficial for an important class of impact analysis tasks not accommodated by those techniques. In particular:

- PathImpact is fully dynamic, requiring no static analysis of the system. The resulting impact estimate is thus not safe — it cannot predict all possible impacts that can occur under all possible inputs. However, because this estimate is drawn from operational behavior, it can be used, in cases where safety is not required, to provide more precise information relative to a specific operational profile or set of test executions than static analysis can provide.

- PathImpact is call-based, and thus identifies impact at a coarser level than that provided by fine-grained analyses, but it is much more precise than call graph based analysis, and requires no dependence analysis.

- The instrumentation on which PathImpact is based has relatively low overhead, and can be performed on binaries, so the technique does not require access to source code.

3. Incremental, Dynamic Whole-Program Path-Based Impact Analysis

When a software system is modified, the execution traces collected for the previous version of the system, and any impact-analysis-facilitating representation built from those traces, may become incorrect. Even with lightweight instrumentation, the effort to regenerate dynamic information, in cases where this means reexecuting all the tests and building a completely new impact representation, can exceed available resources. In such cases it may be preferable to identify the outdated or incorrect information, remove it, and incrementally generate new information where required.

To create such an incremental approach for our PathImpact technique, we need to consider two possible sources of modifications that can be made to the software system and its related information: changes to the test suite, and changes to system components. In the remainder of this section, we overview our approach to handling these types of changes, and then in the next section we provide the algorithms that comprise this approach.

Where changes to the test suite are concerned, when maintainers remove a test case, the whole-path DAG then contains an invalid trace, which must be removed from the DAG. Likewise, if a test case is modified, the existing trace must be removed from the DAG, the test case must be reexecuted, and its new trace must be added to the DAG. Finally, traces for any new test cases must also be added to the DAG.

Next, consider changes to system components. Changes to components may involve code changes, configuration changes, environmental changes, or other changes that affect the runtime behavior of the component. If a component is removed from the system, each trace that includes that component must be removed from the DAG. If a component is modified, or the conditions that may affect a component's behavior are changed, each trace that could be affected by that modification (as a conservative approximation, each test case that executes the component) must be removed. In both of these cases, test cases must be reexecuted and their new traces added to the DAG. Finally, addition of new components must also be accommodated.

Two of the foregoing cases can be handled relatively easily. First, the addition of new traces, or traces obtained by re-executing test cases, to an existing DAG is easily performed using the SEQUITUR algorithm described in Section 2.2.1, by simply concatenating the new traces to the end of the existing whole-path DAG. Note further that the existing algorithms ensure that this process works even when the new trace contains procedures not previously listed in the DAG.

Second, addition of new system components affects only traces that reach calls to added procedures. Such traces can be identified by the handling of code modifications and through identification of new invocations of those components (these may be invocations of new components, as well as invocations of components that themselves, transitively, invoke other new components).

Our approach to incremental updating requires tagging individual test executions with unique keys. These keys appear at the beginning of each execution trace, while each execution trace ends with a system exit symbol. Any number of such traces can be concatenated and processed by ModSEQUITUR, a modified version of the SEQUITUR algorithm, which we present in Section 4.1.

Given these unique execution keys, individual traces for deleted or modified test cases can be removed from the whole-path DAG using the algorithm we present in Section 4.2. Test cases that have simply been modified are then reexecuted and their traces added to the DAG using the process for handling new test cases.

The remaining problem is how to accommodate deletions or modifications of system components. If a component is removed from the system, each execution trace containing that component must be removed from the DAG. If a component is modified, each trace containing that component must be removed, each test corresponding to a removed trace must be reexecuted, and the new traces must then be added to the DAG.

In [8] we outlined an algorithm for removing traces containing a given component from a whole-path DAG. We anticipated that it would be easy to ascend into the middle of a trace in the DAG and perform delete operations both forwards and backwards to remove a trace. However, the introduction of unique execution keys that index the end of traces complicates the removal of traces from the middle, by requiring extensive DAG traversal to maintain the test keys needed to support test-level incremental updating. Fortunately, given a component in the system, it is relatively simple to determine which test cases contain that component, and then remove these test cases. In this manner, a change to a component in the system can be reduced to a collection of test case deletions and additions, as sketched below. In Section 4.3 we present the algorithm for determining the appropriate execution keys related to a system change.
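
The following Java sketch (ours) shows this reduction at the uncompressed level: given a map from execution key to trace, collect the keys of every test whose trace executes the changed component; those tests are removed and later re-executed. Find-Key-List (Section 4.3) performs the equivalent search on the compressed DAG:

import java.util.*;

public class ComponentChange {
    // Reduce a component change to a set of test-case deletions: every
    // test whose trace executes the component must be removed (and later
    // re-executed). Mirrors Find-Key-List, but on uncompressed traces.
    static List<String> keysToRemove(Map<String, List<String>> traces,
                                     String component) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : traces.entrySet())
            if (e.getValue().contains(component))
                keys.add(e.getKey());
        return keys;
    }

    public static void main(String[] args) {
        Map<String, List<String>> traces = new LinkedHashMap<>();
        traces.put("[a", Arrays.asList("M B r A C D r E r r r r x".split(" ")));
        traces.put("[b", Arrays.asList("M B G r r r x".split(" ")));
        traces.put("[c", Arrays.asList("M B C F r r r r x".split(" ")));
        System.out.println(keysToRemove(traces, "F"));   // [[c]
    }
}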

An important aspect of our algorithms for handling changes is that, although the DAG resulting from a complete rebuild and the DAG resulting from updating may not have equivalent structures, the results of impact analyses performed on these DAGs will be equivalent. The truth of this statement rests on the fact that both the rebuilt and updated DAGs contain exactly the same execution traces; only the order of the traces may change. The traversal that determines the impact set is insensitive to the order of the traces, since the search stops at any execution boundary. Since the same traces are searched, the impact sets are equivalent.

4. Algorithms

We now present our new algorithms, which we refer to collectively by the name EvolveImpact, to distinguish them from our earlier PathImpact algorithms, which did not accommodate updates to the DAG.

First, we use a modification of the SEQUITUR data compression algorithm [15], which we call ModSEQUITUR, to reduce the size of the traces that are collected and to store the required execution keys that allow updating of the DAG.

Our technique requires programs to be instrumented at component entry and exit (generally, this means procedure or method entry). Unique execution keys are added to the trace before each execution begins. This produces a sequence of traces, each of which is composed of a sequence of tokens containing a unique execution key, a sequence of procedure names and function returns, and a program exit, in the order in which they occurred.
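
A minimal sketch of what such an instrumentation layer emits is shown below (ours; the hook names beginExecution, onEntry, onReturn, and onProgramExit are hypothetical, not the authors' instrumentation API). Because the grammar builder is online, these tokens can be consumed as they are produced rather than stored:

import java.io.*;

public class TraceWriter {
    private final Writer out;
    private int nextKey = 0;

    TraceWriter(Writer out) { this.out = out; }

    // One unique execution key precedes each test execution's trace.
    void beginExecution() throws IOException { emit("[k" + nextKey++); }
    void onEntry(String procedure) throws IOException { emit(procedure); }
    void onReturn() throws IOException { emit("r"); }
    void onProgramExit() throws IOException { emit("x"); }

    private void emit(String token) throws IOException {
        out.write(token);
        out.write(' ');
        out.flush();   // tokens can be piped to the grammar builder online
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        TraceWriter w = new TraceWriter(sw);
        w.beginExecution();
        w.onEntry("M"); w.onEntry("B"); w.onReturn(); w.onReturn();
        w.onProgramExit();
        System.out.println(sw);   // [k0 M B r r x
    }
}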

In the following sections we present ModSEQUITUR, followed by our algorithms for handling test case removal and procedure removal from the whole-path DAG.

4.1. The ModSEQUITUR algorithm

The ModSEQUITUR algorithm examines a trace generated by a program and removes redundancies in the observed sequence of events by creating a grammar that can exactly regenerate the original trace. The trace may contain loops and backedges. As stated earlier, ModSEQUITUR is an online algorithm – this means it can process traces as they are generated, rather than requiring the entire trace to be available. To facilitate re-use of the collected trace, only the resulting ModSEQUITUR grammar need be stored.

ModSEQUITUR (see Figure 3) uses two hash tables for each grammar rule to store the end point of each trace. The end point of each trace is identified by the unique test key encountered at the beginning of that trace. One hash table (TEnd) stores the link in the rule's production where a test

algorithm ModSEQUITUR( S )
input: Execution Trace S
output: Grammar G

    Grammar G
    Rule T
    Hash Table TEnd, key = testkey, value = tlink
    Hash Table TLink, key = link, value = list
    Test Key tkey
1.  for each token in S
2.    if( token is a Test Key )
3.      tkey = token
4.      continue
5.    append token to end of production for T
6.    if( token = system exit )
7.      Store (tkey, currlink) in TEnd.
8.      Get tlist from TLink using key = currlink
9.      Append tkey to end of tlist
10.     Store (currlink, tlist) in TLink.
11.   if duplicate digram appears
12.     if other occurrence is a rule g in G
13.       replace new digram with non-terminal of g.
14.     else
15.       form a new rule and replace duplicate
16.       digrams with the new non-terminal.
17.   if any rule in G is used only once
18.     remove the rule by substituting the production.
19. return G

Figure 3. ModSEQUITUR algorithm.

[a M B r A C D r E r r r r x [b M B G r r r x [c M B C F r r r r x

Figure 4. Multiple execution traces with test keys.

ends, using the test identifier as a key. The other hash table (TLink) stores a list of test identifiers, using the link in the production where the test ends as the key. A list must be stored in TLink because compression may cause multiple test cases to map to the same link in a production rule. Both hashes are necessary, since our algorithms must be able both to efficiently look up where a test ends and to check whether a given link has any tests ending at that link. The test key tokens must be unique and must be easily distinguishable from the trace tokens that follow them.
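
The pair of tables can be sketched directly as two maps (our illustration; the Link type is a hypothetical stand-in for a position in a rule's production, and the code assumes Java 16+ for records):

import java.util.*;

public class TestEndIndex {
    // A Link identifies a position in a rule's production (hypothetical type).
    record Link(String rule, int position) {}

    private final Map<String, Link> tEnd = new HashMap<>();        // test key -> link
    private final Map<Link, List<String>> tLink = new HashMap<>(); // link -> test keys

    // Record that the test with key `tkey` ends at `link`. A list is kept
    // per link because compression can map several test ends to one link.
    void recordTestEnd(String tkey, Link link) {
        tEnd.put(tkey, link);
        tLink.computeIfAbsent(link, k -> new ArrayList<>()).add(tkey);
    }

    Link endOf(String tkey) { return tEnd.get(tkey); }   // where does a test end?

    List<String> testsEndingAt(Link link) {              // which tests end here?
        return tLink.getOrDefault(link, Collections.emptyList());
    }
}

Keeping both directions indexed is what makes the lookups in Remove-test and Move-up constant time per probe.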

Given the keys and traces shown in Figure 4, the resulting EvolveImpact DAG is shown in Figure 5. (In the figure, execution keys are preceded by the "[" character. Empty hash tables are not shown.) Since ModSEQUITUR differs from SEQUITUR only in constant time operations, it runs in time O(N), where N is the uncompressed trace length [15]. The two hash tables grow with the number of traces rather than the length of the traces. Therefore, the size of the compressed trace is still O(N) worst case (no compression possible) and O(log N) best case [15]. Programs containing loops may generate extremely long traces; however, loops will introduce redundancies that ModSEQUITUR takes advantage of.

[Figure 5 depicts the EvolveImpact DAG for the traces of Figure 4: the grammar of Figure 2 together with the per-rule hash tables, reconstructed here:]

T -> 2 r A C D r E 1 1 3 G 4 3 C F 4 r x
    TEnd: ([a, <1-3-G>), ([b, <4-3-C>), ([c, <r-x-*>)
    TLink: (<1-3-G>, [a), (<4-3-C>, [b), (<r-x-*>, [c)
1 -> r r
2 -> M B
3 -> x 2
    TEnd: ([a, <*-x-2>), ([b, <*-x-2>)
    TLink: (<*-x-2>, [a), (<*-x-2>, [b)
4 -> 1 r

Figure 5. EvolveImpact DAG.

4.2. Test removal algorithm

The Remove-test algorithm (Figure 6) is used to remove a test with key tkey from the DAG. Remove-test works by finding the location of the end of the execution with key tkey, selectively expanding that symbol in T, and then (if necessary) expanding the symbol at the beginning of the trace that will be removed. Expand (Figure 7) is used to expand a single symbol in the top level rule. After Expand creates the expanded representation of a symbol, Insert (Figure 8) is used to replace the symbol in T with the expansion. If the list of tests obtained from TLink contains a predecessor to tkey, we know that the entire trace is compressed into this one symbol in T. In this case there is only one symbol to expand, after which we can delete the trace from T. If there is no predecessor in the list, the algorithm searches backwards in T until it finds the end of the preceding trace and expands this symbol also.

Once the beginning and end of the trace have been located and expanded, the trace is deleted by Remove-trace (Figure 9), which starts at the end symbol and deletes predecessors in T until a system exit is encountered. The intervening symbols are known to be in the trace we are deleting, due to the traversal that established the end of the preceding trace. Therefore, we can delete the intervening symbols in the production of T without any deeper traversal of the DAG.
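
The net effect of Remove-test can be illustrated at the uncompressed level (our sketch; the actual algorithm performs the equivalent surgery on the compressed DAG without fully expanding it): locate the test key, then drop every token through the trace's terminating exit symbol:

import java.util.*;

public class RemoveTest {
    // Delete the trace tagged `tkey` from a concatenated keyed token
    // sequence: the trace runs from its key through the next "x".
    // Assumes a well-formed sequence in which every trace ends with "x".
    static List<String> removeTest(List<String> tokens, String tkey) {
        List<String> result = new ArrayList<>(tokens);
        int start = result.indexOf(tkey);
        if (start < 0) return result;                 // no such test
        int end = start;
        while (!result.get(end).equals("x")) end++;   // find the system exit
        result.subList(start, end + 1).clear();       // drop key..exit inclusive
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            ("[a M B r A C D r E r r r r x " +
             "[b M B G r r r x " +
             "[c M B C F r r r r x").split(" "));
        System.out.println(removeTest(tokens, "[b"));
        // [[a, M, B, r, A, C, D, r, E, r, r, r, r, x, [c, M, B, C, F, r, r, r, r, x]
    }
}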

Expanding a symbol may require traversing the entire DAG in the worst case. Traversing the entire DAG is equivalent to traversing the uncompressed trace, in O(N) steps, where N is the uncompressed trace length. In the best case, expanding a symbol would require O(log N) steps. Insert has comparable running time, since it may be called on to insert an expansion of the entire DAG. Likewise, Remove-trace would then have to traverse the fully expanded

algorithm Remove-test( tkey )
input: Test Key tkey
output: DAG E

    Sequence Exp
    List ExpTestList
1.  Get tlink from TEnd using key = tkey.
2.  Get tlist from TLink using key = currlink
3.  ExpTestList = tlist
4.  Remove tkey from TEnd hash table.
5.  Remove tlink from TLink hash table.
6.  Find tkey in tlist.
7.  Exp = Expand( tlink, tkey )
8.  Insert( tlink, Exp )
9.  if tkey has a predecessor in tlist
10.   Remove-trace( tkey )
11.   return
12. else
13.   plink = predecessor of tlink
14.   while( plink exists )
15.     Get plist from TLink using plink
16.     if plist is not empty
17.       break
18.     else
19.       plink = predecessor of plink
20.   Add plist to beginning of ExpTestList
21.   Exp = Expand( plink, tkey )
22.   Insert( plink, Exp )
23.   Remove-trace( tkey )
24. while( any rule with zero uses exists )
25.   Remove the zero use rule.

Figure 6. Remove-test algorithm.

DAG. Therefore the running time for Remove-test is O(N), and the algorithm may require O(N) space. Since most program paths exhibit some regularity [7], however, this worst case behavior is unlikely.

As an example, consider removing test key "[b" from the DAG in Figure 5. From TEnd we find that tlink points to <4-3-C>. Expanding the symbol 3 gives x M B. Inserting gives T -> 2 r A C D r E 1 1 3 G 4 x M B C F 4 r x. Since there is no predecessor in tlist (line 9 in Remove-test), we must search backwards for the end of the previous trace. Moving backwards in T one symbol at a time, we check for the presence of a non-empty tlist in the TLink hash table. We find a non-empty list at the link <1-3-G>. Expanding 3 and inserting gives the uncompressed trace, from which the trace with key "[b" is easily removed. The last two lines of Remove-test attempt to clean up the grammar by removing any rules which are no longer used. However, since a symbol in the grammar may expand into many symbols when inserted, deleting traces generally results in a larger DAG (some loss in compression) after the deletions.

4.3. Procedure removal algorithm

When a procedure is changed or removed from the program, we use Find-Key-List (Figure 10) to find the test keys corresponding to the traces in which this procedure occurs. Find-Key-List begins at the terminal in the

algorithm Expand( expLink, tkey )
input: Link expLink
input: String tkey
output: List ExpList

    List ExpList, global, static
    String elem
    Link elink, tlink
    List tlist
    Grammar Rule irule
1.  irule = rule pointed to by expLink
2.  if( irule is a terminal )
3.    Append irule to end of ExpList.
4.    return
5.  elem = first symbol in production of irule.
6.  elink = link in irule production containing elem
7.  if( elem = system exit )
8.    Append elem to end of ExpList.
9.    elink = link in ExpList pointing to elem
10.   Remove tkey from TEnd and TLink for irule.
11. else
12.   Remove tkey from TEnd and TLink for irule.
13.   Expand( elink, tkey )
14. while( elem has a successor in production )
15.   elem = successor of elem in production of irule
16.   elink = link in irule production containing elem
17.   if( elem = system exit )
18.     Append elem to end of ExpList
19.     Remove tkey from TEnd and TLink for irule.
20.   else
21.     Expand( elink, tkey )
22. return ExpList

Figure 7. Expand algorithm.

algorithm Insert( inslink, ExpList, ExpTestList )
input: Link inslink
input: List ExpList
input: List ExpTestList

    List tmplist
    String tmpprod
    String tmptest
1.  while( ExpList is not empty )
2.    tmpprod = Remove first element of ExpList
3.    Insert tmpprod after inslink
4.    Move inslink to tmpprod
5.    if( tmpprod = system exit )
6.      tmplist = new List
7.      tmptest = Remove first element of ExpTestList
8.      Add tmptest to tmplist
9.      Store (tmptest, inslink) in TEnd for T
10.     Store (inslink, tmplist) in TLink for T

Figure 8. Insert algorithm.

DAG representing the procedure or component, and calls Move-up (Figure 11) to move upwards in the DAG to each rule which uses the component. In each rule, Move-up searches forward in the production rule for test keys and adds them to TKeyList. If no keys are found in the production, Move-up uses Descend (Figure 12) to recursively search subtrees in the DAG for test keys. After the keys are found, the corresponding traces are removed using Remove-test, described in the previous section.

algorithm Remove-trace( T, tkey )
input: Rule T
input: String tkey
output: Rule T

    List dlist
    Link dlink
    String currentSymbol
1.  Get dlink from TEnd for T using tkey
2.  Remove tkey from T TEnd hash table.
3.  Remove dlink from T TLink hash table.
4.  currentSymbol = symbol pointed to by dlink
5.  while( currentSymbol is not a system exit )
6.    Remove currentSymbol from T production
7.  return T

Figure 9. Remove-trace algorithm.

algorithm Find-Key-List( procedure )
input: String procedure
output: List TKeyList

    List TKeyList, global, static
1.  for each rule where procedure appears in the production
2.    Move-up( rule, link )
3.  return TKeyList

Figure 10. Find-Key-List algorithm.

The worst case running time of Find-Key-List is also O(N), where N is the uncompressed trace length, since it may also result in traversing the entire DAG.

Consider the effect of modifying procedure F in the DAG in Figure 5. Find-Key-List would use Move-up to examine each rule that has F in its production, in this case T. Move-up uses Descend to recursively search forward in T, examining 4, 1, and r, before finding x and encountering a non-empty tlist containing the key "[c". Then, F can be removed from the DAG by removing test key "[c" using Remove-test.

5. Empirical Validation

The primary research question we wish to investigate is whether incrementally computing impact analysis information with the EvolveImpact algorithms is more efficient than recomputing that information from scratch. We also wish to determine whether this relationship changes with the number of changes.

We thus performed a controlled experiment comparing the time required to update an initial EvolveImpact DAG using our algorithms to remove and add traces to the time required to completely rebuild an EvolveImpact DAG, after various numbers of test suite or program changes. We also compared the sizes of the resulting DAGs and traces.

algorithm Move-up( rule, link )
input: Rule rule
input: Link link

    List nlist
    String proc
    Boolean keeplooking
1.  proc = symbol link points to in production
2.  while( proc has a successor )
3.    proc = proc successor
4.    nlink = link in rule production containing proc
5.    Get nlist from TLink using nlink
6.    if( nlist is not empty )
7.      Append nlist to TKeyList
8.      return false
9.    else
10.     keeplooking = Descend( proc )
11.     if( keeplooking = false )
12.       return false
13. for each rule where proc appears in the production
14.   Move-up( rule, link )

Figure 11. Move-up algorithm.

algorithm Descend( rule )
input: Rule rule

    String proc
    List tlist
    Link tlink
1.  if( rule is a terminal )
2.    return true
3.  else
4.    proc = first element of production
5.    tlink = link in production containing proc
6.    Get tlist from TLink using tlink
7.    if( tlist is not empty )
8.      Append tlist to TKeyList
9.      return false
10.   else
11.     if( Descend( proc ) == false )
12.       return false
13. while( proc has a successor in production )
14.   proc = proc successor in production
15.   if( Descend( proc ) == false )
16.     return false
17. return true

Figure 12. Descend algorithm.

5.1. Experiment materials

As a subject we used a program developed for the European Space Agency; we refer to this program as space. Space is an interpreter for an antenna array definition language (ADL). Space has approximately 6,200 non-blank, non-comment lines of C code and 136 functions.

To provide input sets for use in determining dynamic behavior, we used test cases and suites created for space for earlier experiments [18]. The test cases consisted of a pool of 10,000 randomly generated test cases, plus 3,585 test cases generated by hand to ensure coverage of every branch in the program by at least 30 test cases. This pool of test cases was used to generate 1,000 branch-coverage-adequate test suites containing on average approximately 150 test cases each. We randomly selected 50 of these test suites. The selected test suites in aggregate contained 7,773 test cases, including 2,886 duplicates.

We implemented the EvolveImpact algorithms in the Java language, using the earlier PathImpact implementation as a starting point. The EvolveImpact implementation consists of 35 classes and approximately 9,700 lines of non-comment code.

5.2. Experiment methodology

5.2.1 Measures

To measure the efficiency of the various algorithms, we collected measures of (1) the time required to collect an initial set of traces and build an initial DAG, (2) the time required to reexecute the changed program or test suite and rebuild another DAG from scratch, and (3) the time required to incrementally update the initial DAG by removing and adding traces using our new algorithms. When testing the EvolveImpact algorithms with program modification, we recorded how many test keys were removed and subsequently added when updating the initial DAG. We also collected the size in bytes of each trace and each resulting DAG.

All measures were collected on five identical Sun Ultra 5 workstations. Each workstation had a 360MHz SPARC processor, 256 MBytes of main memory, 1 GByte of swap space, and used 100 MBit ethernet attached storage. There were no other users on the machines during the course of gathering information for this experiment.

Elapsed time information was collected using the standard Java library system function which returns the current system time in milliseconds. The sizes of the traces and the DAGs were measured after the trace or the DAG had been stored in a disk file.
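
The measurement pattern is the usual one (System.currentTimeMillis() is the standard Java call described above; the operation being timed is a placeholder here):

public class Timing {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // ... rebuild or update the DAG here (operation under measurement) ...
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println("elapsed ms: " + elapsedMs);
    }
}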

5.2.2 Design

In this experiment, our independent variables are the technique applied (rebuild from scratch versus incremental), and either the number of changes in the test suite or the number of procedures in the program that were modified. Our dependent variables are the three measures of time in milliseconds and six measures of size in bytes.

We ran the experiment in two stages. The first stage examined the effect of changes in test suites. For each test suite T in our set of fifty test suites, we built the initial DAG for T. We then chose a random number, k, between one and the number of test cases in T, and then, k times, chose randomly whether to add or delete a test case to or from T. Added test cases were randomly drawn from the pool of available test cases, and deleted test cases were randomly extracted from the test cases currently remaining in T, excluding any of the test cases just added. The resulting modified test suite was then used to rebuild a DAG from scratch. The updated DAG was created by deleting the test cases that were removed from the test suite from the initial DAG, executing the added test cases to create a trace file, and then adding this trace to the updated DAG. We performed this sequence of operations 15 times for each of our 50 test suites.
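
For concreteness, the following Java sketch is our reconstruction of the described modification procedure (the suite and pool contents are placeholders, and details such as tie-breaking are assumptions):

import java.util.*;

public class SuiteMutation {
    // Apply k random additions/deletions to suite, per the stage-one design:
    // deletions never remove a test case added in this same round.
    static void mutate(List<String> suite, List<String> pool, Random rnd) {
        int k = 1 + rnd.nextInt(suite.size());  // k in [1, |suite|]
        Set<String> justAdded = new HashSet<>();
        for (int i = 0; i < k; i++) {
            if (rnd.nextBoolean() && !pool.isEmpty()) {
                String t = pool.remove(rnd.nextInt(pool.size()));
                suite.add(t);
                justAdded.add(t);
            } else {
                List<String> removable = new ArrayList<>(suite);
                removable.removeAll(justAdded);
                if (!removable.isEmpty())
                    suite.remove(removable.get(rnd.nextInt(removable.size())));
            }
        }
    }

    public static void main(String[] args) {
        List<String> suite = new ArrayList<>(List.of("t1", "t2", "t3"));
        List<String> pool = new ArrayList<>(List.of("t4", "t5"));
        mutate(suite, pool, new Random(42));
        System.out.println(suite);   // prints the mutated suite
    }
}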

In the second stage of the experiment we examined the effect of making changes to procedures in space. For each of our test suites T, after building the initial DAG for T, we selected a random number of procedures in space (between one and four), found the list of test cases in the DAG containing these procedures using Find-Key-List, removed the listed test cases from the DAG, reexecuted the test cases, and added the new traces to the DAG. We also rebuilt the DAG from scratch, as in the previous stage. Note that in this case, since the test suite was unchanged, and there were no actual modifications made to space, the rebuilt DAG exactly recreated the initial DAG. While this would not always be the case in practice when procedures have been modified (changes to procedures may or may not alter test traces), this process provided a necessary experimental control, allowing us to avoid conflating the effects of procedure changes with the associated test case change effects. We also used the rebuilt DAG to provide a comparison and consistency check against the initial DAG. As in the first stage, we repeated this sequence 15 times for each test suite.

5.2.3 Threats to validity

Like any controlled experiment, this experiment faces threats to validity, and these must be controlled for, or accounted for, when interpreting results.

Threats to internal validity concern our ability to properly draw conclusions about the connections between our independent and dependent variables. In this experiment, our primary concern involves instrumentation effects, such as errors in our algorithm implementations or measurement tools, which could affect outcomes. To control for these threats we carefully validated these implementations and tools on known examples.

Threats to external validity concern our ability to generalize our results to larger classes of subjects. In this experiment we have studied only one program, and although it is a "real" program we cannot claim it is representative of programs generally. Also, the test suites we use represent only one type of suite that could be found in practice. Finally, the program and test suite modifications were simulated, not actual, modifications. Additional studies of other

[Figure 13 is a scatter plot of Update Time (ms) against Test Suite Modifications.]

Figure 13. Rebuild times and update times in milliseconds for test suite modifications. The average time required to rebuild the DAG for each number of modifications is represented using an "x". The average update time is shown as a circle.

programs and types of inputs are needed to address such threats to external validity.

Threats to construct validity concern the appropriateness of our measures for capturing our dependent variables. Our measures of time and space capture one aspect of these variables, but other factors (e.g., the amount of memory rather than disk consumed) may be important in particular situations. Also, space executes very quickly, and its execution time contributes little to the initial, rebuild, and update times. This causes algorithm run time to play a more important role in comparisons of techniques than it would for a program requiring longer execution times. Longer execution times, however, would be expected in general to increase the benefit gained by the incremental algorithms.

5.3. Analysis and results

We now present our results, beginning by presenting and discussing data. Section 5.4 discusses implications.

5.3.1 Data

In the first stage of our experiment, over each of the 50 test suites, we measured three times in milliseconds, six sizes in bytes, and the number of changes made to the test suite. The time required to build the initial DAG was collected as an experimental control and consistency check. (Since the initial DAG would be created in practice regardless of the decision to rebuild or update the DAG after changes, its creation time and size have no effect on the research question we are investigating.)

The graph in Figure 13 provides a view of the rebuild and update time measurements collected during the test suite modification stage. Each data point represents the average time in milliseconds over 15 trials for each number of random test suite modifications.

Stage two of our experiment simulated changes to procedures in space. The number of program modifications was randomly chosen between one and four, inclusive. However, since the number of modifications is transformed into a list of test cases to remove, we reasoned that the number of test cases removed from the DAG was a more reliable indicator of the extent of changes made for each trial. We refer to the number of test cases that were removed as the "Test Change Factor", or TCF. Figure 14 shows a box plot of the number of modifications made to the program compared to the observed TCF for 750 trials.

Figure 14. Number of program modifications versus Test Change Factor (TCF).

In general, the more procedures that were modified, the more test cases in the test suite were affected. However, it was not uncommon for a modification of a single procedure to affect the majority of the test cases in a test suite.

Finally, Figure 15 shows TCF versus the rebuild and update times for the procedure modification stage of the experiment.

5.3.2 Analysis of time differences

When responding to test suite modifications, the average time required to update the DAG is less than the average time required to rebuild the DAG across the entire range of test suite modification sizes. The average rebuild time was 958,156 milliseconds with a standard deviation of 172,616 milliseconds. The computational effort required to update the DAG after test suite modifications appears to be linear in the number of test cases added to or removed from the DAG under the circumstances in which we collected our data. Update time grows closer to rebuild time as the number of modifications increases, but never reaches it, even as maximum test suite size is reached.

The time required to update the DAG after procedure modifications shows a more complex relationship to rebuild times. The average rebuild time was 830,417 milliseconds with a standard deviation of 95,340 milliseconds. For program modifications that affect fewer than approximately 25 test cases in the DAG, updating the DAG required less time than recomputing the DAG. For large TCF values, the computational effort required by the two approaches appears relatively equal. Between a TCF of 25 and approximately 145, however, updating the DAG was consistently (with one outlier, a TCF of 33 having an average update time of 668,389 milliseconds versus an average rebuild time of 830,653 milliseconds) less efficient than recomputation. Update times were also relatively variable in this range, most likely due to variation in the internal compression of the traces in the DAG. In areas of high compression, less graph traversal is required when searching for the appropriate test keys.

5.3.3 Analysis of size differences

We present the size measurements for the test suite modification stage of our experiment first, followed by the size measurements for the program modification stage.

The average size of the initial traces generated by the test suites during the first stage of the experiment was 982,460 bytes, with a standard deviation of 49,221 bytes. The average size of the initial DAG was 80,685 bytes, with a standard deviation of 3,967 bytes. The average size of the initial DAG relative to the initial trace was 7.78%, with a standard deviation of 0.57%.

The recomputed DAGs in the first stage had an average size of 81,906 bytes, with a standard deviation of 5,197 bytes. The traces used to rebuild the DAGs had an average size of 1,039,282 bytes, with a standard deviation of 90,874 bytes. The relative size of the DAG compared to its corresponding trace was 7.74%, with a standard deviation of 0.56%.

Updated DAGs had an average size of 92,832 bytes, with a standard deviation of 8,220 bytes. The updated DAG was on average 1.13 times the size of the rebuilt DAG, with a standard deviation of 0.085. Since the size of the corresponding trace reflects only test cases that were added, a better relative view of the compression achieved can be obtained by observing the size of the updated DAG in relation to the size of the traces used to rebuild the DAG. In this case the updated DAG was 8.44% the size of the traces used to rebuild the DAG, with a standard deviation of 0.72%.

For our experiments with program modifications, the average size of the initial DAG was 80,651 bytes, with a standard deviation of 4,407 bytes. The average size of the initial traces was 982,461 bytes, with a standard deviation of 49,221 bytes. On average the size of the DAG was 7.80% the size of the trace, with a standard deviation of 0.57%.

The traces used to recompute DAGs, and the recomputed DAGs, exactly matched the size and variation of the initial traces and DAGs, since they were based on the same test suite and the run-time behavior of space did not change.

The updated DAGs had an average size of 103,623 bytes and a standard deviation of 23,833 bytes. In comparison to the recomputed DAGs, the updated DAGs were 1.28 times larger, with a standard deviation of 0.290. The updated DAGs were 10.0% the size of the traces used to create the recomputed DAGs, with a standard deviation of 2.4%.

[Figure 15 is a scatter plot of Update Time (ms) against Test Change Factor.]

Figure 15. Rebuild times versus update times in milliseconds by TCF. The average time required to rebuild the DAG for each TCF is represented using an "x". The average update time is shown as a circle.

5.4. Discussion

Our primary goal was to compare the time required to update the DAG to the time required to recalculate the DAG. In all observed cases, it was more efficient to update a DAG than to regenerate it when the program's test suite was modified. The computational effort measured by elapsed time grows slowly with the number of changes made to the test suite, never becoming equivalent to the recomputation effort even when all test cases in the suite are modified.

When a procedure or procedures in the program were modified, results varied with the number of test cases containing the modified procedures. In space we found it is not uncommon for one procedure to be hit by a majority of the test cases. The computational effort required to update the DAG was less than the effort of recomputation only for program modifications that affect approximately 25 test cases or fewer. One possible explanation for this result is that Find-Key-List may perform more graph traversal operations than expected, or that the traversal may be more computationally expensive than was apparent, above a certain modification threshold.

An additional goal of our experiments was to measure the absolute and relative sizes of the traces space generated and the EvolveImpact DAGs constructed from them. The sizes of the traces and DAGs were consistent and showed little variation. A compression factor of approximately 10 was consistently observed. Updating the whole-path DAGs caused only minor loss of compression.

6. Conclusions and Future Work

In this paper, we have presented algorithms that allow our PathImpact approach to dynamic impact analysis to be applied incrementally as a system evolves, processing the set of test case or procedure modifications made to a program, and avoiding the overhead of completely recomputing the information needed for impact analysis.

Our empirical study of these algorithms suggests that changes in a test suite can be accommodated more efficiently using EvolveImpact than by rebuilding the DAG from scratch. The results also suggest that when procedures are changed, the cost-effectiveness of the approach will vary depending on the number of test cases which reach those procedures. After a system change in space, it becomes more efficient to rebuild the DAG than to use the EvolveImpact algorithms once more than a relatively small number of tests are affected. This may be due to the behavior of space and the particular test cases we used in our experiment, but could be expected to hold for other workloads too. An immediate point of investigation is to find a practical metric with which to determine when this threshold is reached, both by extended experimentation with space and by examining the behavior of other programs. We will also investigate the possibility of creating a more efficient algorithm for removing components from the DAG, one that removes components directly rather than using Find-Key-List and then removing each test.

Future work includes experimentation with a wider range of programs, an examination of the sensitivity of our techniques to test input data, and consideration of object-oriented and distributed programs. While this research has investigated the application of an approach at the level of a procedure, we intend to apply this model to other levels, such as components. We intend to investigate scalability to large systems by adapting the local level of modeling according to various criteria, such as local usage or complexity measures.

We also plan additional empirical comparisons with conventional dynamic slicing techniques. While typical dynamic slicing techniques operate at a different level (statement-level versus procedure-level) than our techniques, the behavior of impact analyses based on dynamic slicing may be more directly comparable to our techniques than analyses based on other approaches such as static slicing, and the techniques may offer different cost-benefit tradeoffs.

Acknowledgements

This work was supported in part by the NSF Information Technology Research program under Award CCR-0080900 to Oregon State University. Phyllis Frankl, Alberto Pasquini, and Filip Vokolos provided the space program and randomly generated tests. A special thanks to Chengyun Chu for creating the additional tests for space.

References

[1] H. Agrawal and J. Horgan. Dynamic program slicing. In Proceedings of the SIGPLAN '90 Conference on Programming Language Design and Implementation, SIGPLAN Notices, pages 246–256, White Plains, NY, June 1990. ACM.

[2] R. S. Arnold and S. A. Bohner. Impact analysis – towards a framework for comparison. In Proceedings of the International Conference on Software Maintenance, pages 292–301, Montreal, Quebec, Canada, Sept. 1993. IEEE.

[3] S. Bohner and R. Arnold. Software Change Impact Analysis. IEEE Computer Society Press, Los Alamitos, CA, USA, 1996.

[4] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. ACM Trans. Prog. Lang. Syst., 12(1):27–60, Jan. 1990.

[5] M. Kamkar. An overview and comparative classification of program slicing techniques. Journal of Systems and Software, 31(3):197–214, 1995.

[6] B. Korel and J. Laski. Dynamic slicing in computer programs. Journal of Systems and Software, 13(3):187–195, 1990.

[7] J. Larus. Whole program paths. In Proc. SIGPLAN PLDI 99, pages 1–11, Atlanta, GA, May 1999. ACM.

[8] J. Law and G. Rothermel. Whole program path-based dynamic impact analysis. In ICSE 2003, Portland, Oregon, May 2003. IEEE.

[9] M. Lee. Change Impact Analysis of Object-Oriented Software. Ph.D. dissertation, George Mason University, Feb. 1999.

[10] M. Lee, A. J. Offutt, and R. T. Alexander. Algorithmic analysis of the impacts of changes to object-oriented software. In TOOLS-34 '00, 2000.

[11] L. Li and A. J. Offutt. Algorithmic analysis of the impact of changes to object-oriented software. In Proceedings of the International Conference on Software Maintenance, pages 171–184, Monterey, CA, USA, Nov. 1996. IEEE.

[12] M. Lindvall and K. Sandahl. How well do experienced software developers predict software change? Journal of Systems and Software, 43(1):19–27, Oct. 1998.

[13] J. L. Lions. ARIANE 5, Flight 501 Failure, Report by the Inquiry Board. European Space Agency, July 1996.

[14] J. P. Loyall, S. A. Mathisen, and C. P. Satterthwaite. Impact analysis and change management for avionics software. In Proceedings of the IEEE National Aerospace and Electronics Conference, Part 2, pages 740–747, Dayton, OH, July 1997.

[15] C. Nevill-Manning and I. Witten. Linear-time, incremental hierarchy inference for compression. In Proc. Data Compression Conference (DCC 97), pages 3–11. IEEE, 1997.

[16] S. L. Pfleeger. Software Engineering: Theory and Practice. Prentice Hall, Englewood Cliffs, NJ, 1998.

[17] A. Podgurski and L. Clarke. A formal model of program dependences and its implications for software testing, debugging, and maintenance. IEEE Trans. Softw. Eng., 16(9):965–979, Sept. 1990.

[18] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold. Test case prioritization: an empirical study. In Proceedings of the International Conference on Software Maintenance, pages 179–188, Oxford, UK, Sept. 1999.

[19] B. G. Ryder and F. Tip. Change impact analysis for object-oriented programs. In ACM SIGPLAN–SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pages 46–53. ACM Press, 2001.

[20] F. Tip. A survey of program slicing techniques. Journal of Programming Languages, 3:121–189, 1995.

[21] R. J. Turver and M. Munro. An early impact analysis technique for software maintenance. Journal of Software Maintenance: Research and Practice, 6(1):35–52, Jan. 1994.

[22] M. Weiser. Program slicing. In 5th International Conference on Software Engineering, pages 439–449, San Diego, CA, Mar. 1981. IEEE.

