
    Automatic Extraction of Heap Reference Properties in Object-Oriented Programs

    Brian Demsky and Martin Rinard, Member, IEEE Computer Society

    Abstract: We present a new technique for helping developers understand heap referencing properties of object-oriented programs and how the actions of the program affect these properties. Our dynamic analysis uses the aliasing properties of objects to synthesize a set of roles; each role represents an abstract object state intended to be of interest to the developer. We allow the developer to customize the analysis to explore the object states and behavior of the program at multiple different and potentially complementary levels of abstraction. The analysis uses roles as the basis for three abstractions: role transition diagrams, which present the observed transitions between roles and the methods responsible for the transitions; role relationship diagrams, which present the observed referencing relationships between objects playing different roles; and enhanced method interfaces, which present the observed roles of method parameters. Together, these abstractions provide useful information about important object and data structure properties and how the actions of the program affect these properties. We have implemented the role analysis and have used this implementation to explore the behavior of several Java programs. Our experience indicates that, when combined with a powerful graphical user interface, roles are a useful abstraction for helping developers explore and understand the behavior of object-oriented programs.

    Index Terms: Program understanding, roles, design recovery.

    1 INTRODUCTION

    This paper presents a new technique to help developers understand heap referencing properties (such properties capture constraints that involve references between objects in the heap) of object-oriented programs and how the actions of the program affect those properties. Our thesis is that each object's referencing relationships with other objects determine important aspects of its purpose in the computation, and that we can use these referencing relationships to synthesize a set of conceptual object states (we call each state a role) that captures these aspects. As the program manipulates objects and changes their referencing relationships, each object transitions through a sequence of roles, with each role capturing the functionality inherent in its current referencing relationships. To the best of our knowledge, the concept that an object's referencing relationships, in conjunction with other properties, should determine its conceptual state was initially developed by Kuncak et al. [18].

    We have built two tools that enable a developer to use roles to explore the behavior of object-oriented programs: 1) a dynamic role analysis tool that automatically extracts the different roles that objects play in a given computation and characterizes the effect of program actions on these roles, and 2) a graphical, interactive exploration tool that is intended to present this information in an intuitive form to the developer. By allowing the developer to customize the presentation of this information to show the amount of detail appropriate for the task at hand, these tools support the exploration of both detailed properties within a single data structure and larger properties that span multiple data structures. Our experience using these tools indicates that they can provide substantial insight into the structure, behavior, and key properties of the program and the objects that it manipulates.

    1.1 Role Separation Criteria

    The foundation of our role analysis system is a set of criteria (the role separation criteria) that the system uses to separate objects of the same class into different roles. Conceptually, we formalize the role separation criteria as a set of predicates that classify objects into roles. Note that this classification of objects into roles can change with time as the object's referencing relationships change. Each predicate captures some aspect of the object's referencing relationships. Two objects play the same role if they have the same values for these predicates. Our system supports predicates that capture the following kinds of relationships:

    • Referenced-By Relationships: The functionality of an object often depends on the objects that refer to it. For example, objects of the PlainSocketImpl class (footnote 1) acquire input and output capabilities when referenced by a SocketInputStream or SocketOutputStream object. The role separation criteria capture these distinctions by placing objects that are referenced by different fields in different roles. Formally, there is a role separation predicate for each field of each class for a specific number of references.

    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 35, NO. 3, MAY/JUNE 2009 305

    • B. Demsky is with the University of California, Irvine, 544E Engineering Tower, Irvine, CA 92697. E-mail: [email protected].

    • M. Rinard is with the MIT Computer Science and Artificial Intelligence Laboratory, The Stata Center, Building 32-G744, 32 Vassar Street, Cambridge, MA 02139. E-mail: [email protected].

    Manuscript received 10 Apr. 2008; accepted 8 Aug. 2008; published online 24 Nov. 2008. Recommended for acceptance by M. Dwyer. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSE-2008-04-0144. Digital Object Identifier no. 10.1109/TSE.2008.91.

    Footnote 1: The PlainSocketImpl class is the undocumented implementation class for socket communications in the Java API. We used the version of this class included with the Sun JDK version 1.1.8.

    0098-5589/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society

    Authorized licensed use limited to: Mochamad Hariadi. Downloaded on July 12, 2009 at 02:24 from IEEE Xplore. Restrictions apply.


    An object o satisfies the role separation predicate for the field f declared in the class C for i references if the f field in exactly i objects that extend C or implement C's interface contains a reference to o. The user can specify an upper bound on the number of distinctions to make for a given field. If the upper bound is i, then all objects with at least i such references satisfy the same predicate.

    • Reference-To Relationships: The functionality of an object often depends on the objects to which it refers. A Java Socket object, for example, does not support communication until its file descriptor field refers to an actual file descriptor object. To capture these distinctions, our analysis contains role separation criteria that place objects in different roles if they have different nonnull fields. Formally, there is a role separation predicate for each nonprimitive field of each class. An object o satisfies the role separation predicate for the field f declared in the class C if the object's f field is nonnull. Although we could, in principle, use more complicated criteria, we have found that the null value criterion usually captures the important properties of the reference-to relation without introducing extraneous distinctions (footnote 2).

    • Reachability: The functionality of an object often depends on the specific data structures in which it participates. For example, a program may maintain two sets of objects: one set that it has completed processing, and another that it has yet to process. To capture such distinctions, our role separation criteria identify the roots of different data structures and place objects with different reachability properties from these roots in different roles. Formally, there is a predicate for each variable that may be a root of a data structure. An object satisfies the predicate if it is reachable from the variable. Additionally, we define a unique garbage role for unreachable objects.

    • Identity: To facilitate navigation, data structures often contain reverse pointers. For example, the objects in a circular doubly linked list satisfy identity predicates corresponding to the paths next.prev and prev.next. Formally, there is a role separation predicate for each pair of fields. The predicate is true if the path specified by the two fields exists and leads back to the original object.

    • History: In some cases, objects may change their conceptual state when a method is invoked on them, but the state change may not be visible in the referencing relationships. For example, the native method bind assigns a name to objects of the PlainSocketImpl class, enabling them to accept connections. But the data structure changes associated with this change are hidden behind the operating system abstraction. To support this kind of conceptual state change, the role separation criteria include part of the method invocation history of each object. Formally, there is a predicate for each parameter of each method. An object satisfies one of these predicates if it was passed as that parameter in some invocation of that method.
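
The criteria above can be read as a function from a heap snapshot to a role signature, with two objects playing the same role when their signatures are equal. The following Java sketch illustrates that reading for the referenced-by (with an upper bound), reference-to, and identity predicates only; reachability and history are omitted. The classes HeapObject and RolePredicates and the string encoding of the signature are hypothetical and are not taken from the tool described in this paper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of one heap object: a class name plus named reference fields.
final class HeapObject {
    final String className;
    final Map<String, HeapObject> fields = new HashMap<>();

    HeapObject(String className) { this.className = className; }
}

final class RolePredicates {
    // Referenced-by: for each "Class.field", the number of objects in the heap
    // that refer to target through that field, capped at upperBound (>= 1).
    static Map<String, Integer> referencedBy(List<HeapObject> heap, HeapObject target, int upperBound) {
        Map<String, Integer> counts = new TreeMap<>();
        for (HeapObject src : heap) {
            for (Map.Entry<String, HeapObject> e : src.fields.entrySet()) {
                if (e.getValue() == target) {
                    String key = src.className + "." + e.getKey();
                    counts.merge(key, 1, (a, b) -> Math.min(a + b, upperBound));
                }
            }
        }
        return counts;
    }

    // Reference-to: the fields of o that are currently non-null.
    static TreeSet<String> nonNullFields(HeapObject o) {
        TreeSet<String> result = new TreeSet<>();
        for (Map.Entry<String, HeapObject> e : o.fields.entrySet()) {
            if (e.getValue() != null) result.add(e.getKey());
        }
        return result;
    }

    // Identity: field pairs "f.g" such that following o.f and then .g leads back to o.
    static TreeSet<String> identityPaths(HeapObject o) {
        TreeSet<String> result = new TreeSet<>();
        for (Map.Entry<String, HeapObject> first : o.fields.entrySet()) {
            HeapObject mid = first.getValue();
            if (mid == null) continue;
            for (Map.Entry<String, HeapObject> second : mid.fields.entrySet()) {
                if (second.getValue() == o) result.add(first.getKey() + "." + second.getKey());
            }
        }
        return result;
    }

    // Combines the predicate values into one comparable signature string.
    static String signature(List<HeapObject> heap, HeapObject o, int upperBound) {
        return o.className + " refBy=" + referencedBy(heap, o, upperBound)
                + " refTo=" + nonNullFields(o) + " identity=" + identityPaths(o);
    }
}
```

In a circular doubly linked list of three nodes, every node receives the same signature under this sketch, while a node that has not yet been linked in receives a different one.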

    1.2 Role Subspaces

    To allow the developer to customize the role separation criteria, our system supports role subspaces. Each role subspace contains a subset of the possible role separation criteria. The developer specifies a role subspace by choosing an arbitrary subset of the role separation criteria. When operating within a given subspace, the tools coarsen the separation of objects into roles by only keeping the distinctions made by the criteria in that subspace. We envision that developers will use subspaces in a variety of ways:

    • Focused Subspaces: As developers explore the behavior of the program, they typically focus on different and changing aspects of the object properties and referencing relationships. By choosing a subspace that excludes irrelevant criteria, the developer can explore relevant properties at an appropriate level of detail while ignoring distracting distinctions that are currently irrelevant.

    • Orthogonal Subspaces: Developers can factor the role separation criteria into orthogonal subspaces. Each subspace identifies a current role for each object; when combined, the subspaces provide a classification structure in which each object can simultaneously play multiple roles, with each role chosen from a different subspace. These subspaces allow the developer to separate orthogonal concerns into orthogonal subspaces.

    • Hierarchical Subspaces: Developers can construct a hierarchy of role subspaces, with child subspaces augmenting parent subspaces with additional role separation criteria. In effect, this approach allows developers to identify an increasingly precise and detailed dynamic classification hierarchy for the roles that objects play during their lifetimes in the computation.

    Role subspaces give the developer great flexibility in exploring different perspectives on the behavior of the program. Developers can use subspaces to view changing object states as combinations of roles from different orthogonal role subspaces, as paths through an increasingly detailed classification hierarchy, or as individual points in a constellation of relevant states. Unlike traditional structuring mechanisms such as classes, roles and role subspaces support the evolution of multiple complementary views of the program's behavior, enabling the developer to seamlessly flow through different perspectives as he or she explores different aspects of the program at hand.
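
One way to picture a role subspace is as a projection: a full role is a valuation of all predicates, and a subspace keeps only the predicates it names. The Java sketch below illustrates this under assumptions of our own; the class RoleSubspace, the representation of a role as a map from predicate names to truth values, and predicate names such as "refTo:fd" are hypothetical and do not describe the tool's actual data structures.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RoleSubspace {
    // Names of the role separation criteria kept by this subspace (hypothetical naming).
    private final Set<String> criteria;

    RoleSubspace(Set<String> criteria) { this.criteria = new TreeSet<>(criteria); }

    // Projects a full predicate valuation onto this subspace, discarding the
    // distinctions made by criteria outside it.
    Map<String, Boolean> project(Map<String, Boolean> fullRole) {
        Map<String, Boolean> coarse = new TreeMap<>();
        for (Map.Entry<String, Boolean> e : fullRole.entrySet()) {
            if (criteria.contains(e.getKey())) coarse.put(e.getKey(), e.getValue());
        }
        return coarse;
    }

    // Hierarchical use: a child subspace augments its parent with extra criteria.
    RoleSubspace refine(Set<String> extra) {
        Set<String> merged = new TreeSet<>(criteria);
        merged.addAll(extra);
        return new RoleSubspace(merged);
    }
}
```

Two objects that differ only in a predicate outside the subspace project to the same coarse role; refining the subspace with that predicate separates them again.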

    1.3 Static Versus Dynamic Analysis

    The tool presented in this paper uses a dynamic analysis to extract role information that reflects the observed actions and states from a single execution of the program. Like all dynamic analyses, the extracted information may, therefore, be incomplete in that different executions of the program may produce different actions and states. Sound static analyses, on the other hand, compute information that reflects the actions and states of all possible executions. The potential drawback is that the information may be more difficult to extract or less precise than the information from a dynamic analysis with similar goals. In particular, we found it easier to build a dynamic tool that extracts role information than to build a static analysis that either verifies or discovers such information [18]. The reasons range from simple engineering issues (we found it is easier to instrument the program and analyze the extracted information than to build a parser and semantic analysis for a complete programming language) to fundamental complexity issues (dynamic analyses only need to deal with the concrete relationships that occur when the program executes, while static analyses need some systematic way to characterize uncertainty in these relationships).

    Footnote 2: If a developer finds that this criterion does not provide sufficient details about the reference-to relationships, he or she can use the multiple object data structure mechanism described in Section 3.2 to merge the entire role description of a given class of objects into the roles of the objects that reference them.

    1.4 Contributions

    This paper makes the following contributions:

    • Role-Based Program Understanding: It introduces the concept that object referencing relationships and method invocation histories can be used to synthesize a cognitively tractable abstraction for understanding the changing roles that objects play in the computation.

    • Role Separation Criteria: It presents a set of criteria for classifying objects of the same class into different roles. It also presents an implemented tool that uses these criteria to automatically extract information about the roles that objects play.

    • Role Subspaces: It shows how developers can use role subspaces to structure their understanding and presentation of the different aspects of the program state. Specifically, the developer can customize the role subspaces to focus the role separation criteria to hide (currently) irrelevant distinctions, to factor the object state into orthogonal components, and to develop object classification hierarchies.

    • Graphical Role Exploration: It presents a tool that graphically and interactively presents role information. Specifically, this tool presents role transition diagrams, which display the trajectories that objects follow through the space of roles, and role relationship diagrams, which display referencing relationships between objects that play different roles. These diagrams are hyperlinked for easy navigation.

    • Role Exploration Strategy: It presents a general strategy that we developed to use the tools to explore the behavior of object-oriented programs.

    • Experience: It presents our experience using our tools on several Java programs. We found that the tools enabled us to quickly discover and understand important properties of these programs.

    2 EXAMPLE

    We next present a simple example that illustrates how a developer can use our tools to explore the behavior of a Web server. We use a version of JhttpServer, a Web server written in Java. This program accepts incoming requests for files from Web browsers and serves the files back to the Web browsers.

    The code in the JhttpServer class first opens a port and waits for incoming connections. When it receives a connection, it creates a JhttpWorker object, passes the Socket controlling the communication to the JhttpWorker initializer, and turns control over to the JhttpWorker object.

    The code in the JhttpWorker class first builds input and output streams corresponding to the Socket. It then parses the Web browser's request to obtain the requested filename and the http version from the Web browser. Next, it processes the request. Finally, it closes the streams and the socket and returns to code in the JhttpServer class.

    2.1 Starting Out

    To use our system, the developer first compiles the program using our compiler, then runs the program. The compiler inserts instrumentation code that generates an execution trace. This trace consists of a log of the important heap operations, local variable manipulations, and method calls that the program performs. Note that it is not possible to use the standard JVM profile interface [28] to obtain this information: although this interface can generate a notification event for many important operations, it does not generate notification events for object or array field writes. We need this write event information to track the object referencing relationships and synthesize roles from this information.

    The tool then reads the trace to extract the role information and convert it into a form suitable for interactive graphical display. The tool evaluates the roles of the objects at method boundaries. We use four abstractions to present the observed role information to the developer:

    1. role transition diagrams, which present the observed role transitions for objects of a given class,

    2. role relationship diagrams, which present referencing relationships between objects from different classes,

    3. role definitions, which present the referencing relationships that define each role, and

    4. enhanced method interfaces, which show the object referencing properties at invocation and the effect of the method on the roles of the objects that it accesses.

    The graphical user interface runs in a Web browser with related information linked for easy navigation. We chose this implementation platform because it satisfied all of our user interface needs. The alternative, building our own custom user interface platform, would have substantially increased the engineering effort required to build the system without a corresponding increase in the usability of the system.

    DEMSKY AND RINARD: AUTOMATIC EXTRACTION OF HEAP REFERENCE PROPERTIES IN OBJECT-ORIENTED PROGRAMS 307


    2.2 Role Transition Diagrams

    We expect that developers will typically start exploring the behavior of a program by examining role transition diagrams to get a feel for the different roles that objects of each class play in the computation. In this example, we assume the developer first examines the role transition diagram for the JhttpWorker class, which handles client requests. Fig. 1 presents this diagram (footnote 3). Note that our tool automatically generates initial names for roles. If the developer is unhappy with the automatically generated names, he or she can rename the roles. The initial names our tool generates consist of a list of the fields that reference objects playing the role, followed by the class name, followed by a list of fields that are nonnull in objects playing the role. Special names are generated for the initial role for each class and the garbage role. Excessively complex roles are simply assigned a name consisting of the class followed by a unique number.
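
The naming scheme just described can be sketched as a small Java function. This is only an illustration: the class RoleNamer, the separators, the "w/" prefix, and the maxFields threshold that decides when a role counts as excessively complex are all assumptions, since the text does not specify them. The special names for the initial and garbage roles are not modeled.

```java
import java.util.List;

final class RoleNamer {
    // Builds an initial role name from the referencing fields, the class name,
    // and the non-null fields. maxFields and uniqueId are hypothetical parameters.
    static String initialName(List<String> referencingFields, String className,
                              List<String> nonNullFields, int maxFields, int uniqueId) {
        // Fallback for excessively complex roles: class name plus a unique number.
        if (referencingFields.size() + nonNullFields.size() > maxFields) {
            return className + uniqueId;
        }
        StringBuilder name = new StringBuilder();
        if (!referencingFields.isEmpty()) {
            name.append(String.join(",", referencingFields)).append(' ');
        }
        name.append(className);
        if (!nonNullFields.isEmpty()) {
            name.append(" w/").append(String.join(",", nonNullFields));
        }
        return name.toString();
    }
}
```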

    The ellipses represent roles and the arrows represent transitions between roles. Each arrow is labeled with the method that caused the object to take the transition. Solid edges denote the execution of methods that take JhttpWorker objects as a parameter; dotted edges denote either methods that change the roles of JhttpWorker objects but do not take JhttpWorker objects as a parameter or portions of methods. From the figure, we can see that the JhttpWorker.method method transitions JhttpWorker objects from the Initialized JhttpWorker role to the JhttpWorker with filename role. The role JhttpWorker with methodType is an intermediate role that is made visible in the middle of the execution of JhttpWorker.method when that method makes a call to another method. Note that methods may change the roles of objects that are not parameters either indirectly, by changing a heap reference to the object or the object's reachability, or directly, by accessing the object through a global variable or through another object. The diagram always presents the most deeply nested (in the call graph) method responsible for the role change.

    2.3 Role Definitions

    Role transition diagrams show how objects transition between roles, but provide little information about the roles themselves. Our graphical interface therefore links each role node with its role definition, which specifies the properties that all objects playing that role must have. Fig. 2 presents the role definition for the JhttpWorker with filename role, which is easily accessible by using the mouse to select the role's node in the role transition diagram. This definition specifies that objects of the JhttpWorker with filename role have the class JhttpWorker, no references from other objects, no identity relations, and reference objects using the fields httpVersion, fileName, methodType, and client.

    2.4 Role Relationship Diagrams

    After obtaining an understanding of the roles of important classes, the developer typically moves on to consider relationships between objects of different classes. These relationships are often crucial for understanding the larger data structures that the program manipulates. We believe that role relationship diagrams are the primary tool that developers will use to help them understand these relationships. Fig. 3 presents a portion of the role relationship diagram surrounding one of the roles of the JhttpWorker class. The ellipses in this diagram represent roles, and the arrows represent referencing relationships between objects playing those roles.

    Note that some of the groups of roles presented in Fig. 3 correspond to combinations of objects that conceptually act as a single entity. For example, the HashStrings object and the underlying array of Pairs that it points to implement a map from String to String. Developers often wish to view a less detailed role relationship diagram that merges the roles for these kinds of combinations.

    Fig. 1. Role transition diagram for JhttpWorker class.

    Fig. 2. Sample role definition for JhttpWorker class.

    Footnote 3: In addition to graphically presenting these diagrams in a Web browser, our tool is capable of generating PostScript images of each diagram using the dot tool [6]. All of the diagrams in this paper were automatically generated using our tool.

    Footnote 4: Section 3.2 discusses the specific user-selected policies the analysis uses to discover combinations of objects that conceptually act as a single entity.

    In many cases, the analysis can automatically recognize these combinations and represent them with a single role node (footnote 4). Fig. 4 presents the role relationship diagram that the tool produces when the developer turns this option on. The analysis uses the heuristic that if only one heap reference ever exists to an object, this is likely to be conceptually part of the object that references it. Notice that this heuristic enables the analysis to recognize the Socket object and the httpVersion string as being part of the JhttpWorker object. Also, notice that it recognizes the Pair arrays, Pair objects, and key strings as being part of the corresponding HashStrings object, with the key strings disappearing in the abstracted diagram because they are encapsulated within the HashStrings data structure. The analysis allows the developer to choose, for each class, a policy that determines how (and if) the analysis merges roles of that class into larger data structures.
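
The single-reference heuristic can be sketched as a small bookkeeping class over heap reference events. The sketch below reads "only one heap reference ever exists" as "the object never has more than one heap reference at the same time"; that reading, the class name PartObjectHeuristic, and the use of integer object identifiers are assumptions made for illustration and are not confirmed by the text.

```java
import java.util.HashMap;
import java.util.Map;

final class PartObjectHeuristic {
    // object id -> number of heap references currently pointing at it
    private final Map<Integer, Integer> liveRefs = new HashMap<>();
    // object id -> highest simultaneous reference count observed so far
    private final Map<Integer, Integer> maxRefs = new HashMap<>();

    // Records that a heap reference to target was created.
    void referenceAdded(int target) {
        int now = liveRefs.merge(target, 1, Integer::sum);
        maxRefs.merge(target, now, Integer::max);
    }

    // Records that a heap reference to target was removed or overwritten.
    void referenceRemoved(int target) {
        liveRefs.merge(target, -1, Integer::sum);
    }

    // True if the object was referenced from the heap, but never by more than
    // one reference at a time; such objects are candidates for merging into
    // the object that references them.
    boolean isPartOfReferencer(int target) {
        return maxRefs.getOrDefault(target, 0) == 1;
    }
}
```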

    An examination of Figs. 3 and 4 shows that objects of the PlainSocketImpl class play many different roles. To explore these roles, the developer examines the role transition diagram for the PlainSocketImpl class. Fig. 5 presents this diagram. The diagram contains two disjoint sets of roles, each branching off of the Initial PlainSocket role. This structure indicates that objects of the class have two distinct purposes in the computation: PlainSocketImpl objects that are referenced by Socket objects manage communication over a TCP/IP connection, while PlainSocketImpl objects that are referenced by ServerSocket objects accept new incoming socket connections. We have used the name ServerPlainSocketImpl to name the roles that serve the latter purpose. This is an example of a common code reuse pattern in which multiple distinct functionalities are merged into a single object type. In this example, our analysis was able to recover design information about two distinct usage scenarios for the PlainSocketImpl class.

    Each PlainSocketImpl object has a corresponding file descriptor that the underlying operating system uses to implement the socket communication. Even though the state associated with these file descriptors is inaccessible to Java, this state is conceptually part of the object and can affect the object's interface. We note that although the bind and listen methods do not modify the heap referencing properties of a PlainSocketImpl object, they do modify the state associated with the corresponding file descriptor. Moreover, they enable the accept method to be invoked on the corresponding PlainSocketImpl object. To capture this conceptual change in the object's role, the developer can specify that the invocation of certain methods on an object changes the object's role. Our implementation uses a set of method invocation history predicates to capture these changes. In this example, we configured the tool to include method history predicates for both the bind and listen methods.
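
Method invocation history predicates can be pictured as a per-object set of remembered invocations, restricted to the methods the developer has chosen to track. The Java sketch below is illustrative only: the class MethodHistory, the "method#paramIndex" encoding of a predicate, and the integer object identifiers are hypothetical and do not describe the tool's configuration format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MethodHistory {
    // Methods whose invocations should be remembered (chosen by the developer).
    private final Set<String> tracked;
    // object id -> "method#paramIndex" predicates the object satisfies so far
    private final Map<Integer, Set<String>> history = new HashMap<>();

    MethodHistory(Set<String> tracked) { this.tracked = new TreeSet<>(tracked); }

    // Called for each object passed as parameter paramIndex of an invoked method.
    void recordInvocation(String method, int paramIndex, int objectId) {
        if (!tracked.contains(method)) return;
        history.computeIfAbsent(objectId, k -> new TreeSet<>()).add(method + "#" + paramIndex);
    }

    // The history predicates currently satisfied by the given object.
    Set<String> predicatesFor(int objectId) {
        return history.getOrDefault(objectId, new TreeSet<>());
    }
}
```

With bind and listen tracked, an object that has been passed to bind and then to listen satisfies two history predicates, which separates it from an otherwise identical object on which neither method has been invoked.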

    2.5 Enhanced Method Interfaces

    Finally, our tool can present information about the roles of parameters and the effect of each method on the roles that different objects play. Given a method, our tool presents this information in the form of an enhanced method interface. The call context section of this interface provides the roles of the parameters at method entry and exit. The write effects section of this interface provides a list of regular expressions summarizing the writes performed by the method. These regular expressions give the path to an object from the parameters of the method and the global variables in terms of the heap at method invocation. The read effects section provides a list of regular expressions summarizing the references that the method reads. The role transitions section provides a list of the role transitions the method has been observed to perform and the corresponding regular expressions specifying the path to the objects that have undergone the role transition. The presence of the keyword NEW for a regular expression indicates that the object was allocated within the scope of the method. Fig. 6 presents an enhanced method interface for the SocketInputStream initializer. This interface indicates that the SocketInputStream initializer operates on objects that play the roles Initial InputStream and PlainSocket w/fd. When it executes, it changes the roles of these objects to InputStream w/impl and PlainSocket w/input, respectively.

    Fig. 3. Portion of role relationship diagram for JhttpServer.

    Fig. 4. Portion of role relationship diagram for JhttpServer after part object abstraction.

    Enhanced method interfaces provide the developer with additional information about the (otherwise implicit) assumptions that the method may make about its parameters and the roles of the objects that it manipulates. This information may help the developer better understand the purpose of the method in the computation and provide insight for its successful use in other contexts (footnote 5).

    2.6 Role Information

    In general, roles capture important properties of the objects and provide useful information about how the actions of the program affect those properties (footnote 6).

    • Consistency Properties: Our analysis can discover program-level data structure consistency properties. For example, our analysis may discover that while an object may participate in several data structures over its lifetime in the program, at any given time, it participates in at most one of those data structures. The analysis can also discover fine-grained properties within an individual data structure, for example, the next and prev references in a doubly linked list are inverses. Section 6 provides several examples of additional data structure properties that occur in our set of benchmark applications.

    • Enhanced Method Interfaces: In many cases, the interface of a method makes assumptions about the referencing relations of its parameters. Our analysis can discover constraints on the roles of parameters of a method and determine the effect of the method on the heap.

    • Multiple Uses: Code factoring minimizes code duplication by producing general-purpose classes that can be used in a variety of contexts. But this practice obscures the different purposes that different objects of these classes serve in the computation. Our analysis can rediscover these distinctions.

    • Correlated Relationships: In many cases, groups of objects cooperate to implement a piece of functionality, with the roles of the objects in the group changing together over the course of the computation. Our analysis can discover these correlated state changes.

    Fig. 5. Role transition diagram for the PlainSocketImpl class.

    Footnote 5: For example, the method may require one of its parameters to play a specific role. The enhanced method interface would reveal this to the developer.

    Footnote 6: Section 6 presents some examples of the tool being used to discover the following properties for a set of benchmarks.

    3 DYNAMIC ANALYSIS

    We implemented the dynamic analysis as several components. The first component uses the MIT FLEX compiler (see footnote 7) to instrument Java programs to generate execution traces. Because our compiler accepts Java bytecode as input and generates native code as output, it does not require source code. The instrumented program assigns unique identifiers to every object and records relevant heap and pointer operations in the execution trace. Relevant operations in this case include writing pointer values to arrays or fields, cloning objects, creating new objects, reading pointer values, creating or changing local variable references to objects, and method calls and returns. The second component uses this trace to replay the program's manipulation of the heap. As part of this computation, it also calculates reachability information and records the effect of each method's execution on the roles of the objects that it manipulates.
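As a rough illustration of the second component, the following Python sketch replays a simplified trace to reconstruct heap references together with their inverse references. The event tuples (`"new"`, `"write"`), the `ReplayedHeap` class, and all method names are assumptions made for this example; this section does not specify the trace format that the instrumented program actually emits.

```python
from collections import defaultdict


class ReplayedHeap:
    """Heap reconstructed from a trace (hypothetical structure, for illustration)."""

    def __init__(self):
        self.fields = defaultdict(dict)   # source id -> {field name: target id}
        self.inverse = defaultdict(set)   # target id -> {(source id, field name)}

    def write(self, source, field, target):
        """Replay a pointer write; a target of None models writing null."""
        old = self.fields[source].get(field)
        if old is not None:
            # The overwritten reference no longer exists, so drop its inverse.
            self.inverse[old].discard((source, field))
        if target is None:
            self.fields[source].pop(field, None)
        else:
            self.fields[source][field] = target
            self.inverse[target].add((source, field))


def replay(trace):
    """Replay (assumed) trace events in order and return the reconstructed heap."""
    heap = ReplayedHeap()
    for event in trace:
        if event[0] == "new":
            heap.fields.setdefault(event[1], {})
        elif event[0] == "write":
            _, source, field, target = event
            heap.write(source, field, target)
    return heap


heap = replay([("new", 1), ("new", 2),
               ("write", 1, "next", 2), ("write", 2, "prev", 1)])
```

Keeping the inverse map in step with every write is what lets a later query such as "which objects and fields refer to object 2" be answered without scanning the whole heap.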

    3.1 Predicate Evaluation

    The dynamic analysis uses the information it extracts from the trace to apply the role separation criteria as follows:

    . Referenced-By: In addition to reconstructing the heap, the analysis also maintains a set of inverse references. There is one inverse reference for each reference in the original heap. For each reference to a target object, the inverse reference enables the dynamic analysis to quickly find the source of the reference and the field containing the reference. To compute the referenced-by predicates for a given object, the analysis examines the inverse references for that object.

    . Reference-To: The reconstructed heap contains all of the references from the original program, enabling the analysis to quickly compute all of the reference-to predicates for a given object by examining its list of references.

    . Identity: To compute the identity predicates for a given object, the analysis traces all paths of length two from the object to find paths that lead back to the object. These predicates are designed to identify pairs of fields with an inverse relation; these are commonly used to provide the ability to traverse a data structure both forward and backward.

    . Reachability: There are two key issues in computing the reachability information: using an efficient incremental reachability algorithm and choosing the correct set of variables to include in the role separation criteria. Whenever the program changes a reference, the incremental reachability algorithm finds the object whose reachability properties may have changed, and then incrementally propagates the reachability changes through the reconstructed heap. We discuss the reachability algorithm in greater detail in Section 3.4.

    Programs often use temporary variables to traverse or manipulate a data structure. Using temporary variables to separate objects into different roles would have the effect of separating objects into different roles with meaningless distinctions. Such roles would likely make navigating the generated role abstractions more difficult. To avoid undesirable separations caused by temporary variables, we developed two rules to identify variables that are the roots of data structures. We believe that the conceptually important references to data structures are likely to be older. Furthermore, the entire data structure is likely to be reachable from these

    DEMSKY AND RINARD: AUTOMATIC EXTRACTION OF HEAP REFERENCE PROPERTIES IN OBJECT-ORIENTED PROGRAMS 311

    Fig. 6. Enhanced method interface for SocketInputStream initializer.

    7. Available at http://flex-compiler.csail.mit.edu/.


    conceptually important references. We designed the following two rules to eliminate references which are not likely to be conceptually important: If an object o is reachable from variables x and y that point to objects ox and oy, respectively, and ox is reachable from y but oy is not reachable from x, then we exclude x from the role separation criteria. Alternatively, if ox is reachable from y, oy is reachable from x, and the reference y was created before the reference x, we exclude x from the criteria.

    These rules keep temporary references used for traversing heap structures from becoming part of the role definitions, but allow long-term references to the roots of data structures to be incorporated into role definitions. These rules also have the property that if an object is included in two disjoint data structures with different roots, then the object's role will reflect this double inclusion.

    In theory, these rules can fail to extract the conceptual root reference of a data structure for cyclic data structures if some extraneous reference to the data structure was created before the root reference, or if some other data structure references the same object that the conceptual root variable references. Furthermore, these rules can lead to extraneous roots if a temporary reference exists to the root object. In practice, we believe that these rules will rarely miss including a reference from a root variable and will not include many extraneous root variables.

    . Method Invocation History: Whenever an object is passed as a parameter to a method, the analysis records the invocation as part of the object's method invocation history. This record is then used to evaluate method invocation history predicates when assigning future roles to the object.

    . Array Roles: We treat arrays as objects with a special [] field, which points to the elements of the array. Additionally, we generalize the treatment of reference-to relations to allow roles to specify the classes and the corresponding number (up to some bound) of the array's elements.
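The two rules for excluding temporary variables from the reachability criteria can be sketched as follows. This is a minimal Python illustration under stated assumptions: the heap is a plain dictionary from object to successor list, reachability is recomputed from scratch rather than incrementally, and the names `reachable_from`, `root_variables`, and `created_at` are hypothetical.

```python
def reachable_from(heap, start):
    """Objects reachable from start (inclusive); heap maps object -> successors."""
    seen, stack = set(), [start]
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(heap.get(obj, ()))
    return seen


def root_variables(heap, variables, created_at, obj):
    """Variables kept in the reachability criteria for obj.

    variables maps a variable name to the object it points to;
    created_at maps a variable name to the time its reference was created.
    """
    reach = {var: reachable_from(heap, target) for var, target in variables.items()}
    candidates = [var for var in variables if obj in reach[var]]
    kept = []
    for x in candidates:
        excluded = False
        for y in candidates:
            if x == y:
                continue
            ox_from_y = variables[x] in reach[y]
            oy_from_x = variables[y] in reach[x]
            # Rule 1: y reaches x's object but x does not reach y's object.
            if ox_from_y and not oy_from_x:
                excluded = True
            # Rule 2: mutually reachable, and y's reference is older.
            elif ox_from_y and oy_from_x and created_at[y] < created_at[x]:
                excluded = True
        if not excluded:
            kept.append(x)
    return kept


# A list a -> b -> c with a long-lived head and a younger traversal cursor.
heap = {"a": ["b"], "b": ["c"], "c": []}
variables = {"head": "a", "cursor": "b"}
created_at = {"head": 0, "cursor": 5}
roots_for_c = root_variables(heap, variables, created_at, "c")
```

In this example the first rule drops `cursor`, because `head` reaches the object that `cursor` points to but not the other way around, so only `head` contributes to the role of `c`.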

    By default, the analyzer evaluates these predicates on every object whose role may have changed since the last method entry or exit point. Whenever an object is observed to transition from one role to another, the role change is recorded along with the method that performed the change. This role transition information is used to construct the role transition diagram and is also presented in the enhanced method interfaces. Furthermore, the roles of the objects that reference or are referenced by the object are recorded whenever a new reference is created or an object changes roles. This referencing information between roles is used to construct the role relationship diagram. When a method performs a write, changes an object's role, or obtains a reference to an object, this information is recorded for use in the enhanced method interface.

    We allow the developer to coarsen the granularity of role evaluation by declaring methods atomic, in which case the analysis attributes all role transitions that occur inside the method to the method itself. When a method is declared atomic, the analysis does not compute the roles of objects for methods that the atomic method (transitively) invokes. This is implemented by not checking for role transitions until the atomic method returns. This mechanism hides temporary or irrelevant role transitions that occur inside the method. This feature is most useful for simplifying role transition diagrams. In particular, many programs have a complicated process for initializing objects. Once we use the role transition diagram to understand this process, we often find it useful to abstract the entire initialization process as atomically generating a fully initialized object.
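The effect of declaring a method atomic can be illustrated with a small Python sketch. It assumes, purely for illustration, that the caller supplies a snapshot of object roles at every method entry and exit; the `TransitionRecorder` class and its methods are hypothetical and stand in for the predicate evaluation described above.

```python
class TransitionRecorder:
    """Records role transitions at method boundaries (illustrative sketch)."""

    def __init__(self, atomic_methods):
        self.atomic = set(atomic_methods)
        self.stack = []            # current call stack of method names
        self.atomic_depth = None   # stack depth of the outermost atomic method
        self.last_roles = {}       # object id -> last observed role
        self.transitions = []      # (object id, old role, new role, method)

    def _check(self, roles, method):
        for obj, role in roles.items():
            old = self.last_roles.get(obj)
            if old is not None and old != role:
                self.transitions.append((obj, old, role, method))
            self.last_roles[obj] = role

    def enter(self, method, roles):
        if self.atomic_depth is None:
            # Changes since the last boundary belong to the calling method.
            caller = self.stack[-1] if self.stack else None
            self._check(roles, caller)
            if method in self.atomic:
                self.atomic_depth = len(self.stack)
        self.stack.append(method)

    def exit(self, roles):
        method = self.stack.pop()
        if self.atomic_depth is None:
            self._check(roles, method)
        elif len(self.stack) == self.atomic_depth:
            # The atomic method itself returns: check once, attribute to it.
            self.atomic_depth = None
            self._check(roles, method)


def run(recorder):
    """Replay the same hypothetical execution: init calls setup."""
    recorder.enter("init", {1: "Raw"})
    recorder.enter("setup", {1: "Raw"})
    recorder.exit({1: "Half"})    # setup returns
    recorder.exit({1: "Ready"})   # init returns
    return recorder.transitions


fine_grained = run(TransitionRecorder(atomic_methods=[]))
coarse = run(TransitionRecorder(atomic_methods=["init"]))
```

With no atomic methods the intermediate `Half` role shows up as its own transition attributed to `setup`; with `init` declared atomic the recorder reports a single transition from `Raw` to `Ready` attributed to `init`.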

    3.2 Multiple-Object Data Structures

    A single data structure often contains many component objects. Java HashMap objects, for example, use an array of linked lists to implement a single map. To enable the developer to view such composite data structures as a single entity, our dynamic analysis supports operations that merge multiple objects into a single entity. Specifically, the dynamic analysis can optionally recognize any object playing a given role (such roles are called part roles) as conceptually part of the object that refers to it. The user interface will then merge all of the role information from the part role into the role of the object that refers to it.

    Depending on the task at hand, different levels of abstraction may be useful to the developer. For example, the developer may be attempting to understand the use of a particular class and desire to see role information for just that class. Or the developer may be interested in understanding how an object of a particular class interacts with objects it references, and may like to see role information for the combination of multiple objects. On a per class basis, the developer can specify whether to merge one object's role into another object's role. Furthermore, the developer can specify a default policy for the classes for which the developer does not explicitly specify a policy. This default policy allows the developer to only specify the policies for the classes in which the developer is currently interested. This approach is especially important for large programs, given the potential developer overhead associated with explicitly specifying a policy for every class.

    The analysis provides four different policies: never merge, always merge, merge only if one heap reference to the object ever exists, and merge only if at most one heap reference at a time exists to the object. The analysis implements these policies by examining the execution trace that the instrumented application generates. Note that the merge policies are based on properties that depend, in general, on the entire trace: any partial examination of the trace will, in general, be unable to determine that the unexamined part of the trace does not create multiple or multiple simultaneous references to a given object.

    The analysis therefore uses a two-pass approach to merge multiple objects into a single entity. The first pass applies the merge policies to determine which objects to merge into the objects that refer to them; the second pass uses the list of merged objects from the first pass to appropriately assign the roles for merged objects.
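A minimal sketch of the first pass, assuming a simplified trace of `("add_ref", target)` and `("remove_ref", target)` events for heap references and hypothetical policy names (`"never"`, `"always"`, `"single_ever"`, `"single_at_a_time"`), might look as follows. It only illustrates why the whole trace must be scanned before any merge decision is made.

```python
from collections import defaultdict


def objects_to_merge(trace, class_of, policy_for_class, default_policy="never"):
    """First pass: scan the entire trace, then apply the per-class merge policy."""
    total = defaultdict(int)   # heap references ever created to each object
    live = defaultdict(int)    # heap references that currently exist
    peak = defaultdict(int)    # most heap references that existed at one time
    for kind, target in trace:
        if kind == "add_ref":
            total[target] += 1
            live[target] += 1
            peak[target] = max(peak[target], live[target])
        elif kind == "remove_ref":
            live[target] -= 1

    merged = set()
    for obj, cls in class_of.items():
        policy = policy_for_class.get(cls, default_policy)
        if policy == "always":
            merged.add(obj)
        elif policy == "single_ever" and total[obj] == 1:
            merged.add(obj)
        elif policy == "single_at_a_time" and peak[obj] <= 1:
            merged.add(obj)
    return merged


trace = [("add_ref", "e1"), ("remove_ref", "e1"), ("add_ref", "e1"),
         ("add_ref", "e2"), ("add_ref", "e2")]
class_of = {"e1": "Entry", "e2": "Entry"}
merged = objects_to_merge(trace, class_of, {"Entry": "single_at_a_time"})
```

Here `e1` is referenced twice but never by two references at once, so it qualifies under the at-most-one-at-a-time policy, while `e2` does not. A second pass would then use the resulting set when assigning roles.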

    3.3 Method Effect Inference

    For each method execution, the dynamic analysis records the reads, writes, and role transitions that the execution

    312 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 35, NO. 3, MAY/JUNE 2009


    performs. There is a method effect summary for each combination of a method and an assignment of roles to the parameters of the method. The analysis combines the results of all invocations of a method with the same assignment of roles to the parameters. Each method effect summary uses regular expressions to identify paths to the accessed or affected objects. These paths are identified relative to the method parameters or global variables and specify edges in the heap that existed when the method was invoked. Method effect inference therefore has two steps: detecting concrete paths with respect to the heap at method invocation and summarizing these paths into regular expressions.

    To detect concrete paths, we keep a path table for each method invocation. This table contains the concrete path, in terms of the heap that existed when the method was invoked, to all objects that the execution of the method may affect. The path table includes not only objects that the method may read or write, but also any objects that the method may cause to transition to a different role. Since reachability changes may change the role assignment of an object, the table must include a concrete path to any objects whose reachability information may change. At method invocation, our analysis records the objects to which the parameters and the global variables point. Whenever the execution retrieves a reference to an object or changes an object's reachability information, the analysis records a path to that object in the path table (see footnote 8). If the execution creates a new object, the tool adds a special NEW token to the path table; this token represents the path to that object.

    The tool obtains the regular expressions in the method effect summary by applying a set of rewrite rules to the extracted concrete paths. These concrete paths consist of a starting point (a parameter of the method or a global variable) and a list of fields that give the concrete path to the object in terms of the heap at invocation time. Fig. 7 presents the current set of rewrite rules. Given a concrete path $f_1.f_2 \ldots f_n$, we apply the rewrite rules to the tuple $\langle \epsilon; f_1.f_2 \ldots f_n \rangle$ to obtain a final tuple $\langle Q; \epsilon \rangle$, where $Q$ is the regular expression that contains the concrete path and $\epsilon$ represents an empty concrete path or regular expression. We present the rewrite rules in the order in which they are applied. The rules refer to two properties of a field $f$: the class in which $f$ is declared as an instance variable, and the declared type of $f$. In addition to these rules, our tool uses a set of rules to determine whether two regular expressions can be merged. If the regular expressions of two of the same effects can be merged, the effects are merged.

    Rules 1 and 2 simplify intermediate expressions generated during the rewrite process. Rules 3 and 4 generalize concrete paths involving similar fields such as paths through a binary tree. Rules 5 and 6 generalize repeated sequences in concrete paths. The goal is to capture paths generated in loops or recursive methods and ensure that path expressions are not overly specialized to any particular execution.

    For example, consider the concrete path $f.g.f.g.f.h$, where field $f$ is declared in type $G$ and references an object of type $F$, and fields $g$ and $h$ are declared in type $F$ and reference an object of type $G$. The initial state for the rewrite algorithm is $\langle \epsilon; f.g.f.g.f.h \rangle$. The algorithm begins by applying rule 7 to this state four times to generate the state $\langle f.g.f.g; f.h \rangle$. The algorithm next applies rule 5 to generate the state $\langle (f.g)^*; f.h \rangle$. The algorithm then applies rule 7 two more times to generate the state $\langle (f.g)^*.f.h; \epsilon \rangle$. Finally, the algorithm applies rule 6 to generate the final state $\langle (f.(g \mid h))^*; \epsilon \rangle$.

    When a method performs a read operation on an object or causes the role an object plays to change, the analysis records the change as a read effect or a role transition effect. The analysis also records an expression that identifies the objects involved in the operation in terms of a path through the heap. The expression gives the starting point of the path (either a parameter of the method or a global variable) and a regular expression that summarizes the sequence of fields in the path. When a method performs a write, the analysis records a write effect and similar path information as for a read effect, specifically, the field the method wrote and the path expressions for both the object containing the field and the object reference written to the field.

    Finally, the inference algorithm must also recognize object creations and writes of null references to object fields. We use the NEW token to denote objects created during the method's invocation. We use the NULL token to denote writing a null reference to an object's field.
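To give a feel for how repeated field sequences in a concrete path can be generalized, the Python sketch below collapses adjacent repetitions of a block of fields into a starred term. It is not the rule set of Fig. 7, which is not reproduced in this text; the function name `collapse_repeats` and the greedy shortest-block strategy are assumptions chosen for brevity, and the sketch performs no merging of alternative fields.

```python
def collapse_repeats(fields):
    """Collapse adjacent repeated blocks of fields into (block)* terms."""
    out, i, n = [], 0, len(fields)
    while i < n:
        collapsed = False
        # Try the shortest block first; a block must repeat at least twice.
        for size in range(1, (n - i) // 2 + 1):
            block = fields[i:i + size]
            count = 1
            while fields[i + count * size:i + (count + 1) * size] == block:
                count += 1
            if count >= 2:
                out.append("(" + ".".join(block) + ")*")
                i += count * size
                collapsed = True
                break
        if not collapsed:
            out.append(fields[i])
            i += 1
    return ".".join(out)


summary = collapse_repeats(["f", "g", "f", "g", "f", "h"])
```

For the path `f.g.f.g.f.h` this yields `(f.g)*.f.h`; further generalizing the trailing `f.h` into the starred term would require an additional merging step that this sketch omits.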


    8. The tool calculates concrete paths by tracking how the method obtains a reference to the objects in the heap. Whenever the method first obtains a reference to an object, the tool records, in a table, the field and the object that the method dereferenced. If a change to the heap affects an object's reachability, the tool records the path information for that object when the tool updates the reachability information for that object. The tool uses this table to efficiently generate paths from the parameters of the method and the global variables to the objects. These paths refer to the version of the heap that existed when the method was invoked.

    Fig. 7. Rewrite rules for paths.


    3.4 Incremental Reachability Algorithm

    Our results indicate that most methods make relatively small changes to the heap. An incremental approach to computing reachability should therefore be more efficient than completely recomputing the reachability information whenever it is needed. Our tool records a list of changes performed to the heap since the last reachability computation.

    When computing reachability, the tool starts by processing the list of removed references in the heap. For each removed reference, it marks the destination object as possibly unreachable from the set of local and global variables that both the destination and origin objects were previously reachable from. This set contains the root variables from which the destination of the removed reference may no longer be reachable. After processing the list of removed references, the tool propagates the possibly unreachable sets of roots through the heap in the direction of the references in the heap. During this propagation step, the tool checks that the propagation step never marks an object directly referenced by a local variable or global variable as possibly unreachable from the same local or global variable.

    The tool next adds any new references from local or global variables to the reachability sets of the objects that they reference. Finally, a workset algorithm propagates reachability information through the heap. The workset initially includes any objects that reference the previous destinations of any removed references and any objects that are the sources of any new references. The workset algorithm then propagates the reachability information through the heap.

    Formally, we represent the heap as a set of objects $O$, a set of heap reference edges $E \subseteq O \times F \times O$, where $F$ is the set of fields and array indices, and a set of variable references $L \subseteq V \times O$, where $V$ is the set of local and global variables. We represent the reachability information using the reachability set $R \subseteq O \times V$ that maps an object to the set of variables from which the object can be reached. We define $R(o) = \{ v \mid \langle o, v \rangle \in R \}$.

    The incremental analysis takes as input a set of removed heap references $E_R \subseteq O \times F \times O$; a set of removed variable references $L_R \subseteq V \times O$; a set of newly created heap references $E_{New} \subseteq O \times F \times O$; a set of newly created variable references $L_{New} \subseteq V \times O$; a tuple containing the set of objects, the current heap references, and the current variable references $\langle O, E, L \rangle$; and the reachability set $R$.

    Fig. 8 presents the incremental reachability algorithm.

    The incremental reachability algorithm internally uses the workset $S \subseteq O$ to store the objects whose reachability information may need to be propagated. The algorithm begins by initializing this set to the empty set. The algorithm next calls the ProcessRemovedReferences procedure. The ProcessRemovedReferences procedure internally uses a workset $K \subseteq O \times V$ of tuples, each consisting of an object and a label, to maintain a list of variables that an object may no longer be reachable from. We define $K(o) = \{ v \mid \langle o, v \rangle \in K \}$. The procedure initializes the set $K$ in lines 2-5 using the set of heap references and variable references that have been removed since the last reachability computation. The algorithm next loops through each object $o$ that serves as a key in $K$. The algorithm then adds any objects that reference $o$ to the set $S$ of objects whose reachability information may need to be propagated. The algorithm then looks up in $K$ the variables from which the object $o$ may no longer be reachable. The algorithm next loops through all of the objects that are referenced by the object $o$ and propagates this list of variables from which the object may possibly be unreachable.

    The incremental reachability algorithm next calls the ProcessNewReferences procedure. This procedure propagates reachability information for both new references and any references that were mistakenly removed in the previous procedure. Lines 1-4 process any newly created references from variables. These lines add the destination of these references to the workset $S$ and the new reference to the reachability information set $R$. Lines 5-6 process the newly created heap references in a similar manner. Finally, the algorithm loops through the objects in set $S$ to propagate their reachability information to any objects they reference, and then adds these referenced objects to $S$.

    The overhead of the incremental algorithm is determined by the number of heap references through which the algorithm must propagate the possibly unreachable set of variables and the number of heap references through which the algorithm must propagate the reachability information. In the worst case, the incremental algorithm does not perform better than a standard reachability algorithm: it can take time proportional to the number of references in the heap times the number of global and local variables that


    Fig. 8. Incremental reachability algorithm.


    reference the heap. In practice, we expect that many methods will make changes that require propagating reachability information through only a small part of the program's heap, and therefore, that the algorithm will perform much better than the worst case bound.
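The two phases described in this section can be sketched in Python as follows. This is an interpretation of the prose description, not a transcription of Fig. 8 (which is not reproduced in this text): the function name, the argument layout, and the use of per-pair work items for the possibly-unreachable sets are all assumptions. The `edges` and `var_refs` arguments are assumed to describe the heap after the batch of changes has been applied.

```python
from collections import defaultdict


def update_reachability(edges, var_refs, reach, removed_edges=(),
                        removed_vars=(), new_edges=(), new_vars=()):
    """Return updated reachability sets (object -> set of variables).

    edges: set of (source, field, target) heap references after the changes.
    var_refs: set of (variable, object) references after the changes.
    reach: reachability sets computed before the changes.
    """
    reach = defaultdict(set, {obj: set(vs) for obj, vs in reach.items()})
    succs, preds, direct = defaultdict(set), defaultdict(set), defaultdict(set)
    for src, _field, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)
    for var, obj in var_refs:
        direct[obj].add(var)

    workset = set()  # objects whose reachability may need to be propagated

    # Phase 1: mark objects as possibly unreachable and propagate forward.
    pending = []     # (object, variable) pairs
    for src, _field, dst in removed_edges:
        pending.extend((dst, var) for var in reach[src] & reach[dst])
    pending.extend((obj, var) for var, obj in removed_vars)
    while pending:
        obj, var = pending.pop()
        # Never remove a variable that still points directly at the object.
        if var not in reach[obj] or var in direct[obj]:
            continue
        reach[obj].discard(var)
        workset |= preds[obj]   # predecessors may restore the variable later
        pending.extend((nxt, var) for nxt in succs[obj])

    # Phase 2: add new references, then propagate until nothing changes.
    for var, obj in new_vars:
        reach[obj].add(var)
        workset.add(obj)
    for src, _field, _dst in new_edges:
        workset.add(src)
    while workset:
        obj = workset.pop()
        for nxt in succs[obj]:
            missing = reach[obj] - reach[nxt]
            if missing:
                reach[nxt] |= missing
                workset.add(nxt)
    return reach


# Diamond a -> b -> d and a -> c -> d, rooted at head; the edge b -> d is removed.
before = {"a": {"head"}, "b": {"head"}, "c": {"head"}, "d": {"head"}}
after_edges = {("a", "l", "b"), ("a", "r", "c"), ("c", "f", "d")}
result = update_reachability(after_edges, {("head", "a")}, before,
                             removed_edges={("b", "f", "d")})
```

In the diamond example the first phase conservatively removes `head` from `d`, and the second phase restores it through the surviving path via `c`, which mirrors the description of repairing references that were mistakenly removed.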

    3.5 Multiple Executions

    Our tool supports the analysis of traces from multiple executions. We have architected our multiple trace support as follows: the traces are processed individually by the analysis and then the Web frontend merges the analysis results for the individual traces into a single merged result. The benefit of this approach is that it enables our implementation to parallelize the analysis of the traces. The basic approach is straightforward. Since the role transition diagrams capture the range of possible behaviors of an object, the combined role transition diagram is simply the union of the role transition diagrams for the individual traces. Similarly, the combined role relationship diagram is simply the union of the role relationship diagrams for the individual traces. While the implemented Web interface does not currently process the enhanced method interfaces, it is conceptually straightforward to combine enhanced method interfaces from different traces. The algorithm would simply take the union of the enhanced method interfaces from the individual traces. If the same enhanced method interface appears in multiple traces, the algorithm would take the union of the read and write effects from different instances of the same enhanced method interface.
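The union-based merging described here is simple enough to sketch directly. In the Python illustration below, a role transition diagram is assumed to be a set of `(from_role, method, to_role)` edges and an enhanced method interface table is assumed to be a dictionary keyed by the method and its parameter roles; both representations, and the role names in the example, are hypothetical.

```python
def merge_transition_diagrams(diagrams):
    """Combined diagram: the union of the edges observed in each trace."""
    merged = set()
    for diagram in diagrams:
        merged |= diagram
    return merged


def merge_method_interfaces(interfaces):
    """Union the read and write effects of interfaces that share a key."""
    merged = {}
    for per_trace in interfaces:
        for key, effects in per_trace.items():
            slot = merged.setdefault(key, {"reads": set(), "writes": set()})
            slot["reads"] |= effects["reads"]
            slot["writes"] |= effects["writes"]
    return merged


run1 = {("Initial", "<init>", "Building"), ("Building", "freeze", "Frozen")}
run2 = {("Initial", "<init>", "Building")}
combined = merge_transition_diagrams([run1, run2])
```

Because each trace is reduced to a set before merging, the per-trace analyses are independent of one another, which is what makes processing the traces in parallel straightforward.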

    3.6 Role Subspaces

    Our tool allows the developer to define multiple role subspaces and to modify the role separation criteria for each subspace as follows:

    . Fields: The developer can specify fields to ignore for the purpose of assigning roles. The analysis will show these fields in the role relationship diagram, but the references in these fields will not affect the roles assigned to the objects.

    . Methods: The developer can specify which methods and which parameters to include in the role separation criteria.

    . Reachability: The developer can specify variables to include or to exclude from the reachability-based role separation criteria.

    . Classes: The developer can collapse all objects of a given class into a single role.

    In practice, we have found role subspaces both useful and usable: useful because they enabled us to isolate the important aspects of relevant parts of the system while eliminating irrelevant and distracting detail in other parts, and usable because we were usually able to obtain a satisfactory role subspace with just a small number of changes to the default criteria.

    4 USER INTERFACE

    The user interface presents four kinds of web pages (see footnote 9): class pages, role pages, method pages, and the role relationship page. Each class page presents the role transition diagram for the class. From the class page, the developer can click on the nodes and edges in the role transition diagram to see the corresponding role and method pages for the selected node or edge. Each role page presents a role definition, displaying related roles and classes and enabling the developer to select these related roles and classes to bring up the appropriate role or class page. Each method page shows the developer which methods called the given method and allows the developer to configure method-specific abstraction policies. The role relationship page presents the role relationship diagram. From this diagram, the developer can select a role node to see the appropriate role definition page.

    The user interface allows the developer to create and manipulate multiple role subspaces. The developer can create a new role subspace by selecting the set of predicates to capture the desired role separation criteria. The developer can then define a view, which allows the developer to see the role transition diagrams, the role relationship diagrams, and the enhanced method interfaces generated using one or more role subspaces. Views with a single subspace use the role separation criteria from that subspace. Views with multiple subspaces use a cross-product operator to combine the roles from the different subspaces, with the set of roles appearing in diagrams isomorphic (see footnote 10) to those obtained by taking the union of the role separation criteria from all of the subspaces. Within a view, the developer can identify additional role subspaces to be used for labeling purposes. These role subspaces do not affect the separation of objects into roles, but rather label each role in the view with the roles that objects playing those roles have in these additional labeling subspaces.
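The cross-product combination of subspaces can be pictured with a short Python sketch. Each subspace is modeled, as an assumption for this example, by a function that maps an object description to its role in that subspace; the role of an object in a multi-subspace view is then the tuple of its per-subspace roles. The object descriptions and function names are hypothetical.

```python
def view_roles(objects, subspaces):
    """Role of each object in a view: the tuple of its roles in each subspace."""
    return {name: tuple(assign(obj) for assign in subspaces)
            for name, obj in objects.items()}


def by_referrer(obj):
    """Subspace that separates objects by the fields that reference them."""
    return "in:" + ",".join(sorted(obj["referenced_by"]))


def by_root(obj):
    """Subspace that separates objects by the variables that reach them."""
    return "root:" + ",".join(sorted(obj["reachable_from"]))


objects = {
    "n1": {"referenced_by": {"List.head"}, "reachable_from": {"work"}},
    "n2": {"referenced_by": {"Node.next"}, "reachable_from": {"work"}},
    "n3": {"referenced_by": {"Node.next"}, "reachable_from": {"done"}},
}
roles = view_roles(objects, [by_referrer, by_root])
distinct_roles = set(roles.values())
```

Only combinations that some object actually exhibits appear as roles, so the view contains three roles here rather than the full product of the two subspaces. Two objects receive the same tuple exactly when they agree in every subspace, which is the sense in which the view matches a single subspace built from the union of the criteria.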

    5 EXPLORATION STRATEGY

    As we used the tool, we developed the following strategy for exploring the behavior of a new program. We believe this strategy is useful for structuring the process of using the tool and that most developers will use some variant of this strategy.

    When we started using the tool on a new program, we first recompiled the program with our instrumentation package, and then ran the program to obtain an execution trace. We then used our graphical tool to browse the role transition diagrams for each of the classes, looking for interesting initialization sequences, splits in the role transition diagram indicating different uses for objects of the class, and transition sequences indicating potential changes in the purpose of objects of the class in the computation.

    During this activity, we were interested in obtaining a broad overview of the actions of the program (see footnote 11). We therefore often found opportunities to appropriately simplify the role transition diagrams, typically by creating a role subspace to hide irrelevant detail, by declaring initializing methods atomic, or by utilizing the multiple object abstraction feature. Occasionally, we found opportunities to include aspects of the method invocation history into the role separation criteria. We found that our default policy for merging multiple object data structures into a single data structure for role presentation purposes worked well during this phase of the exploration process.

    9. We chose a Web interface because it provides a convenient, reliable, cross platform mechanism for communicating the results of the analysis. The web interface allows us to easily support viewing multiple pages at once, linking nodes in graphs to descriptions of the underlying roles, and would enable us to easily link analysis results to online resources.

    10. There exists a one-to-one and onto mapping between the roles appearing in a view with multiple subspaces and the roles appearing in a subspace that has the union of the role separation criteria from all the subspaces as its role separation criteria.

    Once we had created role subspaces revealing roles at an appropriate granularity, we then browsed the enhanced method interfaces to discover important constraints on the roles of the objects passed as parameters to the method. This information enabled us to better understand the correlation between the actions of the method and the role transitions, helping us to isolate the regions of the program that performed important modifications, such as insertions or removals from collections. It also helped us understand the (otherwise implicit) assumptions that each method made about the states of its parameters. We found this information useful in understanding the program; we believe that maintainers will also find it useful.

    We next observed the role relationship diagram. This diagram helped us to better understand the relationships between classes that work together to implement a given piece of functionality. In general, we found that the complete role relationship diagram presented too much information for us to use it effectively. We therefore adopted a strategy in which we identified a starting class of interest, then viewed the region surrounding the roles of that class. We found that this strategy enabled us to quickly and effectively find the information we needed in the role relationship diagram.

    Finally, we sometimes decided to explore several roles in more detail. We often returned to the role transition diagram and created a customized role subspace to expose more detail for the current class but less detail for less relevant classes. In effect, this activity enabled us to easily adapt the system to view the program from a more specialized perspective. This multiple-level approach to program understanding is well known; developers often use bottom-up [23] and top-down [2] approaches for understanding software. Given our experience using this feature of our role analysis tool, we believe that this ability will prove valuable for any program understanding tool.

    6 EXPERIENCE

    We next discuss our experience using our role analysis tool to explore the behavior of several Java programs. We report our experience for several programs: Jess, an expert system shell in the SPECjvm98 benchmark suite; Direct-To, a Java version of an air-traffic control tool; Tagger, a text formatting program; Treeadd, a tree manipulation benchmark in the JOlden benchmark suite (see footnote 12); and Em3d, a scientific computation in the JOlden benchmark suite.

    6.1 Jess

    Jess first builds a network of nodes, then performs acomputation over this network. While the network containsmany different kinds of nodes, all of the nodes exhibit asimilar construction and use pattern. To generate a trace to

    analyze, we simply selected one of the Jess exampleproblems included with the Jess distribution and ran theinstrumented version of Jess on that problem to produce thetrace. We analyzed the trace for Jess with our tool, and theninvestigated the role transition diagrams for the classes.From the quick overview of role transition diagrams, itappeared to us that the Node structures used by Jess wouldbe the most interesting to a developer.

    Consider, for example, objects of the Node1TELN class. Fig. 9 presents the role transition diagram for objects of this class. An examination of this diagram and the linked role definitions shows that during the construction of the network, the program represents the edges between nodes using a resizable vector of references to Successor objects, each of which is a wrapper around a node object. The succ field refers to this vector. When the network is complete, the program constructs a less flexible but more efficient representation in which each node contains a fixed-size array of references to other nodes; the _succ field refers to this array. This change occurs when the program invokes the freeze method on the node. For this benchmark, we used two different test cases to generate two different execution traces. We discovered in the second execution trace that Jess can create node objects, but not freeze them. We used the multiple execution functionality of our tool to combine the traces from the two executions to generate Fig. 9. Because the names of the fields in the program were informative, the automatically generated role names were very helpful. Only minimal renaming was done for the purpose of aesthetics.
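The construction-then-freeze pattern described above can be sketched in Java as follows. This is a minimal illustration, not the Jess source: only the field names succ and _succ and the method name freeze come from the text. The class names NetNode and FreezeDemo, the callNode stand-in, and the choice to drop the vector after freezing are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Wrapper around a node, as described in the text.
class Successor {
    final NetNode node;
    Successor(NetNode node) { this.node = node; }
}

// Hypothetical stand-in for a Jess network node.
class NetNode {
    List<Successor> succ = new ArrayList<>(); // resizable, used during construction
    NetNode[] _succ;                          // fixed-size, set by freeze()

    void addSuccessor(NetNode target) {
        if (succ == null) {
            throw new IllegalStateException("node is already frozen");
        }
        succ.add(new Successor(target));
    }

    // Converts the flexible edge representation into the compact one.
    void freeze() {
        if (succ == null) {
            return; // already frozen
        }
        _succ = new NetNode[succ.size()];
        for (int i = 0; i < _succ.length; i++) {
            _succ[i] = succ.get(i).node;
        }
        succ = null; // assumption: the vector is discarded after freezing
    }

    boolean isFrozen() { return _succ != null; }

    // Stand-in for the main computation; it expects a frozen node.
    int callNode() {
        if (!isFrozen()) {
            throw new IllegalStateException("callNode invoked before freeze");
        }
        return _succ.length;
    }
}

class FreezeDemo {
    public static void main(String[] args) {
        NetNode a = new NetNode();
        NetNode b = new NetNode();
        a.addSuccessor(b);
        a.freeze();
        System.out.println("successors after freeze: " + a.callNode());
    }
}
```

In role terms, a node with a non-null succ field and a node with a non-null _succ field play different roles, and freeze is the method responsible for the transition between them.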

    The generated enhanced method interfaces provide information about the assumptions that several key methods make about the roles of their parameters. Specifically, they show that the program invokes the CallNode method (this method implements the primary computation on the network) on a node only after the freeze method has converted the representation of the edges associated with the node to the more efficient form. This invocation sequence constraint could also be determined using specification mining techniques [1].

    The role definitions also provide information about the network's structure, specifically that all of the nodes in the network have either one or two incoming edges. Each fully constructed object of the Node1TELN, Node1TECT, Node1TEQ, NodeTerm, or Node1TMF class has exactly one Successor object that refers to it, indicating that these kinds of nodes all have exactly one incoming edge. Each fully constructed object of the Node2 class, on the other hand, has exactly two references from Successor objects, indicating that Node2 nodes have exactly two incoming edges.

    316 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 35, NO. 3, MAY/JUNE 2009

    11. We expect that many developers will be interested in understanding or debugging a particular aspect of the program. We believe that our tool will be useful for this purpose. In fact, we believe that much of the information we obtained when obtaining a broad overview would be useful for developers debugging or modifying very specific pieces of the applications.

    12. Available at ftp://ftp.cs.umass.edu/pub/osl/benchmarks/jolden.tar.gz.

    6.2 Direct-To

    Direct-To is a prototype Java implementation of a component of the Center-Tracon Automation System (CTAS) [13]. The tool helps air traffic controllers streamline flight paths by eliminating intermediate points; the key constraint is that these changes should not cause new conflicts, which occur when aircraft pass too close to each other. We ran Direct-To on a short input file consisting of a few aircraft to generate a trace. We looked at the role transition diagrams for the different classes and identified the Flight class as a central class in the computation.

    We first discuss our experience with the Flight class, which represents flights in progress. Fig. 10 presents the role transition diagram for the Flight class. Each Flight object contains references to other objects, such as FlightPlan objects and Route objects, that are part of its state. Our analysis recognized these other objects as part of the corresponding Flight object's state, and merged all of these objects into a single multiple object data structure.

    Roles helped us understand the initialization sequence and subsequent usage pattern of Flight objects. An examination of the role transition diagram reveals that an initialized Flight object has been inserted into the flight list; various fields of the object refer to the objects that implement the flight's identifier, type, aircraft type, and flight plan. Once initialized, the flight is ready to participate in the main computation of the program, which repeatedly acquires a radar track for the flight and uses the track and the flight plan to compute a projected trajectory. The initialization sequence is clearly visible in the role transition diagram, which shows a linear sequence of role transitions as the flight object acquires references to its part objects and is inserted into the list of flights. The acquisition and computation of the tracks and trajectories also show up as transitions in this diagram. Because the initialization of Flight class objects is performed by the combined actions of several methods from different classes, discovering the initialization sequence by simply examining the code is not straightforward.

    Roles also enabled us to untangle the different ways in which the program uses objects of the Point4d class. Specifically, the program uses objects of this class to represent aircraft tracks, trajectories, and velocities. This distinction is useful, because the operations that are valid on a Point4d object used in a velocity are different from the operations that are valid on a Point4d object used by a trajectory. For example, multiplying a Point4d object used as a velocity by time is a legal operation while the same operation performed on a Point4d object used as a position in a trajectory is nonsensical. The role transition diagram makes these different uses obvious; each use corresponds to a different region of roles in the diagram. No transitions exist between these different regions, indicating that the program likely uses the corresponding objects for disjoint purposes.
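To make the velocity-versus-position distinction concrete, the following sketch wraps Point4d in two hypothetical types, Velocity and Position, so that only a velocity can be multiplied by time. Direct-To itself does not do this, according to the text; the wrappers, the field layout of Point4d (x, y, z, t), and the method names are all assumptions for illustration.

```java
// Point4d is named in the text; this field layout is an assumption.
final class Point4d {
    final double x, y, z, t;

    Point4d(double x, double y, double z, double t) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.t = t;
    }
}

// Hypothetical wrapper: a Point4d playing the "velocity" role.
final class Velocity {
    final Point4d v;

    Velocity(Point4d v) { this.v = v; }

    // Velocity times elapsed time yields a displacement; this is the
    // operation the text calls legal for velocities.
    Point4d times(double dt) {
        return new Point4d(v.x * dt, v.y * dt, v.z * dt, dt);
    }
}

// Hypothetical wrapper: a Point4d playing the "position in a trajectory" role.
// It deliberately offers no times(double) method.
final class Position {
    final Point4d p;

    Position(Point4d p) { this.p = p; }

    Position advance(Velocity vel, double dt) {
        Point4d d = vel.times(dt);
        return new Position(new Point4d(p.x + d.x, p.y + d.y, p.z + d.z, p.t + dt));
    }
}

class Point4dRolesDemo {
    public static void main(String[] args) {
        Position start = new Position(new Point4d(0, 0, 0, 0));
        Velocity vel = new Velocity(new Point4d(1, 2, 3, 0));
        Position next = start.advance(vel, 2.0);
        System.out.println(next.p.x + " " + next.p.y + " " + next.p.z + " " + next.p.t);
    }
}
```

The role analysis recovers this same separation from the heap alone, without any such wrapper types in the source.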

    6.3 Tagger

    Tagger is a document layout tool written by Daniel Jackson. It processes a stream of text interspersed with tokens that identify when conceptual components such as paragraphs begin and end. Tagger works by first attaching action objects to each token, and then processing the text and tokens in order. Whenever it encounters a token, it executes the attached action. To generate a trace for Tagger, we simply ran the instrumented version on the example file included with Tagger.

    Fig. 9. Role transition diagram for the Node1TELN class.

    It turns out that there are dependencies between the operations of the program and the roles of the actions and tokens. For example, one of the tokens causes the output of the following paragraph to be suppressed. Tagger implements this output suppression with a matched pair of actions: a suppress action and a corresponding unsuppress action. Figs. 11 and 12 give the role transition diagrams for the unsuppress action and the suppress action, respectively. When the suppress action executes, it places an unsuppress action at the end of the paragraph, ensuring that only one paragraph will be suppressed. These actions are reflected in role transitions as follows: When the program binds the suppress action to a token, the action takes a transition because of the reference from the token. When the suppress action executes, it binds the corresponding unsuppress action to the token at the end of the paragraph, causing the unsuppress action to take a transition to a new state. Roles, therefore, enabled us to discover an interesting correlation between the execution of the suppress action and data structure modifications required to undo the action later. This is visible in the role transition diagram for the unsuppress action class: the unsuppress object transitions from the unbound role (STANDARD unsuppress_action StandardEngine$16 w/ generator & this) to the bound role (STANDARD element unsuppress_action StandardEngine$16 w/ generator & this) when the suppress action is performed and transitions back to the unbound role when the unsuppress action is performed. We were also able to observe a role-dependent interface (footnote 13): the method that executes actions always executes actions that are bound to tokens.
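The suppress/unsuppress pairing can be sketched as follows. Tagger's actual classes are not shown in the text, so every name here (Action, Token, Engine, SuppressAction, UnsuppressAction) is hypothetical; the sketch only mirrors the described behavior, in which executing the suppress action binds the unsuppress action to the end-of-paragraph token, and executing the unsuppress action unbinds it again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical action interface; the token is the one the action is bound to.
interface Action {
    void execute(Engine engine, Token token);
}

class Token {
    final List<Action> actions = new ArrayList<>(); // actions bound to this token
}

class Engine {
    boolean suppressed = false;

    // Executes only actions that are bound to a token (the role-dependent
    // interface noted in the text). Iterates over a copy because an action
    // may unbind itself while running.
    void process(Token token) {
        for (Action a : new ArrayList<>(token.actions)) {
            a.execute(this, token);
        }
    }
}

class UnsuppressAction implements Action {
    @Override
    public void execute(Engine engine, Token token) {
        engine.suppressed = false;
        token.actions.remove(this); // transition back to the unbound role
    }
}

class SuppressAction implements Action {
    private final UnsuppressAction partner;
    private final Token endOfParagraph;

    SuppressAction(UnsuppressAction partner, Token endOfParagraph) {
        this.partner = partner;
        this.endOfParagraph = endOfParagraph;
    }

    @Override
    public void execute(Engine engine, Token token) {
        engine.suppressed = true;
        endOfParagraph.actions.add(partner); // partner takes the bound role
    }
}

class SuppressDemo {
    public static void main(String[] args) {
        Engine engine = new Engine();
        Token start = new Token();
        Token end = new Token();
        start.actions.add(new SuppressAction(new UnsuppressAction(), end));
        engine.process(start);
        System.out.println("suppressed after start token: " + engine.suppressed);
        engine.process(end);
        System.out.println("suppressed after end token: " + engine.suppressed);
    }
}
```

The reference from the token's action list is what distinguishes the bound role from the unbound role; no flag on the action object itself is needed.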

    6.4 Treeadd

    Treeadd builds a tree of TreeNode objects; each such object has an integer value field. It then calculates the sum of the values of the nodes. The role analysis tool extracted some interesting properties of the data structure and gave us insight into the behavior of the parts of the program that construct and use the tree. To generate a trace file for Treeadd, we simply ran the Treeadd benchmark.

    13. A role-dependent interface is a method that expects its parameters to have a certain role.

    Fig. 10. Role transition diagram for the Flight class.


    Fig. 14 presents the region of the role relationship diagram that contains the roles of TreeNode objects. By examining this diagram, enhanced method interfaces, and the linked role definitions, we were able to determine that the structure returned by the tree construction method did, in fact, comprise a tree: the tree construction method returns a root TreeNode object playing the role TreeNode w/ right & left, which according to the role definition has no references from left or right fields of other TreeNode objects. The other TreeNode roles have exactly one reference from the left or right field of another TreeNode. Combining these two pieces of information allows us to infer that the structure returned by the tree construction method is a tree.
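The inference above rests on counting incoming left/right references. A minimal sketch of that check follows. It assumes a heap snapshot that contains exactly the TreeNode objects reachable from the root; the TreeCheck class and its method names are hypothetical and are not part of the role analysis tool.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class TreeNode {
    int value;
    TreeNode left, right;

    TreeNode(int value, TreeNode left, TreeNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class TreeCheck {
    // For each node in the snapshot, counts how many left or right
    // fields of other nodes refer to it.
    static Map<TreeNode, Integer> incomingCounts(List<TreeNode> heap) {
        Map<TreeNode, Integer> counts = new IdentityHashMap<>();
        for (TreeNode n : heap) {
            counts.put(n, 0);
        }
        for (TreeNode n : heap) {
            if (n.left != null) {
                counts.merge(n.left, 1, Integer::sum);
            }
            if (n.right != null) {
                counts.merge(n.right, 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if the root has no incoming left/right references and every
    // other node has exactly one, matching the two role properties.
    static boolean satisfiesTreeRoles(TreeNode root, List<TreeNode> heap) {
        Map<TreeNode, Integer> counts = incomingCounts(heap);
        for (TreeNode n : heap) {
            int expected = (n == root) ? 0 : 1;
            if (counts.get(n) != expected) {
                return false;
            }
        }
        return true;
    }
}

class TreeCheckDemo {
    public static void main(String[] args) {
        TreeNode a = new TreeNode(1, null, null);
        TreeNode b = new TreeNode(2, null, null);
        TreeNode root = new TreeNode(3, a, b);
        System.out.println(TreeCheck.satisfiesTreeRoles(root, Arrays.asList(root, a, b)));
    }
}
```

A node shared by two parents would have an incoming count of two and would therefore play a different role, which is how sharing would show up in the role relationship diagram.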

    Fig. 13 presents the role transition diagram for TreeNode objects. This diagram, in combination with the linked role definitions, clearly shows a bottom-up initialization sequence in which each TreeNode acquires a left child and a right child, then a reference from the right or left field of its parent. Alternative initialization sequences produce TreeNode objects with no children. Note that the automatically generated role names in this figure are intended to help the developer understand the referencing relationships that define each role. The role name right TreeNode w/ right & left, for example, indicates that objects playing the role have 1) a reference from the right field of an object and 2) non-null right and left fields. The role name TreeNode w/ left indicates that an object playing this role has a non-null left field.
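The naming convention just described can be approximated with a few lines of Java. This sketch reproduces only the two examples given in the text; the tool's full naming scheme is not specified here, so the RoleNames class and its parameter layout are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RoleNames {
    // incomingFields: names of fields in other objects that refer to this object.
    // nonNullFields:  names of this object's own fields that are non-null.
    static String roleName(List<String> incomingFields,
                           String className,
                           List<String> nonNullFields) {
        StringBuilder sb = new StringBuilder();
        for (String field : incomingFields) {
            sb.append(field).append(' ');
        }
        sb.append(className);
        if (!nonNullFields.isEmpty()) {
            sb.append(" w/ ").append(String.join(" & ", nonNullFields));
        }
        return sb.toString();
    }
}

class RoleNamesDemo {
    public static void main(String[] args) {
        System.out.println(RoleNames.roleName(
                Arrays.asList("right"), "TreeNode", Arrays.asList("right", "left")));
        System.out.println(RoleNames.roleName(
                Collections.<String>emptyList(), "TreeNode", Arrays.asList("left")));
    }
}
```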

    6.5 Em3d

    Em3d simulates the propagation of electromagnetic waves through objects in three dimensions. It uses enumerators extensively in two phases of the computation. The first phase builds a graph that models the electric and magnetic fields; the second phase traverses the graph to simulate the propagation of these fields. Fig. 15 gives the role transition diagram for the Node1Enumerate class. The role transition diagram for the enumerator objects contains roles corresponding to an initialized enumerator, an enumerator with remaining elements, and an enumerator with no remaining elements. As expected, the program never invokes the next method on an enumerator object that has no remaining elements, enabling the developer to verify that the program uses enumerator objects in a standard way.
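The enumerator states can be illustrated with a small linked-list enumerator whose state is determined by whether its cursor field is null. This is not the Em3d Node1Enumerate class; ListCell, CellEnumerator, and the role labels returned by role() are hypothetical names chosen for the sketch.

```java
import java.util.NoSuchElementException;

class ListCell {
    final int value;
    final ListCell next;

    ListCell(int value, ListCell next) {
        this.value = value;
        this.next = next;
    }
}

class CellEnumerator {
    private ListCell current; // non-null while elements remain

    CellEnumerator(ListCell head) {
        this.current = head;
    }

    boolean hasMoreElements() {
        return current != null;
    }

    // Standard usage calls this only when hasMoreElements() is true.
    int nextElement() {
        if (current == null) {
            throw new NoSuchElementException("enumerator has no remaining elements");
        }
        int v = current.value;
        current = current.next;
        return v;
    }

    // Illustrative label derived from the referencing state of the object.
    String role() {
        return current != null ? "enumerator w/ current" : "enumerator";
    }
}

class EnumeratorDemo {
    public static void main(String[] args) {
        CellEnumerator e = new CellEnumerator(new ListCell(1, new ListCell(2, null)));
        int sum = 0;
        while (e.hasMoreElements()) {
            sum += e.nextElement();
        }
        System.out.println("sum = " + sum + ", final role = " + e.role());
    }
}
```

A trace of this program would never show nextElement invoked on an object in the exhausted state, which is the kind of property the enhanced method interfaces make visible.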

    6.6 Utility of Roles

    In general, roles helped us to discover key data structure properties and understand how the program initialized and manipulated objects and data structures. The combination of the role relationship diagram and linked role definitions typically provided the most useful information about data structure properties. Examples of these properties include the referencing properties of TreeNode objects in the Treeadd benchmark and the correspondence between Successor nodes and network nodes in Jess.

    The role transition diagram typically provided the most useful information about object initialization sequences and usage patterns. Examples of object initialization sequences include the initialization of Flight objects in the Direct-To benchmark and of TreeNode objects in the Treeadd benchmark. Jess provides an interesting example of a conceptual phase transition in a data structure: the program uses a more flexible but less efficient data structure during a construction phase, then replaces this data structure with a more efficient frozen version for a subsequent computation phase. The Point4d class in Direct-To provides a good example of how a program can use objects of a single class for several different purposes in the computation. In all of these cases, the role analysis enabled us in a matter of minutes to understand the underlying initialization sequences or usage patterns.

    Fig. 11. Role transition diagram for the unsuppress action class.

    Fig. 12. Role transition diagram for the suppress action class.

    Finally, we found that the information about the roles of method parameters helped us to understand the otherwise implicit expectations that methods have about the states of their parameters and the effects of methods on these states. Examples of methods with important expectations or effects include the freeze and CallNode methods in Jess and the next method in Em3d. In general, we expect the role analysis tool to be useful in the software development process in the following ways:

    - Program Understanding: Developers have to understand programs to modify or reuse them. In object-oriented languages, we believe that understanding heap allocated data structures is key to understanding the program. Roles help developers discover potential key data structure invariants and understand how programs initialize and manipulate these data structures, thus aiding program comprehension.

    - Maintenance: To safely modify programs, developers need to understand the data structures these programs build, the referencing relations methods assume, and the effects of methods on these data structures. We expect that the diagrams and enhanced method interfaces that our tool generates will prove useful for this purpose.

    - Verifying Expected Behavior: We expect that developers could use our tool as a debugging aid. Developers write programs with certain invariants about heap structures in mind. If the role relationships that our tool discovers are inconsistent with these invariants, the developer knows that a bug exists. The enhanced method interfaces and role transition diagrams can also help the developer quickly isolate the bug.

    - Documentation: Developers often need to document high-level properties of the program. We believe that roles may provide an effective documentation mechanism because they come with a set of interactive graphical representations, they can often capture key properties of the program in a concise, cognitively tractable representation, and (at least for the roles that our analysis tool discovers) they are guaranteed to faithfully reflect some of the behaviors of the program. Role subspaces may prove to be especially useful in presenting focused, orthogonal, or hierarchical perspectives on the purposes of the objects in the program.

    - Design: High-level design formalisms often focus on the conceptual states of objects and the relationships between objects in these states. For instance, use cases can be thought of as executing a state transition on the objects involved to update information in the system. UML class diagrams and state charts are more obvious instantiations of such design formalisms. Our role analysis can extract information that is often similar to this design information, helping the developer to establish the connection between the design and the behavior of the program.

      Furthermore, the role abstraction suggests several concrete ways of realizing high-level design patterns in the code. As developers become used to working with roles, they may very well adopt role-inspired coding styles that facilitate the verification of a guaranteed connection between the high-level design and its realization in the program.

    Fig. 13. Role transition diagram for the TreeNode class.

    Fig. 14. Role relationship diagram for the TreeNode class.

    Fig. 15. Role transition diagram for the Node1Enumerate class.

    6.7 Analysis Overhead

    We measured the instrumentation and analysis overheads for our tool on the benchmark applications. We ran each benchmark on a 2.2 GHz Core 2 Duo with 1 GB of RAM. Table 1 presents the overhead measurements for our benchmarks. For each benchmark, we report:

    1. the time for the normal, un-instrumented version to execute,

    2. the time for the instrumented version to execute and store the trace,

    3. the time for our analysis to process the trace, and

    4. the size of the trace.

    We compute that the slowdown due to instrumentation ranges from a low of 16x for the Direct-To benchmark to a high of 122x for the Jess benchmark, with an average of 51.6x. Note that because the instrumentation primarily records changes to heap referencing properties, the instrumentation overhead varies depending on how much data structure manipulation the application performs.

    We next compute the time taken to analyze the trace in seconds per megabyte of trace file. We found that Em3d took the most time per megabyte with a time of 8.2 seconds/MB. We found that Treeadd took the least time per megabyte with a time of 0.211 seconds/MB. We noted that Jess and Direct-To have similar-sized trace files, but significantly different analysis times. We profiled the role inference analysis on these two benchmark applications to help understand the difference in analysis times. We found that computing the reachability information took 55.5 percent of the analysis time for the Jess benchmark, while it only took 26.6 percent of the analysis time for the Direct-To benchmark.

    The reachability computations dominate the worst-case analysis time. In the worst case, the incremental reachability algorithm can take time proportional to the number of edges in the heap times the number of possible root variables. Since the tool recomputes reachability at every method boundary, the worst-case analysis time is proportional to the worst-case reachability bound times the number of method calls.
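The quantities discussed in this subsection can be written compactly. The symbols below are introduced here for illustration and do not appear in the original: T_normal and T_instr are the execution times of the un-instrumented and instrumented versions, T_analysis is the time to process the trace, S_trace is the trace size in megabytes, |E| is the number of edges in the heap, |R| is the number of possible root variables, and M is the number of method calls.

```latex
\[
\text{slowdown} = \frac{T_{\mathrm{instr}}}{T_{\mathrm{normal}}},
\qquad
\text{analysis rate} = \frac{T_{\mathrm{analysis}}}{S_{\mathrm{trace}}}
\;\; \text{(seconds/MB)}
\]

\[
T_{\mathrm{worst}} \;\propto\;
\underbrace{|E| \cdot |R|}_{\text{one reachability computation}} \cdot\, M
\]
```

The second expression simply restates the worst-case argument: one reachability computation is bounded by edges times roots, and one such computation may be needed at each method boundary.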

    In practice, we believe that the incremental reachability algorithm will yield significant performance improvements for many benchmarks as it will only recompute reachability for small portions of the heap. In this case, the analysis time should be approximately proportional to the execution time of the program.

    The multiple object data structure functionality can theoretically affect the complexity of the analysis. If the developer were to use the always merge policy for classes that are instantiated many times to build large linked data structures, the analysis could potentially have to propagate a role change for an object through many other objects. In practice, this is unlikely to pose a problem as this would generate a huge number of roles for such classes and the resulting graphs would be very difficult to interpret.
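For intuition about the reachability cost discussed in this subsection, the sketch below computes the set of objects reachable from a single root with a plain breadth-first traversal. It is not the incremental algorithm used by the tool, whose details are not given in this section; HeapObject and Reachability are hypothetical names. One traversal visits each edge at most once, so repeating it for every root variable gives the edges-times-roots bound.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a heap object and its outgoing references.
class HeapObject {
    final String label;
    final List<HeapObject> fields = new ArrayList<>();

    HeapObject(String label) {
        this.label = label;
    }
}

class Reachability {
    // Returns every object reachable from the given root, including the root.
    // Identity-based membership is used because heap objects are distinguished
    // by identity, not by equals().
    static Set<HeapObject> reachableFrom(HeapObject root) {
        Set<HeapObject> seen =
                Collections.newSetFromMap(new IdentityHashMap<HeapObject, Boolean>());
        Deque<HeapObject> worklist = new ArrayDeque<>();
        seen.add(root);
        worklist.add(root);
        while (!worklist.isEmpty()) {
            HeapObject current = worklist.remove();
            for (HeapObject target : current.fields) {
                if (seen.add(target)) { // add() returns true only for new objects
                    worklist.add(target);
                }
            }
        }
        return seen;
    }
}

class ReachabilityDemo {
    public static void main(String[] args) {
        HeapObject a = new HeapObject("a");
        HeapObject b = new HeapObject("b");
        HeapObject c = new HeapObject("c");
        a.fields.add(b);
        b.fields.add(a); // cycle back to a
        System.out.println("reachable from a: " + Reachability.reachableFrom(a).size());
        System.out.println("c reachable: " + Reachability.reachableFrom(a).contains(c));
    }
}
```

An incremental variant would reuse the previous result and revisit only the part of the heap affected by a reference change, which is the source of the practical speedup the text anticipates.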

    TABLE 1. Overhead Measurements


    7 RELATED WORK

    We survey related work in three fields: design formalisms that involve the concept of abstract object states, program understanding tools that focus on properties of the objects that programs manipulate, and static analysis for automatically discovering or verifying properties of linked data structures.

    7.1 Design Formalisms

    Early design formalisms identified changes in abstract object or component states as an important aspect of the design of the program [26]. Our tool also focuses on abstract state changes as a key aspect, but uses the role separation criteria to automatically synthesize a set of abstract object states rather than relying on the developer to specify the abstract state space explicitly.

    Object models enable a developer to describe relationships between objects, both at a conceptual level and as realized in programs. Object modeling languages such as UML [25] and Alloy [12] can describe the different states that objects can be in, the constraints that these states satisfy, and the transitions between these states. One can view our role analysis tool as a way of automatically extracting an object model that captures the important aspects of the objects that the program manipulates. In this sense our tool establishes a connection between the abstract concepts in the object model and the concrete realization of those concepts in the objects that the program manipulates.

    The concept of objects playing different roles in the computation while maintaining their identity often arises in the conceptual design of systems [11], and researchers have proposed several methodologies for realizing these roles in the program [11], [10], [15]. Our role analysis tool can recognize many of the design patterns used to implement these roles, and may, therefore, help developers establish a connection between an existing conceptual system design and its realization in the program. Conversely, our role separation criteria may also suggest alternate ways to implement conceptual roles. In particular, previously proposed methodologies tend to focus on ways to tag objects with (potentially redundant) information indicating their roles, while the role separation criteria identify data structure membership (which may not be directly observable in the state of the object itself) as an important property that helps to determine the roles that the object plays.

    7.2 Program Understanding Tools

    Daikon [8] extracts likely algebraic invariants from information gathered during the program's execution. For example, Daikon can infer invariants such as y = 2x. Daikon handles heap structures in a limited fashion by linearizing them into arrays under some specific conditions [9]. Our work differs in that we handle heap structures in a much more general fashion and focus on referencing relationships as opposed to algebraic invariants. Furthermore, our tool can discover and communicate the changes that occur to an object's state during a program's execution. As a result, our tool allows the developer to discover a rich new class of program invariants. This class of program invariants relates an object's state and changes to the object's state to the program's actions.
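To illustrate what a likely algebraic invariant means, the sketch below tests a candidate relation such as y = 2x against a set of observed (x, y) samples and keeps it only if no sample contradicts it. This is not Daikon's API or algorithm; the Candidate interface and the LikelyInvariants class are hypothetical names used only to show the idea of checking a candidate against execution data.

```java
// Hypothetical candidate invariant over a pair of observed values.
interface Candidate {
    boolean holds(int x, int y);
}

class LikelyInvariants {
    // Returns true if the candidate holds on every observed sample.
    // Each sample is a two-element array {x, y}.
    static boolean holdsOnAll(Candidate candidate, int[][] samples) {
        for (int[] sample : samples) {
            if (!candidate.holds(sample[0], sample[1])) {
                return false; // one counterexample falsifies the candidate
            }
        }
        return true;
    }
}

class LikelyInvariantsDemo {
    public static void main(String[] args) {
        int[][] observed = { {1, 2}, {3, 6}, {10, 20} };
        Candidate doubleOfX = (x, y) -> y == 2 * x;
        System.out.println("y = 2x survives: "
                + LikelyInvariants.holdsOnAll(doubleOfX, observed));
    }
}
```

Roles are reported in the same spirit: they describe what was observed in the analyzed executions, which is why the text speaks of likely or potential invariants rather than proven ones.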

    Jinsight [4] also extracts information from a program's execution. Jinsight allows the developer to see a histogram view, which shows a program's use of space and memory on a class by class basis. It also provides two different views showing method invocation information. Jinsight also provides pattern views for references and invocations of the execution. Jinsight appears to be most useful for understanding how a program uses computational resources and identifying opportunities for optimization. Furthermore, Jinsight may be useful for identifying coding errors that leak computational resources. Our tool is more useful for understanding deeper properties of the objects in a program, and how these objects interact with the program's code. We believe that our tool is useful for finding subtle bugs in the use of objects.

    Womble [14] and Chava [16] both use a static analysis to automatically extract object models for Java programs. Both tools use information from the class and field declarations; Womble also uses a se

