
QVM: An Efficient Runtime for Detecting Defects in Deployed Systems

Matthew Arnold, IBM Research

Martin Vechev, IBM Research

Eran Yahav, IBM Research

Abstract

Coping with software defects that occur in the post-deployment stage is a challenging problem: bugs may occur only when the system uses a specific configuration and only under certain usage scenarios. Nevertheless, halting production systems until the bug is tracked and fixed is often impossible. Thus, developers have to try to reproduce the bug in laboratory conditions. Often, reproducing the bug accounts for the lion's share of the debugging effort.

In this paper we suggest an approach to address the aforementioned problem by using a specialized runtime environment (QVM, for Quality Virtual Machine). QVM efficiently detects defects by continuously monitoring the execution of the application in a production setting. QVM enables the efficient checking of violations of user-specified correctness properties, e.g., typestate safety properties, Java assertions, and heap properties pertaining to ownership.

QVM is markedly different from existing techniques for continuous monitoring in that it uses a novel overhead manager, which enforces a user-specified overhead budget for quality checks. Existing tools for error detection in the field usually disrupt the operation of the deployed system. QVM, on the other hand, provides a balanced trade-off between the cost of the monitoring process and the maintenance of sufficient accuracy for detecting defects. Specifically, the overhead cost of using QVM instead of a standard JVM is low enough to be acceptable in production environments.

We implemented QVM on top of IBM’s J9 Java Virtual Machine and used it to detect and fix various errors in real-world applications.

Categories and Subject Descriptors D.2.5 [Testing and Debugging]

General Terms Algorithms, Reliability

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA’08, October 19–23, 2008, Nashville, Tennessee, USA.
Copyright © 2008 ACM 978-1-60558-215-3/08/10...$5.00

1. Introduction

Despite increasing efforts and success in identifying and fixing software defects early in the development life cycle, some defects inevitably make their way into production. The wide variety of deployment configurations and the diversity of usage scenarios all but guarantee that any large system will exhibit defects after it has been deployed.

Detecting and diagnosing defects in a production environment remains a significant challenge. Failures in such environments might occur with low frequency and be virtually impossible to reproduce. For example, a defect might occur due to a specific concurrent interleaving, a specific lengthy user interaction, or a slow resource leak that gradually degrades system performance, leading to an eventual crash.

Existing tools for diagnosing defects “in the wild” are limited and usually incur an unacceptable overhead that significantly disrupts the operation of the deployed system. On the other hand, reproducing the failure in a test environment (if at all possible) may require considerable time and effort.

One way to detect rarely occurring defects is to continuously monitor a system for violations of specified correctness properties. For example, this can be achieved by using global property monitors and local assertions. However, the typical cost of these techniques prevents programmers from widely using them in production environments.

This work describes a runtime environment that is able to detect and help diagnose defects in deployed systems. To this end, we present the Quality Virtual Machine (QVM), a runtime environment that uses the technology and infrastructure available in a virtual machine to improve software quality. QVM provides an interface that allows software monitoring clients to be executed with a controlled overhead. Based on this interface, we present three such clients that continuously monitor application correctness by using a combination of simple global property monitors (typestate properties) and assertions. In addition, QVM automatically collects debug information, which enables effective defect diagnosis.

We implemented QVM on top of IBM’s J9 Java Virtual Machine. We used a number of large-scale real-world applications with QVM and found defects in many of them. We explain the design rationale behind QVM in Section 3.1.

1.1 Main Contributions

The contributions of this paper include:

• QVM: a runtime environment targeted towards defect detection and diagnosis in production systems.

• A novel overhead manager that enforces an overhead budget on client analyses, while maintaining sufficient accuracy for detecting defects.

• We introduce property-guided sampling, and in particular object-centric sampling, to collect sampled profiles while preserving correctness of the analysis.

• A lightweight interface that helps separate analysis clients from the details of the underlying VM, and transparently manages overhead of these clients.

• We use this infrastructure to implement three representative analysis clients: (i) tracking simple temporal safety properties and providing debug information; (ii) checking standard Java assertions; (iii) checking expressive heap queries pertaining to object ownership.

• We implemented QVM on top of IBM’s production Java Virtual Machine (J9). We used QVM as our standard day-to-day virtual machine, running a wide range of applications without a noticeable slowdown. We show that QVM can be used to effectively detect defects in such applications, and help diagnose them. In addition, we evaluate the overhead on the standard SPECjvm98 and Dacapo benchmarks.

1.2 Overview

In this section we provide a brief informal overview of QVM components and our experimental evaluation.

Overhead Manager. QVM allows the user to specify an overhead that is considered acceptable for the current monitoring environment. The maximum acceptable overhead may be 5%-10% in a live deployed system, yet 100% overhead (a factor of 2 slowdown) may be considered acceptable in a testing environment. Given an overhead budget, QVM strives to collect as much useful information as possible from the executing program while staying within the specified budget.

QVM Interface (QVMI). QVMI is a performance-aware profiling/monitoring interface that allows client analyses to remain decoupled from the VM, while maintaining efficiency. The design goal of this component is to enable development of powerful, yet efficient dynamic analyses. Technically, the overhead manager and the QVMI work together to provide clients with transparent, adaptive overhead management.

Analysis Clients. Using the QVM platform, we implement three analysis clients as follows.

Typestate Properties: This analysis client enables the dynamic checking of typestate properties. Dynamic checking of typestate properties, as well as generalized multiple-object typestate, has been addressed before in Tracematches [3] and MOP [12]. We use the typestate client to demonstrate three contributions of our platform: (i) adaptive overhead management; (ii) collection of timing information for typestate transitions; (iii) collection of additional detailed debug information with low overhead.

Local Assertions: QVM allows efficient sampling of user assertions by intercepting standard Java assertions and managing their execution through the overhead manager.

Heap Probes and Operations: QVM enables the dynamic checking of various global heap properties such as object-sharing, ownership, thread-ownership and reachability. These properties are useful for both debugging and program understanding purposes.

Experimental Evaluation. To evaluate the usability of QVM in finding defects and diagnosing them, we focused on typestate properties that correspond to resource leaks. For that purpose, we set QVM as the default JVM used in our environment and used it to perform all of our daily tasks while recording its error reports. To further exercise QVM, we used a wide range of applications on a regular basis. Some of the applications considered are an instant messenger (goim), newsfeed readers (feednread, rssowl), file management utilities (virgoftp, jcommander), large IBM internal applications, etc. For all of these applications, the overhead incurred by running them on top of QVM was unnoticeable to the user.

In some of our experiments (e.g., Azureus, virgoftp, goim), we investigated each report manually, diagnosed the causes of the errors, and implemented fixes. For some applications, our defect reports were confirmed by the development team, and our fixes were incorporated into the codebase.

To evaluate the usability of heap and local assertions, we added such assertions to a small number of applications and evaluated their effectiveness. The overhead of QVM is not noticeable to the user in interactive applications, so we use the SPECjvm98 and Dacapo benchmarks to evaluate the overhead manager’s effectiveness.

2. Motivating Example

Azureus [8] is an open-source implementation of the BitTorrent protocol. It supports several modes of user interaction, all implemented using the Standard Widget Toolkit (SWT) [18]. Azureus is the #1 downloaded Java program from SourceForge, and has more than 160 million downloads to date. Azureus plays the role of both a client and a server for P2P file sharing, and is therefore a relatively long-running application.

Finding Bugs. We run Azureus with QVM, monitoring various correctness properties, including possible SWT resource leaks and IOStream leaks. Azureus runs on QVM with no apparent slowdown. Over the course of a few hours, we check the QVM logs and observe that some errors were reported.

    QVM ERROR: [Resource_not_disposed] object [0x98837030]
      of class [org/eclipse/swt/graphics/Image]
      allocated at site ID 2742 in method
        [com/aelitis/azureus/.../ListView.handleResize(Z)V]
      died in state [UNDISPOSED]
      with last QVM method [org/.../Image.isDisposed()Z]

Figure 1. A sample QVM error report for Azureus.

Fig. 1 shows an example of an error reported by QVM while running Azureus. This is the actual error report as produced by QVM, with some package names abbreviated. By itself, this error report provides useful information about the property being violated. In this case, the reported Image object was not properly disposed before it became unreachable. Failure to properly dispose such SWT resources leaks OS-level resources and may gradually degrade performance and even lead to a system crash. The error report of Fig. 1 provides the basic information necessary to track down the error: the method in which the object was allocated, the object’s last state, and the last method invoked on the object.

Diagnosing the Cause. The QVM error report above notifies the user that there is an error, but understanding the cause of the error and introducing a fix is still nontrivial. The programmer needs to track the flow of the object through the program to identify why dispose was not called. To assist the programmer in this task, QVM provides additional, more detailed, debug information in the form of a typestate history. A typestate history for an object shows all the methods that have been invoked with that object as a receiver over the course of the object’s lifetime, from allocation to collection. For every method invocation, the history collects the contexts in which it was invoked. (We provide a more elaborate description of the typestate history in Section 5.1.)

To maintain a low runtime overhead, a typestate history is only collected for some of the tracked objects. Whenever an allocation site is identified as allocating a number of objects that violate a property, QVM starts recording typestate histories for a sample of objects allocated at that site. This object-centric sampling is one of the features that make it possible to collect detailed debug information with low overhead.

Fig. 2 shows an example of a typestate history for an object allocated at the same site as the object reported in Fig. 1. The typestate history abstracts the history of methods invoked on the object. Technically, the typestate history is a directed graph with labeled nodes and labeled edges. A node in the graph represents the state of the object after a specific method has been invoked on it. There is a single node in the graph for each method invoked on the object (summarizing all invocations of that method). A node in the graph is labeled by the name of the invoked method, and by a set of (bounded) contexts representing the contexts in which the method was invoked. An edge from node m1 to node m2 represents the fact that the method corresponding to m2 was invoked immediately after the method corresponding to m1. Note that this directed edge only denotes the order in time between the two invocations; it does not say that m2 is called from m1.

[Figure 2: typestate history graph. Nodes are listed in edge order, each followed by its sample calling context.]

    initial
    Image.<init>(Device;Rectangle;)V
        1: Image.<init>(Device;Rectangle;)V
        2: ListView.handleResize(Z)V
        3: ListView$14.run()V
        4: ...
    Image.init(Device;II)V
        1: Image.init(Device;II)V
        2: Image.<init>(Device;Rectangle;)V
        3: ListView.handleResize(Z)V
        4: ...
    Image.internal_new_GC(GCData;)I
        1: Image.internal_new_GC(GCData;)I
        2: GC.<init>(Drawable;I)V
        3: GC.<init>(Drawable;)V
        4: ListView.handleResize(Z)V
    Image.internal_dispose_GC(IGCData;)V
        1: Image.internal_dispose_GC(IGCData;)V
        2: GC.dispose()V
        3: ListView.handleResize(Z)V
        4: ...
    Image.isDisposed()Z
        1: Image.isDisposed()Z
        2: GC.drawImage(Image;IIIIIIII)V
        3: ListView$canvasPaintListener.doPaint(Event;)V
        4: ...
    OD

Figure 2. Sample typestate history for a single instance of Image that was reported as non-disposed in Fig. 1. The figure only shows a single sample stack trace for every method invoked on the object.
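The graph described above can be sketched as a small data structure. This is a minimal illustration, not QVM's implementation: the class name `TypestateHistory`, the bound `MAX_CONTEXTS`, and the use of plain strings for methods and stack frames are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a typestate history; not QVM's actual data structure. */
class TypestateHistory {
    static final int MAX_CONTEXTS = 4;        // assumed bound on contexts kept per node
    static final String INITIAL = "initial";  // start node, before any method is invoked

    // One node per method; the value is the bounded set of calling contexts seen for it.
    private final Map<String, Set<List<String>>> contexts = new LinkedHashMap<>();
    // Edge m1 -> m2: m2 was invoked immediately after m1 (order in time, not a call).
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();
    private String last = INITIAL;

    /** Records one invocation on the tracked object together with its calling context. */
    void record(String method, List<String> callingContext) {
        Set<List<String>> seen = contexts.computeIfAbsent(method, k -> new LinkedHashSet<>());
        if (seen.size() < MAX_CONTEXTS) {
            seen.add(new ArrayList<>(callingContext));
        }
        edges.computeIfAbsent(last, k -> new LinkedHashSet<>()).add(method);
        last = method;
    }

    Set<String> successors(String method) {
        return edges.getOrDefault(method, Collections.emptySet());
    }

    Set<List<String>> contextsOf(String method) {
        return contexts.getOrDefault(method, Collections.emptySet());
    }

    boolean wasInvoked(String method) {
        return contexts.containsKey(method);
    }

    public static void main(String[] args) {
        TypestateHistory h = new TypestateHistory();
        h.record("Image.<init>", Arrays.asList("Image.<init>", "ListView.handleResize"));
        h.record("Image.isDisposed", Arrays.asList("Image.isDisposed", "GC.drawImage"));
        System.out.println("dispose invoked? " + h.wasInvoked("Image.dispose"));
    }
}
```

Keeping one node per method, rather than one per invocation, is what keeps the history small for long-lived objects; the price is that repeated invocations are summarized rather than listed.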

Next, we show how we used the debug information provided by QVM to find the cause of an error. In the example of Fig. 2, there are 5 methods that have been invoked on the tracked object. First, the object is initialized by invoking the <init> and init methods. Then, a graphical context (GC) is created around the image (internal_new_GC) and disposed (internal_dispose_GC). Finally, isDisposed is invoked on the image. The method Image.dispose() that is required for properly disposing the image is never invoked.

    class ListView extends ... {
      private Image imgView = null; // ...
      protected void handleResize(boolean bForce) { // ...
        if (imgView == null || bForce) {
          imgView = new Image(listCanvas.getDisplay(), clientArea);
          lastBounds = new Rectangle(0, 0, 0, 0);
          bNeedsRefresh = true;
        } else {
          // ...
        }
        // ...
      }
    }

Figure 3. Azureus code fragment leaking SWT Image objects.

In this simple example, there is only one context in which each method has been invoked. The context is shown inside a rectangle next to its corresponding graph node. Considering the contexts in which the methods in this example were invoked, we can see that most of the operations on the tracked object are performed through the handleResize method in which it was allocated. The only exception is the call to isDisposed(), which originates in a paint event of the list view.

We therefore focus our attention on the handleResize method in azureus.ui.swt.views.list.ListView. The typestate history serves as a guide to the execution in which the property was violated. Following the sequence of calls in the debug information, we narrow our attention to the code excerpt shown in Fig. 3.

The problem in this method represents a common source of leaks: a new image is stored into the field imgView without properly disposing the previous image that was stored in the field. In this example, handleResize mixes the case of imgView == null (no previous image is known for taking previous bounds) with the case of forced resize (bForce == true). As a result, there are cases in which a new Image is created without properly disposing the previous Image stored in imgView.

The number of Image objects leaked as a result of this bug directly depends on user interaction. Since this leak is associated with a resize event, it may not occur with high frequency. However, the cumulative effect of a large number of small leaks may be fatal. In Section 7.1, we discuss additional problems found in Azureus by QVM, and show that some of these occur very frequently and result in significant resource leaks.

Developing a Fix. Now that we have diagnosed the bug as being caused by not disposing the old Image object stored in imgView, the question is how to introduce a fix. What we would like to do is invoke dispose on the object stored in imgView before we store the newly allocated image into the field. Unfortunately, we do not know the source of the Image stored in imgView, and in particular, whether this image is shared with other GUI components. In SWT, it is common for resources such as images, fonts, and colors to be shared between multiple GUI components. The convention is that whoever allocates the resource is responsible for its safe disposal. When we reach the point of allocating a new Image and storing it into imgView, we don’t know whether the previous value of imgView was allocated in this method. Furthermore, we don’t know whether other GUI components are still using the image.

    protected void handleResize(boolean bForce) { // ...
      if (imgView == null || bForce) {
        assert (!QVM.isShared(imgView));
        if (imgView != null && !imgView.isDisposed())
          imgView.dispose();
        imgView = new Image(listCanvas.getDisplay(), clientArea);
        lastBounds = new Rectangle(0, 0, 0, 0);
        bNeedsRefresh = true;
      } else {
        // ...
      }
      // ...
    }

Figure 4. A fix to the Image leak in handleResize of Fig. 3.

At this point, we leverage QVM’s heap assertions and check that the object pointed to by imgView is not shared (i.e., does not have any references other than imgView pointing to it). We introduce disposal code preceded by an assertion that makes sure that we are not disposing a shared resource. (The disposal of a shared resource might end up crashing the application at a later point when the user takes an action that uses the resource.) The modified handleResize method is shown in Fig. 4.
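To make the meaning of "not shared" concrete, the following sketch models a heap as an explicit reference graph and counts incoming references. It is only an illustration of the property being asserted: `ToyHeap` and its methods are hypothetical, and a real VM would answer the query by traversing the actual heap rather than a map.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy heap model: each holder object maps to the objects its fields refer to. */
class ToyHeap {
    // Identity-based map, since sharing is about object identity rather than equals().
    private final Map<Object, List<Object>> outgoing = new IdentityHashMap<>();

    /** Records that some field of holder refers to target. */
    void addReference(Object holder, Object target) {
        outgoing.computeIfAbsent(holder, k -> new ArrayList<>()).add(target);
    }

    /** Counts references to target by scanning every holder, as a heap traversal would. */
    int incomingReferences(Object target) {
        int count = 0;
        for (List<Object> fields : outgoing.values()) {
            for (Object ref : fields) {
                if (ref == target) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Assumed meaning of "shared": more than one reference points to the object. */
    boolean isShared(Object target) {
        return incomingReferences(target) > 1;
    }
}
```

In this model, an image referenced only from the list view's field is unshared and safe to dispose; once a second component holds a reference, `isShared` returns true and the assertion in Fig. 4 would fail.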

We now run the fixed version of this method with QVM for a few weeks, and observe that the previously reported leak does not occur. Our assertion also makes sure that the disposal of the Image does not affect any other GUI component.

We reported this leak and its fix, as well as other problems mentioned in Section 7, to the Azureus development team. The problems were confirmed as real bugs, and our suggested fixes were incorporated into the project’s codebase.

3. QVM Platform

In this section we describe the QVM platform. First, we provide some background and design rationale, then we briefly describe the overall QVM architecture and its main components. Finally, in Section 3.3, we describe the QVM interface (QVMI).

3.1 Design Rationale: Modifying a VM

Today’s production-grade virtual machines employ sophisticated techniques and optimizations to achieve maximal application performance. In contrast, there is little support for application correctness in a production environment besides checking low-level properties such as absence of null dereferences and array index bounds. While rich in functionality, current debug and monitoring interfaces (e.g., JVMTI) are also not applicable as they incur a slowdown that is unacceptable in production mode.

The goal of this work is to extend a production-grade virtual machine to provide software-quality services while maintaining competitive performance. We would like a solution to provide:

(I) high performance and low overhead

(II) maximal separation of analysis clients from the details of the underlying VM

There is an apparent tension between requirements (I) and (II). We resolve this tension by providing a generic interface (QVMI) that manages functionality common to all analysis clients, but in addition, we allow clients to cut through abstraction layers and use other VM services when appropriate.

However, our technique still requires VM modifications, and modifying a production-grade virtual machine is a nontrivial task. A virtual machine is a large, complex system. Moreover, implementing the quality services inside a specific VM makes them non-portable and ties users of the system to the specific VM version. In contrast, using pure bytecode instrumentation at the language level or a standard profiling interface such as JVMTI [29] is portable across virtual machines.

Despite these disadvantages, there are a number of advantages in having at least part of an analysis reside within a production VM, as we describe below.

VM-only information. Having access to the runtime allows the client analyses to utilize information that is not readily available at the language level. For example, analyses can use free bits in object headers, directly examine the heap, quickly access structures like thread-local storage, and re-use existing VM code (such as garbage collection heap traversal logic) to perform a slightly different function. Analyses can utilize low-level profile data and infrastructure, such as hardware performance monitors (HPM) and fine-granularity timing (for example, see the overhead monitor in Section 4).

Performance. Having access to the dynamic optimizer (JIT) ensures that the critical code paths are well optimized. The JIT can also use advanced optimization techniques for fast and slow paths (thin guards [5], code patching [37], full duplication [4], etc.). The system can also make use of profile data already collected by the VM to optimize and tune a dynamic analysis.

Dynamic updating. By using advanced techniques such as code patching and on-stack replacement (OSR) [20], VMs can support efficient dynamic updating of instrumentation during an application run.

Deployment. The deployment process becomes trivial because the required features become as ubiquitous as the VM.

Figure 5. Overall architecture of QVM.

There is no need to “install” an analysis (recompile the program source to add instrumentation, etc.), which is particularly difficult for large production applications that might make heavy use of custom class loaders. Our analysis can be run by simply enabling a command-line flag on the VM.

In the next section, we provide an overview of the QVM architecture and show how we hide the complexity of the underlying VM from most analysis clients by using the generic QVMI interface.

3.2 QVM Architecture

Fig. 5 shows the overall architecture of QVM. At a high level, the QVM extends the VM execution engine with three main components:

1. QVM Interface (QVMI): A performance-aware profiling/monitoring interface that allows client analyses to remain decoupled from the VM, while maintaining efficiency. The design goal of this component is to enable quick and easy development of powerful, yet efficient dynamic analyses. QVMI is described in Section 3.3.

2. Overhead Manager (OHM): The overhead control system enables users to bound the overhead incurred by QVM clients. The system does fine-grained monitoring of the time spent in the clients and adapts the sampling to stay near or below overhead bounds. OHM is described in Section 4.

3. QVM Clients: A flexible set of clients that leverage QVMI. In this paper we describe three example clients that enable checking of a variety of correctness properties with controlled overhead. Clients are discussed in Section 5.

In this architecture, the overhead manager and the QVMI work together to provide clients with transparent, adaptive overhead management. The clients use QVMI without the need to be aware of overhead management mechanisms (but with the ability to partially control it when desired).

The OHM uses the information collected by QVMI to adjust the sample rate such that the overhead matches the desired overhead specified by the user.

3.3 QVMI: The QVM Interface

Various profiling interfaces such as JVMTI make it easy to write monitoring clients. The client specifies the events of interest, and these events are provided by the interface. Clients are kept separate from the internal VM implementation that collects the events. Similarly, although our profiling clients are packaged as part of the VM, keeping a clear abstraction interface between the core VM details and the profiling clients is important for software engineering reasons, for both maintenance and ease of adding additional clients in the future.

The primary limitation of existing, general profiling interfaces is performance. For example, the granularity at which events are requested is too coarse. With existing interfaces such as JVMTI, if a client wants to receive method callbacks for some subset of the method invocations, it must register to receive callbacks for all method invocations, and filter out the unnecessary callbacks on the client side of the interface. This introduces significant overhead that is completely unnecessary if the analysis needs only a subset of the methods.

Filtering on the VM side. To address this problem, the QVM interface is designed such that an efficient implementation is possible. The key difference from existing profiling interfaces is that it is structured with the goal of allowing as much filtering as possible to occur on the VM side of the interface. For example, if an analysis client needs method callbacks, it must specify which method callbacks are necessary. This allows the remainder of the program to run at full speed. Similarly, the client may request method callbacks only for a subset of the objects in the program. The VM can use its suite of dynamic optimization techniques to achieve an efficient implementation of the sampled profile.
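The difference between the two designs can be illustrated by counting how many callbacks cross the interface. The sketch below is a simulation under stated assumptions: `CallbackCounter`, its two methods, and the string-based method names are hypothetical and stand in for real instrumentation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical comparison of client-side and VM-side filtering of method callbacks. */
class CallbackCounter {
    /** Client-side filtering: every invocation fires a callback, then the client filters. */
    static int clientSideFiltering(List<String> invocations, Predicate<String> wanted) {
        int crossings = 0;
        for (String method : invocations) {
            crossings++;          // the callback is fired regardless of interest
            wanted.test(method);  // the client then discards what it does not need
        }
        return crossings;
    }

    /** VM-side filtering: the VM knows the filter, so uninteresting calls fire nothing. */
    static int vmSideFiltering(List<String> invocations, Predicate<String> wanted) {
        int crossings = 0;
        for (String method : invocations) {
            if (wanted.test(method)) {
                crossings++;      // only instrumented methods cross the interface
            }
        }
        return crossings;
    }

    public static void main(String[] args) {
        List<String> calls = Arrays.asList("Image.dispose", "String.length", "List.add", "Map.get");
        Predicate<String> wanted = m -> m.startsWith("Image.");
        System.out.println("client-side crossings: " + clientSideFiltering(calls, wanted));
        System.out.println("VM-side crossings: " + vmSideFiltering(calls, wanted));
    }
}
```

In a real VM the filter would be consulted once per method at compile time rather than on every invocation, so unfiltered methods would carry no check at all; the per-call test here exists only to keep the simulation short.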

Table 1 shows a partial list of the operations supported by QVMI. Clients that register with QVMI have to support a similar set of operations (as described below). In addition to the operations listed in Table 1, QVMI has similar callbacks for field reads and writes, exceptions being thrown, and other events supported by standard interfaces such as JVMTI.

In the table, we separate operations of different stages of the execution by double horizontal lines. The manner in which these operations are used is illustrated below.

On VM initialization. Upon startup of the virtual machine, the clients have to register themselves with QVMI to receive callbacks by calling registerClient.

On method compilation. During the compilation of a method, the VM queries the QVM agents to determine whether the code being compiled needs any form of instrumentation. This ensures that maximal filtering occurs; instrumentation is not inserted on any program statement unless it is required by at least one client.

This querying is done by invoking QVMI operations such as isTrackedAlloc and isTrackedCallSite, which query all of the registered QVM clients to obtain a TrackLevel, which determines what level of instrumentation is needed. For example, for our typestate client, the compiler prompts QVMI to check whether allocation or method call sites in the code should be tracked. Further details on how the typestate client is implemented via QVMI are discussed in Section 6.2.

During execution. Depending on the tracking level, the VM fires events for tracked sites by invoking operations such as allocEvent and invocationEvent. When an object is collected by the garbage collector, QVMI is notified by calling objectDeath.
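The three stages above can be sketched as a client that checks the "resource not disposed" property from Section 2. The operation names follow Table 1, but everything else is an assumption for illustration: `SketchClient` uses simplified signatures (strings instead of `AllocSite`, `CallSite`, and `TrackLevel`), and `DisposeClient` is a hypothetical client, not the paper's typestate implementation.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a QVMI client; signatures are assumptions, not the real API. */
interface SketchClient {
    boolean isTrackedAlloc(String allocatedClass);         // queried at compile time
    boolean isTrackedCallSite(String method);              // queried at compile time
    void allocEvent(Object obj, String allocSite);         // fired during execution
    void invocationEvent(Object receiver, String method);  // fired during execution
    void objectDeath(Object obj);                          // fired when obj is collected
}

/** Hypothetical typestate client: an Image must be disposed before it dies. */
class DisposeClient implements SketchClient {
    private final Map<Object, String> allocSiteOf = new IdentityHashMap<>();
    private final Map<Object, Boolean> disposed = new IdentityHashMap<>();
    final List<String> errors = new ArrayList<>();

    public boolean isTrackedAlloc(String allocatedClass) {
        return allocatedClass.equals("Image");
    }

    public boolean isTrackedCallSite(String method) {
        return method.equals("Image.dispose");
    }

    public void allocEvent(Object obj, String allocSite) {
        allocSiteOf.put(obj, allocSite);
        disposed.put(obj, false);  // initial typestate: UNDISPOSED
    }

    public void invocationEvent(Object receiver, String method) {
        if (disposed.containsKey(receiver) && method.equals("Image.dispose")) {
            disposed.put(receiver, true);
        }
    }

    public void objectDeath(Object obj) {
        Boolean wasDisposed = disposed.remove(obj);
        String site = allocSiteOf.remove(obj);
        if (wasDisposed != null && !wasDisposed) {
            errors.add("Resource_not_disposed: allocated at " + site);
        }
    }

    public static void main(String[] args) {
        DisposeClient client = new DisposeClient();
        Object image = new Object();
        client.allocEvent(image, "ListView.handleResize");
        client.objectDeath(image);  // dies without a dispose call
        System.out.println(client.errors);
    }
}
```

The compile-time queries let the VM skip instrumentation for everything except `Image` allocations and `Image.dispose` call sites; the run-time events then only ever arrive for sites the client asked about.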

3.4 Property-guided sampling

One of the major features provided by QVMI is the ability to perform property-guided sampling. Sampling is a key mechanism QVM uses to reduce analysis overhead, but for many analyses naive random sampling would render the analysis completely useless, because the analysis relies on certain relationships between events.

For example, if a dynamic analysis detects files that are opened but not closed, and tracking of method invocations were sampled randomly, QVM would report a false positive any time a file open was sampled but the matching file close was not. To address this problem, QVM performs property-guided sampling, ensuring that the sampled profile maintains sufficient properties to make the dynamic analysis meaningful.

Object-centric sampling. QVM supports a novel feature called object-centric sampling. This technique allows an analysis to sample at the object instance level; an object can be marked as tracked and the analysis can receive all profile events for this object, while receiving no events for untracked objects. This allows overhead reduction via sampling, without destroying the profile properties needed for the dynamic analysis to produce meaningful results.

We refer to the points in the execution at which sampling decisions are made (i.e., whether an object is tracked, whether an assertion is executed) as origins.

Allocation sites are origins in our implementation of object-centric sampling. The decision of whether an object is tracked is made at allocation time; if sampled, a bit is set in the object header to mark the object as tracked. A short inlined code sequence checks this tracked bit on calls to QVM methods to determine whether a callback is needed.
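The open/close example can be worked through in a few lines. The sketch below compares sampling individual events with deciding once per object at allocation. It is an illustration under assumptions: periodic sampling (every `period`-th decision) stands in for random sampling so that the result is deterministic, and `SamplingDemo` and its methods are hypothetical names.

```java
/** Hypothetical comparison of event-level sampling and object-centric sampling. */
class SamplingDemo {
    /** Naive scheme: sample every period-th event, ignoring which object it belongs to. */
    static int falsePositivesEventSampling(int objects, int period) {
        int eventIndex = 0;
        int falsePositives = 0;
        for (int i = 0; i < objects; i++) {
            // Each object is opened and then closed; both events are sampled independently.
            boolean sawOpen = (eventIndex++ % period == 0);
            boolean sawClose = (eventIndex++ % period == 0);
            if (sawOpen && !sawClose) {
                falsePositives++;  // looks like an unclosed file, but it was closed
            }
        }
        return falsePositives;
    }

    /** Object-centric scheme: decide once at allocation, then see all or none of the events. */
    static int falsePositivesObjectCentric(int objects, int period) {
        int falsePositives = 0;
        for (int i = 0; i < objects; i++) {
            boolean tracked = (i % period == 0);  // stands in for the header bit
            boolean sawOpen = tracked;
            boolean sawClose = tracked;
            if (sawOpen && !sawClose) {
                falsePositives++;  // unreachable: a tracked object never loses an event
            }
        }
        return falsePositives;
    }

    public static void main(String[] args) {
        System.out.println("event sampling: " + falsePositivesEventSampling(6, 3));
        System.out.println("object-centric: " + falsePositivesObjectCentric(6, 3));
    }
}
```

With six correctly closed objects and a period of three, event-level sampling sees the open but not the close for objects 0 and 3 and reports two spurious leaks, while the object-centric scheme reports none because each tracked object contributes its complete event sequence.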

Method                                         Description
---------------------------------------------  -------------------------------------------------------
void registerClient(Client c)                  Registers a client to receive callbacks
TrackLevel isTrackedAlloc(AllocSite as)        should the specified allocation site be tracked
CallTrackLevel isTrackedCallSite(CallSite cs)  should the specified call site be tracked
boolean shouldExecute(Site s)                  should this site fire an event (based on sampling info)
void allocEvent(AllocSite as)                  tracked allocation event
void invocationEvent(CallSite as)              tracked invocation event
void objectDeath(Object o)                     object death event

Table 1. A partial list of the operations supported by QVMI.

3.5 Extensions

Our current interface is not intended to be complete, but is sufficient to cover a broad range of clients, including those described in this paper. The clients we have implemented so far are built as part of the VM, but the interface could also be exposed to enable external clients. A full specification that could be published as a performance-aware alternative to JVMTI is left for future work.

4. Overhead Manager

Traditional dynamic analyses typically operate under the model that the user defines an analysis, then evaluates it to determine whether the overhead is acceptable. The instrumentation that is used to implement the analysis is fixed, and the overhead incurred is a function of the program that is executed.

The QVM Overhead Manager, or OHM, reverses this mentality by allowing the user to specify an overhead that is considered acceptable for the current monitoring environment. The maximum acceptable overhead may be 5%-10% in a live deployed system, yet 100% overhead (a factor of 2 slowdown) may be considered acceptable in a testing environment.

Thus, the acceptable overhead is one of the inputs to QVM. Given an overhead budget, QVM strives to collect as much useful information as possible from the executing program while staying within the specified budget. If the maximum overhead specified is too low, QVM may not report any useful information. This is obviously not the desired outcome, but in many cases it is more desirable than losing control of the overhead and having a performance crisis as a result.

There are three components to the overhead manager, each of which is discussed in the sections that follow.

1. Monitoring: measures the overhead imposed by the QVM clients

2. Sampling strategy: a strategy for sampling each origin (e.g., an allocation site or an assertion site) to ensure the system stays within the overhead budget

3. Controller: adjusts the sampling strategies for each origin based on the measured overhead

4.1 Monitoring

The overhead monitor uses fine granularity timers on entry and exit to all QVMI calls to record the time spent in QVM clients and in the QVMI itself. The time is maintained separately for each origin (see Section 3.4) so that the sample rate of each origin can be adjusted independently.

Timer accuracy. The most important step in managing overhead is having the ability to measure overhead accurately. The overhead controller cannot be expected to make reasonable decisions if it is being given incorrect timing data as input.

Measuring overhead for coarse grained events (such as garbage collection time) is relatively easy; a number of system timing routines can be used to obtain reasonable results. However, timing short, frequently executed regions is more difficult and requires having a timer that is both accurate and efficient.

Using an inefficient timer mechanism has two serious problems: 1) it can cause significant overhead if called frequently (which can be the case with some QVM clients), and 2) the error can be significant when timing short regions and these timing errors will accumulate.

To address these problems, our OHM implementation uses inline assembly to read the cycle counter using Intel’s RDTSC (Read Time Stamp Counter) instruction. This mechanism results in very fast and accurate time stamping on entry and exit of the QVMI. Our initial implementation used the system call gettimeofday(), which created significant inaccuracies, as described in Section 7.

Measuring total application time. The timers measure time spent performing QVM tasks. To compute overhead relative to the non-QVM application, the OHM must also measure the total execution time. Using wall clock time, rather than process time, would be grossly incorrect for two reasons. First, interactive applications would create significant error because idle time would be counted as application time. Second, wall clock would be wrong for multi-threaded applications running on multi-processor machines. QVM time is measured and accumulated from all running threads, thus the total time must be the sum of the time spent executing on all processors.

For these reasons, we compute total time by using the getrusage() Linux system call to obtain the total time used by the JVM process. This solves the problems associated with using wall clock time discussed above and works well in practice for most applications. However, it is still not a fully robust solution when QVM activity is not evenly distributed across the application threads.

For example, consider an application with 2 threads running for 1 second each in parallel on a 2-processor machine; getrusage() will report 2 seconds of total execution time. Assume that QVM was given a 10% overhead budget, which translates to 0.2 seconds allocated to QVM. If all of the QVM callback activity takes place in one of the two application threads, one thread will run for 1.2 seconds while the other runs for 1 second. Although the total CPU time is increased by 10%, a user of the program would observe the program terminating after 1.2 seconds, a 20% increase.
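The arithmetic of this example can be checked with a small sketch. It is not part of QVM: the class and method names are hypothetical, and it assumes that all application threads start together and run fully in parallel, so that the user-visible running time is that of the slowest thread.

```java
// Hypothetical helper (not a QVM API) that reproduces the two-thread example.
final class OverheadArithmetic {
    /** Overhead relative to CPU time summed over all threads (the getrusage() view). */
    static double cpuOverhead(double[] appSeconds, double[] qvmSeconds) {
        double app = 0, qvm = 0;
        for (double s : appSeconds) app += s;
        for (double s : qvmSeconds) qvm += s;
        return qvm / app;
    }

    /** Overhead as perceived by a user: slowdown of the longest-running thread.
     *  Assumes the threads run fully in parallel. */
    static double perceivedOverhead(double[] appSeconds, double[] qvmSeconds) {
        double base = 0, withQvm = 0;
        for (int i = 0; i < appSeconds.length; i++) {
            base = Math.max(base, appSeconds[i]);
            withQvm = Math.max(withQvm, appSeconds[i] + qvmSeconds[i]);
        }
        return withQvm / base - 1.0;
    }
}
```

With application times {1, 1} and QVM times {0.2, 0}, the first method yields 10% while the second yields roughly 20%, which is the gap described above.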

The most robust solution to this problem is to perform overhead tracking at the thread level. If overhead budgets are tracked and enforced per-thread, total overhead as perceived by the user will always be within budget as well. A similar approach of using per-thread metrics has been employed by real time systems to track time spent performing system services [6]. We leave an implementation of this approach within QVM as part of future work.

Base overhead. Even when accurately measuring the time spent in the QVM clients, there are still two potential sources of errors: 1) checking overhead, and 2) indirect effects.

The main source of checking overhead is the inlined filtering. For example:

• virtual method calls (or inlined method bodies) for methods relevant to QVM clients filter samples by checking a bit in the object header.

• origin sites (e.g. allocation sites) check their sampling strategy (described in Section 4.2) to decide whether the allocated object is tracked.

These checks are short inlined code sequences and contribute very little to overall overhead (see Section 7); however, for very aggressive instrumentation, such as instrumenting all calls in the program, the base overhead can potentially become significant.

Although not easy to measure online while the application is executing, base overhead can be estimated by observing the frequencies of the checks, and using a model of performance to estimate the overhead. Using a model is less desirable than direct measurement, but can still be used as a way of avoiding large performance surprises for aggressive clients. Our implementation does not yet perform this modeling, and it is left as part of future work.

The second source of base overhead is indirect effects on performance, such as cache pollution, or optimizations in the JIT that are hindered by the presence of instrumentation. These sources of overhead are very difficult to measure without having two separate versions of the code and using techniques such as performance auditor [25] to identify the performance differences.

4.2 Sampling strategy

The QVMI maintains separate overhead statistics for each origin (see Section 3.4), allowing the OHM to increase or decrease the sample rate independently for each origin. Having origin-specific sample rates enables significant advantages for the client analysis. Maintaining a single sample rate would be sufficient for managing total overhead, but would be likely to miss origins in infrequently executed code. With origin-specific sampling, the controller can reduce overhead by scaling back hot origin sites, but continues to exhaustively track objects from cold sites, thus allowing the client analysis to see a broader view of the program execution. As shown in Section 7, this sampling strategy results in increased error coverage for a given overhead budget.

Our implementation achieves sampling by maintaining a sampleCounter and a sampleCounterReset for each origin. At runtime, the checking code at each origin site decrements and checks sampleCounter; if it is less than zero, the origin is selected to be tracked and the counter is reinitialized by the value in sampleCounterReset.

The sampleCounterReset for each origin is adjusted by the Overhead Controller to change the sample frequency for that origin, thus reducing or increasing its overhead.
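The counter-based check described above can be sketched as follows. This is only an illustration in Java, not the inlined code QVM emits: the class name is hypothetical, the field names follow the text, and the sketch ignores concurrent access from multiple threads.

```java
// Illustrative per-origin sampler; one instance would exist per origin site.
final class OriginSampler {
    private int sampleCounter;
    private int sampleCounterReset;

    OriginSampler(int sampleCounterReset) {
        this.sampleCounterReset = sampleCounterReset;
        this.sampleCounter = sampleCounterReset;
    }

    /** Decrement-and-check executed at the origin site; true means "track this event". */
    boolean shouldSample() {
        sampleCounter--;
        if (sampleCounter < 0) {
            sampleCounter = sampleCounterReset; // reinitialize from the reset value
            return true;
        }
        return false;
    }

    /** Called by the controller: a larger reset value means a lower sample frequency. */
    void setSampleCounterReset(int reset) {
        this.sampleCounterReset = reset;
    }
}
```

Under this reading, a reset value of 0 selects every event (exhaustive tracking), and a reset value of n selects one event out of every n + 1.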

Emergency shutdown. Object-centric sampling is most effective for managing overhead when there are a large number of objects contributing to total overhead. If the majority of execution is dominated by method calls on a single, long-lived object, tracking this object will result in large overhead.

To avoid severe performance degradation when a hot, long-lived object is tracked, the QVM supports the notion of an emergency shutdown. On each QVMI callback for allocations and invocations, the system checks a flag to determine whether an emergency shutdown is needed. If so, it disables the monitoring bit in the object header such that the object will no longer be sampled. The client analysis may now need to discard this object, as the method callbacks are not complete. However, this mechanism allows the system to ensure that overhead can be controlled.

4.3 Overhead Controller

The job of the Overhead Controller is to periodically check the QVM overhead, and adjust the sampling frequencies accordingly. If the overhead is above the budget, sample frequencies are reduced; if the overhead is below budget, the frequencies are increased.

To avoid oscillation and large spikes in overhead, the controller monitors not only total overhead, but also recent overhead. Recent overhead is computed via exponential decay; a second copy of application time and QVM time is maintained, and multiplied by a decay factor each time the controller wakes up. This gives more weight to recent timings, effectively measuring the overhead over a previous window of execution.

The primary focus of the controller is keeping the overhead below the overhead budget. Maximizing the client execution time within that budget is also a goal, but it is secondary. Thus the controller reduces sample frequencies if either the total overhead or the recent overhead exceeds its budget.
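A minimal sketch of the total and recent (decayed) overhead bookkeeping is given below. It rests on several assumptions not stated in the text: the class and method names are hypothetical, overhead is taken as QVM time divided by non-QVM application time, and a single budget is applied to both the total and the recent overhead.

```java
// Illustrative bookkeeping for total and exponentially decayed ("recent") overhead.
final class OverheadWindow {
    private final double decay;          // assumed decay factor in (0, 1)
    private double totalApp, totalQvm;   // accumulated over the whole run
    private double recentApp, recentQvm; // decayed copies

    OverheadWindow(double decay) {
        this.decay = decay;
    }

    /** Called on each controller wake-up with the time accrued since the last one. */
    void onWakeup(double appDelta, double qvmDelta) {
        totalApp += appDelta;
        totalQvm += qvmDelta;
        // Older timings are scaled down, so recent timings carry more weight.
        recentApp = recentApp * decay + appDelta;
        recentQvm = recentQvm * decay + qvmDelta;
    }

    double totalOverhead() {
        return totalApp == 0 ? 0 : totalQvm / totalApp;
    }

    double recentOverhead() {
        return recentApp == 0 ? 0 : recentQvm / recentApp;
    }

    /** Sample frequencies should be reduced if either measure exceeds the budget. */
    boolean overBudget(double budget) {
        return totalOverhead() > budget || recentOverhead() > budget;
    }
}
```

In this sketch a short burst of QVM activity pushes the recent overhead over the budget well before the total overhead reacts, which is the behavior the decayed copy is meant to provide.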

If the overhead rises too far above the budget, the controller enacts the emergency shutdown to stop profiling in the current set of objects, and starts tracking new objects once the overhead is within budget.

Origin-specific adjustment. The QVMI maintains separate overhead statistics for each origin (see Section 4.2), allowing the OHM to increase or decrease the sample rate independently for each origin. These origin-specific adjustments are made as follows.

The controller decides on sample rates for each origin by maintaining a second overhead threshold, called originOverheadBudget. The sample rate of each origin is adapted to stay below this overhead budget. If the overhead for an origin is below originOverheadBudget, the sample rate is increased (or left alone if the origin is already exhaustively tracked).

When the controller sees that total overhead is too high, it reduces the originOverheadBudget, thus effectively reducing the sample frequency only for origins that exceed this overhead threshold. The originOverheadBudget is always less than or equal to the total overhead budget, but may be significantly lower if there are a large number of origins.
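One possible shape of this adjustment step is sketched below. The text does not give the actual adjustment factors, so everything numeric here is an assumption made for illustration: halving originOverheadBudget when over budget, growing it by 1.5x (capped at the total budget) otherwise, and doubling or halving the per-origin reset value. The class and field names other than originOverheadBudget and sampleCounterReset are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative controller step; adjustment factors are assumptions, not QVM's values.
final class OriginController {
    static final class Origin {
        double overhead;         // measured overhead attributed to this origin
        int sampleCounterReset;  // 0 means the origin is exhaustively tracked
    }

    final Map<String, Origin> origins = new LinkedHashMap<>();
    final double totalBudget;
    double originOverheadBudget;

    OriginController(double totalBudget) {
        this.totalBudget = totalBudget;
        this.originOverheadBudget = totalBudget; // never above the total budget
    }

    void adjust(double totalOverhead) {
        if (totalOverhead > totalBudget) {
            originOverheadBudget /= 2;           // assumed: tighten the per-origin threshold
        } else {
            originOverheadBudget = Math.min(totalBudget, originOverheadBudget * 1.5);
        }
        for (Origin o : origins.values()) {
            if (o.overhead > originOverheadBudget) {
                o.sampleCounterReset = o.sampleCounterReset * 2 + 1; // sample less often
            } else if (o.sampleCounterReset > 0) {
                o.sampleCounterReset /= 2;                           // sample more often
            } // else: already exhaustively tracked, leave alone
        }
    }
}
```

Only origins whose own overhead exceeds the threshold are scaled back, so cold origins keep a high sample rate even while the hot ones are throttled.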

This approach is similar to [22], which uses inverse sampling to avoid missing memory leaks in cold code.

5. QVM Clients

In this section, we describe three clients built on top of the QVM platform. We have implemented a number of clients in order to cover a range of user properties: ranging from local assertions to continuous monitoring using temporal safety properties.

5.1 Typestate

In this section, we show how QVM is used to dynamically check typestate properties.

Typestate [36] is a framework for specifying a class of temporal safety properties. Typestates can encode correct usage rules for many common libraries and application programming interfaces (APIs). For example, typestate can express the property that a Java program should dispose of a native resource before its Java object becomes unreachable and is collected by the garbage collector.

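[Figure 6 diagram omitted. Labels recoverable from the figure: states undisposed (initial), disposed, and err; edge labels dispose* | release*, else, object death, and *.]

Figure 6. A typestate property tracking proper disposal of SWT resources. Names of tracked types are not shown.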

Dynamic checking of typestate properties, as well as generalized multiple-object typestate (also known as “first-order properties” [34, 38]), has been addressed before in Tracematches [3] and MOP [12]. We use the typestate client to demonstrate three contributions of our platform: (i) adaptive overhead management; (ii) timed typestate transitions; (iii) collection of additional detailed debug information with low overhead.

Using the QVM platform to implement dynamic typestate checking also provides us with an advantage in getting object-death callbacks directly from the garbage collector and not relying on a finalizer method to be called. This guarantees that object-death events are fired in a timely manner (which is not guaranteed to happen when using finalizers) and allows us to measure resource-drag (see below) more precisely.

QVM uses a simple input language to let the user specify a finite-state automaton that represents the typestate property, and the types to which it applies. We refer to a type that appears in at least one typestate property as a tracked type. Once the tracked type is specified, our implementation instruments every object of this tracked type with additional information that maps the object to its typestate. During execution, QVM updates the typestate of each tracked object, and when an object reaches its error state, QVM records an error report (such as the one shown in Fig. 1) in a designated log file.

EXAMPLE 5.1. Fig. 6 shows a typestate property (represented as a finite state automaton) that identifies when an SWT resource has not been disposed prior to its garbage collection, thus possibly leaking native resources such as GDI handles. The tracked types are not shown in the figure, as this property applies to a large number of types (e.g., org/eclipse/swt/widgets/Widget). Since all states other than the designated error state are accepting, we simplify notation by not using a special notation for accepting states. We label edges of the finite-state automaton with regular expressions that define when the transition is taken. For example, the transition from undisposed to disposed occurs when invoking a method whose name begins with dispose or release. We use else to denote a transition that is fired when no other transition from the state can be matched (note that the automaton is deterministic).
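The automaton of Example 5.1 can be written down as a small state machine. The sketch below is not QVM's input language, which is not shown in this section; it is a plain Java rendering with hypothetical names, and it assumes that the else and * edges are self-loops and that object death in the disposed state is not an error.

```java
// Illustrative encoding of the disposal typestate property of Fig. 6.
final class DisposalTypestate {
    enum State { UNDISPOSED, DISPOSED, ERR }

    private State state = State.UNDISPOSED; // initial state

    /** Invocation event on the tracked object. */
    void onInvoke(String methodName) {
        if (state == State.UNDISPOSED
                && (methodName.startsWith("dispose") || methodName.startsWith("release"))) {
            state = State.DISPOSED;
        }
        // Assumed: the "else" edge and the "*" edges leave the state unchanged.
    }

    /** Object-death event delivered by the garbage collector. */
    void onObjectDeath() {
        if (state == State.UNDISPOSED) {
            state = State.ERR; // resource died without being disposed
        }
    }

    State state() {
        return state;
    }
}
```

An object that only ever receives calls such as getBounds and then dies ends in ERR, while one that receives dispose before dying stays in DISPOSED.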

[Figure 7 diagram omitted. Each state is labeled with the last method invoked on the Image object (Image.(Device;InputStream;)V, Image.init(Device;ImageData;)V, Image.createMask(ImageData;Z)I, Image.isDisposed()Z, Image.getImageData()ImageData;, Image.getBounds()Rectangle;, Image.createMask()V, and OD for object death) together with sample calling contexts; each edge is labeled "count / last time".]

Figure 7. An example typestate history for a leaking Image in Azureus. For brevity, we only show sample contexts and omit the context for isDisposed.

In Section 7.1 we report experimental results for such properties.

For every typestate property, QVM tracks the number of times it has been violated. When the number of violations passes a specified threshold, QVM starts recording additional debugging information in the form of a typestate history.

As mentioned in Section 2, a typestate history of an object o is an abstraction of the sequence of method invocations performed during execution with o as a receiver. We use the name typestate history because we summarize the sequence of method invocations as an annotated DFA, similar to a typestate property.

Intuitively, a state in the typestate history represents the state of the object after a specific method has been invoked on it. A state in the history is labeled with a set of (bounded) contexts, representing the contexts in which the method has been invoked. A transition between states m1 and m2 in the history represents the fact that the method corresponding to m2 has been invoked immediately after the method corresponding to m1 has been invoked.

A typestate history therefore provides information about the way a single object that violates the property was used in the program. This helps the programmer to diagnose the cause of the reported violation.

EXAMPLE 5.2. Fig. 7 shows an example typestate history produced by QVM. This provides an account of the behavior of a single object that violates the property. In the figure, we have abbreviated the type name BufferedGraphicTableItem1 to BGT1, and the type name ImageRepository to IR. In figures of typestate histories we do not show method signatures on the edges because the label of an edge is always identical to the label of its target state.

Unlike the simple typestate history of Fig. 2, the typestate history of Fig. 7 contains cycles and multiple invocations of methods. The label on a transition edge represents the number of times this transition occurred in the execution and the last time when it occurred. For example, the transition from the state in which createMask is the last method invoked on the object to the state in which isDisposed is the last method invoked on the object occurs 64 times in the execution summarized by the history of Fig. 7. The last time in which the transition occurred is 52, where time is measured as the number of allocations performed by the program. In the figures, we show the time counter divided by 1024.

Resource Drag and Lag. Since QVM tracks the last time each transition took place, it can be used to identify when a resource is not released in a timely manner (known as resource drag). In such cases it is sometimes possible to improve performance by releasing the resource earlier. Similarly, since QVM also tracks calls to constructors and object-death events, it can be used to identify when an object is allocated too early (memory lag) or kept reachable for a longer time than necessary (memory drag).

    canvas.addDisposeListener(new DisposeListener() {
        @Override
        public void widgetDisposed(DisposeEvent arg0) {
            if (img != null && !img.isDisposed()) {
                assert (QVM.isObjectOwned(img));
                img.dispose();
            }
        }
    });

Figure 8. Using QVM to check that an SWT resource is not shared before attempting to dispose it.

Extensions. Our current implementation supports single-object typestate properties. In the future, we plan to investigate how our VM extensions can be combined with techniques for handling multiple object typestates such as Tracematches [3] and MOP [12].

In some cases, static analysis (e.g., [19, 10]) can be used to verify that a typestate property is never violated, or that some transitions of a typestate property never occur in a program. These static approaches can be used to reduce the runtime overhead by eliminating some of the dynamic checks. However, in practice, the static approaches usually do not scale to the systems targeted by QVM.

5.2 Local Assertions

To allow adjustment of overhead, we allow Java assertions to be sampled. This means that during execution, we may sometimes choose not to evaluate an assertion.

5.3 Heap Probes

QVM enables the dynamic checking of various global heap properties such as object-sharing, heap-ownership, thread-ownership and reachability. These properties are useful for both debugging and program understanding purposes.

QVM provides a library that exports a set of methods, one for each heap property. We refer to these library methods as heap probes. The programmer can invoke heap probes from her program in order to inspect the shape of the heap at a program point. The library uses various components of the underlying runtime in order to obtain an answer. Our list of currently supported probes is shown in Table 2. In the table, we use TC(o) to denote the set of all objects that are transitively reachable from o. Technically, o can refer to either an object or a thread.

As with non-heap probes, our heap probes can be sampled by the overhead manager to allow adjustment of overhead, and can therefore evaluate to one of three possible values: true, false, and unknown. The return value of a heap probe can be used in a standard Java assertion. When a heap probe is used inside an assertion we refer to it as a heap assertion.
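The three-valued outcome of a sampled probe can be modeled as below. This is a sketch with hypothetical names, not the QVM library API. How QVM maps unknown onto a Java boolean for use in an assert is not stated in this section; the sketch assumes that unknown is treated as "no violation observed", so that a skipped probe never fails an assertion.

```java
import java.util.function.BooleanSupplier;

// Illustrative model of a probe that may be skipped by the overhead manager.
final class SampledProbe {
    enum Result { TRUE, FALSE, UNKNOWN }

    /** Evaluates the probe only if the overhead manager selected it for sampling. */
    static Result evaluate(boolean sampled, BooleanSupplier probe) {
        if (!sampled) {
            return Result.UNKNOWN; // probe skipped to stay within the overhead budget
        }
        return probe.getAsBoolean() ? Result.TRUE : Result.FALSE;
    }

    /** Assumed mapping for use in an assertion: only a definite FALSE is a violation. */
    static boolean holdsOrUnknown(Result r) {
        return r != Result.FALSE;
    }
}
```

A heap assertion in this style would read assert SampledProbe.holdsOrUnknown(result), which fails only when the probe was actually evaluated and returned false.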

EXAMPLE 5.3. Disposal of SWT resources is based on two principles: (i) the object which allocated the resource is responsible for its disposal; (ii) disposing a parent object disposes its children. These principles work well for many cases, as a large number of the allocated resources are set to form an immutable containment tree that guarantees proper (albeit not timely) disposal. However, the treatment of shared resources such as Color, Fonts, and Images, is more complicated and error prone.

For shared resources, finding the proper disposal point in the program may be rather challenging. In particular, the disposal may be based on programmer knowledge of the last use of the shared resource in the application.

Fig. 8 shows how a QVM assertion can be used to check that a resource is not shared by others before it is disposed. The code fragment shown here corresponds to a common idiom for disposing a resource by a dispose listener. This particular code fragment is taken from a fix we introduced for the Azureus benchmark as described in Section 7.1.

5.3.1 Discussion and Extensions

When assertions are not sampled, our approach is also applicable for reducing verification efforts by adding runtime checks of heap properties. For example, establishing that parts of the heap are disjoint may allow us to employ more efficient verification techniques that abstract each part separately.

The heap operations supported by QVM could be extended to provide a comprehensive runtime support for ownership (e.g., the release and capture operations of [32]).

6. Implementation

In this section we provide the implementation details of object-centric sampling, as well as QVM clients of Section 5.

6.1 Object-Centric Sampling

There are two key components to the efficient implementation of object-centric sampling. First is the ability to obtain a single free bit in the object header, to enable efficient checking of whether an object is tracked.

Second, once an object is identified as tracked, QVM clients need the ability to associate analysis data with it. We implemented this in QVM by creating an ObjectInfo for every tracked object. This ObjectInfo is then passed to the client on all object-related callbacks so the client can look up or store data associated with the object (such as DFA state, etc).

The mapping from object to ObjectInfo is performed via a hashtable lookup. On allocation of an object, the corresponding ObjectInfo is created and inserted into the hashtable; on object death, it is removed. QVMI callbacks that require access to the ObjectInfo obtain it by doing a hash lookup.

Probe Name                             Description
isHeap(Object o)                       Returns true if object o is pointed to by a heap object, false otherwise
isShared(Object o)                     Returns true if object o is pointed to by two or more heap objects, false otherwise
isObjectOwned(Object o1, Object o2)    Returns true if o1 dominates o2, false otherwise
isObjectOwned(Object o)                Returns true if the object pointed to by this dominates o, false otherwise
isThreadOwned(Thread ta, Object o)     Returns true if ta dominates o, false otherwise
isThreadOwned(Object o)                Returns true if the current thread dominates o, false otherwise
isUniqueOwner(Object root)             Returns true if root dominates all objects in TC(root), false otherwise
isReachable(Object src, Object dst)    Returns true if object dst is reachable from object src, false otherwise

Table 2. QVM heap probes. We use TC(o) to denote the set of all objects that are transitively reachable from o.

An alternate implementation would be to reserve a word in the object header to point to the object’s ObjectInfo. While this provides faster lookup, it is not necessarily the superior design because it reduces locality by increasing object size, and this overhead is incurred regardless of the sample rate. A hashtable lookup is significantly slower, but the hashtable lookup is performed only for sampled objects; the inlined fast path only checks the tracked bit in the object header. So although the hashtable implementation is slower for tracked objects, it allows a lower base overhead, which is what the system converges to when the sample rate is reduced. Because the goal of QVM is to target low-overhead scenarios, the hashtable design was chosen.
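The side-table design can be illustrated with a plain Java map keyed on object identity. This is only an analogy with hypothetical names: QVM implements the table inside the VM, and a Java-level map like this one would hold strong references that keep tracked objects alive, which the VM-internal table does not need to do.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative side table from tracked object to its analysis data.
final class ObjectInfoTable {
    static final class ObjectInfo {
        int dfaState; // example of client data, such as the current DFA state
    }

    // Keyed on identity, not equals(): two equal-but-distinct objects are separate entries.
    private final Map<Object, ObjectInfo> table = new IdentityHashMap<>();

    /** On a sampled allocation: create and register the ObjectInfo. */
    ObjectInfo onAllocation(Object o) {
        ObjectInfo info = new ObjectInfo();
        table.put(o, info);
        return info;
    }

    /** Slow path, reached only for objects whose tracked bit is set. */
    ObjectInfo lookup(Object o) {
        return table.get(o);
    }

    /** On object death: drop the entry so the table does not grow without bound. */
    void onObjectDeath(Object o) {
        table.remove(o);
    }
}
```

Untracked objects never reach lookup in the design described above, which is why the cost of the hash lookup is paid only in proportion to the sample rate.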

6.2 Typestate Client

Upon VM startup, the typestate module loads all of the user-supplied properties, then parses and stores that information in its own internal data structures. The typestate module then registers itself with the runtime via the QVMI.registerClient call.

On method compilation, the QVMI interface is called by the JIT via the isTrackedAlloc and isTrackedCallSite functions to determine whether instrumentation is needed for allocations and calls. These functions return a value of type TrackLevel. This type can take on one of three totally ordered values: NEVER (the minimal value), SOMETIMES and ALWAYS (the maximal value). All of the registered QVM clients are queried and the return result is computed by taking the maximal value from all of the client responses to ensure that sufficient instrumentation is inserted.
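Combining the client responses amounts to taking a maximum over a totally ordered enum. The sketch below shows this in Java; the enclosing class and the combine method are hypothetical, while the three level names follow the text.

```java
import java.util.List;

// Illustrative combination of per-client tracking levels for one site.
final class TrackLevels {
    // Declared in increasing order so that compareTo reflects NEVER < SOMETIMES < ALWAYS.
    enum TrackLevel { NEVER, SOMETIMES, ALWAYS }

    /** Returns the maximal level requested by any client; NEVER if there are no clients. */
    static TrackLevel combine(List<TrackLevel> clientResponses) {
        TrackLevel result = TrackLevel.NEVER;
        for (TrackLevel r : clientResponses) {
            if (r.compareTo(result) > 0) {
                result = r;
            }
        }
        return result;
    }
}
```

Taking the maximum means a site is instrumented as soon as any one client needs it, at the cost of the other clients receiving callbacks they may ignore.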

QVM then adjusts the instrumentation based on the tracking level. If the tracking level is ALWAYS or SOMETIMES, QVM instruments the code with a callback to report the event that occurred. In the case of SOMETIMES, QVM inserts inlined logic to decide (during execution) whether the callback gets invoked. If the tracking level is NEVER, no code instrumentation is performed by QVM for the site.

For allocation sites marked with track level SOMETIMES, the inlined sampling logic consults the sampling strategy for that origin (see Section 4.2). If selected for sampling, the typestate allocation handler is called via the QVMI allocEvent call. The handler creates its internal QVM tracking structure for the allocated object, and marks the object as tracked by setting a bit in the object header. Note that there could be multiple tracking structures per object (e.g. when the object is part of multiple typestate properties).

For method invocations tagged with SOMETIMES, the inlined code sequence checks whether the receiver is a tracked object by checking the tracked bit in the header. This check is executed even for inlined methods to ensure that callbacks are not optimized away by the JIT. If the object’s tracked bit is set, QVMI’s invocationEvent is invoked, which then calls the typestate invocation handler. The handler is passed the receiver object, that object’s ObjectInfo, and the method that was invoked. This handler updates the tracking structure for each DFA the object participates in.

In our implementation for typestate, we have used the object-centric tracking and sampling capabilities provided by QVMI (Section 3.4) and have inlined the check of whether the object is tracked. This keeps overhead low by ensuring that QVMI is invoked only for tracked (sampled) objects. There are many other such property-specific optimizations that can be made. For example, if we know that the tracked object is in an error state that will not be exited, QVM does not need to invoke any other callbacks on this object.

On Object Death. We have instrumented the garbage collector to provide precise death events. Whenever an object is detected to be unreachable during the sweep phase of the collector, the collector calls the QVMI’s objectDeath function. That function leads to calling the typestate module’s handler for death events, where all object tracking information is freed (if the object is tracked), ensuring no memory leakage. If the object is found to be in a non-accepting state, an error is reported.

6.2.1 Collecting Typestate Histories

In typestate histories, we use a notion of “time” to record when events occurred. We measure the time as the number of allocations performed by the program. To provide a scalable and efficient implementation of a global clock, each thread maintains a local allocation counter, and these are aggregated to a single global (approximate) time every 10 milliseconds. The precision of the aggregate global clock can be adjusted by the user by changing the frequency of aggregation operations (at the cost of a performance hit when using a higher frequency).
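The approximate allocation clock can be sketched as follows. The names are hypothetical, and the sketch assumes that each thread flushes its own local counter into the global clock when an aggregation tick occurs; the text does not say which thread performs the aggregation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative allocation-count clock with thread-local counters.
final class AllocationClock {
    /** Owned and updated by a single thread, so no synchronization is needed here. */
    static final class ThreadCounter {
        private long pending;

        void onAllocation() {
            pending++;
        }
    }

    private final AtomicLong globalTime = new AtomicLong();

    /** Called periodically (for example every 10 ms) by the thread that owns the counter. */
    void flush(ThreadCounter counter) {
        globalTime.addAndGet(counter.pending);
        counter.pending = 0;
    }

    /** Approximate global time: allocations flushed so far. */
    long now() {
        return globalTime.get();
    }
}
```

Between flushes the global value lags behind the true allocation count, which is the approximation the text refers to; flushing more often narrows the lag at a higher synchronization cost.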

6.2.2 Discussion

Although the typestate module is written as part of the VM, it is completely isolated from the VM via the QVMI interface; this interface can be used to easily write clients to check properties other than typestate. By having access to an unused bit in the object header, QVM is able to efficiently perform object-centric sampling without needing to store additional words in the object. Moreover, the ability to precisely intercept object death events frees us from having to rely on techniques such as finalizers and weak references.

6.3 Heap Probes

In our platform, the underlying memory subsystem already provides a stop-the-world mark and sweep, parallel garbage collector, where the number of parallel marker threads is parameterized by the number of cores in the system. This memory system is highly tuned for performance and provides rich synchronization functionality for controlling the application and collector threads. Interestingly, such a setup, although complex, contains many of the basic components necessary to perform our probe evaluation. Hence, we implement our heap probes by re-using and adjusting at key places much of this existing machinery. Next we describe in more detail our heap probe evaluation system.

Operation. On system startup, a set of evaluation threads Tm is created by the virtual machine, where |Tm| is the number of cores. After creation, each thread tm ∈ Tm immediately blocks. Upon probe invocation, the system unblocks all evaluation threads and each tm starts executing the probe in question. At the abstract level, the basic existing graph traversal components are shown in Fig. 9. Each component is parameterized by the evaluator thread tm. The function trace() performs the transitive closure from the set tm.pending. The only addition we made to the standard parallel tracing phase is the callback trace-step, which is fired whenever a new reference is encountered. Each probe is free to specialize this function. The set Ta denotes the set of application threads ta in the system at the time a probe is invoked. The function mark-thread() processes the contents of each application thread stack but does not trace from it. The function mark-object() atomically marks the object if it is not already marked. If it was not marked, it stores the children of the object in the pending set of each evaluator thread. Since this is a local operation, it is done without synchronization. The function barrier() essentially waits for all evaluation threads tm to reach it and then releases them. To avoid clutter, we assume that all sets are initialized to ∅ before invoking the probe.

Probe is-shared. Fig. 10 shows how the components of a parallel garbage collector are used to implement the probe is-shared() for a tracked object trackedo. For this probe, a special case needs to be addressed in order to compute a sound result of the probe when heap traversal is done in parallel (such a case does not exist in the sequential traversal). The special case is the following: it is possible that with parallel evaluator threads, two or more parallel evaluators tm each reach object o only once. In that case, we need to make sure to combine the results of all of the evaluator threads. Note that in the case where a single evaluator reaches two or more source objects pointing to o, the probe will return true without needing to inspect what other threads have reached.

    trace(tm)
      while (tm.pending ≠ ∅)
        remove s from tm.pending
        for each o ∈ {v | (s, v) ∈ E}
          trace-step(s, o)
          mark-object(tm, o)

    mark-object(tm, o)
      atomic
        if (o ∉ Marked)
          Marked ← Marked ∪ {o}
        else
          return
      tm.pending ← tm.pending ∪ {o}

    mark-thread(tm, ta)
      for each o ∈ roots(ta)
        mark-object(tm, o)

    mark-threads(tm, T)
      for each ta ∈ T
        mark-thread(tm, ta)

Figure 9. Basic Components

    is-shared(tm, trackedo)
      tm.sources = ∅
      mark-threads(tm, Ta)
      trace(tm)
      lock(allsources)
      allsources ← allsources ∪ tm.sources
      result ← |allsources| > 1
      unlock(allsources)
      barrier()

    trace-step(s, t)
      if (trackedo = t)
        tm.sources = tm.sources ∪ {s}

Figure 10. Shared from heap

One solution is to synchronize the evaluator threads on every trace-step() (e.g. by using a compare-and-swap instruction). However, on many processor architectures this would have a negative effect on performance. To avoid this, each parallel thread records the set of sources pointing to o that it encounters in trace-step(). Note that this is a local operation and requires no synchronization. Upon termination of its tracing phase each tm updates a global set allsources under a lock. If there is more than one object in that shared set, we return true, otherwise we return false. For clarity of presentation we have omitted some implementation details from the figures. For example, in the implementation, both local and global (i.e. allsources) sets of sources are of size two and we stop recording sources once that size is reached for the local set in trace-step(). Next, before agreeing on a global value of result and returning, the threads again synchronize via a call to barrier(). We have also omitted key portions of the runtime system such as load-balancing, a key technique in parallel collectors. Such techniques are completely orthogonal to our implementation and can be added without affecting the code for the probe evaluation.
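The merge of per-evaluator source sets can be illustrated with a single-threaded simulation. The sketch below is not the parallel implementation: the evaluators run one after another, the heap is modeled as a map from object name to the names it points to, and all identifiers are hypothetical. It keeps the essential structure of Fig. 10, namely a shared mark set, an evaluator-local set of sources filled in trace-step, and a global merge after tracing.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Single-threaded simulation of the is-shared probe with per-evaluator source sets.
final class IsSharedSketch {
    /**
     * @param edges             heap graph: object -> objects it points to
     * @param rootsPerEvaluator roots (from thread stacks) handed to each evaluator
     * @param tracked           the object whose sharing is being queried
     */
    static boolean isShared(Map<String, List<String>> edges,
                            List<List<String>> rootsPerEvaluator,
                            String tracked) {
        Set<String> marked = new HashSet<>();     // shared mark set
        Set<String> allSources = new HashSet<>(); // global set, merged after tracing
        for (List<String> roots : rootsPerEvaluator) {
            Set<String> sources = new HashSet<>(); // evaluator-local, no synchronization
            Deque<String> pending = new ArrayDeque<>();
            for (String r : roots) {               // mark-threads: roots are not heap sources
                if (marked.add(r)) pending.add(r);
            }
            while (!pending.isEmpty()) {           // trace
                String s = pending.remove();
                for (String o : edges.getOrDefault(s, Collections.<String>emptyList())) {
                    if (o.equals(tracked)) sources.add(s); // trace-step
                    if (marked.add(o)) pending.add(o);     // mark-object
                }
            }
            allSources.addAll(sources); // in the parallel version this happens under a lock
        }
        return allSources.size() > 1;
    }
}
```

If two heap objects that point to the tracked object are reached by different evaluators, neither local set has more than one entry, and only the merged set reveals that the object is shared.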

6.3.1 Optimizations

We are currently working on various optimizations to our system, including evaluating multiple probes in parallel, concurrent evaluation of probes, and heuristic optimizations via write barriers with techniques similar to those described in [33]. Such an optimized implementation of heap probes and its evaluation remains a topic of future work.

7. Experimental Evaluation

In this section we experimentally evaluate QVM.

7.1 Typestate Monitoring

7.1.1 Methodology

In our experiments we focused on typestate properties that correspond to resource leaks. We monitor leaks of SWT resources and of IO streams. In these experiments the goal was to see if we can detect typestate violations that occur over an extended period of time. It is likely that massive leaks would have been detected and fixed in the testing phase, and therefore what we expect to find in these experiments is mostly a small number of leaks that accumulate over time. For that purpose, we used a range of applications on a regular basis to perform our daily tasks.

Some of the applications considered are an instant messenger (goim), newsfeed readers (feednread, rssowl), file management utilities (virgoftp, jcommander), and large IBM internal applications. For all of these applications our strategy was to simply run them over QVM and record the reported errors. In some of our experiments we investigated each report manually, diagnosed the causes of the errors, and implemented fixes. This was an important exercise for evaluating and refining the debug information we collect (e.g., the typestate history).

Application   SWT Resources   IOStreams   High Frequency   Fixed
azureus       11              0           4                5
etrader       17              0           2                0
feednread     1               7           0                0
goim          3               0           1                3
ibm app 1     0               0           0                0
ibm app 2     3               2           0                0
jcommander    9               0           0                0
juploader     0               1           0                0
nomadpim      2               0           0                0
rssowl        8               3           0                0
tvbrowser     0               5           0                0
tvla          0               4           0                0
virgoftp      6               0           0                6
Total         60              22          7                14

Table 3. Sources of typestate violations in our applications. For every application, we indicate the number of sources that are executed with high frequency (corresponding to critical leaks).

7.1.2 Applications and Results

Table 3 summarizes the number of sources of typestate violations found in our applications. Rather than counting the number of objects that violate the property, we count the allocation sites in which such objects were allocated. This is a more objective measure of the number of bugs in the program than the number of objects exhibiting the violation, which usually depends on the duration of program execution. In order to measure the significance of a violation, we record whether it occurs frequently in the program execution. In some of our experiments we took the effort to investigate the errors and come up with appropriate fixes. The Fixed column in the table reports the number of fixes we have introduced and tested.
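The difference between counting violating objects and counting violation sources can be illustrated with a small sketch. The class name and the site labels "SiteA" and "SiteB" are hypothetical; the sketch only assumes that each violating object can be attributed to the allocation site that created it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: groups violating objects by the (hypothetical) label of their allocation site.
final class ViolationSourcesSketch {

    // Input: one entry per violating object, naming the site that allocated it.
    // Output: number of violating objects per allocation site.
    static Map<String, Integer> objectsPerSite(List<String> siteOfEachViolatingObject) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String site : siteOfEachViolatingObject) {
            counts.merge(site, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three leaked objects from one site and one from another.
        List<String> observed = List.of("SiteA", "SiteA", "SiteA", "SiteB");
        Map<String, Integer> perSite = objectsPerSite(observed);
        System.out.println("violating objects: " + observed.size());
        System.out.println("violation sources: " + perSite.size());
    }
}
```

Running the program longer would add more entries for the same sites, so the object count grows while the number of sources stays the same.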

Azureus Azureus [8] is a Java implementation of the BitTorrent protocol. It supports several modes of user interaction, all implemented using SWT. Azureus is the #1 downloaded Java program from SourceForge, and has more than 160 million downloads to date. Using QVM we were able to detect 11 sources of resource leaks in this application. We fixed 5 of these and reported them to the Azureus development team. The reports were confirmed by the development team, and the fixes were incorporated into the codebase.

At least 4 of the reports correspond to leaks that were occurring rather frequently. One particularly high-frequency case was a method Utils.getFontHeightFromPX(...) that was allocating a Font object in order to compute font height and was not properly disposing of the Font object upon its return. This method is frequently called and resulted in thousands of leaking fonts even for short executions. This method was very likely created by copying another method in the class that has similar functionality but returns the Font object. Among our other fixes, we fixed the frequently leaking method getFontHeightFromPX(...), and our fix was incorporated into the Azureus codebase.
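The shape of this kind of leak, and of a typical fix, can be sketched as follows. The code is not the Azureus source: FakeFont is a hypothetical stand-in for an SWT Font (so that the example runs without SWT), and the two methods only mimic a helper that allocates a resource to compute a number.

```java
// Sketch only: FakeFont is a hypothetical stand-in for a resource that must be
// disposed explicitly because the garbage collector does not release its handle.
final class FontLeakSketch {
    static int liveHandles = 0;           // number of undisposed stand-in fonts

    static final class FakeFont {
        private final int heightPx;
        private boolean disposed;
        FakeFont(int heightPx) { this.heightPx = heightPx; liveHandles++; }
        int height() { return heightPx; }
        void dispose() { if (!disposed) { disposed = true; liveHandles--; } }
    }

    // Leaky shape: the font is only needed to compute a number, but is never disposed.
    static int fontHeightLeaky(int px) {
        FakeFont f = new FakeFont(px);
        return f.height();
    }

    // Fixed shape: dispose in a finally block so the handle is released on every path.
    static int fontHeightFixed(int px) {
        FakeFont f = new FakeFont(px);
        try {
            return f.height();
        } finally {
            f.dispose();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) fontHeightLeaky(12);
        System.out.println("live handles after leaky calls: " + liveHandles);
        liveHandles = 0;
        for (int i = 0; i < 1000; i++) fontHeightFixed(12);
        System.out.println("live handles after fixed calls: " + liveHandles);
    }
}
```

Because the leaky helper returns only a number, no caller ever holds the font, which is why every call leaves one more undisposed object behind.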

Another fix in Azureus required the addition of a dispose listener that properly disposes of an Image object. This leak was not very frequent, but it would leak an image whenever a certain panel was displayed (the image is created in the VivaldiPanel.refreshContacts(...) method).

Eclipse Trader eclipseTrader is an SWT application that provides a framework for building an online stock trading system. eclipseTrader uses a frequently updating UI to present streams of stock information, and as a result may be particularly sensitive to resource leaks. Using QVM, we detected 17 sources of resource leaks.

Our count of violation sources represents a lower bound on the number of places that have to be modified for introducing a fix. This is in part due to the fact that we are counting the number of allocation sites and not the allocation sites in context. When a common method (such as a factory method) is used to create objects that violate a property in many contexts, we only count this as a single violation. Specifically, for eclipseTrader, there are several allocation sites that are used in different contexts. For example, the method Settings.getColor(Color) returns a new Color object, and is used in a large number of contexts that fail to properly dispose of the color. We count this method as a single violation source that occurs with high frequency (there are tens of thousands of leaking objects that are allocated in this method in a typical execution of eclipseTrader). In general, counting the number of violation sources has to be done carefully, as the sources are not necessarily independent. For example, a whole sub-tree of components may leak due to a single missing dispose operation on the parent of the tree.

Feed’N Read feednread is an open source newsfeed reader. In this news reader, the SWT resources are mostly properly managed. There are some resources that are not disposed before the program exits, but these are resources that are supposed to be live throughout program execution by design. Although QVM reports these as violations, we do not count them here as violation sources because this seems to be acceptable treatment of such resources (resources will be returned to the OS anyway when the application terminates). feednread seems to have some minor problems in properly closing IO streams when managing archived feeds.

GOIM GOIM [1] is an Instant Messaging client based on the open source Jabber/XMPP protocol. We used GOIM running on QVM to communicate between team members for a few days. Over the course of our evaluation, we detected 3 sources of leaks and introduced fixes to all of them. We tested our fixed version of GOIM and confirmed that all previously reported leaks have been resolved.

The fixes we introduced in GOIM were rather involved, as we had to add new disposal code in places where no such code existed. Our fixes therefore involved introducing new dispose methods as well as making sure that calls to these methods are propagated properly.

IBM Applications We used QVM to run a development version of a large-scale IBM product on a daily basis for a period of a few weeks. For this application, no problems were reported by QVM. This is not surprising, as the development team puts a lot of emphasis on preventing the kind of leaks we are tracking.

We used QVM to run a development version of another, smaller IBM tool that makes heavy use of SWT. For this application, we found 5 sources of violations. The leaks are associated with user actions like opening a new file.

JCommander JCommander is a multi-platform file manager. For this application we found 9 sources of violations.

JUploader JUploader is a small application that uploads images to Flickr. Its UI is very basic and only involves a few SWT resources. For this application we found a single source of leaks causing the rather frequent leak of EventOutputStream objects.

Nomad PIM Nomad PIM is a personal information manager. It has a rather involved SWT interface. For nomad we found 2 sources of violations.

RSS Owl RSSOwl is an RSS newsreader. Running RSSOwl on QVM, we find 8 sources of SWT leaks and 3 sources of IO Streams leaks.

TV Browser TV Browser is an electronic program guide. For this application we found 5 sources of leaking streams.

TVLA TVLA [26] is a parametric program analysis framework. Running TVLA with QVM we find two input streams that are not closed by the parser processing input files, and two streams that are not closed when producing the analysis output. These are very low frequency leaks that only create one leaking object per execution of the analysis engine.

VirgoFtp VirgoFTP is a multi-platform, graphical FTP client written in Java using SWT. For this application QVM reported 6 sources of leaks. We introduced fixes to all of these leaks, and tested that the fixed version resolves them.

One source of a low-frequency leak in VirgoFTP is a typical pattern that repeats across many SWT applications. Changing the color/font preferences in an application often causes the leak of the previous colors/fonts used. These kinds of leaks occur at such a low frequency that programmers are very likely choosing to ignore disposal of resources in this case. Fixing this simple problem in VirgoFTP was rather complicated because the code was completely unprepared for handling these leaks. In order to fix these leaks we had to employ a rather significant refactoring of the code.

7.1.3 Overhead Evaluation

Methodology For overhead measurements we use the SPECjvm98 and DaCapo benchmark suites (see footnote 1). The benchmarks were configured to run for roughly one minute to create a reasonable usage scenario, and total time was measured. 20 runs of each benchmark were used to reduce noise.

We created a set of representative typestate properties that incur a significant overhead. We instrumented classes such as Java Collections, Enumerations, Vectors, and Streams.

Results Figure 11 reports the overhead of the typestate monitoring client when applied to our benchmark suite with a range of overhead budgets (5%, 10%, and 20%). The rightmost bar for each benchmark shows the overhead when the typestate client is applied exhaustively, i.e., without sampling. The leftmost bar shows the base overhead, which represents the base checking overhead that is incurred when no sampling takes place (see Section 4.1).

The overhead incurred when checking these typestate properties exhaustively is high (up to 10x slowdown, with 7 of the benchmarks over 2x slowdown). Heavyweight properties that introduce frequent callbacks were selected intentionally to allow us to evaluate the effectiveness of the sampling infrastructure.

The base overhead (leftmost bar) is low, at most 2.5%. A low base overhead is critical, as it is the lowest overhead that can be achieved when sampling is disabled.

The middle three bars show the overhead incurred when QVM was run with a specific overhead budget. Although there is some fluctuation in the overhead achieved, it is generally quite close to the requested budget. Achieving accuracy at this level is quite challenging because the whole process takes place online and within a single execution of the benchmark. These results demonstrate not only the overhead monitor’s ability to measure the overhead introduced, but the overhead controller’s ability to keep the overhead close to the desired budget.

Figure 12 shows an example of the overhead manager adapting the overhead of the typestate client online for the javac benchmark and a 10% overhead budget. The x-axis shows time in seconds, and the y-axis shows percent overhead, as measured online by the QVM overhead monitor. The spike around 0.5 seconds occurs because there is some lag before the overhead monitor can react and reduce the sample rates. However, once the controller throttles the tagged objects at the hot allocation sites, the overhead converges on the desired budget of 10%.
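The feedback behavior described above, an initial spike followed by convergence on the budget, can be reproduced with a toy controller. This is a sketch under loudly stated assumptions and not QVM's overhead manager: the multiplicative update rule, the single global sample rate, and the linear cost model (a fixed base cost plus a cost proportional to the sample rate) are all invented for illustration. The 2.5% and 970% figures are borrowed from this section only as plausible inputs.

```java
// Sketch only: a hypothetical proportional controller for a sampling rate.
final class OverheadControllerSketch {
    private final double budgetPercent;
    private double sampleRate = 1.0;      // fraction of allocations that get tracked

    OverheadControllerSketch(double budgetPercent) { this.budgetPercent = budgetPercent; }

    double sampleRate() { return sampleRate; }

    // One control step: scale the sample rate by budget / measured overhead.
    void update(double measuredOverheadPercent) {
        if (measuredOverheadPercent <= 0.0) return;   // nothing measured yet
        double next = sampleRate * budgetPercent / measuredOverheadPercent;
        sampleRate = Math.max(0.0, Math.min(1.0, next));
    }

    // Toy cost model (assumption): fixed base cost plus cost proportional to the sample rate.
    static double simulatedOverhead(double rate, double basePercent, double exhaustivePercent) {
        return basePercent + rate * exhaustivePercent;
    }

    // Runs the controller against the toy model and returns the final measured overhead.
    static double run(double budget, double base, double exhaustive, int steps) {
        OverheadControllerSketch c = new OverheadControllerSketch(budget);
        double measured = simulatedOverhead(c.sampleRate(), base, exhaustive);
        for (int i = 0; i < steps; i++) {
            c.update(measured);
            measured = simulatedOverhead(c.sampleRate(), base, exhaustive);
        }
        return measured;
    }

    public static void main(String[] args) {
        System.out.printf("overhead after 20 steps: %.2f%%%n", run(10.0, 2.5, 970.0, 20));
    }
}
```

The first measurement in this model is far above the budget because the controller starts by tracking everything, which loosely mirrors the lag-induced spike in Figure 12. When the exhaustive cost is already below the budget, the rate stays at 1.0 and every object is tracked.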

1 Jython and xalan were excluded from the study because they do not run properly on the developmental version of the VM used for this work (independent of the QVM modifications).

Figure 12. Overhead over time

Figure 13. Allocation Site Coverage: Percentage of allocation sites (of tracked types) that allocate at least one tracked object.

The goal of QVM is not just to have low overhead, but to collect as much useful information as possible within the overhead budget. The sampling strategy employed by the overhead manager (see Section 4.2) strives to distribute the samples across the allocation sites in the program, to help find bugs that may occur in cold code. Figure 13 compares the coverage of allocation sites achieved with a 5% budget when using origin-specific sampling, as well as global sampling, where all sites are sampled equally. Origin-specific sampling enables nearly 100% coverage for all benchmarks, while global sampling misses a significant percentage of the allocation sites for at least half of the benchmarks.
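Why a single global sampling counter can miss cold allocation sites, while per-site counters do not, can be seen in a small deterministic sketch. The counter-based policies below are assumptions chosen for simplicity and are not the sampling strategy of Section 4.2; the site names "hot" and "cold" are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: compares which allocation sites get at least one tracked object.
final class SamplingCoverageSketch {

    // Global sampling: one counter for the whole program; every period-th allocation is tracked.
    static Set<String> globalCoverage(List<String> allocationTrace, int period) {
        Set<String> covered = new HashSet<>();
        int counter = 0;
        for (String site : allocationTrace) {
            if (++counter % period == 0) covered.add(site);
        }
        return covered;
    }

    // Origin-specific sampling: one counter per site; the first allocation at each
    // site is tracked, and then every period-th allocation after it.
    static Set<String> originSpecificCoverage(List<String> allocationTrace, int period) {
        Set<String> covered = new HashSet<>();
        Map<String, Integer> perSite = new HashMap<>();
        for (String site : allocationTrace) {
            int seen = perSite.merge(site, 1, Integer::sum);
            if ((seen - 1) % period == 0) covered.add(site);
        }
        return covered;
    }

    public static void main(String[] args) {
        // Nine allocations at a hot site and a single allocation at a cold site.
        List<String> trace = List.of("hot", "hot", "cold", "hot", "hot",
                                     "hot", "hot", "hot", "hot", "hot");
        System.out.println("global:          " + globalCoverage(trace, 5));
        System.out.println("origin-specific: " + originSpecificCoverage(trace, 5));
    }
}
```

Note that the two policies in this sketch do not track the same number of objects, so it illustrates coverage only and says nothing about keeping the overhead equal between them.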

QVM uses sampling to reduce overhead, so there is no expectation that all objects will be tracked; however, in many cases the sampling mechanism allows the dynamic number of tracked objects to be significantly higher than one might anticipate. Table 4 reports the percent of objects allocated (of the tracked types) that are sampled to be tracked by the typestate monitor.

Figure 11. Overhead with budget

             Overhead Budget
Benchmark    1%    2%    5%    10%   20%   50%   100%
db           100   100   100   100   100   100   100
mpegaudio    98    100   100   100   100   100   100
jess         63    76    85    87    95    100   100
jack         22    37    45    52    71    100   100
javac        0.4   1     4     9     31    41    49
compress     100   100   100   100   100   100   100
mtrt         39    46    66    83    90    93    94
antlr        13    19    34    68    67    92    98
eclipse      4     7     12    28    44    66    67
luindex      5     51    79    97    99    99    100
hsqldb       7     13    16    30    43    31    75
chart        40    64    85    88    93    94    97
fop          47    70    42    66    100   100   100
bloat        100   100   100   100   100   100   100
pmd          81    99    99    99    99    100   100

Table 4. Object Coverage: Percent of allocated objects (of tracked types) that are selected by QVM for typestate monitoring.

Consider the program javac. Previously in Figure 11 we saw that our example set of typestate properties introduces overhead of around 970% when checked exhaustively. However, Table 4 shows that with an overhead budget of 100% slowdown (more than a factor of 9 less than the exhaustive slowdown), 49% of the objects allocated (of tracked types) were still selected for tracking. This can be explained by a relatively small number of objects contributing significantly to the overhead; once sampling at their allocation sites is throttled, the number of remaining allocations that can be tracked within the overhead budget may be large.

Some benchmarks (db, compress, bloat) report 100% for all overhead budgets because their exhaustive overhead for the typestate properties we selected is below 1% (see Figure 11).

7.1.4 Discussion

Wrapper Streams For a large number of applications QVM reports violations of stream types that do not hold real resources but violate the contract of the InputStream and OutputStream API specification. An example that is widely reported by QVM is the LEDataInputStream from the package swt.internal.image. This stream is a wrapper around an InputStream and is often not closed because closing the wrapper closes the underlying InputStream. In many cases, the underlying stream outlives the wrapper stream and is therefore closed directly without ever invoking close() on the wrapper stream.

In addition, streams such as ByteArrayInputStream and ByteArrayOutputStream are simply wrappers around a byte array. Invoking close on such streams has no effect (although it is required by the streams API in principle), and programmers therefore avoid this redundant method call.

We do not consider these to be real violations and do not include them in our QVM reports.
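The claim that close() is a no-op on the byte-array streams can be checked directly against the standard library. The following sketch uses only java.io classes; the class and method names of the sketch itself are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch only: shows that the byte-array streams remain usable after close().
final class ByteArrayCloseSketch {

    // Closes the stream first and then reads from it.
    static int readAfterClose(byte value) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {value});
        try {
            in.close();                       // declared to throw, but has no effect
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.read();                     // still returns the first byte
    }

    // Closes the stream first and then writes one byte to it.
    static int sizeAfterCloseAndWrite() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            out.close();                      // declared to throw, but has no effect
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out.write(7);
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println("read after close: " + readAfterClose((byte) 42));
        System.out.println("size after close and write: " + sizeAfterCloseAndWrite());
    }
}
```

A typestate property that requires close() on every stream therefore flags these objects even though no resource is held, which is why such reports are filtered out.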

Library Objects vs. Application Objects Our initial specification for SWT resources was not the one shown in Fig. 6. Our initial specification required that dispose() be invoked on every SWT Widget, as this is the public method that application code can invoke to dispose of a resource. However, in SWT, widgets are arranged into an ownership structure in which a widget may have a parent that is responsible for its disposal. When the parent is disposed, it disposes all of its children, but instead of invoking the (public) method dispose to do so, it directly calls the (protected) internal method release. We therefore had to refine our specification to be aware of the internal library implementation and the fact that an SWT widget could also be released by an invocation of release that originates in library code.

Additional refinement of the specification is required to avoid objects that are allocated in the library for internal library use and whose lifetime is not managed (and should not be managed) by the application. For example, Font objects allocated by the static method Font.gtk_new() are managed by the library.
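The refinement described above amounts to accepting two different events as the transition that ends a widget's lifetime. A minimal sketch of such a monitor follows. It is not the specification of Fig. 6 and not QVM's typestate client: the class, its callback methods, and the representation of events as method-name strings are hypothetical, and widgets are modeled as plain objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a toy typestate monitor in which either dispose or release
// moves a tracked object out of the "allocated" state.
final class WidgetTypestateSketch {
    // Tracked objects still in the "allocated" state, with their allocation site.
    private final Map<Object, String> undisposed = new IdentityHashMap<>();

    void onAllocate(Object widget, String allocationSite) {
        undisposed.put(widget, allocationSite);
    }

    // The refined property accepts the public dispose and the internal release.
    void onCall(Object widget, String methodName) {
        if (methodName.equals("dispose") || methodName.equals("release")) {
            undisposed.remove(widget);
        }
    }

    // Allocation sites of objects that never reached the disposed state.
    List<String> leakingSitesAtExit() {
        List<String> sites = new ArrayList<>(undisposed.values());
        Collections.sort(sites);
        return sites;
    }

    public static void main(String[] args) {
        WidgetTypestateSketch monitor = new WidgetTypestateSketch();
        Object child = new Object();
        Object orphan = new Object();
        monitor.onAllocate(child, "siteChild");
        monitor.onAllocate(orphan, "siteOrphan");
        monitor.onCall(child, "release");      // released by its parent inside the library
        System.out.println(monitor.leakingSitesAtExit());
    }
}
```

Under the initial specification, which only recognized dispose, the child in this example would have been reported as a leak as well.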

7.2 Assertions and Heap Probes

Evaluating local assertions and heap probes on realistic benchmarks is a nontrivial task, as it requires that we devise meaningful assertions for each benchmark. So far, we have evaluated assertions and heap probes on a number of synthetic benchmarks and demonstrated that the overhead manager works as expected for these benchmarks. Since these are synthetic benchmarks, the measured numbers are rather arbitrary and we therefore do not report them here.

We have also evaluated heap probes in a single benchmark, SPECjbb2005. For this benchmark, heap probes were inserted on fairly frequently executed instructions and thus, when run exhaustively, caused significant slowdowns (on the order of 100x). However, when running the system with an overhead budget of 10%, the overhead manager successfully achieved an overhead of 10.5% by sampling the heap probes. Furthermore, with 10% overhead, QVM provided 100% coverage of the probe sites.

8. Related Work

Aspects and Monitoring Dynamic tools such as Tracematches [3] and MOP [12] are able to detect violations of typestate properties, and in particular detect resource leaks. For example, in [12], JavaMOP was used to successfully detect a number of resource leaks in Eclipse. These tools extend aspect-oriented programming with the ability to specify declarative patterns against the history of the program, rather than against single events as in traditional aspects. Optimizing the performance of code generated from these declarative specifications is a challenging task and is currently an active area of research. In [7], the authors concentrate on dynamic optimizations that only consider the specified declarative pattern and not the program on which it is applied. Such optimizations include avoidance of memory leaks and better representation of the typestate automata. Alternatively, in [10], the authors take the program into account and perform static optimizations, e.g., removing unnecessary instrumentation points from the program. Unfortunately, despite these optimizations, there are cases where the overhead is still unacceptable for some properties. In [9], the authors propose two techniques: spatial and temporal partitioning. In the first optimization, assuming multiple users of the application, the instrumentation points are partitioned into sets, optimizing the per-user overhead. However, it is still possible to partition the points in a way that some set has a hot point. The second optimization spawns a monitoring thread which can switch the instrumentation on and off at various times. The intervals defining when the point should be on or off are predetermined off-line and given to the thread as a parameter. It seems that our approach of automatically adjusting the overhead online for a particular set of control sites will be beneficial to the second optimization.

Sampling for Scalable Monitoring Previous work has focused on low overhead techniques for sampling instrumentation [4] and collecting such profiles in bursts [14]. However, these techniques turn sampling on and off based on time or code execution frequency, and do not support a technique such as our object-centric sampling.

In the cooperative bug isolation (CBI) project [27], the overhead of monitoring program execution is mitigated by using sparse random sampling and collecting information from a large number of users exercising the code. Collaborative techniques could be combined into QVM to collect application errors from a wider group of users. We believe that the ubiquity of QVM provides a natural channel for wider adoption of CBI-based techniques.

Typestate Verification and Static Leak Detection A number of sound static tools target detection or prevention of memory and resource leaks [23, 15, 16, 21, 19, 35]. Some tools specifically target detection of SWT resource leaks [28], and others target automatic generation of resource management code [17]. In principle, most of these approaches are capable of detecting cases where an object is leaked or double disposed. In practice, however, these approaches do not scale to industrial-sized applications, and produce a large percentage of false alarms. In addition, some of these approaches either require additional (potentially cumbersome) annotations or restrict the class of programs that may be written, e.g., by restricting aliasing [15, 21].

Heap Properties Mitchell [30] provides concise and informative summaries of real-world heap graphs arising in production applications. The summaries are done offline and follow a set of useful heuristic patterns for summarizing graphs. In contrast, our goal is to check various user-specified heap properties online. Subsequent work by Mitchell and Sevitsky [31] studies offline heap snapshots with the goal of finding inefficiencies in memory usage enforced by a particular program design.

Chilimbi et al. [13] provide a two-stage framework suitable for testing, where in the first stage a set of likely heap invariants based on node degree are computed at a small number of program points. Then the instrumented program is executed and checked against these invariants, and a bug is reported if a deviation is observed.

Various works have relied on the garbage collector to find memory leaks. Jump et al. [24] use the collector to help in suggesting potential leaks. Bond et al. [11] study efficient leak detection for Java. Similarly to us, they make use of available bits in the object header and the adaptive profiling techniques from [22] applied on object use sites, in order to reduce the space and time overheads. We see these advances as potential QVM clients, which could manage the overall overhead for them. In a recent paper by Aftandilian et al. [2], the authors suggest the idea of piggybacking on an existing garbage collector in order to check various heap properties. They propose two of the assertions we consider here, namely isShared and isObjectOwned, but have not implemented these assertions and hence have not had the chance to study the wide cla
