Parameterized Object Sensitivity for
Points-to Analysis for Java
Presented by: Anand Bahety and Dan Bucatanschi
Presentation Roadmap
• Introduction
• Terms and Definitions
• Application of previous techniques to OOP
• Imprecision analyzed
• Object Sensitive analysis and its advantages
• Parameterized Object Sensitivity
Introduction
• Points-to analysis: a static analysis for Java that determines the set of objects a reference variable or a reference object field may point to
• Goal
• Advantages
Terms and Definitions
• Side-effect analysis
• Def-use analysis
• Flow sensitive & flow insensitive
• Context sensitive & context insensitive
• Object sensitivity
Sample points-to graph
Object Oriented Programming
• Encapsulation
• Inheritance
• Collection (Containers)…
Let's try to analyze these features using flow-insensitive and context-insensitive analysis
Semantics
• R – set of all reference variables
• O – set of all objects created at object allocation sites
• F – contains all instance fields in program class
• Edge (r, oi) ∈ R × O
• (<oi, f>, oj) ∈ (O × F) × O
• Transfer functions
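The sets and edges above can be sketched as a small flow- and context-insensitive solver. This is a minimal illustration, not the presenters' implementation: the mini-IR (the `Stmt` class with `NEW`, `COPY`, `STORE`, `LOAD` kinds), the class name `AndersenSketch`, and the `"o.f"` key encoding are all assumptions made for this example.

```java
import java.util.*;

// Hypothetical mini-IR; all names here are illustrative, not from the paper.
class AndersenSketch {
    enum Kind { NEW, COPY, STORE, LOAD }

    static final class Stmt {
        final Kind kind;
        final String lhs, field, rhs;
        Stmt(Kind kind, String lhs, String field, String rhs) {
            this.kind = kind; this.lhs = lhs; this.field = field; this.rhs = rhs;
        }
    }

    // Edges (r, o) in R x O, stored as r -> {o, ...}.
    final Map<String, Set<String>> varPts = new HashMap<>();
    // Edges (<o, f>, o') in (O x F) x O, stored under the key "o.f".
    final Map<String, Set<String>> fieldPts = new HashMap<>();

    Set<String> pt(String v) {
        return varPts.computeIfAbsent(v, k -> new TreeSet<>());
    }

    Set<String> pt(String o, String f) {
        return fieldPts.computeIfAbsent(o + "." + f, k -> new TreeSet<>());
    }

    // Applies every transfer function repeatedly until no points-to set grows.
    void solve(List<Stmt> program) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Stmt s : program) {
                switch (s.kind) {
                    case NEW:   // lhs = new ... (allocation site named rhs)
                        changed |= pt(s.lhs).add(s.rhs);
                        break;
                    case COPY:  // lhs = rhs
                        changed |= pt(s.lhs).addAll(new ArrayList<>(pt(s.rhs)));
                        break;
                    case STORE: // lhs.field = rhs
                        for (String o : new ArrayList<>(pt(s.lhs)))
                            changed |= pt(o, s.field).addAll(pt(s.rhs));
                        break;
                    case LOAD:  // lhs = rhs.field
                        for (String o : new ArrayList<>(pt(s.rhs)))
                            changed |= pt(s.lhs).addAll(new ArrayList<>(pt(o, s.field)));
                        break;
                }
            }
        }
    }

    // Example program: x = new (o1); y = new (o2); x.f = y; z = x; w = z.f
    static List<Stmt> example() {
        return Arrays.asList(
            new Stmt(Kind.NEW, "x", null, "o1"),
            new Stmt(Kind.NEW, "y", null, "o2"),
            new Stmt(Kind.STORE, "x", "f", "y"),
            new Stmt(Kind.COPY, "z", null, "x"),
            new Stmt(Kind.LOAD, "w", "f", "z"));
    }

    public static void main(String[] args) {
        AndersenSketch a = new AndersenSketch();
        a.solve(example());
        System.out.println(a.varPts + " " + a.fieldPts);
    }
}
```

Because the statements are treated as an unordered set and iterated to a fixed point, the result is flow-insensitive; because each variable has a single points-to set, it is also context-insensitive.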
Encapsulation
[Figure: points-to graph for the encapsulation example. Variables x1, x2, y1, y2, x, and this; objects O1 to O4; field edges labeled f.]
Inheritance
[Figure: points-to graph for the inheritance example. Variables y, z, b, c, this, A.xa, B.xb, and C.xc; objects O1 to O4; field edges labeled f.]
Imprecision
• Encapsulation
• Inheritance
– Both of these are strong concepts of OOP
– But they are not captured properly with the old techniques
– The solution is object sensitivity
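To make the encapsulation case concrete, here is a small Java program of the kind that likely underlies the encapsulation graph. It is a reconstruction under assumptions: the class names `Holder` and `Item`, the setter `set`, and the object labels in the comments are hypothetical.

```java
// Hypothetical example program; class and method names are illustrative.
class EncapsulationDemo {
    static class Item {}

    static class Holder {
        private Item f;
        // A context-insensitive analysis keeps one `this` and one `x` for all calls.
        void set(Item x) { this.f = x; }
        Item get() { return this.f; }
    }

    // Returns true if each holder sees only the item stored through its own setter call.
    static boolean run() {
        Holder x1 = new Holder();   // allocation site o1
        Holder x2 = new Holder();   // allocation site o2
        Item y1 = new Item();       // allocation site o3
        Item y2 = new Item();       // allocation site o4
        x1.set(y1);
        x2.set(y2);
        // At run time o1.f refers only to o3, and o2.f only to o4.
        // A context-insensitive analysis merges both calls to set:
        //   this -> {o1, o2}, x -> {o3, o4}
        // and therefore reports o1.f -> {o3, o4} and o2.f -> {o3, o4}.
        return x1.get() == y1 && x2.get() == y2;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The two spurious field edges come only from merging the formal parameters of `set` across receivers, which is the imprecision that object sensitivity targets.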
Object Sensitivity
• Revised semantics
– O′: set of all object names
– R′: set of replicas of reference variables
– Relation α(C, m)
– Set of new transfer functions
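The effect of replicating reference variables per receiver object can be sketched on a two-call setter example. This is an illustrative sketch only: the class `ObjectSensitiveSketch`, the `replica` helper, and the `v^o` naming of replicas are assumptions, and the analyzed program is hard-coded rather than parsed.

```java
import java.util.*;

// Illustrative sketch only; class and method names are hypothetical.
class ObjectSensitiveSketch {
    // Points-to sets keyed by variable name, replica name, or "o.f" for fields.
    final Map<String, Set<String>> pts = new TreeMap<>();

    Set<String> pt(String key) {
        return pts.computeIfAbsent(key, k -> new TreeSet<>());
    }

    // Replica of variable v for context c; the empty context means "not replicated".
    static String replica(String v, String c) {
        return c.isEmpty() ? v : v + "^" + c;
    }

    // Analyzes: x1 = new (o1); x2 = new (o2); y1 = new (o3); y2 = new (o4);
    //           x1.set(y1); x2.set(y2);   where set(x) { this.f = x; }
    static Map<String, Set<String>> analyze(boolean objectSensitive) {
        ObjectSensitiveSketch a = new ObjectSensitiveSketch();
        a.pt("x1").add("o1");
        a.pt("x2").add("o2");
        a.pt("y1").add("o3");
        a.pt("y2").add("o4");

        String[][] calls = { { "x1", "y1" }, { "x2", "y2" } };
        Set<String> contexts = new LinkedHashSet<>();

        // Parameter passing: bind `this` and `x` in the context of each receiver object.
        for (String[] call : calls) {
            for (String o : new ArrayList<>(a.pt(call[0]))) {
                String c = objectSensitive ? o : "";
                contexts.add(c);
                a.pt(replica("this", c)).add(o);
                a.pt(replica("x", c)).addAll(a.pt(call[1]));
            }
        }

        // Method body `this.f = x`, processed once per context.
        for (String c : contexts) {
            for (String o : new ArrayList<>(a.pt(replica("this", c)))) {
                a.pt(o + ".f").addAll(a.pt(replica("x", c)));
            }
        }
        return a.pts;
    }

    public static void main(String[] args) {
        System.out.println(analyze(false));
        System.out.println(analyze(true));
    }
}
```

With `objectSensitive` set to false, both receivers share one `this` and one `x`, so each field ends up pointing to both argument objects; with it set to true, each receiver object gets its own replicas and the field sets stay separate.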
Context sensitivity included
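• B.this^o3, B.xb^o3, A.xa^o3; C.this^o4, C.xc^o4, A.xa^o4
[Figure: object-sensitive points-to graph for the inheritance example (variables y, z, b, c; objects O1 to O4; field edges labeled f), shown alongside the older representation.]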
Advantages
• Models OOP features
• Distinguishes between different receiver objects
• Static methods and variables can be handled context-insensitively
• Can be parameterized
Parameterized Object Sensitivity
• Two dimensions
– Degree of precision in the naming scheme (object names such as o21, o31)
– Set R* of reference variables for which multiple points-to sets should be maintained
Implementation and Performance
• Techniques for implementation and optimization
• Side-effect analysis (MOD)
• Def-Use analysis
• Empirical Results
• Conclusions
• Future Work
Techniques for Implementation
• Typical implementation of flow- and context-insensitive analysis (Andersen’s technique):
– Statement processing routine: processes different kinds of program statements
– Virtual dispatch routine: models the semantics of virtual calls
Techniques for Implementation
• Implementation of parameterized object-sensitive analysis:
– Implement function map(v, c)
– Process each statement once for every possible context
– Augment the virtual dispatch routine to map the return variable and the formal parameters of the invoked method to the corresponding context
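A minimal sketch of what map(v, c) could look like, assuming that variables in R* get one copy per context and all other variables are left unchanged. The class `ContextMapSketch`, the `v^c` naming of context copies, and the helper `instantiateCopy` are hypothetical and only serve to show a statement being processed once per context.

```java
import java.util.*;

// Illustrative sketch; names and the string encoding of statements are assumptions.
class ContextMapSketch {
    // R*: variables for which one points-to set per context is maintained.
    final Set<String> replicated;

    ContextMapSketch(Set<String> replicated) {
        this.replicated = replicated;
    }

    // map(v, c): the context copy of v if v is in R*, otherwise v itself.
    String map(String v, String c) {
        return replicated.contains(v) ? v + "^" + c : v;
    }

    // Instances of the statement "l = r" over all contexts.
    // Identical instances collapse, so a statement without replicated
    // variables yields a single instance.
    Set<String> instantiateCopy(String l, String r, List<String> contexts) {
        Set<String> out = new LinkedHashSet<>();
        for (String c : contexts) {
            out.add(map(l, c) + " = " + map(r, c));
        }
        return out;
    }

    public static void main(String[] args) {
        ContextMapSketch m = new ContextMapSketch(new HashSet<>(Arrays.asList("this", "p")));
        System.out.println(m.instantiateCopy("tmp", "p", Arrays.asList("o1", "o2")));
        System.out.println(m.instantiateCopy("tmp", "g", Arrays.asList("o1", "o2")));
    }
}
```

Choosing which variables go into the replicated set is the second parameter of the framework; an empty set degenerates to the context-insensitive analysis.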
Techniques for Optimization
• The points-to set of a replica this^o is {o}.
• Suppose statement s contains only nonreplicated variables (i.e., variables that are not in the R* set); then s is analyzed only once, for one “default” context.
• Similarly, if l ∈ R* but r ∉ R*, and l is assigned only at statements of the form:
– l = r
– l = r.f
Techniques for Optimization
• Suppose l ∈ R* and p ∉ R*.
• Consider the assignments: l.f=p, p=l, p=l.f, and p.f=l.
• We can add a nonreplicated variable l’ and a new (context-dependent) statement l’=l.
• Then the points-to set of l’ is the union of the points-to sets of all context copies of l.
• So the statements can be analyzed context-independently.
Side-effect Analysis (MOD)
• Goal:
– For each statement s and context c of the method enclosing s, compute the set Mod(s, c) of objects that could be modified by executing s when in c.
– Also, MMod(m, c) is the set of objects that could be modified by each contextual version of a method m.
• The previous optimizations can be applied.
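For an instance field assignment p.f = q, one plausible reading of the goal above is that Mod(s, c) is the points-to set of the context copy of p. The sketch below encodes only that case, under that assumption; the class `ModSketch`, its fields, and the `v^c` replica naming are hypothetical, and virtual and static calls are not modeled.

```java
import java.util.*;

// Illustrative sketch; assumes Mod(s, c) = Pt(map(p, c)) for s: p.f = q.
class ModSketch {
    // Points-to sets for variables and context copies of variables.
    final Map<String, Set<String>> pts = new HashMap<>();
    // R*: variables that have one copy per context.
    final Set<String> replicated = new HashSet<>();

    String map(String v, String c) {
        return replicated.contains(v) ? v + "^" + c : v;
    }

    // Objects that may be modified by "p.f = q" when executed in context c.
    Set<String> modFieldAssign(String p, String c) {
        Set<String> empty = Collections.emptySet();
        return new TreeSet<>(pts.getOrDefault(map(p, c), empty));
    }

    public static void main(String[] args) {
        ModSketch m = new ModSketch();
        m.replicated.add("this");
        m.pts.put("this^o1", new TreeSet<>(Arrays.asList("o1")));
        m.pts.put("this^o2", new TreeSet<>(Arrays.asList("o2")));
        System.out.println(m.modFieldAssign("this", "o1"));
    }
}
```

If `this` were not replicated, its single points-to set would contain both receiver objects and the same statement would be reported as modifying both.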
Side-effect Analysis (MOD)
[Figure: Mod(s, c) definitions for instance field assignments, virtual method calls, and static method calls. Slide annotation: "Typo: should be c".]
Def-Use Analysis
• Goal: compute def-use associations between pairs of statements.
• A def-use association for a memory location l is a pair of statements (m, n) such that m assigns a value to l and subsequently n uses that value.
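As a small illustration of the definition above, the following sketch computes def-use associations for straight-line code only, where the definition that reaches a use is simply the most recent assignment to the same location. The class `DefUseSketch`, the `Stmt` encoding, and the `"m->n:loc"` output format are assumptions; branches, loops, and calls would need a full reaching-definitions analysis.

```java
import java.util.*;

// Illustrative sketch for straight-line code; names are hypothetical.
class DefUseSketch {
    static final class Stmt {
        final String def;          // location assigned by the statement, or null
        final List<String> uses;   // locations read by the statement
        Stmt(String def, String... uses) {
            this.def = def;
            this.uses = Arrays.asList(uses);
        }
    }

    // Returns associations as "m->n:loc": statement m defines loc, statement n uses that value.
    static List<String> defUsePairs(List<Stmt> code) {
        Map<String, Integer> lastDef = new HashMap<>();
        List<String> pairs = new ArrayList<>();
        for (int n = 0; n < code.size(); n++) {
            Stmt s = code.get(n);
            // Uses are resolved before the statement's own definition takes effect.
            for (String u : s.uses) {
                Integer m = lastDef.get(u);
                if (m != null) pairs.add(m + "->" + n + ":" + u);
            }
            if (s.def != null) lastDef.put(s.def, n);
        }
        return pairs;
    }

    // 0: a = 1;  1: b = a;  2: a = b;  3: c = a
    static List<Stmt> example() {
        return Arrays.asList(
            new Stmt("a"),
            new Stmt("b", "a"),
            new Stmt("a", "b"),
            new Stmt("c", "a"));
    }

    public static void main(String[] args) {
        System.out.println(defUsePairs(example()));
    }
}
```

In the example, statement 3 is paired with statement 2 rather than statement 0, because the second assignment to `a` overwrites the first before the use.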
Standard Def-Use Analysis
• For procedural languages, there are well-known methods for computing intraprocedural and interprocedural associations.
• We need a pointer analysis to disambiguate indirect definitions and uses.
• Reaching definitions (RD) analysis is needed to determine the sets of definitions that may reach a program statement (because of the use of pointers), in order to identify def-use pairs.
Object Sensitivity in Def-Use Analysis
• Points-to analysis must be used in order to determine which objects may be accessed by expressions of the form p.f.
• ∀ oi ∈ Pt(p), the memory location oi.f is added to the DEF or USE set for the corresponding statement.
• MDEF(m) contains definitions created in method m and in all direct and indirect callees of m.
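The expansion of an access p.f into heap memory locations can be sketched in a few lines. The class `HeapLocationSketch`, the method `locations`, and the `"o.f"` string encoding are hypothetical; the caller is assumed to supply Pt(p) from a points-to analysis.

```java
import java.util.*;

// Illustrative sketch; names and the "o.f" encoding are assumptions.
class HeapLocationSketch {
    // For an access p.f, returns { o.f | o in Pt(p) }.
    // The result goes into DEF for "p.f = ..." and into USE for "... = p.f".
    static Set<String> locations(Set<String> ptOfP, String f) {
        Set<String> out = new TreeSet<>();
        for (String o : ptOfP) {
            out.add(o + "." + f);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> ptOfP = new TreeSet<>(Arrays.asList("o1", "o2"));
        System.out.println(locations(ptOfP, "f"));
    }
}
```

A smaller Pt(p) directly yields fewer candidate locations, so a more precise points-to analysis reduces the number of spurious def-use pairs.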
Standard Def-Use Analysis
[Figure: equations for the DEF set, the direct and indirect DEF set, the Reaching Definitions set broken down by type of node (statement), and the DEF-USE pairs.]
Implementations
• Parameterized object-sensitive points-to analysis (context depth = 1):
– ObjSens1: keeps context-sensitive information for implicit parameters this and formal parameters of instance methods and constructors.
– ObjSens2: the same as ObjSens1, but it also keeps track of return variables.
Implementations
• Context-sensitive analysis based on the call string approach to context sensitivity, with call string length k = 1 (CallSite).
• Distinguishes context per call site.
• To allow for comparison, the context replication is performed for this, formal parameters, and return variables in instance methods and constructors.
Implementations
• The 3 context-sensitive analyses were built on top of an existing implementation of Andersen’s context-insensitive points-to analysis (And).
• The analyses use the optimization techniques we discussed.
• The Soot framework was used to process Java bytecode and to build a typed intermediate representation.
Characteristics of Programs
Analysis Cost
Discussion
• Time and memory cost is comparable to Andersen’s analysis.
• Amount of work is similar: And has to consider all possible objects for a statement s. Even though context-sensitive analyses do more work to keep track of different contexts, they eventually end up doing less work per statement s.
• For the majority of programs, adding the return values to R* does not increase cost.
Discussion
• Call string context-sensitive analysis (CallSite) achieves practical cost.
• CallSite has poor running time for larger programs, probably because it is less precise than ObjSens2.
MOD Analysis Implementation
• Measurements of ObjSens2, CallSite, and And.
• Percentages are with respect to the number of statements that modify at least one object.
• Each column shows the percentage of the total number of statements that modify the respective number of objects.
• More precise analyses produce a smaller percentage number.
MOD Analysis Precision
Conclusions
• Presented a framework for parameterized object-sensitive points-to analysis, and side-effect and def-use analyses based on it.
• Object-sensitive analysis achieves significantly better precision than context-insensitive analysis, while remaining efficient and practical.
Future Work
• Investigate other instantiations of the framework: more precise naming of sub-objects of composite objects.
• Investigate applications of points-to, side-effect, and def-use analyses in the context of software productivity tools.