
Program Analysis Course Notes

Ashok Sreenivas, 2008

1. Background / overview

1.1 Course overview

• Introduction: what and why of program analysis
• Background and program analysis techniques
  • Lattice theory
  • Data flow analysis
  • Abstract interpretation
  • Non-standard type inference
  • Inter-procedural analysis
• Analysis I: Identifying equivalent expressions
  • Different approaches
  • Relative merits, demerits
• Analysis II: Pointer analysis
  • Theoretical complexities
  • Families of algorithms

1.2 Program analysis – what and why

• What is program analysis?
  • Infer ‘properties’ of a given program
  • Analogies to other kinds of analysis (Shakespeare's poetry or an airplane)
• What do we mean by ‘properties’?
  • Syntactic properties – analogous to ‘physical’ properties. Not of interest in this course.
  • ‘Semantic’ properties – properties that hold when the program runs. Similar to a flying plane or ‘understanding’ Shakespeare.
  • Much more interesting and relevant.
  • Of course, we want to find properties without running the program. (Why?)
• Why should we study program analysis?
  • Program verification
    • Ensure that a program ‘meets’ its specifications
    • Discover (all) ‘invariants’ about the program
  • Property verification – ‘static debugging’
    • Relatively more modest aim of checking whether a given property holds for the program
    • Works against ‘partial’ specifications
    • Examples: safety of file operations, array index overflows etc.
  • Program optimization

    • Finding properties that hold which ensure that a given transformation on a program does not change its behaviour [e.g., eliminating ‘constant’ computations]
    • Preferably, it should also make the program run faster!
  • Translation validation
    • Does the object code of your program faithfully reflect the source?
    • Requires identifying and comparing properties across two languages!
  • Program understanding, re-engineering etc.
    • ‘Software engineering’ applications – very little ‘completely new’ code is ever written
    • Want to ‘understand’ the program’s behaviour, perhaps in pieces, perhaps under specific conditions, …
    • Useful to maintain ‘legacy’ systems, ‘re-target’ them
    • Analysis results primarily intended for human consumption (unlike in other cases)
• What do we mean by a program? Or, more precisely, what kind of programs are we talking about?
  • Program analysis techniques talk about analyzing a class of programs – not one program (like a compiler)
  • The class of programs is defined by a ‘language’ or ‘semantic model’
    • Languages that are syntactically different but (almost) similar semantically can be treated similarly
    • Obviously, the difficulty of analysis is proportional to the complexity of the language / semantic model
  • For the purposes of this course, the semantic model would broadly include all imperative, maybe object-oriented, programs.
    • In particular, it includes variables, assignments, control flow (sequence, if-then-else, loops).
    • Also includes pointers and procedures / functions (with parameters).
    • ‘Aggregate types’ – structures, arrays etc. – are also considered.
    • It does not include higher-order functions or functions as first-class values (though it may include ‘function pointers’).
• Example analyses: sign analysis, interval analysis.
  • Sign analysis: if some value should never become negative (say, a temperature or pressure …)
  • Interval analysis: similarly, for critical values such as temperature, pressure etc. Also very relevant to array index analysis, and therefore to security violations in web applications.
  • Sample program, with the solutions of both analyses at each program point (sign analysis on the left, interval analysis on the right):

      <<x: unknown, y: unknown>>   <<x: [], y: []>>
      x = -10;
      <<x: -ve, y: unknown>>       <<x: [-10, -10], y: []>>
      y = 1;
      <<x: any, y: +ve>>           <<x: [-10, inf], y: [1, inf]>>
      while (x <= 100)
      <<x: any, y: +ve>>           <<x: [-10, inf], y: [1, inf]>>
      {
      <<x: any, y: +ve>>           <<x: [-10, 100], y: [1, inf]>>
          x = x + y;
      <<x: any, y: +ve>>           <<x: [-10, inf], y: [1, inf]>>
          y = y * 2;
      <<x: any, y: +ve>>           <<x: [-10, inf], y: [1, inf]>>
      }
      <<x: any, y: +ve>>           <<x: [101, inf], y: [1, inf]>>

• Points to note
  • One has to design suitable abstractions for the analyses
  • One needs info at each program point
  • Info should represent all possible executions
• The idea of approximations:
  • Why is x’s sign <<any>> at the top of the while loop?
  • Why is the interval of y [1, inf] at the top of the while loop? Can it be better?
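
• A minimal sketch (illustrative, not from the notes) of the sign abstraction used above, showing why x ends up as <<any>> at the loop head: the entry contributes -ve (x = -10), the back edge contributes the sign of (x + y) * 4, which is ‘any’, and the join of the two is ‘any’.

      # Sign lattice values: 'bot' (no info yet), '-', '0', '+', 'any'.
      def join(a, b):
          """Least upper bound in the sign lattice (used at merge points)."""
          if a == 'bot': return b
          if b == 'bot': return a
          return a if a == b else 'any'

      def abs_add(a, b):
          """Abstract '+': precise only when the result's sign is forced."""
          if 'bot' in (a, b): return 'bot'
          if a == '0': return b
          if b == '0': return a
          return a if a == b else 'any'   # e.g. '-' + '+' could be anything

      def abs_mul_pos_const(a):
          """Abstract 'x * 4': multiplying by a positive constant keeps the sign."""
          return a

      # Loop head: join of the entry value '-' and the back-edge value.
      print(join('-', abs_mul_pos_const(abs_add('-', '+'))))   # -> 'any'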

1.3 Fundamentals

• Underlying principles of Program Analysis
  • The ‘actual’ (or 'concrete') program works on ‘concrete values’ giving ‘concrete outputs’ – say, a program with integer inputs and outputs.
  • The questions you ask of the program, i.e. the properties of interest, are ‘abstract’. E.g., range of values, set of values (why a set?), signs etc.
  • So, the notion of abstraction is the key.
  • The ‘concrete values’ are almost always abstracted (e.g. to signs, intervals etc.), i.e. the domain on which the program operates is changed from concrete to abstract.
  • The program itself may also be abstracted to simplify the analysis.
• Approximation
  • Finding ‘exact’ information about the program is often impossible
    • Why?
      • The halting problem is a program analysis problem!
      • Many others are undecidable too.
  • Even if theoretically possible, it may be extremely hard, computationally intractable.
  • Therefore, we will have to settle for approximate answers many times.
  • Approximations introduce the notions of soundness and completeness – how ‘correct’ is the approximation and how ‘good’ is it.
• Soundness
  • Everything inferred by the analysis is also true of the program
    • True in analysis => True in program
    • Though everything true at run-time may not be inferred
  • Basically determines the ‘direction’ of approximation
  • For interval analysis, the ‘natural’ direction would be that it is OK to predict wider intervals
    • Say x ‘really’ takes on values 11, 16, 19.

      • If the analysis says x: [0, 25], i.e. it is never the case that x takes values outside of this range, it is OK.
      • But if the analysis says x: [12, 18] or even x: [12, 28], it is unsound, because it says x never takes the value 11, which is wrong.
  • But the notion of soundness depends on what you want to use the analysis information for.
  • Example (variable initialization)
    • If the application is to detect bugs arising due to uninitialized variables, you want to report a super-set of all ‘actual’ uninitialized variables. That is, you want to catch all 'real' bugs and perhaps also some spurious 'bugs' that are not really so.
    • If the application is to ‘pre-initialize’ uninitialized variables (rather than initializing them when first encountered – assume this is more efficient or easier), then you want to report a subset of all uninitialized variables. That is, it is OK to miss out on a few uninitialized variables, because this is likely to save effort at run-time (if we caught a superset of uninitialized variables, some variables that are deemed uninitialized but are actually initialized would get initialized twice, which may not be desirable).
  • But most often, the direction of approximation for soundness is obvious.
  • Example:
    • If the analysis predicts no invalid memory accesses or overflowing computations, the program is really free of such errors.
    • If it points out potential invalid memory accesses or overflowing computations, these may or may not be so.
• Completeness
  • The converse, i.e. everything true in the program is also predicted by the analysis
    • True in program => True in analysis
  • The ‘other’ direction of approximation. Not everything predicted by the analysis may be true of the program!
  • If a complete analysis predicts an invalid memory access, then it really is an invalid memory access.
• In almost all situations, an analysis must be sound. Preferably, it should also be complete, but as we have seen this may be extremely hard or even impossible.
  • Being sound and incomplete is also called being ‘conservative’ or 'safe'
  • Being just sound (and horribly incomplete!) is always very easy.
    • Just pick the extreme solution!
      • Infinite intervals for interval analysis,
      • +/– for the sign analysis,
      • every variable for the uninitialized-variables analysis etc.
  • But these are also completely useless solutions. Hence ‘precision’ is important.
• Precision
  • Try to get as close to complete as possible without losing out on soundness
  • ‘Tighter’ intervals, fewer ‘false alarms’ with uninitialized variables etc.
  • Often, there is a trade-off between precision and effort.
  • Sometimes (very rarely) there may be a trade-off between soundness and effort.

2. Program analysis techniques

2.1 Lattice theory

• The mathematical underpinning of the idea of approximations, soundness etc. Useful / relevant in many analyses
• A partial order, with the ordering relation representing the notion of approximation
• Notions of joins, meets, product lattices, functions over lattices, monotonicity and fixed points.
• Details in lattice.pdf and lattice-others.pdf
• Example lattices (see the sketch after this list)
  • Consider the set S = {1, 2, 3}.
    • (2^S, \subseteq, \cup, \cap, \phi, S) is the lattice of subsets of S ordered by the subset relation (which is a partial order).
    • The LUB (join) operator is union, as it gives the smallest element larger than two elements, i.e. the smallest set containing both of them.
    • Similarly, the GLB (meet) operator is set intersection.
    • The bottom element is the empty set and the top element is the entire set S.
  • Consider the set S = {1, 2, 3, 4, 6, 8, 12, 24} and the 'divides' relation (|). | is a partial order as
    • x | x \forall x;
    • x | y \wedge x \neq y => y does not divide x;
    • x | y \wedge y | z => x | z.
  • (S, |, LCM, GCD, 1, 24) is a complete lattice.
    • LCM is the join/LUB operator, as it gives the least element 'larger' than (i.e. a multiple of) any two given elements under the chosen ordering.
    • Similarly, GCD is the meet/GLB operator; the least (bottom) element is 1, which divides everything, and the greatest (top) element is 24, which everything divides.
  • Both these lattices can be extended to all natural numbers (> 0), but that would result in lattices of infinite height and infinite 'width'.
• Lattice of signs
  • {Bottom, +, 0, -, Any}
  • Bottom <= x \forall x
  • Any >= x \forall x
  • +, -, 0 unrelated to each other
  • Note: one can have other sign lattices too, with elements such as non-negative, non-positive and non-zero to represent other classes of numbers
  • A finite lattice (with obviously finite chains)
• Lattice of intervals
  • Elements of the form [x, y]
  • [x, y] <= [a, b] iff x >= a \wedge y <= b, i.e. a 'tighter' interval is lower than a looser one.
  • The bottom element is the empty interval (a special case for the <= relation defined above)

  • The top element is the complete interval [-\inf, +\inf]
  • Infinite height and width
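
• A quick sketch (illustrative, not from the notes) of the divisor lattice above, with LCM as join and GCD as meet:

      from math import gcd

      S = {1, 2, 3, 4, 6, 8, 12, 24}

      def leq(x, y):
          """The partial order: x <= y iff x divides y."""
          return y % x == 0

      def join(x, y):          # LUB: least common multiple
          return x * y // gcd(x, y)

      def meet(x, y):          # GLB: greatest common divisor
          return gcd(x, y)

      # 4 | 12, join(4, 6) == 12, meet(8, 12) == 4, and S is closed under both.
      print(leq(4, 12), join(4, 6), meet(8, 12))
      print(all(join(a, b) in S and meet(a, b) in S for a in S for b in S))   # True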

2.2 Analysis techniques

• Initially, focus only on single-procedure programs. Inter-procedural analysis is introduced later
• Three approaches to program analysis
  • Data flow analysis
  • Abstract interpretation
  • Non-standard type inference
• The three approaches are not independent – just different ways of looking at the problem. Sometimes ideas from multiple approaches work best.
• Running example program across all techniques:

      S0: ENTRY
      S1: x = 10
      S2: while (x < 100)
          do
      S3:     if (x > 0)
      S4:         x = x - 3
              else
      S5:         x = x + 2
              fi
      S6:     x = x * 4
          od
      S7: EXIT

• Example analysis: sign analysis
  • Five elements (bottom, 0, +, -, top/any/+-) ordered in the usual way

2.3 Data flow analysis

• Developed primarily in the context of program optimization
• References: Kildall 73, Hecht 77, ASU 86.
• Uses two abstractions
  • The program is always abstracted to a control flow graph (see below)
  • The information abstraction depends on the analysis. This defines the desired lattice (called L)
• A control flow graph (CFG) abstracts a program
  • Description of CFG with example
    • A node for every basic ‘statement’ (can be extended to ‘basic blocks’)
    • An edge for every possible transfer of control
    • Cycles in the presence of loops
  • Not every path in the CFG may be a path in P (even if every branch in the CFG may have an equivalent). (Why?)
• Develop (monotone) functions corresponding to basic constructs in the program
  • Each node type corresponds to a 'basic construct' of the language

  • ‘Flow’ functions describe what happens to the property of interest when the program executes that construct
  • A set of monotone, abstract flow functions is defined for each ‘node type’: F
• <L, F> defines a data flow problem
• Given a program P and a <L, F> pair
  • Build a CFG for P
  • Instantiate F for each node in P
  • Assume some information (usually ‘no information’) at program entry
  • This results in a set of (mutually recursive) equations for the information at each ‘program point’
    • The program points can be ‘in of node’, ‘out of node’ or just one of them.
• Mutually recursive equation set
  • Multiple solutions! For example, in interval analysis, all variables having the [-inf, +inf] interval, or the computed intervals
  • Lattice properties ensure the existence of a solution.
    • See p 27 of lattice.pdf
  • The least (greatest, in data flow literature) fixed point gives us the ‘best’ solution for the chosen abstraction.
• Two questions
  • How to solve the mutually recursive system of equations?
  • What is the relationship between this solution and the desired analysis property?
• Solution can be through ‘iterative’ or ‘elimination’ approaches.
  • Guaranteed to terminate if chains are finite.
  • Example worked through the iterative approach (a sketch of it appears at the end of this section).
• The concept of the meet-over-all-paths (MOP) solution
  • Assume all paths are executable
  • The desired solution at a point is the meet of the information at that point along all paths to that point
    • Meet to combine information
    • Information along a path is just the function composition of the individual functions
• The discovered fixed point is equal to the MOP solution if the analysis framework (i.e. all flow functions) is distributive
  • Distributive framework: f (x \meet y) = (f x) \meet (f y)
• Even if the framework is not distributive, the solution is ‘conservative’
  • That is, the discovered solution is a ‘safe approximation’ of the MOP solution, i.e. ‘<=’ the MOP solution in the ‘upside down data flow lattice’
• The classical 'separable' or 'bit-vector' problems
  • Available expressions, 'reaching definitions', 'live' variables, 'very busy' expressions
  • Each 'element' (expression, definition, variable, expression) can be dealt with independently
  • That is, whether an expression is available or not etc. does not depend on any other expression
• Note: terminology confusion between data-flow analysis and other semantics / analysis literature
  • Semantics / abstract interpretation literature typically orders lattices such that the 'smaller' elements represent more precise information; therefore, the desired solution is the least fixed point. This is computed by repeated application of the function to the bottom element of the lattice, and the best approximation of any two elements is the join or LUB.
  • Data flow analysis literature typically orders lattices such that the 'larger' elements represent more precise information; therefore, the desired solution is the maximal fixed point. This is computed by repeated application of the function to the maximal element of the lattice, and the best approximation of any two elements is the meet or GLB. Hence the term MOP solution.
  • In other words, each body of literature views the lattice 'upside-down' with respect to the other.
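
• A minimal sketch of the iterative (worklist) approach mentioned above, for a generic <L, F> problem over a CFG. The interface (succ, flow, join, bottom) is an assumed shape, not from the notes:

      def solve_dataflow(nodes, succ, flow, join, bottom, entry, entry_info):
          """Iterative worklist solver for the data-flow equations.
          succ(n)    : successors of node n in the CFG
          flow(n, x) : monotone transfer function of node n applied to incoming info x
          join(a, b) : least upper bound in the information lattice L
          bottom     : 'no information yet'
          """
          info_in = {n: bottom for n in nodes}      # info at the entry of each node
          info_in[entry] = entry_info
          worklist = [entry]
          while worklist:
              n = worklist.pop()
              out = flow(n, info_in[n])
              for s in succ(n):
                  merged = join(info_in[s], out)
                  if merged != info_in[s]:          # info changed: re-process successor
                      info_in[s] = merged
                      worklist.append(s)
          return info_in

• Termination of this sketch follows from the finiteness of chains and the monotonicity of the flow functions, exactly as stated above.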

2.4 Abstract interpretation

• Use the running example
• Formally defines multiple ‘levels’ of semantics
• References: CoCo 77, CoCo 79, the Nielson-Nielson-Hankin (PPA) book
• The ‘lowest’ level of semantics describes actual program execution
  • Semantic state-transformer functions are defined for each construct
  • The overall program semantics is defined as a fixed point of (the composition of) these functions
  • These functions operate over a concrete domain of values
    • Integers, Booleans, physical memory locations etc.
    • Each domain is described by a (natural) lattice
  • This is the ‘concrete interpretation’ – the ‘base’ abstract interpretation
• Each analysis is now described as an ‘abstract interpretation’
  • A (semi- or complete) lattice of abstract values (intervals, signs etc.) with its own ordering
  • An abstract semantics, i.e. a semantic function for each construct that operates on the abstract values
  • The analysis itself is now the semantics as derived from these abstract functions
  • A fixed-point computation determines the ‘analysis solution’
  • Existence of the fixed point, its computability etc. follow from lattice theory
• Semantics can be described at any 'level' and in any 'form':
  • E.g.: the CoCo77 papers work on 'trace-like' semantics, i.e. semantics of flow-chart-like programs
  • But it can also be a denotational semantics (or big-step semantics) etc.
• Also see absint.pdf
• Consistent abstract interpretations
  • To prove ‘correctness’ of the analysis
  • Define a pair of functions – abstraction (alpha) and concretization (gamma) – from the concrete to the abstract domain and vice versa
  • These functions may introduce loss of information
    • gamma . alpha >= id; alpha . gamma <= id
    • Alpha and gamma may form a ‘Galois connection’ – the ‘best abstraction’
  • Use these functions to show correctness

    • Showing that the information loss in alpha / gamma is ‘consistent’ (commuting diagram on p 242 of the CoCo77 paper), i.e. gives ‘safe approximations’
• Often fixed points are impossible or expensive to compute
  • Extremely long chains
  • Example: interval analysis
• Means of approximating fixed points (see the sketch after this list)
  • ‘Widening’ operators (from interval widening – i.e. approximating) to ensure that a safe approximation of the least fixed point is reached
  • Narrowing operators to get you back towards the least fixed point while always remaining a safe approximation
  • Examples
    • Widening operator (p 246, 247, Sec 9.1.3.1 - 9.1.3.3 of CoCo 77)
    • The widening operator is not commutative, unlike the join operator!
    • Narrowing operator (p 248, 249, Sec 9.3.4.1 - 9.3.4.3 of CoCo 77, and example)
    • Diagrams to explain widening / narrowing (p 249)
• Note: the CFG of data flow analysis is itself an abstract interpretation (albeit a very ‘low level’ abstraction)
  • Basically throwing away some control flow information but keeping all the data flow information
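
• A small sketch of interval widening and narrowing (illustrative; not the CoCo 77 definitions verbatim), with None for the empty interval and +/- inf for unbounded ends:

      INF = float('inf')

      def widen(a, b):
          """Widening: a bound that keeps moving jumps straight to +/- infinity,
          so ascending chains stabilise in finitely many steps."""
          if a is None: return b
          if b is None: return a
          (alo, ahi), (blo, bhi) = a, b
          lo = alo if alo <= blo else -INF     # lower bound dropped? widen to -inf
          hi = ahi if ahi >= bhi else INF      # upper bound grew?     widen to +inf
          return (lo, hi)

      def narrow(a, b):
          """Narrowing: recover precision by tightening only the infinite bounds."""
          if a is None or b is None: return None
          (alo, ahi), (blo, bhi) = a, b
          return (blo if alo == -INF else alo, bhi if ahi == INF else ahi)

      # At the loop head of the interval example, widening pushes y's upper bound
      # to +inf; a later narrowing step can pull an infinite bound back down.
      print(widen((1, 1), (1, 2)))       # (1, inf)
      print(narrow((1, INF), (1, 200)))  # (1, 200)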

2.5 Non-standard type inference

• Use the running example
• A ‘logic-based’ approach to analysis
• Consider the problem of program analysis as a ‘type inference’ problem, where the information we require corresponds to the types to be discovered (and the corresponding program constructs are the ‘identifiers’ or ‘variables’ to which we need to assign types)
• Such ‘types’ should form an inclusion or subtype hierarchy – this corresponds to the information lattice
• Similar to the flow functions of data flow analysis and the abstract semantic functions of abstract interpretation, you define ‘typing rules’
• Typing rules specify when a construct is well-typed, i.e. what is the ‘best’ type assignment to the constituents of a construct that makes them mutually consistent
  • Example: y + z is the construct and (say) + is known to be an operator Arith x Arith -> Arith. Then the typing rule for the addition construct would say the typing is consistent if y is Arith, z is Arith and the expression itself is also Arith. For y < z, the last part would change to Bool.
• Since we often want information at each point in the program, there may be different ‘types’ associated with each program point
  • This, in turn, means that the type of a statement is really a ‘type transformer’ – i.e. it modifies one type into another, just as a flow function or semantic function does
• Defining the analysis
  • Similar to defining the lattice / flow functions / semantic functions

  • Deciding on the set of types, their inclusion relationship and the typing rules
• Performing the analysis
  • Finding the most general types (equivalent to the best approximations) under the given typing rules for a given program
  • Done similarly to type-inference algorithms such as Milner’s
  • Could be expensive – depends on the typing hierarchy
  • Can have special subsumption rules to help approximate faster
    • Similar to widening
• Also see TypeSyst.pdf
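
• A toy sketch of a typing rule in the sense above, using the sign lattice as the non-standard ‘types’ (the rule table is an illustrative assumption, not from the notes):

      # The typing rule for '+': the result type is forced where possible, else 'any'.
      PLUS_RULE = {
          ('+', '+'): '+', ('-', '-'): '-', ('0', '0'): '0',
          ('0', '+'): '+', ('+', '0'): '+', ('0', '-'): '-', ('-', '0'): '-',
      }

      def type_of_add(ty_y, ty_z):
          """Best consistent 'type' for the construct y + z, given the operand types."""
          return PLUS_RULE.get((ty_y, ty_z), 'any')

      print(type_of_add('+', '+'))   # '+'
      print(type_of_add('-', '+'))   # 'any'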

2.6 Inter-procedural analysis

• Analysis of programs with multiple procedures / functions raises issues different from the analysis of single-procedure programs
• Two papers to go through
  • One, introducing general techniques of inter-procedural analysis
  • Two, better algorithms for a specific class of analyses

2.7 Two approaches to inter-procedural analysis

• The classic Sharir-Pnueli paper (1981) laying out the general techniques
• In a data-flow analysis context
  • Consider an inter-procedural control flow graph (ICFG) with calls connected to procedure entries and returns connected to the node(s) after the call.
    • Example on p 198, Fig 7-1 of paper
  • Many inter-procedural paths in the ICFG are ‘obviously’ wrong, since a return must return to the corresponding call
  • The set of inter-procedurally valid paths (IVPs) in an ICFG is a subset of the set of paths in the ICFG
  • Considering all paths in the ICFG is sound but highly imprecise
  • Formally, the set of paths in a CFG is generated by a regular language, while the set of IVPs in an ICFG is generated by a context-free language.
    • Because of the need to simulate a call stack
    • There is also a need to handle scopes, lifetimes of variables, parameter passing etc. These are ignored for now, as they are easy to do.
  • Therefore, we need techniques that consider only IVPs rather than all paths.
• Broadly, two approaches to address the problem
  • The ‘functional’ approach
  • The ‘call strings’ approach
• Functional approach
  • Define functional equations for the information at a point in terms of the information at the entry of the procedure, along IVPs [pp 199-200 of paper]
  • Solve the functional equations to obtain solutions that are functions
    • Existence of the solution depends on the height of the function lattice
    • Approximation techniques can be used

  • Having obtained functions for each point along IVPs from its procedure entry, the actual information at each point can be found using another set of recursive (non-functional) equations [pp 200-201 of paper]
  • Example on pp 201-202 of paper
  • One can show that this approach yields the MOP solution over IVPs if the functions are distributive, and that it yields a sound approximation if the functions are non-distributive.
    • Proof on pp 202-204 of paper.
  • Practical problem: representing the computed functions efficiently
  • A purely iterative algorithm to implement the functional approach is also possible
    • Algorithm on pp 207-208 of paper
    • Example on pp 209-210
    • It does not explicitly represent functions but directly applies the function (only) to values occurring in the analysis
    • But this does not necessarily make it cheap
    • In fact, it may not even converge – if the chains are infinitely high etc.
    • But it is guaranteed to yield a correct result if it converges
• The call strings approach (see the sketch at the end of this section)
  • Resembles the iterative data flow approach
  • Explicitly carry the ‘call stack’ with the information
  • Propagate only relevant information back along return edges, using the call strings at return nodes
  • Obvious problem when call strings are unbounded (due to recursion)
  • Formal definitions of call strings and their ‘extensions’ for each edge in the CFG [p 212 of paper]
  • Using these definitions, define an ‘augmented’ data flow framework for the inter-procedural case: <L*, F*>
    • L* defines functions from call strings to L (equivalently, pairs of call strings and lattice values). This is the information at a point in the new framework – i.e. information at a point is parametrized by its ‘calling context’ – hence the name context-sensitive analysis
    • F* defines functions over L* and is derived from F and properties of call strings.
      • Functions for inter-procedural edges change only the call-string part
      • Functions for intra-procedural edges change only the L part
      • Should be closed under composition and meet, and contain the identity function.
    • Note: this framework depends on the ICFG and is not independent of it – unlike in the intra-procedural case
  • Solving the resultant data flow equations yields the MOP solution over all IVPs.
    • After solving, the solutions for all the call strings are merged (through a join) to get the solution valid for all call sequences (eliminating the call strings in the process). See p 215, 7-12 of paper.
    • Proof given in the paper.

  • The solution may not converge for recursive programs even if L is finite. But it is possible to make it converge by choosing an appropriately finite subset of call strings
    • A finite, prefix-closed subset of all call strings
    • Basically, choose them long enough to allow convergence even if longer paths are possible
      • Height of lattice * longest cycle in call graph. Or, for simplicity, size of lattice * number of calls.
    • But the size of the call string set may still be too huge
      • Not so, however, for the so-called ‘separable’ problems, where the ‘effective’ height of the lattice is 1!
  • Example on pp 224-225.
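
• A tiny sketch of keeping call strings bounded by truncating them to the k most recent call sites. The fixed-length truncation is a common simplification in the spirit of the bounded call-string sets above, not the paper's exact construction:

      K = 2   # keep only the K most recent call sites (illustrative bound)

      def extend(call_string, call_site):
          """Entering a procedure: append the call site, keeping the K most recent."""
          return (call_string + (call_site,))[-K:]

      def matches_return(call_string, call_site):
          """A return edge only propagates information whose call string ends in the
          matching call site - this is what rules out invalid inter-procedural paths."""
          return bool(call_string) and call_string[-1] == call_site

      cs = extend(extend((), 'c1'), 'c2')
      print(cs)                          # ('c1', 'c2')
      print(matches_return(cs, 'c2'))    # True  - may return to the caller at c2
      print(matches_return(cs, 'c1'))    # False - inter-procedurally invalid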

2.8 Special case of inter-procedural analysis

• The Reps-Horwitz-Sagiv paper of 1995
• For a special sub-class of problems, one can have efficient algorithms for precise inter-procedural analysis
• The class is defined as the set of problems that have distributive transfer functions and finite data-flow ‘facts’ – i.e. transfer functions are from (finite) sets of facts to (finite) sets of facts, and the meet operator is either union or intersection.
  • Inter-procedural, finite, distributive, subset (IFDS) problems
  • Transfer (flow) functions are associated with edges
• This includes the classical ‘bit-vector’ or ‘separable’ problems and also problems such as copy-constant propagation, possibly-uninitialized variables etc.
• Precise inter-procedural analysis for IFDS problems is reduced to an equivalent problem of graph reachability over IVPs (or equivalent).
• Adapts the ‘functional’ approach of Sharir-Pnueli
• The inter-procedural control-flow graph (the ‘supergraph’ in the paper) has 4 kinds of edges:
  • Normal intra-procedural edges
  • Edges from call nodes to entry nodes
  • Edges from exit nodes to return nodes
  • Edges from call nodes to return nodes
  • Example: Fig 1, p 3 of paper
• Each flow function f from 2^D -> 2^D is mapped to a binary relation Rf over (D union {0}), where 0 represents the ‘empty set’. Rf has at most (D+1)^2 elements. Rf is defined as follows (see the sketch at the end of this section):
  • (0, 0) \in Rf
  • \forall y \in f(\phi). (0, y) \in Rf
  • \forall y \in f({x}). (x, y) \in Rf if y \not\in f(\phi)
  • Basically, the ‘bottom’ element maps to itself.
  • The bottom element also maps to all those elements that are ‘generated’ (i.e. obtained by applying f to the empty set – in other words, independent of the input)
  • If the singleton x maps to y, then (x, y) is also part of the relation
  • Some examples on p 4, Sec 3 of paper.
• Mapping from a representation relation back to a function is also easily possible:
  • [R](X) = ({y | \exists x \in X. (x, y) \in R} union {y | (0, y) \in R}) \ {0}
  • Easy to see that [Rf] = f.

• Composition of two flow functions also maps to composition of the corresponding relations
  • Rf; Rg = { (x, y) | \exists z. (x, z) \in Rf \wedge (z, y) \in Rg }
  • Therefore, path functions are compositions of relations, i.e. [Rf; Rg] = g \circ f
  • In other words, if the relation is expressed as a graph, composition of path functions is equivalent to tracing a path in the graph!
• Translating the IFDS problem to a graph reachability problem
  • Associated with every flow graph node, have D+1 ‘points’ corresponding to the elements of D and 0.
    • ‘Exploded supergraph’ whose nodes are pairs consisting of original ICFG nodes and a data-flow fact
  • Corresponding to the flow function f of every edge, connect the corresponding points as defined by Rf.
• Assuming no information is available at the entry of main, the solution to the IFDS problem is simply the set of points reachable from the point <entry of main, 0> along IVPs.
• But how to determine reachability along IVPs?
  • Done by a worklist algorithm using ‘path edges’ and ‘summary edges’
  • Similar in spirit to the Sharir-Pnueli functional approach, without actually computing functions
  • A path edge is an edge of the form <ep, d1> to <n, d2> where ep is the entry of the procedure containing node n.
  • It indicates that there is an IVP from <entry of main, 0> to <ep, d1>, and a ‘same-level’ IVP from <ep, d1> to <n, d2>.
  • In other words, the data-flow fact at the target of the path edge is part of the IFDS solution at that node.
  • Summary edges similarly capture the effect of a procedure, i.e. they are edges of the form <c, d1> to <r, d2> where c is a call node and r its return node.
  • A summary edge represents information that (may have) passed through the called procedure.
• Algorithm for the computation of path and summary edges in Fig 3 (page 7) of paper.
• Detailed example with the possibly-uninitialized-variables problem in Fig 1 (page 3) and Fig 2 (page 5).
• Some special cases such as h-sparse and separable problems lead to more efficient algorithms. Complexities of all are given in Table 5.2 (page 9).
  • Worst case is O(E D^3).
  • Becomes O(E D) for separable problems.
• 'Real' analyses of course have to contend with many more issues / constructs!!
  • Arrays
  • Records / structures
  • Polymorphism
  • Sub-typing / inheritance
  • Pointers ...
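
• A small sketch of the representation relation Rf and its reading back as a function, following the definition above (the example distributive function is an arbitrary illustration):

      ZERO = 0     # stands for the empty set; D is the finite set of data-flow facts

      def to_relation(f, D):
          """Representation relation Rf of a distributive function f: 2^D -> 2^D."""
          R = {(ZERO, ZERO)}
          gen = f(frozenset())                       # facts generated from nothing
          R |= {(ZERO, y) for y in gen}
          for x in D:
              R |= {(x, y) for y in f(frozenset({x})) if y not in gen}
          return R

      def apply_relation(R, X):
          """[R](X): read the relation back as a function on sets of facts."""
          return ({y for (x, y) in R if x in X} | {y for (z, y) in R if z == ZERO}) - {ZERO}

      # Example over D = {'a', 'b'}: the function always generates 'a' and keeps 'b'.
      D = {'a', 'b'}
      f = lambda S: frozenset({'a'}) | (frozenset({'b'}) & S)
      R = to_relation(f, D)
      print(sorted(R, key=str))                                    # (0,0), (0,'a'), ('b','b')
      print(apply_relation(R, set()) == f(frozenset()))            # True
      print(apply_relation(R, {'b'}) == f(frozenset({'b'})))       # True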

2.9 Analysis problems to focus on

• Equivalence between expressions
  • General question: does the expression x + 2y have the same value at a program point as the expression c – 3b + d?
  • (Obviously!) undecidable
  • Still useful to find approximate solutions and to solve special cases, as there are many applications
    • Software verification
      • Does the assertion x + y = 3 * z hold at this point?
    • Constant propagation
      • Does variable x have constant value c? Or, does x – c = 0 hold?
      • Replace the variable with the constant, if so.
    • Copy propagation
      • Do two variables hold the same value? Or, does x – y = 0 hold?
      • Can be used in, say, efficient register usage.
    • Common sub-expression elimination
      • Do x+y and c+d have the same values, and has one of them already been computed?
      • If so, use it in place of the other.
• Alias / pointer analysis
  • Alias analysis: do two ‘names’ refer to the same ‘location’?
    • E.g., <*p, *q> if p, q are pointers to the same type and can point to the same memory location (stack or heap) at that point in execution.
  • Points-to analysis: what are the ‘names’ to which a pointer may point?
    • E.g., <p, x>, <q, x> if both p and q may point to x at a point in execution.

3. Herbrand equivalence analysis

3.1 Problem definition

• Most general problem statement
  • During the execution of the program, find the relationships between values of expressions that hold at each point.
  • E.g. at point p, x^3 – 2xy + 3yz – 23 <= 0
  • Similar to discovering program invariants!
  • Obviously undecidable in its most general form
• Many simpler variants
  • One variant
    • Finding relations (equality and inequality) among linear expressions
    • Essentially, restrict the kind of expressions being dealt with
  • Herbrand equivalence of expressions
    • Operators in the expression are ‘un-interpreted’
    • So, only ‘structural equivalence’ is to be checked
    • Herbrand equivalence => expression equivalence, but not the other way round, i.e. sound but incomplete, as desired.
    • Can be solved 'precisely', even though this is expensive.

    • Expression x+y is equivalent to a+b only if both can be ‘reduced’ to structurally equivalent expressions
      • Say, at the point in question, x has value c and y has value d+e, while a has value c+d and b has value e
      • Both reduce to c+d+e and hence are Herbrand equivalent
      • Is this right? Not really, as it needs the knowledge that + is associative!
• Applications
  • (Copy) constant propagation
  • Common sub-expression elimination
  • Invariant code motion
  • Detecting / verifying invariants
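
• A tiny sketch of ‘structural equivalence with un-interpreted operators’, as in the x+y versus a+b discussion above (terms as nested tuples; the representation is an illustrative choice):

      # A variable/constant is a string; an application is (operator, arg1, arg2).
      def reduce_term(e, env):
          """Substitute the known values of variables (from env) into the term."""
          if isinstance(e, str):
              return reduce_term(env[e], env) if e in env else e
          return (e[0],) + tuple(reduce_term(a, env) for a in e[1:])

      def herbrand_equal(e1, e2, env):
          """Herbrand equivalence: the reduced terms must be structurally identical."""
          return reduce_term(e1, env) == reduce_term(e2, env)

      env = {'x': 'c', 'y': ('+', 'd', 'e'), 'a': ('+', 'c', 'd'), 'b': 'e'}
      # x+y reduces to c+(d+e), a+b to (c+d)+e: structurally different, so they are
      # not detected as Herbrand equivalent -- that needs associativity of '+'.
      print(herbrand_equal(('+', 'x', 'y'), ('+', 'a', 'b'), env))   # False
      # If instead a held c and b held d+e, both would reduce to c+(d+e): equivalent.
      env2 = {'x': 'c', 'y': ('+', 'd', 'e'), 'a': 'c', 'b': ('+', 'd', 'e')}
      print(herbrand_equal(('+', 'x', 'y'), ('+', 'a', 'b'), env2))  # True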

3.2 Cousot-Halbwachs (1978)

• Focuses on linear restraints (which include linear equalities and inequalities) – so closer to invariant discovery, but restricted to linear expressions
• Tries to find relationships such as x + 3y – z <= 0
• Uses the abstract interpretation approach
• In a sense, completely orthogonal to Herbrand equivalences
  • Here operators are interpreted
  • Subsumes Herbrand equivalences and therefore constant propagation, ‘available expressions’ etc.
  • But only for linear expressions
• Lattice of linear restraints with a ‘geometric’ representation
  • Each restraint (i.e. equality or inequality) represented by (an approximation of) the set of points allowed by it
  • Geometric interpretation (one 'dimension' per variable)
  • 'Continuous' domains
  • Example of the space for a set of restraints on p 86 of paper
• Partial ordering by geometric inclusion
  • If the ‘region’ of one restraint subsumes the ‘region’ of another, it is more approximate
    • Basically, it allows more values
  • Lattice of infinite height (and width!)
    • So widening is needed to converge
• Determining the ‘region’ from a set of restraints (and vice versa) requires a lot of complex math
  • Polyhedra, convex hulls, frames and so on …
  • Usually approximations are used, as the exact intersection of two restraints may be impossible or very hard to find
  • Intersection of two regions represents the merge (meet) information
• The abstract semantic functions
  • Describe how the set of restraints is transformed by each statement
  • The semantic function for assignment has to consider different possibilities such as assignment of non-linear expressions, linear expressions etc.
  • Semantic functions given on pp 90-93 of paper, for flow-chart programs

  • Assignment of a non-linear expression to x means we know nothing about the value of x. So other relationships involving x have to be modified to eliminate x – cutting out one dimension!
  • Assignment of a linear expression does not result in the elimination of a dimension, but requires complex geometric jugglery
  • Similarly, linear and non-linear equality and inequality tests also change the set of restraints
• Start with the initial restraints (relationships among input parameters)
• Keep applying the semantic functions (with widening) until the system saturates
• Example from the paper on p 94 (Sec 5)
  • Pictorial intuition for how it works, including widening
• Expensive but comprehensive technique
  • The example given in the paper (p 84) finds a whole bunch of inequalities for bubble sort
  • Shows the power of program analysis …
  • … even if not practically feasible

3.3 Kildall’s paper (1973)

• The paper proposed the idea of a general data flow framework and the meet-over-all-paths solution
• One of the problems addressed was ‘common sub-expression elimination’
  • If the equivalent of an expression has already been computed, do not compute it again
  • At each program point, compute a set of ‘equivalent’ expressions
• Abstract lattice
  • Partitioning of expressions (computed so far)
  • Each equivalence class contains equivalent expressions
  • Ordering corresponds to partition refinement
    • P1 <= P2 if P1 is a 'coarser' partitioning
    • 'More expressions are equivalent' in P1
  • Example
    • {{a,b,c,d},{e,f}} <= {{a,b}, {c,d}, {e,f}}
    • {{a,b,c}, {d,e,f}} ?? {{a,b}, {c,d}, {e,f}} : these partitionings are unrelated!
• The join is difficult (see the sketch at the end of this section)
  • ‘Prune’ equivalence classes to the relevant ones
  • Details on p 198 of paper
    • Identify expressions common to the two partitionings
    • For each common expression, intersect the corresponding partitions to get the partitions of the ‘joined’ partitioning
• The ‘flow function’ works on the equivalence classes depending on the computations inside a node (p 198 of paper)
  • Assume a partitioning P at the entry to a node N
  • For each (partial) computation exp at N, if exp is already in some partition of P, it is redundant
  • Else, create a new partition for exp

    • Also have to add other elements to exp’s partition, depending on the equivalences of exp’s sub-expressions
      • Infer new expressions to be added to equivalence classes to make them ‘complete’ (‘structuring’ the equivalence classes)
      • E.g., if exp is a+b, and a is in a partition with c+d and b is in a partition with e+f, then we have to add all of a+e+f, c+d+b, c+d+e+f to the partition with exp.
      • These operations make it hard
  • If the node has an assignment (say v = exp)
    • Remove all expressions containing v from their partitions.
    • For all expressions exp' that have exp as a sub-expression, create a new entry in the partition with exp replaced by v.
  • The flow function is also distributive!
  • Can easily add constant propagation to this function
    • Add constants also to the equivalence classes
    • If the operands of an expression are in an equivalence class with constants, then compute the expression itself
    • If the computed constant has a class of its own, add this expression to that class, else add the constant to the class of the expression
• Basic complexity is exponential because of trying to deal with partitions, and trying to ‘complete’ partitions
  • Basically, one has to look at all possible ways of combining operands based on the equivalence classes of the operands
  • The meet is also expensive
• Global value numbering
  • Number the partitions and represent expressions by operators operating on partition numbers
  • Decreases the number of expressions in a partition
  • Brings down the cost
  • Small examples on p 203 of the paper of the decrease in partition size
  • The meet becomes more complex
    • Need to recover the 'hidden' information from value-numbered expressions
    • Details on p 203 of paper, if required
  • And the complexity remains as bad
• Example on p 235 (Fig 1) and p 236 (Fig 2) of the Ruthing-Knoop-Steffen (RKS) paper, and discussion of the difficulties with Kildall on pp 236-237
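
• A minimal sketch of the 'pruning' join of two partitionings described above: keep only expressions common to both, and intersect the classes of each common expression (the set-of-frozensets representation is an illustrative choice):

      def join_partitions(p1, p2):
          """Join of two expression partitionings: an expression survives only if it
          occurs in both, and its class is the intersection of its two classes."""
          common = set().union(*p1) & set().union(*p2)
          def class_of(p, e):
              return next(c for c in p if e in c)
          return {frozenset(class_of(p1, e) & class_of(p2, e) & common) for e in common}

      p1 = [{'a', 'b', 'c', 'd'}, {'e', 'f'}]            # coarser: more equivalences
      p2 = [{'a', 'b'}, {'c', 'd'}, {'e', 'f'}]
      print(join_partitions(p1, p2))   # classes {a,b}, {c,d}, {e,f}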

3.4 Alpern-Wegman-Zadeck (1988)

• A simple, cheap algorithm to detect some, but not all, equivalences between variables (and expressions)
• A single global data structure suffices, unlike Kildall
• Data flow framework
• Uses an SSA (static single assignment form) representation
  • Replace each variable by many ‘copies’, so that there is only one assignment to each copy
  • Introduce ‘merge’ functions (non-deterministic phi functions) at join points

    • Phi-functions are subscripted by the node to which they belong (Fig 3 on p 3 of paper)
      • Otherwise we might detect two phi-expressions as equivalent just because they have equivalent operands, even if the conditions under which the branching (corresponding to the phi) occurred were different
• Given the SSA form, a ‘value graph’ is built for the program that describes how the SSA variables are related
  • So, if x1 = y0 + z1 and z1 = x0 * 3:
    • there would be a + node labelled with x1, with edges to nodes labelled y0 and z1
    • the z1 node would have operator * and edges to nodes labelled x0 and 3.
• Given the value graph representation, it does a partition refinement (along the lines of FSA minimization) to merge or collapse ‘isomorphic’ parts of the graph (see the sketch at the end of this section)
  • Two nodes with the same operator and dependences would be collapsed into one partition (and carry labels from both the nodes)
  • Partitioning algorithm in Fig 7 (p 5) of paper
    • Begin by putting all expressions with a common root operator in the same partition
    • Then, for each partition of expressions of a particular operator
      • If the corresponding (m'th) operands are not in the same partition, split the original partition so that all operands of operators in one partition come from the same partition
    • Keep doing this with a worklist until no more partitions can be split
  • After partitioning, variables sharing a node are said to be congruent
    • Congruence implies equivalence (not the other way around)
    • That is, sound but not complete
  • Slightly super-linear, so cheap to do
• Fig 3, 4 (Sec 4, pp 237-238) of the RKS paper gives an example where this algorithm works
• Fig 5, 6 (Sec 4, p 240) of the RKS paper gives an example where this algorithm does not work
  • Because phi functions are also treated as un-interpreted operators
  • So, right at the beginning (the most optimistic point), an expression rooted at a phi can never match an expression rooted at a ‘proper’ operator – meaning it can never later be merged
    • Even if they are the same inside the phi
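
• A small sketch of the partition-refinement step described above: value-graph nodes start grouped by operator (leaves in their own classes), and a class is split whenever its members' corresponding operands fall into different classes. The node encoding is an illustrative assumption:

      # name -> (operator, operand names); leaves have no operands.
      nodes = {
          'x1': ('+', ('y0', 'z1')), 'a1': ('+', ('y0', 'z1')),   # same operands: congruent
          'b1': ('+', ('y0', 'w0')),                              # differs in 2nd operand
          'y0': ('leaf', ()), 'z1': ('leaf', ()), 'w0': ('leaf', ()),
      }

      def refine(partition):
          """Split each class by the classes of corresponding operands, to a fixpoint."""
          changed = True
          while changed:
              changed = False
              class_of = {n: i for i, cls in enumerate(partition) for n in cls}
              new_partition = []
              for cls in partition:
                  groups = {}
                  for n in cls:
                      key = tuple(class_of[o] for o in nodes[n][1])   # operand classes
                      groups.setdefault(key, set()).add(n)
                  new_partition.extend(groups.values())
                  changed |= len(groups) > 1
              partition = new_partition
          return partition

      initial = [{'x1', 'a1', 'b1'}, {'y0'}, {'z1'}, {'w0'}]   # grouped by operator / leaf
      print(refine(initial))   # x1 and a1 stay congruent; b1 is split into its own class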

3.5 Ruthing, Knoop, Steffen (1999)

• An improvement to the AWZ paper
• One problem with AWZ is that it misses many equivalences, since phi-nodes are also left un-interpreted.
• Overcomes this lacuna of AWZ by ‘interpreting’ phi nodes, i.e. distributing the phi over an operator
• Presents two simple graph rewrite rules to do this
  • Calls them ‘normalization’ rules, as the aim is to convert the value graph to a ‘normal form’

• One rewrite rule just eliminates a phi node both of whose operands are the same.
  • The variables associated with the phi are moved to the operand
• Another distributes the phi over an operator, if both operands of the phi are rooted at the same operator
  • Exact rules in Fig 7 (p 242) of paper
• The ‘rewrite system’ consists of these two rules plus the graph partitioning exercise – the three together are repeatedly applied
• The rewrite system is sound
  • Because each of the three rules is
• The rewrite system also has the nice desired properties of confluence and termination
  • The rewriting process will terminate
  • It does not matter in what order you apply the rules, or where in the graph
    • Given multiple choices for rule application, you can choose one randomly
    • It may of course affect how many steps it takes to converge!
  • Example in Fig 8 (p 243) for confluence
• The rewrite system is also complete for an acyclic program
  • Proof in the paper
• Examples in Fig 9 and Fig 10 (p 244 of paper)
  • The failure cases for AWZ, which work here
• Complexity
  • O(n^4 log n) in the worst case (repeated partitioning is the most expensive step)
  • O(n^2 log n) expected in practice
• Example where RKS is incomplete: can we detect that x and y have the same values after the first assignment inside the loop (even assuming we can perform computations involving constants)?

      x = 0;
      y = x + 1;
      while (C1)
      {
          x = x + 1;
          if (C2)
          {
              x = x + 1;
              y = y + 2;
          }
          else
          {
              x = x + 2;
              y = y + 3;
          }
      }

3.6 Probabilistic approaches (Gulwani-Necula)

• Both AWZ and RKS are incomplete but sound
• Can we get completeness if we are ready to sacrifice soundness?
• That is what probabilistic approaches try to do
• The probability of error should be very small
  • Confidence in results
  • Typically tuneable – the lower the error probability, the greater the cost
  • So, you choose the cost-precision trade-off

3.7 Discovering linear equalities

• Discovers relationships of the form a1·x1 + a2·x2 + … + an·xn + c = 0, where the ai's and c are constants
  • So, only discovers a specific kind of relationship between variables
  • A generalization of constant propagation
  • Not all Herbrand equivalences
• The idea is extremely simple (see the sketch at the end of this section)
  • Just run the program on different sets of (randomly chosen) input values
    • So, multiple parallel executions
    • Each execution results in a state at a point
    • The collection of states is called a sample
  • At each branch point, ignore the condition and execute both branches!
  • At join points, combine the values obtained from the two branches using a freshly chosen random affine combination of weights
    • That is, values on one branch are given weight w and on the other are given weight (1 – w) – an affine combination
    • One weight per state in the sample
  • Example in Fig 1, p 2 of the paper
  • Multiple executions help decrease the probability of error
    • Particularly for expressions of the form x = k
• Geometric intuition behind the idea
  • Each state is a point in n-dimensional space
  • Each sample is a set of points in the n-dimensional space
  • Merging two samples using affine combinations of weights
    • is randomly choosing a point on the line connecting the two points representing the two states
    • So the merger of two samples picks a set of points on the lines joining corresponding pairs of states in the two samples
  • Example in Fig 2, p 3 of paper
• Completeness:
  • Such affine combinations preserve any linear relationship (of the kind desired)
  • And since the desired property is expected to be valid for all executions, it should be valid for the sample executions too!
  • Lemma 1, p 4 of paper, and associated proof
• 'Almost' soundness:
  • Has a very low probability of satisfying any non-existent linear relationship
  • Lemma 2, p 4 and associated proof
    • Schwartz's theorem (Theorem 1, p 3 of the second Gulwani-Necula paper)
• Relation to testing
  • Testing is equivalent to choosing affine weights 0 and 1.
  • Example program in Fig 1, p 2 of paper

  • 3 paths exhibit the relationship while only one doesn't – still easy to catch by this method, while testing will only find that the relationship does not hold if exactly that path is executed.
• Intersection of spaces
  • Identifying relationships within branches (of linear conditions)
  • Example in Sec 5, p 4 of paper, also in Fig 3
  • Need to 'derive' a sample that satisfies the given condition from the current sample
  • Geometric intuition: points in the sample are 'projected' onto the hyperplane represented by the condition
    • But the projection is not orthogonal; it is chosen so as to preserve the linear relationships
    • Rather, it is done as given in Fig 4 (p 5):
      • Connect two samples with a (hyper)line
      • Choose a point on this hyperline different from all other points and not on the hyperplane of the condition
      • Draw hyperlines from this chosen point to all other points and take the intersections of these hyperlines with the hyperplane. These points will all satisfy the linear relationships found so far and will also satisfy the chosen condition.
      • Note: in the process one point in the sample is 'sacrificed', as the original two samples will both result in the same final point.
    • Details in Fig 3, Fig 4
  • But all this goes well beyond Herbrand equivalences, as you detect value-equivalent expressions and not just Herbrand-equivalent ones!!
• Technical details regarding union of spaces, soundness, completeness and the fixed-point computation are available in the paper.
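
• A minimal sketch of the random affine join described above: two program states are combined coordinate-wise with weights w and 1-w, which preserves any linear relation that both states satisfy (the states and weight are illustrative):

      import random

      def affine_join(s1, s2, w):
          """Combine two states (dicts of variable -> value) with weights w and 1-w."""
          return {v: w * s1[v] + (1 - w) * s2[v] for v in s1}

      # Two states from the two branches, both satisfying x - y + 1 = 0:
      s_then = {'x': 2, 'y': 3}
      s_else = {'x': 5, 'y': 6}
      w = random.uniform(-5, 5)             # random affine weight (need not be in [0, 1])
      s = affine_join(s_then, s_else, w)
      print(s['x'] - s['y'] + 1)            # 0.0 -- the linear relation is preserved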

3.8 Global value numbering using random interpretation

• Extends the discovery of linear relationships to discovering Herbrand equivalences
• Kildall’s is exponential, AWZ is efficient but highly imprecise, Ruthing-Knoop-Steffen is in between. This one catches as much as Kildall in polynomial time but is (probabilistically) unsound.
  • The unsoundness probability is tuneable through parameters
• The idea is to choose random interpretations for the operators and execute the program to discover relationships, rather than leaving them uninterpreted.
  • The previous paper chose random affine interpretations for the phi-operators and natural interpretations for the linear operators
  • Joins are still treated using affine combinations.
• An interpretation for each operator F can be made by choosing p parameters from a field L
  • E.g. if p = 2 and F is binary, F(a, b) may be interpreted as p1 * a + p2 * b (where p1 and p2 are the two random ‘parameters’)
  • The interpretation should be linear, as it should ‘distribute’ over the interpretation of the affine join (\phi) combinations. Equation 5 on p 4 of paper.
  • Unfortunately, this is not (probabilistically) sound, as two distinct functions can easily get the same interpretation (Fig 3, p 4 of paper).
  • There is no point having more than 2 or 3 parameters for a binary operator with a linear interpretation, but these are too few to distinguish between the different leaves of a complex expression, as there could be more than 2 leaves in the expression.

• To overcome this, choose k parallel values for each variable (and therefore for each expression)
  • Uses 4k - 2 parameters for the interpretation, namely r1 … rk, r1' … rk', s1 … s(k-1), s1' … s(k-1)'.
  • The i-th (i between 1 and k) linear interpretation of F is a recurrence, as follows (see the sketch at the end of this section):
    • P(x, i) = x
    • P(F(e1, e2), 1) = r1 * P(e1, 1) + r1' * P(e2, 1)
    • P(F(e1, e2), i) = ri * P(e1, i) + ri' * P(e2, i) + s(i-1) * P(e1, i-1) + s(i-1)' * P(e2, i-1)
  • The degree of P(e, i) is the same as the depth of e.
  • Implicit ‘ordering’ among parameters, i.e. P(e, i) does not contain r(i+1) … rk or si … sk (and their primed varieties).
  • It can be shown that this interpretation is sound, i.e. if P(e1, i) = P(e2, i) for i > j (where j is the log of the max number of leaves of e1 and e2), then e1 = e2 under Herbrand equivalence.
    • Lemma 7, p 5 of paper. Essentially, induction on the depth of the expression(s).
    • Work out the interpretations of the example in Fig 3 (p 4) of paper to show that this interpretation does distinguish between the two non-Herbrand-equivalent expressions (if required).
  • Therefore, the max value of k should be the log of the depth of the largest expression in the program.
    • This also appears very conservative, i.e. it can be smaller.
  • This gives us a sound interpretation for each operator as a polynomial over the parameters, where soundness is defined as: if the interpretations of two expressions are equal, they are Herbrand equivalent. Essentially, this defines a non-standard semantics for Herbrand equivalence.
• The analysis proceeds by random interpretation, similar to the previous paper
  • Run the program on a sample of size k
  • Choose interpretations for each operator as shown earlier (by picking 4k-2 parameters and an interpretation)
  • Compute values of expressions – not necessarily by computing the polynomials first; the values can be computed directly. See function V on page 5 of paper.
  • A sample S satisfies the Herbrand equivalence e1 = e2 if V(e1, k, S) = V(e2, k, S). Note: we compare the k-th values of the polynomials, as they have the greatest ‘distinguishing’ power.
  • At join points, perform random affine joins.
• A fixed point (i.e. the same set of Herbrand equivalences attained) is guaranteed, as the lattice has finite depth – bounded by n, the number of program variables (page 7).
  • Intuition: all possible Herbrand equivalences can be represented by a pair (I, E) where I is a set of independent variables and E is a set of expressions of the form x = e, one for each non-independent variable, such that e contains only variables from I. So, if one set of Herbrand equivalences is 'less than' another, then the 'lesser' one has fewer variables in I. So, any chain length is bounded by the number of program variables. (Lemma 13 of paper)

• The error probability derives directly from Schwartz’s theorem regarding the probability of random values being the root of a polynomial
  • It turns out the error probability is d/L (for one union operation), where d is the degree of the polynomial and L is the size of the field from which random values are chosen
  • Obviously, the bigger L is, the better!
• Probability of error of the whole analysis <= (2n^2 + t) / L, if k >= 2n^2 + t, where n is the max of the number of variables, function applications and join points, and t is the max depth of expressions.
• The probability can be decreased still further by performing the random interpretation m times, decreasing the error probability to ((2n^2 + t) / L)^m

4. Alias / Pointer analysis

4.1 Problem definition

• Do two 'names' refer to the same 'object', or does a pointer point to a given object?
• Relevant for most analysis problems in modern languages
    • 'First' analysis whose results are used by other analyses
    • Examples
        • Array overflow analysis: both subscripts and arrays: a[*p], p[*q + 3]
        • Escape analysis for security: which objects are 'leaked'?
        • Constant propagation / common sub-expression elimination etc.
• Flavours of the problem
    • The exact definition will depend on the particular language's semantics. For example, C/C++ with explicit pointers that may point to the stack or heap v/s Java, which has only object references that point only to the heap.
• Soundness
    • Typically interested in a superset of the 'actual' aliases
    • That is, it is OK to say that a and b are aliased when they are not, but it is not OK to miss out the alias pair (a, b) if there may be an execution on which (a, b) are aliased
    • Because missing out such aliases may result in unsound predictions. For example, one may conclude that an array index is never out of bounds because we missed a possible alias pair and hence a possible array index value. (See the sketch below.)
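    • A minimal C sketch (hypothetical, not from any paper) of how a missed alias pair can make an array-bounds analysis unsound:

          void f(void) {
              int a[10];
              int i;
              int *p = &i;    /* from here on, <*p, i> are aliased              */
              i = 5;
              *p = 20;        /* also changes i, because of the alias           */
              a[i] = 0;       /* out of bounds: i is 20 here, not 5             */
              /* an analysis that misses the alias pair <*p, i> would conclude
                 that i is still 5 and (unsoundly) report this access as safe   */
          }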

4.2 Theoretical complexities

• Bill Landi / Barbara Ryder
    • Complexities of aliasing problems in languages like 'C'
        • <a, b> is an alias pair at a program point if they refer to the same location. Typically, each of a and b is something of the form *p or x or p->left etc.
        • Binary relation over the space of names.
        • Reflexive and symmetric relation, but not transitive! Why? (<a, b> and <b, c> could hold along different paths, so <a, c> may never hold! See the sketch below.)
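        • A small C fragment (hypothetical) showing why may-aliasing is not transitive:

              int x, y;
              int *a, *b, *c;

              void branch(int cond) {
                  if (cond) {
                      a = &x; b = &x;   /* on this path, <*a, *b> holds          */
                  } else {
                      b = &y; c = &y;   /* on this path, <*b, *c> holds          */
                  }
                  /* after the join, the may-alias set contains <*a, *b> and
                     <*b, *c>, but <*a, *c> holds on no path – not transitive    */
              }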

• Problem definition:
    • Formal problem definition: <a, b> are aliased at a point iff there exists an (inter-procedurally valid) path from program entry to the point such that a and b refer to the same location along that path
    • A 'may' or 'path existence' problem, as against a 'must' or 'all paths' problem such as Herbrand equivalence.
• The problem space is presented along multiple dimensions:
    • Number of levels of pointers allowed (single or multiple)
    • Programming language constructs present (e.g., reference parameters, pointers inside structures, presence of heap and dynamically allocated memory etc.)
    • Intra-procedural or inter-procedural analysis
• For each 'point' in this problem space, what is the complexity?
    • For the precise 'MOP' solution
    • Remember one can always approximate and get a 'simpler' solution more cheaply
• Broadly, when there is no heap / dynamic memory:
    • Single-level pointers yield polynomial algorithms
    • Otherwise, the problems are NP-hard. This includes cases where you only have single-level pointers but also have structure fields that have single-level pointers, thus effectively allowing multi-level dereferencing.
    • The intra- or inter-procedural nature does not seem to affect the complexity class

• Result due to Bill Landi and Barbara Ryder's 1991 paper. Exact complexities on p 96 of paper
• Proof of hardness (NP-hardness) given by reducing 3-SAT to the aliasing problem.
    • Given a 3-SAT problem, derive a program from it
    • The derived program is such that a pair belongs to the intra-procedural may-alias set iff some truth assignment solves the 3-SAT problem
    • Exact derivation given on p 102 of paper.
    • A two-level pointer corresponding to each literal, a single-level pointer for each of true and false, and two integers corresponding to yes and no.
    • Initially, assign truth values to the literals guarded by unknown conditions, and assign &no to false. Then have code to check the value of the given formula.
    • The alias pair <*false, no> is broken iff the truth assignment makes the formula false. So, if the alias pair <*false, no> holds at the end of the program, then there is a truth assignment that satisfies the formula.

• Chakravarthy, 2003
    • Flow-sensitive alias analysis in the presence of dynamic memory (heap allocation) and only scalar variables (i.e. no structures, arrays etc.) is undecidable, even in the intra-procedural case
    • Earlier results by Ramalingam and Landi showed that programs with non-scalar (arrays, structures) variables along with dynamic memory

      (essentially allowing unbounded cyclic data structures) lead to undecidability. Landi proved this by reducing the halting problem to this problem; Ramalingam did so (later) with a simpler proof by reducing Post's Correspondence Problem.
    • Proof of undecidability of the problem even in the presence of only scalar variables given by Chakravarthy in 2003 by reducing Hilbert's tenth problem (finding integer roots of a multi-variate polynomial) to this one. Details on p 122-123 of paper.
    • Basically, create a series of pointers to simulate the "number line". Also create a variable that points to "success".
    • Assign elements from this number line to the different variables of the polynomial. Also, choose a sign for each variable.
    • Then "compute" the polynomial in such a way that the variable pointing to success changes to point to failure if the polynomial doesn't evaluate to zero. So, at the end of the program, the variable points to success iff the polynomial has integer roots.
• Nice diagram of the complexity/hardness results for flow-sensitive analyses on p 116 of Chakravarthy's paper.
    • Basically, presence of heap makes it undecidable.
    • If there is no heap, but the program/language is badly typed, then it is NP-hard (basically one can simulate multi-level pointers even if the language doesn't allow them!).
    • If there is no heap and the language is well-typed, the presence of multi-level pointers makes it NP-hard.
    • If there are no multi-level pointers, then it is in P.

4.3 Landi-Ryder algorithm

• An approximate, polynomial-complexity, inter-procedural, flow-sensitive, multi-level pointer alias algorithm in the presence of structures (but no heap)
    • Hard problem, so an approximate solution is required
• Presents a k-limiting (data-flow) algorithm where pointers deeper than a depth k are essentially ignored
    • That is, not kept track of separately.
• The algorithm performs inter-procedural aliasing based on 'assumed alias pairs' at entry (functional flavour of the two Sharir-Pnueli approaches)
    • Key result: each assumed alias pair can be treated separately. So, if a set of alias relations holds on procedure entry, compute aliases within the procedure for each separately and then combine them.
    • Note: this separability is not true for node-level flow functions. Different alias pairs can and do interfere. For e.g., <*p, x> may be affected by a statement *q = *r, because of another alias pair <*q, p>!
    • If the set of assumed alias pairs is empty, it is equivalent to 'generating' the resultant alias pairs, i.e. alias pairs that are unconditionally created by a procedure.
• Algorithm overview
    • Lattice is basically sets of pairs of 'objects'
        • Each pair representing an alias

    • Ordered by subset (the bigger, the more approximate)
    • At normal nodes, solve the question: if an alias pair <p, q> is assumed to hold at procedure entry, will a and b be aliased at a given point?
    • At procedure calls, find what aliases are actually created at the called procedure's entry, i.e. what aliases may be assumed
    • At return points, pass back to the calling procedure only those aliases which were created either unconditionally (i.e. independent of any assumptions) or due to an assumption at that call point.
    • 'Invisible' variables are a major source of pain
        • Page 5 of paper (second column) and Fig 1 (page 3): procedure P creates the alias pair <**l1, g1> in spite of l1 being invisible in P. (A hypothetical sketch follows below.)
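        • A hypothetical C fragment (in the spirit of the paper's Fig 1, not a copy of it) where a procedure creates an alias involving a caller's local that is invisible to it:

              int g1;

              void P(int **q) {
                  *q = &g1;       /* creates the alias <**l1, g1> in the caller,
                                     although l1 itself is not visible inside P  */
              }

              void caller(void) {
                  int *t = 0;
                  int **l1 = &t;  /* l1 is local to caller, so 'invisible' in P  */
                  P(l1);          /* on return, **l1 and g1 are the same object  */
              }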

    • k-limiting introduced to make the problem tractable: all pointers at depth greater than k are assumed aliased to those at depth k.
• Algorithm details
    • Simple work-list based algorithm
    • Works only on assumed aliases that actually arise – similar to Sharir-Pnueli's 'on-the-fly functional' approach
    • Fig 2 on p 5
    • Highly "case-by-case" treatment
    • Some functions such as alias-intro-by-assignment, alias-intro-by-call etc. on p 6-7 etc. (Secs 4.1, 4.2)
    • alias_at_x implies: Secs 4.3, 4.4, 4.5
    • bind-call on p 5, 6, Sec 4; back-bind-call on p 7, Sec 4.3
    • Example from Fig 1 (p 3)???

• Sources of imprecision (p 11-12)
    • k-limiting
    • Joining information along multiple 'paths' loses 'dependence' information
    • Joining of alias information along different paths
    • Example(s) on p 11-12

4.4 Deutsch’s improvement on Landi-Ryder

• Abstract interpretation flavoured approach
• Goes beyond the 'k-limiting' limitation by parametrizing the analysis
• Particularly powerful for structures like lists, trees etc.
    • Can help to infer facts such as "the ith element of x is aliased to the (i + 3)rd element of y" etc.
    • But of course at a greater cost!
• The abstract lattice of information has two components
    • 'Symbolic access paths' (SAPs)
        • E.g., p->(left->)^i->right, where the i's are called the 'coefficients' of the SAP
        • So, symbolic access paths represent an infinity of objects
        • An alias pair such as <p->(left->)^i, q->(right->)^j> would mean that q dereferenced j times through right is aliased to p dereferenced i times through left.
        • Provided a given relationship between i and j is satisfied (say, i = j+3 for the previous example; see the sketch below)
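        • A hypothetical C fragment showing the kind of fact SAPs can express (the struct and field names are made up):

              struct node { struct node *next; int data; };

              void peek(struct node *x) {
                  struct node *y = x->next->next->next;
                  /* SAP-style alias: <x->(next->)^i, y->(next->)^j> under the
                     constraint i = j + 3, i.e. the (j+3)rd successor of x is
                     the jth successor of y; a k-limited analysis loses this
                     information beyond depth k                                 */
                  (void)y;
              }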

    • A lattice to represent relations among the coefficients
        • Basically constraints among integers
        • Hypothesis 2.1 (p 3 of paper) gives the list of properties that must be satisfied by the operators on the parameter lattice.
        • So, one can use any lattice to approximate numerical relationships
            • Kildall's or Halbwachs's or intervals etc.
        • This is the 'parameter' lattice
        • Used to 'solve' the constraints between coefficients
• The lattice for the analysis is a set of elements, where each element is a pair of symbolic access paths and a set of constraints on the coefficients occurring in the access paths.
    • Needs the coefficient variables to be canonically named to make it easy to combine them and to simplify the alias relation representation.
    • So, two alias elements <(f1, f2), K> and <(f1, f2), K'> where K != K' cannot exist in the same alias relation.
• Join on these elements is defined point-wise
    • Combine the two sets of alias pairs first, retaining the constraints as is.
    • If there are two elements in the combined set of the form <(f1, f2), K> and <(f1, f2), K'>, then unify them into one element <(f1, f2), K \lub K'>

    • Example in Fig 3, p 4 of paper
• Infinite-height lattice, as SAPs are unbounded and lattices of intervals etc. needn't be bounded. So, widening operators are needed.
    • Widening of the parameter lattice is through its widening operator, while widening of SAPs is through a Factor function, which basically converts repetitive structures in a SAP to a parametrized form
    • E.g.: Factor(x->tl->tl->hd) = x->(tl->)^i hd with the constraint {i = 2}
• The Gen / Kill functions
    • The Kill function takes an access path and removes any alias pair from the alias relation that has this access path or an extension of it.
    • Kill is used for assignment (aliases involving the LHS are killed), and when a variable goes out of scope.
    • Kill function example in Fig 5, top of p 5.
    • Similarly, gen is used to introduce aliases on assignment. Considers three cases: lhs unaliased (easiest), lhs aliased but not to extensions of rhs, and lhs aliased to extensions of rhs (hardest – leads to cycles).
    • Simple flow function in Fig 7 at the bottom of p 5.
• Detects more than Landi/Ryder but is also more expensive due to the more expressive lattices
    • Sec 4 talks of complexity

4.5 Inter-procedural points-to analysis with function pointers

• Emami-Ghiya-Hendren, 1994
• New abstraction called points-to
    • A different (incomparable) abstraction to aliases

• The abstraction is an ordered pair of 'names' (stack locations) along with a flag to indicate a definite or possible relationship.
    • <p, q, D> says p definitely points to (is not aliased to) q.
    • <p, q, P> says p may point to q.
    • The definitely-points-to (D) variety of points-to helps to kill more efficiently. For e.g., given *p = ... where p is a two-level pointer, if p points to q definitely, then the things pointed to by q can be killed and replaced with what this assignment yields. Else, one can only add to the set of things pointed to by q. (See the sketch below.)
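    • A small C sketch (hypothetical) of how the D flag enables a 'strong' kill:

          int x, y;
          int *q;
          int **p;

          void strong_update(void) {
              p = &q;     /* <p, q, D>: p definitely points to q               */
              q = &x;     /* <q, x, D>                                         */
              *p = &y;    /* since <p, q, D>, the old fact <q, x, D> is killed
                             and replaced by <q, y, D>; with only <p, q, P> we
                             could merely add <q, y, P> and keep <q, x, P>      */
          }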

    • This relation is not symmetric; in fact it is asymmetric if the program is well-typed, i.e. it cannot be the case that p points to q and q points to p.
• A set of such abstractions at each point
    • Related by subset inclusion and with P >= D
    • The larger the set, the more approximate the information. Similarly, the fuzzier the information (P), the more abstract.
• Aliasing represents transitive closures, which points-to eliminates
    • <p, q> and <q, r> in the points-to set basically represent the (*p, q), (*q, r), (**p, *q) and (**p, r) aliases.
    • But the alias set may not contain all transitive-closure elements, so aliases derived from points-to may be too conservative (Sec 7.1, Fig 9)
    • Also cases where aliasing kills conservatively due to absence of information, so points-to gives better results (Sec 7.1, Fig 8)
    • So incomparable to aliasing
• 'Invisible' variables are a pain here too.

    • See p 6, Sec 4.1
• Presents a functional or compositional solution
    • A semantic function for each syntactic construct such as sequence, if-then-else, while etc.
    • These functions are defined in terms of other functions
    • Basic functions are simple enough
        • Fig 1, p 5
    • Join is 'set union' with the additional job of joining the D and P bits
        • D \lub P = P; and x \lub x = x.
        • The weird case is <x, y, D/P> in one set while x doesn't point to anything in the other set (uninitialized). Just go ahead with <x, y, P>, as one can't do better (other than perhaps also raising a flag saying x may be uninitialized here)
• Inter-procedural analysis handled compositionally
    • Context sensitivity achieved through 'invocation graphs'
        • An orthogonal notion to the points-to analysis
        • Basically use the call tree
        • Examples in Fig 2, p 6
        • Recursion handled through 'pseudo-cycles' leading to fixed-point computation
    • Therefore, each invocation is handled in its context
        • So a procedure may be analyzed lots of times!
• Handles function pointers seamlessly

    • By building the invocation graph dynamically
• Heap objects
    • Typically, most algorithms (including Landi/Ryder and this one) are adapted to accommodate heap objects and dynamic allocation
    • Different possible approximations of the heap
        • Map all of the heap to one abstract object
        • Map all allocations in one procedure to an object
        • Map each allocation 'site' (i.e. program statement) to a separate heap object: most commonly used (see the sketch at the end of this subsection)
        • Note: all are sound, since they do not break up one 'real' location into multiple abstract locations, but may collapse multiple real locations into one abstract location. So, they may result in spurious points-to's (aliases) without losing any real points-to (alias).
    • Given a heap approximation, the rest of the analysis is normal, just that your program now has that many more (a finite, statically determined number of) objects corresponding to heap objects.
• Example program and results for inter-procedural analysis in Fig 6, p 9.
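• A minimal C sketch (hypothetical) of the allocation-site heap abstraction:

      #include <stdlib.h>

      int *make(void) {
          return malloc(sizeof(int));    /* abstract heap object: site S1       */
      }

      void use(void) {
          int *a = make();               /* a points to S1                      */
          int *b = make();               /* b also points to S1, so a and b     */
                                         /* look aliased: spurious but sound    */
          int *c = malloc(sizeof(int));  /* site S2: kept distinct from S1      */
          free(a); free(b); free(c);
      }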

4.6 Flow-insensitive analyses for points-to

• Too basic an analysis
    • Cannot afford to be expensive
    • Most recent research focussed on flow-insensitive approaches
• Flow-insensitive algorithms
    • Analysis ignoring control flow
    • In other words, statements can execute in any order
    • Or, the CFG is a complete graph
    • Note: this is a conservative (sound) assumption – more paths than possible
    • Typically leads to simpler but more approximate (less complete) analyses
    • No 'point-wise' information: information is computed at the 'program level'
    • For points-to, it means a 'name' points to a set of 'names' 'throughout' the program (see the sketch below)
        • Soundness requires this set to be a superset of the points-to set at each point in a flow-sensitive analysis
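    • A tiny hypothetical example of the flow-sensitive v/s flow-insensitive difference:

          int x, y;
          int *p;

          void g(void) {
              p = &x;    /* flow-sensitive: pts(p) = {x} here                   */
              p = &y;    /* flow-sensitive: pts(p) = {y} here                   */
          }
          /* flow-insensitive: one program-level answer, pts(p) = {x, y} –
             a superset of the flow-sensitive set at every program point        */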

• Earlier complexity result by Horwitz: (precise) flow-insensitive points-to analysis without dynamic memory and with only scalar variables is NP-hard if an arbitrary number of dereferences is allowed in expressions
• Chakravarthy (2003) presents a polynomial algorithm for the flow-insensitive case when no dynamic memory is allowed, variables are scalars and the program is well-typed (this is the difference from Horwitz), even if pointers and expressions are multi-level.
    • Useful theoretical complexity result showing that flow-insensitive analysis is cheaper than flow-sensitive analysis.
    • Remember: the corresponding case is NP-hard with flow-sensitive algorithms.
    • Algorithm based on the neat, simple idea that points-to graphs are acyclic in nature (in the absence of structures). In fact, they are 'hierarchic' in the sense that nodes at one 'level' can only point to nodes at the next 'level'.
• Two basic popular algorithms for 'real' languages …

    • Both based on non-standard type inference
    • Andersen's algorithm: based on inclusion constraints or sub-type constraints
    • Steensgaard's algorithm: based on equality constraints or type unification
• … and improvements thereon

4.7 Andersen's algorithm [acknowledgements to Sriram Rajamani at MSR for course material]

• Information or type structure:
    • Locations: l0, l1, …
    • Type: Location | Ref ({type1, type2, …})
    • Ignoring procedures for now …
• Basic idea: on every assignment, the type of the RHS is a subset of the type of the LHS
    • That is, the lhs points to what the rhs points to
    • But may point to more things
    • a = b: Pts-to(a) superset Pts-to(b)
        • Inclusion or subset constraints
    • a = &b: Pts-to(a) contains Loc(b)
• Complete typing rules (Slide 24 of Sriram's Lecture9.pdf)
• Running over the program gives a set of type (subset or inclusion) constraints
• Solving them for the 'best' solution (smallest types) gives a typing representing the points-to information
    • Can be looked at as a typing problem or, equivalently, a set constraint problem
• Example (slides 25 – 30 of Sriram's Lecture9.pdf)
• Results computed by Andersen's analysis are often very close to flow-sensitive algorithms on 'real programs'
    • Empirical evidence

• O(N³) cost
• Typing constraints represented as a constraint graph
    • Each node represents a location and each edge represents an inclusion constraint
    • Edges corresponding to 'complex' inclusion constraints (Hardekopf paper, page 3, Sec 3, Fig 1 and thereabouts) are added dynamically.
        • Basically the cases corresponding to p = *q and *p = q.
        • In the former, points-to(p) \superseteq points-to(v) \forall v \in points-to(q). So, points-to(p) needs to be 'adjusted' when the points-to set of something pointed to by q changes!
        • Vice-versa in the second case
        • These cases need 'back-patching' or equivalent
    • Done by computing a dynamic transitive closure of this graph, i.e. pushing points-to information along edges (the sketch below lists the constraints generated for each statement form)
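    • A sketch of the constraints generated for the basic statement forms (hypothetical C fragment, constraints as comments):

          int x;
          int *a, *b;
          int **p;

          void constraints(void) {
              a = &x;    /* base:    loc(x) \in pts-to(a)                       */
              p = &b;    /* base:    loc(b) \in pts-to(p)                       */
              b = a;     /* simple:  pts-to(b) \superseteq pts-to(a)            */
              b = *p;    /* complex (a = *b form): \forall v \in pts-to(p).
                            pts-to(b) \superseteq pts-to(v)                     */
              *p = a;    /* complex (*a = b form): \forall v \in pts-to(p).
                            pts-to(v) \superseteq pts-to(a)                     */
          }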

4.8 Steensgaard’s algorithm

• Difference from Andersen: treat assignments as 'bi-directional', rather than uni-directional
    • 'After' assignments, the LHS and RHS types are unified, i.e. they become the same

    • So, the LHS would point to everything the RHS does, and vice-versa
    • Sound to do so, because things would only point to more things than expected
• Information / type structure is subtly different
    • Type: bottom | ref (Type)
    • Meaning, you either point to nothing (bottom – for non-pointer types) or you point to exactly one 'thing', which can of course point to other things
    • So, each 'thing' represents a set of 'variables', since obviously one variable can point to many variables
    • Contrast with Andersen, where the type was Ref ({Type1, Type2 ...})
• Difference with Andersen in terms of the resultant points-to graph:
    • Andersen's points-to graph has nodes that represent one program element each, and the out-degree of each node is >= 0.
    • Steensgaard's points-to graph has nodes that represent a set of elements, and the out-degree of each node is <= 1.
    • Essentially, you collapse program entities together, so that if one of them points to a set of entities, then any other pointer pointing to one of these entities points to all of them (see the sketch below)
        • Partition variables into equivalence classes that all point to the same thing – which is of course another equivalence class.
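    • A small hypothetical example of the collapsing effect:

          int a, b, c;
          int *p, *q;

          void collapse(void) {
              p = &a;
              p = &b;
              q = &c;
              q = p;
              /* Andersen:    pts(p) = {a, b}, pts(q) = {a, b, c}
                 Steensgaard: q = p unifies the targets of p and q, so both
                              end up pointing to one class {a, b, c}            */
          }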

• Running over the program gives a set of type unification or equality constraints
• Typing rules: Slide 18, Lecture9.pdf of Sriram (except for the assignment rule, either Fig 3 of the paper or slide 22 of Lecture9.pdf)
    • To allow for untyped programs, and a 'cleaner' type representation where unnecessary boxes are not unified
• Solving for the best solution gives the typing
    • This involves both adding arrows between boxes and merging boxes (and corresponding arrows).
    • Slightly super-linear by using appropriate data structures
        • Need delayed union for the bottom types – hence each statement is not fully processed immediately
        • A fast union-find data structure for representing the equivalence classes so that 'delayed updates' can be done efficiently (a generic union-find sketch appears at the end of this subsection).
        • Basically, whether a type remains bottom or becomes non-bottom may be clear only later in the processing
• Faster than Andersen's approach but also much more imprecise
• Example program and algorithm working (Slides 11-16 of Sriram's Lecture9.pdf)
• Comparative results of Steensgaard and Andersen on Slides 31 and 33
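• A generic union-find sketch in C (path compression and union by rank) of the kind such an implementation might use – not the paper's code:

      #define N 1024                     /* assumed maximum number of type variables */

      static int parent[N];
      static int rank_[N];

      void uf_init(void) {
          for (int i = 0; i < N; i++) { parent[i] = i; rank_[i] = 0; }
      }

      int uf_find(int x) {               /* find with path compression          */
          while (parent[x] != x) {
              parent[x] = parent[parent[x]];
              x = parent[x];
          }
          return x;
      }

      void uf_union(int a, int b) {      /* union by rank                       */
          int ra = uf_find(a), rb = uf_find(b);
          if (ra == rb) return;
          if (rank_[ra] < rank_[rb]) { int t = ra; ra = rb; rb = t; }
          parent[rb] = ra;
          if (rank_[ra] == rank_[rb]) rank_[ra]++;
      }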

4.9 Manuvir Das’s improvement on Steensgaard

• Aims to improve Steensgaard's precision without losing speed, with a simple extension
    • Empirical 'observation' that it is first-level unifications that seem to make Steensgaard lose precision relative to Andersen (see the sketch below)
    • Example from Fig 1, p 2 of Das's paper, starting from the real program, to its flow-insensitive equivalent, to the results for Andersen, Steensgaard and Das.
• Uses subset inclusion at the first level of pointers (for any assignment) and unification thereafter
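    • A small hypothetical example of where first-level unification hurts Steensgaard but not Das:

          int a, b;
          int *p, *q, *r;

          void firstlevel(void) {
              p = &a;
              q = &b;
              r = p;
              r = q;
              /* Steensgaard: r = p and r = q unify everything, so
                              pts(p) = pts(q) = pts(r) = {a, b}
                 Andersen and Das (subset at the first level):
                              pts(r) = {a, b}, pts(p) = {a}, pts(q) = {b}       */
          }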

• Quadratic complexity
    • In between Andersen and Steensgaard
• Set constraint specifications for this (Slide 37, Lecture9.pdf of Sriram)
    • Compare with the corresponding typing rules for Andersen and Steensgaard (Slides 35 and 36 of Sriram's Lecture9.pdf)
• Typing rules in Das's paper, p 4-5 (Figs 3, 4).
    • Each box or cell has a set of identifiers and a 'value' which is either bottom (pointing to nothing) or ptr(type), where a type is again a box or cell.
    • The typing rules define a <= operator between the values in a cell, so that a <= b either if a = bottom, or if a = (id1, type1) and b = (id2, type2) with id1 subset id2 and type1 = type2.
    • So, when x has type (id1, t1) and y has type (id2, t2), the assignment x = y is well typed if t2 <= t1. Because if t2 is not a pointer (i.e. bottom), then t1 need not be a pointer either. If t2 is a pointer (id3, t3), t1 must be some (id4, t3) such that id3 subset id4 – meaning that x (and all others in id1) point to all the things in id3 (since id4 is id3's superset). Moreover, this lot (id4, as well as id3) together points to t3. So, at the first level there is no unification, and at later levels there is unification.
• Empirical results for C suggest performance slightly worse than Steensgaard but precision nearly the same as Andersen
• Both Steensgaard's and Das's algorithms have been applied to really large programs, up to Microsoft Word (1.4 MLOC)

4.10 Hardekopf-Lin improvement to Andersen

• Improves the performance of the dynamic transitive closure in Andersen's algorithm
    • Results identical
• Andersen's typing rules viewed as three kinds of constraints (the complex kind has two forms):
    • Base: statements such as a = &b, where the points-to constraint is loc(b) \in pts-to(a)
    • Simple: statements such as a = b, where the constraint is pts-to(a) \superseteq pts-to(b)
    • Complex1: a = *b, where \forall v \in pts-to(b). pts-to(a) \superseteq pts-to(v)
    • Complex2: *a = b, where \forall v \in pts-to(a). pts-to(v) \superseteq pts-to(b)
• Typing constraints viewed as a graph
    • Nodes represent names / variables
    • An edge from a to b says pts-to(b) \superseteq pts-to(a)
    • Each node also has a points-to set associated with it, representing the actual solution to the problem
• The graph initially has nodes corresponding to variables, edges corresponding to simple constraints, and points-to sets corresponding to base constraints.
• Final points-to sets computed by:
    • Transitive closure by propagating points-to information along edges
    • Dynamically adding edges corresponding to the complex constraints, requiring a dynamic transitive closure computation.
        • For each v \in pts-to(b), an edge (v, a) is added to the graph (and the transitive closure computed) for Complex1 constraints

        • For each v \in pts-to(a), an edge (b, v) is added to the graph (and the transitive closure computed) for Complex2 constraints
• Simple worklist algorithm to compute the dynamic transitive closure given in Fig 1, p 3 of paper
• The key problem is in dealing with cycles (see the sketch below)
    • In this case, all nodes in a cycle have the same typing information
    • A major source of inefficiency is in not detecting cycles and therefore propagating information slowly through them
    • Eagerly trying to detect cycles may waste effort in going through the graph without finding useful information
    • Delaying the detection of cycles will make it slower, as information slowly gets pushed around from node to node in the cycle
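• A tiny hypothetical example of how cycles arise from simple constraints alone:

      int x;
      int *p, *q, *r;

      void cycle(void) {
          p = &x;    /* base:   loc(x) \in pts-to(p)                            */
          q = p;     /* simple: edge p -> q                                     */
          r = q;     /* simple: edge q -> r                                     */
          p = r;     /* simple: edge r -> p, closing the cycle p, q, r          */
          /* all three nodes must end up with the same points-to set {x},
             so they can be collapsed into a single node                        */
      }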

• Two independent approaches to identifying cycles
    • Lazy approach
    • Hybrid approach
    • Both based on heuristics
    • The two are independent, so they can be combined
• Lazy approach
    • Keep building the graph as usual
    • When two adjacent nodes are identified to have similar types (i.e. the same points-to sets), invoke cycle detection
    • May not catch all cycles as soon as they are created
        • Hence not eager, so lazy
        • Instead, detects them when their effect (same points-to sets) is visible
        • May detect cycles late, but may also save on unnecessary effort in cycle detection
    • Algorithm in Fig 2, p 4 of paper

    • Remember edges that have contributed to a cycle so you don't look for them again, even if you come across them
• Hybrid approach
    • Do a 'pre-analysis' (offline analysis) of the program
    • Identify 'obvious' cycles in the constraint graph (from simple constraints)
        • These are collapsed immediately
    • Build an auxiliary graph with 'potential' cycles using 'ref nodes'
        • The ref node for a corresponds to *a and is a 'stand-in' for whatever a may point to in the program
        • Fig 3, p 4 of paper
    • Cycles identified in the auxiliary graph are used during the online analysis
    • Linear time complexity to build and collapse the initial graph and build the auxiliary graph
    • During the 'actual' analysis, potential cycles may become actual cycles
    • Changed algorithm and illustrative examples from the paper
        • When a new node n is taken up for processing, check if the ref-node of this node belongs to a cycle in the auxiliary graph
        • If it does, collapse the points-to information of each v in pts-to(n) with the other (non-ref) nodes in the auxiliary graph
        • Rest of the algorithm identical

        • Modified algorithm: p 5, Fig 5
        • Illustrative example in Fig 4, p 5 (along with Fig 3, p 4)
    • Not guaranteed to find all cycles
        • Only those cycles that can be detected through the auxiliary graph
        • But whatever it finds, it finds at the earliest opportunity, because the information has been pre-computed
• Hybrid and lazy detection can also be combined to yield even better results
    • Empirical results on combining the two are very encouraging
    • Has scaled to programs such as Wine (1.4 MLOC) and the Linux kernel (2.1 MLOC)
    • With exactly the same results as Andersen

5. Summing up

• Seen different techniques and algorithms for different kinds of program analyses.
• Most prevalent approaches: abstract interpretation and data-flow analysis.
    • Non-standard type inference and constraint-based analyses are also common and related to these.
    • Some constraint-based techniques (e.g., Saturn from Stanford by Xie and Aiken) are also gaining ground

5.1 Data flow analysis is model-checking of abstract interpretations

• What exactly is the relationship between abstract interpretation and data-flow analysis?
• David Schmidt's paper in 1998
• Connects data flow analysis (dfa) to abstract interpretation (ai) through model checking (mc).
• Basically, the Achilles heel of dfa is the 'all paths are executable' assumption
    • Its starting point is a CFG that makes this assumption
    • The CFG itself is an ai – the interpretation of 'traces'!
    • DFA discovers whether some properties hold over these traces
    • Therefore, DFA is a model-check of a trace AI!
• Trace-based ai:
    • Basically interpretations where the 'possible execution traces' are captured as a (possibly infinite, but 'regular') tree, but the values may be abstracted.
    • 'Safe simulation' is defined as a relation between a 'concrete' tree and an 'abstract' tree that ensures that every transition in the concrete tree has a corresponding one in the abstract tree.
        • Note that every transition in the abstract tree needn't be so in the concrete one.
        • Useful to check safety properties.
    • Dually, 'live simulation' can also be defined (every transition in the abstract tree will have an equivalent in the concrete one).
• Then comes the 'collecting semantics': just the collection of information from a state in a computation tree (concrete or abstract). So, a collecting semantics can be concrete or abstract.

• Typically, a dfa computes a collecting semantics of properties, i.e. the abstract values in the collecting semantics are the properties of interest.
    • So, properties are expressed in a logic and we want to check if a computation tree (which could be a sub-tree of the whole computation tree – i.e. also a node in the whole tree) satisfies the property.
    • Additionally, a 'weak consistency' requirement is that if the abstraction is a safe simulation and a property holds in the abstraction, then the property must hold in the concrete one.
    • So, as can be seen, this is valid for the 'must' dfa problems and not the 'may' ones.
• The logic for expressing properties happens to be the modal mu-calculus
    • Allows expression of properties such as
        • 'in all next/previous states, p' (box operator),
        • 'in at least one next/previous state, p' (diamond operator),
    • Recursive properties need least/greatest fixed points.
    • The mu-calculus is used for mc – whether a property holds at a state in a tree.
• Therefore, dfa is translated to mc over an ai (the CFG).
    • Using the mu-calculus and some trace-based ai, it is now possible to identify states where some properties hold.
    • Caveat: the weak consistency property is only satisfied by the box operator and not the diamond operator, since diamond is only existential. So, for a given abstract trace, there may be no corresponding concrete trace.
• Dfas are modelled over flow graphs, which are just the 'simplest ais' of the concrete semantics (basically, just ignore values fully). Dfa equations (particularly the 'bit vector' ones) can be rewritten as modal mu-formulae, and the fixed point computation is the same.
• Easy to see why the may problems are 'unsound' in this formulation, as they violate weak consistency – i.e. something deemed live by the analysis may not be live on an actual execution (as expected).
    • But the 'conservative' nature of these dfas (you really want dead variables, not live ones) is obtained by negating the formula, which is a box formula and so consistent.
