Posted: 31-Dec-2015 | Uploaded by: davis-hebert
Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability
Seth Hallem and Eric Watkins
Exhaustive Analysis Papers
• “Precise Interprocedural Dataflow Analysis via Graph Reachability”
– Reps, Horwitz, Sagiv -- POPL 1995
– applies CFL reachability to context-sensitive, interprocedural dataflow analysis
• “Program Analysis via Graph Reachability”
– Reps -- ILPS 1997
– describes two additional applications: interprocedural program slicing and shape analysis
The Reduction to CFL Reachability
• Question 1: What problems can we solve?
• Question 2: How do we set up the problem?
• Question 3: How do we solve the problem?
• Question 4: What is the complexity of this approach?
• Running example: possibly uninitialized variables
What problems can we solve?
• IFDS problems
– Finite set of dataflow facts (D)
– Mapping from edges in the CFG to functions $f: 2^D \to 2^D$
– Each $f$ is distributive with respect to the meet operator $\sqcap$ (union, for possibly uninitialized variables): $f(a \sqcap b) = f(a) \sqcap f(b)$
• Possibly uninitialized variables:
– Each program variable corresponds to a dataflow fact. When that fact holds, the variable may be uninitialized.
– Transfer functions: a variable may be uninitialized if it was just declared without an initializer, or if it is assigned an expression that reads possibly uninitialized variables.
Simple Example
int z;
int main(void) {
    int x, y = 0;   /* {x, z} */
    y = y + x;      /* {x, y, z} */
    z = 0;          /* {x, y} */
}
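• D = {x, y, z}; the domain and range of each transfer function is the power set of D ($2^D$). Each comment lists the facts (possibly uninitialized variables) that hold after its line.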
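The following is a minimal Python sketch of the running example. It uses a hypothetical tuple encoding of statements: `("decl", v)` for a declaration without an initializer, and `("assign", v, R)` for `v =` an expression that reads the variables in `R`. Following the example's comments, it assumes that the global z starts out uninitialized. The sketch replays the comments and checks distributivity over union exhaustively on $2^D$.

```python
from itertools import chain, combinations

# Hypothetical statement encoding (an assumption for this sketch):
#   ("decl", v)       v declared without an initializer
#   ("assign", v, R)  v = an expression that reads the variables in R
def transfer(stmt, facts):
    """Possibly-uninitialized flow function: facts before -> facts after."""
    kind, var = stmt[0], stmt[1]
    out = set(facts)
    if kind == "decl":
        out.add(var)              # just declared: may be uninitialized
    elif facts & stmt[2]:
        out.add(var)              # RHS reads a possibly uninitialized variable
    else:
        out.discard(var)          # RHS is fully initialized
    return frozenset(out)

def powerset(d):
    items = sorted(d)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_distributive(stmt, d):
    """Check f(a | b) == f(a) | f(b) for every pair of subsets of d."""
    subsets = powerset(d)
    return all(transfer(stmt, a | b) == transfer(stmt, a) | transfer(stmt, b)
               for a in subsets for b in subsets)

# Body of main() from the example; z is assumed uninitialized on entry.
program = [
    ("decl", "x"),                           # int x, y = 0;  (x part)
    ("assign", "y", frozenset()),            # int x, y = 0;  (y part)
    ("assign", "y", frozenset({"y", "x"})),  # y = y + x;
    ("assign", "z", frozenset()),            # z = 0;
]
facts = frozenset({"z"})
trace = []
for stmt in program:
    facts = transfer(stmt, facts)
    trace.append(sorted(facts))
print(trace)  # [['x', 'z'], ['x', 'z'], ['x', 'y', 'z'], ['x', 'y']]
```

The first two entries both match the comment `{x, z}` because the declaration line is split into two steps here.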
How do we set up and solve IFDS problems?
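• Inputs to the algorithm:
– Exploded supergraph (next couple of slides)
• Outputs from the algorithm:
– meet-over-all-realizable-paths solution:
• $MRP_n = \bigsqcap_{q \in \mathrm{RPaths}(start_{main},\, n)} pf_q(\top)$, where $pf_q$ is the composition of the dataflow functions along path $q$, and $\top = \emptyset$ when the meet is union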
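The Python below sketches the definition only, not how the value is computed. It applies each path function $pf_q$ to $\top = \emptyset$ and takes the union of the results. It assumes the realizable paths have already been enumerated as lists of flow functions. With loops or recursion there can be infinitely many such paths, which is why the reachability formulation is needed. `decl_x` and `init_x` are hypothetical flow functions.

```python
from functools import reduce

def path_function(flow_fns):
    """pf_q: apply the flow functions along path q, in order."""
    return lambda facts: reduce(lambda acc, f: f(acc), flow_fns, facts)

def mrp(paths, top=frozenset()):
    """Meet over the given (pre-enumerated) paths; the meet here is union."""
    result = frozenset()
    for q in paths:
        result |= path_function(q)(top)
    return result

# Hypothetical flow functions for possibly uninitialized variables.
def decl_x(s):
    return s | {"x"}   # int x;

def init_x(s):
    return s - {"x"}   # x = 0;

# Two paths reach n: one initializes x, the other does not.
print(sorted(mrp([[decl_x, init_x], [decl_x]])))  # ['x']
```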
The Supergraph
Representation Relations
• Each dataflow function $f$ is converted to a representation relation $R_f$, drawn as a graph with $2|D| + 2$ nodes
– $|D|$ input nodes, one for each dataflow fact, plus the node $\Lambda$ (or 0), which corresponds to the empty set
– $|D|$ output nodes plus a $\Lambda$ node
– Edges: $\Lambda \to \Lambda$ always; $\Lambda \to d$ if $d \in f(\emptyset)$; and $d_1 \to d_2$ if $d_2 \in f(\{d_1\})$ and $d_2 \notin f(\emptyset)$
More Representation Relations
• (a) and (b) show representation relations for two functions (at nodes $s_{main}$ and $n_1$)
• (c) and (d) show two ways to compose these relations
– (d) illustrates the need for the $\Lambda$ node in each relation
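The Python sketch below builds $R_f$ using the edge rule: $\Lambda \to \Lambda$ always; $\Lambda \to d$ for $d \in f(\emptyset)$; and $d_1 \to d_2$ for $d_2 \in f(\{d_1\}) \setminus f(\emptyset)$. It then reads $f(S)$ back off the graph and composes two relations by gluing the output layer of one to the input layer of the next. The names `rep_relation`, `apply_rel`, and `compose`, and the two flow functions, are illustrative and not taken from the papers. Dropping the $\Lambda \to \Lambda$ edge from the first relation shows why each relation needs $\Lambda$: a fact that the second function generates from $\Lambda$ is then lost.

```python
from itertools import chain, combinations

LAMBDA = "Λ"  # extra node standing for the empty set (the "0" node)

def rep_relation(f, domain):
    """R_f as a set of (input, output) edges."""
    f_empty = f(frozenset())
    edges = {(LAMBDA, LAMBDA)} | {(LAMBDA, d) for d in f_empty}
    for d1 in domain:
        edges |= {(d1, d2) for d2 in f(frozenset({d1})) if d2 not in f_empty}
    return edges

def apply_rel(edges, facts):
    """Read f(facts) off a relation: outputs reached from facts or from Λ."""
    sources = set(facts) | {LAMBDA}
    return frozenset(b for a, b in edges if a in sources and b != LAMBDA)

def compose(r1, r2):
    """Glue r1's output layer to r2's input layer: a -> b -> c gives a -> c."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

def subsets(domain):
    items = sorted(domain)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# Two hypothetical possibly-uninitialized flow functions.
D = {"x", "y", "z"}

def decl_x(s):
    return s | {"x"}   # int x;

def decl_z(s):
    return s | {"z"}   # int z;

r_f, r_g = rep_relation(decl_x, D), rep_relation(decl_z, D)
composed = compose(r_f, r_g)
ok = all(apply_rel(composed, s) == decl_z(decl_x(s)) for s in subsets(D))

# Without the Λ -> Λ edge in R_f, the fact z generated by decl_z is lost.
broken = compose(r_f - {(LAMBDA, LAMBDA)}, r_g)
print(ok, sorted(apply_rel(broken, frozenset())))  # True ['x']
```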
Exploding the Supergraph
CFL Reachability
• Want to solve the dataflow problem with a reachability query on the exploded supergraph.
• Not all paths in G# are valid, though: calls must be matched with their returns.
• Insight: context sensitivity amounts to matching parentheses, and the language of matched parentheses is a CFL
Context-Sensitivity = CFL
• Assign a unique index to each call site and define a CFL of matching calls and returns.
• Suppose we have two call sites to function P(), which we label i and k
– (i (k )k )i is a valid path
– (i (k )k is a valid path
– (i (k )i is not
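One way to recognize this language is with a stack of open call sites, as in the Python sketch below. The label encoding is an assumption: `"(i"` is the call edge at site i, `")i"` is its return, and anything else is an intraprocedural edge. Unmatched calls are accepted because a path from the program entry may stop inside a callee. A return, however, must close the innermost open call.

```python
def is_realizable(path):
    """Check a sequence of edge labels starting at the program entry."""
    open_calls = []                      # stack of call sites not yet returned
    for label in path:
        if label.startswith("("):
            open_calls.append(label[1:])
        elif label.startswith(")"):
            # A return must match the most recent unmatched call.
            if not open_calls or open_calls.pop() != label[1:]:
                return False
    return True                          # leftover open calls are allowed

print(is_realizable(["(i", "(k", ")k", ")i"]))  # True
print(is_realizable(["(i", "(k", ")k"]))        # True
print(is_realizable(["(i", "(k", ")i"]))        # False
```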
Reachability Algorithm
• Dynamic programming is the key
– Start at the entry point to the program. Follow the edges in G#, recording which dataflow facts we can reach.
– At a procedure call, follow the call. To avoid redoing work, though, maintain a cache of edges that summarize pieces of the computation.
– Summary edges record the effect of an entire procedure: they start at a call site and end at the corresponding return site.
– Path edges record the suffix of a valid path, from the start node of the current procedure to the current node.
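Below is a compact Python sketch of this tabulation idea. It is modeled on the RHS95 algorithm but is not a transcription of it. The supergraph encoding (the dict keys `proc_of`, `start`, `exit`, `facts`, `succ`, `flow`, `calls`, `call_flow`, `ret_flow`) is hypothetical. Flow functions map one fact, including the $\Lambda$ fact `"0"`, to the set of facts it reaches; in other words, they list exploded edges. Path edges are stored as (fact at procedure start, node, fact), and summary edges are memoized per call site.

```python
from collections import defaultdict

ZERO = "0"  # the Λ fact

def tabulate(g, main):
    """Return {node: facts that may hold there} for the hypothetical encoding g."""
    path_edges = set()               # (d1, n, d2): <s_p, d1> -> <n, d2>
    incoming = defaultdict(set)      # (n, d2) -> {d1} over path edges
    summary = defaultdict(set)       # (c, d4) -> {d5}: <c, d4> -> <ret, d5>
    callers = defaultdict(list)
    for c, (callee, _) in g["calls"].items():
        callers[callee].append(c)
    work = []

    def propagate(d1, n, d2):
        if (d1, n, d2) not in path_edges:
            path_edges.add((d1, n, d2))
            incoming[(n, d2)].add(d1)
            work.append((d1, n, d2))

    propagate(ZERO, g["start"][main], ZERO)
    while work:
        d1, n, d2 = work.pop()
        p = g["proc_of"][n]
        if n in g["calls"]:
            callee, ret = g["calls"][n]
            for d3 in g["call_flow"][n](d2):        # descend into the callee
                propagate(d3, g["start"][callee], d3)
            for d3 in g["flow"][(n, ret)](d2) | summary[(n, d2)]:
                propagate(d1, ret, d3)              # step over the call
        elif n == g["exit"][p]:
            for c in callers[p]:                    # record summary edges
                ret = g["calls"][c][1]
                for d4 in g["facts"]:
                    if d1 not in g["call_flow"][c](d4):
                        continue
                    for d5 in g["ret_flow"][c](d2):
                        if d5 not in summary[(c, d4)]:
                            summary[(c, d4)].add(d5)
                            for d3 in list(incoming[(c, d4)]):
                                propagate(d3, ret, d5)
        else:
            for m in g["succ"].get(n, []):
                for d3 in g["flow"][(n, m)](d2):
                    propagate(d1, m, d3)

    result = defaultdict(set)
    for _, n, d2 in path_edges:
        if d2 != ZERO:
            result[n].add(d2)
    return dict(result)

def ident(d):
    return {d}

def only_zero(d):
    return {d} if d == ZERO else set()

# Hypothetical program: main calls P twice; a holds at the first call, b at the second.
g = {
    "proc_of": {"s_m": "main", "c1": "main", "r1": "main", "c2": "main",
                "r2": "main", "e_m": "main", "s_P": "P", "e_P": "P"},
    "start": {"main": "s_m", "P": "s_P"},
    "exit": {"main": "e_m", "P": "e_P"},
    "facts": {ZERO, "a", "b"},
    "succ": {"s_m": ["c1"], "r1": ["c2"], "r2": ["e_m"], "s_P": ["e_P"]},
    "flow": {("s_m", "c1"): lambda d: {ZERO, "a"} if d == ZERO else {d},
             ("r1", "c2"): lambda d: {ZERO, "b"} if d == ZERO else {d} - {"a"},
             ("c1", "r1"): only_zero, ("c2", "r2"): only_zero,
             ("r2", "e_m"): ident, ("s_P", "e_P"): ident},
    "calls": {"c1": ("P", "r1"), "c2": ("P", "r2")},
    "call_flow": {"c1": ident, "c2": ident},
    "ret_flow": {"c1": ident, "c2": ident},
}
solution = tabulate(g, "main")
print(sorted(solution["r1"]), sorted(solution["r2"]))  # ['a'] ['b']
```

In this example, P sees both a and b at its start node. However, a summary edge is only applied at a call site where its source fact holds, so b does not leak back to r1, as it would in a context-insensitive analysis.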
Dynamic Programming Details
Complexity
• Worst case for general CFL reachability is cubic in the number of nodes in the graph
• Can do better for dataflow analysis: $O(E D^3)$ for any distributive problem and $O(\mathit{Call}\, D^3 + h E D^2)$ for h-sparse problems, where E is the number of supergraph edges, Call the number of call sites, and D stands for |D|
– Possibly uninitialized variables is 2-sparse when aliasing is ignored: a variable's status as initialized or uninitialized can only affect itself and one other variable (if it is assigned to that variable)
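To get a feel for the gap between the two bounds, the snippet below plugs hypothetical program sizes into them. Both bounds drop constants, so the ratio is only indicative.

```python
def ifds_bounds(E, Call, D, h):
    """Evaluate both asymptotic bounds for given sizes (constants dropped)."""
    distributive = E * D**3                 # O(E D^3)
    h_sparse = Call * D**3 + h * E * D**2   # O(Call D^3 + h E D^2)
    return distributive, h_sparse

# Hypothetical sizes: 10,000 supergraph edges, 500 call sites, |D| = 100.
print(ifds_bounds(E=10_000, Call=500, D=100, h=2))  # (10000000000, 700000000)
```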
Other Applications
• Interprocedural slicing
– identify all pieces of a program relevant to a particular statement
• Shape Analysis
– For any DAG data structure, determines a superset of the possible shapes for that data structure.
– Each dataflow fact corresponds to a single possible shape.
– Problem: there are infinitely many shapes. Solution: define the shape at program point q in terms of the shapes at preceding program points.
– The ILPS paper has an example of shape analysis of a linked list.
The other papers
• “Demand Interprocedural Dataflow Analysis”
– Horwitz, Reps, Sagiv -- FSE 1995
• “Demand-driven Computation of Interprocedural Data Flow”
– Duesterwald, Gupta, Soffa -- POPL 1995
• Together, they provide two frameworks for transforming any IFDS analysis into a demand-driven analysis
Steps to Demand-driven analysis
• Define problem in the IFDS framework
• Reverse the flow functions, or reverse the flow edges
• Start with an initial query ⟨d, n⟩: does fact d hold at node n?
• Propagate the query backwards until solved
Reversing dataflow
• In Duesterwald et al., the dataflow problem is specified with flow functions
– Reverse the functions
• For CFL problems, the problem is represented as a set of edges
– Just reverse the edges
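Here is a Python sketch of the edge-reversal approach for a single query. It assumes an intraprocedural exploded graph, so there is no call/return matching. Fact d holds at n exactly when $\langle n, d\rangle$ is reachable from $\langle s_{main}, \Lambda\rangle$. The query therefore searches the reversed edges starting from $\langle n, d\rangle$ and stops as soon as it reaches the start node. The node names n0 through n4 and the helper `rel` are hypothetical. The edges encode the simple example, with the global declaration `int z;` treated as the first statement.

```python
from collections import defaultdict, deque

ZERO = "0"  # the Λ fact

def reverse_edges(edges):
    """Reverse each exploded-supergraph edge u -> v into v -> u."""
    rev = defaultdict(list)
    for u, v in edges:
        rev[v].append(u)
    return rev

def holds(edges, start, query):
    """Answer query = (node, fact) by searching backwards for <start, 0>,
    stopping as soon as it is found (early termination)."""
    rev = reverse_edges(edges)
    goal = (start, ZERO)
    seen, frontier = {query}, deque([query])
    while frontier:
        v = frontier.popleft()
        if v == goal:
            return True
        for u in rev[v]:
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return False

def rel(n, m, pairs):
    """Hypothetical helper: exploded edges for CFG edge n -> m."""
    return [((n, a), (m, b)) for a, b in pairs]

# int z;  int x, y = 0;  y = y + x;  z = 0;
edges = (rel("n0", "n1", [(ZERO, ZERO), (ZERO, "z"), ("x", "x"), ("y", "y")])
         + rel("n1", "n2", [(ZERO, ZERO), (ZERO, "x"), ("z", "z")])
         + rel("n2", "n3", [(ZERO, ZERO), ("x", "x"), ("x", "y"), ("y", "y"), ("z", "z")])
         + rel("n3", "n4", [(ZERO, ZERO), ("x", "x"), ("y", "y")]))

print(holds(edges, "n0", ("n4", "y")), holds(edges, "n0", ("n4", "z")))  # True False
```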
Example: CCP (copy constant propagation)
Notation
• x – set of dataflow facts
• $x_w$ – dataflow fact for variable w
• $f_n(x)_w$ – transfer fn for variable w at node n
• [w = c] – set of dataflow facts, where the fact for variable w equals c
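To make the notation concrete, here is a hedged Python sketch of forward CCP flow functions. Several choices are assumptions for illustration: representing $x$ as a dict from variables to values, the `TOP`/`BOTTOM` markers, and starting every variable at `BOTTOM` (not a constant) on entry.

```python
TOP, BOTTOM = "TOP", "BOTTOM"   # hypothetical lattice markers

def meet_value(a, b):
    """Meet of two values: TOP is the identity, distinct constants give BOTTOM."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def transfer(stmt, x):
    """f_n(x) for one node; x maps each variable w to x_w."""
    kind, *args = stmt
    y = dict(x)
    if kind == "const":          # w = c  yields  [w = c]
        w, c = args
        y[w] = c
    elif kind == "copy":         # w = v  yields  f_n(x)_w = x_v
        w, v = args
        y[w] = x[v]
    return y                     # any other node: identity

entry = {"a": BOTTOM, "b": BOTTOM, "c": BOTTOM}
x1 = transfer(("const", "a", 5), entry)
x2 = transfer(("copy", "b", "a"), x1)
left = transfer(("copy", "c", "b"), x2)     # one branch: c = b
right = transfer(("const", "c", 5), x2)     # other branch: c = 5
joined = {w: meet_value(left[w], right[w]) for w in entry}
print(joined)  # {'a': 5, 'b': 5, 'c': 5}
```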
Query Algorithm
• Worklist holds the set of outstanding queries
• While not empty, remove a query
• Propagate backwards one node in the flowgraph
• For a function call, create a backwards summary for that function and apply it
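The Python sketch below shows the worklist loop for CCP queries. Each query asks whether `w == c` holds after node n on every path from the entry:

- A copy `w = v` turns it into a query about v.
- A constant assignment to w answers it.
- Any failure ends the search early.

The sketch makes three simplifications. It is intraprocedural only, with no backward summaries for calls. Queries already raised are treated as answered, which is how loops are handled. The statement encoding is hypothetical.

```python
# Hypothetical node encoding: ("const", w, c) for w = c, ("copy", w, v)
# for w = v, ("nop",) for anything that does not assign.
def query(stmts, preds, var, const, node):
    """Demand-driven CCP sketch: does var == const hold after `node` on
    every path from the entry? Intraprocedural only (an assumption)."""
    worklist = [(var, node)]
    seen = {(var, node)}               # queries already raised
    while worklist:
        w, n = worklist.pop()
        kind, *args = stmts[n]
        if kind == "const" and args[0] == w:
            if args[1] != const:
                return False           # early termination
            continue                   # resolved: true along this path
        if kind == "copy" and args[0] == w:
            w = args[1]                # reverse the copy: ask about v instead
        if not preds[n]:
            return False               # reached the entry unresolved
        for p in preds[n]:
            if (w, p) not in seen:
                seen.add((w, p))
                worklist.append((w, p))
    return True

# 1: a = 5; 2: b = a; 3: loop head; 4: c = b; 5: c = 5; 6: join, loops back to 3
stmts = {1: ("const", "a", 5), 2: ("copy", "b", "a"), 3: ("nop",),
         4: ("copy", "c", "b"), 5: ("const", "c", 5), 6: ("nop",)}
preds = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [3], 6: [4, 5]}
print(query(stmts, preds, "c", 5, 6), query(stmts, preds, "c", 7, 6))  # True False
```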
Query Propagation
More notation
• $r_p$ – entry node for procedure p
• m, n – normal nodes
• $f_m$ – reverse dataflow fn for node m
• $N_{call}$ – all nodes that are call sites
• call(m) – the procedure called at node m
• $(r_p, e_p)$ – summary fn for procedure p
Backwards edge propagation
Query Algorithm Efficiency
• Optimizations: function summaries, early termination, query result cache
• In the worst case, it’s the same as exhaustive analysis
• Some problems work better than others for demand-driven analysis.
– It depends on how much information you need to answer queries, or how many queries need to be made.
Conclusions
• Demand-driven analysis is a powerful idea
• Saves time and space, but in the worst case it’s no better than exhaustive analysis
• Only works for distributive problems
• The two approaches to demand-driven analysis are equivalent
Discussion
• Are these algorithms generally applicable?
• Are they fast?
– The papers give no evidence, but the answer is yes (see ESP in a couple of weeks)
• Why are they efficient (beyond the complexity guarantee)?
• Is it always cheap to compute the exploded supergraph?
– How can an imprecise alias analysis influence this step and the overall performance of the algorithm?