8/13/2019 HRA Project Report
B.TECH PROJECT REPORT
on
HEAP REFERENCE ANALYSIS AND ITS IMPLEMENTATION IN GCC
SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF
BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE AND ENGINEERING
Submitted by: Pratik Patre
Niranjan Viladkar
Waman Virgaonkar
Under the guidance of
Dr. C. S. Moghe
Professor, Computer Science and Engineering, VNIT
&
Dr. U. P. Khedker
Professor, Computer Science and Engineering, IITB
Visvesvaraya National Institute of Technology, Nagpur
2010-2011
Visvesvaraya National Institute of Technology, Nagpur
2010-2011
CERTIFICATE
This is to certify that the project work entitled HEAP REFERENCE ANALYSIS AND ITS
IMPLEMENTATION IN GCC, is a bonafide work written by Mr. Pratik Patre, Mr. Niranjan
Viladkar and Mr. Waman Virgaonkar in the Electronics and Computer Science Engineering
Department, Visvesvaraya National Institute of Technology, Nagpur, in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science and
Engineering.
Dr. C. S. Moghe
Professor,
Electronics and Computer Science Engineering,
VNIT, Nagpur

Dr. K. D. Kulat
Head of Department,
Electronics and Computer Science Engineering,
VNIT, Nagpur
ACKNOWLEDGEMENTS
We take this opportunity to acknowledge, with a deep sense of gratitude, our project guides Dr. C. S. Moghe, Professor, Department of Electronics and Computer Science Engineering, VNIT
Nagpur, and Dr. U. P. Khedker, Professor, Department of Computer Science, IITB, for their
invaluable guidance, motivation, and support, which have led to the successful completion of this
project.
We also take this opportunity to offer our sincere thanks to Dr. K. D. Kulat, Head of Department,
Department of Electronics and Computer Science Engineering, VNIT, Nagpur, for providing the
requisite facilities needed to complete the project. We would also like to thank all the teaching
and non-teaching staff for supporting us.
ABSTRACT
Garbage in programs is defined to be unused data. However, current garbage collectors
approximate it as unreachable data. This is due to the lack of effective analysis techniques for
heap data. Applying current data flow analysis techniques to heap references is difficult because
they are mature enough for static data but not for heap data. In this project we put forth a data
flow analysis technique for heap references.
Our technique for collecting garbage is based on liveness analysis, which approximates unused
data very closely. This analysis uses access graphs as data flow information; these capture the
pattern of heap reference accesses. Since access graphs are bounded and the operations
defined on them are monotonic, we can use the data flow analysis framework and all its standard
results.
Table of Contents
1. Introduction
1.1. Motivation
1.2. The solution
1.3. Related work
1.4. Challenges
1.5. Contributions
1.6. Organization of the report
2. Data Flow Analysis
2.1. Program analysis
2.2. Data flow analysis abstraction
2.3. Data flow analysis schema
3. Explicit Liveness Analysis of Heap
3.1. Program to be analysed
3.2. Capturing liveness of heap
3.3. Capturing liveness using access paths
3.4. Capturing liveness using access graphs
3.5. Other analyses
3.6. Implementation in GCC
4. Overview of GCC
4.1. Intermediate representation
4.2. GCC Pass
4.3. Adding a GIMPLE interprocedural pass
4.4. Building a compiler from GCC
5. Pass Details
5.1. General outline
5.2. Visiting each statement
5.3. Identifying assignment statements
5.4. Identifying pointer type statements
5.5. Generate access path set
6. Access Graph Library
6.1. Files
6.2. Formal definitions of the data structures
6.3. The data structures
6.4. Operations on access graphs
7. Implementation of Explicit Liveness Analysis in GCC
7.1. The main function
7.2. Explicit liveness analysis
8. Conclusion
9. References
Table of Figures
Figure 1-1: Motivating Example of HRA
Figure 2-1: A code to illustrate DFA
Figure 2-2: General algorithm for DFA
Figure 3-1: Capturing live objects on the heap
Figure 3-2: Computation of ELIn and ELOut
Figure 3-3: Flow functions for liveness
Figure 3-4: Unbounded access path example
Figure 3-5: Set of access paths represented using access graphs
Figure 3-6: Summarization in access graphs
Figure 3-7: Liveness capturing equations for assignment statement
Figure 3-8: Liveness capturing equations for function call statement
Figure 3-9: Liveness capturing equations for return statement
Figure 3-10: Liveness capturing equations for use statement
Figure 3-11: Computation of ELIn for section 3.4.3
Figure 3-12: ELIn and ELOut definitions
Figure 3-13: Solution to Figure 1-1
Figure 6-1: Examples of operations on access graphs
Figure 7-1: Main data structure
Figure 7-2: Data structure for liveness analysis
Figure 7-3: General Algorithm
Figure 7-4: Computation of ELIn
Figure 7-5: Computation of LDirect
Figure 7-6: Calculation of EKillPath
Figure 7-7: Calculation of LTransfer
1. Introduction
Program analysis techniques, especially data flow analysis techniques, are employed to find
various properties of data used in a program. This summarization of properties of data has
enabled us to perform validation, verification and various optimizations on a program. These
techniques have matured significantly over time for static data, i.e. data allocated on the stack and
in the static area. However, analysis of data allocated on the heap has not reached the same level of
maturity.
Garbage is unused data in a program; it causes memory leaks and is mainly present on the heap. The
current inability to analyse heap data has prevented efficient garbage collection. Taking this
problem as our main motivation, we develop a technique for analysis of heap data [5] for
solving the problem of garbage collection. We also implement the analysis in GCC to
obtain a working model of the analysis.
1.1. Motivation
Data is allocated on the stack or the heap. Data allocated on the stack has fixed size and fixed lifetime,
depending on function scope or block scope. This fixed lifetime makes it easy to allocate and de-allocate stack data. Allocating data on the heap gives us the flexibility of variable
size and variable lifetimes. However, the variable lifetime of heap data makes the question of
de-allocating heap data a difficult one.
Traditionally, liveness of heap data has been approximated by reachability. Heap data that
is unreachable is considered garbage and de-allocated. However, what if some data on the heap
is reachable but never used after a certain program point? That heap data should also be
treated as garbage and de-allocated. However, current analysis techniques are not powerful
enough to find such data. Solving this problem and implementing the solution in GCC is our main
motivation.
1.2. The solution
We perform static analysis of the program, extracting properties of heap data accesses, and find
the data that is unused beyond each program point. We set all the references to this heap data to null.
That data is then unreachable and will be collected by conventional garbage collectors. This is
known as the Cedar Mesa folk wisdom. This would be done by analysing four properties of heap
references: explicit liveness, aliasing, availability and anticipability. In accordance with
these analyses, null assignments are decided upon and checked for safety and profitability.
However, we limit ourselves to explicit liveness analysis in this project. We implement
explicit liveness analysis in GCC as an implementation of our approach. GCC is a widely used
compiler and supports many front-end languages and back-end machines. GCC also provides
a good API for interfacing with the program and manipulating it. Hence we implement
our analysis in GCC. Since the runtime environment of a C program does not guarantee a garbage
collector, we have to explicitly free the memory when all aliases to an object are nullified.
1.2.1. Illustrative Example
We present an example to illustrate our approach. Figure 1-1(a) shows the program operating
on the heap. Figure 1-1(b) shows the memory graph. Root variables are on the stack and the
actual objects corresponding to the root variables are in the heap. The heap is represented as a
directed graph with entry nodes on the stack, objects represented as nodes, and links, i.e.
references, represented as directed edges. Before execution of line 5, w always refers to ma,
as represented by a solid edge. Depending on whether the while loop executes zero, one,
two or three times, x refers to ma, mb, mc or md, as represented by dashed edges. Similarly, y refers
to mi, mf, mg or me. mk is an unreachable object, while variable z does not refer to the heap and is
ignored.
A conventional copying collector will preserve all nodes except mk. However, only a few of
them are used beyond line 5. The modified program makes the unused nodes unreachable by
nullifying relevant links. The modifications in the program are general enough to nullify
appropriate links for any number of iterations of the loop. Observe that a null assignment has also been inserted within the loop body, thereby making some memory unreachable in each
iteration of the loop.
Figure 1-1: Motivating Example of HRA
Courtesy: [5]
1.3. Related work
The theoretical basis of our work, which includes the heap reference analysis schema and
proofs of correctness of the analysis, was developed by Khedker et al. [5].
1.4. Challenges
A program accesses data through expressions that have l-values; these are called access expressions.
They can be scalar variables such as x or reference expressions such as x.lptr.rptr.
A program analysis reasons about data and hence needs to know the binding of an access expression to data,
i.e. it must answer the question: What are the different bindings of an access expression to any
object o on the heap at a program point p along different possible program paths? The
precision of the analysis depends on the precision of the answer to this question.
When the access expressions are simple and correspond to static data, answering the above
question is often easy because the mapping of access expressions to l-values remains fixed in a
given scope throughout the execution of a program. However, in the case of reference
expressions, the mapping between an access expression and its l-value is likely to change
during execution. Observe that manipulation of the heap is nothing but changing the mapping
between reference expressions and their l-values. For example, in Figure 1-1, access expression
x.lptr refers to mi when the execution reaches line number 2 and may refer to mi, mf, mg, or me at line 4. This implies that, subject to type compatibility, any access expression can
correspond to any heap data, making it difficult to answer the question mentioned above. All
of this makes analysis of programs involving heaps difficult.
1.5. Contributions
This project would be the first complete implementation of heap reference analysis in GCC.
We thereby contribute both to heap reference analysis, by providing its first
implementation, and to GCC, an open source compiler, by implementing this analysis in
it.
1.6. Organization of the report
Chapter 2 discusses data flow analysis techniques in general. Chapter 3 uses the
data flow techniques for explicit liveness analysis of heap. Chapter 4 gives an overview of
GCC. Chapter 5 covers the interfacing with GCC. Chapter 6 describes the implementation of the
access graph and its associated operations. Chapter 7 covers the implementation of
explicit liveness analysis of heap.
2. Data Flow Analysis
Data flow analysis¹ is an important technique for program analysis. It is a technique for
gathering information about the flow of data regarding a particular property at various points
in a computer program. The information gathered is often used for validating a program or by
compilers when optimizing a program.
2.1. Program analysis
Program analysis techniques analyze a particular program with respect to some property.
Program analyses cover a large spectrum of motivations, basic principles, and methods.
Different approaches to program analysis differ in details but at a conceptual level, almost all
program analyses are characterized by some common properties. Although these properties
are abstract, they provide useful insights about a particular analysis. A deeper understanding of
the analysis would require exploring many more analysis-specific details.
Program analysis can be used to determine the validity of a program, to understand the
behaviour of a program or to transform and optimize a program. Some common paradigms of
program analysis are inference systems, constraint resolution systems, model checking and abstract interpretation. Data flow analysis is a program analysis technique based on constraint
resolution.
2.2. Data flow analysis abstraction
Data flow analysis statically computes information about the flow of data (i.e., uses and
definitions of data) for each program point in the program being analyzed. This information is
required to be a safe approximation of the desired properties of the run time behaviour of the
program during each possible execution of that program point on all possible inputs.
The state of a program at a particular time may be regarded as consisting of the values of various
data objects. The execution of a program can be viewed as a series of transformations of the
program state. Each execution of an intermediate-code statement transforms an input state to
a new output state. The input state is associated with the program point before the statement
and the output state is associated with the program point after the statement.
¹ Based on [1] and [4].
When we analyze the behaviour of a program, we must consider all the possible sequences of
program points i.e. paths through a flow graph that the program execution can take. We then
extract, from the possible program states at each point, the information we need for the
particular data-flow analysis problem we want to solve. In general, there is an infinite number of
possible execution paths through a program, and there is no finite upper bound on the length
of an execution path. Program analyses summarize all the possible program states that can
occur at a point in the program with a finite set of facts. Different analyses may choose to
abstract out different information, and in general, no analysis is necessarily a perfect
representation of the state.
Illustration:
Consider the program given below.
1: a = 5;
2: while (is_stop()) {
3:     a = 13;
4: }
5: if (a == 13)
6:     b = a;
7: else
8:     b = 9;
9: return b;
Figure 2-1: A code to illustrate DFA
What values can a have at program point 5? Answering this question seems
difficult because there is an infinite number of execution paths reaching program point 5.
However, in data-flow analysis we do not distinguish among the paths taken to reach a
program point. Moreover, we do not keep track of entire states; rather, we abstract out certain
details, keeping only the data we need for the purpose of the analysis. Summarizing all
program states at program point 5, a can have the values {5, 13}. Different data flow analyses
also collect different information: reaching definitions analysis says that the definition set {1, 3}
reaches point 5, while constant propagation detects that a cannot be treated as a constant at point 5.
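The summarization described above can be made concrete with a small sketch. It is written in Python purely for illustration and is not part of the analysis implemented in this project. It enumerates executions of the program in Figure 2-1 for a bounded number of loop iterations and merges the values that a holds on reaching point 5. The function names and the bound `max_iterations` are assumptions introduced here.

```python
def value_of_a_at_point_5(loop_iterations):
    """Simulate Figure 2-1 for a fixed number of loop iterations."""
    a = 5                          # statement 1
    for _ in range(loop_iterations):
        a = 13                     # statement 3, once per loop iteration
    return a                       # value of a on reaching point 5


def summarize(max_iterations):
    """Merge the values seen along every enumerated path into one set."""
    return {value_of_a_at_point_5(n) for n in range(max_iterations + 1)}


print(sorted(summarize(10)))
```

Whatever bound is chosen (as long as it is at least 1), the summary stays {5, 13}. A data flow analysis aims to reach such a finite summary without enumerating paths at all.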
2.3. Data flow analysis schema
In each application of data-flow analysis, we associate with every program point a data-flow
value that represents an abstraction of the set of all possible program states that can be
observed for that point. We denote the data-flow values before and after each statement s
by IN[s] and OUT[s], respectively. The data-flow problem is to find a solution to a set of
constraints on the IN[s]'s and OUT[s]'s, for all statements s. There are two sets of
constraints: those based on the semantics of the statements (transfer functions) and those
based on the flow of control.
The transfer function depends on the semantics of the statement and the analysis being
performed. In a forward-flow problem, the transfer function $f_s$ for statement s converts a
data-flow value before the statement to a new data-flow value after the statement. That is,
$\mathrm{OUT}[s] = f_s(\mathrm{IN}[s])$ (2.1)
Conversely, in a backward-flow problem, the transfer function $f_s$ for statement s converts a
data-flow value after the statement to a new data-flow value before the statement. That is,
$\mathrm{IN}[s] = f_s(\mathrm{OUT}[s])$ (2.2)
Control flow constraints are derived from the flow of control, which is explicitly
represented in a program flow graph. In a forward flow problem, the control flow constraint,
where $\bigcup$ is the confluence function, is
$\mathrm{IN}[s] = \bigcup_{p \,\in\, \mathrm{pred}(s)} \mathrm{OUT}[p]$ (2.3)
In a backward flow problem, the control flow constraint is
$\mathrm{OUT}[s] = \bigcup_{p \,\in\, \mathrm{succ}(s)} \mathrm{IN}[p]$ (2.4)
Illustration:
Consider the program in Figure 2-1. While performing reaching definitions analysis of a, consider
the transfer function of statement 3. At the fixed point the IN set is {1, 3}: definition 1 arrives from
statement 1 and definition 3 arrives around the loop. The statement kills definition 1 and generates
definition 3, so the OUT set is {3}.
Now consider the control flow constraint at point 9 for the definitions of b. The program flow graph indicates two
predecessors: 6 with OUT set {6} and 8 with OUT set {8}. The IN set of 9 is the union of the sets at 6 and 8, namely {6, 8}.
Unlike linear arithmetic equations, the data-flow equations usually do not have a unique
solution. Our goal is to find the most "precise" solution that satisfies the two sets of
constraints. That is, we need a solution that encourages valid code improvements, but does not
justify unsafe transformations.
The general method of solving the above constraints is to initialize the IN and OUT sets and
then traverse the program either against or with the control flow, satisfying the equations.
The program is traversed iteratively until no further changes are made to the IN and OUT sets.
The general algorithm for a forward flow problem is,
OUT[entry] = {initialization};
for (each statement s other than entry)
    OUT[s] = {initialization};
while (changes to any OUT occur)
    for (each statement s other than entry) {
        IN[s] = U (OUT[p]) over each predecessor p of s;
        OUT[s] = fs(IN[s]);
    }
Figure 2-2: General algorithm for DFA
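As an illustration of the general algorithm, the following Python sketch runs a round-robin reaching definitions analysis on the program of Figure 2-1. It is a minimal sketch under stated assumptions, not the implementation used in this project. The tables `DEFINES` and `SUCCESSORS` are a hand-written encoding of that program, the boundary value at the entry is taken to be the empty set, and lines 4 and 7, which contain no statement, are left out of the flow graph.

```python
# Variable defined by each statement of Figure 2-1 (None if it defines nothing).
DEFINES = {1: "a", 2: None, 3: "a", 5: None, 6: "b", 8: "b", 9: None}

# Control flow edges: the loop test at 2 either enters the body (3) or exits to 5.
SUCCESSORS = {1: [2], 2: [3, 5], 3: [2], 5: [6, 8], 6: [9], 8: [9], 9: []}


def predecessors(node):
    return [p for p, succs in SUCCESSORS.items() if node in succs]


def transfer(node, in_set):
    """OUT[s] = gen(s) | (IN[s] - kill(s)) for reaching definitions."""
    var = DEFINES[node]
    if var is None:
        return set(in_set)
    survivors = {d for d in in_set if DEFINES[d] != var}  # drop killed definitions
    return survivors | {node}                             # add the new definition


def reaching_definitions():
    IN = {n: set() for n in SUCCESSORS}
    OUT = {n: set() for n in SUCCESSORS}
    changed = True
    while changed:                      # iterate until no OUT set changes
        changed = False
        for n in sorted(SUCCESSORS):
            # Confluence: union of OUT over all predecessors (equation 2.3).
            IN[n] = set().union(*(OUT[p] for p in predecessors(n)))
            new_out = transfer(n, IN[n])
            if new_out != OUT[n]:
                OUT[n] = new_out
                changed = True
    return IN, OUT


IN, OUT = reaching_definitions()
print("IN[5]  =", sorted(IN[5]))
print("OUT[3] =", sorted(OUT[3]))
print("IN[9]  =", sorted(IN[9]))
```

With this encoding, the definitions reaching point 5 are {1, 3}, and the definitions of b reaching point 9 are {6, 8}; IN[9] also carries the definitions of a, since the sketch tracks all variables together.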
3. Explicit Liveness Analysis of Heap
The method is based on liveness of links for a particular object. The links which are used
beyond a program point are live while those not used are dead and can be set to null. Here we
develop a method for liveness analysis of heap data. We define liveness of heap references,
devise a bounded representation called an access graph for liveness, and then propose a data
flow analysis for discovering liveness. The method is flow sensitive but context insensitive since
we take into account flow of control but approximate interprocedural information.
3.1. Program to be analysed
The analysis is context insensitive, so we do not maintain a call graph and work on the program
flow graph. The program flow graph has a unique Entry and a unique Exit node. Each
statement forms a basic block. All complex statements are broken down and all the resulting
simple statements fall into the following categories:
Assignment Statements: These are assignments to references and are denoted by x = y, where
the frontiers of x and y are references. Only these statements can modify the structure of the
heap.
Function Calls: These are function call statements which involve access expressions in their
arguments, such as x = f(y, z, ...).
Use Statements: These statements use heap references to access heap data but do not modify
heap references. They involve access expressions whose frontiers are not references, such as
x.data = y.data + z.data.
Return Statement: These are return statements involving an access expression, such as return x.
Other Statements: These include all statements which do not refer to the heap. We
ignore these statements since they do not influence heap reference analysis.
3.2. Capturing liveness of heap
Capturing liveness of heap at a program point p means finding all objects that can be
accessed in the program after program point p. Links are the way to access an object on the
heap. Thus if we capture the links used after program point p we can capture live objects since, if at
least one link to an object is live, then the object is live. A link l can be used in two different ways.
It may be dereferenced to access an object or tested for comparison. An erroneous nullification
of l would affect the two uses in different ways: dereferencing l would result in an exception
being raised, whereas testing l for comparison may alter the result of the condition and thereby the execution path. Links are accessed in a program using access expressions, as they contain heap
references. Thus by considering the access expressions after program point p, we can capture
live links, thereby capturing live objects on the heap.
Illustration:
Consider the program with root as a binary tree with left and right as its children:
1: binary_tree root;
2: root = set_binary_tree();
3: aliased_root = root;
4:
5: return root.left.data;
Figure 3-1: Capturing live objects on the heap
At program point 4, what is the liveness of the heap? We see that the access expression root.left.data
is used in statement 5, hence the link between root and left (denoted as root→left) in the
memory graph becomes live. Thus we say that the left child of the binary tree root is live and,
since the right child does not have any live link, it is dead.
We now need to capture liveness of links in a memory graph, which we do using access paths.
Access paths denote links in a memory graph. The next section describes the
approach in detail.
3.3. Capturing liveness using access paths
3.3.1. Access paths
As discussed above, in order to discover liveness and other properties of heap, we need a way
of naming links in the memory graph. We do it using access paths. An access path is a root
variable name followed by a sequence of zero or more field names and is denoted by ρ_x ≡
x→f1→f2→...→fk. Since an access path represents a path in a memory graph, it can be used for
naming links and nodes. An access path consisting of just a root variable name is called a simple
access path; it represents a path consisting of a single link corresponding to the root variable. ℰ
denotes an empty access path. The last field name in an access path ρ is called its frontier and is
denoted by Frontier(ρ). The frontier of a simple access path is the root variable name. The
access path corresponding to the longest sequence of names in ρ excluding its frontier is called
its base and is denoted by Base(ρ). The base of a simple access path is the empty access path. The object reached by traversing an access path ρ is called the target of the access path and is
denoted by Target(ρ). When we use an access path ρ to refer to a link in a memory graph, it
denotes the last link in ρ, that is, the link corresponding to Frontier(ρ).
Illustration:
Consider Figure 3-1. For the access path ρ ≡ root→left at program point 3, Base(ρ) is root
while Frontier(ρ) denotes the link root→left and Target(ρ) is the left child of root.
As explained earlier, Figure 1-1(b) is the superimposition of memory graphs that can result
before line 5 for different executions of the program. For the access path ρ_x ≡ x→lptr→lptr,
depending on whether the while loop is executed 0, 1, 2, or 3 times, Target(ρ_x) denotes
nodes mj, mh, mm, or ml. Frontier(ρ_x) denotes one of the links mi→mj, mf→mh, mg→mm
or me→ml. Base(ρ_x) represents the following paths in the heap memory: x→ma→mi,
x→mb→mf, x→mc→mg or x→md→me.
In the rest of the report, α denotes an access expression, ρ denotes an access path and σ
denotes a (possibly empty) sequence of field names separated by →. Let the access expression
α_x be x.f1.f2...fn. Then the corresponding access path ρ_x is x→f1→f2→...→fn. When the
root variable name is not required, we drop the subscripts from α_x and ρ_x.
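The definitions of this section can be sketched in a few lines of Python. This is only an illustration under assumptions made here, not the data structure used in the access graph library of Chapter 6. An access path is modelled as a tuple of names, the helper names `parse`, `frontier`, `base` and `show` are hypothetical, and Target is omitted because it needs a concrete memory graph.

```python
from typing import Optional, Tuple

# An access path is modelled as a tuple: root variable first, then field names.
AccessPath = Tuple[str, ...]

EMPTY: AccessPath = ()   # the empty access path


def parse(expr: str) -> AccessPath:
    """Turn an access expression such as 'x.lptr.rptr' into an access path."""
    return tuple(expr.split(".")) if expr else EMPTY


def frontier(path: AccessPath) -> Optional[str]:
    """Last name in the path; the root variable for a simple access path."""
    return path[-1] if path else None


def base(path: AccessPath) -> AccessPath:
    """Longest prefix of the path that excludes its frontier."""
    return path[:-1]


def show(path: AccessPath) -> str:
    """Render a path with ASCII arrows; 'E' stands for the empty access path."""
    return "->".join(path) if path else "E"


rho = parse("root.left")
print(show(rho), "| frontier:", frontier(rho), "| base:", show(base(rho)))
print(show(base(parse("x"))))   # base of a simple access path
```

For root→left this gives the frontier left and the base root, and the base of the simple access path x is the empty access path, matching the definitions above.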
3.3.2. Liveness of access paths
Now we need to define liveness of access paths. For a link l to be live there must be at least one
access path from some root variable to l such that every link in this path is live. This is the path
that is actually traversed while using l. An access path is defined to be live at p if the link
corresponding to its frontier is live along some path starting at p. Safety of null assignments
requires that the access paths which are live are excluded from nullification.
We initially limit ourselves to a subset of live access paths, whose liveness can be determined
without taking into account the aliases created before p. These access paths are live solely
because of the execution of the program beyond p. We call access paths that are live in this
sense explicitly live access paths. An interesting property of explicitly live access paths is that
they form the minimal set covering every live link.
Illustration:
Consider the program in Figure 3-1. At program point 4, the left child of root is accessed and
hence live. The access path used in the program is root→left and hence it is live. Even though
the access path aliased_root→left is not used after statement 4, its frontier link, i.e. the link
between the objects pointed to by root and its left child, is live. Here we say that root→left is explicitly live
since all its links are actually used in the program, while aliased_root→left is not explicitly
live; we also notice that the aliased_root link (from the aliased_root variable on the stack to the root
object on the heap) is never used.
We now focus on developing a data flow analysis technique based on capturing liveness
using access paths.
3.3.3. Using access paths to capture liveness
We now look at how statement semantics affect the liveness of access paths, and thus derive flow constraints in the form of flow functions. Liveness analysis is a backward flow analysis. A statement can affect the incoming access path set in the following ways. Here ELIn denotes the set of access paths live before a statement and ELOut denotes the set of access paths live after it.
Let us see the effect through an illustration.
Illustration:
Consider the program fragment,
1:
2: x.lptr.rptr = y.rptr.lptr;
3: print (x.lptr.rptr.lptr.data);
Figure 3-2: Computation of ELIn and ELOut
The ELOut of statement 2 above is {x→lptr→rptr→lptr}. Consider the effects of the statement:
x→lptr→rptr is modified, rendering its value before the statement useless. Hence access paths with prefix x→lptr→rptr cease to exist before the statement. Such access paths are referred to as killed access paths. In this case the killed set is {x→lptr→rptr→lptr}.
Objects with access paths x→lptr and y→rptr are directly accessed. These access paths become live. Such access paths are referred to as directly generated access paths.
Here y→rptr→lptr is being assigned to x→lptr→rptr. Thus the objects accessed using x→lptr→rptr→{some_path} after the statement must be accessible using y→rptr→lptr→{some_path} before the assignment. Such access paths are referred to as transferred access paths. Thus the transferred access paths are {y→rptr→lptr→lptr}.
The final set of live access paths is computed by removing the killed access paths from ELOut and adding the directly generated and transferred access paths.
Thus the final ELIn of statement 2 is {x→lptr, y→rptr→lptr→lptr}.
Formalizing the above observations,
Killed Access Paths: These are the access paths that cease to be live before the statement because the statement modifies the access path, invalidating the value previously assigned to it. Access paths that are live after the assignment and not killed by it are also live before the assignment.
Directly Generated Access Paths: These are access paths directly used in a statement, which hence become live before the statement.
Transferred Access Paths: These are the access paths whose liveness gets transferred from one access path to another due to an assignment statement. This takes into account the change in bindings of an access expression.
Finally the ELIn set is computed from the ELOut set as,
ELIn = (ELOut − Killed access paths) ∪ (Directly generated access paths ∪ Transferred access paths)    (3.1)
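Equation (3.1) can be tried out on the example of Figure 3-2 with a small C sketch. The sketch assumes that access paths are stored as dotted strings in a fixed-size set; the names PathSet, add_path, has_prefix, base_of and live_in_for_assignment are hypothetical and are not part of the pass described in this report.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATHS 16
#define MAX_LEN   128

/* A small set of access paths, each stored as a dotted string such as
   "x.lptr.rptr". This flat representation is an assumption made for the
   sketch; it is not the data structure used in the pass. */
typedef struct {
    char paths[MAX_PATHS][MAX_LEN];
    int  count;
} PathSet;

/* Add a path unless it is already present or the set is full. */
static void add_path(PathSet *s, const char *p) {
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->paths[i], p) == 0) return;
    if (s->count < MAX_PATHS)
        snprintf(s->paths[s->count++], MAX_LEN, "%s", p);
}

/* True if `prefix` is a prefix of `path` ending on a field boundary. */
static int has_prefix(const char *path, const char *prefix) {
    size_t n = strlen(prefix);
    return strncmp(path, prefix, n) == 0 &&
           (path[n] == '\0' || path[n] == '.');
}

/* Store everything before the last '.' of `path` in `out` (its base). */
static void base_of(const char *path, char *out) {
    const char *dot = strrchr(path, '.');
    size_t n = dot ? (size_t)(dot - path) : 0;
    memcpy(out, path, n);
    out[n] = '\0';
}

/* ELIn = (ELOut - Kill) U Direct U Transfer for the statement lhs = rhs. */
static void live_in_for_assignment(const PathSet *out, const char *lhs,
                                   const char *rhs, PathSet *in) {
    char buf[MAX_LEN];
    in->count = 0;
    for (int i = 0; i < out->count; i++) {
        if (has_prefix(out->paths[i], lhs)) {
            /* Killed on the lhs; its liveness is transferred to the rhs. */
            snprintf(buf, MAX_LEN, "%s%s", rhs, out->paths[i] + strlen(lhs));
            add_path(in, buf);
        } else {
            add_path(in, out->paths[i]);   /* survives unchanged */
        }
    }
    /* Directly generated: the bases of both access expressions. */
    base_of(lhs, buf); if (buf[0]) add_path(in, buf);
    base_of(rhs, buf); if (buf[0]) add_path(in, buf);
}
```

For ELOut = {x.lptr.rptr.lptr} and the statement x.lptr.rptr = y.rptr.lptr, the sketch yields y.rptr.lptr.lptr, x.lptr and y.rptr. The last path is a prefix of the transferred path, so it adds no new information to a prefix-closed set.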
3.3.4. Liveness analysis schema
We now define the liveness analysis schema using access paths. We also describe the control flow constraints on the data flow equations.
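Explicit Liveness: The set of explicitly live access paths at a program point p, denoted by Liveness_p, is defined as follows:
$$\mathit{Liveness}_p = \bigcup_{\psi \in \mathit{Paths}(p)} \mathit{PathLiveness}_p^{\psi} \qquad (3.2)$$
where ψ ∈ Paths(p) is a control flow path from p to Exit and PathLiveness_p^ψ denotes the liveness at p along ψ.
Path Liveness: If p is not the program exit, let the statement that follows it be denoted by s and the program point immediately following s be denoted by p′. Then,
$$\mathit{PathLiveness}_p^{\psi} = \begin{cases} \emptyset & \text{if } p \text{ is } \mathit{Exit} \\ \mathit{StatementLiveness}_s(\mathit{PathLiveness}_{p'}^{\psi}) & \text{otherwise} \end{cases} \qquad (3.3)$$
Statement Liveness: The flow function is defined as:
$$\mathit{StatementLiveness}_s(X) = (X - \mathit{LKill}_s) \cup \mathit{LDirect}_s \cup \mathit{LTransfer}_s(X) \qquad (3.4)$$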
LKill_s denotes the set of access paths that cease to be live before statement s, LDirect_s denotes the set of access paths that become live due to the local effect of s, and LTransfer_s(X) denotes the set of access paths that become live before s due to the transfer of liveness from access paths that are live after s.
Illustration:
The flow functions are explained later in section 3.4.3.
Flow function is defined as,
Figure 3-3: Flow functions for liveness
Courtesy:[5]
The definitions of LKill_s, LDirect_s, and LTransfer_s(X) ensure that Liveness_p is prefix-closed.
3.3.5. Difficulties
3.3.5.1. Unbounded access paths:
Access paths cannot be guaranteed to be bounded in case of loops and thus termination
cannot be guaranteed.
Illustration:
1:
2: while (is_stop()) {
3: x = x.n;
4: }
5: print (x.ptr.data);
Figure 3-4: Unbounded access path example
During 1st iteration: ELOut at 3 is {x→ptr}, ELIn at 3 is {x→n→ptr}
During 2nd iteration: ELOut at 3 is {x→n→ptr}, ELIn at 3 is {x→n→n→ptr}
During nth iteration: ELOut at 3 is {x→n[n-1 times]→ptr}, ELIn at 3 is {x→n[n times]→ptr}
Hence a way to summarize access paths is needed.
3.3.5.2. Data Flow Equations
The data flow equations above define a meet-over-paths (MoP) solution: they enumerate every control flow path from p to Exit, which is not feasible in the presence of loops. Hence they are not suitable for data flow analysis as they stand. We need to define equations whose maximum fixed point (MFP) solution can be computed iteratively.
3.4. Capturing liveness using access graphs
In the presence of loops, the set of access paths may be infinite and the lengths of access paths may be unbounded. This problem is solved by representing a set of access paths by a graph of bounded size.
3.4.1. Access Graphs
A set of access paths can be represented using access graphs. An access graph, denoted by Gv, is a directed graph ⟨n0, N, E⟩ representing a set of access paths starting from a root variable v. N is the set of nodes, n0 ∈ N is the entry node with no in-edges and E is the set of edges. Every path in the graph represents an access path. The empty graph EG has no nodes or edges and does not accept any access path.
The entry node of an access graph is labelled with the name of the root variable, while the non-entry nodes are labelled with a unique label created as follows: if a field name f is referenced in basic block b, we create an access graph node with the label ⟨f, b, i⟩ (see footnote 2), where i is the instance number used for distinguishing multiple occurrences of the field name f in block b. Note that this implies that nodes with the same label are treated as identical. Access paths of the form ρx→* are represented by including a summary node, denoted n*, with a self loop over it. It is distinct from all other nodes but matches the field name of any other node. A node in the access graph represents one or more links in the memory graph.
Illustration:
1:
2: x.lptr.rptr = y.rptr.lptr;
3: print (x.lptr.rptr.lptr.data);
4: print (y.rptr.obj1.data);
The live access paths at each program point, represented using both access paths and access graphs, are:
Program point    Set of live access paths
OUT set at 4     NULL
IN set at 4      y→rptr→obj1
OUT set at 3     y→rptr→obj1
IN set at 3      y→rptr→obj1, x→lptr→rptr→lptr
OUT set at 2     y→rptr→obj1, x→lptr→rptr→lptr
IN set at 2      x→lptr, y→rptr→lptr→lptr, y→rptr→obj1
(The access graph drawings of the original figure are not reproduced here.)
Figure 3-5: Set of access paths represented using access graphs
Footnote 2: In the implementation, the label is ⟨f, b, s⟩, where s is the statement number in a basic block b created by GCC.
Access graphs solve the problem of infinite access paths by summarization. Summarization in access graphs is achieved by merging appropriate nodes, retaining all in-edges and out-edges of the merged nodes. The technique is illustrated below.
Illustration:
Consider the program flow graph shown,
Figure 3-6: Summarization in access graphs
Courtesy:[5]
Node n1 in access graph 1 indicates references of r at different execution instances of the
same program point. Every time this program point is visited during analysis, the same state is
reached in that the pattern of references after r1 is repeated. Thus all occurrences of r1 are
merged into a single state. This creates a cycle which captures the repeating pattern of
references.
In access graph 2, nodes r1 and r2 indicate references to n at different program points. Since the references made after these program points may be different, r1 and r2 are not merged.
Some operations defined on access graphs are listed below. The complete formal definitions of these and other graph functions are given in Chapter 6.
G(ρ): Constructs the access graph corresponding to the access path ρ.
Path Removal (⊖): The operation G ⊖ ρ removes those access paths in G that have ρ as a prefix.
lastNode(G): Returns the last node of a linear graph G.
Union (∪): G ∪ G′ combines access graphs G and G′ such that any access path contained in G or G′ is contained in the resulting graph.
Factorization (/): G/(G′, M) returns all remainder graphs in G starting from the nodes in G that correspond to the nodes M in G′.
Extension (#): (G, M) # R returns the graph G extended by the remainder graphs in R at the nodes in M.
3.4.2. Liveness representation using access graphs
A set of access paths can be represented using access graphs. Every path in the graph represents an access path, and all the access paths represented by an access graph are treated as live. Summarization may therefore mark some paths as live that are never accessed; this approximation is imprecise but safe.
3.4.3. Capturing liveness using access graphs
We now look at how statement semantics affect the liveness of access paths, and thus derive flow constraints in the form of flow functions. Liveness analysis is a backward flow analysis. The effect of a statement on the incoming access path set depends on its type, as explained below. Here ELIn denotes the set of access paths live before a statement and ELOut denotes the set of access paths live after it.
3.4.3.1. Assignment statement
An assignment statement is of the form αx = αy.
We saw how an assignment statement affects the liveness of the heap in the illustration in section 3.3.3. We now see how to capture these effects using access graphs.
Illustration:
Consider the program statement,
5: x.left.right = y.right.left.right
The access path x→left→right gets modified, so we have to remove all access paths with x→left→right as prefix. Hence the killed access paths are those of the form x→left→right→σ.
The access paths x→left and y→right→left are generated. Thus the bases of the directly used access expressions are generated.
Some access paths are to be transferred from x→left→right to y→right→left→right. The access paths in the access graph of x with prefix x→left→right have to be copied as remainder graphs using graph factorization and then attached to the access graph of y at prefix y→right→left→right using graph extension.
Formalizing the above observations,
Figure 3-7: Liveness capturing equations for assignment statement
In the above equations, Gx and Gy denote G(ρx) and G(ρy) respectively, whereas Mx and My denote lastNode(G(ρx)) and lastNode(G(ρy)) respectively.
3.4.3.2. Function call
A function call is of the form αx = f(αy).
We conservatively assume that a function call may make any access path rooted at y or at any global reference variable live. Thus, this version of our analysis is context insensitive.
Illustration:
Consider the program statement, with global variable z,
5: x.left.right = func (y.right);
The access path x→left→right gets modified, so we have to remove all access paths with x→left→right as prefix. Hence the killed access paths are those of the form x→left→right→σ.
The access paths x→left and y are directly accessed and hence directly generated.
The access path y→right is passed as a parameter to the function, so any access path extending y→right may be accessed. Thus we conservatively approximate the generated access paths by {y→right→*}. Similarly, any access path rooted at a global variable may be accessed, so we conservatively assume that the generated access paths are {z→*}.
Formalizing the above observations,
Figure 3-8: Liveness capturing equations for function call statement
3.4.3.3. Return statement
A return statement is of the form: return αx
Illustration:
Consider the program statement, with global variable z,
5: return x.left;
The access path x→left is directly accessed and hence directly generated.
The access path x→left is passed as the return value to the calling function, so any access path extending x→left may be accessed. Thus we conservatively approximate the generated access paths by {x→left→*}. Similarly, any access path rooted at a global variable may be accessed, so we conservatively assume that the generated access paths are {z→*}.
Formalizing the above observations,
Figure 3-9: Liveness capturing equations for return statement
3.4.3.4. Use statement
Illustration:
Consider the program statement, with global variable z,
5: x.left.data = y.right.data + z.left.right.data;
The access paths x→left, y→right and z→left→right are directly accessed and hence directly generated.
Formalizing the above observations,
Figure 3-10: Liveness capturing equations for use statement
3.4.4. Liveness analysis schema revisited
We now define the liveness analysis schema using access graphs. We also describe the control flow constraints on the data flow equations.
To compute the liveness ELIn of a statement, we remove the killed access paths from ELOut and add the directly generated and transferred access paths.
To compute ELOut, we merge the access paths present in the ELIn of the successors of the statement.
Illustration:
We now illustrate the ELIn computation for the examples used to illustrate the effect of each statement type on access graphs.
Figure 3-11: Computation of ELIn for section 3.4.3 (OUT set and IN set access graphs for the illustrations in sections 3.4.3.1 (assignment statement), 3.4.3.2 (function call), 3.4.3.3 (return statement) and 3.4.3.4 (use statement))
Formalizing,
For a given root variable v, ELInv(i) and ELOutv(i) denote the access graphs representing explicitly live access paths at the entry and exit of statement i. We use EG as the initial value for ELInv(i) and ELOutv(i).
Figure 3-12: ELIn and ELOut definitions
EKillPath, LDirect and LTransfer are defined according to the type of statement.
Solving the above data flow equations, we get the solution as access graphs.
Illustration:
The solution of the problem described in Figure 1-1 is,
Figure 3-13: Solution to Figure 1-1
Courtesy:[5]
3.5. Other analyses
Other analyses that are required for null assignment insertions are discussed in brief below.
Their study and implementation is not covered in this project.
Alias analysis and complete liveness computation: This analysis discovers all aliases and thus
finds all paths aliased to live access paths.
Anticipability and availability analysis: This analysis discovers available and anticipable access
paths so that insertion of new access paths does not cause exceptions.
Null assignment insertion: Null assignment insertion is subject to safety and profitability.
3.6. Implementation in GCC
We have now seen the formulation of data flow analysis equations for heap reference analysis. The succeeding chapters describe the implementation of the analysis in GCC.
4.1.4. GIMPLE
GIMPLE is a simplified version of GENERIC, obtained by lowering GENERIC to a three-operand representation. Temporaries are introduced to hold the intermediate values needed to compute complex expressions as three-operand statements. Additionally, all the control structures used in GENERIC are lowered into conditional jumps.
The compiler pass that converts GENERIC to GIMPLE is referred to as the gimplifier [7]. This pass works recursively, replacing each complex statement by an equivalent set of GIMPLE three-operand statements. These GIMPLE statements are also referred to as GIMPLE tuples.
Earlier implementations of GIMPLE used trees as the internal data structure [3]. However, the tree structure was much more general than required for three-address statements, which led to the introduction of tuples. A tuple contains information such as the type of statement, result, operator and operands. The operands themselves are represented as trees.
For example,
x = 10 would be represented as gimple_assign <integer_cst, x, 10, NULL>
x = b + c would be represented as gimple_assign <plus_expr, x, b, c>
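The effect of lowering to three-operand form can be imitated by hand in plain C. The sketch below is only an analogy: the function names and the temporaries t1 and t2 are illustrative, and the temporaries that the gimplifier actually introduces are named by GCC.

```c
/* Original expression as a programmer might write it. */
static int original(int b, int c, int d, int e) {
    return b + c * d - e;
}

/* The same computation written in three-operand style: each statement
   has at most one operator, and temporaries (t1, t2) hold the
   intermediate values. The temporary names are illustrative only. */
static int lowered(int b, int c, int d, int e) {
    int t1, t2, a;
    t1 = c * d;
    t2 = b + t1;
    a  = t2 - e;
    return a;
}
```

Both functions compute the same value; the second form is the shape in which a pass sees each statement, with one result, one operator and at most two operands.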
4.2. GCC Pass
In order to analyze programs or perform certain operations on them, we need to add a pass to GCC. A pass is a C program that, with the help of GCC APIs, extracts information from the previous pass or the input program or both, performs certain operations on the information received, and produces output that may or may not be forwarded to the next pass. The behaviour of any pass can be observed by looking at the dumps produced by that pass. For example, to observe the output dump of the gimplifier while compiling an input program, we can provide the switch -fdump-tree-gimple.
4.2.1. Types of passes
There are 4 types of passes, gimple_opt_pass, simple_ipa_opt_pass, ipa_opt_pass and
rtl_opt_pass. The definitions and declarations are provided in $SOURCE/gcc/tree-pass.h. We
will use simple_ipa_opt_pass.
4.3. Adding a GIMPLE interprocedural pass
In GCC, any pass is represented by a structure, in our case that structure is:
simple_ipa_opt_pass. The declaration of this structure and detailed information about the
fields of this structure can be found in $SOURCE/gcc/tree-pass.h. The definition of our pass
structure is as follows:
struct simple_ipa_opt_pass pass_empty = {
{
SIMPLE_IPA_PASS, /*Type of Pass*/
"hra" , /*Switch to execute the pass*/
NULL , /*Condition function */
empty_func_driver, /*Entry point*/
NULL , /*sub passes*/
NULL , /*Next subpasses*/
0 , /*static pass number*/
0 , /*tv_id */
0 , /*properties required, indicated by bit position*/
0 , /*properties provided, indicated by bit position*/
0 , /*properties destroyed, indicated by bit position*/
0 , /*todo flags start*/
0 /* todo flags finish */
}
};
4.3.1. Registering the pass
We need to register our pass, i.e. our C program file, by adding it to the $SOURCE/gcc directory and making changes in the following files:
1. $SOURCE/gcc/passes.c
2. $SOURCE/gcc/tree-pass.h
3. $SOURCE/gcc/Makefile.in
In passes.c, we determine the position of the pass by adding its entry at the appropriate position in the pass list in the init_optimization_passes() function. As our pass is a simple IPA optimization pass, we can add it where the pass pointer is set to point to all regular IPA passes. As it neither takes input from any previous pass nor provides its output to any other pass, the exact ordering is not of much importance.
In tree-pass.h, we need to declare our pass as:
extern struct simple_ipa_opt_pass pass_empty;
In Makefile.in, we need to write a rule to make the target pass_name.o and add pass_name.o to the list of language-independent object files.
4.4. Building a compiler from GCC
Here, our target is to build a compiler (cc1) which, when given a C program as input, produces the corresponding assembly *.s file. The steps to build the compiler are as follows:
1. Write a rule to make the target cc1 in the file $SOURCE/Makefile.in
cc1:
	make all-gcc TARGET-gcc=cc1$(exeext)
2. Make a new build directory (hereafter $BUILD) outside the source code directory.
3. With the current directory as $BUILD, configure it with $SOURCE/configure. We can give many options while configuring, such as enable-languages, target (i.e. the target architecture / machine for which the generated compiler would produce assembly code), install directory etc.
4. After configuring, run make with cc1 as the target. This step takes roughly 10-12 minutes on an average machine.
5. After successful completion of make, the generated compiler can be used as $BUILD/gcc/cc1
For example, $BUILD/gcc/cc1 program.c -fdump-ipa-all would compile program.c to produce program.s and around 20-25 dumps of all the interprocedural passes.
By observing the dumps, we can understand the behaviour of various passes for a given input program. For our pass, the corresponding switch is -fdump-ipa-hra.
5.2.2. Visiting each basic block
In a given function, each basic block can be visited in the following manner:
FOR_EACH_BB(BB){
//code to analyze each basic block here.
}
Here, FOR_EACH_BB(BB) is a macro provided by GCC which uses the global variable cfun to point to the current function and, within the current function, uses BB to point to each basic block. The body of the macro is a simple for loop which starts from the first basic block and advances to the next block until it reaches the end.
5.2.3. Visiting each GIMPLE statement
In a given basic block, each GIMPLE statement can be visited using the following macro:
#define FOR_ALL_STMT_FWD_VNIT(BB, GSI) \
FOR_EACH_BB(BB) \
FOR_EACH_GIMPLE_STMT_VNIT(BB, GSI)
Here, the body of FOR_ALL_STMT_FWD_VNIT(BB, GSI) is made up of two macros: the former is provided by GCC and the latter has been defined in the pass as:
#define FOR_EACH_GIMPLE_STMT_VNIT(BB, GSI) \
for(GSI = gsi_start_bb(BB); !gsi_end_p(GSI); gsi_next(&GSI))
Here, GSI is a GIMPLE statement iterator, whose data type is provided by GCC. As we can see in the body of the second macro, GSI first points to the first statement of the basic block and then advances until it reaches the end. In the body of this for loop, we can use gsi_stmt(GSI) to access the corresponding GIMPLE statement. Thus the driver function for our pass, after removing unnecessary details, looks like:
static unsigned int empty_func_driver(void){
    struct cgraph_node *cnode;   /* current function node in the call graph */
    basic_block bb;              /* current basic block */
    gimple_stmt_iterator gsi;    /* current GIMPLE statement */
    preparatory_iterations();
    for (cnode = cgraph_nodes; cnode; cnode = cnode->next){ //iterate over all functions
        push_cfun (DECL_STRUCT_FUNCTION (cnode->decl)); //push current function
        FOR_ALL_STMT_FWD_VNIT(bb, gsi){ //iterate over each gimple statement in current function
            if ( is_gimple_assign(gsi_stmt(gsi)) && is_stmt_pointer_type(gsi_stmt(gsi)) )
                get_access_paths(gsi_stmt(gsi));
        }
        pop_cfun ();
    }
    return 0;
}
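The way the two iteration macros compose into a doubly nested loop can be seen in isolation with mock data structures. The sketch below does not use GCC: the Stmt and Block structs and the three macros are hypothetical stand-ins for basic blocks, GIMPLE statements, FOR_EACH_BB and FOR_EACH_GIMPLE_STMT_VNIT.

```c
#include <stddef.h>

/* Mock stand-ins for GCC's basic block and statement types. These
   structs and macros are hypothetical and exist only for this sketch. */
typedef struct Stmt  { int is_assign; int is_pointer; struct Stmt *next; } Stmt;
typedef struct Block { Stmt *first; struct Block *next; } Block;

#define FOR_EACH_BLOCK(head, bb) \
    for ((bb) = (head); (bb) != NULL; (bb) = (bb)->next)
#define FOR_EACH_STMT(bb, s) \
    for ((s) = (bb)->first; (s) != NULL; (s) = (s)->next)
/* Placing one macro after the other yields a doubly nested loop, the
   same composition that FOR_ALL_STMT_FWD_VNIT relies on. */
#define FOR_ALL_STMTS(head, bb, s) \
    FOR_EACH_BLOCK(head, bb) FOR_EACH_STMT(bb, s)

/* Count the statements that a driver would hand to the analysis. */
static int count_pointer_assignments(Block *head) {
    Block *bb;
    Stmt *s;
    int n = 0;
    FOR_ALL_STMTS(head, bb, s) {
        if (s->is_assign && s->is_pointer)
            n++;
    }
    return n;
}
```

One consequence of this composition is that a break inside the braces leaves only the inner statement loop, not the loop over blocks.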
5.3. Identifying assignment statements
In our pass, we are currently able to identify only GIMPLE assignment statements. Future work will include identifying and analysing function call, return and use statements. After studying the file $SOURCE/gcc/gimple.h, we found the function is_gimple_assign(gimple stmt), which checks whether a given GIMPLE statement is an assignment statement. So when we visit each GIMPLE statement, we check it with the above function and proceed to further analysis if it is an assignment statement; otherwise we move to the next GIMPLE statement.
5.4. Identifying pointer type statements
Once a statement is found to be an assignment statement, it needs to be checked for pointer type. If any of the three operands of an assignment is of pointer type, we recognize that statement as a pointer type statement. The check consists of checking the tree codes and types of all the operands. GCC assigns each operand a tree code and provides the macro TREE_CODE() to extract it. It also provides the macro POINTER_TYPE_P(type), which checks whether the type (of any operand) is a pointer type and returns the boolean result. The type of an operand can be found using the TREE_TYPE() macro, again provided by GCC. The code to check whether a variable is of pointer type is:
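static bool is_pointer_type(tree type); /* declared here because is_pointer_var uses it */
/* Returns true if the operand var should be treated as a pointer. */
static bool is_pointer_var(tree var){
    /* field accesses and address-of expressions are always treated as pointers */
    if (TREE_CODE(var) == COMPONENT_REF || TREE_CODE(var) == ADDR_EXPR)
        return true;
    return is_pointer_type(TREE_TYPE(var));
}
/* Returns true if type is a pointer type, an array of such elements, or an aggregate. */
static bool is_pointer_type(tree type){
    if(POINTER_TYPE_P(type))
        return true;
    if(TREE_CODE(type) == ARRAY_TYPE)
        return is_pointer_type(TREE_TYPE(type)); /* recurse on the element type */
    return AGGREGATE_TYPE_P(type);
}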
5.4.1. Extracting operands
In order to check the tree codes and types, first we need to extract operands from a given
GIMPLE statement. This can be done using functions provided by GCC:
1. tree gimple_assign_lhs(gimple stmt)
2. tree gimple_assign_rhs1(gimple stmt)
3. tree gimple_assign_rhs2(gimple stmt).
5.5. Generate access path set
This function returns access path set for each pointer type assignment statement. It gets access
paths for each operand and then clubs them together to get an access path set.
5.5.1. Getting access paths
In order to get access path from each operand, we use functions such as :
access_path * get_access_path_lhs(gimple stmt). This function extracts the names (field names) of
variables as used by programmer (or compiler generated temporaries). The function to get field
names looks like: (functions for rhs operands resemble this function)
static char * get_lhs_op (const gimple stmt){
tree t;
if (is_gimple_assign(stmt)){
t = gimple_assign_lhs(stmt);
return get_name_of_tree1(t);
}
return NULL;
}
And if operand is of pointer type, it generates a label for that operand from following entities:
field name, basic block number, statement number. Out of these, field name extraction and
assigning statement number task has been done in the pass. GCC assigns each basic block a
unique index (number). This triplet makes a label unique.
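A label built from this triplet might be modelled as follows. This is a minimal sketch under assumptions: the actual Label type of the library is defined elsewhere in the report, so the type SketchLabel, its field names and the two helper functions are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical label layout: the report's actual Label type is defined
   elsewhere, so the field names here are assumptions. */
typedef struct {
    const char *field;   /* field name, e.g. "lptr"            */
    int         bb;      /* index of the enclosing basic block */
    int         stmt;    /* statement number within the block  */
} SketchLabel;

/* Two labels denote the same access graph node only if all three
   components agree. */
static int label_equal(const SketchLabel *a, const SketchLabel *b) {
    return a->bb == b->bb && a->stmt == b->stmt &&
           strcmp(a->field, b->field) == 0;
}

/* Render a label as "field@bb.stmt", e.g. for a dump file. */
static void label_to_string(const SketchLabel *l, char *out, size_t n) {
    snprintf(out, n, "%s@%d.%d", l->field, l->bb, l->stmt);
}
```

Comparing all three components is what lets the same field name used in different statements give rise to distinct access graph nodes, while repeated visits to one statement map to the same node.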
Once labels are prepared, they are combined together to get an access path for corresponding
operand. And then, access paths of all the operands in a statement are combined together to
get an access path set for that GIMPLE statement. Note that, access path is for an operand
while access path set is for a GIMPLE statement.
The code for getting access path set looks like:
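static access_path_set * get_access_paths(gimple stmt){
    access_path *ap_lhs = NULL, *ap_rhs1 = NULL, *ap_rhs2 = NULL;
    /* stmt_aps is the access_path_set being filled in; its declaration and allocation are not shown */
    switch(gimple_code(stmt)){
    case GIMPLE_ASSIGN:
        ap_lhs = get_access_path_lhs (stmt);
        ap_rhs1= get_access_path_rhs1(stmt);
        ap_rhs2= get_access_path_rhs2(stmt);
        break;
    default:
        break;
    }
    stmt_aps->lhs = ap_lhs;
    stmt_aps->rhs1 = ap_rhs1;
    stmt_aps->rhs2 = ap_rhs2;
    return stmt_aps;
}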
This completes the phase of retrieving static information from GCC.
6. Access Graph Library3
6.1. Files
AccessGraph.h
This file contains the declaration of the data structure used to represent access graphs and
access paths and also the declaration of the functions associated with it.
AccessGraph.c
This file contains the definition of all the functions required in the explicit liveness analysis.
6.2. Formal definitions of the data structures
6.2.1. Access Paths
An access path is a root variable name followed by a sequence of zero or more field names and is denoted by ρx ≡ x→f1→f2→...→fk. Since an access path represents a path in a memory graph, it can be used for naming links and nodes. An access path consisting of just a root variable name is called a simple access path; it represents a path consisting of a single link corresponding to the root variable. ℰ denotes the empty access path.
The last field name in an access path ρ is called its frontier and is denoted by Frontier(ρ). The frontier of a simple access path is the root variable name. The access path corresponding to the longest sequence of names in ρ excluding its frontier is called its base and is denoted by Base(ρ). The base of a simple access path is the empty access path ℰ. The object reached by traversing an access path ρ is called the target of the access path and is denoted by Target(ρ). When we use an access path ρ to refer to a link in a memory graph, it denotes the last link in ρ, i.e. the link corresponding to Frontier(ρ).
6.2.2. Access graphs
An access graph, denoted by Gv, is a directed graph ⟨n0, N, E⟩ representing a set of access paths starting from a root variable v. N is the set of nodes, n0 ∈ N is the entry node with no in-edges and E is the set of edges. Every path in the graph represents an access path. The empty graph EG has no nodes or edges and does not accept any access path.
Footnote 3: Based on [5]
Here the access path lhs corresponds to the access path of the variable on the left hand side of the = sign, while the access paths rhs1 and rhs2 correspond to the access paths of the variables on the right hand side of the expression.
6.3.5. Access graph node
This structure represents a node in an access graph which has been implemented as a node in
an adjacency linked list representation of a graph.
typedef struct AGN{
unsigned summary : 1 ;
Label l ;
struct AGE * edges ;
struct AGN * next ;
} AccessGraphNode ;
The label l holds the information in the node, while the summary bit denotes whether the node is a summary node. The edges pointer points to the linked list of edges originating from the node.
6.3.6. Access graph edge
This structure represents an edge in the access graph as well as in the adjacency linked list
representation of the graph.
typedef struct AGE{
AccessGraphNode * from_node ;
AccessGraphNode * to_node ;
struct AGE * next ;
} AccessGraphEdge ;
The access graph node pointers from_node and to_node point to the originating and
destination node of the edge respectively.
6.3.7. Nodes set
The nodes set is set of nodes in the access graph and is implemented as a simple linked list of
nodes.
typedef struct NS{
AccessGraphNode * first_node ;
} Nodes_Set ;
6.3.8. Edges set
The edges set is the set of edges in the access graph and is implemented as a simple linked list
of edges. Thus, unlike the conventional adjacency linked list representation, all the edges in the
access graph form a single linked list with edges originating from the same node grouped
together.
typedef struct ES{
AccessGraphEdge * first_edge ;
} Edges_Set ;
6.3.9. Access graph
As given by the formal definition of the access graph, it has been implemented as structure
with nodes set and edges set. The first node in the nodes set always corresponds to the entry
node in the graph.
typedef struct G{
Nodes_Set Nodes ;
Edges_Set Edges ;
struct G * next ;
} AccessGraph ;
6.3.10. Access graph set
The access graph set represents the set of access graphs as a linked list.
typedef struct AG{
AccessGraph * start ;
} AccessGraphSet ;
6.4. Operations on access graphs
6.4.1. Auxiliary operations
6.4.1.1. ConstructGraph(ρ, g)
Constructs the access graph g corresponding to the access path ρ. It involves converting the access path nodes to access graph nodes and adding the corresponding edges.
void ConstructGraph (AccessPath * p , AccessGraph * g)
begin
    For all the nodes in the access path
    begin
        Create a corresponding node in the access graph
    end
    Add edges with respect to access path to access graph
end procedure
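The construction of a linear graph from an access path can be sketched in C as follows. The sketch uses simplified node and edge types modelled on AccessGraphNode and AccessGraphEdge; the names SNode, SEdge, SGraph, construct_graph and last_node, and the use of plain strings as labels, are assumptions made for brevity.

```c
#include <stdlib.h>

/* Simplified node and edge types modelled on AccessGraphNode and
   AccessGraphEdge; plain string labels are an assumption of the sketch. */
typedef struct SNode { const char *label; struct SEdge *edges; struct SNode *next; } SNode;
typedef struct SEdge { SNode *from_node; SNode *to_node; struct SEdge *next; } SEdge;
typedef struct { SNode *first_node; } SGraph;

/* Build a linear graph for the access path names[0] -> ... -> names[n-1].
   names[0] is the root variable and becomes the entry node.
   Returns 0 on success and -1 if an allocation fails (freeing a partly
   built graph is left out of the sketch). */
static int construct_graph(const char *const *names, int n, SGraph *g) {
    SNode *prev = NULL;
    g->first_node = NULL;
    for (int i = 0; i < n; i++) {
        SNode *node = calloc(1, sizeof *node);
        if (!node) return -1;
        node->label = names[i];
        if (prev) {
            SEdge *e = calloc(1, sizeof *e);
            if (!e) { free(node); return -1; }
            e->from_node = prev;
            e->to_node = node;
            prev->edges = e;     /* a linear graph has one out-edge per node */
            prev->next = node;
        } else {
            g->first_node = node;
        }
        prev = node;
    }
    return 0;
}

/* Walk the node list and return the last node (NULL for an empty graph). */
static SNode *last_node(const SGraph *g) {
    SNode *n = g->first_node;
    while (n && n->next) n = n->next;
    return n;
}
```

For the path x→lptr→rptr, the entry node is labelled x and the node returned by last_node is labelled rptr and has no outgoing edge.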
6.4.1.2. lastNode(G)
Returns the last node of a linear graph G constructed from a given access path ρ.
AccessGraphNode* lastNode (AccessGraph * G)
begin
Traverse the linked list and return the last node
end procedure
6.4.1.3. CleanUp(G)
Deletes the nodes which are not reachable from the entry node.
void CleanUp (AccessGraph * g)
begin
1. Run a Depth First Traversal over the graph and mark all the visited nodes
2. Traverse the linked list of nodes and delete all the unmarked nodes and
their edges from the graph
end procedure
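The two steps of CleanUp amount to a mark phase followed by a sweep phase. The following sketch shows them on a small adjacency-matrix graph; the matrix representation and the names MatrixGraph, mark_reachable and clean_up are assumptions chosen to keep the example short, whereas the library itself works on linked lists.

```c
#define MAX_NODES 8

/* Tiny adjacency-matrix graph used only for this sketch; node 0 is the
   entry node. The library itself uses linked lists. */
typedef struct {
    int n;                               /* number of node slots in use */
    int present[MAX_NODES];              /* 1 if the node exists        */
    int edge[MAX_NODES][MAX_NODES];      /* edge[u][v] = 1 for u -> v   */
} MatrixGraph;

/* Step 1: depth first traversal that marks every node reachable from u. */
static void mark_reachable(const MatrixGraph *g, int u, int *marked) {
    marked[u] = 1;
    for (int v = 0; v < g->n; v++)
        if (g->edge[u][v] && g->present[v] && !marked[v])
            mark_reachable(g, v, marked);
}

/* Step 2: delete every unmarked node together with its edges.
   Returns the number of nodes removed. */
static int clean_up(MatrixGraph *g) {
    int marked[MAX_NODES] = {0};
    int removed = 0;
    if (g->n > 0 && g->present[0])
        mark_reachable(g, 0, marked);
    for (int u = 0; u < g->n; u++) {
        if (!g->present[u] || marked[u]) continue;
        g->present[u] = 0;
        for (int v = 0; v < g->n; v++)
            g->edge[u][v] = g->edge[v][u] = 0;
        removed++;
    }
    return removed;
}
```

Note that a node with an edge into the reachable part is still deleted if nothing reachable points to it, since reachability is measured from the entry node only.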
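6.4.1.4. CorrespondingNodes(G, G′, S)
Computes the set of nodes of G which correspond to the nodes of G′ specified in the set S. To compute CN(G, G′, S), we define ACN(G, G′), the set of pairs of all corresponding nodes. Let G ≡ ⟨n0, N, E⟩ and G′ ≡ ⟨n0′, N′, E′⟩. A node n in G corresponds to a node n′ in G′ if there exists an access path ρ which is represented by a path from n0 to n in G and a path from n0′ to n′ in G′.
Formally, ACN(G, G′) is the least solution of the following equation:
$$\mathit{ACN}(G, G') = \begin{cases} \emptyset & \text{if the roots of } G \text{ and } G' \text{ differ} \\ \{\langle n_0, n_0' \rangle\} \cup \{\langle n_j, n_j' \rangle \mid \mathit{Field}(n_j) = \mathit{Field}(n_j'),\ n_i \rightarrow n_j \in E,\ n_i' \rightarrow n_j' \in E',\ \langle n_i, n_i' \rangle \in \mathit{ACN}(G, G')\} & \text{otherwise} \end{cases} \qquad (6.1)$$
$$\mathit{CN}(G, G', S) = \{\, n \mid \langle n, n' \rangle \in \mathit{ACN}(G, G'),\ n' \in S \,\}$$
Note that Field(nj) = Field(nj′) would hold even when nj or nj′ is the summary node n*.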
void Corresponding_Nodes (AccessGraph* G, AccessGraph* G_, Nodes_Set S, Nodes_Set CN)
begin
    All_Corresponding_Nodes (G , G_ , ACN1 , ACN2);
    For each pair of corresponding nodes n in ACN1 and n_ in ACN2
        if n_ ∈ S then add n to CN
end procedure
void All_Corresponding_Nodes (AccessGraph* G, AccessGraph* G_, Nodes_Set ACN1,
Nodes_Set ACN2)
begin
if root(G) != root (G_) then return;
Starting from the root node recursively add pair of nodes to the set ACN1 and
ACN2 which are same and have edges coming to them from the pair of nodes
already in these sets.
end procedure
6.4.1.5. CopyGraph(G, G′)
Copies the graph G into a new access graph G′.
AccessGraph* copy_graph (AccessGraph * g)
begin
    Copy all the nodes of g into a new graph g′
    Copy all the edges of g into g′, establishing links between the nodes and the edges set
    Return g′
end procedure
6.4.1.6. RemainderGraph(G, G′, n)
Constructs a remainder graph G′ from an access graph G with n as the entry node.
AccessGraph* remainder_graph (AccessGraph* g, AccessGraphNode* n)
begin
    Run a recursive depth first traversal over the graph g starting from node n and add each node, along with all its edges, to a new graph g′ while visiting it.
end procedure
6.4.2. Main operations
6.4.2.1. Union
$G \uplus G'$ combines access graphs $G$ and $G'$ such that any access path contained in $G$ or $G'$ is
contained in the resulting graph.
$$G \uplus G' = \langle n_0, N \cup N', E \cup E' \rangle \tag{0.2}$$
The operation $N \cup N'$ treats the nodes with the same label as identical. Because of
associativity, $\uplus$ can be generalized to an arbitrary number of arguments in an obvious manner.
This operation can be explained more effectively by the examples given in Figure 6-1. In the first example the access graphs g3 and g4 unite to give the access graph g4 since g3 is a
subset of g4. In the second example the union of access graphs g2 and g4 results in the access
graph g5. Note that union simply takes the union of the node sets and of the edge sets of
two access graphs with the same root variable. The other two examples follow the same pattern.
The implementation of this operation is based on the definition given above. The union of the
node sets and of the edge sets of both graphs is computed, and then the links between the
two sets are established, resulting in a new graph.
Figure 6-1: Examples of operations on access graphs
Courtesy: [5]
AccessGraphSet * Union (AccessGraphSet * G1 , AccessGraphSet * G2)
begin
    AccessGraphSet G3 ;
    for each graph g1 in set G1
    begin
        for each graph g2 in set G2
        begin
            if (root (g1) == root (g2))
            then begin
                g3 = union_graph (g1 , g2) ;
                add g3 to G3 ;
            end if
        end for
    end for
    Return G3
end procedure
accessgraph * union_graph (accessgraph * g1 , accessgraph * g2)
begin
    accessgraph * g3 ;
    copy all the nodes of g1 to g3 ;
    for each node n2 in g2
    begin
        if n2 is not present in g3
        then add n2 to g3 ;
    end for
    copy all edges of g1 to g3 ;
    for each edge e2 in g2
    begin
        if e2 is not present in g3
        then add e2 to g3 ;
    end for
    for each node n3 in g3
    begin
        search for the first edge e3 in g3
        such that e3->from_node == n3
        n3->edges = e3 ;
    end for
    return g3 ;
end procedure
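The effect of treating nodes with the same label as identical can be illustrated with a small C sketch. Here a graph is keyed directly by node label, so the union reduces to an element-wise OR of node and edge sets. The `LabelGraph` type, the integer labels and the function names are assumptions made for this example and differ from the linked-list representation used in the library.

```c
#include <stdbool.h>

#define MAX_LABELS 8

/* Hypothetical access graph keyed by node label; label 0 is the root variable. */
typedef struct {
    bool node[MAX_LABELS];                 /* node[i]: label i is present */
    bool edge[MAX_LABELS][MAX_LABELS];     /* edge[i][j]: edge i -> j     */
} LabelGraph;

/* Union of two access graphs that share the same root variable. */
LabelGraph union_graph(const LabelGraph *a, const LabelGraph *b)
{
    LabelGraph u;
    for (int i = 0; i < MAX_LABELS; i++) {
        u.node[i] = a->node[i] || b->node[i];
        for (int j = 0; j < MAX_LABELS; j++)
            u.edge[i][j] = a->edge[i][j] || b->edge[i][j];
    }
    return u;
}

/* True when the label sequence path[0..len-1] is a path starting at the root. */
bool contains_path(const LabelGraph *g, const int *path, int len)
{
    if (len == 0 || path[0] != 0 || !g->node[0])
        return false;
    for (int k = 1; k < len; k++)
        if (!g->edge[path[k - 1]][path[k]])
            return false;
    return true;
}

/* Usage: the union of x -> l and x -> r contains both access paths. */
bool demo_union(void)
{
    enum { X = 0, L = 1, R = 2 };
    LabelGraph a = { .node = { [X] = true, [L] = true } };
    LabelGraph b = { .node = { [X] = true, [R] = true } };
    a.edge[X][L] = true;
    b.edge[X][R] = true;

    LabelGraph u = union_graph(&a, &b);
    int xl[] = { X, L };
    int xr[] = { X, R };
    return contains_path(&u, xl, 2) && contains_path(&u, xr, 2)
        && !contains_path(&a, xr, 2);
}
```

Keying by label trades memory for simplicity: no search for an existing node or edge is needed, which is the step the pseudocode above performs explicitly.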
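6.4.2.2. Path removal
The operation $G \ominus \rho$ removes those access paths in $G$ which have $\rho$ as a prefix.
$$G \ominus \rho = \begin{cases} G & \text{if } \rho \text{ is empty or } Root(\rho) \neq Root(G) \\ \mathcal{E}_G & \text{if } \rho \text{ is a simple access path} \\ CleanUp(\langle n_0, N, E - E_{del} \rangle) & \text{otherwise} \end{cases} \tag{0.3}$$
Where,
$$E_{del} = \{ n_i \rightarrow n_j \mid n_i \rightarrow n_j \in E,\ n_i \in CN(G, G^B, \{LastNode(G^B)\}),\ Field(n_j) = Frontier(\rho),\ UniqueAccessPath?(G, n_i) \} \tag{0.4}$$
and $G^B$ is the access graph constructed from $Base(\rho)$.
UniqueAccessPath?(G, n) returns true if in $G$, all paths from the entry node to node $n$ represent
the same access path.
In the first example given in Figure 6-1, we can see that removal of the access path $x \rightarrow l$ from
the access graph g6 results in the access graph g2. This operation requires removing the
$Frontier(\rho)$, i.e. in this case, the node l, from the access graph g6. The second example illustrates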
the case where $\rho$ is a simple access path. The third and the fourth examples follow the
same pattern.
The implementation of this operation is derived from the definition given above. First, the
access graph $G^B$ is constructed from the access path $Base(\rho)$ and then the set of corresponding
nodes is calculated as given above. Each node in the set obtained is then checked to see if it has
a unique access path from the root to itself and also an edge to a node which is the frontier of $\rho$. If
such an edge exists then it is removed from the edge set, and after removing all such edges the graph
is cleaned up.
AccessGraphSet * Path_Removal (AccessGraphSet * G , AccessPath * p)
begin
    if p is empty then return copy(G) ;
    for each graph g in set G
    begin
        if (root(p) != root (g)) continue ;
        if p is a simple access path
        then remove everything from g (empty) ;
        else begin
            GB = construct_graph (base (p)) ;
            Nodes_set N = Corresponding_nodes (g , GB , {lastNode(GB)}) ;
            for each node ni in g
            begin
                if ni ∈ N and UniqueAccessPath?(g , ni)
                then begin
                    for each edge e from node ni
                        if e->to_node == frontier(p)
                        then delete edge e ;
                end if
            end for
            CleanUp (g) ;
        end else
    end for
    return G ;
end procedure
6.4.2.3. Factorization
Given a node $m \in (N - \{n_0\})$ of an access graph $G$, the Remainder Graph of $G$ at $m$ is the
subgraph of $G$ rooted at $m$ and is denoted by $RG(G, m)$. If $m$ does not have any outgoing
edges, then the result is the empty remainder graph $\epsilon_{RG}$. Let $M$ be a subset of the nodes of $G'$
and $M'$ be the set of corresponding nodes in $G$. Then, $G/(G', M)$ computes the set of remainder
graphs of the successors of nodes in $M'$.
$$G/(G', M) = \{ RG(G, n_j) \mid n_i \rightarrow n_j \in E,\ n_i \in CN(G, G', M) \} \tag{0.5}$$
A remainder graph is similar to an access graph except that (a) its entry node does not
correspond to a root variable but to a field name and (b) the entry node can have incoming
edges.
In Figure 6-1, the first example illustrates the result when g2 is factorized with g1 and {x}. The
resultant graph rg1 is the subgraph of g2 rooted at {r}, which is the successor of the node {x},
the corresponding node for the two graphs and the given set. The second
example follows the same pattern, with the difference that {x} here has two successors, thus
resulting in two different remainder graphs. In the third example the corresponding node {r}
does not have a successor, thus resulting in an empty graph. The fourth example illustrates the
case in which there is no corresponding node between the two graphs, and thus the result is a
null set.
In the implementation of this operation, the set of corresponding nodes is calculated and then
a remainder graph is constructed for each successor of each node in this set.
AccessGraphSet * Factorization (AccessGraphSet G1, AccessGraphSet G2, Nodes_Set M)
begin
    AccessGraphSet RG ;
    for each graph g1 in set G1
        for each graph g2 in set G2
        begin
            if (root(g1) != root(g2)) continue ;
            Nodes_set N = Corresponding_nodes (g1 , g2 , M) ;
            for each node n in N
                for each edge e of n
                begin
                    new_graph = remainder_graph (g1 , e->to_node) ;
                    add new_graph to RG ;
                end for
        end for
    return RG ;
end procedure
6.4.2.4. Extension
Extending an empty access graph $\mathcal{E}_G$ results in the empty access graph $\mathcal{E}_G$. For non-empty
graphs, this operation is defined as follows.
(a) Extension with a remainder graph ($\cdot$). Let $M$ be a subset of the nodes of $G$ and $R$ be a remainder graph with entry node $n'$, node set $N^R$ and edge set $E^R$. Then, $(G, M) \cdot R$ appends the suffixes in $R$ to the access paths ending
on nodes in $M$.
$$(G, M) \cdot \epsilon_{RG} = G \tag{0.6}$$
$$(G, M) \cdot R = \langle n_0, N \cup N^R, E \cup E^R \cup \{ n_i \rightarrow n' \mid n_i \in M \} \rangle \tag{0.7}$$
(b) Extension with a set of remainder graphs (#). Let $S$ be a set of remainder graphs. Then, $(G, M) \# S$
extends access graph $G$ with every remainder graph in $S$.
$$(G, M) \# \emptyset = \mathcal{E}_G \tag{0.8}$$
$$(G, M) \# S = \biguplus_{R \in S} (G, M) \cdot R \tag{0.9}$$
This operation simply involves adding the remainder graph to the given graph at a given
node. From Figure 6-1, we can see that extending g3 with rg1 at l1 results in the access
graph g4. In the second example the given access graph is extended with two remainder graphs
at two nodes, while the third and fourth examples follow directly from the
definition given above.
The implementation of this function requires the union function followed by the addition of
edges from the nodes in the set M to the root node of the remainder graph R.
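Extension with a single remainder graph, as described above (a union followed by new edges from the nodes in M to the entry node of R), can be sketched in C as follows. The label-keyed `LabelGraph` type, the explicit `entry` parameter and the convention that a remainder graph lacking its entry node stands for the empty remainder graph are all assumptions made for this illustration.

```c
#include <stdbool.h>

#define MAX_LABELS 8

/* Hypothetical graph keyed by node label; label 0 is the root variable. */
typedef struct {
    bool node[MAX_LABELS];                 /* node[i]: label i is present */
    bool edge[MAX_LABELS][MAX_LABELS];     /* edge[i][j]: edge i -> j     */
} LabelGraph;

/* Extend g at the nodes in m with the remainder graph r, whose entry node
 * has label `entry`. */
LabelGraph extend(const LabelGraph *g, const bool m[MAX_LABELS],
                  const LabelGraph *r, int entry)
{
    LabelGraph out = *g;
    if (!r->node[entry])
        return out;                        /* empty remainder graph: g unchanged */

    /* Union of the node sets and of the edge sets. */
    for (int i = 0; i < MAX_LABELS; i++) {
        out.node[i] = out.node[i] || r->node[i];
        for (int j = 0; j < MAX_LABELS; j++)
            out.edge[i][j] = out.edge[i][j] || r->edge[i][j];
    }
    /* New edges from every node of g in m to the entry node of r. */
    for (int i = 0; i < MAX_LABELS; i++)
        if (m[i] && g->node[i])
            out.edge[i][entry] = true;
    return out;
}

/* Usage: extending x -> l at {l} with the remainder graph r -> n
 * gives x -> l -> r -> n. */
bool demo_extend(void)
{
    enum { X = 0, L = 1, R = 2, N = 3 };
    LabelGraph g = { .node = { [X] = true, [L] = true } };
    LabelGraph r = { .node = { [R] = true, [N] = true } };
    bool m[MAX_LABELS] = { [L] = true };
    g.edge[X][L] = true;
    r.edge[R][N] = true;

    LabelGraph out = extend(&g, m, &r, R);
    return out.edge[X][L] && out.edge[L][R] && out.edge[R][N]
        && !out.edge[X][R];
}
```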
7. Implementation of Explicit Liveness Analysis in GCC
The theory of data flow analysis and of explicit liveness analysis of the heap was presented in
Chapters 2 and 3. Later chapters discussed interfacing with GCC and the implementation of the access
graph and access path libraries. With the access paths and other information available from GCC, and
the access graph library to support the analysis, we now implement the explicit liveness analysis.
7.1. The main function
The analysis is divided into 3 functions: the preparatory pass, the explicit liveness analysis and
the other analyses. They are explained below.
The main data structure storing the information is:
typedef struct {
    enum type_of_satement tos;
    Access_paths * access_paths;
    Liveness_analysis_info * liveness_info;
} Heap_analysis_info;
Heap_analysis_info** Stmt_info;
Figure 7-1: Main data structure
The preparatory pass: This pass computes information which is static and
is needed by all other analyses. The type of each statement is computed as ASSIGNMENT,
FUNCTION CALL, RETURN, USE or OTHER and stored in the tos field. Access paths are extracted from
each statement and stored in the access_paths field for further use in other analyses. Each
statement contains at most 3 access paths due to the use of SSA form in GIMPLE. Basic
blocks are numbered in decreasing order while returning from a depth first traversal. This
enables us to traverse each function against the control flow when basic blocks are traversed in
decreasing order of their numbers [2]. The information of any statement can also be accessed from Stmt_info
as a tuple.
Explicit liveness analysis: This is the main function computing explicit liveness. It is explained in
the next section.
Other analyses: The other analyses are not implemented as of now.
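The basic block numbering used by the preparatory pass can be sketched in C as follows. This is a stand-alone illustration that does not use GCC's own CFG data structures: the `Cfg` type (a successor matrix) and the function names are assumptions made for this example. Each block receives its number while the depth first traversal returns from it, starting from the highest number, so the entry block ends up with the smallest number.

```c
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical control flow graph: succ[b][s] means an edge b -> s. */
typedef struct {
    int  n;                                /* number of basic blocks */
    bool succ[MAX_BLOCKS][MAX_BLOCKS];
} Cfg;

/* Depth first visit; the number is assigned while returning from b. */
void visit(const Cfg *cfg, int b, bool visited[MAX_BLOCKS],
           int number[MAX_BLOCKS], int *next)
{
    visited[b] = true;
    for (int s = 0; s < cfg->n; s++)
        if (cfg->succ[b][s] && !visited[s])
            visit(cfg, s, visited, number, next);
    number[b] = (*next)--;                 /* decreasing order on return */
}

/* Number all blocks reachable from entry; unreachable blocks get -1. */
void number_blocks(const Cfg *cfg, int entry, int number[MAX_BLOCKS])
{
    bool visited[MAX_BLOCKS] = { false };
    int  next = cfg->n - 1;
    for (int b = 0; b < MAX_BLOCKS; b++)
        number[b] = -1;
    visit(cfg, entry, visited, number, &next);
}
```

For a diamond-shaped graph with edges 0 -> 1, 0 -> 2, 1 -> 3 and 2 -> 3, every edge goes from a lower number to a higher one, so visiting the blocks in decreasing order of their numbers processes each block after all of its successors.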
7.2. Explicit liveness analysis
The explicit liveness analysis extracts information from statements and performs the analysis on
this information. Some of the information remains constant while some of it changes with each
iteration. We do an initialization pass over the program to compute the static information such as
LDirect, EKillPath and some of the information required by LTransfer.
The data structure used to store the information in this pass is:
typedef struct {
    access_graph_set * LDirect;
    access_graph_set * ELKillPath;
    access_graph_set * LTransfer_info;
    access_graph_set * ELIn;
    access_graph_set * ELOut;
    access_graph_set * LIn;
    access_graph_set * LOut;
} Liveness_analysis_info;
Figure 7-2: Data structure for liveness analysis
After the static information is computed and stored, the general data flow
algorithm iterates over the program, as shown below:
For each function
    repeat
        for each statement in the specified traversal order
            compute ELOut set of statement
            compute ELIn set of statement
    until no ELOut or ELIn set changes
Figure 7-3: General Algorithm
7.2.1. Computation of ELOut
ELOut is computed by directly implementing the equation for ELOut in Figure 3-12.
7.2.2. Computation of ELIn
ELIn depends on the type of statement and is calculated as:
Switch on type of statement
    Assignment: calculate LTransfer, EKillPath and LDirect using equation in Figure 3-7;
                calculate ELGen using equation in Figure 3-12;
                return ELIn using equation in Figure 3-12;
    Function Call or Return or Use: /* not completely implemented */
    Other: return same as ELOut;
Figure 7-4: Computation of ELIn
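The shape of the general iterative algorithm of Figure 7-3 can be illustrated with a much simpler analysis. The C sketch below computes plain liveness of scalar variables with bit vectors instead of access graphs, so it is a stand-in for the ELIn and ELOut computation and not the analysis itself. The `Flow` type, the gen and kill sets and the assumption that statements are numbered in control flow order are all made up for this example.

```c
#include <stdbool.h>

#define MAX_STMTS 8

/* Hypothetical per-function data: one bit per variable in each set. */
typedef struct {
    int      n;                            /* number of statements       */
    unsigned gen[MAX_STMTS];               /* variables used             */
    unsigned kill[MAX_STMTS];              /* variables defined          */
    bool     succ[MAX_STMTS][MAX_STMTS];   /* succ[s][t]: t follows s    */
    unsigned in[MAX_STMTS];                /* live on entry to statement */
    unsigned out[MAX_STMTS];               /* live on exit of statement  */
} Flow;

/* Round-robin iteration until no In or Out set changes.
 * Returns the number of passes made over the statements. */
int solve_liveness(Flow *f)
{
    int  passes = 0;
    bool changed = true;

    for (int s = 0; s < f->n; s++)
        f->in[s] = f->out[s] = 0;

    while (changed) {
        changed = false;
        passes++;
        /* Liveness is a backward problem: visit against the control flow. */
        for (int s = f->n - 1; s >= 0; s--) {
            unsigned out = 0;
            for (int t = 0; t < f->n; t++)
                if (f->succ[s][t])
                    out |= f->in[t];       /* Out = union of successors' In */
            unsigned in = (out & ~f->kill[s]) | f->gen[s];
            if (out != f->out[s] || in != f->in[s])
                changed = true;
            f->out[s] = out;
            f->in[s]  = in;
        }
    }
    return passes;
}
```

On a straight-line function with three statements, where the first defines a, the second defines b from a and the third uses b, the sets stabilize in the first pass and a second pass confirms that nothing changes.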
7.2.3. Computation of LDirect
LDirect also depends on the type of statement and is calculated as:
Switch on type of statement
    Assignment:
        calculate LDirect using equation in Figure 3-7;
    Function Call*:
        calculate LDirect using equation in Figure 3-8;
    Return*:
        calculate LDirect using equation in Figure 3-9;
    Use*:
        calculate LDirect using equation in Figure 3-10;
* not implemented completely
Figure 7-5: Computation of LDirect
7.2.4. Calculation of EKillPath
EKillPath is defined only for assignment and function call statements.
Switch on type of statement
    Assignment:
        calculate EKillPath using equation in Figure 3-7;
    Function Call:
        /* not implemented completely */
Figure 7-6: Calculation of EKillPath
7.2.5. Calculation of LTransfer
LTransfer is defined only for assignment statements.
Switch on type of statement
    Assignment:
        calculate LTransfer using equation in Figure 3-7;
Figure 7-7: Calculation of LTransfer
Thus the above algorithm computes the liveness of the heap and stores the final access
graphs associated with each statement.
9. References
[1] Aho, Sethi, & Ullman. Dragon Book. Pearson Education.
[2] Khedker. (2010). Generic Data Flow Analyser. IITB.
[3] Khedker. (2010). Manipulating GIMPLE and RTL IRs. IITB: GRC.
[4] Khedker, Sanyal, & Karkare. Data Flow Analysis: Theory and Practice. CRC Press.
[5] Khedker, Sanyal, & Karkare. (2007). Heap Reference Analysis Using Access Graphs. ACM.
[6] Merrill, J. (2003). GENERIC and GIMPLE: A New Tree Representation for Entire Functions. GCC Developers Summit.
[7] Stallman, R. (2010). GCC Internals. GCC.