Register Allocation and Spilling via Graph Coloring
G. J. Chaitin
IBM Research, 1982
Motivation
Before the register allocation phase, the compiler assumes that there is an unlimited number of general-purpose registers
The symbolic registers must be mapped to real registers in a way that avoids conflicts
Symbolic registers that cannot be mapped to real registers must be spilled to memory
We need an algorithm that maps symbolic registers to real registers with minimal spilling cost
Paper Overview
Register allocation overview
Subsumption algorithm
Interference graph coloring algorithm
Spilling algorithm
Register Allocation Steps
1. Determine which registers are live at any point in the intermediate language (IL) program
2. Build a register interference graph
– Nodes represent symbolic registers
– Edges represent a conflict between symbolic registers
3. Subsumption: eliminate unnecessary register copies
4. Find a 32-coloring of the interference graph (one color per real register)
5. Decide which registers to spill, if necessary
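Steps 1 and 2 can be sketched for straight-line code (no branches) with one backward pass: walking from the last instruction to the first, the sketch keeps the set of live registers and records an edge between each defined register and everything live right after its definition. The `(dest, srcs, is_copy)` tuple encoding and the name `build_interference` are hypothetical, and a real allocator would compute liveness over the whole control-flow graph rather than a single block.

```python
def build_interference(code, live_at_exit=()):
    """Return interference edges (as frozensets) for straight-line code.

    code: list of (dest, srcs, is_copy); dest is None for an instruction
    that only uses values. live_at_exit: registers still needed afterwards.
    """
    edges = set()
    live = set(live_at_exit)            # registers live after the current instruction
    for dest, srcs, is_copy in reversed(code):
        if dest is not None:
            for other in live:
                if other == dest:
                    continue
                # A copy's destination does not interfere with its source here:
                # both hold the same value, which is what enables subsumption.
                if is_copy and other in srcs:
                    continue
                edges.add(frozenset((dest, other)))
            live.discard(dest)          # dest is not live before its definition
        live.update(srcs)               # sources are live before this instruction
    return edges


# The code from the subsumption example below.
code = [
    ("A", (), False),        # A = 1
    ("B", ("A",), True),     # B = A
    ("B", ("B",), False),    # B = B + 1
    ("C", ("B",), True),     # C = B
    ("D", ("A",), True),     # D = A
]
edges = build_interference(code, live_at_exit={"C", "D"})
print(sorted(tuple(sorted(e)) for e in edges))  # [('A', 'B'), ('A', 'C'), ('C', 'D')]
```

The copy rule is what leaves A and D, and B and C, unconnected, which makes each pair a candidate for subsumption.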
Subsumption
If the source and destination of a register copy do not interfere, they may be coalesced into a single node
For each register copy in IL, determine whether the registers interfere
If not, coalesce the two nodes into one
After the first pass, rewrite the IL code
Repeat until no more coalescing is possible
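A minimal sketch of the coalescing loop, using a union-find map from each symbolic register to the node it was merged into. The slides rewrite the IL and rebuild the graph after each pass; to stay short, this sketch instead merges the adjacency sets directly, a simplification we are assuming rather than the paper's procedure. The function name and input format are hypothetical.

```python
def coalesce(nodes, edges, copies):
    """Merge the endpoints of every copy whose registers do not interfere.

    nodes: iterable of register names; edges: iterable of 2-element frozensets;
    copies: list of (dest, src) pairs taken from register-copy instructions.
    Returns (representative of each node, adjacency of the merged graph).
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the node's current representative.
        while parent[n] != n:
            n = parent[n]
        return n

    adj = {n: set() for n in parent}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)

    changed = True
    while changed:                      # repeat until no more coalescing is possible
        changed = False
        for dest, src in copies:
            a, b = find(dest), find(src)
            if a == b or b in adj[a]:
                continue                # already merged, or the registers interfere
            parent[b] = a               # merge b into a ...
            for n in adj.pop(b):        # ... and give a all of b's interferences
                adj[n].discard(b)
                adj[n].add(a)
                adj[a].add(n)
            changed = True
    return {n: find(n) for n in parent}, adj


nodes = ["A", "B", "C", "D"]
edges = [frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"C", "D"})]
copies = [("B", "A"), ("C", "B"), ("D", "A")]
rep, adj = coalesce(nodes, edges, copies)
print(rep)   # {'A': 'D', 'B': 'C', 'C': 'C', 'D': 'D'}
print(adj)   # {'C': {'D'}, 'D': {'C'}}
```

Run on the subsumption example, it merges A with D and B with C, leaving two nodes that interfere with each other, which matches the AD/BC version of the code.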
Subsumption Example
Instructions    Live    Dead
A = 1           A
B = A           B
B = B + 1
C = B           C       B
D = A           D       A
…                       C, D
(Figure: interference graph on nodes A, B, C, D)
Subsumption Example (after coalescing)
Instructions    Live    Dead
AD = 1          AD
BC = AD         BC
BC = BC + 1
…                       AD, BC
(Figure: interference graph on nodes AD and BC)
Finding a 32-Coloring
Each symbolic register is assigned a color representing a real register
If no two adjacent nodes have the same color, the coloring succeeds
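Assume that G has a node N with degree < 32. Then G is 32-colorable iff the reduced graph, from which N and all its edges have been omitted, is 32-colorable: N has at most 31 neighbors, so once the rest of the graph is colored, at least one color is still free for N
The algorithm repeatedly removes nodes of degree < 32 (updating the degrees of their neighbors) until all nodes have been removed, then colors the nodes in the reverse order of their removal
The algorithm fails if, at some point, no remaining node has degree < 32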
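The removal argument translates into a short remove-then-color loop. This is a sketch of the idea on these slides, not Chaitin's code: `color_graph` is a hypothetical name, `adj` maps each node to its set of neighbors, and `k` plays the role of 32. The example graph is the one implied by the live ranges in the 3-coloring example that follows.

```python
def color_graph(adj, k):
    """Chaitin-style k-coloring of an interference graph.

    adj: {node: set of neighbouring nodes}. Returns {node: color in 0..k-1},
    or None when every remaining node has degree >= k (spilling is needed).
    """
    degree = {n: len(adj[n]) for n in adj}
    removed, stack = set(), []
    while len(stack) < len(adj):
        # Any remaining node with fewer than k remaining neighbours can go.
        node = next((n for n in adj if n not in removed and degree[n] < k), None)
        if node is None:
            return None
        stack.append(node)
        removed.add(node)
        for m in adj[node]:
            if m not in removed:
                degree[m] -= 1
    colors = {}
    # Put nodes back in reverse order of removal. When a node returns, fewer
    # than k of its neighbours are colored, so some color is always free.
    for node in reversed(stack):
        used = {colors[m] for m in adj[node] if m in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors


# Interference graph of the 3-coloring example.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"},
       "C": {"A", "B", "D"}, "D": {"B", "C"}}
print(color_graph(adj, 3))  # {'D': 0, 'C': 1, 'B': 2, 'A': 0}
print(color_graph(adj, 2))  # None: every node has degree >= 2
```

Returning `None` is the point where the allocator would go on to choose a node to spill.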
3-coloring example
Instructions    Live    Dead
A = 1           A
B = 2           B
C = 3           C
? = A                   A
D = 4           D
? = B                   B
? = C                   C
? = D                   D
(Figure: interference graph on nodes A, B, C, D)
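For straight-line code like this, the interference graph can be read off the live ranges: two registers interfere when one is defined while the other is live. In this minimal sketch each register's range is encoded as a hypothetical `(definition index, last-use index)` pair taken from the table, counting instructions from 0.

```python
from itertools import combinations

# (definition, last use) for each register in the 3-coloring example.
live_ranges = {"A": (0, 3), "B": (1, 5), "C": (2, 6), "D": (4, 7)}


def interferes(x, y, ranges):
    """True if x is defined while y is live, or vice versa."""
    (dx, ux), (dy, uy) = ranges[x], ranges[y]
    # Live range = [definition, last use): a value dying at an instruction
    # does not conflict with a value that instruction defines.
    return dy <= dx < uy or dx <= dy < ux


edges = [(x, y) for x, y in combinations(sorted(live_ranges), 2)
         if interferes(x, y, live_ranges)]
print(edges)  # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

Every pair interferes except A and D, because A's last use comes before D is defined. Four registers therefore fit in three colors, with A and D sharing one.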
Spilling
If the 32-coloring fails, then nodes must be spilled to memory
Spilled registers are stored to memory after each definition and reloaded into a register just before each use
Every time spill code is generated, the interference graph must be rebuilt
Usually recoloring succeeds after spilling, but sometimes several passes are required
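The rewrite that spilling performs can be sketched on straight-line code: a store is inserted after each definition of the spilled register, and a load into a fresh temporary before each use. The `(dest, srcs)` encoding, the `mem[...]` slot name, and the temporary names are all hypothetical; spilling A in the 3-coloring example is purely to show the rewrite, since that graph is already 3-colorable.

```python
def insert_spill_code(code, spilled):
    """Rewrite straight-line code so that register `spilled` lives in memory.

    code: list of (dest, srcs) with dest None for pure uses. Every definition
    of `spilled` is followed by a store and every use is preceded by a load
    into a fresh temporary whose live range covers a single instruction.
    """
    slot = f"mem[{spilled}]"          # hypothetical name of the spill slot
    out, count = [], 0
    for dest, srcs in code:
        new_srcs = []
        for s in srcs:
            if s == spilled:
                count += 1
                tmp = f"{spilled}{count}"
                out.append((tmp, (slot,)))   # load just before the use
                new_srcs.append(tmp)
            else:
                new_srcs.append(s)
        if dest == spilled:
            count += 1
            tmp = f"{spilled}{count}"
            out.append((tmp, tuple(new_srcs)))
            out.append((slot, (tmp,)))       # store right after the definition
        else:
            out.append((dest, tuple(new_srcs)))
    return out


code = [("A", ()), ("B", ()), ("C", ()), (None, ("A",)), ("D", ()),
        (None, ("B",)), (None, ("C",)), (None, ("D",))]
for instr in insert_spill_code(code, "A"):
    print(instr)
```

Each new temporary is live for a single instruction, so it interferes with far fewer registers than A did; that change in the graph is why it has to be rebuilt after spill code is inserted.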
Spilling (continued)
Finding an optimal set of nodes to spill is an NP-complete problem
Heuristic: spill the node that minimizes
– cost of spilling / degree of node
Cost of spilling
– the sum, over every definition point and use point of the register, of the estimated execution frequency of that point
In some cases, a spilled node can be reloaded once and kept in a register over an extended interval, instead of being reloaded before every use
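The heuristic can be written down directly. In this sketch the function names, program points, and frequencies are made up for illustration; the frequencies would normally come from a static estimate or a profile, which the slides do not specify.

```python
def spill_cost(points, freq):
    """Sum of the execution frequencies of a register's def and use points."""
    return sum(freq[p] for p in points)


def choose_spill(adj, def_use_points, freq):
    """Pick the node minimizing cost of spilling / degree of node."""
    candidates = [n for n in adj if adj[n]]   # degree-0 nodes never block coloring
    return min(candidates,
               key=lambda n: spill_cost(def_use_points[n], freq) / len(adj[n]))


adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
def_use_points = {"A": [0, 5], "B": [1, 2, 3], "C": [4]}
freq = {0: 1, 1: 10, 2: 10, 3: 10, 4: 1, 5: 1}   # points 1-3 sit inside a loop
print(choose_spill(adj, def_use_points, freq))  # C (cost 1 / degree 2)
```

Dividing by degree favors nodes whose removal frees the most neighbors, while the frequency weighting keeps spill code out of hot loops.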
Conclusion
The graph coloring and spilling algorithms should produce faster code
The register allocation algorithm is efficient
– Graph coloring takes O(N) time
– But it uses O(N²) space
Compile-time Copy Elimination
Peter Schnorf
Mahadevan Ganapathi
John Hennessy
Stanford, 1993
Motivation
Single-assignment languages simplify dependency checking
This in turn simplifies automatic detection and exploitation of parallelism
But single-assignment languages require a large number of copies
Previous implementations eliminate copies at runtime
Increased efficiency if copies can be eliminated at compile time
Paper Overview
Single-assignment languages
Code generation
Compile-time copy elimination techniques
– Substitution
– Pattern matching
– Substructure sharing
– Substructure targeting
Results: success!
– Eliminated all copies in bubble sort
Single-assignment languages
Functional languages (LISP, Haskell, SISAL)
Simpler dependency checking
– True dependencies (write, then read): b = f(c), a = f(b)
– Anti-dependencies (read, then write): a = f(b), b = f(c)
– Output dependencies (write, then write): a = f(b), a = f(c)
– Aliasing caused by pointers and array indexes
To avoid aliasing, all inputs and outputs are passed by value
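The three kinds of dependence differ only in which of the two statements write and read the shared name, so a small classifier makes the distinction concrete. Each statement is encoded as a hypothetical `(writes, reads)` pair of sets. In a single-assignment program every name is written exactly once, before any read of it, so the anti and output cases cannot arise; only true dependences remain, which is why checking is simpler.

```python
def dependences(first, second):
    """Classify dependences from statement `first` to a later statement `second`.

    Each statement is (writes, reads), both sets of variable names.
    """
    w1, r1 = first
    w2, r2 = second
    kinds = set()
    if w1 & r2:
        kinds.add("true")      # write, then read
    if r1 & w2:
        kinds.add("anti")      # read, then write
    if w1 & w2:
        kinds.add("output")    # write, then write
    return kinds


# The three examples from the slide, with f(x) reading x.
print(dependences(({"b"}, {"c"}), ({"a"}, {"b"})))  # {'true'}
print(dependences(({"a"}, {"b"}), ({"b"}, {"c"})))  # {'anti'}
print(dependences(({"a"}, {"b"}), ({"a"}, {"c"})))  # {'output'}
```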
Example: Swap(A,i,j)
Data flow diagram
– Edges transport values
– Simple nodes are operations
Pick any feasible node evaluation order at random
Naïve implementation
– Each edge has its own memory
– Swap uses 5 array copies!
Optimized implementation
– Swap array updates are done in place
(Figure: data-flow graph of Swap with nodes Input, AElement, AElement, AReplace, AReplace)
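The difference between the two implementations can be sketched in Python. The naïve version gives every array-valued edge its own copy; we assume the slide's five copies correspond to the five such edges (three leaving Input, one between the AReplace nodes, one to the output), which is our reading of the figure rather than something the slide states. The function names are hypothetical and only mirror the IF1 node names.

```python
def swap_naive(a, i, j):
    """Swap with a private copy of the array on every array-carrying edge."""
    copies = 0

    def edge(arr):
        nonlocal copies
        copies += 1
        return list(arr)

    x = edge(a)[i]            # Input -> AElement(i)
    y = edge(a)[j]            # Input -> AElement(j)
    r1 = edge(a)              # Input -> AReplace(i)
    r1[i] = y
    r2 = edge(r1)             # AReplace(i) -> AReplace(j)
    r2[j] = x
    return edge(r2), copies   # AReplace(j) -> output


def swap_in_place(a, i, j):
    """Optimized version: both reads happen first, then both updates in place."""
    x, y = a[i], a[j]         # AElement nodes
    a[i] = y                  # AReplace nodes update the input array directly
    a[j] = x
    return a


data = [10, 20, 30]
print(swap_naive(data, 0, 2))     # ([30, 20, 10], 5)
print(data)                       # [10, 20, 30]: the input is untouched
print(swap_in_place(data, 0, 2))  # [30, 20, 10], with no copies
```

The in-place version is only correct because both AElement reads are evaluated before either AReplace, which is why evaluation order matters for substitution.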
Example: BubbleSort(A)
Compound nodes represent control flow
Loops are implemented using recursion to avoid multiple assignment of the iteration variable
Naïve implementation
– Bubble sort requires O(n²) array copies
Optimized implementation
– All array updates are done in place
– But parallelism is decreased
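A sketch of bubble sort with both loops written as recursion, so no loop variable is ever reassigned. In naïve mode it copies the whole array before each element update, standing in for an AReplace that produces a fresh array; charging one copy per update is our simplifying assumption, not the paper's exact accounting. On a reversed input of length n that comes to n(n-1)/2 copies, in line with the O(n²) figure. The function and its `stats` counter are hypothetical, and Python's recursion limit restricts it to small arrays.

```python
def bubble_sort(a, in_place, stats):
    """Bubble sort with both loops written as recursion.

    Naive mode copies the array before every element update; in-place mode
    updates `a` directly. stats["copies"] counts the array copies made.
    """
    def inner(arr, j, end):                # one pass: compare arr[j], arr[j + 1]
        if j >= end:
            return arr
        if arr[j] > arr[j + 1]:
            nxt = arr if in_place else list(arr)
            if not in_place:
                stats["copies"] += 1
            nxt[j], nxt[j + 1] = arr[j + 1], arr[j]
            return inner(nxt, j + 1, end)
        return inner(arr, j + 1, end)

    def outer(arr, end):                   # passes with a shrinking bound
        if end <= 0:
            return arr
        return outer(inner(arr, 0, end), end - 1)

    return outer(a, len(a) - 1)


stats = {"copies": 0}
print(bubble_sort([5, 4, 3, 2, 1], in_place=False, stats=stats), stats)
# [1, 2, 3, 4, 5] {'copies': 10}
```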
Code Generation Overview
Input comes from the compiler front end
– IF1: an intermediate data-flow graph representation
Code generator eliminates copies
Output is in C
– Compiled into machine code using an optimizing C compiler
Vertical Substitution
If an input and an output have the same type and size, they can share memory
– Updates are done in place
(Figure: Swap data-flow graph with nodes Input, AElement, AElement, AReplace, AReplace, labeled 1 to 4)
Horizontal Substitution
If an output has several destinations, the output edges can share memory
(Figure: Swap data-flow graph with nodes Input, AElement, AElement, AReplace, AReplace, labeled 1 to 4)
Horizontal and Vertical Substitution
Horizontal and vertical substitution can interfere with each other
– A node along the substitution chain may modify the shared object before its last use
Edges can be marked as read-only if they are shared and the edge is not the last use
Horizontal and Vertical Substitution (continued)
(Figure: the Swap data-flow graph shown twice, with nodes Input, AElement, AElement, AReplace, AReplace labeled 1 to 4 in two different orders)
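The interaction between the two kinds of substitution can be sketched as a copy count for a given evaluation order. Horizontal substitution lets all consumers of a value share one buffer; a consumer that updates the value may do so in place (vertical substitution) only at the value's last use, and every earlier use sees the edge as read-only, which forces a copy. The node names such as `AElement_i` and the `consumers` encoding are hypothetical.

```python
def copies_needed(order, consumers):
    """Count the copies forced by a given evaluation order.

    order: list of node names in evaluation order.
    consumers: {value: [(node, modifies), ...]} listing every node that reads
    the value and whether it updates it (e.g. AReplace). A modifying node may
    update the shared buffer in place only if it is the value's last use.
    """
    pos = {node: k for k, node in enumerate(order)}
    copies = 0
    for uses in consumers.values():
        last = max(pos[node] for node, _ in uses)
        copies += sum(1 for node, modifies in uses if modifies and pos[node] != last)
    return copies


# Swap(A, i, j): AReplace_i writes A[j] into position i, producing R1.
consumers = {
    "A":  [("AElement_i", False), ("AElement_j", False), ("AReplace_i", True)],
    "R1": [("AReplace_j", True)],
}
reads_first = ["AElement_i", "AElement_j", "AReplace_i", "AReplace_j"]
write_early = ["AElement_j", "AReplace_i", "AElement_i", "AReplace_j"]
print(copies_needed(reads_first, consumers))  # 0
print(copies_needed(write_early, consumers))  # 1: AReplace_i runs before AElement_i reads A
```

Both orders are feasible, yet the same graph needs either no copies or one, depending only on the order chosen.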
Interprocedural Substitution
Previous discussion concerned simple nodes that can be analyzed at compiler design time
Information about a function is needed in order to use substitution
– Does the function modify an input?
– Will an input be chained to an output?
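One way to answer the first question is a fixed-point iteration over the call graph: a function may modify a parameter if its own nodes update it, or if it passes that parameter to a callee that may modify it. This is a standard interprocedural side-effect summary offered as an assumption, not the paper's stated method; the function names and the `modifies`/`calls` encoding are hypothetical. Since loops are written as recursion, the sketch must also handle a function that calls itself.

```python
def modified_params(functions):
    """Interprocedural summary: which parameters may each function modify?

    functions: {name: {"modifies": params updated by the function's own nodes,
                       "calls": [(callee, {callee_param: caller_param})]}}
    Iterates to a fixed point so that effects propagate through call chains.
    """
    mod = {f: set(info["modifies"]) for f, info in functions.items()}
    changed = True
    while changed:
        changed = False
        for f, info in functions.items():
            for callee, binding in info["calls"]:
                # Copy the set: for a recursive call, callee == f and mod[f] grows.
                for callee_param in list(mod[callee]):
                    caller_param = binding.get(callee_param)
                    if caller_param is not None and caller_param not in mod[f]:
                        mod[f].add(caller_param)
                        changed = True
    return mod


functions = {
    "Swap":       {"modifies": {"A"}, "calls": []},
    "BubbleSort": {"modifies": set(), "calls": [("Swap", {"A": "A"})]},
    "Length":     {"modifies": set(), "calls": []},
}
print(modified_params(functions))
# {'Swap': {'A'}, 'BubbleSort': {'A'}, 'Length': set()}
```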
Intersubgraph Substitution
Substitution analysis is done for each construct
Same basic principles
Determining the Evaluation Order
Evaluation order can impact efficiency of substitution
Naïve implementation selects the next node to evaluate at random
Hints tell the algorithm which nodes should be evaluated before or after other nodes, if possible
Hints are ad hoc?
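Hints can be folded into an ordinary topological sort: among the nodes whose inputs are ready, the one with the best hint goes first. This sketch uses Kahn's algorithm with a priority queue; the hint values (reads before updates) and the node names are hypothetical, and the paper's own hints may differ.

```python
import heapq


def schedule(deps, hint):
    """Pick a feasible evaluation order, steered by hints.

    deps: {node: set of nodes it depends on}; every node must be a key.
    hint: {node: int}; among nodes that are ready, lower values go first.
    """
    waiting = {n: len(d) for n, d in deps.items()}
    users = {n: [] for n in deps}
    for n, d in deps.items():
        for m in d:
            users[m].append(n)
    ready = [(hint[n], n) for n in deps if waiting[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for u in users[n]:
            waiting[u] -= 1
            if waiting[u] == 0:
                heapq.heappush(ready, (hint[u], u))
    if len(order) != len(deps):
        raise ValueError("dependence graph has a cycle")
    return order


# Swap: AReplace_i needs A[j], AReplace_j needs A[i] and the result of AReplace_i.
deps = {"AElement_i": set(), "AElement_j": set(),
        "AReplace_i": {"AElement_j"}, "AReplace_j": {"AElement_i", "AReplace_i"}}
hint = {"AElement_i": 0, "AElement_j": 0, "AReplace_i": 1, "AReplace_j": 1}
print(schedule(deps, hint))
# ['AElement_i', 'AElement_j', 'AReplace_i', 'AReplace_j']
```

With reads hinted first, both AElement nodes run before either AReplace, the order in which Swap can update its array in place.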
Pattern Matching
Replace hard-to-optimize pieces of code
Patterns are language-specific
Patterns are detected using “ad hoc” methods
Substructure Sharing
Allow substructures to be referenced without copies
AElement can be treated as a NoOp
Happens after substitution analysis
– Less important
Same principles as substitution analysis
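In Python, a `memoryview` slice gives a feel for substructure sharing: selecting a row returns a view into the parent's storage instead of a copy. The flat row layout and the function name are assumptions for illustration. As with substitution, this is only safe when no consumer updates the substructure before the parent's last use.

```python
def a_element_view(buf, i, width):
    """AElement treated as a no-op: return row i of a flat 2-D byte array
    as a zero-copy view instead of copying it out."""
    return memoryview(buf)[i * width:(i + 1) * width]


rows = bytearray(b"rowArowBrowC")   # three rows of width 4, stored flat
row_b = a_element_view(rows, 1, 4)
print(bytes(row_b))                 # b'rowB'
rows[4] = ord("X")                  # update the parent structure ...
print(bytes(row_b))                 # b'XowB': ... and the view sees it, so nothing was copied
```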
Substructure Targeting
Allow structures to be built from substructures without copies
Similar to substructure sharing
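Substructure targeting works in the opposite direction: instead of building each piece separately and copying it into the result, each producer is handed its slot in the final structure and writes there directly. A minimal sketch using a `bytearray` and `memoryview` slices; `produce_row` and `build_matrix` are hypothetical names.

```python
def produce_row(target, value):
    """Hypothetical producer that writes its result straight into `target`."""
    for k in range(len(target)):
        target[k] = value


def build_matrix(n_rows, width):
    """Build a flat n_rows x width byte matrix without temporary rows."""
    result = bytearray(n_rows * width)      # final structure, allocated once
    view = memoryview(result)
    for i in range(n_rows):
        # Each producer is handed the slot its row will occupy in the result.
        produce_row(view[i * width:(i + 1) * width], i + 1)
    return result


print(list(build_matrix(3, 2)))  # [1, 1, 2, 2, 3, 3]
```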
Results
Compared the optimized implementation with the naïve implementation
The optimizations eliminated all copies for bubble sort
An informal comparison with a run-time optimizer shows improvements