Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982.

Register Allocation and Spilling via Graph Coloring

G. J. Chaitin

IBM Research, 1982

Motivation

Before the register allocation phase, the compiler assumes that there is an unlimited number of general-purpose (symbolic) registers

The symbolic registers must be mapped to real registers in a way that avoids conflicts

Symbolic registers that cannot be mapped to real registers must be spilled to memory

We need an algorithm to map registers with minimal spilling cost

Paper Overview

Register allocation overview

Subsumption algorithm

Interference graph coloring algorithm

Spilling algorithm

Register Allocation Steps

1. Determine which registers are live at any point in the intermediate language (IL) program

2. Build a register interference graph

– Nodes represent symbolic registers

– Edges represent a conflict between symbolic registers

3. Subsumption: eliminate unnecessary register copies

4. Find a 32-coloring of the interference graph

5. Decide which registers to spill if necessary
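Steps 1 and 2 can be sketched for straight-line code as follows. This is a minimal illustration, not the paper's implementation: the function name `build_interference` and the (destination, sources) instruction encoding are assumptions, branches are ignored, and register copies get no special treatment. The code walks backward, tracking which registers are live after each instruction, and records a conflict between a defined register and everything live after its definition.

```python
def build_interference(instrs, live_at_exit=()):
    """Return {register: set of interfering registers} for straight-line code.

    instrs is a list of (dest, sources) pairs; dest is None for a pure use.
    (Hypothetical encoding chosen for this sketch.)
    """
    graph = {}
    live = set(live_at_exit)      # registers live after the current instruction
    for reg in live:
        graph.setdefault(reg, set())
    for dest, sources in reversed(instrs):
        if dest is not None:
            graph.setdefault(dest, set())
            # dest conflicts with every other register live after its definition
            for other in live - {dest}:
                graph.setdefault(other, set())
                graph[dest].add(other)
                graph[other].add(dest)
            live.discard(dest)    # dest is not live before its own definition
        for src in sources:
            graph.setdefault(src, set())
            live.add(src)         # sources are live before the instruction
    return graph


# Four registers; A is last used before D is defined, so A and D do not conflict.
program = [("A", []), ("B", []), ("C", []), (None, ["A"]),
           ("D", []), (None, ["B"]), (None, ["C"]), (None, ["D"])]
graph = build_interference(program)
```

A full allocator would run a proper data-flow liveness analysis over the control-flow graph; the backward walk above is only valid for a single basic block.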

Subsumption

If the source and destination of a register copy do not interfere, they may be coalesced into a single node

For each register copy in IL, determine whether the registers interfere

– If not, coalesce the two nodes into one

After first pass, rewrite IL code

Repeat until no more coalescing is possible
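The coalescing loop can be sketched as below. This is a simplified illustration under stated assumptions: the graph is a dict of neighbour sets, `copies` is a list of (destination, source) pairs, and the name `coalesce` is hypothetical. Instead of rewriting the IL and rebuilding the graph after each pass, the sketch merges neighbour sets directly, which is a conservative shortcut rather than the procedure on the slide.

```python
def coalesce(graph, copies):
    """Merge non-interfering copy pairs; return (new graph, register -> survivor)."""
    graph = {n: set(adj) for n, adj in graph.items()}   # work on a copy
    alias = {}                                          # merged-away -> survivor

    def find(reg):
        while reg in alias:
            reg = alias[reg]
        return reg

    changed = True
    while changed:                      # repeat until no more coalescing is possible
        changed = False
        for dst, src in copies:
            a, b = find(dst), find(src)
            if a == b or b in graph[a]:
                continue                # already merged, or the two interfere
            for n in graph.pop(b):      # fold b's edges into a
                graph[n].discard(b)
                graph[n].add(a)
                graph[a].add(n)
            alias[b] = a
            changed = True
    return graph, {reg: find(reg) for reg in alias}


# Edges here are an assumed reading of a four-register program
# A = 1; B = A; B = B + 1; C = B; D = A; (C and D used later).
graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
merged, mapping = coalesce(graph, [("B", "A"), ("C", "B"), ("D", "A")])
```

With these inputs, B is folded into C and A into D, leaving two nodes that interfere with each other.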

Subsumption Example

Instructions    Live    Dead
A = 1           A
B = A           B
B = B + 1
C = B           C       B
D = A           D       A
…                       C, D

[Figure: interference graph with nodes A, B, C, D]

Subsumption Example

Instructions    Live    Dead
AD = 1          AD
BC = AD         BC
BC = BC + 1
…                       AD, BC

[Figure: interference graph with nodes AD, BC]

Finding a 32-Coloring

Each symbolic register is assigned a color representing a real register

If no adjacent nodes have the same color, then the coloring succeeds

Assume that G has a node N with degree < 32

Then G is 32-colorable iff the reduced graph, from which N and all its edges have been omitted, is 32-colorable

Algorithm repeatedly removes nodes of degree < 32 until all nodes have been removed

Algorithm fails if nodes remain but none has degree < 32
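The removal rule above can be sketched for a general number of colors k (k = 32 on the slides). This is an illustrative version, not the paper's code: the name `color_graph`, the dict-of-sets graph, and the choice of the alphabetically first low-degree node are all assumptions. Removed nodes are pushed on a stack and colored in reverse order, so each node sees fewer than k already-colored neighbours.

```python
def color_graph(graph, k):
    """Return {node: color index} using k colors, or None if the heuristic fails."""
    work = {n: set(adj) for n, adj in graph.items()}   # graph being reduced
    stack = []
    while work:
        # pick any remaining node with degree < k (first in sorted order here)
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            return None            # every remaining node has degree >= k
        for nb in work.pop(node):
            work[nb].discard(node)
        stack.append(node)
    colors = {}
    for node in reversed(stack):   # re-insert nodes in reverse removal order
        used = {colors[nb] for nb in graph[node] if nb in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors


graph = {"A": {"B", "C"}, "B": {"A", "C", "D"},
         "C": {"A", "B", "D"}, "D": {"B", "C"}}
three = color_graph(graph, 3)      # succeeds
two = color_graph(graph, 2)        # fails: no node has degree < 2
```

A `None` result only means the heuristic gave up; it is the point at which spilling would be considered.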

3-coloring example

Instructions    Live    Dead
A = 1           A
B = 2           B
C = 3           C
? = A                   A
D = 4           D
? = B                   B
? = C                   C
? = D                   D

[Figure: interference graph with nodes A, B, C, D]

Spilling

If the 32-coloring fails, then nodes must be spilled to memory

Spilled registers are stored to memory, then loaded momentarily when their results are needed

Every time spill code is generated, the interference graph must be rebuilt

Usually recoloring succeeds after spilling, but sometimes several passes are required

Spilling

NP-complete problem

Heuristic: spill the node that minimizes

– Cost of spilling / Degree of node

Cost of spilling

– (number of definition points + number of use points) * frequency of each point

In some cases, spilled node can be reloaded for an extended interval
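The spill heuristic can be sketched as follows. The names `spill_cost` and `choose_spill` and the example numbers are invented for illustration; the sketch assumes each definition or use point of a register comes with an estimated execution frequency, so the cost is the sum of those frequencies.

```python
import math


def spill_cost(point_frequencies):
    """Cost of spilling one register: each def/use point weighted by its frequency."""
    return sum(point_frequencies)


def choose_spill(graph, costs):
    """Pick the node with the smallest cost / degree ratio."""
    def ratio(node):
        degree = len(graph[node])
        # a node with no neighbours never blocks coloring, so never prefer it
        return costs[node] / degree if degree else math.inf
    return min(graph, key=ratio)


graph = {"A": {"B", "C"}, "B": {"A", "C", "D"},
         "C": {"A", "B", "D"}, "D": {"B", "C"}}
costs = {"A": 10, "B": 6, "C": 12, "D": 8}      # made-up costs
victim = choose_spill(graph, costs)             # ratios: A 5.0, B 2.0, C 4.0, D 4.0
```

Dividing by the degree favours nodes whose removal relieves the most conflicts per unit of added memory traffic.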

Conclusion

The graph coloring and spilling algorithms should produce faster code

The register allocation algorithm is efficient

– Graph coloring is O(N)

– But uses O(N^2) space

Compile-time Copy Elimination

Peter Schnorf

Mahadevan Ganapathi

John Hennessy

Stanford, 1993

Motivation

Single-assignment languages simplify dependency checking, which simplifies automatic detection and exploitation of parallelism

But single-assignment languages require a large number of copies

Previous implementations eliminate copies at runtime

Increased efficiency if copies can be eliminated at compile time

Paper Overview

Single-assignment languages

Code generation

Compile-time copy elimination techniques

– Substitution

– Pattern matching

– Substructure sharing

– Substructure targeting

Results: success!

– Eliminated all copies in bubble sort

Single-assignment languages

Functional languages (LISP, Haskell, SISAL)

Simpler dependency checking

– True dependencies (write, then read): b = f(c), a = f(b)

– Anti-dependencies (read, then write): a = f(b), b = f(c)

– Output dependencies (write, then write): a = f(b), a = f(c)

– Aliasing caused by pointers, array indexes

To avoid aliasing, all inputs and outputs are passed by value
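The three dependency kinds can be told apart mechanically from what each statement writes and reads. The sketch below is only an illustration: `classify` and the (written name, set of read names) encoding are assumptions, and aliasing is ignored entirely.

```python
def classify(first, second):
    """Return the dependency kinds between two statements executed in order.

    Each statement is (written_name, set_of_read_names).
    """
    written1, reads1 = first
    written2, reads2 = second
    kinds = []
    if written1 in reads2:
        kinds.append("true")      # second reads what first wrote
    if written2 in reads1:
        kinds.append("anti")      # second overwrites what first read
    if written1 == written2:
        kinds.append("output")    # both write the same name
    return kinds


true_dep = classify(("b", {"c"}), ("a", {"b"}))     # b = f(c), a = f(b)
anti_dep = classify(("a", {"b"}), ("b", {"c"}))     # a = f(b), b = f(c)
output_dep = classify(("a", {"b"}), ("a", {"c"}))   # a = f(b), a = f(c)
```

In a single-assignment program no name is written twice, so the "anti" and "output" cases cannot arise and only true dependencies remain to be checked.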

Example – Swap(A,i,j)

Data flow diagram

– Edges transport values

– Simple nodes are operations

Pick any feasible node evaluation order at random

Naïve implementation

– Each edge has its own memory

– Swap uses 5 array copies!

Optimized implementation

– Swap array updates are done in-place

[Figure: data-flow diagram of Swap with an Input node, two AElement nodes and two AReplace nodes]
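The difference between the two implementations of Swap can be sketched in Python. The functions `a_element`, `a_replace_copy` and `a_replace_in_place` are hypothetical stand-ins for the AElement and AReplace nodes, and this copying version only copies inside AReplace, so it makes fewer copies than the per-edge naïve scheme described on the slide.

```python
def a_element(arr, i):
    """AElement: read one element."""
    return arr[i]


def a_replace_copy(arr, i, value):
    """AReplace with value semantics: the input array is left untouched."""
    new = list(arr)
    new[i] = value
    return new


def a_replace_in_place(arr, i, value):
    """AReplace sharing memory with its input: safe only if arr is not needed again."""
    arr[i] = value
    return arr


def swap_copying(arr, i, j):
    x, y = a_element(arr, i), a_element(arr, j)
    return a_replace_copy(a_replace_copy(arr, i, y), j, x)


def swap_in_place(arr, i, j):
    # both reads happen before either update, so overwriting arr is safe
    x, y = a_element(arr, i), a_element(arr, j)
    return a_replace_in_place(a_replace_in_place(arr, i, y), j, x)


original = [1, 2, 3]
copied = swap_copying(original, 0, 2)     # original is still [1, 2, 3]
scratch = [1, 2, 3]
updated = swap_in_place(scratch, 0, 2)    # scratch itself is modified
```

Both versions return the same values; the in-place one is only correct because the element reads are ordered before the updates and the caller no longer needs the old array.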

Example: BubbleSort(A)

Compound nodes represent control flow

Loops are implemented using recursion to avoid multiple assignment of the iteration variable

Naïve implementation

– Bubble sort requires O(n^2) array copies

Optimized implementation

– All array updates are done in place

– But parallelism is decreased
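A loop written as recursion might look like the sketch below, which uses Python tuples to imitate value semantics. The function names are illustrative, and every name is bound exactly once per call, so no iteration variable is ever reassigned. Each swap builds a new tuple, which is the kind of copying the naïve implementation pays for.

```python
def swap_adjacent(arr, i):
    """Return a new tuple with elements i and i + 1 exchanged."""
    return arr[:i] + (arr[i + 1], arr[i]) + arr[i + 2:]


def bubble_pass(arr, i, last):
    """One pass over positions i..last, expressed as recursion instead of a loop."""
    if i >= last:
        return arr
    nxt = swap_adjacent(arr, i) if arr[i] > arr[i + 1] else arr
    return bubble_pass(nxt, i + 1, last)


def bubble_sort(arr, last=None):
    """Sort a tuple; `last` is the final index still considered unsorted."""
    end = len(arr) - 1 if last is None else last
    if end <= 0:
        return arr
    return bubble_sort(bubble_pass(arr, 0, end), end - 1)


data = (3, 1, 2)
result = bubble_sort(data)     # data itself is never modified
```

Python does not optimize tail calls, so this form is only suitable for short inputs; it is meant to show the shape of the program, not to be an efficient sort.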

Code Generation Overview

Input is from compiler front-end

– IF1: intermediate data-flow graph representation

Code generator eliminates copies

Output is in C

– Compiled into machine code using an optimizing C compiler

Vertical Substitution

If input and output have the same type and size, they can share memory

– Updates are done in-place

[Figure: Swap data-flow diagram (Input, AElement and AReplace nodes) with edges numbered 1 to 4]

Horizontal Substitution

If an output has several destinations, the output edges can share memory

[Figure: Swap data-flow diagram (Input, AElement and AReplace nodes) with edges numbered 1 to 4]

Horizontal and Vertical Substitution

Horizontal and vertical substitution can interfere with each other

– A node along the substitution chain modifies the shared object before its last use

Edges can be marked as read-only if they are shared and this is not the last use

Horizontal and Vertical Substitution

[Figure: two copies of the Swap data-flow diagram (Input, AElement and AReplace nodes), each with edges numbered 1 to 4 in a different order]

Interprocedural Substitution

Previous discussion concerned simple nodes that can be analyzed at compiler design time

Information about a function is needed in order to use substitution

– Does the function modify an input?

– Will an input be chained to an output?

Intersubgraph Substitution

Substitution analysis is done for each construct

Same basic principles

Determining the Evaluation Order

Evaluation order can impact efficiency of substitution

Naïve implementation selects the next node to evaluate at random

Hints tell algorithm which nodes should be evaluated before and after other nodes if possible

Hints are ad hoc?

Pattern Matching

Replace hard-to-optimize pieces of code

Patterns are language-specific

Patterns are detected using “ad hoc” methods

Substructure Sharing

Allow substructures to be referenced without copies

AElement can be treated as a NoOp

Happens after substitution analysis (less important)

Same principles as substitution analysis

Substructure Targeting

Allow structures to be built from substructures without copies

Similar to substructure sharing

Results

Compared optimizations versus naïve implementation

Optimizations eliminate all copies for bubble sort

Informal comparison to run-time optimizer shows improvements

Conclusions

Substitution, pattern matching and substructure sharing can almost eliminate unnecessary copies in a single assignment language.

Copy elimination no longer has to be done at run-time.

Single assignment languages should be more efficient for parallel programs.
