Optimizing Memory Accesses for Spatial Computation

Optimizing Memory Accesses for Spatial Computation Mihai Budiu, Seth Goldstein CGO 2003
Page 1: Optimizing Memory Accesses for Spatial Computation

Optimizing Memory Accesses for Spatial Computation

Mihai Budiu, Seth Goldstein

CGO 2003

Page 2

Optimizing Memory Accesses for Spatial Computation

[Figure: Program → Compiler]

Page 3

This work

[Figure: C → Predicated IR → Optimized IR]

Why at CGO?

Page 4

Optimizing Memory Accesses for Spatial Computation

[Figure: the memory accesses =*q, *p=, =a[i] and =*p laid out along a time axis, first one after another, then with =*q, *p= and =a[i] issued in parallel.]

This paper describes compiler representations and algorithms to:
• increase memory access parallelism
• remove redundant memory accesses

Page 5

Intermediate Representation

Traditionally:
• CFG, with separate def-use and may-dependence (may-dep.) information

Our proposal:
• SSA + predication
• Uniform for scalars and memory
• Explicitly encode may-dependences
• Summarize control flow
• Executable

Page 6

Contributions

• Predicated SSA optimizations for memory
  – Boolean manipulation instead of CFG dependences
  – Powerful term-rewriting optimizations for memory
  – Simple to implement and reason about
• Expose memory parallelism in loops
  – New loop pipelining techniques
  – New parallelization method: loop decoupling

Page 7

Outline

• Introduction

• Program representation

• Redundant memory operation removal

• Pipelining memory accesses in loops

• Conclusions

Page 8

Executable SSA

if (x) y = x*2;
else y++;

[Figure: dataflow graph for this code. A * node computes x*2, a + node computes y+1, a ! node negates x, and the predicates x and !x select which value becomes y'.]

• Program representation is a graph:
  – Nodes = operations, edges = values
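A minimal C sketch of what the graph computes, assuming a decoded multiplexor that takes one predicate per data input: both arms are evaluated speculatively, and x and !x choose which result becomes y'. The names `mux2`, `with_branch` and `as_dataflow` are hypothetical, chosen for this illustration rather than taken from the paper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded multiplexor: each data input is paired with a
   predicate, and exactly one predicate is expected to be true. */
static int mux2(bool p0, int v0, bool p1, int v1)
{
    assert(p0 != p1);              /* the two predicates are complementary */
    return p0 ? v0 : v1;
}

/* Original control flow from the slide. */
static int with_branch(int x, int y)
{
    if (x) y = x * 2;
    else   y++;
    return y;
}

/* Executable SSA form: both arms run; x and !x steer the multiplexor. */
static int as_dataflow(int x, int y)
{
    int t_mul = x * 2;             /* "*" node */
    int t_add = y + 1;             /* "+" node */
    bool p  = (x != 0);            /* predicate x */
    bool np = !p;                  /* "!" node */
    return mux2(p, t_mul, np, t_add);   /* y' */
}
```

Because every node can fire as soon as its inputs arrive, the multiply and the increment may run in parallel; only the multiplexor waits for the predicate.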

Page 9

Predication

… = *p;
if (x)
    … = *q;
else
    *r = …;

(1)  … = *p;
(x)  … = *q;
(!x) *r = …;

• Predicates encode control flow
• Hyperblock ⇒ branch-free code
• Caveat: all optimizations have hyperblock scope
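The predicated form can be imitated in plain C by guarding every memory operation with its predicate. This is a sketch only: `pred_load` and `pred_store` are hypothetical helpers, and a disabled load returning 0 is an arbitrary choice, since that value is never selected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicated memory operations: memory is touched only
   when the predicate is true. */
static int pred_load(bool pred, const int *addr)
{
    return pred ? *addr : 0;       /* value is unused when pred is false */
}

static void pred_store(bool pred, int *addr, int value)
{
    if (pred) *addr = value;
}

/* Original code with a branch; returns the sum of the values read. */
static int branchy(int x, const int *p, const int *q, int *r, int v)
{
    int s = *p;
    if (x) s += *q;
    else   *r = v;
    return s;
}

/* Hyperblock version: branch-free, every memory operation carries a
   predicate and the operations keep their original program order. */
static int hyperblock(int x, const int *p, const int *q, int *r, int v)
{
    bool px = (x != 0);
    int s = pred_load(true, p);    /* (1)  ... = *p;  */
    int t = pred_load(px, q);      /* (x)  ... = *q;  */
    pred_store(!px, r, v);         /* (!x) *r = ...;  */
    return px ? s + t : s;         /* select merges the two paths */
}
```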

Page 10

Read-write Sets

*p = …;
if (x) … = *q;
else *r = …;

[Figure: the accesses above and Memory, with Entry and Exit points]

Page 11

Token Edges

*p = …;
if (x) … = *q;
else *r = …;

[Figure: the accesses above linked by token edges, between the Entry and Exit points of Memory]

Page 12

Tokens ≈ SSA for Memory

*p = …;
if (x) … = *q;
else *r = …;

[Figure: this code shown twice side by side, each with its token graph starting at Entry]

Page 13

Meaning of Token Edges

• Token graph is maintained transitively reduced
  – Focuses the optimizer
  – Linear space complexity in practice

[Figure: … = *q and *p = … connected by a token edge]
• May be dependent
• No intervening memory operation

[Figure: … = *q and *p = … with no token path between them]
• Independent
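One way to keep a token graph transitively reduced is to drop every edge u→v that some longer token path already implies. The sketch assumes an acyclic graph, operations numbered in topological order, and at most 32 memory operations so that each successor set fits in a 32-bit mask; `TokenGraph` and `transitive_reduce` are illustrative names, not the paper's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OPS 32

/* Token graph over memory operations 0..n-1, numbered in topological
   order (every edge u->v has u < v). Bit v of succ[u] marks u->v. */
typedef struct {
    int n;
    uint32_t succ[MAX_OPS];
} TokenGraph;

/* reach[u] = operations reachable from u by one or more edges. */
static void compute_reach(const TokenGraph *g, uint32_t reach[MAX_OPS])
{
    for (int u = g->n - 1; u >= 0; u--) {
        reach[u] = g->succ[u];
        for (int v = u + 1; v < g->n; v++)
            if (g->succ[u] & (UINT32_C(1) << v))
                reach[u] |= reach[v];
    }
}

/* Remove every edge u->v for which another successor w of u already
   reaches v; the transitive closure (the dependence order) is unchanged. */
static void transitive_reduce(TokenGraph *g)
{
    uint32_t reach[MAX_OPS];
    assert(g->n <= MAX_OPS);
    compute_reach(g, reach);
    for (int u = 0; u < g->n; u++) {
        uint32_t implied = 0;
        for (int w = u + 1; w < g->n; w++)
            if (g->succ[u] & (UINT32_C(1) << w))
                implied |= reach[w];
        g->succ[u] &= ~implied;
    }
}
```

In the test graph, operation 0 (say, *p = …) has edges to 1, 2 and 3, and both 1 and 2 already lead to 3, so the edge 0→3 is dropped while 0→1 and 0→2 remain.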

Page 14

Outline
• Introduction
• Program Representation
• Redundant memory operation removal
  – Dead code elimination
  – Load || load
  – Store ⇒ load
  – Store ⇒ store
  – Useless token removal
  – ...
• Pipelining memory accesses in loops
• Evaluation
• Conclusions

Page 15

Dead Code Elimination

*p = … (false)

• A memory operation whose predicate is false never executes and is removed

Page 16

≈ PRE

… = *p (p1)    … = *p (p2)    ⇒    … = *p (p1 ∨ p2)

This corresponds in the CFG to lifting the load to a basic block dominating the original loads.
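To make the predicate arithmetic concrete, the sketch below encodes each predicate as a truth table over at most five branch conditions, so ∨, ∧ and ¬ become bitwise operations and equality is exact. This encoding, and the `Load` record with a symbolic address id, are assumptions for illustration; the merge is only legal when no store to the same address intervenes and the merged node does not create a cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Predicates as truth tables over conditions c0..c4: bit k holds the
   predicate's value under assignment k (illustrative encoding). */
typedef uint32_t Pred;

static Pred cond(int i)            /* truth table of condition c_i */
{
    Pred t = 0;
    for (int k = 0; k < 32; k++)
        if ((k >> i) & 1)
            t |= (Pred)1 << k;
    return t;
}

/* A load identified by a hypothetical symbolic address id and guarded
   by a predicate. */
typedef struct {
    int addr;
    Pred pred;
} Load;

/* Two loads of the same address with no intervening store become one
   load whose predicate is the disjunction; both consumers use its value. */
static Load merge_loads(Load a, Load b)
{
    assert(a.addr == b.addr);
    Load m = { a.addr, a.pred | b.pred };
    return m;
}
```

With p1 = c0 ∧ c1 and p2 = c0 ∧ ¬c1, the merged load's predicate simplifies to c0.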

Page 17

Forwarding Data (St ⇒ Ld)

Before:
*p = … (p1)
… = *p (p2)

After:
*p = … (p1)
… = *p (p2 ∧ ¬p1)

• The load is executed only if the store is not

Page 18

Forwarding Data (2)

Before:
*p = … (p1)
… = *p (p2)

After:
*p = … (p1)
… = *p (false)

• When p2 ⇒ p1, the load becomes dead...
• ...i.e., when the store dominates the load in the CFG
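Both forwarding rewrites reduce to Boolean operations on predicates. Using a truth-table encoding (an assumption of this sketch, limited to five conditions), the new load predicate is p2 ∧ ¬p1, the load is dead exactly when p2 ⇒ p1, and the load's consumers see the stored value whenever p1 holds. `forwarded_value` evaluates that selection for one concrete assignment k of the conditions; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Truth-table predicates over conditions c0..c4 (illustrative encoding). */
typedef uint32_t Pred;

static Pred cond(int i)
{
    Pred t = 0;
    for (int k = 0; k < 32; k++)
        if ((k >> i) & 1)
            t |= (Pred)1 << k;
    return t;
}

/* a => b holds under every assignment of the conditions. */
static bool implies(Pred a, Pred b)
{
    return (a & ~b) == 0;
}

/* "*p = v (p1)" followed by "... = *p (p2)": the load only needs to
   run when the store did not. */
static Pred forwarded_load_pred(Pred p1, Pred p2)
{
    return p2 & ~p1;
}

/* Value seen by the load's consumers under assignment k: the stored
   value if the store ran, otherwise what memory already held. */
static int forwarded_value(Pred p1, int k, int stored, int in_memory)
{
    return ((p1 >> k) & 1) ? stored : in_memory;
}
```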

Page 19

Store-store (1)

Before:
*p = … (p1)
*p = … (p2)

After:
*p = … (p1 ∧ ¬p2)
*p = … (p2)

• When p1 ⇒ p2, the first store becomes dead...
• ...i.e., when the second store post-dominates the first in the CFG
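A sketch of the store-store rewrite under the same kind of truth-table encoding (an assumption, limited to five conditions): the first store's predicate becomes p1 ∧ ¬p2, which is false exactly when p1 ⇒ p2. `run_two_stores` replays both stores for one assignment of the conditions so the rewrite can be checked exhaustively; it assumes no load of *p between the two stores. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table predicates over conditions c0..c4 (illustrative encoding). */
typedef uint32_t Pred;

static Pred cond(int i)
{
    Pred t = 0;
    for (int k = 0; k < 32; k++)
        if ((k >> i) & 1)
            t |= (Pred)1 << k;
    return t;
}

/* "*p = a (p1)" followed by "*p = b (p2)": the first store only matters
   when the second one does not overwrite it. */
static Pred first_store_pred(Pred p1, Pred p2)
{
    return p1 & ~p2;
}

/* Final value of *p under assignment k, starting from init. */
static int run_two_stores(Pred p1, Pred p2, int k, int init, int a, int b)
{
    int mem = init;
    if ((p1 >> k) & 1) mem = a;    /* first store  */
    if ((p2 >> k) & 1) mem = b;    /* second store */
    return mem;
}
```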

Page 20

Store-store (2)

Before:
*p = … (p1)
*p = … (p2)

After:
*p = … (p1 ∧ ¬p2)
*p = … (p2)

• Token edge eliminated, but...
• ...transitive closure of tokens preserved

Page 21

Key Observation

The control-dependence tests and transformations (i.e., dominance, post-dominance) are carried out by simple Boolean manipulations of predicates.

Page 22

Implementation Is Clean

Optimization                                    LOC
Useless dependence removal                      160
Immutable loads                                 70
Dead-code elimination (incl. memory op)         66
Load-after-load and store-after-store removal   153
Redundant load and store removal                94
Transitive reduction of token edges             61
Loop-invariant scalar & load discovery          74

Page 23

Operations Removed: static data

[Chart: percentage of memory reads and writes removed (static counts), y-axis 0 to 30%. Mediabench: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, mesa. SpecInt95: go, m88ksim, compress, li, ijpeg, perl, vortex.]

Page 24

Operations Removed: dynamic data

[Chart: percentage of dynamically executed memory reads and writes removed, y-axis 0 to 25%; two bars exceed the axis and are labeled 57 and 43. Mediabench: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, mesa. SpecInt95: go, m88ksim, compress, li, ijpeg, perl, vortex.]

Page 25

Outline

• Introduction

• Program Representation

• Redundant memory operation removal

• Pipelining memory accesses in loops

• Conclusions

Page 26

Loop Pipelining

... = *in++;
*out++ = ...;

[Figure: this loop body drawn before and after the transformation]

• 1 loop ⇒ 2 loops, which can slip with respect to each other
• ‘in’ slips ahead of ‘out’ ⇒ pipelining of the loop body

Page 27

One Token Loop Per “Object”

extern int a[ ];

void g(int* p)
{
    int i;
    for (i = 0; i < N; i++)    /* N: loop bound, not defined on the slide */
        a[i] += *p;
}

[Figure: one token loop for the object a[ ], carrying =*a and *a=, and one token loop for all other memory, carrying =*p]

Page 28

Inter-iteration Dependences

[Figure: the token loops a and other around =*p, =*a and *a=, annotated "All accesses prior to current iteration" and "All accesses after current iteration"]

Page 29

Monotone Addresses

*a++ = …

• a[1] must receive a token from a[0]...
• ...but these are independent!

[Figure: the store *a++= in a loop, drawn twice; the second version uses a token generator and a token collector]

Page 30

Loop Decoupling: Motivation

for (i = 0; i < N; i++) {
    a[i] = ....;
    .... = a[i+3];
}

[Figure: a single token loop a links a[i]= and =a[i+3], drawn for two iterations; the two accesses are marked independent]

Page 31

Loop Decoupling

for (i = 0; i < N; i++) {
    a[i] = ....;
    .... = a[i+3];
}

[Figure: the token loop is split into a0, for a[i]=, and a3, for =a[i+3], connected through a token generator tk(3) that provides slip control]

• Token generator emits 3 tokens “instantly”
• It allows the a0 loop to slip at most 3 iterations ahead of a3
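The slip control can be simulated sequentially. In this sketch (N = 16, the initial array contents and the pseudo-random scheduler are arbitrary choices), the a0 loop spends one token per write and each completed read in the a3 loop returns one, starting from the three tokens emitted by tk(3). Whatever order the scheduler picks, the write of a[i+3] never overtakes the read of it, so the outputs match the sequential loop.

```c
#include <assert.h>
#include <stdbool.h>

#define N    16
#define SLIP 3                     /* tk(3): tokens emitted up front */

static void init(int a[N + SLIP])
{
    for (int k = 0; k < N + SLIP; k++)
        a[k] = -k;                 /* arbitrary initial contents */
}

/* Original loop: a[i] = ....; .... = a[i+3]; */
static void sequential(int out[N])
{
    int a[N + SLIP];
    init(a);
    for (int i = 0; i < N; i++) {
        a[i] = 10 * i;
        out[i] = a[i + SLIP];
    }
}

/* Decoupled execution: the a0 (write) and a3 (read) loops advance under a
   pseudo-random schedule. Each write consumes a token and each finished
   read releases one. Returns the largest observed slip (writes ahead of
   reads). */
static int decoupled(int out[N], unsigned seed)
{
    int a[N + SLIP];
    init(a);
    int w = 0, r = 0;              /* next iteration of the a0 / a3 loops */
    int tokens = SLIP;
    int max_slip = 0;
    while (w < N || r < N) {
        seed = seed * 1103515245u + 12345u;   /* small LCG scheduler */
        bool writer_ready = (w < N && tokens > 0);
        bool reader_ready = (r < N);
        assert(writer_ready || reader_ready); /* never deadlocks */
        if (writer_ready && (!reader_ready || ((seed >> 16) & 1u))) {
            tokens--;              /* a0 spends a token */
            a[w] = 10 * w;
            w++;
        } else {
            out[r] = a[r + SLIP];
            r++;
            tokens++;              /* a3 returns a token */
        }
        if (w - r > max_slip)
            max_slip = w - r;
    }
    return max_slip;
}
```

With more than three initial tokens, the a0 loop could overwrite a[i+3] before the a3 loop reads it, and the outputs would diverge.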

Page 32

Performance Impact of Memory Optimizations

[Chart: speed-up vs. no memory optimizations, per benchmark, y-axis 1.0 to 1.7; two bars exceed the axis and are labeled 2.1 and 2.0. Mediabench: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, mesa. SpecInt95: m88ksim, compress, li, ijpeg, perl, vortex.]

Page 33

Conclusions

• Tokens = compact representation of memory dependences

• Explicit dependences enable easy & powerful optimizations

• Simple predicate manipulation replaces control-flow transforms

• Fine-grain dependence information enables loop pipelining

• Token generators + loop decoupling = dynamic slip control

Page 34

Backup Slides

• Compilation speed
• Compiler structure
• Tokens in hardware
• Cycle-free condition
• How performance is evaluated
• Sources of performance
• Aren’t these optimizations well known?
• Computing predicates

Page 35

Compilation Speed

• On average 3.5x slower than gcc -O3
• At most 10x slower
• We do intra-procedural pointer analysis, but no scheduling or register allocation

Page 36

Compiler Structure

• C/FORTRAN → Suif CC → high Suif IR: inlining, unrolling (call-graph)
• low Suif IR: pointer analysis, live variable analysis, CFG construction, unreachable code, build hyperblocks, control dominance, path predicates
• Pegasus (Predicated SSA): CSE, dead code, PRE, induction variables, strength reduction, loop-invariant lift, reassociation, memory optimization, constant propagation, constant folding, unreachable code (call-graph)
• Back ends: C circuit simulation, Verilog

Page 37

Tokens in Hardware

[Figure: a Load operation with address, predicate and token inputs and with data and token outputs, connected to Memory through an LSQ]

• Tokens are actual operation inputs and outputs
• An operation waits for its token to execute
• The output token is released as soon as the side effect is certain

Page 38

Cycle-free Condition

… = *p (p1)    … = *p (p2)    ⇒    … = *p (p1 ∨ p2)

• Testing the condition requires a reachability computation
• With memoization, the complexity is amortized constant
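Merging two loads into one node must not create a cycle: if one load reaches the other through data or token edges, the merged node would depend on itself. A minimal sketch of the test with memoized reachability follows; the fixed-size adjacency arrays, the node numbering and the names `Graph`, `reaches` and `can_merge` are assumptions, and the memo table must be cleared whenever the graph is rewritten.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64

/* Hypothetical dataflow graph whose edges include data and token edges. */
typedef struct {
    int n;
    int nsucc[MAX_NODES];
    int succ[MAX_NODES][MAX_NODES];
    signed char memo[MAX_NODES][MAX_NODES];   /* -1 unknown, 0 no, 1 yes */
} Graph;

static void graph_init(Graph *g, int n)
{
    assert(n <= MAX_NODES);
    g->n = n;
    for (int u = 0; u < n; u++) {
        g->nsucc[u] = 0;
        for (int v = 0; v < n; v++)
            g->memo[u][v] = -1;
    }
}

static void add_edge(Graph *g, int u, int v)
{
    assert(g->nsucc[u] < MAX_NODES);
    g->succ[u][g->nsucc[u]++] = v;
}

/* Is there a path of one or more edges from u to v? The graph is assumed
   acyclic; answers are memoized, so repeated queries cost O(1). */
static bool reaches(Graph *g, int u, int v)
{
    if (g->memo[u][v] >= 0)
        return g->memo[u][v];
    bool found = false;
    for (int i = 0; i < g->nsucc[u] && !found; i++) {
        int w = g->succ[u][i];
        found = (w == v) || reaches(g, w, v);
    }
    g->memo[u][v] = found;
    return found;
}

/* Loads a and b can be merged only if neither reaches the other. */
static bool can_merge(Graph *g, int a, int b)
{
    return !reaches(g, a, b) && !reaches(g, b, a);
}
```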

Page 39

How Performance Is Evaluated

[Figure: evaluation setup. C, unlimited ILP, LSQ, limited bandwidth (2 words/cycle), 8K L1 cache, 1/4M L2 cache, memory; the numbers 2, 8 and 72 appear next to L1, L2 and Mem.]

Page 40

Sources of Performance

• Removal of redundant operations

• More freedom in scheduling

• Pipelining loops

Page 41

Aren’t These Opts. Well Known?

• gcc -O3, Pentium
• Sun Workshop CC -xO5, Sparc
• DEC cc -O4, Alpha
• MIPSpro cc -O4, SGI
• SGI ORC -O4, Itanium
• IBM cc -O3, AIX
• Our compiler

void f(unsigned *p, unsigned a[], int i)
{
    if (p) a[i] += *p;
    else a[i] = 1;
    a[i] <<= a[i+1];
}

Only ones to remove accesses to a[i]
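The slide asks whether compilers remove the redundant accesses to a[i]. A hand-optimized version shows what removing them means. This sketch assumes the intended statement is `a[i] += *p` (adding a pointer to an unsigned value does not compile); the reload of a[i] is forwarded from the store, and the first store is dropped because the second one overwrites it on every path, leaving at most one load and one store of a[i]. `f_opt` is a hypothetical name, and shift counts must stay below the width of unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* Slide example, assuming "a[i] += *p" was intended. */
static void f(unsigned *p, unsigned a[], int i)
{
    if (p) a[i] += *p;
    else   a[i] = 1;
    a[i] <<= a[i+1];
}

/* Hand-optimized equivalent: the value stored to a[i] is forwarded to the
   shift, and only one store to a[i] remains. It stays correct if p
   points to a[i] or a[i+1], since neither is written before being read. */
static void f_opt(unsigned *p, unsigned a[], int i)
{
    unsigned t = p ? a[i] + *p : 1u;   /* a[i] loaded only when p != NULL */
    a[i] = t << a[i+1];                /* single store to a[i] */
}
```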

Page 42

Computing Predicates

• Correct for irreducible graphs
• Correct even when speculatively computed
• Can be eagerly computed

[Figure: CFG fragment with blocks s, t and b]
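A minimal sketch of path-predicate computation for an acyclic region: the entry's predicate is true, and each block's predicate is the disjunction, over its incoming edges, of the source block's predicate and the edge's branch condition. It assumes blocks numbered in topological order (so it does not cover the irreducible or speculative cases the slide mentions) and a truth-table encoding of predicates; the s, t, b test mirrors the figure's block names with a hypothetical branch condition c0.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table predicates over conditions c0..c4 (illustrative encoding). */
typedef uint32_t Pred;
#define PRED_TRUE ((Pred)0xFFFFFFFFu)

static Pred cond(int i)
{
    Pred t = 0;
    for (int k = 0; k < 32; k++)
        if ((k >> i) & 1)
            t |= (Pred)1 << k;
    return t;
}

/* CFG edge src -> dst, taken when guard holds (PRED_TRUE if unconditional). */
typedef struct { int src, dst; Pred guard; } Edge;

/* Blocks 0..nblocks-1 are in topological order and block 0 is the entry,
   so every block's predicate is final before its outgoing edges are used. */
static void path_predicates(int nblocks, const Edge *edges, int nedges,
                            Pred pred[])
{
    for (int b = 0; b < nblocks; b++)
        pred[b] = (b == 0) ? PRED_TRUE : 0;
    for (int b = 0; b < nblocks; b++)
        for (int e = 0; e < nedges; e++)
            if (edges[e].src == b)
                pred[edges[e].dst] |= pred[b] & edges[e].guard;
}
```

Here s = 0 branches on c0 either directly to b = 2 or through t = 1. The predicate of b simplifies to true, which is the predicate-level statement that b post-dominates s.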

Page 43

Spatial Computation