Proactive Loop-nest Optimizations

Mei Ye

mei.ye@amd.com

Acknowledgements: Dinesh Suresh, Roy Ju, Michael Lai


Adjacent Loops

Five little pumpkins sitting on a gate …


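[Figure: control-flow tree of an example function. Func is the root; its children are If and Block nodes, each If has Then and Else branches, and Loop nodes appear at several depths beneath them.]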


Proactive Loop Fusion

An automated process that iteratively applies a set of code transformations (if-merging, head/tail duplication, code motion, etc.) over the whole function, in no fixed order, to bring pairs of loops adjacent to each other and thereby enable loop fusion.


Proactive Loop Fusion Candidates

A pair of loops are proactive loop fusion candidates iff:

1) They have a Least Common Predecessor (LCP) in the tree.
2) The paths from the candidates to the LCP have equal length.
3) Each pair of corresponding nodes on the two paths has the same type, and pairs of Ifs have identical condition expressions.
4) The loops are not adjacent to each other but are otherwise good fusion candidates.

Complexity: O((depth * n)^2), where depth is the depth in the tree and n is the number of loops at that depth.
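Conditions 1 to 3 can be sketched as a walk up the tree from both loops at once. This is only an illustration: the sc_node struct, the sc_kind enum and fusion_candidate_lcp are hypothetical names, not the compiler's actual data structures, and condition 4 (adjacency and fusion legality) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds and tree node, for illustration only. */
typedef enum { SC_FUNC, SC_IF, SC_THEN, SC_ELSE, SC_LOOP, SC_BLOCK } sc_kind;

typedef struct sc_node {
    sc_kind kind;
    const char *cond;        /* condition text for SC_IF nodes, else NULL */
    struct sc_node *parent;  /* NULL for the root */
} sc_node;

static int depth_of(const sc_node *n) {
    int d = 0;
    while (n->parent) { n = n->parent; d++; }
    return d;
}

/* Returns the LCP if loops a and b satisfy conditions 1-3, else NULL. */
const sc_node *fusion_candidate_lcp(const sc_node *a, const sc_node *b) {
    assert(a && b);
    if (a == b || a->kind != SC_LOOP || b->kind != SC_LOOP) return NULL;
    /* Equal depth is equivalent to equal path length to the LCP. */
    if (depth_of(a) != depth_of(b)) return NULL;
    while (a != b) {
        if (a->kind != b->kind) return NULL;           /* same type */
        if (a->kind == SC_IF &&
            (!a->cond || !b->cond || strcmp(a->cond, b->cond) != 0))
            return NULL;                               /* same condition */
        a = a->parent;
        b = b->parent;
    }
    return a;  /* NULL if the two loops live in different trees */
}
```

Comparing condition text with strcmp is a stand-in for whatever expression-equivalence test the compiler uses.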

[Figure: an LCP with children If, Block, If; Loop1 sits under a branch of the first If and Loop2 under a branch of the second If.]


Proactive Loop Fusion Transformation Candidates

Proactive loop fusion transformation candidates, cand1 and cand2:

1. Are immediate children of the LCP of the loop fusion candidates.
2. Are each either an If or a Loop.
3. For every sibling strictly between cand1 and cand2 that is a Block or an If: the Block can be safely and legally moved above cand1 if cand1 is a Loop; the If has at least one path that has no dependency on the loop fusion candidates.
4. For every sibling in (cand1, cand2] that is an If: its preceding siblings can be legally if-merged or head-duplicated into it.
5. For every sibling in [cand1, cand2) that is an If: its succeeding siblings can be legally if-merged or tail-duplicated into it.

[Figure: the same tree with the two If children of the LCP marked cand1 and cand2; Loop1 and Loop2 are the loop fusion candidates beneath them.]


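[Figure: three stages. (1) The LCP has children If (cand1, sc1), Block (sc2) and If (cand2). (2) After tail duplication, the LCP has two adjacent If children, sc1 and sc2. (3) After if-merging, the LCP has a single If child.]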


Action Table

sc1    sc2     Action
Loop   Block   Safe code motion of sc2 before sc1. Iteration continues on sc1.
If     Block   Tail duplication of sc2 into sc1. Iteration continues on sc1.
Loop   If      Head duplication of sc1 into sc2. Iteration continues on sc2.
If     If      If-merging or tail duplication of sc2 into sc1. Iteration continues on sc1.
If     Loop    Tail duplication of sc2 into sc1. Iteration continues on sc1.
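The table reads naturally as a lookup on the kinds of two neighbouring siblings. The sketch below encodes it that way; the enum names, the step struct and choose_action are hypothetical, and pairs the table does not list map to ACT_NONE.

```c
#include <assert.h>

typedef enum { K_LOOP, K_IF, K_BLOCK } kind;
typedef enum {
    ACT_NONE,                  /* pair not listed in the table */
    ACT_CODE_MOTION,           /* move sc2 before sc1 */
    ACT_TAIL_DUP,              /* tail-duplicate sc2 into sc1 */
    ACT_HEAD_DUP,              /* head-duplicate sc1 into sc2 */
    ACT_IF_MERGE_OR_TAIL_DUP   /* if-merge or tail-duplicate sc2 into sc1 */
} action;

typedef struct {
    action act;
    int continue_on_sc2;  /* 1: iteration continues on sc2, 0: on sc1 */
} step;

step choose_action(kind sc1, kind sc2) {
    step s = { ACT_NONE, 0 };
    assert(sc1 >= K_LOOP && sc1 <= K_BLOCK);
    assert(sc2 >= K_LOOP && sc2 <= K_BLOCK);
    if (sc1 == K_LOOP && sc2 == K_BLOCK) {
        s.act = ACT_CODE_MOTION;
    } else if (sc1 == K_IF && sc2 == K_BLOCK) {
        s.act = ACT_TAIL_DUP;
    } else if (sc1 == K_LOOP && sc2 == K_IF) {
        s.act = ACT_HEAD_DUP;
        s.continue_on_sc2 = 1;  /* the only row that moves on to sc2 */
    } else if (sc1 == K_IF && sc2 == K_IF) {
        s.act = ACT_IF_MERGE_OR_TAIL_DUP;
    } else if (sc1 == K_IF && sc2 == K_LOOP) {
        s.act = ACT_TAIL_DUP;
    }
    return s;
}
```

Each action absorbs one sibling into the other or moves it out of the way, so repeating the lookup on the surviving node walks the pair of candidates toward adjacency.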


If-merging

Before:

    if (a) {
        for (i = 0; i < n; i++)
            stmt1;
        if (b)
            stmt2;
    }
    if (a) {
        for (i = 0; i < n; i++)
            stmt3;
    }

After:

    if (a) {
        for (i = 0; i < n; i++)
            stmt1;
        if (b)
            stmt2;
        for (i = 0; i < n; i++)
            stmt3;
    }

[Figure: before, Func (the LCP) has two If(a) children, cand1 (sc1) and cand2 (sc2); after if-merging, a single If(a) remains whose Then branch holds Loop, If(b) and Loop.]


Head duplication

Before:

    if (a) {
        for (i = 0; i < n; i++)
            stmt1;
        if (b)
            stmt2;
        for (i = 0; i < n; i++)
            stmt3;
    }

After:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
        }
        for (i = 0; i < n; i++)
            stmt3;
    }

[Figure: before, the Then branch of If(a) (the LCP) holds Loop (sc1), If(b) (sc2) and Loop; after head duplication, the first Loop is copied into both branches of If(b), leaving If(b) (cand1) followed by Loop (cand2).]


Tail duplication

Before:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
        }
        for (i = 0; i < n; i++)
            stmt3;
    }

After:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
            for (i = 0; i < n; i++)
                stmt3;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
    }

[Figure: before, If(b) (sc1) is followed by Loop (sc2) under the Then branch of If(a); after tail duplication, the Loop is copied to the end of both branches of If(b).]


Code motion

Before:

    if (a) {
        if (b) {
            for (i = 0; i < n; i++)
                stmt1;
            stmt2;
            for (i = 0; i < n; i++)
                stmt3;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
    }

After:

    if (a) {
        if (b) {
            stmt2;
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
        else {
            for (i = 0; i < n; i++)
                stmt1;
            for (i = 0; i < n; i++)
                stmt3;
        }
    }

[Figure: in the Then branch of If(b), Loop (cand1), Block, Loop (cand2) becomes Block, Loop, Loop after code motion; the Else branch already holds two adjacent Loops.]
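Once the loops are adjacent on every path, ordinary loop fusion can finish the job. The sketch below compares the original shape with the transformed-and-fused shape. The bodies chosen for stmt1, stmt2 and stmt3 and the function names are assumptions made only for illustration, and the sketch assumes x, y and s do not alias.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape; stmt1..stmt3 are stand-ins chosen for illustration. */
void original(int a, int b, size_t n, int *x, int *y, int *s) {
    if (a) {
        for (size_t i = 0; i < n; i++)
            x[i] += 1;                 /* stmt1 */
        if (b)
            *s += 10;                  /* stmt2 */
    }
    if (a) {
        for (size_t i = 0; i < n; i++)
            y[i] = 2 * x[i];           /* stmt3 */
    }
}

/* Shape after if-merging, head/tail duplication, code motion, then fusion. */
void transformed_and_fused(int a, int b, size_t n, int *x, int *y, int *s) {
    if (a) {
        if (b) {
            *s += 10;                  /* stmt2, moved above the loops */
            for (size_t i = 0; i < n; i++) {
                x[i] += 1;             /* stmt1 */
                y[i] = 2 * x[i];       /* stmt3, fused into the same loop */
            }
        } else {
            for (size_t i = 0; i < n; i++) {
                x[i] += 1;
                y[i] = 2 * x[i];
            }
        }
    }
}
```

Fusion is legal here because stmt3 in iteration i only reads the x[i] written by stmt1 in the same iteration, and moving stmt2 is legal because it touches neither array.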


    void COMP_UNIT::Pro_loop_fusion_trans() {
        // Identify proactive loop fusion candidates and flag LCPs.
        pro_loop_fusion_trans->Classify_loops(func);
        // Start the top-down proactive loop fusion transformations.
        pro_loop_fusion_trans->Top_down_trans(func);
    }

    void PRO_LOOP_FUSION_TRANS::Top_down_trans(SC_NODE * sc) {
        if (sc is a LCP) {  // Process LCPs
            while (1) {
                // Find proactive loop fusion transformation candidates.
                Find_cand(sc, &cand1, &cand2);
                // Invoke proactive loop fusion transformations.
                if (cand1 && cand2)
                    Traverse_trans(cand1, cand2);
                else
                    break;
            }
            if (transformation happens) {
                // Re-identify proactive loop fusion candidates.
                Classify_loops(sc);
            }
        }
        // Recursively visit child nodes.
        SC_LIST_ITER sc_list_iter;
        SC_NODE * kid;
        FOR_ALL_ELEM(kid, sc_list_iter, Init(sc->Kids()))
            Top_down_trans(kid);
    }

Complexity: O(n*m), where n is the number of LCPs and m is the number of intervening nodes among loop fusion candidates.


Proactive Loop Interchange

An automated process that applies loop unswitching, reverse loop unswitching, if-condition distribution, if-condition tree height reduction, and other control-flow-graph transformations to eliminate the statements intervening between the outer loop and the inner loop of a loop nest, thereby enabling loop interchange.


If-condition tree height reduction

Before:

    for (i = 0; i < n; i++) {
        if (a & (1 << i)) {
            if (b)
                bar();
            else if (c) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
        }
    }

After:

    for (i = 0; i < n; i++) {
        if (a & (1 << i)) {
            if (!b && c) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
            else if (b)
                bar();
        }
    }

[Figure: before, the inner Loop sits under if(c) in the Else branch of if(b); after if-condition tree height reduction, it sits directly in the Then branch of if(!b&&c), one level higher in the tree.]
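A quick way to check that the rewritten condition tree selects the same branch is to enumerate b and c. In this sketch the function names and the return codes (0 for neither branch, 1 for the bar() branch, 2 for the inner loop) are illustrative assumptions.

```c
#include <assert.h>

/* Branch selected by the original nesting: if (b) ... else if (c) ... */
int branch_before(int b, int c) {
    if (b)
        return 1;        /* bar() */
    else if (c)
        return 2;        /* inner loop */
    return 0;
}

/* Branch selected after tree height reduction: if (!b && c) ... else if (b) ... */
int branch_after(int b, int c) {
    if (!b && c)
        return 2;        /* inner loop, now one level higher */
    else if (b)
        return 1;        /* bar() */
    return 0;
}

/* Exhaustively compare both forms over all truth values of b and c. */
void check_equivalence(void) {
    for (int b = 0; b <= 1; b++)
        for (int c = 0; c <= 1; c++)
            assert(branch_before(b, c) == branch_after(b, c));
}
```

The two forms agree on all four inputs; what changes is only that the loop is guarded by a single condition instead of two nested ones.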


If-condition distribution

Before:

    for (i = 0; i < n; i++) {
        if (a & (1 << i)) {
            if (!b && c) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
            else if (b)
                bar();
        }
    }

After:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            if (a & (1 << i)) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

[Figure: before, if(a&(1<<i)) encloses if(!b&&c) and if(b); after if-condition distribution, if(!b&&c) and if(b) are outermost and each encloses its own copy of if(a&(1<<i)).]


Reversed loop unswitching

Before:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            if (a & (1 << i)) {
                for (j = 0; j < m; j++)
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

After:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            for (j = 0; j < m; j++) {
                if (a & (1 << i))
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

[Figure: before, if(a&(1<<i)) encloses the inner Loop; after reversed loop unswitching, the inner Loop encloses if(a&(1<<i)).]


Loop unswitching

Before:

    for (i = 0; i < n; i++) {
        if (!b && c) {
            for (j = 0; j < m; j++) {
                if (a & (1 << i))
                    a[j][i] = 0;
            }
        }
        else if (b) {
            if (a & (1 << i))
                bar();
        }
    }

After:

    if (!b && c) {
        for (i = 0; i < n; i++) {
            for (j = 0; j < m; j++) {
                if (a & (1 << i))
                    a[j][i] = 0;
            }
        }
    }
    else if (b) {
        for (i = 0; i < n; i++) {
            if (a & (1 << i))
                bar();
        }
    }

[Figure: before, the outer Loop encloses if(!b&&c) and if(b); after loop unswitching, each branch holds its own copy of the outer Loop, and on the if(!b&&c) path the outer Loop directly encloses the inner Loop.]
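With nothing left between the two loop headers on the (!b && c) path, the nest can be interchanged. The sketch below shows the nest before and after interchange and checks that both write the same cells. The slide reuses the name a for both the bit mask and the array; here they are renamed mask and arr, and the fixed bounds N and M and all function names are likewise assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 4, M = 3 };  /* small illustrative bounds for i and j */

/* Nest as left by loop unswitching: i outer, j inner. */
void nest_unswitched(unsigned mask, int arr[M][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            if (mask & (1u << i))
                arr[j][i] = 0;
}

/* Interchanged nest: j outer, i inner, so arr[j][i] is walked along a row. */
void nest_interchanged(unsigned mask, int arr[M][N]) {
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < N; i++)
            if (mask & (1u << i))
                arr[j][i] = 0;
}

/* Run both nests on identical inputs and compare the results. */
void check_interchange(unsigned mask) {
    int x[M][N], y[M][N];
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < N; i++)
            x[j][i] = y[j][i] = 1;
    nest_unswitched(mask, x);
    nest_interchanged(mask, y);
    assert(memcmp(x, y, sizeof x) == 0);
}
```

In row-major C storage the interchanged version makes the inner loop touch consecutive elements, which is the kind of memory-access pattern loop interchange is meant to produce.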


Heuristics

Proactive loop fusion
- Maximize loop fusion.
- Large or unknown trip count loops.
- Loops on symmetric paths with same iteration spaces.
- Pre-check on transformation legality.

Proactive loop interchange
- Fully-permutable loop-nest.
- Memory reference iterates on inner loop’s dimension.
- Inner loop has large or unknown trip counts.
- Simply-nested if-regions.
- Pre-check on transformation legality.


Peak scores of libquantum

Binary                                                              Istanbul (1c)    Istanbul (12c)
Default peak                                                        52.8             174
Default peak + proactive loop fusion                                81.6 (1.55x)     459 (2.64x)
Default peak + proactive loop fusion + proactive loop interchange   58.8 (-28%)      632 (+38%)

System: AMD Istanbul, 2.4 GHz, 2 sockets, 6 cores/socket, 64 KB L1 instruction cache, 64 KB L1 data cache, 512 KB L2 cache, 6 MB/socket L3 cache, 32 GB DDR2-800 memory, SLES10 SP2.
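The annotations in the table can be reproduced from the raw scores. Judging from the arithmetic, the ratios in the fusion row are relative to the default peak, while the percentages in the last row appear to be relative to the fusion-only row; that reading is an inference, not something the slide states. The helper names below are hypothetical.

```c
#include <assert.h>

/* Ratio of a score to a baseline score, e.g. 81.6 / 52.8. */
double speedup(double base, double score) {
    assert(base > 0.0);
    return score / base;
}

/* Relative change of a score against a baseline, in percent. */
double percent_change(double base, double score) {
    assert(base > 0.0);
    return (score - base) / base * 100.0;
}
```

For example, speedup(52.8, 81.6) is about 1.545 and speedup(174, 459) about 2.638, matching 1.55x and 2.64x; percent_change(81.6, 58.8) is about -27.9 and percent_change(459, 632) about +37.7, matching -28% and +38%.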


Reference

Kit Barton (www.cs.ualberta.ca/~cbarton)

- Gather intervening code between loops using the dominance relation.
- Build a Data Dependence Graph of the intervening code.
- Use a schedule queue to identify movable nodes.


Barton’s Non-Adjacent loops example

Original code:

    while (i < N) {
        a += i;
        i++;
    }
    b := a * 2;
    c := b + 6;
    g := 0;
    h := g + 10;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    while (j < N) {
        f := g + 6;
        j++;
    }

Intervening code between the two loops:

    b := a * 2;
    c := b + 6;
    g := 0;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    h := g + 10;


Barton’s Non-Adjacent loops example (continued)

After moving the intervening code, the two loops are adjacent:

    g := 0;
    h := g + 10;
    while (i < N) {
        a += i;
        i++;
    }
    while (j < N) {
        f := g + 6;
        j++;
    }
    b := a * 2;
    c := b + 6;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;


Barton’s Pros & Cons

Pros
- Powerful full-fledged code motion.

Cons
- Loops must be control-flow equivalent.
- No finer granularity in if-regions.

