COMP 515: Advanced Compilation for Vector and Parallel Processors

Vivek Sarkar
Department of Computer Science, Rice University
http://www.cs.rice.edu/~vsarkar/comp515

COMP 515 Lecture 10, 22 September 2011

Acknowledgments

• Slides from previous offerings of COMP 515 by Prof. Ken Kennedy
  - http://www.cs.rice.edu/~ken/comp515/


Enhancing Fine-Grained Parallelism (contd)

Chapter 5 of Allen and Kennedy


Run-time Symbolic Resolution

• “Breaking Conditions”

DO I = 1, N
  A(I+L) = A(I) + B(I)
ENDDO

Transformed to..

IF (L.LE.0 .OR. L.GT.N) THEN
  A(L+1:N+L) = A(1:N) + B(1:N)
ELSE
  DO I = 1, N
    A(I+L) = A(I) + B(I)
  ENDDO
ENDIF
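The breaking condition can be checked with a small Python sketch. It is only an illustration: the dict-based `make_array` helper and the function names are hypothetical, and the "vector statement" is modeled by fetching all operands before storing any result.

```python
def make_array(lo, hi):
    """Hypothetical helper: a dict standing in for a Fortran array with bounds lo:hi."""
    return {k: float(k) for k in range(lo, hi + 1)}

def loop_original(a, b, n, l):
    """Sequential semantics of: DO I = 1, N ; A(I+L) = A(I) + B(I) ; ENDDO."""
    for i in range(1, n + 1):
        a[i + l] = a[i] + b[i]

def loop_guarded(a, b, n, l):
    """A run-time test on L selects the vector statement or the scalar loop."""
    if l <= 0 or l > n:
        # Vector statement A(L+1:N+L) = A(1:N) + B(1:N):
        # every operand is fetched before any element is stored.
        rhs = [a[i] + b[i] for i in range(1, n + 1)]
        for i, value in enumerate(rhs, start=1):
            a[i + l] = value
    else:
        loop_original(a, b, n, l)

def same_result(n, l):
    """Compare the guarded version against the original loop for one (N, L) pair."""
    b = make_array(1, n)
    x, y = make_array(-2 * n, 3 * n), make_array(-2 * n, 3 * n)
    loop_original(x, b, n, l)
    loop_guarded(y, b, n, l)
    return x == y
```

Calling `same_result(6, l)` for a range of negative, zero and positive `l` exercises both branches of the test.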


Run-time Symbolic Resolution

• Identifying the minimum number of breaking conditions needed to break a recurrence is NP-hard

• Heuristic:
  - Identify when a critical dependence can be conditionally eliminated via a breaking condition


Loop Skewing

• Reshape Iteration Space to uncover parallelism

DO I = 1, N
  DO J = 1, N
S:  A(I,J) = A(I-1,J) + A(I,J-1)
  ENDDO
ENDDO

Direction vectors of the dependences on S: (=,<) from A(I,J-1) and (<,=) from A(I-1,J)

Parallelism not apparent


Loop Skewing

• Dependence pattern before loop skewing

[Figure: iteration-space dependence pattern before loop skewing]


Loop Skewing

• Apply the following transformation, called loop skewing: jj = J+I, or equivalently J = jj-I

DO I = 1, N
  DO jj = I+1, I+N
    J = jj - I
S:  A(I,J) = A(I-1,J) + A(I,J-1)
  ENDDO
ENDDO

Direction vectors of the dependences on S after skewing: (=,<) and (<,<)

Note: the direction vectors change, but the statement body remains the same.

(Examples in the textbook usually copy-propagate J = jj-I into all uses of J.)
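The change in direction vectors can be worked out mechanically. The sketch below is an illustration only (the function names are hypothetical): it maps a dependence's source and sink iterations through jj = J + I and recomputes the distance.

```python
def skew(point):
    """Map an iteration (I, J) of the original nest to (I, jj) with jj = J + I."""
    i, j = point
    return (i, j + i)

def direction(distance):
    """Direction vector of a distance vector: '<' positive, '=' zero, '>' negative."""
    return tuple('<' if d > 0 else '=' if d == 0 else '>' for d in distance)

def skewed_distance(distance):
    """Distance between the skewed images of a source/sink pair `distance` apart."""
    src = (1, 1)  # any source iteration gives the same answer, since skewing is linear
    snk = (src[0] + distance[0], src[1] + distance[1])
    (si, sj), (ti, tj) = skew(src), skew(snk)
    return (ti - si, tj - sj)

# Dependences of S in the original nest, as distance vectors:
#   A(I,J-1) -> (0, 1), direction (=,<)
#   A(I-1,J) -> (1, 0), direction (<,=)
for dist in [(0, 1), (1, 0)]:
    print(direction(dist), "becomes", direction(skewed_distance(dist)))
```

The distance (d1, d2) becomes (d1, d1 + d2), which is why (<,=) turns into (<,<) while (=,<) is unchanged.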


Loop Skewing

• Dependence pattern after loop skewing

[Figure: iteration-space dependence pattern after loop skewing]


Loop Skewing

DO I = 1, N                       ! DV = { (<,<), (=,<) }
  DO j = I+1, I+N
S:  A(I,j-I) = A(I-1,j-I) + A(I,j-I-1)
  ENDDO
ENDDO

Loop interchange to..

DO j = 2, N+N                     ! DV = { (<,<), (<,=) }
  DO I = max(1,j-N), min(N,j-1)
S:  A(I,j-I) = A(I-1,j-I) + A(I,j-I-1)
  ENDDO
ENDDO

Vectorize to..

DO j = 2, N+N
  FORALL I = max(1,j-N), min(N,j-1)
S:  A(I,j-I) = A(I-1,j-I) + A(I,j-I-1)
  END FORALL
ENDDO
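As a sanity check on the skewed and interchanged bounds, the following Python sketch compares the original nest with the wavefront version. It assumes a hypothetical `make_grid` helper that supplies boundary row 0 and column 0, and it models FORALL by evaluating all right-hand sides before storing.

```python
def make_grid(n):
    """Hypothetical test data: (n+1) x (n+1) grid with ones on row 0 and column 0."""
    return [[1 if i == 0 or j == 0 else 0 for j in range(n + 1)] for i in range(n + 1)]

def sweep_original(a, n):
    """Original nest: DO I / DO J with A(I,J) = A(I-1,J) + A(I,J-1)."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]

def sweep_wavefront(a, n):
    """Skewed and interchanged nest: sequential outer jj, vector inner I."""
    for jj in range(2, 2 * n + 1):
        lo, hi = max(1, jj - n), min(n, jj - 1)
        # FORALL semantics: evaluate every right-hand side, then store.
        rhs = [a[i - 1][jj - i] + a[i][jj - i - 1] for i in range(lo, hi + 1)]
        for i, value in zip(range(lo, hi + 1), rhs):
            a[i][jj - i] = value

g1, g2 = make_grid(5), make_grid(5)
sweep_original(g1, 5)
sweep_wavefront(g2, 5)
print(g1 == g2)
```

Each value of jj is one anti-diagonal of the original iteration space; all points on it depend only on the previous two diagonals, so the inner loop carries no dependence.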


Loop Skewing

• Disadvantages:
  - Varying vector length
    – Not profitable if N is small
  - If the vector startup overhead exceeds the time saved by vectorization, skewing is not profitable
  - Vector bounds must be recomputed on each iteration of the outer loop

• Apply loop skewing only if everything else fails

• See Unimodular transformations in Sarkar-Thekkath-92 paper (and related work) for generalization of loop skewing


Chapter 5: Putting It All Together

• Good Part
  - Many transformations imply more choices to exploit parallelism

• Bad Part
  - Choosing the right transformation
  - How to automate the transformation selection process?
  - Interference between transformations


Putting It All Together

• Example of Interference

DO I = 1, N
  DO J = 1, M
    S(I) = S(I) + A(I,J)
  ENDDO
ENDDO

Sum Reduction gives..

DO I = 1, N
  S(I) = S(I) + SUM (A(I,1:M))
ENDDO

While Loop Interchange and Vectorization gives..

DO J = 1, M
  S(1:N) = S(1:N) + A(1:N,J)
ENDDO
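The two competing forms compute the same values but vectorize along different loops. A minimal Python sketch, with hypothetical function names and integer data so that reassociating the additions cannot change the result:

```python
def sums_by_reduction(s, a):
    """Outer loop over I; the inner J loop becomes a sum reduction per row."""
    return [s[i] + sum(a[i]) for i in range(len(s))]

def sums_by_interchange(s, a):
    """Outer loop over J; each step is an elementwise (vector) update over I."""
    out = list(s)
    for j in range(len(a[0])):
        out = [out[i] + a[i][j] for i in range(len(out))]
    return out

s = [1, 2]
a = [[1, 2, 3],
     [4, 5, 6]]
print(sums_by_reduction(s, a), sums_by_interchange(s, a))
```

Applying the reduction first removes the J loop that interchange would need, and vice versa, which is the interference the slide refers to.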


Putting It All Together

• Any algorithm that tries to tie all the transformations together must
  - Take a global view of the transformed code
  - Know the architecture of the target machine

• Goal of our algorithm
  - Finding ONE good vector loop in each loop nest [works well for most vector register architectures]


Unified Framework

• Detection: finding ALL loops for EACH statement that can be run in vector

• Selection: choosing best loop for vector execution for EACH statement

• Transformation: carrying out the transformations necessary to vectorize the selected loop

• See Section 5.10 for details


Performance on Benchmarks

PFC = Parallel Fortran Converter tool developed at Rice by Allen & Kennedy

Test 171: One example that PFC was unable to vectorize

DO I = 1, N
  A(I*N) = A(I*N) + B(I)
ENDDO
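The slide does not say why PFC failed here. One plausible reason, stated only as an assumption, is that the subscript I*N is a product of two variables rather than a linear expression with known coefficients. For N >= 1 the subscripts are nevertheless pairwise distinct, so no iteration touches an element used by another; the hypothetical check below confirms this for small N.

```python
def subscripts(n):
    """Subscripts I*N touched by the loop DO I = 1, N."""
    return [i * n for i in range(1, n + 1)]

def has_carried_dependence(n):
    """True if two different iterations access the same element of A."""
    idx = subscripts(n)
    return len(set(idx)) != len(idx)

print(any(has_carried_dependence(n) for n in range(1, 50)))
```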


Coarse-Grain Parallelism

Chapter 6 of Allen and Kennedy


Introduction

• Previously, our transformations targeted vector and superscalar architectures.

• In this lecture, we worry about transformations for symmetric multiprocessor machines.

• The difference between these transformations tends to be one of granularity.


Review

• SMP machines have multiple processors all accessing a central memory.

• The processors are unrelated, and can run separate processes.

• Starting processes and synchronization between processes are expensive.

[Figure: processors p1, p2, p3 and p4 connected by a bus to a shared memory]


Synchronization

• A basic synchronization element is the barrier at the end of a parallel loop.

• A barrier in a program forces all processes to reach a certain point before execution continues.

• Bus contention can cause slowdowns.
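A barrier can be illustrated with Python's standard `threading.Barrier`. This is a sketch only: the two-phase worker and the function name are hypothetical, and Python threads stand in for the processes of an SMP machine.

```python
import threading

def run_two_phase(num_workers=4):
    """Each worker writes its slot, waits at the barrier, then reads all slots."""
    data = [0] * num_workers
    totals = [0] * num_workers
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        data[rank] = rank + 1       # phase 1: like one iteration of a parallel loop
        barrier.wait()              # nobody continues until every worker has arrived
        totals[rank] = sum(data)    # phase 2: safe to read everyone's phase-1 result

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals

print(run_two_phase())
```

Without the `barrier.wait()` call, a fast worker could compute its total before slower workers had written their slots.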


Techniques for parallelizing a single loop

• Single loop methods
  - Privatization
  - Loop distribution
  - Loop fusion
  - Alignment
  - Code replication


Single Loops

• The analog of scalar expansion is privatization.

• Temporaries can be given separate namespaces for each iteration.

DO I = 1,N
S1  T = A(I)
S2  A(I) = B(I)
S3  B(I) = T
ENDDO

PARALLEL DO I = 1,N
  PRIVATE t
S1  t = A(I)
S2  A(I) = B(I)
S3  B(I) = t
ENDDO
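In a language with local variables, privatization amounts to making the temporary local to the loop body. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for PARALLEL DO; the function name and worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def swap_arrays_parallel(a, b, workers=4):
    """Swap the contents of a and b, one loop iteration per task."""
    def body(i):
        t = a[i]        # t is local to this call: one private copy per iteration
        a[i] = b[i]
        b[i] = t

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(len(a))))   # list() waits for all iterations

a, b = [1, 2, 3], [4, 5, 6]
swap_arrays_parallel(a, b)
print(a, b)
```

If `t` were a single variable shared by all iterations, as the scalar T is in the sequential loop, concurrent iterations could overwrite each other's saved value.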


Privatization

Definition: A scalar variable x in a loop L is said to be privatizable if every path from the loop entry to a use of x inside the loop passes through a definition of x.

Privatizability can be stated as a data-flow problem:

$$\mathit{up}(x) = \mathit{use}(x) \cup \Bigl(\neg\,\mathit{def}(x) \cap \bigcup_{y \in \mathit{succ}(x)} \mathit{up}(y)\Bigr)$$

$$\mathit{private}(L) = \neg\,\mathit{up}(\mathit{entry}) \cap \Bigl(\bigcup_{y \in L} \mathit{def}(y)\Bigr)$$

We can also do this by declaring a variable x private if its SSA graph doesn't contain a phi function at the entry.
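The data-flow formulation can be sketched as a small fixed-point iteration. Everything concrete here is an assumption made for illustration: the block names, the straight-line loop body (the swap loop plus a hypothetical accumulation statement S4), and the convention that `use` holds only variables read before being written within a block.

```python
def upward_exposed(blocks, succ):
    """Solve up(x) = use(x) | (union of up(y) over successors y, minus def(x))."""
    up = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (use, defs) in blocks.items():
            from_succ = set().union(*(up[s] for s in succ[b]))
            new = use | (from_succ - defs)
            if new != up[b]:
                up[b], changed = new, True
    return up

def privatizable(blocks, succ, entry):
    """Variables defined somewhere in the body but not upward exposed at its entry."""
    all_defs = set().union(*(defs for _, defs in blocks.values()))
    return all_defs - upward_exposed(blocks, succ)[entry]

# Scalars only. Body: S1 T = A(I); S2 A(I) = B(I); S3 B(I) = T;
# S4 ACC = ACC + B(I) is a hypothetical extra statement that is NOT privatizable.
BODY = {
    "entry": (set(), set()),
    "S1": (set(), {"T"}),
    "S2": (set(), set()),
    "S3": ({"T"}, set()),
    "S4": ({"ACC"}, {"ACC"}),
}
SUCC = {"entry": ["S1"], "S1": ["S2"], "S2": ["S3"], "S3": ["S4"], "S4": []}

print(privatizable(BODY, SUCC, "entry"))
```

T is reported as privatizable because every use is preceded by a definition in the same iteration; ACC is not, because its first access in the body is a read.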


Example of Privatizable Scalar Variable

• “Method of, system for, and computer program product for efficient identification of private variables in program loops by an optimizing compiler”, US Patent 5,790,859, issued Aug 1998.


Array Privatization

We need to privatize array variables.

For iteration J, the upward-exposed variables are those exposed by the loop body, minus the variables defined in earlier iterations.

   DO I = 1,100
S0   T(1) = X
L1   DO J = 2,N
S1     T(J) = T(J-1)+B(I,J)
S2     A(I,J) = T(J)
     ENDDO
   ENDDO

$$\mathit{up}(L_1) = \bigcup_{J=2}^{N} \Bigl(\{T(J-1)\} \setminus \{T(n) : 2 \le n \le J\}\Bigr)$$

So for this fragment, T(1) is the only exposed variable.
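The set expression for up(L1) can be evaluated directly for a concrete N. This sketch follows the set {T(n) : 2 <= n <= J} from the slide's formula; the function names are hypothetical, and elements of T are represented by their integer subscripts.

```python
def upward_exposed_inner(n):
    """Subscripts of T that are upward exposed by loop L1 (J = 2..N)."""
    exposed = set()
    for j in range(2, n + 1):
        used = {j - 1}                    # S1 reads T(J-1)
        defined = set(range(2, j + 1))    # T(2)..T(J), as in the slide's formula
        exposed |= used - defined
    return exposed

def upward_exposed_outer_body(n):
    """S0 defines T(1) before L1 runs, so remove it from what L1 exposes."""
    return upward_exposed_inner(n) - {1}

print(upward_exposed_inner(10), upward_exposed_outer_body(10))
```

Because nothing in T is upward exposed at the top of the outer loop body, each iteration of the I loop can work on its own copy of T.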


Array Privatization

• Using this analysis, we get the following code:

   PARALLEL DO I = 1,100
     PRIVATE t(N)
S0   t(1) = X
L1   DO J = 2,N
S1     t(J) = t(J-1)+B(I,J)
S2     A(I,J) = t(J)
     ENDDO
   ENDDO


Loop Distribution

• Loop distribution can convert loop-carried dependences to loop-independent dependences.

• Consequently, it often creates opportunity for outer-loop parallelism.

• However, we must add extra barriers to keep the distributed loops from executing out of order, so the overhead may outweigh the savings from parallel execution.
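A small Python sketch of the idea, using a hypothetical two-statement loop chosen for illustration (it is not taken from this slide). In the fused loop the second statement reads a value written one iteration earlier; after distribution each loop is free of carried dependences.

```python
def loop_fused(a, b, c, d):
    """Single loop: d[i] depends on a[i-1], written by the previous iteration."""
    for i in range(1, len(a)):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0

def loop_distributed(a, b, c, d):
    """Two loops: the carried dependence now crosses from loop 1 to loop 2."""
    for i in range(1, len(a)):      # loop 1: iterations are independent
        a[i] = b[i] + c[i]
    # In a parallel version a barrier belongs here, between the two loops.
    for i in range(1, len(a)):      # loop 2: iterations are independent
        d[i] = a[i - 1] * 2.0

def run(version, n=6):
    """Hypothetical driver with fixed test data."""
    a = [float(k) for k in range(n)]
    b = [10.0 * k for k in range(n)]
    c = [0.5 * k for k in range(n)]
    d = [0.0] * n
    version(a, b, c, d)
    return a, d

print(run(loop_fused) == run(loop_distributed))
```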


Loop Alignment

• Many carried dependences are due to array alignment issues.

• If we can align all references, then the dependences go away, and parallelism is possible.

DO I = 2,N
  A(I) = B(I)+C(I)
  D(I) = A(I-1)*2.0
ENDDO

DO I = 1,N              ! Aligned loop
  IF (I .GT. 1) A(I) = B(I)+C(I)
  IF (I .LT. N) D(I+1) = A(I)*2.0
ENDDO

• This is also related to Software Pipelining

D(2) = A(1)*2.0
DO I = 2,N-1            ! Pipelined loop
  A(I) = B(I)+C(I)
  D(I+1) = A(I)*2.0
ENDDO
A(N) = B(N)+C(N)
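The three versions of the loop can be compared in Python. This is a sketch under stated assumptions: arrays are lists indexed 1..N with slot 0 unused, the `run` driver and its test data are hypothetical, and N >= 2 so that D(2) exists.

```python
def loop_original(a, b, c, d, n):
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0          # carried dependence on A, distance 1

def loop_aligned(a, b, c, d, n):
    for i in range(1, n + 1):
        if i > 1:
            a[i] = b[i] + c[i]
        if i < n:
            d[i + 1] = a[i] * 2.0      # now reads the A(I) of the same iteration

def loop_pipelined(a, b, c, d, n):
    d[2] = a[1] * 2.0                  # peeled first iteration
    for i in range(2, n):
        a[i] = b[i] + c[i]
        d[i + 1] = a[i] * 2.0
    a[n] = b[n] + c[n]                 # peeled last iteration

def run(version, n):
    """Hypothetical driver: fixed inputs, returns the final A and D."""
    a = [float(k) for k in range(n + 1)]
    b = [10.0 * k for k in range(n + 1)]
    c = [0.5 * k for k in range(n + 1)]
    d = [0.0] * (n + 1)
    version(a, b, c, d, n)
    return a, d

print(run(loop_original, 6) == run(loop_aligned, 6) == run(loop_pipelined, 6))
```

The aligned loop pays for the removed dependence with two conditionals per iteration; peeling the first and last iterations removes them again.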


Alignment

• There are other ways to align the loop:

DO I = 2,N
  J = MOD(I+N-4,N-1)+2
  A(J) = B(J)+C(J)
  D(I) = A(I-1)*2.0
ENDDO

D(2) = A(1)*2.0
DO I = 2,N-1
  A(I) = B(I)+C(I)
  D(I+1) = A(I)*2.0
ENDDO
A(N) = B(N)+C(N)
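To see what the MOD expression does, it helps to list J for each I: the first iteration handles J = N and every later iteration handles J = I-1. The sketch below checks this and compares the result with the original loop. It assumes C is an array indexed like B, as in the original loop on the Loop Alignment slide, and uses a hypothetical `run` driver with lists indexed 1..N.

```python
def loop_original(a, b, c, d, n):
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0

def loop_mod_aligned(a, b, c, d, n):
    for i in range(2, n + 1):
        j = (i + n - 4) % (n - 1) + 2   # J = N when I = 2, otherwise J = I-1
        a[j] = b[j] + c[j]
        d[i] = a[i - 1] * 2.0           # for I >= 3 this A(I-1) was just computed

def run(version, n):
    """Hypothetical driver: fixed inputs, returns the final A and D."""
    a = [float(k) for k in range(n + 1)]
    b = [10.0 * k for k in range(n + 1)]
    c = [0.5 * k for k in range(n + 1)]
    d = [0.0] * (n + 1)
    version(a, b, c, d, n)
    return a, d

n = 5
print([(i + n - 4) % (n - 1) + 2 for i in range(2, n + 1)])
print(run(loop_original, n) == run(loop_mod_aligned, n))
```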


Code Replication

• If an array is involved in a recurrence, then alignment isn't possible.

• If two dependences between the same statements have different dependence distances, then alignment doesn't work.

• We can fix the second case by replicating code:

DO I = 1,N
  A(I+1) = B(I)+C
  X(I) = A(I+1)+A(I)
ENDDO

DO I = 1,N
  A(I+1) = B(I)+C
  ! Replicated Statement
  IF (I .EQ. 1) THEN
    t = A(I)
  ELSE
    t = B(I-1)+C
  END IF
  X(I) = A(I+1)+t
ENDDO
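A quick Python check that recomputing the value is equivalent to reading it. The driver and data are hypothetical; arrays are lists indexed from 1, with A sized N+1 and C a scalar as on the slide.

```python
def loop_original(a, b, c, n):
    x = [0.0] * (n + 1)
    for i in range(1, n + 1):
        a[i + 1] = b[i] + c
        x[i] = a[i + 1] + a[i]       # A(I) was written by the previous iteration
    return x

def loop_replicated(a, b, c, n):
    x = [0.0] * (n + 1)
    for i in range(1, n + 1):
        a[i + 1] = b[i] + c
        # Replicated statement: recompute A(I) instead of reading it.
        t = a[i] if i == 1 else b[i - 1] + c
        x[i] = a[i + 1] + t
    return x

def run(version, n):
    """Hypothetical driver: fixed inputs, returns X and the final A."""
    a = [float(k) for k in range(n + 2)]
    b = [10.0 * k for k in range(n + 1)]
    x = version(a, b, 0.5, n)
    return x, a

print(run(loop_original, 6) == run(loop_replicated, 6))
```

The replicated version no longer reads any element of A written by another iteration, at the cost of computing B(I-1)+C twice.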


COMP 515, Fall 2011 (V. Sarkar)

REMINDER: Homework #4 (Written Assignment)

1. Solve exercise 5.6 in book
  - Your solution should be legal for all values of K (note that the value of K is invariant in loop I)

• Due by 5pm on Wednesday, Sep 28th

• Homework should be turned in to Amanda Nokleby, Duncan Hall 3137

• Honor Code Policy: All submitted homeworks are expected to be the result of your individual effort. You are free to discuss course material and approaches to problems with your other classmates, the teaching assistants and the professor, but you should never misrepresent someone else’s work as your own. If you use any material from external sources, you must provide proper attribution.

