A Language for the Compact Representation of Multiple Program Versions

Sébastien Donadio (1,2), James Brodman (3), Thomas Roeder (4), Kamen Yotov (4), Denis Barthou (2), Albert Cohen (5), María Jesús Garzarán (3), David Padua (3), and Keshav Pingali (4)

1 BULL S.A.
2 University of Versailles
3 University of Illinois at Urbana-Champaign
4 Cornell University
5 INRIA Futurs

International Workshop LCPC 2005


Outline

Context in optimization for high performance
Goals of this language
Features of this language
Examples (Daxpy & Dgemm)
Conclusion


Context

Complex architectures and fragile optimizations lead to unpredictable performance
Architecture- and domain-specific optimizations:
Resort to empirical search
Complement general-purpose optimizations with user-driven ones


Example: FFT performance

[Figure: FFT performance of reasonable implementations (Numerical Recipes, GNU Scientific Library) compared with the best available implementations (FFTW, Intel IPP, Spiral).]


Goals of X-Language

A tool to help programmers generate and evaluate multiple versions of their programs:
Applying control and data structure transformations
Trying multiple transformation sequences and parameters
Evaluating the performance of each version and making decisions about which transformation variants to try


Goals of X-Language (cont.)

The code must be portable across ISO C compilers:
Use #pragma annotations for the above tasks
Observable program semantics are not altered by the interpretation of these pragmas (assuming the transformations are legal)


Comparison with related work

[Figure: related systems (Spiral, Atlas, Tick C, Reflection Compiler, XLG, X-Language) classified along two axes, Transformation and Generation, with the categories black box, manual, domain specific, and general purpose.]


Features of the language

Elementary transformations (fission, strip mining, interchange, unrolling, …)
Composition of transformations
Conditional transformations (versioning)
Procedural abstraction of transformations
A mechanism to define new transformations
No validity check is performed on the transformations


General schema of X-Language

[Figure: general schema. Code with pragmas and transformation descriptions are compiled; the resulting program is executed and its performance measured; a search explores the different versions.]


X-Language

Naming loops or scopes:
#pragma xlang name loop1
for(i=0;i<10;i++) {a[i]=4;}

Format of a transformation:
#pragma xlang stripmine loop1 4 ii
Here stripmine is the transformation name, loop1 is the name of the loop it applies to, 4 is the parameter, and ii is the name given to the additional loop generated by the transformation.


Elementary transformations implemented in X-Language:
Full unrolling
Partial unrolling
Scalar promotion
Interchange
Loop fission
Loop fusion
Strip mining
Lifting
Software pipelining


Applying a transformation

Original loop:
#pragma xlang name loop1
for(i=min;i<4*max;i++)
  a[i]=b[i];

Transformation:
#pragma xlang stripmine loop1 4 ii

Result:
#pragma xlang name loop1
for(i=min;i<4*max;i+=4) {
  int nl1;
  #pragma xlang name ii
  for(nl1=0;nl1<4;nl1++)
    a[i+nl1]=b[i+nl1];
}


How to search for the values of parameters?
Using multistage evaluation
Using an external script

for(k=1;k<16;k=2*k)
‘{
#pragma xlang name loop1
for(i=min;i<max;i++)
  a[i]=b[i];
#pragma xlang stripmine loop1 ‘d(k) ii
‘}


Composing transformations

Original nest:
#pragma xlang name loop1
for(i=0;i<4;i++)
  #pragma xlang name loop2
  for(j=min2;j<max2;j++)
    a[i]=b[j];

Transformations:
#pragma xlang interchange loop1 loop2
#pragma xlang fullunroll loop1

Result:
#pragma xlang name loop2
for(j=min2;j<max2;j++)
{
  a[0]=b[j];
  a[1]=b[j];
  a[2]=b[j];
  a[3]=b[j];
}


Analyses and Transformations

Static analyses should also enable the design of smarter (higher-level) transformation primitives

An external tool finds this information


Example with analysis

Original code:
for(i=2;i<2*N;i+=2)
{
  u[i]=u[i-1]+u[i-2];
  u[i+1]=u[i]+u[i-1];
}

Without interference graph:
for(i=2;i<2*N;i+=2)
{
  u_1=u[i-1];
  u_2=u[i-2];
  u_0 = u_1 + u_2;
  u_1 = u_0 + u_1;
  u[i]=u_0;
  u[i+1]=u_1;
}

With interference graph:
u_0=u[0];
u_1=u[1];
for(i=2;i<2*N;i+=2)
{
  u_0 = u_1 + u_0;
  u_1 = u_0 + u_1;
  u[i]=u_0;
  u[i+1]=u_1;
}


Extending the X-Language

Rewriting rule (the pattern before the transformation, then the pattern after it):

#pragma xlang name iloop
for (i = 0; i < N; i++)
{<body>}
%
#pragma xlang name iiloop1
for (ii = 0; ii < (N/4)*4; ii += 4)
  #pragma xlang name iloop1
  for (i = ii; i < ii+4; i++)
  {<body>}
#pragma xlang name iloop2
for (i = (N/4)*4; i < N; i++)
{<body>}
%%


Daxpy Example

#pragma xlang name loop1
for(k=0;k<2000;k++)
  Y[k]=alpha*X[k]+Y[k];

We can modify the value of N:
/** A few values tested for the unrolling factor: different generated versions **/
#pragma xlang transform stripmine loop1 k N;
#pragma xlang transform scalarize-in X in loop1
#pragma xlang transform lift l1.loads before loop1
#pragma xlang transform scalarize-out Y in loop1
#pragma xlang transform lift loop1.loads before loop1
#pragma xlang transform lift loop1.stores after loop1
#pragma xlang transform fullunroll loop1.loads
#pragma xlang transform fullunroll loop1.stores
#pragma xlang transform fullunroll loop1


Daxpy Example: Different generated versions

Unrolling factor: 2
for(k=0;k<2000;k=k+2){
  double x_0 = X[k+0];
  double x_1 = X[k+1];
  double y_0 = Y[k+0];
  double y_1 = Y[k+1];
  y_0=alpha*x_0+y_0;
  y_1=alpha*x_1+y_1;
  Y[k+0] = y_0;
  Y[k+1] = y_1;
}

Unrolling factor: 4
for(k=0;k<2000;k=k+4){
  double x_0 = X[k+0];
  double x_1 = X[k+1];
  double x_2 = X[k+2];
  double x_3 = X[k+3];
  double y_0 = Y[k+0];
  double y_1 = Y[k+1];
  double y_2 = Y[k+2];
  double y_3 = Y[k+3];
  y_0=alpha*x_0+y_0;
  y_1=alpha*x_1+y_1;
  y_2=alpha*x_2+y_2;
  y_3=alpha*x_3+y_3;
  Y[k+0] = y_0;
  Y[k+1] = y_1;
  Y[k+2] = y_2;
  Y[k+3] = y_3;
}

Unrolling factor: 8
for(k=0;k<2000;k=k+8){
  double x_0 = X[k+0];
  double x_1 = X[k+1];
  double x_2 = X[k+2];
  /* ... loads of x_3 to x_7 and y_0 to y_7 ... */
  y_0=alpha*x_0+y_0;
  y_1=alpha*x_1+y_1;
  y_2=alpha*x_2+y_2;
  y_3=alpha*x_3+y_3;
  /* ... updates of y_4 to y_7 ... */
  Y[k+0] = y_0;
  Y[k+1] = y_1;
  Y[k+2] = y_2;
  Y[k+3] = y_3;
  /* ... stores of y_4 to y_7 ... */
}


Matrix Multiply (Loop Declaration)

The DGEMM example: matrix multiplication
Problems:
Data locality
Scheduling

#pragma xlang name iloop
for (i = 0; i < NB; i++)
  #pragma xlang name jloop
  for (j = 0; j < NB; j++)
    #pragma xlang name kloop
    for (k = 0; k < NB; k++) {
      c[i][j]=c[i][j]+a[i][k]*b[k][j];
    }


Matrix Multiply (Transformation Declaration)

Sequence of transformations for Itanium:
#pragma xlang transform stripmine iloop NU NUloop
#pragma xlang transform stripmine jloop MU MUloop
#pragma xlang transform interchange kloop MUloop
#pragma xlang transform interchange jloop NUloop
#pragma xlang transform interchange kloop NUloop
#pragma xlang transform fullunroll NUloop
#pragma xlang transform fullunroll MUloop
#pragma xlang transform scalarize_in b in kloop
#pragma xlang transform scalarize_in a in kloop
#pragma xlang transform scalarize_in&out c in kloop
#pragma xlang transform lift kloop.loads before kloop
#pragma xlang transform lift kloop.stores after kloop


Matrix Multiply (Transformation Sequence)

#pragma xlang name iloop
for(i = 0; i < NB; i++){
  #pragma xlang name jloop
  for(j = 0; j < NB; j += 4){
    #pragma xlang name kloop.loads
    {c_0_0 = c[i+0][j+0]; c_0_1 = c[i+0][j+1]; c_0_2 = c[i+0][j+2]; c_0_3 = c[i+0][j+3];}
    #pragma xlang name kloop
    for(k = 0; k < NB; k++){
      {a_0 = a[i+0][k]; a_1 = a[i+0][k]; a_2 = a[i+0][k]; a_3 = a[i+0][k];}
      {b_0 = b[k][j+0]; b_1 = b[k][j+1]; b_2 = b[k][j+2]; b_3 = b[k][j+3];}
      {c_0_0=c_0_0+a_0*b_0; c_0_1=c_0_1+a_1*b_1; c_0_2=c_0_2+a_2*b_2; c_0_3=c_0_3+a_3*b_3;}
      ...
    }
    #pragma xlang name kloop.stores
    {c[i+0][j+0] = c_0_0; c[i+0][j+1] = c_0_1; c[i+0][j+2] = c_0_2; c[i+0][j+3] = c_0_3;}
  }
}
... // Remainder code


Block copies

Block matrix multiplication performs better if the matrices are contiguous in memory (fewer TLB misses)

Poor performance of a plain C copy
Resort to a tool generating specific asm code
A tool generating good code through search (XLG performs a search over asm code)


Matrix Multiply (Results)

[Figure: DGEMM on Itanium 2. Performance (MFlops) versus matrix size (128 to 2048) for Atlas, XLanguage+XLG, XLanguage+Memcopy, XLanguage+MKL, and peak.]


Conclusion

Describe transformations with reuse, procedures, and conditionals

X-Language: a language designed to generate multiversion programs
A multistage language with a flexible pattern-matching and rewriting language
Experts can describe application-specific transformations and optimizations


Future work

Dependence analysis
Going further in searching asm code transformations
More transformations: vectorization, alignment, …

