YaCF: The accULL Compiler
Undergraduate Thesis Project

Juan Jose Fumero Alfonso
Universidad de La Laguna

June 22, 2012
Outline
1 Introduction
2 YaCF
3 Experiments
4 Conclusions
5 Future Work
Moore’s Law
The number of transistors on a chip doubles approximately every 18 months.
Parallel Architectures Today
Parallel Architectures
The solution
• More processors
• More cores per processor
Parallel Architectures
Modern systems are hybrid, combining all of these options.
OpenMP: Shared Memory Programming
• An API that supports shared-memory (SMP) programming.
• Multi-platform.
• A directive-based approach.
• A set of compiler directives, library routines and environment variables for parallel programming.
OpenMP example
#pragma omp parallel
{
    #pragma omp master
    {
        nthreads = omp_get_num_threads();
    }
    #pragma omp for private(x) reduction(+:sum) schedule(runtime)
    for (i = 0; i < NUM_STEPS; ++i) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }
    #pragma omp master
    {
        pi = step * sum;
    }
}
MPI: Message Passing Interface
• A language-independent communications protocol used to program parallel applications.
• MPI's goals are high performance, scalability and portability.
MPI example
MPI_Comm_size(MPI_COMM_WORLD, &MPI_NUMPROCESSORS);
MPI_Comm_rank(MPI_COMM_WORLD, &MPI_NAME);
w = 1.0 / N;
for (i = MPI_NAME; i < N; i += MPI_NUMPROCESSORS) {
    local = (i + 0.5) * w;
    pi_mpi = pi_mpi + 4.0 / (1.0 + local * local);
}
MPI_Allreduce(&pi_mpi, &gpi_mpi, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
High Performance Computing
• The most powerful computers available today.
• Systems with a massive number of processors, containing thousands of cores.
• Very high calculation speed.
• Very expensive systems that consume a huge amount of energy.
TOP500: High Performance Computing
• The TOP500 project ranks and details the 500 most powerful (non-distributed) computer systems in the world.
• The project publishes an updated list of the supercomputers twice a year.
Accelerators Era
Languages for Heterogeneous Programming

CUDA
Developed by NVIDIA.
• Pros: high performance; easier to use than OpenCL.
• Con: works only with NVIDIA hardware.
CUDA example
__global__ void mmkernel(float *a, float *b, float *c,
                         int n, int m, int p)
{
    int i = blockIdx.x * 32 + threadIdx.x;
    int j = blockIdx.y;
    float sum = 0.0f;
    for (int k = 0; k < p; ++k)
        sum += b[i + n * k] * c[k + p * j];
    a[i + n * j] = sum;
}
Languages for Heterogeneous Programming
OpenCL
A framework developed by the Khronos Group.
• Pros: can be used with any device; it is a standard.
• Cons: more complex than CUDA; still immature.
OpenCL example
__kernel void matvecmul(__global float *a,
                        const __global float *b, const __global float *c,
                        const uint N) {
    float R;
    int k;
    int xid = get_global_id(0);
    int yid = get_global_id(1);
    if (xid < N) {
        if (yid < N) {
            R = 0.0;
            for (k = 0; k < N; k++)
                R += b[xid * N + k] * c[k * N + yid];
            a[xid * N + yid] = R;
        }
    }
}
Languages for Heterogeneous Programming
Pros
1 The programmer can use all of the machine's devices.
2 The GPU and CPU can work in parallel.
Languages for Heterogeneous Programming
Cons
1 The programmer needs to know low-level details of the architecture.
2 Source code needs to be rewritten:
  • One version for OpenMP/MPI.
  • A different version for the GPU.
3 Good performance requires a great effort in parameter tuning.
4 These languages (CUDA/OpenCL) are complex and new for non-experts.
GPGPU (General-Purpose GPU) Computing

Can we use GPUs for parallel computing? Is this efficient?
The NBody Problem
• The simulation numerically approximates the evolution of a system of bodies.
• Each body continuously interacts with all the other bodies.
• Example application: fluid flow simulations.
NBody description
Acceleration

$$a_i = \frac{F_i}{m_i}$$

$$a_i \approx G \cdot \sum_{1 \le j \le N} \frac{m_j \, r_{ij}}{\left(\lVert r_{ij} \rVert^2 + \varepsilon^2\right)^{3/2}}$$
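For illustration, a minimal C sketch of this softened pairwise interaction; the helper type and names here are my own, not thesis code (the CUDA kernels below use an analogous bodyBodyInteraction device function):

#include <math.h>

typedef struct { float x, y, z; } vec3;

#define G    6.674e-11f  /* gravitational constant */
#define EPS2 1e-9f       /* softening factor epsilon^2 */

/* Accumulate onto `acc` the acceleration that body j (mass mj, position pj)
   induces on body i (position pi), using the softened formula above. */
vec3 body_body_interaction(vec3 pi, vec3 pj, float mj, vec3 acc)
{
    vec3 r = { pj.x - pi.x, pj.y - pi.y, pj.z - pi.z };     /* r_ij */
    float dist2 = r.x * r.x + r.y * r.y + r.z * r.z + EPS2; /* ||r||^2 + eps^2 */
    float inv_dist3 = 1.0f / sqrtf(dist2 * dist2 * dist2);  /* (...)^(-3/2) */
    float s = G * mj * inv_dist3;
    acc.x += r.x * s;
    acc.y += r.y * s;
    acc.z += r.z * s;
    return acc;
}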
CUDA implementation
• The method is Particle-to-Particle.
• Its computational complexity is O(n²).
• It evaluates all pair-wise interactions, so it is exact.
CUDA implementation: blocks and grids
CUDA Kernel: Tile calculation
__device__ float3 tile_calculation(float4 myPos, float3 accel)
{
    extern __shared__ float4 sharedPos[];
    unsigned long i = 0;

    for (unsigned int counter = 0; counter < blockDim.x; counter++)
    {
        /* SX(i) indexes the shared-memory position buffer (NBody SDK macro). */
        accel = bodyBodyInteraction(accel, SX(i++), myPos);
    }
    return accel;
}
CUDA Kernel: calculate forces
__global__ void calculate_forces(float4 *globalX, float4 *globalA)
{
    /* A shared memory buffer to store the body positions. */
    extern __shared__ float4 shPosition[];
    int i, tile;
    float3 acc = {0.0f, 0.0f, 0.0f};
    /* Global thread ID (the unique body index in the simulation). */
    int gtid = blockIdx.x * blockDim.x + threadIdx.x;
    /* The position of the body we are computing the acceleration for. */
    float4 myPosition = globalX[gtid];
    /* N: total number of bodies (assumed defined elsewhere). */
    for (i = 0, tile = 0; i < N; i += blockDim.x, tile++)
    {
        int idx = tile * blockDim.x + threadIdx.x;
        shPosition[threadIdx.x] = globalX[idx];
        __syncthreads();
        acc = tile_calculation(myPosition, acc);
        __syncthreads();
    }
    /* Save the accumulated acceleration in global memory. */
    float4 acc4 = {acc.x, acc.y, acc.z, 0.0f};
    globalA[gtid] = acc4;
}
Results

• Tesla C1060 (compute capability 1.3).
• Sequential source code: Intel Core i7 930.
• NBody SDK.
• CUDA Runtime / CUDA Driver: 4.0.
• 400000 bodies.
• 200 iterations.

Device         Cores  Memory  Performance (GFLOPS)
Tesla C1060    240    4 GB    933 (single), 78 (double)
Intel Core i7  4      4 GB    44.8 (11.2 per core)
• Sequential code: ≈ 147202512.40 ms ≈ 41 hours.
• Parallel CUDA code: 1392029.6 ms ≈ 23.2 minutes.
• The speedup is about 105×.
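The reported speedup follows directly from the two timings:

$$S = \frac{t_{\text{seq}}}{t_{\text{CUDA}}} = \frac{147202512.4\ \text{ms}}{1392029.6\ \text{ms}} \approx 105.7$$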
At the Present Time
• Some applications are accelerated with GPUs.
• The user needs to learn new programming languages and tools.
• The CUDA model and its architecture have to be understood.
• Non-expert users have to write programs for a new model.
GPGPU Languages
OpenACC: introduced last November at SuperComputing 2011
A directive-based language.

• Aims to be a standard.
• Supported by Cray, NVIDIA, PGI and CAPS.
• One single source code for all versions.
• Platform independent.
• Easier for beginners.
OpenACC example
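As a minimal illustration (my own sketch mirroring the earlier OpenMP example, not thesis code), the same π computation can be offloaded with a single OpenACC directive:

#include <stdio.h>

#define NUM_STEPS 1000000

int main(void)
{
    double step = 1.0 / NUM_STEPS;
    double sum = 0.0;
    int i;

    /* One directive: the compiler generates the device kernel and
       manages data movement; the pragma is ignored on plain C compilers. */
    #pragma acc parallel loop reduction(+:sum)
    for (i = 0; i < NUM_STEPS; ++i) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi = %f\n", step * sum);
    return 0;
}

The same annotated source compiles for the CPU, an NVIDIA GPU, or an OpenCL device, which is exactly the portability argument made above.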
A New Dimension for HPC
accULL: our OpenACC Implementation
accULL = compiler + runtime library.
accULL = YaCF + Frangollo.
Initial Objectives of this Project
• To integrate C99 in the YaCF project.
• To implement a new class hierarchy for new YaCF Frontends.
• To implement an OpenACC Frontend.
• To complete the OpenMP grammar with the OpenMP 3.0 directives.
• To test the new C99 interface.
Source-to-source Compilers
• Rose Compiler Framework.
• Cetus Compiler.
• Mercurium.
YaCF: Yet Another Compiler Framework
YaCF
• A source-to-source compiler that translates C code with OpenMP, llc and OpenACC annotations into code with Frangollo calls.
• Integrates code analysis tools.
• Completely written in Python.
• Based on widely known object-oriented software patterns.
• Based on the pycparser Python module.
• Implementing a code transformation is only a matter of writing a few lines of code.
YaCF: Architecture
YaCF: Preprocessor
YaCF: Statistics
• 20683 lines of Python code.
• 2158 functions and methods.
• My contribution amounts to about 25% of the YaCF project.
Experiments
• ScaLAPACK benchmark: testing C99.
• Block Matrix Multiplication in accULL.
• Three different problems from the Rodinia Benchmark:
  • HotSpot.
  • SRAD.
  • Needleman–Wunsch.
ScaLAPACK
• ScaLAPACK (Scalable LAPACK) is a library that includes a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers.
• ScaLAPACK is designed for heterogeneous computing.
• It is portable to any computer that supports MPI.
• Scalability depends on PBLAS operations.
ScaLAPACK: results in YaCF
Directory          Total C files  Success  Failures
PBLAS/SRC          123            123      0
REDIST/SRC         21             21       0
PBLAS/SRC/PTOOLS   102            101      1
PBLAS/TESTING      2              1        1
PBLAS/TIMING       2              1        1
REDIST/TESTING     10             0        10
SRC                9              9        0
TOOLS              2              2        0
Total              271            258      13

95% of the ScaLAPACK C files are correctly parsed by YaCF.
Platforms
• Garoe: a desktop computer with an Intel Core i7 930 processor (2.80 GHz), with 1 MB of L2 cache and 8 MB of L3 cache shared by the four cores. The system has 4 GB of RAM and an attached Tesla C2050 with 4 GB of memory.
• Drago: a second cluster node. It is a shared-memory system with 4 Intel Xeon E7 processors, each with 10 cores. In this case, the accelerator platform is the Intel OpenCL SDK 1.5, which runs on the CPU.
MxM in accULL
• MxM is a basic kernel frequently used to showcase the peak performance of GPU computing.
• We compare the performance of the accULL implementation with that of:
  • OpenMP.
  • CUDA.
  • OpenCL.
MxM OpenACC code
#pragma acc kernels name("mxm") copy(a[L*N]) copyin(b[L*M], c[M*N])
{
    #pragma acc loop private(i, j) collapse(2)
    for (i = 0; i < L; i++)
        for (j = 0; j < N; j++)
            a[i * L + j] = 0.0;
    /* Iterate over blocks */
    for (ii = 0; ii < L; ii += tile_size)
        for (jj = 0; jj < N; jj += tile_size)
            for (kk = 0; kk < M; kk += tile_size) {
                /* Iterate inside a block */
                #pragma acc loop collapse(2) private(i, j, k)
                for (j = jj; j < min(N, jj + tile_size); j++)
                    for (i = ii; i < min(L, ii + tile_size); i++)
                        for (k = kk; k < min(M, kk + tile_size); k++)
                            a[i*L + j] += b[i*L + k] * c[k*M + j];
            }
}
MxM in accULL (Garoe)
MxM in accULL (Drago)
SRAD: an Image Filtering Code
SRAD (Garoe)
The CUDA code generated through Frangollo performs better than the native CUDA version.
SRAD (Drago)
NW: Needleman-Wunsch, a Sequence Alignment Code
NW (Garoe)
Poor results, although still better than OpenMP on 4 cores.
NW (Drago)
HotSpot: a Thermal Simulation Tool for Estimating Processor Temperature
HotSpot (Garoe)
As good as native versions.
HotSpot (Drago)
Conclusions: Compiler Technologies
• Compiler technology tends to rely on source-to-source compilers to generate, optimize and transform source code.
• It is easier to parallelize source code with AST transformations.
• AST transformations enable programmers to easily generate code for any platform.
Conclusions: Programming Model
• Directive-based programming languages allow non-expert programmers to abstract away architectural details and write programs more easily.
• The OpenACC standard is a starting point for heterogeneous systems programming.
• Future versions of the OpenMP standard will include support for accelerators.
• The results we are obtaining with accULL, our early OpenACC implementation, are promising.
References I
Ruyman Reyes, Ivan Lopez, Juan J. Fumero, F. de Sande.
accULL: An OpenACC Implementation with CUDA and OpenCL Support.
International European Conference on Parallel and Distributed Computing, 2012.

Ruyman Reyes, Ivan Lopez, Juan J. Fumero, F. de Sande.
Directive-based Programming for GPUs: A Comparative Study.
The 14th IEEE International Conference on High Performance Computing and Communications.

Ruyman Reyes, Ivan Lopez, Juan J. Fumero, F. de Sande.
accULL: A User-directed Approach to Heterogeneous Programming.
The 10th IEEE International Symposium on Parallel and Distributed Processing with Applications.
Future Work
• Add support for MPI combined with CUDA and OpenCL.
• Perform new experiments with OpenACC.
• Compare our accULL approach with PGI OpenACC and CAPS HMPP.
• Add support for vectorization.
• Explore FPGAs in combination with CUDA and OpenCL.
• Introduce the LLVM Compiler Framework in the Frontend.
Thank you for your attention
Juan Jose Fumero [email protected]