
PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

Presentation PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier, at the AMD Developer Summit (APU13) November 11-13, 2013.
Page 1: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

PORTING AND OPTIMIZING OPENMP APPLICATIONS TO APU USING CAPS TOOLS

JEAN-CHARLES VASNIER, CAPS ENTREPRISE

Page 2: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL

AGENDA

• CAPS entreprise
• OpenACC
• CAPS Compilers
• CAPS OpenMP Compiler for AMD APUs
  ‒ Compiler analyses and code generation
  ‒ Interactive report
• Experiments with benchmark applications
  ‒ HydroC
• Future work

CAPS OpenMP Compiler - June 2013 2

Page 3: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS entreprise

Page 4: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

COMPANY PROFILE

• Founded in 2002
  ‒ Extensive expertise in processor micro-architecture and code generation
  ‒ Spin-off of the French INRIA research lab
  ‒ 30 employees
• Mission: to help customers leverage the performance of multi/manycore machines
  ‒ Consulting & engineering services
  ‒ CAPS OpenACC Compiler & toolchain
  ‒ Training
• Expanding sales worldwide
  ‒ Resellers in the US and APAC (Exxact, Absoft, JCC Gimmick Ltd, Nodasys, …)

www.caps-entreprise.com 4

Page 5: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier


CAPS  ECOSYSTEM  

Business Partners

European R&D Projects

Customers

Page 6: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

OpenACC    

Page 7: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

OPENACC INITIATIVE

• A CAPS, Cray, NVIDIA and PGI initiative
• Open standard
• A directive-based approach for programming heterogeneous many-core hardware for C and Fortran applications
• http://www.openacc-standard.com

Page 8: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

DIRECTIVE-BASED PROGRAMMING (1)

• Three ways of programming GPGPU applications:
  ‒ Libraries: ready-to-use acceleration
  ‒ Directives: quickly accelerate existing applications
  ‒ Programming languages: maximum performance

Page 9: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

DIRECTIVE-BASED PROGRAMMING (2)

Page 10: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

EXECUTION MODEL

• Within the bulk of computations executed by the CPU, some regions can be offloaded to hardware accelerators
  ‒ Parallel regions
  ‒ Kernels regions
• The host is responsible for:
  ‒ Allocating memory space on the accelerator
  ‒ Initiating data transfers
  ‒ Launching computations
  ‒ Waiting for completion
  ‒ Deallocating memory space
• Accelerators execute parallel regions:
  ‒ Use work-sharing directives
  ‒ Specify the level of parallelization
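The host responsibilities listed on this slide can be sketched with a small OpenACC region in C. This is an illustrative example, not code from the presentation: the function name `scale` and its parameters are assumptions. The comments map each directive to the host-side steps (allocation, transfer, launch, wait, deallocation).

```c
/* Scale a vector: y[i] = a * x[i].
 * Built without OpenACC support, the pragmas are ignored and the loop
 * simply runs on the CPU. */
void scale(int n, float a, const float *x, float *y)
{
    /* Data region entry: device memory is allocated, x is uploaded (copyin)
     * and space is reserved for y (copyout). */
    #pragma acc data copyin(x[0:n]) copyout(y[0:n])
    {
        /* The host launches the parallel region; with no async clause it
         * waits for completion before continuing. */
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i];
        }
    }
    /* Data region exit: y is downloaded and device memory is released. */
}
```

A caller only sees an ordinary C function; the offload is entirely expressed by the directives.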

Page 11: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

OPENACC EXECUTION MODEL

• Host-controlled execution
• Based on three parallelism levels
  ‒ Gangs – coarse grain
  ‒ Workers – fine grain
  ‒ Vectors – finest grain

[Figure: a device made of several gangs (Gang, Gang, …), each shown with its Worker and Vectors levels]
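As a minimal sketch of the three parallelism levels, the loop nest below maps one loop to each level using the standard OpenACC `gang`, `worker` and `vector` clauses. The function name `add_volume` and the array layout are assumptions made for illustration; how a given compiler maps these levels onto actual hardware is implementation-dependent.

```c
/* c = a + b over an nx * ny * nz volume stored contiguously (k fastest). */
void add_volume(int nx, int ny, int nz,
                const float *a, const float *b, float *c)
{
    /* Outer loop: coarse-grain parallelism, distributed across gangs. */
    #pragma acc parallel loop gang \
        copyin(a[0:nx*ny*nz], b[0:nx*ny*nz]) copyout(c[0:nx*ny*nz])
    for (int i = 0; i < nx; ++i) {
        /* Middle loop: fine-grain parallelism, distributed across workers. */
        #pragma acc loop worker
        for (int j = 0; j < ny; ++j) {
            /* Inner loop: finest grain, executed in vector lanes. */
            #pragma acc loop vector
            for (int k = 0; k < nz; ++k) {
                int idx = (i * ny + j) * nz + k;
                c[idx] = a[idx] + b[idx];
            }
        }
    }
}
```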

Page 12: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS  Compilers  

Page 13: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

OPENACC COMPILERS (1)

CAPS Compilers:
• Source-to-source compilers
• Support Intel Xeon Phi, NVIDIA GPUs, AMD GPUs and APUs

PGI Accelerator:
• Extension of the x86 PGI compiler
• Supports Intel Xeon Phi, NVIDIA GPUs, AMD GPUs and APUs

Cray Compilers:
• Provided with Cray systems only

Page 14: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS COMPILERS (2)

CAPS Compilers are source-to-source compilers composed of three parts:

• The directives (OpenACC or OpenHMPP)
  ‒ Define the parts of the code to be accelerated
  ‒ Indicate resource allocation and communication
  ‒ Ensure portability
• The toolchain
  ‒ Helps build manycore applications
  ‒ Includes compilers and target code generators
  ‒ Insulates hardware-specific computations
  ‒ Uses the hardware vendor's SDK
• The runtime
  ‒ Helps adapt to the platform configuration
  ‒ Manages hardware resource availability

Page 15: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS COMPILERS (3)

• Take the original application as input and generate another application source code as output
  ‒ Automatically turn the OpenACC source code into accelerator-specific source code (CUDA, OpenCL)
• Compile the entire hybrid application
• Just prefix the original compilation line with capsmc to produce a hybrid application
• Compatible with:
  ‒ GNU
  ‒ Intel
  ‒ Open64
  ‒ Absoft
  ‒ …

$ capsmc gcc myprogram.c
$ capsmc gfortran myprogram.f90

Page 16: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS COMPILERS (4)

• CAPS Compilers drive all compilation passes
• Host application compilation
  ‒ Calls traditional CPU compilers
  ‒ The CAPS Runtime is linked to the host part of the application
• Device code production
  ‒ According to the specified target
  ‒ A dynamic library is built

[Figure: compilation flow. Labels: C, C++ and Fortran frontends; extraction module; host code; codelets (Fun #1, Fun #2, Fun #3); instrumentation module; CPU compiler (gcc, ifort, …); CAPS Runtime; executable (mybin.exe); CUDA code generation; OpenCL generation; CUDA compilers; OpenCL compilers; HWA code (dynamic library)]

Page 17: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

From  OpenMP  To  OpenACC  

Page 18: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS OPENMP COMPILER

• Automatically turns OpenMP codes into OpenACC
• Diagnoses compatibility issues and suggests code transformations
• Builds accelerated versions based on CUDA or OpenCL
• Works with all platforms
  ‒ AMD and NVIDIA GPUs
  ‒ AMD APUs
  ‒ Intel Xeon Phi

Page 19: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CAPS OPENMP COMPILER OVERVIEW

[Figure: three stages labelled Profiling, Analysis, Acceleration]

Page 20: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

EXTENSION OF THE CAPS OPENACC COMPILER

• Converts OpenMP codes into OpenACC
  ‒ Examines OpenMP loop nests and checks their OpenACC compatibility
  ‒ Diagnoses incompatibilities and proposes advice
  ‒ Builds an APU version based on OpenCL
• Builds an interactive report
  ‒ Based on the compiler's static and dynamic analyses
  ‒ OpenMP to OpenACC kernels view
    o Performance details of each region
  ‒ Regions' inputs/outputs and data dependencies between regions
  ‒ Gives the user control over pushing kernels onto the GPU and managing data transfers
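To make the OpenMP to OpenACC conversion concrete, here is a hand-written sketch of the same loop in both forms. The slides do not show the tool's actual output, so the OpenACC version and the function names `saxpy_omp` and `saxpy_acc` are assumptions for illustration only.

```c
/* Original OpenMP version: y = a * x + y, shared across CPU threads. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hand-written OpenACC counterpart (illustrative, not tool output).
 * Unlike OpenMP, the data movement has to be stated: x is only read on
 * the device (copyin), y is read and written (copy). */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is unchanged; what the conversion has to add is the information OpenMP does not need on shared memory, namely which arrays move to and from the accelerator.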

Page 21: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

OPENMP-BASED OPTIMIZATION PROCESS

[Figure: process flow. Application with OpenMP directives → Instrumentation → Traceable application → Execution → Profiling report → Analysis → HTML interactive report → Generation → Accelerated executable]

Page 22: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

INSTRUMENTATION AND PROFILING PHASES

• Code preprocessing and instrumentation
  ‒ Identify supported OpenMP regions
    o parallel and parallel for constructs
  ‒ Instrument the code to track data and measure kernel performance
• Instrumented application execution
  ‒ Based on the user data set
  ‒ Number of times an OpenMP region is executed
  ‒ Region's reads and writes
  ‒ Range of loop iterations
  ‒ Region performance
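The kind of data collected during the profiling run can be sketched as follows. Everything here is hypothetical: the `region_profile` structure, its field names and the `profile_record` helper are assumptions chosen to mirror the bullet list, not the instrumentation the CAPS tool actually inserts.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-region record; field names are illustrative. */
typedef struct {
    long   calls;          /* number of times the region was executed */
    long   min_trip;       /* smallest loop trip count observed */
    long   max_trip;       /* largest loop trip count observed */
    size_t bytes_read;     /* accumulated bytes read by the region */
    size_t bytes_written;  /* accumulated bytes written by the region */
    double seconds;        /* accumulated time spent in the region */
} region_profile;

static void profile_record(region_profile *p, long trip,
                           size_t rd, size_t wr, double seconds)
{
    if (p->calls == 0 || trip < p->min_trip) p->min_trip = trip;
    if (p->calls == 0 || trip > p->max_trip) p->max_trip = trip;
    p->calls++;
    p->bytes_read += rd;
    p->bytes_written += wr;
    p->seconds += seconds;
}

/* An OpenMP region wrapped with the hypothetical instrumentation.
 * clock() reports CPU time and is used only to keep the sketch portable;
 * a real tool would more likely use a wall-clock timer. */
void scale_instrumented(region_profile *p, int n, float a,
                        const float *x, float *y)
{
    clock_t start = clock();
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i];
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    profile_record(p, n, (size_t)n * sizeof *x, (size_t)n * sizeof *y, seconds);
}
```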

Page 23: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

ANALYSIS PHASE

• Generates an interactive HTML report
  ‒ Based on the compiler's static and dynamic analyses
  ‒ Metrics for each OpenMP region
    o OpenACC compliance check
    o Computation density
    o Coalescing of data accesses
    o Estimated speed-up
    o Memory usage
  ‒ Proposes GPU execution or native OpenMP execution
  ‒ Data usage and data dependency graph between regions
    o Determines when transfers are required between kernels
    o Lets the user modify the CPU or GPU execution and the data transfer policy
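Of the metrics above, coalescing of data accesses is the one most easily shown in code. The sketch below performs the same computation with two loop orders; the function names are assumptions, and how the CAPS analysis actually scores coalescing is not described in the slides.

```c
/* m is a row-major rows x cols matrix; both functions multiply every
 * element by a and produce identical results. */

/* Inner (vector) loop walks along a row: neighbouring iterations touch
 * neighbouring addresses, the pattern usually described as coalesced. */
void scale_coalesced(int rows, int cols, float a, float *m)
{
    #pragma acc parallel loop gang copy(m[0:rows*cols])
    for (int i = 0; i < rows; ++i) {
        #pragma acc loop vector
        for (int j = 0; j < cols; ++j) {
            m[i * cols + j] *= a;
        }
    }
}

/* Inner (vector) loop walks down a column: neighbouring iterations are
 * cols elements apart, so accesses are strided rather than coalesced. */
void scale_strided(int rows, int cols, float a, float *m)
{
    #pragma acc parallel loop gang copy(m[0:rows*cols])
    for (int j = 0; j < cols; ++j) {
        #pragma acc loop vector
        for (int i = 0; i < rows; ++i) {
            m[i * cols + j] *= a;
        }
    }
}
```

On a CPU the two versions differ mainly in cache behaviour; on a GPU the strided form typically costs more memory transactions per vector operation, which is why an analysis phase would flag it.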

Page 24: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

HTML INTERACTIVE REPORT (1)

• Get a regions overview in a snap!
• Code view: from OpenMP to OpenACC directives

Page 25: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

HTML INTERACTIVE REPORT (2)

• Performance details of each region
• Analysis conclusions and portability diagnosis

Page 26: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

HTML INTERACTIVE REPORT (3)

• Regions' inputs/outputs and data dependencies map

Page 27: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

HTML INTERACTIVE REPORT (4)

• Take control!
  ‒ Manually push kernels onto accelerators
  ‒ Manage data transfers

Page 28: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

CODE GENERATION PHASE

• Same as the CAPS OpenACC Compiler
  ‒ Based on the analysis report
  ‒ Generates OpenCL kernels from OpenACC
  ‒ Automatic data updates to ensure memory coherency
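The "automatic data updates" bullet refers to keeping host and device copies consistent when both sides touch the same data. A hand-written equivalent, using the standard OpenACC `update` directive, might look like the sketch below; the function `two_steps` is an assumption, and the slides do not show where the compiler places such updates.

```c
/* Two device kernels separated by a CPU-side modification of v. */
void two_steps(int n, float *v)
{
    #pragma acc data copy(v[0:n])
    {
        /* First kernel: runs on the device copy of v. */
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            v[i] *= 2.0f;
        }

        /* The CPU is about to touch v: refresh the host copy first. */
        #pragma acc update host(v[0:n])
        v[0] = 0.0f;
        /* Push the CPU-side change back before the next kernel. */
        #pragma acc update device(v[0:n])

        /* Second kernel: sees the value written by the CPU. */
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i) {
            v[i] += 1.0f;
        }
    }
}
```

Without the two `update` directives, an accelerator with separate memory would compute the second kernel on stale data.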

Page 29: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

FEATURES

• Diagnoses
  ‒ OpenACC compliance
  ‒ Computational density
  ‒ Coalescing of data accesses
  ‒ Memory usage
  ‒ Estimated speed-up
• Automatic porting to AMD, NVIDIA, or Intel accelerators
• Accelerates execution or keeps the native OpenMP one
• Gives users control over manual optimizations

Page 30: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

Application Experiments

Page 31: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

HARDWARE AND SOFTWARE ENVIRONMENT

• Linux system
  ‒ AMD SDK 2.8
  ‒ CAPS Compiler revision 50387
  ‒ GCC 4.6.1
  ‒ OpenMPI 1.6.4
• Hardware
  ‒ AMD A10-5800K APU with Radeon HD Graphics

Page 32: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

APPLICATIONS STATUS

• The main objective is proof of concept, not performance
  ‒ Performance limitations of the current version of the APU
• HydroC
  ‒ Most convincing demo
  ‒ x1.3 speed-up by modifying the execution and transfer policy

Page 33: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier


HYDROC  HTML  REPORT  

Page 34: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

Future Work: C2PO

Page 35: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO MISSION STATEMENT

Guides you through the whole process of porting and tuning applications onto manycore parallel systems

• Combines various CAPS technologies in a modular tool chain
  ‒ Static and dynamic code analyzers
  ‒ OpenMP to OpenACC code transformers
  ‒ Kernel micro-bencher
  ‒ Plugs into third-party tools: VTune, CUDA profiler
  ‒ Uses the CAPS Compiler at the final stage to produce a manycore application

C2PO - Oct. 2013 35

Page 36: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO PHASES

1. Generates an OpenACC skeleton from OpenMP or sequential code
   ‒ Hotspot detection and dataflow analysis
2. Gives global and local advice on
   ‒ Data management/placement between kernels or regions
   ‒ First ten tips on kernel performance
     o Data coalescing, parallelism, gridification, loop order
3. Lets you rapidly optimize kernel performance
   ‒ Extract functions, loops or annotated regions
   ‒ Tune kernel code following C2PO advice
   ‒ Replay standalone with application data and measure the performance gain
   ‒ Re-inject the optimized kernels into the application source code
4. Uses CAPS Compilers to build for Intel Xeon Phi, NVIDIA or AMD GPUs

[Figure: Dataflow analysis; OpenACC skeleton generation; Extract loops, functions, regions; Fine tune kernels; User input]

Page 37: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO TOOL CHAIN

[Figure: C2PO tool chain. Inputs: sequential code, OpenMP code. Components: OpenACC Generator, Data Movement Analyzer, ubencher, performance analyzer (CUDA profiler, VTune). Outputs: OpenACC code, kernels, HTML report, interactive report. Stages: code skeleton generation, global tuning, local tuning]

Page 38: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO OPENACC GENERATION

• From sequential or OpenMP code to a first parallelized code
  ‒ Instrument the application and detect hotspots
  ‒ Generate an OpenACC skeleton of kernels from loops
  ‒ Manage data transfers between kernels
• A report is generated containing
  ‒ Various performance metrics
    o Kernel execution
    o Memory reads and writes
    o Potential performance gain
  ‒ Data dependencies and usage between kernels
  ‒ OpenACC code view

Page 39: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO GLOBAL TUNING

• Dynamic tracking of data to optimize its movement
  ‒ Dynamically trace uploads and downloads at execution time
  ‒ Detect potentially redundant data transfers

    #openacc data region
    // convergence loop
    for {
        Upload data()
        Kernels' calls()
        Download data()
    }
    …

It is difficult for the compiler to detect any CPU use of the data.

Possible advice: are the following parameters modified by the CPU between the downloads and uploads? If yes, insert an OpenACC data region with the non-modified parameters.
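The advice on this slide amounts to hoisting a data region out of the convergence loop for arrays the CPU does not touch between iterations. A minimal sketch, assuming a simple relaxation kernel (the function `relax` and its arrays are hypothetical):

```c
/* Repeated 1-D relaxation: each interior point becomes the mean of its
 * neighbours. u holds the solution, tmp is scratch space. */
void relax(int n, int iters, float *u, float *tmp)
{
    /* One data region around the whole convergence loop: u is uploaded
     * once and downloaded once, tmp only ever lives on the device. */
    #pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int it = 0; it < iters; ++it) {
            #pragma acc parallel loop
            for (int i = 1; i < n - 1; ++i) {
                tmp[i] = 0.5f * (u[i - 1] + u[i + 1]);
            }
            #pragma acc parallel loop
            for (int i = 1; i < n - 1; ++i) {
                u[i] = tmp[i];
            }
        }
    }
}
```

Without the enclosing data region, each kernel would upload and download its arrays on every iteration, which is exactly the redundant traffic the dynamic tracing is meant to detect. The hoisting is only valid because the CPU never reads or writes `u` inside the loop.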

Page 40: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO TUNING PHASE

• Microbenchmarking mechanism
  ‒ Loops, functions and user-annotated regions are extracted into kernels
  ‒ Apply optimizations
  ‒ Replay kernels with the original data set without running the whole application
  ‒ Once tuned, inject the kernels into the application source code
• Apply performance analyzers from third-party tools (VTune, CUDA profiler)
  ‒ Synthesize raw metrics (hardware counters) linked to the source code
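The replay step of the microbenchmarking mechanism can be sketched as a small harness that runs an extracted kernel on captured input and reports the time per run. The names `kernel_fn`, `replay_kernel` and `square_kernel` are assumptions; the slides do not describe the interface of the actual micro-bencher.

```c
#include <time.h>

/* Signature assumed for an extracted kernel. */
typedef void (*kernel_fn)(int n, const float *in, float *out);

/* Replay `kernel` `repeats` times on a captured input and return the mean
 * CPU time per run in seconds. clock() keeps the sketch portable; a real
 * harness would more likely use a wall-clock timer. */
double replay_kernel(kernel_fn kernel, int n, const float *saved_in,
                     float *out, int repeats)
{
    if (repeats <= 0) {
        return 0.0;
    }
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r) {
        kernel(n, saved_in, out);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / repeats;
}

/* Example of an extracted kernel. */
void square_kernel(int n, const float *in, float *out)
{
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * in[i];
    }
}
```

Comparing the value returned for the original kernel with the value for a tuned variant gives the performance gain without rerunning the whole application.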

Page 41: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

C2PO OBJECTIVES AND BENEFITS

• Keep a single OpenMP code for various parallel many-core systems (GPUs, APUs, MIC)
• Incrementally port and optimize codes in a modular way
• Use an interactive compiler: advice from dynamic and static analyses at source code level

Page 42: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

THANK YOU FOR YOUR ATTENTION!

Jean-Charles Vasnier
Sales Engineer, CAPS entreprise

Phone: +1-865-227-6899
Email: jvasnier@caps-entreprise.com

Page 43: PT-4142, Porting and Optimizing OpenMP applications to APU using CAPS tools, by Jean-Charles Vasnier

GET PERFORMANCE IN NO TIME!

Execution time (seconds):

                        Hydro     Nbody
Original (OpenMP)       45.698    63.42
Generated (auto)        27.539    12.71
Generated (tweaked)     23.417    12.55

‒ x2 speed-up (after user's tuning)
‒ x6 speed-up in 3 clicks (fully automatic)
‒ Measured on a dual Sandy Bridge E5-2687W with 32 GB RAM and a Kepler K20C driven by CUDA v5.0

