
Scientific Computing with Intel Xeon Phi Coprocessors

Andrey Vladimirov, Colfax International

HPC Advisory Council Stanford Conference 2015

Computing with Xeon Phi Welcome © Colfax International, 2014


Contents

§1 MIC Architecture, Developer’s Perspective

§2 Case Studies

• Astrophysics (offload story)
• N-body simulation (offload vs native in a cluster)
• Financial Monte Carlo (heterogeneous clustering)
• Computational fluid dynamics (legacy code)

§3 Colfax Developer Training


§1. MIC Architecture from Developer’s Perspective


Intel Xeon Phi Coprocessors and the MIC Architecture

PCIe end-point device

High power efficiency

≈1 TFLOP/s in double precision (DP)

Heterogeneous clustering

For highly parallel applications that reach the scaling limits on Intel Xeon processors


Examples of Solutions with the Intel MIC Architecture

Colfax’s CXP7450 workstation with two Intel Xeon Phi coprocessors (xeonphi.com/workstations)

Colfax’s CXP9000 server with eight Intel Xeon Phi coprocessors (xeonphi.com/servers)


Intel Xeon Phi Coprocessors and the MIC Architecture

Intel Xeon processors              Intel Xeon Phi coprocessors
≤18 cores/socket at ≈3 GHz         57 to 61 cores at ≈1 GHz
2-way hyper-threading              4 hardware threads per core
Up to 768 GB of DDR3 RAM           6–16 GB of cached GDDR5 RAM
256-bit AVX vectors                512-bit IMCI vectors

Common to both: C/C++/Fortran; OpenMP/MPI; Linux OS (on host and on coprocessor)
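As a rough check of the "≈1 TFLOP/s in DP" figure, the peak rate can be estimated from the coprocessor column above, assuming each core issues one 512-bit fused multiply-add (counted as 2 FLOPs per lane) per cycle. The same vector width gives the 16- and 8-wide vectors, and the core count gives the thread counts, that the optimization list refers to.

```latex
\begin{aligned}
\text{DP lanes} &= \frac{512\ \text{bits}}{64\ \text{bits}} = 8, \qquad
\text{SP lanes} = \frac{512\ \text{bits}}{32\ \text{bits}} = 16,\\
\text{peak}_{\text{DP}} &\approx 61\ \text{cores} \times 1\ \text{GHz} \times 8\ \text{lanes} \times 2\ \tfrac{\text{FLOP}}{\text{FMA}} \approx 976\ \text{GFLOP/s} \approx 1\ \text{TFLOP/s},\\
\text{hardware threads} &= (57\ \text{to}\ 61)\ \text{cores} \times 4 = 228\ \text{to}\ 244.
\end{aligned}
```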


Linux µOS on Intel Xeon Phi coprocessors (part of MPSS)

user@host% lspci | grep -i "co-processor"
06:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
82:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
user@host% sudo service mpss status
mpss is running
user@host% cat /etc/hosts | grep mic
172.31.1.1 host-mic0 mic0
172.31.2.1 host-mic1 mic1
user@host% ssh mic0
user@mic0% cat /proc/cpuinfo | grep proc | tail -n 3
processor : 237
processor : 238
processor : 239
user@mic0% ls /
amplxe dev home lib64 oldroot proc sbin sys usr
bin etc lib linuxrc opt root sep3.10 tmp var


Offload and Native modes

Explicit offload mode:

Native mode:


Optimization Areas

Common methods for Intel Xeon CPUs and Intel Xeon Phi coprocessors:

1 Scalar optimization (compiler-friendly practices)

2 Vectorization (must use 16-wide single-precision or 8-wide double-precision vectors)

3 Multi-threading (must scale to 100+ threads)

4 Memory access (streaming access or tiling)

5 Communication (offload, MPI traffic control)


Getting Ready for the Future

Knights Landing (KNL) – next generation of Intel MIC architecture

3x the performance of the current generation

Available as a stand-alone processor or as a coprocessor


Getting Ready for the Future

The best way to prepare applications for KNL is to optimize them for Intel Xeon Phi coprocessors based on Knights Corner (KNC).


§2. Case Studies


Astrophysical Code HEATCODE: an Offload Story

xeonphi.com/papers/heatcode


N-body Simulation: Offload vs Native in a Cluster

xeonphi.com/papers/nbody-basic


N-body Simulation: Offload vs Native in a Cluster

N-Body Simulation Performance (single precision, GFLOP/s)
Processor: Intel Xeon E5-2697 v2; coprocessor: Intel Xeon Phi 7120P

Optimization stage      Xeon E5-2697 v2    Xeon Phi 7120P
Initial                 5.3                0.8
Multi-threaded          140                120
Vectorized with SoA     180                220
Scalar tuning           480                870
Tiled, unrolled         520                1620

xeonphi.com/papers/sc14


N-body Simulation: Offload vs Native in a Cluster

Figure: strong scaling with N = 2^20 particles. Performance (TFLOP/s) vs number of nodes or coprocessors, P = 1, 2, 3, 4, 8, 12, 16. Hardware: Intel Xeon E5-2697 v2 CPUs (4 nodes) and Intel Xeon Phi 7120P coprocessors (4 per node), with configurations of 1, 2, 3, and 4 Xeon Phi per node. Series: Xeon Phi with native MPI, Xeon Phi with MPI+Offload, and CPU. Annotated parallel efficiencies: 92% and 76%.

xeonphi.com/papers/sc14
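The efficiency labels on the scaling plot compare measured performance with ideal linear scaling. The slide does not state its exact baseline; a common definition for strong scaling, with R(P) the performance and T(P) the run time on P nodes or coprocessors for a fixed problem size, is:

```latex
E(P) = \frac{R(P)}{P\,R(1)} = \frac{T(1)}{P\,T(P)},
\qquad \text{e.g. } E(16) = 0.92 \;\Rightarrow\; R(16) \approx 0.92 \times 16 \times R(1) \approx 14.7\,R(1).
```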


Asian Option Pricing: Heterogeneous Clustering

xeonphi.com/papers/heterogeneous


Computational Fluid Dynamics: Legacy Code

xeonphi.com/papers/shallow

§3. Colfax Developer Training


Colfax Developer Training

Intel Xeon Phi Coprocessor Programming

Future-Proofing Applications for Knights Landing (KNL)

xeonphi.com/training

Free Training for HPCAC Stanford 2015 Participants

xeonphi.com/hpcac2015
