PRACE PATC Course
Intel MIC Programming Workshop
June 26-28, 2017, LRZ
LRZ in the HPC Environment
Intel MIC Programming Workshop @ LRZ
HLRS@Stuttgart JSC@Jülich LRZ@Garching
PRACE has 25 members, representing European
Union Member States and Associated Countries.
Bavarian Contribution to National Infrastructure
German Contribution to European Infrastructure
26.-28.6.2017
PRACE Advanced Training Centre (PATC) Courses
LRZ is part of the Gauss Centre for Supercomputing (GCS), which is one of the six
PRACE Advanced Training Centres (PATCs) that started in 2012:
Barcelona Supercomputing Center (Spain)
CINECA Consorzio Interuniversitario (Italy)
CSC – IT Center for Science Ltd (Finland)
EPCC at the University of Edinburgh (UK)
Gauss Centre for Supercomputing (Germany)
Maison de la Simulation (France)
Mission: Serve as European hubs and key drivers of advanced high-quality
training for researchers working in the computational sciences.
http://www.training.prace-ri.eu/
Tentative Agenda: Monday
● Monday, June 26, 2017, Kursraum 2, H.U.010 (course room)
● 09:00-10:00 Welcome & Introduction (Weinberg)
● 10:00-10:30 Overview of the Intel MIC architecture (Allalen)
● 10:30-11:00 Coffee break
● 11:00-11:30 Overview of the Intel MIC programming models (Allalen)
● 11:30-12:00 Native mode KNC and KNL programming (Allalen)
● 12:00-13:00 Lunch break
● 13:00-14:00 KNL Memory Modes and Cluster Modes, MCDRAM (Weinberg)
● 14:00-15:30 Offloading (Weinberg)
● 15:30-16:00 Coffee break
● 16:00-17:00 MKL (Allalen)
Tentative Agenda: Tuesday
● Tuesday, June 27, 2017, Kursraum 2, H.U.010 (course room)
● 09:00-10:30 Vectorisation and Intel Xeon Phi performance optimisation (Allalen)
● 10:30-11:00 Coffee break
● 11:00-12:00 Guided SuperMUC/MIC Tour (Weinberg/Allalen)
● 12:00-13:00 Lunch break
● 13:00-15:30 KNL code optimisation process (Baruffa)
● 15:30-16:00 Coffee Break
● 16:00-17:00 Profiling tools: Intel Advisor (Baruffa)
● 18:00 - open end at GARNIX https://www.garnix-festival.de/
Tentative Agenda: Wednesday
● Wednesday, June 28, 2017, 09:00-12:00, Hörsaal, H.E.009 (Lecture Hall)
● 09:00-10:30 Many-core Programming with OpenMP 4.x (Michael Klemm, Intel)
● 10:30-10:45 Coffee Break
● 10:45-12:00 Advanced KNL programming techniques (Intrinsics, Assembler, AVX-512, ...) (Jan Eitzinger, RRZE)
● 12:00-13:00 Lunch Break
● Wednesday, June 28, 2017, 13:00-18:00, Hörsaal, H.E.009 (Lecture Hall)
● Plenum session with invited talks on MIC experience and best practice recommendations
(joint session with the Scientific Workshop "HPC for natural hazard assessment and disaster
mitigation"), public session
● 13:00-13:30 Luigi Iapichino, IPCC@LRZ: "Performance Optimization of Smoothed Particle
Hydrodynamics and Experiences on Many-Core Architectures"
● 13:30-14:00 Michael Bader/Carsten Uphoff, IPCC@TUM: "Extreme-scale Multi-physics Simulation
of the 2004 Sumatra Earthquake"
● 14:00-14:30 Vit Vondrak/Branislav Jansik, IPCC@IT4I: "Development of Intel Xeon Phi Accelerated
Algorithms and Applications at IT4I"
● 14:30-15:00 Michael Klemm, Intel: "Application Show Cases on Intel® Xeon Phi™ Processors"
● 15:00-15:30 Coffee Break
● 15:30-16:00 Jan Eitzinger, RRZE: "Evaluation of Intel Xeon Phi "Knights Landing": Initial
impressions and benchmarking results"
● 16:00-16:30 Piotr Korcyl, University of Regensburg: "Lattice Quantum Chromodynamics on the MIC
architectures"
● 16:30-17:00 Nils Moschüring, IPP: "The experience of the HLST on Europe's biggest KNL cluster"
● 17:00-17:30 Andreas Marek, Max Planck Computing and Data Facility (MPCDF): "Porting the
ELPA library to the KNL architecture"
● 17:30-18:00 Q&A, Wrap-up
Information
● Lecturers:
Dr. Momme Allalen, Dr. Fabio Baruffa, Dr. Volker Weinberg (LRZ)
Dr.-Ing. Jan Eitzinger (RRZE)
Dr.-Ing. Michael Klemm (Intel Corp.)
● Complete lecture slides & exercise sheets:
https://www.lrz.de/services/compute/courses/x_lecturenotes/mic_workshop_2017/
http://tinyurl.com/yd6lfweq
● Examples under:
/lrz/sys/courses/MIC_Workshop
Intel Xeon Phi @ LRZ and EU
Intel Xeon Phi and GPU Training @ LRZ
28.-30.4.2014 @ LRZ (PATC): KNC+GPU
27.-29.4.2015 @ LRZ (PATC): KNC+GPU
3.-4.2.2016 @ IT4Innovations: KNC
27.-29.6.2016 @ LRZ (PATC): KNC+KNL
28.9.2016 @ PRACE Seasonal School, Hagenberg: KNC
7.-8.2.2017 @ IT4Innovations (PATC): KNC
26.-28.6.2017 @ LRZ (PATC): KNL
June 2018 @ LRZ (PATC tbc.): KNL
http://inside.hlrs.de/
inSiDE, Vol. 12, No. 2, p. 102, 2014
inSiDE, Vol. 13, No. 2, p. 79, 2015
inSiDE, Vol. 14, No. 1, p. 76f, 2016
inSiDE, Vol. 14, No. 2, p. 25ff, 2016
inSiDE, Vol. 15, No. 1, p. 48ff, 2017
Evaluating Accelerators at LRZ
Research at LRZ within PRACE & KONWIHR:
● CELL programming
2008-2009 Evaluation of CELL programming.
In Nov. 2009 IBM announced that it would discontinue CELL.
● GPGPU programming
Regular GPGPU computing courses at LRZ since 2009.
Evaluation of GPGPU programming languages:
CAPS HMPP
PGI accelerator compiler
CUDA, cuBLAS, cuFFT
PyCUDA/R
● Intel Xeon Phi programming
Larrabee (2009) → Knights Ferry (2010) → Knights Corner → Intel Xeon Phi (2012) → KNL (2016)
● Direction taken from these evaluations: OpenACC, OpenMP 4.x
IPCC (Intel Parallel Computing Centre)
● New Intel Parallel Computing Centre (IPCC) since July 2014:
Extreme Scaling on MIC/x86
● Chair of Scientific Computing at the Department of Informatics in
the Technische Universität München (TUM) & LRZ
● https://software.intel.com/de-de/ipcc#centers
● https://software.intel.com/de-de/articles/intel-parallel-computing-center-at-leibniz-supercomputing-centre-and-technische-universit-t
● Codes:
Simulation of Dynamic Ruptures and Seismic Motion in Complex
Domains: SeisSol
Numerical Simulation of Cosmological Structure Formation: GADGET
Molecular Dynamics Simulation for Chemical Engineering: ls1 mardyn
Data Mining in High Dimensional Domains Using Sparse Grids: SG++
CzeBaCCA Project
● Czech-Bavarian Competence Team for Supercomputing Applications (CzeBaCCA)
● New BMBF-funded project that started in Jan. 2016 to:
Foster Czech-German collaboration in simulation supercomputing: a series of workshops will initiate and deepen collaboration between Czech and German computational scientists
Establish well-trained supercomputing communities: a joint training program will extend and improve training on both sides
Improve simulation software: establish and disseminate role models and best practices of simulation software in supercomputing
CzeBaCCA Trainings and Workshops
● Intel MIC Programming Workshop, 3 – 4 February 2016, Ostrava, Czech
Republic
● Scientific Workshop: SeisMIC - Seismic Simulation on Current and Future
Supercomputers, 5 February 2016, Ostrava, Czech Republic
● PRACE PATC Course: Intel MIC Programming Workshop, 27 - 29 June 2016,
Garching, Germany
● Scientific Workshop: High Performance Computing for Water Related Hazards,
29 June - 1 July 2016, Garching, Germany
● PRACE PATC Course: Intel MIC Programming Workshop, 7 – 8 February 2017,
Ostrava, Czech Republic
● Scientific Workshop: High performance computing in atmosphere modelling and
air related environmental hazards, 9 February 2017, Ostrava, Czech Republic
● PRACE PATC Course: Intel MIC Programming Workshop, 26 – 28 June 2017,
Garching, Germany
● Scientific Workshop: HPC for natural hazard assessment and disaster mitigation,
28 - 30 June 2017, Garching, Germany
Reports on the workshop series:
1st workshop series: February 2016 @ IT4I; inSiDE, Vol. 14, No. 1, p. 76f, 2016
2nd workshop series: June 2016 @ LRZ; inSiDE, Vol. 14, No. 2, p. 25ff, 2016
3rd workshop series: February 2017 @ IT4I; inSiDE, Vol. 15, No. 1, p. 48ff, 2017
https://www.lrz.de/forschung/projekte/forschung-hpc/CzeBaCCA/
http://inside.hlrs.de/
http://www.gate-germany.de/fileadmin/dokumente/Laenderprofile/Laenderprofil_Tschechien.pdf, p.27
Intel Xeon Phi @ Top500 June 2017
● https://www.top500.org/list/2017/06/
● #2: Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P, National Super Computer Center in Guangzhou, China
● #6: Cori - Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect, Cray Inc., DOE/SC/LBNL/NERSC, United States
● #7: Oakforest-PACS - PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Fujitsu, Joint Center for Advanced High Performance Computing, Japan
● #12: Stampede2 - PowerEdge C6320P, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Dell, Texas Advanced Computing Center/Univ. of Texas, United States
● #14: Marconi, Intel Xeon Phi - CINECA Cluster, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, Lenovo, CINECA, Italy
● … several non-European systems …
● #78: Salomon - SGI ICE X, Xeon E5-2680v3 12C 2.5GHz, Infiniband FDR, Intel Xeon Phi 7120P, HPE, IT4Innovations National Supercomputing Center, VSB-Technical University of Ostrava, Czech Republic
PRACE: Best Practice Guides
● http://www.prace-ri.eu/best-practice-guides/
Best Practice Guides - Overview
● The following 4 Best Practice Guides (BPGs) were written within PRACE-4IP by 13 authors from 8 institutions and published in PDF and HTML format on the PRACE website in January 2017:
Intel® Xeon Phi™ BPG: update of the PRACE-3IP BPG
Haswell/Broadwell BPG: written from scratch
Knights Landing BPG: written from scratch
GPGPU BPG: update of the PRACE-2IP mini-guide
● Online under: http://www.prace-ri.eu/best-practice-guides/
Intel MIC within PRACE: Intel Xeon Phi (KNC) Best Practice Guide
Created within PRACE-3IP+4IP.
Written in Docbook XML.
122 pages, 13 authors
Now including information about existing Xeon Phi based systems in Europe: Avitohol @ BAS (NCSA), MareNostrum @ BSC, Salomon @ IT4Innovations, SuperMIC @ LRZ
http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-january-2017/
http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Intel-Xeon-Phi-1.pdf
Intel MIC within PRACE: Knights Landing Best Practice Guide
Created within PRACE-4IP.
Written in Docbook XML.
85 pages, 3 authors
General information about the KNL architecture and programming environment
Benchmark & Application Performance results
http://www.prace-ri.eu/IMG/best-practice-guide-knights-landing-january-2017/
http://www.prace-ri.eu/IMG/pdf/Best-Practice-Guide-Knights-Landing.pdf
Best Practice Guides - Dissemination
SuperMIC ∈ SuperMUC @ LRZ
SuperMUC System Overview
SuperMUC Phase 2: Moving to Haswell
[Figure: system diagram of SuperMUC Phase 2. Labels: 6 Haswell islands, 512 nodes per island, warm water cooling; Haswell-EP, 24 cores/node, 2.67 GB/core; Mellanox FDR14 island switch (non blocking); spine Infiniband switches (pruned tree); I/O servers; GPFS for $WORK and $SCRATCH (weak coupling of phases 1+2); Thin + Fat islands of SuperMUC with Mellanox FDR10 island switch (non blocking, pruned tree); LRZ infrastructure (NAS, Archive, Visualization); Internet / Grid Services.]
SuperMIC: Intel Xeon Phi Cluster
SuperMIC ∈ SuperMUC @ LRZ
● 32 compute nodes (diskless)
SLES11 SP3
2 Ivy-Bridge host processors E5-2650@2.6 GHz with 16 cores
2 Intel Xeon Phi 5110P coprocessors per node with 60 cores
64 GB (Host) + 2 * 8 GB (Xeon Phi) memory
2 MLNX CX3 FDR PCIe cards attached to each CPU socket
● Interconnect
Mellanox Infiniband FDR14
Through Bridge Interface all nodes and MICs are directly accessible
● 1 Login- and 1 Management-Server (Batch-System, xCAT, …)
● Air-cooled
● Supports both native and offload mode
● Batch-system: LoadLeveler
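As a rough cross-check of the node list above, the per-node figures can be multiplied out to cluster totals. This is a minimal sketch, assuming that "16 cores" is the host total per node and that "60 cores" and "8 GB" are per coprocessor; the constant and function names are illustrative and not part of any LRZ tool.

```python
# Per-node figures taken from the SuperMIC list; their interpretation
# (host cores per node, cores and memory per coprocessor) is an assumption.
NODES = 32
HOST_CORES_PER_NODE = 16
MICS_PER_NODE = 2
CORES_PER_MIC = 60
HOST_MEM_GB = 64
MIC_MEM_GB = 8


def supermic_totals(nodes=NODES):
    """Return aggregate core counts and memory for the given node count."""
    return {
        "host_cores": nodes * HOST_CORES_PER_NODE,
        "mic_cards": nodes * MICS_PER_NODE,
        "mic_cores": nodes * MICS_PER_NODE * CORES_PER_MIC,
        "memory_gb": nodes * (HOST_MEM_GB + MICS_PER_NODE * MIC_MEM_GB),
    }


if __name__ == "__main__":
    for name, value in supermic_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the coprocessors contribute 3840 of the 4352 cores in the cluster, which is why both native and offload mode matter on this system.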
SuperMIC Network Access
SuperMIC Access
● Description of SuperMIC:
● https://www.lrz.de/services/compute/supermuc/supermic/
● Training Login Information:
● https://www.lrz.de/services/compute/supermuc/supermic/training-login/
● Use the course account given on the paper slips
KNL Testsystem
First log in to the Linux-Cluster (directly reachable from the course PCs; use only account a2c06aa!):
ssh lxlogin1.lrz.de -l a2c06aa
Then:
ssh mcct03.cos.lrz.de or
ssh mcct04.cos.lrz.de
Processor: Intel(R) Xeon Phi(TM) CPU 7210. 64 cores, 4 threads per core.
Frequency: 1 - 1.5 GHz
Peak double-precision performance:
KNL: 64 cores x 1.3 GHz x 8 (SIMD) x 2 (VPUs) x 2 (FMA) = 2662.4 GFLOP/s
Compare with:
KNC: 60 cores x 1 GHz x 8 (SIMD) x 2 (FMA) = 960 GFLOP/s
Sandy-Bridge: 2 sockets x 8 cores x 2.7 GHz x 4 (SIMD) x 2 (ALUs) = 345.6 GFLOP/s
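The three peak figures above are the same product of factors: sockets, cores, clock rate, SIMD lanes, vector units per core, and floating-point operations per lane and cycle. The sketch below reproduces them. The function name `peak_gflops` and its parameter names are illustrative. Reading the second "x 2" in the KNL line as two vector units per core is an interpretation of the slide, and the final factor 2 stands for FMA on KNL/KNC and for the two ALUs on Sandy-Bridge.

```python
def peak_gflops(cores, ghz, simd_lanes, vpus=1, flops_per_lane=2, sockets=1):
    """Theoretical peak in GFLOP/s as a plain product of hardware factors.

    flops_per_lane=2 models one fused multiply-add per lane and cycle
    (or, on Sandy-Bridge, one add plus one multiply on separate ALUs).
    """
    return sockets * cores * ghz * simd_lanes * vpus * flops_per_lane


# Values copied from the lines above; 8 and 4 SIMD lanes correspond to
# 512-bit and 256-bit vectors of 64-bit doubles.
knl = peak_gflops(cores=64, ghz=1.3, simd_lanes=8, vpus=2)
knc = peak_gflops(cores=60, ghz=1.0, simd_lanes=8)
snb = peak_gflops(cores=8, ghz=2.7, simd_lanes=4, sockets=2)

if __name__ == "__main__":
    print(f"KNL:          {knl:.1f} GFLOP/s")  # 2662.4
    print(f"KNC:          {knc:.1f} GFLOP/s")  # 960.0
    print(f"Sandy-Bridge: {snb:.1f} GFLOP/s")  # 345.6
```

These are upper bounds that assume every cycle issues full-width vector FMAs at the nominal clock; measured application performance is typically well below them.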
Xeon Phi References
● Books:
James Reinders, James Jeffers: Intel Xeon Phi Coprocessor High Performance Programming, Morgan Kaufmann Publ. Inc., 2013, http://lotsofcores.com ; new KNL edition in July 2016
Rezaur Rahman: Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers, Apress, 2013
Parallel Programming and Optimization with Intel Xeon Phi Coprocessors, Colfax, 2013, http://www.colfaxintl.com/nd/xeonphi/book.aspx
● Training material by CAPS, TACC, EPCC
● Intel Training Material and Webinars
● V. Weinberg (Editor) et al., Best Practice Guide - Intel Xeon Phi v2,
http://www.prace-ri.eu/best-practice-guide-intel-xeon-phi-january-2017/
and references therein
● Ole Widar Saastad (Editor) et al., Best Practice Guide – Knights Landing,
http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/
Acknowledgements
● IT4Innovations, Ostrava
● Partnership for Advanced Computing in Europe (PRACE)
● Intel
● BMBF (Federal Ministry of Education and Research)
● Dr. Karl Fürlinger (LMU)
● J. Cazes, R. Evans, K. Milfeld, C. Proctor (TACC)
● Adrian Jackson (EPCC)
And now …
Enjoy the course!