Hiroshi Nakamura
Director, Information Technology Center, The University of Tokyo
(Director of JCAHPC)
Introduction of Oakforest-PACS
Impacts of extreme scale computing (2017/11/2)
Outline
• Supercomputer deployment plan in Japan
• What is JCAHPC?
• Oakforest-PACS system
• Applications
• Summary
Computational Resource Providers of HPCI
• Tier 1: K Computer at RIKEN
• Tier 2: supercomputers of 9 universities and 2 research institutes, including Oakforest-PACS
Deployment plan of Tier-2 supercomputers (as of May 2017), available from the HPCI Consortium (www.hpci-c.jp)
[Figure: deployment roadmap of the Tier-2 supercomputers, with Oakforest-PACS highlighted]
Note: power consumption indicates the maximum power supply (including the cooling facility)
http://www.hpci-c.jp/news/HPCI-infra-summary.pdf
Towards Exascale Computing
[Figure: peak performance (1 to 1000 PF, log scale) versus year (2008-2020) for Japanese systems: T2K (U. of Tsukuba, U. of Tokyo, Kyoto U.), Tokyo Tech. TSUBAME2.0, the K Computer (RIKEN AICS), Oakforest-PACS (JCAHPC: The Univ. of Tokyo and Univ. of Tsukuba), the Post-K Computer, and a future exascale system]
Tier-1 and Tier-2 supercomputers move forward to exascale computing like two wheels.
JCAHPC
• Joint Center for Advanced High Performance Computing (http://jcahpc.jp)
• Director: Hiroshi Nakamura @ ITC, U-Tokyo
• Established in 2013 under an agreement between
  • the Information Technology Center (ITC) at The University of Tokyo
  • the Center for Computational Sciences (CCS) at University of Tsukuba
• Mission: design, operate, and manage a next-generation supercomputer system for researchers
  • a community of advanced HPC research
Procurement Policy of JCAHPC
• Joint procurement by the two universities
  • uniform specification, single shared system
  • each university is financially responsible for introducing and operating the machine
  • the first such attempt in Japan
• The largest class of budget among national universities' supercomputers in Japan
  • Oakforest-PACS: the largest-scale system in Japan
• Investment ratio: U. Tokyo : U. Tsukuba = 2 : 1
Oakforest-PACS
• Full operation started in December 2016
• Official program started in April 2017
• 25 PFLOPS peak performance
• 8,208 KNL CPUs
• Fat-tree interconnect (full bisection bandwidth) by Omni-Path
• HPL 13.55 PFLOPS: #1 in Japan (June 2017); worldwide #6 (Nov. 2016) → #7 (June 2017)
• HPCG: worldwide #3 (Nov. 2016) → #5 (June 2017)
Location of Oakforest-PACS: Kashiwa Campus of the University of Tokyo
[Map: Hongo Campus of U. Tokyo, Kashiwa Campus of U. Tokyo, and Univ. of Tsukuba]
Oakforest-PACS in the Room
2nd floor of the Kashiwa Research Complex
http://news.mynavi.jp/news/2016/12/02/035/
Specification of Oakforest-PACS
  Total peak performance: 25 PFLOPS
  Total number of compute nodes: 8,208
  Compute node
    Product: Fujitsu PRIMERGY CX600 M1 (2U) + CX1640 M1 x 8 nodes
    Processor: Intel® Xeon Phi™ 7250 (code name: Knights Landing), 68 cores, 1.4 GHz
    Memory, high BW: 16 GB, 490 GB/sec (MCDRAM, effective rate)
    Memory, low BW: 96 GB, 115.2 GB/sec (peak rate)
  Interconnect
    Product: Intel® Omni-Path Architecture
    Link speed: 100 Gbps
    Topology: fat-tree with (completely) full bisection bandwidth
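The 16 GB MCDRAM offers several times the bandwidth of the 96 GB main memory, so bandwidth-critical arrays benefit from being placed there. A minimal sketch in C, assuming the memkind/hbwmalloc library is available on the compute nodes and that the MCDRAM is exposed to it (e.g., flat mode); neither assumption is stated in the specification above:

/* Sketch: place a bandwidth-critical array in KNL MCDRAM via hbwmalloc.
 * Assumes the memkind library (link with -lmemkind); falls back to DDR. */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main(void)
{
    size_t n = 1u << 26;                       /* 64 Mi doubles = 512 MB, well under 16 GB */
    int on_hbw = (hbw_check_available() == 0); /* 0 means MCDRAM is usable */
    double *a = on_hbw ? hbw_malloc(n * sizeof(double))
                       : malloc(n * sizeof(double));
    if (a == NULL) return 1;

    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;                      /* streaming access, where MCDRAM bandwidth pays off */

    printf("MCDRAM used: %s, a[n-1] = %.1f\n", on_hbw ? "yes" : "no", a[n - 1]);
    if (on_hbw) hbw_free(a); else free(a);
    return 0;
}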
Computation node & chassis
[Photo] Computation node (Fujitsu next-generation PRIMERGY) with a single-chip Intel Xeon Phi (Knights Landing, 3+ TFLOPS) and an Intel Omni-Path Architecture card (100 Gbps)
[Photo] Chassis with 8 nodes, 2U size

Water cooling wheel & pipe
[Photos] Rack: 15 chassis (120 nodes) per rack; rear-panel radiator; water cooling pipe
Full bisection bandwidth Fat-tree by Intel® Omni-Path Architecture
• 12 Director Switches, 768 ports each (source: Intel)
• 362 Edge Switches, 48 ports each (24 uplinks and 24 downlinks per switch)
• Connected nodes:
    Compute nodes: 8,208
    Login nodes: 20
    Parallel FS: 64
    IME: 300
    Management, etc.: 8
    Total: 8,600
• All the nodes are connected with the full-bisection-bandwidth fat-tree
  • globally full bisection bandwidth is preferable for flexible job management
• 2/3 of the system: University of Tokyo; 1/3 of the system: University of Tsukuba
  • but job assignment is flexible (no boundary)
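A rough consistency check using only the counts above (a sketch; the source does not give the exact cabling layout), showing the port budget is large enough for full bisection bandwidth over all 8,600 endpoints:

362 \times 24 = 8688 \ \text{edge downlinks} \ \ge\ 8600 \ \text{endpoints}
362 \times 24 = 8688 \ \text{edge uplinks (one uplink per downlink, i.e. non-blocking)}
12 \times 768 = 9216 \ \text{director ports} \ \ge\ 8688 \ \text{uplinks}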
Specification of Oakforest-PACS (I/O)
  Parallel file system
    Type: Lustre File System
    Total capacity: 26.2 PB
    Product: DataDirect Networks SFA14KE
    Aggregate BW: 500 GB/sec
  File cache system
    Type: burst buffer, Infinite Memory Engine (IME, by DDN)
    Total capacity: 940 TB (NVMe SSD, including parity data by erasure coding)
    Product: DataDirect Networks IME14K
    Aggregate BW: 1,560 GB/sec
  Power consumption: 4.2 MW (including cooling); actually ~3.0 MW
  Number of racks: 102
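Large shared files on the Lustre file system are typically written with collective MPI-IO, passing Lustre striping hints so the file is spread over many OSTs and can approach the aggregate bandwidth. A minimal sketch in C; the output path and the ROMIO hint values are illustrative assumptions, not site-recommended settings:

/* Sketch: collective MPI-IO write with Lustre striping hints (ROMIO). */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                           /* 1 Mi doubles = 8 MB per rank */
    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) buf[i] = (double)rank;

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "64");     /* stripe over 64 OSTs (illustrative) */
    MPI_Info_set(info, "striping_unit", "4194304");  /* 4 MiB stripe size (illustrative) */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/lustre/example/output.dat",  /* hypothetical path */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Offset off = (MPI_Offset)rank * n * sizeof(double);
    MPI_File_write_at_all(fh, off, buf, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}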
Software of Oakforest-PACS
  OS: CentOS 7, McKernel (compute node); Red Hat Enterprise Linux 7 (login node)
  Compiler: gcc, Intel compiler (C, C++, Fortran), XcalableMP
  MPI: Intel MPI, MVAPICH2
  Library: Intel MKL; LAPACK, FFTW, SuperLU, PETSc, METIS, Scotch, ScaLAPACK, GNU Scientific Library, NetCDF, Parallel netCDF, Xabclib, ppOpen-HPC, ppOpen-AT, MassiveThreads
  Application: mpijava, XcalableMP, OpenFOAM, ABINIT-MP, PHASE system, FrontFlow/blue, FrontISTR, REVOCAP, OpenMX, xTAPP, AkaiKKR, MODYLAS, ALPS, feram, GROMACS, BLAST, R packages, Bioconductor, BioPerl, BioRuby
  Distributed FS: Globus Toolkit, Gfarm
  Job scheduler: Fujitsu Technical Computing Suite
  Debugger: Allinea DDT
  Profiler: Intel VTune Amplifier, Trace Analyzer & Collector
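With 68 cores per KNL node, jobs typically run hybrid MPI + OpenMP using the Intel compiler and Intel MPI listed above. A minimal sketch in C; the compile line in the comment and any rank/thread split are assumptions to be tuned per application:

/* Sketch: hybrid MPI + OpenMP "hello" for a many-core node.
 * Assumed compile line: mpiicc -qopenmp -xMIC-AVX512 hello.c -o hello */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        /* one line per (rank, thread) pair; serialize the prints */
        #pragma omp critical
        printf("rank %d of %d, thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}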
Post-K Computer and Oakforest-PACS as the two wheels of HPCI in Japan
• Oakforest-PACS fills the blank period between the K Computer and the Post-K Computer
  • installation of the Post-K Computer is planned for 2020-2021
  • shutdown of the K Computer is planned for 2018-2019 (?)
• System software developed for Post-K is already deployed on Oakforest-PACS
  • McKernel
    • an OS for the many-core era: a large number of thin cores, without OS jitter, with core binding
    • the primary OS (based on Linux) on Post-K, so application development can go ahead
  • XcalableMP (XMP)
    • a parallel programming language for directive-based, easy coding on distributed-memory systems
    • not explicit message passing as with MPI (see the sketch below)
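A minimal sketch of the XcalableMP global-view style in C (illustrative only; the node count, array size, and block distribution are arbitrary choices, and the code would be built with the XcalableMP compiler rather than a plain MPI compiler): the serial loops are annotated with directives that distribute the data and the iterations, with no explicit MPI calls.

/* Sketch: XcalableMP (XMP) global-view programming in C.
 * The directives distribute the array and the loops over 4 nodes. */
#include <stdio.h>
#define N 1024

#pragma xmp nodes p(4)
#pragma xmp template t(0:N-1)
#pragma xmp distribute t(block) onto p

double a[N];
#pragma xmp align a[i] with t(i)

int main(void)
{
    int i;
    double sum = 0.0;

#pragma xmp loop on t(i)
    for (i = 0; i < N; i++)
        a[i] = (double)i;            /* each node initializes only its own block */

#pragma xmp loop on t(i) reduction(+:sum)
    for (i = 0; i < N; i++)
        sum += a[i];                 /* partial sums are combined automatically */

#pragma xmp task on p(1)
    printf("sum = %.1f\n", sum);     /* executed on the first node only */

    return 0;
}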
Oakforest-PACS resource sharing programs (nation-wide)
• As JCAHPC (20%)
  • HPCI: the HPC Infrastructure program in Japan for sharing all supercomputers (free!)
  • Big Challenge special use (full system size)
• As U. Tokyo (56.7%)
  • Interdisciplinary Joint Research Program
  • General use
  • Industrial trial use
  • Educational use
  • Young & Female special use
• As U. Tsukuba (23.3%)
  • Interdisciplinary Academic Program
  • Large-scale general use
Applications on Oakforest-PACS
• ARTED (SALMON) – Electron Dynamics
• Lattice QCD – Quantum Chromodynamics
• NICAM & COCO – Atmosphere & Ocean Coupling
• GHYDRA – Earthquake Simulations
• Seism3D – Seismic Wave Propagation
Summary
• JCAHPC: a joint resource center for advanced HPC established by the Univ. of Tokyo and the Univ. of Tsukuba
  • for the community of advanced HPC research
• Oakforest-PACS is currently the #1 supercomputer in Japan and is available through nation-wide resource sharing programs
• Oakforest-PACS and Post-K: the two wheels of HPCI
  • Oakforest-PACS also serves as a testbed for McKernel and XcalableMP, system software supporting Post-K development
• Full-system-scale applications are under development at extreme scale and are producing new results
  • fundamental physics, global science, disaster simulation, material science, etc.