
Post-K Supercomputer - Fujitsu with Arm Collaboration with Linaro HPC-SIG/OpenHPC Collaboration with...

Date post: 26-May-2018
Upload: leliem
Transcript
Page 1

Post-K Supercomputer

Copyright 2017 FUJITSU LIMITED

Page 2

Fujitsu’s High-end HPC Development


Fujitsu has provided HPC systems with original technologies, developed for over 40 years, to accelerate advanced research

© RIKEN

The K computer continues to be competitive in various fields, from advanced research to manufacturing

K computer

RIKEN and Fujitsu are developing the Post-K to achieve superior application performance

Post-K computer

HPCG No. 1 (2017)

Graph500 No. 1 (2017)

Gordon Bell Prize Finalist (2016)

PRIMEHPC FX10

PRIMEHPC FX100

Page 3

Japan’s Post-K Computer Development Project

Overview

• RIKEN and Fujitsu are developing the Post-K computer, which aims to be the most advanced general-purpose supercomputer in the world

Goals and Approaches

Project goal: Application performance
• Fujitsu-original CPU and interconnect
• Superior compiler optimization

Project goal: Power efficiency
• Effective use of hardware resources through a co-design approach

Project goal: User convenience
• Building the Arm HPC ecosystem
• Excellent application portability

Page 4

Post-K Fujitsu Original CPU and Interconnect

The CPU was designed to support the Arm SVE instruction set architecture (including FP16)

The CPU & Tofu maintain the programming models and provide high application performance

Functions & Architecture               Post-K           K computer
Processor
  Base ISA + SIMD extensions           Armv8-A + SVE    SPARC-V9 + HPC-ACE
  SIMD width [bits]                    512              128
  FMA (floating-point multiply-add)    ✔                ✔
  Inter-core barrier                   ✔                ✔
  Sector cache                         ✔ Enhanced       ✔
  Hardware “prefetch” assist           ✔ Enhanced       ✔
Interconnect
  Tofu                                 ✔ Enhanced       ✔
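As a quick worked check of the SIMD width row, the number of elements handled per vector register is the register width divided by the element width. The helper below is a minimal sketch; `simd_lanes` is a hypothetical name used only for illustration, and the result is register arithmetic only, not a statement about achieved throughput.

```c
#include <assert.h>

/* Elements of a given width that fit in one SIMD register.
   Hypothetical helper for illustration; not part of any Fujitsu or Arm API. */
static unsigned simd_lanes(unsigned vector_bits, unsigned element_bits) {
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

With the widths from the table, `simd_lanes(512, 64)` is 8 double-precision lanes for Post-K against `simd_lanes(128, 64)` = 2 for the K computer, and `simd_lanes(512, 16)` is 32 lanes for FP16.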

Page 5

Our Approach to Post-K High Performance

The compiler cooperates with hardware to improve performance.

• Designed to satisfy both performance/power and usability

Improve memory bandwidth (memory bandwidth-intensive applications)

Compiler
• Software Prefetch
• Loop-Blocking

Hardware
• Stacked Memory

Improve computational efficiency (calculation-intensive applications)

Compiler
• Software Pipelining with Loop Fission
• Auto-Vectorization with SVE

Hardware
• Out-of-Order
• SVE
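Two of the compiler techniques named on this slide, loop blocking and software prefetch, can be illustrated by hand in C. This is only a sketch of the general transformations, not the output of Fujitsu's compiler: the tile size, the prefetch distance, and the function names are assumptions chosen for illustration, and `__builtin_prefetch` is the GCC/Clang builtin rather than anything Post-K specific.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32           /* assumed tile edge; would be tuned to the cache */
#define PREFETCH_DIST 64  /* assumed prefetch distance, in elements */

/* Loop blocking: transpose an n x n matrix tile by tile, so the rows read
   and the columns written within one tile can stay cache-resident. */
static void transpose_blocked(double *dst, const double *src, size_t n) {
    assert(dst != src);  /* this sketch does not support in-place transpose */
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t i = ii; i < ii + TILE && i < n; i++)
                for (size_t j = jj; j < jj + TILE && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}

/* Software prefetch: request data PREFETCH_DIST elements ahead of its use. */
static double sum_prefetch(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&x[i + PREFETCH_DIST], 0, 3); /* read, keep in cache */
        s += x[i];
    }
    return s;
}
```

Both functions compute the same results as their untransformed loops; only the memory access order (blocking) or the timing of memory requests (prefetch) changes.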

Page 6

“Smart” Auto-Vectorization for Arm SVE: Next-Generation SIMD ISA

# of kernels vectorized on TSVC*

TSVC (total kernels)    K computer    Post-K Goal
Fortran (135)           89            111
C (151)                 108           121

*[Fortran] D. Callahan, J. Dongarra, and D. Levine, “Vectorizing compilers: a test suite and results,” in Supercomputing '88, pp. 98-105.
*[C] S. Maleki, Y. Gao, M. J. Garzarán, T. Wong, and D. A. Padua, “An Evaluation of Vectorizing Compilers,” PACT 2011, pp. 372-382.

// Loop s482 in TSVC kernels
// is vectorized by SVE
for (int i = 0; i < LEN; i++) {
    a[i] += b[i] * c[i];
    if (c[i] > b[i]) break;
}
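To see why per-lane predication makes a loop with an early `break` vectorizable, the s482 loop can be emulated in plain C one vector-sized chunk at a time. This is a minimal sketch under stated assumptions: `VL` stands for an assumed 8 double-precision lanes, the predicate is modeled as an `int` array, and the function names are hypothetical. It uses no SVE intrinsics and is not the code the Fujitsu compiler generates.

```c
#include <assert.h>
#include <stddef.h>

#define VL 8  /* assumed lanes per vector: 512 bits / 64-bit double */

/* Scalar reference: TSVC s482 as shown on the slide. */
static void s482_scalar(double *a, const double *b, const double *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] += b[i] * c[i];
        if (c[i] > b[i]) break;
    }
}

/* Chunked emulation of predicated execution. */
static void s482_predicated(double *a, const double *b, const double *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t base = 0; base < n; base += VL) {
        int pred[VL];
        int stop = 0;
        /* Build the predicate: lanes inside the array, up to and including
           the first lane whose break condition holds. */
        for (size_t l = 0; l < VL; l++) {
            size_t i = base + l;
            pred[l] = (i < n) && !stop;
            if (pred[l] && c[i] > b[i]) stop = 1;
        }
        /* Multiply-add only in the active lanes. */
        for (size_t l = 0; l < VL; l++)
            if (pred[l]) a[base + l] += b[base + l] * c[base + l];
        if (stop) break;  /* some lane hit the break: leave the loop */
    }
}
```

The lane that triggers the break stays active because the scalar loop updates `a[i]` before testing the condition; every later lane in the chunk is masked off, which is what keeps the chunked version equivalent to the scalar one.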

Arm SVE Features (Fujitsu contributed to specifications)

• Per-lane predication
• Gather-load and scatter-store
• HPC-focused instructions, e.g. reciprocal inst., math. acceleration inst., etc.

Advantages of Post-K w/ Fujitsu Compiler

• Efficient vector utilization, e.g. gather/scatter and packed SIMD
• Highly optimized executables, e.g. utilizing deep knowledge of ISA and Post-K microarchitecture

High vectorization rate

Smart Auto-Vectorization of the following loops:

- containing “if” statement
- containing list-access
- partial vectorization
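For concreteness, the three loop kinds listed above might look like the following in C. These are illustrative shapes only, written under assumptions: "list-access" is read here as indexing through a list of indices, the function names are hypothetical, and nothing below claims which of these the Fujitsu compiler vectorizes or how.

```c
#include <assert.h>
#include <stddef.h>

/* Loop containing an "if": the condition can become a per-lane predicate. */
static void clamp_negative(double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (a[i] < 0.0)
            a[i] = 0.0;
}

/* Loop with list access: table[idx[i]] maps naturally onto a gather-load. */
static void gather_add(double *out, const double *table,
                       const size_t *idx, size_t table_len, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < table_len);  /* indices must stay inside the table */
        out[i] += table[idx[i]];
    }
}

/* Candidate for partial vectorization: the running sum carries a dependence
   between iterations, while the scaling statement does not. */
static void prefix_and_scale(double *sum, double *scaled,
                             const double *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        sum[i] = (i ? sum[i - 1] : 0.0) + x[i];  /* loop-carried dependence */
        scaled[i] = 2.0 * x[i];                  /* independent per element */
    }
}
```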

Page 7

“Smart” Loop Fission: Increasing the Opportunity for “Software Pipelining” (SWP)

Enables SWP of large loops by reducing the number of required registers, improving performance

Algorithm pre-trained by machine learning

• Trained on millions of patterns, machine learning determines the best weighting for where to fission

• Machine learning evaluates the number of registers, memory accesses, cache hit ratio, etc.

[Figure: Fujitsu’s Software Pipelining with Smart Loop Fission. Labels: a large basic block (one loop, instructions 1 to 10) with insufficient registers for SWP; Smart Loop Fission by a machine-learned algorithm; fissioned basic blocks (two loops), each sufficient for SWP; SWPed basic blocks.]
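The loop fission described on this slide can be sketched by hand in C. In the example below, one loop body that computes two independent results is split into two loops, so each loop keeps fewer values live per iteration. The split point is chosen manually and the function names are hypothetical; the machine-learned selection of where to fission is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Before fission: both results are computed in one body, so the temporaries
   of both statements compete for registers in the same iteration. */
static void update_fused(double *restrict x, double *restrict y,
                         const double *restrict a, const double *restrict b,
                         size_t n) {
    assert(x != y);
    for (size_t i = 0; i < n; i++) {
        x[i] = a[i] * b[i] + a[i] / (b[i] + 1.0);
        y[i] = a[i] * a[i] - b[i] / (a[i] + 1.0);
    }
}

/* After fission: each loop has a smaller body and needs fewer registers,
   which leaves more room for the overlapped iterations that SWP creates. */
static void update_fissioned(double *restrict x, double *restrict y,
                             const double *restrict a, const double *restrict b,
                             size_t n) {
    assert(x != y);
    for (size_t i = 0; i < n; i++)
        x[i] = a[i] * b[i] + a[i] / (b[i] + 1.0);
    for (size_t i = 0; i < n; i++)
        y[i] = a[i] * a[i] - b[i] / (a[i] + 1.0);
}
```

The split is legal here because neither statement reads what the other writes. The trade-off is that `a` and `b` are now streamed through memory twice, which is presumably one reason the slide lists memory access and cache hit ratio among the factors the algorithm evaluates.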

Page 8

Effectiveness of Software Pipelining (SWP) with “Smart” Loop Fission

NICAM-DC-MINI

• A benchmark from one of the world's most famous climate simulations (search “NICAM-DC-MINI”)

SWP with Smart Loop Fission improves performance


(Source: http://www.riken.jp/pr/topics/2013/20130920_1/)

[Figure: NICAM-DC-MINI single-core breakdown of execution time on FX100 (w/ 32 SIMD registers). Y-axis: normalized execution time. Bars: SWP without Loop Fission, SWP with Loop Fission. Legend: 4 inst. committed; 2-3 inst. committed; 1 inst. committed; Wait (calculation); Wait (data access). Annotations: 31% speedup; Wait (calculation) reduced.]

Page 9

Building the Arm HPC Ecosystem

Fujitsu collaborates closely with partners and communities to contribute to the prosperity of the Arm HPC ecosystem, making Arm systems easy to use

Arm HPC Ecosystem

Hardware: Open & Efficient HPC Spec.

Middleware: Porting & Optimizing HPC Software Stacks

Application: Porting & Tuning HPC Applications

Page 10

Fujitsu’s Contributions for the Arm HPC Ecosystem

Layers: Application, Middleware, Hardware

Timeline: 2017 to 2020

• Porting & Tuning HPC applications
• Developing an SVE simulator
• Contributed to SVE specifications
• Extending Linux OS to support SVE and flexible RAS framework
• Augmenting OSS compiler for HPC (e.g. Clang)
• Optimizing scientific libraries
• Porting scientific libraries (PLASMA/SCOTCH/SLEPc ported)

Collaboration with Arm

Collaboration with Linaro HPC-SIG/OpenHPC

Collaboration with various users, developers, vendors

Fujitsu is collaborating with Arm, Linaro HPC-SIG, and OpenHPC on many projects

Page 11

Appendix: What is Software Pipelining?

Instruction scheduling that overlaps the instructions of one loop iteration with those of the following iterations

Executes as many instructions in parallel as possible, more efficiently

[Figure: Concept Image of Software Pipelining. The original loop (instructions execute in order, each followed by waits) is scheduled into a software pipelined loop with a prologue, kernel, and epilogue; iterations are split, shifted, and overlaid. *All instructions need 2 cycles to finish.]

[Figure: Effect of SWP in K computer, 1 kernel of NICAM-DC (Dynamic Core of Non-hydrostatic Icosahedral Atmospheric Model). Y-axis: execution cycle count ratio (0 to 1). Bars: noSWP, withSWP. Annotation: 2.2x.]
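The prologue, kernel, and epilogue structure can be shown with a hand-pipelined loop in C. This is a source-level sketch of the idea only: a real compiler applies software pipelining at the instruction level with many more overlapped stages, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: each iteration loads a value, then multiplies and stores it. */
static void scale_plain(double *y, const double *x, double k, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] * k;
}

/* Two-stage software pipeline: the load for iteration i+1 is issued in the
   same kernel iteration as the multiply and store for iteration i. */
static void scale_pipelined(double *y, const double *x, double k, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    if (n == 0) return;
    double cur = x[0];                    /* prologue: start iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {  /* kernel: overlapped iterations */
        double next = x[i + 1];           /* stage 1 of iteration i+1 */
        y[i] = cur * k;                   /* stage 2 of iteration i */
        cur = next;
    }
    y[n - 1] = cur * k;                   /* epilogue: finish the last one */
}
```

Inside the kernel, the load does not depend on the multiply next to it, so the two can proceed in parallel instead of the multiply waiting for its own load, which is the kind of wait the concept image shows being removed.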

Page 12

Copyright 2016 FUJITSU LIMITED

