Date post: 08-Aug-2020
Member of the Helmholtz Association. High Performance Computing meets Radio Astronomy. Willi Homberg, German SKA Science Meeting, 12-13 February 2014, Bielefeld. Jülich Supercomputing Centre (JSC)
Transcript
Page 1: High Performance Computing meets Radio Astronomy Willi … · Top 10 ranks of Top500 (November 2013) Max memory capacity stagnating (~1.5 PiBytes) But: TOP500 provides selective view

Member of the Helmholtz Association

High Performance Computing meets Radio Astronomy

Willi Homberg

German SKA Science Meeting, 12-13 February 2014

Bielefeld

Jülich Supercomputing Centre (JSC)

Page 2

SKA Observatory Diagram

Credit: Peter Dudney, SKA project office

Page 3

SDP Element Concept

● Ingest processor including routing capability
■ Data rate out of correlator: 4670 / 842 GBytes/s (SURVEY/LOW), 1800 GBytes/s (Mid)
■ Max data rate into SDP: 995 GBytes/s (SURVEY/LOW), 255 GBytes/s (Mid)

● A data parallel processing system
■ local data buffer linked to ingest processor
■ multi-core CPU (multi-GPU) based compute system
■ Max computing load: 32 PFlop/s (SURVEY/LOW), 10 PFlop/s (Mid)
■ Emphasis is on the framework to manage the throughput
■ Max buffer size: 14 PBytes (SURVEY/LOW), 11 PBytes (Mid)
■ Hardware platform to be replaced on a short duty cycle

● Tiered data archive: fast-access (data < 1 year), higher latency (tape) archive
● Master Controller and data archives

Credit: Bojan Nikolic, SDP Management Team
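To get a feel for the buffer figures above, the following minimal sketch estimates how long the SDP buffer could absorb data at the maximum ingest rate. The rates and buffer sizes are taken from the slide; the decimal prefixes (1 PByte = 1e15 bytes), the function name and the dictionary layout are assumptions made for illustration only.

```python
# Figures from the slide. Decimal prefixes (1 PByte = 1e15 bytes,
# 1 GByte = 1e9 bytes) are an assumption, not stated in the source.
SDP = {
    "SURVEY/LOW": {"ingest_gbytes_per_s": 995, "buffer_pbytes": 14},
    "Mid": {"ingest_gbytes_per_s": 255, "buffer_pbytes": 11},
}


def buffer_fill_hours(ingest_gbytes_per_s, buffer_pbytes):
    """Hours until an empty buffer is full at the maximum ingest rate."""
    seconds = (buffer_pbytes * 1e15) / (ingest_gbytes_per_s * 1e9)
    return seconds / 3600.0


for name, cfg in SDP.items():
    print(f"{name}: buffer full after {buffer_fill_hours(**cfg):.1f} h")
```

Under these assumptions the buffer holds roughly 3.9 hours of data for SURVEY/LOW and roughly 12 hours for Mid.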

Page 4

Top500 Trends: Peak Performance

#1 Nov 2013: Tianhe-2
Peak: 54,902.4 TFlop/s
Linpack: 33,862.7 TFlop/s
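The gap between the peak and Linpack figures above can be expressed as an efficiency ratio. This is a small sketch using only the two numbers from the slide; the function name is an illustrative choice.

```python
def linpack_efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of theoretical peak performance reached in the Linpack run."""
    return rmax_tflops / rpeak_tflops


# Tianhe-2 figures from the slide (November 2013 list).
tianhe2 = linpack_efficiency(33862.7, 54902.4)
print(f"Tianhe-2 Linpack efficiency: {tianhe2:.1%}")
```

The ratio comes out at a little under 62 % of peak.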

Page 5

SDP Processing Example Implementation

Bojan Nikolic, 12.12.2013

Page 6

Processors: Increasing parallelism

● Limits of frequency increase have been reached
■ Typical clock rates of HPC processors: 2.5-4 GHz

● Increase of parallelism at various levels
■ Multiple to many cores
■ Simultaneous multi-threading
■ SIMD instructions: 128, 256, 512 bit operands, up to 16-way SP floating-point operations

● Relevant processor architectures
■ Intel Xeon: used by >80% of TOP500 systems; stable road-map
■ IBM POWER: small market share; new opportunities due to OpenPOWER
■ Other: AMD Opteron, Blue Gene/Q processor, SPARC64 VIIIfx

● Accelerators
■ NVIDIA Kepler: K20, K40
■ Intel Knights Corner: Xeon Phi 5110P, 7120
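The "up to 16-way SP" figure in the SIMD bullet above follows directly from the operand width: a 512-bit operand holds sixteen 32-bit single-precision values. A minimal sketch of that arithmetic (the function name is illustrative):

```python
SP_BITS, DP_BITS = 32, 64  # IEEE 754 single and double precision widths


def simd_lanes(register_bits, element_bits):
    """Number of elements that fit into one SIMD operand."""
    return register_bits // element_bits


for width in (128, 256, 512):
    print(f"{width}-bit: {simd_lanes(width, SP_BITS)} SP lanes, "
          f"{simd_lanes(width, DP_BITS)} DP lanes")
```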

NVIDIA Tesla K20X (1x Kepler GK110)
● Flops: 3.94 / 1.31 TFlops SP / DP
● Compute Units: 14
● Processing Elements: 192 / CU
● Total # PEs: 14 x 192 = 2688
● CU frequency: 732 MHz
● Memory: 6 GB (ECC), 384-bit
● Memory frequency: 5.2 GHz
● Memory bandwidth: 250 GB/s
● Power consumption: 235 W

INTEL Xeon Phi (MIC) Coprocessor 5110P
● Flops: 2.02 / 1.01 TFlops SP / DP
● Compute Units: 60 (Cores)
● Processing Elements: 16 / Core
● Total # PEs: 60 x 16 = 960
● Core frequency: 1.053 GHz
● Memory: 8 GB
● Memory bandwidth: 320 GB/s
● Power consumption: 225 W
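The single-precision peak figures of both accelerators can be reproduced from the processing element counts and clock rates listed above. The sketch below assumes that each processing element completes one fused multiply-add (counted as 2 flops) per cycle; that factor is an assumption, not stated on the slide, and the class and method names are illustrative.

```python
from dataclasses import dataclass

# Assumption: one fused multiply-add (2 flops) per processing element per cycle.
FLOPS_PER_PE_PER_CYCLE = 2


@dataclass
class Accelerator:
    name: str
    compute_units: int
    pes_per_unit: int
    clock_ghz: float
    power_watts: float

    def peak_sp_tflops(self):
        """Theoretical single-precision peak in TFlops."""
        pes = self.compute_units * self.pes_per_unit
        return pes * self.clock_ghz * FLOPS_PER_PE_PER_CYCLE / 1000.0

    def sp_gflops_per_watt(self):
        """Peak single-precision GFlops per watt of power consumption."""
        return self.peak_sp_tflops() * 1000.0 / self.power_watts


# Values taken from the two spec lists on the slide.
k20x = Accelerator("Tesla K20X", 14, 192, 0.732, 235)
phi = Accelerator("Xeon Phi 5110P", 60, 16, 1.053, 225)

for acc in (k20x, phi):
    print(f"{acc.name}: {acc.peak_sp_tflops():.2f} TFlops SP, "
          f"{acc.sp_gflops_per_watt():.1f} GFlops/W")
```

With this assumption the computed peaks agree with the slide's 3.94 and 2.02 TFlops to within rounding.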

Page 7

System Memory

● Top 10 ranks of Top500 (November 2013)

■ Max memory capacity stagnating (~1.5 PiBytes)

■ But: TOP500 provides selective view

■ Increasing number of systems with accelerators

● GORDON@SDSC
■ Architecture integrating a large number of SSDs
■ Rank #129 on the Top500 list

[Chart axis label: GiByte]

Page 8

Storage Technology Parameters

● Storage capacity
■ HDD: O(2) TBytes
■ SSD: laptop/desktop O(256) GBytes, enterprise O(1) TBytes

● Bandwidth
■ Data transfer rate from disk or storage system
■ Can be measured at different levels
■ HDD: O(150) MBytes/s
■ SSD: laptop/desktop O(400) MBytes/s, enterprise O(2.5) GBytes/s
■ System level: JUST3 66 GB/s, JUST4-GSS 160 GB/s

● IOPS
■ Number of I/O operations per second
■ Linked to bandwidth by request size
■ HDD: 75-210 IOPS
■ SSD: 8.6k-1.2M IOPS
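The statement that IOPS is linked to bandwidth by request size can be written as bandwidth = IOPS x request size. The sketch below illustrates this with the HDD figures from the slide; the 4 KiB and 1 MByte request sizes are hypothetical values chosen for illustration.

```python
def bandwidth_mbytes_per_s(iops, request_bytes):
    """Bandwidth reached when every I/O operation moves request_bytes."""
    return iops * request_bytes / 1e6


def iops_for_bandwidth(mbytes_per_s, request_bytes):
    """I/O operations per second needed to sustain a given bandwidth."""
    return mbytes_per_s * 1e6 / request_bytes


SMALL_REQUEST = 4096        # hypothetical 4 KiB random request
LARGE_REQUEST = 1_000_000   # hypothetical 1 MByte streaming request

# Upper HDD IOPS figure from the slide with small requests:
print(f"{bandwidth_mbytes_per_s(210, SMALL_REQUEST):.2f} MBytes/s")
# IOPS needed to reach the O(150) MBytes/s HDD bandwidth with large requests:
print(f"{iops_for_bandwidth(150, LARGE_REQUEST):.0f} IOPS")
```

With small random requests an HDD stays far below its streaming bandwidth, which is why both parameters are listed separately.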

Page 9

Storage Issues

● RAID rebuild time
■ JUST 3: large disks plus standard RAID6
  ■ long time to rebuild: observed rebuild times of about 22 hours
  ■ 3-4 failures per week
  ■ Risk of failures during time of rebuild: observed failure of 2nd disk during rebuild
  ■ Performance penalties during rebuild: noticeable storage server performance degradation
■ JUST 4: GPFS Storage Server (GSS)
  ■ Uses de-clustered RAID for faster rebuild
  ■ End-to-end integrity checksum

● Availability, scalability, maintainability
● Silent data corruption
● Model calculations assuming a UDE/IO rate of 10^-13 (estimated mean time to undetected error for a 1000-disk system over 5 years)
● JUST storage system comprises O(10,000) disks
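The rebuild risk and the silent data corruption estimate above can be approximated with a simple model. This is only a sketch under stated assumptions: failures are independent with a constant rate, 3.5 failures per week (the midpoint of the slide's 3-4) are spread over 10,000 disks, a RAID6 group has 10 disks (hypothetical), and each disk serves 100 IOPS (hypothetical, within the HDD range quoted earlier). It is not the model calculation referred to on the slide.

```python
import math

HOURS_PER_WEEK = 168.0
SECONDS_PER_YEAR = 365.0 * 24 * 3600


def second_failure_probability(failures_per_week, total_disks,
                               rebuild_hours, other_disks_in_group):
    """Probability that at least one further disk of the same RAID group
    fails while a rebuild is running (independent, constant failure rate)."""
    rate_per_disk_hour = failures_per_week / total_disks / HOURS_PER_WEEK
    expected = rate_per_disk_hour * rebuild_hours * other_disks_in_group
    return 1.0 - math.exp(-expected)


def mean_years_to_undetected_error(ude_per_io, disks, iops_per_disk):
    """Mean time until the first undetected error at a given UDE/IO rate."""
    ios_per_second = disks * iops_per_disk
    return 1.0 / (ude_per_io * ios_per_second) / SECONDS_PER_YEAR


# 22 h rebuild from the slide; 9 remaining disks in a hypothetical 10-disk group.
p = second_failure_probability(3.5, 10_000, 22, other_disks_in_group=9)
print(f"P(second failure during one rebuild) = {p:.2e}")

# UDE/IO rate of 1e-13 from the slide; 100 IOPS per disk is hypothetical.
years = mean_years_to_undetected_error(1e-13, disks=1000, iops_per_disk=100)
print(f"Mean time to undetected error: {years:.1f} years")
```

The per-rebuild probability is small, but it applies to every one of the several rebuilds per week, and a shorter rebuild window (as with de-clustered RAID) reduces it proportionally.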

JSC tape system:

● Actual capacity: 44.5 PBytes

● Maximum capacity: 100 PBytes

● Tapes: 16,600

● Libraries: 2 (at different locations in JSC)

● Transfer: T10000C up to 240 MB/s, T10000B/A up to 120 MB/s
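Two back-of-the-envelope figures follow from the tape system numbers above: the mean amount of data per tape and the time needed to stream the whole archive. The capacity, tape count and 240 MB/s rate are from the slide; the number of drives is not given there, so the value used below is purely hypothetical.

```python
ARCHIVE_PBYTES = 44.5        # actual capacity from the slide
TAPES = 16600                # tape count from the slide
DRIVE_MBYTES_PER_S = 240     # fastest drive rate from the slide
HYPOTHETICAL_DRIVES = 20     # assumption: drive count is not in the source


def mean_tbytes_per_tape(pbytes, tapes):
    """Mean data volume per tape in TBytes (mixed tape generations ignored)."""
    return pbytes * 1000.0 / tapes


def days_to_stream(pbytes, drives, mbytes_per_s):
    """Days needed to stream the archive once with all drives in parallel."""
    seconds = pbytes * 1e15 / (drives * mbytes_per_s * 1e6)
    return seconds / 86400.0


print(f"{mean_tbytes_per_tape(ARCHIVE_PBYTES, TAPES):.2f} TBytes per tape")
print(f"{days_to_stream(ARCHIVE_PBYTES, HYPOTHETICAL_DRIVES, DRIVE_MBYTES_PER_S):.0f} days")
```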

Page 10

Network link technologies and topologies

● InfiniBand
■ Top500 system share of 41.4 %

● Ethernet
■ Top500 system share of 42.4 %
■ Very large market

● Proprietary link technologies
■ TOFU
■ Blue Gene/Q
■ Aries
■ EXTOLL

● Processor network attachment
■ PCIe attached: most common approach; most systems still Gen2
■ Proprietary IO bus, e.g. Fujitsu HSIO
■ On-chip transceivers, e.g. BG/Q

● Key aspects
■ Nearest-neighbour connectivity
■ Network diameter
■ Bisection bandwidth

● Popular technologies
■ Fat tree
■ D-dimensional torus
■ Toroidal topologies like TOFU
■ Dragonfly
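The three key aspects listed above can be made concrete for a D-dimensional torus. The sketch below enumerates nearest neighbours, measures the network diameter by breadth-first search, and compares it with the closed-form value; the bisection count assumes the longest dimension has an even extent of at least 4. All function names are illustrative, and link counts would still have to be multiplied by a per-link bandwidth, which the slide does not give.

```python
from collections import deque
from math import prod


def torus_neighbours(node, dims):
    """Nearest neighbours of node in a torus with the given extents."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            other = list(node)
            other[axis] = (node[axis] + step) % size  # wrap-around link
            if tuple(other) != node:
                result.add(tuple(other))
    return result


def torus_diameter_bfs(dims):
    """Largest hop count from node (0, ..., 0); by symmetry the diameter."""
    start = tuple(0 for _ in dims)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in torus_neighbours(node, dims):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def torus_diameter(dims):
    """Closed form: at most half way around the ring in every dimension."""
    return sum(size // 2 for size in dims)


def torus_bisection_links(dims):
    """Links cut when halving the torus across its longest dimension
    (assumes that dimension has an even extent of at least 4)."""
    return 2 * prod(dims) // max(dims)


dims = (4, 4, 4)
print(len(torus_neighbours((0, 0, 0), dims)), "neighbours per node")
print(torus_diameter_bfs(dims), "hops diameter")
print(torus_bisection_links(dims), "links across the bisection")
```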

Page 11

HPC Energy Efficiency

● Energy efficiency in HPC
■ Top500
■ Green500
■ PUE

● Cooling
■ Air-cooling
■ Water-cooling
■ Warm-water
■ Free cooling

● Energy-aware scheduling
■ LRZ SuperMUC: configuration dependent on application profile; reduce power vs. decrease execution time
■ eeClust
■ Fit4Green

Courtesy of Erich Strohmaier, Lawrence Berkeley National Laboratory

[Chart annotations: 5.04x in 5 years; 3.13x in 5 years; 3.25x in 5 years]
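The three five-year factors from the chart annotations above are easier to compare as annual rates. The sketch below only converts the factors; which quantity each factor refers to is not recoverable from the transcript, so no interpretation is attached to them here.

```python
def annual_factor(total_factor, years):
    """Constant yearly factor that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


for total in (5.04, 3.13, 3.25):
    yearly = annual_factor(total, 5)
    print(f"{total}x in 5 y -> {yearly:.3f}x per year "
          f"(+{(yearly - 1.0) * 100:.0f} % per year)")
```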

Page 12

System Software and Management

● Operating system
■ support fault tolerance and fault-resiliency

● Interconnect management
■ adaptive and dynamic routing
■ congestion control

● Cluster management
■ on-the-fly analysis monitoring
■ post mortem data mining
■ health checking

● Resource management and job scheduling
■ load balancing
■ flexible allocation coupled with applications

● Energy efficiency

● Programming environment / basic porting
■ languages: C/C++, FORTRAN, Python
■ standards: MPI, OpenMP, OpenACC, OpenCL, CUDA, OmpSs
■ compiler, debugger

● tuning applications, performance analysis

Page 13

Performance Analysis on Extreme-Scale Systems

● Technical challenges
■ Heterogeneity
■ Extreme concurrency
■ Perturbation and data volume
■ Drawing insight from measurements
■ Quality information sources

● Steps
■ Instrumentation
■ Measurement: profiling (time, counts), tracing (events)
■ Filtering, reporting, examination
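The steps above can be illustrated with a toy example. The sketch below instruments a function with a decorator, collects both a profile (aggregated time and call counts) and a trace (time-stamped events), and then filters the result for reporting. It is a minimal illustration of the concepts, not the tooling discussed in the talk; `instrument`, `profile`, `trace` and `kernel` are hypothetical names.

```python
import time
from collections import defaultdict
from functools import wraps

# Profile: one aggregated record per region, size independent of run length.
profile = defaultdict(lambda: {"calls": 0, "seconds": 0.0})
# Trace: one record per event, grows with every call.
trace = []


def instrument(func):
    """Instrumentation step: wrap a function with measurement hooks."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        trace.append(("enter", func.__name__, start))
        try:
            return func(*args, **kwargs)
        finally:
            end = time.perf_counter()
            trace.append(("exit", func.__name__, end))
            entry = profile[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += end - start
    return wrapper


@instrument
def kernel(n):
    """Stand-in for an application region of interest."""
    return sum(i * i for i in range(n))


# Measurement step: run the instrumented code.
for _ in range(3):
    kernel(10_000)

# Filtering and reporting step: keep only regions called more than once.
report = {name: data for name, data in profile.items() if data["calls"] > 1}
for name, data in report.items():
    print(f"{name}: {data['calls']} calls, {data['seconds']:.6f} s")
print(f"{len(trace)} trace events recorded")
```

The trace grows with every call while the profile stays the same size, which is one reason data volume appears among the challenges; the measurement hooks themselves also add run time, which corresponds to the perturbation challenge.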

Page 14

Summary

● Processor technology
■ Increasing parallelism
■ Accelerator support

● Memory trends
■ DRAM, GDDR5, SSD

● Storage technology
■ Parameters
■ Issues

● Network link technologies and topologies
● Energy efficiency
● Software environment
■ System software and management
■ Performance analysis

Page 15

End of Presentation

Page 16

Jülich Supercomputing Centre

Supercomputer operation for

Centre – FZJ

Regional – JARA

Helmholtz & National – NIC, GCS

Europe – PRACE, EU committees

Application support

SimLabs

Cross Sectional Groups

Peer review coordination

R&D Work

Algorithms, performance analysis, and tools

Community data management service

Novel computer architectures:

Exascale laboratories: EIC (IBM), ECL (Intel), NVIDIA

Education and Training

Page 17

Processors and Accelerators

Page 18

Memory technologies

● DDR SDRAM
■ Mass market, clear road-map

● High-bandwidth memory
■ New, emerging solutions
■ Small, volatile market

● Dense memory
■ Technical limitations, unclear road-map
■ Large market

Page 19

Network link technologies and topologies

[M.Gerndt, 2013]

[Faanes et al., 2012]

Page 20

Comparison of selected Top500 systems

