Intel HPC Co-Design Activities in Europe
Hans-Christian Hoppe, Principal Engineer, Intel
RoMoL 2016 Workshop, UPC, Barcelona, March 17, 2016
Legal Disclaimers
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at https://www-ssl.intel.com/content/www/us/en/high-performance-computing/path-to-aurora.html.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Intel, the Intel logo, Xeon, Intel Xeon Phi, Intel Optane and 3D XPoint are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other
countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation. All rights reserved.
Outline
Scalable System Framework: HPC as an integrated HW/SW platform
Moore's Law and scaling: why it might go post-CMOS
Intel pathfinding collaborative R&D in Europe: a brief introduction to the six DCG joint R&D labs and Intel's HPC collaboration with BSC
Co-design projects in Framework 7 and Horizon 2020: DEEP and DEEP-ER on system-level heterogeneity; NEXTGenIO on how best to use NVDIMM technology
HPC is More than Just the CPU
Growing Challenges in HPC
"The Walls" — system bottlenecks: memory | I/O | storage; energy-efficient performance; space | resiliency | unoptimized software

Divergent infrastructure — resources split among modeling and simulation | Big Data analytics | machine learning | visualization; heterogeneity

Barriers to extending usage — democratization at every scale | cloud access | exploration of new parallel programming models

[Diagram: HPC-optimized convergence of HPC, Big Data, machine learning and visualization]
A Holistic Architectural Approach is Required
[Figure: system performance capability over time, advanced through innovative technologies and tighter integration across compute, memory, fabric, storage and system software; on-package integration of cores, memory, graphics, fabric, FPGAs and I/O, including Intel Silicon Photonics]

Intel® Scalable System Framework: compute | memory/storage | fabric | software
OpenHPC: Community-driven Common HPC SW Platform
OpenHPC is a community-driven effort to
- Provide a common HPC SW platform that works across segments and enables end users to collaborate and innovate
- Simplify the complexity of installation, configuration and ongoing maintenance of HPC software stacks
- Receive contributions and feedback from the community
- Deliver integrated hardware and software innovations that ease the path to Exascale

Status
- Community established; v1.0.1 of the initial OpenHPC SW stack is available for download
- Significant participation by OEMs, users and ISVs/OSVs
- The community website at http://www.openhpc.community has all technical information
- We'd like to encourage you to join
Brief Look at Moore’s Law & Scaling
Process & Device Innovation for Cost, Performance, Architecture & Energy Benefits
32 nm → 22 nm → 14 nm → 10 nm → 7 nm → 5 nm
Moore’s Law: use half the space for the same circuitry (cost reduction) OR have twice the circuitry in the same space (architecture innovation)
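As a back-of-the-envelope illustration (my arithmetic, not from the slide): each process generation historically shrinks linear dimensions by roughly 0.7x, so the area of a given circuit halves:

\[
A_{\text{new}} \approx (0.7)^{2}\,A_{\text{old}} \approx 0.5\,A_{\text{old}}
\]

which is exactly the choice stated above: the same circuitry in half the space, or twice the circuitry in the same space.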
Device innovation is critical to realize power and performance benefits
- The green line shows the power/performance achievable with CMOS devices
- The blue line shows the scaling improvements for an alternative electronic device
- Some spintronic devices show additional power reductions
- A potential future: combinations of CMOS and non-CMOS devices

Sources: Bill Holt presentations at ISSCC 2016 and the 2015 Intel Investor Forum
Scaling going Forward: Lots of Options being Considered
Source: Bill Holt presentation, 2015 Intel Investor Forum
Intel Pathfinding R&D Labs in Europe
IPAG Labs in Europe
- Juelich: HPC system architecture
- Leuven: fast simulators, machine learning (EXCAPE)
- Geneva: high-throughput computing, Big Data
- Barcelona: scalable tools, programming models
- Paris: workloads, data center analytics
- Edinburgh: data science algorithms
System Architecture Work
Objectives
- Innovate with strategic local European partners
- Drive co-design for key usage scenarios
- Build and validate actual system prototypes

[Diagram: prototype generations — multicore + manycore with fast local storage (~1 Pflop/s); next-generation manycore with NVDIMMs and a new I/O architecture (~10 Pflop/s); heading towards 100 Pflop/s]

Towards system-level heterogeneity
- Deliver efficiency and scalability for real-world workloads
- Support flexible combination of resources
- Provide scalable and highly efficient I/O
- Demonstrate resiliency
Workload Analysis (Paris)
- Advance Exascale readiness of critical HPC applications
- Tools and methodologies for assisting the transition to future architectures
- Supporting Tera 1000 user codes: material science, fusion, life science, engineering
- Results to impact the wider ecosystem and community
- EXA2CT and READEX EU-funded projects
High Throughput Computing (Geneva)
- Apply upcoming technologies in the context of real-time processing at the LHC
- Design the readout chain of the new data acquisition system by 2019
- Specifically investigate the benefits of manycore processors, advanced fabrics and reconfigurable computing
- Challenge: address the 10x increase in bandwidth for high-rate data acquisition
- Partner: CERN, LHCb experiment
End-to-end Analysis and Modeling, Programming Models and Big Data (Barcelona)
- End-to-end performance analysis of large-scale systems: HPC and Big Data workloads; memory access and data placement analysis
- System-level performance predictions: capture the interaction of computation and communication; support architecture decisions and system configurations
- Scale-out extrapolations: capture workload and system scaling properties; critical to architect extreme-scale systems
- Dynamic programming models: achieve (power) efficiency within a node; accommodate dynamic load balancing and system control (malleability)
- Big Data and SDI: analyze scalable I/O (Lustre) and Big Data systems (Hadoop); scheduling and orchestration for SW-defined infrastructures
- Scalable System Framework: evaluate a reference SSF installation with tools and applications
Co-Design in European Projects
DEEP: System Level Heterogeneity
Conceptual idea
- Have different top-level parts of a system with different characteristics
- Execute each application/workflow component on the part that best matches its characteristics

Cluster/Booster implementation in DEEP
- COTS HPC Cluster (Intel® Xeon®) combined with a Manycore Booster (Intel® Xeon Phi™)
- Highly regular and scalable application parts run on the Booster and profit from high SIMD performance and parallelism
- Less scalable parts run on the Cluster and profit from out-of-order execution and high per-thread performance
- The Booster uses the high-performance EXTOLL 3D torus network
- The Booster Interface implements zero-copy network bridging
- A scalable high-frequency RAS plane on the Booster can assist in dynamic resource management
This work has received funding from the European Union's Seventh Framework Programme under grant agreements 287530 (DEEP) and 610476 (DEEP-ER)
DEEP/DEEP-ER: Programming Model and Applications
The programming model and runtime system are based on standard, proven solutions
- MPI spans both system parts and supports process management
- Extensions to the OmpSs model accommodate the collective offload of MPI-parallel application parts (see the sketch below)
- A tight co-design loop with the applications gets this right
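As a rough illustration of collective offload — not the project's actual API, which wraps this in OmpSs offload pragmas — the minimal sketch below uses only standard MPI dynamic process management; the executable name booster_kernel and the node count of 8 are hypothetical:

    /* Minimal sketch of the Cluster/Booster offload idea using only
     * standard MPI dynamic process management. DEEP's real programming
     * model wraps this in OmpSs offload pragmas; "booster_kernel" and
     * the node count are illustrative only. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm booster;   /* intercommunicator to the offloaded part */
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* All Cluster ranks collectively spawn an MPI-parallel kernel
         * onto (hypothetically) 8 Booster nodes. */
        MPI_Comm_spawn("booster_kernel", MPI_ARGV_NULL, 8, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &booster, MPI_ERRCODES_IGNORE);

        if (rank == 0) {
            /* Hand input to the offloaded part and collect its result;
             * a single scalar stands in for the real data set. */
            double in = 1.0, out;
            MPI_Send(&in, 1, MPI_DOUBLE, 0, 0, booster);
            MPI_Recv(&out, 1, MPI_DOUBLE, 0, 0, booster, MPI_STATUS_IGNORE);
        }

        MPI_Comm_disconnect(&booster);
        MPI_Finalize();
        return 0;
    }

The point of the Cluster/Booster split is visible here: the spawning (Cluster) side keeps the irregular control flow, while the spawned group runs the regular, highly parallel kernel.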
Applications
- Radio astronomy (SKA, Astron)
- Electromagnetic fields (INRIA)
- Seismic simulation (LRZ, TU Munich)
- FWI seismic imaging (BSC)
- Lattice QCD (Uni Regensburg)
- CFD & combustion (CERFACS)
- Materials Monte-Carlo (CINECA)
- RTM seismic imaging (CGG)
- Brain simulation (HBP, EPFL)
- Space weather (KU Leuven)
- Climate (Cyprus Institute)
This work has received funding from the European Union's Seventh Framework Programme under grant agreements 287530 (DEEP) and 610476 (DEEP-ER)
DEEP: Prototype System(s)
Booster part
- 384 Intel® Xeon Phi™ (KNC) nodes
- EXTOLL NICs implemented in FPGAs (20 Gbit/s per link)
- Eurotech boards and direct liquid cooling
- Peak performance of 400 Tflop/s, energy efficiency of 3.5 Gflop/s/W

Cluster part
- 128 dual-socket Intel® Xeon® nodes (Sandy Bridge)
- InfiniBand™ QDR network

Immersion-cooled "ASIC Evaluator"
- 32 Intel® Xeon Phi™ (KNC) nodes
- EXTOLL NICs implemented in ASICs (60 Gbit/s per link)
- Innovative two-phase immersion cooling (GreenICE™)
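As a quick sanity check (my arithmetic, not from the slide), the quoted Booster peak performance and energy efficiency together imply a power envelope of roughly

\[
P \approx \frac{400\ \mathrm{Tflop/s}}{3.5\ \mathrm{Gflop/s\,/\,W}}
  = \frac{400 \times 10^{12}}{3.5 \times 10^{9}}\ \mathrm{W}
  \approx 114\ \mathrm{kW}.
\]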
This work has received funding from the European Union's Seventh Framework Programme under grant agreements 287530 (DEEP) and 610476 (DEEP-ER)
DEEP-ER: Technology Refresh, I/O and Resiliency
Improvements
- Use 2nd-generation Intel® Xeon Phi™ processors for the Booster
- Have the EXTOLL network span the whole system
- Add local storage to the Booster nodes

New concepts in DEEP-ER
- PCIe-attached local NVMe storage for the Booster nodes
- Proof of concept for network-attached memory (NAM) as a shared memory resource
- Adaptation of a high-performance parallel file system (BeeGFS)
- Distributed, multi-level checkpoint/restart scheme with redundancy (see the sketch below)
- Automatic task-based resiliency added to OmpSs
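A minimal sketch of the multi-level checkpointing idea, assuming hypothetical mount points /nvme (node-local) and /beegfs (global parallel file system) and a made-up flush interval; the project's actual scheme, with redundancy across nodes, is more sophisticated:

    /* Two-level checkpointing sketch: frequent checkpoints go to fast
     * node-local NVMe, and every Nth checkpoint is also written to the
     * parallel file system. Paths and the interval are illustrative. */
    #include <stdio.h>

    #define FLUSH_INTERVAL 10   /* every 10th checkpoint goes global */

    static void write_file(const char *path, const void *buf, size_t n)
    {
        FILE *f = fopen(path, "wb");
        if (f) { fwrite(buf, 1, n, f); fclose(f); }
    }

    void checkpoint(int step, int rank, const void *state, size_t n)
    {
        char path[256];

        /* Level 1: fast local NVMe -- cheap, survives a process crash */
        snprintf(path, sizeof path, "/nvme/ckpt_r%d_s%d.bin", rank, step);
        write_file(path, state, n);

        /* Level 2: parallel file system -- slower, survives node loss */
        if (step % FLUSH_INTERVAL == 0) {
            snprintf(path, sizeof path, "/beegfs/ckpt_r%d_s%d.bin", rank, step);
            write_file(path, state, n);
        }
    }

The design intent is that the cheap local level keeps the checkpoint frequency high, while the expensive global level provides protection against whole-node failures at a lower frequency.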
This work has received funding from the European Union's Seventh Framework Programme under grant agreements 287530 (DEEP) and 610476 (DEEP-ER)
DEEP-ER: Prototype Design
Booster based on Eurotech's next-generation Aurora Blade platform
- Each blade carries an Intel® Xeon Phi™ (KNL) CPU with 16 GByte of MCDRAM and 96 GByte of DDR4 memory on board
- The backplane routes two PCI Express generation 3 links to a root card
- The root card manages the blade boards and provides PCI Express add-in card slots for the EXTOLL NIC and Intel NVMe devices
- Leverages Eurotech's direct liquid cooling technology
- The EXTOLL NIC will provide seven links with 100 Gbit/s bandwidth each
- The Booster system will use a 3D torus topology

NAM based on a Xilinx Virtex 7 FPGA with HMC memory

Cluster integrated by Megware
- 16 standard dual-socket Intel® Xeon® (Haswell) nodes
- Two storage servers and one metadata server
This work has received funding from the European Union's Seventh Framework Programme under grant agreements 287530 (DEEP) and 610476 (DEEP-ER)
DEEP/DEEP-ER: Partners and Links
Links
- DEEP homepage: http://www.deep-project.eu/
- DEEP-ER homepage: http://www.deep-er.eu/
NEXTGenIO: Leverage Non-Volatile DIMM Technology
Intel's 3D XPoint™ technology adds a new layer to the memory hierarchy
- Very large memory capacity plus persistence
- Access speeds within an order of magnitude of DDR4

NEXTGenIO investigates how to leverage and influence this technology for
- Truly scalable and highly efficient parallel I/O
- Memory-hungry HPC and data analytics applications
- Optimization of system throughput for highly complex workflows

NEXTGenIO develops
- A comprehensive and systematic requirements analysis
- A system prototype based on Intel® Xeon® nodes
- System SW components for NVDIMM access, data-aware scheduling and object storage (see the sketch below)
- Tools for analyzing and optimizing applications and workflows
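A minimal sketch of the kind of NVDIMM access NEXTGenIO targets, assuming a hypothetical persistent-memory-backed mount point /mnt/pmem; it uses only plain POSIX mmap/msync, whereas a real deployment would use a DAX mapping and a persistent-memory library rather than msync():

    /* Byte-addressable persistence sketch: map a file from an
     * (assumed) NVDIMM-backed file system into the address space and
     * update it with ordinary loads and stores instead of read()/write()
     * I/O. Path and region size are illustrative only. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 1 << 20;                  /* 1 MiB region */
        int fd = open("/mnt/pmem/state.bin", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, (off_t)len) != 0)
            return 1;

        /* Map the region; the stores below go directly to it. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            return 1;

        strcpy(p, "persistent application state");   /* an ordinary store */
        msync(p, len, MS_SYNC);                      /* force durability  */

        munmap(p, len);
        close(fd);
        return 0;
    }

This is what "adds a new layer to the memory hierarchy" means in practice: the same data is both memory (load/store access) and storage (survives a restart), which is what makes both the scalable-I/O and the memory-hungry-application use cases above plausible.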
This work has received funding from the European Union's Horizon 2020 Programme under grant agreement 671591.
NEXTGenIO: Partners and Links
Links
- NEXTGenIO homepage: http://www.nextgenio.eu/