HETEROGENEOUS SYSTEM ARCHITECTURE: FROM THE HPC USAGE PERSPECTIVE
Haibo Xie, Ph.D.
Chief HSA Evangelist
AMD China
AGENDA:
GPGPU in HPC, what are the challenges
Introducing Heterogeneous System Architecture (HSA)
How HSA benefits GPGPU in HPC usage
Taking HSA to the Industry
3 HPC China 2012 | HSA: from the HPC usage perspective | Oct. 30, 2012
GPU IN HPC – WHAT ARE THE CHALLENGES?
Massively Parallel Processing?
Finding Parallelism?
SIMDs/Vector-Arrays?
Bringing Data to Computation?
Refining the Algorithm?
THE PROBLEM – WHY IS IT DIFFICULT?
Not every HPC domain-science programmer can use GPUs
– Effort spent tailoring the algorithm, even to the size of the problem
Code reuse remains an issue
– Algorithms, programming
Data transfer cost
– Distributed memory spaces between CPU and GPU encumber (legacy) programming models
– High software runtime overhead
Special-purpose devices that lack the necessary tools
– Hardware, tool-chain
BUT…
The US Department of Energy's 20 MW expectation
Getting performance is still a problem in general-purpose HPC
Hybrid computing became a common term; heterogeneity is now becoming the norm
An ExaScale system is probably going to end up being an optimization problem to solve
Several efforts are still targeted at utilizing GPUs in HPC
INTRODUCING HETEROGENEOUS SYSTEM ARCHITECTURE
Brings All the Processors in a System into Unified Coherent Memory
POWER EFFICIENT
EASY TO PROGRAM
FUTURE LOOKING
ESTABLISHED TECHNOLOGY FOUNDATION
OPEN STANDARD
INDUSTRY SUPPORT
HSA APU FEATURE ROADMAP
Physical Integration
– Integrate CPU & GPU in silicon
– Unified Memory Controller
– Common Manufacturing Technology
Optimized Platforms
– Bi-Directional Power Mgmt between CPU and GPU
– GPU Compute C++ support
– User mode scheduling
Architectural Integration
– Unified Address Space for CPU and GPU
– Fully coherent memory between CPU & GPU
– GPU uses pageable system memory via CPU pointers
System Integration
– GPU compute context switch
– GPU graphics pre-emption
– Quality of Service
– Extend to Discrete GPU
HSA COMPLIANT FEATURES – Optimized Platforms
GPU Compute C++ support
– Supports the OpenCL C++ directions and Microsoft's upcoming C++ AMP language. This eases programming of both CPU and GPU working together to process parallel workloads.
User mode scheduling
– Drastically reduces the time to dispatch work, requiring no OS kernel transitions or services and minimizing software driver overhead.
Bi-Directional Power Mgmt between CPU and GPU
– Enables "power sloshing", where CPU and GPU can dynamically lower or raise their power and performance depending on the activity and which one is better suited to the task at hand.
HSA COMPLIANT FEATURES – Architectural Integration
Unified Address Space for CPU and GPU
– The unified address space provides ease of programming for developers creating applications. On HSA platforms, a pointer is really a pointer and does not require separate memory pointers for CPU and GPU.
GPU uses pageable system memory via CPU pointers
– The GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference data directly in the CPU domain. In prior architectures, data had to be copied between the two spaces or page-locked prior to use. And no GPU memory size limitation!
Fully coherent memory between CPU & GPU
– Allows data to be cached by both the CPU and the GPU, and referenced by either. In all previous generations, GPU caches had to be flushed at command-buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU in an APU share a high-speed coherent bus.
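The "a pointer is really a pointer" point above can be illustrated with a small simulation. This is a conceptual sketch only: plain Python stands in for both devices, and the function names (`legacy_dispatch`, `hsa_dispatch`) are invented for illustration, not any real HSA runtime API.

```python
# Conceptual sketch: contrasts a legacy copy-based GPU model with
# HSA-style shared virtual memory. All names are illustrative.

def legacy_dispatch(host_data):
    """Legacy model: data must be copied into a separate device space."""
    device_copy = list(host_data)          # explicit host->device copy
    result = [x * 2 for x in device_copy]  # the 'kernel' runs on the copy
    return list(result)                    # explicit device->host copy

def hsa_dispatch(shared_data):
    """HSA model: the 'GPU' works on the very buffer the CPU allocated.
    No copies, and no limit imposed by a separate GPU memory size."""
    for i, x in enumerate(shared_data):    # kernel updates memory in place
        shared_data[i] = x * 2
    return shared_data                     # same object, not a copy

buf = [1, 2, 3]
out = hsa_dispatch(buf)
assert out is buf  # one address space: CPU and GPU see the same memory
```

In the legacy path the working set is bounded by device memory and paid for twice in transfers; in the shared path the kernel simply dereferences the CPU pointer.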
FULL HSA FEATURES – System Integration
GPU compute context switch
– GPU tasks can be context switched, making the GPU a multi-tasker. Context switching means faster application, graphics and compute interoperation; users get a snappier, more interactive experience.
GPU graphics pre-emption
– As more applications enjoy the performance and features of the GPU, system interactivity must remain good. This means low-latency access to the GPU from any process.
Quality of Service
– With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multiple users or multiple applications is either prioritized or equalized.
HSA SOLUTION STACK
System components:
– Compliant heterogeneous computing hardware
– A software compilation stack
– A user-space runtime system
– Kernel-space system components
Overall vision:
– Make the GPU easily accessible
  Support mainstream languages, expandable to domain-specific languages
  Complete GPU tool-chain; programming, debugging and profiling just like on the CPU
– Make compute offload efficient
  Direct path to the GPU (avoiding graphics overhead)
  Eliminate memory copies; low-latency dispatch
– Make it ubiquitous
  Drive HSA as a standard through the HSA Foundation
  Open source key components
[Stack diagram: applications and domain-specific libraries (Bolt, OpenCV™, and many others) sit on the HSA software layer (the HSA Runtime and the HSA Finalizer, which lowers HSAIL to the GPU ISA) alongside the OpenCL™, DirectX and other runtimes over legacy drivers, all running on differentiated hardware: CPU(s), GPU(s) and other accelerators.]
Hardware – APUs, CPUs, GPUs
[Side-by-side stack diagram: today's driver stack runs apps over domain libraries, the OpenCL™ 1.x and DX runtimes, user-mode drivers and the graphics kernel-mode driver; the HSA software stack runs apps over task-queuing libraries, HSA domain libraries, the HSA Runtime with its HSA JIT, and the HSA kernel-mode driver. AMD supplies the user-mode and kernel-mode components shown; all others are contributed by third parties or AMD.]
HETEROGENEOUS COMPUTE DISPATCH
How compute dispatch operates today in the driver model
How compute dispatch improves tomorrow under HSA
HSA COMMAND AND DISPATCH: CPU <-> GPU
[Diagram: the application/runtime on CPU cores (CPU1, CPU2) dispatching work directly to the GPU.]
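The improved dispatch path can be sketched as an application writing work packets straight into a queue that a device thread drains, with no kernel transition on the dispatch path. Everything here (the queue, the worker thread, the packet layout) is a hypothetical stand-in for HSA's architected user-mode queues, not real driver code.

```python
# Illustrative sketch of user-mode queuing: dispatch is just a queue
# write, with no OS service call between application and 'device'.
import queue
import threading

work_queue = queue.Queue()   # stands in for a user-visible hardware queue
results = {}

def gpu_worker():
    """Toy 'GPU' that drains packets until it sees a stop sentinel."""
    while True:
        packet = work_queue.get()
        if packet is None:               # sentinel: shut the device down
            break
        job_id, fn, args = packet
        results[job_id] = fn(*args)      # execute the dispatched 'kernel'

device = threading.Thread(target=gpu_worker)
device.start()

# Dispatching work is a plain user-space queue write.
work_queue.put((1, lambda a, b: a + b, (2, 3)))
work_queue.put((2, sum, ([1, 2, 3],)))
work_queue.put(None)
device.join()                            # wait for the 'device' to finish
```

The point of the sketch is the dispatch cost: in the driver model each submission crosses into the kernel, while here (as under HSA) the producer and consumer share the queue memory directly.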
HSA INTERMEDIATE LAYER – HSAIL
HSAIL is a virtual ISA for parallel programs
– Finalized to the native ISA by a JIT compiler, or "Finalizer"
– Low level, for fast JIT compilation
Explicitly parallel
– Designed for data-parallel programming
Support for exceptions, virtual functions, and other high-level language features
Syscall methods
– GPU code can call directly into system services, IO, printf, etc.
Debugging support
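The finalizer idea (ship a portable virtual ISA, lower it on the target at run time) can be sketched with a toy three-operation "ISA". Both the ISA and the `finalize` function below are invented for illustration and bear no resemblance to real HSAIL encoding.

```python
# Toy sketch of the HSAIL workflow: a kernel ships as portable virtual
# ops, and a 'finalizer' lowers it to a native callable at run time.

def finalize(virtual_isa):
    """JIT-style finalizer: translate portable ops into a native callable."""
    ops = {
        "add": lambda x, k: x + k,
        "mul": lambda x, k: x * k,
        "neg": lambda x, _: -x,
    }
    # Resolve every op up front, so an unknown op fails at finalize time.
    steps = [(ops[name], k) for name, k in virtual_isa]

    def kernel(workitem):
        # One work-item of the data-parallel grid runs the lowered steps.
        for fn, k in steps:
            workitem = fn(workitem, k)
        return workitem
    return kernel

# The same portable program can be finalized on any compliant device.
program = [("mul", 3), ("add", 1)]       # i.e. f(x) = 3x + 1
kernel = finalize(program)
grid = [0, 1, 2]
assert [kernel(x) for x in grid] == [1, 4, 7]
```

Keeping the virtual ops simple is what makes the real finalizer fast: HSAIL is deliberately low level so this lowering step stays cheap at load time.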
HSA TAKING THE PLATFORM TO PROGRAMMERS
Balance between CPU and GPU for performance and power efficiency
Make GPUs accessible to a wider audience of programmers
– Programming models close to today's CPU programming models
– Enabling more advanced language features on the GPU
– Shared virtual memory enables complex pointer-containing data structures (lists, trees, etc.) and hence more applications on the GPU
– A kernel can enqueue work to any other device in the system (e.g. GPU->GPU, GPU->CPU), enabling task-graph style algorithms, ray tracing, etc.
Complete tool-chain for programming, debugging and profiling
HSA provides a compatible architecture across a wide range of programming models and HW implementations.
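The device-to-device enqueue bullet can be sketched as kernels that push follow-up work onto a shared queue without returning to the host. The toy round-robin scheduler and the `enqueue` helper below are hypothetical stand-ins, not HSA API.

```python
# Hedged sketch of device-side enqueue: a running 'kernel' schedules
# follow-up work for other devices directly, which is what makes
# task-graph style algorithms (and ray-tracing bounces) practical.
from collections import deque

task_queue = deque()
log = []

def enqueue(device, kernel, arg):
    """Device-side enqueue: any kernel may schedule work for any device."""
    task_queue.append((device, kernel, arg))

def tree_stage(arg):
    """A kernel that fans out children, e.g. one bounce of a ray tracer
    scheduling the next bounce without a round-trip to the host."""
    depth, value = arg
    log.append(value)
    if depth < 2:                          # fan out two child tasks
        enqueue("GPU", tree_stage, (depth + 1, value * 2))
        enqueue("GPU", tree_stage, (depth + 1, value * 2 + 1))

enqueue("GPU", tree_stage, (0, 1))
while task_queue:                          # toy scheduler drains the graph
    _device, kernel, arg = task_queue.popleft()
    kernel(arg)
```

Without device-side enqueue, every level of such a task tree would require a host round-trip; with it, the graph unfolds entirely between devices.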
HSA VALUES GPGPU – EASIER TO PROGRAM
Cacheable and coherent memory; more data structures can be freely shared
Support for more programming models: OpenCL, C++ AMP, OpenMP
Single source for all processors on the SoC
A pointer is a pointer!
Expressive runtime for rich high-level programming languages: C/C++, Java, Python, C#
HSA VALUES GPGPU – PERFORMANCE AND POWER EFFICIENCY
Pass pointers rather than moving data, supporting more problems with differing datasets
Reduced kernel launch time and efficient CPU/GPU communication
Hardware-managed queues and scheduling allow very low-latency communication between devices; good for both performance and power efficiency
Pre-emption and context switching: support for multiple concurrent GPU processes and preemptive multitasking of CPU/GPU resources
Bi-Directional Power Mgmt between CPU and GPU, and Turbo Core technology, for more power efficiency
HSA FOUNDATION INITIAL FOUNDERS
© Copyright 2012 HSA Foundation. All Rights Reserved.
[Founder logos, each represented by an executive whose name appeared in the original graphic: an ARM Fellow and VP of Technology, Media Processing; a Vice President, Marketing; a Senior Director, CTO Office; a Director, Linux Development Center; and a CVP, Heterogeneous Applications and Developer Solutions.]
AMD’S OPEN SOURCE COMMITMENT TO HSA
Component Name                 AMD Specific   Rationale
HSA Bolt Library               No             Enable understanding and debug
OpenCL HSAIL Code Generator    No             Enable research
LLVM Contributions             No             Industry and academic collaboration
HSA Assembler                  No             Enable understanding and debug
HSA Runtime                    No             Standardize on a single runtime
HSA Finalizer                  Yes            Enable research and debug
HSA Kernel Driver              Yes            For inclusion in Linux distros
We will open source our Linux execution and compilation stack
– Jump-start the ecosystem
– Allow a single shared implementation where appropriate
– Enable university research in all areas
THE FUTURE OF HETEROGENEOUS COMPUTING
The architectural path for the future is clear
– Programming patterns established on Symmetric Multi-Processor (SMP) systems migrate to the heterogeneous world
– An open architecture, with published specifications and an open source execution software stack
– Heterogeneous cores working together seamlessly in coherent memory
– Low-latency dispatch
– No software fault lines
APU servers will unleash GPGPU power in the HPC domain
WHERE ARE WE TAKING YOU?
Switch the compute, don't move the data!
Platform Design Goals:
– Every processor now has serial and parallel cores
– All cores capable, with performance differences
– Simple and efficient programming model
– Easy support of massive data sets
– Support for task-based programming models
– Solutions for all platforms
– Open to all
THANK YOU!
Access HSA:
http://developer.amd.com
http://hc.csdn.net
Haibo Xie:
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is
no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information
and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or
changes.
NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY
DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT,
SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED
HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in
this presentation are for informational purposes only and may be trademarks of their respective owners.
© 2012 Advanced Micro Devices, Inc.