© NVIDIA Corporation 2013
CUDA TRAINING DAY – BIRMINGHAM UNIVERSITY
Jeremy Purches & Dr Timothy Lanfear
31st July 2013
AGENDA
10:30 Introduction to GPU Computing
Introduction to NVIDIA & GPU Products – Jeremy Purches
Introduction to GPU programming – Tim Lanfear
Programming languages (CUDA, OpenACC)
Programming environment and tools
12:00 Lunch
13:00 GPU programming with CUDA
Hands-on training – Tim Lanfear
16:00 Close
Course feedback/survey: http://www.nvidia.co.uk/object/gpu-computing-survey-uk.html
NVIDIA GPU TECHNOLOGY
Jeremy Purches, HPC Business Development Manager
NVIDIA Core Technologies and Brands
GPU: GeForce®, Quadro®, Tesla®
Mobile: Tegra®
Cloud: GRID™
NVIDIA GPU
The GPU is one of the most complex processors ever created, with more than 7 billion transistors.
NVIDIA has shipped over 1 billion GPUs.
GPU Roadmap
[Figure: double-precision GFLOPS per watt (log scale, 0.5 to 32) against year, 2008 to 2014. Architectures and headline features: Tesla (CUDA), Fermi (FP64), Kepler (Dynamic Parallelism), Maxwell (Unified Virtual Memory), Volta (Stacked DRAM).]
Tesla Kepler Family: World’s Fastest and Most Efficient HPC Accelerators

GPU    Single precision peak (SGEMM)  Double precision peak (DGEMM)  Memory size  Memory bandwidth (ECC off)  System solution       Target workloads
K20X   3.95 TF (2.90 TF)              1.32 TF (1.22 TF)              6 GB         250 GB/s                    Server only           Weather & Climate, Physics, BioChemistry, CAE, Material Science
K20    3.52 TF (2.61 TF)              1.17 TF (1.10 TF)              5 GB         208 GB/s                    Server + Workstation  Weather & Climate, Physics, BioChemistry, CAE, Material Science
K10    4.58 TF                        0.19 TF                        8 GB         320 GB/s                    Server only           Image, Signal, Video, Seismic
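As a rough illustration of how the K20X and K20 figures above relate, the sketch below computes what fraction of peak the GEMM numbers represent. It assumes the parenthesised values are measured SGEMM/DGEMM throughput, which is how the column headers appear to label them. The dictionary layout and the `gemm_efficiency` helper are hypothetical names introduced only for this example.

```python
# Peak and GEMM throughput in TFLOPS, copied from the Tesla Kepler Family slide.
# Assumption: the parenthesised slide values are measured GEMM throughput.
# K10 is omitted because the slide lists no GEMM figures for it.
boards = {
    "K20X": {"sp_peak": 3.95, "sgemm": 2.90, "dp_peak": 1.32, "dgemm": 1.22},
    "K20":  {"sp_peak": 3.52, "sgemm": 2.61, "dp_peak": 1.17, "dgemm": 1.10},
}


def gemm_efficiency(gemm_tflops, peak_tflops):
    """Return GEMM throughput as a fraction of the quoted peak."""
    return gemm_tflops / peak_tflops


for name, b in boards.items():
    sp = gemm_efficiency(b["sgemm"], b["sp_peak"])
    dp = gemm_efficiency(b["dgemm"], b["dp_peak"])
    print(f"{name}: SGEMM at {sp:.0%} of peak, DGEMM at {dp:.0%} of peak")
```

On these numbers, DGEMM runs much closer to its quoted peak than SGEMM does on both boards.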
Tesla Kepler Product Family
K20X: Double precision performance leader for the most demanding HPC applications
K20: Excellent DP for widest range of applications
K10: Highest memory bandwidth and single precision for seismic, signal, image, video, molecular dynamics
[Figure: bubble chart positioning K10, K20X, K20, M2090 and M2075. Bubble size is single precision SGEMM in teraflops (K10: 3 TF SGEMM).]
Performance on Leading Scientific Applications
K20X Relative Performance vs. dual-socket Sandy Bridge
[Figure: bar chart of speedup over the 2x CPU baseline (1x), axis 0 to 12. Applications in chart order: NAMD, WL-LSMS, AMBER, SPECFEM3D, Chroma. Speedup labels in chart order: 2.73x, 3.20x, 7.17x, 8.85x, 10.20x.]
2x CPU = 2x Sandy Bridge E5-2687, 3.10 GHz
1x Tesla K20X + 1x CPU = 1x Tesla K20X GPU; 1x Sandy Bridge E5-2687, 3.10 GHz
Developer Momentum Continues to Grow

                      2008     2013
CUDA-Capable GPUs     100M     430M
CUDA Downloads        150K     1.6M
Supercomputers        1        50
University Courses    60       640
Academic Papers       4,000    37,000
Performance of Accelerators
[Figure: total performance (PFLOPS, 0 to 40) by year, 2006 to 2012. Series: NVIDIA Kepler, NVIDIA Fermi, Intel Xeon Phi, IBM Cell, Other.]
19% of FLOPS from GPU systems
TITAN: World’s Fastest Open Science Supercomputer
18,688 Tesla K20X GPUs
27 Petaflops Peak, 17.59 Petaflops on Linpack
90% of Performance from GPUs
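The Titan figures can be cross-checked against the per-board numbers quoted earlier in the deck. The sketch below is only a back-of-envelope check. It assumes the 27 petaflops figure is double precision peak and takes 1.32 TF per K20X from the Tesla Kepler Family slide. All variable names are illustrative.

```python
# Figures quoted on the TITAN slide.
gpus = 18_688
system_peak_pf = 27.0   # assumed to be double precision peak
linpack_pf = 17.59

# Double precision peak per K20X, from the Tesla Kepler Family slide (assumption:
# the same per-board figure applies to the boards installed in Titan).
k20x_dp_peak_tf = 1.32

# Aggregate GPU peak in petaflops (1 PF = 1000 TF).
gpu_peak_pf = gpus * k20x_dp_peak_tf / 1000.0

# Share of system peak contributed by the GPUs, and Linpack efficiency vs. peak.
gpu_share = gpu_peak_pf / system_peak_pf
linpack_efficiency = linpack_pf / system_peak_pf

print(f"GPU peak: {gpu_peak_pf:.1f} PF ({gpu_share:.0%} of system peak)")
print(f"Linpack efficiency: {linpack_efficiency:.0%} of peak")
```

Under these assumptions the GPUs account for roughly nine tenths of the quoted peak, in line with the slide's 90% claim, and Linpack reaches about two thirds of peak.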
CSCS: Europe’s Fastest GPU Supercomputer
Switzerland’s Piz Daint, to be Powered by Tesla K20X
Astrophysics · Climate & Weather · Genomics · Geophysics · Material Science
World’s Most Energy Efficient Supercomputer
CINECA Eurora
“Liquid-Cooled” Eurotech Aurora Tigon
Greener than Xeon Phi, Xeon CPU
3150 MFLOPS/Watt
128 Tesla K20 Accelerators
$100k Energy Savings / Yr
300 Tons of CO2 Saved / Yr
[Figure: bar chart of MFLOPS/Watt (0 to 3000) comparing CINECA Eurora (Tesla K20), NICS Beacon (greenest Xeon Phi system) and C-DAC (greenest CPU system).]
e-Infrastructure South Consortium
EMERALD
• 84 HP SL390 G7 servers
• 372 NVIDIA M2090 GPUs
• Voltaire QDR IB Network
• Gnodal 10G Ethernet
• 135TB Panasas Storage
The UK's most powerful GPU-based supercomputer,
"Emerald", has been unveiled at the Science and Technology
Facilities Council's (STFC) Rutherford Appleton Laboratory (RAL).
Using the newly available technology, researchers will soon tackle areas
ranging from healthcare (Tamiflu and swine flu); astrophysics (a real-time
pulsar detection application for the forthcoming Square Kilometre Array
project); bioinformatics (analysis and statistical modelling of whole-genome
sequencing data); climate change modelling; complex engineering
systems; simulation of 3G and 4G communications networks; and the development
of new tools for processing and managing medical images.
Supercomputer Performance Development
iPhone 4s (1.02 Gflop/s)
Laptop (70 Gflop/s)
GPU (1.3 Tflop/s)
Accelerator Computing Now Mainstream
“Our end customer survey shows that 78.4% of HPC sites
are planning to include accelerators/coprocessors in
their next technical computing server purchase, up from
29% just 2 years ago.”
IDC HPC Market Survey, April 2013
The Era of Accelerated Computing is Here
[Figure: timeline from 1980 to 2020 showing the Era of Vector Computing, then the Era of Distributed Computing, then the Era of Accelerated Computing.]
GPUs Central To Computing
Sectors: Oil & Gas, Manufacturing, Media & Entertainment, Education/Research, Life Sciences, Government, Data Analytics, Finance
Named organisations: Air Force Research Laboratory, Chinese Academy of Sciences
Top Scientific Apps
Computational Chemistry: AMBER, CHARMM, GROMACS, LAMMPS, NAMD, DL_POLY
Material Science: QMCPACK, Quantum Espresso, GAMESS-US, Gaussian, NWChem, VASP
Climate & Weather: COSMO, GEOS-5, CAM-SE, NIM, WRF
Physics: Chroma, Denovo, GTC, GTS, ENZO, MILC
CAE: ANSYS Mechanical, MSC Nastran, SIMULIA Abaqus, ANSYS Fluent, OpenFOAM, LS-DYNA

Explosive Growth of GPU Accelerated Apps
[Figure: number of apps (0 to 200) by year, 2010 to 2012, split into accelerated and in development, annotated with a 40% increase and a 61% increase.]
Top Applications Now with Built-in GPU Support
Application market share by segment:
Molecular Dynamics: AMBER, NAMD, GROMACS, CHARMM, LAMMPS, DL_POLY; non-GPU apps
Digital Content Creation: Adobe CS, Apple Final Cut, Sony Vegas Pro, Avid Media Composer, Autodesk 3dsMax; other GPU apps; non-GPU apps
Quantum Chemistry: Gaussian, GAMESS, NWChem, CP2K, Quantum Espresso; non-GPU apps
Computer-Aided Engineering: ANSYS, Simulia Abaqus, MSC Nastran, Altair Radioss; non-GPU apps
ANSYS Fluent 14.5 Multi-GPU Demonstration
Multi-GPU acceleration of a 16-core ANSYS Fluent simulation of external aero
Xeon E5-2667 CPUs + Tesla K20X GPUs
2.9X Solver Speedup
[Figure: 16-core server node (two 8-core CPUs) with four GPUs, G1 to G4, comparing the CPU configuration with the CPU + GPU configuration.]
THANK YOU
Contact: [email protected]
Mobile: 07798 700 424
Course Survey: http://www.nvidia.co.uk/object/gpu-computing-survey-uk.html
www.nvidia.co.uk/testgpu