GPU versus FPGA for high productivity computing
David Huw Jones, Adam Powell, Christos-Savvas Bouganis, Peter Y. K. Cheung
August 26, 2010
David Huw Jones | GPU versus FPGA for high productivity computing | 1 / 21
Overview
- Just another GPU vs FPGA comparison?
  - Productivity versus performance
  - First review of new COTS FPGA system, HC-1
  - Benchmarked on different process architectures
  - Analysis from point of view of HPC user
Contents
- Devices
- Benchmarks
- Results
- Conclusions
HC-1: What is it?
[Image courtesy of Convey Computer]
Cost: ∼£30k
Form: 2U
FLOPS: 65 G
Rated: 1.4 kW
Typical: 600 W
HC-1: What is it?
5x Virtex 5
8x Stratix II
128 GB DDR2 RAM
80 GB/s bandwidth
32x 300 MHz cores
GTX285: What is it?
[Image courtesy of photo.hardwarebistro.com]
Cost: ∼£1k
Form: PCI card
FLOPS: 1063 G
Rated: 550 W
Typical: 400 W
GTX285: What is it?
192x 1.2 GHz cores
1 GB DDR3 RAM
120 GB/s bandwidth
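The device figures above can be turned into rough efficiency numbers. The sketch below is only arithmetic on the quoted values: it assumes the "typical" power draw is the fair one to compare and treats the approximate costs (∼£30k, ∼£1k) as exact. The names `devices` and `efficiency` are illustrative, not from the talk.

```python
# Figures copied from the two device slides. Costs are approximate on the
# slides, so the per-pound numbers are indicative only.
devices = {
    "HC-1": {"gflops": 65.0, "typical_w": 600.0, "cost_gbp": 30_000.0},
    "GTX285": {"gflops": 1063.0, "typical_w": 400.0, "cost_gbp": 1_000.0},
}


def efficiency(spec):
    """Return (GFLOPS per watt, GFLOPS per pound) for one device."""
    per_watt = spec["gflops"] / spec["typical_w"]
    per_pound = spec["gflops"] / spec["cost_gbp"]
    return per_watt, per_pound


for name, spec in devices.items():
    per_watt, per_pound = efficiency(spec)
    print(f"{name}: {per_watt:.3f} GFLOPS/W, {per_pound:.4f} GFLOPS/GBP")
```

These are peak FLOPS figures, so the result says nothing about sustained performance on any particular benchmark.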
Development options
Development options considered
1. Why not custom personalities?
   - Convey estimates a 3-month development time
   - Average HPC job length 3 hours, max 24 days (SDSC)
   - Not OpenFPGA compliant, no open-source
2. Why not C-to-gates?
   - No compiler for HC-1 (yet)
   - Upper limit from application-specific personality benchmark
   - "Unfortunately, and despite 40 years of parallelizing compilers for all sorts of machines, [optimization] algorithms don’t work terribly well" (Ian Page, 2004)
Algorithmic skeletons
Benchmarks
1. Random number generation
   - using Mersenne Twister
2. Matrix multiplication
   - Floating point
   - 32 and 64 bit
3. Sum of vector
   - Floating point
   - 32 and 64 bit
4. N-body simulation
   - 2nd order
   - 2-D and 3-D
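For orientation, the first three benchmarks can be written as plain CPU reference kernels. This is a minimal sketch, not the code that was benchmarked: the function names are hypothetical, everything runs in Python's 64-bit floats (the 32-bit variants are not shown), and the random numbers come from CPython's standard `random` module, which uses the Mersenne Twister (MT19937).

```python
import random


def random_numbers(count, seed=0):
    """Benchmark 1: uniform floats in [0, 1) from the Mersenne Twister."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


def matmul(a, b):
    """Benchmark 2: naive dense product of two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    columns = list(zip(*b))  # transpose b once so each column is a tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def vector_sum(values):
    """Benchmark 3: sequential floating-point sum of a vector."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    print(random_numbers(3))
    print(matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
    print(vector_sum([1.0, 2.0, 3.5]))
```

Kernels like these only fix what is being computed; the interesting differences between platforms come from how the work is parallelised.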
Architecture and implementation of benchmarks
Benchmarks: 1 2 3 4
Process architecture
  Asynchronous pipeline: X X
  Tree computation: X
  Crowd computation: X
GPU process implementation
  Optimised software: X X X
FPGA process implementation
  Optimised software: X X
  Optimised firmware: X
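A tree computation can be illustrated with a pairwise reduction. The sketch below assumes this is the architecture behind the sum-of-vector benchmark, which is a guess: the mapping of benchmarks to architectures is not recoverable from the table as reproduced here. The name `tree_sum` is hypothetical.

```python
def tree_sum(values):
    """Sum by pairwise reduction.

    Returns (total, depth), where depth is the number of reduction levels.
    All additions within one level are independent of each other, which is
    what lets parallel hardware evaluate a level in a single step.
    """
    level = list(values)
    if not level:
        return 0.0, 0
    depth = 0
    while len(level) > 1:
        # Add neighbouring pairs; an unpaired last element is carried forward.
        reduced = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            reduced.append(level[-1])
        level = reduced
        depth += 1
    return level[0], depth


if __name__ == "__main__":
    total, depth = tree_sum([float(i) for i in range(1, 9)])
    print(total, depth)
```

For n inputs the depth grows as the ceiling of log2(n), compared with n - 1 dependent additions in a sequential loop.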
Expected results
- GPU: 192 cores at 1.2 GHz will beat the HC-1's 64 cores at 300 MHz
- Synchronisation expensive on GPU
- 64-bit calculations expensive on GPU
- Custom firmware on HC-1 will beat software on GPU
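The first expectation is easiest to see as aggregate clock throughput. The sketch below multiplies core count by clock rate using the figures quoted on this slide. It assumes one unit of work per core per cycle on both devices, which is a deliberate oversimplification, and `aggregate_ghz` is an illustrative name.

```python
def aggregate_ghz(cores, clock_ghz):
    """Core count times clock rate, in billions of core-cycles per second."""
    return cores * clock_ghz


gpu = aggregate_ghz(192, 1.2)  # GTX285 figures as quoted on the slide
hc1 = aggregate_ghz(64, 0.3)   # HC-1 figures as quoted on the slide

print(f"GPU:  {gpu:.1f} G core-cycles/s")
print(f"HC-1: {hc1:.1f} G core-cycles/s")
print(f"GPU / HC-1: {gpu / hc1:.1f}x")
```

The remaining three bullets are reasons this raw ratio may not carry over to the measured results.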
Random number generation
HC-1: 88.9x
GPU: 89.3x
Performance - Matrix multiplication
HC-1: 48.8x
GPU: 190.4x
Performance - Matrix multiplication
HC-1: 52.5x
GPU: 98.0x
Performance - Sum of vector
HC-1: 125.6x
GPU: 306.4x
Performance - Sum of vector
HC-1: 81.1x
GPU: 109.3x
Performance - N-body simulation
HC-1: 1.9x
GPU: 43.2x
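The headline figures from the results slides can be put side by side as a GPU-to-HC-1 ratio. The sketch below uses only the numbers quoted above. It assumes each figure is a speedup over the same CPU baseline, and it labels the paired matrix-multiplication and sum-of-vector slides by position, because the text does not say which is the 32-bit and which the 64-bit case.

```python
# (benchmark, HC-1 speedup, GPU speedup), copied from the results slides.
results = [
    ("Random number generation", 88.9, 89.3),
    ("Matrix multiplication (first slide)", 48.8, 190.4),
    ("Matrix multiplication (second slide)", 52.5, 98.0),
    ("Sum of vector (first slide)", 125.6, 306.4),
    ("Sum of vector (second slide)", 81.1, 109.3),
    ("N-body simulation", 1.9, 43.2),
]


def gpu_advantage(hc1_speedup, gpu_speedup):
    """Ratio of GPU speedup to HC-1 speedup; values above 1 favour the GPU."""
    return gpu_speedup / hc1_speedup


for name, hc1_speedup, gpu_speedup in results:
    ratio = gpu_advantage(hc1_speedup, gpu_speedup)
    print(f"{name}: GPU / HC-1 = {ratio:.2f}")
```

These are single headline values per slide, so they do not show how the two platforms compare across the problem sizes plotted in the original figures.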
Conclusions
- Both platforms outperformed the CPU
- In most cases the GPU outperformed the HC-1
  - The exception used custom firmware
    - Closed-source
    - Not standards-compliant
    - Developing HC-1-compatible code is a ≥3-month task
- Further anecdotal evidence
  - TOP500
  - Manufacturers, Cray et al.
Acknowledgements
- Thanks to
  - EPSRC (Grants EP/C549481, EP/E045472)
  - Convey Computer
  - Nvidia
- Questions?