GPU versus FPGA for high productivity computing

David Huw Jones, Adam Powell, Christos-Savvas Bouganis, Peter Y. K. Cheung

August 26, 2010

Source: conferenze.dei.polimi.it/FPL2010/presentations/T2_B_2.pdf

Overview

- Just another GPU vs FPGA comparison?
- Productivity versus performance
- First review of a new COTS FPGA system, the HC-1
- Benchmarked on different process architectures
- Analysis from the point of view of an HPC user

Contents

- Devices
- Benchmarks
- Results
- Conclusions

HC-1: What is it?

[Image courtesy of Convey Computer]

Cost: ~£30k
Form: 2U
FLOPS: 65 G
Rated: 1.4 kW
Typical: 600 W

HC-1: What is it?

- 5x Virtex 5
- 8x Stratix II
- 128 GB DDR2 RAM
- 80 GB/s bandwidth
- 32x 300 MHz cores

GTX285: What is it?

[Image courtesy of photo.hardwarebistro.com]

Cost: ~£1k
Form: PCI card
FLOPS: 1063 G
Rated: 550 W
Typical: 400 W
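The headline figures on the device slides can be combined into rough efficiency ratios. The Python sketch below does only that arithmetic, using the numbers quoted above. The dictionary layout and function names are illustrative, and it assumes the FLOPS figures are peak values in GFLOPS and that the "typical" power figure is the right one to divide by.

```python
# Figures as quoted on the HC-1 and GTX285 slides
# (cost in GBP, typical power draw in watts, FLOPS in GFLOPS).
DEVICES = {
    "HC-1":   {"gflops": 65.0,   "cost_gbp": 30_000.0, "typical_w": 600.0},
    "GTX285": {"gflops": 1063.0, "cost_gbp": 1_000.0,  "typical_w": 400.0},
}

def gflops_per_watt(spec):
    """Quoted GFLOPS divided by typical power draw."""
    return spec["gflops"] / spec["typical_w"]

def gflops_per_pound(spec):
    """Quoted GFLOPS divided by approximate purchase cost."""
    return spec["gflops"] / spec["cost_gbp"]

for name, spec in DEVICES.items():
    print(f"{name}: {gflops_per_watt(spec):.3f} GFLOPS/W, "
          f"{gflops_per_pound(spec):.4f} GFLOPS/GBP")
```

These ratios say nothing about sustained performance on a given workload; they only restate the quoted specifications on a common scale.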

GTX285: What is it?

- 192x 1.2 GHz cores
- 1 GB DDR3 RAM
- 120 GB/s bandwidth

Development options

Development options considered

1. Why not custom personalities?
   - Convey estimates a 3-month development time
   - Average HPC job length is 3 hours, maximum 24 days (SDSC)
   - Not OpenFPGA compliant, no open source
2. Why not C-to-gates?
   - No compiler for the HC-1 (yet)
   - Upper limit from application-specific personality benchmark
   - "Unfortunately, and despite 40 years of parallelizing compilers for all sorts of machines, [optimization] algorithms don’t work terribly well" (Ian Page, 2004)
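The argument against custom personalities rests on comparing development time with job length. A small worked calculation in Python makes the scale explicit. It assumes a nominal 30-day month, which the slides do not specify, and the variable names are illustrative.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: nominal 30-day month

dev_time_h = 3 * DAYS_PER_MONTH * HOURS_PER_DAY  # Convey's 3-month estimate
avg_job_h = 3                                    # average HPC job length (SDSC)
max_job_h = 24 * HOURS_PER_DAY                   # longest job: 24 days

# Development time expressed in units of job length.
dev_vs_avg_job = dev_time_h / avg_job_h
dev_vs_max_job = dev_time_h / max_job_h

print(f"development time = {dev_vs_avg_job:.0f} average jobs")
print(f"development time = {dev_vs_max_job:.2f} longest jobs")
```

Under that assumption the development effort is several hundred times the length of an average job, and still a few times the longest one.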

Algorithmic skeletons

Benchmarks

1. Random number generation
   - Using Mersenne Twister
2. Matrix multiplication
   - Floating point
   - 32 and 64 bit
3. Sum of vector
   - Floating point
   - 32 and 64 bit
4. N-body simulation
   - 2nd order
   - 2-D and 3-D
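For orientation, the four benchmark kernels can be written as plain sequential Python. This is a minimal reference sketch, not the code used in the talk: the function names are invented here, and the N-body function assumes Newtonian gravity with an optional softening term, which the slides do not specify. CPython's `random` module is used for benchmark 1 because its core generator is the Mersenne Twister.

```python
import random

def random_batch(n, seed=0):
    """Benchmark 1: n pseudo-random floats in [0, 1) from a Mersenne Twister."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def matmul(a, b):
    """Benchmark 2: dense product of a (n x k) and b (k x m), as lists of rows."""
    cols_b = list(zip(*b))  # columns of b, so each entry is a row-column dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def vector_sum(v):
    """Benchmark 3: sequential sum of all elements of a vector."""
    total = 0.0
    for x in v:
        total += x
    return total

def nbody_accelerations(pos, mass, g=1.0, softening=1e-9):
    """Benchmark 4 (assumed form): all-pairs gravitational accelerations.

    pos is a list of coordinate tuples (2-D or 3-D), mass a list of floats.
    """
    acc = []
    for i, pi in enumerate(pos):
        a = [0.0] * len(pi)
        for j, pj in enumerate(pos):
            if i == j:
                continue  # a body exerts no force on itself
            d = [xj - xi for xi, xj in zip(pi, pj)]
            r2 = sum(c * c for c in d) + softening
            inv_r3 = r2 ** -1.5
            for k, c in enumerate(d):
                a[k] += g * mass[j] * c * inv_r3
        acc.append(tuple(a))
    return acc
```

The same coordinate-tuple interface serves both the 2-D and 3-D cases, since the loop over components adapts to the tuple length.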

Architecture and implementation of benchmarks

Benchmarks: 1 2 3 4

Process architecture
  Asynchronous pipeline    X X
  Tree computation         X
  Crowd computation        X

GPU process implementation
  Optimised software       X X X

FPGA process implementation
  Optimised software       X X
  Optimised firmware       X
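The slides do not define the process architectures. As one illustration, assuming "tree computation" refers to the usual pairwise reduction tree (a natural fit for a sum-of-vector benchmark), a minimal Python sketch looks like this. The function name and the returned level count are choices made here for illustration.

```python
def tree_sum(values):
    """Sum values by pairwise (tree) reduction.

    Returns (total, levels), where levels is the depth of the reduction tree.
    """
    level = list(values)
    if not level:
        return 0, 0
    levels = 0
    while len(level) > 1:
        # The pair additions within one level are independent of each other,
        # so parallel hardware could perform them all at the same time.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried up unchanged
        level = nxt
        levels += 1
    return level[0], levels

total, depth = tree_sum(range(1, 9))
print(total, depth)
```

With n inputs the tree has about log2(n) levels, which is why this structure suits hardware with many parallel adders, whereas a sequential loop needs n - 1 dependent additions.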

Expected results

- GPU (192 cores at 1.2 GHz) will beat HC-1 (64 cores at 300 MHz)
- Synchronisation is expensive on the GPU
- 64-bit calculations are expensive on the GPU
- Custom firmware on the HC-1 will beat software on the GPU
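The first expectation is a back-of-envelope comparison of core count times clock rate. The sketch below reproduces it in Python with the figures from the slide. Treating cores on the two devices as equivalent units of work is a simplifying assumption made here, and `aggregate_ghz` is an illustrative name.

```python
def aggregate_ghz(cores, clock_ghz):
    """Core count times clock rate: a crude proxy for raw issue rate."""
    return cores * clock_ghz

gpu = aggregate_ghz(192, 1.2)  # GTX285 figures from the slide
hc1 = aggregate_ghz(64, 0.3)   # HC-1 figures from the slide
ratio = gpu / hc1

print(f"GPU {gpu:.1f} vs HC-1 {hc1:.1f} aggregate GHz, ratio {ratio:.1f}")
```

The proxy ignores memory bandwidth, synchronisation cost and word width, which is exactly what the remaining expectations on the slide qualify.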

Random number generation

HC-1: 88.9x
GPU: 89.3x

Performance - Matrix multiplication

HC-1: 48.8x
GPU: 190.4x

Performance - Matrix multiplication

HC-1: 52.5x
GPU: 98.0x

Performance - Sum of vector

HC-1: 125.6x
GPU: 306.4x

Performance - Sum of vector

HC-1: 81.1x
GPU: 109.3x

Performance - N-body simulation

HC-1: 1.9x
GPU: 43.2x
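The per-benchmark figures above can be set side by side as GPU-to-HC-1 ratios. The Python sketch below does so, assuming the "x" values are speedups over a common CPU baseline, which the result slides do not state explicitly. The two matrix multiplication slides and the two sum-of-vector slides are not labelled in this transcript, so they are distinguished here only by order.

```python
# (benchmark slide, HC-1 speedup, GPU speedup), in slide order.
RESULTS = [
    ("Random number generation",              88.9,  89.3),
    ("Matrix multiplication (first slide)",   48.8, 190.4),
    ("Matrix multiplication (second slide)",  52.5,  98.0),
    ("Sum of vector (first slide)",          125.6, 306.4),
    ("Sum of vector (second slide)",          81.1, 109.3),
    ("N-body simulation",                      1.9,  43.2),
]

def gpu_advantage(hc1_speedup, gpu_speedup):
    """Ratio of GPU speedup to HC-1 speedup; values above 1 favour the GPU."""
    return gpu_speedup / hc1_speedup

ratios = {name: gpu_advantage(hc1, gpu) for name, hc1, gpu in RESULTS}

for name, ratio in ratios.items():
    print(f"{name}: GPU/HC-1 = {ratio:.2f}")
```

On these figures the two platforms are within about half a percent of each other on random number generation, while the gap is widest on the N-body simulation.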

Conclusions

- Both platforms outperformed the CPU
- In most cases the GPU outperformed the HC-1
  - The exception used custom firmware
    - Closed-source
    - Not standards-compliant
    - Developing HC-1 compatible code is a ≥3 month task
- Further anecdotal evidence
  - TOP500
  - Manufacturers, Cray et al.

Acknowledgements

- Thanks to
  - EPSRC (Grants EP/C549481, EP/E045472)
  - Convey Computer
  - Nvidia
- Questions?