Date post: 31-Mar-2015
Presented by Field-Programmable Gate Array Research Speeds HPC up to 100X Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics Division
Transcript
Page 1: Presented by Field-Programmable Gate Array Research Speeds HPC up to 100X Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics.

Presented by

Field-Programmable Gate Array Research Speeds HPC “up to 100X”

Olaf O. Storaasli

Future Technologies Group

Computer Science and Mathematics Division

Page 2

2 Storaasli_FPGA_SC07

Contents

Background: Why FPGAs?

ORNL success: FPGA systems, tools and up to 100X speedup

Partners: Research Lab, , SRC,

THE SUPERCOMPUTER COMPANY

Explore FPGAs for future ORNL HPC

Virtex4 FPGA blades “accelerate mission-critical applications > 100X.”

Steve Scott, CTO HPCWire 24/3/2006

“After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.”

ORNL benefit: Exceed petaflops and reduce power

Why HPC vendors offer FPGAs

Page 3

FPGA Logic slice

What’s an FPGA? Your “custom chip”

Xilinx Virtex4 FPGA: 25K slices (miniCPUs)

Logic array: user-tailored to application

On-chip RAM, multipliers and PowerPCs

Gigabit transceivers/DSP blocks => fast I/O/precision

100–1000 operations/clock cycle
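
As a rough illustration of what "100–1000 operations/clock cycle" can mean for throughput, the sketch below multiplies operations per cycle by clock rate. The function name `peak_gops` is hypothetical, and the 150 MHz clock is an assumption borrowed from the design frequencies reported later in this deck; real designs rarely sustain their peak rate.

```python
def peak_gops(ops_per_cycle, clock_mhz):
    """Peak throughput in giga-operations per second (GOPS).

    ops_per_cycle: operations completed each clock cycle (assumed sustained).
    clock_mhz: clock frequency in MHz.
    """
    # ops/cycle * (MHz * 1e6 cycles/s) / 1e9 simplifies to ops * MHz / 1000
    return ops_per_cycle * clock_mhz / 1000


if __name__ == "__main__":
    for ops in (100, 1000):
        # 150 MHz is an assumed clock, not a figure stated on this slide
        print(ops, "ops/cycle at 150 MHz:", peak_gops(ops, 150), "GOPS")
```

Under these assumptions the range works out to 15 to 150 GOPS, which shows how a modest clock can still give high throughput when many operations run in parallel.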

Page 4

[Figure: bar chart comparing Pentium and Virtex-4 FPGA on computation (GOPS), memory bandwidth (GB/sec), and I/O bandwidth (Gbps); axis ticks 0 to 300]

Why FPGAs?

Performance: optimal silicon use (maximize parallel ops/cycle)

Rapid growth: cells, speed, I/O

Power: 1/10th that of CPUs

Flexible: tailor to application

[Figure: FPGA growth, 2002 to 2008: logic cells (thousands) and clock speed (MHz)]

Page 5

Cray XD1

ORNL FPGA hardware/tools

SRC-6 (Carte), Digilent (Viva, VHDL), Nallatech (Viva)

Cray XD1 (MitrionC, VHDL): 6 FPGAs + 144 Opterons

SGI RASC-Altix/Virtex4s (MitrionC)

CHiMPS (Bee2 => Cray XD1 => DRC => XT4)

Page 6

Find parallelism: 80% FFTs

More GF/$ GF/Watt

Goal

Profile

Model faster

Ported HPC code spectral transform shallow water model (STSWM) to FPGAs

[Figure: STSWM call graph. Routines: FTRNDE, FTRNPE, FTTdd, UV (FFT), SHTRNS (FFT), COMP1, STEP, FTRNEX, FTRNVX. Annotations: 8 calls in parallel, 3 functions in parallel, 2 calls in parallel]

HLL compiler: CHiMPS, Mitrion (FPGA tools inside)

FPGA speedup

HLL developer profiles
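
The profiling result on this slide (80% FFTs) bounds what acceleration can achieve. The sketch below applies Amdahl's law, assuming the 80% figure is the fraction of run time spent in FFTs and that only the FFTs are moved to the FPGA. Both are assumptions, and `amdahl_speedup` is a hypothetical helper, not part of any tool named in the deck.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the run time is accelerated.

    accelerated_fraction: share of original run time in the accelerated
        kernel (0.8 if FFTs are 80% of the profile).
    kernel_speedup: how much faster that kernel runs on the accelerator.
    """
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    for s in (10, 100, 1000):
        print(f"FFT kernel {s}X faster -> whole model "
              f"{amdahl_speedup(0.8, s):.2f}X faster")
```

Under these assumptions the whole-model speedup approaches but never exceeds 5X, which is one reason to look for further parallelism in the remaining routines as well.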

Page 7

Viva: Graphical icons (3-dimensional)

MitrionC: Text/flow (1-dimensional)

Exploring programming options

Gauss matrix solver

Compiler, simulator, and debugger

+ Carte/SRC, CHiMPS-VHDL/Xilinx

Page 8

*FPGA vs 2.2 GHz Opteron

First mixed-precision LU and solver for FPGAs

Benefits:

High performance of low-precision (LP) arithmetic

High-precision accuracy

Speedup increases with matrix size (as LU dominates calculations)
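
The slide does not spell out how the mixed-precision solver combines the two benefits above. A common approach, assumed here, is iterative refinement: factor the matrix once in low precision, then correct the solution using residuals computed in double precision. The sketch below emulates single precision in pure Python by rounding through `struct`; all function names are hypothetical, and this is not the FPGA design itself.

```python
import struct


def f32(x):
    """Round a double to the nearest single-precision value (stand-in for LP hardware)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(a):
    """LU factorization with partial pivoting, every result rounded to single."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular in single precision")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_single(lu, perm, b):
    """Solve with the stored factors, again rounding to single throughout."""
    n = len(b)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                 # forward substitution (unit lower triangle)
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):       # back substitution (upper triangle)
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed(a, b, sweeps=5):
    """Low-precision LU plus double-precision iterative refinement."""
    n = len(b)
    lu, perm = lu_factor_single(a)     # expensive step, done once in low precision
    x = lu_solve_single(lu, perm, b)
    for _ in range(sweeps):
        # residual in double precision (Python floats are doubles)
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_single(lu, perm, r)   # cheap correction reusing the factors
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    b = [1.0, 2.0, 3.0]
    print(solve_mixed(a, b))
```

LU factorization costs roughly 2n^3/3 floating-point operations, against roughly 2n^2 for each pair of triangular solves, which is consistent with the slide's remark that LU dominates as the matrix grows. In this scheme the refinement sweeps reuse the low-precision factors, so only the residual needs double precision.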

Design                 Double FP      Single FP      S10e5
PE amount              8              16             32
Max size               128            256            256
Achievable frequency   120 MHz        150 MHz        150 MHz
Slices                 27,005 (57%)   14,792 (59%)   14,730 (62%)
BRAMs                  68 (29%)       129 (55%)      65 (28%)
MULT18X18              128 (55%)      64 (27%)       32 (13%)

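[Figure: execution time (us) vs. matrix size (64, 96, 128) for the S10e5, Single and Double designs; axis 0 to 1000; data labels as extracted: 8757, 149218, 133, 404, 865, 443, 258]

[Figure: speedup by design data type (double, single, S10e5) for LU and Solver; axis 0 to 40; data labels as extracted: 10.9, 7.7, 21.3, 9.7, 10.3, 36.6]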

37X* LU decomposition speedup

10X for matrix equation solver

Page 9

8 hrs => 5 min

*Virtex-4 FPGA vs 2.2 GHz Opteron on Cray XD1

[Figure: FPGA speedup (y-axis, 0 to 120) by genome sequence (x-axis, 24# to 40); series: 8K w/align, 16K w/align, 8K w/o align, 16K w/o align]

100X* DNA sequence speedup

Bacillus anthracis human DNA comparison

# 24 = Sequence AE17024
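
The slide does not name the comparison algorithm. As an illustration only, the sketch below assumes a Smith-Waterman-style local alignment score, a widely used dynamic-programming method for sequence comparison. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are hypothetical choices, not taken from the deck.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score with a linear gap penalty (score only, no traceback)."""
    prev = [0] * (len(target) + 1)   # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]                   # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap       # gap in the target
            left = curr[j - 1] + gap # gap in the query
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "ACGT"))
```

Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time. That structure is what makes this kind of comparison a natural fit for the many parallel processing elements of an FPGA.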

Page 10

FPGA speedup grows with query size

Page 11

Acknowledgement:

This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Summary

ORNL FPGA research: Increasing HPC relevance

FPGA systems: Cray, SRC, Nallatech, Digilent, SGI

Compilers: Mitrion-C, Carte, Viva, DSPlogic, CHiMPS

Speedup: 10X eqn soln, 100X DNA sequencing

Partners: Xilinx, UT, Mitrion, Cray, SGI

Next: Explore DRC, more FPGAs and CHiMPS

Page 12

Contact

Olaf Storaasli

Future Technologies Group

Computer Science and Mathematics

[email protected]

Google Olaf ORNL