Date posted: 31-Mar-2015
Field-Programmable Gate Array Research Speeds HPC “up to 100X”
Presented by
Olaf O. Storaasli
Future Technologies Group
Computer Science and Mathematics Division
2 Storaasli_FPGA_SC07
Contents
Background: Why FPGAs?
ORNL success: FPGA systems, tools and up to 100X speedup
Partners: Research Lab, SRC, “The Supercomputer Company” [other partner logos]
Explore FPGAs for future ORNL HPC
Virtex4 FPGA blades “accelerate mission-critical applications > 100X.”
Steve Scott, CTO HPCWire 24/3/2006
“After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.”
ORNL benefit: Exceed petaflops and reduce power
Why HPC vendors offer FPGAs
What’s an FPGA? Your “custom chip”
[Figure: FPGA logic slice]
Xilinx Virtex4 FPGA: 25K slices (miniCPUs)
Logic array: user-tailored to application
On-chip RAM, multipliers and PowerPCs
Gigabit transceivers and DSP blocks => fast I/O and precision
100–1000 operations/clock cycle
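To put the operations-per-cycle figure in throughput terms, here is a minimal sketch that multiplies operations per cycle by clock rate. The 120 MHz and 150 MHz clocks are borrowed from the achievable frequencies listed for the LU designs later in this deck; pairing them with the 100 to 1000 range is an assumption made only for illustration, not a measured result.

```python
def peak_gops(ops_per_cycle, clock_mhz):
    """Peak throughput in giga-operations per second (GOPS).

    Assumes every operation slot is busy on every clock cycle,
    which is an upper bound rather than a sustained rate.
    """
    return ops_per_cycle * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    # Illustrative combinations only (see the assumptions above).
    for ops in (100, 1000):
        for mhz in (120, 150):
            print(f"{ops} ops/cycle at {mhz} MHz -> {peak_gops(ops, mhz):.1f} GOPS")
```

The point of the sketch is that an FPGA reaches high throughput through width (many operations per cycle) rather than through clock rate.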
[Figure: bar chart comparing Pentium and Virtex-4 FPGA on computation (GOPS), memory bandwidth (GB/sec), and I/O bandwidth (Gbps); vertical axis 0 to 300]
Why FPGAs?
Performance: optimal silicon use (maximize parallel ops/cycle)
Rapid growth: cells, speed, I/O
Power: 1/10th of CPUs
Flexible: tailor to application
[Figure: FPGA growth by year, 2002 to 2008: logic cells (thousands) and clock speed (MHz)]
Cray XD1
ORNL FPGA hardware/tools
SRC-6 (Carte), Digilent (Viva, VHDL), Nallatech (Viva)
Cray XD1 (MitrionC, VHDL): 6 FPGAs + 144 Opterons
SGI RASC-Altix/Virtex4s (MitrionC)
CHiMPS (Bee2 => Cray XD1 => DRC => XT4)
Ported HPC code spectral transform shallow water model (STSWM) to FPGAs
Goal: More GF/$, GF/Watt
Profile: Find parallelism: 80% FFTs
Model faster
[Figure: STSWM call graph with routines FTRNDE, FTRNPE, FTTdd, UV FFT, SHTRNS FFT, COMP1, STEP, FTRNEX, FTRNVX; annotations: 8 calls in parallel, 3 functions in parallel, 2 calls in parallel]
[Figure: workflow: HLL developer profiles => HLL compiler (CHiMPS, Mitrion; FPGA tools inside) => FPGA speedup]
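The "80% FFTs" profile result bounds how much the whole model can gain from accelerating the FFTs alone. A minimal sketch using Amdahl's law follows; reading "80%" as the fraction of runtime spent in FFTs is an assumption, and the kernel speedups tried below are hypothetical values, not measurements from this work.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    fft_fraction = 0.8  # assumed reading of "80% FFTs"
    for kernel_speedup in (2, 10, 100):  # hypothetical FPGA speedups of the FFT kernel
        overall = amdahl_speedup(fft_fraction, kernel_speedup)
        print(f"FFT {kernel_speedup}x faster -> whole model {overall:.2f}x faster")
    # Upper bound as the FFT time goes to zero: 1 / (1 - 0.8) = 5x.
    print(f"limit: {1.0 / (1.0 - fft_fraction):.1f}x")
```

Under these assumptions the remaining 20% of the runtime caps the overall gain at about 5X, which is why profiling for further parallel routines matters.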
Exploring programming options
Viva: Graphical icons, 3-dimensional
MitrionC: Text/flow, 1-dimensional
Gauss matrix solver
Compiler, simulator, and debugger
+ Carte/SRC, CHiMPS-VHDL/Xilinx
*FPGA vs 2.2 GHz Opteron
First mixed-precision LU and solver for FPGAs
Benefits:
High performance of low-precision (LP) arithmetic
High-precision accuracy
Speedup increases with matrix size (as LU dominates calculations)
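The slide does not spell out the algorithm, so the following is only a sketch of one common mixed-precision scheme, iterative refinement: factor once in low precision, then correct the solution using residuals computed in double precision. Single precision is emulated in software with `struct`; the function names are hypothetical, and nothing here models the FPGA design itself.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_low(a):
    """LU with partial pivoting; every arithmetic result is rounded to single."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier stored in L
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_low(lu, perm, b):
    """Forward and back substitution in (emulated) single precision."""
    n = len(lu)
    x = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # solve L y = P b (unit diagonal)
        for j in range(i):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
    for i in reversed(range(n)):  # solve U x = y
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def mixed_precision_solve(a, b, iters=5):
    """Low-precision LU plus double-precision iterative refinement."""
    n = len(a)
    lu, perm = lu_factor_low(a)  # O(n^3) work, done once in low precision
    x = lu_solve_low(lu, perm, b)
    for _ in range(iters):
        # Residual in double precision: r = b - A x
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve_low(lu, perm, r)  # O(n^2) correction reusing the factors
        x = [x[i] + d[i] for i in range(n)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    b = [12.0, 14.0, 22.0]  # chosen so the exact solution is [1, 2, 3]
    print(mixed_precision_solve(A, b))
```

In this scheme the factorization costs on the order of n^3 operations while each refinement step costs on the order of n^2, which is consistent with the slide's remark that LU dominates as the matrix grows.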
Design                 Double FP      Single FP      S10e5
PE count               8              16             32
Max size               128            256            256
Achievable frequency   120 MHz        150 MHz        150 MHz
Slices                 27,005 (57%)   14,792 (59%)   14,730 (62%)
BRAMs                  68 (29%)       129 (55%)      65 (28%)
MULT18X18              128 (55%)      64 (27%)       32 (13%)
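One way to read the design table is to compare raw arithmetic capacity across the three data types. The sketch below multiplies the PE count by the achievable frequency; treating each PE as completing one operation per clock cycle is an assumption, not something the slide states, so the ratios are rough capacity estimates rather than measured speedups.

```python
# PE counts and clock rates copied from the design table above.
DESIGNS = {
    "Double FP": {"pes": 8, "mhz": 120},
    "Single FP": {"pes": 16, "mhz": 150},
    "S10e5": {"pes": 32, "mhz": 150},
}


def relative_capacity(designs, baseline="Double FP"):
    """PE count times clock rate, normalized to the baseline design.

    Assumes one operation per PE per cycle (an illustrative assumption).
    """
    base = designs[baseline]["pes"] * designs[baseline]["mhz"]
    return {name: d["pes"] * d["mhz"] / base for name, d in designs.items()}


if __name__ == "__main__":
    for name, ratio in relative_capacity(DESIGNS).items():
        print(f"{name}: {ratio:.1f}x the double-precision PE throughput")
```

Narrower data types fit more PEs into a comparable share of the chip, which is the resource argument behind using low precision for the bulk of the work.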
[Figure: execution time (us, axis 0 to 1000) vs. matrix size (64, 96, 128) for the S10e5, Single, and Double designs; data labels: 87, 57, 149, 218, 133, 404, 865, 443, 258]
[Figure: speedup (axis 0 to 40) by design data type (double, single, S10e5) for LU and Solver; data labels: 10.9, 7.7, 21.3, 9.7, 10.3, 36.6]
37X* LU decomposition speedup
10X for matrix equation solver
100X* DNA sequence speedup
Bacillus anthracis human DNA comparison
8 hrs => 5 min
[Figure: FPGA speedup (axis 0 to 120) vs. genome sequence (24# to 40); series: 8K w/align, 16K w/align, 8K w/o align, 16K w/o align]
# 24 = Sequence AE17024
*Virtex-4 FPGA vs 2.2 GHz Opteron on Cray XD1
FPGA speedup grows with query size
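As a quick consistency check on the headline figure, the "8 hrs => 5 min" annotation can be converted to a speedup factor. This sketch assumes the two times are the CPU and FPGA runtimes for the same comparison, which the slide implies but does not state.

```python
cpu_minutes = 8 * 60   # "8 hrs", assumed to be the Opteron runtime
fpga_minutes = 5       # "5 min", assumed to be the FPGA runtime

speedup = cpu_minutes / fpga_minutes
print(f"speedup: {speedup:.0f}X")  # 480 / 5 = 96, i.e. roughly the quoted 100X
```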
Acknowledgement:
This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Summary
ORNL FPGA research: Increasing HPC relevance
FPGA systems: Cray, SRC, Nallatech, Digilent, SGI
Compilers: Mitrion-C, Carte, Viva, DSPlogic, CHiMPS
Speedup: 10X eqn soln, 100X DNA sequencing
Partners: Xilinx, UT, Mitrion, Cray, SGI
Next: Explore DRC, more FPGAs and CHiMPS
Contact
Olaf Storaasli
Future Technologies Group
Computer Science and Mathematics Division
[email protected]
Google Olaf ORNL