Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs
Mark Hammerquist, Roman Lysecky
Department of Electrical and Computer Engineering
University of Arizona, Tucson AZ, [email protected], [email protected]
http://www.ece.arizona.edu/~embedded
Introduction and Motivation: FPGAs vs. ASICs
FPGAs vs. ASICs in SoC Designs
Advantages of FPGAs
  Programmed by downloading bits to the FPGA
    Much like software executing on a microprocessor
  Allows hardware modifications throughout the development cycle
    And even after manufacturing
    Correct costly design errors without requiring a respin
  Dynamically reconfigurable FPGAs can implement multiple hardware circuits over the course of an application's execution
Disadvantages of FPGAs (Kuon et al., FPGA 2006)
  10-40x larger than ASICs
  5-12x more power than ASICs
  3-4x longer delay than ASICs
[Figure: two SoC block diagrams, each with a µP, peripherals, I$, and D$; one adds an FPGA, the other an ASIC]
How can we take advantage of FPGAs without the significant overheads?
Introduction and Motivation: Application-Specific FPGAs
SoCs require fabrication
  Provides an opportunity to customize the FPGA architecture
  Reduce area, reduce energy, improve performance
Application-Specific FPGA
  Create an FPGA architecture tailored to the specific hardware circuit
  Flexible-optimized
    Optimized for one application, but flexible enough to implement other hardware circuits or additions
  Fully-optimized
    Highly optimized for one application; only flexible enough to support minor changes
    Trades off flexibility for smaller area/power/delay
[Figure: ASFPGA generation flow. HW Circuit -> ASFPGA Generation -> FPGA Architecture & Bitstream, shown alongside an SoC with a µP, peripherals, I$, D$, and an FPGA/ASFPGA block]
Introduction and Motivation: Previous Work
Researchers have investigated various methods for optimizing reconfigurable fabrics
  Levinthal et al. (DesignCon, 2005)
    Coarse-grained reconfigurable logic cells with fixed routing
  Aken’Ova et al. (IEEE Custom IC, 2005)
    FPGA-specific standard cells
  Rose et al. (FPGA 2003, 2005)
    Automatically generate a transistor-level implementation of an FPGA from an architectural description
    Enabling technology
  Holland et al. (FPL 2004, 2005; FPGA 2006)
    Automated tool flow for creating domain-specific reconfigurable logic
    Domains: floating point, arithmetic, encryption, sorters
Application-Specific FPGAs (ASFPGAs): Traditional FPGA CAD Tool Flow
Traditional CAD Tool Flow
  Utilize academic FPGA CAD tools to map hardware circuits to the target FPGA
    Technology mapping (FlowMap)
    Packing (T-VPack)
    Placement and routing (VPR)
  FPGA architecture is known a priori and represents the target FPGA
Application-Specific FPGA
  FPGA's architectural features can be tuned to the target hardware circuit
  FPGA CAD tools can be utilized to explore the available architectural options
  Current focus is on creating a flexible-optimized ASFPGA
[Figure: traditional FPGA CAD tool flow. HW Circuit (BLIF) -> Tech. Mapping (FlowMap) -> Mapped Circuit (BLIF) -> Packing (T-VPack) -> Packed Circuit (Netlist) -> Placement/Routing (VPR) -> HW Bitstream and Design Metrics (Area, Delay, Energy). FPGA Arch. inputs: LUT Size, CLB Size, Connectivity/Channel Width/FPGA Size]
Application-Specific FPGAs (ASFPGAs): Design Space Exploration Framework
Design Space Exploration Framework
  Explores a set of configurable options for the target FPGA
  Goal: find the lowest area/delay/power FPGA architecture for the target application
Configurable FPGA Options
  LUT Size: 3-, 4-, or 5-input LUTs
  CLB Size: 2 or 4 LUTs per CLB
  Connection Block Connectivity: 100%, 90%, 80%, 70%, 60%
  FPGA Size: NxN fixed size
  Channel Width: 100%-130% of minimum channel width
  More configurable options exist, but are not considered at this time
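The configurable options above define a small, enumerable design space. A minimal Python sketch of that enumeration follows. The LUT, CLB, and connectivity values come from the slide; the 10% channel-width steps and all names in the code are assumptions made for illustration, and FPGA size is left out because the slide gives no concrete values for it.

```python
from itertools import product

# Option values taken from the slide, except where noted.
LUT_SIZES = (3, 4, 5)                        # inputs per LUT
CLB_SIZES = (2, 4)                           # LUTs per CLB
CONNECTIVITY = (1.0, 0.9, 0.8, 0.7, 0.6)     # connection block connectivity
CHANNEL_WIDTH_SCALE = (1.0, 1.1, 1.2, 1.3)   # ASSUMED 10% steps over the minimum width


def enumerate_architectures():
    """Yield one dict per candidate FPGA architecture."""
    for lut, clb, conn, scale in product(
        LUT_SIZES, CLB_SIZES, CONNECTIVITY, CHANNEL_WIDTH_SCALE
    ):
        yield {
            "lut_size": lut,
            "clb_size": clb,
            "connectivity": conn,
            "channel_width_scale": scale,
        }


candidates = list(enumerate_architectures())
print(len(candidates))  # 3 * 2 * 5 * 4 = 120 under these assumptions
```

In a real exploration, each candidate would be passed through the mapping, packing, and place-and-route flow to obtain its area, delay, and energy.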
[Figure: design space exploration flow for ASFPGAs. HW Circuit (BLIF) -> Tech. Mapping (FlowMap) -> Mapped Circuit (BLIF) -> Packing/Activity Est. (T-VPack) -> Packed Circuit (Netlist) and Switching Activity -> Placement/Routing/Power Est. (VPR with Power Model) -> HW Bitstream and Design Metrics (Area, Delay, Energy). Explored parameters: LUT Size, CLB Size, Connectivity/Channel Width/FPGA Size. Output: FPGA Arch. & Bitstream]
Application-Specific FPGAs (ASFPGAs): Experimental Setup
Experimental Setup
  Consider several MCNC benchmark circuits of varying complexity
    alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298
Design Metric Calculation
  Delay is reported by VPR after routing
  Power model utilized to estimate power consumption
    Poon et al. (TODAES 2005)
  Area
    Routing area is reported by VPR
    Developed a transistor-level estimation method to determine CLB area requirements
Experimental Results: ASFPGA vs. Delay/Energy/Area-Optimized FPGA
ASFPGA
  Optimized for one particular hardware application
  Design space exploration determined three best architectures for each circuit
Delay/Energy/Area-Optimized
  Best average delay, energy, or area across all hardware circuits
  Delay- and energy-optimized architecture: 5-input LUTs, 4 LUTs per CLB, 80% connectivity
  Area-optimized architecture: 3-input LUTs, 2 LUTs per CLB, 90% connectivity
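The comparison on this slide can be sketched as follows: pick the single architecture with the best average value of a metric across all circuits, then measure how much the per-circuit best architecture (the ASFPGA) improves on it. This is a minimal Python sketch; the circuit names, architecture names, and numbers are hypothetical, and the function names are not from the original work.

```python
from statistics import mean

# HYPOTHETICAL measurements: results[circuit][architecture][metric].
results = {
    "circuit_a": {"arch_x": {"delay": 10.0}, "arch_y": {"delay": 8.0}},
    "circuit_b": {"arch_x": {"delay": 6.0}, "arch_y": {"delay": 9.0}},
}


def metric_optimized(results, metric):
    """Architecture with the lowest mean `metric` across all circuits."""
    archs = next(iter(results.values())).keys()
    return min(archs, key=lambda a: mean(r[a][metric] for r in results.values()))


def asfpga_reduction(results, metric, baseline_arch):
    """Percent reduction of the per-circuit best architecture over the baseline."""
    reduction = {}
    for circuit, per_arch in results.items():
        best = min(values[metric] for values in per_arch.values())
        base = per_arch[baseline_arch][metric]
        reduction[circuit] = 100.0 * (base - best) / base
    return reduction


baseline = metric_optimized(results, "delay")
print(baseline)                                      # arch_x
print(asfpga_reduction(results, "delay", baseline))  # {'circuit_a': 20.0, 'circuit_b': 0.0}
```

The same two functions apply unchanged to energy and area once those metrics are added to the data.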
Experimental Results: ASFPGA vs. Delay/Energy/Area-Optimized FPGA
ASFPGA improves on the delay-optimized, energy-optimized, and area-optimized FPGAs
  5% faster, 10% more energy efficient, or 17% smaller, on average
[Figure: bar chart of percentage reduction (0% to 75%) in delay, energy, and area for each benchmark circuit and the average. Callouts: 26% faster, 67% less energy, 49% smaller]
Experimental Results: ASFPGA vs. Balance-Optimized FPGA
ASFPGA
  Optimized for one particular hardware application
  Design space exploration determined three best architectures for each circuit
Balance-Optimized
  Balanced FPGA between delay, energy, and area
  Selected FPGA architecture with best average area/delay/energy (ADE) cost
    ADE is the average of the individual area, delay, and energy costs for each FPGA across all benchmarks
    Each cost is calculated as the area/delay/energy for an architecture divided by the max area/delay/energy for that hardware circuit
  FPGA architecture with best average ADE cost across all circuits: 5-input LUTs, 2 LUTs per CLB, 60% connectivity
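One way to read the ADE definition above is sketched below in Python. It assumes the "max" is taken over all explored architectures for a given circuit, which the slide does not state explicitly. All data and names are hypothetical.

```python
from statistics import mean

METRICS = ("area", "delay", "energy")

# HYPOTHETICAL measurements: results[circuit][architecture][metric].
results = {
    "circuit_a": {
        "arch_x": {"area": 100.0, "delay": 10.0, "energy": 4.0},
        "arch_y": {"area": 50.0, "delay": 5.0, "energy": 2.0},
    },
    "circuit_b": {
        "arch_x": {"area": 80.0, "delay": 8.0, "energy": 2.0},
        "arch_y": {"area": 40.0, "delay": 8.0, "energy": 4.0},
    },
}


def ade_cost(results, circuit, arch):
    """Mean of area, delay, and energy, each normalized to the circuit's maximum."""
    per_arch = results[circuit]
    costs = []
    for metric in METRICS:
        # ASSUMPTION: the maximum is over all architectures tried for this circuit.
        worst = max(values[metric] for values in per_arch.values())
        costs.append(per_arch[arch][metric] / worst)
    return mean(costs)


def average_ade(results, arch):
    """ADE cost of one architecture averaged over all circuits."""
    return mean(ade_cost(results, circuit, arch) for circuit in results)


def balance_optimized(results):
    """Architecture with the lowest average ADE cost."""
    archs = next(iter(results.values())).keys()
    return min(archs, key=lambda a: average_ade(results, a))


print(balance_optimized(results))  # arch_y
```

Normalizing each metric to its per-circuit maximum keeps area, delay, and energy on a common 0-to-1 scale, so no single metric dominates the average.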
Experimental Results: ASFPGA vs. Balance-Optimized FPGA
ASFPGA can provide significant reductions in delay/energy/area over balance-optimized FPGA
  25% faster, 36% more energy efficient, or 28% smaller, on average
[Figure: bar chart of percentage reduction (0% to 100%) in delay, energy, and area for each benchmark circuit and the average. Callouts: 39% shorter delay, 73% less energy, 49% less area]
Experimental Results: ASFPGA vs. Fixed-Size Balance-Optimized FPGA
ASFPGA
  Optimized for one particular hardware application
  Design space exploration determined three best architectures for each circuit
Fixed-Size Balance-Optimized
  Limited to a fixed size and balanced between area, delay, and energy
  Fixed size is the minimum size needed to support all hardware benchmarks considered
    63x63 CLBs
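The fixed-size rule above amounts to taking the largest per-circuit requirement. The Python sketch below assumes a square CLB array whose size is limited only by CLB count; a real flow may also need to account for I/O pads. The CLB counts and function names are hypothetical and are not taken from the benchmarks.

```python
import math

# HYPOTHETICAL CLB counts per circuit, for illustration only.
clbs_needed = {"circuit_a": 90, "circuit_b": 400, "circuit_c": 401}


def min_square_side(num_clbs):
    """Smallest N such that an N x N array provides at least num_clbs CLBs."""
    side = math.isqrt(num_clbs)
    return side if side * side == num_clbs else side + 1


def fixed_fpga_side(clbs_needed):
    """Side length of the smallest square array that fits every circuit."""
    return max(min_square_side(n) for n in clbs_needed.values())


side = fixed_fpga_side(clbs_needed)
print(f"{side}x{side} CLBs")  # 21x21 CLBs
```

Because the array is sized for the largest circuit, every smaller circuit leaves part of the fabric unused, which is consistent with the large area savings reported for ASFPGAs against this baseline.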
Experimental Results: ASFPGA vs. Fixed-Size Balance-Optimized FPGA
ASFPGA can provide significant reductions in delay/energy/area over fixed-size balance-optimized FPGA
  50% faster, 75% more energy efficient, or 82% smaller, on average
[Figure: bar chart of percentage reduction (0% to 100%) in delay, energy, and area for each benchmark circuit and the average. Callouts: > 40% area savings for all circuits, > 60% energy savings for most circuits]
Conclusions and Future Work
Conclusions
  Presented an initial design space exploration framework for Application-Specific FPGAs
    Allows an FPGA architecture to be customized to a particular hardware circuit before manufacturing
    Yet flexible enough to support changes to the hardware after fabrication
  ASFPGAs are 5% faster, 10% more energy efficient, or 17% smaller than traditional metric-optimized FPGAs
    As much as 50% faster, 75% more energy efficient, or 82% smaller, on average, compared to fixed-size balance-optimized FPGA
Current/Future Work
  FPGA architecture customization that constructs/optimizes an FPGA from the logic characteristics of the hardware circuit
    Potentially can provide significant additional savings by further customizing individual CLBs and routing resources, but yields irregular FPGA fabric
    Requires new FPGA CAD tools to handle irregularity to support hardware modifications
Thanks
Questions?