Rollins 156 / MAPLD 2005
Reducing Energy in FPGA Multipliers Through Glitch Reduction
Nathan Rollins and Michael J. Wirthlin
Department of Electrical and Computer Engineering
Brigham Young University
Provo, UT
This work was supported by the NASA Earth-Sun System Technology Office through a subcontract with USC-ISI
FPGAs’ High Power Consumption
• Flexibility and reprogrammability result in greater power consumption relative to ASICs
• Static power is insignificant compared to dynamic power consumption
• Dynamic power consumption:
$P_{avg} = \frac{1}{2} \sum_{n \in nets} C_n \cdot f_n \cdot V^2$
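The dynamic power sum can be evaluated directly once per-net capacitance and switching rate are known. The sketch below is only an illustration of the formula: the function name, the two example nets, and the 1.5 V supply are assumptions, not values from the slides.

```python
def average_dynamic_power(nets, vdd):
    """Evaluate P_avg = 1/2 * sum(C_n * f_n * V^2) over all nets.

    nets: iterable of (capacitance in farads, switching rate in hertz)
    vdd:  supply voltage in volts
    Returns the average dynamic power in watts.
    """
    return 0.5 * vdd ** 2 * sum(c * f for c, f in nets)


# Hypothetical example: two nets, made-up capacitances and toggle rates.
example_nets = [(10e-12, 50e6), (20e-12, 25e6)]
p_avg = average_dynamic_power(example_nets, vdd=1.5)
print(f"P_avg = {p_avg * 1e3:.3f} mW")
```

Because every net contributes in proportion to its switching rate, any transition that does no useful work still costs the same energy as a useful one.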
• The $f_n$ term represents the net switching activity
• Some net switching activity is unproductive: glitches
• Large amount of dynamic switching power wasted in glitches
• Goal: Lower energy by reducing the amount of glitching
FPGA Glitching Example
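LUT4 truth table:
A B C D | OUT
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 0
0 1 0 0 | 0
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 1
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 0
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 1
1 1 1 1 | 1
[Figure: LUT4 with inputs A, B, C, D and output OUT; the animation steps through input changes and marks the resulting glitches on OUT]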
• Glitching caused by unequal logic and interconnect delays
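A glitch of this kind can be reproduced from the truth table with a small simulation. In this sketch the input transition (ABCD from 1000 to 1110) and the arrival times are hypothetical choices made for illustration; only the truth table comes from the slide.

```python
# OUT is 1 only for ABCD = 1000, 1110, 1111 (from the LUT4 truth table).
ONES = {0b1000, 0b1110, 0b1111}


def lut(a, b, c, d):
    """Look up OUT for one input combination."""
    return 1 if (a << 3 | b << 2 | c << 1 | d) in ONES else 0


def simulate(start, end, arrival):
    """Return the (time, OUT) changes as the inputs move from start to end.

    arrival maps each input name to the (hypothetical) time at which its
    new value reaches the LUT; unequal times model unequal routing delays.
    """
    names = "ABCD"
    state = dict(zip(names, start))
    target = dict(zip(names, end))
    out = lut(*(state[n] for n in names))
    trace = [(0, out)]
    for t in sorted(set(arrival.values())):
        for n in names:
            if arrival[n] == t:
                state[n] = target[n]
        new_out = lut(*(state[n] for n in names))
        if new_out != out:
            out = new_out
            trace.append((t, out))
    return trace


# C arrives later than B, so the LUT briefly sees 1100 and OUT dips to 0.
skewed = simulate((1, 0, 0, 0), (1, 1, 1, 0), {"A": 1, "B": 1, "C": 3, "D": 1})
# With equal delays the LUT goes straight from 1000 to 1110 and OUT stays 1.
aligned = simulate((1, 0, 0, 0), (1, 1, 1, 0), {"A": 1, "B": 1, "C": 1, "D": 1})
print(skewed)
print(aligned)
```

OUT is 1 both before and after the change, so the two extra transitions in the skewed case are pure glitch activity.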
Power Classification
• Design Static Power: scale the total static power of the device by the relative size of the circuit: Total Static Power × (Circuit LUTs / Total LUTs)
• Dynamic Glitching Power: the ratio of glitch transitions to total transitions is used to split dynamic power into dynamic glitching power and useful dynamic power
• Useful Dynamic Power: the dynamic power due to the “useful” (non-glitch) transitions of the circuit
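The three-way classification can be written as a short calculation. This is a sketch under two assumptions: that the design's static power is the device's static power scaled by the fraction of LUTs the circuit occupies, and that dynamic power splits in proportion to the glitch share of transitions. The function name and all numbers are hypothetical.

```python
def classify_power(total_static_mw, circuit_luts, device_luts,
                   dynamic_mw, glitch_transitions, total_transitions):
    """Split power into (design static, dynamic glitching, useful dynamic), in mW."""
    # Assumption: static power is attributed by the share of LUTs used.
    design_static = total_static_mw * circuit_luts / device_luts
    # Assumption: dynamic power divides in proportion to transition counts.
    glitch_fraction = glitch_transitions / total_transitions
    glitch_dynamic = dynamic_mw * glitch_fraction
    useful_dynamic = dynamic_mw - glitch_dynamic
    return design_static, glitch_dynamic, useful_dynamic


# Hypothetical measurement: 100 of 10,000 LUTs used, 300 of 400 transitions are glitches.
static_mw, glitch_mw, useful_mw = classify_power(
    total_static_mw=100.0, circuit_luts=100, device_luts=10_000,
    dynamic_mw=50.0, glitch_transitions=300, total_transitions=400)
print(static_mw, glitch_mw, useful_mw)
```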
Reduce Glitches with Pipelining
• Pipelined designs have less logic and interconnect between registers
• Pipelining causes long routes to be broken up
• Pipelining in FPGAs can come at little additional cost
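The effect of shorter logic paths on glitching can be illustrated with a toy model. The sketch below is not a multiplier: it assumes a chain of XOR gates with unit gate delay, ideal registers whose outputs all switch at the clock edge, and it ignores register and clock power. All function names are hypothetical.

```python
def xor_chain_transitions(cin_old, cin_new, x_old, x_new):
    """Count net transitions in a unit-delay XOR chain when all inputs switch at t=0.

    Gate i computes g[i] = g[i-1] ^ x[i] (with g[-1] = cin) and needs one
    time unit to respond. Returns (total_transitions, useful_transitions).
    """
    n = len(x_old)
    # Settle the chain on the old inputs.
    g, prev = [], cin_old
    for x in x_old:
        prev ^= x
        g.append(prev)
    initial = list(g)
    total = 0
    # After n time steps the new inputs have rippled through every gate.
    for _ in range(n):
        nxt = [(cin_new if i == 0 else g[i - 1]) ^ x_new[i] for i in range(n)]
        total += sum(a != b for a, b in zip(g, nxt))
        g = nxt
    # A net whose settled value changed needed exactly one useful transition.
    useful = sum(a != b for a, b in zip(initial, g))
    return total, useful


def pipelined_chain_transitions(x_old, x_new, stage_len):
    """Same chain cut into register-separated segments of stage_len gates."""
    total = useful = 0
    cin_old = cin_new = 0
    for start in range(0, len(x_old), stage_len):
        seg_old = x_old[start:start + stage_len]
        seg_new = x_new[start:start + stage_len]
        t, u = xor_chain_transitions(cin_old, cin_new, seg_old, seg_new)
        total += t
        useful += u
        # The registered value passed to the next segment is the settled output.
        cin_old ^= sum(seg_old) % 2
        cin_new ^= sum(seg_new) % 2
    return total, useful


old, new = [0] * 16, [1] * 16
unpipelined = xor_chain_transitions(0, 0, old, new)
pipelined = pipelined_chain_transitions(old, new, stage_len=4)
print("unpipelined (total, useful):", unpipelined)
print("pipelined   (total, useful):", pipelined)
```

In this model the useful transitions are the same in both cases; only the glitch transitions (total minus useful) shrink when the chain is broken up, because a glitch can no longer ripple past a register.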
Pipelined Multiplier
• Long carry chain paths of multiplier stages are ideal for pipelining
• Pipelining gradually inserted in multipliers of different bit widths:
– 4x4
– 8x8
– 16x16
– 32x32
Multiplier Power Classification
Multiplier | Dynamic Glitch Power | Useful Dynamic Power | Static Power
4-bit      | 12.5%                | 87.3%                | 0.2%
8-bit      | 46.6%                | 53.2%                | 0.2%
16-bit     | 68.2%                | 31.7%                | 0.1%
32-bit     | 75.9%                | 24.1%                | 0.0%
Reduce Glitches with Pipelining
[Figure: Glitching as a Percentage of Total Transitions vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure: Dynamic Glitching Power as a Percentage of Total Power vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• Pipelining reduces glitching and lowers power
Extreme Pipelining: Digit-Serial
• In an FPGA, an NxN array multiplier can have N pipeline stages
• A digit-serial multiplier provides pipelining at a smaller granularity
• Digit-serial operations can increase throughput – but also increase latency
• Different digit sizes of digit-serial multiplier used: 1, 2, 4, 8, 16, 32
Pipelined vs. Digit-Serial Multiplier: Total Power Consumption
[Figure, Array Multipliers: Total Power Consumption (mW, log scale) vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure, Digit-serial Multipliers: Total Power Consumption (mW) vs. Digit Size (1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• Digit-serial multiplier has almost no glitching: dynamic glitching power accounts for < 1% of total power
Operation Energy
• Most studies focus on quantifying circuit design power only – often energy is a more useful metric
• Four metrics can be used for energy consumption:
– Energy per Operation
– Energy Delay
– Energy Throughput
– Energy Density
Pipelined vs. Digit-Serial Multiplier: Energy Per Operation
[Figure, Array Multipliers: Energy Per Operation (nJ, log scale) vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure, Digit-serial Multipliers: Energy Per Operation (nJ, log scale) vs. Digit Size (1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• Quantifies the amount of energy required to complete a single operation (in nJ)
$E_{op} = P \cdot t_{clk} \cdot n$
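As a worked instance of the energy-per-operation formula, the sketch below assumes that n is the number of clock cycles one operation takes, which the slide does not state explicitly. The function name and the example numbers are hypothetical.

```python
def energy_per_operation(power_w, t_clk_s, n_cycles):
    """E_op = P * t_clk * n, in joules.

    n_cycles is assumed to be the number of clock cycles per operation.
    """
    return power_w * t_clk_s * n_cycles


# Hypothetical design: 50 mW at a 10 ns clock period, 4 cycles per multiply.
e_op = energy_per_operation(power_w=50e-3, t_clk_s=10e-9, n_cycles=4)
print(f"E_op = {e_op * 1e9:.2f} nJ")
```

A design that draws more power can still use less energy per operation if it finishes in fewer or shorter cycles, which is why power alone can mislead.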
Pipelined vs. Digit-Serial Multiplier: Energy Delay
[Figure, Array Multipliers: Energy Delay (nJ·ns, log scale) vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure, Digit-serial Multipliers: Energy Delay (nJ·ns, log scale) vs. Digit Size (1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• Combines the energy efficiency and speed of an operator into a single parameter (in nJ·ns)
$E_{delay} = P \cdot t_{clk} \cdot t_{min} \cdot n$
Pipelined vs. Digit-Serial Multiplier: Energy Throughput
[Figure, Array Multipliers: Energy Throughput (nJ·ns, log scale) vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure, Digit-serial Multipliers: Energy Throughput (nJ·ns, log scale) vs. Digit Size (1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• The pipelined-operation version of energy delay
$E_{thput} = P \cdot t_{clk} \cdot t_{min} \cdot \delta$
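The energy-delay and energy-throughput formulas differ only in their last factor, which the following sketch makes explicit. The slides do not define the symbols, so the readings used here are assumptions: t_min as the minimum clock period, n as the cycles needed to finish one operation, and δ as the cycles between successive results. Function names and numbers are hypothetical.

```python
NJ_NS = 1e-18  # one nJ·ns expressed in joule-seconds


def energy_delay(power_w, t_clk_s, t_min_s, n_cycles):
    """E_delay = P * t_clk * t_min * n, in joule-seconds."""
    return power_w * t_clk_s * t_min_s * n_cycles


def energy_throughput(power_w, t_clk_s, t_min_s, delta_cycles):
    """E_thput = P * t_clk * t_min * delta, in joule-seconds."""
    return power_w * t_clk_s * t_min_s * delta_cycles


# Hypothetical pipelined multiplier: 4 cycles of latency, one result per cycle.
e_delay = energy_delay(50e-3, 10e-9, 8e-9, n_cycles=4)
e_thput = energy_throughput(50e-3, 10e-9, 8e-9, delta_cycles=1)
print(f"E_delay = {e_delay / NJ_NS:.1f} nJ·ns")
print(f"E_thput = {e_thput / NJ_NS:.1f} nJ·ns")
```

Under these assumptions a fully pipelined operator is credited for producing a result every cycle, so its energy-throughput figure is lower than its energy-delay figure by the ratio of δ to n.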
Pipelined vs. Digit-Serial Multiplier: Energy Density
[Figure, Array Multipliers: Energy Density (pJ/LUT, log scale) vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure, Digit-serial Multipliers: Energy Density (pJ/LUT, log scale) vs. Digit Size (1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• Normalizes the amount of energy used to perform a single operation to the logic resources used
$E_{density} = P \cdot t_{clk} / Area$
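The energy-density formula can be evaluated the same way. The sketch assumes area is measured in LUTs, which matches the pJ/LUT unit on the chart axes. The function name and the example numbers are hypothetical.

```python
def energy_density(power_w, t_clk_s, area_luts):
    """E_density = P * t_clk / Area, in joules per LUT."""
    return power_w * t_clk_s / area_luts


# Hypothetical design: 50 mW at a 10 ns clock period, occupying 100 LUTs.
e_density = energy_density(power_w=50e-3, t_clk_s=10e-9, area_luts=100)
print(f"E_density = {e_density * 1e12:.2f} pJ/LUT")
```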
Pipelined vs. Digit-Serial Multiplier: Clock Energy Increase
[Figure, Array Multipliers: Clock Energy (nJ, log scale) vs. Number of Pipeline Stages (0, 1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
[Figure, Digit-serial Multipliers: Clock Energy (nJ) vs. Digit Size (1, 2, 4, 8, 16, 32); series: 4-bit, 8-bit, 16-bit, 32-bit]
• In contrast to an ASIC, there is very little or no increase in clock energy as pipeline depths or digit sizes are increased
Conclusions and Future Work
• Glitch power is often a significant percentage of total consumed power
– Up to 76% in an array multiplier
• Reducing glitching is essential for low power designs
• Pipelining is an effective way of reducing glitches
• Digit-serial multiplier almost eliminates glitches
• Reducing glitching by pipelining reduces power consumption
– Up to 96% in an array multiplier
• More information than just raw power consumption is required for effective low-power designs
• Different energy metrics can provide this extra information
• A high-level synthesis tool can use this information to produce low power designs