Date post: | 19-Dec-2015 |
1
A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products
Sabyasachi Das, University of Colorado, Boulder
Sunil P. Khatri, Texas A&M University
2
What is a Sum-of-Product (SOP)
An arithmetic Sum-of-Product (SOP) block consists of an arbitrary number of product terms and sum terms.
General form of SOP:
p = a * b
q = c * d
z = p + q + e + f
Figure: inputs a and b form p, inputs c and d form q, and p, q, e and f are summed to give z.
[Equation: z expressed over the operands a1, b1, c1, d1, e1 and a2, b2, c2, d2, e2; the operators did not survive extraction.]
3
Examples of SOP Blocks
Multiplier {assign z = a * b}
  found in Microprocessors
Multiply-Accumulator {assign z = (a * b) + c}
  found in Cryptographic Applications
Squarer {assign z = a * a}
  found in DSP processors
Addition Tree {assign z = a + b + c + d}
  found in ALU, Wireless applications
Generalized SOP {assign z = (a * b) + (c * d)}
  found in FIR filters, IIR filters
4
Synthesis of Sum-of-Products
Synthesis of Sum-of-Product blocks is done in 3 steps (in the order of data-flow):
  Creation of Partial Products
  Reduction of Partial Products into 2 operands
  Computation of Final Sum by adding the 2 operands
Figure: data flow from Inputs through Creation of Partial Products, Reduction of Partial Products and Computation of Final Sum to Output.
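The three steps can be sketched functionally in Python. This is an illustrative model only, not the authors' synthesis flow: it assumes unsigned operands, uses simple 3:2 (carry-save) compression for the reduction step, and the names partial_products, carry_save_reduce and sop are hypothetical.

```python
def partial_products(a, b, width):
    """Step 1: one shifted copy of a for every set bit of b (unsigned operands)."""
    return [a << i for i in range(width) if (b >> i) & 1]


def carry_save_reduce(operands):
    """Step 2: apply 3:2 compression until only two operands remain."""
    ops = list(operands)
    while len(ops) > 2:
        x, y, z = ops.pop(), ops.pop(), ops.pop()
        sum_word = x ^ y ^ z                             # bitwise sum, carries ignored
        carry_word = ((x & y) | (x & z) | (y & z)) << 1  # carries, moved one bit left
        ops += [sum_word, carry_word]
    while len(ops) < 2:                                  # pad when fewer than 2 terms
        ops.append(0)
    return ops[0], ops[1]


def sop(a, b, c, d, e, f, width=8):
    """z = a*b + c*d + e + f, evaluated in the three synthesis steps."""
    terms = partial_products(a, b, width) + partial_products(c, d, width) + [e, f]
    x, y = carry_save_reduce(terms)
    return x + y                                         # Step 3: the final adder
```

The last line is the only place where a carry-propagating addition happens; that final adder is the part the rest of the talk is about.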
5
Motivation and Problem Statement
SOP blocks are widely used and computationally intensive
The final adder in an SOP accounts for about 30% to 40% of the delay of the SOP block. This paper focuses on the synthesis of an efficient final adder for an SOP expression
Stand-alone adder architectures do not work well in SOP
6
Stand-alone Adder Architectures
Frequently used adder architectures
  Ripple-Carry
    Area-efficient, but slow
    Timing-efficient if inputs have skewed arrival time
  Parallel-Prefix architecture (Brent-Kung, Kogge-Stone)
    Faster architecture
    Requires more area
  Carry-Select
    Large area overhead (often >100%)
    Better delay if the Cin signal arrives late
None of these are very suitable in Sum-of-Products. Why?
7
Special Arrival-time Property
The 2 operands of the final adder in a SOP exhibit a peculiar arrival time pattern
As a result, traditional monolithic adders do not work well in SOP
  Optimized for equal arrival times
Hence, hybrid adders are required, which exploit this arrival-time pattern
Hence it is critical to synthesize an efficient hybrid adder which is designed specifically for SOP blocks
Figure: Arrival Time versus Bit Number for the two final-adder inputs x and y.
8
Proposed 4-Stage Hybrid Adder
Figure: SubAdder1 (Ripple-Carry, width w1), SubAdder2 (Kogge-Stone, width w2), SubAdder3 (Carry-Select, width w3) and SubAdder4 (Carry-Select, width w4).
Ripple-Carry architecture near LSB
Fast Kogge-Stone architecture near Middle
2 Carry-Selects (based on Brent-Kung) near MSB
GOAL: Find w1, w2, w3 and w4 algorithmically
9
Notations
We use the following notations:
  The bit-width of SubAdder1 (Ripple) is w1 bits
  The bit-width of SubAdder2 (Kogge-Stone) is w2 bits
  The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is w3 bits
  The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is w4 bits
  w1 + w2 + w3 + w4 = n (total width of the hybrid adder)
  T(a_i) = Time when input signal a_i is available
  T(S_i) = Time when output signal S_i (Sum_i) is available
  T(C_i) = Time when output signal C_i (Carry_i) is available
10
SubAdder1 (Ripple-Carry)
Most area-efficient architecture
Very slow
Timing-efficient if input arrival time is skewed. We use it for a few bits near LSB (which arrive earliest)
Figure: chain of full adders (FA). Stage 0 adds x0 and y0 to give z0, stage 1 adds x1 and y1 to give z1, stage 2 adds x2 and y2 to give z2, and the last stage adds xk and yk to give zk and zk+1.
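A bit-level sketch of the full-adder chain, written in Python for illustration. It assumes bit lists stored LSB first; the function names are hypothetical. Each stage waits for the carry of the stage before it, which is why the structure is slow for equal arrival times but costs nothing extra when the lower bits arrive early anyway.

```python
def full_adder(x, y, cin):
    """One FA cell: returns (sum bit, carry-out bit)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def ripple_carry_add(x_bits, y_bits, cin=0):
    """Add two equal-length bit lists (LSB first).

    Returns the sum bits z0..zk with the final carry appended as the last bit.
    """
    carry = cin
    z = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)  # the carry ripples into the next stage
        z.append(s)
    z.append(carry)
    return z
```

For example, ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]) adds 11 and 6 and returns the bits of 17, LSB first.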
11
Parallel-Prefix Adders (KS, BK)
In a Parallel-Prefix adder, Carry for each bit is computed by an efficient tree-structure (using the Generate and Propagate concept).
For each bit i of the adder, Generate (G_i) indicates whether a carry is generated from that bit
  G_i = a_i AND b_i
For each bit i of the adder, Propagate (P_i) indicates whether a carry is propagated through that bit
  P_i = a_i XOR b_i
The Generate and Propagate concept is extendable to blocks comprising multiple bits, as we discuss next
12
Parallel-Prefix Adders (KS, BK)
If two blocks (comprising one or more bits) have the GP value-pairs as (G_left, P_left) and (G_right, P_right), then the combined block has the GP values as follows:
  G_left,right = G_left OR (P_left AND G_right)
  P_left,right = P_left AND P_right
The above computation is performed by a carry-operator or "o"-operator
Once we obtain carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND)
Figure: the "o"-operator takes (G_left, P_left) and (G_right, P_right) as inputs and produces (G_left,right, P_left,right).
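A minimal Python sketch of the Generate/Propagate pair and the "o"-operator. It assumes Propagate is the XOR of the two input bits (so that it can be reused for the sum) and a carry-in of 0; the prefixes are combined serially here only to show what the operator computes, not how a tree arranges it. All function names are hypothetical.

```python
def to_bits(value, width):
    """Helper: integer to a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def gp_bit(a, b):
    """Per-bit (Generate, Propagate) pair; Propagate assumed to be XOR."""
    return (a & b, a ^ b)


def o_operator(left, right):
    """Combine a more significant block (left) with a less significant one (right)."""
    g_left, p_left = left
    g_right, p_right = right
    return (g_left | (p_left & g_right), p_left & p_right)


def prefix_add(a_bits, b_bits):
    """Return (sum bits, carry-out) using GP pairs, with carry-in fixed at 0."""
    gp = [gp_bit(a, b) for a, b in zip(a_bits, b_bits)]
    carries = [0]                  # carry into bit 0
    block = None
    for pair in gp:                # serial prefix: block covers bits 0..i
        block = pair if block is None else o_operator(pair, block)
        carries.append(block[0])   # G of bits 0..i is the carry into bit i+1
    sums = [p ^ c for (_, p), c in zip(gp, carries)]
    return sums, carries[-1]
```

The sum here is formed as Propagate XOR carry-in, which is one common gate-level choice and may differ from the exact gates the slide refers to.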
13
SubAdder2 (Kogge-Stone)
Kogge-Stone Parallel prefix architecture
  Delay: log2(n) levels of "o"-operator
  Area: (n*log2(n))-n+1 "o"-operators
Figure: 8-bit Kogge-Stone prefix network with inputs GP7 to GP0 and carry outputs C8 to C1.
Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
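The level and operator counts quoted for Kogge-Stone can be checked with a small Python model of the prefix network. This is a sketch under assumptions: n is a power of two, the carry-in is 0, Propagate is taken as XOR, and the names kogge_stone and kogge_stone_add are hypothetical.

```python
def o_operator(left, right):
    """Carry operator on (G, P) pairs; left is the more significant block."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])


def kogge_stone(gp):
    """gp: per-bit (G, P) pairs, LSB first, length a power of two.

    Returns (prefixes, levels, operator_count); prefixes[i] covers bits 0..i.
    """
    n = len(gp)
    cur, levels, ops, dist = list(gp), 0, 0, 1
    while dist < n:
        nxt = list(cur)
        for i in range(dist, n):   # every bit i >= dist gets one operator per level
            nxt[i] = o_operator(cur[i], cur[i - dist])
            ops += 1
        cur, levels, dist = nxt, levels + 1, dist * 2
    return cur, levels, ops


def kogge_stone_add(a, b, n=8):
    """Unsigned n-bit addition built on the prefix network above."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    prefixes, _, _ = kogge_stone(gp)
    carries = [0] + [g for g, _ in prefixes]       # carries[i] is the carry into bit i
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i      # sum bit = P_i XOR C_i
    return total | (carries[n] << n)
```

For n = 8 the model uses 3 levels and 17 operators, and for n = 16 it uses 4 levels and 49 operators, matching log2(n) and (n*log2(n))-n+1.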
14
Brent-Kung (BK)
Brent-Kung Parallel prefix architecture
  Delay: (2*log2(n))-2 levels of "o"-operator
  Area: (2*n)-2-log2(n) "o"-operators
Figure: 8-bit Brent-Kung prefix network with inputs GP7 to GP0 and carry outputs C8 to C1.
Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
15
SubAdder3 & SubAdder4 (Carry-Select)
Figure: carry-select adder. Adder0 (carry-in 1'b0) and Adder1 (carry-in 1'b1) both add x and y, producing z0 and z1; a Mux controlled by cin selects the output z.
Large area overhead
Used as a special case, since Cin arrives late
Speed depends on the architecture of two adders
  But these adders need not be KS (rather, we use BK)
  The arrival times of the inputs of SubAdder3 and SubAdder4 are earlier than those for SubAdder2
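A behavioural Python sketch of the carry-select idea: both candidate sums are computed before the carry-in is known, and the late carry-in only drives the final selection. Python's built-in addition stands in for the two internal adders (Brent-Kung in the proposed design), and the function name is hypothetical.

```python
def carry_select_add(x, y, cin, width):
    """Return (sum, carry-out) of a width-bit carry-select stage."""
    mask = (1 << width) - 1
    z0 = (x & mask) + (y & mask)       # Adder0: carry-in tied to 1'b0
    z1 = (x & mask) + (y & mask) + 1   # Adder1: carry-in tied to 1'b1
    z = z1 if cin else z0              # Mux: the late-arriving cin picks a result
    return z & mask, z >> width
```

Because z0 and z1 are ready before cin arrives, the delay seen from cin is only that of the Mux; the price is the second adder, which is the area overhead noted on the slide.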
16
Determination of width of SubAdder1
Width of the Ripple adder (SubAdder1)
  At every bit (i), compute T(C_{i+1}) and check if
    T(C_{i+1}) ≤ T(a_{i+1})
    T(C_{i+1}) ≤ T(b_{i+1})
  If check passes, i = i+1
  Else continue checking until 3 consecutive bits fail the check (Hill Climbing)
  Return the value i as the Ripple Adder width
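One possible reading of this procedure, sketched in Python. Several details are assumptions that the slide does not state: the carry timing model T(C_{i+1}) = max(T(a_i), T(b_i), T(C_i)) + carry_delay, a fixed per-stage carry_delay, and taking the width as the number of bits up to the last passing check. The function and parameter names are hypothetical.

```python
def ripple_width(t_a, t_b, carry_delay=1.0, t_cin=0.0, max_fails=3):
    """Estimate the ripple-adder width from per-bit arrival times (LSB first)."""
    width = 0
    fails = 0
    t_carry = t_cin                                  # T(C_0)
    for i in range(len(t_a) - 1):
        # Assumed model: the carry out of bit i is ready one carry_delay after
        # the latest of its two inputs and its incoming carry.
        t_carry = max(t_a[i], t_b[i], t_carry) + carry_delay
        if t_carry <= t_a[i + 1] and t_carry <= t_b[i + 1]:
            width = i + 1                            # bits 0..i can ripple for free
            fails = 0
        else:
            fails += 1
            if fails == max_fails:                   # 3 consecutive failures: stop
                break
    return width
```

With arrival times [0, 2, 4, 6, 6, 6, 6, 6] on both inputs and a carry delay of 1, the carry keeps pace with the inputs for the first three bits and then falls behind, so the sketch returns 3.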
17
Determination of width of SubAdder2
Width of Kogge-Stone Adder (SubAdder2)
  The latest arriving signals are part of this adder
  Hence keep this adder wide, while ensuring that this does not result in a very narrow Carry-Select adder for SubAdder3 and SubAdder4
We determine the width with the following equation:
  w2 = n - w1, if (n - w1) ≤ 8
  w2 = 2^p, where p = floor(log2(n - w1)), if (n - w1) > 8
Example: If n=32 and w1=7 then w2=16
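The width rule is small enough to write out directly. This Python sketch assumes the logarithm is rounded down, which is what the example above implies (n - w1 = 25 gives w2 = 16); the function name is hypothetical.

```python
def kogge_stone_width(n, w1):
    """Width w2 of the Kogge-Stone sub-adder for total width n and ripple width w1."""
    remaining = n - w1
    if remaining <= 8:
        return remaining               # small remainder: Kogge-Stone takes all of it
    p = remaining.bit_length() - 1     # floor(log2(remaining)) for remaining >= 1
    return 1 << p                      # largest power of two not exceeding remaining
```

kogge_stone_width(32, 7) returns 16, leaving 9 bits to be split between the two Carry-Select sub-adders.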
18
Delay of the Hybrid Adder
Figure: the four sub-adders (SubAdder1 Ripple-Carry, width w1; SubAdder2 Kogge-Stone, width w2; SubAdder3 Carry-Select, width w3; SubAdder4 Carry-Select, width w4) with the output arrival times T(S2), T(S3), T(S4) and T(C4) marked.
Thybrid = max (T(C4), T(S4), T(S3), T(S2))
19
Determination of widths of SubAdder3 and SubAdder4
Width of the two Carry-Select adders
Initial width configuration
w3 = (n-w1-w2)/2
w4 = (n-w1-w2-w3)
With this initial configuration, estimate delay of the overall hybrid adder (based on the previous slide)
Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest delay configuration
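The slide does not spell out the iteration, so the following Python sketch is only one plausible shape for it: start from the even split, try moving the boundary in either direction, keep any move that lowers the estimated delay, and halve the step when neither direction helps. The delay estimator delay_of(w3, w4) is a hypothetical callback standing in for the hybrid-adder delay estimate, and the function name is also hypothetical.

```python
def split_carry_select(total, delay_of):
    """Split total = n - w1 - w2 bits into (w3, w4) with a step-halving search.

    delay_of(w3, w4) must return the estimated delay of the overall hybrid adder.
    Returns (w3, w4, best_delay).
    """
    w3 = total // 2                          # initial configuration
    best = delay_of(w3, total - w3)
    step = max(total // 4, 1)
    while step >= 1:
        moved = False
        for cand in (w3 - step, w3 + step):  # explore both directions
            if 0 < cand < total:
                d = delay_of(cand, total - cand)
                if d < best:                 # keep only strict improvements
                    w3, best, moved = cand, d, True
                    break
        if not moved:
            step //= 2                       # narrow the search, as in binary search
    return w3, total - w3, best
```

With a toy estimator such as lambda w3, w4: max(10 + 2 * w3, 4 + 3 * w4) and total = 9, the search stays at the initial split (4, 5) because no neighbouring split is faster. A search of this kind finds a local minimum, which is the global one only if the delay has a single minimum along the split.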
20
Experimental Setup
To test our approach, we used:
  Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
  Two process technologies (0.13µ and 0.09µ)
  Two commercial library vendors
  Two different arrival time constraints
We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.
21
Results
On average, 14.31% faster than the result of the commercial synthesis tool (with 6.62% area penalty)
Figure: worst-case delay (ps) for Adder-75, Adder-35, Adder-68, Adder-57, Adder-47, Adder-61 and Adder-89, comparing the adder produced by the commercial tool with our proposed adder.
22
Summary
Hybrid adder consists of 4 SubAdders
  SubAdder1 has Ripple-Carry architecture
  SubAdder2 has Kogge-Stone architecture
  SubAdder3 and SubAdder4 have Carry-Select (based on Brent-Kung) architecture
Widths of all SubAdders are computed based on a timing-driven analysis
On average, 14.31% faster (with 6.62% area penalty)
23
Thank you