
1

A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products

Sabyasachi Das, University of Colorado, Boulder

Sunil P. Khatri, Texas A&M University

2

What is a Sum-of-Product (SOP)

An arithmetic Sum-of-Product (SOP) block consists of an arbitrary number of product terms and sum terms.

General form of SOP:

[Figure: data-flow graph of an example SOP with inputs a, b, c, d, e and f, where p = a * b, q = c * d, and z = p + q + e + f]

)()( 2222211111 edcbaedcbaz

3

Examples of SOP Blocks

Multiplier {assign z = a * b}, found in microprocessors

Multiply-Accumulator {assign z = (a * b) + c}, found in cryptographic applications

Squarer {assign z = a * a}, found in DSP processors

Addition Tree {assign z = a + b + c + d}, found in ALUs and wireless applications

Generalized SOP {assign z = (a * b) + (c * d)}, found in FIR filters and IIR filters

4

Synthesis of Sum-of-Products

Synthesis of Sum-of-Product blocks is done in 3 steps (in the order of data-flow):

Creation of Partial Products

Reduction of Partial Products into 2 operands

Computation of the Final Sum by adding the 2 operands

[Figure: data-flow from Inputs through Creation of Partial Products, Reduction of Partial Products and Computation of Final Sum to Output]
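The three steps can be illustrated with a small behavioural sketch in Python. It is not the synthesis flow from the slides: the shift-and-AND partial products, the 3:2 carry-save compressor and the function names are assumptions chosen for illustration, and plain integer addition stands in for the final adder.

```python
def partial_products(a, b, width):
    """Step 1 (assumed scheme): one shifted copy of a per set bit of b."""
    return [a << i for i in range(width) if (b >> i) & 1]


def carry_save(x, y, z):
    """3:2 compressor applied bitwise, so that x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_to_two(operands):
    """Step 2: compress three operands into two until only two remain."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


def sop(a, b, c, d, e, f, width=8):
    """Evaluate z = (a * b) + (c * d) + e + f for unsigned inputs."""
    rows = partial_products(a, b, width) + partial_products(c, d, width) + [e, f]
    x, y = reduce_to_two(rows)
    return x + y  # Step 3: the final adder, the block this talk optimizes
```

For example, `sop(13, 11, 7, 9, 5, 3)` evaluates the same expression as `13 * 11 + 7 * 9 + 5 + 3`. The two values `x` and `y` returned by `reduce_to_two` correspond to the two operands of the final adder.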

5

Motivation and Problem Statement

SOP blocks are widely used and computationally intensive.

The final adder in an SOP consumes about 30% to 40% of the delay of the SOP block. This paper focuses on the synthesis of an efficient final adder for an SOP expression.

Stand-alone adder architectures do not work well in SOP.

6

Stand-alone Adder Architectures

Frequently used adder architectures:

Ripple-Carry: area-efficient, but slow; timing-efficient if inputs have skewed arrival times

Parallel-Prefix architecture (Brent-Kung, Kogge-Stone): faster architecture; requires more area

Carry-Select: large area overhead (often >100%); better delay if the $C_{in}$ signal arrives late

None of these are very suitable in Sum-of-Products. Why?

7

Special Arrival-time Property

The 2 operands of the final adder in an SOP exhibit a peculiar arrival-time pattern.

As a result, traditional monolithic adders, which are optimized for equal arrival times, do not work well in SOP.

Hence, hybrid adders are required, which exploit this arrival-time pattern.

Hence it is critical to synthesize an efficient hybrid adder which is designed specifically for SOP blocks.

[Figure: Arrival Time (axis from 200 to 1000) versus Bit Number (4 to 18); series: Arrival Times of input x, Arrival Times of input y]

8

Proposed 4-Stage Hybrid Adder

[Figure: the hybrid adder as four sub-adders: SubAdder1 (Ripple-Carry, width w1), SubAdder2 (Kogge-Stone, width w2), SubAdder3 (Carry-Select, width w3) and a fourth sub-adder of width w4]

Ripple-Carry architecture near the LSB

Fast Kogge-Stone architecture near the middle

2 Carry-Select adders (based on Brent-Kung) near the MSB

GOAL: Find $w_1$, $w_2$, $w_3$ and $w_4$ algorithmically

9

Notations

We use the following notations:

The bit-width of SubAdder1 (Ripple) is $w_1$ bits

The bit-width of SubAdder2 (Kogge-Stone) is $w_2$ bits

The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is $w_3$ bits

The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is $w_4$ bits

$w_1 + w_2 + w_3 + w_4 = n$ (total width of the hybrid adder)

$T(a_i)$ = time when input signal $a_i$ is available

$T(S_i)$ = time when output signal $S_i$ (Sum $i$) is available

$T(C_i)$ = time when output signal $C_i$ (Carry $i$) is available

10

SubAdder1 (Ripple-Carry)

Most area-efficient architecture

Very slow

Timing-efficient if input arrival time is skewed. We use it for a few bits near the LSB (which arrive earliest).

[Figure: ripple-carry chain of full adders (FA); stage i takes inputs x_i and y_i and produces sum bit z_i for i = 0, 1, 2, ..., k, and the last stage also produces z_{k+1}]

11

Parallel-Prefix Adders (KS, BK)

In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).

For each bit i of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit: $G_i = a_i \cdot b_i$

For each bit i of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit: $P_i = a_i \oplus b_i$

The Generate and Propagate concept is extendable to blocks comprising multiple bits, as we discuss next.

12

Parallel-Prefix Adders (KS, BK)

If two blocks (each comprising one or more bits) have the GP value-pairs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, then the combined block has the following GP values:

$G_{left,right} = G_{left} + (P_{left} \cdot G_{right})$

$P_{left,right} = P_{left} \cdot P_{right}$

The above computation is performed by a carry-operator, or "o"-operator.

Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND).

[Figure: the "o"-operator combining $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$ into $(G_{left,right}, P_{left,right})$]
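The Generate/Propagate pair and the "o"-operator can be written down directly. The sketch below is only illustrative: the function names are hypothetical, the propagate signal is taken as XOR, the carry-in is assumed to be 0, and the operator is applied serially from the LSB rather than in a tree.

```python
def gp(a_bit, b_bit):
    """Per-bit (Generate, Propagate) pair; Propagate is taken as XOR here."""
    return a_bit & b_bit, a_bit ^ b_bit


def o(left, right):
    """Carry operator: combine a more significant block with a less significant one."""
    g_left, p_left = left
    g_right, p_right = right
    return g_left | (p_left & g_right), p_left & p_right


def carries(a, b, n):
    """Return [C_1, ..., C_n] for n-bit operands, assuming carry-in 0."""
    out = []
    acc = None  # GP pair of the block spanning bits i..0
    for i in range(n):
        bit = gp((a >> i) & 1, (b >> i) & 1)
        acc = bit if acc is None else o(bit, acc)
        out.append(acc[0])  # the block Generate is the carry out of bit i
    return out
```

Because the operator is associative, the same pairs can be combined in any tree order. The parallel-prefix architectures differ only in how they arrange these operators.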

13

SubAdder2 (Kogge-Stone)

Kogge-Stone parallel-prefix architecture

Delay: $\log_2 n$ levels of "o"-operators

Area: $(n \log_2 n) - n + 1$ "o"-operators

[Figure: 8-bit Kogge-Stone prefix network taking GP0 to GP7 as inputs and producing carries C1 to C8]

Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
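The level and operator counts quoted above can be checked on a small behavioural model. This is a sketch under assumptions, not the synthesized netlist: the function name is hypothetical, the carry-in is taken as 0, and the closed-form counts are only expected to match when n is a power of two.

```python
def kogge_stone_carries(a, b, n):
    """Return ([C_1..C_n], operator_count, level_count) for n-bit a + b, carry-in 0."""
    # Per-bit (G, P) pairs, LSB first.
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    ops = 0
    levels = 0
    dist = 1
    while dist < n:
        nxt = list(gp)
        for i in range(dist, n):
            g_left, p_left = gp[i]
            g_right, p_right = gp[i - dist]
            nxt[i] = (g_left | (p_left & g_right), p_left & p_right)
            ops += 1  # one "o"-operator per combined position
        gp = nxt
        levels += 1
        dist *= 2
    return [g for g, _ in gp], ops, levels
```

For n = 8 the model uses 7 + 6 + 4 = 17 operators in 3 levels, which agrees with $(n \log_2 n) - n + 1$ and $\log_2 n$.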

14

Brent-Kung (BK)

Brent-Kung parallel-prefix architecture

Delay: $(2 \log_2 n) - 2$ levels of "o"-operators

Area: $(2n) - 2 - \log_2 n$ "o"-operators

[Figure: 8-bit Brent-Kung prefix network taking GP0 to GP7 as inputs and producing carries C1 to C8]

Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
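The operator count can be checked in the same way on a behavioural model of a Brent-Kung style network. The sketch assumes n is a power of two and a carry-in of 0. The up-sweep/down-sweep arrangement and the function name are one common textbook formulation chosen for illustration, not necessarily the exact network in the figure.

```python
def brent_kung_carries(a, b, n):
    """Return ([C_1..C_n], operator_count) for n-bit a + b, n a power of two."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    ops = 0

    def combine(i, j):
        # gp[i] <- gp[i] o gp[j], where j is the less significant block
        nonlocal ops
        g_left, p_left = gp[i]
        g_right, p_right = gp[j]
        gp[i] = (g_left | (p_left & g_right), p_left & p_right)
        ops += 1

    # Up-sweep: build block prefixes of width 2, 4, 8, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2

    # Down-sweep: fill in the remaining prefixes from the completed ones.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2

    return [g for g, _ in gp], ops
```

For n = 8 this uses 7 operators in the up-sweep and 4 in the down-sweep, 11 in total, which agrees with $(2n) - 2 - \log_2 n$ and is fewer than the 17 of Kogge-Stone at the same width.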

15

SubAdder3 & SubAdder4 (Carry-Select)

[Figure: carry-select adder; Adder0 adds x and y with carry-in 1'b0 to produce z0, Adder1 adds x and y with carry-in 1'b1 to produce z1, and a Mux controlled by cin selects z from z0 and z1]

Large area overhead

Used as a special case, since $C_{in}$ arrives late

Speed depends on the architecture of the two adders

But these adders need not be KS (rather, we use BK)

The arrival times of the inputs of SubAdder3 and SubAdder4 are earlier than those for SubAdder2
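A behavioural sketch of the carry-select idea follows. The function name is hypothetical, and Python integer addition stands in for the two internal adders, which the slides say are Brent-Kung.

```python
def carry_select_add(x, y, cin, width):
    """Return (sum, carry_out) of a width-bit carry-select addition."""
    mask = (1 << width) - 1
    z0 = (x & mask) + (y & mask)      # Adder0: computed with carry-in 1'b0
    z1 = (x & mask) + (y & mask) + 1  # Adder1: computed with carry-in 1'b1
    z = z1 if cin else z0             # Mux: the late-arriving cin only selects
    return z & mask, z >> width
```

Both candidate results are ready before `cin` arrives, so the late carry only passes through the multiplexer. The cost is the duplicated adder, which is the large area overhead noted above.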

16

Determination of width of SubAdder1

Width of the Ripple adder (SubAdder1):

At every bit i, compute $T(C_{i+1})$ and check if

$T(C_{i+1}) \le T(a_{i+1})$

$T(C_{i+1}) \le T(b_{i+1})$

If the check passes, i = i+1

Else continue checking until 3 consecutive bits fail the check (Hill Climbing)

Return the value i as the Ripple Adder width
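One possible reading of this procedure is sketched below. Several details are assumptions rather than statements from the slides: the carry delay model $T(C_{i+1}) = \max(T(a_i), T(b_i), T(C_i)) + d$ with a single full-adder delay `fa_delay`, a carry-in available at time 0, the requirement that both inequalities hold, and the convention that the returned width is the number of bits up to the last passing check. The function name is hypothetical.

```python
def ripple_width(t_a, t_b, fa_delay, max_fail=3):
    """Estimate w1 from per-bit arrival times t_a[i] = T(a_i), t_b[i] = T(b_i)."""
    n = len(t_a)
    t_c = 0.0   # T(C_0): carry-in assumed available at time 0
    width = 0   # width after the last bit that passed the check
    fails = 0   # consecutive failures seen so far
    for i in range(n - 1):
        # Assumed delay model for the carry out of bit i.
        t_c = max(t_a[i], t_b[i], t_c) + fa_delay
        if t_c <= t_a[i + 1] and t_c <= t_b[i + 1]:
            width = i + 1
            fails = 0
        else:
            fails += 1
            if fails == max_fail:
                break  # three consecutive failures: stop climbing
    return width
```

With arrival times `[0, 100, 200, 300, 400, 450, 460, 470, 480]` on both inputs and `fa_delay=60`, the carry keeps up with the inputs for the first four checks and then falls behind for three bits in a row, so the sketch returns 4.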

17

Determination of width of SubAdder2

Width of the Kogge-Stone adder (SubAdder2):

The latest-arriving signals are part of this adder

Hence keep this adder wide, while ensuring that this does not result in very narrow Carry-Select adders for SubAdder3 and SubAdder4

We determine the width with the following equation:

$w_2 = n - w_1$, if $(n - w_1) \le 8$

$w_2 = 2^p$, where $p = \lfloor \log_2 (n - w_1) \rfloor$, if $(n - w_1) > 8$

Example: If $n = 32$ and $w_1 = 7$, then $w_2 = 16$
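The rule is small enough to state as a function. This sketch assumes that $\log_2$ is rounded down, which is the reading consistent with the worked example. The function name is hypothetical.

```python
def kogge_stone_width(n, w1):
    """Width w2 of SubAdder2 given total width n and ripple width w1."""
    remaining = n - w1
    if remaining <= 8:
        return remaining
    p = remaining.bit_length() - 1  # floor(log2(remaining)) for positive integers
    return 2 ** p
```

`kogge_stone_width(32, 7)` gives 16, matching the example: 25 bits remain after the ripple adder, and the largest power of two not exceeding 25 is 16.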

18

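Delay of the Hybrid Adder

[Figure: the four sub-adders (SubAdder1 Ripple-Carry of width w1, SubAdder2 Kogge-Stone of width w2, and two further sub-adders of widths w3 and w4) annotated with the output arrival times T(S2), T(S3), T(S4) and T(C4)]

$T_{hybrid} = \max(T(C_4), T(S_4), T(S_3), T(S_2))$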

19

Determination of widths of SubAdder3 and SubAdder4

Width of the two Carry-Select adders:

Initial width configuration:

$w_3 = (n - w_1 - w_2)/2$

$w_4 = n - w_1 - w_2 - w_3$

With this initial configuration, estimate the delay of the overall hybrid adder (based on the previous slide)

Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest-delay configuration
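The slides do not spell out the iteration, so the following is only one way such a search could look. It starts from the even split, tries moving the boundary by a step in each direction, keeps any move that lowers the estimated delay, and halves the step when neither direction helps. The callable `estimate_delay(w3, w4)` is a hypothetical stand-in for a timing estimate of the whole hybrid adder. The integer division in the initial split and the step schedule are assumptions.

```python
def split_carry_select(n, w1, w2, estimate_delay):
    """Search for (w3, w4) with the smallest estimated delay; returns (w3, w4, delay)."""
    rest = n - w1 - w2
    w3 = rest // 2                      # initial configuration
    best = estimate_delay(w3, rest - w3)
    step = max(rest // 4, 1)
    while step >= 1:
        improved = False
        for cand in (w3 - step, w3 + step):
            if 0 <= cand <= rest:
                delay = estimate_delay(cand, rest - cand)
                if delay < best:
                    w3, best, improved = cand, delay, True
                    break               # re-centre the search on the new split
        if not improved:
            step //= 2                  # narrow the search, as in binary search
    return w3, rest - w3, best
```

With a made-up estimator such as `lambda w3, w4: max(100 + 10 * w3, 60 + 30 * w4)` and `n=32, w1=7, w2=16`, the search moves from the initial split (4, 5) to (6, 3). Like any local search, it can stop at a local minimum if the delay is not unimodal in the split.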

20

Experimental Setup

To test our approach, we used:

Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)

Two process technologies (0.13µ and 0.09µ)

Two commercial library vendors

Two different arrival-time constraints

We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.

21

Results

On average, 14.31% faster than the result of the commercial synthesis tool (with 6.62% area penalty)

[Figure: bar chart of Worst-case Delay (ps, axis from 0 to 1400) by Name of the Adder for Adder-75, Adder-35, Adder-68, Adder-57, Adder-47, Adder-61 and Adder-89; series: Delay of the Adder Produced by Commercial Tool, Delay of Our Proposed Adder]

22

Summary

Hybrid adder consists of 4 SubAdders

SubAdder1 has Ripple-Carry architecture

SubAdder2 has Kogge-Stone architecture

SubAdder3 and SubAdder4 have Carry-Select (based on Brent-Kung) architecture

Widths of all SubAdders are computed based on a timing-driven analysis

On average, 14.31% faster (with 6.62% area penalty)

23

Thank you

Date post:	19-Dec-2015