Performance Analysis of Parallel Prefix Adder Based on FPGA

Post on 28-Mar-2023


Avinash Shrivastava, Student, M.E. (VLSI Design), Department of Electronics & Communication, SSCET, CSVTU Bhilai, Bhilai (C.G.), India
avinashshrivastava33@gmail.com

Chandrahas Sahu, Asst. Prof., Department of Electronics & Communication, SSCET, CSVTU Bhilai, Bhilai (C.G.), India
Chandrahas_1981@yahoo.com

Abstract: Parallel-prefix structures (also known as carry trees) are common in high-performance adders in very large scale integration (VLSI) designs because their delay is logarithmically proportional to the adder width. Such structures can usually be divided into three basic stages: pre-computation, prefix tree and post-computation. However, this performance advantage does not translate directly into FPGA implementations because of constraints on logic block configurations and routing overhead. This paper investigates six types of carry-tree adders (the Kogge-Stone, Brent-Kung, Han-Carlson, Ladner-Fischer, Sklansky and Harris adders) and compares them to the simple Ripple Carry Adder (RCA). The designs are written in the Verilog hardware description language using the Xilinx Integrated Software Environment (ISE) 13.2 design suite and implemented on Xilinx Spartan 6, Spartan 6 low power, Virtex 6 and Virtex 6 low power Field Programmable Gate Arrays (FPGAs). Delay, area and power are measured using XPower Analyzer 13.2, and the slice utilization, number of logic levels required and delay of all the adders are compared.

Keywords: parallel prefix adders, carry tree adders, FPGA, logic analyzer, delay, power

Introduction

Addition is a fundamental operation for any digital system, digital signal processing or control system. The fast and accurate operation of a digital system is greatly influenced by the performance of its resident adders. Adders are also very important components in digital systems because of their extensive use in other basic digital operations such as subtraction, multiplication and division. Hence, improving the performance of the digital adder would greatly advance the execution of binary operations inside a circuit composed of such blocks. The performance of a digital circuit block is gauged by analyzing its power dissipation, layout area and operating speed.

The Parallel Prefix Adder (PPA) is very useful in today's world of technology because of its implementation in Very Large Scale Integration (VLSI) chips. VLSI chips rely heavily on fast and reliable arithmetic computation, which PPAs can provide. There are many types of PPA, such as Kogge-Stone [1], Brent-Kung [2], Ladner-Fischer [3], Han-Carlson [4], Knowles [5] and Harris. For the purpose of this research only Brent-Kung and Kogge-Stone adders will be investigated. Fig. 1 shows the structure of a PPA. A PPA can be divided into three main parts, namely pre-processing, the carry graph and post-processing. The pre-processing part generates the propagate (p) and generate (g) bits. The way the carry bits are obtained is what differentiates a PPA from other types of adders: the carries are computed in parallel, which makes the addition faster.

In this paper the practical issues involved in designing and implementing tree-based adders on FPGAs are described. An efficient testing strategy for evaluating the performance of these adders is discussed. Several tree-based adder structures are implemented and characterized on an FPGA and compared with the Ripple Carry Adder (RCA).

Types of adders

• Ripple carry adder or carry propagate adder
• Carry look-ahead adder
• Carry skip adder
• Manchester chain adder
• Carry select adders
• Prefix adders
• Multi-operand adder
• Carry save adder
• Pipelined parallel adder

Parallel Adders

Parallel adders are digital circuits that compute the addition of variable binary strings of equivalent or different size in parallel. The schematic diagram of a parallel adder is shown below in Fig. 1.

Fig. 1 Parallel Adder

DRAWBACKS OF RIPPLE CARRY AND CARRY LOOKAHEAD ADDER

Fig. 2 4-bit ripple carry adder

In Fig. 2 the first sum bit must wait until the input carry is given, the second sum bit must wait until the previous carry has propagated, and so on. Finally, the output sum must wait until all previous carries are generated, which results in delay.

To reduce the delay of the RCA, that is, to propagate the carry in advance, the carry look-ahead adder (CLA) is used. This adder works on two signals called propagate and generate. The propagate and generate equations are given by

$P_i = A_i \oplus B_i$ (1)

$G_i = A_i B_i$ (2)

For a 4-bit CLA the carry equations are given as

$C_1 = G_0 + P_0 C_0$ (3)

$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$ (4)

$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$ (5)

$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$ (6)

Equations (3) to (6) show that the carry logic becomes more complex as the adder bit width increases, so designing a wide CLA becomes difficult. The equations assume gates with unbounded fan-in, whereas a wide adder has to be built from gates with bounded fan-in. To compute the carries in advance without this delay and complexity, the parallel-prefix approach is used.
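The look-ahead carries can be checked against a ripple carry chain with a few lines of Python. This is only an illustrative sketch: the function names `cla_carries`, `ripple_carries` and `to_bits` are assumptions made for this example and are not part of the paper's Verilog designs.

```python
def to_bits(x, width=4):
    """LSB-first list of the low `width` bits of x (helper assumed for this sketch)."""
    return [(x >> i) & 1 for i in range(width)]

def cla_carries(a_bits, b_bits, c0):
    """Return [C1, C2, C3, C4] of a 4-bit CLA as flat sum-of-products terms."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate, Eq. (1)
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate, Eq. (2)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def ripple_carries(a_bits, b_bits, c0):
    """Same carries, computed one bit after another as in a ripple carry adder."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        carries.append(c)
    return carries

print(cla_carries(to_bits(0b1011), to_bits(0b0110), 0))     # [0, 1, 1, 1]
print(ripple_carries(to_bits(0b1011), to_bits(0b0110), 0))  # [0, 1, 1, 1]
```

The widest product term grows by one literal per bit, which is the fan-in growth the paragraph above refers to.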

DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS

The pre-computed generate and propagate signals of the PPA are presented in [2]. These signals are combined using the fundamental carry operator (fco), as in [3]. The fundamental carry operator is denoted by the symbol "o".

PARALLEL PREFIX ADDER STRUCTURE

Parallel-prefix structures are common in high-performance adders because their delay is logarithmically proportional to the adder width [2]. A PPA basically consists of 3 stages:
• Pre computation
• Prefix stage
• Final computation

Pre computation
In the pre-computation stage, propagates and generates are computed for the given inputs using equations (1) and (2).

Prefix stage
In the prefix stage, group generate/propagate signals are computed at each bit using the equations below. The black cell (BC) generates the ordered pair $(G_{i:k}, P_{i:k})$ of equations (10) and (11); the gray cell (GC) generates only the left signal $G_{i:k}$, following [2].

$G_{i:k} = G_{i:j} + P_{i:j} G_{j-1:k}$ (10)

$P_{i:k} = P_{i:j} P_{j-1:k}$ (11)

More practically, equations (10) and (11) can be expressed using the symbol "o" introduced by Brent and Kung. Its function is exactly the same as that of a black cell, i.e.

$(G_{i:k}, P_{i:k}) = (G_{i:j}, P_{i:j}) \, o \, (G_{j-1:k}, P_{j-1:k})$ (12)

The "o" operation helps in stating the rules for building prefix structures.
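The operator is small enough to model directly. The sketch below is illustrative only: `black_cell` and `gray_cell` are names chosen here, and (G, P) pairs are represented as Python tuples of 0/1 integers. The exhaustive check shows that "o" is associative, which is the property that allows the generate/propagate signals to be grouped in different orders, and that it is not commutative.

```python
from itertools import product

def black_cell(left, right):
    """(G, P) o (G', P') = (G + P.G', P.P') on tuples of 0/1 values."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)

def gray_cell(left, right_g):
    """Gray cell: only the group generate is produced."""
    g_l, p_l = left
    return g_l | (p_l & right_g)

pairs = list(product((0, 1), repeat=2))   # all four (G, P) combinations

# Associativity: (x o y) o z == x o (y o z) for every combination.
associative = all(
    black_cell(black_cell(x, y), z) == black_cell(x, black_cell(y, z))
    for x in pairs for y in pairs for z in pairs
)
print(associative)                        # True

# The operator is not commutative: operand order matters.
print(black_cell((0, 0), (1, 1)))         # (0, 0)
print(black_cell((1, 1), (0, 0)))         # (1, 0)
```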

Fig. 3 Parallel-Prefix Structure with carry-save notation

Fig. 4 Black and Gray Cell logic definitions

Final computation
In the final computation, the sum and carry-out are the final outputs:

$S_i = P_i \oplus G_{i-1:-1}$ (12)

$C_{out} = G_{n-1:-1}$ (13)

where "-1" is the position of the carry input. The generate/propagate signals can be grouped in different ways to get the same correct carries. Based on the different ways of grouping the generate/propagate signals, different prefix architectures can be created. Figure 4 shows the definitions of the cells that are used in prefix structures, including the BC and GC. For an analysis of various parallel-prefix structures, see [2], [3] and [4].
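The three stages can be put together in a short behavioural model. The sketch below is an assumption-laden illustration, not the paper's design: the name `prefix_add` is hypothetical, and the prefix stage is the simplest possible grouping (a serial chain of gray cells), chosen only to make the roles of the three stages visible. Any of the prefix trees discussed later would replace only the middle stage.

```python
def prefix_add(a, b, cin=0, n=16):
    """n-bit addition split into pre-computation, prefix stage, final computation."""
    # Pre-computation: per-bit propagate and generate, Eqs. (1) and (2).
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]

    # Prefix stage: carry[i] holds G_{i-1:-1}, the carry into bit i.
    # carry[0] is the carry input at position -1.
    carry = [cin]
    for i in range(n):
        carry.append(g[i] | (p[i] & carry[i]))   # gray cell

    # Final computation: S_i = P_i xor G_{i-1:-1}, Cout = G_{n-1:-1}.
    s = sum((p[i] ^ carry[i]) << i for i in range(n))
    cout = carry[n]
    return s, cout

print(prefix_add(40000, 30000, 1))   # (4465, 1), i.e. 70001 = 65536 + 4465
```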

In the prefix tree, group generate/propagate signals are computed at each bit:

$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$ (14)

$P_{i:k} = P_{i:j} \cdot P_{j-1:k}$ (15)

KOGGE-STONE PREFIX TREE
The Kogge-Stone prefix tree is among the types of prefix trees that use the fewest logic levels. A 16-bit example is shown in Figure 5. In fact, Kogge-Stone is a member of the Knowles prefix trees. The 16-bit prefix tree can be viewed as Knowles [1, 1, 1, 1]. The numbers in the brackets represent the maximum branch fan-out at each logic level. The maximum fan-out is 2 at all logic levels for Kogge-Stone prefix trees of any width.

The key to building a prefix tree is how to implement Equation (15) according to the special features of that type of prefix tree and to apply the rules described in the previous section. Gray cells are inserted like black cells, except that the gray cells produce final carry outputs instead of intermediate G/P groups. The reason for starting with the Kogge-Stone prefix tree is that it is the easiest to build as a program. The example in Figure 5 is a 16-bit (a power of 2) prefix tree. It is not difficult to extend the structure to any width if the basics are strictly followed.

For the Kogge-Stone prefix tree, at logic level 1 the input span is 1 bit (e.g. group (4:3) takes the inputs at bit 4 and bit 3). Group (4:3) is taken as an input and combined with group (6:5) to generate group (6:3) at logic level 2. Group (6:3) is taken as an input and combined with group (10:7) to generate group (10:3) at logic level 3, and so on. With this observation, the structure can be described with Algorithm 6.1 listed below.

Figure 5 16-bit Kogge-Stone Prefix Tree

In Algorithm 6.1 the number of logic levels is calculated first. At each logic level the maximum input bit span and maximum output bit span are computed. Equation (15) is applied in the inner loop, where the bit index i starts at v - 1 and stays below n - 1. If any subscript goes below -1, its value stays at -1, which means that no group crosses over bit -1.

Algorithm 6.1 Building Kogge-Stone Prefix Tree
L = log2(n)
for llevel = 1; llevel <= L; llevel++ do
    u = 2^llevel          {output bit span}
    v = 2^(llevel-1)      {input bit span}
    for i = v-1; i < n-1; i++ do
        GP(i : i-u+1) = GP(i : i-v+1) o GP(i-v : i-u+1)
    end for
end for
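The loop structure of the Kogge-Stone algorithm can be tried out in Python. The sketch below is one possible reading of the pseudo-code, under stated assumptions: the n inputs of the tree sit at positions -1 to n-2 (position -1 holding the carry-in), groups are stored in a dictionary keyed by (upper bit, lower bit), and the names `op`, `kogge_stone` and `add_with_kogge_stone` are invented for this example.

```python
from math import log2

def op(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def kogge_stone(g, p):
    """Build the tree for n = len(g) inputs at positions -1 .. n-2.

    g[k], p[k] belong to position k-1, so list index 0 is the carry-in slot.
    Returns (carries, cells) with carries[k] = G_{k-1:-1}.
    """
    n = len(g)
    levels = int(log2(n))
    gp = {(i, i): (g[i + 1], p[i + 1]) for i in range(-1, n - 1)}
    cells = 0
    for llevel in range(1, levels + 1):
        u = 2 ** llevel            # output bit span
        v = 2 ** (llevel - 1)      # input bit span
        for i in range(v - 1, n - 1):
            lo = max(i - u + 1, -1)    # subscripts below -1 stay at -1
            gp[(i, lo)] = op(gp[(i, i - v + 1)], gp[(i - v, lo)])
            cells += 1
    return [gp[(k - 1, -1)][0] for k in range(n)], cells

def add_with_kogge_stone(a, b, cin, n=16):
    """Add two (n-1)-bit operands; position -1 of the tree carries cin."""
    width = n - 1
    g = [cin] + [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [0] + [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    carry, cells = kogge_stone(g, p)
    total = sum((p[i + 1] ^ carry[i]) << i for i in range(width))
    return total | (carry[width] << width), cells

print(add_with_kogge_stone(12345, 23456, 1))   # (35802, 49)
```

The second value returned is the number of cells placed by the loops for a 16-input tree.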


Table 1 Verifying the Pseudo-Code of Building a Kogge-Stone Prefix Tree

The statement in the inner for loop applies Equation (15). The validity of this implementation can be verified by looking at Table 1. In the table, one group operation is randomly selected at each logic level. Other operations can be verified by inserting the numbers as listed in Table 1. The term $GP_{i:i} = GP_i$, and the LSB boundary of the inputs/outputs is bit -1. Table 1 can also be matched against Figure 5 to see the correspondence.

The pseudo-code is a simplified version of the exact program. In the real program, the code should tell where the black cells and gray cells are. The program also needs control so that the LSB never goes beyond -1, and it can make use of optional buffers. In Figure 5 there are fan-outs of more than 2 because the structure is not buffered. Figure 39 shows a buffered 16-bit prefix tree; however, the exact number of buffers depends on the capacitance and resistance of the interconnect network [46]. Both figures indicate a wire track of 8.

The algorithmic delay is simply the number of logic levels. The area can be estimated as the number of cells in the prefix tree. To simplify the calculation, all cells are counted as black cells. To understand this simplification, remember that the number of gray cells always equals n - 1, since the prefix tree only outputs n - 1 carries. A black cell has one more AND gate than a gray cell, and therefore a more accurate area estimate would just subtract those n - 1 AND gates.

The number of cells for a Kogge-Stone prefix tree can be counted as follows. Each logic level has n - m cells, where m = 2^(llevel-1). That is, each logic level is missing m cells. The total number of missing cells is the sum of a geometric series running from 1 to n/2, which totals n - 1. The total number of cells is therefore n log2(n) minus the cells missing at each logic level, which gives n log2(n) - n + 1. When n = 16, the area is estimated as 49.
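The count can be reproduced level by level. This is a small sketch with a helper name, `kogge_stone_cells_per_level`, made up for the purpose; it only evaluates the n - m expression from the paragraph above.

```python
from math import log2

def kogge_stone_cells_per_level(n):
    """Cells at each logic level: n - m with m = 2^(llevel-1)."""
    levels = int(log2(n))
    return [n - 2 ** (llevel - 1) for llevel in range(1, levels + 1)]

per_level = kogge_stone_cells_per_level(16)
missing = [16 - cells for cells in per_level]

print(per_level, sum(per_level))   # [15, 14, 12, 8] 49
print(missing, sum(missing))       # [1, 2, 4, 8] 15
```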

BRENT-KUNG PREFIX TREE
The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. The fan-out is among the minimum, as f = 0. So are the wire tracks, where t = 0. The cost is the extra L - 1 logic levels. A 16-bit example is shown in Figure 6. The critical path is shown in the figure with a thick gray line.

The Brent-Kung prefix tree is a bit complex to build because it has the most logic levels. To build such a structure, the pseudo-code can be composed as Algorithm 8.1.

Algorithm 8.1 Building Brent-Kung Prefix Tree
L = log2(n)
for llevel = 1; llevel <= L; llevel++ do
    u = 2^llevel          {output bit span}
    v = 2^(llevel-1)      {input bit span}
    for i = u-2; i < n-1; i += u do
        GP(i : i-u+1) = GP(i : i-v+1) o GP(i-v : i-u+1)
    end for
end for
for llevel = L-1; llevel >= 1; llevel-- do
    u = 2^llevel          {output bit span}
    v = 2^(llevel-1)      {input bit span}
    for i = u+v-2; i < n-1; i += u do
        GP(i : -1) = GP(i : i-v+1) o GP(i-v : -1)
    end for
end for

The algorithm deals with this prefix tree in 2 major for loops. The first for loop handles the logic levels one by one from 1 up to L, and the second for loop handles the remaining L - 1 logic levels in a decremental fashion. Figure 6 can be divided into the top 4 logic levels and the bottom 3 logic levels. The structure starts with cells every 2 bits. The input span is 1 bit and the output span is 2 bits. At logic levels 2 and 3 the distance between cells is 4 and 8 bits respectively. The input/output span is 2/4 bits at logic level 2 and 4/8 bits at logic level 3. At logic level 4 the only cell is at the MSB (bit 14 from the input), with the input spanning 8 bits and the output spanning 16 bits. By logic level 4 some carries are already generated. At logic levels 5 through 7 the input bit span is decremented instead of being incremented as in the previous levels. The input bit spans at logic levels 5, 6 and 7 are 4, 2 and 1 bit respectively.

The term output span no longer applies to these L - 1 levels, since all the outputs are final carries of the form $G_{i:-1}$.

Figure 6 16-bit Brent-Kung Prefix Tree

The delay is estimated as the number of logic levels, i.e. 2L - 1 with L = log2 n. The total number of cells can be calculated in the following way. In the first log2 n logic levels the number of cells is a geometric series. For example, in the 16-bit prefix tree, logic levels 1 through 4 have 8, 4, 2 and 1 cells respectively. The sum of this series is n - 1. In the remaining logic levels there exist only gray cells. The total number of gray cells is n - 1 for any prefix tree, as mentioned before. However, the first log2 n logic levels already contain log2 n gray cells. The total number of cells is therefore 2(n - 1) - log2 n. When n = 16, the number of cells required is 26.
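The two for loops of the Brent-Kung pseudo-code can be exercised in the same way. The sketch below is an interpretation under assumptions: inputs occupy positions -1 to n-2 as in the text (MSB at bit 14 for n = 16), groups are stored in a dictionary keyed by (upper bit, lower bit), and the function names `op` and `brent_kung` are hypothetical.

```python
from math import log2

def op(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def brent_kung(g, p):
    """Return (carries, cells, levels) for n = len(g) inputs at positions -1 .. n-2.

    g[k], p[k] belong to position k-1; carries[k] = G_{k-1:-1}.
    """
    n = len(g)
    L = int(log2(n))
    gp = {(i, i): (g[i + 1], p[i + 1]) for i in range(-1, n - 1)}
    cells = 0
    # First loop: spans double at every level, cells thin out.
    for llevel in range(1, L + 1):
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(u - 2, n - 1, u):
            lo = i - u + 1
            gp[(i, lo)] = op(gp[(i, i - v + 1)], gp[(i - v, lo)])
            cells += 1
    # Second loop: remaining L-1 levels fill in the missing carries.
    for llevel in range(L - 1, 0, -1):
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(u + v - 2, n - 1, u):
            gp[(i, -1)] = op(gp[(i, i - v + 1)], gp[(i - v, -1)])
            cells += 1
    carries = [gp[(k - 1, -1)][0] for k in range(n)]
    return carries, cells, 2 * L - 1

# Example inputs: generate/propagate bits for positions -1 .. 14.
g = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
p = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
carries, cells, levels = brent_kung(g, p)
print(cells, levels)   # 26 7
```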

SKLANSKY PREFIX TREE
The Sklansky prefix tree takes the fewest logic levels to compute the carries. In addition, it uses fewer cells than the Knowles [2, 1, 1, 1] and Kogge-Stone structures, at the cost of higher fan-out. Figure 7 shows the 16-bit example of the Sklansky prefix tree with the critical path drawn as a solid line.

For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e. f = 3). The structure can be viewed as a compacted version of Brent-Kung's, where the logic levels are reduced and the fan-out is increased. Pseudo-code similar to that listed for the Brent-Kung prefix tree can be used to generate a Sklansky prefix tree. The maximum input span is still a power of 2 related to the number of logic levels. The difference is that one more for loop is required to account for the multiple fan-out (e.g. at logic levels 2 through 4 in Figure 7, where the cells are placed in groups of 2, 4 and 8 respectively).

The number of logic levels is log2 n. Each logic level has n/2 cells, as can be observed in Figure 7. The area is estimated as (n/2) log2 n. When n = 16, 32 cells are required.
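A divide-and-conquer reading of the Sklansky structure can be sketched as follows. The code is an illustration written for this section, not the program behind the figures: `sklansky` is a hypothetical name, list index k stands for position k-1, and fan-out is counted as the number of cells reading a node plus one for the node's own downward connection, which is an assumption chosen to match the fan-out of 2 quoted for Kogge-Stone.

```python
from math import log2

def op(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def sklansky(g, p):
    """Return (carries, cells, max_fanout); carries[k] is G over list indices 0..k."""
    n = len(g)
    levels = int(log2(n))
    x = list(zip(g, p))
    cells, max_fanout = 0, 0
    for llevel in range(1, levels + 1):
        half = 2 ** (llevel - 1)
        nxt = list(x)
        readers = {}
        for k in range(n):
            if k & half:                        # k lies in the upper half of its block
                j = (k // half) * half - 1      # top node of the lower half
                nxt[k] = op(x[k], x[j])
                readers[j] = readers.get(j, 0) + 1
                cells += 1
        max_fanout = max(max_fanout, 1 + max(readers.values()))
        x = nxt
    return [gk for gk, _ in x], cells, max_fanout

g = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
p = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
carries, cells, fanout = sklansky(g, p)
print(cells, fanout)   # 32 9
```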

LADNER-FISCHER PREFIX TREE

The Sklansky prefix tree has the minimum number of logic levels and uses fewer cells than the Kogge-Stone and Knowles prefix trees. The major problem of the Sklansky prefix tree is its high fan-out. The Ladner-Fischer prefix tree is proposed to relieve this problem.

Figure 7 16-bit Sklansky Prefix Tree

To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 8 shows a 16-bit example of the Ladner-Fischer prefix tree.

Figure 8 16-bit Ladner-Fischer Prefix Tree

The Ladner-Fischer prefix tree is a structure that sits between the Brent-Kung and Sklansky prefix trees. It can be observed in Figure 8 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, a fan-out of more than 2 is allowed (i.e. f > 0). Comparing the fan-out of the Ladner-Fischer and Sklansky trees, the number is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree.

A Ladner-Fischer prefix tree can be seen as a relieved version of the Sklansky prefix tree. For a structure like Figure 8, an extra row of cells is required to generate the missing carries.

The delay for this type of Ladner-Fischer prefix tree is log2(n) + 1. The first and last logic levels take n/2 and n/2 - 1 cells respectively. In between there are log2(n) - 1 logic levels, each having n/4 cells. Summing up the cells gives

$\frac{n}{2} + \frac{n}{2} - 1 + \frac{n}{4}\left(\log_2 n - 1\right)$

which is equal to

$\frac{n}{4}\log_2 n + \frac{3n}{4} - 1$

When n = 16, the total number of cells required is 27.

HAN-CARLSON PREFIX TREE
The idea of the Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2, or f = 0. The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone. The cost is one extra logic level.

The Han-Carlson prefix tree can be viewed as a sparse version of the Kogge-Stone prefix tree. In fact, the fan-out at all logic levels is the same (i.e. 2). The pseudo-code for Kogge-Stone's structure can be easily modified to build a Han-Carlson prefix tree. The major difference is that in each logic level the Han-Carlson prefix tree places cells every other bit, and the last logic level accounts for the missing carries. Figure 9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers. The critical path is shown with a thick solid line.

This type of Han-Carlson prefix tree has log2 n + 1 logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has n/2 cells. The area is estimated as (n/2) log2 n. When n = 16, the number is 32.
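The "sparse Kogge-Stone" description can be turned into a short model. The sketch below is one way to read that description and rests on assumptions: list index k stands for position k-1, the first level and the Kogge-Stone style levels place cells on odd list indices only, a final level fills in the even indices, and the names `op` and `han_carlson` are invented here. It assumes n is a power of 2 with n >= 4.

```python
from math import log2

def op(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def han_carlson(g, p):
    """Return (carries, cells, levels); carries[k] is G over list indices 0..k."""
    n = len(g)
    L = int(log2(n))
    x = list(zip(g, p))
    cells = 0
    # Level 1: combine neighbouring inputs, cells on every other bit.
    nxt = list(x)
    for k in range(1, n, 2):
        nxt[k] = op(x[k], x[k - 1])
        cells += 1
    x = nxt
    # Levels 2 .. L: Kogge-Stone style doubling, restricted to odd indices.
    for llevel in range(2, L + 1):
        d = 2 ** (llevel - 1)
        nxt = list(x)
        for k in range(d + 1, n, 2):
            nxt[k] = op(x[k], x[k - d])
            cells += 1
        x = nxt
    # Extra level: even indices pick up the carry from their odd neighbour.
    nxt = list(x)
    for k in range(2, n, 2):
        nxt[k] = op(x[k], x[k - 1])
        cells += 1
    x = nxt
    return [gk for gk, _ in x], cells, L + 1

g = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
p = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
carries, cells, levels = han_carlson(g, p)
print(cells, levels)   # 32 5
```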

Figure 9 16-bit Han-Carlson Prefix Tree

HARRIS PREFIX TREE
The idea from Harris is to balance the logic levels, fan-out and wire tracks of a prefix tree. Harris proposed a cube to show the taxonomy of prefix trees, shown in Figure 10, which illustrates the idea for 16-bit prefix trees [1]. In the figure, all the prefix trees mentioned above are on the cube, with the Sklansky prefix tree standing at the fan-out extreme, Brent-Kung at the logic-level extreme and Kogge-Stone at the wire-track extreme.

The balanced prefix structure is close to the center of the cube (i.e. when n = 16, l = 1, f = 1 and t = 1, or in short (1, 1, 1)). The number of logic levels is log2(n) + l = 4 + 1 = 5, the maximum fan-out is 2^f + 1 = 3 and the number of wire tracks is 2^t = 2. The diagram is shown in Figure 11 with the critical path drawn as a solid line. Note that there is bit overlap in logic level 4, similar to Knowles [2, 1, 1, 1]. The overlap is valid for producing correct carries, as has been proven for Knowles [2, 1, 1, 1].

Figure 10 Taxonomy of 16-bit Prefix Tree (Adapted from [1])

For n > 16, (1, 1, 1) will not be sufficient to build a prefix tree. More logic levels, fan-out or wire tracks need to be added. For example, when n = 32 the prefix tree can take the form (1, 1, 2), (1, 2, 1) or (2, 1, 1). Like the Ladner-Fischer and Han-Carlson prefix trees illustrated in the previous sections, this Harris prefix tree has log2(n) + 1 logic levels. It needs the same number of cells as the Han-Carlson and Sklansky prefix trees, which is (n/2) log2(n).

TABLE 2 ALGORITHMIC ANALYSIS

Type             Logic levels     Area                        Fan-out     Wire tracks
Brent-Kung       2 log2(n) - 1    2n - log2(n) - 2            2           1
Kogge-Stone      log2(n)          n log2(n) - n + 1           2           n/2
Ladner-Fischer   log2(n) + 1      (n/4) log2(n) + 3n/4 - 1    n/4 + 1     1
Sklansky         log2(n)          (n/2) log2(n)               n/2 + 1     1
Han-Carlson      log2(n) + 1      (n/2) log2(n)               2           n/4
Harris           log2(n) + 1      (n/2) log2(n)               3           n/8
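The area expressions derived in the preceding sections can be evaluated side by side. The snippet is a convenience sketch: the dictionary `AREA` and the function `areas` are names chosen for this example, and the formulas are the cell counts stated in the text for each prefix tree.

```python
from math import log2

# Cell-count formulas as stated in the text for each prefix tree.
AREA = {
    "Brent-Kung":     lambda n: 2 * (n - 1) - int(log2(n)),
    "Kogge-Stone":    lambda n: n * int(log2(n)) - n + 1,
    "Ladner-Fischer": lambda n: (n // 4) * int(log2(n)) + 3 * n // 4 - 1,
    "Sklansky":       lambda n: (n // 2) * int(log2(n)),
    "Han-Carlson":    lambda n: (n // 2) * int(log2(n)),
    "Harris":         lambda n: (n // 2) * int(log2(n)),
}

def areas(n):
    """Estimated number of cells for every tree at width n (n a power of 2, n >= 4)."""
    return {name: formula(n) for name, formula in AREA.items()}

for name, cells in areas(16).items():
    print(f"{name:15s} {cells}")
# Brent-Kung      26
# Kogge-Stone     49
# Ladner-Fischer  27
# Sklansky        32
# Han-Carlson     32
# Harris          32
```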

Figure 11 16-bit Harris Prefix Tree

METHODOLOGY

• Design specification: 16-bit full adder; algorithm to be adopted
• HDL entry: Verilog coding for each parallel-prefix-tree-based full adder
• Functional simulation: checking the adder functionality for each parallel-prefix-tree-based full adder
• Synthesis: converting a high-level description of the design into an optimized gate-level representation
• Comparative analysis of synthesis results for various FPGA chips. Parameters: number of slices, LUTs and maximum path delay
• Power analysis for the Spartan 6 FPGA chip: static power comparison for various operating temperatures

DISCUSSION OF RESULT

TABLE 3 COMPARISON OF SLICE UTILIZATION, NO. OF LOGIC LEVELS REQUIRED & DELAY

TABLE 4 STATIC POWER COMPARISON AT VARIOUS TEMPERATURES

Device: Spartan 6 XC6SLX45, Temp Grade: C-Grade, Vccint = 1.2 V, Vccaux = 2.5 V

The delays observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are shown in the figure.

[Chart: vertical axis 0 to 45; series: Virtex 6 Low Power, Virtex 6]

Fig. 11 Simulation results for the adder designs

The number of slices, LUTs and logic levels observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are compared and shown in Figure 12. The area of the adder designs, measured in terms of look-up tables (LUTs) and input/output blocks (IOBs) by Xilinx ISE 13.2 for the Spartan 6 FPGA chip, is plotted in the figure. As per reference [1], the ISE software does not give the exact delay of the adders because it is not able to analyze the critical path over the adder [1]. From the comparison table it is clear that, out of all the adders, the Harris adder has the least delay, while the KSA and BKA adders have about the same delay. According to the synthesis reports, out of the four parallel prefix adders the Harris adder has the better delay because it takes the fewest logic levels, whereas on the basis of delay and area together (slice utilization or the number of LUTs required) Brent-Kung is the best on average.

[Chart: vertical axis 0 to 50; series: Slices, LUT, No. of Logic Levels]

Fig. 12 Comparative chart for slice utilization

CONCLUSIONS

From the analysis of delay, area and power, we conclude that the delay efficiency of our designs is improved by 650 for the RCA when compared to Brent-Kung [2], by 253 for the KSA when compared with [1], and by 1880 for the Harris adder on the Spartan 6 FPGA chip. We can therefore say that Harris is the best in terms of delay because it takes the fewest logic levels, whereas on the basis of delay and area together (slice utilization or the number of LUTs required) Brent-Kung is the best on average. The power analysis report shows that almost all of the parallel prefix adders consume more or less the same power.

REFERENCES

[1] D. Harris, "A taxonomy of parallel prefix networks," in Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov. 2003, pp. 2213-2217.
[2] D. Goldberg, "What every computer scientist should know about floating point arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[3] J. Chen and J. E. Stine, "Optimization of bipartite memory systems for multiplicative divide and square root," 48th IEEE International Midwest Symp. Circuits and Systems, vol. 2, pp. 1458-1461, 2005.
[4] S. Winograd, "On the time required to perform addition," J. ACM, vol. 12, no. 2, pp. 277-285, 1965.
[5] R. K. Richards, Arithmetic Operations in Digital Computers. D. Van Nostrand Co., Princeton, NJ, 1955.
[6] A. Weinberger and J. Smith, "A logic for high-speed addition," National Bureau of Standards, no. Circulation 591, pp. 3-12, 1958.
[7] A. Tyagi, "A reduced area scheme for carry-select adders," IEEE Trans. Computers, vol. 42, no. 10, pp. 1163-1170, Oct. 1993.
[8] H. Ling, "High speed binary adder," IBM Journal of Research and Development, vol. 25, no. 3, pp. 156-166, 1981.
[9] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264, Mar. 1982.
[10] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[11] S. Knowles, "A family of adders," in Proc. 15th IEEE Symp. Comp. Arith., June 2001, pp. 277-281.
[12] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[13] R. Ladner and M. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[14] T. Han and D. Carlson, "Fast area-efficient VLSI adders," in Proc. 8th Symp. Comp. Arith., Sept. 1987, pp. 49-56.
[15] A. Naini, D. Bearden, and W. Anderson, "A 4.5ns 96b CMOS adder design," in Proc. IEEE Custom Integrated Circuits Conference, vol. 38, no. 8, Apr. 1965, pp. 114-117.
[16] T. Kilburn, D. B. G. Edwards, and D. Aspinall, "Parallel addition in digital computers: a new fast carry circuit," in Proc. IEE, vol. 106, pt. B, Sept. 1959, p. 464.
[17] N. Szabo and R. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill, 1967.
[18] W. K. Jenkins and B. J. Leon, "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. Circuits and Systems, vol. 24, no. 4, pp. 171-201, Apr. 1977.
[19] X. Lai and J. L. Massey, "A proposal for a new block encryption standard," in Advances in Cryptology - EUROCRYPT'90, Berlin, Germany: Springer-Verlag, 1990, pp. 389-404.
[20] S. S.-S. Yau and Y.-C. Liu, "Error correction in redundant residue number systems," IEEE Trans. Computers, vol. C-22, no. 1, pp. 5-11, Jan. 1973.
[21] F. Halsall, Data Communications, Computer Networks and Open Systems. Addison Wesley, 1996.
[22] C. Efstathiou, D. Nikolos, and J. Kalamatianos, "Area-time efficient modulo 2^n - 1 adder design," IEEE Trans. Circuits and Systems II, vol. 41, no. 7, pp. 463-467, 1994.
[23] L. Kalamboukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo 2^n - 1 adders," IEEE Trans. Computers, vol. 49, no. 7, special issue on computer arithmetic, pp. 673-680, July 2000.
[24] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Fast parallel-prefix modulo 2^n + 1 adders," IEEE Trans. Computers, vol. 53, no. 9, pp. 1211-1216, Sept. 2004.
[25] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Modulo 2^n ± 1 adder design using select-prefix blocks," IEEE Trans. Computers, vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[26] V. Paliouras and T. Stouraitis, "Novel high-radix residue number system multipliers and adders," in Proc. 1999 IEEE Int'l Symp. Circuits and Systems VLSI (ISCAS '99), 1999, pp. 451-454.
[27] S. Bi, W. J. Gross, W. Wang, A. Al-khalili, and M. N. S. Swamy, "An area-reduced scheme for modulo 2^n - 1 addition/subtraction," in Proc. 9th International Database Engineering & Application Symp., 2005, pp. 396-399.
[28] R. Zimmermann, "Efficient VLSI implementation of modulo (2^n ± 1) addition and multiplication," in Proc. 14th IEEE Symp. Computer Arithmetic, 1999, pp. 158-167.
[29] J. Chen and J. E. Stine, "Enhancing parallel-prefix structures using carry-save notation," 51st Midwest Symp. Circuits and Systems, pp. 354-357, 2008.
[30] G. E. Moore, "Cramming more components onto integrated circuits," in Electronics, May 1965, pp. 2551-2554.
[31] M. Lehman and N. Burla, "Skip techniques for high-speed carry propagation in binary arithmetic units," IRE Trans. Electron. Comput., pp. 691-698, Dec. 1961.
[32] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-346, June 1962.
[33] J. Sklansky, "Conditional sum addition logic," IRE Trans. Electron. Comput., pp. 226-231, June 1960.
[34] V. G. Oklobdzija, B. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, "Energy delay estimation technique for high-performance microprocessor VLSI adders," Proc. 16th IEEE Symp. Computer Arithmetic (ARITH-16'03), p. 272, June 2003.
[35] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," PhD dissertation, ETH Dissertation 12480, Swiss Federal Institute of Technology, 1997.
[36] R. Zimmermann and H. Kaeslin, "Cell-based multilevel carry-increment adders with minimal AT- and PT-products."
[37] S. Majerski, "On determination of optimal distributions of carry skips in adders," IEEE Trans. Electron. Comput., pp. 45-58, Feb. 1967.
[38] J. E. Stine, Digital Computer Arithmetic Datapath Design Using Verilog HDL. Kluwer Academic, 2004.
[39] R. W. Doran, "Variants of an improved carry-look-ahead-sum adder," IEEE Trans. Computers, vol. 37, no. 9, pp. 1110-1113, 1988.

  • Types of adders
  • DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS
  • PARALLEL PREFIX ADDER STRUCTURE
  • KOGGE-STONE PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • SKLANSKY PREFIX TREE
  • LADNER-FISCHER PREFIX TREE
  • HAN-CARLSON PREFIX TREE
  • HARRIS PREFIX TREE
  • DISCUSSION OF RESULT
  • CONCLUSIONS
  • REFERENCES

In this paper the practical issues involvedin designing and implementing tree-basedadders on FPGAs are described An efficienttesting strategy for evaluating theperformance of these adders is discussedSeveral tree-based adder structures areimplemented and characterized on a FPGA andcompared with the Ripple Carry Adder (RCA)

Types of adders Ripple carry adder or carry propagate

adder

Carry look-ahead adder

Carry skip adder

Manchester chain adder

Carry select adders

Pre-Fix Adders

Multi-operand adder

Carry save Adder

Pipelined parallel adder

Parallel Adders

Parallel adders are digital circuits thatcompute the addition of variable binarystrings of equivalent or different size inparallel The schematic diagram of aparallel adder is shown below in Fig 1

Fig 1 Parallel Adder

DRAWBACKS OF RIPPLE CARRY AND CARRYLOOKAHEAD ADDER

Fig2 4 bit ripple carry adder

In fig2 the first sum bit should wait untilinput carry is given the second sum bitshould wait until previous carry ispropagated and so on Finally the output sumshould wait until all previous carries aregenerated So it results in delayIn order to reduce the delay in RCA (or) topropagate the carry in advance we go forcarry look ahead adder Basically this adderworks on two operations called propagate andgenerate The propagate and generate equationsare given by

Pi=AioplusBi(1)

Gi=AiBi(2)

For 4 bit CLA the propagated carry equationsare given as

C1=G0+P0C0(3)

C2=G1+P1G0+P1P0C0(4)

C2=G2+P2G1+P2P1G0+P2P1P0G0(5)

C4=G3+P3G2+P3P2G1+P3P2P1G0+P3P2P1P0C0 (6) Equations (3) (4) (5) and (6) are observedthat the carry complexity increases byincreasing the adder bit width So designinghigher bit CLA becomes complexity In thisway for the higher bit of CLArsquos the carrycomplexity increases by increasing the widthof the adder So results in bounded fan-inrather than unbounded fan-in when designingwide width adders In order to compute thecarries in advance without delay and

complexity there is a concept calledParallel prefix approach

DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERSAND OTHERS

The PPArsquos pre-computes generate and propagatesignals are presented in [2] Using thefundamental carry operator (fco) thesecomputed signals are combined in [3]Thefundamental carry operator is denoted by thesymbol ldquoοrdquo

PARALLEL PREFIX ADDER STRUCTURE

Parallel-prefix structures are found to becommon in high performance adders because ofthe delay is logarithmically proportional tothe adder width [2] PPArsquos basically consists of 3 stagesbull Pre computationbull Prefix stagebull Final computation

Pre computation In pre-computation stage propagates andgenerates are computed for the given inputsusing the given equations (1) and (2)

Prefix stageIn the prefix stage group generatepropagatesignals are computed at each bit using thegiven equations The black cell (BC)generates the ordered pair in equation (7)the gray cell (GC) generates only leftsignal following [2]

Gij=Gij+GijPjminus1k (10)

Pik=PijPjminus1k(11)

More practically the equations (10) and (11)can be expressed using a symbol ldquoo ldquodenotedby Brent and Kung Its function is exactlythe same as that of a black cell ie

GikPik=(GijPij )o(Gjminus1kPjminus1k)(12)

The o operation will help make the rules ofbuilding prefix structures

Fig 3 Parallel-Prefix Structure with carrysave notation

Fig 4 Black and Gray Cell logic Definitions

C Final computationIn the final computation the sum andcarryout are the final output

Si=PiGiminus1minus1(12)

Cout=Gnminus1(13)

Where ldquo-1rdquo is the position of carry-inputThe generatepropagate signals can be groupedin different fashion to get the same correctcarries Based on different ways of groupingthe generatepropagate signals differentprefix architectures can be created Figure 3shows the definitions of cells that are usedin prefix structures including BC and GCFor analysis of various parallel prefixstructures see [2] [3] amp [4]

In the prefix tree, the group generate/propagate signals are computed at each bit:

$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$ (14)

$P_{i:k} = P_{i:j} \cdot P_{j-1:k}$ (15)

KOGGE-STONE PREFIX TREE

The Kogge-Stone prefix tree is among the prefix trees that use the fewest logic levels. A 16-bit example is shown in Figure 5. In fact, Kogge-Stone is a member of the Knowles family of prefix trees: the 16-bit prefix tree can be viewed as Knowles [1, 1, 1, 1]. The numbers in the brackets represent the maximum branch fan-out at each logic level. The maximum fan-out is 2 at all logic levels for Kogge-Stone prefix trees of any width.

The key to building a prefix tree is how to implement Equations (14) and (15) according to the special features of that type of prefix tree, and how to apply the rules described in the previous section. Gray cells are inserted in the same way as black cells, except that the gray cells output the final carry-outs instead of an intermediate G/P group. The reason for starting with the Kogge-Stone prefix tree is that it is the easiest to build programmatically. The example in Figure 5 is a 16-bit (a power of 2) prefix tree. It is not difficult to extend the structure to any width if the basics are strictly followed.

For the Kogge-Stone prefix tree, at logic level 1 the input span is 1 bit (e.g. group (4:3) takes the inputs at bit 4 and bit 3). Group (4:3) is taken as an input and combined with group (6:5) to generate group (6:3) at logic level 2. Group (6:3) is taken as an input and combined with group (10:7) to generate group (10:3) at logic level 3, and so on. With this observation, the structure can be described by Algorithm 6.1, listed below.

Figure 5 16-bit Kogge-Stone Prefix Tree

In Algorithm 6.1, the number of logic levels is calculated first. At each logic level, the maximum input bit span and maximum output bit span are computed. Equation (15) is applied in the inner loop, where the bit index goes from bit v-1 through bit n-1. If any subscript goes below -1, the value stays at -1. This means that no group crosses bit -1, the LSB boundary.

Algorithm 6.1 Building Kogge-Stone Prefix Tree
L = log2(n)
for llevel = 1; llevel <= L; llevel++ do
    u = 2^llevel          // output bit span
    v = 2^(llevel-1)      // input bit span
    for i = v-1; i < n-1; i++ do
        GP_{i:i-u+1} = (GP_{i:i-v+1}) o (GP_{i-v:i-u+1})
    end for
end for
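Algorithm 6.1 can be turned into a short runnable Python sketch. The code below is an illustration rather than the program used in this work; the function names are hypothetical, list index 0 stands for the carry-in position (bit -1 in the text), and positions whose lower subscript would fall below -1 are simply passed through.

```python
def fco(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def kogge_stone_prefix(gp):
    """Return the groups GP_{i:-1} for every position.

    gp[0] is the carry-in position (bit -1 in the text), gp[k] is bit k-1.
    """
    n = len(gp)
    levels = (n - 1).bit_length()          # ceil(log2(n))
    for llevel in range(1, levels + 1):
        v = 2 ** (llevel - 1)              # input bit span
        # Positions below v already reach the LSB boundary and pass through.
        gp = [fco(gp[i], gp[i - v]) if i >= v else gp[i] for i in range(n)]
    return gp

def kogge_stone_add(a, b, width, cin=0):
    bits = [((a >> i) & 1, (b >> i) & 1) for i in range(width)]
    gp = [(cin, 0)] + [(x & y, x ^ y) for x, y in bits]
    carries = [g for g, _ in kogge_stone_prefix(gp)]   # carries[i] enters bit i
    total = sum(((x ^ y) ^ carries[i]) << i for i, (x, y) in enumerate(bits))
    return total, carries[width]
```

Every level reads only the results of the previous level, which is why the list is rebuilt rather than updated in place.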


Table 1 Verifying the Pseudo-Code of Building a Kogge-Stone Prefix Tree

The statement in the inner for loop applies Equations (14) and (15). The validity of this implementation can be verified by looking at Table 1. In the table, one group operation is randomly selected at each logic level. Other operations can be verified by inserting the numbers as listed in Table 1. The term $GP_{i:i} = GP_i$, and the LSB boundary of the inputs/outputs is bit -1. Table 1 can also be matched against Figure 5 to see the correspondence.

The pseudo-code is a simplified version of the exact program. In the real program, the code should tell where the black cells and gray cells are. The program also needs control so that the LSB never goes beyond -1, and it utilizes optional buffers. In Figure 5 there are fan-outs of more than 2 because the structure is not buffered. Figure 39 shows a buffered 16-bit prefix tree; however, the exact number of buffers is based on the capacitance and resistance of the interconnect network [46]. Both figures indicate a wire track of 8.

The algorithmic delay is simply the number of logic levels. The area can be estimated as the number of cells in the prefix tree. To simplify the calculation, all cells are counted as black cells. To understand this estimate, remember that the number of gray cells always equals n - 1, since the prefix tree only outputs n - 1 carries. A black cell has one more AND gate than a gray cell; therefore, a more accurate area estimate would just subtract those n - 1 AND gates.

The number of cells in a Kogge-Stone prefix tree can be counted as follows. Each logic level has n - m cells, where m = 2^(llevel-1). That is, each logic level is missing m cells. The total number of missing cells is the sum of a geometric series from 1 to n/2, which totals n - 1. The total number of cells is therefore n log2 n minus the number of cells missing at each logic level, which works out to n log2(n) - n + 1. When n = 16, the area is estimated as 49.
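The count above can be checked with a few lines of Python. This is a small illustrative sketch with a hypothetical function name; it assumes n is a power of 2, as in the 16-bit example.

```python
def kogge_stone_cells(n):
    """Cells per logic level and their total, for n a power of 2."""
    levels = n.bit_length() - 1                      # log2(n)
    per_level = [n - 2 ** (llevel - 1) for llevel in range(1, levels + 1)]
    return per_level, sum(per_level)

per_level, total = kogge_stone_cells(16)
# Level by level the tree is missing 1, 2, 4 and 8 cells.
assert per_level == [15, 14, 12, 8]
assert total == 16 * 4 - 16 + 1
```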

BRENT-KUNG PREFIX TREE

The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. Its fan-out is among the lowest, with f = 0, and so are its wire tracks, with t = 0. The cost is the extra L-1 logic levels. A 16-bit example is shown in Figure 6. The critical path is shown in the figure with a thick gray line.

The Brent-Kung prefix tree is somewhat complex to build because it has the most logic levels. To build such a structure, the pseudo-code can be composed as Algorithm 8.1.

Algorithm 8.1 Building Brent-Kung Prefix Tree
L = log2(n)
for llevel = 1; llevel <= L; llevel++ do
    u = 2^llevel          // output bit span
    v = 2^(llevel-1)      // input bit span
    for i = u-2; i < n-1; i += u do
        GP_{i:i-u+1} = (GP_{i:i-v+1}) o (GP_{i-v:i-u+1})
    end for
end for
for llevel = L-1; llevel >= 1; llevel-- do
    u = 2^llevel          // output bit span
    v = 2^(llevel-1)      // input bit span
    for i = u+v-2; i < n-1; i += u do
        GP_{i:-1} = (GP_{i:i-v+1}) o (GP_{i-v:-1})
    end for
end for
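The two loops of the Brent-Kung construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used in this work: list index 0 stands for the carry-in position (bit -1), so every bit index of the pseudo-code is shifted up by one, and `serial_prefix` is a hypothetical reference used only to check the result.

```python
def fco(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def brent_kung_prefix(gp):
    """Return GP_{i:-1} for every position; gp[0] is the carry-in position."""
    gp = list(gp)
    n = len(gp)
    L = (n - 1).bit_length()
    for llevel in range(1, L + 1):            # first loop: spans grow 2, 4, 8, ...
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(u - 1, n, u):
            gp[i] = fco(gp[i], gp[i - v])
    for llevel in range(L - 1, 0, -1):        # second loop: fill in the missing carries
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(u + v - 1, n, u):
            gp[i] = fco(gp[i], gp[i - v])
    return gp

def serial_prefix(gp):
    """Reference result: combine the positions one at a time from the LSB."""
    out = [gp[0]]
    for pair in gp[1:]:
        out.append(fco(pair, out[-1]))
    return out
```

Updating the list in place is safe here because, within one level, no cell reads a position that another cell of the same level writes.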

The algorithm handles this prefix tree in 2 major for loops. The first for loop handles the logic levels one by one from 1 up to L, and the second for loop handles the remaining L-1 logic levels in a decremental fashion. Figure 6 can be divided into the top 4 logic levels and the bottom 3 logic levels. The structure starts with cells every 2 bits; the input span is 1 bit and the output span is 2 bits. At logic levels 2 and 3, the distance between cells is 4 and 8 bits, respectively. The input/output span is 2/4 bits at logic level 2 and 4/8 bits at logic level 3. At logic level 4, the only cell is at the MSB (bit 14 from the input), with the input spanning 8 bits and the output spanning 16 bits. By logic level 4, some carries have already been generated. At logic levels 5 through 7, the input bit span is decremented instead of being incremented as in the previous levels. The input bit spans at logic levels 5, 6 and 7 are 4, 2 and 1 bits, respectively.

The term output span no longer applies to these L-1 levels, since all the outputs are final carries of the form $G_{i:-1}$.

Figure 6 16-bit Brent-Kung Prefix Tree

The delay is estimated as the number of logic levels (i.e. 2L - 1). The total number of cells can be calculated in the following way. In the first log2 n logic levels, the number of cells is a geometric series. For example, in the 16-bit prefix tree there are 8, 4, 2 and 1 cells at logic levels 1 through 4. The sum of this series is n - 1. In the rest of the logic levels there are only gray cells. The total number of gray cells is n - 1 for any prefix tree, as mentioned before. However, the first log2 n logic levels already contain log2 n gray cells. The sum of cells is therefore 2(n - 1) - log2 n. When n = 16, the number of cells required is 26.
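The cell count can be reproduced by counting the iterations of the two Brent-Kung loops. The Python sketch below is illustrative only; `brent_kung_cell_count` is a hypothetical name, n is taken as the number of prefix positions including the carry-in, and positions are indexed from 0.

```python
def brent_kung_cell_count(n):
    """Cells in the first L levels and in the remaining L-1 levels."""
    L = (n - 1).bit_length()
    forward = sum(len(range(2 ** l - 1, n, 2 ** l)) for l in range(1, L + 1))
    reverse = sum(
        len(range(2 ** l + 2 ** (l - 1) - 1, n, 2 ** l)) for l in range(L - 1, 0, -1)
    )
    return forward, reverse

forward, reverse = brent_kung_cell_count(16)
# 8 + 4 + 2 + 1 cells in the first four levels, then 1 + 3 + 7 gray cells.
assert (forward, reverse) == (15, 11)
assert forward + reverse == 2 * (16 - 1) - 4
```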

SKLANSKY PREFIX TREE

The Sklansky prefix tree takes the fewest logic levels to compute the carries. In addition, it uses fewer cells than the Knowles [2, 1, 1, 1] and Kogge-Stone structures, at the cost of higher fan-out. Figure 7 shows a 16-bit example of the Sklansky prefix tree with the critical path drawn as a solid line.

For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e. f = 3). The structure can be viewed as a compacted version of Brent-Kung's, where the logic levels are reduced and the fan-out is increased. Pseudo-code similar to that listed for the Brent-Kung prefix tree can be used to generate a Sklansky prefix tree; the maximum input span is still a power of 2 related to the number of logic levels. The difference is that one more for loop is required to account for the multiple fan-out (e.g. at logic levels 2 through 4 in Figure 7, where the cells are placed in groups of 2, 4 and 8, respectively).

The number of logic levels is log2 n. Each logic level has n/2 cells, as can be observed in Figure 7. The area is estimated as (n/2) log2 n. When n = 16, 32 cells are required.
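A Python sketch of the Sklansky construction is given below as an illustration; it is not the paper's Verilog, the function names are hypothetical, and list index 0 stands for the carry-in position. At each level the upper half of every block of u positions is driven by the single cell at the top of the lower half, which is where the high fan-out comes from.

```python
def fco(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def sklansky_prefix(gp):
    """Return (groups GP_{i:-1}, number of cells); gp[0] is the carry-in position."""
    gp = list(gp)
    n = len(gp)
    cells = 0
    for llevel in range(1, (n - 1).bit_length() + 1):
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(n):
            if i % u >= v:                       # upper half of each block of u positions
                src = (i // u) * u + v - 1       # top of the lower half, shared by the group
                gp[i] = fco(gp[i], gp[src])
                cells += 1
    return gp, cells

def serial_prefix(gp):
    """Reference result: combine the positions one at a time from the LSB."""
    out = [gp[0]]
    for pair in gp[1:]:
        out.append(fco(pair, out[-1]))
    return out
```

For 16 positions the sketch places 8 cells in each of its 4 levels, matching the (n/2) log2 n estimate of 32.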

LADNER-FISCHER PREFIX TREE

The Sklansky prefix tree has the minimum number of logic levels and uses fewer cells than the Kogge-Stone and Knowles prefix trees. The major problem of the Sklansky prefix tree is its high fan-out. The Ladner-Fischer prefix tree is proposed to relieve this problem.

Figure 7 16-bit Sklansky Prefix Tree

To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 8 shows a 16-bit example of the Ladner-Fischer prefix tree.

Figure 8 16-bit Ladner-Fischer Prefix Tree

The Ladner-Fischer prefix tree is a structure that sits between the Brent-Kung and Sklansky prefix trees. It can be observed in Figure 8 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, fan-out of more than 2 is allowed (i.e. f > 0). Comparing the fan-out of Ladner-Fischer's and Sklansky's structures, the number is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree.

Building a Ladner-Fischer prefix tree can be seen as building a relieved version of the Sklansky prefix tree. For a structure like Figure 8, an extra row of cells is required to generate the missing carries.

The delay for this type of Ladner-Fischer prefix tree is log2(n) + 1. The first and last logic levels take n/2 and n/2 - 1 cells, respectively. In between there are log2(n) - 1 logic levels, each having n/4 cells. Summing up the cells gives n/2 + n/2 - 1 + (n/4)(log2(n) - 1), which is equal to (n/4) log2(n) + 3n/4 - 1. When n = 16, the total number of cells required is 27.
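Written out step by step, the Ladner-Fischer cell count simplifies as follows. The derivation only restates the sum given in the text and evaluates it for the 16-bit case.

```latex
\begin{align*}
\text{cells} &= \frac{n}{2} + \left(\frac{n}{2} - 1\right) + \frac{n}{4}\left(\log_2 n - 1\right) \\
             &= n - 1 + \frac{n}{4}\log_2 n - \frac{n}{4} \\
             &= \frac{n}{4}\log_2 n + \frac{3n}{4} - 1 \\
n = 16:\quad \text{cells} &= 4 \cdot 4 + 12 - 1 = 27
\end{align*}
```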

HAN-CARLSON PREFIX TREE

The idea of the Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2, or f = 0. The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone. The cost is one extra logic level.

The Han-Carlson prefix tree can be viewed as a sparse version of the Kogge-Stone prefix tree. In fact, the fan-out at all logic levels is the same (i.e. 2). The pseudo-code for Kogge-Stone's structure can easily be modified to build a Han-Carlson prefix tree. The major difference is that in each logic level the Han-Carlson prefix tree places cells on every other bit, and the last logic level accounts for the missing carries. Figure 9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers. The critical path is shown with a thick solid line.

This type of Han-Carlson prefix tree has log2 n + 1 logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has n/2 cells. The area is estimated as (n/2) log2 n. When n = 16, the number is 32.
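The modification of the Kogge-Stone construction described above can be sketched in Python. This is an illustration under stated assumptions rather than the paper's implementation: the names are hypothetical, list index 0 stands for the carry-in position, Kogge-Stone cells are placed on the odd list positions only, and one extra level fills in the even positions.

```python
def fco(left, right):
    """Fundamental carry operator on (G, P) pairs."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])

def han_carlson_prefix(gp):
    """Return (groups GP_{i:-1}, number of cells); gp[0] is the carry-in position."""
    n = len(gp)
    cells = 0
    for llevel in range(1, (n - 1).bit_length() + 1):
        v = 2 ** (llevel - 1)                 # input bit span
        nxt = list(gp)
        for i in range(1, n, 2):              # cells sit on every other position
            if i >= v:
                nxt[i] = fco(gp[i], gp[i - v])
                cells += 1
        gp = nxt
    for i in range(2, n, 2):                  # extra level: the missing carries
        gp[i] = fco(gp[i], gp[i - 1])
        cells += 1
    return gp, cells

def serial_prefix(gp):
    """Reference result: combine the positions one at a time from the LSB."""
    out = [gp[0]]
    for pair in gp[1:]:
        out.append(fco(pair, out[-1]))
    return out
```

For 16 positions the sketch uses 8 + 7 + 6 + 4 cells in the Kogge-Stone-like levels and 7 in the extra level, 32 in total, which agrees with the estimate above.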

Figure 9 16-bit Han-Carlson Prefix Tree

HARRIS PREFIX TREE

The idea from Harris about prefix trees is to balance the logic levels, fan-out and wire tracks. Harris proposed a cube to show the taxonomy of prefix trees, given in Figure 10, which illustrates the idea for 16-bit prefix trees [1]. In the figure, all the prefix trees mentioned above are on the cube, with the Sklansky prefix tree standing at the fan-out extreme, Brent-Kung at the logic-level extreme and Kogge-Stone at the wire-track extreme.

The balanced prefix structure is close to the center of the cube (i.e. when n = 16, l = 1, f = 1 and t = 1, represented in short by (1, 1, 1)). The number of logic levels is log2(n) + l = 4 + 1 = 5, the maximum fan-out is 2^f + 1 = 3 and the number of wire tracks is 2^t = 2. The diagram is shown in Figure 11 with the critical path drawn as a solid line. It can be observed that there is a bit overlap in logic level 4, similar to Knowles [2, 1, 1, 1]. The overlap is valid for producing correct carries, as has been proven for Knowles [2, 1, 1, 1].

Figure 10 Taxonomy of 16-bit Prefix Tree (Adapted from [1])

For n > 16, (1, 1, 1) will not be sufficient to build a prefix tree; more logic levels, fan-out or wire tracks need to be added. For example, when n = 32 the prefix tree can take the form (1, 1, 2), (1, 2, 1) or (2, 1, 1). Like the Ladner-Fischer and Han-Carlson prefix trees illustrated in the previous sections, this Harris prefix tree has log2(n) + 1 logic levels. It needs the same number of cells as the Han-Carlson and Sklansky prefix trees, which is (n/2) log2(n).

TABLE 2 ALGORITHMIC ANALYSIS

Types          | Logic levels | Area                    | Fan-out | Wire tracks
Brent-Kung     | 2 log2 n - 1 | 2n - log2 n - 2         | 2       | 1
Kogge-Stone    | log2 n       | n log2 n - n + 1        | 2       | n/2
Ladner-Fischer | log2 n + 1   | (n/4) log2 n + 3n/4 - 1 | n/4 + 1 | 1
Sklansky       | log2 n       | (n/2) log2 n            | n/2 + 1 | 1
Han-Carlson    | log2 n + 1   | (n/2) log2 n            | 2       | n/4
Harris         | log2 n + 1   | (n/2) log2 n            | 3       | n/8
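The logic-level and area expressions derived in the preceding sections can be evaluated side by side with a short Python sketch. It is an illustration only; `algorithmic_analysis` is a hypothetical name, n is assumed to be a power of 2, and the formulas are the ones stated in the text for each prefix tree.

```python
def algorithmic_analysis(n):
    """Map each prefix tree to (logic levels, cell count) for n a power of 2."""
    L = n.bit_length() - 1                       # log2(n)
    return {
        "Brent-Kung":     (2 * L - 1, 2 * (n - 1) - L),
        "Kogge-Stone":    (L,         n * L - n + 1),
        "Ladner-Fischer": (L + 1,     n * L // 4 + 3 * n // 4 - 1),
        "Sklansky":       (L,         n * L // 2),
        "Han-Carlson":    (L + 1,     n * L // 2),
        "Harris":         (L + 1,     n * L // 2),
    }

for name, (levels, cells) in algorithmic_analysis(16).items():
    print(f"{name:15s} levels={levels} cells={cells}")
```

For n = 16 this gives the cell counts quoted in the text: 26, 49, 27, 32, 32 and 32.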

Figure 11 16-bit Harris Prefix Tree

METHODOLOGY

• Design specification: 16-bit full adder; algorithm to be adopted.
• HDL entry: Verilog coding for each parallel-prefix-tree-based full adder.
• Functional simulation: to check the adder functionality for each parallel-prefix-tree-based full adder.
• Synthesis: converting a high-level description of the design into an optimized gate-level representation.
• Comparative analysis of synthesis results for various FPGA chips. Parameters: number of slices, LUTs and maximum path delay.
• Power analysis for the Spartan 6 FPGA chip: static power comparison at various operating temperatures.

DISCUSSION OF RESULT

TABLE 3 COMPARISON OF SLICE UTILIZATION, NO. OF LOGIC LEVELS REQUIRED & DELAY

TABLE 4 STATIC POWER COMPARISON AT VARIOUS TEMPERATURES

Device: Spartan 6 XC6SLX45, Temp Grade: C-Grade, Vccint = 1.2 V, Vccaux = 2.5 V

The delays observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are shown in the figure.

Fig 11 Simulation results for the adder designs (delay chart; legend: Virtex 6 Low Power, Virtex 6)

The number of slices, the number of LUTs and the number of logic levels observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are compared and shown in Figure 12. The area of the adder designs, measured in terms of look-up tables (LUT) and input/output blocks (IOB) by Xilinx ISE 13.2 for the Spartan 6 FPGA chip, is plotted in the figure. As per reference [1], the ISE software does not give the exact delay of the adders because it is not able to analyze the critical path over the adder [1]. From the comparison table it is clear that, out of all the adders, the Harris adder has the least delay, while the KSA and BKA have about the same delay. According to the synthesis reports, out of the four parallel-prefix adders the Harris adder has the better delay because it takes the fewest logic levels for compaction, whereas on the basis of delay and area (slice utilization or the number of LUTs required) Brent-Kung is the best on average.

Fig Comparative chart for Slice utilization (series: Slices, LUT, No. of Logic Levels)

CONCLUSIONS

From the analysis of delay, area and power, we conclude that the efficiency in terms of delay is improved by 650 for the RCA when compared to Brent-Kung [2], by 253 for the KSA when compared with [1], and by 1880 for the Harris adder on the Spartan 6 FPGA chip. So we can say that the Harris adder is the best in terms of delay because it takes the fewest logic levels for compaction, whereas on the basis of delay and area (slice utilization or the number of LUTs required) Brent-Kung is the best on average. The power analysis report shows that almost all of the parallel-prefix adders consume more or less the same power.

REFERENCES

[1] D. Harris, "A taxonomy of parallel prefix networks," in Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov. 2003, pp. 2213-2217.
[2] D. Goldberg, "What every computer scientist should know about floating point arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[3] J. Chen and J. E. Stine, "Optimization of bipartite memory systems for multiplicative divide and square root," 48th IEEE International Midwest Symp. Circuits and Systems, vol. 2, pp. 1458-1461, 2005.
[4] S. Winograd, "On the time required to perform addition," J. ACM, vol. 12, no. 2, pp. 277-285, 1965.
[5] R. K. Richards, Arithmetic Operations in Digital Computers. D. Van Nostrand Co., Princeton, NJ, 1955.
[6] A. Weinberger and J. Smith, "A logic for high-speed addition," National Bureau of Standards, no. Circulation 591, pp. 3-12, 1958.
[7] A. Tyagi, "A reduced area scheme for carry-select adders," IEEE Trans. Computers, vol. 42, no. 10, pp. 1163-1170, Oct. 1993.
[8] H. Ling, "High speed binary adder," IBM Journal of Research and Development, vol. 25, no. 3, pp. 156-166, 1981.
[9] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264, Mar. 1982.
[10] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[11] S. Knowles, "A family of adders," in Proc. 15th IEEE Symp. Comp. Arith., June 2001, pp. 277-281.
[12] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[13] R. Ladner and M. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[14] T. Han and D. Carlson, "Fast area-efficient VLSI adders," in Proc. 8th Symp. Comp. Arith., Sept. 1987, pp. 49-56.
[15] A. Naini, D. Bearden, and W. Anderson, "A 4.5ns 96b CMOS adder design," in Proc. IEEE Custom Integrated Circuits Conference, vol. 38, no. 8, Apr. 1965, pp. 114-117.
[16] T. Kilburn, D. B. G. Edwards, and D. Aspinall, "Parallel addition in digital computers: a new fast carry circuit," in Proc. IEE, vol. 106, pt. B, Sept. 1959, p. 464.
[17] N. Szabo and R. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill, 1967.
[18] W. K. Jenkins and B. J. Leon, "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. Circuits and Systems, vol. 24, no. 4, pp. 171-201, Apr. 1977.
[19] X. Lai and J. L. Massey, "A proposal for a new block encryption standard," in Advances in Cryptology - EUROCRYPT'90, Berlin, Germany: Springer-Verlag, 1990, pp. 389-404.
[20] S. S.-S. Yau and Y.-C. Liu, "Error correction in redundant residue number systems," IEEE Trans. Computers, vol. C-22, no. 1, pp. 5-11, Jan. 1973.
[21] F. Halsall, Data Communications, Computer Networks and Open Systems. Addison Wesley, 1996.
[22] C. Efstathiou, D. Nikolos, and J. Kalamatianos, "Area-time efficient modulo 2^n - 1 adder design," IEEE Trans. Circuits and Systems-II, vol. 41, no. 7, pp. 463-467, 1994.
[23] L. Kalamboukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo 2^n - 1 adders," IEEE Trans. Computers, vol. 49, no. 7, special issue on computer arithmetic, pp. 673-680, July 2000.
[24] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Fast parallel-prefix modulo 2^n + 1 adders," IEEE Trans. Computers, vol. 53, no. 9, pp. 1211-1216, Sept. 2004.
[25] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Modulo 2^n ± 1 adder design using select-prefix blocks," IEEE Trans. Computers, vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[26] V. Paliouras and T. Stouraitis, "Novel high-radix residue number system multipliers and adders," in Proc. 1999 IEEE Int'l Symp. Circuits and Systems VLSI (ISCAS '99), 1999, pp. 451-454.
[27] S. Bi, W. J. Gross, W. Wang, A. Al-khalili, and M. N. S. Swamy, "An area-reduced scheme for modulo 2^n - 1 addition/subtraction," in Proc. 9th International Database Engineering & Application Symp., 2005, pp. 396-399.
[28] R. Zimmermann, "Efficient VLSI implementation of modulo (2^n ± 1) addition and multiplication," in Proc. 14th IEEE Symp. Computer Arithmetic, 1999, pp. 158-167.
[29] J. Chen and J. E. Stine, "Enhancing parallel-prefix structures using carry-save notation," 51st Midwest Symp. Circuits and Systems, pp. 354-357, 2008.
[30] G. E. Moore, "Cramming more components onto integrated circuits," in Electronics, May 1965, pp. 2551-2554.
[31] M. Lehman and N. Burla, "Skip techniques for high-speed carry propagation in binary arithmetic units," IRE Trans. Electron. Comput., pp. 691-698, Dec. 1961.
[32] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-346, June 1962.
[33] J. Sklansky, "Conditional sum addition logic," IRE Trans. Electron. Comput., pp. 226-231, June 1960.
[34] V. G. Oklobdzija, B. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, "Energy delay estimation technique for high-performance microprocessor VLSI adders," Proc. 16th IEEE Symp. Computer Arithmetic (ARITH-16'03), p. 272, June 2003.
[35] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," PhD dissertation, ETH Dissertation 12480, Swiss Federal Institute of Technology, 1997.
[36] R. Zimmermann and H. Kaeslin, "Cell-based multilevel carry-increment adders with minimal AT- and PT-products."
[37] S. Majerski, "On determination of optimal distributions of carry skips in adders," IEEE Trans. Electron. Comput., pp. 45-58, Feb. 1967.
[38] J. E. Stine, Digital Computer Arithmetic Datapath Design Using Verilog HDL. Kluwer Academic, 2004.
[39] R. W. Doran, "Variants of an improved carry-look-ahead-sum adder," IEEE Trans. Computers, vol. 37, no. 9, pp. 1110-1113, 1988.


complexity there is a concept calledParallel prefix approach

DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERSAND OTHERS

The PPArsquos pre-computes generate and propagatesignals are presented in [2] Using thefundamental carry operator (fco) thesecomputed signals are combined in [3]Thefundamental carry operator is denoted by thesymbol ldquoοrdquo

PARALLEL PREFIX ADDER STRUCTURE

Parallel-prefix structures are found to becommon in high performance adders because ofthe delay is logarithmically proportional tothe adder width [2] PPArsquos basically consists of 3 stagesbull Pre computationbull Prefix stagebull Final computation

Pre computation In pre-computation stage propagates andgenerates are computed for the given inputsusing the given equations (1) and (2)

Prefix stageIn the prefix stage group generatepropagatesignals are computed at each bit using thegiven equations The black cell (BC)generates the ordered pair in equation (7)the gray cell (GC) generates only leftsignal following [2]

Gij=Gij+GijPjminus1k (10)

Pik=PijPjminus1k(11)

More practically the equations (10) and (11)can be expressed using a symbol ldquoo ldquodenotedby Brent and Kung Its function is exactlythe same as that of a black cell ie

GikPik=(GijPij )o(Gjminus1kPjminus1k)(12)

The o operation will help make the rules ofbuilding prefix structures

Fig 3 Parallel-Prefix Structure with carrysave notation

Fig 4 Black and Gray Cell logic Definitions

C Final computationIn the final computation the sum andcarryout are the final output

Si=PiGiminus1minus1(12)

Cout=Gnminus1(13)

Where ldquo-1rdquo is the position of carry-inputThe generatepropagate signals can be groupedin different fashion to get the same correctcarries Based on different ways of groupingthe generatepropagate signals differentprefix architectures can be created Figure 3shows the definitions of cells that are usedin prefix structures including BC and GCFor analysis of various parallel prefixstructures see [2] [3] amp [4]

In the prefix tree group generatepropagatesignals are computed at each bit

Gij=Gij+PijsdotGjminus1k

Pij=PijsdotPjminus1k(15)

KOGGE-STONE PREFIX TREEKogge-Stone prefix tree is among the type ofprefix trees that use the fewest logiclevels A 16-bit example is shown in Figure5 In fact Kogge-Stone is a member ofKnowles prefix tree The 16-bit prefix treecan be viewed as Knowels [1 1 1 1] Thenumbers in the brackets represent the maximumbranch fan-out at each logic level Themaximum fan-out is 2 in all logic levels forall width Kogge-Stone prefix trees

The key of building a prefix tree is how toimplement Equation (32) according to thespecial features of that type of prefix treeand apply the rules described in the previoussection Gray cells are inserted similar toblack cells except that the gray cells final

output carry outs instead of intermediate GP

group The reason of starting with Kogge-Stone prefix tree is that it is the easiestto build in terms of using a program conceptThe example in Figure 5 is 16-bit (a power of2) prefix tree It is not difficult to extendthe structure to any width if the basics arestrictly followed

For the Kogge-Stone prefix tree at the logiclevel 1 the inputs span is 1 bit (eg group(43) takes the inputs at bit 4 and bit 3)Group (43) will be taken as inputs andcombined with group (65) to generate group(63) at logic level 2 Group (63) will be

taken as inputs and combined with group(107) to generate group (103) at logiclevel 3 and so on so forth With thisinspection the structure can be describedwith the Algorithm 61 listed below

Figure 5 16-bit Kogge-Stone Prefix Tree

In Algorithm 61 the number of logic levelsis calculated first At each logic level themaximum input bit span and maximum output bitspan are computed Equation (15) is appliedin the inner loop where bit goes from bit v-1though bit n-1 If any of the subscript goesless than -1 the value stays at -1 Thismeans there is no crossing over bit

Algorithm 61 Building Kogge-Stone PrefixTreeL=log2 (n )for llevel = 1 llevel le L llevel ++ dou = 2llevel output bit spanv = 2llevelminus1 input bit spanFor i = v-1 i lt n-1 i ++ doGPiiminusu+1 = (GPiiminusv+1) (GPiminusviminusu+1)end forend for

BRENT-KUNG PREFIX TREEBrent-Kung prefix tree is a well-knownstructure with relatively sparse network Thefan out is among the minimum as f = 0 So isthe wire tracks where t = 0 The cost is theextra Level 1 logic levels A 16-bit exampleis shown in Figure 6 The critical path isshown in the figure with a thick gray lineBrent-Kung prefix tree is a bit complex tobuild because it has the most logic levels

To build such a structure the pseudo-codecan be composed as Algorithm71

Algorithm 71 Building Brent-Kung Prefix TreeL=log2 (n )for llevel=1 llevel L llevel++ doleu=2llevel output bit spanv=2llevelminus1 input bit spanfor i = u- 2 i lt n-1 i + = u doGPiiminusu+1=(GPiiminusv+1 ) (GPiminusviminusu+1 ) end forend forfor llevel = L-1 llevel 1 llevel ndash dogeu=2llevel output bit spanv=2llevelminus1 input bit spanfor i = u+v- 2 i lt n-1 i + = u doGPiminus1=(GPiiminusv+1 ) (GPiminusviminus1) end forend for

Table 1 Verifying the Pseudo-Code ofBuilding a Kogge-Stone Prefix Tree

-1 or the LSB boundaryThe statement in the inner f or loop is

applying Equation (32) The validity of thisimplementation can be verified by looking atTable 31 In the table one group operationis randomly selected at each logic levelOther operations can be verified by insertingthe numbers as listed in Table 31 The termGPii=GPiand LSB boundary of theinputsoutputs is bit 1 Table 31 can alsobe matched against Figure 38 to see thecorrespondence

The pseudo-code is a simplified version ofthe exact program In the real program thecode should tell where the black cells andgray cells are The program also needscontrol so that the LSB never goes beyond 1and utilizes optional buffers In Figure 38there are fan-outs more than 2 because thestructure is not buffered Figure 39 shows abuffered 16-bit pre x tree however theexact number of buffers is based on thecapacitance and resistance of theinterconnect network [46] Both figuresindicate a wire track of 8

The algorithmic delay is simply the numberof logic levels The area can be estimated asthe number of cells in the prefix tree Tosimply the calculation all cells are countedas black cells To understand this structureremember that the number of gray cells alwaysequals to n - 1 since the prefix tree onlyoutputs n - 1 carries A black cell has onemore AND gate than a gray cell andtherefore a more accurate area estimationwill just subtract that n 1 AND gates

The number cells for a Kogge-Stone prefixtree can be counted as follows Each logiclevel has n m cells where m =2llevelminus1 Thatis each logic level is missing m cells Thatnumber is the sum of a geometric series

starting from 1 to n2 which totals to n 1

The total number of cells will be nlog2 nsubtracting the total number of cells missingat each logic level which winds up withnlog2 (n)minusn+1 When n = 16 the area isestimated as 49

BRENT-KUNG PREFIX TREEBrent-Kung prefix tree is a well-knownstructure with relatively sparse network Thefan-out is among the minimum as f = 0 So isthe wire tracks where t = 0 The cost is theextra L-1 logic levels A 16-bit example isshown in Figure 313 The critical path isshown in the figure with a thick gray line

Brent-Kung prefix tree is a bit complex tobuild because it has the most logic levelsTo build such a structure the pseudo-codecan be composed as Algorithm 81

Algorithm 81 Building Brent-Kung Prefix

TreeL=log2 (n )for llevel=1 llevel L llevel++ doleu=2llevel output bit spanv=2llevelminus1 input bit spanfor i = u- 2 i lt n-1 i + = u doGPiiminusu+1=(GPiiminusv+1 ) (GPiminusviminusu+1 ) end forend forfor llevel = L-1 llevel 1 llevel - - dogeu=2llevel output bit spanv=2llevelminus1 input bit spanfor i = u+v- 2 i lt n-1 i + = u doGPiminus1=(GPiiminusv+1 ) (GPiminusviminus1) end forend for

The algorithm deals with this prefix tree in2 major f or loops The first f or loophandles logic level by logic level from 1 upto L with the second f or loop handling therest L- 1 logic levels in a decrementalfashion Figure 6 can be divided as the top 4logic levels and the bottom 3 logic levelsThe structure starts with cells every 2 bitsThe input span is 1 bit and the output spanis 2 bits At logic level 2 and 3 thedistance between each cell is 4 and 8 bitsrespectively The inputoutput span is 2=4bits at logic level 2 and 4=8 bits at logiclevel 3 At logic level 4 the only cell isat MSB (bit 14 from input) with inputspanning 8 bits and output spanning 16 bitsBy logic level 4 some carries are alreadygenerated At logic level 5 through 7 theinput bit span is decremented instead ofbeing incremented as in the previous casesThe input bit spans at logic level 5 through6 and 7 are 4 2 and 1 bit respectively

The term output span no longer applies tothese L-1 levels since all the outputs arethe final carries with the formGiminus1

Figure 6 16-bit Brent-Kung Prefix TreeThe delay is estimated as the number of logiclevels (ie L) The total number of cellscan be calculated in the following way Inthe first log2n logic levels the number ofcells is a geometric series For example inthe 16-bit prefix tree at logic level 1through 4 there are 8 4 2 1 cell at eachlevel The sum of this series is n-1 For therest of the logic levels there only existgray cells The total number of gray cells isn-1 for any prefix tree as mentioned beforeHowever in the previous log2 n logic levelsthe pre x tree contains log2 n gray cells Thesum of cells is 2(n-1) log2n When n = 16 thenumber of cells required is 26

SKLANSKY PREFIX TREESklansky prefix tree takes the least logiclevels to compute the carries Plus it usesless cells than Knowles [2111] and Kogge-Stone structure at the cost of higher fan-out Figure 7 shows the 16-bit example ofSklansky prefix tree with critical path insolid line

For a 16-bit Sklansky prefix tree themaximum fan-out is 9 (ie f = 3) Thestructure can be viewed as a compactedversion of Brent-kungs where logic levelsare reduced and fan-out increased A similarpseudo-code listed for Brent-Kung prefix treecan be used to generate a Sklansky prefixtree However the maximum input span isstill a power of 2 relating with the numberof logic levels The difference is that onemore f or loop is required to account for themultiple fan-out (eg at logic level 2through 4 in Figure 7 where the cells areplaced in group of 24 and 8 respectively)

The number of logic levels is log2n Eachlogic level has n=2 cells as can be observedin Figure 7 The area is estimated as (n=2)log2n When n = 16 32 cells are required

LADNER-FISCHER PREFIX TREE

Sklansky prefix tree has the minimum logiclevels and uses fewer cells than Kogge-Stoneand Knowles prefix trees The major problemof Sklansky prefix tree is its high fan-outLadner-Fischer prefix tree is proposed torelieve this problem

Figure 7. 16-bit Sklansky Prefix Tree

To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 8 shows a 16-bit example of the Ladner-Fischer prefix tree.

Figure 8. 16-bit Ladner-Fischer Prefix Tree

The Ladner-Fischer prefix tree is a structure that sits between the Brent-Kung and Sklansky prefix trees. It can be observed in Figure 8 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, a fan-out of more than 2 is allowed (i.e. f > 0). Compared with Sklansky's, the fan-out of Ladner-Fischer's is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree.

A Ladner-Fischer prefix tree can be seen as a relieved version of the Sklansky prefix tree. For a structure like that in Figure 8, an extra row of cells is required to generate the missing carries.

The delay for this type of Ladner-Fischer prefix tree is $\log_2 n + 1$ logic levels. The first and last logic levels take $n/2$ and $n/2 - 1$ cells respectively. In between, there are $\log_2 n - 1$ logic levels, each having $n/4$ cells. Summing up the cells gives

$$\frac{n}{2} + \frac{n}{2} - 1 + \frac{n}{4}\left(\log_2 n - 1\right) = \frac{n}{4}\log_2 n + \frac{3n}{4} - 1.$$

When $n = 16$, the total number of cells required is 27.

HAN-CARLSON PREFIX TREE

The idea of the Han-Carlson prefix tree is similar to the Kogge-Stone structure, since it has a maximum fan-out of 2, or f = 0. The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone. The cost is one extra logic level.

The Han-Carlson prefix tree can be viewed as a sparse version of the Kogge-Stone prefix tree. In fact, the fan-out at all logic levels is the same (i.e. 2). The pseudo-code for the Kogge-Stone structure can easily be modified to build a Han-Carlson prefix tree. The major difference is that, in each logic level, the Han-Carlson prefix tree places cells every other bit, and the last logic level accounts for the missing carries. Figure 9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers. The critical path is shown with a thick solid line.

This type of Han-Carlson prefix tree has $\log_2 n + 1$ logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has $n/2$ cells. The area is estimated as $(n/2)\log_2 n$. When $n = 16$, the number is 32.

Figure 9. 16-bit Han-Carlson Prefix Tree

HARRIS PREFIX TREE

Harris's idea is to balance the logic levels, fan-out, and wire tracks of a prefix tree. Harris proposed a cube to show the taxonomy of prefix trees, given in Figure 10, which illustrates the idea for 16-bit prefix trees [1]. In the figure, all the prefix trees mentioned above lie on the cube, with the Sklansky prefix tree standing at the fan-out extreme, Brent-Kung at the logic-levels extreme, and Kogge-Stone at the wire-track extreme.

The balanced prefix structure is close to the center of the cube (i.e. when n = 16, l = 1, f = 1, and t = 1, written in short as (1, 1, 1)). The number of logic levels is $\log_2 n + l = 4 + 1 = 5$, the maximum fan-out is $2^f + 1 = 3$, and the number of wire tracks is $2^t = 2$. The diagram is shown in Figure 11 with the critical path drawn as a solid line. It can be observed that there is bit overlap in logic level 4, similar to Knowles [2,1,1,1]. The overlap is valid for producing correct carries, as has been proven for Knowles [2,1,1,1].

Figure 10. Taxonomy of 16-bit Prefix Trees (Adapted from [1])

For n > 16, (1, 1, 1) is not sufficient to build a prefix tree. More logic levels, fan-out, or wire tracks need to be added. For example, when n = 32, the prefix tree can take the form (1, 1, 2), (1, 2, 1), or (2, 1, 1). Like the Ladner-Fischer and Han-Carlson prefix trees illustrated in the previous sections, this Harris prefix tree has $\log_2 n + 1$ logic levels. It needs the same number of cells as the Han-Carlson and Sklansky prefix trees, which is $(n/2)\log_2 n$.

TABLE 2. ALGORITHMIC ANALYSIS

Type             Logic levels     Area                        Fan-out    Wire tracks
Brent-Kung       2 log2 n - 1     2n - log2 n - 2             2          1
Kogge-Stone      log2 n           n log2 n - n + 1            2          n/2
Ladner-Fischer   log2 n + 1       (n/4) log2 n + 3n/4 - 1     n/4 + 1    1
Sklansky         log2 n           (n/2) log2 n                n/2 + 1    1
Han-Carlson      log2 n + 1       (n/2) log2 n                2          n/4
Harris           log2 n + 1       (n/2) log2 n                3          n/8
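As a quick arithmetic check of the cell counts quoted in the preceding sections, the closed-form area expressions can be evaluated directly. The sketch below is illustrative only: the dictionary `AREA` and the function `cell_counts` are hypothetical names, n is assumed to be a power of two with n >= 4, and the Ladner-Fischer entry uses the form derived in the text, $(n/4)\log_2 n + 3n/4 - 1$.

```python
# Closed-form cell counts as functions of n and L = log2(n).
AREA = {
    "Brent-Kung":     lambda n, L: 2 * (n - 1) - L,
    "Kogge-Stone":    lambda n, L: n * L - n + 1,
    "Ladner-Fischer": lambda n, L: n * L // 4 + 3 * n // 4 - 1,
    "Sklansky":       lambda n, L: n * L // 2,
    "Han-Carlson":    lambda n, L: n * L // 2,
    "Harris":         lambda n, L: n * L // 2,
}


def cell_counts(n):
    """Evaluate every area formula for an n-bit tree (n a power of two, n >= 4)."""
    L = n.bit_length() - 1          # log2(n)
    return {name: formula(n, L) for name, formula in AREA.items()}


for name, cells in cell_counts(16).items():
    print(f"{name:15s} {cells}")
```

For n = 16 the values are 26, 49, 27, 32, 32, and 32 cells, which agree with the per-structure figures given in the text.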

Figure 11. 16-bit Harris Prefix Tree

METHODOLOGY

Design Specification: 16-bit full adder; algorithm to be adopted.

HDL Entry: Verilog coding for each parallel-prefix-tree-based full adder.

Functional Simulation: checking the adder functionality for each parallel-prefix-tree-based full adder.

Synthesis: converting a high-level description of the design into an optimized gate-level representation.

Comparative analysis of synthesis results for various FPGA chips. Parameters: number of slices/LUTs and maximum path delay.

Power analysis for the Spartan 6 FPGA chip: static power comparison at various operating temperatures.

DISCUSSION OF RESULT

TABLE 3. COMPARISON OF SLICE UTILIZATION, NO. OF LOGIC LEVELS REQUIRED & DELAY

TABLE 4. STATIC POWER COMPARISON AT VARIOUS TEMPERATURES

Device: Spartan 6 XC6SLX45, Temp. Grade: C-Grade, Vccint = 1.2 V, Vccaux = 2.5 V

The delays observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are shown in Fig. 11.

Fig. 11. Simulation results for the adder designs (legend: Virtex 6 Low Power, Virtex 6)

The number of slices, the number of LUTs, and the number of logic levels observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are compared in Fig. 12. The area of the adder designs, measured in terms of look-up tables (LUTs) and input/output blocks (IOBs) reported by Xilinx ISE 13.2 for the Spartan 6 FPGA chip, is plotted in the figure. As per reference [1], the ISE software does not give the exact delay of the adders because it is not able to analyze the critical path over the adder [1]. From the comparison table it is clear that, of all the adders, the Harris adder has the lowest delay, while the KSA and BKA have about the same delay. According to the synthesis reports, of the four parallel prefix adders the Harris adder has the better delay because it takes the fewest logic levels for compaction, whereas on the basis of delay and area together (slice utilization or the number of LUTs required) Brent-Kung is the best on average.

Fig. 12. Comparative chart for slice utilization (series: Slices, LUT, No. of Logic Levels)

CONCLUSIONS

From the analysis of delay, area, and power, we conclude that the efficiency in delay is improved by 650 for our RCA when compared to Brent-Kung [2], by 253 for the KSA when compared with [1], and by 1880 for the Harris adder on the Spartan 6 FPGA chip. We can therefore say that Harris is the best in terms of delay, because it takes the fewest logic levels for compaction, whereas on the basis of delay and area together (slice utilization or the number of LUTs required) Brent-Kung is the best on average. The power analysis report shows that almost all of the parallel prefix adders consume more or less the same power.

REFERENCES

[1] D. Harris, "A taxonomy of parallel prefix networks," in Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov. 2003, pp. 2213-2217.
[2] D. Goldberg, "What every computer scientist should know about floating point arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[3] J. Chen and J. E. Stine, "Optimization of bipartite memory systems for multiplicative divide and square root," 48th IEEE International Midwest Symp. Circuits and Systems, vol. 2, pp. 1458-1461, 2005.
[4] S. Winograd, "On the time required to perform addition," J. ACM, vol. 12, no. 2, pp. 277-285, 1965.
[5] R. K. Richards, Arithmetic Operations in Digital Computers. D. Van Nostrand Co., Princeton, NJ, 1955.
[6] A. Weinberger and J. Smith, "A logic for high-speed addition," National Bureau of Standards, no. Circulation 591, pp. 3-12, 1958.
[7] A. Tyagi, "A reduced area scheme for carry-select adders," IEEE Trans. Computers, vol. 42, no. 10, pp. 1163-1170, Oct. 1993.
[8] H. Ling, "High speed binary adder," IBM Journal of Research and Development, vol. 25, no. 3, pp. 156-166, 1981.
[9] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264, Mar. 1982.
[10] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[11] S. Knowles, "A family of adders," in Proc. 15th IEEE Symp. Comp. Arith., June 2001, pp. 277-281.
[12] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[13] R. Ladner and M. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[14] T. Han and D. Carlson, "Fast area-efficient VLSI adders," in Proc. 8th Symp. Comp. Arith., Sept. 1987, pp. 49-56.
[15] A. Naini, D. Bearden, and W. Anderson, "A 4.5 ns 96 b CMOS adder design," in Proc. IEEE Custom Integrated Circuits Conference, vol. 38, no. 8, Apr. 1965, pp. 114-117.
[16] T. Kilburn, D. B. G. Edwards, and D. Aspinall, "Parallel addition in digital computers: a new fast carry circuit," in Proc. IEE, vol. 106, pt. B, Sept. 1959, p. 464.
[17] N. Szabo and R. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill, 1967.
[18] W. K. Jenkins and B. J. Leon, "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. Circuits and Systems, vol. 24, no. 4, pp. 171-201, Apr. 1977.
[19] X. Lai and J. L. Massey, "A proposal for a new block encryption standard," in Advances in Cryptology - EUROCRYPT'90, Berlin, Germany: Springer-Verlag, 1990, pp. 389-404.
[20] S. S.-S. Yau and Y.-C. Liu, "Error correction in redundant residue number systems," IEEE Trans. Computers, vol. C-22, no. 1, pp. 5-11, Jan. 1973.
[21] F. Halsall, Data Communications, Computer Networks and Open Systems. Addison Wesley, 1996.
[22] C. Efstathiou, D. Nikolos, and J. Kalamatianos, "Area-time efficient modulo $2^n - 1$ adder design," IEEE Trans. Circuits and Systems II, vol. 41, no. 7, pp. 463-467, 1994.
[23] L. Kalamboukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo $2^n - 1$ adders," IEEE Trans. Computers, vol. 49, no. 7, special issue on computer arithmetic, pp. 673-680, July 2000.
[24] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Fast parallel-prefix modulo $2^n + 1$ adders," IEEE Trans. Computers, vol. 53, no. 9, pp. 1211-1216, Sept. 2004.
[25] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Modulo $2^n \pm 1$ adder design using select-prefix blocks," IEEE Trans. Computers, vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[26] V. Paliouras and T. Stouraitis, "Novel high-radix residue number system multipliers and adders," in Proc. 1999 IEEE Int'l Symp. Circuits and Systems VLSI (ISCAS '99), 1999, pp. 451-454.
[27] S. Bi, W. J. Gross, W. Wang, A. Al-khalili, and M. N. S. Swamy, "An area-reduced scheme for modulo $2^n - 1$ addition/subtraction," in Proc. 9th International Database Engineering & Application Symp., 2005, pp. 396-399.
[28] R. Zimmermann, "Efficient VLSI implementation of modulo $(2^n \pm 1)$ addition and multiplication," in Proc. 14th IEEE Symp. Computer Arithmetic, 1999, pp. 158-167.
[29] J. Chen and J. E. Stine, "Enhancing parallel-prefix structures using carry-save notation," 51st Midwest Symp. Circuits and Systems, pp. 354-357, 2008.
[30] G. E. Moore, "Cramming more components onto integrated circuits," in Electronics, May 1965, pp. 2551-2554.
[31] M. Lehman and N. Burla, "Skip techniques for high-speed carry propagation in binary arithmetic units," IRE Trans. Electron. Comput., pp. 691-698, Dec. 1961.
[32] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-346, June 1962.
[33] J. Sklansky, "Conditional sum addition logic," IRE Trans. Electron. Comput., pp. 226-231, June 1960.
[34] V. G. Oklobdzija, B. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, "Energy delay estimation technique for high-performance microprocessor VLSI adders," Proc. 16th IEEE Symp. Computer Arithmetic (ARITH-16'03), p. 272, June 2003.
[35] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," PhD dissertation, ETH Dissertation 12480, Swiss Federal Institute of Technology, 1997.
[36] R. Zimmermann and H. Kaeslin, "Cell-based multilevel carry-increment adders with minimal AT- and PT-products."
[37] S. Majerski, "On determination of optimal distributions of carry skips in adders," IEEE Trans. Electron. Comput., pp. 45-58, Feb. 1967.
[38] J. E. Stine, Digital Computer Arithmetic Datapath Design Using Verilog HDL. Kluwer Academic, 2004.
[39] R. W. Doran, "Variants of an improved carry-look-ahead-sum adder," IEEE Trans. Computers, vol. 37, no. 9, pp. 1110-1113, 1988.


Where "-1" is the position of the carry input. The generate/propagate signals can be grouped in different fashions to obtain the same correct carries. Based on the different ways of grouping the generate/propagate signals, different prefix architectures can be created. Figure 3 shows the definitions of the cells used in prefix structures, including the black cell (BC) and the gray cell (GC). For an analysis of various parallel prefix structures, see [2], [3] & [4].

In the prefix tree, group generate/propagate signals are computed at each bit:

$$G_{i:k} = G_{i:j} + P_{i:j} \cdot G_{j-1:k}$$

$$P_{i:k} = P_{i:j} \cdot P_{j-1:k} \qquad (15)$$
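The group equations above can be exercised with a few lines of Python. This is a minimal sketch rather than the authors' Verilog: the function names `combine`, `add_with_prefix`, and `is_associative` are hypothetical, the bit-level signals are assumed to be g = a AND b and p = a XOR b, and the carry-in is modelled as a group at position -1 with (G, P) = (cin, 0). The carries are accumulated serially here; the associativity check is what allows a prefix tree to regroup the same operations in parallel.

```python
from itertools import product


def combine(hi, lo):
    """Merge the more significant group `hi` with the adjacent group `lo`.

    Each group is a (G, P) pair of 0/1 values.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def add_with_prefix(a, b, n, cin=0):
    """Add two n-bit integers using only the group operator for the carries."""
    acc = (cin, 0)                       # group (-1:-1): the carry-in position
    total = 0
    for i in range(n):
        g = (a >> i) & (b >> i) & 1      # bit generate
        p = ((a >> i) ^ (b >> i)) & 1    # bit propagate
        total |= (p ^ acc[0]) << i       # sum bit = p_i XOR carry into bit i
        acc = combine((g, p), acc)       # acc now holds group (i:-1)
    return total | (acc[0] << n)         # append the carry-out


def is_associative():
    """Check (x o y) o z == x o (y o z) for every 0/1 combination."""
    pairs = list(product((0, 1), repeat=2))
    return all(combine(combine(x, y), z) == combine(x, combine(y, z))
               for x in pairs for y in pairs for z in pairs)


print(add_with_prefix(11, 6, 4))   # 17
print(is_associative())            # True
```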

KOGGE-STONE PREFIX TREE

The Kogge-Stone prefix tree is among the prefix trees that use the fewest logic levels. A 16-bit example is shown in Figure 5. In fact, Kogge-Stone is a member of the Knowles family of prefix trees: the 16-bit prefix tree can be viewed as Knowles [1,1,1,1]. The numbers in the brackets represent the maximum branch fan-out at each logic level. The maximum fan-out is 2 in all logic levels for Kogge-Stone prefix trees of any width.

The key to building a prefix tree is how to implement Equation (15) according to the special features of that type of prefix tree while applying the rules described in the previous section. Gray cells are inserted like black cells, except that gray cells output the final carry-outs instead of an intermediate G/P group. The reason for starting with the Kogge-Stone prefix tree is that it is the easiest to build programmatically. The example in Figure 5 is a 16-bit (a power of 2) prefix tree. It is not difficult to extend the structure to any width if the basics are strictly followed.

For the Kogge-Stone prefix tree, at logic level 1 the input span is 1 bit (e.g. group (4:3) takes its inputs at bit 4 and bit 3). Group (4:3) is then taken as an input and combined with group (6:5) to generate group (6:3) at logic level 2. Group (6:3) is taken as an input and combined with group (10:7) to generate group (10:3) at logic level 3, and so on. With this inspection, the structure can be described by Algorithm 6.1, listed below.

Figure 5. 16-bit Kogge-Stone Prefix Tree

In Algorithm 6.1, the number of logic levels is calculated first. At each logic level, the maximum input bit span and maximum output bit span are computed. Equation (15) is applied in the inner loop, where the bit index i runs from v - 1 up to, but not including, n - 1. If any subscript falls below -1, its value stays at -1. This means there is no crossing over bit -1, the LSB boundary.

Algorithm 6.1 Building Kogge-Stone Prefix Tree
    L = log2(n)
    for llevel = 1; llevel <= L; llevel++ do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = v-1; i < n-1; i++ do
            GP(i : i-u+1) = GP(i : i-v+1) o GP(i-v : i-u+1)
        end for
    end for

The statement in the inner for loop applies Equation (15). The validity of this implementation can be verified by looking at Table 1. In the table, one group operation is randomly selected at each logic level. Other operations can be verified by inserting the numbers as listed in Table 1. The term $GP_{i:i} = GP_i$, and the LSB boundary of the inputs/outputs is bit -1. Table 1 can also be matched against Figure 5 to see the correspondence.

Table 1. Verifying the Pseudo-Code of Building a Kogge-Stone Prefix Tree

The pseudo-code is a simplified version of the exact program. In the real program, the code should tell where the black cells and gray cells are. The program also needs control so that the LSB never goes beyond -1, and it may utilize optional buffers. In Figure 5 there are fan-outs of more than 2 because the structure is not buffered. Figure 39 shows a buffered 16-bit prefix tree; however, the exact number of buffers depends on the capacitance and resistance of the interconnect network [46]. Both figures indicate a wire track count of 8.

The algorithmic delay is simply the number of logic levels. The area can be estimated as the number of cells in the prefix tree. To simplify the calculation, all cells are counted as black cells. Note that the number of gray cells always equals n - 1, since the prefix tree only outputs n - 1 carries. A black cell has one more AND gate than a gray cell, and therefore a more accurate area estimate would simply subtract those n - 1 AND gates.

The number of cells in a Kogge-Stone prefix tree can be counted as follows. Each logic level has $n - m$ cells, where $m = 2^{llevel-1}$. That is, each logic level is missing m cells. The total number of missing cells is the sum of a geometric series from 1 to $n/2$, which totals $n - 1$. The total number of cells is therefore $n\log_2 n$ minus the total number of cells missing at each logic level, which gives $n\log_2 n - n + 1$. When $n = 16$, the area is estimated as 49.
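The Kogge-Stone pseudo-code and the cell count above can be cross-checked with a short Python sketch. It follows the same loop bounds and the clamping of subscripts at -1, but it is an illustrative transcription, not the authors' program: the names `kogge_stone` and `is_complete` and the tuple representation of a cell are assumptions, and n is assumed to be a power of two.

```python
def kogge_stone(n):
    """Cells of an n-bit Kogge-Stone prefix tree, one list per logic level.

    A group is written (hi, lo); bit -1 is the carry-in position.
    Each cell is (output, left_input, right_input).
    """
    levels = n.bit_length() - 1              # log2(n); n is assumed a power of two
    tree = []
    for llevel in range(1, levels + 1):
        u = 2 ** llevel                      # output bit span
        v = 2 ** (llevel - 1)                # input bit span
        cells = []
        for i in range(v - 1, n - 1):
            lo = max(i - u + 1, -1)          # subscripts never drop below -1
            cells.append(((i, lo), (i, i - v + 1), (i - v, lo)))
        tree.append(cells)
    return tree


def is_complete(tree, n):
    """True if every cell reads existing groups and all carries (i:-1) are produced."""
    have = {(i, i) for i in range(-1, n - 1)}    # bit-level groups, incl. carry-in
    for cells in tree:
        if any(left not in have or right not in have for _, left, right in cells):
            return False
        have |= {out for out, _, _ in cells}
    return all((i, -1) in have for i in range(-1, n - 1))


tree = kogge_stone(16)
print([len(cells) for cells in tree])        # cells per logic level
print(sum(len(cells) for cells in tree))     # total cell count
print(is_complete(tree, 16))
```

For n = 16 the per-level counts are 15, 14, 12, and 8 (that is, n - m with m = 1, 2, 4, 8), for a total of 49 cells.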

BRENT-KUNG PREFIX TREE

The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. Its fan-out is among the minimum, with f = 0. So is the number of wire tracks, with t = 0. The cost is the extra L - 1 logic levels. A 16-bit example is shown in Figure 6, with the critical path drawn as a thick gray line.

The Brent-Kung prefix tree is somewhat complex to build because it has the most logic levels. To build such a structure, the pseudo-code can be composed as Algorithm 8.1.

Algorithm 8.1 Building Brent-Kung Prefix Tree
    L = log2(n)
    for llevel = 1; llevel <= L; llevel++ do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = u-2; i < n-1; i += u do
            GP(i : i-u+1) = GP(i : i-v+1) o GP(i-v : i-u+1)
        end for
    end for
    for llevel = L-1; llevel >= 1; llevel-- do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = u+v-2; i < n-1; i += u do
            GP(i : -1) = GP(i : i-v+1) o GP(i-v : -1)
        end for
    end for

The algorithm handles this prefix tree in two major for loops. The first for loop works logic level by logic level from 1 up to L, and the second for loop handles the remaining L - 1 logic levels in decreasing order. Figure 6 can be divided into the top 4 logic levels and the bottom 3 logic levels. The structure starts with cells every 2 bits: at logic level 1 the input span is 1 bit and the output span is 2 bits. At logic levels 2 and 3, the distance between cells is 4 and 8 bits respectively. The input/output span is 2/4 bits at logic level 2 and 4/8 bits at logic level 3. At logic level 4, the only cell is at the MSB (bit 14 from the input), with the input spanning 8 bits and the output spanning 16 bits. By logic level 4, some carries have already been generated. At logic levels 5 through 7, the input bit span is decremented instead of incremented as in the previous levels: the input bit spans at logic levels 5, 6, and 7 are 4, 2, and 1 bits respectively.
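The two loops described above can be transcribed into Python to confirm the level-by-level structure. As with any such transcription, this is a sketch under assumptions: the names `brent_kung` and `is_complete` and the tuple form of a cell are hypothetical, groups are written (hi, lo) with bit -1 as the carry-in position, and n is assumed to be a power of two.

```python
def brent_kung(n):
    """Cells of an n-bit Brent-Kung prefix tree, one list per logic level.

    Each cell is (output, left_input, right_input); a group is (hi, lo).
    """
    levels = n.bit_length() - 1                      # L = log2(n)
    tree = []
    for llevel in range(1, levels + 1):              # first loop: levels 1..L
        u, v = 2 ** llevel, 2 ** (llevel - 1)        # output / input bit span
        tree.append([((i, i - u + 1), (i, i - v + 1), (i - v, i - u + 1))
                     for i in range(u - 2, n - 1, u)])
    for llevel in range(levels - 1, 0, -1):          # second loop: L-1 more levels
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        tree.append([((i, -1), (i, i - v + 1), (i - v, -1))
                     for i in range(u + v - 2, n - 1, u)])
    return tree


def is_complete(tree, n):
    """True if every cell reads existing groups and all carries (i:-1) are produced."""
    have = {(i, i) for i in range(-1, n - 1)}        # bit-level groups, incl. carry-in
    for cells in tree:
        if any(left not in have or right not in have for _, left, right in cells):
            return False
        have |= {out for out, _, _ in cells}
    return all((i, -1) in have for i in range(-1, n - 1))


tree = brent_kung(16)
print(len(tree))                             # number of logic levels
print([len(cells) for cells in tree])        # cells per logic level
print(sum(len(cells) for cells in tree))     # total cell count
```

For n = 16 this gives 7 logic levels holding 8, 4, 2, 1, 1, 3, and 7 cells, i.e. 26 cells in total, and the only cell at level 4 sits at bit 14, as stated in the text.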

The term output span no longer applies tothese L-1 levels since all the outputs arethe final carries with the formGiminus1

Figure 6 16-bit Brent-Kung Prefix TreeThe delay is estimated as the number of logiclevels (ie L) The total number of cellscan be calculated in the following way Inthe first log2n logic levels the number ofcells is a geometric series For example inthe 16-bit prefix tree at logic level 1through 4 there are 8 4 2 1 cell at eachlevel The sum of this series is n-1 For therest of the logic levels there only existgray cells The total number of gray cells isn-1 for any prefix tree as mentioned beforeHowever in the previous log2 n logic levelsthe pre x tree contains log2 n gray cells Thesum of cells is 2(n-1) log2n When n = 16 thenumber of cells required is 26

SKLANSKY PREFIX TREESklansky prefix tree takes the least logiclevels to compute the carries Plus it usesless cells than Knowles [2111] and Kogge-Stone structure at the cost of higher fan-out Figure 7 shows the 16-bit example ofSklansky prefix tree with critical path insolid line

For a 16-bit Sklansky prefix tree themaximum fan-out is 9 (ie f = 3) Thestructure can be viewed as a compactedversion of Brent-kungs where logic levelsare reduced and fan-out increased A similarpseudo-code listed for Brent-Kung prefix treecan be used to generate a Sklansky prefixtree However the maximum input span isstill a power of 2 relating with the numberof logic levels The difference is that onemore f or loop is required to account for themultiple fan-out (eg at logic level 2through 4 in Figure 7 where the cells areplaced in group of 24 and 8 respectively)

The number of logic levels is log2n Eachlogic level has n=2 cells as can be observedin Figure 7 The area is estimated as (n=2)log2n When n = 16 32 cells are required

LADNER-FISCHER PREFIX TREE

Sklansky prefix tree has the minimum logiclevels and uses fewer cells than Kogge-Stoneand Knowles prefix trees The major problemof Sklansky prefix tree is its high fan-outLadner-Fischer prefix tree is proposed torelieve this problem

Figure 7 16-bit Sklansky Prefix Tree

To reduce fan-out without adding extra cellsmore logic levels have to be added Figure 8shows a 16-bit example of Ladner-Fischerprefix tree

Figure 811-bit Ladner-Fischer Prefix TreeSynthesis

Ladner-Fischer prefix tree is a structurethat sits between Brent-Kung and Sklanskyprefix tree It can be observed that inFigure 8 the first two logic levels of thestructure are exactly the same as Brent-Kungs Starting from logic level 3 fan-outmore than 2 is allowed (ie f gt 0)Comparing the fan-out of Ladner-Fischers andSklanskys the number is reduced by a factorof 2 since Ladner-Fischer prefix tree allowsmore fan-out one logic level later thanSklansky prefix tree

Building a Ladner-Fischer prefix tree canbe seen as a relieved version of Sklanskyprefix tree For a structure like Figure 8 aextra row of cells are required to generatethe missing carries

The delay for the type of Ladner-Fischerprefix tree islog2 (n ) The first and last

logic level takes n2 and

n2minus1cells In

between there are log2 (n )minus1 logic levels

each having n4cells Summing up the cells

n2

+n2

minus1+(n4) (log2 (n )minus1 )which is equal to

(n4 )log2 (n)+3n4

minus1 When n = 16 total cells

required is 27

HAN-CARLSON PREFIX TREEThe idea of Han-Carlson prefix tree issimilar to Kogge-Stones structure since ithas a maximum fan-out of 2 or f = 0 Thedifference is that Han-Carlson prefix treeuses much less cells and wire tracks thanKogge-Stone The cost is one extra logiclevelHan-Carlson prefix tree can be viewed as asparse version of Kogge-Stone prefix tree Infact the fan-out at all logic levels is thesame (ie 2) The pseudo-code for Kogge-Stones structure can be easily modified tobuild a Han-Carlson prefix tree The majordifference is that in each logic level Han-Carlson prefix tree places cells every otherbit and the last logic level accounts for themissing carries Figure 9 shows a 16-bit Han-

Carlson prefix tree ignoring the buffersThe critical path is shown with thick solidline

This type of Han-Carlson prefix tree has log2n+ 1 logic levels It happens to have the samenumber cells as Sklansky prefix tree sincethe cells in the extra logic level can bemove up to make the each of the previouslogic levels all have n=2 cells The area isestimated as (n=2) log2n When n = 16 thenumber is 32

Figure 9 16-bit Han-Carlson Prefix Tree

HARRIS PREFIX TREEThe idea from Harris about prefix tree is totry to balance the logic levels fan-out andwire tracks Harris proposed a cube to showthe taxonomy for prefix trees in Figure 10which illustrates the idea for 16-bit prefixtrees [1] In the figure all the prefixtrees mentioned above are on the cube withSklansky prefix tree standing at the fan-outextreme Brent-Kung at the logic levelsextreme and Kogge-Stone at the wire trackextreme

The balanced prefix structure is close tothe center of cube (ie when n = 16 l = 1f = 1 and t = 1 or represented in short by(1 1 1)) The logic levels is 24 + 1 = 5maximum fan-out is 2f + 1 = 3 and wire trackis 2t = 2 The diagram is shown in Figure 10with critical path in solid line Observationcan be made that there is bit overlap inlogic level 4 similar to Knowles [2 1 11] The overlap is valid for producingcorrect carries as it has be proven forKnowles [2 1 1 1]

Figure10 Taxonomy of 16-bit Prefix Tree(Adapted from [1])

For n ge 16 (1 1 1) will not be sufficientto build a prefix tree More logic levels orfan-out or wire tracks need to be added Forexample when n = 32 the prefix tree can bein the form of(1 1 2) (1 2 1) and (2 1 1) LikeLadner-Fischer and Han-Carlson prefix treeillustrated in the previous sections Harrisprefix tree has log2 (n )+1 logic levels Itneeds the same number of cells required forHan-Carlson and Sklansky prefix tree which

isn2log2 (n)

TABLE 2 ALGORITHMIC ANALYSIS

Types LogicLevels

Area Fan-out

Wiretracks

Brent-Kung

2log2 (n)2nminuslog2nminus2 2 1

Kogge-Stone

l0g2n nlog2nminusn+1 2 n2

Ladner-

Fischer

log2n+1

n4log2n+3n

4+1n4

+11

Sklansky

log2n (n2 )log2nn2

+11

Han-Carlso

n

log2n (n2 )log2n2 n

4

Harris log2n (n2 )log2n3 n 6

Figure 11 16-bit Harris Prefix Tree

METHEDOLOGY

Design Specification16 bit Full adderAlgorithm to be adopted

HDL EntryVerilog coding for each parallel prefix tree based full adder

Functional SimulationTo check the adder functionality for each parallel prefix tree based full adder

Synthesis Converting a high-level description of design into an optimized gate-level representation

Comparative analysis of synthesis results for various FPGA chipsParameters No of SliceLUT amp maximum path delay

Power Analysis for Spartan 6 FPGA chipStatic power comparison for various operating temperature

DISCUSSION OF RESULT

TABLE 3 COMPARISON OF SLICE UTILIZATION NOOF LOGIC LEVELS REQUIRED amp DELAY

TABLE 4 STATIC POWER COMPARISON ATVARIOUS TEMPERATURES

Device Spartan 6 XC6SLX45 Temp GradeC-Grade Vccint= 12V Vccaux= 25v

The delay observed for adder design fromsynthesis reports in Xilinx ISE 132

synthesis reports are shown in figure

05

1015202530354045

Virtex 6 Low Power

Virtex 6

Fig 11 Simulation results for the adderdesigns

The no of slices LUT and no of logic levelobserved for adder designs from synthesisreports in Xilinx ISE 132 are compared andshown in figure12The area of the adder designs is measured interms of look up tables (LUT) and inputoutput blocks (IOB) taken for Xilinx ISE 132in Spartan 6 FPGA chip is plotted in thefigureAs per reference [1] ISE software doesnrsquotgive exact delay of the adders because it isnot able to analyze the critical path over

the adder [1] From the comparison table itis clear that Out of all adders Harris adderhas less delay KSA adder and BKA

0

10

20

30

40

50Slices LUTNo of Logic Level

Fig Comparative chart for Slice utilization

have about the same delay According to thesynthesis reports out of four parallelprefix adders Harris adder has better delaybecause of taking least logic level forcompaction where as on the basis of delay andarea (slice utilization or the no of LUTrequired) Brent Kung is the best on anaverage

CONCLUSIONS

From the study of analysis done on delayarea and power we have concluded that theefficiency is improved by 650 in oursdelay for RCA when compared to Brent Kung[2] and for KSA it is improved by 253 when compared with [1] and for Harris adderit is improved by 1880 in Spartan 6 FPGAchip So we can say that Harris is the bestbecause of taking least logic level forcompaction where as on the basis of delay andarea (slice utilization or the no of LUTrequired) Brent Kung is the best on anaverage where as power analysis report showsthat almost all of the parallel prefix adderstakes more or less the same power

REFERENCES

[1] D Harris A taxonomy of parallel prefixnetworks in Record of the Thirty-Seventh AsilomarConference on Signals Systems and Computers Nov2003 pp 22132217[2] D Goldberg What every computerscientist should know about floating point



To build such a structure, the pseudo-code can be composed as Algorithm 7.1.

Algorithm 7.1 Building Brent-Kung Prefix Tree
    L = log2(n)
    for llevel = 1; llevel <= L; llevel++ do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = u-2; i < n-1; i += u do
            GP_{i:i-u+1} = (GP_{i:i-v+1}) • (GP_{i-v:i-u+1})
        end for
    end for
    for llevel = L-1; llevel >= 1; llevel-- do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = u+v-2; i < n-1; i += u do
            GP_{i:-1} = (GP_{i:i-v+1}) • (GP_{i-v:-1})
        end for
    end for

Table 1 Verifying the Pseudo-Code of Building a Kogge-Stone Prefix Tree

-1 or the LSB boundary

The statement in the inner for loop applies Equation (3.2). The validity of this implementation can be verified by looking at Table 1. In the table, one group operation is randomly selected at each logic level. Other operations can be verified by inserting the numbers as listed in Table 1. The term $GP_{i:i} = GP_i$, and the LSB boundary of the inputs/outputs is bit $-1$. Table 1 can also be matched against Figure 3.8 to see the correspondence.

The pseudo-code is a simplified version of the exact program. In the real program, the code should tell where the black cells and gray cells are. The program also needs control so that the LSB never goes beyond $-1$, and it can utilize optional buffers. In Figure 3.8, there are fan-outs of more than 2 because the structure is not buffered. Figure 3.9 shows a buffered 16-bit prefix tree; however, the exact number of buffers depends on the capacitance and resistance of the interconnect network [46]. Both figures indicate a wire track of 8.

The algorithmic delay is simply the number of logic levels. The area can be estimated as the number of cells in the prefix tree. To simplify the calculation, all cells are counted as black cells. To understand this structure, remember that the number of gray cells always equals $n - 1$, since the prefix tree only outputs $n - 1$ carries. A black cell has one more AND gate than a gray cell; therefore, a more accurate area estimate simply subtracts those $n - 1$ AND gates.
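The difference between the two cell types can be sketched in a few lines of Python. This is only an illustration under the usual generate/propagate formulation of the prefix operator, which the text above does not spell out; the names `black_cell` and `gray_cell` are hypothetical.

```python
from typing import Tuple

# A (generate, propagate) pair; each entry is 0 or 1.
GP = Tuple[int, int]


def black_cell(hi: GP, lo: GP) -> GP:
    """Merge a more-significant group `hi` with the adjacent less-significant group `lo`.

    Produces both the group generate and the group propagate.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def gray_cell(hi: GP, lo: GP) -> int:
    """Same merge, but only the group generate (the carry) is produced.

    Compared with black_cell, the AND gate for the group propagate is omitted.
    """
    g_hi, p_hi = hi
    g_lo, _ = lo
    return g_hi | (p_hi & g_lo)
```

The single AND gate that `gray_cell` omits is the one the area estimate above subtracts for each of the $n - 1$ gray cells.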

The number of cells for a Kogge-Stone prefix tree can be counted as follows. Each logic level has $n - m$ cells, where $m = 2^{llevel-1}$. That is, each logic level is missing $m$ cells. The total number of missing cells is the sum of a geometric series running from 1 to $n/2$, which totals $n - 1$. The total number of cells is therefore $n \log_2 n$ minus the cells missing at each logic level, which gives $n \log_2 n - n + 1$. When $n = 16$, the area is estimated as 49.
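The count can be checked numerically. The sketch below simply sums $n - m$ over the logic levels, assuming $n$ is a power of two; the function name `kogge_stone_cells` is illustrative and not part of the original design files.

```python
def kogge_stone_cells(n: int) -> int:
    """Count the cells of an n-bit Kogge-Stone prefix tree level by level."""
    levels = n.bit_length() - 1          # log2(n) for a power of two
    if n != 1 << levels:
        raise ValueError("n must be a power of two")
    total = 0
    for llevel in range(1, levels + 1):
        m = 2 ** (llevel - 1)            # cells missing at this logic level
        total += n - m
    return total
```

For $n = 16$ the per-level counts are 15, 14, 12 and 8, which add up to the 49 cells quoted above.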

BRENT-KUNG PREFIX TREE

Brent-Kung prefix tree is a well-known structure with a relatively sparse network. The fan-out is among the minimum, as $f = 0$. So are the wire tracks, where $t = 0$. The cost is the extra $L - 1$ logic levels. A 16-bit example is shown in Figure 6. The critical path is shown in the figure with a thick gray line.

Brent-Kung prefix tree is a bit complex to build because it has the most logic levels. To build such a structure, the pseudo-code can be composed as Algorithm 8.1.

Algorithm 8.1 Building Brent-Kung Prefix Tree
    L = log2(n)
    for llevel = 1; llevel <= L; llevel++ do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = u-2; i < n-1; i += u do
            GP_{i:i-u+1} = (GP_{i:i-v+1}) • (GP_{i-v:i-u+1})
        end for
    end for
    for llevel = L-1; llevel >= 1; llevel-- do
        u = 2^llevel          {output bit span}
        v = 2^(llevel-1)      {input bit span}
        for i = u+v-2; i < n-1; i += u do
            GP_{i:-1} = (GP_{i:i-v+1}) • (GP_{i-v:-1})
        end for
    end for

The algorithm deals with this prefix tree in two major for loops. The first for loop handles logic levels 1 up to $L$, one level at a time, and the second for loop handles the remaining $L - 1$ logic levels in decreasing order. Figure 6 can be divided into the top 4 logic levels and the bottom 3 logic levels. The structure starts with cells every 2 bits; the input span is 1 bit and the output span is 2 bits. At logic levels 2 and 3, the distance between cells is 4 and 8 bits, respectively. The input/output span is 2/4 bits at logic level 2 and 4/8 bits at logic level 3. At logic level 4, the only cell is at the MSB (bit 14 from the input), with the input spanning 8 bits and the output spanning 16 bits. By logic level 4, some carries have already been generated. At logic levels 5 through 7, the input bit span is decremented instead of being incremented as in the previous levels. The input bit spans at logic levels 5, 6 and 7 are 4, 2 and 1 bits, respectively.

The term output span no longer applies to these $L - 1$ levels, since all the outputs are final carries of the form $G_{i:-1}$.

Figure 6 16-bit Brent-Kung Prefix Tree

The delay is estimated as the number of logic levels (i.e., $2L - 1$). The total number of cells can be calculated in the following way. In the first $\log_2 n$ logic levels, the number of cells forms a geometric series. For example, in the 16-bit prefix tree, logic levels 1 through 4 have 8, 4, 2 and 1 cells, respectively. The sum of this series is $n - 1$. The remaining logic levels contain only gray cells. The total number of gray cells is $n - 1$ for any prefix tree, as mentioned before. However, the first $\log_2 n$ logic levels already contain $\log_2 n$ gray cells. The sum of cells is therefore $2(n - 1) - \log_2 n$. When $n = 16$, the number of cells required is 26.
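The two loops of the Brent-Kung pseudo-code can be traced with a small Python model. This is a sketch under stated assumptions, not the Verilog used in this work: bit positions run from $-1$ (the carry-in, at the LSB boundary) to $n - 2$, every cell is modelled as a black cell, and the names `combine` and `brent_kung_carries` are hypothetical.

```python
from typing import Dict, List, Tuple


def combine(hi: Tuple[int, int], lo: Tuple[int, int]) -> Tuple[int, int]:
    """Prefix operator on (generate, propagate) pairs."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def brent_kung_carries(g: List[int], p: List[int], cin: int = 0):
    """Return (carries, cell_count) for per-bit g/p of bits 0..n-2.

    carries[i] is G_{i:-1}, the carry out of bit i.
    """
    n = len(g) + 1                        # position -1 plus bits 0..n-2
    L = n.bit_length() - 1
    assert n == 1 << L, "n must be a power of two"
    gp: Dict[Tuple[int, int], Tuple[int, int]] = {(-1, -1): (cin, 0)}
    for i in range(n - 1):
        gp[(i, i)] = (g[i], p[i])
    cells = 0
    # First loop: logic levels 1..L, spans double at each level.
    for llevel in range(1, L + 1):
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(u - 2, n - 1, u):
            gp[(i, i - u + 1)] = combine(gp[(i, i - v + 1)], gp[(i - v, i - u + 1)])
            cells += 1
    # Second loop: the remaining L-1 levels, every output reaches bit -1.
    for llevel in range(L - 1, 0, -1):
        u, v = 2 ** llevel, 2 ** (llevel - 1)
        for i in range(u + v - 2, n - 1, u):
            gp[(i, -1)] = combine(gp[(i, i - v + 1)], gp[(i - v, -1)])
            cells += 1
    carries = [gp[(i, -1)][0] for i in range(n - 1)]
    return carries, cells
```

Running the model with 15 operand bits plus a carry-in ($n = 16$) visits 8, 4, 2 and 1 cells in the first loop and 1, 3 and 7 cells in the second, which matches the 26 cells derived above.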

SKLANSKY PREFIX TREE

Sklansky prefix tree takes the fewest logic levels to compute the carries. It also uses fewer cells than the Knowles [2,1,1,1] and Kogge-Stone structures, at the cost of higher fan-out. Figure 7 shows the 16-bit example of Sklansky prefix tree with the critical path in solid line.

For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e., $f = 3$). The structure can be viewed as a compacted version of Brent-Kung's, where logic levels are reduced and fan-out is increased. Pseudo-code similar to that listed for Brent-Kung prefix tree can be used to generate a Sklansky prefix tree. The maximum input span is still a power of 2 related to the number of logic levels. The difference is that one more for loop is required to account for the multiple fan-out (e.g., at logic levels 2 through 4 in Figure 7, where the cells are placed in groups of 2, 4 and 8, respectively).

The number of logic levels is $\log_2 n$. Each logic level has $n/2$ cells, as can be observed in Figure 7. The area is estimated as $(n/2)\log_2 n$. When $n = 16$, 32 cells are required.
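The grouping of cells and the resulting fan-out can be reproduced with a short Python sketch. It assumes bit positions 0 to $n - 1$ with position 0 at the LSB boundary, and it counts fan-out as the number of cells reading a node plus one for the node's own pass-through wire, which is an assumption chosen to agree with the figure of 9 quoted above. The name `sklansky_tree` is illustrative.

```python
from collections import Counter
from typing import List, Tuple


def sklansky_tree(n: int) -> Tuple[List[int], int]:
    """Return (cells per logic level, maximum fan-out) for an n-bit Sklansky tree."""
    L = n.bit_length() - 1
    assert n == 1 << L, "n must be a power of two"
    low = list(range(n))      # low[i]: lowest bit covered by the group ending at bit i
    cells_per_level = []
    max_fanout = 0
    for llevel in range(1, L + 1):
        step = 1 << (llevel - 1)
        readers = Counter()   # how many cells read each source node at this level
        new_low = low[:]
        for i in range(n):
            if i & step:                          # bit i receives a cell at this level
                src = (i // step) * step - 1      # shared source just below the group
                assert low[i] == src + 1          # the two groups must be adjacent
                new_low[i] = low[src]
                readers[src] += 1
        low = new_low
        cells_per_level.append(sum(readers.values()))
        max_fanout = max(max_fanout, max(readers.values()) + 1)
    assert all(lo == 0 for lo in low)             # every bit now reaches the LSB
    return cells_per_level, max_fanout
```

For $n = 16$ the sketch reports 8 cells at each of the 4 levels, 32 in total, with a maximum fan-out of 9 at the last level.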

LADNER-FISCHER PREFIX TREE

Sklansky prefix tree has the minimum number of logic levels and uses fewer cells than Kogge-Stone and Knowles prefix trees. The major problem of Sklansky prefix tree is its high fan-out. Ladner-Fischer prefix tree is proposed to relieve this problem.

Figure 7 16-bit Sklansky Prefix Tree

To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 8 shows a 16-bit example of Ladner-Fischer prefix tree.

Figure 8 16-bit Ladner-Fischer Prefix Tree

Ladner-Fischer prefix tree is a structure that sits between Brent-Kung and Sklansky prefix trees. It can be observed in Figure 8 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, fan-out of more than 2 is allowed (i.e., $f > 0$). Comparing the fan-out of Ladner-Fischer's and Sklansky's, the number is reduced by a factor of 2, since Ladner-Fischer prefix tree allows more fan-out one logic level later than Sklansky prefix tree.

Building a Ladner-Fischer prefix tree can be seen as building a relieved version of Sklansky prefix tree. For a structure like Figure 8, an extra row of cells is required to generate the missing carries.

The delay for this type of Ladner-Fischer prefix tree is $\log_2 n + 1$. The first and last logic levels take $n/2$ and $n/2 - 1$ cells, respectively. In between, there are $\log_2 n - 1$ logic levels, each having $n/4$ cells. Summing up the cells gives $\frac{n}{2} + \frac{n}{2} - 1 + \frac{n}{4}(\log_2 n - 1)$, which is equal to $\frac{n}{4}\log_2 n + \frac{3n}{4} - 1$. When $n = 16$, the total number of cells required is 27.
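The level-by-level sum can be written out as a small check. The sketch assumes $n$ is a power of two and at least 4, and takes the per-level counts exactly as stated in the paragraph above; `ladner_fischer_cells` is an illustrative name.

```python
def ladner_fischer_cells(n: int) -> int:
    """Sum the cells of the Ladner-Fischer tree described above, level by level."""
    L = n.bit_length() - 1
    assert n == 1 << L and n >= 4, "n must be a power of two, at least 4"
    first = n // 2                 # first logic level
    last = n // 2 - 1              # extra row that generates the missing carries
    middle = (L - 1) * (n // 4)    # log2(n) - 1 levels of n/4 cells each
    return first + last + middle
```

For $n = 16$ this gives 8 + 7 + 12 = 27 cells, in agreement with the closed form.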

HAN-CARLSON PREFIX TREE

The idea of Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2, or $f = 0$. The difference is that Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone. The cost is one extra logic level.

Han-Carlson prefix tree can be viewed as a sparse version of Kogge-Stone prefix tree. In fact, the fan-out at all logic levels is the same (i.e., 2). The pseudo-code for Kogge-Stone's structure can be easily modified to build a Han-Carlson prefix tree. The major difference is that in each logic level, Han-Carlson prefix tree places cells every other bit, and the last logic level accounts for the missing carries. Figure 9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers. The critical path is shown with a thick solid line.

This type of Han-Carlson prefix tree has $\log_2 n + 1$ logic levels. It happens to have the same number of cells as Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has $n/2$ cells. The area is estimated as $(n/2)\log_2 n$. When $n = 16$, the number is 32.
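The "every other bit, then one repair level" construction can be modelled as follows. This is a sketch only: positions run from 0 to $n - 1$ with position 0 at the LSB boundary, so the sparse cells sit on odd positions, and `han_carlson_tree` is a hypothetical name rather than code from this work.

```python
from typing import List, Tuple


def han_carlson_tree(n: int) -> List[int]:
    """Return the number of cells at each logic level of an n-bit Han-Carlson tree."""
    L = n.bit_length() - 1
    assert n == 1 << L and n >= 4, "n must be a power of two, at least 4"
    low = list(range(n))      # low[i]: lowest bit covered by the group ending at bit i
    cells_per_level: List[int] = []

    def run_level(pairs: List[Tuple[int, int]]) -> None:
        nonlocal low
        new_low = low[:]
        for i, src in pairs:
            assert low[i] == src + 1      # merged groups must be adjacent
            new_low[i] = low[src]
        low = new_low
        cells_per_level.append(len(pairs))

    # Level 1: every odd position merges with its even neighbour.
    run_level([(i, i - 1) for i in range(1, n, 2)])
    # Levels 2..L: Kogge-Stone style merging, restricted to odd positions.
    for llevel in range(2, L + 1):
        d = 1 << (llevel - 1)
        run_level([(i, i - d) for i in range(1, n, 2) if i >= d])
    # Extra level: even positions pick up the carry from the odd position below.
    run_level([(i, i - 1) for i in range(2, n, 2)])
    assert all(lo == 0 for lo in low)     # every bit now reaches the LSB
    return cells_per_level
```

For $n = 16$ the levels hold 8, 7, 6, 4 and 7 cells. The 7 cells of the extra level fill the gaps of 1, 2 and 4 cells left in levels 2 to 4, which is the "moved up" argument above and gives 32 cells over $\log_2 n + 1 = 5$ levels.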

Figure 9 16-bit Han-Carlson Prefix Tree

HARRIS PREFIX TREE

The idea from Harris about prefix trees is to try to balance the logic levels, fan-out and wire tracks. Harris proposed a cube to show the taxonomy for prefix trees, shown in Figure 10, which illustrates the idea for 16-bit prefix trees [1]. In the figure, all the prefix trees mentioned above are on the cube, with Sklansky prefix tree standing at the fan-out extreme, Brent-Kung at the logic levels extreme, and Kogge-Stone at the wire track extreme.

The balanced prefix structure is close to the center of the cube (i.e., when $n = 16$, $l = 1$, $f = 1$ and $t = 1$, or represented in short by (1, 1, 1)). The number of logic levels is $\log_2 n + l = 4 + 1 = 5$, the maximum fan-out is $2^f + 1 = 3$, and the wire track count is $2^t = 2$. The diagram is shown in Figure 11, with the critical path in solid line. It can be observed that there is bit overlap in logic level 4, similar to Knowles [2,1,1,1]. The overlap is valid for producing correct carries, as has been proven for Knowles [2,1,1,1].

Figure 10 Taxonomy of 16-bit Prefix Tree (Adapted from [1])

For $n > 16$, (1, 1, 1) will not be sufficient to build a prefix tree. More logic levels, fan-out or wire tracks need to be added. For example, when $n = 32$, the prefix tree can be in the form of (1, 1, 2), (1, 2, 1) or (2, 1, 1). Like the Ladner-Fischer and Han-Carlson prefix trees illustrated in the previous sections, Harris prefix tree has $\log_2 n + 1$ logic levels. It needs the same number of cells required for Han-Carlson and Sklansky prefix trees, which is $\frac{n}{2}\log_2 n$.
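The $(l, f, t)$ coordinates can be explored with a short sketch. It assumes the constraint $l + f + t = \log_2 n - 1$ from Harris's taxonomy [1], which this section uses implicitly but does not state, together with the relations quoted above (logic levels $\log_2 n + l$, fan-out $2^f + 1$, wire tracks $2^t$). The function names are illustrative.

```python
from typing import Dict, List, Tuple


def taxonomy_points(n: int) -> List[Tuple[int, int, int]]:
    """Enumerate all (l, f, t) with l + f + t = log2(n) - 1 (assumed constraint)."""
    budget = n.bit_length() - 2           # log2(n) - 1 for a power of two
    points = []
    for l in range(budget + 1):
        for f in range(budget + 1 - l):
            points.append((l, f, budget - l - f))
    return points


def describe(n: int, l: int, f: int, t: int) -> Dict[str, int]:
    """Translate a taxonomy point into levels, fan-out and wire tracks."""
    return {
        "logic_levels": (n.bit_length() - 1) + l,
        "max_fanout": 2 ** f + 1,
        "wire_tracks": 2 ** t,
    }
```

With $n = 16$, the corners (3, 0, 0), (0, 3, 0) and (0, 0, 3) reproduce the Brent-Kung, Sklansky and Kogge-Stone figures discussed earlier, and (1, 1, 1) gives 5 levels, fan-out 3 and 2 wire tracks. With $n = 32$, the points whose coordinates are all at least 1 are exactly (1, 1, 2), (1, 2, 1) and (2, 1, 1).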

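TABLE 2 ALGORITHMIC ANALYSIS

| Types | Logic Levels | Area | Fan-out | Wire tracks |
|---|---|---|---|---|
| Brent-Kung | $2\log_2 n - 1$ | $2n - \log_2 n - 2$ | 2 | 1 |
| Kogge-Stone | $\log_2 n$ | $n\log_2 n - n + 1$ | 2 | $n/2$ |
| Ladner-Fischer | $\log_2 n + 1$ | $(n/4)\log_2 n + 3n/4 - 1$ | $n/4 + 1$ | 1 |
| Sklansky | $\log_2 n$ | $(n/2)\log_2 n$ | $n/2 + 1$ | 1 |
| Han-Carlson | $\log_2 n + 1$ | $(n/2)\log_2 n$ | 2 | $n/4$ |
| Harris | $\log_2 n + 1$ | $(n/2)\log_2 n$ | 3 | $n/8$ |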
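As a cross-check, the algorithmic figures can be evaluated for a given width. The sketch below encodes the expressions as they are derived in the preceding sections (for example $2\log_2 n - 1$ levels for Brent-Kung and $\log_2 n + 1$ for Han-Carlson and Harris); the wire-track entry $n/8$ for Harris is an assumption taken from the value $2^t = 2$ quoted for $n = 16$. The function name `table2` is illustrative.

```python
from typing import Dict, Tuple


def table2(n: int) -> Dict[str, Tuple[int, int, int, int]]:
    """Return (logic levels, area in cells, max fan-out, wire tracks) per adder type."""
    L = n.bit_length() - 1
    assert n == 1 << L and n >= 8, "n must be a power of two, at least 8"
    return {
        "Brent-Kung":     (2 * L - 1, 2 * n - L - 2,            2,          1),
        "Kogge-Stone":    (L,         n * L - n + 1,            2,          n // 2),
        "Ladner-Fischer": (L + 1,     (n * L + 3 * n) // 4 - 1, n // 4 + 1, 1),
        "Sklansky":       (L,         n * L // 2,               n // 2 + 1, 1),
        "Han-Carlson":    (L + 1,     n * L // 2,               2,          n // 4),
        "Harris":         (L + 1,     n * L // 2,               3,          n // 8),
    }
```

For $n = 16$ the areas come out as 26, 49, 27, 32, 32 and 32 cells, matching the counts worked out section by section.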

Figure 11 16-bit Harris Prefix Tree

METHODOLOGY

Design Specification: 16-bit full adder; algorithm to be adopted

HDL Entry: Verilog coding for each parallel-prefix-tree-based full adder

Functional Simulation: To check the adder functionality for each parallel-prefix-tree-based full adder

Synthesis: Converting a high-level description of the design into an optimized gate-level representation

Comparative analysis of synthesis results for various FPGA chips: Parameters are the no. of slices, LUTs & maximum path delay

Power Analysis for Spartan 6 FPGA chip: Static power comparison for various operating temperatures
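For the functional simulation step, a software reference model is a convenient source of expected outputs. The following Python sketch models a 16-bit Kogge-Stone adder and compares it with ordinary integer addition on random vectors. It is an illustrative golden model under stated assumptions (carry-in folded into bit 0, fixed random seed), not the Verilog test bench used in this work; `kogge_stone_add` and `self_check` are hypothetical names.

```python
import random
from typing import Tuple


def kogge_stone_add(a: int, b: int, cin: int = 0, width: int = 16) -> Tuple[int, int]:
    """Return (sum, carry_out) of a width-bit Kogge-Stone adder model."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # per-bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # per-bit generate
    g[0] |= p[0] & cin                                       # fold carry-in into bit 0
    G, P = g[:], p[:]
    d = 1
    while d < width:                                         # one pass per logic level
        G_next, P_next = G[:], P[:]
        for i in range(d, width):
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2
    carries = [cin] + G                                      # carries[i]: carry into bit i
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]


def self_check(trials: int = 1000, width: int = 16, seed: int = 0) -> bool:
    """Compare the model with integer addition on random operands."""
    rng = random.Random(seed)
    mask = (1 << width) - 1
    for _ in range(trials):
        a, b, cin = rng.getrandbits(width), rng.getrandbits(width), rng.getrandbits(1)
        expected = a + b + cin
        if kogge_stone_add(a, b, cin, width) != (expected & mask, expected >> width):
            return False
    return True
```

The same vectors and expected sums could be written to a file and replayed against the HDL simulation of each prefix-tree adder.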

DISCUSSION OF RESULT

TABLE 3 COMPARISON OF SLICE UTILIZATION, NO. OF LOGIC LEVELS REQUIRED & DELAY

TABLE 4 STATIC POWER COMPARISON AT VARIOUS TEMPERATURES

Device: Spartan 6 XC6SLX45; Temp Grade: C-Grade; Vccint = 1.2 V; Vccaux = 2.5 V

The delays observed for the adder designs in the synthesis reports from Xilinx ISE 13.2 are shown in the figure.

Fig 11 Simulation results for the adder designs (legend: Virtex 6 Low Power, Virtex 6)

The number of slices, the number of LUTs and the number of logic levels observed for the adder designs from the synthesis reports in Xilinx ISE 13.2 are compared and shown in Figure 12. The area of the adder designs, measured in terms of look-up tables (LUT) and input/output blocks (IOB) as reported by Xilinx ISE 13.2 for the Spartan 6 FPGA chip, is plotted in the figure. As per reference [1], the ISE software does not give the exact delay of the adders because it is not able to analyze the critical path over the adder [1]. From the comparison table, it is clear that, out of all the adders, the Harris adder has the least delay, while the KSA and BKA have about the same delay.

Fig Comparative chart for Slice utilization (series: Slices, LUT, No of Logic Level)

According to the synthesis reports, out of the four parallel prefix adders, the Harris adder has the better delay because it takes the fewest logic levels for compaction, whereas on the basis of delay and area (slice utilization or the number of LUTs required) Brent-Kung is the best on average.

CONCLUSIONS

From the analysis of delay, area and power, we conclude that, in our implementation, the delay of the RCA is improved by 650 when compared to Brent Kung [2], that of the KSA is improved by 253 when compared with [1], and that of the Harris adder is improved by 1880 in the Spartan 6 FPGA chip. So we can say that Harris is the best in terms of delay because it takes the fewest logic levels for compaction, whereas on the basis of delay and area (slice utilization or the number of LUTs required) Brent-Kung is the best on average. The power analysis report shows that almost all of the parallel prefix adders consume more or less the same power.

REFERENCES

[1] D. Harris, "A taxonomy of parallel prefix networks," in Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov. 2003, pp. 2213-2217.
[2] D. Goldberg, "What every computer scientist should know about floating point arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[3] J. Chen and J. E. Stine, "Optimization of bipartite memory systems for multiplicative divide and square root," 48th IEEE International Midwest Symp. Circuits and Systems, vol. 2, pp. 1458-1461, 2005.
[4] S. Winograd, "On the time required to perform addition," J. ACM, vol. 12, no. 2, pp. 277-285, 1965.
[5] R. K. Richards, Arithmetic Operations in Digital Computers. D. Van Nostrand Co., Princeton, NJ, 1955.
[6] A. Weinberger and J. Smith, "A logic for high-speed addition," National Bureau of Standards, no. Circulation 591, pp. 3-12, 1958.
[7] A. Tyagi, "A reduced area scheme for carry-select adders," IEEE Trans. Computers, vol. 42, no. 10, pp. 1163-1170, Oct. 1993.
[8] H. Ling, "High speed binary adder," IBM Journal of Research and Development, vol. 25, no. 3, pp. 156-166, 1981.
[9] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264, Mar. 1982.
[10] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[11] S. Knowles, "A family of adders," in Proc. 15th IEEE Symp. Comp. Arith., June 2001, pp. 277-281.
[12] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[13] R. Ladner and M. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[14] T. Han and D. Carlson, "Fast area-efficient VLSI adders," in Proc. 8th Symp. Comp. Arith., Sept. 1987, pp. 49-56.
[15] A. Naini, D. Bearden, and W. Anderson, "A 4.5ns 96b CMOS adder design," in Proc. IEEE Custom Integrated Circuits Conference, vol. 38, no. 8, Apr. 1965, pp. 114-117.
[16] T. Kilburn, D. B. G. Edwards, and D. Aspinall, "Parallel addition in digital computers: a new fast carry circuit," in Proc. IEE, vol. 106, pt. B, Sept. 1959, p. 464.
[17] N. Szabo and R. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill, 1967.
[18] W. K. Jenkins and B. J. Leon, "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. Circuits and Systems, vol. 24, no. 4, pp. 171-201, Apr. 1977.
[19] X. Lai and J. L. Massey, "A proposal for a new block encryption standard," in Advances in Cryptology - EUROCRYPT'90, Berlin, Germany: Springer-Verlag, 1990, pp. 389-404.
[20] S. S.-S. Yau and Y.-C. Liu, "Error correction in redundant residue number systems," IEEE Trans. Computers, vol. C-22, no. 1, pp. 5-11, Jan. 1973.
[21] F. Halsall, Data Communications, Computer Networks and Open Systems. Addison Wesley, 1996.
[22] C. Efstathiou, D. Nikolos, and J. Kalamatianos, "Area-time efficient modulo $2^n - 1$ adder design," IEEE Trans. Circuits and Systems II, vol. 41, no. 7, pp. 463-467, 1994.
[23] L. Kalamboukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo $2^n - 1$ adders," IEEE Trans. Computers, vol. 49, no. 7, special issue on computer arithmetic, pp. 673-680, July 2000.
[24] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Fast parallel-prefix modulo $2^n + 1$ adders," IEEE Trans. Computers, vol. 53, no. 9, pp. 1211-1216, Sept. 2004.
[25] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Modulo $2^n \pm 1$ adder design using select-prefix blocks," IEEE Trans. Computers, vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[26] V. Paliouras and T. Stouraitis, "Novel high-radix residue number system multipliers and adders," in Proc. 1999 IEEE Int'l Symp. Circuits and Systems VLSI (ISCAS '99), 1999, pp. 451-454.
[27] S. Bi, W. J. Gross, W. Wang, A. Al-khalili, and M. N. S. Swamy, "An area-reduced scheme for modulo $2^n - 1$ addition/subtraction," in Proc. 9th International Database Engineering & Application Symp., 2005, pp. 396-399.
[28] R. Zimmermann, "Efficient VLSI implementation of modulo ($2^n \pm 1$) addition and multiplication," in Proc. 14th IEEE Symp. Computer Arithmetic, 1999, pp. 158-167.
[29] J. Chen and J. E. Stine, "Enhancing parallel-prefix structures using carry-save notation," 51st Midwest Symp. Circuits and Systems, pp. 354-357, 2008.
[30] G. E. Moore, "Cramming more components onto integrated circuits," in Electronics, May 1965, pp. 2551-2554.
[31] M. Lehman and N. Burla, "Skip techniques for high-speed carry propagation in binary arithmetic units," IRE Trans. Electron. Comput., pp. 691-698, Dec. 1961.
[32] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-346, June 1962.
[33] J. Sklansky, "Conditional sum addition logic," IRE Trans. Electron. Comput., pp. 226-231, June 1960.
[34] V. G. Oklobdzija, B. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, "Energy delay estimation technique for high-performance microprocessor VLSI adders," Proc. 16th IEEE Symp. Computer Arithmetic (ARITH-16'03), p. 272, June 2003.
[35] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," PhD dissertation, ETH Dissertation 12480, Swiss Federal Institute of Technology, 1997.
[36] R. Zimmermann and H. Kaeslin, "Cell-based multilevel carry-increment adders with minimal AT- and PT-products."
[37] S. Majerski, "On determination of optimal distributions of carry skips in adders," IEEE Trans. Electron. Comput., pp. 45-58, Feb. 1967.
[38] J. E. Stine, Digital Computer Arithmetic Datapath Design Using Verilog HDL. Kluwer Academic, 2004.
[39] R. W. Doran, "Variants of an improved carry-look-ahead-sum adder," IEEE Trans. Computers, vol. 37, no. 9, pp. 1110-1113, 1988.


  • Types of adders
  • DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS
  • PARALLEL PREFIX ADDER STRUCTURE
  • KOGGE-STONE PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • SKLANSKY PREFIX TREE
  • LADNER-FISCHER PREFIX TREE
  • HAN-CARLSON PREFIX TREE
  • HARRIS PREFIX TREE
  • DISCUSSION OF RESULT
  • CONCLUSIONS
  • REFERENCES

The number of logic levels is $\log_2 n$. Each logic level has $n/2$ cells, as can be observed in Figure 7. The area is estimated as $(n/2)\log_2 n$. When $n = 16$, 32 cells are required.
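As a rough check of the Sklansky counts above, the following Python sketch builds the cell list level by level and confirms that every bit ends up with a group reaching bit 0. The function names and the (i, j) cell encoding are illustrative assumptions rather than anything taken from the paper, and n is assumed to be a power of two.

```python
def sklansky_levels(n):
    """Return the cells of an n-bit Sklansky prefix tree, one list per level.

    A cell (i, j) means: bit i merges its current group with the group
    whose most significant bit is j. n is assumed to be a power of two.
    """
    levels = []
    k = 0
    while (1 << k) < n:
        # At level k, every bit whose k-th index bit is 1 gets a cell fed
        # by the top bit of the 2**k-wide block just below it.
        levels.append([(i, ((i >> k) << k) - 1)
                       for i in range(n) if (i >> k) & 1])
        k += 1
    return levels


def covers_all_prefixes(n, levels):
    """Check that after all levels each bit i holds the group [i:0]."""
    low = list(range(n))  # low[i] = lowest bit covered by bit i's group
    for level in levels:
        nxt = low[:]
        for i, j in level:
            if low[i] != j + 1:  # the two merged groups must be adjacent
                return False
            nxt[i] = low[j]
        low = nxt
    return all(v == 0 for v in low)


levels16 = sklansky_levels(16)
print(len(levels16), [len(lv) for lv in levels16])
print(sum(len(lv) for lv in levels16), covers_all_prefixes(16, levels16))
```

For n = 16 the sketch yields 4 levels of 8 cells each, 32 cells in total, in line with the $(n/2)\log_2 n$ estimate.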

LADNER-FISCHER PREFIX TREE

The Sklansky prefix tree has the minimum number of logic levels and uses fewer cells than the Kogge-Stone and Knowles prefix trees. The major problem of the Sklansky prefix tree is its high fan-out. The Ladner-Fischer prefix tree is proposed to relieve this problem.

Figure 7. 16-bit Sklansky Prefix Tree

To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 8 shows a 16-bit example of a Ladner-Fischer prefix tree.

Figure 8. 11-bit Ladner-Fischer Prefix Tree Synthesis

The Ladner-Fischer prefix tree is a structure that sits between the Brent-Kung and Sklansky prefix trees. It can be observed in Figure 8 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, a fan-out of more than 2 is allowed (i.e., $f > 0$). Comparing the fan-out of the Ladner-Fischer and Sklansky structures, the number is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree.

A Ladner-Fischer prefix tree can be seen as a relaxed version of the Sklansky prefix tree. For a structure like Figure 8, an extra row of cells is required to generate the missing carries.

The delay for this type of Ladner-Fischer prefix tree is $\log_2 n + 1$ logic levels. The first and last logic levels take $n/2$ and $n/2 - 1$ cells, respectively. In between there are $\log_2 n - 1$ logic levels, each having $n/4$ cells. Summing up the cells gives

$$\frac{n}{2} + \frac{n}{2} - 1 + \frac{n}{4}\left(\log_2 n - 1\right) = \frac{n}{4}\log_2 n + \frac{3n}{4} - 1.$$

When $n = 16$, the total number of cells required is 27.
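The cell count above can be checked numerically. The short Python sketch below adds up the per-level counts given in the text and compares the result with the closed form; the function names are illustrative assumptions, and n is assumed to be a power of two with n >= 4.

```python
def ladner_fischer_cells_by_level(n):
    """Sum the cells level by level, following the counts in the text."""
    log_n = n.bit_length() - 1        # log2(n) for a power of two
    first = n // 2                    # cells in the first logic level
    last = n // 2 - 1                 # cells in the last logic level
    middle = (log_n - 1) * (n // 4)   # log2(n) - 1 levels of n/4 cells
    return first + last + middle


def ladner_fischer_cells_closed_form(n):
    """Closed form (n/4) * log2(n) + 3n/4 - 1."""
    log_n = n.bit_length() - 1
    return (n // 4) * log_n + (3 * n) // 4 - 1


for width in (4, 8, 16, 32, 64):
    assert (ladner_fischer_cells_by_level(width)
            == ladner_fischer_cells_closed_form(width))

print(ladner_fischer_cells_closed_form(16))  # 27
```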

HAN-CARLSON PREFIX TREE

The idea of the Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2, or $f = 0$. The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone. The cost is one extra logic level.

The Han-Carlson prefix tree can be viewed as a sparse version of the Kogge-Stone prefix tree. In fact, the fan-out at all logic levels is the same (i.e., 2). The pseudo-code for Kogge-Stone's structure can be easily modified to build a Han-Carlson prefix tree. The major difference is that in each logic level the Han-Carlson prefix tree places cells at every other bit, and the last logic level accounts for the missing carries. Figure 9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers. The critical path is shown with a thick solid line.

This type of Han-Carlson prefix tree has $\log_2 n + 1$ logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has $n/2$ cells. The area is estimated as $(n/2)\log_2 n$. When $n = 16$, the number is 32.

Figure 9. 16-bit Han-Carlson Prefix Tree
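The construction described in this section (a Kogge-Stone pattern restricted to every other bit, followed by one level that fills in the missing carries) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pseudo-code the text refers to: the function names and the (i, j) cell encoding are hypothetical, buffers are ignored, and n is assumed to be a power of two.

```python
def han_carlson_levels(n):
    """Return the cells of an n-bit Han-Carlson tree, one list per level.

    A cell (i, j) means: bit i merges its group with the group whose most
    significant bit is j.
    """
    # First level: every odd bit merges with its even neighbour below.
    levels = [[(i, i - 1) for i in range(1, n, 2)]]
    # Middle levels: Kogge-Stone steps of distance 2, 4, ... on odd bits only.
    d = 2
    while d < n:
        levels.append([(i, i - d) for i in range(d + 1, n, 2)])
        d *= 2
    # Last level: even bits pick up the carry from the odd bit below them.
    levels.append([(i, i - 1) for i in range(2, n, 2)])
    return levels


def covers_all_prefixes(n, levels):
    """Check that after all levels each bit i holds the group [i:0]."""
    low = list(range(n))  # low[i] = lowest bit covered by bit i's group
    for level in levels:
        nxt = low[:]
        for i, j in level:
            if low[i] != j + 1:  # the two merged groups must be adjacent
                return False
            nxt[i] = low[j]
        low = nxt
    return all(v == 0 for v in low)


hc16 = han_carlson_levels(16)
print(len(hc16), [len(lv) for lv in hc16], sum(len(lv) for lv in hc16))
print(covers_all_prefixes(16, hc16))
```

For n = 16 the sketch produces 5 levels holding 8, 7, 6, 4, and 7 cells, 32 cells in total, which agrees with the $\log_2 n + 1$ levels and $(n/2)\log_2 n$ area quoted above.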

HARRIS PREFIX TREE

Harris's idea for prefix trees is to balance the logic levels, fan-out, and wire tracks. Harris proposed a cube to show the taxonomy of prefix trees, shown in Figure 10, which illustrates the idea for 16-bit prefix trees [1]. In the figure, all the prefix trees mentioned above are on the cube, with the Sklansky prefix tree standing at the fan-out extreme, Brent-Kung at the logic-levels extreme, and Kogge-Stone at the wire-track extreme.

The balanced prefix structure is close to the center of the cube (i.e., when $n = 16$, $l = 1$, $f = 1$, and $t = 1$, or represented in short by (1, 1, 1)). The number of logic levels is $\log_2 16 + l = 4 + 1 = 5$, the maximum fan-out is $2^f + 1 = 3$, and the number of wire tracks is $2^t = 2$. The diagram is shown in Figure 11 with the critical path in solid line. It can be observed that there is bit overlap in logic level 4, similar to Knowles [2, 1, 1, 1]. The overlap is valid for producing correct carries, as has been proven for Knowles [2, 1, 1, 1].

Figure 10. Taxonomy of 16-bit Prefix Tree (Adapted from [1])

For $n > 16$, (1, 1, 1) will not be sufficient to build a prefix tree. More logic levels, fan-out, or wire tracks need to be added. For example, when $n = 32$, the prefix tree can be in the form of (1, 1, 2), (1, 2, 1), or (2, 1, 1). Like the Ladner-Fischer and Han-Carlson prefix trees illustrated in the previous sections, the Harris prefix tree has $\log_2 n + 1$ logic levels. It needs the same number of cells as the Han-Carlson and Sklansky prefix trees, which is $(n/2)\log_2 n$.
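To make the $(l, f, t)$ notation concrete, the Python sketch below evaluates the three quantities for a given point on the cube. The helper name and the returned dictionary are illustrative assumptions. The plane check $l + f + t = \log_2 n - 1$ is included as an assumption based on the taxonomy in [1]; it is consistent with the triples quoted above for $n = 16$ and $n = 32$.

```python
def harris_parameters(n, l, f, t):
    """Evaluate a point (l, f, t) of the prefix-tree taxonomy for width n.

    n is assumed to be a power of two.
    """
    log_n = n.bit_length() - 1
    return {
        "logic_levels": log_n + l,       # log2(n) + l
        "max_fanout": 2 ** f + 1,        # 2^f + 1
        "wire_tracks": 2 ** t,           # 2^t
        # Assumed constraint from the taxonomy: l + f + t = log2(n) - 1.
        "on_taxonomy_plane": l + f + t == log_n - 1,
    }


balanced16 = harris_parameters(16, 1, 1, 1)
print(balanced16)

for point in [(1, 1, 2), (1, 2, 1), (2, 1, 1)]:
    print(point, harris_parameters(32, *point))
```

For the balanced 16-bit point (1, 1, 1) this gives 5 logic levels, a maximum fan-out of 3, and 2 wire tracks, matching the values stated in the text.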

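TABLE 2. ALGORITHMIC ANALYSIS

| Type | Logic levels | Area | Fan-out | Wire tracks |
|---|---|---|---|---|
| Brent-Kung | $2\log_2 n$ | $2n - \log_2 n - 2$ | 2 | 1 |
| Kogge-Stone | $\log_2 n$ | $n\log_2 n - n + 1$ | 2 | $n/2$ |
| Ladner-Fischer | $\log_2 n + 1$ | $(n/4)\log_2 n + 3n/4 - 1$ | $n/4 + 1$ | 1 |
| Sklansky | $\log_2 n$ | $(n/2)\log_2 n$ | $n/2 + 1$ | 1 |
| Han-Carlson | $\log_2 n + 1$ | $(n/2)\log_2 n$ | 2 | $n/4$ |
| Harris | $\log_2 n + 1$ | $(n/2)\log_2 n$ | 3 | $n/8$ |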

Figure 11. 16-bit Harris Prefix Tree

METHODOLOGY

Design Specification: 16-bit full adder; algorithm to be adopted

HDL Entry: Verilog coding for each parallel-prefix-tree-based full adder

Functional Simulation: checking the adder functionality for each parallel-prefix-tree-based full adder

Synthesis: converting a high-level description of the design into an optimized gate-level representation

Comparative analysis of synthesis results for various FPGA chips. Parameters: number of slices, number of LUTs, and maximum path delay

Power Analysis for the Spartan 6 FPGA chip: static power comparison at various operating temperatures
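The functional-simulation step in the flow above can be mirrored in software with a bit-level reference model. The Python sketch below is an illustrative assumption, not the authors' Verilog: it models a 16-bit Kogge-Stone adder with a carry-in of 0, using the three stages of a parallel-prefix adder (pre-processing, prefix tree, post-processing), and compares it against ordinary integer addition. The names `kogge_stone_add` and `WIDTH` are hypothetical.

```python
import random

WIDTH = 16  # adder width used in the design specification


def kogge_stone_add(a, b, width=WIDTH):
    """Bit-level model of a Kogge-Stone adder; returns (sum, carry_out)."""
    # Pre-processing: per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Prefix tree: merge groups at distances 1, 2, 4, ...
    grp_g, grp_p = g[:], p[:]
    d = 1
    while d < width:
        new_g, new_p = grp_g[:], grp_p[:]
        for i in range(d, width):
            new_g[i] = grp_g[i] | (grp_p[i] & grp_g[i - d])
            new_p[i] = grp_p[i] & grp_p[i - d]
        grp_g, grp_p = new_g, new_p
        d *= 2

    # Post-processing: the carry into bit i is the group generate of [i-1:0].
    total = 0
    for i in range(width):
        carry_in = grp_g[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, grp_g[width - 1]


def check(a, b, width=WIDTH):
    """Return True if the model agrees with integer addition."""
    s, c = kogge_stone_add(a, b, width)
    return s + (c << width) == a + b


rng = random.Random(0)  # fixed seed so the run is repeatable
vectors = [(0, 0), (0xFFFF, 1), (0xFFFF, 0xFFFF), (0x1234, 0xEDCB)]
vectors += [(rng.randrange(1 << WIDTH), rng.randrange(1 << WIDTH))
            for _ in range(1000)]
print(all(check(a, b) for a, b in vectors))
```

The same test vectors could be applied to an HDL testbench, with the model's outputs serving as expected values; the other prefix trees differ only in the middle stage.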

DISCUSSION OF RESULT

TABLE 3. COMPARISON OF SLICE UTILIZATION, NO. OF LOGIC LEVELS REQUIRED & DELAY

TABLE 4. STATIC POWER COMPARISON AT VARIOUS TEMPERATURES

Device: Spartan 6 XC6SLX45; Temp. Grade: C-Grade; Vccint = 1.2 V; Vccaux = 2.5 V

The delays observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are shown in the figure.

Fig. 11. Simulation results for the adder designs (chart legend: Virtex 6 Low Power, Virtex 6)

The number of slices, LUTs, and logic levels observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are compared and shown in Figure 12. The area of the adder designs, measured in terms of look-up tables (LUTs) and input/output blocks (IOBs) reported by Xilinx ISE 13.2 for the Spartan 6 FPGA chip, is plotted in the figure.

Fig. 12. Comparative chart for slice utilization (chart series: Slices, LUT, No. of Logic Levels)

As per reference [1], the ISE software does not give the exact delay of the adders because it is not able to analyze the critical path over the adder [1]. From the comparison table it is clear that, of all the adders, the Harris adder has the lowest delay, while the KSA and BKA have about the same delay. According to the synthesis reports, of the four parallel-prefix adders the Harris adder has the better delay because it takes the fewest logic levels for compaction, whereas on the basis of delay and area (slice utilization or the number of LUTs required) Brent-Kung is the best on average.

CONCLUSIONS

From the analysis of delay, area, and power, we conclude that the efficiency of our delay is improved by 650 for the RCA when compared to Brent-Kung [2], by 253 for the KSA when compared with [1], and by 1880 for the Harris adder in the Spartan 6 FPGA chip. So we can say that Harris is the best because it takes the fewest logic levels for compaction, whereas on the basis of delay and area (slice utilization or the number of LUTs required) Brent-Kung is the best on average. The power analysis report shows that almost all of the parallel-prefix adders consume more or less the same power.


  • Types of adders
  • DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS
  • PARALLEL PREFIX ADDER STRUCTURE
  • KOGGE-STONE PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • SKLANSKY PREFIX TREE
  • LADNER-FISCHER PREFIX TREE
  • HAN-CARLSON PREFIX TREE
  • HARRIS PREFIX TREE
  • DISCUSSION OF RESULT
  • CONCLUSIONS
  • REFERENCES

Carlson prefix tree ignoring the buffersThe critical path is shown with thick solidline

This type of Han-Carlson prefix tree has log2n+ 1 logic levels It happens to have the samenumber cells as Sklansky prefix tree sincethe cells in the extra logic level can bemove up to make the each of the previouslogic levels all have n=2 cells The area isestimated as (n=2) log2n When n = 16 thenumber is 32

Figure 9 16-bit Han-Carlson Prefix Tree

HARRIS PREFIX TREEThe idea from Harris about prefix tree is totry to balance the logic levels fan-out andwire tracks Harris proposed a cube to showthe taxonomy for prefix trees in Figure 10which illustrates the idea for 16-bit prefixtrees [1] In the figure all the prefixtrees mentioned above are on the cube withSklansky prefix tree standing at the fan-outextreme Brent-Kung at the logic levelsextreme and Kogge-Stone at the wire trackextreme

The balanced prefix structure is close tothe center of cube (ie when n = 16 l = 1f = 1 and t = 1 or represented in short by(1 1 1)) The logic levels is 24 + 1 = 5maximum fan-out is 2f + 1 = 3 and wire trackis 2t = 2 The diagram is shown in Figure 10with critical path in solid line Observationcan be made that there is bit overlap inlogic level 4 similar to Knowles [2 1 11] The overlap is valid for producingcorrect carries as it has be proven forKnowles [2 1 1 1]

Figure10 Taxonomy of 16-bit Prefix Tree(Adapted from [1])

For n ge 16 (1 1 1) will not be sufficientto build a prefix tree More logic levels orfan-out or wire tracks need to be added Forexample when n = 32 the prefix tree can bein the form of(1 1 2) (1 2 1) and (2 1 1) LikeLadner-Fischer and Han-Carlson prefix treeillustrated in the previous sections Harrisprefix tree has log2 (n )+1 logic levels Itneeds the same number of cells required forHan-Carlson and Sklansky prefix tree which

isn2log2 (n)

TABLE 2 ALGORITHMIC ANALYSIS

Types LogicLevels

Area Fan-out

Wiretracks

Brent-Kung

2log2 (n)2nminuslog2nminus2 2 1

Kogge-Stone

l0g2n nlog2nminusn+1 2 n2

Ladner-

Fischer

log2n+1

n4log2n+3n

4+1n4

+11

Sklansky

log2n (n2 )log2nn2

+11

Han-Carlso

n

log2n (n2 )log2n2 n

4

Harris log2n (n2 )log2n3 n 6

Figure 11 16-bit Harris Prefix Tree

METHEDOLOGY

Design Specification16 bit Full adderAlgorithm to be adopted

HDL EntryVerilog coding for each parallel prefix tree based full adder

Functional SimulationTo check the adder functionality for each parallel prefix tree based full adder

Synthesis Converting a high-level description of design into an optimized gate-level representation

Comparative analysis of synthesis results for various FPGA chipsParameters No of SliceLUT amp maximum path delay

Power Analysis for Spartan 6 FPGA chipStatic power comparison for various operating temperature

DISCUSSION OF RESULT

TABLE 3 COMPARISON OF SLICE UTILIZATION NOOF LOGIC LEVELS REQUIRED amp DELAY

TABLE 4 STATIC POWER COMPARISON ATVARIOUS TEMPERATURES

Device Spartan 6 XC6SLX45 Temp GradeC-Grade Vccint= 12V Vccaux= 25v

The delay observed for adder design fromsynthesis reports in Xilinx ISE 132

synthesis reports are shown in figure

05

1015202530354045

Virtex 6 Low Power

Virtex 6

Fig 11 Simulation results for the adderdesigns

The no of slices LUT and no of logic levelobserved for adder designs from synthesisreports in Xilinx ISE 132 are compared andshown in figure12The area of the adder designs is measured interms of look up tables (LUT) and inputoutput blocks (IOB) taken for Xilinx ISE 132in Spartan 6 FPGA chip is plotted in thefigureAs per reference [1] ISE software doesnrsquotgive exact delay of the adders because it isnot able to analyze the critical path over

the adder [1] From the comparison table itis clear that Out of all adders Harris adderhas less delay KSA adder and BKA

0

10

20

30

40

50Slices LUTNo of Logic Level

Fig Comparative chart for Slice utilization

have about the same delay According to thesynthesis reports out of four parallelprefix adders Harris adder has better delaybecause of taking least logic level forcompaction where as on the basis of delay andarea (slice utilization or the no of LUTrequired) Brent Kung is the best on anaverage

CONCLUSIONS

From the study of analysis done on delayarea and power we have concluded that theefficiency is improved by 650 in oursdelay for RCA when compared to Brent Kung[2] and for KSA it is improved by 253 when compared with [1] and for Harris adderit is improved by 1880 in Spartan 6 FPGAchip So we can say that Harris is the bestbecause of taking least logic level forcompaction where as on the basis of delay andarea (slice utilization or the no of LUTrequired) Brent Kung is the best on anaverage where as power analysis report showsthat almost all of the parallel prefix adderstakes more or less the same power

REFERENCES

[1] D Harris A taxonomy of parallel prefixnetworks in Record of the Thirty-Seventh AsilomarConference on Signals Systems and Computers Nov2003 pp 22132217[2] D Goldberg What every computerscientist should know about floating point

arithmetic ACM Computing surveys vol 23 no1 pp 548 1991[3] J Chen and J E Stine Optimization ofbipartite memory systems for multiplicativedivide and square root 48th IEEE InternationalMidwest Symp Circuits and Systems vol 2 pp14581461 2005[4] S Winograd On the time required toperform addition J ACM vol 12 no 2 pp277285 1965[5] R K Richards Arithmetic Operations in DigitalComputers D Van Nostrand Co PrincetonNJ 1955[6] A Weinberger and J Smith A logic forhigh-speed addition National Bureau of Standardsno Circulation 591 pp 312 1958[7] A Tyagi A reduced area scheme forcarry-select adders IEEE Trans Computers vol42 no 10 pp 11631170 Oct 1993[8] H Ling High speed binary adder IBMJournal of Research and Development vol 25 no 3pp 156166 1981[9] R P Brent and H T Kung A regularlayout for parallel adders IEEE Trans Computersvol C-31 no 3 pp 260264 Mar 1982[10]P Kogge and H Stone A parallel algorithmfor the efficient solution of a general classof recurrence relations IEEE Trans Computersvol C-22 no 8 pp 786793 Aug 1973[11] S Knowles A family of adders in Proc15th IEEE Symp Comp Arith June 2001 pp277281[12] J Sklansky Conditional-sum additionlogic IRE Trans Electronic Computers vol EC-9pp 226231 June 1960[13] R Ladner and M Fischer Parallelprefix Computation J ACM vol 27 no 4 pp831838 Oct 1980[14] T Han and D Carlson Fast area-efficient VLSI Adders in Proc 8th Symp CompArith Sept 1987 pp 4956[15] A Naini D Bearden and WAnderson A 45ns 96b CMOS adder design inProc IEEE Custom Integrate Circuits Conference vol 38no 8 Apr 1965 pp 114117[16] T Kilburn D B G Edwards and D Aspin all Parallel addition in digitalcomputers a new fast carry circuit in ProcIEE vol 106 pt B Sept 1959 p 464[17] N Szabo and R Tanaka Residue Arithmeticand Its Applications to Computer Technology McGraw-Hill 1967[18] W K Jenkins and B J Leon The useof residue number systems in the design offinite impulse response digital filters IEEE

Trans Circuits and Systems vol 24 no 4 pp171201 Apr 1977[19] X Lai and J L Massey A proposal fora new block encryption standard in Advances inCryptology - EUROCRYPTrsquo90 Berlin Germany Springer-Verilag 1990 pp 389404[20] S S-S Yau and Y-C Liu Errorcorrection in redundant residue numbersystems IEEE Trans Computers vol C-22 no 1pp 511 Jan 1973[21] F Halsall Data Communications ComputerNetworks and Open Systems Addison Wesley 1996[22] C Efstathiou D Nikolos and JKalamatianos Area-time efficient modulo2nminusiquest1 adder design IEEE Trans Circuits and System-II vol 41 no 7 pp 463467 1994[23] L Kalamboukas D Nikolos CEfstathiou H T Vergos and Jkalamatianos High-speed parallel-prefixmodulo 2n 1048576 1 adders IEEE Trans Computers vol49 no 7 special is sure on computerarithmetic pp 673680 July 2000[24] C Efstathiou H T Vergos and DNikolos Fast parallel-prefix modulo 2n + 1adders IEEE Trans Computers vol 53 no 9pp 12111216 Sept 2004[25] H T Vergos C Efstathiou and DNikolos Modulo 2n _ 1 adder design usingselect-prefix blocks IEEE Trans Computers vol52 no 11 pp 13991406 Nov 2003[26] V Paliouras and T Stouraitis Novelhigh-radix residue number system multipliersand adders in Proc 1999 IEEE Intrsquol Symp Circuits andSystems VLSI (ISCAS rsquo99) 1999 pp 451454[27] S Bi W J Gross W Wang A Al-khalili and M N S Swamy An area-reducedscheme for modulo 2nminusiquest1additionsubtraction in Proc 9th InternationalDatabase Engineering amp Application Symp 2005 pp396399[28] R Zimmermann Efficient VLSIimplementation of modulo (2nplusmn 1) addition and

multiplication in Proc 14th IEEE Symp ComputerArithmetic 1999 pp p158167[29] J Chen and J E Stine Enhancingparallel-prefix structures using carry-savenotation 51st Midwest Symp Circuits and Systems pp354357 2008[30] G E Moore Cramming more componentsonto integrated circuits in Electronics May1965 pp 25512554[31] M Lehman and N Burla Skip techniquesfor high-speed carry propagation in binaryarithmetic units IRE Trans Electron Comput pp691698 Dec 1961[32] O J Bedrij Carry-select adder IRETrans Electron Comput pp 340346 June 1962[33] J Sklansky Conditional sum additionlogic IRE Trans Electron Comput pp 226 231June 1960[34] V G Oklobdzija B Zeydel H Dao SMathew and R Krishnamurthy Energy delayestimation technique for high-performancemicroprocessor VSLI adders Proc 16th IEEE SympComputer Arithmetic (ARITH-16rsquo03) p 272 June 2003[35] R Zimmermann Binary adderarchitectures for cell-based VLSI and theirsynthesis PhD dissertation ETHDissertation 12480 Swiss Federal Instituteof Technology 1997[36] R Zimmermann and H Kaeslin Cell-based multilevel carry-increment adders withminimal AT- and PT-products[37] S Majerski On determination ofoptimal distributions of carry skips inaddersIEEE Trans Electron Comput pp 4558 Feb1967[38] J E Stine Digital Computer Arithmetic Datapath Design Using Verilog HDL Kluwer Academic2004[39] R W Doran Variants of an improvedcarry-look-ahead-sum adder IEEE TransComputers vol 37 no 9 pp 111011131988

  • Types of adders
  • DIFFERENCE BETWEEN PARALLEL-PREFIX ADDERS AND OTHERS
  • PARALLEL PREFIX ADDER STRUCTURE
  • KOGGE-STONE PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • BRENT-KUNG PREFIX TREE
  • SKLANSKY PREFIX TREE
  • LADNER-FISCHER PREFIX TREE
  • HAN-CARLSON PREFIX TREE
  • HARRIS PREFIX TREE
  • DISCUSSION OF RESULT
  • CONCLUSIONS
  • REFERENCES

Ladner-

Fischer

log2n+1

n4log2n+3n

4+1n4

+11

Sklansky

log2n (n2 )log2nn2

+11

Han-Carlso

n

log2n (n2 )log2n2 n

4

Harris log2n (n2 )log2n3 n 6

Figure 11 16-bit Harris Prefix Tree

METHEDOLOGY

Design Specification16 bit Full adderAlgorithm to be adopted

HDL EntryVerilog coding for each parallel prefix tree based full adder

Functional SimulationTo check the adder functionality for each parallel prefix tree based full adder

Synthesis Converting a high-level description of design into an optimized gate-level representation

Comparative analysis of synthesis results for various FPGA chipsParameters No of SliceLUT amp maximum path delay

Power Analysis for Spartan 6 FPGA chipStatic power comparison for various operating temperature

DISCUSSION OF RESULT

TABLE 3 COMPARISON OF SLICE UTILIZATION NOOF LOGIC LEVELS REQUIRED amp DELAY

TABLE 4 STATIC POWER COMPARISON ATVARIOUS TEMPERATURES

Device Spartan 6 XC6SLX45 Temp GradeC-Grade Vccint= 12V Vccaux= 25v

The delay observed for adder design fromsynthesis reports in Xilinx ISE 132

synthesis reports are shown in figure

05

1015202530354045

Virtex 6 Low Power

Virtex 6

Fig 11 Simulation results for the adderdesigns

The no of slices LUT and no of logic levelobserved for adder designs from synthesisreports in Xilinx ISE 132 are compared andshown in figure12The area of the adder designs is measured interms of look up tables (LUT) and inputoutput blocks (IOB) taken for Xilinx ISE 132in Spartan 6 FPGA chip is plotted in thefigureAs per reference [1] ISE software doesnrsquotgive exact delay of the adders because it isnot able to analyze the critical path over

the adder [1] From the comparison table itis clear that Out of all adders Harris adderhas less delay KSA adder and BKA

0

10

20

30

40

50Slices LUTNo of Logic Level

Fig Comparative chart for Slice utilization

have about the same delay According to thesynthesis reports out of four parallelprefix adders Harris adder has better delaybecause of taking least logic level forcompaction where as on the basis of delay andarea (slice utilization or the no of LUTrequired) Brent Kung is the best on anaverage

CONCLUSIONS

From the study of analysis done on delayarea and power we have concluded that theefficiency is improved by 650 in oursdelay for RCA when compared to Brent Kung[2] and for KSA it is improved by 253 when compared with [1] and for Harris adderit is improved by 1880 in Spartan 6 FPGAchip So we can say that Harris is the bestbecause of taking least logic level forcompaction where as on the basis of delay andarea (slice utilization or the no of LUTrequired) Brent Kung is the best on anaverage where as power analysis report showsthat almost all of the parallel prefix adderstakes more or less the same power

REFERENCES

[1] D Harris A taxonomy of parallel prefixnetworks in Record of the Thirty-Seventh AsilomarConference on Signals Systems and Computers Nov2003 pp 22132217[2] D Goldberg What every computerscientist should know about floating point

DISCUSSION OF RESULT

TABLE 3. COMPARISON OF SLICE UTILIZATION, NO. OF LOGIC LEVELS REQUIRED & DELAY

TABLE 4. STATIC POWER COMPARISON AT VARIOUS TEMPERATURES

Device: Spartan 6 XC6SLX45; Temp Grade: C-Grade; Vccint = 1.2 V; Vccaux = 2.5 V

The delays observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are shown in the figure.

Fig 11. Simulation results for the adder designs [chart; legend: Virtex 6 Low Power, Virtex 6; axis range 0 to 45]

The number of slices, the number of LUTs and the number of logic levels observed for the adder designs in the Xilinx ISE 13.2 synthesis reports are compared in Fig 12. The area of the adder designs, measured in terms of look-up tables (LUTs) and input/output blocks (IOBs) for the Spartan 6 FPGA chip in Xilinx ISE 13.2, is plotted in the figure. According to [1], the ISE software does not give the exact delay of the adders because it is not able to analyze the critical path over the adder. From the comparison table it is clear that, of all the adders, the Harris adder has the least delay, while the KSA and the BKA have about the same delay. According to the synthesis reports, of the four parallel prefix adders the Harris adder has the better delay because it takes the fewest logic levels for computation, whereas on the basis of delay and area together (slice utilization or the number of LUTs required) the Brent-Kung adder is the best on average.

Fig. Comparative chart for slice utilization [chart; series: Slices, LUT, No. of logic levels; axis range 0 to 50]
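The link between prefix-tree depth and the logic levels discussed above can be illustrated with a small software model. The following is only a sketch of a Kogge-Stone prefix tree in Python, not the Verilog used for the measurements. The function name kogge_stone_add and its parameters are hypothetical, the carry-in is assumed to be 0, and the level count is the number of prefix stages in the model, not the LUT logic levels reported by the synthesis tool.

```python
def kogge_stone_add(a, b, width=16):
    """Software model of a Kogge-Stone adder (illustrative, carry-in assumed 0).

    Returns (sum, carry_out, prefix_levels).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Pre-computation: per-bit generate and propagate signals.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Prefix tree: at each level, bit i combines with bit i - dist.
    G, P = g[:], p[:]
    levels = 0
    dist = 1
    while dist < width:
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        levels += 1

    # Post-computation: sum bit i is p[i] XOR the carry into bit i,
    # and the carry into bit i is the group generate of bits i-1..0.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i

    return total, G[width - 1], levels
```

For a 16-bit width this model uses 4 prefix levels, which is the logarithmic depth that makes carry-tree adders attractive in VLSI. On an FPGA the mapped logic levels and routing can differ from this count, which is why the measured delays have to be taken from the synthesis reports.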

CONCLUSIONS

From the analysis of delay, area and power, we conclude that, on the Spartan 6 FPGA chip, the delay efficiency of our RCA is improved by 650 when compared to Brent Kung [2], that of the KSA is improved by 253 when compared with [1], and that of the Harris adder is improved by 1880. We can therefore say that the Harris adder is the best in terms of delay because it takes the fewest logic levels for computation, whereas on the basis of delay and area together (slice utilization or the number of LUTs required) the Brent-Kung adder is the best on average. The power analysis report shows that almost all of the parallel prefix adders consume more or less the same power.

REFERENCES

[1] D. Harris, "A taxonomy of parallel prefix networks," in Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov. 2003, pp. 2213-2217.
[2] D. Goldberg, "What every computer scientist should know about floating point arithmetic," ACM Computing Surveys, vol. 23, no. 1, pp. 5-48, 1991.
[3] J. Chen and J. E. Stine, "Optimization of bipartite memory systems for multiplicative divide and square root," 48th IEEE International Midwest Symp. Circuits and Systems, vol. 2, pp. 1458-1461, 2005.
[4] S. Winograd, "On the time required to perform addition," J. ACM, vol. 12, no. 2, pp. 277-285, 1965.
[5] R. K. Richards, Arithmetic Operations in Digital Computers. D. Van Nostrand Co., Princeton, NJ, 1955.
[6] A. Weinberger and J. Smith, "A logic for high-speed addition," National Bureau of Standards, no. Circulation 591, pp. 3-12, 1958.
[7] A. Tyagi, "A reduced area scheme for carry-select adders," IEEE Trans. Computers, vol. 42, no. 10, pp. 1163-1170, Oct. 1993.
[8] H. Ling, "High speed binary adder," IBM Journal of Research and Development, vol. 25, no. 3, pp. 156-166, 1981.
[9] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Computers, vol. C-31, no. 3, pp. 260-264, Mar. 1982.
[10] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.
[11] S. Knowles, "A family of adders," in Proc. 15th IEEE Symp. Comp. Arith., June 2001, pp. 277-281.
[12] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[13] R. Ladner and M. Fischer, "Parallel prefix computation," J. ACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
[14] T. Han and D. Carlson, "Fast area-efficient VLSI adders," in Proc. 8th Symp. Comp. Arith., Sept. 1987, pp. 49-56.
[15] A. Naini, D. Bearden, and W. Anderson, "A 4.5ns 96b CMOS adder design," in Proc. IEEE Custom Integrated Circuits Conference, vol. 38, no. 8, Apr. 1965, pp. 114-117.
[16] T. Kilburn, D. B. G. Edwards, and D. Aspinall, "Parallel addition in digital computers: a new fast carry circuit," in Proc. IEE, vol. 106, pt. B, Sept. 1959, p. 464.
[17] N. Szabo and R. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill, 1967.
[18] W. K. Jenkins and B. J. Leon, "The use of residue number systems in the design of finite impulse response digital filters," IEEE Trans. Circuits and Systems, vol. 24, no. 4, pp. 171-201, Apr. 1977.
[19] X. Lai and J. L. Massey, "A proposal for a new block encryption standard," in Advances in Cryptology - EUROCRYPT'90, Berlin, Germany: Springer-Verlag, 1990, pp. 389-404.
[20] S. S.-S. Yau and Y.-C. Liu, "Error correction in redundant residue number systems," IEEE Trans. Computers, vol. C-22, no. 1, pp. 5-11, Jan. 1973.
[21] F. Halsall, Data Communications, Computer Networks and Open Systems. Addison Wesley, 1996.
[22] C. Efstathiou, D. Nikolos, and J. Kalamatianos, "Area-time efficient modulo 2^n - 1 adder design," IEEE Trans. Circuits and Systems-II, vol. 41, no. 7, pp. 463-467, 1994.
[23] L. Kalamboukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, "High-speed parallel-prefix modulo 2^n - 1 adders," IEEE Trans. Computers, vol. 49, no. 7, special issue on computer arithmetic, pp. 673-680, July 2000.
[24] C. Efstathiou, H. T. Vergos, and D. Nikolos, "Fast parallel-prefix modulo 2^n + 1 adders," IEEE Trans. Computers, vol. 53, no. 9, pp. 1211-1216, Sept. 2004.
[25] H. T. Vergos, C. Efstathiou, and D. Nikolos, "Modulo 2^n ± 1 adder design using select-prefix blocks," IEEE Trans. Computers, vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[26] V. Paliouras and T. Stouraitis, "Novel high-radix residue number system multipliers and adders," in Proc. 1999 IEEE Int'l Symp. Circuits and Systems VLSI (ISCAS '99), 1999, pp. 451-454.
[27] S. Bi, W. J. Gross, W. Wang, A. Al-khalili, and M. N. S. Swamy, "An area-reduced scheme for modulo 2^n - 1 addition/subtraction," in Proc. 9th International Database Engineering & Application Symp., 2005, pp. 396-399.
[28] R. Zimmermann, "Efficient VLSI implementation of modulo (2^n ± 1) addition and multiplication," in Proc. 14th IEEE Symp. Computer Arithmetic, 1999, pp. 158-167.
[29] J. Chen and J. E. Stine, "Enhancing parallel-prefix structures using carry-save notation," 51st Midwest Symp. Circuits and Systems, pp. 354-357, 2008.
[30] G. E. Moore, "Cramming more components onto integrated circuits," in Electronics, May 1965, pp. 2551-2554.
[31] M. Lehman and N. Burla, "Skip techniques for high-speed carry propagation in binary arithmetic units," IRE Trans. Electron. Comput., pp. 691-698, Dec. 1961.
[32] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340-346, June 1962.
[33] J. Sklansky, "Conditional sum addition logic," IRE Trans. Electron. Comput., pp. 226-231, June 1960.
[34] V. G. Oklobdzija, B. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, "Energy delay estimation technique for high-performance microprocessor VLSI adders," Proc. 16th IEEE Symp. Computer Arithmetic (ARITH-16'03), p. 272, June 2003.
[35] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," PhD dissertation, ETH Dissertation 12480, Swiss Federal Institute of Technology, 1997.
[36] R. Zimmermann and H. Kaeslin, "Cell-based multilevel carry-increment adders with minimal AT- and PT-products."
[37] S. Majerski, "On determination of optimal distributions of carry skips in adders," IEEE Trans. Electron. Comput., pp. 45-58, Feb. 1967.
[38] J. E. Stine, Digital Computer Arithmetic Datapath Design Using Verilog HDL. Kluwer Academic, 2004.
[39] R. W. Doran, "Variants of an improved carry-look-ahead-sum adder," IEEE Trans. Computers, vol. 37, no. 9, pp. 1110-1113, 1988.
