Full-Swing Gate Diffusion Input logic: Case-study of low-power CLA adder design

Arkadiy Morgenshtein, Viacheslav Yuzhaninov, Alexey Kovshilovsky, Alexander Fish*
Faculty of Engineering, Bar-Ilan University, Ramat-Gan 52900, Israel

INTEGRATION, the VLSI journal (2013), http://dx.doi.org/10.1016/j.vlsi.2013.04.002
Journal homepage: www.elsevier.com/locate/vlsi
© 2013 Elsevier B.V. All rights reserved.
* Corresponding author (A. Fish). Tel.: +972 54 8044144; fax: +972 3 7384051.

Article history: Received 15 July 2012; received in revised form 8 February 2013; accepted 24 April 2013.

Keywords: Alternative logic family; Carry Look Ahead (CLA) adder; Full-Swing GDI; Gate Diffusion Input (GDI); Low power

Abstract

Full Swing Gate Diffusion Input (FS-GDI) methodology is presented. The proposed methodology is applied to a 40 nm Carry Look Ahead Adder (CLA). The CLA is implemented mainly using GDI full-swing F1 and F2 gates, which are the counterparts of standard CMOS NAND and NOR gates. A 16-bit GDI CLA was designed in a 40 nm low power TSMC process. The CLA, implemented according to the proposed methodology, presents full functionality and robustness under global and local process variations at a wide range of supply voltages. Simulation results show 2× area reduction, 5× improvement in dynamic energy dissipation and 4× decrease in leakage, with a slight (24%) degradation in performance, when compared to the CMOS CLA. Advanced design metrics of GDI cells, such as minimum energy point (MEP) operation and minimum leakage vector (MLV), are discussed.

1. Introduction

Power consumption and area reduction of logic and memory have become primary focuses of attention in VLSI digital design [1–6]. Power is the limiting factor in both high performance systems and portable applications. Die area directly affects the device size and cost. Since the introduction of standard CMOS logic in the early 80s, many design solutions have been proposed to improve the power dissipation, area and performance of digital VLSI chips.

Gate Diffusion Input (GDI) design methodology was introduced as a promising alternative to Static CMOS Logic [7]. Originally proposed for fabrication in Silicon on Insulator (SOI) and twin-well CMOS processes, GDI methodology allowed implementation of a wide range of complex logic functions using only two transistors [7]. It was shown that the area and dynamic power of GDI combinatorial and sequential logic were significantly reduced, as compared to standard CMOS implementations. Similarly to existing alternatives to CMOS, such as Pass Transistor Logic (PTL), the GDI gates presented reduced voltage swing at their outputs due to threshold drops. These drops usually cause degradation in performance and increased short circuit power [8]. However, since the GDI circuits were implemented with far fewer transistors, a significant overall power reduction was observed, while maintaining a minimal performance penalty.

Recently, it was shown that any GDI circuit can be implemented in a standard CMOS process [8]. The efficiency of the GDI method for both combinatorial and sequential logic was shown by many groups [8–15]. Various combinatorial circuits, such as adders, multipliers, comparators, and counters, were implemented in processes from 0.8 μm down to 65 nm. GDI Flip-Flops were also presented, showing improvements in both area and power, compared to existing Flip-Flop styles.

In this paper we present an efficient methodology for the implementation of digital circuits. The proposed methodology was applied to a 16-bit adder in a low power standard 40 nm TSMC process. The CLA adder architecture, which was originally proposed as an alternative for the speed enhancement of a simple ripple adder, was chosen as a benchmark circuit in this work. The proposed CLA implementation utilizes improved full-swing GDI F1 and F2 gates, which are the counterparts of standard CMOS NAND and NOR gates. The CLA design is compared with our previously shown GDI methodology [8], which utilizes swing-restoring buffers with selective application of high-Vth transistors, as well as with the standard CMOS implementation. Simulation results show CLA functionality and robustness under global and local process variations. The CLA presents 2× area reduction and 4–5× power reduction, compared to the conventional CMOS implementation.

The contributions of this paper are as follows: (1) the GDI full swing methodology for a standard nanoscaled CMOS technology is presented; (2) leakage reduction in GDI is discussed through minimum leakage vector (MLV) analysis; (3) GDI robustness is evaluated through statistical Monte Carlo simulations; and (4) low voltage operation of the GDI cells, including minimum energy point (MEP) operation, is shown.
The paper is organized as follows: Section 2 overviews the GDI methodology and presents its benefits and limitations. The proposed CLA implementation is discussed in Section 3. Section 4 presents simulation results of the proposed GDI CLA in a 40 nm standard CMOS process, comparing them to the CMOS CLA. Section 5 concludes the paper.

    2. Overview of GDI

The basic GDI cell is shown in Fig. 1. At first glance, the GDI cell, which consists of only two transistors, resembles the conventional CMOS inverter. However, unlike the inverter, it has three inputs: G (common gate input of both the nMOS and the pMOS), P (input to the source/drain of the pMOS), and N (input to the source/drain of the nMOS).

It was shown that multiple Boolean functions can be implemented by a simple GDI cell, as demonstrated in Table 1. This is achieved by changing the input configuration of the GDI cell. While the implementation of most of these functions is relatively complex (6–12 transistors) in Static CMOS, it is very efficient (only 2 transistors) with GDI cells. The Multiplexer (MUX) is the most complex function that can be implemented with a basic GDI cell, and also the one with the largest gain in efficiency over its CMOS implementation.
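The input configurations of Table 1 can be checked with a small switch-level sketch. It assumes an idealized cell in which the pMOS passes P when G = 0 and the nMOS passes N when G = 1, ignoring threshold drops; the names `gdi_cell`, `CONFIGS`, `EXPECTED` and `check_table1` are illustrative and do not come from the paper.

```python
from itertools import product


def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: pMOS passes P when G=0, nMOS passes N when G=1.

    Assumption: full-swing logic levels, no threshold-voltage drops.
    """
    return n if g else p


# Input configurations from Table 1, expressed over logic inputs (a, b, c).
CONFIGS = {
    "F1":  lambda a, b, c: gdi_cell(g=a, p=b, n=0),
    "F2":  lambda a, b, c: gdi_cell(g=a, p=1, n=b),
    "OR":  lambda a, b, c: gdi_cell(g=a, p=b, n=1),
    "AND": lambda a, b, c: gdi_cell(g=a, p=0, n=b),
    "MUX": lambda a, b, c: gdi_cell(g=a, p=b, n=c),
    "NOT": lambda a, b, c: gdi_cell(g=a, p=1, n=0),
}

# Boolean functions the configurations are expected to realize.
EXPECTED = {
    "F1":  lambda a, b, c: (1 - a) & b,
    "F2":  lambda a, b, c: (1 - a) | b,
    "OR":  lambda a, b, c: a | b,
    "AND": lambda a, b, c: a & b,
    "MUX": lambda a, b, c: ((1 - a) & b) | (a & c),
    "NOT": lambda a, b, c: 1 - a,
}


def check_table1() -> bool:
    """Compare every configuration with its target over all input vectors."""
    return all(
        CONFIGS[name](a, b, c) == EXPECTED[name](a, b, c)
        for name in CONFIGS
        for a, b, c in product((0, 1), repeat=3)
    )
```

Calling `check_table1()` walks all eight input vectors for each of the six configurations. The model says nothing about output swing, which is the subject of the discussion that follows.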

Fig. 1. Basic GDI cell.

Table 1. Boolean function synthesis through input configuration of a simple GDI cell.

N    P    G    Out                 Function
0    B    A    $\bar{A}B$          F1
B    1    A    $\bar{A} + B$       F2
1    B    A    $A + B$             OR
B    0    A    $AB$                AND
C    B    A    $\bar{A}B + AC$     MUX
0    1    A    $\bar{A}$           NOT

GDI gates may suffer from threshold voltage drops, which reduce current drive and therefore degrade the performance of the gate. These drops also increase direct-path static power dissipation in the cascaded inverters used for swing restoration. It was shown that these effects can be significantly reduced by using swing-restoration buffers with a multiple VTH approach [7], herein named MVT. This approach suggests using low threshold transistors in all paths where a voltage drop is expected. This way, the voltage drop at the output will be minimal. In addition, all regenerative inverters are implemented using high threshold transistors. This combination allows minimization of the direct-path static power in the inverters.

Most of today's static digital designs are based on CMOS NAND and NOR gates. The reasons for this are known and well explored. Both NAND and NOR gates are implemented using only four transistors, and each one of these functions is a universal set. The GDI method, which is very efficient for the implementation of various gates, such as MUX, AND and OR (see Table 1), requires a similar number of transistors for NAND/NOR gates as the standard CMOS methodology. However, the GDI technology provides alternative basic functions, F1 and F2. Consisting of only two transistors (one GDI cell), each one of these functions represents a universal set. Moreover, it was shown in [7] that the F1 and F2 functions can be used to synthesize other functions more efficiently than the NAND and NOR gates. Fig. 2 shows a comparison of the number of various functions that can be implemented using the same number of F1 and NAND gates. The strength of the F1 and F2 gates will also be demonstrated by the implementation of a CLA using mainly these GDI gates (see Section 3).

3. GDI CLA adder design

3.1. Full-swing GDI cells

In this paper we propose full-swing (FS) GDI cells. The proposed technique utilizes a single swing restoration (SR) transistor to improve the output swing of F1 and F2 GDI gates. Fig. 3 shows the structure of the full swing F1 and F2 cells. As can be seen, the SR transistor is activated only in cases when the VTH drop may occur at the output. Since in F1 and F2 gates the output VTH drop can occur only at one of the logical levels ($V_{TH}$ instead of 0 V in F1, and $V_{DD} - V_{TH}$ instead of $V_{DD}$ in F2), only a single SR transistor is required to ensure full swing operation.

In cases where the gate input signal of a GDI cell has an inverted representation in the circuit, it can be used to control the swing restoring transistor. This transistor will have a diffusion input similar to the diffusion input of GDI, but will be of the opposite type (nMOS for F1, and pMOS for F2). In this manner, the diffusion input signal will pass through a pair of transistors of both types (the transistor of the original GDI cell and the complementary SR transistor). The FS GDI cells are an efficient alternative to swing restoration buffers in designs where inverted signals can be obtained as part of the logic function implementation. The CLA adder presented in the next subsection is a good example of such a design. An example of a logic chain in the CLA, containing full-swing GDI cells, is shown in Fig. 4.

3.2. CLA implementation

Two GDI versions of a 16-bit CLA adder were implemented in this work: one using swing restoration buffers with multiple Vth (MVT GDI), and the second one with FS GDI gates.

A similar conventional CLA architecture, shown in Fig. 5, was used to implement all the CLA versions. The circuit-level implementation of the MVT and FS GDI adders was different, in order to address the specific properties of each technique. The implementation was based mostly on F1 and F2 cells. The detailed GDI implementation of the various CLA blocks is given in (1)–(4) and is depicted in Table 2, assuming the logical functions $F1(a, b) = \bar{a}b$ and $F2(a, b) = \bar{a} + b$.

The pg unit outputs were implemented as follows:

$$p = a \oplus b \quad \text{(GDI XOR gate)}$$

$$g = ab = \begin{cases} \overline{F2(a, \bar{b})}, & \text{MVT GDI} \\ F1(\bar{b}, a), & \text{FS GDI} \end{cases} \qquad (1)$$

where a similar GDI XOR gate was used in both versions, as will be explained below.

The PG unit implementation is the following:

$$P = p_1 p_0 = F1(\bar{p}_1, p_0)$$

$$G = g_1 + g_0 p_1 = \begin{cases} \overline{F1(g_1, F2(g_0, \bar{p}_1))}, & \text{MVT GDI} \\ F2(\overline{F1(\bar{p}_1, g_0)}, g_1), & \text{FS GDI} \end{cases} \qquad (2)$$

where output P has a similar implementation in both versions.

The local and global carry generator blocks are implemented according to:

$$C_{loc} = g_0 + p_0 C_{in} = \begin{cases} \overline{F1(g_0, F2(C_{in}, \bar{p}_0))}, & \text{MVT GDI} \\ F2(\overline{F1(\bar{p}_0, C_{in})}, g_0), & \text{FS GDI} \end{cases} \qquad (3)$$

$$\begin{aligned} C_{out} &= G_1 + P_1 G_0 + P_1 P_0 C_{in} \\ &= F2(\overline{P_1 G_0 + G_1},\; P_1 P_0 C_{in}) \\ &= F2(\overline{F2(\overline{P_1 G_0}, G_1)},\; F1(\overline{P_1 P_0}, C_{in})) \\ &= F2(\overline{F2(\overline{F1(\bar{P}_1, G_0)}, G_1)},\; F1(\overline{F1(\bar{P}_1, P_0)}, C_{in})) \end{aligned} \qquad (4)$$

The signals $(P_1, G_1)$ and $(P_0, G_0)$ in (4) represent the outputs from top hierarchy level CLA blocks.

Table 2. The transistor-level design of CLA blocks (units pg, PG, LCG, CLA and Last CLA, each in MVT GDI and Full-Swing (FS) GDI versions; schematics not reproduced). Legend: inverter with HVT nMOS transistor, inverter with HVT pMOS transistor, inverter with both HVT transistors. Note: F1* and F2* stand for GDI F1 and F2 full swing gates.

Fig. 2. Number of various functions that can be implemented using the same number of F1 and NAND cells (after [7]). Plot title: CMOS NAND cell vs. GDI F1 cell; x-axis: Number of Cells; y-axis: Number of Functions; series: CMOS, GDI.

Fig. 3. Scheme of the FS GDI gates.

It should be noted that in the MVT implementation, the application of high-Vth (HVT) transistors in swing restoration buffers is selective and depends on the Vth drop that may occur in the path between two buffers. If the Vth drop at the buffer input may occur only at the high (low) voltage level, an asymmetric buffer with a high-Vth nMOS (pMOS) transistor is used. All other transistors in the buffers remain low-Vth (LVT) to maintain the performance. In all the GDI cells, the transistor that may cause a Vth drop is LVT in order to minimize the drop. Other GDI transistors are standard-Vth (SVT).
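The decompositions in (1)–(4) can be sanity-checked by exhaustive enumeration. The sketch below assumes the definitions F1(a, b) = (not a) and b, F2(a, b) = (not a) or b given in the text; the placement of the complement bars is a reconstruction chosen so that each identity holds, since the bars are not legible in every copy of the source. The helper names (`inv`, `f1`, `f2`, `verify_cla_identities`) are illustrative.

```python
from itertools import product


def inv(x: int) -> int:
    """Logical complement of a single bit."""
    return 1 - x


def f1(a: int, b: int) -> int:
    """F1(a, b) = (not a) and b."""
    return inv(a) & b


def f2(a: int, b: int) -> int:
    """F2(a, b) = (not a) or b."""
    return inv(a) | b


def verify_cla_identities() -> bool:
    """Return True if every decomposition matches its Boolean target."""
    # (1): generate signal g = a*b in the MVT and FS forms.
    for a, b in product((0, 1), repeat=2):
        g = a & b
        if inv(f2(a, inv(b))) != g or f1(inv(b), a) != g:
            return False

    for g1, g0, p1, p0, cin in product((0, 1), repeat=5):
        # (2): group propagate and group generate.
        if f1(inv(p1), p0) != (p1 & p0):
            return False
        big_g = g1 | (g0 & p1)
        if inv(f1(g1, f2(g0, inv(p1)))) != big_g:        # MVT form
            return False
        if f2(inv(f1(inv(p1), g0)), g1) != big_g:        # FS form
            return False

        # (3): local carry.
        c_loc = g0 | (p0 & cin)
        if inv(f1(g0, f2(cin, inv(p0)))) != c_loc:       # MVT form
            return False
        if f2(inv(f1(inv(p0), cin)), g0) != c_loc:       # FS form
            return False

        # (4): global carry; the same bits stand in for P1, G1, P0, G0.
        c_out = g1 | (p1 & g0) | (p1 & p0 & cin)
        rhs = f2(
            inv(f2(inv(f1(inv(p1), g0)), g1)),
            f1(inv(f1(inv(p1), p0)), cin),
        )
        if rhs != c_out:
            return False
    return True
```

The check is purely logical: it confirms functional equivalence but does not model which nodes need swing restoration or where inverted signals are already available in the circuit.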

The FS GDI implementation is based on the F1 and F2 cells, as described in Fig. 3. The implementation did not require the addition of inverters for driving the SR transistors. All the inverted signals that were used in SR transistors appeared inherently in the functional implementation. In the MUX cell implementation, a pair of complementary SR transistors was used at the output, making the cell similar to a PTL MUX.

In both versions the XOR implementation was optimized. The basic implementation of a GDI XOR gate, as was proposed in [1], consists of 4 transistors comprising a GDI cell used as a MUX, and an inverter. However, in complex circuits, the diffusion input to the GDI XOR may already have an inverted representation elsewhere in the circuit. Thus, instead of implementing an inverter again as part of the GDI XOR, we can use both signals as diffusion inputs to the GDI MUX while maintaining the same functionality. This allows a significant decrease in the number of transistors in circuits with a high number of XOR/XNOR gates. An example of the optimization in the GDI CLA adder can be seen in the implementation of the p function in Table 2. This optimization allowed reducing the number of transistors in the GDI design.

Fig. 4. Example logic chain in CLA, containing full-swing GDI cells.

Fig. 5. The general CLA architecture.

Both versions of the GDI implementation were compared with a standard CMOS design of the CLA adder. In order to maintain an optimal design, the XOR circuits in CMOS were implemented using the PTL technique. Table 3 summarizes the transistor count and area estimation of both GDI designs vs. the CMOS counterpart. Note that all CMOS gates were implemented using minimum sized transistors with a standard ratio between the pull-up and pull-down networks, i.e., 2. GDI F1, F2 and XOR gates were sized similarly to a CMOS inverter (ratio of 2). Supplementary SR transistors were a minimum sized nMOS and a double sized pMOS. The area estimation is normalized with respect to $W_{min} L_{min}$.

Table 3. Transistor count and area estimation comparison between GDI and Static CMOS designs. Entries are transistor count (area [$W_{min} L_{min}$]).

Design     pg        PG        LCG       CLA       Last CLA   Total
CMOS       18 (41)   18 (35)   12 (24)   30 (59)   34 (77)    934 (2039)
MVT GDI    8 (12)    8 (12)    8 (12)    16 (24)   24 (36)    438 (657)
FS GDI     11 (16)   11 (16)   10 (15)   21 (31)   31 (46)    627 (925)

It can be clearly seen that both GDI implementations have a significant advantage in terms of transistor count and area, as compared to CMOS. While the FS GDI implementation provides full swing and improved performance (as will be shown in the following sections), it implies about a 40% area increase as compared to MVT GDI. Still, it occupies only half of the area of the CMOS design.

3.3. Leakage elimination in GDI

As was shown in [7], the unique structure of the GDI cell provides a significant reduction of both the sub-threshold and the gate leakage components, as compared to a static CMOS gate.
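As a rough illustration of why the cell structure can suppress sub-threshold leakage, the sketch below enumerates all input states of a single idealized GDI cell and counts those in which the turned-off transistor sees no potential difference between its two diffusion nodes. It assumes full-swing logic levels and ignores gate leakage and threshold drops; the function name `off_transistor_bias` is hypothetical.

```python
from itertools import product


def off_transistor_bias(g: int, p: int, n: int) -> int:
    """Logic-level potential difference across the turned-off transistor.

    With G=0 the pMOS conducts (Out = P) and the nMOS, tied to N, is off.
    With G=1 the nMOS conducts (Out = N) and the pMOS, tied to P, is off.
    Assumption: ideal full-swing levels at the output node.
    """
    out = n if g else p
    off_diffusion = p if g else n
    return abs(out - off_diffusion)


states = list(product((0, 1), repeat=3))
zero_bias_states = [s for s in states if off_transistor_bias(*s) == 0]
zero_bias_fraction = len(zero_bias_states) / len(states)
```

Under these assumptions the zero-bias states are exactly those with P equal to N, which is the condition of zero potential between the diffusion inputs of the cell.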

Since the sub-threshold leakage is still dominant, it is addressed here in more detail.

A general GDI cell eliminates the sub-threshold leakage in half of all possible states. This is contrary to static CMOS gates, where the pull-up and the pull-down networks are always connected to the supply voltage and ground, respectively.

Here we demonstrate this advantage by analyzing the minimum leakage vector (MLV) of a basic two-bit CLA module, consisting of PG and LCG blocks. The following input vector provides the minimal leakage in the two-bit FS-GDI CLA:

$$\vec{v}_{in} = (g_0, g_1, p_0, p_1, C_{in}) = (1, 1, 0, 1, 0)$$

As can be seen in Fig. 6, when the input vector is applied to the LCG block, four transistors are turned off (M1, M3, M4 and the inverter nMOS). However, as the inputs $C_{in}$ and $g_0$ are connected to the diffusions of turned-off transistors, the potential at both diffusion nodes of these transistors is similar. This leads to the elimination of sub-threshold leakage in transistors M1, M3 and M4.

A similar effect is observed when the input vector $\vec{v}_{in}$ is applied to the PG block, as shown in Fig. 7. As can be seen, due to the diffusion input connections, three out of five turned-off transistors have zero potential between the diffusion nodes. In this case transistors M2, M8 and M9 exhibit zero sub-threshold leakage.

Fig. 6. LCG unit with minimum leakage vector.

Fig. 7. PG unit with minimum leakage vector.

4. Comparative simulation results

The proposed 16-bit CLA circuits were designed in a 40 nm TSMC process with supply voltage varying from deep sub-threshold values of 100 mV up to the nominal value of 1.1 V. The designs were simulated using the SPICE based Virtuoso simulator. Both GDI designs were compared to the CMOS counterpart in terms of performance, power consumption, area estimation and sensitivity to process variations.

4.1. Nominal voltage operation

The comparative results of performance, static power, energy per operation and energy-delay product (EDP) in the CMOS and GDI implementations are presented in Table 4.

Table 4. Comparison of performance, static and dynamic power in GDI and CMOS CLA implementations.

Design     tpd [ps]   PStat [nW]   EDyn [fJ]   EDP [J·s × 10^-24]
CMOS       134        78.7         123         16.5
MVT GDI    422        35.9         23.5        8.2
FS GDI     167        18.4         19.4        3.9

Fig. 8. Delay distribution of CMOS design derived by Monte Carlo simulation.

Delay: As can be seen, the CMOS design has the shortest delay among all implementations. The FS GDI implementation shows a 24% delay increase as compared to CMOS (167 ps vs. 134 ps). The MVT GDI is about three times slower than CMOS due to voltage drops. The performance improvement in FS GDI is achieved due to the better driving capabilities of the modified F1 and F2 gates.

Static power consumption: According to the results presented in Table 4, the static power of both GDI designs is significantly lower than in the CMOS design. One of the reasons for leakage reduction in GDI is the reduced transistor count. In addition, as shown in the previous section, GDI benefits from an inherent sub-threshold leakage reduction in half of the input vectors, which lead to a zero potential between the diffusion inputs of the GDI cell [7]. The FS GDI implementation presents a reduction in static power consumption as compared to MVT GDI. This is achieved by maintaining full swing at the output nodes of all GDI cells, and therefore eliminating the direct-path currents.

Dynamic energy and EDP: Table 4 shows a 5–6× reduction of dynamic energy consumption per operation in both GDI circuits as compared to CMOS. The main reason for this is the reduced switching capacitance in GDI. Note that although CMOS presents an advantage over GDI in terms of delay, it can be clearly seen that the EDP metric of both GDI designs is better.
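The ratios quoted for the nominal-voltage comparison follow directly from Table 4. The short sketch below recomputes them from the tabulated delay, static power and dynamic energy values; the dictionary layout and the function name `relative_to_cmos` are illustrative choices.

```python
# Values copied from Table 4: (tpd [ps], PStat [nW], EDyn [fJ]).
TABLE4 = {
    "CMOS":    (134.0, 78.7, 123.0),
    "MVT GDI": (422.0, 35.9, 23.5),
    "FS GDI":  (167.0, 18.4, 19.4),
}


def relative_to_cmos(design: str) -> dict:
    """Express one design's Table 4 figures relative to the CMOS row."""
    tpd, p_stat, e_dyn = TABLE4[design]
    tpd0, p_stat0, e_dyn0 = TABLE4["CMOS"]
    return {
        # Positive value means the design is slower than CMOS.
        "delay_increase_pct": 100.0 * (tpd / tpd0 - 1.0),
        # Values above 1 mean the design consumes less than CMOS.
        "static_power_reduction": p_stat0 / p_stat,
        "dynamic_energy_reduction": e_dyn0 / e_dyn,
    }


fs = relative_to_cmos("FS GDI")
mvt = relative_to_cmos("MVT GDI")
```

Here `fs` and `mvt` hold the delay penalty and the static-power and dynamic-energy reduction factors for the two GDI adders, which can be compared against the 24%, 4× and 5–6× figures stated in the text.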

Sensitivity to process variations: In order to evaluate the sensitivity of the designs to local and global process variations, Monte Carlo simulations have been carried out. Figs. 8–10 present the delay distributions of CMOS, MVT GDI and FS GDI, respectively. As expected, FS GDI presents much better immunity to process variations than MVT GDI, while showing a μ/σ ratio that is very close to CMOS (only 12% degradation). The MVT GDI adder is much more sensitive (4× degradation as compared to CMOS), because of the dependence of the driving current on the process-sensitive Vth, which is amplified by voltage drops at internal nodes.

Fig. 9. Delay distribution of MVT GDI design derived by Monte Carlo simulation.

Fig. 10. Delay distribution of FS GDI design derived by Monte Carlo simulation.

4.2. Low voltage operation

Driven by the demand for ultra-low power dissipation, low voltage operation of digital circuits has recently attracted extensive research efforts [16–23]. It was shown that the minimum energy operation point (MEP) is usually achieved in the sub-threshold region. Here we examine the operation of GDI and CMOS adders at low supply voltages.

Fig. 11. MEPs simulation.

While discussing the MEP, it should be recalled that the total energy consumption comprises two different components, $E_{total} = E_{dyn} + E_{leak}$. The dynamic energy component is proportional to the effective load capacitance and the square of the supply voltage, $E_{dyn} \propto C_{eff} V_{DD}^2$, while the leakage component is dominated by the integration of the sub-threshold current over the operation time, $E_{leak} \propto V_{DD}^2 \exp(-V_{DD}/(n V_t))$, assuming full swing operation [24]. It can be easily noticed that these two components have opposite dependence on $V_{DD}$, so their relation determines the MEP.
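The trade-off between the two energy components can be illustrated numerically. The sketch below evaluates the proportionalities given above with arbitrary normalized constants and locates the minimum by a grid search over the simulated supply range of 0.1 V to 1.1 V. All parameter values (`c_eff`, `k_leak`, `n`, `v_t`) are hypothetical placeholders, not fitted to the paper's data, and the function names are illustrative.

```python
import math


def total_energy(vdd: float, c_eff: float = 1.0, k_leak: float = 1000.0,
                 n: float = 1.5, v_t: float = 0.026) -> float:
    """Normalized E_total = E_dyn + E_leak for a supply voltage vdd [V].

    E_dyn  ~ c_eff * vdd^2
    E_leak ~ k_leak * vdd^2 * exp(-vdd / (n * v_t))
    All constants are hypothetical, chosen only to show the trend.
    """
    e_dyn = c_eff * vdd ** 2
    e_leak = k_leak * vdd ** 2 * math.exp(-vdd / (n * v_t))
    return e_dyn + e_leak


def find_mep(v_min: float = 0.1, v_max: float = 1.1, step: float = 0.001,
             **params: float) -> float:
    """Grid search for the supply voltage that minimizes total_energy."""
    steps = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(steps + 1)]
    return min(grid, key=lambda v: total_energy(v, **params))


v_mep = find_mep()
v_mep_leaky = find_mep(k_leak=10000.0)
```

In this toy model a larger leakage coefficient moves the minimum to a higher supply voltage, while a larger effective capacitance moves it lower, which is the qualitative behavior the MEP discussion relies on.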

Fig. 11 presents the dependency of total energy consumption on the supply voltage. MEPs of the GDI and CMOS adders are shown. Note that the simulation was carried out at a typical corner (TT), thus the actual MEPs may be higher due to increased leakage currents under process variations.

An interesting observation from Fig. 11 is that the same order of MEP energy dissipation is obtained for all designs, while the MEPs were achieved at different voltages. The FS GDI has 35% lower MEP energy, as compared to CMOS.

Both GDI designs have reduced effective load capacitances, thus the consumed energy at high VDD values is lower than in CMOS. The sub-threshold leakage in the MVT GDI adder starts dominating at higher VDD values because of the VTH drops. Thus, the MVT GDI achieves the MEP at a higher voltage. The FS GDI design has slightly higher effective capacitances. Therefore, its energy consumption at large voltages is increased. Since no VTH drops occur in FS GDI, its leakage component becomes dominant at a lower supply voltage and therefore the MEP is shifted towards lower VDD. The CMOS adder MEP appears at lower VDD because of the dominating dynamic energy caused by higher switching capacitances.

Fig. 12. Propagation delay of CLA Adders at low supply voltages.

Fig. 12 presents the propagation delay of all adders as a function of the supply voltage sweep. The delays at MEP are also shown. As can be seen, the MVT GDI adder exhibits a delay about one order of magnitude higher than CMOS and FS GDI. However, at MEP, the delay of MVT GDI is comparable with the MEP delays of the other techniques.

Table 5 summarizes the characteristics of all the designs at MEP. As can be seen, FS GDI presents the best energy and delay at MEP. Moreover, its MEP is achieved at a higher VDD than in CMOS, which, when accounting for the similar process variation sensitivity, makes FS GDI beneficial for minimal energy operation.

Table 5. Transistor count and area estimation comparison between GDI and Static CMOS designs.

Design     MEP VDD [V]   MEP energy [fJ]   MEP delay [μsec]
CMOS       0.18          4.7               5.51
MVT GDI    0.32          5.2               5.92
FS GDI     0.22          3.1               4.17

5. Conclusions

Full Swing Gate Diffusion Input (FS GDI) methodology was proposed and evaluated on a 16-bit Carry Look Ahead Adder (CLA). It was shown that the proposed FS-GDI circuits are beneficial in terms of performance and static power consumption, as compared to the conventional multiple Vth GDI (MVT-GDI). Three CLA versions, based on FS-GDI, MVT-GDI and standard CMOS, were designed and compared in a 40 nm low power TSMC process. Simulation results showed a clear advantage of the proposed GDI CLAs in terms of area, dynamic and static energy. The FS-GDI achieved 2× area reduction, 5× improvement in dynamic energy dissipation and 4× decrease in leakage, with a slight (24%) degradation in performance, when compared to the CMOS CLA. Advanced design metrics of GDI cells, such as minimum energy point (MEP) operation and minimum leakage vector (MLV), were discussed. It was shown that FS-GDI achieved better characteristics at MEP, as compared to the other techniques.

References

[1] M. Alioto, Ultra-low power VLSI circuit design demystified and explained: a tutorial, IEEE Transactions on Circuits and Systems Part I (invited) 59 (1) (2012) 3–29.

    [2] G. Gammie, A. Wang, M. Chau, S. Gururajarao, R. Pitts, F. Jumel, S. Engel, P.Royannez, R. Lagerquist, H. Mair, A 45 nm 3.5 g baseband-and-multimediaapplication processor using adaptive body-bias and ultra-low-power techni-ques, in: Proceedings of IEEE International Solid-State Circuits ConferenceDigest of Technical Papers (ISSCC), 2008, pp. 258611.

    [3] Bol D. Robust and energy-efcient ultra-low-voltage circuit design undertiming constraints in 65/45 nm CMOS. Journal of Low Power Electronics andApplications 1 (1) (2011) 119.

    [4] G. Chen, M. Fojtik, D. Kim, D. Fick, J. Park, M. Seok, M.T. Chen, Z. Foo, D. Sylvester,D. Blaauw, Millimeter-scale nearly perpetual sensor systemwith stacked batteryand solar cells, in: Proceedings of IEEE International Solid-State CircuitsConference Digest of Technical Papers (ISSCC), 2010 , pp. 288289.

    [5] I. Vaisband, E.G. Friedman, R. Ginosar, A. Kolodny, Low power clock networkdesign, Journal of Low Power Electronics and Applications 1 (2011) 219246.

    [6] A. Teman, L. Pergament, O. Cohen, A. Fish, Minimum leakage quasi-static RAMbitcell, Journal of Low Power Electronics and Applications 1 (2011) 204218.

    [7] A. Morgenshtein, A. Fish, I.A. Wagner, Gate-diffusion input (GDI)a power-efcient method for digital combinatorial circuits, IEEE Transactions on VLSISystems 10 (5) (2002).

    [8] A. Morgenshtein, I. Shwartz, A. Fish, Gate diffusion input (GDI) logic instandard CMOS nanoscale process, in: Proceedings of IEEE Convention ofElectrical and Electronics Engineers in Israel, 2010.

    [9] M. Kumar, M.A. Hussain, L.L.K. Singh, Design of a Low Power High Speed ALUin 45nm Using GDI Technique and Its Performance Comparison Communica-tions in Computer and Information Science 142 (Part 3) (2011) 458463.

    [10] K.K. Chaddha, R. Chandel, Design and analysis of a modied low power CMOSfull adder using gate-diffusion input technique, Journal of Low PowerElectronics 6 (4) (2010) 482490.

    [11] O.P. Hari, A.K. Mai, Low power and area efcient implementation of N-phasenon overlapping clock generator using GDI technique, in: Proceedings of IEEEInternational Conference on Electronics Computer Technology (ICECT), 2011.

    iffusion Input logicCase-study of low-power CLA adder design,i.2013.04.002

  • [12] P.M Lee, C.H. Hsu, Y.-H. Hung, Novel 10-T full adders realized by GDI structure, in:Proceedings of the IEEE International Symposium on Integrated Circuits, 2007.

    [13] F. Moradi, D.T. Wisland, D.T.H. Mahmoodi, H.S. Aunet, T.V. Cao, A. Peiravi, Ultralow power full adder topologies, in: Proceedings of ISCAS'04, Taipei, Taiwan,May 2009.

    [14] A. Morgenshtein, A. Fish, I.A. Wagner, An efcient implementation of D-ip-op using the GDI technique, in: Proceedings of ISCAS'04 Conference, Canada,May 2004, pp. 673676.

    [15] R. Uma, P. Dhavachelvan, Modied gate diffusion input technique: a newtechnique for enhancing performance in full adder circuits, Proceedings ofICCCS 6 (2012) 7481.

    [16] A. Wang, B.H. Calhoun, A.P. Chandrakasan, Sub-Threshold Design for UltraLow-Power Systems, Springer Verlag, 2006.

    [17] S. Fisher, A. Teman, D. Vaysman, A. Gertsman, O. Yadid-Pecht, A. Fish, Digitalsubthreshold logic design - motivation and challenges, in: Proceedings of theIEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI),

    Arkadiy Morgenshtein received the B.Sc. degree in electrical engineering in 1999, M.Sc. in biomedical engineering in 2003, MBA in 2006 and Ph.D. in electrical engineering in 2008 from Technion, Israel Institute of Technology. In 2012 he joined the IBM Haifa Research Lab. Prior to that he worked with the Core CAD Technologies group at Intel Corporation, where he was researching and developing tools for power optimization and estimation at various levels of VLSI design. He has been a Teaching and Research Assistant at the Electrical Engineering Department, Technion, since 1999, where he is currently an Adjunct Lecturer. Dr. Morgenshtein's research interests include low-power design techniques for digital circuits, optimization of on-chip interconnect, CMOS sensors and EDA tools for power estimation and optimization. He has authored over 40 scientific papers and patent applications. Dr. Morgenshtein co-authored a paper that won the IEEE VLSI Transactions (TVLSI) Best Paper award for 2012. He was honored by the Technion President's award and the Intel-Technion award for excellence in study in 1998 and 2007. He supervised projects winning the award of Oz Moses Foundation by Intel in 2002 and best VLSI project in 2003 and 2005. Dr. Morgenshtein has served as associate editor for the Journal of Low Power Electronics and Applications (JLPEA), as session chairman at the ICECS'04 Conference and as referee in multiple journals and conferences.

    Viacheslav Yuzhaninov received the B.Sc. degree in Electrical Engineering from Ben-Gurion University, Beer Sheva, Israel, in 2012. He has been a Research Assistant at the Low Power Circuits and Systems Lab, VLSI Systems Center, Ben-Gurion University, since 2011. He is currently working on his M.Sc. degree in Electrical Engineering at Bar-Ilan University. His research interests are energy efficient logic families and low voltage high performance digital design.

    Alexey Kovshilovsky received the B.Sc. degree in Electrical Engineering from Ben-Gurion University, Beer Sheva, Israel, in 2012. He has been a Research Assistant at the Low Power Circuits and Systems Lab, VLSI Systems Center, Ben-Gurion University, since 2011. Currently he is working as an Embedded Software Engineer at Powermat Technologies in Neve-Ilan, Israel.

    Alexander Fish received the B.Sc. degree in Electrical Engineering from the Technion, Israel Institute of Technology, Haifa, Israel, in 1999. He completed his M.Sc. in 2002 and his Ph.D. (summa cum laude) in 2006, both at Ben-Gurion University in Israel. He was a postdoctoral fellow in the ATIPS laboratory at the University of Calgary (Canada) from 2006 to 2008. In 2008 he joined the Ben-Gurion University in Israel as a faculty member in the Electrical and Computer Engineering Department. There he founded the Low Power Circuits and Systems (LPC&S) laboratory, specializing in low power circuits and systems. In July 2011 he was appointed as a head of the VLSI Systems Center at BGU. In October 2012 Prof. Fish joined the Bar-Ilan University, Faculty of Engineering, as an Associate Professor and the head of the microelectronics track. Prof. Fish also leads the new Energy Efficient Electronics and Applications Labs. Prof. Fish's research interests include development of energy efficient smart CMOS image sensors, ultra low power SRAM, DRAM and Flash memory arrays and energy efficient design techniques for low voltage digital and analog VLSI chips. He has authored over 70 scientific papers in journals and conferences, including IEEE Journal of Solid State Circuits, IEEE Transactions on Electron Devices, IEEE Transactions on Circuits and Systems and many others. He also submitted 16 patent applications. Prof. Fish has published two book chapters. He was a co-author of papers that won the Best Paper Finalist awards at IEEE ISCAS and ICECS conferences. Prof. Fish serves as an Editor in Chief for the MDPI Journal of Low Power Electronics and Applications (JLPEA) and as an Associate Editor for the IEEE Sensors Journal. He also served as a chair of different tracks of various IEEE conferences. He was a co-organizer of many special sessions at IEEE conferences, including IEEE ISCAS, IEEE Sensors and IEEEI conferences. Prof. Fish is a member of the Sensory, VLSI Systems and Applications and Bio-medical Systems Technical Committees of the IEEE Circuits and Systems Society.


