
660 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 6, JUNE 2007

Device-Aware Yield-Centric Dual-$V_t$ Design Under Parameter Variations in Nanoscale Technologies

Amit Agarwal, Member, IEEE, Kunhyuk Kang, Student Member, IEEE, Swarup Bhunia, Member, IEEE, James D. Gallagher, and Kaushik Roy, Fellow, IEEE

Abstract—The dual-$V_t$ design technique has proven to be extremely effective in reducing subthreshold leakage in both the active and standby modes of operation of a circuit in submicrometer technologies. However, aggressive technology scaling causes the different leakage components (subthreshold, gate, and junction tunneling) to become a significant portion of the total power dissipation in CMOS circuits. High-$V_t$ devices are expected to have higher junction tunneling current (due to stronger halo doping) than low-$V_t$ devices, which in the worst case can increase the total leakage in a dual-$V_t$ design. Moreover, process parameter variations (and in turn $V_t$ variations) are expected to be significantly high in the sub-50-nm technology regime, which can severely affect the yield. In this paper, we propose a device-aware simultaneous sizing and dual-$V_t$ design methodology that considers each component of leakage and the impact of process variation (on both delay and leakage power) to minimize the total leakage while ensuring a target yield. Our results show that conventional dual-$V_t$ design can overestimate leakage savings by 36% while incurring 17% average yield loss in a 50-nm predictive technology. The proposed scheme results in 10%–20% extra leakage power savings compared to conventional dual-$V_t$ design, while ensuring the target yield. This paper also shows that the nonscalability of the present way of realizing high-$V_t$ devices results in negligible power savings beyond the 25-nm technology. Hence, different dual-$V_t$ process options, such as metal gate work function engineering, are required to realize high-performance and low-leakage dual-$V_t$ designs in future technologies.

Index Terms—Band-to-band tunneling (BTBT) leakage, dual-$V_t$ design, process variation.

I. INTRODUCTION

CMOS devices are being scaled down aggressively in each technology generation to achieve higher integration density, while the supply voltage is scaled to achieve lower switching energy per device. However, to achieve high performance, there is a need for commensurate scaling of the transistor threshold voltage ($V_t$), which in turn exponentially increases the subthreshold leakage [1]. Aggressive scaling of the devices not only increases subthreshold leakage but also has other negative impacts such as increased drain-induced barrier lowering (DIBL), $V_t$ roll-off, reduced on-current to off-current ratio, and increased source-drain resistance [2]. To avoid the short channel effects, oxide thickness scaling and higher and nonuniform doping (“halo” and “retrograde well”) need to be incorporated as the devices are scaled to the nanometer regime. However, low oxide thickness gives rise to a high electric field, resulting in considerable direct tunneling current (gate leakage, Fig. 1). Higher doping results in a high electric field across the p-n junctions (source-substrate or drain-substrate), which causes significant junction band-to-band tunneling (BTBT leakage) of electrons from the valence band of the p-region to the conduction band of the n-region (see Fig. 1) [23]. There is another leakage component called gate-induced drain leakage (GIDL), which is also a product of small transistor geometries but may not be a dominant component during regular operation of the circuit. During the normal mode of operation, the major leakage currents are subthreshold, gate, and junction BTBT leakage.

The increase in different leakage components with technology scaling has two major implications in logic design. First, leakage reduction techniques are becoming indispensable in future designs using sub-100-nm silicon technologies. Second, different leakage mechanisms are becoming equally important with device scaling. Hence, the relative magnitudes of the leakage components play a major role in low-leakage logic design.

Fig. 1. Major leakage components in a transistor.

Manuscript received June 1, 2006; revised December 5, 2006. This work was supported in part by Semiconductor Research Corporation (SRC) and by Gigascale Systems Research Center (GSRC), by DARPA, by MARCO, by GSRC, by SRC, by Intel, and by IBM.

A. Agarwal and J. D. Gallagher are with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: [email protected]; [email protected]).

K. Kang and K. Roy are with the Electrical Engineering Department, Purdue University, Lafayette, IN 47906 USA (e-mail: [email protected]; [email protected]).

S. Bhunia is with Case Western University, Cleveland, OH 44106 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVLSI.2007.898683

Furthermore, controlling the variation in device parameters (both systematic and random) during fabrication is becoming a great challenge for scaled technologies [1], [3]. The delay and leakage currents in a device depend on the transistor geometry (gate length, oxide thickness, width, the doping profile, “halo” doping concentration, etc.), the flat-band voltage, and the supply voltage. Statistical variation in any of these parameters results in a large variation in the different leakage components and a significant spread in delay. Among the statistical variations, the random placement of dopants is of great concern [3] because it is independent of transistor spatial location and causes threshold voltage mismatch between transistors even when they are close to each other (intra-die variation), resulting in significant leakage and delay variation of logic gates and circuits. Hence, any low-leakage design needs to consider the spread of leakage and delay, at both the circuit and device design phases, to minimize overall leakage while maintaining yield with respect to a target delay under process variation.

1063-8210/$25.00 © 2007 IEEE

Fig. 2. Leakage components in 90- and 50-nm low- and high-$V_t$ devices, 100 °C.

The dual-$V_t$ design technique has proven to be extremely effective in reducing subthreshold leakage in both the active and standby modes of operation of a circuit in submicrometer technologies. However, with the emerging issues related to technology scaling, the effectiveness of the conventional dual-$V_t$ design technique [4]–[7] may be degrading in nanoscale technologies. The issues related to dual-$V_t$ design in nanoscale technologies are as follows.

1) Scaled devices require the use of higher substrate doping and the application of “halo” profiles to reduce the short channel effect. The high halo doping supersedes any change in base channel doping or threshold voltage implants, which were used traditionally to achieve high-$V_t$ devices. In nanoscale technologies, high-$V_t$ devices can be obtained by increasing the peak halo doping [24]. This higher halo doping reduces the subthreshold leakage exponentially; however, it results in significant junction BTBT current [23] (note that gate leakage is insensitive to the halo doping profile, Fig. 2). Hence, any reduction in subthreshold leakage obtained from a high-$V_t$ device in a dual-$V_t$ design comes at the expense of a corresponding increase in junction BTBT leakage, which in the worst case might increase the total leakage. Since the relative magnitudes of the different leakage components vary across devices, and thereby across different $V_t$'s, the selection of high/low-$V_t$ devices in a dual-$V_t$ design should consider this tradeoff. A device-aware dual-$V_t$ design, which investigates different device design options for realizing the optimum low/high-$V_t$ devices, is required so that the leakage savings can be amplified.

2) It has been observed that as the number of critical paths on a die increases, within-die delay variation causes both the mean and the standard deviation of the die frequency distribution to become smaller. The reduction in the mean of the die frequency dominates over the reduction in the standard deviation, resulting in reduced performance [9]. Since the idea behind dual-$V_t$ design is to utilize the slack between off-critical and critical paths for high-$V_t$ assignment, in effect, it increases the number of critical paths in a circuit. This, in turn, increases the mean and reduces the standard deviation of the circuit delay distribution. Since circuits are designed to meet a certain delay constraint, this increase in the mean of the circuit delay distribution may increase the number of dies failing to meet the delay boundary, resulting in reduced yield (the smaller standard deviation reduces this yield loss). Fig. 3 plots the circuit delay distributions of a low-$V_t$ circuit and a conventionally optimized (for low leakage) dual-$V_t$ circuit. We can observe that after $V_t$ assignment, a larger number of dies may fail to meet the required delay constraint, resulting in low yield. Moreover, devices with different $V_t$'s will have different process variation spreads. A high-$V_t$ device is expected to have large $V_t$ variation due to its high halo doping concentration [10] (more random dopant fluctuation). Hence, a device-aware dual-$V_t$ design, which considers the delay distribution of the circuit under process variation, is required to minimize leakage while ensuring yield.

3) Since circuit leakage follows a statistical distribution under parameter variations, any dual-$V_t$ design technique that considers either worst case or best case leakage will suffer from an overly pessimistic or optimistic approach. A good dual-$V_t$ design should target probabilistic minimization of leakage, considering the effect of process variation on the leakage of different devices (high-$V_t$ devices will have large $\sigma_{V_t}$).

Fig. 3. Yield loss due to dual-$V_t$ design in 50-nm technology, 100 °C.
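The effect described in item 2) can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that every critical path delay is an independent Gaussian with the same mean and sigma (the names `die_delay_stats`, `mean`, and `sigma` and all numeric values are hypothetical, not taken from the paper), and that the die delay is the maximum over its critical paths.

```python
import random
import statistics

def die_delay_stats(n_paths, n_dies=4000, mean=1.0, sigma=0.05, seed=0):
    """Mean and standard deviation of the worst path delay over many dies.

    Assumption: path delays are i.i.d. Gaussian; the die delay is their maximum.
    """
    rng = random.Random(seed)
    worst = [
        max(rng.gauss(mean, sigma) for _ in range(n_paths))
        for _ in range(n_dies)
    ]
    return statistics.mean(worst), statistics.stdev(worst)

for n in (1, 50):
    m, s = die_delay_stats(n)
    print(f"{n:3d} critical paths: mean delay = {m:.4f}, std = {s:.4f}")
```

Under these assumptions, the mean of the die delay grows and its standard deviation shrinks as the number of critical paths increases, which is the trend the text attributes to high-$V_t$ assignment on off-critical paths.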

All previously proposed dual-$V_t$ design techniques either ignore the effect of process variation [4]–[7] or do not consider all leakage components while selecting high-$V_t$ devices. Since both process variation and the relative magnitudes of the different leakage components strongly depend on the choice of low/high-$V_t$ devices, we propose a device-aware yield-centric dual-$V_t$ design methodology, which considers each component of leakage and the impact of process variation (on both delay and leakage power) to minimize the total leakage while ensuring a target yield [8]. We also analyze the effectiveness of dual-$V_t$ design with technology scaling. Our results show that the nonscalability of the present way of realizing high-$V_t$ devices results in negligible power savings beyond the 25-nm technology, even in our proposed device-aware dual-$V_t$ design. Hence, different process options, such as metal gate work function engineering, are required to realize high-performance and low-leakage dual-$V_t$ designs in sub-50-nm bulk technologies.

Fig. 4. Nanoscaled n-channel device with halo doping.

The rest of the paper is organized as follows. In Section II, we show our device level analysis for 90-, 50-, and 25-nm dual-$V_t$ technologies. Section III explains the statistical leakage and delay analysis and presents our proposed device-aware dual-$V_t$ design under parameter variations considering a yield constraint. Section IV presents the simulation results on a set of ISCAS85 benchmark circuits. Section V describes metal gate work function engineering and shows its effectiveness in saving leakage power when used as an option for designing dual-$V_t$ devices. In Section VI, we draw our conclusions.

II. DEVICE LEVEL ANALYSIS

In nanoscaled bulk silicon technologies, high-$V_t$ devices are obtained by changing the peak halo density and its location [24]. In an n-channel device, the strength of the halo can be increased by: 1) increasing the peak halo doping (Ap); 2) moving the position of the lateral peak of the halo (CXp) closer to the center of the channel; and 3) moving the position of the vertical peak of the halo (CYp) away from the bottom junction and towards the surface (Fig. 4). An increase in the strength of the “halo” reduces subthreshold leakage and improves short channel effects; however, it increases the junction BTBT due to the high electric field across the p-n junctions [10], [23]. It also increases the $V_t$ variability due to random dopant fluctuation and the junction capacitance. To investigate the effectiveness of dual-$V_t$ design with technology scaling and to achieve optimum low/high-$V_t$ devices, nMOS transistors were designed based on the doping profile and device structure given in [11] and the design guidelines given in the 2001 and 2003 ITRS Roadmap for effective gate lengths of 90, 50, and 25 nm. The devices were simulated using the MEDICI device simulator [12].

The peak halo density (Ap) along with the halo location (CXp, CYp) was varied to achieve optimum low/high-$V_t$ devices. The oxide thickness, source/drain junction doping, base channel doping, and all other device parameters were kept fixed based on the ITRS Roadmap and the device structure given in [11]. Device optimization was performed by varying the halo doping profile while keeping the subthreshold leakage fixed to a desired value. The goal of the optimization was to maximize $I_{on}/I_{off}$ while maintaining the subthreshold slope within 120 mV/decade with reasonable $V_t$ roll-off and DIBL. Here, $I_{off}$ consists of all components of leakage (gate, subthreshold, and junction BTBT leakage). Devices with different subthreshold leakage correspond to different $V_t$'s. Since gate leakage is almost insensitive to changes in the halo doping profile, by maximizing $I_{on}/I_{off}$ we achieved an optimum device with minimum junction BTBT and highest performance for a given subthreshold leakage (in other words, for a given $V_t$). The low-$V_t$ transistor for each technology is chosen such that it meets the leakage specification provided in the ITRS roadmap.¹ In this paper, we use these devices to show our results on 90-, 50-, and 25-nm effective gate length technologies.

Fig. 5 plots the different leakage components in our optimized low/high-$V_t$ nMOS devices for 90-, 50-, and 25-nm devices at 100 °C. It can be observed from the figure that increasing the $V_t$ of the device reduces the subthreshold leakage exponentially; however, it also increases the junction BTBT leakage. The gate leakage is almost insensitive to changes in $V_t$. In reality, during inversion (on state), an increase in effective channel doping increases band-bending, thereby increasing the gate-to-channel leakage, but at the same time it also decreases the amount of inversion charge available for tunneling (at the same gate voltage), thereby decreasing the leakage current. We observed that the second effect prevails over the first and the gate tunneling current decreases at high $V_t$. However, the decrease in gate leakage is negligible compared to the increase in junction BTBT leakage. Hence, any reduction in subthreshold leakage obtained from a high-$V_t$ device in a dual-$V_t$ design comes at the expense of a corresponding increase in junction BTBT leakage, which in the worst case might increase the total leakage. Since 90-nm devices do not require a strong halo concentration to control short channel effects and to meet the required subthreshold leakage, junction BTBT is almost negligible compared to the subthreshold leakage for a wide range of $V_t$'s [see Fig. 5(a)]. Hence, conventional dual-$V_t$ designs that do not consider junction BTBT while assigning high $V_t$ can still be effective in saving leakage in submicrometer technologies. However, in a 50-nm device, the junction BTBT leakage increases significantly with small changes in $V_t$ and becomes comparable to the subthreshold leakage at $V_t = 0.3$ V, which is only 100 mV higher than the low $V_t$ [see Fig. 5(b)]. This difference between low and high $V_t$ gets smaller (only 60 mV) as we go to the 25-nm technology [see Fig. 5(c)]. Hence, the relative magnitudes of the different leakage components vary across devices having different $V_t$'s. Considering only subthreshold leakage in dual-$V_t$ optimization will, therefore, overestimate the leakage savings and in the worst case might increase the total leakage. The gate leakage is almost insensitive to the change in $V_t$.

Fig. 5. Simulation results of low/high-$V_t$ optimum n-channel device leakage components: (a) 90 nm, $V_{dd}$ = 1.5 V; (b) 50 nm, $V_{dd}$ = 1.2 V; and (c) 25 nm, $V_{dd}$ = 1.0 V.

¹[Online]. Available: http://public.itrs.net/

Fig. 6. $\sigma_{V_t}$ due to random dopant fluctuation versus $V_t$.

Fig. 6 plots the standard deviation of $V_t$ ($\sigma_{V_t}$) due to random dopant fluctuation versus $V_t$ for 90-, 50-, and 25-nm optimized minimum width nMOS devices. $\sigma_{V_t}$ depends on the manufacturing process, the doping profile, and the transistor size and is given by [10]

$$\sigma_{V_t} = \frac{q\,T_{ox}}{\varepsilon_{ox}} \sqrt{\frac{N_a W_{dm}}{3\,L_{\mathrm{eff}}\,W}} \qquad (1)$$

where $N_a$ is the effective channel doping, $W_{dm}$ is the depletion region width, $T_{ox}$ is the oxide thickness, $\varepsilon_{ox}$ is the oxide permittivity, $q$ is the electron charge, and $L_{\mathrm{eff}}$ and $W$ are the effective channel length and the width of the transistor. Since high-$V_t$ devices have high effective channel doping, $\sigma_{V_t}$ increases with $V_t$. Fig. 6 shows that $\sigma_{V_t}$ is negligible with respect to the nominal $V_t$ in 90-nm devices; however, it becomes significant in 50- and 25-nm devices, resulting in considerable spread in drive current and leakage power.
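As a rough numerical illustration of how $\sigma_{V_t}$ scales with device size, the sketch below evaluates the standard random-dopant expression $\sigma_{V_t} = (q T_{ox}/\varepsilon_{ox})\sqrt{N_a W_{dm}/(3 L W)}$. The assumption that this is the form used in (1), the function name `sigma_vt`, and all device values are hypothetical and chosen only to show the trend.

```python
import math

Q = 1.602176634e-19               # electron charge [C]
EPS_OX = 3.9 * 8.8541878128e-12   # SiO2 permittivity [F/m]

def sigma_vt(n_a, w_dm, t_ox, l_eff, w):
    """Vt standard deviation from random dopant fluctuation (SI units).

    Assumed form: (q * Tox / eps_ox) * sqrt(Na * Wdm / (3 * Leff * W)).
    """
    return (Q * t_ox / EPS_OX) * math.sqrt(n_a * w_dm / (3.0 * l_eff * w))

# Hypothetical device values, for illustration only.
narrow = sigma_vt(n_a=5e24, w_dm=20e-9, t_ox=1.2e-9, l_eff=50e-9, w=100e-9)
wide = sigma_vt(n_a=5e24, w_dm=20e-9, t_ox=1.2e-9, l_eff=50e-9, w=400e-9)
print(f"sigma_Vt at W = 100 nm: {narrow * 1e3:.1f} mV")
print(f"sigma_Vt at W = 400 nm: {wide * 1e3:.1f} mV")
```

In this form, quadrupling the width halves $\sigma_{V_t}$, and a higher channel doping $N_a$ (as in a high-$V_t$ device) raises it.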

III. STATISTICAL CIRCUIT ANALYSIS

In nanoscaled CMOS devices, the random variations in the number and the placement of dopant atoms in the channel region cause random variations in the transistor threshold voltage [3]. Moreover, the delay and leakage distribution of a circuit strongly depends on the device geometry (channel length, width, oxide thickness, etc.) and the doping profile. Though Monte Carlo simulation of logic gates (using a device simulator like MEDICI [12] during device design or a circuit simulator like SPICE during circuit design) provides an accurate way of estimating the delay and leakage distributions, it is computationally expensive and thus considerably increases the design time. The problem is particularly severe if estimation is required at the device design phase. Hence, statistical modeling and analysis of the delay and leakage of logic gates are necessary at both the circuit and device design phases to improve the efficiency of low-power dual-$V_t$ design. This section describes our semianalytical method to estimate both the leakage and delay distributions in a circuit using the parameters obtained from device simulations.

For intra-die variation, we consider the intrinsic fluctuation of the $V_t$ of different transistors due to the random dopant effect, which is the primary source of intra-die process variation [3]. For inter-die variation, we consider variation in gate length ($L_{\mathrm{eff}}$, which results in variations in transistor threshold voltage), usually considered to be the dominant source of inter-die variation. It should be noted that any other variations can easily be incorporated into our model. We assume that random dopant fluctuation is independent of $L_{\mathrm{eff}}$ variation. While there is a minor dependency between $L_{\mathrm{eff}}$ and random dopant fluctuation (1), the error introduced as a result of the independence assumption was found to be negligible. The standard deviation of $V_t$ due to random dopant fluctuation is extracted from our optimized device using (1) (see Fig. 6), which depends on both $L_{\mathrm{eff}}$ and the width of the transistors. We assume 15% $3\sigma$ variation in $L_{\mathrm{eff}}$ for our analysis [9].¹

A. Statistical Leakage Power Estimation

Many researchers have proposed statistical models of subthreshold leakage to estimate the leakage of a chip under parameter variations [26], [27]. However, they do not consider variations in other leakage components. In this work, we first model all three components of leakage (subthreshold, gate, and junction BTBT leakage) of a device using the device geometry, the 2-D doping profile, and the operating temperature, based on the analytical models described in [13]. These analytical models are calibrated and verified by device simulation across different biasing conditions and device/circuit parameters. The leakage models are used to estimate the total leakage (as a sum of subthreshold, gate, and junction BTBT leakage) of a circuit in our dual-$V_t$ design. Second, the sensitivity of leakage to different parameters (e.g., $V_t$, gate length variation) is extracted from device level analysis of devices with different $V_t$'s. Finally, the developed model and the extracted sensitivities are used to estimate the total leakage and its distribution in a circuit. The dependency of subthreshold leakage on $V_t$ and $L_{\mathrm{eff}}$ variation is modeled as an exponential decay. Since the junction BTBT and the gate leakage are almost insensitive to random dopant fluctuation and only linearly dependent on $L_{\mathrm{eff}}$ variation, we neglect any variation of these two leakage components.

Fig. 7. NAND gate leakage dependency with respect to input vector, stacking, and body effect.
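To see why an exponential dependence on a Gaussian $V_t$ shift leads to a lognormal leakage distribution, the following sketch assumes a generic model $I = I_0 \exp(-\Delta V_t / v_c)$ with $\Delta V_t \sim N(0, \sigma_{V_t}^2)$. The model form, the characteristic voltage `v_char`, and the numbers are illustrative assumptions rather than the calibrated models of the paper.

```python
import math
import random
import statistics

def subthreshold_leakage(i_nominal, delta_vt, v_char):
    """Assumed exponential decay model: I = I0 * exp(-dVt / v_char)."""
    return i_nominal * math.exp(-delta_vt / v_char)

def lognormal_mean(i_nominal, sigma_vt, v_char):
    """Closed-form mean: ln(I) ~ Normal(ln(I0), (sigma_vt / v_char)^2)."""
    return i_nominal * math.exp(0.5 * (sigma_vt / v_char) ** 2)

# Hypothetical values: 30 mV sigma_Vt, 40 mV characteristic voltage.
rng = random.Random(1)
samples = [
    subthreshold_leakage(1.0, rng.gauss(0.0, 0.030), 0.040)
    for _ in range(100_000)
]
mc_mean = statistics.fmean(samples)
analytic_mean = lognormal_mean(1.0, 0.030, 0.040)
print(f"Monte Carlo mean = {mc_mean:.3f}, lognormal mean = {analytic_mean:.3f}")
```

The mean leakage exceeds the nominal value $I_0$ even though the $V_t$ shift has zero mean, which is why a nominal-only leakage estimate is optimistic under variation.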

Since the $V_t$ variation is random in nature, it is assumed to be independent for all transistors in a circuit. Hence, the subthreshold leakage in a logic gate depends on the $V_t$ variation in the different transistors of that logic gate and on the input vector. As we can observe from Fig. 7, the subthreshold leakage of a two-input NAND gate, for input vectors “00” and “01,” depends on the threshold voltages of both M1 and M2. This can be modeled as [22], [28]

The subthreshold leakage for input vectors “10” and “11” can easily be written as

Hence, the total leakage in a NAND gate is

where $P_{00}$, $P_{01}$, $P_{10}$, and $P_{11}$ are the signal probabilities (input probabilities of 00, 01, 10, and 11, respectively) of the input vectors at the inputs of the NAND gate. These probabilities are calculated using the signal probability of each node in the circuit, which is the probability that the node is at logic one. These signal probabilities can be propagated through the basic logic gates based on simple rules of probability and the Boolean function implemented by the gate [14]. Hence, the PDF of the total leakage in a NAND gate can be expressed as a sum of correlated lognormals, which can be well approximated by another lognormal using Wilkinson’s method [15]. Similarly, the PDF of the leakage distribution in other types of gates (NOR, inverter) can be approximated as a lognormal based on their input probabilities.
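A minimal sketch of the input-vector weighting is given below. It assumes the two NAND inputs are statistically independent (so reconvergent fan-out correlation is ignored) and reads the total gate leakage as a probability-weighted sum over the four input vectors; the function names and the per-vector leakage values are hypothetical.

```python
def nand2_vector_probs(p_a, p_b):
    """Probabilities of input vectors 00, 01, 10, 11 for independent inputs.

    p_a, p_b: signal probabilities (probability of logic one) of inputs A and B.
    """
    return {
        "00": (1.0 - p_a) * (1.0 - p_b),
        "01": (1.0 - p_a) * p_b,
        "10": p_a * (1.0 - p_b),
        "11": p_a * p_b,
    }

def nand2_output_prob(p_a, p_b):
    """Signal probability at the NAND output: it is zero only for vector 11."""
    return 1.0 - p_a * p_b

def expected_leakage(vector_probs, leakage_by_vector):
    """Probability-weighted leakage over all input vectors."""
    return sum(vector_probs[v] * leakage_by_vector[v] for v in vector_probs)

# Hypothetical per-vector leakage values (arbitrary units).
leak = {"00": 1.0, "01": 4.0, "10": 3.0, "11": 8.0}
probs = nand2_vector_probs(0.5, 0.5)
print(expected_leakage(probs, leak), nand2_output_prob(0.5, 0.5))
```

The output probability returned by `nand2_output_prob` is what would be propagated to the inputs of the next gate when walking through the circuit level by level.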

The PDF of the total leakage of a circuit due to intra-die variation can be written as a sum of the independent lognormal distributions associated with the different logic gates [22]. This can again be approximated as a lognormal using the Wilkinson approximation

(2)
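The moment-matching idea behind the Wilkinson approximation can be sketched as follows for the independent case: the mean and variance of the sum are computed from the individual lognormals, and a single lognormal with the same two moments is returned. This is a generic textbook form under the independence assumption, not necessarily the exact expression used in (2); the function name and inputs are hypothetical.

```python
import math

def wilkinson_sum(params):
    """Approximate a sum of independent lognormals by one lognormal.

    params: list of (mu, sigma) pairs of the underlying normal distributions.
    Returns (mu, sigma) of the lognormal matching the sum's mean and variance.
    """
    mean = sum(math.exp(mu + 0.5 * s * s) for mu, s in params)
    var = sum(
        (math.exp(s * s) - 1.0) * math.exp(2.0 * mu + s * s) for mu, s in params
    )
    sigma_sq = math.log(1.0 + var / (mean * mean))
    return math.log(mean) - 0.5 * sigma_sq, math.sqrt(sigma_sq)

# Hypothetical gate leakage distributions (log-domain parameters).
gates = [(0.0, 0.6), (0.3, 0.5), (-0.2, 0.7)]
mu_total, sigma_total = wilkinson_sum(gates)
print(f"mu = {mu_total:.4f}, sigma = {sigma_total:.4f}")
```

For the correlated sums that arise inside a single gate, the variance term would additionally need the covariance between the lognormals, which this sketch omits.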

Finally, the total leakage distribution of a circuit considering both inter-die ($L_{\mathrm{eff}}$ variation) and intra-die variation can be written as the product of two lognormal distributions, which itself can be represented as a lognormal [22]

(3)

where $\mu$ and $\sigma$ are the parameters of the final lognormal distribution. All sensitivities and the variation parameters associated with all the transistors are calculated using the developed leakage models and device simulations. The percentage point function of a lognormal is defined as

$$G(p) = \exp\big(\mu + \sigma\,\Phi^{-1}(p)\big) \qquad (4)$$

Here, $\Phi$ is the CDF of a standard normal distribution and $\Phi^{-1}$ is its inverse. This can be used to estimate the confidence point of the leakage distribution.
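The confidence point of a lognormal leakage distribution can be evaluated with the Python standard library, as in the sketch below. It uses the standard identity that the $p$-quantile of a lognormal with log-domain parameters $\mu$ and $\sigma$ is $\exp(\mu + \sigma\,\Phi^{-1}(p))$; the function name and the example parameters are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_ppf(p, mu, sigma):
    """p-quantile of a lognormal whose logarithm is Normal(mu, sigma^2)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Hypothetical log-domain parameters of a circuit leakage distribution.
mu, sigma = 1.0, 0.4
median = lognormal_ppf(0.50, mu, sigma)
p95 = lognormal_ppf(0.95, mu, sigma)
print(f"median = {median:.3f}, 95th percentile = {p95:.3f}")
```

Using the 95th-percentile value instead of the median or the worst case is what allows a leakage target to be stated at a chosen confidence level.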

B. Simultaneous Gate Sizing and Dual-$V_t$ Assignment Algorithm

In this section, we describe the simultaneous gate sizing and dual-$V_t$ assignment algorithm to minimize the total power (dynamic power and leakage power) of the circuit while meeting a target yield (with respect to a given delay constraint). The proposed algorithm is iterative, where the choice of transistor size or $V_t$ assignment is based on a sensitivity analysis of different logic gates. The basic steps of the algorithm are described in Fig. 8.

The algorithm starts by assigning high- and minimum sizeto all logic gates in the given circuit, which corresponds to theminimum power/maximum delay configuration of the circuit.For the gate sizing, we applied a standard cell library-basedmethod, where each cell type is provided with seven differentsizes to represent various driver strengths and capacitive loads.Then the algorithm proceeds to select logic gates for low-assignment or up-sizing in multiple iterations. Note that forup-sizing, we always select the next available size in the libraryto compute the change in delay and power. We use a flag (i.e.,color) to indicate if a particular logic gate is either assignedlow- value or up-sized in a particular iteration. In each itera-tion, we compute sensitivity of each logic gate with respect tothe change in circuit delay for unit change in circuit power asfollows:

(5)

where and represent the change in circuit delayor total power by upsizing or low- assignment, respectively.

The change in circuit delay, $\Delta D$, can be calculated using statistical timing analysis. We employed the statistical static timing analysis (SSTA) algorithm proposed in [16], where the delay distribution of a circuit is calculated using the levelized covariance propagation (LCP) technique [16], [25]. Using this algorithm, we computed the distribution of the maximum circuit delay (e.g., mean and STD) considering both independent random variation and inter-die variation of the process parameters. To consider the impact of process variation on delay, we used the 95th-percentile circuit delay, which ensures 95% yield. The change in total power, $\Delta P$, is evaluated for each sizing or low-$V_t$ assignment operation. Dynamic and leakage power are computed separately. Signal activities at each node are precharacterized and used to compute dynamic power, which is proportional to the signal activity, during the analysis. To consider the impact of process variation on leakage, we characterized the leakage of each cell at the 95th percentile of its leakage distribution. Leakage currents are computed using the algorithm presented in Section III-A. The parameters used in the delay and power models are extracted from device simulation results for the high-/low-$V_t$ devices.

Fig. 8. Flow chart for the simultaneous gate sizing and dual-$V_t$ design algorithm.
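For illustration, if the maximum circuit delay is approximated as Gaussian with the mean and STD returned by SSTA (an assumption made here only for the sketch), the 95th-percentile delay follows directly from the inverse normal CDF. The function name and the numbers below are hypothetical.

```python
from statistics import NormalDist


def percentile_delay(mean_ps: float, std_ps: float, p: float = 0.95) -> float:
    """Delay value below which a fraction p of dies fall (Gaussian assumption)."""
    return NormalDist(mu=mean_ps, sigma=std_ps).inv_cdf(p)


# Made-up SSTA output: mean 1000 ps, STD 50 ps.
d95 = percentile_delay(1000.0, 50.0)  # roughly mean + 1.645 * STD
```

Using this value as the delay constraint means that, by construction, about 95% of dies from the same distribution meet it.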

In each iteration, we rank all the logic gates in descending order of their sensitivity values. We then choose logic gates in order of their sensitivities and assign each the best size factor or $V_t$ value (the one with the highest sensitivity $S$). Once the size factor or $V_t$ value of a gate has been selected, all the logic gates in its fan-in and fan-out cones are colored, so that they are not considered for sizing or $V_t$ allocation in the current iteration. This improves the runtime of the algorithm by minimizing expensive statistical timing analysis runs. However, the coloring method can inherently cause the algorithm to select relatively low-sensitivity gates in the early iterations, which can degrade the overall yield result. To avoid such cases, we set 10% of the sensitivity value of the first selection as the lower limit for selecting subsequent gates. With this lower limit, we observed a negligible difference between our coloring-based selection method and the original method without coloring (i.e., only one selection per iteration). When all gates are colored, we run the statistical timing analysis and check the yield constraint. If the constraint is satisfied, the algorithm terminates for the fixed high-$V_t$ value; otherwise, it goes back to the initialization step and proceeds to the next iteration. For a fixed high-$V_t$ value, the $V_t$-assignment/sizing algorithm also terminates before the yield constraint is satisfied if an iteration fails to improve the yield by a threshold margin. Note that even though multiple gates are up-sized or assigned low-$V_t$ in one iteration, the coloring method ensures that at most one gate per path is chosen; hence, the algorithm will never oversatisfy the design constraint.
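The coloring-based selection within one iteration can be sketched as follows. This is a simplified illustration under stated assumptions: the netlist is a hypothetical fan-out dictionary, sensitivities are given rather than recomputed, and `select_gates`, `cone`, and `floor_ratio` are names introduced here, not part of the flow in Fig. 8.

```python
from typing import Dict, List, Optional, Set


def cone(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All gates reachable from `start` by following `edges` transitively."""
    seen: Set[str] = set()
    stack = list(edges.get(start, []))
    while stack:
        gate = stack.pop()
        if gate not in seen:
            seen.add(gate)
            stack.extend(edges.get(gate, []))
    return seen


def select_gates(sens: Dict[str, float],
                 fanout: Dict[str, List[str]],
                 floor_ratio: float = 0.10) -> List[str]:
    """Pick gates in descending sensitivity, coloring fan-in/fan-out cones."""
    # Build the reverse adjacency (fan-in) from the fan-out lists.
    fanin: Dict[str, List[str]] = {g: [] for g in sens}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)

    colored: Set[str] = set()
    chosen: List[str] = []
    floor: Optional[float] = None
    for gate in sorted(sens, key=sens.get, reverse=True):
        if gate in colored:
            continue
        if floor is None:
            floor = floor_ratio * sens[gate]  # 10% of the first selection
        elif sens[gate] < floor:
            break  # every remaining gate is at most this sensitive
        chosen.append(gate)
        colored |= {gate} | cone(gate, fanout) | cone(gate, fanin)
    return chosen


# Toy netlist: two independent paths a->b->c and d->e->f, plus an isolated gate z.
fanout = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["f"]}
sens = {"a": 4.0, "b": 5.0, "c": 1.0, "d": 0.2, "e": 3.0, "f": 0.1, "z": 0.3}
picked = select_gates(sens, fanout)
```

Here `b` is selected first and colors `a` and `c`; `e` is selected next and colors `d` and `f`; `z` is uncolored but falls below the 10% floor, so it is left for a later iteration.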

The proposed algorithm is a greedy heuristic based on sensitivity-driven sizing/$V_t$-value selection. The $V_t$-assignment/sizing algorithm is effective in reducing total power while satisfying the yield constraint since, at each iteration, it selects only the most sensitive logic gates (for sizing or $V_t$ assignment) that can meet the yield bound with minimum power increase. The effectiveness of the algorithm largely depends on the accuracy and efficiency of calculating sensitivity values at each iteration. In our experiments, we used incremental timing and power analysis (which recomputes delay or power only at the nodes affected by a size change and thus requires much less run time than a full timing analysis) to recompute the sensitivity of logic gates with respect to both upsizing and low-$V_t$ assignment at the beginning of each iteration. The complexity of the algorithm is

$$O\big(N_{iter}\,(C_{timing} + C_{power} + C_{SSTA})\big) \qquad (6)$$

where $N_{iter}$ is the number of iterations required, which depends mostly on circuit topology and in the worst case can grow comparable to the number of logic gates $N_{gate}$ (i.e., in the extreme case where only a single logic gate is resized or assigned low-$V_t$ per iteration); $C_{timing}$ and $C_{power}$ are the complexities of timing analysis and power (dynamic and leakage) analysis, respectively, during sensitivity computation. Since an incremental algorithm is used for the sensitivity computation, $C_{timing}$ and $C_{power}$ depend linearly on the total number of gates $N_{gate}$. $C_{SSTA}$ is the complexity of statistical timing analysis and yield computation for the circuit. In [16], it was shown that the complexity of the employed SSTA algorithm depends quadratically on the maximum number of gates in a single logic level. Note that we run this algorithm over a set of preselected high-$V_t$ values. Hence, the overall complexity of the algorithm in Fig. 8 is that of (6) multiplied by $N_{V_t}$, where $N_{V_t}$ is the number of preselected high-$V_t$ values.
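The outer loop over preselected high-$V_t$ values can be sketched as below. The callback `optimize_for_vt` stands in for one full run of the sizing/assignment algorithm at a fixed high-$V_t$ and is hypothetical, as are the toy cost model and the candidate values.

```python
from typing import Callable, Iterable, Optional, Tuple

Result = Tuple[float, bool]  # (total power, yield constraint met)


def sweep_high_vt(candidates: Iterable[float],
                  optimize_for_vt: Callable[[float], Result]
                  ) -> Optional[Tuple[float, float]]:
    """Return (high-Vt, power) with the lowest power among yield-feasible runs."""
    best: Optional[Tuple[float, float]] = None
    for vt in candidates:
        power, meets_yield = optimize_for_vt(vt)
        if meets_yield and (best is None or power < best[1]):
            best = (vt, power)
    return best


def toy_optimizer(vt: float) -> Result:
    # Made-up model: power is minimal at 0.30 V; yield fails above 0.34 V.
    power = 1.0 + (vt - 0.30) ** 2
    return power, vt <= 0.34


best = sweep_high_vt([0.26, 0.30, 0.34, 0.38], toy_optimizer)
```

The cost of this sweep is one inner optimization per candidate, which is why the overall complexity scales with the number of preselected high-$V_t$ values.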

IV. RESULTS

In this section, we compare the leakage savings and yield improvement achieved by traditional static dual-$V_t$ design (CONV), which considers only subthreshold leakage as the optimization criterion, with our proposed device-aware dual-$V_t$ scheme, which considers process variation and all components of leakage. To show the different tradeoffs associated with leakage savings and yield improvement, we categorized our algorithm into three progressive optimizations [7]. The first optimization (OPT1) considers all components of leakage but ignores any variation in delay and leakage. The second optimization (OPT2) takes circuit delay variation into account to ensure yield (using the 95th-percentile delay of the low-$V_t$ circuit as a constraint in dual-$V_t$ optimization) while considering all components of leakage, but ignores any leakage variation. The third optimization (OPT3) considers all the previous parameters and also minimizes the 95th-percentile leakage of the circuit. We show all our results for ISCAS benchmarks using our optimized devices for 90-, 50-, and 25-nm technologies. We first size all ISCAS benchmark circuits using low-$V_t$ transistors for a given delay constraint for minimum dynamic power. We estimate the total dynamic and leakage power of these optimally sized circuits considering all components of leakage. We also measure the 95th-percentile delay of these circuits. We use these power and delay values as the baseline for reporting power savings and yield.

Fig. 9. High-$V_t$ assignment using CONV and OPT1 in (a) 90-, (b) 50-, and (c) 25-nm technology. P1: Leakage power using OPT1. P2: Expected leakage power using CONV. P3: Actual leakage power using CONV.

Fig. 9 compares the optimum high-$V_t$ and the leakage savings achieved by conventional dual-$V_t$ design (CONV) and OPT1 for ISCAS c880. For 90-nm technology, both design techniques select the same optimum high-$V_t$ and result in around 90% leakage savings [see Fig. 9(a)]. However, for 50-nm technology, the optimum high-$V_t$'s selected by CONV and OPT1 differ by 40 mV [see Fig. 9(b)]. Fig. 10 analyzes the different power components in CONV for the 50-nm node. It shows that even though subthreshold leakage power is minimum at a high-$V_t$ of 0.34 V, the total leakage minimum occurs at a smaller $V_t$ value due to the increase in junction BTBT. The gate leakage and dynamic power do not change significantly across different $V_t$'s and depend on the size of the logic gates. If we include junction BTBT and gate leakage power at the optimum point (P2) of the CONV curve in Fig. 9(b), the total power (P3) actually exceeds the minimum power (P1) achieved by OPT1 by 37%. Hence, OPT1 achieves more leakage power savings than conventional approaches. It also shows that CONV overestimates the total power saving by 63% and saves only 17% of total leakage. Even though OPT1 results in higher junction BTBT leakage than the low-$V_t$ design, it saves more than 54% of total leakage.
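The percentages quoted above for c880 at 50 nm agree with one another if they are read as percentage points of the low-$V_t$ design's total leakage. That normalization is an assumption made for this worked example, and the variable names are hypothetical; the short Python check below makes the arithmetic explicit.

```python
# All values are percentage points of the low-Vt design's total leakage (= 100).
LOW_VT = 100

conv_actual_saving = 17  # CONV with all leakage components included (P3)
conv_overestimate = 63   # gap between expected (P2) and actual (P3) savings
opt1_saving = 54         # OPT1 (P1); the text states "more than 54%"

p3 = LOW_VT - conv_actual_saving                        # actual CONV leakage
p2 = LOW_VT - (conv_actual_saving + conv_overestimate)  # expected CONV leakage
p1 = LOW_VT - opt1_saving                               # OPT1 leakage

# Gap between CONV's actual leakage and OPT1's minimum, in percentage points.
gap_p3_p1 = p3 - p1
```

Under this reading, CONV expects an 80% saving but delivers 17%, and the 37-point gap between P3 and P1 matches the figure quoted for Fig. 9(b).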

Fig. 10. Leakage power components in CONV using 50-nm technology.

Fig. 11 plots the minimum total leakage and minimum total power achieved by both design methods for different benchmarks in 50-nm technology. Total leakage and total power are normalized with respect to the total leakage and total power of an optimally sized low-$V_t$ design, respectively. Note the pronounced benefit for the c880 circuit. This is due to the large number of near-similar noncritical paths with abundant timing slack in c880. Since high $V_t$'s are mainly applied to noncritical paths, a large portion of the total leakage is dominated by the BTBT component. By applying OPT1, we can significantly reduce the BTBT component, as shown in Fig. 11(a). Table I shows the expected (P2, considering only subthreshold leakage) and actual (P3, considering all components of leakage) leakage power savings in CONV and the total leakage power savings (P1) achieved by OPT1 across different benchmarks, while considering the dynamic power overhead due to sizing. Our device-aware scheme results in an average of 14% and a maximum of 37% more leakage power savings than the conventional scheme. Conventional designs overestimate the leakage savings by 36% on average. Considering all components of leakage power results in an optimum high-$V_t$ around 20 mV smaller than when only subthreshold leakage is considered in the optimization.

However, in 25-nm technology, due to the significant increase in junction BTBT, dual-$V_t$ design using CONV results in negligible leakage savings, while OPT1 results in only 14% leakage savings [see Fig. 9(c)]. Moreover, the difference between the low-$V_t$ and the optimum high-$V_t$ for OPT1 is only 20 mV. Such dual $V_t$'s will be difficult to fabricate accurately considering the large process variation in nanoscale technologies. Fig. 12 plots the minimum total leakage and minimum total power achieved by both CONV and OPT1 for different benchmarks in 25-nm technology. Table II shows the respective leakage savings, as discussed previously, for 25-nm technology. It can be observed from the figure that CONV results in negligible power savings and, for some benchmarks, actually increases the total power compared to the low-$V_t$ design. Our device-aware scheme OPT1 results in an average of 13.8% and a maximum of 14.1% leakage power savings. The difference between the low-$V_t$ and the optimum high-$V_t$ for OPT1 varies from 20 to 30 mV across benchmarks. It is evident from the previous results that dual-$V_t$ designs should consider each component of leakage while optimizing a circuit to reduce total leakage power. Since increasing the peak halo doping to realize high-$V_t$ increases junction BTBT leakage and thus yields negligible leakage savings in aggressively scaled technologies, a different design option for realizing high $V_t$'s needs to be explored to maintain the effectiveness of dual-$V_t$ design in nanoscale technologies.

Fig. 11. (a) Minimum total leakage using CONV and OPT1. (b) Minimum total power using CONV and OPT1 in 50-nm technology.

Fig. 12. (a) Minimum total leakage using CONV and OPT1. (b) Minimum total power using CONV and OPT1 in 25-nm technology.

TABLE I: % LEAKAGE SAVINGS USING CONV AND OPT1 IN 50-nm TECHNOLOGY

TABLE II: % LEAKAGE SAVINGS USING CONV AND OPT1 IN 25-nm TECHNOLOGY

Fig. 13. Circuit delay distribution and yield loss using CONV, OPT1, and OPT2 in (a) 90-, (b) 50-, and (c) 25-nm technology.

TABLE III: % YIELD USING CONV AND OPT1 IN 50-nm TECHNOLOGY

Fig. 13 plots the circuit delay distributions obtained using the statistical timing analysis tool (Section III) for the optimum dual-$V_t$ circuits (i.e., with the high-$V_t$ that achieves minimum power) using CONV, OPT1, and OPT2 for ISCAS c880. Since $\sigma_{V_t}$ in 90-nm devices is negligible with respect to their $V_t$, 95% of the dies meet the required delay constraint (the 95th-percentile circuit delay of the low-$V_t$ circuit) in CONV, OPT1, and OPT2. However, for 50- and 25-nm technologies, CONV results in only 86% and 80% yield, while OPT1 results in 92% and 84% yield, respectively. Since OPT2 imposes a yield constraint with respect to circuit delay variation while assigning high-$V_t$, it meets the required 95% delay yield for both 50- and 25-nm technology. Table III shows the yield of different benchmarks using CONV and OPT1 for 50-nm technology. CONV and OPT1 result in an average of 17% and 16% and a maximum of 59% and 52% yield loss in 50-nm technology, respectively. The yield loss for 25 nm is observed to be higher, and OPT1 results in better yield on average than CONV. Since c2670 has a large number of primary inputs (233) and a large number of critical paths, which in turn results in a large mean shift [9], it has the maximum yield loss.
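To illustrate how a shift in the delay distribution translates into yield loss, the sketch below evaluates the fraction of dies meeting a fixed delay target. It assumes a Gaussian delay distribution, which is a simplification introduced here; the function name and all numbers are hypothetical and do not correspond to Fig. 13 or Table III.

```python
from statistics import NormalDist


def timing_yield(mean_ps: float, std_ps: float, target_ps: float) -> float:
    """Fraction of dies with delay <= target under a Gaussian assumption."""
    return NormalDist(mean_ps, std_ps).cdf(target_ps)


# Delay target: 95th-percentile delay of a made-up low-Vt reference design.
reference = NormalDist(1000.0, 40.0)
target = reference.inv_cdf(0.95)

y_ref = timing_yield(1000.0, 40.0, target)    # 0.95 by construction
y_shift = timing_yield(1020.0, 40.0, target)  # same spread, mean shifted by 20 ps
```

In this toy case, a 2% shift in mean delay alone lowers the yield from 95% to roughly 87%, which is the kind of loss a delay-variation-aware constraint such as the one in OPT2 is meant to prevent.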

Hence, for nanoscale technologies, dual-$V_t$ design should consider the delay distribution of the circuit under process variation to ensure yield while minimizing leakage. The leakage power saving achieved by OPT2 is 55% (average) in 50-nm technology. However, it is only 11% (average) in 25-nm technology. This shows that in aggressively scaled technologies, dual-$V_t$ optimization results in almost negligible power savings if the yield constraint is enforced. Hence, the present way of realizing high-$V_t$ devices, which results in higher process variation, may not be suitable for reducing leakage in nanoscale technologies.

Fig. 14 compares the 95th-percentile leakage power savings achieved by CONV, OPT2, and OPT3 with respect to the 95th-percentile leakage power of a low-$V_t$ design for different benchmarks. OPT3 results in the best 95th-percentile leakage power while ensuring yield with respect to circuit delay for all technologies. As expected, in 90-nm technology the 95th-percentile leakage savings are almost the same for all the designs due to negligible intra-die process variation. However, in 50-nm technology, OPT3 results in an average of 13.4% and 7.3% extra leakage power savings compared to CONV and OPT2, respectively, and saves 67% leakage power compared to the all-low-$V_t$ design. This shows the importance of considering leakage variation in dual-$V_t$ optimization. In 25-nm technology, OPT3 results in an average 14% 95th-percentile leakage saving compared to the low-$V_t$ design, which is 2× higher than the leakage savings achieved by CONV. However, the total leakage power saving compared to the low-$V_t$ design is itself negligible.
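The reason the 95th-percentile leakage can differ between designs with similar nominal leakage can be sketched with a lognormal model. The lognormal assumption, the function name `leakage_p95`, and the numbers are all illustrative choices made here; they are not the leakage estimation algorithm of Section III-A.

```python
import math
from statistics import NormalDist

# Standard-normal quantile for the 95th percentile (about 1.645).
Z95 = NormalDist().inv_cdf(0.95)


def leakage_p95(median_uw: float, sigma_ln: float) -> float:
    """95th percentile of a lognormal leakage with the given median.

    sigma_ln is the standard deviation of ln(leakage).
    """
    return median_uw * math.exp(Z95 * sigma_ln)


# Same median leakage, different spread due to process variation.
tight = leakage_p95(10.0, 0.2)
wide = leakage_p95(10.0, 0.6)
```

With identical medians, the wider distribution has nearly twice the 95th-percentile leakage in this toy case, so an optimization that looks only at nominal leakage would not distinguish the two.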

V. METAL GATE AND WORK FUNCTION ENGINEERING

As expected, realizing high-$V_t$ by strengthening the halo doping concentration gives rise to noticeable junction BTBT leakage. This becomes more evident in future nanoscale technologies, where a higher baseline halo concentration is needed to suppress the worsening of $V_t$ roll-off and DIBL with device scaling. In technologies where one cannot afford a higher halo doping, high-$V_t$ devices can be realized by using metal gates (materials with higher work functions) without impacting the junction BTBT leakage and process variation [19]. Recently, metal gates have been explored not only to gain proper control over realizing high-$V_t$ devices, but also to achieve high performance while controlling short-channel effects. Aggressive scaling of the gate length and oxide thickness of devices exacerbates the problems of poly-Si gate depletion, high gate resistance, and boron penetration from the doped poly-Si gate into the channel region [19]. Poly depletion increases the effective oxide thickness, which in turn reduces the gate capacitance in the inversion region and hence the inversion charge density, leading to a lower gate overdrive and thus degrading device performance. Moreover, poly-Si has been reported to be incompatible with a number of high-k gate-dielectric materials, which are required to maintain reasonable gate leakage.

To show our results, we first designed an optimum low-$V_t$ 25-nm device that meets the ITRS roadmap by varying the metal gate work function along with device parameters such as the peak halo density and halo location. The devices having different $V_t$'s are then obtained by changing the gate work function. Fig. 15(a) plots the different leakage components in our optimized low/high-$V_t$ metal gate nMOS devices for 25-nm technology at 100 °C. It can be observed from the figure that subthreshold leakage dominates the total leakage in low-$V_t$ devices. Increasing the $V_t$ of the device (by changing the work function) reduces subthreshold leakage exponentially. It also decreases the gate leakage due to a reduction in both the oxide field and the inversion charge available for tunneling [20]. The junction BTBT leakage is almost insensitive to the change in $V_t$. Since metal gate devices require a lower baseline halo concentration to maintain SCE, they have lower junction BTBT and smaller $\sigma_{V_t}$ [due to random dopant fluctuation, Fig. 15(b)] than poly-Si gate devices. Moreover, both are insensitive to changes in $V_t$. A dual-$V_t$ optimization using OPT3 results in an average 44% reduction in leakage (optimum high-$V_t$ = 0.29 V, 120 mV higher than the low-$V_t$) while ensuring yield for all the ISCAS benchmarks designed using metal gate devices.

Fig. 14. 95th-percentile leakage power saving using CONV, OPT2, and OPT3 in (a) 90-, (b) 50-, and (c) 25-nm technology.

Fig. 15. Simulation results of 25-nm low/high-$V_t$ optimum metal gate devices. (a) Leakage components. (b) $\sigma_{V_t}$ due to random dopant fluctuation versus $V_t$.
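The exponential dependence of subthreshold leakage on $V_t$ can be made concrete with the standard first-order relation in which leakage falls by one decade per subthreshold swing of $V_t$ increase. The swing of 100 mV/decade used below is an assumed round number for illustration, not a value extracted from the devices in Fig. 15, and the function name is hypothetical.

```python
def subthreshold_ratio(delta_vt_mv: float, swing_mv_per_dec: float) -> float:
    """Subthreshold leakage relative to the low-Vt device after raising Vt.

    First-order model: leakage drops one decade per `swing_mv_per_dec` of Vt.
    """
    return 10.0 ** (-delta_vt_mv / swing_mv_per_dec)


# High-Vt 120 mV above low-Vt, with an assumed swing of 100 mV/decade.
ratio = subthreshold_ratio(120.0, 100.0)
```

Under these assumptions the subthreshold component drops to about 6% of its low-$V_t$ value. When junction BTBT does not grow with $V_t$, as is the case for the metal gate devices, this reduction carries over to total leakage far more effectively than with halo-doped high-$V_t$ devices.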

We can conclude from the previous discussion that metal gate work function engineering to realize high-$V_t$ devices is suitable for dual-$V_t$ design in 25-nm technology, while achieving high performance and the target yield. The most desirable metal gates should possess work functions close to the Si band edges for CMOSFETs. More importantly, these metal gates should be thermally stable so that a convenient process flow can be used for fabrication. However, it is extremely challenging to identify two thermally stable metal gates with the correct work functions. Furthermore, the method of preparing the metal gates is critical due to process-induced damage [21] and Fermi-level pinning. Many researchers have proposed different metal gates and fabrication processes to achieve these goals [19]–[21], and significant research is still under way.

VI. CONCLUSION

In this paper, we show that in the nanoscale regime, conventional dual-$V_t$ design suffers from yield loss due to process variation and considerably overestimates leakage savings since it does not take all leakage components into account. Our analysis shows the importance of device-based analysis when designing with low-power schemes like dual-$V_t$. It also shows that in nanoscale technology, statistical information on both leakage and delay helps in minimizing total leakage while ensuring a target yield with respect to a target delay in dual-$V_t$ designs. Our proposed device- and process-variation-aware simultaneous sizing and dual-$V_t$ design methodology results in 10%–20% extra leakage power savings compared to conventional dual-$V_t$ design while maintaining yield in 50-nm technology. However, nonscalability of the existing technique of realizing high-$V_t$ devices results in negligible power savings beyond 25-nm technology, even in our proposed device-aware dual-$V_t$ design. We show that the use of different process options, such as metal gate work function engineering, to realize high-$V_t$ devices will be helpful in achieving high-performance, low-leakage dual-$V_t$ designs in future scaled technologies.

REFERENCES

[1] S. Borkar, “Design challenges of technology scaling,” IEEE Micro, vol. 19, no. 4, pp. 23–29, Jul./Aug. 1999.

[2] S. M. Sze, Physics of Semiconductor Devices. New York: Wiley-Interscience, 1990.

[3] X. Tang, V. De, and J. D. Meindl, “Intrinsic MOSFET parameter fluctuations due to random dopant placement,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 4, pp. 369–376, Dec. 1997.

[4] P. Pant, R. K. Roy, and A. Chatterjee, “Dual-threshold voltage assignment with transistor sizing for low power CMOS,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 2, pp. 390–394, Apr. 2001.

[5] T. Karnik, Y. Yibin, J. Tschanz, W. Liqiong, S. Burns, V. Govindarajulu, V. De, and S. Borkar, “Total power optimization by simultaneous dual-Vt allocation and device sizing in high performance microprocessors,” in Proc. Design Autom. Conf., 2002, pp. 486–491.

[6] M. Ketkar and S. S. Sapatnekar, “Standby power optimization via transistor sizing and dual threshold voltage assignment,” in Proc. Int. Conf. Comput.-Aided Design, 2002, pp. 375–378.

[7] A. Srivastava, D. Sylvester, and D. Blaauw, “Statistical optimization of leakage power considering process variations using dual-Vth and sizing,” in Proc. Design Autom. Conf., 2004, pp. 773–778.

[8] A. Agarwal, K. Kang, S. K. Bhunia, J. D. Gallagher, and K. Roy, “Effectiveness of low power dual-$V_t$ designs in nanoscale technologies under process parameter variations,” in Proc. Int. Symp. Low Power Electron. Design, 2005, pp. 14–19.

[9] K. A. Bowman, S. G. Duvall, and J. D. Meindl, “Impact of die-to-die and within die parameter fluctuations on the maximum clock frequency distribution for gigascale integration,” IEEE J. Solid-State Circuits, vol. 37, no. 2, pp. 183–190, Feb. 2002.

[10] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 1998.

[11] Microsystems Technology Laboratory, MIT, Boston, MA, “Well-tempered bulk-Si nMOSFET device home page,” [Online]. Available: http://www-mtl.mit.edu/Well

[12] AVANT! Corp., Fremont, CA, “MEDICI: Two-dimensional semiconductor device simulation program,” 2000.

[13] K. Roy, S. Mukhopadhyay, and H. Mahmoodi, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.

[14] K. Roy and S. Prasad, Low Power CMOS VLSI Circuit Design. New York: Wiley-Interscience, 2000.

[15] S. C. Schwartz, “On the distribution function and moments of power sums with lognormal components,” Bell Syst. Tech. J., vol. 61, pp. 1441–1462, Sep. 1982.

[16] K. Kang, B. C. Paul, and K. Roy, “Statistical timing analysis using levelized covariance propagation considering systematic and random variations of process parameters,” ACM Trans. Design Autom. Electron. Syst., vol. 11, no. 4, pp. 848–879, Oct. 2005.

[17] A. Agarwal, D. Blaauw, V. Zolotov, S. Sundareswaran, M. Zhao, K. Gala, and R. Panda, “Path-based statistical timing analysis considering inter- and intra-die correlations,” in Proc. TAU, 2002, pp. 16–21.

[18] T. Sakurai and R. Newton, “Delay analysis of series-connected MOSFET circuits,” IEEE J. Solid-State Circuits, vol. 26, no. 2, pp. 122–131, Feb. 1991.

[19] D.-G. Park, “Thermally robust dual-work function ALD-MN MOSFET using conventional CMOS process flow,” in Proc. IEEE VLSI Technol. Symp., 2004, pp. 186–187.

[20] Y.-T. Hou, M. Li, T. Low, and K. Dim-Lee, “Metal gate work function engineering on gate leakage of MOSFET,” IEEE Trans. Electron Devices, vol. 51, no. 11, pp. 1783–1789, Nov. 2004.

[21] H. Y. Yu, C. Ren, Y. C. Yeo, J. F. Kang, X. P. Wang, H. H. H. Ma, M. F. Li, D. S. H. Chan, and D.-L. Kwong, “Fermi pinning-induced thermal instability of metal gate work functions,” IEEE Electron Device Lett., vol. 25, no. 5, pp. 123–125, May 2004.

[22] A. Agarwal, K. Kang, and K. Roy, “Accurate estimation and modeling of total chip leakage considering inter- and intra-die process variations,” in Proc. Int. Conf. Comput.-Aided Design, 2005, pp. 736–741.

[23] M. Chen, “Back-gate bias enhanced band-to-band tunneling leakage in scaled MOSFET,” IEEE Electron Device Lett., vol. 19, no. 4, pp. 134–136, Apr. 1998.

[24] B. Yu, “50 nm gate-length CMOS transistor with super-halo: Design, process, and reliability,” in Proc. Int. Electron Devices Meeting, 1999, pp. 653–656.

[25] J. Le, “STAC: Statistical timing analysis with correlation,” in Proc. Design Autom. Conf., 2004, pp. 343–348.

[26] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, “Statistical analysis of subthreshold leakage current for VLSI circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 2, pp. 131–139, Feb. 2004.

[27] S. Zhang, V. Wason, and K. Banerjee, “A probabilistic framework to estimate full-chip subthreshold leakage power distribution considering within-die and die-to-die P-T-V variations,” in Proc. ISLPED, 2004, pp. 156–161.

[28] S. Narendra, V. De, S. Borkar, D. A. Antoniadis, and A. P. Chandrakasan, “Full-chip subthreshold leakage power prediction and reduction techniques for sub 0.18-μm CMOS,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 501–510, Mar. 2004.

Amit Agarwal (M’04) received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Kanpur, India, in 2000, and the M.S. and Ph.D. degrees in electrical and computer engineering from Purdue University, Lafayette, IN, in 2001 and 2006, respectively. His Ph.D. dissertation focused on an integrated device/circuit/architecture approach to low-power, high-performance, process-tolerant cache and register file design.

Since graduation, he has been with Intel Corporation’s Circuit Research Laboratories, Microprocessor Technology Laboratories, Hillsboro, OR, as a Circuit Research Engineer, working on high-performance, leakage-tolerant, and low-leakage register file design. He was with SGI Microprocessor Design Center, Boston, MA, in the summer of 2001, working on signal integrity as a summer intern. His research interests include low-power, high-performance, and process-tolerant cache, register file, and reconfigurable architecture design.

Mr. Agarwal was a recipient of the Certificate of Academic Excellence for the years 1996-1997, 1997-1998, and 1998-1999, a fellowship from Purdue University for the year 2000-2001, and an Intel Ph.D. fellowship at Purdue University for the year 2004-2005.

Kunhyuk Kang (S’04) received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2002, and the M.S. degree in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 2003. He is currently pursuing the Ph.D. degree in electrical engineering at Purdue University, West Lafayette, IN.

In the summer of 2005, he was an Intern with Texas Development Center, Intel Corporation, Austin, TX, where he performed research on static timing analysis (STA) algorithms. His research interests include design for manufacturability/reliability, modeling and design methodology for failure/fault tolerance, and statistical CAD algorithms under process variation.


Swarup Bhunia (S’00–M’05) received the B.E. (honors) degree from Jadavpur University, Kolkata, India, in 1995, the M.Tech. degree from the Indian Institute of Technology (IIT), Kharagpur, India, in 1997, and the Ph.D. degree from Purdue University, West Lafayette, IN, in 2005.

Currently, he is an Assistant Professor with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH. He has worked in the semiconductor industry on RTL synthesis, verification, and low power design for about three years.

Dr. Bhunia was a recipient of the 2005 SRC Technical Excellence Award as a team member, a Best Paper Award in the International Conference on Computer Design (ICCD 2004), a Best Paper Award in the Latin American Test Workshop (LATW 2003), and a Best Paper Nomination in the Asia and South Pacific Design Automation Conference (ASP-DAC 2006). He has served in the technical program committee of the Design Automation and Test Conference in Europe (DATE 2006-2007), the International Symposium on Low Power Electronics and Design (ISLPED 2007), the Test Technology Educational Program (TTEP 2006-2007), and in the program committee of the International Online Test Symposium (IOLTS 2005).

James D. Gallagher received the B.S. degree in electrical and computer engineering (highest honors) from Rutgers University, Piscataway, NJ, in 2003, and the M.S. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2005, after doing research in low power nano-scale devices.

He is currently working as a Design Engineer with the Enterprise Microprocessor Group, Intel, Hillsboro, OR.

Kaushik Roy (M’83–SM’95–F’02) received the B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, India, and the Ph.D. degree from the electrical and computer engineering department of the University of Illinois at Urbana-Champaign, Urbana, in 1990.

He joined the Electrical and Computer Engineering Faculty, Purdue University, West Lafayette, IN, in 1993, where he is currently a Professor and holds the Roscoe H. George Chair of Electrical and Computer Engineering. He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, TX, where he worked on FPGA architecture development and low-power circuit design. His research interests include VLSI design/CAD for nano-scale silicon and nonsilicon technologies, low-power electronics for portable computing and wireless communications, VLSI testing and verification, and reconfigurable computing. He has published more than 400 papers in refereed journals and conferences, holds 8 patents, and is the co-author of two books on Low Power CMOS VLSI Design (Wiley).

Dr. Roy was a recipient of the National Science Foundation Career Development Award in 1995, the IBM Faculty Partnership Award, the ATT/Lucent Foundation Award, the 2005 SRC Technical Excellence Award, the SRC Inventors Award, and the Best Paper Awards at the 1997 International Test Conference, the IEEE 2000 International Symposium on Quality of IC Design, the 2003 IEEE Latin American Test Workshop, the 2003 IEEE Nano, the 2004 IEEE International Conference on Computer Design, the 2006 IEEE/ACM International Symposium on Low Power Electronics and Design, and the 2005 IEEE Circuits and System Society Outstanding Young Author Award (Chris Kim), the 2006 IEEE TRANSACTIONS ON VLSI SYSTEMS Best Paper Award. He is a Purdue University Faculty Scholar. He is the Chief Technical Advisor of Zenasis Inc. and the Research Visionary Board Member of Motorola Laboratories (2002). He has been on the editorial board of IEEE DESIGN AND TEST, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. He was a Guest Editor for a Special Issue on Low-Power VLSI in the IEEE DESIGN AND TEST (1994) and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (June 2000), and IEE Proceedings, Computers and Digital Techniques (July 2002).

