1720 IEEE TRANSACTIONS ON COMPONENTS, PACKAGING AND MANUFACTURING TECHNOLOGY, VOL. 3, NO. 10, OCTOBER 2013

Tier Adaptive Body Biasing: A Post-Silicon Tuning Method to Minimize Clock Skew Variations in 3-D ICs

Kwanyeob Chae, Student Member, IEEE, Xin Zhao, Student Member, IEEE, Sung Kyu Lim, Senior Member, IEEE, and Saibal Mukhopadhyay, Senior Member, IEEE

Abstract— In this paper, we analyze the variability in a 3-D clock network designed with single and multiple through-silicon vias and present a post-silicon tuning methodology, called tier adaptive body biasing (TABB), to reduce skew and data path variability in 3-D clock trees. TABB uses specialized on-die sensors to independently detect the process corners of n-channel metal–oxide–semiconductor (nMOS) and p-channel metal–oxide–semiconductor (pMOS) devices and accordingly tune the body biases of nMOS/pMOS devices to reduce the clock skew variability. We also present the system architecture of TABB and circuit techniques for the on-die sensors. Circuit-level simulation and statistical analysis of the TABB architecture in a predictive 45-nm technology demonstrate the effectiveness of TABB in reducing the clock skew variability considering the data path variability in 3-D ICs.

Index Terms— Adaptive body bias, clock skew, process variation, 3-D integration.

I. INTRODUCTION

DIE-TO-DIE (D2D) and within-die (WID) variations in process parameters can lead to significant chip-to-chip variations in delay and power dissipation of ICs [1]–[3]. In 2-D ICs, within-chip variation is determined by WID variations only. A three-dimensional (3-D) IC is composed of separate dies from different wafers and lots [4]. Therefore, in a 3-D IC, both WID and D2D variations contribute to within-chip variations [5]–[9]. Moreover, variations in the RC properties of through-silicon vias (TSVs) also add to the total delay variations in 3-D ICs [6]–[9]. Hence, methodologies are required to reduce the effect of within-chip and chip-to-chip variations in 3-D ICs.

The performance and functionality of a digital circuit depend on the variations in logic delays and clock skews. The clock skew is defined as the difference between the arrival times of the clock signal at different flip-flops. A higher clock skew worsens the performance and/or robustness of a design.

Manuscript received July 11, 2012; revised November 5, 2012; accepted December 17, 2012. Date of publication January 29, 2013; date of current version September 30, 2013. This work was supported in part by the Semiconductor Research Corporation under Grant #1836.075 and the National Science Foundation under Grant CCF-0917000. Recommended for publication by Associate Editor D. G. Kam upon evaluation of reviewers’ comments.

The authors are with the School of ECE, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCPMT.2013.2238581

In 2-D ICs, WID variations change the delay difference between various branches of the clock tree, leading to increased clock skews. D2D variation changes the delay of the entire clock tree and, hence, does not affect the clock skew significantly. In contrast, clock skews in 3-D ICs are affected by both D2D and WID variations, as both lead to within-chip variations.
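The distinction between D2D and WID effects on skew can be illustrated with a toy Monte-Carlo model. The sketch below is not the clock-tree simulation used in this paper: it assumes that every sink has the same nominal latency (500 ps), that each die receives one Gaussian D2D factor and each sink an independent Gaussian WID factor, and it takes the 3-D skew to be the skew computed over the sinks of both dies together. All parameter values and helper names are hypothetical.

```python
import random
import statistics

def skew(arrivals_ps):
    """Skew of a set of sinks: latest minus earliest clock arrival time (ps)."""
    return max(arrivals_ps) - min(arrivals_ps)

def sample_two_dies(rng, d2d_sigma, wid_sigma, sinks_per_die=50, nominal_ps=500.0):
    """Toy model (assumption): all sinks share one nominal latency; each die gets
    one Gaussian D2D factor and each sink an independent Gaussian WID factor."""
    dies = []
    for _ in range(2):
        d2d = rng.gauss(0.0, d2d_sigma)  # shifts every sink on this die together
        dies.append([nominal_ps * (1.0 + d2d + rng.gauss(0.0, wid_sigma))
                     for _ in range(sinks_per_die)])
    return dies

def mean_skews(d2d_sigma, wid_sigma, trials=1000, seed=1):
    """Mean 2-D skew (within die 1) and mean 3-D skew (over both dies)."""
    rng = random.Random(seed)
    skews_2d, skews_3d = [], []
    for _ in range(trials):
        die1, die2 = sample_two_dies(rng, d2d_sigma, wid_sigma)
        skews_2d.append(skew(die1))
        skews_3d.append(skew(die1 + die2))
    return statistics.mean(skews_2d), statistics.mean(skews_3d)

for d2d in (0.0, 0.05, 0.15):
    s2d, s3d = mean_skews(d2d, wid_sigma=0.01)
    print(f"D2D sigma={d2d:.0%}: mean 2-D skew={s2d:5.1f} ps, mean 3-D skew={s3d:5.1f} ps")
```

With a fixed seed, the WID draws are the same for every D2D setting in this sketch, so the mean 2-D skew is unchanged (up to rounding) because the shared die factor cancels in max − min, whereas the mean 3-D skew grows with the D2D σ.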

The history of variation-aware 3-D clock network design is short. Zhao et al. [10], [11] investigated the random effects of TSVs on clock skew uncertainty and analyzed the impact of WID and D2D process variations on 3-D clock performance. Their experiments indicated that a 3-D clock network using multiple TSVs can decrease clock skew variations by using fewer buffers and shorter interconnects. In addition, Xu et al. [12] proposed a statistical clock skew model for a regular 3-D H-tree considering WID and D2D variations in buffers. The use of clock TSV redundancy in a 3-D clock network for fault-tolerant design has also been explored [13].

Adaptive voltage scaling (AVS) and adaptive body biasing (ABB) are widely used to offset D2D variations through post-silicon tuning [2], [3]. In AVS, a higher VDD is assigned to the slower die (to improve speed), and a lower VDD is assigned to the faster die (to save power) [2]. We have investigated the use of AVS for reducing logic delay variability in 3-D ICs [14]. However, AVS for clock networks with multiple clock TSVs is challenging because all clock TSVs will require level shifters, which introduce an additional source of delay variation (i.e., skew) and power overhead. The second approach is to use ABB, where forward body bias is applied to slow dies and reverse body bias is applied to fast dies [3]. ABB has a significant advantage over AVS for 3-D clock networks, as body biasing does not require a different VDD for each die. Hence, the signals between different dies can be interfaced without level shifters. Kim et al. [15] studied the use of ABB combined with a die-matching strategy to reduce 3-D skew variation. However, they focused only on reducing the delay difference between dies (to reduce skew), not the delay variation itself. For example, if two dies are equally slow (or fast), zero bias is applied to both dies since their delay difference is minimal. However, because the delay variations are not compensated, the chip-to-chip spread in clock slew and logic delay is not reduced, leading to yield loss.

In this paper, we analyze the effects of D2D and WID variability on the clock skew in a 3-D clock tree and present tier adaptive body biasing (TABB), a post-silicon tuning method to reduce clock skew variations in 3-D ICs. A system architecture is presented to independently sense the process variations in p-channel metal–oxide–semiconductor (pMOS) and n-channel metal–oxide–semiconductor (nMOS) devices using on-chip delay-based sensors and to adapt the body bias of the nMOS/pMOS devices of each tier to mitigate the impact of process variations. The effectiveness of the approach is demonstrated through statistical simulations considering D2D and WID variations on example 3-D clock trees with different numbers of TSVs in a predictive 45-nm node. Body bias tuning helps mitigate the effect of tier-to-tier process shifts and reduces clock skew variations. The clock slew variation is also reduced, as separate body biasing for nMOS and pMOS transistors compensates the VTH skew between nMOS and pMOS transistors. Moreover, we show that TABB helps reduce variations in the power of the clock network and reduces the delay variability of logic paths. The application of ABB to reduce clock skew/slew, dynamic/static power, and logic path delay variations is a unique contribution of this paper.

2156-3950 © 2013 IEEE

TABLE I
PARAMETERS USED IN SIMULATION

Parameter | Description
Process model | 45-nm NCSU PTM model [16]
Threshold voltage (VTH) | nMOS: VTH = 0.471 V; pMOS: VTH = 0.423 V
Wire | r = 0.1 Ω/μm, c = 0.2 fF/μm
TSV | RC π model: RTSV = 50 mΩ, CTSV = 15 fF (CTOP = 7.5 fF, CBOTTOM = 7.5 fF)
[D2D σ, WID σ] (VTH, wire, and TSV) | [5%, 15%], [10%, 10%], [15%, 5%]
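One way to turn the [D2D σ, WID σ] settings of Table I into Monte-Carlo samples is sketched below. The paper does not spell out its sampling model, so the following is an assumption: the relative σ values are applied multiplicatively to the nominal VTH, with one Gaussian D2D shift shared by all devices on a die and an independent Gaussian WID shift per device. The helper names are hypothetical.

```python
import math
import random

# Nominal |VTH| values from Table I (V).
VTH0 = {"nmos": 0.471, "pmos": 0.423}
# The three [D2D sigma, WID sigma] settings from Table I (relative values).
SETTINGS = [(0.05, 0.15), (0.10, 0.10), (0.15, 0.05)]

def sample_die_vth(rng, vth0, d2d_sigma, wid_sigma, n_devices):
    """Assumed model: one Gaussian D2D shift shared by every device on the die,
    plus an independent Gaussian WID shift per device, both relative to vth0."""
    die_shift = rng.gauss(0.0, d2d_sigma)
    return [vth0 * (1.0 + die_shift + rng.gauss(0.0, wid_sigma))
            for _ in range(n_devices)]

def same_die_correlation(d2d_sigma, wid_sigma):
    """Correlation between two devices on the same die under the model above."""
    return d2d_sigma ** 2 / (d2d_sigma ** 2 + wid_sigma ** 2)

rng = random.Random(7)
for d2d, wid in SETTINGS:
    total = math.hypot(d2d, wid)  # total relative sigma seen by one device
    vths = sample_die_vth(rng, VTH0["nmos"], d2d, wid, n_devices=4)
    print(f"[{d2d:.0%}, {wid:.0%}] total sigma={total:.1%}, "
          f"same-die correlation={same_die_correlation(d2d, wid):.2f}, "
          f"sample nMOS VTH={[round(v, 3) for v in vths]}")
```

Under this model, two devices on the same die have a correlation of d²/(d² + w²), i.e., 0.1, 0.5, and 0.9 for the three settings, which indicates how much of each device's variation a single per-tier correction can reach.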

The rest of the paper is organized as follows. Section II analyzes skew variations of 3-D clock networks; Section III discusses the TABB architecture; Section IV presents the simulation results and discussion; and Section V summarizes the paper.

II. ANALYSIS OF 3-D CLOCK NETWORKS UNDER VARIATIONS

We generated the 3-D clock trees used in our study using the synthesis method presented by Zhao et al. [10]. Given a set of clock nodes (i.e., clock inputs of flip-flops) distributed into two dies and a clock source, the goal is to build a single tree that connects all the nodes to the source so that the skew and the total power consumption are minimized. TSVs are used to connect the nodes in different dies. We use the IBM r4 benchmark design, which has 1000 clock nodes. The location and the input capacitance of the clock nodes as well as the RC parasitics of clock wires, TSVs, and buffers are given as input. The input capacitance of the clock nodes in this tree varies from 30 to 60 fF. All design and simulations are performed considering the predictive 45-nm technology. The design/simulation parameters for devices, wires, and TSVs are shown in Table I. Three different types of 3-D clock networks were designed with 1 (type 1), 10 (type 2), and 100 (type 3) TSVs to observe how the 3-D clock skew varies with D2D variation, as illustrated in Fig. 1(a)–(c), respectively. The size of each die is 10 × 10 mm. In clock network type 1, each die has a complete clock network, and the two networks are connected at the clock source through a single TSV. Clock network types 2 and 3 have multiple TSVs, with a main clock network in die 1 (a complete 2-D network) and subclock networks in die 2. The subclock networks in die 2 are connected through the clock TSVs from the branches in the middle of the main clock network in die 1. The type 2 clock network has 10 TSVs and type 3 has 100 TSVs, so the subclock networks in type 3 are much smaller than those in type 2. Hence, the network latencies of the 100 subclock networks in type 3 are much shorter than those of the 10 subclock networks in type 2; the clock latency in die 2 is the highest for type 1.

Fig. 1. Three different types of 3-D clock networks. (a) Type 1 (1 TSV). (b) Type 2 (10 TSVs). (c) Type 3 (100 TSVs).


Fig. 2. Skew histograms: baseline clock skew without process variations for clock network (a) type 1, (b) type 2, and (c) type 3.

The baseline values of clock skew, computed without process variations, are shown in Fig. 2. Fig. 2(a) shows that the 2-D skews of the two dies are independent of each other in clock network type 1. In contrast, the 2-D skews of the two dies are similar to each other in clock network types 2 and 3. Since the subclock networks in die 2 of types 2 and 3 are connected from the branches of the main clock network of die 1, the skews of the subclock networks in die 2 are affected by the skew of the main clock network in die 1. The clock network latencies of various clock sinks in die 1 and die 2 are shown in Fig. 3 (not considering variation), and the correlation coefficient (ρ) between the latencies of die 1 and die 2 is calculated. As expected from the preceding discussion, clock network type 1 has the lowest ρ (0.1727). In clock network types 2 and 3, on the other hand, the subclock networks share a common path with the main clock network. Hence, the correlation between the skews of die 1 and die 2 is much higher. The correlation is the highest for type 3, as it shares the longest common paths with the main clock network in die 1. From this result, we conjecture that the skew performance of the main clock network becomes more important as the number of clock TSVs increases (i.e., as the subclock networks get smaller, or as the common path gets longer).
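The coefficient ρ is the ordinary Pearson correlation between paired latencies. The sketch below computes it for synthetic latency pairs rather than for the r4 clock trees: each die-2 sink is assumed to share an upstream path with one die-1 sink, and the hypothetical parameter `shared` sets the fraction of latency variance carried by that common path, so the expected ρ equals `shared`. The "type-like" labels are only loose analogies to the three network types.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def paired_latencies(rng, shared, n_sinks=1000, base_ps=500.0, spread_ps=40.0):
    """Hypothetical sink pairs: a common upstream term plus a private local term,
    weighted so that the expected correlation equals `shared`."""
    a, b = math.sqrt(shared), math.sqrt(1.0 - shared)
    die1, die2 = [], []
    for _ in range(n_sinks):
        common = rng.gauss(0.0, spread_ps)  # delay of the shared upstream path
        die1.append(base_ps + a * common + b * rng.gauss(0.0, spread_ps))
        die2.append(base_ps + a * common + b * rng.gauss(0.0, spread_ps))
    return die1, die2

rng = random.Random(3)
for label, shared in [("type-1-like", 0.1), ("type-2-like", 0.6), ("type-3-like", 0.9)]:
    d1, d2 = paired_latencies(rng, shared)
    print(f"{label}: rho = {pearson(d1, d2):.3f}")
```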

Fig. 3. Correlation coefficient (ρ) between the latencies of die 1 and die 2 for clock network (a) type 1, (b) type 2, and (c) type 3, not considering process variation.

The effect of process variations on the skew characteristics is studied next. Fig. 4 illustrates the skew variability in clock network types 1, 2, and 3 for different D2D and WID variations. In terms of 2-D skew variation, all clock network types show the same trend: as the WID variation becomes stronger, the 2-D skew variation increases. From these results, we conclude that WID variation is the dominant factor that determines the level of 2-D skew variation. Clock network type 1 showed extremely high 3-D skew variation even under the low D2D variation condition (5% D2D variation). Since the type 1 clock network in die 1 does not share a common path with the clock network in die 2, it showed the worst 3-D skew variation. In addition, as the impact of D2D variation gets stronger, the 3-D clock skew variations of clock network types 1 and 2 increase distinctly. This implies that D2D variation strongly impacts the skew variations of the 3-D clock network. However, as the number of clock TSVs increases (i.e., as the common clock path gets longer), the impact of D2D variation on skew variation becomes weaker, as illustrated in Fig. 4. We observe that the 3-D skew variation is the maximum for type 1 and the minimum for type 3. As the impact of D2D variation decreases, the impact of WID variation on 3-D skew becomes observable. For example, Fig. 4 shows that the variations in 3-D skew and 2-D skew are comparable for clock network type 3 when the D2D variation is weak; as the D2D variation increases, the 3-D skew variation dominates the 2-D skew variation. In summary, an excessive number of clock TSVs reduces 3-D skew variations, at the expense of additional area overhead for TSVs and test clock routing for separate die tests. It could also cause yield problems due to the TSV yield: a larger number of TSVs increases the probability of a failure in the clock network. If the D2D variation can be compensated, the potential performance loss can be minimized even with a small number of clock TSVs.

III. TIER-ADAPTIVE BODY BIASING

We propose tier-adaptive body biasing (TABB) to compensate for the D2D variation and reduce 3-D clock skew. The basic approach is to detect the global threshold voltage shift in each die. Forward body bias (FBB) is applied to a slow die to reduce VTH and improve performance, while reverse body bias (RBB) is applied to a fast die to increase VTH and make it slower. Independent body bias levels are required to compensate for the VTH shifts in nMOS and pMOS devices.

Fig. 4. 2-D and 3-D skew distributions of specific points in the clock network according to different variations.

A. System Architecture

The system architecture of TABB is shown in Fig. 5. Each tier includes sensors to independently detect the threshold voltage shifts in nMOS and pMOS devices. The variation sensors are enabled during power-up, and, based on their outputs, a voltage regulator (body bias regulator) sets the body voltages for the nMOS and pMOS transistors in each tier separately. Note that all nMOS devices in a tier receive the same body bias, as do all pMOS devices in a tier. In this paper, we assume an off-chip power management IC to generate the body bias voltages. On-chip body bias generators, such as those presented in [29], can also be used for this purpose. We bounded the body bias for nMOS and pMOS transistors within +0.3 and −0.3 V, respectively. The limiting factors of FBB are the increased subthreshold leakage current and the potential for forward bias current through the body-to-source diode. The limiting factors of RBB are the stronger short-channel effect and the higher junction tunneling current in nanometer technologies. We further explore two ABB options. First, both FBB and RBB are considered. However, RBB is only possible when the voltage regulator can provide a negative voltage for nMOS transistors and a voltage higher than VDD for pMOS transistors. Since generating a negative voltage or a voltage higher than VDD is more complex (specifically, for on-chip generators), we also consider the option of using only FBB and nominal or zero body bias (ZBB).
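The bias-selection rule described above can be sketched as a small function. This is an illustration under stated assumptions, not the paper's regulator logic: the body effect is linearized with a hypothetical coefficient `K_BODY`, both forward and reverse bias magnitudes are clamped at 0.3 V, the output is snapped to the 50-mV resolution used in Section IV, and the VTH shift of each device type is assumed to be known exactly (in TABB it is inferred from the sensor codes). Setting `allow_rbb=False` gives the FBB/ZBB option.

```python
VDD = 1.0          # supply voltage (V), the typical condition quoted in Section IV
BIAS_LIMIT = 0.3   # assumed magnitude limit for forward and reverse bias (V)
STEP = 0.05        # 50-mV regulator resolution used in Section IV
K_BODY = 0.15      # HYPOTHETICAL VTH change per volt of body bias (V/V)

def forward_amount(delta_vth, allow_rbb):
    """Forward-bias amount (V) that would cancel a tier-wide VTH shift.
    delta_vth > 0 means a slow (high-VTH) tier; a negative result is reverse bias."""
    amount = delta_vth / K_BODY            # linearized body effect (assumption)
    if not allow_rbb:
        amount = max(amount, 0.0)          # FBB/ZBB: fast tiers stay at zero bias
    amount = max(-BIAS_LIMIT, min(BIAS_LIMIT, amount))
    return round(amount / STEP) * STEP     # snap to the nearest 50-mV step

def tier_body_biases(dvth_n, dvth_p, allow_rbb=True):
    """Return (VBN, VBP) for one tier, in volts."""
    vbn = forward_amount(dvth_n, allow_rbb)         # nMOS FBB raises the p-body above 0 V
    vbp = VDD - forward_amount(dvth_p, allow_rbb)   # pMOS FBB lowers the n-well below VDD
    return round(vbn, 3), round(vbp, 3)

# A tier with slow nMOS (+20 mV) and fast pMOS (-15 mV):
print(tier_body_biases(0.020, -0.015))                   # -> (0.15, 1.1)
print(tier_body_biases(0.020, -0.015, allow_rbb=False))  # -> (0.15, 1.0)
```

In the FBB/RBB case, the fast pMOS receives an n-well voltage above VDD, which is exactly the regulator capability that the FBB/ZBB option avoids.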

B. D2D Variation Sensor

We develop a D2D variation sensor based on the principle of ring oscillator (RO)-type sensors. The frequency of a ring oscillator changes due to process variations, and this signature can be detected using a counter. An RO-type sensor can be easily implemented with digital components, and its outputs are also digital and hence easy to use [17]–[23]. The effect of WID variations can be minimized, improving the accuracy of D2D detection, by increasing the number of stages in the ring oscillator, which helps average out the random WID variation across the stages [14], [20]. However, in TABB, we need to detect the D2D variation of nMOS and pMOS devices independently. Since the delay of an RO is affected almost equally by nMOS and pMOS transistors, it is difficult to determine the VTH shifts in nMOS and pMOS devices separately. This could result in an incorrect assignment of nMOS and pMOS body biases, reducing the effectiveness of TABB. Further, in a clock network, a larger difference between the effective strengths of nMOS and pMOS devices can worsen the clock slew rate. Iizuka et al. [21], [22] proposed an effective all-digital method to measure the performance variation of nMOS and pMOS devices separately by counting the number of pulses vanishing to 0 or 1 in a buffer ring. However, this method requires an additional calculation step to solve equations for obtaining the final results. The method proposed by Zhang [23] for characterizing the rise and fall times of standard cells requires analog circuits and a complex measurement procedure.

[Figure: an off-chip power management IC receives the outputs of the die 1 and die 2 nMOS and pMOS sensors (enabled by EN) and supplies the body biases VBN1/VBP1 to die 1 and VBN2/VBP2 to die 2; labels include "Body-bias for nMOS transistors," "Body-bias for pMOS transistors," and VID.]

Fig. 5. TABB system.
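The RO principle this subsection starts from, and the averaging effect of using more stages, can be illustrated with a toy model. In the sketch below, the assumptions are that each stage delay scales with (1 + D2D shift + WID shift), that the WID shifts are independent Gaussians (σ = 15%, the largest WID value in Table I), and that a hypothetical 1-μs enable window feeds the counter. The counter code drops for a slow die, while the WID-induced period spread shrinks roughly as σ_WID/√N with the number of stages N.

```python
import random
import statistics

def ro_period(rng, n_stages, d2d, wid_sigma, t_stage_ps=20.0):
    """Oscillation period (ps) of a toy N-stage RO: each stage delay carries the
    die-wide shift d2d plus its own random WID shift; a full period needs two passes."""
    return 2.0 * sum(t_stage_ps * (1.0 + d2d + rng.gauss(0.0, wid_sigma))
                     for _ in range(n_stages))

def counter_code(period_ps, window_ps=1_000_000.0):
    """Oscillations counted during a fixed enable window: the sensor's digital output."""
    return int(window_ps // period_ps)

def wid_noise(n_stages, wid_sigma=0.15, trials=500, seed=5):
    """Relative spread of the period caused by WID variation alone (d2d = 0)."""
    rng = random.Random(seed)
    periods = [ro_period(rng, n_stages, 0.0, wid_sigma) for _ in range(trials)]
    return statistics.stdev(periods) / statistics.mean(periods)

for n in (11, 51, 101):
    print(f"{n:3d} stages: WID-induced period spread = {wid_noise(n):.2%}")
```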

In this paper, we modify the RO-type D2D sensor to sense the delay variations of nMOS and pMOS transistors separately, without a post-calculation step, a complex detection process, or sophisticated analog circuits (Fig. 6). The nMOS variation sensor is composed of inverters with a pull-down network of stacked long-channel nMOS (Wn/Ln) transistors and a pull-up network with a single pMOS transistor (Wp0/Lp0). When the enable signal is high, the nMOS variation sensor oscillates with a frequency that is a strong function of the speed of the nMOS transistors. This is because, due to the higher stack height, the fall time through the pull-down network dominates the rise time. For the pMOS variation sensor, the inverter is composed of a pull-up network of stacked long-channel pMOS (Wp/Lp) transistors and a pull-down network with a single nMOS transistor (Wn0/Ln0). In this case, the rise time through the pull-up network is longer, and hence, the pMOS variation dominates the frequency.
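Why stacking long-channel nMOS devices makes the period track the nMOS corner can be seen from the alpha-power-law delay model of Sakurai and Newton, in which an edge delay scales as (stack · L/W)/(VDD − VTH)^α. The sketch below compares the share of VTH sensitivity contributed by the pull-down for a balanced inverter and for a sensor-like stage. The value α = 1.3, the 0.5 pMOS drive ratio, and all widths are hypothetical values chosen for illustration; only the nominal VTH values come from Table I.

```python
VDD, VTHN0, VTHP0 = 1.0, 0.471, 0.423   # supply and nominal |VTH| from Table I (V)
ALPHA = 1.3                              # HYPOTHETICAL alpha-power-law index
P_MOBILITY = 0.5                         # HYPOTHETICAL pMOS/nMOS drive-strength ratio

def edge_delay(length_nm, width_nm, stack, vth, mobility=1.0):
    """Alpha-power-law edge delay (relative units): grows with stack * L / W
    and falls with (VDD - VTH)^ALPHA."""
    return stack * (length_nm / width_nm) / (mobility * (VDD - vth) ** ALPHA)

def nmos_selectivity(ln_nm, n_stack, wn_nm=400.0, lp_nm=50.0, wp_nm=800.0):
    """Fraction of the stage's VTH sensitivity contributed by the nMOS pull-down."""
    t_fall = edge_delay(ln_nm, wn_nm, n_stack, VTHN0)
    t_rise = edge_delay(lp_nm, wp_nm, 1, VTHP0, mobility=P_MOBILITY)
    # For t = k / (VDD - VTH)^ALPHA, |dt/dVTH| = ALPHA * t / (VDD - VTH).
    sens_n = ALPHA * t_fall / (VDD - VTHN0)
    sens_p = ALPHA * t_rise / (VDD - VTHP0)
    return sens_n / (sens_n + sens_p)

print(f"balanced inverter (50 nm, single nMOS): {nmos_selectivity(50.0, 1):.2f}")
print(f"nMOS sensor stage (250 nm, 2-stack):    {nmos_selectivity(250.0, 2):.2f}")
```

With these numbers, the nMOS share is roughly 0.55 for the balanced inverter, consistent with an RO that is affected almost equally by both device types, and above 0.9 for the two-stack, 250-nm stage.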

Fig. 6. Modified RO-based (a) nMOS and (b) pMOS variation sensors.

Fig. 7. Correlation between the normalized nMOS or pMOS delay impacted by D2D variation and the normalized output of (a) the nMOS sensor and (b) the pMOS sensor according to the channel length and the transistor stack. (c) Detailed correlation analysis for the nMOS sensor at points A and B.

Fig. 7(a) shows that the correlation between the nMOS speed and the sensor output increases with the nMOS channel length and the stack height. As shown in Fig. 7(c), for the nMOS sensor, the measured correlation factor at A was 0.280 with a short channel length (50 nm) and a single (unstacked) transistor. At B, with a long channel length (250 nm) and a two-transistor nMOS stack, the correlation factor increases to 0.915. Likewise, Fig. 7(b) shows that a longer channel length and a higher stack of pMOS transistors increase the correlation between the pMOS process corner and the sensor output: from A to B in Fig. 7(b), the correlation factor increases from 0.69 to 0.918. The correlation factor can be further increased by increasing the number of stages. Next, the size of the pull-up PFET in the nMOS variation sensor and the size of the pull-down NFET in the pMOS variation sensor are optimized to improve the correlation factor. For the nMOS variation sensor, if the pMOS transistor is too small, the pull-up delay becomes large, and the pMOS speed introduces noise at the sensor output. On the other hand, when the pMOS transistor is too large, the contention between the pull-down and pull-up networks becomes strong, which also degrades the sensitivity of the total delay to the nMOS process corner. A similar explanation holds for the pMOS variation sensor.
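The correlation factor of Fig. 7 can be mimicked with a small Monte-Carlo experiment. The sketch below draws independent D2D shifts for the nMOS and pMOS VTH (relative σ of 5%, hypothetical), evaluates an alpha-power-law stage delay (α = 1.3, hypothetical), and correlates the sensor frequency with a reference nMOS delay. The stage scales (stack · L/W, with a hypothetical 0.5 pMOS drive ratio folded into the rise term) are illustrative choices, and the resulting numbers are not expected to match the paper's 0.280 and 0.915, only the trend.

```python
import math
import random

VDD, VTHN0, VTHP0 = 1.0, 0.471, 0.423   # supply and nominal |VTH| from Table I (V)
ALPHA = 1.3                              # HYPOTHETICAL alpha-power-law index

def edge_delay(scale, vth):
    """Alpha-power-law edge delay in relative units; scale ~ stack * L / W."""
    return scale / (VDD - vth) ** ALPHA

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def sensor_correlation(fall_scale, rise_scale, d2d_sigma=0.05, trials=2000, seed=11):
    """|correlation| between a reference nMOS delay and the sensor frequency
    over dies whose nMOS and pMOS VTH shift independently (D2D only)."""
    rng = random.Random(seed)
    nmos_delay, sensor_freq = [], []
    for _ in range(trials):
        vthn = VTHN0 * (1.0 + rng.gauss(0.0, d2d_sigma))
        vthp = VTHP0 * (1.0 + rng.gauss(0.0, d2d_sigma))
        nmos_delay.append(edge_delay(1.0, vthn))
        period = edge_delay(fall_scale, vthn) + edge_delay(rise_scale, vthp)
        sensor_freq.append(1.0 / period)
    return abs(pearson(nmos_delay, sensor_freq))  # slower nMOS -> lower frequency

# Hypothetical stages: balanced inverter vs. 2-stack pull-down with 5x longer channels.
print(f"plain stage:   {sensor_correlation(0.125, 0.125):.3f}")
print(f"stacked stage: {sensor_correlation(1.25, 0.125):.3f}")
```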

IV. SIMULATION RESULTS

In this section, we present statistical simulation results to demonstrate the effectiveness of TABB. Monte-Carlo (MC) simulations were conducted for clock network types 1, 2, and 3. The simulations include three different combinations of D2D and WID variations: 1) [D2D σ, WID σ] = [5%, 15%], a process with higher WID variation than D2D variation; 2) [D2D σ, WID σ] = [10%, 10%], a process with equal WID and D2D variations; and 3) [D2D σ, WID σ] = [15%, 5%], a process with higher D2D variation than WID variation. For each MC simulation point, the nMOS and pMOS variation sensors generate digital codes for the global nMOS and pMOS process corners, and the body bias levels for each tier are selected accordingly. Fig. 8 summarizes the outputs of the pMOS and nMOS variation sensors for the case of 15% WID variation and 5% D2D variation. We consider two scenarios: 1) both FBB and RBB are applied [Fig. 8(a)], and 2) only FBB and ZBB are applied [Fig. 8(b)]. According to the sensor outputs, we apply different body biases with 50-mV resolution. Note that this resolution is well within the capabilities of common voltage regulators (e.g., 6- to 12-mV resolution [25]). Fig. 9 shows the histograms of the body bias assignments of die 1 and die 2 considering FBB/RBB [Fig. 9(a)] and FBB/ZBB [Fig. 9(b)] for the above example.

[Figure: VBN and VBP (V) versus sensor output code (180–300); panel (a) FBB/RBB, panel (b) FBB/ZBB.]

Fig. 8. Body bias assignments according to the sensor outputs of the pMOS and nMOS variation sensors considering 15% WID variation and 5% D2D variation with 50-mV resolution with (a) FBB/RBB and (b) FBB/ZBB.
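The per-tier compensation step of this Monte-Carlo flow can be approximated as follows. This is a simplified sketch, not the simulation behind Figs. 8–12: it models only the nMOS VTH of two tiers, assumes the sensors report each tier's D2D shift exactly, uses a hypothetical linear body-effect coefficient `K_BODY`, clamps the bias magnitude at 0.3 V, and treats the residual VTH mismatch between the tiers as a stand-in for 3-D skew. It compares no TABB, FBB/RBB, and FBB/ZBB under the three D2D σ values of Table I.

```python
import random
import statistics

VTH0 = 0.471              # nominal nMOS VTH from Table I (V)
K_BODY = 0.15             # HYPOTHETICAL VTH change per volt of body bias (V/V)
LIMIT, STEP = 0.3, 0.05   # assumed bias magnitude bound and 50-mV resolution

def applied_bias(dvth, allow_rbb):
    """Quantized forward-bias amount (V) chosen from a perfectly sensed VTH shift."""
    amount = dvth / K_BODY
    if not allow_rbb:
        amount = max(amount, 0.0)          # FBB/ZBB leaves fast tiers unbiased
    amount = max(-LIMIT, min(LIMIT, amount))
    return round(amount / STEP) * STEP

def tier_mismatch_sigma(d2d_sigma, mode, trials=5000, seed=2):
    """Std. dev. (V) of the residual VTH difference between the two tiers."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        residual = []
        for _die in range(2):
            dvth = VTH0 * rng.gauss(0.0, d2d_sigma)
            if mode == "none":
                bias = 0.0
            else:
                bias = applied_bias(dvth, allow_rbb=(mode == "FBB/RBB"))
            residual.append(dvth - K_BODY * bias)  # forward bias lowers VTH
        diffs.append(residual[0] - residual[1])
    return statistics.pstdev(diffs)

for d2d in (0.05, 0.10, 0.15):
    row = {m: tier_mismatch_sigma(d2d, m) * 1e3 for m in ("none", "FBB/RBB", "FBB/ZBB")}
    print(f"D2D sigma={d2d:.0%}: " + ", ".join(f"{m}={v:.1f} mV" for m, v in row.items()))
```

In this toy setting, FBB/ZBB leaves every fast tier uncorrected, so its residual mismatch lies between the uncompensated case and FBB/RBB, in line with the qualitative 3-D skew trend reported below.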

A. Effect of TABB on the Clock Skew Variation

With different body biasing conditions (without TABB, TABB with FBB/RBB, or TABB with FBB/ZBB), we observed the trends of the mean/maximum skew and the skew standard deviation (denoted by σ) in the clock networks while changing the D2D and WID standard deviations. Our observations for clock network types 1, 2, and 3 are summarized in Figs. 10–12, respectively.

[Figure: histograms (count) of VBP1 and VBN1 for die 1 and of VBP2 and VBN2 for die 2.]

Fig. 9. Histogram of the body bias assignments of die 1 and die 2 considering (a) FBB/RBB and (b) FBB/ZBB with 15% WID and 5% D2D variations.

Fig. 10. Results of TABB on clock network type 1 considering D2D and WID variations. (a) Mean skew. (b) Skew variation. (c) Maximum skew.

1) Effect of TABB on 2-D Skew: Higher WID variation increases the variability in 2-D skew. However, even without TABB, the effect is generally weak. Note that the impact of WID variations on 2-D clock skew can be further reduced by increasing the size of the clock driver transistors; normally, the buffers in a clock network are designed with transistors that are larger than minimum-sized transistors. When TABB is applied with FBB/RBB, we observe a marginal reduction in the mean skew but a comparatively larger reduction in the skew standard deviation and the maximum skew. We observe that TABB is more effective in reducing the mean, the standard deviation, and the maximum 2-D skew when the D2D variation becomes higher. This is because the effect of the WID variation is more severe at worse global VTH corners, and D2D variation compensation with TABB helps reduce 2-D skew variations. We further observe that TABB with only FBB gives marginally better benefits for 2-D skew than TABB with FBB/RBB. This is because the effect of the WID variation on 2-D skew is stronger for slow (high-VTH) dies, which have higher delay sensitivity than fast dies. Since FBB compensates for variations in slow dies, FBB can be more effective in reducing the 2-D skew standard deviation σ.

2) Effect of TABB on 3-D Skew: Without TABB, we observe that the D2D variation strongly affects 3-D skew. A higher D2D variation results in a significant increase in the mean, the standard deviation, and the maximum skew. As TABB reduces the D2D variation, it significantly reduces the mean, the maximum value, and the standard deviation of 3-D skew. As expected, TABB is more effective when the D2D variation is larger. We further observe that TABB with FBB/RBB is more effective in reducing 3-D skew than TABB with only FBB. This is because using both FBB and RBB compensates the D2D variation better than using FBB alone. The advantage of using both FBB and RBB is more pronounced under higher D2D variations. However, this observation is reversed for the maximum 3-D skew of clock network type 3 [Fig. 12(c)]. The reason is discussed in Section IV-A-3.

3) Effect of TABB on Different Types of Clock Network: TABB shows consistent effectiveness for different types of clock networks. Different clock networks showed similar results for 2-D skew performance. As explained earlier, the characteristics of 2-D skew in die 1 and die 2 are very different for clock network type 1; nevertheless, TABB has a similar impact on 2-D skew for both dies in type 1. For clock network type 1, 3-D skew variations due to D2D variations dominate the 2-D skew variations in each die. TABB with FBB/RBB significantly reduces 3-D skew variations and, hence, the overall skew variations in network type 1. For this reason, TABB is most effective for clock network type 1, which has only one TSV. As the number of TSVs in the clock network increases, however, the effectiveness of TABB decreases, and the least impact is observed for clock network type 3 (100 TSVs). This is because, in clock network type 3, the subnetworks in die 2 have the longest common path with the main clock network in die 1. This makes the clock skews in the two dies more strongly correlated and primarily determined by the skew variations in the main clock network in die 1. Therefore, the effectiveness of TABB decreases, as only the ABB for die 1 becomes important. We also observe that the variations in 2-D skew and 3-D skew become comparable. For clock network type 3, we observe that FBB/ZBB achieved a higher reduction in skew variations. Since clock network type 3 has small subnetworks in die 2 (the clock subnetworks in die 2 have the maximum shared clock path with the main clock network in die 1), it is affected less significantly by D2D variations than clock network types 1 and 2. As the impact of D2D variation gets weaker, the WID variation has a stronger impact on skew performance. Thus, FBB/ZBB could achieve a higher gain than FBB/RBB, since making the path delay shorter helps reduce delay variation. In summary, FBB/RBB reduces skew variations more when the skew variation is a strong function of the D2D variation. On the other hand, FBB/ZBB achieves a higher gain if the clock skew is impacted more by the WID variation.

[Figure: clock slew (ps, axis 50–110) versus the VTHP and VTHN shifts (axes −0.1 to 0.1) for the three cases.]

Fig. 13. (a) Clock slew rate without TABB. (b) Clock slew rate with FBB/RBB. (c) Clock slew rate with FBB/ZBB according to VTHN and VTHP skew.

4) Effect of TABB on Clock Slew Rate: The effect of TABB on the variability of the clock slew rate is studied next. Fig. 13(a) shows the clock slew rate for different threshold voltage variations (ΔVTHN and ΔVTHP) of nMOS and pMOS transistors. There are significant variations in the clock slew rate depending on the process shifts, even when opposite VTH shifts in pMOS and nMOS devices result in similar clock network latency (i.e., minimal skew). As shown in Fig. 13(b) and (c), FBB/RBB or FBB/ZBB can effectively reduce the variations in the clock slew rate. This implies that applying separate body biases to nMOS and pMOS transistors better compensates the variations of circuit parameters, such as the clock slew rate, that are sensitive to VTH skew. Reducing clock slew rate variation is important, as slew can significantly impact the timing characteristics (i.e., setup and hold times) of flip-flops.
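The observation that opposite VTH shifts can leave latency nearly unchanged while distorting slew can be checked with the alpha-power-law delay model. In the sketch below, the average of the normalized rise and fall times stands in for latency and their difference for slew imbalance; α = 1.3 is hypothetical, the helper names are illustrative, and the model ignores load and input-slew effects.

```python
VDD, VTHN0, VTHP0 = 1.0, 0.471, 0.423   # supply and nominal |VTH| from Table I (V)
ALPHA = 1.3                              # HYPOTHETICAL alpha-power-law index

def edge_times(dvthn, dvthp):
    """Relative fall (nMOS-driven) and rise (pMOS-driven) transition times."""
    fall = 1.0 / (VDD - (VTHN0 + dvthn)) ** ALPHA
    rise = 1.0 / (VDD - (VTHP0 + dvthp)) ** ALPHA
    return fall, rise

def latency_and_mismatch(dvthn, dvthp):
    """Normalized average edge delay (latency proxy) and rise/fall imbalance (slew proxy)."""
    fall, rise = edge_times(dvthn, dvthp)
    fall0, rise0 = edge_times(0.0, 0.0)
    latency = (fall / fall0 + rise / rise0) / 2.0
    mismatch = abs(fall / fall0 - rise / rise0)
    return latency, mismatch

# Opposite shifts: nMOS 50 mV slower, pMOS 50 mV faster (|VTH| convention).
lat, mis = latency_and_mismatch(0.05, -0.05)
print(f"uncorrected: latency x{lat:.3f}, rise/fall mismatch {mis:.1%}")
# Ideal separate nMOS/pMOS corrections cancel each shift individually.
lat, mis = latency_and_mismatch(0.0, 0.0)
print(f"separately corrected: latency x{lat:.3f}, rise/fall mismatch {mis:.1%}")
```

For these shifts, the model changes the latency proxy by only about 2% but creates a rise/fall imbalance of about 24%; separate nMOS and pMOS corrections that cancel each shift remove both in the ideal case.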

B. Effect of TABB on Overall Performance

The results in the previous sections show that TABB reduces the mean, the standard deviation, and the maximum values of 2-D and 3-D skew under D2D and WID variations. However, as the bodies of all devices in clock buffers and logic gates are shared, TABB also affects the delays of data paths. This is particularly true for nMOS devices (assuming a non-triple-well process). Hence, we need to consider the impact of TABB on logic paths as well. We study the effect of TABB on two 2-D data paths (each path lies entirely within one die) and one 3-D data path (the path spans two dies and uses five TSVs) of the 3-D design discussed in Section II. For data paths, the absolute delay is important. Thus, D2D variation increases the delay variation of both 2-D and 3-D logic paths (Fig. 14). We further observe that the delay σ/μ of the 3-D path was smaller than that of the 2-D paths. This is because the independent D2D variations of the two dies can partially offset each other, thereby reducing the overall delay variation [14]. We observe that TABB with FBB/RBB significantly reduces the delay variation but has a marginal impact on the mean delay. The reduction in the delay spread is smaller when TABB with only FBB is considered. We observe that both 2-D and 3-D data paths experience a significant reduction in delay variation with TABB. In summary, TABB reduces variability in both clock skews and logic path delays, thereby significantly reducing the chip-to-chip variability in the performance of 3-D ICs.
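The offsetting effect can be made explicit under simple assumptions: only D2D variation is considered, the two dies have independent relative shifts δ1 and δ2 with the same standard deviation σ, path delay scales linearly with these shifts, and a 3-D path places a fraction f (a hypothetical parameter) of its nominal delay μ on die 1 and the rest on die 2.

```latex
\begin{align}
D_{\mathrm{2D}} &= \mu\,(1+\delta_1), &
\frac{\sigma_{D_{\mathrm{2D}}}}{\mu} &= \sigma,\\
D_{\mathrm{3D}} &= f\mu\,(1+\delta_1) + (1-f)\,\mu\,(1+\delta_2), &
\frac{\sigma_{D_{\mathrm{3D}}}}{\mu} &= \sigma\sqrt{f^{2}+(1-f)^{2}} \;\ge\; \frac{\sigma}{\sqrt{2}}.
\end{align}
```

Under these assumptions, the relative spread of the 3-D path never exceeds that of a 2-D path and is smallest, σ/√2 ≈ 0.71σ, for an even split (f = 1/2); WID and TSV variations are ignored here.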

C. Impact of TABB on Area and Power of Clock Network

In the TABB architecture, the power overhead of the sensors can be neglected, since the nMOS and pMOS variation sensors are activated only once during the initial boot-up sequence. However, we need to carefully analyze the impact of TABB on the power of clock and logic paths. In the case of FBB/ZBB, since FBB makes slow logic gates switch faster, it could help reduce the short-circuit current, which flows when both the pMOS and the nMOS transistors are on; a faster transition shortens the time during which both transistors conduct. On the other hand, FBB increases the subthreshold leakage current as well as the potential for forward bias current through the body-to-source diode. Overall, the mean power overhead with FBB/ZBB was 0.47%–0.49% of the total clock network power. With FBB/RBB, the average power consumption was reduced by 1.45% for all clock network types, as shown in Fig. 15(a). Although FBB could increase the average power, RBB helps reduce the excessive leakage current. Thus, in terms of total power (dynamic and leakage power), FBB/RBB slightly reduced the mean total power of the clock networks. The variation in total power, on the other hand, is reduced significantly (by ∼40.59%) with TABB using FBB/RBB. This is because FBB increases the total power of slow dies, while RBB decreases the total power of fast, leaky dies. In contrast, FBB/ZBB decreases the total power variation by only 9.62%. Since FBB/ZBB works only for slow dies while FBB/RBB works for both slow and fast dies, FBB/RBB reduces power variation more. Further, as the total power variation is significantly affected by the D2D variation, the reduction is higher when the D2D variation is dominant.

Fig. 14. Results of TABB (FBB/RBB or FBB/ZBB) on the data paths (two 2-D paths, 2D path0 and 2D path1, and one 3-D path) under D2D and WID variations. (a) Mean delay μ (ps). (b) Delay variation σ (ps). Cases 1–3 denote [D2D σ, WID σ] = [5%, 15%], [10%, 10%], and [15%, 5%]; each case compares no TABB, TABB (FBB/ZBB), and TABB (FBB/RBB).

The layout areas of the nMOS and pMOS variation sensors are 178.8 and 152.7 μm², respectively. The relative size of the sensors becomes negligible as the chip size grows. Assuming one local sensor pair (one nMOS and one pMOS sensor) per 1-mm² local area (1000 μm × 1000 μm), the area overhead from the sensors is 0.033%. Because the current in the transistor body is at least two orders of magnitude smaller than the supply current, the cost of body bias routing is significantly less than that of the power grid [2]. Previous works have reported that the area overhead of body bias routing is less than 2% of the total chip area. The area overhead was estimated from a test layout, as shown in Fig. 16. TAP cells for separate body contacts (substrate and n-well contacts) and routing were inserted every 30 μm. The feasible width of a TAP cell under 45-nm design rules is 0.35 μm, from which the area overhead, including body contacts and routing, can be estimated. The estimated overhead is 1.17%.
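Both area figures above follow from simple ratios. The sketch below reproduces them under two assumptions that the text does not state explicitly: that the 0.033% figure counts one nMOS and one pMOS sensor per 1-mm² region, and that the 1.17% TAP-cell overhead is the ratio of TAP-cell width to insertion pitch. The constant and function names are illustrative.

```python
"""Back-of-the-envelope TABB area overheads (inputs taken from the text)."""

# Sensor layout areas reported in the text, in um^2.
NMOS_SENSOR_AREA_UM2 = 178.8
PMOS_SENSOR_AREA_UM2 = 152.7

# Assumption: one nMOS and one pMOS sensor per 1 mm x 1 mm local region.
LOCAL_REGION_UM2 = 1000.0 * 1000.0

# TAP-cell parameters reported in the text, in um.
TAP_CELL_WIDTH_UM = 0.35
TAP_INSERTION_PITCH_UM = 30.0


def sensor_area_overhead_pct() -> float:
    """Percentage of a local region occupied by one sensor pair."""
    pair_area = NMOS_SENSOR_AREA_UM2 + PMOS_SENSOR_AREA_UM2
    return 100.0 * pair_area / LOCAL_REGION_UM2


def tap_area_overhead_pct() -> float:
    """Assumed model: one TAP-cell column of fixed width per insertion pitch."""
    return 100.0 * TAP_CELL_WIDTH_UM / TAP_INSERTION_PITCH_UM


if __name__ == "__main__":
    print(f"sensor overhead: {sensor_area_overhead_pct():.3f}%")  # 0.033%
    print(f"TAP-cell overhead: {tap_area_overhead_pct():.2f}%")   # 1.17%
```

That the width-to-pitch ratio lands on the reported 1.17% suggests routing is accounted for within the 0.35-μm TAP-cell width, but this is an inference rather than something the text confirms.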

The measured power consumptions of the nMOS and pMOS sensors are 24.78 and 26.93 μW, respectively, under typical conditions (1.0-V supply, 27 °C). This overhead is 0.49% of the power of clock network type 1 at a 1.0-V supply, 27 °C, and a 100-MHz clock input; relative to the total power including logic, it becomes much smaller. Moreover, this power overhead is negligible in practice, since the sensors operate only once, during initial operation. The major additional power overhead comes from FBB/ZBB and is 0.47%–0.49%, depending on the clock network type. With FBB/ZBB alone, the forward bias increases the leakage current from the supply to the bulk, which causes a static power overhead.

D. Discussions

Although ABB helps reduce performance variations, it can aggravate the latch-up problem caused by parasitic bipolar junction transistors (BJTs). Latch-up is more critical when forward bias is applied to the body, since the forward bias pre-biases the base-emitter junctions of the parasitic BJTs by the forward bias voltage, making them more likely to turn on. The key parameters that determine the latch-up reliability level are the gains of the BJTs and the n-well/substrate resistance between a transistor body and an n-well/substrate contact. The gain of a parasitic BJT is strongly affected by the distance between the n-active and p-active regions and by device isolation structures such as shallow trench isolation; thus, in digital circuits, the gain of parasitic BJTs is hard to control. In contrast, the resistance between a transistor body and an n-well/substrate contact can be controlled by choosing the TAP cell insertion distance. Reducing this distance decreases the n-well/substrate resistance and thereby reduces the risk of latch-up, but it increases the area overhead. Prior studies have considered latch-up reliability under forward body biasing [24]–[26]. Hokazono et al. [24] observed that the measured latch-up holding voltage is above 1.1 V for a 45-nm node, and Choi et al. [26] measured a latch-up holding voltage higher than 1.2 V at a 65-nm node. Since latch-up does not occur as long as the forward bias voltage is lower than the latch-up holding voltage, we can conclude that a forward bias of up to 0.3 V under a 1.0-V supply does not cause latch-up reliability issues.

Fig. 15. Results of TABB on the clock power considering D2D and WID variations. (a) Mean power μ (mW). (b) Power variation σ (μW). Cases 1–3 denote [D2D σ, WID σ] = [5%, 15%], [10%, 10%], and [15%, 5%] for clock network types 1–3; each case compares no TABB, TABB (FBB/ZBB), and TABB (FBB/RBB).

Fig. 16. Layout overhead considering adaptive body biasing.

V. CONCLUSION

We presented TABB, a post-silicon tuning methodology for 3-D ICs under die-to-die and within-die process variations. TABB reduces the skew and slew variability of 3-D ICs by independently applying adaptive body biases to different tiers. Digital circuit techniques to sense the D2D variations of pMOS and nMOS transistors were discussed. Our analysis showed that TABB can improve system performance by reducing the variability in clock skew and slew rate as well as in logic path delay. TABB is effective in reducing the clock skew variability in all types of 3-D clock networks, but its effectiveness varies mainly with the number of TSVs used; it is most effective for clock networks designed with fewer TSVs. As 3-D technology matures, designing variation-tolerant clock networks for 3-D ICs will remain an important challenge, and TABB enables post-silicon tuning of 3-D clock trees to reduce variability. As future work, one could investigate how TABB can be used during the design of 3-D clock trees to optimize the number of TSVs.

REFERENCES

[1] S. Borkar, “Designing reliable systems from unreliable components: The challenges of transistor variability and degradation,” IEEE Micro, vol. 25, no. 6, pp. 10–16, Nov.–Dec. 2005.

[2] J. W. Tschanz, S. Narendra, R. Nair, and V. De, “Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 826–829, May 2003.

[3] J. Tschanz, J. T. Kao, S. Narendra, R. Nair, D. Antoniadis, A. Chandrakasan, and V. De, “Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1396–1402, Nov. 2002.

[4] J. Van Olmen, A. Mercha, G. Katti, C. Huyghebaert, J. van Aelst, E. Seppala, Z. Chao, S. Armini, J. Vaes, R. C. Teixeira, M. Van Cauwenberghe, P. Verdonck, K. Verhemeldonck, A. Jourdain, W. Ruythooren, M. de Potter de Ten Broeck, A. Opdebeeck, T. Chiarella, B. Parvais, I. Debusschere, T. Y. Hoffmann, B. De Wachter, W. Dehaene, M. Stucchi, M. Rakowski, P. Soussan, R. Cartuyvels, E. Beyne, S. Biesemans, and B. Swinnen, “3D stacked IC demonstration using a through silicon via first approach,” in Proc. IEEE Int. Electron. Device Meeting, Dec. 2008, pp. 1–4.

[5] F. Akopyan, C. Otero, D. Fang, S. J. Jackson, and R. Manohar, “Variability in 3-D integrated circuits,” in Proc. IEEE Custom Integr. Circuit Conf., Sep. 2008, pp. 659–662.

[6] S. Garg and D. Marculescu, “3D-GCP: An analytical model for the impact of process variations on the critical path delay distribution of 3D ICs,” in Proc. Int. Symp. Qual. Electron. Design, Mar. 2009, pp. 147–155.

[7] S. Reda, A. Si, and R. I. Bahar, “Reducing the leakage and timing variability of 2D ICs using 3D ICs,” in Proc. IEEE Int. Symp. Low Power Electron. Design, Aug. 2009, pp. 283–286.

[8] S. Garg and D. Marculescu, “System-level process variability analysis and mitigation for 3D MPSoCs,” in Proc. Design Autom. Test Eur., 2009, pp. 604–609.

[9] S. Ozdemir, Y. Pan, A. Das, G. Memik, G. Loh, and A. Choudhary, “Quantifying and coping with parametric variations in 3D-stacked microarchitectures,” in Proc. Design Autom. Conf., 2010, pp. 144–149.

[10] X. Zhao, J. Minz, and S. K. Lim, “Low-power and reliable clock network design for through-silicon via (TSV) based 3D ICs,” IEEE Trans. Compon., Packag. Manuf. Technol., vol. 1, no. 2, pp. 247–259, Feb. 2011.

[11] C.-L. Lung, Y.-S. Su, S.-H. Huang, Y. Shi, and S.-C. Chang, “Fault-tolerant 3D clock network,” in Proc. Design Autom. Conf., 2011, pp. 645–651.

[12] X. Zhao, S. Mukhopadhyay, and S. K. Lim, “Variation-tolerant and low-power clock network design for 3D ICs,” in Proc. IEEE Electron. Compon. Technol. Conf., May–Jun. 2011, pp. 2007–2014.

[13] H. Xu, V. F. Pavlidis, and G. De Micheli, “Process-induced skew variation for scaled 2-D and 3-D ICs,” in Proc. Int. Workshop Syst. Level Inter. Prediction, 2010, pp. 17–24.

[14] K. Chae and S. Mukhopadhyay, “Tier-adaptive-voltage-scaling (TAVS): A methodology for post-silicon tuning of 3D ICs,” in Proc. Asia South Pacific Design Autom. Conf., Jan. 2012, pp. 277–282.

[15] T.-Y. Kim and T. Kim, “Post silicon management of on-package variation induced 3D clock skew,” J. Semicond. Technol. Sci., vol. 12, no. 2, pp. 139–149, Jun. 2012.

[16] FreePDK45: Contents. (2011) [Online]. Available: http://www.eda.ncsu.edu/wiki/FreePDK45:Contents

[17] M. Bhushan, M. Ketchen, S. Polonsky, and A. Gattiker, “Ring oscillator based technique for measuring variability statistics,” in Proc. IEEE Int. Conf. Microelectron. Test Struct., Mar. 2006, pp. 87–92.

[18] L.-T. Pang and B. Nikolic, “Measurements and analysis of process variability in 90 nm CMOS,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1655–1663, May 2009.

[19] K. Shinkai and M. Hashimoto, “Device-parameter estimation with on-chip variation sensors considering random variability,” in Proc. Asia South Pacific Design Autom. Conf., Jan. 2011, pp. 683–688.

[20] S. Mukhopadhyay, K. Kim, H. Mahmoodi, and K. Roy, “Design of a process variation tolerant self-repairing SRAM for yield enhancement in nanoscaled CMOS,” IEEE J. Solid-State Circuits, vol. 42, no. 6, pp. 1370–1382, Jun. 2007.

[21] T. Iizuka, J. Jeong, T. Nakura, M. Ikeda, and K. Asada, “All-digital on-chip monitor for PMOS and NMOS process variability measurement utilizing buffer ring with pulse counter,” in Proc. IEEE Eur. Solid-State Circuits Conf., Sep. 2010, pp. 182–185.

[22] J. Jeong, T. Iizuka, T. Nakura, M. Ikeda, and K. Asada, “All-digital PMOS and NMOS process variability monitor utilizing buffer ring with pulse counter,” in Proc. Asia South Pacific Design Autom. Conf., Jan. 2011, pp. 79–80.

[23] X. Zhang, K. Ishida, M. Takamiya, and T. Sakurai, “An on-chip characterizing system for within-die delay variation measurement of individual standard cells in 65-nm CMOS,” in Proc. IEEE Asia South Pacific Design Autom. Conf., Jan. 2011, pp. 109–110.

[24] A. Hokazono, S. Balasubramanian, K. Ishimaru, H. Ishiuchi, C. Hu, and T. K. Liu, “Forward body biasing as a bulk-Si CMOS technology scaling strategy,” IEEE Trans. Electron Devices, vol. 55, no. 10, pp. 2657–2664, Oct. 2008.

[25] S. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies. New York: Springer-Verlag, Nov. 2005.

[26] J. Y. Choi, B. H. Lee, K.-T. Do, H.-O. Kim, H.-S. Won, and K.-M. Choi, “Design techniques to minimize the yield loss for general purpose ASIC/SoC devices,” in Proc. IEEE Int. SoC Design Conf., Nov. 2009, pp. 45–48.

[27] N. Kamae, A. Tsuchiya, and H. Onodera, “An area effective forward/reverse body bias generator for within-die variability compensation,” in Proc. IEEE Asian Solid-State Circuits Conf., Nov. 2011, pp. 217–220.

Kwanyeob Chae (S’09) received the B.S. and M.S. degrees in electronics engineering from Korea University, Seoul, Korea, in 1998 and 2000, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering with the Georgia Institute of Technology, Atlanta.

He joined Samsung Electronics Co., Ltd., in 2000, where he was engaged in the development of digital circuits. His current research interests include self-adaptive circuits, low-power circuits and systems, variation-tolerant design, nonvolatile memories, and 3-D ICs.

Mr. Chae was a recipient of the 2007 Samsung LSI Presidential Award and the 1998 LG Semiconductor Contest Award.


Xin Zhao (S’07) received the B.S. degree from the Electronic Engineering Department and the M.S. degree from the Computer Science and Technology Department, Tsinghua University, Beijing, China, in 2003 and 2006, respectively, and the Ph.D. degree from the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, in 2012.

Her current research interests include computer-aided design for very large scale integration circuits, especially physical design for low power, robustness, and 3-D ICs.

Dr. Zhao was a recipient of a Best Paper Award Nomination at the International Conference on Computer-Aided Design in 2009, a Best Paper Award Nomination from the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN in 2012, and a Best Paper Award Nomination at the International Symposium on Low Power Electronics and Design in 2012.

Sung Kyu Lim (S’94–M’00–SM’05) received the B.S., M.S., and Ph.D. degrees from the Computer Science Department, University of California, Los Angeles, in 1994, 1997, and 2000, respectively.

He joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, in 2001, where he is currently an Associate Professor. His current research interests include architectures, circuits, and physical design for 3-D ICs and 3-D system-in-packages. He has authored the book Practical Problems in VLSI Physical Design Automation (Springer, 2008).

Dr. Lim was a recipient of the Design Automation Conference Graduate Scholarship in 2003, the National Science Foundation Faculty Early Career Development Award in 2006, the ACM SIGDA Distinguished Service Award in 2008, and nominations for the Best Paper Award at ISPD’06, ICCAD’09, CICC’10, DAC’11, DAC’12, and ISLPED’12. He was on the Advisory Board of the ACM Special Interest Group on Design Automation from 2003 to 2008. He was an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS from 2007 to 2009.

Saibal Mukhopadhyay (S’99–M’07–SM’11) received the B.E. degree in electronics and telecommunication engineering from Jadavpur University, Kolkata, India, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2000 and 2006, respectively.

He is currently an Associate Professor with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta. He has authored or co-authored over 100 papers in refereed journals and conferences and holds five U.S. patents. His current research interests include analysis and design of low-power and robust circuits in nanometer technologies, and 3-D circuits and systems.

Dr. Mukhopadhyay was a recipient of the Office of Naval Research Young Investigator Award in 2012, the National Science Foundation CAREER Award in 2011, the IBM Faculty Partnership Award in 2009 and 2010, the SRC Inventor Recognition Award in 2008, the SRC Technical Excellence Award in 2005, the IBM Ph.D. Fellowship Award for 2004 to 2005, the Best in Session Award at 2005 SRC TECHCON, and the Best Paper Awards at the 2003 IEEE NANO and 2004 International Conference on Computer Design.
