[IEEE 2008 9th International Symposium of Quality of Electronic Design (ISQED) - San Jose, CA, USA...

Characterization of Standard Cells for Intra-Cell Mismatch Variations

Savithri Sundareswaran1, A. Abraham2, Ardelea3, Rajendran Panda1

1 Freescale Semiconductor, 2 The University of Texas at Austin, 3 Sun Microsystems

Austin, Texas USA

Abstract

With the adoption of statistical timing across industry, there is a need to characterize all gates/cells in a digital library for delay variations (referred to as statistical characterization). Statistical characterization needs to be performed efficiently, with acceptable accuracy, as a function of several process and environment parameter variations. In this paper, we propose an approach that considers intra-cell process mismatch variations to characterize a cell's delay and output transition time (output slew) variations. A straightforward approach to this problem is to model these mismatch variations by characterizing each device fluctuation separately. However, the runtime complexity of such characterization is of the order of the number of devices in the cell, and the number of simulations required can easily become infeasible. We analyze the fluctuations in switching and non-switching devices and their impact on delay variations. Using these properties of the devices, we propose a clustering approach to characterize a cell's delay variations due to intra-cell mismatch variations. The proposed approach results in as much as 12X runtime improvement with acceptable accuracy, compared with Monte Carlo simulations. We show that this approach ensures an upper bound on the results while keeping the number of simulations for each cell independent of the number of devices.

1. Introduction

Process disturbances are often described by device parameter variations, which can be classified into two basic types: global variations, which are the same for all devices on the same chip, and local variations, which vary from device to device. Advances in process technology have greatly increased the importance of gate/cell-level statistical static timing analysis (SSTA). Several SSTA techniques have been proposed to account for both inter-chip (global) and intra-chip (local) variations. However, these techniques consider the variations at the cell level and do not account for intra-cell device-to-device mismatch variations. The delay variations of each cell due to intra-cell mismatch variations also need to be included in statistical timing analysis.

A naive and straightforward approach to computing intra-cell delay variability is to assign a random variable to each device in the cell; such a model becomes infeasible when considering a large number of devices. To address this problem, a statistical gate-delay variation model using the response surface method is proposed in [3][4]. The model calculates intra-cell variability through the introduction of sensitivity constants. The sensitivity constants are computed by considering the devices on the transition path (charging/discharging path). Even though the intra-cell delay variance is finally represented using a single statistic, computing the sensitivity constants requires an additional p characterizations (where p is the number of devices in the cell). In the worst case, the runtime complexity of characterization for each cell is O(np), where n is the number of intra-die physical parameters. In this paper, we propose a clustering approach to model intra-cell mismatch variations. This approach significantly reduces the number of characterizations required to capture intra-cell mismatch variations. We show that the complexity of our approach is O(n): the runtime depends only on the number of intra-die parameters and is independent of the number of devices within the cell. Further, the approach ensures an upper bound on delay variance, which is desirable for timing analysis. Experiments indicate that the proposed approach models the delay variations due to intra-cell mismatch within 12% of Monte Carlo simulations. A major advantage of the proposed approach is that it needs little or no change to existing characterization infrastructures. Specific contributions of this paper are the following:

• We study the impact of intra-cell device fluctuations on delay variations and empirically determine the significant contributors to intra-cell delay variations.

• We present a novel approach based on clustering multiple intra-cell variations to compute the delay sensitivity due to device mismatch variations.

• We show that the proposed approach has a computational complexity of O(n), where n is the number of intra-die (local) variables, independent of the number of devices in the cell.

The paper is organized as follows: Section 2 describes global and local parameters of variation and delay sensitivity characterization. Section 3 analyzes the impact of intra-cell mismatch variations on cell delay variability and derives the proposed approach. Experiments and accuracy analyses of several digital standard cells for delay variations are presented in Section 4. The results are illustrated for delay variations, but these techniques can be easily extended to output slew using a similar approach.

2. Background

2.1. Global vs. Local Variations

Process variability in devices can be classified into two broad categories: (a) global variations and (b) local variations. Typically, all chip-to-chip, across-wafer and wafer-to-wafer variations are combined as a global variation (also commonly referred to as

9th International Symposium on Quality Electronic Design

0-7695-3117-2/08 $25.00 © 2008 IEEE
DOI 10.1109/ISQED.2008.11

213


inter-chip variation). Variability across the chip (intra-die) is termed local variation. Each parameter that has a significant impact on the device characteristics can be represented in the following form:

$$P = P_0 + \Delta P_g + \Delta P_l$$

where $P_0$ is the nominal or mean value, and $\Delta P_g$ and $\Delta P_l$ are the global and local components of variation, respectively, for this parameter. To generalize to multiple parameters, let $\Delta X$ be a vector of the global components of variation, $\{\Delta X_1, \Delta X_2, \ldots, \Delta X_m\}$, and let $\Delta R$ be a vector of the local components of variation, $\{\Delta R_1, \Delta R_2, \ldots, \Delta R_n\}$. The global component $\Delta X_i$ varies from chip to chip, but for a given chip its value is the same for all devices in the design. The local component $\Delta R_j$ is the across-chip component, which can vary from device to device and captures both location- or geometry-dependent variations and mismatch or random variations. The components of $\Delta X$ and $\Delta R$ are typically modeled as standard normal distributions, $N(0, 1)$, and the two vectors are statistically independent of each other. The parameters within $\Delta X$ (or $\Delta R$) can be correlated in general. However, for simplicity of discussion, we present the techniques below for uncorrelated parameters. If the parameters are correlated, an orthogonalization technique (for example, principal component analysis) can be applied to extract uncorrelated parameters. In this work, any reference to local variations refers only to local-random variations (also termed mismatch variations). Local-random/mismatch variations are caused by variations or mismatch in the characteristics of the devices in a cell. In this case, fluctuations of each device in a cell impact the timing of the cell.
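The decomposition into a chip-wide global component and a per-device local component can be illustrated with a small sampling sketch. It is only an illustration: the function name `sample_parameter`, the nominal value and both sigma values are made up for the example and do not come from the paper.

```python
import random
import statistics

def sample_parameter(p0, sigma_g, sigma_l, n_chips, n_devices, seed=0):
    """Return one list of per-device parameter values for each chip.

    Every value follows P = P0 + dPg + dPl, where dPg is drawn once per
    chip (global) and dPl is drawn independently per device (local).
    """
    rng = random.Random(seed)
    chips = []
    for _ in range(n_chips):
        d_pg = sigma_g * rng.gauss(0.0, 1.0)   # shared by all devices on this chip
        chips.append([p0 + d_pg + sigma_l * rng.gauss(0.0, 1.0)
                      for _ in range(n_devices)])
    return chips

# Hypothetical numbers: a 65 (arbitrary units) nominal value, 4 devices per chip.
chips = sample_parameter(p0=65.0, sigma_g=2.0, sigma_l=1.0,
                         n_chips=2000, n_devices=4)

# Variance of the chip means: sigma_g^2 plus the averaged-down local part.
between_var = statistics.variance([statistics.mean(c) for c in chips])
# Average variance inside a chip: only the local component contributes.
within_var = statistics.mean([statistics.variance(c) for c in chips])
```

In this sketch the within-chip variance estimates the local variance alone, because the global offset is identical for every device on a chip and cancels out.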

2.2. Sensitivity Characterization

Gate-level static timing analysis (STA) is a well-known approach for timing sign-off. STA requires that the standard library cells be pre-characterized for delay and output transition time, with the results stored in a two-dimensional table indexed by input transition time (input slew) and output load. Each gate/cell is characterized using a transistor-level circuit simulator (e.g., a SPICE simulator). From a cell characterization perspective, each variable in $\Delta X$ impacts all devices identically and hence represents a single statistic. However, each local variable in $\Delta R$ represents a separate random variable (statistic) for each device in the cell. Suppose each cell in a library is pre-characterized for m global parameters of variation and n local parameters of variation. Let p be the number of devices in a cell G in the library. Since the statistical variations are often much smaller than the nominal parameter values, the performance characteristics of the cells are usually almost linear functions of the parameters. The basic idea is then to extract the first (mean) and second (variance) statistical moments of the performance metric (e.g., delay, output slew) and use them to represent the variation-aware delay. The delay of a timing arc, D, can be represented as follows:

$$D = D_0 + \sum_{i=1}^{m} d_i \,\Delta X_i + \sum_{j=1}^{n} \sum_{k=1}^{p} d_{jk} \,\Delta R_{jk} \tag{1}$$

where $D_0$ is the nominal delay value, characterized by setting the variations $\Delta X_i$ and $\Delta R_{jk}$ to zero. All $\Delta X_i$ and $\Delta R_{jk}$ parameters are modeled as $N(0, 1)$. The quantities $d_i$ and $d_{jk}$ are the direct sensitivities of cell delay with respect to the global variations $\Delta X_i$ and the local variations $\Delta R_{jk}$, respectively. These are deterministic quantities obtained from characterization results. The problem of statistical characterization thus becomes that of characterizing these $d_i$ and $d_{jk}$ quantities, which are the delay sensitivities to the global and local parameter variations.
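As a minimal sketch of how the linear model in equation (1) is used, the code below evaluates the delay for one draw of the variation vectors and computes the delay standard deviation under the stated assumption that all $\Delta X_i$ and $\Delta R_{jk}$ are independent $N(0, 1)$, so that the variances simply add. The function names and the numeric sensitivities are hypothetical.

```python
import math

def delay_sample(d0, d_global, d_local, dx, dr):
    """Evaluate equation (1) for one draw of the variation vectors.

    d_global[i] = d_i, d_local[j][k] = d_jk; dx and dr hold the matching
    values of dX_i and dR_jk (in units of sigma).
    """
    delay = d0 + sum(d * x for d, x in zip(d_global, dx))
    for row, r_row in zip(d_local, dr):
        delay += sum(d * r for d, r in zip(row, r_row))
    return delay

def delay_sigma(d_global, d_local):
    """Standard deviation of D when every dX_i and dR_jk is independent N(0, 1)."""
    var = sum(d * d for d in d_global)
    var += sum(d * d for row in d_local for d in row)
    return math.sqrt(var)

# Hypothetical sensitivities in ps per sigma: m = 1 global, n = 1 local, p = 2 devices.
sigma_d = delay_sigma([3.0], [[4.0, 0.0]])
one_draw = delay_sample(50.0, [3.0], [[4.0, 0.0]], [1.0], [[1.0, 0.0]])
```

The local term needs n times p sensitivities, which is the source of the characterization cost discussed in the rest of the paper.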

Thus, characterization of cell-delay variation due to global variations is performed by varying a given parameter for all devices in a cell. For example, consider channel length, with global variation $\Delta L_g$ and local variation $\Delta L_l$. Figure 1 illustrates a 2-input NAND cell with four devices in two statistical characterization configurations: (a) the configuration for global variations and (b) the configuration for local variations. Consider the delay variation for timing arc A(r) → X(f) (input pin A rising to output pin X falling). To characterize for global variations, all devices are set to a single (correlated) random variable, $\Delta L_g$, and the delay variation for timing arc A(r) → X(f) is determined with respect to this single parameter. For intra-cell mismatch variations, the variations of each device in the cell impact the delay variation of timing arc A(r) → X(f). Figure 1(b) illustrates the configuration of a NAND2 cell with four devices for characterization with respect to the local variations, $\Delta L_l$. In order to characterize for these local variations, each device i in the cell is assigned a separate random variable, $\Delta L_{li}$, and the effective delay variation due to all such intra-cell mismatch variables needs to be determined.

Figure 1: a. Global vs. b. Local variations in a cell

This paper discusses only the characterization of sensitivities to local variations. Different approaches to characterizing delay variations due to local or intra-cell mismatch variations are described in the following sections. Section 3.3 describes the proposed clustering-based approach.

3. Modeling of intra-cell variations

3.1. Intra-cell variations

Consider a 2-input NAND cell as illustrated in Figure 2 for analysis of intra-cell delay variations; the number of devices is p = 4 for this cell. Let K be a single physical parameter exhibiting mismatch variations (e.g., channel length with local component $\Delta L_l$). Let $\Delta K_{Ni}$ and $\Delta K_{Pj}$ be the random variables corresponding to K for each nMOS device Ni and pMOS device Pj in the cell, respectively. Let $\sigma_{KNi}$ and $\sigma_{KPj}$ be the delay sensitivities due to these random variables. The problem of statistical delay characterization for intra-cell mismatch variations is then to determine the cell's delay sensitivity, $\sigma_K$, as a function of $\sigma_{KNi}$ and $\sigma_{KPj}$.

3.1.1. Simple Approach

A direct and simple approach to computing the intra-cell delay variation of each timing arc is to determine the delay variation by considering a random fluctuation in each device separately. Each variance component $\sigma_{KNi}^2$, $\sigma_{KPj}^2$ (for devices Ni, Pj respectively) can be obtained through a separate sensitivity characterization by setting the random variables $\Delta K_{Ni}$, $\Delta K_{Pj}$ one at a time (illustrated in Figure 2). Assuming the delay variation due to each device is statistically independent, the cell's delay sensitivity can then be obtained using the following relation:

$$\sigma_K^2 = \sum_i \sigma_{KNi}^2 + \sum_j \sigma_{KPj}^2 \tag{2}$$

Figure 2: Direct and simple approach for characterization of local variations

Now, if there are p devices in a cell, at least p+1 simulations (one additional for the nominal value) need to be performed to determine the delay variance $\sigma_K^2$. For n local sources of variation, the order of computational complexity is O(np). While this approach is fairly accurate, its cost depends on the number of devices in the cell. This is undesirable because, as the number of devices p in a cell increases, the problem becomes infeasible. Further, each cell may have a different number of devices p, so the number of simulations is not consistent across the cells in the library. Note that, due to common device terminals and parasitics present in each component of the delay variance computation, there is correlation between the delay variance components $\sigma_{KNi}^2$ and $\sigma_{KPj}^2$. However, this correlation is not significant and for all practical purposes can be ignored for digital cells/gates.
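The simple approach can be sketched as a loop of one-device-at-a-time perturbations followed by the root-sum-square of equation (2). This is only a sketch under stated assumptions: `toy_simulate` is a hypothetical linear stand-in for a SPICE run, the sensitivities in `TOY_SENS` are made up, and a forward finite difference with a +1 sigma step is assumed.

```python
import math

def simple_approach_sigma(simulate, p):
    """Per-device characterization following equation (2).

    simulate(deltas) -> delay, where deltas lists the p per-device offsets
    in units of sigma. Returns (sigma_K, number_of_simulations).
    """
    nominal = simulate([0.0] * p)          # one nominal run
    runs = 1
    var = 0.0
    for k in range(p):
        deltas = [0.0] * p
        deltas[k] = 1.0                    # perturb only device k by +1 sigma
        var += (simulate(deltas) - nominal) ** 2
        runs += 1
    return math.sqrt(var), runs

# Hypothetical sensitivities (ps per sigma) for devices N1, N2, P1, P2.
TOY_SENS = [3.0, 0.2, 1.5, 0.1]

def toy_simulate(deltas):
    """Made-up linear delay model standing in for a circuit simulation."""
    return 50.0 + sum(s * d for s, d in zip(TOY_SENS, deltas))

sigma_k, runs = simple_approach_sigma(toy_simulate, p=4)
```

For the four-device example the loop needs p + 1 = 5 runs per local parameter, which is the device-count dependence the text identifies as the drawback of this approach.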

3.1.2. Transition-path based Approach

Another approach to the problem of finding the delay variance $\sigma_K^2$ is to consider only the devices on the transition path. Each output transition is determined by a set of devices on the path from the output to the power/ground rail (also termed the conducting path or transition path). For example, for the 2-input NOR cell (as illustrated in Figure 3), consider the A(f) → X(r) timing arc. The devices P1 and P2 lie on the transition path for A(f) → X(r). The delay variance $\sigma_K^2$ for A(f) → X(r) using transition-path based fluctuations is then given as:

$$\sigma_K^2 = \sigma_{KP1}^2 + \sigma_{KP2}^2 \tag{3}$$

This approach assumes that the delay sensitivity of a cell, $\sigma_K$, has major contributions from P1 and P2, and that devices N1 and N2, which are not on the transition path, are not significant contributors to the delay variation. This approach has the advantage that the number of variables considered for characterization of the delay variance of each timing arc is reduced. However, it can be quickly observed that, using this approach, the number of devices that need to be considered for each delay variance computation differs from one cell to another and from one timing arc to another. For example, in the case of the 2-input NOR cell, there are four timing arcs, {A(f) → X(r), A(r) → X(f), B(f) → X(r), B(r) → X(f)}, with devices {(P1, P2), (N1), (P1, P2), (N2)}, respectively, that need to be identified for each transition path. Effectively, the number of random variables that need to be considered for characterization is equal to the number of devices in the cell. Further, this approach ignores the contribution from the switching device that is not on the transition path (e.g., the pMOS (nMOS) device for an output falling (rising) transition). We show empirically in the following sections that this contribution is not negligible: all switching devices have a significant impact on the timing arc delay variance. Further, we derive an approach that guarantees an upper bound on the cell's delay sensitivity/variance.

Figure 3: Devices on transition path = {P1, P2} for A(f) → X(r)

3.2. Study of intra-cell delay variations

We performed variance analysis to study the impact of intra-cell mismatch variations on cell delay variance. Several Monte Carlo simulations were performed, both by varying one device at a time and by varying all devices randomly. The delay variance obtained by varying all devices randomly is treated as the baseline. The objective of this study is to determine the devices that contribute significantly to the intra-cell delay variance. The Monte Carlo settings are explained in detail below:

Case 1: Monte Carlo simulations were performed by varying all devices randomly, and the delay variance or standard deviation for each timing arc was captured. For a given local parameter, one Monte Carlo simulation is performed in this case. Let $\sigma_{all}$ be the delay sensitivity obtained, which forms the baseline simulation result for the cell's delay sensitivity.

Case 2: Separate Monte Carlo simulations were performed by varying one device at a time and keeping the other devices at nominal conditions. For each local parameter, the number of Monte Carlo simulations in this case is equal to the number of devices. For example, for a 2-input NOR cell and a single local parameter, say channel length, four separate Monte Carlo simulations are performed, one for each device. Let the delay standard deviations from these simulations be $\sigma_{Ni}$ for the ith nMOS device and $\sigma_{Pj}$ for the jth pMOS device.¹

¹ To simplify notation, all subscripts for parameter K on delay sensitivity are dropped from this section onwards.

Figure 4: Nor2 A → X: Impact of individual device fluctuations compared with baseline (panels: Nor2 A(f) → X(r) and Nor2 A(r) → X(f); bars: Case 1, n1, n2, p1, p2)

Monte Carlo simulation results for a 2-input NOR cell are illustrated in Figure 4 (for input pin A transitioning). Each bar chart in Figure 4 depicts the standard deviation result for Case 1 (left-most bar) as the baseline. The other four bars in each chart indicate the delay sensitivity results from the Case 2 simulations. That is, the bars "n1", "n2", "p1" and "p2" are the four Monte Carlo simulation results for Case 2, obtained by varying only one device, N1, N2, P1, and P2 respectively. When input A (B) is transitioning, the devices N1, P1 (N2, P2) connected to this input at the gate terminal are defined as switching devices. The remaining devices are termed non-switching devices. The Case 1 and Case 2 simulation results (illustrated in Figure 4) indicate that the sum of the delay variances obtained for each device fluctuation from Case 2 is almost equal to the delay variance from Case 1. This establishes the statistical independence of the delay variations due to each device fluctuation.
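The comparison between Case 1 and Case 2 amounts to checking that the per-device variances add up to the all-device variance. The sketch below performs that check on a toy model; it is an assumption-laden illustration, not the authors' setup: the delay model is linear by construction, the sensitivities in `SENS` are invented, and only the iteration count of 3000 mirrors the experiments reported in Section 4.

```python
import random
import statistics

# Hypothetical delay sensitivities (ps per sigma) for a Nor2-like cell.
SENS = {"n1": 3.0, "n2": 0.2, "p1": 1.5, "p2": 0.1}

def mc_sigma(active, iterations=3000, seed=1):
    """Delay standard deviation when only the devices in `active` fluctuate."""
    rng = random.Random(seed)
    delays = []
    for _ in range(iterations):
        # Each active device gets its own independent N(0, 1) draw.
        delays.append(50.0 + sum(SENS[d] * rng.gauss(0.0, 1.0) for d in active))
    return statistics.pstdev(delays)

# Case 1: all devices vary together in one Monte Carlo run (baseline).
sigma_all = mc_sigma(list(SENS))

# Case 2: one Monte Carlo run per device, the others held at nominal.
sigma_each = {d: mc_sigma([d], seed=10 + i) for i, d in enumerate(SENS)}

# If the contributions are independent, this ratio should be close to 1.
ratio = sum(s * s for s in sigma_each.values()) / sigma_all ** 2
```

Because the toy model draws each device independently, the ratio is close to 1 by construction; with a real simulator the same ratio would be the quantity to inspect.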

In order to determine the significant components of delay variance for each timing arc, we define two types of sensitivity.

• Direct sensitivity: the ratio of the delay variation determined from Case 2 for each device fluctuation to that from Case 1. For nMOS devices, this can be represented as $S_{Ni} = \sigma_{Ni} / \sigma_{all}$.

• Cluster sensitivity: A cluster is a set of devices of the same type (either nMOS or pMOS). Cluster sensitivity is the ratio of the delay variation due to each device fluctuation to the variation of all devices in a single cluster. For nMOS devices, this sensitivity is computed as $C_{Ni} = \sigma_{Ni} / \sqrt{\sum_i \sigma_{Ni}^2}$.

The above sensitivity relations can be similarly determined for pMOS devices. Direct sensitivity gives a measure of the significant contributors to a timing arc's overall delay variation. Cluster sensitivity indicates which device fluctuation is the significant contributor within a given cluster. Figure 5 illustrates the direct and cluster sensitivities computed for each device of the Nor2 cell. The observations from the analysis of these results are described in the following sub-sections.

Figure 5: Direct and Cluster Sensitivities Nor2 A, B(r) → X(f) (left panel: Direct Sensitivity; right panel: Cluster Sensitivity; bars n1, n2, p1, p2 for A(r) → X(f) and B(r) → X(f))
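The two sensitivity definitions can be written out directly. In the sketch below the per-device standard deviations are invented for illustration, and the baseline `sigma_all` is taken as the root-sum-square of the per-device values, which assumes independence; in the study it would come from the Case 1 Monte Carlo run instead.

```python
import math

def direct_sensitivity(sigma_dev, sigma_all):
    """S_d = sigma_d / sigma_all for every device d."""
    return {d: s / sigma_all for d, s in sigma_dev.items()}

def cluster_sensitivity(sigma_dev, cluster):
    """C_d = sigma_d / sqrt(sum of sigma^2 over the devices in the cluster)."""
    rss = math.sqrt(sum(sigma_dev[d] ** 2 for d in cluster))
    return {d: sigma_dev[d] / rss for d in cluster}

# Hypothetical Case 2 results (ps); n1 and p1 play the role of switching devices.
sigma_dev = {"n1": 3.0, "n2": 0.2, "p1": 1.5, "p2": 0.1}
# Stand-in for the Case 1 baseline, assuming independent contributions.
sigma_all = math.sqrt(sum(s * s for s in sigma_dev.values()))

S = direct_sensitivity(sigma_dev, sigma_all)
C = cluster_sensitivity(sigma_dev, ["n1", "n2"])      # nMOS cluster
C.update(cluster_sensitivity(sigma_dev, ["p1", "p2"]))  # pMOS cluster
```

By construction the squared cluster sensitivities within one cluster sum to 1, so a value of $C^2$ close to 1 for one device means that device dominates its cluster.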

3.2.1. Delay sensitivity to switching device fluctuations

The switching device on the transition/conducting path accounts for a significant portion of the baseline result. For a falling output transition, a switching nMOS device is the primary contributor, while the primary contributor for a rising transition is a switching pMOS device. Also, the impact of switching devices that are not on the transition path is not insignificant. Consider the A(r) → X(f) timing arc illustrated in Figure 4. In this case, the device on the transition path is N1. The switching devices are N1 and P1. The combined delay variance due to these switching devices comprises more than 96% of the total delay variance obtained from the Case 1 baseline results. That is, $S_{N1}^2 + S_{P1}^2 > 0.96$. Further, direct sensitivity analysis shows that the N1 device fluctuation has the largest sensitivity (see the left chart in Figure 5). Cluster sensitivity analysis for the same timing arc (right chart in Figure 5) shows that switching device fluctuations account for more than 98% of the sensitivity within each cluster. That is, for timing arc A(r) → X(f), $C_{N1}^2 > 0.98$ and $C_{P1}^2 > 0.98$. Hence, switching devices are the most significant contributors.

Figure 6: Impact of switching device fluctuations (A(r) → X(f): delay variance is most sensitive to mismatch variations of switching devices)

3.2.2. Delay sensitivity to non-switching device fluctuations

The impact of non-switching devices, whether on the transition path or not, is negligible. For example, consider timing arc A(r) → X(f). In this case, the non-switching devices are N2 and P2. The nMOS device N2 is not on the transition path, while P2 lies on the transition path. Both N2 and P2 exhibit very small direct and cluster sensitivities, and their contributions to the delay variations can be ignored. That is, $C_{N1} \gg C_{N2}$ within the nMOS cluster, and $C_{P1} \gg C_{P2}$ within the pMOS cluster. When compared with the contributions of switching devices within a cluster, the impact of non-switching device fluctuations is very small and can be neglected for practical purposes.

Figure 7: Impact of non-switching device fluctuations (mismatch variations of non-switching devices have negligible impact on A(r) → X(f) delay variance)

3.2.3. Intra-cell delay correlations

For each output transition, there can be correlation in delay between the timing arcs due to common device fluctuations. However, the direct and cluster sensitivities show that such correlation is negligible. From Sections 3.2.1 and 3.2.2, it can be observed that the set of significant contributors for a given timing arc does not overlap with that for other timing arcs. For example, consider Nor2. If output pin X has a falling transition, then A(r) → X(f) and B(r) → X(f) can exhibit correlation. However, the set of significant contributors {N1, P1} for A(r) → X(f) does not overlap with the set of significant contributors {N2, P2} for B(r) → X(f). Thus, if the impact of non-switching device fluctuations is neglected, no common devices contribute to the delay variance of both A(r) → X(f) and B(r) → X(f), resulting in negligible correlation between their delay variations. Note that this is the case only for mismatch/local-random variations; such correlation between timing arcs cannot be neglected when considering global variations and/or spatially dependent variations in parameters.

3.3. Proposed Approach: Clustering-based intra-cell variability

The proposed approach takes advantage of the observations made in the previous section. The following properties are derived from the analysis in Sections 3.2.1 and 3.2.2:


Property I: Variations in switching devices, both on the transition path and off the transition path, are significant contributors to intra-cell delay variations.

Property II: The impact of variations in non-switching devices is small and can be neglected.

We take advantage of these two properties and propose a new approach to characterizing cell delay variance as follows. The basic idea is to group all devices on the nMOS and pMOS stacks separately, resulting in two clusters for each cell, and then to assign fluctuations or random variables to the cluster instead of to each device. This is equivalent to mapping any combinational cell to an inverter-like structure (see Figure 8). Since a cell is characterized for one input switching at a time, each cluster has one switching device for a given timing arc. Within a cluster, the delay variance is most sensitive to the switching device, and the non-switching devices have negligible contribution. As a result, the delay variation computed for the cluster random variable is approximately the same as that for the switching device. The delay variations thus derived for the nMOS and pMOS clusters are statistically combined to give the cell's total delay variation due to intra-cell mismatch variations. This cluster-based approach is explained in detail below.

Figure 8: Clustering of nMOS/pMOS stack: Equivalent to an inverter with two delay variables

Let $\Delta K_{Ni}$, $\Delta K_{Pj}$ be the random variables corresponding to K for each nMOS device Ni and pMOS device Pj in the cell, respectively. Assign random variables $\Delta K_n$ ($\Delta K_p$) to the nMOS (pMOS) cluster for each local parameter of variation, K. That is, every device in the nMOS (pMOS) cluster is assigned the same random variable, $\Delta K_n$ ($\Delta K_p$). Consider $\Delta K_n$ ($\Delta K_p$) to be standard normal, $N(0, 1)$. Let $\Delta D_n$ ($\Delta D_p$) be the cell's delay variables due to $\Delta K_n$ ($\Delta K_p$), as illustrated in Figure 8. Assuming linearity, $\Delta D_n$ and $\Delta D_p$ are also Gaussian, with distributions $N(0, \sigma_n)$ and $N(0, \sigma_p)$ respectively. Since all nMOS device fluctuations are varied together for the cluster, the delay sensitivity for the nMOS cluster can be given as:

$$\sigma_n = \sum_i \frac{\partial D}{\partial k_{ni}} \cdot \Delta k_{ni}$$

Since $\Delta k_{ni}$ is $N(0, 1)$, the above equation can be rewritten as:

$$\sigma_n = \sum_i \sigma_{ni} \tag{4}$$

Similarly, for pMOS cluster, the delay sensitivity is given as:

$$\sigma_p = \sum_j \sigma_{pj}$$

Consider each device to be single-fingered (the handling of multiple fingers is explained in the following sub-section). For single-input switching, there is a single nMOS and a single pMOS switching device in a typical CMOS combinational cell. Let the index be i = 1 (j = 1) for the nMOS (pMOS) switching device within the nMOS (pMOS) cluster. All other devices within the cluster are non-switching. Dividing by the statistical (root-sum-square) sum of the delay sensitivities due to individual device fluctuations, equation (4) can be rewritten as:

$$\frac{\sigma_n}{\sqrt{\sum_i \sigma_{ni}^2}} = C_{n1} + \sum_{i,\, i \neq 1} C_{ni} \tag{5}$$

Using the cluster-sensitivity analysis from Section 3.2, the cluster sensitivity of the switching device, $C_{n1}$, is significantly larger than the cluster sensitivities of the non-switching devices; hence, the last term in the above equation is negligible. Equation (5) can be rewritten as:

$$\frac{\sigma_n}{\sqrt{\sum_i \sigma_{ni}^2}} \approx C_{n1} \;\Rightarrow\; \sigma_n \cong \sigma_{n1} \tag{6}$$

Similarly, using cluster sensitivity analysis, the delay sensitivity of the pMOS cluster is given as:

$$\frac{\sigma_p}{\sqrt{\sum_j \sigma_{pj}^2}} \approx C_{p1} \;\Rightarrow\; \sigma_p \cong \sigma_{p1} \tag{7}$$

Rewriting the cell's delay sensitivity relation in equation (2) by grouping the sensitivities for switching and non-switching devices gives:

$$\sigma^2 = \sigma_{n1}^2 + \sigma_{p1}^2 + \sum_{i,\, i \neq 1} \sigma_{ni}^2 + \sum_{j,\, j \neq 1} \sigma_{pj}^2$$

Using Properties I and II in the above equation and combining with equations (6) and (7), the above can be rewritten as:

$$\sigma^2 \approx \sigma_{n1}^2 + \sigma_{p1}^2 \cong \sigma_n^2 + \sigma_p^2 \tag{8}$$

Thus, by grouping all nMOS devices into an nMOS cluster and, similarly, all pMOS devices into a pMOS cluster, the cell's delay sensitivity can be determined by computing just the delay sensitivities of these two clusters.
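The clustering idea can be sketched as two perturbation runs per local parameter, one per cluster, combined as in equation (8). The code is a sketch under explicit assumptions: `toy_simulate` is a hypothetical linear delay model rather than a circuit simulator, the sensitivities are invented, and a forward finite difference with a +1 sigma step is assumed.

```python
import math

# Hypothetical per-device sensitivities (ps per sigma); n1 and p1 are switching.
TOY_SENS = {"n1": 3.0, "n2": 0.2, "p1": 1.5, "p2": 0.1}
CLUSTERS = {"n": ["n1", "n2"], "p": ["p1", "p2"]}

def toy_simulate(deltas):
    """Made-up linear delay model; `deltas` maps device name to offset in sigma."""
    return 50.0 + sum(TOY_SENS[d] * deltas.get(d, 0.0) for d in TOY_SENS)

def cluster_sigma(simulate, clusters):
    """Cluster-based characterization: one run per cluster plus one nominal run."""
    nominal = simulate({})
    runs = 1
    var = 0.0
    for devices in clusters.values():
        # Every device in the cluster shares one random variable, so all of
        # them are shifted by +1 sigma together.
        shift = {d: 1.0 for d in devices}
        var += (simulate(shift) - nominal) ** 2   # sigma_n^2 or sigma_p^2
        runs += 1
    return math.sqrt(var), runs

sigma_cluster, runs = cluster_sigma(toy_simulate, CLUSTERS)
# Per-device (simple approach) value for comparison, equation (2).
sigma_exact = math.sqrt(sum(s * s for s in TOY_SENS.values()))
```

In this toy model the cluster estimate is slightly larger than the per-device value and needs three runs regardless of how many devices each cluster contains.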

3.3.1. Handling Multi-fingered devices

Each transistor may have multiple fingers for several reasons, e.g., folding performed during cell layout or the handling of very wide transistors. When a transistor within a cluster has multiple fingers, all of its fingered devices are connected to the same input pin. If this pin is transitioning or switching, then all these fingered devices are switching devices. Hence, all these fingered device fluctuations need to be accounted for. In the simple approach, a multi-fingered transistor is handled by treating each finger as a separate device. Since all fingers have similar geometry, the cell's delay sensitivity due to each fingered device fluctuation is almost the same. This property is used to handle multiple fingers. Let $f_n$ ($f_p$) be the number of fingers in the nMOS (pMOS) cluster for the chosen timing arc. Using Properties I and II, the nMOS cluster delay sensitivity in equation (4) can be extended for fingered devices as follows:

$$\sigma_n \approx \sum_{f=1}^{f_n} \sigma_{nf} \tag{9}$$

where $\sigma_{nf}$, for f = 1, ..., $f_n$, is the delay sensitivity for each fingered device. Since the impact of each fingered device is equal, $\sigma_{nf}$ is the same and equal to $\sigma_{n1}$ for all $f_n$ devices. So equation (9) can be rewritten as:

$$\sigma_n = f_n \cdot \sigma_{n1} \tag{10}$$


Similar equations can be derived for the pMOS cluster with switching fingered devices. Extending equation (2) from the simple approach to fingered devices, the cell's delay variance can be rewritten as:

$$\sigma^2 = \sum_{f=1}^{f_n} \sigma_{nf}^2 + \sum_{f=1}^{f_p} \sigma_{pf}^2 = f_n \cdot \sigma_{n1}^2 + f_p \cdot \sigma_{p1}^2$$

Using equation (10) in the above equation, the cell's delay variance when the timing arc has multi-fingered devices can be given as:

$$\sigma^2 = \frac{\sigma_n^2}{f_n} + \frac{\sigma_p^2}{f_p} \tag{11}$$

Thus, for multi-fingered devices, the grouping of a cell into two clusters is still performed. Then, the resulting sensitivity for each cluster is scaled down by the square root of the number of fingers corresponding to the timing arc. The runtime complexity remains the same as that for single-fingered devices.
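The finger correction of equation (11) can be checked numerically against the per-finger sum it is derived from. The sketch below assumes, as the text does, that every finger of a switching transistor has the same sensitivity; the function name and the per-finger values are hypothetical.

```python
import math

def multifinger_sigma(sigma_n, sigma_p, f_n, f_p):
    """Equation (11): sigma^2 = sigma_n^2 / f_n + sigma_p^2 / f_p."""
    return math.sqrt(sigma_n ** 2 / f_n + sigma_p ** 2 / f_p)

# Hypothetical per-finger sensitivities (ps per sigma) and finger counts.
sigma_n1, sigma_p1 = 1.0, 0.5
f_n, f_p = 2, 4

# Cluster sensitivities as the cluster run would report them, equation (10).
sigma_n = f_n * sigma_n1
sigma_p = f_p * sigma_p1

# Reference: treat every finger as an independent device (simple approach).
per_finger = math.sqrt(f_n * sigma_n1 ** 2 + f_p * sigma_p1 ** 2)
corrected = multifinger_sigma(sigma_n, sigma_p, f_n, f_p)
```

Under the equal-finger assumption the corrected cluster value matches the per-finger reference, while the uncorrected cluster sum would overestimate it.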

3.3.2. Handling Multiple DCCCs within Cell

A DCCC is a direct channel connected component. A simple cell in the library is typically a single DCCC; e.g., a NAND, a NOR or an inverter (INV) is a cell whose devices are directly channel connected and form a single DCCC. However, the library may have cells with more than one DCCC, for example a simple buffer (BUFF) cell, which consists of two chained INVs and hence has two DCCCs. When a cell has multiple DCCCs, the proposed clustering approach is performed for each DCCC separately. The delay sensitivity for the cell's timing arc is then computed from the delay sensitivities obtained for each cluster in each DCCC. That is, if there are q DCCCs in a cell, the cell's delay sensitivity can be derived from the following relation:

$$\sigma^2 = \sum_q \left\{ \sigma_{nq}^2 + \sigma_{pq}^2 \right\} \tag{12}$$
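Equation (12) is a sum of cluster variances over all DCCCs, which can be written as a short helper. The function name `cell_sigma` and the example values for a two-stage buffer are hypothetical.

```python
import math

def cell_sigma(dccc_sigmas):
    """Equation (12): combine (sigma_n, sigma_p) pairs, one pair per DCCC."""
    return math.sqrt(sum(sn * sn + sp * sp for sn, sp in dccc_sigmas))

# Hypothetical buffer made of two inverter stages (two DCCCs), values in ps.
buffer_sigma = cell_sigma([(3.0, 0.0), (0.0, 4.0)])
```

The number of cluster sensitivities grows with the number of DCCCs in this formulation, but it still does not depend on how many devices each DCCC contains.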

3.3.3. Clustering results in an upper bound

The proposed clustering-based approach results in an upper bound on the delay sensitivity. Due to clustering, all devices within the nMOS cluster (and likewise within the pMOS cluster) vary in the same direction and are fully correlated. Hence, the delay variance derived from the nMOS and pMOS cluster delay variances is greater than or equal to the sum of the delay variances due to each device fluctuation. That is:

$$\sigma_n^2 + \sigma_p^2 = \left( \sum_i \sigma_{ni} \right)^2 + \left( \sum_j \sigma_{pj} \right)^2 \geq \sum_i \sigma_{ni}^2 + \sum_j \sigma_{pj}^2$$

Thus, the cluster-based characterization results in an upper bound on the cell's delay variance. This is typically very useful for timing analysis flows that require the delay sensitivity estimates to be pessimistic.
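The upper-bound inequality follows from expanding the squared sum for each cluster. The short derivation below spells out that step for the nMOS cluster, under the assumption (implied by the text's statement that all devices in a cluster vary in the same direction, but not written out there) that every per-device sensitivity $\sigma_{ni}$ is non-negative.

```latex
\begin{aligned}
\sigma_n^2 = \Bigl(\sum_i \sigma_{ni}\Bigr)^2
  &= \sum_i \sigma_{ni}^2 \;+\; 2\sum_{i < i'} \sigma_{ni}\,\sigma_{ni'} \\
  &\ge \sum_i \sigma_{ni}^2
  \qquad \text{since } \sigma_{ni}\,\sigma_{ni'} \ge 0 \text{ for all } i, i'.
\end{aligned}
```

The same expansion applies to the pMOS cluster, and adding the two results gives the inequality for $\sigma_n^2 + \sigma_p^2$. Equality holds when at most one device per cluster has a non-zero sensitivity, which is the limiting case of Property II.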

Delay sensitivity characterization due to intra-cell variations can be performed by characterizing the delay sensitivities of the nMOS and pMOS clusters in a combinational cell. Irrespective of the number of devices in the cell, the number of clusters is constant and equal to two. Thus, the runtime complexity for characterizing n local physical parameters of variation is O(2n). The advantage of the proposed cluster-based technique is that a constant number of simulations is required for every cell, independent of the number of devices in the cell, unlike the methods described in Sections 3.1.1 and 3.1.2, whose cost depends on the number of devices in the cell. The above formulations for accounting for intra-cell mismatch variations were validated using Monte Carlo simulations on simple, multi-DCCC, and multi-fingered cells. The results and discussion are presented in the following section.
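The difference in simulation count can be illustrated with a simple cost model. The sketch below assumes one characterization run per device per parameter for the per-device approach, which is an assumption made here for illustration, and two runs per parameter (one per cluster) for the clustering approach, as stated above. The function names are hypothetical.

```python
def per_device_runs(num_devices, num_params):
    # Assumed cost model: one run per device for each local parameter.
    return num_devices * num_params


def cluster_runs(num_params, clusters_per_cell=2):
    # Clustering: one run per cluster (nMOS, pMOS) for each local parameter.
    return clusters_per_cell * num_params


num_params = 2  # e.g. channel length and threshold voltage
for num_devices in (4, 8, 16):
    ratio = per_device_runs(num_devices, num_params) / cluster_runs(num_params)
    print(f"{num_devices} devices: {ratio:.0f}X fewer runs with clustering")
```

Under this model the saving grows linearly with the device count, while the clustering cost stays fixed at 2n runs.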

4. Experimental Setup and Results

The proposed clustering-based approach has been implemented in an industrial digital library characterization engine. Statistical characterization of each timing arc, using the cluster-based technique described in the previous section, was performed for all the cells. The delay sensitivity for each cluster was computed using a finite-difference method. To highlight the effectiveness of the proposed approach, we consider example cells of different types: different device stack configurations, different numbers of DCCCs, different numbers of nMOS/pMOS fingers, etc.

For the experiments, two local parameters were chosen: ΔL_l, corresponding to effective channel length, and ΔVt_l, corresponding to threshold voltage. The 1-sigma variation of each parameter was set to 3% of its nominal value. Characterization was carried out by varying one local parameter at a time, while keeping all other parameters at their nominal values and the global variations set to zero. Results are illustrated on several SOI and Bulk cells in 65nm technology.
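A finite-difference sensitivity computation of this kind can be sketched as follows. The text does not state which difference scheme was used, so the central difference here is an assumption; `toy_delay` is a hypothetical stand-in for a circuit simulation run and is not a model from this work.

```python
def cluster_delay_sigma(simulate_delay, nominal, rel_sigma=0.03):
    """Delay change for a one-sigma shift of a single cluster parameter.

    simulate_delay: callable mapping a parameter value to a delay
                    (a stand-in for one circuit simulation).
    nominal:        nominal value of the parameter.
    rel_sigma:      one-sigma variation as a fraction of nominal (3% here).
    """
    step = rel_sigma * nominal
    # Central finite difference around the nominal point (assumed scheme).
    slope = (simulate_delay(nominal + step) - simulate_delay(nominal - step)) / (2.0 * step)
    return abs(slope) * step


def toy_delay(channel_length_nm):
    # Hypothetical linear delay model in ps, for illustration only.
    return 10.0 + 0.2 * channel_length_nm


sigma_n = cluster_delay_sigma(toy_delay, nominal=65.0)
print(f"cluster delay sigma = {sigma_n:.2f} ps")
```

Each call costs two simulations per cluster and parameter; a one-sided difference against a shared nominal run would reduce this at some cost in accuracy.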

Monte Carlo simulations for each timing arc were performed using the method described in Section 3.1, Case 1, for error comparison. The variance, and hence the standard deviation, was obtained from 3000 iterations within each Monte Carlo simulation. The standard deviations from the Monte Carlo simulations were then compared with those obtained from the proposed clustering-based approach.
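The Monte Carlo baseline can be sketched in the same spirit. The example below assumes a linear delay model in which each device contributes an independent Gaussian delay shift; this model, the function name, and the values are hypothetical, while the iteration count of 3000 follows the text.

```python
import random
import statistics


def monte_carlo_delay_sigma(device_sigmas, nominal_delay, iterations=3000, seed=0):
    """Estimate the delay standard deviation by sampling device mismatch.

    device_sigmas: per-device delay sigmas (assumed linear, independent).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    delays = []
    for _ in range(iterations):
        # Every device receives its own independent standard-normal draw.
        shift = sum(s * rng.gauss(0.0, 1.0) for s in device_sigmas)
        delays.append(nominal_delay + shift)
    return statistics.stdev(delays)


# Two hypothetical devices; the exact sigma is sqrt(0.3**2 + 0.4**2) = 0.5 ps.
estimate = monte_carlo_delay_sigma([0.3, 0.4], nominal_delay=10.0)
print(f"Monte Carlo sigma estimate = {estimate:.3f} ps")
```

With 3000 samples the relative sampling error of the estimated standard deviation is on the order of 1%, which should be kept in mind when reading small error percentages.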

Table 1 shows results for SOI library cells in 65nm technology considering channel length mismatch variations, while Table 2 shows results for Bulk cells in 65nm technology considering threshold voltage mismatch variations. The column “Type” describes the cell: the number of DCCCs within the cell, the number of nMOS fingers, fn, and the number of pMOS fingers, fp. Note that the number of fingers for the transistors connected to each input pin may vary; only the fingers corresponding to the input pin of the chosen timing arc are given. The column “Max Error (transition type)” shows the percentage error of the proposed approach and the output transition that produces the maximum error. The error is computed with the Monte Carlo simulation as the baseline, to quantify the accuracy of the proposed approach. The transition type can be either falling (F) or rising (R). The last column gives the runtime improvement factor relative to the simple approach described in Section 3.1.1. The runtime improvement relative to the Monte Carlo approach is ≈ 1500X for all cells and is not shown in the tables.

Consider, for example, the results in Table 1 for the 2-input NAND cell. This cell has a single DCCC, with two nMOS and two pMOS fingers. The maximum error for this cell is 3.6%, and it corresponds to the timing arc that produces a falling (F) output transition. The maximum error is within 4% for all the SOI cells and within 12% for all the Bulk cells. The runtime improvement for the proposed technique is as much as 12X, and it is largest for multi-DCCC and multi-fingered cells. The clustering-based approach thus achieves a large runtime advantage with an acceptable loss of accuracy.
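The error figure for this example can be reproduced from the tabulated standard deviations. The sketch below assumes that the error is the relative difference of the clustering estimate with respect to the Monte Carlo baseline; the function name is hypothetical, and the two values are the NAND-2input entries of Table 1.

```python
def percent_error(mc_sigma, clustering_sigma):
    """Relative error of the clustering estimate against the Monte Carlo baseline."""
    return 100.0 * (clustering_sigma - mc_sigma) / mc_sigma


# NAND-2input, SOI 65nm (Table 1): MC 0.225 ps, clustering 0.233 ps.
nand2_error = percent_error(0.225, 0.233)
print(f"{nand2_error:.1f}%")  # prints 3.6%
```

A positive value indicates that the clustering estimate lies above the Monte Carlo result, in line with the upper-bound property of the approach.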

It can be observed that the error is largest for the threshold voltage variations, specifically for the transition type that is controlled by series devices (e.g., the falling transition for NAND). The reason is that the threshold voltage variation of each device depends on the effective resistance of the source and drain regions as well as on the channel’s effective gate length and width. For series devices, this causes the variations in threshold voltage to become correlated. Typically, accurate silicon characterization is performed to capture such correlations, which are then embedded within the SPICE models. This correlation factor, however, was not captured in the SPICE model used and is not handled in the clustering approach. Research to handle such correlations is part of our future work. Further, the proposed approach addresses only combinational cells. Faster techniques to handle sequential cells are also part of future work.

Table 1: Clustering-based approach vs. Monte Carlo results for SOI-65nm cells with channel length mismatch variations

| Cell | Type | MC StdDev (ps) | Clustering StdDev (ps) | Max Error (transition type) | Runtime improvement |
|---|---|---|---|---|---|
| NAND-2input | 1 DCCC; fn=2, fp=2 | 0.225 | 0.233 | 3.6% (F) | 4X |
| NOR-2input | 1 DCCC; fn=1, fp=1 | 0.322 | 0.336 | 4.0% (R) | 2X |
| BUFFER | 2 DCCC; fn1=8, fp1=8, fn2=4, fp2=4 | 0.179 | 0.184 | 2.8% (F) | 8X |
| INV | 1 DCCC; fn=8, fp=8 | 0.106 | 0.109 | 2.8% (F) | 8X |
| NAND-Chain | 2 DCCC; fn1=1, fp1=1, fn2=1, fp2=1 | 0.318 | 0.328 | 3.1% (R) | 2X |

Table 2: Clustering-based approach vs. Monte Carlo results for Bulk-65nm cells with threshold voltage mismatch variations

| Cell | Type | MC StdDev (ps) | Clustering StdDev (ps) | Max Error (transition type) | Runtime improvement |
|---|---|---|---|---|---|
| NAND-2input | 1 DCCC; fn=1, fp=1 | 5.40 | 6.04 | 11.9% (F) | 2X |
| NOR-2input | 1 DCCC; fn=1, fp=1 | 3.53 | 3.95 | 11.9% (R) | 2X |
| BUFFER | 2 DCCC; fn1=1, fp1=1, fn2=1, fp2=1 | 3.90 | 3.97 | 1.8% (F) | 2X |
| INV | 1 DCCC; fn=1, fp=1 | 4.76 | 4.86 | 2.1% (F) | 1X |
| AND-2input | 2 DCCC; fn1=1, fp1=1, fn2=1, fp2=1 | 2.88 | 3.05 | 5.9% (R) | 2X |
| AND-3input | 2 DCCC; fn1=2, fp1=2, fn2=4, fp2=4 | 1.40 | 1.51 | 8.0% (R) | 12X |
| AOI | 1 DCCC; fn=4, fp=4 | 1.20 | 1.33 | 10.8% (F) | 8X |

5. Conclusions

In this paper, we proposed a clustering-based approach to account for intra-cell mismatch variations when characterizing a cell’s delay variations. The same approach can be used for characterizing cells with multiple DCCCs. It also extends easily to multi-fingered devices while keeping the runtime the same as that for a single-fingered device. As a result, the clustering technique achieves significant runtime improvement for all cells in the library. Further, the clustering-based approach always has a constant number of clusters for each cell and hence a constant runtime, independent of the number of devices in the cell. The proposed approach estimates a higher sensitivity (an upper bound), which is desirable for timing analysis algorithms.

References[1] S. Bhardwaj, P. Ghanta, and S. Vrudhula. A framework for

statistical timing analysis using non-linear delay and slewmodels. Proceedings of International Conference of Com-puter Aided Design, pages 225–230, Nov. 2006.

[2] P. G. Drennan. Understanding mosfet mismatch for analogdesign. IEEE Journal of Solid-State Circuits, 38(3):450–456,Mar. 2003.

[3] K. Okada, K. Yamaoka, and H. Onodera. A statistical gate-delay model considering intra-gate variability. Proceedingsof International Conference on Computer Aided Design (IC-CAD), pages 908–913, Nov. 2003.

[4] K. Okada, K. Yamaoka, and H. Onodera. A statistical gatedelay model for intra-chip and inter-chip variabilities. Pro-ceedings of Asia South Pacific Design Automation Conference(ASPDAC), Jan. 2003.

[5] M. Qu and M. Styblinski. Statistical characterization andmodeling of analog functional blocks. IEEE InternationalSymposium on Circuits and Systems (ISCAS), pages 121–124,May-June 1994.
