[IEEE 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) - San Jose, CA, USA...

Accelerated Statistical Simulation via On-demand Hermite Spline Interpolations

Rouwaida Kanj, 1Tong Li, 2Rajiv Joshi, Kanak Agarwal, 1Ali Sadigh, 1David Winston, Sani Nassif

IBM Austin Research Labs, Austin TX

1IBM Systems and Technology Group, Austin TX

2IBM TJ Watson Research Labs, Yorktown Heights NY

Abstract

We propose an efficient Hermite spline-based SPICE simulation methodology for accurate statistical yield analysis. Unlike conventional methods, the spline-based transistor tables are built on demand, specific to the transient simulation requirements of the statistical experiments. Compared with traditional MOSFET table models, on-demand spline table models use ~500X less memory. This makes Hermite spline-based table models practical for use in simulations for process variation modeling. Furthermore, we propose an efficient gate voltage offset approach to model transistor threshold voltage variation. In this scenario, evaluations of the transistor model rely on a single reference table and require one set of spline function evaluations per VT sample point, as opposed to two or more sets for VT interpolation. The method is comprehensive and the results are in excellent agreement with traditional BSIM-based simulations. The speedup of around 4X, which includes the table generation cost, could be further improved by employing other fast-SPICE techniques or parallelism. To the best of our knowledge, this is the first time such a methodology has been coupled with importance sampling techniques to study the yield of memory designs.

Categories and Subject Descriptors B7.2 [Hardware]: Integrated Circuits – Design aids

General Terms Algorithms, Performance, Verification

Keywords Simulation, SRAM, statistical yield analysis, Hermite splines

1. Introduction

With deep technology scaling, statistical simulations play a key role in the analysis of state-of-the-art memory and logic designs. Threshold voltage variation due to random dopant fluctuations, bias temperature instability variations, and random telegraph noise impacts the design yield [1, 2]. Device mismatch due to threshold voltage variation can lead to significant degradation of memory designs. This poses a challenge for designers, who often have to deal with the tight constraint of very low fail probabilities of less than one part per million (5 sigma designs).

To analyze the statistical behavior of a design, traditional Monte Carlo techniques or fast variance reduction methods invoke transistor-level simulations [3, 4]. For statistical timing analysis and library characterization, fast simulators like ACES [5] or current source models [6] are used to replace the more accurate SPICE simulators like SPICE3, SPECTRE, HSPICE, etc. [7, 8, 9] for speed reasons. However, for memory structures like SRAM cells the statistical analysis is very sensitive to the accuracy of the underlying models (including variability and mismatch) and requires traditional SPICE simulation accuracy. With millions of cells on today's chips, a few failing cells can lead to memory yield loss. Hence it is an ongoing pursuit to target designs with extremely low fail probabilities. This involves analyzing meta-stable conditions of operation that are highly nonlinear. As a result, traditional fast-SPICE simulators, which trade accuracy for speed (either via simplified device models, relaxed error tolerances, pre-characterized models or other approximate techniques), do not have sufficient accuracy to estimate low fail probabilities of SRAM cells with good confidence.

In this paper, we propose an on-demand Hermite spline-based statistical SPICE simulation flow. Spline tables are generated on the fly during BSIM model evaluations and then reused as necessary during the present or subsequent statistical simulations to dramatically reduce the cost per device evaluation. Given the nature of dynamic margin analysis [10] that is typically used for statistical simulations, a typical transient simulation spans a small portion of the spline tables. This results in a substantial advantage in size and computational effort to create on-demand tables compared to fully populated ones. The reduced table sizes in turn allow for a more thorough modeling of process variations over a wider range of variability.

Threshold voltage (VT) mismatch has been shown to have the most dominant effect on the memory yield. In our approach, threshold voltage variation is modeled by the proposed gate voltage (VGS) offset method, which greatly improves the efficiency of the methodology. On-demand spline functions are constructed at reference threshold voltage instances. The behavior of a device instance at a given threshold voltage sample point can be derived from the corresponding reference threshold voltage table using the proper gate voltage offset, as will be described in Section 2. This is in contrast to the two reference tables typically needed for interpolating process variation effects. Hence, the computational cost for a given bias point is reduced to one spline function evaluation (at the reference table with the proper gate shift) instead of two. Such savings are key for small to medium sized circuits, where the spline evaluations can be as costly as, or more costly than, the rest of the simulation.

Finally, we rely on the proposed methodology to study the yield of SOI SRAM designs using an importance sampling-based technique, and the results are in excellent agreement with the corresponding golden SPICE-based approach. The on-demand spline-based approach allows for a runtime improvement of ~4X while accommodating up to six sigma VT variations. In addition, the total size of the Hermite spline-based tables for the on-demand method is 500X smaller than that of fully populated tables (both using the VGS offset method). The paper is organized as follows. Section 2 presents the VGS offset method to model VT. Section 3 introduces the Hermite spline-based table lookup approach. Section 4 provides a review of SRAM dynamic margins. Section 5 presents the results, followed by concluding remarks in Section 6.

978-1-4577-1400-9/11/$26.00 ©2011 IEEE

2. Modeling VT Variation by VGS Offset

VT variation in devices can be modeled by offsetting the gate voltage of the nominal VT device. This observation is based on the fact that the drain current (IDS) of a device is a function of voltage overdrive (VGS – VT) and not of VGS and VT separately. To a first order, the drain-to-source current of a MOS device can be expressed as:

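$$
I_{DS} =
\begin{cases}
I_0 \dfrac{W}{L}\, e^{(V_{GS}-V_T)/S} \left(1 - e^{-V_{DS}/(kT/q)}\right), & V_{GS} < V_T \quad \text{(Cut-off)} \\[2ex]
k' \dfrac{W}{L} \left( (V_{GS}-V_T)\,V_{DS} - \dfrac{V_{DS}^2}{2} \right), & V_{DS} < V_{DSAT} \quad \text{(Linear)} \\[2ex]
k' \dfrac{W}{L}\, (V_{GS}-V_T)^{\alpha} \left(1 + \lambda V_{DS}\right), & V_{DS} > V_{DSAT} \quad \text{(Saturation)}
\end{cases}
\tag{1}
$$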

Note that in Equation 1, the VGS and VT terms always appear as (VGS – VT). Therefore, to simulate a device with VT = VT,nom-ΔVT, we can approximate the behavior of the modified VT device by applying a constant VGS offset of ΔVT to the nominal device model table. This offset in gate voltage replicates the effect of VT variation on device performance by impacting voltage overdrive VGS–VT in the same way as equivalent VT variation. The modified device characteristics for a given VT variation can therefore be obtained from the device characteristics of the nominal device in the following manner:

$$
I_{DS}(V_{T,nom}) = f(V_{GS}, V_{DS}) \;\Rightarrow\; I_{DS}(V_{T,nom} - \Delta V_T) = f(V_{GS} + \Delta V_T, V_{DS}) \tag{2}
$$
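As a quick numerical illustration of Equation 2, the sketch below uses a toy first-order current model in the spirit of Equation 1. All parameter values (k', W/L, λ, α, I0, S, kT/q, and the choice VDSAT = VGS – VT) and both function names are assumptions made for illustration; they are not taken from this work or from any BSIM model. Because the toy model depends on VGS and VT only through VGS – VT, the shifted-gate evaluation agrees with the direct evaluation up to floating-point rounding; for a real device model the agreement is only approximate.

```python
import math

def ids_toy(vgs, vds, vt, k=2e-4, w_over_l=2.0, lam=0.05, alpha=1.3,
            i0=1e-7, s=0.04, kt_q=0.0259):
    """Toy first-order IDS; every default value here is an illustrative assumption."""
    vov = vgs - vt                      # voltage overdrive VGS - VT
    if vov <= 0.0:                      # cut-off (subthreshold conduction)
        return i0 * w_over_l * math.exp(vov / s) * (1.0 - math.exp(-vds / kt_q))
    if vds < vov:                       # linear region, assuming VDSAT = VGS - VT
        return k * w_over_l * (vov * vds - 0.5 * vds * vds)
    return k * w_over_l * vov ** alpha * (1.0 + lam * vds)   # saturation

def ids_by_vgs_offset(vgs, vds, vt_nom, delta_vt):
    """Equation 2: a device with VT = vt_nom - delta_vt is evaluated on the
    nominal device with the gate voltage shifted by +delta_vt."""
    return ids_toy(vgs + delta_vt, vds, vt_nom)

VT_NOM = 0.35   # assumed nominal threshold voltage in volts
worst = 0.0
for delta_vt in (-0.10, -0.05, 0.05, 0.10):
    for vds in (0.1, 1.0):
        for vgs in (0.0, 0.6, 1.0):
            direct = ids_toy(vgs, vds, VT_NOM - delta_vt)
            shifted = ids_by_vgs_offset(vgs, vds, VT_NOM, delta_vt)
            worst = max(worst, abs(shifted - direct) / direct)
print(f"worst relative mismatch: {worst:.2e}")
```

The interesting case is a real compact model, where terms that do not depend purely on VGS – VT make the two evaluations differ slightly.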

The VGS offset method is also applicable for estimating the impact of VT variation on various FET capacitances. That is due to the inversion charge density in the strong inversion region being proportional to VGS–VT. To a first order, the inversion charge density, Qinv, can be expressed as

$$
Q_{inv} = -C_{OX}\,(V_{GS} - V_T) \tag{3}
$$

where COX is the gate oxide capacitance [11]. When the device is in the cut-off region, the capacitance is determined by a series combination of the oxide capacitance and the depletion layer capacitance. However, as VGS–VT is increased, an inversion layer develops at the Si-SiO2 interface. This inversion layer plays the role of the bottom plate of the capacitor, thus pulling the gate capacitance higher. The transition from cut-off to the inversion region is gradual and depends on the voltage overdrive VGS–VT. Therefore, the change in gate capacitance with gate voltage also depends on VGS–VT, and not on VGS and VT separately. There can be some differences in the depletion region capacitances for different VT devices in the cut-off region. These may not be captured by the VGS offset technique, but as shown in the simulations below, this effect is very small and does not compromise the accuracy of the VGS offset approach.

We tested this method on sub-65nm regular VT NFET devices. For comparison, we considered devices with a VT variation (ΔVT) of ±100 mV and ±50 mV, and simulated the drain current (IDS(VT)) and gate-to-source capacitance (CGS(VT)) responses for each of these devices. We then predicted the (IDS(VGS Offset)) and (CGS(VGS Offset)) behavior for these devices by applying the appropriate VGS bias to the nominal VT FET model. Figures 1 and 2 show the corresponding x-y scatter plots of the simulated versus predicted values for IDS and CGS, respectively; the x-axis is the (true) simulated value and the y-axis represents the value predicted by the VGS offset method. For each ΔVT value, current and capacitance values were collected for two values of VDS (1V and 0.5V) representing the linear and saturation regions; the device gate voltage VGS was swept from 0V to 1V. The figures show that the VGS offset method is very accurate in predicting drive current and capacitance over the entire range. The maximum error is below 1% for ΔVT=50mV and below 5% for ΔVT=100mV.

Normalized Plot

Figure 1. Comparison of drain current (IDS) for VT variation of -100mV, -50mV, +50mV, and +100mV. X-axis represents true device simulation values. Y-axis represents the VGS offset method. The dashed line (45°) is the ideal matching reference line. Results for two VDS values corresponding to linear and saturation regions are shown.

Normalized Plot

Figure 2. Comparison of the gate-to-source capacitance (CGS) for VT variation of -100mV, -50mV, +50mV, and +100mV. X-axis represents true device simulation values. Y-axis represents the VGS offset method. The dashed line (45°) is the ideal matching reference line. Results for two VDS values corresponding to linear and saturation regions are shown.

We also verified the VGS offset technique with hardware data from a 65 nm SOI process [12]. The macro consists of an addressable array of densely packed devices with identical widths and layouts. We can select the desired FET in the array by scanning in a digital pattern that turns on the appropriate row and column switches. Once a device is selected, the gate and drain voltages can be applied to it and the device current can be measured. We measured the IV characteristics of a large number of FETs and extracted their threshold voltages using the peak-gm method. The IV characteristics for different FETs were also estimated by applying a VGS offset to the nominal VT FET data.

Figure 3 shows the measured IDS as a function of VGS for 3 FETs: the nominal VT device and ±3σ ΔVT devices (near ±3σ corners of VT variation). In the figure, we also show the estimated drive currents as obtained by the proposed VGS offset technique. The figure shows that the estimated response curves match very well with the measured data.

[Figure 3 plot legend: ΔVT (-3σ) Measured, ΔVT (-3σ) Estimated; ΔVT (+3σ) Measured, ΔVT (+3σ) Estimated; VT (nominal) Measured]

Figure 3. Comparison of measured data [12] to estimated (using VGS offset technique) response for FETs with nominal and ±3σ VT variation in a 65 nm SOI process.

3. On-demand Hermite Spline Tables

Traditional simulators like SPICE apply multi-step integration with linearization to solve the transient response of the design. During transient analysis, for each device, the simulator calls the device evaluation routine to calculate the element values and derivatives versus voltage at the present state of the device. In most cases, device evaluations consume the largest part of the circuit simulation CPU time. Among all kinds of devices in a circuit, MOSFET device evaluations normally are the most time consuming.

The well known BSIM models are widely used to provide accurate element and derivative values to circuit simulators. Consider an SOI device (see Figure 4) for example. The BSIM model [13] returns the current, charge, thermal capacitance and thermal resistance values as well as their corresponding first order derivatives at a given bias point (VGB, VSB, VDB, and VBX). The list of the BSIM model elements for an SOI device includes the following.

• Current elements: IBD, IBS, IGB, IGD, IGS, IDS

• Charge elements: QGB, QSB, QDB, QBX

• Thermal capacitance and current: CTH, ITH

• Thermal resistance: RTH

The BSIM model evaluation is expensive due to the model's complexity. Using a spline-based table to represent the complex BSIM model is one way to improve the SPICE simulation speed. Consider the channel current IDS as an example:

$$
I_{DS} = f(V_{GB}, V_{DB}, V_{SB}) \tag{4}
$$

Node X has negligible effect on channel current because of the buried oxide (BOX). During circuit simulation, assume VGB, VDB, VSB all ∈ [–Vmax, +Vmax] (Vmax > 0). This forms a cube which covers the bias voltage space (VGB, VDB, VSB). Each bias voltage dimension can be split into n equal intervals, which results in n^3 small cubic blocks. Using the same grid size on the three axes is only a matter of convenience and should not be deemed a limitation of the method. We call the vertices of the small cubes grid points. IDS information at the grid points is obtained by calling the BSIM evaluation function. For any bias point that does not fall on a grid point, interpolation methods can be used to provide the IDS current value and its derivatives. In our implementation, a cubic Hermite spline function [14] is used to perform the interpolation during the model evaluation phase of a SPICE-like simulator for all the device operating regions. This is unlike other sophisticated table interpolation approaches [15], where the interpolation function is chosen based on the device operating region.
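The grid bookkeeping described above can be sketched as follows. The function name `cube_index` and its defaults are hypothetical; the defaults simply mirror the Vmax = 2V range and 50mV grid size quoted for the experiments in Section 5.

```python
import math

def cube_index(v_gb, v_db, v_sb, v_max=2.0, n=80):
    """Return the (i, j, k) index of the small cube containing a bias point.

    Each axis spans [-v_max, +v_max] and is split into n equal intervals,
    so n = 80 with v_max = 2.0 corresponds to a 50 mV grid (assumed defaults).
    """
    index = []
    for v in (v_gb, v_db, v_sb):
        if not -v_max <= v <= v_max:
            raise ValueError(f"bias {v} V lies outside [-{v_max}, +{v_max}] V")
        i = math.floor((v + v_max) / (2.0 * v_max) * n)
        index.append(min(i, n - 1))    # v == +v_max belongs to the last interval
    return tuple(index)

n = 80
print(cube_index(0.51, -2.0, 2.0))   # cube index of one bias point
print(n ** 3, (n + 1) ** 3)          # number of cubes and number of grid points
```

With these assumed defaults a fully populated table has 512000 cubes per element, which gives a feel for why populating only the visited cubes is attractive.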

BOX

B

G

X

S D

Figure 4. SOI device cross-section. BSIM model evaluation elements are dependent on the bias voltages of the device. The floating body node adds to the complexity compared to traditional bulk devices.

3.1 Cubic Hermite Spline Functions for Table Models

Cubic Hermite spline functions can guarantee continuity of the function and its first-order derivative values [16]. The authors in [17] used the 1-D cubic Hermite spline function to interpolate the values from a 2-D device model to form a smooth and monotonic 3-D device model suitable for circuit simulation. Here, we construct 3-D tables by relying on direct evaluations from the BSIM model and we use 3-D cubic Hermite spline functions to do the interpolation. When created in this manner, the cubic Hermite spline functions guarantee continuous function and derivative values that can be critical to the convergence of DC and transient simulations.

3.1.1 Generation of cubic Hermite Spline Functions (3-D)

Given a function g(x,y,z) representing the device characteristics in the mth cube (m ∈ [1, n^3]), a cubic Hermite spline function is a 3rd-order polynomial function f(x,y,z) that approximates the original function g() according to

$$
g(x,y,z) \approx f(x,y,z) = \sum_{i=0}^{3} \sum_{j=0}^{3} \sum_{k=0}^{3} C_{ijk}\, x^i y^j z^k \tag{5}
$$

The function f(x,y,z) has 64 coefficients, Cijk, that are obtained from g(x, y, z) and its derivatives. To guarantee a unique, continuous, and isotropic solution, solving for Cijk requires the following set of function and derivative evaluations.

$$
\left\{ g,\; \frac{dg}{dx},\; \frac{dg}{dy},\; \frac{dg}{dz},\; \frac{d^2g}{dx\,dy},\; \frac{d^2g}{dx\,dz},\; \frac{d^2g}{dy\,dz},\; \frac{d^3g}{dx\,dy\,dz} \right\} \tag{6}
$$

The function value and its first-order derivatives are obtained from BSIM evaluations at each cube vertex. We use finite differences to estimate the necessary second- and third-order derivatives. To solve for Cijk, the problem is reduced to the form

$$
B\alpha = b \tag{7}
$$

where B is a 64x64 matrix that can be readily computed and is available in explicit form (code) [16], α is the vector of coefficients Cijk, and b is derived by evaluating (6) at the 8 vertices of the cube; the 8 quantities at each of the 8 vertices supply the 64 conditions that determine the 64 coefficients. The resulting coefficients Cijk satisfy the C1 continuity condition at the eight corners of the cube. This condition is also sufficient to guarantee C1 continuity of f on each face of the cube. A proof of the continuity of f, df/dx, df/dy, df/dz at the cube boundaries and across the vertex points (Lipschitz continuity) is available in [16].
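Once the 64 coefficients of a cube are known, evaluating Equation 5 and its derivatives is plain polynomial arithmetic. The following sketch assumes the coefficients are stored as a nested 4x4x4 list `c[i][j][k]` and that (x, y, z) are coordinates local to the cube; the storage layout and the function names are illustrative choices, not details given in this work.

```python
def tricubic_value(c, x, y, z):
    """f(x, y, z) = sum over i, j, k in 0..3 of c[i][j][k] * x**i * y**j * z**k."""
    return sum(c[i][j][k] * x**i * y**j * z**k
               for i in range(4) for j in range(4) for k in range(4))

def tricubic_dx(c, x, y, z):
    """Partial derivative df/dx of the same polynomial (terms with i = 0 vanish)."""
    return sum(i * c[i][j][k] * x**(i - 1) * y**j * z**k
               for i in range(1, 4) for j in range(4) for k in range(4))

# Example coefficients for f = 2 + x*y*z: only c[0][0][0] and c[1][1][1] are non-zero.
c = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
c[0][0][0] = 2.0
c[1][1][1] = 1.0
print(tricubic_value(c, 2.0, 3.0, 4.0), tricubic_dx(c, 2.0, 3.0, 4.0))
```

A production implementation would more likely use a nested Horner scheme to cut the multiplication count; the triple sum is kept here because it mirrors Equation 5 term by term.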

Compared with the natural cubic spline function, which is C2 continuous everywhere, the cubic Hermite spline function is not C2 continuous across boundaries. The benefit of sacrificing this boundary smoothness is that the cubic Hermite spline function follows the original function much better than the natural cubic spline function. The oscillations of the interpolated function around the grid data points are diminished in the cubic Hermite spline function compared to the natural cubic spline function over the same interpolation interval. This matters because such oscillations can have a severe negative impact on DC/transient convergence during simulation. Based on our observation, the number of Newton-Raphson iterations is essentially the same whether the simulation uses the BSIM model or the cubic Hermite spline table model generated from it. See Table I for an example. This implies that cubic Hermite spline function based models have negligible impact on the DC/transient convergence properties of a SPICE-like simulator.

Based on the above statements, cubic Hermite spline functions offer an efficient and unique closed form interpolation structure. Most importantly, to generate an approximation function using cubic Hermite spline function we only invoke the local cube’s data. This is in contrast to general splines/interpolation structures that require data beyond a given cube to generate the spline function. Hence, this feature allows for efficient data handling and spline generation. Moreover, it is ideal for on-demand table model generation because spline creation is restricted to its local cube. For statistical simulations, this means the spline generation can be restricted to the dynamic margin simulation requirements.
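The locality property is easiest to see in one dimension. The sketch below is a 1-D analogue, not the 3-D construction used in this work: on an interval [x0, x1] the cubic Hermite interpolant is fixed entirely by the function values and first derivatives at the two end points. The test function sin(x) and the helper name are assumptions chosen for illustration.

```python
import math

def hermite_1d(x, x0, x1, f0, f1, d0, d1):
    """Cubic Hermite interpolation on [x0, x1] from end-point values f0, f1
    and end-point first derivatives d0, d1. Returns (value, derivative)."""
    h = x1 - x0
    t = (x - x0) / h                      # local coordinate in [0, 1]
    h00 = 2*t**3 - 3*t**2 + 1             # Hermite basis functions
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    value = h00*f0 + h10*h*d0 + h01*f1 + h11*h*d1
    dh00 = 6*t**2 - 6*t                   # basis derivatives with respect to t
    dh10 = 3*t**2 - 4*t + 1
    dh01 = -6*t**2 + 6*t
    dh11 = 3*t**2 - 2*t
    deriv = (dh00*f0 + dh01*f1) / h + dh10*d0 + dh11*d1
    return value, deriv

g, dg = math.sin, math.cos
# Two neighboring intervals, each built from its own two end points only.
left = hermite_1d(0.5, 0.0, 0.5, g(0.0), g(0.5), dg(0.0), dg(0.5))
right = hermite_1d(0.5, 0.5, 1.0, g(0.5), g(1.0), dg(0.5), dg(1.0))
mid, _ = hermite_1d(0.25, 0.0, 0.5, g(0.0), g(0.5), dg(0.0), dg(0.5))
print(left, right)            # value and slope agree at the shared grid point
print(abs(mid - g(0.25)))     # interpolation error inside the interval
```

Because both segments reproduce the stored value and slope at the shared grid point, the piecewise function is C1 there without either segment knowing about the other.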

Table I: Newton-Raphson iteration count comparison. Data is collected by running DC and transient analysis on nearly 20 types of circuit topologies.

Model                                     Number of Iterations    Number of Iterations
                                          (DC analysis)           (Transient analysis)
Cubic Hermite spline-based table model    3474                    318354
BSIMSOI model                             3474                    316960

3.2 On-demand Spline Generation

To guarantee comparable accuracy to BSIM models, which are highly non-linear, the grid size used in a table model cannot be very large. However, using small grid sizes to create a fully populated table would be expensive. This is especially true with the large number of device instance parameters and design considerations.

In our approach, an on-demand method is used to generate a table model, i.e., we only create table entries for those cubes which are visited during DC/transient simulation. The approach is illustrated in Figure 5 and can be described as follows.

1) Given a new device instance, search whether an existing table model has been created for this instance based on the device size, options and overrides.

a) If the corresponding table model exists, then use this table model for this instance.

b) If the table model does not exist, start with a new unpopulated table for this instance.

Hence, if two devices in the netlist are the same in terms of size, options and overrides, then they share the same table model.

2) When the simulator requires an element value and its derivatives at a particular bias voltage Vbias=(VGB, VDB, VSB), perform the following steps:

a) Find the cubic block which contains this bias point.

b) Check if a Hermite spline function exists for this cube.

i) If a Hermite spline function exists for this cube, simply calculate the element value and its derivatives using the spline function.

ii) If no spline function exists for this cube, a cubic Hermite spline function for this block is generated and saved for later use.

To create a cubic Hermite spline function for a block, we check each vertex of this block to see whether the required function values are available. If not, we call the BSIM model evaluation function to obtain them and solve Equation 7 to create the cubic Hermite spline function for this block. The same steps are performed for all other elements in the device model.

[Figure 5 flowchart: for each MOSFET instance at a certain bias voltage Vbias, get the cube which contains Vbias. Is a spline function created for this cube? If yes (Y), proceed to spline evaluation. If no (N), for each vertex of the cube check whether the grid value exists; if not (N), call the BSIM model to get the grid value; then create the spline function for this cube and proceed to spline evaluation. Inset: grid points/vertices, the bias voltage Vbias, and the locus of the transient simulation solution.]

Figure 5. Spline function generation and evaluation flow for the proposed on-demand approach.
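The flow of Figure 5 can be sketched in a few lines. To stay short, the example below is a 1-D analogue with hypothetical names: `expensive_model` stands in for a BSIM evaluation and returns a value and its first derivative, grid-point results are cached so that neighboring intervals share them, and a cubic Hermite segment is created only for intervals that a bias point actually visits. The actual method works on 3-D cubes and solves Equation 7 per cube; none of that is reproduced here.

```python
import math

class OnDemandTable1D:
    """1-D on-demand cubic Hermite table (illustrative sketch only)."""

    def __init__(self, model, v_max=2.0, n=80):
        self.model, self.v_max, self.n = model, v_max, n
        self.h = 2.0 * v_max / n
        self.grid = {}        # grid index -> (value, derivative) from the model
        self.segments = {}    # interval index -> (f0, d0, f1, d1) spline data
        self.model_calls = 0

    def _grid_value(self, i):
        if i not in self.grid:                   # call the model once per grid point
            self.model_calls += 1
            self.grid[i] = self.model(-self.v_max + i * self.h)
        return self.grid[i]

    def evaluate(self, v):
        if not -self.v_max <= v <= self.v_max:
            raise ValueError("bias outside the table range")
        i = min(math.floor((v + self.v_max) / self.h), self.n - 1)
        if i not in self.segments:               # build the spline on demand
            self.segments[i] = self._grid_value(i) + self._grid_value(i + 1)
        f0, d0, f1, d1 = self.segments[i]
        t = (v - (-self.v_max + i * self.h)) / self.h
        return ((2*t**3 - 3*t**2 + 1) * f0 + (t**3 - 2*t**2 + t) * self.h * d0
                + (-2*t**3 + 3*t**2) * f1 + (t**3 - t**2) * self.h * d1)

def expensive_model(v):
    """Stand-in for a costly device evaluation: returns (value, d value / dv)."""
    return math.tanh(3.0 * v), 3.0 / math.cosh(3.0 * v) ** 2

table = OnDemandTable1D(expensive_model)
# A "transient" that only visits biases between 0.2 V and 0.4 V.
biases = [0.2 + 0.2 * k / 999 for k in range(1000)]
worst = max(abs(table.evaluate(v) - expensive_model(v)[0]) for v in biases)
print(table.model_calls, len(table.segments), worst)
```

In this toy run a thousand evaluations trigger only a handful of model calls, whereas a fully populated 1-D table with the same assumed grid would need 81.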

Figure 6 illustrates the IDS table entries for an SOI NFET that were generated during the simulation of an inverter in response to an input pulse. We can clearly see that the required table entries make up a very small fraction of the voltage space.

The on-demand table generation approach is especially suitable for statistical yield analysis experiments. Simulations for a given sample point require only a few (typically 1-3) clock cycles to study the dynamic margins of the design. This experiment is repeated for many other sample points and supply/bias conditions. Hence devices visit only small portions of their entire bias space, with a high probability of reuse. This reuse rate is further improved by the symmetrical nature of the SRAM cell and the proposed VGS offset method, as will be described later. The on-demand table generation effort is only as complex as the designer's experimental scope of interest and is typically very efficient for pre-defined per-design experiments. Since tables are stored, they can also be reused for subsequent (Monte Carlo) sample points, leading to more savings. Most importantly, the accuracy is comparable to golden SPICE.

[Figure 6 panels: (a) 3-D view with axes VGB, VSB and VDB; (b) 2-D view with axes VGB and VDB]

Figure 6. Table entries for a pulsed inverter simulation portrayed in (a) 3-D view and (b) a corresponding 2-D view are very sparse.

Figure 7 illustrates a perfect match between the proposed on-demand spline-based table lookup (TLU) model and full BSIM model simulations (REF) for a memory storage node during Read operation. The table grid size for the three bias voltages was set to 50mV.

3.3 Threshold Voltage Variation

The reduced table size accommodates device instance modeling with a wide range of process variations. We are particularly interested in modeling threshold voltage variation and its impact on yield due to device mismatch. In principle, a new table model is needed to represent the device instance for each new ΔVT sample point. This can lead to a large number of tables for statistical simulations with hundreds of sample points involved. Alternatively, interpolation of the threshold voltage variation effect can be used. However, interpolation at a given sample point ΔVT value would require evaluation of the cube elements at two (or more) neighboring ΔVT tables. This approach can increase the model's spline evaluation cost. Instead, we rely on the gate offset VT modeling approach proposed in Section 2. With the VGS offset modeling, device characteristics are derived from the nearest ΔVT table along with the proper gate bias offset. Only one set of spline function evaluations is needed per bias point for a given sample point. Different ΔVT sample points (requiring different gate voltage offsets) can share the same table model if they are based on the same reference ΔVT value. In general, the number of required reference ΔVT tables is small (recall the results of Figures 1 and 2). In the limit, when the number of device instances subject to variability is large, the VGS offset method can be used to model the ΔVT variability trends by relying on the nominal device (VT,nom, i.e., the device with ΔVT=0) table only.

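[Figure 7 plot: voltage (Volts) versus time, with a 0 V reference level; traces for the wordline and the storage node solution; BSIM (REF) compared with the on-demand (TLU) model]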

Figure 7. Waveform matching for the simulation of an SRAM memory cell between the TLU approach and golden full BSIM-based SPICE (REF); during Read, the storage node of a typical cell can undergo several tens of millivolts of node upset.

The approach is summarized in Figure 8. First, some definitions:

• ΔVTstep is the step between two reference ΔVT tables. For example, for a ΔVTstep=50mV, device instance tables are created for reference ΔVT values δv={0, ±50mV, ±100mV, …}.

• TLUi:δv represents the spline function table for device type i, with reference threshold voltage variability value δv.

Without loss of generality, Figure 8 considers the example of the SRAM cell in Figure 9. The cell consists of six devices: two pull-up (PU), two pull-down (PD) and two pass-gate (PG) devices. Given a sample point {ΔVTi}, i ∈ { PUL, PUR, PDL, PDR, PGL, PGR} representing the left(L) and right(R) devices of the SRAM cell, we solve for VGS offset values {ΔVG:PUL, …, ΔVG:PDR} required for the modified circuit of Figure 9 according to (8). The ‘round’ function finds the nearest integer.

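$$
\delta v_i = \mathrm{round}\!\left(\frac{\Delta V_{Ti}}{\Delta V_{Tstep}}\right) \cdot \Delta V_{Tstep}, \qquad
\Delta V_{G:i} = -1 \cdot \left(\Delta V_{Ti} - \delta v_i\right) \tag{8}
$$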

For example, for a device with ΔVT=60mV we rely on the spline table with δv=50mV and we apply to the device an additional bias ΔVG=-10mV. Note that while the VGS offset is applied explicitly in Figure 9, it is also possible to embed the offset into the device model; this would make the simulation more efficient by reducing the number of circuit elements. Finally, the reference tables allow us to span a wide range of VT variability and hence to estimate with confidence 5 sigma design yields (less than 1 part per million fails).
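Equation 8 amounts to a few lines of arithmetic. The sketch below works in integer millivolts to avoid floating-point rounding; the function name is hypothetical, and Python's built-in `round` resolves exact ties to the nearest even integer, which is an implementation detail that the text does not specify.

```python
def reference_and_offset(delta_vt_mv, step_mv=50):
    """Return (delta_v, delta_vg) in mV following Equation 8.

    delta_v  : reference-table value nearest to the sample's delta VT
    delta_vg : residual gate offset, delta_vg = -(delta_vt - delta_v)
    """
    delta_v = round(delta_vt_mv / step_mv) * step_mv
    delta_vg = -(delta_vt_mv - delta_v)
    return delta_v, delta_vg

for sample in (60, -20, -135):
    delta_v, delta_vg = reference_and_offset(sample)
    # Consistency check from Figure 8: delta VT = delta_v - delta_vg.
    assert sample == delta_v - delta_vg
    print(sample, delta_v, delta_vg)
```

For the 60mV sample this reproduces the reference table at 50mV with a -10mV gate offset, as in the example above, and the residual offset never exceeds half of the table step.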

[Figure 8 flowchart, Statistical Simulation Flow: (1) generate a new sample point (ΔVT:i), i ∈ {PUL, PUR, PGL, PGR, PDL, PDR}; (2) for each device instance i, find the reference δvi that results in the smallest gate offset ΔVG:i such that ΔVTi = δvi - ΔVG:i according to (8); (3) simulate the modified circuit (with ΔVG:i), invoking the device reference tables (TLUi:δvi) and evaluating/building spline functions for (TLUi:δvi) according to Figure 5; (4) if more sample points are needed (Yes), return to step 1, otherwise (No) stop.]

Figure 8. Statistical simulation flow with VGS offset method.

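[Figure 9 schematic: six-transistor SRAM cell with pull-up devices PUL and PUR, pull-down devices PDL and PDR, and pass-gate devices PGL and PGR connected to the wordline and the bitlines; a gate offset source is shown at each device (ΔVG:PUL, ΔVG:PUR, ΔVG:PDL, ΔVG:PDR, ΔVG:PGL, ΔVG:PGR); L marks the left storage node]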

Figure 9. Modified SRAM cell schematic to accommodate for VGS offset.

4. Applications to SRAM Dynamic Margins

Due to the hysteretic behavior of the SOI device body node [10], static noise margin-based analyses can underestimate the yield of memory designs. It is more valuable to evaluate the yield in terms of the cell dynamic behavior. In this section we focus on two SRAM cell design metrics in terms of dynamic stability and writability.

Dynamic Stability: In a typical read phenomenon, charge from the bitlines can be injected onto the cell's '0' storage node. A weak cell subject to variability may not sustain the injected noise and may erroneously flip. This is often referred to as a destructive read, and is evaluated by studying the dynamic read behavior of the cell; e.g., this happens when the node upset in Figure 10 is large. For SOI devices, the hysteretic behavior can impact the stability yield most in a 'Read after Write' scenario as illustrated in Figure 10. This scenario preconditions the body of the critical devices in a worst-case manner.

Dynamic Writability: Writability of the cell is another dynamic functionality metric. Often the designers look for the ability to flip the contents of the cell in the specified cycle as a metric for writability yield. Others also account for potential hazards due to specific timing and design considerations of the peripheral logic.

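[Figure 10 waveforms versus time: wordline (WL) and left storage node (L); a Write '0' to Node L is followed by a Read '0' from Node L, during which the node upset appears]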

Figure 10. Worst-case vector for stability analysis. Dashed waveform represents the left ‘L’ storage node (see Figure 9).

Typically, the dynamic writability and stability yields can be studied simultaneously using the setup of Figure 10. The simulation experiment is repeated for thousands of sample points and at different supply conditions. This makes it an ideal candidate for the on-demand approach, both in terms of the spline sparseness and the reuse opportunities. Finally, both metrics are sensitive to the design environment, and the circuit under study is typically a lot more complex than the individual cell. Most of the logic devices, however, are often large enough that, to a first order, variability is restricted to the SRAM cell devices and possibly a few other devices.

5. Analysis and Results

We study the yield of a sub-65nm SRAM design subject to random dopant fluctuation effects. The memory cross-section accounts for the cell and peripheral logic as well as the bitline loading and wire capacitance. The threshold voltages of the six transistors of the SRAM cell are subject to random variations. The standard deviation numbers are extrapolated from hardware. The frequency of operation is set to be bias voltage dependent, with fmax=2 GHz. For all the experiments, the design is assumed to be a dual supply design: the two supplies are 'Vcs' (the cell supply) and 'Vdd' (logic supply). The cell pFETs are connected to 'Vcs', the wordline high is also at 'Vcs', and the rest of the logic, including the bitline precharge circuitry, operates at 'Vdd'. In our experiments, we rely on the following assumptions/definitions.

• A yield schmoo is a plot of the yield at different (Vdd, Vcs) combinations. The range for Vdd and Vcs is chosen wide enough to accommodate for low power and high performance operation (several hundred mV range). Extreme conditions are also studied to validate the methodologies for low and high yield values.

• Each point on the yield schmoo, or each experiment, is a statistical importance sampling simulation [3, 4]. It involves a few thousand SPICE simulations. The threshold voltage of each device can vary anywhere within ±6 times its standard deviation, ±6σVT.

• The number of reference tables TLUi:δv in turn depends on the possible choices of reference table ΔVTstep {30mV, 60mV, 90mV}. Typically we require few reference tables.

• REF is the ideal golden full BSIM-based SPICE simulation.


• Hermite spline-based table lookup (TLU) is the proposed methodology. The table is constructed assuming grid size of 50mV for each of the bias voltages (VGB, VDB, VSB) over the range [-2V, +2V] (Vmax=2V). For simplicity, in this paper, we are assuming a uniform grid, but of course non-uniform grids are also possible. The underlying simulator is the same SPICE engine used for the golden simulations. The main difference is that the BSIM model calls are replaced by spline interpolations. Hence the two methods use the same netlist and underlying infrastructure.
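To make the link between sigma targets and sampling cost concrete, the following sketch estimates a small tail probability of a single standard normal variable with mean-shift importance sampling. It is a generic textbook illustration under stated assumptions (one variable, a fixed failure threshold, a sampling distribution shifted to the threshold, a one-sided tail); it is not the multi-dimensional importance sampling flow of [3, 4], and the pass/fail test here is a simple inequality rather than a SPICE or table-lookup simulation.

```python
import math
import random

def tail_probability(threshold):
    """Exact one-sided tail P(X > threshold) for a standard normal X."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))

def importance_sampling_tail(threshold, n_samples=20000, seed=1):
    """Estimate P(X > threshold) by sampling from N(threshold, 1) and
    reweighting each sample by the density ratio phi(x) / phi(x - threshold)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:                               # "fail" indicator
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples

exact = tail_probability(4.5)
estimate = importance_sampling_tail(4.5)
print(f"exact {exact:.3e}, importance sampling {estimate:.3e}")
print(f"one-sided 5 sigma tail: {tail_probability(5.0):.2e}")
```

Under the one-sided assumption the 5 sigma tail is below one part per million, so plain Monte Carlo would need millions of simulations to observe even a few fails, while the shifted sampling distribution places about half of its samples in the fail region.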

Figure 11 illustrates the stability yield at different dual supply combinations for reference table ΔVTstep=30mV, 60mV and 90mV respectively. Excellent matching trends are shown, with ideal yield matching for the ΔVTstep=30mV as indicated in the corresponding yield schmoo in Figure 12. The writability yield was ideal (as predicted by both methods) in this example.

Most importantly we find that the size and number of on-demand device lookup tables is small enough that we can rely on ΔVTstep=30mV. For a given reference table TLUi:δv, the number of populated grid points by all the experiments is on average 500X smaller than that of the fully populated table. Table II summarizes the number of BSIM evaluations needed for a set of statistical experiments consisting of 16000 simulations for (a) a golden SPICE simulator (b) fully populated reference tables and (c) the proposed on-demand reference tables. The on-demand approach is very efficient in terms of the number of required BSIM evaluations; the ratio to the fully populated table is 1-to-628.

[Figure 11 plot: Stability Yield. x-axis: Experiment Number (1 to 25); y-axis: Normalized Yield Sigma; series: REF and TLU with Step=30mV, Step=60mV, Step=90mV; annotations: Min yield, Max yield.]

Figure 11. Yield under different supply conditions. Reference table ΔVTstep = 30mV, 60mV, and 90mV. Extreme (nonrealistic) operating supply voltage points are employed for tool validation purposes to enable wide ranges of ‘Min’ and ‘Max’ Yields.

We then study the yield for increased device threshold voltage variation (i.e., increased σVT); this increase in σVT can be justified by additional sources of variability like bias temperature instability (BTI) effects [2], and has been exaggerated for purposes of our experiments to validate the tool accuracy. Figure 13 illustrates the results at the extreme points of the yield schmoo for reference table ΔVTstep =30mV and 60mV. Again the results match very well even for the higher σVT values.

Additional model variability is further injected into the circuit in terms of mobility and saturation velocity variations that are correlated to the VT variations. This in turn impacts the writability yield for the low Vcs regions due to weakened pass gates. Hence, we study the yield schmoos for both the writability yield and the conditional (to writability fails) stability yield. We identify non-monotonic conditional stability yield trends for the low Vcs regions. Despite this, we see identical yield schmoos for both REF and TLU (writability and stability); Figure 14 illustrates the case for reference table ΔVTstep=30mV.

[Figure 12 panels: Stability Yield Analysis, dual supply design (cell and WL at Vcs, bitlines at Vdd). (a) REF; (b) TLU: ΔVTstep=30mV. Axes: Vdd from VddL to VddH, Vcs from VcsL to VcsH.]

Figure 12. Yield schmoo for reference table ΔVTstep=30mV. Yield matching is almost ideal with less than 0.1 sigma error.

Table II. Comparison of the number of BSIM evaluations.

                     Golden SPICE (REF)   Fully populated tables   On-demand TLU
#BSIM evaluations    54.08*1e6            138.24*1e6               219*1e3
Normalized to TLU    245                  628                      1
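The normalized row of Table II can be checked with a few lines of arithmetic. The sketch below uses only the counts printed in the table and the 16000 simulations mentioned in the text; the variable names are illustrative. Recomputing from the printed counts gives ratios of roughly 247 and 631, close to the tabulated 245 and 628; the small gap is presumably due to rounding of the on-demand count.

```python
N_SIMULATIONS = 16000  # size of the experiment set stated in the text

# BSIM evaluation counts as printed in Table II.
bsim_evals = {
    "REF": 54.08e6,
    "fully_populated": 138.24e6,
    "on_demand": 219e3,
}

# Ratio of each approach to the on-demand TLU count.
ratios = {name: count / bsim_evals["on_demand"]
          for name, count in bsim_evals.items()}

# Average BSIM evaluations per simulation for golden SPICE and on-demand TLU.
evals_per_sim_ref = bsim_evals["REF"] / N_SIMULATIONS
evals_per_sim_tlu = bsim_evals["on_demand"] / N_SIMULATIONS

for name, r in ratios.items():
    print(f"{name}: {r:.1f}x the on-demand count")
print(f"REF: {evals_per_sim_ref:.0f} evaluations per simulation")
print(f"TLU: {evals_per_sim_tlu:.1f} evaluations per simulation")
```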

[Figure 13 plot: Yield. x-axis: supply corners (VddL, VcsL), (VddH, VcsL), (VddL, VcsH), (VddH, VcsH); y-axis: Sigma Yield; series: REF, TLU Step=30mV, TLU Step=60mV, each for σVT1, σVT2, σVT3; annotations: Min yield, Max yield, 6% error.]

Figure 13. Stability yield is matched even at extreme schmoo points (which can sometimes be nonrealistic operating points). σVT1 < σVT2 < σVT3. Exaggerated variability and extreme (nonrealistic) operating points are employed for tool validation purposes.

Finally, for all the previous experiments, we observe a significant improvement in the runtime of our statistical experiments using the proposed on-demand method. Figure 15 illustrates the distribution of the runtime for the different experiments. We find that, on average, the runtime for the proposed TLU method (including table generation cost) improves by roughly 4X compared to the golden SPICE (REF) simulations.

[Figure 14 panels: yield schmoos over Vdd (VddL to VddH) and Vcs (VcsL to VcsH). Columns: Conditional Stability Yield, Writability Yield. Panels (a), (b): REF; panels (c), (d): TLU.]

Figure 14. Increased variability affects the writability yield as well. We see perfect matching for the yield schmoos (a) vs. (c) and (b) vs. (d).

[Figure 15 plot: CPU Time Comparison. x-axis: RunTime (s), histogram bins from 450 to 3850 in steps of 200; y-axis: Frequency of Occurrence (0 to 14); series: REF, TLU.]

Runtime (s)   TLU    REF
min           356    1359
max           1026   3950
mean          656    2459

Figure 15. Different experiments illustrate ~4X speedup for on-demand approach including table generation cost.
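The ~4X figure can be cross-checked against the summary statistics in the Figure 15 inset. The sketch below reads that inset as runtimes in seconds with TLU listed before REF in each row, which is an interpretation of the extracted figure text. Note that these are ratios of summary statistics, not per-experiment speedups.

```python
# Runtime summary (seconds), read from the Figure 15 inset.
# Assumed column order per row: (TLU, REF).
runtime_s = {
    "min": (356, 1359),
    "max": (1026, 3950),
    "mean": (656, 2459),
}

# REF runtime divided by TLU runtime for each summary statistic.
speedup = {stat: ref / tlu for stat, (tlu, ref) in runtime_s.items()}

for stat, s in speedup.items():
    print(f"{stat}: REF/TLU = {s:.2f}")
```

Under this reading all three ratios fall between 3.7 and 3.9, consistent with the stated speedup of roughly 4X.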

6. Conclusions

We propose a methodology for efficient and accurate statistical SPICE simulations. The methodology relies on cubic Hermite splines, which offer a unique closed-form interpolation structure for local cubic blocks. This enables an on-demand spline-based interpolation flow suitable for statistical dynamic margin analysis. To accommodate process variation, the method relies on a gate voltage (VGS) offset for efficient device model evaluations, thereby reducing the cost of variability modeling compared to traditional methods. The technique allows a significant reduction in table size and runtime and, most importantly, maintains the high degree of accuracy required for statistical analysis of memory designs. The approach is validated using state-of-the-art memory designs, for which the statistical simulation results are in excellent agreement with golden SPICE.
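To make the two ingredients concrete, the following one-dimensional sketch shows the standard closed-form cubic Hermite interpolant on a single cell, together with a gate voltage offset applied before the lookup. It is a simplified illustration, not the three-dimensional scheme used in the paper: the function names, the cell tuple layout, the toy current curve, and the sign convention (a VT increase of ΔVT treated as a gate voltage decrease of ΔVT) are assumptions of this sketch.

```python
def hermite_cubic(x, x0, x1, p0, p1, m0, m1):
    """Cubic Hermite interpolant on [x0, x1] from end values p0, p1
    and end derivatives m0, m1 (standard Hermite basis functions)."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * h * m0 + h01 * p1 + h11 * h * m1

def current_with_vt_shift(vg, delta_vt, cell):
    """Evaluate a reference-table cell at a shifted gate voltage.
    Assumed convention: I(vg; VT0 + delta_vt) ~ I_ref(vg - delta_vt)."""
    return hermite_cubic(vg - delta_vt, *cell)

# Toy reference curve and its derivative (NOT a device model).
def f(v):
    return v**3 - 2 * v + 1

def df(v):
    return 3 * v**2 - 2

# One cell spanning [0, 1]: (x0, x1, p0, p1, m0, m1).
cell = (0.0, 1.0, f(0.0), f(1.0), df(0.0), df(1.0))

print(hermite_cubic(0.3, *cell), f(0.3))          # cubic is reproduced
print(current_with_vt_shift(0.5, 0.2, cell), f(0.3))
```

Since the interpolant depends only on data at the two cell endpoints, a cell can be built and evaluated independently of its neighbours, which is the property that makes on-demand table population possible.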

7. References

[1] A. Asenov, “Random dopant induced threshold voltage lowering and fluctuations in sub 0.1 micron MOSFETs,” IEEE TED, Dec. 1998, pp. 2505-2513.
[2] S. E. Rauch, III, “The statistics of NBTI-induced VT and β mismatch shifts in pMOSFETs,” IEEE TDMR, 2002, pp. 89-93.
[3] R. Kanj, R. Joshi, and S. Nassif, “Mixture importance sampling and its application to the analysis of SRAM designs in the presence of rare failure events,” DAC 2006, pp. 69-72.
[4] A. Singhee, S. Singhal, and R. Rutenbar, “Practical, fast Monte Carlo statistical static timing analysis: why and how,” ICCAD 2008, pp. 190-195.
[5] A. Devgan and R. A. Rohrer, “Adaptively controlled explicit simulations,” IEEE TCAD, 1995, pp. 746-762.
[6] J. Croix and D. F. Wong, “Blade and razor: cell and interconnect delay analysis using current-based models,” DAC 2003, pp. 386-389.
[7] L. W. Nagel, SPICE2: A Computer Program to Simulate Semiconductor Circuits. PhD thesis, University of California, Berkeley, 1975.
[8] SPECTRE, http://www.cadence.com/products/cic/spectre_circuit/pages/default.aspx
[9] HSPICE, www.hspice.com
[10] R. V. Joshi, S. Mukhopadhyay, D. Plass, Y. Chan, C.-T. Chuang, and A. Devgan, “Variability analysis for Sub-100 nm PD/SOI CMOS SRAM cell,” ESSCIRC 2004, pp. 211-214.
[11] Chenming C. Hu, Modern Semiconductor Devices for Integrated Circuits. Upper Saddle River, NJ: Pearson Higher Education, 2010.
[12] K. Agarwal, J. Hayes, and S. Nassif, “Fast Characterization of Threshold Voltage Fluctuation in MOS Devices,” IEEE TSM, vol. 21, no. 4, Nov. 2008, pp. 526-533.
[13] BSIMSOI, www-device.eecs.berkeley.edu/~bsimsoi/
[14] C. de Boor, A Practical Guide to Splines. New York: Springer-Verlag, 1978.
[15] V. Bourenkov, K. G. McCarthy, and A. Mathewson, “MOS table models for circuit simulation,” IEEE TCAD, 2005, pp. 352-362.
[16] F. Lekien and J. Marsden, “Tricubic interpolation in three dimensions,” International Journal for Numerical Methods in Engineering, 2005, pp. 455-471.
[17] T. Shima et al., “TABLE Look-Up MOSFET Modeling System Using a 2-D Device Simulator and Monotonic Piecewise Cubic Interpolation,” IEEE TCAD, 1983, pp. 121-126.
