High-Performance SRAM Design

Rahul Rao, IBM Systems and Technology Group
Page 1: High-Performance SRAM Design

High-Performance SRAM Design

Rahul Rao

IBM Systems and Technology Group

Page 2: High-Performance SRAM Design

Thought exercise

[Figure: 8T SRAM cell, with the write port (WWL, WBL/WBLb) and the decoupled read port (RWL, RBL); the READ path is highlighted.]

Page 3: High-Performance SRAM Design

Implement logic function via 8T merge

• Concept: Use the 8T portion of the cell to implement logic functions
  – Possible due to decoupling of the read and write paths
• RBL discharges when either C0 or C1 is high
  – Read stack remains 2-high
• AND function: switch the definition of WBL and WBLB (i.e., store the complemented data)
• An OR2 embodiment is shown on the right
  – Can be extended to complex OR4, OR8, etc.

[Figure: two 8T cells storing C0 and C1 share RWL and RBL; write ports use WWL, WBL, WBLB.]
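To make the merged-logic read concrete, here is a small behavioral sketch in Python (illustrative only; the function names and the two-cell example are not from the slides). The shared RBL is precharged high and pulled low by any enabled read stack whose storage node is high, so the raw RBL gives a NOR and its complement gives OR; writing complemented data (swapping WBL and WBLB) turns the same structure into an AND.

def rbl_after_evaluate(rwl, stored_nodes):
    """Dynamic read bit-line shared by several 8T read ports.

    rwl: True when the read word-line is asserted.
    stored_nodes: storage-node values (C0, C1, ...) driving the read stacks.
    Returns the RBL level after evaluation: precharged to 1, pulled to 0
    if any enabled read stack conducts.
    """
    if rwl and any(stored_nodes):
        return 0          # some read stack discharges the bit-line
    return 1              # RBL stays precharged

def read_or(c0, c1):
    # OR2 embodiment: RBL discharges when either C0 or C1 is high,
    # so the OR of the cell contents is the complement of RBL.
    return 1 - rbl_after_evaluate(True, [c0, c1])

def read_and(c0, c1):
    # AND: swapping WBL/WBLB stores the complement in each cell;
    # RBL then stays high only when both original values are 1.
    return rbl_after_evaluate(True, [1 - c0, 1 - c1])

assert [read_or(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 1]
assert [read_and(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]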

Page 4: High-Performance SRAM Design

A More Practical Implementation

+ Share RWL between adjacent cells => 3 word-lines in 4 metal tracks
  – Reduces coupling capacitance (+ performance and power)
+ All FEOL features identical to the conventional 8T cell
  – OR: the upper device in the read stack can be made smaller, reducing cell size

[Figure: two cells storing C0 and C1 with write word-lines WWL0 and WWL1, write bit-lines WBL/WBLB, a shared RWL, and a shared RBL.]

Page 5: High-Performance SRAM Design

Low leakage SRAM

(S. Hanson, VLSI Symp. '08)

[Figures reproduced from the paper. Captions: "Figure 4: Footer allocation in Phoenix" (26 footers, MVT and HVT, W = 0.66-28um, L = 0.5um, serving CPU, Timer, Temp Sensor, PMU, Clock Gen, I/O, 64x10 IMEM, 128x10 IROM, 52x40 DMEM) and "Figure 10: 10.9fW/cell memory architecture. Array diagram highlights footer sharing every 2 rows. Inset figure shows bitcell architecture" (low-leakage HVT cell with MVT read buffer; device W/L in um: 0.44/0.50, 0.55/0.50, 0.55/0.35, 0.55/0.18; 52 rows x 40 columns; 52-bit free list; write completion via a delay replica). Remaining plots: active/sleep power breakdown (2.8pJ/cycle, 297nW active; 29.6pW sleep) across CPU, IROM, IMEM, DMEM, Timer; frequency (kHz) and active energy per cycle (pJ) vs. Vdd (V) for footer widths 0.66um and 28um (reduced minimum active energy for the larger footer); sleep leakage current (pA) vs. CPU footer width (~10X reduction); total energy (J) vs. Vdd for Wfooter = 0.66um, 28um, and no footer (~2.5X and ~10^4X reductions, energy-optimal point).]


The Phoenix Processor: A 30pW Platform for Sensor Applications Mingoo Seok, Scott Hanson, Yu-Shiang Lin, Zhiyoong Foo, Daeyeon Kim, Yoonmyung Lee, Nurrachman Liu, Dennis Sylvester, David Blaauw

University of Michigan, Ann Arbor, MI

Abstract

An integrated platform for sensor applications, called the Phoenix Processor, is implemented in a carefully selected 0.18µm process with an area of 915x915µm2, making on-die battery integration feasible. Phoenix uses a comprehensive sleep strategy with a unique power gating approach, an event-driven CPU with compact ISA, data memory compression, a custom low leakage memory cell, and adaptive leakage management in data memory. Measurements show that Phoenix consumes 29.6pW in sleep mode and 2.8pJ/cycle in active mode.

Introduction

Form-factor is a critical concern for wide applicability and cost effectiveness in sensor systems, especially in medical applications. In this work, we explore the development of a sensor platform, called the Phoenix Processor, which will occupy only 1mm3 when coupled with an on-die battery. To ensure multi-year lifetime for the given platform size, average power consumption must be reduced to tens of pW [1], which marks a ~4000X reduction over previous sensor designs [2]. Recent work [2-6] has explored aggressive Vdd scaling for reducing active energy but has overlooked the power consumed during idle periods, which can be >99% of the lifetime. In addition to aggressive voltage scaling, Phoenix leverages a comprehensive sleep strategy, including the intentional selection of an older, low leakage technology, a unique power gating approach, an event-driven CPU with compact instruction set, data memory compression, a custom low leakage memory cell, and adaptive leakage management in the data memory. A test chip, including a sensor and timers, was fabricated in an area of 915x915µm2 in a 0.18µm process. Phoenix consumes 29.6pW in sleep mode and 2.8pJ/cycle in active mode at Vdd=0.5V. For a typical sensor application that runs 2000 instructions every 10 minutes, average power consumption is 39pW, which will enable integration of an on-die battery in a volume of 1mm3 while ensuring >1 year lifetime [1].

System Overview

As shown in Fig 1(a), Phoenix is a modular system with the CPU, memories (DMEM, IMEM, IROM) and power management unit (PMU) serving as parents of the system bus and peripherals (timer, sensor) acting as children. The accommodations made for sleep mode are best understood by exploring typical system operation (Fig 2(a)). The system begins in sleep mode (A), where 65-87% of all transistors are power gated using footers (depending on the retentive size of DMEM). In this mode, the PMU, a 2-bit FSM (Fig 3), remains awake and acts as the parent of the system bus. All gates in the PMU use stacked high-Vth (HVT) devices with Vth~0.7V to minimize leakage. In addition to the PMU, IMEM cells and valid DMEM cells remain awake to retain data. As shown in Fig 2(c), measurements at Vdd=0.5V reveal that total power is limited to 29.6pW in sleep mode with half DMEM retention. Note in Fig 2 that IMEM and DMEM draw 89% of sleep mode power. After a programmable sleep time (e.g., 10min), a 0.9pW timer similar to [7] raises an interrupt on the system bus using the asynchronous protocol. In response the PMU initiates a short wake sequence and relinquishes control of the bus to the CPU (B). The CPU then runs a routine of ~2000 instructions to query the sensor for data and process the acquired data (C). During this routine, the CPU decompresses a block of DMEM and places it in the cache (which is part of the register file), requests a sensor measurement over the bus, processes the returned data, compresses the cache contents and stores it to DMEM, and finally issues a sleep request. The PMU then regains control of the bus and gates system power (D,E). Fig 2(b) shows measured energy and frequency characteristics for Phoenix in active mode across Vdd. At Vdd=0.5V, the system operates at 106kHz and consumes 2.8pJ/cycle, with 88% of energy consumed by the CPU. Phoenix effectively operates at Vdd=384mV due to a non-zero virtual ground.

Sleep Strategies

One of the critical pieces of our sleep strategy is the unique approach to power gating (Fig 4). Traditional power gating uses a wide HVT footer to minimize the voltage drop across the footer and thus maintain performance. Additionally, a HVT footer is attractive since it gives a dramatic leakage reduction compared to a medium-Vth (MVT) footer for minimal delay penalty. Our approach for low Vdd power gating differs in two ways: 1) we use a MVT footer since on-current for a HVT footer is exponentially smaller at low Vdd, making HVT footers infeasible, and 2) our aim is to minimize energy rather than maintain performance [8], so footer size is set to only 0.66µm (0.01% of total effective NFET width) with L=0.50µm. Due to its small size, the footer develops a voltage drop of 116mV in active mode (frequency implications shown in Fig 5), elevating the energy-optimal Vdd to 0.5V (in contrast to the optimal value of 0.36V reported for the un-gated design in [2]). The reduced leakage of the small footer (Fig 6) easily offsets the slight increase in active energy due to the power consumed across the footer (Fig 7). Total energy is reduced by 2.5X compared to a somewhat larger footer of 28µm and by 4 orders of magnitude compared to a design without a footer (Fig 8). Additionally, the elevated optimal Vdd aids in robust SRAM design.

While the CPU (Fig 9(a)) largely impacts active mode power, it also plays an important role in sleep mode. An event-driven operating system initiates computation only in response to interrupts raised by peripherals, ensuring that the system defaults to sleep mode. Since IMEM is a major source of sleep mode power, the instruction set was chosen to minimize IMEM footprint using a minimum group of basic instructions while also including support for compression, interrupt and sleep functionalities. To limit instruction width to only 10 bits, common instructions use flexible addressing modes while less common instructions use implicit operands. Additionally, the 64-word IMEM is supplemented with a low leakage 128-word IROM that contains common functions.

Hardware support for compression (Fig 9(c)) was included in the CPU to minimize the DMEM footprint in sleep mode and to maximize memory capacity. A virtual data memory space of 512B is mapped to the 256B DMEM using Huffman encoding with a fixed dictionary for a maximum compression ratio of 50%. DMEM is divided into 2 partitions: statically and dynamically allocated (Fig 9(b)). Each group of 16B (a block) in virtual memory is given one line in the statically allocated partition. If a write to memory causes a block to overflow its statically allocated entry, overflow data is written to an entry in dynamically allocated memory whose location is noted by a pointer in the statically allocated entry. A 52b free-list, which is visible to the CPU, monitors the usage of entries in both memory partitions.

Since SRAM can dominate total energy consumption, we place emphasis on low leakage SRAM design. Both IMEM and DMEM use the bitcell shown in Fig 10. The cross-coupled inverters and access transistors use HVT devices, while stack forcing and gate length biasing are used to further reduce leakage and improve subthreshold swing. Measurements show that a single bitcell (~40µm2, which is acceptable for sensor applications) consumes 10.9fW while retaining data. To enable robust low Vdd operation, the proposed cell includes a MVT read buffer similar to [9].
The MVT read buffer also enables single-cycle read-out despite the aggressive use of HVT devices elsewhere. This is useful for the IMEM, where a read occurs every cycle. Since the write operation in DMEM is slow relative to the MVT CPU, write operations are asynchronous. Write completion is determined by reading the contents of the row being written and comparing to the write data. Read is single-ended, so a replica delays the write completion signal to guarantee that both sides of the cell have been written correctly. To further reduce sleep power, the DMEM uses a leakage reduction scheme based on the free-list. The DMEM has 26 footers, with each connected to 2 rows (Fig 10). The choice of 2 rows per footer offers a good trade-off between high granularity in power gating and footer overhead. These footers are selectively turned off during sleep mode based on the contents of the free-list, reducing DMEM sleep power from 22.5pW at full retention to 7.5pW at zero memory retention, a 66% reduction.

[1] F. Albano et al., Journal of Power Sources, vol. 170, 2007.
[2] S. Hanson et al., Symposium on VLSI Circuits, 2007.
[3] A. Wang et al., International Solid-State Circuits Conference, 2004.
[4] C. Kim et al., Transactions on VLSI Systems, vol. 11, 2003.
[5] B. Zhai et al., Symposium on VLSI Circuits, 2006.
[6] B. Calhoun et al., International Solid-State Circuits Conference, 2005.
[7] Y. Lin et al., Custom Integrated Circuits Conference, 2007.
[8] M. Seok et al., Design Automation Conference, 2007.
[9] L. Chang et al., Symposium on VLSI Technology, 2005.
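The free-list-driven footer gating described above can be sketched in a few lines of Python (illustrative; the exact free-list encoding and the row-to-footer mapping are assumptions, only the 52 rows, 26 footers, and 2 rows per footer come from the paper):

ROWS = 52
ROWS_PER_FOOTER = 2
NUM_FOOTERS = ROWS // ROWS_PER_FOOTER   # 26 footers, as in the paper

def footer_sleep_controls(free_list):
    """free_list: 52 booleans, True when the corresponding DMEM row is free
    (holds no live data).  Returns one gate-off decision per footer: a footer
    may be turned off in sleep only if both rows it serves are free."""
    assert len(free_list) == ROWS
    return [free_list[2 * i] and free_list[2 * i + 1] for i in range(NUM_FOOTERS)]

# Example: only rows 0-9 hold live data, the rest of DMEM is free,
# so 21 of the 26 footers can be gated off during sleep.
free = [False] * 10 + [True] * 42
gated_off = footer_sleep_controls(free)
print(sum(gated_off), "of", NUM_FOOTERS, "footers gated off")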


Page 6: High-Performance SRAM Design

Multi-porting SRAM Cell

[Figure: dual-port cell with two word-lines (WL0a, WL0b) and two bit-line pairs (BL0/BLB0, BL1/BLB1).]

Page 7: High-Performance SRAM Design

Multi read ports

[Figure: cell with a write port (WWL, WBL/WBLb) and two read ports: read word-lines RWL0a and RWL0b, read bit-lines RBL1 and RBL2.]

Page 8: High-Performance SRAM Design

Multi read ports

[Figure: the same cell with write port (WWL, WBL/WBLb) and two read ports (RWL0a, RWL0b; RBL1, RBL2), drawn in an alternative arrangement.]

Page 9: High-Performance SRAM Design

Block Diagram

Figure 2.1 SRAM architecture

[Figure: 2^n x 2^m cell array with word-lines WL[0]..WL[2^n-1] and bit-line pairs BL/BLB; row decoder driven by address bits A0..An-1; column decoder driven by An..An+m-1; precharge circuit; sense amplifier and write driver; timing and control (CS, R/W, Global Read/Write); address buffer; block decoder for multiple blocks; 2^m-bit global data bus.]

The read/write (R/W) signal is used to determine whether a read or a write operation is performed, and the chip select (CS) signal is usually employed in multi-chip designs. During a read operation, the sense amplifier integrated on each column (sometimes shared between several columns) is used to read the data. During a write operation, the write drivers force the BL and BLB of the selected column to '0' or '1', and the input data is written into the internal nodes of the selected cell. Hence, a typical SRAM column consists of the bitcells together with the peripheral blocks shown in the figure: precharge circuit, sense amplifier, and write driver.
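As an illustration of the address partitioning implied by Figure 2.1 (the field widths and the helper name below are assumptions for the example), the low n bits select one of the 2^n word-lines through the row decoder, the next m bits drive the column decoder, and any remaining bits go to the block decoder:

def split_address(addr, n_row_bits, n_col_bits, n_block_bits=0):
    """Split a flat word address into (block, row, column) fields for a
    2^n-row x 2^m-column array: low bits -> row decoder (A0..An-1),
    next bits -> column decoder (An..An+m-1), remainder -> block decoder."""
    assert addr < (1 << (n_row_bits + n_col_bits + n_block_bits))
    row = addr & ((1 << n_row_bits) - 1)
    col = (addr >> n_row_bits) & ((1 << n_col_bits) - 1)
    block = addr >> (n_row_bits + n_col_bits)
    return block, row, col

# Example: 256 rows (n=8), 8:1 column mux (m=3), 2 blocks -> 12 address bits.
# The binary literal below reads block | column | row from left to right.
print(split_address(0b1_010_10100111, n_row_bits=8, n_col_bits=3, n_block_bits=1))
# -> (1, 167, 2)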

Bank and Bank Conflicts

Page 10: High-Performance SRAM Design

Global and Local Variations

- Inter-die variation (global): σGLOBAL, ΔVt-GLOBAL
- Intra-die variation (local): σLOCAL, ΔVt-LOCAL
- Random Dopant Fluctuation

Page 11: High-Performance SRAM Design

Hold Failure

AXR

BL BR

WL=0

L=‘1’

R=‘0’NR

PR

NL

PLAXL

VDDH

Time ->

WL VR

VLVo

ltage VDDH

Time ->

VR

VLWL

Volta

ge

VDDH

S. Mukhopadhyay, ITC 2010

Page 12: High-Performance SRAM Design

Read Failure

BL BR

WL

VL=‘1’

VR=‘0’VREAD

NR

PR

NL

PLAXRAXL

VTRIPRD VR=VREAD

VL

WL

Volta

ge

Time ->

WL

VR

VL

Volta

ge

Time ->

S. Mukhopadhyay, ITC 2010

Page 13: High-Performance SRAM Design

Write Failure

AXR

BL BR

WL

L=‘1’

R=‘0’NR

PR

NL

PLAXL

VR

VL

WL

Volta

ge

Time ->

WL

VR

VL

Volta

ge

Time ->

TWL

S. Mukhopadhyay, ITC 2010

Page 14: High-Performance SRAM Design

Access Failure

[Figure: 6T cell during a read access with WL = '1', VL = '1', VR = '0'; waveforms of WL, BL, and BR vs. time show the bit-line differential developing toward the minimum sense margin DMIN. An access failure occurs when the required time TAC exceeds the maximum allowed access time TMAX.]

S. Mukhopadhyay, ITC 2010

Page 15: High-Performance SRAM Design

Inter-die Variation & Cell Failures

Inter-die Vt shift (ΔVth-GLOBAL, σGLOBAL):
- High-Vt corners: access failures and write failures increase
- Low-Vt corners: read failures and hold failures increase

S. Mukhopadhyay et al., ITC 2005, VLSI 2006, JSSC 2007, TCAD 2008

Page 16: High-Performance SRAM Design

Failures in SRAM Array

• PCOL: probability that any of the cells in a column fails

PCOL = 1 - (1 - PF)^NROW

• Overall cell failure:

PF = P[Fail] = P[AF ∪ RF ∪ WF ∪ HF]

where AF, WF, RF, and HF denote access, write, read, and hold failures.

[Figure: failure tree relating the pass/fail probability of a cell (PF, 1 - PF) to column failure PCOL and memory failure PMEM, with redundant columns available for repair.]
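A short numerical sketch of these relations (illustrative Python with made-up probabilities; treating the four failure mechanisms as independent is an approximation, since the slide only states the union):

def p_cell_fail(p_af, p_rf, p_wf, p_hf):
    """Overall cell failure PF = P[AF u RF u WF u HF], approximated by
    assuming the four mechanisms are independent (complement rule)."""
    p_ok = (1 - p_af) * (1 - p_rf) * (1 - p_wf) * (1 - p_hf)
    return 1 - p_ok

def p_column_fail(p_f, n_row):
    """PCOL = 1 - (1 - PF)^NROW : the column fails if any of its cells fails."""
    return 1 - (1 - p_f) ** n_row

# Example with illustrative (made-up) per-mechanism failure probabilities.
pf = p_cell_fail(1e-7, 5e-7, 2e-7, 1e-8)
print("PF   =", pf)
print("PCOL =", p_column_fail(pf, n_row=256))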

Page 17: High-Performance SRAM Design

Transistor Sizing

[Plots: cell failure probability (log scale) vs. width of the access transistor (130-150nm), the pull-up transistor (105-125nm), and the pull-down transistor (185-245nm), with separate curves for read, write, access, and overall cell failure.]

σVt,i = σVt,0 · sqrt( (LMIN · WMIN) / (Li · Wi) )

• Slide contributed by K. Roy, Purdue
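The sizing sweeps above rest on the area dependence of local Vt variation; here is a minimal sketch of that scaling (illustrative Python; sigma_Vt,0 = 30mV and the gate length are assumed values, only the widths match the slide's sweep):

from math import sqrt

def sigma_vt(w_nm, l_nm, sigma_vt0_mv, w_min_nm, l_min_nm):
    """sigma_Vt,i = sigma_Vt,0 * sqrt(L_MIN * W_MIN / (L_i * W_i)):
    the Vt standard deviation shrinks as the gate area grows."""
    return sigma_vt0_mv * sqrt((l_min_nm * w_min_nm) / (l_nm * w_nm))

# Illustrative numbers: sigma_Vt,0 = 30mV for the minimum-size device.
for w in (130, 140, 150):   # access-transistor widths swept on the slide (nm)
    print(w, "nm ->", round(sigma_vt(w, 50, 30.0, 130, 50), 2), "mV")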

Page 18: High-Performance SRAM Design

Impact of Redundancy on Memory Failure

[Plot: memory failure probability PMEM and cell failure probability vs. redundant columns as a percentage of total columns, for a constant total area split between actual and redundant columns; σVt grows as the cells shrink.]

Larger redundancy means (1) more columns available to replace failing ones (less memory failure) but (2) smaller cell area for the same total area (larger σVt, hence larger cell failure).
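One common way to quantify the redundancy side of this trade-off is a binomial repair model (illustrative Python; the pass criterion, that the memory works as long as no more columns fail than there are spares, is an assumption, and the constant-area increase of cell failure with more spares is not modeled here):

from math import comb

def p_mem_fail(p_col, n_col, n_redundant):
    """Memory fails if more than n_redundant of the (n_col + n_redundant)
    physical columns fail, i.e. there are not enough spares to repair."""
    n_total = n_col + n_redundant
    p_pass = sum(comb(n_total, k) * p_col**k * (1 - p_col)**(n_total - k)
                 for k in range(n_redundant + 1))
    return 1 - p_pass

# Example: adding spares sharply reduces memory failure for a fixed PCOL.
for r in (0, 1, 2, 4):
    print(r, "redundant columns ->", p_mem_fail(p_col=1e-3, n_col=128, n_redundant=r))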

Page 19: High-Performance SRAM Design

Question

Array redundancy:

a) Improves cell stability
b) Degrades cell performance (i.e., increases read and write times)
c) Does not require any change to cell peripheral circuits
d) Row redundancy is better than column redundancy

Page 20: High-Performance SRAM Design

Example: Multi-VCC for SRAM Cell

• Create a differential voltage between WL and the cell supply to decouple read and write
  – Write: V_WL > V_Cell
  – Read: V_WL < V_Cell

[Plots: cell butterfly curves, V2 (V) vs. V1 (V), for V_WL - V_Cell = 0V, -0.1V, and -0.2V; normalized cell write margin vs. V_WL - V_Cell (0 to 0.3V), showing improved write margin.]

Source: K. Zhang et al., ISSCC 2005

Page 21: High-Performance SRAM Design

Dynamic Circuit Techniques for Variation Tolerant SRAM

[Figure: 6T cell (NL/PL, NR/PR, AXL/AXR) annotated with assist voltages: write with VWL = VDD + Δ, VBL = 0 - Δ, Vcell = VDD - Δ; read with VBL = 0, VBR = VDD, Vcell = VDD.]

- VWL: Write: higher VWL strengthens the access transistor (AX) and helps discharge the cell node. Read: lower VWL gives a lower Vread (weaker AX).
- VBL: Write: a negative VBL on the '0' side strengthens AX and helps discharge. Read: weak impact.
- Vcs (cell supply): Write: lower Vcs weakens the pull-up (PUP). Read: higher Vcs gives a lower Vread (stronger pull-down) and a higher Vtrip.

Page 22: High-Performance SRAM Design

Example: Dual-Vcc based Dynamic Circuit Techniques

• Dynamic VCC MUX is integrated into subarray

• VCC selection is along column direction to decouple the Read & Write

VCC_HiVCC_Lo

VCC_Select

VCC_SRAM

VCC MUX

VCC_lo

cell cell cell cell cellWL

cell cell cell cell cellWL

MUX (8:1)W R R R

cellcell cellcell cellcell cellcell cellcellWL

cellcell cellcell cellcell cellcell

VCC_hiMUX MUX MUX MUX MUX

BI MUX

VCC MUX

VCC_lo

cellcell cellcell cellcell cellcell cellcellWL

cellcell cellcell cellcell cellcell cellcellWL

MUX (8:1)W R R R

cellcell cellcell cellcell cellcell cellcellWL

cellcell cellcell cellcell cellcell

VCC_hiMUX MUXMUX MUXMUX MUXMUX MUXMUX

BI MUX

VCC MUX

Source: K. Zhang et. al. ISSCC 2005

Page 23: High-Performance SRAM Design

Negative Bit Line Scheme

Source: S. Mukhopadhyay, R. Rao et al., TVLSI 2009


C. Effect on Data-Retention

Although a column-based scheme eliminates the degradation in the half-select Read disturb failures, it can degrade the data retention ability of the unselected cells in the selected column (i.e. the data stored in the unselected cells can flip, resulting in a data retention failure). For dynamic supply control, the data retention failure probability can increase due to the lower supply voltage of the unselected cells. On the other hand, data retention failures can occur in the negative bit-line scheme due to increased leakage through the access transistor (Vgs = ΔBL > 0) in the unselected cells in the selected column.

We have estimated the data retention failure probability considering similar reductions in the cell supply and (DC negative) bit-line voltages. Fig. 3(a) plots the data retention failure probability considering similar reductions in the cell supply and bit-line voltage. The failure probability increases at a greater rate for the DC negative bit-line scenario, as compared to the reduced supply voltage scenario. This places a constraint on the maximum DC negative bit-line voltage that can be used to enhance the write-ability of the cell.

This was further verified by measuring the static noise margin for an array of 120 cells. The cells were manufactured in a 45nm SOI technology, and the mean hold noise margin for various reductions in cell and bit-line voltages is plotted in Fig. 3(b). As expected, a similar trend can be observed, with the hold noise margin decreasing rapidly with DC negative bit-line voltages. In fact, it reduces to 0 at a negative bit-line voltage of 200mV, indicating a data retention failure in all cells.

III. TRANSIENT NEGATIVE BIT-LINE TECHNIQUE

A DC negative bit-line voltage level requires a negative voltage source (on-chip or off-chip) and level converters at the drivers. Both these requirements increase the design complexity. On-chip generation of a DC negative bias requires a charge pump, which increases the power dissipation. In addition, the application of a static negative bias is likely to result in reliability concerns due to the increased electric field across the devices. The DC negative bit-line voltage also degrades the hold stability, as shown in Section II.C. To eliminate these issues while preserving the benefit of the negative bit-line voltage, we propose a capacitive coupling based technique (Tran-NBL) for generating a transient negative pulse on the appropriate bit-line.

A. Basic Concept

A Write operation is essentially composed of two parts:

(a) the node storing '1' (i.e. L in Fig. 1) is discharged until the node voltage becomes equal to the voltage at the node storing '0' (i.e. R in Fig. 1); and

(b) after the voltage at node L becomes lower than that at node R, the cross-coupled inverters ensure that node L reaches '0' and node R reaches '1' (Fig. 1).

The discharging time of part (a) is denoted as T1 and the cross-coupled inverter action time in part (b) is T2. Normally T2 is much smaller than T1. The majority of the Write failures are due to the effect of variations on T1. Variations delay the discharge of node L and T1 becomes larger than the word-line turn-on time (TWL). If node L is pulled down below node R within the word-line turn-on time, the cross-coupled inverter action will most likely ensure the write operation.
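A toy Monte Carlo illustration of this timing argument (purely illustrative Python; the Gaussian model for T1 and all numbers are assumptions): a write fails when T1 exceeds TWL, so anything that speeds the discharge of node L shifts the T1 distribution left and lowers the failure rate.

import random

def write_fail_rate(t1_mean_ps, t1_sigma_ps, twl_ps, trials=100_000, seed=1):
    """Fraction of trials where T1 (discharge of the '1' node) exceeds TWL."""
    rng = random.Random(seed)
    fails = sum(rng.gauss(t1_mean_ps, t1_sigma_ps) > twl_ps for _ in range(trials))
    return fails / trials

# Nominal discharge vs. an assist that speeds up the discharge (smaller mean T1).
print("baseline :", write_fail_rate(t1_mean_ps=60, t1_sigma_ps=15, twl_ps=100))
print("with NBL :", write_fail_rate(t1_mean_ps=45, t1_sigma_ps=15, twl_ps=100))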

[Fig. 3: Data-retention failures and noise margin for voltage changes at cell terminals. (a) Data-retention failure probability (Monte Carlo simulation [7]) vs. reduction in bit-line or cell supply voltage (0 to 0.2V), for the DC negative bit-line and reduced supply voltage cases. (b) Measured hold noise margin vs. reduction in bit-line or cell supply voltage, for the same two cases.]

[Fig. 4: The proposed Tran-NBL scheme. A BIT_EN generating block (driven by WR, CS, PCHG) fires a boost capacitor Cboost through NSEL onto the bit-line capacitance CBL of the side writing a '0' (D = '0', DB = '1'); compared with the conventional WL, PCHG, BL, and BR waveforms, the selected bit-line is driven below ground by a transient amount Δ ~ Cboost/CBL.]
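A back-of-the-envelope check of the coupling ratio annotated in Fig. 4 (illustrative Python; the ideal capacitive-divider expression and the capacitance values are assumptions): a step of Vin coupled through Cboost pushes the already-discharged bit-line of capacitance CBL below ground by roughly Vin·Cboost/(Cboost + CBL), which approaches Vin·Cboost/CBL when Cboost is much smaller than CBL.

def nbl_undershoot(v_in, c_boost, c_bl):
    """Transient negative bit-line level from capacitive coupling:
    delta_V = -Vin * Cboost / (Cboost + CBL)  (ideal divider, no leakage)."""
    return -v_in * c_boost / (c_boost + c_bl)

# Example: a 1.0V step on a 20fF boost capacitor coupling into a 200fF bit-line.
print(nbl_undershoot(v_in=1.0, c_boost=20e-15, c_bl=200e-15), "V")  # about -0.09V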

Page 24: High-Performance SRAM Design

Effectiveness Considerations: Writability improvement

• Various dynamic schemes have different effectiveness in improving writability for similar read stability
  – Higher VWL is most effective

[Plot: normalized write fail probability (log scale, 10^0 down to 10^-15) vs. change in terminal voltage Δ (0 to 200mV) for VWL = VDD + Δ, VBL = -Δ, and Vcell = VDD - Δ; fast Monte Carlo simulations for 45nm PD/SOI.]

Source: S. Mukhopadhyay, R. Rao et al., TVLSI 2009

Page 25: High-Performance SRAM Design

Impact on Active Data-Retention

• Column-based read-write control adversely impacts active data-retention failures
  – DC negative bit-line has higher active data-retention failures
  – Tran-NBL and lower Vcs have comparable failure rates

[Figure: unselected cell in the selected column (WL2 = 0, Vcell = VDD - Δ or bit-line at -Δ) suffers active data-retention fails; plot of fail probability, normalized to the write fail probability at nominal conditions, for DC-NBL, Tran-NBL, and lower Vcell.]

Source: S. Mukhopadhyay, R. Rao et al., TVLSI 2009

Page 26: High-Performance SRAM Design

Dynamic Circuit Techniques for Variation Tolerant SRAM

[Figure: 6T cell annotated with assist voltages: write with VWL = VDD + Δ, VBL = 0 - Δ, Vcell = VDD - Δ; read with VBL = 0, VBR = VDD, Vcell = VDD.]

- VWL: Write: higher VWL strengthens the access transistor (AX) and helps discharge the cell node. Read: lower VWL gives a lower Vread (weaker AX).
- VBL: Write: a negative VBL on the '0' side strengthens AX and helps discharge. Read: weak impact.
- Vcs (cell supply): Write: lower Vcs weakens the pull-up (PUP). Read: higher Vcs gives a lower Vread (stronger pull-down) and a higher Vtrip.

Page 27: High-Performance SRAM Design

Implementation Consideration: Half-Select Stability

[Figure: selected column with lowered Vcell (VDD - Δ) or negative bit-line (-Δ) next to a half-selected column at VDD; WL1 = VDD + Δ for the accessed row, WL2 = 0 for unaccessed rows.]

• Higher VWL
  – Row-based scheme
  – Degrades the half-select read stability of the unselected columns
• Lower Vcell or negative bit-line
  + Column-based scheme
  + Half-select read stability remains the same

Page 28: High-Performance SRAM Design

Assist Methods

Page 29: High-Performance SRAM Design
Page 30: High-Performance SRAM Design

Question

Of the various assist methods:

a) Negative bit-line scheme does not help the 8T SRAM cell
b) Word-line under-drive does not help the 8T SRAM cell
c) Word-line over-drive does not help the 7T conditionally decoupled SRAM cell
d) VCDL does not help any kind of asymmetric SRAM cell

Page 31: High-Performance SRAM Design

Precharge Time (Timing diagrams)

SRAM as random number generator
