High-Performance SRAM Design
Rahul Rao
IBM Systems and Technology Group
Thought exercise
[Figure: cell schematic with the READ path marked; signals WWL, WBL, WBLb, RWL, RBL.]
Implement logic function via 8T merge
• Concept: use the 8T portion of the cell to implement logic functions
– Possible due to decoupling of read and write paths
• RBL discharges when either C0 or C1 is high
– Read stack remains 2-high
• AND function: switch definition of WBL and WBLB
• An OR2 embodiment is shown on right
– Can be complex: OR4, OR8, etc.
[Figure: OR2 embodiment; signals WBL, WWL, WBLB, RWL, RBL; storage nodes C0, C1.]
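The bullets above can be read as a small truth-table exercise. The sketch below is a behavioral model only, not a circuit simulation. It assumes a precharged-high RBL, an active-high RWL, and an inverting sense stage after RBL; the function names are hypothetical and chosen for illustration.

```python
from itertools import product


def rbl_after_read(c0: bool, c1: bool, rwl: bool = True) -> bool:
    """Level of the read bit line after evaluation.

    Assumption: RBL is precharged high and is pulled low when RWL is
    asserted and either storage node (C0 or C1) is high.
    Returns True if RBL stays high, False if it discharges.
    """
    discharged = rwl and (c0 or c1)
    return not discharged


def or2(a: bool, b: bool) -> bool:
    # Data written normally (C0 = a, C1 = b); an assumed inverting
    # sense stage turns the discharged RBL into a logic 1.
    return not rbl_after_read(a, b)


def and2(a: bool, b: bool) -> bool:
    # WBL and WBLB swapped at write time, so the nodes hold the
    # complements (C0 = not a, C1 = not b). RBL stays high only
    # when both a and b are 1 (De Morgan), so RBL itself is the AND.
    return rbl_after_read(not a, not b)


if __name__ == "__main__":
    for a, b in product([False, True], repeat=2):
        print(int(a), int(b), "OR:", int(or2(a, b)), "AND:", int(and2(a, b)))
```

Swapping the bit-line definition costs nothing in the cell itself: the same two-high read stack evaluates either function, and only the write polarity and the interpretation of RBL change.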
A More Practical Implementation
+ Share RWL between adjacent cells => 3 word-lines in 4 metal tracks
– Reduces coupling capacitance (+ performance and power)
+ All FEOL features identical to conventional 8T cell
– OR: upper device in stack can be made smaller, reducing cell size
[Figure: two adjacent cells sharing RWL; signals WBL, WBLB, RBL, WWL0, RWL, WWL1; storage nodes C0, C1.]
Low leakage SRAM
(S. Hanson, VLSI Symp ’08)
good trade-off between high granularity in power gating and footer overhead. These footers are selectively turned off during sleep mode based on the contents of the free-list, reducing DMEM sleep power from 22.5pW at full retention to 7.5pW at zero memory retention, a 66% reduction.
[1] F. Albano, et al., Journal of Power Sources, 170, 2007.
[2] S. Hanson, et al., Symposium on VLSI Circuits, 2007.
[3] A. Wang, et al., International Solid-State Circuits Conference, 2004.
[4] C. Kim, et al., Transactions on VLSI Systems, Vol. 11, 2003.
[5] B. Zhai, et al., Symposium on VLSI Circuits, 2006.
[6] B. Calhoun, et al., International Solid-State Circuits Conference, 2005.
[7] Y. Lin, et al., Custom Integrated Circuits Conference, 2007.
[8] M. Seok, et al., Design Automation Conference, 2007.
[9] L. Chang, et al., Symposium on VLSI Technology, 2005.
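The free-list-driven footer gating described in the paper text can be sketched as follows. This is a minimal illustration under stated assumptions, not the chip's control logic: it assumes a free-list bit of True marks an unused row, that rows 2k and 2k+1 share footer k, and that DMEM sleep power scales linearly with the number of footers left on between the two measured endpoints (7.5pW and 22.5pW). The function names are hypothetical.

```python
N_ROWS = 52            # DMEM rows (52-bit free list in the source)
ROWS_PER_FOOTER = 2    # each of the 26 footers serves 2 rows
P_FULL_PW = 22.5       # measured DMEM sleep power, full retention
P_ZERO_PW = 7.5        # measured DMEM sleep power, zero retention


def footers_on(free_list):
    """Return one bool per footer: True if the footer must stay on.

    Assumption: free_list[i] is True when row i holds no valid data.
    A footer may be turned off only if every row it serves is free.
    """
    if len(free_list) != N_ROWS:
        raise ValueError("free list must have one entry per row")
    state = []
    for start in range(0, N_ROWS, ROWS_PER_FOOTER):
        group = free_list[start:start + ROWS_PER_FOOTER]
        state.append(not all(group))
    return state


def dmem_sleep_power_pw(free_list):
    """Estimate DMEM sleep power by linear interpolation (an assumption)."""
    on = footers_on(free_list)
    fraction_on = sum(on) / len(on)
    return P_ZERO_PW + fraction_on * (P_FULL_PW - P_ZERO_PW)


if __name__ == "__main__":
    half_used = [False] * 26 + [True] * 26   # first 26 rows hold data
    print(dmem_sleep_power_pw(half_used))    # estimate for half retention
```

Pairing rows per footer means a single valid row keeps its neighbor powered as well, which is the granularity versus overhead trade-off the text refers to.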
[Figure 10: 10.9fW/cell memory architecture. Array diagram highlights footer sharing every 2 rows. Inset figure shows bitcell architecture. Array: 52 rows x 40 columns with decoder, data in, I/O driver, data out, footer control, 52-bit free list, delay replica, and write-complete signal. Bitcell: low leakage cell plus read buffer; signals wbl, wbl_b, rbl, rwl, rwl_b, wwl_b, sleep, sleep_b; rails Vdd, Vss,cell, Vss,gated. Device W/L in um; devices marked MVT or HVT.]
[Figure 4: Footer allocation in Phoenix. Blocks: 64x10 IMEM, 128x10 IROM, 52x40 DMEM, CPU, Timer, Temp Sensor, PMU, Clock Gen, I/O. Footer sizes shown: W=0.66-28um, L=0.5um; W=0.99um, L=0.5um; W=5.5um, L=0.35um; 26 footers. Legend: MVT, HVT.]
[Plot: frequency (kHz) and active energy per cycle (pJ) versus Vdd (V), 0.4 V to 0.9 V.]
[Chart: active-mode and sleep-mode breakdown (%) across CPU, IROM, IMEM, DMEM, and Timer; active mode 2.8pJ/cycle, 297nW; sleep mode 29.6pW.]
[Plot: frequency (kHz) versus CPU footer width (um).]
[Plot: sleep leakage current (pA) versus CPU footer width (um); ~10X range.]
[Plot: frequency (kHz) and active energy per cycle (pJ) versus Vdd (V) for footer W=28um and W=0.66um; reduced minimum active energy for the larger footer.]
[Plot: total energy (J) versus Vdd (V) for Wfooter = 0.66um, Wfooter = 28um, and no footer; gaps of ~2.5X and ~10^4X marked; energy-optimal point marked.]
978-1-4244-1805-3/08/$25.00 © 2008 IEEE 2008 Symposium on VLSI Circuits Digest of Technical Papers 189
The Phoenix Processor: A 30pW Platform for Sensor Applications
Mingoo Seok, Scott Hanson, Yu-Shiang Lin, Zhiyoong Foo, Daeyeon Kim, Yoonmyung Lee, Nurrachman Liu, Dennis Sylvester, David Blaauw
University of Michigan, Ann Arbor, MI
Abstract
An integrated platform for sensor applications, called the Phoenix Processor, is implemented in a carefully-selected 0.18µm process with an area of 915x915µm², making on-die battery integration feasible. Phoenix uses a comprehensive sleep strategy with a unique power gating approach, an event-driven CPU with compact ISA, data memory compression, a custom low leakage memory cell, and adaptive leakage management in data memory. Measurements show that Phoenix consumes 29.6pW in sleep mode and 2.8pJ/cycle in active mode.
Introduction
Form-factor is a critical concern for wide applicability and cost effectiveness in sensor systems, especially in medical applications. In this work, we explore the development of a sensor platform, called the Phoenix Processor, which will occupy only 1mm³ when coupled with an on-die battery. To ensure multi-year lifetime for the given platform size, average power consumption must be reduced to tens of pW [1], which marks a ~4000X reduction over previous sensor designs [2]. Recent work [2-6] has explored aggressive Vdd scaling for reducing active energy but has overlooked the power consumed during idle periods, which can be >99% of the lifetime. In addition to aggressive voltage scaling, Phoenix leverages a comprehensive sleep strategy, including the intentional selection of an older, low leakage technology, a unique power gating approach, an event-driven CPU with compact instruction set, data memory compression, a custom low leakage memory cell, and adaptive leakage management in the data memory. A test chip, including a sensor and timers, was fabricated in an area of 915x915µm² in a 0.18µm process. Phoenix consumes 29.6pW in sleep mode and 2.8pJ/cycle in active mode at Vdd=0.5V. For a typical sensor application that runs 2000 instructions every 10 minutes, average power consumption is 39pW, which will enable integration of an on-die battery in a volume of 1mm³ while ensuring >1 year lifetime [1].
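The 39pW average quoted above can be reproduced from the two measured figures. The sketch below is a back-of-the-envelope check, not the authors' calculation: it assumes one cycle per instruction and treats the brief active burst as energy added on top of continuous sleep power. The function and parameter names are hypothetical.

```python
def average_power_pw(sleep_pw=29.6,
                     energy_per_cycle_pj=2.8,
                     instructions=2000,
                     period_s=600.0,
                     cycles_per_instruction=1.0):
    """Duty-cycled average power in pW.

    Assumptions: cycles_per_instruction = 1 (not stated in the source),
    and the active burst is short enough that sleep power can be
    charged for the whole period.
    """
    active_energy_pj = energy_per_cycle_pj * instructions * cycles_per_instruction
    # pJ divided by seconds gives pW.
    return sleep_pw + active_energy_pj / period_s


if __name__ == "__main__":
    print(f"{average_power_pw():.1f} pW")
```

Under these assumptions the active routine contributes about 9.3pW on average, so sleep power remains roughly three quarters of the total. That is why the paper concentrates on the sleep strategy.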
System Overview
As shown in Fig 1(a), Phoenix is a modular system with the CPU, memories (DMEM, IMEM, IROM) and power management unit (PMU) serving as parents of the system bus and peripherals (timer, sensor) acting as children. The accommodations made for sleep mode are best understood by exploring typical system operation (Fig 2(a)). The system begins in sleep mode (A), where 65-87% of all transistors are power gated using footers (depending on the retentive size of DMEM). In this mode, the PMU, a 2-bit FSM (Fig 3), remains awake and acts as the parent of the system bus. All gates in the PMU use stacked high-Vth (HVT) devices with Vth~0.7V to minimize leakage. In addition to the PMU, IMEM cells and valid DMEM cells remain awake to retain data. As shown in Fig 2(c), measurements at Vdd=0.5V reveal that total power is limited to 29.6pW in sleep mode with half DMEM retention. Note in Fig 2 that IMEM and DMEM draw 89% of sleep mode power.
After a programmable sleep time (e.g., 10min), a 0.9pW timer similar to [7] raises an interrupt on the system bus using the asynchronous protocol. In response the PMU initiates a short wake sequence and relinquishes control of the bus to the CPU (B). The CPU then runs a routine of ~2000 instructions to query the sensor for data and process the acquired data (C). During this routine, the CPU decompresses a block of DMEM and places it in the cache (which is part of the register file), requests a sensor measurement over the bus, processes the returned data, compresses the cache contents and stores it to DMEM, and finally issues a sleep request. The PMU then regains control of the bus and gates system power (D,E). Fig 2(b) shows measured energy and frequency characteristics for Phoenix in active mode across Vdd. At Vdd=0.5V, the system operates at 106kHz and consumes 2.8pJ/cycle, with 88% of energy consumed by the CPU. Phoenix effectively operates at Vdd=384mV due to a non-zero virtual ground.
Sleep Strategies
One of the critical pieces of our sleep strategy is the unique approach to power gating (Fig 4). Traditional power gating uses a wide HVT footer to minimize the voltage drop across the footer and thus maintain performance. Additionally, an HVT footer is attractive since it gives a dramatic leakage reduction compared to a medium-Vth (MVT) footer for minimal delay penalty. Our approach for low Vdd power gating differs in two ways: 1) we use an MVT footer since on-current for an HVT footer is exponentially smaller at low Vdd, making HVT footers infeasible, and 2) our aim is to minimize energy rather than maintain performance [8], so footer size is set to only 0.66µm (0.01% of total effective NFET width) with L=0.50µm. Due to its small size, the footer develops a voltage drop of 116mV in active mode (frequency implications shown in Fig 5), elevating the energy-optimal Vdd to 0.5V, in contrast to the optimal value of 0.36V reported for the un-gated design in [2]. The reduced leakage of the small footer (Fig 6) easily offsets the slight increase in active energy due to the power consumed across the footer (Fig 7). Total energy is reduced by 2.5X compared to a somewhat larger footer of 28µm and by 4 orders of magnitude compared to a design without a footer (Fig 8). Additionally, the elevated optimal Vdd aids in robust SRAM design.
While the CPU (Fig 9(a)) largely impacts active mode power, it also plays an important role in sleep mode. An event-driven operating system initiates computation only in response to interrupts raised by peripherals, ensuring that the system defaults to sleep mode. Since IMEM is a major source of sleep mode power, the instruction set was chosen to minimize IMEM footprint using a minimum group of basic instructions while also including support for compression, interrupt and sleep functionalities. To limit instruction width to only 10 bits, common instructions use flexible addressing modes while less common instructions use implicit operands. Additionally, the 64-word IMEM is supplemented with a low leakage 128-word IROM that contains common functions.
Hardware support for compression (Fig 9(c)) was included in the CPU to minimize the DMEM footprint in sleep mode and to maximize memory capacity. A virtual data memory space of 512B is mapped to the 256B DMEM using Huffman encoding with a fixed dictionary for a maximum compression ratio of 50%. DMEM is divided into 2 partitions: statically and dynamically allocated (Fig 9(b)). Each group of 16B (a block) in virtual memory is given one line in the statically allocated partition. If a write to memory causes a block to overflow its statically allocated entry, overflow data is written to an entry in dynamically allocated memory whose location is noted by a pointer in the statically allocated entry. A 52b free-list, which is visible to the CPU, monitors the usage of entries in both memory partitions.
Since SRAM can dominate total energy consumption, we place emphasis on low leakage SRAM design. Both IMEM and DMEM use the bitcell shown in Fig 10. The cross-coupled inverters and access transistors use HVT devices, while stack forcing and gate length biasing are used to further reduce leakage and improve subthreshold swing. Measurements show that a single bitcell (~40µm², which is acceptable for sensor applications) consumes 10.9fW while retaining data. To enable robust low Vdd operation, the proposed cell includes an MVT read buffer similar to [9]. The MVT read buffer also enables single-cycle read-out despite the aggressive use of HVT devices elsewhere. This is useful for the IMEM, where a read occurs every cycle. Since the write operation in DMEM is slow relative to the MVT CPU, write operations are asynchronous. Write completion is determined by reading the contents of the row being written and comparing to the write data. Read is single-ended, so a replica delays the write completion signal to guarantee that both sides of the cell have been written correctly.
To further reduce sleep power, the DMEM uses a leakage reduction scheme based on the free-list. The DMEM has 26 footers, with each connected to 2 rows (Fig 10). The choice of 2 rows per footer offers a
Multi-porting SRAM Cell
[Figure: multi-port cell; word lines WL0a, WL0b; bit lines BL0/BLB0, BL1/BLB1.]
Multi read ports
[Figure: cell with one write port (WWL, WBL, WBLb) and two read ports (read word lines RWL0a, RWL0b; read bit lines RBL1, RBL2).]
Block Diagram
[Figure 2.1: SRAM architecture. A 2^n x 2^m cell array (word lines WL[0] to WL[2^n-1], bit lines BL0/BLB0 to BL2^m-1/BLB2^m-1) with row decoder (address bits from A0), column decoder (address bits up to An+m-1), precharge circuit, sense amplifier and write driver, timing and control (CS, R/W), address buffer, block decoder, global read/write, and global data bus (2^m bits).]
signals is used for the determination of read or write operation and the chip select (CS)
signal is usually employed in multi-chip designs.
During the read operation, the integrated SA on each column (sometimes shared
among several columns) is used to read the data. In a write operation, the write
drivers force the BL and BLB of the selected column to ‘0’ or ‘1’, and the input data is
written into the internal nodes of the selected cell.
Hence, a typical column of SRAM consists of the following blocks:
Bank and Bank Conflicts
Global and Local Variations
• Inter-die (global): Vt shift ΔVt-GLOBAL, standard deviation σ_GLOBAL
• Intra-die (local): Vt shift δVt-LOCAL, standard deviation σ_LOCAL
• Random Dopant Fluctuation
Hold Failure
[Figure: 6T cell (PL, PR, NL, NR, AXL, AXR; bit lines BL, BR) with WL=0, L=‘1’, R=‘0’ at supply VDDH; two waveform panels of WL, VL, and VR (voltage versus time).]
S. Mukhopadhyay, ITC 2010
Read Failure
[Figure: 6T cell (PL, PR, NL, NR, AXL, AXR; bit lines BL, BR) with WL asserted, VL=‘1’, VR=‘0’, and VR=VREAD during the read; VTRIPRD marked; two waveform panels of WL, VL, and VR (voltage versus time).]
S. Mukhopadhyay, ITC 2010
Write Failure
[Figure: 6T cell (PL, PR, NL, NR, AXL, AXR; bit lines BL, BR) with L=‘1’, R=‘0’ before the write; two waveform panels of WL, VL, and VR (voltage versus time); word-line pulse width TWL marked.]
S. Mukhopadhyay, ITC 2010
Access Failure
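[Figure: 6T cell (PL, PR, NL, NR, AXL, AXR; bit lines BL, BR) with WL=‘1’, VL=‘1’, VR=‘0’, plus a cell with WL=‘0’, VL=‘0’; waveforms of WL, BL, and BR (voltage versus time) with the bit-line differential ΔMIN and TMAX marked; failure condition TAC > TMAX.]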
S. Mukhopadhyay, ITC 2010
Inter-die Variation & Cell Failures
[Figure: distribution of inter-die Vt shift (ΔVth-GLOBAL) with spread σ_GLOBAL; cell states “1” and “0” shown at each corner.]
• High-Vt corners
– Access failure
– Write failure
• Low-Vt corners
– Read failure
– Hold failure
S. Mukhopadhyay et al., ITC 2005, VLSI 2006, JSSC 2007, TCAD 2008
Failures in SRAM Array
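• Overall cell failure (AF, RF, WF, HF: access, read, write, and hold failure events):
$P_F = P[\mathrm{Fail}] = P[A_F \cup R_F \cup W_F \cup H_F]$
• PCOL: probability that any of the cells in a column fails
$P_{COL} = 1 - (1 - P_F)^{N_{ROW}}$
[Figure: failure hierarchy from cell (PASS with probability 1-PF, FAIL with probability PF) to column (PCOL) to memory (PMEM), with redundant columns.]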
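The column-failure expression on this slide, P_COL = 1 - (1 - P_F)^N_ROW, can be turned into a small calculator. The memory-level step below is an assumption that is not spelled out on the slide: the memory is taken to fail when more columns fail than there are redundant columns, with independent column failures (a binomial tail). Function names and the example numbers are hypothetical.

```python
from math import comb, expm1, log1p


def p_col(p_f: float, n_row: int) -> float:
    """P_COL = 1 - (1 - P_F)^N_ROW, evaluated stably for tiny P_F."""
    if p_f >= 1.0:
        return 1.0
    return -expm1(n_row * log1p(-p_f))


def p_mem(p_f: float, n_row: int, n_col: int, n_red: int) -> float:
    """Memory failure probability with n_red redundant columns.

    Assumption: n_col counts all columns (actual plus redundant), column
    failures are independent, and the memory survives as long as at most
    n_red columns are faulty.
    """
    pc = p_col(p_f, n_row)
    p_ok = sum(comb(n_col, i) * pc**i * (1.0 - pc)**(n_col - i)
               for i in range(n_red + 1))
    return 1.0 - p_ok


if __name__ == "__main__":
    # Hypothetical numbers: 1e-6 cell failure, 256 rows, 128 columns.
    for spares in (0, 2, 4):
        print(spares, p_mem(1e-6, 256, 128, spares))
```

Because the final step subtracts from 1 in floating point, results below roughly 1e-15 are not meaningful in this sketch. A log-domain tail sum would be needed for very low failure targets.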
Transistor Sizing
[Plots: failure probability (log scale) versus width of access transistor (130 to 150 nm), width of pull-up transistor (105 to 125 nm), and width of pull-down transistor (185 to 245 nm); curves for read failure, write failure, access failure, and cell failure.]
• Slide contributed by K. Roy, Purdue
$\sigma_{\delta Vt_i} = \sigma_{Vt0} \sqrt{\dfrac{L_{MIN}\, W_{MIN}}{L_i\, W_i}}$
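The sizing relation on this slide appears to be the usual area scaling of threshold-voltage spread: sigma for a device of size W x L equals the minimum-size sigma times sqrt(L_MIN * W_MIN / (L * W)). The sketch below evaluates that relation. The reading of the garbled slide equation is itself an assumption, and all numeric values are hypothetical.

```python
from math import sqrt


def sigma_vt_mv(sigma_vt0_mv: float,
                w_nm: float, l_nm: float,
                w_min_nm: float, l_min_nm: float) -> float:
    """Vt standard deviation (mV) of a W x L device.

    sigma_vt0_mv is the spread of the minimum-size (w_min x l_min)
    device; larger gate area averages out random dopant fluctuation.
    """
    return sigma_vt0_mv * sqrt((l_min_nm * w_min_nm) / (l_nm * w_nm))


if __name__ == "__main__":
    # Hypothetical: 30 mV at minimum size, pull-down width swept at fixed L.
    for w in (185, 215, 245):
        print(w, round(sigma_vt_mv(30.0, w, 45.0, 100.0, 45.0), 2))
```

Widening one transistor lowers its own spread but also shifts the cell's strength ratios, which is consistent with the plots showing different failure modes moving in different directions as a width changes.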
Impact of Redundancy on Memory Failure
[Plot: memory failure probability (PMEM) and cell failure probability versus redundant columns / total columns (%), for different σVt; total area (actual plus redundant columns) held constant.]
• Larger redundancy:
– (1) More columns available for replacement (lower memory failure)
– (2) Smaller cell area (higher cell failure)
Question
• Array redundancy
a) Improves cell stability
b) Degrades cell performance (i.e., increases read and write times)
c) Does not require any change to cell peripheral circuits
d) Row redundancy is better than column redundancy
Example: Multi-VCC for SRAM Cell
• Create differential voltage between WL and cell to decouple read and write
– Write: V_WL > V_Cell
– Read: V_WL < V_Cell
[Plot: V2 (V) versus V1 (V), 0 to 1.2 V, for V_WL-V_Cell = 0V, -0.1V, and -0.2V.]
[Plot: cell write margin (normalized) versus V_WL – V_Cell (V), 0 to 0.3 V; improved write margin.]
Source: K. Zhang et al., ISSCC 2005
Dynamic Circuit Techniques for Variation Tolerant SRAM
[Figure: 6T cell (PL, PR, NL, NR, AXL, AXR) storing ‘1’ and ‘0’; nominal terminals VBL = 0, VBR = VDD, Vcell = VDD; assisted terminals VWL = VDD + Δ, VBL = 0 - Δ, Vcell = VDD - Δ.]
• VWL
– Write: higher VWL => strong AX helps discharge
– Read: lower VWL => lower Vread (weak AX)
• VBL
– Write: negative VBL for 0 => strong AX helps discharge
– Read: weak impact
• Vcs
– Write: lower Vcs => weak PUP
– Read: higher Vcs => lower Vread (strong PD), higher Vtrip
Example: Dual-Vcc based Dynamic Circuit Techniques
• Dynamic VCC MUX is integrated into subarray
• VCC selection is along column direction to decouple the Read & Write
[Figure: VCC MUX selecting between VCC_Hi and VCC_Lo (control VCC_Select, output VCC_SRAM); subarray of cells with word lines WL, one VCC MUX per column supplied from VCC_hi and VCC_lo, an 8:1 column MUX with one column in write (W) and the others in read (R), and a BI MUX.]
Source: K. Zhang et al., ISSCC 2005
Negative Bit Line Scheme
Source: S. Mukhopadhyay, R. Rao et al., TVLSI 2009
C. Effect on Data-Retention
Although a column-based scheme eliminates the degradation
in the half-select Read disturb failures, it can degrade the data
retention ability of the unselected cells in the selected column
(i.e. the data stored in the unselected cells can flip, resulting in a
data retention failure). For dynamic supply control, the data
retention failure probability can increase due to lower supply
voltage of the unselected cells. On the other hand, data
retention failures can occur in the negative bit-line scheme due
to increased leakage through the access transistor (Vgs = ∆BL >
0) in the unselected cells in the selected column.
We have estimated the data retention failure probability for
similar reductions in the cell supply and (DC negative) bit-line
voltages; Fig. 3(a) plots the result. The failure probability increases at
a greater rate for the DC negative bit-line scenario, as compared
to the reduced supply voltage scenario. This places a constraint
on the maximum DC negative bit-line voltage that can be used
to enhance the write-ability of the cell.
This was further verified by measuring the static noise
margin for an array of 120 cells. The cells were manufactured
in a 45nm SOI technology, and the mean hold noise margin for
various reductions in cell and bit-line voltages is plotted in Fig.
3(b). As expected, a similar trend can be observed, with the
hold noise margin decreasing rapidly with DC negative bit-line
voltages. In fact, it reduces to 0 at a negative bit line voltage of
200mV, indicating a data retention failure in all cells.
III. TRANSIENT NEGATIVE BIT-LINE TECHNIQUE
A DC negative bit-line voltage level requires a negative
voltage source (on-chip or off-chip) and level converters at the
drivers. Both these requirements increase the design
complexity. On-chip generation of a DC negative bias requires
a charge-pump which increases the power dissipation. In
addition, the application of a static negative bias is likely to
result in reliability concerns due to the increased electric field
across the devices. The DC negative bit line voltage also
degrades the hold stability, as shown in Section II.C. To
eliminate these issues while preserving the benefit of the
negative bit-line voltage, we propose a capacitive coupling
based technique (Tran-NBL) for generating a transient negative
pulse on the appropriate bit-line.
A. Basic Concept
A Write operation is essentially composed of two parts:
(a) Node storing ‘1’ (i.e. L in Fig. 1) is discharged till the
node voltage becomes equal to the voltage at the node storing
‘0’ (i.e. R in Fig. 1); and
(b) After the voltage at node L becomes lower than that at
node R, the cross-coupled inverters ensure that node L reaches ‘0’ and node R reaches ‘1’ (Fig. 1).
The discharging time of part (a) is denoted as T1 and the
cross-coupled inverter action time in part (b) is T2. Normally T2
is much smaller than T1. The majority of the Write failures are
due to the effect of variations on T1. Variations delay the
discharge of the node L and T1 becomes larger than the
word-line turn-on time (TWL). If the node L is pulled down
below node R within the word-line turn-on time, the
cross-coupled inverter action will most likely complete the write
operation.
[Fig. 3(a) plot: failure probability (simulation, log scale) versus change in bit line or cell supply voltage (0 to 0.2 V); curves for DC negative bit line and reduced supply voltage.]
(a) Reduction in Data Retention Failure Probability (Monte Carlo Simulation [7])
[Fig. 3(b) plot: hold noise margin (measured, 0.0 to 1.0) versus change in bit line or cell supply voltage (0 to 0.2 V); curves for DC negative bit line and reduced supply voltage.]
(b) Reduction in Hold Noise Margin (Measured)
Fig. 3: Data-retention failures and noise margin for voltage
changes at cell terminals
[Figure: boost capacitor Cboost couples Vin onto the bit line (capacitance CBL, voltage VBL), giving ∆ ~ Cboost/CBL; write circuit around the cell (D=“0”, DB=“1”, bit lines BL and BR) with a BIT_EN generating block (inputs WR, CS, PCHG), NSEL, Cboost, and devices NBL/PBL, NBR/PBR, P1, P2; waveforms of WL, PCHG, BL, BR, BIT_EN and NSEL comparing the conventional scheme with this scheme.]
Fig. 4. The proposed Tran-NBL scheme.
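The figure annotates the transient undershoot as ∆ ~ Cboost/CBL. A first-order capacitive-divider estimate is sketched below. It assumes the bit line is floating at 0 V when the driven plate of Cboost falls by a step v_swing, and it ignores leakage and loading by the cell. The function name and the example capacitances are hypothetical.

```python
def bitline_undershoot_v(v_swing: float, c_boost_ff: float, c_bl_ff: float) -> float:
    """First-order transient bit-line voltage (V) after the boost step.

    Capacitive divider: delta = -v_swing * Cboost / (Cboost + CBL).
    For Cboost << CBL this reduces to the slide's delta ~ Cboost/CBL
    scaling (times the input swing).
    """
    if c_boost_ff < 0 or c_bl_ff <= 0:
        raise ValueError("capacitances must be positive")
    return -v_swing * c_boost_ff / (c_boost_ff + c_bl_ff)


if __name__ == "__main__":
    # Hypothetical values: 1.0 V swing, 100 fF bit line, swept boost cap.
    for c_boost in (5.0, 10.0, 25.0):
        print(c_boost, round(bitline_undershoot_v(1.0, c_boost, 100.0), 3))
```

Because the pulse is produced by coupling rather than by a negative supply, its depth is set by a capacitance ratio. This matches the text's motivation of avoiding a charge pump and level converters.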
Effectiveness Considerations: Writability improvement
• Various dynamic schemes have different effectiveness in improving writability for similar read stability
– Higher VWL is most effective
[Plot: normalized write fail probability (log scale, 10^0 down to 10^-15) versus change in terminal voltage Δ (0 to 200 mV); curves for VWL = VDD + Δ, VBL = -Δ, and Vcell = VDD - Δ. Fast Monte-Carlo simulations for 45nm PD/SOI.]
Source: S. Mukhopadhyay, R. Rao et al., TVLSI 2009
Impact on Active Data-Retention
• Column-based read-write control increases active data-retention failures
– DC negative bit line has higher active data-retention failures
– Tran-NBL and lower Vcs have comparable failure rates
[Plot: active data-retention fail probability for Tran-NBL, DC-NBL, and lower Vcell, for unselected cells (WL2 = 0) in the selected column (bit lines at -Δ and VDD, Vcell = VDD - Δ). Fail probabilities are normalized to write fail prob. at nominal condition.]
Source: S. Mukhopadhyay, R. Rao et al., TVLSI 2009
Implementation Consideration: Half-Select Stability
[Figure: selected row WL1 = VDD + Δ, unselected row WL2 = 0; selected column with bit lines at -Δ and VDD and Vcell = VDD - Δ; half-selected column with bit lines and Vcell at VDD.]
• Higher VWL
- Row-based scheme
- Degrades half-select read stability of the unselected columns
• Lower Vcell or negative bit line
+ Column-based scheme
+ Half-select read stability remains the same
Assist Methods
Question
• Of the various assist methods:
a) Negative bit line scheme does not help 8-T SRAM cell
b) Word line under drive does not help 8-T SRAM cell
c) Word line over drive does not help 7-T conditionally decoupled SRAM cell
d) VCDL does not help any kind of asymmetric SRAM cell
Precharge Time (Timing diagrams)
SRAM as random number generator