Sp12 CMPEN 411 L23 S.1
CMPEN 411 VLSI Digital Circuits
Spring 2012
Lecture 23: Memory Cell Designs: SRAM, DRAM
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L23 S.2
Heads-up
IBM's Kerry Bernstein: talk Thursday, 4 PM, IST 333
To prepare for his talk, go to the ANGEL system and find the file “New dimensions in performance” under “Interesting Reading Materials”
To make up the last cancelled lecture:
Kerry Bernstein’s talk “Microarchitecture’s Race for Performance and Power” (PSU, 11/2004): slides and videos are online in the ANGEL system under “Interesting Reading Materials”
DAC Young Student Scholarship
www.dac.com
Sp12 CMPEN 411 L23 S.3
Review: Basic Building Blocks
Datapath
Execution units
- Adder, multiplier, divider, shifter, etc.
Register file and pipeline registers
Multiplexers, decoders
Control
Finite state machines (PLA, ROM, random logic)
Interconnect
Switches, arbiters, buses
Memory
ROM, Caches (SRAMs), CAM, DRAMs, buffers
Sp12 CMPEN 411 L23 S.4
2D 4x4 SRAM Memory Bank
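[Figure: 2D 4x4 SRAM memory bank: address inputs A0 to A2 drive the row and column decoders; word lines WL[0] to WL[3] select rows of cells on bit line pairs BLi/!BLi with bit line precharge; the column decoder selects 2-bit words for the sense amplifiers and write circuitry; clocking and control logic generates the enable, read, and precharge signals]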
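One way to see how the three address bits of this 8-word x 2-bit bank could be divided between the decoders is the sketch below. The mapping (A2 A1 to the row decoder, A0 to the column decoder, adjacent bit lines forming one word) is a hypothetical assumption, not read off the figure, and `split_address`/`read_word` are illustrative names.

```python
# Hypothetical mapping for the 4x4 bank with 2-bit words (8 words, 3 address bits):
# A2 A1 -> row decoder (one of WL[0]..WL[3]), A0 -> column decoder (word in row).
WORD_BITS = 2

def split_address(addr):
    """Split a 3-bit address A2 A1 A0 into (row, word-in-row)."""
    if not 0 <= addr < 8:
        raise ValueError("address must fit in 3 bits")
    row = (addr >> 1) & 0b11   # A2 A1 select the word line
    word = addr & 0b1          # A0 selects the 2-bit word
    return row, word

def read_word(array, addr):
    """Return the 2-bit word at addr from a 4x4 list-of-rows bit array."""
    row, word = split_address(addr)
    bits = array[row]                      # asserting one WL exposes a whole row
    start = word * WORD_BITS
    return bits[start:start + WORD_BITS]   # column decoder passes 2 of 4 bit lines

array = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
print(read_word(array, 0b101))  # row 2, word 1
```

Splitting the address this way keeps the array square; an interleaved column mapping would work just as well and only changes the slice in `read_word`.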
Sp12 CMPEN 411 L23 S.5
6-Transistor SRAM Storage Cell
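[Figure: 6-T SRAM storage cell: cross-coupled inverters M1/M2 (driving !Q) and M3/M4 (driving Q), with access transistors M5 and M6, gated by WL, connecting !Q to !BL and Q to BL; for the stored state Q = 1, !Q = 0, M1 and M4 are on and M2 and M3 are off]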
Sp12 CMPEN 411 L23 S.6
SRAM Cell Analysis (Read)
[Figure: read of a stored 1 (Q = 1, !Q = 0): !BL and BL are precharged to 2.5 V and each is loaded by Cbit; WL = 1 turns on access transistors M5 and M6]
Read disturb (read upset): the voltage rise on !Q must be limited to prevent read upsets, while still maintaining acceptable circuit speed and area
M1 must be stronger than M5 when storing a 1 (as shown)
M3 must be stronger than M6 when storing a 0
Sp12 CMPEN 411 L23 S.7
Read Voltage Ratios
[Figure: voltage rise on !Q (V, 0 to 1.2) versus cell ratio CR (0 to 3), for VDD = 2.5 V and VTn = 0.4 V]
where CR is the Cell Ratio = (W1/L1)/(W5/L5)
Keep cell size minimal while maintaining read stability
Make M1 minimum size and increase the L of M5 (to make it weaker)
- increases load on WL
Make M5 minimum size and increase the W of M1 (to make it stronger)
Similar constraints on (W3/L3)/(W6/L6) when storing a 0
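The voltage-rise curve can be approximated with a simple current balance: while a stored 1 is read, access transistor M5 (assumed velocity saturated) charges !Q and pull-down M1 (assumed in the linear region) discharges it. The sketch below solves that balance for the rise ΔV as a function of CR. VDD and VTn are the slide's values; the NMOS saturation voltage VDSAT_N = 0.63 V is an assumed 0.25 µm-style value, body effect is ignored, and the result is only meaningful while ΔV stays below VDSAT_N (CR of roughly 0.8 or more here).

```python
import math

# VDD and VTN are the slide's values; VDSAT_N is an assumed NMOS
# velocity-saturation voltage, not given on the slide.
VDD = 2.5
VTN = 0.4
VDSAT_N = 0.63

def read_voltage_rise(cr, vdd=VDD, vtn=VTN, vdsat=VDSAT_N):
    """Steady-state rise on !Q (V) while reading a stored 1.

    Current balance, normalized by k5, with cr = (W1/L1)/(W5/L5):
      M5 (saturated):  (vdd - dv - vtn)*vdsat - vdsat**2/2
      M1 (linear):     cr * ((vdd - vtn)*dv - dv**2/2)
    Setting them equal gives a quadratic in dv; this returns its smaller root.
    """
    a = vdd - vtn
    root = math.sqrt(vdsat ** 2 * (1 + cr) + (cr * a) ** 2)
    return (vdsat + cr * a - root) / cr

for cr in (0.8, 1.0, 1.2, 2.0, 3.0):
    dv = read_voltage_rise(cr)
    print(f"CR = {cr:.1f}: rise on !Q = {dv:.3f} V (below VTn: {dv < VTN})")
```

Under these assumptions the rise falls below VTn somewhere between CR = 1.0 and CR = 1.2, which is why M1 has to be made noticeably stronger than M5.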
Sp12 CMPEN 411 L23 S.8
SRAM Cell Analysis (Write)
[Figure: writing a 0 into a cell storing a 1: !BL = 2.5 V, BL = 0 V, WL = 1, bit lines loaded by Cbit; initially Q = 1, !Q = 0]
The !Q side of the cell cannot be pulled high enough to ensure that a 0 is written (M1 is on and sized to protect against read upset), so the new value has to be written through M6.
M6 must be able to overpower M4 when storing a 1 and writing a 0
M5 must be able to overpower M2 when storing a 0 and writing a 1
Sp12 CMPEN 411 L23 S.9
Write Voltage Ratios
[Figure: write voltage VQ (V, 0 to 0.5) versus pull-up ratio PR (0 to 2), for VDD = 2.5 V, |VTp| = 0.4 V, p/n = 0.5]
where PR is the Pull-up Ratio = (W4/L4)/(W6/L6)
Keep cell size minimal while allowing writes
Make M4 and M6 minimum size
Sp12 CMPEN 411 L23 S.10
Cell Sizing and Performance
Keeping cell size minimal is critical for large SRAMs
Minimum-sized pull-down FETs (M1 and M3)
- Requires pass transistors (M5 and M6) with a longer-than-minimum channel length L to ensure a proper CR
- But increasing L of the pass transistors increases the capacitive load on the word lines and limits the current discharged on the bit lines, both of which can slow down the read cycle
Minimum width and length pass transistors
- Boost the width of the pull downs (M1 and M3)
- Reduces the loading on the word lines and increases the storage capacitance in the cell – both are good! – but cell size may be slightly larger
Performance is determined by the read operation
To accelerate the read time, SRAMs use sense amplifiers (so that the bit line doesn’t have to make a full swing)
Sp12 CMPEN 411 L23 S.11
6-T SRAM Layout
[Figure: 6-T SRAM cell layout showing the VDD and GND rails, word line WL, bit lines BL and !BL, and transistors M1 to M6]
Simple and reliable, but big
signal routing and connections to two bit lines, a word line, and both supply rails
Area is dominated by the wiring and contacts
Other alternatives to the 6-T cell include the resistive-load 4-T cell and the TFT cell, neither of which is available in a standard CMOS logic process
Sp12 CMPEN 411 L23 S.12
Multiple Read/Write Port Storage Cell
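[Figure: dual-port storage cell: the 6-T cell (M1 to M6, nodes Q and !Q) accessed on port 1 through word line WL1 and bit lines BL1/!BL1, plus a second access pair M7, M8 on word line WL2 and bit lines BL2/!BL2]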
To avoid read upset, the widths of M1 and M3 must be increased by a factor equal to the number of simultaneously open read ports
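A quick check of why the pull-downs grow with the number of ports: if n read ports on the same cell are open at once, their access currents flow into the same storage node, which, to first order (an assumption of this sketch), divides the effective cell ratio by n. The helper names are hypothetical and the W/L values are normalized.

```python
def effective_cell_ratio(w1_l1, w5_l5, open_read_ports):
    """Cell ratio seen by pull-down M1 when several access transistors,
    each with aspect ratio w5_l5, read the same cell at once.

    Simplifying assumption: the access-transistor currents simply add.
    """
    return w1_l1 / (open_read_ports * w5_l5)

def pulldown_ratio_for_ports(target_cr, w5_l5, open_read_ports):
    """Pull-down W/L that restores target_cr with n open read ports."""
    return target_cr * open_read_ports * w5_l5

# A cell sized for CR = 1.2 with one port loses half its margin with two.
print(effective_cell_ratio(1.2, 1.0, 1))
print(effective_cell_ratio(1.2, 1.0, 2))
print(pulldown_ratio_for_ports(1.2, 1.0, 2))
```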
Sp12 CMPEN 411 L23 S.13
Resistance-load SRAM Cell
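[Figure: resistive-load SRAM cell: cross-coupled NMOS pull-downs M1 and M2 with load resistors RL to VDD, and access transistors M3 and M4, gated by WL, connecting the storage nodes Q and !Q to BL and !BL]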
Sp12 CMPEN 411 L23 S.14
Remove R
[Figure: the same cell with the load resistors removed: M1, M2, access transistors M3 and M4, WL, BL and !BL]
Sp12 CMPEN 411 L23 S.15
Remove R
[Figure: the cell with one more transistor (M1) removed, leaving M2, access transistors M3 and M4, and WL]
Further remove one transistor
Sp12 CMPEN 411 L23 S.16
3-Transistor DRAM Cell
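[Figure: 3-T DRAM cell: write transistor M1 (gate WWL) connects BL1 to storage node X (capacitance Cs); X drives the gate of M2, which together with read transistor M3 (gate RWL) forms a path from BL2 to ground. Timing: a WWL pulse with BL1 at VDD writes VDD - VT onto X; during the RWL pulse, BL2 (starting at VDD - VT) drops by ΔV]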
Write: Cs is charged (or discharged) by asserting WWL and BL1. The value stored at node X when writing a 1 is VWWL - VTn.
Read: Cs is “sensed” by asserting RWL and observing BL2. The read is non-destructive and inverting, and the cell is ratioless.
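The write and read rules above can be captured in a small behavioral sketch, including the threshold drop on node X and the bootstrapped word line mentioned on the layout slide. It is a logic-level model under simplifying assumptions (fixed VTn, no body effect, no leakage), not a circuit simulation, and the class name is hypothetical.

```python
from dataclasses import dataclass

VDD = 2.5   # supply used on the earlier slides
VTN = 0.4   # NMOS threshold; body effect ignored (simplification)

@dataclass
class ThreeTransistorDRAMCell:
    """Behavioral sketch of the 3-T DRAM cell (not a circuit model)."""
    vx: float = 0.0  # voltage on storage node X (across Cs)

    def write(self, bl1, vwwl=VDD):
        # M1 passes BL1 onto X but cannot raise X above VWWL - VTn.
        self.vx = min(bl1, vwwl - VTN)

    def read(self):
        # M2 conducts when X holds a 1 and discharges BL2 through M3,
        # so the read is inverting; X itself is untouched (non-destructive).
        stored_one = self.vx > VTN
        return 0 if stored_one else 1  # logic value seen on BL2

cell = ThreeTransistorDRAMCell()
cell.write(bl1=VDD)                   # X only reaches VDD - VTn
print(cell.vx, cell.read())           # BL2 reads 0 for a stored 1
cell.write(bl1=VDD, vwwl=VDD + VTN)   # bootstrapped write word line
print(cell.vx)                        # full VDD is stored
```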
Sp12 CMPEN 411 L23 S.17
3-Transistor DRAM Cell
Refresh: read the stored data, put its inverse on BL1, and assert WWL (this must be done every 1 to 4 ms)
Note the VT drop at X: how can it be fixed?
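The refresh interval follows from how long Cs can hold its charge against leakage. Assuming a constant leakage current, the retention time is t = Cs × ΔV / Ileak. The capacitance, tolerated droop, and leakage below are hypothetical values picked to land in the slide's 1 to 4 ms range, not measured data.

```python
def retention_time(cs, allowed_droop, i_leak):
    """Seconds until leakage i_leak (A) removes allowed_droop (V) from cs (F),
    assuming the leakage current stays constant: t = Cs * dV / I_leak."""
    return cs * allowed_droop / i_leak

# Hypothetical values chosen only for illustration.
CS = 30e-15        # 30 fF storage capacitance
DROOP = 1.0        # tolerate 1 V of droop before a stored 1 is lost
I_LEAK = 10e-12    # 10 pA total leakage from node X

t = retention_time(CS, DROOP, I_LEAK)
print(f"retention ~ {t * 1e3:.1f} ms, so refresh at least this often")
```

Real leakage is strongly temperature dependent, so refresh intervals are usually set from worst-case (hot) conditions.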
Sp12 CMPEN 411 L23 S.18
3-T DRAM Layout
[Figure: 3-T DRAM cell layout showing BL2, BL1, GND, RWL, WWL, and transistors M1 to M3]
Fewer contacts & wires
Total cell area is 576 λ² (compared to 1,092 λ² for the 6-T SRAM cell)
No special processing steps are needed (so compatible with logic CMOS process)
Can use bootstrapping (raise VWWL to a value higher than VDD) to eliminate threshold drop when storing a “1”
Sp12 CMPEN 411 L23 S.19
1-Transistor DRAM Cell
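[Figure: 1-T DRAM cell: access transistor M1 (gate WL) connects bit line BL (capacitance CBL) to storage node X (capacitance Cs). Timing: writing a 1 stores VDD - VT on X; for a read, BL is precharged to VDD/2 and its small deviation is resolved by sensing]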
Write: Cs is charged (or discharged) by asserting WL and BL
Read: Charge redistribution occurs between CBL and Cs
Read is destructive, so must refresh after read
Voltage swing is small
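How small the swing is follows from charge conservation between Cs and CBL: Cs·VX + CBL·VPRE = (Cs + CBL)·Vfinal, so ΔV = (VX - VPRE) × Cs / (Cs + CBL). The sketch uses the earlier slides' VDD = 2.5 V and VT = 0.4 V; the 30 fF cell and 300 fF bit line capacitances are hypothetical values chosen only to illustrate the usual case of CBL much larger than Cs.

```python
def bitline_swing(v_cell, v_pre, cs, cbl):
    """Bit line voltage change after charge sharing between Cs and CBL:
    dV = (Vx - Vpre) * Cs / (Cs + CBL)."""
    return (v_cell - v_pre) * cs / (cs + cbl)

VDD, VT = 2.5, 0.4
CS, CBL = 30e-15, 300e-15   # hypothetical: bit line 10x the cell capacitance
v_pre = VDD / 2             # bit lines precharged to VDD/2

dv1 = bitline_swing(VDD - VT, v_pre, CS, CBL)  # stored 1 (after the VT drop)
dv0 = bitline_swing(0.0, v_pre, CS, CBL)       # stored 0
print(f"read 1: {dv1 * 1e3:+.1f} mV, read 0: {dv0 * 1e3:+.1f} mV")
```

Only tens of millivolts develop on the bit line, which is why every 1-T DRAM bit line needs its own sense amplifier. The VT drop also makes the 1 and 0 swings unequal, one more argument for bootstrapping the word line.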
Sp12 CMPEN 411 L23 S.20
Sense Amp Operation
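[Figure: sense amp operation: bit line voltage VBL versus time t; VBL starts at VPRE, moves slightly once the word line is activated, and is driven to V(1) or V(0) after the sense amp is activated]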
Sp12 CMPEN 411 L23 S.21
1-T DRAM Cell Observations
Cell is single ended (complicates the design of the sense amp)
Cell requires a sense amp for each bit line due to the charge-redistribution-based read
BLs are precharged to VDD/2 (not to VDD as in the SRAM design)
all previous designs used SAs for speed, not for functionality
Cell read is destructive; refresh must follow to restore data
Cell requires an extra capacitor (CS) that must be explicitly included in the design
May not be compatible with a logic CMOS process
A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than VDD)
Sp12 CMPEN 411 L23 S.22
1-T DRAM (3-D capacitor)
[Figure: 1-T DRAM cell with a 3-D capacitor; non-CMOS process. Source: IBM]
Sp12 CMPEN 411 L23 S.23
Peripheral Memory Circuitry
Row and column decoders
Read bit line precharge logic
Sense amplifiers
Timing and control
Speed
Power consumption
Area – pitch matching
Sp12 CMPEN 411 L23 S.24
2D 4x4 __RAM Memory
[Figure: the same 4x4 array organization as the SRAM memory bank above: address inputs A0 to A2, word lines WL[0] to WL[3], differential bit lines BLi/!BLi with bit line precharge, column decoder, sense amplifiers, write circuitry, and clocking and control]
Sp12 CMPEN 411 L23 S.25
2D 4x4 ___RAM Memory
[Figure: the same 4x4 array organization with a single bit line per column (BL0 to BL3), word lines WL[0] to WL[3], bit line precharge, column decoder, sense amplifiers, write circuitry, and clocking, control, and refresh logic]
Sp12 CMPEN 411 L23 S.26
Row Decoders
Collection of 2^M complex logic gates organized in a regular, dense fashion
(N)AND decoder for 8 address bits
WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0
…
WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0
NOR decoder for 8 address bits
WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)
…
WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)
Goals: Pitch matched, fast, low power
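The two decoder forms above compute the same one-hot word line vector; the sketch below evaluates both for 8 address bits so they can be compared directly. It models logic only (no timing or sizing), and the function names are illustrative.

```python
N_BITS = 8  # 8 address bits -> 2**8 = 256 word lines

def bits_of(value, n=N_BITS):
    """Return [bit0, bit1, ..., bit(n-1)] of value (bit0 = A0)."""
    return [(value >> i) & 1 for i in range(n)]

def wl_and(row, addr):
    """(N)AND form: AND of Ai (where the row bit is 1) or !Ai (row bit 0)."""
    return int(all(a if r else 1 - a
                   for a, r in zip(bits_of(addr), bits_of(row))))

def wl_nor(row, addr):
    """NOR form: NOR of Ai (where the row bit is 0) or !Ai (row bit 1)."""
    return int(not any(a if r == 0 else 1 - a
                       for a, r in zip(bits_of(addr), bits_of(row))))

def decode(addr, wl=wl_nor):
    """Word line vector for addr; exactly one entry should be 1."""
    return [wl(row, addr) for row in range(2 ** N_BITS)]

wls = decode(0b1010_0101)
print(sum(wls), wls.index(1))  # one active word line, at row 165
```

For row 0 the AND form reduces to !A7 & ... & !A0 and the NOR form to !(A7 | ... | A0), matching WL(0) on the slide.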
Sp12 CMPEN 411 L23 S.27
Dynamic Decoders
[Figure: 2-input dynamic row decoders driving WL0 to WL3 from A0, !A0, A1, !A1: a NOR decoder with precharge devices clocked by φ, and a NAND decoder]
Which one is faster? Smaller? Low power?
Sp12 CMPEN 411 L23 S.28
Pass Transistor Based Column Decoder
[Figure: pass-transistor column decoder: a 2-input NOR decoder on A1 and A0 produces select lines S0 to S3, each gating pass transistors that connect BL0 to BL3 to data_out and !BL0 to !BL3 to !data_out]
Read: connect the BLs to the sense amps (SAs)
Write: drive one of the BLs low to write a 0 into the cell
Fast, since there is only one transistor in the signal path. However, the transistor count is large: (K+1) x 2^K + 2 x 2^K
For K = 2: 3 x 2^2 (decoder) + 2 x 2^2 (PTs) = 12 + 8 = 20
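The read path of the pass-transistor decoder can be sketched behaviorally: the 2-input NOR decoder produces one-hot selects S0 to S3, and the single enabled pass transistor connects its bit line to data_out. The transistor-count function only evaluates the slide's formula; the function names are hypothetical.

```python
def nor_select(a1, a0):
    """2-input NOR decoder outputs S0..S3 (exactly one is 1)."""
    return [
        int(not (a1 or a0)),                # S0 = !(A1 | A0)
        int(not (a1 or (1 - a0))),          # S1 = !(A1 | !A0)
        int(not ((1 - a1) or a0)),          # S2 = !(!A1 | A0)
        int(not ((1 - a1) or (1 - a0))),    # S3 = !(!A1 | !A0)
    ]

def pass_transistor_mux(bitlines, a1, a0):
    """The one enabled pass transistor connects its bit line to data_out."""
    sel = nor_select(a1, a0)
    return next(bl for bl, s in zip(bitlines, sel) if s)

def pt_decoder_transistors(k):
    """Slide's count: (K+1)*2^K for the decoder plus 2*2^K pass
    transistors (one on each BL and one on each !BL)."""
    return (k + 1) * 2 ** k + 2 * 2 ** k

print(pass_transistor_mux([0, 1, 1, 0], a1=1, a0=0))  # selects BL2
print([pt_decoder_transistors(k) for k in (2, 4, 6)])
```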
Sp12 CMPEN 411 L23 S.29
Tree Based Column Decoder
[Figure: tree-based column decoder: pass-transistor trees controlled by A0, !A0, A1, !A1 connect one of BL0 to BL3 to data_out and one of !BL0 to !BL3 to !data_out]
Number of transistors = 2 x 2 x (2^K - 1)
For K = 2: 2 x 2 x (2^2 - 1) = 4 x 3 = 12
Delay increases quadratically with the number of sections K (so the tree is prohibitive for large decoders)
can fix with buffers, progressive sizing, or a combination of the tree and pass-transistor approaches
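The slide's tree count and its quadratic-delay remark can be checked with a first-order Elmore model. The sketch assumes every pass transistor in the path is an identical resistance R with a capacitance C at its output, in normalized units; real designs add the bit line and sense amp loads, so only the K(K+1)/2 trend is meant to carry over.

```python
def tree_decoder_transistors(k):
    """Slide's count: 2 x 2 x (2^K - 1), a binary pass-transistor tree
    on both BL and !BL."""
    return 2 * 2 * (2 ** k - 1)

def elmore_chain_delay(k, r=1.0, c=1.0):
    """Elmore delay of K series pass transistors, each an R with C at its
    output (normalized): sum over i of i*R*C = R*C*K*(K+1)/2."""
    return sum(i * r * c for i in range(1, k + 1))

for k in (2, 4, 6, 8):
    print(k, tree_decoder_transistors(k), elmore_chain_delay(k))
```

Doubling K from 4 to 8 multiplies the normalized delay by 3.6, which is why buffers or a hybrid with the pass-transistor decoder are used for large K.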
Sp12 CMPEN 411 L23 S.30
Bit Line Precharge Logic
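[Figure: bit line precharge logic on BL and !BL, controlled by !PC, with an equalization transistor between the bit lines]
Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line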
First step of a Read cycle is to precharge (PC) the bit lines to VDD
every differential signal in the memory must be equalized to the same voltage level before Read
Turn off PC and enable the WL
the grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
Sp12 CMPEN 411 L23 S.31
Sense Amplifiers
Amplification – resolves data with small bit line swings (in some DRAMs required for proper functionality)
Delay reduction – compensates for the limited drive capability of the memory cell to accelerate BL transition
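[Figure: sense amplifier (SA) with input and output]
tp = (C × ΔV) / Iav, where C (the bit line capacitance) is large and Iav (the cell current) is small, so make ΔV as small as possible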
Power reduction – eliminates a large part of the power dissipation due to charging and discharging bit lines
Signal restoration – for DRAMs, need to drive the bit lines full swing after sensing (read) to do data refresh
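Plugging numbers into tp = (C × ΔV) / Iav shows why sensing a small swing helps: the delay scales directly with the voltage the bit line must develop. The bit line capacitance, cell current, and 100 mV sensing swing below are hypothetical values for illustration only; the 2.5 V full swing matches the earlier slides' VDD.

```python
def propagation_delay(c, delta_v, i_av):
    """tp = C * dV / Iav in seconds (C in F, dV in V, Iav in A)."""
    return c * delta_v / i_av

C_BL = 500e-15   # hypothetical bit line capacitance: 500 fF
I_CELL = 50e-6   # hypothetical average cell read current: 50 uA

full_swing = propagation_delay(C_BL, 2.5, I_CELL)  # bit line swings all the way
sensed = propagation_delay(C_BL, 0.1, I_CELL)      # SA resolves a 100 mV swing
print(f"full swing: {full_swing * 1e9:.1f} ns, with SA: {sensed * 1e9:.1f} ns")
```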
Sp12 CMPEN 411 L23 S.32
Differential Sense Amplifier
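Directly applicable to SRAMs
[Figure: differential sense amplifier built from transistors M1 to M5, with inputs bit and !bit, sense enable SE, supply VDD, internal node y, and output Out]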
Sp12 CMPEN 411 L23 S.33
Differential Sensing ― SRAM
[Figure: (a) SRAM sensing scheme: SRAM cell i on word line WL i, bit lines BL and !BL with precharge (PC) and equalization (EQ), feeding a differential sense amp enabled by SE; (b) two-stage differential amplifier (M1 to M5, enabled by SE) with relative device sizes x, 2x, y, 2y]
Sp12 CMPEN 411 L23 S.35
Redundancy in the Memory Structure
[Figure: memory array with a redundant row and redundant columns; the row and column addresses are checked against a fuse bank]
Sp12 CMPEN 411 L23 S.36
Row Redundancy
[Figure: the functional address is compared (== ?) with the fused repair addresses; a match enables the corresponding redundant wordline, otherwise the normal wordline decoder drives the normal wordline]
Sp12 CMPEN 411 L23 S.37
Column Redundancy
[Figure: normal data columns plus a redundant data column, connected through fuses to the outputs Data 0 to Data 7]
Sp12 CMPEN 411 L23 S.38
Error-Correcting Codes
Example: Hamming Codes
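e.g., if B3 flips, the three parity checks evaluate to 1, 1, 0, which together form the syndrome 3 (binary 011) and point to the flipped bit
2^k >= m + k + 1, where m is the number of data bits and k is the number of check bits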
For 64 data bits, needs 7 check bits
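A compact sketch of a single-error-correcting Hamming code in the spirit of the example: check bits sit at the power-of-two positions, and the syndrome (the XOR of the positions of all 1 bits) directly names a flipped bit. Treating B3 as codeword position 3 is an assumption about the slide's labeling, and the function names are hypothetical. A double error can produce a misleading syndrome, so this sketch only corrects single errors.

```python
def check_bits_needed(m):
    """Smallest k with 2**k >= m + k + 1 (m data bits, k check bits)."""
    k = 0
    while 2 ** k < m + k + 1:
        k += 1
    return k

def syndrome(codeword):
    """XOR of the 1-based positions of all 1 bits; 0 means no single-bit error."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s

def encode(data):
    """Place data bits at non-power-of-two positions, then set the check
    bits at positions 1, 2, 4, ... so the codeword's syndrome is 0."""
    k = check_bits_needed(len(data))
    n = len(data) + k
    code = [0] * n
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):            # not a power of two -> data bit
            code[pos - 1] = next(bits)
    s = syndrome(code)
    for i in range(k):
        if s & (1 << i):
            code[(1 << i) - 1] = 1     # check bit at position 2**i
    return code

def correct(codeword):
    """Return (corrected codeword, syndrome); a nonzero syndrome names the flipped position."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if 0 < s <= len(fixed):
        fixed[s - 1] ^= 1
    return fixed, s

word = encode([1, 0, 1, 1])   # 4 data bits -> 3 check bits, 7-bit codeword
received = list(word)
received[3 - 1] ^= 1          # flip B3 (position 3)
print(correct(received)[1])   # syndrome 3 locates the flipped bit
print(check_bits_needed(64))  # check bits for 64 data bits
```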
Sp12 CMPEN 411 L23 S.39
Performance and area overhead for ECC
Sp12 CMPEN 411 L23 S.40
Redundancy and Error Correction
Sp12 CMPEN 411 L23 S.41
Soft Errors
Nonrecurrent and nonpermanent errors from
alpha particles (from the packaging materials)
neutrons from cosmic rays
As feature size decreases, the charge stored at each node decreases (due to lower node capacitance and lower VDD), and thus Qcritical (the charge necessary to cause a bit flip) decreases, leading to an increase in the soft error rate (SER)
[Figure: system FITs (log scale, 1 to 10,000) versus process technology (0.25, 0.18, 0.13, 0.09, 0.05 µm). From Semico Research Corp.]
MTBF (hours) | 0.13 µm | 0.09 µm
Ground-based | 895 | 448
Civilian Avionics System | 324 | 162
Military Avionics System | 18 | 9
From Actel
Sp12 CMPEN 411 L23 S.42
Scary Fact
Avionics systems in civilian aviation: both an altitude of 30,000 feet and a route crossing the North Pole increase the neutron flux. If an avionics board uses four 1M 130 nm SRAM-based FPGAs, it would be subject to 0.074 upsets per day, i.e., 324 hours between upsets or about 3 million FITs. Assume one such system on board each commercial aircraft, 4,000 civilian flights per day, and a 3-hour average flight time. Then nearly 37 aircraft per day will experience a neutron-induced SRAM-based FPGA configuration failure during their flight.
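The numbers in this example are mutually consistent, as a short calculation shows. It uses the definition 1 FIT = one failure per 10^9 device-hours and treats the expected number of upsets during 12,000 flight-hours per day as the number of affected flights, a fair approximation when upsets are rare.

```python
HOURS_PER_DAY = 24
FIT_HOURS = 1e9  # 1 FIT = one failure per 10^9 device-hours

mtbf_hours = 324                           # slide value: 0.13 um, civilian avionics
upsets_per_day = HOURS_PER_DAY / mtbf_hours
fits = FIT_HOURS / mtbf_hours

flights_per_day = 4000
flight_hours = 3
# Expected upsets across all flight-hours in one day; with such a low rate
# this is approximately the number of flights that see at least one upset.
failures_per_day = flights_per_day * flight_hours / mtbf_hours

print(f"{upsets_per_day:.3f} upsets/day, {fits:.2e} FIT, "
      f"{failures_per_day:.1f} in-flight failures per day")
```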
Sp12 CMPEN 411 L23 S.43
Modeling of a particle strike
Sp12 CMPEN 411 L23 S.44
A SPICE simulation for SRAM
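[Figure: SPICE simulation of a particle strike on an SRAM cell (bit lines BL and !BL, word line WL): the strike flips the storage nodes, 0 -> 1 and 1 -> 0]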
Sp12 CMPEN 411 L23 S.45
On-chip Memory: ITRS roadmap
[Figure: ITRS roadmap of die utilization (% of die area, 0 to 100) for reused logic, new logic, and memory across technology nodes: 180 nm ('99), 130 nm ('02), 100 nm ('05), 70 nm ('08), 50 nm ('11), 35 nm ('14)]
Sp12 CMPEN 411 L23 S.46
State of the Art
Sp12 CMPEN 411 L23 S.47
State of the Art