
Sp09 CMPEN 411 L23 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 23: Memory Cell Designs...

Date post: 02-Jan-2016
Author: valentine-parsons
Sp09 CMPEN 411 L23 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 23: Memory Cell Designs SRAM, DRAM [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp09 CMPEN 411 L23 S.*
Heads-up
IBM Kerry Bernstein’s talk Thursday 4 PM, IST 333
To prepare for his talk, go to the ANGEL system and find the file “New dimensions in performance” under “interesting reading materials”
To make up last cancelled lecture:
Kerry Bernstein’s talk – “Microarchitecture’s Race for Performance and Power”, PSU talk, 11/2004, Slides and Videos are online in ANGEL system “Interesting Reading Materials”
DAC Young Student Scholarship
Review: Basic Building Blocks
Multiplexers, decoders
Interconnect
2D 4x4 SRAM Memory Bank
To decrease the bit-line delay for reads, use low-swing bit lines: the bit lines do not swing rail-to-rail but are precharged to Vdd (as 1) and discharged only to Vdd - 10% of Vdd (as 0). (So for Vdd = 2.5 V, a 0 is 2.25 V.) Sense amplifiers are required to restore full swing.
Write circuitry – receives full swing from sense amps – or writes full swing to the bit lines
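As a rough illustration of why low-swing bit lines help, the first-order delay estimate $t_p = C \cdot V_{swing} / I_{av}$ can be evaluated for a rail-to-rail swing and for the 10% swing described above. The 1 pF bit-line capacitance and 100 µA cell current are assumed values chosen only for illustration; the point is that, at a fixed discharge current, the delay scales directly with the swing.

```python
# First-order bit-line delay t_p = C * V_swing / I_av (constant-current discharge).
# C_BIT and I_CELL are assumed, illustrative values, not figures from the lecture.

def bitline_delay(c_bit: float, v_swing: float, i_avg: float) -> float:
    """Time (s) for an average current i_avg (A) to move c_bit (F) by v_swing (V)."""
    return c_bit * v_swing / i_avg


VDD = 2.5          # supply used in the slide example (V)
C_BIT = 1e-12      # assumed bit-line capacitance: 1 pF
I_CELL = 100e-6    # assumed average read current of the selected cell: 100 uA

full_swing = bitline_delay(C_BIT, VDD, I_CELL)        # rail-to-rail swing
low_swing = bitline_delay(C_BIT, 0.10 * VDD, I_CELL)  # 10% of VDD swing

print(f"full swing: {full_swing * 1e9:.1f} ns")
print(f"low swing:  {low_swing * 1e9:.2f} ns")
print(f"'0' level with low swing: {VDD - 0.10 * VDD:.2f} V")
```

The price of the 10x shorter swing is that a sense amplifier must resolve the small difference and restore a full logic level.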
6-Transistor SRAM Storage Cell
Note that it is identical to the register cell from static sequential circuits: two cross-coupled inverters. The main job of the pull-ups is to replenish charge lost to leakage.
Sizing of the transistors is critical
Does it consume standby power? Yes, in two forms (see if the students can identify each): cell leakage and bit-line leakage.
Talk about the crow-bar effect of the cross-coupled inverters
SRAM Cell Analysis (Read)
Read-disturb (read-upset): must limit the voltage rise on !Q to prevent read-upsets from occurring while simultaneously maintaining acceptable circuit speed and area
M1 must be stronger than M5 when storing a 1 (as shown)
M3 must be stronger than M6 when storing a 0
First precharge both bit lines, BL and !BL, to 1. Note that the bit-line capacitance Cbit can be in the pF range for large memories (it comes from the wiring, from the diffusion capacitances of the M5s of all the cells connected to the bit line, and from the load presented by the sense amp).
Then !BL discharges through M5 and M1 (since the cell is holding a 1). The key is to ensure that node !Q does not rise too high before Cbit is discharged; otherwise the memory cell will change state (read-disturb).
BL being at 1 helps keep the cell from toggling. Alternatively, the bit lines can be precharged to Vdd/2 so that !Q can never reach the switching point; this has performance benefits as well.
Read Voltage Ratios
Keep cell size minimal while maintaining read stability
Make M1 minimum size and increase the L of M5 (to make it weaker)
increases load on WL
Make M5 minimum size and increase the W of M1 (to make it stronger)
Similar constraints on (W3/L3)/(W6/L6) when storing a 0
To avoid read-disturb, the voltage on node !Q must remain below the trip point of the inverter pair under all process, noise, and operating conditions. This requires a cell ratio CR = (W1/L1)/(W5/L5) of no less than 1.2 (most microprocessor fabs use a minimum cell ratio of 1.25 to 2).
Simulations should be done at high Vdd and low VTn (and considering process variations and misalignments)
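A minimal sketch of the read-stability constraint, assuming a simple velocity-saturation MOSFET model with no body effect or channel-length modulation: during a read, the saturated access transistor M5 (gate at VDD, drain on the precharged bit line) and the linear-region pull-down M1 settle at a small voltage on !Q. Equating the two currents and solving the quadratic gives that voltage as a function of CR. The parameters (VDD = 2.5 V, VTn = 0.43 V, VDSATn = 0.63 V) are assumed 0.25 µm values, and using VTn as a conservative stand-in for the inverter trip point is also an assumption.

```python
import math

# Assumed 0.25 um NMOS parameters (volts), for illustration only.
VDD, VTN, VDSATN = 2.5, 0.43, 0.63


def read_bump(cr: float, vdd: float = VDD) -> float:
    """Voltage on the '0' node (!Q) during a read, for cell ratio cr = (W1/L1)/(W5/L5).

    Equates the velocity-saturated access transistor M5 current,
        k5 * ((vdd - dv - VTN) * VDSATN - VDSATN**2 / 2),
    with the linear-region pull-down M1 current,
        k1 * ((vdd - VTN) * dv - dv**2 / 2),
    and returns the smaller root of the resulting quadratic in dv.
    """
    vgt = vdd - VTN
    a = cr * vgt + VDSATN
    return (a - math.sqrt(VDSATN**2 * (1 + cr) + (cr * vgt) ** 2)) / cr


for cr in (0.8, 1.0, 1.2, 1.5, 2.0):
    dv = read_bump(cr)
    verdict = "below VTn" if dv < VTN else "above VTn: read-upset risk"
    print(f"CR = {cr:.1f}: V(!Q) = {dv:.3f} V ({verdict})")
```

Under these assumptions the bump on !Q drops below VTn somewhere between CR = 1.0 and CR = 1.2, in line with the minimum cell ratio of 1.2 quoted above.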
SRAM Cell Analysis (Write)
The !Q side of the cell cannot be pulled high enough to guarantee that a 0 is written (because M1 is on and sized to protect against read upset). So the new value of the cell has to be written through M6.
M6 must be able to overpower M4 when storing a 1 and writing a 0
M5 must be able to overpower M2 when storing a 0 and writing a 1
State shown is that before write takes effect (1 is stored, trying to write a 0)
Note that the !Q side of the cell cannot be pulled high enough to write a 1 onto !Q (because the stored Q = 1 holds M1 on, providing a good path to ground). So writing has to occur through the Q side of the RAM cell. To write the cell, the pass gate M6 must be more conductive than M4, so that node Q can be pulled low enough for the inverter pair to begin amplifying the new data.
There is a maximum ratio of pull-up size to pass-gate size that still guarantees the cell is writable (M6 in the linear region, M4 in saturation).
Write Voltage Ratios
Make M4 and M6 minimum size
In order to write the cell, node Q must be pulled low enough to trip the inverter pair, so pulling Q below VTn (0.4 V) is required.
The lower the pull-up ratio PR = (W4/L4)/(W6/L6), the lower the value of VQ; PR has to be below about 1.8.
The limiting case for the write operation occurs at high Vdd when the pFET is strong (μp high, VTp low) and the nFET is weak (μn low, VTn high)
Typically, the widths of the pullup devices are sized at or near process minimum. Longer than minimum channel lengths may also be employed to further reduce the pullup ratio. This is necessary since the read sizing of the SRAM cell dictates that the pass gate sizing should be minimized to prevent read disturbs.
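The write constraint can be sketched the same way, again under assumed 0.25 µm parameters (VTn = 0.43 V, |VTp| = 0.4 V, |VDSATp| = 1.0 V, μp/μn ≈ 30/115): while Q = 1 is being overwritten, the pass gate M6 is in the linear region and the pull-up M4 (gate at !Q = 0) is velocity-saturated, so equating their currents gives VQ as a function of PR. This is a hand-analysis model, not a substitute for the corner simulations mentioned above.

```python
import math

# Assumed 0.25 um device parameters (volts); mu_p/mu_n taken as k'p/k'n = 30/115.
VDD, VTN = 2.5, 0.43
VTP_ABS, VDSATP_ABS = 0.4, 1.0
MU_RATIO = 30.0 / 115.0


def pullup_term(vdd: float = VDD) -> float:
    """I_M4 / k_p for the velocity-saturated PMOS pull-up with |VGS| = VDD."""
    return (vdd - VTP_ABS) * VDSATP_ABS - VDSATP_ABS**2 / 2


def write_level(pr: float, vdd: float = VDD) -> float:
    """Voltage on Q when M6 (linear) balances M4 (saturated); pr = (W4/L4)/(W6/L6)."""
    vgt = vdd - VTN
    disc = vgt**2 - 2 * MU_RATIO * pr * pullup_term(vdd)
    if disc < 0:
        raise ValueError("pull-up too strong for this model: cell is not writable")
    return vgt - math.sqrt(disc)


def max_pullup_ratio(v_target: float, vdd: float = VDD) -> float:
    """Largest PR for which M6 can still pull Q down to v_target."""
    vgt = vdd - VTN
    return (vgt * v_target - v_target**2 / 2) / (MU_RATIO * pullup_term(vdd))


for pr in (1.0, 1.5, 2.0, 2.5):
    print(f"PR = {pr:.1f}: VQ = {write_level(pr):.3f} V")
print(f"max PR for VQ <= 0.4 V: {max_pullup_ratio(0.4):.2f}")
```

With these assumed values the largest PR that still pulls Q down to 0.4 V comes out at roughly 1.8, consistent with the limit quoted above.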
Cell Sizing and Performance
Minimum sized pull down fets (M1 and M3)
Requires longer than minimum channel length, L, pass transistors (M5 and M6) to ensure proper CR
But up-sizing L of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines both of which can adversely affect the speed of the read cycle
Minimum width and length pass transistors
Boost the width of the pull downs (M1 and M3)
Reduces the loading on the word lines and increases the storage capacitance in the cell – both are good! – but cell size may be slightly larger
Performance is determined by the read operation
To accelerate the read time, SRAMs use sense amplifiers (so that the bit line doesn’t have to make a full swing)
Read requires (dis)charging of the large bit-line capacitance through the stack of two small transistors in the selected cell. Write is dominated by the propagation delay of the cross-coupled inverter pair.
SPEED – $t_p = C_L V_{swing} / I_{av}$
Read – critical speed op since have large bit line capacitance to discharge through small transistors
Write – speed determined by the propagation delay of the cross-coupled inverters
tpLH > tpHL due to precharge
6-T SRAM Layout
Simple and reliable, but big
signal routing and connections to two bit lines, a word line, and both supply rails
Area is dominated by the wiring and contacts
Other alternatives to the 6-T cell include the resistive-load 4-T cell and the TFT cell, neither of which is available in a standard CMOS logic process
[Layout figure: 6-T SRAM cell with VDD, GND, Q, !Q, WL, BL, !BL, and transistors M1-M6]
Area is dominated by wiring and contacts – 11.5 contacts
Other considerations – size of cell (1092 lambda**2 = 582 micron **2 by 0.7 micron design rules)
and power – standby/cell of 10**-15A
Multiple Read/Write Port Storage Cell
To avoid read upset, the widths of M1 and M3 will have to be sized up by a factor equal to the number of simultaneously open read ports
[Figure: dual-port SRAM cell with transistors M1-M8, storage nodes Q and !Q, word lines WL1 and WL2, and bit-line pairs BL1/!BL1 and BL2/!BL2]
Allows multiple simultaneous reads of the same cell, so the cell design must be stable for a case of multiple reads.
For the case of reads, with more than one pass gate open, the voltage rise in the cell will be larger and thus the size of the pulldown will have to be increased to maintain an acceptably low level to keep from incurring a read upset (by a factor equal to the number of simultaneous open read ports).
Resistance-load SRAM Cell
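[Figure: resistive-load SRAM cell with load resistors RL to VDD, transistors M1-M4, word line WL, storage nodes Q and !Q, and bit lines BL and !BL]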
How to make R? Undoped poly: on the order of TΩ/square; poly with silicide: 4-5 Ω/square.
Remove R
[Figure: the same cell with the load resistors removed: transistors M1-M4, WL, BL, and !BL]
Remove R
Further remove one transistor
3-Transistor DRAM Cell
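[Figure: 3-T DRAM cell with transistors M1-M3, storage node X with capacitance Cs, write word line WWL with bit line BL1, and read word line RWL with bit line BL2]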
Write: Cs is charged (or discharged) by asserting WWL and BL1
Value stored at node X when writing a 1 is VWWL - VTn
Read: Cs is “sensed” by asserting RWL and observing BL2
Read is non-destructive and inverting (ratioless)
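[Timing diagram: during write, WWL is asserted and BL1 = VDD leaves node X at VDD - VT; during read, RWL is asserted and BL2 (precharged to VDD - VT) drops by ΔV]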
Core of first popular MOS memories (e.g., first 1Kbit memory from Intel). Cs is data storage (internal capacitance of wiring, M2 gate, and M1 diffusion capacitances)
No constraints on device sizes (ratioless)
Note threshold drop at point X which decreases the drive (gate voltage) of M2 and slows down read time – some designs “bootstrap” the word line voltage (raise the VWWL to a value higher than Vdd to get around threshold drop).
Write – uses WWL and BL1. Data is retained as charge on Cs once WWL is lowered.
Read – uses RWL and BL2. Assume BL2 precharged to Vdd (or Vdd-Vt). If cell is holding 1, then BL2 goes low – so reads are inverting.
Refresh – read stored data, put its inverse on BL1 and assert WWL (need to do this every 1 to 4 msec)
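To make the write/read/refresh sequence concrete, here is a purely behavioral sketch of the 3-T cell (not a circuit simulation). The 0.5 V threshold, the 20 ms leakage time constant, and modeling leakage as an exponential decay of V_X are all assumptions for illustration. It shows the VT drop on a stored 1, how bootstrapping WWL removes it, the inverting read, and refresh writing back the inverse of what BL2 shows.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreeTDramCell:
    """Behavioral (not circuit-level) sketch of the 3-T DRAM cell.

    Hypothetical parameters: vt is the NMOS threshold; tau_s is an assumed
    leakage time constant for storage node X.
    """
    vdd: float = 2.5
    vt: float = 0.5
    tau_s: float = 20e-3
    v_x: float = 0.0

    def write(self, bl1: int, v_wwl: Optional[float] = None) -> None:
        # WWL high lets BL1 charge/discharge Cs through M1. A '1' stops one
        # threshold below the word-line voltage unless WWL is bootstrapped.
        v_wwl = self.vdd if v_wwl is None else v_wwl
        self.v_x = min(self.vdd, v_wwl - self.vt) if bl1 else 0.0

    def leak(self, dt: float) -> None:
        # Charge on Cs leaks away; modeled here as a simple exponential decay.
        self.v_x *= math.exp(-dt / self.tau_s)

    def read(self) -> int:
        # BL2 is precharged high; asserting RWL discharges it through M3 and M2
        # only when M2 is on, so BL2 shows the inverse of the stored bit.
        return 0 if self.v_x > self.vt else 1

    def refresh(self) -> None:
        # Read the data, put its inverse on BL1, and assert WWL.
        self.write(1 - self.read())


cell = ThreeTDramCell()
cell.write(1)
print(f"stored '1' without bootstrapping: V_X = {cell.v_x:.2f} V")
cell.write(1, v_wwl=cell.vdd + cell.vt)
print(f"stored '1' with bootstrapped WWL: V_X = {cell.v_x:.2f} V")
```

With these assumed numbers, refreshing every 2 ms restores the full level, while waiting 40 ms lets V_X decay below VT so the stored 1 is lost.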
3-Transistor DRAM Cell
Refresh: read stored data, put its inverse on BL1 and assert WWL (need to do this every 1 to 4 msec)
Note the VT drop at X: how can it be fixed?
3-T DRAM Layout
Fewer contacts & wires
Total cell area is 576 λ² (compared to 1,092 λ² for the 6-T SRAM cell)
No special processing steps are needed (so compatible with logic CMOS process)
Can use bootstrapping (raise VWWL to a value higher than VDD) to eliminate threshold drop when storing a “1”
[Layout figure: 3-T DRAM cell with BL1, BL2, GND, RWL, WWL, and transistors M1-M3]
Note many fewer contacts (only 7) and wires than in 6-T SRAM cell
576 lambda**2 as compared to 1092 lambda**2 for SRAM cell
1-Transistor DRAM Cell
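[Figure: 1-T DRAM cell (access transistor M1, word line WL, bit line BL with capacitance CBL, storage node X with capacitor Cs) and its waveforms: writing a 1 leaves X at VDD - VT; during a read, BL starts at VDD/2 and moves by a small sensing signal]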
Write: Cs is charged (or discharged) by asserting WL and BL
Read: Charge redistribution occurs between CBL and Cs
Read is destructive, so must refresh after read
Voltage swing is small
Most pervasive DRAM cell in commercial designs.
Write – set BL and activate WL. Again, WL can be bootstrapped (raised above Vdd) so that the threshold drop at X does not occur and X reaches Vdd; this is common practice.
Read – BL is precharged to Vpre (typically Vdd/2); then WL is asserted and the sense amp detects the change on BL caused by charge sharing between CBL and Cs. Note that the read is destructive (it steals charge from Cs), so it must be followed by a refresh cycle. Since Cs << CBL (by 1 to 2 orders of magnitude), read voltage swings are typically very small (around 250 mV for 0.8 micron technology?). Charge transfer ratios are between 1% and 10%.
$\Delta V = V_{BL} - V_{pre} = (V_{bit} - V_{pre}) \frac{C_s}{C_s + C_{BL}}$
REQUIRES a sense amp on each bit line for correct operation (whereas in the previous designs it was used only to improve performance, via reduced bit-line swings on reads)
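The charge-sharing equation above can be evaluated directly. The 30 fF storage capacitance, 600 fF bit-line capacitance, and 0.5 V threshold drop are assumed values picked so that Cs << CBL; they give a charge transfer ratio of about 4.8%, inside the 1% to 10% range mentioned above, and show how a 1 stored without bootstrapping (at VDD - VT) yields a smaller signal than a stored 0.

```python
def sense_signal(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line change after charge sharing: (V_bit - V_pre) * Cs / (Cs + C_BL)."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


VDD = 2.5
VT = 0.5          # assumed threshold drop when WL is not bootstrapped
V_PRE = VDD / 2   # bit line precharged to VDD/2
C_S = 30e-15      # assumed storage capacitance (30 fF)
C_BL = 600e-15    # assumed bit-line capacitance (600 fF), so C_S << C_BL

transfer_ratio = C_S / (C_S + C_BL)
dv_one_boot = sense_signal(VDD, V_PRE, C_S, C_BL)   # '1' stored at full VDD
dv_one = sense_signal(VDD - VT, V_PRE, C_S, C_BL)   # '1' stored at VDD - VT
dv_zero = sense_signal(0.0, V_PRE, C_S, C_BL)       # '0' stored at 0 V

print(f"charge transfer ratio: {transfer_ratio:.1%}")
print(f"dV('1', bootstrapped WL): {dv_one_boot * 1e3:+.1f} mV")
print(f"dV('1', VDD - VT):        {dv_one * 1e3:+.1f} mV")
print(f"dV('0'):                  {dv_zero * 1e3:+.1f} mV")
```

The tens-of-millivolt signals are why a sense amplifier on every bit line is mandatory here rather than a speed optimization.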
Sense Amp Operation
1-T DRAM Cell Observations
Cell is single ended (complicates the design of the sense amp)
Cell requires a sense amp for each bit line due to charge redistribution based read
BL’s precharged to VDD/2 (not VDD as with SRAM design)
all previous designs used SAs for speed, not functionality
Cell read is destructive; refresh must follow to restore data
Cell requires an extra capacitor (CS) that must be explicitly included in the design
May not be compatible with a logic CMOS process
A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than VDD)
1-T DRAM (3-D capacitor)
Peripheral Memory Circuitry
Sense amplifiers
Area – pitch matching
Address decoders have a substantial impact on the speed and power consumption of the memory
When designing decoders, important to keep the complete memory floorplan in perspective so that geometry matching between the decoder cell dimensions and the core cell is done – pitch matching. Otherwise, will have long lines affecting speed and power consumption.
2D 4x4 __RAM Memory
2D 4x4 ___RAM Memory
Note refresh logic
Note a single bit line (no BLbar) – thus sense amps are required for operation (and are more difficult to design)
Note that now sense amps are in front of the column decoder (because one is needed for every bit line, not just for the word being accessed) unlike in the SRAM case.
Row Decoders
Collection of $2^M$ complex logic gates organized in a regular, dense fashion
(N)AND decoder for 8 address bits
WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0

NOR decoder for 8 address bits
WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)

Goals: Pitch matched, fast, low power
note that addresses are represented as unsigned numbers (all bits are used unlike in the book)
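A small functional model of the row decoder (logic only, no timing or circuit detail) shows that the (N)AND and NOR forms produce the same one-hot word-line pattern, by De Morgan's law. Address bits are treated as an unsigned number with A0 as the least significant bit, which is an assumption about bit ordering; the function names are illustrative.

```python
M = 8  # address bits, as on the slide


def address_bits(addr: int, m: int = M) -> list[int]:
    """Return [A0, A1, ..., A(m-1)] for an unsigned address."""
    return [(addr >> i) & 1 for i in range(m)]


def wl_and(a: list[int], row: int) -> bool:
    # (N)AND form: AND of Ai for the 1 bits of `row` and !Ai for its 0 bits.
    return all(bit == ((row >> i) & 1) for i, bit in enumerate(a))


def wl_nor(a: list[int], row: int) -> bool:
    # NOR form: NOR of !Ai for the 1 bits of `row` and Ai for its 0 bits.
    literals = [(1 - bit) if (row >> i) & 1 else bit for i, bit in enumerate(a)]
    return not any(literals)


def decode(addr: int, m: int = M) -> list[bool]:
    """All 2^m word lines for one address; exactly one should be high."""
    a = address_bits(addr, m)
    return [wl_and(a, row) for row in range(2**m)]


print(decode(5, m=3))  # only WL(5) is True
```

For row 0, `wl_and` reduces to !A7 & ... & !A0 and `wl_nor` to !(A7 | ... | A0), matching the two WL(0) expressions on the slide.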
Dynamic Decoders
[Figure: dynamic decoders with precharge devices to VDD]
NOR is faster (only one transistor in series to ground) but larger (as in a ROM, each transistor has to connect to GND), and it consumes more power (3 word lines switch vs. 1 in NAND)
Pass Transistor Based Column Decoder
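[Figure: 4-to-1 pass-transistor column decoder: a decoder on A1, A0 drives select lines S0-S3, which connect one of BL0-BL3 (and !BL0-!BL3) to data_out (and !data_out)]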
Read: connect BLs to the Sense Amps (SA) Writes: drive one of the BLs low to write a 0 into the cell
Fast, since there is only one transistor in the signal path. However, the transistor count is large: $(K+1)2^K + 2 \times 2^K$
For K = 2: $3 \times 2^2$ (decoder) $+ 2 \times 2^2$ (PTs) $= 12 + 8 = 20$
Essentially a 2**k input multiplexer
Can run the NOR decoder while the row decoder and core are working – so only have 1 extra transistor in the signal path. Make sure the select lines (S) go full swing (to VDD) so that full swing appears on the BLs during write
Transistor count – 2*((k+1)2**k + 2**k) devices, so for k=10 (1024 to 1) it would require 2*12,288 transistors (12,288 = 11*1024 + 1024)
Note that this is for 1-bit data lines. For multiple bit data words, the cost of the predecoder can be amortized and the cost of the pt design is less (in terms of transistor count). Note the large load on the decoder outputs for multiple bit data words (2 * # bits in the data word).
Tree Based Column Decoder
Number of transistors $= 2 \times 2 \times (2^K - 1)$
For K = 2: $2 \times 2 \times (2^2 - 1) = 4 \times 3 = 12$
Delay increases quadratically with the number of sections (K) (so prohibitive for large decoders)
can fix with buffers, progressive sizing, combination of tree and pass transistor approaches
Note no predecoder needed as with previous design – the reason for the transistor count reduction
Number of transistors comes down to 2*2*(2**k – 1), so for k=10 (1024 to 1) it requires only 2*2,046 transistors
But not true (i.e., transistor count savings) for more than one bit of data!
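The transistor counts on the two column-decoder slides, and the reason the tree decoder slows down, can be checked with a few lines of Python. The counts use the formulas given on the slides; the delay function is a standard Elmore estimate for a chain of k identical pass transistors with equal capacitance at every node, which is an idealization (real designs add buffers and progressive sizing, as noted above).

```python
def pass_transistor_count(k: int) -> int:
    # Slide formula: (K+1) * 2^K for the decoder plus 2 * 2^K pass transistors.
    return (k + 1) * 2**k + 2 * 2**k


def tree_count(k: int) -> int:
    # Slide formula: 2 * 2 * (2^K - 1) pass transistors and no predecoder.
    return 2 * 2 * (2**k - 1)


def elmore_chain_delay(k: int, r_on: float, c_node: float) -> float:
    # Elmore delay of k identical series pass transistors, each node loaded by
    # c_node: node i sees i*r_on back to the source, so the total is
    # R*C*k*(k+1)/2, i.e. quadratic in k.
    return sum(i * r_on * c_node for i in range(1, k + 1))


for k in (2, 4, 10):
    print(f"k={k:2d}: pass-transistor {pass_transistor_count(k):6d}, "
          f"tree {tree_count(k):5d}, tree delay {elmore_chain_delay(k, 1.0, 1.0):5.1f} RC")
```

The tree wins on device count for a 1-bit data path but pays a delay that grows roughly as k², while the pass-transistor decoder keeps a single device in the signal path.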
Bit Line Precharge Logic
[Figure: bit-line precharge and equalization circuit on BL and !BL, controlled by !PC]
First step of a Read cycle is to precharge (PC) the bit lines to VDD
every differential signal in the memory must be equalized to the same voltage level before Read
Turn off PC and enable the WL
the grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
equalization transistor - speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line
Static pullup scheme – advantage is that it does not require a heavily loaded precharge clock signal to be routed across the array; disadvantage is that it is always on, so is fighting against the bit line discharge for the bit lines that are moving low (consumes power)
Clocked scheme – allows the designer to use much larger precharge devices so that bit line equalization happens more rapidly (note equalization transistor to help even more); disadvantage is the power consumption of the heavily loaded precharge clock signal.
What purpose do the two pfets with their gates tied to ground serve?
Sense Amplifiers
Amplification – resolves data with small bit line swings (in some DRAMs required for proper functionality)
Delay reduction – compensates for the limited drive capability of the memory cell to accelerate BL transition
Power reduction – eliminates a large part of the power dissipation due to charging and discharging bit lines
Signal restoration – for DRAMs, need to drive the bit lines full swing after sensing (read) to do data refresh
[Figure: sense amplifier (SA) converting a small-swing input into a large-swing output]
$t_p = (C \cdot V) / I_{av}$
Differential Sense Amplifier
Directly applicable to
Differential Sensing SRAM
Reliability and Yield
Memories operate under low signal-to-noise conditions
word line to bit line coupling can vary substantially over the memory array
folded bit line architecture (routing BL and !BL next to each other ensures a closer match between parasitics and bit line capacitances)
interwire bit line to bit line coupling
transposed (or twisted) bit line architecture (turn the noise into a common-mode signal for the SA)
leakage (in DRAMs) requiring refresh operation
suffer from low yield due to high density and structural defects
increase yield by using error correction (e.g., parity bits) and redundancy
and are susceptible to soft errors due to alpha particles and cosmic rays
we have only shown/considered folded bit line architectures here.
Redundancy in the Memory Structure
[Figure: memory array with row address, column address, redundant row, redundant columns, and fuse bank]
Replace bad row or column with “spare” – done by setting the fuse bank
Helps to correct faults that affect a large section of the memory; not good for scattered point errors or local errors (use error correction (ECC) logic like parity bits for that)
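Row redundancy can be sketched as an address remapping: blowing fuses records a failing row address, and any later access to that address is steered to a spare row. The class and method names are hypothetical, and numbering the spares after the main rows is an assumption; real fuse banks compare addresses in hardware rather than through a lookup table.

```python
class RowRepair:
    """Hypothetical model of row redundancy controlled by a fuse bank."""

    def __init__(self, n_rows: int, n_spares: int) -> None:
        self.n_rows = n_rows
        # Spare rows are assumed to sit after the regular rows physically.
        self.free_spares = list(range(n_rows, n_rows + n_spares))
        self.fuse_bank: dict[int, int] = {}  # bad logical row -> spare row

    def blow_fuse(self, bad_row: int) -> int:
        """Program the fuse bank so that bad_row is replaced by a spare."""
        if bad_row in self.fuse_bank:
            return self.fuse_bank[bad_row]
        if not self.free_spares:
            raise RuntimeError("no spare rows left: the chip cannot be repaired")
        spare = self.free_spares.pop(0)
        self.fuse_bank[bad_row] = spare
        return spare

    def physical_row(self, row: int) -> int:
        # An address matching a programmed fuse entry selects the spare row.
        return self.fuse_bank.get(row, row)


repair = RowRepair(n_rows=256, n_spares=2)
repair.blow_fuse(17)
print(repair.physical_row(17), repair.physical_row(18))  # 256 18
```

Because a whole row is swapped, this handles faults that span a row; scattered single-bit errors are left to ECC, as the slide notes.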
Column Redundancy
Error-Correcting Codes
$2^k \geq m + k + 1$, where m is the number of data bits and k is the number of check bits
For 64 data bits, needs 7 check bits
e.g. If B3 flips
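The check-bit inequality can be solved by searching for the smallest k, which confirms the 7 check bits for 64 data bits. The single-bit example on the slide is truncated, so as an illustration (not necessarily the slide's exact bit labeling) here is a textbook Hamming(7,4) code: data bits occupy positions 3, 5, 6, 7, parity bits occupy positions 1, 2, 4, and the syndrome directly gives the position of a single flipped bit.

```python
def check_bits_needed(m: int) -> int:
    """Smallest k with 2^k >= m + k + 1 (single-error correction)."""
    k = 0
    while 2**k < m + k + 1:
        k += 1
    return k


def hamming_encode(data: list[int]) -> list[int]:
    """Hamming(7,4): data at positions 3, 5, 6, 7; even parity at positions 1, 2, 4."""
    code = [0] * 8  # index 0 unused; positions 1..7
    for pos, bit in zip((3, 5, 6, 7), data):
        code[pos] = bit
    for p in (1, 2, 4):
        for pos in range(1, 8):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    return code[1:]


def hamming_correct(word: list[int]) -> tuple[list[int], int]:
    """Return (corrected word, error position); position 0 means no error detected."""
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions holding a 1
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


print(check_bits_needed(64))   # 7, as on the slide
word = hamming_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[3 - 1] ^= 1          # flip bit position 3
fixed, pos = hamming_correct(corrupted)
print(pos, fixed == word)      # 3 True
```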
Performance and area overhead for ECC
A circuit failure occurs only when the voltage disturbance causes the logic state of the circuit to change such that it cannot automatically recover. This can happen before the disturbed node completely charges or discharges. Once the node voltage reaches the switching points of any associated logic gates, this false transition will start to propagate along these signal paths. Furthermore, since many circuits have feedback loops, positive feedback can even accelerate the faulty transitions. Given the physical mechanism of a soft error event, the following measures can be taken in circuit design to reduce the particle induced failure rates:
Increase the storage node charge
Add devices to compensate for charge loss
Minimize the charge collecting efficiency at the storage nodes
Redundancy and Error Correction
Soft Errors
alpha particles (from the packaging materials)
neutrons from cosmic rays
As feature size decreases, the charge stored at each node decreases (due to a lower node capacitance and lower VDD) and thus Qcritical (the charge necessary to cause a bit flip) decreases leading to an increase in the soft error rate (SER)
From Semico Research Corp.
FIT= Failure In Time, one FIT is a single failure in 1 billion (1e9) hours. Hence, a system that experiences 1 failure in 13,158 hours has a failure rate of 1E9/13,158 = 76,000 FITs.
Avionics systems in civilian aviation: an altitude of 30,000 feet and a route crossing the North Pole both increase the neutron flux. If an avionics board uses four 1M 130 nm SRAM-based FPGAs, it would be subject to 0.074 upsets per day, i.e., 324 hours between upsets or about 3 million FITs. Assume one such system on board each commercial aircraft, 4,000 civilian flights per day, and a 3-hour average flight time. Then nearly 37 aircraft per day will experience a neutron-induced SRAM-based FPGA configuration failure during their flight.
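The FIT arithmetic in these notes can be reproduced directly from the definition (one FIT is one failure per 10⁹ hours). The function names are illustrative; the inputs are the figures quoted above.

```python
HOURS_PER_FIT_WINDOW = 1e9  # one FIT = one failure per 1e9 device-hours


def fit_rate(hours_between_failures: float) -> float:
    return HOURS_PER_FIT_WINDOW / hours_between_failures


def hours_between(upsets_per_day: float) -> float:
    return 24.0 / upsets_per_day


def failures_per_day(flights_per_day: float, hours_per_flight: float,
                     mtbf_hours: float) -> float:
    # Expected failures = total flight-hours per day / hours between upsets.
    return flights_per_day * hours_per_flight / mtbf_hours


mtbf = hours_between(0.074)
print(f"example system: {fit_rate(13_158):,.0f} FIT")
print(f"avionics board: {mtbf:.0f} h between upsets, {fit_rate(mtbf):.2e} FIT")
print(f"aircraft affected per day: {failures_per_day(4_000, 3, mtbf):.1f}")
```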
Scary Fact
Modeling of a particle strike
See the 2004 paper. Before I talk about the solution, I want to spend a few minutes on the analysis of soft errors. The first question you may have is how to model soft errors. We can model a transient fault in SPICE as a double-exponential injected current source, where the rising slope depends on the collection time constant of the junction, which in turn depends on the doping concentration and the process. I0, the peak current, depends on the process and the charge intensity. The area under the current curve, I·t, is the collected charge Q.
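A minimal sketch of the double-exponential current-pulse model described above, assuming the form $I(t) = I_s (e^{-t/\tau_{fall}} - e^{-t/\tau_{rise}})$ with the prefactor $I_s$ chosen so that the area under the pulse equals the collected charge Q. Note that this prefactor is larger than the actual peak current. The 150 fC charge and the 200 ps / 50 ps time constants are hypothetical. In a SPICE deck the same waveform would drive the struck node as an independent current source; here the pulse is simply integrated numerically to confirm that its area is Q.

```python
import math


def strike_current(t: float, q: float, tau_fall: float, tau_rise: float) -> float:
    """Double-exponential current pulse whose total area equals q."""
    if t < 0:
        return 0.0
    amplitude = q / (tau_fall - tau_rise)  # prefactor, not the peak current
    return amplitude * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def collected_charge(q: float, tau_fall: float, tau_rise: float,
                     t_end: float, n_steps: int) -> float:
    """Trapezoidal integral of the pulse over [0, t_end]; should approach q."""
    h = t_end / n_steps
    total = 0.5 * (strike_current(0.0, q, tau_fall, tau_rise)
                   + strike_current(t_end, q, tau_fall, tau_rise))
    for i in range(1, n_steps):
        total += strike_current(i * h, q, tau_fall, tau_rise)
    return total * h


Q = 150e-15          # hypothetical collected charge: 150 fC
TAU_FALL = 200e-12   # hypothetical fall time constant
TAU_RISE = 50e-12    # hypothetical rise time constant

# Peak occurs where the derivative vanishes: t = ln(tf/tr) * tf*tr / (tf - tr).
t_peak = TAU_FALL * TAU_RISE / (TAU_FALL - TAU_RISE) * math.log(TAU_FALL / TAU_RISE)
i_peak = strike_current(t_peak, Q, TAU_FALL, TAU_RISE)
print(f"peak current: {i_peak * 1e6:.0f} uA at {t_peak * 1e12:.0f} ps")
print(f"integrated charge: {collected_charge(Q, TAU_FALL, TAU_RISE, 10e-9, 10_000) * 1e15:.2f} fC")
```

Comparing Q with the critical charge Qcritical of the struck node is what decides whether such a pulse flips the cell.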
Show Kerry’s video here…
A SPICE simulation for SRAM
A particle strike
On-chip Memory: ITRS roadmap
State of Art
State of Art
