Sp09 CMPEN 411 L23 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 23: Memory Cell Designs...

Sp09 CMPEN 411 L23 S.1

CMPEN 411 VLSI Digital Circuits

Spring 2009

Lecture 23: Memory Cell Designs (SRAM, DRAM)

[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]


Heads-up

IBM Kerry Bernstein’s talk Thursday 4 PM, IST 333

- To prepare for his talk, go to ANGEL system, find the file “New dimensions in performance”, under “interesting reading materials”

To make up last cancelled lecture:

- Kerry Bernstein’s talk “Microarchitecture’s Race for Performance and Power”, PSU talk, 11/2004

- Slides and videos are online in ANGEL system “Interesting Reading Materials”

DAC Young Student Scholarship

www.dac.com


Review: Basic Building Blocks

Datapath

- Execution units: adder, multiplier, divider, shifter, etc.

- Register file and pipeline registers

- Multiplexers, decoders

Control

- Finite state machines (PLA, ROM, random logic)

Interconnect

- Switches, arbiters, buses

Memory

- ROM, caches (SRAMs), CAM, DRAMs, buffers


2D 4x4 SRAM Memory Bank

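[Figure: 2D 4x4 SRAM memory bank. Labels: address inputs A0, A1, A2; row decoder driving word lines WL[0] to WL[3]; bit line pairs BL and !BL (BLi, BLi+1); bit line precharge; column decoder; sense amplifiers; write circuitry; clocking and control with enable and read precharge signals; 2 bit words.]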


6-Transistor SRAM Storage Cell

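[Figure: 6-transistor SRAM storage cell with transistors M1 to M6, word line WL, bit lines BL and !BL, and storage nodes Q and !Q holding complementary values (1 and 0), with the on/off state of the cell transistors marked.]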


SRAM Cell Analysis (Read)

[Figure: SRAM cell during a read with WL = 1, both bit lines precharged (BL = 2.5 V, !BL = 2.5 V), bit line capacitances Cbit, stored values Q = 1 and !Q = 0, and transistors M1, M4, M5, M6 shown.]

Read-disturb (read-upset): must limit the voltage rise on !Q to prevent read-upsets from occurring while simultaneously maintaining acceptable circuit speed and area

- M1 must be stronger than M5 when storing a 1 (as shown)

- M3 must be stronger than M6 when storing a 0


Read Voltage Ratios

[Plot: voltage rise on !Q (V, axis 0 to 1.2) versus cell ratio CR (axis 0 to 3), for VDD = 2.5 V and VTn = 0.4 V.]

where CR is the Cell Ratio = (W1/L1)/(W5/L5)

Keep cell size minimal while maintaining read stability

Make M1 minimum size and increase the L of M5 (to make it weaker)

- increases load on WL

Make M5 minimum size and increase the W of M1 (to make it stronger)

Similar constraints on (W3/L3)/(W6/L6) when storing a 0

Value marked on the plot: CR = 1.2
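The shape of the read curve can be approximated with a first-order current balance at node !Q: the pass transistor M5 (assumed velocity saturated) supplies current from the precharged bit line while the pull-down M1 (assumed in the linear region) sinks it. The sketch below solves that balance for the voltage rise. The value VDSATn = 0.63 V is an assumption (it is not given on the slide), and the function name is illustrative only.

```python
import math

def read_voltage_rise(cr, vdd=2.5, vtn=0.4, vdsat_n=0.63):
    """Estimate the voltage rise dv on !Q during a read.

    Current balance (device gains divided out, CR = (W1/L1)/(W5/L5)):
        (vdd - dv - vtn)*vdsat_n - vdsat_n**2/2 = cr*((vdd - vtn)*dv - dv**2/2)
    This is a quadratic in dv; the smaller root is the physical one.
    vdsat_n is an assumed process value, not taken from the slide.
    """
    a = vdd - vtn
    disc = vdsat_n ** 2 * (1 + cr) + (cr * a) ** 2
    return (vdsat_n + cr * a - math.sqrt(disc)) / cr

if __name__ == "__main__":
    for cr in (0.5, 1.0, 1.2, 1.5, 2.0, 3.0):
        print(f"CR = {cr:3.1f}  rise on !Q = {read_voltage_rise(cr):.3f} V")
```

Under these assumptions the rise drops just below VTn = 0.4 V near CR = 1.2, which agrees with the value marked on the plot: a larger cell ratio keeps !Q from turning on the opposite pull-down.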


SRAM Cell Analysis (Write)

[Figure: SRAM cell during a write with WL = 1, !BL = 2.5 V, BL = 0 V, bit line capacitances Cbit, stored values Q = 1 and !Q = 0, and transistors M1, M4, M5, M6 shown.]

The !Q side of the cell cannot be pulled high enough to ensure writing of 0 (because M1 is on and sized to protect against read upset). So, the new value of the cell has to be written through M6.

- M6 must be able to overpower M4 when storing a 1 and writing a 0

- M5 must be able to overpower M2 when storing a 0 and writing a 1


Write Voltage Ratios

[Plot: write voltage VQ (V, axis 0 to 0.5) versus pull-up ratio PR (axis 0 to 2), for VDD = 2.5 V, |VTp| = 0.4 V, and μp/μn = 0.5.]

where PR is the Pull-up Ratio = (W4/L4)/(W6/L6)

Keep cell size minimal while allowing writes

Make M4 and M6 minimum size

Value marked on the plot: PR = 1.8


Cell Sizing and Performance

Keeping cell size minimal is critical for large SRAMs

Minimum sized pull down fets (M1 and M3)

- Requires longer than minimum channel length, L, pass transistors (M5 and M6) to ensure proper CR

- But up-sizing L of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines, both of which can adversely affect the speed of the read cycle

Minimum width and length pass transistors

- Boost the width of the pull downs (M1 and M3)

- Reduces the loading on the word lines and increases the storage capacitance in the cell (both are good!), but cell size may be slightly larger

Performance is determined by the read operation

To accelerate the read time, SRAMs use sense amplifiers (so that the bit line doesn’t have to make a full swing)


6-T SRAM Layout

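[Figure: 6-T SRAM cell layout showing VDD and GND rails, storage nodes Q and !Q, word line WL, bit lines BL and !BL, and transistors M1 to M6.]

Simple and reliable, but big: signal routing and connections to two bit lines, a word line, and both supply rails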

Area is dominated by the wiring and contacts

Other alternatives to the 6-T cell include the resistive load 4-T cell and the TFT cell, neither of which is available in a standard CMOS logic process


Multiple Read/Write Port Storage Cell

[Figure: multi-ported SRAM cell with storage nodes Q and !Q, transistors M1 to M8, word lines WL1 and WL2, and bit line pairs BL1/!BL1 and BL2/!BL2.]

To avoid read upset, the widths of M1 and M3 will have to be sized up by a factor equal to the number of simultaneously open read ports


Resistance-load SRAM Cell

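[Figure: resistance-load SRAM cell with two load resistors RL connected to VDD, transistors M1 to M4, word line WL, storage nodes Q and !Q, and bit lines BL and !BL.]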


Remove R

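[Figure: the resistance-load cell with the load resistors removed, leaving transistors M1 to M4, word line WL, and bit lines BL and !BL.]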


Remove R

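[Figure: the cell with the load resistors removed and one transistor dropped, leaving M2, M3, M4 and word line WL.]

Further remove one transistor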


3-Transistor DRAM Cell

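[Figure: 3-T DRAM cell with transistors M1 to M3, storage node X with capacitance Cs, bit lines BL1 and BL2, write word line WWL, and read word line RWL. Waveforms: WWL (write), RWL (read), X at VDD - VT, BL1 at VDD, and BL2 at VDD - VT with a swing ΔV.]

Write: Cs is charged (or discharged) by asserting WWL and BL1

- Value stored at node X when writing a 1 is VWWL - VTn

Read: Cs is “sensed” by asserting RWL and observing BL2

- Read is non-destructive and inverting (ratioless)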



[Figure: 3-T DRAM cell and waveforms, repeated from the 3-Transistor DRAM Cell slide.]

Refresh: read stored data, put its inverse on BL1 and assert WWL (need to do this every 1 to 4 msec)

Note the VT drop at X: how to fix it?


3-T DRAM Layout

[Figure: 3-T DRAM cell layout showing BL2, BL1, GND, RWL, WWL, and transistors M1 to M3.]

Fewer contacts & wires

Total cell area is 576 λ² (compared to 1,092 λ² for the 6-T SRAM cell)

No special processing steps are needed (so compatible with logic CMOS process)

Can use bootstrapping (raise VWWL to a value higher than VDD) to eliminate threshold drop when storing a “1”



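1-Transistor DRAM Cell

[Figure: 1-T DRAM cell with access transistor M1, storage node X with capacitance Cs, bit line BL with capacitance CBL, and word line WL. Waveforms: WL during write 1 and read 1, X at VDD - VT, BL at VDD during the write, and BL sensing around VDD/2 during the read.]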

Write: Cs is charged (or discharged) by asserting WL and BL

Read: Charge redistribution occurs between CBL and Cs

Read is destructive, so must refresh after read

Voltage swing is small
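The small read swing follows from charge conservation when Cs is connected to the much larger CBL: the bit line moves by (V_cell - V_pre) * Cs / (Cs + CBL). The sketch below evaluates this for a stored 1 and a stored 0. The capacitance values (50 fF and 1 pF) and VT = 0.4 V are illustrative assumptions, not values from the slide.

```python
def bitline_swing(v_cell, v_pre, c_s, c_bl):
    """Bit line voltage change after Cs shares its charge with CBL."""
    return (v_cell - v_pre) * c_s / (c_s + c_bl)

if __name__ == "__main__":
    VDD, VT = 2.5, 0.4            # VT is an assumed value
    C_S, C_BL = 50e-15, 1e-12     # assumed: 50 fF cell, 1 pF bit line
    v_pre = VDD / 2               # bit line precharged to VDD/2
    dv_one = bitline_swing(VDD - VT, v_pre, C_S, C_BL)   # cell holds VDD - VT
    dv_zero = bitline_swing(0.0, v_pre, C_S, C_BL)       # cell holds 0 V
    print(f"read 1: {dv_one * 1e3:+.1f} mV, read 0: {dv_zero * 1e3:+.1f} mV")
```

With these assumed numbers the swing is only tens of millivolts in either direction, which is why every bit line needs a sense amplifier.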


Sense Amp Operation

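[Figure: bit line voltage VBL versus time t, starting at VPRE and diverging toward V(1) or V(0). Marked events: word line activated, then sense amp activated.]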


1-T DRAM Cell Observations

Cell is single ended (complicates the design of the sense amp)

Cell requires a sense amp for each bit line due to charge redistribution based read

- BL’s precharged to VDD/2 (not VDD as with SRAM design)

- all previous designs used SAs for speed, not functionality

Cell read is destructive; refresh must follow to restore data

Cell requires an extra capacitor (CS) that must be explicitly included in the design

- May not be compatible with logic CMOS process

A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than VDD)


1-T DRAM (3-D capacitor)

Source: IBM

Non-CMOS


Peripheral Memory Circuitry

Row and column decoders

Read bit line precharge logic

Sense amplifiers

Timing and control

Speed

Power consumption

Area – pitch matching


2D 4x4 __RAM Memory

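[Figure: 4x4 memory array. Labels: address inputs A0, A1, A2; row decoder driving word lines WL[0] to WL[3]; differential bit lines BL and !BL (BLi, BLi+1); bit line precharge; column decoder; sense amplifiers; write circuitry; clocking and control with enable and read precharge signals; 2 bit words.]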


2D 4x4 ___RAM Memory

[Figure: 4x4 memory array. Labels: address inputs A0, A1, A2; row decoder driving word lines WL[0] to WL[3]; single bit lines BL0 to BL3; bit line precharge; column decoder; sense amplifiers; write circuitry; clocking, control, and refresh with enable and read precharge signals; 2 bit words.]


Row Decoders

Collection of 2^M complex logic gates organized in a regular, dense fashion

(N)AND decoder for 8 address bits

WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0

…

WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0

NOR decoder for 8 address bits

WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)

…

WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)

Goals: Pitch matched, fast, low power
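The two decoder forms above are related by De Morgan's law, so they select the same word line for every address. The following behavioral sketch (function names are illustrative, and it models logic only, not the circuit) evaluates both forms and checks that exactly one word line is high.

```python
def nand_decoder(addr_bits):
    """AND form: WL(i) is the AND of each address bit or its complement.

    addr_bits[0] is A0 (least significant bit).
    """
    n = len(addr_bits)
    lines = []
    for i in range(2 ** n):
        wl = all(addr_bits[b] if (i >> b) & 1 else not addr_bits[b]
                 for b in range(n))
        lines.append(int(wl))
    return lines

def nor_decoder(addr_bits):
    """NOR form: WL(i) = NOT(OR of the literals that must all be 0)."""
    n = len(addr_bits)
    lines = []
    for i in range(2 ** n):
        wl = not any((not addr_bits[b]) if (i >> b) & 1 else addr_bits[b]
                     for b in range(n))
        lines.append(int(wl))
    return lines

if __name__ == "__main__":
    address = 5
    bits = [(address >> b) & 1 for b in range(8)]   # A0 .. A7
    print(nand_decoder(bits).index(1), nor_decoder(bits).index(1))
```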


Dynamic Decoders

[Figure: two dynamic 2-to-4 decoders with precharge devices connected to VDD, word lines WL0 to WL3, address inputs A0, !A0, A1, !A1, and GND. Panels: 2-input NOR decoder and 2-input NAND decoder.]

Which one is faster? Smaller? Low power?


Pass Transistor Based Column Decoder

[Figure: a 2-input NOR decoder driven by A1 and A0 generates select signals S3 to S0. Each select signal gates pass transistors that connect one bit line pair (BL3 to BL0, !BL3 to !BL0) to data_out and !data_out.]

Read: connect BLs to the Sense Amps (SA)

Writes: drive one of the BLs low to write a 0 into the cell

Fast since there is only one transistor in the signal path. However, there is a large transistor count ((K+1) x 2^K + 2 x 2^K)

- For K = 2: 3 x 2^2 (decoder) + 2 x 2^2 (PTs) = 12 + 8 = 20


Tree Based Column Decoder

[Figure: binary tree of pass transistors controlled by A0, !A0, A1, !A1 that routes one bit line pair (BL3 to BL0, !BL3 to !BL0) to data_out and !data_out.]

Number of transistors = 2 x 2 x (2^K - 1)

- For K = 2: 2 x 2 x (2^2 - 1) = 4 x 3 = 12

Delay increases quadratically with the number of sections (K) (so prohibitive for large decoders)

- can fix with buffers, progressive sizing, combination of tree and pass transistor approaches
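The transistor-count formulas for the two column decoder styles can be compared directly. This is a small sketch that evaluates the pass transistor decoder formula and the tree decoder formula from the slides; the function names are illustrative.

```python
def pass_transistor_decoder_count(k):
    """(K+1) * 2^K for the NOR predecoder plus 2 * 2^K pass transistors."""
    return (k + 1) * 2 ** k + 2 * 2 ** k

def tree_decoder_count(k):
    """2 x 2 x (2^K - 1): a binary tree on both BL and !BL."""
    return 2 * 2 * (2 ** k - 1)

if __name__ == "__main__":
    for k in (2, 4, 6, 8, 10):
        print(f"K = {k:2d}: pass transistor = {pass_transistor_decoder_count(k):6d}, "
              f"tree = {tree_decoder_count(k):6d}")
```

For K = 2 this reproduces the counts of 20 and 12. The tree uses fewer devices, but its signal path passes through K series transistors, which is the source of the delay penalty noted above.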


Bit Line Precharge Logic

[Figure: bit line precharge circuit with precharge signal !PC and bit lines BL and !BL.]

First step of a Read cycle is to precharge (PC) the bit lines to VDD

- every differential signal in the memory must be equalized to the same voltage level before Read

Turn off PC and enable the WL

- the grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)

Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the nondischarged bit line to assist in precharging the discharged line


Sense Amplifiers

[Figure: sense amplifier (SA) block with input and output.]

Amplification: resolves data with small bit line swings (in some DRAMs required for proper functionality)

Delay reduction: compensates for the limited drive capability of the memory cell to accelerate BL transition

- tp = (C * ΔV) / Iav, where C is large and Iav is small, so make ΔV as small as possible

Power reduction: eliminates a large part of the power dissipation due to charging and discharging bit lines

Signal restoration: for DRAMs, need to drive the bit lines full swing after sensing (read) to do data refresh
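A quick numeric sketch of the delay relation tp = C * ΔV / Iav shows why limiting the bit line swing helps. The bit line capacitance (1 pF) and average cell current (50 µA) below are assumed, illustrative values.

```python
def sense_delay(c_bl, delta_v, i_av):
    """Time for a current i_av to move a capacitance c_bl by delta_v."""
    return c_bl * delta_v / i_av

if __name__ == "__main__":
    C_BL = 1e-12    # assumed bit line capacitance: 1 pF
    I_AV = 50e-6    # assumed average cell current: 50 uA
    for swing in (2.5, 0.5, 0.1):
        tp = sense_delay(C_BL, swing, I_AV)
        print(f"swing = {swing:3.1f} V  ->  tp = {tp * 1e9:5.1f} ns")
```

With these assumptions, a full 2.5 V swing takes 25 times longer than a 0.1 V swing that a sense amplifier can resolve.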


Differential Sense Amplifier

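Directly applicable to SRAMs

[Figure: differential sense amplifier with transistors M1 to M5, inputs bit and !bit, sense enable SE, supply VDD, internal node y, and output Out.]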


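Differential Sensing: SRAM

[Figure: (a) SRAM sensing scheme: SRAM cell i on word line WL i, bit lines BL and !BL with precharge (PC) and equalization (EQ) devices connected to VDD, feeding a differential sense amp with sense enable SE and Output. (b) Two stage differential amplifier built from transistors M1 to M5, with internal nodes labeled x and 2x, sense enable SE, and Output.]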


Redundancy in the Memory Structure

[Figure: memory array with row address and column address inputs, a redundant row, redundant columns, and a fuse bank.]


Row Redundancy

[Figure: the functional address is compared (== ?) against two fused repair addresses. Labels: two comparators, two redundant wordlines, an enable signal into the normal wordline decoder, and the normal wordline.]
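The comparison scheme in the figure can be sketched behaviorally. This minimal model assumes that a match against a fused repair address selects the corresponding redundant wordline and disables the normal decoder; an unprogrammed fuse entry is represented by None. The function name and return format are hypothetical.

```python
def select_wordline(address, repair_addresses):
    """Return ("redundant", i) on a fuse match, else ("normal", address)."""
    for i, fused in enumerate(repair_addresses):
        if fused is not None and fused == address:
            # Match: redundant wordline i fires, normal decoder stays disabled
            return ("redundant", i)
    # No match: the normal wordline decoder is enabled
    return ("normal", address)

if __name__ == "__main__":
    repairs = [5, None]   # row 5 was found defective; second entry unused
    for addr in (4, 5, 6):
        print(addr, select_wordline(addr, repairs))
```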


Column Redundancy

[Figure: one redundant data column and eight normal data columns, with eight fuses selecting which columns connect to the data lines Data 0 to Data 7.]


Error-Correcting Codes

Example: Hamming Codes

e.g. If B3 flips, the parity checks give 1, 1, 0 = 3

2^k >= m + k + 1, where m = number of data bits and k = number of check bits

For 64 data bits, needs 7 check bits
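The check-bit bound and the B3 example can be reproduced with a short sketch. It assumes the common Hamming layout in which check bits sit at power-of-two positions (1, 2, 4, ...) and data bits fill the remaining positions, so the syndrome equals the position of a single flipped bit. The function names are illustrative.

```python
from functools import reduce

def min_check_bits(m):
    """Smallest k with 2^k >= m + k + 1."""
    k = 0
    while 2 ** k < m + k + 1:
        k += 1
    return k

def syndrome(code):
    """XOR of the positions of all 1 bits; 0 means no single-bit error."""
    ones = (p for p in range(1, len(code)) if code[p])
    return reduce(lambda acc, pos: acc ^ pos, ones, 0)

def hamming_encode(data_bits):
    """Return a 1-indexed codeword (index 0 unused) with zero syndrome."""
    k = min_check_bits(len(data_bits))
    n = len(data_bits) + k
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(data)
    syn = syndrome(code)
    for j in range(k):               # set check bits to cancel the syndrome
        if (syn >> j) & 1:
            code[1 << j] = 1
    return code

if __name__ == "__main__":
    print("check bits for 64 data bits:", min_check_bits(64))
    word = hamming_encode([1, 0, 1, 1])
    word[3] ^= 1                     # flip the bit at position 3 (B3)
    print("syndrome after flipping B3:", syndrome(word))
```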


Performance and area overhead for ECC


Redundancy and Error Correction


Soft Errors

Nonrecurrent and nonpermanent errors from

- alpha particles (from the packaging materials)

- neutrons from cosmic rays

As feature size decreases, the charge stored at each node decreases (due to a lower node capacitance and lower VDD) and thus Qcritical (the charge necessary to cause a bit flip) decreases leading to an increase in the soft error rate (SER)

[Plot: system FITs (log scale, 1 to 10000) versus process technology (0.25, 0.18, 0.13, 0.09, 0.05). From Semico Research Corp.]

MTBF (hours)                 0.13 µm    0.09 µm
Ground-based                   895        448
Civilian Avionics System       324        162
Military Avionics System        18          9

From Actel


Scary Fact

Avionics system in civilian aviation: an altitude of 30,000 feet and a route crossing the north pole both increase the neutron flux.

If the avionics board uses four 1M 130nm SRAM-based FPGAs, it would be subject to 0.074 upsets per day = 324 hours between upsets, or about 3 million FITs.

Assume one such system on board each commercial aircraft, 4,000 civilian flights per day, and 3 hours average flight time. Nearly 37 aircraft will experience a neutron-induced SRAM-based FPGA configuration failure during their flight.
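The arithmetic behind these figures can be checked in a few lines. The sketch uses only the numbers quoted above and the standard definition of FIT as failures per 10^9 device-hours; the function names are illustrative.

```python
def mtbf_hours(upsets_per_day):
    """Mean time between upsets, in hours."""
    return 24.0 / upsets_per_day

def fit_rate(mtbf):
    """Failures in time: expected failures per 1e9 hours of operation."""
    return 1e9 / mtbf

def expected_failures(flights_per_day, hours_per_flight, mtbf):
    """Expected upsets across all flight hours flown in one day."""
    return flights_per_day * hours_per_flight / mtbf

if __name__ == "__main__":
    mtbf = mtbf_hours(0.074)
    print(f"MTBF  = {mtbf:.0f} hours")
    print(f"FITs  = {fit_rate(mtbf) / 1e6:.2f} million")
    print(f"Affected flights per day = {expected_failures(4000, 3, mtbf):.1f}")
```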


Modeling of a particle strike


A SPICE simulation for SRAM

[Figure: SRAM cell (WL, BL, !BL) hit by a particle strike; the storage nodes flip 0->1 and 1->0.]


On-chip Memory: ITRS roadmap

[Chart: % die utilization (0 to 100) split into Area Reused Logic, Area New Logic, and Area Memory.]


State of Art

