SRAMReport[1]

Static Random Access Memory (SRAM) Design

Abhinandan Majumdar

MS. Computer Engineering

Srinivas Satish

MS. Computer Engineering

December 10, 2007

Final Project

EE 4321

VLSI Circuits

Prof. Azeez Bhavnagarwala

INDEX

1. INTRODUCTION
   1.1 Design
   1.2 SRAM Operation
   1.3 Applications and Uses
2. DESIGN
   2.1 Block Diagram
   2.2 Decoder
      2.2.1 2 Input AND Gate Design
      2.2.2 3 Input AND Gate Design
      2.2.3 3x8 Decoder
      2.2.4 6x64 Decoder
      2.2.5 Decoder Resizing
   2.3 SRAM Cell and Array Design
      2.3.1 Precharge Circuitry
      2.3.2 SRAM Cell
      2.3.3 Read Sensing Circuit
      2.3.4 Write Driver
      2.3.5 SRAM Array
      2.3.6 SRAM Array with Decoder
      2.3.7 Read Stability
   2.4 DC Simulation
      2.4.1 Static Noise Margin (SNM)
      2.4.2 Cell Read Current
      2.4.3 Effect of Threshold Voltage (Vt)
3. LAYOUT
   3.1 Decoder
      3.1.1 AND2 Gate
      3.1.2 AND3 Gate
      3.1.3 3x8 Decoder
      3.1.4 6x64 Decoder
   3.2 SRAM
      3.2.1 Precharge
      3.2.2 Read Sensing Circuit
      3.2.3 SRAM 64x64 Array
4. RESULTS
   4.1 Simulation Results
      4.1.1 Simulation of One SRAM Cell
      4.1.2 Simulation of 64x64 SRAM Array
   4.2 DRC & LVS Results
5. CONCLUSION
6. REFERENCES

1. INTRODUCTION

Static random access memory (SRAM) is a type of semiconductor memory. The word

"static" indicates that the memory retains its contents as long as power remains applied,

unlike dynamic RAM (DRAM) that needs to be periodically refreshed.

1.1 Design

Fig 1.1 A six-transistor CMOS SRAM cell.

Random access means that locations in the memory can be written to or read from in any

order, regardless of the memory location that was last accessed.

Each bit in an SRAM is stored on four transistors that form two cross-coupled inverters.

This storage cell has two stable states which are used to denote 0 and 1. Two additional

access transistors serve to control the access to a storage cell during read and write

operations. It thus typically takes six MOSFETs to store one memory bit.

Access to the cell is enabled by the word line (WL in figure) which controls the two

access transistors M5 and M6 which, in turn, control whether the cell should be connected

to the bit lines: BL and BL’. They are used to transfer data for both read and write

operations. While it's not strictly necessary to have two bit lines, both the signal and its

inverse are typically provided since it improves noise margins.

During read accesses, the bit lines are actively driven high and low by the inverters in the

SRAM cell. This improves SRAM speed compared to DRAMs—in a DRAM, the bit line

is connected to storage capacitors and charge sharing causes the bitline to swing upwards

or downwards. The symmetric structure of SRAMs also allows for differential signaling,

which makes small voltage swings more easily detectable. Another difference with

DRAM that contributes to making SRAM faster is that commercial chips accept all

address bits at a time. By comparison, commodity DRAMs have the address multiplexed

in two halves, i.e. higher bits followed by lower bits, over the same package pins in order

to keep their size and cost down.

The size of an SRAM with m address lines and n data lines is $2^m$ words, or $2^m \times n$ bits.
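As a quick check of this size formula, the short Python sketch below evaluates it for the array built in this project (6 address lines, 64 data lines). The function name `sram_capacity` is ours and purely illustrative.

```python
def sram_capacity(address_lines, data_lines):
    """Return (words, bits) for an SRAM with m address lines and n data lines."""
    words = 2 ** address_lines       # 2^m addressable words
    bits = words * data_lines        # 2^m x n bits in total
    return words, bits


# The 64x64 array of this report: m = 6, n = 64.
words, bits = sram_capacity(6, 64)
print(words, bits)  # 64 4096
```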

1.2. SRAM operation

An SRAM cell can be in one of three states: standby, when the circuit is idle; reading, when data has been requested; and writing, when the contents are being updated. The three states work as follows:

a) Standby

If the word line is not asserted, the access transistors M5 and M6 disconnect the cell from

the bit lines. The two cross coupled inverters formed by M1 – M4 will continue to

reinforce each other as long as they are disconnected from the outside world.

b) Reading

Assume that the content of the memory is a 1, stored at Q. The read cycle is started by precharging both bit lines to a logical 1, then asserting the word line WL, enabling both access transistors. In the second step, the values stored in Q and Q’ are transferred to the bit lines: BL is left at its precharged value, while BL’ is discharged through M1 and M5 to a logical 0. On the BL side, the transistors M4 and M6 pull the bit line toward VDD, a logical 1. If the content of the memory were a 0, the opposite would happen: BL’ would be pulled toward 1 and BL toward 0.

c) Writing

A write cycle begins by applying the value to be written to the bit lines. To write a 0, we apply a 0 to the bit lines, i.e. we set BL’ to 1 and BL to 0. This is similar to applying a reset pulse to an SR latch, which causes the flip-flop to change state. A 1 is written by inverting the values of the bit lines. WL is then asserted and the value to be stored is latched in. This works because the bit line input drivers are designed to be much stronger than the relatively weak transistors in the cell itself, so they can easily override the previous state of the cross-coupled inverters. Careful sizing of the transistors in an SRAM cell is needed to ensure proper operation.
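The three states can be summarized in a small behavioral model. This is only a logic-level sketch: the class `SramCell` and its `access` method are hypothetical names, and analog effects such as precharge timing and transistor strength are deliberately ignored.

```python
class SramCell:
    """Logic-level model of a 6T cell: q is the value at Q, 1 - q the value at Q'."""

    def __init__(self, q=0):
        self.q = q

    def access(self, wl, bl=None, bl_bar=None):
        """Perform one access and return the (BL, BL') values afterwards.

        wl = 0                      -> standby: cell isolated, state kept
        wl = 1, bit lines driven    -> write: the drivers override the cell
        wl = 1, bit lines undriven  -> read: both lines start precharged to 1,
                                       the side storing 0 is discharged
        """
        if not wl:
            return bl, bl_bar                 # standby, nothing changes
        if bl is not None and bl_bar is not None:
            if bl == bl_bar:
                raise ValueError("a write needs complementary bit lines")
            self.q = bl                       # BL = 0, BL' = 1 writes a 0
            return bl, bl_bar
        return self.q, 1 - self.q             # read


cell = SramCell()
cell.access(1, bl=1, bl_bar=0)    # write a 1
cell.access(0, bl=0, bl_bar=1)    # word line low: this "write" is ignored
print(cell.access(1))             # read -> (1, 0)
```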

1.3. Applications and Uses

a) Characteristics

SRAM is a little more expensive, but faster and significantly less power hungry

(especially idle) than DRAM. It is therefore used where either speed or low power, or

both, are of prime interest. SRAM is also easier to control (interface to) and generally

more truly random access than modern types of DRAM. Due to a more complex internal

structure, SRAM is less dense than DRAM and is therefore not used for high-capacity,

low-cost applications such as the main memory in personal computers.

b) Clock speed and power

The power consumption of SRAM varies widely depending on how frequently it is accessed. It can be as power-hungry as dynamic RAM when used at high frequencies, and some ICs can consume many watts at full speed. On the other hand, static RAM used at a somewhat slower pace, such as in applications with moderately clocked microprocessors, draws very little power and can have nearly negligible power consumption when sitting idle, in the region of a few microwatts.

Static RAM exists primarily as:

(i) General purpose products

• with asynchronous interface, such as the 28 pin 32Kx8 chips (usually named

XXC256), and similar products up to 16 Mb per chip

• with synchronous interface, usually used for caches and other applications

requiring burst transfers, up to 18 Mb (256Kx72) per chip

(ii) Integrated on chip

• as RAM or cache memory in microcontrollers (usually from around 32 bytes up

to 128 kilobytes)

• as the primary caches in powerful microprocessors, such as the x86 family, and

many others (from 8 KB, up to several megabytes)

• on application specific ICs, or ASICs (usually in the order of kilobytes)

• in FPGAs and CPLDs (usually in the order of a few kilobytes or less)

c) Uses

(i) Embedded Use

Many categories of industrial and scientific subsystems, automotive electronics, and similar products contain static RAM. Some amount (kilobytes or less) is also embedded in practically all modern appliances, toys, etc. that implement an electronic user interface. Several megabytes may be used in complex products such as digital cameras, cell phones, synthesizers, etc. SRAM in its dual-ported form is sometimes used for real-time digital signal processing circuits.

(ii) In computers

SRAM is also used in personal computers, workstations, routers and peripheral

equipment: internal CPU caches and external burst mode SRAM caches, hard disk

buffers, router buffers, etc. LCD screens and printers also normally employ static

RAM to hold the image displayed (or to be printed). Small SRAM buffers are also

found in CDROM and CDRW drives; usually 256 KB or more are used to buffer

track data, which is transferred in blocks instead of as single values. The same

applies to cable modems and similar equipment connected to computers. The so

called "CMOS RAM" on PC motherboards was originally a battery-powered

SRAM chip, but is today more often implemented using EEPROM or Flash.

2. DESIGN

2.1 Block Diagram

The block diagram of 64x64 bit SRAM is given below

Fig 2.1: 64x64 bit SRAM Cell Block Diagram

There are two major blocks to be designed:

• Address decoder: The address decoder takes in the 6 address lines A5:0 coming from the latch and decodes them to generate the 64 word lines WL0-63 for the SRAM array.

• SRAM array: Consists of an array of 64 x 64 one-bit SRAM cells. In addition to the cells, the array also contains circuitry for writing data into the array and for precharging the bit lines to VDD before a read operation; these circuits are not shown in the figure.

[Block diagram: address inputs A0 to A5 feed the 6x64 Decoder, whose outputs WL0 to WL63 drive the 64x64 bit SRAM Array; data lines D0 to D63 connect to the array.]

2.2 DECODER

To construct a 64x64 bit SRAM, we need a 6x64 address decoder to select one of the 64 word lines, each of which drives a row of 64 one-bit SRAM cells. The decoder logic must be as fast as possible so that it does not become the bottleneck of the whole design. Considering speed and layout, we adopted domino logic for all the intermediate stages of the decoder.

To design a 6x64 decoder, we can either use three 2x4 decoders in the first stage and AND the corresponding outputs, or use two 3x8 decoders. The former needs 64 three-input AND gates and 12 two-input AND gates, all designed in domino logic, while the latter needs 64 two-input AND gates and 16 three-input AND gates. Because a three-input AND gate takes much more area and presents a higher gate capacitance, we chose the latter design for the 6x64 decoder.
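The gate counts quoted for the two options follow directly from the structure of a two-stage decoder, as the Python sketch below shows. The function `decoder_gate_count` is an illustrative helper of ours; it assumes every predecoder has the same width and that each word line is produced by one AND gate combining one output from each predecoder.

```python
def decoder_gate_count(address_bits, predecoder_bits):
    """Count AND gates, keyed by fan-in, for a two-stage decoder.

    First stage: address_bits / predecoder_bits predecoders, each with
    2**predecoder_bits outputs built from predecoder_bits-input ANDs.
    Second stage: one AND per word line, with one input per predecoder.
    """
    if address_bits % predecoder_bits:
        raise ValueError("address bits must split evenly across predecoders")
    n_pre = address_bits // predecoder_bits
    counts = {}
    counts[predecoder_bits] = n_pre * 2 ** predecoder_bits
    counts[n_pre] = counts.get(n_pre, 0) + 2 ** address_bits
    return counts


print(decoder_gate_count(6, 2))  # {2: 12, 3: 64}: three 2x4 decoders
print(decoder_gate_count(6, 3))  # {3: 16, 2: 64}: two 3x8 decoders
```

The second option moves the 64 word-line gates from three inputs down to two, which is the reason given above for choosing it.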

Fig 2.2(a): 6x64 Decoder using three 2x4 decoders (requires 64 three-input and 12 two-input AND gates)

Fig 2.2(b): 6x64 Decoder design using two 3x8 decoders (requires 64 two-input and 16 three-input AND gates)

2.2.1 2 Input AND Gate Design – We designed the 2-input AND gate using domino logic. Here is the schematic of the design.

Fig 2.3: Schematic Design of AND2 Gate
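For readers unfamiliar with domino logic, the following Python sketch models the two clock phases of a domino AND gate at the logic level. It is a simplified illustration, not a description of our transistor-level schematic: the class `DominoAnd` is a hypothetical name, the gate is assumed to be footed (clocked nfet at the bottom of the stack), and the keeper is modeled simply as "the dynamic node holds its value".

```python
class DominoAnd:
    """Logic-level model of a footed domino AND gate of any fan-in."""

    def __init__(self):
        self.dynamic_node = 1  # internal node, precharged high

    def step(self, clk, *inputs):
        """Apply one clock phase and return the gate output."""
        if clk == 0:
            # Precharge phase: the pfet pulls the dynamic node to VDD.
            self.dynamic_node = 1
        elif all(inputs):
            # Evaluate phase: the nfet stack conducts and discharges the node.
            self.dynamic_node = 0
        # Otherwise the keeper holds the node at its current value.
        return 1 - self.dynamic_node  # static output inverter


gate = DominoAnd()
print(gate.step(0, 1, 1))  # precharge            -> 0
print(gate.step(1, 1, 1))  # evaluate, A = B = 1  -> 1
print(gate.step(1, 0, 1))  # node stays discharged until next precharge -> 1
```

The last line shows why domino inputs must be stable during evaluation: once the dynamic node has discharged, it cannot recover until the next precharge phase.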

i) Frequency Calculation. We held inputs A and B at 1.2V and raised the clock frequency to see how fast the gate can be operated. We found that it needs a clock period of at least 0.4ns, i.e. it can run at up to 2.5GHz.

Fig 2.4: Frequency Variation for AND2 Gate

ii) PFET size calculation. We simulated the gate for varying precharge PFET widths and found that the PFET should be kept small, yet wide enough to precharge the gate's dynamic node quickly at the given frequency of 2.5GHz. We settled on a PFET width of 715nm, which precharges the node fast enough.

Fig 2.5: Pfet width variation for AND2 Gate

iii) Sizing of nfets – We scaled the widths of the nfets in the array by a factor a from one transistor to the next so as to minimize the propagation delay. Increasing the scaling factor decreased the propagation delay, and we settled on a = 1.3.

Fig 2.6: NFET size variation for AND2 Gate

iv) Keeper PFET sizing – The keeper PFET is the one whose gate is driven by the output of the inverter. It prevents the voltage across the intermediate capacitance from dropping below the switching threshold VM of the inverter during the evaluation stage. In the figure, the first graph is the clock. The second graph shows that without a keeper pfet the output voltage rises by some millivolts. When we add the keeper and set its width to b*(sum of the widths of the nfet array), the output stays stable at 0, and the fluctuation decreases as b increases. We chose b = 0.15.
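The two sizing rules above (nfet widths scaled by a, keeper width equal to b times the summed nfet widths) are simple enough to write down as a Python sketch. The helper names are ours, and the 1um starting width is taken from the initial sizing described in Section 2.2.5; which transistor in the stack gets the smallest width is not specified here, so the list order is only an assumption.

```python
def tapered_widths(first_width_um, ratio, count):
    """Widths of an nfet array where each device is `ratio` times the previous one."""
    return [first_width_um * ratio ** i for i in range(count)]


def keeper_width(nfet_widths_um, b):
    """Keeper pfet width as a fraction b of the summed nfet array width."""
    return b * sum(nfet_widths_um)


# AND2 array (two inputs plus the clocked nfet) with a = 1.3 and a 1 um start.
stack = tapered_widths(1.0, 1.3, 3)
print([round(w, 3) for w in stack])         # [1.0, 1.3, 1.69]
print(round(keeper_width(stack, 0.15), 4))  # 0.5985
```

These are starting values only; the final widths in Section 2.2.5 were tuned further by simulation.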

Fig 2.7: Keeper PFET sizing for AND2 gate

v) Inverter Sizing. Ideally the inverter's nfet would be made stronger than its pfet, which lowers the inverter's switching threshold VM so that the voltage on the intermediate capacitance stays above it. But making the nfet stronger adds delay. Since the keeper pfet already keeps the intermediate capacitance charged, we can instead enlarge the inverter pfet to obtain equal rise and fall times. We found the resulting beta ratio to be 2.45.

Fig 2.8: Inverter size variation for AND2 Gate

2.2.2 3 INPUT AND GATE. The ratios obtained for the 2-input AND gate are kept the same for the 3-input gate. The open question was whether to build the 3-input AND from two cascaded AND2 gates or from a single AND3 gate. We therefore measured the propagation delay of each option. AND2_1 and AND2_2 denote the cascaded AND2 arrangement with the switching input applied to the 1st and 2nd AND gate respectively.

Gate                 High to Low   Low to High   Propagation Delay
AND2                 0             1.15ns        0.575ns
AND2_1 (cascaded)    0             1.18ns        0.59ns
AND2_2 (cascaded)    0             1.19ns        0.595ns
AND3                 0             1.46ns        0.73ns

The cascaded AND2 arrangement would make our design faster but could make it asymmetrical, so we chose AND3.
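The propagation delay column is the average of the high-to-low and low-to-high delays. The Python sketch below reproduces the column from the measured values; the dictionary layout and the function name are illustrative only.

```python
def propagation_delay(t_phl_ns, t_plh_ns):
    """Average of the high-to-low and low-to-high delays, in ns."""
    return (t_phl_ns + t_plh_ns) / 2


# (high-to-low, low-to-high) delays in ns, as measured above.
measured = {
    "AND2": (0.0, 1.15),
    "AND2_1 (cascaded)": (0.0, 1.18),
    "AND2_2 (cascaded)": (0.0, 1.19),
    "AND3": (0.0, 1.46),
}

for gate, (t_phl, t_plh) in measured.items():
    print(gate, round(propagation_delay(t_phl, t_plh), 3), "ns")
```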

[Simulation waveforms for the four cases: AND2 (only one 2-input AND), AND2_1 (cascaded 2-input AND), AND2_2 (cascaded 2-input AND), AND3 (3-input AND)]

2.2.3 3x8 DECODER – Here is the schematic for the Decoder.

Fig 2.9: 3x8 Decoder Schematic

And, here is the simulation graph,

Fig 2.10: Simulation of 3x8 Decoder

2.2.4 6x64 Decoder – We used two 3x8 decoders and combined their outputs with 2-input AND gates to obtain the 6x64 decoder logic. Here is the schematic.

Fig 2.11: Schematic of 6x64 Decoder

We held inputs A1-A5 at 0 and swept A0 from 0 to 1.2V, and observed Y0 falling and Y1 rising to 1.2V.
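At the logic level, this decoder structure can be sketched in Python as follows. The split of the address bits between the two 3x8 decoders (A5:3 to one, A2:0 to the other) is an assumption made for illustration, and the function names are ours.

```python
def decode_3to8(bits):
    """One-hot 3x8 decoder; bits = (b2, b1, b0) with b2 the most significant."""
    index = bits[0] * 4 + bits[1] * 2 + bits[2]
    return [1 if i == index else 0 for i in range(8)]


def decode_6to64(address):
    """One-hot 6x64 decoder built from two 3x8 decoders and 64 two-input ANDs."""
    if not 0 <= address < 64:
        raise ValueError("address must fit in 6 bits")
    bits = [(address >> i) & 1 for i in range(5, -1, -1)]  # A5 ... A0
    high = decode_3to8(bits[:3])  # assumed: A5..A3
    low = decode_3to8(bits[3:])   # assumed: A2..A0
    # Output Yi is the AND of one output from each 3x8 decoder.
    return [high[i // 8] & low[i % 8] for i in range(64)]


print(decode_6to64(0).index(1))  # 0: Y0 is selected when A5..A0 = 000000
print(decode_6to64(1).index(1))  # 1: raising A0 moves the selection to Y1
```

The two printed cases correspond to the simulation above, where toggling A0 deselects Y0 and selects Y1.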

Fig 2.12: Propagation Delay at the Critical Path for 6x64 Decoder

2.2.5 Decoder Resizing.

The delay we obtained after the initial design was 5.177ns - 5.025ns = 0.152ns when running at 1GHz and driving a capacitance of 39.931fF. We computed this end capacitance taking the gate capacitance as 1fF/um and the width capacitance as 0.2fF/um. In this case the AND3 nfets have W1 = 1um, with the rest sized by the ratio 1.3; its inverter nfet has W2 = 1um; the AND2 nfets have W3 = 1um, sized accordingly with the ratio 1.3; and its inverter nfet has W4 = 1um.

To minimize the delay while keeping the rise and fall times equal, we optimized the sizes as follows:

For AND3,

NFET Array: 2u, 2.6u, 3.38u, 4.395u

PFET: 3u

Keeper PFET: 800nm

Inverter: NFET – 3u

PFET – 2.9u

For AND2,

NFET Array: 5.8u, 7.54u, 9.8u

PFET: 3.2u

Keeper PFET: 2.2u

Inverter: NFET – 3u

PFET – 2.9u

Here’s the critical path

Fig 2.13: Schematic of Critical Path in 6x64 Decoder

We obtained the following fall and rise times for the four stages: 33.94ps, 34.94ps, 33.23ps and 34.99ps. With these sizes, the propagation delay was reduced from 152ps to 89ps (1.594ns - 1.505ns = 89ps), so we kept these sizes.
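The delay figures before and after resizing can be checked with a few lines of Python. The helper `delay_ps` is an illustrative name; the four time stamps are the ones quoted in this section.

```python
def delay_ps(t_end_ns, t_start_ns):
    """Difference between two waveform time stamps, converted from ns to ps."""
    return round((t_end_ns - t_start_ns) * 1000, 1)


before = delay_ps(5.177, 5.025)  # initial sizing
after = delay_ps(1.594, 1.505)   # optimized sizing
reduction = round(100 * (before - after) / before, 1)
print(before, after, reduction)  # 152.0 89.0 41.4
```

The resizing therefore removes roughly 41% of the critical-path delay.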

Fig 2.14: Propagation of Critical Path in 6x64 Decoder after Optimization

2.3 SRAM cell and array design

2.3.1 Precharge circuitry

The schematic of the precharge circuit is shown below. The pfets are 1um wide. This large width is required to charge the bit lines quickly during the precharge phase: it ensures that the bit lines BIT and BIT_B are charged to VDD within half the clock cycle.

Fig 2.15: Schematic of Precharge Circuit

2.3.2 SRAM Cell.

Schematic of the cell is shown below. The sizes of the access transistors, inverter

nfet, pfet widths are as per the ones given in the layout.

Fig 2.16: Schematic of SRAM Cell

2.3.3 Read Sense Circuit

The schematic of the read sense circuit is shown below. The basic NAND gate is sized with an nfet width of 280nm and a pfet width of 560nm, a ratio of 4.8:1. This is the ratio required in the 90nm process with channel length = 80nm to achieve ideal rise and fall times.

Fig 2.17: Schematic of Read Sense Circuit

2.3.4 Write driver

The write driver is enabled by a Write_enable line. The schematic is shown below.

Fig 2.18: Schematic of Write Circuit

2.3.5 The complete SRAM Array

Following is the schematic of the 64x64 bit SRAM array.

Fig 2.19: Schematic of SRAM Array

2.3.6 SRAM Array with Decoder

Here is the schematic of the complete SRAM with DECODER,

Fig 2.20: Schematic of SRAM Array with 6X64 Decoder

2.3.7 Read Stability

This is an important characteristic of the SRAM cell. During a read operation one of the bit lines, either BIT or BIT_B, is discharged through the access transistor and an nfet of the inverter. During this discharge, a large current flows through node A (shown below), which raises its potential. Read stability is a measure of the potential at node A: this potential should not exceed the switching threshold of the other inverter. If it does, the cell flips and the stored state is lost. An analogous analysis was done to identify the tradeoffs between read current and static noise margin.

Following is the read stability graph.

Fig 2.21: Simulation of Read Stability

2.4 DC SIMULATION

2.4.1 STATIC NOISE MARGIN

Here is the schematic of the SRAM cell used for the static noise margin measurement. We sweep the left node voltage and measure the right node voltage, then do the reverse, and plot both curves together to form the butterfly curve. The SNM is the side of the largest square that fits inside a lobe of the butterfly curve, taking the smaller value of the two lobes.

Fig 2.22: Schematic of SRAM Cell for Static Noise Margin Measurement

(i) HOLD operation. We keep the gates of the pass transistors at GND and get the following curve. The SNM for this case is 0.4604V.

Fig 2.23: Hold operation

(ii) READ - The SNM we got was 0.1616V. The graph is as follows.

Fig 2.24: Static Noise Margin estimation of SRAM Cell
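The "largest square in the butterfly curve" construction can be illustrated numerically. The Python sketch below does not use our simulated curves: it assumes an idealized, symmetric tanh-shaped inverter transfer curve (the function `inverter_vtc` and its `gain` parameter are hypothetical), and it finds the largest square by sliding a 45-degree line across one lobe and measuring the distance between the two curves along that line. For a symmetric cell both lobes are equal, so only one is evaluated.

```python
import math

VDD = 1.2  # supply voltage used in our simulations


def inverter_vtc(v_in, gain=50.0):
    """Idealized symmetric inverter transfer curve (assumption, not simulated data)."""
    return 0.5 * VDD * (1.0 - math.tanh(gain * (v_in - 0.5 * VDD)))


def _root(func, lo, hi, iterations=60):
    """Bisection for a strictly decreasing func with func(lo) > 0 > func(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def static_noise_margin(vtc, steps=600):
    """Side of the largest square inside one lobe of the butterfly curve."""
    best = 0.0
    for k in range(1, steps):
        c = VDD * k / steps  # offset of the 45-degree line y = x + c
        # Crossing of the line with curve 1, y = vtc(x).
        x1 = _root(lambda x: vtc(x) - x - c, -VDD, 2 * VDD)
        # Crossing of the line with the mirrored curve 2, x = vtc(y).
        y2 = _root(lambda y: vtc(y) - y + c, -VDD, 2 * VDD)
        x2 = y2 - c
        # The two crossings are opposite corners of a square of side x1 - x2.
        best = max(best, x1 - x2)
    return best


print(round(static_noise_margin(inverter_vtc), 3))
```

With an ideal step-like inverter the result approaches VDD/2; lower inverter gain shrinks the lobes and hence the SNM, which is the same qualitative effect that the access transistors have during a read.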

2.4.2 Cell Read Current

Cell read current is the current that flows through the pass gate nfet connected to BL, draining charge from BL into the cell ground terminal. The larger the current, the faster BL is discharged and develops a signal for the sensing circuit to detect. However, a very large read current flowing through the discharge path from the bit line to ground can raise the internal node past the read stability threshold. This can be avoided by choosing appropriate sizes for the access nfet and the pull-down nfet of the respective inverter that conduct during a read cycle.
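The tradeoff between read current and read stability can be seen in a very rough first-order model, sketched below in Python. It treats the access nfet and the pull-down nfet as linear resistors in series between a bit line held at VDD and ground. This resistive model and the resistance values are assumptions made for illustration only; they are not extracted from our devices.

```python
def read_operating_point(vdd, r_access_ohm, r_pulldown_ohm):
    """Return (read current in A, voltage in V at the internal node storing 0)."""
    current = vdd / (r_access_ohm + r_pulldown_ohm)
    node_voltage = current * r_pulldown_ohm  # voltage divider at the internal node
    return current, node_voltage


# Hypothetical on-resistances: a weaker, then a stronger access transistor.
for r_access in (20e3, 10e3):
    i_read, v_node = read_operating_point(1.2, r_access, 10e3)
    print(round(i_read * 1e6, 1), "uA", round(v_node, 3), "V")
```

Strengthening the access transistor (lower resistance) raises the read current but also raises the internal node voltage toward the switching threshold of the other inverter, which is why the two transistors have to be sized together.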

Fig 2.25: Cell Read Current Simulation

2.4.3 Effect of Threshold Voltage (Vt)

We raised Vt by 25mV, 50mV, 100mV and 200mV by adding a negative voltage at the gate of the transistor under test, and obtained the following SNM values (in V).

Vt shift   Pass nfet   Pull down nfet   Pfet
25mV       0.1638      0.1626           0.1518
50mV       0.1725      0.1655           0.1483
100mV      0.1900      0.1732           0.1422
200mV      0.2246      0.1778           0.1252
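To compare how strongly each device's threshold voltage affects the SNM, the Python sketch below computes the average slope (change in SNM per volt of Vt shift) between the smallest and largest shift in the table. The data are the tabulated values; the slope metric and the names used are our own illustrative choice.

```python
# SNM in volts, indexed by Vt shift in mV, from the table above.
snm_v = {
    "pass nfet":      {25: 0.1638, 50: 0.1725, 100: 0.1900, 200: 0.2246},
    "pull down nfet": {25: 0.1626, 50: 0.1655, 100: 0.1732, 200: 0.1778},
    "pfet":           {25: 0.1518, 50: 0.1483, 100: 0.1422, 200: 0.1252},
}


def snm_slope(points):
    """Average SNM change per volt of Vt shift between the end points."""
    lo, hi = min(points), max(points)
    return (points[hi] - points[lo]) / ((hi - lo) / 1000.0)


for device, points in snm_v.items():
    print(device, round(snm_slope(points), 3))
```

A positive slope (both nfets) means that a higher Vt improves the SNM, while the negative slope for the pfet means that a higher Vt degrades it; the pass nfet shows the strongest dependence.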

Fig 2.26 - Effect on SNM of increasing Vt at pass nfet

Fig 2.27 - Effect on SNM of increasing Vt at pull down nfet

Fig 2.28 - Effect of increasing Vt at one end of pfet and measuring other side.

3. LAYOUT

3.1 DECODER

3.1.1 AND2 Gate.

Here is the layout of AND2 gate which passes both DRC and LVS

Fig 3.1- DRC and LVS results for AND2 Gate along with layout.

3.1.2 AND3 Gate.

Here is the layout of AND3 gate which passes both DRC and LVS

Fig 3.2- DRC and LVS results for AND3 Gate along with layout.

3.1.3 3x8 DECODER

Here is the layout of 3x8 Decoder which passes both DRC and LVS

Fig 3.3- DRC and LVS results for 3x8 Decoder along with layout.

3.1.4 6x64 DECODER

Here is the layout of the 6x64 Decoder, which passes both DRC and LVS

Fig 3.4- DRC and LVS results for 6x64 Decoder along with layout.

3.2 SRAM

3.2.1 Precharge circuit layout

The width of the entire precharge circuit layout should be equal to the width between the

two bit lines BIT and BIT_B. Below is an image of our layout of this circuit with its DRC

and LVS results.

Fig 3.5- DRC and LVS results for Precharge Circuit along with layout

3.2.2 Read Sense Amp Circuit

In the layout of the read circuit, care has to be taken to ensure that it fits exactly between the two bit lines. The mirrored (laterally reflected) layout of the SRAM cells adds some complexity, because the bit lines now run in the order BIT, BIT_B, BIT_B, BIT, and this pattern repeats across the array. For a read it is sufficient to sense one of the bit lines, either BIT or BIT_B. Two read sense amps therefore have to fit between the two BIT lines. The LVS results and the layout of the read sense amp can be found in the image below.
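The bit line ordering produced by mirroring alternate columns can be generated with a short Python sketch. The function name and the numbered labels such as BIT0 are hypothetical, and the sketch assumes that even-numbered columns are unmirrored and odd-numbered columns are reflected.

```python
def bitline_order(columns):
    """Left-to-right bit line labels when every second cell column is mirrored."""
    order = []
    for col in range(columns):
        pair = ["BIT%d" % col, "BIT_B%d" % col]
        if col % 2:          # odd columns are laterally reflected
            pair.reverse()
        order.extend(pair)
    return order


print(bitline_order(2))  # ['BIT0', 'BIT_B0', 'BIT_B1', 'BIT1']
```

The two BIT_B lines of neighbouring columns end up adjacent, so two sense amps sit between the outer pair of BIT lines.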

Fig 3.6 DRC and LVS results for Read Sense Amplifier along with layout

3.2.3 SRAM 64 X 64 array

Using the SRAM cell provided in the standard library, we created a symmetrical, laterally inverted 2 X 2 network of SRAM cells. This was done to share the power rails well and to reduce bit line noise. Although not done in our layout, cross-coupling the bit lines would reduce the bit line noise considerably. Using instances of the 2 X 2 network, we laid out the entire array as a 64 X 32 top half and a 64 X 32 bottom half, as shown in the schematic of phase two. The read sense amplifiers were then inserted between the top and bottom halves of the SRAM array layout. In the image below, the 2 X 2 network of SRAM cells is on the left and the 64 X 64 layout of SRAM cells is on the right.

Fig 3.7- Array of SRAM Cells, 2 X 2 and 64 X 64 arrays.

Image below shows the DRC test results:

Fig 3.8: DRC results for the 64 X 64 SRAM array

Here’s the complete layout of the SRAM array with decoder.

Fig 3.9: 64 X 64 SRAM array along with 6x64 Decoder

4. RESULTS

4.1 Simulation Results

4.1.1 Simulation for One Cell SRAM

We simulated a single SRAM cell with the following schematic.

Fig 4.1 – One Cell SRAM Schematic

Below is a graph showing the Write-1, Read-1, Write-0 simulation on a single SRAM cell.

Fig 4.2 – One Cell SRAM Simulation

4.1.2 Simulation for 64x64 bit SRAM Array

Here is the schematic used for 64x64 bit SRAM Array

Fig 4.3 –64x64 SRAM Array

and here are the simulation results, with din<0> = 1, din<1> = 0 and din<2> = 1, the address lines at 000000, and the clock running at 1GHz.

Fig 4.4 – Simulation for complete 64x64 SRAM cell Array

4.2 DRC and LVS Results

The DRC and LVS were checked for each component individually. The following is

a summary of the results:

Functional Component   DRC      LVS
6 X 64 Decoder         Passed   Passed
Precharge              Passed   Passed
Read Sense Amp         Passed   Passed
64 X 64 SRAM array     Errors   Errors

The full reports for these tests can be found at the following locations on http://vlsi2.cisl.columbia.edu:

/home/user5/fall07/ssn2111/LVS_FinalReports

/home/user5/fall07/ssn2111/DRC_FinalReports

5. CONCLUSION

As the SRAM project for the EE 4321 VLSI course, we designed a 64x64 bit SRAM array at both the schematic and layout levels. We designed the 6x64 decoder from 3x8 decoders built with two- and three-input AND gates in domino logic. We successfully simulated and verified the functionality of the components we set out to design. We could not pass DRC and LVS for the entire unit, primarily because the unit cell provided to us failed DRC and LVS, but we did pass DRC and LVS for the other individual components, including the precharge circuit, the read sensing circuit and the 6x64 decoder.

Working on such a design-oriented project gave us a thorough insight into the critical issues that must be considered while designing even a simple unit. It also made us familiar with the different approaches to implementing the same design and with weighing the tradeoffs between the alternatives. It made us aware of the critical physical implementation issues that have to be considered not only during the actual layout but also during schematic-level design. It also gave us hands-on experience with CAD tools like Cadence, Virtuoso, Spice and Spectre, which are widely used at both the industrial and academic level for circuit design. Overall, it was a good experience in learning, practicing and designing one of the most critical parts of a processor, widely used in any computer architecture.

6. REFERENCES

1. http://en.wikipedia.org/wiki/Static_random_access_memory

2. CMOS Logic, Uyemura.

3. CMOS VLSI Design, Weste & Harris.

4. Static-Noise Margin Analysis of CMOS SRAM Cells, Evert Seevinck, Frans J. List and Jan Lohstroh.

5. Analyzing Static Noise Margin for Subthreshold SRAM in 65nm CMOS, Benton H. Calhoun and Anantha Chandrakasan.

6. Transistor Sizing for Reliable Domino Logic Design in Dual Threshold Voltage Technologies, Seong-Ook Jung, Ki-Wook Kim and Sung-Mo (Steve) Kang.
