Static Random Access Memory
(SRAM) Design
Abhinandan Majumdar
MS. Computer Engineering
Srinivas Satish
MS. Computer Engineering
December 10, 2007
Final Project
EE 4321
VLSI Circuits
Prof. Azeez Bhavnagarwala
INDEX
1. INTRODUCTION
1.1 Design
1.2 SRAM Operation
1.3 Applications and Uses
2. DESIGN
2.1 Block Diagram
2.2 Decoder
2.2.1 2 Input AND Gate Design
2.2.2 3 Input AND Gate Design
2.2.3 3x8 Decoder
2.2.4 6x64 Decoder
2.2.5 Decoder Resizing
2.3 SRAM Cell and Array Design
2.3.1 Precharge Circuitry
2.3.2 SRAM Cell
2.3.3 Read Sensing Circuit
2.3.4 Write Driver
2.3.5 SRAM Array
2.3.6 SRAM Cell with Decoder
2.3.7 Read Stability
2.4 DC Simulation
2.4.1 Static Noise Margin (SNM)
2.4.2 Cell Read Current
2.4.3 Effect of Threshold Voltage (Vt)
3. LAYOUT
3.1 Decoder
3.1.1 AND2 Gate
3.1.2 AND3 Gate
3.1.3 3x8 Decoder
3.1.4 6x64 Decoder
3.2 SRAM
3.2.1 Precharge
3.2.2 Read Sensing Circuit
3.2.3 SRAM 64x64 Array
4. RESULTS
4.1 Simulation Results
4.1.1 Simulation of One SRAM Cell
4.1.2 Simulation of 64x64 SRAM Array
4.2 DRC & LVS Results
5. CONCLUSION
6. REFERENCES
1. INTRODUCTION
Static random access memory (SRAM) is a type of semiconductor memory. The word
"static" indicates that the memory retains its contents as long as power remains applied,
unlike dynamic RAM (DRAM) that needs to be periodically refreshed.
1.1 Design
Fig 1.1 A six-transistor CMOS SRAM cell.
Random access means that locations in the memory can be written to or read from in any
order, regardless of the memory location that was last accessed.
Each bit in an SRAM is stored on four transistors that form two cross-coupled inverters.
This storage cell has two stable states which are used to denote 0 and 1. Two additional
access transistors serve to control the access to a storage cell during read and write
operations. It thus typically takes six MOSFETs to store one memory bit.
Access to the cell is enabled by the word line (WL in figure) which controls the two
access transistors M5 and M6 which, in turn, control whether the cell should be connected
to the bit lines: BL and BL’. They are used to transfer data for both read and write
operations. While it's not strictly necessary to have two bit lines, both the signal and its
inverse are typically provided since it improves noise margins.
During read accesses, the bit lines are actively driven high and low by the inverters in the
SRAM cell. This improves SRAM speed compared to DRAMs—in a DRAM, the bit line
is connected to storage capacitors and charge sharing causes the bitline to swing upwards
or downwards. The symmetric structure of SRAMs also allows for differential signaling,
which makes small voltage swings more easily detectable. Another difference with
DRAM that contributes to making SRAM faster is that commercial chips accept all
address bits at a time. By comparison, commodity DRAMs have the address multiplexed
in two halves, i.e. higher bits followed by lower bits, over the same package pins in order
to keep their size and cost down.
The size of an SRAM with m address lines and n data lines is 2^m words, or 2^m × n bits.
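As a quick check of this relation, the sketch below evaluates it for the array designed in this report (6 address lines, 64 data lines). The function name `sram_capacity` is ours and is used only for illustration.

```python
def sram_capacity(m, n):
    """Return (words, bits) for an SRAM with m address lines and n data lines."""
    words = 2 ** m      # each address selects one word
    bits = words * n    # each word holds n bits
    return words, bits


words, bits = sram_capacity(6, 64)
print(words, bits)  # 64 4096
```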
1.2. SRAM operation
An SRAM cell can be in one of three states: standby, where the circuit is idle;
reading, when the data has been requested; and writing, when the contents are updated. The
three states work as follows:
a) Standby
If the word line is not asserted, the access transistors M5 and M6 disconnect the cell from
the bit lines. The two cross coupled inverters formed by M1 – M4 will continue to
reinforce each other as long as they are disconnected from the outside world.
b) Reading
Assume that the content of the memory is a 1, stored at Q. The read cycle is started by
precharging both the bit lines to a logical 1, then asserting the word line WL, enabling
both the access transistors. The second step occurs when the values stored in Q and Q’ are
transferred to the bit lines by leaving BL at its precharged value and discharging BL’
through M1 and M5 to a logical 0. On the BL side, the transistors M4 and M6 pull the bit
line toward VDD, a logical 1. If the content of the memory was a 0, the opposite would
happen and BL’ would be pulled toward 1 and BL toward 0.
c) Writing
A write cycle begins by applying the value to be written to the bit lines. If we
wish to write a 0, we would apply a 0 to the bit lines, i.e. setting BL’ to 1 and BL to 0.
This is similar to applying a reset pulse to an SR latch, which causes the flip flop to
change state. A 1 is written by inverting the values of the bit lines. WL is then asserted
and the value that is to be stored is latched in. Note that the reason this works is that the
bit line input-drivers are designed to be much stronger than the relatively weak transistors
in the cell itself, so that they can easily override the previous state of the cross-coupled
inverters. Careful sizing of the transistors in an SRAM cell is needed to ensure proper
operation.
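The three states can be summarised in a small behavioural model. This is only a logic-level sketch written for illustration: the class `SramCell` and its method names are our own, and it ignores all electrical effects such as transistor sizing and bit line capacitance.

```python
class SramCell:
    """Logic-level model of one 6T cell: q is the stored bit, q_bar its complement."""

    def __init__(self, q=0):
        self.q = q

    @property
    def q_bar(self):
        return 1 - self.q

    def read(self, wl):
        """Precharge both bit lines to 1, then let the cell discharge one of them."""
        bl, bl_bar = 1, 1                      # precharge phase
        if wl:                                 # word line asserted: access transistors on
            bl, bl_bar = self.q, self.q_bar    # the side storing 0 pulls its bit line low
        return bl, bl_bar

    def write(self, wl, bl, bl_bar):
        """Drive complementary bit lines; the cell latches them only if WL is high."""
        if wl and bl != bl_bar:
            self.q = bl                        # strong write drivers override the cell


cell = SramCell()
cell.write(wl=1, bl=1, bl_bar=0)   # write a 1
print(cell.read(wl=1))             # (1, 0)
cell.write(wl=0, bl=0, bl_bar=1)   # standby: word line low, cell keeps its state
print(cell.read(wl=1))             # (1, 0)
```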
1.3. Applications and Uses
a) Characteristics
SRAM is a little more expensive, but faster and significantly less power hungry
(especially idle) than DRAM. It is therefore used where either speed or low power, or
both, are of prime interest. SRAM is also easier to control (interface to) and generally
more truly random access than modern types of DRAM. Due to a more complex internal
structure, SRAM is less dense than DRAM and is therefore not used for high-capacity,
low-cost applications such as the main memory in personal computers.
b) Clock speed and power
The power consumption of SRAM varies widely depending on how frequently it is
accessed; it can be as power-hungry as dynamic RAM, when used at high frequencies,
and some ICs can consume many watts at full speed. On the other hand, static RAM used
at a somewhat slower pace, such as in applications with moderately clocked
microprocessors, draws very little power and can have a nearly negligible power
consumption when sitting idle, in the region of a few microwatts.
Static RAM exists primarily as:
(i) General purpose products
• with asynchronous interface, such as the 28 pin 32Kx8 chips (usually named
XXC256), and similar products up to 16 Mb per chip
• with synchronous interface, usually used for caches and other applications
requiring burst transfers, up to 18 Mb (256Kx72) per chip
(ii) Integrated on chip
• as RAM or cache memory in microcontrollers (usually from around 32 bytes up
to 128 kilobytes)
• as the primary caches in powerful microprocessors, such as the x86 family, and
many others (from 8 KB, up to several megabytes)
• on application specific ICs, or ASICs (usually in the order of kilobytes)
• in FPGAs and CPLDs (usually in the order of a few kilobytes or less)
c) Uses
(i) Embedded Use
Many categories of industrial and scientific subsystems, automotive electronics,
and similar, contain static RAM. Some amount (kilobytes or less) is also
embedded in practically all modern appliances, toys, etc. that implement an
electronic user interface. Several megabytes may be used in complex products
such as digital cameras, cell phones, synthesizers, etc. SRAM in its dual-ported
form is sometimes used for realtime digital signal processing circuits.
(ii) In computers
SRAM is also used in personal computers, workstations, routers and peripheral
equipment: internal CPU caches and external burst mode SRAM caches, hard disk
buffers, router buffers, etc. LCD screens and printers also normally employ static
RAM to hold the image displayed (or to be printed). Small SRAM buffers are also
found in CDROM and CDRW drives; usually 256 KB or more are used to buffer
track data, which is transferred in blocks instead of as single values. The same
applies to cable modems and similar equipment connected to computers. The so
called "CMOS RAM" on PC motherboards was originally a battery-powered
SRAM chip, but is today more often implemented using EEPROM or Flash.
2. DESIGN
2.1 Block Diagram
The block diagram of 64x64 bit SRAM is given below
Fig 2.1: 64x64 bit SRAM Cell Block Diagram
There are two major blocks to be designed:
• Address decoder: The address decoder takes in the 6 address lines A5:0 coming
from the latch, and decodes them to generate 64 wordlines WL0-63 for the SRAM
array.
• SRAM array: Consists of an array of 64 x 64 bit SRAM cells. In addition to
these blocks, the array also contains circuitry for writing data into
the array and for precharging the bitlines to VDD before the read operation; these
circuits are not shown in the figure.
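[Fig 2.1 labels: address inputs A5, A4, ..., A0 into the 6x64 Decoder; wordlines WL0, WL1, ..., WL63 from the decoder to the 64x64 bit SRAM Array; data lines D0, D1, ..., D63]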
2.2 DECODER
To construct a 64x64 bit SRAM, we need a 6x64 address decoder to select one of the 64
word lines, one per row, each row containing 64 one-bit SRAM cells. The decoder logic
must be as fast as possible so that it does not become the bottleneck of the whole design.
Considering speed and layout issues, we adopted domino logic for all the
intermediate nodes.
A 6x64 decoder can be built either from three 2x4 decoders in the first stage, ANDing
the corresponding outputs, or from two 3x8 decoders. The former needs 64 three-input
AND gates and 12 two-input AND gates, all designed in domino logic, while the latter
needs 64 two-input AND gates and 16 three-input AND gates. Since a three-input AND
gate takes much more area and presents a higher gate capacitance, we chose the latter
design for the 6x64 decoder.
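The gate counts for the two options can be reproduced with a short calculation. The helper `gate_counts` below is a hypothetical function written for this check; it counts only the AND gates and ignores the inverters needed for the complemented address inputs.

```python
def gate_counts(address_bits, predecoder_bits):
    """Count AND gates for a decoder built from equal-width predecoders.

    Returns a dict {fan_in: number_of_gates}.
    """
    n_pre = address_bits // predecoder_bits   # number of predecoders
    counts = {}
    # Each k-to-2^k predecoder needs 2^k AND gates with k inputs.
    counts[predecoder_bits] = n_pre * 2 ** predecoder_bits
    # The final stage needs 2^address_bits AND gates, one input per predecoder.
    counts[n_pre] = counts.get(n_pre, 0) + 2 ** address_bits
    return counts


print(gate_counts(6, 2))  # three 2x4 decoders: {2: 12, 3: 64}
print(gate_counts(6, 3))  # two 3x8 decoders:   {3: 16, 2: 64}
```

The second option moves the 64 final-stage gates from three inputs to two, which is where the area and gate capacitance saving comes from.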
Fig 2.2: 6x64 Decoder using three 2x4 decoders (requires 64 three input and 12 two input AND gates)
Fig 2.2: 6x64 Decoder design using two 3x8 decoders (requires 64 two input and 16 three input AND gates)
2.2.1 2 Input AND Gate Design – We designed the 2 input AND gate using Domino
Logic. Here is the schematic of the design.
Fig 2.3: Schematic Design of AND2 Gate
i) Frequency Calculation. We kept inputs A and B at 1.2V and checked how fast
the gate can be clocked. We found that it needs a clock period of at least
0.4ns, i.e. it works up to 2.5GHz.
Fig 2.4: Frequency Variation for AND2 Gate
ii) PFET size calculation. We simulated the gate for varying PFET widths and found
that the PFET should be kept as small as possible while still charging the
precharged node fast enough at the given frequency of 2.5GHz. We chose a PFET
width of 715nm so that the precharge is fast enough.
Fig 2.5: Pfet width variation for AND2 Gate
iii) Sizing of nfets – We scaled the widths of the nfet array by a ratio a so as to
minimize the propagation delay. Increasing the scaling ratio decreases the
propagation delay; we chose a = 1.3.
Fig 2.6: NFET Size variation for AND2 Gate
iv) Keeper PFET sizing – The keeper PFET is the one whose gate is driven by the
output of the inverter. It prevents the voltage across the intermediate
capacitance from dropping below the switching threshold VM of the inverter
during the evaluation stage. In the figure, the first graph is the clock. The
second graph shows that without a keeper PFET the output voltage rises by
some mV. With a keeper PFET of width b*(sum of the widths of the nfet array),
the output stays stable at 0, and the fluctuation decreases as b increases.
We chose b = 0.15.
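The two sizing rules (nfet scaling ratio a and keeper fraction b) combine as in the sketch below. The starting width of 1um and the stack of three nfets are assumptions made for this illustration, and both function names are ours.

```python
def nfet_stack_widths(w_first_um, ratio, count):
    """Widths of a series nfet stack scaled geometrically by `ratio`."""
    return [w_first_um * ratio ** i for i in range(count)]


def keeper_width(nfet_widths_um, b):
    """Keeper PFET width as the fraction b of the summed nfet widths."""
    return b * sum(nfet_widths_um)


# Assumed for illustration: first device 1 um wide, three devices in the stack.
stack = nfet_stack_widths(1.0, 1.3, 3)
print([round(w, 3) for w in stack])          # [1.0, 1.3, 1.69]
print(round(keeper_width(stack, 0.15), 4))   # 0.5985 (um)
```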
Fig 2.7: Keeper PFET sizing for AND2 gate
v) Inverter Sizing. In principle the inverter nfet should be stronger than the pfet,
so that the voltage across the intermediate capacitance stays above the VM of
the inverter. But making the nfet stronger adds delay. Since the keeper PFET
keeps the intermediate capacitance charged, we can instead increase the pfet
to obtain equal rise and fall times. This gives a beta ratio of 2.45.
Fig 2.8: Inverter size variation for AND2 Gate
2.2.2 3 INPUT AND GATE. The ratios obtained for the 2 input AND gate are
kept the same for the 3 input gate. The open question was whether to build
the 3 input AND from two cascaded AND2 gates or as a single AND3 gate. We
therefore measured the propagation delays below. AND2_1 and AND2_2 denote
the cascade of two AND2 gates with the switching input applied to the 1st
and 2nd AND gate respectively.
Gate                 High to Low   Low to High   Propagation Delay
AND2                 0             1.15ns        0.575ns
AND2_1 (cascaded)    0             1.18ns        0.59ns
AND2_2 (cascaded)    0             1.19ns        0.595ns
AND3                 0             1.46ns        0.73ns
Hence cascaded AND2 gates would make our design faster but could make it asymmetrical,
so we chose AND3.
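The propagation delay column is the average of the high-to-low and low-to-high delays. The short script below recomputes it from the measured values in the table; the variable and function names are ours.

```python
# (high-to-low, low-to-high) delays in ns, taken from the table above
measurements_ns = {
    "AND2": (0.0, 1.15),
    "AND2_1 (cascaded)": (0.0, 1.18),
    "AND2_2 (cascaded)": (0.0, 1.19),
    "AND3": (0.0, 1.46),
}


def propagation_delay(t_hl, t_lh):
    """Average of the two transition delays."""
    return (t_hl + t_lh) / 2.0


delays = {gate: round(propagation_delay(*t), 3) for gate, t in measurements_ns.items()}
for gate, delay in delays.items():
    print(gate, delay, "ns")
```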
[Simulation plots: AND2 (only one 2 Input AND), AND2_1 (cascaded 2 Input AND), AND2_2 (cascaded 2 Input AND), AND3 (3 Input AND)]
2.2.3 3x8 DECODER – Here is the schematic for the Decoder.
Fig 2.9: 3x8 Decoder Schematic
And here is the simulation graph:
Fig 2.10: Simulation of 3x8 Decoder
2.2.4 6x64 Decoder – We used two 3x8 decoders followed by 2 input AND gates to
obtain the 6x64 decoder logic. Here is the schematic.
Fig 2.11: Schematic of 6x64 Decoder
We kept inputs A1-A5 at 0 and swept A0 from 0 to 1.2V, and saw Y0
drop and Y1 rise to 1.2V.
Fig 2.12: Propagation Delay at the Critical Path for 6x64 Decoder
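The logic of this structure can be checked with a small functional model. It is a sketch only: the split of A5..A3 and A2..A0 between the two predecoders is our assumption for illustration, and the model says nothing about the domino timing.

```python
def decode_3x8(b2, b1, b0):
    """3x8 decoder: each output is one AND3 of true or complemented inputs."""
    outputs = []
    for i in range(8):
        literals = [b if (i >> k) & 1 else 1 - b
                    for k, b in ((2, b2), (1, b1), (0, b0))]
        outputs.append(literals[0] & literals[1] & literals[2])
    return outputs


def decode_6x64(address):
    """6x64 decoder from two 3x8 decoders and 64 AND2 gates."""
    a5, a4, a3, a2, a1, a0 = [(address >> k) & 1 for k in range(5, -1, -1)]
    high = decode_3x8(a5, a4, a3)   # assumed grouping: upper three address bits
    low = decode_3x8(a2, a1, a0)    # assumed grouping: lower three address bits
    return [high[i // 8] & low[i % 8] for i in range(64)]


# A1-A5 at 0 and A0 switching from 0 to 1 moves the active output from Y0 to Y1.
print(decode_6x64(0).index(1), decode_6x64(1).index(1))  # 0 1
```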
2.2.5 Decoder Resizing.
The delay we obtained after the initial design was 5.177ns – 5.025ns = 0.152ns when
running at 1GHz and driving a capacitance of 39.931fF. We computed this end
capacitance using a gate capacitance of 1fF/um and a width capacitance of
0.2fF/um. In this case the AND3 nfets have W1 = 1um with the rest sized by the ratio
1.3, its inverter nfet has W2 = 1um, the AND2 nfets have W3 = 1um and are sized
accordingly with ratio 1.3, and its inverter nfet has W4 = 1um.
To minimize the delay while keeping the rise and fall times equal, we optimized the
sizes as follows.
For AND3,
NFET Array: 2u, 2.6u, 3.38u, 4.395u
PFET: 3u
Keeper PFET: 800nm
Inverter: NFET – 3u
PFET – 2.9u
For AND2,
NFET Array: 5.8u, 7.54u, 9.8u
PFET: 3.2u
Keeper PFET: 2.2u
Inverter: NFET – 3u
PFET – 2.9u
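As a consistency check, the listed nfet array widths can be compared against the scaling ratio of 1.3 chosen earlier. The helper `stack_ratios` is a name introduced here for illustration.

```python
def stack_ratios(widths_um):
    """Ratio of each nfet width to the one before it in the stack."""
    return [round(b / a, 3) for a, b in zip(widths_um, widths_um[1:])]


and3_nfets = [2.0, 2.6, 3.38, 4.395]   # um, from the AND3 list above
and2_nfets = [5.8, 7.54, 9.8]          # um, from the AND2 list above

print(stack_ratios(and3_nfets))  # [1.3, 1.3, 1.3]
print(stack_ratios(and2_nfets))  # [1.3, 1.3]
```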
Here’s the critical path
Fig 2.13: Schematic of Critical Path in 6x64 Decoder
We obtained fall and rise times for the four stages of 33.94ps, 34.94ps, 33.23ps and
34.99ps. With this sizing, our propagation delay was reduced from 152ps to 89ps (1.594ns –
1.505ns = 89ps). Hence we kept these sizes.
Fig 2.14: Propagation of Critical Path in 6x64 Decoder after Optimization
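The improvement from resizing follows directly from the measured time stamps quoted above. The sketch below redoes the arithmetic; `delay_ps` is a helper name of our own.

```python
def delay_ps(t_out_ns, t_in_ns):
    """Propagation delay in ps from output and input crossing times in ns."""
    return round((t_out_ns - t_in_ns) * 1000.0, 1)


before = delay_ps(5.177, 5.025)   # initial sizing
after = delay_ps(1.594, 1.505)    # optimized sizing
reduction = 100.0 * (before - after) / before

print(before, after, round(reduction, 1))  # 152.0 89.0 41.4
```

The resizing therefore removes roughly 41% of the critical path delay.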
2.3 SRAM cell and array design
2.3.1 Precharge circuitry
The schematic of the precharge circuit is shown below. The pfets are 1um wide.
This large width is required to charge the bitlines quickly during
the pre-charge phase; it ensures that the bit lines BIT and BIT_B are
charged to VDD within half the clock cycle.
Fig 2.15: Schematic of Precharge Circuit
2.3.2 SRAM Cell.
Schematic of the cell is shown below. The sizes of the access transistors, inverter
nfet, pfet widths are as per the ones given in the layout.
Fig 2.16: Schematic of SRAM Cell
2.3.3 Read Sense Circuit
Schematic of the read large sense circuit is shown below. The basic NAND gate is
sized with an nfet width of 280nm and a pfet width of 560nm, a ratio of 4.8:1. This is the
ratio required in the 90nm process with a channel length of 80nm to achieve ideal rise and fall
times.
Fig 2.17: Schematic of Read Sense Circuit
2.3.4 Write driver
The write driver is enabled by a Write_enable line. The schematic is shown below.
Fig 2.18: Schematic of Write Circuit
2.3.5 The complete SRAM Array
Following is the schematic of the 64x64 bit SRAM array.
Fig 2.19: Schematic of SRAM Array
2.3.6 SRAM Array with Decoder
Here is the schematic of the complete SRAM with decoder:
Fig 2.20: Schematic of SRAM Array with 6X64 Decoder
2.3.7 Read Stability
This is an important characteristic of the SRAM cell. During a read operation one of the
bitlines, either BIT or BIT_B, is discharged through the access transistor and an nfet of the
inverter. During this discharge, a large current flows through node A
(shown below). Read stability is a measure of the potential at node A: this potential should
not exceed the switching threshold of the other inverter. If it does, the state of the
SRAM cell changes. An analogous analysis was done to identify the tradeoffs between read
current and static noise margin.
The read stability graph follows.
Fig 2.21: Simulation of Read Stability
2.4 DC SIMULATION
2.4.1 STATIC NOISE MARGIN
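Here is the schematic of the SRAM cell for static noise margin measurement. We sweep
the left node voltage and measure the right node voltage, then do the reverse, and plot
both curves on the same axes to get the butterfly curve. The SNM is the side of the largest
square that fits inside a lobe of the butterfly curve, taking the smaller of the two lobes.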
Fig 2.22: Schematic of SRAM Cell for Static Noise Margin Measurement
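The square-fitting step can be sketched numerically as follows. The transfer curve `vtc` is a hypothetical tanh-shaped stand-in with an assumed gain, not data from our simulation, and all function names are ours. The idea is that the diagonal of an axis-aligned square lies on a line y = x + c, so for each offset c we intersect that line with both curves and take the horizontal distance between the intersections as the side of the square.

```python
import math


def vtc(vin, vdd=1.2, gain=20.0):
    """Hypothetical smooth inverter transfer curve (assumed shape and gain)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd) / vdd))


def solve_decreasing(func, lo, hi, iters=60):
    """Root of a strictly decreasing function on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def square_side(c, f_a, f_b, vdd):
    """Signed side of the square whose diagonal lies on the line y = x + c."""
    x_a = solve_decreasing(lambda x: f_a(x) - (x + c), -vdd, 2 * vdd)  # curve y = f_a(x)
    x_b = solve_decreasing(lambda x: f_b(x + c) - x, -vdd, 2 * vdd)    # curve x = f_b(y)
    return x_a - x_b


def static_noise_margin(f_a, f_b, vdd=1.2, steps=400):
    """Smaller of the largest squares in the two lobes of the butterfly curve."""
    offsets = [vdd * k / steps for k in range(1, steps)]
    upper = max(square_side(c, f_a, f_b, vdd) for c in offsets)    # lobe above y = x
    lower = max(-square_side(-c, f_a, f_b, vdd) for c in offsets)  # lobe below y = x
    return min(upper, lower)


snm = static_noise_margin(vtc, vtc)
print(round(snm, 3), "V")
```

With measured sweep data, `f_a` and `f_b` would be replaced by interpolations of the two simulated curves; for a read SNM the two curves are taken with the word line high.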
(i) HOLD operation. We keep the gates of the pass transistors at GND and get
the following curve. The SNM for this case is 0.4604V.
Fig 2.23: Hold operation
(ii) READ – The SNM we obtained was 0.1616V. The graph is as follows.
Fig 2.24: Static Noise Margin estimation of SRAM Cell
2.4.2 Cell Read Current
Cell read current is the current that flows through the pass gate nfet connected to
the BL, draining charge from the BL into the cell ground terminal. The larger the current,
the faster the BL is discharged and develops a signal for the sensing circuit to detect.
However, a very large read current flowing through the discharge path from the bit line to
ground could exceed the read stability threshold. This can be
avoided by choosing optimal sizes for the access nfet and the discharge nfet of
the respective inverter used during a read operation cycle.
Fig 2.25: Cell Read Current Simulation
2.4.3 Effect of Threshold Voltage (Vt)
We raised Vt by 25mV, 50mV, 100mV and 200mV by adding a negative voltage at the
gate, and obtained the following SNM values (in V).
Vt increase   Pass nfet   Pull down nfet   Pfet
25mV          0.1638      0.1626           0.1518
50mV          0.1725      0.1655           0.1483
100mV         0.1900      0.1732           0.1422
200mV         0.2246      0.1778           0.1252
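The trend in the table is easier to see as a relative change. The sketch below assumes that the table entries are read SNM values in volts and that the unshifted reference is the 0.1616V read SNM from Section 2.4.1; both are our reading of the data, and the names are introduced here for illustration.

```python
BASELINE_READ_SNM = 0.1616  # V, assumed reference (read SNM from Section 2.4.1)

# Vt increase in mV -> SNM in V, per device whose Vt was raised
snm_v = {
    "pass nfet":      {25: 0.1638, 50: 0.1725, 100: 0.1900, 200: 0.2246},
    "pull down nfet": {25: 0.1626, 50: 0.1655, 100: 0.1732, 200: 0.1778},
    "pfet":           {25: 0.1518, 50: 0.1483, 100: 0.1422, 200: 0.1252},
}


def percent_change(snm, baseline=BASELINE_READ_SNM):
    """Relative SNM change in percent with respect to the baseline."""
    return round(100.0 * (snm - baseline) / baseline, 1)


for device, rows in snm_v.items():
    print(device, {shift: percent_change(v) for shift, v in rows.items()})
```

Under these assumptions, raising Vt on either nfet increases the SNM, most strongly for the pass nfet, while raising Vt on the pfet decreases it.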
Fig 2.26 - Effect on SNM of increasing Vt at pass nfet
Fig 2.27 - Effect on SNM of increasing Vt at pull down nfet
Fig 2.28 - Effect of increasing Vt at one end of pfet and measuring other side.
3. LAYOUT
3.1 DECODER
3.1.1 AND2 Gate.
Here is the layout of the AND2 gate, which passes both DRC and LVS.
Fig 3.1- DRC and LVS results for AND2 Gate along with layout.
3.1.2 AND3 Gate.
Here is the layout of the AND3 gate, which passes both DRC and LVS.
Fig 3.2- DRC and LVS results for AND3 Gate along with layout.
3.1.3 3x8 DECODER
Here is the layout of 3x8 Decoder which passes both DRC and LVS
Fig 3.3- DRC and LVS results for 3x8 Decoder along with layout.
3.1.4 6x64 DECODER
Here is the layout of the 6x64 Decoder, which passes both DRC and LVS.
Fig 3.4- DRC and LVS results for 6x64 Decoder along with layout.
3.2 SRAM
3.2.1 Precharge circuit layout
The width of the entire precharge circuit layout should be equal to the width between the
two bit lines BIT and BIT_B. Below is an image of our layout of this circuit with its DRC
and LVS results.
Fig 3.5- DRC and LVS results for Precharge Circuit along with layout
3.2.2 Read Sense Amp Circuit
In the layout of the read circuit, care has to be taken to ensure that it fits exactly
between the two bitlines. The symmetric lateral reflection layout of the SRAM cells adds
some complexity, because the bit lines now appear as a series of
BIT, BIT_B, BIT_B, BIT followed by the same pattern. For a read it is sufficient to sense
one of the bit lines, either BIT or BIT_B. Two read sense amps therefore have to fit
between the two BIT lines. The LVS results and the layout of the read sense amp can be
found in the image below.
Fig 3.6 DRC and LVS results for Read Sense Amplifier along with layout
3.2.3 SRAM 64 X 64 array
Using the SRAM cell provided in the standard library, we created a symmetrical,
laterally inverted 2 X 2 network of SRAM cells. This was done to share the power rails
well and to reduce bit line noise. Although not done in our
layout, cross coupling the bit lines would reduce the bit line noise considerably.
Using instances of the 2 X 2 SRAM cells, the entire array was laid out as a 64 X 32 top half
and a 64 X 32 bottom half, as shown in the schematic of phase two. The read sense
amplifiers were then inserted between the top and bottom halves of the
SRAM array layout. To the left of the image below is the layout of the 2 X 2
network of SRAM cells and to the right the 64 X 64 layout of SRAM cells.
Fig 3.7- Array of SRAM Cells, 2 X 2 and 64 X 64 arrays.
Image below shows the DRC test results:
Fig 3.8: DRC results for the 64 X 64 SRAM array
Here’s the complete layout of the SRAM array with decoder.
Fig 3.9: 64 X 64 SRAM array along with 6x64 Decoder
4. RESULTS
4.1 Simulation Results
4.1.1 Simulation for One Cell SRAM
We simulated a single cell SRAM with following schematic
Fig 4.1 – One Cell SRAM Schematic
Below is a graph showing the Write – 1 Read – 1 Write – 0 simulation on a single SRAM
cell.
Fig 4.2 – One Cell SRAM Simulation
4.1.2 Simulation for 64x64 bit SRAM Array
Here is the schematic used for 64x64 bit SRAM Array
Fig 4.3 –64x64 SRAM Array
and here are the simulation results, when din<0> = 1, din<1> = 0, and din<2> = 1 with
the address lines at 000000 and the clock running at 1 GHz.
Fig 4.4 – Simulation for complete 64x64 SRAM cell Array
4.2 DRC and LVS Results
The DRC and LVS were checked for each component individually. The following is
a summary of the results:
Functional Component   DRC      LVS
6 X 64 Decoder         Passed   Passed
Precharge              Passed   Passed
Read Sense Amp         Passed   Passed
64 X 64 SRAM array     Errors   Errors
Please find all reports to these tests at the following location on
http://vlsi2.cisl.columbia.edu
/home/user5/fall07/ssn2111/LVS_FinalReports
/home/user5/fall07/ssn2111/DRC_FinalReports
5. CONCLUSION
As the SRAM project for the EE 4321 VLSI course, we designed a 64x64 bit SRAM array at
both the schematic and layout level. We designed the 6x64 decoder from 3x8
decoders built with two and three input AND gates in Domino Logic. We
successfully simulated and verified the functionality of the components we set out to
design. We could not pass DRC and LVS for the entire unit, primarily because
the unit cell provided to us failed DRC and LVS, but we did pass DRC and LVS
for the other individual components, including the Pre-Charge, Read Sensing Circuit
and 6x64 Decoder.
Working on such a design oriented project gave us a thorough insight into
the critical issues that must be considered while designing even a simple unit. It also made
us familiar with the different approaches to implementing the same design and with
the tradeoffs between the alternatives. It made us aware of the critical
physical implementation issues that have to be considered not only during the actual layout
but also during schematic level design. It also gave us hands-on experience with CAD
tools like Cadence, Virtuoso, Spice and Spectre, which are widely used at both the industrial
and academic level for circuit design. Overall, it was a good experience in learning,
practicing and designing a critical part of the processor unit that is widely used in
computer architecture.
6. REFERENCES
1. http://en.wikipedia.org/wiki/Static_random_access_memory
2. CMOS Logic – Uyemura
3. CMOS VLSI Design – Weste & Harris
4. Static-Noise Margin Analysis of CMOS SRAM Cells – Evert Seevinck, Frans J. List
and Jan Lohstroh
5. Analyzing Static Noise Margin for Subthreshold SRAM in 65nm CMOS – Benton
H. Calhoun and Anantha Chandrakasan
6. Transistor Sizing for Reliable Domino Logic Design in Dual Threshold Voltage
Technologies – Seong-Ook Jung, Ki-Wook Kim and Sung-Mo (Steve) Kang