Memory Hierarchy

Memory Technologies

In principle, there are Static, Dynamic, and Non-volatile memory technologies.

Non-Volatile Memory

What all non-volatile memories have in common is that they retain their contents, i.e. the stored data, even if the power supply is switched off. They are randomly accessible and, with a few caveats, their contents cannot be changed. The family includes mask ROMs and PROMs (in both bipolar and MOS technologies) as well as EPROMs, EEPROMs, and Flash memory.

ROM (mask-programmable ROM)

This is a semiconductor type of read-only memory whose stored information is programmed during the manufacturing process. In essence, this type of ROM is an array of possible storage cells with a final layer of metal forming the interconnections that determine which of the cells will hold 0's and which will hold 1's. Although the contents may be read, they may not be written. It is programmed by the manufacturer at the time of production using a specially constructed template (the mask). All masked ROMs are therefore very similar, since most of their layers are the same, with the differences only being introduced in the final metal mask layer. Masked ROMs are typically manufactured in high volumes in order to minimise costs. They can cram a lot of data onto a relatively small chip area; unfortunately, because the information is programmed by the design of a mask used during the manufacturing process, it cannot be erased. Also, as the final metal layer is deposited at the chip factory, any repair of a masked ROM requires a long turnaround time: identifying the error(s), notifying the chip firm, and altering the mask before finally fabricating new chips. Any detected bug also has the unwanted effect of leaving the user with lots of worthless parts. Most commentators refer to mask-programmable ROMs simply as ROMs.

PROM

A programmable read-only memory (PROM) is part of the read-only memory (ROM) family. It is a programmable chip which, once written, cannot be erased and rewritten, i.e. it is read-only after being written. To implant programs or data into a PROM, a programming machine (called a PROM programmer) is used to apply the correct voltage, for the proper time, to the appropriate addresses selected by the programmer. As the PROM is simply an array of fusible links, the programming machine essentially blows the various unwanted links within the PROM, leaving the correct data patterns, a process which clearly cannot be reversed. Like the ROM, the PROM is normally used as the component within a computer that carries any permanent instructions the system may require.

EPROM

An erasable programmable read-only memory (EPROM) is a special form of semiconductor read-only memory that can be completely erased by exposure to ultraviolet light. The device is programmed in a similar way to the programmable read-only memory (PROM); however, it does not depend on a permanent fusible link to store information, but instead relies on charges stored on capacitors in the memory array. The capacitors determine the on/off state of transistors, which in turn determine the presence of 1's or 0's in the array.

The EPROM is so arranged that the information programmed into it can be erased, if required, by exposing the top surface of the package to ultraviolet radiation. This brings about an ionizing action within the package, which causes each memory cell to be discharged. EPROMs are easily identified physically by the clear window that covers the chip to admit the ultraviolet light. Once an EPROM has been erased, it can be reprogrammed with the matrix being used again to store new information. The user can then completely erase and reprogram the contents of the memory as many times as desired.

Intel first introduced the EPROM in 1971; however, the storage capacity has increased dramatically with improving IC technology. Current EPROMs can store multiple megabytes of information.

EEPROM

An electrically erasable programmable ROM (EEPROM) is closely related to the erasable programmable ROM (EPROM) in that it is programmed in a similar way, but the program is erased not with ultraviolet light but electrically. Erasure of the device is achieved by applying a strong electrical pulse, which removes the entire program, leaving the device ready to be reprogrammed. The voltages necessary to erase the EEPROM can either be applied to the device externally or (more often) generated within the host system, thereby allowing systems to be reprogrammed regularly without disturbing the EEPROM chips. In this way electrical erasability does yield certain benefits; however, this comes at the cost of fewer memory cells per chip and lower density than on a standard ROM or EPROM.

Flash Memory

A characteristic of Flash memories is that individual bytes can be addressed and read out, whereas write and erase operations work on blocks of addresses at a time. Read access times, currently about 100ns, are about double those of dynamic memories. The number of program and erase cycles is limited to about 100,000. In general, the retention of data is guaranteed over a period of 10 years. Among the various forms of Flash memories available are SIMM, PC Card (PCMCIA), Compact Flash (CF) Card, Miniature Card (MC), and Solid State Floppy Disc Card (SSFDC). Over and above their exterior appearance, there are two main types of Flash Memory modules:
Linear Flash and ATA Flash. Linear Flash modules have a linear address space and any address can be directly accessed from outside. On the other hand, for ATA Flash cards address conversion takes place internally, so that the addressing procedure is similar to that of a disk drive, a fact that may, for instance, simplify driver programming. Examples of the application of Flash modules are mass or program memories in notebooks, network routers, printers, PDAs, and digital cameras.

RAM (Random Access Memory)

Static RAM

This memory is based on transistor technology and does not require refreshing. It is random access and volatile, i.e. it loses its data if the power is removed. It consumes more power (thus generates more heat) than the dynamic type, and is significantly faster. It is often used in high-speed computers or as cache memory. Another disadvantage is that the technology uses more silicon space per storage cell than dynamic memory, so chip capacities are a lot less than those of dynamic chips. Access times of less than 15ns are currently available, whereas dynamic RAM has access times of greater than 30ns.

Dynamic Random Access Memory (DRAM)

Basic DRAM operation

In order to store lots of small items, we can divide the storage space up into small bins and put one item in each bin. If each item we're storing is unique and we're ever going to be able to retrieve a specific item, we need an organisational scheme to order the storage space. Sticking a unique, numerical address on each bin is the normal approach. The addresses start at some number and increment by one for each bin. If we wanted to search the entire storage space, we'd start with the lowest address and step through each successive one until we reach the highest address. Now, once we've got the storage space organised properly, we need a way to get the items into and out of it. For RAM storage, the data bus is what allows us to move data into and out of storage. And of course, since the storage space is organised, we need a way to tell the RAM exactly which location contains the exact data that we need; this is the job of the address bus. To the CPU, the RAM looks like one long, thin line of storage cells, each with a unique address. If the CPU wants a piece of data from RAM, it first places the address of the location on the address bus. It then waits a few cycles and listens on the data bus for the requested information to show up.

Fig 1 Simple Model of DRAM

The round dots in the middle are memory cells, and each one is hooked into a unique address line. The address decoder takes the address off the address bus and identifies which cell the address is referring to. It then activates that cell, and the data in it drops down into the data interface, where it's placed on the data bus and sent back to the CPU. The CPU sees those cells as a row of addressable storage spaces that hold 1 byte each, so it understands memory as a row of bytes. The CPU usually grabs data in 32-bit or 64-bit chunks, depending on the width of the data bus. So if the data bus is 64 bits wide and the CPU needs one particular byte, it'll go ahead and grab the byte it needs along with the 7 bytes that are next to it. It grabs 8 bytes at a time because: a) it wants to fill up the entire data bus with data every time it makes a request and b) it'll probably end up needing those other 7 bytes shortly. Memory designers organise the cells in a grid of individual bits and split up the address into rows and columns, which can be used to locate the individual bits needed. This way, if you wanted to store, say, 1024 bits, you could use a 32 x 32 grid to do so. RAM chips don't store whole bytes; rather, they store individual bits in a grid, which can be addressed one bit at a time. When the CPU requests an individual bit, it places an address in the form of a string of 22 binary digits (for the x86) on the address bus. The RAM interface then breaks that string of digits in half, and uses one half as an 11-digit row address and the other half as an 11-digit column address. The row decoder decodes the row address and activates the proper row line so that all the cells on that row become active. Then the column decoder decodes the column address and activates the proper column line, selecting which particular cell on the active row is going to have its data sent back out over the data bus by the data interface. Also, note that the grid does not have to be square; it is usually a rectangle where the number of rows is less than the number of columns.
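As a rough illustration of the split described above, the following C sketch decomposes a flat 22-bit address into an 11-bit row and an 11-bit column. The bit widths and the example address come from the text's x86 example only; real chips define their own row and column widths.

    #include <stdio.h>
    #include <stdint.h>

    /* Split a flat 22-bit address into an 11-bit row and an 11-bit column,
       as in the example above (illustrative split only). */
    #define COL_BITS 11
    #define COL_MASK ((1u << COL_BITS) - 1u)

    int main(void)
    {
        uint32_t address = 0x2ABCDE;             /* an arbitrary 22-bit address     */
        uint32_t row = address >> COL_BITS;      /* upper 11 bits -> row address    */
        uint32_t col = address & COL_MASK;       /* lower 11 bits -> column address */
        printf("address 0x%06X -> row %u, column %u\n",
               (unsigned)address, (unsigned)row, (unsigned)col);
        return 0;
    }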

Figure 2 Row and Column Addressing

The cells are comprised of capacitors and are addressed via row and column decoders, which in turn receive their signals from the RAS and CAS clock generators. In order to minimise the package size, the row and column addresses are multiplexed into row and column address buffers. For example, if there are 11 address lines, there will be 11 row and 11 column address buffers. Access transistors called 'sense amps' are connected to each column and provide the read and restore operations of the chip.

DRAM Read

1) The row address is placed on the address pins via the address bus.
2) The /RAS pin is activated, which places the row address onto the Row Address Latch.
3) The Row Address Decoder selects the proper row to be sent to the sense amps.
4) The Write Enable (not pictured) is deactivated, so the DRAM knows that it's not being written to.
5) The column address is placed on the address pins via the address bus.
6) The /CAS pin is activated, which places the column address on the Column Address Latch.
7) The /CAS pin also serves as the Output Enable, so once the /CAS signal has stabilised, the sense amps place the data from the selected row and column on the Data Out pin so that it can travel the data bus back out into the system.
8) /RAS and /CAS are both deactivated so that the cycle can begin again.
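Purely as a toy model of steps 1 to 7 above (ignoring the actual timing and control signals), this C sketch latches a whole row into a "sense amp" buffer and then selects one column from it:

    #include <stdio.h>

    #define ROWS 8
    #define COLS 8

    static unsigned char cells[ROWS][COLS];     /* the bit grid            */
    static unsigned char sense_amps[COLS];      /* latched copy of one row */

    /* Toy model only: activate a row, then pick one column from it. */
    static unsigned char dram_read(int row, int col)
    {
        for (int c = 0; c < COLS; c++)          /* /RAS: activate the row  */
            sense_amps[c] = cells[row][c];
        return sense_amps[col];                 /* /CAS: select the column */
    }

    int main(void)
    {
        cells[3][5] = 1;                        /* store a 1 somewhere     */
        printf("bit at row 3, column 5 = %u\n", dram_read(3, 5));
        return 0;
    }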

Figure 3 DRAM Read

One of the problems with DRAM cells is that they leak their charges out over time, so that charge has to be periodically replenished. Reading from or writing to a DRAM cell refreshes its charge, so the most common way of refreshing a DRAM is to read periodically from each cell. This isn't quite as bad as it sounds, for a couple of reasons. First, you can sort of cheat by only activating each row using /RAS, which is how refreshing is normally done. Second, the DRAM controller takes care of scheduling the refreshes and making sure that they don't interfere with regular reads and writes. So, to keep the data in the DRAM chip from leaking away, the DRAM controller periodically sweeps through all of the rows by cycling /RAS repeatedly and placing a series of row addresses on the address bus. A RAM grid is usually organised as a rectangle rather than a perfect square. With DRAMs, it is advantageous to have fewer rows and more columns, because the fewer rows you have, the less time it takes to refresh them all. Even though the DRAM controller handles all the refreshes and tries to schedule them for maximum performance, having to go through and refresh each row every few milliseconds can seriously get in the way of reads and writes and thus impact the performance of DRAM. EDO, Fast Page Mode, and the various other types of DRAM are mainly distinguished by the ways in which they try to get around this potential bottleneck.

Each of the cells in an SRAM or DRAM chip holds only a 1 or a 0. Also, the early DRAM and SRAM chips only had one Data In and one Data Out pin apiece. Now, the CPU actually sees main memory as a long row of 1-byte cells, not 1-bit cells. Therefore, to store a complete byte, eight 1-bit RAM chips are simply stacked together, with each chip storing one bit of the final byte. This involves feeding the same address to all eight chips and having each chip's 1-bit output go to one line of the data bus. The following diagram should help you visualise the layout. (To save space, a 4-bit configuration is shown; this could be extended to eight bits by just adding four more chips and four more data bus lines. Just imagine that the picture below is twice as wide, and that there are eight chips on the module instead of four.)

Figure 4 DRAM Organisation

Combining eight chips on one printed circuit board (PCB) with a common address and data bus makes an 8-bit RAM module. In the above picture, it is assumed that the address bus is 22 bits wide and the data bus is 8 bits wide. This means that each single chip in the module holds 2^22 or 4,194,304 bits. When the eight chips are put together on the module, with each of their 1-bit outputs connected to a single line of the 8-bit data bus, the module appears to the CPU to hold 4,194,304 cells of 8 bits (1 byte) each, i.e. as a 4MB module. So the CPU asks the module for data in 1-byte chunks from one of the 4,194,304 virtual 8-bit locations. In RAM notation, we say that this 4MB module is a 4194304 x 8 module (or alternatively, a 4M x 8 module; note that the M in 4M is not equal to MB or megabyte, but to Mb or megabit).
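The arithmetic behind the 4M x 8 notation can be checked with a few lines of C; the figures below simply restate the 22-bit address bus and the eight 1-bit-wide chips assumed in the text:

    #include <stdio.h>

    int main(void)
    {
        unsigned long addresses    = 1UL << 22;  /* 22-bit address bus: 4,194,304 locations */
        unsigned long chips        = 8;          /* eight 1-bit-wide chips on the module    */
        unsigned long module_bytes = addresses * chips / 8;  /* 8 bits = 1 byte per address */

        printf("bits per chip : %lu\n", addresses);
        printf("module size   : %lu bytes (%lu MB)\n",
               module_bytes, module_bytes / (1024 * 1024));
        return 0;
    }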

Figure 5 SIMM Module

The CPU likes to fill up its entire 32-bit or 64-bit data bus when it fetches data. So, instead of stacking the outputs of multiple chips together on one module, the outputs of multiple modules are stacked together into one RAM bank; Figure 6 shows one bank of four 8-bit modules. Assume that each chip in each module is a 4194304 x 1 chip, making each module a 4194304 x 8 (4 MB) module. The bank then, with the 8-bit data buses from each module combined, gives a bus width of 32 bits.

Figure 6 RAM Bank

The 16MB of memory that the above bank represents is broken up between the modules so that each module stores every fourth byte. So, module 1 stores byte 1, module 2 stores byte 2, module 3 stores byte 3, module 4 stores byte 4, module 1 stores byte 5, module 2 stores byte 6, and so on up to byte 16,777,216. This is done so that when the CPU needs a particular byte, it can not only grab the byte it needs but also put the rest of the adjacent bytes on the data bus and bring them all in at the same time. To add memory to a system like this, you can do one of two things. The first option would be to increase the size of the bank by increasing the size of each individual module by the same amount. Say you wanted 32MB of memory; you'd increase the amount of storage on each module from 4MB to 8MB. The other option would be to add more banks. The example above shows what a RAM bank on some i486 systems would actually have looked like, with each of the modules being a 30-pin, single-sided SIMM. While 8 bits worth of data pins in a DRAM bank actually makes the memory organisation of a single SIMM a bit simpler and easier to understand, putting 16 or more bits worth of data pins on a single chip can actually make things more confusing.
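Going back to the byte-striping rule at the start of this section, a minimal C sketch of the mapping (assuming the four-module bank described above, with byte numbering starting at 1 to match the text) might look like this:

    #include <stdio.h>

    int main(void)
    {
        /* Byte striping across four modules: byte 1 -> module 1,
           byte 2 -> module 2, ..., byte 5 -> module 1, and so on. */
        for (unsigned long byte_no = 1; byte_no <= 9; byte_no++) {
            unsigned module = (unsigned)((byte_no - 1) % 4) + 1;
            printf("byte %lu -> module %u\n", byte_no, module);
        }
        return 0;
    }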

The DIMM in Figure 7 is the Texas Instruments TM124BBJ32F. The TM124BBJ32F is a 4MB, 32-bit-wide DRAM module which has only two RAM chips on it. This means that each chip is 16 bits wide and holds 2MB. Externally, however, to the system as a whole, the module appears to be made up of four 1M x 8-bit DRAM chips. Each of those 16-bit-wide chips is almost like a mini DRAM module in itself, with an upper and a lower 1M x 8-bit half, where each half has its own CAS and RAS signals.

Memory Latency

There are two main types of delays that we have to take into account. The first type includes the delays that have to take place between successive DRAM reads. You can't just fire off a read and then fire off another one immediately afterwards. Since a DRAM read involves charging and recharging capacitors, and various control signals have to propagate hither and thither so that the chip will know what it's doing, you have to stick some space in between reads so that all the signals can settle back down and the capacitors can recharge. Of this first type of in-between-reads delay, there's only one that's going to concern us really, and that's the /RAS and /CAS precharge delay. After /RAS has been active and you deactivate it, you've got to give it some time to charge back up before you can activate it again. Figure 8 shows this.

Figure 8 Asynchronous DRAM timing

The same goes for the /CAS signal as well; in fact, to visualise the /CAS precharge delay, just look at the above picture and replace the term RAS with CAS.

The /RAS and /CAS precharge delays can be thought of in light of the list of DRAM read steps above; it is this rest period which limits the number of reads that can occur in a given period of time. Specifically, step 8 dictates that you've got to deactivate /RAS and /CAS at the end of each cycle, so the fact that after you deactivate them you've got to wait for them to precharge before you can use them again means you have to wait a while in between reads (or writes, or refreshes, for that matter). This precharge time in between reads isn't the only thing that limits DRAM operations, either. The other type of delay that concerns us is internal to a specific read. Just as the in-between-reads delay is associated with deactivating /RAS and /CAS, the inside-the-read delay is associated with activating /RAS and /CAS. For instance, the row access time (tRAC) is the minimum amount of time you have to wait between the moment you activate /RAS and the moment the data you want can appear on the data bus. Likewise, the column access time (tCAC) is the minimum delay between the moment you activate /CAS and the moment the data can appear on the data bus. Think of tRAC and tCAC as the amount of time it takes the chip to fill an order you just placed at the drive-in window: you place your order (the row and column address of the data you want), and the chip has to go and fetch the data for you so it can place it on the data pins. Figure 8 shows how the two types of delays work.

Figure 8 Row Access Time

Figure 9 shows both types of delay in action in a series of DRAM reads.

Figure 9 Complete DRAM timing diagram

Latency

There are two important types of latency ratings for DRAMs: access time and cycle time, where access time is related to the second type of delays we talked about (those internal to the read cycle) and cycle time is related to the first (those in between read cycles). Both ratings are given in nanoseconds.

For asynchronous DRAM chips, the access time describes the amount of time between when you drop the row address on the address pins and when you can expect the data to show up at the data pins. Going back to our drive-in analogy, the access time is the time between when you place your order and when your food shows up at the window. So a DIMM with a 60ns rating takes at least 60ns to get your data to you after you've placed the row address (which is of course followed by the column address) on the pins. Cycle time is the amount of time you have to wait in between successive read operations. Minimising both cycle time and access time is what the next two types of DRAM aim to do.

FPM DRAM

Fast Page Mode DRAM is so called because it squirts out data in 4-word bursts (a word is whatever the default memory chunk size is for the DRAM, usually a byte), where the four words in each burst all come from the same row, or page. For the read that fetches the first word of that four-word burst, everything happens like a normal read: the row address is put on the address pins, /RAS goes active, the column address is put on the address pins, /CAS goes active, and so on. It's the next three successive reads that look kind of strange. At the end of that initial read, instead of deactivating /RAS and then reactivating it to take the next row address, the controller just leaves /RAS active for the next three reads. Since the four words all come from the same row but different columns, there's no need to keep sending in the same row address. The controller just leaves /RAS active so that to get the next three words all it has to do is send in three column addresses. To sum up, you give the FPM DRAM the row and column addresses of the initial word required, and then access three more words on that same row by simply providing three column addresses and activating /CAS once for each new column.

Figure 10 FPM Timing

It can be seen from Figure 10 that FPM is faster than a regular read because it takes the delays associated with both /RAS (tRAC and the /RAS precharge) and the row address out of the equation for three of the four reads. All you have to deal with are the /CAS-related delays for those last three reads, which makes for less overhead and faster access and cycle times. The first read takes a larger number of CPU cycles to complete (say, 6), and the next three each take a smaller number of cycles (say, 3), giving a 6-3-3-3 burst. One important thing to notice in the FPM DRAM diagram is that you can't latch the column address for the next read until the data from the previous read is gone. Notice that the Column 2 block doesn't overlap with the Data 1 block, nor does the Column 3 block overlap with the Data 2 block, and so on. The output for one read has to be completely finished before the next read can be started by placing the column address on the bus, so there's a small delay imposed, as depicted in Figure 11.
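Using the purely illustrative cycle counts above (6 cycles for the initial read, 3 for each of the following reads in the same page), a quick C calculation shows why the 6-3-3-3 burst wins over four independent reads:

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative figures from the text, not from any specific part. */
        int initial = 6, subsequent = 3, words = 4;
        int fpm_cycles    = initial + (words - 1) * subsequent;  /* 6+3+3+3 = 15 */
        int nonfpm_cycles = words * initial;                     /* 4 x 6  = 24 */

        printf("FPM burst of %d words  : %d cycles\n", words, fpm_cycles);
        printf("Four independent reads : %d cycles\n", nonfpm_cycles);
        return 0;
    }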

Figure 11

EDO RAM

EDO RAM, unlike FPM, can start a read before the previous read's data is gone. With EDO DRAM, you can hold output data on the pins for longer, even if it means that the data from one read is on the pins at the same time that you're latching in the column address for the next read.

Figure 12

A new access cycle can be started while keeping the data output of the previous cycle active. This allows a certain amount of overlap in operation (pipelining). When EDO first came out, there were claims of anywhere from 20% to 40% better performance. Since EDO can put out data faster than FPM, it can be used with faster bus speeds. With EDO, you could increase the bus speed up to 66MHz without having to insert wait states.

SDRAM

A major technology change occurred around 1997, when SDRAM (Synchronous DRAM) first entered the marketplace. This was a completely new technology, which operates synchronously with the system bus.

Data can (in burst mode) be fetched on every clock pulse. Thus the module can operate fully synchronised with (at the same beat as) the bus, without so-called wait states (inactive clock pulses). Because they are linked synchronously to the system bus, SDRAM modules can run at much higher clock frequencies. Synchronous dynamic random access memory (SDRAM) is made up of multiple arrays of single-bit storage sites arranged in a two-dimensional lattice structure formed by the intersection of individual rows (word lines) and columns (bit lines). These grid-like structures, called banks, provide an expandable memory space that allows the host processor and other system components with direct access to main system memory to temporarily write and read data to and from a centralised storage location.

When associated in groups of two (DDR), four (DDR2) or eight (DDR3), these banks form the next higher logical unit, known as a rank.

Figure 13

Figure 13 shows the typical functional arrangement of SDRAM memory space. It shows a dual-sided, dual-rank 2GB SDRAM module, which contains a total of 16 ICs, eight per side. Each IC contains eight banks of addressable memory space comprising 16K pages and 1K column address starting points, with each column storing a single 8-bit word. This brings the total memory space to 128MB (16,384 rows/bank x 1,024 column addresses/row x 1 byte/column address x 8 stacked banks) per IC. And since there are eight ICs per rank, Rank 1 is 1GB (128MB x 8 ICs) in size, with the same for Rank 2, for a grand total of 2GB per module.

In all the types of RAM covered so far, /RAS and /CAS have to be precharged before they can be used again after being deactivated. In an SDRAM module with two banks, you can have one bank busy precharging while the other bank is being used. This is known as interleaving. Interleaving allows banks of SDRAM to alternate their refresh and access cycles: one bank undergoes its refresh cycle while another is being accessed. This improves the performance of the SDRAM by masking the refresh time of each bank.

SDRAM Control

Not only does an SDRAM's organisation into banks distinguish it from other types of DRAM, but so does the way it's controlled. Since asynchronous DRAM doesn't share any sort of common clock signal with the CPU and chipset, the chipset has to manipulate the DRAM's control pins based on all sorts of timing considerations. SDRAM, however, shares the bus clock with the CPU and chipset, so the chipset can place commands (certain predefined combinations of signals) on its control pins on the rising clock edge.

Figure 14

Figure 14 shows the steps required, broken down by clock cycle.

Clock 1: ACTIVATE the row by turning on /CS and /RAS. The row address is placed on the address bus to determine which row to activate.

Clock 3: READ the column required from the activated row by turning on /CAS while placing the column's address on the address bus.

Clocks 5-10: The data from the row and column that you gave the chip goes out onto the Data Bus, followed by a BURST of other columns, the order of which depends on which BURST MODE has been set.

While asynchronous DRAMs like EDO and FPM are designed to allow you to burst data onto the bus by keeping a row active and selecting only columns, SDRAM takes this a step further by giving the facility to program a chip to deliver data bursts in predefined sequences.

SDRAM CAS timing

The last aspect of SDRAM that bears looking at is CAS latency. When looking at memory data sheets, a series of numbers separated by dashes gives the timings of the device; e.g. a data sheet may refer to 9-9-9-24 (2T) for a memory chip. These refer to CAS-tRCD-tRP-tRAS and CMD (respectively), and these values are measured in clock cycles. To understand what these mean:

CAS Latency (1st number)

Since data is often accessed sequentially (same row), the CPU only needs to select the next column in the row to get the next piece of data. CAS Latency is the delay between the CAS signal and the availability of valid data on the data pins. Therefore, the latency between column accesses (CAS) plays an important role in the performance of the memory. The lower the latency, the better the performance; however, the memory modules must be capable of supporting low latency settings.

tRCD (2nd number)

There is a delay from when a row is activated to when the cell (or column) is activated via the CAS signal and data can be written to or read from a memory cell, i.e. the RAS-to-CAS delay. This delay is called tRCD. When memory is accessed sequentially, the row is already active and tRCD does not have much impact. However, if memory is not accessed in a linear fashion, the current active row must be deactivated and then a new row selected/activated. It is in this case that low tRCDs can improve performance.

tRP (3rd number)

tRP is the time required to switch between rows. Therefore, in conjunction with tRCD, the time (or number of clock cycles) required to switch banks (or rows) and select the next cell for either reading, writing or refreshing is a combination of tRP and tRCD.

tRAS (4th number)

Memory architecture is like a spreadsheet with row upon row and column upon column, with each row being one bank. In order for the CPU to access memory, it must first determine which row (or bank) of the memory is to be accessed and activate that row via the RAS signal. Once activated, the row can be accessed over and over until the data is exhausted. This is why tRAS has little effect on overall system performance, but it can impact system stability if set incorrectly.

Command Rate

The Command Rate is the time needed between the chip select signal and when commands can be issued to the RAM module IC. Typically, this is either 1 clock or 2.
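As a rough sanity check of what the 9-9-9-24 example above means in real time, the following C sketch converts the cycle counts to nanoseconds. The 667 MHz memory clock used here is an assumed value (roughly a DDR3-1333 part), not something taken from a specific data sheet; in practice the clock quoted in the device's own data sheet must be used.

    #include <stdio.h>

    int main(void)
    {
        double clock_mhz    = 667.0;                 /* assumed memory clock        */
        double ns_per_cycle = 1000.0 / clock_mhz;    /* roughly 1.5 ns per cycle    */
        int cas = 9, trcd = 9, trp = 9, tras = 24;   /* the 9-9-9-24 example above  */

        printf("cycle time : %.2f ns\n", ns_per_cycle);
        printf("CAS latency: %.1f ns\n", cas  * ns_per_cycle);
        printf("tRCD       : %.1f ns\n", trcd * ns_per_cycle);
        printf("tRP        : %.1f ns\n", trp  * ns_per_cycle);
        printf("tRAS       : %.1f ns\n", tras * ns_per_cycle);
        return 0;
    }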

Bank interleaving

SDRAM divides memory into two to four banks for simultaneous access to more data. This division and simultaneous access is known as interleaving. Using a notebook analogy, two-way interleaving is like dividing each page in a notebook into two parts and having two assistants each retrieve a different part of the page. Even though each assistant must take a break (be refreshed), breaks are staggered so that at least one assistant is working at all times. Therefore, they retrieve the data much faster than a single assistant could get the same data from one whole page, especially since no data can be accessed when a single assistant takes a break. In other words, while one memory bank is being accessed, the other bank remains ready to be accessed. This allows the processor to initiate a new memory access before the previous access completes, and results in continuous data flow. In an interleaved memory system, there are still two physical banks of DRAM, but logically the system sees one bank of memory that is twice as large. In the interleaved bank, the first long word of bank 0 is followed by the first long word of bank 1, which is followed by the second long word of bank 0, which is followed by the second long word of bank 1, and so on. Figure 3 shows this organisation for two physical banks of N long words. All even long words of the logical bank are located in physical bank 0 and all odd long words are located in physical bank 1.
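A minimal C sketch of the even/odd mapping just described (two physical banks, long-word granularity) is shown below:

    #include <stdio.h>

    int main(void)
    {
        /* Two-way interleaving: even long words go to physical bank 0,
           odd long words to physical bank 1, while the system sees one
           logical bank twice the size. */
        for (unsigned long word = 0; word < 8; word++) {
            unsigned bank        = (unsigned)(word % 2);   /* physical bank         */
            unsigned long offset = word / 2;               /* long word within bank */
            printf("logical long word %lu -> bank %u, offset %lu\n",
                   word, bank, offset);
        }
        return 0;
    }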

DDR DRAM

DDR DRAM is basically just a more advanced version of SDRAM, with an added twist at the data pins. SDRAM transfers its commands, addresses, and data on the rising edge of the clock. Like regular SDRAM, DDR DRAM transfers its commands and addresses on the rising edge of the clock, but unlike SDRAM it contains special circuitry behind its data pins that allows it to transfer data on both the rising and falling edges of the clock. So DDR can transfer two data words per clock cycle, as opposed to SDRAM's one word per clock cycle, effectively doubling the speed at which it can be read from or written to under optimal circumstances. Thus the DDR in DDR DRAM stands for Double Data Rate, a name it gets from this ability to transfer twice the data per clock compared with SDRAM.

There are presently three generations of DDR memories:

1. DDR1 memory, with a maximum rated clock of 400 MHz and a 64-bit (8-byte) data bus, is now becoming obsolete and is no longer being produced in massive quantities.

2. DDR2 memory is the second generation of DDR memory. DDR2 starts at a speed of 400 MHz at the lowest, while 400 MHz is actually the highest speed for DDR1; DDR2 therefore takes over where DDR1 leaves off. It may seem strange, but due to its different latencies a 400 MHz DDR1 module will outperform a 400 MHz DDR2 module; the advantage shifts back to DDR2 at the next speed step, 533 MHz, which DDR1 cannot reach.

3. DDR3 is the third generation of DDR memory. DDR3 memory provides a reduction in power consumption of 30% compared to DDR2 modules, due to DDR3's 1.5 V supply voltage, compared with DDR2's 1.8 V or DDR1's 2.5 V. The main benefit of DDR3 comes from the higher bandwidth made possible by DDR3's 8-burst-deep prefetch buffer, in contrast to DDR2's 4-burst-deep or DDR1's 2-burst-deep prefetch buffer.

DDR3 modules can transfer data at a rate of 800–2133 MT/s (megatransfers per second) using both the rising and falling edges of a 400–1066 MHz I/O clock.
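As a worked example of the MT/s figures above, this C sketch computes the transfer rate and peak bandwidth for one assumed operating point (an 800 MHz I/O clock and a 64-bit module data bus); other points in the quoted range scale the same way.

    #include <stdio.h>

    int main(void)
    {
        double io_clock_mhz   = 800.0;                  /* assumed I/O clock          */
        double transfers_mt_s = 2.0 * io_clock_mhz;     /* data on both clock edges   */
        double bus_bytes      = 8.0;                    /* 64-bit module data bus     */
        double bandwidth_mb_s = transfers_mt_s * bus_bytes;

        printf("I/O clock      : %.0f MHz\n",  io_clock_mhz);
        printf("Transfer rate  : %.0f MT/s\n", transfers_mt_s);
        printf("Peak bandwidth : %.0f MB/s\n", bandwidth_mb_s);
        return 0;
    }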

Memory Voltages

The industry-standard operating voltage for computer memory components was originally 5 volts. However, as cell geometries decreased, memory circuitry became smaller and more sensitive, and the industry-standard operating voltage decreased likewise. Today, computer memory components can operate as low as 1.5 volts, which allows them to run faster and consume less power.

Bandwidth

The bandwidth capacity of the memory bus increases with its width (in bits) and its frequency (in MHz). By transferring 8 bytes (64 bits) at a time and running at 100 MHz, SDRAM increases memory bandwidth to 800 MB/s, 50 percent more than EDO DRAM (533 MB/s at 66 MHz).

DIMM error detection/correction technologies

Memory modules used in servers are inherently susceptible to memory errors. Memory cells must be continuously recharged (refreshed) to preserve the data, and the operating voltage of the memory device determines the level of the electrical charge. However, if a capacitor's charge is affected by some external event, the data may become incorrect. Depending on the cause, a memory error is referred to as either a hard or a soft error. A hard error is caused by a broken or defective piece of hardware, so the device consistently returns incorrect results. For example, a memory cell may be stuck so that it always returns a "0" bit, even when a "1" bit is written to it. Hard errors can be caused by DRAM defects, bad solder joints, connector issues, and other physical issues. Soft errors are more prevalent. They occur randomly when an electrical disturbance near a memory cell alters the charge on the capacitor. A soft error does not indicate a problem with a memory device, because once the stored data is corrected (for example, by a write to a memory cell), the same error does not recur. Two trends increase the likelihood of memory errors in servers:

• expanding memory capacity, and
• increasing storage density.

Also two parameters of DRAM are inextricably tied together:

• the storage density of the DRAM chips, and
• the operating voltage of the memory system.

As the size of memory cells decreases, both DRAM storage density and memory-cell voltage sensitivity increase. Initially, industry-standard DIMMs operated at 5 volts. However, due to improvements in DRAM storage density, the operating voltage decreased first to 3.3 V, then 2.5 V, and then 1.8 V, to allow memory to run faster and consume less power. Because memory storage density is increasing and operating voltage is shrinking, there is a higher probability that an error may occur.

Basic ECC Memory

Parity checking detects only single-bit errors. It does not correct memory errors or detect multi-bit errors. Every time data is written to memory, ECC (Error Correction Code) logic uses a special algorithm to generate values called check bits. The algorithm adds the check bits together to calculate a checksum, which it stores with the data. When data is read from memory, the algorithm recalculates the checksum and compares it with the checksum of the written data. If the checksums are equal, then the data is valid and operation continues. If they are different, the data has an error, and the ECC memory logic isolates the error and reports it to the system. In the case of a single-bit error, the ECC memory logic can correct the error and output the corrected data so that the system continues to operate.

In addition to detecting and correcting single-bit errors, ECC detects (but does not correct) errors of two random bits and up to four bits within a single DRAM chip.
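To illustrate the idea behind parity and check bits in the simplest possible form, here is a minimal C sketch of even parity over a single byte. Real ECC memory uses multi-bit codes (such as SECDED Hamming codes) that can also locate and correct the error, which this sketch does not attempt.

    #include <stdio.h>
    #include <stdint.h>

    /* Even parity: the stored parity bit makes the total number of 1 bits
       even, so any single flipped bit is detected on read-back. */
    static unsigned parity(uint8_t byte)
    {
        unsigned ones = 0;
        for (int i = 0; i < 8; i++)
            ones += (byte >> i) & 1u;
        return ones & 1u;                 /* 1 if the count of 1 bits is odd */
    }

    int main(void)
    {
        uint8_t data = 0x5A;
        unsigned stored_parity = parity(data);

        uint8_t corrupted = data ^ 0x08;  /* flip one bit "in storage" */
        if (parity(corrupted) != stored_parity)
            printf("single-bit error detected (but not located or corrected)\n");
        return 0;
    }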
