Main Memory Background
• Random Access Memory (vs. Serial Access Memory)
• Cache uses SRAM: Static Random Access Memory
– No refresh (6 transistors/bit vs. 1 transistor)
– DRAM wins on size and cost; SRAM wins on speed
• Main Memory is DRAM: Dynamic Random Access Memory
– Dynamic since it needs to be refreshed periodically
– Addresses divided into 2 halves (memory as a 2D matrix):
• RAS or Row Access Strobe
• CAS or Column Access Strobe
SRAM vs. DRAM
• DRAM = Dynamic RAM
• SRAM: 6T per bit
– built with normal high-speed CMOS technology
• DRAM: 1T per bit
– built with a special DRAM process optimized for density
Hardware Structures
[Figure: SRAM cell with a wordline and two bitlines (b, b̄) vs. DRAM cell with a wordline and a single bitline]
DRAM Chip Organization
[Figure: the row address feeds a row decoder that selects a row of the memory cell array; sense amps latch the row into the row buffer; the column address and column decoder select data from the row buffer onto the data bus]
DRAM Chip Organization (2)
• Differences with SRAM
• reads are destructive: contents are erased after reading
– row buffer
• read lots of bits all at once, and then parcel them out based on different column addresses
– similar to reading a full cache line, but only accessing one word at a time
• “Fast-Page Mode” FPM DRAM organizes the DRAM row to contain bits for a complete page
– row address held constant, and then fast reads from different locations in the same page
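The row-buffer behavior above can be sketched with a toy timing model. The latency numbers, the row/column address split, and the function name are illustrative assumptions, not values from the slides or any datasheet:

```python
# Hypothetical timings: a row miss pays RAS + CAS, a row hit pays only CAS.
ROW_ACCESS_NS = 40   # RAS: open a row into the row buffer
COL_ACCESS_NS = 15   # CAS: read a column from the open row

def access_latency(addresses, col_bits=10):
    """Total latency of a sequence of accesses, reusing the row buffer
    whenever the next address falls in the currently open row."""
    open_row = None
    total = 0
    for addr in addresses:
        row = addr >> col_bits          # upper bits select the row
        if row != open_row:             # row miss: must strobe RAS first
            total += ROW_ACCESS_NS
            open_row = row
        total += COL_ACCESS_NS          # every access pays the CAS latency
    return total

# Four words in the same row: one RAS, four CAS.
print(access_latency([0, 4, 8, 12]))          # 40 + 4*15 = 100
# Four words in four different rows: four RAS + four CAS.
print(access_latency([0, 1024, 2048, 3072]))  # 4*(40+15) = 220
```

The second sequence is more than twice as slow even though it reads the same number of words, which is exactly the page-mode advantage the slide describes.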
Refresh
• So after a read, the contents of the DRAM cell are gone
• The values are stored in the row buffer
• Write them back into the cells for the next read in the future
[Figure: sense amps writing the row buffer contents back into the DRAM cells]
Refresh (2)
• Fairly gradually, the DRAM cell will lose its contents even if it’s not accessed (gate leakage slowly drains the stored charge from a 1 toward a 0)
– This is why it’s called “dynamic”
– Contrast to SRAM, which is “static” in that once written, it maintains its value forever (so long as power remains on)
• All DRAM rows need to be regularly read and re-written
• If it keeps its value even when power is removed, then it’s “non-volatile” (e.g., flash, HDD, DVDs)
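A quick back-of-the-envelope on what regular refresh implies. The 64 ms retention window and 8192 rows per bank are common figures, assumed here purely for illustration:

```python
# Assumed typical values: every row must be refreshed within RETENTION_MS,
# so with ROWS rows the controller must issue a row refresh at a steady
# interval of RETENTION_MS / ROWS.
RETENTION_MS = 64
ROWS = 8192

interval_us = RETENTION_MS * 1000 / ROWS   # refresh-command spacing
print(f"one row refresh every {interval_us:.4f} us")  # 7.8125 us
```

So under these assumptions the controller steals a short window every few microseconds to read a row and write it back, which is invisible to software but does consume a small slice of DRAM bandwidth.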
DRAM Read Timing
Accesses are asynchronous: triggered by RAS and CAS signals, which can in theory occur at arbitrary times (subject to DRAM timing constraints)
SDRAM Read Timing
[Figure: SDRAM read timing, showing a multi-word burst (burst length)]
• Double-Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock
• Command frequency does not change
Timing figures taken from “A Performance Comparison of Contemporary DRAM Architectures” by Cuppu, Jacob, Davis, and Mudge
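The bandwidth effect of transferring on both clock edges is simple arithmetic. The clock rate and bus width below are illustrative example numbers, not taken from the slides:

```python
# Assumed example bus: 200 MHz clock, 64-bit (8-byte) data path.
clock_hz = 200e6
bus_bytes = 8

sdr_bw = clock_hz * bus_bytes        # one transfer per clock cycle
ddr_bw = clock_hz * bus_bytes * 2    # transfers on both rising and falling edges

print(sdr_bw / 1e9, "GB/s vs", ddr_bw / 1e9, "GB/s")  # 1.6 GB/s vs 3.2 GB/s
```

Note that only the data rate doubles; commands are still issued once per clock, which is why the command frequency in the timing diagram does not change.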
Dynamic RAM
• SRAM cells exhibit high speed/poor density
• DRAM: simple transistor/capacitor pairs in high density form
[Figure: a column of DRAM bit cells, each a transistor gated by a word line with a capacitor C on the bit line, sharing one sense amp]
Other Types of DRAM
• Synchronous DRAM (SDRAM): Ability to transfer a burst of data given a starting address and a burst length – suitable for transferring a block of data from main memory to cache.
• Page Mode DRAM: Access all bits on the same row – RAS kept active, toggle CAS with a new column address
• Extended Data Output (EDO) – A new access cycle can be started while keeping the data output of the previous cycle active.
• Rambus DRAM (RDRAM) - Uses pipelining to move data from RAM to cache memory.
Rambus (RDRAM)
• Synchronous interface
• Row buffer cache
– the last 4 rows accessed are cached
• Uses other tricks since adopted by SDRAM
– multiple data words per clock, high frequencies
• Chips can self-refresh
• Expensive for PCs; used by the X-Box and PS2
Faster DRAM Speed
• Clock the FSB faster
– DRAM chips may not be able to keep up
• Latency dominated by wire delay
– Bandwidth may be improved (DDR vs. regular) but latency doesn’t change much
• Instead of 2 cycles for row access, it may take 3 cycles at a faster bus speed
• Doesn’t address latency of the memory access
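The "2 cycles becomes 3 cycles" point is just a fixed wire delay measured in ever-shorter clock periods. The ~30 ns figure below is an assumed wire/array delay chosen so the arithmetic matches the slide's example:

```python
import math

ROW_LATENCY_NS = 30  # assumed fixed wire/array delay, unchanged by bus speed

def row_access_cycles(bus_mhz):
    """Bus cycles needed to cover a fixed latency at a given clock rate."""
    cycle_ns = 1000 / bus_mhz
    return math.ceil(ROW_LATENCY_NS / cycle_ns)

for mhz in (66, 100, 133):
    print(f"{mhz} MHz bus: {row_access_cycles(mhz)} cycles")
# 66 MHz: 2 cycles, 100 MHz: 3 cycles, 133 MHz: 4 cycles -- same ~30 ns
```

The cycle count climbs with bus speed while the wall-clock latency stays pinned at roughly 30 ns, which is why a faster bus improves bandwidth but not access latency.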
Memory Interleaving
Interleaved memory is a design made to compensate for the relatively slow speed of dynamic random-access memory (DRAM). Main memory is divided into two or more sections, and the CPU can access alternate sections immediately, without waiting for memory to catch up (through wait states).
Interleaved memory is more flexible than wide-access memory in that it can handle multiple independent accesses at once.
[Figure: 4-way interleaved memory — each incoming address is dispatched, based on its 2 LSBs, to one of four modules holding addresses that are 0, 1, 2, and 3 mod 4; the timing diagram shows one short bus cycle per module while each module’s longer memory cycle overlaps with the others’]
Memory Interleaving cont.
• For example, in an interleaved system with two memory banks (assuming word-addressable memory), if logical address 32 belongs to bank 0, then logical address 33 would belong to bank 1, logical address 34 would belong to bank 0, and so on. An interleaved memory is said to be n-way interleaved when there are n banks and memory location i resides in bank i mod n.
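The bank-selection rule in the example reduces to modular arithmetic. A minimal sketch (the function name is ours, not from the slides):

```python
def bank_of(addr, n_banks):
    """In n-way interleaving, word address i lives in bank i mod n,
    at offset i // n within that bank."""
    return addr % n_banks, addr // n_banks

# The slides' 2-bank example:
print(bank_of(32, 2))  # (0, 16) -> bank 0
print(bank_of(33, 2))  # (1, 16) -> bank 1
print(bank_of(34, 2))  # (0, 17) -> bank 0
```

Because consecutive addresses land in different banks, a sequential stream keeps all banks busy in parallel instead of waiting on one bank's memory cycle.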
Latency
• Width/speed varies depending on memory type
• Significant wire delay just getting from the CPU to the memory controller
• More wire delay getting to the memory chips (plus the return trip…)
So what do we do about it?
• Caching
– reduces average memory instruction latency by avoiding DRAM altogether
• Limitations
– Capacity
• programs keep increasing in size
– Compulsory misses
Idea: Caching!
• Not caching of data, but caching of translations
[Figure: virtual pages 0K–12K mapped through a VPN→PPN table into physical pages 0K–28K; one VPN entry is marked X (unmapped)]
Memory Hierarchy: The Big Picture
[Figure: data movement in a memory hierarchy — words transferred explicitly via load/store between registers and cache; lines transferred automatically upon cache miss between cache and main memory; pages transferred automatically upon page fault between main memory and virtual memory]
Virtual Memory has own terminology
• Each process has its own private “virtual address space” (e.g., 2^32 bytes); the CPU actually generates “virtual addresses”
• Each computer has a “physical address space” (e.g., 128 megabytes of DRAM); also called “real memory”
• Address translation: mapping virtual addresses to physical addresses
– Allows multiple programs to use (different chunks of physical) memory at same time
– Also allows some chunks of virtual memory to be represented on disk, not in main memory (to exploit memory hierarchy)
Virtual Memory
• Idea 1: Many programs share DRAM memory so that context switches can occur
• Idea 2: Allow a program to be written without memory constraints – the program can exceed the size of the main memory
• Idea 3: Relocation: parts of the program can be placed at different locations in memory instead of one big chunk.
• Virtual Memory:
(1) DRAM memory holds many programs running at the same time (processes)
(2) use DRAM memory as a kind of “cache” for disk
Programmer’s View
• Example: 32-bit memory
– When programming, you don’t care about how much real memory there is
– Even if you use a lot, memory can always be paged to disk
[Figure: 4 GB virtual address space (AKA virtual addresses) containing kernel, text, data, heap, and stack regions, with the user portion spanning 0–2 GB]
Pages
• Memory is divided into pages, which are nothing more than fixed-sized and aligned regions of memory
– Typical size: 4KB/page (but not always)

Page 0: bytes 0–4095
Page 1: bytes 4096–8191
Page 2: bytes 8192–12287
Page 3: bytes 12288–16383
…
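Because pages are fixed-sized and aligned, the page number and offset of a byte address fall out of a shift and a mask. A minimal sketch for the 4 KB case (the function name is ours):

```python
PAGE_SIZE = 4096
OFFSET_BITS = 12              # log2(4096): low 12 bits address within a page

def split(addr):
    """Return (page number, offset within page) for a byte address."""
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)

print(split(4096))   # (1, 0)    -> first byte of page 1
print(split(12287))  # (2, 4095) -> last byte of page 2
```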
Mapping Virtual Memory to Physical Memory
• Divide memory into equal-sized “chunks” (say, 4KB each)
• Any chunk of virtual memory can be assigned to any chunk of physical memory (a “page”)
[Figure: a single process’s virtual memory (stack, heap, static, code) mapped into a 64 MB physical memory, both addressed from 0]
Page Table
• Map from virtual addresses to physical locations
• “Physical location” may include the hard disk
[Figure: virtual pages 0K–12K mapped to physical pages 0K–28K; the page table implements this virtual→physical mapping]
Page Tables
[Figure: multiple virtual address spaces (0K–12K) mapped via page tables into one physical memory (0K–28K)]
Need for Translation
• A virtual address is split into a virtual page number and a page offset; the page table maps the VPN to a physical page in main memory
• Example: virtual address 0xFC51908B → VPN 0xFC519, offset 0x08B; the page table maps VPN 0xFC519 to PPN 0x00152, giving physical address 0x0015208B
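The slides' translation example can be reproduced directly, assuming 4 KB pages (a 12-bit offset) and using a plain dict as a stand-in for the real page-table structure:

```python
OFFSET_BITS = 12
page_table = {0xFC519: 0x00152}   # VPN -> PPN mapping from the slide

def translate(va):
    """Split the virtual address, look up the VPN, reattach the offset."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0xFC51908B)))  # 0x15208b
```

The offset bits pass through untouched; only the page-number bits are rewritten, which is why page alignment makes translation cheap.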
Choosing a Page Size
• Page size is inversely proportional to page table overhead
• Large page size permits more efficient transfer to/from disk
– vs. many small transfers
– like downloading from the Internet
• Small pages lead to less fragmentation
– a big page is likely to have more unused bytes
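The "inversely proportional" claim is easy to quantify. This sketch assumes a flat single-level page table with 4-byte entries for a 32-bit address space (an illustrative assumption; real tables are usually multi-level):

```python
ADDR_BITS = 32     # 4 GB virtual address space
ENTRY_BYTES = 4    # assumed size of one page-table entry

def table_mib(page_size):
    """Size of a flat page table, in MiB, for a given page size."""
    entries = 2**ADDR_BITS // page_size   # one entry per virtual page
    return entries * ENTRY_BYTES / 2**20

for size in (1024, 4096, 16384):
    print(f"{size}-byte pages -> {table_mib(size):.0f} MiB of page table")
# 1024-byte pages -> 16 MiB, 4096 -> 4 MiB, 16384 -> 1 MiB
```

Quadrupling the page size cuts the table to a quarter of its size, at the cost of coarser allocation and more internal fragmentation.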
Translation Cache: TLB
• TLB = Translation Look-aside Buffer
[Figure: the virtual address is looked up in the TLB to produce a physical address, which is then used to access the cache tags and data]
• If TLB hit, no need to do a page table lookup from memory
• Note: the data cache is accessed by physical addresses now
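A TLB is just a small cache of recent VPN→PPN translations. A minimal sketch: the 64-entry capacity, LRU replacement, and the dict standing in for the in-memory page table are all illustrative assumptions:

```python
from collections import OrderedDict

TLB_ENTRIES = 64                          # assumed capacity

class TLB:
    def __init__(self):
        self.map = OrderedDict()          # VPN -> PPN, kept in LRU order
        self.hits = self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.map:
            self.map.move_to_end(vpn)     # hit: no page-table access needed
            self.hits += 1
        else:
            self.misses += 1              # miss: walk the page table
            if len(self.map) >= TLB_ENTRIES:
                self.map.popitem(last=False)  # evict least-recently used
            self.map[vpn] = page_table[vpn]
        return self.map[vpn]

# Repeated accesses to the same page hit after the first miss:
tlb = TLB()
page_table = {0xFC519: 0x00152}
tlb.lookup(0xFC519, page_table)           # miss: fills the TLB
tlb.lookup(0xFC519, page_table)           # hit: translation served from TLB
print(tlb.hits, tlb.misses)               # 1 1
```

Because most accesses fall in recently used pages, even a tiny TLB absorbs nearly all translations, which is what makes per-access translation affordable.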
Impact on Performance?
• Every time you load/store, the CPU must perform two (or more) accesses!
• Even worse, every fetch requires translation of the PC!
• Observation:
– Once a virtual page is mapped into a physical page, it’ll likely stay put for quite some time