Main Memory Background
• Random Access Memory (vs. Serial Access Memory)
• Cache uses SRAM: Static Random Access Memory
– No refresh (6 transistors/bit vs. 1 transistor)
– DRAM wins on size and cost; SRAM wins on speed
• Main Memory is DRAM: Dynamic Random Access Memory
– Dynamic since it needs to be refreshed periodically
– Addresses divided into 2 halves (memory as a 2D matrix):
• RAS or Row Access Strobe
• CAS or Column Access Strobe
SRAM vs. DRAM
• DRAM = Dynamic RAM
• SRAM: 6T per bit
– built with normal high-speed CMOS technology
• DRAM: 1T per bit
– built with a special DRAM process optimized for density
Hardware Structures
[Figure: SRAM cell with a wordline and two bitlines (b, b̄) vs. DRAM cell with a wordline and a single bitline]
DRAM Chip Organization
[Figure: the row address feeds a row decoder that selects a row of the memory cell array; sense amps latch the row into the row buffer; the column address and column decoder select data from the row buffer onto the data bus]
DRAM Chip Organization (2)
• Differences with SRAM
• reads are destructive: contents are erased after reading
– row buffer
• read lots of bits all at once, and then parcel them out based on different column addresses
– similar to reading a full cache line, but only accessing one word at a time
• “Fast-Page Mode” FPM DRAM organizes the DRAM row to contain bits for a complete page
– row address held constant, and then fast reads from different locations in the same page
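The row-buffer behavior above can be sketched with a toy timing model. The latency numbers, the row/column address split, and the function name are illustrative assumptions, not values from the slides or any datasheet:

```python
# Hypothetical timings: a row miss pays RAS + CAS, a row hit pays only CAS.
ROW_ACCESS_NS = 40   # RAS: open a row into the row buffer
COL_ACCESS_NS = 15   # CAS: read a column from the open row

def access_latency(addresses, col_bits=10):
    """Total latency of a sequence of accesses, reusing the row buffer
    whenever the next address falls in the currently open row."""
    open_row = None
    total = 0
    for addr in addresses:
        row = addr >> col_bits          # upper bits select the row
        if row != open_row:             # row miss: must strobe RAS first
            total += ROW_ACCESS_NS
            open_row = row
        total += COL_ACCESS_NS          # every access pays the CAS latency
    return total

# Four words in the same row: one RAS, four CAS.
print(access_latency([0, 4, 8, 12]))          # 40 + 4*15 = 100
# Four words in four different rows: four RAS + four CAS.
print(access_latency([0, 1024, 2048, 3072]))  # 4*(40+15) = 220
```

The second sequence is more than twice as slow even though it reads the same number of words, which is exactly the page-mode advantage the slide describes.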
Refresh
• So after a read, the contents of the DRAM cell are gone
• The values are stored in the row buffer
• Write them back into the cells for the next read in the future
[Figure: sense amps writing the row buffer contents back into the DRAM cells]
Refresh (2)
• Fairly gradually, the DRAM cell will lose its contents even if it’s not accessed (gate leakage slowly drains the stored charge from a 1 toward a 0)
– This is why it’s called “dynamic”
– Contrast to SRAM, which is “static” in that once written, it maintains its value forever (so long as power remains on)
• All DRAM rows need to be regularly read and re-written
• If it keeps its value even when power is removed, then it’s “non-volatile” (e.g., flash, HDD, DVDs)
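A quick back-of-the-envelope on what regular refresh implies. The 64 ms retention window and 8192 rows per bank are common figures, assumed here purely for illustration:

```python
# Assumed typical values: every row must be refreshed within RETENTION_MS,
# so with ROWS rows the controller must issue a row refresh at a steady
# interval of RETENTION_MS / ROWS.
RETENTION_MS = 64
ROWS = 8192

interval_us = RETENTION_MS * 1000 / ROWS   # refresh-command spacing
print(f"one row refresh every {interval_us:.4f} us")  # 7.8125 us
```

So under these assumptions the controller steals a short window every few microseconds to read a row and write it back, which is invisible to software but does consume a small slice of DRAM bandwidth.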
DRAM Read Timing
Accesses are asynchronous: triggered by RAS and CAS signals, which can in theory occur at arbitrary times (subject to DRAM timing constraints)
SDRAM Read Timing
[Figure: SDRAM read timing, showing a multi-word burst (burst length)]
• Double-Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock
• Command frequency does not change
Timing figures taken from “A Performance Comparison of Contemporary DRAM Architectures” by Cuppu, Jacob, Davis, and Mudge
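The bandwidth effect of transferring on both clock edges is simple arithmetic. The clock rate and bus width below are illustrative example numbers, not taken from the slides:

```python
# Assumed example bus: 200 MHz clock, 64-bit (8-byte) data path.
clock_hz = 200e6
bus_bytes = 8

sdr_bw = clock_hz * bus_bytes        # one transfer per clock cycle
ddr_bw = clock_hz * bus_bytes * 2    # transfers on both rising and falling edges

print(sdr_bw / 1e9, "GB/s vs", ddr_bw / 1e9, "GB/s")  # 1.6 GB/s vs 3.2 GB/s
```

Note that only the data rate doubles; commands are still issued once per clock, which is why the command frequency in the timing diagram does not change.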
Dynamic RAM
• SRAM cells exhibit high speed/poor density
• DRAM: simple transistor/capacitor pairs in high density form
[Figure: a column of DRAM bit cells, each a transistor gated by a word line with a capacitor C on the bit line, sharing one sense amp]
Other Types of DRAM
• Synchronous DRAM (SDRAM): Ability to transfer a burst of data given a starting address and a burst length – suitable for transferring a block of data from main memory to cache.
• Page Mode DRAM: Access all bits on the same row – RAS kept active, toggle CAS with a new column address
• Extended Data Output (EDO) – A new access cycle can be started while keeping the data output of the previous cycle active.
• Rambus DRAM (RDRAM) - Uses pipelining to move data from RAM to cache memory.
Rambus (RDRAM)
• Synchronous interface
• Row buffer cache
– the last 4 rows accessed are cached
• Uses other tricks since adopted by SDRAM
– multiple data words per clock, high frequencies
• Chips can self-refresh
• Expensive for PCs; used by the X-Box and PS2
Faster DRAM Speed
• Clock the FSB faster
– DRAM chips may not be able to keep up
• Latency dominated by wire delay
– Bandwidth may be improved (DDR vs. regular) but latency doesn’t change much
• Instead of 2 cycles for row access, it may take 3 cycles at a faster bus speed
• Doesn’t address latency of the memory access
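The "2 cycles becomes 3 cycles" point is just a fixed wire delay measured in ever-shorter clock periods. The ~30 ns figure below is an assumed wire/array delay chosen so the arithmetic matches the slide's example:

```python
import math

ROW_LATENCY_NS = 30  # assumed fixed wire/array delay, unchanged by bus speed

def row_access_cycles(bus_mhz):
    """Bus cycles needed to cover a fixed latency at a given clock rate."""
    cycle_ns = 1000 / bus_mhz
    return math.ceil(ROW_LATENCY_NS / cycle_ns)

for mhz in (66, 100, 133):
    print(f"{mhz} MHz bus: {row_access_cycles(mhz)} cycles")
# 66 MHz: 2 cycles, 100 MHz: 3 cycles, 133 MHz: 4 cycles -- same ~30 ns
```

The cycle count climbs with bus speed while the wall-clock latency stays pinned at roughly 30 ns, which is why a faster bus improves bandwidth but not access latency.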
Memory Interleaving
Interleaved memory is a design made to compensate for the relatively slow speed of dynamic random-access memory (DRAM). Main memory is divided into two or more sections, and the CPU can access alternate sections immediately, without waiting for memory to catch up (through wait states).
Interleaved memory is more flexible than wide-access memory in that it can handle multiple independent accesses at once.
[Figure: 4-way interleaved memory — each incoming address is dispatched, based on its 2 LSBs, to one of four modules holding addresses that are 0, 1, 2, and 3 mod 4; the timing diagram shows one short bus cycle per module while each module’s longer memory cycle overlaps with the others’]
Memory Interleaving cont.
• For example, in an interleaved system with two memory banks (assuming word-addressable memory), if logical address 32 belongs to bank 0, then logical address 33 would belong to bank 1, logical address 34 would belong to bank 0, and so on. An interleaved memory is said to be n-way interleaved when there are n banks and memory location i resides in bank i mod n.
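The bank-selection rule in the example reduces to modular arithmetic. A minimal sketch (the function name is ours, not from the slides):

```python
def bank_of(addr, n_banks):
    """In n-way interleaving, word address i lives in bank i mod n,
    at offset i // n within that bank."""
    return addr % n_banks, addr // n_banks

# The slides' 2-bank example:
print(bank_of(32, 2))  # (0, 16) -> bank 0
print(bank_of(33, 2))  # (1, 16) -> bank 1
print(bank_of(34, 2))  # (0, 17) -> bank 0
```

Because consecutive addresses land in different banks, a sequential stream keeps all banks busy in parallel instead of waiting on one bank's memory cycle.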
Latency
• Width/speed varies depending on memory type
• Significant wire delay just getting from the CPU to the memory controller
• More wire delay getting to the memory chips (plus the return trip…)
So what do we do about it?
• Caching
– reduces average memory instruction latency by avoiding DRAM altogether
• Limitations
– Capacity
• programs keep increasing in size
– Compulsory misses
Idea: Caching!
• Not caching of data, but caching of translations
[Figure: virtual pages 0K–12K mapped through a VPN→PPN table into physical pages 0K–28K; one VPN entry is marked X (unmapped)]
Memory Hierarchy: The Big Picture
[Figure: data movement in a memory hierarchy — words transferred explicitly via load/store between registers and cache; lines transferred automatically upon cache miss between cache and main memory; pages transferred automatically upon page fault between main memory and virtual memory]
Virtual Memory has own terminology
• Each process has its own private “virtual address space” (e.g., 2^32 bytes); the CPU actually generates “virtual addresses”
• Each computer has a “physical address space” (e.g., 128 megabytes of DRAM); also called “real memory”
• Address translation: mapping virtual addresses to physical addresses
– Allows multiple programs to use (different chunks of physical) memory at same time
– Also allows some chunks of virtual memory to be represented on disk, not in main memory (to exploit memory hierarchy)
Virtual Memory
• Idea 1: Many programs share DRAM memory so that context switches can occur
• Idea 2: Allow a program to be written without memory constraints – the program can exceed the size of the main memory
• Idea 3: Relocation: parts of the program can be placed at different locations in memory instead of one big chunk.
• Virtual Memory:
(1) DRAM memory holds many programs running at the same time (processes)
(2) use DRAM memory as a kind of “cache” for disk
Programmer’s View
• Example: 32-bit memory
– When programming, you don’t care about how much real memory there is
– Even if you use a lot, memory can always be paged to disk
[Figure: 4 GB virtual address space (AKA virtual addresses) containing kernel, text, data, heap, and stack regions, with the user portion spanning 0–2 GB]
Pages
• Memory is divided into pages, which are nothing more than fixed-sized and aligned regions of memory
– Typical size: 4KB/page (but not always)

Page 0: bytes 0–4095
Page 1: bytes 4096–8191
Page 2: bytes 8192–12287
Page 3: bytes 12288–16383
…
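Because pages are fixed-sized and aligned, the page number and offset of a byte address fall out of a shift and a mask. A minimal sketch for the 4 KB case (the function name is ours):

```python
PAGE_SIZE = 4096
OFFSET_BITS = 12              # log2(4096): low 12 bits address within a page

def split(addr):
    """Return (page number, offset within page) for a byte address."""
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)

print(split(4096))   # (1, 0)    -> first byte of page 1
print(split(12287))  # (2, 4095) -> last byte of page 2
```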
Mapping Virtual Memory to Physical Memory
• Divide memory into equal-sized “chunks” (say, 4KB each)
• Any chunk of virtual memory can be assigned to any chunk of physical memory (a “page”)
[Figure: a single process’s virtual memory (stack, heap, static, code) mapped into a 64 MB physical memory, both addressed from 0]
Page Table
• Map from virtual addresses to physical locations
• “Physical location” may include the hard disk
[Figure: virtual pages 0K–12K mapped to physical pages 0K–28K; the page table implements this virtual→physical mapping]
Page Tables
[Figure: multiple virtual address spaces (0K–12K) mapped via page tables into one physical memory (0K–28K)]
Need for Translation
• A virtual address is split into a virtual page number and a page offset; the page table maps the VPN to a physical page in main memory
• Example: virtual address 0xFC51908B → VPN 0xFC519, offset 0x08B; the page table maps VPN 0xFC519 to PPN 0x00152, giving physical address 0x0015208B
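The slides' translation example can be reproduced directly, assuming 4 KB pages (a 12-bit offset) and using a plain dict as a stand-in for the real page-table structure:

```python
OFFSET_BITS = 12
page_table = {0xFC519: 0x00152}   # VPN -> PPN mapping from the slide

def translate(va):
    """Split the virtual address, look up the VPN, reattach the offset."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    return (page_table[vpn] << OFFSET_BITS) | offset

print(hex(translate(0xFC51908B)))  # 0x15208b
```

The offset bits pass through untouched; only the page-number bits are rewritten, which is why page alignment makes translation cheap.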
Choosing a Page Size
• Page size is inversely proportional to page table overhead
• Large page size permits more efficient transfer to/from disk
– vs. many small transfers
– like downloading from the Internet
• Small pages lead to less fragmentation
– a big page is likely to have more unused bytes
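The "inversely proportional" claim is easy to quantify. This sketch assumes a flat single-level page table with 4-byte entries for a 32-bit address space (an illustrative assumption; real tables are usually multi-level):

```python
ADDR_BITS = 32     # 4 GB virtual address space
ENTRY_BYTES = 4    # assumed size of one page-table entry

def table_mib(page_size):
    """Size of a flat page table, in MiB, for a given page size."""
    entries = 2**ADDR_BITS // page_size   # one entry per virtual page
    return entries * ENTRY_BYTES / 2**20

for size in (1024, 4096, 16384):
    print(f"{size}-byte pages -> {table_mib(size):.0f} MiB of page table")
# 1024-byte pages -> 16 MiB, 4096 -> 4 MiB, 16384 -> 1 MiB
```

Quadrupling the page size cuts the table to a quarter of its size, at the cost of coarser allocation and more internal fragmentation.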
Translation Cache: TLB
• TLB = Translation Look-aside Buffer
[Figure: the virtual address is looked up in the TLB to produce a physical address, which is then used to access the cache tags and data]
• If TLB hit, no need to do a page table lookup from memory
• Note: the data cache is accessed by physical addresses now
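A TLB is just a small cache of recent VPN→PPN translations. A minimal sketch: the 64-entry capacity, LRU replacement, and the dict standing in for the in-memory page table are all illustrative assumptions:

```python
from collections import OrderedDict

TLB_ENTRIES = 64                          # assumed capacity

class TLB:
    def __init__(self):
        self.map = OrderedDict()          # VPN -> PPN, kept in LRU order
        self.hits = self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.map:
            self.map.move_to_end(vpn)     # hit: no page-table access needed
            self.hits += 1
        else:
            self.misses += 1              # miss: walk the page table
            if len(self.map) >= TLB_ENTRIES:
                self.map.popitem(last=False)  # evict least-recently used
            self.map[vpn] = page_table[vpn]
        return self.map[vpn]

# Repeated accesses to the same page hit after the first miss:
tlb = TLB()
page_table = {0xFC519: 0x00152}
tlb.lookup(0xFC519, page_table)           # miss: fills the TLB
tlb.lookup(0xFC519, page_table)           # hit: translation served from TLB
print(tlb.hits, tlb.misses)               # 1 1
```

Because most accesses fall in recently used pages, even a tiny TLB absorbs nearly all translations, which is what makes per-access translation affordable.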
Impact on Performance?
• Every time you load/store, the CPU must perform two (or more) accesses!
• Even worse, every fetch requires translation of the PC!
• Observation:
– Once a virtual page is mapped into a physical page, it’ll likely stay put for quite some time