Virtual Memory
Main memory technology and organization, Virtual Memory concept, Virtual-physical
translation, page table, TLB
Main memory: technology & organization
• Main memory:
– Storage for programs and data that are in use by a computer
– Typically in desktops and servers:
» Volatile
• The hard disk is the non-volatile storage
» Based on DRAM technology
– Embedded systems
» No hard disks; memory itself is non-volatile (e.g. Flash)
DRAM technology
• Single-transistor memory cell
– Store a bit of information as charge (“1”) or no charge (“0”) in a capacitor
– Volatile (turn it off and the charge goes away)
» Cells discharge even when powered on; need refreshing
[Diagram: DRAM cell — the word line gates the access transistor; the bit line carries data to/from the capacitor]
Memory chip organization
• Row decoder selects the word line
– Column decoder determines which bit line(s) are active; data in/out is driven through bit lines
• Memory chips multiplex address lines to reduce pin count
– Obtain row address first; latch it within the memory; then obtain column address
• Large DRAM chips are divided into sub-arrays
– Avoids RC delays of very long word/bit lines
DIMMs (dual in-line memory modules)
• Collection of DRAM chips (4-16) on a standard PCB
• 64-bit datapath (64+8=72 with error correction code)
Main memory organization
• First-order factors affecting miss penalty:
1. Time to arbitrate the memory bus, send address
2. Latency to access memory and fetch word
3. Transfer time to send word to cache
• A cache block contains multiple words
– Each word transferred adds to the penalty
– Need to avoid serialization
Example
• 4 cycles for address; 56-cycle access time per word; 4 cycles for word transfer
– And a 4-word cache block
• If the cache-memory bus is one word wide and memory is one word wide
– The 4 words are accessed in sequence
– Penalty = 4*(4+56+4) = 256 cycles
Alternatives
Wide memory
• Increase bus and memory widths
– E.g. to 4 words
• A single address now finds the entire cache block in memory
– Single access cycle, single transfer
– Miss penalty = 1*(4+56+4) = 64 cycles
• Drawbacks:
– More interconnections, pins needed
Interleaved memory
• Instead of a single wide memory, multiple (narrower) memories
– E.g. 4 one-word memory “banks”
• Keep the bus at the same width
• The address is seen by all banks
– Each accesses its word independently
– Then words are transferred back to the cache one at a time
• Parallelize address/access
– Sequential transfer
– Penalty = 4 + 56 + 4*4 = 76 cycles
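The three miss-penalty calculations above (narrow, wide, and interleaved memory) can be checked with a short sketch using the lecture's cycle counts:

```python
# Miss penalty (in cycles) for a 4-word cache block under three memory
# organizations, using the numbers from the example: 4 cycles for
# address, 56-cycle access time per word, 4 cycles per word transfer.
ADDR, ACCESS, XFER, WORDS = 4, 56, 4, 4

# One-word bus, one-word memory: every word is fully serialized.
narrow = WORDS * (ADDR + ACCESS + XFER)      # 4*(4+56+4)

# 4-word-wide bus and memory: one address, one access, one transfer.
wide = 1 * (ADDR + ACCESS + XFER)            # 1*(4+56+4)

# Four interleaved 1-word banks on a 1-word bus: address and access
# happen once, in parallel across banks; transfers are serialized.
interleaved = ADDR + ACCESS + WORDS * XFER   # 4 + 56 + 4*4

print(narrow, wide, interleaved)  # → 256 64 76
```

The interleaved organization gets most of the benefit of the wide one without widening the bus, because the dominant cost (the 56-cycle access) is paid only once.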
RAMBUS, SDRAM, DDR
• Techniques to improve the interface of main memory chips
– These are still DRAMs with high density and slow access times
– The techniques focus on improving transfer rates (bandwidth)
SDRAM/DDR
• Synchronous DRAM
– Adds a clock signal to the memory interface to avoid synchronization overheads
– PC100, PC133, PC150: clock rates (MHz) of SDRAM memory chips
• DDR DRAM
– Synchronous, and transfers on both edges of the clock
» Double data rate
RAMBUS
• A memory system within a chip
– Supports interleaved accesses within the internal memory banks of each chip
– Supports multiple outstanding transactions (in conjunction with a pipelined, or split-transaction, bus)
– RAMBUS Inc. does not fabricate memory chips; it licenses its technology to companies using its interface
RAMBUS - notes
• Expensive because:
– Requires more complexity in the memory chip
» And licensing fees
– Also requires more complexity in the interface
» Bus, chipset
• Improves bandwidth, but at its core it is still DRAM
– Per-access latency is slow
Virtual memory
[Figure: the memory hierarchy]

Level          Capacity     Access time   Cost
CPU registers  100s bytes   <10s ns
Cache          KBytes       10-100 ns     $.01-.001/bit
Main memory    MBytes       100 ns-1 us   $.01-.001/bit
Disk           GBytes       ms            10^-3 - 10^-4 cents/bit
Tape           infinite     sec-min       10^-6 cents/bit

Data is staged between adjacent levels in different transfer units:

Between                 Staging unit     Managed by       Transfer size
Registers <-> Cache     Instr. operands  prog./compiler   1-8 bytes
Cache <-> Memory        Blocks           cache cntl       8-128 bytes
Memory <-> Disk         Pages            OS               512-4K bytes
Disk <-> Tape           Files            user/operator    MBytes

Upper levels are faster; lower levels are larger.
Memory addressing - physical
• So far we assumed addresses of LD/SDs go directly to caches/memory
• Complex to manage if a computer is multi-processed/multi-user
– Multiple users want to share the same (physical) main memory
• Limits the addressing space of programs to the physical main memory available
Example
• How do you assign addresses within a program so that you know other users/programs will not conflict with them?

Program A:            Program B:
SD 0x00000100,1       SD 0x00000100,5
LD R1,0x00000100

R1 = ? Both programs stored to the same main-memory address 0x00000100.
Memory addressing - virtual
Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A:                Translation B:
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100
Virtual memory
• Three main goals:
– Allow efficient sharing of physical memory among multiple processes/users
– Allow address spaces that are larger than physical memory
» Use hard disk storage as main memory
» In a way that is user-transparent
• Unlike earlier “overlay” techniques
– Allow user-transparent relocation
» Previous example
Virtual Memory
Provides the illusion of very large memory
– the sum of the memory of many jobs can be greater than physical memory
– the address space of each job can be larger than physical memory
Simplifies memory management and programming
Exploits the memory hierarchy to keep average access time low
Involves at least two storage levels: main and secondary
Main (DRAM): nanoseconds, M/GBytes
Secondary (HD): milliseconds, G/TBytes

Virtual Address -- address used by the programmer
Memory Address -- address of a word in physical memory; also known as “physical address” or “real address”
Basic Issues in VM Design
Transfer unit between disk and memory: pages
– virtual and physical address spaces are partitioned into blocks of equal size (typically a few KBytes)
– the physical-memory blocks are called page frames
A missing item is fetched from secondary memory only on the occurrence of a page fault
Address translation
Example

Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A:                Translation B:
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100
Address translation
VA = 0x00000 100 (page number 0x00000, offset 0x100)
Page number is translated: 0x00000 -> 0x40000
PA = translated page number concatenated with offset = 0x40000100
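The split-translate-concatenate step above can be sketched as follows; the single-entry page table is hypothetical, holding only process A's mapping from the example:

```python
# Translate a virtual address by splitting it into a 20-bit virtual
# page number and a 12-bit offset (4 KB pages), looking the page
# number up, and concatenating the frame number with the offset.
PAGE_BITS = 12

page_table = {0x00000: 0x40000}  # process A: virtual page 0x00000 -> frame 0x40000

def translate(va):
    vpn = va >> PAGE_BITS                  # virtual page number
    offset = va & ((1 << PAGE_BITS) - 1)   # offset is unchanged by translation
    frame = page_table[vpn]                # a missing entry would be a page fault
    return (frame << PAGE_BITS) | offset   # concatenate frame and offset

print(hex(translate(0x00000100)))  # → 0x40000100
```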
Protection
• In addition to address mapping, protection/state bits are added to the page table
– E.g.: valid (V), user-readable (R), user-writable (R/W), executable (X)
– More later
Address Mapping Algorithm
Look up the table entry for VA; if an entry exists for VA, and it is valid,
then the page is in main memory at the frame address stored in the table
else the address locates the page in secondary memory

Access Rights: R = read-only, R/W = read/write, X = execute-only
If the kind of access is not compatible with the specified access rights, then protection_violation_fault
If the valid bit is not set, then page_fault

Protection Fault: access-rights violation; causes a trap to a hardware, microcode, or software fault handler
Page Fault: page not resident in physical memory; also causes a trap; usually accompanied by a context switch: the current process is suspended while the page is fetched from secondary storage
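The mapping algorithm with its two fault cases can be sketched as below; the table contents and the string fault codes are illustrative, not a real O/S interface:

```python
# Page-table lookup with valid bit and access-rights check.
PAGE_BITS = 12

page_table = {
    # vpn: (valid, rights, frame) -- hypothetical entries
    0x00000: (True,  "R/W", 0x40000),
    0x00001: (False, "R/W", None),    # not resident in memory
}

def access(va, kind):  # kind is one of "R", "W", "X"
    entry = page_table.get(va >> PAGE_BITS)
    if entry is None or not entry[0]:
        return "page_fault"           # no valid translation: fetch from disk
    valid, rights, frame = entry
    allowed = {"R": rights in ("R", "R/W"),
               "W": rights == "R/W",
               "X": rights == "X"}
    if not allowed[kind]:
        return "protection_violation_fault"
    return hex((frame << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1)))

print(access(0x00000100, "R"))  # → 0x40000100
print(access(0x00001100, "R"))  # → page_fault
print(access(0x00000100, "X"))  # → protection_violation_fault
```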
4 Q’s of virtual memory
• Q1: Where can a block be placed in main memory?
– Disks are orders of magnitude slower than main memory
– Need to reduce occurrence of misses as much as possible
– Also, placement is controlled by software (the operating system), not hardware
⇒ Fully associative (a page can be placed in any page frame in memory)
4 Qs
• Q2: How is a block found in main memory?
– Via the page table and concatenation of the offset
• Q3: Which block should be replaced on a virtual memory miss?
– Goal: minimize occurrence of misses (page faults)
– Least-recently used
• Q4: What happens on a write?
– Write-back instead of write-through
» With dirty bits
Page tables and processes
• A process in typical operating systems has a context that includes:
– The values of all CPU registers (including the PC)
– The page table
• Virtual-physical address translations (page tables) are kept on a per-process basis
Example

Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A (via PT1):      Translation B (via PT2):
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100
Page table structures
Example: 32-bit virtual and physical addresses, 4-KByte pages — how large is the page table?

Page table sizes
• 4 KBytes – 12 bits (offset)
– Index into page table: 20 bits
– Each entry: 20 bits + valid/protection/etc.
» Let us assume 4 bytes for simplicity
• Total size:
– 2^20 * 4 = 4 MB
– One per process!
» A typical Unix machine has dozens of processes
• Hundreds of MB just for page tables?
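The size arithmetic above, spelled out:

```python
# Flat page table size for a 32-bit address space with 4 KB pages.
page_bits = 12                  # 4 KB pages -> 12-bit offset
vpn_bits = 32 - page_bits       # 20-bit index into the page table
entries = 2 ** vpn_bits         # 2^20 = 1,048,576 entries
bytes_per_entry = 4             # ~20-bit frame + valid/protection bits, rounded up

size = entries * bytes_per_entry
print(size // 2**20, "MB per process")  # → 4 MB per process
```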
Dealing with page table sizes
• One solution:
– Increase page sizes
» Other problems arise
• Larger block sizes -> more conflicts, larger page-fault penalties
• Internal fragmentation
• Other approaches
– Change the way the page table itself is structured
» Inverted page tables
» Multi-level page tables
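A two-level page table can be sketched as below: the 20-bit virtual page number is split into two 10-bit indices, so second-level tables are allocated only for regions of the address space that are actually used. The structures here (Python dicts standing in for the tables) are illustrative:

```python
# Two-level page table: vpn = (10-bit level-1 index, 10-bit level-2 index).
L2_BITS = 10
PAGE_BITS = 12

root = {}  # level-1 table: l1 index -> level-2 table (allocated on demand)

def split(va):
    vpn = va >> PAGE_BITS
    return vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1)

def map_page(va, frame):
    l1, l2 = split(va)
    root.setdefault(l1, {})[l2] = frame   # allocate level-2 table if needed

def translate(va):
    l1, l2 = split(va)
    frame = root[l1][l2]                  # a KeyError would model a page fault
    return (frame << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1))

map_page(0x00000100, 0x40000)
print(hex(translate(0x00000100)))  # → 0x40000100
print(len(root))                   # → 1 (one level-2 table, not 2^20 entries)
```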
Virtual Addresses and Caches
[Diagram: the CPU issues a VA; translation produces a PA, which goes to the cache; on a hit, data is returned; on a miss, main memory is accessed]
It takes (at least) one extra memory access to translate VA to PA
Must access page table, which, itself, is stored in main memory
Fast translation techniques
• If not done carefully, translation can yield poor performance
– One (or more) extra memory accesses to the page table for every memory reference
» A single memory access is already very slow if it misses the cache
• Once again, exploit locality
– Maintain a cache of recent translations – a translation look-aside buffer (TLB)
» Smaller and faster than the L1 cache
TLBs
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
TLBs are usually small, typically not more than 128-256 entries. This permits fully associative lookups.
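Since the TLB is just a small cache of translations, its behavior can be sketched with a fully associative TLB using LRU replacement; the sizes and the backing page table here are illustrative, not taken from any real machine:

```python
from collections import OrderedDict

# Tiny fully associative TLB with LRU replacement.
PAGE_BITS, TLB_ENTRIES = 12, 4
page_table = {vpn: 0x40000 + vpn for vpn in range(32)}  # hypothetical mappings
tlb = OrderedDict()  # vpn -> frame, kept in LRU order
misses = 0

def translate(va):
    global misses
    vpn, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:
        tlb.move_to_end(vpn)          # TLB hit: refresh LRU position
    else:
        misses += 1                   # TLB miss: walk the page table
        if len(tlb) == TLB_ENTRIES:
            tlb.popitem(last=False)   # evict the least recently used entry
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_BITS) | offset

# Same pages re-referenced hit the TLB; only two distinct pages are walked.
for va in [0x0100, 0x0200, 0x1100, 0x0104]:
    translate(va)
print(misses)  # → 2
```

Locality does the work: most references land on recently translated pages, so the page-table walk is paid only on the rare TLB miss.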
[Diagram: translation with a TLB — the CPU issues a VA; a TLB lookup on a hit yields the PA directly to the cache; on a TLB miss, the full translation through the page table is performed]
Reducing Translation Time
• Machines with TLBs go one step further to reduce cache access time
• May overlap the cache access with the TLB access
– Virtually-indexed, physically-tagged caches
• Or, index the cache with the virtual address and keep VA tags
– Virtually-addressed caches
VA-addressed, PA-tagged
[Diagram: TLB and cache accessed in parallel — the 20-bit page number feeds an associative TLB lookup while the 12-bit offset supplies the 10-bit cache index (1K entries of 4 bytes each); the 20-bit PA from the TLB is compared against the cache tag, together with the valid bit, to determine hit/miss]
Use only the offset part of the virtual address to index the cache; the offset is independent of translation, so the cache access can occur in parallel with the TLB lookup.
The cache block tag and the TLB translation (both physical addresses) are then compared to determine hit/miss.
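The constraint can be checked with a little bit arithmetic: overlap works only when the cache index plus block-offset bits fit within the page offset. The helper below is a sketch (its name and parameters are ours, not from the slides):

```python
# Can cache indexing overlap with TLB translation?  Yes, iff the
# index + block-offset bits all lie within the (untranslated) page offset.
def can_overlap(page_bytes, cache_bytes, block_bytes, ways=1):
    page_offset_bits = page_bytes.bit_length() - 1
    sets = cache_bytes // (block_bytes * ways)
    index_bits = sets.bit_length() - 1
    block_bits = block_bytes.bit_length() - 1
    return index_bits + block_bits <= page_offset_bits

# 4 KB pages, 4-byte blocks, as in the figure:
print(can_overlap(4096, 4096, 4))           # → True  (10 + 2 <= 12)
print(can_overlap(4096, 8192, 4))           # → False (11 + 2 >  12)
print(can_overlap(4096, 8192, 4, ways=2))   # → True  (10 + 2 <= 12)
```

The last two lines preview the limitation discussed next: doubling a direct-mapped cache to 8 KB breaks the overlap, while making it 2-way set associative restores it.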
Limitations
• Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation
• This usually limits things to small caches, large page sizes, or highly set-associative caches if you want a large cache
Example: suppose everything is the same except that the cache is increased to 8 KBytes instead of 4 KBytes:
[Diagram: the cache index grows to 11 bits and overlaps the 20-bit virtual page number by one bit; that bit is changed by VA translation, but is needed for cache lookup]
Solutions:
– go to 8-KByte page sizes
– go to a 2-way set-associative cache (would allow you to continue to use a 10-bit index)
[Diagram: 2-way set-associative cache — 1K sets of two 4-byte blocks, indexed by 10 bits]
VA-addressed, VA-tagged
• An alternative is to index the cache with a virtual address
– And also store tags of virtual (not physical) addresses for tag comparison
– “virtual caches”, “virtually addressed caches”
• Must watch out for multi-programming issues
– Key issue: unlike PAs, VAs are not unique and are mapped on a per-process basis
Example

Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A:                Translation B:
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100

If the cache uses physical tags (e.g. tag = 5 most-significant hex digits):
LD R1 will compare 0x50000 with the tag stored in the cache; if the cache holds the value stored by program A, the tags won’t match.
If the cache uses virtual tags:
LD R1 will compare 0x00000 with the tag stored in the cache; if not careful, LD may result in R1=1.
Virtual caches & processes
• A simple solution:
– Flush the entire virtual cache contents on an O/S context switch
» “Brute force” guarantee that the cache always holds data relative to a single process
» Negative impact on performance; flushing can get rid of data that will be needed by the processor in the near future
• Alternative:
– Add a process-identifier (PID) field to each block
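The PID-field alternative can be sketched as below: by making the PID part of the tag match, two processes can use the same virtual tag without false hits, and no flush is needed on a context switch. The structures are purely illustrative:

```python
# Virtually-tagged cache entries extended with a process ID (PID):
# a lookup hits only if both the PID and the virtual tag match.
cache = {}  # (pid, vtag) -> data

def write(pid, vtag, data):
    cache[(pid, vtag)] = data

def read(pid, vtag):
    return cache.get((pid, vtag))  # None models a miss

write(pid=1, vtag=0x00000, data="A's value")
write(pid=2, vtag=0x00000, data="B's value")

# Same virtual tag, different processes: no false hit, no flush required.
print(read(1, 0x00000))  # → A's value
print(read(2, 0x00000))  # → B's value
```

The cost is a wider tag and a comparator per way; the benefit is keeping each process's working set warm across context switches.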
Flushing vs. storing PIDs
Additional issues – virtual caches
• Protection
– Must be checked on every access
– Protection bits must be present in the virtual cache
• Aliasing
– Programs may map different VAs to the same PA
– Example: shared code pages, shared memory
– Must make sure all aliases map into the same cache block
» Otherwise changes through one aliased address will not be seen by other processes
» Not a problem with physically-indexed caches; all aliased VAs map into a single PA, i.e. a single location in the cache
Handling Protection
• Physical main memory is shared by multiple processes
– Via the virtual memory abstraction
• But a process/user does not want other processes/users accessing its data
– Unless explicitly permitted
– Users expect this level of protection from others; it must be implemented by hardware, software, or both
Protection example
• Example: the O/S and two processes A, B
– Time-sharing; the O/S lets A use the CPU for some time, then switches context to B
» Without removing all pages used by A from physical memory
– If B can change its own page table entries, it can map an address from its virtual address space to a physical address in use by A
» May load/store A’s data
VM protection
• Key ideas:
– Enforce protection at the granularity of a page
– Before accessing any physical page, check its protection
» As part of the translation process
– Implement at least two levels: kernel (supervisor; privileged to the O/S) and user
» Setup of protection bits is done by privileged software (the kernel)
Page Tables
• With kernel/user modes, the O/S can protect the page tables:
– Place the tables in memory locations available only in kernel mode
» Ensures users cannot overwrite translations
• Once page tables are protected by the kernel:
– The O/S can guarantee each page of a process maps to a distinct memory page
– Processes are protected from one another by having their own page tables