Virtual Memory
Main memory technology and organization, Virtual Memory concept, Virtual-physical
translation, page table, TLB
Main memory: technology & organization
• Main memory:
– Storage for programs and data that are in use by a computer
– Typically in desktops and servers:
» Volatile
• The hard disk is the non-volatile storage
» Based on DRAM technology
– Embedded systems
» No hard disks; memory itself is non-volatile (e.g. Flash)
DRAM technology
• Single-transistor memory cell
– Store a bit of information as charge (“1”) or no charge (“0”) in a capacitor
– Volatile (turn it off and the charge goes away)
» Cells discharge even when powered on; need refreshing
[Diagram: DRAM cell — the word line gates the access transistor; the bit line carries data to/from the capacitor]
Memory chip organization
• Row decoder selects the word line
– Column decoder determines which bit line(s) are active; data in/out is driven through bit lines
• Memory chips multiplex address lines to reduce pin count
– Obtain row address first; latch it within the memory; then obtain column address
• Large DRAM chips are divided into sub-arrays
– Avoids RC delays of very long word/bit lines
DIMMs (dual in-line memory modules)
• Collection of DRAM chips (4-16) on a standard PCB
• 64-bit datapath (64+8=72 with error correction code)
Main memory organization
• First-order factors affecting miss penalty:
1. Time to arbitrate the memory bus, send address
2. Latency to access memory and fetch word
3. Transfer time to send word to cache
• A cache block contains multiple words
– Each word transferred adds to the penalty
– Need to avoid serialization
Example
• 4 cycles for address; 56-cycle access time per word; 4 cycles for word transfer
– And a 4-word cache block
• If the cache-memory bus is one word wide and memory is one word wide
– The 4 words are accessed in sequence
– Penalty = 4*(4+56+4) = 256 cycles
Alternatives
Wide memory
• Increase bus and memory widths
– E.g. to 4 words
• A single address now finds the entire cache block in memory
– Single access cycle, single transfer
– Miss penalty = 1*(4+56+4) = 64 cycles
• Drawbacks:
– More interconnections, pins needed
Interleaved memory
• Instead of a single wide memory, multiple (narrower) memories
– E.g. 4 one-word memory “banks”
• Keep the bus at the same width
• The address is seen by all banks
– Each accesses its word independently
– Then words are transferred back to the cache one at a time
• Parallelize address/access
– Sequential transfer
– Penalty = 4 + 56 + 4*4 = 76 cycles
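The three miss-penalty calculations above (narrow, wide, and interleaved memory) can be checked with a short sketch using the lecture's cycle counts:

```python
# Miss penalty (in cycles) for a 4-word cache block under three memory
# organizations, using the numbers from the example: 4 cycles for
# address, 56-cycle access time per word, 4 cycles per word transfer.
ADDR, ACCESS, XFER, WORDS = 4, 56, 4, 4

# One-word bus, one-word memory: every word is fully serialized.
narrow = WORDS * (ADDR + ACCESS + XFER)      # 4*(4+56+4)

# 4-word-wide bus and memory: one address, one access, one transfer.
wide = 1 * (ADDR + ACCESS + XFER)            # 1*(4+56+4)

# Four interleaved 1-word banks on a 1-word bus: address and access
# happen once, in parallel across banks; transfers are serialized.
interleaved = ADDR + ACCESS + WORDS * XFER   # 4 + 56 + 4*4

print(narrow, wide, interleaved)  # → 256 64 76
```

The interleaved organization gets most of the benefit of the wide one without widening the bus, because the dominant cost (the 56-cycle access) is paid only once.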
RAMBUS, SDRAM, DDR
• Techniques to improve the interface of main memory chips
– These are still DRAMs with high density and slow access times
– The techniques focus on improving transfer rates (bandwidth)
SDRAM/DDR
• Synchronous DRAM
– Adds a clock signal to the memory interface to avoid synchronization overheads
– PC100, PC133, PC150: clock rates (MHz) of SDRAM memory chips
• DDR DRAM
– Synchronous, and transfers on both edges of the clock
» Double data rate
RAMBUS
• A memory system within a chip
– Supports interleaved accesses within the internal memory banks of each chip
– Supports multiple outstanding transactions (in conjunction with a pipelined, or split-transaction, bus)
– RAMBUS Inc. does not fabricate memory chips; it licenses its technology to companies using its interface
RAMBUS - notes
• Expensive because:
– Requires more complexity in the memory chip
» And licensing fees
– Also requires more complexity in the interface
» Bus, chipset
• Improves bandwidth, but at its core it is still DRAM
– Per-access latency is slow
Virtual memory
[Figure: the memory hierarchy]

Level          Capacity     Access time   Cost
CPU registers  100s bytes   <10s ns
Cache          KBytes       10-100 ns     $.01-.001/bit
Main memory    MBytes       100 ns-1 us   $.01-.001/bit
Disk           GBytes       ms            10^-3 - 10^-4 cents/bit
Tape           infinite     sec-min       10^-6 cents/bit

Data is staged between adjacent levels in different transfer units:

Between                 Staging unit     Managed by       Transfer size
Registers <-> Cache     Instr. operands  prog./compiler   1-8 bytes
Cache <-> Memory        Blocks           cache cntl       8-128 bytes
Memory <-> Disk         Pages            OS               512-4K bytes
Disk <-> Tape           Files            user/operator    MBytes

Upper levels are faster; lower levels are larger.
Memory addressing - physical
• So far we assumed addresses of LD/SDs go directly to caches/memory
• Complex to manage if a computer is multi-processed/multi-user
– Multiple users want to share the same (physical) main memory
• Limits the addressing space of programs to the physical main memory available
Example
• How do you assign addresses within a program so that you know other users/programs will not conflict with them?

Program A:            Program B:
SD 0x00000100,1       SD 0x00000100,5
LD R1,0x00000100

R1 = ? Both programs stored to the same main-memory address 0x00000100.
Memory addressing - virtual
Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A:                Translation B:
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100
Virtual memory
• Three main goals:
– Allow efficient sharing of physical memory among multiple processes/users
– Allow address spaces that are larger than physical memory
» Use hard disk storage as main memory
» In a way that is user-transparent
• Unlike earlier “overlay” techniques
– Allow user-transparent relocation
» Previous example
Virtual Memory
Provides the illusion of very large memory
– the sum of the memory of many jobs can be greater than physical memory
– the address space of each job can be larger than physical memory
Simplifies memory management and programming
Exploits the memory hierarchy to keep average access time low
Involves at least two storage levels: main and secondary
Main (DRAM): nanoseconds, M/GBytes
Secondary (HD): milliseconds, G/TBytes

Virtual Address -- address used by the programmer
Memory Address -- address of a word in physical memory; also known as “physical address” or “real address”
Basic Issues in VM Design
Transfer unit between disk and memory: pages
– virtual and physical address spaces are partitioned into blocks of equal size (typically a few KBytes)
– the physical-memory blocks are called page frames
A missing item is fetched from secondary memory only on the occurrence of a page fault
Address translation
Example

Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A:                Translation B:
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100
Address translation
VA = 0x00000 100 (page number 0x00000, offset 0x100)
Page number is translated: 0x00000 -> 0x40000
PA = translated page number concatenated with offset = 0x40000100
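The split-translate-concatenate step above can be sketched as follows; the single-entry page table is hypothetical, holding only process A's mapping from the example:

```python
# Translate a virtual address by splitting it into a 20-bit virtual
# page number and a 12-bit offset (4 KB pages), looking the page
# number up, and concatenating the frame number with the offset.
PAGE_BITS = 12

page_table = {0x00000: 0x40000}  # process A: virtual page 0x00000 -> frame 0x40000

def translate(va):
    vpn = va >> PAGE_BITS                  # virtual page number
    offset = va & ((1 << PAGE_BITS) - 1)   # offset is unchanged by translation
    frame = page_table[vpn]                # a missing entry would be a page fault
    return (frame << PAGE_BITS) | offset   # concatenate frame and offset

print(hex(translate(0x00000100)))  # → 0x40000100
```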
Protection
• In addition to address mapping, protection/state bits are added to the page table
– E.g.: valid (V), user-readable (R), user-writable (R/W), executable (X)
– More later
Address Mapping Algorithm
Look up the table entry for VA; if an entry exists for VA, and it is valid,
then the page is in main memory at the frame address stored in the table
else the address locates the page in secondary memory

Access Rights: R = read-only, R/W = read/write, X = execute-only
If the kind of access is not compatible with the specified access rights, then protection_violation_fault
If the valid bit is not set, then page_fault

Protection Fault: access-rights violation; causes a trap to a hardware, microcode, or software fault handler
Page Fault: page not resident in physical memory; also causes a trap; usually accompanied by a context switch: the current process is suspended while the page is fetched from secondary storage
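The mapping algorithm with its two fault cases can be sketched as below; the table contents and the string fault codes are illustrative, not a real O/S interface:

```python
# Page-table lookup with valid bit and access-rights check.
PAGE_BITS = 12

page_table = {
    # vpn: (valid, rights, frame) -- hypothetical entries
    0x00000: (True,  "R/W", 0x40000),
    0x00001: (False, "R/W", None),    # not resident in memory
}

def access(va, kind):  # kind is one of "R", "W", "X"
    entry = page_table.get(va >> PAGE_BITS)
    if entry is None or not entry[0]:
        return "page_fault"           # no valid translation: fetch from disk
    valid, rights, frame = entry
    allowed = {"R": rights in ("R", "R/W"),
               "W": rights == "R/W",
               "X": rights == "X"}
    if not allowed[kind]:
        return "protection_violation_fault"
    return hex((frame << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1)))

print(access(0x00000100, "R"))  # → 0x40000100
print(access(0x00001100, "R"))  # → page_fault
print(access(0x00000100, "X"))  # → protection_violation_fault
```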
4 Q’s of virtual memory
• Q1: Where can a block be placed in main memory?
– Disks are orders of magnitude slower than main memory
– Need to reduce occurrence of misses as much as possible
– Also, placement is controlled by software (the operating system), not hardware
⇒ Fully associative (a page can be placed in any page frame in memory)
4 Qs
• Q2: How is a block found in main memory?
– Via the page table and concatenation of the offset
• Q3: Which block should be replaced on a virtual memory miss?
– Goal: minimize occurrence of misses (page faults)
– Least-recently used
• Q4: What happens on a write?
– Write-back instead of write-through
» With dirty bits
Page tables and processes
• A process in typical operating systems has a context that includes:
– The values of all CPU registers (including the PC)
– The page table
• Virtual-physical address translations (page tables) are kept on a per-process basis
Example

Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A (via PT1):      Translation B (via PT2):
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100
Page table structures
Example: 32-bit virtual and physical addresses, 4-KByte pages — how large is the page table?

Page table sizes
• 4 KBytes – 12 bits (offset)
– Index into page table: 20 bits
– Each entry: 20 bits + valid/protection/etc.
» Let us assume 4 bytes for simplicity
• Total size:
– 2^20 * 4 = 4 MB
– One per process!
» A typical Unix machine has dozens of processes
• Hundreds of MB just for page tables?
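The size arithmetic above, spelled out:

```python
# Flat page table size for a 32-bit address space with 4 KB pages.
page_bits = 12                  # 4 KB pages -> 12-bit offset
vpn_bits = 32 - page_bits       # 20-bit index into the page table
entries = 2 ** vpn_bits         # 2^20 = 1,048,576 entries
bytes_per_entry = 4             # ~20-bit frame + valid/protection bits, rounded up

size = entries * bytes_per_entry
print(size // 2**20, "MB per process")  # → 4 MB per process
```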
Dealing with page table sizes
• One solution:
– Increase page sizes
» Other problems arise
• Larger block sizes -> more conflicts, larger page-fault penalties
• Internal fragmentation
• Other approaches
– Change the way the page table itself is structured
» Inverted page tables
» Multi-level page tables
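A two-level page table can be sketched as below: the 20-bit virtual page number is split into two 10-bit indices, so second-level tables are allocated only for regions of the address space that are actually used. The structures here (Python dicts standing in for the tables) are illustrative:

```python
# Two-level page table: vpn = (10-bit level-1 index, 10-bit level-2 index).
L2_BITS = 10
PAGE_BITS = 12

root = {}  # level-1 table: l1 index -> level-2 table (allocated on demand)

def split(va):
    vpn = va >> PAGE_BITS
    return vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1)

def map_page(va, frame):
    l1, l2 = split(va)
    root.setdefault(l1, {})[l2] = frame   # allocate level-2 table if needed

def translate(va):
    l1, l2 = split(va)
    frame = root[l1][l2]                  # a KeyError would model a page fault
    return (frame << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1))

map_page(0x00000100, 0x40000)
print(hex(translate(0x00000100)))  # → 0x40000100
print(len(root))                   # → 1 (one level-2 table, not 2^20 entries)
```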
Virtual Addresses and Caches
[Diagram: the CPU issues a VA; translation produces a PA, which goes to the cache; on a hit, data is returned; on a miss, main memory is accessed]
It takes (at least) one extra memory access to translate VA to PA
Must access page table, which, itself, is stored in main memory
Fast translation techniques
• If not done carefully, translation can yield poor performance
– One (or more) extra memory accesses to the page table for every memory reference
» A single memory access is already very slow if it misses the cache
• Once again, exploit locality
– Maintain a cache of recent translations – a translation look-aside buffer (TLB)
» Smaller and faster than the L1 cache
TLBs
Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.
TLBs are usually small, typically not more than 128-256 entries. This permits fully associative lookups.
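Since the TLB is just a small cache of translations, its behavior can be sketched with a fully associative TLB using LRU replacement; the sizes and the backing page table here are illustrative, not taken from any real machine:

```python
from collections import OrderedDict

# Tiny fully associative TLB with LRU replacement.
PAGE_BITS, TLB_ENTRIES = 12, 4
page_table = {vpn: 0x40000 + vpn for vpn in range(32)}  # hypothetical mappings
tlb = OrderedDict()  # vpn -> frame, kept in LRU order
misses = 0

def translate(va):
    global misses
    vpn, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:
        tlb.move_to_end(vpn)          # TLB hit: refresh LRU position
    else:
        misses += 1                   # TLB miss: walk the page table
        if len(tlb) == TLB_ENTRIES:
            tlb.popitem(last=False)   # evict the least recently used entry
        tlb[vpn] = page_table[vpn]
    return (tlb[vpn] << PAGE_BITS) | offset

# Same pages re-referenced hit the TLB; only two distinct pages are walked.
for va in [0x0100, 0x0200, 0x1100, 0x0104]:
    translate(va)
print(misses)  # → 2
```

Locality does the work: most references land on recently translated pages, so the page-table walk is paid only on the rare TLB miss.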
[Diagram: translation with a TLB — the CPU issues a VA; a TLB lookup on a hit yields the PA directly to the cache; on a TLB miss, the full translation through the page table is performed]
Reducing Translation Time
• Machines with TLBs go one step further to reduce cache access time
• May overlap the cache access with the TLB access
– Virtually-indexed, physically-tagged caches
• Or, index the cache with the virtual address and keep VA tags
– Virtually-addressed caches
VA-addressed, PA-tagged
[Diagram: TLB and cache accessed in parallel — the 20-bit page number feeds an associative TLB lookup while the 12-bit offset supplies the 10-bit cache index (1K entries of 4 bytes each); the 20-bit PA from the TLB is compared against the cache tag, together with the valid bit, to determine hit/miss]
Use only the offset part of the virtual address to index the cache; the offset is independent of translation, so the cache access can occur in parallel with the TLB lookup.
The cache block tag and the TLB translation (both physical addresses) are then compared to determine hit/miss.
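The constraint can be checked with a little bit arithmetic: overlap works only when the cache index plus block-offset bits fit within the page offset. The helper below is a sketch (its name and parameters are ours, not from the slides):

```python
# Can cache indexing overlap with TLB translation?  Yes, iff the
# index + block-offset bits all lie within the (untranslated) page offset.
def can_overlap(page_bytes, cache_bytes, block_bytes, ways=1):
    page_offset_bits = page_bytes.bit_length() - 1
    sets = cache_bytes // (block_bytes * ways)
    index_bits = sets.bit_length() - 1
    block_bits = block_bytes.bit_length() - 1
    return index_bits + block_bits <= page_offset_bits

# 4 KB pages, 4-byte blocks, as in the figure:
print(can_overlap(4096, 4096, 4))           # → True  (10 + 2 <= 12)
print(can_overlap(4096, 8192, 4))           # → False (11 + 2 >  12)
print(can_overlap(4096, 8192, 4, ways=2))   # → True  (10 + 2 <= 12)
```

The last two lines preview the limitation discussed next: doubling a direct-mapped cache to 8 KB breaks the overlap, while making it 2-way set associative restores it.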
Limitations
• Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation
• This usually limits things to small caches, large page sizes, or highly set-associative caches if you want a large cache
Example: suppose everything is the same except that the cache is increased to 8 KBytes instead of 4 KBytes:
[Diagram: the cache index grows to 11 bits and overlaps the 20-bit virtual page number by one bit; that bit is changed by VA translation, but is needed for cache lookup]
Solutions:
– go to 8-KByte page sizes
– go to a 2-way set-associative cache (would allow you to continue to use a 10-bit index)
[Diagram: 2-way set-associative cache — 1K sets of two 4-byte blocks, indexed by 10 bits]
VA-addressed, VA-tagged
• An alternative is to index the cache with a virtual address
– And also store tags of virtual (not physical) addresses for tag comparison
– “virtual caches”, “virtually addressed caches”
• Must watch out for multi-programming issues
– Key issue: unlike PAs, VAs are not unique and are mapped on a per-process basis
Example

Program A:                    Program B:
SD 0x00000100,1               SD 0x00000100,5
LD R1,0x00000100

Translation A:                Translation B:
0x00000100 -> 0x40000100      0x00000100 -> 0x50000100

If the cache uses physical tags (e.g. tag = 5 most-significant hex digits):
LD R1 will compare 0x50000 with the tag stored in the cache; if the cache holds the value stored by program A, the tags won’t match.
If the cache uses virtual tags:
LD R1 will compare 0x00000 with the tag stored in the cache; if not careful, LD may result in R1=1.
Virtual caches & processes
• A simple solution:
– Flush the entire virtual cache contents on an O/S context switch
» “Brute force” guarantee that the cache always holds data relative to a single process
» Negative impact on performance; flushing can get rid of data that will be needed by the processor in the near future
• Alternative:
– Add a process-identifier (PID) field to each block
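The PID-field alternative can be sketched as below: by making the PID part of the tag match, two processes can use the same virtual tag without false hits, and no flush is needed on a context switch. The structures are purely illustrative:

```python
# Virtually-tagged cache entries extended with a process ID (PID):
# a lookup hits only if both the PID and the virtual tag match.
cache = {}  # (pid, vtag) -> data

def write(pid, vtag, data):
    cache[(pid, vtag)] = data

def read(pid, vtag):
    return cache.get((pid, vtag))  # None models a miss

write(pid=1, vtag=0x00000, data="A's value")
write(pid=2, vtag=0x00000, data="B's value")

# Same virtual tag, different processes: no false hit, no flush required.
print(read(1, 0x00000))  # → A's value
print(read(2, 0x00000))  # → B's value
```

The cost is a wider tag and a comparator per way; the benefit is keeping each process's working set warm across context switches.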
Flushing vs. storing PIDs
Additional issues – virtual caches
• Protection
– Must be checked on every access
– Protection bits must be present in the virtual cache
• Aliasing
– Programs may map different VAs to the same PA
– Example: shared code pages, shared memory
– Must make sure all aliases map into the same cache block
» Otherwise changes through one aliased address will not be seen by other processes
» Not a problem with physically-indexed caches; all aliased VAs map into a single PA, i.e. a single location in the cache
Handling Protection
• Physical main memory is shared by multiple processes
– Via the virtual memory abstraction
• But a process/user does not want other processes/users accessing its data
– Unless explicitly permitted
– Users expect this level of protection from others; it must be implemented by hardware, software, or both
Protection example
• Example: the O/S and two processes A, B
– Time-sharing; the O/S lets A use the CPU for some time, then switches context to B
» Without removing all pages used by A from physical memory
– If B can change its own page table entries, it can map an address from its virtual address space to a physical address in use by A
» May load/store A’s data
VM protection
• Key ideas:
– Enforce protection at the granularity of a page
– Before accessing any physical page, check its protection
» As part of the translation process
– Implement at least two levels: kernel (supervisor; privileged to the O/S) and user
» Setup of protection bits is done by privileged software (the kernel)
Page Tables
• With kernel/user modes, the O/S can protect the page tables:
– Place the tables in memory locations available only in kernel mode
» Ensures users cannot overwrite translations
• Once page tables are protected by the kernel:
– The O/S can guarantee each page of a process maps to a distinct memory page
– Processes are protected from one another by having their own page tables