Operating Systems & Memory Systems: Address Translation
Computer Science 220
ECE 252
Professor Alvin R. Lebeck
Fall 2006
CPS 220 © Alvin R. Lebeck 2001
Outline
• Finish Main Memory
• Address Translation
– basics
– 64-bit Address Space
• Managing memory
• OS Performance
Throughout
• Review Computer Architecture
• Interaction with Architectural Decisions
Fast Memory Systems: DRAM specific
• Multiple RAS accesses: several names (page mode)
– 64 Mbit DRAM: cycle time = 100 ns, page mode = 20 ns
• New DRAMs to address the processor-DRAM performance gap; what will they cost, will they survive?
– Synchronous DRAM: provide a clock signal to the DRAM, transfer synchronous to the system clock
– RAMBUS: reinvent DRAM interface (Intel will use it)
» Each Chip a module vs. slice of memory
» Short bus between CPU and chips
» Does own refresh
» Variable amount of data returned
» 1 byte / 2 ns (500 MB/s per chip)
– Cached DRAM (CDRAM): Keep entire row in SRAM
Main Memory Summary
• Big DRAM + Small SRAM = Cost Effective
– Cray C-90 uses all SRAM (how many sold?)
• Wider Memory
• Interleaved Memory: for sequential or independent accesses
• Avoiding bank conflicts: SW & HW
• DRAM specific optimizations: page mode & Specialty DRAM, CDRAM
– Niche memory or main memory?
» e.g., Video RAM for frame buffers, DRAM + fast serial output
• IRAM: Do you know what it is?
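Interleaved memory is about which bank each address lands in. A minimal sketch, assuming word-level interleaving with 8-byte words and 4 banks (both values are illustrative, not from the slide), shows sequential words rotating across the banks and a bank-sized stride colliding in a single bank:

```python
# Word-interleaved memory: consecutive words go to consecutive banks.
WORD_BYTES = 8   # assumed word size
NUM_BANKS = 4    # assumed bank count (a power of two)

def bank_of(addr: int) -> tuple[int, int]:
    """Return (bank, word index within that bank) for a byte address."""
    word = addr // WORD_BYTES
    return word % NUM_BANKS, word // NUM_BANKS

# Sequential words rotate through all banks, so their accesses can overlap.
print([bank_of(a)[0] for a in range(0, 64, WORD_BYTES)])                # [0, 1, 2, 3, 0, 1, 2, 3]

# A stride of NUM_BANKS words sends every access to the same bank (a bank conflict).
print([bank_of(a)[0] for a in range(0, 128, WORD_BYTES * NUM_BANKS)])   # [0, 0, 0, 0]
```

The second case is what the "avoiding bank conflicts" techniques (e.g., padding arrays in software) try to prevent.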
Review: Reducing Miss Penalty Summary
• Five techniques
– Read priority over write on miss
– Subblock placement
– Early Restart and Critical Word First on miss
– Non-blocking Caches (Hit Under Miss)
– Second Level Cache
• Can be applied recursively to Multilevel Caches
– Danger is that time to DRAM will grow with multiple levels in between
Review: Improving Cache Performance
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache
Review: Cache Optimization Summary
Technique                            MR   MP   HT   Complexity
Larger Block Size                    +    –         0
Higher Associativity                 +         –    1
Victim Caches                        +              2
Pseudo-Associative Caches            +              2
HW Prefetching of Instr/Data         +              2
Compiler Controlled Prefetching      +              3
Compiler Reduce Misses               +              0
Priority to Read Misses                   +         1
Subblock Placement                        +    +    1
Early Restart & Critical Word 1st         +         2
Non-Blocking Caches                       +         3
Second Level Caches                       +         2
Small & Simple Caches                –         +    0
Avoiding Address Translation                   +    2
Pipelining Writes                              +    1
(MR = miss rate, MP = miss penalty, HT = hit time; + helps, – hurts)
System Organization
[Figure: processor and cache connect through the core chip set to main memory; the I/O bus links the disk controller (disks), graphics controller (graphics), and network interface (network), which signal the processor via interrupts]
Computer Architecture
• Interface Between Hardware and Software
[Figure: layered view with applications, compiler, and operating system (software) above CPU, memory, I/O, and multiprocessor networks (hardware); the hardware/software interface is labeled "This is IT"]
Memory Hierarchy 101
• P (processor): very fast, <1ns clock, multiple instructions per cycle
• $ (cache): SRAM, fast, small, expensive
• Memory: DRAM, slow, big, cheap (called physical or main memory)
• Disk: magnetic, really slow, really big, really cheap
=> Cost Effective Memory System (Price/Performance)
Virtual Memory: Motivation
• Process = Address Space + thread(s) of control
• Address space = PA
– programmer controls movement from disk
– protection?
– relocation?
• Linear Address space
– larger than physical address space
» 32, 64 bits vs. 28-bit physical (256MB)
• Automatic management
[Figure: virtual address space mapped onto a smaller physical memory]
Virtual Memory
• Process = virtual address space + thread(s) of control
• Translation
– VA -> PA
– What physical address does virtual address A map to?
– Is VA in physical memory?
• Protection (access control)
– Do you have permission to access it?
Virtual Memory: Questions
• How is data found if it is in physical memory?
• Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped
• What data should be replaced on a miss? (Take Compsci 210 …)
Segmented Virtual Memory
• Virtual address (2^32, 2^64) to Physical Address mapping (2^30)
• Variable size, base + offset, contiguous in both VA and PA
[Figure: variable-size segments of the virtual address space mapped to contiguous regions of physical memory]
Intel Pentium Segmentation
[Figure: a logical address (segment selector, offset); the selector picks a segment descriptor in the Global Descriptor Table (GDT), whose segment base address is added to the offset to form the physical address]
Pentium Segmentation (Continued)
• Segment Descriptors
– Local and Global
– base, limit, access rights
– Can define many
• Segment Registers
– contain segment descriptors (faster than load from mem)
– Only 6
• Must load segment register with a valid entry before segment can be accessed
– generally managed by compiler, linker, not programmer
Paged Virtual Memory
• Virtual address (2^32, 2^64) to Physical Address mapping (2^28)
– virtual page to physical page frame
• Fixed Size units for access control & translation
[Figure: fixed-size virtual pages mapped to physical page frames]
Virtual address: | Virtual page number | Offset |
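Because pages are fixed-size, translation only has to replace the virtual page number; the offset passes through unchanged. A minimal sketch, assuming 4 KB pages (suggested by the 0x1000 spacing in the figure) and a page frame number that some page table has already supplied:

```python
PAGE_SIZE = 0x1000                          # assumed 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 12 offset bits

def split_va(va: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, page offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)

def make_pa(pfn: int, offset: int) -> int:
    """Rebuild a physical address from a page frame number and the unchanged offset."""
    return (pfn << OFFSET_BITS) | offset

vpn, off = split_va(0x6ABC)
print(hex(vpn), hex(off))       # 0x6 0xabc
print(hex(make_pa(0x11, off)))  # 0x11abc
```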
Page Table
• Kernel data structure (per process)
• Page Table Entry (PTE)
– VA -> PA translations (if none, page fault)
– access rights (Read, Write, Execute, User/Kernel, cached/uncached)
– reference, dirty bits
• Many designs
– Linear, Forward mapped, Inverted, Hashed, Clustered
• Design Issues
– support for aliasing (multiple VA to single PA)
– large virtual address space
– time to obtain translation
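The PTE fields above can be modelled as a record in a linear page table indexed by virtual page number. This is a sketch under stated assumptions: 4 KB pages, a Python list standing in for the per-process table, and the hypothetical names `PTE`, `translate`, `PageFault`, and `AccessViolation`. It checks only the read/write rights and leaves the user/kernel, execute, and cached bits unchecked:

```python
from dataclasses import dataclass

OFFSET_BITS = 12   # assumed 4 KB pages

class PageFault(Exception):
    """No valid translation: the OS must bring the page in."""

class AccessViolation(Exception):
    """A translation exists but the access rights forbid this access."""

@dataclass
class PTE:
    valid: bool = False
    pfn: int = 0               # physical page frame number
    read: bool = False
    write: bool = False
    execute: bool = False      # stored but not checked in this sketch
    user: bool = False         # stored but not checked in this sketch
    referenced: bool = False   # set on any access; used for replacement
    dirty: bool = False        # set on write; page must be written back

def translate(page_table: list, va: int, write: bool = False) -> int:
    """Look up va in a linear page table (one PTE per virtual page)."""
    vpn, offset = va >> OFFSET_BITS, va & ((1 << OFFSET_BITS) - 1)
    if vpn >= len(page_table) or not page_table[vpn].valid:
        raise PageFault(hex(va))
    pte = page_table[vpn]
    if (write and not pte.write) or (not write and not pte.read):
        raise AccessViolation(hex(va))
    pte.referenced = True
    if write:
        pte.dirty = True
    return (pte.pfn << OFFSET_BITS) | offset

page_table = [PTE() for _ in range(16)]
page_table[6] = PTE(valid=True, pfn=0x11, read=True, write=True)
print(hex(translate(page_table, 0x6ABC, write=True)))   # 0x11abc
```

A linear table needs one entry per virtual page, which is why the other designs listed above exist for large address spaces.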
Alpha VM Mapping (Forward Mapped)
• “64-bit” address divided into 3 segments
– seg0 (bit 63 = 0): user code/heap
– seg1 (bit 63 = 1, 62 = 1): user stack
– kseg (bit 63 = 1, 62 = 0): kernel segment for OS
• Three level page table, each one page
– Alpha 21064: only 43 unique bits of VA
– (future min page size up to 64KB => 55 bits of VA)
• PTE bits: valid, kernel & user read & write enable (no reference, use, or dirty bit)
– What do you do for replacement?
[Figure: VA fields seg 0/1 (21 bits) | L1 (10) | L2 (10) | L3 (10) | page offset PO (13); the page table base plus the L1 index selects an L2 table, plus the L2 index selects an L3 table, plus the L3 index yields the physical page frame number]
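The walk in the figure can be written out directly. A sketch, assuming the 21064 layout of 8 KB pages (13-bit offset) and 10-bit indices per level (an 8 KB table of 8-byte PTEs holds 1024 entries), with nested dicts standing in for page-sized tables; it ignores the seg bits and the PTE protection bits:

```python
PAGE_SHIFT = 13                      # 8 KB pages -> 13-bit page offset
LEVEL_BITS = 10                      # 1024 eight-byte PTEs per 8 KB table
LEVEL_MASK = (1 << LEVEL_BITS) - 1

class PageFault(Exception):
    pass

def walk(root: dict, va: int) -> int:
    """Translate va with a three-level forward-mapped table.

    Each table is a dict {index: entry}; a missing key stands in for an invalid PTE.
    Levels 1 and 2 hold the next table; level 3 holds the physical page frame number.
    """
    offset = va & ((1 << PAGE_SHIFT) - 1)
    indices = [(va >> (PAGE_SHIFT + LEVEL_BITS * level)) & LEVEL_MASK
               for level in (2, 1, 0)]          # L1, L2, L3 indices
    entry = root
    for index in indices:                       # one memory reference per level
        if index not in entry:
            raise PageFault(hex(va))
        entry = entry[index]
    return (entry << PAGE_SHIFT) | offset

root = {1: {2: {3: 0x42}}}                      # L1[1] -> L2[2] -> L3[3] = frame 0x42
va = (1 << 33) | (2 << 23) | (3 << 13) | 0x10
print(hex(walk(root, va)))                      # 0x84010
```

Each level costs a memory reference, which is the TLB-miss cost the later slides compare against inverted and hashed tables.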
Inverted Page Table (HP, IBM)
• One PTE per page frame
– only one VA per physical frame
• Must search for virtual address
• More difficult to support aliasing
• Force all sharing to use the same VA
[Figure: the virtual page number is hashed into the Hash Anchor Table (HAT), which points to a chain of entries in the Inverted Page Table (IPT); entries hold the VA and PA, ST (status)]
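An inverted table is searched by hashing rather than indexed, and an entry's position doubles as its frame number. A sketch, assuming a (process ID, VPN) hash key and an XOR-modulo hash (both hypothetical; the slide does not specify them), and that the caller passes a free frame to `map`:

```python
class InvertedPageTable:
    """One entry per physical frame; an entry's index is its frame number."""

    def __init__(self, num_frames: int, hat_size: int):
        self.hat = [None] * hat_size        # hash anchor table: bucket -> first frame in chain
        self.tag = [None] * num_frames      # (pid, vpn) held by each frame, or None
        self.next = [None] * num_frames     # next frame in the same hash chain

    def _hash(self, pid: int, vpn: int) -> int:
        return (vpn ^ pid) % len(self.hat)  # assumed hash function

    def map(self, pid: int, vpn: int, frame: int) -> None:
        """Record that `frame` (assumed free) now holds page (pid, vpn)."""
        bucket = self._hash(pid, vpn)
        self.tag[frame] = (pid, vpn)
        self.next[frame] = self.hat[bucket] # push onto the front of the chain
        self.hat[bucket] = frame

    def lookup(self, pid: int, vpn: int):
        """Return the frame holding (pid, vpn), or None (page fault)."""
        frame = self.hat[self._hash(pid, vpn)]   # memory reference to the HAT
        while frame is not None:                 # then walk the IPT chain
            if self.tag[frame] == (pid, vpn):
                return frame
            frame = self.next[frame]
        return None

ipt = InvertedPageTable(num_frames=8, hat_size=4)
ipt.map(pid=1, vpn=0x6, frame=3)
ipt.map(pid=1, vpn=0x2, frame=5)               # hashes to the same bucket as vpn 0x6
print(ipt.lookup(1, 0x6), ipt.lookup(1, 0x7))  # 3 None
```

Because each frame stores exactly one tag, a second virtual address for the same frame has nowhere to go, which is the aliasing limitation noted above.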
Intel Pentium Segmentation + Paging
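[Figure: a logical address (segment selector, offset) is turned into a linear address via the segment descriptor in the Global Descriptor Table (GDT) and its segment base address; the linear address (Dir, Table, Offset) is then translated through the page directory and a page table into the physical address space]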
The Memory Management Unit (MMU)
• Input
– virtual address
• Output
– physical address
– access violation (exception, interrupts the processor)
• Access Violations
– not present
– user v.s. kernel
– write
– read
– execute
Translation Lookaside Buffers (TLB)
• Need to perform address translation on every memory reference
– 30% of instructions are memory references
– 4-way superscalar processor
– at least one memory reference per cycle
• Make Common Case Fast, others correct
• Throw HW at the problem
• Cache PTEs
Fast Translation: Translation Buffer
• Cache of translated addresses
• Alpha 21164 TLB: 48 entry fully associative
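[Figure: the page number is compared against the tags of all 48 entries (fields v, r, w, tag, phys frame); a 48:1 mux selects the matching physical frame, which is combined with the page offset]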
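A sketch of the lookup in a fully associative TLB such as the 48-entry one above, with the v, r, w, tag, and phys frame fields from the figure. Round-robin replacement is an assumption made to keep the example deterministic; the slide does not say which policy the 21164 uses:

```python
from dataclasses import dataclass

@dataclass
class TLBEntry:
    valid: bool = False
    read: bool = False
    write: bool = False
    tag: int = 0          # full virtual page number: no index bits when fully associative
    pfn: int = 0          # physical frame number

class FullyAssociativeTLB:
    def __init__(self, size: int = 48):
        self.entries = [TLBEntry() for _ in range(size)]
        self.next_victim = 0          # round-robin replacement pointer (assumed policy)

    def lookup(self, vpn: int):
        """Return the matching entry, or None on a TLB miss.

        Hardware compares every tag at once and a 48:1 mux picks the hit;
        scanning the list models the same result.
        """
        for entry in self.entries:
            if entry.valid and entry.tag == vpn:
                return entry
        return None

    def insert(self, vpn: int, pfn: int, read: bool = True, write: bool = False) -> None:
        victim = self.entries[self.next_victim]
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        victim.valid, victim.tag, victim.pfn = True, vpn, pfn
        victim.read, victim.write = read, write

tlb = FullyAssociativeTLB()
tlb.insert(0x6, 0x11, write=True)
hit = tlb.lookup(0x6)
print(hex(hit.pfn), hit.write)   # 0x11 True
```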
TLB Design
• Must be fast, not increase critical path
• Must achieve high hit ratio
• Generally small highly associative
• Mapping change
– page removed from physical memory
– processor must invalidate the TLB entry
• PTE is a per-process entity
– Multiple processes with same virtual addresses
– Context Switches?
• Flush TLB
• Add ASID (PID)
– part of processor state, must be set on context switch
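With an ASID in each entry, a context switch only changes the current ASID instead of flushing. A sketch, assuming a dict keyed by (ASID, VPN) as a stand-in for tagged hardware entries; an untagged TLB would instead have to clear every entry in `context_switch`:

```python
class ASIDTaggedTLB:
    """Entries are tagged with an address-space ID, so no flush is needed."""

    def __init__(self):
        self.entries = {}              # (asid, vpn) -> pfn
        self.asid = 0                  # processor state, set on each context switch

    def context_switch(self, new_asid: int) -> None:
        self.asid = new_asid           # other processes' entries stay but cannot hit

    def insert(self, vpn: int, pfn: int) -> None:
        self.entries[(self.asid, vpn)] = pfn

    def lookup(self, vpn: int):
        return self.entries.get((self.asid, vpn))

    def invalidate(self, asid: int, vpn: int) -> None:
        """Mapping change: the OS removed this page from physical memory."""
        self.entries.pop((asid, vpn), None)

tlb = ASIDTaggedTLB()
tlb.context_switch(1)
tlb.insert(0x10, 0xA)
tlb.context_switch(2)
print(tlb.lookup(0x10))     # None: same VA, different process
tlb.insert(0x10, 0xB)
tlb.context_switch(1)
print(tlb.lookup(0x10))     # 10: process 1's entry survived the switch
```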
Hardware Managed TLBs
• Hardware Handles TLB miss
• Dictates page table organization
• Complicated state machine to “walk page table”
– Multiple levels for forward mapped
– Linked list for inverted
• Exception only if access violation
[Figure: CPU, TLB, and hardware control logic that walks the page table in memory]
Software Managed TLBs
• Software Handles TLB miss
• Flexible page table organization
• Simple Hardware to detect Hit or Miss
• Exception if TLB miss or access violation
• Should you check for access violation on TLB miss?
[Figure: CPU, TLB, control, and memory; the control only detects hit or miss]
Mapping the Kernel
• Digital Unix Kseg
– kseg (bit 63 = 1, 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for entire Kernel
• Lock (pin) TLB entry
– or special HW detection
[Figure: 64-bit virtual address space (0 to 2^64-1) holding user code/data, user stack, and the kernel region, which maps directly onto physical memory]
Considerations for Address Translation
Large virtual address space
• Can map more things
– files
– frame buffers
– network interfaces
– memory from another workstation
• Sparse use of address space
• Page Table Design
– space
– less locality => TLB misses
OS structure
• microkernel => more TLB misses
Address Translation for Large Address Spaces
• Forward Mapped Page Table
– grows with virtual address space
» worst case 100% overhead not likely
– TLB miss time: memory reference for each level
• Inverted Page Table
– grows with physical address space
» independent of virtual address space usage
– TLB miss time: memory reference to HAT, IPT, list search
Hashed Page Table (HP)
• Combine Hash Table and IPT [Huck96]
– can have more entries than physical page frames
• Must search for virtual address
• Easier to support aliasing than IPT
• Space
– grows with physical space
• TLB miss
– one less memory ref than IPT
[Figure: the virtual page number is hashed directly into the Hashed Page Table (HPT), whose entries hold VA and PA, ST]
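A hashed page table drops the HAT: the hash selects the entry chain directly (one fewer memory reference), and entries are not tied to frames, so two virtual pages can map to one frame. A sketch, with the (PID, VPN) key and XOR-modulo hash as assumptions:

```python
class HashedPageTable:
    """Hash of (pid, vpn) selects a bucket directly; no separate anchor table."""

    def __init__(self, num_buckets: int):
        self.buckets = [[] for _ in range(num_buckets)]   # each list models a chain of entries

    def _bucket(self, pid: int, vpn: int) -> list:
        return self.buckets[(vpn ^ pid) % len(self.buckets)]   # assumed hash function

    def map(self, pid: int, vpn: int, pfn: int) -> None:
        self._bucket(pid, vpn).append((pid, vpn, pfn))

    def lookup(self, pid: int, vpn: int):
        for entry_pid, entry_vpn, pfn in self._bucket(pid, vpn):
            if (entry_pid, entry_vpn) == (pid, vpn):
                return pfn
        return None                                        # miss: page fault

# Aliasing: two virtual pages share frame 7, each with its own entry.
hpt = HashedPageTable(64)
hpt.map(pid=1, vpn=0x10, pfn=7)
hpt.map(pid=2, vpn=0x20, pfn=7)
print(hpt.lookup(1, 0x10), hpt.lookup(2, 0x20))   # 7 7
```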
Clustered Page Table (SUN)
• Combine benefits of HPT and Linear [Talluri95]
• Store one base VPN (TAG) and several PPN values
– virtual page block number (VPBN)
– block offset
[Figure: the virtual address is split into VPBN, block offset (Boff), and page offset; the VPBN is hashed to a chain of nodes, each holding a VPBN tag, a next pointer, and entries PA0..PA3 with attributes; Boff selects the entry]
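The clustered design stores one VPBN tag for a block of consecutive pages. A sketch, assuming a subblock factor of 4 (matching PA0..PA3 in the figure) and a modulo hash; `Node` and `ClusteredPageTable` are illustrative names:

```python
from dataclasses import dataclass, field

SUBBLOCK = 4                     # PTEs per node, matching PA0..PA3 in the figure

@dataclass
class Node:
    vpbn: int                    # one tag shared by SUBBLOCK consecutive pages
    pages: list = field(default_factory=lambda: [None] * SUBBLOCK)   # (pfn, attrib) or None

class ClusteredPageTable:
    def __init__(self, num_buckets: int):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, vpbn: int) -> list:
        return self.buckets[vpbn % len(self.buckets)]    # assumed hash function

    def map(self, vpn: int, pfn: int, attrib: str = "rw") -> None:
        vpbn, boff = divmod(vpn, SUBBLOCK)               # virtual page block number, block offset
        chain = self._chain(vpbn)
        node = next((n for n in chain if n.vpbn == vpbn), None)
        if node is None:
            node = Node(vpbn)
            chain.append(node)
        node.pages[boff] = (pfn, attrib)

    def lookup(self, vpn: int):
        vpbn, boff = divmod(vpn, SUBBLOCK)
        for node in self._chain(vpbn):
            if node.vpbn == vpbn:
                return node.pages[boff]                  # None if this page is unmapped
        return None

cpt = ClusteredPageTable(16)
cpt.map(8, 100)
cpt.map(9, 101)                                          # same block as VPN 8: reuses the node
print(cpt.lookup(9), len(cpt._chain(2)))                 # (101, 'rw') 1
```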
Reducing TLB Miss Handling Time
• Problem
– must walk Page Table on TLB miss
– usually incur cache misses
– big problem for IPC in microkernels
• Solution
– build a small second-level cache in SW
– on TLB miss, first check SW cache
» use simple shift and mask index to hash table
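The second-level software cache can be a direct-mapped array indexed by a shift and a mask of the faulting address. A sketch, assuming 8 KB pages and 1024 entries; `toy_walk` is a hypothetical stand-in for the real page-table walk, and refilling the hardware TLB is not modelled:

```python
from typing import Callable

PAGE_SHIFT = 13                 # assumed 8 KB pages
STLB_BITS = 10                  # assumed 1024-entry software cache
STLB_MASK = (1 << STLB_BITS) - 1

class SoftwareTLBCache:
    """Direct-mapped cache of translations, checked first by the TLB miss handler."""

    def __init__(self):
        self.tags = [None] * (1 << STLB_BITS)   # cached virtual page numbers
        self.pfns = [0] * (1 << STLB_BITS)

    def handle_miss(self, va: int, walk_page_table: Callable[[int], int]) -> tuple[int, bool]:
        """Return (pfn, hit_in_software_cache)."""
        vpn = va >> PAGE_SHIFT                   # shift ...
        index = vpn & STLB_MASK                  # ... and mask
        if self.tags[index] == vpn:
            return self.pfns[index], True        # fast path: no page-table walk
        pfn = walk_page_table(vpn)               # slow path: full walk, likely cache misses
        self.tags[index], self.pfns[index] = vpn, pfn
        return pfn, False

walks = []
def toy_walk(vpn: int) -> int:                   # hypothetical stand-in for the real walk
    walks.append(vpn)
    return vpn + 100

stlb = SoftwareTLBCache()
print(stlb.handle_miss(0x6000, toy_walk))        # (103, False)
print(stlb.handle_miss(0x6000, toy_walk))        # (103, True)
```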
Cache Indexing
• Tag on each block
– No need to check index or block offset
• Increasing associativity shrinks index, expands tag
• Fully Associative: no index; Direct-Mapped: large index
Address format: | TAG | Index | Block offset | (TAG and Index together form the block address)
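The tag/index/offset split follows from the cache geometry. A sketch for power-of-two sizes (the 8 KB cache and 32-byte blocks are illustrative): the same address gets a shorter index and a longer tag as associativity grows, and no index at all when fully associative:

```python
def split_address(addr: int, cache_bytes: int, block_bytes: int, ways: int):
    """Return (tag, index, block_offset); all sizes must be powers of two."""
    sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1          # 0 when there is a single set
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(split_address(0x12345, 8192, 32, 1))      # direct-mapped: (9, 26, 5)
print(split_address(0x12345, 8192, 32, 2))      # 2-way:         (18, 26, 5)
print(split_address(0x12345, 8192, 32, 256))    # fully assoc.:  (2330, 0, 5)
```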
Address Translation and Caches
• Where is the TLB wrt the cache?
• What are the consequences?
• Most of today’s systems have more than 1 cache
– Digital 21164 has 3 levels
– 2 levels on chip (8KB-data, 8KB-inst, 96KB-unified)
– one level off chip (2-4MB)
• Does the OS need to worry about this?
Definition:
page coloring = careful selection of va->pa mapping
TLBs and Caches
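[Figure: three organizations]
• Conventional organization: CPU -> TLB -> cache -> memory; the VA is translated to a PA before the cache is accessed
• Virtually addressed cache: CPU -> cache -> TLB -> memory; translate only on a miss, at the cost of the alias (synonym) problem
• Overlap cache access with VA translation: the cache is checked against PA tags, with an L2 cache behind it; requires the cache index to remain invariant across translation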
Virtual Caches
• Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache vs. Physical Cache or Real Cache
• Avoid address translation before accessing cache
– faster hit time to cache
• Context Switches?
– Just like the TLB (flush or PID)
– Cost is time to flush + “compulsory” misses from empty cache
– Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process
• I/O must interact with cache
I/O and Virtual Caches
[Figure: processor with a virtual cache on the memory bus alongside main memory; an I/O bridge connects the memory bus to the I/O bus (disk controller with disks, graphics controller, network interface), which interrupts the processor; the I/O side uses physical addresses]
I/O is accomplished with physical addresses. DMA:
• flush pages from cache
• need pa->va reverse translation
• coherent DMA
Aliases and Virtual Caches
• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
• But, but... the virtual address is used to index the cache
• Could have data in two different locations in the cache
[Figure: one physical page mapped at two different locations in a 0 to 2^64-1 virtual address space containing the kernel, user stack, and user code/data]
Index with Physical Portion of Address
• If the index comes from the physical part of the address (the page offset, which translation does not change), the tag access can start in parallel with translation, and the tag read out is then compared with the physical tag
• Limits cache size (per way) to the page size: what if we want bigger caches and the same trick?
– Higher associativity
– Page coloring
Page Address | Page Offset
Address Tag | Index | Block Offset
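The constraint can be stated as: cache size divided by associativity must not exceed the page size, so that index and block offset lie inside the untranslated page offset. A small sketch of that arithmetic, assuming 8 KB pages:

```python
def can_index_before_translation(cache_bytes: int, ways: int, page_bytes: int) -> bool:
    """True if index + block offset fit inside the page offset, i.e. the bytes
    covered by one way (cache size / associativity) do not exceed a page."""
    return cache_bytes // ways <= page_bytes

def min_associativity(cache_bytes: int, page_bytes: int) -> int:
    """Smallest associativity that keeps the index within the page offset."""
    return max(1, cache_bytes // page_bytes)

PAGE = 8 * 1024                                            # assumed 8 KB pages
print(can_index_before_translation(8 * 1024, 1, PAGE))     # True
print(can_index_before_translation(32 * 1024, 1, PAGE))    # False
print(min_associativity(32 * 1024, PAGE))                  # 4
```

Growing the cache this way costs associativity; page coloring (next slide) is the alternative that keeps a direct-mapped organization.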
Page Coloring for Aliases
• HW that guarantees every cache frame holds a unique physical address
• OS guarantee: the lower n bits of the virtual and physical page numbers must have the same value; if the cache is direct-mapped, aliases then map to the same cache frame
– one form of page coloring
Page Address | Page Offset
Address Tag | Index | Block Offset
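The OS guarantee amounts to matching the low bits (the color) of the virtual and physical page numbers when choosing a frame. A sketch, assuming a 32 KB direct-mapped virtually indexed cache and 8 KB pages (so 4 colors); `pick_frame` is a hypothetical allocator helper:

```python
def num_colors(cache_bytes: int, page_bytes: int, ways: int = 1) -> int:
    """Page-sized bins per cache way: 2**n, where n is the number of
    index bits that lie above the page offset."""
    return max(1, cache_bytes // (page_bytes * ways))

def color(page_number: int, colors: int) -> int:
    return page_number % colors          # the low n bits of the page number

def pick_frame(vpn: int, free_frames: list, colors: int):
    """Remove and return a free frame with the same color as vpn, or None."""
    wanted = color(vpn, colors)
    for pfn in free_frames:
        if color(pfn, colors) == wanted:
            free_frames.remove(pfn)
            return pfn
    return None

colors = num_colors(32 * 1024, 8 * 1024)     # 4 colors
free = [8, 9, 10, 13]
pfn = pick_frame(5, free, colors)
print(colors, pfn)                           # 4 9
# Aliases at virtual pages 1 and 5 share frame 9's color, so they index the same cache frame.
print(color(1, colors) == color(5, colors) == color(pfn, colors))   # True
```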
Page Coloring to reduce misses
• Notion of bin
– region of cache that may contain cache blocks from a page
• Random vs careful mapping
• Selection of physical page frame dictates cache index
• Overall goal is to minimize cache misses
[Figure: page frames mapping onto cache bins]
Careful Page Mapping
[Kessler92, Bershad94]
• Select a page frame such that cache conflict misses are reduced
– only choose from available pages (no VM replacement induced)
• static
– “smart” selection of page frame at page fault time
• dynamic
– move pages around
A Case for Large Pages
• Page table size is inversely proportional to the page size
– memory saved
• Fast cache hit time is easy when cache size <= page size (VA caches)
– a bigger page makes this feasible as cache size grows
• Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
• The number of TLB entries is restricted by clock cycle time
– larger page size maps more memory
– reduces TLB misses
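Two of the claims above are simple arithmetic. Assuming a 32-bit virtual address space, 4-byte PTEs, and the 48-entry TLB from the Alpha 21164 slide, this sketch computes the size of a linear page table and the TLB reach for three page sizes:

```python
def linear_page_table_bytes(va_bits: int, page_bytes: int, pte_bytes: int) -> int:
    """One PTE per virtual page in a flat (linear) page table."""
    return (1 << va_bits) // page_bytes * pte_bytes

def tlb_reach(entries: int, page_bytes: int) -> int:
    """Memory that can be mapped without a TLB miss."""
    return entries * page_bytes

for page_kb in (4, 8, 64):
    page = page_kb * 1024
    print(f"{page_kb} KB pages: page table {linear_page_table_bytes(32, page, 4) // 1024} KB, "
          f"TLB reach {tlb_reach(48, page) // 1024} KB")
```

It prints 4096 KB / 192 KB for 4 KB pages, 2048 KB / 384 KB for 8 KB pages, and 256 KB / 3072 KB for 64 KB pages.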
A Case for Small Pages
• Fragmentation
– large pages can waste storage
– data must be contiguous within page
• Quicker process start for small processes(??)
Superpages
• Hybrid solution: multiple page sizes
– 8KB, 16KB, 32KB, 64KB pages
– 4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages
• Need to identify candidate superpages
– Kernel
– Frame buffers
– Database buffer pools
• Application/compiler hints
• Detecting superpages
– static, at page fault time
– dynamically create superpages
• Page Table & TLB modifications
Page Coloring
• Make physical index match virtual index
• Behaves like a virtually indexed cache
– no conflicts for sequential pages
• Possibly many conflicts between processes
– address spaces all have the same structure (stack, code, heap)
– modify to XOR the PID with the address (MIPS used a variant of this)
• Simple implementation
• Pick an arbitrary page if necessary
Bin Hopping
• Allocate sequentially mapped pages (time) to sequential bins (space)
• Can exploit temporal locality
– pages mapped close in time will be accessed close in time
• Search from last allocated bin until bin with available page frame
• Separate search list per process
• Simple implementation
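Bin hopping only needs a per-process "last bin" pointer over a shared pool of free frames grouped by bin. A sketch; the pool layout and the `BinHopper` name are assumptions:

```python
class BinHopper:
    """Per-process allocator that hops to the next bin on each allocation."""

    def __init__(self, free_by_bin: list):
        self.free_by_bin = free_by_bin   # shared: free frame numbers grouped by cache bin
        self.last = -1                   # per-process: bin used for the previous page

    def allocate(self):
        """Search from the bin after the last one until a bin has a free frame."""
        bins = len(self.free_by_bin)
        for step in range(1, bins + 1):
            b = (self.last + step) % bins
            if self.free_by_bin[b]:
                self.last = b
                return self.free_by_bin[b].pop()
        return None                      # no free frames in any bin

pool = [[0, 4], [1, 5], [2], [3]]        # 4 bins; frame f lives in bin f % 4
hopper = BinHopper(pool)
print([hopper.allocate() for _ in range(5)])   # [4, 5, 2, 3, 0]
```

Pages faulted in one after another land in consecutive bins, so pages that are also used close together in time do not compete for the same cache region.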
Best Bin
• Keep track of two counters per bin
– used: # of pages allocated to this bin for this address space
– free: # of available pages in the system for this bin
• Bin selection is based on low values of used and high values of free
• Low used value
– reduce conflicts within the address space
• High free value
– reduce conflicts between address spaces
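The slide says to prefer low `used` and high `free` without fixing how they are combined. This sketch assumes a lexicographic rule (lowest `used` first, then highest `free`), one `used` list per address space, and one system-wide `free` list:

```python
def best_bin(used: list, free: list):
    """Pick the bin with the fewest of this address space's pages, breaking ties
    by the most free frames system-wide (assumed combination of the two counters)."""
    candidates = [b for b in range(len(free)) if free[b] > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (used[b], -free[b]))

def allocate(used: list, free: list):
    b = best_bin(used, free)
    if b is not None:
        used[b] += 1     # per-address-space counter
        free[b] -= 1     # system-wide counter
    return b

used = [2, 0, 0, 1]
free = [5, 3, 9, 0]
print(allocate(used, free), used, free)   # 2 [2, 0, 1, 1] [5, 3, 8, 0]
```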
Hierarchical
• Best bin could be linear in # of bins
• Build a tree
– internal nodes contain the sum of their children's <used, free> values
• Independent of cache size
– simply stop at a particular level in the tree
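A sketch of the hierarchical variant, assuming a power-of-two number of bins arranged as a binary tree and, at each node, a preference for children that still have free frames, then low `used`, then high `free`; descending takes log2(bins) steps instead of a linear scan. Updating the sums along the path after an allocation is omitted:

```python
def build_tree(used: list, free: list) -> list:
    """levels[0] holds the leaf bins, levels[-1] the root; each node is
    (used, free), the sum of its two children."""
    levels = [list(zip(used, free))]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([(below[i][0] + below[i + 1][0], below[i][1] + below[i + 1][1])
                       for i in range(0, len(below), 2)])
    return levels

def pick(levels: list, stop_level: int = 0) -> int:
    """Descend from the root and return a node index at stop_level. Stopping above
    the leaves selects a group of bins, so one tree can serve a smaller cache."""
    node = 0
    for level in range(len(levels) - 2, stop_level - 1, -1):
        children = (2 * node, 2 * node + 1)
        node = min(children, key=lambda c: (levels[level][c][1] == 0,   # no free frames: avoid
                                            levels[level][c][0],        # low used
                                            -levels[level][c][1]))      # high free
    return node

levels = build_tree(used=[3, 0, 1, 0], free=[4, 0, 2, 5])
print(pick(levels), pick(levels, stop_level=1))   # 3 1
```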
Benefit of Static Page Coloring
• Reduces cache misses by 10% to 20%
• Multiprogramming
– want to distribute mapping to avoid inter-address space conflicts
Dynamic Page Coloring
• Cache Miss Lookaside (CML) buffer [Bershad94]
– proposed hardware device
• Monitor # of misses per page
• If # of misses >> # of cache blocks in page
– must be conflict misses
– interrupt processor
– move a page (recolor)
• Cost of moving page << benefit
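The CML policy reduces to a per-page miss counter with a threshold. A software sketch, assuming 8 KB pages, 32-byte blocks, and a factor of 4 as the meaning of ">>" (all assumptions, since the proposed device is hardware and the slide gives no numbers); returning the page number stands in for interrupting the processor:

```python
from collections import Counter

PAGE_BYTES = 8192                         # assumed page size
BLOCK_BYTES = 32                          # assumed cache block size
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES
THRESHOLD = 4 * BLOCKS_PER_PAGE           # assumed meaning of ">>"

class CacheMissLookaside:
    def __init__(self):
        self.misses = Counter()           # cache misses observed per physical page

    def record_miss(self, pa: int):
        """Count a miss; return the page to recolor once it exceeds the threshold."""
        page = pa // PAGE_BYTES
        self.misses[page] += 1
        if self.misses[page] > THRESHOLD: # far more misses than blocks: conflicts
            self.misses[page] = 0         # reset after signalling the OS
            return page
        return None

cml = CacheMissLookaside()
flagged = [cml.record_miss(0x4000 + (i % BLOCKS_PER_PAGE) * BLOCK_BYTES)
           for i in range(THRESHOLD + 1)]
print(flagged[-1])                        # 2
```

The OS would then move (recolor) that page, which only pays off when the cost of moving it is much smaller than the misses it removes.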