Source: inside.mines.edu/~bwu/CSCI_564_15SPRING/slides/lec13_virtual.pdf

Memory: Programmer’s View

[Figure: the programmer's view — one flat memory, accessed by Load and Store]

Ideal Memory

•  Zero access time (latency)
•  Zero cost
•  Infinite bandwidth (to support multiple accesses in parallel)
•  Infinite capacity

A Modern Memory Hierarchy

•  Register file: 32 words, sub-nsec
•  L1 cache: ~32 KB, ~nsec
•  L2 cache: 512 KB – 2 MB, many nsec
•  L3 cache: ...
•  Main memory (DRAM): GB, ~100 nsec
•  Swap disk: 100 GB, ~10 msec

Management at each level: manual/compiler register spilling (registers), automatic HW cache management (caches), automatic demand paging (main memory ↔ disk).

Memory Abstraction

A System with Physical Memory Only

•  Examples:
   –  most Cray machines
   –  early PCs
   –  nearly all embedded systems

•  The CPU's load and store addresses are used directly to access memory.

[Figure: CPU issues physical addresses 0 .. N-1 straight to memory]

The Problem

•  Physical memory is of limited size (cost)
   –  What if you need more?
   –  Should the programmer be concerned about whether code/data blocks fit in physical memory?
   –  Should the programmer manage data movement from disk to physical memory?

•  Also, the ISA can have an address space greater than the physical memory size
   –  E.g., a 64-bit address space with byte addressability
   –  What if you do not have enough physical memory?

Basic Mechanism

•  Indirection

•  The address generated by each instruction in a program is a "virtual address"
   –  i.e., it is not the physical address used to address main memory
   –  called a "linear address" in x86

•  An "address translation" mechanism maps this address to a "physical address"
   –  called a "real address" in x86
   –  The address translation mechanism is implemented in hardware and software together

A System with Virtual Memory (page-based)

•  Examples: laptops, servers, modern PCs

•  Address translation: the hardware converts virtual addresses into physical addresses via an OS-managed lookup table (the page table)

[Figure: CPU issues virtual addresses 0 .. N-1; the page table maps them to physical addresses 0 .. P-1 in memory, with some pages residing on disk]

Virtual Pages, Physical Frames

•  The virtual address space is divided into pages
•  The physical address space is divided into frames
•  A virtual page is mapped to a physical frame
   –  assuming the page is in memory
•  If an accessed virtual page is not in memory but on disk, the virtual memory system brings the page into a physical frame and adjusts the mapping → demand paging
•  The page table is the table that stores the mapping of virtual pages to physical frames
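Demand paging as described above can be sketched in a few lines of Python. This is a toy model, not a real OS path: `disk`, `memory`, and the page-table layout are all hypothetical stand-ins.

```python
# Minimal demand-paging sketch (all names hypothetical): a page table maps
# virtual page numbers (VPNs) to (valid, frame) entries; touching an
# invalid page triggers a "page fault" that loads the page from disk.

disk = {0: b"code", 1: b"data", 2: b"heap"}   # backing store, keyed by VPN
memory = {}                                    # physical frames, keyed by PFN
page_table = {vpn: {"valid": False, "frame": None} for vpn in disk}
next_free_frame = 0

def access(vpn):
    """Return the frame holding this virtual page, faulting it in if needed."""
    global next_free_frame
    entry = page_table[vpn]
    if not entry["valid"]:                     # page fault: demand paging
        frame = next_free_frame
        next_free_frame += 1
        memory[frame] = disk[vpn]              # bring the page into a frame
        entry["valid"], entry["frame"] = True, frame
    return entry["frame"]

frame = access(1)          # first access faults the page in
assert memory[frame] == b"data"
assert access(1) == frame  # second access hits the existing mapping
```

Note that only the first access pays the fault; afterwards the adjusted mapping makes the page look like ordinary memory.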

What do we need to support VM?

•  Virtual memory requires both HW + SW support
•  The hardware component is called the MMU
   –  Most of what has been explained today is done by the MMU
•  It is the job of the software to leverage the MMU
   –  Populate page directories and page tables
   –  Modify the Page Directory Base Register on a context switch
   –  Set correct permissions
   –  Handle page faults
   –  Etc.

Additional Jobs from the Software Side

•  Keeping track of which physical pages are free
•  Allocating free physical pages to virtual pages
•  Page replacement policy
   –  When no physical pages are free, which should be swapped out?
•  Sharing pages between processes
•  Copy-on-write optimization
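One common answer to the replacement question above is LRU: evict the least-recently-used page. The sketch below is one possible policy, not the one any particular OS uses; the class and its names are hypothetical.

```python
from collections import OrderedDict

# Sketch of an LRU page-replacement policy. Frames are limited; on a miss
# with no free frame, the least-recently-used resident page is swapped out.

class LRUFrames:
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()        # vpn -> frame, oldest first

    def access(self, vpn):
        """Touch vpn; return the VPN swapped out, or None if none was."""
        if vpn in self.resident:
            self.resident.move_to_end(vpn)   # mark as most recently used
            return None
        evicted = None
        if len(self.resident) == self.num_frames:
            evicted, _ = self.resident.popitem(last=False)  # evict the LRU page
        self.resident[vpn] = object()        # stand-in for a physical frame
        return evicted

frames = LRUFrames(2)
assert frames.access(1) is None   # miss, free frame available
assert frames.access(2) is None   # miss, free frame available
assert frames.access(1) is None   # hit
assert frames.access(3) == 2      # miss: page 2 is LRU, swapped out
```

An `OrderedDict` keeps recency order for free; a hardware-friendly approximation (e.g., clock/second-chance) would avoid updating a list on every access.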

Page Fault ("a miss in physical memory")

•  What if the object is on disk rather than in memory?
   –  The page table entry indicates the virtual page is not in memory → page fault exception
   –  An OS trap handler is invoked to move data from disk into memory
      •  The current process suspends; others can resume
      •  The OS has full control over placement

[Figure: before the fault, the page table maps the virtual address to disk; after the fault, it maps to a physical address in memory]

Servicing a Page Fault

(1) The processor signals the I/O controller
   –  Read a block of length P starting at disk address X and store it starting at memory address Y
(2) The read occurs
   –  Direct Memory Access (DMA)
   –  Under control of the I/O controller
(3) The controller signals completion
   –  Interrupts the processor
   –  The OS resumes the suspended process

[Figure: processor, cache, memory, and I/O controller on a memory-I/O bus; (1) initiate block read, (2) DMA transfer from disk to memory, (3) read-done interrupt]
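The three steps above can be mimicked in a toy simulation. Real DMA runs in hardware concurrently with the CPU; here it is just a slice copy, and `disk`, `memory`, and the function name are hypothetical.

```python
# Toy simulation of servicing a page fault: (1) the CPU asks the
# controller to read a block, (2) DMA copies disk -> memory, (3) the
# controller "interrupts" to report completion.

disk = {"X": bytes(range(8))}      # a block of length P at disk address X
memory = bytearray(16)             # physical memory, initially zeroed

def service_page_fault(disk_addr, mem_addr, length):
    log = []
    log.append("initiate block read")                            # step (1)
    memory[mem_addr:mem_addr+length] = disk[disk_addr][:length]  # step (2): DMA
    log.append("interrupt: read done")                           # step (3)
    return log

log = service_page_fault("X", 4, 8)
assert memory[4:12] == bytes(range(8))   # the block landed at address Y = 4
assert log == ["initiate block read", "interrupt: read done"]
```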

Page Swap

•  Swapping
   –  You are running many programs that require lots of memory

•  What happens if you try to run another program?
   –  Some physical pages are "swapped out": their data are migrated to disk
   –  This frees up those physical pages
   –  As a result, their page table entries become invalid

•  When you access a virtual page that has been swapped out, only then is it brought back into physical memory
   –  This may cause another physical page to be swapped out
   –  If this "ping-ponging" occurs frequently, it is called thrashing: extreme performance degradation

Address Translation

•  How do we get the physical address from a virtual address?

•  The page size is specified by the ISA
   –  Today: 4KB, 8KB, 2GB, ... (small and large pages mixed together)

•  The page table contains an entry for each virtual page
   –  Called a Page Table Entry (PTE)
   –  What is in a PTE?

Trade-Offs in Page Size

•  Large page size (e.g., 1GB)
   –  Pro: Fewer PTEs required → saves memory space
   –  Pro: Fewer TLB misses → improves performance
   –  Con: Large transfers to/from disk
      •  Even when only 1KB is needed, 1GB must be transferred
      •  Waste of bandwidth/energy; reduces performance
   –  Con: Internal fragmentation
      •  Even when only 1KB is needed, 1GB must be allocated
      •  Waste of space
   –  Con: Cannot have fine-grained permissions
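The numbers behind these trade-offs are easy to check. The sketch below assumes a 48-bit virtual address space (a common x86-64 configuration, not something stated on the slide) and compares 4 KB against 1 GB pages.

```python
# Arithmetic behind the page-size trade-off: PTE count for a 48-bit
# virtual address space at two page sizes, plus internal fragmentation
# when only 1 KB of a large page is actually used.

VA_BITS = 48

def num_ptes(page_size):
    """PTEs needed for a flat table covering the whole address space."""
    return 2**VA_BITS // page_size

small, large = 4 * 1024, 1024**3          # 4 KB vs 1 GB pages
assert num_ptes(small) == 2**36           # ~69 billion PTEs
assert num_ptes(large) == 2**18           # only 262,144 PTEs

# Internal fragmentation: a 1 KB allocation still consumes a whole page.
wasted = large - 1024
assert wasted == 1024**3 - 1024           # almost the entire 1 GB page
```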

VM Address Translation

•  Parameters
   –  P = 2^p = page size (bytes)
   –  N = 2^n = virtual-address limit
   –  M = 2^m = physical-address limit

•  A virtual address (bits n-1 .. 0) splits into a virtual page number (bits n-1 .. p) and a page offset (bits p-1 .. 0); address translation maps it to a physical address (bits m-1 .. 0), replacing the virtual page number with a physical page number (bits m-1 .. p)

•  Page offset bits don't change as a result of translation
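The bit split above can be written directly in code. This is a minimal sketch assuming 4 KB pages (p = 12) and a hypothetical one-level page table; the mapping values are made up for illustration.

```python
# VPN/offset split for address translation: the VPN is looked up in the
# page table, and the page offset passes through unchanged.

P_BITS = 12                       # page offset bits (4 KB pages, so p = 12)
page_table = {0x12345: 0x00ABC}   # VPN -> PPN (hypothetical mapping)

def translate(va):
    vpn = va >> P_BITS                   # upper bits: virtual page number
    offset = va & ((1 << P_BITS) - 1)    # low p bits: page offset
    ppn = page_table[vpn]                # lookup (KeyError would be a fault)
    return (ppn << P_BITS) | offset      # PPN replaces VPN; offset unchanged

pa = translate(0x12345678)
assert pa == 0x00ABC678           # same low 12 bits, new page number
```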

VM Address Translation (cont'd)

•  The VPN acts as a table index: starting from the page table base register, it selects a page table entry holding a valid bit and a physical page number (PPN)
   –  If valid = 0, the page is not in memory → page fault
   –  Otherwise, the PPN is combined with the unchanged page offset to form the physical address

•  A separate (set of) page table(s) per process
•  The VPN forms an index into the page table (it points to a page table entry)
•  The Page Table Entry (PTE) provides information about page access

VM Address Translation: Page Hit

VM Address Translation: Page Fault

Issues

•  How large is the page table?

•  Where do we store it?
   –  In hardware?
   –  In physical memory?
   –  In virtual memory?

•  How can we store it efficiently without requiring physical memory that can store all page tables?
   –  Idea: multi-level page tables
   –  Only the first-level page table has to be in physical memory
   –  The remaining levels are in virtual memory (but get cached in physical memory when accessed)

Issue: Page Table Size

•  Suppose a 64-bit VA and a 40-bit PA; how large is the page table?
   –  2^52 entries × ~4 bytes ≈ 16×10^15 bytes, and that is for just one process!!

[Figure: the 64-bit virtual address splits into a 52-bit VPN and a 12-bit page offset; the page table maps the VPN to a 28-bit PPN, which is concatenated with the offset to form the 40-bit PA]
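The size estimate above follows mechanically from the parameters; the short check below redoes the arithmetic (assuming the same 4 KB pages, i.e., a 12-bit offset, and ~4-byte PTEs).

```python
# Page table size for a flat table: 64-bit VA, 12-bit page offset,
# so 2^52 virtual pages, each needing a ~4-byte PTE.

VA_BITS, OFFSET_BITS, PTE_BYTES = 64, 12, 4

entries = 2 ** (VA_BITS - OFFSET_BITS)   # one PTE per virtual page
table_bytes = entries * PTE_BYTES

assert entries == 2**52
assert table_bytes == 2**54              # ~1.8e16 bytes, i.e. ~16 PiB
assert table_bytes > 16 * 10**15         # the "16 x 10^15 bytes" on the slide
```

This is why a flat table is a non-starter and the multi-level tables on the next slides are needed.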

Multi-Level Page Tables in x86

X86 PTE (4KB page)

Four-level Paging in x86

Two Problems with the Page Table

•  Problem #1: The page table is too large
   –  The page table has 1M entries
   –  Each entry is 4B
   –  Page table = 4MB (!!)
      •  very expensive in the 80s
   –  Solution: multi-level page table

•  Problem #2: The page table is in memory
   –  Before every memory access, must we always fetch the PTE from slow memory?

Translation Lookaside Buffer (TLB)

•  A hardware structure where PTEs are cached

•  Whenever a virtual address needs to be translated, the TLB is searched first: "hit" vs. "miss"

•  Example: 80386
   –  32 entries in the TLB
   –  TLB entry: tag + data
      •  Tag: 20-bit VPN + 4-bit flag
      •  Data: 20-bit PPN
   –  Q: Why is the tag needed?
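A TLB lookup can be sketched as a small dictionary keyed by the tag. The sizes and mappings below are hypothetical, and the eviction choice is a simplification (a real TLB uses an associative array with a hardware replacement policy). The sketch also answers the tag question: the TLB holds only a few of all possible translations, so each entry must record which VPN it caches.

```python
# Sketch of a TLB as a small cache of PTEs: hit if the VPN tag matches a
# cached entry, otherwise fall back to the in-memory page table.

TLB_SIZE = 32
tlb = {}                          # VPN tag -> PPN (models full associativity)
page_table = {5: 100, 6: 101}     # the full mapping, resident in memory

def lookup(vpn):
    if vpn in tlb:                           # tag match: TLB hit
        return tlb[vpn], "hit"
    ppn = page_table[vpn]                    # TLB miss: consult the page table
    if len(tlb) >= TLB_SIZE:
        tlb.pop(next(iter(tlb)))             # evict some entry to make room
    tlb[vpn] = ppn                           # cache the translation
    return ppn, "miss"

assert lookup(5) == (100, "miss")   # first access misses
assert lookup(5) == (100, "hit")    # repeat access hits
```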

Context Switches

•  Assume that Process X is running
   –  Process X's VPN 5 is mapped to PPN 100
   –  The TLB caches this mapping: VPN 5 → PPN 100

•  Now assume a context switch to Process Y
   –  Process Y's VPN 5 is mapped to PPN 200
   –  When Process Y tries to access VPN 5, it searches the TLB
      •  Process Y finds an entry whose tag is 5
      •  TLB hit! The PPN must be 100!
      •  ... Are you sure?

Context Switches (cont'd)

•  Approach #1: Flush the TLB
   –  Whenever there is a context switch, flush the TLB
      •  All TLB entries are invalidated
   –  Example: 80386
      •  Updating the value of CR3 signals a context switch
      •  This automatically triggers a TLB flush

•  Approach #2: Associate TLB entries with processes
   –  All TLB entries have an extra field in the tag that identifies the process to which the entry belongs
   –  Invalidate only the entries belonging to the old process
   –  Example: modern x86, MIPS
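Approach #2 can be sketched by widening the tag with a process identifier (often called an ASID). The sizes and mappings here are hypothetical; the point is only that two processes' VPN-5 entries no longer collide.

```python
# Sketch of process-tagged TLB entries: the tag is (asid, vpn) instead of
# just vpn, so a context switch needs no flush.

tlb = {}   # (asid, vpn) -> ppn

def tlb_insert(asid, vpn, ppn):
    tlb[(asid, vpn)] = ppn

def tlb_lookup(asid, vpn):
    return tlb.get((asid, vpn))   # None means TLB miss

tlb_insert(asid=1, vpn=5, ppn=100)   # Process X: VPN 5 -> PPN 100
tlb_insert(asid=2, vpn=5, ppn=200)   # Process Y: VPN 5 -> PPN 200

# After a context switch to Process Y, the ASID in the tag keeps the two
# VPN-5 entries distinct, so Y cannot hit on X's translation.
assert tlb_lookup(1, 5) == 100
assert tlb_lookup(2, 5) == 200
```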

Handling TLB Misses

•  The TLB is small; it cannot hold all PTEs
   –  Unavoidably, you'll have TLB misses
   –  When one happens, walk the page table to find the entry
      •  Performance penalty

•  Who handles TLB misses?
   –  Hardware-managed
   –  Software-managed

Handling TLB Misses (cont'd)

•  Approach #1: Hardware-managed (e.g., x86)
   –  The hardware does the page walk
   –  The hardware fetches the PTE and inserts it into the TLB
      •  If the TLB is full, the entry replaces another entry
   –  All of this is done transparently

•  Approach #2: Software-managed (e.g., MIPS)
   –  The hardware raises an exception
   –  The operating system does the page walk
   –  The operating system fetches the PTE
   –  The operating system inserts/evicts entries in the TLB

Handling TLB Misses (cont'd)

•  Hardware-managed TLB
   –  Pro: No exceptions; the instruction just stalls
   –  Pro: Independent instructions may continue
   –  Pro: Small footprint (no extra instructions/data)
   –  Con: The page directory/table organization is etched in stone

•  Software-managed TLB
   –  Pro: The OS can design the page directory/table
   –  Pro: More advanced TLB replacement policies
   –  Con: Flushes the pipeline
   –  Con: Performance overhead
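The page walk that either the hardware or the OS miss handler performs can be sketched for a two-level table. The level split below (10-bit directory index, 10-bit table index, 12-bit offset, as in classic 32-bit x86) and the mapping values are assumptions for illustration.

```python
# Sketch of a two-level page walk: the top address bits index a page
# directory, the next bits index a page table, and the low 12 bits are
# the page offset.

page_dir = {1: {2: 0x00ABC}}      # dir index -> {table index -> PPN}

def walk(va):
    dir_idx = (va >> 22) & 0x3FF         # bits 31..22: directory index
    tbl_idx = (va >> 12) & 0x3FF         # bits 21..12: table index
    offset = va & 0xFFF                  # bits 11..0: page offset
    table = page_dir.get(dir_idx)
    if table is None or tbl_idx not in table:
        raise LookupError("page fault")  # no mapping at some level
    return (table[tbl_idx] << 12) | offset

va = (1 << 22) | (2 << 12) | 0x345
assert walk(va) == (0x00ABC << 12) | 0x345
```

On x86 the hardware performs this walk transparently; on MIPS the equivalent logic lives in the OS's TLB-miss exception handler.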

Protection with Virtual Memory

•  A normal user process should not be able to:
   –  Read/write another process's memory
   –  Write into shared library data

•  How does virtual memory help?
   –  Address space isolation
   –  Protection information in the page table
   –  Efficient clearing of data on newly allocated pages

Protection: Leaked Information

•  Example (with the virtual memory we've discussed so far):
   –  Process A writes "my password = ..." to virtual address 2
   –  The OS maps virtual address 2 to physical page 4 in the page table
   –  Process A no longer needs virtual address 2
   –  The OS unmaps virtual address 2 from physical page 4 in the page table

•  Attack vector:
   –  A sneaky Process B continually allocates pages and searches for "my password = <string>"

Page-Level Access Control (Protection)

•  Not every process is allowed to access every page
   –  E.g., accessing system pages may require supervisor-level privilege

•  Idea: Store access control information on a per-page basis in the process's page table

•  Enforce access control at the same time as translation

→ The virtual memory system serves two functions today: address translation (for the illusion of large physical memory) and access control (protection)

Page Table is Per Process

•  Each process has its own virtual address space
   –  Full address space for each program
   –  Simplifies memory allocation, sharing, linking, and loading

[Figure: address translation maps Process 1's VP 1 and VP 2 to physical pages PP 2 and PP 7, and Process 2's VP 1 and VP 2 to PP 7 and PP 10; PP 7 is shared between the processes (e.g., read-only library code)]

VM as a Tool for Memory Access Protection

•  Extend Page Table Entries (PTEs) with permission bits
•  The page fault handler checks these before remapping
   –  If violated, generate an exception (Access Protection exception)

Per-process page tables with permission bits:

Process i:            Physical Addr   Read?   Write?
  VP 0:               PP 9            Yes     No
  VP 1:               PP 4            Yes     Yes
  VP 2:               XXXXXXX         No      No

Process j:            Physical Addr   Read?   Write?
  VP 0:               PP 6            Yes     Yes
  VP 1:               PP 9            Yes     No
  VP 2:               XXXXXXX         No      No

(Physical memory holds pages PP 0, PP 2, PP 4, PP 6, PP 8, PP 10, PP 12, ...)
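The permission check above can be sketched alongside translation. The PTE layout here (a tuple of PPN plus read/write bits) is hypothetical; real PTEs pack these as bit fields.

```python
# Sketch of page-level access control: each PTE carries read/write
# permission bits that are checked at translation time.

page_table = {                        # VPN -> (PPN, readable, writable)
    0: (9, True, False),              # VP 0 -> PP 9, read-only
    1: (4, True, True),               # VP 1 -> PP 4, read/write
}

def translate(vpn, write=False):
    if vpn not in page_table:
        raise LookupError("page fault")
    ppn, readable, writable = page_table[vpn]
    if (write and not writable) or (not write and not readable):
        raise PermissionError("access protection exception")
    return ppn

assert translate(0) == 9              # read of VP 0 is allowed
assert translate(1, write=True) == 4  # write of VP 1 is allowed
try:
    translate(0, write=True)          # write to a read-only page
    assert False, "should have raised"
except PermissionError:
    pass                              # protection violation caught
```

Because the check happens on the same lookup as translation, protection costs nothing extra on the common path.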

