EECS 470, Lecture 17: Virtual Memory
Fall 2019
Prof. Ronald Dreslinski
http://www.eecs.umich.edu/courses/eecs470
© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar
Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.
Slide 2: D$/TLB + SQ + LQ

Load queue (LQ):
- holds the addresses of in-flight loads
- kept in program order (like the ROB and SQ)
- associatively searchable
- size heuristic: 20-30% of the ROB size

[Figure: the load queue (with head and tail pointers) sits next to the D$/TLB and the store queue; a store's address and store position feed age logic that produces a flush? decision.]
Slide 3: Advanced Memory "Pipeline" (LQ Only)

Loads:
- Dispatch (D): allocate an entry at the LQ tail
- Execute (X): write the address into the corresponding LQ slot

Stores:
- Dispatch (D): record the current LQ tail as the "store position" in the RS
- Execute (X): where the good stuff happens
Slide 4: Detecting Memory Ordering Violations

The store sends its address to the LQ:
- compare it against all load addresses
- select the matching addresses
- a matching address means that load executed before the store: a violation. Fix it!

Age logic selects the oldest load that is younger than the store:
- it uses the store position recorded at dispatch
- the processor flushes and restarts from that load

[Figure: the same LQ/SQ diagram; the store's address is CAM-matched against the LQ, and the age logic combines the matches with the store position to raise flush?.]
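As a behavioral sketch (not RTL; the class and method names here are illustrative), the store-side LQ search and age logic can be modeled like this:

```python
class LoadQueue:
    """Behavioral model of the load queue's ordering-violation check.

    Simplification: slots are plain list indices in program order, so the
    head/tail wrap-around of a real circular LQ is ignored.
    """
    def __init__(self, size):
        self.addr = [None] * size      # filled in when a load executes

    def execute_load(self, slot, addr):
        self.addr[slot] = addr         # load writes its address at execute (X)

    def store_executes(self, store_addr, store_position):
        # Associatively compare the store's address against every executed
        # load at or beyond the LQ tail recorded at the store's dispatch
        # (its "store position"); those loads are younger than the store.
        matches = [slot for slot, a in enumerate(self.addr)
                   if a == store_addr and slot >= store_position]
        # Age logic: pick the oldest matching younger load; the processor
        # flushes and restarts from it (None means no violation).
        return min(matches) if matches else None
```

In a real LQ the slots form a circular buffer, so the age comparison must account for wrap-around; the linear indices above sidestep that detail.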
Slide 5: R10K Data Structures

ROB (h = head, t = tail; both at entry 1):

  #  Insn           T  Told  S  X  C
  1  …
  2  stf A,c4,c10
  3  stf B,c5,c20
  4  ldf C,c6,c9
  5  ldf A,c7,c11
  6  ldf B,c8,c18
  7  …

Notation: "stf A,c4,c10" = store to Address A, dispatched in Cycle 4, Operands Ready in cycle 10 (Address, Cycle Dispatched, Cycle Operands Ready).

Reservation stations (5 entries; columns: #, busy, op, T, T1, T2, SQ Pos., LQ Pos.): all free.
Store queue (4 entries; columns: #, Addr, valid, Value): empty, head = tail = entry 1.
Load queue (4 entries; columns: #, Addr, valid, ROB#): empty, head = tail = entry 1.
Slide 6 (Cycle 3): Nothing has dispatched yet: the ROB holds the instruction stream with head and tail bracketing entry 1, and the SQ, LQ, and reservation stations are all empty.
Slide 7 (Cycle 4): stf A dispatches: it allocates store-queue entry SQ#1, and its reservation-station entry records SQ#1 along with the current LQ tail (LQ#1) as its store position.
Slide 8 (Cycle 5): stf B dispatches into SQ#2; its store position is still LQ#1, since no loads have dispatched yet.
Slide 9 (Cycle 6): ldf C dispatches: it allocates load-queue entry LQ#1 (tagged ROB#4), and its RS entry records LQ#1 along with the current SQ tail (SQ#3) as its SQ position, so only stores older than SQ#3 may forward to it.
Slide 10 (Cycle 7): ldf A dispatches into LQ#2 (ROB#5), SQ position SQ#3.
Slide 11 (Cycle 8): ldf B dispatches into LQ#3 (ROB#6), SQ position SQ#3. All five memory instructions are now in flight.
Slide 12 (Cycle 9): ldf C's operands become ready and it issues (S = c9), ahead of both older stores: memory execution is now out of order.
Slide 13 (Cycle 10): ldf C executes: it writes address C into LQ#1 and searches the SQ. No older store (SQ# < its SQ position SQ#3) matches on Addr, so it gets the cache value [C]. Meanwhile, stf A issues (S = c10).
Slide 14 (Cycle 11): stf A executes, writing address A and value "New A" into SQ#1; ldf A issues (S = c11) and ldf C completes (C = c11).
Slide 15 (Cycle 12): ldf A executes: it writes address A into LQ#2 and hits an older store in the SQ (SQ#1 < its SQ Pos. and ==Addr), so "New A" is forwarded from SQ#1 as the load's result instead of reading the cache.
Slide 16 (Cycle 18): ldf B's operands are finally ready and it issues (S = c18). stf A completed back in c12, but stf B, which is older than ldf B, still has not issued.
Slide 17 (Cycle 19): ldf B executes: it writes address B into LQ#3 and searches the SQ. stf B has not executed yet, so there is no match on Addr and ldf B gets the stale cache value [B].
Slide 18 (Cycle 20): stf B issues (S = c20).
Slide 19 (Cycle 21): stf B executes: it writes address B and value "New B" into SQ#2 and sends its address to the LQ. The age logic finds an executed load with a matching address that is younger than the store (LQ#3, at or beyond the recorded store position LQ#1 and ==Addr): a memory-ordering violation. The exception bit is set on that load's ROB entry (ROB#6).
Slide 20 (Cycle 22): stf A commits at the ROB head: the "New A" value is committed to the cache and SQ#1 is freed. ldf B's ROB entry still carries its exception bit (E).
Slide 21 (Cycle 23): stf B commits: the "New B" value is committed to the cache and SQ#2 is freed.
Slide 22 (Cycle 24): ldf C commits and LQ#1 is freed.
Slide 23 (Cycle 25): ldf A commits and LQ#2 is freed.
Slide 24 (Cycle 26): ldf B reaches the ROB head with its exception bit set ⇒ HANDLE EXCEPTION: the pipeline flushes and restarts at the load, which re-executes and now reads "New B".
Slide 25: Two Parts to Modern VM

VM provides each process with the illusion of a large, private, uniform memory.

Part A: Protection
- each process sees a large, contiguous memory segment without holes
- each process's memory space is private, i.e. protected from access by other processes

Part B: Demand Paging
- the capacity of secondary memory (swap space on disk)
- at the speed of primary memory (DRAM)

Both are based on a common HW mechanism: address translation
- the user process operates on "virtual" or "effective" addresses
- HW translates from virtual to physical on each reference
  - controls which physical locations a process can name
  - allows dynamic relocation of the physical backing store (DRAM vs. disk)
- VM HW and memory-management policies are controlled by the OS
Slide 26: Evolution of Protection Mechanisms

The earliest machines had no concept of protection or address translation:
- no need: single process, single user
- memory was automatically "private and uniform" (but not very large)
- programs operated on physical addresses directly
⇒ no multitasking protection and no dynamic relocation (at least not very easily)
Slide 27: Base and Bound Registers

In a multi-tasking system:
- Each process is given a non-overlapping, contiguous physical memory region; everything belonging to a process must fit in that region
- When a process is swapped in, the OS sets base to the start of the process's memory region and bound to the end of the region (base and bound are privileged control registers)
- HW translation and protection check on each memory reference:
    PA = EA + base, provided PA < bound; otherwise a violation
- Bound can also be formulated as a range (a limit on the EA)
⇒ Each process sees a private and uniform address space (0 .. max)

[Figure: physical memory holds the active process's region, delimited by Base and Bound, alongside other processes' regions.]
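The translation and check fit in a few lines. This sketch follows the slide's formulation (bound marks the end of the region as a physical address); the function name is mine:

```python
def translate_base_and_bound(ea, base, bound):
    """Translate an effective address under base-and-bound protection.

    PA = EA + base; the access is permitted only if PA < bound
    (the slide notes bound may equivalently be a limit on the EA).
    Returns the PA or raises on a protection violation.
    """
    pa = ea + base
    if pa >= bound:
        raise MemoryError("base-and-bound protection violation")
    return pa
```

On a context switch the OS simply reloads the two privileged registers; no per-process tables are needed yet.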
Slide 28: Segmented Address Space

A segment = a base-and-bound pair. Segmented addressing gives each process multiple segments:
- initially, separate code and data segments
  - 2 sets of base-and-bound registers for instruction and data fetch
  - allowed sharing of code segments
- became more and more elaborate: code, data, stack, etc.
- also (ab)used as a way for an ISA with a small EA space to address a larger physical memory space

[Figure: the SEG # field of the EA indexes a segment table to fetch the segment's base & bound; the base is added to the offset to form the PA, which is checked against the bound (okay?).]

Segment tables must be (1) privileged data structures and (2) private/unique to each process.
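A sketch of the per-segment lookup (names are illustrative; bound is treated here as the segment's length, one of the two formulations mentioned for base-and-bound):

```python
def translate_segmented(segment_table, seg_num, offset):
    """Look up one segment's (base, bound) pair and translate.

    segment_table maps segment number -> (base, bound); bound is the
    segment length, so any offset at or past it is a violation.
    """
    base, bound = segment_table[seg_num]
    if offset >= bound:
        raise MemoryError("segment bound violation")
    return base + offset
```

Because the table is indexed by segment number, two processes can share a code segment simply by pointing entries in both tables at the same base.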
Slide 29: Paged Address Space

Segmented addressing creates fragmentation problems:
- a system may have plenty of unallocated memory locations
- they are useless if they do not form a contiguous region of sufficient size

In a paged memory system:
- the PA space is divided into fixed-size chunks (e.g. 4 KByte), more commonly known as "page frames"
- the EA is interpreted as a page number plus a page offset

[Figure: the page number indexes a page table, which supplies the page-frame base and an okay? check; combining the frame base with the page offset yields the PA.]

Page tables must be (1) privileged data structures and (2) private/unique to each process.
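A minimal model of the translation, assuming 4 KByte frames and a dict standing in for the page table (names are mine):

```python
PAGE_SIZE = 4096            # 4 KByte frames, as on the slide

def translate_paged(page_table, ea):
    """Interpret the EA as (page number, page offset) and translate.

    page_table maps virtual page number -> physical frame number; a
    missing entry models an invalid/unmapped page (a fault).
    """
    vpn, offset = divmod(ea, PAGE_SIZE)
    if vpn not in page_table:
        raise MemoryError("page fault: page not mapped")
    return page_table[vpn] * PAGE_SIZE + offset
```

Since every frame is the same size, any free frame can back any page: the fragmentation problem of segments disappears.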
Slide 30: Demand Paging

Main memory and disk become automatically managed levels in the memory hierarchy, analogous to cache vs. main memory. The drastically different size and time scales, however, lead to very different design decisions.

Early attempts:
- von Neumann already described manual memory hierarchies
- Brookner's interpretive coding, 1960: a software interpreter that managed paging between a 40 KB main memory and a 640 KB drum
- Atlas, 1962: hardware demand paging between a 32-page (512 words/page) main memory and 192 pages on drums; the user program believes it has 192 pages
Slide 31: Demand Paging vs. Caching (2016)

                 Cache             Demand Paging
  capacity       64 KB - MBs       1 GB - 1 TB
  block size     16 - 128 Bytes    4 KB - 64 KB
  hit time       1 - 3 cycles      50 - 150 cycles
  miss penalty   10 - 300 cycles   1M - 10M cycles
  miss rate      0.1 - 10%         0.00001 - 0.001%
  hit handling   HW                HW
  miss handling  HW                SW
Slide 32: Page-Based Virtual Memory

[Figure: a 64-bit virtual address is split into a virtual page number (52 bits) and a page offset (12 bits). The VPN indexes a translation memory, the page table (~8 bytes per entry, 1-10 GBytes of translation memory in total), whose output is the physical page number; combined with the page offset this forms a 40-bit physical address into main memory's pages (10-100 GBytes).]

Question: where do we hold this translation memory, and how much translation memory do we need?
Slide 33: Page Table Organization
Slide 34: Hierarchical Page Table

[Figure: the effective address is split 10/10/12 into p1, p2, and page offset d. A privileged register holds the base of the root table (the "page table of the page table"); p1 indexes the root to find one page of the page table; p2 indexes that page to find the data page; d selects the location within it. Any page, whether of the table or of data, may be in main memory, paged out to the swap disk, or nonexistent.]

Storage overhead of translation should be proportional to the size of physical memory, not to the virtual address space.
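The two-level walk for the slide's 10/10/12 split can be sketched as follows (a hypothetical model; None marks an unallocated table page or an unmapped data page). Because second-level pages are allocated only for regions actually in use, the table's storage tracks physical memory rather than the full 32-bit EA space:

```python
def hierarchical_walk(root_table, ea):
    """Two-level page-table walk for a 32-bit EA split 10/10/12.

    root_table is the 1024-entry "page table of the page table"; each
    entry is either None or a 1024-entry second-level table whose
    entries are frame numbers (or None for a page that is in swap or
    does not exist).
    """
    p1 = (ea >> 22) & 0x3FF       # index into the root table
    p2 = (ea >> 12) & 0x3FF       # index into one page of the page table
    d = ea & 0xFFF                # 12-bit page offset

    second_level = root_table[p1]
    if second_level is None:      # this slice of the table was never allocated
        raise MemoryError("page does not exist")
    frame = second_level[p2]
    if frame is None:             # page in swap (or invalid): fault to the OS
        raise MemoryError("page fault")
    return (frame << 12) | d
```

Note the walk costs one memory access per level before the data access itself, which is what makes the TLB (later slides) essential.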
Slide 35: Inverted or Hashed Page Tables

- The size of the inverted page table only needs to be proportional to the size of physical memory
- Each VPN can be mapped only to a small set of entries, according to a hash function
- To translate a VPN, check all allowed table entries for a matching VPN and PID
- Question: how many memory lookups per translation?

[Figure: the PID and VPN are hashed; the hash value plus the table base gives the PA of an inverted-page-table entry; each entry holds (VPN, PID, PTE) and corresponds to a frame of physical memory.]
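A sketch of the hashed lookup (the hash function and table size are stand-ins, not from any real design):

```python
NUM_BUCKETS = 1 << 10     # table sized for physical memory, not the VA space

def ipt_bucket(pid, vpn):
    # Stand-in hash; real designs fold PID and VPN bits together in HW.
    return (pid * 31 + vpn) % NUM_BUCKETS

def ipt_insert(table, pid, vpn, pte):
    table[ipt_bucket(pid, vpn)].append((vpn, pid, pte))

def ipt_lookup(table, pid, vpn):
    """Check all entries the hash allows; match on both VPN and PID."""
    for entry_vpn, entry_pid, pte in table[ipt_bucket(pid, vpn)]:
        if entry_vpn == vpn and entry_pid == pid:
            return pte
    return None   # miss: fault, or search an overflow structure
```

This also answers the slide's question: one memory lookup when the first allowed entry matches, and more when collisions force probing further entries in the allowed set.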
Slide 36: Virtual-to-Physical Translation
Slide 37: Translation Look-aside Buffer (TLB)

Essentially a cache of recent address translations; avoids going to the page table on every reference.
- indexed by the lower bits of the VPN (virtual page number)
- tag = the remaining (unused) bits of the VPN + process ID
- data = a page-table entry, i.e. the PPN (physical page number) and access permissions
- status = valid, dirty
- the usual cache design choices (placement, replacement policy, multi-level, etc.) apply here too

Question: what should be the relative sizes of the ITLB and I-cache?

[Figure: the VPN's index bits select a TLB entry; a tag match yields the physical page number, which is combined with the page offset to form the physical address.]
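A direct-mapped TLB following the slide's index/tag split can be sketched as follows (the class name and sizes are illustrative):

```python
TLB_SETS = 64   # direct-mapped, 64 entries

class TLB:
    """Cache of recent VPN -> PPN translations, tagged with the PID."""
    def __init__(self):
        self.entry = [None] * TLB_SETS       # each entry: (tag, ppn)

    def lookup(self, pid, vpn):
        idx = vpn % TLB_SETS                 # index = lower bits of VPN
        tag = (vpn // TLB_SETS, pid)         # tag = remaining VPN bits + PID
        e = self.entry[idx]
        return e[1] if e is not None and e[0] == tag else None

    def refill(self, pid, vpn, ppn):
        # On a miss, the page-table walk's result is installed here.
        self.entry[vpn % TLB_SETS] = ((vpn // TLB_SETS, pid), ppn)
```

Including the PID in the tag is what lets the TLB survive a context switch without being flushed.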
Slide 38: Virtual to Physical Address Translation

Effective Address
- TLB Lookup (≤ 1 pclk)
  - hit → Protection Check (≤ 1 pclk)
    - permitted → Physical Address to cache
    - denied → Protection Fault
  - miss → Page Table Walk (100's of pclk, by HW or SW)
    - succeed → Update TLB, then retry
    - fail → Page Fault → OS table walk / fault handler (10,000's of pclk)
Slide 39: Cache Placement and Address Translation

Physical cache (most systems): CPU → MMU → physical cache → memory
- the MMU sits on the fetch critical path: every access is translated (VA → PA) before the cache is indexed
- drawback: longer hit time

Virtual cache (e.g. the SPARC2's): CPU → virtual cache → MMU → memory
- the cache is accessed with the VA, keeping the MMU off the fetch critical path
- drawbacks: aliasing problem, cold start after a context switch

Virtual caches are not popular anymore because the MMU and CPU can be integrated on one chip.
Slide 40: Physically Indexed Cache

[Figure: the virtual address (n = v + g bits) is split into a virtual page number (v bits) and page offset (g bits). The TLB (k index bits, v-k tag bits) maps the VPN to the physical page number, yielding the physical address (m = p + g bits); the D-cache tag (t), index (i), and block offset (b) are all drawn from that physical address, so the TLB lookup must complete before the cache access can begin.]
Slide 41: Virtually Indexed Cache

Parallel access to the TLB and cache arrays:

[Figure: the D-cache index and block offset fall entirely within the page offset, so the cache set is read out while the TLB translates the VPN in parallel; the PPN from the TLB is then compared with the cache tag (also a PPN) to decide hit/miss.]

Question: how large can a virtually indexed cache get?
Slide 42: Large Virtually Indexed Cache

[Figure: the same parallel TLB/cache access, but the cache index now extends a bits above the page offset into the VPN.]

If two VPNs differ in a, but both map to the same PPN, then there is an aliasing problem.
Slide 43: Virtual Address Synonyms

Two virtual pages that map to the same physical page:
- within the same virtual address space, or
- across address spaces

[Figure: VA1 and VA2 both translate to the same PA.]

Using VA bits as the index, the same PA's data may reside in different sets of the cache!
Slide 44: Synonym (or Aliasing)

When VPN bits are used in indexing, two virtual addresses that map to the same physical address can end up sitting in two cache lines. In other words, two copies of the same physical memory location may exist in the cache ⇒ a modification to one copy won't be visible in the other.

[Figure: the virtually indexed lookup again; if the two VPNs do not differ in the a index bits above the page offset, there is no aliasing problem.]
Slide 45: Synonym Solutions

- Limit cache size to page size times associativity
  - get the index entirely from the page offset
- Search all sets in parallel
  - 64 KB 4-way cache, 4 KB pages: search 4 sets (16 entries)
  - slow!
- Restrict page placement in the OS
  - make sure index(VA) = index(PA)
- Eliminate synonyms by OS convention
  - single virtual space
  - restrictive sharing model
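The first solution's arithmetic can be made concrete: aliasing is possible exactly when some index bits fall above the page offset. A small helper (hypothetical name) computes how many such bits, the diagrams' a:

```python
def index_bits_from_vpn(capacity, assoc, page_size):
    """How many cache index bits lie above the page offset?

    The bytes indexed per way are capacity / assoc; if that exceeds the
    page size, the extra index bits come from the VPN and synonyms that
    differ in those bits land in different sets. Zero means the index
    fits in the page offset and aliasing is impossible.
    Assumes power-of-two sizes.
    """
    per_way = capacity // assoc
    return max(0, per_way.bit_length() - page_size.bit_length())
```

For the slide's 64 KB 4-way example with 4 KB pages this gives 2 bits (hence 4 sets to search), while capping capacity at page size times associativity drives it to 0.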
Slide 46: MIPS R10K Synonym Solution

32 KB 2-way virtually-indexed L1:
- needs 10 bits of index and 4 bits of block offset
- the page offset is only 12 bits ⇒ 2 bits of the index are VPN[1:0]

Direct-mapped physical L2:
- L2 is inclusive of L1
- VPN[1:0] is appended to the "tag" of L2

Given two virtual addresses VA and VB that differ in those index bits and both map to the same physical address PA:
- suppose VA is accessed first, so blocks are allocated in L1 & L2
- what happens when VB is referenced?
  1. VB indexes to a different block in L1 and misses
  2. VB translates to PA and goes to the same block as VA in L2
  3. The tag comparison fails (VA[1:0] ≠ VB[1:0])
  4. L2 detects that a synonym is cached in L1 ⇒ VA's entry in L1 is ejected before VB is allowed to be refilled into L1