Date post: 28-Jun-2020
Lecture 17 Slide 1

EECS 470 Lecture 17: Virtual Memory

Fall 2019

Prof. Ronald Dreslinski

http://www.eecs.umich.edu/courses/eecs470

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar

Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.

Lecture 17 Slide 2

D$/TLB + SQ + LQ

Load queue (LQ)
- In-flight load addresses
- In program order (like ROB, SQ)
- Associatively searchable
- Size heuristic: 20-30% of ROB

[Figure: load queue (LQ) next to the D$/TLB and SQ, with head and tail pointers, one address comparator (==) per entry, and age logic that takes the store position and produces "flush?"]

Lecture 17 Slide 3

Advanced Memory “Pipeline” (LQ Only)

Loads
- Dispatch (D)
  - Allocate entry at LQ tail
- Execute (X)
  - Write address into corresponding LQ slot

Stores
- Dispatch (D)
  - Record current LQ tail as “store position” in RS
- Execute (X)
  - Where the good stuff happens

Lecture 17 Slide 4

Detecting Memory Ordering Violations

Store sends address to LQ
- Compare with all load addresses
- Select matching addresses
- Matching address?
  - Load executed before store
  - Violation
  - Fix!

Age logic selects oldest load that is younger than store
- Use store position
- Processor flushes and restarts

[Figure: load queue (LQ) next to the D$/TLB, with head and tail pointers, one address comparator (==) per entry, and age logic that takes the store position and produces "flush?"]
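
The check described on this slide can be sketched in Python. This is a minimal software model under stated assumptions, not the lecture's hardware: the names `LQEntry` and `find_violation`, the zero-based circular indices, and the convention that `store_pos == tail` means "no younger loads" are all illustrative. It assumes every load allocated at or after the store position is younger than the store.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LQEntry:
    addr: Optional[str] = None  # filled in when the load executes
    valid: bool = False         # True once addr has been written
    rob: Optional[int] = None   # ROB number of the load


def find_violation(lq: List[LQEntry], tail: int, store_pos: int,
                   store_addr: str) -> Optional[int]:
    """Return the ROB# of the oldest executed load that is younger than
    the store and has the same address, or None if there is no match."""
    i = store_pos
    while i != tail:                       # oldest younger load first
        entry = lq[i]
        if entry.valid and entry.addr == store_addr:
            return entry.rob               # first match is the oldest match
        i = (i + 1) % len(lq)
    return None


# Hypothetical LQ contents: three executed loads to C, A, and B.
# Indices are zero-based here, so LQ#1 on the slides is index 0.
lq = [LQEntry("C", True, 4), LQEntry("A", True, 5),
      LQEntry("B", True, 6), LQEntry()]
flush_from = find_violation(lq, tail=3, store_pos=0, store_addr="B")
no_match = find_violation(lq, tail=3, store_pos=0, store_addr="D")
```

Walking from the store position toward the tail visits loads in program order, so the first match is the oldest violating load, which is where a flush and restart would begin.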

Lecture 17 Slide 5

R10K Data Structures

Each instruction is written as: op Address, Cycle Dispatched, Cycle Operands Ready

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
    2  stf A,c4,c10
    3  stf B,c5,c20
    4  ldf C,c6,c9
    5  ldf A,c7,c11
    6  ldf B,c8,c18
    7  …

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no
2  no
3  no
4  no
5  no

Store Queue
ht  #  Addr  v  Value
    1
    2
    3
    4

Load Queue
ht  #  Addr  v  ROB#
    1
    2
    3
    4

Lecture 17 Slide 6

Cycle 3

ROB
ht  #  Insn          T  Told  S    X    C
ht  1  …
    2  stf A,c4,c10
    3  stf B,c5,c20
    4  ldf C,c6,c9
    5  ldf A,c7,c11
    6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
ht  1        0
    2        0
    3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
ht  1        0
    2        0
    3        0
    4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no
2  no
3  no
4  no
5  no

Lecture 17 Slide 7

Cycle 4

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
t   2  stf A,c4,c10
    3  stf B,c5,c20
    4  ldf C,c6,c9
    5  ldf A,c7,c11
    6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
t   2        0
    3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
ht  1        0
    2        0
    3        0
    4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf             SQ#1     LQ#1
2  no
3  no
4  no
5  no

Lecture 17 Slide 8

Cycle 5

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10
t   3  stf B,c5,c20
    4  ldf C,c6,c9
    5  ldf A,c7,c11
    6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
ht  1        0
    2        0
    3        0
    4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf             SQ#1     LQ#1
2  no    stf             SQ#2     LQ#1
3  no
4  no
5  no

Lecture 17 Slide 9

Cycle 6

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10
    3  stf B,c5,c20
t   4  ldf C,c6,c9
    5  ldf A,c7,c11
    6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1        0  ROB#4
t   2        0
    3        0
    4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf             SQ#1     LQ#1
2  no    stf             SQ#2     LQ#1
3  no    ldf             SQ#3     LQ#1
4  no
5  no

Lecture 17 Slide 10

Cycle 7

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10
    3  stf B,c5,c20
    4  ldf C,c6,c9
t   5  ldf A,c7,c11
    6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1        0  ROB#4
    2        0  ROB#5
t   3        0
    4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf             SQ#1     LQ#1
2  no    stf             SQ#2     LQ#1
3  no    ldf             SQ#3     LQ#1
4  no    ldf             SQ#3     LQ#2
5  no

Lecture 17 Slide 11

Cycle 8

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10
    3  stf B,c5,c20
    4  ldf C,c6,c9
    5  ldf A,c7,c11
t   6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1        0  ROB#4
    2        0  ROB#5
    3        0  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf             SQ#1     LQ#1
2  no    stf             SQ#2     LQ#1
3  no    ldf             SQ#3     LQ#1
4  no    ldf             SQ#3     LQ#2
5  no    ldf             SQ#3     LQ#3

Lecture 17 Slide 12

Cycle 9

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10
    3  stf B,c5,c20
    4  ldf C,c6,c9            c9
    5  ldf A,c7,c11
t   6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1        0  ROB#4
    2        0  ROB#5
    3        0  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf             SQ#1     LQ#1
2  no    stf             SQ#2     LQ#1
3  no    ldf     +   +   SQ#3     LQ#1
4  no    ldf             SQ#3     LQ#2
5  no    ldf             SQ#3     LQ#3

Lecture 17 Slide 13

Cycle 10

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10           c10
    3  stf B,c5,c20
    4  ldf C,c6,c9            c9   c10
    5  ldf A,c7,c11
t   6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1        0
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2        0  ROB#5
    3        0  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1  no    stf     +   +   SQ#1     LQ#1
2  no    stf             SQ#2     LQ#1
3
4  no    ldf             SQ#3     LQ#2
5  no    ldf             SQ#3     LQ#3

No match on Addr, so get cache value [C]

Lecture 17 Slide 14

Cycle 11

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10           c10  c11
    3  stf B,c5,c20
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11
t   6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1  A     1  New A
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2        0  ROB#5
    3        0  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2  no    stf             SQ#2     LQ#1
3
4  no    ldf     +   +   SQ#3     LQ#2
5  no    ldf             SQ#3     LQ#3

Lecture 17 Slide 15

Cycle 12

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10           c10  c11
    3  stf B,c5,c20
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12
t   6  ldf B,c8,c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1  A     1  New A
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3        0  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2  no    stf             SQ#2     LQ#1
3
4
5  no    ldf             SQ#3     LQ#3

Hit on Older (SQ# < SQ Pos & ==Addr)
Forward New A as result
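
The forwarding rule on this slide (a load hits an older store when SQ# < SQ Pos and the addresses match) can be sketched as follows. This is an illustrative model, not the lecture's design: `SQEntry`, `load_value`, the dictionary standing in for the data cache, and the zero-based circular indices are assumptions, and the sketch picks the youngest matching older store.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SQEntry:
    addr: Optional[str] = None   # filled in when the store executes
    valid: bool = False          # True once addr and value are known
    value: Optional[str] = None


def load_value(sq: List[SQEntry], head: int, sq_pos: int, addr: str,
               cache: Dict[str, str]) -> Optional[str]:
    """Search stores older than the load (from sq_pos - 1 back to head);
    forward from the youngest match, otherwise read the cache."""
    i = sq_pos
    while i != head:
        i = (i - 1) % len(sq)
        entry = sq[i]
        if entry.valid and entry.addr == addr:
            return entry.value           # store-to-load forwarding
    return cache[addr]                   # no match on Addr: use cache value


# State modeled on the walkthrough: store A has executed, store B has not.
sq = [SQEntry("A", True, "New A"), SQEntry(), SQEntry(), SQEntry()]
cache = {"A": "Old A", "B": "Old B", "C": "Old C"}
from_store = load_value(sq, head=0, sq_pos=2, addr="A", cache=cache)
from_cache = load_value(sq, head=0, sq_pos=2, addr="C", cache=cache)
stale = load_value(sq, head=0, sq_pos=2, addr="B", cache=cache)
```

The last call shows why the load queue check is still needed: the load to B reads the old cache value because the older store to B has not yet written its address into the store queue.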

Lecture 17 Slide 16

Cycle 18

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18
    7  …

Store Queue
ht  #  Addr  v  Value
h   1  A     1  New A
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3        0  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2  no    stf             SQ#2     LQ#1
3
4
5  no    ldf     +   +   SQ#3     LQ#3

Lecture 17 Slide 17

Cycle 19

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18  c19
    7  …

Store Queue
ht  #  Addr  v  Value
h   1  A     1  New A
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2  no    stf             SQ#2     LQ#1
3
4
5

No match on Addr, so get cache value [B]

Lecture 17 Slide 18

Cycle 20

ROB
ht  #  Insn          T  Told  S    X    C
h   1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20           c20
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18  c19  c20
    7  …

Store Queue
ht  #  Addr  v  Value
h   1  A     1  New A
    2        0
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2  no    stf     +   +   SQ#2     LQ#1
3
4
5

Lecture 17 Slide 19

Cycle 21

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
h   2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20           c20  c21
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18  c19  c20
    7  …

Store Queue
ht  #  Addr  v  Value
h   1  A     1  New A
    2  B     1  New B
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2
3
4
5

Hit on a younger load (LQ# > LQ Pos & ==Addr)
Set exception bit (E) on that load's ROB entry (ROB#6, per the LQ)

Lecture 17 Slide 20

Cycle 22

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
    2  stf A,c4,c10           c10  c11  c12
h   3  stf B,c5,c20           c20  c21  c22
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18  c19  c20
    7  …

Exception bit marker: E

Store Queue
ht  #  Addr  v  Value
    1        0
h   2  B     1  New B
t   3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2
3
4
5

Commit New A value to the cache

Lecture 17 Slide 21

Cycle 23

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20           c20  c21  c22
h   4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18  c19  c20
    7  …

Exception bit marker: E

Store Queue
ht  #  Addr  v  Value
    1        0
    2        0
ht  3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
h   1  C     1  ROB#4
    2  A     1  ROB#5
    3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2
3
4
5

Commit New B value to the cache

Lecture 17 Slide 22

Cycle 24

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20           c20  c21  c22
    4  ldf C,c6,c9            c9   c10  c11
h   5  ldf A,c7,c11           c11  c12  c13
t   6  ldf B,c8,c18           c18  c19  c20
    7  …

Exception bit marker: E

Store Queue
ht  #  Addr  v  Value
    1        0
    2        0
ht  3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
    1        0
h   2  A     1  ROB#5
    3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2
3
4
5

Lecture 17 Slide 23

Cycle 25

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20           c20  c21  c22
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
ht  6  ldf B,c8,c18           c18  c19  c20
    7  …

Exception bit marker: E

Store Queue
ht  #  Addr  v  Value
    1        0
    2        0
ht  3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
    1        0
    2        0
h   3  B     1  ROB#6
t   4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2
3
4
5

Lecture 17 Slide 24

Cycle 26

ROB
ht  #  Insn          T  Told  S    X    C
    1  …
    2  stf A,c4,c10           c10  c11  c12
    3  stf B,c5,c20           c20  c21  c22
    4  ldf C,c6,c9            c9   c10  c11
    5  ldf A,c7,c11           c11  c12  c13
ht  6  ldf B,c8,c18           c18  c19  c20
    7  …

Exception bit marker: E
HANDLE EXCEPTION

Store Queue
ht  #  Addr  v  Value
    1        0
    2        0
ht  3        0
    4        0

Load Queue
ht  #  Addr  v  ROB#
    1        0
    2        0
    3        0
ht  4        0

Reservation Stations
#  busy  op   T  T1  T2  SQ Pos.  LQ Pos.
1
2
3
4
5

Lecture 17 Slide 25

2 Parts to Modern VM

VM provides each process with the illusion of a large, private, uniform memory.

Part A: Protection
- each process sees a large, contiguous memory segment without holes
- each process’s memory space is private, i.e. protected from access by other processes

Part B: Demand Paging
- capacity of secondary memory (swap space on disk)
- at the speed of primary memory (DRAM)

Based on a common HW mechanism: address translation
- user process operates on “virtual” or “effective” addresses
- HW translates from virtual to physical on each reference
  - controls which physical locations can be named by a process
  - allows dynamic relocation of physical backing store (DRAM vs. HD)
- VM HW and memory management policies controlled by the OS

Lecture 17 Slide 26

Evolution of Protection Mechanisms

Earliest machines had no concept of protection and address translation
- no need: single process, single user
- automatically “private and uniform” (but not very large)
- programs operated on physical addresses directly

No multitasking protection, no dynamic relocation (at least not very easily)

Lecture 17 Slide 27

Base and bound registers

In a multi-tasking system:
- Each process is given a non-overlapping, contiguous physical memory region; everything belonging to a process must fit in that region.
- When a process is swapped in, the OS sets base to the start of the process’s memory region and bound to the end of the region.
- HW translation and protection check (on each memory reference): PA = EA + base, provided (PA < bound); otherwise a violation.

=> Each process sees a private and uniform address space (0 .. max)

Base and Bound are privileged control registers. Bound can also be formulated as a range.

[Figure: physical memory holding the active process’s region, delimited by Base and Bound, and another process’s region]
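
The translation and check on this slide fit in a few lines of Python. This is a minimal sketch under stated assumptions: the names `translate` and `ProtectionViolation` are illustrative, `bound` is taken to be the first physical address past the end of the region, and negative effective addresses are rejected.

```python
class ProtectionViolation(Exception):
    """Raised when an access falls outside the process's region."""


def translate(ea: int, base: int, bound: int) -> int:
    """Base-and-bound translation: PA = EA + base, provided PA < bound."""
    pa = ea + base
    if ea < 0 or pa >= bound:
        raise ProtectionViolation(f"EA {ea:#x} is outside the region")
    return pa


# Hypothetical region: physical addresses 0x4000 .. 0x4FFF.
pa = translate(0x10, base=0x4000, bound=0x5000)
try:
    translate(0x1000, base=0x4000, bound=0x5000)
    violated = False
except ProtectionViolation:
    violated = True
```

On a context switch, the OS would only need to reload `base` and `bound` for the incoming process; the process itself always sees addresses starting at 0.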

Lecture 17 Slide 28

Segmented Address Space

segment == a base and bound pair

Segmented addressing gives each process multiple segments
- initially, separate code and data segments
  - 2 sets of base-and-bound reg’s for inst and data fetch
  - allowed sharing code segments
- became more and more elaborate: code, data, stack, etc.
- also (ab)used as a way for an ISA with a small EA space to address a larger physical memory space

Segment tables must be
1. privileged data structures and
2. private/unique to each process

[Figure: SEG # selects a segment table entry (base & bound); EA is added to the base (+) and checked against the bound (<) to produce PA and "okay?"]

Lecture 17 Slide 29

Paged Address Space

Segmented addressing creates fragmentation problems:
- a system may have plenty of unallocated memory locations
- they are useless if they do not form a contiguous region of a sufficient size

In a Paged Memory System:
- PA space is divided into fixed-size segments (e.g. 4 Kbyte), more commonly known as “page frames”
- EA is interpreted as page number and page offset

Page tables must be
1. privileged data structures and
2. private/unique to each process

[Figure: EA split into Page No. and Page Offset; the page table maps Page No. to a page frame base and "okay?"; base + offset gives PA]
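
As a rough illustration of interpreting an EA as page number plus page offset, the sketch below uses 4 KB pages and a Python dictionary as a stand-in for a per-process page table. The names `split`, `translate`, and `PageFault` are assumptions for this example, not part of the lecture.

```python
from typing import Dict, Tuple

PAGE_BITS = 12                      # 4 KB pages, as in the slide's example
OFFSET_MASK = (1 << PAGE_BITS) - 1


class PageFault(Exception):
    """Raised when the page number has no entry in the page table."""


def split(ea: int) -> Tuple[int, int]:
    """Split an effective address into (page number, page offset)."""
    return ea >> PAGE_BITS, ea & OFFSET_MASK


def translate(ea: int, page_table: Dict[int, int]) -> int:
    """Map page number to page frame number and reattach the offset."""
    page_no, offset = split(ea)
    if page_no not in page_table:
        raise PageFault(f"no mapping for page {page_no:#x}")
    return (page_table[page_no] << PAGE_BITS) | offset


# Hypothetical mapping: virtual page 2 lives in page frame 7.
pa = translate(0x2ABC, {2: 7})
```

Because any free frame can back any page, the frames of one process no longer need to be contiguous, which is what removes the fragmentation problem described above.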

Lecture 17 Slide 30

Demand Paging

Main memory and disk as automatically managed levels in the memory hierarchy
- analogous to cache vs. main memory
- drastically different size and time scales => very different design decisions

Early attempts
- von Neumann already described manual memory hierarchies
- Brookner’s interpretive coding, 1960
  - a software interpreter that managed paging between a 40kb main memory and a 640Kb drum
- Atlas, 1962
  - hardware demand paging between a 32-page (512 word/page) main memory and 192 pages on drums
  - user program believes it has 192 pages

Lecture 17 Slide 31

Demand Paging vs. Caching: 2016

               Cache           Demand Paging
capacity       64KB~MB ??      1GB~1TB ??
block size     16~128 Byte     4K to 64K Byte
hit time       1~3 cyc         50-150 cyc
miss penalty   10~300 cycles   1M to 10M cycles
miss rate      0.1~10%         0.00001~0.001%
hit handling   hw              hw
miss handling  hw              sw

Lecture 17 Slide 32

Page-Based Virtual Memory

[Figure: a virtual address is split into virtual page number and page offset; a decoder indexes the translation memory (page table) with the virtual page number to obtain a physical page number; physical page number plus page offset form the physical address, which a second decoder uses to select among main memory pages. Size annotations on the figure: 64-bit, 52-bit, 12-bit, 40-bit, ~8 bytes, 1~10 GBytes, 10~100 GBytes.]

Where to hold this translation memory and how much translation memory do we need?

Lecture 17 Slide 33

Page table organization

Lecture 17 Slide 34

Hierarchical Page Table

[Figure: the effective address is split into p1 (10-bit), p2 (10-bit), and P.O. (12-bit). A privileged register holds the base of the page table of the page table; p1 indexes it to select a page of the page table; p2 indexes that page to select a data page. Legend: page in main memory, page in swap disk, page does not exist.]

Storage overhead of translation should be proportional to the size of physical memory, not the size of the virtual address space.
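
A two-level walk with the 10/10/12-bit split from the figure can be sketched as below. This is an illustrative model only: nested dictionaries stand in for the page table of the page table and the pages of the page table, a missing key stands in for "page does not exist" or "page in swap disk", and the names `walk` and `PageFault` are assumptions.

```python
from typing import Dict

P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12


class PageFault(Exception):
    """Raised when either level of the table has no valid entry."""


def walk(ea: int, root: Dict[int, Dict[int, int]]) -> int:
    """Translate a 32-bit effective address with a two-level table."""
    offset = ea & ((1 << OFFSET_BITS) - 1)
    p2 = (ea >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = (ea >> (OFFSET_BITS + P2_BITS)) & ((1 << P1_BITS) - 1)

    second_level = root.get(p1)          # first memory lookup
    if second_level is None:
        raise PageFault(f"no second-level table for p1={p1}")
    ppn = second_level.get(p2)           # second memory lookup
    if ppn is None:
        raise PageFault(f"no page for p1={p1}, p2={p2}")
    return (ppn << OFFSET_BITS) | offset


# Hypothetical tables: only one second-level table is allocated.
root = {1: {2: 0x55}}
pa = walk((1 << 22) | (2 << 12) | 0x034, root)
```

Only second-level tables for regions that are actually in use need to exist, which is how the hierarchy keeps storage overhead closer to the amount of mapped memory than to the size of the address space.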

Lecture 17 Slide 35

Inverted or Hashed Page Tables

- Size of the inverted page table only needs to be proportional to the size of the physical memory
- Each VPN can only be mapped to a small set of entries according to a hash function
- To translate a VPN, check all allowed table entries for matching VPN and PID
- How many memory lookups per translation?

[Figure: PID and VPN are hashed to a table offset; table offset + base of table gives the PA of the IPTE; each inverted page table entry holds VPN, PID, and PTE; the inverted page table resides in physical memory]

Lecture 17 Slide 36

Virtual-to-Physical Translation

Lecture 17 Slide 37

Translation Look-aside Buffer (TLB)

Essentially a cache of recent address translations; avoids going to the page table on every reference
- indexed by lower bits of VPN (virtual page #)
- tag = unused bits of VPN + process ID
- data = a page-table entry, i.e. PPN (physical page #) and access permission
- status = valid, dirty

The usual cache design choices (placement, replacement policy, multi-level, etc.) apply here too.

What should be the relative sizes of ITLB and I-cache?

[Figure: the virtual address is split into VPN (tag and index) and page offset; the TLB tag compare (=) yields the physical page no.; physical page no. plus page offset form the physical address]
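
The indexing and tagging scheme listed above can be sketched as a small direct-mapped TLB. This is an assumption-laden illustration rather than any real design: the class name `TLB`, the 16-entry size, and the omission of permission and dirty bits are choices made for brevity.

```python
from typing import List, Optional, Tuple


class TLB:
    """Direct-mapped TLB: index = low VPN bits, tag = (high VPN bits, PID)."""

    def __init__(self, index_bits: int = 4) -> None:
        self.index_bits = index_bits
        # Each slot holds (tag, ppn), or None when invalid.
        self.entries: List[Optional[Tuple[Tuple[int, int], int]]] = \
            [None] * (1 << index_bits)

    def _index_and_tag(self, pid: int, vpn: int) -> Tuple[int, Tuple[int, int]]:
        index = vpn & ((1 << self.index_bits) - 1)
        tag = (vpn >> self.index_bits, pid)
        return index, tag

    def lookup(self, pid: int, vpn: int) -> Optional[int]:
        """Return the PPN on a hit, or None on a miss."""
        index, tag = self._index_and_tag(pid, vpn)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def fill(self, pid: int, vpn: int, ppn: int) -> None:
        """Install a translation, evicting whatever shared the slot."""
        index, tag = self._index_and_tag(pid, vpn)
        self.entries[index] = (tag, ppn)


tlb = TLB()
tlb.fill(pid=1, vpn=0x123, ppn=0x77)
hit = tlb.lookup(pid=1, vpn=0x123)
other_process = tlb.lookup(pid=2, vpn=0x123)   # same VPN, different PID
tlb.fill(pid=1, vpn=0x133, ppn=0x88)           # same index, different tag
after_conflict = tlb.lookup(pid=1, vpn=0x123)
```

Including the process ID in the tag lets translations from several processes coexist without flushing the TLB on every context switch.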

Lecture 17 Slide 38

Virtual to Physical Address Translation

[Flowchart]
Effective address -> TLB lookup (≤ 1 pclk)
- hit: protection check (≤ 1 pclk)
  - permitted: physical address, to cache
  - denied: protection fault
- miss: page table walk (100’s pclk, by HW or SW)
  - succeed: update TLB
  - fail: page fault, OS table walk (10000’s pclk)

Lecture 17 Slide 39

Cache Placement and Address Translation

Virtual Cache (SPARC2’s): CPU -> VA -> virtual cache -> MMU -> PA -> physical memory
- fetch critical path: CPU to virtual cache
- aliasing problem
- cold start after context switch

Physical Cache (Most Systems): CPU -> VA -> MMU -> PA -> physical cache -> physical memory
- fetch critical path: CPU through MMU to physical cache
- longer hit time

Virtual caches are not popular anymore because MMU and CPU can be integrated on one chip.

Lecture 17 Slide 40

Physically Indexed Cache

[Figure: the virtual address (n = v + g bits) consists of the virtual page no. (VPN), split into a TLB tag (v-k bits) and TLB index (k bits), and the page offset (PO, g bits). The TLB returns the physical page no. (PPN, p bits). The physical address (m = p + g bits), PPN followed by PO, is then split into cache tag (t), index (i), and block offset (BO, b) to access the D-cache and return data.]

Lecture 17 Slide 41

Virtually Indexed Cache

Parallel access to TLB and cache arrays

[Figure: the VPN (tag of v-k bits, index of k bits) goes to the TLB while the page offset (g bits) supplies the D-cache index (i) and block offset (BO, b) at the same time. The PPN (p bits) from the TLB is compared (=) with the PPN tag read from the D-cache to produce Hit/Miss alongside the data.]

How large can a virtually indexed cache get?

Lecture 17 Slide 42

Large Virtually Indexed Cache

[Figure: same organization as the virtually indexed cache on the previous slide, except that the D-cache index extends past the page offset and takes a bits from the VPN.]

If two VPNs differ in a, but both map to the same PPN, then there is an aliasing problem.

Lecture 17 Slide 43

Virtual Address Synonyms

Two virtual pages that map to the same physical page
- within the same virtual address space
- across address spaces

[Figure: VA1 and VA2 both map to PA]

Using VA bits as IDX, PA data may reside in different sets in cache!!

Lecture 17 Slide 44

Synonym (or Aliasing)

When VPN bits are used in indexing, two virtual addresses that map to the same physical address can end up sitting in two cache lines.

In other words, two copies of the same physical memory location may exist in the cache
=> modification to one copy won’t be visible in the other

If the two VPNs do not differ in a, then there is no aliasing problem.

[Figure: D-cache indexed by the page offset plus a bits of the VPN (index i, block offset b); the PPN tag (p bits) read from the cache is compared (=) with the PPN from the TLB to produce Hit/Miss and data]

Lecture 17 Slide 45

Synonym Solutions

Limit cache size to page size times associativity
- get index from page offset

Search all sets in parallel
- 64K 4-way cache, 4K pages, search 4 sets (16 entries)
- Slow!

Restrict page placement in OS
- make sure index(VA) = index(PA)

Eliminate by OS convention
- single virtual space
- restrictive sharing model
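
The arithmetic behind the first two solutions can be checked with a short calculation. The helper names `alias_bits` and `sets_to_search` are hypothetical, and the sketch assumes all sizes are powers of two.

```python
def alias_bits(cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Number of cache index bits that come from the VPN rather than
    the page offset (0 means the cache cannot hold synonyms apart)."""
    way_bytes = cache_bytes // ways          # bytes covered by index + block offset
    if way_bytes <= page_bytes:
        return 0
    return (way_bytes // page_bytes).bit_length() - 1   # log2 for powers of two


def sets_to_search(cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Candidate sets in which a synonym of one physical block may sit."""
    return 1 << alias_bits(cache_bytes, ways, page_bytes)


KB = 1024
bits_64k = alias_bits(64 * KB, 4, 4 * KB)
sets_64k = sets_to_search(64 * KB, 4, 4 * KB)
entries_64k = sets_64k * 4                   # 4 ways in each candidate set
bits_16k = alias_bits(16 * KB, 4, 4 * KB)    # cache size = page size * associativity
```

For the 64K 4-way example this gives 2 alias bits, hence 4 sets and 16 entries to search, matching the numbers on the slide; a 16K 4-way cache with 4K pages has no alias bits because its index comes entirely from the page offset.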

Lecture 17 Slide 46

MIPS R10K Synonym Solution

32KB 2-Way Virtually-Indexed L1
- needs 10 bits of index and 4 bits of block offset
- page offset is only 12 bits => 2 bits of index are VPN[1:0]

Direct-Mapped Physical L2
- L2 is inclusive of L1
- VPN[1:0] is appended to the “tag” of L2

Given two virtual addresses VA and VB that differ in a and both map to the same physical address PA
- Suppose VA is accessed first, so blocks are allocated in L1 & L2
- What happens when VB is referenced?
  1. VB indexes to a different block in L1 and misses
  2. VB translates to PA and goes to the same block as VA in L2
  3. Tag comparison fails (VA[1:0] ≠ VB[1:0])
  4. L2 detects that a synonym is cached in L1 => VA’s entry in L1 is ejected before VB is allowed to be refilled in L1
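
The four steps above can be mimicked with a toy model. This is a heavily simplified sketch, not the R10K implementation: the class `SynonymFilter` and its fields are hypothetical, dictionaries replace the real set-associative arrays, capacity and replacement are ignored, and only the VPN[1:0] bookkeeping is modeled.

```python
from typing import Dict, Tuple

PAGE_BITS, BLOCK_BITS = 12, 4      # 4 KB pages, 16-byte blocks (from the slide)


class SynonymFilter:
    """Toy L1/L2 pair where L2 remembers the VPN[1:0] used to index L1."""

    def __init__(self) -> None:
        self.l1: Dict[Tuple[int, int], int] = {}  # (VPN[1:0], block in page) -> PA block
        self.l2: Dict[int, int] = {}              # PA block -> VPN[1:0] kept with the tag

    def access(self, va: int, pa: int) -> str:
        vbits = (va >> PAGE_BITS) & 0x3
        block_in_page = (va >> BLOCK_BITS) & 0xFF  # same bits in VA and PA
        pa_block = pa >> BLOCK_BITS
        if self.l1.get((vbits, block_in_page)) == pa_block:
            return "L1 hit"
        result = "L1 miss"
        old_vbits = self.l2.get(pa_block)
        if old_vbits is not None and old_vbits != vbits:
            # L2 sees the block is cached in L1 under another index: eject it.
            self.l1.pop((old_vbits, block_in_page), None)
            result = "synonym evicted"
        self.l2[pa_block] = vbits
        self.l1[(vbits, block_in_page)] = pa_block
        return result


# Hypothetical addresses: VA and VB differ only in VPN[1:0], both map to PA.
VA, VB, PA = (0b01 << 12) | 0x120, (0b10 << 12) | 0x120, (0x5 << 12) | 0x120
caches = SynonymFilter()
first = caches.access(VA, PA)
second = caches.access(VB, PA)
third = caches.access(VB, PA)
copies = sum(1 for blk in caches.l1.values() if blk == PA >> BLOCK_BITS)
```

After the second access only one copy of the physical block remains in the modeled L1, which is the property the scheme is meant to guarantee.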

