CS698Y: Modern Memory Systems Lecture-11 (Hardware Prefetching)

Biswabandan Panda

https://www.cse.iitk.ac.in/users/biswap/CS698Y.html

Modern Memory Systems Biswabandan Panda, CSE@IITK 2

Flow of the Module

Data Prefetching Techniques

Interaction with Cache Replacement

Metrics Related to Prefetching

Instruction Prefetching

But, Why Prefetching? Remember Memory Wall: It is still hurting

Hardware Prefetching

What? Latency-hiding technique - Fetches data before the core demands.

Why? Off-chip DRAM latency has grown up to 400 to 800 cycles.

How? By observing/predicting the demand access (LOAD/STORE) patterns.

Hardware Prefetch Engine

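[Figure: core, L2, and prefetcher. The core accesses X and X+1, the prefetcher brings X+2 and X+3 into the L2, and the core's later access to X+3 (step 5) is a HIT.]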

Prefetchers in Multicore - 101

[Figure: four cores (Core 0 to Core 3), each with its own L2 and prefetcher (PF), connected through an interconnect to the L3.]

Prefetching Knobs

Prefetch Degree: Number of prefetch requests to issue at a given time.

[Figure: on a demand access to X, the L2 prefetcher sends prefetch requests to L3/DRAM. Examples: X+1; X+1 and X+2; X+1 through X+4.]

Prefetching Knobs

Prefetch Distance: How far ahead of the demand access stream are the prefetch requests issued?

[Figure: the demand access is at X and the prefetch is at Y; the gap between them is the prefetch distance. Examples: Y = X + 4, Y = X + 8, Y = X + 16.]

Aggressiveness [degree, distance]

Prefetch degree: number of prefetch requests issued on a miss.

[Figure: the prefetcher (PF) issues X+1; X+1 and X+2; or X+1 through X+4.]

Prefetch distance: how far ahead of the demand miss (in number of blocks) the prefetch is issued.

[Figure: demand miss at X, prefetch at Y.]
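A minimal sketch of how the two knobs could turn into prefetch addresses. It assumes block-granular addresses and an ascending stream. The name `issue_prefetches` and the convention that the first prefetch goes to X + distance are illustrative assumptions, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: on a demand miss to block x, write `degree`
 * consecutive block addresses, starting `distance` blocks ahead of x,
 * into out[]. Returns the number of addresses written (capped by out_cap). */
size_t issue_prefetches(uint64_t x, unsigned distance, unsigned degree,
                        uint64_t *out, size_t out_cap)
{
    assert(distance >= 1);   /* distance 0 would re-fetch the demand block */
    size_t n = 0;
    for (unsigned d = 0; d < degree && n < out_cap; d++)
        out[n++] = x + distance + d;
    return n;
}
```

With distance 1 and degree N, this sketch degenerates to fetching X+1 through X+N.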

The Simplest Prefetcher

Next Line: On a miss to cache block X, prefetch X+1. Degree = 1, distance = 1.

Works well for the L1 I-cache and the L1 D-cache.

Next N Line: On a miss to cache block X, prefetch X+1, X+2, ..., X+N. Degree = N; distance ranges from 1 (minimum) to N (maximum).
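To get a feel for the effect of N, here is a small sketch that counts demand misses on a trace under a next-N-line prefetcher. It is a toy model under loud assumptions: the cache never evicts, a prefetch fills the cache instantly, and block addresses are limited to an illustrative range (`NUM_BLOCKS`). The function name `count_misses` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_BLOCKS 1024   /* illustrative: block addresses 0..1023 */

/* Count demand misses for `trace` with a next-N-line prefetcher.
 * Assumptions of this sketch: no evictions, and prefetches arrive
 * instantly. n = 0 disables prefetching. */
unsigned count_misses(const unsigned *trace, unsigned len, unsigned n)
{
    bool present[NUM_BLOCKS];
    memset(present, 0, sizeof present);
    unsigned misses = 0;
    for (unsigned t = 0; t < len; t++) {
        unsigned x = trace[t];
        assert(x < NUM_BLOCKS);
        if (present[x])
            continue;                     /* demand hit: nothing is triggered */
        misses++;
        present[x] = true;                /* demand fill of block X */
        for (unsigned k = 1; k <= n && x + k < NUM_BLOCKS; k++)
            present[x + k] = true;        /* prefetch X+1 .. X+N */
    }
    return misses;
}
```

On a purely sequential trace of 16 blocks, this model reports 16 misses without prefetching, 8 with N = 1, and 4 with N = 3. A real cache with finite capacity and memory latency would behave less ideally.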

What about this?

Stride Prefetching

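[Figure: a table indexed by the load PC with the fields instruction tag, previous address, stride, and state. The effective address minus (-) the previous address gives the stride; the effective address plus (+) the stride gives the prefetch address.]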

An Example

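    float a[100][100], b[100][100], c[100][100];
    ...
    for (i = 0; i < 100; i++)
        for (j = 0; j < 100; j++)
            for (k = 0; k < 100; k++)
                a[i][j] += b[i][k] * c[k][j];

Table contents after the first inner-loop iteration:

    instruction tag   previous address   stride   state
    ld b[i][k]        50000              0        initial
    ld c[k][j]        90000              0        initial
    ld a[i][j]        10000              0        initial

After the second iteration:

    instruction tag   previous address   stride   state
    ld b[i][k]        50004              4        transient
    ld c[k][j]        90400              400      transient
    ld a[i][j]        10000              0        steady

After the third iteration (the stride of c[k][j] stays at 400, since 90800 - 90400 = 400):

    instruction tag   previous address   stride   state
    ld b[i][k]        50008              4        steady
    ld c[k][j]        90800              400      steady
    ld a[i][j]        10000              0        steady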
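The table updates in this example can be sketched as a small PC-indexed table. This is a simplified version that uses only the three states named on the slide (initial, transient, steady). The table size, the index hash, and all identifiers (`rpt`, `rpt_access`, ...) are assumptions made for illustration, and the transition out of the steady state is one plausible choice rather than the lecture's definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RPT_ENTRIES 64   /* illustrative table size */

enum rpt_state { INITIAL, TRANSIENT, STEADY };

struct rpt_entry {
    bool valid;
    uint64_t tag;          /* PC of the load (instruction tag) */
    uint64_t prev_addr;    /* previous effective address */
    int64_t stride;
    enum rpt_state state;
};

struct rpt { struct rpt_entry e[RPT_ENTRIES]; };

void rpt_init(struct rpt *t)
{
    for (size_t i = 0; i < RPT_ENTRIES; i++)
        t->e[i].valid = false;
}

/* Train on one load. Returns true and sets *pf_addr when a prefetch
 * should be issued (entry is steady and the stride is nonzero). */
bool rpt_access(struct rpt *t, uint64_t pc, uint64_t addr, uint64_t *pf_addr)
{
    assert(t != NULL && pf_addr != NULL);
    struct rpt_entry *e = &t->e[(pc >> 2) % RPT_ENTRIES];
    if (!e->valid || e->tag != pc) {           /* allocate a new entry */
        e->valid = true;
        e->tag = pc;
        e->prev_addr = addr;
        e->stride = 0;
        e->state = INITIAL;
        return false;
    }
    int64_t new_stride = (int64_t)(addr - e->prev_addr);
    bool correct = (new_stride == e->stride);
    switch (e->state) {
    case INITIAL:
    case TRANSIENT:
        if (correct) {
            e->state = STEADY;
        } else {
            e->state = TRANSIENT;              /* learn the new stride */
            e->stride = new_stride;
        }
        break;
    case STEADY:
        if (!correct)
            e->state = INITIAL;                /* keep the old stride for now */
        break;
    }
    e->prev_addr = addr;
    if (e->state == STEADY && e->stride != 0) {
        *pf_addr = addr + (uint64_t)e->stride;
        return true;
    }
    return false;
}
```

Feeding the addresses of the `c[k][j]` load (90000, 90400, 90800) through `rpt_access` moves its entry from initial to transient to steady, at which point the sketch proposes a prefetch for the next address in the sequence. The `a[i][j]` load reaches steady with stride 0, so nothing is prefetched for it.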

Pointer Chasers

Stream Prefetching [DPC1]

Example 1: misses to 100, 102, 104.
    1st miss: 100
    2nd miss: 102
    3rd miss: 104 (Trained!)

Example 2: miss sequence 503, 504, 501, 499.
    503, 504, 501 (1st, 2nd, 3rd miss): Fail!
    504, 501, 499 (1st, 2nd, 3rd miss): Trained!

Training: Consecutive misses in the same direction.
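One plausible reading of this training rule, sketched in code: a stream is trained once three consecutive misses move in the same direction, and on failure the window slides forward by one miss. The function name `stream_train` is hypothetical, and the sketch ignores any limit on how far apart the misses may be, which a real stream prefetcher would likely enforce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan a sequence of miss block addresses. Returns the index of the miss
 * that completes training, or -1 if no window of three misses trains.
 * On success *dir is +1 (ascending stream) or -1 (descending stream). */
int stream_train(const uint64_t *miss, size_t n, int *dir)
{
    assert(miss != NULL && dir != NULL);
    for (size_t i = 0; i + 2 < n; i++) {
        /* Sign of each step: +1 up, -1 down, 0 same block. */
        int d1 = (miss[i + 1] > miss[i]) - (miss[i + 1] < miss[i]);
        int d2 = (miss[i + 2] > miss[i + 1]) - (miss[i + 2] < miss[i + 1]);
        if (d1 != 0 && d1 == d2) {
            *dir = d1;
            return (int)(i + 2);
        }
    }
    return -1;
}
```

For the miss sequence 503, 504, 501, 499, the first window goes up and then down and fails, while the second window descends twice and trains a downward stream.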

Stream Prefetcher in Action

[Figure: stream prefetcher operation along the stream direction, labeled with original addr, memory access, monitored region (start addr to end addr), prefetch distance, and prefetch degree.]

Stream + Stride

[Figure: stream prefetcher with a stride, labeled with stream direction, original addr, memory access, monitored region (start addr to end addr), prefetch distance * stride, and prefetch degree * stride.]

Quantifying Prefetchers

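Prefetch accuracy (a prefetch hit means the prefetched block is in the cache when the demand access arrives):

$$\text{Prefetch Accuracy}(i) = \frac{\text{Prefetch}_{\text{hits}}(i)}{\text{Prefetch}_{\text{issued}}(i)}$$

Prefetch lateness (a late prefetch means the prefetched block is still on its way):

$$\text{Prefetch Lateness}(i) = \frac{\text{Prefetch}_{\text{late}}(i)}{\text{Prefetch}_{\text{hits}}(i)}$$

LLC pollution (a prefetched block evicted a demand block that will be reused):

$$\text{LLC Pollution}(i) = \frac{\text{Poll}(i)}{\text{Demand}_{\text{misses}}(i)}$$

Coverage (fraction of misses avoided):

$$\text{Coverage}(i) = \frac{\text{Prefetch}_{\text{hits}}(i)}{\text{Prefetch}_{\text{hits}}(i) + \text{Demand}_{\text{misses}}(i)}$$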
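As a rough sketch, the four ratios can be computed from a handful of event counters. The struct and field names below are illustrative assumptions. The sketch also assumes that late prefetches are counted among the prefetch hits and that each issued prefetch contributes at most one hit.

```c
#include <assert.h>

/* Event counters for one core or interval i (names are illustrative). */
struct pf_counters {
    unsigned long issued;         /* prefetch requests issued */
    unsigned long hits;           /* demand accesses served by a prefetched block */
    unsigned long late;           /* prefetch hits whose block was still in flight */
    unsigned long demand_misses;  /* demand misses that remain */
    unsigned long pollution;      /* demand misses caused by prefetch evictions */
};

static double ratio(unsigned long num, unsigned long den)
{
    return den ? (double)num / (double)den : 0.0;   /* avoid dividing by zero */
}

double pf_accuracy(const struct pf_counters *c)
{
    assert(c->hits <= c->issued);
    return ratio(c->hits, c->issued);
}

double pf_lateness(const struct pf_counters *c)
{
    assert(c->late <= c->hits);
    return ratio(c->late, c->hits);
}

double pf_pollution(const struct pf_counters *c)
{
    return ratio(c->pollution, c->demand_misses);
}

double pf_coverage(const struct pf_counters *c)
{
    return ratio(c->hits, c->hits + c->demand_misses);
}
```

For example, 80 issued prefetches with 40 hits give an accuracy of 0.5; if 120 demand misses remain, coverage is 40 / (40 + 120) = 0.25.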

Prefetch Lateness

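[Figure: a prefetch for block X has been issued, but X is not yet in the cache when the demand access to X arrives, so the demand access is a cache miss.]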

Cache Pollution

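[Figure: a cache with two sets (Set 1, Set 2) and blocks X, Y, Z, A, B, C. A prefetched block displaces block X, and the later demand access to X is a cache miss.]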

Cache Hits & Accuracy

[Figure: a cache with two sets (Set 1, Set 2) and blocks Z, Y, A, B. Block Z was brought in by a prefetch, and the demand access to Z is a cache hit.]

Where to Put These Prefetchers?

L1? Next-line, PC-localized stride predictors

L2? Stream + Stride, Other variants

L1 instruction cache? Predict the future PC

State-of-the-art Prefetchers

Perfect Timing

Delayed Prefetching

Offset

Offset = Sum of strides

milc: Offset

GemsFDTD: Offset

Best-offset Prefetcher [HPCA ‘16]

Specialized Streams

Temporal Streams: Sequences of temporally correlated addresses, exploited by TMS [ISCA ‘05].

Spatial Streams: Streams that are correlated in space, exploited by SMS [ISCA ‘06].

SpatioTemporal Streams: Temporal correlation among the spatial regions, and spatial correlation within a region, exploited by STeMS [ISCA ‘09].

Spatial Memory Streaming (SMS)

[Figure: the Active Generation Table (AGT) consists of a Filter Table (FT; fields Tag, PC/Offset) and an Accumulation Table (AT; fields Tag, PC/Offset, Bit Vector). The Pattern History Table (PHT) has the fields Sig and Bit Vector. Example for region A: a miss to A+1 allocates the FT entry (Tag A, PC/1); a miss to A+3 creates the AT entry (A, PC/1, 0101); a miss to A+2 updates it to (A, PC/1, 0111); on eviction/invalidation of A, the bit vector 0111 is stored in the PHT under the signature PC/1.]

- Divides the memory space into fixed-size regions, indexed by a signature (PC/offset).
- Each signature contains a bit vector.
- Each bit in the bit vector corresponds to a cache line.
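The record-and-replay idea behind these bullets can be sketched as follows. This is a heavily simplified model and not the SMS design itself: it tracks a single region at a time (merging the filter and accumulation tables into one `generation` record), uses 4-block regions, and uses a made-up signature hash. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REGION_BLOCKS 4     /* blocks per region, matching a 4-bit vector */
#define PHT_ENTRIES   16    /* illustrative size */

struct generation {         /* the one region currently being recorded */
    bool     active;
    uint64_t region;        /* region base, in blocks */
    unsigned sig;           /* signature from trigger PC and offset */
    unsigned pattern;       /* bit k set => block base+k was accessed */
};

struct sms {
    struct generation gen;
    bool     pht_valid[PHT_ENTRIES];
    unsigned pht_pattern[PHT_ENTRIES];
};

void sms_init(struct sms *s)
{
    s->gen.active = false;
    for (int i = 0; i < PHT_ENTRIES; i++)
        s->pht_valid[i] = false;
}

/* Called when a block of the recorded region is evicted or invalidated:
 * the accumulated bit vector moves to the PHT under its signature. */
void sms_end_generation(struct sms *s)
{
    if (!s->gen.active)
        return;
    s->pht_valid[s->gen.sig] = true;
    s->pht_pattern[s->gen.sig] = s->gen.pattern;
    s->gen.active = false;
}

/* Record one access. Returns a bitmask of blocks to prefetch inside the
 * region (bit k => region base + k); nonzero only on a trigger access
 * whose signature is already in the PHT. */
unsigned sms_access(struct sms *s, uint64_t pc, uint64_t block)
{
    uint64_t region = block - block % REGION_BLOCKS;
    unsigned offset = (unsigned)(block % REGION_BLOCKS);
    if (s->gen.active && s->gen.region == region) {
        s->gen.pattern |= 1u << offset;          /* accumulate the pattern */
        return 0;
    }
    sms_end_generation(s);                       /* sketch: one region at a time */
    s->gen.active = true;
    s->gen.region = region;
    s->gen.sig = (unsigned)((pc * REGION_BLOCKS + offset) % PHT_ENTRIES);
    s->gen.pattern = 1u << offset;
    assert(s->gen.sig < PHT_ENTRIES);
    if (!s->pht_valid[s->gen.sig])
        return 0;
    return s->pht_pattern[s->gen.sig] & ~(1u << offset);  /* skip the trigger block */
}
```

After recording misses to offsets 1, 3, and 2 of one region and ending the generation, a trigger access by the same PC at offset 1 of a different region returns a mask that selects offsets 2 and 3 of the new region. Note that bit k of the mask is the least significant bit for offset 0, which is the reverse of the left-to-right order used for the bit vector in the figure.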

Reading Assignment-1

Proactive Instruction Fetch [MICRO ‘11]

Indirect Memory Prefetcher [MICRO ‘15]

Deadline: October 7, 2017, 17:00 hrs through Canvas

More details through Piazza

Programming Assignment-2

Will be released on Sept 11, 2017

Based on Hardware Prefetchers

This time: You have to code (no analysis)

PA1 Presentation

12th Sept, 15:00 hrs IST, KD-103

7+1 min presentation

Do not put MPKI and IPC numbers

You will evaluate your peers

What About Irregular Applications? [MICRO ‘13]

PC Localization

Structural Address Space

Irregular Stream Buffer [MICRO ‘13]

Interaction with Cache Replacement

Read PACMan [MICRO ‘11]

Crux: Prefetched blocks are mostly not reused after their first use, so insert them with the lowest priority.
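A minimal sketch of the insertion idea, using a plain priority-ordered cache set. This is an assumption made for brevity and not necessarily the mechanism of the paper: demand fills enter at the highest-priority position, prefetch fills enter at the lowest-priority position and so become the next victim. The names `cache_set` and `fill` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4

struct cache_set {
    uint64_t tag[WAYS];   /* tag[0] = highest priority, tag[n-1] = next victim */
    int n;                /* number of valid blocks */
};

/* Fill block `tag` into the set (the caller guarantees it is not already
 * present). Returns true and sets *victim if a block had to be evicted. */
bool fill(struct cache_set *s, uint64_t tag, bool is_prefetch, uint64_t *victim)
{
    assert(s->n >= 0 && s->n <= WAYS);
    bool evicted = false;
    if (s->n == WAYS) {                 /* evict the lowest-priority block */
        *victim = s->tag[WAYS - 1];
        s->n--;
        evicted = true;
    }
    if (is_prefetch) {
        s->tag[s->n] = tag;             /* lowest priority: next victim */
    } else {
        for (int i = s->n; i > 0; i--)  /* shift down, insert at the top */
            s->tag[i] = s->tag[i - 1];
        s->tag[0] = tag;
    }
    s->n++;
    return evicted;
}
```

In this sketch, a prefetched block that is never touched is the first one evicted by the next fill, so it cannot push useful demand blocks out of the set.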

