Feedback Directed Prefetching
Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt

Date post: 29-Mar-2015
Transcript

Page 2: Problem and Solution

Problem

Prefetching can significantly improve performance when prefetches are accurate and timely.

However, prefetching can also significantly degrade performance, due to its memory bandwidth impact and its pollution of the cache.

Solution

Feedback Directed Prefetching is a comprehensive mechanism that reduces the negative effects of prefetching while also improving its positive effects.

HPCA-13

Page 3: Outline

Background and Motivation
Feedback Directed Prefetching (FDP)
  Metrics and how to collect them
  How to adapt
    Prefetcher Aggressiveness
    Cache Insertion Policy for Prefetches
Results

Page 4: Background (Prefetcher Aggressiveness)

[Figure: an access stream beginning at X with a predicted stream ahead of it. Labels mark the Prefetch Distance (out to Pmax, with the prefetch pointer P) and the Prefetch Degree (blocks 1, 2, 3 starting at X+1). Pmax is shown for three settings: Very Conservative, Middle of the Road, and Very Aggressive.]

Page 5: Background (Prefetcher Aggressiveness)

Very Aggressive
  Runs well ahead of the load access stream
  Hides memory access latency better
  More speculative

Very Conservative
  Stays closer to the load access stream
  Might not hide memory access latency completely
  Reduces the potential for cache pollution and bandwidth contention

Page 6: Motivation

[Figure: Instructions per Cycle (axis 0.0 to 5.0) for four configurations: No Prefetching, Very Conservative, Middle-of-the-Road, and Very Aggressive. Two points on the chart are annotated 48% and 29%.]

Very Aggressive improves average performance by 84%. However, it can also significantly reduce performance on some benchmarks.

Page 8: Feedback Directed Prefetching

A comprehensive mechanism that takes into account:
  Prefetcher Accuracy
  Prefetcher Lateness
  Prefetcher-caused Cache Pollution

Adapts:
  Prefetcher Aggressiveness
  Cache Insertion Policy for Prefetches

Page 9: Metrics

Prefetch Accuracy
Prefetch Lateness
Prefetcher-caused Cache Pollution

Page 10: Prefetch Accuracy

Useful prefetches are those referenced by demand requests while they are in the L2 cache.

\[ \text{Prefetcher Accuracy} = \frac{\text{Number of Useful Prefetches}}{\text{Number of Prefetches Sent to Memory}} \]

Page 11: Prefetch Accuracy

With low accuracy, it is more likely that prefetching reduces performance.

[Figure: percentage IPC change over No Prefetching (y-axis, -100% to 400%) plotted against Prefetcher Accuracy (x-axis, 0 to 1).]

Page 12: Prefetch Accuracy

Implementation
  A pref-bit is added to each L2 tag-store entry
  Tracked using two counters: pref_total and used_total

\[ \text{Prefetcher Accuracy} = \frac{\text{used\_total}}{\text{pref\_total}} \]
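The counter-based accuracy measurement can be sketched in software. This is a minimal illustration, not the hardware design from the talk: the class name `AccuracyTracker`, its method names, and the choice to clear the pref-bit on first use (so each prefetch is counted as useful at most once) are assumptions made here. Only `pref_total`, `used_total`, and the pref-bit come from the slide.

```python
class AccuracyTracker:
    """Toy model of the L2 pref-bit with the pref_total / used_total counters."""

    def __init__(self):
        self.pref_total = 0   # prefetches sent to memory
        self.used_total = 0   # prefetched blocks later referenced by a demand request
        self.pref_bit = {}    # block address -> pref-bit of its L2 tag-store entry

    def prefetch(self, block):
        # A prefetch is sent to memory and its block is installed with pref-bit set.
        self.pref_total += 1
        self.pref_bit[block] = True

    def demand_hit(self, block):
        # Assumption: clear the bit on first use so a block is counted only once.
        if self.pref_bit.get(block, False):
            self.used_total += 1
            self.pref_bit[block] = False

    def accuracy(self):
        return self.used_total / self.pref_total if self.pref_total else 0.0


tracker = AccuracyTracker()
for block in (0x100, 0x140, 0x180, 0x1C0):
    tracker.prefetch(block)
for block in (0x100, 0x140, 0x180, 0x100):   # the last access reuses a counted block
    tracker.demand_hit(block)
print(tracker.accuracy())
```

In this example three of the four prefetched blocks are referenced, so the ratio used_total / pref_total is 3/4.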

Page 13: Prefetch Lateness

A measure of how timely prefetches are
Used to determine whether increasing the aggressiveness helps

\[ \text{Prefetch Lateness} = \frac{\text{Number of Late Prefetches}}{\text{Number of Useful Prefetches}} \]

Implementation
  A pref-bit is added to each L2 MSHR entry
  New counter: late_total

\[ \text{Prefetch Lateness} = \frac{\text{late\_total}}{\text{used\_total}} \]
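A rough software sketch of the lateness measurement follows. The names `LatenessTracker`, `issue_prefetch`, `fill`, and `demand_access` are hypothetical, and the sketch assumes that a late prefetch (a demand request that finds the prefetch still in flight in the MSHR) is also counted as useful, which keeps the ratio between 0 and 1; the slide itself only gives the counters and the formula.

```python
class LatenessTracker:
    """Toy model of the MSHR pref-bit with the late_total / used_total counters."""

    def __init__(self):
        self.used_total = 0
        self.late_total = 0
        self.mshr_pref_bit = {}   # in-flight block -> pref-bit of its MSHR entry
        self.l2_pref_bit = {}     # cached block -> pref-bit of its L2 entry

    def issue_prefetch(self, block):
        self.mshr_pref_bit[block] = True

    def fill(self, block):
        # Memory returns the block: it moves from the MSHR into the L2,
        # carrying whatever pref-bit value the MSHR entry still holds.
        self.l2_pref_bit[block] = self.mshr_pref_bit.pop(block)

    def demand_access(self, block):
        if self.mshr_pref_bit.get(block, False):
            # The prefetch is still in flight: useful, but late.
            self.late_total += 1
            self.used_total += 1          # assumption: late prefetches count as useful
            self.mshr_pref_bit[block] = False
        elif self.l2_pref_bit.get(block, False):
            # The prefetched block arrived before it was needed: timely.
            self.used_total += 1
            self.l2_pref_bit[block] = False

    def lateness(self):
        return self.late_total / self.used_total if self.used_total else 0.0


tracker = LatenessTracker()
for block in ("A", "B", "C", "D"):
    tracker.issue_prefetch(block)
for block in ("A", "B", "C"):
    tracker.fill(block)
for block in ("A", "B", "C", "D"):   # D is demanded while still in flight
    tracker.demand_access(block)
tracker.fill("D")
print(tracker.lateness())
```

Here one of the four useful prefetches is late, so late_total / used_total is 1/4.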

Page 14: Prefetcher-caused Cache Pollution

A measure of the disturbance caused by prefetched data in the cache
Used to determine whether the prefetcher is evicting useful data from the cache

\[ \text{Prefetcher-caused Cache Pollution} = \frac{\text{Number of Demand Misses caused by the Prefetcher}}{\text{Number of Demand Misses}} \]

Page 15: Prefetcher-caused Cache Pollution (2)

Hardware Implementation
  Insight: this does not need to be exact
  Track pollution using a Pollution Filter
    Based on the Bloom filter concept
    Bit set when a prefetch evicts a block that was brought in by a demand miss
    Bit reset when a prefetch is serviced
  Two counters: pollution_total and demand_total

\[ \text{Prefetcher-caused Cache Pollution} = \frac{\text{pollution\_total}}{\text{demand\_total}} \]
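The pollution filter can be illustrated with a small bit-vector model. This is a sketch under stated assumptions: the class and method names are hypothetical, the index hash (folding upper address bits onto lower ones) is chosen here only for illustration, and the 4096-entry size is taken from the Hardware Cost slide. Because several addresses can share one bit, the estimate is approximate, in line with the slide's remark that it does not need to be exact.

```python
class PollutionFilter:
    """Toy bit-vector filter with the pollution_total / demand_total counters."""

    def __init__(self, entries=4096):
        self.bits = [False] * entries
        self.pollution_total = 0   # demand misses attributed to the prefetcher
        self.demand_total = 0      # all demand misses

    def _index(self, block):
        # Hypothetical hash: XOR the upper address bits onto the lower ones.
        n = len(self.bits)
        return (block ^ (block // n)) % n

    def prefetch_evicts_demand_block(self, evicted_block):
        # A prefetch pushed out a demand-fetched block: remember its address.
        self.bits[self._index(evicted_block)] = True

    def prefetch_serviced(self, prefetched_block):
        # The prefetched block is now in the cache: clear its bit.
        self.bits[self._index(prefetched_block)] = False

    def demand_miss(self, block):
        self.demand_total += 1
        if self.bits[self._index(block)]:
            self.pollution_total += 1

    def pollution(self):
        return self.pollution_total / self.demand_total if self.demand_total else 0.0


f = PollutionFilter()
f.prefetch_evicts_demand_block(0x10)
for block in (0x10, 0x20, 0x30, 0x40):   # only 0x10 was evicted by a prefetch
    f.demand_miss(block)
print(f.pollution())
```

In this run one of four demand misses maps to a set bit, so pollution_total / demand_total is 1/4.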

Page 16: Feedback Directed Prefetching

A comprehensive mechanism that takes into account:
  Prefetcher Accuracy
  Prefetcher Lateness
  Prefetcher-caused Cache Pollution

Adapts:
  Prefetcher Aggressiveness
  Cache Insertion Policy

Page 17: How to adapt? Prefetcher Aggressiveness

Dynamic Configuration Counter   Current Aggressiveness   Distance   Degree
1                               Very Conservative        4          1
2                               Conservative             8          1
3                               Middle-of-the-Road       16         2
4                               Aggressive               32         4
5                               Very Aggressive          64         4
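The mapping from the Dynamic Configuration Counter to a prefetch distance and degree can be written as a lookup table. The values below are the five rows from the slide; the helper `step` and its saturating behavior at 1 and 5 are assumptions of this sketch.

```python
# Counter value -> (aggressiveness name, prefetch distance, prefetch degree).
CONFIGS = {
    1: ("Very Conservative", 4, 1),
    2: ("Conservative", 8, 1),
    3: ("Middle-of-the-Road", 16, 2),
    4: ("Aggressive", 32, 4),
    5: ("Very Aggressive", 64, 4),
}


def step(counter, delta):
    """Move the counter by delta, clamped to the range 1..5 (assumed saturating)."""
    return max(1, min(5, counter + delta))


counter = 3
counter = step(counter, +1)          # one step more aggressive
name, distance, degree = CONFIGS[counter]
print(name, distance, degree)
```

Starting from Middle-of-the-Road, a single increase selects the Aggressive row (distance 32, degree 4).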

Page 18: How to adapt? Prefetcher Aggressiveness (2)

For the current phase, classify the following based on static thresholds:
  Accuracy
  Lateness
  Cache pollution caused by prefetches

Accuracy   Lateness    Pollution       Action
High       Late        (any)           Increase
High       Not-Late    Polluting       Decrease
Medium     Late        Not-Polluting   Increase
Medium     (any)       Polluting       Decrease
Low        Not-Late    Not-Polluting   No Change
Low        (all other cases)           Decrease

Slide annotations: increasing aggressiveness improves timeliness; decreasing it reduces cache pollution and, at low accuracy, also reduces memory bandwidth usage.
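The decision cases on this slide can be expressed as a small function. This is a sketch, not the authors' implementation: the function name `adjust` and the string labels are hypothetical, and the two combinations the slide does not list (high or medium accuracy, not late, not polluting) are treated as "no change" here as an explicit assumption.

```python
def adjust(accuracy, late, polluting):
    """Return +1 (increase), -1 (decrease) or 0 (no change) in aggressiveness."""
    if accuracy == "high":
        if late:
            return +1                    # improve timeliness
        return -1 if polluting else 0    # 0 here is an assumption (case not on slide)
    if accuracy == "medium":
        if polluting:
            return -1                    # reduce cache pollution
        return +1 if late else 0         # 0 here is an assumption (case not on slide)
    if accuracy == "low":
        if not late and not polluting:
            return 0                     # listed on the slide as No Change
        return -1                        # reduce bandwidth usage and pollution
    raise ValueError("accuracy must be 'high', 'medium' or 'low'")


print(adjust("high", late=True, polluting=False))
print(adjust("medium", late=False, polluting=True))
print(adjust("low", late=False, polluting=False))
```

The return value is meant to be added to the Dynamic Configuration Counter, which then selects the prefetch distance and degree.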

Page 19: How to Adapt? Cache Insertion Policy for Prefetches

Why adapt? To reduce the potential for cache pollution.

Classify cache pollution based on static thresholds:
  Low: insert at the MID (n/2) position. E.g., for a 16-way cache, MID = 8 in the LRU stack.
  Medium: insert at the LRU-4 (n/4) position. E.g., for a 16-way cache, LRU-4 = 4 in the LRU stack.
  High: insert at the LRU position.
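The insertion rule can be sketched as a function of the pollution class and the cache associativity n. The function name and the convention that position 0 denotes the LRU end of the stack are assumptions made for this illustration; the n/2 and n/4 positions are the ones given on the slide.

```python
def insertion_position(pollution, ways):
    """LRU-stack position for a prefetched block (0 = LRU end, assumed numbering)."""
    if pollution == "low":
        return ways // 2      # MID
    if pollution == "medium":
        return ways // 4      # LRU-4
    if pollution == "high":
        return 0              # LRU
    raise ValueError("pollution must be 'low', 'medium' or 'high'")


for level in ("low", "medium", "high"):
    print(level, insertion_position(level, ways=16))
```

For a 16-way cache this reproduces the slide's examples: position 8 for MID and position 4 for LRU-4.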

Page 21: Evaluation Methodology

Execution-driven Alpha simulator
  Aggressive out-of-order superscalar processor
  1 MB, 16-way, 10-cycle unified L2 cache
  500-cycle minimum main memory latency
  Detailed memory model

Prefetchers modeled:
  Stream Prefetcher tracking 64 different streams
  Global History Buffer Prefetcher (in paper)
  PC-based Stride Prefetcher (in paper)

Page 22: Results: Adjusting Only Aggressiveness

4.7% higher average IPC than the Very Aggressive configuration
Most of the performance losses have been eliminated

Page 23: Results: Adjusting Only Cache Insertion Policy

5.1% better than inserting prefetches in the MRU position
1.9% better than inserting prefetches in the LRU-4 position

[Figure: Instructions per Cycle (axis 0.0 to 5.0) with the Very Aggressive prefetcher, comparing No Prefetching with the LRU, LRU-4, MID, MRU, and Dynamic Insertion policies.]

Page 24: Results: Putting it all together (FDP)

6.5% IPC improvement over the Very Aggressive configuration
Performance losses converted to performance gains!

[Figure annotations: 11% and 13%.]

Page 25: Bandwidth Impact

BPKI: memory bus accesses per 1000 retired instructions. It includes the effects of L2 demand misses as well as pollution-induced misses and prefetches.

        No Pref.   Very Cons.   Mid     Very Aggr.   FDP
IPC     0.85       1.21         1.47    1.57         1.67
BPKI    8.56       9.34         10.60   13.38        10.88

FDP significantly improves bandwidth efficiency:
  6.5% higher performance and 18.7% less bandwidth than Very Aggressive
  13.6% higher performance than Middle-of-the-Road with similar bandwidth usage
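The headline percentages can be checked against the IPC and BPKI rows of the table. The short calculation below uses only the table values; the dictionary keys and the helper `pct_change` are names chosen for this sketch. Because the table values are rounded to two decimals, the recomputed IPC gain over Very Aggressive comes out near 6.4% rather than exactly the 6.5% quoted on the slide.

```python
ipc = {"No Pref.": 0.85, "Very Cons.": 1.21, "Mid": 1.47, "Very Aggr.": 1.57, "FDP": 1.67}
bpki = {"No Pref.": 8.56, "Very Cons.": 9.34, "Mid": 10.60, "Very Aggr.": 13.38, "FDP": 10.88}


def pct_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old


ipc_vs_aggr = pct_change(ipc["FDP"], ipc["Very Aggr."])    # IPC gain over Very Aggressive
bw_vs_aggr = pct_change(bpki["FDP"], bpki["Very Aggr."])   # negative means less bandwidth
ipc_vs_mid = pct_change(ipc["FDP"], ipc["Mid"])            # IPC gain over Middle-of-the-Road
bw_vs_mid = pct_change(bpki["FDP"], bpki["Mid"])           # small positive: similar bandwidth

for label, value in [("IPC vs Very Aggr.", ipc_vs_aggr), ("BPKI vs Very Aggr.", bw_vs_aggr),
                     ("IPC vs Mid", ipc_vs_mid), ("BPKI vs Mid", bw_vs_mid)]:
    print(f"{label}: {value:+.1f}%")
```

The BPKI difference between FDP and Middle-of-the-Road is under 3%, which is what "similar bandwidth usage" refers to in this reading.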

Page 26: Hardware Cost

Component                Size                   Cost
pref-bits for L2 cache   16384 blocks           16384 bits
Pollution Filter         4096 entries * 1 bit   4096 bits
16-bit counters          11 counters            176 bits
pref-bits for MSHR       128 entries            128 bits

Total hardware cost: 20784 bits = 2.54 KB
Percentage area overhead compared to the baseline 1 MB L2 cache: 2.5 KB / 1024 KB = 0.24%
NOT on the critical path
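The storage total can be reproduced from the four components listed on the slide. The variable names are chosen for this sketch. Note that the slide rounds the total to 2.5 KB before dividing, giving 0.24%; using the unrounded total gives a value just under 0.25%.

```python
# Bits per component, as listed on the slide.
components = {
    "pref-bits for L2 cache": 16384 * 1,   # one bit per L2 block
    "Pollution Filter": 4096 * 1,          # 4096 one-bit entries
    "16-bit counters": 11 * 16,            # eleven 16-bit counters
    "pref-bits for MSHR": 128 * 1,         # one bit per MSHR entry
}

total_bits = sum(components.values())
total_kb = total_bits / 8 / 1024           # bits -> bytes -> KB
overhead_pct = 100.0 * total_kb / 1024     # relative to a 1 MB (1024 KB) L2 cache

print(total_bits, round(total_kb, 2), round(overhead_pct, 3))
```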

Page 27: Outline

Background and Motivation
Feedback Directed Prefetching
  Metrics and collecting this information in hardware
  How to adapt
Results
Conclusions

Page 28: Contributions

Comprehensive and low-cost feedback mechanism for hardware prefetchers
  Uses: Prefetcher Accuracy, Prefetcher Lateness, Prefetcher-caused Cache Pollution
  Adapts: Aggressiveness, Cache Insertion Policy for prefetches
6.5% higher performance and 18.7% less bandwidth compared to Very Aggressive prefetching
Eliminates the negative impact of prefetching
Applicable to any data prefetch algorithm

Page 29: Questions?

Page 30: Backups

Page 31: FDP vs Prefetch Cache

Prefetch caches eliminate prefetcher-induced cache pollution.
However, prefetches are then limited to the size of the prefetch cache.

FDP has 5.3% higher performance than Very Aggressive with a 32 KB prefetch cache.
FDP is within 2% of Very Aggressive with a 64 KB prefetch cache.
The memory bandwidth of FDP is 16% less than with the 32 KB prefetch cache and 9% less than with the 64 KB prefetch cache.

Page 32: Performance on Other Prefetch Algorithms

Global History Buffer Prefetcher
  20.8% less memory bandwidth than Very Aggressive with similar performance
  9.9% better performance than Middle-of-the-Road with similar bandwidth usage

PC-based Stride Prefetcher
  4% better performance than Very Aggressive
  24% reduction in bandwidth usage

Page 33: IPC Performance
Page 34: Dynamic Prefetcher Accuracy
Page 35: Prefetch Lateness
Page 36: Pollution Filter
Page 37: Thresholds
Page 38: Prefetches Sent
Page 39: Distribution of dynamic aggressiveness level
Page 40: Distribution of insertion position of prefetched blocks
Page 41: Effect of FDP on memory bandwidth consumption
Page 42: Performance of prefetch cache vs. FDP
Page 43: Bandwidth consumption of prefetch cache vs. FDP
Page 44: Effect of FDP on GHB
Page 45: Effect of FDP on GHB (Bandwidth)
Page 46: Effect of varying L2 size and memory latency
Page 47: IPC on other benchmarks
Page 48: BPKI on other benchmarks

