Page 1: ECE8833 Polymorphous and Many-Core Computer Architecture

ECE8833 Polymorphous and Many-Core Computer Architecture

Prof. Hsien-Hsin S. Lee, School of Electrical and Computer Engineering

Lecture 6 Fair Caching Mechanisms for CMP

Page 2: ECE8833 Polymorphous and Many-Core Computer Architecture


Cache Sharing in CMP [Kim, Chandra, Solihin, PACT’04]

[Diagram: two processor cores, each with a private L1 cache, sharing an L2 cache]

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

[Kim, Chandra, Solihin PACT2004]

Page 3: ECE8833 Polymorphous and Many-Core Computer Architecture


Cache Sharing in CMP

[Diagram: thread t1 runs on one core and fills the shared L2]

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 4: ECE8833 Polymorphous and Many-Core Computer Architecture


Cache Sharing in CMP

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

[Diagram: thread t2 runs on the other core and fills the shared L2]

Page 5: ECE8833 Polymorphous and Many-Core Computer Architecture


Cache Sharing in CMP

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

[Diagram: threads t1 and t2 run together; t1 occupies most of the shared L2, squeezing t2's share]

t2’s throughput is significantly reduced due to unfair cache sharing.

Page 6: ECE8833 Polymorphous and Many-Core Computer Architecture


Shared L2 Cache Space Contention

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

[Charts: gzip's normalized cache misses per instruction (left) and normalized IPC (right) for gzip(alone), gzip+applu, gzip+apsi, gzip+art, gzip+swim]

Page 7: ECE8833 Polymorphous and Many-Core Computer Architecture


Impact of Unfair Cache Sharing


• Uniprocessor scheduling

• 2-core CMP scheduling

• gzip will get more time slices than others if gzip is set to run at a higher priority (yet it could still run slower than the others: priority inversion)

• It could further slow down the other processes (starvation); thus the overall throughput is reduced (uniform slowdown)

[Diagram: time-slice schedules of threads t1–t4 under uniprocessor scheduling and under 2-core CMP scheduling on processors P1 and P2]

Page 8: ECE8833 Polymorphous and Many-Core Computer Architecture


Stack Distance Profiling Algorithm

[Diagram: a hit counter per LRU stack position, CTR Pos 0 (MRU) through CTR Pos 3 (LRU); a hit at a given recency position increments that position's counter]

Example hit counter values:

CTR Pos 0 (MRU): 30
CTR Pos 1: 20
CTR Pos 2: 15
CTR Pos 3 (LRU): 10
Misses = 25

[Qureshi+, MICRO-39]

Page 9: ECE8833 Polymorphous and Many-Core Computer Architecture


Stack Distance Profiling

• A counter for each cache way; C>A is the counter for misses (accesses whose stack distance exceeds the associativity A)
• Shows the reuse frequency for each way in a cache
• Can be used to predict the misses for associativity smaller than A
  – Misses for a 2-way cache for gzip = C>A + Σ Ci for i = 3 to 8 (for an 8-way cache)
• art does not need all the space, likely due to poor temporal locality
• If the space given to art is halved and given to gzip, what happens?
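To make the arithmetic concrete, here is a minimal sketch (mine, not from the slides) of this prediction in Python; the function name and example values mirror the 4-way example on the previous slide:

# Sketch (not from the slides): predict misses at a smaller
# associativity from stack distance profile counters.
# ctr[i] = hits at recency position i (0 = MRU); misses_beyond = C>A.

def predicted_misses(ctr, misses_beyond, target_ways):
    # LRU is a stack algorithm: a hit at stack distance d is still a
    # hit for any associativity > d, and becomes a miss otherwise.
    return misses_beyond + sum(ctr[target_ways:])

# The 4-way example from the previous slide: counters 30/20/15/10,
# 25 misses. With only 2 ways, positions 2 and 3 turn into misses:
print(predicted_misses([30, 20, 15, 10], 25, 2))  # 25 + 15 + 10 = 50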

Page 10: ECE8833 Polymorphous and Many-Core Computer Architecture


Fairness Metrics [Kim et al. PACT’04]

• Uniform slowdown:

  T_shared_i / T_alone_i = T_shared_j / T_alone_j

  where T_alone_i is the execution time of t_i when it runs alone.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 11: ECE8833 Polymorphous and Many-Core Computer Architecture


Fairness Metrics [Kim et al. PACT’04]

• Uniform slowdown:

  T_shared_i / T_alone_i = T_shared_j / T_alone_j

  where T_shared_i is the execution time of t_i when it shares the cache with others.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 12: ECE8833 Polymorphous and Many-Core Computer Architecture


Fairness Metrics [Kim et al. PACT’04]

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

• Uniform slowdown:

  T_shared_i / T_alone_i = T_shared_j / T_alone_j

• We want to minimize:

  M0 = Σ_{i,j} |X_i − X_j|, where X_i = T_shared_i / T_alone_i

  – Ideally: M0 = 0

Try to equalize the ratio of miss increase of each thread

Page 13: ECE8833 Polymorphous and Many-Core Computer Architecture


Fairness Metrics [Kim et al. PACT’04]

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

• Uniform slowdown:

  T_shared_i / T_alone_i = T_shared_j / T_alone_j

• We want to minimize M = Σ_{i,j} |X_i − X_j| (ideally 0). Since T_alone_i is hard to measure at run time, miss-based definitions of X_i serve as proxies:

  – M0: X_i = T_shared_i / T_alone_i
  – M1: X_i = Miss_shared_i / Miss_alone_i
  – M3: X_i = MissRate_shared_i / MissRate_alone_i
  – M5: X_i = MissRate_shared_i − MissRate_alone_i
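A small sketch (my own illustration, with hypothetical per-thread numbers) of how these metrics would be computed:

from itertools import combinations

# Each fairness metric sums |X_i - X_j| over all thread pairs;
# the metrics differ only in how X_i is defined.

def fairness(xs):
    # 0 means perfectly fair under the chosen definition of X.
    return sum(abs(xi - xj) for xi, xj in combinations(xs, 2))

t_shared, t_alone = [12.0, 9.0], [10.0, 3.0]          # hypothetical
mr_shared, mr_alone = [0.20, 0.15], [0.20, 0.05]      # hypothetical

m0 = fairness([s / a for s, a in zip(t_shared, t_alone)])    # time ratios
m3 = fairness([s / a for s, a in zip(mr_shared, mr_alone)])  # miss-rate ratios
m5 = fairness([s - a for s, a in zip(mr_shared, mr_alone)])  # miss-rate differences
print(m0, m3, m5)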

Page 14: ECE8833 Polymorphous and Many-Core Computer Architecture


Partitionable Cache Hardware

• Modified LRU cache replacement policy [G. E. Suh et al., HPCA 2002]
• Per-thread counters track each thread's current cache occupancy

[Diagram: a P2 miss arrives with current partition P1: 448KB, P2: 576KB against target partition P1: 384KB, P2: 640KB]

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 15: ECE8833 Polymorphous and Many-Core Computer Architecture


Partitionable Cache Hardware

LRULRU

LRU* LRU

P1: 448B

P2 Miss

P2: 576B

Current Partition

P1: 384BP2: 640B

Target Partition

• Modified LRU cache replacement policy– G. Suh, et. al., HPCA 2002

LRULRU

LRU* LRU

P1: 384BP2: 640B

Current Partition

P1: 384BP2: 640B

Target Partition

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Partition granularity could be as coarse as one entire cache way.

Page 16: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

Example: optimizing the M3 metric. Three sets of counters are kept per repartitioning interval:

  – MissRate alone (P1, P2): miss rates when each process runs alone (from stack distance profiling)
  – MissRate shared (P1, P2): dynamic miss rates while running with a shared cache
  – Target Partition (P1, P2): target partition sizes

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

A repartitioning interval of 10K accesses was found to be the best.

Page 17: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

1st interval: MissRate alone is P1: 20%, P2: 5%. The measured MissRate shared comes out to P1: 20%, P2: 15%. Target partition: P1: 256KB, P2: 256KB.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 18: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

Repartition! Evaluate M3: P1: 20% / 20% = 1.0; P2: 15% / 5% = 3.0. P2 suffers the larger relative miss increase, so the target partition moves from P1: 256KB, P2: 256KB to P1: 192KB, P2: 320KB. Partition granularity: 64KB.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 19: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

2nd interval: MissRate alone stays P1: 20%, P2: 5%; MissRate shared from the previous interval is P1: 20%, P2: 15%. Target partition: P1: 192KB, P2: 320KB.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 20: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

Repartition! The new measured MissRate shared is P1: 20%, P2: 10%. Evaluate M3: P1: 20% / 20% = 1.0; P2: 10% / 5% = 2.0. The target partition moves from P1: 192KB, P2: 320KB to P1: 128KB, P2: 384KB.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 21: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

3rd interval: MissRate alone is P1: 20%, P2: 5%; the previous MissRate shared was P1: 20%, P2: 10%. With target partition P1: 128KB, P2: 384KB, the newly measured MissRate shared is P1: 25%, P2: 9%.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

Page 22: ECE8833 Polymorphous and Many-Core Computer Architecture


Dynamic Fair Caching Algorithm

Repartition! Do rollback if Δ < Trollback, where Δ = MRold − MRnew. Here P2 gained space but its miss rate barely improved (Δ = 10% − 9% = 1%), so the target partition rolls back from P1: 128KB, P2: 384KB to P1: 192KB, P2: 320KB.

Slide courtesy: Seongbeom Kim, D. Chandra and Y. Solihin @NCSU

The best Trollback threshold was found to be 20%.

Page 23: ECE8833 Polymorphous and Many-Core Computer Architecture


Generic Repartitioning Algorithm

Pick the largest and the smallest as a pair for repartitioning; repeat for all candidate processes.
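A sketch of this repartitioning loop as I read the slides, in Python; the function names are mine, the 64KB granularity matches the earlier example, and the rollback check (Δ < Trollback) is omitted for brevity:

GRANULARITY = 64  # KB, the chunk size from the example slides

def repartition(target, mr_shared, mr_alone):
    # X_i = MissRate_shared / MissRate_alone, per process.
    x = {p: mr_shared[p] / mr_alone[p] for p in target}
    ranked = sorted(target, key=lambda p: x[p])
    # Pair the most-slowed (largest X) with the least-slowed process
    # and move one chunk of target partition between them.
    while len(ranked) >= 2:
        worst, best = ranked.pop(), ranked.pop(0)
        if target[best] > GRANULARITY:
            target[best] -= GRANULARITY
            target[worst] += GRANULARITY
    return target

# First interval of the earlier example: P2 is the more slowed thread.
print(repartition({"P1": 256, "P2": 256},
                  {"P1": 0.20, "P2": 0.15},
                  {"P1": 0.20, "P2": 0.05}))  # {'P1': 192, 'P2': 320}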

Page 24: ECE8833 Polymorphous and Many-Core Computer Architecture

Utility-Based Cache Partitioning (UCP)

Page 25: ECE8833 Polymorphous and Many-Core Computer Architecture


Running Processes on Dual-Core [Qureshi & Patt, MICRO-39]

• LRU: in real runs, on average 7 ways were allocated to equake and 9 to vpr
• UTIL
  – How much you use (in a set) is how much you will get
  – Ideally, 3 ways to equake and 13 to vpr

[Charts: misses vs. number of ways given (1 to 16) for equake and vpr]

Page 26: ECE8833 Polymorphous and Many-Core Computer Architecture


Defining Utility

Utility U_a^b = Misses with a ways − Misses with b ways

[Chart: misses per 1000 instructions vs. number of ways from a 16-way 1MB L2, contrasting low-utility, high-utility, and saturating-utility applications]

Slide courtesy: Moin Qureshi, MICRO-39

Page 27: ECE8833 Polymorphous and Many-Core Computer Architecture


Framework for UCP

Slide courtesy: Moin Qureshi, MICRO-39

Three components:

Utility Monitors (UMON) per core

Partitioning Algorithm (PA)

Replacement support to enforce partitions

[Diagram: two cores, each with private I$ and D$ and a per-core UMON; the UMONs feed the PA, which controls the shared L2 cache backed by main memory]

Page 28: ECE8833 Polymorphous and Many-Core Computer Architecture


Utility Monitors (UMON)

• For each core, simulate the LRU policy using an Auxiliary Tag Directory (ATD)
• UMON-global: one set of way-counters shared by all sets
• Hit counters in the ATD count hits per recency position
• LRU is a stack algorithm, so hit counts give utility directly, e.g., hits(2 ways) = H0 + H1

[Diagram: ATD over sets A–H with hit counters H0 (MRU) through H15 (LRU)]

Page 29: ECE8833 Polymorphous and Many-Core Computer Architecture


Utility Monitors (UMON)

• The extra tags incur hardware and power overhead
• Dynamic Set Sampling (DSS) reduces the overhead [Qureshi et al. ISCA'06]

[Diagram: the full ATD with counters H0–H15 versus a sampled subset of sets]

Page 30: ECE8833 Polymorphous and Many-Core Computer Architecture


Utility Monitors (UMON)

• The extra tags incur hardware and power overhead; DSS reduces it [Qureshi et al. ISCA'06]
• 32 sets are sufficient, based on Chebyshev's inequality
• Sampling every 32 sets (simple static sampling) is used in the paper
• Storage < 2KB per UMON (or 0.17% of the L2)

[Diagram: UMON with DSS keeps ATD entries and counters for only a sample of sets, e.g., sets B, E, F]

Page 31: ECE8833 Polymorphous and Many-Core Computer Architecture


Partitioning Algorithm (PA)

• Evaluate all possible partitions and select the best
• With a ways to core1 and (16−a) ways to core2:
  – Hits_core1 = H0 + H1 + … + H(a−1), from UMON1
  – Hits_core2 = H0 + H1 + … + H(16−a−1), from UMON2
  – Select the a that maximizes (Hits_core1 + Hits_core2)
• Partitioning is done once every 5 million cycles
• After each partitioning interval, the hit counters in all UMONs are halved to retain some past information
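A sketch (mine, not from the paper's artifact) of this exhaustive search for a dual-core, 16-way cache; the UMON counter values are hypothetical, chosen so the result echoes the equake/vpr example (3 vs. 13 ways):

def best_partition(umon1, umon2, total_ways=16):
    # LRU is a stack algorithm: hits with a ways = H0 + ... + H(a-1).
    def hits(h, ways):
        return sum(h[:ways])
    # Try every split that leaves each core at least one way.
    return max(range(1, total_ways),
               key=lambda a: hits(umon1, a) + hits(umon2, total_ways - a))

umon1 = [200, 150, 100, 5] + [1] * 12  # saturating utility (equake-like)
umon2 = [40] * 16                      # keeps benefiting (vpr-like)
a = best_partition(umon1, umon2)
print(a, 16 - a)  # -> 3 13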

Page 32: ECE8833 Polymorphous and Many-Core Computer Architecture


Replacement Policy to Reach the Desired Partition

Use way partitioning [Suh+ HPCA'02, Iyer ICS'04]:

• Each line carries core-id bits
• On a miss, count ways_occupied in the set by the miss-causing app
• Binary decision for dual-core (in this paper):
  – if ways_occupied < ways_given: the victim is the LRU line of the other app
  – otherwise: the victim is the LRU line of the miss-causing app
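A sketch of this victim-selection rule, assuming a set held as an MRU-to-LRU list of (core-id, tag) pairs; the names are mine:

def pick_victim(lines, miss_core, ways_given):
    # lines: (core_id, tag) pairs ordered MRU -> LRU.
    occupied = sum(1 for core, _ in lines if core == miss_core)
    wants_other = occupied < ways_given  # under quota: take from others
    for i in range(len(lines) - 1, -1, -1):  # scan from the LRU end
        if (lines[i][0] != miss_core) == wants_other:
            return i
    return len(lines) - 1  # fallback: plain LRU

lines = [(0, 'a'), (1, 'b'), (0, 'c'), (1, 'd')]  # MRU -> LRU
print(pick_victim(lines, 0, ways_given=3))  # 3: core 1's LRU line 'd'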

Page 33: ECE8833 Polymorphous and Many-Core Computer Architecture


UCP Performance (Weighted Speedup)

UCP improves average weighted speedup by 11% (Dual Core)

Page 34: ECE8833 Polymorphous and Many-Core Computer Architecture


UCP Performance (Throughput)

UCP improves average throughput by 17%

Page 35: ECE8833 Polymorphous and Many-Core Computer Architecture

Dynamic Insertion Policy

Page 36: ECE8833 Polymorphous and Many-Core Computer Architecture


Conventional LRU

[Diagram: a cache set's recency stack, from MRU to LRU]

Slide Source: Yuejian Xie

Page 37: ECE8833 Polymorphous and Many-Core Computer Architecture


Conventional LRU

[Diagram: a block inserted at MRU drifts down the recency stack toward LRU]

A dead block occupies one cache block for a long time with no benefit!

Slide Source: Yuejian Xie

Page 38: ECE8833 Polymorphous and Many-Core Computer Architecture


LIP: LRU Insertion Policy [Qureshi et al. ISCA’07]

[Diagram: the incoming block is inserted at the LRU position instead of MRU]

Slide Source: Yuejian Xie

Page 39: ECE8833 Polymorphous and Many-Core Computer Architecture


LIP: LRU Insertion Policy [Qureshi et al. ISCA’07]

[Diagram: a useless block inserted at LRU is evicted at the next eviction; a useful block that gets reused is moved to the MRU position]

Adapted Slide from Yuejian Xie

Page 40: ECE8833 Polymorphous and Many-Core Computer Architecture


LIP: LRU Insertion Policy [Qureshi et al. ISCA’07]

[Diagram: as above, a useless block is evicted at the next eviction while a useful block is moved to the MRU position]

Slide Source: Yuejian Xie

LIP is not entirely new: Intel tried this in 1998 when designing "Timna" (which integrated the CPU and a graphics accelerator sharing the L2).

Page 41: ECE8833 Polymorphous and Many-Core Computer Architecture


BIP: Bimodal Insertion Policy [Qureshi et al. ISCA’07]

LIP may not age older lines, so BIP infrequently inserts lines at the MRU position. Let e = bimodal throttle parameter:

if ( rand() < e )
    insert at MRU position;   // as in the LRU replacement policy
else
    insert at LRU position;

Promote to MRU if reused.
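A sketch of BIP on a recency stack held as a Python list (MRU at the front); e = 1/32 is a commonly used value, not taken from this slide:

import random

EPSILON = 1 / 32  # bimodal throttle parameter (assumed value)

def bip_insert(stack, block, capacity):
    # stack: blocks ordered MRU (front) -> LRU (back).
    if len(stack) >= capacity:
        stack.pop()                # evict the LRU block
    if random.random() < EPSILON:
        stack.insert(0, block)     # rare MRU insertion (LRU-policy style)
    else:
        stack.append(block)        # usual LIP-style insertion at LRU

def on_hit(stack, block):
    stack.remove(block)
    stack.insert(0, block)         # promote to MRU if reused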

Page 42: ECE8833 Polymorphous and Many-Core Computer Architecture


DIP: Dynamic Insertion Policy [Qureshi et al. ISCA’07]

Two types of workloads: LRU-friendly or BIP-friendly.

DIP can be implemented by:

1. Monitoring both policies (LRU and BIP)
2. Choosing the best-performing policy
3. Applying the best policy to the cache

This needs a cost-effective implementation: "Set Dueling".

[Diagram: DIP chooses between LRU and BIP; BIP itself behaves like LIP (insert at LRU) with probability 1−ε and like LRU (insert at MRU) with probability ε]

Page 43: ECE8833 Polymorphous and Many-Core Computer Architecture


Set Dueling for DIP [Qureshi et al. ISCA’07]

Divide the cache in three:
• Dedicated LRU sets
• Dedicated BIP sets
• Follower sets (use the winner of LRU vs. BIP)

An n-bit saturating counter tracks the duel:
• A miss in a dedicated LRU set: counter++
• A miss in a dedicated BIP set: counter−−

The counter decides the policy for the follower sets:
• MSB = 0: use LRU
• MSB = 1: use BIP

Monitor, choose, and apply, all using a single counter.

Slide Source: Moin Qureshi
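A sketch of the single-counter mechanism; the set-index mapping and the counter width here are illustrative, not the paper's exact choices:

N_BITS = 10
PSEL_MAX = (1 << N_BITS) - 1
psel = PSEL_MAX // 2  # the single n-bit saturating counter

def on_miss(set_index):
    global psel
    if set_index < 32:                   # dedicated LRU set (assumed mapping)
        psel = min(psel + 1, PSEL_MAX)
    elif set_index < 64:                 # dedicated BIP set (assumed mapping)
        psel = max(psel - 1, 0)
    # misses in follower sets do not update the counter

def follower_policy():
    # MSB = 0 -> LRU is winning; MSB = 1 -> BIP is winning.
    return "LRU" if psel < (PSEL_MAX + 1) // 2 else "BIP"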

Page 44: ECE8833 Polymorphous and Many-Core Computer Architecture

Promotion/Insertion Pseudo Partitioning

Page 45: ECE8833 Polymorphous and Many-Core Computer Architecture


PIPP [Xie & Loh ISCA'09]

• What's PIPP?
  – Promotion/Insertion Pseudo Partitioning
  – Achieves both capacity management (as UCP does) and dead-time management (as DIP does)
• Eviction
  – The LRU block is the victim
• Insertion
  – A new block is inserted the core's quota worth of positions away from LRU
• Promotion
  – On a hit, a block moves toward MRU by only one position

[Diagram: a set's recency stack from MRU to LRU; a hit promotes a block by one position, a new block is inserted at position 3 from LRU (the target allocation), and the LRU block is evicted]

Slide Source: Yuejian Xie
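A sketch of PIPP's insertion and promotion on one set, following the rules above (the paper also makes promotion probabilistic; this sketch always promotes by one, as the slide states). The example reproduces the first step of the walkthrough that follows:

def pipp_access(stack, block, core, quota, capacity=8):
    # stack: blocks ordered MRU -> LRU; quota: {core_id: target ways}.
    if block in stack:
        i = stack.index(block)
        if i > 0:                          # hit: promote toward MRU by one
            stack[i - 1], stack[i] = stack[i], stack[i - 1]
        return
    if len(stack) >= capacity:
        stack.pop()                        # miss: evict the LRU block
    # Insert quota[core] positions from the LRU end.
    pos = max(len(stack) - quota[core] + 1, 0)
    stack.insert(pos, block)

# The slides' example: Core1 (quota 3) brings in block D.
stack = ['1', 'A', '2', '3', '4', '5', 'B', 'C']
pipp_access(stack, 'D', core=1, quota={0: 5, 1: 3})
print(stack)   # D ends up three positions from the LRU end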

Page 46: ECE8833 Polymorphous and Many-Core Computer Architecture


PIPP Example (Core0 quota: 5 blocks, Core1 quota: 3 blocks)

Set contents, MRU → LRU: 1 A 2 3 4 5 B C (digits are Core0's blocks, letters are Core1's). Core1 requests block D; since Core1's quota is 3, D will be inserted three positions from the LRU end.

Slide Source: Yuejian Xie

Page 47: ECE8833 Polymorphous and Many-Core Computer Architecture


PIPP Example (continued)

The LRU block C was evicted and D inserted three positions from the LRU end. Core0 now requests block 6; since Core0's quota is 5, it will be inserted five positions from the LRU end.

Slide Source: Yuejian Xie

Page 48: ECE8833 Polymorphous and Many-Core Computer Architecture


PIPP Example (continued)

Block 6 was inserted five positions from the LRU end. Core0 next requests block 7, which is inserted the same way.

Slide Source: Yuejian Xie

Page 49: ECE8833 Polymorphous and Many-Core Computer Architecture


PIPP Example (continued)

After block 7 is inserted, a request for Core1's block D hits in the cache, so D is promoted one position toward MRU.

Slide Source: Yuejian Xie

Page 50: ECE8833 Polymorphous and Many-Core Computer Architecture


How PIPP Does Both Kinds of Management

Core:  Core0  Core1  Core2  Core3
Quota:     6      4      4      2

[Diagram: blocks of cores with smaller quotas are inserted closer to the LRU position]

Slide Source: Yuejian Xie

Page 51: ECE8833 Polymorphous and Many-Core Computer Architecture


Pseudo Partitioning Benefits

[Diagram: under a strict partition, Core0's 5 blocks and Core1's 3 blocks form separate recency stacks (MRU0→LRU0, MRU1→LRU1); a new block replaces only within its own core's region]

Slide Source: Yuejian Xie

Page 52: ECE8833 Polymorphous and Many-Core Computer Architecture


Pseudo Partitioning Benefits

[Diagram: under a pseudo partition, all blocks share a single MRU→LRU stack, so a new block can displace another core's line; here Core1 "stole" a line from Core0]

Slide Source: Yuejian Xie

Page 53: ECE8833 Polymorphous and Many-Core Computer Architecture


Pseudo Partitioning Benefits

Page 54: ECE8833 Polymorphous and Many-Core Computer Architecture


Single Reuse Block

[Diagram: a newly inserted block that is reused exactly once, traced on the MRU→LRU stack under two insertion/promotion policies]

Slide Source: Yuejian Xie

Page 55: ECE8833 Polymorphous and Many-Core Computer Architecture


Algorithm Comparison

Algorithm    | Capacity Management | Dead-time Management | Note
LRU          | no                  | no                   | Baseline, no explicit management
UCP          | yes                 | no                   | Strict partitioning
DIP / TADIP  | no                  | yes                  | Insert at LRU and promote to MRU on hit
PIPP         | yes                 | yes                  | Pseudo-partitioning and incremental promotion

Slide Source: Yuejian Xie

