
Amoeba-Cache Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

Date post: 16-Feb-2016
Page 1: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba-Cache Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

Snehasish Kumar, Arrvindh Shriraman, Eric Matthews, Lesley Shannon

Hongzhou Zhao, Sandhya Dwarkadas

Page 2: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba Cache : Adaptive blocks for Eliminating Waste in the Memory Hierarchy


Fixed granularity cache organisation

Tag Array Data Array

Page 3: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy


Cache data utilization

Legend: Tags, Data, Untouched Data

Tag Array Data Array

Utilization = Fraction of words touched in cache block at the time of eviction
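The utilization metric defined above can be sketched in a few lines of C. This is only an illustration: the 8-word block (64-byte block, 8-byte words) and the per-block `touched` bitmap are assumptions made for the example, not something the slide specifies.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_BLOCK 8   /* assumed: 64-byte block, 8-byte words */

/* Count the set bits in an 8-bit touched-word bitmap (hypothetical bookkeeping). */
static int count_touched(uint8_t touched) {
    int n = 0;
    while (touched) {
        n += touched & 1u;
        touched >>= 1;
    }
    return n;
}

/* Utilization of one block at eviction: touched words / words per block. */
static double block_utilization(uint8_t touched) {
    return (double)count_touched(touched) / WORDS_PER_BLOCK;
}
```

For a block in which only the first four words were touched, `block_utilization(0x0F)` evaluates to 0.5.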

Page 4: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Cache utilization

[Figure: cache utilization (0% to 100%) per workload for a 64K L1, 4 ways, 64B/block. Workloads: apache, canneal, eclipse, firefox, h2, jbb, lbm, mcf, tpcc, x264.]

Page 5: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Block Distribution

[Figure: pie charts of block distribution by # Words Touched (1-2, 3-4, 5-6, 7-8) for Apache, Eclipse, Firefox and Canneal; 64K, 64B/block. Slice values as extracted: 55%, 13%, 6%, 26%; 18%, 5%, 4%, 73%; 40%, 26%, 9%, 25%; 75%, 14%, 6%, 5%.]

Page 6: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Block Distribution

[Figure: pie charts of block distribution by # Words Touched (1-2, 3-4, 5-6, 7-8) for Canneal at two cache sizes, 64K (64B/block) and 1M (64B/block). Slice values as extracted: 58%, 20%, 12%, 10%; 75%, 14%, 6%, 5%.]

Page 7: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Factors affecting cache utilization

• Application specific behaviour
  - Inefficient data structure access patterns

• Interaction with cache geometry
  - Way conflicts reduce block lifetime and cause poor utilization

Page 8: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Application Specific Behaviour

    struct TIE {
        long long X, Y, Z;
        long long V, H;
        long long data[3];
    } Imperial[1024];

Data Array layout of one element: X Y Z V H Data[3]

Access in a loop

    for (int i = 0; i < 1024; i++) {
        Imperial[i].X = …;
        Imperial[i].Y = …;
        Imperial[i].Z = …;
        Imperial[i].V = …;
    }
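To make the waste in this example concrete, the sketch below works out which words of one `struct TIE` element the loop touches. It assumes 8-byte `long long` words and that each array element is aligned to one 64-byte cache block; `word_bit` and `loop_touched_mask` are hypothetical helper names used only here.

```c
#include <assert.h>
#include <stddef.h>

struct TIE {
    long long X, Y, Z;
    long long V, H;
    long long data[3];
};

/* Number of words in one element (8 fields of identical type, so no padding). */
static size_t total_words(void) {
    return sizeof(struct TIE) / sizeof(long long);
}

/* Bit for the word that starts at byte `offset` (bit 0 = first word). */
static unsigned word_bit(size_t offset) {
    return 1u << (offset / sizeof(long long));
}

/* Words written by the loop body: X, Y, Z and V only. */
static unsigned loop_touched_mask(void) {
    return word_bit(offsetof(struct TIE, X)) | word_bit(offsetof(struct TIE, Y)) |
           word_bit(offsetof(struct TIE, Z)) | word_bit(offsetof(struct TIE, V));
}
```

Under these assumptions the loop touches 4 of the 8 words in every element, so half of each fetched block (H and data[3]) is never used.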

Page 9: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy


Cache Geometry

Data Array – 4 ways

Problem: Lots of data map to the same set

[Diagram: blocks numbered 1 to 5 shown against the 4-way data array]
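The set-conflict problem can be illustrated with the 64K, 4-way, 64B/block geometry used on the earlier utilization slide. The modulo indexing below is the textbook scheme and is an assumption; the slides do not say how the set index is derived.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BYTES 65536u  /* 64K L1 */
#define WAYS        4u
#define BLOCK_BYTES 64u
#define NUM_SETS    (CACHE_BYTES / (WAYS * BLOCK_BYTES))  /* 256 sets */

/* Conventional set index: block number modulo the number of sets (assumed). */
static uint32_t set_index(uint64_t addr) {
    return (uint32_t)((addr / BLOCK_BYTES) % NUM_SETS);
}
```

Addresses that are a multiple of `NUM_SETS * BLOCK_BYTES` (16 KB) apart land in the same set, so a fifth such block must displace one of the four already resident, regardless of how much of the evicted block was used.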

Page 10: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Implications

1. Shrinks effective cache space

2. Increases miss rate

3. Wastes on-chip bandwidth

4. Increases on-chip cache energy consumption

Page 11: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Target Metrics

• Miss Rate
• Space Utilisation
• Bandwidth

[Diagram label: Amoeba Cache]

Page 12: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Variable Granularity Blocks

Tag Array Data Array

How to support a variable number of blocks per set?

How to support variable granularity for each block?

Page 13: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy


Our Approach : Amoeba Cache

Unified SRAM Array

Page 14: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba Cache

• Insert
• Lookup
• Partial Miss
• Overheads

Page 15: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

SRAM Array

Tag (1 word): Region Tag | Start | End
Data Block (1+ words)

Bitmaps: Valid? and Tag?

[Diagram: SRAM Array rows, each with a Valid? and a Tag? bitmap, all shown as 0000 0000]
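One way to picture the unified array is as a set of words in which each word is either a tag or data, with two bitmaps recording which is which. The C layout below is a software model under assumptions: 8 words per set (as on the later Insert slides), bit i of each bitmap describing word i, and `End` stored as one past the last word. All type and field names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_SET 8   /* assumed, matches the 8 words/set on the Insert slides */

/* Tag contents: Region Tag, Start, End. Hardware packs this into one word;
 * this C struct is wider and only models the fields. */
struct amoeba_tag {
    uint64_t region;  /* region tag */
    uint8_t  start;   /* first word of the region held by the block */
    uint8_t  end;     /* one past the last word held (assumed convention) */
};

/* Each slot of the set holds either a tag or one data word. */
union amoeba_word {
    struct amoeba_tag tag;
    uint64_t data;
};

struct amoeba_set {
    union amoeba_word words[WORDS_PER_SET];
    uint8_t valid;   /* Valid? bitmap: bit i set if word i is in use */
    uint8_t is_tag;  /* Tag? bitmap: bit i set if word i holds a tag */
};

/* An empty set has both bitmaps cleared. */
static void amoeba_set_clear(struct amoeba_set *s) {
    s->valid = 0;
    s->is_tag = 0;
}
```

Because tags live in the same array as data, a set can hold many small blocks or a few large ones without a separate fixed-size tag array.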

Page 16: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Tag - Regions

[Diagram: Memory divided into Regions of RMAX bytes. A 64 bit address is split into the fields Region Tag, Set Index, and Byte Start / End. Figure labels: "3", "Top 3".]
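The address split can be sketched as follows. The concrete sizes are assumptions chosen for illustration (RMAX of 64 bytes, 8-byte words, 256 sets), and the field order (word offset lowest, then set index, then region tag) is the conventional one rather than something the slide states.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BYTES 8u    /* assumed word size */
#define RMAX_BYTES 64u   /* assumed region size */
#define NUM_SETS   256u  /* assumed number of sets */

struct addr_fields {
    uint64_t region_tag;  /* identifies the region among those mapping to the set */
    uint32_t set_index;   /* which set the region maps to */
    uint32_t word;        /* word offset inside the region, 0..7 */
};

/* Split a 64 bit address into region tag, set index and word offset. */
static struct addr_fields split_address(uint64_t addr) {
    struct addr_fields f;
    uint64_t region = addr / RMAX_BYTES;
    f.word = (uint32_t)((addr % RMAX_BYTES) / WORD_BYTES);
    f.set_index = (uint32_t)(region % NUM_SETS);
    f.region_tag = region / NUM_SETS;
    return f;
}
```

With these sizes a word offset needs 3 bits, so a block's Start and End each fit in a few bits of the tag word.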

Page 17: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Example

    struct TIE {
        long long X, Y, Z;
        long long V, H;
        long long data[3];
    } Imperial;

    Imperial.X = … ;

Miss

Invoke Spatial Granularity Predictor (PC/Region based)

Fetch: Tag X Y Z V

Page 18: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba Cache – Insert (8 words/set)

SRAM Array / Set (empty): Valid? 00000000, Tag? 00000000

Miss: Insert 4+1 words (Tag X Y Z V)

1. substring() search for 00000 gives Pos: 0

Page 19: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba Cache – Insert (8 words/set)

SRAM Array / Set before the insert: Valid? 00000000, Tag? 00000000

2. Update bitmaps: Valid? 11111000, Tag? 10000000

3. Refill: Tag X Y Z V
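The insert steps (find a free run of 4+1 words, mark the bitmaps, refill) can be modelled with two 8-bit bitmaps. This is a software sketch, not the hardware design: bit i stands for word i with word 0 in the least significant bit, so the slide's left-to-right pattern 11111000 corresponds to 0x1F here. `find_free_run` and `insert_block` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_SET 8

/* Return the first position with `len` consecutive free words, or -1 if none. */
static int find_free_run(uint8_t valid, int len) {
    assert(len >= 1 && len <= WORDS_PER_SET);
    for (int pos = 0; pos + len <= WORDS_PER_SET; pos++) {
        uint8_t mask = (uint8_t)(((1u << len) - 1u) << pos);
        if ((valid & mask) == 0)
            return pos;
    }
    return -1;
}

/* Claim 1 tag word plus `data_words` data words; return the tag position or -1.
 * `data_words` must be at most WORDS_PER_SET - 1. */
static int insert_block(uint8_t *valid, uint8_t *is_tag, int data_words) {
    int len = data_words + 1;
    int pos = find_free_run(*valid, len);
    if (pos < 0)
        return -1;  /* no room: the caller has to evict first */
    *valid |= (uint8_t)(((1u << len) - 1u) << pos);  /* mark tag + data words used */
    *is_tag |= (uint8_t)(1u << pos);                 /* first word holds the tag */
    return pos;
}
```

Starting from an empty set, inserting four data words returns position 0 and leaves five valid bits and one tag bit set, which matches the bitmaps shown on the slide.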

Page 20: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Example

    struct TIE {
        long long X, Y, Z;
        long long V, H;
        long long data[3];
    } Imperial;

    Imperial.Y = … ;

Lookup data from the cache

Memory layout: X Y Z V H Data[3]

Cached block: Tag X Y Z V

Page 21: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba Cache – Lookup (8 words/set)

Address fields: Region Tag | Set Index | Word (W)

SRAM Array / Set: Tag X Y Z V, with Tag? bitmap 10000000

1. The Tag? bitmap drives 2x1 multiplexers that select the tag words

2. Addr ∈ Tag: Region == Region Tag, Start ≤ W, End > W → Hit?

3. Word Selector → Output Buffer (Tag X Y Z V)

[Diagram label: Critical Path]
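The hit test on this slide (region match, Start ≤ W, End > W) can be written as a sequential loop over the words flagged in the Tag? bitmap. Hardware would evaluate the candidates in parallel; the loop, the `tag_word` struct, and the convention that a block's data words directly follow its tag are modelling assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_SET 8

struct tag_word {
    uint64_t region;  /* region tag */
    uint8_t  start;   /* first word held */
    uint8_t  end;     /* one past the last word held */
};

/* tags[pos] is meaningful only where bit `pos` of is_tag is set.
 * Returns the set position holding word `w` of `region`, or -1 on a miss. */
static int amoeba_lookup(const struct tag_word tags[WORDS_PER_SET], uint8_t is_tag,
                         uint64_t region, uint8_t w) {
    for (int pos = 0; pos < WORDS_PER_SET; pos++) {
        if (!(is_tag & (1u << pos)))
            continue;                           /* step 1: consider tag words only */
        const struct tag_word *t = &tags[pos];
        if (t->region == region && t->start <= w && t->end > w)  /* step 2: hit? */
            return pos + 1 + (w - t->start);    /* step 3: select the data word */
    }
    return -1;
}
```

For the block Tag X Y Z V stored at position 0 with Start 0 and End 4, a lookup of word 1 (Y) returns position 2, while word 4 (H) misses.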

Page 22: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Partial Miss: Identify Sub-Blocks (Step 1 of 2)

1. New ∩ Tags

2. Evict Overlap into the MSHR

Fetch New: Tag X Y Z V

Cached blocks: Tag X Y and Tag V H

Page 23: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Partial Miss: Insert New Block (Step 2 of 2)

3. Allocate 6 words: Tag X Y Z V H

4. Miss

5. Patch Missing ?'s: the MSHR holds X Y ? V H, and Z fills the gap

Occurs ≈ 5 in 1000 accesses
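The two partial-miss steps amount to merging the requested word range with any cached blocks it overlaps and working out which words still have to be fetched. The sketch below models only that range arithmetic; word numbers 0 to 4 standing for X, Y, Z, V, H and the names `range_mask` and `merge_partial` are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

struct range { uint8_t start; uint8_t end; };  /* words [start, end) of one region */

/* Bitmask with one bit per word covered by the range (bit 0 = word 0). */
static uint8_t range_mask(struct range r) {
    return (uint8_t)(((1u << (r.end - r.start)) - 1u) << r.start);
}

/* Merge `req` with every cached block it overlaps. Returns the merged range and
 * writes the words that are in the merged range but not yet cached to *missing. */
static struct range merge_partial(struct range req, const struct range *cached,
                                  int n, uint8_t *missing) {
    uint8_t have = 0;
    struct range out = req;
    for (int i = 0; i < n; i++) {
        if (range_mask(cached[i]) & range_mask(req)) {   /* New ∩ Tags */
            have |= range_mask(cached[i]);               /* overlap moves to the MSHR */
            if (cached[i].start < out.start) out.start = cached[i].start;
            if (cached[i].end > out.end) out.end = cached[i].end;
        }
    }
    *missing = (uint8_t)(range_mask(out) & ~have);       /* the "?" words to patch */
    return out;
}
```

With cached blocks X Y and V H and a request for X Y Z V, the merged range covers five data words (six with the tag) and only Z is missing, in line with the "Allocate 6 words" step.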

Page 24: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Hardware Overheads

[Diagram: SRAM Array with Valid? and Tag? metadata bitmaps; the critical path and the extra Amoeba critical path are marked. Figure labels: 1 KB; Latency +4%.]

Page 25: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Evaluation

• Parameters for latency and energy
• Workloads

Page 26: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Latency Parameters (cycles)

[Diagram: latency parameters for CPU, 64K L1 and 1M LLC under Fixed Granularity and Amoeba Cache. Values as extracted: 1, 3, 20, 300; Amoeba Cache: 1.04 (Latency +4%).]

Page 27: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

On-Chip Energy Parameters (pJ)

[Diagram: energy parameters for 64K L1 and 1M LLC under Fixed Granularity and Amoeba Cache. Values as extracted: 101, 230; ≈ 7 / word; 105, 238.]

Page 28: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Workloads

• 22 diverse workloads from
  - PARSEC
  - SPEC-CPU 2000 & 2006
  - DaCapo (Java Benchmarks)
  - Apache, Firefox and PostgreSQL

Page 29: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy


Results

Page 30: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

% Improvement in L1 Miss-Rate

[Figure: bar chart, 0% to 40%, for mcf, canneal, lbm, h2, jbb, apache, x264, firefox, tpcc, eclipse]

Reduces L1 and L2 miss rate by 18%

Page 31: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

% Improvement in L1 Miss-Bandwidth

[Figure: bar chart, -25% to 75%]

Reduces on-chip bandwidth by 46%

Reduces off-chip bandwidth by 38%

Page 32: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

% Improvement in memory energy

[Figure: bar chart, 0% to 40%, for mcf, canneal, lbm, h2, jbb, apache, x264, firefox, tpcc, eclipse]

Reduces energy by 11%

Page 33: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

% Improvement in execution time

[Figure: bar chart, 0% to 20% with one bar labelled 21%, for mcf, canneal, lbm, h2, jbb, apache, x264, firefox, tpcc, eclipse]

Improves performance by 10%

Page 34: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Results Summary: Amoeba-Cache

• Reduce cache pollution for applications with low cache utilization

• Improve performance for moderate cache utilization

• Maintain performance for high cache utilization workloads

• Save energy for streaming applications by keeping out unused words

Page 35: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Additional Results

• Lookup as an extra cache pipeline stage vs. throttling the CPU
  - For extra pipeline stage, 8 of 22 applications show improvement

• Spatial Granularity Predictor
  - Indexing: 18 of 22, address region better
  - Training: Evictions and First Touch
  - Table Size: 256 (PC) and 1024 (Region)

Page 36: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Additional Results

• Multicore Shared Cache
  - Reduces miss rate (avg 18%) and LLC miss bandwidth (16%-39%)

• Comparison against other designs
  - Fixed Granularity 2X
  - Sector Cache variants
  - Multi-$

Page 37: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba Cache

What?
  - Enable variable granularity data caching

Why?
  - Eliminate waste

How?
  - Unify tag and data into a single SRAM array
  - Afforded by recent technology trends

Where?
  - Definitely at the L2, possibly at the L1

Page 38: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy


Frequently Asked Questions

1. Multiple threads?

2. Compare against other designs

3. Spatial Pattern Predictor

4. Replacement Policy

Page 39: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Multicore Shared Cache

Mix                              Miss T1   Miss T2   Miss T3   Miss T4   BW (All)
jbb x2, tpc-c x2                 12.38%    12.38%    22.29%    22.37%    39.07%
Firefox x2, x264 x2               3.82%     3.61%    -2.44%     0.43%    15.71%
cactus, fluid., omnet., sopl.     1.01%     1.86%    22.38%     0.59%    18.62%
canneal, astar, ferret, milc      4.85%     2.75%    19.39%    -4.07%    17.77%

Page 40: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Comparison

[Table: Amoeba Cache vs. Multi-$ vs. Sector Variants, compared on: Impact on Miss-Rate, Impact on Bandwidth, Low tag overhead, Tradeoff data and tag space, Dynamically resize blocks. Cell values as extracted, without their row and column positions: Yes, Yes, ~, ~, No, Yes, No, No, No, No.]

Page 41: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Comparison – Moderate Group – 64K

[Figure: scatter plot of Bandwidth Ratio (0.4 to 1.0) against Miss Rate Ratio (1.0 to 1.6). Points: Sector (x: 2.9), Sector-Pre, Fixed-2X, Amoeba, Multi$-25, Multi$-50.]

Page 42: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Spatial Pattern Predictor

Predictor History Table

Index          Pattern
PC / Region    01011111
PC / Region    00011101

1. PC : Read Addr selects the pattern 0 0 0 1 1 1 0 1

2. Critical Word

Policy Miss vs Policy-Bandwidth

What to do when there is no entry?
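The slide does not define the two policies, so the sketch below is only one plausible reading: given a predicted word pattern and the critical word, a bandwidth-leaning choice fetches just the contiguous run of predicted words around the critical word, while a miss-leaning choice fetches the whole span from the first to the last predicted word. Bit i stands for word i (word 0 in the least significant bit), so the slide's pattern 0 0 0 1 1 1 0 1 becomes 0xB8. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct range { uint8_t start; uint8_t end; };  /* words [start, end) */

/* Bandwidth-leaning: contiguous predicted words around the critical word. */
static struct range predict_run(uint8_t pattern, uint8_t critical) {
    struct range r = { critical, (uint8_t)(critical + 1) };
    while (r.start > 0 && (pattern & (1u << (r.start - 1))))
        r.start--;
    while (r.end < 8 && (pattern & (1u << r.end)))
        r.end++;
    return r;
}

/* Miss-leaning: one range spanning every predicted word plus the critical word. */
static struct range predict_span(uint8_t pattern, uint8_t critical) {
    struct range r = { critical, (uint8_t)(critical + 1) };
    for (uint8_t w = 0; w < 8; w++) {
        if (pattern & (1u << w)) {
            if (w < r.start) r.start = w;
            if (w + 1 > r.end) r.end = (uint8_t)(w + 1);
        }
    }
    return r;
}
```

For pattern 0xB8 and critical word 4, the run is words 3 to 5 and the span is words 3 to 7; the span also brings in word 6, which the pattern predicts will not be used. With an all-zero pattern both functions fall back to fetching only the critical word.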

Page 43: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Predictor Training

Data Array

Index          Pattern
PC / Region    01011111
PC / Region    00011101

Add / update entry on evict
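Training on eviction can be sketched as writing the evicted block's touched-word bitmap into a history table keyed by PC or region. The direct-mapped 1024-entry table below is a simplification chosen for brevity and is an assumption; `train_on_evict` and `predict` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_ENTRIES 1024u  /* assumed size; direct-mapped for simplicity */

struct pht_entry {
    uint64_t key;      /* PC or region used as the index */
    uint8_t  pattern;  /* words touched before eviction (bit i = word i) */
    uint8_t  valid;
};

static struct pht_entry pht[TABLE_ENTRIES];

/* On eviction: add or update the entry with the observed touched-word bitmap. */
static void train_on_evict(uint64_t key, uint8_t touched) {
    struct pht_entry *e = &pht[key % TABLE_ENTRIES];
    e->key = key;
    e->pattern = touched;
    e->valid = 1;
}

/* On a miss: return 1 and write the stored pattern if the key has an entry. */
static int predict(uint64_t key, uint8_t *pattern) {
    const struct pht_entry *e = &pht[key % TABLE_ENTRIES];
    if (!e->valid || e->key != key)
        return 0;
    *pattern = e->pattern;
    return 1;
}
```

A return value of 0 from `predict` is the "no entry" case, which needs a separate default such as a fixed fetch size.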

Page 44: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Predictor – L1 Miss Rate (1 of 2)

[Figure: bar chart of MPKI (0 to 10) for canneal, eclipse, firefox, h2, tpc-c, x264. Legend: Aligned Finite Infinite Finite+FT History]

Page 45: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Predictor – L1 Miss Rate (2 of 2)

[Figure: bar chart of MPKI (0 to 140) for apache, lbm, mcf, jbb. Legend: Aligned Finite Infinite Finite+FT History]

Page 46: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Predictor – L1 Miss Bandwidth (1 of 2)

[Figure: bar chart of Bandwidth Rate (0 to 1800) for canneal, eclipse, firefox, h2, tpc-c, x264. Legend: Aligned Finite Infinite Finite+FT History]

Page 47: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Predictor – L1 Miss Bandwidth (2 of 2)

[Figure: bar chart of Bandwidth Rate (0 to 10000) for apache, lbm, mcf, jbb. Legend: Aligned Finite Infinite Finite+FT History]

Page 48: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Predictor – Summary

For the majority of applications: Region Predictor with
  - 1024 entry table
  - Table with 8 ways x 128 sets

PC Predictor is good for 5 applications
  - apache, art, mcf, lbm and omnetpp

Page 49: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Pseudo LRU Replacement

• Logically partition the set into N ways
• Pick a block at random from the way
• Unset the T? (Tag) and V? (Valid) bits

[Diagram: a set split into Way 0 and Way 1]
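The eviction step can be modelled on the two bitmaps as below. How the victim way is chosen is not shown on the slide, so the way is simply passed in; the 2-way split of an 8-word set, the `block_len` side array, and the use of `rand()` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define WORDS_PER_SET 8
#define NWAYS         2                        /* assumed logical partitioning */
#define WORDS_PER_WAY (WORDS_PER_SET / NWAYS)

/* Evict one randomly chosen block whose tag lies in `way`.
 * block_len[pos] is the length (tag + data words) of the block starting at pos.
 * Returns the evicted block's position, or -1 if the way holds no tag. */
static int evict_from_way(uint8_t *valid, uint8_t *is_tag,
                          const uint8_t block_len[WORDS_PER_SET], int way) {
    int candidates[WORDS_PER_WAY];
    int n = 0;
    for (int pos = way * WORDS_PER_WAY; pos < (way + 1) * WORDS_PER_WAY; pos++) {
        if (*is_tag & (1u << pos))
            candidates[n++] = pos;
    }
    if (n == 0)
        return -1;
    int pos = candidates[rand() % n];           /* pick a block at random */
    uint8_t mask = (uint8_t)(((1u << block_len[pos]) - 1u) << pos);
    *valid &= (uint8_t)~mask;                   /* unset the V? bits */
    *is_tag &= (uint8_t)~(1u << pos);           /* unset the T? bit */
    return pos;
}
```

Clearing the bits is all that is needed to free the words; the stale tag and data stay in the array until a later insert overwrites them.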

Page 50: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Access Distribution for L1

[Figure: stacked bars of Words Accessed (%), 0 to 100, split into 1-2 Words, 3-4 Words, 5-6 Words and 7-8 Words; word distribution for 64K L1. Workloads: apache, art, astar, cactus, can., eclipse, fac., ferret, firefox, fluid., freq., h2, jbb, lbm, mcf, milc, omnet., soplex, tpc-c, trade., twolf, x264, mean. Data labels as extracted: 45 20 39 79 30 80 77 82 49 62 55 38 40 32 29 81 33 21 53 73 29 46 50.]

Page 51: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Amoeba block size distribution for L1

[Figure: stacked bars of % of Amoeba Blocks, 0 to 100, split into 1-2 Words, 3-4 Words, 5-6 Words and 7-8 Words; block distribution for 64K L1. Workloads: apache, art, astar, cactus, can., eclipse, fac., ferret, firefox, fluid., freq., h2, jbb, lbm, mcf, milc, omnet., soplex, tpc-c, trade., twolf, x264, mean. Data labels as extracted: 92 80 98 100 67 98 88 99 78 100 94 82 89 89 93 100 83 91 91 97 70 91 90.]

Page 52: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy


L1 FSM

Page 53: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Miss-Rate (64K L1)

[Figure: bar chart, 0 to 80, Fixed vs. Amoeba, for mcf, canneal, lbm, h2, jbb, apache, x264, firefox, tpcc, eclipse]

Page 54: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Miss Bandwidth Rate (64K L1)

[Figure: bar chart, 0 to 10000, Fixed vs. Amoeba, for mcf, canneal, lbm, h2, jbb, apache, x264, firefox, tpcc, eclipse]

Page 55: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Energy Rate (L1 + LLC) – (nJ/KI)

[Figure: bar chart, 0 to 100, Fixed vs. Amoeba, for mcf, canneal, lbm, h2, jbb, apache, x264, firefox, tpcc, eclipse]

Page 56: Amoeba-Cache  Adaptive  Blocks for  Eliminating Waste  in the Memory Hierarchy

Reduction in execution time

[Figure: bar chart, 0 to 16000, Fixed vs. Amoeba]

