
Copyright 1998 UC, Irvine 1

Miss Stride Buffer

Department of Information and Computer Science

University of California, Irvine


Introduction

In this presentation, we present a new technique to eliminate conflict misses in the cache. We use a Miss History Buffer to record miss addresses. From the buffer we can calculate the miss stride to predict which addresses will miss again, and we then prefetch those addresses into the cache. Experiments show that this technique is very effective at eliminating conflict misses in some applications, and it incurs little increase in bandwidth.


Overview

• Importance of cache performance

• Techniques to reduce cache misses

• Our approach: Miss Stride Buffer

• Experiments

• Discussion


Importance of Cache Performance

• Disparity of processor and memory speed

• Cache misses
  – Compulsory
  – Capacity
  – Conflict

• Increasing cache miss penalty for faster machines


Techniques to Reduce Cache Misses

• All use some kind of prediction about the pattern of misses

• Victim cache

• Stream buffer

• Stride prefetch


Victim Cache

• Mainly used to eliminate conflict misses

• Prediction: the memory address of a cache line that is replaced is likely to be accessed again in the near future

• Scenarios where the prediction is effective: false sharing, ugly address mapping

• Architecture implementation: use an on-chip buffer to store the contents of recently replaced cache lines


Drawbacks of Victim Cache

• Ugly mappings can be rectified by a cache-aware compiler

• Because of the victim cache's small size, the probability of a memory address being reused within such a short period is very low

• Experiments show the victim cache is not effective


Stream Buffer

• Mainly used to eliminate compulsory/capacity misses

• Prediction: if a memory address misses, the consecutive address is likely to miss in the near future

• Scenario where the prediction is useful: stream access

• Architecture implementation: when an address misses, prefetch the consecutive addresses into an on-chip buffer. When there is a hit in the stream buffer, prefetch the address consecutive to the hit address.


Stream Cache

• Modification of the stream buffer

• Uses a separate cache to store stream data, to prevent cache pollution

• When there is a hit in the stream buffer, the hit address is sent to the stream cache instead of the L1 cache


Stride Prefetch

• Mainly used to eliminate compulsory/capacity misses

• Prediction: if a memory address misses, an address offset by some distance from the missed address is likely to miss in the near future

• Scenario where the prediction is useful: stride access

• Architecture implementation: when an address misses, prefetch the address offset by a distance from the missed address. When there is a hit in the buffer, also prefetch the address offset by that distance from the hit address.


Miss Stride Buffer

• Mainly used to eliminate conflict misses

• Prediction: if a memory address misses again after N other misses, that address is likely to miss again after N more misses

• Scenarios where the prediction is useful:
  – multiple loop nests
  – some variables or array elements are reused across iterations


Advantage over Victim Cache

• Eliminates conflict misses that even a cache-aware compiler cannot eliminate
  – Ugly mappings are few and can be rectified
  – Many more conflicts are random. From a probability perspective, a given memory address will conflict with other addresses after some time, but we cannot know at compile time which addresses it will conflict with.

• There can be a much longer period before the conflicting address is reused
  – beyond the reach of the victim cache's small size


Architecture Implementation

• Miss history buffer (MHB)
  – FIFO buffer that records recently missed memory addresses
  – Predicts only when there is a hit in the buffer
  – The miss stride can be calculated from the relative positions of consecutive misses of the same address
  – The size of the buffer determines the number of predictions

• Prefetch buffer (on-chip)
  – Stores the contents of prefetched memory addresses
  – The size of the buffer determines how much variation in the miss stride we can tolerate


Architecture Implementation

• Prefetch scheduler
  – Selects the right time to prefetch
  – Avoids collisions

• Prefetcher
  – Prefetches the contents of the miss address into the on-chip prefetch buffer


Experiment

• Application: Matrix Multiply

    #define N 257

    int main(void) {
        int i, j, k, sum, a[N][N], b[N][N], c[N][N];

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                b[i][j] = 1;
                c[i][j] = 1;
            }

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                sum = 0;
                for (k = 0; k < N; k++)
                    sum += b[i][k] * c[k][j];
                a[i][j] = sum;
            }

        return 0;
    }


Rates

[Figure: cache miss rate, MHB hit ratio, prefetch hit ratio, and miss elimination ratio (0 to 1) plotted against matrix size. Configuration: MHB of 4096 entries, prefetch buffer of 32 entries.]


Speedup vs. Bandwidth Increase

[Figure: speedup and bandwidth increase (0 to 0.14) for matrix sizes 63, 65, 127, 129, 233, 256, 297, 477, and 691. Miss penalty: 6 cycles.]


MHB Size vs. Prefetch Buffer Size

[Figure: prefetch hit rate (0.3 to 0.9) plotted against prefetch buffer size (1, 2, 4, 8, 16, 32, 64 entries), with one curve per MHB size (256, 512, 1024, 2048, 3072, and 4096 entries).]


Memory History Buffer Hit Rate

[Figure: MHB hit rate plotted against buffer size (256, 512, 1024, 2048, 3072, and 4096 entries); observed hit rates are 0.274195, 0.265347, 0.262993, and 0.261906 for the smaller buffers, rising to 0.955387 and 0.968175 for the two largest.]


Smooth Ugly Mapping

[Figure: cache miss rate and miss rate after prefetch (0 to 0.1) for matrix sizes 127, 128, 129, 255, 256, and 257.]


Discussion

• The effectiveness depends on the hit ratio in the MHB

• Can be combined with blocking to increase the hit ratio in the MHB

• Can be used with a victim cache
  – long-term vs. short-term memory address reuse

• Can be used with other miss elimination techniques
  – decreases the number of misses seen by the MHB, equivalent to increasing the size of the MHB
  – more accurate prediction


Discussion

• Reconfiguration
  – The miss stride prefetch buffer, victim cache, and stream buffer share the same big buffer, which is dynamically partitioned
  – Use a conflict counter to recognize the recent cache miss pattern: conflict-dominant or not

