Copyright 1998 UC, Irvine
Miss Stride Buffer
Department of Information and Computer Science
University of California, Irvine
Introduction
In this presentation we describe a new technique for eliminating conflict misses in caches. A Miss History Buffer (MHB) records miss addresses; from the buffer we calculate the miss stride and predict which address will miss again, then prefetch that address into the cache. Experiments show the technique is very effective at eliminating conflict misses in some applications, and it incurs little increase in bandwidth.
Overview
• Importance of cache performance
• Techniques to reduce cache misses
• Our approach: Miss Stride Buffer
• Experiments
• Discussion
Importance of Cache Performance
• Disparity between processor and memory speed
• Cache misses
  – Compulsory
  – Capacity
  – Conflict
• Increasing cache miss penalty on faster machines
Techniques to Reduce Cache Misses
• All use some kind of prediction about the pattern of misses
• Victim cache
• Stream buffer
• Stride prefetch
Victim CacheVictim Cache
• Mainly used to eliminate conflict missMainly used to eliminate conflict miss
• Prediction: the memory address of a cache line Prediction: the memory address of a cache line that is replaced is likely to be accessed again in that is replaced is likely to be accessed again in near futurenear future
• Scenario for prediction to be effective: false Scenario for prediction to be effective: false sharing, ugly address mappingsharing, ugly address mapping
• Architecture implementation: use a on-chip buffer Architecture implementation: use a on-chip buffer to store the contents of recently replaced cache to store the contents of recently replaced cache lineline
Drawback of Victim CacheDrawback of Victim Cache
• Ugly mapping can be rectified by cache Ugly mapping can be rectified by cache aware compileraware compiler
• Small size of victim cache, probability of Small size of victim cache, probability of memory address reuse within short period is memory address reuse within short period is very low.very low.
• Experiment shows victim cache is not Experiment shows victim cache is not effective effective
Stream BufferStream Buffer
• Mainly used to eliminate compulsory/capacity Mainly used to eliminate compulsory/capacity missesmisses
• Prediction: if a memory address is missed, the Prediction: if a memory address is missed, the consecutive address is likely to be missed in near consecutive address is likely to be missed in near futurefuture
• Scenario for prediction to be useful: stream accessScenario for prediction to be useful: stream access
• Architecture implementation: when an address Architecture implementation: when an address miss, prefetch consecutive address into on-chip miss, prefetch consecutive address into on-chip buffer. When there is a hit in stream buffer, buffer. When there is a hit in stream buffer, prefetch the consecutive address of the hit prefetch the consecutive address of the hit address.address.
Stream CacheStream Cache
• Modification of stream bufferModification of stream buffer
• Use a separate cache to store stream data to Use a separate cache to store stream data to prevent cache pollution prevent cache pollution
• When there is a hit in stream buffer, the hit When there is a hit in stream buffer, the hit address is sent to stream cache instead of L1 address is sent to stream cache instead of L1 cachecache
Stride PrefetchStride Prefetch
• Mainly used to eliminate compulsory/capacity missMainly used to eliminate compulsory/capacity miss
• Prediction: if a memory address is missed, an Prediction: if a memory address is missed, an address that is offset by a distance from the missed address that is offset by a distance from the missed address is likely to be missed in near futureaddress is likely to be missed in near future
• Scenario for prediction to be useful: stride accessScenario for prediction to be useful: stride access
• Architecture implementation: when an address Architecture implementation: when an address miss, prefetch address that is offset by a distance miss, prefetch address that is offset by a distance from the missed address. When there is a hit in from the missed address. When there is a hit in buffer, also prefetch the address that is offset by a buffer, also prefetch the address that is offset by a distance from the hit address.distance from the hit address.
Miss Stride BufferMiss Stride Buffer
• Mainly used to eliminate conflict missMainly used to eliminate conflict miss
• Prediction: if a memory address miss again Prediction: if a memory address miss again after after N N other misses, the memory address is other misses, the memory address is likely to miss again after likely to miss again after NN other misses other misses
• Scenario for the prediction to be usefulScenario for the prediction to be useful– multiple loop nestsmultiple loop nests
– some variables or array elements are reused some variables or array elements are reused across iterationsacross iterations
Advantage over Victim CacheAdvantage over Victim Cache
• Eliminate conflict miss that even cache Eliminate conflict miss that even cache aware compiler can not eliminateaware compiler can not eliminate– Ugly mappings are fewer and can be rectifiedUgly mappings are fewer and can be rectified
– Much more conflicts are random. From probability Much more conflicts are random. From probability perspective, a certain memory address will conflict perspective, a certain memory address will conflict with other addresses after some time, but we can with other addresses after some time, but we can not know at compile time which address it will not know at compile time which address it will conflict.conflict.
• There can be a much longer period before There can be a much longer period before the conflict address is reusedthe conflict address is reused– Victim cache’s small sizeVictim cache’s small size
Architecture ImplementationArchitecture Implementation
• Memory history bufferMemory history buffer– FIFO buffer to record recently missed memory FIFO buffer to record recently missed memory
addressaddress
– Predict only when there is a hit in the bufferPredict only when there is a hit in the buffer
– Miss stride can be calculated by the relative Miss stride can be calculated by the relative position of consecutive miss for the same addressposition of consecutive miss for the same address
– The size of the buffer determines the number of The size of the buffer determines the number of predictionspredictions
• Prefetch buffer (On-chip)Prefetch buffer (On-chip)– Store the contents of prefetched memory addressStore the contents of prefetched memory address
– The size of the buffer determines how much we can The size of the buffer determines how much we can tolerate the variation of miss stridetolerate the variation of miss stride
Architecture ImplementationArchitecture Implementation
• Prefetch schedulerPrefetch scheduler– Select a right time to prefetchSelect a right time to prefetch
– Avoid collisionAvoid collision
• PrefetcherPrefetcher– prefetch the contents of miss address into on-chip prefetch the contents of miss address into on-chip
prefetch bufferprefetch buffer
ExperimentExperiment
• Application: Matrix MultiplyApplication: Matrix Multiply#define N 257main() {
int i, j, k, sum, a[N][N], b[N][N], c[N][N];
for ( i=0; i<N; i++ ) for ( j=0; j<N; j++ ) {
b[i][j] = 1;c[i][j] = 1;
}for ( i=0; i<N; i++ )
for ( j=0; j<N; j++ ) {sum = 0;for ( k=0; k<N; k++ ) {
sum += b[i][k]+c[k][j]; }a[i][j] = sum;
}}
Rates
[Figure: cache miss rate, MHB hit ratio, prefetch hit ratio, and miss elimination ratio vs. matrix size. Configuration: MHB, 4096 entries; prefetch buffer, 32 entries.]
Speedup vs. Bandwidth Increase
[Figure: speedup and bandwidth increase for matrix sizes 63, 65, 127, 129, 233, 256, 297, 477, and 691. Miss penalty: 6 cycles.]
MHB Size vs. Prefetch Buffer Size
[Figure: prefetch hit rate vs. prefetch buffer size (1 to 64 entries) for MHB sizes of 256, 512, 1024, 2048, 3072, and 4096 entries.]
Memory History Buffer Hit Rate
[Figure: MHB hit rate vs. buffer size (256 to 4096 entries); hit rates range from roughly 0.26 to 0.97.]
Smooth Ugly Mapping
[Figure: cache miss rate before and after prefetching for matrix sizes 127, 128, 129, 255, 256, and 257.]
DiscussionDiscussion
• The effectiveness depends on the hit ratio in The effectiveness depends on the hit ratio in MHBMHB
• Combined with blocking to increase the hit Combined with blocking to increase the hit ratio in MHBratio in MHB
• Used with victim cacheUsed with victim cache– long time vs.. short time memory address reuselong time vs.. short time memory address reuse
• Used with other miss elimination techniquesUsed with other miss elimination techniques– decrease the number of miss seen by MHB, decrease the number of miss seen by MHB,
equivalent to increase the size of MHBequivalent to increase the size of MHB
– More accurate predictionMore accurate prediction
Discussion
• Reconfiguration
  – the miss stride prefetch buffer, victim cache, and stream buffer share one large buffer, partitioned dynamically
  – a conflict counter recognizes the recent cache miss pattern (conflict-dominant or not)