Copyright 1998 UC, Irvine
Miss Stride Buffer
Department of Information and Computer Science
University of California, Irvine
Introduction
In this presentation we describe a new technique for eliminating conflict misses in caches. A Miss History Buffer (MHB) records miss addresses; from the buffer we calculate the miss stride and predict which address will miss again, then prefetch that address into the cache. Experiments show the technique is very effective at eliminating conflict misses in some applications, and it incurs little increase in bandwidth.
Overview
• Importance of cache performance
• Techniques to reduce cache misses
• Our approach: Miss Stride Buffer
• Experiments
• Discussion
Importance of Cache Performance
• Disparity between processor and memory speed
• Cache misses
  – Compulsory
  – Capacity
  – Conflict
• Increasing cache miss penalty on faster machines
Techniques to Reduce Cache Misses
• All use some kind of prediction about the pattern of misses
• Victim cache
• Stream buffer
• Stride prefetch
Victim CacheVictim Cache
• Mainly used to eliminate conflict missMainly used to eliminate conflict miss
• Prediction: the memory address of a cache line Prediction: the memory address of a cache line that is replaced is likely to be accessed again in that is replaced is likely to be accessed again in near futurenear future
• Scenario for prediction to be effective: false Scenario for prediction to be effective: false sharing, ugly address mappingsharing, ugly address mapping
• Architecture implementation: use a on-chip buffer Architecture implementation: use a on-chip buffer to store the contents of recently replaced cache to store the contents of recently replaced cache lineline
Drawback of Victim CacheDrawback of Victim Cache
• Ugly mapping can be rectified by cache Ugly mapping can be rectified by cache aware compileraware compiler
• Small size of victim cache, probability of Small size of victim cache, probability of memory address reuse within short period is memory address reuse within short period is very low.very low.
• Experiment shows victim cache is not Experiment shows victim cache is not effective effective
Stream BufferStream Buffer
• Mainly used to eliminate compulsory/capacity Mainly used to eliminate compulsory/capacity missesmisses
• Prediction: if a memory address is missed, the Prediction: if a memory address is missed, the consecutive address is likely to be missed in near consecutive address is likely to be missed in near futurefuture
• Scenario for prediction to be useful: stream accessScenario for prediction to be useful: stream access
• Architecture implementation: when an address Architecture implementation: when an address miss, prefetch consecutive address into on-chip miss, prefetch consecutive address into on-chip buffer. When there is a hit in stream buffer, buffer. When there is a hit in stream buffer, prefetch the consecutive address of the hit prefetch the consecutive address of the hit address.address.
Stream CacheStream Cache
• Modification of stream bufferModification of stream buffer
• Use a separate cache to store stream data to Use a separate cache to store stream data to prevent cache pollution prevent cache pollution
• When there is a hit in stream buffer, the hit When there is a hit in stream buffer, the hit address is sent to stream cache instead of L1 address is sent to stream cache instead of L1 cachecache
Stride PrefetchStride Prefetch
• Mainly used to eliminate compulsory/capacity missMainly used to eliminate compulsory/capacity miss
• Prediction: if a memory address is missed, an Prediction: if a memory address is missed, an address that is offset by a distance from the missed address that is offset by a distance from the missed address is likely to be missed in near futureaddress is likely to be missed in near future
• Scenario for prediction to be useful: stride accessScenario for prediction to be useful: stride access
• Architecture implementation: when an address Architecture implementation: when an address miss, prefetch address that is offset by a distance miss, prefetch address that is offset by a distance from the missed address. When there is a hit in from the missed address. When there is a hit in buffer, also prefetch the address that is offset by a buffer, also prefetch the address that is offset by a distance from the hit address.distance from the hit address.
Miss Stride BufferMiss Stride Buffer
• Mainly used to eliminate conflict missMainly used to eliminate conflict miss
• Prediction: if a memory address miss again Prediction: if a memory address miss again after after N N other misses, the memory address is other misses, the memory address is likely to miss again after likely to miss again after NN other misses other misses
• Scenario for the prediction to be usefulScenario for the prediction to be useful– multiple loop nestsmultiple loop nests
– some variables or array elements are reused some variables or array elements are reused across iterationsacross iterations
Advantage over Victim CacheAdvantage over Victim Cache
• Eliminate conflict miss that even cache Eliminate conflict miss that even cache aware compiler can not eliminateaware compiler can not eliminate– Ugly mappings are fewer and can be rectifiedUgly mappings are fewer and can be rectified
– Much more conflicts are random. From probability Much more conflicts are random. From probability perspective, a certain memory address will conflict perspective, a certain memory address will conflict with other addresses after some time, but we can with other addresses after some time, but we can not know at compile time which address it will not know at compile time which address it will conflict.conflict.
• There can be a much longer period before There can be a much longer period before the conflict address is reusedthe conflict address is reused– Victim cache’s small sizeVictim cache’s small size
Architecture ImplementationArchitecture Implementation
• Memory history bufferMemory history buffer– FIFO buffer to record recently missed memory FIFO buffer to record recently missed memory
addressaddress
– Predict only when there is a hit in the bufferPredict only when there is a hit in the buffer
– Miss stride can be calculated by the relative Miss stride can be calculated by the relative position of consecutive miss for the same addressposition of consecutive miss for the same address
– The size of the buffer determines the number of The size of the buffer determines the number of predictionspredictions
• Prefetch buffer (On-chip)Prefetch buffer (On-chip)– Store the contents of prefetched memory addressStore the contents of prefetched memory address
– The size of the buffer determines how much we can The size of the buffer determines how much we can tolerate the variation of miss stridetolerate the variation of miss stride
Architecture ImplementationArchitecture Implementation
• Prefetch schedulerPrefetch scheduler– Select a right time to prefetchSelect a right time to prefetch
– Avoid collisionAvoid collision
• PrefetcherPrefetcher– prefetch the contents of miss address into on-chip prefetch the contents of miss address into on-chip
prefetch bufferprefetch buffer
ExperimentExperiment
• Application: Matrix MultiplyApplication: Matrix Multiply#define N 257main() {
int i, j, k, sum, a[N][N], b[N][N], c[N][N];
for ( i=0; i<N; i++ ) for ( j=0; j<N; j++ ) {
b[i][j] = 1;c[i][j] = 1;
}for ( i=0; i<N; i++ )
for ( j=0; j<N; j++ ) {sum = 0;for ( k=0; k<N; k++ ) {
sum += b[i][k]+c[k][j]; }a[i][j] = sum;
}}
Rates
[Figure: cache miss rate, MHB hit ratio, prefetch hit ratio, and miss elimination ratio vs. matrix size. Configuration: MHB, 4096 entries; prefetch buffer, 32 entries.]
Speedup vs. Bandwidth Increase
[Figure: speedup and bandwidth increase for matrix sizes 63, 65, 127, 129, 233, 256, 297, 477, and 691. Miss penalty: 6 cycles.]
MHB Size vs. Prefetch Buffer Size
[Figure: prefetch hit rate vs. prefetch buffer size (1 to 64 entries) for MHB sizes of 256, 512, 1024, 2048, 3072, and 4096 entries.]
Memory History Buffer Hit Rate
[Figure: MHB hit rate vs. buffer size (256 to 4096 entries); hit rates range from roughly 0.26 to 0.97.]
Smooth Ugly Mapping
[Figure: cache miss rate before and after prefetching for matrix sizes 127, 128, 129, 255, 256, and 257.]
DiscussionDiscussion
• The effectiveness depends on the hit ratio in The effectiveness depends on the hit ratio in MHBMHB
• Combined with blocking to increase the hit Combined with blocking to increase the hit ratio in MHBratio in MHB
• Used with victim cacheUsed with victim cache– long time vs.. short time memory address reuselong time vs.. short time memory address reuse
• Used with other miss elimination techniquesUsed with other miss elimination techniques– decrease the number of miss seen by MHB, decrease the number of miss seen by MHB,
equivalent to increase the size of MHBequivalent to increase the size of MHB
– More accurate predictionMore accurate prediction
Discussion
• Reconfiguration
  – the miss stride prefetch buffer, victim cache, and stream buffer share one large buffer, partitioned dynamically
  – a conflict counter recognizes the recent cache miss pattern (conflict-dominant or not)