SHiP: Signature-based Hit Predictor for High Performance Caching
*Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh, *Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer
*Princeton University #Intel Corporation, VSSAD #,+MIT
IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)
Motivation
• Factors making caching important
  • Increasing ratio of CPU speed to memory speed
  • Multi-core poses challenges for better shared cache management
• LRU has been the standard LLC replacement policy
  • However, LRU has problems!
Problems with LRU Replacement
• Working set larger than the cache causes thrashing
• References to non-temporal data (scans) discard the frequently referenced working set
• Scans occur frequently in commercial workloads
[Figure: hit/miss timelines for the two access patterns, annotated with working-set size (Wsize) and LLC size (LLCsize)]
Desired Behavior from Cache Replacement
• Working set larger than the cache: preserve some of the working set in the cache [ DIP (ISCA’07), DRRIP (ISCA’10) achieve this effect ]
• Recurring scans: preserve the frequently referenced working set in the cache [ SRRIP (ISCA’10) achieves this effect ]
[Figure: desired hit/miss timelines for the two access patterns, annotated with Wsize and LLCsize]
Dynamic Re-Reference Interval Prediction ( DRRIP )
[ Jaleel et al., ISCA’10 ]
• Re-reference predictions: 0: immediate, 1: intermediate, 2: far, 3: distant
• SRRIP insertion: Scan-Resistant
• BRRIP insertion: Thrash-Resistant
[Figure: state diagram over the four predictions with insertion, re-reference, “No Victim”, and eviction transitions]
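The RRIP state transitions on this slide can be sketched for a single cache set. This is a minimal sketch of SRRIP only, assuming 2-bit predictions, promotion to 0 (immediate) on a re-reference, insertion at 2 (far), and aging of every line when no line is at 3 (distant). The class and method names are illustrative and do not come from the slides.

```python
IMMEDIATE, FAR, DISTANT = 0, 2, 3


class SrripSet:
    """One cache set with a 2-bit re-reference prediction per way."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [DISTANT] * ways  # empty ways are immediate victims

    def access(self, tag):
        """Return True on a hit, False on a miss (after filling the line)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = IMMEDIATE  # re-reference
            return True
        victim = self._find_victim()
        self.tags[victim] = tag
        self.rrpv[victim] = FAR  # SRRIP insertion
        return False

    def _find_victim(self):
        while True:
            for way, value in enumerate(self.rrpv):
                if value == DISTANT:
                    return way  # eviction
            # "No Victim": age every line and search again
            self.rrpv = [value + 1 for value in self.rrpv]
```

In a 4-way set, a line that was re-referenced once survives a scan of four new lines, because the scanned lines are inserted at far and reach distant before it does.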
SRRIP Not Always Scan Resistant…
• LONG scans in access pattern
[Figure: access timeline with a “short” scan and a “long” scan, annotated with hit/miss outcomes]
• Active working-set MUST be RE-REFERENCED at least ONCE between scans
Can We Be More Intelligent in Dealing with Scans?
Closer Look at Scan Access Patterns
[Figure: scan access pattern, with lines labeled “Future Reference” or “No Future References”]
Assuming Perfect Knowledge of Re-Reference Pattern
Improving RRIP on Cache Insertions
[Figure: RRIP state diagram (0: immediate, 1: intermediate, 2: far, 3: distant) with re-reference, “No Victim”, and eviction transitions; the insertion step for scans is marked “Improve Insertion”]
Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion
Focus of this Paper…
• Goal: Learn re-reference interval of a cache line
[Figure: cache access → PREDICTOR → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
How Best to Learn the Re-Reference Interval?
Learning Re-Reference Behavior
[Figure: scan access pattern; groups of accesses are labeled “REFERENCE SAME MEMORY REGION” and “REFERENCED BY SIMILAR SET OF PCs”]
Can We Learn Re-References By Correlating Accesses With Some Other Information?
Using Signatures to Correlate Re-Reference
• Different types of information:
  • Memory Region
  • Memory Instruction PC
  • Instruction Sequence
• Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns
OBSERVE, LEARN and PREDICT Re-Reference Pattern of a Signature
Observe Signature Re-Reference Behavior
• Observe re-reference pattern in the baseline cache
• Hardware Required:
  • Was line re-referenced after cache insertion ( 1-bit )
  • “Signature” responsible for cache insertion ( 14-bits )
[Figure: LLC accessed with Load/Store Address and Signature; each line holds Cache Tag, Replacement State, and Coherence State, plus added metadata: reuse bit and signature_insert]
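The per-line bookkeeping on this slide can be sketched as follows. This is an illustrative sketch only: the slides give the signature width (14 bits) but not how a signature is computed, so the XOR-folding hash in `pc_signature` is an assumption, and all class and function names are hypothetical.

```python
from dataclasses import dataclass

SIGNATURE_BITS = 14  # signature width stated on the slide
SIGNATURE_MASK = (1 << SIGNATURE_BITS) - 1


def pc_signature(pc: int) -> int:
    """Fold a non-negative program counter into 14 bits (assumed hash)."""
    sig = 0
    while pc:
        sig ^= pc & SIGNATURE_MASK
        pc >>= SIGNATURE_BITS
    return sig


@dataclass
class LineMetadata:
    signature_insert: int  # signature responsible for the insertion
    reuse: bool = False    # was the line re-referenced after insertion?


def on_insert(pc: int) -> LineMetadata:
    """Record the inserting signature; the line has not been reused yet."""
    return LineMetadata(signature_insert=pc_signature(pc))


def on_hit(line: LineMetadata) -> None:
    """Any hit after insertion sets the reuse bit."""
    line.reuse = True
```

The metadata is only written on insertion and on hits, and is only read when the line is evicted.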
Learn Signature Re-Reference Behavior
• Learn signature re-reference behavior
• Hardware Required:
  • Signature History Counter Table (SHCT) ( 16K, 2-bit counters )
• SHCT Training:
  • If evicted line reused: SHCT [ signature_insert ] ++
  • If evicted line NOT reused: SHCT [ signature_insert ] --
• SHCT counter = 0: signature NOT re-referenced
• SHCT counter != 0: signature re-referenced
[Figure: Last Level Cache (LLC) training the SHCT]
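The training rule on this slide can be sketched as a small table of counters. This is a minimal sketch that follows the rule as the slide states it, with both updates applied at eviction. The saturating behavior of the 2-bit counters and the initial counter value are assumptions, and the class and method names are illustrative.

```python
SHCT_ENTRIES = 16 * 1024  # 16K counters, as on the slide
COUNTER_MAX = 3           # largest value of a 2-bit counter


class SHCT:
    """Signature History Counter Table with saturating 2-bit counters."""

    def __init__(self, initial=1):
        # The starting value is an assumption; the slide does not give one.
        self.counters = [initial] * SHCT_ENTRIES

    def train_on_eviction(self, signature_insert, reused):
        """Update the counter of the signature that inserted the evicted line."""
        index = signature_insert % SHCT_ENTRIES
        if reused:
            self.counters[index] = min(self.counters[index] + 1, COUNTER_MAX)
        else:
            self.counters[index] = max(self.counters[index] - 1, 0)

    def is_rereferenced(self, signature):
        """A non-zero counter means lines from this signature tend to be reused."""
        return self.counters[signature % SHCT_ENTRIES] != 0
```

Because a 14-bit signature has exactly 16K possible values, each signature maps to its own counter in this sketch.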
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval of line using SHCT
[Figure: cache hit/miss and signature → SHiP (SHCT) → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval using SHCT on CACHE MISS
[Figure: cache miss and signature → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
if ( SHCT [ signature ] == 0 )
    predict DISTANT (i.e. 3)
else
    predict FAR (i.e. 2)
SHiP Re-Reference Predictions On Miss
Signature-based Hit Predictor (SHiP)
• Predict re-reference interval on CACHE HIT
  • Always predict IMMEDIATE (i.e. 0)
[Figure: cache hit and signature → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
SHiP Re-Reference Predictions On Hit
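The miss and hit predictions can be written as two small functions. This is a sketch under the assumption that the SHCT is any mapping or list indexed by signature; the function names are illustrative and not from the slides.

```python
IMMEDIATE, INTERMEDIATE, FAR, DISTANT = 0, 1, 2, 3


def predict_on_miss(shct, signature):
    """Insertion prediction: DISTANT if the signature's counter is zero."""
    return DISTANT if shct[signature] == 0 else FAR


def predict_on_hit():
    """A hit always predicts IMMEDIATE re-reference."""
    return IMMEDIATE


# Hypothetical table contents: signature 0x12 was never reused, 0x34 was.
example_shct = {0x12: 0, 0x34: 2}
print(predict_on_miss(example_shct, 0x12))  # 3 (distant)
print(predict_on_miss(example_shct, 0x34))  # 2 (far)
```

Only the insertion prediction depends on the signature. A line that actually hits is promoted to immediate regardless of which signature inserted it.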
SHiP – High Level Architectural Overview
[Figure: the Last Level Cache (LLC) takes Access Type, Address, data, and Signature as inputs; its hit/miss outcome, signature_insert, and reuse_bit drive SHCT Training; SHiP uses the SHCT and the LLC hit/miss to produce the Re-Reference Prediction]
• Per-Line Overhead Can Be Reduced by using Set Sampling ( need only 32 - 64 sets )
[Figure: the same overview, annotated “~6 KB” and “NO CHANGE”]
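One plausible accounting for the “~6 KB” annotation is sketched below. The slides do not break this figure down, so this is a reconstruction under stated assumptions: the SHCT holds 16K 2-bit counters, only the sampled sets carry the 14-bit signature_insert and the 1-bit reuse bit, the cache is 16-way, and 64 sets are sampled (the upper end of the stated 32 - 64 range). The function name is illustrative.

```python
SHCT_ENTRIES = 16 * 1024  # 16K counters (slide)
COUNTER_BITS = 2          # 2-bit counters (slide)
SIGNATURE_BITS = 14       # signature_insert width (slide)
REUSE_BITS = 1            # reuse bit (slide)
WAYS = 16                 # 16-way cache (slide)


def ship_storage_bytes(sampled_sets):
    """SHCT storage plus per-line metadata in the sampled sets, in bytes."""
    shct_bits = SHCT_ENTRIES * COUNTER_BITS
    per_line_bits = SIGNATURE_BITS + REUSE_BITS
    sampled_bits = sampled_sets * WAYS * per_line_bits
    return (shct_bits + sampled_bits) // 8


print(ship_storage_bytes(64))  # 6016 bytes, about 5.9 KB
print(ship_storage_bytes(32))  # 5056 bytes, about 4.9 KB
```

Under these assumptions the SHCT accounts for 4 KB and the sampled-set metadata for roughly 1 to 2 KB, which is consistent with the annotated total.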
Performance Comparison of Replacement Policies
SHiP Significantly Improves Performance Across All Workload Categories
[Figure: Performance Relative to LRU (y-axis, 1.00 to 1.15) for SRRIP, DRRIP, and SHiP-PC across Mm./Games, Server, SPEC2K6, and All; 16-way 2MB LLC, Core i7 Type Hierarchy]
Performance Comparison of Replacement Policies
CRC Results Comparison
[Figure: Performance Relative to LRU (y-axis, 1.00 to 1.15) for SRRIP, DRRIP, Seg-LRU, SDBP, and SHiP-PC, in two panels: 16-way 1MB Private Cache, 65 Single-Threaded Workloads; and 16-way 4MB Shared Cache, 165 4-core Workloads. Averaged Across PC Games, Multimedia, Enterprise Server, SPEC CPU2006 Workloads]
SHiP Delivers 2X the Performance Improvement of Prior State-of-the-Art Policies
Total Storage Overhead (16-way Set Associative Cache)
• LRU: 4-bits / cache block
• Pseudo-LRU: 1-bit / cache block
• RRIP: [ ISCA’10 ] 2-bits / cache block
• Seg-LRU: [ CRC’10 ] ~8-bits / cache block
• SDBP: [ MICRO’10 ] ~10-bits / cache block
• SHiP: [ MICRO’11 ] ~5-bits / cache block
SHiP Outperforms State-of-the-Art with HW Similar to LRU
Summary
• Scan-resistance is an important problem in commercial workloads
  • State-of-the-art policies do not fully address scan-resistance
• Signatures help improve re-reference predictions to address scans
  • Need fine-grained re-reference predictions at insertion
• Proposed a Simple and Practical Scan-Resistant Replacement
  • SHiP significantly outperforms winner of CRC Championship
  • SHiP requires less storage than CRC winner
  • HW overhead of SHiP is comparable to LRU
Q&A
Re-Reference Interval Prediction ( RRIP )
[Figure: RRIP state diagram (0: immediate, 1: intermediate, 2: far, 3: distant) with insertion, re-reference, “No Victim”, and eviction transitions; the insertion policy is labeled Scan-Resistant]
CAN INSERTION BE MORE INTELLIGENT?
Using Signatures to Correlate Re-Reference Behavior
[Figure: scan access pattern; accesses are labeled with signatures (a, b, c, d) and marked “Future Cache Hits” or “No Future Cache Hits”]
• Example Signatures: Memory Region, Program Counter, Instruction Decode History
LRU vs. Re-Reference Interval Prediction (RRIP)
[Figure: an 8-way set (Physical Way # 0 to 7) shown twice: under LRU, each Cache Tag carries an “LRU Chain” position; under RRIP, each Cache Tag carries a Re-Reference Prediction]
RRIP Outperforms LRU with Storage Less Than LRU
Signature-based Hit Predictor (SHiP)
• Goal: Predict the re-reference behavior of a signature
• Learn Re-Reference Behavior:
[Figure: LLC with inputs Access Type, Address, data, and Signature, and output hit/miss]