Insertion Policy Selection Using Decision Tree Analysis
Samira Khan, Daniel A. Jiménez
University of Texas at San Antonio
Motivation
L1 and L2 caches filter the accesses that reach the last-level cache (LLC)
The LLC therefore sees little temporal locality
A large fraction of the blocks brought into the cache are never accessed again (zero reuse lines)
For the SPEC CPU 2006 benchmarks, on average 60.18% of lines are never accessed again while they are in the LLC
Motivation
No cache bursts in the LLC
Only a small portion of hits occur near the MRU position
Goal
Get rid of zero reuse lines as early as possible
Keep lines in the cache long enough to receive their first hit
Make minimal changes to the LRU policy
Use as little space as possible
Insertion Position Selection
Find the optimal insertion position
  Zero reuse lines get evicted earlier
  Most non-zero reuse lines should stay in the cache until their first hit
This gets rid of zero reuse lines and makes space for useful lines
Use decision tree analysis via set dueling to find the position
  This allows choosing which insertion positions to set duel
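The effect of the insertion position on zero reuse lines can be illustrated with a minimal sketch. It assumes a single cache set modeled as a recency stack (index 0 is MRU, the last index is LRU); the class `RecencySet`, the helper `hits_after_scan`, and the 4-way toy configuration are hypothetical names and parameters chosen for illustration, not taken from the slides.

```python
class RecencySet:
    """One cache set kept as a recency stack: index 0 is MRU, last index is LRU."""

    def __init__(self, ways=16):
        self.ways = ways
        self.stack = []  # block tags, MRU first

    def access(self, tag, insert_pos):
        """Return True on a hit. On a miss, insert the tag at insert_pos."""
        if tag in self.stack:
            # Hit: promote to MRU exactly as plain LRU does.
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the line in the LRU position
        # Only the insertion position differs from plain LRU (assumption of this sketch).
        self.stack.insert(min(insert_pos, len(self.stack)), tag)
        return False


def hits_after_scan(insert_pos, ways=4, scan_len=6):
    """Touch a and b, stream scan_len zero reuse lines, then count hits on a and b."""
    s = RecencySet(ways)
    s.access("a", 0)
    s.access("b", 0)
    for i in range(scan_len):
        s.access(f"x{i}", insert_pos)
    return sum(s.access(t, 0) for t in ("a", "b"))
```

In this toy run, inserting the scan lines at MRU (`hits_after_scan(0)`) evicts both useful lines, while inserting them at LRU (`hits_after_scan(3)`) keeps both. Real workloads fall between these extremes, which is why the position is chosen adaptively.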
[Figure: decision tree of set duels over the recency stack positions LRU, nearLRU, middle, nearMRU, MRU]
Set dueling between middle and MRU pos
  middle pos winner: set dueling between LRU and middle pos
    LRU pos winner: insert pos LRU
    middle pos winner: set dueling between nearLRU and middle pos
      nearLRU pos winner: insert pos nearLRU
      middle pos winner: insert pos middle
  MRU pos winner: set dueling between nearMRU and MRU pos
    nearMRU pos winner: insert pos nearMRU
    MRU pos winner: insert pos MRU
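The walk through the tree of duels can be sketched as follows. This is an illustration under stated assumptions, not the authors' hardware mechanism: the tuple encoding `TREE`, the function `select_position`, and the use of raw miss counts to decide each duel are hypothetical. A real design would decide each duel with saturating selector counters fed by leader sets.

```python
# Node layout (assumed for this sketch): (pos_a, pos_b, next_if_a_wins, next_if_b_wins).
# A plain string is a leaf naming the final insertion position.
TREE = (
    "middle", "MRU",
    ("LRU", "middle",
     "LRU",
     ("nearLRU", "middle", "nearLRU", "middle")),
    ("nearMRU", "MRU", "nearMRU", "MRU"),
)


def select_position(tree, duel):
    """Follow duel winners from the root until a leaf position is reached."""
    node = tree
    while not isinstance(node, str):
        pos_a, pos_b, if_a, if_b = node
        node = if_a if duel(pos_a, pos_b) == pos_a else if_b
    return node


def duel_by_misses(misses):
    """Build a duel function where the position with fewer misses wins (ties go to pos_a)."""
    return lambda a, b: a if misses[a] <= misses[b] else b


# Hypothetical miss counts per insertion position, for illustration only.
example_misses = {"LRU": 50, "nearLRU": 30, "middle": 40, "nearMRU": 60, "MRU": 70}
chosen = select_position(TREE, duel_by_misses(example_misses))
```

With these made-up miss counts, middle beats MRU, middle beats LRU, and nearLRU beats middle, so `chosen` is `"nearLRU"`. The tree needs at most three duels to pick among five positions.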
For 400.perlbench, 66.67% of the lines brought into the cache are never accessed again, and 73.03% of hits occur between the MRU and middle positions
Adaptive Multi Set Dueling
Current multi set dueling
  Has one leader set for each insertion policy
  Partial follower sets duplicate the winner set policy
  The policies set duel in a tournament manner
  Not scalable
  Leader sets running the losing policies hurt performance
Adaptive multi set dueling
  Leader sets adaptively choose the policy
  No need for partial follower sets
  Scalable
Result
Space Overhead
Space overhead for a 1 MB, 16-way set-associative LLC

Parameter                      Storage            Total storage
LRU overhead                   4 bits per line    1024 * 16 * 4 bits = 8 KB
Set type                       2 bits per set     1024 * 2 = 2048 bits
Two counters (psel1 & psel2)   10 bits each       20 bits
One counter (switched)         1 bit              1 bit
Total                                             8 KB + 2069 bits
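The totals in the table can be checked with a few lines of arithmetic. The set and way counts come from the table; the 64-byte line size used to cross-check the 1024 sets is an assumption, since the slides do not state it, and the variable names are chosen only for this sketch.

```python
SETS = 1024
WAYS = 16
LINE_BYTES = 64  # assumed line size; 1 MB / (16 ways * 64 B) gives 1024 sets

capacity_bytes = SETS * WAYS * LINE_BYTES   # total data capacity of the cache
lru_bits = SETS * WAYS * 4                  # 4 LRU bits per line (baseline LRU state)
set_type_bits = SETS * 2                    # 2 bits per set
psel_bits = 2 * 10                          # psel1 and psel2, 10 bits each
switched_bits = 1                           # the one-bit "switched" counter

extra_bits = set_type_bits + psel_bits + switched_bits
lru_kib = lru_bits // 8 // 1024             # LRU state expressed in KB
```

Here `capacity_bytes` is 1 MB, `lru_kib` is 8, and `extra_bits` is 2069, matching the table: the LRU state is already present in a baseline LRU cache, so only the 2069 bits are additional.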
Conclusion
Insertion position selection using decision tree analysis
  Requires minimal change to LRU
  Needs only 2069 bits of extra space
  Chooses the best insertion position adaptively
  Gets rid of zero reuse lines without any storage-hungry predictor
  Makes multi set dueling scalable
Questions
Zero Reuse Lines in SPEC CPU 2006
[Figure: multi set dueling diagram. Labels: candidate policies pa to ph; selector counters pselab, pselcd, pselef, pselgh, psel1, psel2; φab, φcd, φef, φgh; counter updates of +1 or -1 (for example, +1 if pb wins, -1 if pa wins). Regions: all sets in LLC; leader sets in the adaptive multi set dueling scheme; leader sets in the current multi set dueling scheme.]
Adaptive Multi Set Dueling
Result
[Figure labels: MRU, nearMRU, middle, nearLRU, LRU; psel1, psel2]