Date posted: 20-Jul-2015
Category: Engineering
Uploaded by: jose-pinilla
Adaptive Insertion Policies for High Performance Caching. Qureshi et al.
EECE 527 Paper Summary. Jose Pinilla
Cache Replacement Policies
● Victim Selection Policy
○ LRU
● Insertion Policy
○ MRU
○ LRU
LRU (Baseline)
LRU replacement (commonly used): insert incoming blocks at the MRU position; evict the block at the LRU position.
Belady’s OPT
Optimal replacement algorithm (changes the victim selection policy): evict the block whose next reference is farthest in the future.
Compared against LRU replacement (commonly used).
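Belady’s OPT needs future knowledge, so it is unimplementable in hardware, but it is easy to write offline as a yardstick. A minimal sketch, assuming the classic textbook reference string (which the slide’s figure appears to step through) and a 3-entry cache:

```python
def opt_misses(trace, k):
    """Belady's OPT: on a miss with a full cache, evict the resident
    block whose next reference is farthest in the future (or never)."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) == k:
            def next_use(b):
                # index of b's next reference; never-reused blocks sort last
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

# Classic textbook reference string, 3-entry cache
trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_misses(trace, 3))   # -> 9 misses (the known optimum)
```

The linear scan in `next_use` keeps the sketch short; a real offline tool would precompute next-use indices in one backward pass.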
LIP (LRU Insertion Policy)
Incoming blocks are inserted at the LRU position instead of the MRU position; a block is promoted to MRU only if it is referenced again while in the cache.
[Figure: step-by-step cache contents for an example reference string]
Cyclic Reference Model
for j = 1 to N: read (a1 ... aT)
for j = 1 to N: read (b1 ... bT)
Let there be an access pattern in which (a1 · · · aT)^N is followed by (b1 · · · bT)^N.
Cache size K (K < T), with N ≫ T and N ≫ K/ϵ.
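Under these assumptions the difference between MRU insertion (baseline LRU) and LRU insertion (LIP) can be checked with a short simulation. The `simulate` helper and the concrete values T=8, N=100, K=4 are mine, chosen only to satisfy K < T and N ≫ T:

```python
from collections import OrderedDict

def simulate(pattern, k, insert_at_mru):
    """One fully-associative set of k blocks with a true-LRU recency stack.
    insert_at_mru=True  -> baseline LRU insertion
    insert_at_mru=False -> LIP (incoming blocks enter at the LRU position)"""
    cache = OrderedDict()                  # front = LRU ... back = MRU
    hits = 0
    for block in pattern:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # promote to MRU on a hit
        else:
            if len(cache) == k:
                cache.popitem(last=False)  # evict the LRU block
            cache[block] = None            # lands at the MRU end
            if not insert_at_mru:
                cache.move_to_end(block, last=False)  # LIP: park at LRU
    return hits / len(pattern)

T, N, K = 8, 100, 4                        # K < T, N >> T
pattern = [f"a{i}" for i in range(T)] * N + [f"b{i}" for i in range(T)] * N

print(simulate(pattern, K, insert_at_mru=True))   # -> 0.0 (LRU thrashes)
print(simulate(pattern, K, insert_at_mru=False))  # -> 0.185625 (LIP keeps K-1 blocks)
```

LIP’s hit rate here is exactly (K−1) hits on each a-loop pass after the first, i.e. 99 × 3 = 297 hits out of 1600 accesses; LRU never hits at all.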
Access Pattern: LRU
With MRU insertion and K < T, every a_i is evicted before its next use, so LRU misses on all of (a1 ... aT)^N, and then again on all of (b1 ... bT)^N: a 0% hit rate on the whole pattern.
[Figure: cache contents under LRU at steps 1, 2, X, and X > T·N]
Access Pattern: LIP
With LRU insertion, K−1 of the a-blocks stay resident after the first pass, so LIP hits on K−1 of every T accesses of the a-loop. After the switch to (b1 ... bT)^N, however, the incoming b-blocks churn through a single cache position while the now-stale a-blocks are retained, so the b-loop gets no hits.
[Figure: cache contents under LIP at steps 1, 2, and X > T·N]
Bimodal Insertion
Control the percentage of incoming lines placed in the MRU position.
ϵ = bimodal throttle parameter: ϵ = 1 ⇒ LRU, ϵ = 0 ⇒ LIP.
Access Pattern: BIP
With a small ϵ, BIP behaves like LIP most of the time but occasionally inserts an incoming block at MRU, so after the switch to (b1 ... bT)^N the b-blocks gradually displace the stale a-blocks and the cache adapts to the new working set.
[Figure: cache contents under BIP as the b-blocks take over the set]
Hit Rate
Cache size K (K < T); ϵ = bimodal throttle parameter (ϵ = 1 ⇒ LRU, ϵ = 0 ⇒ LIP); N ≫ T, N ≫ K/ϵ.
On this pattern LRU hits 0%; LIP hits ≈ (K−1)/T of the a-loop only; BIP with small ϵ approaches ≈ (K−1)/T on both loops.
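A minimal sketch of BIP, assuming a coin flip per miss (the paper implements the ϵ throttle with a counter rather than an RNG, so the `random` call here is purely illustrative); T, N, K are again my own example values:

```python
import random
from collections import OrderedDict

def simulate_bip(pattern, k, epsilon, seed=0):
    """BIP: on a miss, insert at MRU with probability epsilon and at LRU
    otherwise (epsilon=1 -> plain LRU, epsilon=0 -> LIP)."""
    rng = random.Random(seed)
    cache = OrderedDict()                  # front = LRU ... back = MRU
    hits = 0
    for block in pattern:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # promote to MRU on a hit
        else:
            if len(cache) == k:
                cache.popitem(last=False)  # evict the LRU block
            cache[block] = None            # lands at the MRU end
            if rng.random() >= epsilon:    # bimodal throttle: usually park at LRU
                cache.move_to_end(block, last=False)
    return hits / len(pattern)

T, N, K = 8, 100, 4
pattern = [f"a{i}" for i in range(T)] * N + [f"b{i}" for i in range(T)] * N

print(simulate_bip(pattern, K, epsilon=1.0))   # -> 0.0, degenerates to LRU
print(simulate_bip(pattern, K, epsilon=1/32))  # adapts to the b-loop as well
```

At ϵ = 0 this reproduces LIP exactly; at a small ϵ it keeps most of LIP’s a-loop hits while the occasional MRU insert lets the b-blocks establish residency.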
Benchmarks
mcf, art, health
250M-instruction slices obtained with SimPoint
Results
So they proved that it works... but don’t overdo it (ϵ)... actually, let’s choose LRU at run-time sometimes.
DIP: Select Mechanism
Two variants: DIP-Global (with DSS) and DIP-Set Dueling.
● ATD: Auxiliary Tag Directory
● MTD: Main Tag Directory
Set-dueling design choices:
● Dedicated-set selection policy: static or dynamic (+2 bits)
● Dedicated-set size
Run-time adaptation: PSEL values
A saturating counter (PSEL) tracks which dedicated sets miss more: misses in the LRU-dedicated sets increment PSEL, misses in the BIP-dedicated sets decrement it.
PSEL ≥ 512 ⇒ follower sets use BIP; PSEL < 512 ⇒ follower sets use LRU
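The PSEL mechanism can be sketched as a small class. The 10-bit saturating counter and the 512 threshold are from the slide above; the method names and the midpoint initialization are my own choices for illustration:

```python
class SetDuelingDIP:
    """Sketch of DIP's set-dueling selector: a saturating PSEL counter is
    updated by misses in the dedicated sets, and its high bit picks the
    insertion policy used by all follower sets."""
    PSEL_MAX = 1023                      # 10-bit saturating counter

    def __init__(self):
        self.psel = 512                  # start undecided, at the midpoint

    def miss_in_lru_set(self):
        """A miss in an LRU-dedicated set: evidence against LRU."""
        self.psel = min(self.psel + 1, self.PSEL_MAX)

    def miss_in_bip_set(self):
        """A miss in a BIP-dedicated set: evidence against BIP."""
        self.psel = max(self.psel - 1, 0)

    def follower_policy(self):
        """MSB of PSEL decides: many LRU misses -> PSEL high -> use BIP."""
        return "BIP" if self.psel >= 512 else "LRU"

sel = SetDuelingDIP()
for _ in range(600):                     # BIP-dedicated sets keep missing...
    sel.miss_in_bip_set()
print(sel.follower_policy())             # -> LRU
```

Because only the dedicated sets update PSEL, the follower sets get the winning policy with no per-set bookkeeping, which is the source of set dueling’s low overhead.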
Hardware Advantages
● LIP, BIP, and DIP are similar to current LRU approximations
● DIP does not require extra bits in the tag-store entry
● No major logic overhead, so cache access time is unaffected
Related Work
Replacement policies: R (random), N (random from the less-recent half), F (frequency-based)
● Bypass
● Early Eviction
● Dynamic Exclusion
Remarks
● Retains some fraction of the working set
● Dynamically adapts to workloads and access patterns
● Low overhead (set dueling)
Questions?
What would the behaviour be if DIP used ATDs dedicated to LRU and LIP?
● Compare Amean
Dynamic ϵ
● Can ϵ be extracted from PSEL?
References
● “Cache Replacement with Dynamic Exclusion”. Scott McFarling.
● “Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching”. Qureshi et al.
● “Using SimPoint for Accurate and Efficient Simulation”. Perelman et al.
● “Adaptive Caching for High-Performance Memory Systems”. PhD dissertation. Qureshi.
McFarling: Conflict Between Loops
for i = 1 to 10
    for j = 1 to 10
        instruction a
    for j = 1 to 10
        instruction b
No conflict*: (a^10 b^10)^10 = 0% miss rate
Direct-mapped conflict: (am ah^9 bm bh^9)^10 = 10% miss rate
* ignoring the first loop iteration
Source: “Cache Replacement with Dynamic Exclusion”. Scott McFarling
McFarling: Conflict Between Loop Levels
for i = 1 to 10
    for j = 1 to 10
        instruction a
    instruction b
Direct-mapped: (am ah^9 bm)^10 = 18% miss rate
Optimal: am ah^9 bm (ah^10 bm)^9 = 10% miss rate
Source: “Cache Replacement with Dynamic Exclusion”. Scott McFarling
McFarling: Conflict Within Loops
for i = 1 to 10
    instruction a
    instruction b
Direct-mapped: (am bm)^10 = 100% miss rate
Optimal: am bm (ah bm)^9 = 55% miss rate
Source: “Cache Replacement with Dynamic Exclusion”. Scott McFarling
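The conflict-within-loops case is easy to reproduce with a toy direct-mapped model; the concrete addresses below (0 and 8 with 8 cache lines) are my own choice to force a and b into the same line:

```python
def direct_mapped_misses(trace, num_lines):
    """Count misses in a direct-mapped cache: each address maps to exactly
    one line (addr mod num_lines), so there is no choice of victim."""
    lines = [None] * num_lines
    misses = 0
    for addr in trace:
        idx = addr % num_lines
        if lines[idx] != addr:         # conflict or cold miss
            misses += 1
            lines[idx] = addr
    return misses

# a and b chosen so they collide in the same line (0 % 8 == 8 % 8)
a, b = 0, 8
trace = [a, b] * 10                    # for i = 1 to 10: a; b
print(direct_mapped_misses(trace, 8) / len(trace))   # -> 1.0 (100% misses)
```

Pinning a in its line, which is what the optimal schedule effectively does, leaves only b’s 10 misses plus a’s one cold miss: 11/20 = 55%, matching the slide.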