Cache Performance Analysis of Traversals and Random Accesses
R. E. Ladner, J. D. Fix, and A. LaMarca
Presented by Tomer Shiran
The Model
• A large memory of M blocks
• A smaller cache of C blocks
• We examine only direct-mapped caches
• Each memory block x maps to exactly one cache block y = x mod C, so at any time cache block y holds at most one memory block x with y = x mod C
The Model (2)
[Figure: a cache of C blocks (0, 1, ..., C-1) and a memory of M blocks (0, 1, ..., M-1). Memory block x ∈ {0, ..., M-1} maps to cache block y ∈ {0, ..., C-1}, where y = x mod C.]
The Model (3)
[Figure: the memory drawn as rows of C blocks (0 to C-1, C to 2C-1, 2C to 3C-1, ..., M-C to M-1) stacked above the cache (0 to C-1). Every memory block in a column maps to the cache block below it, y = x mod C.]
There are n different memory blocks that map to each cache block.
Thus, M=nC
Algorithms and Cache
• An algorithm is simply a sequence of accesses to blocks in memory
• We assume that initially, none of the blocks to be accessed are in the cache
• A read or write to a variable that is part of a block is modeled as one access to the block
• We do not distinguish between reads and writes, because a copy-back architecture with a write buffer is used
• An access to a memory block x is a hit if x is in the cache, and a miss otherwise
• The cache performance of an algorithm is measured by the number of misses it incurs
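The miss-counting model above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function name `count_misses` and its parameters are assumptions made for this example.

```python
def count_misses(accesses, num_cache_blocks):
    """Count the misses of a direct-mapped cache on a sequence of memory block indices."""
    # cache[y] is the memory block currently held in cache block y (None = empty).
    cache = [None] * num_cache_blocks
    misses = 0
    for x in accesses:
        y = x % num_cache_blocks  # direct mapping: y = x mod C
        if cache[y] != x:         # cold miss or conflict miss
            misses += 1
            cache[y] = x          # the accessed block replaces whatever was there
    return misses
```

With C = 4, the access sequence 0, 0, 4, 0, 1, 1 incurs 4 misses, because memory blocks 0 and 4 conflict in cache block 0.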
Traversals
• A traversal with block access rate K accesses each block of a contiguous array of N/K blocks exactly K times (we always assume that K divides N)
• There are a total of N accesses in a traversal
• Two types of traversals:
– Scan traversal
– Permutation traversal
Scan Traversals
• A scan traversal accesses the first block K times, then the second block K times, and so forth (for a total of N/K blocks and N accesses)
• Scan traversals are extremely common in algorithms that manipulate arrays
– If B array elements fit in a block, then a left-to-right traversal of the array is a scan traversal with block access rate B
• [P-5.1] A scan traversal with block access rate K has 1/K cache misses per access
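Proposition 5.1 can be checked with a small simulation. The sketch below is illustrative only; `scan_traversal` and `misses_per_access` are hypothetical helper names, and the cache is the direct-mapped model described earlier.

```python
def scan_traversal(num_accesses, K):
    """Block indices of a scan traversal: block 0 K times, then block 1 K times, ..."""
    return [i // K for i in range(num_accesses)]


def misses_per_access(accesses, C):
    """Misses per access of a direct-mapped cache with C blocks."""
    cache = [None] * C
    misses = 0
    for x in accesses:
        y = x % C
        if cache[y] != x:
            misses += 1
            cache[y] = x
    return misses / len(accesses)


if __name__ == "__main__":
    # N = 1200 accesses, K = 4, C = 16: only the first access to each block misses.
    print(misses_per_access(scan_traversal(1200, 4), 16))
```

Only the first of the K consecutive accesses to a block can miss, which is where the 1/K rate comes from, regardless of the cache size.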
Permutation Traversals
• Consider the multiset S that contains K copies of x for each 0 ≤ x < N/K
• Let σ = σ1σ2…σN be a permutation of S, chosen uniformly at random
• If σi = x then the i-th access (out of N) in the permutation traversal is to x
• At any point in the permutation traversal, if there are k accesses remaining and memory block x has j accesses remaining, then memory block x is chosen for the next access with probability j/k
Example with N/K = 7 blocks and k = 25 accesses remaining:
Memory block index:       0     1      2     3     4     5     6
Remaining accesses:       5     10     1     1     5     2     1
Next access probability:  5/25  10/25  1/25  1/25  5/25  2/25  1/25
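The j/k rule can be written down directly. The following sketch is an illustration under stated assumptions: the function names are hypothetical, and the remaining-access counts 5, 10, 1, 1, 5, 2, 1 are read off the numerators of the probabilities in the example above.

```python
import random
from fractions import Fraction


def next_access_probabilities(remaining):
    """Probability j/k that each block is accessed next, given remaining counts j."""
    k = sum(remaining)
    return [Fraction(j, k) for j in remaining]


def permutation_traversal(num_blocks, K, rng):
    """Generate a permutation traversal by repeatedly applying the j/k rule."""
    remaining = [K] * num_blocks
    order = []
    for _ in range(num_blocks * K):
        # Pick block x with probability remaining[x] / sum(remaining).
        x = rng.choices(range(num_blocks), weights=remaining)[0]
        remaining[x] -= 1
        order.append(x)
    return order


if __name__ == "__main__":
    print(next_access_probabilities([5, 10, 1, 1, 5, 2, 1]))
    print(permutation_traversal(4, 2, random.Random(0)))
```

Drawing the accesses one at a time with these weights yields the same distribution as shuffling the multiset S uniformly at random.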
Hit Rate of Permutation Traversals
• [T-5.1] Assuming all permutations are equally likely, a permutation traversal with block access rate K of N/K contiguous memory blocks has the following number of misses per access:
$$
\begin{cases}
\dfrac{1}{K}, & N \le KC \\[2ex]
1 - \dfrac{(K-1)\,C}{N}, & N > KC
\end{cases}
$$
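As a sanity check of Theorem 5.1, the formula (1/K misses per access when N ≤ KC, and 1 - (K-1)C/N otherwise) can be compared with a seeded Monte Carlo simulation. The sketch below is illustrative, not the authors' experiment; the function names, the trial count, and the seed are assumptions.

```python
import random


def permutation_miss_rate(N, K, C):
    """Expected misses per access of a permutation traversal (Theorem 5.1)."""
    if N <= K * C:
        return 1 / K
    return 1 - (K - 1) * C / N


def simulated_miss_rate(N, K, C, trials, seed=0):
    """Average misses per access over random permutation traversals."""
    rng = random.Random(seed)
    blocks = [x for x in range(N // K) for _ in range(K)]  # the multiset S
    misses = 0
    for _ in range(trials):
        rng.shuffle(blocks)       # uniformly random permutation of S
        cache = [None] * C        # the cache starts empty for every traversal
        for x in blocks:
            y = x % C
            if cache[y] != x:
                misses += 1
                cache[y] = x
    return misses / (trials * N)


if __name__ == "__main__":
    N, K, C = 256, 4, 16
    print(permutation_miss_rate(N, K, C), simulated_miss_rate(N, K, C, trials=200))
```

The simulated value fluctuates around the formula; the agreement tightens as the number of trials grows.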
Hit Rate of Permutation Traversals (2)
• x is a particular cache block
• m1, m2, …, mn are the memory blocks that map to cache block x in the region accessed by the traversal (N=nCK)
• During the traversal, nK accesses will be made to x
• Bi=j whenever the i-th access that maps to x is to location mj (1≤i≤nK)
[Figure: an example with n = 3 and K = 4, showing the sequence B1, ..., B12 of accesses to m1, m2, and m3 that map to x, with the hits marked H.]
Hit Rate of Permutation Traversals (3)
• Xij is a random variable that indicates whether the i-th access that maps to x is a hit to location mj
• The first access to x is always a miss, so X1j=0 for all j
• For i>1 (and i≤nK) we have the following:
$$
\Pr[X_{ij} = 1] = \Pr[B_{i-1} = j \wedge B_i = j] = \Pr[B_{i-1} = j] \cdot \Pr[B_i = j \mid B_{i-1} = j] = \frac{K}{nK} \cdot \frac{K-1}{nK-1}
$$
– In the first factor, K is the total number of accesses to mj and nK is the total number of accesses to x (including accesses to mj)
– The second factor is the same calculation, except that we know that there was already one access to x, and it was to mj
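Hit Rate of Permutation Traversals (4)
• For a traversal, the expected number of hits at x is then:
$$
E\left[\sum_{i=1}^{nK} \sum_{j=1}^{n} X_{ij}\right] = \sum_{i=2}^{nK} \sum_{j=1}^{n} E[X_{ij}] = n\,(nK-1) \cdot \frac{K}{nK} \cdot \frac{K-1}{nK-1} = K-1
$$
• For the expected number of hits incurred by the traversal for all cache blocks, we need to multiply the result by the number of cache blocks:
Expected number of hits:
$$
\begin{cases}
(K-1)\,\dfrac{N}{K}, & \dfrac{N}{K} \le C \\[2ex]
(K-1)\,C, & \dfrac{N}{K} > C
\end{cases}
$$
Expected hits per access:
$$
\begin{cases}
\dfrac{K-1}{K}, & \dfrac{N}{K} \le C \\[2ex]
\dfrac{(K-1)\,C}{N}, & \dfrac{N}{K} > C
\end{cases}
$$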
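For small n and K, the claim that a cache block sees K-1 expected hits can be verified exactly by enumerating every ordering of the nK accesses that map to it. This is a brute-force sketch for illustration only; the function name is hypothetical, and the enumeration is feasible only for tiny inputs because it visits (nK)! orderings.

```python
from fractions import Fraction
from itertools import permutations


def expected_hits_at_cache_block(n, K):
    """Exact expected number of hits at one cache block, by full enumeration."""
    # n memory blocks map to the cache block; each is accessed K times.
    accesses = [j for j in range(n) for _ in range(K)]
    total_hits = 0
    count = 0
    # Treating equal labels as distinct items makes every ordering equally likely.
    for order in permutations(accesses):
        # An access hits exactly when the previous access to this cache block
        # was to the same memory block.
        total_hits += sum(1 for a, b in zip(order, order[1:]) if a == b)
        count += 1
    return Fraction(total_hits, count)


if __name__ == "__main__":
    print(expected_hits_at_cache_block(2, 3))  # n = 2, K = 3
    print(expected_hits_at_cache_block(3, 2))  # n = 3, K = 2
```

The result depends only on K, not on n, in agreement with the closed form K-1.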
Tree Traversals – An Example
• The nodes of the tree are allocated contiguously in memory
• L is the number of tree nodes that fit in a single cache block, so K=3L (the key, pL, and pR of each node are accessed)
• Even if the tree is arbitrary, the permutation traversal that arises from a preorder traversal is not completely arbitrary:
– When the key of a node is visited, the next access will always be to pL (the left child pointer)
– pR (the right child pointer) will be accessed next for the majority of nodes (the leaves), or may be accessed soon after
• Therefore, we model the accesses to the keys as a permutation traversal with K=L, and the remaining accesses to the child pointers as hits
[Figure: three tree nodes, each with the fields key, pL, pR, and pData.]
Tree Traversals – An Example (2)
• The total number of misses in a preorder traversal, where N is the number of nodes in the tree, is:
$$
\begin{cases}
N \cdot \dfrac{1}{L}, & \dfrac{N}{L} \le C \\[2ex]
N \left(1 - \dfrac{(L-1)\,C}{N}\right), & \dfrac{N}{L} > C
\end{cases}
$$
• This result was validated with an implementation in C on a DEC Alpha (the memory access was monitored using Atom), and was found to be extremely accurate!
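The preorder-traversal estimate is the permutation-traversal miss rate with K = L, multiplied by the N key accesses. A minimal sketch, assuming the model described above (key accesses form a permutation traversal, pointer accesses are hits); the function name and the sample sizes are made up for illustration.

```python
def preorder_traversal_misses(num_nodes, nodes_per_block, cache_blocks):
    """Expected misses of a preorder traversal under the permutation-traversal model."""
    N, L, C = num_nodes, nodes_per_block, cache_blocks
    if N <= L * C:
        # The tree fits in the cache: one miss per block of L nodes.
        return N / L
    # Otherwise N * (1 - (L - 1) * C / N), which simplifies to N - (L - 1) * C.
    return N - (L - 1) * C


if __name__ == "__main__":
    print(preorder_traversal_misses(1000, 4, 512))   # tree fits in the cache
    print(preorder_traversal_misses(10000, 4, 512))  # tree exceeds the cache
```

The two branches agree at the boundary N = LC, where both give C misses.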
Random Access
• In a random access pattern, each block x of memory is accessed statistically (in other words, on any given access, x is accessed with some probability)
• We make the independent reference assumption: the probability that a block is accessed is independent of the previous accesses
• The analysis of a set of random access patterns is called collective analysis
Collective Analysis
• The cache is partitioned into a set R of regions
• The accesses are partitioned into a set P of processes
• The processes are used to model accesses to different portions of memory that map to the same portion of the cache (a single process doesn't access different data items that conflict in the cache)
• λij is the probability that region i is accessed by process j
• ri is the size of region i in blocks
• λi is the probability that region i is accessed
$$
\lambda_i = \sum_{j \in P} \lambda_{ij}, \qquad C = \sum_{i \in R} r_i
$$
Collective Analysis (2)
• [P-6.1] In a system of random accesses, in the limit as the number of accesses goes to infinity, the expected number of misses per access is:
$$
1 - \sum_{i \in R} \frac{1}{\lambda_i} \sum_{j \in P} \lambda_{ij}^2
$$
• We define the following quantities:
– The probability that an access is a hit in region i:
$$
\eta_i = \frac{1}{\lambda_i} \sum_{j \in P} \lambda_{ij}^2
$$
– The probability that an access is a hit:
$$
\eta = \sum_{i \in R} \eta_i
$$
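The quantities of Proposition 6.1 are easy to compute from a table of the λij. The sketch below is illustrative; the function names and the nested-list layout `lam[i][j]` are assumptions, and every region is assumed to be accessed with nonzero probability.

```python
def region_hit_probabilities(lam):
    """eta_i for each region; lam[i][j] is the probability that process j accesses region i."""
    eta = []
    for row in lam:
        lam_i = sum(row)                              # lambda_i, assumed nonzero
        eta.append(sum(p * p for p in row) / lam_i)   # (1 / lambda_i) * sum_j lambda_ij^2
    return eta


def limiting_miss_rate(lam):
    """Expected misses per access as the number of accesses goes to infinity (1 - eta)."""
    return 1 - sum(region_hit_probabilities(lam))


if __name__ == "__main__":
    # One region shared equally by two processes.
    print(limiting_miss_rate([[0.5, 0.5]]))
    # Two regions, each accessed by its own process only.
    print(limiting_miss_rate([[0.5, 0.0], [0.0, 0.5]]))
```

When each region is touched by a single process, the processes never evict each other's blocks and the limiting miss rate is zero.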
Random Access for a Finite Period
• Proposition 6.1 gives the expected miss ratio if we think of a system of random accesses running forever
• In some cases we are interested in the number of misses that occur in N accesses
• [L-6.1] In a system of random accesses, for each block in region i, the expected number of misses in N accesses is:
$$
\frac{(\lambda_i - \eta_i)\,N}{r_i} + \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
$$
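Random Access for a Finite Period (2)
• x is a particular block in region i
• ρik is the probability that the k-th access is a miss at block x
• qik is the probability that the k-th access is a hit at x given that it is an access to x (i.e., qik is the hit ratio of x at access k)
$$
q_{ik} = \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1}\right) \sum_{j \in P} \left(\frac{\lambda_{ij}}{\lambda_i}\right)^2 = \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1}\right)
$$
– The first factor is the probability that x has been accessed before the k-th access
– The sum is the probability that the k-th access and the most recent access were made by the same process, given that they were accesses to x
$$
\rho_{ik} = \frac{\lambda_i}{r_i} \left(1 - q_{ik}\right)
$$
– λi/ri is Pr[k-th access is to x], and 1 - qik is Pr[k-th access is a miss | k-th access is to x]
– ρik is also the expected number of misses at x on the k-th access
• Summing over the N accesses gives Lemma 6.1:
$$
\sum_{k=1}^{N} \rho_{ik} = \sum_{k=1}^{N} \frac{\lambda_i}{r_i} \left(1 - \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1}\right)\right) = \frac{(\lambda_i - \eta_i)\,N}{r_i} + \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
$$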
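The closed form of Lemma 6.1 can be checked numerically against the sum of the per-access miss probabilities ρik. A minimal sketch, assuming the definitions of qik and ρik given on this slide; the function names and the sample parameters are hypothetical.

```python
def hit_ratio(k, lam_i, eta_i, r_i):
    """q_ik: probability that the k-th access hits, given that it is an access to x."""
    return (eta_i / lam_i) * (1 - (1 - lam_i / r_i) ** (k - 1))


def miss_probability(k, lam_i, eta_i, r_i):
    """rho_ik: probability that the k-th access is a miss at block x."""
    return (lam_i / r_i) * (1 - hit_ratio(k, lam_i, eta_i, r_i))


def expected_misses_per_block(N, lam_i, eta_i, r_i):
    """Closed form of Lemma 6.1 for one block of region i."""
    return (lam_i - eta_i) * N / r_i + (eta_i / lam_i) * (1 - (1 - lam_i / r_i) ** N)


if __name__ == "__main__":
    # Hypothetical region: lambda_i = 0.6 split evenly between two processes,
    # so eta_i = (0.3**2 + 0.3**2) / 0.6 = 0.3, and r_i = 32 blocks.
    N, lam_i, eta_i, r_i = 500, 0.6, 0.3, 32
    direct = sum(miss_probability(k, lam_i, eta_i, r_i) for k in range(1, N + 1))
    print(direct, expected_misses_per_block(N, lam_i, eta_i, r_i))
```

The two printed values agree up to floating-point rounding, since the closed form is just the geometric series summed.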
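Random Access for a Finite Period (3)
• From Lemma 6.1 (which we just proved), we can find the expected number of misses in all the N accesses
• [T-6.1] In a system of random accesses, the expected number of misses per access in N accesses is:
$$
1 - \eta + \frac{1}{N} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
$$
• The expected number of misses in all the N accesses is the sum over the regions of ri times the expression in Lemma 6.1; dividing by N:
$$
\begin{aligned}
\frac{1}{N} \sum_{i \in R} r_i \left[\frac{(\lambda_i - \eta_i)\,N}{r_i} + \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)\right]
&= \sum_{i \in R} \lambda_i - \sum_{i \in R} \eta_i + \frac{1}{N} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right) \\
&= 1 - \eta + \frac{1}{N} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
\end{aligned}
$$
• As N goes to infinity the expected number of misses per access goes to 1-η, the expected miss rate from Proposition 6.1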
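Theorem 6.1 translates directly into code. In this sketch each region is described by a tuple (ri, λi, ηi); that layout and the function name are assumptions made for the example.

```python
def finite_miss_rate(N, regions):
    """Expected misses per access in N accesses (Theorem 6.1).

    regions is a list of (r_i, lambda_i, eta_i) tuples, one per region.
    """
    eta = sum(eta_i for _, _, eta_i in regions)
    transient = sum(
        r_i * eta_i / lam_i * (1 - (1 - lam_i / r_i) ** N)
        for r_i, lam_i, eta_i in regions
    )
    return 1 - eta + transient / N


if __name__ == "__main__":
    # One region of 64 blocks shared equally by two processes (eta = 1/2).
    regions = [(64, 1.0, 0.5)]
    for N in (1, 64, 1024, 10**6):
        print(N, finite_miss_rate(N, regions))
```

For N = 1 the formula gives exactly 1 (the first access always misses), and for large N it approaches 1 - η.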
Random Access for a Finite Period (4)
• In the simplest case, there is only one process and one region
• In the collective analysis model, an access to a block in a direct-mapped cache by process j will be a hit if no other process has accessed the block since the last access by process j
• When there is only one process, an access to a previously accessed block is always a hit, so η=1
• As a consequence, with λ1=1 and r1=C, the expected number of misses per access simplifies to:
$$
1 - \eta + \frac{1}{N} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right) = \frac{C}{N} \left(1 - \left(1 - \frac{1}{C}\right)^{N}\right) \approx \frac{C}{N} \left(1 - e^{-N/C}\right)
$$
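For the single-process case, the exact expression (C/N)(1 - (1 - 1/C)^N), its exponential approximation, and a direct simulation can be compared side by side. The sketch below is illustrative; the function names, trial count, and seed are assumptions. Since one process accesses C blocks that never conflict, a miss occurs exactly on the first access to each block.

```python
import math
import random


def exact_miss_rate(N, C):
    """(C / N) * (1 - (1 - 1/C)^N): one process, one region of C blocks."""
    return C / N * (1 - (1 - 1 / C) ** N)


def approx_miss_rate(N, C):
    """Large-C approximation (C / N) * (1 - e^(-N/C))."""
    return C / N * (1 - math.exp(-N / C))


def simulated_miss_rate(N, C, trials, seed=0):
    """Uniform random accesses to C non-conflicting blocks, starting from a cold cache."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        seen = set()
        for _ in range(N):
            x = rng.randrange(C)
            if x not in seen:   # first touch of a block is the only possible miss
                seen.add(x)
                misses += 1
    return misses / (trials * N)


if __name__ == "__main__":
    N, C = 1000, 1024
    print(exact_miss_rate(N, C), approx_miss_rate(N, C), simulated_miss_rate(N, C, 50))
```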
Interaction of a Scan Traversal with a System of Random Access
• Suppose we have a system of accesses that consists of a scan traversal with block access rate K to some segment of memory interleaved with a system of random accesses to another segment of memory that makes L accesses per traversal access
• The pattern of access is described by the regular expression (t1 r^L t2 r^L … tK r^L)*, where a sequence t1t2…tK indicates K accesses to the same block and r represents a random access
• We assume that the system of random accesses has regions R and processes P, and that the probability that process j accesses region i is λij
• As before, region i has ri blocks
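The interleaved access pattern can be made concrete by generating it for small parameters. This is only an illustration of the regular expression above; the function name and the token encoding (a tuple `("t", block)` for a traversal access and the string `"r"` for a random access) are hypothetical.

```python
def interleaved_pattern(num_blocks, K, L):
    """Access pattern of a scan traversal (rate K) with L random accesses per traversal access."""
    pattern = []
    for block in range(num_blocks):
        for _ in range(K):                  # t1 ... tK all touch the same block
            pattern.append(("t", block))
            pattern.extend(["r"] * L)       # followed by L random accesses
    return pattern


if __name__ == "__main__":
    # Two traversal blocks, K = 1, L = 2: t r r t r r
    print(interleaved_pattern(2, 1, 2))
```

Each traversal block contributes K(1+L) accesses, so a traversal of N/K blocks is interleaved with NL random accesses.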
Scan Traversal with Access Rate 1
• In this case K=1 and we are analyzing the access pattern described by the regular expression (t r^L)*, where t indicates a traversal access and r indicates a random access
• N is the total number of accesses and we assume that (1+L)C divides N
• A traversal access is always a miss, because K=1 and the traversal accesses and random accesses are to different memory segments
• The number of traversal misses is N/(1+L)
[Figure: the traversal access memory segment and the random access memory segment both map onto the cache.]
Scan Traversal with Access Rate 1 (2)
• Consider a block x in region i
• Once every C traversal accesses, the traversal captures the block x (i.e., the traversal accesses a memory block that maps to x)
• During the next C-1 traversal accesses, a random access might be made to the block that was evicted from x by the traversal
• By Lemma 6.1 (with N=LC) the expected number of misses per block of region i in the random accesses during C traversal accesses is:
$$
\frac{(\lambda_i - \eta_i)\,LC}{r_i} + \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)
$$
• The total number of accesses during C traversal accesses is (1+L)C
• The expected number of misses, both traversal and random accesses, during C traversal accesses is:
$$
C + \sum_{i \in R} r_i \left[\frac{(\lambda_i - \eta_i)\,LC}{r_i} + \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)\right]
$$
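Scan Traversal with Access Rate 1 (3)
• [T-7.1] In a system consisting of a scan traversal with access rate 1 and a system of random accesses with L accesses per traversal access, the expected number of misses per access is:
$$
\frac{1 + L\,(1 - \eta) + \dfrac{1}{C} \displaystyle\sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)}{1 + L}
$$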
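Theorem 7.1 can be evaluated numerically for any set of regions. In this sketch each region is a tuple (ri, λi, ηi) and C is taken to be the sum of the ri; the function name and that data layout are assumptions made for the example.

```python
def scan_with_random_miss_rate(L, regions):
    """Expected misses per access for the pattern (t r^L)* (Theorem 7.1).

    regions is a list of (r_i, lambda_i, eta_i) tuples; C is the sum of the r_i.
    """
    C = sum(r_i for r_i, _, _ in regions)
    eta = sum(eta_i for _, _, eta_i in regions)
    transient = sum(
        r_i * eta_i / lam_i * (1 - (1 - lam_i / r_i) ** (L * C))
        for r_i, lam_i, eta_i in regions
    )
    return (1 + L * (1 - eta) + transient / C) / (1 + L)


if __name__ == "__main__":
    # One region of 1024 blocks shared equally by two processes (eta = 1/2).
    regions = [(1024, 1.0, 0.5)]
    for L in (1, 4, 16, 1000):
        print(L, scan_with_random_miss_rate(L, regions))
```

As L grows, the traversal's share of the accesses shrinks and the value approaches 1 - η, the miss rate of the random accesses on their own.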
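Scan Traversal with Access Rate 1 (4)
• We want the number of misses per access, so we divide the expected number of misses during C traversal accesses by the (1+L)C accesses made in that period, using the fact that the λi sum to 1 and the ηi sum to η:
$$
\begin{aligned}
\frac{C + \displaystyle\sum_{i \in R} r_i \left[\frac{(\lambda_i - \eta_i)\,LC}{r_i} + \frac{\eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)\right]}{(1 + L)\,C}
&= \frac{1 + L \displaystyle\sum_{i \in R} (\lambda_i - \eta_i) + \frac{1}{C} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)}{1 + L} \\
&= \frac{1 + L\,(1 - \eta) + \dfrac{1}{C} \displaystyle\sum_{i \in R} \frac{r_i \eta_i}{\lambda_i} \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)}{1 + L}
\end{aligned}
$$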
Scan Traversal with Access Rate 1 (5)
• Assume there is one region of size C and two processes, where each is equally likely to access a given block: r1=C, λ1=1, and η=η1=½
• For large C the previous formula (Theorem 7.1) evaluates to approximately:
$$
\frac{1 + L \left(1 - \frac{1}{2}\right) + \dfrac{1}{C} \cdot \dfrac{C}{2} \left(1 - \left(1 - \dfrac{1}{C}\right)^{LC}\right)}{1 + L} \approx \frac{3 + L - e^{-L}}{2\,(1 + L)}
$$
• For L=1 (creating the access pattern (tr)*) this formula evaluates to approximately 0.91 misses per access
• As L grows the number of misses per access approaches 0.5, which is what one would expect with the system of random accesses without any interaction with a traversal:
$$
\lim_{L \to \infty} \frac{3 + L - e^{-L}}{2\,(1 + L)} = \lim_{L \to \infty} \frac{\frac{3}{L} + 1 - \frac{e^{-L}}{L}}{\frac{2}{L} + 2} = \frac{0 + 1 - 0}{0 + 2} = \frac{1}{2}
$$
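The 0.91 figure for L=1 can be reproduced with a small simulation of the (t r^L)* pattern. The sketch below is one possible reading of the setup and not the authors' experiment: each of the two processes is assumed to own a separate memory segment of C blocks, each random access picks a process and a block uniformly, and the traversal walks through fresh blocks. All names, sizes, and the seed are assumptions.

```python
import math
import random


def closed_form(L):
    """Large-C approximation (3 + L - e^(-L)) / (2 (1 + L))."""
    return (3 + L - math.exp(-L)) / (2 * (1 + L))


def simulate(C, L, sweeps, seed=0):
    """Misses per access of (t r^L)* on a direct-mapped cache with C blocks."""
    rng = random.Random(seed)
    cache = [None] * C
    misses = 0
    accesses = 0
    for t in range(sweeps * C):
        # Traversal access: always a new block, so always a miss; it evicts
        # whatever was stored in cache block t mod C.
        cache[t % C] = ("t", t)
        misses += 1
        accesses += 1
        for _ in range(L):
            # Random access: one of two processes touches block y of its own segment,
            # and block y of either segment maps to cache block y.
            block = (rng.randrange(2), rng.randrange(C))
            y = block[1]
            if cache[y] != block:
                misses += 1
                cache[y] = block
            accesses += 1
    return misses / accesses


if __name__ == "__main__":
    print(closed_form(1), simulate(C=256, L=1, sweeps=50))
```

The simulated rate should land close to the closed form, with small deviations from the finite cache size and the random sampling.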
Any Questions?