
Cache Performance Analysis of Traversals and Random Accesses

Date post: 21-Jan-2016
Cache Performance Analysis of Traversals and Random Accesses R. E. Ladner, J. D. Fix, and A. LaMarca Presented by Tomer Shiran
Transcript
Page 1: Cache Performance Analysis of Traversals and Random Accesses

Cache Performance Analysis of Traversals and Random Accesses

R. E. Ladner, J. D. Fix, and A. LaMarca

Presented by Tomer Shiran

Page 2: Cache Performance Analysis of Traversals and Random Accesses

The Model

• A large memory – M blocks

• A smaller cache – C blocks

• We examine only direct-mapped caches

• Each memory block x maps to exactly one cache block y, where y = x mod C.

Page 3: Cache Performance Analysis of Traversals and Random Accesses

The Model (2)

[Figure: the cache, with blocks 0, 1, 2, …, C−1, drawn above the memory, with blocks 0, 1, 2, …, M−1. Memory block x ∈ {0, …, M−1} maps to cache block y ∈ {0, …, C−1}, where y = x mod C.]

Page 4: Cache Performance Analysis of Traversals and Random Accesses

The Model (3)

[Figure: the memory drawn as rows of C blocks beneath the cache (blocks 0 to C−1, then C to 2C−1, 2C to 3C−1, 3C to 4C−1, …, M−C to M−1), so that each memory block x sits in the column of the cache block y = x mod C that it maps to.]

There are n different memory blocks that map to each cache block.

Thus, M=nC

Page 5: Cache Performance Analysis of Traversals and Random Accesses

Algorithms and Cache

• An algorithm is simply a sequence of accesses to blocks in memory

• We assume that initially, none of the blocks to be accessed are in the cache

• A read or write to a variable that is part of a block is modeled as one access to the block

• We do not distinguish between reads and writes – a copy-back architecture with a write buffer is assumed

• An access to a memory block x is a hit if x is in the cache, and a miss otherwise

• The cache performance of an algorithm is measured by the number of misses it incurs
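The model on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the presentation: the name `count_misses` and the list-based cache are assumptions made for the example.

```python
def count_misses(accesses, C):
    """Count misses for a sequence of memory-block indices on a
    direct-mapped cache of C blocks that starts out empty."""
    cache = [None] * C            # cache[y] = memory block currently held in cache block y
    misses = 0
    for x in accesses:
        y = x % C                 # memory block x maps to cache block y = x mod C
        if cache[y] != x:         # x is not resident: a miss, and x replaces the old block
            misses += 1
            cache[y] = x
    return misses


# With C = 4, memory blocks 0 and 4 map to the same cache block and evict each other.
print(count_misses([0, 4, 0, 4], C=4))  # 4
# Blocks 0 and 1 do not conflict, so only their first accesses miss.
print(count_misses([0, 1, 0, 1], C=4))  # 2
```

Reads and writes are not distinguished here, matching the assumption above that every access to a block counts the same.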

Page 6: Cache Performance Analysis of Traversals and Random Accesses

Traversals

• A traversal with block access rate K accesses each block of a contiguous array of N/K blocks exactly K times (we always assume that K divides N)

• There are a total of N accesses in a traversal

• Two types of traversals:

– Scan traversal

– Permutation traversal

Page 7: Cache Performance Analysis of Traversals and Random Accesses

Scan Traversals

• A scan traversal accesses the first block K times, then the second block K times, and so forth (for a total of N/K blocks and N accesses)

• Scan traversals are extremely common in algorithms that manipulate arrays

– If B array elements fit in a block, then a left-to-right traversal of the array is a scan traversal with block access rate B

• [P-5.1] A scan traversal with block access rate K has 1/K cache misses per access
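Proposition 5.1 is easy to check by simulation. The sketch below is illustrative only; `scan_traversal` and `miss_rate` are hypothetical helper names, and the cache is the direct-mapped model from the earlier slides.

```python
def scan_traversal(N, K):
    """Block indices for a scan traversal: block 0 K times, then block 1 K times, ..."""
    assert N % K == 0, "the model assumes K divides N"
    return [block for block in range(N // K) for _ in range(K)]


def miss_rate(accesses, C):
    """Misses per access on an initially empty direct-mapped cache of C blocks."""
    cache = [None] * C
    misses = 0
    for x in accesses:
        y = x % C                 # direct mapping: y = x mod C
        if cache[y] != x:
            misses += 1
            cache[y] = x
    return misses / len(accesses)


# 24 accesses over 6 blocks with K = 4, on a cache of only 2 blocks.
print(miss_rate(scan_traversal(N=24, K=4), C=2))  # 0.25, i.e. 1/K
```

The result does not depend on C: the K accesses to a block are consecutive, so only the first of them can miss.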

Page 8: Cache Performance Analysis of Traversals and Random Accesses

Permutation Traversals

• Consider the multiset S that contains K copies of each x, where 0 ≤ x < N/K

• Let $\sigma = \sigma_1 \sigma_2 \ldots \sigma_N$ be a permutation of S, chosen uniformly at random

• If $\sigma_i = x$, then the i-th access (out of N) in the permutation traversal is to x

• At any point in the permutation traversal, if there are k accesses remaining and memory block x has j accesses remaining, then memory block x is chosen for the next access with probability j/k

Example with N/K = 7 memory blocks and k = 25 accesses remaining:

Memory block index:          0      1      2      3      4      5      6
Remaining accesses:          5     10      1      1      5      2      1
Next access probability:   5/25  10/25   1/25   1/25   5/25   2/25   1/25
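One way to generate such a traversal is to build the multiset S and shuffle it. The sketch below assumes Python's `random.shuffle` (a Fisher-Yates shuffle) as the source of a uniformly random ordering; the function name `permutation_traversal` is hypothetical.

```python
import random
from collections import Counter


def permutation_traversal(N, K, rng):
    """Return a uniformly random ordering of the multiset holding K copies
    of each block index 0 .. N/K - 1."""
    assert N % K == 0, "the model assumes K divides N"
    accesses = [x for x in range(N // K) for _ in range(K)]
    rng.shuffle(accesses)         # in-place shuffle of the multiset
    return accesses


traversal = permutation_traversal(N=12, K=4, rng=random.Random(0))
print(traversal)
print(Counter(traversal))         # every block index appears exactly K = 4 times
```

Drawing the next access with probability j/k from the remaining accesses, as described above, produces the same distribution as shuffling once up front.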

Page 9: Cache Performance Analysis of Traversals and Random Accesses

Hit Rate of Permutation Traversals

• [T-5.1] Assuming all permutations are equally likely, a permutation traversal with block access rate K of N/K contiguous memory blocks has the following number of misses per access:

$$
\begin{cases}
\dfrac{1}{K}, & N \le KC \\[6pt]
1 - \dfrac{(K-1)\,C}{N}, & N > KC
\end{cases}
$$
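A small Monte Carlo experiment can be used to sanity-check Theorem 5.1. The sketch below assumes the piecewise formula is 1/K for N ≤ KC and 1 − (K−1)C/N otherwise; the function names, trial count, and seed are choices made for this example, not part of the presentation.

```python
import random


def predicted_miss_rate(N, K, C):
    """Misses per access according to the piecewise formula of Theorem 5.1."""
    if N <= K * C:
        return 1 / K
    return 1 - (K - 1) * C / N


def simulated_miss_rate(N, K, C, trials=200, seed=0):
    """Average misses per access over random permutation traversals
    on an initially empty direct-mapped cache of C blocks."""
    rng = random.Random(seed)
    total_misses = 0
    for _ in range(trials):
        accesses = [x for x in range(N // K) for _ in range(K)]
        rng.shuffle(accesses)     # uniformly random permutation of the multiset
        cache = [None] * C
        for x in accesses:
            y = x % C
            if cache[y] != x:
                total_misses += 1
                cache[y] = x
    return total_misses / (trials * N)


# 60 blocks on a 20-block cache (N > KC): three memory blocks compete per cache block.
print(predicted_miss_rate(240, 4, 20), simulated_miss_rate(240, 4, 20))
# 10 blocks on a 20-block cache (N <= KC): no conflicts, one miss per block.
print(predicted_miss_rate(40, 4, 20), simulated_miss_rate(40, 4, 20))
```

In the conflict-free case the simulation matches the formula exactly; in the other case it should agree up to sampling noise.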

Page 10: Cache Performance Analysis of Traversals and Random Accesses

Hit Rate of Permutation Traversals (2)

• x is a particular cache block

• $m_1, m_2, \ldots, m_n$ are the memory blocks in the region accessed by the traversal that map to cache block x (N = nCK)

• During the traversal, nK accesses will be made to x

• $B_i = j$ whenever the i-th access that maps to x is to location $m_j$ (1 ≤ i ≤ nK)

[Figure: an example with n = 3 and K = 4, showing the sequence $B_1, \ldots, B_{12}$ of accesses to $m_1$, $m_2$, and $m_3$ that map to x, with the hits marked H.]

Page 11: Cache Performance Analysis of Traversals and Random Accesses

Hit Rate of Permutation Traversals (3)

• $X_{ij}$ is a random variable that indicates whether the i-th access that maps to x is a hit to location $m_j$

• The first access to x is always a miss, so $X_{1j} = 0$ for all j

• For i > 1 (and i ≤ nK) we have the following:

$$
\Pr[X_{ij} = 1] = \Pr[B_{i-1} = j \wedge B_i = j] = \Pr[B_{i-1} = j] \cdot \Pr[B_i = j \mid B_{i-1} = j] = \frac{K}{nK} \cdot \frac{K-1}{nK-1}
$$

– In the first factor, K is the total number of accesses to $m_j$ and nK is the total number of accesses to x (including accesses to $m_j$)

– The second factor is the same calculation, except that we know that there was already one access to x, and that it was to $m_j$

Page 12: Cache Performance Analysis of Traversals and Random Accesses

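Hit Rate of Permutation Traversals (4)

• For a traversal, the expected number of hits at x is then:

$$
E\left[\sum_{i=1}^{nK}\sum_{j=1}^{n} X_{ij}\right] = \sum_{i=2}^{nK}\sum_{j=1}^{n} E[X_{ij}] = n\,(nK-1)\cdot\frac{K}{nK}\cdot\frac{K-1}{nK-1} = K-1
$$

• For the expected number of hits incurred by the traversal for all cache blocks, we need to multiply the result by the number of cache blocks in use, which is N/K when N/K ≤ C and C otherwise

Expected number of hits:

$$
\begin{cases}
(K-1)\,\dfrac{N}{K}, & \dfrac{N}{K} \le C \\[6pt]
(K-1)\,C, & \dfrac{N}{K} > C
\end{cases}
$$

Expected hits per access:

$$
\begin{cases}
\dfrac{K-1}{K}, & \dfrac{N}{K} \le C \\[6pt]
\dfrac{(K-1)\,C}{N}, & \dfrac{N}{K} > C
\end{cases}
$$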

Page 13: Cache Performance Analysis of Traversals and Random Accesses

Tree Traversals – An Example

• The nodes of the tree are allocated contiguously in memory

• L is the number of tree nodes that fit in a single cache block, which gives a block access rate of K = 3L

• Even if the tree is arbitrary, the permutation traversal that arises from a preorder traversal is not completely arbitrary:

– When the key of a node is visited, the next access will always be to pL (the left child pointer)

– pR (the right child pointer) will be accessed next for the majority of nodes (the leaves), or may be accessed soon after

• Therefore, we model the accesses to the keys as a permutation traversal with K = L, and the remaining accesses to the child pointers as hits

[Figure: three tree nodes, each laid out as key, pL, pR, pData.]

Page 14: Cache Performance Analysis of Traversals and Random Accesses

Tree Traversals – An Example (2)

• The total number of misses in a preorder traversal, where N is the number of nodes in the tree, is:

$$
\begin{cases}
N \cdot \dfrac{1}{L}, & \dfrac{N}{L} \le C \\[6pt]
N \left(1 - \dfrac{(L-1)\,C}{N}\right), & \dfrac{N}{L} > C
\end{cases}
$$

• This result was validated with an implementation in C on a DEC Alpha (the memory accesses were monitored using Atom), and was found to be extremely accurate
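The predicted miss count is simple to evaluate. The helper below is a sketch under the assumption that the formula is N/L when N/L ≤ C and N − (L−1)C otherwise (N multiplied by the permutation-traversal miss rate with K = L); the function name is hypothetical.

```python
def preorder_misses(N, L, C):
    """Predicted misses for a preorder traversal of an N-node tree,
    with L nodes per cache block and a direct-mapped cache of C blocks."""
    if N / L <= C:
        return N / L                  # the tree fits: one miss per block
    return N - (L - 1) * C            # N * (1 - (L - 1) * C / N)


print(preorder_misses(N=200, L=4, C=100))    # 50.0  (tree fits in the cache)
print(preorder_misses(N=1000, L=4, C=100))   # 700   (tree is larger than the cache)
```

The two branches agree at the boundary N = LC, where both give C misses.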

Page 15: Cache Performance Analysis of Traversals and Random Accesses

Random Access

• In a random access pattern each block x of memory is accessed statistically (in other words, on a given access x is accessed with some probability)

• We make the independent reference assumption (each access is independent of all previous accesses)

• The analysis of a set of random access patterns is called collective analysis

Page 16: Cache Performance Analysis of Traversals and Random Accesses

Collective Analysis

• The cache is partitioned into a set R of regions

• The accesses are partitioned into a set P of processes

• The processes are used to model accesses to different portions of memory that map to the same portion of the cache (a single process does not access different data items that conflict in the cache)

• $\lambda_{ij}$ is the probability that region i is accessed by process j

• $r_i$ is the size of region i in blocks, so that $C = \sum_{i \in R} r_i$

• $\lambda_i$ is the probability that region i is accessed: $\lambda_i = \sum_{j \in P} \lambda_{ij}$

Page 17: Cache Performance Analysis of Traversals and Random Accesses

Collective Analysis (2)

• [P-6.1] In a system of random accesses, in the limit as the number of accesses goes to infinity, the expected number of misses per access is:

$$
1 - \eta = 1 - \sum_{i \in R} \frac{1}{\lambda_i} \sum_{j \in P} \lambda_{ij}^2
$$

• We define the following quantities:

– The probability that an access is a hit in region i: $\eta_i = \dfrac{1}{\lambda_i} \sum_{j \in P} \lambda_{ij}^2$

– The probability that an access is a hit: $\eta = \sum_{i \in R} \eta_i$
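As a rough sketch, the quantities of Proposition 6.1 can be computed directly from a table of access probabilities. The representation below (a list of rows, one per region, with `lam[i][j]` standing for λ_ij) and the function name are assumptions made for the example.

```python
def hit_probabilities(lam):
    """Given lam[i][j] = probability that an access is to region i by process j,
    return ([eta_i for each region], eta)."""
    eta_per_region = []
    for row in lam:
        lam_i = sum(row)                                    # lambda_i
        eta_per_region.append(sum(p * p for p in row) / lam_i)
    return eta_per_region, sum(eta_per_region)


# One region shared by two equally likely processes.
eta_i, eta = hit_probabilities([[0.5, 0.5]])
print(eta_i, eta)        # [0.5] 0.5
print(1 - eta)           # long-run misses per access: 0.5

# One region used by a single process: no conflicts in the long run.
print(hit_probabilities([[1.0]]))   # ([1.0], 1.0)
```

Every region in the table is assumed to have a non-zero access probability, since η_i divides by λ_i.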

Page 18: Cache Performance Analysis of Traversals and Random Accesses

Random Access for a Finite Period

• Proposition 6.1 gives the expected miss ratio if we think of a system of random accesses running forever

• In some cases we are interested in the number of misses that occur in N accesses

• [L-6.1] In a system of random accesses, for each block in region i, the expected number of misses in N accesses is:

$$
\frac{\lambda_i - \eta_i}{r_i}\,N + \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
$$

Page 19: Cache Performance Analysis of Traversals and Random Accesses

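Random Access for a Finite Period (2)

• x is a particular block in region i

• $\rho_{ik}$ is the probability that the k-th access is a miss at block x (this is also the expected number of misses at x on the k-th access)

• $q_{ik}$ is the probability that the k-th access was a hit to x given that it was an access to x (i.e., $q_{ik}$ is the hit ratio of x at access k)

$$
q_{ik} = \left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1}\right) \sum_{j \in P} \left(\frac{\lambda_{ij}}{\lambda_i}\right)^2 = \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1}\right)
$$

– The first factor is the probability that x has been accessed before the k-th access

– The second factor is the probability that the k-th access and the most recent access were made by the same process, given that they were accesses to x

$$
\rho_{ik} = \frac{\lambda_i}{r_i}\,(1 - q_{ik})
$$

– $\lambda_i / r_i$ is Pr[k-th access is to x], and $1 - q_{ik}$ is Pr[k-th access is a miss | k-th access is to x]

• Summing over the N accesses gives Lemma 6.1:

$$
\sum_{k=1}^{N} \rho_{ik} = \sum_{k=1}^{N} \frac{\lambda_i}{r_i}\left(1 - \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1}\right)\right) = \frac{\lambda_i - \eta_i}{r_i}\,N + \frac{\eta_i}{r_i} \sum_{k=1}^{N} \left(1 - \frac{\lambda_i}{r_i}\right)^{k-1} = \frac{\lambda_i - \eta_i}{r_i}\,N + \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
$$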

Page 20: Cache Performance Analysis of Traversals and Random Accesses

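Random Access for a Finite Period (3)

• From Lemma 6.1 (which we just proved), we can find the expected number of misses in all the N accesses

• [T-6.1] In a system of random accesses, the expected number of misses per access in N accesses is:

$$
\frac{1}{N} \sum_{i \in R} r_i \left[\frac{\lambda_i - \eta_i}{r_i}\,N + \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)\right]
= \sum_{i \in R} \lambda_i - \sum_{i \in R} \eta_i + \frac{1}{N} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
= 1 - \eta + \frac{1}{N} \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right)
$$

– The sum in the first expression, before it is divided by N, is the expected number of misses in all the N accesses

• As N goes to infinity the expected number of misses per access goes to 1-η, the expected miss rate from Proposition 6.1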
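The finite-period formula can be evaluated numerically to see both ends of its behaviour. This is a sketch that assumes Theorem 6.1 has the form 1 − η + (1/N) Σ (r_i η_i / λ_i)(1 − (1 − λ_i/r_i)^N); the function name and the list-based inputs are choices made for this example.

```python
def misses_per_access(N, r, lam):
    """Expected misses per access over the first N accesses.

    r[i]      -- size of region i in blocks
    lam[i][j] -- probability that an access is to region i by process j
    """
    eta = 0.0
    finite_term = 0.0
    for r_i, row in zip(r, lam):
        lam_i = sum(row)
        eta_i = sum(p * p for p in row) / lam_i
        eta += eta_i
        finite_term += (r_i * eta_i / lam_i) * (1 - (1 - lam_i / r_i) ** N)
    return 1 - eta + finite_term / N


C = 64
two_processes = [[0.5, 0.5]]                          # one region, two equally likely processes
print(misses_per_access(1, [C], two_processes))       # the very first access always misses
print(misses_per_access(10**6, [C], two_processes))   # close to 1 - eta = 0.5
```

For N = 1 the formula gives exactly one miss per access, as it must with an initially empty cache, and for large N it settles at the Proposition 6.1 rate.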

Page 21: Cache Performance Analysis of Traversals and Random Accesses

Random Access for a Finite Period (4)

• In the simplest case, there is only one process and one region

• In the collective analysis model, an access to a block in a direct-mapped cache by process j will be a hit if no other process has accessed the block since the last access by process j

• When there is only one process, an access to a block that has been accessed before is always a hit, so η=1

• As a consequence, with $r_1 = C$ and $\lambda_1 = 1$, the expected number of misses per access simplifies to:

$$
\frac{1}{N} \sum_{i \in R} \frac{r_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{N}\right) = \frac{C}{N}\left(1 - \left(1 - \frac{1}{C}\right)^{N}\right) \approx \frac{C}{N}\left(1 - e^{-N/C}\right)
$$
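In the single-process case the only misses are first touches, so the formula can be checked by counting distinct blocks. The sketch below assumes N uniformly random accesses over C blocks that all fit in an initially empty cache; the function names, trial count, and seed are illustrative choices.

```python
import math
import random


def exact_rate(N, C):
    """(C/N) * (1 - (1 - 1/C)^N): expected misses per access."""
    return (C / N) * (1 - (1 - 1 / C) ** N)


def approx_rate(N, C):
    """(C/N) * (1 - e^(-N/C)): the exponential approximation."""
    return (C / N) * (1 - math.exp(-N / C))


def simulated_rate(N, C, trials=2000, seed=0):
    """Average misses per access: with one process, each distinct block
    touched costs exactly one miss (its first access)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        touched = {rng.randrange(C) for _ in range(N)}
        misses += len(touched)
    return misses / (trials * N)


N, C = 128, 64
print(exact_rate(N, C), approx_rate(N, C), simulated_rate(N, C))
```

The three values should agree closely; the gap between the exact and approximate forms shrinks as C grows.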

Page 22: Cache Performance Analysis of Traversals and Random Accesses

Interaction of a Scan Traversal with a System of Random Access

• Suppose we have a system of accesses that consists of a scan traversal with block access rate K to some segment of memory, interleaved with a system of random accesses to another segment of memory that makes L accesses per traversal access

• The pattern of access is described by the regular expression $(t_1 r^L t_2 r^L \ldots t_K r^L)^*$, where a sequence $t_1 t_2 \ldots t_K$ indicates K accesses to the same block and r represents a random access

• We assume that the system of random accesses has regions R and processes P, and the probability that process j accesses region i is $\lambda_{ij}$

• As before, region i has $r_i$ blocks

Page 23: Cache Performance Analysis of Traversals and Random Accesses

Scan Traversal with Access Rate 1

• In this case K=1 and we are analyzing the access pattern described by the regular expression $(t\,r^L)^*$, where t indicates a traversal access and r indicates a random access

• N is the total number of accesses and we assume that (1+L)C divides N

• A traversal access is always a miss, because K=1 and the traversal accesses and random accesses are to different memory segments

• The number of traversal misses is N/(1+L)

[Figure: the traversal-access memory segment and the random-access memory segment, both mapping to the cache.]

Page 24: Cache Performance Analysis of Traversals and Random Accesses

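Scan Traversal with Access Rate 1 (2)

• Consider a block x in region i

• Once every C traversal accesses, the traversal captures the block x (i.e., the traversal accesses a memory block that maps to x)

• During the next C-1 traversal accesses, a random access might be made to the block that was evicted from x by the traversal

• By Lemma 6.1 (with N=LC) the expected number of misses per block of region i in the random accesses during C traversal accesses is:

$$
\frac{\lambda_i - \eta_i}{r_i}\,LC + \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)
$$

• The expected number of misses, both traversal and random accesses, during C traversal accesses is:

$$
C + \sum_{i \in R} r_i \left[\frac{\lambda_i - \eta_i}{r_i}\,LC + \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)\right]
= C + LC \sum_{i \in R} (\lambda_i - \eta_i) + \sum_{i \in R} \frac{r_i \eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)
$$

• The total number of accesses during C traversal accesses is (1+L)C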

Page 25: Cache Performance Analysis of Traversals and Random Accesses

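Scan Traversal with Access Rate 1 (3)

• [T-7.1] In a system consisting of a scan traversal with access rate 1 and a system of random accesses with L accesses per traversal access, the expected number of misses per access is:

$$
\frac{1 + L\,(1 - \eta) + \dfrac{1}{C} \displaystyle\sum_{i \in R} \dfrac{r_i \eta_i}{\lambda_i}\left(1 - \left(1 - \dfrac{\lambda_i}{r_i}\right)^{LC}\right)}{1 + L}
$$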
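Theorem 7.1 can be evaluated with a short helper. This sketch assumes the formula is (1 + L(1 − η) + (1/C) Σ (r_i η_i / λ_i)(1 − (1 − λ_i/r_i)^{LC})) / (1 + L), with C taken as the sum of the region sizes; the function name and input layout are hypothetical.

```python
def scan_with_random_miss_rate(L, r, lam):
    """Expected misses per access for the pattern (t r^L)*.

    L         -- random accesses per traversal access
    r[i]      -- size of region i in blocks (the sizes sum to C)
    lam[i][j] -- probability that a random access is to region i by process j
    """
    C = sum(r)
    eta = 0.0
    finite_term = 0.0
    for r_i, row in zip(r, lam):
        lam_i = sum(row)
        eta_i = sum(p * p for p in row) / lam_i
        eta += eta_i
        finite_term += (r_i * eta_i / lam_i) * (1 - (1 - lam_i / r_i) ** (L * C))
    return (1 + L * (1 - eta) + finite_term / C) / (1 + L)


two_processes = [[0.5, 0.5]]          # one region of size C, two equally likely processes
print(scan_with_random_miss_rate(0, [1024], two_processes))     # 1.0: a pure K = 1 scan always misses
print(scan_with_random_miss_rate(1, [1024], two_processes))     # roughly 0.91
print(scan_with_random_miss_rate(1000, [1024], two_processes))  # close to 1 - eta = 0.5
```

With L = 0 there are no random accesses and every traversal access misses, which gives a quick sanity check on the formula's shape.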

Page 26: Cache Performance Analysis of Traversals and Random Accesses

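Scan Traversal with Access Rate 1 (4)

• We want the number of misses per access, so we divide the expected number of misses during C traversal accesses by the (1+L)C accesses made in that time:

$$
\frac{C + \displaystyle\sum_{i \in R} r_i \left[\frac{\lambda_i - \eta_i}{r_i}\,LC + \frac{\eta_i}{\lambda_i}\left(1 - \left(1 - \frac{\lambda_i}{r_i}\right)^{LC}\right)\right]}{(1+L)\,C}
= \frac{1 + L\left(\displaystyle\sum_{i \in R} \lambda_i - \sum_{i \in R} \eta_i\right) + \dfrac{1}{C} \displaystyle\sum_{i \in R} \dfrac{r_i \eta_i}{\lambda_i}\left(1 - \left(1 - \dfrac{\lambda_i}{r_i}\right)^{LC}\right)}{1 + L}
= \frac{1 + L\,(1 - \eta) + \dfrac{1}{C} \displaystyle\sum_{i \in R} \dfrac{r_i \eta_i}{\lambda_i}\left(1 - \left(1 - \dfrac{\lambda_i}{r_i}\right)^{LC}\right)}{1 + L}
$$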

Page 27: Cache Performance Analysis of Traversals and Random Accesses

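Scan Traversal with Access Rate 1 (5)

• Assume there is one region of size C and two processes, where each is equally likely to access a given block: $r_1 = C$, $\lambda_1 = 1$, and $\eta = \eta_1 = \tfrac{1}{2}$

• For large C the previous formula (Theorem 7.1) evaluates to approximately:

$$
\frac{1 + \dfrac{L}{2} + \dfrac{1}{C} \cdot \dfrac{C}{2}\left(1 - \left(1 - \dfrac{1}{C}\right)^{LC}\right)}{1 + L} \approx \frac{1 + \dfrac{L}{2} + \dfrac{1}{2}\left(1 - e^{-L}\right)}{1 + L} = \frac{3 + L - e^{-L}}{2\,(1 + L)}
$$

• For L=1 (creating the access pattern $(tr)^*$) this formula evaluates to approximately 0.91 misses per access

• As L grows, the number of misses per access approaches 0.5, which is what one would expect with the system of random accesses without any interaction with a traversal:

$$
\lim_{L \to \infty} \frac{3 + L - e^{-L}}{2 + 2L} = \lim_{L \to \infty} \frac{\dfrac{3}{L} + 1 - \dfrac{e^{-L}}{L}}{\dfrac{2}{L} + 2} = \frac{0 + 1 - 0}{0 + 2} = \frac{1}{2}
$$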
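The 0.91 figure can be reproduced with a direct simulation of the (t r^L)* pattern. The sketch below assumes the closed form (3 + L − e^{−L}) / (2(1 + L)) for this two-process example, and it models each process as owning its own copy of the C blocks; the tagging of blocks with tuples, the function names, and the run length are assumptions made for the example.

```python
import math
import random


def closed_form(L):
    """Large-C approximation for one region and two equally likely processes."""
    return (3 + L - math.exp(-L)) / (2 * (1 + L))


def simulate(C, L, rounds, seed=0):
    """Misses per access for (t r^L)* on a direct-mapped cache of C blocks."""
    rng = random.Random(seed)
    cache = [None] * C
    misses = accesses = 0
    for t in range(rounds * C):
        # Traversal access with K = 1: always a new block, so always a miss.
        cache[t % C] = ("traversal", t)
        misses += 1
        accesses += 1
        for _ in range(L):
            process = rng.randrange(2)      # two equally likely processes
            y = rng.randrange(C)            # uniformly random cache block
            block = ("random", process, y)  # each process has its own block mapping to y
            if cache[y] != block:
                misses += 1
                cache[y] = block
            accesses += 1
    return misses / accesses


print(closed_form(1))                   # about 0.91
print(simulate(C=64, L=1, rounds=200))  # should land close to the value above
print(closed_form(50))                  # already near the limit of 0.5
```

The simulated rate differs slightly from the closed form because C = 64 is finite and the run is of limited length.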

Page 28: Cache Performance Analysis of Traversals and Random Accesses

Any Questions?

