Page 1

Hardware Modeling 2 Cache Analyses
Peter Puschner
slides credits: P. Puschner, R. Kirner, B. Huber
VU 2.0 182.101 SS 2015

Page 2

Recap: Caches in WCET Analysis

Purpose: bridge the gap between the fast CPU and slow memory. On many architectures, analyzing caches is essential (example: 40 cycles for a miss on the MPC755).

What is cached: instructions, data, BTB, TLB
Design: direct mapped, set-associative, fully associative
Replacement policy: LRU, FIFO, PLRU, PRR
More characteristics: read-only / write-through / write-back, write (no-)allocate, multi-level caches (inclusive/exclusive), ...

Page 3

Caches in WCET Analysis

For software running on hardware with caches, computing the WCET by IPET alone (CFG + CCG) becomes too complex. Ignoring caches, on the other hand, leads to unacceptable overestimation.

⇒ Decompose WCET analysis into 2+ phases:
1. Categorization of memory accesses w.r.t. cache behavior (e.g., always hit, always miss, etc.); this low-level analysis produces the cache categorization.
2. WCET computation: IPET with no or a simplified cache model

Page 4

Categories of Cache Behavior

ah (always hit): each access to the cache is a hit (MUST analysis)
am (always miss): each access to the cache is a miss (MAY analysis ⇒ complement)
ps(S) (persistent): on each entry of context S, the first access is nc, but all other accesses are hits (PERSISTENCE analysis)
nc (not classified): the access falls into none of the above categories

Page 5

Direct Mapped Cache

[Figure: a direct-mapped cache with m lines; the line is selected by the ld(m) line bits of the address. Each line holds a valid bit (v), a tag, and data (k bytes, words w1 ... wk). Address layout: tag | ld(m) line bits | ld(k) word-offset bits.]
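The address decomposition in the figure can be sketched in code. This is a minimal sketch, not from the slides; the function name and representation are assumptions, and m and k are assumed to be powers of two:

```python
# Hypothetical helper (not from the slides): split an address into
# (tag, line, offset) for a direct-mapped cache with m lines of k bytes each.
def split_address(addr: int, m: int, k: int) -> tuple:
    line_bits = m.bit_length() - 1      # ld(m)
    offset_bits = k.bit_length() - 1    # ld(k)
    offset = addr & (k - 1)             # lowest ld(k) bits select the word
    line = (addr >> offset_bits) & (m - 1)  # next ld(m) bits select the line
    tag = addr >> (offset_bits + line_bits) # remaining bits form the tag
    return tag, line, offset
```

For m = 4 lines and k = 2 bytes, address 22 (0b10110) splits into tag 2, line 3, offset 0.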

Page 6

DM-$ Analysis Example

Compiled from, e.g.:

x, y, z = a, b, 0
while (x > 0 && y > 0)
  { z += x-- + y-- }
x, y = 0, 0

[Figure: CFG from START to END; each instruction is annotated with its (tag, line, offset) address: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (0,2,0), (0,2,1), (0,3,0), (0,3,1), and finally (1,0,0).]

Page 7

DM-$ Analysis Example

Compiled from, e.g.:

x, y, z = a, b, 0
while (x > 0 && y > 0)
  { z += x-- + y-- }
x, y = 0, 0

[Figure: the same CFG annotated with classifications: the access (1,0,0) is an always miss (conflict with (0,0,x)); continuing with the 2nd loop iteration, the loop-body accesses (0,1,1), (0,2,0), (0,2,1), (0,3,0), and (0,1,0) are always hits (2..n-th loop iteration).]

Page 8

Cache Classification (Hit/Miss)

Goal: a mechanized analysis that classifies each cache access in a certain context (e.g., call context) as either:
- Always hit: in all possible executions, this access is a cache hit (the accessed cache block is guaranteed to be in the cache)
- Always miss: in all possible executions, this access is a cache miss (the accessed cache block is guaranteed NOT to be in the cache)
- Not classified: the accessed cache block may or may not be in the cache

Page 9

Automated Categorization of Memory Accesses

- Based on abstract interpretation and a fixed-point analysis of cache states in the CFG
- Cache update function: models how a memory access changes the cache state
- Join function: combines states at control-flow joins
- Concrete semantics: the set of possible cache configurations (tags only, no data) at each program point
- Abstract semantics: an efficient approximation in an abstract, "more efficient" domain

Page 10

Data-Flow Analysis (DFA)

DFA is based on the data-flow structure of the system behavior of interest (e.g., forward or backward propagation).
- PRED(n) are the virtual predecessors of CFG node n with regard to the data flow of interest (cache analysis: usually the CFG predecessors)
- The data domain L of the analysis forms a lattice, on which the transfer function Fn: L → L models the semantics of the system behavior of interest.
- To merge two or more states, a join function ⊔: L × L → L computes the least upper bound.

Page 11

Data-Flow Analysis (2)

Data-flow equations modeling the data flow between nodes:

IN(n)  = ⊔ { OUT(j) | j ∈ PRED(n) }
OUT(n) = Fn(IN(n))

[Figure: node n with incoming IN(n), transfer function Fn, and outgoing OUT(n).]

Page 12

Data-Flow Analysis (3)

Monotonicity requirement for solving the data-flow equations iteratively: the transfer functions Fn as well as the join function s1 ⊔ s2 must be monotone (on a lattice of finite height) to ensure termination of the analysis.

Monotonicity: a function f: A → B is monotone iff

∀a, a' ∈ A. (a ⊑_A a') ⇒ (f(a) ⊑_B f(a'))

Page 13

Data-Flow Analysis (4)

Iterative algorithm to find the least fixpoint of the data-flow equations:

for i ← 1 to N do            /* initialize node i */
    OUT(i) = ⊥
while (sets are still changing) do
    for i ← 1 to N do        /* recompute sets at node i */
        IN(i)  = ⊔ { OUT(j) | j ∈ PRED(i) }
        OUT(i) = Fi(IN(i))
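The iterative algorithm can be sketched directly. This is a minimal sketch, not from the slides, instantiated for an assumed toy forward analysis whose abstract states are sets of tags, with union as the join and ∅ as bottom:

```python
# Sketch (assumed example): iterative least-fixpoint computation for the
# data-flow equations IN(n) = join of OUT over PRED(n), OUT(n) = Fn(IN(n)).
def fixpoint(nodes, pred, transfer):
    out = {n: set() for n in nodes}          # OUT(i) = bottom (empty set)
    changed = True
    while changed:                           # iterate until sets stop changing
        changed = False
        for n in nodes:
            in_n = set().union(*(out[p] for p in pred[n]))  # join
            new_out = transfer(n, in_n)      # OUT(i) = Fi(IN(i))
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out

# Diamond CFG 1 -> {2, 3} -> 4; node n's transfer adds the (assumed) tag "t<n>".
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}
out = fixpoint([1, 2, 3, 4], pred, lambda n, s: s | {f"t{n}"})
```

Since set union and the per-node transfer are monotone on a finite domain, the loop is guaranteed to terminate.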

Page 14

Concrete & Abstract Semantics

Concrete cache semantics: models the semantics of the relevant aspects of the program (here: cache state & update); it collects the set of all possible cache states for each program point.

Abstract cache semantics: semantics in a different, usually finite domain, connected to the concrete semantics by abstraction/concretization functions.

Page 15

N-way Set-Associative Cache

[Figure: m sets × n ways of cache blocks (Block 1,1 ... Block m,n); the set is selected by the ld(m) set bits of the address, and the replacement strategy updates blocks within one set. Each block (line) holds a valid bit (v), a tag, and data (k bytes, words w1 ... wk). Address layout: tag | ld(m) set bits | ld(k) word-offset bits.]

Page 16

Fully-Associative Cache (Associativity N)

[Figure: N ways (Way 1 ... Way N); the cache is updated based on the value of the tag, and the replacement policy determines the update strategy used. Each line holds a valid bit (v), a tag, and data (k bytes). Address layout: tag | word offset. Under LRU and FIFO, Way 1 holds the youngest block and Way N the oldest, which is evicted on a miss.]

Page 17

Concrete Cache Semantics (Fully Associative Cache)

Cache configuration: a mapping from cache lines to tags (data is irrelevant)
Domain: for each program point, the set of all possible cache states
State at the start node: a singleton set with the empty cache, or the set of all possible cache configurations
Update: for a cache configuration C and a cache reference S, the new cache configuration C' after accessing S

Page 18

Concrete LRU Update (Fully Associative Cache)

Update function for a 4-way cache (1 line per way) with LRU:

[a, b, c, d] -- access c --> [c, a, b, d]   (HIT: c moves to the front)
[a, b, c, d] -- access e --> [e, a, b, c]   (MISS: e is inserted, d is evicted)
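The two transitions above can be reproduced with a small sketch of the concrete LRU update. The function name and the list representation (youngest block at index 0) are assumptions, not from the slides:

```python
# Sketch: concrete LRU update for a fully associative cache with `ways` lines.
def lru_update(cache: list, tag, ways: int = 4) -> tuple:
    """Access `tag`; return the new cache state and whether it was a hit."""
    hit = tag in cache
    if hit:
        cache = [tag] + [t for t in cache if t != tag]  # move to the front
    else:
        cache = ([tag] + cache)[:ways]                  # insert, evict oldest
    return cache, hit
```

Accessing c in [a, b, c, d] yields ([c, a, b, d], hit); accessing e yields ([e, a, b, c], miss).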

Page 19

Abstract Cache Semantics for MUST / MAY Analysis

Abstract cache configuration (a compact representation of a set of cache configurations):
- MUST: for each tag S, an upper bound on its age (maximum age)
- MAY: for each tag S, a lower bound on its age (minimum age)

Join:
- MUST: for each tag S, take the maximum of the ages
- MAY: for each tag S, take the minimum of the ages

Update (LRU): the accessed tag becomes the youngest;
- MUST: other tags are aged if they may be aged
- MAY: other tags are aged only if they must be aged

Page 20

Abstract Cache Representation

MUST analysis: a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+, or equivalently as age-indexed sets: [{a}, {}, {b}, {c}]. Top element: ⊤ = ∀x. x ≤ N+1.

MAY analysis: a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1, or equivalently: [{d,e}, {a}, {}, {b}]. Top element: ⊤ = ∀x. x ≥ 1.

Page 21

Abstract Cache Semantics (MUST Concretization)

The MUST state a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+ (as age sets: [{a}, {}, {b}, {c}]) concretizes to the set of concrete cache states that respect all the age bounds:

[a, b, c, d], [a, b, c, e], [a, b, d, c], [a, b, e, c],
[a, c, b, d], [a, c, b, e], [a, d, b, c], [a, e, b, c]

Page 22

Abstract Cache Semantics (MUST Join)

Joining the MUST states
  [{a}, {}, {b}, {c}]     (a ≤ 1, b ≤ 3, c ≤ 4, d,e ≤ 5+)
and
  [{}, {a}, {}, {c,d}]    (a ≤ 2, c ≤ 4, d ≤ 4, b,e ≤ 5+)
by taking, for each tag, the maximum age yields
  [{}, {a}, {}, {c}]      (a ≤ 2, c ≤ 4, b,d,e ≤ 5+)

Page 23

Abstract Cache Update Function (LRU Cache, MUST Analysis)

When accessing block c:
  max-age'(c) = 1
  max-age(d) ≥ max-age(c) ⇒ max-age'(d) = max-age(d)
  max-age(d) < max-age(c) ⇒ max-age'(d) = max-age(d) + 1
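These update rules, together with the maximum-age join, can be sketched as follows. The representation is an assumption, not from the slides: each tag maps to its maximum-age bound, with N+1 standing in for the "5+" (possibly not cached) entries:

```python
# Sketch (assumed representation): MUST abstract domain for an N-way LRU cache.
N = 4
TOP_AGE = N + 1          # "5+": possibly not in the cache

def must_update(state: dict, c) -> dict:
    """LRU MUST update when accessing block c."""
    old_c = state.get(c, TOP_AGE)
    new = {}
    for d, age in state.items():
        if d == c:
            continue
        # Tags at least as old as c keep their bound; younger ones may age.
        new[d] = age if age >= old_c else min(age + 1, TOP_AGE)
    new[c] = 1           # the accessed tag becomes the youngest
    return new

def must_join(s1: dict, s2: dict) -> dict:
    """MUST join: per-tag maximum of the age bounds."""
    return {t: max(s1.get(t, TOP_AGE), s2.get(t, TOP_AGE))
            for t in set(s1) | set(s2)}
```

Joining {a: 1, b: 3, c: 4} with {a: 2, c: 4, d: 4} reproduces the join example from the slides: a ≤ 2, c ≤ 4, and b, d fall to 5+.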

Page 24

Abstract Cache Update Function (LRU Cache, MUST Analysis): Soundness

When accessing block c: max-age'(c) = 1

max-age(d) ≥ max-age(c) ⇒ max-age'(d) = max-age(d)
  1. assume age(d) < age(c) ⇒ max-age(d) ≥ age(d) + 1
  2. assume age(d) > age(c) ⇒ age'(d) = age(d)

max-age(d) < max-age(c) ⇒ max-age'(d) = max-age(d) + 1
  1. if age(d) < age(c), age'(d) = age(d) + 1 ≤ max-age(d) + 1
  2. if age(d) > age(c), age'(d) = age(d) ≤ max-age(d) + 1

Page 25

Cache Hit/Miss Classification using MUST Analysis

If at some program point tag S must be in the cache, i.e., its maximum age is less than or equal to the associativity, then the cache access is classified as ALWAYS HIT.

If at some program point it is not the case that tag S may be in the cache, i.e., its minimum age is greater than the associativity of the cache, then the cache access is classified as ALWAYS MISS.

Otherwise, the cache access is NOT CLASSIFIED.
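The classification rule can be stated as a small sketch (assumed helper, not from the slides), taking the MUST bound (maximum age), the MAY bound (minimum age), and the associativity:

```python
# Sketch: turn MUST/MAY age bounds into the hit/miss categories.
def classify(max_age: int, min_age: int, n_ways: int) -> str:
    if max_age <= n_ways:    # MUST: guaranteed to be in the cache
        return "always hit"
    if min_age > n_ways:     # not even MAY in the cache
        return "always miss"
    return "not classified"  # may or may not be in the cache
```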

Page 26

Abstract Cache Semantics (MAY Concretization)

The MAY state a ≥ 2, b ≥ 4, c ≥ 5, d,e ≥ 1 (as age sets: [{d,e}, {a}, {}, {b}]) concretizes to the concrete cache states that respect all the lower bounds:

[d, a, e, b], [e, a, d, b], [d, e, a, b], [e, d, a, b]

Page 27

Abstract Cache Semantics (MAY Join)

Joining the MAY states
  [{d,e}, {a}, {}, {b}]   (d,e ≥ 1, a ≥ 2, b ≥ 4, c ≥ 5)
and
  [{}, {e}, {}, {a}]      (e ≥ 2, a ≥ 4, b ≥ 5, c ≥ 5, d ≥ 5)
by taking, for each tag, the minimum age yields
  [{d,e}, {a}, {}, {b}]   (d ≥ 1, e ≥ 1, a ≥ 2, b ≥ 4, c ≥ 5)

Page 28

Abstract Cache Update Function (LRU Cache, MAY Analysis)

When accessing block c:
  min-age'(c) = 1
  min-age(d) ≤ min-age(c) ⇒ min-age'(d) = min-age(d) + 1
    1. if age(d) > age(c) ≥ min-age(d) ⇒ age'(d) = age(d) ≥ min-age(d) + 1
    2. assume age(d) < age(c) ⇒ age'(d) = age(d) + 1
  min-age(d) > min-age(c) ⇒ min-age'(d) = min-age(d)
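As a companion to the MUST sketch, the MAY rules can be written the same way. The representation is again an assumption: each tag maps to its minimum-age bound, with 1 (youngest possible) as the bound for unseen tags:

```python
# Sketch (assumed representation): MAY abstract domain for an N-way LRU cache.
def may_update(state: dict, c) -> dict:
    """LRU MAY update when accessing block c."""
    old_c = state.get(c, 1)
    new = {}
    for d, age in state.items():
        if d == c:
            continue
        # Only tags that must be at least as young as c are forced to age.
        new[d] = age + 1 if age <= old_c else age
    new[c] = 1           # the accessed tag becomes the youngest
    return new

def may_join(s1: dict, s2: dict) -> dict:
    """MAY join: per-tag minimum of the age bounds."""
    return {t: min(s1.get(t, 1), s2.get(t, 1)) for t in set(s1) | set(s2)}
```

The join of the two states from the MAY-join slide reproduces the result there: d, e ≥ 1, a ≥ 2, b ≥ 4, c ≥ 5.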

Page 29

Cache Hit/Miss Classification using MUST and MAY Analysis

If at some program point tag S must be in the cache, i.e., its maximum age is less than or equal to the associativity, then the cache access is classified as ALWAYS HIT.

If at some program point it is not the case that tag S may be in the cache, i.e., its minimum age is greater than the associativity of the cache, then the cache access is classified as ALWAYS MISS.

Otherwise, the cache access is NOT CLASSIFIED.

What is the benefit of ALWAYS MISS over NOT CLASSIFIED?

Page 30

Discussion

Page 31

Consider a data cache (1-word line size), with the address of odd_even_counter statically known:

static unsigned odd_even_counter[2];
++odd_even_counter[sensor() % 2];
++odd_even_counter[sensor() % 2];
++odd_even_counter[sensor() % 2];
++odd_even_counter[sensor() % 2];
++odd_even_counter[sensor() % 2];

Which access will be a cache miss? How many accesses will be cache hits?

Persistence Analysis

Page 32

Sometimes we do not know whether a particular access will always be a hit or always be a miss.

A cache element is said to be persistent (with respect to a program scope S) if, in every execution of the scope, all but the first access are guaranteed to be cache hits.

Data caches benefit from persistence analysis, because the address (which implies the tag) is not exactly known (e.g., arrays).

Persistence Analysis

Page 33

Published persistence analyses until ~2009 were unsound; correct persistence analyses (LRU only) were developed only recently, published e.g. by Ju, Huynh, and Roychoudhury.

Abstract domain: for each tag, the set of possibly younger tags (YS) accessed in the program scope of interest.

If |YS(c)| is less than the associativity of the cache, the element is persistent in the scope (i.e., it is not evicted once loaded).

DFA-based Persistence Analysis
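The younger-set criterion can be illustrated with a small sketch. Note the heavy simplification: this assumed example runs over a single concrete access trace rather than the abstract DFA domain, and resets a tag's younger set on each access to it (matching LRU aging):

```python
# Sketch (assumed, simplified): per-tag younger sets over one access trace.
# A tag is persistent in the scope if its younger set never reaches the
# associativity (i.e., it can never have been evicted after being loaded).
def persistent_tags(accesses: list, n_ways: int) -> set:
    ys = {}             # tag -> set of tags possibly younger than it
    evictable = set()   # tags whose younger set ever reached n_ways
    for tag in accesses:
        ys[tag] = set()                  # the accessed tag becomes youngest
        for other in ys:
            if other != tag:
                ys[other].add(tag)       # `tag` is now younger than `other`
                if len(ys[other]) >= n_ways:
                    evictable.add(other) # may have been evicted
    return set(ys) - evictable
```

In a 2-way cache, the trace a, b, a, b keeps both tags persistent, while in a, b, c, a the access to c can evict a, so only c remains persistent.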

Page 34

Known DFA persistence analyses only work with LRU caches.

Another technique is based on static scopes (LRU, FIFO): if during one execution of a program scope at most N elements are accessed, then all of them are persistent in an N-way cache.

Open problem (for all persistence analyses): how to find good program scopes? Functions and loops are obvious candidates. Which heuristics should be used?

Scope-Based Persistence Analysis

Page 35

Data cache analysis usually assumes that the address of the accessed elements is known, or lies within some small interval (e.g., if an array index is unknown).

Precision can be further improved by analyzing array indices and access patterns.

If the address is unknown, set-associative caches become less effective for analysis: the access may affect any set. Modularity?

To improve analysis results, cache locking or cache splitting can be used, disabling the cache for "unpredictable" accesses.

Data Cache Analysis Remarks

Page 36

Applying the Cache Categorizations to ILP

In integer linear programming (ILP) we typically calculate the WCET by maximizing Σ xi · ti, where
- ti ... execution time of CFG edge i (constant)
- xi ... execution frequency of CFG edge i (to be determined)

The hit and miss counts of the cache are modeled by additional flow variables: xi = xi,h + xi,m

Thus, the updated goal function is Σ xi,h · ti,h + Σ xi,m · ti,m

Page 37

Applying the Cache Categorizations to ILP (2)

Depending on the cache categorization of the memory reference at edge i, additional flow constraints are added:

- always hit [ah]: xi,m = 0
- always miss [am]: xi,h = 0
- global persistency [gp]: xi,h ≥ xi − 1
- local persistency [ps(S)]: xi,h ≥ xi − (Σ xk | edge k is an entry to context S)
- [nc]: no additional constraints are created
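Generating these constraints can be sketched as follows. This is an assumed helper, not from the slides; it emits the hit/miss split plus the category-specific constraint as plain LP-style strings:

```python
# Sketch (assumed helper): extra ILP flow constraints for the categorized
# memory reference at edge `edge`. `entry_edges` are the entries to scope S
# for the ps(S) category.
def cache_constraints(edge: str, category: str, entry_edges=()) -> list:
    h, m, x = f"x_{edge}_h", f"x_{edge}_m", f"x_{edge}"
    cons = [f"{x} = {h} + {m}"]          # split frequency into hits + misses
    if category == "ah":
        cons.append(f"{m} = 0")          # always hit: no misses
    elif category == "am":
        cons.append(f"{h} = 0")          # always miss: no hits
    elif category == "gp":
        cons.append(f"{h} >= {x} - 1")   # global persistency: one miss
    elif category.startswith("ps"):      # local persistency in scope S
        entries = " + ".join(f"x_{k}" for k in entry_edges)
        cons.append(f"{h} >= {x} - ({entries})")
    return cons                          # "nc": only the hit/miss split
```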

Page 38

Remarks on DFA-Based Cache Modeling

Persistence analysis is not necessary just to distinguish the first loop iteration from subsequent ones. For that, the CFG is virtually rewritten to separate the first loop iteration from the others (virtual loop unpeeling¹).

The separation of cache classification and WCET calculation in DFA-based cache analysis scales well compared to the integrated approach, where the cache classification was modeled as a cache conflict graph within the ILP problem.

¹ Sometimes called "virtual loop unrolling"

Page 39

Remarks on DFA-Based Cache Modeling (2)

- The DFA-based cache analysis works quite well for set-associative caches with the LRU (least recently used) replacement strategy:
  - LRU has the nice locality property that the content of one cache line is not affected by memory accesses that map to other cache lines.
- However, to improve hardware performance, much less predictable replacement strategies are often used:
  - ColdFire MCF 5307: pseudo-round-robin replacement
  - PowerPC 750/755: pseudo-LRU replacement

Page 40

Remarks on DFA-Based Cache Modeling (3)

Average performance of PRR and PLRU is similar to LRU, but predictability is much worse!

Analysis results with PLRU: MAY analysis does not yield any information at all (starting with an unknown cache, no block is ever found to be removed). MUST analysis provides some information, but less than for LRU: at most 4 blocks are found in each cache set (out of 8 blocks in practice). Still ongoing research (WCET'2010).

Pseudo-LRU (PLRU): the cache lines are the leaves of a tree, with a path bit placed on each inner node. The replacement line is determined by following, from the top, the path indicated by the path bits. On each regular access, the path bits along the accessed path are set to the other direction.

[Figure: a binary tree with path bits b0 (root), b1, b2, and b3 ... b6 above the leaves L0 ... L7.]
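The tree-bit mechanism can be sketched for an 8-way set. This is an assumed toy model, not from the slides: the 7 path bits are stored in binary-heap order, a bit points toward the next victim, and an access flips the bits on its path to point away from the accessed line:

```python
# Sketch (assumed): tree-based PLRU for one 8-way cache set.
WAYS = 8

def plru_victim(bits: list) -> int:
    """Follow the path bits from the root to the pseudo-LRU leaf."""
    node = 0
    for _ in range(3):                       # 3 tree levels for 8 ways
        node = 2 * node + 1 + bits[node]     # 0 -> go left, 1 -> go right
    return node - (WAYS - 1)                 # heap index -> leaf index 0..7

def plru_touch(bits: list, way: int) -> None:
    """On an access, point every bit on the leaf's path the other way."""
    node = way + WAYS - 1                    # leaf's heap index
    while node:
        parent = (node - 1) // 2
        went_right = node == 2 * parent + 2
        bits[parent] = 0 if went_right else 1
        node = parent
```

This illustrates why PLRU is hard to analyze: touching one line rewrites path bits shared with other lines, so an access can change which of the other 7 lines is the next victim.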

Page 41

Remarks on DFA-Based Cache Modeling (4)

Pseudo-round-robin (PRR): on a 4-way set-associative cache, a two-bit replacement counter is used. This counter is shared by all cache lines and is only modified (increased mod 4) on a replacement. Thus, each cache line has an influence on the others!

Analysis results with PRR: MAY analysis does not yield any information at all (without counter or age information, one can never know which block is removed from the cache). MUST analysis provides only little information, much less than for LRU: when a block b is accessed, it goes into the cache, but without counter or age information we do not know which block is removed ⇒ all elements currently in the set must be removed from the abstract state (only 1 out of possibly 4 elements can be found to be in the cache). With PRR, only 1 way is effectively used.

FIFO caches: cache hit/miss classification is difficult (ECRTS'10).
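The shared-counter effect can be demonstrated with a toy model (an assumed sketch, not from the slides): because the 2-bit counter is global, a miss in one set changes which way the next miss in a different set replaces:

```python
# Sketch (assumed): pseudo-round-robin replacement with one 2-bit counter
# shared by ALL sets; the counter advances (mod 4) only on a replacement.
class PRRCache:
    def __init__(self, sets: int = 4, ways: int = 4):
        self.lines = [[None] * ways for _ in range(sets)]
        self.counter = 0              # shared replacement counter
        self.ways = ways

    def access(self, s: int, tag) -> bool:
        if tag in self.lines[s]:
            return True                        # hit: no state change
        self.lines[s][self.counter] = tag      # miss: replace counter's way
        self.counter = (self.counter + 1) % self.ways
        return False
```

In the test below, the miss in set 1 advances the shared counter, so the next miss in set 0 lands in way 2 instead of way 1: exactly the cross-set coupling that defeats a per-set abstract analysis.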

Page 42

Summary & Discussion

Topic of this lecture: cache access classification
Abstract interpretation: DFA + abstract cache states
Cache hit/miss classification: MUST/MAY analysis, for instruction caches
Replacement policies: most work has been published on LRU (also applicable to direct-mapped caches); FIFO, PLRU, and PRR are less predictable.

Discussion: Preemption? Unpredictable accesses? Alternatives (scratchpad)?

Page 43

References

1. CMHC: Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. Fast and Precise WCET Prediction by Separate Cache and Path Analyses. Real-Time Systems 18(2/3), Kluwer, 2000.¹
2. Data-Cache Analysis: Bach Khoa Huynh, Lei Ju, and Abhik Roychoudhury. Scope-Aware Data Cache Analysis for WCET Estimation. Proc. IEEE RTAS '11, 2011.
3. FIFO Cache Analysis: Daniel Grund and Jan Reineke. Precise and Efficient FIFO-Replacement Analysis Based on Static Phase Detection. Proc. 22nd Euromicro Conference on Real-Time Systems (ECRTS '10), 2010.

¹ For persistence analysis, refer to [2], not [1]

Page 44

References

1. Preemption: Chang-Gun Lee, Joosun Hahn, Yang-Min Seo, Sang Lyul Min, Rhan Ha, Seongsoo Hong, Chang Yun Park, Minsuk Lee, and Chong Sang Kim. Analysis of Cache-Related Preemption Delay in Fixed-Priority Preemptive Scheduling. IEEE Trans. Comput. 47(6), June 1998.
2. Abstract Interpretation: Julien Bertrane, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, and Xavier Rival. Static Analysis and Verification of Aerospace Software by Abstract Interpretation. Paper 2010-3385, American Institute of Aeronautics and Astronautics (AIAA), 2010.

Page 45

Extra Material SS 2011

Page 46

Exercise: 2-way set-assoc cache: MUST, MAY, PS

Compiled from, e.g.:

x, y, z = a, b, 0
while (x > 0 && y > 0)
  { z += x-- + y-- }
x, y = 0, 0

[Figure: CFG from START to END; each instruction is annotated with its (tag, set, offset) address: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1), and (2,0,0).]