7/27/2019 Chap4 Caching Testing
1/20
Chapter 4 (continued):
Caching;
Testing Memory Modules
fig_04_30
Memory organization: typical memory map (note provision for power loss)
fig_04_31
Memory hierarchy
fig_04_32
Paging / Caching
Why it typically works: locality of reference (spatial/temporal); working set.
Note: in real-time embedded systems, behavior may be atypical, but caching may still be a useful technique.
Here we consider caching external to the CPU; the CPU may have one or more levels of caching built in.
fig_04_33
Typical memory system with cache: the hit rate (miss rate) is important.
(Remember: the registers appear here as well.)
Basic caching strategies:
-- Direct-mapped
-- Associative
-- Block-set associative
Questions:
-- what is associative memory?
-- what is the overhead?
-- what is the efficiency (hit rate)?
-- is a bigger cache better?
Associative memory: storage location is related to the data stored.
Example -- hashing:
-- When a software program is compiled or assembled, a symbol table must be created to link addresses with symbolic names
-- the table may be large; even binary search of the names may be too slow
-- convert each name to a number associated with the name; this number will be the symbol table index
For example, let a = 1, b = 2, c = 3, ...
Then cab has value 1 + 2 + 3 = 6
ababab has value 3 * (1 + 2) = 9
And vvvvv has value 5 * 22 = 110
The address will be taken modulo a prime p; if we expect about 50 unique identifiers, we can take p = 101 (make storage about twice as large as the number of items to be stored, to reduce collisions).
Now the array of names in the symbol table will look like:
0 --->
1 --->
2 --->
. . .
6 ---> cab
. . .
9 ---> ababab ---> vvvvv
Here there is one collision, at address 9 (since 110 mod 101 = 9); the two items are stored in a linked list.
Access time for an identifier is then nearly constant on average.
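The scheme above can be sketched in a few lines of Python (the helper names `name_value`, `bucket`, and `insert` are illustrative, not from the slides):

```python
# Sketch of the slide's hashing scheme: each letter maps to its
# position in the alphabet; a name hashes to the sum of its letter
# values, taken modulo a prime p. Collisions are chained in lists.

P = 101  # prime, roughly twice the expected ~50 identifiers

def name_value(name):
    """Sum of letter values: a=1, b=2, ..., z=26."""
    return sum(ord(c) - ord('a') + 1 for c in name)

def bucket(name):
    return name_value(name) % P

# Symbol table: array of buckets, each a list (a chain resolves collisions).
table = [[] for _ in range(P)]

def insert(name):
    table[bucket(name)].append(name)

for n in ("cab", "ababab", "vvvvv"):
    insert(n)

print(bucket("cab"))     # 6
print(bucket("ababab"))  # 9
print(bucket("vvvvv"))   # 110 % 101 = 9 -> collision, chained
print(table[9])          # ['ababab', 'vvvvv']
```

On a lookup, only the (short) chain in one bucket is scanned, which is why access time is nearly constant on average.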
Caching: the basic process (note the OVERHEAD for each task):
-- the program needs information M that is not in the CPU
-- the cache is checked for M
   (how do we know whether M is in the cache?)
-- hit: M is in the cache and can be retrieved and used by the CPU
-- miss: M is not in the cache (M is in RAM or in secondary memory)
   (where is M?)
   * M must be brought into the cache
   * if there is room, M is copied into the cache
     (how do we know whether there is room?)
   * if there is no room, some other information M' must be overwritten
     (how do we select M'?)
     ++ if M' has not been modified, overwrite it
        (how do we know whether M' has been modified?)
     ++ if M' has been modified, its changes must first be saved
        (how do we save the changes to M'?)
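The decision chain above can be modeled in software. This is a hedged sketch only (real caches do this in hardware); the structures and the placeholder victim-selection policy are assumptions, not the slides' design:

```python
# Model of the miss-handling decisions: each cache entry carries a
# dirty bit so a modified victim is written back before being reused.

CACHE_SIZE = 4
cache = {}                                  # address -> {"data", "dirty"}
main_memory = {a: a * 10 for a in range(16)}  # toy backing store

def access(addr, write=False, value=None):
    if addr in cache:                       # hit: use the cached entry
        entry = cache[addr]
    else:                                   # miss: bring M into the cache
        if len(cache) >= CACHE_SIZE:        # no room: select a victim M'
            victim = next(iter(cache))      # placeholder FIFO-ish policy
            if cache[victim]["dirty"]:      # M' modified: save changes first
                main_memory[victim] = cache[victim]["data"]
            del cache[victim]
        entry = cache[addr] = {"data": main_memory[addr], "dirty": False}
    if write:                               # CPU modifies M: set dirty bit
        entry["data"] = value
        entry["dirty"] = True
    return entry["data"]
```

For example, after `access(2, write=True, value=99)` and enough later misses to evict address 2, `main_memory[2]` becomes 99: the dirty victim is written back.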
fig_04_34
Example: direct mapping
32-bit words; the cache holds 64K words, in 128 blocks of 0.5K words each.
Memory addresses are 32 bits.
Main memory: 128M words; 2K pages, each holding 128 blocks (mirroring the cache).
fig_04_35
fig_04_36
Address fields: 2 bits -- byte within word; 9 bits -- word within block;
7 bits -- block address (index);
11 (of 15) -- tag (the page the block came from).
Tag table: 128 entries (one for each block in the cache). Each entry contains:
-- Tag: the page the block came from
-- Valid bit: does this block contain data?
Write policies:
-- write-through: any change is propagated immediately to main memory
-- delayed write (write-back): since this data may change again soon, do not propagate the change to main memory immediately; this saves overhead. Instead, set the dirty bit
-- intermediate: use a queue, and update main memory periodically
When a new block is brought in, if the valid bit is true and the dirty bit is true, the old block must first be copied back into main memory.
Replacement algorithm: none; each block has only one valid cache location.
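The field widths in this example can be checked with a short sketch (the function name `split_address` is illustrative):

```python
# Splitting a 32-bit byte address per the direct-mapped example:
# low 2 bits select the byte within a word, the next 9 bits the word
# within a 0.5K-word block, the next 7 bits the cache block index,
# and the remaining high bits form the tag.

def split_address(addr):
    byte  = addr & 0x3            # 2 bits: byte within word
    word  = (addr >> 2) & 0x1FF   # 9 bits: word within block
    index = (addr >> 11) & 0x7F   # 7 bits: cache block index
    tag   = addr >> 18            # remaining high bits: tag
    return tag, index, word, byte

# Rebuild an address from known fields, then split it again:
addr = (5 << 18) | (3 << 11) | (100 << 2) | 2
print(split_address(addr))   # (5, 3, 100, 2)
```

Because the index field is fixed by the address, each block has exactly one possible cache location, which is why no replacement algorithm is needed.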
fig_04_37
Problem with direct mapping: two frequently used parts of the code can be in different "block 0"s, so repeated swapping would be necessary; this can degrade performance unacceptably, especially in real-time systems (similar to thrashing in an operating system's virtual memory system).
Another method -- associative mapping: put a new block anywhere in the cache; now we need an algorithm to decide which block should be removed if the cache is full.
fig_04_38
Step 1: locate the desired block within the cache; the tag table must be searched. Linear search may be too slow; search all entries in parallel, or use hashing.
Step 2: on a miss, decide which block to replace:
a. Add the time accessed to the tag-table info and use temporal locality:
   -- Least recently used (LRU), a FIFO-type algorithm
   -- Most recently used (MRU), a LIFO-type algorithm
b. Choose a block at random.
Drawbacks: long search times; the complexity and cost of the supporting logic.
Advantages: more flexibility in managing cache contents.
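As a software analogue of the LRU policy in Step 2a, here is a minimal sketch of a fully associative cache with LRU replacement (the class name and `load` callback are assumptions for illustration; hardware would do this with timestamps in the tag table):

```python
# LRU replacement sketch: an ordered dict tracks recency, with the
# oldest (least recently used) entry first; that entry is the victim.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()        # tag -> data, oldest first

    def access(self, tag, load=lambda t: None):
        if tag in self.entries:             # hit: mark as most recent
            self.entries.move_to_end(tag)
            return self.entries[tag]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # miss, full: evict LRU block
        self.entries[tag] = load(tag)       # bring the new block in
        return self.entries[tag]
```

With capacity 2, accessing A, B, A, C evicts B (A was touched more recently), exactly the temporal-locality behavior the slide describes.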
fig_04_39
Intermediate method: block-set associative cache.
Each index now specifies a set of blocks.
Main memory is divided into m blocks organized into n groups; block b belongs to group b mod n.
The cache set number corresponds to the main memory group number: a block from main memory group j can go into cache set j.
Search time is less, since the search space is smaller.
How many blocks per set? Determined by simulation (one rule of thumb: doubling the associativity is roughly equivalent to doubling the cache size; more than 4-way is probably not efficient).
Two-way set-associative scheme.
Example: 256K memory -- 64 groups, 512 blocks

Blocks                          Group (block mod 64)
0    64   128  . . .  384  448  0
1    65   129  . . .  385  449  1
2    66   130  . . .  386  450  2
. . .
63   127  191  . . .  447  511  63
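The mapping behind this table is a one-line computation (sizes taken from the example):

```python
# Block-to-group mapping from the example: 512 main-memory blocks,
# 64 groups; block b belongs to group b % 64.
N_GROUPS = 64

def group_of(block):
    return block % N_GROUPS

# First few blocks in group 63 -- note 191, not 192:
print([b for b in range(512) if group_of(b) == 63][:3])   # [63, 127, 191]
```

Note that block 192 belongs to group 0 (192 = 3 * 64), so each row of the table advances in steps of 64.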
fig_04_40
Dynamic memory allocation (virtual storage):
-- for programs larger than main memory
-- for multiple processes in main memory
-- for multiple programs in main memory
General strategies may not work well because of the hard deadlines of real-time systems in embedded applications; general strategies are nondeterministic.
Simple setup: can swap processes/programs and their contexts.
-- Need storage (may be in firmware)
-- Need a small swap time compared to the run time
-- Need determinism
Ex: chemical processing, thermal control.
fig_04_41
Overlays (pre-virtual storage):
Segment the program into one main section and a set of overlays (kept in ROM?). Swap the overlays in and out.
Choose the segmentation carefully to prevent thrashing.
fig_04_42
Multiprogramming: similar to paging.
Fixed partition size: can cause memory fragmentation.
Example: if each partition is 2K and we have 3 jobs,
J1 = 1.5K, J2 = 0.5K, J3 = 2.1K,
allocated to successive partitions (4 in all):
-- J2 is using only 0.5K of its partition
-- J3 is using 2 partitions, one of which holds only 0.1K
If a new job of size 1K enters the system, there is no place for it, even though there is actually enough unused memory for it.
Variable partition size:
Use a scheme like paging; include compaction.
Choose the parameters carefully to prevent thrashing.
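The fragmentation arithmetic in the fixed-partition example can be checked directly (sizes in KB, taken from the example above):

```python
# Fixed-partition fragmentation: each job occupies a whole number of
# 2K partitions, so the unused remainder inside each partition is wasted.
import math

PARTITION = 2.0               # partition size in KB
jobs = [1.5, 0.5, 2.1]        # J1, J2, J3

partitions_used = sum(math.ceil(j / PARTITION) for j in jobs)   # 1 + 1 + 2
internal_waste = partitions_used * PARTITION - sum(jobs)        # 8.0 - 4.1

print(partitions_used)           # 4
print(round(internal_waste, 1))  # 3.9 KB wasted internally
```

So 3.9K of memory sits unused inside allocated partitions, yet a new 1K job cannot be placed, since no whole partition is free.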
fig_04_43
Memory testing:
Components and basic architecture
fig_04_45
Faults to test: data and address lines; stuck-at and bridging
(if we assume no internal manufacturing defects)
fig_04_49
ROM testing:
stuck-at faults, bridging faults, correct data stored
Method: CRC (cyclic redundancy check) or signature analysis.
Use an LFSR to compress a data stream into a K-bit pattern, similar to error checking.
(Q: how is error checking done?)
The ROM contents are modeled as an N*M-bit data stream, N = address size (number of words), M = word size.
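Signature analysis can be sketched as follows. This is a software model under assumptions: the register width and tap positions are illustrative, not from the slides (real signature analyzers fix a particular polynomial):

```python
# Signature analysis sketch: shift the ROM bit stream through a k-bit
# LFSR; the final register contents are the "signature", compared
# against the known-good value computed from a correct ROM.

def lfsr_signature(bits, k=8, taps=(0, 2, 3, 4)):
    reg = 0
    for b in bits:
        feedback = b
        for t in taps:                 # XOR tapped register bits into input
            feedback ^= (reg >> t) & 1
        reg = ((reg << 1) | feedback) & ((1 << k) - 1)
    return reg

rom = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]   # toy ROM bit stream
good = lfsr_signature(rom)

# A single stuck-at fault changes the compressed signature:
faulty = rom.copy()
faulty[5] = 1
print(lfsr_signature(faulty) != good)   # True
```

The LFSR compresses an arbitrarily long stream into k bits, so distinct contents can in principle collide (aliasing); with a well-chosen polynomial the probability of a fault going undetected is roughly 2^-k.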
Error checking: simple examples
1. Detect a one-bit error: add a parity bit.
2. Correct a 1-bit error: Hamming code.
Example: send m message bits + r parity bits. The number of possible error positions is m + r + 1 (any of the m + r bits, or no error), so we need 2^r >= m + r + 1. If m = 8, we need r = 4; parity bit r_i checks the parity of the bits whose (1-based) position has bit i set in its binary representation.
Pattern:
Bit #:  1   2   3   4   5   6   7   8   9   10  11  12
Info:   r0  r1  m1  r2  m2  m3  m4  r3  m5  m6  m7  m8
Value:  --  --  1   --  1   0   0   --  0   1   1   1
Set parity = 0 (even) for each group:
r0: bits 1, 3, 5, 7, 9, 11:   r0 + 1 + 1 + 0 + 0 + 1 -> r0 = 1
r1: bits 2, 3, 6, 7, 10, 11:  r1 + 1 + 0 + 0 + 1 + 1 -> r1 = 1
r2: bits 4, 5, 6, 7, 12:      r2 + 1 + 0 + 0 + 1 -> r2 = 0
r3: bits 8, 9, 10, 11, 12:    r3 + 0 + 1 + 1 + 1 -> r3 = 1
Exercise: suppose the message is sent and 1 bit is flipped in the received message. Recompute the parity checks to see which bit is incorrect (the positions of the failing checks, summed, give the position of the flipped bit).
Addition: add an overall parity bit to the end of the message to also detect two errors.
Note:
a. This is just one example; a more general formulation of Hamming codes using finite-field arithmetic can also be given.
b. This is one example of how error-correcting codes can be obtained; there are many more complex examples, e.g., the Reed-Solomon codes used in CD players.
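The worked example above can be verified in code (the function names are illustrative; this implements exactly the slide's even-parity Hamming(12,8) layout):

```python
# Hamming(12,8) per the example above: parity bits at positions
# 1, 2, 4, 8; the parity bit at position p covers every position whose
# binary representation includes p. Even parity throughout.

def hamming_encode(msg):
    """msg: 8 message bits m1..m8 -> 12-bit codeword (positions 1..12)."""
    assert len(msg) == 8
    code = [0] * 13                       # index 0 unused; positions 1..12
    for p, bit in zip((3, 5, 6, 7, 9, 10, 11, 12), msg):
        code[p] = bit
    for p in (1, 2, 4, 8):                # set each parity bit to even parity
        code[p] = sum(code[i] for i in range(1, 13) if i & p) % 2
    return code[1:]

def hamming_syndrome(code):
    """0 if all checks pass; otherwise the position of the flipped bit."""
    code = [0] + list(code)
    return sum(p for p in (1, 2, 4, 8)
               if sum(code[i] for i in range(1, 13) if i & p) % 2)

word = hamming_encode([1, 1, 0, 0, 0, 1, 1, 1])   # the slide's message
print(word)                   # [1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

received = word.copy()
received[6 - 1] ^= 1          # flip bit at position 6
print(hamming_syndrome(received))   # 6 -> checks r1 and r2 fail, 2 + 4 = 6
```

The encoded word reproduces the slide's parity bits (r0 = 1, r1 = 1, r2 = 0, r3 = 1), and the syndrome of the corrupted word points directly at the flipped position.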