Computer Architecture
Chapter 5: Memory Hierarchy Design
Chapter Overview
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples of Virtual Memory
Introduction
The Big Picture: Where are We Now?
The Five Classic Components of a Computer:
Processor (Control + Datapath), Memory, Input, Output
Topics In This Chapter:
SRAM Memory Technology
DRAM Memory Technology
Memory Organization
Levels of the Memory Hierarchy
Level        | Capacity   | Access time           | Cost                     | Staging/transfer unit      | Managed by
Registers    | 100s bytes | 1s ns                 |                          | instr. operands, 1-8 bytes | program/compiler
Cache        | K bytes    | 4 ns                  | 1-0.1 cents/bit          | blocks, 8-128 bytes        | cache controller
Main memory  | M bytes    | 100-300 ns            | 0.0001-0.00001 cents/bit | pages, 512-4K bytes        | OS
Disk         | G bytes    | 10 ms (10,000,000 ns) | 10^-5 to 10^-6 cents/bit | files, Mbytes              | user/operator
Tape         | infinite   | sec-min               | 10^-8 cents/bit          |                            |
Moving toward the upper level (registers), each level is faster; moving toward the lower level (tape), each level is larger.
The ABCs of Caches
In this section we will:
Learn lots of definitions about caches – you can't talk about something until you have the vocabulary for it (this is true in computer science at least!)
Answer four fundamental questions about caches:
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
Cache Memory
The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU rather than only in main memory. Although cache is much smaller than main memory, its access time is a fraction of that of main memory. Unlike main memory, which is accessed by address, cache is typically searched by content (by comparing stored tags); hence, it is often called content addressable memory. Because of this, a single large cache memory isn't always desirable: it takes longer to search.
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module
Cache/Main Memory Structure
Cache operation – overview
CPU requests contents of a memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory into cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache slot
Cache Read Operation - Flowchart
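The read flow above can be sketched in a few lines of Python. This is an illustrative toy (the `Cache` class and its parameters are invented for the sketch, not from the text): check the slot the address maps to, compare tags, and on a miss fetch the whole block from "memory" before delivering the word.

```python
# Toy sketch of the cache read path described above (illustrative names).
class Cache:
    def __init__(self, num_slots=4, block_size=4):
        self.num_slots = num_slots
        self.block_size = block_size
        self.slots = {}                        # slot index -> (tag, block data)

    def read(self, memory, addr):
        block_num = addr // self.block_size
        index = block_num % self.num_slots     # which cache slot to check
        tag = block_num // self.num_slots      # identifies the memory block
        entry = self.slots.get(index)
        if entry is not None and entry[0] == tag:
            data, hit = entry[1], True         # present: serve from cache (fast)
        else:                                  # not present: fetch whole block
            base = block_num * self.block_size
            data = [memory[base + i] for i in range(self.block_size)]
            self.slots[index] = (tag, data)    # fill cache, then deliver
            hit = False
        return data[addr % self.block_size], hit

memory = list(range(64))
cache = Cache()
value, hit = cache.read(memory, 10)    # first access to the block: miss
value2, hit2 = cache.read(memory, 11)  # same block: hit
```

The second access hits because address 11 falls in the same 4-byte block already brought in by the first access, which is exactly the spatial-locality payoff the flowchart captures.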
Comparison of Cache Sizes

Processor       | Type                          | Year | L1 cache(a)   | L2 cache       | L3 cache
IBM 360/85      | Mainframe                     | 1968 | 16 to 32 KB   | —              | —
PDP-11/70       | Minicomputer                  | 1975 | 1 KB          | —              | —
VAX 11/780      | Minicomputer                  | 1978 | 16 KB         | —              | —
IBM 3033        | Mainframe                     | 1978 | 64 KB         | —              | —
IBM 3090        | Mainframe                     | 1985 | 128 to 256 KB | —              | —
Intel 80486     | PC                            | 1989 | 8 KB          | —              | —
Pentium         | PC                            | 1993 | 8 KB/8 KB     | 256 to 512 KB  | —
PowerPC 601     | PC                            | 1993 | 32 KB         | —              | —
PowerPC 620     | PC                            | 1996 | 32 KB/32 KB   | —              | —
PowerPC G4      | PC/server                     | 1999 | 32 KB/32 KB   | 256 KB to 1 MB | 2 MB
IBM S/390 G4    | Mainframe                     | 1997 | 32 KB         | 256 KB         | 2 MB
IBM S/390 G6    | Mainframe                     | 1999 | 256 KB        | 8 MB           | —
Pentium 4       | PC/server                     | 2000 | 8 KB/8 KB     | 256 KB         | —
IBM SP          | High-end server/supercomputer | 2000 | 64 KB/32 KB   | 8 MB           | —
CRAY MTA(b)     | Supercomputer                 | 2000 | 8 KB          | 2 MB           | —
Itanium         | PC/server                     | 2001 | 16 KB/16 KB   | 96 KB          | 4 MB
SGI Origin 2001 | High-end server               | 2001 | 32 KB/32 KB   | 4 MB           | —
Itanium 2       | PC/server                     | 2002 | 32 KB         | 256 KB         | 6 MB
IBM POWER5      | High-end server               | 2003 | 64 KB         | 1.9 MB         | 36 MB
CRAY XD-1       | Supercomputer                 | 2004 | 64 KB/64 KB   | 1 MB           | —

(a) Two values separated by a slash refer to instruction and data caches.
(b) Both caches are instruction only; no data caches.
The Principle of Locality
The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
Three different types of locality:
Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
Sequential Locality: instructions tend to be executed in sequential program order, except at branch instructions
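A simple array-summing loop exhibits all of these at once. In this sketch (the array and its size are arbitrary choices, not from the text), `data[i]` and `data[i+1]` are adjacent in memory (spatial locality), the loop body is straight-line code executed in order (sequential locality), and `total` and `i` are reused every iteration (temporal locality).

```python
# Illustration of the locality types described above.
data = list(range(1000))

total = 0
for i in range(len(data)):   # sequential, straight-line access pattern
    total += data[i]         # data[i] and data[i+1] sit at adjacent addresses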
The ABCs of Caches Definitions
A few terms
Inclusion property
Coherence property
Access frequency
Access time
Cycle time
Latency
Bandwidth
Capacity
Unit of transfer
Memory Hierarchy: Terminology
Hit: data appears in some block in the upper level (example: Block X)
Hit Rate: the fraction of memory accesses found in the upper level
Hit Time: time to access the upper level, which consists of upper-level access time + time to determine hit/miss
Miss: data must be retrieved from a block in the lower level (Block Y)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
Consider a memory with three levels. The average memory access time (assuming a hit at the 3rd level) is
h1*t1 + (1 - h1)*[t1 + h2*t2 + (1 - h2)*(t2 + t3)]
where t1, t2 and t3 are the access times and h1, h2 the hit rates at the three levels.
Access frequency of level Mi: fi = (1 - h1)(1 - h2)…(1 - h(i-1)) * hi
Effective access time = Σ (fi * ti)
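Plugging numbers into the three-level formula above makes it concrete. The hit rates and access times below are assumed for illustration; they are not from the text.

```python
# Numerical sketch of the three-level average-memory-access-time formula:
# AMAT = h1*t1 + (1-h1)*(t1 + h2*t2 + (1-h2)*(t2 + t3))
h1, h2 = 0.90, 0.95       # assumed hit rates at levels 1 and 2
t1, t2, t3 = 1, 10, 100   # assumed access times (cycles) at each level

amat = h1 * t1 + (1 - h1) * (t1 + h2 * t2 + (1 - h2) * (t2 + t3))
# With these parameters the average access costs 2.5 cycles, even though
# 90% of accesses complete in a single cycle: misses dominate the average.
```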
Cache Measures
Hit rate: the fraction of accesses found in that level. It is usually so high that we talk about the miss rate instead.
Average memory-access time = Hit time + Miss rate x Miss penalty (ns or clocks)
Miss penalty: time to replace a block from the lower level, including time to deliver it to the CPU
access time: time to reach the lower level = f(latency to lower level)
transfer time: time to transfer the block = f(bandwidth between upper & lower levels)
Measures
CPU Execution Time = (CPU Clock Cycles + Memory Stall Cycles) * Clock Cycle Time
CPU clock cycles include cache hits; the CPU is stalled during a miss.
Memory Stall Cycles = Number of misses * Miss penalty
= IC * (Misses / Instruction) * Miss penalty
= IC * (Memory Accesses / Instruction) * Miss Rate * Miss penalty
Miss rates and miss penalties are different for reads and writes:
Memory Stall Cycles = IC * (Reads / Instruction) * Read miss rate * Read miss penalty + IC * (Writes / Instruction) * Write miss rate * Write miss penalty
Misses / Instruction = (Miss rate * Memory Accesses) / Instruction Count = Miss rate * (Memory Accesses / Instruction)
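A worked example of the equations above, with assumed parameter values (instruction count, CPI, miss rate, and penalty are all invented for illustration):

```python
# Worked example of the CPU-time and memory-stall-cycle equations above.
IC = 1_000_000            # instruction count (assumed)
cpi_base = 1.0            # CPU clock cycles per instruction, hits included
accesses_per_instr = 1.5  # memory accesses per instruction
miss_rate = 0.02
miss_penalty = 100        # cycles per miss
clock_cycle = 1e-9        # seconds (1 GHz clock)

memory_stall_cycles = IC * accesses_per_instr * miss_rate * miss_penalty
cpu_time = (IC * cpi_base + memory_stall_cycles) * clock_cycle
```

With these numbers the stall cycles (3 million) are three times the base execution cycles, which is why reducing miss rate and miss penalty gets three sections of this chapter.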
Typical Cache Organization
Simplest Cache: Direct Mapped
Consider a 4-byte direct-mapped cache in front of a 16-location memory (addresses 0 through F), with cache indices 0 through 3.
Location 0 of the cache can be occupied by data from memory location 0, 4, 8, C, etc.
In general: any memory location whose 2 LSBs of the address are 0s
Address<1:0> => cache index
Which one should we place in the cache?
How can we tell which one is in the cache?
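Both questions are answered by splitting the address: the low bits pick the slot, and the remaining bits (the tag) tell the competing locations apart. A minimal sketch for the 4-entry cache above:

```python
# The mapping described above: in a 4-entry direct-mapped cache, the two
# least-significant address bits select the cache index, and the remaining
# bits form the tag that distinguishes the blocks competing for that slot.
def index_and_tag(addr):
    return addr & 0b11, addr >> 2   # index = addr mod 4, tag = the rest

# Memory locations 0, 4, 8, 12 all compete for cache index 0,
# but each carries a distinct tag (0, 1, 2, 3):
mappings = {a: index_and_tag(a) for a in (0, 4, 8, 12)}
```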
Example: block 12 of a 32-block memory (block-frame addresses 0 through 31) is placed in an 8-block cache under three organizations: fully associative, direct mapped, and 2-way set associative.
Set-associative mapping: set = block number modulo number of sets.
Direct mapped: block 12 can go only into cache block 4 (12 mod 8).
2-way set associative: block 12 can go anywhere in set 0 (12 mod 4); the cache's 8 blocks form sets 0 through 3, two blocks per set.
Fully associative: block 12 can go anywhere among cache blocks 0 through 7.
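The three placements above can be computed directly (a small illustrative sketch; the 8-block cache with 4 sets of 2 is the layout from the example):

```python
# Placements of memory block 12 in an 8-block cache, per the example above.
block = 12
direct_mapped_slot = block % 8       # direct mapped: only cache block 4
set_assoc_set = block % 4            # 2-way: anywhere in set 0 (4 sets of 2)
fully_assoc_slots = list(range(8))   # fully associative: any of the 8 blocks
```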
The ABCs of Caches: How is a block found if it is in the cache?
The address is divided into three fields: Tag | Index | Byte Offset.
Each entry in the cache stores its data words together with a tag; the tag on each block identifies which memory block it holds.
There is no need to check the index or block offset against the stored entry; only the tag is compared.
Cache Memory
Take advantage of spatial locality by storing multiple words per block.
In a schematic four-block cache: block 0 contains multiple words from main memory, identified with the tag 00000000; block 1 contains words identified with the tag 11110101. The other two blocks are not valid.
As an example, suppose a program generates the address 1AA. In 14-bit binary, this number is: 00000110101010.The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block.
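The field extraction for 0x1AA can be written as three shift-and-mask operations, following the 7/4/3 split described above:

```python
# Splitting the 14-bit address 0x1AA into tag / block / word fields
# (7 bits / 4 bits / 3 bits), exactly as described above.
addr = 0x1AA                  # 426 decimal = 0b00000110101010
word  = addr & 0b111          # low 3 bits: word within the block -> 010
block = (addr >> 3) & 0b1111  # next 4 bits: block field          -> 0101
tag   = addr >> 7             # top 7 bits: tag                   -> 0000011
```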
Cache Organizations I: Direct-Mapped Cache
Example: a 1 KB direct-mapped cache with 32-byte blocks (32 entries, holding Byte 0 through Byte 1023 in total). The 32-bit address is divided into:
Byte Select, bits <4:0> (example: 0x00): selects one of Byte 0 through Byte 31 within the block
Cache Index, bits <9:5> (example: 0x01): selects one of the 32 cache entries
Cache Tag, bits <31:10> (example: 0x50): stored as part of the cache "state" together with a Valid bit, and compared against the tag field of the address on each access
The block address is the tag plus the index, bits <31:5>.
Direct Mapping Address Structure
Tag (s-r bits) | Line or Slot (r bits) | Word (w bits)  =  8 | 14 | 2
24-bit address; 2-bit word identifier (4-byte block); 22-bit block identifier, split into an 8-bit tag (= 22 - 14) and a 14-bit slot or line.
No two blocks that map to the same line have the same Tag field.
Check the contents of the cache by finding the line and comparing the Tag.

Direct Mapping Cache Line Table
Cache line | Main memory blocks held
0          | 0, m, 2m, 3m, … 2^s - m
1          | 1, m+1, 2m+1, … 2^s - m + 1
…
m-1        | m-1, 2m-1, 3m-1, … 2^s - 1
Direct Mapping Cache Organization
Direct Mapping pros & cons
Pros: simple; inexpensive; fixed location for a given block.
Con: if a program repeatedly accesses 2 blocks that map to the same line, the miss rate is very high.
Associative Mapping
A main memory block can load into any line of the cache.
The memory address is interpreted as tag and word; the tag uniquely identifies a block of memory.
Every line's tag is examined for a match, which makes cache searching expensive.

Fully Associative Cache Organization
Associative Mapping Address Structure: Tag (22 bits) | Word (2 bits)
A 22-bit tag is stored with each 32-bit block of data; compare the tag field of the address with each tag entry in the cache to check for a hit. The least significant 2 bits of the address identify which byte is required from the 4-byte (32-bit) data block.
e.g.
Address | Tag (top 22 bits) | Data     | Cache line
FFFFFC  | 3FFFFF            | 24682468 | 3FFF
Cache Organizations II: Set Associative Cache
Example: a 1 KB two-way set-associative cache with 32-byte blocks and 16 sets (Set 0 through Set 15). The address is divided into:
Byte Select, bits <4:0> (example: 0x00)
Cache Index, bits <8:5> (example: 0x01, i.e. block address mod 16): selects the set
Cache Tag, bits <31:9> (example: 0x50): stored with a Valid bit in each of the two ways of the set, and compared in parallel against the address tag
Each way of a set holds Byte 0 through Byte 31 of its block; the block address is bits <31:5>.
Set Associative Mapping
The cache is divided into a number of sets; each set contains a number of lines. A given block maps to any line in one given set, e.g. block B can be in any line of set i.
With 2 lines per set (2-way associative mapping), a given block can be in one of 2 lines in only one set.

Set Associative Mapping Example
13-bit set number; the set is the block number in main memory modulo 2^13, so addresses 000000, 00A000, 00B000, 00C000 … map to the same set.
Two Way Set Associative Cache Organization
Set Associative Mapping Address Structure
Tag (9 bits) | Set (13 bits) | Word (2 bits)
Use the set field to determine which cache set to look in, then compare the tag field to see if we have a hit, e.g.:
Address | Tag | Data     | Set number
FFFFFC  | 1FF | 12345678 | 1FFF
00FFFC  | 001 | 11223344 | 1FFF
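The 9/13/2 split above can be checked with shifts and masks; the sketch below confirms that both example addresses land in set 0x1FFF with different tags (so they can coexist in a set-associative cache, unlike in a direct-mapped one):

```python
# Decompose a 24-bit address into 9-bit tag / 13-bit set / 2-bit word,
# matching the address structure above.
def fields(addr):
    word = addr & 0b11
    set_no = (addr >> 2) & 0x1FFF
    tag = addr >> 15
    return tag, set_no, word

a = fields(0xFFFFFC)   # tag 0x1FF, set 0x1FFF, word 0
b = fields(0x00FFFC)   # tag 0x001, set 0x1FFF, word 0 -- same set, new tag
```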
Two Way Set Associative Mapping Example
Replacement Algorithms (1): Direct Mapping
No choice: each block maps to only one line, so replace that line.

Replacement Algorithms (2): Associative & Set Associative
The algorithm must be implemented in hardware (for speed). Common policies:
Least Recently Used (LRU): e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
First In First Out (FIFO): replace the block that has been in the cache longest
Least Frequently Used (LFU): replace the block which has had the fewest hits
Random
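The LRU bookkeeping for one set can be sketched with an ordered list: a hit moves the block to the most-recent end, so the victim is always at the front. This is an illustrative software model (real hardware uses a use-bit per pair in the 2-way case), with invented names:

```python
# Sketch of LRU replacement for a single 2-way set.
# Ordering invariant: least recently used block sits at the left end.
from collections import deque

def access_lru(set_lines, tag, ways=2):
    if tag in set_lines:
        set_lines.remove(tag)       # hit: refresh recency
        set_lines.append(tag)
        return True
    if len(set_lines) == ways:
        set_lines.popleft()         # miss with full set: evict the LRU block
    set_lines.append(tag)
    return False

lines = deque()
hits = [access_lru(lines, t) for t in (1, 2, 1, 3, 2)]
# Trace: 1 miss, 2 miss, 1 hit, 3 evicts 2, 2 evicts 1 -> set ends as [3, 2]
```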
Write Policy
A cache block must not be overwritten unless main memory is up to date:
Multiple CPUs may have individual caches
I/O may address main memory directly
Write through
All writes go to main memory as well as the cache
Multiple CPUs can monitor main memory traffic to keep their local caches up to date
Generates lots of memory traffic and slows down writes
Remember bogus write through caches!
Write back
Updates are initially made in the cache only; an update (dirty) bit for the cache slot is set when an update occurs
If a block is to be replaced, write it back to main memory only if the update bit is set
Other caches can get out of sync; I/O must access main memory through the cache
N.B. roughly 15% of memory references are writes
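The write-back mechanism above can be sketched for a single cache line: writes set the dirty bit, and main memory sees traffic only when a dirty block is evicted. The class and function names are invented for this illustration:

```python
# Sketch of write-back policy for one cache line: the update (dirty) bit
# defers the memory write until the block is replaced.
class WriteBackLine:
    def __init__(self):
        self.tag, self.dirty = None, False

def write(line, tag, memory_writes):
    if line.tag != tag:                     # miss: must replace the block
        if line.dirty:
            memory_writes.append(line.tag)  # flush the old block to memory
        line.tag, line.dirty = tag, False
    line.dirty = True                       # update made in cache only

mem_writes = []
line = WriteBackLine()
write(line, 'A', mem_writes)   # miss; nothing dirty to flush
write(line, 'A', mem_writes)   # hit; still no memory traffic
write(line, 'B', mem_writes)   # evicts dirty A -> exactly one memory write
```

Three writes produce a single memory write; under write-through the same sequence would produce three.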
Let's Do An Example: The Memory Addresses We'll Be Using
Here are a number of addresses. We'll ask for the data at these addresses and see what happens to the cache when we do so. The cache:
1. Is direct mapped
2. Contains 512 bytes
3. Has 16 sets
4. Each set holds 32 bytes, i.e. 1 cache line
Each 32-bit address therefore splits into a tag (bits <31:9>), a 4-bit set field (bits <8:5>), and a 5-bit offset (bits <4:0>):

Address | Tag     | Set  | Offset | Result
1090    | 0…010   | 0010 | 00010  | Miss
1440    | 0…010   | 1101 | 00000  | Miss
5000    | 0…1001  | 1100 | 01000  | Miss
1470    | 0…010   | 1101 | 11110  | Hit
Here's the Cache We'll Be Touching
Initially the cache is empty: all 16 sets, N0 (0000) through N15 (1111), have their Valid bit clear (N), no tag, and no data. Each set can hold one 32-byte cache line.
Doing Some Cache Action
We want to READ data from address 1090 = 010 | 0010 | 00010 (tag | set | offset).
Set 2 (0010) is invalid, so this is a miss: the 32-byte block containing address 1090 is fetched, and set 2 becomes valid (V = Y) with tag 00000…10, holding data from memory locations 1088 - 1119.
For reference, the full address table used throughout this example:
Address | Tag  | Set  | Offset
256     | 0000 | 1000 | 00000
512     | 0001 | 0000 | 00000
1024    | 0010 | 0000 | 00000
1090    | 0010 | 0010 | 00010
1099    | 0010 | 0010 | 01011
1440    | 0010 | 1101 | 00000
1470    | 0010 | 1101 | 11110
1600    | 0011 | 0010 | 00000
1620    | 0011 | 0010 | 10100
2048    | 0100 | 0000 | 00000
4096    | 1000 | 0000 | 00000
5000    | 1001 | 1100 | 01000
Doing Some Cache Action
We want to READ data from address 1440 = 010 | 1101 | 00000.
Set 13 (1101) is invalid, so this is a miss: set 13 becomes valid with tag 00000…10, holding data from memory locations 1440 - 1471. Set 2 still holds locations 1088 - 1119.
Doing Some Cache Action
We want to READ data from address 5000 = 1001 | 1100 | 01000.
Set 12 (1100) is invalid, so this is a miss: set 12 becomes valid with tag 00000…1001, holding data from memory locations 4992 - 5023.
Doing Some Cache Action
We want to READ data from address 1470 = 0010 | 1101 | 11110.
Set 13 (1101) is valid and its tag (00000…0010) matches, so this is a hit: the data (from memory locations 1440 - 1471) is delivered from the cache, and nothing changes.
Doing Some Cache Action
We want to READ data from address 1600 = 0011 | 0010 | 00000.
Set 2 (0010) is valid but holds tag 00000…0010 (locations 1088 - 1119), which does not match the address tag 0011, so this is a conflict miss: the old block is evicted, and set 2 now holds tag 00000…0011 and data from memory locations 1600 - 1631.
Doing Some Cache Action
We want to WRITE data to address 256 = 0000 | 1000 | 00000.
Set 8 (1000) is invalid, so this write misses: the block is allocated, and set 8 becomes valid with tag 00000…0000, holding data from memory locations 256 - 287, into which the write is performed.
Doing Some Cache Action
We want to WRITE data to address 1620 = 0011 | 0010 | 10100.
Set 2 (0010) is valid and holds tag 00000…0011 (locations 1600 - 1631), which matches the address tag, so this write hits: the data is written into the cache line.
Doing Some Cache Action
We want to WRITE data to address 1099 = 0010 | 0010 | 01011.
Set 2 (0010) is valid but holds tag 00000…0011 (locations 1600 - 1631), which does not match the address tag 0010, so this write misses: the old block is replaced, and set 2 now holds tag 00000…0010 and data from memory locations 1088 - 1119, updated with the written data.
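The whole eight-access example above can be replayed through a minimal direct-mapped simulator (16 sets, 32-byte lines, so 5 offset bits and 4 set bits; the code ignores the data and tracks only tags, and assumes write misses allocate, as the example does):

```python
# Replay of the worked example: direct-mapped, 16 sets, 32-byte lines.
def split(addr):
    return addr >> 9, (addr >> 5) & 0xF    # (tag, set index)

cache = {}                                 # set index -> tag
results = []
for addr in (1090, 1440, 5000, 1470, 1600, 256, 1620, 1099):
    tag, s = split(addr)
    hit = cache.get(s) == tag
    results.append('hit' if hit else 'miss')
    cache[s] = tag                         # allocate on read or write miss
```

The simulator reproduces the slide-by-slide outcomes: misses for 1090, 1440, 5000, 1600, 256 and 1099, hits for 1470 and 1620, with 1600 and 1099 evicting each other's block from set 2.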
What happens on a write?
Write through: the information is written to both the block in the cache and to the block in the lower-level memory.
Write back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced. A dirty bit records whether the block is clean or dirty.
Write through is always combined with write buffers so that the CPU doesn't have to wait for the lower-level memory.
Write Buffer for Write Through
A write buffer is needed between the cache and memory (Processor -> Cache -> Write Buffer -> DRAM):
Processor: writes data into the cache and the write buffer
Memory controller: writes the contents of the buffer to memory
The write buffer is just a FIFO:
Typical number of entries: 4
Must handle bursts of writes
Write-miss Policy: Write Allocate vs. Not Allocate
Assume a 16-bit (sub-block) write to memory location 0x0 causes a miss. Do we allocate space in the cache and possibly read in the block?
Yes: Write Allocate (usual for write-back caches)
No: Not Write Allocate (usual for write-through caches)
Example:
WriteMem[100]
WriteMem[100]
ReadMem[200]
WriteMem[200]
WriteMem[100]
NWA: four misses and one hit
WA: two misses and three hits
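The miss counts claimed for that five-access sequence can be verified with a tiny model (simplified: a cache big enough that nothing is ever evicted, tracking addresses rather than blocks):

```python
# Check the write-allocate vs. no-write-allocate miss counts claimed above.
def run(sequence, write_allocate):
    cached, misses, hits = set(), 0, 0
    for op, addr in sequence:
        if addr in cached:
            hits += 1
        else:
            misses += 1
            if op == 'read' or write_allocate:
                cached.add(addr)           # bring the block into the cache
    return misses, hits

seq = [('write', 100), ('write', 100), ('read', 200),
       ('write', 200), ('write', 100)]
nwa = run(seq, write_allocate=False)       # write misses never allocate
wa = run(seq, write_allocate=True)         # write misses allocate
```

Under no-write-allocate the repeated writes to 100 keep missing (4 misses, 1 hit); under write-allocate the first write to 100 installs the block and the rest hit (2 misses, 3 hits).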
Pentium 4 Cache
80386: no on-chip cache
80486: 8 KB on-chip cache, 16-byte lines, four-way set associative organization
Pentium (all versions): two on-chip L1 caches, data & instructions
Pentium III: L3 cache added off chip
Pentium 4:
L1 caches: 8 KB each, 64-byte lines, four-way set associative
L2 cache: feeds both L1 caches; 256 KB, 128-byte lines, 8-way set associative
L3 cache: on chip
Intel Cache Evolution

Problem: External memory is slower than the system bus.
Solution: Add external cache using faster memory technology.
First appears: 386

Problem: Increased processor speed makes the external bus a bottleneck for cache access.
Solution: Move the external cache on-chip, operating at the same speed as the processor.
First appears: 486

Problem: The internal cache is rather small, due to limited space on the chip.
Solution: Add an external L2 cache using faster technology than main memory.
First appears: 486

Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache; the Prefetcher is stalled while the Execution Unit's data access takes place.
Solution: Create separate data and instruction caches.
First appears: Pentium

Problem: Increased processor speed makes the external bus a bottleneck for L2 cache access.
Solution: Create a separate back-side bus (BSB) that runs at a higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache.
First appears: Pentium Pro
Solution: Move the L2 cache onto the processor chip.
First appears: Pentium II

Problem: Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small.
Solution: Add an external L3 cache.
First appears: Pentium III
Solution: Move the L3 cache on-chip.
First appears: Pentium 4
Reducing Cache Misses
Classifying Misses: the 3 Cs
Compulsory: the first access to a block cannot find it in the cache, so the block must be brought into the cache. Also called cold-start misses or first-reference misses. (These are the misses that occur even in an infinite cache.)
Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in a fully associative cache of size X.)
Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory & capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses. (Misses in an N-way associative cache of size X.)