Date post: | 03-Jan-2016 |
Upload: | jessica-ross |
Objectives
- Discuss the types of cache
- Describe the characteristics of the computer memory system
- Understand the hierarchy of the computer memory

Cache
- Small amount of fast memory
- Sits between normal main memory and CPU
- May be located on CPU chip or module

Cache (cont.)
- Main memory is considered to consist of a number of fixed-length blocks of K words each. With n-bit addresses there are M = 2^{n}/K blocks.
- The cache consists of C lines, and each line contains K words plus a tag of a few bits. The number of words in a line is referred to as the line size, and the number of lines is less than the number of main memory blocks.
- Some subset of the blocks of memory resides in lines in the cache. If a word in a block of memory is read, that block is transferred to one of the lines of the cache. Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block.
- Each line includes a tag that identifies which particular block is currently being stored.

Cache operation – overview
- CPU requests contents of memory location
- Check cache for this data
- If present, get from cache (fast)
- If not present, read required block from main memory to cache
- Then deliver from cache to CPU
- Cache includes tags to identify which block of main memory is in each cache slot
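
The read sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4-byte block size, the dictionary used as the cache, and the names `read_byte` and `stats` are all made up for the example, and the cache is unbounded, so no replacement ever happens.

```python
BLOCK_SIZE = 4  # bytes per block (assumed for this illustration)


def read_byte(address, cache, main_memory, stats):
    """Return the byte at `address`, loading its block into the cache on a miss."""
    block_number, offset = divmod(address, BLOCK_SIZE)
    if block_number in cache:
        # Tag match: the block is already cached (fast path).
        stats["hits"] += 1
    else:
        # Miss: copy the whole block from main memory into the cache.
        stats["misses"] += 1
        start = block_number * BLOCK_SIZE
        cache[block_number] = main_memory[start:start + BLOCK_SIZE]
    # In both cases the data is delivered from the cache.
    return cache[block_number][offset]


main_memory = bytes(range(64))   # toy memory: the byte at address a holds value a
cache = {}                       # block number (acting as the tag) -> block data
stats = {"hits": 0, "misses": 0}
values = [read_byte(a, cache, main_memory, stats) for a in (8, 9, 10, 40, 8)]
```

Addresses 8, 9 and 10 fall in the same block, so only the first of them misses; the later accesses are served from the cache.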

Cache Design
- Size
- Mapping Function
- Replacement Algorithm
- Write Policy
- Block Size
- Number of Caches

Size does matter
- Cost
  - More cache is expensive
- Speed
  - More cache is faster (up to a point)
  - Checking cache for data takes time

Comparison of Cache Sizes

| Processor | Type | Year of Introduction | L1 cache^a | L2 cache | L3 cache |
| IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | - | - |
| PDP-11/70 | Minicomputer | 1975 | 1 KB | - | - |
| VAX 11/780 | Minicomputer | 1978 | 16 KB | - | - |
| IBM 3033 | Mainframe | 1978 | 64 KB | - | - |
| IBM 3090 | Mainframe | 1985 | 128 to 256 KB | - | - |
| Intel 80486 | PC | 1989 | 8 KB | - | - |
| Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | - |
| PowerPC 601 | PC | 1993 | 32 KB | - | - |
| PowerPC 620 | PC | 1996 | 32 KB/32 KB | - | - |
| PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB |
| IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB |
| IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | - |
| Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | - |
| IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | - |
| CRAY MTA^b | Supercomputer | 2000 | 8 KB | 2 MB | - |
| Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB |
| SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | - |
| Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB |
| IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB |
| CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | - |

Mapping Function
- Cache of 64 KByte
- Cache block of 4 bytes
  - i.e. cache is 16K (2^{14}) lines of 4 bytes
- 16 MBytes main memory
- 24 bit address
  - (2^{24} = 16M)
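
The figures on this slide follow from simple arithmetic, which the short Python check below reproduces. The constant and variable names are chosen for the example only; the sizes are the ones stated above.

```python
CACHE_BYTES = 64 * 1024            # 64 KByte cache
BLOCK_BYTES = 4                    # 4-byte cache block
MEMORY_BYTES = 16 * 1024 * 1024    # 16 MByte main memory

num_lines = CACHE_BYTES // BLOCK_BYTES         # lines in the cache
num_blocks = MEMORY_BYTES // BLOCK_BYTES       # blocks in main memory

# For an exact power of two, bit_length() - 1 is its base-2 logarithm.
line_bits = num_lines.bit_length() - 1         # bits needed to number a line
word_bits = BLOCK_BYTES.bit_length() - 1       # bits needed to pick a byte in a block
address_bits = MEMORY_BYTES.bit_length() - 1   # bits in a full address
block_bits = num_blocks.bit_length() - 1       # bits needed to number a block
```

With these sizes the cache has 2^{14} lines, main memory has 2^{22} blocks, and an address is 24 bits wide, of which 2 bits select a byte within a block.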

Direct Mapping
- Each block of main memory maps to only one cache line
  - i.e. if a block is in cache, it must be in one specific place
- Address is in two parts
- Least significant w bits identify unique word
- Most significant s bits specify one memory block
- The MSBs are split into a cache line field r and a tag of s-r (most significant)

Direct Mapping Address Structure

| Field | Tag (s-r) | Line or Slot (r) | Word (w) |
| Bits | 8 | 14 | 2 |

- 24 bit address
- 2 bit word identifier (4 byte block)
- 22 bit block identifier
  - 8 bit tag (= 22 - 14)
  - 14 bit slot or line
- No two blocks that map to the same line have the same Tag field
- Check contents of cache by finding line and checking Tag
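
As a rough sketch of how the 8/14/2 split could be applied, the Python below extracts the three fields with shifts and masks. The function name `split_direct` and the sample address `0x16339C` are illustrative assumptions, not taken from the slides.

```python
WORD_BITS = 2    # byte within the 4-byte block
LINE_BITS = 14   # cache line (slot) number
TAG_BITS = 8     # remaining most significant bits


def split_direct(address):
    """Split a 24-bit address into (tag, line, word) for the direct-mapped cache."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


# Hypothetical sample address, chosen only to exercise the function.
tag, line, word = split_direct(0x16339C)

# The highest block-aligned address lands in the last line with the largest tag.
top = split_direct(0xFFFFFC)
```

To check for a hit, the cache would index the line given by `line` and compare the tag stored there with `tag`.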

Direct Mapping Cache Line Table

| Cache line | Main memory blocks held |
| 0 | 0, m, 2m, 3m, ..., 2^{s} - m |
| 1 | 1, m+1, 2m+1, ..., 2^{s} - m + 1 |
| ... | ... |
| m-1 | m-1, 2m-1, 3m-1, ..., 2^{s} - 1 |
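
The table is the rule "cache line = block number modulo m" written out row by row. A small Python sketch, using a deliberately tiny and made-up configuration (m = 4 lines, s = 4, so 16 blocks), reproduces it:

```python
def blocks_for_line(line, m, s):
    """List the main memory blocks that map to `line` in a direct-mapped cache
    with m lines and 2**s memory blocks (block j maps to line j mod m)."""
    return list(range(line, 2 ** s, m))


m, s = 4, 4   # toy sizes for illustration only
table = {line: blocks_for_line(line, m, s) for line in range(m)}
```

Each row starts at the line number and ends at the value given in the table: `2**s - m` for line 0 and `2**s - 1` for line m-1.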

Direct Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^{s+w} words or bytes
- Block size = line size = 2^{w} words or bytes
- Number of blocks in main memory = 2^{s+w}/2^{w} = 2^{s}
- Number of lines in cache = m = 2^{r}
- Size of tag = (s - r) bits

Direct Mapping pros & cons
- Simple
- Inexpensive
- Fixed location for given block
  - If a program repeatedly accesses 2 blocks that map to the same line, each access evicts the other block and the miss rate is very high
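
The conflict case can be made concrete with a small simulation. This is a sketch only, assuming the 14-bit line and 2-bit word layout used in the running example; the function name and the two access patterns are invented for the illustration.

```python
LINE_BITS = 14
WORD_BITS = 2
NUM_LINES = 1 << LINE_BITS


def count_misses(addresses):
    """Count misses for a sequence of byte addresses in a direct-mapped cache."""
    tags = [None] * NUM_LINES          # tag currently held by each line
    misses = 0
    for address in addresses:
        block = address >> WORD_BITS
        line = block % NUM_LINES
        tag = block // NUM_LINES
        if tags[line] != tag:
            misses += 1
            tags[line] = tag           # no choice: the new block replaces the old one
    return misses


# Two blocks that share line 0 but have different tags, accessed alternately.
conflicting = [0x000000, 0x010000] * 10
# Two neighbouring blocks that occupy different lines.
friendly = [0x000000, 0x000004] * 10

conflict_misses = count_misses(conflicting)
friendly_misses = count_misses(friendly)
```

In the conflicting pattern every access misses, even though the rest of the cache is empty; in the friendly pattern only the first access to each block misses.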

Associative Mapping
- A main memory block can load into any line of cache
- Memory address is interpreted as tag and word
- Tag uniquely identifies block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive

Associative Mapping Address Structure

| Field | Tag | Word |
| Bits | 22 | 2 |

- 22 bit tag stored with each 32 bit block of data
- Compare tag field with tag entry in cache to check for hit
- Least significant 2 bits of address identify which byte is required from the 32 bit data block
- e.g.

| Address | Tag | Data | Cache line |
| FFFFFC | FFFFFC | 24682468 | 3FFF |
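
A fully associative lookup can be sketched as a search over every line. The Python below is an illustration under assumptions: the 8-line cache, the line index 5, and the function names are made up, and real hardware compares all tags in parallel rather than in a loop. Note that the code holds the tag as a right-aligned 22-bit number (3FFFFF), whereas the table above appears to show it in its original bit positions (FFFFFC).

```python
WORD_BITS = 2


def split_associative(address):
    """Split a 24-bit address into (22-bit tag, 2-bit word)."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(lines, address):
    """Search every line for a matching tag.

    `lines` is a list whose entries are None (empty) or (tag, block_bytes).
    Returns (line_index, byte_value) on a hit and None on a miss.
    """
    tag, word = split_associative(address)
    for index, entry in enumerate(lines):
        if entry is not None and entry[0] == tag:
            return index, entry[1][word]
    return None


lines = [None] * 8                                   # toy cache with 8 lines
lines[5] = (0x3FFFFF, bytes.fromhex("24682468"))     # block from the example row
hit = lookup(lines, 0xFFFFFC)
miss = lookup(lines, 0x000004)
```

Because the block may sit in any line, the cost of a lookup grows with the number of lines, which is why the search gets expensive.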

Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^{s+w} words or bytes
- Block size = line size = 2^{w} words or bytes
- Number of blocks in main memory = 2^{s+w}/2^{w} = 2^{s}
- Number of lines in cache = undetermined (not fixed by the address format)
- Size of tag = s bits

Set Associative Mapping
- Cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
  - e.g. Block B can be in any line of set i
- e.g. 2 lines per set
  - 2 way associative mapping
  - A given block can be in one of 2 lines in only one set

Set Associative Mapping Example
- 13 bit set number
- Set number = block number in main memory modulo 2^{13}
- Addresses 000000, 008000, 010000, 018000 … map to the same set (with a 2 bit word field, addresses that differ by a multiple of 2^{15} share a set)

Set Associative Mapping Address Structure

| Field | Tag | Set | Word |
| Bits | 9 | 13 | 2 |

- Use set field to determine cache set to look in
- Compare tag field to see if we have a hit
- e.g.

| Address | Tag | Data | Set number |
| 1FF 7FFC | 1FF | 12345678 | 1FFF |
| 001 7FFC | 001 | 11223344 | 1FFF |
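
The two example rows can be reproduced with the 9/13/2 split. In this sketch the helper `join`, which rebuilds a 24-bit address from the "tag" and "remaining 15 bits" notation used in the table, is an assumption made for the illustration.

```python
WORD_BITS = 2
SET_BITS = 13


def split_set_associative(address):
    """Split a 24-bit address into (9-bit tag, 13-bit set number, 2-bit word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_number = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


def join(tag, low15):
    """Rebuild an address written as '<tag> <low 15 bits>' in the example table."""
    return (tag << (WORD_BITS + SET_BITS)) | low15


first = split_set_associative(join(0x1FF, 0x7FFC))
second = split_set_associative(join(0x001, 0x7FFC))
```

Both addresses select set 1FFF but carry different tags, so in a 2 way set associative cache the two blocks can be resident at the same time.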

Set Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^{s+w} words or bytes
- Block size = line size = 2^{w} words or bytes
- Number of blocks in main memory = 2^{s}
- Number of lines in set = k
- Number of sets = v = 2^{d}
- Number of lines in cache = kv = k * 2^{d}
- Size of tag = (s - d) bits
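
These relations can be evaluated for the running example (24 bit address, 4 byte blocks, 13 bit set field, 2 lines per set). The function and dictionary keys below are names invented for this sketch.

```python
def set_associative_parameters(s, w, d, k):
    """Evaluate the summary relations for given field widths and k lines per set."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "block_size": 2 ** w,
        "memory_blocks": 2 ** s,
        "sets": 2 ** d,
        "cache_lines": k * 2 ** d,
        "tag_bits": s - d,
    }


two_way = set_associative_parameters(s=22, w=2, d=13, k=2)

# With one line per set and d = 14 the same relations describe the direct-mapped case.
direct = set_associative_parameters(s=22, w=2, d=14, k=1)
```

Both configurations describe the same 64 KByte cache of 16K lines; only the split between set (or line) bits and tag bits changes.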

Replacement Algorithms (1): Direct mapping
- No choice
- Each block only maps to one line
- Replace that line

Replacement Algorithms (2): Associative & Set Associative
- Hardware implemented algorithm (speed)
- Least Recently Used (LRU)
  - e.g. in 2 way set associative
    - Which of the 2 blocks is LRU?
- First in first out (FIFO)
  - replace block that has been in cache longest
- Least frequently used
  - replace block which has had fewest hits
- Random
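
For the 2 way case, LRU only has to remember which of the two lines was touched last. The Python class below is a software sketch of that idea, not a description of any particular hardware; the class name and the sample tag sequence are assumptions.

```python
class TwoWaySet:
    """One set of a 2 way set associative cache with LRU replacement."""

    def __init__(self):
        self.tags = []   # ordered from least to most recently used

    def access(self, tag):
        """Return True on a hit, False on a miss (loading the block on a miss)."""
        if tag in self.tags:
            self.tags.remove(tag)      # hit: move the tag to the most-recent end
            self.tags.append(tag)
            return True
        if len(self.tags) == 2:
            self.tags.pop(0)           # set is full: evict the least recently used
        self.tags.append(tag)
        return False


cache_set = TwoWaySet()
results = [cache_set.access(tag) for tag in (1, 2, 1, 3, 2, 1)]
```

In this sequence the access to tag 3 evicts tag 2, because tag 1 was used more recently; FIFO would have evicted tag 1 instead, since it entered the set first.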

Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly

Write through
- All writes go to main memory as well as cache
- Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
- Lots of traffic
- Slows down writes (creating bottleneck)
- Remember bogus write through caches!

Write back
- Updates initially made in cache only
- Update bit for cache slot is set when update occurs
- If block is to be replaced, write to main memory only if update bit is set
- Other caches get out of sync
- I/O must access main memory through cache
- N.B. 15% of memory references are writes
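
The difference in memory traffic between the two policies can be shown with a toy model of a single cache line. Everything here is a simplifying assumption made for illustration: one memory location named "x", one line, and eviction happening once at the end.

```python
class Line:
    """A single cache line holding one value plus its update (dirty) bit."""

    def __init__(self, value):
        self.value = value
        self.dirty = False


def run(policy, writes):
    """Apply a series of writes, then evict the line.

    Returns (final value in main memory, number of main memory writes).
    """
    memory = {"x": 0}
    line = Line(memory["x"])
    memory_writes = 0
    for value in writes:
        line.value = value
        if policy == "write-through":
            memory["x"] = value        # every write also goes to main memory
            memory_writes += 1
        else:
            line.dirty = True          # write back: only mark the line as updated
    if line.dirty:                     # on replacement, flush only if the bit is set
        memory["x"] = line.value
        memory_writes += 1
    return memory["x"], memory_writes


through = run("write-through", [1, 2, 3, 4])
back = run("write-back", [1, 2, 3, 4])
```

Both policies leave the same final value in memory, but write back reaches main memory once instead of four times. The price is that memory is stale until the eviction, which is why other caches and I/O can get out of sync.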

Pentium 4 Cache
- 80386 – no on chip cache
- 80486 – 8k using 16 byte lines and four way set associative organization
- Pentium (all versions) – two on chip L1 caches
  - Data & instructions
- Pentium III – L3 cache added off chip
- Pentium 4
  - L1 caches
    - 8k bytes
    - 64 byte lines
    - four way set associative
  - L2 cache
    - Feeding both L1 caches
    - 256k
    - 128 byte lines
    - 8 way set associative
  - L3 cache on chip

Intel Cache Evolution

| Problem | Solution | Processor on which feature first appears |
| External memory slower than the system bus. | Add external cache using faster memory technology. | 386 |
| Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486 |
| Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486 |
| Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place. | Create separate data and instruction caches. | Pentium |
| Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro |
| (as above) | Move L2 cache on to the processor chip. | Pentium II |
| Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III |
| (as above) | Move L3 cache on-chip. | Pentium 4 |

Pentium 4 Core Processor
- Fetch/Decode Unit
  - Fetches instructions from L2 cache
  - Decode into micro-ops
  - Store micro-ops in L1 cache
- Out of order execution logic
  - Schedules micro-ops
  - Based on data dependence and resources
  - May speculatively execute
- Execution units
  - Execute micro-ops
  - Data from L1 cache
  - Results in registers
- Memory subsystem
  - L2 cache and system bus

Pentium 4 Design Reasoning
- Decodes instructions into RISC like micro-ops before L1 cache
- Micro-ops fixed length
  - Superscalar pipelining and scheduling
- Pentium instructions long & complex
- Performance improved by separating decoding from scheduling & pipelining
  - (More later – ch14)
- Data cache is write back
  - Can be configured to write through
- L1 cache controlled by 2 bits in register
  - CD = cache disable
  - NW = not write through
  - 2 instructions: one to invalidate (flush) the cache, and one to write back then invalidate
- L2 and L3 8-way set-associative
  - Line size 128 bytes

PowerPC Cache Organization
- 601 – single 32 KB 8 way set associative
- 603 – 16 KB (2 x 8 KB) two way set associative
- 604 – 32 KB
- 620 – 64 KB
- G3 & G4
  - 64 KB L1 cache
    - 8 way set associative
  - 256 KB, 512 KB or 1 MB L2 cache
    - two way set associative
- G5
  - 32 KB instruction cache
  - 64 KB data cache

Exercise
Consider a machine with a byte addressable main memory of 2^{16} bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine.
a) How is a 16-bit memory address divided into tag, line number, and byte number?
b) Into what line would bytes with each of the following addresses be stored?
   0001 0001 0001 1011
   1100 0011 0011 0100
   1101 0000 0001 1101
   1010 1010 1010 1010
c) Suppose the byte with address 0001 1010 0001 1010 is stored in the cache. What are the addresses of the other bytes stored along with it?
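
For checking answers to exercises of this kind, a small helper such as the one below may be useful. It is a sketch with invented function names, and it is demonstrated on a different, made-up machine (2^{12} bytes of memory, 4 byte blocks, 16 lines) so that the exercise itself is left to the reader.

```python
def field_widths(memory_bytes, block_bytes, num_lines):
    """Return (tag_bits, line_bits, byte_bits) for a direct-mapped cache.

    All three sizes are assumed to be exact powers of two.
    """
    address_bits = memory_bytes.bit_length() - 1
    byte_bits = block_bytes.bit_length() - 1
    line_bits = num_lines.bit_length() - 1
    return address_bits - line_bits - byte_bits, line_bits, byte_bits


def locate(address, block_bytes, num_lines):
    """Return the cache line for `address` and all byte addresses in its block."""
    block = address // block_bytes
    line = block % num_lines
    first = block * block_bytes
    return line, list(range(first, first + block_bytes))


# Made-up configuration, deliberately not the one in the exercise.
widths = field_widths(2 ** 12, 4, 16)
line, neighbours = locate(0b1010_0110_1101, 4, 16)
```

Calling the same two functions with the exercise's memory size, block size and line count gives the field widths for part (a), the line numbers for part (b), and the neighbouring byte addresses for part (c).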