Date post: 03-Jan-2016
Upload: jessica-ross
Cache Memory

Objectives

- Discuss the types of cache
- Describe the characteristics of the computer memory system
- Understand the hierarchy of the computer memory

Cache

- Small amount of fast memory
- Sits between normal main memory and CPU
- May be located on CPU chip or module

Cache/Main Memory Structure

[Figure: cache/main memory structure]

- Main memory is considered to consist of a number of fixed-length blocks of K words each. With n-bit addresses (2^n addressable words), there are M = 2^n/K blocks.

- The cache consists of C lines, and each line contains K words plus a tag of a few bits. The number of words in the line is referred to as the line size, and the number of lines is less than the number of main memory blocks.

- Some subset of the blocks of memory resides in lines in the cache. If a word in a block of memory is read, that block is transferred to one of the lines of the cache. Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block.

- Each line includes a tag that identifies which particular block is currently being stored.

Cache operation – overview

- CPU requests contents of memory location
- Check cache for this data
- If present, get from cache (fast)
- If not present, read required block from main memory to cache
- Then deliver from cache to CPU
- Cache includes tags to identify which block of main memory is in each cache slot
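The read steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BLOCK_SIZE`, `NUM_LINES`, the dummy `main_memory` contents and the simple "block number modulo number of lines" placement are all invented for the example, and the full block number is stored as the tag for simplicity.

```python
# Sketch of the read flow: check cache, fetch the block on a miss, deliver from cache.
# All sizes and the placement rule are assumptions made for this illustration.
BLOCK_SIZE = 4      # words per block (K)
NUM_LINES = 8       # number of cache lines (C)

main_memory = list(range(100, 100 + 64))   # 64 words of dummy data

# Each cache line holds a tag (here simply the block number) and a copy of the block.
cache = [{"tag": None, "data": None} for _ in range(NUM_LINES)]

def read(address):
    """Return (word, hit) for a word address, loading the block on a miss."""
    block = address // BLOCK_SIZE
    line = cache[block % NUM_LINES]
    hit = line["tag"] == block
    if not hit:
        # Miss: read the whole block from main memory into the cache line.
        start = block * BLOCK_SIZE
        line["tag"] = block
        line["data"] = main_memory[start:start + BLOCK_SIZE]
    # Hit or miss, the word is delivered from the cache.
    return line["data"][address % BLOCK_SIZE], hit

first = read(5)     # miss: block 1 is loaded
second = read(6)    # hit: same block is already cached
print(first, second)
```

The second read hits because address 6 lies in the block that the first read brought in.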

Cache Read Operation - Flowchart

[Figure: cache read operation flowchart]

Cache Design

- Size
- Mapping Function
- Replacement Algorithm
- Write Policy
- Block Size
- Number of Caches

Size does matter

- Cost
  - More cache is expensive
- Speed
  - More cache is faster (up to a point)
  - Checking cache for data takes time

Typical Cache Organization

[Figure: typical cache organization]

Comparison of Cache Sizes

| Processor | Type | Year of introduction | L1 cache^a | L2 cache | L3 cache |
|---|---|---|---|---|---|
| IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | - | - |
| PDP-11/70 | Minicomputer | 1975 | 1 KB | - | - |
| VAX 11/780 | Minicomputer | 1978 | 16 KB | - | - |
| IBM 3033 | Mainframe | 1978 | 64 KB | - | - |
| IBM 3090 | Mainframe | 1985 | 128 to 256 KB | - | - |
| Intel 80486 | PC | 1989 | 8 KB | - | - |
| Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | - |
| PowerPC 601 | PC | 1993 | 32 KB | - | - |
| PowerPC 620 | PC | 1996 | 32 KB/32 KB | - | - |
| PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB |
| IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB |
| IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | - |
| Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | - |
| IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | - |
| CRAY MTA^b | Supercomputer | 2000 | 8 KB | 2 MB | - |
| Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB |
| SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | - |
| Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB |
| IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB |
| CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | - |

Mapping Function

- Cache of 64 kByte
- Cache block of 4 bytes
  - i.e. cache is 16k (2^14) lines of 4 bytes
- 16 MBytes main memory
- 24 bit address
  - (2^24 = 16M)

Direct Mapping

- Each block of main memory maps to only one cache line
  - i.e. if a block is in cache, it must be in one specific place
- Address is in two parts
- Least significant w bits identify unique word
- Most significant s bits specify one memory block
- The MSBs are split into a cache line field r and a tag of s-r (most significant)

Direct Mapping Address Structure

| Tag (s-r) | Line or slot (r) | Word (w) |
|---|---|---|
| 8 | 14 | 2 |

- 24 bit address
- 2 bit word identifier (4 byte block)
- 22 bit block identifier
  - 8 bit tag (=22-14)
  - 14 bit slot or line
- No two blocks in the same line have the same Tag field
- Check contents of cache by finding line and checking Tag
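As an illustration of the 8/14/2 split, the following Python sketch extracts the three fields from a 24-bit address with shifts and masks. The function name `split_direct` and the sample address `0x16339C` are chosen for this example only.

```python
# Field widths from the address structure above: 8-bit tag, 14-bit line, 2-bit word.
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_direct(address):
    """Split a 24-bit address into (tag, line, word) for the direct-mapped cache."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

tag, line, word = split_direct(0x16339C)   # arbitrary sample address
print(f"tag={tag:02X} line={line:04X} word={word}")
```

A lookup would then index the cache with `line` and compare the stored tag with `tag`.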

Direct Mapping Cache Line Table

| Cache line | Main memory blocks held |
|---|---|
| 0 | 0, m, 2m, 3m, ..., 2^s - m |
| 1 | 1, m+1, 2m+1, ..., 2^s - m + 1 |
| ... | ... |
| m-1 | m-1, 2m-1, 3m-1, ..., 2^s - 1 |
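The table is the rule "line = block number mod m" written out row by row. A small Python sketch, using toy sizes assumed purely for readability (m = 4 lines, 2^s = 16 blocks) and a hypothetical helper name, lists the blocks that compete for a given line:

```python
def blocks_for_line(i, r, s):
    """Return the block numbers j (0 <= j < 2**s) that satisfy j mod 2**r == i."""
    m = 1 << r                      # number of cache lines
    return list(range(i, 1 << s, m))

# Toy sizes (an assumption for illustration): r = 2 gives m = 4 lines, s = 4 gives 16 blocks.
line0 = blocks_for_line(0, r=2, s=4)
line3 = blocks_for_line(3, r=2, s=4)
print(line0)
print(line3)
```

The last entries of the two lists correspond to 2^s - m and 2^s - 1 in the table.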

Direct Mapping Cache Organization

[Figure: direct mapping cache organization]

Direct Mapping Example

[Figure: direct mapping example]

Direct Mapping Summary

- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w)/2^w = 2^s
- Number of lines in cache = m = 2^r
- Size of tag = (s – r) bits

Direct Mapping pros & cons

- Simple
- Inexpensive
- Fixed location for given block
  - If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high
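The last point can be made concrete with a small miss counter. This is a sketch with an assumed 4-line direct-mapped cache and invented block traces; `count_misses` is a hypothetical helper, not something from the slides.

```python
def count_misses(block_trace, num_lines):
    """Count misses for a direct-mapped cache given a sequence of block numbers."""
    lines = [None] * num_lines          # block currently held by each line
    misses = 0
    for block in block_trace:
        idx = block % num_lines         # the only line this block may occupy
        if lines[idx] != block:
            misses += 1
            lines[idx] = block          # evict whatever was there
    return misses

NUM_LINES = 4
conflicting = [0, 4] * 5    # blocks 0 and 4 both map to line 0
friendly = [0, 5] * 5       # blocks 0 and 5 map to lines 0 and 1

conflict_misses = count_misses(conflicting, NUM_LINES)
friendly_misses = count_misses(friendly, NUM_LINES)
print(conflict_misses, friendly_misses)
```

With the conflicting trace every access misses, even though three of the four lines stay empty; with the friendly trace only the first access to each block misses.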

Associative Mapping

- A main memory block can load into any line of cache
- Memory address is interpreted as tag and word
- Tag uniquely identifies block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive

Fully Associative Cache Organization

[Figure: fully associative cache organization]

Associative Mapping Example

[Figure: associative mapping example]

Associative Mapping Address Structure

| Tag | Word |
|---|---|
| 22 bit | 2 bit |

- 22 bit tag stored with each 32 bit block of data
- Compare tag field with tag entry in cache to check for hit
- Least significant 2 bits of address identify which byte is required from the 32 bit data block
- e.g.

| Address | Tag | Data | Cache line |
|---|---|---|---|
| FFFFFC | FFFFFC | 24682468 | 3FFF |
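A fully associative lookup can be sketched as follows. The helper names are hypothetical, and the loop stands in for hardware that compares all tags in parallel. Note that the example above writes the tag in left-aligned form (FFFFFC, the address with its two word bits cleared); as a right-aligned 22-bit number the same tag is 3FFFFF.

```python
WORD_BITS = 2   # low 2 bits select a position within the 4-byte block

def split_associative(address):
    """Split a 24-bit address into (22-bit tag, 2-bit word)."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)

def lookup(cache_lines, address):
    """cache_lines holds (tag, data) tuples or None. Return data on a hit, else None."""
    tag, _ = split_associative(address)
    for entry in cache_lines:            # every line's tag is examined
        if entry is not None and entry[0] == tag:
            return entry[1]
    return None

# Toy 4-line cache (an assumption); the block may sit in any line.
cache_lines = [None] * 4
cache_lines[3] = (0x3FFFFF, 0x24682468)

tag, word = split_associative(0xFFFFFC)
hit = lookup(cache_lines, 0xFFFFFC)
miss = lookup(cache_lines, 0x16339C)
print(f"{tag:06X}", word, hit, miss)
```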

Associative Mapping Summary

- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w)/2^w = 2^s
- Number of lines in cache = undetermined
- Size of tag = s bits

Set Associative Mapping

- Cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
  - e.g. Block B can be in any line of set i
- e.g. 2 lines per set
  - 2 way associative mapping
  - A given block can be in one of 2 lines in only one set

Set Associative Mapping Example

- 13 bit set number
- Block number in main memory is modulo 2^13
- 000000, 008000, 010000, 018000 … map to same set (addresses that agree in the low 15 bits share a set)

Two Way Set Associative Cache Organization

[Figure: two way set associative cache organization]

Set Associative Mapping Address Structure

| Tag | Set | Word |
|---|---|---|
| 9 bit | 13 bit | 2 bit |

- Use set field to determine cache set to look in
- Compare tag field to see if we have a hit
- e.g.

| Address | Tag | Data | Set number |
|---|---|---|---|
| 1FF 7FFC | 1FF | 12345678 | 1FFF |
| 001 7FFC | 001 | 11223344 | 1FFF |
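The 9/13/2 split can be checked with a short Python sketch. The example addresses are written above as a 9-bit tag followed by the remaining 15 bits, so a hypothetical `join` helper rebuilds the full 24-bit address before splitting it again; both function names are assumptions of this example.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2

def split_set_assoc(address):
    """Split a 24-bit address into (tag, set number, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

def join(tag, low15):
    """Rebuild a 24-bit address from a 9-bit tag and the low 15 bits (set + word)."""
    return (tag << (SET_BITS + WORD_BITS)) | low15

a = join(0x1FF, 0x7FFC)     # first example row
b = join(0x001, 0x7FFC)     # second example row
print(f"{a:06X} -> {split_set_assoc(a)}")
print(f"{b:06X} -> {split_set_assoc(b)}")
```

Both addresses select set 1FFF and differ only in the tag, so in a two-way cache they can be held at the same time, one in each line of that set.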

Two Way Set Associative Mapping Example

[Figure: two way set associative mapping example]

Set Associative Mapping Summary

- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^s
- Number of lines in set = k
- Number of sets = v = 2^d
- Number of lines in cache = kv = k * 2^d
- Size of tag = (s – d) bits

Replacement Algorithms (1): Direct mapping

- No choice
- Each block only maps to one line
- Replace that line

Replacement Algorithms (2): Associative & Set Associative

- Hardware implemented algorithm (speed)
- Least recently used (LRU)
  - e.g. in 2 way set associative
    - Which of the 2 blocks is LRU?
- First in first out (FIFO)
  - replace block that has been in cache longest
- Least frequently used
  - replace block which has had fewest hits
- Random
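As a software illustration of LRU replacement within one set, here is a minimal sketch built on Python's `collections.OrderedDict`. Real caches implement this in hardware (for a 2-way set a single use bit per line is enough); the class name `LRUSet` and the access trace are assumptions made for this example.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with `ways` lines, replaced in least-recently-used order."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()      # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit. On a miss, load the tag, evicting the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)         # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)      # drop the least recently used tag
        self.lines[tag] = None
        return False

s = LRUSet(ways=2)
results = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(results, list(s.lines))
```

In this trace the access to C evicts B (A was touched more recently), so the final access to B misses and in turn evicts A.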

Write Policy

- Must not overwrite a cache block unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly

Write through

- All writes go to main memory as well as cache
- Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
- Lots of traffic
- Slows down writes (creating bottleneck)
- Remember bogus write through caches!

Write back

- Updates initially made in cache only
- Update bit for cache slot is set when update occurs
- If block is to be replaced, write to main memory only if update bit is set
- Other caches get out of sync
- I/O must access main memory through cache
- N.B. 15% of memory references are writes
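The traffic difference between the two write policies can be sketched with a deliberately tiny model: a single cache line, write-allocate on a miss, and an invented trace. None of these choices come from the slides; they are assumptions that keep the example short.

```python
def memory_writes(trace, policy):
    """Count block writes that reach main memory.

    trace is a list of (op, block) pairs with op "r" or "w". The cache has one
    line and allocates on every miss (both are simplifying assumptions).
    """
    cached, dirty, writes = None, False, 0
    for op, block in trace:
        if block != cached:                          # miss: the line is replaced
            if policy == "write-back" and dirty:
                writes += 1                          # update bit set: write old block out
            cached, dirty = block, False
        if op == "w":
            if policy == "write-through":
                writes += 1                          # every write also goes to memory
            else:
                dirty = True                         # only set the update bit
    if policy == "write-back" and dirty:
        writes += 1                                  # flush the last modified block
    return writes

trace = [("w", 1), ("w", 1), ("w", 1), ("r", 2), ("w", 2)]
through = memory_writes(trace, "write-through")
back = memory_writes(trace, "write-back")
print(through, back)
```

Repeated writes to the same block cost one memory write each under write through, but only one write in total under write back, at the price of main memory being out of date in between.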

Pentium 4 Cache

- 80386 – no on chip cache
- 80486 – 8k using 16 byte lines and four way set associative organization
- Pentium (all versions) – two on chip L1 caches
  - Data & instructions
- Pentium III – L3 cache added off chip
- Pentium 4
  - L1 caches
    - 8k bytes
    - 64 byte lines
    - four way set associative
  - L2 cache
    - Feeding both L1 caches
    - 256k
    - 128 byte lines
    - 8 way set associative
  - L3 cache on chip

Intel Cache Evolution

| Problem | Solution | Processor on which feature first appears |
|---|---|---|
| External memory slower than the system bus. | Add external cache using faster memory technology. | 386 |
| Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486 |
| Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486 |
| Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place. | Create separate data and instruction caches. | Pentium |
| Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro |
| (same problem) | Move L2 cache on to the processor chip. | Pentium II |
| Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III |
| (same problem) | Move L3 cache on-chip. | Pentium 4 |

Pentium 4 Block Diagram

[Figure: Pentium 4 block diagram]

Pentium 4 Core Processor

- Fetch/Decode Unit
  - Fetches instructions from L2 cache
  - Decode into micro-ops
  - Store micro-ops in L1 cache
- Out of order execution logic
  - Schedules micro-ops
  - Based on data dependence and resources
  - May speculatively execute
- Execution units
  - Execute micro-ops
  - Data from L1 cache
  - Results in registers
- Memory subsystem
  - L2 cache and systems bus

Pentium 4 Design Reasoning

- Decodes instructions into RISC like micro-ops before L1 cache
- Micro-ops fixed length
  - Superscalar pipelining and scheduling
- Pentium instructions long & complex
- Performance improved by separating decoding from scheduling & pipelining
  - (More later – ch14)
- Data cache is write back
  - Can be configured to write through
- L1 cache controlled by 2 bits in register
  - CD = cache disable
  - NW = not write through
  - 2 instructions to invalidate (flush) cache and write back then invalidate
- L2 and L3 8-way set-associative
  - Line size 128 bytes

PowerPC Cache Organization

- 601 – single 32kb 8 way set associative
- 603 – 16kb (2 x 8kb) two way set associative
- 604 – 32kb
- 620 – 64kb
- G3 & G4
  - 64kb L1 cache
    - 8 way set associative
  - 256k, 512k or 1M L2 cache
    - two way set associative
- G5
  - 32kB instruction cache
  - 64kB data cache

PowerPC G5 Block Diagram

[Figure: PowerPC G5 block diagram]

Exercise

Consider a machine with a byte addressable main memory of 2^16 bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine.

a) How is a 16-bit memory address divided into tag, line number, and byte number?

b) Into what line would bytes with each of the following addresses be stored?

   0001 0001 0001 1011
   1100 0011 0011 0100
   1101 0000 0001 1101
   1010 1010 1010 1010

c) Suppose the byte with address 0001 1010 0001 1010 is stored in the cache. What are the addresses of the other bytes stored along with it?

d) How many total bytes of memory can be stored in the cache?

e) Why is the tag also stored in the cache?
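One way to check answers to parts a) to c) is a small helper like the sketch below. It derives the field widths from the sizes stated in the exercise (8-byte blocks, 32 lines, 16-bit addresses); the function names and the sample address are assumptions of this sketch and deliberately not taken from the exercise.

```python
import math

ADDRESS_BITS = 16
BYTE_BITS = int(math.log2(8))       # 8-byte blocks
LINE_BITS = int(math.log2(32))      # 32 cache lines
TAG_BITS = ADDRESS_BITS - LINE_BITS - BYTE_BITS

def split16(address):
    """Split a 16-bit address into (tag, line, byte) for the exercise's cache."""
    byte = address & ((1 << BYTE_BITS) - 1)
    line = (address >> BYTE_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (BYTE_BITS + LINE_BITS)
    return tag, line, byte

def block_addresses(address):
    """Return every byte address that lies in the same block as `address`."""
    base = address & ~((1 << BYTE_BITS) - 1)
    return [base + i for i in range(1 << BYTE_BITS)]

sample = 0b0000_0001_0000_1001      # sample address, not one from the exercise
print(split16(sample))
print([f"{a:016b}" for a in block_addresses(sample)])
```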

