Memory Hierarchy
Faster Access, Lower Cost
Principle of Locality

Programs access a small portion of their address space at any instant of time.

Two types:
• Temporal locality – an item referenced now will tend to be referenced again soon
• Spatial locality – items near the last referenced item will tend to be referenced soon
Memory Hierarchy

Takes advantage of the principle of locality.

Memory technologies:
• SRAM – fast but costly
• DRAM – slower but not as costly
• Magnetic disk – much slower but very cheap

Idea: construct a hierarchy of these memories, increasing in size with distance from the processor.
Cache Memory (Two Level)

• Block – smallest unit of data transferred between levels
• Hit rate – fraction of memory accesses found in the cache
• Miss rate – 1 – hit rate
• Hit time – time to access a level of memory, including the time to determine hit or miss
• Miss penalty – time required to fetch a block from the lower level of memory

(Diagram: Processor – Cache)
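These terms combine into the standard average memory access time (AMAT) formula; a minimal sketch, with illustrative numbers that are assumed rather than taken from the slides:

```python
# Average memory access time, built from the terms above:
# AMAT = hit time + miss rate * miss penalty
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Assumed example: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # 6.0 cycles on average
```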
Direct Mapped Cache

How do you map a block of memory from the larger memory space to the cache?

Simplest method: assign a fixed location in the cache for each location in memory.

Function:
• cache index = (block address) mod (# cache blocks)
• If the # of cache blocks is 2^n, the index for block address A is A mod 2^n
• Note this is just the lower n bits of A
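The mod-and-mask equivalence can be checked directly (a sketch; the 5-bit address is taken from the example that follows):

```python
# Direct-mapped index: (block address) mod (# cache blocks).
# With 2**n blocks, this is just the lower n bits of the address.
def cache_index(block_addr, n):
    return block_addr % (2 ** n)

addr = 0b11010                 # block address 26
print(cache_index(addr, 3))    # 2, i.e. binary 010
print(addr & 0b111)            # same result via bit masking
```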
A Direct-Mapped Cache Example
Accessing a Cache
Index  V  Tag  Data
000    Y  10   Memory(10000)
001    N
010    Y  11   Memory(11010)
011    Y  00   Memory(00011)
100    N
101    N
110    Y  10   Memory(10110)
111    N

References: 10110 – miss, 11010 – miss, 10110 – hit, 11010 – hit, 10000 – miss, 00011 – miss, 10000 – hit, 10010 – miss
Updated Cache

Index  V  Tag  Data
000    Y  10   Memory(10000)
001    N
010    Y  10   Memory(10010)
011    Y  00   Memory(00011)
100    N
101    N
110    Y  10   Memory(10110)
111    N

References: 10110 – miss, 11010 – miss, 10110 – hit, 11010 – hit, 10000 – miss, 00011 – miss, 10000 – hit, 10010 – miss
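The hit/miss sequence in this example can be reproduced with a small simulator; a sketch assuming 8 one-word blocks and 5-bit block addresses, as in the tables above:

```python
# Simulate the 8-block direct-mapped cache from the example.
# Index = low 3 bits of the address, tag = the remaining high bits.
def simulate(refs, n_bits=3):
    cache = {}                       # index -> stored tag
    results = []
    for addr in refs:
        index = addr % (1 << n_bits)
        tag = addr >> n_bits
        if cache.get(index) == tag:
            results.append("h")
        else:
            results.append("m")
            cache[index] = tag       # on a miss, install the block
    return results

refs = [0b10110, 0b11010, 0b10110, 0b11010,
        0b10000, 0b00011, 0b10000, 0b10010]
print(simulate(refs))  # ['m', 'm', 'h', 'h', 'm', 'm', 'h', 'm']
```

The final miss on 10010 replaces the block at index 010, matching the Updated Cache table.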
Selecting the Data
Handling Cache Misses

Must modify the control unit to account for misses.

Consider the instruction memory. Algorithm:
• Send (PC – 4) to memory
• Read memory and wait for the result
• Write the cache entry
• Restart the instruction execution
Handling Writes

Want to avoid an inconsistent cache and memory.

Two approaches:
• Write-through
• Write-back
Write-Through

Idea: write data into both the cache and memory.

A simple solution, but problematic: the write to memory takes much longer than the write to the cache (perhaps 100 times longer).

Can use a write buffer. What problems arise from using a write buffer?
Write-Back

Write only to the cache, and mark cache blocks that have been written to as "dirty". If a block is dirty, it must be written to memory when it is replaced.

What types of problems can arise using this strategy?
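The contrast between the two policies can be shown in a toy model (a sketch with assumed dictionary-based structures, not a real cache):

```python
# Toy contrast of write-through vs write-back (assumed toy model).
memory = {}

# Write-through: update the cache AND memory on every store.
cache_wt = {}
def store_through(addr, val):
    cache_wt[addr] = val
    memory[addr] = val           # memory is always consistent

# Write-back: update only the cache and mark the block dirty;
# memory is updated later, when the block is evicted.
cache_wb = {}                    # addr -> (value, dirty)
def store_back(addr, val):
    cache_wb[addr] = (val, True)

def evict(addr):
    val, dirty = cache_wb.pop(addr)
    if dirty:
        memory[addr] = val       # write back only if modified
```

Until `evict` runs, the write-back cache holds data that memory does not; that inconsistency window is the source of the problems the question asks about.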
Memory Design to Support Caches

Assume:
• 1 memory bus clock cycle to send the address
• 15 memory bus clock cycles per DRAM access
• 1 memory bus clock cycle to send one word of data

4-word block transfer:
• 1 + 4×15 + 4×1 = 65 bus clock cycles
• Miss penalty is high
• Bytes transferred per clock cycle: (4×4)/65 ≈ 0.25
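The miss-penalty arithmetic above, spelled out:

```python
# Miss penalty for a 4-word block with the assumed bus timings:
addr_cycles = 1        # send the address
dram_cycles = 15       # per DRAM access
word_cycles = 1        # send one word of data
words = 4

penalty = addr_cycles + words * dram_cycles + words * word_cycles
print(penalty)                            # 65 bus clock cycles

bytes_per_cycle = (words * 4) / penalty   # 4 bytes per 32-bit word
print(round(bytes_per_cycle, 2))          # 0.25
```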
Memory Designs

How do designs b & c increase the bytes-per-clock-cycle transfer rate?
Bits in Cache

• Block size is larger than a word – say 2^m words
• Cache has 2^n blocks
• Tag bits: 32 – (n + m + 2)
• Size: 2^n × (2^m×32 + (32 – n – m – 2) + 1) bits
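The size formula can be evaluated for a concrete (assumed) configuration:

```python
# Total bits in a direct-mapped cache with 2**n blocks of 2**m words,
# for 32-bit addresses (2 byte-offset bits), per the formula above.
def cache_bits(n, m):
    tag_bits = 32 - (n + m + 2)
    per_block = (2 ** m) * 32 + tag_bits + 1   # data + tag + valid bit
    return (2 ** n) * per_block

# Assumed example: 1024 one-word blocks (n=10, m=0):
print(cache_bits(10, 0))   # 1024 * (32 + 20 + 1) = 54272 bits
```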
Analysis of Block Size

Larger blocks exploit spatial locality; therefore, the miss rate is lowered.

What happens as the block size continues to grow?
• Cache size is fixed
• Number of cache blocks is reduced
• Contention for block space in the cache increases
• Miss rate goes up
Measuring Cache Performance

CPU time = (CPU execution cycles + memory stall cycles) × clock cycle time

Read-stall cycles = reads/program × read miss rate × read miss penalty

Writes are a problem because of write-buffer stalls.
Measuring Cache Performance: Simplifications

• Assume a write-through scheme
• Assume a well-designed system, so that write buffer stalls can be ignored
• Read and write miss penalties are the same

Memory-stall clock cycles = instructions/program × misses/instruction × miss penalty
Example

Assume:
• Instruction cache miss rate: 2%
• Data cache miss rate: 4%
• CPI (cycles per instruction): 2
• Miss penalty: 100 clock cycles
• SPECint2000 benchmark: 36% load & store instructions
• Clock cycle time: 1 ns (1×10^-9 s)

Find the CPU execution time. How much faster would a perfect cache be?
Solution

• Instruction miss cycles: I × 2% × 100 = 2I
• Data miss cycles: I × 36% × 4% × 100 = 1.44I
• Memory stall cycles: 2I + 1.44I = 3.44I
• CPI (with memory stalls): 2 + 3.44 = 5.44
• CPU execution time = 5.44I × 1 ns
• A perfect cache is 5.44/2 = 2.72 times faster
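The same arithmetic, per instruction so the instruction count I cancels:

```python
# Reproduce the example's numbers (stall cycles per instruction).
i_miss  = 0.02 * 100          # instruction miss cycles: 2
d_miss  = 0.36 * 0.04 * 100   # data miss cycles: 1.44
stalls  = i_miss + d_miss     # 3.44
cpi     = 2 + stalls          # 5.44 with memory stalls
speedup = cpi / 2             # perfect cache would be 2.72x faster
print(cpi, speedup)           # approximately 5.44 and 2.72
```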
Types of Cache Mappings

Direct mapped:
• Each block goes in only one place
• (block number) mod (# cache blocks)

Set associative:
• Each block can be mapped to n places in the cache
• (block number) mod (# sets in cache)

Fully associative:
• A block can map anywhere in the cache
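The three placement rules can be compared on one block number (a sketch; the 8-block cache size and block number 12 are assumed for illustration):

```python
# Where block 12 may reside in an assumed 8-block cache.
num_blocks = 8
block = 12

# Direct mapped: exactly one candidate location.
print(block % num_blocks)      # location 4

# 2-way set associative: 4 sets of 2 ways each;
# the block maps to one set but may use either way in it.
num_sets = num_blocks // 2
print(block % num_sets)        # set 0

# Fully associative: any of the 8 locations may hold the block,
# so a lookup must compare tags in every location.
```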
Types of Cache Mappings (2)

Set Associative Cache Mappings

Locating the Block in Cache
Virtual Memory: The Concept

Use main memory as a cache for magnetic disk.

Motivations:
• Safe and efficient sharing of main memory
• Remove the programmer's burden of handling small, limited amounts of memory

Invented in the 1960s.
Virtual Memory: Sharing Memory

Programs must be well behaved.

Main concept: each program has its own address space.

Virtual memory: address in program → physical address.

Protection:
• Protect one process from another
• A set of mechanisms for ensuring this
Virtual Memory: Small Memories

Without virtual memory, the programmer must make a large program fit in a small memory space. The solution was the use of overlays.

Even with our relatively large main memories, we would still have to do this today without virtual memory!
Virtual Memory: Terminology

• Page – the term for a cache block
• Page fault – the term for a cache miss
• Virtual address
  – An address within the program's address space
  – Translated to a physical address by a combination of hardware & software
  – This process is called address translation
Virtual Memory: Conceptual Diagram

Virtual Memory: Address Translation
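Address translation splits the virtual address into a page number and an offset; a minimal sketch, assuming a 4 KB page size and a hypothetical one-entry page table:

```python
# Split a 32-bit virtual address, assuming 4 KB pages.
PAGE_BITS = 12                       # 4 KB = 2**12 bytes

def translate(vaddr, page_table):
    vpn = vaddr >> PAGE_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # offset within the page
    ppn = page_table[vpn]                    # missing entry -> page fault
    return (ppn << PAGE_BITS) | offset       # physical address

page_table = {0x12345: 0x00042}      # hypothetical single mapping
print(hex(translate(0x12345ABC, page_table)))  # 0x42abc
```

Note the offset passes through unchanged; only the page number is translated.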
Virtual Memory: Page Faults

Main memory is approximately 100,000 times faster than disk, so a page fault is enormously costly.

Key decisions:
• Page size – 4 KB to 16 KB
• Reducing page faults is attractive
• Page faults can be handled in software
• Only write-back can be used
Virtual Memory: Placing & Finding a Page

Each process has its own page table.
Virtual Memory: Swap Space

(Diagram: swap space)
Virtual Memory: Translation-Lookaside Buffer