Memory Hierarchy
Faster Access, Lower Cost
Principle of Locality

Programs access a small portion of their address space at any instant of time.

Two types:
• Temporal locality – an item referenced now will tend to be referenced again soon
• Spatial locality – items near the last referenced item will tend to be referenced soon
Memory Hierarchy

Takes advantage of the principle of locality.

Memory technologies:
• SRAM – fast but costly
• DRAM – slower but not as costly
• Magnetic disk – much slower but very cheap

Idea: construct a hierarchy of these memories, increasing in size with distance from the processor.
Cache Memory (Two Level)

• Block – smallest unit of data transferred between levels
• Hit rate – fraction of memory accesses found in the cache
• Miss rate – 1 – hit rate
• Hit time – time to access a level of memory, including the time to determine hit or miss
• Miss penalty – time required to fetch a block from the lower level of memory

(Diagram: Processor – Cache)
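These terms combine into the standard average memory access time (AMAT) formula; a minimal sketch, with illustrative numbers that are assumed rather than taken from the slides:

```python
# Average memory access time, built from the terms above:
# AMAT = hit time + miss rate * miss penalty
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Assumed example: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # 6.0 cycles on average
```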
Direct Mapped Cache

How do you map a block of memory from the larger memory space to the cache?

Simplest method: assign a fixed location in the cache for each location in memory.

Function:
• cache index = (block address) mod (# cache blocks)
• If the # of cache blocks is 2^n, the index for block address A is A mod 2^n
• Note this is just the lower n bits of A
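The mod-and-mask equivalence can be checked directly (a sketch; the 5-bit address is taken from the example that follows):

```python
# Direct-mapped index: (block address) mod (# cache blocks).
# With 2**n blocks, this is just the lower n bits of the address.
def cache_index(block_addr, n):
    return block_addr % (2 ** n)

addr = 0b11010                 # block address 26
print(cache_index(addr, 3))    # 2, i.e. binary 010
print(addr & 0b111)            # same result via bit masking
```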
A Direct-Mapped Cache Example
Accessing a Cache
Index  V  Tag  Data
000    Y  10   Memory(10000)
001    N
010    Y  11   Memory(11010)
011    Y  00   Memory(00011)
100    N
101    N
110    Y  10   Memory(10110)
111    N

References: 10110 – miss, 11010 – miss, 10110 – hit, 11010 – hit, 10000 – miss, 00011 – miss, 10000 – hit, 10010 – miss
Updated Cache

Index  V  Tag  Data
000    Y  10   Memory(10000)
001    N
010    Y  10   Memory(10010)
011    Y  00   Memory(00011)
100    N
101    N
110    Y  10   Memory(10110)
111    N

References: 10110 – miss, 11010 – miss, 10110 – hit, 11010 – hit, 10000 – miss, 00011 – miss, 10000 – hit, 10010 – miss
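The hit/miss sequence in this example can be reproduced with a small simulator; a sketch assuming 8 one-word blocks and 5-bit block addresses, as in the tables above:

```python
# Simulate the 8-block direct-mapped cache from the example.
# Index = low 3 bits of the address, tag = the remaining high bits.
def simulate(refs, n_bits=3):
    cache = {}                       # index -> stored tag
    results = []
    for addr in refs:
        index = addr % (1 << n_bits)
        tag = addr >> n_bits
        if cache.get(index) == tag:
            results.append("h")
        else:
            results.append("m")
            cache[index] = tag       # on a miss, install the block
    return results

refs = [0b10110, 0b11010, 0b10110, 0b11010,
        0b10000, 0b00011, 0b10000, 0b10010]
print(simulate(refs))  # ['m', 'm', 'h', 'h', 'm', 'm', 'h', 'm']
```

The final miss on 10010 replaces the block at index 010, matching the Updated Cache table.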
Selecting the Data
Handling Cache Misses

Must modify the control unit to account for misses.

Consider the instruction memory. Algorithm:
• Send (PC – 4) to memory
• Read memory and wait for the result
• Write the cache entry
• Restart the instruction execution
Handling Writes

Want to avoid an inconsistent cache and memory.

Two approaches:
• Write-through
• Write-back
Write-Through

Idea: write data into both the cache and memory.

A simple solution, but problematic: the write to memory takes much longer than the write to the cache (perhaps 100 times longer).

Can use a write buffer. What problems arise from using a write buffer?
Write-Back

Write only to the cache, and mark cache blocks that have been written to as "dirty". If a block is dirty, it must be written to memory when it is replaced.

What types of problems can arise using this strategy?
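The contrast between the two policies can be shown in a toy model (a sketch with assumed dictionary-based structures, not a real cache):

```python
# Toy contrast of write-through vs write-back (assumed toy model).
memory = {}

# Write-through: update the cache AND memory on every store.
cache_wt = {}
def store_through(addr, val):
    cache_wt[addr] = val
    memory[addr] = val           # memory is always consistent

# Write-back: update only the cache and mark the block dirty;
# memory is updated later, when the block is evicted.
cache_wb = {}                    # addr -> (value, dirty)
def store_back(addr, val):
    cache_wb[addr] = (val, True)

def evict(addr):
    val, dirty = cache_wb.pop(addr)
    if dirty:
        memory[addr] = val       # write back only if modified
```

Until `evict` runs, the write-back cache holds data that memory does not; that inconsistency window is the source of the problems the question asks about.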
Memory Design to Support Caches

Assume:
• 1 memory bus clock cycle to send the address
• 15 memory bus clock cycles per DRAM access
• 1 memory bus clock cycle to send one word of data

4-word block transfer:
• 1 + 4×15 + 4×1 = 65 bus clock cycles
• Miss penalty is high
• Bytes transferred per clock cycle: (4×4)/65 ≈ 0.25
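The miss-penalty arithmetic above, spelled out:

```python
# Miss penalty for a 4-word block with the assumed bus timings:
addr_cycles = 1        # send the address
dram_cycles = 15       # per DRAM access
word_cycles = 1        # send one word of data
words = 4

penalty = addr_cycles + words * dram_cycles + words * word_cycles
print(penalty)                            # 65 bus clock cycles

bytes_per_cycle = (words * 4) / penalty   # 4 bytes per 32-bit word
print(round(bytes_per_cycle, 2))          # 0.25
```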
Memory Designs

How do designs b & c increase the bytes-per-clock-cycle transfer rate?
Bits in Cache

• Block size is larger than a word – say 2^m words
• Cache has 2^n blocks
• Tag bits: 32 – (n + m + 2)
• Size: 2^n × (2^m×32 + (32 – n – m – 2) + 1) bits
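The size formula can be evaluated for a concrete (assumed) configuration:

```python
# Total bits in a direct-mapped cache with 2**n blocks of 2**m words,
# for 32-bit addresses (2 byte-offset bits), per the formula above.
def cache_bits(n, m):
    tag_bits = 32 - (n + m + 2)
    per_block = (2 ** m) * 32 + tag_bits + 1   # data + tag + valid bit
    return (2 ** n) * per_block

# Assumed example: 1024 one-word blocks (n=10, m=0):
print(cache_bits(10, 0))   # 1024 * (32 + 20 + 1) = 54272 bits
```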
Analysis of Block Size

Larger blocks exploit spatial locality; therefore, the miss rate is lowered.

What happens as the block size continues to grow?
• Cache size is fixed
• Number of cache blocks is reduced
• Contention for block space in the cache increases
• Miss rate goes up
Measuring Cache Performance

CPU time = (CPU execution cycles + memory stall cycles) × clock cycle time

Read-stall cycles = reads/program × read miss rate × read miss penalty

Writes are a problem because of write-buffer stalls.
Measuring Cache Performance: Simplifications

• Assume a write-through scheme
• Assume a well-designed system, so that write buffer stalls can be ignored
• Read and write miss penalties are the same

Memory-stall clock cycles = instructions/program × misses/instruction × miss penalty
Example

Assume:
• Instruction cache miss rate: 2%
• Data cache miss rate: 4%
• CPI (cycles per instruction): 2
• Miss penalty: 100 clock cycles
• SPECint2000 benchmark: 36% load & store instructions
• Clock cycle time: 1 ns (1×10^-9 s)

Find the CPU execution time. How much faster would a perfect cache be?
Solution

• Instruction miss cycles: I × 2% × 100 = 2I
• Data miss cycles: I × 36% × 4% × 100 = 1.44I
• Memory stall cycles: 2I + 1.44I = 3.44I
• CPI (with memory stalls): 2 + 3.44 = 5.44
• CPU execution time = 5.44I × 1 ns
• A perfect cache is 5.44/2 = 2.72 times faster
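The same arithmetic, per instruction so the instruction count I cancels:

```python
# Reproduce the example's numbers (stall cycles per instruction).
i_miss  = 0.02 * 100          # instruction miss cycles: 2
d_miss  = 0.36 * 0.04 * 100   # data miss cycles: 1.44
stalls  = i_miss + d_miss     # 3.44
cpi     = 2 + stalls          # 5.44 with memory stalls
speedup = cpi / 2             # perfect cache would be 2.72x faster
print(cpi, speedup)           # approximately 5.44 and 2.72
```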
Types of Cache Mappings

Direct mapped:
• Each block goes in only one place
• (block number) mod (# cache blocks)

Set associative:
• Each block can be mapped to n places in the cache
• (block number) mod (# sets in cache)

Fully associative:
• A block can map anywhere in the cache
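The three placement rules can be compared on one block number (a sketch; the 8-block cache size and block number 12 are assumed for illustration):

```python
# Where block 12 may reside in an assumed 8-block cache.
num_blocks = 8
block = 12

# Direct mapped: exactly one candidate location.
print(block % num_blocks)      # location 4

# 2-way set associative: 4 sets of 2 ways each;
# the block maps to one set but may use either way in it.
num_sets = num_blocks // 2
print(block % num_sets)        # set 0

# Fully associative: any of the 8 locations may hold the block,
# so a lookup must compare tags in every location.
```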
Types of Cache Mappings (2)

Set Associative Cache Mappings

Locating the Block in Cache
Virtual Memory: The Concept

Use main memory as a cache for magnetic disk.

Motivations:
• Safe and efficient sharing of main memory
• Remove the programmer's burden of handling small, limited amounts of memory

Invented in the 1960s.
Virtual Memory: Sharing Memory

Programs must be well behaved.

Main concept: each program has its own address space.

Virtual memory: address in program → physical address.

Protection:
• Protect one process from another
• A set of mechanisms for ensuring this
Virtual Memory: Small Memories

Without virtual memory, the programmer must make a large program fit in a small memory space. The solution was the use of overlays.

Even with our relatively large main memories, we would still have to do this today without virtual memory!
Virtual Memory: Terminology

• Page – the term for a cache block
• Page fault – the term for a cache miss
• Virtual address
  – An address within the program's address space
  – Translated to a physical address by a combination of hardware & software
  – This process is called address translation
Virtual Memory: Conceptual Diagram

Virtual Memory: Address Translation
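Address translation splits the virtual address into a page number and an offset; a minimal sketch, assuming a 4 KB page size and a hypothetical one-entry page table:

```python
# Split a 32-bit virtual address, assuming 4 KB pages.
PAGE_BITS = 12                       # 4 KB = 2**12 bytes

def translate(vaddr, page_table):
    vpn = vaddr >> PAGE_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # offset within the page
    ppn = page_table[vpn]                    # missing entry -> page fault
    return (ppn << PAGE_BITS) | offset       # physical address

page_table = {0x12345: 0x00042}      # hypothetical single mapping
print(hex(translate(0x12345ABC, page_table)))  # 0x42abc
```

Note the offset passes through unchanged; only the page number is translated.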
Virtual Memory: Page Faults

Main memory is approximately 100,000 times faster than disk, so a page fault is enormously costly.

Key decisions:
• Page size – 4 KB to 16 KB
• Reducing page faults is attractive
• Page faults can be handled in software
• Only write-back can be used
Virtual Memory: Placing & Finding a Page

Each process has its own page table.
Virtual Memory: Swap Space

(Diagram: swap space)
Virtual Memory: Translation-Lookaside Buffer