Memory Hierarchy
Maurizio Palesi
References
John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, Morgan Kaufmann. Chapter 5.
Who Cares About the Memory Hierarchy?
[Figure: Processor-DRAM memory gap (latency). Performance (log scale, 1 to 1000) vs. year, 1980-2000. Processor performance improves ~60%/year (2x every 1.5 years, "Moore's Law"), while DRAM performance improves only ~9%/year (2x every 10 years). The resulting processor-memory performance gap grows ~50% per year.]
Levels of the Memory Hierarchy
  Level          Capacity     Access time             Cost                        Unit transferred from level below (managed by)
  Registers      100s bytes   < 10s of ns             --                          instr. operands, 1-8 bytes (prog./compiler)
  Cache          KBytes       10-100 ns               1-0.1 cents/bit             blocks, 8-128 bytes (cache controller)
  Main memory    MBytes       200-500 ns              0.0001-0.00001 cents/bit    pages, 512 B-4 KB (OS)
  Disk           GBytes       10 ms (10,000,000 ns)   10^-5-10^-6 cents/bit       files, MBytes (user/operator)
  Tape           "infinite"   sec-min                 10^-8 cents/bit             --

Moving up the hierarchy the levels get faster; moving down they get larger (and cheaper per bit).
What is a Cache?
Small, fast storage used to improve the average access time to slow memory. It exploits spatial and temporal locality.
In computer architecture, almost everything is a cache!
- Registers: a cache on variables
- First-level cache: a cache on the second-level cache
- Second-level cache: a cache on memory
- Memory: a cache on disk (virtual memory)
- TLB: a cache on the page table
- Branch prediction: a cache on prediction information?
The Principle of Locality
Programs access a relatively small portion of the address space at any instant of time.
Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse).
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access).
For the last 20 years, hardware has relied on locality for speed.
Exploit Locality
By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.
DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense: a good choice for providing the user FAST access time.
General Principles
Locality:
- Temporal locality: referenced items tend to be referenced again soon.
- Spatial locality: nearby items tend to be referenced soon.
Locality + "smaller hardware is faster" = memory hierarchy.
- Levels: each level is smaller, faster, and more expensive per byte than the level below it.
- Inclusive: data found in the top level is also found in the levels below.
Definitions:
- The upper level is the one closer to the processor.
- A block is the minimum unit of data that is either present or not present in the upper level.
- Address = block frame address + block offset address.
Memory Hierarchy: Terminology
Hit: the data appears in some block in the upper level (example: block X).
- Hit rate: the fraction of memory accesses found in the upper level.
- Hit time: time to access the upper level, which consists of RAM access time + time to determine hit/miss.
Miss: the data must be retrieved from a block in the lower level (block Y).
- Miss rate = 1 - (hit rate).
- Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor.
Hit time << miss penalty (on the order of 500 instructions on the Alpha 21264!)
[Figure: block X in the upper-level memory and block Y in the lower-level memory, with data moving to/from the processor.]
Cache Measures
Average memory-access time = Hit time + Miss rate x Miss penalty [ns or clocks]
Miss penalty: time to replace a block from the lower level, including the time to deliver it to the CPU.
- Access time: time to reach the lower level = f(latency to lower level).
- Transfer time: time to transfer the block = f(bandwidth between upper and lower levels).
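As a minimal illustration of the formula above, the sketch below computes the average memory-access time for a single cache level; the numbers in the usage example are assumed, not measured.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory-access time = hit time + miss rate * miss penalty.
    All times must be in the same unit (ns or clock cycles)."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit, 5% miss rate, 100-cycle miss penalty
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))  # -> 6.0 cycles
```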
Block Size Tradeoff
In general, a larger block size takes advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill up the block.
- If the block size is too big relative to the cache size, the miss rate goes up: there are too few cache blocks.
In general, Average access time = Hit time + Miss penalty x Miss rate.
[Figure: as block size grows, the miss penalty increases; the miss rate first drops (exploiting spatial locality) and then rises (too few blocks compromises temporal locality); the average access time therefore reaches a minimum and then increases due to the larger miss penalty and miss rate.]
4 Questions for the Memory Hierarchy
- Q1: Where can a block be placed in the upper level? (Block placement) Fully associative, set associative, or direct mapped.
- Q2: How is a block found if it is in the upper level? (Block identification) Tag/block.
- Q3: Which block should be replaced on a miss? (Block replacement) Random or LRU.
- Q4: What happens on a write? (Write strategy) Write back or write through (with a write buffer).
Q1: Where Can a Block Be Placed in the Upper Level?
- Fully associative: block 12 can go anywhere in the cache.
- Direct mapped: block 12 can go only into cache block 4 (12 mod 8).
- Set associative: block 12 can go anywhere in set 0 (12 mod 4).
[Figure: an 8-block cache shown as fully associative, direct mapped, and 2-way set associative (sets 0-3), plus a 32-block main memory (block frame addresses 0-31); memory block 12 is placed according to each policy.]
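A small sketch of the three placement policies above: it maps a memory block number to the cache set it may occupy, reproducing the block-12 example from the figure (function and parameter names are illustrative).

```python
def candidate_set(block_addr, num_cache_blocks, assoc):
    """Return the set index a memory block maps to.
    assoc = 1                 -> direct mapped (one block per set)
    assoc = num_cache_blocks  -> fully associative (a single set)
    otherwise                 -> set associative."""
    num_sets = num_cache_blocks // assoc
    return block_addr % num_sets

# 8-block cache, memory block 12 (as in the figure above)
print(candidate_set(12, 8, assoc=1))  # direct mapped: cache block 4
print(candidate_set(12, 8, assoc=2))  # 2-way set associative: set 0
print(candidate_set(12, 8, assoc=8))  # fully associative: set 0 (the only set)
```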
Q2: How Is a Block Found If It Is in the Upper Level?
- A tag is kept on each block: there is no need to check the index or the block offset.
- Increasing associativity shrinks the index and expands the tag.
- Fully associative: no index. Direct mapped: large index.
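To make the index/tag tradeoff concrete, here is a sketch that splits an address into tag, index, and block offset fields for a given cache geometry; the 4 KB cache with 16-byte blocks in the example is an assumed geometry for illustration.

```python
def split_address(addr, cache_bytes, block_bytes, assoc):
    """Split an address into (tag, index, offset) fields.
    Higher associativity -> fewer sets -> fewer index bits -> more tag bits.
    Sizes are assumed to be powers of two."""
    num_sets = cache_bytes // (block_bytes * assoc)
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Assumed geometry: 4 KB cache, 16-byte blocks
print(split_address(0x1234, 4096, 16, assoc=1))  # direct mapped: 8 index bits
print(split_address(0x1234, 4096, 16, assoc=4))  # 4-way: 6 index bits, larger tag
```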
Cache Direct Mapped
[Figure: a 32-block main memory (addresses 00000-11111, i.e. 0-31) divided into four partitions of eight blocks each; every partition maps onto the same eight blocks (0-7) of a direct-mapped cache. A tag stored with each cache block identifies which partition the block it currently holds came from.]
Q3: Which Block Should Be Replaced on a Miss?
- Easy for direct mapped: there is only one candidate.
- Set associative or fully associative:
  - Random (used with large associativities)
  - LRU (used with smaller associativities)

Miss rates, LRU vs. random replacement:

  Associativity:       2-way            4-way            8-way
  Size        LRU      Random   LRU     Random   LRU     Random
  16 KB       5.2%     5.7%     4.7%    5.3%     4.4%    5.0%
  64 KB       1.9%     2.0%     1.5%    1.7%     1.4%    1.5%
  256 KB      1.15%    1.17%    1.13%   1.13%    1.12%   1.12%
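A minimal sketch of the two replacement policies named above, choosing a victim way within one set; it is illustrative only, not how real hardware tracks LRU state.

```python
import random

def choose_victim_random(num_ways):
    """Random replacement: pick any way in the set."""
    return random.randrange(num_ways)

def choose_victim_lru(last_used):
    """LRU replacement: evict the way whose last access is oldest.
    last_used[i] holds the time of the most recent access to way i."""
    return min(range(len(last_used)), key=lambda i: last_used[i])

# Example: a 4-way set where way 2 was touched longest ago (assumed timestamps)
last_used = [10, 7, 3, 9]
print(choose_victim_lru(last_used))      # -> 2
print(choose_victim_random(len(last_used)))  # -> any of 0..3
```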
Q4: What Happens on a Write?
- Write through: the information is written both to the block in the cache and to the block in the lower-level memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced. Is the block clean or dirty?
Pros and cons of each:
- WT: read misses cannot result in writes (caused by replacements).
- WB: repeated writes to the same block do not all reach memory.
WT is always combined with a write buffer so that the processor does not wait for the lower-level memory.
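The sketch below contrasts the two policies on a store, using a dirty set for write back; it is a simplified model under assumptions (block granularity elided, no write-allocate decision shown).

```python
def store_write_through(cache, memory, addr, value):
    """Write through: a store updates the cache and the lower-level memory."""
    cache[addr] = value
    memory[addr] = value          # in practice buffered by a write buffer

def store_write_back(cache, dirty, addr, value):
    """Write back: a store updates only the cache and marks the block dirty."""
    cache[addr] = value
    dirty.add(addr)               # memory is updated only when the block is evicted

def evict_write_back(cache, dirty, memory, addr):
    """On replacement, a dirty block is written back to memory exactly once."""
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]
```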
Write Buffer for Write Through
A write buffer is needed between the cache and memory:
- Processor: writes data into the cache and into the write buffer.
- Memory controller: writes the contents of the buffer to memory.
The write buffer is just a FIFO:
- Typical number of entries: 4.
- Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle.
The memory system designer's nightmare:
- Store frequency (w.r.t. time) > 1 / DRAM write cycle.
- Write buffer saturation.
[Figure: the processor writes into the cache and the write buffer; the write buffer drains into DRAM.]
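A toy model of the FIFO write buffer described above; the 4-entry depth matches the slide, while the "stall on saturation" behavior is an assumption of this sketch.

```python
from collections import deque

class WriteBuffer:
    """FIFO write buffer between a write-through cache and DRAM."""
    def __init__(self, entries=4):
        self.entries = entries
        self.fifo = deque()

    def store(self, addr, value):
        """Processor side: returns True if the store was buffered,
        False if the buffer is saturated and the processor must stall."""
        if len(self.fifo) >= self.entries:
            return False                    # write buffer saturation
        self.fifo.append((addr, value))
        return True

    def drain_one(self, dram):
        """Memory-controller side: retire one buffered write per DRAM write cycle."""
        if self.fifo:
            addr, value = self.fifo.popleft()
            dram[addr] = value
```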
How a Block Is Found in Cache
[Figure: direct-mapped lookup. The CPU address is split into tag, index, and offset; the index selects one cache entry, the stored cache tag is compared against the address tag to produce hit/miss, and the selected cache data is driven onto the CPU data bus.]
How a Block Is Found in Cache
[Figure: 2-way set-associative lookup. Two sets of address tags and data RAM are read in parallel using the address (index) bits; the two tag comparisons drive a 2:1 multiplexer that selects the data from the matching way.]
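A software sketch of the two-way lookup just described: both ways of the selected set are probed and the matching way's data is returned (a stand-in for the 2:1 mux). The data structures and the 16-byte block size are illustrative assumptions.

```python
def lookup_2way(sets, addr, block_bytes=16):
    """2-way set-associative lookup.
    sets[index] is a list of two ways, each a dict {'valid', 'tag', 'data'}."""
    num_sets = len(sets)
    offset = addr % block_bytes
    index = (addr // block_bytes) % num_sets
    tag = addr // (block_bytes * num_sets)

    for way in sets[index]:                       # both tags compared "in parallel"
        if way['valid'] and way['tag'] == tag:    # one comparator per way
            return way['data'][offset]            # 2:1 mux selects the hit way
    return None                                   # miss
```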
Reducing Misses
Classifying misses: the 3 Cs
- Compulsory: the first access to a block cannot find it in the cache, so the block must be brought in. Also called cold-start misses or first-reference misses.
- Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur because blocks are discarded and later retrieved.
- Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision misses or interference misses.
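This classification can be approximated in software: a miss to a never-seen block is compulsory; a miss that a fully associative LRU cache of the same capacity would also take is a capacity miss; whatever a real set-associative or direct-mapped cache adds on top is conflict. The sketch below follows that assumption.

```python
from collections import OrderedDict

def classify_misses(trace, num_blocks):
    """Classify misses of a trace of block addresses against a fully
    associative LRU cache of `num_blocks` blocks (3 Cs model).
    Conflict misses = misses of the real cache minus the misses counted here."""
    seen = set()
    fa_cache = OrderedDict()            # fully associative, kept in LRU order
    counts = {'compulsory': 0, 'capacity': 0, 'hit': 0}

    for block in trace:
        if block in fa_cache:
            fa_cache.move_to_end(block)            # refresh LRU position
            counts['hit'] += 1
        else:
            counts['compulsory' if block not in seen else 'capacity'] += 1
            if len(fa_cache) >= num_blocks:
                fa_cache.popitem(last=False)       # evict LRU block
            fa_cache[block] = True
        seen.add(block)
    return counts

# Tiny assumed trace with a 2-block cache
print(classify_misses([0, 1, 2, 0, 1, 2], num_blocks=2))
```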
3Cs Absolute Miss Rate (SPEC92)
[Figure: miss rate per type (0 to 0.14) vs. cache size (1 KB to 128 KB) for 1-way, 2-way, 4-way, and 8-way caches, with the capacity and compulsory components marked.]
2:1 Cache Rule
Miss rate of a 1-way (direct-mapped) cache of size X is approximately equal to the miss rate of a 2-way set-associative cache of size X/2.
[Figure: miss rate per type (0 to 0.14) vs. cache size (1 KB to 128 KB) for 1-way through 8-way caches, with the capacity and compulsory components marked; illustrates the 2:1 rule.]
3Cs Relative Miss Rate
[Figure: relative miss rate per type (0% to 100%) vs. cache size (1 KB to 128 KB) for 1-way through 8-way caches, showing the shares of conflict, capacity, and compulsory misses.]