Memory Hierarchy - Unict · SRAM is fast but expensive and not very dense Good choice for providing...

Date post: 18-Aug-2018
Upload: phambao

Memory Hierarchy

Maurizio Palesi

References

John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, second edition, Morgan Kaufmann, Chapter 5.

Who Cares About the Memory Hierarchy?

[Figure: Processor-DRAM Memory Gap (latency). Performance on a log scale (1 to 1000) vs. year, 1980 to 2000. CPU ("Moore's Law"): 60%/yr (2X every 1.5 years). DRAM: 9%/yr (2X every 10 years). The processor-memory performance gap grows 50% per year.]
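The growth rates in the figure compound yearly, so the doubling times and the widening gap follow from them by simple arithmetic. The sketch below assumes the gap's growth is the ratio of the two annual growth factors; the constant and function names are illustrative only.

```python
import math

CPU_GROWTH = 0.60   # microprocessor performance growth per year (from the figure)
DRAM_GROWTH = 0.09  # DRAM performance growth per year (from the figure)

def doubling_time(rate: float) -> float:
    """Years needed to double performance at a compound annual growth rate."""
    return math.log(2) / math.log(1 + rate)

def gap_growth(cpu_rate: float, dram_rate: float) -> float:
    """Annual growth of the CPU/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1

print(f"CPU doubles every {doubling_time(CPU_GROWTH):.1f} years")
print(f"DRAM doubles every {doubling_time(DRAM_GROWTH):.1f} years")
print(f"Gap grows {gap_growth(CPU_GROWTH, DRAM_GROWTH):.0%} per year")
```

Compounding 60% per year gives a CPU doubling time of about 1.5 years. Compounding 9% per year gives a DRAM doubling time of about 8 years. Together they give a gap that grows about 47% per year, which the slide rounds to "2X/10 yrs" and "50%".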

Levels of the Memory Hierarchy

Level           Capacity        Access time              Cost
CPU registers   100s of bytes   <10s ns
Cache           KBytes          10-100 ns                1-0.1 cents/bit
Main memory     MBytes          200-500 ns               .0001-.00001 cents/bit
Disk            GBytes          10 ms (10,000,000 ns)    10^-5 - 10^-6 cents/bit
Tape            infinite        sec-min                  10^-8

Staging (transfer unit) between adjacent levels:

Levels                Transfer unit     Managed by          Size
Registers <-> Cache   Instr. operands   prog./compiler      1-8 bytes
Cache <-> Memory      Blocks            cache controller    8-128 bytes
Memory <-> Disk       Pages             OS                  512-4K bytes
Disk <-> Tape         Files             user/operator       MBytes

Upper levels are faster; lower levels are larger.

What is a Cache?

- Small, fast storage used to improve average access time to slow memory
- Exploits spatial and temporal locality
- In computer architecture, almost everything is a cache!
  - Registers: a cache on variables
  - First-level cache: a cache on the second-level cache
  - Second-level cache: a cache on memory
  - Memory: a cache on disk (virtual memory)
  - TLB: a cache on the page table
  - Branch prediction: a cache on prediction information?

The Principle of Locality

- Programs access a relatively small portion of the address space at any instant of time.
- Two different types of locality:
  - Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse).
  - Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access).
- For the last 20 years, hardware has relied on locality for speed.
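The effect of spatial locality can be seen by counting misses for two traversals of the same array. This minimal sketch assumes a hypothetical direct-mapped cache of 8 lines with 64-byte blocks and a 16x16 matrix of 8-byte elements stored row-major from address 0. None of these sizes come from the slides.

```python
def count_misses(addresses, num_lines=8, block_size=64):
    """Count misses in a direct-mapped cache that stores one tag per line."""
    lines = [None] * num_lines          # tag currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // block_size      # block-frame address
        index = block % num_lines       # the only line this block can use
        tag = block // num_lines
        if lines[index] != tag:         # miss: fetch the block, replacing the old one
            lines[index] = tag
            misses += 1
    return misses

N, ELEM = 16, 8  # hypothetical 16x16 matrix of 8-byte elements, row-major layout

def addr(i, j):
    return (i * N + j) * ELEM

row_major = [addr(i, j) for i in range(N) for j in range(N)]  # consecutive addresses
col_major = [addr(i, j) for j in range(N) for i in range(N)]  # 128-byte stride

print("row-major misses:", count_misses(row_major))
print("column-major misses:", count_misses(col_major))
```

The row-major walk misses once per 64-byte block, giving 32 misses out of 256 accesses. The column-major walk touches a different block on every access, and those blocks keep evicting one another, so all 256 accesses miss.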

Exploit Locality

By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.

DRAM is slow but cheap and dense: a good choice for presenting the user with a BIG memory system.
SRAM is fast but expensive and not very dense: a good choice for providing the user FAST access time.

General Principles

- Locality
  - Temporal locality: referenced again soon
  - Spatial locality: nearby items referenced soon
- Locality + smaller HW is faster = memory hierarchy
  - Levels: each is smaller, faster, and more expensive per byte than the level below
  - Inclusive: data found in the top level is also found in the bottom level
- Definitions
  - Upper is closer to the processor
  - Block: the minimum unit that is either present or not present in the upper level
  - Address = block frame address + block offset address

Memory Hierarchy: Terminology

- Hit: the data appears in some block in the upper level (example: Block X)
  - Hit rate: the fraction of memory accesses found in the upper level
  - Hit time: the time to access the upper level, which consists of RAM access time + time to determine hit/miss
- Miss: the data must be retrieved from a block in the lower level (Block Y)
  - Miss rate = 1 - (hit rate)
  - Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor
- Hit time << miss penalty (500 instructions on the 21264!)

[Figure: the processor exchanges data with the upper-level memory (holding Blk X), which in turn exchanges blocks with the lower-level memory (holding Blk Y).]

Cache Measures

- Average memory-access time = Hit time + Miss rate × Miss penalty [ns or clocks]
- Miss penalty: time to replace a block from the lower level, including time to replace in the CPU
  - Access time: time to reach the lower level = f(latency to lower level)
  - Transfer time: time to transfer the block = f(BW between upper & lower levels)
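The two formulas on this slide combine directly. In this sketch, miss penalty is modeled as access time (latency) plus transfer time (block size divided by bandwidth), and the parameters are hypothetical values in clock cycles, chosen only for illustration.

```python
def miss_penalty(latency, block_size, bandwidth):
    """Miss penalty = access time (latency to lower level) + transfer time (block / bandwidth)."""
    return latency + block_size / bandwidth

def amat(hit_time, miss_rate, penalty):
    """Average memory-access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * penalty

# Hypothetical parameters (clock cycles): 40-cycle latency, 16 bytes/cycle, 64-byte blocks
penalty = miss_penalty(latency=40, block_size=64, bandwidth=16)   # 40 + 64/16 = 44
print(amat(hit_time=1, miss_rate=0.05, penalty=penalty))          # 1 + 0.05 * 44 = 3.2
```

Even a 5% miss rate more than triples the average access time relative to a 1-cycle hit. This is why "hit time << miss penalty" makes the miss rate so important.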

Block Size Tradeoff

- In general, a larger block size takes advantage of spatial locality, BUT:
  - A larger block size means a larger miss penalty: it takes longer to fill up the block.
  - If the block size is too big relative to the cache size, the miss rate will go up: there are too few cache blocks.
- In general, Average Access Time = Hit Time + Miss Penalty × Miss Rate

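[Figure: three plots against block size. Miss penalty grows with block size. Miss rate first falls (exploits spatial locality), then rises (fewer blocks compromise temporal locality). Average access time therefore rises at large block sizes because of increased miss penalty and miss rate.]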

4 Questions for Memory Hierarchy

- Q1: Where can a block be placed in the upper level? (Block placement)
  - Fully associative, set associative, direct mapped
- Q2: How is a block found if it is in the upper level? (Block identification)
  - Tag/block
- Q3: Which block should be replaced on a miss? (Block replacement)
  - Random, LRU
- Q4: What happens on a write? (Write strategy)
  - Write back or write through (with write buffer)

Q1: Where can a block be placed in the upper level?

Example: an 8-block cache (block no. 0-7) and memory block-frame addresses 0-31.
- Fully associative: block 12 can go anywhere.
- Direct mapped: block 12 can go only into block 4 (12 mod 8).
- Set associative (four sets, Set 0 to Set 3, of two blocks each): block 12 can go anywhere in set 0 (12 mod 4).
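All three placement rules are instances of one computation: (block number) mod (number of sets) picks the set, and the block may occupy any frame in that set. This sketch assumes set s occupies frames s*ways through s*ways + ways - 1, matching the figure's layout. The function name is illustrative.

```python
def candidate_frames(block, num_frames, ways):
    """Return the cache frames where memory block `block` may be placed.

    ways == 1           -> direct mapped
    ways == num_frames  -> fully associative
    otherwise           -> set associative with num_frames // ways sets,
                           set s occupying frames s*ways .. s*ways + ways - 1
    """
    num_sets = num_frames // ways
    s = block % num_sets          # set chosen by (block number) mod (number of sets)
    return list(range(s * ways, (s + 1) * ways))

for name, ways in [("fully associative", 8), ("direct mapped", 1), ("2-way set associative", 2)]:
    print(f"{name}: block 12 -> frames {candidate_frames(12, 8, ways)}")
```

With 8 frames, direct mapped means 8 sets of one frame (12 mod 8 = 4). 2-way set associative means 4 sets of two frames (12 mod 4 = 0, i.e. frames 0 and 1). Fully associative is a single set of all 8 frames.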

Q2: How Is a Block Found If It Is in the Upper Level?

- Tag on each block
  - No need to check index or block offset
- Increasing associativity shrinks the index and expands the tag
  - Fully associative: no index
  - Direct mapped: large index
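The tag/index trade-off can be checked by computing field widths for one cache at several associativities. The sketch assumes hypothetical parameters (32-bit byte addresses, a 16 KB cache, 32-byte blocks), and all sizes must be powers of two.

```python
def log2(n):
    """Exact log2 for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "sizes must be powers of two"
    return n.bit_length() - 1

def field_widths(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits) for a byte-addressed cache."""
    offset_bits = log2(block_bytes)
    index_bits = log2(cache_bytes // (block_bytes * ways))   # log2(number of sets)
    return addr_bits - index_bits - offset_bits, index_bits, offset_bits

def split_address(addr, cache_bytes, block_bytes, ways):
    """Split a byte address into (tag, index, offset)."""
    offset_bits = log2(block_bytes)
    index_bits = log2(cache_bytes // (block_bytes * ways))
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Hypothetical cache: 16 KB, 32-byte blocks, 32-bit addresses; 512 ways = fully associative
for ways in (1, 2, 4, 512):
    print(ways, "way(s): (tag, index, offset) bits =", field_widths(32, 16 * 1024, 32, ways))
```

Each doubling of associativity halves the number of sets, moving one bit from the index to the tag. The fully associative case has no index at all, and the block offset never changes.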

Cache Direct Mapped

[Figure: main memory of 32 blocks, 00000 (0) to 11111 (31), divided into four partitions (Part. 0: 0-7, Part. 1: 8-15, Part. 2: 16-23, Part. 3: 24-31), mapped onto a cache of 8 blocks (0-7).]

Cache Direct Mapped

[Figure: the same mapping, with a tag field stored next to each of the 8 cache blocks (0-7).]
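For the figure's layout (32 memory blocks, 8 cache blocks), each 5-bit block number splits into a 3-bit line index and a 2-bit tag. This sketch assumes the tag is exactly the partition number, which is consistent with the figure but inferred rather than stated. The names are illustrative.

```python
CACHE_LINES = 8      # cache blocks 0..7, as in the figure
MEMORY_BLOCKS = 32   # memory blocks 00000..11111

def direct_map(block):
    """Map a memory block to (cache line, tag) in the 8-line direct-mapped cache."""
    line = block % CACHE_LINES    # low 3 bits of the 5-bit block number
    tag = block // CACHE_LINES    # high 2 bits: the partition number (Part. 0-3)
    return line, tag

for block in (0, 7, 8, 15, 16, 23, 24, 31):
    line, tag = direct_map(block)
    print(f"block {block:05b} ({block:2d}) -> line {line}, tag {tag:02b}")
```

Blocks in the same position of different partitions (e.g. 0, 8, 16, 24) share a cache line. The stored tag is what tells them apart on a lookup.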

Q3: Which Block Should be Replaced on a Miss?

- Easy for direct mapped
- Set associative or fully associative:
  - Random (large associativities)
  - LRU (smaller associativities)

Miss rates, LRU vs. random (RND) replacement:

Associativity:   2-way           4-way           8-way
Size             LRU     RND     LRU     RND     LRU     RND
16 KB            5.2%    5.7%    4.7%    5.3%    4.4%    5.0%
64 KB            1.9%    2.0%    1.5%    1.7%    1.4%    1.5%
256 KB           1.15%   1.17%   1.13%   1.13%   1.12%   1.12%
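The two policies differ only in how a victim is chosen once a set is full. This sketch models a single set. It assumes `OrderedDict` ordering as the LRU bookkeeping and Python's seeded `random.Random` as the random policy, which are software stand-ins rather than a hardware design.

```python
import random
from collections import OrderedDict

class CacheSet:
    """One set of a set-associative cache holding up to `ways` block tags."""

    def __init__(self, ways, policy="lru", seed=0):
        self.ways = ways
        self.policy = policy
        self.blocks = OrderedDict()        # tags ordered from least to most recently used
        self.rng = random.Random(seed)     # used only by the random policy

    def access(self, tag):
        """Return True on a hit, False on a miss (after filling the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)   # mark as most recently used
            return True
        if len(self.blocks) == self.ways:  # set full: choose a victim
            if self.policy == "lru":
                self.blocks.popitem(last=False)              # evict least recently used
            else:
                victim = self.rng.choice(list(self.blocks))  # evict a random block
                del self.blocks[victim]
        self.blocks[tag] = None
        return False

trace = ["A", "B", "A", "C", "A", "B"]     # blocks that all map to one 2-way set
for policy in ("lru", "random"):
    s = CacheSet(ways=2, policy=policy)
    print(policy, "misses:", sum(not s.access(tag) for tag in trace))
```

On this trace LRU keeps the reused block A and misses 4 times. Random may evict A and miss 5 times, a small version of the gap in the table at small cache sizes.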

Q4: What Happens on a Write?

- Write through: the information is written both to the block in the cache and to the block in the lower-level memory.
- Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  - Is the block clean or dirty?
- Pros and cons of each:
  - WT: read misses cannot result in writes (because of replacements).
  - WB: repeated writes to a block do not each cause a memory write.
- WT is always combined with write buffers so that the processor doesn't wait for lower-level memory.
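Both pros and cons show up when counting writes that reach memory. This sketch uses a hypothetical 4-line direct-mapped cache and assumes both policies allocate the block on a miss. Dirty blocks still cached at the end are not counted.

```python
def memory_writes(trace, num_lines, policy):
    """Count writes reaching lower-level memory for a direct-mapped cache.

    trace:  list of ("R" | "W", block_number)
    policy: "WT" (write through) or "WB" (write back)
    Assumption: both policies allocate the block on a miss; dirty blocks
    still cached at the end of the trace are not counted.
    """
    lines = [None] * num_lines     # block number cached in each line
    dirty = [False] * num_lines    # WB only: line modified since it was filled
    writes = 0
    for op, block in trace:
        i = block % num_lines
        if lines[i] != block:                  # miss: replace the line
            if policy == "WB" and dirty[i]:
                writes += 1                    # write back the dirty victim
            lines[i], dirty[i] = block, False
        if op == "W":
            if policy == "WT":
                writes += 1                    # every store goes to memory
            else:
                dirty[i] = True                # defer until replacement
    return writes

# Ten stores to block 0, then a read of block 4, which maps to the same line
trace = [("W", 0)] * 10 + [("R", 4)]
print("WT:", memory_writes(trace, 4, "WT"))
print("WB:", memory_writes(trace, 4, "WB"))
```

Write through sends all 10 stores to memory, and its read miss writes nothing. Write back absorbs the repeated stores but pays one memory write on the read miss, because that miss evicts a dirty block.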

Write Buffer for Write Through

- A write buffer is needed between the cache and memory
  - Processor: writes data into the cache and the write buffer
  - Memory controller: writes the contents of the buffer to memory
- The write buffer is just a FIFO
  - Typical number of entries: 4
  - Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
- Memory system designer's nightmare:
  - Store frequency (w.r.t. time) > 1 / DRAM write cycle
  - Write buffer saturation

[Figure: the processor writes into the cache and the write buffer; the write buffer drains to DRAM.]
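Saturation can be reproduced with a small timing model of the FIFO. This sketch makes several assumptions: DRAM completes one write at a time, an entry stays in the buffer until its write finishes, and the processor stalls only when all 4 entries are occupied. Timings are hypothetical clock cycles.

```python
from collections import deque

def stall_cycles(store_times, write_cycle, depth=4):
    """Simulate a FIFO write buffer between cache and DRAM.

    store_times: cycles at which the processor issues stores (non-decreasing)
    write_cycle: cycles DRAM needs to complete one write
    depth:       number of buffer entries (4, as on the slide)
    Returns the total cycles the processor stalls waiting for a free entry.
    """
    buffer = deque()      # completion times of writes queued or in progress
    stalls = 0
    delay = 0             # earlier stalls push back later stores
    for t in store_times:
        now = t + delay
        while buffer and buffer[0] <= now:   # retire writes already finished
            buffer.popleft()
        if len(buffer) == depth:             # full: wait for the oldest write
            wait = buffer[0] - now
            stalls += wait
            delay += wait
            now += wait
            buffer.popleft()
        start = max(now, buffer[-1]) if buffer else now   # DRAM does one write at a time
        buffer.append(start + write_cycle)
    return stalls

print(stall_cycles(range(0, 100, 10), write_cycle=5))  # store every 10 cycles
print(stall_cycles(range(20), write_cycle=5))          # store every cycle
```

With one store every 10 cycles and a 5-cycle DRAM write, the buffer never fills and there are no stalls. With one store per cycle, the buffer fills after four stores, and from then on the processor is throttled to the DRAM's rate of one write every 5 cycles.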

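How a Block is Found in Cache

[Figure: the CPU address is split into tag, index, and offset; the index selects an entry in the cache tag and cache data arrays, the stored tag is compared with the address tag to produce hit/miss, and the data goes to the CPU data bus.]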

How a Block is Found in Cache

- Two sets of address tags and data RAM
- 2:1 mux for the way
- Use address bits to select the correct DRAM
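A two-way lookup can be modeled in software: both ways are indexed with the same set index, both stored tags are compared, and the matching way selects the data (the role of the 2:1 mux). In hardware the two comparisons happen in parallel; the loop below is only a sequential model. The class and its sizes are hypothetical.

```python
class TwoWayCache:
    """Two tag/data arrays ("ways") indexed by the same set index."""

    def __init__(self, num_sets, block_bytes):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        # Each way holds one (valid, tag, data) entry per set.
        self.ways = [[(False, 0, None)] * num_sets for _ in range(2)]

    def split(self, addr):
        """Split a byte address into (tag, set index, block offset)."""
        block = addr // self.block_bytes
        return block // self.num_sets, block % self.num_sets, addr % self.block_bytes

    def lookup(self, addr):
        """Compare the tag in both ways; the matching way drives the 2:1 mux."""
        tag, index, offset = self.split(addr)
        for way in self.ways:
            valid, stored_tag, data = way[index]
            if valid and stored_tag == tag:
                return True, data[offset]     # hit: select this way's data
        return False, None                    # miss in both ways

    def fill(self, addr, way_no, data):
        """Install a block in the given way of its set."""
        tag, index, _ = self.split(addr)
        self.ways[way_no][index] = (True, tag, data)

c = TwoWayCache(num_sets=4, block_bytes=4)
c.fill(0x00, 0, "ABCD")   # set 0, tag 0, way 0
c.fill(0x40, 1, "WXYZ")   # set 0, tag 4, way 1
print(c.lookup(0x02), c.lookup(0x43), c.lookup(0x80))
```

Addresses 0x00 and 0x40 map to the same set, so a direct-mapped cache of this size could hold only one of them. Here each way holds one, and both stay resident.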

Reducing Misses

Classifying misses: the 3 Cs
- Compulsory: the first access to a block is not in the cache, so the block must be brought into the cache. Also called cold-start misses or first-reference misses.
- Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved.
- Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses.
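The definitions suggest a way to measure the 3 Cs, assumed here rather than taken from the slides. A miss is compulsory on the block's first reference. It is a capacity miss if a fully associative LRU cache of the same size also misses. Otherwise it is a conflict miss. This sketch applies that to a direct-mapped cache of `num_lines` one-block lines.

```python
from collections import OrderedDict

def classify_misses(blocks, num_lines):
    """Classify the misses of a direct-mapped cache into the 3 Cs."""
    direct = [None] * num_lines           # direct-mapped cache under study
    full = OrderedDict()                  # fully associative LRU cache of the same size
    seen = set()                          # blocks referenced at least once
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for b in blocks:
        # Reference the fully associative LRU cache.
        full_hit = b in full
        if full_hit:
            full.move_to_end(b)
        else:
            if len(full) == num_lines:
                full.popitem(last=False)  # evict least recently used
            full[b] = None
        # Reference the direct-mapped cache and classify its misses.
        i = b % num_lines
        if direct[i] != b:
            direct[i] = b
            if b not in seen:
                counts["compulsory"] += 1
            elif not full_hit:
                counts["capacity"] += 1   # even full associativity would miss
            else:
                counts["conflict"] += 1   # only the placement restriction caused it
        seen.add(b)
    return counts

print(classify_misses([0, 4, 0, 4], num_lines=4))         # 0 and 4 share line 0
print(classify_misses([0, 1, 2, 3, 4, 0], num_lines=4))   # 5 blocks, 4 lines
```

In the first trace, blocks 0 and 4 keep evicting each other even though the cache has room for both, so the repeat misses are conflict misses. In the second, five distinct blocks exceed four lines, so the final miss on block 0 counts as capacity.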

3Cs Absolute Miss Rate (SPEC92)

[Figure: miss rate per type (0 to 0.14) vs. cache size (1 to 128 KB) for 1-way, 2-way, 4-way, and 8-way associativity, with the capacity and compulsory components marked.]

2:1 Cache Rule: the miss rate of a 1-way associative cache of size X ≈ the miss rate of a 2-way associative cache of size X/2

[Figure: miss rate per type (0 to 0.14) vs. cache size (1 to 128 KB) for 1-way, 2-way, 4-way, and 8-way associativity, with the capacity and compulsory components marked.]

3Cs Relative Miss Rate

[Figure: miss rate per type as a percentage of total misses (0% to 100%) vs. cache size (1 to 128 KB) for 1-way, 2-way, 4-way, and 8-way associativity, split into conflict, capacity, and compulsory components.]

