Chapter 2 (and Appendix B): Memory Hierarchy Design
Computer Architecture: A Quantitative Approach, Fifth Edition
Copyright © 2012, Elsevier Inc. All rights reserved.
Transcript
Page 1:

Chapter 2 (and Appendix B)

Memory Hierarchy Design

Computer Architecture: A Quantitative Approach, Fifth Edition

Page 2:

In the beginning…

- Main memory (i.e., RAM) was faster than processors
  - Memory access times were less than clock cycle times
- Everything was “great” until ~1980

Page 3:

Memory Performance Gap

Page 4:

Memory Performance Gap

- Programmers want unlimited amounts of memory with low latency
- Fast memory technology is more expensive per bit than slower memory
- Solution: organize the memory system into a hierarchy
  - The entire addressable memory space is available in the largest, slowest memory
  - Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor
- Temporal and spatial locality ensure that nearly all references can be found in the smaller memories (see the locality sketch below)
- This gives the illusion of a large, fast memory being presented to the processor
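As an illustration of the locality the hierarchy relies on, here is a minimal C sketch (the array name, size, and loop are invented for the example): a sequential walk over an array exhibits spatial locality, and the reused accumulator exhibits temporal locality.

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int data[N];   /* contiguous, zero-initialized block of memory */
    long sum = 0;

    /* Sequential walk: consecutive elements share cache blocks, so after
     * the first miss per block the following accesses hit (spatial
     * locality); `sum` is reused every iteration (temporal locality).   */
    for (int i = 0; i < N; i++)
        sum += data[i];

    printf("sum = %ld\n", sum);
    return 0;
}
```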

Page 5:

Memory Hierarchy

Page 6:

Memory Hierarchy Design

- Memory hierarchy design becomes more crucial with recent multi-core processors: aggregate peak bandwidth grows with the number of cores
  - The Intel Core i7 can generate two data references per core per clock
  - With four cores and a 3.2 GHz clock:
    - 25.6 billion 64-bit data references/second + 12.8 billion 128-bit instruction references/second = 409.6 GB/s (see the arithmetic below)
  - DRAM bandwidth is only 6% of this (25 GB/s)
- Requires:
  - Multi-port, pipelined caches
  - Two levels of cache per core
  - A shared third-level cache on chip
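A quick check of the bandwidth arithmetic above, using only the figures from the slide (4 cores, 3.2 GHz, two 8-byte data references and one 16-byte instruction reference per core per clock):

```c
#include <stdio.h>

int main(void) {
    const double cores = 4;
    const double clock_hz = 3.2e9;

    /* Two 64-bit (8-byte) data references per core per clock. */
    double data_refs = cores * clock_hz * 2;      /* 25.6e9 references/s */
    /* One 128-bit (16-byte) instruction reference per core per clock. */
    double inst_refs = cores * clock_hz;          /* 12.8e9 references/s */

    double bytes_per_sec = data_refs * 8 + inst_refs * 16;
    printf("peak demand: %.1f GB/s\n", bytes_per_sec / 1e9);            /* 409.6 */
    printf("25 GB/s DRAM covers %.1f%%\n", 25e9 / bytes_per_sec * 100); /* ~6.1  */
    return 0;
}
```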

Page 7:

Performance and Power

- High-end microprocessors have >10 MB of on-chip cache
  - This consumes a large share of the area and power budget

Page 8:

Cache Hits

When a word is found in the cache, a hit occurs: Yay!

Page 9:

Cache Misses

- When a word is not found in the cache, a miss occurs:
  - Fetch the word from the lower level in the hierarchy, requiring a higher-latency reference
    - The lower level may be another cache or the main memory
  - Also fetch the other words contained within the block
    - Takes advantage of spatial locality
  - Place the block into the cache in a location determined by its address

Page 10:

Causes of Misses

- Miss rate: fraction of cache accesses that result in a miss
- Causes of misses:
  - Compulsory: first reference to a block
  - Capacity: blocks discarded (because the cache is full) and later retrieved
  - Conflict: the program makes repeated references to multiple addresses from different blocks that map to the same location in the cache (see the sketch below)
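A hypothetical illustration of the conflict case: the cache geometry assumed in the comments (direct-mapped, 32 KB) is not from the slides, but with it, two addresses that differ by a multiple of the cache size map to the same set, so alternating between them keeps evicting each other's blocks even though the cache is mostly empty.

```c
#include <stdio.h>

/* Assumed cache for this illustration: direct-mapped, 32 KB total.
 * Two addresses that differ by a multiple of the cache size map to
 * the same set.                                                     */
#define CACHE_SIZE (32 * 1024)

static char buf[2 * CACHE_SIZE];

int main(void) {
    long sum = 0;

    /* buf[i] and buf[i + CACHE_SIZE] land in the same set, so each
     * access evicts the other's block: almost every reference is a
     * conflict miss even though only two blocks are "live" at once. */
    for (int i = 0; i < CACHE_SIZE; i++) {
        sum += buf[i];
        sum += buf[i + CACHE_SIZE];
    }

    printf("sum = %ld\n", sum);
    return 0;
}
```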

Page 11:

Causes of Misses

Page 12:

Quantifying Misses

- Note that speculative and multithreaded processors may execute other instructions during a miss
  - This reduces the performance impact of misses

Page 13:

Cache Design

Four questions:
- Where can a block be placed in the cache?
- How is a block found?
- Which block should be replaced?
- How is a write handled?

Page 14:

Block Placement

- Divide the cache into sets to reduce conflict misses
  - n blocks per set => n-way set associative
  - Direct-mapped cache => one block per set
  - Fully associative => one set

Page 15:

Block Placement

Page 16:

Block Placement

- For a direct-mapped cache, the high-order address bits (the block address) determine the block's location uniquely
- For example: 32-bit addresses, 64 blocks, 8-byte blocks
  - The low-order 3 address bits determine the byte location within the block
  - The high-order 32 - 3 = 29 bits determine the block's location in the cache
  - With 64 blocks, the location is the value of those 29 bits mod 64
  - That is simply the low-order 6 bits of those 29 bits (see the sketch below)
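The same split, expressed as a minimal C sketch for the slide's parameters (32-bit addresses, 64 blocks, 8-byte blocks, direct-mapped); the input address is just an arbitrary example value.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0x12345678;          /* arbitrary example address */

    uint32_t offset     = addr & 0x7;        /* low-order 3 bits: byte within 8-byte block */
    uint32_t block_addr = addr >> 3;         /* high-order 29 bits: block address */
    uint32_t index      = block_addr & 0x3F; /* block_addr mod 64: which cache block */
    uint32_t tag        = block_addr >> 6;   /* remaining 23 bits identify the block */

    printf("offset=%u index=%u tag=0x%X\n", offset, index, tag);
    return 0;
}
```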

Page 17:

Block Placement

- In general, the block address is divided into a tag and an index based on associativity
  - An n-way set-associative cache has a number of sets equal to blocks/associativity, so the index is log2(blocks/associativity) bits
  - For example, a 2-way set-associative cache with 64 blocks has 32 sets, so 5 index bits
  - The tag is used to identify the block during lookup (see the sketch below)
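A small sketch that generalizes the split to any power-of-two geometry; the function name is invented, and the call in main follows the slide's 2-way, 64-block, 8-byte-block example.

```c
#include <stdio.h>
#include <stdint.h>

/* Split a 32-bit address into tag / set index / block offset for a cache
 * with `blocks` total blocks, `assoc`-way associativity, and `block_size`
 * bytes per block.  All three values are assumed to be powers of two.    */
static void split_address(uint32_t addr, uint32_t blocks, uint32_t assoc,
                          uint32_t block_size,
                          uint32_t *tag, uint32_t *index, uint32_t *offset) {
    uint32_t sets = blocks / assoc;

    *offset = addr % block_size;
    *index  = (addr / block_size) % sets;
    *tag    = addr / block_size / sets;
}

int main(void) {
    uint32_t tag, index, offset;
    /* Slide example: 2-way, 64 blocks, 8-byte blocks -> 32 sets, 5 index bits. */
    split_address(0x12345678, 64, 2, 8, &tag, &index, &offset);
    printf("tag=0x%X index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```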

Page 18:

Causes of Misses

Page 19:

Cache Lookup

- Easy for direct-mapped: each block has a unique location
  - Just need to verify the tag
- For n-way, check the tag in all ways of the indexed set simultaneously
  - Higher associativity means more complex hardware (i.e., slower)
- Also, each entry needs a “valid” bit
  - Avoids initialization issues
- (A software model of the lookup appears below)
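A hedged software model of that lookup (the struct layout and names are invented; real hardware compares the tags of all ways in parallel rather than in a loop):

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 2
#define SETS 32

/* One cache entry: tag plus a valid bit so uninitialized entries never match. */
struct line {
    uint32_t tag;
    bool     valid;
};

static struct line cache[SETS][WAYS];

/* Return true on a hit for the given block address (offset bits already stripped). */
static bool lookup(uint32_t block_addr) {
    uint32_t index = block_addr % SETS;
    uint32_t tag   = block_addr / SETS;

    /* Hardware checks every way of the selected set at the same time;
     * this loop models the same comparisons sequentially.             */
    for (int way = 0; way < WAYS; way++) {
        if (cache[index][way].valid && cache[index][way].tag == tag)
            return true;
    }
    return false;
}

int main(void) {
    printf("first lookup: %s\n", lookup(0x1234) ? "hit" : "miss"); /* miss: cache starts empty */
    return 0;
}
```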

Page 20:

Opteron L1 Data Cache (2-way)

Page 21:

Replacement

- Easy for direct-mapped: each block has a unique location (again)
- For higher associativities:
  - Least Recently Used (LRU)
  - First In, First Out (FIFO)
  - Random

Page 22:

Writes

- Writing to the cache: two strategies
  - Write-through: immediately update lower levels of the hierarchy
  - Write-back: only update lower levels of the hierarchy when an updated block is replaced (use a dirty bit)
- Both strategies use a write buffer to make writes asynchronous (see the sketch below)
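A minimal sketch of the write-back bookkeeping, with invented names and structure: a store updates only the cached copy and sets the dirty bit, and the lower level is written only when a dirty block is evicted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64

struct line {
    uint32_t tag;
    bool     valid;
    bool     dirty;                 /* set on writes; checked on eviction */
    uint8_t  data[BLOCK_SIZE];
};

/* Placeholder for the next level of the hierarchy (another cache or memory). */
static void write_back_to_lower_level(const struct line *l) { (void)l; }

/* Write-back policy: a store updates only the cached copy and marks it dirty. */
static void store_byte(struct line *l, uint32_t offset, uint8_t value) {
    l->data[offset] = value;
    l->dirty = true;
}

/* On replacement, a dirty block must be written to the lower level first. */
static void evict(struct line *l) {
    if (l->valid && l->dirty)
        write_back_to_lower_level(l);
    l->valid = false;
    l->dirty = false;
}

int main(void) {
    struct line l;
    memset(&l, 0, sizeof l);
    l.valid = true;
    store_byte(&l, 0, 42);   /* marks the block dirty, lower level untouched */
    evict(&l);               /* only now is the dirty block written back */
    return 0;
}
```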

Page 23:

Example

- 32-bit addressing; 128 KB cache, 256 B blocks (512 blocks), 2-way set associative (256 sets), LRU replacement policy
- Low-order 8 bits are the offset, the next 8 bits are the index, and the high-order 16 bits are the tag (see the sketch below)
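The same field extraction in code for this geometry; the example address is one of those used on the next slide.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0xAABB0100;              /* address from the next slide */

    uint32_t offset = addr & 0xFF;           /* low-order 8 bits: byte in 256 B block */
    uint32_t index  = (addr >> 8) & 0xFF;    /* next 8 bits: one of 256 sets */
    uint32_t tag    = addr >> 16;            /* high-order 16 bits */

    printf("tag=0x%04X index=0x%02X offset=0x%02X\n", tag, index, offset);
    /* prints: tag=0xAABB index=0x01 offset=0x00 */
    return 0;
}
```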

Page 24:

Example

From empty:
- Read 0xAABB0100
  - Tag: AABB, Index: 01, Offset: 00; place in set 1, bank 0
  - Miss (compulsory)
- Read 0xABBA0101
  - Tag: ABBA, Index: 01, Offset: 01; place in set 1, bank 1
  - Miss (compulsory)
- Read 0xABBA01A3
  - Tag: ABBA, Index: 01, Offset: A3; located in set 1, bank 1
  - Hit!
- Read 0xCBAF01FF
  - Tag: CBAF, Index: 01, Offset: FF; place in set 1, bank 0 (LRU)
  - Miss (conflict)
- Read 0xAABB01CC
  - Tag: AABB, Index: 01, Offset: CC; place in set 1, bank 1 (LRU)
  - Miss (conflict)

(A small simulation that replays this trace appears below.)
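A minimal simulation, assuming the geometry from the previous slide (256 sets, 2-way, 256-byte blocks, LRU), that replays this access trace and reproduces the hit/miss pattern above; the data structures and names are invented, and it does not distinguish conflict from capacity misses.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define SETS 256
#define WAYS 2

struct line {
    uint32_t tag;
    bool     valid;
};

static struct line cache[SETS][WAYS];
static int lru_way[SETS];            /* which bank of each set is least recently used */

static void access_addr(uint32_t addr) {
    uint32_t index = (addr >> 8) & 0xFF;   /* 8 index bits: 256 B blocks, 256 sets */
    uint32_t tag   = addr >> 16;

    for (int way = 0; way < WAYS; way++) {
        if (cache[index][way].valid && cache[index][way].tag == tag) {
            printf("0x%08X: hit  (set %u, bank %d)\n", addr, index, way);
            lru_way[index] = 1 - way;       /* the other bank becomes LRU */
            return;
        }
    }

    /* Miss: fill an invalid bank if one exists, otherwise evict the LRU bank. */
    int victim = lru_way[index];
    for (int way = 0; way < WAYS; way++) {
        if (!cache[index][way].valid) { victim = way; break; }
    }
    cache[index][victim].tag   = tag;
    cache[index][victim].valid = true;
    lru_way[index] = 1 - victim;
    printf("0x%08X: miss (set %u, bank %d)\n", addr, index, victim);
}

int main(void) {
    uint32_t trace[] = { 0xAABB0100, 0xABBA0101, 0xABBA01A3,
                         0xCBAF01FF, 0xAABB01CC };
    for (int i = 0; i < 5; i++)
        access_addr(trace[i]);
    return 0;
}
```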

Page 25:

Basic Optimizations

Six basic cache optimizations:
- Larger block size
  - Reduces compulsory misses
  - Increases conflict misses, increases miss penalty
- Larger total cache capacity
  - Reduces capacity misses
  - Increases hit time, increases power consumption
- Higher associativity
  - Reduces conflict misses
  - Increases hit time, increases power consumption
- Higher number of cache levels
  - Reduces overall memory access time
- Giving priority to read misses over writes
  - Reduces miss penalty
- Avoiding address translation in cache indexing
  - Reduces hit time

Page 26:

Pitfalls

Ignoring the impact of the operating system on the performance of the memory hierarchy
