Page 1: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Computer Organization and Architecture

Chapter 7

Large and Fast: Exploiting Memory Hierarchy

Yu-Lun Kuo
Computer Sciences and Information Engineering
University of Tunghai, Taiwan
[email protected]

Page 2: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Major Components of a Computer

[Figure: Processor (Control + Datapath), Memory, and Devices (Input, Output)]

Page 3: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Processor-Memory Performance Gap

[Figure: performance vs. year, log scale from 1 to 10,000. "Moore's Law": µProc improves 55%/year (2X/1.5yr); DRAM improves 7%/year (2X/10yrs); the processor-memory performance gap grows 50%/year]

Page 4: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Introduction

• The Principle of Locality
  – Programs access a relatively small portion of the address space at any instant of time
• Two Different Types of Locality
  – Temporal Locality (Locality in Time)
    » If an item is referenced, it will tend to be referenced again soon
    • e.g., loops, subroutines, stacks, counter variables
  – Spatial Locality (Locality in Space)
    » If an item is referenced, items whose addresses are close by tend to be referenced soon
    • e.g., arrays accessed sequentially
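Both kinds of locality show up in ordinary code. The C fragment below is a minimal added sketch (not from the slides): the accumulator and loop index are touched on every iteration (temporal locality), while the array is walked sequentially (spatial locality).

#include <stdio.h>

int main(void) {
    int a[1024];
    int sum = 0;                     /* sum and i are reused on every
                                        iteration: temporal locality */
    for (int i = 0; i < 1024; i++)   /* a[0], a[1], ... are adjacent in
                                        memory: spatial locality */
        a[i] = i;

    for (int i = 0; i < 1024; i++)
        sum += a[i];

    printf("sum = %d\n", sum);
    return 0;
}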

Page 5: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Memory Hierarchy

• Memory Hierarchy
  – A structure that uses multiple levels of memories; as the distance from the CPU increases, the size of the memories and the access time both increase
  – Locality + smaller HW is faster = memory hierarchy
• Levels
  – Each level is smaller, faster, and more expensive per byte than the level below
• Inclusive
  – Data found in the top level is also found in the bottom

Page 6: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Three Primary Technologies

• Building Memory Hierarchies
  – Main Memory
    » DRAM (dynamic random access memory)
  – Caches (closer to the processor)
    » SRAM (static random access memory)
• DRAM vs. SRAM
  – Speed: DRAM < SRAM
  – Cost: DRAM < SRAM

Page 7: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Introduction

• Cache memory
  – Built from SRAM (static RAM)
  – A small amount of fast memory
  – Sits between normal main memory and the CPU
  – May be located on the CPU chip or module

Page 8: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Introduction

• Cache memory


Page 9: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

A Typical Memory Hierarchy c. 2008

[Figure: CPU with multiported register file (RF, part of CPU) → split L1 instruction & data primary caches (on-chip SRAM) → large unified L2 cache (on-chip SRAM) → multiple interleaved memory banks (off-chip DRAM)]

Page 10: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

A Typical Memory Hierarchy

[Figure: on-chip components (register file, instruction cache, data cache, ITLB, DTLB, eDRAM) with control and datapath; second-level cache (SRAM); main memory (DRAM); secondary memory (disk)]

Speed (cycles):  ½'s     1's   10's   100's  1,000's
Size (bytes):    100's   K's   10K's  M's    G's to T's
Cost:            highest ..................  lowest

By taking advantage of the principle of locality, we can present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.

Page 11: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Characteristics of Memory Hierarchy

[Figure: Processor ↔ L1$ ↔ L2$ ↔ Main Memory ↔ Secondary Memory; access time and (relative) size of the memory at each level increase with distance from the processor. Transfer units: 4-8 bytes (word) between processor and L1$, 8-32 bytes (block) between L1$ and L2$, 1 to 4 blocks between L2$ and main memory, 1,024+ bytes (disk sector = page) between main memory and secondary memory]

• Inclusive – what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM

Page 12: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Memory Hierarchy List


• Registers

• L1 Cache

• L2 Cache

• L3 Cache

• Main memory

• Disk cache

• Disk (RAID)

• Optical (DVD)

• Tape

Page 13: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Why are separate instruction and data caches (IC and DC) needed?

Page 14: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

The Memory Hierarchy: Terminology

• Hit: data is in some block in the upper level (Blk X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss

[Figure: the processor exchanges words with upper-level memory (Blk X); the upper level exchanges blocks with lower-level memory (Blk Y)]

Page 15: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

The Memory Hierarchy: Terminology

• Miss: data is not in the upper level, so it needs to be retrieved from a block in the lower level (Blk Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty
    » Time to replace a block in the upper level + time to deliver the block to the processor
    » Hit Time << Miss Penalty

[Figure: the processor exchanges words with upper-level memory (Blk X); the upper level exchanges blocks with lower-level memory (Blk Y)]
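These terms are commonly combined into a single figure of merit (a standard companion formula, though not shown on the slide): average memory access time, AMAT = Hit Time + Miss Rate × Miss Penalty. For example, a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give AMAT = 1 + 0.05 × 100 = 6 cycles per access.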

Page 16: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

How is the Hierarchy Managed?

• registers ↔ memory
  – by compiler (programmer?)
• cache ↔ main memory
  – by the cache controller hardware
• main memory ↔ disks
  – by the operating system (virtual memory)
  – virtual-to-physical address mapping assisted by the hardware (TLB)
  – by the programmer (files)

Page 17: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

7.2 The Basics of Caches

• Simple cache
  – Processor requests are each one word
  – The block size is one word of data
• Two questions to answer (in hardware):
  – Q1: How do we know if a data item is in the cache?
  – Q2: If it is, how do we find it?

Page 18: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Caches

• Direct Mapped
  – Assign the cache location based on the address of the word in memory
  – Address mapping: (block address) modulo (# of blocks in the cache)
    » First consider block sizes of one word
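As a quick illustration of the modulo mapping, here is a small C sketch; the 8-block geometry is an assumption for the example, not from the slides. With a power-of-two number of blocks, the modulo is simply the low-order bits of the block address.

#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8u  /* assumed cache geometry: 8 one-word blocks */

/* Direct-mapped placement: each block address has exactly one slot. */
static uint32_t cache_index(uint32_t block_addr) {
    return block_addr % NUM_BLOCKS;  /* low-order bits, since NUM_BLOCKS
                                        is a power of two */
}

int main(void) {
    /* Block addresses 1 and 9 collide: both map to index 1. */
    printf("block 1 -> index %u\n", cache_index(1));
    printf("block 9 -> index %u\n", cache_index(9));
    return 0;
}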

Page 19: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Direct Mapped (Mapping) Cache


Page 20: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Caches

• Tag
  – Contains the address information required to identify whether a word in the cache corresponds to the requested word
• Valid bit
  – After executing many instructions, some of the cache entries may still be empty
  – Indicates whether an entry contains a valid address
    » If valid bit = 0, there cannot be a match for this block
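A hit therefore requires both that the valid bit is set and that the stored tag matches. The C sketch below (geometry and interface are illustrative assumptions, not from the slides) shows a direct-mapped lookup with one-word blocks.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 8  /* assumed: 8 one-word blocks */

struct cache_line {
    bool     valid;  /* set once the entry holds real data */
    uint32_t tag;    /* high-order address bits of the cached block */
    uint32_t data;   /* the cached word itself */
};

static struct cache_line cache[NUM_BLOCKS];  /* valid bits start at 0 */

/* A hit requires BOTH a set valid bit AND a matching tag. */
static bool lookup(uint32_t block_addr, uint32_t *word_out) {
    uint32_t index = block_addr % NUM_BLOCKS;
    uint32_t tag   = block_addr / NUM_BLOCKS;
    if (cache[index].valid && cache[index].tag == tag) {
        *word_out = cache[index].data;
        return true;   /* hit */
    }
    return false;      /* miss: entry empty or holds a different block */
}

int main(void) {
    uint32_t w;
    printf("cold lookup of block 5: %s\n",
           lookup(5, &w) ? "hit" : "miss");  /* miss: valid bit is 0 */
    return 0;
}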

Page 21: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Direct Mapped Cache

• Consider the main memory word reference string: 0 1 2 3 4 3 4 15
• Start with an empty cache – all blocks initially marked as not valid

Reference: 0     1     2     3     4     3    4    15
Result:    miss  miss  miss  miss  miss  hit  hit  miss

– The first four references fill blocks 0-3 with Mem(0)-Mem(3) (tag 00)
– Reference 4 (tag 01) maps to block 0 and replaces Mem(0); the later references to 3 and 4 then hit
– Reference 15 (tag 11) maps to block 3 and replaces Mem(3)

8 requests, 6 misses
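The walkthrough can be checked with a short C simulation (an added sketch; the 4-block, one-word-per-block geometry matches the example above). Running it prints the same hit/miss sequence and the total of 6 misses.

#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 4  /* the example's geometry: 4 one-word blocks */

int main(void) {
    int  refs[] = {0, 1, 2, 3, 4, 3, 4, 15};  /* the reference string */
    int  n = (int)(sizeof refs / sizeof refs[0]);
    bool valid[NUM_BLOCKS] = {false};          /* empty cache at start */
    int  tag[NUM_BLOCKS] = {0};
    int  misses = 0;

    for (int i = 0; i < n; i++) {
        int index = refs[i] % NUM_BLOCKS;      /* direct-mapped slot   */
        int t     = refs[i] / NUM_BLOCKS;
        if (valid[index] && tag[index] == t) {
            printf("%2d: hit\n", refs[i]);
        } else {
            printf("%2d: miss\n", refs[i]);
            valid[index] = true;               /* install Mem(refs[i]) */
            tag[index]   = t;
            misses++;
        }
    }
    printf("%d requests, %d misses\n", n, misses);
    return 0;
}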

Page 22: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Hits vs. Misses

• Read hits
  – This is what we want!
• Read misses
  – Stall the CPU, fetch the block from memory, deliver it to the cache, restart
• Write hits
  – Can replace data in cache and memory (write-through)
  – Write the data only into the cache (write it back to memory later)
• Write misses
  – Read the entire block into the cache, then write the word

Page 23: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

What happens on a write?

• Writes work somewhat differently
  – Suppose a store instruction
    » writes the data into only the data cache
    » Memory would then have a different value
      • The cache & memory are "inconsistent"
  – Keeping the main memory & cache consistent
    » Always write the data into both the memory and the cache
    » Called write-through

Page 24: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

What happens on a write?

• Although this design handles writes simple– Not provide very good performance

» Every write causes the data to be written to main memory

» Take a long time

» Ex. 10% of the instructions are stores

CPI without cache miss: 1.0

spending 100 extra cycles on every write

CPI = 1.0 + 100 x 10% = 11

reducing performance

30
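The same arithmetic as a tiny C sketch (the parameters are the slide's example values), so they can easily be varied:

#include <stdio.h>

int main(void) {
    double base_cpi    = 1.0;    /* CPI ignoring memory stalls        */
    double store_frac  = 0.10;   /* 10% of instructions are stores    */
    double write_stall = 100.0;  /* extra cycles per write-through    */

    double cpi = base_cpi + store_frac * write_stall;
    printf("effective CPI = %.1f\n", cpi);  /* 1.0 + 0.10*100 = 11.0 */
    return 0;
}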

Page 25: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Write Buffer for Write Through

• A Write Buffer is needed between the Cache and Memory
  – A queue that holds data while the data are waiting to be written to memory
  – Processor:
    » writes data into the cache and the write buffer
  – Memory controller:
    » writes contents of the buffer to memory

[Figure: Processor → Cache and Write Buffer; Write Buffer → DRAM]
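Functionally, a write buffer behaves like a small FIFO between the processor and memory. A minimal C sketch of that behavior (the depth and interface are illustrative assumptions, not hardware described on the slides):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WB_ENTRIES 4  /* assumed depth; real buffers are similarly small */

struct wb_entry { uint32_t addr, data; };

static struct wb_entry buf[WB_ENTRIES];
static int head, tail, count;  /* circular-queue state */

/* Processor side: returns false (i.e., the CPU must stall) when full. */
static bool wb_enqueue(uint32_t addr, uint32_t data) {
    if (count == WB_ENTRIES) return false;
    buf[tail].addr = addr;
    buf[tail].data = data;
    tail = (tail + 1) % WB_ENTRIES;
    count++;
    return true;
}

/* Memory-controller side: drains one pending write per call. */
static bool wb_drain(struct wb_entry *out) {
    if (count == 0) return false;
    *out = buf[head];
    head = (head + 1) % WB_ENTRIES;
    count--;
    return true;
}

int main(void) {
    struct wb_entry e;
    wb_enqueue(0x1000, 42);             /* store retires into the buffer */
    while (wb_drain(&e))                /* controller writes it to DRAM  */
        printf("write %u -> [0x%x]\n", e.data, e.addr);
    return 0;
}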

Page 26: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

What happens on a write?

• Write-back
  – The new value is written only to the block in the cache
  – The modified block is written to the lower level of the hierarchy when it is replaced

Page 27: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

What happens on a write?

• Write Through
  – All writes go to main memory as well as the cache
  – Multiple CPUs can monitor main memory traffic to keep local (to CPU) caches up to date
  – Lots of traffic
  – Slows down writes
• Write Back
  – Updates are initially made in the cache only
  – The update (dirty) bit for the cache slot is set when an update occurs
  – If a block is to be replaced, write it to main memory only if the update bit is set
  – Other caches can get out of sync

Page 28: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Memory System to Support Caches

• It is difficult to reduce the latency to fetch the first word from memory
  – We can reduce the miss penalty if we increase the bandwidth from the memory to the cache

[Figure: three organizations – (a) one-word-wide: CPU, cache, bus, memory; (b) wide: CPU, multiplexor, cache, bus, wide memory; (c) interleaved: CPU, cache, bus, memory banks 0-3]

Page 29: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

One-word-wide memory organization

• Assume
  1. A cache block of 4 words
  2. 1 memory bus clock cycle to send the address
  3. 15 clock cycles for each DRAM access initiated
  4. 1 memory bus clock cycle to return a word of data
• Miss penalty: 1 + 4×15 + 4×1 = 65 clock cycles
• Bytes transferred per bus clock cycle for a single miss: 4 × 4 / 65 = 0.25

Page 30: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Wide memory organization

• Assume the same four parameters as the one-word-wide organization
• Two-word-wide memory and bus
  » Miss penalty: 1 + 2×15 + 2×1 = 33 clock cycles
  » Bandwidth: 4 × 4 / 33 = 0.48 bytes per clock cycle
• Four-word-wide memory and bus
  » Miss penalty: 1 + 1×15 + 1×1 = 17 clock cycles
  » Bandwidth: 4 × 4 / 17 = 0.94 bytes per clock cycle

Page 31: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

Interleaved memory organization

• Assume the same four parameters, plus:
  5. Each memory bank is 1 word wide
• Advantage: the banks' DRAM accesses overlap, so only one access latency is paid
  – Miss penalty: 1 + 1×15 + 4×1 = 20 clock cycles
  – Bandwidth: 4 × 4 / 20 = 0.8 bytes per clock cycle
    » About 3 times that of the one-word-wide organization
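The miss-penalty figures on the last three slides (65, 33/17, and 20 cycles) all follow from the same shared assumptions. A short added C sketch (not from the slides) recomputes them and the resulting bandwidths:

#include <stdio.h>

/* Shared assumptions from the slides: a 4-word cache block, 1 cycle to
   send the address, 15 cycles per DRAM access, 1 cycle per word on the
   bus, 4 bytes per word. */
#define WORDS     4
#define ADDR_CYC  1
#define DRAM_CYC 15
#define BUS_CYC   1

static void report(const char *name, int cycles) {
    printf("%-12s %2d cycles, %.2f bytes/cycle\n",
           name, cycles, 4.0 * WORDS / cycles);
}

int main(void) {
    /* one-word-wide: a DRAM access and a bus transfer per word */
    report("narrow:", ADDR_CYC + WORDS * DRAM_CYC + WORDS * BUS_CYC); /* 65 */
    /* two- and four-word-wide: fewer, wider accesses and transfers */
    report("2-wide:", ADDR_CYC + 2 * DRAM_CYC + 2 * BUS_CYC);         /* 33 */
    report("4-wide:", ADDR_CYC + 1 * DRAM_CYC + 1 * BUS_CYC);         /* 17 */
    /* 4 interleaved banks: accesses overlap, word transfers serialize */
    report("interleaved:", ADDR_CYC + 1 * DRAM_CYC + WORDS * BUS_CYC);/* 20 */
    return 0;
}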

Page 32: Computer Organization and Architecture Chapter 7  Large and Fast: Exploiting Memory Hierarchy

• Q & A

