Date post: 22-Dec-2015

CMPUT 229 - Computer Organization and Architecture I

Memory Hierarchy

Chapter 6: The Memory Hierarchy, Computer Systems: A Programmer’s Perspective, Randal E. Bryant and David O’Hallaron

Memory technologies
Memory hierarchy

Types of Memories

Read/Write Memory (RWM): we can store and retrieve data.

Random Access Memory (RAM): the time required to read or write a bit of memory is independent of the bit’s location.

Static Random Access Memory (SRAM): once a word is written to a location, it remains stored as long as power is applied to the chip, unless the location is written again.

Dynamic Random Access Memory (DRAM): the data stored at each location must be refreshed periodically by reading it and then writing it back again, or else it disappears.

Refreshing the Memory

[Figure: capacitor voltage Vcap of a DRAM cell over time, on a scale from 0V to VCC with the LOW and HIGH thresholds marked; the cell first stores a 0, then a 1 is written, and the decaying charge is restored by periodic refreshes.]

The charge that represents a stored 1 leaks away over time. The solution is to periodically refresh the memory cells by reading and writing back each one of them.

SRAM with Bi-directional Data Bus

[Figure: a row of four SRAM cells, each with IN, OUT, SEL and WR pins, sharing the bidirectional data lines DIO3, DIO2, DIO1 and DIO0 with the microprocessor; the control signals shown are WE_L, CS_L, OE_L, WR_L and IOE_L.]

DRAM High Level View

[Figure: a DRAM chip organized as a 4 x 4 array of supercells (rows 0-3, cols 0-3) with an internal row buffer; the memory controller, which connects to the CPU, drives a 2-bit addr bus and an 8-bit data bus to the chip.]

Bryant/O’Hallaron, p. 459

DRAM RAS Request

[Figure: the memory controller sends RAS = 2 on the 2-bit addr bus; the DRAM chip copies row 2 of the supercell array into the internal row buffer.]

RAS = Row Address Strobe

Bryant/O’Hallaron, p. 460

DRAM CAS Request

[Figure: the memory controller sends CAS = 1 on the 2-bit addr bus; the DRAM chip selects column 1 of the internal row buffer and returns supercell (2,1) on the 8-bit data bus.]

CAS = Column Address Strobe

Bryant/O’Hallaron, p. 460
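
The two-step RAS/CAS read can be sketched in C. This is only an illustrative software model of the 4 x 4 chip in the figures, not a description of real DRAM circuitry; the type Dram and the functions dram_ras, dram_cas and dram_read are hypothetical names introduced for this sketch.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 4
#define COLS 4

/* Hypothetical model of the DRAM chip from the figures:
 * 16 supercells of 8 bits each, plus the internal row buffer. */
typedef struct {
    uint8_t cells[ROWS][COLS];   /* one byte per supercell */
    uint8_t row_buffer[COLS];    /* internal row buffer */
} Dram;

/* RAS request: copy the whole selected row into the row buffer. */
void dram_ras(Dram *d, unsigned row)
{
    memcpy(d->row_buffer, d->cells[row % ROWS], COLS);
}

/* CAS request: return one supercell from the row buffer. */
uint8_t dram_cas(const Dram *d, unsigned col)
{
    return d->row_buffer[col % COLS];
}

/* Read supercell (row, col): a RAS request followed by a CAS request. */
uint8_t dram_read(Dram *d, unsigned row, unsigned col)
{
    dram_ras(d, row);
    return dram_cas(d, col);
}
```

After dram_ras(d, 2), further calls to dram_cas read other columns of row 2 without touching the cell array again, which is the property the improved DRAM interfaces exploit.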

Memory Modules: Supercell (i,j)

[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs, DRAM 0 through DRAM 7. The memory controller sends addr (row = i, col = j) to every chip; each chip returns the byte in its supercell (i,j), and the eight bytes (bits 0-7, 8-15, 16-23, 24-31, 32-39, 40-47, 48-55 and 56-63) form the 64-bit doubleword at main memory address A, which the controller passes to the CPU chip.]

Bryant/O’Hallaron, p. 461
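
A minimal sketch of how the memory controller could combine the eight bytes into one doubleword. It assumes that DRAM k supplies bits 8k to 8k+7, which the flattened figure does not state unambiguously; the function names are hypothetical.

```c
#include <stdint.h>

#define NUM_DRAMS 8

/* Combine the eight bytes returned for supercell (i,j) into one
 * 64-bit doubleword. Assumption: DRAM k supplies bits 8k..8k+7. */
uint64_t assemble_doubleword(const uint8_t bytes[NUM_DRAMS])
{
    uint64_t word = 0;
    for (int k = 0; k < NUM_DRAMS; k++)
        word |= (uint64_t)bytes[k] << (8 * k);
    return word;
}

/* Capacity of the module in bytes: 8 chips x 8M supercells x 1 byte. */
uint64_t module_capacity_bytes(void)
{
    return (uint64_t)NUM_DRAMS * 8u * 1024u * 1024u;
}
```

The capacity function simply restates the arithmetic on the slide: eight 8M x 8 chips give 64 MB.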

Improved DRAMs

Central Idea: Each read to a DRAM actually reads a complete row of bits, or word line, from the DRAM core into an array of sense amps.

A traditional asynchronous DRAM interface then selects a small number of these bits to be delivered to the cache/microprocessor.

All the other bits already extracted from the DRAM cells into the sense amps are wasted.

Fast Page Mode DRAMs

In a DRAM with Fast Page Mode, a page is defined as all memory addresses that have the same row address.

To read in fast page mode, all the steps from 1 to 7 of a standard read cycle are performed.

Then OE and CAS are switched high, but RAS remains low.

Then steps 3 to 7 (providing a new column address, asserting CAS and OE) are performed for each new memory location to be read.

Extended Data Out RAMs (EDO-RAM)

The process to read multiple locations in an EDO-RAM is very similar to Fast Page Mode.

The difference is that the output drivers are not disabled when CAS goes high.

This distinction allows the data from the current read cycle to be present at the outputs while the next cycle begins.

As a result, faster read cycle times are allowed.

Synchronous DRAMs (SDRAM)

A Synchronous DRAM (SDRAM) has a clock input. It operates in a similar fashion to fast page mode and EDO DRAM. However, consecutive data is output synchronously on the falling/rising edge of the clock, instead of on command by CAS.

The number of data elements that will be output (the length of the burst) is programmable up to the maximum size of the row.

The clock period of an SDRAM is typically one order of magnitude shorter than the access time for individual accesses.

DDR SDRAM

A Double Data Rate (DDR) SDRAM is an SDRAM that allows data transfers on both the rising and the falling edge of the clock.

Thus the effective data transfer rate of a DDR SDRAM is two times the data transfer rate of a standard SDRAM with the same clock frequency.
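
The factor of two can be made concrete with a small calculation. The function name and the example figures in the note below (a 100 MHz clock and an 8-byte data bus) are assumptions chosen for illustration, not values from the slides.

```c
/* Peak transfer rate in MB/s (1 MB = 10^6 bytes) for a memory with the
 * given clock frequency, data bus width and transfers per clock cycle:
 * 1 for a standard SDRAM, 2 for a DDR SDRAM. */
double peak_rate_mb_s(double clock_mhz, int bus_bytes, int transfers_per_clock)
{
    return clock_mhz * bus_bytes * transfers_per_clock;
}
```

With the assumed 100 MHz clock and 8-byte bus, peak_rate_mb_s(100.0, 8, 1) gives 800 MB/s for a standard SDRAM and peak_rate_mb_s(100.0, 8, 2) gives 1600 MB/s for DDR.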

The Rambus DRAM (RDRAM)

Multiple memory arrays (banks)

Rambus DRAMs are synchronous and transfer data on both edges of the clock.

SDRAM Memory Systems

Complex circuits for RAS/CAS/OE.

Each DIMM is connected in parallel with the memory controller. (DIMM = Dual In-line Memory Module)

Often requires buffering.

Needs the whole clock cycle to establish valid data.

Making the bus wider is mechanically complicated.

RDRAM Memory Systems

Bus Structure

[Figure: the CPU (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory and over the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (monitor), the disk controller (disk), and expansion slots for other devices such as network adapters.]

Bryant/O’Hallaron, p. 472

DMA Request

[Figure: the bus structure diagram (CPU, I/O bridge, main memory, and the USB controller, graphics adapter and disk controller on the I/O bus), annotated to show the DMA request.]

DMA = Direct Memory Access

Bryant/O’Hallaron, p. 473

DMA Transfer

[Figure: the bus structure diagram (CPU, I/O bridge, main memory, and the USB controller, graphics adapter and disk controller on the I/O bus), annotated to show the DMA transfer.]

DMA = Direct Memory Access

Bryant/O’Hallaron, p. 473

DMA Completion Notification

[Figure: the bus structure diagram (CPU, I/O bridge, main memory, and the USB controller, graphics adapter and disk controller on the I/O bus), annotated with an interrupt that signals completion of the DMA transfer.]

DMA = Direct Memory Access

Bryant/O’Hallaron, p. 474

Magnetic Disks

Random Access
Inexpensive
Non-volatile

How do disks work?

Platter: covered with magnetic recording material
Track: logical division of platter surface
Sector: hardware division of tracks
Block: OS division of tracks
    Typical block sizes: 512 B, 2 KB, 4 KB

[Figure: disk drive with the read/write head labeled.]

Disk I/O

Disk I/O := block I/O
    Hardware address is converted to Cylinder, Surface and Sector number
    Modern disks: Logical Sector Address 0…n

Access time: time from read/write request to when data transfer begins
    Seek time: the head reaches the correct track
        Average seek time 5-10 msec
    Rotation latency time: the correct block is rotated under the head
        5400 RPM to 15K RPM: a full rotation takes 4-11 msec, so the average latency (half a rotation) is about 2-5.6 msec

Block Transfer Time
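
The components of a disk access can be put into a short calculation. This sketch assumes the usual model in which the average rotational latency is half a rotation; the function names are hypothetical.

```c
/* Time for one full rotation, in milliseconds. */
double rotation_ms(double rpm)
{
    return 60.0 * 1000.0 / rpm;
}

/* Average rotational latency, assumed to be half a rotation. */
double avg_rotational_latency_ms(double rpm)
{
    return rotation_ms(rpm) / 2.0;
}

/* Access time as defined above: from the request until the data
 * transfer begins, i.e. seek time plus rotational latency. */
double avg_access_ms(double avg_seek_ms, double rpm)
{
    return avg_seek_ms + avg_rotational_latency_ms(rpm);
}

/* Total time for one block I/O: access time plus block transfer time. */
double block_io_ms(double avg_seek_ms, double rpm, double transfer_ms)
{
    return avg_access_ms(avg_seek_ms, rpm) + transfer_ms;
}
```

For a 15K RPM disk a full rotation takes 4 msec, so with a 5 msec average seek the model gives an access time of 7 msec before any data is transferred.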

Optimize I/O

Database system performance is I/O bound

Improve the speed of access to disk:
    Scheduling algorithms
    File organization

Introduce disk redundancy
    Redundant Array of Independent Disks (RAID)

Reduce number of I/Os
    Query optimization, indices

Disk Arrays

Single disk becomes bottleneck

Disk arrays
    instead of a single large disk, many small parallel disks
        read N blocks in a single access time
        concurrent queries
        tables spanning among disks

Redundant Arrays of Independent Disks (RAID)
    7 levels
    reliability
    redundancy
    parallelism

Locality

We say that a computer program exhibits good locality if the program tends to reference data that is nearby or data that has been referenced recently.

Because a program might do one of these things, but not the other, the principle of locality is separated into two flavors:

Temporal locality: a memory location that is referenced once is likely to be referenced multiple times in the near future.

Spatial locality: if a memory location is referenced once, then locations that are nearby are likely to be referenced in the near future.

Bryant/O’Hallaron, p. 478

Examples

In the Sampler function below, RandInt returns a randomly selected integer within the specified interval. Which program has better locality?

    int SumVec(int v[], int N)
    {
        int i;
        int sum = 0;

        for (i = 0; i < N; i = i + 1)
            sum += v[i];
        return sum;
    }

    int Sampler(int v[], int N, int K)
    {
        int i, j;
        int sum = 0;

        for (i = 0; i < K; i = i + 1)
        {
            j = RandInt(0, N-1);
            sum += v[j];
        }
        return sum/K;
    }

Bryant/O’Hallaron, p. 479
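
One rough way to compare the two functions is to look at how far apart consecutive references to v are. The helper below is a hypothetical measure introduced only for illustration: it takes a recorded sequence of array indices and returns the mean distance between consecutive accesses.

```c
#include <stdlib.h>

/* Mean absolute distance, in elements, between consecutive accesses
 * in a recorded sequence of array indices. */
double mean_jump(const int idx[], int count)
{
    if (count < 2)
        return 0.0;
    long total = 0;
    for (int k = 1; k < count; k++)
        total += labs((long)idx[k] - (long)idx[k - 1]);
    return (double)total / (count - 1);
}
```

SumVec touches indices 0, 1, 2, ..., so its mean jump is 1 element, which is good spatial locality. Sampler touches whatever RandInt returns, so its jumps are spread over the whole array and its spatial locality with respect to v is poor. Both functions have good temporal locality for the scalar variable sum.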

Memory Hierarchy

[Figure: pyramid of storage levels. Toward the top the storage devices are smaller, faster, and costlier (per byte); toward the bottom they are larger, slower, and cheaper (per byte).]

L0: Registers. CPU registers hold words retrieved from cache memory.
L1: On-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: Off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from memory.
L3: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: Remote secondary storage (distributed file systems, Web servers).

Bryant/O’Hallaron, p. 483

Cache policies

How to arrange the memory hierarchy such that the memory access can be as efficient as possible

Buffer management
    Between memory and hard disks
    LRU (Least Recently Used)

Cache organization
    Between L1, L2, and Memory
    Block-based caching
    Hardware implemented

Registers?

Caching Principle

[Figure: level k+1 is drawn as blocks 0 to 15; level k currently holds copies of blocks 4, 9, 14 and 3.]

Level k+1: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks.

Level k: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.

Data is copied between levels in block-sized transfer units.

Bryant/O’Hallaron, p. 484

Cache Misses

Cold misses, or compulsory misses, occur the first time that a data item is referenced.

Conflict misses occur when two memory references have to occupy the same cache line. They can occur even when the remainder of the cache is not in use.

Capacity misses occur when there are no more free lines in the cache.
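
A tiny software model makes the difference between cold and conflict misses visible. This is a sketch of a direct-mapped cache (one line per set) with assumed parameters of 4 sets and 16-byte blocks; the names and sizes are hypothetical and chosen only to keep the example small.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 4       /* assumed for illustration */
#define BLOCK_BYTES 16   /* assumed for illustration */

/* Direct-mapped cache: one line per set, so only the valid bit and
 * the tag of each line need to be tracked to decide hit or miss. */
typedef struct {
    bool valid[NUM_SETS];
    uint32_t tag[NUM_SETS];
} DirectMappedCache;

/* Access one byte address. Returns true on a hit; on a miss the
 * line is filled with the new block and false is returned. */
bool cache_access(DirectMappedCache *c, uint32_t addr)
{
    uint32_t block = addr / BLOCK_BYTES;
    uint32_t set = block % NUM_SETS;
    uint32_t tag = block / NUM_SETS;

    if (c->valid[set] && c->tag[set] == tag)
        return true;
    c->valid[set] = true;
    c->tag[set] = tag;
    return false;
}
```

Starting from an empty cache, accessing address 0 is a cold miss and address 4 then hits in the same block. Address 64 maps to the same set as address 0 and evicts it, so a later access to address 0 misses again even though three of the four sets are still empty: a conflict miss.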

L1 and L2 Bus System

[Figure: the CPU chip contains the register file, ALU, L1 cache and bus interface; the bus interface connects over the cache bus to the L2 cache and over the system bus to the I/O bridge, which connects over the memory bus to main memory.]

Bryant/O’Hallaron, p. 488

Cache Organization

[Figure: a cache drawn as an array of sets, Set 0, Set 1, ..., Set S-1. Each set contains E lines; each line holds a valid bit, a tag, and a cache block of bytes 0, 1, ..., B-1.]

S = 2^s sets
E lines per set
B = 2^b bytes per cache block
t tag bits per line
1 valid bit per line

Cache size: C = B x E x S data bytes

Bryant/O’Hallaron, p. 488

Address Partition

Address (bit m-1 down to bit 0): | Tag (t bits) | Set index (s bits) | Block offset (b bits) |

Tag: compared with tags in the cache to find a match.
Set index: used to find the set where the data might be found in the cache.
Block offset: selects which word, inside the block, is referenced.

Bryant/O’Hallaron, p. 488
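
The partition can be written as a few shifts and masks. This sketch assumes m = 32 address bits and s + b < 32; the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t tag;           /* upper t = m - s - b bits */
    uint32_t set_index;     /* middle s bits */
    uint32_t block_offset;  /* lower b bits */
} AddressParts;

/* Split a 32-bit address for a cache with 2^s sets and 2^b-byte blocks.
 * Assumes s + b < 32. */
AddressParts split_address(uint32_t addr, unsigned s, unsigned b)
{
    AddressParts p;
    p.block_offset = addr & ((1u << b) - 1u);
    p.set_index = (addr >> b) & ((1u << s) - 1u);
    p.tag = addr >> (s + b);
    return p;
}

/* Cache size in data bytes: C = B x E x S. */
uint32_t cache_size_bytes(unsigned s, unsigned b, unsigned E)
{
    return (1u << b) * E * (1u << s);
}
```

For example, with s = 4 and b = 4 the address 0x12345678 splits into block offset 0x8, set index 0x7 and tag 0x123456.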

Multi-Level Cache Organization

[Figure: inside the CPU, the registers (Regs) are backed by an L1 i-cache and an L1 d-cache, which are backed by a unified L2 cache, then main memory, then disk.]

Bryant/O’Hallaron, p. 504

Writing Cache-Conscious Programs

Problem: Write C code for a function that computes the sum of the elements of a two-dimensional array, a[M][N], of integers.

    /* The dimensions are passed first so that a[M][N] is a valid C99 parameter. */
    int SumArray(int M, int N, int a[M][N])

    /* Row-wise traversal: the inner loop walks along one row. */
    int SumArrayRows(int M, int N, int a[M][N])
    {
        int i, j;
        int sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    /* Column-wise traversal: the inner loop walks down one column. */
    int SumArrayCols(int M, int N, int a[M][N])
    {
        int i, j;
        int sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Bryant/O’Hallaron, p. 508

SumArrayRows Data Access Order

[Figure: memory layout of the array a with N = 6 columns, stored row by row in 4-byte words starting at address 0x8000 4000: a[0][0] at 0x8000 4000, a[0][1] at 0x8000 4004, ..., a[0][5] at 0x8000 4014, a[1][0] at 0x8000 4018, and so on up to a[3][4] at 0x8000 4058. SumArrayRows visits the elements in order of increasing address: a[0][0], a[0][1], a[0][2], ...]

Bryant/O’Hallaron, p. 508
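
The addresses in the figure follow from the row-major layout that C uses for two-dimensional arrays. The sketch below assumes 4-byte array elements and takes N = 6 from the figure, where a[0][5] is followed directly by a[1][0]; the function names are hypothetical.

```c
#include <stdint.h>

/* Byte address of a[i][j] for an array of 4-byte integers with N
 * columns, stored row by row starting at address base. */
uint32_t element_address(uint32_t base, int N, int i, int j)
{
    return base + (uint32_t)sizeof(int32_t) * (uint32_t)(i * N + j);
}

/* Distance in bytes between consecutive accesses of the inner loop
 * when the array is traversed row by row (j changes fastest). */
uint32_t row_order_stride_bytes(void)
{
    return (uint32_t)sizeof(int32_t);
}

/* Distance in bytes between consecutive accesses of the inner loop
 * when the array is traversed column by column (i changes fastest). */
uint32_t column_order_stride_bytes(int N)
{
    return (uint32_t)sizeof(int32_t) * (uint32_t)N;
}
```

With base 0x80004000 and N = 6, a[1][0] is at 0x80004018 and a[3][4] is at 0x80004058, matching the figure. A row-wise traversal advances 4 bytes per access, while a column-wise traversal jumps 24 bytes per access.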


SumArrayCols Data Access Order

[Figure: the same memory layout of the array a with N = 6 columns, stored row by row in 4-byte words from a[0][0] at 0x8000 4000 to a[3][4] at 0x8000 4058. SumArrayCols visits the elements column by column: a[0][0], a[1][0], a[2][0], a[3][0], then a[0][1], and so on, so consecutive accesses are N x 4 = 24 bytes apart.]

Bryant/O’Hallaron, p. 508


Read Bandwidth

The rate at which a program reads data from the memory system is called the read throughput or the read bandwidth.

The read throughput of a program depends on the memory hierarchy level from which the data is retrieved.

The read throughput is measured in bytes per second, or more commonly in Mbytes/s.

We can write a program that forces the data to come from each of the various levels in the hierarchy in order to estimate the read throughput of that level.

Measuring Read Bandwidth

    /* data is a global int array that holds the working set */
    void test(int elems, int stride)
    {
        int i;
        int result = 0;
        volatile int sink;

        for (i = 0; i < elems; i += stride)
            result += data[i];
        sink = result;  /* to prevent compiler from optimizing away the loop */
    }

Bryant/O’Hallaron, p. 508
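
A throughput estimate can be built around the same loop. The sketch below is only one possible approach under stated assumptions: it times repeated passes with the standard clock() function, which measures CPU time at coarse resolution, rather than with a cycle counter. The names MAXELEMS, sum_strided, bytes_read and read_throughput_mb_s are hypothetical.

```c
#include <time.h>

#define MAXELEMS (1 << 20)   /* assumed upper bound on the working set */
static int data[MAXELEMS];

/* Number of bytes read by one pass over elems ints with this stride. */
long bytes_read(int elems, int stride)
{
    long accesses = (elems + stride - 1) / stride;
    return accesses * (long)sizeof(int);
}

/* The measured loop: sums every stride-th element of data. */
int sum_strided(int elems, int stride)
{
    int result = 0;
    for (int i = 0; i < elems; i += stride)
        result += data[i];
    return result;
}

/* Estimated read throughput in MB/s (1 MB = 10^6 bytes) over the
 * given number of passes. Returns 0.0 if the timer is too coarse
 * to measure the run. */
double read_throughput_mb_s(int elems, int stride, int passes)
{
    volatile int sink = 0;   /* keeps the compiler from removing the loop */
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        sink = sum_strided(elems, stride);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    (void)sink;
    if (seconds <= 0.0)
        return 0.0;
    return (double)bytes_read(elems, stride) * passes / seconds / 1e6;
}
```

Calling read_throughput_mb_s with different values of elems and stride varies the working set size and the spatial locality of the loop, which is how the data points of a throughput surface can be collected. The absolute numbers depend on the machine and on the timer, so they should be read as rough estimates.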

Pentium III Xeon Memory Mountain

[Figure: 3-D surface plot of read throughput (MB/s, 0 to 1200) as a function of stride (words, s1 to s15) and working set size (bytes, 2k to 8m). The ridges of temporal locality are labeled L1, L2 and Mem; the slopes along the stride axis are labeled slopes of spatial locality.]

Pentium III Xeon
550 MHz
16 KB on-chip L1 d-cache
16 KB on-chip L1 i-cache
512 KB off-chip unified L2 cache

Bryant/O’Hallaron, p. 514

Temporal Locality (stride = 1)

[Figure: read throughput (MB/s, 0 to 1200) versus working set size (bytes, from 8m down to 1k) at stride 1, with the plot divided into the main memory region, the L2 cache region and the L1 cache region.]

Spatial Locality Slope (size = 256 KB)

[Figure: read throughput (MB/s, 0 to 800) versus stride (words, s1 to s16) for a 256 KB working set, with the annotation "One access per cache line".]