FAMU-FSU College of Engineering
Computer Architecture, EEL 4713/5764, Fall 2006
Dr. Linda DeBrunner
Module #17: Main Memory Concepts
Mar. 2006 Computer Architecture, Memory System Design Slide 2
Part V: Memory System Design
Topics in This Part
Chapter 17 Main Memory Concepts
Chapter 18 Cache Memory Organization
Chapter 19 Mass Memory Concepts
Chapter 20 Virtual Memory and Paging
Design problem: we want a memory unit that
• Can keep up with the CPU's processing speed
• Has enough capacity for programs and data
• Is inexpensive, reliable, and energy-efficient
17 Main Memory Concepts
Technologies and organizations for a computer's main memory
• SRAM (cache), DRAM (main), and flash (nonvolatile)
• Interleaving and pipelining to get around the "memory wall"
Topics in This Chapter
17.1 Memory Structure and SRAM
17.2 DRAM and Refresh Cycles
17.3 Hitting the Memory Wall
17.4 Interleaved and Pipelined Memory
17.5 Nonvolatile Memory
17.6 The Need for a Memory Hierarchy
17.1 Memory Structure and SRAM
Fig. 17.1 Conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation.
[Figure: an address decoder selects one of 2^h rows of g-bit storage cells built from D flip-flops; external signals are an h-bit Address, g-bit Data in and Data out, Write enable (WE), Chip select (CS), and Output enable (OE).]
Multiple-Chip SRAM
Fig. 17.2 Eight 128K × 8 SRAM chips forming a 256K × 32 memory unit.
[Figure: two rows of four 128K × 8 chips; the MSB of the 18-bit address drives the chip selects of one row, the remaining 17 bits go to every chip's Addr input, and the four chips in the selected row supply data-out bytes 3 down to 0 of the 32-bit word.]
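The address split used by the 256K × 32 unit of Fig. 17.2 can be sketched as follows; the function name and return convention are illustrative, not from the slides.

```python
# Sketch of the Fig. 17.2 address split: the MSB of the 18-bit word address
# selects one of the two rows of four 128K x 8 chips, and the remaining
# 17 bits are presented to every chip in that row.

def split_address(addr: int) -> tuple[int, int]:
    """Return (row_select, chip_address) for an 18-bit word address."""
    assert 0 <= addr < 2**18
    row_select = addr >> 17             # MSB picks the upper or lower chip row
    chip_address = addr & (2**17 - 1)   # 17 bits address 128K words per chip
    return row_select, chip_address

row, offset = split_address(0x3FFFF)    # highest word address -> row 1, last word
```

The same idea generalizes: building a W-word memory from smaller chips splits the address into high-order bits for chip (or bank) selection and low-order bits sent to every chip.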
SRAM with Bidirectional Data Bus
Fig. 17.3 When data input and output of an SRAM chip are shared or connected to a bidirectional data bus, output must be disabled during write operations.
[Figure: the shorthand SRAM symbol with a shared g-bit Data in/out port, plus the h-bit Address, Write enable, Chip select, and Output enable signals.]
17.2 DRAM and Refresh Cycles
DRAM vs. SRAM Memory Cell Complexity
[Figure: (a) DRAM cell: a pass transistor and capacitor at the junction of a word line and bit line. (b) Typical SRAM cell: a cross-coupled pair powered from Vcc, accessed via the word line, bit line, and complemented bit line.]
Fig. 17.4 The single-transistor DRAM cell, considerably simpler than an SRAM cell, leads to dense, high-capacity DRAM memory chips.
DRAM Refresh Cycles and Refresh Rate
Fig. 17.5 Variations in the voltage across a DRAM cell capacitor after writing a 1 and subsequent refresh operations.
[Figure: after a 1 is written, the capacitor voltage decays toward the threshold voltage separating the levels for 1 and 0; each refresh restores it. A cell holds its value for 10s of ms before needing a refresh cycle.]
Loss of Bandwidth to Refresh Cycles (Example 17.2)
A 256 Mb DRAM chip is organized as a 32M × 8 memory externally and as a 16K × 16K array internally. Rows must be refreshed at least once every 50 ms to forestall data loss; refreshing a row takes 100 ns. What fraction of the total memory bandwidth is lost to refresh cycles?
[Figure: the square 16K × 16K memory matrix; 14 address bits drive the row decoder, the selected row is latched in a row buffer, and 11 column bits select g = 8 bits of data out through the column mux.]
Solution
Refreshing all 16K rows takes 16 × 1024 × 100 ns ≈ 1.64 ms. Losing 1.64 ms out of every 50 ms amounts to 1.64/50 ≈ 3.3% of the total bandwidth.
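The arithmetic of Example 17.2 can be checked directly; the variable names below are mine, the numbers come from the example.

```python
# Recomputing Example 17.2: a 16K x 16K internal array must refresh all of
# its 16K rows within 50 ms, at 100 ns per row refresh.

ROWS = 16 * 1024            # 16K rows
T_REFRESH_ROW = 100e-9      # 100 ns to refresh one row
REFRESH_PERIOD = 50e-3      # every row refreshed at least once per 50 ms

time_refreshing = ROWS * T_REFRESH_ROW             # about 1.64 ms
fraction_lost = time_refreshing / REFRESH_PERIOD   # about 3.3%
```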
DRAM Packaging
Fig. 17.6 Typical DRAM package housing a 16M × 4 memory.
[Figure: 24-pin dual in-line package (DIP), pins 1-12 along one side and 24-13 along the other; one side carries A4 A5 A6 A7 A8 A9 D3 D4 CAS OE Vss Vss, the other A0 A1 A2 A3 A10 D1 D2 RAS WE Vcc Vcc NC.]
Legend: Ai = address bit i, CAS = column address strobe, Dj = data bit j, NC = no connection, OE = output enable, RAS = row address strobe, WE = write enable
DRAM Evolution
Fig. 17.7 Trends in DRAM main memory.
[Figure: for each computer class (small PCs, large PCs, workstations, servers, supercomputers), the number of memory chips (1 to 1000, log scale) and the memory size (1 MB to 1 TB) plotted against calendar year, 1980 to 2010.]
17.3 Hitting the Memory Wall
Fig. 17.8 Memory density and capacity have grown along with the CPU power and complexity, but memory speed has not kept pace.
[Figure: relative performance (1 to 10^6, log scale) versus calendar year, 1980 to 2010; the processor curve climbs steeply while the memory curve stays nearly flat, with the gap passing 10^3.]
Bridging the CPU-Memory Speed Gap
Idea: Retrieve more data from memory with each access
Fig. 17.9 Two ways of using a wide-access memory to bridge the speed gap between the processor and memory.
[Figure: (a) buffer and multiplexer at the memory side: a wide-access memory feeds a mux that drives a narrow bus to the processor. (b) Buffer and multiplexer at the processor side: a wide bus carries the full access width to a mux near the processor.]
17.4 Pipelined and Interleaved Memory
Fig. 17.10 Pipelined cache memory, with stages for address translation, row decoding and readout, column decoding and selection, and tag comparison and validation.
Memory latency may involve other supporting operations besides the physical access itself:
Virtual-to-physical address translation (Chapter 20)
Tag comparison to determine cache hit/miss (Chapter 18)
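A toy throughput model illustrates why pipelining the stages of Fig. 17.10 helps; the stage times below are assumed for illustration, not taken from the slides.

```python
# Toy model of a pipelined cache access: the latency of one access is the
# sum of the stage times, but once pipelined, a new access can start every
# max-stage-time, so throughput is set by the slowest stage.

STAGES_NS = {
    "address translation": 1.0,
    "row decode & readout": 2.0,
    "column decode & select": 1.0,
    "tag compare & validate": 1.0,
}

latency_ns = sum(STAGES_NS.values())   # one access still takes the full 5 ns
cycle_ns = max(STAGES_NS.values())     # but a new access can begin every 2 ns
```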
Memory Interleaving
Fig. 17.11 Interleaved memory is more flexible than wide-access
memory in that it can handle multiple independent accesses at once.
[Figure: a dispatch unit routes each access, based on the two LSBs of the address, to one of four modules holding the addresses that are 0, 1, 2, and 3 mod 4; the timing diagram shows the short bus cycle overlapped with the longer memory cycle as modules 0 through 3 are accessed in turn, with return data merged onto the bus.]
17.5 Nonvolatile Memory
ROM, PROM, EPROM
Fig. 17.12 Read-only memory organization, with the fixed contents shown on the right.
[Figure: a grid of word lines and bit lines powered from the supply voltage; the presence or absence of a device at each crossing fixes the word contents 1010, 1001, 0010, 1101.]
Flash Memory
Fig. 17.13 EEPROM or Flash memory organization.
Each memory cell is built of a floating-gate MOS transistor.
[Figure: an array of word lines, bit lines, and source lines; each cell is a floating-gate MOS transistor with a control gate, floating gate, source, and drain (n+ regions in a p substrate).]
17.6 The Need for a Memory Hierarchy
The widening speed gap between CPU and main memory
• Processor operations take on the order of 1 ns
• Memory access requires 10s or even 100s of ns
Memory bandwidth limits the instruction execution rate
• Each instruction executed involves at least one memory access
• Hence, a few to 100s of MIPS is the best that can be achieved
A fast buffer memory can help bridge the CPU-memory gap
• The fastest memories are expensive and thus not very large
• A second (or third) intermediate cache level is thus often used
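A short calculation shows why even a small fast buffer narrows the gap; the 2 ns / 100 ns figures and the 95% hit rate below are assumed for illustration, not taken from the slides.

```python
# Illustrative two-level average access time: a fraction h of accesses hit
# a fast buffer (t_fast), the rest fall through to slow main memory (t_slow).
# A high hit rate pulls the average close to the buffer's speed.

def avg_access_time(hit_rate: float, t_fast: float = 2.0,
                    t_slow: float = 100.0) -> float:
    """Average access time in ns for a fast buffer in front of slow memory."""
    return hit_rate * t_fast + (1.0 - hit_rate) * t_slow

t_avg = avg_access_time(0.95)   # 0.95*2 + 0.05*100 = 6.9 ns, not 100 ns
```

This is the quantitative motivation for the hierarchy of Fig. 17.14: each level buffers the slower, cheaper level below it.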
Typical Levels in a Hierarchical Memory
Fig. 17.14 Names and key characteristics of levels in a memory hierarchy.
Level       Cost per GB   Access latency   Capacity
Reg's       $Millions     ns               100s B
Cache 1     $100s Ks      a few ns         10s KB
Cache 2     $10s Ks       10s ns           MBs
Main        $1000s        100s ns          100s MB
Secondary   $10s          10s ms           10s GB
Tertiary    $1s           min+             TBs
(The figure marks the speed gap: the large jump in access latency between the semiconductor levels and the mechanical ones.)
Before our next class meeting: Homework #10 is due on Thursday, Nov. 16 (no electronic submissions).