FAMU-FSU College of Engineering
Computer Architecture, EEL 4713/5764, Fall 2006
Dr. Linda DeBrunner
Module #17: Main Memory Concepts
Mar. 2006 Computer Architecture, Memory System Design Slide 2
Part V: Memory System Design
Topics in This Part
Chapter 17 Main Memory Concepts
Chapter 18 Cache Memory Organization
Chapter 19 Mass Memory Concepts
Chapter 20 Virtual Memory and Paging
Design problem: we want a memory unit that
• Can keep up with the CPU's processing speed
• Has enough capacity for programs and data
• Is inexpensive, reliable, and energy-efficient
17 Main Memory Concepts
Technologies and organizations for a computer's main memory
• SRAM (cache), DRAM (main), and flash (nonvolatile)
• Interleaving and pipelining to get around the "memory wall"
Topics in This Chapter
17.1 Memory Structure and SRAM
17.2 DRAM and Refresh Cycles
17.3 Hitting the Memory Wall
17.4 Interleaved and Pipelined Memory
17.5 Nonvolatile Memory
17.6 The Need for a Memory Hierarchy
17.1 Memory Structure and SRAM
Fig. 17.1 Conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation.
[Figure: an address decoder selects one of 2^h rows of g-bit storage cells built from D flip-flops; external signals are an h-bit Address, g-bit Data in and Data out, Write enable (WE), Chip select (CS), and Output enable (OE).]
Multiple-Chip SRAM
Fig. 17.2 Eight 128K × 8 SRAM chips forming a 256K × 32 memory unit.
[Figure: two rows of four 128K × 8 chips; the MSB of the 18-bit address drives the chip selects of one row, the remaining 17 bits go to every chip's Addr input, and the four chips in the selected row supply data-out bytes 3 down to 0 of the 32-bit word.]
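The address split used by the 256K × 32 unit of Fig. 17.2 can be sketched as follows; the function name and return convention are illustrative, not from the slides.

```python
# Sketch of the Fig. 17.2 address split: the MSB of the 18-bit word address
# selects one of the two rows of four 128K x 8 chips, and the remaining
# 17 bits are presented to every chip in that row.

def split_address(addr: int) -> tuple[int, int]:
    """Return (row_select, chip_address) for an 18-bit word address."""
    assert 0 <= addr < 2**18
    row_select = addr >> 17             # MSB picks the upper or lower chip row
    chip_address = addr & (2**17 - 1)   # 17 bits address 128K words per chip
    return row_select, chip_address

row, offset = split_address(0x3FFFF)    # highest word address -> row 1, last word
```

The same idea generalizes: building a W-word memory from smaller chips splits the address into high-order bits for chip (or bank) selection and low-order bits sent to every chip.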
SRAM with Bidirectional Data Bus
Fig. 17.3 When data input and output of an SRAM chip are shared or connected to a bidirectional data bus, output must be disabled during write operations.
[Figure: the shorthand SRAM symbol with a shared g-bit Data in/out port, plus the h-bit Address, Write enable, Chip select, and Output enable signals.]
17.2 DRAM and Refresh Cycles
DRAM vs. SRAM Memory Cell Complexity
[Figure: (a) DRAM cell: a pass transistor and capacitor at the junction of a word line and bit line. (b) Typical SRAM cell: a cross-coupled pair powered from Vcc, accessed via the word line, bit line, and complemented bit line.]
Fig. 17.4 The single-transistor DRAM cell, considerably simpler than an SRAM cell, leads to dense, high-capacity DRAM memory chips.
DRAM Refresh Cycles and Refresh Rate
Fig. 17.5 Variations in the voltage across a DRAM cell capacitor after writing a 1 and subsequent refresh operations.
[Figure: after a 1 is written, the capacitor voltage decays toward the threshold voltage separating the levels for 1 and 0; each refresh restores it. A cell holds its value for 10s of ms before needing a refresh cycle.]
Loss of Bandwidth to Refresh Cycles (Example 17.2)
A 256 Mb DRAM chip is organized as a 32M × 8 memory externally and as a 16K × 16K array internally. Rows must be refreshed at least once every 50 ms to forestall data loss; refreshing a row takes 100 ns. What fraction of the total memory bandwidth is lost to refresh cycles?
[Figure: the square 16K × 16K memory matrix; 14 address bits drive the row decoder, the selected row is latched in a row buffer, and 11 column bits select g = 8 bits of data out through the column mux.]
Solution
Refreshing all 16K rows takes 16 × 1024 × 100 ns ≈ 1.64 ms. Losing 1.64 ms out of every 50 ms amounts to 1.64/50 ≈ 3.3% of the total bandwidth.
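The arithmetic of Example 17.2 can be checked directly; the variable names below are mine, the numbers come from the example.

```python
# Recomputing Example 17.2: a 16K x 16K internal array must refresh all of
# its 16K rows within 50 ms, at 100 ns per row refresh.

ROWS = 16 * 1024            # 16K rows
T_REFRESH_ROW = 100e-9      # 100 ns to refresh one row
REFRESH_PERIOD = 50e-3      # every row refreshed at least once per 50 ms

time_refreshing = ROWS * T_REFRESH_ROW             # about 1.64 ms
fraction_lost = time_refreshing / REFRESH_PERIOD   # about 3.3%
```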
DRAM Packaging
Fig. 17.6 Typical DRAM package housing a 16M × 4 memory.
[Figure: 24-pin dual in-line package (DIP), pins 1-12 along one side and 24-13 along the other; one side carries A4 A5 A6 A7 A8 A9 D3 D4 CAS OE Vss Vss, the other A0 A1 A2 A3 A10 D1 D2 RAS WE Vcc Vcc NC.]
Legend: Ai = address bit i, CAS = column address strobe, Dj = data bit j, NC = no connection, OE = output enable, RAS = row address strobe, WE = write enable
DRAM Evolution
Fig. 17.7 Trends in DRAM main memory.
[Figure: for each computer class (small PCs, large PCs, workstations, servers, supercomputers), the number of memory chips (1 to 1000, log scale) and the memory size (1 MB to 1 TB) plotted against calendar year, 1980 to 2010.]
17.3 Hitting the Memory Wall
Fig. 17.8 Memory density and capacity have grown along with the CPU power and complexity, but memory speed has not kept pace.
[Figure: relative performance (1 to 10^6, log scale) versus calendar year, 1980 to 2010; the processor curve climbs steeply while the memory curve stays nearly flat, with the gap passing 10^3.]
Bridging the CPU-Memory Speed Gap
Idea: Retrieve more data from memory with each access
Fig. 17.9 Two ways of using a wide-access memory to bridge the speed gap between the processor and memory.
[Figure: (a) buffer and multiplexer at the memory side: a wide-access memory feeds a mux that drives a narrow bus to the processor. (b) Buffer and multiplexer at the processor side: a wide bus carries the full access width to a mux near the processor.]
17.4 Pipelined and Interleaved Memory
Fig. 17.10 Pipelined cache memory, with stages for address translation, row decoding and readout, column decoding and selection, and tag comparison and validation.
Memory latency may involve other supporting operations besides the physical access itself:
Virtual-to-physical address translation (Chapter 20)
Tag comparison to determine cache hit/miss (Chapter 18)
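A toy throughput model illustrates why pipelining the stages of Fig. 17.10 helps; the stage times below are assumed for illustration, not taken from the slides.

```python
# Toy model of a pipelined cache access: the latency of one access is the
# sum of the stage times, but once pipelined, a new access can start every
# max-stage-time, so throughput is set by the slowest stage.

STAGES_NS = {
    "address translation": 1.0,
    "row decode & readout": 2.0,
    "column decode & select": 1.0,
    "tag compare & validate": 1.0,
}

latency_ns = sum(STAGES_NS.values())   # one access still takes the full 5 ns
cycle_ns = max(STAGES_NS.values())     # but a new access can begin every 2 ns
```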
Memory Interleaving
Fig. 17.11 Interleaved memory is more flexible than wide-access
memory in that it can handle multiple independent accesses at once.
[Figure: a dispatch unit routes each access, based on the two LSBs of the address, to one of four modules holding the addresses that are 0, 1, 2, and 3 mod 4; the timing diagram shows the short bus cycle overlapped with the longer memory cycle as modules 0 through 3 are accessed in turn, with return data merged onto the bus.]
17.5 Nonvolatile Memory
ROM, PROM, EPROM
Fig. 17.12 Read-only memory organization, with the fixed contents shown on the right.
[Figure: a grid of word lines and bit lines powered from the supply voltage; the presence or absence of a device at each crossing fixes the word contents 1010, 1001, 0010, 1101.]
Flash Memory
Fig. 17.13 EEPROM or Flash memory organization.
Each memory cell is built of a floating-gate MOS transistor.
[Figure: an array of word lines, bit lines, and source lines; each cell is a floating-gate MOS transistor with a control gate, floating gate, source, and drain (n+ regions in a p substrate).]
17.6 The Need for a Memory Hierarchy
The widening speed gap between CPU and main memory
• Processor operations take on the order of 1 ns
• Memory access requires 10s or even 100s of ns
Memory bandwidth limits the instruction execution rate
• Each instruction executed involves at least one memory access
• Hence, a few to 100s of MIPS is the best that can be achieved
A fast buffer memory can help bridge the CPU-memory gap
• The fastest memories are expensive and thus not very large
• A second (or third) intermediate cache level is thus often used
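A short calculation shows why even a small fast buffer narrows the gap; the 2 ns / 100 ns figures and the 95% hit rate below are assumed for illustration, not taken from the slides.

```python
# Illustrative two-level average access time: a fraction h of accesses hit
# a fast buffer (t_fast), the rest fall through to slow main memory (t_slow).
# A high hit rate pulls the average close to the buffer's speed.

def avg_access_time(hit_rate: float, t_fast: float = 2.0,
                    t_slow: float = 100.0) -> float:
    """Average access time in ns for a fast buffer in front of slow memory."""
    return hit_rate * t_fast + (1.0 - hit_rate) * t_slow

t_avg = avg_access_time(0.95)   # 0.95*2 + 0.05*100 = 6.9 ns, not 100 ns
```

This is the quantitative motivation for the hierarchy of Fig. 17.14: each level buffers the slower, cheaper level below it.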
Typical Levels in a Hierarchical Memory
Fig. 17.14 Names and key characteristics of levels in a memory hierarchy.
Level       Cost per GB   Access latency   Capacity
Reg's       $Millions     ns               100s B
Cache 1     $100s Ks      a few ns         10s KB
Cache 2     $10s Ks       10s ns           MBs
Main        $1000s        100s ns          100s MB
Secondary   $10s          10s ms           10s GB
Tertiary    $1s           min+             TBs
(The figure marks the speed gap: the large jump in access latency between the semiconductor levels and the mechanical ones.)
Before our next class meeting: Homework #10 is due on Thursday, Nov. 16 (no electronic submissions).