Date post: | 20-Dec-2015 |
Rev. by Luciano Gualà (2008) 14 -
William Stallings Computer Organization and Architecture
Chapters 4 & 5: Cache Memory and Internal Memory
Memory
• How much? As much as possible
• How fast? As fast as possible
• How expensive? As cheap as possible
• Fast memory is expensive
• Large memory is expensive
• The larger the memory, the slower the access
Memory Hierarchy
• CPU Registers
• L1 cache (on chip)
• L2 cache (on board)
• Main memory
• Disk cache
• Disk
• Optical
• Tape
[Figure: arrows beside the list labelled Access time, Size, Access frequency, Cost per bit]
Characteristics
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location
• CPU
  Registers
• Internal: accessed directly by the CPU
  Cache
  RAM
• External: accessed through an I/O module
  Disks
  CD-ROM, …
Capacity
• Word size
  The natural unit of organisation
  Usually equal to the number of bits used for representing numbers or instructions
  Typical word size: 8 bits, 16 bits, 32 bits
• Number of words (or Bytes)
  1 Byte = 8 bits = 2^3 bits
  1 K Byte = 2^10 Bytes = 2^10 x 2^3 bits = 1024 Bytes (Kilo)
  1 M Byte = 2^10 K Bytes = 1024 K Bytes (Mega)
  1 G Byte = 2^10 M Bytes = 2^30 Bytes (Giga)
  1 T Byte = 2^10 G Bytes = 1024 G Bytes (Tera)
Unit of Transfer
• Number of bits that can be read/written at the same time
• Internal
  Usually governed by data bus width
  Bus width may be equal to word size or (often) larger
  Typical bus width: 64, 128, 256 bits
• External
  Usually a block, which is much larger than a word
• A related concept: addressable unit
  Smallest location which can be uniquely addressed
  Word internally
  Cluster on M$ (Microsoft) disks
Access Methods (1)
• Sequential
  Start at the beginning and read through in order
  Access time depends on location of data and previous location
  e.g. tape
• Direct
  Individual blocks have unique address
  Access is by jumping to vicinity plus sequential search
  Access time depends on location and previous location
  e.g. disk
Access Methods (2)
• Random
  Individual addresses identify locations exactly
  Access time is independent of location or previous access
  e.g. RAM
• Associative
  Data is located by a comparison with contents of a portion of the store
  Access time is independent of location or previous access
  e.g. cache
Performance
• Access time
  Time between presenting the address and getting the valid data
• Memory cycle time
  Time may be required for the memory to "recover" before the next access
  Cycle time is access + recovery
• Transfer rate
  Rate at which data can be moved
  T_N = T_A + N/R
  N: number of bits
  T_A: access time
  T_N: time needed to read N bits
  R: transfer rate
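The transfer-time relation above can be checked numerically. The following is a minimal sketch; the function name and the sample figures (1 ms access time, 1 Mbit/s transfer rate) are illustrative assumptions, not values from the slides.

```python
def transfer_time(access_time_s: float, n_bits: int, rate_bits_per_s: float) -> float:
    """Time to read n_bits: T_N = T_A + N / R."""
    return access_time_s + n_bits / rate_bits_per_s


# Hypothetical device: T_A = 1 ms, R = 1 Mbit/s, reading N = 8000 bits.
# The access time is paid once; the N/R term grows with the amount of data.
t_n = transfer_time(0.001, 8000, 1_000_000)
print(f"T_N = {t_n * 1000:.1f} ms")
```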
Physical Types
• Semiconductor
  RAM, ROM, EPROM, Cache
• Magnetic
  Disk & Tape
• Optical
  CD & DVD
• Others …
Semiconductor Memory
• RAM (Random Access Memory)
  Misnamed, as all semiconductor memories are random access
  Read/Write
  Volatile
  Temporary storage
  Static or dynamic
• ROM (Read Only Memory)
  Permanent storage
  Read only
Dynamic RAM
• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits
• Slower
• Main memory (static RAM would be too expensive)
Static RAM
• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Cache (here the faster the better)
Read Only Memory (ROM)
• Permanent storage
• Microprogramming (see later)
• Library subroutines
• Systems programs (BIOS)
• Function tables
Types of ROM
• Written during manufacture
  Very expensive for small runs
• Programmable (once)
  PROM
  Needs special equipment to program
• Read "mostly"
  Erasable Programmable (EPROM)
    Erased by UV (it can take up to 20 minutes)
  Electrically Erasable (EEPROM)
    Takes much longer to write than read
    A single byte can be erased
  Flash memory
    Erase memory electrically "block-at-a-time"
Physical Characteristics
• Decay (refresh time)
• Volatility (needs power source)
• Erasable
• Power consumption
Organisation
• Physical arrangement of bits into words
• Not always obvious
  e.g. interleaved
Basic Organization (1)
• Basic element: memory cell
  Has 2 stable states: one represents 0, the other 1
  Can be written at least once
  Can be read
[Figure: a memory cell in write mode (Select, R/W Control, Input Data) and in read mode (Select, R/W Control, Output Data)]
Basic Organization (2)
• Basic organization of a 512x512 bits chip
[Figure: address lines A0…A8 and A9…A17 (9 bits each) feed the Row Address Decoder and the Column Address Decoder; Array of Memory Cells (512x512); Sense Amplifier and I/O Gate; Timing and control; 1-bit data line D0]
Module Organisation
• Basic organization of a 256KB chip
• 8 times a 512x512 bits chip
• …For a 1 MB chip replicate 4 times this organization…
Organisation for larger sizes
• The larger the size, the higher the number of address pins
• For 2^k words, k pins are needed
• A solution to reduce the number of address pins
  Multiplex row address and column address
  k/2 pins to address 2^k Bytes
  Adding one more pin doubles the range of values of both row and column, so x4 capacity
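The pin-count argument above can be sketched in a few lines. The helper name is hypothetical, and the sketch assumes the multiplexed row and column halves are of equal width (rounded up when k is odd).

```python
def address_pins(words: int, multiplexed: bool = False) -> int:
    """Address pins needed for `words` locations.

    Without multiplexing, k pins address 2**k locations. With row/column
    multiplexing the same pins carry the row and then the column address,
    so only about k/2 pins are needed (assumption: equal halves).
    """
    k = (words - 1).bit_length()  # smallest k such that 2**k >= words
    return (k + 1) // 2 if multiplexed else k


# The 512x512 array from the earlier slide has 2**18 cells.
print(address_pins(2**18))                    # pins without multiplexing
print(address_pins(2**18, multiplexed=True))  # pins with multiplexing

# One extra multiplexed pin adds one row bit and one column bit:
print(2 ** (2 * 10) // 2 ** (2 * 9))          # capacity ratio
```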
Refreshing (Dynamic RAM)
• Refresh circuit included on chip
• Disable chip
• Count through rows
• Read & Write back
• Takes time
• Slows down apparent performance
Error Correction
• Hard Failure
  Permanent defect
• Soft Error
  Random, non-destructive
  No permanent damage to memory
• Detected using Hamming error correcting code
  It is able to detect and correct 1-bit errors
A simple example of correction (1)
• Correcting errors in 4-bit words
• 3 control groups
• In each control group add 1 parity bit
[Figure: Venn diagrams of three overlapping circles A, B, C; the 4 data bits sit in the intersections and one parity bit is added in each circle]
A simple example of correction (2)
• One of the bits changes value
• Using the control bits, the right value is restored
[Figure: the same Venn diagrams of circles A, B, C, shown with one data bit changed and then restored]
Compare Circuit
• It takes two K-length binary strings X, Y as input
  X = X_K … X_1
  Y = Y_K … Y_1
• It returns a K-length binary string Z (syndrome)
  Z = Z_K … Z_1
  Z_i = X_i ⊕ Y_i for each i = 1, …, K
• Z = 0…0 means no error
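As a rough illustration of the compare circuit, the sketch below computes the syndrome as a bitwise exclusive-or of two equal-length bit strings. Representing X and Y as Python strings of '0' and '1' is an assumption made for readability; the function name is hypothetical.

```python
def syndrome(x: str, y: str) -> str:
    """Bitwise XOR of two K-length bit strings, returned as a bit string."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length K")
    # A position yields '1' exactly where the two inputs differ.
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


print(syndrome("101", "101"))  # all zeros: no error
print(syndrome("101", "111"))  # non-zero: the control bits disagree
```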
Relation between M and K
• Z may assume 2^K values
• The value Z = 0…0 means no error
• The error may be in any bit among the M+K bits
• It must be
  2^K - 1 >= M + K
Data bits (M)   Control bits (K)   Additional memory (%)
4               3                  75
8               4                  50
16              5                  31.25
32              6                  18.75
64              7                  10.94
128             8                  6.25
256             9                  3.52
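The table can be reproduced directly from the condition that the 2^K syndrome values must cover "no error" plus an error in any of the M+K bits. This is a minimal sketch; the function name is hypothetical.

```python
def control_bits(m: int) -> int:
    """Smallest K such that 2**K - 1 >= M + K."""
    k = 1
    while 2**k - 1 < m + k:
        k += 1
    return k


for m in (4, 8, 16, 32, 64, 128, 256):
    k = control_bits(m)
    # Overhead is the control bits as a percentage of the data bits.
    print(f"M={m:3d}  K={k}  additional memory={100 * k / m:.2f}%")
```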
How to arrange the M+K bits
• The M+K bits are arranged so that
  if Z contains a single bit equal to 1
    the error occurred in the corresponding control bit
  if Z contains more than one bit equal to 1
    the error occurred in the i-th bit, where i is the value (in binary) of Z
The case M=4
bit position      7     6     5     4     3     2     1
position number   111   110   101   100   011   010   001
data bits         D4    D3    D2          D1
control bits                        C4          C2    C1
C1 = D1 ⊕ D2 ⊕ D4
C2 = D1 ⊕ D3 ⊕ D4
C4 = D2 ⊕ D3 ⊕ D4
[Figure: Venn diagram of three circles labelled C1, C2, C4 with data bits D1, D2, D3, D4 in the intersections]
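The M=4 layout above can be exercised with a short sketch. It assumes the bit placement in the table (C1, C2, D1, C4, D2, D3, D4 at positions 1 to 7) and computes the syndrome as the exclusive-or of the position numbers of all bits equal to 1, which is equivalent to comparing the stored control bits with recomputed ones. Function names are hypothetical.

```python
def encode(d1: int, d2: int, d3: int, d4: int) -> list[int]:
    """Return the 7-bit word; list index i holds bit position i + 1."""
    c1 = d1 ^ d2 ^ d4
    c2 = d1 ^ d3 ^ d4
    c4 = d2 ^ d3 ^ d4
    return [c1, c2, d1, c4, d2, d3, d4]


def correct(word: list[int]) -> tuple[list[int], int]:
    """Return (corrected word, syndrome value); assumes at most one bit error."""
    z = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            z ^= position  # XOR of the positions holding a 1
    fixed = list(word)
    if z:
        fixed[z - 1] ^= 1  # Z, read as a number, is the faulty position
    return fixed, z


word = encode(1, 1, 1, 0)
damaged = list(word)
damaged[4] ^= 1                  # flip position 5 (D2)
repaired, z = correct(damaged)
print(z, repaired == word)       # syndrome 5 (binary 101), word restored
```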
Exercise
• Design a Hamming error correcting code for 8-bit words
• See the textbook for the solution
Cache
• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module
Cache operation - overview
• CPU requests contents of memory location
• Check cache for this data
• If present (hit), get from cache (fast)
• If not present (miss), read required block from main memory to cache
• Then deliver from cache to CPU
Cache Performance
• Cache access time: t = 1
• Memory access time: T = 10
• Hit probability: H
T_average access = t*H + (T+t)*(1-H) = t + (1-H)*T
[Figure: plot of T_average access (vertical axis, 0 to 10) against H (horizontal axis, 0.10 to 1.00)]
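The average access time formula is easy to tabulate for the values on the slide (t = 1, T = 10). The sketch below is illustrative; the function name is an assumption.

```python
def average_access_time(hit_probability: float, t: float = 1.0, T: float = 10.0) -> float:
    """t*H + (T + t)*(1 - H), simplified to t + (1 - H)*T."""
    return t + (1.0 - hit_probability) * T


# A miss costs the memory access plus the cache access, so the average
# falls linearly from t + T at H = 0 towards t at H = 1.
for h in (0.1, 0.5, 0.9, 1.0):
    print(f"H={h:.2f}  T_average={average_access_time(h):.2f}")
```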
Locality of Reference (Denning '68)
• Spatial Locality
  Memory cells physically close to those just accessed tend to be accessed
• Temporal Locality
  During the course of the execution of a program, all accesses to the same memory cells tend to be close in time
• e.g. loops, arrays
An example
200  …
201  …
202  SUB X, Y
203  BRZ 211              (conditional branch)
…    …
210  BRA 202              (unconditional branch)
211  …
…    …
225  BRE R1, R2, 235      (conditional branch)
…    …
235  …
Cache Design
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches
Size does matter
• Cost
  More cache is expensive
• Speed
  More cache is faster (up to a point)
  Checking cache for data takes time
Cache-memory mapping
• Main memory: 2^n addressable words, grouped into M = 2^n/K blocks of K words each
• Cache: C lines, with C << M
• Each block is mapped to a cache line
Mapping Function
• Word size: 1 Byte
• Cache of 64 KBytes (2^16 Bytes)
• Cache block of 4 bytes
  64 KB/4 = 16K (2^14) lines of 4 bytes
• 16 MBytes (2^24) main memory
  2^24/4 = 4M (2^22) blocks in main memory
• Map 2^22 blocks to 2^14 lines of cache
A simple example of Direct Mapping
[Figure: main memory of 32 words (5-bit addresses 00000 to 11111) grouped into 16 blocks of 2 words (Block 0 … Block 15) and mapped onto a 4-line cache (Line 0 … Line 3); address fields labelled s-r, r, w; Blocks 0 to 3 map to Lines 0 to 3, Block 4 to Line 0, Block 15 to Line 3]
Direct Mapping (1)
• Each block of main memory is mapped to a specific cache line
  i.e. if a block is in cache, it must be in one specific place
• In a cache of C lines, block j is stored into line i, where:
  i = j mod C
Direct Mapping (2)
• Address is in two parts
  w Least Significant Bits (LSB) identify a unique word
  s Most Significant Bits (MSB) specify one memory block
• The MSBs are split into
  a cache line field r (least significant)
  a tag of s-r (most significant)
Direct Mapping: Summarizing
• address length: n = s+w bits
• number of addressable units (words): 2^(s+w)
• block size = cache line size = 2^w words
• number of memory blocks: 2^(s+w)/2^w = 2^s
• number of cache lines: C = 2^r
• tag length: (s-r) bits
Cache Line Mapping Table
Cache line   Main memory blocks held
0            0, C, 2C, …, 2^s - C
1            1, C+1, 2C+1, …, 2^s - C + 1
…            …
C-1          C-1, 2C-1, 3C-1, …, 2^s - 1
Direct Mapping Address Structure
Tag (s-r)   Line or Slot (r)   Word (w)
8 bits      14 bits            2 bits
• 24 bit address: 16 MBytes (2^24) main memory
• 2 bit word identifier (4 byte block)
• Cache: 64 KB/4 = 16K (2^14) lines of 4 bytes
• 22 bit block identifier
  8 bit tag (= 22-14)
  14 bit slot or line
• No two blocks mapping to the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
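The 8/14/2 address structure can be illustrated with shifts and masks. This is a sketch under the field widths given above; the function name and the sample address 0x16339C are assumptions chosen for illustration.

```python
def split_direct(address: int, tag_bits: int = 8, line_bits: int = 14, word_bits: int = 2):
    """Split an address into (tag, line, word) for a direct-mapped cache."""
    if not 0 <= address < 1 << (tag_bits + line_bits + word_bits):
        raise ValueError("address does not fit in the given field widths")
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag = address >> (word_bits + line_bits)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:#04x} line={line:#06x} word={word}")
# tag == 0x16, line == 0x0CE7, word == 0

# The line field equals i = j mod C for block number j and C = 2**14 lines.
block = 0x16339C >> 2
assert line == block % 2**14
```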
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
  If a program repeatedly accesses 2 distinct blocks that map to the same line, the cache miss rate is very high (thrashing)
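Thrashing can be seen in a toy simulation. The sketch below models only which block each line holds (no data, no timing) and counts misses for a sequence of block numbers; the function name and the 4-line cache size are illustrative assumptions.

```python
def direct_mapped_misses(block_refs, num_lines: int) -> int:
    """Count misses when block j always goes to line j mod num_lines."""
    lines = [None] * num_lines      # block currently held by each line
    misses = 0
    for j in block_refs:
        i = j % num_lines
        if lines[i] != j:           # miss: the line holds another block (or nothing)
            misses += 1
            lines[i] = j
    return misses


# Blocks 0 and 4 both map to line 0 of a 4-line cache: every access misses.
print(direct_mapped_misses([0, 4] * 5, num_lines=4))
# Blocks 0 and 1 map to different lines: only the first access to each misses.
print(direct_mapped_misses([0, 1] * 5, num_lines=4))
```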
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line's tag is examined for a match
• Cache searching gets expensive
A simple example of Associative Mapping
[Figure: the same 32-word main memory (16 blocks of 2 words, Block 0 … Block 15); address fields labelled s and w; a 4-line cache (Line 0 … Line 3) holding tags 0011, 0001, 0000, 0100, each line with words w=0 and w=1]
Note: a replacement algorithm is needed (see later)
Associative Mapping: Summarizing
• address length: n = s+w bits
• number of addressable units (words): 2^(s+w)
• block size = cache line size = 2^w words
• number of memory blocks: 2^(s+w)/2^w = 2^s
• number of cache lines: not specified
• tag length: s bits
Associative Mapping Address Structure
Tag       Word
22 bits   2 bits
• 22 bit tag stored with each 4 byte block of data
• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which byte is required from the 4 byte data block
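A lookup in an associative cache compares the address tag with the tag of every line. Hardware does these comparisons in parallel; the sequential loop below is only a sketch of the logic, with hypothetical names and a list of stored tags standing in for the cache.

```python
def associative_lookup(address: int, cache_tags: list, word_bits: int = 2):
    """Return (line, word) on a hit, or None on a miss."""
    tag = address >> word_bits                 # upper 22 bits of a 24-bit address
    word = address & ((1 << word_bits) - 1)    # byte within the 4-byte block
    for line, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return line, word
    return None


# Hypothetical cache contents: two lines holding the blocks with these tags.
tags = [0x058CE7, 0x3FFFFF]
print(associative_lookup(0x16339C, tags))  # hit in line 0
print(associative_lookup(0x000004, tags))  # miss
```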
Set Associative Mapping
• Cache is divided into v sets
• Each set contains k lines
• Number of cache lines: C = v x k
• A given block maps to any line in a given set
  Block j can be in any line of set i, where i = j mod v
• There are k lines in a set (k-way set associative mapping)
  k=1: direct mapping; k=C: associative mapping
• A common choice in practice is 2 lines per set
  2-way set associative mapping
  A given block can be in only one set, but in any of its 2 lines
A simple example of Set Associative Mapping
[Figure: the same 32-word main memory (16 blocks of 2 words, Block 0 … Block 15), each block labelled with its set (Set 0 or Set 1); address fields labelled s-d, d, w; a 4-line cache (Line 0 … Line 3) organised as 2 sets of 2 lines, holding tags 010, 000, 111, 000, each line with words w=0 and w=1]
Note: a replacement algorithm is needed (see later)
Set Associative Mapping
• Address is in two parts
  w Least Significant Bits (LSB) identify a unique word
  s Most Significant Bits (MSB) specify one memory block
• The MSBs are split into
  a cache set field d (least significant)
  a tag of s-d (most significant)
Set Associative Mapping: Summarizing
• address length: n = s+w bits
• number of addressable units (words): 2^(s+w)
• block size = cache line size = 2^w words
• number of memory blocks: 2^(s+w)/2^w = 2^s
• number of lines for each cache set: k
• number of sets: v = 2^d
• number of cache lines: C = k x v = k x 2^d
• tag length: (s-d) bits
Set Associative Mapping Address Structure
Tag      Set       Word
9 bits   13 bits   2 bits
• number of cache lines: 2^14
• number of cache sets: 2^13
• each cache set has two lines: 2-way set associative mapping
• Use set field to determine cache set to look in
• Compare Tag field with all lines in the set to see if we have a hit
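The 9/13/2 structure differs from the direct-mapped one only in where the boundary between tag and index falls. The sketch below splits an address into its three fields; the function name and the sample address are illustrative assumptions.

```python
def split_set_associative(address: int, set_bits: int = 13, word_bits: int = 2):
    """Split an address into (tag, set, word) for a set associative cache."""
    word = address & ((1 << word_bits) - 1)
    block = address >> word_bits                 # block number j
    set_index = block & ((1 << set_bits) - 1)    # i = j mod v, with v = 2**set_bits
    tag = block >> set_bits
    return tag, set_index, word


tag, set_index, word = split_set_associative(0x16339C)
print(f"tag={tag:#05x} set={set_index:#06x} word={word}")
# tag == 0x02C, set == 0x0CE7, word == 0

# With k = 2 lines per set, 2**13 sets give the same 2**14 lines as before.
assert 2 * 2**13 == 2**14
```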
Replacement Algorithms (1): Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Replacement Algorithms (2): Associative & Set Associative
• Hardware implemented algorithm (to obtain speed)
• Least Recently Used (LRU)
  e.g. in 2-way set associative: which of the 2 blocks is LRU?
• First in first out (FIFO)
  replace block that has been in cache longest
• Least frequently used
  replace block which has had fewest hits
• Random
  Almost as good as LRU
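LRU replacement can be sketched in software, although real caches implement it in hardware as noted above. The model below tracks a single set (or a whole fully associative cache) of `num_lines` lines and counts misses; the function name and reference sequences are illustrative assumptions.

```python
from collections import OrderedDict


def lru_misses(block_refs, num_lines: int) -> int:
    """Count misses for one set of num_lines lines with LRU replacement."""
    cache = OrderedDict()                  # keys ordered from least to most recently used
    misses = 0
    for j in block_refs:
        if j in cache:
            cache.move_to_end(j)           # hit: mark block as most recently used
        else:
            misses += 1
            if len(cache) == num_lines:
                cache.popitem(last=False)  # evict the least recently used block
            cache[j] = True
    return misses


# Two blocks alternating in a 2-line set: only the first two accesses miss.
print(lru_misses([0, 4, 0, 4, 0, 4], num_lines=2))
# Three blocks cycling through 2 lines: LRU always evicts the block needed next.
print(lru_misses([0, 1, 2, 0, 1, 2], num_lines=2))
```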
Write Policy
• Multiple CPUs may have individual caches
• I/O may address main memory directly
  Cache(s) and main memory may become inconsistent
Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
• Lots of traffic
• Slows down writes
Write back
• Updates initially made in cache only
• Update bit for cache slot is set when update occurs
• If block has to be replaced, write to main memory only if update bit is set
• I/O must access main memory through cache
• N.B. about 15% of memory references are writes
• Caches of other devices get out of sync
  Cache coherency problem (a general problem in distributed systems!)
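The difference in memory traffic between the two write policies can be sketched with a one-line cache. The model below is a deliberate simplification: it assumes a block is loaded into the cache on a write miss, ignores any block left in the cache at the end of the trace, and uses hypothetical names throughout.

```python
def memory_writes(ops, policy: str) -> int:
    """Count writes to main memory for a one-line cache.

    ops: sequence of ('r' or 'w', block number).
    policy: 'write-through' or 'write-back'.
    """
    cached, update_bit, writes = None, False, 0
    for op, block in ops:
        if block != cached:                       # the cached block is replaced
            if policy == "write-back" and update_bit:
                writes += 1                       # write the old block back once
            cached, update_bit = block, False
        if op == "w":
            if policy == "write-through":
                writes += 1                       # every write goes to memory
            else:
                update_bit = True                 # only mark the slot as updated
    return writes


trace = [("w", 0), ("w", 0), ("w", 0), ("r", 1)]
print(memory_writes(trace, "write-through"))
print(memory_writes(trace, "write-back"))
```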
Block Size
• Too small
  Locality of reference is not used
• Too large
  Locality of reference is lost
• Typical block size: 8 to 32 bytes