Date posted: 03-Jan-2016 | Uploaded by: miriam-gallagher
Computer Architecture
Lecture 8: Memory hierarchy. Cache memory
Piotr Bilski
Characteristics of memory systems
• Location
• Capacity
• Transfer unit
• Access mode
• Performance
• Physical structure
• Physical characteristics
• Organization
Memory location
• Processor (registers, L1 cache memory)
• Internal (main) memory (RAM)
• External (auxiliary) memory (disk drives)
Memory capacity
• Word size
• Number of words
• Memory capacity is expressed in bytes and their multiples:
1 B = 8 b
1 KB = 1024 B, 1 MB = 1024 KB, etc.
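As a small illustration of the two factors above, the sketch below computes capacity as word size × number of words and expresses it with the binary prefixes used in this lecture (1 KB = 1024 B). The function names are hypothetical, chosen only for this example.

```python
def capacity_bytes(word_size_bits: int, num_words: int) -> int:
    """Capacity in bytes = word size (bits) x number of words / 8 bits per byte."""
    total_bits = word_size_bits * num_words
    if total_bits % 8:
        raise ValueError("capacity is not a whole number of bytes")
    return total_bits // 8


def with_binary_prefix(num_bytes: int) -> str:
    """Express a byte count using 1 KB = 1024 B, 1 MB = 1024 KB, and so on."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    unit = 0
    # Move to the next larger unit while the value is at least 1024.
    while value >= 1024 and unit < len(units) - 1:
        value /= 1024
        unit += 1
    return f"{value:g} {units[unit]}"


if __name__ == "__main__":
    # Hypothetical memory: 2**22 words of 32 bits each
    size = capacity_bytes(32, 2**22)
    print(size, with_binary_prefix(size))  # 16777216 16 MB
```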
Transfer unit
• Number of data lines connected to the memory module (normally equal to the word length), but:
– The word is the basic unit of memory organization
– The addressable unit is the unit that memory addresses refer to (byte or word)
– The transfer unit can be equal to the word or to the addressable unit
Memory access modes
• Sequential access (e.g. tape memory)
• Direct access (disk memory)
• Random access (main memory)
• Associative access (cache memory)
Memory performance
• Access time – time from placing the address on the address bus until the information is available on the data bus
• Cycle time – access time plus the additional time that must elapse before the next access can begin
• Transfer rate – for RAM: 1 / cycle time
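The three definitions above chain together: cycle time builds on access time, and for RAM the transfer rate is the reciprocal of the cycle time. A minimal sketch, assuming hypothetical timings (60 ns access time, 40 ns gap before the next access) that are not taken from any real device:

```python
def cycle_time_ns(access_time_ns: float, recovery_ns: float) -> float:
    """Cycle time = access time + gap required before the next access may start."""
    return access_time_ns + recovery_ns


def transfer_rate_per_s(cycle_ns: float) -> float:
    """For random-access memory: transfer rate = 1 / cycle time.

    The cycle time is given in nanoseconds, so 1 s = 1e9 ns.
    """
    return 1e9 / cycle_ns


if __name__ == "__main__":
    t_cycle = cycle_time_ns(60, 40)   # hypothetical values
    print(t_cycle, transfer_rate_per_s(t_cycle))  # 100 10000000.0
```

The rate is in transfers (words or addressable units) per second; multiplying by the transfer unit size gives bytes per second.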
Physical memory structure
• Semiconductor (RAM, ROM)
• Magnetic (hard disks, floppy disks, tape streamers)
• Optical (CD-ROM, DVD-ROM)
• Magneto-optical (WORM)
Physical characteristics
• Volatility
– Volatile memory (RAM)
– Non-volatile memory (ROM)
• Content modification
– Erasable (e.g. RAM, EPROM)
– Non-erasable (ROM)
Memory organization
• One level ("flat")
• Multilevel (e.g. cache)
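[Figure: average access time of a two-level memory as a function of the hit ratio (0 to 1); it falls from T1 + T2 at hit ratio 0 to T1 at hit ratio 1 (T1 – access time of the first, faster level; T2 – access time of the second level).]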
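The end points of the figure match the usual two-level model: a hit costs T1, while a miss costs T1 + T2 (the first level is checked, then the second level is accessed). The sketch below assumes exactly that model; the function name and the timings (1 ns and 10 ns) are illustrative only.

```python
def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Average access time of a two-level memory.

    Assumed model (consistent with the figure's end points):
      hit  -> the word is read from level 1 in time t1
      miss -> level 1 is checked, then level 2 is accessed: t1 + t2
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must lie in [0, 1]")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


if __name__ == "__main__":
    # Hypothetical timings in ns: 1 ns first level, 10 ns second level.
    for h in (0.0, 0.5, 0.9, 0.95, 1.0):
        print(f"H = {h:4.2f}: T = {average_access_time(h, 1.0, 10.0):.2f} ns")
```

With these numbers a hit ratio of 0.95 already gives 1.5 ns, compared with 11 ns when every access misses, which is why a small but frequently hit cache pays off.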
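Memory hierarchy
• Processor registers
• Cache memory
• Main (operational) memory
• External memory
Going down the hierarchy, capacity increases, speed decreases (access time grows) and cost per bit falls. The trade-offs are:
– shorter access time – greater cost per bit
– greater capacity – smaller cost per bit
– greater capacity – longer access time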
Why do we need cache memory?
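• Principle of locality of reference – memory references of an executing program tend to cluster: the program consists of fragments located next to each other and executed one after another
• Temporal locality – a recently referenced item is likely to be referenced again soon (e.g. in loops)
• Spatial locality – items at addresses close to a recently referenced one are likely to be referenced soon (e.g. sequential instructions, arrays)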
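The effect of both kinds of locality on the hit ratio can be shown with a toy trace. The sketch assumes a cache large enough to keep every block once it is loaded, so only the first reference to each block misses; the function name and the trace are hypothetical.

```python
def hit_ratio(word_addresses, block_size_words):
    """Fraction of references that hit, assuming a cache large enough to keep
    every block once loaded (so only first-time references to a block miss)."""
    if not word_addresses:
        raise ValueError("empty trace")
    loaded_blocks = set()
    hits = 0
    for addr in word_addresses:
        block = addr // block_size_words  # number of the block containing the word
        if block in loaded_blocks:
            hits += 1
        else:
            loaded_blocks.add(block)  # miss: the whole block is brought in
    return hits / len(word_addresses)


if __name__ == "__main__":
    one_pass = list(range(64))        # sequential walk over an array: spatial locality
    print(hit_ratio(one_pass, 4))     # 0.75: only the first word of each block misses
    two_passes = one_pass * 2         # the loop runs again: temporal locality
    print(hit_ratio(two_passes, 4))   # 0.875: the second pass hits every time
```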
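Cache memory operation
[Figure: organization of the cache memory and main memory.]
• Main memory is addressed using n bits, so it holds $2^n$ words (addresses 0 to $2^n - 1$); it is divided into N blocks of K words each (Block 1 … Block N), i.e. $N = 2^n / K$
• The cache memory has C rows (numbered 0 to C − 1); each row holds a flag identifying the stored block and one block of K words (block length)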
Cache memory operation (cont.)
• Processor ↔ cache memory: transfer of words
• Cache memory ↔ main memory: transfer of blocks
Reading from cache memory
• START – receive the address from the CPU
• Is the block containing this address in the cache memory?
– YES: transfer the word to the CPU
– NO: access main memory for the addressed block, assign the block to a cache memory row, transfer the block into the cache memory and transfer the word to the CPU
• DONE
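The read procedure above can be sketched as follows. The sketch deliberately leaves the mapping function open: loaded blocks are kept in a dictionary of at most C rows, and when every row is occupied the oldest loaded block is evicted (FIFO). That placement rule, the class name and the memory contents are assumptions made only for illustration.

```python
from collections import OrderedDict


class SimpleCache:
    """Hypothetical cache of `rows` rows, each holding one block of `block_size` words."""

    def __init__(self, main_memory, rows, block_size):
        self.memory = main_memory      # main memory as a list of words
        self.rows = rows               # C rows in the cache
        self.block_size = block_size   # K words per block
        self.lines = OrderedDict()     # block number -> list of K words

    def read(self, address):
        """Return (word, hit) following the read flowchart."""
        block, offset = divmod(address, self.block_size)
        if block in self.lines:                       # YES: block is in the cache
            return self.lines[block][offset], True
        # NO: access main memory for the addressed block
        start = block * self.block_size
        data = self.memory[start:start + self.block_size]
        # Assign the block to a cache row (evict the oldest block if all rows are used)
        if len(self.lines) >= self.rows:
            self.lines.popitem(last=False)
        self.lines[block] = data                      # transfer the block into the cache
        return data[offset], False                    # transfer the word to the CPU


if __name__ == "__main__":
    cache = SimpleCache(main_memory=list(range(100, 164)), rows=2, block_size=4)
    print(cache.read(5))   # (105, False): miss, block 1 is loaded
    print(cache.read(6))   # (106, True): hit in the same block
```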
Cache memory design details
• Size
• Mapping
• Replacement algorithm
• Write algorithm
• Row size
• Number of caches
Size of the cache memory – a compromise between:
• Minimization of the memory cost
• Maximization of the processor's speed
Processor | Type | Prod. year | L1 cache (instructions + data) | L2 cache | L3 cache
IBM 360 | Mainframe | 1968 | 16-32 KB (unified) | None | None
IBM 3033 | Mainframe | 1978 | 64 KB (unified) | None | None
80486 | PC | 1989 | 8 KB (unified) | None | None
Pentium | PC | 1993 | 8 KB + 8 KB | 256-512 KB | None
PowerPC G4 | PC/server | 1999 | 32 KB + 32 KB | 256 KB-1 MB | 2 MB
Pentium 4 | PC/server | 2000 | 8 KB + 8 KB | 256 KB | None
Itanium | PC/server | 2001 | 16 KB + 16 KB | 96 KB | 4 MB
Athlon XP | PC/server | 2001 | 64 KB + 64 KB | 512 KB | None
Athlon 64 | PC/server | 2003 | 64 KB + 64 KB | 1 MB | None
Mapping function
• The number of rows in the cache is smaller than the number of blocks in main memory, so a mapping function is needed to assign main memory blocks to cache rows
• Three methods exist:
– Direct
– Associative
– Set-associative
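Cache memory with direct mapping
[Figure: organization of a direct-mapped cache.]
• The memory address ($s + w$ bits) is divided into three fields: Flag ($s - r$ bits), Row ($r$ bits) and Word ($w$ bits)
• The Row field selects one cache row (L0 … Li …); the flag stored in that row is compared with the Flag field of the address
• Match: hit, the Word field selects the word (W0–W3) within the row; no match: miss, the block (B0 …) is fetched from main memory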
Direct mapping (cont.)
• i – number of the row in the cache memory
• j – number of the block in main memory
• m – number of rows in the cache memory
$i = j \bmod m$
• Address length: $s + w$ bits
• Number of addressable units: $2^{s+w}$ words
• Block size = row size: $2^w$ words
• Number of blocks in main memory: $2^s$
• Number of rows in the cache memory: $m = 2^r$
• Flag size: $s - r$ bits
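The field widths above translate directly into bit operations. A minimal sketch (function names are hypothetical) that splits an $(s + w)$-bit address into Flag, Row and Word and checks that the Row field equals $i = j \bmod m$, using the parameters of the 16 MB example in this lecture ($s = 22$, $r = 14$, $w = 2$):

```python
def split_direct(address: int, s: int, r: int, w: int):
    """Split an (s + w)-bit address into (flag, row, word) for direct mapping.

    flag: upper s - r bits, row: next r bits, word: lowest w bits.
    """
    if not 0 <= address < 2 ** (s + w):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)
    row = (address >> w) & ((1 << r) - 1)
    flag = address >> (w + r)
    return flag, row, word


def row_for_block(j: int, r: int) -> int:
    """i = j mod m, with m = 2**r rows."""
    return j % (2 ** r)


if __name__ == "__main__":
    addr = 0x010004
    flag, row, word = split_direct(addr, s=22, r=14, w=2)
    print(hex(flag), row, word)          # 0x1 1 0
    print(row_for_block(addr >> 2, 14))  # 1: block number j = address >> w
```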
Result of the direct mapping
Row in the cache memory | Assigned blocks in main memory
0 | 0, m, 2m, ..., $2^s - m$
1 | 1, m + 1, 2m + 1, ..., $2^s - m + 1$
... | ...
m − 1 | m − 1, 2m − 1, 3m − 1, ..., $2^s - 1$
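Each row of the table is an arithmetic sequence with step m. A minimal sketch (hypothetical function name) that lists the blocks assigned to row i, assuming $m = 2^r$ divides $2^s$ as in direct mapping:

```python
def blocks_for_row(i: int, s: int, m: int) -> list[int]:
    """Main memory blocks j (0 <= j < 2**s) that direct mapping assigns to row i.

    Since i = j mod m, these are i, i + m, i + 2m, ..., up to 2**s - m + i.
    """
    if not 0 <= i < m:
        raise ValueError("row number must satisfy 0 <= i < m")
    return list(range(i, 2 ** s, m))


if __name__ == "__main__":
    # Toy sizes (hypothetical): 2**4 = 16 blocks, m = 4 rows
    for row in range(4):
        print(row, blocks_for_row(row, s=4, m=4))
```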
Example of the direct mapping
• Cache memory with $2^{14}$ rows (4 B each) and main memory of 16 MB capacity (24-bit address: flag 8 b, row 14 b, word 2 b):
Row in the cache memory | Assigned main memory blocks (starting addresses)
0 | 000000, 010000, ..., FF0000
1 | 000004, 010004, ..., FF0004
... | ...
$2^{14} - 1$ | 00FFFC, 01FFFC, ..., FFFFFC
• Row width: 8 b flag, 32 b data
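The example parameters can be plugged into a small direct-mapped cache model. This is a sketch under simplifying assumptions: only the flags are tracked (no data), empty rows are marked with None, and the class name is hypothetical. The trace shows a conflict: blocks 000000 and 010000 map to the same row 0, so they keep evicting each other.

```python
class DirectMappedCache:
    """Sketch of a direct-mapped cache: 2**r rows, 2**w addressable units per row,
    (s + w)-bit addresses. Only flags are tracked, not data."""

    def __init__(self, s: int, r: int, w: int):
        self.s, self.r, self.w = s, r, w
        self.flags = [None] * (2 ** r)   # None marks an empty row

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block into its row."""
        if not 0 <= address < 2 ** (self.s + self.w):
            raise ValueError("address does not fit in s + w bits")
        row = (address >> self.w) & ((1 << self.r) - 1)
        flag = address >> (self.w + self.r)
        if self.flags[row] == flag:
            return True
        self.flags[row] = flag   # the new block replaces whatever occupied the row
        return False


if __name__ == "__main__":
    cache = DirectMappedCache(s=22, r=14, w=2)
    trace = [0x000000, 0x000002, 0x010000, 0x000000]
    print([cache.access(a) for a in trace])  # [False, True, False, False]
```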
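Cache memory with associative mapping
[Figure: organization of a fully associative cache.]
• The memory address ($s + w$ bits) is divided into two fields: Flag ($s$ bits) and Word ($w$ bits)
• The Flag field is compared with the flags of all cache rows (L0 … Li …); a match means a hit, otherwise it is a miss and the block is fetched from main memory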
Associative mapping (cont.)
• Address length: $s + w$ bits
• Number of addressable units: $2^{s+w}$ words
• Block size = row size: $2^w$ words
• Number of blocks in main memory: $2^s$
• Number of rows in the cache memory: any (not determined by the address format)
• Flag size: $s$ bits
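Example of the associative mapping
[Figure: 16 MB main memory and an associative cache; address = Flag (22 b) + Word (2 b); cache row = Flag (22 b) + Data (32 b).]
• Main memory locations shown: 000000, 000004, ..., 12357A, ..., FFFFF4, FFFFF8, FFFFFC, holding data such as 35281987, F235A72C and 3982FB1A
• Cache memory rows (flag: data): 000000: 35281987, 3FFFFF: 3982FB1A, 048D5E: F235A72C
• For example, address 12357A splits into flag 048D5E (upper 22 bits) and word 2 (lower 2 bits)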
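A fully associative lookup compares the whole Flag field with every stored flag. In hardware this happens in parallel; the sketch below models it with a dictionary keyed by flag, loaded with the three cache rows from the example. The names are hypothetical, and the sketch returns the whole 32-bit row together with the Word field rather than selecting a byte.

```python
FLAG_BITS, WORD_BITS = 22, 2

# Cache rows from the example: flag -> 32-bit data
example_cache = {
    0x000000: 0x35281987,
    0x3FFFFF: 0x3982FB1A,
    0x048D5E: 0xF235A72C,
}


def associative_lookup(cache: dict, address: int):
    """Return (hit, data, word) for a fully associative cache.

    Everything above the Word field is the flag; the hardware compares it with
    every row's flag at once, modeled here by a dictionary lookup.
    """
    if not 0 <= address < 2 ** (FLAG_BITS + WORD_BITS):
        raise ValueError("address does not fit in the address format")
    word = address & ((1 << WORD_BITS) - 1)
    flag = address >> WORD_BITS
    if flag in cache:
        return True, cache[flag], word
    return False, None, word


if __name__ == "__main__":
    hit, data, word = associative_lookup(example_cache, 0x12357A)
    print(hit, hex(data), word)   # True 0xf235a72c 2
```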
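Cache memory with set-associative mapping
[Figure: organization of a set-associative cache.]
• The memory address ($s + w$ bits) is divided into three fields: Flag ($s - d$ bits), Section ($d$ bits) and Word ($w$ bits)
• The Section field selects one section of rows (S0 … Si …); the Flag field is compared with the flags of all rows in that section; a match means a hit, otherwise it is a miss and the block is fetched from main memory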
Set-associative mapping (cont.)
• i – number of the section in the cache memory
• j – number of the block in main memory
• m – number of rows in the cache memory
$m = v \times k, \quad i = j \bmod v$
• Address length: $s + w$ bits
• Number of addressable units: $2^{s+w}$ words
• Block size = row size: $2^w$ words
• Number of blocks in main memory: $2^s$
Set-associative mapping (cont.)
• Number of rows in a section: $k$
• Number of sections: $v = 2^d$
• Number of rows in the cache memory: $m = k v = k \times 2^d$
• Flag size: $s - d$ bits
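The address split for set-associative mapping differs from direct mapping only in that the middle field selects a section instead of a single row. A minimal sketch with hypothetical function names, using the 9 b + 13 b + 2 b format of the example in this lecture ($s = 22$, $d = 13$, $w = 2$):

```python
def split_set_associative(address: int, s: int, d: int, w: int):
    """Split an (s + w)-bit address into (flag, section, word).

    flag: upper s - d bits, section: next d bits (one of v = 2**d sections),
    word: lowest w bits.
    """
    if not 0 <= address < 2 ** (s + w):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)
    section = (address >> w) & ((1 << d) - 1)
    flag = address >> (w + d)
    return flag, section, word


def section_for_block(j: int, d: int) -> int:
    """i = j mod v, with v = 2**d sections."""
    return j % (2 ** d)


if __name__ == "__main__":
    flag, section, word = split_set_associative(0x0D0004, s=22, d=13, w=2)
    print(hex(flag), section, word)   # 0x1a 1 0
```

The k rows of the selected section are then searched associatively for the flag, so a block can sit in any of those k rows.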
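Example of the set-associative mapping
[Figure: 16 MB main memory and a set-associative cache; address = Flag (9 b) + Section (13 b) + Word (2 b); cache row = Flag (9 b) + Data (32 b).]
• Main memory is shown as regions with flags 000, 01A, ..., 1FF; within each region the section and word fields give offsets 0000, 0004, ..., 7FFC (data shown: 35281987, F235A72C, 67321342, 3982FB1A)
• Cache memory sections are labeled 0000, 0004, ..., 7FFC; rows shown (flag: data): 000: 35281987, 01A: 67321342, 01A: F235A72C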
Cache memory replacement algorithms
• Least recently used (LRU)
• First in - first out (FIFO)
• Least frequently used (LFU)
• Random choice
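Two of the listed policies, LRU and FIFO, can be compared on a single cache section of k rows (LFU and random choice are not sketched here). The function name and the block trace are hypothetical; an `OrderedDict` keeps the rows in eviction order, and LRU differs from FIFO only in that a hit moves the block to the back of that order.

```python
from collections import OrderedDict


def count_hits(trace, k, policy):
    """Count hits for one cache section with k rows.

    trace: sequence of block numbers that map to this section.
    policy: "LRU" (evict least recently used) or "FIFO" (evict oldest loaded).
    """
    if policy not in ("LRU", "FIFO"):
        raise ValueError("policy must be 'LRU' or 'FIFO'")
    rows = OrderedDict()  # block -> None; the front entry is the next victim
    hits = 0
    for block in trace:
        if block in rows:
            hits += 1
            if policy == "LRU":
                rows.move_to_end(block)  # a hit makes the block most recently used
            continue
        if len(rows) == k:
            rows.popitem(last=False)     # evict the block at the front
        rows[block] = None
    return hits


if __name__ == "__main__":
    trace = [0, 1, 0, 2, 0, 3]
    print(count_hits(trace, 2, "LRU"), count_hits(trace, 2, "FIFO"))  # 2 1
```

On this trace LRU keeps the frequently reused block 0, while FIFO evicts it because it was loaded first.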
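Cache memory write algorithms
• Write through – every write updates both the cache memory and main memory
• Write back – a write updates only the cache memory; a modified row is written to main memory when it is replaced
• Ensuring consistency (multiprocessor systems with caches):
– Bus watching with write through
– Hardware transparency
– Memory not mapped by the cache memory (non-cacheable memory)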
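The cost difference between the two write policies shows up in the number of main memory writes. The sketch below is a toy model under stated assumptions: a direct-mapped cache with one word per row, a write miss loads the block in both policies (write allocate), reads are not modeled, and the class name is hypothetical.

```python
class WriteCountingCache:
    """Direct-mapped cache with one word per row that only counts main memory writes.

    Assumptions: a write miss loads the block (write allocate); reads are not modeled.
    """

    def __init__(self, rows: int, write_back: bool):
        self.rows = rows
        self.write_back = write_back
        self.flags = [None] * rows      # block stored in each row (None = empty)
        self.dirty = [False] * rows     # modified since loaded? (write back only)
        self.memory_writes = 0

    def write(self, block: int) -> None:
        row = block % self.rows
        if self.flags[row] != block:    # miss: replace the row's block
            if self.write_back and self.dirty[row]:
                self.memory_writes += 1  # copy the modified block back first
            self.flags[row] = block
            self.dirty[row] = False
        if self.write_back:
            self.dirty[row] = True       # main memory is updated later
        else:
            self.memory_writes += 1      # write through: main memory updated now

    def flush(self) -> None:
        """Write back every modified row (no effect for write through)."""
        for row in range(self.rows):
            if self.dirty[row]:
                self.memory_writes += 1
                self.dirty[row] = False


if __name__ == "__main__":
    for wb in (False, True):
        cache = WriteCountingCache(rows=4, write_back=wb)
        for b in [0] * 10 + [4]:         # ten writes to block 0, then a conflicting block
            cache.write(b)
        cache.flush()
        print("write back" if wb else "write through", cache.memory_writes)
```

Write back saves memory traffic (2 writes instead of 11 here) but leaves main memory temporarily stale, which is why multiprocessor systems need the consistency mechanisms listed above.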
Other problems
• Row size and block size
• Number of cache memories
– The higher-level cache memory is integrated on the same chip as the processor and works at the same clock frequency
– The lower-level cache memory works at the bus frequency (it is on the mainboard)
Pentium 4 cache memory
Pentium 4 processor core
• Instruction fetch/decode unit
– Fetches instructions from the L2 cache memory
– Decodes them into microoperations
– Transfers the microoperations to the L1 cache memory
• Out-of-order execution unit
– Queues microoperations
• Execution units
– Execute microoperations
– Fetch data from the L1 cache memory
– Write results into the registers
• Memory subsystem
– Communicates with the system bus and the L2 cache memory
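PowerPC cache memory
Processor | L1 cache size | Bytes per row | Organization
PowerPC 601 | 1 × 32 KB (unified) | 32 | 8-way set-associative
PowerPC 603 | 2 × 8 KB (instructions + data) | 32 | 2-way set-associative
PowerPC 604 | 2 × 16 KB (instructions + data) | 32 | 4-way set-associative
PowerPC 620 | 2 × 32 KB (instructions + data) | 64 | 8-way set-associative
PowerPC G3 | 2 × 32 KB (instructions + data) | 64 | 8-way set-associative
PowerPC G4 | 2 × 32 KB (instructions + data) | 32 | 8-way set-associative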
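The table is enough to derive the geometry of each cache with the set-associative relations $m = k v$ and $v = 2^d$. The sketch assumes byte addressing, so $w$ counts the bytes within a row; the table variable and function names are hypothetical, and only the per-cache size is used (for "2 ×" entries each of the two caches has that geometry).

```python
# (processor, number of L1 caches, size of each in KB, bytes per row, ways k)
POWERPC_L1 = [
    ("PowerPC 601", 1, 32, 32, 8),
    ("PowerPC 603", 2, 8, 32, 2),
    ("PowerPC 604", 2, 16, 32, 4),
    ("PowerPC 620", 2, 32, 64, 8),
    ("PowerPC G3", 2, 32, 64, 8),
    ("PowerPC G4", 2, 32, 32, 8),
]


def cache_geometry(size_kb: int, row_bytes: int, ways: int):
    """Return (m rows, v sections, w word bits, d section bits) of one k-way cache.

    Assumes power-of-two sizes, so bit_length() - 1 equals log2.
    """
    rows = size_kb * 1024 // row_bytes     # m
    sections = rows // ways                # v = m / k
    word_bits = row_bytes.bit_length() - 1 # w: byte within the row
    section_bits = sections.bit_length() - 1  # d
    return rows, sections, word_bits, section_bits


if __name__ == "__main__":
    for name, count, size_kb, row_bytes, ways in POWERPC_L1:
        m, v, w, d = cache_geometry(size_kb, row_bytes, ways)
        print(f"{name}: {count} x {size_kb} KB, m={m}, v={v}, w={w}, d={d}")
```

For example, each 32 KB, 8-way cache with 64-byte rows (PowerPC 620, G3) has 512 rows grouped into 64 sections, so the Section field is 6 bits wide.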
PowerPC cache memory (cont.)