Computer Architecture

Lecture 8: Memory hierarchy. Cache memory

Piotr Bilski

Characteristics of the memory systems

• Location

• Capacity

• Transfer unit

• Access mode

• Performance

• Physical structure

• Physical characteristics

• Organization

Memory location

• Processor (registers, L1cache memory)

• Internal (main) memory (RAM)

• External memory (auxiliary, e.g. disk drives)

Memory capacity

• Word size

• Number of words

• Memory capacity is expressed in bytes and their multiples, so:

1 B = 8 b

1 KB = 1024 B, 1 MB = 1024 KB etc.

Transfer unit

• Number of data lines connected to the memory module (normally equal to the word length), but:
  – The word is the basic unit of memory organization
  – The addressable unit (byte or word) is the unit used for direct memory addressing
  – The transfer unit can be equal to the word or to the addressable unit

Memory access modes

• Sequential access (e.g. tape memory)

• Direct access (disk memory)

• Random access (main memory)

• Associative access (cache memory)

Memory performance

• Access time – time between placing the address on the address bus and the data becoming available on the data bus

• Cycle time – access time plus the additional time required before the next access can start

• Transfer speed – for RAM: 1 / cycle time
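
The relation between the three performance figures can be sketched in a few lines of Python. The function names and the sample timings (60 ns access, 40 ns gap before the next access) are illustrative assumptions, not values from the lecture.

```python
def cycle_time_ns(access_time_ns, recovery_time_ns):
    """Cycle time = access time + the gap required before the next access."""
    return access_time_ns + recovery_time_ns


def transfer_rate_per_s(cycle_ns):
    """For RAM the transfer speed is 1 / cycle time (transfer units per second)."""
    return 1e9 / cycle_ns


# Hypothetical timings, chosen only for illustration.
cycle = cycle_time_ns(60, 40)
rate = transfer_rate_per_s(cycle)
print(f"cycle time: {cycle} ns, transfer speed: {rate:.0f} units/s")
```

Multiplying the result by the size of the transfer unit gives the rate in bits or bytes per second.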

Physical memory structure

• Semiconductor (RAM, ROM)

• Magnetic (hard disks, floppy disks, streamers)

• Optical (CD-ROM, DVD-ROM)

• Magnetooptical (WORM)

Physical characteristics

• Volatility
  – Volatile memory (RAM)
  – Non-volatile memory (ROM)

• Content modification
  – Erasable (e.g. RAM, EPROM)
  – Non-erasable (ROM)

Memory organization

• One level („flat”)

• Multilevel (e.g. cache)

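[Figure: access time as a function of the hit ratio (0 to 1) in a two-level memory; the access time axis is marked with T1, T2 and T1 + T2, and the access time falls from T1 + T2 at hit ratio 0 to T1 at hit ratio 1]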
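
A minimal sketch of the relation behind this plot, assuming the usual two-level model: a hit costs T1 (the faster level), while a miss costs T1 + T2 because the slower level is accessed after the failed lookup. The function name and the sample timings are assumptions made for illustration.

```python
def average_access_time(hit_ratio, t1, t2):
    """Average access time of a two-level memory.

    hit_ratio: fraction of accesses served by the faster level (0..1)
    t1:        access time of the faster level
    t2:        access time of the slower level
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must lie between 0 and 1")
    # Hits cost t1, misses cost t1 + t2.
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# Hypothetical timings in microseconds.
for h in (0.0, 0.5, 0.95, 1.0):
    print(h, average_access_time(h, t1=0.01, t2=0.1))
```

The end points agree with the axis marks: T1 + T2 when nothing is found in the faster level and T1 when everything is.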

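Memory hierarchy

• Processor registers

• Cache memory

• Main (operational) memory

• External memory

Moving down the hierarchy, capacity and access time increase, while speed and cost per bit decrease.

Trade-offs between the parameters:

• access time vs. cost per bit

• capacity vs. cost per bit

• capacity vs. access time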

Why do we need cache memory?

• Locality of reference – a running program tends to use instructions and data that lie next to each other in memory and are processed one after another

• Temporal locality

• Spatial locality

Cache memory work regime

[Figure: cache memory with rows numbered 0, 1, 2, ..., C-1, each holding a flag and a block of K words; main memory with word addresses 0, 1, 2, 3, ..., 2^n - 1, divided into blocks 1 ... N of K words each]

Main memory is addressed using n bits (2^n words in total)

Cache memory has C rows

Cache memory work regime (cont.)

[Figure: Processor <-> Cache memory: transfer of words; Cache memory <-> Main memory: transfer of blocks]

Reading from cache memory

• START: acquire the address from the CPU

• Is the block containing this address in the cache memory?
  – YES: transfer the word to the CPU
  – NO: access main memory for the addressed block, assign the block to a cache memory row, transfer the block into the cache memory and transfer the word to the CPU

• END
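
The read procedure can be sketched as a small simulation. The class below is an illustration only: it assumes direct mapping for the "assignment of the block to a row" step, stores the whole block number as the flag for simplicity, and uses a Python list as a stand-in for main memory. All names are hypothetical.

```python
class SimpleCache:
    """Toy model of the read flow: hit -> word to CPU, miss -> fetch block first."""

    def __init__(self, memory, rows, block_words):
        self.memory = memory            # main memory, a list of words
        self.rows = rows                # C: number of cache rows
        self.k = block_words            # K: words per block
        self.flags = [None] * rows      # which block each row currently holds
        self.data = [None] * rows       # the cached blocks themselves
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // self.k       # block containing the addressed word
        row = block % self.rows         # row assignment (direct mapping assumed)
        if self.flags[row] == block:    # is the block in the cache memory?
            self.hits += 1
        else:
            self.misses += 1
            start = block * self.k      # access main memory for the whole block
            self.data[row] = self.memory[start:start + self.k]
            self.flags[row] = block
        return self.data[row][address % self.k]   # transfer the word to the CPU


memory = list(range(100, 164))          # 64 words holding the values 100..163
cache = SimpleCache(memory, rows=4, block_words=4)
values = [cache.read(a) for a in (5, 6, 21, 5)]
print(values, cache.hits, cache.misses)
```

In this run only the second read hits: addresses 5 and 6 share a block, while address 21 belongs to a block that maps to the same row and displaces it.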

Details of the cache memory

• Size

• Mapping

• Replacement algorithm

• Writing algorithm

• Row size

• Number of the cache memories

Size of the cache memory

• Minimization of the memory cost

• Maximization of the processor’s speed

Processor    Type       Prod. year  L1 cache (instruction / data)  L2 memory     L3 memory
IBM 360      Mainframe  1968        16-32 KB                       None          None
IBM 3033     Mainframe  1978        64 KB                          None          None
80486        PC         1989        8 KB                           None          None
Pentium      PC         1993        8 KB / 8 KB                    256/512 KB    None
PowerPC G4   PC/serv.   1999        32 KB / 32 KB                  256 KB/1 MB   2 MB
Pentium 4    PC/serv.   2000        8 KB / 8 KB                    256 KB        None
Itanium      PC/serv.   2001        16 KB / 16 KB                  96 KB         4 MB
Athlon XP    PC/serv.   1999        64 KB / 64 KB                  512 KB        None
Athlon 64    PC/serv.   2002        64 KB / 64 KB                  1 MB          None

Mapping function

• The number of the rows in the cache is smaller than the number of the blocks in the main memory

• Three methods exist:
  – Direct
  – Associative
  – Set-associative

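Cache memory with direct mapping

[Figure: direct mapping. The (s+w)-bit memory address is divided into flag (s-r bits), row (r bits) and word (w bits) fields. The row field selects a cache row (L0 ... Li); the flag stored in that row is compared with the flag field of the address. A match is a hit and the word is read from the cache; otherwise it is a miss and the block (B0 ..., words W0 ... W3) is read from main memory.]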

Direct mapping (cont.)

• i – number of the row in the cache memory
• j – number of the block in the main memory
• m – number of rows in the cache memory

i = j mod m

Address length: (s+w) bits
Number of addressable units: 2^(s+w) words
Block size = row size: 2^w words
Number of blocks in the main memory: 2^s
Number of rows in the cache memory: m = 2^r

Result of the direct mapping

Row in the cache memory   Assigned blocks in the main memory
0                         0, m, 2m, ..., 2^s - m
1                         1, m+1, 2m+1, ..., 2^s - m + 1
...                       ...
m-1                       m-1, 2m-1, 3m-1, ..., 2^s - 1
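
The table follows directly from i = j mod m. A short sketch that enumerates the blocks assigned to one row; the function name and the tiny sizes (s = 4, m = 4) are chosen only to keep the output readable.

```python
def blocks_for_row(i, m, s):
    """Main memory blocks j (0 <= j < 2**s) that satisfy j mod m == i."""
    return list(range(i, 2 ** s, m))


m, s = 4, 4                      # 4 cache rows, 16 main memory blocks
for row in range(m):
    print(row, blocks_for_row(row, m, s))
```

With these sizes row 0 ends at block 2^s - m = 12 and row m-1 ends at block 2^s - 1 = 15, as in the table.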

Example of the direct mapping

• For a cache memory with 2^14 rows (4 B each) and a main memory of 16 MB capacity:

Row in the cache memory   Assigned main memory blocks (hexadecimal addresses)
0                         000000, 010000, ..., FF0000
1                         000004, 010004, ..., FF0004
...                       ...
2^14 - 1                  00FFFC, 01FFFC, ..., FFFFFC

Row width: 8 b flag, 32 b data
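
For this example the 24-bit address (16 MB = 2^24 B) splits into an 8-bit flag, a 14-bit row number and a 2-bit word field. The sketch below performs that split with shifts and masks; the constant and function names are illustrative assumptions.

```python
W_BITS = 2     # 4 B per row -> 2 bits select the byte within the row
R_BITS = 14    # 2**14 rows  -> 14 bits select the row
# The remaining 24 - 14 - 2 = 8 bits form the flag.


def split_direct(address):
    """Split a 24-bit address into (flag, row, word) for direct mapping."""
    word = address & ((1 << W_BITS) - 1)
    row = (address >> W_BITS) & ((1 << R_BITS) - 1)
    flag = address >> (W_BITS + R_BITS)
    return flag, row, word


for addr in (0x000004, 0x010004, 0xFF0004, 0xFFFFFC):
    flag, row, word = split_direct(addr)
    print(f"{addr:06X}: flag={flag:02X} row={row} word={word}")
```

Addresses 000004, 010004 and FF0004 all land in row 1 and differ only in the flag, which is why the flag must be stored and compared on every access.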

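Cache memory with associative mapping

[Figure: associative mapping. The (s+w)-bit memory address is divided into flag (s bits) and word (w bits) fields. The flag field is compared with the flags of all cache rows (L0 ... Li). A match is a hit and the word is read from the cache; otherwise it is a miss and the block (B0 ..., words W0 ... W3) is read from main memory.]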

Associative mapping (cont.)

Address length: (s+w) bits
Number of addressable units: 2^(s+w) words
Block size = row size: 2^w words
Number of blocks in the main memory: 2^s
Number of rows in the cache memory: any (not determined by the address format)
Flag size: s bits

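Example of the associative mapping

[Figure: the address is split into flag (22 b) and word (2 b); each cache row holds a 22 b flag and 32 b of data]

Main memory addresses shown: 000000, 000004, ..., 12357A, ..., FFFFF4, FFFFF8, FFFFFC

Cache memory contents shown:

Flag (22 b)   Data (32 b)
000000        35281987
3FFFFF        3982FB1A
048D5E        F235A72C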
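
With associative mapping the flag is simply the address without its 2-bit word field, and a block may sit in any row, so a lookup has to compare the flag against every stored flag. A minimal sketch follows; the dictionary stands in for the parallel comparison done in hardware, and all names are illustrative.

```python
W_BITS = 2   # word field; the remaining 22 bits of the 24-bit address are the flag


def split_associative(address):
    """Split a 24-bit address into (flag, word)."""
    return address >> W_BITS, address & ((1 << W_BITS) - 1)


# Cache contents keyed by flag, using the flag/data values shown in the example.
cache = {0x000000: 0x35281987, 0x3FFFFF: 0x3982FB1A, 0x048D5E: 0xF235A72C}


def lookup(address):
    """Return the cached data for the address, or None on a miss."""
    flag, _word = split_associative(address)
    return cache.get(flag)


for addr in (0x000000, 0x12357A, 0xFFFFFC, 0x000004):
    data = lookup(addr)
    print(f"{addr:06X}:", "miss" if data is None else f"hit {data:08X}")
```

Address 12357A yields the flag 048D5E and FFFFFC yields 3FFFFF, matching the flags listed in the example; address 000004 has flag 000001 and misses.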

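Cache memory with set-associative mapping

[Figure: set-associative mapping. The (s+w)-bit memory address is divided into flag (s-d bits), section (d bits) and word (w bits) fields. The section field selects a section (S0 ... Si); the flag field is compared with the flags of the rows in that section. A match is a hit and the word is read from the cache; otherwise it is a miss and the block (words W0 ... W3) is read from main memory.]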

Set-associative mapping (cont.)

• i – number of the section in the cache memory
• j – number of the block in the main memory
• m – number of rows in the cache memory
• v – number of sections; k – number of rows in a section

m = v × k
i = j mod v

Address length: (s+w) bits
Number of addressable units: 2^(s+w) words
Block size = row size: 2^w words
Number of blocks in the main memory: 2^s

Set-associative mapping (cont.)

Number of rows in a section: k

Number of sections: v = 2^d

Number of rows in the cache memory: kv = k × 2^d

Flag size: (s-d) bits
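
These parameters can be checked with a few lines of code. The sketch assumes k = 2 rows per section and d = 13 section bits purely as sample values; the function names are hypothetical.

```python
def cache_rows(k, d):
    """Total number of rows: k rows in each of the 2**d sections."""
    return k * 2 ** d


def section_of_block(j, d):
    """Section that main memory block j maps to: j mod (number of sections)."""
    return j % (2 ** d)


def flag_bits(s, d):
    """Flag size in bits: the block number (s bits) minus the section bits."""
    return s - d


k, d, s = 2, 13, 22
print(cache_rows(k, d), section_of_block(8197, d), flag_bits(s, d))
```

Setting d = 0 gives a single section, which is the fully associative case, while k = 1 reduces to direct mapping with one row per section.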

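Example of the set-associative mapping

[Figure: the address is split into flag (9 b), section (13 b) and word (2 b); each cache row holds a 9 b flag and 32 b of data, and the figure shows two rows per section. Section-plus-word offsets shown: 0000, 0004, ..., 7FFC. Main memory flags shown: 000, 01A, 1FF. Data words shown: 35281987, F235A72C, 67321342, 3982FB1A. One cache row is shown with flag 01A and data F235A72C.]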
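
The field widths of this example (9 b flag, 13 b section, 2 b word) add up to a 24-bit address. A sketch of the split follows; the names are illustrative, and the sample addresses are built from the flag values and offsets that appear in the example rather than taken from it directly.

```python
W_BITS = 2     # word field
D_BITS = 13    # section field: 2**13 sections
# The remaining 24 - 13 - 2 = 9 bits form the flag.


def split_set_associative(address):
    """Split a 24-bit address into (flag, section, word)."""
    word = address & ((1 << W_BITS) - 1)
    section = (address >> W_BITS) & ((1 << D_BITS) - 1)
    flag = address >> (W_BITS + D_BITS)
    return flag, section, word


def make_address(flag, offset):
    """Combine a 9-bit flag with a 15-bit section-plus-word offset."""
    return (flag << (W_BITS + D_BITS)) | offset


for flag, offset in ((0x000, 0x0000), (0x01A, 0x0004), (0x1FF, 0x7FFC)):
    addr = make_address(flag, offset)
    print(f"{addr:06X}:", split_set_associative(addr))
```

Blocks with different flags but the same offset compete only for the rows of one section, not for a single row as in direct mapping.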

Algorithms of the cache memory content replacement

• Least recently used (LRU)

• First in - first out (FIFO)

• Least frequently used (LFU)

• Random choice
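
The difference between LRU and FIFO shows up as soon as a block is reused. The sketch below models a single section with k rows and is only an illustration: real caches implement these policies in hardware, and the class and function names are assumptions.

```python
from collections import OrderedDict


class CacheSection:
    """One section with k rows, replaced according to LRU or FIFO."""

    def __init__(self, k, policy="LRU"):
        if policy not in ("LRU", "FIFO"):
            raise ValueError("policy must be 'LRU' or 'FIFO'")
        self.k = k
        self.policy = policy
        self.rows = OrderedDict()          # flags, oldest entry first

    def access(self, flag):
        """Return True on a hit, False on a miss (loading the block)."""
        if flag in self.rows:
            if self.policy == "LRU":
                self.rows.move_to_end(flag)    # mark as most recently used
            return True
        if len(self.rows) >= self.k:
            self.rows.popitem(last=False)      # evict the oldest entry
        self.rows[flag] = None
        return False


def hit_pattern(policy, references, k=2):
    section = CacheSection(k, policy)
    return [section.access(flag) for flag in references]


references = [1, 2, 1, 3, 1]
print("LRU ", hit_pattern("LRU", references))
print("FIFO", hit_pattern("FIFO", references))
```

For this reference string LRU keeps block 1 because it was used recently, so the last access hits; FIFO evicts block 1 because it was loaded first, so the last access misses.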

Algorithms of writing into the cache memory

• write through

• write back

• Ensuring consistency (multiprocessor systems with caches)
  – Bus watching with write through
  – Hardware transparency
  – Non-cacheable memory (memory not mapped by the cache memory)
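
The two write policies differ in how often main memory is updated. The sketch below counts main memory writes for a cache with a single row. It assumes that a write miss first loads the block into the cache (write-allocate) and that a modified block is written back when it is replaced or at the end of the run; the function name and the operation format are hypothetical.

```python
def memory_writes(operations, policy):
    """Count writes to main memory for a one-row cache.

    operations: list of ("r", block) or ("w", block) tuples
    policy:     "write-through" or "write-back"
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    cached, dirty, writes = None, False, 0
    for kind, block in operations:
        if block != cached:                     # miss: the row is replaced
            if policy == "write-back" and dirty:
                writes += 1                     # write the modified block back
            cached, dirty = block, False
        if kind == "w":
            if policy == "write-through":
                writes += 1                     # every write also goes to memory
            else:
                dirty = True                    # only mark the row as modified
    if policy == "write-back" and dirty:
        writes += 1                             # final write-back of the last block
    return writes


ops = [("w", 0), ("w", 0), ("w", 0), ("r", 1), ("w", 1)]
print(memory_writes(ops, "write-through"), memory_writes(ops, "write-back"))
```

Write back reduces memory traffic when the same block is written repeatedly, but main memory is out of date until the write-back happens, which is what the consistency mechanisms listed above have to deal with.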

Other problems

• Row size and block size

• Number of the cache memories
  – The higher-level cache memory is integrated on the same chip as the processor and works at the same frequency
  – The lower-level cache memory works at the bus frequency (it is located on the mainboard)

Pentium 4 cache memory

Pentium 4 processor core

• Instruction fetching/decoding unit
  – Fetches instructions from the L2 cache memory
  – Decodes them into micro-operations
  – Transfers micro-operations to the L1 cache memory

• Out-of-order instruction execution unit
  – Queues micro-operations

• Execution units
  – Execute micro-operations
  – Fetch data from the L1 cache
  – Write results into the registers

• Memory subsystem
  – Communicates with the system bus and the L2 cache memory

PowerPC cache memory

Processor     Size        Bytes per row   Organization
PowerPC 601   1 x 32 KB   32              8-way
PowerPC G3    2 x 32 KB   64              8-way
PowerPC G4    2 x 32 KB   32              8-way

PowerPC cache memory (cont.)

Date post:	03-Jan-2016