Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander
Transcript
Page 1: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Shared Memory Multiprocessors

A. Jantsch / Z. Lu / I. Sander

Page 2: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Outline

Shared memory architectures
  Centralized memory
  Distributed memory

Shared memory programming
  Critical section
  Mutex and semaphore

Caches
  Write-through / write-back caches
  The cache coherency problem

Page 3: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Shared Memory Architectures

Page 4: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Shared Memory Architectures

Shared memory multiprocessors are often used.

Symmetric Multiprocessors (SMP)
  Symmetric access to all of main memory from any processor
  Also called UMA (uniform memory access)

Distributed Shared Memory (DSM)
  Access time depends on the location of the data word in memory
  Also called NUMA (non-uniform memory access)

Page 5: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Shared Memory Architectures

A shared memory programming model has a direct representation in hardware.

Caches
  Increase performance
  Demand cache coherence and memory consistency protocols

Page 6: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Shared Cache Architecture

Several processors are connected via a switch to a shared cache in front of main memory.

Has been used for a very small number of processors.

Is difficult to use for a large number of processors, since the shared cache must deliver an extremely high bandwidth.

[Figure: processors P1 ... Pm connected through a switch to a shared cache and main memory]

Page 7: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Bus-based Shared Memory

The interconnect is a shared bus between the processors' local caches and the memory.

Has been used for up to 20 to 30 processors.

Scaling is limited due to the bandwidth limitations of the shared bus.

[Figure: processors P1 ... Pm, each with a private cache, connected by a shared bus to main memory]

Page 8: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Dancehall Architecture

A scalable point-to-point network is placed between the caches and the memory modules that together form the main memory.

Due to the size of the interconnection network, the memory can be very far from the processors.

[Figure: processors P1 ... Pm with private caches on one side of an interconnection network, memory modules on the other]

Page 9: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Distributed Memory

Not a symmetric approach: the local memory is much closer to a processor than the rest of the global memory.

The structure scales very well.

It is important for the design to use the local memory efficiently.

[Figure: processors P1 ... Pm, each with a private cache and a local memory module, connected by an interconnection network]

Page 10: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

ARM’s Cortex-A9

[Figure: ARM Cortex-A9 MPCore block diagram: four Cortex-A9 CPUs, each with FPU/NEON, instruction cache, data cache and PTM interface; Snoop Control Unit (SCU) with cache-to-cache transfers and snoop filtering; Accelerator Coherence Port (AXI-3 slave); generalized interrupt control and distribution; timers; advanced bus interface unit; L2 cache controller (PL310) with a primary 64-bit AMBA 3 interface and an optional second interface with address filtering; two ACP AXI-3 masters; 64-bit non-cached AXI-3 masters for H/W accelerators]

Three AXI buses on the Cortex-A9
The ACP port provides coherent access to the L1 and L2 caches through the Snoop Control Unit
Two AXI masters off the L2 cache provide 8 GB/s at 500 MHz access to the main SoC bus

Page 11: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Tilera Gx Architecture

4x4, 6x6, 8x8, 10x10 chips
3 instructions per cycle per core
32 MB on-chip cache
750 GOPS (32-bit operations)
200 Tbps on-chip interconnect bandwidth
500 Gbps memory bandwidth
~1 GHz operating frequency
10 W to 55 W power consumption

5 mesh networks: 32-bit; dimension-order routing; 1-cycle traversal
  QDN: Request Dynamic Network, 64-bit, for memory requests
  RDN: Response Dynamic Network, 112-bit, for memory responses
  TDN: Tile Dynamic Network, 128-bit, for cache-to-cache communication
  UDN: User Dynamic Network, 32-bit, under SW control
  IDN: I/O Dynamic Network, 32-bit, for non-memory I/O

Page 12: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Shared Memory Programming

Page 13: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Process and history

A process executes a sequence of statements. Each statement consists of one or more atomic (indivisible) actions which transform one state into another (state transition).

The process state is formed by the values of its variables at a point in time.

The process history is a trace of one execution: a sequence of atomic operations.

Example (P1): S0 --A1--> S1 --A2--> S2 --> ... --Am--> Sm

Page 14: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Atomic Operations

Indivisible sequence of state transitions

Fine-grained atomic operations
  Machine instructions (read, write, test-and-set, read-modify-write, swap, etc.)
  Atomicity is guaranteed by hardware

Coarse-grained atomic actions
  A sequence of fine-grained atomic actions
  Should not be interrupted
  Internal state transitions are not visible "outside"
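To make this concrete, here is a minimal C11 sketch (not from the slides; the counter names are illustrative) contrasting a fine-grained atomic read-modify-write instruction with a plain, non-atomic update:

#include <stdatomic.h>
#include <stdio.h>

atomic_int atomic_counter = 0;   /* updated by one indivisible read-modify-write */
int plain_counter = 0;           /* updated by a separate load, add and store    */

void increment_both(void)
{
    /* Fine-grained atomic action: the hardware guarantees indivisibility. */
    atomic_fetch_add(&atomic_counter, 1);

    /* Not atomic: another thread could interleave between the load and the
       store, so concurrent updates can be lost. */
    plain_counter = plain_counter + 1;
}

int main(void)
{
    increment_both();
    printf("%d %d\n", atomic_load(&atomic_counter), plain_counter);
    return 0;
}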

Page 15: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Concurrent execution

The concurrent execution of multiple processes can be viewed as the interleaving of their sequences of atomic actions. A history is a trace of ONE execution, i.e., an interleaving of the atomic actions of the processes.

Example
  Individual histories
    P1: s0 → s1
    P2: p0 → p1
  Interleaved execution histories
    Trace 1: s0 → p0 → s1 → p1
    Trace 2: s0 → s1 → p0 → p1

Page 16: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

How many traces?

A concurrent program of n processes, each with m atomic actions, can produce N = (n*m)! / (m!)^n different histories!

Example
  2 processes, each with 2 actions: n=2, m=2, N=6
  3 processes, each with 2 actions: n=3, m=2, N=90
  3 processes, each with 3 actions: n=3, m=3, N=1680
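As a quick sanity check of the formula (not part of the slides), a small C program that evaluates N = (n*m)! / (m!)^n for the three example cases:

#include <stdio.h>

/* Number of histories of n processes with m atomic actions each:
   N = (n*m)! / (m!)^n  (double precision is exact enough for small n, m). */
static double factorial(int k)
{
    double f = 1.0;
    for (int i = 2; i <= k; i++)
        f *= i;
    return f;
}

static double num_histories(int n, int m)
{
    double denom = 1.0;
    for (int i = 0; i < n; i++)
        denom *= factorial(m);
    return factorial(n * m) / denom;
}

int main(void)
{
    printf("n=2, m=2: N=%.0f\n", num_histories(2, 2));   /* 6    */
    printf("n=3, m=2: N=%.0f\n", num_histories(3, 2));   /* 90   */
    printf("n=3, m=3: N=%.0f\n", num_histories(3, 3));   /* 1680 */
    return 0;
}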

Implication
  This makes it impossible to show the correctness of a program by testing (running the program and seeing what happens).
  Design a "correct" program in the first place. For shared-variable programming, the problems are concerned with accessing shared variables. Therefore a key issue is process synchronization.

Page 17: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Concurrent Execution Example

Possible Results: 0, 1, 2, 3, Undefined

Task A
  x:=0; y:=0; Print(x+y);

Task B
  x:=x+1; y:=y+2;

Page 18: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Concurrent Execution Example

Task A: x:=0; y:=0; Print(x+y);
Task B: x:=x+1; y:=y+2;

Results: 0
  One possible interleaving: A: x:=0; A: y:=0; A: Print(x+y) prints 0; B: x:=x+1; B: y:=y+2;

Results: 3
  One possible interleaving: A: x:=0; A: y:=0; B: x:=x+1; B: y:=y+2; A: Print(x+y) prints 3;

Results: 1
  One possible interleaving: A: x:=0; B: x:=x+1; B: y:=y+2; A: y:=0; A: Print(x+y) prints 1;

Page 19: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Concurrent Execution Example

Results: 2
  Task A: x:=0; y:=0; Print(x+y);
  Task B: x:=x+1; y:=y+2;
  One possible interleaving: B: x:=x+1; A: x:=0; A: y:=0; B: y:=y+2; A: Print(x+y) prints 2;

Results: Undefined
  Task A: x:=0; y:=0; Print(x+y);
  Task B (at machine level): R1:=x; x:=R1+1; R2:=y; y:=R2+2;
  The increment of x is not atomic: if, for example, Task B loads x into R1, Task A then executes x:=0, and Task B afterwards stores x:=R1+1, the printed value depends on the old (undefined) contents of x.
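To reproduce the effect, here is a minimal C sketch of the two tasks as POSIX threads (not from the slides; names are illustrative). The accesses are deliberately unsynchronized, i.e., a data race, so the printed value depends on the interleaving, matching the list of possible results above:

#include <pthread.h>
#include <stdio.h>

static int x, y;   /* shared variables, as in the slide example */

static void *task_a(void *arg)
{
    (void)arg;
    x = 0;
    y = 0;
    printf("%d\n", x + y);   /* may print 0, 1, 2 or 3 depending on the interleaving */
    return NULL;
}

static void *task_b(void *arg)
{
    (void)arg;
    x = x + 1;   /* not atomic: load, add, store */
    y = y + 2;
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, task_a, NULL);
    pthread_create(&b, NULL, task_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}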

Page 20: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Synchronization

Synchronization constrains the possible histories to desirable (good) histories.

Synchronization methods
  Mutual exclusion (mutex)
    Exclusive access to shared variables within a critical section
    A mechanism that guarantees serialization of critical sections (atomicity of critical sections with respect to each other)
  Condition synchronization
    Delaying a process until the state satisfies a boolean condition

Lesson learnt: synchronization is required whenever processes read and write shared variables, to preserve data dependencies.

Page 21: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Concurrent Execution Example

Result: 3

provided that
  a. both tasks behave sequentially, and
  b. writes are seen in the same order by both tasks

Initialization: S1:=0; S2:=0;

Task A
  x:=0; y:=0;
  S1:=1;
  while (S2==0);
  Print(x+y);

Task B
  while (S1==0);
  x:=x+1; y:=y+2;
  S2:=1;
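A possible rendering of this handshake in C11 (not from the slides; names are illustrative) uses sequentially consistent atomic flags for S1 and S2, which provides the ordering assumed in conditions (a) and (b):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int x, y;              /* ordinary shared data          */
static atomic_int s1, s2;     /* synchronization flags, init 0 */

static void *task_a(void *arg)
{
    (void)arg;
    x = 0;
    y = 0;
    atomic_store(&s1, 1);               /* signal Task B             */
    while (atomic_load(&s2) == 0)       /* wait until Task B is done */
        ;
    printf("%d\n", x + y);              /* always prints 3           */
    return NULL;
}

static void *task_b(void *arg)
{
    (void)arg;
    while (atomic_load(&s1) == 0)       /* wait for Task A's init    */
        ;
    x = x + 1;
    y = y + 2;
    atomic_store(&s2, 1);               /* signal Task A             */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, task_a, NULL);
    pthread_create(&b, NULL, task_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}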

Page 22: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Critical section

CS: a piece of code that can only be executed by one process at a time
  Provides mutually exclusive access to shared resources (a sequence of statements accessing shared variables)
  Two sections are critical with respect to each other if they cannot be executed simultaneously, i.e., mutually exclusive sections

Some synchronization mechanism is required at the entry and exit of the CS to ensure exclusive use.

Page 23: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Critical Section Example

Task A
  Do private operations;
  Critical section begin;
    Update shared state;
  Critical section end;
  Do private operations;

Task B
  Do private operations;
  Critical section begin;
    Update shared state;
  Critical section end;
  Do private operations;

Page 24: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

The Critical Section Problem

Design entry and exit protocols that satisfy the following properties:

Mutual exclusion
  At most one process at a time is entering, executing, and exiting the critical section.

Absence of deadlocks (livelocks)
  One of the competing processes succeeds in entering.
  Termination: the CS should terminate in finite time.

Absence of unnecessary delay
  A process is not prevented from entering if no other process competes.

Fairness (eventual entry, liveness)
  A process should eventually enter the CS.

Page 25: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Solutions

Locking mechanisms
  Lock on enter; unlock on exit
  Variants of locks: spin locks (busy-waiting), queueing locks, etc.

Semaphores
  A general solution to the synchronization problem, for both mutual exclusion and condition synchronization.

Page 26: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Lock

Enter CS: set the lock when it is cleared.
  <await (lock==false) lock := true>;

Exit CS: clear/release the lock.
  lock := false;

Synonyms: enter-exit, lock-unlock, acquire-release

Example:

bool lock = false;   // shared

process CS1 {
  while (true) {
    <await (lock==false) lock := true>;   // entry
    CS;
    lock := false;                        // exit
    non-critical section;
  }
}

process CS2 {
  while (true) {
    <await (lock==false) lock := true>;   // entry
    CS;
    lock := false;                        // exit
    non-critical section;
  }
}
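As a hedged sketch (not the course's reference code), the <await (lock==false) lock := true> entry protocol can be approximated in C11 with an atomic compare-and-exchange loop; the function names are illustrative:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool lock_var;   /* false = free, true = held */

/* Entry protocol: atomically "await lock==false, then lock := true". */
static void cs_enter(void)
{
    bool expected = false;
    /* Retry until this thread is the one that flips the lock from false to true. */
    while (!atomic_compare_exchange_weak(&lock_var, &expected, true))
        expected = false;   /* compare_exchange overwrites 'expected' on failure */
}

/* Exit protocol: clear/release the lock with an ordinary atomic store. */
static void cs_exit(void)
{
    atomic_store(&lock_var, false);
}

Each process would then bracket its critical section with cs_enter() and cs_exit(), exactly as CS1 and CS2 do above.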

Page 27: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Lock implementation

Lock/unlock in terms of instructions:
  Locking consists of several instructions.
  Unlock is an ordinary store instruction.

To support the atomicity of locking, locks need hardware support, i.e., special atomic memory instructions.
  General semantics: <read the location, test the value read, compute a new value, and store the new value to the location>
  Many variants: read-modify-write, test&set, fetch&increment, swap, etc.

lock:    load  register, location   // copy location to register
         cmp   register, #0         // compare with 0
         bnz   lock                 // if not 0, try again
         store location, #1         // store 1, marking locked
         ret

unlock:  store location, #0
         ret
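Connecting this to one of the primitives listed above, here is an illustrative C11 spin lock built on test&set (C11's atomic_flag); a sketch, not the lock implementation used in the course:

#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;   /* clear = unlocked */

/* lock: atomically test-and-set the flag; spin while it was already set. */
static void lock(void)
{
    while (atomic_flag_test_and_set(&lock_flag))
        ;   /* busy-wait until the previous value was 'clear' */
}

/* unlock: an ordinary clear (store) of the flag. */
static void unlock(void)
{
    atomic_flag_clear(&lock_flag);
}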

Page 28: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Semaphore

A semaphore is a special kind of shared variable manipulated by two atomic operations, P and V.

Semaphores provide a low-level but efficient signaling mechanism for both mutual exclusion and condition synchronization.

Inspired by a railroad semaphore: an up/down signal flag.

Semaphore operations in Dutch
  P (decrement when positive) stands for "proberen" (to test) or "passeren" (to pass)
  V (increment) stands for "verhogen" (to increase) or "vrijgeven" (to release)

Page 29: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Semaphore syntax and semantics

Declaration
  sem s = expr   // single semaphore

Initialization
  Defaults to 1
  The value of a semaphore is a non-negative integer

Operations
  P(s): <await (s>0) s := s-1;>   // wait, down
  V(s): <s := s+1;>               // signal, up
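For reference (not part of the slides), the same P/V semantics maps directly onto POSIX semaphores; a minimal sketch:

#include <semaphore.h>
#include <stdio.h>

int main(void)
{
    sem_t s;

    /* sem s = 1; the second argument 0 means "shared between threads of this process" */
    sem_init(&s, 0, 1);

    sem_wait(&s);   /* P(s): wait until s > 0, then decrement */
    /* guarded work or critical section would go here */
    sem_post(&s);   /* V(s): increment, possibly waking a waiter */

    sem_destroy(&s);
    return 0;
}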

Page 30: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Semaphore types

Binary semaphore: takes the values 0 and 1 only.

Split binary semaphore: a set of binary semaphores where at most one semaphore is 1 at a time; the sum of the semaphore values is in [0,1].

General (counting) semaphore: takes any non-negative integer value and can be used for condition synchronization; for example, it serves as a resource counter, counting the number of available resources.

Page 31: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Mutex semaphore

A CS may be executed with mutual exclusion by enclosing it within P and V operations on a binary semaphore.

Example: the semaphore is initialized to 1 to indicate that the CS is free.

sem mutex = 1;

process CS[i = 0 to n] {
  while (true) {
    P(mutex);    // entry, down
    CS;
    V(mutex);    // exit, up
    non-critical section;
  }
}
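A possible C rendering of this pattern with POSIX threads and a semaphore initialized to 1 (illustrative names, not the course's code):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 4

static sem_t mutex;        /* binary semaphore, initialized to 1 */
static int shared_state;   /* state updated inside the CS        */

static void *cs_process(void *arg)
{
    long id = (long)arg;
    for (int round = 0; round < 3; round++) {
        sem_wait(&mutex);                 /* P(mutex): entry  */
        shared_state++;                   /* critical section */
        printf("process %ld: shared_state = %d\n", id, shared_state);
        sem_post(&mutex);                 /* V(mutex): exit   */
        /* non-critical section would go here */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[N];
    sem_init(&mutex, 0, 1);
    for (long i = 0; i < N; i++)
        pthread_create(&t[i], NULL, cs_process, (void *)i);
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&mutex);
    return 0;
}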

Page 32: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Caches and Cache Coherency

Page 33: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Caches and Cache Coherence

Caches play a key role in all cases
  Reduce average data access time
  Reduce bandwidth demands placed on the shared interconnect

But private processor caches create a problem
  Copies of a variable can be present in multiple caches
  A write by one processor may not become visible to others
  They will keep accessing the stale value in their caches
  This is the cache coherence problem: actions must be taken to ensure visibility

Page 34: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories

A cache memory is used to reduce the access time to memory.

Cache misses can occur since the cache is much smaller than the memory.

[Figure: Processor, Cache, Main Memory]

Page 35: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories

The decision about which parts of the memory reside in the cache is taken by a replacement algorithm.

There are different protocols for a write operation: write-back and write-through.

[Figure: Processor, Cache, Main Memory]

Page 36: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories: Read Operation

If the memory location is in the cache (cache hit), the data is read from the cache.

If the memory location is not in the cache (cache miss), the block containing the data is read from memory and the cache is updated.

[Figure: Processor, Cache, Main Memory]

Page 37: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories: Write Operation (Write Hit)

Write-Through Protocol
  A write operation updates the main memory location.
  Depending on the protocol, the cache may also be updated; in this course we assume the cache is updated on a write hit.

[Figure: Processor, Cache, Main Memory]

Page 38: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories: Write Operation (Write Hit)

Write-Back Protocol
  A write operation updates only the cache location and marks it as updated with an associated flag bit (dirty flag).
  The main memory is updated later, when the block containing the marked address is removed from the cache.

[Figure: Processor, Cache, Main Memory]

Page 39: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories: Write Operation (Write Miss)

Since the data is not necessarily needed on a write, there are two options:
  Write allocate: the block is allocated on a write miss, followed by the corresponding write-hit actions.
  No-write allocate: write misses do not affect the cache; instead, only the lower-level memory is updated.

[Figure: Processor, Cache, Main Memory]

Page 40: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Memories: Write Operation (Write Miss)

Write-through and write-back can be combined with write-allocate or no-write-allocate.

Typically
  Write-back caches use write-allocate
  Write-through caches use no-write-allocate

[Figure: Processor, Cache, Main Memory]

Page 41: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

States for Cache Blocks

Write-through: Invalid, Valid

Write-back: Invalid, Valid, Dirty (not updated in memory)
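To make the states concrete, here is a small, purely illustrative C model of a single write-back cache block (not from the slides); it only tracks the Invalid/Valid/Dirty state across reads, writes, and eviction:

#include <stdio.h>

/* States of a block in a write-back cache. */
typedef enum { INVALID, VALID, DIRTY } block_state;

typedef struct {
    block_state state;
    int         data;    /* cached copy of the memory word */
} cache_block;

static int memory = 5;   /* the backing main-memory word */

static int read_block(cache_block *b)
{
    if (b->state == INVALID) {   /* read miss: fetch the block from memory */
        b->data  = memory;
        b->state = VALID;
    }
    return b->data;              /* otherwise a read hit, served by the cache */
}

static void write_block(cache_block *b, int value)
{
    b->data  = value;            /* write-back: update only the cache ... */
    b->state = DIRTY;            /* ... and mark the block dirty          */
}

static void evict_block(cache_block *b)
{
    if (b->state == DIRTY)       /* write back to memory only if dirty */
        memory = b->data;
    b->state = INVALID;
}

int main(void)
{
    cache_block b = { INVALID, 0 };
    printf("read u = %d\n", read_block(&b));          /* miss, loads 5 */
    write_block(&b, 7);                               /* cache 7, DIRTY, memory still 5 */
    printf("memory before eviction = %d\n", memory);
    evict_block(&b);
    printf("memory after eviction = %d\n", memory);   /* now 7 */
    return 0;
}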

Page 42: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

1. P1 reads location u (value 5) from main memory
2. P3 reads location u from main memory
3. P3 writes u, changing the value to 7
4. P1 reads value u again
5. P2 reads location u from main memory

[Figure: a single processor running three processes P1, P2, P3, with one write-back cache connected via a bus to main memory]

Page 43: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

1. P1 reads location u (value 5) from main memory
   The cache of P1 is updated: the block containing u=5 is loaded into the cache.

[Figure: cache u=5, main memory u=5]

Page 44: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

2. P3 reads location u from main memory
   Cache and memory still have the value u=5.

[Figure: cache u=5, main memory u=5]

Page 45: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

3. P3 writes u, changing the value to 7
   The cache is updated (u=7) and u is marked (dirty). Memory is not changed!

[Figure: cache u=7, main memory u=5]

Page 46: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

4. P1 reads value u again
   Since the cache is common to all processes, there is no problem even though the main memory is not updated. All processes have the same view of the cache!

[Figure: cache u=7, main memory u=5]

Page 47: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

5. P2 reads location u from main memory
   Since the cache is common to all processes, there is no problem even though the main memory is not updated. All processes have the same view of the cache!

[Figure: cache u=7, main memory u=5]

Page 48: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence (Uniprocessor)

If only a uniprocessor is involved, there is no cache coherence problem!

However, if another device with direct memory access (e.g., a DMA controller) is on the bus, the cache may not reflect the contents of the memory and the cache coherence problem can occur!

Page 49: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem

1. P1 reads location u (value 5) from main memory
2. P3 reads location u from main memory
3. P3 writes u, changing the value to 7
4. P1 reads value u again
5. P2 reads location u from main memory

[Figure: three processors P1, P2, P3, each with a private cache, connected by a bus to main memory]

Page 50: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Through Cache)

1. P1 reads location u (value 5) from main memory
   P1's cache is updated (u=5).

[Figure: P1's cache u=5, main memory u=5]

Page 51: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Through Cache)

2. P3 reads location u from main memory
   P3's cache is updated (u=5).

[Figure: P1's cache u=5, P3's cache u=5, main memory u=5]

Page 52: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Through Cache)

3. P3 writes u, changing the value to 7
   Main memory is updated (u=7).
   P3's cache is updated (no-write-allocate caches update the cache on a write hit).

[Figure: P1's cache u=5 (stale), P3's cache u=7 (valid), main memory u=7]

Page 53: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Through Cache)

4. P1 reads value u again
   P1 reads the value from its cache (u=5), which is not the correct value!

[Figure: P1's cache u=5 (stale), P3's cache u=7 (valid), main memory u=7]

Page 54: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Through Cache)

5. P2 reads location u from main memory
   P2 reads the value from main memory (u=7).

[Figure: P1's cache u=5 (stale), P3's cache u=7 (valid), main memory u=7]

Page 55: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Back Cache)

1. P1 reads location u (value 5) from main memory
   P1's cache is updated (u=5).

[Figure: P1's cache u=5, main memory u=5]

Page 56: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Back Cache)

2. P3 reads location u from main memory
   P3's cache is updated (u=5).

[Figure: P1's cache u=5, P3's cache u=5, main memory u=5]

Page 57: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Back Cache)

3. P3 writes u, changing the value to 7
   P3's cache is updated (u=7) and location u is marked dirty.
   Main memory is NOT updated!

[Figure: P1's cache u=5 (stale), P3's cache u=7 (dirty), main memory u=5]

Page 58: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Back Cache)

4. P1 reads value u again
   P1 reads the value from its cache (u=5), which is not the correct value!

[Figure: P1's cache u=5 (stale), P3's cache u=7 (dirty), main memory u=5]

Page 59: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.

Cache Coherence Problem (Write-Back Cache)

5. P2 reads location u from main memory
   P2 reads the value from main memory (u=5), which is not the correct value, since the up-to-date value u=7 is only in P3's cache!

[Figure: P1's cache u=5 (stale), P3's cache u=7 (dirty), main memory u=5]

Page 60: Shared Memory Multiprocessors A. Jantsch / Z. Lu / I. Sander.


Cache Coherence

Since communication between processors is done by means of shared memory, cache coherence must be guaranteed.

The hardware should support cache coherence!

Everybody should have the same view of the memory system!

