Date post: 05-Jan-2016
Upload: imogen-small
Distributed Shared Memory (part 1)
Transcript
Page 1: Distributed Shared Memory (part 1). Distributed Shared Memory (DSM) mem0 proc0 mem1 proc1 mem2 proc2 memN procN network... shared memory.

Distributed Shared Memory (part 1)

Distributed Shared Memory (DSM)

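[Figure: processors proc0, proc1, proc2, ..., procN, each with its own local memory mem0, mem1, mem2, ..., memN, connected by a network; the whole is labeled "shared memory"]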

Shared memory programming

• Standard
  – pthread
• Synchronizations
  – Barriers
  – Locks
  – Semaphores

Sequential SOR

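/* Assumes grid and temp are (n+1) x (n+1); rows and columns 0 and n are fixed boundary values. */
for (t = 0; t < iterations; t++) {      /* some number of timesteps/iterations */
    /* new value of each interior point = average of its four neighbors */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            temp[i][j] = 0.25 *
                (grid[i-1][j] + grid[i+1][j] +
                 grid[i][j-1] + grid[i][j+1]);
    /* copy the new values back into grid */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            grid[i][j] = temp[i][j];
}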

Parallel SOR with Barriers (1 of 2)

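void* sor(void* arg)
{
    int slice = (int)(intptr_t)arg;              /* this thread's slice, 0..p-1 */
    int from  = (slice * (n-1)) / p + 1;         /* first row of the slice */
    int to    = ((slice+1) * (n-1)) / p + 1;     /* one past the last row of the slice */

    for some number of iterations { … }
}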

Parallel SOR with Barriers (2 of 2)

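for (i = from; i < to; i++)
    for (j = 1; j < n; j++)
        temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                             grid[i][j-1] + grid[i][j+1]);
barrier();      /* all slices of temp are complete before grid is overwritten */
for (i = from; i < to; i++)
    for (j = 1; j < n; j++)
        grid[i][j] = temp[i][j];
barrier();      /* all slices of grid are updated before the next iteration reads them */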
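The slice arithmetic from part 1 is easy to get wrong by one, so here is a minimal sketch of it in isolation. The helper name slice_bounds and passing n and p as parameters are assumptions made for illustration; the slides use them as globals.

```c
#include <assert.h>

/* Hypothetical helper: rows [*from, *to) handled by thread `slice`
 * when interior rows 1..n-1 are split among p threads.
 * Same formulas as in sor() on the slide. */
void slice_bounds(int slice, int p, int n, int *from, int *to)
{
    assert(p > 0 && slice >= 0 && slice < p && n > 1);
    *from = (slice * (n - 1)) / p + 1;
    *to   = ((slice + 1) * (n - 1)) / p + 1;
}
```

With n = 11 and p = 4 the slices are [1,3), [3,6), [6,8) and [8,11): contiguous, non-overlapping, and together covering every interior row, which is why the two barriers per iteration are the only synchronization the loop needs.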

Differences between SMP and Software DSM

• Delay: tradeoffs, such as block size

• Software => traps: cost of read/write misses

• Goals of caches: multiprocessor = performance, dist. system = transparency

• Bus vs. long networks: reliance on serialization and broadcast.

Consequent differences in protocols and applications

• Bigger block size
  – Cost amortization, higher hit ratio for larger blocks?
  – Reduced overhead

• But therefore...
  – Migration vs. replication
  – False sharing increases

• DSM protocol more complex: Must handle lost, corrupted, and out-of-order packets

• Above, coupled with cost of traps, => SDSM consistency cost much higher!

Results of high consistency costs

• Manage sharing more carefully

• Align data to page boundaries
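One way to align data to page boundaries is sketched below, assuming a POSIX system and a C11 library. The helper names round_up_to_page and page_aligned_alloc are hypothetical; sysconf(_SC_PAGESIZE) and aligned_alloc are the standard calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round a byte count up to a whole number of pages. */
static size_t round_up_to_page(size_t bytes, size_t page)
{
    assert(page > 0);
    return (bytes + page - 1) / page * page;
}

/* Hypothetical helper: allocate a block that starts on a page boundary
 * and occupies whole pages, so it shares no page with other data.
 * Release it with free(). */
void *page_aligned_alloc(size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return aligned_alloc(page, round_up_to_page(bytes, page));
}
```

Giving each processor's slice of grid its own page-aligned block means a write by one processor never invalidates a page that holds another processor's rows.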

Consistency Models

• Sequential Consistency
  – All processors observe the same order
  – Must correspond to some serial order
  – Only ordering constraint is that the reads/writes of each processor (e.g. P1) appear in that processor's program order; there are no restrictions on the relative ordering between processors.
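As a small illustration of what sequential consistency allows, the sketch below enumerates every serial order of two two-operation programs that keeps each program's own order. The two programs (P1: x = 1; r1 = y and P2: y = 1; r2 = x) and the function name are assumptions chosen for the example, not taken from the slides.

```c
#include <assert.h>

/* Count the sequentially consistent executions of
 *     P1: x = 1; r1 = y;        P2: y = 1; r2 = x;
 * (x and y initially 0) that end with r1 == want_r1 and r2 == want_r2. */
int count_sc_outcomes(int want_r1, int want_r2)
{
    int count = 0;
    assert((want_r1 == 0 || want_r1 == 1) && (want_r2 == 0 || want_r2 == 1));

    /* Bit s of mask says which processor runs step s: 1 = P1, 0 = P2.
     * Only masks with exactly two 1-bits are valid interleavings. */
    for (unsigned mask = 0; mask < 16; mask++) {
        int ones = 0;
        for (int b = 0; b < 4; b++)
            ones += (mask >> b) & 1;
        if (ones != 2)
            continue;

        int x = 0, y = 0, r1 = -1, r2 = -1;
        int pc1 = 0, pc2 = 0;            /* per-processor program counters */
        for (int step = 0; step < 4; step++) {
            if ((mask >> step) & 1) {    /* P1 runs its next operation */
                if (pc1++ == 0) x = 1; else r1 = y;
            } else {                     /* P2 runs its next operation */
                if (pc2++ == 0) y = 1; else r2 = x;
            }
        }
        if (r1 == want_r1 && r2 == want_r2)
            count++;
    }
    return count;
}
```

Of the six interleavings, none produces r1 == 0 and r2 == 0: whichever read comes last in the serial order must see the other processor's write. A memory system that lets both reads return 0 is therefore not sequentially consistent.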

Common consistency protocols

• Write update
  – Multicast update to all replicas

• Write invalidate
  – Invalidate cached copies in p2, p3
  – Cache miss if p2/p3 access X
  – Valid data from other cache
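The two protocols can be contrasted with a toy model of one variable X replicated on three processors. This is a sketch under stated assumptions: the replicas type, the function names, processors numbered 0..2, and the message accounting (one message per update or invalidation, two per miss) are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3

typedef struct {
    int  value[NPROC];   /* each processor's cached copy of X */
    bool valid[NPROC];   /* is that copy usable? */
    int  messages;       /* messages sent so far (assumed cost model) */
} replicas;

void rep_init(replicas *r, int v)
{
    for (int q = 0; q < NPROC; q++) { r->value[q] = v; r->valid[q] = true; }
    r->messages = 0;
}

/* Write update: multicast the new value to all replicas. */
void write_update(replicas *r, int writer, int v)
{
    assert(writer >= 0 && writer < NPROC);
    for (int q = 0; q < NPROC; q++) {
        r->value[q] = v;
        r->valid[q] = true;
        if (q != writer) r->messages++;
    }
}

/* Write invalidate: update the local copy, invalidate the other valid copies. */
void write_invalidate(replicas *r, int writer, int v)
{
    assert(writer >= 0 && writer < NPROC);
    r->value[writer] = v;
    r->valid[writer] = true;
    for (int q = 0; q < NPROC; q++)
        if (q != writer && r->valid[q]) { r->valid[q] = false; r->messages++; }
}

/* Read: a miss on an invalid copy fetches valid data from another cache. */
int rep_read(replicas *r, int reader)
{
    assert(reader >= 0 && reader < NPROC);
    if (!r->valid[reader]) {
        for (int q = 0; q < NPROC; q++) {
            if (q != reader && r->valid[q]) {
                r->value[reader] = r->value[q];
                r->valid[reader] = true;
                r->messages += 2;        /* request + reply */
                break;
            }
        }
    }
    return r->value[reader];
}
```

Write update pays on every write; write invalidate pays once per write and again only if an invalidated processor actually reads X later.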

Conventional Implementation

• As proposed by Li & Hudak, TOCS ‘86.

• Use virtual memory to implement sharing.

• Shared memory divided up by virtual memory pages.

• Use single-writer, multiple-reader write-invalidate coherence protocol.

• Keep pages in one of three states:
  – invalid, read-only, read-write
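The three page states and the single-writer, multiple-reader rule can be sketched as a small state machine for one page. The names (page, page_read, page_write) are hypothetical, and the model assumes that a read fault downgrades the current writer to read-only, as in the usual write-invalidate scheme. A real system would change page protections and take actual page faults instead.

```c
#include <assert.h>

#define NPROC 4

typedef enum { INVALID, READ_ONLY, READ_WRITE } pstate;

typedef struct {
    pstate st[NPROC];    /* state of the page on each processor */
    int    faults;       /* number of read/write faults taken */
} page;

void page_init(page *pg, int owner)
{
    assert(owner >= 0 && owner < NPROC);
    for (int q = 0; q < NPROC; q++) pg->st[q] = INVALID;
    pg->st[owner] = READ_WRITE;
    pg->faults = 0;
}

/* Read access by processor p. */
void page_read(page *pg, int p)
{
    if (pg->st[p] != INVALID) return;            /* read hit */
    pg->faults++;                                /* read fault */
    for (int q = 0; q < NPROC; q++)
        if (pg->st[q] == READ_WRITE)
            pg->st[q] = READ_ONLY;               /* writer becomes a reader */
    pg->st[p] = READ_ONLY;                       /* replication on read */
}

/* Write access by processor p. */
void page_write(page *pg, int p)
{
    if (pg->st[p] == READ_WRITE) return;         /* write hit */
    pg->faults++;                                /* write fault */
    for (int q = 0; q < NPROC; q++)
        if (q != p) pg->st[q] = INVALID;         /* write invalidation */
    pg->st[p] = READ_WRITE;
}
```

At any time the page is either read-write on exactly one processor or read-only on one or more; every fault path restores that invariant.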

Example

[Figure sequence: proc0, proc1, proc2, ..., procN sharing memory, one frame per step; the frames are labeled "read", "write", "read fault" and "write fault"]

• Read access hit
• Write access hit
• Read access miss, then read fault, then replication on read
• Write access miss, then write fault, then write invalidation
• Write access to read-only, then write fault, then write invalidation

How to Remember Locations?

• Broadcast on miss (as in SMP).

• Static home.

• Dynamic home or owner.

Ownership and Owner Location

• Owner is the last writer.

• Owner maintains copyset.

• Every processor maintains probable owner (not always the real owner).

Ownership Location

• Every read or write miss is sent to (local) probable owner.

• If owner, handle appropriately, else forward to probable owner.

Ownership Modification

• If write miss, new writer becomes owner, and all forwarders set probable owner to requester.

• If read miss, set probable owner to responding processor.

Example

• Initially, owner(page0) = p0, and probable owner(page0) = p0 everywhere.

• Write miss by p1, sends message to its probable owner (p0), handled there, new owner = p1, probable owner(page0) on p0 = p1.

• Read miss by p2, sends message to probable owner (p0), forwarded to probable owner (p1), handled there, probable owner(0) on p2 becomes p1.
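The forwarding and update rules can be traced with a small model of one page's ownership. This is a sketch: the page_dir type and dir_miss function are hypothetical names, messages are modeled as loop iterations, and the real owner is assumed to point to itself as its own probable owner.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 4

typedef struct {
    int owner;               /* the real owner (the last writer) */
    int prob_owner[NPROC];   /* each processor's guess at the owner */
} page_dir;

void dir_init(page_dir *d, int initial_owner)
{
    assert(initial_owner >= 0 && initial_owner < NPROC);
    d->owner = initial_owner;
    for (int q = 0; q < NPROC; q++) d->prob_owner[q] = initial_owner;
}

/* Handle a read or write miss by `requester`.
 * Returns the number of messages it took to reach the real owner. */
int dir_miss(page_dir *d, int requester, bool is_write)
{
    assert(requester != d->owner);       /* the owner itself never misses */
    int hops = 0;
    int node = d->prob_owner[requester]; /* send to local probable owner */

    for (;;) {
        hops++;
        if (node == d->owner) break;     /* reached the owner: handle here */
        int next = d->prob_owner[node];  /* otherwise forward the request */
        if (is_write)
            d->prob_owner[node] = requester;  /* forwarder points to new writer */
        node = next;
    }

    if (is_write) {
        d->prob_owner[node] = requester; /* old owner points to new owner */
        d->owner = requester;            /* new writer becomes owner */
        d->prob_owner[requester] = requester;
    } else {
        d->prob_owner[requester] = node; /* point to the responding processor */
    }
    return hops;
}
```

Replaying the example: after dir_init(&d, 0), a write miss by p1 takes one message and leaves owner = p1 with probable owner on p0 = p1; a read miss by p2 then takes two messages (to p0, forwarded to p1) and leaves probable owner on p2 = p1.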

Implement synchronizations

• Use messages to implement synchronizations

Barriers

• Designate one processor as barrier manager.

• When a process waits at a barrier, it sends an arrival message to the barrier manager and waits.

• When barrier manager has received all messages, it sends a departure message to all processes.
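The manager's side of this protocol amounts to counting arrival messages. The sketch below models only that bookkeeping; the barrier_manager type and function names are hypothetical, and message delivery itself is left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int nprocs;            /* processes taking part in the barrier */
    int arrived;           /* arrival messages received in this episode */
    int departures_sent;   /* departure messages sent in total */
} barrier_manager;

void barrier_init(barrier_manager *m, int nprocs)
{
    assert(nprocs > 0);
    m->nprocs = nprocs;
    m->arrived = 0;
    m->departures_sent = 0;
}

/* Called by the manager for each arrival message.
 * Returns true if this was the last arrival, i.e. the manager has now
 * sent a departure message to every process and reset for the next barrier. */
bool barrier_arrival(barrier_manager *m)
{
    assert(m->arrived < m->nprocs);
    m->arrived++;
    if (m->arrived < m->nprocs)
        return false;                    /* still waiting for others */
    m->departures_sent += m->nprocs;     /* release everyone */
    m->arrived = 0;                      /* ready for the next episode */
    return true;
}
```

Each barrier therefore costs one arrival and one departure message per process, all passing through the manager.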

Locks

• Designate one process as the lock manager for a particular lock.

• To acquire a lock, a process sends an acquire message to the manager and waits.

• Manager forwards message to last acquirer.

• If lock free, send lock grant message.

• If lock held, hold on to request until free, and then send lock grant message.
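One reading of these steps is a distributed queue: the manager only remembers the last acquirer, and each process holds on to at most one forwarded request. The sketch below models that reading. The dlock type and function names are hypothetical, and treating a request as grantable right away when there is no previous acquirer is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3
enum { NONE = -1 };

typedef enum { IDLE, WAITING, HOLDING } lstate;

typedef struct {
    int    last_acquirer;    /* kept by the lock manager */
    lstate state[NPROC];     /* what each process is doing with the lock */
    int    next[NPROC];      /* request each process is holding on to */
} dlock;

void dlock_init(dlock *l)
{
    l->last_acquirer = NONE;
    for (int q = 0; q < NPROC; q++) { l->state[q] = IDLE; l->next[q] = NONE; }
}

/* Process p sends an acquire message; the manager forwards it to the
 * last acquirer. Returns true if the lock is granted immediately. */
bool dlock_acquire(dlock *l, int p)
{
    assert(l->state[p] == IDLE);
    int last = l->last_acquirer;
    l->last_acquirer = p;
    if (last == NONE || l->state[last] == IDLE) {
        l->state[p] = HOLDING;           /* lock free: grant message */
        return true;
    }
    l->next[last] = p;                   /* lock held: hold on to request */
    l->state[p] = WAITING;
    return false;
}

/* Process p releases the lock. Returns the process that now receives
 * the lock grant message, or NONE if nobody was waiting on p. */
int dlock_release(dlock *l, int p)
{
    assert(l->state[p] == HOLDING);
    int n = l->next[p];
    l->next[p] = NONE;
    l->state[p] = IDLE;
    if (n != NONE)
        l->state[n] = HOLDING;
    return n;
}
```

Under contention the grant travels directly from each releaser to the next waiter, so the manager handles only one forwarded message per acquire.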

Problem: False Sharing

• Concurrent access to different data within the same consistency unit.

• With page as consistency unit, lots of opportunity for false sharing.

• Two flavors:
  – read-write
  – write-write
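Whether two data items can falsely share depends only on their addresses and the size of the consistency unit. A minimal check, with hypothetical function names and addresses passed as plain integers, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the consistency unit (e.g. page) that contains addr. */
static uintptr_t unit_of(uintptr_t addr, uintptr_t unit_size)
{
    assert(unit_size > 0);
    return addr / unit_size;
}

/* True if a and b are different data in the same consistency unit,
 * so concurrent accesses to them can cause false sharing. */
bool may_false_share(uintptr_t a, uintptr_t b, uintptr_t unit_size)
{
    return a != b && unit_of(a, unit_size) == unit_of(b, unit_size);
}
```

Two variables 8 bytes apart share a 4096-byte page but not an 8-byte unit, which illustrates why a page-sized consistency unit offers so much more opportunity for false sharing.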

Read-Write False Sharing

[Figure: variables x and y]

Read-Write False Sharing (Cont.)

[Figure: timeline with the operations w(x); r(y), r(y), r(x); synch; w(x), w(x)]

Write-Write False Sharing

[Figure: timeline with the operations w(x); w(y), w(y), r(x); synch; w(x), w(x)]

Summary

• Software shared memory on distributed memory hardware.
  – Uses virtual memory.

• Home migration to improve locality
  – Important because of high latencies.

• Sequential consistency suffers from false sharing
