Page 1: CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Fall 2011, Prof. Jennifer Welch

Set 16: Distributed Shared Memory

Page 2: Distributed Shared Memory

A model for inter-process communication.

Provides the illusion of shared variables on top of message passing.

Shared memory is often considered a more convenient programming platform than message passing.

Formally, we give a simulation of the shared memory model on top of the message passing model.

We'll consider the special case of no failures, and only read/write variables to be simulated.

Page 3: The Simulation

[Diagram: users of read/write shared memory sit on top of simulation algorithms alg0 … algn-1; each algorithm accepts read/write invocations and issues return/ack responses, and communicates via send/recv over the Message Passing System. The whole stack implements Shared Memory.]

Page 4: Shared Memory Issues

A process invokes a shared memory operation (read or write) at some time

The simulation algorithm running on the same node executes some code, possibly involving exchanges of messages

Eventually the simulation algorithm informs the process of the result of the shared memory operation.

So shared memory operations are not instantaneous! Operations (invoked by different processes) can overlap

What values should be returned by operations that overlap other operations? This is defined by a memory consistency condition.

Page 5: Sequential Specifications

Each shared object has a sequential specification: it specifies the behavior of the object in the absence of concurrency.

The object supports operations: invocations and matching responses.

The specification is the set of sequences of operations that are legal.

Page 6: Sequential Spec for R/W Registers

Each operation has two parts, invocation and response.

A read operation has invocation read_i(X) and response return_i(X,v) (subscript i indicates the proc).

A write operation has invocation write_i(X,v) and response ack_i(X).

A sequence of operations is legal iff each read returns the value of the latest preceding write.

Ex: [write_0(X,3) ack_0(X)] [read_1(X) return_1(X,3)]
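
To make the legality condition concrete, here is a minimal sketch (not from the slides) of a legality check for a single register; the tuple representation of operations is a hypothetical choice for illustration.

```python
def is_legal(ops, initial=0):
    """ops is a list of ('write', proc, value) or ('read', proc, value)
    tuples in sequence order; each read must return the value of the
    latest preceding write (or the initial value if none precedes it)."""
    current = initial
    for kind, _proc, value in ops:
        if kind == 'write':
            current = value          # latest preceding write so far
        elif value != current:       # a read must see exactly that value
            return False
    return True

# The slide's example: write_0(X,3) followed by read_1(X) returning 3.
assert is_legal([('write', 0, 3), ('read', 1, 3)])
```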

Page 7: Memory Consistency Conditions

Consistency conditions tie together the sequential specification with what happens in the presence of concurrency.

We will study two well-known conditions: linearizability and sequential consistency.

We will only consider read/write registers, in the absence of failures.

Page 8: Definition of Linearizability

Suppose σ is a sequence of invocations and responses for a set of operations: an invocation is not necessarily immediately followed by its matching response, so σ can have concurrent, overlapping ops.

σ is linearizable if there exists a permutation π of all the operations in σ (now each invocation is immediately followed by its matching response) s.t.:

π|X is legal (satisfies the sequential spec) for all vars X, and

if the response of operation O1 occurs in σ before the invocation of operation O2, then O1 occurs in π before O2 (π respects the real-time order of non-overlapping operations in σ).

Page 9: Linearizability Examples

Suppose there are two shared variables, X and Y, both initially 0.

[Timeline diagram: p0 performs write(X,1) ack(X) and then read(Y) return(Y,1); p1 concurrently performs write(Y,1) ack(Y) and then read(X) return(X,1).]

Is this sequence linearizable? Yes (a linearization order is marked on the slide).

What if p1's read returns 0? No (the violated real-time ordering is marked with an arrow on the slide).

Page 10: Definition of Sequential Consistency

Suppose σ is a sequence of invocations and responses for some set of operations.

σ is sequentially consistent if there exists a permutation π of all the operations in σ s.t.:

π|X is legal (satisfies the sequential spec) for all vars X, and

if the response of operation O1 occurs in σ before the invocation of operation O2 at the same process, then O1 occurs in π before O2 (π respects the real-time order of operations by the same process in σ).
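
Since both conditions quantify over permutations, a brute-force checker makes the difference between them concrete. A minimal sketch (exponential, illustration only), assuming each completed operation is a dict with hypothetical fields proc, var, kind, value, inv, and resp (invocation and response times):

```python
from itertools import permutations

def legal(seq, initial=0):
    """Sequential spec for read/write registers: each read returns the
    value of the latest preceding write to the same variable."""
    current = {}
    for op in seq:
        if op['kind'] == 'write':
            current[op['var']] = op['value']
        elif op['value'] != current.get(op['var'], initial):
            return False
    return True

def satisfies(ops, condition):
    """condition == 'lin': the permutation must respect the real-time order
    of non-overlapping operations; 'sc': only the order of operations at
    the same process."""
    for perm in permutations(ops):
        if not legal(perm):
            continue
        ok = True
        for i, o1 in enumerate(perm):
            for o2 in perm[i + 1:]:
                same_proc = o1['proc'] == o2['proc']
                # o2 finished before o1 began, yet o1 precedes o2 in perm
                if o2['resp'] < o1['inv'] and (condition == 'lin' or same_proc):
                    ok = False
        if ok:
            return True
    return False

# Slide 11's first scenario (with assumed times: reads start after both
# writes finish): p0 writes X=1 then reads Y=1; p1 writes Y=1 then reads
# X=0. It is sequentially consistent but not linearizable.
ops = [
    {'proc': 0, 'var': 'X', 'kind': 'write', 'value': 1, 'inv': 0, 'resp': 2},
    {'proc': 0, 'var': 'Y', 'kind': 'read',  'value': 1, 'inv': 3, 'resp': 4},
    {'proc': 1, 'var': 'Y', 'kind': 'write', 'value': 1, 'inv': 0, 'resp': 2},
    {'proc': 1, 'var': 'X', 'kind': 'read',  'value': 0, 'inv': 3, 'resp': 4},
]
assert satisfies(ops, 'sc') and not satisfies(ops, 'lin')
```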

Page 11: Sequential Consistency Examples

Suppose there are two shared variables, X and Y, both initially 0.

[Timeline diagram: p0 performs write(X,1) ack(X) and then read(Y) return(Y,1); p1 concurrently performs write(Y,1) ack(Y) and then read(X) return(X,0).]

Is this sequence sequentially consistent? Yes (a legal permutation is marked by numbers on the slide).

What if p0's read returns 0? No (the resulting cyclic ordering constraints are marked with arrows on the slide).

Page 12: Specification of Linearizable Shared Memory Comm. System

Inputs are invocations on the shared objects.

Outputs are responses from the shared objects.

A sequence σ is in the allowable set iff:

Correct Interaction: each proc. alternates invocations and matching responses.

Liveness: each invocation has a matching response.

Linearizability: σ is linearizable.

Page 13: Specification of Sequentially Consistent Shared Memory

Inputs are invocations on the shared objects.

Outputs are responses from the shared objects.

A sequence σ is in the allowable set iff:

Correct Interaction: each proc. alternates invocations and matching responses.

Liveness: each invocation has a matching response.

Sequential Consistency: σ is sequentially consistent.

Page 14: Algorithm to Implement Linearizable Shared Memory

Uses totally ordered broadcast as the underlying communication system.

Each proc keeps a replica of each shared variable.

When a read request arrives: send a bcast msg containing the request; when own bcast msg arrives, return the value in the local replica.

When a write request arrives: send a bcast msg containing the request; upon receipt, each proc updates its replica's value; when own bcast msg arrives, respond with ack.
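
A sketch of this algorithm in Python. The totally-ordered-broadcast handle (with a send method and a delivery callback wired to on_to_bc_recv) is assumed, not a real library, and ack/respond are placeholders for delivering responses to the user.

```python
class LinearizableRegisterSim:
    """One instance per process; a minimal sketch of the slide's algorithm."""
    def __init__(self, my_id, to_bcast):
        self.my_id = my_id
        self.to_bcast = to_bcast   # assumed TO-broadcast handle
        self.replica = {}          # local copy of every shared variable

    def read(self, x):
        # send a bcast msg containing the request; respond when it comes back
        self.to_bcast.send(('read', self.my_id, x))

    def write(self, x, v):
        self.to_bcast.send(('write', self.my_id, x, v))

    def on_to_bc_recv(self, msg):
        # delivered in the same total order at every process
        if msg[0] == 'write':
            _, sender, x, v = msg
            self.replica[x] = v                 # every proc applies every write
            if sender == self.my_id:
                self.ack(x)                     # own bcast arrived: respond
        else:
            _, sender, x = msg
            if sender == self.my_id:
                self.respond(x, self.replica.get(x, 0))  # value in local replica

    def ack(self, x): ...                       # deliver ack(x) to the user
    def respond(self, x, v): ...                # deliver return(x, v) to the user
```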

Page 15: The Simulation

[Diagram: the same layered structure as before, but the simulation algorithms alg0 … algn-1 now communicate via to-bc-send / to-bc-recv over a Totally Ordered Broadcast service instead of point-to-point send/recv, together implementing Shared Memory for the users of read/write shared memory.]

Page 16: Correctness of Linearizability Algorithm

Consider any admissible execution α of the algorithm in which the underlying totally ordered broadcast behaves properly and the users interact properly (alternate invocations and responses).

Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.

Page 17: Correctness of Linearizability Algorithm

Liveness (every invocation has a response): by the Liveness property of the underlying totally ordered broadcast.

Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.

π is legal: because all the operations are consistently ordered by the TO bcast.

π respects the real-time order of operations: if O1 finishes before O2 begins, O1's bcast is ordered before O2's bcast.

Page 18: Why is Read Bcast Needed?

The bcast done for a read causes no changes to any replicas, just delays the response to the read.

Why is it needed? Let's see what happens if we remove it.

Page 19: Why Read Bcast is Needed

[Timeline diagram: p0 performs write(1) via to-bc-send; p1's read then returns 1 (p1's replica has already been updated), but p2's later read returns 0 (p2's replica has not yet been updated). The new value appears to "go backwards", so the result is not linearizable.]

Page 20: Algorithm for Sequential Consistency

The linearizability algorithm, without doing a bcast for reads:

Uses totally ordered broadcast as the underlying communication system.

Each proc keeps a replica of each shared variable.

When a read request arrives: immediately return the value stored in the local replica.

When a write request arrives: send a bcast msg containing the request; upon receipt, each proc updates its replica's value; when own bcast msg arrives, respond with ack.
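
Relative to the sketch on page 14, the change is a small diff (same assumed interface, illustrative names): reads skip the broadcast entirely.

```python
class SCRegisterSim(LinearizableRegisterSim):
    """Same as the linearizability sketch, minus the read broadcast."""
    def read(self, x):
        # immediately return the value stored in the local replica
        self.respond(x, self.replica.get(x, 0))

    def on_to_bc_recv(self, msg):
        # only writes are broadcast now
        _, sender, x, v = msg
        self.replica[x] = v
        if sender == self.my_id:
            self.ack(x)
```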

Page 21: Correctness of SC Algorithm

Lemma (9.3): The local copies at each proc. take on all the values appearing in write operations, in the same order, which preserves the order of non-overlapping writes

- implies per-process order of writes is preserved

Lemma (9.4): If pi writes Y and later reads X, then pi's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).

Page 22: Correctness of the SC Algorithm

(Theorem 9.5) Why does SC hold? Given any admissible execution α, we must come up with a permutation π of the shared memory operations that is legal and respects the per-proc. ordering of operations.

Page 23: The Permutation

Insert all writes into π in their to-bcast order.

Consider each read R in α in the order of invocation. Suppose R is a read by pi of X. Place R in π immediately after the later of:

the operation by pi that immediately precedes R in α, and

the write that R "read from" (the write that caused the latest update of pi's local copy of X preceding the response for R).
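
A sketch of this construction in Python, under an assumed data layout: every operation is a dict with a proc field, writes arrive as a list already in to-bcast order, reads as a list in invocation order, and each read carries a read_from reference to the write it read from (or None for the initial value).

```python
def build_permutation(writes, reads):
    """Build pi: all writes in to-bcast order, then each read inserted
    immediately after the later of (a) the last already-placed operation
    by the same process and (b) the write it read from."""
    pi = list(writes)
    for r in reads:
        # index of the last earlier-placed operation by the same process
        idx_proc = max((i for i, o in enumerate(pi) if o['proc'] == r['proc']),
                       default=-1)
        # index of the write this read took its value from (-1 if initial)
        idx_write = pi.index(r['read_from']) if r['read_from'] in pi else -1
        pi.insert(max(idx_proc, idx_write) + 1, r)
    return pi
```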

Page 24: Permutation Example

[Timeline diagram: two writes, write(2) and write(1), are broadcast via to-bc-send and acked; subsequent reads return 2 and 1 respectively. The permutation π is given by the numbers 1-4 marked on the slide.]

Page 25: Permutation Respects Per Proc. Ordering

For a specific proc:

The relative ordering of two writes is preserved, by Lemma 9.3.

The relative ordering of two reads is preserved, by the construction of π.

If write W precedes read R in exec. α, then W precedes R in π, by construction.

Suppose read R precedes write W in α. Show the same is true in π.

Page 26: Permutation Respects Ordering

Suppose in contradiction R and W are swapped in π:

There is a read R' by pi that equals R or precedes R in π.

There is a write W' that equals W or follows W in the to-bcast order.

And R' "reads from" W'.

But: R' finishes before W starts in α, and updates are done to local replicas in to-bcast order (Lemma 9.3), so the update for W' does not precede the update for W; so R' cannot read from W'.

α|pi: … R' … R … W …

π: … W … W' … R' … R …

Page 27: Permutation is Legal

Consider some read R of X by pi and some write W s.t. R reads from W.

Suppose in contradiction some other write W' to X falls between W and R in π:

π: … W … W' … R …

Why does R follow W' in π?

Page 28: Permutation is Legal

Case 1: W' is also by pi. Then R follows W' in π because R follows W' in α.

The update for W at pi precedes the update for W' at pi in α (Lemma 9.3).

Thus R does not read from W, contradiction.

Page 29: Permutation is Legal

Case 2: W' is not by pi. Then R follows W' in π due to some operation O, also by pi, s.t. O precedes R in α and O is placed between W' and R in π.

Consider the earliest such O.

Case 2.1: O is a write (not necessarily to X).

The update for W' at pi precedes the update for O at pi in α (Lemma 9.3).

The update for O at pi precedes pi's local read for R in α (Lemma 9.4).

So R does not read from W, contradiction.

π: … W … W' … O … R …

Page 30: Permutation is Legal

Case 2.2: O is a read.

• By construction of π, O must read X and in fact read from W' (otherwise O would not be placed after W').

• The update for W at pi precedes the update for W' at pi in α (Lemma 9.3).

• The update for W' at pi precedes the local read for O at pi in α (otherwise O would not read from W').

• Thus R cannot read from W, contradiction.

π: … W … W' … O … R …

Page 31: Performance of SC Algorithm

Read operations are implemented "locally", without requiring any inter-process communication.

Thus reads can be viewed as "fast": time between invocation and response is only that needed for some local computation.

Time for a write is time for delivery of one totally ordered broadcast (depends on how to-bcast is implemented).

Page 32: Alternative SC Algorithm

It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast with the reverse performance:

writes are local/fast (bcasts are still sent, but we don't wait for them to be received);

reads can require waiting for some bcasts to be received.

Like the previous SC algorithm, this one does not implement linearizable shared memory.
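
One standard way to get this trade-off, as a hedged sketch (the slides do not give the algorithm; the interface and names are the same assumed ones as in the earlier sketches): ack a write immediately after broadcasting it, and make a read wait until all of the reader's own pending writes have been delivered back in the total order.

```python
class FastWriteSCSim:
    """Sketch: fast (local) writes, reads may wait; SC, not linearizable."""
    def __init__(self, my_id, to_bcast):
        self.my_id = my_id
        self.to_bcast = to_bcast
        self.replica = {}
        self.num_pending = 0       # own writes broadcast but not yet delivered

    def write(self, x, v):
        self.num_pending += 1
        self.to_bcast.send(('write', self.my_id, x, v))
        self.ack(x)                # respond immediately: writes are fast

    def read(self, x):
        # wait until all of this process's writes have come back in the
        # total order, then read the local replica
        self.wait_until(lambda: self.num_pending == 0)
        self.respond(x, self.replica.get(x, 0))

    def on_to_bc_recv(self, msg):
        _, sender, x, v = msg
        self.replica[x] = v
        if sender == self.my_id:
            self.num_pending -= 1

    def ack(self, x): ...
    def respond(self, x, v): ...
    def wait_until(self, cond): ...   # placeholder for blocking / condition var
```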

Page 33: Time Complexity for DSM Algorithms

One complexity measure of interest for DSM algorithms is how long it takes for operations to complete.

The linearizability algorithm required D time for both reads and writes, where D is the maximum time for a totally-ordered broadcast message to be received.

The sequential consistency algorithm required D time for writes and 0 time for reads, since we are assuming time for local computation is negligible.

Can we do better? To answer this question, we need some kind of timing model.

Page 34: Timing Model

Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast).

Assume that every message has delay in the range [d-u,d].

Claim: Totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d).
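
One classic construction that backs this claim (a sketch; not necessarily the implementation the slides intend): Lamport-timestamped broadcast with acknowledgements, where a message is delivered once it heads the timestamp-ordered queue and has been acked by all n processes. Each delivery then takes a constant number of message delays, i.e. O(d). The network interface is assumed, and correctness relies on FIFO channels.

```python
import heapq

class LamportTOBroadcast:
    """Sketch of totally ordered broadcast via Lamport timestamps + acks.
    Assumes FIFO channels and that network.send_all(m) delivers m to every
    process (including the sender), which then calls on_receive(*m)."""
    def __init__(self, my_id, n, network, deliver):
        self.my_id, self.n = my_id, n
        self.network = network
        self.deliver = deliver     # callback; same order at every process
        self.clock = 0             # Lamport logical clock
        self.queue = []            # heap of (timestamp, sender, payload)
        self.acks = {}             # (timestamp, sender) -> set of ackers

    def to_bc_send(self, payload):
        self.clock += 1
        self.network.send_all(('msg', self.clock, self.my_id, payload))

    def on_receive(self, kind, ts, sender, body):
        self.clock = max(self.clock, ts) + 1
        if kind == 'msg':
            heapq.heappush(self.queue, (ts, sender, body))
            self.acks.setdefault((ts, sender), set())
            # everyone acks every message
            self.network.send_all(('ack', self.clock, self.my_id, (ts, sender)))
        else:  # kind == 'ack'; body names the acked message
            self.acks.setdefault(body, set()).add(sender)
        self._try_deliver()

    def _try_deliver(self):
        # deliver the head once it is stable (acked by all n processes);
        # any message still in flight has a larger (timestamp, sender) pair
        while self.queue:
            ts, sender, body = self.queue[0]
            if len(self.acks.get((ts, sender), ())) < self.n:
                return
            heapq.heappop(self.queue)
            self.deliver(body)
```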

Page 35: Time and Clocks in Layered Model

Timed execution: associate an occurrence time with each node input event.

Times of other events are "inherited" from the time of the triggering node input (recall the assumption that local processing time is negligible).

Model hardware clocks as before: they run at the same rate as real time, but are not synchronized.

The notions of view, timed view, and shifting are the same: the Shifting Lemma still holds (it relates h/w clocks and msg delays between the original and shifted execs).

Page 36: Lower Bound for SC

Let Tread = worst-case time for a read to complete.

Let Twrite = worst-case time for a write to complete.

Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, Tread + Twrite ≥ d.

Page 37: SC Lower Bound Proof

Consider any SC simulation with Tread + Twrite < d. Let X and Y be two shared variables, both initially 0.

Let α0 be an admissible execution whose top layer behavior is:

write_0(X,1) ack_0(X) read_0(Y) return_0(Y,0)

where the write begins at time 0, the read ends before time d, and every msg has delay d.

Why does α0 exist? The alg. must respond correctly to any sequence of invocations. Suppose the user at p0 wants to do a write, immediately followed by a read. By SC, the read must return 0. By assumption, the total elapsed time is less than d.

Page 38: SC Lower Bound Proof

[Timeline diagram of α0: p0 performs write(X,1) and then read(Y) returning 0, all between time 0 and time d; p1 takes no visible steps.]

Page 39: SC Lower Bound Proof

Similarly, let α1 be an admissible execution whose top layer behavior is:

write_1(Y,1) ack_1(Y) read_1(X) return_1(X,0)

where the write begins at time 0, the read ends before time d, and every msg has delay d.

α1 exists for a similar reason.

Page 40: SC Lower Bound Proof

[Timeline diagrams: α0, in which p0 performs write(X,1) then read(Y) returning 0, and α1, in which p1 performs write(Y,1) then read(X) returning 0; each runs between time 0 and time d.]

Page 41: SC Lower Bound Proof

Now merge p0's timed view in α0 with p1's timed view in α1 to create an admissible execution α'. (The merge is possible because every msg has delay d and both executions end before time d, so neither process receives a message before its operations complete.)

But α' is not SC, contradiction! Both reads return 0, so in any legal permutation each read must precede the other process's write, while each write precedes its own process's read: a cycle.

Page 42: SC Lower Bound Proof

[Timeline diagrams: α0 and α1 as before, plus the merged execution α', in which p0 performs write(X,1) then read(Y) returning 0 while p1 performs write(Y,1) then read(X) returning 0, all before time d.]

Page 43: Linearizability Write Lower Bound

Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, Twrite ≥ u/2.

Proof: Consider any linearizable simulation with Twrite < u/2.

Let α be an admissible exec. whose top layer behavior is: p1 writes 1 to X, p2 writes 2 to X, p0 reads 2 from X.

Shift α to create an admissible exec. in which p1's and p2's writes are swapped, causing p0's read to violate linearizability.

Page 44: Linearizability Write Lower Bound

[Timeline diagram of α: on a time axis from 0 to u, p1 performs write 1, p2 performs write 2, and p0 then reads 2; the message delay pattern (values d - u/2, d, and d - u on the various links) is shown on the slide.]

Page 45: Linearizability Write Lower Bound

[Timeline diagram: shifting p1 later by u/2 and p2 earlier by u/2 yields an admissible execution (all delays still lie in [d - u, d]) in which p2's write 2 now precedes p1's write 1, yet p0 still reads 2: a violation of linearizability.]

Page 46: Linearizability Read Lower Bound

The approach is similar to the write lower bound.

Assume in contradiction there is an algorithm with Tread < u/4.

Identify a particular execution: fix a pattern of read and write invocations occurring at particular times, and fix the pattern of message delays.

Shift this execution to get one that is still admissible but not linearizable.

Page 47: Linearizability Read Lower Bound

Original execution: p1 reads X and gets 0 (old value). Then p0 starts writing 1 to X. When the write is done, p0 reads X and gets 1 (new value).

Also, during the write, p1 and p2 alternate reading X (as the diagram on page 49 shows).

At some point, the reads stop getting the old value (0) and start getting the new value (1).

Page 48: Linearizability Read Lower Bound

Set all delays in this execution to be d - u/2. Now shift p2 earlier by u/2.

Verify that the result is still admissible (every delay either stays the same or becomes d or d - u).

But in the shifted execution, the sequence of values read is:

0, 0, …, 0, 1, 0, 1, 1, …, 1

i.e., a read of the new value is followed by a read of the old value, which violates linearizability.

Page 49: Linearizability Read Lower Bound

[Timeline diagrams: p0 performs write 1 while p1 and p2 alternate reads of X. In the original execution the reads return 0, 0, 0, 0 and then 1, 1, 1, 1; after shifting p2 earlier by u/2, a read of 0 lands between reads of 1, matching the sequence on the previous slide.]

