Dist Pres

Date post: 06-Apr-2018
Upload: sandeep-saini
  • 8/2/2019 Dist Pres

    1/75

    Determining Global States of

    Distributed Systems

    Presented by

    Sanjeev R. Kulkarni

  • 8/2/2019 Dist Pres

    2/75

    Global State Detection 2

    References

    1. K. Mani Chandy and Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computer Systems, vol. 3, no. 1, Feb. 1985.

    2. Michael L. Powell and David L. Presotto, "Publishing: A Reliable Broadcast Communication Mechanism," Proceedings of the Ninth ACM Symposium on Operating Systems Principles, Oct. 1983.

    3. Ozalp Babaoglu and Keith Marzullo, "Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms," in Distributed Systems, Sape J. Mullender (ed.), Addison-Wesley, 1993.

  • 8/2/2019 Dist Pres

    3/75

    Outline of the talk

    Complexities of state detection in distributed systems

    The notion of consistent states

    The Distributed Snapshots algorithm

    Application to detecting stable properties and checkpointing

    Another approach to state recording: Publishing

  • 8/2/2019 Dist Pres

    4/75

    Model of Computation

    Finite set of processes

    Processes send messages on a finite set of unidirectional channels

    Channels are error-free, FIFO, and have infinite buffers

    Messages experience arbitrary but finite delays

    Strongly connected network

  • 8/2/2019 Dist Pres

    5/75

    Model of Computation (cont.)

    A computation is a sequence of events.

    An event is an atomic action that changes the state of a process and the state of at most one channel incident on that process.

    [Figure: timelines of processes p and q, passing through local states Sp0, Sp1, Sp2, Sp3 and Sq0, Sq1, Sq2, Sq3]

  • 8/2/2019 Dist Pres

    6/75

    Happened Before Relation

    Events e and e' of the same process:

    if e happens before e' then e → e'

    Events e and e' in two different processes:

    if e = send(m) and e' = recv(m) then e → e'

    Transitivity:

    if e → e' and e' → e'' then e → e''
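The three rules can be turned into a small computation. The sketch below is only an illustration: the function name `happened_before`, the event labels, and the input layout are assumptions, not part of the talk. It collects the edges given by the first two rules and then closes them under transitivity.

```python
from itertools import product


def happened_before(process_order, messages):
    """Return the set of pairs (e, f) such that e -> f.

    process_order: dict mapping a process name to its events in local order.
    messages: list of (send_event, recv_event) pairs.
    """
    # Rule 2: a send happens before the matching receive.
    edges = set(messages)
    # Rule 1: consecutive events of the same process are ordered.
    for events in process_order.values():
        edges.update(zip(events, events[1:]))

    # Rule 3: transitive closure (Warshall's algorithm, k is the outer loop).
    nodes = {event for pair in edges for event in pair}
    closure = set(edges)
    for k, i, j in product(nodes, repeat=3):
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure


# Hypothetical run: p2 sends a message that q receives as q2.
order = {"p": ["p1", "p2"], "q": ["q1", "q2", "q3"]}
hb = happened_before(order, [("p2", "q2")])
print(("p1", "q3") in hb)                        # True: p1 -> p2 -> q2 -> q3
print(("p1", "q1") in hb or ("q1", "p1") in hb)  # False: p1 and q1 are concurrent
```

Two events that are unrelated in either direction, such as p1 and q1 here, are concurrent; no observer inside the system can order them.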

  • 8/2/2019 Dist Pres

    7/75

    Determining Global States

    Global State

    The global state of a distributed computation is the set of local states of all individual processes involved in the computation plus the state of the communication channels.

  • 8/2/2019 Dist Pres

    8/75

    More on States

    Process state: memory state + register state + signal masks + open files + kernel buffers + ...

    Or application-specific information such as transactions completed, functions executed, etc.

    Channel state: messages in transit, i.e. those messages that have been sent but not yet received

  • 8/2/2019 Dist Pres

    9/75

    What's the need for global states?

    Many problems in distributed computing can be cast as executing some action on reaching a particular state, e.g.

    Distributed deadlock detection: finding a cycle in the Wait-For Graph

    Termination detection

    Checkpointing

    and many more
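As an illustration of the first item, here is a minimal sketch of cycle detection in a wait-for graph. It assumes the graph has already been assembled from a consistent global state; the function name `find_cycle` and the dictionary layout (process name mapped to the processes it waits for) are hypothetical.

```python
def find_cycle(wait_for):
    """Return one cycle in the wait-for graph as a list of processes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: the path closes a cycle.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(wait_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


deadlocked = find_cycle({"p": ["q"], "q": ["r"], "r": ["p"]})
healthy = find_cycle({"p": ["q"], "q": ["r"], "r": []})
print(deadlocked)  # ['p', 'q', 'r']
print(healthy)     # None
```

The hard part in a distributed system is not this search but obtaining a graph that reflects a consistent global state, which is what the rest of the talk addresses.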

  • 8/2/2019 Dist Pres

    10/75

    Why is global state determination difficult in distributed systems?

    Distributed state: information has to be collected from several machines!

    Only local knowledge: a process in the computation does not know the state of other processes.

  • 8/2/2019 Dist Pres

    11/75

    Difficulties

    Instantaneous recording is not possible

    No global clock: distributed recording of local states cannot be synchronized based on time

    Random network delays: no centralized process can initiate the detection

  • 8/2/2019 Dist Pres

    12/75

    Difficulties due to Non Determinism

    Deterministic Computation

    At any point in computation there is at most one event

    that can happen next.

    Non-Deterministic Computation

    At any point in computation there can be more than one

    event that can happen next.

  • 8/2/2019 Dist Pres

    13/75

    Deterministic Computation Example

    A variant of the producer-consumer example

    Producer code:

    while (1)
    {
        produce m;
        send m;
        wait for ack;
    }

    Consumer code:

    while (1)
    {
        recv m;
        consume m;
        send ack;
    }
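To see why this computation is deterministic, the two loops can be simulated together. The sketch below is an assumption-laden illustration: the state names (`ready`, `awaiting ack`, `waiting`, `consuming`) and the function `run` are made up for this example, and producing and sending are merged into one step. In every global state exactly one event is enabled, so the system cycles through the same four global states.

```python
from collections import deque


def run(steps):
    """Simulate the producer-consumer pair and list the global states visited."""
    to_consumer, to_producer = deque(), deque()   # the two FIFO channels
    producer, consumer = "ready", "waiting"       # local states (hypothetical names)
    trace = []
    for _ in range(steps):
        trace.append((producer, tuple(to_consumer), consumer, tuple(to_producer)))
        # Exactly one of the following events is enabled in any global state.
        if producer == "ready":
            to_consumer.append("m")               # produce m; send m
            producer = "awaiting ack"
        elif to_consumer:
            to_consumer.popleft()                 # recv m
            consumer = "consuming"
        elif consumer == "consuming":
            to_producer.append("ack")             # consume m; send ack
            consumer = "waiting"
        else:
            to_producer.popleft()                 # the ack arrives
            producer = "ready"
    return trace


trace = run(8)
for state in trace[:4]:
    print(state)
```

After four steps the trace repeats, which is the cycle shown in the deterministic state diagram.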

  • 8/2/2019 Dist Pres

    14/75

    Example: Initial State

    [Figure: producer and consumer, with message m]

  • 8/2/2019 Dist Pres

    15/75

    Example

    [Figure: producer and consumer, with message m]

  • 8/2/2019 Dist Pres

    16/75

    Example

    [Figure: producer and consumer, with message m]

  • 8/2/2019 Dist Pres

    17/75

    Example

    [Figure: producer and consumer, with acknowledgement a]

  • 8/2/2019 Dist Pres

    18/75

    Example

    [Figure: producer and consumer, with acknowledgement a]

  • 8/2/2019 Dist Pres

    19/75

    Example

    [Figure: producer and consumer, with acknowledgement a]

  • 8/2/2019 Dist Pres

    20/75

    Deterministic state diagram

  • 8/2/2019 Dist Pres

    21/75

    Non-deterministic computation

    3 processes

    [Figure: processes p, q, and r exchanging messages m1, m2, and m3]

  • 8/2/2019 Dist Pres

    22/75

    Three possible runs

    [Figure: three space-time diagrams of processes p, q, and r, each with a different ordering of messages m1, m2, and m3]

  • 8/2/2019 Dist Pres

    23/75

    A Non-Deterministic Computation

    All these states are feasible

  • 8/2/2019 Dist Pres

    24/75

    Feasible and Actual States

    Any state that an external observer could have observed is a feasible state

    A state that an external observer did observe is an actual state

  • 8/2/2019 Dist Pres

    25/75

    A Non-Deterministic Computation

    Only some states are actual

  • 8/2/2019 Dist Pres

    26/75

    Non-Determinism

    Deterministic computation

    A local event reveals everything about the global state! A process can infer the states of the other processes

    Not so for a non-deterministic computation!


  • 8/2/2019 Dist Pres

    27/75

    A naive snapshot algorithm

    Processes record their state at any arbitrary

    point

    A designated process collects these states

    + So simple!!

    - Correct??

  • 8/2/2019 Dist Pres

    28/75

    Example

    Producer-consumer problem

    p records its state

    [Figure: processes p and q, with message m]

  • 8/2/2019 Dist Pres

    29/75

    Example

    [Figure: processes p and q, with message m]

  • 8/2/2019 Dist Pres

    30/75

    Example

    q records its state

    [Figure: processes p and q, with message m]

  • 8/2/2019 Dist Pres

    31/75

    Example

    The recorded state

    [Figure: recorded states of processes p and q, with message m shown twice]

  • 8/2/2019 Dist Pres

    32/75

    Where did we err?

    What did we do?

    [Figure: timelines of processes p and q, with message m]

  • 8/2/2019 Dist Pres

    33/75

    Error!!

    The sender has no record of the send

    The receiver has a record of the receipt

    Result

    The global state records the receive event but not the send event, violating the happened-before relation!

  • 8/2/2019 Dist Pres

    34/75

    The notion of Consistency

    A global state is consistent if it could have been observed by an external observer

    If e → e' then it is never the case that e' is observed by the external observer and e is not

    All feasible states are consistent
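The consistency condition can be checked mechanically. The sketch below is an illustration under assumptions: a recorded global state is represented as a "cut" that says how many events of each process it includes, and the names `is_consistent`, `send_m`, and `recv_m` are invented for this example. Because a cut always takes a prefix of each process's events, only the send/receive rule needs checking.

```python
def is_consistent(cut, process_order, messages):
    """Check that no receive is recorded without its matching send.

    cut: dict mapping each process to the number of its events included.
    process_order: dict mapping each process to its events in local order.
    messages: list of (send_event, recv_event) pairs.
    """
    position = {
        event: (proc, index)
        for proc, events in process_order.items()
        for index, event in enumerate(events, start=1)
    }

    def included(event):
        proc, index = position[event]
        return index <= cut[proc]

    # A recorded receive whose send is missing violates happened-before.
    return all(included(send) or not included(recv) for send, recv in messages)


# The naive-snapshot failure: p recorded before sending m, q after receiving it.
order = {"p": ["send_m"], "q": ["recv_m"]}
msgs = [("send_m", "recv_m")]
print(is_consistent({"p": 0, "q": 1}, order, msgs))  # False
print(is_consistent({"p": 1, "q": 0}, order, msgs))  # True: m is in transit
```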

  • 8/2/2019 Dist Pres

    35/75

    An Example

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

  • 8/2/2019 Dist Pres

    36/75

    A Consistent State?

    [Figure: the same space-time diagram of p and q, with the global state (Sp1, Sq1) marked]

  • 8/2/2019 Dist Pres

    37/75

    Yes

    [Figure: the same space-time diagram of p and q, with the global state (Sp1, Sq1) marked]

  • 8/2/2019 Dist Pres

    38/75

    A Consistent State?

    [Figure: the same space-time diagram of p and q, with the global state (Sp2, Sq3) and message m3 marked]

  • 8/2/2019 Dist Pres

    39/75

    Yes

    [Figure: the same space-time diagram of p and q, with the global state (Sp2, Sq3) and message m3 marked]

  • 8/2/2019 Dist Pres

    40/75

    An inconsistent State

    [Figure: the same space-time diagram of p and q, with the global state (Sp1, Sq3) marked]

  • 8/2/2019 Dist Pres

    41/75

    Chandy and Lamport Algorithm

    Features:

    Does not promise to give us exactly what is there

    But gives us a consistent state!!

  • 8/2/2019 Dist Pres

    42/75

    A brief sketch of the algorithm

    (from process p's perspective)

    p sends a marker message along all its outgoing channels after it records its state and before it sends any other messages.

    On receipt of a marker message from channel c:

    if p has not recorded its state
        record the state
        state(c) = EMPTY
    else
        state(c) = messages received on c since p recorded its state, excluding the marker.
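The marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not the paper's formulation: the `Process` class, the `MARKER` sentinel, the integer "balance" used as process state, and the dictionary of FIFO channels are all hypothetical choices made for illustration.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel; application messages here are ints


class Process:
    def __init__(self, name, balance, peers):
        self.name, self.balance = name, balance
        self.peers = peers               # names of all other processes
        self.recorded_state = None       # local snapshot, once taken
        self.channel_state = {}          # incoming channel -> recorded messages
        self.recording = set()           # incoming channels still being recorded

    def record(self, net):
        """Record the local state, then send a marker on every outgoing channel."""
        self.recorded_state = self.balance
        self.channel_state = {src: [] for src in self.peers}
        self.recording = set(self.peers)
        for dst in self.peers:
            net[(self.name, dst)].append(MARKER)

    def send(self, net, dst, amount):
        self.balance -= amount
        net[(self.name, dst)].append(amount)

    def receive(self, net, src):
        msg = net[(src, self.name)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record(net)         # first marker: state(c) stays EMPTY
            self.recording.discard(src)  # channel c is now fully recorded
        else:
            self.balance += msg
            if src in self.recording:    # arrived after recording, before the marker
                self.channel_state[src].append(msg)


net = {("p", "q"): deque(), ("q", "p"): deque()}
p, q = Process("p", 100, ["q"]), Process("q", 50, ["p"])
p.send(net, "q", 30)   # p now holds 70; 30 is in transit
q.record(net)          # q initiates: records 50, sends a marker to p
p.receive(net, "q")    # p gets the marker: records 70, channel from q is empty
q.receive(net, "p")    # the 30 reaches q after q recorded: part of the channel state
q.receive(net, "p")    # p's marker closes the channel from p
total = p.recorded_state + q.recorded_state + sum(q.channel_state["p"])
print(p.recorded_state, q.recorded_state, q.channel_state, total)
```

The recorded total (150) equals the total in the system, even though the processes recorded at different times; the in-transit transfer is captured as channel state rather than lost or counted twice.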

  • 8/2/2019 Dist Pres

    43/75

    Algorithm in Action

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

  • 8/2/2019 Dist Pres

    44/75

    Algorithm in Action

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

    q records its state as Sq1 and sends a marker to p

  • 8/2/2019 Dist Pres

    45/75

    Algorithm in Action

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

    p records its state as Sp2 and the channel state as empty

  • 8/2/2019 Dist Pres

    46/75

    Algorithm in Action

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

    q records the channel state as m3

  • 8/2/2019 Dist Pres

    47/75

    Algorithm in Action

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

    Recorded global state = ((Sp2, Sq1), (∅, m3))

  • 8/2/2019 Dist Pres

    48/75

    Why this is consistent

    Proof that if recv(m) is recorded then send(m) is

    also recorded.

    [Figure: processes p and q, with message m and marker M]

  • 8/2/2019 Dist Pres

    49/75

    Algorithm in Action

    [Figure: space-time diagram of processes p and q with local states Sp0 to Sp3 and Sq0 to Sq3 and messages m1, m2, m3]

    Recorded global state = ((Sp2, Sq1), (∅, m3))

    Moral: the computation may never have passed through the recorded state!

  • 8/2/2019 Dist Pres

    50/75

    What have we recorded

    The recorded consistent state can be anything!

  • 8/2/2019 Dist Pres

    51/75

    Properties of the recorded global state

    If Si and Sj are the global states when the Chandy-Lamport algorithm started and finished, respectively, and S* is the state recorded by the algorithm, then

    S* is reachable from Si

    Sj is reachable from S*

  • 8/2/2019 Dist Pres

    52/75

    S* Is reachable from Si

    [Figure: diagram of global states showing Si and Sj]

  • 8/2/2019 Dist Pres

    53/75

    Sj Is reachable from S*

    [Figure: diagram of global states showing Si and Sj]

  • 8/2/2019 Dist Pres

    54/75

    Still what good is it?

    Stable Properties

    A property P is called a stable property iff P(S) implies P(S') for all states S' reachable from S

    E.g. deadlock, termination, token loss

  • 8/2/2019 Dist Pres

    55/75

    Stable Properties

    [Figure: diagram of global states showing Si, Sj, and S*]

  • 8/2/2019 Dist Pres

    56/75

    Stable Properties

    [Figure: diagram of global states showing Si, Sj, and S*]

  • 8/2/2019 Dist Pres

    57/75

    Detection of Stable Properties

    outcome = false;

    while (outcome == false)
    {
        determine global state S;
        outcome = P(S);    /* P is the stable property being tested */
    }
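The loop works because of the two reachability properties of the recorded state: if the property holds in the recorded state S*, it still holds in Sj when the algorithm finishes, and if it already held in Si, it holds in S*. A minimal runnable sketch follows; `detect`, `take_snapshot`, and the dictionary layout of a snapshot are hypothetical names, and the snapshots are canned values standing in for real runs of the snapshot algorithm.

```python
def detect(stable_property, take_snapshot, max_rounds=1000):
    """Repeat snapshots until the property holds; return the witnessing state."""
    for _ in range(max_rounds):
        state = take_snapshot()
        if stable_property(state):
            return state
    return None


def terminated(state):
    # Termination: no process is active and no message is in transit.
    return not state["active"] and state["in_transit"] == 0


# Canned snapshots standing in for successive runs of the snapshot algorithm.
snapshots = iter([
    {"active": {"p", "q"}, "in_transit": 1},
    {"active": set(), "in_transit": 1},
    {"active": set(), "in_transit": 0},
])
result = detect(terminated, lambda: next(snapshots))
print(result)
```

The `max_rounds` bound is an addition for this example so that the loop cannot spin forever when the property never becomes true.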

  • 8/2/2019 Dist Pres

    58/75

    Checkpointing

    S* serves as a checkpoint

    On a failure, restart the computation from S*

    Problem!

    Not able to restore to Sj

    [Figure: diagram of global states showing Si, Sj, and S*]

  • 8/2/2019 Dist Pres

    59/75

    Solution: Publishing

    A Broadcast medium

    A central recorder process records all the messages received by each process

    Processes record their states at their own time and send them to the recorder

  • 8/2/2019 Dist Pres

    60/75

    Architecture of Publishing

    [Figure: recorder with processes p (state Sp1) and q (state Sq1)]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     -           -
    q    Sq1     -           -

  • 8/2/2019 Dist Pres

    61/75

    q sends the message

    [Figure: recorder with processes p (state Sp1) and q (state Sq2); message m1 in transit]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     -           -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    62/75

    p sends an ack

    recorder records m1

    [Figure: recorder with processes p (state Sp2) and q (state Sq2)]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    63/75

    Determining Global State

    The recorder can construct the global state from

    the checkpointed states of all processes

    plus

    the messages received since the last checkpoint

  • 8/2/2019 Dist Pres

    64/75

    Problems

    Publishing keeps track of all messages

    received by each process

    Expensive!

    Solution

    When the recorder takes a checkpoint of process p at time t, it deletes all messages received by p before t.
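The recorder's bookkeeping can be sketched as follows. This is a simplified illustration, not the actual Publishing system: the `Recorder` class and its method names are hypothetical, process states are plain values, and resetting the sent counter at each checkpoint is an assumption made to keep the example small.

```python
class Recorder:
    """Central recorder: one checkpoint and one message log per process."""

    def __init__(self):
        self.checkpoint = {}   # process id -> last checkpointed state
        self.received = {}     # process id -> messages received since that checkpoint
        self.sent = {}         # process id -> messages sent since that checkpoint

    def store_checkpoint(self, pid, state):
        # A new checkpoint makes the older log entries unnecessary.
        self.checkpoint[pid] = state
        self.received[pid] = []
        self.sent[pid] = 0

    def observe(self, src, dst, msg):
        # Every message on the broadcast medium is also seen by the recorder.
        self.sent[src] += 1
        self.received[dst].append(msg)

    def recover(self, pid, apply):
        """Rebuild a crashed process: restore its checkpoint, replay its messages."""
        state = self.checkpoint[pid]
        for msg in self.received[pid]:
            state = apply(state, msg)
        return state


def add(state, msg):
    # Hypothetical application logic: a process adds each message to its state.
    return state + msg


rec = Recorder()
rec.store_checkpoint("p", 0)
rec.store_checkpoint("q", 10)
rec.observe("q", "p", 5)          # q sends 5 to p; the recorder logs it
restored = rec.recover("p", add)  # p crashes: checkpoint 0, replay 5
print(restored, rec.sent["q"])    # 5 1
rec.store_checkpoint("p", restored)
print(rec.received["p"])          # []
```

The sent counter is what lets the recorder ignore messages that a recovering process sends again while it re-executes from its checkpoint.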

  • 8/2/2019 Dist Pres

    65/75

    p checkpoints

    [Figure: recorder with processes p (state Sp2) and q (state Sq2)]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    66/75

    Recorder stores Sp2

    deletes m1

    [Figure: recorder with processes p (state Sp2) and q (state Sq2)]

    ID   STATE   MSGS RECD   SENT
    p    Sp2     -           -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    67/75

    The initial situation

    [Figure: recorder with processes p (state Sp2) and q (state Sq2)]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    68/75

    Say p crashes

    [Figure: recorder with processes p (crashed) and q (state Sq2)]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    69/75

    Recorder reinstates p to Sp1

    [Figure: recorder restores p to state Sp1; q is in state Sq2]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    70/75

    Replays back m1

    [Figure: recorder replays m1 to p, which reaches state Sp2; q is in state Sq2]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    71/75

    q crashes

    [Figure: recorder with processes p (state Sp2) and q (crashed)]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    72/75

    Recorder reinstates q to Sq1

    [Figure: recorder restores q to state Sq1; p is in state Sp2]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    73/75

    Ignore m1

    [Figure: q, restored to state Sq1, sends m1 again; p is in state Sp2]

    ID   STATE   MSGS RECD   SENT
    p    Sp1     m1          -
    q    Sq1     -           1

  • 8/2/2019 Dist Pres

    74/75

    Comparison

                     SNAPSHOT             PUBLISHING
    Network          Strongly connected   Need not be
    Mode             Distributed          Centralized
    Scalability      Yes                  No
    Restorability    No                   Yes

  • 8/2/2019 Dist Pres

    75/75

    Summary

    Global state detection is difficult in distributed systems

    The snapshot algorithm may not give an actual state but is very helpful in detecting stable properties

    Publishing gives an asynchronous way of determining global states but does not scale

