CS294, Yelick Time, p1
CS 294-8: Time, Clocks, and Snapshots in Distributed Systems
http://www.cs.berkeley.edu/~yelick/294
Agenda
• Concurrency models
  – Partial orders, state machines, …
• Correctness conditions
  – Serializability, …
  – Safety and liveness
• Clock synchronization
• Bag of tricks for distributed algorithms
  – Timestamps, markers, …
Common Approach #1
• Reasoning about a concurrent system: partially order the events whose ordering you can observe
  – Messages, memory operations, events
• Consider all possible topological sorts to serialize
• Each of those serial "histories" must be "correct."
Happens-Before Relation
• A system is a set of processes
• Each process is a sequence of events (a total ordering on its events)
• Happens-before, denoted ->, is defined as the smallest relation s.t.:
  – 1) if a and b are in the same process, and a is before b, then a -> b
  – 2) if a is a message send and b is the matching receive, then a -> b
  – 3) if a -> b and b -> c, then a -> c (transitivity)
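The happens-before relation can be checked mechanically with vector clocks, a standard companion to this definition (vector clocks are not covered on these slides; the class and helper names below are invented for illustration):

```python
# Sketch: tracking happens-before with vector clocks.
class VectorClock:
    def __init__(self, pid, nprocs):
        self.pid = pid
        self.v = [0] * nprocs

    def local_event(self):
        """Rule 1: a local event advances this process's own component."""
        self.v[self.pid] += 1
        return tuple(self.v)            # timestamp of this event

    def send(self):
        """Rule 2: the timestamp travels with the message."""
        return self.local_event()

    def receive(self, ts):
        """Rule 2: merge the sender's timestamp, then count the receive."""
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        return self.local_event()

def happens_before(ts_a, ts_b):
    """a -> b iff a's clock is componentwise <= b's and strictly less somewhere."""
    return all(x <= y for x, y in zip(ts_a, ts_b)) and ts_a != ts_b
```

Two events are concurrent exactly when `happens_before` holds in neither direction.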
Happens-Before Example
[space-time diagram: per-process event sequences with messages crossing between them, time increasing to the right]
• What if processes are multithreaded?
• How do we determine the events?
  – Send message, receive message, what else?
  – What about reads/writes? Nonblocking operations?
• Does this help if we're trying to reason about a program or an algorithm?
Common Approach #2
• Reasoning about a concurrent system: view the system as a state machine that produces serial executions:
  – Invocation/response events, send/receive message (with an explicit model of the network in between)
• Each of the global serial executions must be "correct."
State Machine Approach
• A distributed system is:
  – A finite set of processes
  – A finite set of channels (network links)
• A channel is:
  – Reliable, order-preserving, and with infinite buffer space
• A process is:
  – A set of states, with one initial state
  – A set of events
• Models differ on specific details
Comparison of Models
• Which approach (partial orders vs. state machines) is better?
• Is one lower level than the other?
• Is one for shared memory and the other for message passing?
Common Notions of Correctness
• Serializability, strong serializability
• Sequential consistency, total store ordering, weak ordering, …
• Linearizability (used in wait-free algorithms)
• All of these are based on the idea that operations (transactions) must appear to happen in some serial order
• Which of these (or others) are useful?
Variations on Correctness
• Why are there so many notions?
  – Do all processes observe the same "serial" order, or can some see different views?
  – Are the specifications of each operation deterministic? Can processes see different "correct" behaviors?
  – Are all operations executed by a process ordered? E.g., the order of read x -> read y may not matter.
  – Is the observed order consistent with real time?
Closed-World Assumption
• Most of these correctness notions assume that all communication is part of the system.
• Anomalies come from:
  – Phone calls between users
  – A second "external" network
• How does one prevent these anomalies?
Clock Condition
• A logical clock assigns a timestamp C<a> to each event a.
• Clock Condition: for any events a, b:
  – if a -> b then C<a> < C<b>
• To implement such a clock:
  – Increment the clock between each pair of events within a process
  – When you send a message, append the current time
  – When you receive a message, update your own clock to max(timestamp, my-current-time) + 1
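The three implementation rules above can be sketched directly (a minimal Lamport-clock sketch; the class name is invented):

```python
# Minimal sketch of a Lamport logical clock.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: increment the clock between events."""
        self.time += 1
        return self.time

    def send(self):
        """Return a timestamp to append to an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """On receive, jump past the sender's timestamp."""
        self.time = max(msg_time, self.time) + 1
        return self.time
```

By construction, a message send gets a strictly smaller timestamp than its matching receive, which is what the Clock Condition requires.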
Mutual Exclusion
• Lamport defines a total order => as the clock ordering with ties broken by PID
• "Clocks" are not synchronized a priori
  – Loosely synchronized within the algorithm
• Used in a mutual exclusion algorithm
  – How useful is this algorithm?
  – Is the specification useful?
Clock Synchronization
• Two notions of synchronization:
  – External: within some bound of UTC (Coordinated Universal Time)
  – Internal: within some bound of the other clocks in the system
• Huge literature on clock synchronization under various models (e.g., PODC).
  – Impossible in general due to message delays
  – Many algorithms synchronize clocks with high probability (within some bound)
• Are synchronized clocks useful? For fault tolerance or in general?
Clock Synchronization Algorithms
• Cristian [89] describes a centralized time-server approach
  – Set time = t_server + t_round_trip/2
  – No fault tolerance to server failure
• Berkeley Algorithm [89]:
  – Master polls slaves and records round-trip times to estimate their local times
  – Averages the "non-faulty" clocks
  – Sends a delta (not a new time) to each slave
  – Master can be re-elected
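Cristian's estimate (time = t_server + t_round_trip/2) can be sketched in a few lines; `read_server_clock` below is a stand-in for the network request to the time server:

```python
import time

def cristian_sync(read_server_clock):
    """Estimate the current time from one round trip to a time server.

    Cristian's algorithm assumes the reply took about half the measured
    round trip to arrive, so the server's reading is aged by rtt/2.
    """
    t0 = time.monotonic()
    server_time = read_server_clock()   # t_server (normally a network call)
    rtt = time.monotonic() - t0         # t_round_trip
    return server_time + rtt / 2.0      # t_server + t_round_trip/2
```

The estimate's error is bounded by half the round-trip time, which is why the approach degrades on slow or asymmetric links.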
Clock Synchronization Algorithms
• Network Time Protocol (NTP) [Mills 95], meant for the wide area:
  – Uses statistical techniques to filter clock data
  – Servers can reconfigure after faults
  – Protected (authentication used)
• Design:
  – Primary servers connected to a radio clock
  – Secondary servers synchronize with primaries, with distance from a primary defining the strata
  – Modes: multicast, procedure call, symmetric
  – Accuracies of 1-10s of milliseconds reported
Stable Properties
• Termination detection, garbage detection, and deadlock are stable properties: once true, they remain true "forever."
• Snapshot algorithms are useful for detecting stable properties.
• What properties related to faults are stable?
Safety and Liveness
• A safety property says:
  – informally: nothing bad ever happens
  – formally: any violation of a safety property can be observed on a finite prefix of the execution
  – this is a partial correctness condition
• A liveness property says:
  – informally: something good eventually happens
  – formally: it is a property of an infinite execution that cannot be checked on finite prefixes
  – this is a total correctness condition (combined with safety)
• What about real-time constraints?
Snapshot Algorithm
• Each process records its own state
• Each pair of processes that share a channel coordinate to save the channel's state
[diagram: processes p and q connected by channels c (p to q) and c' (q to p)]
• Example: single-token conservation system
  – s0 = no token; s1 = has token
  – token in p --> snapshot of p --> token in c --> snapshot of q and of the channels by q => snapshot sees 2 copies of the token
Snapshot Algorithm
• Idea: mark a point in the message stream
  – p sends a marker after recording its state and before sending any later message on channel c
  – When q receives the marker, it either
    • records its current state and records c as empty, or
    • keeps its previously recorded state and records c as containing all messages received after it recorded and before the marker
  – Works for n strongly connected processes
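The marker rule can be rendered as a toy simulation for two processes connected by a FIFO channel in each direction (queues stand in for the network; names and structure are invented, and a full implementation would handle n processes with one recording flag per incoming channel):

```python
# Simplified Chandy-Lamport-style snapshot sketch for two processes.
from collections import deque

MARKER = object()

class Process:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.recorded_state = None   # local-state snapshot, once taken
        self.channel_log = []        # messages recorded for the in-channel
        self.recording = False

    def start_snapshot(self, out_channel):
        self.recorded_state = self.state   # record own state first...
        self.recording = True              # ...then log the incoming channel
        out_channel.append(MARKER)         # marker precedes any later message

    def receive(self, msg, out_channel):
        if msg is MARKER:
            if self.recorded_state is None:
                self.recorded_state = self.state  # record now; channel is empty
                out_channel.append(MARKER)        # forward the marker
            self.recording = False         # channel contents are now fixed
        elif self.recording:
            self.channel_log.append(msg)   # in flight when the snapshot began
        # otherwise the message is simply delivered to the application
```

Replaying the token example: p snapshots while holding the token and only then sends it, so the snapshot records p in s1, q in s0, and both channels empty: exactly one token, as required.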
Related Algorithms
• Dijkstra-Scholten termination detection
  – Create an implicit spanning tree: the sender of your first message is your parent
  – Keep track of children awaiting termination, and signal your parent when they (and you) are done.
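The spanning-tree bookkeeping can be sketched as a single-threaded toy (the Node class and driver are invented, and the per-message acknowledgement deficits of the full algorithm are omitted):

```python
# Toy sketch of Dijkstra-Scholten termination detection.
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None      # sender of our first message (tree edge)
        self.children = 0       # tree children not yet signalled done
        self.active = False

    def on_message(self, sender):
        if self.parent is None and sender is not None:
            self.parent = sender          # first message: join the tree
            sender.children += 1
        self.active = True

    def on_idle(self):
        """Local computation finished; signal up the tree if possible."""
        self.active = False
        self._try_signal()

    def _try_signal(self):
        # An idle node with no pending children detaches and signals its parent.
        if not self.active and self.children == 0 and self.parent is not None:
            self.parent.children -= 1
            p, self.parent = self.parent, None
            p._try_signal()               # the parent may now be done too
```

The root (which has no parent) detects global termination when it is idle and its child count has dropped to zero.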
Using State Machines to Model Parallel Systems
• A large parallel application is built from a set of communicating objects
• Its correctness can be divided into two problems:
  – the correctness of each of the objects
  – the correctness of the system that uses the objects
• Both can be modeled as state machines
• Examples: a distributed hash table, a distributed task queue
Atomicity
• In any attempt to reason about concurrency, one needs to determine the level of atomicity
  – what is the smallest indivisible event?
    • reads and writes to memory (usually)
    • basic arithmetic and conditional operations
    • message sends/receives
• Often too complex to do all reasoning at this level
  – group a set of low-level events within a function/method into a higher-level atomic event
• Leads to multi-level, modular proofs
Specifications and Implementations
• The first step in reasoning about correctness is having a specification of the desired behavior
  – what sequences of operations (inputs and results) are legal?
• A specification and implementation may be written
  – in the same formal language
  – in two different languages
• Concentrate here on the behaviors produced, avoiding syntax
Correctness of Serial Types
• A sequential ADT may be specified by pre/post conditions on its operations
  – insert(s, e)
    • pre: true
    • post: s = s + { e }
  – remove(s, e)
    • pre: e in s
    • post: s = s - { e }
• An implementation is correct if, given any sequence of these operations, they meet the specifications
[timeline: insert, remove, … laid out along a time axis]
Linearizability
• Taken from "Linearizability: A Correctness Condition for Concurrent Objects" by Herlihy and Wing, TOPLAS, July 1990.
• Intuition behind the correctness condition:
  – each operation should appear instantaneous
  – the order of nonconcurrent operations should be preserved
• Using this intuition, examine some histories and determine if they are acceptable
• Example: a queue ADT with
  – enqueue (E) and
  – dequeue (D) operations
Some Queue Histories
[originally shown as overlapping interval timing diagrams; the per-process operation orders are:]
• History 1, acceptable:
  – A: q.E(x), q.D(y), q.E(z)
  – B: q.E(y), q.D(x)
• History 2, not acceptable:
  – A: q.E(x), q.D(y)
  – B: q.E(y)
• History 3, not acceptable:
  – A: q.E(x), q.D(y)
  – B: q.E(y), q.D(y)
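Since the interval timing of the original diagrams is lost here, the following sketch checks a weaker property: whether some interleaving of the two per-process orders forms a legal sequential FIFO history (helper names invented). History 3 fails even this weak check because y is dequeued twice; History 2 passes it, and is ruled out only by the real-time interval order that linearizability (but not this sketch) takes into account.

```python
# Check whether two per-process queue histories have a legal interleaving.
def legal_queue_history(seq):
    """seq is a list of ('E', v) / ('D', v); check FIFO queue semantics."""
    q = []
    for op, v in seq:
        if op == 'E':
            q.append(v)
        elif not q or q[0] != v:      # a dequeue must return the head
            return False
        else:
            q.pop(0)
    return True

def interleavings(a, b):
    """Yield all interleavings of a and b that preserve each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def some_legal_order(a, b):
    return any(legal_queue_history(s) for s in interleavings(a, b))
```

For History 1 the witness is E(x) A, E(y) B, D(x) B, D(y) A, E(z) A.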
Execution Model Details
• A concurrent system is a
  – collection of sequential threads
  – communicating through shared objects
• Each object has
  – a unique name
  – a type, which defines the set of legal operations
• Formally, each operation invocation is 2 events:
  – an invocation event: <obj op(args*) A>
  – a response event: <obj term(res*) A>
  – where term is a termination condition: OK, exception, etc.
  – an invocation and a response match if they have the same process name and the same object name
Histories
• Assume invocation and response events are totally ordered (although operations may overlap)
• A history is a finite sequence of invocation and response events
• An invocation is pending in a history if no matching response follows it.
• Given a history H, complete(H) is the maximal subsequence of H consisting only of invocations and matching responses
• A history H is sequential if:
  – the first event in H is an invocation
  – each invocation (except possibly the last) is immediately followed by a matching response
• A history that is not sequential is concurrent.
Ordering in Histories
• A history H induces an irreflexive partial order <H
• op1 <H op2 iff the response for op1 appears before the invocation for op2 in H
• Example [shown graphically on the slide as four overlapping intervals]:
  – op1 < op2 & op1 < op4
  – op3 < op2 & op3 < op4
• The same history presented non-graphically:
  op1_inv, op3_inv, op1_res, op3_res, op2_inv, op4_inv, op2_res, op4_res
Linearizability
• A process subhistory H|P ("H at P") is the subsequence of all events in H whose process name is P
• Two histories H1 and H2 are equivalent if for all processes P, H1|P = H2|P
• A history H is linearizable if it can be extended (by appending zero or more response events) to a history H' such that
  – L1: complete(H') is equivalent to some legal sequential history S
  – L2: <H ⊆ <S
Observations on Linearizability
• Linearizability is stronger than sequential consistency
• Linearizability is composable, which is nice for building large systems of component objects
• Intuitively, linearizability states that operations must appear to take effect sometime between their invocation and response
• Too strong for large-scale machines and distributed data structures?
Relaxed Consistency Model
• In the Multipol project, we used a weaker notion of correctness
• Each operation was divided into an invocation part and a completion part
  – like put/sync and get/sync in Split-C
  – e.g., E.update_mesh & E.sync_update
  – e.g., T.insert_element & T.sync_insert
• This worked well for several example problems
• Data structures do not compose well
  – need more than 2 phases in some cases
Correctness
• An implementation is a set of histories in which events of two objects
  – a representation object, REP
  – an abstract object, ABS
  are interleaved in a constrained way
• For each history H in the implementation:
  – 1) the subhistories H|REP and H|ABS are well-formed
  – 2) for each process P, each REP operation in H|P lies within an abstract operation in H|P
  – e.g., q.enq_inv(x), lock(q.l), q.back++, q.dat[back]=x, unlock(q.l), q.enq_res/OK
• An implementation is correct if H|ABS is linearizable
Safety and Liveness Examples
• Examples of safety properties
  – There are no race conditions
  – This queue is never empty
  – One thread never accesses another thread's data
  – No arithmetic operation in the program ever overflows
• Examples of liveness properties
  – The program eventually terminates
  – The scheduler is "fair": every enabled thread eventually gets to run
  – The scheduler is "fair": a thread enabled infinitely often will get to run infinitely often
  – The set will eventually contain the Gröbner basis of the input
Proving Safety and Liveness
• Techniques for proving safety properties are relatively straightforward
  – sometimes many states and cases for a particular system
  – extends ideas from reasoning about sequential code
• Techniques for proving liveness are much more difficult
  – often want something stronger than liveness as well, such as a time bound
  – some of the formal frameworks make proofs of particular liveness properties easier, e.g., via fairness
  – often ensure a liveness property using a stronger safety property
Methods for Proving Correctness (Safety)
• In serial implementations, the subset of REP values that are legal representations of ABS values are those that satisfy the representation invariant
  – I: REP -> BOOL, a predicate on REP values
• The meaning of a legal representation is defined by an abstraction function
  – A: REP -> ABS
• In sequential programs, the abstract operations may go through a set of states in which the invariant I is not true
Existence of Abstraction Functions
There are three cases that arise in trying to prove that an abstraction function exists in a concurrent system:
• The function can be defined directly on the REP state
• A history variable is needed to record a past event
• A prophecy variable is needed to record a future event
(Alternatively, you may use an abstraction relation.)
[diagram: the abstraction function maps REP states t and t' to ABS states A(t) and A(t')]
Example 1: Locked Queue
• Given a queue implementation containing
  – integers back, front
  – an array of values, items
  – a lock, l

Enq = proc (q: queue, x: item)    // ignoring buffer overflow
  lock(l)
  i: int = q.back++               // allocate new slot
  q.items[i] = x                  // fill it
  unlock(l)

Deq = proc (q: queue) returns (item) signals empty
  lock(l)
  if (back == front) signal empty
  else
    front++
    ret: item = items[front]
  unlock(l)
  return(ret)
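The pseudocode translates almost line-for-line into a runnable sketch (Python stands in for the slide's notation, and a plain list replaces the front/back indices, since the slide ignores buffer bounds anyway):

```python
# Runnable rendition of the slide's locked queue.
import threading

class LockedQueue:
    def __init__(self):
        self.items = []
        self.lock = threading.Lock()

    def enq(self, x):
        with self.lock:               # lock(l) ... unlock(l)
            self.items.append(x)      # allocate a new slot and fill it

    def deq(self):
        with self.lock:
            if not self.items:        # back == front
                raise IndexError("empty")   # signal empty
            return self.items.pop(0)  # front++; read items[front]
```

Because every operation runs entirely inside the lock, each enq/deq is atomic and the abstraction function on the next slide applies directly.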
Simple Abstraction Function
• The abstraction function maps the elements items[front…back] to the abstract queue value
• The proof is straightforward
• The lock prevents the "interesting" cases
Example 2: Statistical DB Type
• Given a "statistical DB" spec with the operations
  – add(x): add a new number, x, to the DB
  – size(): report the number of elements in the DB
  – mean(): report the mean of all elements in the DB
  – variance(): report the variance of all elements in the DB
• A straightforward implementation keeps the set of values
• A more compact one uses only three values:
  – integer count, initially 0    // number of elements
  – float sum, initially 0        // sum of elements
  – float sumSquare, initially 0  // sum of squares of all elements
• Implementations of the operations are straightforward
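The compact representation can be sketched as follows (assuming population variance, computed as E[x²] − E[x]²):

```python
# Sketch of the compact statistical-DB representation: count/sum/sumSquare.
class StatDB:
    def __init__(self):
        self.count = 0
        self.sum = 0.0
        self.sum_square = 0.0

    def add(self, x):
        self.count += 1
        self.sum += x
        self.sum_square += x * x

    def size(self):
        return self.count

    def mean(self):
        return self.sum / self.count

    def variance(self):
        # population variance: E[x^2] - (E[x])^2
        m = self.mean()
        return self.sum_square / self.count - m * m
```

Note that the set of values itself is never stored: that gap between spec and representation is exactly what motivates the history variable on the next slide.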
Need for History Variables
• The problem with verifying this example is that the specification contains more information than the implementation
• Proof idea:
  – add a variable, items, to the representation state (for the proof only)
  – the implementation may update items
  – it may not use items to compute operations
• The abstraction function, A, for the augmented DB simply maps the items field to the abstract state
• Need to prove the representation invariants:
  – count = items.size
  – sum = sum({x | x in items})
  – sumSquare = sum({x^2 | x in items})
Example 3: Queue Implementation
• Given a queue implementation containing
  – an integer, back
  – an array of values, items

Enq = proc (q: queue, x: item)    // ignoring buffer overflow
  i: int = INC(q.back)            // allocate new slot, atomic
  STORE(q.items[i], x)            // fill it

Deq = proc (q: queue) returns (item)
  while true do
    range: int = READ(q.back) - 1
    for i: int in 1..range do
      x: item = SWAP(q.items[i], null)
      if x != null then return(x)
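A runnable sketch of the same algorithm, with a lock standing in for the hardware atomics INC and SWAP (plain reads and writes of a list slot are effectively atomic in CPython, covering READ and STORE); the class name and 0-based indexing are adaptations, not part of the slide:

```python
# Sketch of the non-blocking queue with emulated atomic primitives.
import threading

class HWQueue:
    def __init__(self, capacity=1024):
        self.items = [None] * capacity
        self.back = 0
        self._atomic = threading.Lock()   # emulates INC and SWAP

    def _inc_back(self):
        """INC: atomic fetch-and-increment of back."""
        with self._atomic:
            i = self.back
            self.back += 1
            return i

    def enq(self, x):
        i = self._inc_back()              # allocate new slot, atomic
        self.items[i] = x                 # STORE: fill it

    def deq(self):
        while True:                       # repeat the scan until non-empty
            for i in range(self.back):    # READ(back); scan from the front
                with self._atomic:        # SWAP(items[i], null)
                    x, self.items[i] = self.items[i], None
                if x is not None:
                    return x
```

As the next slide notes, deq spins if the queue is empty, and the scan-from-the-front structure (no head pointer) is deliberately inefficient so the proof stays small.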
Queue Example Notes
• There are several atomic operations defined here
  – STORE, SWAP, INC, READ
  – These may or may not be supported on given hardware, which would change the proof
• The deq operation starts at the front end of the queue
  – slots already dequeued will show up as nulls
  – slots not yet filled will also be nulls
  – it picks the first non-empty slot
  – it will repeat the scan until it finds an element, waiting for an enqueue to happen if necessary
• Many inefficiencies, such as the lack of a head pointer; the example is to illustrate the proof technique.
Need for Prophecy Variables
• Representation invariants
  – to be useful, these must be true at every point in an execution, i.e., at the points observable by other concurrent operations
  – the sequential case was much weaker
• Abstraction function example:
  Enq(x) A
  Enq(y) B
  INC(q.back) A
  OK(1) A
  INC(q.back) B
  OK(2) B
  STORE(q.items[2], y) B
  OK() B
  OK() B
• For this execution, there is no way of defining an abstraction function without predicting the future, i.e., whether x or y will be dequeued first
Abstraction Relation
• An alternate approach to using history and prophecy variables is to use an abstraction relation, rather than a function
• Queue example:

  History      Linearized values
               {[]}
  Enq(x) A     {[], [x]}
  Enq(y) B     {[], [x], [y], [x,y], [y,x]}
  OK() B       {[y], [x,y], [y,x]}
  OK() A       {[x,y], [y,x]}
  Deq() C      {[x], [y], [x,y], [y,x]}
  OK(x) C      {[y]}
Key Idea in Queue Proof
• Lemma: If Enq(x), Enq(y), Deq(x) and Deq(y) are complete operations of H such that x's Enq precedes y's Enq, then y's Deq does not precede x's Deq.
• Proof: Suppose this is not true. Pick a linearization and let q_i and q_j be the queue values following the Deq operations of x and y, respectively. From the assumption that j < i, q_{j-1} = [y,…,x,…], which implies that y is enqueued before x, a contradiction.
Relaxed Consistency Model
• In the Multipol project, we used a weaker notion of correctness
• Each operation was divided into an invocation part and a completion part
  – like put/sync and get/sync in Split-C
  – e.g., E.update_mesh & E.sync_update
  – e.g., T.insert_element & T.sync_insert
• This worked well for several example problems
• Data structures do not compose well
  – need more than 2 phases in some cases
Generalizing Linearizability to Relaxed Consistency
• The generalization to split-phase operations is straightforward
• Each logical operation has:
  – op_start
  – op_complete
• In the formal model, each of these has invocation and response events (like other operations)
• The total order <H is defined by the completion of op_complete happening before the initiation of op_start
• Informally, the operation must take place atomically between the two phases
• An additional generalization: different processes see different total orders.
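One way to realize the op_start/op_complete split is with futures; the sketch below is an invented illustration echoing the slide's insert_element/sync_insert pair, not Multipol's actual interface:

```python
# Sketch of a split-phase operation: start returns a handle, complete waits.
from concurrent.futures import ThreadPoolExecutor

class SplitPhaseSet:
    def __init__(self):
        self._data = set()
        # a single worker serializes mutations, standing in for the
        # data structure's internal synchronization
        self._pool = ThreadPoolExecutor(max_workers=1)

    def insert_start(self, x):
        """Invocation phase: initiate the insert and return a handle."""
        return self._pool.submit(self._data.add, x)

    def insert_complete(self, handle):
        """Completion phase: wait until the insert has taken effect."""
        handle.result()
```

Between the two phases the caller is free to do other work; the relaxed condition only requires the insert to appear to take effect atomically somewhere before insert_complete returns.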