Consistency and Replication


Chapter 7

CSCE455/855 Distributed Operating Systems

Giving credit where credit is due:

• Most of the lecture notes are based on slides by Prof. Jalal Y. Kawash at Univ. of Calgary
• Some notes are based on slides by Prof. Kenneth Chiu at SUNY Binghamton
• I have modified them and added new slides


Chapter 7

Part I Consistency Models

Reasons for Replication

• Reliability:
  – Mask failures
  – Mask corrupted data

• Performance:
  – Scalability (size and geographical)

• Examples:
  – Web caching
  – Horizontal server distribution

Cost of Replication

• Replicas must be kept consistent

Dilemma:
1. Replicate data for better performance
2. Modification on one copy triggers modifications on all other replicas
3. Propagating each modification to each replica can degrade performance

Consistency Issues – Access/Update Ratio

[Figure: timeline contrasting updates to the Web page with user accesses to the page]

Consistency Model

• When and how the modifications are made = consistency model:

– Weak versus strong consistency model

Consistency Models (cont.)

The general organization of a logical data store, physically distributed and replicated across multiple processes.

Consistency Models (cont)

• A process that performs a read operation on a data item expects the operation to return a value that shows the result of the last write operation on that data
• No global clock: it is difficult to define the last write operation
• Consistency models provide other definitions
• Different consistency models have different restrictions on the values that a read operation can return

[Figure: two read operations, read1 and read2]

Summary of Consistency Models

(a) Consistency models not using synchronization operations.

Consistency | Description
Strict | Absolute time ordering of all shared accesses matters.
Sequential | All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal | All processes see causally-related shared accesses in the same order.
FIFO | All processes see writes from each other in the order they were issued. Writes from different processes may not always be seen in that order.

(b) Models with synchronization operations.

Consistency | Description
Weak | Shared data can be counted on to be consistent only after a synchronization is done.
Release | Shared data are made consistent when a critical region is exited.
Entry | Shared data pertaining to a critical region are made consistent when a critical region is entered.

Framework for Consistency Partial and Total Orders

Let S be a set, and R ⊆ S × S
• R is anti-reflexive if ∀x ∈ S, (x,x) ∉ R
• R is transitive if ∀x, y, z ∈ S, if (x,y) ∈ R and (y,z) ∈ R then (x,z) ∈ R
• A PO is an anti-reflexive, transitive relation
• A PO is denoted by (S,R)
• xRy means (x,y) ∈ R
• A TO is a PO (S,R) such that ∀x, y ∈ S with x ≠ y, either xRy or yRx
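The definitions of PO and TO can be checked mechanically on small finite sets. The following is a minimal sketch, assuming a relation is represented as a Python set of pairs; the function names and the two sample relations are illustrative and do not come from the slides.

```python
from itertools import product

def is_partial_order(S, R):
    """R is a set of (x, y) pairs over S. A PO is anti-reflexive and transitive."""
    anti_reflexive = all((x, x) not in R for x in S)
    transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
    return anti_reflexive and transitive

def is_total_order(S, R):
    """A TO is a PO in which every two distinct elements are related."""
    return is_partial_order(S, R) and all(
        (x, y) in R or (y, x) in R for x, y in product(S, S) if x != y
    )

S = {1, 2, 3}
less_than = {(x, y) for x, y in product(S, S) if x < y}
one_first = {(1, 2), (1, 3)}  # illustrative relation: 2 and 3 are left unordered

print(is_total_order(S, less_than))
print(is_partial_order(S, one_first), is_total_order(S, one_first))
```

The relation `one_first` is a PO but not a TO, because neither (2,3) nor (3,2) is in it.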

Framework for Consistency Operations and Data Items

• Operations are either writes or reads
• A write is denoted wp(x)v: process p writes value v to data item x
• A read is denoted rp(x)v: process p reads data item x and obtains value v
• A read-write data item is the set of all sequences <o1, o2, …, on> such that
  1. Each oi is either a read or a write
  2. Each read returns the same value written by the most recent preceding write in the sequence

Framework for Consistency Operations and Processes

• Each operation can be decomposed into two components: invocation and response
  – wp(x)v: invocation = wp(x)v; response = empty
  – rp(x)v: invocation = rp(x)?; response = v
• A process is a sequence of operation invocations
• A process computation is a sequence of operations obtained by augmenting each invocation in the process by its response

Framework for Consistency Multiprocess Systems

• A (multiprocess) system (P,D) is a set of processes, P, and a set of data items, D, such that all operation invocations of processes in P are applied to items in D

• A (multiprocess) system (P,D) computation is a collection of process computations one for each process in P

Framework for Consistency Example

Program p: x = y
Process p: r(y)v? w(x)v?
Process p Comp: r(y)5 w(x)5

Program q: y = x
Process q: r(x)v? w(y)v?
Process q Comp: r(x)0 w(y)0

System (P,D): P = {p,q}, D = {x,y}

System (P,D) Computation:
p: r(y)5 w(x)5
q: r(x)0 w(y)0

Framework for Consistency Program Order

• Define program order, denoted (O, <po), by o1 <po o2 iff o2 follows o1 in p’s computation, for some process p

Program p: x = y
Process p: r(y)v? w(x)v?
Process p Comp: r(y)5 w(x)5

Program q: y = x
Process q: r(x)v? w(y)v?
Process q Comp: r(x)0 w(y)0

• All of program order for the example:
  – rp(y)5 <po wp(x)5
  – rq(x)0 <po wq(y)0

Framework for Consistency Consistency Models

• A consistency model is a set of constraints on system computations

• A system computation of (P,D) satisfies a consistency model CM if the computation meets all the constraints in CM

Relation of Consistency Models

• Sequential: All processes see all shared accesses in the same order
• Strict: Absolute time ordering of all shared accesses matters.
• For two consistency models CM1 and CM2, CM1 is stronger than CM2 if the constraints of CM1 imply those of CM2
  – CM2 is weaker than CM1
  – Sequential consistency is weaker than strict consistency

Framework for Consistency – Validity

• Given a set of operations O
• O|w indicates all the write operations in O
• O|r indicates all the read operations in O
• O|p is the subset of O containing p’s operations, for some process p
• O|x is the subset of O containing operations on x, for some data item x
• Let (O,<) be a total order of O
• (O,<) is valid if for each data item x, the subsequence (O|x,<) is valid for x (every read of x returns the value of the most recent preceding write to x)

Framework for Consistency Valid Total Orders

Computation (x and y are initially 0):
p: w(x)5 r(y)5
q: r(x)0 w(y)5 r(x)5

Valid Total Order: rq(x)0 wq(y)5 wp(x)5 rq(x)5 rp(y)5
  – Valid for x (operations on x only): rq(x)0 wp(x)5 rq(x)5
  – Valid for y (operations on y only): wq(y)5 rp(y)5

Invalid Total Order: wp(x)5 rq(x)0 wq(y)5 rq(x)5 rp(y)5
  – Not valid for x: rq(x)0 follows wp(x)5
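Validity of a total order can be tested by replaying it against a map of current values. The sketch below assumes every data item starts at 0, as in the example; the `Op` record and the name `is_valid` are illustrative choices, not notation from the slides.

```python
from typing import NamedTuple

class Op(NamedTuple):
    kind: str   # "r" for a read, "w" for a write
    proc: str   # issuing process
    item: str   # data item
    value: int  # value written, or value returned by the read

def is_valid(order, initial=0):
    """True if every read returns the most recent preceding write to its item."""
    current = {}
    for op in order:
        if op.kind == "w":
            current[op.item] = op.value
        elif current.get(op.item, initial) != op.value:
            return False
    return True

# The two total orders from the example.
valid_order = [Op("r", "q", "x", 0), Op("w", "q", "y", 5), Op("w", "p", "x", 5),
               Op("r", "q", "x", 5), Op("r", "p", "y", 5)]
invalid_order = [Op("w", "p", "x", 5), Op("r", "q", "x", 0), Op("w", "q", "y", 5),
                 Op("r", "q", "x", 5), Op("r", "p", "y", 5)]

print(is_valid(valid_order), is_valid(invalid_order))
```

Checking one item at a time, as in the definition, gives the same answer as this single pass because writes to one item never change the current value of another.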

Sequential Consistency (SC)

• Two constraints:
  – the result of any execution is the same as if the operations of all the processes were executed in some sequential order, and
  – the operations of each individual process appear in this sequence in the order specified by its program
• Let O be the set of all the operations of a computation C of a system (P,D). Then, C satisfies SC if there is a valid total order (O,<) such that (O,<po) ⊆ (O,<)

SC – Intuition

[Figure: processes connected by FIFO channels to a switch in front of all data items (the set D)]

Sequential Consistency – Example

• C satisfies SC if there is a valid total order (O,<) such that (O,<po) ⊆ (O,<)

C1
p: w(x)1 r(x)2
q: r(x)1 w(x)2

C1 satisfies SC
(O,<) = <wp(x)1, rq(x)1, wq(x)2, rp(x)2>
(O,<po) = { (wp(x)1, rp(x)2), (rq(x)1, wq(x)2) }

Sequential Consistency – Examples

C2
p: w(x)1 r(x)2
q: w(x)2 r(x)1

C2 does not satisfy SC
(O,<po) = { (wp(x)1, rp(x)2), (wq(x)2, rq(x)1) }
<wp(x)1, wq(x)2, rp(x)2, rq(x)1> (is not valid)
<wp(x)1, rq(x)1, wq(x)2, rp(x)2> (violates PO)

C3
p: w(x)1 w(y)2
q: r(y)2 r(x)0

Exercise: Does C3 satisfy SC? (x and y are initially 0)
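For computations this small, the SC definition can be tested by brute force: try every total order of O and look for one that is valid and contains program order. This is only a sketch for hand-sized examples (it enumerates all permutations); the tuple encoding of operations and the function names are assumptions made here.

```python
from itertools import permutations

# An operation is a tuple (kind, process, item, value); items start at 0.
def is_valid(order, initial=0):
    current = {}
    for kind, _proc, item, value in order:
        if kind == "w":
            current[item] = value
        elif current.get(item, initial) != value:
            return False
    return True

def respects_program_order(order, computation):
    # Each process's operations must appear in the order of its computation.
    return all([op for op in order if op[1] == proc] == ops
               for proc, ops in computation.items())

def satisfies_sc(computation):
    all_ops = [op for ops in computation.values() for op in ops]
    return any(is_valid(order) and respects_program_order(order, computation)
               for order in permutations(all_ops))

C1 = {"p": [("w", "p", "x", 1), ("r", "p", "x", 2)],
      "q": [("r", "q", "x", 1), ("w", "q", "x", 2)]}
C2 = {"p": [("w", "p", "x", 1), ("r", "p", "x", 2)],
      "q": [("w", "q", "x", 2), ("r", "q", "x", 1)]}
C3 = {"p": [("w", "p", "x", 1), ("w", "p", "y", 2)],
      "q": [("r", "q", "y", 2), ("r", "q", "x", 0)]}

for name, comp in [("C1", C1), ("C2", C2), ("C3", C3)]:
    print(name, satisfies_sc(comp))
```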

Coherence [Goodman]

• SC per data item

• Let O be the set of all the operations of a computation C of a system (P,D). Then, C satisfies Coherence if for each x ∈ D there is a valid total order (O|x,<x) such that (O|x,<po) ⊆ (O|x,<x)

Coherence – Intuition

[Figure: processes connected by FIFO channels to several separate stores, each holding one data item]

Coherence – Examples

C1
p: w(x)1 r(x)2
q: r(x)1 w(x)2

C1 satisfies Coherence
(O|x,<x) = <wp(x)1, rq(x)1, wq(x)2, rp(x)2>

C2
p: w(x)1 r(x)2
q: w(x)2 r(x)1

C2 does not satisfy Coherence

C3
p: w(x)1 w(y)2
q: r(y)2 r(x)0

C3 satisfies Coherence but not SC

C4
p: w(x)3 w(x)2 r(y)3
q: w(y)3 w(y)1 r(x)3

Does C4 satisfy Coherence? SC?
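Since Coherence is SC per data item, a brute-force check can search for a valid total order separately for each item. The sketch below assumes the same tuple encoding of operations as (kind, process, item, value) and initial values of 0; the names are illustrative.

```python
from itertools import permutations

def is_valid(order, initial=0):
    current = {}
    for kind, _proc, item, value in order:
        if kind == "w":
            current[item] = value
        elif current.get(item, initial) != value:
            return False
    return True

def satisfies_coherence(computation):
    items = {op[2] for ops in computation.values() for op in ops}
    for item in items:
        # O|x, kept per process so that program order can be compared.
        per_proc = {proc: [op for op in ops if op[2] == item]
                    for proc, ops in computation.items()}
        item_ops = [op for ops in per_proc.values() for op in ops]
        found = any(
            is_valid(order) and all([op for op in order if op[1] == proc] == ops
                                    for proc, ops in per_proc.items())
            for order in permutations(item_ops))
        if not found:
            return False
    return True

C2 = {"p": [("w", "p", "x", 1), ("r", "p", "x", 2)],
      "q": [("w", "q", "x", 2), ("r", "q", "x", 1)]}
C3 = {"p": [("w", "p", "x", 1), ("w", "p", "y", 2)],
      "q": [("r", "q", "y", 2), ("r", "q", "x", 0)]}
C4 = {"p": [("w", "p", "x", 3), ("w", "p", "x", 2), ("r", "p", "y", 3)],
      "q": [("w", "q", "y", 3), ("w", "q", "y", 1), ("r", "q", "x", 3)]}

for name, comp in [("C2", C2), ("C3", C3), ("C4", C4)]:
    print(name, satisfies_coherence(comp))
```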

SC versus Coherence

• If Computation C satisfies SC, then it satisfies Coherence + PO
• If a Computation C satisfies Coherence, then it does not necessarily satisfy SC
  – Proof: Computation C3 is an example

C(CM) = all computations satisfying consistency model CM

[Figure: Venn diagram with C(SC) inside C(Coherence); C3 lies in C(Coherence) but outside C(SC)]

FIFO [Lipton & Sandberg]

• Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes

• When would we want to use the FIFO consistency model?

Review: SC – Intuition

[Figure: processes connected by FIFO channels to a switch in front of all data items]

Review: Coherence – Intuition

[Figure: processes connected by FIFO channels to several separate stores, each holding one data item]

FIFO – Intuition

[Figure: four processes, each with its own copy of all data items (D), connected by FIFO channels]

FIFO [Lipton & Sandberg]

• Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes

• Let O be the set of all the operations of a computation C of a system (P,D). Then, C satisfies FIFO if for each p ∈ P there is a valid total order (O|p ∪ O|w,<p) such that (O|p ∪ O|w,<po) ⊆ (O|p ∪ O|w,<p)

FIFO – Examples

C1
p: w(x)1 r(x)2
q: r(x)1 w(x)2

C1 satisfies FIFO (also SC and Coherence)
(O|p ∪ O|w,<p) = <wp(x)1, wq(x)2, rp(x)2>
(O|q ∪ O|w,<q) = <wp(x)1, rq(x)1, wq(x)2>

C2
p: w(x)1 r(x)2
q: w(x)2 r(x)1

C2 satisfies FIFO but not Coherence

C3
p: w(x)1 w(y)2
q: r(y)2 r(x)0

C3 satisfies Coherence but not SC nor FIFO

FIFO – Examples (cont)

C5
p: w(x)3 w(x)1 w(y)2
q: r(y)2 r(x)3

Does C4 satisfy FIFO? Coherence? SC?

Does C5 satisfy FIFO? Coherence? SC?
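The FIFO definition gives each process its own view: its own operations plus every write in the computation. A brute-force sketch of that check follows, again assuming operations encoded as (kind, process, item, value), initial values of 0, and illustrative function names.

```python
from itertools import permutations

def is_valid(order, initial=0):
    current = {}
    for kind, _proc, item, value in order:
        if kind == "w":
            current[item] = value
        elif current.get(item, initial) != value:
            return False
    return True

def in_program_order(order, computation, view):
    # Program order restricted to the operations that belong to this view.
    return all([op for op in order if op[1] == proc] ==
               [op for op in ops if op in view]
               for proc, ops in computation.items())

def satisfies_fifo(computation):
    writes = [op for ops in computation.values() for op in ops if op[0] == "w"]
    for proc, ops in computation.items():
        view = ops + [w for w in writes if w[1] != proc]  # O|p together with O|w
        if not any(is_valid(order) and in_program_order(order, computation, view)
                   for order in permutations(view)):
            return False
    return True

C2 = {"p": [("w", "p", "x", 1), ("r", "p", "x", 2)],
      "q": [("w", "q", "x", 2), ("r", "q", "x", 1)]}
C4 = {"p": [("w", "p", "x", 3), ("w", "p", "x", 2), ("r", "p", "y", 3)],
      "q": [("w", "q", "y", 3), ("w", "q", "y", 1), ("r", "q", "x", 3)]}
C5 = {"p": [("w", "p", "x", 3), ("w", "p", "x", 1), ("w", "p", "y", 2)],
      "q": [("r", "q", "y", 2), ("r", "q", "x", 3)]}

for name, comp in [("C2", C2), ("C4", C4), ("C5", C5)]:
    print(name, satisfies_fifo(comp))
```

Unlike SC, the search runs once per process, so different processes may settle on different orders of the same writes.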


SC versus FIFO

• If Computation C satisfies SC, does it satisfy FIFO?
• If Computation C satisfies FIFO, does it satisfy SC?

SC versus FIFO

• If Computation C satisfies SC, then it satisfies FIFO
• If a Computation C satisfies FIFO, then it does not necessarily satisfy SC
  – Proof: Computation C4 is an example

[Figure: Venn diagram with C(SC) inside C(FIFO); C4 lies in C(FIFO) but outside C(SC)]


Coherence versus FIFO

• If Computation C satisfies Coherence, does it satisfy FIFO?

Coherence versus FIFO

• If Computation C satisfies Coherence, then it does not necessarily satisfy FIFO
  – Proof: Computation C5 is an example

C5
p: w(x)3 w(x)1 w(y)2
q: r(y)2 r(x)3

Coherence versus FIFO

• If a Computation C satisfies FIFO, does it satisfy Coherence?

Coherence versus FIFO

• If a Computation C satisfies FIFO, then it does not necessarily satisfy Coherence
  – Proof: Computation C2 is an example

C2
p: w(x)1 r(x)2
q: w(x)2 r(x)1

Coherence versus FIFO

• There are computations that satisfy both Coherence and FIFO, but not SC
  – Proof: Computation C4

[Figure: Venn diagram in which C(Coherence) and C(FIFO) overlap and C(SC) lies inside the overlap; C4 satisfies FIFO and Coherence, but not SC]


Weak Consistency

• Consider Critical Section
  – If a process is in a critical section, its intermediate results of operations are not necessarily propagated to others.
• Idea
  – Enforce consistency on a Group of Operations
  – Limit the time when consistency holds
  – Let programmer explicitly specify this

Synchronization Operations

• In addition to reads and writes, introduce synchp() operation, which
  – synchronizes all local copies of the data store
    • Propagate local updates
    • Bring in others’ updates

Weak Consistency (cont.)

• Three conditions
  1. No operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
  2. No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed.
  3. Accesses to synchronization variables associated with a data store are sequentially consistent.

Weak Consistency (cont.)

[Figure: two event sequences over processes P1, P2, and P3, one labeled "Weak Consistency" and one labeled "Not Weak Consistency". Both involve the writes W(x, a), W(x, b), W(y, c), the synchronizations S1, S2, S3, and subsequent reads of x and y. The figure is annotated with conditions 1 and 2 of weak consistency.]

WC – Example

C6
p: w(x)3 s()
q: r(x)0 s() w(y)1 s’() r(x)3
m: w(x)5 r(y)1 s() r(x)3

• All of p, q, and m must agree on a total order of synch operations consistent with program order; for example: <sq(), sp(), s’q(), sm()>
• (O|p ∪ O|w ∪ O|s, <p) = < wm(x)5, sq(), wp(x)3, sp(), wq(y)1, s’q(), sm() >
• (O|q ∪ O|w ∪ O|s, <q) = < rq(x)0, wm(x)5, sq(), wp(x)3, sp(), wq(y)1, s’q(), sm(), rq(x)3 >
• (O|m ∪ O|w ∪ O|s, <m) = < wm(x)5, sq(), wp(x)3, sp(), wq(y)1, s’q(), rm(y)1, sm(), rm(x)3 >
• Accesses to synchronization variables associated with a data store are sequentially consistent.

Summary of Consistency Models

(a) Consistency models not using synchronization operations.

Consistency | Description
Strict | Absolute time ordering of all shared accesses matters.
Linearizability | All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
Sequential | All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal | All processes see causally-related shared accesses in the same order.
FIFO | All processes see writes from each other in the order they were issued. Writes from different processes may not always be seen in that order.

(b) Models with synchronization operations.

Consistency | Description
Weak | Shared data can be counted on to be consistent only after a synchronization is done.
Release | Shared data are made consistent when a critical region is exited.
Entry | Shared data pertaining to a critical region are made consistent when a critical region is entered.

Weaker Models

• Sometimes strong models are needed, if the results of race conditions are very bad.
  – Banks
• Sometimes the results of races are just inefficiency, or inconvenience, etc.
• How strong is Orbitz’s model?
  – If it shows that a flight ticket with a certain price is available, is it really?
• One kind of weaker model is eventual consistency
  – It eventually becomes consistent

Lazy Consistency Models

• When updates are scarce
• When updates are not conflicting
  – Examples: DNS and WWW
• Eventual Consistency (EC): Lazy propagation of updates to all replicas
  – If no updates take place for a long time, all replicas will become consistent
  – Cheap to implement
  – If a client always accesses the same replica, the same or newer data will be read as time passes. EC works.

Eventual Consistency

• How well does EC work for mobile clients?
• Client-centric consistency is designed for this case: it guarantees consistency for a single client.

Notation

• xi[t] is the version of x at local copy Li at time t.
• Version xi[t] is the result of a series of write operations at Li that took place since initialization. This is WS(xi[t]).
• If operations in WS(xi[t1]) have also been performed at local copy Lj at a later time t2, we write WS(xi[t1];xj[t2]).

Monotonic Reads (1)

• A data store is said to provide monotonic-read consistency if:
  – When a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.

Monotonic Reads (2)

• A data store that provides monotonic reads consistency

• A data store that does not
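One way a client can detect a violation of monotonic reads is to remember the newest version it has seen for each item and refuse older ones. The sketch below assumes each replica tags values with an integer version number; that tagging, and the names `Replica`, `Session`, and `StaleRead`, are illustrative assumptions and not part of the slides.

```python
class StaleRead(Exception):
    """Raised when a replica would return an older version than one already read."""

class Replica:
    def __init__(self):
        self.store = {}  # item -> (version, value)

    def write(self, item, version, value):
        self.store[item] = (version, value)

    def read(self, item):
        return self.store.get(item, (0, None))

class Session:
    """Tracks, per item, the newest version this client has read so far."""
    def __init__(self):
        self.last_seen = {}

    def read(self, replica, item):
        version, value = replica.read(item)
        if version < self.last_seen.get(item, 0):
            raise StaleRead(f"{item}: version {version} is older than what was read")
        self.last_seen[item] = version
        return value

l1, l2 = Replica(), Replica()
l1.write("x", 1, "a")
l1.write("x", 2, "b")
l2.write("x", 1, "a")          # the second update has not reached L2 yet

session = Session()
print(session.read(l1, "x"))
try:
    session.read(l2, "x")      # the mobile client moved to a lagging replica
except StaleRead as err:
    print("rejected:", err)
l2.write("x", 2, "b")          # update propagated
print(session.read(l2, "x"))
```

A real store would more likely wait for, or fetch, the missing writes than fail the read; raising an error just keeps the sketch short.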

Monotonic Writes (1)

• In a monotonic-write consistent store, the following condition holds:
  – A write operation by a process on a data item x is completed before any successive write operation on x by the same process.

Monotonic Writes (2)

• A data store that provides monotonic writes consistency


Read Your Writes (1)

• A data store is said to provide read-your-writes consistency if:
  – The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
• Suppose your web browser has a cache.
  – You update your web page on the server.
  – Do you have read-your-writes consistency?

Read Your Writes (2)

• A data store that provides read your writes consistency


Writes Follow Reads (1)

• A data store is said to provide writes-follow-reads consistency if:
  – A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.

Writes Follow Reads (2)

• A data store that provides writes follow reads consistency



Chapter 7

Part II Replica Management & Consistency Protocols

Replica Management

• Replica-Server Placement
• Replica Placement:
  – Where is a replica placed?
  – When is a replica created?
  – Who creates the replica?
• How do we distribute updates between replicas?
  – Update propagation

Types of Replicas

Permanent Replicas

• Initial set of replicas
  – Other replicas can be created from them
  – Small and static set
• Example: Web site horizontal distribution
  1. Replicate Web site on a limited number of machines on a LAN
     – Distribute requests in round-robin fashion
  2. Replicate Web site on a limited number of machines on a WAN (mirroring)
     – Clients choose which sites to talk to

Server-Initiated Replicas (1)

• Dynamically created at the request of the owner of the data store (DS)
• Example: push-based caches
  – Owners: Web site owners such as CNN, Yahoo
  – Web hosting services: such as the one provided by Akamai
  – Web hosting servers can dynamically create replicas close to the demanding clients
• Need dynamic policy to create, migrate and delete replicas

http://www.akamai.com/


• One policy is to keep track of Web page hits
  – Keep a counter and access-origin list for each page

[Figure: servers P and Q, with file F]

• Count access requests from different clients: cntQ(P, F) is the number of requests for F arriving at Q from clients that are closest to P. If it is more than ½ of all requests for F at Q, then Q attempts to migrate F to P.
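The migration rule can be written as a short function. This is a sketch under the assumption that server Q records, for each request for F, the name of the server closest to the requesting client; the function name and the input format are made up for illustration.

```python
from collections import Counter

def migration_target(closest_servers, this_server):
    """Return the server that should receive F, or None if F stays put.

    closest_servers holds one entry per request for F seen at this_server:
    the name of the server closest to the client that issued the request.
    """
    counts = Counter(closest_servers)      # counts[P] plays the role of cntQ(P, F)
    total = sum(counts.values())
    for server, count in counts.items():
        if server != this_server and count > total / 2:
            return server
    return None

print(migration_target(["P", "P", "P", "Q"], this_server="Q"))
print(migration_target(["P", "P", "Q", "Q"], this_server="Q"))
```

In the second call exactly half of the requests come from clients near P, which is not more than half, so no migration is attempted.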

Client-Initiated Replicas

• These are caches
  – Temporary storage (expire fast)
• Managed by clients
• Cache hit: return data from cache
• Cache miss: load copy from original server
• Kept on the client machine, or on the same LAN
• Multi-level caches

Multi-Level Caches

Design Issues for Update Propagation

• A process modified a replica.
• Based on the consistency model supported, the update is then propagated to all other replicas at its proper time
• How to carry out update propagation and make two replicas consistent with each other?
  1. Propagate state or operation
  2. Pull or Push protocols
  3. Unicast or multicast propagation

State versus Operation Propagation (1)

1. Propagate a notification of update
   • Invalidate protocols
     – When data item x is changed at a replica, it is invalidated at other replicas
     – An attempt to read the item causes an “item-fault”, which triggers an update of the local copy before the read can complete
   • Uses little network bandwidth
   • When is it good to use this distribution protocol?
     – When read-to-write ratio is low or high?
     – Good when read-to-write ratio is low

State versus Operation Propagation (2)

When read-to-write ratio is high:
2. Transfer modified data or a log of changes
   • High network bandwidth usage
3. Propagate the update operation
   • Each replica must have a process capable of performing the update
   • Very low network bandwidth usage
• Other trade-offs between these two protocols?

Pull versus Push

• Push: updates are propagated to other replicas without solicitation
  – Typically, from permanent to server-initiated replicas
  – Used to achieve a high degree of consistency
• Pull: A replica asks another for an update
  – Typically, from client-initiated replica
  – Inconsistent cache results in longer response time

Pull versus Push Protocols

A comparison between push-based and pull-based protocols in the case of multiple client, single server systems.

Issue | Push-based | Pull-based
State of server | List of client replicas and caches | None
Messages sent | Update (and possibly fetch update later) | Poll and update
Response time at client | Immediate (or fetch-update time) | Fetch-update time


Leases

• Combined push and pull
• A lease is a server promise to push updates for a certain time
• When a lease expires, the client polls the server or requests a new lease
• Length of a lease?
• Different types of leases
  – Age-based: lease length depends on the time since the item was last modified
  – Renewal-frequency based: long-lasting leases to active users
  – State-space overhead: increasing utilization of a server => lower expiration times of new leases
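The combination of push and pull can be sketched with explicit timestamps: while a lease is live the server pushes updates, and after it expires the client falls back to polling, which also renews the lease. Fixed-length leases and all names below are assumptions for illustration; the lease types listed above would vary `lease_length` per item, client, or server load.

```python
class LeaseServer:
    def __init__(self, value, lease_length):
        self.value = value
        self.lease_length = lease_length
        self.leases = {}                      # client -> lease expiry time

    def grant(self, client, now):
        """Handle a poll: return the current value and a fresh lease expiry."""
        self.leases[client] = now + self.lease_length
        return self.value, self.leases[client]

    def write(self, value, now):
        self.value = value
        for client, expiry in self.leases.items():
            if now < expiry:                  # promise still holds: push
                client.cached = value

class Client:
    def __init__(self):
        self.cached = None
        self.expiry = 0

    def read(self, server, now):
        if now >= self.expiry:                # lease expired: pull and renew
            self.cached, self.expiry = server.grant(self, now)
        return self.cached

server = LeaseServer("v1", lease_length=10)
client = Client()
client.read(server, now=0)                    # takes a lease valid until time 10
server.write("v2", now=5)                     # pushed, lease is live
server.write("v3", now=12)                    # not pushed, lease has expired
print(client.cached, client.read(server, now=12))
```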

Unicast versus Multicast

• Unicast: a replica sends n-1 separate updates, one to every other replica
• Multicast: a replica sends one update to multiple replicas
  – Network takes care of multicasting
  – Can be cheaper
  – Suits push-based protocols

Consistency Protocols

• Actual implementation of consistency models
• For instance, how to implement sequential consistency?
• Whether primary copy exists or not
  – Primary-based protocols
  – Replicated-write protocols

Primary-Based Remote-Write Protocol

• Primary copy and backups for each data item
• Read from local copy
• Write to the (remote) primary server
  – Update backups
• Blocking vs. non-blocking update

Primary-Based Remote-Write Protocol (cont.)

The principle of primary-backup protocol.

Sequential Consistency

[Figure: the sequential consistency intuition (processes connected by FIFO channels to a switch), annotated with "Sequential consistency" and "Read Your Writes" and an example involving a write W3’]

Primary-Based Local Write Protocol

• Primary copy and backups for each data item
• Read from local copy
• Move primary copy to local server and write to it
• Update backups

Primary-Based Local-Write Protocol (cont.)

Primary-backup protocol in which the primary migrates to the process wanting to perform an update.

Example: a mobile PC becomes the primary server for the items it will need

• Which consistency protocol does DNS (Domain Name System) follow?

No Primary Copy Replicated-Write Protocols

• Active replication

• Quorum-based protocol

Ordering Guarantees

• Updates sent from different processes may be delivered in different orders at different sites

• Totally-ordered multicast

• Causally-ordered multicast

Ordering Guarantees

• The sequencer approach
  – All requests must be sent to a sequencer, where they are given an identifier
  – The sequencer assigns consecutive increasing identifiers as it receives requests
  – Requests arriving at sites are held back until they are next in sequence
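The sequencer approach can be sketched with a counter and a hold-back queue at each site. The class names and the in-memory "network" below are assumptions for illustration; message loss and sequencer failure are ignored.

```python
import itertools

class Sequencer:
    def __init__(self):
        self._ids = itertools.count(1)        # consecutive increasing identifiers

    def assign(self, request):
        return next(self._ids), request

class Site:
    def __init__(self):
        self.expected = 1                     # identifier of the next request to deliver
        self.held_back = {}                   # identifier -> request
        self.delivered = []

    def receive(self, seq_id, request):
        self.held_back[seq_id] = request
        # Deliver as long as the next request in sequence is available.
        while self.expected in self.held_back:
            self.delivered.append(self.held_back.pop(self.expected))
            self.expected += 1

sequencer = Sequencer()
m1, m2, m3 = (sequencer.assign(r) for r in ["a", "b", "c"])

site_a, site_b = Site(), Site()
for msg in (m1, m2, m3):                      # arrives in order
    site_a.receive(*msg)
for msg in (m3, m1, m2):                      # arrives out of order
    site_b.receive(*msg)

print(site_a.delivered, site_b.delivered)
```

Both sites deliver the requests in the same total order, regardless of arrival order.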

Active Replication

The problem of replicated invocations.

Active Replication (cont.)

(a) Forwarding an invocation request from a replicated object.
(b) Returning a reply to a replicated object.

Network Partitions

• Primary-based
  – Remote-write protocol
  – Local-write protocol
• Replicated-write protocol
  – Totally-ordered multicast approach
  – Sequencer approach
• None of them works if the network is partitioned!

Network Partitions

• Idea?
  – Use Majority
    • Write
    • Read

Well-known Solution: Quorum-Based Protocols

• Read
  – Retrieve the number of replicas in a read quorum
  – Select the one with the latest version.
  – Perform a read on it
• Write
  – Retrieve the number of replicas in a write quorum.
  – Find the latest version and increment it.
  – Perform a write on the entire write quorum.
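The read and write steps can be simulated with version-numbered replicas and randomly chosen quorums. This is a sketch only: the quorum sizes (N = 5, read quorum 3, write quorum 3) are illustrative values chosen here, and failures, concurrency, and messaging are left out.

```python
import random

class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

def quorum_read(replicas, n_r, rng):
    quorum = rng.sample(replicas, n_r)                 # retrieve a read quorum
    latest = max(quorum, key=lambda r: r.version)      # select the latest version
    return latest.value                                # perform the read on it

def quorum_write(replicas, n_w, value, rng):
    quorum = rng.sample(replicas, n_w)                 # retrieve a write quorum
    new_version = max(r.version for r in quorum) + 1   # latest version, incremented
    for r in quorum:                                   # write to the entire quorum
        r.version = new_version
        r.value = value

rng = random.Random(0)
replicas = [Replica() for _ in range(5)]
quorum_write(replicas, 3, "a", rng)
quorum_write(replicas, 3, "b", rng)
print({quorum_read(replicas, 3, rng) for _ in range(20)})
```

Because any read quorum of 3 out of 5 overlaps any write quorum of 3, every read sees at least one replica holding the newest version.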

Quorum-Based Protocols

• N: Total #Replicas
• NR: #Replicas in Read Quorum
• NW: #Replicas in Write Quorum
• Constraints:
  1. NR + NW > N
  2. NW > N/2

Quorum-Based Protocols

Three examples of the voting algorithm for N = 12 replicas
(a) A correct choice of read and write set
(b) A choice that may lead to write-write conflicts
(c) A correct choice, known as ROWA (read one, write all)
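The two quorum constraints, NR + NW > N and NW > N/2, are easy to check for a proposed configuration. In the sketch below the function name is invented, and the specific quorum sizes used for cases (a) and (b) are assumed values consistent with the captions, since the figure itself is not reproduced here; case (c) follows from ROWA.

```python
def quorum_problems(n, n_r, n_w):
    """Return the list of conflicts a choice of quorum sizes allows."""
    problems = []
    if n_r + n_w <= n:          # a read quorum may miss the latest write
        problems.append("read-write conflict")
    if 2 * n_w <= n:            # two write quorums may be disjoint
        problems.append("write-write conflict")
    return problems

# N = 12; the (NR, NW) values for (a) and (b) are assumed for illustration.
examples = {"a": (3, 10), "b": (7, 6), "c": (1, 12)}
for label, (n_r, n_w) in examples.items():
    print(label, quorum_problems(12, n_r, n_w))
```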

Date post:	15-Feb-2016