ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf ·...

Date post: 15-Jul-2020

ZooKeeper

CSCE 678

Weak vs Strong Consistency

• In distributed systems, consistency is often the property that gets weakened

• Examples of weak consistency:
  • NoSQL servers (e.g., Dynamo)
  • Datanodes in HDFS

• But sometimes, strong consistency is needed

Use Cases of Strong Consistency

• Configuration management

• Message Queues

• Group membership

• Synchronization:
  • Mutexes and read/write locks
  • Barriers: process joins

(We will go through these one by one.)

Use Case: Configuration & Group Membership

[Figure: a Primary node and a [Configuration] record with the fields Primary: …, Secondaries: …, Port IDs: …, Created by: …; an arrow is labelled "Push".]

Use Case: Message Queues

[Figure: Producer A and Producer B each enqueue messages into a shared queue; a Consumer dequeues them.]

Use Case: Synchronization

• Mutexes

    lock();
    last_x = read(x);
    write(y, last_x);
    write(x, last_x + 1);
    unlock();

• Read/write locks

    rd_lock();                  // shared among readers
    if (queue.size > 0) {
        wr_lock();              // exclusive for one writer
        x = queue.dequeue();
        wr_unlock();
    }
    rd_unlock();

How to Define Strong Consistency?

• Linearizability
  • Also called atomic consistency or immediate consistency
  • As soon as a write operation finishes, the whole system should see the latest data.

Serializability vs Linearizability

• In distributed systems, these two terms are used in very specific contexts

• Serializability: for databases
  • Equivalent to serializable isolation (the I in ACID)
  • No ordering constraint between concurrent transactions

• Linearizability: for strongly consistent reads/writes
  • Defined with respect to individual operations, not transactions
  • Considers the global ordering of reads/writes

Serializability vs Linearizability

• Serializability

[Figure: timeline. Client A submits the transaction read(x), write(y), read(y). Client B submits two transactions: read(y), write(x), write(y) and read(y), read(x), write(y). The system runs the three transactions one after another: (from A), (from B), (from B).]

No ordering constraint between concurrent transactions.
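One way to read the figure: an execution counts as serializable if its outcome matches some serial order of the transactions, whichever order that happens to be. The sketch below is a minimal illustration in Python, not a database implementation. The transaction bodies (`txn_a`, `txn_b1`, `txn_b2`) are hypothetical, since the slide only lists their read/write pattern, and the check compares final states only, which is a simplification.

```python
from itertools import permutations

# Hypothetical transaction bodies matching the read/write patterns in the figure.
def txn_a(s):
    # read(x), write(y), read(y): copy x into y
    s["y"] = s["x"]

def txn_b1(s):
    # read(y), write(x), write(y): derive x from y, then reset y
    s["x"] = s["y"] + 1
    s["y"] = 0

def txn_b2(s):
    # read(y), read(x), write(y): accumulate x into y
    s["y"] = s["y"] + s["x"]

def serial_outcomes(initial, txns):
    """Final states reachable by running the transactions one at a time, in any order."""
    outcomes = []
    for order in permutations(txns):
        state = dict(initial)          # work on a copy so every order starts fresh
        for txn in order:
            txn(state)
        if state not in outcomes:
            outcomes.append(state)
    return outcomes

def is_serializable(initial, txns, observed_final):
    """True if the observed final state equals the result of some serial order."""
    return observed_final in serial_outcomes(initial, txns)

INITIAL = {"x": 1, "y": 2}
TXNS = [txn_a, txn_b1, txn_b2]
```

Any of the serial orders is acceptable here, which is exactly the "no ordering constraint" on the slide; linearizability, by contrast, would also constrain the order by real time.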

Serializability vs Linearizability

• Actually, serializability can be implemented on top of a linearizable service (i.e., a lock service)

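[Figure: Server 1 and Server 2 act as a lock service. The transaction read(x), write(y), read(y) requests lock(x) and lock(y), a read (shared) lock and a write (exclusive) lock, and both requests are answered "ok". The transaction read(y), write(x), write(y) is blocked.]

This is two-phase locking (2PL), which is unrelated to two-phase commit (2PC).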
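The lock-service idea in the figure can be sketched as a small lock table. This is a toy, single-process illustration under stated assumptions: `LockTable`, `try_lock`, and `release_all` are hypothetical names, the modes `"S"` and `"X"` stand for the shared and exclusive locks on the slide, and a real lock service would queue blocked requests instead of just reporting them.

```python
class LockTable:
    """Toy lock service: shared (read) and exclusive (write) locks per key."""

    def __init__(self):
        self.holders = {}  # key -> (mode, set of transaction ids holding it)

    def try_lock(self, txn, key, mode):
        """Return "ok" if the lock is granted, "blocked" otherwise. mode is "S" or "X"."""
        held = self.holders.get(key)
        if held is None:
            self.holders[key] = (mode, {txn})
            return "ok"
        held_mode, owners = held
        if owners == {txn}:
            # The sole holder may re-acquire the lock or upgrade S -> X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[key] = (new_mode, owners)
            return "ok"
        if mode == "S" and held_mode == "S":
            owners.add(txn)            # many readers may share the lock
            return "ok"
        return "blocked"               # any conflict involving a writer

    def release_all(self, txn):
        """Shrinking phase of 2PL: drop every lock held by txn at once."""
        for key in list(self.holders):
            _, owners = self.holders[key]
            owners.discard(txn)
            if not owners:
                del self.holders[key]
```

A transaction acquires all its locks before releasing any (the growing phase), then calls `release_all` when it finishes, which is what makes the resulting schedule equivalent to a serial one.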

Serializability vs Linearizability

• Linearizability: as soon as a write operation finishes, the whole system should see the latest data.

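[Figure: timeline with Client A, Server 1, Server 2, and Client B. Client A's write(x, 1) returns ok once the value has been inserted on the servers; Client B's subsequent read(x) returns 1.]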

More on Linearizability

• Reads concurrent with a write

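[Figure: timeline with Clients A, B, and C. Client C issues write(x, 1) => ok. The other clients' reads return read(x) => 0 before the write takes effect and read(x) => 1 afterwards.]

Any read that completes before the write begins must see the old value.

Any read that overlaps the write may see the old or the new value, but as soon as one client sees the new value, all later reads by any client must see it too. Any read that begins after the write completes must see the new value.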
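The rules for a single register can be checked mechanically on small histories. The following is a brute-force sketch, assuming each operation records its invocation and response time; `Op` and `is_linearizable` are hypothetical names, the register starts at 0, and the permutation search is only practical for a handful of operations.

```python
from itertools import permutations
from typing import NamedTuple

class Op(NamedTuple):
    start: float   # invocation time
    end: float     # response time
    kind: str      # "r" for read, "w" for write
    value: int     # value written, or value the read returned

def is_linearizable(history, initial=0):
    """Try every order that respects real time and single-register semantics."""
    for order in permutations(history):
        # An op that finished before another started must not be placed after it.
        respects_real_time = all(
            not (later.end < earlier.start)
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if not respects_real_time:
            continue
        current = initial
        for op in order:
            if op.kind == "w":
                current = op.value
            elif op.value != current:
                break              # this read could not have returned that value
        else:
            return True            # found a valid linearization
    return False
```

For example, with a write of 1 spanning times 2 to 6, a read that overlaps it may return 0 or 1, but a read returning 1 followed by a later read returning 0 has no valid order and is rejected.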

More on Linearizability

• Write concurrent with another write

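[Figure: timeline with Clients A, B, and C. write(x, 1) => ok is followed by read(x) => 1; write(x, 2) => ok is followed by read(x) => 2.]

A write that completes before another write begins must take effect first. Writes that overlap may be ordered either way, but every client must observe the same order.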

Linearizability vs Causality

• Linearizability implies causal consistency

(1) Causality is based on the happens-before relationship within a client.

If a client sets x = 1 and then sets y = 0, the two writes are causally related.

[Figure: timeline with Client A and Client B. One client performs write(x, 1) and write(y, 0). The other client observes read(y) => 0 but then read(x) => 0, which violates the causal relationship.]

Linearizability vs Causality

• Linearizability implies causal consistency

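(2) Linearizability implies a total order of the operations on each object, so it automatically preserves happens-before.

A linearizable system doesn’t have to do anything extra to preserve causal consistency.

[Figure: timeline with Client A and Client B under a total order. With write(x, 1) and write(y, 0), a client that observes read(y) => 0 then observes read(x) => 1; read(x) => 0 can only be returned earlier in the total order.]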

How to Implement Linearizability?

• Read/write from a single leader
  • Failover to a replica may lose linearizability

• Consensus algorithms
  • Using two-phase commit (2PC) or Paxos
  • Prevent stale replicas

• Most likely not linearizable: multi-leader replication

Q: How does linearizability apply to CAP Theorem?

Linearizability in CAP Theorem

• Linearizability = strong C
  • Read/write through a single leader: lose A & P
  • Read through replicas, write through leader: lose P

• With consensus, linearizability can be:
  • Partition tolerant as long as more than ½ of the replicas are connected
  • Fairly available, with wait-free operations and fast, lossless leader recovery
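The majority condition is simple arithmetic, sketched below. The helper names `majority` and `tolerated_failures` are hypothetical and chosen only for illustration.

```python
def majority(n):
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1

def tolerated_failures(n):
    """Replicas that can be unreachable while a majority is still connected."""
    return n - majority(n)

if __name__ == "__main__":
    for n in (3, 4, 5):
        # Prints: 3 2 1, then 4 3 1, then 5 3 2
        print(n, majority(n), tolerated_failures(n))
```

Going from 3 to 4 replicas does not tolerate any additional failure, which is why ensembles are usually sized with an odd number of servers. Any two majorities of the same ensemble share at least one replica, so two sides of a partition cannot both make progress.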

Apache ZooKeeper

• A coordination service for all use cases of strong consistency in a distributed system

• Wait-free: no operation blocks on other slow or failed clients

• ZooKeeper has no API for locking, but can be used to implement any locking mechanism

System Overview

[Figure: the ZooKeeper service consists of five servers, one Leader and four Followers. Clients connect to individual servers; followers forward operations to the leader.]

Namespace

• ZooKeeper uses a filesystem-like namespace

    /
        /App1
            /App1/p_1
            /App1/p_2
            /App1/p_3
        /App2

Each path is a znode, and each znode holds data: not large data files, more like metadata.

Namespace

• ZooKeeper has two types of znodes (paths)
  • Permanent (regular): clients explicitly create and delete the znodes
  • Ephemeral: clients create the znodes, and either delete them explicitly or let the system delete them automatically when the client session times out

API

• create(path, data, flags)
  • flags: regular/ephemeral, sequential (appends a sequence number to the path)

• getData(path, watch) -> (data, version)

• setData(path, data, version)

• delete(path, version)

• exists(path, watch) -> true/false

• getChildren(path, watch) -> [paths]

• sync(path): the path argument is currently ignored
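To make the version and flag semantics concrete, here is a toy in-memory model of the calls above. It is a sketch under loud assumptions: `ToyZNodeStore` is not the real ZooKeeper client, it runs in one process with no replication, watches and `sync` are left out, and the `session` argument and `close_session` method are hypothetical stand-ins for client sessions.

```python
class ToyZNodeStore:
    """In-memory model of a znode tree; not the real ZooKeeper client API."""

    def __init__(self):
        self.nodes = {"/": {"data": b"", "version": 0, "owner": None}}
        self.next_seq = {}  # parent path -> next sequence number

    def create(self, path, data, ephemeral=False, sequential=False, session=None):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"no parent {parent}")
        if sequential:
            seq = self.next_seq.get(parent, 0)
            self.next_seq[parent] = seq + 1
            path = f"{path}{seq:010d}"      # append a zero-padded sequence number
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = {"data": data, "version": 0,
                            "owner": session if ephemeral else None}
        return path

    def get_data(self, path):
        node = self.nodes[path]
        return node["data"], node["version"]

    def set_data(self, path, data, version):
        node = self.nodes[path]
        if version != node["version"]:      # conditional update on the version
            raise ValueError("version mismatch")
        node["data"] = data
        node["version"] += 1

    def delete(self, path, version):
        if version != self.nodes[path]["version"]:
            raise ValueError("version mismatch")
        del self.nodes[path]

    def exists(self, path):
        return path in self.nodes

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self.nodes
                      if len(p) > len(prefix) and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

    def close_session(self, session):
        """Session timeout: remove every ephemeral znode the session created."""
        if session is None:
            return
        for p in [p for p, n in self.nodes.items() if n["owner"] == session]:
            del self.nodes[p]
```

The version argument turns `set_data` and `delete` into conditional operations: a client that read version 0 cannot overwrite a znode that someone else has already moved to version 1.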

Asynchronous Operations

• Operations can be synchronous or asynchronous

• Client can queue up multiple asynchronous operations

• Server responds by invoking callbacks

[Figure: a client queues getData(x), create(y), … to the ZooKeeper service and then receives the callback for getData(x), the callback for create(y), ….]

Event Notification

• All operations except sync are wait-free

• No locking API but clients can implement locks using watch events

Lock (very naïve version)

    1 l = "/my-lock";
    2 if exists(l, watch=true) then wait for watch event;
    3 n = create(l, EPHEMERAL);
    4 if n is error then goto 2;

Unlock

    1 delete(l);

The server pushes an event to the client when the watched path is updated.
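The naive recipe can be simulated in one process to see how the watch and the retry loop interact. This is a sketch, not ZooKeeper code: `ToyLockNode` is a hypothetical in-memory stand-in for the single path "/my-lock", threads play the role of clients, and a `threading.Event` plays the role of the watch event pushed by the server.

```python
import threading

class ToyLockNode:
    """In-memory stand-in for one znode path with one-shot watches."""

    def __init__(self):
        self._mutex = threading.Lock()   # protects the fields below
        self._owner = None               # session holding the ephemeral node
        self._watchers = []              # events to fire when the node is deleted

    def exists(self, watch=None):
        with self._mutex:
            present = self._owner is not None
            if present and watch is not None:
                self._watchers.append(watch)   # register the watch atomically
            return present

    def create(self, session):
        """Return True on success, False if the node already exists."""
        with self._mutex:
            if self._owner is not None:
                return False
            self._owner = session
            return True

    def delete(self):
        with self._mutex:
            self._owner = None
            watchers, self._watchers = self._watchers, []
        for event in watchers:
            event.set()                  # push the watch event to every waiter

def lock(node, session):
    while True:
        event = threading.Event()
        if node.exists(watch=event):     # line 2: node present, wait for the event
            event.wait()
        if node.create(session):         # line 3: try to create the ephemeral node
            return
        # line 4: create failed because another client won the race, so retry

def unlock(node):
    node.delete()                        # Unlock, line 1
```

Every waiter is woken on each delete and all of them race to create the node again, which is one reason the slide calls this version very naive.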

A(asynchronous)-Linearizability

• Local order: all operations from the same client are processed FIFO (first-in-first-out)

• Global order: Linearizable writes

[Figure: Client A and Client B each send write Op 1, write Op 2, … to their own server. Some consensus among the servers ensures that the write ops are applied exactly in their global order.]

A(asynchronous)-Linearizability

• Reads are not linearized (may not see the latest state)
  • Reads are served directly by the server the client is connected to (a replica)
  • That server may still have pending writes in its queue
  • Solution: call sync before the read, so the server applies all pending writes first

[Figure: Client A sends write Op 1, write Op 2, sync to its server; Client B sends read Op 1, read Op 2, … to a different server.]

After a sync completes on a server, later reads from that server see the changes of all writes committed before the sync.
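A stale read and the effect of sync can be reproduced with a toy replication model. Everything here is an assumption made for illustration: `ToyEnsemble` keeps one global log of committed writes, followers apply that log lazily, and `deliver` is a hypothetical hook that stands in for network delivery.

```python
class ToyEnsemble:
    """Leader log plus followers that apply committed writes lazily (a toy model)."""

    def __init__(self, n_followers):
        self.log = []                                   # committed writes, in global order
        self.applied = [0] * n_followers                # log prefix applied by each follower
        self.state = [{} for _ in range(n_followers)]   # each follower's local copy

    def write(self, key, value):
        """Writes go through the leader and are appended in one global order."""
        self.log.append((key, value))

    def _catch_up(self, follower, upto):
        for key, value in self.log[self.applied[follower]:upto]:
            self.state[follower][key] = value
        self.applied[follower] = max(self.applied[follower], upto)

    def deliver(self, follower, count=1):
        """Apply the next `count` pending writes on one follower."""
        self._catch_up(follower, min(len(self.log), self.applied[follower] + count))

    def read(self, follower, key):
        """Reads are answered from local state and may therefore be stale."""
        return self.state[follower].get(key)

    def sync(self, follower):
        """Apply every write committed so far before serving later reads."""
        self._catch_up(follower, len(self.log))
```

In this model, a client connected to a lagging follower reads an old value (or nothing) until that follower catches up; calling `sync` on the same follower first guarantees the read reflects every write committed before the sync.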

Zab (Atomic Broadcast)

• All servers forward messages to a single leader
  • The leader plays an important role as a sequencer
  • The leader can change if it is partitioned or fails

• The leader broadcasts (proposes) the messages to be delivered to all followers

[Figure: the ZooKeeper service with one Leader and four Followers; followers forward operations to the leader.]

Zab (Atomic Broadcast)

• Two-phase commit (2PC)

The leader receives a request, e.g., write(x, 1).

Step 1. The leader assigns a monotonically increasing id (zxid) to the request.

Step 2. The leader proposes the change with its zxid to the followers.

Step 3. Followers acknowledge the proposal.

Q: In what situation may a follower reject the proposal?

Step 4. The leader commits the change once it has received acks (votes) from more than ½ of the servers.
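The four steps can be sketched as a small single-process simulation. This is an illustration under assumptions, not the Zab implementation: `Leader` and `Follower` are hypothetical classes, the zxid is a plain counter, the leader counts its own vote, and an unreachable follower simply never acknowledges.

```python
class Follower:
    def __init__(self, up=True):
        self.up = up         # False models a crashed or partitioned follower
        self.pending = {}    # zxid -> proposed change, not yet committed
        self.committed = []  # (zxid, change) in commit order

    def propose(self, zxid, change):
        """Step 3: log the proposal and acknowledge it (no ack if unreachable)."""
        if not self.up:
            return False
        self.pending[zxid] = change
        return True

    def commit(self, zxid):
        if self.up and zxid in self.pending:
            self.committed.append((zxid, self.pending.pop(zxid)))

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1
        self.committed = []

    def broadcast(self, change):
        zxid = self.next_zxid                # Step 1: monotonically increasing id
        self.next_zxid += 1
        acks = 1                             # assumption: the leader votes for itself
        for f in self.followers:             # Step 2: propose to every follower
            acks += f.propose(zxid, change)  # Step 3: collect acknowledgements
        ensemble = len(self.followers) + 1
        if acks > ensemble // 2:             # Step 4: commit on more than 1/2 of the votes
            self.committed.append((zxid, change))
            for f in self.followers:
                f.commit(zxid)
            return zxid
        return None                          # not enough acks, nothing is committed
```

With five servers, a write still commits when two followers are down (three votes out of five), but not when three are down.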

Leader Election

• If a server finds the leader disconnected or failed, it tries to become the leader (same 2PC protocol).

[Figure: a leader candidate sends its proposal to the followers.]

A candidate becomes the leader when it receives more than ½ of the votes.

If followers receive multiple proposals, they vote for the candidate with the highest zxid.
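The voting rule can be sketched as a single counting round. This is a simplified illustration, not ZooKeeper's election protocol: `elect` is a hypothetical function, each voter votes once, and ties on zxid are broken by candidate id, which is an assumption added to make the example deterministic.

```python
from collections import Counter

def elect(last_zxid, proposals_seen, ensemble_size):
    """Return the elected candidate, or None if nobody reaches a majority.

    last_zxid: {candidate: highest zxid that candidate has logged}
    proposals_seen: {voter: [candidates whose proposal reached that voter]}
    """
    votes = Counter()
    for voter, seen in proposals_seen.items():
        if seen:
            # Vote for the candidate with the highest zxid (tie broken by id).
            votes[max(seen, key=lambda c: (last_zxid[c], c))] += 1
    for candidate, count in votes.items():
        if count > ensemble_size // 2:   # needs more than 1/2 of the ensemble
            return candidate
    return None
```

Preferring the highest zxid means the new leader has seen at least as many proposals as any other candidate its voters heard from, so committed changes are not lost on failover.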

References

• “ZooKeeper: Wait-free coordination for Internet-scale systems”, USENIX ATC ’10 (by Hunt et al.)

• Zab protocol: “A simple totally ordered broadcast protocol”, LADIS ’08 (by Reed and Junqueira)

• “Linearizability: A Correctness Condition for Concurrent Objects”, TOPLAS 1990 (by Herlihy and Wing)

• “Designing Data-Intensive Applications”, O’Reilly 2017 (by Martin Kleppmann)
