Winter is coming? Not if ZooKeeper is there!

Post on 13-Apr-2017


Presented by: Joydeep Banik Roy, Sr. Software Engineer, Cerner Corporation

Winter is coming?

Distributed System

“A distributed system is capable of exploiting the capacity of multiple processors by running components, perhaps replicated, in parallel. A system might be distributed geographically for strategic reasons, such as the presence of servers in multiple locations participating in a single application.”
- ZooKeeper: Distributed Process Coordination, O’Reilly

Fallacies of the Distributed System

o The network is reliable.
o Latency is zero.
o Bandwidth is infinite.
o The network is secure.
o Topology doesn't change.
o There is one administrator.
o Transport cost is zero.
o The network is homogeneous.

Coordination

A coordination task is a task involving multiple processes for the purposes of cooperation or to regulate contention.

Examples:
o Master election
o Crash detection
o Group membership management
o Metadata management

What is ZooKeeper?

“Distributed, open-source coordination service for distributed applications that exposes a simple API, like a file system API, that applications can build upon to implement higher level services for synchronization, configuration maintenance, and groups and naming.”

/master “richman.com”
/worker
    /worker/worker-1 “poorman.com”
/tasks
    /tasks/task-1 “poor-to-rich.sh”

How it does it: Shared Storage

[Diagram: a ZooKeeper ensemble of five servers, Server 1 to Server 5, with Server 4 as the leader. The application reaches the servers through client libraries over sessions 0xAB, 0x11, 0x2A and 0x10. The shared znode tree:]

/master “richman.com”
/worker
    /worker/worker-1 “poorman.com”
/tasks
    /tasks/task-1 “run-cmd”

The ZooKeeper Data Model

ZooKeeper has a hierarchical namespace. Each node in the namespace is called a ZNode. Every ZNode has data (given as byte[]) and can have children.

parent : “/zookeeper”
|-- child1 : “/master”
|-- child2 : “/workers”
|-- child3 : “/tasks”
`-- task-1 : “run cmd;”

ZNode properties:
o Maintains a stat structure with version numbers for data changes, ACL changes and timestamps
o Version number increases with changes
o Data is read and written in its entirety
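To make the version and whole-payload properties concrete, here is a minimal in-memory sketch. It is not the ZooKeeper client API: the `ZNode` class, `BadVersionError` and the method names are hypothetical stand-ins. The one convention borrowed from ZooKeeper is that an expected version of -1 skips the version check.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale (hypothetical name)."""


class ZNode:
    """Toy znode: a byte payload plus a data version counter."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0  # bumped on every successful data change

    def get_data(self):
        # Data is always returned whole, together with its current version.
        return self.data, self.version

    def set_data(self, data: bytes, expected_version: int = -1):
        # -1 means "skip the check"; any other value must match exactly.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, found {self.version}"
            )
        self.data = data  # the payload is replaced whole, never patched
        self.version += 1
        return self.version
```

A writer that read version 0 and passes `expected_version=0` succeeds once; a second writer still holding version 0 gets `BadVersionError`, so it cannot silently overwrite a newer value.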

Znode Example: Simple Lock

[Diagram, three steps on /Resource:]
1. Processes: Process1, Process2, Process3. /Lock “PROCESS1”
2. Processes: Process2, Process3. /Lock “PROCESS2”
3. Processes: Process3. /Lock “PROCESS3”
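One way to read the simple lock example is that the /Lock znode exists only while some process holds the lock, and its data names the holder. The sketch below simulates that idea in memory. `SimpleLock` and its methods are hypothetical names, and a real implementation would create and delete a znode through a ZooKeeper client instead of setting a field.

```python
class SimpleLock:
    """Toy stand-in for a single /Lock znode whose data names the holder."""

    def __init__(self):
        self.holder = None  # None means the /Lock znode does not exist

    def try_acquire(self, process: str) -> bool:
        # Creating a znode that already exists fails, so only one process wins.
        if self.holder is not None:
            return False
        self.holder = process
        return True

    def release(self, process: str) -> None:
        # Only the current holder removes the znode; other callers are ignored.
        if self.holder == process:
            self.holder = None
```

With this sketch, PROCESS1 acquires first, PROCESS2 is refused until PROCESS1 releases, and so on down the line.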

ZNODE Types

Persistent: exists until deleted explicitly.

Ephemeral: deleted once the client session ends.

Sequential: appends a monotonically increasing counter to the end of the path.
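The three types can be illustrated with a small in-memory store. This is a sketch under stated assumptions, not the ZooKeeper API: `TinyStore`, `create` and `close_session` are hypothetical names. The counter is kept per parent and zero-padded to ten digits, which matches how ZooKeeper formats sequential suffixes.

```python
class TinyStore:
    """Toy znode store showing persistent, ephemeral and sequential nodes."""

    def __init__(self):
        self.nodes = {}     # path -> owning session id (None if persistent)
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            # Append a per-parent counter, zero-padded to ten digits.
            parent = path.rsplit("/", 1)[0]
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self.nodes:
            raise KeyError(f"node already exists: {path}")
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral nodes vanish with their session; persistent ones stay.
        for path in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[path]
```

For example, `create("/tasks/task-", sequential=True)` returns `/tasks/task-0000000000` on the first call and `/tasks/task-0000000001` on the second.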

Watches and Notifications

Event – Execution of update to a znode

Watch – one time trigger associated with a znode

Notification – When a watch is triggered by an event it generates a notification

“ZooKeeper always pays its debts”

“One important guarantee of notifications is that they are delivered to a client before any other change is made to the same znode”
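The one-time nature of a watch is easy to miss, so here is a small in-memory sketch of it. `WatchedNode` and its methods are hypothetical names, not the client API; the event label `NodeDataChanged` is borrowed from ZooKeeper's event types. Callbacks run synchronously inside `set_data`, which is how this toy keeps the ordering described in the quote above.

```python
class WatchedNode:
    """Toy znode whose watches fire once and are then discarded."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self._watchers = []

    def get_data(self, watch=None):
        # Reading with a watch registers a one-time callback.
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set_data(self, data: bytes):
        self.data = data
        # Take the current watchers and clear the list: each fires only once.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            # Notify before returning, i.e. before any later change can happen.
            callback("NodeDataChanged")
```

A client that wants to keep following a znode has to set a new watch each time it handles a notification, typically by reading the data again with a watch.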

ZooKeeper Guarantees

o Sequential Consistency - Updates from a client will be applied in the order that they were sent.
o Atomicity - Updates either succeed or fail. No partial results.
o Single System Image - A client will see the same view of the service regardless of the server that it connects to.
o Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
o Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound. Rather than serving stale data, a server will shut down and force the client to connect to another one with a more recent image.

ZooKeeper is Simple

ZNODE OPERATIONS (API)

READ          WRITE
getACL        setACL
exists        create
getChildren   delete
getData       setData

SYNC() call

Example : Master-Worker

[Diagram: znode tree for a master-worker application, with znodes /master, /worker, /task and /assign, child znodes /worker-1 and /task-1, and a /status znode holding “DONE”.]

ZooKeeper Recipes

● Configuration management – machines bootstrap config from a centralized source, facilitates simpler deployment/provisioning

● Naming service - like DNS, mappings of names to addresses

● Distributed synchronization - locks, barriers, queues

● Leader election - a common problem in distributed coordination

● Centralized and highly reliable (simple) data registry

Recipe #1 : Barriers

● Used for configuration management: the clients want to read a configuration, but the configuration is not yet ready.
● A barrier blocks the processing of a set of nodes until a condition is met.
● Therefore a /barrier znode is created.
● The client calls the ZooKeeper API's exists() function on the barrier node, with watch set to true.
● If exists() returns false, the barrier is gone and the client proceeds.
● Else, if exists() returns true, the client waits for a watch event from ZooKeeper for the barrier node.
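The barrier steps above can be sketched in memory as follows. `BarrierNode` and `enter` are hypothetical names, and the watch callback stands in for the notification a real client would receive. As a simplification, this toy registers a watch only while the node exists.

```python
class BarrierNode:
    """Toy /barrier znode supporting exists() with a one-time watch."""

    def __init__(self):
        self.present = True
        self._watchers = []

    def exists(self, watch=None):
        # Simplification: the watch is registered only while the node exists.
        if self.present and watch is not None:
            self._watchers.append(watch)
        return self.present

    def delete(self):
        self.present = False
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()


def enter(barrier, proceed):
    """Run proceed() now if the barrier is gone, else after its watch fires."""
    if not barrier.exists(watch=lambda: enter(barrier, proceed)):
        proceed()
```

Each waiting client re-checks `exists()` when its watch fires instead of assuming the barrier is gone, which mirrors the check-then-wait loop in the recipe.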

Recipe #2 : Distributed Exclusive Lock

● Assume there are N clients trying to acquire a lock.
● Each client creates an ephemeral, sequential znode under the path /Cluster/_locknode_.
● Each client requests the list of children of the lock znode (i.e. _locknode_).
● The client with the lowest ID according to natural ordering holds the lock.
● Every other client sets a watch on the znode with the ID immediately preceding its own. This is done to avoid “The Herd Effect”.
● When notified, a client checks again whether it now holds the lock.
● A client wishing to release the lock deletes its node, which triggers the next client in line to acquire the lock.

ZK
|---Cluster
    +---hadoopConfig
    +---memberships
    +---_locknode_
        +---host1-HiveClient
        +---host2-Impala
        +---host3-YARN
        +--- …
        \---hostN-Crunch
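The decision each client makes after listing the lock znode's children can be written as a pure function. This is a sketch, not a full lock: `sequence_of` and `lock_step` are hypothetical helpers, and the `lock-` child names are an assumption chosen for illustration.

```python
def sequence_of(name: str) -> int:
    """Return the numeric suffix, e.g. 'lock-0000000007' -> 7."""
    return int(name.rsplit("-", 1)[1])


def lock_step(children, mine):
    """Decide what the owner of child `mine` should do next.

    Returns ("acquired", None) if `mine` has the lowest sequence number,
    otherwise ("wait", predecessor), where predecessor is the single
    child this client should watch.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(mine)
    if index == 0:
        return ("acquired", None)
    # Watch only the immediate predecessor, so one deletion wakes one client.
    return ("wait", ordered[index - 1])
```

Because every waiter watches a different znode, releasing the lock notifies exactly one client instead of waking all N, which is the herd effect the recipe avoids.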

Recipe #3 : Leader Election

● A znode, say “/leader/election-path”, serves as the election path.
● All participants of the election process create an ephemeral-sequential node on the same election path.
● The node with the smallest sequence number is the leader.
● Each “follower” node listens to the node with the next lower sequence number.
● Upon leader removal, a participant goes to the election path and finds the new leader, or becomes the leader if it has the lowest sequence number.
● Upon session expiration, check the election state and go to election if needed.
● Applications may consider creating a separate znode to acknowledge that the leader has executed the leader procedure.
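A toy model of the election path helps show what happens when a participant in the middle of the chain, or the leader itself, disappears. `Election` and its methods are hypothetical names; sequence numbers stand in for the ephemeral-sequential znodes, and `leave` stands in for a session ending.

```python
class Election:
    """Toy election path: lowest live sequence number is the leader."""

    def __init__(self):
        self._next = 0
        self.members = {}  # sequence number -> participant name

    def join(self, name: str) -> int:
        # Each participant gets the next sequence number, never reused.
        seq = self._next
        self._next += 1
        self.members[seq] = name
        return seq

    def leave(self, seq: int) -> None:
        # Models the ephemeral node vanishing when its session ends.
        self.members.pop(seq, None)

    def leader(self):
        return self.members[min(self.members)] if self.members else None

    def watched_by(self, seq: int):
        # The node a follower listens to: next lower live sequence number.
        lower = [s for s in self.members if s < seq]
        return max(lower) if lower else None
```

When a follower's watched node goes away, it recomputes `watched_by`; if the result is `None`, it holds the lowest sequence number and takes over as leader. A participant that rejoins after expiry gets a new, higher sequence number and does not displace the current leader.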

Recipe #4 : Distributed Queue

● A znode /queue is created.
● Distributed clients create EPHEMERAL-SEQUENTIAL znodes by passing a path name ending in /queue- to create().
● Path names have the form /queue/queue-X, where X is a monotonically increasing number.
● If a single consumer takes items out of the queue, they will be ordered FIFO.
● The client calls getChildren() and processes all queue nodes until exhausted. It is guaranteed not to miss anything, as the nodes are ordered FIFO.
● Priority queues come with a small change.
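The FIFO behaviour follows from the sequence suffix alone, as this in-memory sketch shows. `ZkStyleQueue`, `put` and `drain` are hypothetical names for a single-consumer toy; a real queue would create and delete child znodes through a client.

```python
class ZkStyleQueue:
    """Toy /queue znode whose children are named queue-<10-digit counter>."""

    def __init__(self):
        self._next = 0
        self.children = {}  # child name -> payload

    def put(self, payload: bytes) -> str:
        # The zero-padded suffix makes lexicographic order equal numeric order.
        name = f"queue-{self._next:010d}"
        self._next += 1
        self.children[name] = payload
        return name

    def drain(self):
        """List the children, then consume them in sequence order."""
        items = []
        for name in sorted(self.children):  # sorted() copies the names first
            items.append(self.children.pop(name))
        return items
```

The counter never resets, so an item added after a drain still sorts after everything that came before it.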

Apache Curator

Many more recipes are available in Apache Curator, which was open sourced by Netflix. Visit http://curator.apache.org/ for more recipes and their implementations.

Language Bindings

- ZooKeeper ships client libraries in: Java, C, Perl, Python

- Community-contributed client bindings are available for Scala, C#, Node.js, Ruby, Erlang, Go, Haskell: https://cwiki.apache.org/ZOOKEEPER/zkclientbindings.html

Who uses ZooKeeper?

References

● ZooKeeper: Distributed Process Coordination, by Flavio Junqueira and Benjamin Reed
● https://zookeeper.apache.org/ (it has some fabulous documentation!)
● http://curator.apache.org/ (check out the recipes!)
● Some really generous slides on SlideShare, like this one: http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper
● And others…

Questions?

DON’T FORGET TO RATE THIS TALK

THANK YOU