Page 1

Big Data II: Stream Processing and Coordination

CS 240: Computing Systems and Concurrency, Lecture 22

Marco Canini
Credits: Michael Freedman and Kyle Jamieson developed much of the original material.

Selected content adapted from A. Haeberlen.

Page 2

• Single node
– Read data from socket
– Process
– Write output

2

Simple stream processing

Page 3

• Convert Celsius temperature to Fahrenheit
– Stateless operation: emit (input * 9 / 5) + 32

3

Examples: Stateless conversion

CtoF
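A minimal sketch of this stateless operator in plain Python (a hypothetical generator pipeline, not any particular framework’s API):

def ctof(stream):
    # Stateless: each Celsius reading maps independently to one Fahrenheit reading
    for celsius in stream:
        yield (celsius * 9 / 5) + 32  # emit (input * 9 / 5) + 32

print(list(ctof([0, 20, 100])))  # [32.0, 68.0, 212.0]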

Page 4

• Function can filter inputs
– if (input > threshold) { emit input }

4

Examples: Stateless filtering

Filter

Page 5

• Compute EWMA of Fahrenheit temperature
– new_temp = ⍺ * CtoF(input) + (1 - ⍺) * last_temp
– last_temp = new_temp
– emit new_temp

5

Examples: Stateful conversion

EWMA
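This operator is stateful: last_temp must survive across inputs. A minimal sketch in Python (hypothetical and framework-free; seeding the state with the first reading is an assumption, since the slide does not say how last_temp is initialized):

def ewma(stream, alpha=0.5):
    last_temp = None                      # operator state, kept across tuples
    for celsius in stream:
        f = (celsius * 9 / 5) + 32        # CtoF(input)
        if last_temp is None:
            new_temp = f                  # assumed: seed state with first reading
        else:
            new_temp = alpha * f + (1 - alpha) * last_temp
        last_temp = new_temp              # state update
        yield new_temp                    # emit new_temp

print(list(ewma([20, 22, 21])))  # [68.0, 69.8, 69.8] (up to float rounding)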

Page 6

• E.g., average value per window
– Window can be # elements (10) or time (1s)
– Windows can be disjoint, i.e., “tumbling” (every 5s)
– Windows can overlap, i.e., “sliding” (5s window every 1s)

6

Examples: Aggregation (stateful)

Avg
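A sketch of a count-based disjoint window in Python (hypothetical; a time-based window would batch by timestamp instead):

def windowed_avg(stream, n=10):
    window = []                       # state: contents of the current window
    for x in stream:
        window.append(x)
        if len(window) == n:
            yield sum(window) / n     # emit the per-window average
            window = []               # disjoint windows: reset state

print(list(windowed_avg(range(20), n=10)))  # [4.5, 14.5]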

Page 7

7

Stream processing as chain

[Diagram: CtoF → Filter → Avg]
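Since each operator consumes the previous one’s output stream, the chain composes directly. A sketch reusing the hypothetical ctof and windowed_avg generators from above:

def filter_above(stream, threshold):
    for x in stream:
        if x > threshold:
            yield x                   # if (input > threshold) { emit input }

readings = [15, 5, 25, 30, 2, 18, 27, 9, 22, 11, 14, 8]
pipeline = windowed_avg(filter_above(ctof(readings), threshold=50), n=3)
print(list(pipeline))                 # window averages of Fahrenheit readings above 50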

Page 8

8

Stream processing as directed graph

[Diagram: sensor type 1 feeds CtoF → Filter → Avg; sensor type 2 feeds KtoF into the same pipeline; outputs go to alerts and storage.]

Page 9

Enter “BIG DATA”

9

Page 10

• Large amounts of data to process in real time

• Examples
– Social network trends (#trending)
– Intrusion detection systems (networks, datacenters)
– Sensors: detect earthquakes by correlating vibrations of millions of smartphones
– Fraud detection
• Visa: 2,000 txn/sec on average, peak ~47,000/sec

10

The challenge of stream processing

Page 11

Tuple-by-tuple:
input ← read
if (input > threshold) {
  emit input
}

Micro-batch:
inputs ← read
out = []
for input in inputs {
  if (input > threshold) {
    out.append(input)
  }
}
emit out

11

Scale “up”

Page 12

Tuple-by-tuple: lower latency, lower throughput

Micro-batch: higher latency, higher throughput

12

Scale “up”

Why? Each read/write is a system call into the kernel. More cycles go to kernel/application transitions (context switches), and fewer are actually spent processing data.

Page 13

13

Scale “out”

Page 14

14

Stateless operations: trivially parallelized

[Diagram: three parallel C → F operators.]

Page 15

• Aggregations:
– Need to join results across parallel computations

15

State complicates parallelization

[Diagram: CtoF → Filter → Avg]

Page 16

• Aggregations:
– Need to join results across parallel computations

16

State complicates parallelization

[Diagram: three parallel CtoF → Filter → Sum/Cnt branches whose partial results feed a single Avg operator.]
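Each branch emits a (sum, count) pair rather than a local average, because averages do not compose across unevenly sized partitions while sums and counts combine exactly. A sketch of the join step (hypothetical):

def merge_avg(partials):
    # partials: per-branch (sum, count) partial aggregates
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(merge_avg([(10.0, 2), (30.0, 3), (20.0, 5)]))  # 6.0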

Page 17

• Aggregations:
– Need to join results across parallel computations

17

Parallelization complicates fault-tolerance

[Diagram: the same three parallel CtoF → Filter → Sum/Cnt branches feeding Avg; the Avg operator blocks until all branches deliver.]

Page 18

18

Parallelization complicates fault-tolerance

[Diagram: the same three parallel CtoF → Filter → Sum/Cnt branches feeding Avg; the Avg operator blocks until all branches deliver.]

Can we ensure exactly-once semantics?

Page 19

• Compute trending keywords
– E.g.,

19

Can parallelize joins

[Diagram: portions of the tweet stream feed parallel Sum/key operators, which merge into a single Sort/top-k operator that blocks.]

Page 20

20

Can parallelize joins

[Diagram: hash-partitioned tweets feed parallel Sum/key operators; each partition sorts and emits a local top-k, and a final stage (1) merges, (2) sorts, and (3) takes the global top-k.]
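A sketch of this plan in Python (hypothetical): with hash partitioning each key lives in exactly one partition, so the per-partition top-k lists can be merged without double counting:

from collections import Counter
from heapq import nlargest

def local_top_k(partition, k):
    counts = Counter(partition)                               # Sum / key in this partition
    return nlargest(k, counts.items(), key=lambda kv: kv[1])  # Sort, local top-k

def global_top_k(local_lists, k):
    merged = [item for part in local_lists for item in part]  # 1. merge
    return nlargest(k, merged, key=lambda kv: kv[1])          # 2. sort, 3. top-k

parts = [["a", "a", "a", "d"], ["b", "b"], ["c", "c", "c", "c"]]  # hash-partitioned keys
print(global_top_k([local_top_k(p, 2) for p in parts], 2))        # [('c', 4), ('a', 3)]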

Page 21

21

Parallelization complicates fault-tolerance

[Diagram: the same hash-partitioned top-k plan as on the previous slide.]

Page 22

A Tale of Four Frameworks

1. Record acknowledgement (Storm)

2. Micro-batches (Spark Streaming, Storm Trident)

3. Transactional updates (Google Cloud Dataflow)

4. Distributed snapshots (Flink)

22

Page 23

• Architectural components
– Data: streams of tuples, e.g., Tweet = <Author, Msg, Time>
– Sources of data: “spouts”
– Operators to process data: “bolts”
– Topology: directed graph of spouts & bolts

23

Apache Storm

Page 24

• Multiple processes (tasks) run per bolt

• Incoming streams split among tasks
– Shuffle grouping: round-robin distribute tuples to tasks
– Fields grouping: partitioned by key / field
– All grouping: all tasks receive all tuples (e.g., for joins)

24

Apache Storm: Parallelization

Page 25

• Goal: ensure each input is “fully processed”

• Approach: DAG / tree edge tracking
– Record edges that get created as a tuple is processed
– Wait for all edges to be marked done
– Inform the source (spout) of the data when complete; otherwise, it resends the tuple

• Challenge: “at least once” means:
– Bolts can receive a tuple more than once
– Replay can be out of order
– ... the application needs to handle this

25

Fault tolerance via record acknowledgement (Apache Storm – at least once semantics)

Page 26

• Spout assigns a new unique ID to each tuple

• When a bolt “emits” a dependent tuple, it informs the system of the dependency (a new edge)

• When a bolt finishes processing a tuple, it calls ACK (or can FAIL)

• Acker tasks:
– Keep track of all emitted edges and receive ACK/FAIL messages from bolts
– When messages have been received about all edges in the graph, inform the originating spout

• Spout garbage-collects the tuple or retransmits it

• Note: best-effort delivery is obtained by not generating dependencies on downstream tuples

26

Fault tolerance via record acknowledgement (Apache Storm – at least once semantics)
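A conceptual sketch of the acker’s bookkeeping in Python (not Storm’s actual implementation, which compresses the outstanding-edge set into a single XOR of random edge IDs; a spout object with complete/replay callbacks is assumed):

class Acker:
    def __init__(self, spout):
        self.spout = spout
        self.pending = {}                     # root tuple id -> outstanding edge ids

    def add_edge(self, root, edge):
        # A bolt emitted a dependent tuple: a new edge in the tuple tree
        self.pending.setdefault(root, set()).add(edge)

    def ack(self, root, edge):
        edges = self.pending[root]
        edges.discard(edge)                   # this edge is done
        if not edges:                         # all edges marked done
            del self.pending[root]
            self.spout.complete(root)         # spout garbage-collects the tuple

    def fail(self, root):
        del self.pending[root]
        self.spout.replay(root)               # spout retransmits: at-least-once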

Page 27

• Split the stream into a series of small, atomic batch jobs (each of X seconds)

• Process each individual batch using the Spark “batch” framework
– Akin to in-memory MapReduce

• Emit each micro-batch result
– RDD = “Resilient Distributed Dataset”

27

Apache Spark Streaming: Discretized Stream Processing

[Diagram: live data stream → Spark Streaming → batches of X seconds → Spark → processed results.]

Page 28

28

Apache Spark Streaming: Dataflow-oriented programming

from pyspark.streaming import StreamingContext

# Create a local StreamingContext with a batch interval of 1 second
# (sc is an existing SparkContext)
ssc = StreamingContext(sc, 1)
# Create a DStream that reads from a network socket
lines = ssc.socketTextStream("localhost", 9999)

words = lines.flatMap(lambda line: line.split(" "))  # Split each line into words

# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

wordCounts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
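To try this locally, the usual companion from the Spark Streaming docs is a netcat text server on the same port (nc -lk 9999): each line typed there becomes input to the next 1-second batch.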

Page 29

29

Apache Spark Streaming: Dataflow-oriented programming

from pyspark.streaming import StreamingContext

# Create a local StreamingContext with a batch interval of 1 second
ssc = StreamingContext(sc, 1)
# Create a DStream that reads from a network socket
lines = ssc.socketTextStream("localhost", 9999)

words = lines.flatMap(lambda line: line.split(" "))  # Split each line into words

# Count each word over a sliding window
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKeyAndWindow(
    lambda x, y: x + y,   # add counts from batches entering the window
    lambda x, y: x - y,   # subtract counts from batches leaving the window
    3, 2)                 # window length 3s, slide interval 2s
wordCounts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
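The second lambda is the “inverse reduce”: rather than recomputing each window from scratch, Spark updates the previous window’s counts by adding the batches that entered the window and subtracting those that left it. The last two arguments are the window length (3 seconds) and slide interval (2 seconds), both multiples of the 1-second batch interval; in practice this incremental form also requires checkpointing to be enabled.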

Page 30

• Can build on batch frameworks (Spark) and tuple-by-tuple (Storm)
– Tradeoff between throughput (higher) and latency (higher)

• Each micro-batch may succeed or fail
– Original inputs are replicated (memory, disk)
– On failure, the latest micro-batch can simply be recomputed (trickier if stateful)

• DAG is a pipeline of transformations from micro-batch to micro-batch
– Lineage info in each RDD specifies how it was generated from other RDDs

• To support failure recovery:
– Occasionally checkpoint RDDs (state) by replicating them to other nodes
– To recover, another worker (1) gets the last checkpoint, (2) determines upstream dependencies, then (3) starts recomputing from the checkpoint using those upstream dependencies (downstream might filter)

30

Fault tolerance via micro-batches (Apache Spark Streaming, Storm Trident)

Page 31

• Computation is a long-running DAG of continuous operators

• For each intermediate record at an operator:
– Create a commit record including the input record, the state update, and the derived downstream records generated
– Write the commit record to a transactional log / DB

• On failure, replay the log to:
– Restore a consistent state of the computation
– Replay lost records (which operators further downstream might filter)

• Requires: high-throughput writes to a distributed store

31

Fault Tolerance via transactional updates (Google Cloud Dataflow)
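A conceptual sketch of such a commit record (a hypothetical structure, not Dataflow’s actual format):

from dataclasses import dataclass, field

@dataclass
class CommitRecord:
    # One atomic log entry per record processed at an operator
    input_record: bytes                                  # the consumed input
    state_update: dict                                   # the operator's state delta
    derived_records: list = field(default_factory=list)  # outputs emitted downstream

On replay, applying the logged state updates restores a consistent operator state, and re-emitting derived_records regenerates lost downstream traffic, which operators further downstream may filter as duplicates.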

Page 32

• Rather than logging each record at each operator, take system-wide snapshots

• Snapshotting:
– Determine a consistent snapshot of system-wide state (including in-flight records and operator state)
– Store the state in durable storage

• Recovery:
– Restore the latest snapshot from durable storage
– Rewind the stream source to the snapshot point and replay inputs

• The algorithm is based on Chandy-Lamport distributed snapshots, but also captures the stream topology

32

Fault Tolerance via distributed snapshots (Apache Flink)

Page 33

• Use markers (barriers) in the input data stream to tell downstream operators when to consistently snapshot

Fault Tolerance via distributed snapshots (Apache Flink)

33
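A conceptual sketch of barrier handling at a single operator (hypothetical; Flink additionally aligns barriers across multiple input channels before taking the snapshot):

BARRIER = object()  # marker injected into the stream by the source

def on_element(state, elem, snapshots, downstream):
    if elem is BARRIER:
        snapshots.append(dict(state))         # consistent local snapshot of operator state
        downstream.append(BARRIER)            # forward the marker so successors snapshot too
    else:
        state[elem] = state.get(elem, 0) + 1  # normal processing (word count, say)
        downstream.append(elem)

state, snapshots, out = {}, [], []
for e in ["a", "b", BARRIER, "a"]:
    on_element(state, e, snapshots, out)
print(snapshots)  # [{'a': 1, 'b': 1}]: the operator state exactly at the barrier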

Page 34

Coordination

Practical consensus

34

Page 35

• Lots of apps need various coordination primitives
– Leader election
– Group membership
– Locks
– Leases

• The common requirement is consensus, but we’d like to avoid duplication
– Duplicating is bad, and duplicating poorly is even worse
– Maintenance?

35

Needs of distributed apps

Page 36

• One approach
– For each coordination primitive, build a specific service

• Some recent examples
– Chubby, Google [Burrows et al., USENIX OSDI, 2006]
• Lock service
– Centrifuge, Microsoft [Adya et al., USENIX NSDI, 2010]
• Lease service

36

How do we go about coordination?

Page 37

• Alternative approach
– A coordination service
– Develop a set of lower-level primitives (i.e., an API) that can be used to implement higher-level coordination services
– Use the coordination service API across many applications

• Example: Apache ZooKeeper

37

How do we go about coordination?

Page 38

• A “coordination kernel”
– Provides a file system abstraction and API that enables realizing several coordination primitives
• Group membership
• Leader election
• Locks
• Queueing
• Barriers
• Status monitoring

38

ZooKeeper

Page 39

• In brief, it’s a file system with a simplified API
• Only whole-file reads and writes
– No appends, inserts, or partial reads
• Files are znodes, organized in a hierarchical namespace
• The payload is designed not for application data storage but for application metadata storage
• Znodes also have associated version counters and some metadata (e.g., flags)

39

Data model

Page 40

• CAP perspective: ZooKeeper is CP
– It guarantees consistency
– May sacrifice availability under system partitions

• Strict quorum-based replication for writes

• Consistency (safety)
– FIFO client order: all of a client’s requests are executed in the order they were sent
• Matters for asynchronous calls
– Linearizable writes: all writes are linearizable
– Serializable reads: reads can be served locally by any server, which may return a stale value

40

Semantics

Page 41

• Regular znodes
– May have children
– Explicitly deleted by clients

• Ephemeral znodes
– May not have children
– Disappear when deleted or when their creator’s session terminates
• Session termination can be deliberate or due to failure

• Sequential flag
– Can be combined with either znode type (the lock recipes below use ephemeral + sequential)
– Children have a strictly increasing integer appended to their names

41

Types of znodes

Page 42

• create(znode, data, flags)
– Flags denote the type of the znode: REGULAR, EPHEMERAL, SEQUENTIAL
– A znode must be addressed by a full path in all operations (e.g., ‘/app1/foo/bar’)
– Returns the znode path

• delete(znode, version)
– Deletes the znode if version equals the actual version of the znode
– Set version = -1 to omit the conditional check (applies to other operations as well)

42

Client API

Page 43

• exists(znode, watch)
– Returns true if the znode exists, false otherwise
– The watch flag enables a client to set a watch on the znode
– A watch is a subscription to receive information from ZooKeeper when this znode is changed
– NB: a watch may be set even if a znode does not exist
• The client will then be informed when the znode is created

• getData(znode, watch)
– Returns the data stored at this znode
– The watch is not set unless the znode exists

43

Client API (cont’d)

Page 44

• setData(znode, data, version)
– Rewrites the znode with data, if version is the current version number of the znode
– version = -1 applies here as well, to omit the condition check and force the setData

• getChildren(znode, watch)
– Returns the set of children of the znode

• sync()
– Waits for all updates pending at the start of the operation to be propagated to the ZooKeeper server that the client is connected to

44

Client API (cont’d)

Page 45

Some examples

45

Page 46

• Propose(v)
create(“/c/proposal-”, “v”, SEQUENTIAL)

• Decide()
C = getChildren(“/c”)
Select znode z in C with smallest sequence number
v’ = getData(z)
Decide v’

46

Implementing consensus

Page 47

• Clients are initialized with the name of a znode
– E.g., “/config”

config = getData(“/config”, TRUE)
while (true)
wait for watch notification on “/config”
config = getData(“/config”, TRUE)

Note: a client may miss some configurations, but it will always “refresh” when it realizes the configuration is stale

47

Simple configuration management

Page 48

• Idea: leverage ephemeral znodes
• Fix a znode “/group”
• Assume every process (client) is initialized with its own unique name and ID
– What to do if there are no unique names?

joinGroup()
create(“/group/” + name, [address, port], EPHEMERAL)

getMembers()
getChildren(“/group”, false)

48

Group membership

(Set the watch argument to true to get notified about membership changes.)
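A minimal sketch of this recipe with the kazoo Python client (the ensemble address, names, and payloads are illustrative assumptions):

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/group")

def joinGroup(name, address):
    # EPHEMERAL: the znode disappears if this client's session terminates
    zk.create("/group/" + name, address.encode(), ephemeral=True)

def getMembers(watch=None):
    # Pass a callback as watch to be notified about membership changes
    return zk.get_children("/group", watch=watch)

joinGroup("node-1", "10.0.0.1:8080")
print(getMembers())  # ['node-1', ...]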

Page 49

Lock(filename)
1: create(filename, “”, EPHEMERAL)
2: if create is successful
3: return // have lock
4: else
5: getData(filename, TRUE)
6: wait for filename watch
7: goto 1:

Release(filename)
delete(filename)

49

A simple lock

Page 50

• Herd effect
– If many clients wait for the lock, they will all try to get it as soon as it is released

• Only implements exclusive locking

50

Problems?

Page 51

Lock(filename)

1: myLock = create(filename + “/lock-”, “”, EPHEMERAL & SEQUENTIAL)

2: C = getChildren(filename, false)

3: if myLock is the lowest znode in C then return

4: else

5: precLock = znode in C ordered just before myLock

6: if exists(precLock, true)

7: wait for precLock watch

8: goto 2:

Release(filename)

delete(myLock)

51

Simple Lock without Herd Effect
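For reference, the kazoo client ships essentially this recipe (ephemeral sequential znodes plus a watch on the predecessor) as a built-in Lock; a minimal usage sketch, reusing zk from the earlier group-membership sketch:

lock = zk.Lock("/app/lock", "client-1")  # second argument identifies the contender
with lock:   # blocks until acquired, without the herd effect
    pass     # critical section; the lock is released on exit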

Page 52

• The previous lock solves the herd effect but makes reads block other reads

• How can we arrange that reads always get the lock unless there is a concurrent write?

52

Read/Write Locks

Page 53

Write Lock(filename)
1: myLock = create(filename + “/write-”, “”, EPHEMERAL & SEQUENTIAL)
[...] // same as simple lock w/o herd effect

Read Lock(filename)
1: myLock = create(filename + “/read-”, “”, EPHEMERAL & SEQUENTIAL)
2: C = getChildren(filename, false)
3: if no write znodes lower than myLock in C then return
4: else
5: precLock = write znode in C ordered just before myLock
6: if exists(precLock, true)
7: wait for precLock watch
8: goto 3:

Release(filename)

delete(myLock)

53

Read/Write Locks

Page 54

A brief look inside

54

Page 55

55

ZooKeeper components

[Diagram: write requests flow through the request processor into ZAB atomic broadcast, which commits transactions (Tx) to the commit log and the in-memory replicated DB; read requests are served directly from the in-memory replicated DB.]

Page 56

• Fully replicated
– To be contrasted with partitioning/placement in storage systems

• Each server has a copy of the in-memory DB
– Stores the entire znode tree
– Default max 1 MB per znode (configurable)

• Crash-recovery model
– Commit log, plus periodic snapshots of the database

56

ZooKeeper DB

Page 57

• Used to totally order write requests
– Relies on a quorum of servers (f+1 out of 2f+1)

• ZAB internally elects a leader replica

• ZooKeeper adopts this notion of a leader
– Other servers are followers

• All writes are sent by followers to the leader
– The leader sequences the requests and invokes ZAB atomic broadcast

57

ZAB: a very brief overview

Page 58

• Upon receiving a write request:
– The leader calculates what state the system will be in after the write is applied
– It transforms the operation into a transactional update

• Transactional updates are then processed by ZAB and the DB
– This guarantees idempotency of updates to the DB originating from the same operation

• Idempotency matters because ZAB may redeliver a message

58

Request processor

Page 59

That’s all
Hope you enjoyed CS 240

Review session: Dec 6, in class

Final exam: Dec 10, 9AM-12PM, Bldg 9: Lecture Hall 1

59

