Go Stream
Matvey Arye, Princeton / Cloudflare
Albert Strasheim, Cloudflare
Awesome CDN service for websites big & small
Millions of requests a second at peak
24 data centers across the globe
Data Analysis
– Customer facing analytics
– System health monitoring
– Security monitoring
=> Need global view
Functionality
• Calculate aggregate functions on fast, big data
• Aggregate across nodes (across datacenters)
• Data stored at different time granularities
Basic Design Requirements
1. Reliability – Exactly-once semantics
2. High Data Volumes
Our Environment
[Diagram: multiple Sources feed Stream processing, which writes to Storage]
Basic Programming Model
[Diagram: chains of operators (Op) reading from and writing to Storage]
Existing Systems
S4 – the reliability model is not consistent
Storm – exactly-once semantics requires batching
Reliability exists only inside the stream processing system. What if a source goes down? The DB?
The Need For End-to-End Reliability
Source → Stream Processing → Storage
When the source comes back up, where does it start sending data from?
If using something like Storm, need additional reliability mechanisms
The Takeaway
Need end-to-end reliability
– or –
multiple reliability mechanisms
Reliability of the stream processing system alone is not enough
Design of Reliability
• Avoid queueing when the destination has failed
– Rely on storage at the edges
– Minimize replication
• Minimize edge cases
• No specialized hardware
Big Design Decisions
End-to-end reliability
Only transient operator state
Recovering From Failure
Source: "I am starting a stream with you. What have you already seen from me?"
Storage: "I've seen <X>."
Source: "Okie dokie. Here is all the new stuff."
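A minimal sketch of this handshake, assuming the storage layer durably tracks the highest ID committed per stream (the `Storage` type and its methods are illustrative, not the go-stream API):

```go
package main

import "fmt"

// Storage remembers the highest ID it has durably committed per stream.
type Storage struct {
	highest map[string]int
}

// HighestSeen answers the source's "what have you already seen from me?"
func (s *Storage) HighestSeen(stream string) int {
	return s.highest[stream]
}

// Commit records an item; IDs are assumed to arrive in order per stream.
func (s *Storage) Commit(stream string, id int) {
	if id > s.highest[stream] {
		s.highest[stream] = id
	}
}

func main() {
	st := &Storage{highest: map[string]int{"web-logs": 41}}

	// Source comes back up and asks where to resume from.
	resume := st.HighestSeen("web-logs")
	fmt.Println("resuming after id", resume)

	// Source replays only the new items.
	for id := resume + 1; id <= resume+3; id++ {
		st.Commit("web-logs", id)
	}
	fmt.Println("highest is now", st.HighestSeen("web-logs"))
}
```

Because only one number is stored per stream, the "what have I seen?" answer stays small, which is why ordering matters so much below.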
Tracking what you have seen
• Store an identifier for every item
– The answer to "what have I seen?" is huge
– Requires lots of storage for IDs
• Store one identifier for the highest number
– Parallel processing of ordered data is tricky
Tension between
• Parallelization – needed for high-volume data
• Ordering – needed for reliability
Go Makes This Easier
A language from Google designed for concurrency
• Goroutines run code
• Channels send data between goroutines
• Most synchronization is done by passing data
Goroutine Scheduling
Channels are FIFO queues with a maximum capacity, so a goroutine can be in one of 4 states:
1. Executing code
2. Waiting for a thread to execute code
3. Blocked receiving from a channel
4. Blocked sending to a channel
The scheduler optimizes the assignment of goroutines to threads.
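A small example of states 3 and 4: a buffered channel preserves FIFO order, blocks receivers while it is empty, and blocks senders once the buffer is full:

```go
package main

import "fmt"

func main() {
	// A channel with capacity 2: sends block once the buffer is full,
	// receives block while it is empty.
	ch := make(chan int, 2)

	ch <- 1 // buffered, does not block
	ch <- 2 // buffered, does not block

	go func() {
		// The buffer is full, so this goroutine blocks on the send
		// until main receives a value below (state 4).
		ch <- 3
	}()

	fmt.Println(<-ch, <-ch, <-ch) // FIFO order: 1 2 3
}
```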
Efficient Ordering Under The Hood
The source distributes items to workers in a specific order.
Each worker writes to its own count channel and result channel. Reading from each worker, in the same order:
1. Read one tuple off the count channel; assign the count to X
2. Read X tuples off the result channel
The count channel carries the number of output tuples for each input tuple; the result channel carries the actual result tuples.
Intuition behind the design
• Multiple output channels allow each worker to write independently.
• The count channel tells the reader how many tuples to expect. It does not block except when a result is needed to satisfy ordering.
• Judicious blocking allows the scheduler to use blocking as a signal for which worker to schedule.
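A toy sketch of the count-channel scheme. The round-robin distribution and the one-to-many operator (which emits x copies of x) are illustrative choices, not the go-stream implementation:

```go
package main

import "fmt"

// worker consumes inputs and, for each one, emits a count followed by that
// many results on its own channels, so the reader never needs to reorder.
func worker(in <-chan int, count chan<- int, out chan<- int) {
	for x := range in {
		count <- x // number of output tuples for this input
		for i := 0; i < x; i++ {
			out <- x // the actual result tuples
		}
	}
	close(count)
}

func main() {
	const nWorkers = 2
	ins := make([]chan int, nWorkers)
	counts := make([]chan int, nWorkers)
	outs := make([]chan int, nWorkers)
	for i := range ins {
		ins[i] = make(chan int, 4)
		counts[i] = make(chan int, 4)
		outs[i] = make(chan int, 16)
		go worker(ins[i], counts[i], outs[i])
	}

	// Source: distribute inputs round-robin, so the worker order
	// encodes the input order.
	inputs := []int{1, 2, 3, 4}
	for i, x := range inputs {
		ins[i%nWorkers] <- x
	}
	for _, in := range ins {
		close(in)
	}

	// Reader: visit workers in the same round-robin order; read one count,
	// then exactly that many results. Output order matches input order.
	var merged []int
	for i := 0; i < len(inputs); i++ {
		w := i % nWorkers
		n := <-counts[w]
		for j := 0; j < n; j++ {
			merged = append(merged, <-outs[w])
		}
	}
	fmt.Println(merged) // [1 2 2 3 3 3 4 4 4 4]
}
```

The reader only ever blocks on the worker whose result is next in order; the other workers keep filling their buffered channels independently.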
Throughput does not suffer
[Chart: tuples per second (0–10,000) vs. floating point operations per tuple (×1000: 2, 4, 8, 16, 32); the Ordered and Unordered series show comparable throughput.]
The Big Picture - Reliability
• Sources provide monotonically increasing IDs – per stream
• The stream processor preserves ordering – per source-stream
• A central DB maintains a mapping of: source-stream => highest ID processed
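Given those three properties, duplicate suppression at the sink reduces to a single comparison per item. A hedged sketch (the `Sink` type is illustrative):

```go
package main

import "fmt"

// Sink applies an item only if its ID is above the highest already
// processed for that source-stream, giving exactly-once effects on replay.
type Sink struct {
	highest map[string]int // source-stream => highest ID processed
}

// Apply returns true if the item was new, false if it was a duplicate.
func (s *Sink) Apply(stream string, id int) bool {
	if id <= s.highest[stream] {
		return false // already seen: drop the replayed item
	}
	s.highest[stream] = id
	return true
}

func main() {
	sink := &Sink{highest: map[string]int{}}
	// A source replays IDs 1..3 after a failure; only 4 is new.
	for _, id := range []int{1, 2, 3, 1, 2, 3, 4} {
		fmt.Println(id, sink.Apply("dc1-logs", id))
	}
}
```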
Functionality of Stream Processor
• Compression, serialization
• Partitioning for distributed sinks
• Bucketing – taking individual records and constructing aggregates
– Across source nodes
– Across time, with adjustable granularity
• Batching – submitting many records at once to the DB
• Bucketing and batching are all done with transient state
Where to get the code
Stable: https://github.com/cloudflare/go-stream
Bleeding Edge: https://github.com/cevian/go-stream
Data Model
Streaming OLAP-like cubes
Useful summaries of high-volume data
Cube Dimensions
[Diagram: a cube with a Time dimension (01:01:00, 01:01:01) and a URL dimension (foo.com/r, foo.com/q, bar.com/n, bar.com/m)]
Cube Aggregates
Each cell holds aggregate values, e.g. (Count, Max), for one combination of dimension values, such as bar.com/m at 01:01:01.
Updating A Cube
Request #1: bar.com/m at 01:01:00, latency 90 ms
The 01:01:00 row starts with every cell at (0,0).
Map Request To Cell
Request #1 maps to the cell for (bar.com/m, 01:01:00), currently (0,0).
Update The Aggregates
The cell for (bar.com/m, 01:01:00) becomes (1,90): count 1, max latency 90 ms.
Update In-Place
Request #2: bar.com/m at 01:01:00, latency 50 ms
The same cell is updated in place: (1,90) becomes (2,90).
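The walkthrough above can be sketched as a map from (time bucket, URL) to a (Count, Max) cell, updated in place (a toy model, not the go-stream cube implementation):

```go
package main

import "fmt"

// Key addresses one cube cell; Cell holds (Count, Max) over request latency.
type Key struct {
	Time string // time bucket, e.g. second granularity
	URL  string
}

type Cell struct {
	Count, Max int
}

type Cube map[Key]Cell

// Update maps a request to its cell and updates the aggregates in place.
func (c Cube) Update(t, url string, latency int) {
	k := Key{t, url}
	cell := c[k]
	cell.Count++
	if latency > cell.Max {
		cell.Max = latency
	}
	c[k] = cell
}

func main() {
	cube := Cube{}
	cube.Update("01:01:00", "bar.com/m", 90) // request #1 -> (1,90)
	cube.Update("01:01:00", "bar.com/m", 50) // request #2 -> (2,90), in place
	fmt.Println(cube[Key{"01:01:00", "bar.com/m"}])
}
```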
Cube Slice
[Diagram: the Time dimension runs 01:01:00, 01:01:01, …, 01:01:59; a slice is all the URL cells for a single time value, here 01:01:59.]
Cube Rollup
Cells can be rolled up along a dimension. At time 01:01:01, foo.com/r and foo.com/q roll up to URL: foo.com/*, and bar.com/n and bar.com/m roll up to URL: bar.com/*.
Rich Structure
[Diagram: cube cells holding (Count, Max) values such as (5,90) and (3,75), with rollup cells labeled A–E]

Cell | URL       | Time
A    | bar.com/* | 01:01:01
B    | *         | 01:01:01
C    | foo.com/* | 01:01:01
D    | foo.com/r | 01:01:*
E    | foo.com/* | 01:01:*
Key Property
2 types of rollups:
1. Across dimensions
2. Across sources
We use the same aggregation function for both – a powerful conceptual constraint.
Semantic properties are preserved when changing the granularity of reporting.
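As an illustration of one aggregation function serving every rollup, here is a hedged sketch that rolls (Count, Max) cells up across the URL dimension; merging cells from different source nodes would call the same `Merge` (the helper names are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

type Cell struct {
	Count, Max int
}

// Merge is the single aggregation function used for every rollup,
// whether across dimensions (time, URL) or across source nodes.
func Merge(a, b Cell) Cell {
	if b.Max > a.Max {
		a.Max = b.Max
	}
	a.Count += b.Count
	return a
}

// RollupByDomain collapses the URL dimension to its domain (e.g. foo.com/*).
func RollupByDomain(cells map[string]Cell) map[string]Cell {
	out := map[string]Cell{}
	for url, c := range cells {
		domain := strings.SplitN(url, "/", 2)[0] + "/*"
		out[domain] = Merge(out[domain], c)
	}
	return out
}

func main() {
	cells := map[string]Cell{
		"foo.com/r": {5, 90},
		"foo.com/q": {3, 75},
		"bar.com/m": {21, 40},
	}
	fmt.Println(RollupByDomain(cells)["foo.com/*"]) // {8 90}
}
```

Note that this only works because Count and Max are both associative and commutative, which is the conceptual constraint the slide refers to.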