Data Streams: Lecture 15, 3/8/2012
CS 410/510 Data Streams
Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams
Kristin Tufte, David Maier
How Soccer Players Would do Stream Joins
Handshake Join
  Evaluate window-based stream joins
  Highly parallelizable
  Implementation on multi-core machine and FPGA
Previous stream join execution strategies
  Sequential execution based on operational semantics
Let’s talk about stream joins
Join window of R with window of S
  Focus on sliding windows here
Scan, Insert, Invalidate
How might I parallelize?
  Partition and replicate
Time-based windows vs. tuple-based windows
Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011
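The scan, insert, invalidate steps can be sketched as a sequential join. This is a minimal illustration, not the paper's code: it assumes tuple-based windows of a fixed size, and the names `window_join`, `events`, and `predicate` are hypothetical.

```python
from collections import deque


def window_join(events, window_size, predicate):
    """Sequential sliding-window join.

    events: iterable of (side, value) with side "R" or "S", in arrival order.
    window_size: tuple-based window size (an assumption of this sketch).
    """
    windows = {"R": deque(), "S": deque()}
    results = []
    for side, value in events:
        other = "S" if side == "R" else "R"
        # Scan: compare the new tuple with the opposite window.
        for candidate in windows[other]:
            r, s = (value, candidate) if side == "R" else (candidate, value)
            if predicate(r, s):
                results.append((r, s))
        # Insert: add the new tuple to its own window.
        windows[side].append(value)
        # Invalidate: expire the oldest tuple once the window is over capacity.
        if len(windows[side]) > window_size:
            windows[side].popleft()
    return results


events = [("R", 1), ("S", 1), ("R", 2), ("S", 2), ("S", 1)]
pairs = window_join(events, window_size=2, predicate=lambda r, s: r == s)
```

Every arrival touches shared window state, which is why this version is hard to parallelize without partitioning or replication.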
So, Handshake Join…
[Figure: stream join over Input A and Input B]
Handshake Join
  Entering tuple pushes oldest tuple out
  No central coordination
  Same semantics
  May introduce disorder
Traditional Stream Join
  Parallelization needs partitioning; possibly replication
  Needs central coordination
Parallelization
Each core gets a segment of each window
Data flow: act locally on new data arrival and pass on data
Good for shared-nothing setups
Simple communication: interact with neighbors; avoid bottlenecks
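A toy lock-step simulation may help make the data flow concrete. It is a sketch under strong simplifying assumptions: each "core" is one cell holding at most one R tuple and one S tuple, only one lane shifts per arrival, and `None` marks an empty slot. The name `handshake_join` and its parameters are hypothetical.

```python
def handshake_join(events, n_cells, predicate):
    """Toy handshake join: R flows left to right, S flows right to left.

    Each cell compares only the two tuples it currently holds, so there is
    no central coordination. Window size per stream equals n_cells.
    """
    r_lane = [None] * n_cells  # index 0 is where R tuples enter
    s_lane = [None] * n_cells  # index n_cells - 1 is where S tuples enter
    results = []
    for side, value in events:
        if side == "R":
            # New R tuple enters on the left; the oldest falls off the right.
            r_lane = [value] + r_lane[:-1]
        else:
            # New S tuple enters on the right; the oldest falls off the left.
            s_lane = s_lane[1:] + [value]
        # Every cell acts locally on the pair of tuples it now holds.
        for r, s in zip(r_lane, s_lane):
            if r is not None and s is not None and predicate(r, s):
                results.append((r, s))
    return results


events = [("R", 1), ("S", 1), ("R", 2), ("S", 2), ("S", 1)]
pairs = handshake_join(events, n_cells=2, predicate=lambda r, s: r == s)
```

Because only one lane moves per step, the distance between any R tuple and any S tuple shrinks by exactly one cell at a time, so each pair meets in exactly one cell. Matches are found later than in a sequential join, which is where the output disorder comes from.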
Parallelization - Observations
Parallelizes tuple-based windows and non-equi-join predicates
As written, compares all tuples; could hash at each node to optimize
Note data transfer costs between cores, and each tuple is processed at each core
Soccer players have short arms; hardware is NUMA
Scalability
Data flow + point-to-point communication
Additional cores: larger window sizes or reduced workload per core
“directly turn any degree of parallelism into higher throughput or larger supported window sizes”
“can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates”
Encountering Tuples
An item in either window encounters all current items in the other window
Immediate scan strategy
Flexible segment boundaries (cores)
Other local implementations
Handshake Join with Message Passing
Lock-step processing (tuple-based windows)
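FIFO queues with message passing
Problem: missed join pairs (two tuples in flight between neighboring cores can pass each other without being compared)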
Two-phase forwarding
Asymmetric synchronization (replication on one core only)
Keep copies of forwarded tuples until ack received
Ack for s4 must be processed between r5 and r6
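The effect of keeping a copy until the ack arrives can be illustrated with a two-core toy scenario. This is a sketch, not the paper's implementation: the function `simulate_crossing`, the tuple layout, and the message tags `"tuple"` and `"ack"` are all assumptions made for illustration.

```python
from collections import deque


def simulate_crossing(two_phase):
    """Two neighboring cores forward r (rightward) and s (leftward) at once."""
    r, s = ("r", 7), ("s", 7)             # the join compares the second field
    left_r_window = [r]                    # left core currently holds r
    right_s_window = [s]                   # right core currently holds s
    right_s_forwarded = []                 # copies of S tuples awaiting an ack
    to_right, to_left = deque(), deque()   # FIFO channels between the cores
    matches = []

    # Both cores forward their tuple at the same moment.
    left_r_window.remove(r)
    to_right.append(("tuple", r))
    right_s_window.remove(s)
    to_left.append(("tuple", s))
    if two_phase:
        right_s_forwarded.append(s)        # phase 1: keep a copy until acked

    # Left core receives s, scans its local R window, then acknowledges.
    _, s_in = to_left.popleft()
    matches += [(x, s_in) for x in left_r_window if x[1] == s_in[1]]
    if two_phase:
        to_right.append(("ack", s_in))     # queued behind r on the FIFO channel

    # Right core drains its channel in FIFO order.
    while to_right:
        kind, payload = to_right.popleft()
        if kind == "tuple":
            candidates = right_s_window + right_s_forwarded
            matches += [(payload, y) for y in candidates if y[1] == payload[1]]
        else:
            right_s_forwarded.remove(payload)  # phase 2: drop the copy
    return matches, right_s_forwarded


missed = simulate_crossing(two_phase=False)
found = simulate_crossing(two_phase=True)
```

Without the retained copy, r and s pass each other on the channels and the pair is lost. With it, the FIFO ordering guarantees that r reaches the right core before the ack does, so the pair is found once and the copy is then discarded.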
Load Balancing & Synchronization
Even distribution not needed for correctness
Maintain mostly even-sized local S windows
Synch at pipeline ends to manage windows
FPGA Implementation
Tuple-based windows that fit into memory
Common clock signal; lock-step processing
Nested-loops join processing
Performance
Scalability on Multi-Core CPU
Scalability on FPGAs; 8 tuples/window
Before we move on…
The soccer join (handshake join) focuses on sliding windows
How would the algorithm and implementation work for tumbling windows?
What if we did tumbling windows only?
Query-Aware Partitioning for Monitoring Massive Network Data Streams
OC-768 networks
  100 million packets/sec
  2x40 Gbit/sec
Query plan partitioning
  Issues: “heavy” operators, non-uniform resource consumption
Data stream partitioning
Let’s partition the data…
Computes packet summaries between src and dest for network monitoring
Round robin partitioning -> worst case a single flow results in n partial flows
SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
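The round-robin worst case can be reproduced in a few lines. The sketch below is illustrative only: the packet dictionaries, `round_robin`, `flow_key`, and `local_aggregate` are hypothetical names, and only COUNT(*) and SUM(len) are computed.

```python
from collections import defaultdict


def round_robin(packets, n):
    """Assign packet i to node i mod n, ignoring packet contents."""
    nodes = [[] for _ in range(n)]
    for i, pkt in enumerate(packets):
        nodes[i % n].append(pkt)
    return nodes


def flow_key(pkt):
    return (pkt["time"], pkt["srcIP"], pkt["destIP"],
            pkt["srcPort"], pkt["destPort"])


def local_aggregate(node_packets):
    """Per-node GROUP BY: flow key -> [COUNT(*), SUM(len)]."""
    groups = defaultdict(lambda: [0, 0])
    for pkt in node_packets:
        agg = groups[flow_key(pkt)]
        agg[0] += 1
        agg[1] += pkt["len"]
    return dict(groups)


# One flow of four packets, spread over four nodes.
packets = [
    {"time": 1, "srcIP": "10.0.0.1", "destIP": "10.0.0.2",
     "srcPort": 1234, "destPort": 80, "len": 100 + i}
    for i in range(4)
]
partials = [local_aggregate(node) for node in round_robin(packets, 4)]
partial_flow_count = sum(len(p) for p in partials)
```

Here a single flow yields four partial flows, one per node, so no data reduction happens before the final aggregator.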
And, we might want a HAVING…
Round robin partitioning -> no node can apply HAVING
CPU and network load on final aggregator is high
SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN
So, let’s partition better…
What about partitioning on srcIP, destIP, srcPort, destPort (partition flows)?
  Yeah! Nodes can compute and apply HAVING locally …
  But, what if I have more than one query?
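Why flow partitioning lets each node apply HAVING locally can be shown with a small sketch. The value of `ATTACK_PATTERN`, the packet fields, and the helper names are assumptions for illustration; Python's built-in `hash` stands in for whatever hash the partitioning hardware would use.

```python
from collections import defaultdict

ATTACK_PATTERN = 0b011  # hypothetical flag pattern, for illustration only


def flow_key(pkt):
    return (pkt["srcIP"], pkt["destIP"], pkt["srcPort"], pkt["destPort"])


def partition_by_flow(packets, n):
    """All packets of one flow land on the same node."""
    nodes = [[] for _ in range(n)]
    for pkt in packets:
        nodes[hash(flow_key(pkt)) % n].append(pkt)
    return nodes


def aggregate_with_having(packets):
    """GROUP BY time and flow key, then HAVING OR_AGGR(flags) = ATTACK_PATTERN."""
    groups = defaultdict(lambda: {"count": 0, "bytes": 0, "flags": 0})
    for pkt in packets:
        g = groups[(pkt["time"],) + flow_key(pkt)]
        g["count"] += 1
        g["bytes"] += pkt["len"]
        g["flags"] |= pkt["flags"]
    return {k: v for k, v in groups.items() if v["flags"] == ATTACK_PATTERN}


flow_a = {"time": 1, "srcIP": "10.0.0.1", "destIP": "10.0.0.9",
          "srcPort": 4000, "destPort": 80}
flow_b = {"time": 1, "srcIP": "10.0.0.2", "destIP": "10.0.0.9",
          "srcPort": 4001, "destPort": 80}
packets = [
    dict(flow_a, len=60, flags=0b001),
    dict(flow_a, len=40, flags=0b010),   # OR of flow_a flags is 0b011
    dict(flow_b, len=50, flags=0b001),   # flow_b never matches the pattern
]

global_result = aggregate_with_having(packets)
local_union = {}
for node in partition_by_flow(packets, 3):
    local_union.update(aggregate_with_having(node))
```

Since each flow is complete on its node, the union of the local results equals the result computed over the whole stream. Under round-robin partitioning, the two `flow_a` packets could land on different nodes, and neither node would see the full flag pattern.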
But I need to run lots of queries…
Large numbers of simultaneous queries are common (e.g., 50)
Subqueries place different requirements on partitioning
Dynamic repartitioning for each query?
  That’s what the parallel DBs do…
  Splitting 80 Gbit/sec -> specialized network hardware
Partition stream once and only once…
Partitioning Limitations
Program partitioning in FPGAs
  TCP fields (src, dest IP): ok
  Fields from HTTP: not ok
Can’t re-partition every time the workload changes
Query-Aware Partitioning
Analysis framework
  Determine optimal partitioning
Partition-aware distributed query optimizer
  Takes advantage of existing partitions
Compatible partitioning
  Maximizes amount of data reduction done locally
Formal definition of compatible partitioning
Compatible partitioning for aggregations & joins
Gigascope (GS) Uses Tumbling Windows (only)
SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP

SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
FROM PKT1 JOIN PKT2
WHERE PKT1.time = PKT2.time and PKT1.srcIP = PKT2.srcIP and PKT1.destIP = PKT2.destIP
Time attribute is ordered (increasing)
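The ordered time attribute is what lets a tumbling-window aggregate close a window as soon as the bucket changes. A minimal sketch of the first query follows, assuming `time/60` means integer division and that packets arrive in non-decreasing time order; `tumbling_sum` and the packet fields are hypothetical names.

```python
def tumbling_sum(packets, width=60):
    """Yield (tb, srcIP, destIP, sum_len) rows as each time bucket closes.

    Assumes packets arrive ordered by increasing time.
    """
    current_tb = None
    groups = {}
    for pkt in packets:
        tb = pkt["time"] // width
        if current_tb is not None and tb != current_tb:
            # Ordered time: no more tuples can arrive for the old bucket,
            # so its groups can be emitted and their state discarded.
            for (src, dst), total in groups.items():
                yield (current_tb, src, dst, total)
            groups = {}
        current_tb = tb
        key = (pkt["srcIP"], pkt["destIP"])
        groups[key] = groups.get(key, 0) + pkt["len"]
    # Flush the last open bucket at end of input.
    for (src, dst), total in groups.items():
        yield (current_tb, src, dst, total)


packets = [
    {"time": 0, "srcIP": "a", "destIP": "b", "len": 10},
    {"time": 30, "srcIP": "a", "destIP": "b", "len": 5},
    {"time": 61, "srcIP": "a", "destIP": "b", "len": 7},
]
rows = list(tumbling_sum(packets))
```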
Query Example
flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

heavy_flows:
SELECT tb, srcIP, max(cnt) as max_cnt
FROM flows
GROUP BY tb, srcIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
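To see what the three queries compute and which attributes each one groups or joins on, here is a small batch-style sketch. It is not how a stream system would execute them; the function names mirror the query names, and the packet dictionaries are assumptions.

```python
from collections import Counter


def flows(packets):
    """(tb, srcIP, destIP) -> packet count, with tb = time // 60."""
    return Counter((p["time"] // 60, p["srcIP"], p["destIP"]) for p in packets)


def heavy_flows(flow_counts):
    """(tb, srcIP) -> largest count over all destinations."""
    result = {}
    for (tb, src, _dst), cnt in flow_counts.items():
        result[(tb, src)] = max(cnt, result.get((tb, src), 0))
    return result


def flow_pairs(heavy):
    """Self-join on srcIP with S1.tb = S2.tb + 1 (consecutive buckets)."""
    return [(tb, src, cnt, heavy[(tb - 1, src)])
            for (tb, src), cnt in heavy.items()
            if (tb - 1, src) in heavy]


packets = [
    {"time": 0, "srcIP": "a", "destIP": "x"},
    {"time": 10, "srcIP": "a", "destIP": "x"},
    {"time": 20, "srcIP": "a", "destIP": "y"},
    {"time": 60, "srcIP": "a", "destIP": "x"},
]
pairs = flow_pairs(heavy_flows(flows(packets)))
```

Note that `flows` groups on (srcIP, destIP) while `heavy_flows` and `flow_pairs` only need srcIP, which is the source of the conflicting partitioning requirements.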
Which partitioning scheme is optimal for each of the queries?
How to reconcile potentially conflicting partitioning requirements?
How can we use information about existing partitioning in a distributed query optimizer?
What if we could only partition on destIP?
Partition compatibility
Partitioning on (time/60, srcIP, destIP) -> execute aggregation locally, then union
Partitioning on (srcIP, destIP, srcPort, destPort) -> can’t aggregate locally
SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP
P is compatible with Q if, for every time window, the output of Q is equal to a stream union of the output of Q running on partitions produced by P
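The definition can be checked on a concrete input. The sketch below gives evidence for one data set, not a proof of compatibility; `query`, `partition`, `output_matches_union`, and the two node-assignment functions are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def query(packets):
    """SELECT tb, srcIP, destIP, sum(len) GROUP BY time/60 as tb, srcIP, destIP."""
    sums = defaultdict(int)
    for p in packets:
        sums[(p["time"] // 60, p["srcIP"], p["destIP"])] += p["len"]
    return sorted((tb, s, d, total) for (tb, s, d), total in sums.items())


def partition(packets, n, node_of):
    """Split packets over n nodes using a caller-supplied assignment function."""
    parts = [[] for _ in range(n)]
    for p in packets:
        parts[node_of(p) % n].append(p)
    return parts


def output_matches_union(packets, n, node_of):
    """True if Q over the whole input equals the union of Q over partitions."""
    union = sorted(row for part in partition(packets, n, node_of)
                   for row in query(part))
    return union == query(packets)


packets = [
    {"time": 5, "srcIP": "10.0.0.1", "destIP": "10.0.0.2", "srcPort": 1000, "len": 10},
    {"time": 9, "srcIP": "10.0.0.1", "destIP": "10.0.0.2", "srcPort": 1001, "len": 20},
]
by_src = lambda p: sum(p["srcIP"].encode())  # depends only on srcIP
by_port = lambda p: p["srcPort"]             # splits the (srcIP, destIP) group

ok_src = output_matches_union(packets, 2, by_src)
ok_port = output_matches_union(packets, 2, by_port)
```

Partitioning on srcIP keeps each group on one node, so the union matches. Partitioning on srcPort splits the group into two partial sums, so a plain union no longer equals the query output.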
Should we partition on temporal attributes?
If we partition on temporal attributes:
  Processor allocation changes with time epochs
  May help avoid bad hash functions
  Might lead to incorrect results if using panes
  Tuples correlated in time tend to be correlated on the temporal attribute: bad for load balancing
Exclude temporal attributes from partitioning
What partitionings work for aggregation queries?
Group-bys on scalar expressions of source input attributes
Ignore grouping on aggregations in lower-level queries
Any subset of a compatible partitioning is also compatible
SELECT expr1, expr2, ..., exprn
FROM STREAM_NAME
WHERE tup_predicate
GROUP BY temp_var, gb_var1, ..., gb_varm
HAVING group_predicate
What partitionings work for join queries?
Equality predicates on scalar expressions of source stream attributes
Any non-empty subset of a compatible partitioning is also compatible
Need to reconcile partitioning of S and R
SELECT expr1, expr2, ..., exprn
FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
WHERE STREAM1.ts = STREAM2.ts and STREAM1.var11 = STREAM2.var21 and STREAM1.var1k = STREAM2.var2k and other_predicates
Now, multiple queries…
tcp_flows:
SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len)
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort
{sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}

flow_cnt:
SELECT tb, srcIP, destIP, count(*)
FROM tcp_flows
GROUP BY tb, srcIP, destIP
{sc_exp(srcIP), sc_exp(destIP)}

Result: {sc_exp(srcIP), sc_exp(destIP)}
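Reconciling the two partitioning sets can be sketched with plain set operations. This is a deliberate simplification: it treats each scalar expression as a bare attribute name, whereas the real analysis has to reason about the expressions themselves. The helper names and the `temporal_attrs` default are assumptions.

```python
def compatible_partitioning(group_by_attrs, temporal_attrs=("tb", "time")):
    """Largest compatible partitioning set: the non-temporal group-by attributes."""
    return frozenset(a for a in group_by_attrs if a not in temporal_attrs)


def reconcile(set_a, set_b):
    """Largest attribute set compatible with both queries (plain-set model).

    An empty result means no single partitioning is compatible with both.
    """
    return set_a & set_b


tcp_flows = compatible_partitioning(["tb", "srcIP", "destIP", "srcPort", "destPort"])
flow_cnt = compatible_partitioning(["tb", "srcIP", "destIP"])
result = reconcile(tcp_flows, flow_cnt)
```

The intersection works here because any non-empty subset of a compatible partitioning is itself compatible, so the common subset serves both queries.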
Fully compatible partitioning set likely to be empty
Partition to minimize cost of execution
Query Plan Transformation
Main idea: push aggregation operator below merge to allow aggregations to execute independently on partitions
Main idea: partial aggregates (think panes)
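Pushing an aggregation below the merge amounts to splitting it into a per-partition part and a combining part. The sketch below shows this for COUNT and SUM only; `sub_aggregate` and `super_aggregate` are names chosen for illustration, and the grouping key assumes `time/60` is integer division.

```python
from collections import defaultdict


def sub_aggregate(packets):
    """Runs on each partition, below the merge: partial COUNT and SUM per group."""
    partial = defaultdict(lambda: [0, 0])
    for p in packets:
        key = (p["time"] // 60, p["srcIP"], p["destIP"])
        partial[key][0] += 1
        partial[key][1] += p["len"]
    return dict(partial)


def super_aggregate(partials):
    """Runs after the merge: combine partial results that share a group key."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (cnt, length) in partial.items():
            total[key][0] += cnt
            total[key][1] += length
    return {k: tuple(v) for k, v in total.items()}


packets = [
    {"time": 1, "srcIP": "a", "destIP": "b", "len": 10},
    {"time": 2, "srcIP": "a", "destIP": "b", "len": 20},
    {"time": 3, "srcIP": "a", "destIP": "b", "len": 30},
]
# Two arbitrary partitions of the same group.
partitions = [[packets[0], packets[2]], [packets[1]]]
combined = super_aggregate(sub_aggregate(part) for part in partitions)
```

This works for any partitioning because COUNT and SUM can be combined by addition; the merge then carries one partial row per group per partition instead of every packet.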
Performance