Kristin Tufte, David Maier

Data Streams: Lecture 15, 3/8/2012

CS 410/510 Data Streams
Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams


How Soccer Players Would do Stream Joins

Handshake Join
  Evaluate window-based stream joins
  Highly parallelizable
  Implementation on multi-core machine and FPGA
Previous stream join execution strategies
  Sequential execution based on operational semantics


Let’s talk about stream joins

Join window of R with window of S
Focus on sliding windows here
Scan, Insert, Invalidate
How might I parallelize?
Partition and replicate
Time-based windows vs. tuple-based windows

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011
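The scan, insert, invalidate steps can be sketched as a sequential join over tuple-based sliding windows. This is a minimal illustration, not the implementation from the paper; the function name `window_join`, the `(side, tuple)` event encoding, and the window size are assumptions made for the example.

```python
from collections import deque

def window_join(events, window_size, pred):
    """Sequential sliding-window join over tuple-based windows.

    events: iterable of (side, tup) with side in {"R", "S"}, in arrival order.
    Returns matching (r, s) pairs in the order they are produced.
    """
    windows = {"R": deque(), "S": deque()}
    results = []
    for side, tup in events:
        other = "S" if side == "R" else "R"
        # Scan: compare the new tuple against the other stream's window.
        for old in windows[other]:
            r, s = (tup, old) if side == "R" else (old, tup)
            if pred(r, s):
                results.append((r, s))
        # Insert: the new tuple joins its own window.
        windows[side].append(tup)
        # Invalidate: the oldest tuple falls out once the window is full.
        if len(windows[side]) > window_size:
            windows[side].popleft()
    return results

events = [("R", 1), ("S", 1), ("S", 2), ("R", 2), ("S", 1), ("R", 1)]
pairs = window_join(events, window_size=2, pred=lambda r, s: r == s)
print(pairs)
```

All three steps touch shared window state for every arriving tuple, which is why a parallel version has to partition (and possibly replicate) the windows.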


So, Handshake Join…

[Figure: a stream join over Input A and Input B]

Traditional Stream Join
  Parallelization needs partitioning; possibly replication
  Needs central coordination

Handshake Join
  Entering tuple pushes oldest tuple out
  No central coordination
  Same semantics
  May introduce disorder



Parallelization

Each core gets a segment of each window

Data flow: act locally when new data arrives, then pass the data on
Good for shared-nothing setups
Simple communication: interact only with neighbors; avoid bottlenecks
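The data flow can be sketched as a chain of cores, each holding one segment of both windows. The sketch below is a sequential simulation under stated assumptions: tuples move one hop at a time (so the synchronization problems of real concurrent cores do not arise), every segment holds at most `seg_size` tuples (giving a total window of `n_cores * seg_size` per stream), and a final drain step, which is an addition for finite inputs, lets remaining R tuples finish their walk. The name `handshake_join` and its parameters are hypothetical.

```python
from collections import deque

def handshake_join(events, n_cores, seg_size, pred):
    """Simulate handshake join on a chain of cores, one tuple move at a time.

    R tuples enter at core 0 and flow right; S tuples enter at the last core
    and flow left. An arriving tuple is compared only with the local segment
    of the other stream. Returns (r, s) pairs in the order they are found.
    """
    r_seg = [deque() for _ in range(n_cores)]
    s_seg = [deque() for _ in range(n_cores)]
    results = []

    def flow(tup, mine, other, order, make_pair):
        # Hand the tuple from core to core; a full core passes on its oldest.
        for k in order:
            for o in other[k]:
                pair = make_pair(tup, o)
                if pred(*pair):
                    results.append(pair)
            mine[k].append(tup)
            if len(mine[k]) <= seg_size:
                return
            tup = mine[k].popleft()
        # Falling off the end of the chain expires the tuple.

    for side, tup in events:
        if side == "R":
            flow(tup, r_seg, s_seg, range(n_cores), lambda r, s: (r, s))
        else:
            flow(tup, s_seg, r_seg, reversed(range(n_cores)), lambda s, r: (r, s))

    # End of input (assumption): let the remaining R tuples finish walking
    # right so pairs that were still approaching each other are not lost.
    for k in range(n_cores):
        for r in r_seg[k]:
            for j in range(k + 1, n_cores):
                results.extend((r, s) for s in s_seg[j] if pred(r, s))
    return results

events = [("R", 1), ("S", 1), ("S", 2), ("R", 2), ("S", 1), ("R", 1)]
found = handshake_join(events, n_cores=2, seg_size=1, pred=lambda r, s: r == s)
print(sorted(found))
```

Each core only reads its own two segments and hands tuples to a neighbor, so no step needs a global view of either window. The pairs come out in an order that depends on when tuples meet, which is the disorder the slides mention.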



Parallelization - Observations

Parallelizes tuple-based windows and non-equi-join predicates
As written, compares all tuples; could hash at each node to optimize
Note data transfer costs between cores, and each tuple is processed at each core
Soccer players have short arms; hardware is NUMA


Scalability

Data flow + point-to-point communication
Additional cores: larger window sizes or reduced workload per core
“directly turn any degree of parallelism into higher throughput or larger supported window sizes”
“can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates”


Encountering Tuples

An item in either window encounters all current items in the other window

Immediate scan strategy

Flexible segment boundaries (cores)

Other local implementations



Handshake Join with Message Passing

Lock-step processing (tuple-based windows)

FIFO queues with message passing
Missed join-pair


Two-phase forwarding

Asymmetric synchronization (replication on one core only)

Keep copies of forwarded tuples until ack received

Ack for s4 must be processed between r5 and r6



Load Balancing & Synchronization


Even distribution not needed for correctness

Maintain mostly even-sized local S windows

Synch at pipeline ends to manage windows


FPGA Implementation

Tuple-based windows that fit into memory

Common clock signal; lock-step processing

Nested-loops join processing


Performance


Scalability on Multi-Core CPU

Scalability on FPGAs; 8 tuples/window


Before we move on…

The soccer-join paper focuses on sliding windows
How would their algorithm and implementation work for tumbling windows?
What if we did tumbling windows only?

Query-Aware Partitioning for Monitoring Massive Network Data Streams

OC-768 networks
  100 million packets/sec
  2x40 Gbit/sec
Query plan partitioning
  Issues: “heavy” operators, non-uniform resource consumption
Data stream partitioning


Let’s partition the data…

Computes packet summaries between src and dest for network monitoring

Round robin partitioning -> worst case a single flow results in n partial flows

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
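The worst case for round robin can be shown with a small sketch. The packet layout, the node count, and the helper names below are assumptions for illustration, and the time attribute is left out to keep the example short.

```python
from zlib import crc32

def round_robin(packets, n):
    """Assign packet i to node i mod n, ignoring its contents."""
    return [(i % n, p) for i, p in enumerate(packets)]

def by_flow(packets, n):
    """Assign each packet by a hash of its flow key (the first four fields)."""
    return [(crc32(repr(p[:4]).encode()) % n, p) for p in packets]

def partial_flows(assignment):
    """Count distinct (node, flow) groups, i.e. partial aggregates to merge."""
    return len({(node, p[:4]) for node, p in assignment})

# Hypothetical packets: (srcIP, destIP, srcPort, destPort, len), one flow only.
packets = [("10.0.0.1", "10.0.0.2", 1234, 80, 100)] * 8
n_nodes = 4
print(partial_flows(round_robin(packets, n_nodes)))  # 4: one partial flow per node
print(partial_flows(by_flow(packets, n_nodes)))      # 1: the flow stays on one node
```

With round robin, every node ends up holding a fragment of the same group, so a final aggregator has to combine n partial results per flow.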

And, we might want a HAVING…

Round robin partitioning -> no node can apply HAVING

CPU and network load on final aggregator is high



SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN
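A small sketch of why no node can apply the HAVING clause under round robin: each node sees only some of a flow's flags, so a local test can fail even though the flow as a whole matches. The flag bits are the standard TCP values; `ATTACK_PATTERN` here is a made-up pattern chosen only for the example.

```python
from functools import reduce
from operator import or_

SYN, FIN = 0x02, 0x01                 # standard TCP flag bits
ATTACK_PATTERN = SYN | FIN            # hypothetical pattern, for illustration only

def or_aggr(flags):
    """Bitwise OR of all flag values in a group."""
    return reduce(or_, flags, 0)

# Flags of one flow's packets, dealt round robin to two nodes.
flow_flags = [SYN, FIN, SYN, FIN]
node0, node1 = flow_flags[0::2], flow_flags[1::2]

local = [or_aggr(node0) == ATTACK_PATTERN, or_aggr(node1) == ATTACK_PATTERN]
merged = or_aggr([or_aggr(node0), or_aggr(node1)]) == ATTACK_PATTERN
print(local, merged)  # [False, False] True
```

Neither node can discard its partial group, so every partial aggregate must travel to the final aggregator before the filter can run.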

So, let’s partition better…

What about partitioning on srcIP, destIP, srcPort, destPort (partition flows)?
Yeah! Nodes can compute and apply HAVING locally …
But, what if I have more than one query?




But I need to run lots of queries…

A large number of simultaneous queries is common (e.g. 50)
Subqueries place different requirements on partitioning
Dynamic repartitioning for each query?
That’s what the parallel DBs do…
Splitting 80 Gbit/sec -> specialized network hardware
Partition stream once and only once…


Partitioning Limitations

Program partitioning in FPGAs
  TCP fields (src, dest IP) - ok
  Fields from HTTP - not ok
Can’t re-partition every time the workload changes



Query-Aware Partitioning

Analysis framework
  Determine optimal partitioning
Partition-aware distributed query optimizer
  Takes advantage of existing partitions
Compatible partitioning
  Maximizes amount of data reduction done locally
Formal definition of compatible partitioning
Compatible partitioning: aggregations & joins


GS Uses Tumbling Windows (only)

SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP

SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
FROM PKT1 JOIN PKT2
WHERE PKT1.time = PKT2.time and PKT1.srcIP = PKT2.srcIP and PKT1.destIP = PKT2.destIP

Time attribute is ordered (increasing)
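Because the time attribute only increases, a tumbling-window aggregate can close a bucket as soon as a tuple from a later bucket arrives. The sketch below mirrors the first query (sum of `len` per `time/60` bucket, srcIP, destIP); the tuple layout and the name `tumbling_sum` are assumptions, and this is not how GS itself is implemented.

```python
from collections import defaultdict

def tumbling_sum(packets, width=60):
    """SUM(len) per (tb, srcIP, destIP), where tb = time // width.

    packets must arrive in increasing time order; a bucket is emitted as
    soon as a packet from a later bucket shows up.
    """
    current_tb, sums = None, defaultdict(int)
    for time, src, dst, length in packets:
        tb = time // width
        if current_tb is not None and tb != current_tb:
            for (s, d), total in sums.items():
                yield (current_tb, s, d, total)
            sums.clear()
        current_tb = tb
        sums[(src, dst)] += length
    for (s, d), total in sums.items():
        yield (current_tb, s, d, total)   # flush the last open window

# Hypothetical packets: (time, srcIP, destIP, len).
packets = [(5, "a", "b", 100), (30, "a", "b", 50), (59, "a", "c", 10),
           (61, "a", "b", 70)]
rows = list(tumbling_sum(packets))
print(rows)
```

No state survives from one bucket to the next, which is what makes tumbling windows easier to partition than sliding ones.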


Query Example

flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

heavy_flows:
SELECT tb, srcIP, max(cnt) as max_cnt
FROM flows
GROUP BY tb, srcIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
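To make the data flow between the three queries concrete, here is a rough Python rendering that runs them on a handful of made-up packets. It is only a sketch of the query semantics on a single node; the tuple layout `(time, srcIP, destIP)` is an assumption.

```python
from collections import Counter

def flows(tcp):
    """COUNT(*) per (tb, srcIP, destIP) with tb = time // 60."""
    return Counter((t // 60, src, dst) for t, src, dst in tcp)

def heavy_flows(flow_counts):
    """MAX(cnt) per (tb, srcIP)."""
    best = {}
    for (tb, src, _dst), cnt in flow_counts.items():
        best[(tb, src)] = max(cnt, best.get((tb, src), 0))
    return best

def flow_pairs(heavy):
    """Self-join on srcIP with S1.tb = S2.tb + 1 (consecutive windows)."""
    return [(tb, src, cnt, heavy[(tb - 1, src)])
            for (tb, src), cnt in heavy.items() if (tb - 1, src) in heavy]

# Hypothetical packets: (time, srcIP, destIP).
tcp = [(1, "a", "x"), (2, "a", "x"), (3, "a", "y"), (61, "a", "x"), (62, "b", "x")]
print(flow_pairs(heavy_flows(flows(tcp))))  # [(1, 'a', 1, 2)]
```

Note that `flows` groups on srcIP and destIP while `heavy_flows` and `flow_pairs` only need tuples with the same srcIP to be together, which is where the partitioning requirements start to differ.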


Query Example





Which partitioning scheme is optimal for each of the queries?


Query Example





How to reconcile potentially conflicting partitioning requirements?



Query Example

flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1

How can we use information about existing partitioning in a distributed query optimizer?


What if we could only partition on destIP?







Partition compatibility

Partitioning on (time/60, srcIP, destIP) -> execute aggregation locally, then union

Partitioning on (srcIP, destIP, srcPort, destPort) -> can’t aggregate locally

P is compatible with Q if, for every time window, the output of Q is equal to a stream union of the output of Q running on the partitions produced by P
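The definition can be tried out on a concrete input. The sketch below runs a count-per-(tb, srcIP, destIP) query on the whole input and on each partition, then compares the plain union with the global result. Everything here is an assumption made for illustration: the packet layout, the toy hash `node_of`, and the helper names. A check on one input can refute compatibility but cannot prove it for all inputs.

```python
from collections import Counter

def flows(packets):
    """Q: COUNT(*) per (tb, srcIP, destIP), with tb = time // 60."""
    return Counter((t // 60, src, dst) for t, src, dst, _sp, _dp in packets)

def node_of(key, n):
    """Toy deterministic hash: byte sum of the key's repr, modulo n."""
    return sum(repr(key).encode()) % n

def union_matches(packets, key, n=2):
    """Check one input: does Q(all) equal the plain union of Q(partition)?"""
    parts = [[] for _ in range(n)]
    for p in packets:
        parts[node_of(key(p), n)].append(p)
    union = [row for part in parts for row in flows(part).items()]
    return sorted(union) == sorted(flows(packets).items())

# Hypothetical packets: (time, srcIP, destIP, srcPort, destPort).
packets = [(1, "a", "b", 1000, 80), (2, "a", "b", 1001, 80)]
by_group = lambda p: (p[0] // 60, p[1], p[2])
by_4tuple = lambda p: (p[1], p[2], p[3], p[4])
print(union_matches(packets, by_group))    # True
print(union_matches(packets, by_4tuple))   # False
```

In the second case the two packets differ only in srcPort, land on different nodes, and each node reports a count of 1 for the same group, so the union disagrees with the global count of 2.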



Should we partition on temporal attributes?

If we partition on temporal attributes:
  Processor allocation changes with time epochs
  May help avoid bad hash functions
  Might lead to incorrect results if using panes
  Tuples correlated in time tend to be correlated on the temporal attribute: bad for load balancing
Exclude temporal attributes from partitioning


What partitionings work for aggregation queries?

Group-bys on scalar expressions of source input attributes
Ignore grouping on aggregations in lower-level queries
Any subset of a compatible partitioning is also compatible

SELECT expr1, expr2, ..., exprn
FROM STREAM_NAME
WHERE tup_predicate
GROUP BY temp_var, gb_var1, ..., gb_varm
HAVING group_predicate


What partitionings work for join queries?

Equality predicates on scalar expressions of source stream attributes
Any non-empty subset of a compatible partitioning is also compatible
Need to reconcile partitioning of S and R

SELECT expr1, expr2, ..., exprn
FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
WHERE STREAM1.ts = STREAM2.ts and STREAM1.var11 = STREAM2.var21 and STREAM1.var1k = STREAM2.var2k and other_predicates


Now, multiple queries…

tcp_flows:
SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len)
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort
{sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}

flow_cnt:
SELECT tb, srcIP, destIP, count(*)
FROM tcp_flows
GROUP BY tb, srcIP, destIP
{sc_exp(srcIP), sc_exp(destIP)}

Result: {sc_exp(srcIP), sc_exp(destIP)}
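For this pair of queries the result is simply what the two partitioning sets have in common. A minimal sketch, assuming each `sc_exp` is the identity so that expressions can be compared by attribute name (it does not attempt to reconcile scalar expressions that differ); the function name `reconcile` is made up for the example.

```python
def reconcile(*partitioning_sets):
    """Keep only the partitioning expressions every query can live with."""
    common = set(partitioning_sets[0])
    for s in partitioning_sets[1:]:
        common &= set(s)
    return common

tcp_flows = {"srcIP", "destIP", "srcPort", "destPort"}
flow_cnt = {"srcIP", "destIP"}
print(sorted(reconcile(tcp_flows, flow_cnt)))  # ['destIP', 'srcIP']
```

An empty intersection would mean that no single partitioning is compatible with every query in the set.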


Now, multiple queries…


Fully compatible partitioning set likely to be empty

Partition to minimize cost of execution


Query Plan Transformation



Main idea: push aggregation operator below merge to allow aggregations to execute independently on partitions

Main idea: partial aggregates (think panes)
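The two ideas combine naturally: each partition computes a partial aggregate, and a cheap operator above the merge combines the partials. The sketch below does this for COUNT and SUM; the function names and the `(tb, srcIP, len)` tuple layout are assumptions for illustration, not the optimizer's actual operators.

```python
from collections import defaultdict

def sub_aggregate(packets):
    """Runs on each partition: partial COUNT and SUM(len) per group."""
    partial = defaultdict(lambda: [0, 0])
    for tb, src, length in packets:
        partial[(tb, src)][0] += 1
        partial[(tb, src)][1] += length
    return partial

def super_aggregate(partials):
    """Runs after the merge: combine partials (counts and sums both add up)."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (cnt, sm) in partial.items():
            total[group][0] += cnt
            total[group][1] += sm
    return {g: tuple(v) for g, v in total.items()}

# Hypothetical partitions of (tb, srcIP, len) tuples, split arbitrarily.
part0 = [(0, "a", 100), (0, "b", 40)]
part1 = [(0, "a", 60)]
result = super_aggregate([sub_aggregate(part0), sub_aggregate(part1)])
print(result)  # {(0, 'a'): (2, 160), (0, 'b'): (1, 40)}
```

This works for any partitioning because COUNT and SUM can be combined from partials; the benefit of a compatible partitioning is that the combining step has nothing left to do.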


Performance


