An Introduction To Data Stream Query Processing
Neil Conway
Amalgamated Insight, Inc.
May 24, 2007
Neil Conway (AmInsight) Data Stream Query Processing May 24, 2007 1 / 45
Outline
1 The Need For Data Stream Processing
2 Stream Query Languages
3 Query Processing Techniques For Streams
System Architecture
Shared Evaluation
Adaptive Tuple Routing
Overload Handling
4 Current Choices For A DSMS
Open Source
Proprietary
5 Demo
6 Q & A
The Need For Data Stream Processing
What's wrong with database systems?
Nothing, but they aren't the right solution to every problem
What are some problems for which a traditional DBMS is an awkward fit?
Financial Analysis
Electronic trading is now commonplace
Trading volume continues to increase rapidly
Algorithmic trading: detect advantageous market conditions, automatically execute trades
Latency is key
Visualization
A hard problem in itself
Typical Queries
5-minute rolling average, volume-weighted average price (VWAP)
Comparison between sector averages and portfolio averages over time
Implement models provided by quantitative analysis
Network Monitoring
Network volume continues to increase rapidly
Custom solutions are possible, but roll-your-own is expensive
Ad-hoc queries would be nice
Can we build generic infrastructure for these kinds of monitoring applications?
Sensor Networks
Pervasive Sensors
"As the cost of micro sensors continues to decline over the next decade, we could see a world in which everything of material significance gets sensor-tagged."
Mike Stonebraker
Military applications: real-time command and control
Healthcare
Habitat monitoring
Manufacturing
Other Examples
Real-Time Decision Support
Turnaround-time for traditional data warehouses is often too slow
Business Activity Monitoring (BAM)
Fraud Detection
Sophisticated, cross-channel fraud
Real-time
Online Gaming
Detect malicious behavior
Monitor quality of service
Data Stream Management Systems
Database Systems
Mostly static data, ad-hoc one-time queries
Fire the queries at the data, return result sets
Store and query
Focus: concurrent reads & writes, efficient use of I/O, maximize transaction throughput, transactional consistency, historical analysis
Data Stream Systems
Mostly transient data, continuous queries
Fire the data at the queries, incrementally update result streams
Data rates often exceed disk throughput
Complex Event Processing (CEP)
Data stream processing emerged from the database community
Early '90s: active databases with triggers
Complex Event Processing is another approach to the same problems
Different nomenclature and background
Often similar in practice
Data Streams
A stream is an infinite sequence of (tuple, timestamp) pairs
Append-only
New type of database object
The timestamp defines a total order over the tuples in a stream
In practice: require that stream tuples have a special CQTIME column
Different approaches to building stream processing systems
This talk: relation-oriented DSMS. Specifically, TelegraphCQ, AmInsight, StreamBase, . . .
CREATE STREAM
Exactly 1 column must have a CQTIME constraint
CQTIME can be system-generated or user-provided
With user-provided timestamps, system must cope with out-of-order tuples
Slack specifies maximum out-of-orderness
Example Query
CREATE STREAM trades (
    symbol varchar(5),
    price real,
    volume integer,
    tstamp timestamp CQTIME USER GENERATED SLACK 1 minute
) TYPE UNARCHIVED;
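One way to picture what slack buys you: the sketch below is a hypothetical Python illustration, not the engine's implementation. It holds each tuple back until the highest timestamp seen so far is at least `slack` ahead of it, then releases tuples in timestamp order. The function name and the policy of dropping tuples that arrive after their slot has been released are assumptions made for the example.

```python
import heapq
from itertools import count


def reorder_with_slack(tuples, slack):
    """Yield (timestamp, payload) pairs in non-decreasing timestamp order.

    A tuple is released once the highest timestamp seen so far is at least
    `slack` ahead of it. Tuples older than the last released timestamp are
    dropped (an assumed policy; a real system might report them instead).
    """
    heap = []              # min-heap ordered by timestamp
    tie = count()          # tie-breaker so payloads are never compared
    high_water = None      # largest timestamp seen so far
    released_up_to = None  # timestamp of the last tuple released
    for ts, payload in tuples:
        if released_up_to is not None and ts < released_up_to:
            continue       # too late: emitting it would break the ordering
        heapq.heappush(heap, (ts, next(tie), payload))
        high_water = ts if high_water is None else max(high_water, ts)
        while heap and heap[0][0] <= high_water - slack:
            out_ts, _, out_payload = heapq.heappop(heap)
            released_up_to = out_ts
            yield out_ts, out_payload
    while heap:            # finite input only: flush whatever is left
        out_ts, _, out_payload = heapq.heappop(heap)
        yield out_ts, out_payload
```

With timestamps in seconds and `slack=60`, a tuple stamped 10 that arrives after one stamped 30 is still emitted in the right position, while a tuple stamped 20 that arrives after 30 has already been released is discarded.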
Types of Streams
Raw Streams
Stream tuples are injected into the system by an external data source
E.g. stock tickers, sensor data, network interface, . . .
Both push and pull models have been explored
Derived Streams
Defined by a query expression that yields a stream
Archived Streams
Allows historical and real-time stream content to be combined in a single database object
Language Design Philosophy
Pragmatism: relational query languages are well-established
Relational query evaluation techniques are well-understood
Everyone knows SQL
Therefore, add stream-oriented extensions to SQL
Pioneering work: CQL from Stanford STREAM project
Kinds Of Operators
Relation → Relation: Plain Old SQL
Stream → Relation: Periodically produce a relation from a stream
Relation → Stream: Produce stream from changes to a relation
Note that S → S operators are not provided.
Continuous Queries
Fundamental Difference
The result of a continuous query is an unbounded stream, not a finite relation
Typical Query
1 Split infinite stream into pieces via windows (S → R)
2 Compute analysis for the current window, comparison with prior windows or historical data (R → R)
3 Convert result of analysis into result stream (R → S)
Often implicit (use defaults)
Stream → Relation Operators: Windows
Streams are infinite: at any given time, examine a finite sub-set
Apply window operator to stream to periodically produce visible sets of tuples
Properties of Sliding Windows
Range: Width of the window. Units: rows or time.
Slide: How often to emit new visible sets. Units: rows or time.
Start: When to start emitting results.
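The three properties above can be sketched as a small generator. This is a hypothetical Python illustration for time-based windows only; the name `sliding_windows` and the parameters `visible`, `advance` and `first_end` are assumptions chosen to echo Range, Slide and Start, and the half-open interval convention is a choice made for the example.

```python
from collections import deque


def sliding_windows(tuples, visible, advance, first_end):
    """Yield (window_end, visible_set) for time-ordered (timestamp, value) pairs.

    Each window covers (window_end - visible, window_end]. The first window
    ends at `first_end`; a new one is emitted every `advance` time units.
    """
    buf = deque()  # tuples that may still be visible to some window
    end = first_end
    for ts, value in tuples:
        # A tuple past the current edge closes every window that ends before it.
        while ts > end:
            while buf and buf[0][0] <= end - visible:
                buf.popleft()  # no longer visible to this or any later window
            yield end, [v for _, v in buf]
            end += advance
        buf.append((ts, value))
    # Finite input only: emit the window that holds the remaining tuples.
    while buf and buf[0][0] <= end - visible:
        buf.popleft()
    yield end, [v for _, v in buf]
```

With `visible == advance` the windows tile the stream without overlap; with `visible > advance` consecutive visible sets share tuples, which is why each tuple stays buffered until it falls off the left edge.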
Example Query
Description
Every second, return the total volume of trades in the previous second.
Query
SELECT sum(volume) AS volume,
       advance_agg(qtime) AS windowtime
FROM trades < VISIBLE 1 second ADVANCE 1 second >
Another Example
Description
Every 5 seconds, return the volume-weighted average price (VWAP) of MSFT for the last 1 minute of trades.
Query
SELECT sum(price * volume) / sum(volume) AS vwap,
       sum(volume) AS volume,
       advance_agg(qtime) AS windowtime
FROM trades < VISIBLE 1 minute ADVANCE 5 seconds >
WHERE symbol = 'MSFT'
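As a worked version of the arithmetic in the SELECT list, here is a hypothetical Python helper (the name `vwap` and the tuple layout are assumptions) that computes the same ratio for one visible set of trades.

```python
def vwap(trades):
    """Volume-weighted average price of a list of (price, volume) pairs.

    Returns None when there is no volume, mirroring SQL's NULL result for
    an aggregate over an empty window.
    """
    total_volume = sum(volume for _, volume in trades)
    if total_volume == 0:
        return None
    return sum(price * volume for price, volume in trades) / total_volume


# One window's worth of (symbol, price, volume) tuples, filtered as in the WHERE clause.
window = [("MSFT", 30.0, 100), ("IBM", 100.0, 50), ("MSFT", 31.0, 300)]
msft = [(price, volume) for symbol, price, volume in window if symbol == "MSFT"]
print(vwap(msft))
```

Here the result is (30.0 * 100 + 31.0 * 300) / 400 = 30.75: the larger trade at 31.0 pulls the average above the plain mean of 30.5.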
More About Windows
Aggregation
Useful aggregate: advance_agg(CQTIME)
Timestamp that marks the end of the current window
Similar aggregates for beginning of window, middle of window might also be useful
Other Window Types
Landmark: Fixed left edge, elastic right edge. Periodically reset. (All stock trades after 9AM today.)
Partitioned: Divide stream into sub-streams based on partitioning key(s), then apply another S → R operator to the sub-streams.
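A partitioned window can be sketched as one small window per key. The Python below is a hypothetical illustration that uses a row-based window of the last `n` tuples as the inner S → R operator; the function name and the choice of inner operator are assumptions.

```python
from collections import defaultdict, deque


def partitioned_last_n(tuples, key, n):
    """For each incoming tuple, yield (key, last n tuples seen for that key)."""
    # One bounded buffer per partition; deque(maxlen=n) evicts the oldest row.
    windows = defaultdict(lambda: deque(maxlen=n))
    for tup in tuples:
        k = key(tup)
        windows[k].append(tup)
        yield k, list(windows[k])
```

For a trade stream keyed on symbol, `partitioned_last_n(trades, key=lambda t: t[0], n=2)` keeps the two most recent trades of every stock independently, so a burst of MSFT trades never pushes IBM rows out of view.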
Relation → Stream Operators
Types of Operators
ISTREAM: the tuples added to a relation
RSTREAM: all the tuples in a relation
DSTREAM: the tuples removed from a relation
Defaults
ISTREAM for queries without aggregation/grouping
RSTREAM for queries with aggregation/grouping
DSTREAM is rarely useful
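The three operators can be read as differences between the relation at the previous window and at the current one. The Python below is a hypothetical sketch under that reading; the function name is an assumption, relations are treated as bags of hashable, sortable tuples, and results are sorted only to make the output deterministic.

```python
from collections import Counter


def relation_to_stream(previous, current):
    """Return what each R -> S operator would emit for one window advance."""
    prev_bag, curr_bag = Counter(previous), Counter(current)
    return {
        # Tuples present now that were not present before.
        "ISTREAM": sorted((curr_bag - prev_bag).elements()),
        # Tuples present before that are gone now.
        "DSTREAM": sorted((prev_bag - curr_bag).elements()),
        # The whole current relation.
        "RSTREAM": sorted(curr_bag.elements()),
    }
```

This also hints at the defaults: an aggregate query produces one fresh row per group per window, so emitting the whole relation (RSTREAM) is the natural output, while a plain filter or projection only needs the newly added tuples (ISTREAM).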
Mixed Joins
Common Requirement
Compare stream tuples with historical data
System must provide both tables and streams!
Elegantly modeled as a join between a table and a stream
Implementation
Stream is the right (outer) join operand; left (inner) operand is an arbitrary Postgres subplan
For each stream tuple, join against non-continuous subplan
Mixed Join Example
Description
Every 3 seconds, compute the total value of high-volume trades made on stocks in the S & P 500 in the past 5 seconds.
Example Query
SELECT T.symbol, sum(T.price * T.volume)
FROM s_and_p_500 S,
     trades T < VISIBLE 5 sec ADVANCE 3 sec >
WHERE T.symbol = S.symbol
  AND T.volume > 5000
GROUP BY T.symbol
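What this query computes for a single visible set can be mimicked in a few lines of Python. This is a hypothetical sketch: the function name is an assumption, and a set lookup stands in for the table-side subplan, whereas the real system would join each stream tuple against whatever plan Postgres picks.

```python
from collections import defaultdict


def high_volume_value(window, s_and_p_500, min_volume=5000):
    """Total traded value per symbol for one visible set of the stream.

    window:       iterable of (symbol, price, volume) stream tuples
    s_and_p_500:  set of symbols, standing in for the s_and_p_500 table
    """
    totals = defaultdict(float)
    for symbol, price, volume in window:          # stream is the outer operand
        if volume > min_volume and symbol in s_and_p_500:  # probe the table side
            totals[symbol] += price * volume
    return dict(totals)
```

Calling it once per window advance (every 3 seconds over the last 5 seconds of trades) reproduces the shape of the continuous result: one row per symbol that had qualifying trades.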
Composing Streams
The tuples in a stream can be viewed as a series of events
E.g. "The temperature in the room is 20, 25, 30, . . ."
The output of a continuous query is another series of events, typically higher-level or more complex
E.g. "The room is on fire."
Therefore, streams can be composed in various ways:
Stream views
Macro semantics
Derived streams
Subqueries
Active tables
Derived Streams
A derived stream is a database object defined by a persistent continuous query
Unlike a stream view, always active
Similar to a materialized view
Example Query
Description
Every 3 seconds, compute the volume-weighted average price (VWAP) for all stocks traded in the past 5 seconds.
Query
CREATE STREAM vwap (symbol varchar(5),
                    vwap float,
                    vtime timestamp cqtime) AS
  (SELECT symbol,
          sum(price * volume) / sum(volume),
          advance_agg(qtime)
   FROM trades < VISIBLE 5 seconds ADVANCE 3 seconds >
   GROUP BY symbol);
Subqueries
One-time subqueries can be used in continuous queries, of course
Continuous subqueries are planned and executed as independent queries
Essentially inline derived streams
Require that subqueries yielding streams specify CQTIME
Planned: WITH-clause subqueries
Active Tables
An active table is a table with an associated continuous query
Two modes of operation:
Append: New stream tuples appended to table at each window
Replace: At each new window, truncate previous table contents
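The difference between the two modes is easiest to see side by side. The class below is a hypothetical Python sketch, not the product's API: `ActiveTable`, `on_window` and the in-memory row list are assumptions standing in for a real table fed by a continuous query.

```python
class ActiveTable:
    """In-memory stand-in for a table fed by a continuous query."""

    def __init__(self, mode):
        if mode not in ("append", "replace"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.rows = []

    def on_window(self, result_tuples):
        """Apply the continuous query's output for one window."""
        if self.mode == "replace":
            self.rows.clear()  # truncate the previous window's contents
        self.rows.extend(result_tuples)
```

Append mode accumulates a history that one-time queries can analyse later; replace mode always holds just the latest window, which suits dashboards that only show current state.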
Event Language
Example Query
SELECT 'Shoplifting!', D.loc, D.id
FROM Store S C D PARTITION BY id
WHERE S.loc = 'shelf' AND C.loc = 'checkout'
  AND D.loc = 'door'
EVENT AND (FOLLOWS(S, D, 1 hour),
           NOT PRECEDES(C, D, 1 hour));
Basic Requirements
Adaptivity
Static query planning is undesirable for long-running queries
Either replan or use adaptive planning
Shared Processing
Essential for good performance: 100s of queries not uncommon
Long-lived queries make this more feasible
Graceful Overload Handling
Stream data rates are often highly variable
Often too expensive to provision for maximal data rate
Therefore, must handle overload gracefully
System Architecture
Modified version of PostgreSQL
One-time queries executed normally
Continuous queries planned and executed by the CqRuntime process
Stream input: COPY, or submitted via TCP to CqIngress process
libevent-based, simple COPY-like protocol
Stream output: cursors, active tables, CqEgress process
Communication between processes done via shared memory queue infrastructure
Message passing done via SysV shmem and locks
Shared Runtime
When a new continuous query is defined, it is passed to the shared runtime via shared memory
Runtime plans the query, folds query into single shared query plan
Not a traditional tree; graph of operators
Shared Runtime Main Loop
1 Check for control messages: add new CQ, remove CQ, . . .
2 Check for new stream tuples
Route each stream tuple through the operator graph (CPS)
Push output tuples to result consumers
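The main loop above can be sketched in miniature. This Python is a hypothetical, single-process illustration: in-memory deques stand in for the shared-memory queues, each query is reduced to one filter predicate instead of a graph of operators, and all names (`SharedRuntime`, `step`, the message layout) are assumptions.

```python
from collections import deque


class SharedRuntime:
    def __init__(self):
        self.control = deque()  # stand-in for the control message queue
        self.input = deque()    # stand-in for the incoming stream tuple queue
        self.queries = {}       # query id -> (predicate, consumer list)

    def step(self):
        """One iteration of the main loop."""
        # 1. Check for control messages: add new CQ, remove CQ.
        while self.control:
            op, qid, *args = self.control.popleft()
            if op == "add":
                predicate, consumer = args
                self.queries[qid] = (predicate, consumer)
            elif op == "remove":
                self.queries.pop(qid, None)
        # 2. Check for new stream tuples, route them, push results out.
        while self.input:
            tup = self.input.popleft()
            for predicate, consumer in self.queries.values():
                if predicate(tup):
                    consumer.append(tup)
```

Handling control messages before data in every iteration means a newly added query sees tuples from the next `step` onward, and a removed query stops receiving output immediately, without restarting the runtime.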
Shared Evaluation
Continuous query evaluation done by a network of operators in the shared runtime
If multiple queries reference the same operator, we can evaluate it only once
Better than linear scalability!
Each operator keeps track of the queries it helps to implement
Implementing Shared Evaluation
Sharing Predicates
Simple cases: inequality and equality (=) comparisons
Construct a tree that divides domain of type into disjoint regions
For each tuple: walk the tree to find the region the tuple belongs in
Region implies which queries the tuple is still visible to
Immutable functions can also be shared relatively easily
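To make the region idea concrete, here is a hypothetical Python sketch for one family of predicates, `value > constant`. A sorted list searched with `bisect` stands in for the tree (a binary search visits the same regions a balanced tree walk would); the class and method names are assumptions.

```python
from bisect import bisect_left


class GreaterThanIndex:
    """Shared evaluation of many predicates of the form `value > constant`."""

    def __init__(self, thresholds):
        # thresholds: dict mapping query id -> constant
        self.pairs = sorted((c, qid) for qid, c in thresholds.items())
        self.constants = [c for c, _ in self.pairs]

    def matching_queries(self, value):
        """Return the ids of all queries whose predicate `value` satisfies."""
        # The constants split the domain into regions; every constant
        # strictly below `value` belongs to a satisfied predicate.
        region = bisect_left(self.constants, value)
        return {qid for _, qid in self.pairs[:region]}
```

One lookup per tuple replaces one comparison per query: with hundreds of registered queries the search cost grows logarithmically, although building the result set still takes time proportional to the number of matching queries.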
Sharing Joins, Aggregates
Can also be done
Even between queries with varying windows and predicates
Requires some thought (say, a PhD thesis or two)
Adaptive Tuple Routing
Given a new tuple, how do we route it through the graph of operators?
Traditional approach: statically choose an optimal route for each stream
Hard optimization problem
Need to re-optimize when new queries defined or system conditions change (e.g. operator selectivity)
TelegraphCQ approach: adaptive per-tuple routing
Push tuples one at a time through the operator graph; choose order of operators at runtime
Implementing Adaptive Routing
For each tuple, maintain lineage
What operators has this tuple visited?
Which queries can still see this tuple?
Implication: can't push down projections
Make routing decisions on the basis of simple run-time statistics
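A toy version of per-tuple routing, as a hypothetical Python sketch rather than TelegraphCQ's actual policy: a single query with several filter operators, where each tuple carries a lineage list of visited operators and the next operator is whichever unvisited one has the lowest observed pass rate. `Filter`, `route` and the greedy heuristic are assumptions for illustration.

```python
class Filter:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.seen = 0    # tuples routed to this operator so far
        self.passed = 0  # tuples that survived it

    def pass_rate(self):
        # Untried operators report 0.0 so that they get explored first.
        return self.passed / self.seen if self.seen else 0.0


def route(tup, filters):
    """Route one tuple through the filters; return (survived, visit order)."""
    visited = []               # lineage: operators already applied
    remaining = list(filters)  # operators this tuple still has to visit
    while remaining:
        # Run-time decision: try the operator most likely to drop the tuple.
        op = min(remaining, key=Filter.pass_rate)
        remaining.remove(op)
        visited.append(op.name)
        op.seen += 1
        if not op.predicate(tup):
            return False, visited
        op.passed += 1
    return True, visited
```

Because the visit order is only decided per tuple, any operator may be the next one to see it, which is why columns cannot be projected away early. The statistics are two counters per operator, so the routing decision stays cheap.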
Handling Overload
Common scenario: peak stream rate >> average stream rate (bursty)
The system should cope gracefully
Three alternatives:
1 Spool tuples to disk, process later
But stream rates often exceed disk throughput
2 Drop excess tuples
3 Substitute statistical summaries for dropped stream tuples
Quality of Service (QoS)
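Alternatives 2 and 3 can be contrasted with a small sketch. The Python below is a hypothetical illustration: a bounded input queue that, instead of silently discarding overflow, folds dropped values into a (count, sum) summary. The class name, the fixed capacity and the choice of summary are all assumptions.

```python
from collections import deque


class SheddingQueue:
    """Bounded input queue that summarises the tuples it has to drop."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped_count = 0
        self.dropped_sum = 0.0

    def offer(self, value):
        if len(self.queue) < self.capacity:
            self.queue.append(value)
        else:
            # Overload: keep a (count, sum) summary instead of the tuple.
            self.dropped_count += 1
            self.dropped_sum += value

    def drain_sum(self):
        """Sum over queued tuples plus the summary of dropped ones."""
        total = sum(self.queue) + self.dropped_sum
        self.queue.clear()
        self.dropped_count, self.dropped_sum = 0, 0.0
        return total
```

A running sum happens to be exactly recoverable from such a summary; aggregates like medians or joins against the dropped tuples are not, and would need approximate summaries such as samples or histograms, which is where quality-of-service trade-offs come in.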
Open Source DSMS
Esper
DSMS engine written in Java (GPL). SQL-like stream query language.
http://esper.codehaus.org
TelegraphCQ
Academic prototype from UC Berkeley, based on PostgreSQL 7.3
PostgreSQL's SQL dialect, plus stream-oriented extensions
BSD licensed; http://telegraph.cs.berkeley.edu
StreamCruncher
DSMS engine written in Java. Free for commercial use (not open source).
http://www.streamcruncher.com
Proprietary DSMS
StreamBase
A Stonebraker company. Founded in 2003.
Other Startups
Coral8
Apama (purchased by Progress Software in 2005)
and more . . .
Established Companies
TIBCO BusinessEvents, Oracle BAM
Amalgamated Insight
Based on the experience gained from TelegraphCQ
New codebase
Application components:
1 Continuous Query Engine
Modified version of PostgreSQL (currently 8.1.9+)
2 Integration Framework
Connectors, input/output converters, query management
3 Visualization
Closed Series A funding in June 2006
1.0 release will be available Real Soon Now (currently RC3)
Lesson: PostgreSQL is a huge competitive advantage
We're hiring :-)
Q & A
Thank You.
Any Questions?