Streaming Data, Continuous Queries, and Adaptive Dataflow

Streaming Data, Continuous Queries, and Adaptive Dataflow

Michael FranklinUC Berkeley

NRC June 2002

.

2

Data Stream ProcessingNetworked data streams central to current and

future computing.Existing data management and query processing

infrastructure is lacking:– Adaptability– Continuous and Incremental Processing– Work Sharing for large scale– Resource scalability: from “smart dust” up to clusters

to grids.XML provides additional opportunites.

3

Example 1: “Transactional Flows”

E-Commerce, clickstream, swipestream, logs…

Network Monitoring B2B and Enterprise apps

– Supply-Chain, CRM, ERP (Quasi) real-time flow of events and data Must manage these flows to drive business

processes. Mine flows to create and adjust business rules. Can also “tap into” flows for on-line analysis.

4

Example 2: Information Dissemination

User Profiles

Users

Filtered Data

Data Sources

•Doc creation or crawler initiates flow of data towards users.•profiles are aggregated back towards data.

5

Example 3: Sensor Nets

Tiny (or not so tiny) devices measure the physical world.– Berkeley “motes”, Smart Dust, Smart Tags, …

Many monitoring applications– Transportation, Seismic, Energy, Military…

Form dynamic ad hoc networks. Aggregate and communicate streams of values. Not one way – can actuate to effect or actively

monitor the environment

6

Common Features Centrality of Dataflow and Data Routing

– Architecture is focused on data movement– Moving streams of data through code in a network

Volatility of the environment– Dynamic resources & topology, partial failures– Long-running (never-ending?) tasks– Potential for user interaction during the flow– Large Scale: users, data, resources, …

Resource Constraints– Bandwidth, memory,processing,battery,…– Time and human attention

7

In The Beginning

Data

Query

Index

Result

8

Pub Sub/CQ/Filtering

Queries

Dat

a

Index Result

•Effectively processes all queries simultaneously.•Shares work for common sub-expressions.

9

Telegraph/PSoup: Query & Data Duality

Queries

Index

Result

DataData

Index

10

Telegraph/PSoup: Query & Data Duality

Queries

Index

Result

Data

Index

Query

11

PSoup – Query Invocation PSoup continuously maintains materialized views over

streaming data and queries. Data is returned to user when query is invoked.

– Invocation requires applying “windows” to precomputed results.

Adaptive approach allows system to continuously absorb new data and new queries without recompilation.

Lots of issues to study: – Query indexing, Spilling to disk, bulk processing– Other semantics and interaction models (e.g., alerts)

12

Stream Processing Research Agenda Need continuously-adaptive processing. Need appropriate data model & query lang.

– Window semantics: input and output– Notification semantics & thresholds

Approximation, satisficing, and QoS– must be driven by user needs and context– adapt to available resources & time constraints

Integration & interaction with “pooled” data.– time travel, archiving, “normal” databases

Structured, semi-, and un- data; XML etc. Sensor-sensitive processing. Metrics and Benchmarks (challenge problems).

13

Conclusions Dataflow and streaming are central to many

emerging application areas.– Solutions require a mixture of database and

networking approaches:adaptivity and tolerance of partial failureexploitation of user, app, and data semantics

A new infrastructure is needed for solving these problems. – Duality of Data and Queries

Currently a topic of major interest in the research community.

Date post:	25-Feb-2016
Category:	Documents
Upload:	yannis
View:	31 times
Download:	3 times

Streaming Data, Continuous Queries, and Adaptive Dataflow

Documents