Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Page 1: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Telegraph: Continuously Adaptive Dataflow

Joe Hellerstein

Page 2: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Scenarios

• Ubiquitous computing: more than clients
  – sensors and their data feeds are key
    • smart dust, biomedical (MEMS sensors)
    • each consumer good records (mis)use
      – disposable computing
    • video from surveillance cameras, broadcasts, etc.
• Global Data Federation
  – all the data is online – what are we waiting for?
  – The plumbing is coming
    • XML/HTTP, etc. give lowest-common-denominator (LCD) communication
    • but how do you flow, summarize, query and analyze data robustly over many sources in the wide area?

Page 3: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Dataflow in Volatile Environments

• Federated query processors a reality
  – Cohera, IBM DataJoiner
  – No control over stats, performance, administration
• Large Cluster Systems “Scaling Out”
  – No control over “system balance”
• User “CONTROL” of running dataflows
  – Long-running dataflow apps are interactive
  – No control over user interaction
• Sensor Nets: the next killer app
  – E.g. “Smart Dust”
  – No control over anything!
• Telegraph
  – Dataflow Engine for these environments

Page 4: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Data Flood: Main Features

• What does it look like?
  – Never ends: interactivity required
    • Online, controllable algorithms for all tasks!
  – Big: data reduction/aggregation is key
  – Volatile: this scale of devices and nets will not behave nicely

Page 5: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

The Telegraph Dataflow Engine

• Key technologies
  – Interactive Control
    • interactivity with early answers and examples
    • online aggregation for data reduction
  – Dataflow programming via paths/iterators
    • Elevate query processing frameworks out of DBMSs
    • Long tradition of static optimization here
      – Suggestive, but not sufficient for volatile environments
  – Continuously adaptive flow optimization
    • massively parallel, adaptive dataflow via Rivers and Eddies
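A minimal sketch of the online-aggregation idea named on this slide: keep a running estimate plus an error bar that tightens as rows arrive, so an early answer is available at any point. The function name `online_mean`, the 95% normal-approximation interval, and the synthetic data are assumptions for illustration, not Telegraph or CONTROL code. The interval is only meaningful if rows arrive in random order.

```python
import math
import random


def online_mean(stream, z=1.96):
    """Yield (rows_seen, running_mean, half_width) after each row.

    Uses Welford's method for a numerically stable running variance.
    half_width is z standard errors (about 95% for z=1.96).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            stderr = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * stderr
        else:
            # One row gives no variance estimate yet.
            yield n, mean, float("inf")


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [rng.gauss(50.0, 10.0) for _ in range(10_000)]  # synthetic feed
    for n, mean, half in online_mean(rows):
        if n in (10, 100, 1_000, 10_000):
            print(f"after {n:>6} rows: {mean:.2f} +/- {half:.2f}")
```

A user watching the output could stop the query as soon as the interval is tight enough, which is the kind of control the slide is after.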

Page 6: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

CONTROL: Continuous Output and Navigation Technology with Refinement On Line

• Data-intensive jobs are long-running. How to give early answers and interactivity?
  – online interactivity over feeds
    • pipelining “online” operators, data “juggle”
  – online data correlation algs: ripple joins, online mining and aggregation
  – statistical estimators, and their performance implications
    • Deliver data to satisfy statistical goals
• Appreciate interplay of massive data processing, stats, and HCI
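To make the "ripple joins" bullet concrete, here is a simplified sketch of a square ripple join: it alternates between the two inputs, and each newly read row is joined against everything already seen from the other side, so matches stream out long before either input is exhausted. The function name `ripple_join` and the `match` callback are assumptions for illustration. The statistical estimators that CONTROL pairs with the join are omitted.

```python
def ripple_join(r_rows, s_rows, match):
    """Yield matching (r, s) pairs incrementally.

    Each pair is produced exactly once, when the later of its two
    rows arrives. Inputs may be any iterables, including feeds.
    """
    seen_r, seen_s = [], []
    r_iter, s_iter = iter(r_rows), iter(s_rows)
    r_done = s_done = False
    while not (r_done and s_done):
        if not r_done:
            try:
                r = next(r_iter)
            except StopIteration:
                r_done = True
            else:
                # Join the new R row against all S rows seen so far.
                for s in seen_s:
                    if match(r, s):
                        yield r, s
                seen_r.append(r)
        if not s_done:
            try:
                s = next(s_iter)
            except StopIteration:
                s_done = True
            else:
                # Join the new S row against all R rows seen so far.
                for r_old in seen_r:
                    if match(r_old, s):
                        yield r_old, s
                seen_s.append(s)


if __name__ == "__main__":
    for pair in ripple_join([1, 2, 3], [2, 3, 4], lambda r, s: r == s):
        print(pair)
```

Because the function is a generator, a consumer can stop pulling results at any time, which fits the early-answers goal on this slide.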

“Of all men's miseries, the bitterest is this: to know so much and have control over nothing”

–Herodotus

Page 7: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Performance Regime for CONTROL

• New “Greedy” Performance Regime
  – Maximize 1st derivative of the user-happiness function

[Figure: CONTROL vs. Traditional curves plotted against Time, with the vertical axis reaching 100%]

Page 8: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

CONTROL: Continuous Output and Navigation Technology with Refinement On Line

Page 9: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

CONTROL: Continuous Output and Navigation Technology with Refinement On Line

Page 10: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.
Page 11: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Potter’s Wheel Anomaly Detection

Page 12: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

River

• We built the world’s fastest sorting machine
  – On the “NOW”: 100 Sun workstations + SAN
  – But it only beat the record under ideal conditions!
• River: performance adaptivity for data flows on clusters
  – simplifies management and programming
  – perfect for sensor-based streams
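The "ideal conditions" point can be illustrated with a toy simulation. It is a generic sketch of rate-based load balancing, not a description of River's actual mechanisms. A static even split is bounded by the slowest node, while letting each node pull work whenever it is free shifts load toward faster nodes. The function names and the service times are assumptions for illustration.

```python
import heapq


def simulate_shared_queue(num_items, service_times):
    """Consumers pull the next item as soon as they become free.

    Returns (items processed per consumer, overall finish time).
    """
    # Heap of (time the consumer becomes free, consumer index).
    free_at = [(0.0, i) for i in range(len(service_times))]
    heapq.heapify(free_at)
    done = [0] * len(service_times)
    finish = 0.0
    for _ in range(num_items):
        t, i = heapq.heappop(free_at)
        t_done = t + service_times[i]
        done[i] += 1
        finish = max(finish, t_done)
        heapq.heappush(free_at, (t_done, i))
    return done, finish


def simulate_static_split(num_items, service_times):
    """Every consumer gets an equal share (assumes an even division)."""
    share = num_items // len(service_times)
    return max(share * s for s in service_times)


if __name__ == "__main__":
    times = [1.0, 2.0]  # hypothetical: the second node is half as fast
    print("static split finish:", simulate_static_split(90, times))
    print("shared queue:", simulate_shared_queue(90, times))
```

With one node running at half speed, the pulled schedule gives it proportionally less work, so the whole job is no longer held back to the pace of the slowest node.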

Page 13: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Declarative Dataflow: NOT new

• Database Systems have been doing this for years
  – Translate declarative queries into an efficient dataflow plan
  – “query optimization” considers:
    • Alternate data sources (“access methods”)
    • Alternate implementations of operators
    • Multiple orders of operators
    • A space of alternatives defined by transformation rules
    • Estimate costs and “data rates”, then search space
• But in a very static way!
  – Gather statistics once a week
  – Optimize query at submission time
  – Run a fixed plan for the life of the query
• And these ideas are ripe to elevate out of DBMSs
  – And outside of DBMSs, the world is very volatile
  – There are surely going to be lessons “outside the box”
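The "estimate costs, then search space" step can be sketched for the simplest case, ordering a chain of independent filters. The operator names and the cost and selectivity numbers are hypothetical, and exhaustive enumeration stands in for a real optimizer's search strategy.

```python
from itertools import permutations

# Hypothetical estimates per operator:
# (cost per input tuple, fraction of tuples that pass).
STATS = {
    "cheap_unselective": (1.0, 0.9),
    "pricey_selective": (5.0, 0.1),
    "mid": (2.0, 0.5),
}


def expected_cost(order, stats):
    """Expected work per source tuple when filters run in this order."""
    cost, surviving = 0.0, 1.0
    for name in order:
        op_cost, pass_fraction = stats[name]
        cost += surviving * op_cost      # only surviving tuples reach this op
        surviving *= pass_fraction
    return cost


def best_static_plan(stats):
    """Enumerate every operator order and keep the cheapest."""
    return min(permutations(stats), key=lambda order: expected_cost(order, stats))


if __name__ == "__main__":
    plan = best_static_plan(STATS)
    print("plan chosen at submission time:", plan)
    # If the data drifts after optimization, the fixed plan keeps running.
    drifted = dict(STATS, pricey_selective=(5.0, 0.95))
    print("fixed plan under drift:", expected_cost(plan, drifted))
    print("re-optimized under drift:",
          expected_cost(best_static_plan(drifted), drifted))
```

The second half of the usage shows the static weakness the slide points at: the plan is only as good as the statistics it was optimized with.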

Page 14: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Static Query Plans

• Volatile environments like sensors need to adapt at a much finer grain

Page 15: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Continuous Adaptivity: Eddies

• How to order and reorder operators over time
  – based on performance, economic/admin feedback
• Vs. River:
  – River optimizes each operator “horizontally”
  – Eddies optimize a pipeline “vertically”

[Figure: Eddy]
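A minimal sketch of per-tuple operator reordering, restricted to filters. The slide does not specify a routing policy, so the ticket scheme here (an operator earns a ticket each time it drops a tuple, and pending operators are picked in proportion to their tickets) is an assumption for illustration, as are the class and method names.

```python
import random


class Eddy:
    """Routes each tuple through every filter, choosing the order per tuple."""

    def __init__(self, filters, seed=0):
        self.filters = dict(filters)                   # name -> predicate
        self.tickets = {name: 1 for name in self.filters}
        self.rng = random.Random(seed)

    def _pick(self, pending):
        # Weighted random choice: more tickets means more likely to go first.
        weights = [self.tickets[name] for name in pending]
        return self.rng.choices(pending, weights=weights, k=1)[0]

    def process(self, tup):
        """Return True if the tuple passes every filter."""
        pending = list(self.filters)
        while pending:
            name = self._pick(pending)
            pending.remove(name)
            if not self.filters[name](tup):
                self.tickets[name] += 1                # reward dropping early
                return False
        return True

    def run(self, stream):
        return [t for t in stream if self.process(t)]


if __name__ == "__main__":
    eddy = Eddy({
        "even": lambda x: x % 2 == 0,
        "big": lambda x: x > 50,
    })
    result = eddy.run(range(100))
    print(len(result), "tuples passed; tickets:", eddy.tickets)
```

The output is the same whatever order is chosen; only the amount of work changes. Because tickets are updated on every tuple, the preferred order can shift mid-stream if the data changes, with no re-optimization step.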

Page 16: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Competitive Eddies

[Figure: an Eddy diagram with labels R1, R2, R3, S1, S2, S3, hash, block, index1, index2]

Page 17: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Telegraph: Putting it Together

• Scalable, adaptive dataflow infrastructure. Apps include…
  – sensor nets
  – massively parallel and wide-area query engines
  – net appliances: chaining transformation/aggregation/compression/etc. in proxies
  – any volatile dataflow scenario
• Technology: a marriage of…
  – CONTROL, Rivers & Eddies
    • Many research questions here
    • E.g. how to combine River and Eddy adaptivity
    • E.g. how to tune Eddies for statistical performance goals
  – Combinations of browse/query/mine at UI
  – Storage management to handle new hardware realities
• Look for a live service this summer!

Page 18: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Integration with Endeavour

• Give
  – Be data-intensive backbone to diverse clients
  – Be replication/delivery dataflow engine for OceanStore
  – Telegraph Storage Manager provides storage (transactional/otherwise) for OceanStore
  – Provide platform for data-intensive “tacit info mining”
• Take
  – Leverage OceanStore to manage distributed metadata, security
  – Leverage protocols out of TinyOS for sensors

Page 19: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Connectivity & Heterogeneity

• Lots of folks working on data format translation, parsing
  – we will borrow, not build
  – currently using JDBC & Cohera Net Query
    • commercial tool, donated by Cohera Corp.
    • gateways XML/HTML (via HTTP) to ODBC/JDBC
  – we may write “Teletalk” gateways from sensors
• Heterogeneity
  – never a simple problem
  – CONTROL project developed interactive, online data transformation tool: ABC

Page 20: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

More Info

• Collaborators:
  – Mike Franklin, Eric Brewer, Christos Papadimitriou
  – Sirish Chandrasekaran, Amol Deshpande, Kris Hildrum, Sam Madden, Vijayshankar Raman, Mehul Shah
• Me: [email protected]
• Web:
  – http://db.cs.berkeley.edu/telegraph
  – http://control.cs.berkeley.edu

Page 21: Telegraph Continuously Adaptive Dataflow Joe Hellerstein.

Extra slides for backup

