
HiFi Systems: Network-Centric Query Processing for the Physical World

Michael Franklin

UC Berkeley

2.13.04


Introduction

Continuing improvements in sensor devices

– Wireless motes

– RFID

– Cellular-based telemetry

Cheap devices can monitor the environment at a high rate.

Connectivity enables remote monitoring at many different scales.

Widely different concerns at each of these levels and scales.


Plan of Attack

Motivation/Applications/Examples

Characteristics of HiFi Systems

Foundational Components
– TelegraphCQ
– TinyDB

Research Issues

Conclusions


The Canonical HiFi System


RFID - Retail Scenario

“Smart Shelves” continuously monitor item addition and removal.

Info is sent back through the supply chain.


“Extranet” Information Flow

(Diagram: Manufacturers C and D and Retailers A and B exchange information through an Aggregation/Distribution Service.)


M2M - Telemetry/Remote Monitoring

Energy Monitoring - Demand Response

Traffic, Power Generation, Remote Equipment


Time-Shift Trend Prediction

National companies can exploit East Coast/ West Coast time differentials to optimize West Coast operations.


Virtual Sensors

Sensors don’t have to be physical sensors.

– Network monitoring algorithms for detecting viruses, spam, DoS attacks, etc.

– Disease outbreak detection


Properties

High fan-in, globally-distributed architecture.

Large data volumes generated at edges.
– Filtering and cleaning must be done there.

Successive aggregation as you move inwards.
– Summaries/anomalies continually, details later.

Strong temporal focus.

Strong spatial/geographic focus.

Streaming data and stored data.

Integration within and across enterprises.


One View of the Design Space

(Diagram: processing stages — Filtering/Cleaning/Alerts, Monitoring/Time-series, Data mining (recent history), and Archiving (provenance and schema evolution) — arranged along a time-scale axis from seconds to years, moving from on-the-fly processing through combined stream/disk processing to disk-based processing.)


Another View of the Design Space

(Diagram: the same stages — Filtering/Cleaning/Alerts, Monitoring/Time-series, Data mining (recent history), Archiving (provenance and schema evolution) — arranged along a geographic-scope axis from local to global: several readers, regional centers, central office.)


One More View of the Design Space

(Diagram: the same stages arranged by degree of detail and data volume, from individual readings to aggregates — duplicate elimination (history: hours), interesting events (history: days), trends/archive (history: years).)


Building Blocks

TelegraphCQ and TinyDB

TelegraphCQ: Monitoring Data Streams

Streaming Data
– Network monitors
– Sensor networks
– News feeds
– Stock tickers

B2B and Enterprise apps
– Supply-Chain, CRM, RFID
– Trade Reconciliation, Order Processing, etc.

(Quasi) real-time flow of events and data.

Must manage these flows to drive business (and other) processes.

Can mine flows to create/adjust business rules or to perform on-line analysis.

TelegraphCQ (Continuous Queries)

An adaptive system for large-scale shared dataflow processing.

Based on an extensible set of operators:

1) Ingress (data access) operators
– Wrappers, file readers, sensor proxies

2) Non-blocking data processing operators
– Selections (filters), XJoins, …

3) Adaptive routing operators
– Eddies, STeMs, FLuX, etc.

Operators connected through “Fjords”
– Queue-based framework unifying push and pull (see the sketch below).
– Fjords will also allow us to easily mix and match streaming and stored data sources.
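To make the push/pull unification concrete, here is a minimal sketch, assuming a toy queue interface of my own invention (PushSource, PullSource, run_filter are hypothetical names, not TelegraphCQ APIs): a downstream operator polls the same interface whether tuples are pushed in by a streaming source or pulled from a stored one.

# Illustrative sketch only: a toy "Fjord-style" queue that lets one operator
# consume both push-based (streaming) and pull-based (stored) inputs.
from collections import deque
from typing import Iterator, Optional

class PushSource:
    """Streaming input: tuples arrive asynchronously and are enqueued."""
    def __init__(self):
        self.queue = deque()
    def deliver(self, tup):            # called by the "network" when data arrives
        self.queue.append(tup)
    def poll(self) -> Optional[dict]:  # non-blocking: None if nothing has arrived
        return self.queue.popleft() if self.queue else None

class PullSource:
    """Stored input: the consumer asks for the next tuple when it wants one."""
    def __init__(self, rows):
        self.it: Iterator = iter(rows)
    def poll(self) -> Optional[dict]:
        return next(self.it, None)

def run_filter(source, predicate, max_idle=3):
    """One downstream operator; it neither knows nor cares whether its
    input is pushed or pulled -- it just polls the queue interface."""
    out, idle = [], 0
    while idle < max_idle:
        tup = source.poll()
        if tup is None:
            idle += 1            # a real Fjord would yield to the scheduler here
            continue
        idle = 0
        if predicate(tup):
            out.append(tup)
    return out

# Usage: the same operator over a pushed stream and a pulled table.
stream = PushSource()
for v in (3, 9, 1):
    stream.deliver({"val": v})
table = PullSource([{"val": 7}, {"val": 2}])
print(run_filter(stream, lambda t: t["val"] > 2))  # [{'val': 3}, {'val': 9}]
print(run_filter(table,  lambda t: t["val"] > 2))  # [{'val': 7}]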


Extreme Adaptivity

This is the region that we are exploring in the Telegraph project.

(Diagram: a spectrum of adaptivity — static plans (current DBMS), late binding (Dynamic, Parametric, Competitive), inter-operator reoptimization (Query Scrambling, Mid-Query Re-opt), intra-operator adaptivity (XJoin, DPHJ, Convergent QP), and per-tuple adaptivity (Eddies, CACQ, PSoup), with open territory (???) beyond.)

Traditional query optimization depends on statistical knowledge of the data and a stable environment.

The streaming world has neither.


Adaptivity Overview [Avnur & Hellerstein 2000]

• How to order and reorder operators over time?

– Traditionally, use performance, economic/admin feedback

– won’t work for never-ending queries over volatile streams

• Instead, use adaptive record routing.

Reoptimization = change in routing policy

(Diagram: a static dataflow pipes every tuple through operators A, B, C, D in a fixed order; an eddy instead routes each tuple among A, B, C, D adaptively.)
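As a rough illustration of per-tuple routing, here is a minimal sketch, assuming a simplified reward scheme (operators that drop more tuples earn more routing weight); the ToyEddy class and its ticket bookkeeping are hypothetical simplifications, not the TelegraphCQ eddy or its lottery-scheduling policy.

# Illustrative sketch only: a toy eddy that routes each tuple through a set of
# filter operators one at a time, biasing the routing toward operators that
# have been dropping many tuples, so the policy adapts as the data changes.
import random

class ToyEddy:
    def __init__(self, operators):
        # operators: dict name -> predicate; a tuple must pass all of them
        self.ops = operators
        self.tickets = {name: 1 for name in operators}  # routing weights

    def route(self, tup):
        remaining = set(self.ops)
        while remaining:
            names = list(remaining)
            weights = [self.tickets[n] for n in names]
            name = random.choices(names, weights=weights)[0]
            if self.ops[name](tup):
                remaining.discard(name)          # passed; still needs the rest
            else:
                self.tickets[name] += 1          # reward: this operator filters well
                return None                      # tuple dropped
        return tup                               # passed every operator

eddy = ToyEddy({
    "cheap_but_weak": lambda t: t["a"] > 0,       # rarely drops anything
    "strong_filter":  lambda t: t["b"] % 7 == 0,  # drops most tuples
})
out = [t for t in ({"a": i, "b": i} for i in range(1000)) if eddy.route(t)]
print(len(out), eddy.tickets)   # strong_filter accumulates far more tickets

Reoptimization here is nothing more than the ticket counts shifting over time; no plan is ever recompiled.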


The TelegraphCQ Architecture

(Architecture diagram:
– TelegraphCQ Front End: Listener, Parser, Planner, Mini-Executor, Catalog.
– Shared memory: query plan queue, eddy control queue, query result queues, and the shared-memory buffer pool, backed by disk.
– TelegraphCQ Back End modules: Scans, Split operators, and the CQEddy.
– TelegraphCQ Wrapper ClearingHouse: wrappers and proxies for external sources.)

A single CQEddy can encode multiple queries.
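To give a feel for how one routing loop can serve several continuous queries at once, here is a minimal sketch under my own simplification: each distinct predicate is evaluated once per tuple, and the tuple is delivered to every query whose predicates all passed. The QUERIES/PREDICATES structures are hypothetical, not the CQEddy's actual data structures.

# Illustrative sketch only: one shared evaluation loop for several queries.
QUERIES = {
    "hot":     [("temp_high", lambda t: t["temp"] > 30)],
    "hot_dry": [("temp_high", lambda t: t["temp"] > 30),
                ("low_humid", lambda t: t["humidity"] < 40)],
}

# Shared work: evaluate each distinct predicate once per tuple.
PREDICATES = {name: fn for q in QUERIES.values() for name, fn in q}

def process(tup):
    passed = {name for name, fn in PREDICATES.items() if fn(tup)}
    # deliver the tuple to every query all of whose predicates passed
    return [q for q, preds in QUERIES.items()
            if all(name in passed for name, _ in preds)]

print(process({"temp": 35, "humidity": 30}))  # ['hot', 'hot_dry']
print(process({"temp": 35, "humidity": 80}))  # ['hot']
print(process({"temp": 20, "humidity": 30}))  # []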


The StreaQuel Query Language

SELECT projection_list

FROM from_list

WHERE selection_and_join_predicates

ORDERED BY

TRANSFORM…TO

WINDOW…BY

Target language for TelegraphCQ

Windows can be applied to individual streams.

Window movement is expressed using a “for loop” construct in the TRANSFORM clause.

We’re not completely happy with our syntax at this point.

Example Window Query: Landmark

(Timeline diagram, 0 to 60: a landmark window whose trailing edge stays fixed at the landmark point while its leading edge tracks NOW as it advances from t = 40 to 41, …, 45, 50 — the window only grows.)
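A minimal sketch of the landmark-window semantics shown above, on made-up data; the landmark_window helper is hypothetical, not StreaQuel or TelegraphCQ code.

# Illustrative sketch only: landmark-window semantics over a toy stream.
# The window's start stays fixed at a chosen landmark time while its end
# tracks NOW, so each evaluation sees everything since the landmark.
stream = [(t, t * 10) for t in range(0, 61, 5)]   # (timestamp, value) tuples

def landmark_window(stream, landmark, now):
    return [v for (ts, v) in stream if landmark <= ts <= now]

for now in (40, 45, 50):
    w = landmark_window(stream, landmark=0, now=now)
    print(f"NOW={now}: {len(w)} tuples, max value {max(w)}")
# The window only grows as NOW advances; contrast with a sliding window,
# whose start would also move forward with NOW.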

Current Status - TelegraphCQ

System developed by modifying PostgreSQL.

Initial version released Aug 03
– Open source (PostgreSQL license)
– Shared joins with windows and aggregates
– Archived/unarchived streams
– Next major release planned this summer

Initial users include
– Network monitoring project at LBL (Netlogger)
– Intrusion detection project at Eurecom (France)
– Our own project on sensor data processing
– Class projects at Berkeley, CMU, and ???

Visit http://telegraph.cs.berkeley.edu for more information.


Query-based interface to sensor networks

Developed on TinyOS/Motes

Benefits

– Ease of programming and retasking

– Extensible aggregation framework

– Power-sensitive optimization and adaptivity

Sam Madden (Ph.D. Thesis) in collaboration with Wei Hong (Intel).

http://telegraph.cs.berkeley.edu/tinydb

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

(Diagram: the application issues queries and triggers to TinyDB running in the sensor network; data streams back up to the application.)

Declarative Queries in Sensor Nets

SELECT nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

“Report the light intensities of the bright nests.”

Sensors:

Epoch  nestNo  Light  Temp  Accel  Sound
0      1       455    x     x      x
0      2       389    x     x      x
1      1       422    x     x      x
1      2       405    x     x      x

Epoch  nestNo  Light  Temp  Accel  Sound
0      1       455    x     x      x
0      2       389    x     x      x

Many sensor network applications can be described using query language primitives.

– Potential for tremendous reductions in development and debugging effort.

Aggregation Query Example

“Count the number of occupied nests in each loud region of the island.”

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Result (regions with AVG(sound) > 200):

Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520
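A minimal sketch of how a GROUP BY / HAVING aggregate like the one above could be evaluated over one epoch's readings. The sample readings are made up for the example, not taken from the talk.

# Illustrative sketch only: one epoch of a grouped aggregate with HAVING.
from collections import defaultdict

readings = [  # (region, occupied, sound) for a single epoch
    ("North", 1, 300), ("North", 1, 400), ("North", 1, 380),
    ("South", 1, 500), ("South", 1, 540), ("South", 1, 520),
    ("West",  1, 100), ("West",  0, 120),
]

groups = defaultdict(lambda: {"cnt": 0, "sound_sum": 0, "n": 0})
for region, occupied, sound in readings:
    g = groups[region]
    g["cnt"] += occupied        # count of occupied nests in this region
    g["sound_sum"] += sound
    g["n"] += 1

for region, g in groups.items():
    avg_sound = g["sound_sum"] / g["n"]
    if avg_sound > 200:         # HAVING AVG(sound) > 200
        print(region, g["cnt"], round(avg_sound))
# North 3 360
# South 3 520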


Query Language (TinySQL)

SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]

Sensor Queries @ 10000 Ft

(Diagram: a query is disseminated down a routing tree of motes A–F; result sets {D,E,F}, {B,D,E,F}, {A,B,C,D,E,F} accumulate as data flows back up toward the root.)

Written in SQL, with extensions for:
• Sample rate
• Offline delivery
• Temporal aggregation

(Almost) all queries are continuous and periodic.
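To show what "continuous and periodic" means operationally on a single node, here is a minimal sketch: each epoch the mote samples, applies the query's filter, and delivers qualifying tuples. The function names, constants, and stubs are hypothetical, not TinyDB code.

# Illustrative sketch only: a periodic TinySQL-style query on one node.
import random, time

SAMPLE_PERIOD_S = 1.0     # e.g. "SAMPLE PERIOD 1s" / "EPOCH DURATION 1s"
LIGHT_THRESHOLD = 400     # e.g. "WHERE light > 400"

def read_light():         # stub for the real ADC read on a mote
    return random.randint(300, 500)

def send(tuple_):         # stub for the radio send toward the root
    print("send", tuple_)

def run_query(num_epochs=5, node_id=1):
    for epoch in range(num_epochs):
        light = read_light()                  # sample once per epoch
        if light > LIGHT_THRESHOLD:           # apply the WHERE clause
            send({"epoch": epoch, "nestNo": node_id, "light": light})
        time.sleep(SAMPLE_PERIOD_S)           # sleep until the next epoch
        # (a real mote would power down the radio/CPU between epochs)

run_query()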


In-Network Processing: Aggregation

(Animation, one frame per communication interval, for SELECT COUNT(*) FROM sensors over a five-node routing tree: in each interval of the epoch, one level of the tree transmits, and each parent adds its children’s partial counts to its own, so a single count of 5 reaches the root by the end of the epoch; the process then repeats in the next epoch.)
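The same idea in a minimal sketch: each node sends one partial aggregate to its parent instead of forwarding every raw reading. The routing tree below is made up for the example; this is the flavor of TAG-style aggregation, not TinyDB's implementation.

# Illustrative sketch only: in-network COUNT(*) over a small routing tree.
children = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}   # node -> children

def partial_count(node):
    # every node contributes 1 for itself plus its children's partial counts
    return 1 + sum(partial_count(child) for child in children[node])

print("COUNT(*) at root:", partial_count(1))   # 5
# Without in-network aggregation, every reading would be forwarded hop by hop
# to the root; with it, each link carries a single partial aggregate.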

In-Network Aggregation: Example Benefits

Simulation setup: 2500 nodes, 50x50 grid, depth ~10, neighbors ~20.


(Chart: total bytes transmitted vs. aggregation function — EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN — y-axis from 0 to 100,000 bytes.)


Taxonomy of Aggregates

TinyDB insight: classify aggregates according to various functional properties

– Yields a general set of optimizations that can automatically be applied

Property                 Examples                                     Affects
Partial State            MEDIAN: unbounded, MAX: 1 record             Effectiveness of TAG
Duplicate Sensitivity    MIN: dup. insensitive, AVG: dup. sensitive   Routing Redundancy
Exemplary vs. Summary    MAX: exemplary, COUNT: summary               Applicability of Sampling, Effect of Loss
Monotonic                COUNT: monotonic, AVG: non-monotonic         Hypothesis Testing, Snooping
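Two of these properties in a minimal sketch on toy partial states (merge functions and data are my own, not TinyDB code): MAX carries one record of partial state and is duplicate-insensitive, while AVG carries a (sum, count) pair and gives a skewed answer if a partial result is counted twice, which is exactly why duplicate sensitivity constrains routing redundancy.

# Illustrative sketch only: partial state and duplicate sensitivity.
def merge_max(a, b):
    return max(a, b)

def merge_avg(a, b):                      # partial state is (sum, count)
    return (a[0] + b[0], a[1] + b[1])

readings = [10, 20, 30]
max_state = 10
avg_state = (10, 1)
for r in readings[1:]:
    max_state = merge_max(max_state, r)
    avg_state = merge_avg(avg_state, (r, 1))
print(max_state, avg_state[0] / avg_state[1])          # 30 20.0

# If one child's partial state arrives twice (e.g. over redundant routes):
dup = (30, 1)
once  = merge_avg(avg_state, dup)
twice = merge_avg(once, dup)                            # same state counted again
print(once[0] / once[1], twice[0] / twice[1])           # 22.5 vs 24.0 (sensitive)
print(merge_max(merge_max(max_state, 30), 30))          # 30 either way (insensitive)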

Current Status - TinyDB

System built on top of TinyOS (~10K lines of embedded C code).

Latest release 9/2003.

Several deployments, including redwoods at the UC Botanical Garden.

Visit http://telegraph.cs.berkeley.edu/tinydb for more information.

(Deployment figure: motes mounted on a redwood at heights from 10m to 36m — 10m: 101, 102, 103; 20m: 104, 105, 106; 30m: 107, 108, 109; 32m: 110; 33m: 111 — with charts of temperature (roughly 8–33 °C) and relative humidity (roughly 35–95%) vs. time over July 7–9, 2003 for sensors 101, 104, 109, 110, 111.)


Putting It All Together?

TelegraphCQ + TinyDB


Ursa - A HiFi Implementation

Current effort towards building an integrated infrastructure that spans the large scale in:
– Time
– Geography
– Resources

Ursa-Minor (TinyDB-based)

Mid-tier (???)

Ursa-Major (TelegraphCQ w/ Archiving)


TelegraphCQ/TinyDB Integration

Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.

Main issues revolve around what to run where.
– TCQ is a query processor
– TinyDB is also a query processor
– Optimization criteria include: total cost, response time, answer quality, answer likelihood, power conservation on motes, … (see the sketch below)

Project on-going; should work by summer.

Related work: Gigascope work at AT&T
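A minimal sketch of the "what to run where" decision for a single filter, under a made-up cost model: push the predicate into the motes when the radio messages it saves outweigh its in-network evaluation cost, otherwise ship raw readings to TelegraphCQ. The numbers, units, and function are hypothetical, not the project's actual optimizer.

# Illustrative sketch only: a toy placement decision for one filter.
def plan_filter_placement(selectivity, msg_cost_uj, in_net_eval_cost_uj):
    """Cost per reading, in (hypothetical) microjoules of mote energy."""
    ship_raw  = msg_cost_uj                                     # always transmit
    push_down = in_net_eval_cost_uj + selectivity * msg_cost_uj
    return "push filter to motes" if push_down < ship_raw else "filter in TCQ"

# A selective predicate is worth pushing down; a non-selective one is not.
print(plan_filter_placement(selectivity=0.05, msg_cost_uj=400, in_net_eval_cost_uj=20))
print(plan_filter_placement(selectivity=0.98, msg_cost_uj=400, in_net_eval_cost_uj=20))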


TCQ-based Overlay Network

TCQ is primarily a single-node system.
– Flux operators [Shah et al 03] support cluster-based processing.

Want to run TCQ at each internal node.

Primary issue is support for wide-area temporal and geographic aggregation.
– In an adaptive manner, of course.

Currently under design.

Related work: Astrolabe, IRISNet, DBIS, …


Querying the Past, Present, and Future

Need to handle archived data
– Adaptive compression can reduce processing time
– Historical queries
– Joins of live and historical data (see the sketch below)
– Dealing with later-arriving detail info

Archiving Storage Manager - a split-stream SM for stream and disk-based processing.

Initial version of the new SM running.

Related work: temporal and time-travel DBs
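To illustrate one of the bullets above, here is a minimal sketch of joining live stream tuples against archived data; the archive is just an in-memory dict here, standing in for data a real system would read from disk through the storage manager. All names and values are made up.

# Illustrative sketch only: probing an archive with each live tuple.
archive = {                       # historical daily averages, keyed by sensor id
    101: {"avg_temp": 14.2},
    104: {"avg_temp": 15.1},
}

def join_live_with_history(live_tuple):
    hist = archive.get(live_tuple["sensor"])
    if hist is None:
        return None               # no history yet for this sensor
    return {**live_tuple, "delta_vs_avg": live_tuple["temp"] - hist["avg_temp"]}

for t in [{"sensor": 101, "temp": 19.0}, {"sensor": 104, "temp": 12.0}]:
    print(join_live_with_history(t))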


XML, Integration, and Other Realities

Eventually need to support XML.

Must integrate with existing enterprise apps.

In many areas, standardization is well underway.

Augmenting moving data.

Related work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al., OGI], 30+ years of data integration research, 10+ years of XML research, …

(Diagram: high fan-in collection feeding high fan-out dissemination.)


Conclusions

Sensors, RFIDs, and other data collection devices enable real-time enterprises.

These will create high fan-in systems.

Can exploit recent advances in streaming and sensor data management.

Lots to do!

