HiFi Systems: Network-Centric Query
Processing for the Physical World
Michael Franklin
UC Berkeley
2.13.04
Introduction
Continuing improvements in sensor devices:
– Wireless motes
– RFID
– Cellular-based telemetry
Cheap devices can monitor the environment at a high rate.
Connectivity enables remote monitoring at many different scales.
Widely different concerns arise at each of these levels and scales.
Plan of Attack
Motivation/Applications/Examples
Characteristics of HiFi Systems
Foundational Components
– TelegraphCQ
– TinyDB
Research Issues
Conclusions
RFID - Retail Scenario
“Smart Shelves” continuously monitor item addition and removal.
Info is sent back through the supply chain.
(Diagram: "Extranet" information flow. Retailers A and B send shelf data through an Aggregation/Distribution Service to Manufacturers C and D.)
M2M - Telemetry/Remote Monitoring
Energy Monitoring - Demand Response
(Images: traffic monitoring, power generation, remote equipment.)
Time-Shift Trend Prediction
National companies can exploit East Coast/West Coast time differentials to optimize West Coast operations.
Virtual Sensors
Sensors don’t have to be physical sensors.
– Network monitoring algorithms for detecting viruses, spam, DoS attacks, etc.
– Disease outbreak detection
Properties
High Fan-In, globally-distributed architecture.
Large data volumes generated at the edges.
– Filtering and cleaning must be done there.
Successive aggregation as you move inwards.
– Summaries/anomalies continually; details later.
Strong temporal focus.
Strong spatial/geographic focus.
Streaming data and stored data.
Integration within and across enterprises.
One View of the Design Space
(Diagram: a time-scale axis from seconds to years. Filtering/cleaning/alerts use on-the-fly processing; monitoring/time-series queries and data mining over recent history use combined stream/disk processing; archiving (provenance and schema evolution) uses disk-based processing.)
Another View of the Design Space
(Diagram: a geographic-scope axis from local to global. Filtering/cleaning/alerts run at individual readers; monitoring/time-series queries and data mining over recent history run at regional centers; archiving (provenance and schema evolution) runs at the central office.)
One More View of the Design Space
(Diagram: an axis from detailed to aggregate data. Filtering/cleaning/alerts handle the largest data volumes and do duplicate elimination over hours of history; monitoring/time-series processing keeps interesting events for days; trends/archive keep aggregates for years.)
TelegraphCQ: Monitoring Data Streams
Streaming data:
– Network monitors
– Sensor networks
– News feeds
– Stock tickers
B2B and enterprise apps:
– Supply-chain, CRM, RFID
– Trade reconciliation, order processing, etc.
(Quasi) real-time flow of events and data.
Must manage these flows to drive business (and other) processes.
Can mine flows to create/adjust business rules or to perform on-line analysis.
TelegraphCQ (Continuous Queries)
An adaptive system for large-scale shared dataflow processing.
Based on an extensible set of operators:
1) Ingress (data access) operators: wrappers, file readers, sensor proxies
2) Non-blocking data processing operators: selections (filters), XJoins, …
3) Adaptive routing operators: Eddies, STeMs, FluX, etc.
Operators are connected through “Fjords”:
– a queue-based framework unifying push and pull
– Fjords will also let us easily mix and match streaming and stored data sources, as the sketch below illustrates.
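To make the Fjord idea concrete, here is a minimal sketch (assumed names and structure, not TelegraphCQ's actual C implementation): operators communicate only through queues, so a push-based stream source and a pull-based stored source can feed the same non-blocking operator.

    from collections import deque

    class Queue:
        """Fjord-style connector: producers push, consumers pull without blocking."""
        def __init__(self):
            self.items = deque()
        def push(self, tup):          # push end: a stream source enqueues asynchronously
            self.items.append(tup)
        def pull(self):               # pull end: consumer asks; None means "no data yet"
            return self.items.popleft() if self.items else None

    class Select:
        """Non-blocking selection operator reading from an input queue."""
        def __init__(self, pred, inp, out):
            self.pred, self.inp, self.out = pred, inp, out
        def run_once(self):
            tup = self.inp.pull()
            if tup is not None and self.pred(tup):
                self.out.push(tup)

    # Wire a pushed stream and a drained table scan into one dataflow.
    stream_q, scan_q, out_q = Queue(), Queue(), Queue()
    stream_q.push({"src": "sensor", "mag": 7})       # push-based arrival
    for row in [{"src": "disk", "mag": 3}, {"src": "disk", "mag": 9}]:
        scan_q.push(row)                             # pull-based source feeding its queue
    sel1 = Select(lambda t: t["mag"] > 5, stream_q, out_q)
    sel2 = Select(lambda t: t["mag"] > 5, scan_q, out_q)
    for _ in range(3):
        sel1.run_once(); sel2.run_once()
    print(list(out_q.items))   # tuples passing the filter, from both sources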
Extreme Adaptivity
Traditional query optimization depends on statistical knowledge of the data and a stable environment. The streaming world has neither.
(Diagram: a spectrum of adaptivity, from static plans (current DBMSs), through late binding (dynamic, parametric, competitive, …), inter-operator adaptivity (query scrambling, mid-query re-optimization), and intra-operator adaptivity (XJoin, DPHJ, convergent QP), to per-tuple adaptivity (Eddies, CACQ, PSoup) and beyond. The per-tuple region is the one we are exploring in the Telegraph project.)
Adaptivity Overview [Avnur & Hellerstein 2000]
• How do we order and reorder operators over time?
– Traditionally: use performance and economic/administrative feedback.
– That won’t work for never-ending queries over volatile streams.
• Instead, use adaptive record routing: reoptimization = a change in routing policy.
(Diagram: a static dataflow pipes every tuple through operators A, B, C, D in a fixed order; an eddy routes each tuple among A, B, C, and D individually.)
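A toy sketch of eddy-style routing (illustrative, with assumed predicates; the real Eddies/CACQ machinery is richer): each tuple is routed through the remaining operators one at a time, and a lottery-style ticket policy steers tuples toward operators that have been dropping the most tuples.

    import random

    # Toy eddy: route each tuple through filters A, B in an adaptive order.
    # Tickets (a policy from the Eddies paper) reward selective operators
    # so they see tuples earlier; routing is per tuple, not per plan.
    ops = {
        "A": lambda t: t["temp"] > 20,
        "B": lambda t: t["light"] > 400,
    }
    tickets = {name: 1 for name in ops}

    def eddy(tup):
        remaining = set(ops)
        while remaining:
            # Lottery scheduling: pick an operator weighted by its tickets.
            names = list(remaining)
            name = random.choices(names, weights=[tickets[n] for n in names])[0]
            if not ops[name](tup):
                tickets[name] += 1     # operator dropped the tuple: reward it
                return None            # tuple eliminated early
            remaining.remove(name)
        return tup                     # passed every operator

    stream = [{"temp": 25, "light": 500}, {"temp": 10, "light": 450},
              {"temp": 30, "light": 100}]
    print([t for t in stream if eddy(t)])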
The TelegraphCQ Architecture
(Diagram: the TelegraphCQ Front End (Listener, Parser, Planner, Mini-Executor, Catalog) places plans on a shared-memory Query Plan Queue; TelegraphCQ Back Ends run Scans, operator Modules, and a CQEddy with Split operators over a Shared Memory Buffer Pool backed by disk; an Eddy Control Queue and Query Result Queues carry control messages and answers; Wrappers and sensor Proxies feed data in through a Wrapper ClearingHouse. A single CQEddy can encode multiple queries.)
The StreaQuel Query Language
SELECT projection_list
FROM from_list
WHERE selection_and_join_predicates
ORDEREDBY …
TRANSFORM … TO …
WINDOW … BY …

Target language for TelegraphCQ.
Windows can be applied to individual streams.
Window movement is expressed using a “for loop” construct in the “transform” clause.
We’re not completely happy with our syntax at this point.
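The window semantics can be sketched outside the language (illustrative Python, not StreaQuel syntax): for the landmark window of the next slide, the window's left edge stays pinned while its right edge advances with NOW.

    from dataclasses import dataclass

    @dataclass
    class Tuple:
        ts: int        # timestamp
        val: float

    stream = [Tuple(t, float(t % 7)) for t in range(0, 60, 5)]

    # Landmark window: left edge fixed at the landmark, right edge tracks NOW,
    # so each evaluation sees every tuple since the landmark.
    LANDMARK = 0
    for now in (40, 45, 50):
        window = [x for x in stream if LANDMARK <= x.ts <= now]
        print(f"NOW={now}: max={max(x.val for x in window)}")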
Example Window Query: Landmark
(Diagram: a timeline from 0 to 60 in steps of 5. The window's left edge is anchored at a landmark; as NOW advances from t = 40 to 41, 45, and 50, the window grows to cover everything from the landmark up to NOW.)
Current Status - TelegraphCQ
System developed by modifying PostgreSQL.
Initial version released Aug '03:
– Open source (PostgreSQL license)
– Shared joins with windows and aggregates
– Archived/unarchived streams
– Next major release planned for this summer
Initial users include:
– Network monitoring project at LBL (NetLogger)
– Intrusion detection project at Eurecom (France)
– Our own project on sensor data processing
– Class projects at Berkeley, CMU, and ???
Visit http://telegraph.cs.berkeley.edu for more information.
TinyDB
Query-based interface to sensor networks.
Developed on TinyOS/Motes.
Benefits:
– Ease of programming and retasking
– Extensible aggregation framework
– Power-sensitive optimization and adaptivity
Sam Madden (Ph.D. thesis) in collaboration with Wei Hong (Intel).
http://telegraph.cs.berkeley.edu/tinydb
(Diagram: an App sends queries and triggers to TinyDB, which disseminates them into the sensor network and streams data back. Example query: SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms)
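The app-to-network interaction in the diagram can be sketched as follows (a hypothetical host-side client; the names and the in-process "network" are invented for illustration and are not TinyDB's actual API):

    import random, time

    def tinydb_like(query: str, thresh: float, epochs: int):
        """Stands in for the sensor network: one result per sample period."""
        for epoch in range(epochs):
            time.sleep(0.064)                                    # SAMPLE PERIOD 64ms
            mag = max(random.uniform(0, 10) for _ in range(5))   # MAX(mag) over motes
            if mag > thresh:                                     # WHERE mag > thresh
                yield {"epoch": epoch, "max_mag": mag}

    # The app ships the declarative query in and consumes the result stream.
    for row in tinydb_like("SELECT MAX(mag) FROM sensors WHERE mag > thresh "
                           "SAMPLE PERIOD 64ms", thresh=5.0, epochs=3):
        print(row)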
Declarative Queries in Sensor Nets
“Report the light intensities of the bright nests.”

SELECT nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensor readings:
Epoch  nestNo  Light  Temp  Accel  Sound
0      1       455    x     x      x
0      2       389    x     x      x
1      1       422    x     x      x
1      2       405    x     x      x

Result (rows with light > 400):
Epoch  nestNo  Light
0      1       455
1      1       422
1      2       405

Many sensor network applications can be described using query language primitives.
– Potential for tremendous reductions in development and debugging effort.
Aggregation Query Example
“Count the number of occupied nests in each loud region of the island.”

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Result (only regions with AVG(sound) > 200 are reported):
Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520
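A sketch of what this query computes in one epoch (plain Python, with made-up readings chosen to reproduce epoch 0 of the table above; in TinyDB the same computation is pushed into the network, as the following slides show):

    from collections import defaultdict

    # One epoch of readings: (region, occupied, sound) per sensor.
    readings = [("North", 1, 360), ("North", 1, 370), ("North", 1, 350),
                ("South", 1, 520), ("South", 1, 510), ("South", 1, 530)]

    groups = defaultdict(list)
    for region, occupied, sound in readings:      # GROUP BY region
        groups[region].append((occupied, sound))

    for region, rows in groups.items():
        cnt = sum(occ for occ, _ in rows)                 # CNT(occupied)
        avg_sound = sum(s for _, s in rows) / len(rows)   # AVG(sound)
        if avg_sound > 200:                               # HAVING AVG(sound) > 200
            print(region, cnt, avg_sound)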
Query Language (TinySQL)
SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]

Sensor Queries @ 10000 Ft
(Diagram: a query is disseminated down a routing tree of motes A–F; results flow back up, accumulating at each hop: {D,E,F}, then {B,D,E,F}, then {A,B,C,D,E,F}.)
Written in SQL, with extensions for:
• Sample rate
• Offline delivery
• Temporal aggregation
(Almost) all queries are continuous and periodic.
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
(Diagram sequence: five motes in a routing tree rooted at sensor 1, alongside a table of transmissions by sensor # and interval #. Each epoch is divided into intervals, and motes transmit in interval order by tree depth: in interval 4 a leaf (sensor 4) reports a count of 1; in interval 3, sensor 3 merges that with its own reading and reports 2; in interval 2, partial counts of 1 and 3 flow upward; in interval 1 the root combines them with its own reading and reports the total, 5. In interval 4 of the next epoch the pipeline repeats.)
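The mechanism in the diagram can be sketched in a few lines (illustrative; the topology below is an assumption chosen to match the five-mote example):

    # Sketch of in-network COUNT aggregation over a routing tree.
    children = {1: [2, 5], 2: [3], 3: [4], 4: [], 5: []}   # assumed topology

    def count_subtree(node: int) -> int:
        """Each mote merges its children's partial counts with its own reading,
        so only one small partial aggregate crosses each radio link."""
        partial = 1                              # this mote's own contribution
        for child in children[node]:
            partial += count_subtree(child)      # child's partial count: one message
        return partial

    print(count_subtree(1))   # root reports the total: 5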
In-Network Aggregation: Example Benefits
Simulation: 2500 nodes on a 50x50 grid, tree depth ~10, ~20 neighbors per node.
(Chart: total bytes transmitted vs. aggregation function (EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN), y-axis from 0 to 100,000 bytes. Aggregating in the network transmits far fewer bytes than shipping all readings out externally.)
Taxonomy of Aggregates
TinyDB insight: classify aggregates according to various functional properties.
– Yields a general set of optimizations that can be applied automatically.

Property                Examples                                      Affects
Partial state           MEDIAN: unbounded; MAX: 1 record              Effectiveness of TAG
Duplicate sensitivity   MIN: dup. insensitive; AVG: dup. sensitive    Routing redundancy
Exemplary vs. summary   MAX: exemplary; COUNT: summary                Applicability of sampling; effect of loss
Monotonicity            COUNT: monotonic; AVG: non-monotonic          Hypothesis testing; snooping
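The "partial state" row is the key to TAG. Here is an illustrative decomposition (assumed structure, not TinyDB's C interface) of an aggregate into init/merge/evaluate functions, showing why AVG ships tiny partial states while MEDIAN must ship every value:

    avg = {
        "init":     lambda v: (v, 1),                          # (sum, count): bounded state
        "merge":    lambda a, b: (a[0] + b[0], a[1] + b[1]),
        "evaluate": lambda s: s[0] / s[1],
    }
    median = {
        "init":     lambda v: [v],                             # must keep every value:
        "merge":    lambda a, b: a + b,                        # unbounded partial state
        "evaluate": lambda s: sorted(s)[len(s) // 2],
    }

    def in_network(agg, groups_of_readings):
        """Merge each group locally, then merge the partials at the parent."""
        partials = []
        for readings in groups_of_readings:
            state = agg["init"](readings[0])
            for v in readings[1:]:
                state = agg["merge"](state, agg["init"](v))
            partials.append(state)
        total = partials[0]
        for p in partials[1:]:
            total = agg["merge"](total, p)
        return agg["evaluate"](total)

    print(in_network(avg, [[10, 20], [30]]))      # 20.0, with tiny messages
    print(in_network(median, [[10, 20], [30]]))   # 20, with messages that grow with the data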
Current Status - TinyDB
System built on top of TinyOS (~10K lines of embedded C code).
Latest release 9/2003.
Several deployments, including redwoods at the UC Botanical Garden.
Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
(Diagram: mote placement in a redwood, by height: 36m; 33m: 111; 32m: 110; 30m: 109, 108, 107; 20m: 106, 105, 104; 10m: 103, 102, 101.)
(Charts from the redwood deployment, July 7-9, 2003: temperature vs. time (roughly 8-33 C) and relative humidity vs. time (roughly 35-95%) for motes 101, 104, 109, 110, and 111.)
Ursa - A HiFi Implementation
Current effort towards building an integrated infrastructure that spans large scales in:
– Time
– Geography
– Resources
Tiers: Ursa-Minor (TinyDB-based) at the edges, a mid-tier (???), and Ursa-Major (TelegraphCQ with archiving) at the center.
TelegraphCQ/TinyDB Integration
Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.
Main issues revolve around what to run where:
– TCQ is a query processor.
– TinyDB is also a query processor.
– Optimization criteria include total cost, response time, answer quality, answer likelihood, power conservation on motes, … (see the sketch below)
Project ongoing; should work by summer.
Related work: Gigascope work at AT&T.
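A toy sketch of the "what to run where" decision (purely illustrative; the cost model and numbers are invented, not TinyDB's or TCQ's optimizer): given a filter, choose whether to evaluate it on the motes, saving radio bytes at the price of mote CPU energy, or at the server in TCQ.

    def place_filter(selectivity: float, bytes_per_tuple: int,
                     mote_energy_per_eval: float, radio_energy_per_byte: float) -> str:
        # On the motes: pay CPU energy per evaluation, but only the tuples that
        # pass the predicate (the selectivity fraction) hit the radio.
        in_network = mote_energy_per_eval + selectivity * bytes_per_tuple * radio_energy_per_byte
        # At the server: every tuple crosses the radio first.
        at_server = bytes_per_tuple * radio_energy_per_byte
        return "run on motes (TinyDB)" if in_network < at_server else "run at server (TCQ)"

    # A selective predicate is worth pushing down; a non-selective one is not.
    print(place_filter(selectivity=0.1, bytes_per_tuple=20,
                       mote_energy_per_eval=1.0, radio_energy_per_byte=1.0))
    print(place_filter(selectivity=0.99, bytes_per_tuple=20,
                       mote_energy_per_eval=1.0, radio_energy_per_byte=1.0))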
TCQ-based Overlay Network
TCQ is primarily a single-node system.
– Flux operators [Shah et al 03] support cluster-based processing.
Want to run TCQ at each internal node.
Primary issue is support for wide-area temporal and geographic aggregation.
– In an adaptive manner, of course.
Currently under design.
Related work: Astrolabe, IRISNet, DBIS, …
Querying the Past, Present, and Future
Need to handle archived data:
– Adaptive compression can reduce processing time.
– Historical queries
– Joins of live and historical data
– Dealing with late-arriving detail info
Archiving Storage Manager: a split-stream SM for stream- and disk-based processing.
Initial version of the new SM is running; a sketch of the live/historical join idea follows.
Related work: temporal and time-travel DBs.
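A minimal sketch of joining a live stream with archived data (illustrative only, not the Archiving Storage Manager's interface): each arriving stream tuple probes the archive, here an in-memory dict standing in for disk-resident history.

    archive = {                       # historical averages, keyed by sensor id
        101: {"avg_temp": 18.2},
        104: {"avg_temp": 21.7},
    }

    def join_live_with_history(live_stream):
        """For each live reading, look up the sensor's archived profile and
        report how the reading deviates from history."""
        for tup in live_stream:
            hist = archive.get(tup["sensor"])
            if hist is None:
                continue                          # no history yet for this sensor
            delta = tup["temp"] - hist["avg_temp"]
            yield {**tup, "delta_vs_history": delta}

    live = [{"sensor": 101, "temp": 25.0}, {"sensor": 104, "temp": 20.0}]
    for out in join_live_with_history(live):
        print(out)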
XML, Integration, and Other Realities
Eventually need to support XML.
Must integrate with existing enterprise apps.
In many areas, standardization is well underway.
Augmenting moving data.
Related work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al., OGI], 30+ years of data integration research, 10+ years of XML research, …
(Closing diagram: High Fan-in meets High Fan-out.)