1
Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks
Samuel Madden, UC Berkeley
With Robert Szewczyk, Michael Franklin, and David Culler
WMCSA June 21, 2002
2
Motivation: Sensor Nets and In-Network Query Processing
Many Sensor Network Applications are Data Oriented
Queries: a Natural and Efficient Data Processing Mechanism
  – Easy (unlike embedded C code)
  – Enable optimizations through abstraction
Aggregates are the Common Case
  – E.g. Which rooms are in use?
In-network processing a must
  – Sensor networks are power and bandwidth constrained
  – Communication dominates power cost
  – Not subject to Moore’s law!
3
Overview
Background
  – Sensor Networks
Our Approach: Tiny Aggregation (TAG)
  – Overview
  – Expressiveness
  – Illustration
  – Optimizations
  – Grouping
Current Status & Future Work
5
Background: Sensor Networks
A collection of small, radio-equipped, battery-powered, networked microprocessors
  – Typically ad-hoc & multihop networks
  – Single devices unreliable
  – Very low power; tiny batteries must supply power for months
Apps: Environment Monitoring, Personal Nets, Object Tracking
Data processing plays a key role!
6
Berkeley Mica Motes & TinyOS
TinyOS operating system (services)
4 MHz processor
4K RAM, 512K EEPROM, 128K code space
Single-channel CSMA half-duplex radio @ 40 kbit/s
  – Lossy: 20% loss @ 5 ft in Ganesan et al.
  – Communication very expensive: 800 instrs/bit
8
The Tiny Aggregation (TAG) Approach
Push declarative queries into network
  – Impose a hierarchical routing tree onto the network
Divide time into epochs
Every epoch, sensors evaluate query over local sensor data and data from children
  – Aggregate local and child data
  – Each node transmits just once per epoch
  – Pipelined approach increases throughput
Depending on aggregate function, various optimizations can be applied
9
SQL Primer
SQL is an established declarative language; not wedded to it
  – Some extensions clearly necessary, e.g. for sample rates
We adopt a basic subset:
    SELECT {aggn(attrn), attrs} FROM sensors
    WHERE {selPreds}
    GROUP BY {attrs}
    HAVING {havingPreds}
    EPOCH DURATION s
‘sensors’ relation (table) has
  – One column for each reading-type, or attribute
  – One row for each externalized value
    – May represent an aggregation of several individual readings
Example:
    SELECT AVG(light) FROM sensors
    WHERE sound < 100
    GROUP BY roomNo
    HAVING AVG(light) < 50
10
Aggregation Functions
Standard SQL supports “the basic 5”:
  – MIN, MAX, SUM, AVERAGE, and COUNT
We support any function conforming to:
    Agg_n = {f_merge, f_init, f_evaluate}
    f_merge{<a1>, <a2>} → <a12>
    f_init{a0} → <a0>
    f_evaluate{<a1>} → aggregate value
  (<a1>, <a2>, <a12> are partial aggregates; merge must be associative and commutative!)
Example: Average
    AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
    AVG_init{v} → <v, 1>
    AVG_evaluate{<S1, C1>} → S1 / C1
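The AVERAGE decomposition above can be sketched in a few lines of Python. This is only an illustration of the {init, merge, evaluate} structure, not the TAG implementation; the function names and the sample readings are hypothetical.

```python
from functools import reduce

# Partial aggregate for AVERAGE: a (sum, count) pair.
def avg_init(v):
    """Turn one raw reading into a partial aggregate."""
    return (v, 1)

def avg_merge(a, b):
    """Combine two partial aggregates; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(a):
    """Turn a partial aggregate into the final value."""
    return a[0] / a[1]

readings = [10, 20, 60]  # hypothetical light readings
partial = reduce(avg_merge, (avg_init(v) for v in readings))
result = avg_evaluate(partial)
print(partial, result)  # (90, 3) 30.0
```

Because merge is associative and commutative, the partial aggregates can be combined in whatever order the routing tree happens to deliver them.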
11
Query Propagation
TAG is propagation agnostic
  – Any algorithm that can:
    – Deliver the query to all sensors
    – Provide all sensors with one or more duplicate-free routes to some root
Paper describes simple flooding approach
  – Query introduced at a root; rebroadcast by all sensors until it reaches leaves
  – Sensors pick parent and level when they hear query
  – Reselect parent after k silent epochs
[Figure: a query flooding outward from the root through a network of sensors 1 to 6; each sensor is annotated with its chosen parent (P) and level (L): P:0, L:1; P:1, L:2; P:1, L:2; P:3, L:3; P:2, L:3; P:4, L:4]
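A minimal sketch of the flooding step, assuming a breadth-first traversal stands in for radio rebroadcast (in a real network, message timing decides which neighbor a sensor hears first). The connectivity map and the function name are hypothetical; the root is given parent 0 and level 1, following the P:0, L:1 labeling used on the slide.

```python
from collections import deque

def flood(neighbors, root):
    """Return {sensor: (parent, level)} for every sensor reachable from root."""
    assigned = {root: (0, 1)}          # root: no parent (0), level 1
    queue = deque([root])
    while queue:
        node = queue.popleft()
        level = assigned[node][1]
        for n in neighbors[node]:
            if n not in assigned:      # first time n hears the query
                assigned[n] = (node, level + 1)
                queue.append(n)        # n rebroadcasts in turn
    return assigned

# Hypothetical radio connectivity (symmetric links).
neighbors = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
tree = flood(neighbors, root=1)
for sensor, (parent, level) in sorted(tree.items()):
    print(f"sensor {sensor}: P:{parent}, L:{level}")
```

Sensor 4 hears the query from both 2 and 3; in this sketch it simply keeps the first one it hears. Parent reselection after k silent epochs is not modeled.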
12
Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: routing tree over sensors 1 to 5; depth = d]
13
Illustration: Pipelined Aggregation
Epoch 1: SELECT COUNT(*) FROM sensors

             Sensor #
Epoch #    1   2   3   4   5
   1       1   1   1   1   1

[Figure: routing tree over sensors 1 to 5, each node annotated with its COUNT for this epoch]
14
Illustration: Pipelined Aggregation
Epoch 2: SELECT COUNT(*) FROM sensors

             Sensor #
Epoch #    1   2   3   4   5
   1       1   1   1   1   1
   2       3   1   2   2   1

[Figure: routing tree over sensors 1 to 5, each node annotated with its COUNT for this epoch]
15
Illustration: Pipelined Aggregation
Epoch 3: SELECT COUNT(*) FROM sensors

             Sensor #
Epoch #    1   2   3   4   5
   1       1   1   1   1   1
   2       3   1   2   2   1
   3       4   1   3   2   1

[Figure: routing tree over sensors 1 to 5, each node annotated with its COUNT for this epoch]
16
Illustration: Pipelined Aggregation
Epoch 4: SELECT COUNT(*) FROM sensors

             Sensor #
Epoch #    1   2   3   4   5
   1       1   1   1   1   1
   2       3   1   2   2   1
   3       4   1   3   2   1
   4       5   1   3   2   1

[Figure: routing tree over sensors 1 to 5, each node annotated with its COUNT for this epoch]
17
Illustration: Pipelined Aggregation
Epoch 5: SELECT COUNT(*) FROM sensors

             Sensor #
Epoch #    1   2   3   4   5
   1       1   1   1   1   1
   2       3   1   2   2   1
   3       4   1   3   2   1
   4       5   1   3   2   1
   5       5   1   3   2   1

[Figure: routing tree over sensors 1 to 5, each node annotated with its COUNT for this epoch]
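The five-epoch COUNT table can be reproduced with a short simulation. This is a sketch under one assumption: the tree topology (1 is the root, 2 and 3 are its children, 4 is the child of 3, 5 is the child of 4) is inferred from the table values rather than stated on the slide. In each epoch every sensor reports 1 for itself plus whatever its children reported in the previous epoch.

```python
# Children of each sensor; topology inferred from the table (an assumption).
children = {1: [2, 3], 2: [], 3: [4], 4: [5], 5: []}

def run_pipelined_count(children, epochs):
    """Return one row per epoch: the COUNT reported by sensors 1..n."""
    prev = {n: 0 for n in children}    # nothing heard before epoch 1
    rows = []
    for _ in range(epochs):
        # Own reading (1) plus the children's values from the previous epoch.
        cur = {n: 1 + sum(prev[c] for c in kids)
               for n, kids in children.items()}
        rows.append([cur[n] for n in sorted(children)])
        prev = cur
    return rows

rows = run_pipelined_count(children, 5)
for epoch, row in enumerate(rows, start=1):
    print(epoch, row)
# 1 [1, 1, 1, 1, 1]
# 2 [3, 1, 2, 2, 1]
# 3 [4, 1, 3, 2, 1]
# 4 [5, 1, 3, 2, 1]
# 5 [5, 1, 3, 2, 1]
```

The root's value reaches the full count of 5 in epoch 4 and stays there, which is the steady state of the pipeline.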
18
Discussion
Result is a stream of values
  – Ideal for monitoring scenarios
One communication / node / epoch
  – Symmetric power consumption, even at root
New value on every epoch
  – After d-1 epochs, complete aggregation
Given a single loss, network will recover after at most d-1 epochs
With time synchronization, nodes can sleep between epochs, except during small communication window
Note: Values from different epochs combined
  – Can be fixed via small cache of past values at each node
  – Cache size at most one reading per child × depth of tree
[Figure: routing tree over sensors 1 to 5]
19
Simulation Result
[Figure: bar chart, “Total Bytes Xmitted vs. Aggregation Function”. X axis: aggregation function (EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN); Y axis: total bytes transmitted, 0 to 100,000]
Simulation setup: 2500 nodes, 50x50 grid, depth = ~10, neighbors = ~20
Some aggregates require dramatically more state!
20
Optimization: Channel Sharing
Insight: Shared channel enables optimizations
Suppress messages that won’t affect aggregate
  – E.g., in a MAX query, sensor with value v hears a neighbor with value ≥ v, so it doesn’t report
  – Applies to all such exemplary aggregates
A sensor can learn about query advertisements it missed
  – If a sensor shows up in a new environment, it can learn about queries by looking at neighbors’ messages
    – Root doesn’t have to explicitly rebroadcast query!
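The MAX suppression rule can be written as a one-line predicate. A minimal sketch; the function name and the sample values are hypothetical, and how a sensor collects overheard values is left out.

```python
def should_report_max(own_value, overheard_values):
    """Report only if no overheard neighbor value is >= our own value."""
    return all(v < own_value for v in overheard_values)

print(should_report_max(40, [35, 42]))  # False: a neighbor already reported 42
print(should_report_max(40, [35, 12]))  # True: 40 may still be the maximum
print(should_report_max(40, []))        # True: nothing overheard yet
```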
21
Optimization: Hypothesis Testing
Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value.
  – E.g. Tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate.
  – Works for any linear aggregate function
How is hypothesis computed?
  – Blind guess
  – Statistically informed guess
  – Observation over first few levels of tree / rounds of aggregate
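The MIN example can be sketched as a filter on which nodes take part. The readings and the function name are hypothetical; the hypothesis of 50 is the one used on the slide.

```python
def participants_for_min(readings, hypothesis):
    """Keep only nodes whose value could still be the MIN, given MIN < hypothesis."""
    return {node: v for node, v in readings.items() if v < hypothesis}

readings = {1: 72, 2: 48, 3: 55, 4: 31, 5: 90}  # hypothetical sensor values
active = participants_for_min(readings, hypothesis=50)
print(active)                # {2: 48, 4: 31}
print(min(active.values()))  # 31, the same as min over all readings
```

Only two of the five nodes transmit, yet the answer is unchanged. If the guess is wrong and no node qualifies, this sketch returns an empty dict; how the root should react in that case is not covered here.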
22
Optimization: Use Multiple Parents
For duplicate-insensitive (e.g. MAX) or partitionable (e.g. COUNT) aggregates:
  – Send (part of) aggregate to all parents
  – Decreases variance
    – Dramatically, when there are lots of parents
No extra cost, since all messages broadcast
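The variance claim can be made concrete for COUNT with a worked calculation. This sketch rests on assumptions not stated on the slide: the count is split evenly among the parents, and each share is lost independently with the same probability. Under that model the expected delivered count is the same for any number of parents, but the variance shrinks in proportion to the number of parents.

```python
def delivered_variance(count, num_parents, loss_prob):
    """Variance of the count that arrives when it is split evenly across
    num_parents, each share being lost independently with loss_prob."""
    share = count / num_parents
    # Each share is share * Bernoulli(1 - loss_prob); variances add.
    return num_parents * share ** 2 * loss_prob * (1 - loss_prob)

one_parent = delivered_variance(12, 1, 0.2)     # about 23.04
three_parents = delivered_variance(12, 3, 0.2)  # about 7.68
print(one_parent, three_parents)
```

With one parent a single loss drops the whole subtree's count; with three parents a single loss drops only a third of it.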
23
Grouping
Value-based, complete partitioning of records
If query is grouped, sensors apply predicate to local readings on each epoch
Aggregate records tagged with group
When a child record (with group) is received:
  – If it belongs to a stored group, merge with existing record for that group
  – If not, just store it
At the end of each epoch, transmit one record per group
Number of groups may exceed available storage
  – Can evict groups for aggregation at root!
25
Status & Future Work
Status
  – Simple simulator
    – Complete set of experiments, including behavior of algorithms in the face of loss
  – Generalization of algorithms beyond complete pipelining
  – Taxonomy of aggregates to allow optimizations on functional properties
  – Basic implementation (shown in demo)
Future work
  – Expressiveness issues
    – Aggregates over temporal data
    – Nested queries, e.g. MAX(AVG(1000 readings) @ each node)
  – Correctness issues in the face of loss
    – How does the user know which nodes are and are not included in an aggregate?
26
Summary
Declarative queries for aggregates
  – Straightforward, familiar interface
  – Enables optimizations
    – Snooping techniques for exemplary aggregates
    – Multiple parents for partitionable aggregates
Pipelined, epoch-based algorithm
  – Streaming results
  – Symmetric communication
  – Low-power friendly
27
Questions?
28
Grouping
GROUP BY expr
  – expr is an expression over one or more attributes
    – Evaluation of expr yields a group number
    – Each reading is a member of exactly one group
Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)

Sensor ID   Light   Temp   Group
1           45      25     2
2           27      28     2
3           66      34     3
4           68      37     3

Result:

Group   max(light)
2       45
3       68
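The GROUP BY example can be checked with a few lines of Python that apply TRUNC(temp/10) to each reading and keep the maximum light value per group. The readings are the four rows from the slide; the function name is hypothetical.

```python
import math

# (sensor_id, light, temp) rows from the example.
readings = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]

def group_max_light(readings):
    """Return {group: max(light)} with group = TRUNC(temp / 10)."""
    result = {}
    for _sensor_id, light, temp in readings:
        group = math.trunc(temp / 10)
        result[group] = max(light, result.get(group, light))
    return result

print(group_max_light(readings))  # {2: 45, 3: 68}
```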
29
Having
HAVING preds
  – preds filters out groups that do not satisfy predicate
  – versus WHERE, which filters out tuples that do not satisfy predicate
  – Example:
      SELECT max(temp) FROM sensors
      GROUP BY light
      HAVING max(temp) < 100
    Yields all groups whose maximum temperature is under 100
30
Group Eviction
Problem: Number of groups in any one iteration may exceed available storage on sensor
Solution: Evict!
  – Choose one or more groups to forward up tree
  – Rely on nodes further up tree, or root, to recombine groups properly
  – What policy to choose?
    – Intuitively: least popular group, since we don’t want to evict a group that will receive more values this epoch.
    – Experiments suggest:
      – Policy matters very little
      – Evicting as many groups as will fit into a single message is good
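A minimal sketch of eviction with the "least popular group" policy, using MAX as the aggregate. The storage layout, the `capacity` parameter, the function name, and the sample records are all hypothetical; evicting one group at a time is a simplification of the batching suggested by the experiments.

```python
def add_record(store, counts, group, value, capacity, merge=max):
    """Merge (group, value) into store; return the list of evicted records."""
    evicted = []
    if group in store:
        store[group] = merge(store[group], value)
        counts[group] += 1
        return evicted
    if len(store) >= capacity:
        # Evict the group that has received the fewest values so far.
        victim = min(counts, key=counts.get)
        evicted.append((victim, store.pop(victim)))
        del counts[victim]
    store[group] = value
    counts[group] = 1
    return evicted

store, counts, forwarded = {}, {}, []
for group, value in [(2, 45), (3, 66), (2, 27), (4, 50), (3, 68)]:
    forwarded += add_record(store, counts, group, value, capacity=2)

print(store)      # {2: 45, 3: 68}
print(forwarded)  # [(3, 66), (4, 50)]
```

Group 3 appears both in the forwarded list and in the local store; a node further up the tree, or the root, has to merge the two records to get the correct maximum of 68.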
31
Simulation Environment
Java-based simulation & visualization for validating algorithms, collecting data.
Coarse-grained, event-based simulation
  – Sensors arranged on a grid, radio connectivity by Euclidean distance
  – Communication model
    – Lossless: All neighbors hear all messages
    – Lossy: Messages lost with probability that increases with distance
    – Symmetric links
    – No collisions, hidden terminals, etc.
32
Simulation Screenshot
33
Experimental Results
Experiments with simulator
  – Performance of basic TAG
  – Benefits of hypothesis testing
  – Effect of loss
Most experiments in terms of bytes or messages sent, since message transmission is the dominant cost
  – Depends on radio being turned off between epochs and aggregation functions being cheap
34
Experiment: Basic TAG
Dense Packing, Ideal Communication
[Figure: “Bytes / Epoch vs. Network Diameter”. X axis: network diameter (10 to 50); Y axis: avg. bytes / epoch (0 to 100,000); series: COUNT, MAX, AVERAGE, MEDIAN, EXTERNAL, DISTINCT]
35
Experiment: Hypothesis Testing
Uniform Value Distribution, Dense Packing, Ideal Communication
[Figure: “Messages / Epoch vs. Network Diameter”. X axis: network diameter (10 to 50); Y axis: messages / epoch (0 to 3000); series: No Guess, Guess = 50, Guess = 90, Snooping]
36
Experiment: Effects of Loss
[Figure: “Percent Error From Single Loss vs. Network Diameter”. X axis: network diameter (10 to 50); Y axis: percent error from single loss (0 to 3.5); series: AVERAGE, COUNT, MAX, MEDIAN]
37
Experiment: Benefit of Cache
[Figure: “Percentage of Network Involved vs. Network Diameter”. X axis: network diameter (10 to 50); Y axis: % network (0 to 1.2); series: No Cache, 5 Rounds Cache, 9 Rounds Cache, 15 Rounds Cache]
38
Pipelined Aggregates
After query propagates, during each epoch:
  – Each sensor samples local sensors once
  – Combines them with PSRs (partial state records) from children
  – Outputs PSR representing aggregate state in the previous epoch
After (d-1) epochs, PSR for the whole tree output at root
  – d = depth of the routing tree
  – If desired, partial state from top k levels could be output in kth epoch
To avoid combining PSRs from different epochs, sensors must cache values from children
[Figure: routing tree over sensors 1 to 5. Value from 5 produced at time t arrives at 1 at time (t+3); value from 2 produced at time t arrives at 1 at time (t+1)]
39
Pipelining Example
[Figure: routing tree over sensors 1 to 5, with three empty cache tables, each with columns SID, Epoch, Agg.]
40
Pipelining Example
Epoch 0
[Figure: routing tree over sensors 1 to 5; messages shown: <5,0,1>, <4,0,1>]

SID  Epoch  Agg.
2    0      1
4    0      1

SID  Epoch  Agg.
1    0      1

SID  Epoch  Agg.
3    0      1
5    0      1
41
Pipelining Example
Epoch 1
[Figure: routing tree over sensors 1 to 5; messages shown: <5,1,1>, <4,1,1>, <3,0,2>, <2,0,2>]

SID  Epoch  Agg.
2    0      1
4    0      1
2    1      1
4    1      1
3    0      2

SID  Epoch  Agg.
1    0      1
1    1      1
2    0      2

SID  Epoch  Agg.
3    0      1
5    0      1
3    1      1
5    1      1
42
Pipelining Example
Epoch 2
[Figure: routing tree over sensors 1 to 5; messages shown: <5,2,1>, <4,2,1>, <3,1,2>, <2,0,4>, <1,0,3>]

SID  Epoch  Agg.
2    0      1
4    0      1
2    1      1
4    1      1
3    0      2
2    2      1
4    2      1
3    1      2

SID  Epoch  Agg.
1    0      1
1    1      1
2    0      2
1    2      1
2    0      4

SID  Epoch  Agg.
3    0      1
5    0      1
3    1      1
5    1      1
3    2      1
5    2      1
43
Pipelining Example
Epoch 3
[Figure: routing tree over sensors 1 to 5; messages shown: <5,3,1>, <4,3,1>, <3,2,2>, <2,1,4>, <1,0,5>]

SID  Epoch  Agg.
2    0      1
4    0      1
2    1      1
4    1      1
3    0      2
2    2      1
4    2      1
3    1      2

SID  Epoch  Agg.
1    0      1
1    1      1
2    0      2
1    2      1
2    0      4

SID  Epoch  Agg.
3    0      1
5    0      1
3    1      1
5    1      1
3    2      1
5    2      1
44
Pipelining Example
Epoch 4
[Figure: routing tree over sensors 1 to 5; messages shown: <5,4,1>, <4,4,1>, <3,3,2>, <2,2,4>, <1,1,5>]
45
Optimization: Delta Compression
If a sensor’s reading is unchanged from previous epoch, it need not transmit.
  – Parents assume value is unchanged
  – Leverage child value cache
  – Periodic heartbeats to handle disconnection
Extension: if a sensor’s reading has changed by less than some threshold, it need not transmit
  – Similar to hypothesis testing with AVERAGE
  – Really future work: See C. Olston, “Best-Effort Cache Synchronization”, SIGMOD 2002.
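The transmit decision for delta compression, including the threshold extension and the heartbeat, can be sketched as a single predicate. The parameter names and the heartbeat interval of 10 epochs are hypothetical choices for illustration.

```python
def should_transmit(current, last_sent, epochs_since_sent,
                    threshold=0.0, heartbeat_every=10):
    """Decide whether a sensor needs to send its reading this epoch."""
    if last_sent is None:
        return True                       # nothing sent yet
    if epochs_since_sent >= heartbeat_every:
        return True                       # heartbeat so the parent knows we are alive
    # Send only if the reading moved by more than the threshold.
    return abs(current - last_sent) > threshold

print(should_transmit(50, None, 0))               # True: first reading
print(should_transmit(50, 50, 3))                 # False: unchanged
print(should_transmit(52, 50, 3, threshold=5))    # False: change within threshold
print(should_transmit(57, 50, 3, threshold=5))    # True: change exceeds threshold
print(should_transmit(50, 50, 10))                # True: heartbeat due
```

With threshold left at 0 this reduces to the basic rule: transmit only when the reading differs from the last value sent.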
46
Taxonomy of Aggregates
TAG insight: classifying aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied

Property                Examples                                      Affects
Partial State           MEDIAN: unbounded; MAX: 1 record              Effectiveness of TAG
Duplicate Sensitivity   MIN: dup. insensitive; AVG: dup. sensitive    Routing Redundancy
Exemplary vs. Summary   MAX: exemplary; COUNT: summary                Applicability of Sampling, Effect of Loss
Monotonic               COUNT: monotonic; AVG: non-monotonic          Hypothesis Testing, Snooping
47