TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks


1

TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks

Samuel Madden, UC Berkeley
with Michael Franklin, Joseph Hellerstein, and Wei Hong

December 9th, 2002 @ OSDI

2

TAG Introduction

• What is a sensor network?
• Programming Sensor Networks Is Hard
• Declarative Queries Are Easy
  – Tiny Aggregation (TAG): In-network processing via declarative queries!
• Example:
  » Vehicle tracking application: 2 weeks for 2 students
  » Vehicle tracking query: took 2 minutes to write, worked just as well!

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
EPOCH DURATION 64ms

3

Overview

• Sensor Networks
• Queries in Sensor Nets
• Tiny Aggregation
  – Overview
  – Simulation & Results


5

Device Capabilities

• “Mica Motes”
  – 8-bit, 4 MHz processor
    » Roughly a PC AT
  – 40 kbit CSMA radio
  – 4KB RAM, 128K flash, 512K EEPROM
  – TinyOS based
• Variety of other, similar platforms exist
  – UCLA WINS, Medusa, Princeton ZebraNet, MIT Cricket

6

Sensor Net Sample Apps

Habitat monitoring: storm petrels on Great Duck Island, microclimates on the James Reserve.

Traditional monitoring apparatus.

Earthquake monitoring in shake-test sites.

Vehicle detection: sensors along a road, collect data about passing vehicles.

7

Metric: Communication

• Lifetime from one pair of AA batteries
  – 2-3 days at full power
  – 6 months at 2% duty cycle
• Communication dominates cost
  – 100s of µs to compute
  – 30 ms to send a message
• Our metric: communication!

[Chart: Time vs. Current Draw During Query Processing; current (mA, 0-20) over time (0-3 s), annotated with Snoozing, Processing, Processing and Listening, and Transmitting phases]

8

Communication In Sensor Nets

• Radio communication has high link-level losses
  – typically about 20% @ 5m
• Ad-hoc neighbor discovery
• Tree-based routing

[Diagram: routing tree over nodes A-F]

9

Overview

• Sensor Networks
• Queries in Sensor Nets
• Tiny Aggregation
  – Overview
  – Optimizations & Results

10

Declarative Queries for Sensor Networks

• Examples:

SELECT nodeid, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Epoch | Nodeid | Light | Temp | Accel | Sound
0     | 1      | 455   | x    | x     | x
0     | 2      | 389   | x    | x     | x
1     | 1      | 422   | x    | x     | x
1     | 2      | 405   | x    | x     | x

SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

SELECT roomNo, AVG(sound)
FROM sensors
GROUP BY roomNo
HAVING AVG(sound) > 200
EPOCH DURATION 10s

(Rooms w/ sound > 200)


12

TAG

• In-network processing of aggregates
  – Common data analysis operation
    » Aka gather operation or reduction in parallel programming
  – Communication reducing
    » Benefit is operation dependent
  – Across nodes during same epoch
• Exploit semantics to improve efficiency!

13

Query Propagation

SELECT COUNT(*)…

[Diagram: query propagated down the routing tree of nodes 1-5]

14

Pipelined Aggregates

• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    » local readings
    » readings from children
  – Outputs PSR from previous epoch
• After (depth-1) epochs, PSR for the whole tree output at root
• To avoid combining PSRs from different epochs, sensors must cache values from children

[Diagram: tree of nodes 1-5. Value from node 5 produced at time t arrives at node 1 at time (t+3); value from node 2 produced at time t arrives at node 1 at time (t+1)]
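The per-epoch loop above can be sketched as a small simulation. This is an illustrative sketch, not the TinyOS implementation; the tree shape (5 under 4 under 2 under 1, with 3 also under 1) is an assumption inferred from the slide's arrival-time annotations.

```python
from collections import defaultdict

# Assumed tree: 1 is the root, 2 and 3 are its children,
# 4 is a child of 2, and 5 is a child of 4.
parent = {2: 1, 3: 1, 4: 2, 5: 4}
nodes = [1, 2, 3, 4, 5]

def run(n_epochs):
    ready = {n: None for n in nodes}  # PSR finished last epoch, sent now
    root_counts = []
    for _ in range(n_epochs):
        # 1. every node transmits the PSR it finished in the previous epoch
        inbox = defaultdict(int)
        for n, psr in ready.items():
            if psr is not None and n != 1:
                inbox[parent[n]] += psr  # COUNT's merge is addition
        if ready[1] is not None:
            root_counts.append(ready[1])  # the root outputs its finished PSR
        # 2. every node samples once and folds in child PSRs heard this epoch
        for n in nodes:
            ready[n] = 1 + inbox[n]  # a local reading contributes a count of 1
    return root_counts

print(run(6))  # [1, 3, 4, 5, 5]: the count converges after depth-1 epochs
```

The root's output sequence 1, 3, 4, 5, 5 matches the per-epoch illustration on the following slides.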

15

Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors

[Diagram: routing tree of nodes 1-5, depth = d]

16

Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors (Epoch 1)

Epoch # | Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5
1       | 1        | 1        | 1        | 1        | 1

17

Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors (Epoch 2)

Epoch # | Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5
1       | 1        | 1        | 1        | 1        | 1
2       | 3        | 1        | 2        | 2        | 1

18

Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors (Epoch 3)

Epoch # | Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5
1       | 1        | 1        | 1        | 1        | 1
2       | 3        | 1        | 2        | 2        | 1
3       | 4        | 1        | 3        | 2        | 1

19

Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors (Epoch 4)

Epoch # | Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5
1       | 1        | 1        | 1        | 1        | 1
2       | 3        | 1        | 2        | 2        | 1
3       | 4        | 1        | 3        | 2        | 1
4       | 5        | 1        | 3        | 2        | 1

20

Illustration: Pipelined Aggregation

SELECT COUNT(*) FROM sensors (Epoch 5)

Epoch # | Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5
1       | 1        | 1        | 1        | 1        | 1
2       | 3        | 1        | 2        | 2        | 1
3       | 4        | 1        | 3        | 2        | 1
4       | 5        | 1        | 3        | 2        | 1
5       | 5        | 1        | 3        | 2        | 1

21

Aggregation Framework

• As in extensible databases, we support any aggregation function conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}

  f_init(a0) → <a0>
  f_merge(<a1>, <a2>) → <a12>
  f_evaluate(<a1>) → aggregate value

  (Merge is associative and commutative!)

• Example: AVERAGE

  AVG_init(v) → <v, 1>
  AVG_merge(<S1, C1>, <S2, C2>) → <S1 + S2, C1 + C2>
  AVG_evaluate(<S, C>) → S/C

  Here <v, 1> is the Partial State Record (PSR)
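The AVERAGE example above can be written out directly; this is a minimal sketch of the {f_init, f_merge, f_evaluate} decomposition, with the PSR represented as a (sum, count) pair exactly as on the slide.

```python
def avg_init(v):
    return (v, 1)  # a single reading becomes the PSR <v, 1>

def avg_merge(a, b):
    # merge is associative and commutative, so PSRs may be
    # combined in any order as they flow up the routing tree
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(psr):
    s, c = psr
    return s / c

readings = [10, 20, 30, 40]
psr = avg_init(readings[0])
for v in readings[1:]:
    psr = avg_merge(psr, avg_init(v))
print(psr, avg_evaluate(psr))  # (100, 4) 25.0
```

Because the merge is associative and commutative, any tree shape produces the same final PSR.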

22

Types of Aggregates

• SQL supports MIN, MAX, SUM, COUNT, AVERAGE
• Any function can be computed via TAG
• In-network benefit for many operations
  – E.g. standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
  – Benefit depends on compactness of the PSR

23

Taxonomy of Aggregates

• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied

Property              | Examples                                    | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive  | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary              | Applicability of Sampling, Effect of Loss
Monotonic             | COUNT: monotonic, AVG: non-monotonic        | Hypothesis Testing, Snooping

24

TAG Advantages

• Communication reduction
  – Important for power and contention
• Continuous stream of results
  – In the absence of faults, will converge to the right answer
• Lots of optimizations
  – Based on shared radio channel
  – Semantics of operators

25

Simulation Environment

• Evaluated via simulation
• Coarse-grained event-based simulator
  – Sensors arranged on a grid
  – Two communication models
    » Lossless: all neighbors hear all messages
    » Lossy: messages lost with probability that increases with distance

26

Simulation Results

[Chart: Total Bytes Xmitted vs. Aggregation Function (EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN), y-axis 0-100,000 bytes]

2500 nodes, 50x50 grid, depth ≈ 10, neighbors ≈ 20

Some aggregates require dramatically more state!

27

Optimization: Channel Sharing (“Snooping”)

• Insight: shared channel enables optimizations
• Suppress messages that won’t affect the aggregate
  – E.g., MAX
  – Applies to all exemplary, monotonic aggregates
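A minimal sketch of snooping-based suppression for MAX follows; the transmission order and node names are hypothetical. Siblings share the radio channel, so a node that overhears a value at least as large as its own need not transmit at all.

```python
def transmitters(readings):
    """readings: node -> value, in the order the nodes transmit."""
    sent, best_heard = [], float("-inf")
    for node, v in readings.items():
        if v > best_heard:        # our reading could still raise the MAX
            sent.append(node)
        best_heard = max(best_heard, v)  # snoop on the shared channel
    return sent

print(transmitters({"a": 7, "b": 3, "c": 9}))  # ['a', 'c']: b is suppressed
```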

28

Optimization: Hypothesis Testing

• Insight: a guess from the root can be used for suppression
  – E.g. ‘MIN < 50’
  – Works for monotonic & exemplary aggregates
    » Also summary, if imprecision allowed
• How is the hypothesis computed?
  – Blind or statistically informed guess
  – Observation over a network subset
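For a MAX query the idea can be sketched as follows (node ids and readings are hypothetical): the root ships a guess with the query, and nodes whose readings cannot affect the answer under that guess stay silent.

```python
def reporters(readings, guess):
    """For MAX: only nodes whose value exceeds the guess report."""
    return [n for n, v in readings.items() if v > guess]

readings = {1: 55, 2: 91, 3: 12, 4: 88}
print(reporters(readings, 90))              # [2]: one message instead of four
print(reporters(readings, float("-inf")))   # no guess: every node reports
```

If the guess overshoots the true maximum, nobody reports and the root learns only that the maximum is at most the guess; that is the imprecision trade-off mentioned for summary aggregates.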

29

Experiment: Hypothesis Testing

Uniform value distribution, dense packing, ideal communication

[Chart: Messages/Epoch vs. Network Diameter (10-50), 0-3000, for SELECT MAX(attr) with R(attr) = [0,100]; series: No Guess, Guess = 50, Guess = 90, Snooping]

30

Optimization: Use Multiple Parents

• For duplicate-insensitive aggregates
• Or aggregates that can be expressed as a linear combination of parts
  – Send (part of) aggregate to all parents
    » In just one message, via broadcast
  – Decreases variance

[Diagram: node A with parents B and C sends its whole count (1) to one parent, or 1/2 to each parent]
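The variance claim can be checked with a small Monte Carlo sketch (the count value, loss probability, and trial count are assumptions for illustration): with loss probability p per link, sending c to one parent and sending c/2 to each of two parents have the same expected contribution c(1-p), but splitting halves the size of any single loss.

```python
import random
import statistics

def delivered(c, split, p, rng):
    # either one send of c, or two independent sends of c/2
    if split:
        return sum(c / 2 for _ in range(2) if rng.random() > p)
    return c if rng.random() > p else 0.0

rng = random.Random(0)
split = [delivered(4, True, 0.2, rng) for _ in range(20000)]
whole = [delivered(4, False, 0.2, rng) for _ in range(20000)]
print(statistics.mean(split), statistics.mean(whole))  # both near 3.2
print(statistics.pvariance(split) < statistics.pvariance(whole))  # True
```

Analytically the variance drops from c²p(1-p) to 2(c/2)²p(1-p), i.e. by half, which the simulation confirms.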

31

Multiple Parents Results

• Better than previous analysis expected!
• Losses aren’t independent!
• Insight: splitting spreads data over many links

[Chart: Benefit of Result Splitting (COUNT query); Avg. COUNT, 0-1400, with and without splitting; 2500 nodes, lossy radio model, 6 parents per node. Without splitting, a single critical link dominates the loss.]

32

Summary

• TAG enables in-network declarative query processing
  – State-dependent communication benefit
  – Transparent optimization via taxonomy
    » Hypothesis testing
    » Parent sharing
• Declarative queries are the right interface for data collection in sensor nets!
  – Easier to program and more efficient for the vast majority of users

TinyDB release available: http://telegraph.cs.berkeley.edu/tinydb

33

Questions?

TinyDB Demo After The Session…

34

TinyOS

• Operating system from David Culler’s group at Berkeley
• C-like programming environment
• Provides messaging layer, abstractions for major hardware components
  – Split-phase, highly asynchronous, interrupt-driven programming model

Hill, Szewczyk, Woo, Culler, & Pister. “Systems Architecture Directions for Networked Sensors.” ASPLOS 2000. See http://webs.cs.berkeley.edu/tos

35

In-Network Processing in TinyDB

SELECT AVG(light)
EPOCH DURATION 4s

• Cost metric = #msgs
• 16 nodes
• 150 epochs
• In-net loss rate: 5%
• External loss: 15%
• Network depth: 4

[Chart: In-Network vs. Out-of-Network Aggregation; total # messages, 0-5000, with in-network aggregation sending far fewer]

36

Grouping

• Recall: GROUP BY expression partitions sensors into distinct logical groups
  – E.g. “partition sensors by room number”
• If query is grouped, sensors apply the expression on each epoch
• PSRs tagged with group
• When a PSR (with group) is received:
  – If it belongs to a stored group, merge with existing PSR
  – If not, just store it
• At the end of each epoch, transmit one PSR per group
• Need to evict if storage overflows.
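The receive rule above can be sketched in a few lines; this is an illustrative sketch for a COUNT query, where each incoming PSR is tagged with a group id and COUNT's merge is addition.

```python
def receive(stored, group, psr):
    if group in stored:
        stored[group] += psr  # group already stored: merge (COUNT adds)
    else:
        stored[group] = psr   # first PSR seen for this group: just store it
    return stored

stored = {}
for group, psr in [(2, 1), (3, 1), (2, 2), (3, 1)]:
    receive(stored, group, psr)
print(stored)  # {2: 3, 3: 2}: one PSR per group goes out at epoch end
```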

37

Group Eviction

• Problem: number of groups in any one iteration may exceed available storage on sensor
• Solution: evict! (Partial Preaggregation*)
  – Choose one or more groups to forward up tree
  – Rely on nodes further up tree, or root, to recombine groups properly
  – What policy to choose?
    » Intuitively: least popular group, since we don’t want to evict a group that will receive more values this epoch.
    » Experiments suggest:
      • Policy matters very little
      • Evicting as many groups as will fit into a single message is good

* Per-Åke Larson. Data Reduction by Partial Preaggregation. ICDE 2002.
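The least-popular policy can be sketched as follows; the (sum, count) PSR layout and the capacity value are assumptions for illustration. Evicted groups are forwarded up the tree early and recombined by ancestors or the root.

```python
def evict(stored, capacity):
    """stored[group] = (partial sum, reading count); shrink to capacity."""
    evicted = {}
    while len(stored) > capacity:
        g = min(stored, key=lambda g: stored[g][1])  # least popular group
        evicted[g] = stored.pop(g)
    return evicted

stored = {1: (40, 4), 2: (7, 1), 3: (22, 2)}
print(evict(stored, capacity=2))  # {2: (7, 1)} is forwarded up the tree early
print(stored)                     # {1: (40, 4), 3: (22, 2)} remain
```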

38

Declarative Benefits In Sensor Networks

• Vastly simplifies execution for large networks
  – Since locations are described by predicates
  – Operations are over groups
• Enables tolerance to faults
  – Since system is free to choose where and when operations happen
• Data independence
  – System is free to choose where data lives, how it is represented

39

Simulation Screenshot

[Screenshot of the simulator]

40

Hypothesis Testing For Average

• AVERAGE: each node suppresses readings within some ∆ of an approximate average µ*.
  – Parents assume children who don’t report have value µ*
• Computed average cannot be off by more than ∆.
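The bound follows because each substituted term is off by at most ∆; a minimal sketch (the readings, µ*, and ∆ are hypothetical values for illustration):

```python
def approx_avg(readings, mu, delta):
    # children suppress readings within delta of the broadcast estimate mu;
    # the parent substitutes mu for each silent child
    reported = [v for v in readings if abs(v - mu) > delta]
    suppressed = len(readings) - len(reported)
    return (sum(reported) + mu * suppressed) / len(readings)

readings = [48, 52, 50, 70, 49]
est = approx_avg(readings, mu=50, delta=5)
true = sum(readings) / len(readings)
print(est, true, abs(est - true) <= 5)  # 54.0 53.8 True
```

Only one of the five nodes transmits, yet the estimate stays within ∆ of the true average.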

41

TinyAlloc

• Handle-based compacting memory allocator
• For catalog, queries

User program:

Handle h;
call MemAlloc.alloc(&h, 10);
(*h)[0] = “Sam”;
call MemAlloc.lock(h);
tweakString(*h);
call MemAlloc.unlock(h);
call MemAlloc.free(h);

[Diagram: free bitmap, heap, and master pointer table, shown before and after compaction]

42

Schema

• Attribute & command interfaces
  – At INIT(), components register attributes and commands they support
    » Commands implemented via wiring
    » Attributes fetched via accessor command
  – Catalog API allows local and remote queries over known attributes / commands.
• Demo of adding an attribute, executing a command.

43

Q1: Expressiveness

• Simple data collection satisfies most users
• How much of what people want to do is just simple aggregates?
  – Anecdotally, most of it
  – EE people want filters + simple statistics (unless they can have signal processing)
• However, we’d like to satisfy everyone!

44

Query Language

• New features:
  – Joins
  – Event-based triggers
    » Via extensible catalog
  – In-network & nested queries
  – Split-phase (offline) delivery
    » Via buffers

45

Sample Query 1

Bird counter:

CREATE BUFFER birds(uint16 cnt)
SIZE 1

ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE

46

Sample Query 2

Birds that entered and left within time t of each other:

ON EVENT bird-leave AND bird-enter WITHIN t
SELECT bird-leave.time, bird-leave.nest
WHERE bird-leave.nest = bird-enter.nest
ONCE

47

Sample Query 3

Delta compression:

SELECT light
FROM buf, sensors
WHERE |s.light – buf.light| > t
OUTPUT INTO buf
SAMPLE PERIOD 1s

48

Sample Query 4

Offline delivery + event chaining:

CREATE BUFFER equake_data(uint16 loc, uint16 xAccel, uint16 yAccel)
SIZE 1000
PARTITION BY NODE

SELECT xAccel, yAccel
FROM sensors
WHERE xAccel > t OR yAccel > t
SIGNAL shake_start(…)
SAMPLE PERIOD 1s

ON EVENT shake_start(…)
SELECT loc, xAccel, yAccel
FROM sensors
OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
SAMPLE PERIOD 10ms

49

Event Based Processing

• Enables internal and chained actions
• Language semantics
  – Events are inter-node
  – Buffers can be global
• Implementation plan
  – Events and buffers must be local
  – Since n-to-n communication not (well) supported
• Next: operator expressiveness

50

Attribute Driven Topology Selection

• Observation: internal queries often over local area*
  – Or some other subset of the network
    » E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on values of range-selected attributes
  – Requires range attributes, connectivity to be relatively static

* Heidemann et al. Building Efficient Wireless Sensor Networks with Low-Level Naming. SOSP, 2001.

51

Attribute Driven Query Propagation

SELECT …
WHERE a > 5 AND a < 12

[Diagram: the query is forwarded only into subtrees whose precomputed intervals ([1,10], [7,15]) overlap (5,12); the [20,40] subtree is skipped]

Precomputed intervals == “Query Dissemination Index”

52

Attribute Driven Parent Selection

[Diagram: node with value interval [3,6] choosing among parents advertising [1,10], [7,15], and [20,40]]

[3,6] ∩ [1,10] = [3,6]
[3,6] ∩ [7,15] = ø
[3,6] ∩ [20,40] = ø

Even without intervals, expect that sending to the parent with the closest value will help
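The parent-selection test above is just interval intersection; a minimal sketch, with the child's interval and the parents' advertised intervals taken from the slide:

```python
def intersect(a, b):
    """Closed-interval intersection; None means the empty set."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

child = (3, 6)
parents = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
viable = [p for p, iv in parents.items() if intersect(child, iv)]
print(viable)  # [1]: only [1,10] intersects [3,6]; the others are empty
```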

53

Hot off the press…

[Chart: Nodes Visited vs. Range Query Size for Different Index Policies; query size 0.001-1 as % of value range, nodes visited 0-450 (400 = max); random value distribution, 20x20 grid, ideal connectivity to 8 neighbors; series: Best Case (Expected), Closest Parent, Nearest Value, Snooping]

54

Grouping

• GROUP BY expr
  – expr is an expression over one or more attributes
    » Evaluation of expr yields a group number
    » Each reading is a member of exactly one group

Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)

Sensor ID | Light | Temp | Group
1         | 45    | 25   | 2
2         | 27    | 28   | 2
3         | 66    | 34   | 3
4         | 68    | 37   | 3

Result:

Group | max(light)
2     | 45
3     | 68
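Evaluating the example query over the four readings in the table can be sketched directly:

```python
# SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)
readings = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]  # (id, light, temp)

groups = {}
for sid, light, temp in readings:
    g = temp // 10  # TRUNC(temp/10) yields the group number
    groups[g] = max(groups.get(g, light), light)
print(groups)  # {2: 45, 3: 68}: matches the result table
```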

55

Having

• HAVING preds
  – preds filters out groups that do not satisfy the predicate
  – versus WHERE, which filters out tuples that do not satisfy the predicate
  – Example:

SELECT max(temp) FROM sensors
GROUP BY light
HAVING max(temp) < 100

Yields all groups whose maximum temperature is under 100


57

Experiment: Basic TAG

Dense packing, ideal communication

[Chart: Avg. Bytes/Epoch vs. Network Diameter (10-50), 0-100,000 bytes; series: COUNT, MAX, AVERAGE, MEDIAN, EXTERNAL, DISTINCT]

58

Experiment: Hypothesis Testing

Uniform value distribution, dense packing, ideal communication

[Chart: Messages/Epoch vs. Network Diameter (10-50), 0-3000; series: No Guess, Guess = 50, Guess = 90, Snooping]

59

Experiment: Effects of Loss

[Chart: Percent Error From Single Loss vs. Network Diameter (10-50), 0-3.5%; series: AVERAGE, COUNT, MAX, MEDIAN]

60

Experiment: Benefit of Cache

[Chart: Percentage of Network Involved vs. Network Diameter (10-50), 0-1.2; series: No Cache, 5 Rounds Cache, 9 Rounds Cache, 15 Rounds Cache]

61

Pipelined Aggregates

• After query propagates, during each epoch:
  – Each sensor samples local sensors once
  – Combines them with PSRs from children
  – Outputs PSR representing aggregate state in the previous epoch.
• After (d-1) epochs, PSR for the whole tree output at root
  – d = depth of the routing tree
  – If desired, partial state from the top k levels could be output in the kth epoch
• To avoid combining PSRs from different epochs, sensors must cache values from children

[Diagram: tree of nodes 1-5. Value from node 5 produced at time t arrives at node 1 at time (t+3); value from node 2 produced at time t arrives at node 1 at time (t+1)]

62

Pipelining Example

[Diagram: routing tree of nodes 1-5; each node's (SID, Epoch, Agg.) table is empty before the first epoch]

63

Pipelining Example

Epoch 0: messages sent: <4,0,1>, <5,0,1>

Node 2's table (SID, Epoch, Agg.): (2, 0, 1), (4, 0, 1)
Node 1's table: (1, 0, 1)
Node 3's table: (3, 0, 1), (5, 0, 1)

64

Pipelining Example

Epoch 1: messages sent: <5,1,1>, <4,1,1>, <3,0,2>, <2,0,2>

Node 2's table (SID, Epoch, Agg.): (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2)
Node 1's table: (1, 0, 1), (1, 1, 1), (2, 0, 2)
Node 3's table: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1)

65

Pipelining Example

Epoch 2: messages sent: <5,2,1>, <4,2,1>, <3,1,2>, <2,0,4>, <1,0,3>

Node 2's table (SID, Epoch, Agg.): (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2), (2, 2, 1), (4, 2, 1), (3, 1, 2)
Node 1's table: (1, 0, 1), (1, 1, 1), (2, 0, 2), (1, 2, 1), (2, 0, 4)
Node 3's table: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1), (3, 2, 1), (5, 2, 1)

66

Pipelining Example

Epoch 3: messages sent: <5,3,1>, <4,3,1>, <3,2,2>, <2,1,4>, <1,0,5>

(Node tables as in epoch 2; the root now outputs <1,0,5>, the complete COUNT for epoch 0.)

67

Pipelining Example

Epoch 4: messages sent: <5,4,1>, <4,4,1>, <3,3,2>, <2,2,4>, <1,1,5>

68

Our Stream Semantics

• One stream, ‘sensors’
• We control data rates
• Joins between that stream and buffers are allowed
• Joins are always landmark, forward in time, one tuple at a time
  – Result of queries over ‘sensors’ is either a single tuple (at time of query) or a stream
• Easy to interface to more sophisticated systems
• Temporal aggregates enable fancy window operations

69

Formal Spec.

ON EVENT <event> [<boolop> <event>... WITHIN <window>]
[SELECT {<expr>|agg(<expr>)|temporalagg(<expr>)}
 FROM [sensors | <buffer> | events]]
[WHERE {<pred>}]
[GROUP BY {<expr>}]
[HAVING {<pred>}]
[ACTION [<command> [WHERE <pred>] |
         BUFFER <bufname> SIGNAL <event>({<params>}) |
         (SELECT ... ) [INTO BUFFER <bufname>]]]
[SAMPLE PERIOD <seconds>
   [FOR <nrounds>]
   [INTERPOLATE <expr>]
   [COMBINE {temporal_agg(<expr>)}] |
 ONCE]

70

Buffer Commands

[AT <pred>:]
CREATE [<type>] BUFFER <name> ({<type>})
PARTITION BY [<expr>]
SIZE [<ntuples>,<nseconds>]
[AS SELECT ...
 [SAMPLE PERIOD <seconds>]]

DROP BUFFER <name>