Tributaries and Deltas: Efficient and Robust Aggregation in Sensor Network Streams
Amit Manjhi, Suman Nath, Phillip B. Gibbons
Carnegie Mellon University, Intel Research Pittsburgh
Amit Manjhi, SIGMOD '05 @ Carnegie Mellon Databases
Background: Sensors
Battery-powered tiny devices
– Used in ecosystem monitoring at James Reserve, habitat monitoring at Great Duck Island, etc.
Constraints:
– Conserving battery power is important
– Communication consumes orders of magnitude more energy than local computation
– Operate in dynamic, harsh environments
Background: Sensor networks
An important type of query is computing aggregates, e.g., the total number of live sensors
In-network aggregation is performed to save communication
[Figure: example network computing Count; the result shown is 3]
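A minimal sketch of in-network aggregation of Count over a tree, assuming a hypothetical topology given as a parent-to-children map (the names `children` and `tree_count` are illustrative, not from the talk). Each node forwards a single partial count, so one number crosses each link.

```python
from typing import Dict, List

def tree_count(children: Dict[str, List[str]], node: str) -> int:
    """Return the number of nodes in the subtree rooted at `node`."""
    # A node counts itself and adds the partial counts sent by its children.
    return 1 + sum(tree_count(children, child) for child in children.get(node, []))

# Hypothetical topology: a base station with three live sensors below it.
children = {"base": ["a"], "a": ["b", "c"]}
live_sensors = tree_count(children, "base") - 1  # exclude the base station itself
```

If a link in this tree fails, the whole subtree below it drops out of the count, which is the non-robustness that motivates multi-path aggregation.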
Existing energy-efficient in-network approaches: Tree and Multi-path
Tree [TinyDB, Cougar]
+ Exact answer
- Non-robust topology
Multi-path [Considine et al. ICDE ’04]
+ Robust topology
- Approximate answer
Tree and Multi-path Tradeoffs
Can we get the best of both by adapting to changing loss rates?
[Figure: RMS error (0 to 0.4) vs. loss rate (0 to 0.4) for Tree and Multi-path; annotations: "Exact answer", "Robust topology"]
Loss rate varies with change in conditions
Our solution: Tributary-Delta
• Simultaneously run Tree and Multi-path in different parts of the network
• As energy-efficient as tree or multi-path
• Multi-path region adapts to loss rate
[Figure: network divided into a Delta (multi-path region) and Tributaries (tree regions)]
Outline
• Background and motivation
• Tributary-Delta
• Simple aggregates in TD framework
• Frequent Items in TD framework
• Evaluation
• Related work and conclusion
How does Tributary-Delta work?
• Correctness: A tree node should not receive aggregates from a multi-path node
• Gives rise to a delta at the centre (multi-path aggregation is used in the nodes at the centre)
[Figure: Delta (multi-path region) at the centre, fed by Tributaries (tree regions, labelled T)]
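The correctness condition can be checked mechanically. The sketch below assumes a hypothetical representation in which every node is labelled tree ("T") or multi-path ("M") and `sends_to` lists the nodes that receive each node's partial result; it reports any link on which a multi-path node feeds a tree node.

```python
from typing import Dict, List, Tuple

TREE, MULTIPATH = "T", "M"

def correctness_violations(mode: Dict[str, str],
                           sends_to: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (sender, receiver) links where a multi-path node feeds a tree node."""
    return [(sender, receiver)
            for sender, receivers in sends_to.items()
            for receiver in receivers
            if mode[sender] == MULTIPATH and mode[receiver] == TREE]

# Hypothetical chain t2 -> t1 -> d1 -> base, with the delta at the centre.
sends_to = {"t2": ["t1"], "t1": ["d1"], "d1": ["base"]}
valid = {"base": MULTIPATH, "d1": MULTIPATH, "t1": TREE, "t2": TREE}
invalid = {"base": MULTIPATH, "d1": TREE, "t1": MULTIPATH, "t2": TREE}
```

In the valid labelling, tree nodes only ever send toward the delta, so the multi-path region forms a connected area around the base station.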
How does Tributary-Delta adapt?
[Figure: Delta and Tributary regions]
TD-Coarse: uniform expansion
TD: focused expansion
Expand or shrink the delta region:
• Expanding the delta increases robustness
• Shrinking the delta lowers approximation error
Computing Aggregates in the Tributary-Delta Framework
1. Each tree node: Tree Algorithm: generate tree partial results
2. Each multi-path node: Multi-path Algorithm: generate multi-path partial results
3. Nodes at the boundary: Conversion Function: convert tree results to multi-path results
Example Aggregates
• Many useful aggregates can be readily computed within the Tributary-Delta framework
– Missing piece: a suitable conversion function
• We provide conversion functions for several aggregates
– Count
– Sum, Average
– Top-k
– Uniform sample
Computation of “Count”
1. Tree Algorithm is simple
2. Multi-path Algorithm [AMS STOC ’96]
a) T T T H: report 3
b) Probability of obtaining i is proportional to $2^{-i}$
c) To combine multi-path partial values, take the maximum
d) If the maximum value is i, the estimate is $2^{i}$
3. Conversion function: receive count 3, repeat the “coin toss” 3 times, and take the maximum
[Figure: example network annotated with per-node values]
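The three pieces for Count can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a "report" is taken to be the number of tails before the first head (so T T T H reports 3), and a single coin-toss value gives only a rough estimate.

```python
import random

def coin_toss(rng: random.Random) -> int:
    """Count tails before the first head, so T T T H reports 3."""
    tails = 0
    while rng.random() < 0.5:  # treat values below 0.5 as a tail
        tails += 1
    return tails

def combine_multipath(a: int, b: int) -> int:
    """Combine two multi-path partial values; max is insensitive to duplicates."""
    return max(a, b)

def tree_to_multipath(count: int, rng: random.Random) -> int:
    """Conversion function: one coin-toss experiment per counted sensor, keep the max."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return max(coin_toss(rng) for _ in range(count))

def estimate(value: int) -> int:
    """If the maximum value seen is i, the estimate is 2**i."""
    return 2 ** value

rng = random.Random(7)  # fixed seed only to make the sketch repeatable
boundary_value = tree_to_multipath(3, rng)  # tree count 3 arrives at the boundary
merged = combine_multipath(boundary_value, coin_toss(rng))
```

Because the combining step is a maximum, receiving the same partial value over several paths does not change the result, which is what makes the multi-path region robust to duplicated messages.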
Outline
• Background and motivation
• Tributary-Delta
• Simple aggregates in TD framework
• Frequent Items in TD framework
• Evaluation
• Related work and conclusion
Finding Frequent Items
• Tree Algorithm:
– Previous work [Greenwald, Khanna PODS ’04, Manjhi et al. ICDE ’05]
– Our tree algorithm achieves optimal bound for total communication
• Multi-path Algorithm:
– Previous work [Nath et al. SenSys ’04]
– Our multi-path algorithm is more accurate than previous work
• Conversion Function
Formal Problem Statement
Formulate problem as [Manku, Motwani VLDB ’02, Manjhi et al. ICDE ’05]
Find items that are more frequent than 1% with error 0.1%
[Figure: frequency counts are turned into approximate answers; levels marked at 1% and 0.9%]
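One way to read the problem statement, following the usual approximate frequency-count formulation: every item more frequent than 1% must be reported, and nothing below 1% minus the 0.1% error (that is, 0.9%) may be reported. The checker below is a sketch with hypothetical names and example data.

```python
from typing import Dict, Iterable

def is_acceptable(freqs: Dict[str, float], reported: Iterable[str],
                  support: float = 0.01, error: float = 0.001) -> bool:
    """Check an approximate answer against true relative frequencies."""
    must_report = {item for item, f in freqs.items() if f > support}
    may_report = {item for item, f in freqs.items() if f >= support - error}
    answer = set(reported)
    # No false negatives above the support; no items below support - error.
    return must_report <= answer <= may_report

# Hypothetical true frequencies: a is frequent, b is borderline, c is rare.
freqs = {"a": 0.02, "b": 0.0095, "c": 0.005}
```

Items such as `b`, which fall between 0.9% and 1%, may appear in the answer or not; both outcomes are acceptable.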
Framework for finding Freq. Items
1. Add frequency counts from children
2. Decrement frequency counts
3. Drop counters that are below zero
These steps are repeated at each internal node; decrements depend on height in the tree
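The add, decrement, and drop steps at one internal node can be sketched as below. The function name and the `decrement` parameter are hypothetical; how the decrement is chosen for a given height is not shown on this slide, so it is passed in as a plain number.

```python
from collections import Counter
from typing import Iterable

def combine_children(child_summaries: Iterable[Counter], decrement: int) -> Counter:
    """Apply add, decrement, drop to the summaries received from children."""
    total = Counter()
    for summary in child_summaries:
        total.update(summary)  # step 1: add frequency counts from children
    result = Counter()
    for item, count in total.items():
        remaining = count - decrement  # step 2: decrement frequency counts
        if remaining >= 0:  # step 3: drop counters that are below zero
            result[item] = remaining
    return result

# Hypothetical summaries from two children.
merged = combine_children([Counter(a=5, b=1), Counter(a=3, c=3)], decrement=2)
```

Here `b` is dropped because its combined count of 1 falls below zero after the decrement, so it is never forwarded further up the tree.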
How much to decrement at different levels?
Need to balance two competing pressures:
1. Early reduction of data (near leaf)
2. Informed reduction of data (near root)
[Figure: error (from exact to max possible error) vs. height (leaf to root), with curves labelled "Early Drop" and "Late Drop"; annotations: "Minimizes communication on any link", "Minimizes total communication"]
Geometric decrease in decrement, e.g.: 0.5%, 0.25%, 0.125%, … (running totals: 0.5%, 0.75%, 0.875%, …), error = 0.1%
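The example sequence halves at every level, and its running totals approach 1. A small sketch, assuming the numbers are read as fractions of an overall budget and that the largest decrement is applied nearest the leaves (both are assumptions, and `halving_decrements` is a hypothetical name):

```python
from itertools import accumulate
from typing import List

def halving_decrements(levels: int) -> List[float]:
    """Fractions spent at heights 1..levels, halving at each level."""
    return [0.5 ** height for height in range(1, levels + 1)]

decrements = halving_decrements(3)
running_totals = list(accumulate(decrements))
```

However many levels the tree has, the running total stays below 1, so the cumulative decrement never exceeds the budget.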
Multi-path Algorithm for Freq. Items
1. Add: duplicate-insensitive addition
2. Decrement: duplicate-insensitive subtraction
3. Drop counters below zero
2. Drop counters below (rising) threshold
Threshold is maintained based on careful analysis
Paper has details on lowering communication
Outline
• Background and motivation
• Tributary-Delta
• Simple aggregates in TD framework
• Frequent Items in TD framework
• Evaluation
• Related work and conclusion
Evaluation Methodology
• The TAG Simulator [Madden et al. OSDI ’02]
• Topology: 600 random sensors in a 20 x 20 area
– Base station is at the center
• Approaches:
– Tree-based scheme: TAG
– Multi-path scheme: Synopsis Diffusion [Nath et al. SenSys ’04]
– TD-Coarse: uniform expansion
– TD: focused expansion
Effects of regional loss rate
[Figure: RMS error (0 to 0.4) vs. loss rate in shaded region (0 to 1) for Tree, Multi-path, TD-Coarse, and TD. Inset labels: "Varying loss rate", "Loss rate = 0.05"]
All four approaches use the same energy
Effects of global loss rate
[Figure: two plots of RMS error vs. loss rate for Tree, Multi-path, TD-Coarse, and TD under a varying loss rate: one over loss rates 0 to 1 (RMS error 0 to 1), one over loss rates 0 to 0.4 (RMS error 0 to 0.2)]
1. Our methods effectively combine the benefits: they perform better than either existing approach
2. All four approaches use the same energy
Computation of frequent items
[Figure: false negatives in % (0 to 70) vs. loss rate in shaded region (0 to 1) for Tree, Multi-path, and TD. Inset labels: "Varying loss rate", "Loss rate = 0.05"]
False positives < 3%
Data from real sensor deployment
Other results in paper
• Adaptation details
• Tree construction algorithm that reduces communication
• 2-approximation for total and maximum load, and extension to quantiles
• More extensive evaluation
Related Work
• Existing in-network aggregation algorithms
– Tree: TinyDB [Madden et al. SIGMOD ’03]
– Multi-path: Considine et al. ICDE ’04, Bawa et al. SIGMOD ’04, Nath et al. SenSys ’04
• Adapting to changes in the environment
– Directed Diffusion [Intanagonwiwat et al. MobiCom ’00], TAG [Madden et al. OSDI ’02]
• Frequent items and quantiles
– Manku, Motwani VLDB ’02, Greenwald, Khanna PODS ’04, Manjhi et al. ICDE ’05
Conclusion
• Tributary-Delta: an energy-efficient and robust solution
– Combines benefits of existing tree- and multi-path-based approaches
– Adapts to changing network conditions
• Algorithms for finding frequent items
• Results confirm the advantages
– Error reduction is up to a factor of 3
Future Work
• Deployment in a real scenario: incorporate in TinyDB
• Add other aggregates to the suite of aggregates
Back-up slides!
Adaptation Details
Ask application for a threshold on percentage contributing
Base station gets overall numbers on % contributing
% contributing < threshold: increase delta region
% contributing > threshold: decrease delta region
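The decision at the base station can be sketched as a comparison against the application's threshold. The function name and the optional `margin` parameter are hypothetical; the margin is an added assumption to avoid flipping back and forth when the measured value sits near the threshold.

```python
def adapt_delta(percent_contributing: float, threshold: float,
                margin: float = 0.0) -> str:
    """Decide whether to expand, shrink, or keep the delta region."""
    if percent_contributing < threshold:
        return "expand"  # too few nodes contribute: favour robustness
    if percent_contributing > threshold + margin:
        return "shrink"  # comfortably above threshold: favour exactness
    return "keep"
```

For example, with a threshold of 90, a reading of 80 leads to expansion and a reading of 97 leads to shrinking.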
Tree Construction Algorithm – (1/2)
Tree links are a subset of ring links
Avoid expensive synchronization
[Figure: ring topology with Ring 2 labelled]
Tree Construction Algorithm – (2/2)
Opportunistic parent switching: each node of height i+1 should have at least 2 children of height i
Each node of height i+1 pins any two of its children of height i, and then flags itself.
Any non-pinned node can switch parent to a non-flagged node
[Figure: example tree within Ring 2, with nodes labelled by height (1, 2, 3)]
Multi-path over Rings
• A node is in ring i if it is i hops away from the base station
• Broadcasts by nodes in ring i are received by nodes in ring i-1
• Each node transmits once = optimal energy cost (same as Tree)
[Figure: ring topology with Ring 2 labelled]
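Ring membership is a hop count, so it can be computed with a breadth-first search from the base station. The sketch below assumes a hypothetical connectivity map `neighbors`; `receivers` lists the nodes one ring closer to the base station that can hear a given node's single broadcast.

```python
from collections import deque
from typing import Dict, List

def assign_rings(neighbors: Dict[str, List[str]], base: str) -> Dict[str, int]:
    """Map each reachable node to its hop distance from the base station."""
    ring = {base: 0}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for other in neighbors.get(node, []):
            if other not in ring:  # first visit gives the shortest hop count
                ring[other] = ring[node] + 1
                queue.append(other)
    return ring

def receivers(neighbors: Dict[str, List[str]], ring: Dict[str, int],
              node: str) -> List[str]:
    """Neighbours in ring i-1 that receive a broadcast from a node in ring i."""
    return [n for n in neighbors.get(node, []) if ring.get(n) == ring[node] - 1]

# Hypothetical connectivity: c reaches the base station through a or b.
neighbors = {"base": ["a", "b"], "a": ["base", "b", "c"],
             "b": ["base", "a", "c"], "c": ["a", "b"]}
ring = assign_rings(neighbors, "base")
```

Node `c` transmits once, yet both `a` and `b` receive its partial result, so losing one of those links does not lose `c`'s contribution.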
A 2-approximation Solution
[Figure: error (from exact to max possible error) vs. height (leaf to root), with curves labelled "Early Drop" and "Late Drop" and a curve labelled "2-approx on both objectives"; annotations: "Minimizes communication on any link", "Minimizes total communication"]
Minimizing total communication for quantiles
• Original algorithm by Greenwald, Khanna PODS ’04
• Vary the size of quantiles in a geometric pattern, and the total communication is linear in the number of sensor nodes.
Extensive Evaluation
• Evaluation of our frequent items tree algorithm
• Evaluation of our frequent items multi-path algorithm
• How quickly do TD and TD-Coarse respond to changes in loss rates?