1
Report of Advanced Data Base Topics Project
Instructor: Dr. Rahgozar
Euhanna Ghadimi, Ali Abbasi, Kave Pashaii
Data Storage Selection in Sensor Networks
2
Outline
1. Introduction
   Definition, Applications, Differences, Storage
2. Queries
   2.1. Querying in Cougar
   2.2. Querying in TinyDB
   2.3. In-network Aggregation
3. Other Issues
3
Introduction
From a data storage point of view, a sensor network database is:
“a distributed database that collects physical measurements about the environment, indexes them, and serves queries from users and other applications external to or from within the network”
Research in sensor network databases:
• relatively new
• can benefit from current efforts in data streams and P2P networks
4
The long term goal
Embed numerous distributed devices to monitor and interact with the physical world: in work-spaces, hospitals, homes, vehicles, and “the environment” (water, soil, air…)
Network these devices so that they can coordinate to perform higher-level tasks.
Requires robust distributed systems of tens of thousands of devices.
[Figure labels: Disaster Response; Circulatory Net]
5
Sensor Net Sample Apps
Traditional monitoring apparatus.
Earthquake monitoring in shake-test sites.
Vehicle detection: sensors along a road, collect data about passing vehicles.
Habitat Monitoring: Storm petrels on Great Duck Island, microclimates on James Reserve.
6
Overview of research
• Sensor network challenges
• One approach: Directed diffusion
  • Basic algorithm
  • Initial simulation results (Intanagonwiwat)
• Other interesting localized algorithms in progress:
  • Aggregation (Kumar)
  • Adaptive fidelity (Xu)
  • Address free architecture, Time synch (Elson)
  • Localization (Bulusu, Girod)
  • Self-configuration using robotic nodes (Bulusu, Cerpa)
  • Instrumentation and debugging (Jerry Zhao)
7
The Challenge is Dynamics!
The physical world is dynamic
• Dynamic operating conditions
• Dynamic availability of resources
  • … particularly energy!
• Dynamic tasks
Devices must adapt automatically to the environment
• Too many devices for manual configuration
• Environmental conditions are unpredictable
Unattended and un-tethered operation is key to many applications
8
Approach
Energy is the bottleneck resource
• And communication is a major consumer: avoid communication over long distances
Pre-configuration and global knowledge are not applicable
• Achieve desired global behavior through localized interactions
• Empirically adapt to observed environment
Leverage points
• Small-form-factor nodes, densely distributed to achieve physical locality to sensed phenomena
• Application-specific, data-centric networks
• Data processing/aggregation inside the network
9
Directed Diffusion Concepts
Application-aware communication primitives
• expressed in terms of named data (not in terms of the nodes generating or requesting data)
Consumer of data initiates interest in data with certain attributes
Nodes diffuse the interest towards producers via a sequence of local interactions
This process sets up gradients in the network which channel the delivery of data
Reinforcement and negative reinforcement used to converge to efficient distribution
Intermediate nodes opportunistically fuse interests, aggregate, correlate or cache data
10
Illustrating Directed Diffusion
[Figure: four panels, each showing a sink and a source in the network: Setting up gradients; Sending data; Recovering from node failure; Reinforcing stable path]
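The gradient idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the directed diffusion protocol itself: it assumes a hypothetical five-node topology, keeps only one gradient per node (the neighbor from which the interest was first heard), and models recovery from node failure by simply re-flooding the interest. Real directed diffusion keeps gradients to several neighbors and uses reinforcement to pick among them.

```python
from collections import deque

def diffuse_interest(neighbors, sink):
    """Flood an interest from the sink; return each node's gradient,
    i.e. the neighbor it should forward matching data to."""
    gradient = {sink: None}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in gradient:      # first copy of the interest wins
                gradient[nxt] = node
                queue.append(nxt)
    return gradient

def deliver(gradient, source):
    """Follow gradients from the source back to the sink."""
    path = [source]
    while gradient[path[-1]] is not None:
        path.append(gradient[path[-1]])
    return path

def without(neighbors, dead):
    """Topology after one node has failed."""
    return {n: [m for m in adj if m != dead]
            for n, adj in neighbors.items() if n != dead}

# Hypothetical topology, chosen only for illustration.
TOPOLOGY = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c"],
    "c": ["a", "b", "source"],
    "source": ["c"],
}

path = deliver(diffuse_interest(TOPOLOGY, "sink"), "source")
repaired = deliver(diffuse_interest(without(TOPOLOGY, "a"), "sink"), "source")
print(path)      # data path while node "a" is alive
print(repaired)  # data path after node "a" fails
```

Note that data is addressed by the interest it matches, not by node identity; the node names here exist only to make the paths readable.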
11
Sensor Network Tomography: Key Ideas and Challenges
Kinds of tomograms
• network health
• resource-level indicators
• responses to external stimuli
Can exchange resource health
• during low-level housekeeping functions
• … such as radio synchronization
Key challenge: energy-efficiency
• need to aggregate local representations
• algorithms must auto-scale
• outlier indicators are different
12
Self configuring networks using and supporting robotic nodes
(Bulusu, Cerpa, Estrin, Heidemann, Mataric, Sukhatme)
Robotics introduces self-mobile nodes and adaptively placed nodes
Self configuring ad hoc networks in the context of unpredictable RF environment
Place nodes for network augmentation or formation
Place beacons for localization granularity
13
Programming Sensor Nets Is Hard
• Months of lifetime required from small batteries
  • 3-5 days naively; can’t recharge often
  • Interleave sleep with processing
• Lossy, low-bandwidth, short range communication
  • Nodes coming and going
  • ~20% loss @ 5m
  • Multi-hop
• Remote, zero administration deployments
• Highly distributed environment
• Limited Development Tools
  • Embedded, LEDs for Debugging!
[Figure: bar chart “Current (mA) by Processing Phase”; x-axis phases: Processing, Processing & Listening, Processing & Transmitting, Idle; y-axis: Current (mA), 0 to 20]
200-800 instructions per bit transmitted!
High-Level Abstraction Is Needed!
14
A Solution: Declarative Queries
Users specify the data they want
• Simple, SQL-like queries
• Using predicates, not specific addresses
• Same spirit as Cougar (our system: TinyDB)
Challenge is to provide:
• Expressive & easy-to-use interface
• High-level operators
  • Well-defined interactions
  • “Transparent Optimizations” that many programmers would miss
• Sensor-net specific techniques
• Power efficient execution framework
Question: do sensor networks change query processing? Yes!
15
Overview
• TinyDB: Queries for Sensor Nets
• Processing Aggregate Queries (TAG)
• Taxonomy & Experiments
• Acquisitional Query Processing
• Other Research
• Future Directions
17
TinyDB Demo
18
TinyDB Architecture
[Figure: TinyDB runs on TinyOS and consists of a Schema, a Query Processor, and a Multihop Network layer. A query such as SELECT AVG(temp) WHERE light > 400 enters the query processor, which applies a filter (light > 400) and an aggregate (avg(temp)), obtains samples via get(‘temp’), and returns results such as T:1, AVG: 225 and T:2, AVG: 250.]
Schema:
• “Catalog” of commands & attributes
• Example attribute entry:
  Name: temp
  Time to sample: 50 uS
  Cost to sample: 90 uJ
  Calibration Table: 3
  Units: Deg. F
  Error: ± 5 Deg F
  Get f: getTempFunc()
  …
TinyDB
~10,000 Lines Embedded C Code
~5,000 Lines (PC-Side) Java
~3200 Bytes RAM (w/ 768 byte heap)
~58 kB compiled code
(3x larger than 2nd largest TinyOS Program)
19
Declarative Queries for Sensor Networks
Examples:
“Find the sensors in bright nests.”
SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors
Epoch Nodeid nestNo Light
0 1 17 455
0 2 25 389
1 1 17 422
1 2 25 405
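To make the semantics of the bright-nests query concrete, here is a small Python sketch that applies the same predicate to the four rows of the Sensors table shown on this slide. It only illustrates what the query selects; the list-of-dicts representation and the function name bright_nests are assumptions made for this example and are not part of TinyDB.

```python
# The four rows of the Sensors table from the slide.
SENSORS = [
    {"epoch": 0, "nodeid": 1, "nestNo": 17, "light": 455},
    {"epoch": 0, "nodeid": 2, "nestNo": 25, "light": 389},
    {"epoch": 1, "nodeid": 1, "nestNo": 17, "light": 422},
    {"epoch": 1, "nodeid": 2, "nestNo": 25, "light": 405},
]

def bright_nests(rows, threshold=400):
    """SELECT nodeid, nestNo, light FROM sensors WHERE light > threshold,
    evaluated once per epoch (the epoch is kept to label each result)."""
    return [(r["epoch"], r["nodeid"], r["nestNo"], r["light"])
            for r in rows if r["light"] > threshold]

result = bright_nests(SENSORS)
for row in result:
    print(row)
```

Node 2 is filtered out in epoch 0 (light 389) but reported in epoch 1 (light 405), which is why the query is re-evaluated every epoch.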
20
Aggregation Queries
“Count the number of occupied nests in each loud region of the island.”
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Regions w/ AVG(sound) > 200
Epoch region CNT(…) AVG(…)
0 North 3 360
0 South 3 520
1 North 3 370
1 South 3 520

Aggregate over the whole network:
SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s
22
Tiny Aggregation (TAG)
In-network processing of aggregates
• Common data analysis operation
  • Aka gather operation or reduction in parallel programming
• Communication reducing
  • Operator dependent benefit
• Across nodes during same epoch
Exploit query semantics to improve efficiency!
23
Query Propagation Via Tree-Based Routing
Tree-based routing
• Used in:
  • Query delivery
  • Data collection
• Topology selection is important; e.g.
  • Krishnamachari, DEBS 2002, Intanagonwiwat, ICDCS 2002, Heidemann, SOSP 2001
  • LEACH/SPIN, Heinzelman et al. MOBICOM 99
  • SIGMOD 2003
• Continuous process
  • Mitigates failures
[Figure: routing tree over nodes A to F; the query Q: SELECT … is propagated down the tree and results R:{…} flow back up toward the root]
24
Basic Aggregation
In each epoch:
• Each node samples local sensors once
• Generates partial state record (PSR) from
  • local readings
  • readings from children
• Outputs PSR during assigned comm. interval
At end of epoch, PSR for whole network output at root
New result on each successive epoch
Extras:
• Predicate-based partitioning via GROUP BY
[Figure: routing tree with nodes 1 to 5]
25
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
[Figure, animated over slides 25 to 29: a five-node routing tree next to a grid of Sensor # (1 to 5) against Interval # (4 down to 1). The partial counts transmitted in each interval are:]
Interval # | Partial counts sent
4 | 1
3 | 2
2 | 1, 3
1 | 5 (result for the epoch)
4 (next epoch) | 1
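The COUNT(*) walk-through above can be reproduced with a short Python sketch. The tree shape is an assumption: the slides only show five nodes, so the parent map below (node 1 as root, nodes 2 and 3 as its children, then 4 under 3 and 5 under 4) was chosen because it is consistent with the partial counts 1, then 2, then 1 and 3, then 5. Each node transmits in the interval equal to its level, deepest level first.

```python
# Hypothetical routing tree: child -> parent (node 1 is the root).
PARENT = {2: 1, 3: 1, 4: 3, 5: 4}

def level(node):
    """Level of a node in the tree; the root is at level 1."""
    depth = 1
    while node in PARENT:
        node = PARENT[node]
        depth += 1
    return depth

def run_epoch(nodes):
    """Simulate one epoch of SELECT COUNT(*) FROM sensors.
    Returns {interval: [(node, partial_count_sent), ...]}."""
    count = {n: 1 for n in nodes}   # every node counts itself once
    sent = {}
    for interval in sorted({level(n) for n in nodes}, reverse=True):
        for n in sorted(nodes):
            if level(n) == interval:
                sent.setdefault(interval, []).append((n, count[n]))
                if n in PARENT:                 # merge into the parent's PSR
                    count[PARENT[n]] += count[n]
    return sent

epoch = run_epoch([1, 2, 3, 4, 5])
for interval in sorted(epoch, reverse=True):
    print(interval, epoch[interval])
```

In interval 1 the root emits the final count for the epoch, and the whole schedule then repeats for the next epoch.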
30
Interval Assignment: An Approach
SELECT COUNT(*)…
4 intervals / epoch
Interval # = Level
[Figure: five-node tree with levels 1 to 4 beside a timeline of communication intervals 4, 3, 2, 1 in each epoch; each node is marked L and T (presumably its listen and transmit slots) around its own interval and Z (asleep) for the rest of the epoch]
Pipelining: Increase throughput by delaying result arrival until a later epoch
• CSMA for collision avoidance
• Time intervals for power conservation
• Many variations (e.g. Yao & Gehrke, CIDR 2003)
• Time Sync (e.g. Elson & Estrin OSDI 2002)
Madden, Szewczyk, Franklin, Culler. Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks. WMCSA 2002.
31
Aggregation Framework
• As in extensible databases, we support any aggregation function conforming to:
Agg_n = {f_init, f_merge, f_evaluate}
f_init{a0} → <a0>
f_merge{<a1>, <a2>} → <a12>
f_evaluate{<a1>} → aggregate value
Each <a> is a Partial State Record (PSR)
Example: Average
AVG_init{v} → <v, 1>
AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVG_evaluate{<S, C>} → S/C
Restriction: Merge associative, commutative
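The three-function framework for AVERAGE translates almost directly into Python. This is a minimal sketch: the function names and the four readings are made up for illustration, and a PSR is represented as a plain (sum, count) tuple. It also shows why the restriction matters: because merge is associative and commutative, the result does not depend on the shape of the routing tree or on the order in which children report.

```python
from functools import reduce

def avg_init(v):
    """f_init: turn one reading into a partial state record <sum, count>."""
    return (v, 1)

def avg_merge(a, b):
    """f_merge: combine two PSRs into one."""
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(psr):
    """f_evaluate: compute the final aggregate from the root's PSR."""
    s, c = psr
    return s / c

readings = [10, 20, 30, 40]          # hypothetical sensor readings
psrs = [avg_init(v) for v in readings]

# Merge as a balanced tree: (r0 + r1) and (r2 + r3) meet at the root.
tree_root = avg_merge(avg_merge(psrs[0], psrs[1]),
                      avg_merge(psrs[2], psrs[3]))

# Merge as a chain in the opposite order.
chain_root = reduce(avg_merge, reversed(psrs))

print(tree_root, avg_evaluate(tree_root))
```

Averaging the children's averages directly would not have this property, which is why the PSR carries the sum and the count rather than the average itself.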
32
Types of Aggregates
SQL supports MIN, MAX, SUM, COUNT, AVERAGE
Any function over a set can be computed via TAG
In network benefit for many operations
• E.g. Standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
• Compactness of PSR
34
Simulation Environment
Evaluated TAG via simulation
Coarse grained event based simulator
• Sensors arranged on a grid
• Two communication models
  • Lossless: All neighbors hear all messages
  • Lossy: Messages lost with probability that increases with distance
Communication (message counts) as performance metric
35
Taxonomy of Aggregates
TAG insight: classify aggregates according to various functional properties
• Yields a general set of optimizations that can automatically be applied
Properties
• Partial State
• Monotonicity
• Exemplary vs. Summary
• Duplicate Sensitivity
Drives an API!
36
Partial State
Growth of PSR vs. number of aggregated values (n)
• Distributive: |PSR| = 1 (e.g. MIN)
• Algebraic: |PSR| = c (e.g. AVG)
• Holistic: |PSR| = n (e.g. MEDIAN)
• Unique: |PSR| = d (e.g. COUNT DISTINCT)
  • d = # of distinct values
• Content Sensitive: |PSR| < n (e.g. HISTOGRAM)
(Classification after “Data Cube”, Gray et al.)

Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
37
Benefit of In-Network Processing
Simulation Results
• 2500 Nodes
• 50x50 Grid
• Depth = ~10
• Neighbors = ~20
• Uniform Dist.
[Figure: bar chart “Total Bytes Xmitted vs. Aggregation Function”; x-axis: Aggregation Function (EXTERNAL, MAX, AVERAGE, DISTINCT, MEDIAN); y-axis: Total Bytes Xmitted, 0 to 100000; bars annotated with the classes Holistic, Unique, Distributive, Algebraic]
• Aggregate & depth dependent benefit!
38
Monotonicity & Exemplary vs. Summary
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
39
Channel Sharing (“Snooping”)
Insight: Shared channel can reduce communication
Suppress messages that won’t affect aggregate
• E.g., MAX
• Applies to all exemplary, monotonic aggregates
Only snoop in listen/transmit slots
• Future work: explore snooping/listening tradeoffs
40
Hypothesis Testing
Insight: Guess from root can be used for suppression
• E.g. ‘MIN < 50’
• Works for monotonic & exemplary aggregates
  • Also summary, if imprecision allowed
How is hypothesis computed?
• Blind or statistically informed guess
• Observation over network subset
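A minimal Python sketch of the ‘MIN < 50’ idea follows. It is a flat model under stated assumptions: every node is treated as reporting directly to the root, in-network merging is ignored, and when the guess turns out to be too optimistic the root is assumed to fall back to collecting every reading. The readings and the function name are hypothetical.

```python
def min_with_hypothesis(readings, guess):
    """Return (MIN, number of node reports) when the root announces the
    hypothesis 'MIN < guess' and nodes with reading >= guess stay silent."""
    reports = [v for v in readings if v < guess]
    if reports:
        return min(reports), len(reports)
    # Nobody answered, so the hypothesis was wrong. Assumed fallback:
    # ask again without a guess and let every node report.
    return min(readings), len(readings)

readings = [72, 41, 88, 55, 63]      # hypothetical sensor values

good_guess = min_with_hypothesis(readings, guess=50)
bad_guess = min_with_hypothesis(readings, guess=30)
print(good_guess)   # only nodes below 50 had to transmit
print(bad_guess)    # guess too low: fall back to collecting everything
```

Either way the answer is the true minimum; the guess only changes how many nodes have to transmit, which is why a statistically informed guess pays off.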
41
Experiment: Snooping vs. Hypothesis Testing
• Uniform Value Distribution
• Dense Packing
• Ideal Communication
[Figure: line chart “Messages / Epoch vs. Network Diameter” for SELECT MAX(attr), R(attr) = [0,100]; x-axis: Network Diameter (10 to 50); y-axis: Messages / Epoch (0 to 3000); series: No Guess, Guess = 50, Guess = 90, Snooping; annotations: Pruning in Network, Pruning at Leaves]
42
Duplicate Sensitivity
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
43
Use Multiple Parents
Use graph structure
• Increase delivery probability with no communication overhead
For duplicate insensitive aggregates, or aggs expressible as sum of parts
• Send (part of) aggregate to all parents
  • In just one message, via multicast
• Assuming independence, decreases variance
Example: SELECT COUNT(*), where node A holds partial count c and reaches the root R through parents B and C
P(link xmit successful) = p
P(success from A->R) = p^2
Single parent:
E(cnt) = c * p^2
Var(cnt) = c^2 * p^2 * (1 - p^2) = V
# of parents = n, each sent c/n:
E(cnt) = n * (c/n * p^2)
Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n
[Figure: A sends c to a single parent, versus c/n to each of B and C (n = 2), which forward to R]
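The expectation and variance formulas above can be checked numerically. The Python sketch below evaluates them and compares the expected count with a small Monte Carlo run. The values c = 10 and p = 0.9 are arbitrary, and the simulation makes the same independence assumption as the slide: each two-hop path A -> parent -> R succeeds with probability p^2, independently of the others.

```python
import random

def split_count_stats(c, p, n):
    """Mean and variance of the count that reaches the root when a partial
    count c is split evenly across n parents."""
    q = p * p                        # two-hop path success probability
    share = c / n
    mean = n * share * q
    variance = n * share ** 2 * q * (1 - q)
    return mean, variance

def simulate_mean(c, p, n, trials=20000, seed=0):
    """Monte Carlo estimate of the expected count at the root."""
    rng = random.Random(seed)
    q = p * p
    total = 0.0
    for _ in range(trials):
        total += sum(c / n for _ in range(n) if rng.random() < q)
    return total / trials

mean_1, var_1 = split_count_stats(c=10, p=0.9, n=1)
mean_2, var_2 = split_count_stats(c=10, p=0.9, n=2)
print(mean_1, var_1)
print(mean_2, var_2)
print(simulate_mean(c=10, p=0.9, n=2))
```

Splitting leaves the expected count unchanged and divides the variance by n, which is the V/n result on the slide.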
44
Multiple Parents Results
Better than previous analysis expected!
Losses aren’t independent!
Insight: spreads data over many links
[Figure: bar chart “Benefit of Result Splitting (COUNT query)” (2500 nodes, lossy radio model, 6 parents per node); y-axis: Avg. COUNT, 0 to 1400; bars: No Splitting, With Splitting; annotation on the no-splitting case: Critical Link!]
45
Taxonomy Related Insights
Communication Reducing
• In-network Aggregation (Partial State)
• Hypothesis Testing (Exemplary & Monotonic)
• Snooping (Exemplary & Monotonic)
• Sampling
Quality Increasing
• Multiple Parents (Duplicate Insensitive)
• Child Cache
46
TAG Contributions
Simple but powerful data collection language
• Vehicle tracking:
  SELECT ONEMAX(mag,nodeid)
  EPOCH DURATION 50ms
Distributed algorithm for in-network aggregation
• Communication Reducing
• Power Aware
  • Integration of sleeping, computation
• Predicate-based grouping
Taxonomy driven API
• Enables transparent application of techniques to
  • Improve quality (parent splitting)
  • Reduce communication (snooping, hypo. testing)
48
Acquisitional Query Processing (ACQP)
Closed world assumption does not hold
• Could generate an infinite number of samples
An acquisitional query processor controls
• when,
• where,
• and with what frequency data is collected!
Versus traditional systems where data is provided a priori
Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor for Sensor Networks. SIGMOD, 2003
49
ACQP: What’s Different?
How should the query be processed?
• Sampling as a first class operation
• Event – join duality
How does the user control acquisition?
• Rates or lifetimes
• Event-based triggers
Which nodes have relevant data?
• Index-like data structures
Which samples should be transmitted?
• Prioritization, summary, and rate control
50
Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s
• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
[Figure: three query plans. Traditional DBMS: sample mag and light together, then apply pred1 and pred2. ACQP: two plans that interleave sampling and selection, one sampling mag first and one sampling light first; sampling light is marked Cheap and sampling mag Costly.]
Correct ordering (unless pred1 is very selective and pred2 is not): sample light and apply pred2 first, then sample mag and apply pred1
At 1 sample / sec, total power savings could be as much as 3.5mW. Comparable to processor!
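The ordering argument can be made concrete with a small expected-energy calculation in Python. The two sampling costs come from the slide; the selectivities (the fraction of tuples that pass each predicate) are hypothetical values chosen for illustration, and the cost of evaluating the predicates themselves is assumed to be negligible.

```python
E_MAG_UJ = 1500.0    # energy to sample the magnetometer (from the slide)
E_LIGHT_UJ = 90.0    # energy to sample the light sensor (from the slide)

def sample_both_first():
    """Traditional plan: acquire both attributes, then filter."""
    return E_MAG_UJ + E_LIGHT_UJ

def light_first(sel_light):
    """Sample light, apply pred2; sample mag only if pred2 passed."""
    return E_LIGHT_UJ + sel_light * E_MAG_UJ

def mag_first(sel_mag):
    """Sample mag, apply pred1; sample light only if pred1 passed."""
    return E_MAG_UJ + sel_mag * E_LIGHT_UJ

# Hypothetical selectivities: half of the tuples pass each predicate.
print(sample_both_first())        # uJ per epoch, traditional plan
print(light_first(sel_light=0.5)) # uJ per epoch, cheap sensor first
print(mag_first(sel_mag=0.5))     # uJ per epoch, costly sensor first

# The exception from the slide: pred1 rejects everything, pred2 nothing.
print(mag_first(sel_mag=0.0) < light_first(sel_light=1.0))
```

With these numbers the light-first plan costs roughly half as much as sampling both sensors up front, and sampling mag first only wins in the corner case where pred1 is extremely selective and pred2 is not.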
51
Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s
• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
[Figure: two query plans. Traditional DBMS: sample mag and light, apply (mag > x), then WINMAX. ACQP: sample light and apply (light > MAX) first, then sample mag and apply (mag > x), then WINMAX.]
52
Lifetime Queries
Lifetime vs. sample rate
SELECT …
EPOCH DURATION 10 s

SELECT …
LIFETIME 30 days
Extra: Allow a MAX SAMPLE PERIOD
• Discard some samples
• Sampling cheaper than transmitting
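A lifetime clause has to be turned into a sample rate somewhere. The Python sketch below shows one simple way to do that under a flat energy model; it is not TinyDB's actual estimator, and the battery budget, per-sample energy, and idle power are hypothetical numbers chosen for illustration.

```python
def sample_period_s(remaining_energy_j, lifetime_s,
                    energy_per_sample_j, idle_power_w=0.0):
    """Longest-lived schedule under a flat energy model: spend whatever the
    idle drain leaves over on evenly spaced samples (acquire + transmit)."""
    budget_j = remaining_energy_j - idle_power_w * lifetime_s
    if budget_j <= 0:
        raise ValueError("lifetime goal not reachable even without sampling")
    samples = budget_j / energy_per_sample_j
    return lifetime_s / samples

THIRTY_DAYS_S = 30 * 24 * 3600

# Hypothetical node: 10 kJ left, 50 mJ per sample-and-send.
period = sample_period_s(10_000.0, THIRTY_DAYS_S, 0.05)
period_with_idle = sample_period_s(10_000.0, THIRTY_DAYS_S, 0.05,
                                   idle_power_w=0.001)
print(period)            # seconds between samples for LIFETIME 30 days
print(period_with_idle)  # longer period once idle drain is accounted for
```

If the resulting period is longer than the MAX SAMPLE PERIOD the user allows, the slide's suggestion is to keep sampling at the faster rate and discard some samples instead of transmitting them, since sampling is cheaper than transmitting.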
53
(Single Node) Lifetime Prediction
[Figure: “Voltage vs. Time, Measured vs. Expected”; lifetime goal = 24 weeks (4032 hours, 15 s / sample); x-axis: Time (Hours), 0 to 4000; y-axis: Voltage (Raw Units), 300 to 1000; series: Voltage (Expected), Voltage (Measured), Linear Fit with R^2 = 0.8455; inset: Expected vs. Measured over the first 300 hours (950 to 1030 raw units); annotation: Insufficient Voltage to Operate (V = 350)]
55
Sensor Network Challenge Problems
Temporal aggregates
Sophisticated, sensor network specific aggregates
• Isobar Finding
• Vehicle Tracking
• Lossy compression
  • Wavelets
Hellerstein, Hong, Madden, and Stanek. Beyond Average. IPSN 2003
[Figure: “Isobar Finding”]
56
TinyDB Deployments
Initial efforts:
• Network monitoring
• Vehicle tracking
Ongoing deployments:
• Environmental monitoring
• Generic Sensor Kit
• Building Monitoring
• Golden Gate Bridge
57
Data Storage
• Recently introduced
• Larger capacity, larger battery power
• Ordinary sensors send their data to it
• It replies to queries
(Sheng et al., ACM MobiHoc 2006)
58
Problems
Data Storage Placement
• (Sheng et al. paper)
Data Storage Selection
• Our method: an adaptive and decentralized method
59
Costs in the system
60
Overall cost
61
Our method
62
Our method (Cont.)
63
Our results
[Figure: cost plotted against node id; x-axis: id (1 to 248); y-axis: cost (0 to 2500)]
Very good!
64
References
Book:
• F. Zhao and L. Guibas. Wireless Sensor Networks: An Information Processing Approach. Elsevier, 2004.
Papers:
• [1] Bo Sheng, Qun Li, and Weizhen Mao. Data Storage Placement in Sensor Networks. ACM MobiHoc 2006, Florence, Italy, May 22-25, 2006.
• [2] B. Bonfils and P. Bonnet. Adaptive and Decentralized Operator Placement for In-Network Query Processing. Springer-Verlag, 2003.
65
References (Cont.)
• [3] S. Bhattacharya, H. Kim, S. Prabh, and T. Abdelzaher. Energy-conserving data placement and asynchronous multicast in wireless sensor networks. In Proceedings of the 1st international conference on Mobile systems, applications and services, pages 173–185, New York, NY, USA, 2003. ACM Press.
• [4] H. S. Kim, T. F. Abdelzaher, and W. H. Kwon. Minimum-energy asynchronous dissemination to mobile sinks in wireless sensor networks. In Proceedings of the 1st international conference on Embedded networked sensor systems, pages 193–204, New York, NY, USA, 2003. ACM Press.
• [5] A. Trigoni, Y. Yao, A. Demers, J. Gehrke, and R. Rajaraman. Multi-Query Optimization for Sensor Networks. In the International Conference on Distributed Computing in Sensor Systems (DCOSS), 2005.
66
References (Cont.)
• [6] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. The Design of an Acquisitional Query Processor for Sensor Networks. Proc. Int. Conf. on Management of Data (SIGMOD), San Diego, USA, 2003.
• [7] P. Bonnet, J. Gehrke, and P. Seshadri. Towards Sensor Database Systems. Lecture Notes in Computer Science, Springer-Verlag, 2001.