Slide 1
Approximate Data Collection in Sensor Networks using Probabilistic Models (ICDE 2006)
David Chu (UC Berkeley), Amol Deshpande (University of Maryland), Joseph M. Hellerstein (UC Berkeley / Intel Research Berkeley), Wei Hong (Arched Rock Corp.)
Presenter: klhsueh, 09.11.03
Slide 2
Outline
- Introduction
- Ken architecture
- Replicated Dynamic Probabilistic Model
- Choosing the Prediction Model
- Evaluation
- Conclusion
Slide 3
Introduction: sensor sources collect sensing data and report it to a sink (base station); the key idea is a pair of prediction models, one at the source and one at the sink, kept in sync.
Slide 4
Outline
- Introduction
- Ken architecture
- Replicated Dynamic Probabilistic Model
- Choosing the Prediction Model
- Evaluation
- Conclusion
Slide 5
Ken Operation (overview): at each time step, the source asks whether the expected values (predictions) are accurate enough. If not, it finds the attributes that are most useful to the prediction and transmits their values to the sink.
Slide 6
Ken Operation (source, at time t)
1. Compute the probability distribution function (pdf) over the current attribute values.
2. Compute the expected value of each attribute according to the pdf.
3. If every expected value is within the error bound of the true reading, stop (send nothing).
4. Otherwise:
   a. Find the smallest subset of attributes X' such that, after conditioning the pdf on their observed values, the expected values are accurate enough.
   b. Send the values of the attributes in X' to the sink.
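Step 4(a) can be sketched as a brute-force search over subsets, smallest first. This is an illustrative stand-in, not the paper's algorithm: the `smallest_subset` helper and its naive conditioning (transmitted attributes become exactly known, the rest keep their predicted values) are assumptions for the sketch.

```python
# Naive sketch of step 4(a): smallest subset whose transmission makes
# every attribute's estimate fall within the error bound `epsilon`.
from itertools import combinations

def smallest_subset(true_vals, predicted, epsilon):
    attrs = list(true_vals)
    for size in range(len(attrs) + 1):          # try smallest subsets first
        for subset in combinations(attrs, size):
            # after sending `subset`, those attributes are known exactly;
            # the others keep the model's predicted values
            est = {a: (true_vals[a] if a in subset else predicted[a])
                   for a in attrs}
            if all(abs(est[a] - true_vals[a]) <= epsilon for a in attrs):
                return set(subset)

chosen = smallest_subset({"t": 21.0, "h": 45.0},   # true readings
                         {"t": 20.8, "h": 40.0},   # model predictions
                         0.5)                      # error bound
# -> {"h"}: sending humidity alone suffices; temperature is already close
```

The real algorithm conditions the full joint pdf, so transmitting one attribute can also correct the estimates of correlated attributes; this sketch ignores that effect.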
Slide 7
Ken Operation (sink, at time t)
1. Compute the probability distribution function (pdf) p over the current attribute values, using the same replicated model as the source.
2. If the sink received from the source the values of the attributes in X', condition p on these values, exactly as the source does in step 4(a) above.
3. Compute the expected values of the attributes and use them as the approximation to the true values.
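Putting the source and sink steps together, a minimal sketch of the protocol with a deliberately trivial "constant" prediction model (the paper's dynamic probabilistic models are richer; all names and the `EPSILON` bound here are illustrative):

```python
EPSILON = 0.5  # per-attribute error bound (illustrative value)

def source_step(true_values, model_state):
    """Source at time t: transmit only attributes whose prediction
    misses the error bound; returns the (possibly empty) update."""
    update = {}
    for attr, value in true_values.items():
        predicted = model_state[attr]          # constant model: last sent value
        if abs(predicted - value) > EPSILON:   # prediction not accurate enough
            update[attr] = value               # this attribute must be sent
    model_state.update(update)                 # keep the replica in sync
    return update

def sink_step(update, model_state):
    """Sink at time t: condition the replicated model on any received
    values, then use expected values as the approximation."""
    model_state.update(update)
    return dict(model_state)                   # approximation to true values

# Both sides start from the same initial state, so the replicas agree.
source_model = {"temp": 20.0, "hum": 40.0}
sink_model = {"temp": 20.0, "hum": 40.0}

update = source_step({"temp": 20.2, "hum": 43.0}, source_model)
approx = sink_step(update, sink_model)
# "temp" stays within the bound and is not transmitted; "hum" is sent.
```

Because the constant model treats attributes independently, the smallest-subset search of step 4(a) degenerates here to a per-attribute threshold check; richer models make that search nontrivial.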
Slide 8
Outline
- Introduction
- Ken architecture
- Replicated Dynamic Probabilistic Model
- Choosing the Prediction Model
- Evaluation
- Conclusion
Slide 9
Replicated Dynamic Probabilistic Model
Ex1: a very simple prediction model, assuming that the data value remains constant over time.
Ex2: a linear prediction model.
These models utilize temporal correlations but ignore spatial correlations. To consider both kinds of correlation, Ken uses a dynamic probabilistic model.
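The two example predictors can be sketched as one-liners. Both are illustrative forms (the slide does not give the exact equations): Ex1 repeats the last reading, and Ex2 here extrapolates the last observed trend; note that neither looks at any other sensor's readings.

```python
def constant_predict(history):
    # Ex1: assume the value stays at the last observed reading
    return history[-1]

def linear_predict(history):
    # Ex2 (one possible linear form): extrapolate the last trend one step
    return history[-1] + (history[-1] - history[-2])

readings = [10.0, 12.0, 14.0]
constant_predict(readings)  # -> 14.0
linear_predict(readings)    # -> 16.0
```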
Slide 10
Dynamic Probabilistic Model: specified by (i) a probability distribution function (pdf) for the initial state and (ii) a transition model, from which the pdf at time t+1 is computed.
Replicated Dynamic Probabilistic Model: both the source and the sink maintain an identical copy of this model, conditioned on the observations communicated to the sink.
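In symbols (notation assumed here, following the standard filtering formulation rather than the slide's exact typesetting), the pdf at time t+1 is obtained by pushing the current pdf through the transition model:

```latex
p(X_{t+1} \mid o_{1..t}) \;=\; \int p(X_{t+1} \mid X_t)\, p(X_t \mid o_{1..t})\, \mathrm{d}X_t
```

where $p(X_1)$ is the initial-state pdf, $p(X_{t+1} \mid X_t)$ is the transition model, and $o_{1..t}$ denotes the observations communicated to the sink up to time $t$.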
Slide 11
Replicated Dynamic Probabilistic Model
Ex3: a 2-dimensional linear Gaussian model. Without new observations, the computed expected values may not be accurate. Because of the spatial correlation between the two attributes, we only have to communicate one value to the sink: conditioning on it corrects the expectation of the other attribute as well.
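The effect in Ex3 comes from conditioning a bivariate Gaussian: knowing one attribute exactly shifts the expectation of the other through their covariance. A minimal sketch (the parameters below are made up for illustration, not from the paper):

```python
# Bivariate Gaussian prior over two spatially correlated sensors.
mu1, mu2 = 20.0, 20.0      # prior means
var1, var2 = 4.0, 4.0      # prior variances
cov12 = 3.0                # covariance (strong spatial correlation)

def condition_on_x2(x2_observed):
    """Conditional mean E[x1 | x2] = mu1 + (cov12 / var2) * (x2 - mu2)."""
    return mu1 + (cov12 / var2) * (x2_observed - mu2)

# Unconditioned, the sink's prediction for x1 is just mu1 = 20.0, which
# is "not accurate" if the true values drifted. After the source sends
# only x2 = 25.0, the sink corrects x1 without ever receiving it:
estimate_x1 = condition_on_x2(25.0)   # 20.0 + 0.75 * 5.0 = 23.75
```

One transmitted value thus improves both estimates, which is why spatial correlation reduces communication.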
Slide 12
Outline
- Introduction
- Ken architecture
- Replicated Dynamic Probabilistic Model
- Choosing the Prediction Model
- Evaluation
- Conclusion
Slide 13
Choosing the Prediction Model
Total communication cost:
- intra-source: checking whether the prediction is accurate.
- source-sink: sending a set of values to the sink.
Slide 14
Choosing the Prediction Model
Ex3: Disjoint-Cliques Model, which reduces intra-source cost while still utilizing spatial correlations between attributes.
- An exhaustive algorithm finds the optimal solution.
- A greedy heuristic algorithm approximates it at lower cost.
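The greedy heuristic can be sketched as repeatedly merging the pair of cliques whose merge looks most beneficial, subject to a maximum clique size. The gain function below (sum of pairwise correlations) is a stand-in for the paper's actual communication-cost model, and all names are illustrative:

```python
from itertools import combinations

def greedy_cliques(attrs, correlation, max_size=2):
    """Greedily partition `attrs` into disjoint cliques of bounded size."""
    cliques = [{a} for a in attrs]             # start: every attribute alone
    while True:
        best, best_gain = None, 0.0
        for c1, c2 in combinations(cliques, 2):
            if len(c1) + len(c2) > max_size:   # respect the size bound k
                continue
            # stand-in benefit of modeling c1 and c2 jointly
            gain = sum(correlation[a][b] for a in c1 for b in c2)
            if gain > best_gain:
                best, best_gain = (c1, c2), gain
        if best is None:                       # no beneficial merge left
            return cliques
        c1, c2 = best
        cliques.remove(c1)
        cliques.remove(c2)
        cliques.append(c1 | c2)                # merge into one clique

corr = {"a": {"b": 0.9, "c": 0.1}, "b": {"a": 0.9, "c": 0.2},
        "c": {"a": 0.1, "b": 0.2}}
result = greedy_cliques(["a", "b", "c"], corr, max_size=2)
# merges the highly correlated pair {"a", "b"} and leaves {"c"} alone
```

Bounding the clique size caps the intra-source cost, since only attributes within one clique need to communicate to check prediction accuracy.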
Slide 15
Choosing the Prediction Model
Ex4: Average Model.
Slide 16
Outline
- Introduction
- Ken architecture
- Replicated Dynamic Probabilistic Model
- Choosing the Prediction Model
- Evaluation
- Conclusion
Slide 17
Evaluation: setup
Real-world sensor network data:
- Lab: Intel Research Lab in Berkeley, consisting of 49 mica2 motes.
- Garden: UC Berkeley Botanical Gardens, consisting of 11 mica2 motes.
Three attributes (temperature, humidity, voltage), modeled with time-varying multivariate Gaussians.
Model parameters were estimated using the first 100 hours of data (training data); traces from the next 5000 hours (test data) were used for evaluating Ken.
Error bounds: 0.5 °C for temperature, 2% for humidity, and 0.1 V for battery voltage.
Slide 18
Evaluation (figure)
Slide 19
Evaluation: comparison schemes
- TinyDB: always reports all sensor values to the base station.
- Approximate Caching (ApC): caches the last reported reading at both the sink and the source; a source does not report if the cached reading is within the threshold of the current reading.
- Ken with the Disjoint-Cliques (DjC) and Average (Avg) models; the Greedy-k heuristic algorithm finds the Disjoint-Cliques model with maximum clique size k (DjCk).
Slide 20
Evaluation: results
- Ken and ApC both achieve significant savings over TinyDB.
- Average reports at a higher rate than Disjoint-Cliques with max clique size restricted to 2 (DjC2): capturing and modeling temporal correlations alone may not be sufficient to outperform caching.
- Utilizing spatial correlations yields more data reduction on the Garden dataset (21% vs. 36%).
Slide 21
Evaluation: Disjoint-Cliques Models (figure)
Slide 22
Evaluation: quantifying the merit of various clique sizes. A physical deployment may not have sufficiently strong spatial correlations to benefit from larger cliques.
Slide 23
Evaluation
The base station resides at the east end of the network; the areas closer to the base station do not benefit from larger cliques.
Slide 24
Evaluation (figure)
Slide 25
Conclusion
We propose a robust approximate data collection technique called Ken that uses replicated dynamic probabilistic models to minimize communication from sensor nodes to the network's PC base station.