University of California
Los Angeles
A Self-Calibrating System
of Distributed Acoustic Arrays
A dissertation submitted in partial satisfaction
of the requirements for the degree
Doctor of Philosophy in Computer Science
by
Lewis David Girod
2005
The dissertation of Lewis David Girod is approved.
Stefano Soatto
Gregory J. Pottie
Miodrag Potkonjak
Deborah L. Estrin, Committee Chair
University of California, Los Angeles
2005
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Contributions of this Work . . . . . . . . . . . . . . . . . . . . . . 3
1.2 How to avoid reading this document . . . . . . . . . . . . . . . . 6
I The Array Calibration Problem 8
2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Requirements of Acoustic Detection Applications . . . . . . . . . 9
2.2 Definition of the Calibration Problem . . . . . . . . . . . . . . . . 11
2.3 Outline of Proposed Solution . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Time of Flight Ranging Layer . . . . . . . . . . . . . . . . 13
2.3.2 Multilateration Layer . . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Properties of the Acoustic Array Hardware . . . . . . . . . . . . . 15
2.5 Comparison to Related Systems . . . . . . . . . . . . . . . . . . . 16
2.5.1 RF Localization Techniques . . . . . . . . . . . . . . . . . 17
2.5.2 Laser–based Localization . . . . . . . . . . . . . . . . . . . 19
2.5.3 Ultrasound Acoustic Localization . . . . . . . . . . . . . . 19
2.5.4 Orientation Discovery . . . . . . . . . . . . . . . . . . . . 20
2.5.5 Alternatives to the use of Sensor Arrays . . . . . . . . . . 21
3 Estimation of Range and DOA . . . . . . . . . . . . . . . . . . . . 23
3.1 Filtering and Correlation . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.1 Code Generation and Modulation . . . . . . . . . . . . . . 28
3.1.2 Input Extraction . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Detection and Extraction . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Noise Estimation and Peak Detection . . . . . . . . . . . . 32
3.2.2 Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Interpolation and Normalization . . . . . . . . . . . . . . . 41
3.3 DOA Estimation and Combining . . . . . . . . . . . . . . . . . . 42
3.3.1 Lag Finding . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.2 DOA Estimation . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.3 Alternative Approaches to DOA Estimation . . . . . . . . 48
3.3.4 Recombination . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.5 Peak Detection . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Environmental Effects . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Multilateration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1 Overview and Context . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Prefiltering and Initial Estimation . . . . . . . . . . . . . . . . . . 57
4.3 Two Solutions to the Position Estimation Problem . . . . . . . . 59
4.3.1 R–θ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 Iterative Non–Linear Least–Squares Minimization . . . . . 62
4.4 Interleaved Orientation Estimation . . . . . . . . . . . . . . . . . 69
4.5 Outlier Rejection Using Studentized Residuals . . . . . . . . . . . 71
4.6 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.7 System Considerations . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.1 What Can Go Wrong? . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Strategies for Robustness . . . . . . . . . . . . . . . . . . . . . . . 81
5.3 Successfully Managing Complexity . . . . . . . . . . . . . . . . . 84
II The Acoustic Sensing Platform 85
6 Emstar: a Software Framework . . . . . . . . . . . . . . . . . . . . 86
6.1 Design Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.1.1 Inter–node communication is not usually transparent. . . . 87
6.1.2 The system within a node is complex and benefits from
distributed system design principles. . . . . . . . . . . . . 88
6.2 How Emstar Works . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2.1 Layer 0: FUSD Syscall Inter–process RPC . . . . . . . . . 91
6.2.2 Layer 1: GLib Event System . . . . . . . . . . . . . . . . . 101
6.2.3 Layer 2: Emstar Device Patterns and Libraries . . . . . . . 103
6.2.4 Layer 3: Emstar Components and Services . . . . . . . . . 119
6.2.5 Layer 4: Additional Tools and Environment . . . . . . . . 125
7 A Synchronized Distributed Sampling Layer . . . . . . . . . . . 133
7.1 A Buffered Acoustic Sensor Interface . . . . . . . . . . . . . . . . 134
7.1.1 Continuous Sampling and Buffering . . . . . . . . . . . . . 135
7.1.2 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1.3 Multi–Client Interface . . . . . . . . . . . . . . . . . . . . 138
7.2 An Integrated Time Synchronization Service . . . . . . . . . . . . 139
7.2.1 Conversion–Based Time Synchronization . . . . . . . . . . 140
7.2.2 The Timesync API and Time Conversion Graph . . . . . . 142
7.2.3 RBS vs. MAC Layer Timestamps . . . . . . . . . . . . . . 144
7.3 Hop–by–Hop Time Conversion . . . . . . . . . . . . . . . . . . . . 149
8 Multihop Wireless Layer . . . . . . . . . . . . . . . . . . . . . . . . 151
8.1 How Wireless is Different . . . . . . . . . . . . . . . . . . . . . . . 151
8.2 The StateSync Abstraction . . . . . . . . . . . . . . . . . . . . . . 153
8.2.1 Application Requirements . . . . . . . . . . . . . . . . . . 154
8.2.2 The StateSync Abstraction . . . . . . . . . . . . . . . . . . 155
8.2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.3 Variants of StateSync . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.3.1 SoftState . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.3.2 LogFlood . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.3.3 LogTree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Benchmarking StateSync . . . . . . . . . . . . . . . . . . . . . . . 169
8.4.1 Metrics and Experimental Setup . . . . . . . . . . . . . . . 169
8.4.2 Benchmark Tests . . . . . . . . . . . . . . . . . . . . . . . 169
8.4.3 Determining Application Suitability . . . . . . . . . . . . . 174
8.5 Applying StateSync to Position Estimation . . . . . . . . . . . . . 175
8.5.1 Applying the StateSync Model . . . . . . . . . . . . . . . . 176
8.5.2 StateSync Simplifies the System Design . . . . . . . . . . . 177
8.6 Performance of StateSync for Position Estimation . . . . . . . . . 179
8.7 Enabling System Visibility Using LogFlood . . . . . . . . . . . . . 182
III Experimental Results 184
9 Range and DOA Estimation Testing . . . . . . . . . . . . . . . . 185
9.1 DOA Component Testing . . . . . . . . . . . . . . . . . . . . . . 186
9.1.1 Azimuth Performance . . . . . . . . . . . . . . . . . . . . 188
9.1.2 Zenith Performance . . . . . . . . . . . . . . . . . . . . . . 191
9.2 Range Component Testing . . . . . . . . . . . . . . . . . . . . . . 195
10 System Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
10.1 Urban Outdoor Test: Court of Sciences . . . . . . . . . . . . . . . 206
10.1.1 Measurement of Ground Truth . . . . . . . . . . . . . . . 210
10.1.2 Selecting the Residual Cutoff . . . . . . . . . . . . . . . . 211
10.1.3 Comparison of R–θ and NLLS . . . . . . . . . . . . . . . . 214
10.1.4 Map Scaling With Temperature . . . . . . . . . . . . . . . 218
10.1.5 Repeatability . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.2 Forest Outdoor Test: James Reserve . . . . . . . . . . . . . . . . 223
10.3 Analysis of Symmetric Ranges . . . . . . . . . . . . . . . . . . . . 231
11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . 235
11.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
11.2.1 Practicality and Cost of the System . . . . . . . . . . . . . 238
11.2.2 Scaling Properties and Applicability Across Environments 240
11.2.3 Ideas and Components to Carry Forward . . . . . . . . . . 240
12 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
12.1 Algorithm Improvements . . . . . . . . . . . . . . . . . . . . . . . 245
12.2 Platform Improvements . . . . . . . . . . . . . . . . . . . . . . . . 248
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
List of Figures
2.1 Photo of a node deployed at the James Reserve [Ham00], and a
diagram of a proposed distributed acoustic sensing application to
localize acorn woodpeckers. . . . . . . . . . . . . . . . . . . . . . 10
2.2 Block diagram of the self–calibration system. . . . . . . . . . . . 12
2.3 Photograph of an acoustic array, and a diagram of the array ge-
ometry. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 Block diagram of the ranging detection algorithm. . . . . . . . . 24
3.2 The Filtering and Correlation stage of the ranging detection al-
gorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Modulation for seed=7, encoding the sequence 0111011110. . . . 25
3.4 (a) The power spectral density (PSD) function for the exact ref-
erence signal. (b) The PSD of the reference signal as recorded at
the source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 (a) The PSD of the input signal received at Node 101, 80 meters
from the source. (b) The correlation of the signal above, expressed
in the time domain. The correlation peak is 7dB above the noise
floor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6 SNR for PN code detection as a function of sample rate skew, and
the observed sample rate skew in 100K outdoor trials. . . . . . . 29
3.7 Analyzing the distribution of autocorrelation noise. . . . . . . . . 33
3.8 (a) Distribution of noise for the correlation shown in Figure 3.5.
(b) Distribution of peak correlation values for a 100K trial outdoor
test. For each successful trial, the largest noise peak and the
detection peak are included. . . . . . . . . . . . . . . . . . . . . . 34
3.9 The Detection and Extraction stage of the ranging detection al-
gorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.10 The DOA Estimation and Combining stage of the ranging detec-
tion algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.11 Plot of the DOA objective function observed for a test with φ =
0, θ = 281. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.12 Combined signals for the trial from Figure 3.5. The two curves
show the effect of recombination using the straight DOA estimate
and our heuristic. . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1 Algorithm for determining an initial parameter estimate. . . . . . 58
6.1 The five layers of the Emstar framework. . . . . . . . . . . . . . 90
6.2 Message timing diagram of a FUSD call. The middle column of
the diagram represents the FUSD kernel module. . . . . . . . . . 94
6.3 A dependency loop, and using a broker service to break the loop. 96
6.4 Diagram showing how to use a thread and a queue to break a
FUSD dependency loop. . . . . . . . . . . . . . . . . . . . . . . . 97
6.5 The FUSD file operations structure. . . . . . . . . . . . . . . . . 98
6.6 Throughput comparison of FUSD and in–kernel implementations
of /dev/zero, timing a read of 1GB of data on a 2.8 GHz Xeon,
for both 2.4 and 2.6 kernels. . . . . . . . . . . . . . . . . . . . . . 100
6.7 The Emstar event system API. . . . . . . . . . . . . . . . . . . . 102
6.8 Setting a timer in the Emstar event system. . . . . . . . . . . . . 103
6.9 Block diagram of the Status Device pattern. The functions bi-
nary(), printable(), and write() are callbacks defined by the server,
while status_notify() is called by the server to notify the client of
a state change. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.10 A snippet of code that creates a Status Device. . . . . . . . . . . 108
6.11 Block diagram of the Packet Device pattern. The functions send()
and filter() are callbacks defined by the server, while pd_receive()
and pd_unblock() are functions called by the server. . . . . . . . . 109
6.12 Snippet of code that creates a Command Device. . . . . . . . . . 111
6.13 Block diagram of the Query Device pattern. In the Query Device,
queries from the clients are queued and “process” is called serially.
The “R” boxes represent a buffer per client to hold the response
to the last query from that client. . . . . . . . . . . . . . . . . . 113
6.14 Block diagram of the Sensor Device pattern. In the Sensor Device,
the server submits new samples by calling sdev_push(). These
are stored in the ring buffer (RB), and streamed to clients with
relevant requests. The “R” boxes represent each client’s pending
request. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.15 “Crashproof” auto–reopen algorithm. . . . . . . . . . . . . . . . 118
6.16 Screen shot of EmView, the Emstar visualizer. . . . . . . . . . . 132
7.1 Block diagram of the buffered acoustic sensor interface. . . . . . 134
7.2 Plot of the linear relationship between the VXP sample clock and
the platform’s CPU clock. . . . . . . . . . . . . . . . . . . . . . . 136
7.3 Block diagram of the syncd service. . . . . . . . . . . . . . . . . . 140
7.4 RBS correlation of the timing of received broadcasts. This graph
shows that CPU clocks are stable with respect to each other over
time periods as long as 20 minutes. . . . . . . . . . . . . . . . . . 144
7.5 The MAC clocks appear to actively adapt their rates, rather than
maintaining frequency stability: (a) shows a central mode with
perfect rate matching, while (b) shows that the frequency of the
MAC clock is unstable when referenced to the CPU clock. But we
know from Figure 7.4 that the CPU clocks are stable with respect
to each other. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.6 Expanded plot of MAC timestamps showing high levels of noise. 146
8.1 Publisher applications push tables of key–value pairs to StateSync,
which disseminates them and delivers the complete table of all re-
ceived keys to subscribers whenever a change occurs. . . . . . . 156
8.2 The StateSync Log Scheme maintains a checkpointed log and an
active log. In the diagram, the first two ADD entries in the active
log are carried over from the checkpointed log after the redundant
entries have been compressed out. . . . . . . . . . . . . . . . . . 162
8.3 A screen shot from EmView displaying the wireless testbed de-
ployed in our building. The scale of the map is 5 meters per grid
square. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Results of benchmark tests on the testbed. Each grouping of bars
represents four 20–minute experiments in which 64K of data is
published in a fixed number of chunks, issued at regular intervals. 170
8.5 The latency distribution, broken down by hopcount. . . . . . . . 173
8.6 The distribution of key lifetimes for our position estimation ap-
plication. The mean key lifetime is 1506 ± 121 seconds. . . . . . . 175
8.7 Results of tests of our Position Estimation application from our
12 node testbed. The latency graphs show a CDF of latency in
seconds. The curve for LogTree shows some initial traffic in setting
up the ClusterSync trees before the start of data traffic. . . . . . 180
8.8 Results of tests of our Position Estimation application from a 50
node simulation. The mean latency for LogTree is 31.54 ± 0.58;
for LogFlood is 14.33 ± 0.12. . . . . . . . . . . . . . . . . . . . . 181
9.1 Experimental setup for the DOA component test. . . . . . . . . . 186
9.2 Mounting the measurement laser for the azimuth test (left) and
the zenith test (right). . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3 Overall distribution of errors in the Azimuth test. These results
are well within our target of ±1 deg. . . . . . . . . . . . . . . . . 188
9.4 Results of the Azimuth test, showing deviation from ground truth.
These results suggest a bias that is dependent on angle. . . . . . 189
9.5 Results of the Zenith test, showing deviation from ground truth.
Some asymmetry is evident when comparing the two sides. Neg-
ative angles approach from beneath the array and are heavily
obstructed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.6 Overall error distribution from the Zenith test. We observe that
the error distribution for “midrange” angles is comparable to that
of the azimuth estimates, although the error distribution for over-
head angles is noticeably worse. . . . . . . . . . . . . . . . . . . 194
9.7 The experimental setup for our range test in Lot 9, showing tests
at 5m (left) and 50m (right). The 50m test required multihop
synchronization. . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.8 Results of the Ranging test, 0-90m. In (a) the impulses show the
mean deviation from ground truth (right y scale), as a function
of distance. In (b) experiments are shown ordered by distance,
with the mean deviation plotted relative to the right y scale. The
distance for each experiment is represented by the dotted line,
referenced to the left y scale. . . . . . . . . . . . . . . . . . . . . 196
9.9 Plots showing the relationship between distance, SNR, and error.
The upper graph shows a scatter plot of range error vs. SNR. The
lower graph shows the relationship between SNR and distance,
with experiments ordered by distance. The dashed line shows a
function of distance that fits well to SNR. The dotted line shows
the distance corresponding to each experiment. . . . . . . . . . . 197
9.10 Results of the Ranging test, zooming in on 10 and 5 meters. These
tests show good accuracy and precision, despite being taken over
a long time interval and assuming a single temperature over the
entire experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.11 Results of the Ranging test, zooming in on tests from 50–55 me-
ters. Anomalous behavior is observed at 50 meters, perhaps the
result of a transient synchronization problem. A bug that could
have caused this has since been fixed. . . . . . . . . . . . . . . . 201
9.12 Overall error distribution from the Lot 9 Range Test. The stan-
dard deviation of the range error for all tests is 3.81 cm. If we
drop the 17 values with error larger than 10 cm, the standard
deviation of the remaining distribution is 1.76 cm. By applying
the narrower model in our multilateration algorithm, we can drop
the data in the tails as outliers. . . . . . . . . . . . . . . . . . . 202
10.1 The experimental setup for our system test in the UCLA Court of
Sciences. Node locations are indicated by numbered dots, while
yellow bars indicate the location of hedges. North is toward the
top of the photo. Image courtesy of Google Earth. . . . . . . . . 207
10.2 Output of the NLLS Position Estimation Algorithm, for the 1:45
AM dataset. The green crosses denote ground truth; the red
arrows show the position and orientation of each node. . . . . . . 208
10.3 Output of the R–θ Position Estimation Algorithm, for the 1:59
AM dataset. This dataset was the best result for R–θ. . . . . . . 209
10.4 Results of running our 14 courtyard experiments using a residual
threshold of 2. We see that half of our experiments do equally
well with a threshold of 3. . . . . . . . . . . . . . . . . . . . . . 212
10.5 CDF of the results of applying several different residual thresholds
to our 14 courtyard experiments. . . . . . . . . . . . . . . . . . 213
10.6 Position error achieved by the R–θ and NLLS algorithms on our 14
courtyard experiments, using a residual threshold of 3. The NLLS
algorithm consistently outperforms the R–θ algorithm because it
is able to make better use of the more accurate range data. Our
2D results improve upon those in [KMS05] by a factor of 20. . . 215
10.7 (a) Average Range Residual achieved by the R–θ and NLLS algo-
rithms for our courtyard experiments, using a residual threshold
of 3. (b) Average Range Residual and Position Error for NLLS. . 216
10.8 Scaling factors relative to ground truth, and air temperature. We
see a correlation between map scaling relative to ground truth,
and air temperature. . . . . . . . . . . . . . . . . . . . . . . . . 218
10.9 Repeatability statistics for position estimates, showing the per–
node distribution of deviations from ground truth. All errorbar
ranges are ± Standard Deviation. The mean standard deviations
for X, Y, and Z estimates over all nodes are 3.18 cm, 3.85 cm,
and 49.15 cm, respectively. . . . . . . . . . . . . . . . . . . . . 220
10.10 Repeatability statistics for Yaw, Pitch, Roll, computed using the
same method as in Figure 10.9. All errorbar ranges are ± Stan-
dard Deviation. The mean deviation for yaw estimates over all
nodes is 1.37 deg. . . . . . . . . . . . . . . . . . . . . . . . . . . 221
10.11 The experimental setup for our system test in the James Reserve
in Idyllwild. Node locations are indicated by red numbered dots.
North is toward the top of the photo; all arrays were aligned by
compass to point west. . . . . . . . . . . . . . . . . . . . . . . . 223
10.12 3–D map generated by the NLLS algorithm from our deployment
in the James Reserve. Ground truth is shown as crosses, esti-
mated positions and orientations as arrows. . . . . . . . . . . . . 224
10.13 3–D position estimation map generated by the R–θ algorithm.
Both this and Figure 10.12 use data captured at 10:30 AM on
September 29, 2005. . . . . . . . . . . . . . . . . . . . . . . . . . 225
10.14 Histograms of the Position Error and Average Range Residual
Metrics for the James Reserve data. NLLS outperforms R–θ ac-
cording to both metrics, although inaccuracies in ground truth
likely prevent measured errors from falling below 50 cm. . . . . . . 227
10.15 Repeatability statistics for position estimates, over all 10–node
James Reserve data. All errorbar ranges are ± Standard Devi-
ation. The mean standard deviations for X, Y, and Z estimates
over all nodes are 3.48 cm, 3.78 cm, and 17.1 cm, respectively. . . . 229
10.16 Repeatability statistics for Yaw, Pitch, and Roll. All errorbar
ranges are ± Standard Deviation. The mean standard deviation
for yaw estimates over all nodes is 3.15 deg. . . . . . . . . . . . . 230
10.17 Symmetric ranges, showing variation as a function of temperature
and a consistent offset. . . . . . . . . . . . . . . . . . . . . . . . 232
10.18 (a) Symmetric ranges for 100–103, and (b) Raw range data show-
ing probable synchronization failure. . . . . . . . . . . . . . . . . 233
11.1 3–D plot showing the importance of using the φ angle information.
Although the system converged without φ, it converged to a folded
configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
List of Tables
4.1 Error Distributions for Range and DOA Estimates. . . . . . . . . 60
6.1 Device Patterns currently defined by the Emstar system. . . . . . 104
8.1 Packet and byte counts for LogTree and LogFlood for 1 chunk, 12
senders, at 600 seconds and 1200 seconds. . . . . . . . . . . . . . 172
9.1 Range experiments, grouped by target scale and precision. . . . . 199
10.1 Experiment timing and weather conditions. . . . . . . . . . . . . 210
10.2 Position Error and Average Range Residual metrics for the NLLS
and R–θ algorithms, run on the 6 10–node experiments captured
at the James Reserve. For the experiment at 10:44 AM, the NLLS
algorithm failed to reach convergence. . . . . . . . . . . . . . . . 228
Acknowledgments
Portions of the system described in this thesis were implemented by other
contributors. Many of the underpinnings of Emstar were co-designed and imple-
mented by Jeremy Elson, and many others have contributed to the implementation.
Jeremy Elson designed and implemented syncd, the Emstar time synchroniza-
tion service, gsyncd, and many other components of Emstar from Packet Device
and Directory Device, to the original version of EmSim and the radio channel
simulator. Nithya Ramanathan implemented the Sensor Device pattern used by
vxpcd and other modules. Martin Lukac implemented the multilateration module
and the Emstar HTTP service. Thanos Stathopoulos implemented IP Connec-
tor, EmTOS, and contributed greatly to the work that led to Emstar’s capability
for heterogeneous simulation. Nia-Chiang Liang implemented the least squares
minimization for the DOA estimator. Hanbiao Wang provided insight towards
the design of the DOA and multilateration algorithms.
Jeffrey Tseng assembled the initial version of the hardware platform, and Mar-
tin Lukac helped with numerous changes to the hardware configuration. Dustin
McIntire developed the system software distribution for both the initial Intel
Stargate platform and the final Slauson platform, and answered numerous ques-
tions during kernel debugging. Naim Busek assisted in the development of the
schematics for the custom microphone pre-amp board, Jeffrey Tseng did the parts
placement and board layout, Mohammad Rahimi helped to debug the pre-amp
circuits, and Carolina Garcia helped with board assembly.
Nia-Chiang Liang, Luiz Faveira, Martin Lukac, Alberto Cerpa, Vlad Trifa,
and Chris Mar helped prepare and run experiments. For our outdoor experiments
we used the facilities of the James Reserve, managed by the University
of California Riverside, and specifically Michael Hamilton, Michael Taggart, and
Tom Unwin.
Chapter 6 contains text from the previously published work EmStar: a Soft-
ware Environment for Developing and Deploying Wireless Sensor Networks, by
Lewis Girod, Jeremy Elson, Alberto Cerpa, Thanos Stathopoulos, Nithya Ra-
manathan, Deborah Estrin, in the proceedings of the 2004 USENIX Technical
Conference, Boston, MA. Figures from this work are reprinted here with permis-
sion.
Chapter 7 describes work done together with Jeremy Elson, who designed and
implemented the original Emstar time synchronization system, which this work
builds on and extends.
Chapter 8 contains text from the previously published Technical Report, A
Reliable Multicast Mechanism for Sensor Network Applications, by Lewis Girod,
Martin Lukac, Andrew Parker, Thanos Stathopoulos, Jeffrey Tseng, Hanbiao
Wang, Deborah Estrin, Richard Guy and Eddie Kohler, CENS Technical Report
48, April 25, 2005.
Support for this work has been provided by the NSF Cooperative Agreement
CCR-0120778, and the UC MICRO program (grant 01-031) with matching funds
from Intel.
Vita
1972 Born, Buffalo, New York, USA.
1994 B.S. (Mathematics),
Massachusetts Institute of Technology
1995 B.S. (Computer Science) and M.Eng. (EECS),
Massachusetts Institute of Technology
1995–1998 Sponsored Research Staff,
Advanced Network Architecture group,
Laboratory for Computer Science, MIT.
1998–2000 Graduate Research Assistant,
Information Sciences Institute, USC.
Summer 1999 Summer Intern,
AT&T Cambridge Research Laboratory,
Cambridge, UK.
2000–2003 Senior Development Engineer,
Sensoria Corporation.
2000–2005 Graduate Research Assistant,
Computer Science Department, UCLA.
Publications
Elson, J., Girod, L., and Estrin, D., “A Wireless Time-Synchronized COTS Sen-
sor Platform, Part I: System Architecture” (short paper). In Proceedings of the
IEEE CAS Workshop on Wireless Communications and Networking, Pasadena,
CA. September 5–6 2002.
Girod, L., Bychkovskiy, V., Elson, J., and Estrin, D., “Locating tiny sensors in
time and space: A case study”. In Proceedings of the International Conference
on Computer Design (ICCD 2002), Freiburg, Germany. September 16-18 2002.
Invited paper.
—, Elson, J., Cerpa, A., Stathopoulos, T., Ramanathan, N., and Estrin, D.,
“EmStar: a Software Environment for Developing and Deploying Wireless Sensor
Networks”. In Proceedings of the 2004 USENIX Technical Conference, Boston
MA June 2004.
—, and Estrin, D., “Robust Range Estimation Using Acoustic and Multimodal
Sensing”. In Proceedings of IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2001), Maui, Hawaii, October 2001.
—, Lukac, M., Parker, A., Stathopoulos, T., Tseng, J., Wang, H., Estrin, D.,
Guy, R., and Kohler, E., “A Reliable Multicast Mechanism for Sensor Network
Applications”. Center for Embedded Networked Sensing Technical Report #48,
April 25, 2005.
—, Stathopoulos, T., Ramanathan, N., Elson, J., Estrin, D., Osterweil, E., and
Schoellhammer, T., “A System for Simulation, Emulation, and Deployment of
Heterogeneous Sensor Networks”. In Proceedings of the ACM Conference on
Embedded Networked Sensor Systems (SenSys 2004), November 2004.
Merrill, W., Girod, L., Elson, J., Sohrabi, K., Newberg, F., and Kaiser, W.,
“Autonomous Position Location in Distributed, Embedded, Wireless Systems”.
In Proceedings of the IEEE CAS Workshop on Wireless Communications and
Networking, Pasadena, CA, September 5–6 2002.
Savvides, A., Girod, L., Srivastava, M., and Estrin, D., “Localization in Sen-
sor Networks”. In C.S. Raghavendra, K.M. Sivalingam and T. Znati, editors,
Wireless Sensor Networks. Kluwer Academic Publishers, 2004.
Abstract of the Dissertation
A Self-Calibrating System
of Distributed Acoustic Arrays
by
Lewis David Girod
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2005
Professor Deborah L. Estrin, Chair
The area of sensor networks promises to support the biological and physical
sciences by enabling measurements that were previously impossible. This is ac-
complished by pushing intelligence into the network and closer to the sensors,
enabling sensing to be accomplished at much higher scales and densities with
lower cost.
Recently, interest in acoustic sensing problems has increased, including the
localization and monitoring of birds, wolves, and other species, as well as the
localization of electronic devices themselves. This has spurred the development
of a rapidly–deployable distributed acoustic sensing platform.
A key problem in the development of this platform is the acoustic array cali-
bration problem, which estimates the locations and orientations of a distributed
collection of acoustic sensors. We present a system composed of a set of in-
dependent acoustic nodes that automatically determines calibration parameters
including the relative location and orientation (X,Y, Z, Θ) of each array. These
relative coordinates are then fitted to one or more survey points to relate the
relative coordinates to a physical map. The application that computes these
estimates is itself a distributed sensing application.
In this work we present a solution to this position estimation problem, demon-
strating a complete vertical application built above a stack of re–usable sys-
tem components and distributed services, implemented on a deployable embed-
ded hardware platform. We describe: the hardware platform itself; Emstar, a
software framework for developing complex embedded system software; a time–
synchronized sampling layer; a multihop reliable multicast coordination primi-
tive; a time–of–flight acoustic ranging and direction–of–arrival (DOA) estimation
layer; and the top–level application that estimates the position and orientation
of each array.
We present the results of controlled tests of the ranging and DOA estimation
system, as well as the results of deployment experiments in both an urban envi-
ronment and a forested environment. These results demonstrate that our system
outperforms other similar systems, and that it can achieve sufficient accuracy
for anticipated applications, such as bird localization.
CHAPTER 1
Introduction
The area of sensor networks promises to support the biological and physical
sciences by enabling measurements that were previously impossible. This is ac-
complished by pushing intelligence into the network and closer to the sensors,
enabling sensing to be accomplished at much higher scales and densities with
lower cost.
While most of the currently deployed sensor networks focus on long–lived,
low–rate sensing applications such as microclimate monitoring, interest in appli-
cations involving high–rate sensors has been on the increase. Recently, interest in
rapidly–deployable, self–configuring acoustic sensing problems has increased, in-
cluding the localization and monitoring of birds, wolves, and other species, as
well as the localization of electronic devices themselves. However, despite these needs,
very few of the requisite underpinnings of such systems are currently available.
This has spurred the development of a rapidly–deployable distributed acoustic
sensing platform.
Many of the problems faced in this work are familiar problems from the area of
wireless sensing and embedded networking. Although in this work we are target-
ing more capable systems and we expect to support shorter–lived deployments,
we still must design the system with energy consumption in mind. In addition,
because of the high volume of data captured by acoustic sensors, this system
must support local processing in order to function in the context of a wireless
network.
When we specifically consider embedded acoustic sensing using these high–
capability platforms, we see several system requirements come to the forefront:
• Synchronized, distributed sampling: the ability to relate and compare with
precision time series data and events recorded on different nodes in the
network.
• A network stack designed to support ad–hoc wireless applications: link
estimation, routing, and transport.
• Reliable group communication to support distributed coordination.
• Automatic array calibration: precise, automatic estimation of the position
and orientation of the sensor arrays in the system.
• Tools to support development, debugging, and deployment.
A key problem in the development of this platform is the acoustic array cali-
bration problem in which the locations and orientations of a distributed collection
of acoustic sensors are estimated. We present a system composed of a set of in-
dependent acoustic nodes that automatically determines calibration parameters,
including the relative location and orientation (X,Y, Z, Θ) of each array. These
relative coordinates are then fitted to one or more survey points to relate the
relative coordinates to a physical map.
This problem is difficult both from a systems perspective (how we can make
the system work robustly) and from an algorithmic perspective, as it involves
the correct functioning of a number of separate algorithms, from DSP and
estimation algorithms to multilateration algorithms. By solving
this problem, we simultaneously achieve two goals: we add a critical feature to
our platform, while presenting a worked–out example of a distributed acoustic
sensing application that exercises much of our target platform functionality.
In this work, we present a definition of the array calibration problem, and
explain the algorithms we used to solve it. We then explain in detail the system
components and infrastructure we developed to implement the solution. Finally,
we present the results of component testing under controlled conditions, as well
as a deployment experiment in a realistic forested environment. We thus demon-
strate that our system can achieve sufficient calibration accuracy to implement
typical acoustic applications, such as bird localization.
1.1 Contributions of this Work
The system we have built provides a deployable acoustic sensing platform that
automatically positions and orients its sensors within a relative coordinate sys-
tem, autonomously with no infrastructure requirement. As a result, this system
can be deployed much more easily and with less damage to the environment than
a system that requires extensive surveying or wired infrastructure. Since the sys-
tem does not depend on GPS, it can be deployed in obstructed environments with
overhead foliage. Since the system is above all a platform for acoustic sensing, the
self–positioning component requires no additional hardware because it uses the
same acoustic sensing hardware that would be required for typical applications.
This work represents a vertical system implementation, from hardware through
a distributed application. As such, it touches many areas of the embedded net-
worked sensing field. Below, we summarize the contributions of this work,
breaking them down into three main categories. These categories represent
work at different layers of the system, from the lowest layer of hardware and
system software to the highest layer of algorithms.
Integration of an Embedded Platform.
• We designed and constructed a box with a processing unit and a small
“head unit” containing speakers and a microphone array.
• We developed a software framework that enables convenient inter–process
communication and robust operation in the field.
• We developed an end–to–end implementation of a synchronized sampling
layer, without which implementing time–of–flight ranging over multiple RF
hops would be very difficult.
• We developed deployment tools that facilitate the deployment and control
of the deployed system.
Network Stack and Distributed System Support.
• We integrated multihop time synchronization support from previous work
into our system.
• We developed a topology discovery and control layer that discovers the
topology of an ad–hoc wireless network.
• We developed a multihop reliable broadcast data dissemination service that
provides a simple publish–subscribe interface.
Acoustic Ranging and Positioning System.
• We designed and developed DSP algorithms to precisely estimate range and
direction of arrival using acoustic signals.
• We implemented multilateration algorithms to translate a collection of
range and angle estimates into a consistent coordinate system.
• We tied this all together into a distributed application above the aforemen-
tioned network and platform development.
The closest similar system of which we are aware is a Mica2–based system
developed at UIUC [KMS05] based on ranging developed at Vanderbilt [SBM04].
Like our work, this system is a complete audible ad–hoc acoustic localization
system. However, the performance of their system demonstrated by their exper-
iments is considerably inferior to ours. Whereas our system located 10 nodes
over an 80 by 50 meter area with a 9 cm average 2D position error, their system
located 45 nodes over a 60 by 60 meter area with an average position error of
2.47 meters—nearly 25 times worse.
The reasons for this are likely a combination of poorer range accuracy, shorter
sensing range, and possibly the absence of outlier rejection heuristics integrated
into the multilateration algorithms. Because of the computational limitations
of the Mica2, the UIUC system uses a narrowband detector implemented by an
analog PLL circuit. This detector is susceptible to various forms of noise and
will therefore require higher signal amplitudes at the receiver. The detection
range of the UIUC system is also limited by RAM buffer space available and by
the absence of multihop time synchronization, which makes the RF transmission
range an upper bound on the acoustic detection range. In the descriptions of
their algorithms, no mention was made of outlier rejection apart from filtering
the input ranges.
In this work we demonstrate a robust localization system that is highly accu-
rate and operates well in a difficult outdoor environment, representing a signifi-
cant improvement over other work in the field.
1.2 How to avoid reading this document
In order to make this work more useful to the casual reader, we indicate where
to find the most useful parts of this work.
System Overview
• System design overview: Section 2.3
• Putting the whole system together: Section 8.5
Ranging and Multilateration Algorithms
• Layout and coordinate system of the acoustic arrays: Section 2.4
• The ranging system and DSP algorithms: Chapter 3
• The direction of arrival (DOA) estimator: Section 3.3
• The Multilateration algorithms: Chapter 4
Reusable System Components
• Emstar, our software framework: Chapter 6
• Synchronized sampling and time synchronization: Chapter 7
• Multihop transport layer (StateSync): Chapter 8
• Deployment tools: Sections 6.2.5.3, 6.2.5.4, and 8.7
Performance Measurements
• Ranging and DOA Component performance: Chapter 9
• Position estimation system performance: Chapter 10
• Network transport performance: Section 8.6
CHAPTER 2
Problem Definition
At a high level, the acoustic array calibration problem seeks to discover all cal-
ibration parameters required to use a collection of these arrays in a distributed
sensing application. However, to more clearly define this problem, we must first
discuss the application requirements in the context of the specific properties of
the hardware.
2.1 Requirements of Acoustic Detection Applications
Distributed acoustic detection algorithms are a class of applications that in-
volve sensing a phenomenon at several points and combining that information to
achieve some goal. For example, a project to study acorn woodpeckers in a partic-
ular habitat might intend to detect woodpecker calls, determine their locations,
and count the number of distinct individuals. Figure 2.1 sketches out a possible
arrangement of nodes to support such an application. Recent work on localization
of animal calls has focused on “beam–crossing” techniques [WYP04] [WCA05].
In “beam–crossing”, several stations positioned in a convex hull around the target
detect the signals from the target and estimate the direction of arrival (DOA) of
the signals. The estimated DOA vectors from multiple points are then used to
triangulate the target. Techniques that rely on DOA computed from small arrays
are preferred because in practice signals received at different stations often lack
Figure 2.1: Photo of a node deployed at the James Reserve [Ham00], and a diagram of
a proposed distributed acoustic sensing application to localize acorn woodpeckers.
the coherence required to measure time difference of arrivals (TDOA) at stations
that are more than a few meters apart.
Experience with DOA estimation for animal calls has yielded estimation er-
ror on the order of ±2.5 deg [WCA05]. Calibration error in the orientation of
a receiving array adds directly to the error of a DOA estimate at that sta-
tion. Therefore, we want to minimize the calibration error, and for practical
purposes keep it well under the ±2.5 deg figure. Error in the position estimates
for the arrays also adds into a DOA estimate, since a 0.5 meter error in position
amounts to a 1 degree error for a target at a range of 30 meters. Note that for
the purposes of localization by beam–crossing, the position estimates need only be
relatively consistent; uniform scaling of the map does not affect the results.
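The arithmetic behind these error figures is easy to check; a minimal sketch:

```python
import math

def bearing_error_deg(position_error_m, target_range_m):
    """Bearing error induced at a station whose estimated position is
    off laterally by position_error_m, for a target at target_range_m."""
    return math.degrees(math.atan2(position_error_m, target_range_m))

# A 0.5 m position error at 30 m range is roughly a 1 degree error:
print(round(bearing_error_deg(0.5, 30.0), 2))  # prints 0.95
```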
Based on this application, we can define some typical application requirements
for our system. The system needs enough nodes to surround the
target, with a distance from array to target of 30–50 meters. The calibration of
the system must achieve position estimates accurate to ±0.5 meters in a relative
map, and orientation estimates accurate to ±1 deg.
2.2 Definition of the Calibration Problem
The acoustic array calibration problem seeks to determine a set of parameters
that define the locations and orientations of a collection of arrays. The parameters
are referenced to the coordinate system specified in the array geometry diagram
shown later in Figure 2.3. These parameters are defined in a coordinate system
that can either be referenced to a single origin array, or referenced to one or more
arrays located at survey points.
• Let (Xi, Yi, Zi) be the location of array i, relative to survey coordinates or
simply to the other arrays.
• Let Θi be the azimuth orientation of array i, relative to survey coordinates
or simply to the other arrays.
• We assume that all arrays are leveled, so that the remaining two degrees of
freedom are fixed relative to each other.
• If survey coordinates are present, we add a global scaling variable V that
allows the relative map to be scaled to fit the survey points.
• Environmental parameters such as temperature, humidity, and wind speed
and direction affect the effective speed of sound upon which time–of–flight
(TOF) ranging is based. Local measurements can be used to compensate to
some extent for these factors, but a future extension of this problem might
include an environmental model as additional estimated parameters.
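For example, temperature compensation can use the standard first–order approximation for the speed of sound in dry air (a textbook formula, not one taken from this dissertation):

```python
def speed_of_sound_mps(temp_c):
    """First-order approximation of the speed of sound in dry air,
    in m/s, as a function of temperature in degrees Celsius."""
    return 331.3 + 0.606 * temp_c

# Ignoring a 10 degree C temperature difference biases every range by
# roughly 1.8%, i.e. tens of centimeters at 30-50 m node separations.
print(speed_of_sound_mps(20.0))  # ~343.4 m/s
```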
Figure 2.2: Block diagram of the self–calibration system.
Our proposed solution first estimates range and angle–of–arrival information
through acoustic ranging, and then uses that information to estimate these
parameters.
2.3 Outline of Proposed Solution
To solve this estimation problem, we divide the system into several components,
as shown in Figure 2.2: the time–of–flight ranging layer, the multilateration layer,
a time–synchronized sampling layer and a networking layer. In the time–of–flight
ranging layer, each node emits calibration signals that are received by the other
nodes. Through detection algorithms, the system estimates the phase and angle
of arrival of the incoming signal from each peer. The use of the time–synchronized
sampling layer enables these phases to be compared across nodes to establish
range estimates based on the time of flight of the signals. These range and angle
estimates are then passed over the network to a multilateration component that
estimates the most likely values for the calibration parameters to match the range
and angle data. In this section we briefly outline these layers, before discussing
them in more detail in the following chapters.
2.3.1 Time of Flight Ranging Layer
The TOF Ranging Layer is triggered by higher layers to emit a ranging signal.
The ranging signal is a coded signal modulated on a 12 KHz carrier, which can be
readily detected by a matched filter. These techniques have been shown to work
well for acoustic ranging in previous work, including [GE01] [GBE02] [MGS04].
The emitted code is detected at the emitter in order to determine the exact
time at which the code was transmitted. The transmission time and the code
index are then sent to the other nodes via a multihop wireless network. Upon
arrival, the time synchronized sampling layer is queried to extract the region of
the signal that might contain the arriving ranging sequence, and the detection
algorithm determines the range and direction of arrival (DOA) for the incoming
signal. The range and DOA information is passed back through the network to
the multilateration layer.
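With synchronized timestamps in hand, the range computation itself is a one–liner; a sketch (the names and the fixed speed of sound are our assumptions):

```python
def tof_range_m(t_emit_s, t_detect_s, c_mps=343.0):
    """Range from acoustic time of flight. Both timestamps must already
    be expressed in a common, synchronized timebase; providing that
    timebase is the job of the time-synchronized sampling layer."""
    tof_s = t_detect_s - t_emit_s
    if tof_s <= 0:
        raise ValueError("detection precedes emission; check time sync")
    return c_mps * tof_s
```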
2.3.2 Multilateration Layer
The multilateration layer controls the automatic calibration process and com-
putes a consistent map based on the ranges and DOA values estimated by the
ranging layer. The multilateration layer analyzes the raw list of range and DOA
estimates to locally determine whether new range experiments are needed. If
more data is needed, it will trigger local ranging after a randomized delay. Events
that cause range or angle estimates to be invalidated, such as moving one of the
array receivers, would trigger new ranging through the same mechanism when
the multilateration layer discovers that the ranges have been revoked.
The multilateration algorithm itself is based on a non–linear least squares
optimization in the variables (X,Y, Z, Θ). To address cases where line of sight
(LOS) is obscured, outlier rejection heuristics are used to remove inconsistent
data. Outliers are removed both by removing cases where the DOA estimate is
inconsistent with the estimated placement of the nodes by more than 20 deg, and
by excluding range constraints that have high weighted residuals.
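A minimal sketch of the least–squares core, reduced to 2D positions and range constraints only (the actual system also estimates Z and Θ, folds in the DOA constraints, and applies the outlier rejection described above), using SciPy:

```python
import numpy as np
from scipy.optimize import least_squares

def multilaterate_2d(range_obs, guess):
    """Solve for 2D node positions from pairwise ranges by non-linear
    least squares. range_obs maps (i, j) -> measured range in meters.
    Node 0 is pinned at the origin and node 1 to the positive x-axis
    to remove the translational and rotational freedom of the map."""
    def unpack(x):
        return np.vstack([[0.0, 0.0], [x[0], 0.0], x[1:].reshape(-1, 2)])

    def residuals(x):
        pts = unpack(x)
        return [np.linalg.norm(pts[i] - pts[j]) - r
                for (i, j), r in range_obs.items()]

    x0 = np.concatenate([[guess[1, 0]], guess[2:].ravel()])
    return unpack(least_squares(residuals, x0).x)
```

Weighted residuals (dividing each term by its expected standard deviation) and rejection of constraints with high weighted residuals would slot naturally into the residuals function.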
2.3.3 Network Layer
The network layer provides a multihop mesh network with several convenient
primitives for group communication and coordination. At the routing layer, flood-
ing and IP routing are supported. The flooding layer also supports hop–by–hop
time conversions for known packet types.
The network layer also supports some reliable state dissemination protocols.
The StateSync protocol [GLP05] provides a reliable multicast transport infras-
tructure with low–latency updates over a multihop network. This module is
interfaced through a simple publish–subscribe API. For example, in this system,
nodes publish their current range and DOA estimates to this system, which dis-
seminates them multiple hops across the network. The same mechanism is also
used to publish link state, routing state, and the existence of faults throughout
the network, enabling debugging and visualization in the field.
Figure 2.3: Photograph of an acoustic array, and a diagram of the array geometry.
2.4 Properties of the Acoustic Array Hardware
Many of the details of our implementation depend on the properties of the acous-
tic array hardware implementation. While many aspects of our software can be
modified to support configurations other than the one we have selected, some
aspects of the system, as well as the results, are influenced by these details. In
this section, we describe the configuration of our acoustic array hardware and
discuss some alternatives.
Our acoustic sensor nodes consist of a stand–alone wireless processing unit
connected to a four–channel microphone array and acoustic emitter. The CPU
is capable of sampling and emitting four channels at 48 KHz. The microphones
have a frequency response from 40Hz–15KHz, while the emitter has a frequency
response from 3.5KHz–27KHz.
The geometry of the array is shown in Figure 2.3. Projected onto the (x, y)
plane, the four microphones lie on the corners of a square 8 cm on a side. Three
of the microphones lie on the (x, y) plane, while the fourth is raised 14 cm above
the plane. The origin of the array is considered to be the center of the plane
containing the three co–planar microphones. The emitter is composed of four
separate emitters, wired in parallel and positioned around the array, 4 cm below
the origin plane. This geometry is simple to construct and provides enough
diversity along the z axis to provide reasonable results for estimates of zenith
angles.
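For concreteness, the geometry can be written down directly. Coordinates are in cm; only two corners, (-4, -4, 0) and (-4, 4, 14), are labeled in Figure 2.3, so the assignment of the remaining two microphones below is our assumption:

```python
import numpy as np

# Microphone positions in cm. Three lie in the z=0 plane; the fourth
# is raised 14 cm. Projected onto (x, y), all four form an 8 cm square.
MICS_CM = np.array([
    [-4.0, -4.0,  0.0],
    [ 4.0, -4.0,  0.0],
    [ 4.0,  4.0,  0.0],
    [-4.0,  4.0, 14.0],  # the raised microphone
])

# The emitters sit 4 cm below the origin plane of the array.
EMITTER_Z_CM = -4.0
```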
The geometry we selected is not ideal. When we began this work we consulted
with colleagues who were working in parallel on a woodpecker detection project,
which we intended our platform to support. They suggested a square array with
all four microphones in a plane because it would work well for their algorithms.
However, after constructing the arrays and gaining some experience with them,
we later retrofitted them to have one of the microphones raised above the plane
in an effort to improve the system's zenith estimates. Although our current
results have been satisfying, if we were to build more arrays today, we would
probably select a more symmetrical geometry, such as a tetrahedron. By adding
more microphones symmetrically, that geometry would approach a spherical or
hemispherical array, such as is presented in Duraiswami [DZL05].
2.5 Comparison to Related Systems
Since our system is designed to support acoustic detection algorithms, in some
sense using acoustics for position estimation is “free”—we already need to have all
of the hardware and software to do acoustic monitoring, so all that is required are
the additional algorithms to determine positions. In addition, it is advantageous
to use the array itself in the position estimation because our future measurements
(i.e. acoustic detection algorithms) will all be made relative to that array location.
This raises the bar for other localization mechanisms, because they will still
require some additional mechanism or calibration step to reference the location of
the “localization receiver” to the array. In general, these requirements are quite
difficult to satisfy using other techniques.
2.5.1 RF Localization Techniques
Position estimation using RF techniques has thus far been found to be difficult
to achieve in ad–hoc deployments. GPS is typically not available in forested re-
gions, because the faint signals from the satellites are easily blocked by overhead
foliage. Ultra wideband RF transceivers may someday provide accurate ad–hoc
localization, but licensing problems and the difficulty in acquiring the hardware
have made it difficult to test or adopt. Early experience with UWB systems have
shown error distributions with standard deviation of about 45 cm [CKS03], in
near–range, line–of–sight conditions. Obstructed environments with significant
multipath pose serious problems [LS02], although these issues might be addressed
by solving over–constrained systems, as in acoustic systems.
In any case, the improved penetration of RF relative to acoustics does not nec-
essarily improve matters, because just as with acoustics, obstructed RF environ-
ments often introduce significant range errors. In practice, most of the UWB
ranging technology available off–the–shelf relies on tightly synchronized base sta-
tions. While requiring base stations may not be an unreasonable requirement, it
does increase the cost and difficulty of deployment.
Solutions based on measuring received signal strength or connectivity typically
provide very poor accuracy in practical environments [BHE00]. Signal strength
poses problems because the transmit power of a radio is not generally well–
calibrated, and the propagation characteristics of the environment are generally
unknown. Multipath fading is especially problematic, because it can introduce
abrupt variations in received signal strength over very small distances.
Several ad–hoc location systems have been built based on automated calibra-
tion of RF measurements. SpotOn [HVB01] presented a system that included a
calibration phase to calibrate out differences in the transmitters and receivers, al-
though this system was still subject to environmental variations. RADAR [BP00]
presents a more comprehensive calibration process, in which a robot is used to
discover a detailed map of signal strength to 802.11 base stations.
Some systems have presented “range free” solutions based on connectivity
rather than signal strength [NN03] [SRZ03] [SS04]. These systems propose filtered
connectivity as a more reliable and more readily modeled metric than received
signal strength. The binary nature of the connectivity metric can also yield
simplifications in the localization algorithms. However, these systems do not
achieve accuracy approaching phase–based techniques.
Some recent work on an interferometry–based scheme presented in [MVD05]
demonstrated high accuracy in line–of–sight, low–multipath conditions: 5 cm av-
erage position error in an 18 by 18 meter field with three anchor points. While
these techniques are resilient to many types of amplitude noise, and are indepen-
dent of variations in signal amplitude due to transmitter or receiver variations,
they are not immune to multipath interference. Although this work did not show
results from environments containing reflectors, it seems likely that environments
rich in multipath interference would introduce significant difficulties.
2.5.2 Laser–based Localization
Laser ranging and pointing systems have been used for localization in the robotics
community for many years. One of the most popular systems is the SICK Scan-
ning Laser Rangefinder, an off–the–shelf module that can easily be attached to
a mobile robot and can report the range to reflective surfaces with accuracies
as high as a few millimeters. However, these systems are generally large and
expensive, both in terms of monetary cost and energy cost.
In the world of sensor networks, several laser–based systems have been pro-
posed, including the Lighthouse system [Koe03] and the Spotlight system [HSS05]
[SHS05]. These systems work by scanning a laser over a field of devices and
marking the times that the laser is detected by each device. These times are
then correlated to the scanning position of the laser at that time to estimate
the location of the device. While these systems have been demonstrated to work
well outdoors, they require line–of–sight from the scanning laser to the nodes in
the field. This requirement is similar to the requirements of GPS, and is not a
practical assumption in a forested area.
2.5.3 Ultrasound Acoustic Localization
The most successful implementations of ad–hoc sensor network localization sys-
tems to date have been based on measuring acoustic time–of–flight. Of these,
most have presented localization solutions based on ultrasound, including the Ac-
tive Bat [WJH97], AHLoS [SKB00] [SPS03] [SHS01], Cricket [PCB00] [PMB01]
[SBG04], and Calamari [WC03]. Of these systems, all are ad–hoc systems save the
Active Bat, which relies on surveyed ceiling–mounted receivers. Calamari fuses
RF signal strength with ultrasound ranges to improve its performance. Cricket
is based on ceiling–mounted beaconers, which are self–configuring in version 2.
AHLoS is a fully self–configuring system developed for the Active Kindergarten
project.
While ultrasound holds many advantages, such as being inaudible, experience
with ultrasound has shown that it does not perform well in outdoor environments.
Ultrasound is readily blocked by obstructions such as foliage, and tends to have
lower effective range. In addition, most implementations of ultrasound rang-
ing use off–the–shelf detectors and narrow-band signals that tend to have lower
processing gain than the wideband coding techniques we can employ with audible
acoustics. Work on wideband ultrasonic transducers by Hazas et al. [HW02]
might offer better outdoor performance, but these transducers are not readily
available as a manufactured product, and so far there has been limited experi-
ence with these devices. By comparison, the broad frequency diversity achievable
using typical audio speakers enables excellent interference rejection, especially
for somewhat obstructed environments such as are found in forested areas. In
our experience [GE01] [MGE02] and others [KMS05] [SBM04], using the audible
acoustic spectrum has given excellent performance in outdoor environments.
2.5.4 Orientation Discovery
Determining the orientation of the arrays is also a challenging problem. As we
have seen, this is a critical aspect of the calibration if we intend to support
distributed sensing applications. The small size of the arrays makes accurate
manual alignment of orientation difficult because a small movement of the array
yields a large rotation. Magnetic orientation sensors could be employed, but
they are subject to significant local variation caused by metallic objects and
other sources of magnetic interference. Instead, we use the acoustic sensors to
estimate the array orientation directly. This way, we avoid the need to reference
the measurement to the exact physical configuration of the array. The principles
we use to estimate direction of arrival are similar to those used in the Cricket
Compass [PMB01], although our algorithms achieve better results with fewer
receivers.
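The basic far–field principle behind small–array DOA estimation is the relation between the time–difference of arrival across a short baseline and the bearing; a hedged sketch in our own notation (not the dissertation's algorithm, which is given in Section 3.3):

```python
import math

def doa_from_tdoa(delta_t_s, baseline_m, c_mps=343.0):
    """Far-field bearing (radians from broadside) implied by the
    time-difference of arrival between two microphones separated
    by baseline_m."""
    x = c_mps * delta_t_s / baseline_m
    if abs(x) > 1.0:
        raise ValueError("TDOA larger than the baseline allows")
    return math.asin(x)

# With an 8 cm baseline, a ~117 microsecond delay is about 30 degrees:
print(round(math.degrees(doa_from_tdoa(0.04 / 343.0, 0.08))))  # prints 30
```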
2.5.5 Alternatives to the use of Sensor Arrays
As an alternative to the use of sensor arrays, other work such as [RDY05]
and [DZL05] has suggested that a single sensor with a known near field envi-
ronment can be used for DOA estimation. In these methods, the signal from a
single sensor is deconvolved according to a known near field impulse response,
which is parameterized in terms of the direction of arrival. Given a hypothesized
direction of arrival, this method can be used to perform spatial filtering. This
principle is applied in human auditory perception to estimate direction of arrival.
However, it is not clear whether we gain much for our systems by using this
technique.
Since these methods require a fixed near–field configuration (in the case of
humans, the head), having one microphone in addition to external reflectors may
not actually result in a reduction in form factor relative to a microphone array. In
addition, while it is understood how to apply this method to spatial filtering, we
are not currently aware of algorithms that can do the inverse: efficiently recover
the best–match impulse response from a family of responses parameterized on
incoming angle1. Since additional microphones are relatively inexpensive compo-
nents, adding microphones may ultimately be cheaper if they significantly reduce
the processing required.
1 While humans use these methods to estimate incoming angle, we need our system to achieve higher levels of accuracy, and we have far less computational power to work with.
The other drawback is that the precision of our direction of arrival estimate
depends a great deal on how precisely we know the incoming signal. For appli-
cations such as woodpecker detection, the signal is often not characterized well
enough to get a clean impulse response.
CHAPTER 3
Estimation of Range and DOA
In Chapter 2 we laid out the basic outline of the time of flight ranging system
developed in this work. In this chapter we will present the detection and esti-
mation algorithms in more detail, describing the ranging system and the inside
of the “Detection Algorithm” box of Figure 2.2.
Ranging and localization systems have been discussed and characterized in
many survey papers [PAK05] [LR03] and book chapters [SGS04] [KSP03]. This
ranging system is an active, cooperative ranging mechanism. It is an active
mechanism because the emitter in the system generates a signal specifically so
that a receiver can detect it and determine a range and bearing estimate. It is
cooperative because the emitter and receivers are working together: the emitter
notifies the receivers when a signal is emitted, and provides explicit timing and
decoding information.
In this system, the emitter selects a code seed and generates a coded ranging
signal. This ranging signal is then emitted through the speaker outputs. In order
to determine the exact time of emission, a segment of data is captured from the
local microphones and processed to detect the signal. The local detection time
and the code seed are then sent over the network to reach the receivers.
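The key property is that a small seed deterministically expands into the same code at the emitter and at every receiver, so only the seed and timing information need to cross the network. A generic Fibonacci LFSR sketch (the register width and tap positions here are illustrative, not the system's actual code parameters):

```python
def pn_sequence(seed, nbits, width=10, taps=(0, 3)):
    """Expand a seed into a pseudo-noise bit sequence with a Fibonacci
    LFSR: output the low bit, then shift in the XOR of the tap bits."""
    state = seed & ((1 << width) - 1)
    bits = []
    for _ in range(nbits):
        bits.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (width - 1))
    return bits
```

Both ends expand the same seed: the emitter modulates the bits onto the carrier, and each receiver builds its matched filter from the identical sequence.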
Figure 3.1 shows an expanded view of the detection algorithm. In the first
stage, the input signals are extracted and filtered according to the particular code
Figure 3.1: Block diagram of the ranging detection algorithm.
used by the emitter, as described previously in [GE01]. In the second stage, the
ranging signal is detected in the signals, and the approximate phase of the signal
is determined. Using this approximate phase, input signals are cropped to select
out only the segment of the input containing the ranging signal. In the final
stage, the cropped inputs are analyzed to estimate direction of arrival (DOA).
Then, that DOA estimate is used to recombine the 4 channels into a single signal,
from which a more accurate phase estimate and SNR value can be determined.
The following sections will describe each of these stages in detail.
This work is similar to other work such as the work of Sallai [SBM04], which
also performs ranging using audible acoustic signals. One advantage of Sallai’s
work is that it is simple enough computationally to fit on a Mica2 mote. However,
as we will see in Chapter 9, the performance of our detection algorithms is far
superior to that of the Mote–based system, and in addition our system determines
direction of arrival. Where our system measures range with a standard deviation
of 3.8 cm, the system described in [SBM04] achieves a standard deviation of
approximately 20 cm.
Figure 3.2: The Filtering and Correlation stage of the ranging detection algorithm.
3.1 Filtering and Correlation
The Filtering and Correlation stage of the detection process extracts a segment
from the input signal and filters it in preparation for detection. This stage has
two parallel tracks: one that generates and processes the reference signal, and
the other that processes the acoustic input signals.
Figure 3.3: Modulation for seed=7, encoding the sequence 0111011110.
Figure 3.4: (a) The power spectral density (PSD) function for the exact reference signal emitted by Node 103. (b) The PSD of the reference signal as recorded at the source.
Figure 3.5: (a) The PSD of the input signal received at Node 101, 80 meters from the source (Court of Sciences). (b) The correlation of the signal above, expressed in the time domain. The correlation peak is 7 dB above the noise floor.
3.1.1 Code Generation and Modulation
Our ranging system implements direct–sequence spread spectrum [SP80] [Rap96]
in the acoustic domain, by emitting and detecting a coded ranging signal. The
codes used in this system are selected from a family of chaotic pseudo–noise (PN)
codes generated by repeated evaluation of the logistic equation,
x_{n+1} = R x_n (1 − x_n). (3.1)
These types of chaotic codes have been used successfully in other commu-
nications systems, including underwater acoustic communications systems such
as [APB02]. In our system, we quantize the output of the equation above to one
bit per iteration, such that
C_n = { 0  if x_n < 0.5,
      { 1  otherwise.    (3.2)
This code is then modulated on a 12 KHz carrier. The modulation scheme
is a modified Binary Phase Shift Keying (BPSK) scheme. An example of the
modulation can be seen in Figure 3.3. Essentially, the signal shifts phase 180 deg
on every 0 bit, and maintains the same phase on every 1 bit. An additional
discontinuity in the signal is introduced by starting and ending each bit at π/2
rather than at 0. This forces an additional rail–to–rail transition.
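The code generation and modulation steps above can be sketched as follows. This is an illustrative reconstruction, not the system's implementation: the logistic parameter R, the use of the seed directly as the initial value x_0, and the choice of four carrier cycles per bit are assumptions for the sketch (the text fixes only the logistic map, the quantization rule, the 12 KHz carrier, and the modified–BPSK phase behavior).

```python
import numpy as np

def pn_code(x0, n_bits, R=3.99):
    # Chaotic PN sequence from the logistic map x_{n+1} = R*x_n*(1 - x_n),
    # quantized to one bit per iteration (Equation 3.2).
    # R = 3.99 (chaotic regime) is an assumption; x0 plays the role of the seed.
    x = x0
    bits = []
    for _ in range(n_bits):
        x = R * x * (1.0 - x)
        bits.append(0 if x < 0.5 else 1)
    return bits

def modulate(bits, fs=48000, fc=12000, cycles_per_bit=4):
    # Modified-BPSK modulation: flip phase 180 deg on every 0 bit, keep the
    # same phase on every 1 bit, and start each bit at pi/2 so that every
    # bit boundary forces a rail-to-rail transition.
    samples_per_bit = (fs // fc) * cycles_per_bit
    phase = 0.0
    out = []
    for b in bits:
        if b == 0:
            phase += np.pi  # 180 deg phase shift on a 0 bit
        t = np.arange(samples_per_bit) / fs
        out.append(np.sin(2 * np.pi * fc * t + phase + np.pi / 2))
    return np.concatenate(out)
```

At 48 KHz a 12 KHz carrier gives 4 samples per cycle, so each bit in this sketch occupies 16 samples.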
We performed several empirical tests, weighing waveforms designed for smooth
transitions against waveforms with more abrupt transitions. We found that overall better performance in both range and phase accuracy was achieved using
more abrupt transitions. Although we were concerned that the abrupt transi-
tions might not reproduce well as they passed through the speaker components,
we found that they increase the energy delivered through the system and also
result in a much broader spectrum. In practice, using more abrupt modulation
yields higher signal–to–noise ratios (SNR).
Figure 3.6: SNR for PN code detection as a function of sample rate skew, and the
observed sample rate skew in 100K outdoor trials.
Figure 3.4 shows the power spectral densities of the exact reference signal
and the reference signal as emitted and measured directly at the source. These
graphs show that much of the energy in the original signal is preserved as it passes
through the system.
Generating the reference signal used for detection requires inputs from other
layers of the system. The detection algorithm is passed a message containing the
code seed used by the emitter, and the rate skew. The code seed parameter defines
the initial value for the chaotic function. The rate skew parameter defines the
relative skew between the emitter and receiver codecs, as determined by the time
synchronization subsystem.
When the signal is modulated, rate skew is used to adjust the modulation
rate to match the rate of the incoming signal. This skew input could also be
used to correct for Doppler shift in systems involving motion. Rate skew has a
significant impact on the performance of the correlation. Figure 3.6 shows the
peak correlation of one of our PN codes, correlated against itself with varying
degrees of rate skew. The y axis of the graph is in dB above the noise floor
(see Section 3.2 for more details on how we estimate the noise floor for this
system). The plot shows that a skew rate of just 0.06% results in the maximum
correlation peak dropping to the noise floor. This skew rate could result from a
relative velocity of 2 m/s.
Measurements of sound card oscillators have shown accuracy on the order of
50 parts per million (PPM) [Baa05]. Observations with our system confirm this
measurement. Figure 3.6 shows the distribution of observed rate skew for 100K
trials. Our observations showed a mean of 1, a standard deviation of 13 PPM,
and a range of 100 PPM.
3.1.2 Input Extraction
Along with the code seed and rate input, the detection algorithm is passed a
message containing the signal start time. This start time has already been con-
verted by the time synchronization subsystem so that it is expressed in terms
of our local CPU clock. The sampling layer can therefore be queried to extract
segments of the signal starting at the time of emission. Once the signal is located
in those segments, the distance from the start time to the detection time, or lag,
determines the time of flight.
After extracting the relevant segments from the 4 input channels, the input
segments are transformed to the frequency domain using an FFT. A 2 KHz high
pass filter is then applied to the signals to eliminate low frequency noise. This
eliminates wind noise and other environmental sources of low frequency noise
with large amplitude. This can be seen clearly in Figure 3.5, where the power in
the low–frequency components dwarfs the rest of the signal. However, as we see
in Figure 3.4, the ranging signals used in this system roll off below 2 KHz. In
addition, the response of the piezo emitters used in the system drops off around
3 KHz. Therefore, a 2 KHz high pass filter can be applied without significantly
impacting the ranging signal.
3.1.3 Correlation
After the reference signal and the input signals have been transformed to the
frequency domain and pre–filtered, the next step is to correlate the input signals
to the reference. This process is sometimes called a matched filter because it
filters the input signal exclusively for superimposed copies of the reference signal.
Unlike simple filters that match a continuous band of frequencies, a matched filter
matches against an exact distribution of frequencies, and therefore is much more
specific.
In the time domain, this process is equivalent to convolution with the time–reversed reference, and is implemented as a “sliding correlator”, in which the reference is matched against the signal at every possible offset. In the frequency domain, this can be done by multiplying the spectrum of the input by the complex conjugate of the reference spectrum [KC76]. The result of correlation (after
transforming back to the time domain) is shown in Figure 3.5(b). As we can
see, even at 80m range this correlation function achieved an excellent signal to
noise ratio of 7 dB. The online determination of the noise floor is discussed in
Section 3.2.
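The frequency–domain path described above (the 2 KHz high–pass filter of Section 3.1.2 followed by matched filtering) can be sketched roughly as below. The use of a real FFT, the zero–padding length, and the hard bin mask are implementation choices for the sketch, not details taken from the system.

```python
import numpy as np

def matched_filter(signal, reference, fs=48000, hp_hz=2000):
    # Matched filter in the frequency domain: multiply the input spectrum
    # by the conjugate of the reference spectrum, after zeroing bins below
    # the 2 KHz cutoff to suppress wind and other low-frequency noise.
    n = len(signal) + len(reference)          # zero-pad to avoid circular wrap
    S = np.fft.rfft(signal, n)
    R = np.fft.rfft(reference, n)
    mask = np.fft.rfftfreq(n, d=1.0 / fs) >= hp_hz
    return np.fft.irfft(S * np.conj(R) * mask, n)

# A copy of the reference buried in noise yields a sharp correlation peak
# at the lag where it was inserted.
rng = np.random.default_rng(0)
ref = rng.standard_normal(256)
sig = 0.1 * rng.standard_normal(2048)
sig[500:756] += ref
lag = int(np.argmax(matched_filter(sig, ref)))
```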
3.2 Detection and Extraction
After the initial filtering and correlation steps are complete, the next stage im-
plements a rough detection to locate the ranging signal in the input. In the
detection stage, the input signals are first transformed back to the time domain
to yield correlation functions. Next, the correlations are processed by an adap-
tive noise estimator and peak detector that determines an estimate of the earliest
peak above the noise floor on each channel. Based on these estimates, a small
segment containing the peaks is extracted from all four channels. These segments
are interpolated at a higher resolution and normalized based on the noise floor
estimates. These small segments in the time domain are then passed along to the
estimation stage.
3.2.1 Noise Estimation and Peak Detection
The detection problem for ranging differs from many similar problems in the area
of communications. The difference stems from the fact that the most accurate
range estimate comes from the first arrival, rather than the strongest arrival.
So, where in communications systems the object is to locate and combine the
strongest multipath components, with ranging the object is to locate only the
first arriving component, which may not be the strongest. To achieve this, our
detector must develop a model of the noise and lock onto the first significant
deviation from that model. In practice, because the noise level can vary somewhat
Figure 3.7: Analyzing the distribution of autocorrelation noise. (a) The unfiltered distribution, with a logistic fit of µ = −4.09, b = 6510.10. (b) The filtered distribution, with a logistic fit of µ = −8.29, b = 9233.11.
Figure 3.8: (a) Distribution of noise for the correlation shown in Figure 3.5, with a logistic fit of µ = −0.00125, b = 31.847. (b) Distribution of peak correlation values for a 100K trial outdoor test with cutoff C = 12. For each successful trial, the largest noise peak and the detection peak are included.
Figure 3.9: The Detection and Extraction stage of the ranging detection algorithm.
as a function of time, we would also like to develop a model that reacts to changes
in the noise level.
To address this problem, we first need to develop a model for the noise in
the system. Because of the similarity between our detection process and straight
autocorrelation, we begin by examining the distribution of autocorrelation noise
for our PN code family. In autocorrelation, the reference code is correlated to
itself. In our system, the reference is correlated to a copy of the reference which
has been passed through a system function that includes distortions introduced
by the physical emitter, the receiver and the intervening environment. Therefore,
the noise model should be some combination of the autocorrelation noise and the
noise introduced in the system.
3.2.1.1 Modeling Autocorrelation Noise
Figure 3.7 shows the results of analyzing the distribution of autocorrelation noise.
Figure 3.7(a) shows the distribution of correlation values of a typical PN code,
after removing the values immediately surrounding the peak. The distribution
has a heavy proportion of small values which we conjecture are artefacts of the
exactness of the functions being correlated. We also conjecture that noise intro-
duced by the environment will dominate over these small correlates, effectively
redistributing those points according to the noise model imposed by the system
and environment.
If we filter these small values out, the distribution roughly fits a logistic dis-
tribution characterized by the distribution function
L(x, µ, b) = 1 / (1 + e^{−(x−µ)/b}), where (3.3)

b = σ√3 / π. (3.4)
Figure 3.7(b) shows the fit to the logistic distribution, achieving a value of
D+ = 0.0185 from the Kolmogorov–Smirnov test and A² = 2.074 from the
Anderson–Darling test. These fit results leave room for future work to better
characterize this data.
Using this fit, we can explain how we determined the noise floor for the graph
in Figure 3.6 that shows autocorrelation peaks as a function of rate skew. Based
on the distribution in Figure 3.7(b), we defined a safe noise floor as 6σ, which
in that case is approximately 10^5. Applying this to the data in Figure 3.6 yielded good results.
The largest peak that yielded an incorrect answer (i.e. did not detect at 0–lag)
was 5.1σ. While some smaller peaks yielded the correct answer, no peaks over
6σ were incorrect.
3.2.1.2 Modeling Noise from Real Data
Next, we considered real data from an outdoor experiment in a forested region
of the James Reserve [Ham00]. We will discuss this experiment in more detail in
Chapter 10, but for now we will consider the data set as 10^5 independent ranging trials. Here we follow a similar approach to the case of autocorrelation noise, in
which we examine a single trial in detail and then look at the results from the
complete set of trials.
Figure 3.8(a) shows the distribution of correlation noise for the ranging trial
shown in Figure 3.5 (an 80 meter test). There is a recursive problem in determining a noise distribution to detect noise: we want to exclude the signal from our distribution, but we also want to use the distribution to distinguish noise from signal.
To address this we first empirically determine a cutoff factor by choosing a
cutoff and comparing the output of an online estimator to ground truth. The
online estimator continuously estimates the mean and variance of the correlation
until the data exceeds the current standard deviation estimate by the cutoff
factor. In the event that no point exceeds that factor, the signal is assumed to
be undetectable and that trial is dropped. In this empirical design process we reduced the cutoff until the estimator began returning values less than the ground truth lag—values that result from locking onto noise.
We define the estimator more clearly as follows: given a collection of signal
time series Si and ground truth values Gi, we define the online estimates of mean
and standard deviation:
µ_{i,j} = ( Σ_{k<j} S_{i,k} ) / j, (3.5)

σ_{i,j} = √( Σ_{k<j} (S_{i,k} − µ_{i,j})² / j ). (3.6)
We then select the lowest cutoff value C such that:
∀i: ( ∃j s.t. S_{i,j} > C σ_{i,j} ) → ( min { j : S_{i,j} > C σ_{i,j} } ≥ G_i ). (3.7)
Once a hypothetical cutoff is selected, we measure how well this cutoff per-
forms using a large data set. Using our cutoff–based detection algorithm to lo-
cate the signal, we then analyze the preceding noise offline. This analysis shows
that the noise preceding the detection is roughly similar to the “filtered” autocorrelation noise, with a weak fit to the logistic distribution, as shown for example in Figure 3.8(a). For each trial in the data set, we analyze the noise distribution and
record the noise peak Ni, the largest candidate point before the detection peak Di
that exceeded the cutoff:
N_i = max_{k<j} S_{i,k} / σ_{i,k}, (3.8)

D_i = S_{i,j} / σ_{i,j}. (3.9)
Figure 3.8(b) shows a plot of the distribution of the Ni and Di over all trials for
which a detection peak was found. This distribution plot serves as a verification
of the choice of cutoff C, because it shows a clear gap between the N distribution
and the D distributions. This gap assures us that by choosing our cutoff value
C, we will have a very low likelihood of a false positive detection, in which we
mistake a sample from the Ni distribution as a detection peak. It also assures us
that we are unlikely to significantly lower C without increasing the probability
of false positives; rather, improvement can only come through a more accurate
noise model, i.e. a model more sophisticated than standard deviation.
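The online estimator and cutoff test of Equations 3.5–3.7 can be sketched as follows. The warm-up length is an assumption added so that the running σ is meaningful before the first comparison; C = 12 is the cutoff validated in Figure 3.8(b).

```python
def first_arrival(corr, C=12.0, warmup=32):
    # Online estimates of mean and standard deviation (Equations 3.5-3.6),
    # updated sample by sample; return the index of the first sample that
    # exceeds the running noise floor by the cutoff factor C (Equation 3.7).
    n, total, total_sq = 0, 0.0, 0.0
    for j, s in enumerate(corr):
        if n >= warmup:
            mean = total / n
            sigma = max(total_sq / n - mean * mean, 0.0) ** 0.5
            if sigma > 0.0 and s - mean > C * sigma:
                return j
        n += 1
        total += s
        total_sq += s * s
    return None  # no point exceeded the cutoff: signal assumed undetectable
```

Note that the estimates at index j are computed only from the samples preceding j, so the candidate point never inflates its own noise floor.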
3.2.1.3 An Adaptive Noise Model
The last remaining part of this problem has to do with reacting to dynamics in
the level of noise. This idea fits well with the concept of an online estimator, as
we can readily design an online estimator to incorporate a filter that favors recent
data over older data. The most common solution to this type of problem is the
Exponentially Weighted Moving Average (EWMA), which is a type of filter that
is easy to implement digitally and is equivalent to an “RC” low–pass filter. A
EWMA function E(t) filtering the isochronous signal S(t) is represented by the
update function:
E_t = α E_{t−1} + (1 − α) S_t, 0 < α < 1. (3.10)
Each element S_t is carried forward in the average with an exponentially decreasing weight; at time t, the element S_{t−k} is weighted by α^k (1 − α). We can
control the scale of the adaptation by selecting α. To see how, consider the step
response of the EWMA function. If we assume that E_0 = 0 and S_t = 1 for all t > 0, then (applying the Maclaurin series):

E_t = (1 − α) Σ_{i=0}^{t−1} α^i
    = (1 − α) ( Σ_{i=0}^{∞} α^i − Σ_{i=t}^{∞} α^i )
    = (1 − α) ( 1/(1 − α) − α^t Σ_{i=0}^{∞} α^i )
    = (1 − α) ( 1/(1 − α) − α^t / (1 − α) )
    = 1 − α^t,

so that

log(1 − E_t) = t log α,

α = e^{log(1 − E_t) / t}.
Translating this to acoustics: by choosing α we can select the scale at which to
adapt to changes in the noise environment. Because physical processes are the
source of sounds, the time constants involved with onset of noise are typically
scaled to match the rates that objects typically are expected to move. Since we
control the signal we intend to detect, we can ensure that its onset is much faster
than the expected types of noise, and we can select a filter that can adapt to en-
vironmental noise without risk of filtering out our signal. In our implementation,
we selected α = 0.99, in order to adapt to within 1% for changes on the order of
5 ms (480 samples). The results presented in Figure 3.8(b) use an EWMA filter
with α = 0.99 to compute the estimates of mean and variance.
3.2.2 Extraction
The detection algorithm described above is run on each of the 4 channels to de-
termine a noise floor estimate and an estimate of the earliest arrival time of the
signal. Next, we extract a peak–region segment, a small segment of 64 samples
surrounding the earliest arrival time, so that we can focus the rest of the process-
ing on that segment. This technique reduces processing requirements and also
helps to filter out multipath interference. By extracting a small region centered
on the first arrival, echoes and reverberation that arrive with a phase lag greater
than 32 samples¹ (22 cm of distance) will be ignored.
To extract the segment, we select a center point based on the results of the
detection algorithm. Only detection times with SNR values above the cutoff
are considered. If none of the channels achieves this, the detection is aborted.
Otherwise, the earliest time among the four channels is used to define the center
of the extraction. This heuristic increases the probability that the system locks
on to the true shortest path, even if one of the channels misses the earliest path
but detects a strong reflection. The 64 sample peak–region segments are then
extracted from each of the four channel inputs, and passed along to the next
stage.
3.2.3 Interpolation and Normalization
In this stage, the peak–region segments are normalized to equalize their noise
floors and they are interpolated to a higher sample rate using the Fourier series.
In the normalization step, each of the signals’ mean and variance is estimated
based on the samples immediately preceding the peak regions. The signals are
then adjusted so that these parameters are matched across the four channels.
Next, the signals are re–sampled at a higher resolution. To re–sample a seg-
ment, we first compute the FFT of the data to generate the coefficients of the
Fourier series that reconstructs the time series. Then, we numerically evaluate
the Fourier series at 8x resolution to generate a time series with exactly the same
frequency composition as the original, but with 8x as many points. While this
does not add any additional information, it does enable us to perform subsample
shifts when implementing a limited–slip sliding correlator in the next section.
¹A region of 32 samples is extracted because 32 is the next power of 2 larger than the longest allowed phase lag between any two microphones based on array geometry. For larger arrays, this size would necessarily increase.
This technique is equivalent to fractional–phase shifts when correlating in the
frequency domain, but in our case, with small series and limited phase shifts,
time–domain correlation is considerably more efficient. One question we leave to future work is how much information is lost when we perform this re–sampling operation using only components from the small, 32–bin FFT of the peak–region segment. We have not experimented to determine
the optimal size of the segment to use to re–sample the peak–region, nor the
optimal expansion factor in the re–sampling operation.
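The re–sampling operation can be sketched as spectral zero–padding. This version assumes an even segment length and splits the Nyquist bin between the positive and negative halves of the padded spectrum, which is one standard way to handle even–length inputs; the 8x factor matches the text.

```python
import numpy as np

def upsample_fourier(x, factor=8):
    # Re-sample a segment by zero-padding its spectrum: the result has
    # exactly the same frequency composition as the original but with
    # factor-times as many points (no new information, but it enables
    # sub-sample shifts in the limited-slip correlator).
    n = len(x)              # assumed even, e.g. a 64-sample peak region
    m = n * factor
    X = np.fft.fft(x)
    Y = np.zeros(m, dtype=complex)
    h = n // 2
    Y[:h] = X[:h]
    Y[h] = X[h] / 2.0       # split the Nyquist bin
    Y[m - h] = X[h] / 2.0
    Y[m - h + 1:] = X[h + 1:]
    return np.fft.ifft(Y).real * factor
```

For a band-limited input the interpolated series passes exactly through the original sample points.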
3.3 DOA Estimation and Combining
Figure 3.10: The DOA Estimation and Combining stage of the ranging detection algo-
rithm.
The final stage of processing, shown in Figure 3.10, takes as input the pre–filtered peak–region segments and produces direction of arrival (DOA) and range estimates. We use a technique known as 2–TDOA to estimate the DOA of
the incoming signal [RF03]. This estimation process begins by cross–correlating
the channels to determine the most likely relative phase lag for each pair. These
lags form constraint equations that are solved using least squares to estimate DOA
parameters (θ, φ) and a compensation variable v to allow for local variations in
the speed of sound. If a DOA estimate can be determined, that estimate is used
to combine the four channels before doing a final peak detection to estimate the
range. We now discuss each of these steps in more detail.
3.3.1 Lag Finding
To find the most likely phase lags, we do a cross–correlation of each pair of
channels and find the simple maximum of each correlation. Unlike the case of
range determination where we want to determine the absolute phase of the signal,
for DOA determination we are only interested in the relative phases of the signals.
This means that we are free to lock on to the strongest part of the signal without
worrying that it might not represent the true onset. To get accurate relative
phase, it is important that our algorithm consistently lock on to the same feature
in each of the lagged signals. Our experience indicates that a simple max achieves
this property.
The work done in the previous stage to re–sample the peak region now pays
off in DOA estimation. Each cross–correlation has a maximum relative phase
offset bounded by the distance between the microphones given the geometry of
the array. This means that we can apply a limited–slip cross–correlation in the
time domain, which can be less expensive than a frequency–domain correlation.
If M is the slip required and N is the length of the segment, then a time–domain
solution is faster whenever M < 2C lg N , where C is the constant cost of the
FFT operation.
In this case re–sampling is also critical, because the small size of the array
means that phase lags must be determined with subsample precision². For example, one sample lag for a beam arriving at 90 deg resolves to a 5 deg variation,
because a single sample is 0.71 cm, and the baseline of the array is only 8 cm:
θ = cos⁻¹(0.71 / 8) = 84.9°. (3.11)

²Doing the correlation in the frequency domain would give the same result if the result of transforming back to the time domain were interpolated in the same way as our re–sampling process.
The output of the cross–correlation step is summarized to describe the relative phase lags L_{i,j} and the maximum correlation values W_{i,j} for each of the 6 pairwise correlations.
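A limited–slip cross–correlator of the kind described can be sketched directly in the time domain. The sign convention (positive lag means the second channel lags the first) is a choice for this sketch.

```python
import numpy as np

def limited_slip_lag(a, b, max_slip):
    # Time-domain cross-correlation restricted to |lag| <= max_slip, the
    # bound implied by the array geometry. Returns (best_lag, peak_value);
    # positive lag means b is delayed relative to a.
    best_lag, best_val = 0, -np.inf
    n = len(a)
    for lag in range(-max_slip, max_slip + 1):
        if lag >= 0:
            v = float(np.dot(a[:n - lag], b[lag:]))
        else:
            v = float(np.dot(a[-lag:], b[:n + lag]))
        if v > best_val:
            best_lag, best_val = lag, v
    return best_lag, best_val

# Two channels observing the same waveform, the second delayed by 3 samples.
rng = np.random.default_rng(1)
ref = rng.standard_normal(256)
a = np.zeros(300); a[20:276] = ref
b = np.zeros(300); b[23:279] = ref
lag, peak = limited_slip_lag(a, b, max_slip=10)
```

Because the slip is bounded, the loop evaluates only 2·max_slip + 1 dot products, which is the source of the M < 2C lg N cost advantage over a frequency–domain correlation.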
3.3.2 DOA Estimation
Given the lag components and their weights, we now use a weighted least–squares
minimization to estimate the most likely DOA. We define the constraint equations
based on the geometry of the array and on the array coordinate system. From the
array coordinate system, we can define the unit vector for a particular direction:
( cos θ cos φ, sin θ cos φ, sin φ )ᵀ. (3.12)
The phase lag observed between two microphones is the component of that unit vector along the vector between the two microphones. By combining the lags L_{i,j} computed in the previous step with the array geometry and coordinate system (recall Figure 2.3), we can therefore derive constraint equations in the variables (V, θ, φ):
    [  0    8    0 ]                     [ L_{1,2} ]
    [ −8    8   14 ]   [ cos θ cos φ ]   [ L_{1,3} ]
V · [ −8    0    0 ] · [ sin θ cos φ ] = [ L_{1,4} ]    (3.13)
    [ −8    0   14 ]   [ sin φ       ]   [ L_{2,3} ]
    [ −8   −8    0 ]                     [ L_{2,4} ]
    [  0   −8  −14 ]                     [ L_{3,4} ]
Although these constraints are not linear functions, we can apply an iterative
non–linear least squares (NLLS) minimization, starting with an initial estimated value
for the variables and iteratively improving upon that estimate. To determine
the initial estimate, we test four hypothesis angles per quadrant, and select the
minimum of those 32 alternatives. Once we have an initial estimate, we itera-
tively apply a weighted least squares minimization to a linearized version of the
constraint equations, with the weights defined by the maximum correlation values W_{i,j}.
Linearization is a technique in which the partial derivatives of the constraint
equations are evaluated according to an existing estimate, and the resulting linear
equations are solved to determine a correction to update the estimate. This
technique is explained in greater detail in Section 4.3.2. In this case, the linearized
equations for the six constraints in Equation 3.13 are:
[ 8 sin θ cos φ,   8V cos θ cos φ,   −8V sin θ sin φ ]
[ −8 cos θ cos φ + 8 sin θ cos φ + 14 sin φ,   V(8 sin θ cos φ + 8 cos θ cos φ),   V(8 cos θ sin φ − 8 sin θ sin φ + 14 cos φ) ]
[ −8 cos θ cos φ,   8V sin θ cos φ,   8V cos θ sin φ ]
[ −8 cos θ cos φ + 14 sin φ,   8V sin θ cos φ,   V(8 cos θ sin φ + 14 cos φ) ]
[ −8 cos θ cos φ − 8 sin θ cos φ,   V(8 sin θ cos φ − 8 cos θ cos φ),   V(8 cos θ sin φ + 8 sin θ sin φ) ]
[ −8 sin θ cos φ − 14 sin φ,   −8V cos θ cos φ,   V(8 sin θ sin φ − 14 cos φ) ]

· ( dV, dθ, dφ )ᵀ =

( L_{1,2} − V(8 sin θ cos φ),
  L_{1,3} − V(−8 cos θ cos φ + 8 sin θ cos φ + 14 sin φ),
  L_{1,4} − V(−8 cos θ cos φ),
  L_{2,3} − V(−8 cos θ cos φ + 14 sin φ),
  L_{2,4} − V(−8 cos θ cos φ − 8 sin θ cos φ),
  L_{3,4} − V(−8 sin θ cos φ − 14 sin φ) )ᵀ. (3.14)
These linearized constraints are approximations that are only valid in the
region surrounding the initial estimate, meaning that a large movement of a
variable is a sign of potentially dangerous instability. This iterative algorithm
is also subject to settling on local minima; the algorithm implements steepest
descent from the initial estimate to reach a local minimum. We found that the
objective function is quite smooth and our relatively lightweight initial estimate is
sufficient to get into the region of convergence for the true minimum. Figure 3.11
shows a plot of the objective function for a test with φ = 0, θ = 281.
The most serious problem we encountered with this estimation mechanism was
that the results for φ > 50 deg tended to be skewed for some azimuth angles. We
Figure 3.11: Plot of the DOA objective function observed for a test with φ = 0, θ = 281.
hypothesized that this was caused by the “self–ranging” speaker which is placed
directly over the channel 0 microphone. This obstruction introduces excess lag
on that channel for high elevation arrivals. By compensating for this lag with a
fixed constant, we achieved an improvement in convergence for the zenith angle
estimates.
In some cases, the NLLS estimate does not converge well and yields a local
minimum. In these cases, we achieved results closer to ground truth through a
brute force solution in which we tested every possible angle at 1 deg intervals.
We apply this fallback mechanism whenever the weighted sum of the 6 residuals
is greater than 10 cm.
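The overall estimator can be sketched as below: seed with a coarse grid of hypothesis angles, then apply weighted Gauss–Newton steps to the constraints of Equation 3.13. A numeric Jacobian stands in for the analytic linearization of Equation 3.14, and the grid density, iteration count, and finite–difference step are choices for the sketch (the fallback brute–force search is omitted).

```python
import numpy as np

# Pairwise microphone baseline vectors (in cm) from Equation 3.13.
B = np.array([[ 0,  8,   0],
              [-8,  8,  14],
              [-8,  0,   0],
              [-8,  0,  14],
              [-8, -8,   0],
              [ 0, -8, -14]], dtype=float)

def unit(theta, phi):
    # Unit vector for a direction (theta, phi), Equation 3.12.
    return np.array([np.cos(theta) * np.cos(phi),
                     np.sin(theta) * np.cos(phi),
                     np.sin(phi)])

def residuals(p, lags):
    # Residuals of the constraints V * B * u(theta, phi) = L.
    v, theta, phi = p
    return lags - v * (B @ unit(theta, phi))

def estimate_doa(lags, weights, iters=25):
    W = np.diag(weights)
    # Coarse initial estimate over a grid of hypothesis angles.
    grid = [(1.0, t, p) for t in np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
                        for p in (-0.9, -0.3, 0.3, 0.9)]
    p = np.array(min(grid, key=lambda q: residuals(np.array(q), lags)
                                         @ W @ residuals(np.array(q), lags)))
    for _ in range(iters):
        r = residuals(p, lags)
        # Numeric Jacobian in place of the analytic linearization (Eq. 3.14).
        J = np.zeros((6, 3))
        for k in range(3):
            dp = np.zeros(3)
            dp[k] = 1e-6
            J[:, k] = (residuals(p + dp, lags) - r) / 1e-6
        step, *_ = np.linalg.lstsq(np.sqrt(W) @ J, -np.sqrt(W) @ r, rcond=None)
        p = p + step
    return p  # (V, theta, phi)
```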
In Section 9.1 we discuss these results in more detail. There, we speculate
that a more thorough per–array calibration procedure would yield a performance
improvement for both zenith and azimuth angle estimates. We also hypothesize
that shadowing of one microphone by the mounting for another might explain
some of the estimation errors. In that case, an outlier rejection heuristic might
help to address that problem.
3.3.3 Alternative Approaches to DOA Estimation
Some work has been done to develop DOA estimators that use all of the infor-
mation in the signals rather than only using one lag estimate per pair of micro-
phones [RF03]. These techniques work by assuming some fixed set of hypothesis
directions, and testing each hypothesis to find the direction with the maximum
correlated energy.
We experimented with this type of solution in our angular correlation algo-
rithm (AC), but found its performance to be lower than was achievable using
2–TDOA. In our solution, for each hypothesis angle, we phase–shifted the signals
according to the lags induced by the hypothesis angle and correlated all four
channels together at that specific lag configuration. This algorithm is similar to
a “sliding correlator”, but where a sliding correlator adjusts the relative phase of
two signals for all possible linear phase lags, our “angular correlator” adjusts the
phase of all four signals according to the lags induced by all possible 3D incoming
angles.
While the AC algorithm in principle uses more of the information in the input
signals, it performs poorly in practice, because it makes rigid assumptions about
the geometry of the array. In 2–TDOA, the 6–way cross–correlation determines
the most likely lag measurements regardless of whether those lags are an exact
fit to the geometry of the array. In fact, many practical considerations reduce
the likelihood that the empirical geometry will conform exactly to the nominal
specifications. There are many factors that affect the empirical geometry, including error in microphone mounting, phase dependence on incoming angle, and
environmental variations in the speed of sound.
Because of these slight deviations, our AC algorithm often misses the peak
correlation lags that the 6–way cross–correlation in 2–TDOA consistently locates.
The probability that all four lags are exact (a condition that would result in AC
computing the maximal correlation) is quite low in practice, and there is a rapid
falloff as the correlation moves off–peak. In comparison, the 2–TDOA algorithm
extracts the maximal energy from each pairwise cross–correlation, and then fits
those lags to the geometry, allowing for error in both lag detection and array
geometry. As a result, the AC algorithm failed to improve upon the 2–TDOA
algorithm, despite taking a larger amount of signal information into account.
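To make the comparison concrete, the following sketch illustrates the hypothesis-testing structure of an angular correlator in a simple delay-and-sum form; this is our illustration, not the dissertation's implementation, and the array geometry, speed of sound, and scoring rule (energy of the steered sum) are assumed for the example.

```python
import math

SPEED_OF_SOUND = 340.0   # m/s, assumed constant for this sketch
SAMPLE_RATE = 48000      # Hz, matching the 48 kHz rate used elsewhere

def induced_lag(mic_xy, angle):
    """Lag in samples induced at a microphone (2-D position, meters)
    by a plane wave arriving from `angle` (radians)."""
    proj = mic_xy[0] * math.cos(angle) + mic_xy[1] * math.sin(angle)
    return int(round(proj / SPEED_OF_SOUND * SAMPLE_RATE))

def angular_correlate(channels, mics, hypotheses):
    """Test each hypothesis angle: shift every channel by the lag that
    angle would induce, sum the shifted channels, and score the
    hypothesis by the energy of the sum. Returns the best angle."""
    n = len(channels[0])
    best_angle, best_energy = None, float("-inf")
    for angle in hypotheses:
        lags = [induced_lag(m, angle) for m in mics]
        energy = 0.0
        for j in range(n):
            s = 0.0
            for ch, lag in zip(channels, lags):
                k = j + lag
                if 0 <= k < n:
                    s += ch[k]
            energy += s * s
        if energy > best_energy:
            best_angle, best_energy = angle, energy
    return best_angle
```

The rigidity discussed above is visible here: the lags are computed from the nominal microphone positions, so any mismatch between nominal and empirical geometry lowers the score of the true direction.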
3.3.4 Recombination
Once we have an estimate of the direction of arrival, we can use that information
to recombine the signals according to the expected phase offsets into a single time
series with higher SNR. This technique is often called beam–forming or spatial
filtering. To do this combination we apply a heuristic that combines the DOA
estimate computed in the previous step with the results of the pairwise cross–
correlation.
A spatial filtering algorithm takes as input M signals S_i, their weights w_i, and their relative phase offsets p_i. The filtered signal S′ is computed by the weighted sum:

    S′_j = Σ_{i=1..M} w_i · S_{i, j+p_i}.    (3.15)
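As a minimal sketch of this weighted sum (our illustration, not the system's code), the following treats samples shifted past the end of a channel as zero:

```python
def spatial_filter(signals, weights, phase_offsets):
    """Weighted delay-and-sum recombination: S'_j = sum_i w_i * S_i[j + p_i].
    Samples shifted out of a channel's range are treated as zero."""
    n = min(len(s) for s in signals)
    out = [0.0] * n
    for s, w, p in zip(signals, weights, phase_offsets):
        for j in range(n):
            k = j + p
            if 0 <= k < len(s):
                out[j] += w * s[k]
    return out
```

With phase offsets matching the true arrival lags, the aligned pulses add coherently while uncorrelated noise adds incoherently, which is the source of the SNR gain.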
Given a DOA estimate, we can compute the corresponding lags using the
formulæ given in the constraint equations for the 2–TDOA algorithm in Sec-
tion 3.3.2. However, because of variations in the array geometry these computed
lags do not perfectly match the observed maximum correlation values.
To address this problem we apply the following heuristic to adjust the phase
lags computed from the DOA estimate so that they fit the observed correlation
peaks more accurately:
AdjustLags(DOALags[4], XCorrLags[6], XCorrPeaks[6])
  indices[6] ← 0, 1, 2, 3, 4, 5
  mapi[6] ← 0, 0, 0, 1, 1, 2
  mapj[6] ← 1, 2, 3, 2, 3, 3
  AscendingSortByKey(indices, XCorrPeaks, 6)
  for index ∈ [0, 5]
    do i ← indices[index]
       norm ← XCorrPeaks[i] / max(XCorrPeaks, 6)
       curr ← DOALags[mapj[i]] − DOALags[mapi[i]]
       ∆ ← norm ∗ (XCorrLags[i] − curr)
       if |∆| < 2 samples
         then DOALags[mapj[i]] ← DOALags[mapj[i]] − ∆/2
              DOALags[mapi[i]] ← DOALags[mapi[i]] + ∆/2
This heuristic will ignore cross–correlation measurements that differ greatly
from the DOA estimate. When correcting the lags, it will favor corrections that
match the highest cross–correlation peaks, by scaling down corrections resulting
from lesser peaks and by performing the corrections resulting from larger peaks
last. Figure 3.12 shows an example of the improvement yielded by our heuristic,
relative to combination based only on the DOA estimate.
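In Python, the heuristic might be transcribed as follows. This is an illustrative sketch: the 2-sample threshold is taken from the pseudocode, and we assume the cross-correlation lag for pair (i, j) shares the sign convention of DOALags[j] − DOALags[i], so each accepted correction moves the pair's lag difference toward the observed peak (the pseudocode's update signs depend on its own lag convention).

```python
def adjust_lags(doa_lags, xcorr_lags, xcorr_peaks, threshold=2.0):
    """Nudge the four per-channel lags implied by the DOA estimate toward
    the six observed pairwise cross-correlation peaks. Pairs are processed
    weakest peak first, so the strongest peaks get the last word, and each
    correction is scaled down by the pair's normalized peak height."""
    map_i = [0, 0, 0, 1, 1, 2]          # first channel of each pair
    map_j = [1, 2, 3, 2, 3, 3]          # second channel of each pair
    peak_max = max(xcorr_peaks)
    order = sorted(range(6), key=lambda k: xcorr_peaks[k])  # weakest first
    lags = [float(l) for l in doa_lags]
    for k in order:
        norm = xcorr_peaks[k] / peak_max
        curr = lags[map_j[k]] - lags[map_i[k]]
        delta = norm * (xcorr_lags[k] - curr)
        if abs(delta) < threshold:       # ignore wildly inconsistent pairs
            # Split the correction between the two channels so the pair's
            # lag difference moves toward the observed correlation peak.
            lags[map_j[k]] += delta / 2.0
            lags[map_i[k]] -= delta / 2.0
    return lags
```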
[Figure 3.12 plot: SNR vs. sample index (48 kHz); combined signal at node 101, 80 m range, Court of Sciences; curves for combination using the DOA estimate and using the heuristic, with the detected peak phase marked.]

Figure 3.12: Combined signals for the trial from Figure 3.5. The two curves show the
effect of recombination using the straight DOA estimate and our heuristic.
3.3.5 Peak Detection
Once the signal has been recombined, we need to determine the onset of the rang-
ing signal in order to get a range estimate. Figure 3.12 shows an example of the
recombined signal for the 80m trial described earlier in Figure 3.5. Determining
the peak is a difficult problem because the output of the correlation function
has strong negative and positive peaks, and often features trailing periodic rever-
beration. Depending on environmental conditions, this reverberation can often
approach or exceed the initial peak, so a simple maximum peak value is not a good
solution. If the goal is to achieve accuracy on the order of 1 cm, our selection
heuristic must consistently select the “same” peak for different measurements,
because the peaks are typically several samples apart.
The first key part of our heuristic lies in the definition of a “peak”. Rather
than measuring the absolute height of individual peaks, we instead measure
“swings” peak–to–peak, associating each swing with the preceding peak. This
is helpful because it eliminates the effect of any DC bias in the signal, which
could otherwise throw off an absolute measurement of peak value.
The second part defines the metric used to select a peak. By looking at the
correlation functions, we observed that a good peak selection typically has both
a large swing and a high slope. Thus, we define the metric for peak selection as
the magnitude of the swing times the slope of the swing, or S2/R where S is the
magnitude of the swing and R is the length of the “run”. We found that using
this metric, a simple maximum achieved good results, because reverberation and pre–onset noise tend to have a lower slope than the main pulse.
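The selection rule can be sketched as follows (an illustrative transcription; the helper name is ours, and it assumes the correlator output has no perfectly flat plateaus between extrema):

```python
def select_onset_peak(signal):
    """Pick the onset peak of a correlation output using the S^2/R metric:
    S is the peak-to-peak swing following a local peak and R is the length
    of the run, so S^2/R rewards both large swing and high slope.
    Returns the index of the winning peak."""
    # Locate local extrema (indices where the slope changes sign),
    # including the endpoints.
    extrema = [0]
    for i in range(1, len(signal) - 1):
        if (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0:
            extrema.append(i)
    extrema.append(len(signal) - 1)

    best_idx, best_score = None, float("-inf")
    for a, b in zip(extrema, extrema[1:]):
        swing = abs(signal[b] - signal[a])   # peak-to-peak magnitude S
        run = b - a                          # run length R
        score = swing * swing / run          # S^2/R = swing * slope
        # Associate each downward swing with the preceding peak.
        if signal[a] > signal[b] and score > best_score:
            best_idx, best_score = a, score
    return best_idx
```

In this sketch, a later reverberation peak with the same swing but a gentler slope scores lower than the onset peak, matching the rationale above.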
Future work might develop a better metric. Possible avenues for analysis
include:
• Analyzing the distributions of swing value and slope.
• Tracing backwards from the maximum through the contiguous peak region.
• Applying information from the autocorrelation of the selected code.
3.4 Environmental Effects
Environmental parameters affect the accuracy of our ranging system. Temper-
ature, humidity, and wind affect the speed of sound, and therefore can have a
significant impact on the accuracy of a measured range. Note that because DOA
estimation operates on relative phase shifts, these parameters generally have min-
imal impact on DOA.
Because temperature and humidity affect the speed of sound in air, they cause
error that scales with distance. Wind can carry the sound in a particular direc-
tion, also resulting in a distortion of the results. These errors can be significant;
for example, a 1 deg C offset in temperature results in approximately 1% error.
For an 80 m range, a 1% error dominates over the other sources of error in the
system, all of which add up to a few cm at the most. This can be seen more
clearly in Chapter 9, where we will see that the range measurements exhibit high
precision even when the accuracy is compromised by environmental parameters.
Our initial strategy was to try to compensate for temperature by measuring
the temperature and humidity and correcting the speed of sound. This proved
unsuccessful for two reasons. First, it is difficult to measure these parameters
accurately enough to achieve good results. At 80 m, a 10 cm error translates
to about 0.2 deg C, but thermometers are generally specified to measure with
at most ±0.5 deg C. Second, the temperature measurements were only made at
a few points. However, the distortion of the propagation of the ranging signal
is really a path integral of the speed of sound over the path followed by the
signal. Measuring this value is difficult, and adjusting based on a single point is
dangerous because that point may be subject to significant local variations.
Because of these issues, we settled on a different strategy, in which we avoid
the problem. First, we recommend designing the system to perform calibrations
at night, when signal quality will be better and when the environment has reached
a steady–state temperature. Second, we design the system to take all of the cali-
bration measurements in a short time span, during which we can expect minimal
environmental change. Third, we use the multilateration algorithm described in
the next chapter to discover a map based on the geometric consistencies and inde-
pendent of scale, and then fit the resulting map to surveyed points to determine
the scaling factor implied by the environmental parameters.
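For the final fit-to-survey step, the scale factor itself has a closed form. A minimal sketch (the helper name is ours): given corresponding inter-node distances from the computed map and from the survey, the least-squares scale s minimizing Σ(s·d_map − d_survey)² is a ratio of inner products:

```python
def scale_factor(map_dists, surveyed_dists):
    """Least-squares scale s minimizing sum((s*d_map - d_survey)^2).
    Setting the derivative to zero gives
    s = <d_map, d_survey> / <d_map, d_map>."""
    num = sum(dm * ds for dm, ds in zip(map_dists, surveyed_dists))
    den = sum(dm * dm for dm in map_dists)
    return num / den
```

Applied after map construction, this single factor absorbs the unknown speed of sound implied by the unmeasured environmental conditions.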
CHAPTER 4
Multilateration
After we have estimated ranges and angles and determined confidence values,
we must synthesize this data into a single consistent map, by determining the
Cartesian coordinates and orientations for the arrays. While many solutions in
the literature address the problem of ad–hoc positioning, one component of our
work that has not been discussed much is the problem of orientation calibration.
As we discussed in Chapter 2, this problem is critical for our application, because
our platform must support localization applications based on DOA and beam–
crossing techniques.
4.1 Overview and Context
Our solution takes the general structure of iterative refinement. We begin by
computing an initial estimate, and then refine that estimate by solving a set of
constraints. In Sections 4.3.2 and 4.3.1 we present the details of two constraint
representations which we tested. Once our constraints reach convergence, we
analyze the system to determine if there are any constraints that fit very poorly,
and if so, remove the worst offender and try again.
Our iterative process solves for position and orientation in separate, inter-
leaved constraint systems. After the primary constraint system converges on an
improvement to the position estimates, we determine the incoming angles based
on that system and compare them to our DOA measurements. For each node,
we compute the average difference between measured and computed angles to
estimate a orientation bias for that node relative to the current set of position
estimates. We report this bias as the node’s orientation estimate. This algorithm
will be described in more detail in Section 4.4.
A number of proposed localization systems have focused on developing dis-
tributed algorithms that implement this map–building operation while mini-
mizing network transmissions, CPU usage, or both [CGS04] [SHS01] [SPS03]
[KMS05] [LR03]. In this work we chose to implement a centralized solution
rather than a distributed one, for two reasons. First, a centralized solution is
much simpler and requires a simpler communication protocol. Second, it is very
difficult to filter the data effectively without developing an over–constrained sys-
tem. Most distributed algorithms focus on collecting the smallest amount of
information required to perform multilateration, e.g. [SPS03]. However, because
this approach rarely yields over–constrained systems, it is much more difficult to
detect and reject bad data resulting from either system errors or environmental
problems such as obstructions.
There are numerous examples in the literature of solutions to this type of
problem based on least–squares minimization [CGS04], multi–dimensional scal-
ing [Tor52] [CGS04] [KMS05] [JZ04] [RD04] and maximum likelihood optimiza-
tion [RD04]. To apply these techniques, we represent our measured data as
weighted constraints. The weights are determined by modeling the error in the
underlying measurements. We can capture a model from our controlled tests,
although that model may not hold in general: obstructions and other envi-
ronmental effects may impact the distribution of errors. In our work we have
characterized the underlying ranging components to obtain an estimate of the
uncertainty in those measurements, and then implemented filtering at different
points in the process to reject bad data. We have applied two different forms of
least–squares minimization, one that results in a linear system similar to those
described in [CGS04], and one that is based on extending a multi–dimensional
scaling (MDS) solution and that we solve as a non–linear least–squares (NLLS)
problem.
We reject bad data in two ways: by checking for geometrical inconsisten-
cies using DOA information, and by checking for constraint inconsistencies using
the method of studentized residuals [WJH97]. Past work in this area has used
the triangle inequality to reject range information based on inconsistent geome-
try [GBE02] [KMS05]. In Moore’s work [MLR04], more sophisticated geometric
analyses were used to avoid inconsistency. However, while these methods may
be good choices for systems that provide ranges only, they are not as effective
as using the direction information that our system provides. Using DOA we can
immediately spot reflections by identifying ranges that arrive from the wrong
direction, and reject that data. Because virtually all significant errors in range
data are the result of reflections in obstructed environments, this technique is
very effective.
In the remainder of this chapter, we discuss our multilateration algorithms
in detail. The following sections present each step of the process in turn, from
developing the initial estimate, through position estimation and orientation esti-
mation.
4.2 Prefiltering and Initial Estimation
In Chapter 2 we defined the calibration problem to be a solution to an estimation
problem that computes the Cartesian coordinates and an absolute orientation
for each node in the system. We defined these unknowns as Xi, Yi, Zi, Θi for
each array i. The inputs to this estimation problem consist of the data from the
ranging layer described in Chapter 3. Each node in the system will emit ranging
sounds, that are detected by the other nodes in the system. Each node i will
record detected ranges Ri,j and angles θi,j and φi,j, where j is the emitter of the
signal being detected1. Each node performs five trials within a short span of
time, in order to minimize variation in environmental conditions. These values
are collected centrally to be processed.
In the first phase of processing, the data is filtered to remove obvious incon-
sistencies. The five trials taken by each node are first filtered by selecting the entry corresponding to the median angular estimate2. These medians are kept
and the remaining entries are dropped.
Whenever there are bidirectional ranges, the smaller of the forward and reverse
paths Ri,j and Rj,i are retained, while the forward and reverse angles are selected
based on the angular confidence estimate. However, if a range is more than 3
meters longer than its reverse, or if it is more than 10% longer than its reverse,
or if a range is dropped as an outlier, its corresponding angles are also dropped.
The logic behind this heuristic is that large range errors are usually the result of
reflections, which will also corrupt the angle measurement. Moderately long ranges are usually the result of minor obstructions, such as foliage. While these
1Note that we use Θi to mean the orientation parameter being estimated and θi,j to mean the DOA estimate to node j measured by node i.
2To compute the median angle, we use a heuristic that first selects the largest subset of the data that lies in two contiguous quadrants.
DoGuess(node[], N, i)
  node[i].params ← AverageEstimates(node, i)
  node[i].state ← guessed
  node[i].estimates ← nil
  for 0 ≤ j < N
    do if (node[j].state = free) ∧ (∃(R, θ, φ)i,j)
         then Append(node[j].estimates,
                     Extrapolate(node[i].params, (R, θ, φ)i,j))

InitialGuess(node[], N)
  for 0 ≤ i < N
    do node[i].estimates ← nil
       node[i].state ← free
  i ← arg maxi CountRanges(node[i])
1: DoGuess(node, i)
  i ← arg maxi Length(node[i].estimates)
  if node[i].state = free
    then goto 1
Figure 4.1: Algorithm for determining an initial parameter estimate.
ranges should be ignored, the angle is likely to still be fairly accurate. Note that
all of this data is subject to rejection after each constraint solving step, in the
event that it appears to be an outlier.
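The bidirectional filtering rule can be sketched as follows (the function name is ours; the 3 m and 10% thresholds are from the text): keep the shorter of the two ranges, and flag for dropping the angles of any range that exceeds its reverse by more than 3 meters or by more than 10%.

```python
def prefilter_pair(r_fwd, r_rev):
    """Merge a bidirectional range pair R[i][j], R[j][i].
    Returns (range_to_keep, drop_fwd_angles, drop_rev_angles)."""
    def suspicious(r, reverse):
        # A range much longer than its reverse is likely a reflection,
        # which also corrupts the associated DOA measurement.
        return (r - reverse > 3.0) or (r > 1.1 * reverse)
    return min(r_fwd, r_rev), suspicious(r_fwd, r_rev), suspicious(r_rev, r_fwd)
```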
After pre–filtering the range and angle data, we next construct an initial
estimate of the system parameters. We construct this estimate by first selecting
an origin point. We choose an origin by finding the most “well–connected” node,
that is, the node with the largest number of ranges and angles to other nodes.
Once that node is selected, we can define its location as the origin and begin
extrapolating the locations of other connected nodes using the algorithm shown
in Figure 4.1.
4.3 Two Solutions to the Position Estimation Problem
In this work we tested two different solutions to the position estimation problem:
R–θ and Non–Linear Least–Squares (NLLS). In the R–θ scheme, we represent
each range and DOA estimate as a vector in R3 linking the coordinates of the
two nodes, resulting in a set of three linear equations. In the NLLS approach, we
represent each range and angle constraint separately, resulting in a set of three
non–linear equations that can be linearized and solved with iterative techniques.
The R–θ scheme is fast and simpler to understand, but it does not perform as
well as the NLLS approach.
4.3.1 R–θ
The R–θ scheme results in a set of three linear equations for each range and angle
estimate. A 2–dimensional R–θ solution is described in [CGS04] and [CDG03].
Let us consider the system from the perspective of node Ni. Let Ri,j, θi,j, and φi,j be node
Ni’s estimate of the range, azimuth and zenith DOA to node Nj, and let us
assume Θi to be an initial estimate of the orientation of node Ni. We can then
write the constraints
    X_j − X_i = R_{i,j} cos(θ_{i,j} − Θ_i) cos φ_{i,j},    (4.1)
    Y_j − Y_i = R_{i,j} sin(θ_{i,j} − Θ_i) cos φ_{i,j},    (4.2)
    Z_j − Z_i = R_{i,j} sin φ_{i,j}.    (4.3)
Since we assume Θi to be constant, and all of the other values are measure-
ments, the only variables are the position estimates (Xi, Yi, Zi). We also note
Measurement                     Mean Error   Standard Deviation
Range (cm)                         -2.38          1.76
Azimuth (deg)                       0.14          0.96
Zenith overall (deg)                2.22          7.97
Zenith, -30 deg to +45 deg          0.26          0.86
Zenith, +45 deg to +90 deg          0.31          2.29

Table 4.1: Error Distributions for Range and DOA Estimates.
that these are linear equations and thus are readily and efficiently solved using
weighted least–squares minimization, for example using singular value decompo-
sition [PTV92]. This technique assumes that the errors in the data are normally
distributed, and if so, a weighting value can be applied to each equation by mul-
tiplying through both sides by the square root of that weight.
This weighting value is inversely proportional to the square root of the stan-
dard deviation of the distribution3. Thus, in order to choose a weight we need
an estimate of the distribution of the errors in the measurements. Based on
the experiments we will describe in Chapter 9, Table 4.1 provides the standard
deviation values for range, azimuth and zenith estimates.
We can now apply these distribution estimates to derive weightings for each
of the constraints. First, we note that the uncertainty in one of our constraints
is largely due to the angular estimates. The angular error causes a position
error that is proportional to the range. Since we see that our range estimates
are accurate to within a few centimeters, and typical inter–node spacings are on
the order of tens of meters, the position error resulting from angular error will
dominate.
3The square root is needed because the least–squares minimization implicitly squares the error terms.
Thus, neglecting the uncertainty in the range estimate, we can derive weight
values for our constraints:
    w_{X,i,j} = max(R_{i,j} cos(θ_{i,j} − Θ_i ± σ_azi) cos(φ_{i,j} ± σ_zen)),    (4.4)
    w_{Y,i,j} = max(R_{i,j} sin(θ_{i,j} − Θ_i ± σ_azi) cos(φ_{i,j} ± σ_zen)),    (4.5)
    w_{Z,i,j} = max(R_{i,j} sin(φ_{i,j} ± σ_zen)).    (4.6)
Each row of the constraint matrix is then divided by the square root of the
appropriate weight value, and it is solved in the normal fashion.
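In the special case where node i's position is already known, each R–θ measurement predicts node j's coordinates directly, and the weighted least-squares solution reduces to a weighted average of the predictions. The following sketch (our simplification, not the general multi-node solver; for brevity it uses one scalar weight per measurement rather than the per-coordinate weights of Equations 4.4–4.6) shows Equations 4.1–4.3 used this way:

```python
import math

def rtheta_predict(pos_i, r, azi, zen, theta_i):
    """Eqs. (4.1)-(4.3): predict node j's position from node i's known
    position plus a range/DOA measurement (angles in radians)."""
    xi, yi, zi = pos_i
    return (xi + r * math.cos(azi - theta_i) * math.cos(zen),
            yi + r * math.sin(azi - theta_i) * math.cos(zen),
            zi + r * math.sin(zen))

def weighted_position(predictions, weights):
    """Weighted least-squares solution when every equation constrains a
    single coordinate of one node: the weighted mean of the predictions."""
    wsum = sum(weights)
    return tuple(sum(w * p[k] for p, w in zip(predictions, weights)) / wsum
                 for k in range(3))
```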
After solving, we can analyze the residuals to find the constraint that con-
tributes the most to the fit error, and possibly drop it as an outlier. We describe
this process in Section 4.5.
The R–θ solution is computationally fast, but as we will see in Chapter 10, it
does not perform well. The reason is that all of the constraints involve angular
measurements, and therefore they cannot take advantage of the much higher pre-
cision provided by the range measurements. This results in an order of magnitude
greater average positioning error, as well as an order of magnitude poorer fit to
the ranging data.
These results are fairly consistent with the results from [CGS04]. Although that paper presented its simulations of the R–θ technique as a success, its own data shows considerable error when the technique is applied to scenarios with 20–meter inter–node spacing.
4.3.2 Iterative Non–Linear Least–Squares Minimization
The second approach we investigated uses Iterative Non–Linear Least–Squares
Minimization (NLLS). A similar approach to this is also described in [CGS04],
where they call it Iterative Least–Mean–Square Refinement.
While systems of linear equations can be solved quickly, they are very limited
in the sorts of constraints they can express. For example, a central constraint for
a multilateration algorithm is the distance formula
    R_{i,j} = √((X_j − X_i)² + (Y_j − Y_i)² + (Z_j − Z_i)²).    (4.7)
However, this constraint (along with others detailed in Section 4.3.2.2) can-
not be solved directly using linear algebra because it is not a linear function in
Xi, Yi, Zi. Instead, we use a technique called Linearization to convert a system
of non–linear constraints to a set of linear equations. By solving these linear
equations, we can compute a refinement to our existing estimate of the system
variables, and iteratively improve our estimate.
4.3.2.1 Linearization
NLLS works by linearizing the non–linear constraint equations, and then solving
the system in an iterative fashion. A constraint can be linearized by first deter-
mining an initial estimate for the parameters, and then expanding the constraint
as a Taylor series around that initial estimate. Thus, for an arbitrary constraint
function F (X) = K, where X is an N element vector with current estimates X,
the linearized constraint is:
62
F (X) + F ′(X)(X − X) + ... = K (4.8)
F (X) +N∑
i
∂F
∂Xi |X(Xi + Xi) + ... = K. (4.9)
If we neglect the higher order terms of the Taylor series, the resulting function is a linear approximation to the constraint function, valid in the neighborhood of the estimate X̄. Using this linearization technique, we can express a set of constraints F_i(X) = K_i as a linear system

    Ax = b, where    (4.10)
    A_{i,j} = (∂F_i/∂X_j)|_X̄,    (4.11)
    x_j = X_j − X̄_j,    (4.12)
    b_i = K_i − F_i(X̄).    (4.13)
By solving this linear system, we determine the x that maximizes the consistency of the system. Since these x_i are deltas on our original estimates X̄, we can iterate by updating X̄ ← X̄ + x, recomputing A and b, and solving again until
we reach a stopping condition.
To apply this technique, we must have a way to determine an initial estimate
for our parameters, and we must compute these expansions for the constraints
we plan to apply.
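To make the iteration concrete, here is a minimal Gauss–Newton sketch for the simplest instance of this framework: a single unknown 2-D position constrained by ranges to known anchors. This is a reduced stand-in for the full multi-node system (our illustration, not the dissertation's solver); with only two unknowns, the linear step solves the 2×2 normal equations in closed form.

```python
import math

def solve_nlls_2d(anchors, ranges, guess, iters=20):
    """Gauss-Newton refinement of a single unknown 2-D position.
    Constraints F_k(X) = R_k, where F_k is the distance to anchor k;
    each step builds A (Jacobian) and b (residuals, b_k = R_k - F_k)
    around the current estimate, as in Eqs. (4.10)-(4.13)."""
    x, y = guess
    for _ in range(iters):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), rk in zip(anchors, ranges):
            d = math.hypot(x - ax, y - ay)
            jx, jy = (x - ax) / d, (y - ay) / d   # dF/dx, dF/dy
            res = rk - d                           # b_k = K_k - F_k(X)
            a11 += jx * jx; a12 += jx * jy; a22 += jy * jy
            b1 += jx * res; b2 += jy * res
        # Solve the 2x2 normal equations (A^T A) delta = A^T b directly.
        det = a11 * a22 - a12 * a12
        dx = (a22 * b1 - a12 * b2) / det
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy                      # X <- X + x
        if abs(dx) + abs(dy) < 1e-12:              # stopping condition
            break
    return x, y
```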
4.3.2.2 Non–Linear Constraint Equations
Once we have determined an initial estimate for a set of nodes, we can define
constraint equations that describe all of the information we have about them.
These constraints take the form of equations that should equal zero; when we solve
the set of functions we minimize the sum of the square errors in the constraint
equations. Since we plan to use NLLS to solve our system, we also derive a
linearized version of each constraint.
Range Constraints Each range between two nodes forms a range constraint
equation. This equation takes the form of the distance formula, with the coordi-
nates of each node being the unknowns we intend to estimate. So, given a range
estimate between nodes i and j of Ri,j, we define the range constraint to be:
    V·D_{i,j} = R_{i,j}, where    (4.14)
    D_{i,j} = √((X_i − X_j)² + (Y_i − Y_j)² + (Z_i − Z_j)²).    (4.15)

Linearizing this constraint,

    dX_i·V·(X_i − X_j)/D_{i,j} + dY_i·V·(Y_i − Y_j)/D_{i,j} + dZ_i·V·(Z_i − Z_j)/D_{i,j}
      − dX_j·V·(X_i − X_j)/D_{i,j} − dY_j·V·(Y_i − Y_j)/D_{i,j} − dZ_j·V·(Z_i − Z_j)/D_{i,j}
      = R_{i,j} − V·D_{i,j}.    (4.16)
Note that we added a scalar V to the constraint above to enable the range
constraints to expand uniformly, for example as a function of temperature. In
cases where certain nodes are anchored, V can allow the map to adjust to accom-
modate expansion in the ranging measurements relative to fixed anchor points.
V can be estimated separately from this system as described in Section 4.4.
Azimuth Constraints Each azimuth estimate recorded by a node can be ex-
pressed with an azimuth constraint. This equation uses the arctangent function
to relate the azimuth angle θi,j to the node coordinates and orientation parameter.
So, given an azimuth estimate θi,j measured at node i:
    arctan(Y_j − Y_i, X_j − X_i) + Θ_i = θ_{i,j}.    (4.17)
Note that in this equation, arctan is used to mean the atan2() function, which uses the signs of its arguments to return angles in all four quadrants. Note also that we do not consider Θi a variable in this system of constraints. Instead, we iteratively estimate Θi interleaved with iterations to estimate (Xi, Yi, Zi). Thus, linearizing this constraint,
    dX_i·T_{i,j}·(Y_j − Y_i)/(X_j − X_i)² − dX_j·T_{i,j}·(Y_j − Y_i)/(X_j − X_i)²
      − dY_i·T_{i,j}·1/(X_j − X_i) + dY_j·T_{i,j}·1/(X_j − X_i)
      = θ_{i,j} − (arctan(Y_j − Y_i, X_j − X_i) + Θ_i), where    (4.18)

    T_{i,j} = 1 / (1 + ((Y_j − Y_i)/(X_j − X_i))²).    (4.19)
Zenith Constraints Each zenith estimate recorded by a node can be expressed
with a zenith constraint. These equations relate the Z dimension to the observed
zenith angles φi,j. Because node deployments are often mostly in a plane, these
equations are strong contributors to accurately estimating node position along
the Z axis4.
So, given zenith estimate φi,j measured at node i:
    arctan((Z_j − Z_i) / √((X_i − X_j)² + (Y_i − Y_j)²)) = φ_{i,j}.    (4.20)
Linearizing this constraint,

    dZ_i·T′_{i,j}·(−1/D′_{i,j}) + dZ_j·T′_{i,j}·(1/D′_{i,j})
      − dX_i·T′_{i,j}·K_{i,j}·(X_i − X_j) + dX_j·T′_{i,j}·K_{i,j}·(X_i − X_j)
      − dY_i·T′_{i,j}·K_{i,j}·(Y_i − Y_j) + dY_j·T′_{i,j}·K_{i,j}·(Y_i − Y_j)
      = φ_{i,j} − arctan((Z_j − Z_i)/D′_{i,j}), where    (4.21)
4Because of the aforementioned poor error properties of angular estimates, this also means that the Z estimates may not be very accurate in flat deployments.
    T′_{i,j} = 1 / (1 + ((Z_j − Z_i)/D′_{i,j})²),    (4.22)
    D′_{i,j} = √((X_i − X_j)² + (Y_i − Y_j)²),    (4.23)
    K_{i,j} = (Z_j − Z_i)·((X_i − X_j)² + (Y_i − Y_j)²)^(−3/2).    (4.24)
Anchor Points In all of these constraints, we make no assumptions about
anchor points. In fact, our position estimation system works well without any
anchor points other than a single point that is defined to be the origin: the
mixture of angular constraints and range constraints will fix the map into a
specific orientation, while the origin point fixes it to a specific coordinate frame.
Alternatively, any number of points in the system can be assigned constant
coordinates. Any constraints that involve only those nodes drop out of the system,
and constraints that relate that node to other nodes retain only the terms relating
to the non–anchor node. Anchor nodes can be helpful, by reducing error that
might accumulate in very large systems5. However, anchors can also introduce
errors, because their placement is not perfect, and their coordinates may not be
scaled to match the ranges.
To address the scaling issue, the V factor in the range constraints enables the
ranges to scale to match the coordinates of the anchor points. Placement errors
can be absorbed by the existing range and angle constraints, although the errors
in placement are likely to have a different distribution than the errors in range
and angle measurements, so the weighting of those constraints may need to be
modified. It may also be more appropriate to process placement constraints in
5However, the results in Chapter 10 show negligible error for the largest systems we could deploy (10 nodes).
the “interleaved” solution step described in Section 4.4, or a post–processing fit
step as described in Section 4.6.
4.3.2.3 NLLS Constraint Weighting
Once we have our linearized constraint matrix, we can solve it using standard
linear algebra techniques such as Singular Value Decomposition [PTV92]. We
can then add the refinement values into our estimates and iteratively converge on
a solution. Each iteration will minimize the sum of the squares of the “residual”
values for the constraint equations. However, just as in the R–θ case, these
residual values are naturally in different units, and must be properly normalized
according to estimates of the variance of the measurements.
We can use a similar technique here to the case of R–θ. The range constraint
residuals are in units of cm, and from Table 4.1 we know the standard deviation
of the ranges. For the angular constraints, the residuals are measured in radians.
Thus, in order to make them commensurate with the range residuals we need to
apply weights to scale the standard deviation of the azimuth and zenith angles
to equal the standard deviation of the range measurements. Therefore, we can
define weights:
    w_range = 1.0,    (4.25)
    w_azi = σ_range / σ_azi,    (4.26)
    w_zen = σ_range / σ_zen.    (4.27)
In our implementation, we also take into account confidence estimates for
the angular constraints, as well as varying σzen as a function of φ. Since these
weights are constants for each constraint equation, we can apply an arbitrarily
sophisticated weighting scheme and variance estimate.
4.4 Interleaved Orientation Estimation
In the previous section, we presented two different schemes for estimating the
node positions (Xi, Yi, Zi). Both of these schemes assumed that the node orien-
tations Θi were already known and constant. While we can get an initial estimate
of the orientations in the algorithm described in Figure 4.1, our system must also
refine this estimate. Unfortunately, including Θi as a variable in the same system
of constraints introduces problems because it allows the entire map to rotate.
To address this, we iteratively refine each set of variables in an interleaved
fashion. After each refinement step, we use the updated node positions (Xi, Yi, Zi)
to derive new estimates for the node orientation. The node orientations are
computed by averaging the differences between the measured angles and angles
computed based on the current position estimates. This average represents a bias
between the observations and the estimated positions, which we deduce to be the
orientation of the array.
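The averaging step needs care at the ±π wraparound; a standard fix (our sketch, assuming azimuths in radians) averages the differences on the unit circle:

```python
import math

def orientation_bias(measured, computed):
    """Circular mean of the differences between measured DOA azimuths and
    the azimuths implied by the current position estimates. Averaging on
    the unit circle avoids errors at the +/-pi wraparound."""
    sum_sin = sum(math.sin(m - c) for m, c in zip(measured, computed))
    sum_cos = sum(math.cos(m - c) for m, c in zip(measured, computed))
    return math.atan2(sum_sin, sum_cos)
```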
After each pass improves the orientation estimates, the position estimates
should likewise improve because the angular constraints should be more consis-
tent. This process continues until a termination condition is reached. In the case
of the R–θ algorithm, this process terminates when the orientation correction
drops below a fixed threshold. In the case of the NLLS algorithm, this pro-
cess normally terminates when all components of the NLLS refinement step drop
below a fixed threshold.
For NLLS, the orientation correction introduces an additional problem. If
there is an overall bias in the angular constraints, correcting the orientation will
cause the NLLS algorithm to react by rotating the entire map in the opposite
direction. We address this with two additional mechanisms. First, we rotate the
map after each orientation correction, so that the orientation offsets of the nodes
always average 0. Second, we stop improving the yaw estimate after the first 10
iterations of the NLLS refinement algorithm. Since the yaw estimate converges
quickly, continuing to estimate and rotate serves no useful purpose.
After the tenth iteration, we perform an angular “sanity check”. Because
we now have a reasonable estimate of both yaw and position, we can tell whether any angles are significantly off relative to the position
estimates. These bad angles are likely to be associated with reflected ranges,
since the reflected path arrives at a different angle than the line–of–sight path.
During the angle check, we locate the worst angular inconsistency in the whole
system, and drop the range and angles associated with that angle if the error is
greater than 20 degrees. We then re–enter the yaw estimation mode for another
three iterations, before performing another angular check. If the angular check
finds no outlier angles, we then drop out of yaw estimation mode for the remainder
of the algorithm.
In addition to estimating the “yaw” orientation of each array, we also estimate
the “pitch” and “roll” of each array using a similar averaging technique. However, while we use the “yaw” estimate in the azimuth constraint equations to correct the azimuth DOA measurements, we did not find that correcting the zenith angles was helpful. In any case, we assume that deploying the arrays such that they are level is relatively easy and that the accuracy of zenith angles is less critical to the applications.
Other corrections and estimations can be included in this interleaved step,
for example the estimation of the V scaling parameter. However, we leave the
exploration and implementation of these ideas to future work.
4.5 Outlier Rejection Using Studentized Residuals
One of the fundamental assumptions underlying this algorithm is that the error
in the inputs can be modeled as Gaussian. However, in practice this is not
always true. The error observed in ranging and angular estimates often includes
very large outliers that can wreak havoc on a system of constraints. Typically,
outliers arise when the line–of–sight from sender to receiver is blocked and a
strong reflection is observed. These reflections arrive at the wrong angle and
in addition take a much longer path than the original. If the obstruction is a
solid and permanent one, repeated experiments typically yield the same wrong
answer, throwing off error estimates based on the variance of individual
measurements. While the effort in Chapter 3 has resulted in a highly effective
and resilient ranging sensor, in many environments it will still yield incorrect
answers.
To address the problem of outlier rejection, we use the technique of studentized
residuals [Har05] [WJH97]. Analysis of raw residuals in the solution of a linear
system does not generally yield useful information, because the largest residuals
are often not the most important ones. In fact, the opposite is true, because the
constraints that are “easiest to move” will yield the largest residual error terms.
Studentized Residuals is a technique that weights the residuals inversely to
the standard deviation of that residual’s value in response to perturbations to the
system. That is, if a particular residual would change dramatically in response
to other constraints being removed, that residual would be considered to have
a high variance. Thus, Studentized Residuals normalizes the magnitude of each
residual so that a high value connotes both a large residual error and a low
variance—suggesting that this error has a broad impact on the system.
The rejection heuristic runs after the system has converged. If a Studentized
Residual is found to be over a fixed threshold, the constraint corresponding to
the largest residual is dropped and the system is run again to convergence. If no
residual is over the threshold, the estimation is considered complete. The fixed
threshold is derived empirically; in our experience (see Chapter 10) a threshold of
4 works well.
4.6 Performance Metrics
In Chapter 10 we present the results of our experiments running our system.
However, first we need to define metrics to accurately measure the performance
of our system.
We use two metrics to assess the effectiveness of the position estimation algo-
rithm: a quality–of–fit metric that is independent of ground truth, and a position
error metric after an affine fit to ground truth. These metrics are similar to, but
slightly modified from, those discussed in [SMP03] and [SMP02].
The quality–of–fit metric is formed by the distribution and statistics of the
residual error terms of the constraints. For example, consider the average differ-
ence between the measured ranges and the computed ranges based on the position
estimates:
\frac{2}{N(N-1)} \sum_{i<j} \left| R_{i,j} - \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2} \right|. (4.28)
This formula provides some insight into the degree of consistency in the even-
tual fit. If this average is on the order of the expected variance in the range
measurements, and if the distribution of errors is normal, this suggests that the
fit is very consistent and that if the system is well constrained, the results are
likely to be correct. This is similar to the EF metric developed in [SMP03]
and [SMP02], except computed as an average rather than a sum. It is unclear
how to properly interpret a sum, as its magnitude varies greatly with the number
of input ranges in the system.
The position error metric relies on access to ground truth positions (X̄i, Ȳi, Z̄i)
for each node i. However, when computing the distance between corresponding
points, we want to make comparisons that take
only the shape into account, ignoring differences of translation, scale, and ro-
tation. The Procrustes method is a collection of techniques for characterizing
shape [DM98]. This method formalizes techniques that extract a characteristic
shape from a set of points, and define transforms that filter out translation, scale,
and rotation to fit an experimental dataset to a characteristic shape. By fitting
our estimated maps to ground truth landmarks, we accomplish several objectives
at once: we relate our map to a real coordinate system, we define a metric to
measure position error, and we define a way to compare repeated trials to each
other.
Our fitting process is similar to the Procrustes methods, but we have modified
it to implement several forms of outlier rejection. Although our fit metric has
given good performance, these methods might be improved more formally in
future work. The fit process involves four steps. First, we compute a scaling
factor from one map to the other. Second, we translate the maps so that the
node closest to the centroid is the origin, and scale the maps according to our
computed scaling factor. Third, we rotate the maps in three dimensions according
to the average angular offsets between corresponding nodes. Finally, we translate
the estimated map by the average difference between all corresponding points.
This fit method is not perfect, but it allows the easy integration of outlier
rejection heuristics that would otherwise be difficult to implement. Several tech-
niques that are resilient to outliers are described in [DM98], including Least
Median of Squares solutions (LMS). We leave these implementations to future
work, and describe our present heuristic in the following sections.
Determining a Scaling Factor. Our transformation first estimates a scaling
factor as the ratio, between the computed map and the ground truth map, of
the summed distances from each node to its nearest neighboring node in the
computed map,

V = \frac{\sum_i \sqrt{(X_i - X_{M_i})^2 + (Y_i - Y_{M_i})^2 + (Z_i - Z_{M_i})^2}}{\sum_i \sqrt{(\bar{X}_i - \bar{X}_{M_i})^2 + (\bar{Y}_i - \bar{Y}_{M_i})^2 + (\bar{Z}_i - \bar{Z}_{M_i})^2}}, where (4.29)

M_i = \arg\min_{j \ne i} \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2}, (4.30)

and barred coordinates denote ground truth positions.
We initially used the centroid size metric described in [DM98], but we found it
to be very susceptible to mis–placed nodes that throw off the centroid. Our metric
is a modified version of the baseline size metric, which uses the distance between
two arbitrary points; our metric simply averages a collection of baselines. This
metric is resilient to outliers caused by a single node being mis–placed. Rather
than adding several ranges to a mis–placed node into the sum, this metric will
tend to only count errors from mis–placed nodes once.
Translation to the Centroid Node. Next, we locate the node closest to the
centroid of the map, and translate the maps so that that node is the origin in
both maps. This differs from the usual methods used in Procrustes techniques,
which locate the maps according to their actual centroid. The problem with using
the true centroid is that, similar to the problem with the centroid size metric,
mis–placed nodes often add significant error to the centroid.
Except in the unlikely case that the center node is mis–placed, we avoid this
problem by locking to a single node in the center of the map. The most central
node is rarely significantly mis–placed, because it is typically one of the most
well–constrained nodes. In general, this method does introduce some distortion,
because the error in the placement of the “origin” node is distributed throughout
the map. Using the true centroid might be possible if we employed an iterative
technique to eliminate outliers and re–fit. We leave this possibility to future work.
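Locating the node closest to the centroid might be sketched as (illustrative names):

```c
#include <math.h>

/* Index of the node closest to the centroid of the map; both maps are
 * then translated so that this node sits at the origin. */
int centroid_node(double pos[][3], int n)
{
    double c[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 3; k++)
            c[k] += pos[i][k] / n;

    int best = 0;
    double best_d = INFINITY;
    for (int i = 0; i < n; i++) {
        double dx = pos[i][0] - c[0];
        double dy = pos[i][1] - c[1];
        double dz = pos[i][2] - c[2];
        double d = dx * dx + dy * dy + dz * dz; /* squared distance suffices */
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}
```

Note that while a badly placed node can shift the centroid itself, the node *selected* by this procedure is almost always a well-constrained one near the middle of the map.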
Rotation. Next we rotate our estimated map in three axes to match the ground
truth map. To determine the rotation, we compute a weighted average of the
angular offsets of each corresponding pair of nodes,
\theta = \frac{1}{N-1} \arctan\left( \frac{\sum_{i>0} R_{0,i} \sin(T_i)}{\sum_{i>0} R_{0,i} \cos(T_i)} \right), where (4.31)

T_i = \arctan\left( \frac{\bar{Y}_i}{\bar{X}_i} \right) - \arctan\left( \frac{Y_i}{X_i} \right), (4.32)

and barred coordinates denote ground truth positions.
In order to make the metric more resilient to mis–placed nodes, the angular
average above is computed over the subset of angles centered on the largest
cluster of angles within a 20 degree span. Angles lying outside of that span are
dropped from the average.
Then we apply a rotation about each of the three axes, together with the scaling,
to bring the estimated map into alignment with ground truth. For example, the
rotation about the Z axis is

X'_i = (X_i \cos\theta - Y_i \sin\theta)/V, (4.33)

Y'_i = (X_i \sin\theta + Y_i \cos\theta)/V. (4.34)
This method might be improved by using a Least Median of Squares solution,
and by solving for all three axes simultaneously.
Final Translation. Finally, the translation between each pair of points is com-
puted and averaged, and that average translation is applied to translate the es-
timated map.
Because these transforms scale and rotate without warping, they serve to
match the estimated map to the ground truth without altering the consistency
achieved by the estimation algorithms. After fitting, we then compute the po-
sition error metric, both as a projection to the (x, y) plane and considering all
three dimensions. We consider both metrics important, since our typical “flat”
deployments tend to have greater error along the Z axis.
4.7 System Considerations
So far we have discussed a number of algorithms and heuristics that transform
raw range and DOA estimates into position estimates. Separately from these
algorithms, many system considerations influence how the position estimation
works.
First, we have seen that it is more important that the range data be consistent
than that it be completely accurate. In the absence of sophisticated calibration
techniques to compensate for environmental factors, we expect that range data
will be scaled by an unknown factor. However, if the ranges are consistent, the
resulting map will be uniformly scaled and it can be made quite precise by fitting
it to a set of anchors. This suggests that the system should be designed to capture
a consistent snapshot of ranges in a brief span of time in order to minimize the
impact of changing environmental conditions.
Second, as the size of systems grows, the position estimation process may need
to be broken down into phases and possibly distributed among a set of nodes.
However, since effective outlier rejection requires an over–constrained system, we
do not suggest a maximally distributed system; rather, a distributed set of nodes
that each perform centralized computations on data from its local region. Even
for a given centralized computation, many of these techniques involve expensive
matrix operations that grow as O(N3). To make these algorithms practical for
embedded systems, we may need to reduce N by estimating positions for only a
part of the map at a time.
Third, there are many different network protocol schemes that might be ap-
plied to this problem. In our initial implementation, we use a protocol called
StateSync to reliably publish all of the raw range and angle data, broadcast to
neighbors N hops away. Since every node receives all of the raw data, they all
locally compute a map, without requiring any further coordination. This works
well for small networks, and it is very simple architecturally, but it is not the
most efficient technique. A much more efficient structure, both in terms of net-
work and CPU usage, would elect leaders who coordinate the ranging process.
These leaders would first coordinate the nodes to schedule ranging into a brief
time period. After ranging, the nodes would send the results back to the leader,
which would process it into a map. Each leader would then publish their map to
the entire network, and maps from different leaders would be stitched together
into a single comprehensive map. We leave this implementation to future work.
CHAPTER 5
Robustness
So far in Part I, we have largely discussed algorithms, only briefly touching on
practical issues. However, making a system like this work is much more than
inventing an algorithm that works. To achieve a robust solution, a great deal of
effort must be devoted to handling error conditions that occur in deployments,
and many layers of scaffolding must be constructed to develop and test such a
system. In this Chapter, we briefly present a number of things that can go wrong,
and some general approaches for combating those problems. Then, in Part II we
discuss the scaffolding we have created to address these issues in building our
platform.
5.1 What Can Go Wrong?
Problems and failures can occur at many layers of the system.
Hardware Malfunctions. At the lowest layer, many different hardware mal-
functions can occur. A failure in the power system (such as battery failure, loose
wiring, or water damage) would cause a reboot or a permanent node failure, to
which other nodes would need to adapt. The wiring to the RF antenna can fail,
leaving the node unable to communicate, or in some cases able to send but not
receive. The wiring in the microphone array can fail, causing one or more input
channels to fail, or causing the output channel to fail.
Resource Limits. Resource availability failures can occur due to software
glitches or unusual conditions that cause unexpected resource usage patterns.
Memory or flash exhaustion can cause the system to become unresponsive, or to
reboot. Hardware resources can also be unavailable if there is a hardware prob-
lem such as a loose PCMCIA card or some other malfunction. Unusual hardware
conditions or software bugs can also cause components to restart when they en-
counter cases that they are unprepared to handle. We have also observed some
problems in the Linux kernel on our system (ARM Linux version 2.6.10), specifically
problems with the JFFS2 flash filesystem when it gets close to full, and also
some sporadic problems with our sound and wireless card drivers.
The Wireless Channel. The wireless channel is a well–known source for a
wide range of failures. Peer–to–peer connectivity is time–varying and often can
be asymmetric. Connectivity is affected by the physical properties of the channel
(which themselves are a function of the environment), as well as by the use of the
channel by other nodes in the system. Collisions with peers and, in the case of
the “hidden terminal effect”, collisions with nodes that cannot be received, are
another important source of message loss [Rap96].
Time Synchronization. At the time–synchronization level, synchronization
to a node may not always be achievable because of connectivity. The Reference
Broadcast Synchronization (RBS) [EGE02a] technique used in our system (de-
scribed in Chapter 7) places certain requirements on connectivity: that every
node to be synchronized receive broadcast messages in common with at least one
other node, and that a contiguous chain of such relationships be present between
any two nodes that need to be synchronized. Because this connectivity require-
ment is more stringent than simple connectivity, it is possible for a node to be
present in the system and communicative, without being able to be synchronized.
Ranging and Multilateration. Finally, the ranging and multilateration lay-
ers must deal with these different failures in the underlying system, as well as
detecting and reducing the many forms of error that can appear in the rang-
ing measurements. Ranging errors can occur due to detection of late arrivals
in obstructed environments, failures to detect due to insufficient signal quality,
or system failures that interfere with time synchronization or other parts of the
system.
To make our system work in spite of this variety of failures, we must develop
a system that incorporates many layers of robustness.
5.2 Strategies for Robustness
We apply several strategies to allow the system to continue working in the face
of these varied types of fault.
Fault Detection and Reporting. Often software components can detect and
report faults through self–checks and upon detection of error conditions. A fault
reporting service enables software components to report faults which are then
propagated to an operator who can address the problems. While we want the
system to continue to operate in the presence of faults, this mechanism enables
the operator to debug the system. In our experience this has been crucial to
getting the system deployed, as there are invariably some hardware failures due
to loose wires or deployment errors, which would otherwise be very tedious to
track down. The fault reporting system can also give the operator a hint to
investigate further when things go wrong.
Soft State Design. One of the most powerful tools in the pursuit of robustness
is soft state design [Cla88]. In soft state, an operation is periodically executed,
without making assumptions about prior state from past executions, relying on
caching and low–level retry to enhance performance. This technique allows a
trivial recovery from any possible error condition, simply by throwing away any
information that has been previously cached and running through the same code
path. Since all error conditions are handled as part of the normal code path, we
avoid the common problem of latent bugs in the error paths. This principle is
generally one of reducing the number of states a system can be in, reducing the
number of code paths, and avoiding lockups caused by inconsistent views of the
state of the system.
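As an illustrative sketch (all names hypothetical), a soft-state neighbor table handles discovery, refresh, and recovery after a restart through a single code path, and handles peer disappearance with no teardown protocol at all: entries simply age out.

```c
#define MAX_NEIGHBORS 32
#define NEIGHBOR_TTL  30 /* seconds: entries expire unless refreshed */

struct neighbor {
    int id;       /* node ids assumed nonzero */
    long expires; /* wall-clock second after which the entry is stale */
};

static struct neighbor table[MAX_NEIGHBORS];

/* On every received announcement, (re)install the sender with a fresh
 * TTL. There is no distinction between "new" and "known" peers: the
 * same path handles discovery, periodic refresh, and recovery after
 * our own restart (which simply clears the table). */
void on_announce(int id, long now)
{
    int slot = -1;
    for (int i = 0; i < MAX_NEIGHBORS; i++) {
        if (table[i].id == id) { slot = i; break; }        /* refresh */
        if (slot < 0 && table[i].expires <= now) slot = i; /* reuse stale */
    }
    if (slot >= 0) { /* if the table is full of live entries, drop it */
        table[slot].id = id;
        table[slot].expires = now + NEIGHBOR_TTL;
    }
}

/* A peer is "up" iff its entry has not yet expired. */
int neighbor_up(int id, long now)
{
    for (int i = 0; i < MAX_NEIGHBORS; i++)
        if (table[i].id == id && table[i].expires > now)
            return 1;
    return 0;
}
```

Any error condition, including a crash of either peer, is recovered by the same periodic announce/expire cycle, so no separate error-handling path exists to harbor latent bugs.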
Reactive State Machines. Soft state design was popularized by distributed
systems and network protocols. However, it can also be applied within a node
to the communications channels between services, as well as to the design of the
services themselves. Reactive state machines are the application of the soft state
principle to the design of state machines.
In a reactive program, inputs feed in from a number of sources, and at each
step, the current set of inputs are used to determine the next output. In this
type of system the world model carried forward from step to step is designed to
be explicit. The term “reactive”, which was coined by Brooks [Bro86] and the
robotics community, suggests that rather than proceeding with old plans based
on old models, fresh inputs should modify the model on the fly and thus change
the behavior of the system on the fly.
For example, in application to wireless networking, many layers of the system
might need to react to a recently disconnected peer, changing their behavior in
response to that new condition.
Reduce and Simplify Inter–Node Dependencies. Inter–node dependen-
cies are costly because of the added complexity in handling the cases where a
peer node misbehaves or disappears. For example, any time your algorithm re-
quires a special leader node, there must be some additional mechanism to handle
the case that something goes wrong with the leader—whether it goes off–line
permanently or temporarily, or develops some more insidious fault that might
only affect its performance. Thus, design choices that simplify or eliminate de-
pendence on other nodes often result in a net simplification of the system. For
example, if there is a task that any node can do, it may make the most sense for
all nodes to perform that task independently, rather than devising some scheme
to have a single node perform the task and publish the result.
Fault Isolation. Process isolation in Linux enables a portion of the system to
fail without causing the whole system to fail. This enables the system to continue
to operate even if a subcomponent fails and restarts due to a software bug. The
benefit of this capability is that exit() is an acceptable way to address failures
that seem unrecoverable or that are unexpected. The caveat to this capability
is that the system components must be designed to survive the restart of their
underlying components. This increases the burden on the designer, who must at a
minimum be aware of the issues involved. However, this property is important as
the complexity of the system increases and when new conditions are experienced
during deployments.
The belief that more failures occur in the field than in the lab isn’t just an
urban legend—it is the result of running code for the first time in a different
environment. New timing relationships, new connectivity properties, and new
sensor inputs practically guarantee that the system will behave differently in the
field. As a result, the system may enter states that were not well exercised during
lab testing.
5.3 Successfully Managing Complexity
In Part II, we will see how we apply these ideas to build a layered stack of soft-
ware and system components that we can then integrate into a working system.
In this process, we manage complexity by dividing the problem into component
services, such as time synchronization, sampling, ranging, etc. Each service en-
capsulates a well–defined chunk of functionality, that is large enough to be useful
and small enough to be manageable, and provides a service to other modules and
to applications through published APIs. Throughout this process, we maintain
system visibility as a first–order goal: each service provides numerous debugging
and diagnostic interfaces in addition to its API. We will see how we apply the
many robustness strategies outlined above in the construction of our platform,
successfully addressing the many failure modes.
CHAPTER 6
Emstar: a Software Framework
Through our initial experiences developing distributed sensing systems we dis-
covered numerous impediments to developing deployable systems. While many
early projects found immediate success at solving relatively simple problems in a
one–off demo context, it proved more difficult to build more complex and robust
solutions on these early successes [GSR04] [ADB04] [ARE05]. Our desire to ex-
plore deeper and more powerful applications led us to develop Emstar [GEC04]
[GSR04] [EGE04] [EBB03b], a software framework for developing distributed
sensing systems from Linux systems.
Emstar is a complete software framework and development/deployment en-
vironment designed for distributed sensing applications. The role of Emstar for
distributed sensing applications is analogous to that of GNOME [War04] or the
Win32 API [RN97] for GUI applications. Just as GNOME and Win32 pro-
vide tools and libraries to build a universe of GUI applications with a common
user experience, Emstar provides tools and libraries to build a growing set of
inter–operable distributed sensing applications and system components. Where
GNOME and Win32 provide several different “Save As” dialog boxes and a li-
brary to build tear–off pull–down menus, Emstar provides several different link
estimation modules and libraries to build drivers for new link–layer network de-
vices.
6.1 Design Principles
Emstar was developed with several design principles in mind. These principles
resulted from early experiences with distributed wireless sensing systems, and
from considering the ways in which existing programming interfaces meshed or
clashed with the needs of these systems.
6.1.1 Inter–node communication is not usually transparent.
In the design of the Internet, TCP, an end–to–end reliable stream transport layer,
serves the purposes of most applications. This reliable layer, combined with the
overall high performance of the Internet, lends itself to remote/local transparency
via sockets. A local service that is accessed via a socket can transparently be
accessed remotely.
However, in the case of wireless sensor networks, local/remote transparency
is often undesirable. The reduced link reliability of wireless networks and the
absence of a fixed topology translate to higher communication costs, from a
combination of increased transmission costs and increased complexity of control
protocols. These increased costs of communication in a wireless network mean
that the application needs to know whether a service is local or remote—therefore
transparency is often counterproductive! When compared with a local transac-
tion, a remote transaction may have much higher latency, much higher energy
cost, and may fail because of connectivity failures. Because masking these differ-
ences is rarely beneficial, Emstar was designed to focus on support for interfaces
to local services, under the assumption that access to remote services would re-
quire more application–specific solutions.
6.1.2 The system within a node is complex and benefits from dis-
tributed system design principles.
The development of the Internet has highlighted many techniques for build-
ing complex systems out of components that are individually subject to fail-
ure [Cla88]. These principles also apply to the operation of individual nodes in
many distributed sensing applications, for two primary reasons. First, nodes in
distributed sensing applications must operate robustly in challenging environ-
ments. Because the environment provides inputs and circumstances that are
difficult to predict or reproduce in laboratory environments, a robust design is
often required to survive a deployment.
Second, in many distributed sensing applications, network costs mandate that
much of the intelligence in the system must be pushed into the network rather
than centralized. For example, 10 nodes hosting 4 channels of streaming audio
constitute an aggregate data rate of 4 MB/sec. Even if we set aside issues such
as packet loss, contention, adaptive transmit rates, and control overhead, this
rate is over 3 times the nominal capacity of an 802.11b card. Pushing intelligence
into the network increases both the complexity of individual nodes and of their
interactions. Problems that are easy when centralized, such as selecting a leader,
are yet another protocol design challenge when they are distributed, with the
need for fault recovery at every level.
These considerations have led to the introduction of distributed systems de-
sign principles into the inter–module interfaces within Emstar nodes. For exam-
ple, Emstar modules are typically designed to recover from the failure of other
modules in the system, using techniques such as soft–state refresh and dynamic
registration and unregistration of services at run–time. In order to reduce the
burden on the system designer, many of these features are built–in features of
the Emstar libraries.
System visibility is critical from the foundation up. As the complexity
of our systems increases, they become more difficult to debug. A critical aspect
of this is a capability to gain direct visibility and insight into the workings of
individual modules in the system. By using the UNIX device file interface as
its IPC interface, Emstar’s inter–module interfaces can be browsed and often
accessed directly from the shell. In some cases, transactions on the IPC channels
can be directly viewed without modifying the code. Debugging devices that
provide insight into the current state of a module are cheap and easy to add, and
are often much more convenient than using log files. These techniques enable
rapid fault isolation and debugging.
Interleaved, interacting events are the common case. Similar to the
principles of “reactive robotics”, distributed sensing systems tend to operate in a
“reactive” mode in which their immediate behavior is heavily influenced by sensor
inputs [Bro86]. This reactive style fits well within an event–driven programming
model, because events and inputs of different types arriving asynchronously must
be integrated to influence the immediate behavior of the system. From a system
design perspective, reactivity requires timely delivery of event notification among
the modules in the system, as opposed to a polling–based approach.
System development tools need to support “real code simulation” and
“emulation” for quick turn–around debugging. One of the most diffi-
cult aspects of distributed sensing systems is the difficulty of effectively testing
the system. Experience has shown that deployments often expose new kinds of
problem that did not initially appear in simulation. This issue underscores the
Layer 0: FUSD (low-level IPC)
Layer 1: Glib (handling events on IPC)
Layer 2: Device patterns and libraries (IPC mechanisms for a variety of interactions)
Layer 3: Existing modules and services (useful components for applications)
Layer 4: Extra tools (to help run, maintain, and debug applications)
Figure 6.1: The five layers of the Emstar framework.
importance of “real–code simulation” in which systems that experienced prob-
lems in deployment can be brought back into the lab and tested under various
simulated conditions. Another important element is the ability to run in vari-
ous “emulation” modes, where centralized, high–visibility simulations can be run
with real hardware in the loop. Often these techniques are the only practical
ways to debug a system that fails in the field.
6.2 How Emstar Works
Software frameworks are by nature difficult to conceptualize because they have no
tangible instantiation, except as the foundation beneath an application. However,
a framework can be described by describing the services and interfaces it provides,
and the structure it imposes on an application. Described in this way, we can
describe Emstar as a five layer system [Byt05], as shown in Figure 6.1 (this
figure is due to Martin Lukac).
6.2.1 Layer 0: FUSD Syscall Inter–process RPC
The lowest layer of Emstar is FUSD [GEC04] [Els02]. FUSD is a micro–kernel
interface implemented in Linux that allows user–space server processes to register
character device files and handle system calls on those devices. FUSD provides a
convenient way to enable cross–process message–passing, while at the same time
exposing shell–accessible interfaces to internal state and control functions. In
some respects, FUSD is similar to the AT&T Plan 9 system [PPT90], but has
the advantage of running on any platform that runs Linux, rather than requiring
the port of a complete OS to the latest embedded hardware. FUSD also has much
in common with the procfs and sysfs features of Linux, which expose control and
status interfaces to in–kernel features; the difference being that FUSD exposes
interfaces to user–space processes.
By enabling systems to be readily composed of separate processes, we benefit
greatly from fault isolation. A multi–process system prevents implementation
errors in one process from causing a complete system reset. This is an important
property for deployed sensor network systems because data from the field some-
times causes failures that did not occur in the lab. For example, one version of
our ranging system which was successfully tested in the lab, encountered a new
kind of failure in the field. The deployed system suffered from a certain type of
inconsistency in the ranging data early in the run, which would eventually be
resolved as more ranging data was collected. However, this inconsistency some-
times caused an exception in the multilateration engine, which in turn caused the
multilateration module to restart. If this restart had caused a complete system
restart, the system would never have gotten past the startup phase—but because
of process isolation our system was able to limp past that point and return valid
answers.
Fault isolation also means that combining components is less likely to result
in new failures. A more tightly coupled, single–process approach can lead latent
errors in a particular component to surface only when several components are
used in combination. By isolating components from each other, it is easier to
integrate systems of components of varied origin.
6.2.1.1 System Calls as Blocking RPC
A system call on a FUSD device represents a blocking RPC call to the server,
brokered by the kernel. For example, consider the following snippet of client
code:
int status, fd;
char buf[100];
fd = open("/dev/test", O_RDWR);
status = read(fd, buf, sizeof(buf));
In line 4, the read() system call results in the following sequence of events,
corresponding to the diagram in Figure 6.2:
1. Client process traps into the kernel and blocks.
2. Kernel marshals the arguments to the read() call into a FUSD message.
3. Kernel queues the FUSD message for the server process bound to “/dev/test”,
and wakes the server.
4. Server reads and processes any FUSD messages queued earlier (e.g. by other
clients).

5. Server reads out the new FUSD message and processes it.
6. After processing the message, Server marshals a response and writes the
response message to the kernel.
7. Kernel passes the response back to the Client and the Client’s system call
returns with a result code.
From this sequence of events, it is important to note that the client blocks for
the entire duration of the system call. Even if the system call is a “non–blocking”
call such as a read() on a file descriptor that is configured non–blocking, the client
will still be blocked until the server processes the message and returns a response,
e.g. EAGAIN to indicate that the server is not ready. This means that if the server
is unresponsive the client can block in a system call for arbitrary amounts of time.
Slow response times can be caused either by errors in the implementation of the
server or by the server being busy handling calls from other clients. Response time
can also increase if scheduling latency becomes significant under high load.
Despite this drawback of a blocking RPC model, there is also considerable
benefit to this approach. First, synchronous RPC calls can be made in a straight-
forward coding style, as opposed to using completion callbacks. Each syscall syn-
chronously returns a result code with a minimum of latency. Lengthy operations
are typically structured as a request which is accepted or rejected quickly in one
RPC call, followed by notification when the requested operation completes.
Second, syscalls are a very basic interface accessible from any POSIX appli-
cation, with no library required beyond the standard UNIX interface libraries.
Emstar device file interfaces are browseable within the device filesystem, and in
many cases can be accessed directly by existing UNIX programs such as cat. The
syscall interface is also narrow and therefore readily ported to operating systems
other than Linux.
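The round trip described above can be emulated entirely in user space. The following self-contained sketch stands in for the kernel broker with a socketpair and a forked "server" process; the message format and the function names here are invented for illustration and are not part of FUSD.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: a socketpair and a child process standing in for
   the kernel-brokered round trip.  The "wire format" is invented. */

static void serve(int fd) {
    char req[32];
    read(fd, req, sizeof(req));        /* steps 4-5: dequeue and process */
    const char reply[] = "zeroed";     /* step 6: marshal and write response */
    write(fd, reply, sizeof(reply));
}

/* Client side: returns the number of bytes received, reply left in buf. */
int do_blocking_rpc(char *buf, size_t cap) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {                    /* the "server process" */
        close(sv[0]);
        serve(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    write(sv[0], "read:32", 8);        /* steps 1-3: marshalled "syscall" */
    ssize_t n = read(sv[0], buf, cap); /* client blocks until the reply */
    waitpid(pid, NULL, 0);
    close(sv[0]);
    return (int)n;
}
```

The client's write()/read() pair mirrors the seven steps: the client remains blocked in read() until the server writes its response, just as a FUSD client remains blocked in its system call.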
Figure 6.2: Message timing diagram of a FUSD call. The middle column of the diagram
represents the FUSD kernel module.
6.2.1.2 Client–Server Connections in FUSD
Using FUSD, client and server are distinct roles. A process becomes a server by
registering a new device file and handling operations on that device file. A process
becomes a client by opening a FUSD device file. By successfully opening a device
file, a connection is established between client and server, which is named on the
client side by the file descriptor returned by open(). The client may invoke RPC
calls to the server at any time by making a system call on that file descriptor.
The client may also listen for asynchronous notification on that file descriptor
using select() or poll(). Thus, to communicate from server to client, the server
first notifies the client and the client then calls back to the server. For example,
the server might indicate “readable” and the client responds by calling read().
This asymmetric relationship has advantages. The primary advantage is that
clients can be written to lower standards than servers without compromising the
integrity of the system. That is, while an incorrectly implemented server can
permanently block a client, an implementation error in a client can’t cause a
server to fail. This is a consequence of the blocking semantics: while a server
can potentially cause a client to block, a client can at worst only pass malformed
arguments in a system call which are either caught by the kernel or should be
rejected by the server process.
The FUSD client–server connection provides certain guarantees ensured by
the kernel that enable fault isolation. The kernel ensures memory fault isolation
between the client and server processes. Any pointers provided as arguments to
a system call are checked for validity in the kernel and transferred to the memory
space of the destination process, thus protecting the client and server from each
other. In addition, in the event that a client or server terminates unexpectedly,
any open connections are automatically cleaned up. When a client terminates
with open connections, close() messages are generated and sent to the servers
handling those connections. When a server terminates with active clients, those
clients’ file descriptors are immediately notified with exception signals and any
future system calls on those descriptors will return EBADF error codes.
6.2.1.3 FUSD Dependency Graphs
An Emstar system typically involves dozens of components, each of which hosts
multiple servers and is client to several other components. For example, the
acoustic localization system described in this document, along with all of its Em-
star subcomponents, is composed of 23 components and 182 device file interfaces.
While processes are often both servers and clients of other processes, there is a
requirement that the dependency graph of clients and servers be loop–free. This
stems from the blocking nature of the system calls. A loop in the dependency
graph introduces the potential for deadlock, as shown in the left side of Figure 6.3.
In the diagram, process A is blocked in a write() system call as a client of process
B, but before handling that call process B attempts to make a write() call back
to process A. When a process is blocked in a system call as a client, it cannot
respond as a server.
Figure 6.3: A dependency loop, and using a broker service to break the loop.
Figure 6.3 also shows a way to resolve this type of circular dependency. The
most common solution is to restructure the system to add a new service that
can act as a “broker”. Building systems as a collection of services tends to lend
itself naturally to this type of structure, because most services naturally act as a
broker in the course of their operations.
Figure 6.4: Diagram showing how to use a thread and a queue to break a FUSD
dependency loop.
In some cases, there is no natural decomposition into strictly layered services.
When a strict layering is inconvenient, a thread and a queue can be used to break
a loop, as shown in Figure 6.4. Essentially, this decouples the client and server
portions of a process, with a queue between them.
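The decoupling in Figure 6.4 amounts to a thread-safe queue between the server and client halves of the process: the server side enqueues work without blocking, and a separate I/O thread drains the queue and makes the potentially blocking outgoing call. A minimal sketch, with hypothetical names (the real Emstar I/O thread is more involved):

```c
#include <assert.h>
#include <pthread.h>

/* Sketch of a bounded queue decoupling the server and client roles.
   Capacity and types are stand-ins; overflow handling is omitted. */

#define QCAP 8

typedef struct {
    int items[QCAP];
    int head, tail, count;
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
} queue_t;

void queue_init(queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Server side: enqueue and return immediately (never blocks on the peer). */
void queue_push(queue_t *q, int v) {
    pthread_mutex_lock(&q->mu);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mu);
}

/* I/O thread: block here, then make the outgoing call on the peer. */
int queue_pop(queue_t *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->mu);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return v;
}
```

Because only the I/O thread ever blocks in a system call on the peer, the process remains responsive as a server even while an outgoing call is in flight, which is exactly what breaks the deadlock cycle.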
6.2.1.4 The FUSD Device API
As we have seen, from a client’s perspective the FUSD API is simply the well–
known POSIX system call API. From a server’s perspective, FUSD provides
an API through which those system calls can be handled [Els02]. This API is
modeled along the lines of the character device API in the Linux kernel, with a
few key differences. See [RC01] for an introduction to the Linux character device
API.
Similar to the Linux character device API, FUSD handles the character device
system calls by calling handler functions specified by the server in the structure
shown in Figure 6.5. For example, when a client calls read(), the server’s read
typedef struct fusd_file_operations {
  int (*open) (struct fusd_file_info *file);
  int (*close) (struct fusd_file_info *file);
  ssize_t (*read) (struct fusd_file_info *file, char *buffer, size_t length,
                   loff_t *offset);
  ssize_t (*write) (struct fusd_file_info *file, const char *buffer,
                    size_t length, loff_t *offset);
  int (*ioctl) (struct fusd_file_info *file, int request, void *data);
  int (*poll_diff) (struct fusd_file_info *file, unsigned int cached_state);
  int (*unblock) (struct fusd_file_info *file);
} fusd_file_operations_t;
Figure 6.5: The FUSD file operations structure.
callback is called to handle that call. The fusd file info t pointer contains the
arguments for the call and other information about the calling process, includ-
ing an application–determined per–connection pointer. The server may either
return a return value, causing the read() to complete immediately, or may delay
the return, causing the client to block until a later time. A separate function,
fusd return(), is called to trigger an asynchronous return.
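The two return paths available to a handler can be sketched with stubs. Everything below (the struct layout, the FUSD_NOREPLY sentinel value, and the body of fusd_return()) is a stand-in chosen for illustration, not the real libfusd:

```c
#include <assert.h>
#include <stddef.h>

/* Stubbed sketch (NOT the real libfusd): illustrates the two return
   paths a read() handler can take -- immediate completion, or a
   deferred return completed later with fusd_return(). */

#define FUSD_NOREPLY (-0x1000)  /* hypothetical "reply later" sentinel */

struct fusd_file_info {
    int completed;       /* has the client's syscall returned yet? */
    int retval;          /* value handed back to the client */
    void *private_data;  /* per-connection pointer set at open() */
};

/* Stand-in for fusd_return(): completes a previously deferred syscall. */
void fusd_return(struct fusd_file_info *file, int retval) {
    file->retval = retval;
    file->completed = 1;
}

/* A read handler that defers when no data is ready yet. */
int my_read(struct fusd_file_info *file, int data_ready, int nbytes) {
    if (data_ready)
        return nbytes;       /* immediate return: client unblocks now */
    return FUSD_NOREPLY;     /* client stays blocked until fusd_return() */
}
```

The important control-flow point is that returning the sentinel leaves the client blocked in its system call, while a later fusd_return() from any event handler completes it asynchronously.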
A new client–server connection is established when the client calls open() and
the server accepts that open() by returning a value of 0 to indicate success. At that
time, the server may set a per–connection “private data” pointer. This pointer
enables the server to distinguish different client connections and respond to their
requests appropriately; it will be passed to subsequent handler callbacks relating
to that connection. The close() handler is called when the connection breaks for
any reason (e.g. if the client calls close() or if the client’s process terminates for
any reason.) The close() handler should do any clean–up and resource recovery
necessary to deallocate that connection.
The handlers for read(), write(), and ioctl() are used respectively to transfer
data to, from, and to/from a client process. The actual semantics of what this
means, e.g. what the server does with the data and what data the server
returns to the client, are application specific. However, the convention in Emstar is
to retain some similarity with the POSIX meaning of the calls. Emstar imple-
mentations typically design the semantics to be compatible with common UNIX
utility programs such as cat and shell functions such as echo.
Thus far, the FUSD API is very similar to the Linux character device API.
The main difference between the two lies in the way poll() is handled. In a Linux
driver, poll() is a callback function that is called whenever the polling process
awakens to see whether it should drop out of a blocking poll() or select(). This
Linux poll() callback is called from deep in the scheduler, a point in the kernel at
which a response is required immediately. Since queries out to a FUSD service can
potentially have unbounded latency (and in any case require a different process
to be scheduled), calling to a FUSD server to satisfy a kernel poll() request is not
an option. Consequently, FUSD preemptively requests poll() state and caches it
so that it can respond immediately from the cached version. The freshness of
the cache is maintained using the poll diff() callback function: a poll diff() call is
left “outstanding”, such that whenever the poll state changes from the current
cached state, the server is obliged to return that poll diff() with the update.
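The caching scheme can be sketched as follows; the state flag, the deferred-return sentinel, and the function names are hypothetical stand-ins for the real FUSD internals:

```c
#include <assert.h>

/* Hypothetical sketch of the poll-state cache: the kernel holds a cached
   state and keeps one poll_diff request outstanding at the server. */

#define POLLIN_FLAG 0x1
#define DEFERRED    (-1)   /* stand-in for "reply later" */

static int current_state = 0;        /* server's true poll state */
static int have_pending_diff = 0;    /* an outstanding poll_diff request */
static int pending_cached_state = 0; /* cached state the kernel sent us */
static int kernel_cache = 0;         /* kernel's cached copy */

/* Server-side poll_diff handler: answer only if the state differs. */
int poll_diff(int cached_state) {
    if (current_state != cached_state)
        return current_state;        /* stale cache: answer immediately */
    have_pending_diff = 1;           /* otherwise leave request outstanding */
    pending_cached_state = cached_state;
    return DEFERRED;
}

/* Server state change: complete any outstanding poll_diff request. */
void set_state(int new_state) {
    current_state = new_state;
    if (have_pending_diff && current_state != pending_cached_state) {
        kernel_cache = current_state;  /* "return" the diff to the kernel */
        have_pending_diff = 0;
    }
}
```

The kernel can thus answer an in-kernel poll() from its cache instantly, while the outstanding poll_diff guarantees the cache is refreshed as soon as the server's state actually changes.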
6.2.1.5 FUSD Performance
While FUSD has many advantages, the performance of drivers written using
FUSD suffers relative to an in–kernel implementation. To quantify the costs of
FUSD, we compared the performance of FUSD and in–kernel implementations
of the /dev/zero device in Linux. To implement /dev/zero using FUSD, we im-
plemented a server with a read() handler that returned a zeroed buffer of the
Figure 6.6: Throughput comparison of FUSD and in–kernel implementations of
/dev/zero, timing a read of 1GB of data on a 2.8 GHz Xeon, for both 2.4 and 2.6
kernels.
requested length. The in–kernel implementation implemented the same read()
handler directly in the kernel.
Figure 6.6 shows the results of our experiment, running on a 2.8 GHz Xeon.
The figure shows that for small reads, FUSD is about 17x slower than an in–
kernel implementation, while for long reads, FUSD is only about 3x slower. This
reduction in performance is a combination of two independent sources of over-
head.
The first source of overhead is the additional system call overhead and schedul-
ing latency incurred when FUSD proxies the client's system call out to the user–
space server. For each read() call by a client process, the user–space server must
first be scheduled, and must then itself call read() once to retrieve the marshalled
system call and writev() once to return the response with the filled data buffer.
This additional per–call latency dominates for small data transfers.
The second source of overhead is an additional data copy. Where the native
implementation only copies the response data back to the client, FUSD copies
the response data twice: once to copy it from the user–space server, and again
to copy it back to the client. This cost dominates for large data transfers.
In our experiments, we tested both the 2.6 and 2.4 kernels, and found that
the 2.6 kernel yielded an improvement for smaller transfer sizes. The 2.6 kernel's
advantage is more significant when many processes are running in parallel.
6.2.2 Layer 1: GLib Event System
As we discussed in Section 6.1, Emstar is designed to support “reactive”, event–
driven designs. To address these needs, Emstar incorporates an event system
that supports the management and multiplexing of I/O and timer events in a
modular way. Rather than invent a new event system, Emstar uses a preexisting
system that is part of the GLib library, a standard library widely used in Linux
and open–source projects. In order to minimize dependence on any particular
event system, Emstar defines a thin “glue” layer, shown in Figure 6.7, that in the
current implementation connects the Emstar codebase to the GLib events API,
but could be replaced with some amount of effort with another event system.
The Emstar events API handles two kinds of events: timer events, which are
optionally retriggerable, and I/O events, which enable poll flags to be watched for
a specific file descriptor. Since all Emstar signals and I/O are based on timers and
file descriptors, at the lowest layer these are the only event functions required.
These functions in turn call into GLib functions that configure the GLib event
loop. Figure 6.8 shows an example using a GLib timer from an Emstar program.
/* Condition values for I/O Events */
#define FUSD_NOTIFY_INPUT  0x1
#define FUSD_NOTIFY_OUTPUT 0x2
#define FUSD_NOTIFY_EXCEPT 0x4

/* Return values for event callbacks */
#define EVENT_DONE  (0)
#define EVENT_RENEW (1)
#define TIMER_DONE  (0)
#define TIMER_RENEW (1)
#define EVENT_ERROR(x) ((x) << 16)
#define TIMER_RENEW_MS(x) ((x+1) << 4)

typedef int (*g_event_handler_cb_t)
  (void *data, int fd, int fusd_condition, g_event_t *event);

typedef int (*g_timer_handler_cb_t)
  (void *data, int interval, g_event_t *event);

int g_event_add (int fd, int fusd_condition,
                 g_event_handler_cb_t function,
                 void *data, g_event_opts_t *opts,
                 g_event_t **ref);

int g_timer_add (uint interval,
                 g_timer_handler_cb_t function,
                 void *data, g_event_opts_t *opts,
                 g_event_t **ref);

int g_event_destroy(g_event_t *closure);
Figure 6.7: The Emstar event system API.
int cb_func(void *data, int *interval, g_event_t *ev) {
  elog(LOG_NOTICE, "Timeout fired!");
  return TIMER_RENEW;
}

int main(int argc, char *argv[])
{
  /* install the timer event */
  status = g_timer_add(1000, cb_func, &cb_data, NULL, NULL);

  /* . . . */

  /* enter the event loop */
  g_main();
  return 1;
}
Figure 6.8: Setting a timer in the Emstar event system.
Although this event API is very low–level, higher–level components can be
constructed above it. Typically, a higher level event will define its own application–
specific callbacks, encapsulating and masking the internal details of the basic I/O
and timer events. This enables modularity, since at the lowest level a consistent
events API and loop is used to combine and manage these low level events.
6.2.3 Layer 2: Emstar Device Patterns and Libraries
Layer 2 of the Emstar design is a layer of libraries that comprise the heart of
Emstar, supporting the implementations of all of the tools, services, and appli-
cations built within the framework. These libraries include a collection of useful
utility functions and data structures called libmisc, a collection of event–based
I/O functions such as socket and file I/O in libevent, and a set of functions for
creating and using FUSD devices called libdev.
Pattern Name       Description

Status Device      Presents current status on demand, and notification
                   of status change.
Packet Device      Send and receive small packets on a best–effort
                   basis, with per–client queueing.
Command Device     Presents usage information when read, accepts
                   command strings via write().
Query Device       Synchronous RPC with a single round–robin queue
                   for transactions.
Sensor Device      Streaming or buffered interface to a buffered
                   sequence of samples measured from a sensor.
Log Device         Buffer of recent log messages.
Option Device      /proc–style runtime–configurable option.
Directory Device   Internally stores a mapping from strings to small
                   integers, and allows clients to access and add to the
                   mapping.
Table 6.1: Device Patterns currently defined by the Emstar system.
Using FUSD, it is possible to implement character devices with almost arbi-
trary semantics. FUSD itself does not enforce any restrictions on the semantics
of system calls, other than those needed to maintain fault isolation between the
client, server, and kernel. While this absence of restriction makes FUSD a very
powerful tool, we have found that in practice the interface needs of most ap-
plications fall into well–defined classes, which we term Device Patterns. Device
Patterns factor out the device semantics common to a class of interfaces, while
leaving the rest to be customized in the implementation of the service. Table 6.1
shows a list of Emstar Device Patterns.
The Emstar device patterns are implemented by libraries that hook into the
GLib event framework. The libraries encapsulate the detailed interface to FUSD,
leaving the service to provide the configuration parameters and callback functions
that tailor the semantics of the device to fit the application. For example, while
the Status Device library defines the mechanism of handling each read(), it calls
back to the application to represent its current “status” as data.
Relative to other approaches such as log files and status files, a key property
of Emstar device patterns is their active nature. For example, the Logring Device
pattern creates a device that appears to be a regular log file, but always contains
only the most recent log messages, followed by a stream of new messages as they
arrive. The Status Device pattern appears to be a file that always contains the
most recent state of the service providing it. However, most status devices also
support poll()–based notification of changes to the state.
The following sections will describe a few of the Device Patterns defined within
Emstar. Most of these patterns were discovered during the development of ser-
vices that needed them and later factored out into libraries. In some cases, several
similar instances were discovered, and the various features amalgamated into a
single pattern.
6.2.3.1 Status Device
The Status Device pattern provides a device that reports the current state of a
module. The exact semantics of “state” and its representation in both human–
readable and binary forms are determined by the service. Status Devices are used
for many purposes, from the output of a neighbor discovery service to the current
configuration and packet transfer statistics for a radio link. Because they are so
easy to add, Status Devices are often the most convenient way to instrument a
program for debugging purposes, such as the output of the Neighbors service and
the packet reception statistics for links.
Status Devices support both human–readable and binary representations through
two independent callbacks implemented by the service. Since the devices default
Figure 6.9: Block diagram of the Status Device pattern. The functions binary(), print-
able(), and write() are callbacks defined by the server, while status notify() is called by
the server to notify the client of a state change.
to ASCII mode on open(), programs such as cat will read a human–readable
representation. Alternatively, a client can put the device into binary mode us-
ing a special ioctl() call, after which the device will produce output formatted
in service–specific structs. For programmatic use, binary mode is preferable for
both convenience and compactness.
Status Devices support traditional read–until–EOF semantics. That is, a
status report can be any size, and its end is indicated by a zero–length read.
But, in a slight break from traditional POSIX semantics, a client can keep a
Status Device open after EOF and use poll() to receive notification when the
status changes. When the service triggers notification, each client will see its
device become readable and may then read a new status report.
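The client-side convention can be sketched as a loop that accumulates data until the zero-length read. In this sketch a plain file descriptor stands in for a status device, and the subsequent poll()-and-re-read step is omitted:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the read-until-EOF convention: a complete status report is
   read by looping until a zero-length read marks its end.  A regular
   file stands in for a status device here. */

ssize_t read_status_report(int fd, char *buf, size_t cap) {
    size_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* zero-length read: report complete */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Against a real status device, the client would then hold the descriptor open and poll() for readability before calling read_status_report() again to fetch the next report.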
This process highlights a key property of the status device: while every new
report is guaranteed to be the current state, a client is not guaranteed to see
every intermediate state transition. The corollary to this is that if no clients care
about the state, no work is done to compute it. Applications that desire queue
semantics should use the Packet Device pattern (described in Section 6.2.3.2).
Like many Emstar device patterns, the Status Device supports multiple con-
current clients. Intended to support one–to–many status reporting, this feature
has the interesting side effect of increasing system transparency. A new client
that opens the device for debugging or monitoring purposes will observe the same
sequence of state changes as any other client, effectively snooping on the “traffic”
from that service to its clients. The ability to do this interactively is a powerful
development and troubleshooting tool.
A Status Device can implement an optional write() handler, which can be
used to configure client–specific state such as options or filters. For example, a
routing protocol that maintained multiple routing trees might expose its routing
tables as a status device that was client–configurable to select only one of the
trees.
In order to demonstrate the simplicity of implementing a “dual mode” Status
Device, Figure 6.10 shows a complete example using this interface. The ex-
ample creates a device called /dev/energy/status, that reports information about
remaining energy in the system, represented by the energy status t structure. The
device is created in the main() function, by calling the constructor with an options
structure.
The options structure specifies the name of the device, a private data pointer,
and two callback functions that will be called when the device is accessed by a
client. If the client sets the device into binary mode, the “binary” handler is
called to generate a response; otherwise, the “printable” handler is called. The
handlers are provided a buf t (a dynamically allocated growable buffer) which
they must fill. Typically the binary output is reported as a struct that is exposed
   #include <libdev/status_dev.h>

   typedef struct energy_status_s {
     float batt_voltage;
 5   int seconds_remain;
   } energy_status_t;

   int e_stat_bin(status_context_t *ctx, buf_t *buf) {
     energy_status_t *es = (energy_status_t *)sd_data(ctx);
10   bufcpy(buf, es, sizeof(energy_status_t));
     return STATUS_MSG_COMPLETE;
   }

   int e_stat_print(status_context_t *ctx, buf_t *buf) {
15   energy_status_t *es = (energy_status_t *)sd_data(ctx);
     bufprintf(buf, "Energy status: \n");
     bufprintf(buf, "  %.2f volts, %d seconds remain\n",
               es->batt_voltage, es->seconds_remain);
     return STATUS_MSG_COMPLETE;
20 }

   int main(int argc, char **argv) {
     energy_status_t energy_status = {};
     status_context_t *stat_dev = NULL;
25   status_dev_opts_t s_opts = {
       device: {
         devname: "energy/status",
         device_info: &energy_status
       },
30     printable: e_stat_print,
       binary: e_stat_bin
     };
     g_status_dev(&s_opts, &stat_dev);
     /* e_cmd_init(&energy_status); */
35   g_main();
     return 0;
   }
Figure 6.10: A snippet of code that creates a Status Device.
Figure 6.11: Block diagram of the Packet Device pattern. The functions send() and
filter() are callbacks defined by the server, while pd receive() and pd unblock() are func-
tions called by the server.
to clients in a header file, while the printable output constructs an equivalent
message from the same underlying struct. This approach of always reporting the
complete status (rather than a diff–based scheme) simplifies implementation and
eliminates a wide array of potential bugs.
Of course, in a real application there would be a mechanism that acquired and
filled in the energy status. In the event that a significant change occurred in
the energy state, it might be appropriate to notify any existing clients. In this
example, notification would take the form of the call g status dev notify(stat dev).
This call would trigger read notification on all clients, who would then re–read
the device to get the updated status.
6.2.3.2 Packet Device
The Packet Device pattern provides a read/write device with a queued
multi–client packet interface. This pattern is generally intended for packet data,
such as the interface to a radio, a fragmentation service, or a routing service, but
it is also convenient for many other interfaces where queue semantics are desired.
Reads and writes to a Packet Device must transfer a complete packet in each
system call. If read() is not supplied with a large enough buffer to contain the
packet, the packet will be truncated. A Packet Device may be used in either a
blocking or poll()–driven mode. In poll(), readable means there is at least one
packet in its input queue, and writable means that a previously filled queue has
dropped below half full.
Packet Device supports per–client input and output queues with client–configurable
lengths. When at least one client’s output queue contains data, the Packet De-
vice processes the client queues serially in round–robin order, and presents the
server with one packet at a time. This supports the common case of servers that
are controlling access to a rate–limited serial channel.
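The round-robin servicing can be sketched with per-client queues and a rotating cursor; the queue sizes, client count, and function names below are illustrative, not the libdev implementation:

```c
#include <assert.h>

/* Sketch of round-robin servicing across per-client output queues:
   each call hands the server the next queued packet, cycling across
   clients so no one client starves the channel. */

#define NCLIENTS 3
#define QCAP 4

typedef struct {
    int pkts[QCAP];
    int head, count;
} client_queue_t;

static client_queue_t queues[NCLIENTS];
static int rr_next = 0;   /* next client to service */

/* Client writes a packet into its own output queue (overflow ignored). */
void client_send(int client, int pkt) {
    client_queue_t *q = &queues[client];
    q->pkts[(q->head + q->count) % QCAP] = pkt;
    q->count++;
}

/* Server pulls the next packet in round-robin order; -1 if all empty. */
int next_packet(void) {
    for (int i = 0; i < NCLIENTS; i++) {
        int c = (rr_next + i) % NCLIENTS;
        client_queue_t *q = &queues[c];
        if (q->count > 0) {
            int pkt = q->pkts[q->head];
            q->head = (q->head + 1) % QCAP;
            q->count--;
            rr_next = (c + 1) % NCLIENTS;  /* advance past this client */
            return pkt;
        }
    }
    return -1;
}
```

Advancing the cursor past the client just serviced is what prevents a client with a deep queue from monopolizing a rate-limited channel.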
To deliver a packet to clients, the server must call into the Packet Device
library. Packets can be delivered to individual clients, but the common case is to
deliver the packet to all clients, subject to a client–specified filter. This method
enhances the transparency of the system by enabling a “promiscuous” client to
see all traffic passing through the device.
6.2.3.3 Command Device
The Command Device pattern provides an interface similar to the writable entries
in the Linux /proc filesystem, which enable user processes to modify configura-
tions and trigger actions. In response to a write(), the provider of the device
processes and executes the command, and indicates any problem with the com-
mand by returning an error code. Command Device does not support any form
of delayed or asynchronous return to the client.
#include <libdev/command_dev.h>

char *e_usage(void *data) {
  return "Echo 'suspend' to suspend system\n";
}

int e_command(char *cmd, size_t size, void *data) {
  int retval = EVENT_RENEW;
  if (strncasecmp(cmd, "suspend", 7) == 0) {
    /* initiate suspend mode. . . */
  }
  else
    retval |= EVENT_ERROR(EINVAL);
  return retval;
}

void e_cmd_init(energy_status_t *es) {
  cmd_dev_opts_t c_opts = {
    device: {
      devname: "energy/command",
      device_info: es
    },
    command: e_command,
    usage: e_usage
  };
  g_command_dev(&c_opts, NULL);
}
Figure 6.12: Snippet of code that creates a Command Device.
While Command Devices can accept arbitrary binary data, they typically
parse a simple ASCII command format. Using ASCII enables interactivity from
the shell and often makes client code more readable. Using a binary structure
might be slightly more efficient, but performance is not a concern for low–rate
configuration changes.
The Command Device pattern also includes a read() handler, which is typically
used to report “usage” information. Thus, an interactive user can get a command
summary using cat and then issue the command using echo. Alternatively, the
Command Device may report state information in response to a read. This
behavior would be more in keeping with the style used in the /proc filesystem,
and is explicitly implemented in a specialization of Command Device called the
Options Device pattern.
Figure 6.12 continues our previous example by adding a Command Device.
Uncommenting line 34 of Figure 6.10 (the e cmd init() call) and linking with Figure 6.12 will instantiate
a new Command Device called /dev/energy/command, that can be used to trigger
the system to suspend.
The implementation requires only the “command” handler. This handler tests
the string and triggers the suspend process if the string equals suspend. Any other
string will return the error EINVAL. The usage handler returns a usage string to
the client.
In many cases the commands to a command device are more complex than
a simple keyword. To support these cases, the Emstar libraries include a simple
parser that defines a standard syntax used by most Command Devices. This
syntax specifies a sequence of key/value pairs, delimited by colons.
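The dissertation does not give the exact grammar, so the sketch below assumes a hypothetical "key=value" form with colon delimiters, purely for illustration of such a parser:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch only: the text says commands are sequences of
   key/value pairs delimited by colons; the "k1=v1:k2=v2" form assumed
   here is hypothetical. */

#define MAX_PAIRS 8

typedef struct { char key[16]; char val[16]; } kv_t;

/* Parses cmd into pairs[]; returns the number of pairs, or -1 on error. */
int parse_command(const char *cmd, kv_t *pairs, int max_pairs) {
    char tmp[128];
    if (strlen(cmd) >= sizeof(tmp))
        return -1;
    strcpy(tmp, cmd);
    int n = 0;
    for (char *tok = strtok(tmp, ":"); tok; tok = strtok(NULL, ":")) {
        char *eq = strchr(tok, '=');
        if (!eq || n >= max_pairs)
            return -1;              /* malformed pair or too many pairs */
        *eq = '\0';
        snprintf(pairs[n].key, sizeof(pairs[n].key), "%s", tok);
        snprintf(pairs[n].val, sizeof(pairs[n].val), "%s", eq + 1);
        n++;
    }
    return n;
}
```

A server-side command handler would call such a parser on the written string and return EINVAL (as in Figure 6.12) when it reports an error.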
Figure 6.13: Block diagram of the Query Device pattern. In the Query Device, queries
from the clients are queued and “process” is called serially. The “R” boxes represent a
buffer per client to hold the response to the last query from that client.
6.2.3.4 Query Device
The Device Patterns we have covered up to now provide useful semantics, but
none of them really provides the semantics of synchronous RPC. To address this,
the Query Device pattern implements a transactional, request/response seman-
tics. To execute a transaction, a client first opens the device and writes the
request data. Then, the client uses poll() to wait for the file to become readable,
and reads back the response in the same way as reading a Status Device. For
those services that provide human–readable interfaces, we use a universal client
called echocat that performs these steps and reports the output.
It is interesting to note that the Query Device was not one of the first de-
vice types implemented; rather, most configuration interfaces in Emstar have
been implemented by separate Status and Command devices. In practice, any
given configurable service will have many clients that need to be apprised of its
current configuration, independent of whether they need to change the config-
uration. This is exacerbated by the high level of dynamics in sensor network
applications. Furthermore, to build more robust systems we often use soft–state
to store configurations. The current configuration is periodically read and then
modified if necessary. The asynchronous Command/Status approach achieves
these objectives while addressing a wide range of potential faults.
To the service implementing a Query Device, this pattern offers a simple,
transaction–oriented interface. The service defines a callback to handle new
transactions. Queries from the client are queued and are passed serially to the
transaction processing callback, similar to the way the output queues are handled
in a Packet Device. If the transaction is not complete when the callback returns,
it can be completed asynchronously. At the time of completion, a response is
reported to the device library, which it then makes available to the client. The
service may also optionally provide a callback to provide usage information, in
the event that the client reads the device before any query has been submitted.
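The serial transaction handling just described can be sketched as follows. The callback signature, structure names, and queue sizes are assumptions for illustration, not the actual Emstar library API.

```c
#include <string.h>

#define QMAX 8
#define RSP_LEN 64

/* Assumed transaction callback: returns nonzero if it completed the
 * transaction synchronously and filled in 'rsp'. */
typedef int (*txn_cb)(const char *query, char *rsp, int rsp_len);

typedef struct {
    const char *queue[QMAX];
    int head, tail;
    int busy;                       /* a transaction is in flight */
    char last_response[RSP_LEN];
} query_dev;

/* Run queued transactions serially; if the callback does not complete
 * synchronously, the service completes it later, asynchronously. */
static void dispatch(query_dev *q, txn_cb cb)
{
    while (!q->busy && q->head != q->tail) {
        const char *query = q->queue[q->head];
        q->head = (q->head + 1) % QMAX;
        q->busy = 1;
        if (cb(query, q->last_response, RSP_LEN))
            q->busy = 0;
    }
}

/* A client's written request is queued; returns -1 if the queue is full. */
int qdev_submit(query_dev *q, const char *query, txn_cb cb)
{
    int next = (q->tail + 1) % QMAX;
    if (next == q->head)
        return -1;
    q->queue[q->tail] = query;
    q->tail = next;
    dispatch(q, cb);
    return 0;
}

/* Example synchronous handler: echo the query back as the response. */
int echo_txn(const char *query, char *rsp, int rsp_len)
{
    strncpy(rsp, query, (size_t)rsp_len - 1);
    rsp[rsp_len - 1] = '\0';
    return 1;
}
```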
Clients of a Query Device are normally serviced in round–robin order. How-
ever, some applications need to allow a client to “lock” the device and perform
several back–to–back transactions. The service may choose to give a current
client the “lock”, with an optional timeout. The lock will be broken if the time-
out expires, or if the client with the lock closes its file descriptor.
6.2.3.5 Sensor Device
Sensor Device provides a convenient interface to recorded sensor data. On the
server side, the server acquires the sensor data and calls a function to push it to
the interface. Internally, the Sensor Device maintains a ring buffer of recent data
samples and assigns a monotonic index to each sample.
Figure 6.14: Block diagram of the Sensor Device pattern. In the Sensor Device, the
server submits new samples by calling sdev_push(). These are stored in the ring buffer
(RB), and streamed to clients with relevant requests. The “R” boxes represent each
client’s pending request.
Clients can retrieve the data by sending a request for a range of samples to
the Sensor Device. This range specifies an absolute starting sample or a starting
point relative to “now”, and optionally an ending point. If no ending point is
specified, data will continue to be streamed to the client until the client closes the
connection. Because the Sensor Device maintains a ring buffer, a client can access
recent historical data. As we will see in Chapter 7, this property is important for
building systems that want to compare sensor data recorded at different nodes.
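The ring buffer with monotonic sample indices can be sketched as follows; the sizes and names are illustrative, not the Emstar implementation.

```c
#include <stdint.h>
#include <string.h>

#define RB_SIZE 1024            /* samples retained (illustrative) */

/* Each pushed sample receives the next monotonically increasing index;
 * a client may read back any index still retained in the buffer. */
typedef struct {
    int16_t  data[RB_SIZE];
    uint64_t next_index;        /* index of the next sample to arrive */
} ring_buf;

void rb_push(ring_buf *rb, int16_t sample)
{
    rb->data[rb->next_index % RB_SIZE] = sample;
    rb->next_index++;
}

/* Returns 0 and fills *out if 'index' is still buffered; -1 if it has
 * been overwritten (dropped from the sequence) or has not arrived yet. */
int rb_get(const ring_buf *rb, uint64_t index, int16_t *out)
{
    if (index >= rb->next_index)          return -1;  /* future */
    if (rb->next_index - index > RB_SIZE) return -1;  /* overwritten */
    *out = rb->data[index % RB_SIZE];
    return 0;
}

/* Oldest index a late client can resume from. */
uint64_t rb_oldest(const ring_buf *rb)
{
    return rb->next_index > RB_SIZE ? rb->next_index - RB_SIZE : 0;
}
```

A slow client asking for an overwritten index gets a failure and must resume from rb_oldest(), mirroring the best–effort behavior described below.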
Samples are returned to the client in a packet format, with a header that
includes the starting sample index, number of samples, sample size and format.
This header is important because the Sensor Device is in some ways best–effort;
performance glitches or misbehaving clients may result in data dropped from the
sequence. For example, if a client makes a streaming request but never reads, it
is impossible for the Sensor Device to maintain an infinite buffer for that data.
Instead, when the client finally reads, it will only read back beginning with the
history retained in the ring buffer.
Like Query Device, Sensor Device is built above Status Device. The im-
plementation works by reporting the next chunk of sensor data as the current
“status”, and keeps notifying the client until no data remains to report. This im-
plementation is simple and provides good fault isolation, but passing bulk data
through FUSD has disadvantages in terms of performance. FUSD involves sig-
nificant context switches in and out of the kernel, and bulk data transfer through
FUSD messages involves many unnecessary copies. For high–rate sensors this can
be a significant challenge. To address this in the future, we intend to implement
a new version of Sensor Device that uses shared memory for bulk data transfer
and uses FUSD only to coordinate access to that memory.
6.2.3.6 Client Libraries
One of the benefits of the Emstar design is that services and applications are
separate processes and communicate through POSIX system calls. As such, Em-
star clients and applications can be implemented in a wide variety of languages
and styles. However, a large part of the convenience of Emstar as a development
environment comes from a set of helper libraries that improve the elegance and
simplicity of building robust applications.
In the preceding Sections we have described several device patterns, and we
have noted that an important part of these device patterns is the library that im-
plements them on the service side. Most device patterns also include a client–side
“API” library, that provides basic utility functions, GLib compatible notification
interfaces, and a crashproofing feature intended to prevent cascading failures.
Crashproofing is intended to prevent the failure of a lower–level service from
causing exceptions in clients that would lead them to abort. It achieves this
by encapsulating the mechanism required to open and configure the device, and
automatically triggering that mechanism to re–open the device whenever it closes
unexpectedly.
The algorithm used in crashproofing is described in Figure 6.15. The argu-
ments to this algorithm are the name of the device, and two callback functions,
config and handler. The config function configures a freshly opened device file
according to the needs of the client, e.g. setting queue lengths and filter pa-
rameters. The handler function is called when new data arrives. Note that in
the implementation, the call to poll() occurs in the GLib event system, but the
fundamental algorithm is the same.
A client’s use of crashproof devices is completely transparent. The client
constructs a structure specifying the device name, a handler callback, and the
client configuration, including desired queue lengths, filters, etc. Then, the client
calls a constructor function that opens and configures the device, and starts
watching it according to the algorithm in Figure 6.15. In the event of a crash
and reopen, the information originally provided by the client will be used to
reconfigure the new descriptor. Crashproof client libraries are supplied for both
Packet and Status devices.
6.2.3.7 Domain Specific Device Interfaces
Along with the generic devices we have described, there are also many domain–
specific device interfaces. These interfaces are implemented by libraries and are
usually composed of a set of devices that, taken together, provide a single logical
interface. The most broadly used example of this is the Data Link interface, a
specification of a standard interface for network stack modules.
The link interface is composed of a set of devices located in the /dev/link/*
Watch-Crashproof(devname,config,handler)
 1  fd ← open(devname)
 2  if configure(fd) < 0 goto 11
 3  crashed ← false
 4  resultset ← poll(fd, {input, except})
 5  if crashed
 6    then status ← read(fd, buffer)
 7         if status < 0 abort
 8         if devname ∈ buffer goto 1
 9  else
10    if except ∈ resultset
11      then close(fd)
12           fd ← open(“/dev/fusd/status”)
13           if fd < 0 abort
14           crashed ← true
15    elseif input ∈ resultset
16      then status ← read(fd, buffer)
17           if fatal error goto 11
18           if status ≥ 0 handler(buffer, status)
19  goto 4
Figure 6.15: “Crashproof” auto–reopen algorithm.
tree. Each link device is composed of a set of device files in a subdirectory
named by the link name, e.g. /dev/link/udp0/*. A link device always has three
subdevices: data, status and command, and in addition may also have other
related devices, such as neighbors, routes, errors, etc.
The data device is a Packet Device interface that is used to exchange packets
with the network. All packets transmitted on this interface begin with a standard
link header that specifies common fields. This link header masks certain cosmetic
differences in the actual over–the–air headers used by different MAC layers, such
as the Berkeley MAC [HSW00] and SMAC [YHE02] layers supported on Mica
Motes.
The command and status devices provide asynchronous access to the config-
uration of a stack module. The status device reports the current configuration
of the module (such as its channel, sleep state, link address, etc.) as well as the
latest packet transfer and error statistics. The command device is used to issue
configuration commands, for example to set the channel, sleep state, etc. The
set of valid commands and the set of values reported in status varies with the
underlying capabilities of the hardware. However, the binary format of the status
output is standard across all modules (currently, the union of all features).
Many “link drivers” and services have been implemented using the Link in-
terface. This uniform interface enables services to be stacked and swapped (at
run–time if needed), and provides a uniform interface for applications. We will
discuss the services in more detail in Section 6.2.4.
6.2.4 Layer 3: Emstar Components and Services
Layer 3 in the Emstar design is a collection of reusable components and services
that address common needs in embedded networked systems. This spans a wide
range of functionality including device drivers, routing algorithms, time synchro-
nization services, and distributed collaboration services. In this section we will
introduce many of the components, while Chapters 7 and 8 will focus on time
synchronization and network services in more detail.
6.2.4.1 Network Stack Components
In Section 6.2.3.7 we described the Link interface used to create network stack
components. Emstar includes a suite of components that can be used and com-
bined to provide network functionality tuned to the needs of wireless embedded
systems. These components include “link drivers” that implement the lowest–
layer interfaces to network resources; pass–through modules that implement
various types of filtering and passive processing; and routing modules that
provide network–layer interfaces, routing messages among one or more link–layer
interfaces.
Emstar implements several “link drivers”, providing interfaces to radio link
hardware including 802.11, and several flavors of the Mica Mote. The 802.11
driver overlays the socket interface, sending and receiving packets through the
Linux network stack, and optionally integrating feedback from the MAC layer
about RSSI, precise timing, and transmission failures. Two versions of the Mote
driver exist, one that supports both Berkeley MAC and SMAC on Mica2, and
a new version that supports only BMAC but adds support for newer platforms
such as Telos.
Because all of these drivers conform to the link interface spec, applications use
a single access method across different physical radio hardware. However, the Link
interface is not intended to treat all links transparently—it explicitly exposes low
level information about the link’s capabilities and status so that applications can
make intelligent decisions about how and when to use them. For example, link
capabilities such as variable transmit power, nominal link capacity, and MTU
are all accessible to applications and routing algorithms from the Link’s status
device.
Link interfaces are also used to construct modules that sit in the middle of the
stack, passing packets through to lower layers, possibly analyzing or modifying
them along the way. A pass–through module is both a client of a lower Link
device and a provider of an upper Link device. To simplify the implementation,
some of the work of proxying status and command interfaces is done by a library.
In some cases, the implementation of a pass–through involves implementing a
single function that transforms a packet from above and sends it below, and vice
versa.
Linkstats, Blacklisting, and Fragmentation are examples of pass–through mod-
ules. Linkstats adds a small header to each packet and counts gaps in sequence
numbers to estimate link quality. Blacklisting uses the output of a neighbor dis-
covery module to block traffic on links that are not bidirectional. Fragmentation
breaks large packets up into smaller packets.
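As an illustration of how small a pass–through can be, the following sketch mimics the Linkstats behavior just described: a sequence–number header is prepended on the way down, and gaps are counted on the way up. The two–byte header layout and all names are assumptions, not the actual module.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t tx_seq;            /* next sequence number to send */
    uint16_t rx_last;           /* last sequence number received */
    int      rx_started;
    unsigned lost;              /* gaps observed so far */
    unsigned received;
} linkstats;

/* Down path: write a 2-byte big-endian sequence header, then the
 * payload.  Returns the new on-the-wire length. */
size_t ls_send(linkstats *ls, const uint8_t *pkt, size_t len, uint8_t *out)
{
    out[0] = (uint8_t)(ls->tx_seq >> 8);
    out[1] = (uint8_t)(ls->tx_seq & 0xff);
    memcpy(out + 2, pkt, len);
    ls->tx_seq++;
    return len + 2;
}

/* Up path: consume the header and update loss statistics (assumes
 * in-order delivery, no duplicates).  Returns the payload length, or 0
 * if the packet is too short to carry the header. */
size_t ls_recv(linkstats *ls, const uint8_t *pkt, size_t len, uint8_t *out)
{
    if (len < 2) return 0;
    uint16_t seq = (uint16_t)((pkt[0] << 8) | pkt[1]);
    if (ls->rx_started)
        ls->lost += (uint16_t)(seq - ls->rx_last) - 1;  /* gap size */
    ls->rx_last = seq;
    ls->rx_started = 1;
    ls->received++;
    memcpy(out, pkt + 2, len - 2);
    return len - 2;
}
```

The ratio received / (received + lost) then serves as a simple link–quality estimate.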
Link interfaces are also used to provide interfaces to routing modules. These
are network layer interfaces rather than link layer interfaces, so the source and
destination addresses are usually interpreted as network layer IDs (i.e. Node IDs)
rather than link layer IDs (i.e. Interface IDs). The simplest routing module is the
floodd module, which accepts messages, adds a sequence number, and re–sends
each message exactly once. There is also a generic routing module called sink that
uses routing tables provided by another module to route messages to a specified
destination.
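The duplicate suppression that lets floodd re–send each message exactly once can be sketched as follows; the names and table size are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NODES 64            /* illustrative table size */

/* next_seq[origin] holds the next expected sequence number from that
 * origin node. */
typedef struct {
    uint32_t next_seq[MAX_NODES];
} flood_state;

/* Returns 1 if the message is new (forward it exactly once), 0 if it is
 * a duplicate or stale and must be dropped. */
int flood_accept(flood_state *f, uint16_t origin, uint32_t seq)
{
    if (origin >= MAX_NODES) return 0;
    if (seq < f->next_seq[origin]) return 0;   /* old or duplicate */
    f->next_seq[origin] = seq + 1;
    return 1;
}
```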
6.2.4.2 Local Directory Service
One of the best examples of a small Emstar service is the local directory service.
This service allows applications to dynamically assign and share mappings of
short strings to small integers. This avoids the need to statically assign numbers
to items in the system where the set of items is known only at run–time.
For example, there are many implementations that might provide Link inter-
faces, and an implementation might be used more than once in a single Emstar
system. One solution would be a global file that statically assigned numbers to
links. The disadvantage of such a scheme is that it is difficult to manage, and
the list will grow long and cumbersome.
Using the Directory service, each Link provider can register their link with
the service and be dynamically assigned a number. These numbers are thus
guaranteed to be small and the mapping is known by querying the Directory
service. The Directory service is also used in several other places, including to
define mappings of local clocks to numbers. However, because these mappings
are dynamic on each node, the numbers assigned cannot be assumed to be the
same on two different nodes in the system. For cases where global assignments
must be made, other techniques must be employed.
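The core mapping semantics (the first registration of a name assigns the next free small integer; later registrations return the same number) might look like the sketch below. The real service runs as a separate process behind device files; these names and the table size are assumptions.

```c
#include <string.h>

#define DIR_MAX 32              /* illustrative capacity */

typedef struct {
    const char *names[DIR_MAX];
    int count;
} directory;

/* Returns the number assigned to 'name', registering it on first use;
 * -1 if the table is full.  The mapping is stable within one node but,
 * as noted above, cannot be assumed equal across nodes. */
int dir_register(directory *d, const char *name)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return i;                      /* already registered */
    if (d->count >= DIR_MAX) return -1;
    d->names[d->count] = name;
    return d->count++;
}
```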
6.2.4.3 EmRun Services
EmRun is a program that parses a configuration file that describes an Emstar
system, launches the system, and provides several centralized services. Among
the services it provides are automatic respawning with visibility into the status
and history of processes, centralized in–memory logging, process responsiveness
tracking, fault reporting, and fast startup / graceful shutdown.
Respawn Process respawn is neither new, nor difficult to achieve, but it is very
important to an Emstar system. It is difficult to track down every bug, especially
ones that occur very infrequently, such as a floating–point error processing an
unusual set of data. Nonetheless, in a deployment, even infrequent crashes are
still a problem. Often, process respawn is sufficient to work around the problem;
eventually, the system will recover. Emstar’s process respawn is unique because
it happens in the context of “crashproofed” interfaces (Section 6.2.3.6). When
an Emstar process crashes and restarts, Crashproofing prevents a ripple effect,
and the system operates correctly when the process is respawned.
When processes die unexpectedly, EmRun tracks the termination signal and
last log message reported by the process. This information can be accessed from
the last_msg device, which reports the count and circumstances of all process
terminations along with their final message.
In–Memory Logs EmRun saves each process’ output to in–memory log rings
that are available interactively from the /dev/emlog/* hierarchy. These illustrate
the power of FUSD devices relative to traditional logfiles. Unlike rotating logs,
Emstar log rings never need to be switched, never grow beyond a maximum size,
and always contain only recent data.
Process Responsiveness Tracking An unresponsive server can cause perfor-
mance bottlenecks for an entire Emstar system. To address this, EmRun tracks
the responsiveness of all processes in the system. The EmRun client library in-
cludes a timer event that sends a periodic heartbeat message to EmRun. EmRun
tracks the arrival of these messages and compares the arrival time to the sched-
uled send time according to the timer. The discrepancy in the times reveals an
estimate of the responsiveness of the process, since whenever the timer fires, that
process is also free to respond to I/O events.
Fault Reporting Fielded systems often have the possibility of unexpected
faults. For example, our acoustic systems sometimes experience wiring problems
that cause one or more channels of the microphones or the speaker to fail. It
is also possible for our acoustic systems to be incorrectly set up, for example
if the battery pack for the microphone array is disconnected or if the array’s
wires are crossed. A driver that detects a fault can publish this fault through
a centralized reporting service provided by EmRun. This fault report is only
available locally through the faults device file, but other modules can publish
that fault information over the network to other nodes and to the user.
The fault reporting API is very simple. A process reports a fault by opening
a device file, writing in a string describing the fault, and keeping that file open
until the fault is corrected. When the file is closed (whether by the application
or by the process terminating), the fault will be removed from the list. Other
processes may monitor the fault list using the standard Status Client library.
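The report–and–hold pattern just described can be sketched as below. The device path is passed as a parameter because the exact file name is not shown here, and O_CREAT is included only so the sketch also runs against an ordinary file; both are assumptions.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open the fault device, write a description of the fault, and return
 * the descriptor.  The caller holds the descriptor open for as long as
 * the fault persists; close() (explicit or via process exit) removes
 * the fault from the list. */
int report_fault(const char *dev_path, const char *msg)
{
    int fd = open(dev_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, msg, strlen(msg)) < 0) {
        close(fd);
        return -1;
    }
    return fd;      /* keep open; close(fd) later clears the fault */
}
```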
Fast Startup EmRun’s fast startup and graceful shutdown are critical for a
system that needs to duty cycle to conserve energy. The implementation depends
on a control channel that Emstar services establish back to EmRun when they
start up. Emstar services notify EmRun when their initialization is complete,
signaling that they are now ready to respond to requests. The emrun_init() library
function, called by the service, communicates with EmRun by writing a message
to /dev/emrun/.int/control. EmRun then launches other processes waiting for
that service, based on the dependency graph expressed in the EmRun configuration
file.
This feedback enables EmRun to start independent processes with maximal
parallelism, and to wait exactly as long as it needs to wait before starting de-
pendent processes. This scheme is far superior to the naive approach of waiting
between daemon starts for pre–determined times, i.e., the ubiquitous “sleep 2”
statements found in *NIX boot scripts. Various factors can make startup times
difficult to predict and high in variance, such as flash filesystem garbage collec-
tion. On each boot, a static sleep value will either be too long, causing slow
startup, or too short, causing services to fail when their prerequisites are not yet
available.
Graceful Shutdown The control channel is also critical to supporting graceful
shutdown. EmRun can send a message through that channel, requesting that the
service shut down, saving state if needed. EmRun then waits for SIGCHLD to
indicate that the service has terminated. If the process is unresponsive, it will be
killed by a signal.
An interesting property of the EmRun control channel is one that differen-
tiates FUSD from other approaches. When proxying system calls to a service,
FUSD includes the PID, UID, and GID of the client along with the marshalled
system call. This means that EmRun can implicitly match up the client con-
nections on the control channel to the child processes it has spawned, and reject
connections from non–child processes. This property is not yet used much in
Emstar but it provides an interesting vector for customizing and securing device
behavior.
6.2.5 Layer 4: Additional Tools and Environment
In addition to reusable components that might be integrated into newly developed
systems, there are a collection of ancillary tools that help developers design,
implement, and deploy new systems. These tools include tools for simulation,
deployment, remote access, and visualization.
6.2.5.1 EmSim: the Emstar Simulator
Transparent simulation at varying levels of accuracy is crucial for building and
deploying large systems [EBB03a] [LLW03] [GSR04]. EmSim enables “real–code”
simulation at many different accuracy regimes. EmSim runs many virtual nodes
in parallel, each with its own device hierarchy. Because Emstar applications al-
ways interact with the world through standard interfaces such as Link devices and
Sensor Devices, EmSim can transparently run nodes in simulation by presenting
identical interfaces to simulated or remote resources.
For operations in pure simulation, a radio channel simulator and a sensor
simulator can provide interfaces to a modeled world, or re–play conditions that
have been observed empirically. EmSim also supports “emulation mode”, in
which real hardware such as radios and sensors can be accessed remotely. This
yields a very convenient interface to a testbed, because the entire application
can run centrally on a single simulation server, while the radio traffic or sensor
data comes from a deployed testbed. We have found that using real radios is far
superior to attempting to model radios, especially when there may be bugs or
glitches in the operation or performance of the radios.
These different simulation regimes speed development and debugging; pure
simulation helps to get the code logically correct, while emulation in the field helps
to understand environmental dynamics before a real deployment. Simulation and
emulation do not eliminate the need to debug a deployed system, but they do
tend to reduce it.
In all of these regimes, the Emstar source code and configuration files are
identical to those in a deployed system, making it painless to transition among
them during development and debugging. This serves to eliminate accidental
code differences that can arise when running in simulation requires modifications.
EmSim can also simulate heterogeneous networks containing both Motes and Emstar
systems, by running the Mote code inside an EmTOS wrapper. Other “real–
code” simulation environments include TOSSim [LLW03] and SimOS [RBD97],
but Emstar is the only environment that readily supports heterogeneous networks
and “emulation” using real hardware.
6.2.5.2 Remote Access Methods
As an IPC mechanism, FUSD has the benefit of being fast, deterministic, and
synchronous, making straight–line programming of sequential calls possible. These
properties make it easy to communicate between processes on a single node, but
unlike sockets, they do not provide any native remote access mechanism. In ad-
dition, some languages such as Java are designed to handle everything in terms
of sockets, and don’t have complete support for POSIX system calls.
To address these concerns, we have implemented several remote access mech-
anisms to Emstar. These mechanisms enable access to Emstar services over the
network, and can also simplify the integration of Emstar with other systems. The
three remote access mechanisms supported by Emstar are FUSDnet, the Emstar
HTTP server, and EmProxy.
FUSDnet FUSDnet is a remote access protocol based on FUSD. Using FUS-
Dnet, any FUSD device can be accessed remotely via a sockets protocol.
A server that wants to enable incoming remote connections must set a special
flag when it registers the device. This flag will register the device with the
FUSDnet daemon, which listens for incoming requests and de–multiplexes them
to the appropriate server. A client that wants to connect to a remote FUSD
service must run a client program that opens a socket to the remote node, requests
a connection to the specified device, and creates a local stub device. Once the
connection is established, the local stub device will be an exact mirror of the
remote device. A system call made on the stub will be marshalled and transferred
via the socket to the remote server, where it will be handled and a response
returned. Thus, FUSDnet provides transparent access to remote FUSD services.
FUSDnet is transparent, but it is recommended only for use in conditions
with reliable and deterministic network links between client and server. FUSDnet
might be a convenient way to link two Emstar systems together if they have a
wired Ethernet link between them, but it would not be so appropriate if those two
systems are physically separate and are connected wirelessly. For such situations,
protocols designed for slow or unreliable links would be preferred.
HTTP With the advent of the Web, HTTP has become one of the most uni-
versally implemented protocols. Recognizing this, Emstar supports an HTTP
gateway that enables remote access to FUSD devices. This access is implemented
by CGI scripts that enable access to Status Devices, Command Devices, and Log
Devices via simple URL formats. This approach can easily be extended by adding
additional CGI scripts to handle other device types.
The Emstar HTTP server integrates with EmRun to provide a default web
page that shows the node’s current status and can integrate sub–pages for each
running process. This “Node Page” can make it easy for novice users to browse
the status of a node using a web client. The HTTP service also enables integration
with Java and other software that can readily access services via HTTP, and
allows those programs to run remotely.
EmProxy EmProxy is a remote access protocol that exposes real–time state
changes via a best–effort UDP protocol. Unlike FUSDnet and HTTP, which use
TCP to reliably connect to a specific node, EmProxy provides a broadcast inter-
face to control and monitor groups of nodes over a broadcast network. Because
EmProxy can operate over broadcast to groups of nodes, it is very useful in de-
ployed environments where many nodes are involved but only a subset of those
nodes are reachable at any given time. The use of UDP enables EmProxy to
report status at high rates, dropping messages rather than buffering them when
the rate exceeds capacity.
An EmProxy client connects to EmProxy by periodically sending a request
message that lists a set of Status Devices to monitor. The EmProxy service opens
those Status Devices and monitors them for notification. Whenever notification
is triggered, EmProxy reads the new state and reports that back to the requester
via UDP. The request string can include options and arguments to limit the rate
at which replies are reported, to automatically re–read the device periodically,
and to set the mode in which the device is read (e.g. binary, ASCII, XML).
EmProxy also supports the ability to run shell commands and report back the
results. This broadcast remote shell is very useful for managing and controlling
groups of nodes in a deployed setting.
6.2.5.3 Deployment Tools
Emstar includes a number of tools and facilities designed to aid deployments.
When working on a deployment the two primary issues to address are finding
out the state of the nodes and controlling the nodes. Emstar provides several
mechanisms that address these issues: rbsh (Remote Broadcast SHell), IP routing,
and efficient state flooding. The routing and flooding facilities will be discussed
in more detail in Chapter 8.
rbsh The rbsh program is an invaluable tool for dealing with collections of nodes.
It provides a simple shell prompt interface, but when commands are written at
the prompt, they are broadcast out over a selected network, and the command
script is run on each node. The results of the command are then reported back
and collated at the prompt.
In a deployed setting, this provides a fast and convenient way to send com-
mands to all reachable nodes without needing to know which nodes exist or which
are reachable. In addition, relative to tools based on ssh and other connection–
oriented protocols, there is no need to maintain connections to remote nodes,
nor to time out broken connections. The result is a simple, fast, and generally
intuitive shell interface.
We have also had success using rbsh in scripts to control groups of nodes while
running experiments and to implement simple forms of coordination without
writing specialized application code.
IP Routing In a deployed setting, being able to telnet across the network is
very useful. For this reason, even if the application itself does not require end–
to–end IP routing, IP routing can be very useful for debugging a deployment.
Often full pairwise routing is not needed, and routing along a tree is sufficient.
Emstar provides IP routing by combining native Emstar routing facilities with
the IP Connector. The IP connector creates a tunnel device that IP applications
such as telnet and ping can use, but routes the traffic on that device through an
Emstar Link device.
State Flooding One of the most challenging parts of a deployment is deter-
mining what is happening in the network. For example, it is important to know
the link quality observed between different nodes in the system so that gaps in
connectivity can be corrected. It is also important for the user to be aware of
faults that may have occurred in the deployment.
To address these needs, Emstar includes an efficient, reliable state flooding
mechanism. This mechanism does not rely on any form of routing; each node
floods the current state of a set of variables peer to peer to its neighbors, using
a hopcount to limit propagation. Each flooded message also includes
a sequence number that is used to detect gaps in the sequence. When a gap is
detected, a local retransmission protocol requests the missing data. This mecha-
nism is described in more detail in Chapter 8.
In our acoustic deployment, we used this mechanism to flood reported faults
and neighbor link quality. This enabled us to use a laptop to observe the network
from anywhere in the field, quickly getting a picture of the link quality in the
network, and immediately seeing any reported faults.
6.2.5.4 Visualizing Emstar Systems
EmView is a graphical visualizer for Emstar systems. Through an extensible
design, developers can easily add “plugins” for new applications and services.
Figure 6.16 shows a screen–shot of EmView displaying real–time state of a run-
ning deployment at the James Reserve. In this instance, the data was being
displayed live from our state flooding protocol.
Figure 6.16: Screen shot of EmView, the Emstar visualizer.

EmView uses the EmProxy protocol to acquire status information from a
collection of nodes. Although the protocol is only best–effort, the responses
are delivered with low latency, such that EmView captures real–time system
dynamics. In order to support heterogeneous networks, EmView first requests a
configuration file from each node that details how to visualize the services on
that node. Based on that file, EmView then follows up with a request for node
status as needed. This design enables EmView to visualize any Emstar system
without needing to be informed up front about the details of what software or
services are present in each system.
CHAPTER 7
A Synchronized Distributed Sampling Layer
The time–synchronized distributed sampling layer is a critical part of what makes
this platform so ideally suited to distributed acoustic processing, and it is also
one of the most difficult parts of the system to engineer. This layer provides an
application developer with an API that represents the incoming signals at a single
node as a contiguous time series with a monotonically increasing sample clock. It
also provides a mechanism to precisely compare the time that two samples were
recorded, even if those samples were recorded on different nodes in the system.
This layer greatly simplifies many distributed signal processing applications.
For example, our calibration system demonstrates how this layer simplifies a
time–of–flight ranging implementation. When a node emits a ranging signal,
it detects the signal locally to determine the exact sample index at which the
signal was emitted. Then, that sample index is published across multiple hops
to potential receivers. Through the synchronized sampling layer, receiving nodes
can translate the sender’s sample index into their own time series, correct for any
skew in sample rates, and extract the portion of their local signals containing
the ranging signal. This process will succeed as long as the receiver performs the
extraction within 8 seconds of the original event occurring.
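The index–translation step described above can be sketched as follows. This is a minimal illustration in our own notation, not the system's code; the linear conversion parameters (rates and offsets) are invented for the example.

```python
# Sketch: translating a sender's sample index into a receiver's sample
# timebase by composing linear clock conversions (t_out = rate * t_in + offset).
# The parameters below are invented; real values come from the timesync layer.

def convert(t, rate, offset):
    """Apply one linear clock conversion."""
    return rate * t + offset

# Hypothetical parameters: sender sample clock -> common (CPU) timebase,
# then common timebase -> receiver sample clock.
SENDER_TO_CPU = (1.0000021, 1_500_000.0)    # slight rate skew, large offset
CPU_TO_RECEIVER = (0.9999987, -730_000.0)

def sender_index_to_receiver_index(sample_index):
    """Map a sample index in the sender's series to the receiver's series."""
    cpu_time = convert(sample_index, *SENDER_TO_CPU)
    return convert(cpu_time, *CPU_TO_RECEIVER)

rx_index = sender_index_to_receiver_index(48_000 * 5)   # 5 s of 48 kHz samples
```

The receiver would round the resulting index and extract a window of its buffered signal around it.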
This layer consists of three elements which we will describe separately in the
following sections: a buffered acoustic sensor interface, a time synchronization
system, and a hop–by–hop time conversion facility built into the routing layer.
7.1 A Buffered Acoustic Sensor Interface
Figure 7.1: Block diagram of the buffered acoustic sensor interface.
The buffered acoustic sensor interface, shown in Figure 7.1, is critical to
simplifying the development of distributed sensing applications. This compo-
nent maintains a consistent, monotonic, and continuous timebase, correlates that
timebase to the node’s main CPU clock, and provides a multi–client buffered and
streaming interface to the sensor data. In the diagram, the box marked vxpcd is
the acoustic sensor interface, providing access to data sampled from the sound
hardware. The box marked syncd is the time synchronization service, which is
discussed in more detail in Section 7.2.
This interface performs several important functions, which are described in the
next few sections. While some of these features are specific to the shortcomings of
this particular hardware, in our experience most sound hardware has some subset
of these problems. Given the costs of developing custom hardware, system designs
that can work around hardware shortcomings are desirable.
7.1.1 Continuous Sampling and Buffering
In order to provide a continuous and monotonic timebase, vxpcd continuously
samples from the sound hardware. By sampling continuously, vxpcd leverages the
frequency stability of the sample clock, and avoids glitches and discontinuities in
the time series. In the event that a hardware error or another problem forces a
break in sampling, vxpcd will insert space into the signal in an effort to preserve
the continuity of the signal. Continuous sampling also enables synchronization
information to be estimated and retained over time, as opposed to having to
re–sync each time sampling is started.
Given that vxpcd is sampling continuously, buffering that data is the next
logical step. In addition to streaming new data to its clients, vxpcd retains the
audio data in a large ring buffer. This buffered interface can greatly simplify the
design of distributed sensing applications, because the application can sustain
significant coordination delays without worrying that the signals of interest will
have passed by the time the system can react. In many applications, nodes
that may not initially detect a signal can still extract information about the
signal if they know where to look. A buffered sensor interface enables such an
implementation to work, even if the node coordination is delayed by messaging
latency or local processing delays. Even in cases where potential receivers can be
warned in advance, it is usually simpler to design the system with more relaxed
timing considerations.
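A minimal sketch of such a buffer, indexed by absolute sample number so that a late reader can still retrieve past data; the class name and structure are our own, not vxpcd's.

```python
# Sketch of a bounded ring buffer keyed by absolute sample index, assuming a
# single writer appending samples and readers requesting past ranges.
class SampleRingBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [0.0] * capacity
        self.next_index = 0          # absolute index of the next sample to write

    def append(self, samples):
        """Append new samples, overwriting the oldest data when full."""
        for s in samples:
            self.buf[self.next_index % self.capacity] = s
            self.next_index += 1

    def oldest_index(self):
        """Absolute index of the oldest sample still buffered."""
        return max(0, self.next_index - self.capacity)

    def read(self, start, count):
        """Return samples [start, start+count) if still buffered, else None."""
        if start < self.oldest_index() or start + count > self.next_index:
            return None
        return [self.buf[i % self.capacity] for i in range(start, start + count)]
```

A reader that arrives late simply asks for the absolute range it needs; the request fails only if coordination was delayed longer than the buffer's span.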
7.1.2 Synchronization
One of the biggest engineering challenges of building this platform is the prob-
lem of achieving synchronization to the audio hardware, a Digigram VXP440
PCMCIA card. Difficulties with the design of this specific audio hardware made
[Figure 7.2 plot: Correlation of VXP440 Sample Clock to CPU Clock (RMS = 5.2 µS). Axes: samples ×50000 (seconds ×1.04); left y axis, offset (µS); right y axis, fit error (µS).]
Figure 7.2: Plot of the linear relationship between the VXP sample clock and the
platform’s CPU clock.
this problem more difficult, but we have encountered similar issues in the past
with other hardware, including the audio hardware internal to the iPAQ and the
Cirrus Logic CS4281. In general, off–the–shelf sound hardware is not designed
to support high–precision synchronization.
Using the VXP440, two separate synchronization problems must be solved.
First, the 4 channels of audio must be synchronized together in order to achieve
high–precision phase comparisons between the channels. Second, the audio streams
must be synchronized to the system clock, so that software running on the main
processor can relate a particular point in a time series to a particular time.
In previous systems, we have used interrupt timing to deduce the time at
which samples were recorded. However, in the case of the VXP440 interrupt
timing was not well–correlated to the sample timing. In this case we contracted
with the manufacturer to add a custom feature that would report on demand the
total number of samples recorded by the card since the beginning of sampling.
Given this feature, we modified the in–kernel driver and the vxpcd module to
exploit this new command1.
To synchronize the audio streams to the CPU clock, the modified vxpcd mod-
ule periodically queries the card to retrieve the total sample count for each channel
and records the CPU time at which the command was issued. Each of these re-
quests provides a single observation, a data point that maps a sample index to a
CPU time. These observations are submitted to the timesync system, which per-
forms a linear fit on the data to determine a relation between the two clocks that
can be used to convert from one to the other. In order to enable finer–granularity
conversion, the sample clock is expanded by a factor of 20, so that each count is
approximately 1 microsecond at a sample rate of 48 KHz. This enables times to
be expressed and converted with sub–sample accuracy.
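The fit itself is an ordinary least–squares line through the (sample count, CPU time) observations. The sketch below is ours, not the syncd implementation; the data in the accompanying check is invented.

```python
# Sketch: least-squares fit of (sample_count, cpu_time) observations, the kind
# of pairs relation vxpcd submits to the timesync system.
def linear_fit(xs, ys):
    """Return (rate, offset) such that y ~ rate * x + offset."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    rate = sxy / sxx
    return rate, my - rate * mx

def rms_residual(xs, ys, rate, offset):
    """Root-mean-square residual, the fit-quality metric used in the plots."""
    n = len(xs)
    return (sum((y - (rate * x + offset)) ** 2
                for x, y in zip(xs, ys)) / n) ** 0.5
```

Given the fitted (rate, offset), a timestamp in either clock converts to the other by applying or inverting the line.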
Figure 7.2 shows the relationship between the two clocks at a particular time
on one of our nodes2. The x axis is in multiples of 50K samples, or 1.04 seconds
at a 48 KHz sample rate. The line running across the graph represents the
linear conversion function that is used to convert from one clock to another.
This line and the points on the graph are plotted according to the scale of the
left–hand y axis, representing the offset in µS relative to a constant offset. The
three horizontal dashed lines and the impulses represent the residuals from the
linear fit: the difference between a specific data point and the conversion value.
These are plotted in µS, according to the scale of the right–hand y axis. The
“RMS fit error” is an estimate of the quality of the fit based on the root mean
1 The modified firmware and drivers are available from our website. Although the new firmware appears to introduce certain race conditions, we have used it successfully.
2 This method of graphing syncd time conversions is due to Jeremy Elson.
square of the residuals. Properties of the VXP440 hardware that cause a non–
deterministic timing on command responses are the limiting factor in achieving
tighter synchronization.
In addition to synchronizing the acoustic time series to the CPU clock, vxpcd
must also synchronize the 4 channels of the VXP440. Although the sampling
process is driven off a single crystal, the design of the VXP440 implements two
independent stereo streams with no explicit synchronization between them.
Because the design of the command channel precludes commanding both
streams in a synchronized manner, we must implement a mechanism for lining
up the streams after the fact.
To do this, vxpcd opens each stream independently and buffers the data until
synchronization data can be retrieved from the card. By requesting synchroniza-
tion data from each stream in turn and matching up the CPU timestamps, vxpcd
can determine the offset between the channels and insert spaces into one of the
streams to sync them up. Because both streams are driven from the same clock,
the inter–channel synchronization achieved by this system is quite accurate.
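The alignment step can be sketched as follows, assuming each stream reports one (CPU time, sample count) synchronization point; the function names and figures are hypothetical, not the vxpcd code.

```python
# Sketch: aligning two streams driven by the same clock, using one
# (cpu_time_seconds, sample_count) synchronization point per stream.
def channel_offset(sync_a, sync_b, rate_hz):
    """Return how many samples stream B lags stream A (positive: B later)."""
    t_a, n_a = sync_a
    t_b, n_b = sync_b
    # Project B's sample count to A's CPU timestamp, assuming a shared rate.
    n_b_at_ta = n_b + (t_a - t_b) * rate_hz
    return round(n_a - n_b_at_ta)

def align(stream_a, stream_b, offset):
    """Insert zero padding so both streams start at the same instant."""
    if offset > 0:
        stream_b = [0.0] * offset + stream_b
    elif offset < 0:
        stream_a = [0.0] * (-offset) + stream_a
    return stream_a, stream_b
```

Because both streams share a crystal, the computed offset is constant and the padding need only be applied once, at startup.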
7.1.3 Multi–Client Interface
The vxpcd module presents a multi–client Sensor Device interface that supports
both streaming and buffered access modes. The Sensor Device interface, de-
scribed in Section 6.2.3.5, is one of the Emstar device patterns. Sensor Device
provides an interface that can be accessed from the shell or from scripts using
simple utilities, as well as via a binary programmatic API implemented in the
Sensor Client library. Sensor Device supports an unlimited number of concurrent
clients, enabling multiple independent applications to use the same sensor data.
Sensor Device supports a streaming interface with a buffer of past samples. A
client requests data starting at a particular sample index. If that starting index is
in the past, the past data is immediately reported, and any remaining requested
data is streamed to the client as it arrives. If that starting index is in the future,
no data is returned until the first samples arrive.
The client API implements helper functions that provide buffered and stream-
ing event–driven interfaces. The buffered interface allows a client to request a
whole buffer, and have that buffer delivered whole for processing. This is usually
the most convenient way to acquire a bounded clip of data when the data must
be processed in its entirety. The streaming interface allows a client to receive
incoming data in fixed or variable sized chunks. This is most convenient when
the client must process a stream of data in real time, with bounded latency.
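The request semantics described above can be modeled compactly: a request starting in the past is served from the buffer immediately, while a request starting in the future is served as data arrives. This sketch is ours, not the Sensor Device implementation.

```python
# Sketch of buffered-request semantics: past data is delivered immediately,
# future data is delivered as it arrives.
class SensorDevice:
    def __init__(self):
        self.samples = []            # absolute index == list index here
        self.pending = []            # outstanding (start, count, callback)

    def on_samples(self, new):
        """Writer path: append samples, then satisfy any completed requests."""
        self.samples.extend(new)
        still_pending = []
        for start, count, cb in self.pending:
            if len(self.samples) >= start + count:
                cb(self.samples[start:start + count])
            else:
                still_pending.append((start, count, cb))
        self.pending = still_pending

    def request(self, start, count, cb):
        """Reader path: deliver now if buffered, otherwise defer."""
        if len(self.samples) >= start + count:
            cb(self.samples[start:start + count])    # served from the buffer
        else:
            self.pending.append((start, count, cb))  # served as data arrives
```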
The vxpcd module uses the Sensor Device pattern to provide two separate
sensor devices: single and all. The single device presents only the data received on
channel 0, while the all device presents all four channels, after synchronization, in
an interleaved format. By providing both of these alternatives, applications that
need to do simpler real–time detection can reduce overhead by only processing
one of the channels. In the event that an event is detected, the data from all four
channels can be extracted and more sophisticated processing can be done.
7.2 An Integrated Time Synchronization Service
The Emstar time–synchronization service was discussed in prior work [EGE02a]
[ER02] [EGE02b], including a Ph.D. thesis [Els03]. That work developed the
framework and theory behind the Emstar time synchronization services, as well
as the initial implementation of syncd, the Emstar timesync service, shown in
Figure 7.3.

Figure 7.3: Block diagram of the syncd service.

In this work, we continued to develop that implementation, developing new
drivers and addressing some additional system considerations. We
also proved more conclusively that the approach promulgated in the Reference
Broadcast Synchronization (RBS) system design is often the only way to achieve
tight cross–node time synchronization without low–level firmware access, which
is generally impossible to obtain using COTS components.
7.2.1 Conversion–Based Time Synchronization
The Emstar timesync implementation is a departure from many other timesync
implementations, such as NTP [Mil94], as well as many timesync schemes in
the sensor network domain [GKS03] [GR03] [MKS04] [GGS05]. Rather than at-
tempting to synchronize or discipline clocks, the Emstar timesync module allows
the clocks to run freely, instead computing conversion parameters that enable an
application to relate timestamps from different timebases.
This approach has several advantages. First, disciplining a clock often requires
constant, timely adjustments, which can be difficult to engineer in application
software. The most effective clock disciplining approaches are implemented with
specialized hardware, e.g. a phase locking feedback loop using a VCO, frequency
counter, and a DAC, or a temperature compensation circuit. When it is possible
to implement these types of solutions in software, it must be done at a very low
layer of the system.
Second, altering a running clock results in discontinuities that complicate the
correct use of the clock values. To use these clock values correctly, applications
must be aware of a complex array of discontinuities, including the possibility that
the clock jumps backwards. When considering signal processing applications that
assume isochronous time series, this added complexity is a significant headache.
Third, the system can operate in a relative sync mode or stay offline for long
periods of time without introducing problems. Where clock–disciplining solutions
encounter their worst problems with discontinuities when they are forced to run
offline for long periods of time, a conversion–based system can always gracefully
recover. While timestamps recorded in the intervening offline period may not
be accurately converted, once new conversion parameters are computed, recent
timestamps can readily be compared. In addition, a conversion–based approach
does not require a global master clock; any two peers can meaningfully compare
their timestamps to each other without any third–party reference.
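A conversion relation of this kind is simply a rate and an offset, applied on demand in either direction, with neither clock ever being adjusted. The parameters below are invented for illustration.

```python
# Sketch: conversion-based synchronization keeps per-pair linear parameters
# and converts timestamps on demand; the clocks themselves run freely.
class Conversion:
    def __init__(self, rate, offset):
        self.rate, self.offset = rate, offset

    def forward(self, t):
        """Convert a timestamp from clock A's timebase to clock B's."""
        return self.rate * t + self.offset

    def inverse(self, t):
        """Convert a timestamp from clock B's timebase back to clock A's."""
        return (t - self.offset) / self.rate

a_to_b = Conversion(rate=1.000003, offset=12_345.0)   # ~3 ppm rate skew
t_b = a_to_b.forward(1_000_000.0)
t_a = a_to_b.inverse(t_b)
```

Because the relation is invertible, either peer can compare timestamps without a third-party reference, as the text notes.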
This third point highlights the caveat to the conversion–based timesync ap-
proach: because clocks are not linear over long time periods, interpretation of his-
torical timestamps requires historical conversion information. Thus, conversion–
based approaches are plainly better for applications requiring precise timing com-
parisons of recent timestamps. However, this approach is not sufficient for appli-
141
cations that require interpretation of historical timestamps, or that require global
time or frequency references.
The easiest way to address this concern is to use the Emstar conversion–based
timesync system to publish “global time” out to the network from a trusted
time and frequency reference. The Emstar gsyncd module does exactly this; it
uses hop–by–hop local conversions to disseminate global time from one or more
locations. Thus, historical timestamps should be maintained in a global timebase,
while accurate local comparisons can be made directly using local conversion
parameters.
7.2.2 The Timesync API and Time Conversion Graph
The syncd module maintains a time conversion graph in which nodes represent
clocks and edges represent linear conversion functions. This graph includes con-
versions from two different types of data: RBS relations, and “pairs” relations.
An RBS relation relates clocks on two nodes by correlating the measured
times of events, for example of the reception of broadcast packets. The better
correlated the event time estimates, the better RBS will work. That is, if the
process of detecting an event tends to have correlated latency on all nodes, then
RBS will work well, even if the latency varies widely from one event to the next.
The canonical example is the reception of broadcast packets: RBS is immune
to non–determinism in media access times, variations in packet length and
transmit rate, and so on.
A pairs relation correlates an arbitrary clock to a node’s CPU clock. For
example, the relationship between the CPU clock and the sample clock for a
sound card is expressed as a pairs relation. The difference is subtle: where RBS
relations are between two instances of the same observation mechanism, pairs
relations are between two different observation mechanisms. This means that
pairs relations tend to introduce more error, and often have an error distribution
whose mean is not zero, whereas RBS relations can often factor out the
common mechanism, because the error it introduces is correlated across nodes.
These RBS and pairs relations form a connected graph. All of the pairs
relations on a particular node form a star topology around the CPU clock, while
RBS relations link from one node to another. The client side of the timesync
API allows an application to readily convert a timestamp in any known clock to
any other clock that is connected to it through the conversion graph. The server
side of the API allows a service to inject synchronization information that clients
can use.
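The conversion graph can be sketched as follows: clocks are nodes, each edge stores a linear conversion (and implies its inverse), and converting between two clocks composes the conversions along a path. This is our illustration, not the syncd data structure.

```python
# Sketch of a time-conversion graph: nodes are clocks, edges are linear
# conversions; converting composes conversions along the first path found.
from collections import deque

class ConversionGraph:
    def __init__(self):
        self.edges = {}   # clock -> list of (neighbor, rate, offset)

    def add(self, a, b, rate, offset):
        """Record a -> b: t_b = rate * t_a + offset, plus the inverse edge."""
        self.edges.setdefault(a, []).append((b, rate, offset))
        self.edges.setdefault(b, []).append((a, 1.0 / rate, -offset / rate))

    def convert(self, t, src, dst):
        """BFS over clocks, carrying the partially converted timestamp."""
        queue = deque([(src, t)])
        seen = {src}
        while queue:
            clock, value = queue.popleft()
            if clock == dst:
                return value
            for nbr, rate, offset in self.edges.get(clock, []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, rate * value + offset))
        return None   # dst is not reachable from src
```

Pairs relations form the star edges around each CPU clock, and RBS relations supply the inter-node edges.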
For example, in Section 7.1.2 we described how our implementation synchro-
nizes the sound hardware to the CPU clock by submitting observations to the
timesync subsystem as a pairs relation. Once this data is provided to syncd,
applications can convert between the sample clock and the CPU clock, and
thus to any other clock that is “reachable” from that node. By converting
timestamps in packets as they travel through the network, sensor data from re-
mote nodes annotated with timestamps can be precisely matched up with local
sensor data.
One of the improvements we made to the timesync system was to decouple the
portions of syncd that send and receive radio messages, the portion that performs
the linear fit computations, and the portion that services clients and servers.
These changes were necessary to allow the system to scale to larger numbers of
nodes, higher density deployments, and to more complex systems. Figure 7.3
shows a block diagram of the use of timesync in our platform and ranging appli-
cation. During the development of this platform, we discovered that the linear
[Figure 7.4 plot: Correlation of network interrupts to CPU via RBS (RMS = 8.7 µsec). Axes: seconds; left y axis, offset (µS); right y axis, fit error (µS).]
Figure 7.4: RBS correlation of the timing of received broadcasts. This graph shows
that CPU clocks are stable with respect to each other over time periods as long as 20
minutes.
fit computations were blocking the main thread and causing significant system
latency problems. We also encountered a problem with module dependency loops
when we added timesync support to the low layer radio interface driver. Both
of these problems were solved through the use of threads and message queues,
represented in the diagram by dashed lines.
7.2.3 RBS vs. MAC Layer Timestamps
During the development of this platform, we discovered that our existing
RBS–based synchronization was not performing as well as it had on other plat-
forms. Figure 7.4 shows the performance of RBS synchronization on our platform,
based on correlating interrupt arrival times on different nodes. The plot shows
[Figure 7.5 plots: (a) Correlation of two MAC clocks using RBS (RMS = 1.2 µsec); (b) Correlation of the MAC clock to the CPU clock (RMS = 393.6 µS). Axes: seconds; left y axes, offset (µS); right y axes, fit error (µS).]
Figure 7.5: The MAC clocks appear to actively adapt their rates, rather than main-
taining frequency stability: (a) shows a central mode with perfect rate matching, while
(b) shows that the frequency of the MAC clock is unstable when referenced to the CPU
clock. But we know from Figure 7.4 that the CPU clocks are stable with respect to
each other.
[Figure 7.6 plot: Frequency stability of MAC–level timestamps; observations after applying linear correction. Axes: seconds in CPU time; y axis, µSec.]
Figure 7.6: Expanded plot of MAC timestamps showing high levels of noise.
seconds on the x axis, and shows the residuals from the linear fit as impulses ac-
cording to the scale of the right–hand y axis (see Section 7.1.2 for a more detailed
explanation of these plots).
Where we had been able to achieve fit errors of a microsecond with our previ-
ous iPAQ platform and Orinoco 802.11 cards, we found that our performance on
the Slauson with SMC 802.11 cards was 15–30 microseconds fit error given the
same averaging parameters. We found that we could improve this situation by
extending the averaging period, but even this could only improve the fit error to
about 5–10 microseconds. The problem appeared to be that, compared with the
Orinoco firmware, the generation of interrupts by the Prism II firmware was less
deterministically correlated with packet reception. As a result, we were seeing a
great deal more noise in the data, which required longer averaging intervals to
correct.
In an effort to improve upon this, we implemented a new in–kernel inter-
face that exposed MAC layer header information about each packet, including
a microsecond–granularity MAC layer timestamp. We hoped that by using this
timestamp we could get a more precise RBS relation directly between cards, and
we could then create a comprehensive pairs relation from the MAC clock to the
CPU clock. The advantage of this approach would be that we could use every
arriving packet to improve the pairs relation, and the MAC timestamp would
be highly accurate because of its low–level source. In addition, by breaking the
conversion into two components, the MAC–MAC conversion layer and the MAC–
CPU conversion layer, conversions through multiple radio hops could remain in
terms of MAC clocks directly, thus avoiding the need to convert through the
higher–error MAC–CPU relation on each hop.
This seemed like a promising plan, but it ran into an interesting but fatal flaw.
Figure 7.5 describes the performance of the two components of our MAC layer
synchronization experiment. The upper graph shows the result of correlating the
MAC timestamps of broadcast packets received on two nodes (the MAC–MAC
conversion). The lower graph shows the result of correlating the MAC clock
to CPU clock (the MAC–CPU conversion) based on relating interrupt times to
packet timestamps. In both graphs, the x axis is seconds of real time, and the
right–hand y axis shows the residual of each point against the linear fit.
These graphs display some interesting properties. Considering first the graph
of MAC–MAC correlation, we see that it contains several widely–spaced modes,
with a very tight central mode. The syncd linear fitting algorithm automatically
performs outlier rejection, and in this case rejected the other modes as outliers,
leaving only the central mode and a very tight fit of 1.2 µS average error. However,
the fact that the spread of outliers exceeds 100 µS is worrisome.
The other interesting fact about the MAC–MAC correlation is that the rate is
exactly matched, with a computed rate skew of 44 picoseconds per second. Given
the large outliers, this perfectly matched rate casts more doubt on the validity of
the MAC timestamps.
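The outlier rejection mentioned above can be sketched as an iterative trim around a least-squares line; the rejection threshold and iteration count here are invented, not syncd's actual parameters.

```python
# Sketch: iterative outlier rejection around a linear fit. Points whose
# residual exceeds k * RMS are dropped and the line is refit.
def robust_fit(xs, ys, k=2.5, iters=3):
    pts = list(zip(xs, ys))
    for _ in range(iters):
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        rate = sum((x - mx) * (y - my) for x, y in pts) / sxx
        offset = my - rate * mx
        resid = [y - (rate * x + offset) for x, y in pts]
        rms = (sum(r * r for r in resid) / n) ** 0.5
        kept = [p for p, r in zip(pts, resid) if abs(r) <= k * rms]
        if len(kept) == len(pts) or len(kept) < 2:
            break
        pts = kept          # refit on the surviving points
    return rate, offset
```

On data like the MAC–MAC correlation, a scheme of this kind keeps the tight central mode and discards the widely spaced outlying modes.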
Now considering the graph of MAC–CPU correlation, we see a very non–linear
behavior. Over the course of 300 seconds, the rates of the two clocks have varied
greatly with respect to each other, resulting in large errors over this averaging
interval. A valid linear approximation of the MAC clock can tolerate averaging
intervals of at most a few seconds. We know that this frequency instability does
not come from the CPU clocks, because our CPU–CPU correlation shown in Fig-
ure 7.4, demonstrates very linear behavior over 1200 seconds. Worse still, we can
see from Figure 7.6 that in addition to having very poor frequency stability, they
also have large amounts of noise, with frequent spikes greater than 50 microsec-
onds. The combination of these two factors is disastrous, since the noise can only
be corrected if we can assume that the clock will remain linear.
We did not perform any experiments to carefully measure these clock prop-
erties against a ground truth frequency source, and we do not know precisely
why the MAC clock performs poorly. One hypothesis is that in ad–hoc mode the
802.11 cards continually sync to each other’s clocks, for example by training the
clock in response to incoming packets. This would explain our observation that
the RBS sync among MAC clocks reported a rate skew of essentially zero, suggesting that
the MAC clocks are keeping their rates synchronized to each other. As an object
lesson, this demonstrates once again the value of the RBS approach, which,
rather than relying on features of the hardware, is largely blind to its
implementation details.
7.3 Hop–by–Hop Time Conversion
Thus far we have discussed the design and application of our synchronized sam-
pling layer, but we have not discussed in detail how this works in a multihop
network. As we have seen, the synchronized sampling layer provides a network
of time conversions on each node, linking local clocks such as the sample clocks
of the sensors and the CPU clock, and also linking to CPU clocks on neighboring
nodes. However, we have not addressed connections to nodes more than one hop
away.
We could solve the multihop case in two ways. The first possibility would be
to publish the neighborhood conversion information throughout the network, so
that any node could convert from any other node’s clock to its own. This has
the disadvantage of being costly in terms of network traffic, and also does a large
amount of unnecessary work in the event that conversions are not needed. In ad-
dition, since time conversion data does not remain valid forever, new conversions
would constantly need to be flooded throughout the network.
Given these drawbacks, we chose to add hooks into the routing layer that
would convert packets in flight. In other words, when a packet containing a
timestamp is sent by an application, that timestamp is modified at every hop to
convert it into the local timebase. Any packet that is successfully converted can
be forwarded on to its destination. These hooks are implemented in a library
that all routing modules invoke to process packets. The library understands
certain packet types; to implement hop–by–hop conversion for a new application,
the developer need only modify that library to add the appropriate conversion
hook3.
3 The Directed Diffusion API [IGE00] provides a more elegant solution to the problem of modifying packets at every hop. Our system could take a similar approach, but it is not clear whether the additional generality would be worth the increase in complexity.
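The effect of the hook can be sketched as follows: at each hop the timestamp in the packet is rewritten into the next node's timebase, so it always arrives expressed in the local clock. Node names and conversion parameters below are hypothetical.

```python
# Sketch: hop-by-hop timestamp conversion along a route. Each link carries
# (rate, offset) parameters mapping the upstream node's timebase to the
# downstream node's; the packet's timestamp stays in the current hop's clock.
def forward_packet(packet, route, conversions):
    """route: ordered node names; conversions[(a, b)] = (rate, offset)."""
    for a, b in zip(route, route[1:]):
        rate, offset = conversions[(a, b)]
        packet["timestamp"] = rate * packet["timestamp"] + offset
    return packet
```

Only packets that actually carry timing information pay the conversion cost, and only along the source-to-destination path, matching the scaling argument in the text.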
This solution has the benefit of using the latest conversion data available
(since it is getting it straight from the source), as well as only doing work when it
is actually needed—when timing information needs to be transferred through the
network. It also is a localized algorithm; since all time conversions are computed
within a neighborhood and since the packet need only travel from the source
to the destination, this solution scales with the number of hops from source to
destination, independent of the overall size of the network.
The gsyncd service mentioned in Section 7.2.1 provides a similar solution to
this problem by pushing out timestamps and converting them at every hop to
compute conversions at every node to a single global time reference. gsyncd
builds a global sync tree from one or more global time references to every node,
minimizing the error in the conversion paths from the root [KEE03].
This solution is a good alternative to hop–by–hop conversion, although in
many cases it would not perform as well as converting along the direct path
from source to destination, because it would involve more conversions. Using
gsyncd, all conversions take the path from the source up the global sync tree
to the branch where the path to the destination joins the tree, and then back
down the tree to the destination. However, using routing with integrated hop–
by–hop time conversion, conversions only occur along the path from the source
to destination, which is often a shorter path than the path through the root. The
other disadvantage of using the gsyncd approach is that it requires the additional
coordination of a global sync broadcaster, without which the system would fail.
CHAPTER 8
Multihop Wireless Layer
One of the most challenging aspects of wireless embedded networks is the de-
sign and development of the communications stack. This has been an active
area of research in the field, from ad–hoc IP routing protocols [PB94] [JM96]
[PR99] [CJB01] to a wide variety of protocol stacks designed for sensor net-
works [WTC03a] [IGE00]. The principal problem is that many of the abstrac-
tions that were developed in the context of the wired Internet no longer work
well when applied to wireless embedded systems. Given the relative youth of
the wireless embedded networking field, this leaves us with the difficult task of
developing all of the layers of the stack more or less from the ground up.
8.1 How Wireless is Different
Wireless networks differ substantially from wired networks. First, wireless net-
works tend to have lower capacities than wired networks. At any point in
time, wireless network technology tends to lag behind the equivalent wired tech-
nology in terms of capacity. For example, the current typical wired Ethernet
chipsets range from 100–1000Mb/sec, while typical wireless rates range from 11–
54Mb/sec. In addition, energy conservation is an important part of many
embedded wireless applications, and both transmitting and listening have
significant energy costs, thus encouraging the development of more efficient
protocols.
Second, wireless networks are significantly less reliable than wired networks.
This is true not only in terms of the probability of packet loss, but also the prob-
ability of a failure of contention–avoidance mechanisms such as CSMA. This fact
is one of the primary reasons that the TCP protocol performs poorly over wireless
links: TCP erroneously interprets packet loss as an indicator of congestion and
backs off its transmission rate.
Third, unlike wired networks, wireless networks do not have a pre–defined
topology. Rather, each pair of nodes has some probability of transmission success
in the absence of other traffic, and that probability varies as a function of time.
Some pairs that exhibit a very low probability of success might be considered to
be “disconnected”, but even with low probability some packets will get through,
and in addition that status may change over time. To make matters worse,
links are not always symmetric. Hardware differences and asymmetries in the
noise environments can yield links with radically different loss rates forward and
reverse. There are many characterizations of wireless networks in the literature,
such as [CWP05] [ZG03] [WTC03b] [CAB03].
This means that, relative to wired networks that for the most part make
routing decisions purely on the basis of whether a link is up or down, routing
protocols in wireless networks have to:
• Continuously estimate the characteristics of links to neighbors, while dis-
counting the effects of collisions.
• Continuously agree upon a multihop topology.
• Route data as required by the application, while minimizing overhead.
Building such a system is a daunting task, because to make it work requires a
vertical solution that addresses all of these problems at once. This task is made
more difficult because much of the prior work in existing layered protocol designs
does not apply.
In this work, to narrow the scope of this problem we have attacked it from the
perspective of building StateSync, an efficient vertical protocol implementation
that provides a simple publish/subscribe API. In the process of developing several
implementations of this protocol, we learned a little more about the shape of a
more general protocol stack for wireless networks. We discuss this work in the
remainder of this Chapter.
8.2 The StateSync Abstraction
As the field of embedded networked sensing matures, useful abstractions are
emerging to satisfy the needs of increasingly complex applications. As a part of
the implementation of our acoustic position estimation application, we developed
StateSync, an abstraction for reliable dissemination of application state through
a multihop wireless network.
The StateSync layer presents a publish/subscribe interface to a set of application–
defined tables. The contents of these tables are reliably and efficiently broadcast
a specified number of hops away, using a protocol that is robust to changes
to the network topology and changes in the receiver set. StateSync conforms to
a minimal consistency model for received values published by a single node, but
does not attempt to guarantee consistency between received values published by
different nodes. Using StateSync, the complexity of the multihop wireless net-
work is reduced to processing a gradually evolving set of table entries, subject to
certain minimal consistency checks.
8.2.1 Application Requirements
Embedded networked sensing applications inherit a long list of application re-
quirements that are more or less unique among distributed systems. The main
distinguishing characteristic is a high degree of dependence on the environment,
in the face of dynamic conditions and a limited capability to discover environ-
mental properties with certainty. Properties of the environment often affect both
system performance and the application’s objectives, and thus must be estimated
to achieve the system’s goals. These issues are at the heart of the design of suc-
cessful system components for embedded networked sensing applications.
We designed StateSync to extend the ideas of previous abstractions [WSB04]
and protocols [LPC04] to support a specific class of applications. These
applications have the following properties:
• Reliable delivery greatly simplifies the design of the application.
• A relatively large amount of data is shared, and freshness of the data is
important, including assurance that the publisher of data is still active.
• The data being shared exhibits low “churn”, meaning that the expected
lifespan of a data element is long compared with the system latency re-
quirements.
Our acoustic position estimation system is a good match to these properties.
Our system needs to disseminate range estimates throughout the network in or-
der to fuse them into a coordinate system. These range estimates tend to stay
constant as long as the nodes do not move, but might change drastically in the
event that a node is disturbed. Reliability is important for this application, be-
cause after a change to the position of one of the nodes, inconsistent or stale data
can present problems for the multilateration algorithm. The range data in this
application tends to have long lifespans, often going for hours or days without
modification. When modifications do occur, they often affect only a small frac-
tion of the data being published by a given node. Despite these long lifespans, low
latency is desirable, because additional latency in the propagation of updates di-
rectly affects the application–level performance of the position estimation system
by delaying position updates.
Building applications over the StateSync abstraction not only greatly sim-
plifies the implementation of applications, but also provides opportunities for
efficiently aggregating application state changes. Other examples of services and
applications that can benefit from this type of layer are routing protocols, con-
figuration and calibration mechanisms, and membership agreement protocols.
This work is similar to prior work in the wired network domain, including
ISIS [BC91], SRM/WB [FJL97], and implementations of Linda [GB82]. However,
these techniques make assumptions about Internet and LAN performance and
connectivity properties that do not hold for ad–hoc wireless networks. Our work
is designed to provide similar abstractions, but the protocol and implementation
is designed from the ground up for embedded ad–hoc wireless networks.
8.2.2 The StateSync Abstraction
The StateSync abstraction defines the data model, the API, and the semantics
of StateSync. StateSync imposes a simple data model of typed key–value pairs.
The data types are user–defined and can either specify a fixed record and key
length, or use variable record and key lengths. The key–value pairs are implicitly
annotated with a flow ID that includes a unique address for the publisher and
other user–definable fields. This additional implicit key effectively assigns each
Figure 8.1: Publisher applications push tables of key–value pairs to StateSync, which
disseminates them and delivers the complete table of all received keys to subscribers
whenever a change occurs.
publisher an independent key–space. At most one value is permitted per key:
when a pair is published with the same key, type, and flow ID as an existing pair,
the original pair is replaced.
The StateSync API presents a Publish/Subscribe interface. A publisher pro-
vides StateSync with a complete set of keys for a given type and flow ID to replace
all existing keys for that type and flow ID. A subscriber will receive events when-
ever there is any update in the data matching a specified type. The complete
data matching that type can be retrieved from StateSync, combined from all flows
that reach the subscriber. Each key–value pair is annotated with the flow ID of
the publisher of that data, as well as other metadata such as the arrival time and
the distance to the publisher in hops. For fixed–length records, simple arrays of
records are passed to and from the API. Figure 8.1 shows a block diagram of the
StateSync API.
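The data model and API just described can be condensed into a short Python sketch. The class and method names here are ours for illustration and do not correspond to the actual implementation, which also annotates each pair with metadata such as arrival time and hop distance (omitted here):

```python
from collections import defaultdict

class StateSyncSketch:
    """Toy model of the StateSync data model and publish/subscribe API."""

    def __init__(self):
        self.tables = {}                      # (type, flow_id) -> {key: value}
        self.subscribers = defaultdict(list)  # type -> list of callbacks

    def publish(self, type_, flow_id, pairs):
        # A publisher hands over the *complete* key set for this type and
        # flow ID, replacing all previously published keys for that pair.
        self.tables[(type_, flow_id)] = dict(pairs)
        self._notify(type_)

    def lookup(self, type_):
        # A subscriber sees the union of all flows of the given type that
        # reach it; each pair is annotated with its publisher's flow ID,
        # which gives each publisher an independent key space.
        merged = {}
        for (t, flow_id), table in self.tables.items():
            if t == type_:
                for key, value in table.items():
                    merged[(flow_id, key)] = value
        return merged

    def subscribe(self, type_, callback):
        self.subscribers[type_].append(callback)

    def _notify(self, type_):
        # Deliver the complete merged table whenever any flow changes.
        for cb in self.subscribers[type_]:
            cb(self.lookup(type_))
```

As in Figure 8.1, a subscriber is re-delivered the complete table of all received keys on every change, rather than a stream of individual updates.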
The StateSync mechanism provides semantics that are designed to be relaxed
enough to be implemented efficiently in a wireless network, while still maintaining
useful properties. The StateSync subscribe interface presents only the most recent
state to a subscriber; it does not present each intermediate published state. This
policy eliminates the need to retain a backlog or a complete history in the event
of lengthy disconnection. In addition, StateSync guarantees that each state pre-
sented at a subscriber was in fact an actual prior state of the publisher. That is,
the view at the subscriber is never a partial state of the publisher (such as would
occur if a sequence of updates were played out of order). Third, the latency with
which a state propagates from publisher to receiver conforms to a probabilistic
latency bound that is a function of the number of hops, the size of the transfer,
and timers in the implementation. StateSync deliberately relaxes any guarantee
of consistency across disparate publishers. Consistency is guaranteed across the
set of receivers of a given published state, after no change has occurred for the
expected latency bound for the farthest node.
8.2.3 Related Work
The design of StateSync builds on the observations and experience of many past
and present systems in sensor networks. The importance and value of a neighbor-
hood abstraction was clearly laid out in the discussion of Hood [WSB04]. Hood
provides a way to approach several important concepts about neighborhoods,
and provides a best–effort transport layer. StateSync provides a similar API to
Hood, but extends its scope by defining a model that includes reliable delivery
over multiple hops. The Hood and StateSync solutions in some ways address
orthogonal application properties. Whereas Hood is designed to share ephemeral
data in a best–effort fashion, StateSync is designed to share long–lived data with
very low quiescent cost. Each of these solutions advances a significant space of
applications.
Relative to much prior work that presents very generalized solutions to problems
in distributed systems, StateSync defines a narrower set of properties, which
nonetheless represent a large application space. The StateSync API draws upon
prior experience with Publish/Subscribe interfaces in the context of Directed
Diffusion [IGE00] [HSE03] and other early work in Sensor Networks. However,
StateSync imposes more structure than a simple raw data interface, providing
an interface supporting application–defined fixed–length tables. The StateSync
data model of typed key–value pairs draws on experience with Tuple space sys-
tems such as Linda [GB82]. However, StateSync relaxes most of the locking and
group consistency semantics, because group consistency is generally too heavy–
weight for the wireless networks StateSync is designed to support. The StateSync
implementations build upon Diffusion Trees and upon work in reliable multi-
cast [FJL97], but encapsulate most of the protocol details behind an interface
that is fairly implementation–independent.
StateSync’s focus on maintaining a low quiescent cost of state synchroniza-
tion bears much resemblance to the Trickle [LPC04] protocol for code update on
TinyOS motes. In implementing our algorithms, we focused on low latency oper-
ation, efficient support for many concurrent publishers, and prompt detection of
the disappearance of a publisher. Trickle is designed for higher latency tolerance,
and while Trickle can support multiple trees, the costs scale with the number of
trees. The “polite gossip” mechanism of Trickle is a very effective way to reduce
quiescent cost of maintaining state, but unfortunately the savings is incompatible
with detecting source disappearance.
8.3 Variants of StateSync
In our exploration of the StateSync abstraction, we developed several variants
of varying complexity and with different performance characteristics in terms of
latency and network traffic. Since each variant conforms to a common API, we
can readily compare them in the context of different applications.
In this section we present three StateSync variants, in increasing order of
sophistication: SoftState, LogFlood, and LogTree. SoftState is a very simple im-
plementation based on periodic re–flooding of the complete state with no retrans-
mission mechanism. LogFlood introduces a log mechanism to enable publication
of updates to existing state and implements a local retransmission protocol, while
using a flooding mechanism to push data with low latency. LogTree introduces
an overlay network consisting only of the most reliable bidirectional links, and
forms distribution trees via that overlay. These variants are discussed in more
detail in the following sections.
8.3.1 SoftState
SoftState implements a periodic refresh of the complete state published by each
node. Each refresh is transmitted via a best–effort flooding service and is received
by nodes a specified number of hops away. If the complete state is larger than
a single MTU, the message is fragmented and reassembled across each hop. No
other form of reliability is implemented, so as the state size grows the latency of
SoftState increases rapidly. The latency of updates is a function of the refresh
interval and of the probability of message loss, which is in turn a function of total
state size.
SoftState is a very simple variant of StateSync with numerous drawbacks—
for example, its quiescent cost is high for most applications. However, it is
sufficient for some applications, and it can be readily implemented on low–end
platforms. An application that publishes only small amounts of data and can
accept the bandwidth / latency tradeoff can use this protocol. SoftState is also
appropriate for applications with high “churn” relative to latency requirements.
If the expected lifetime of the data being published is on the order of the required
refresh interval, then there is little to be gained by transmitting only the portions
of the state that have changed.
8.3.2 LogFlood
The LogFlood variant introduces two important mechanisms that enable higher
efficiency and allow StateSync to be applied to a much larger space of applications.
The first is a log mechanism that stores and transmits published data in the
form of a log of additions and deletions of key–value pairs. This log enables the
data to be broken down into small segments and transmitted and re–transmitted
piecemeal. The second is a local retransmission protocol that can request missing
segments from a neighbor based on sequence numbers. In the following sections,
we will show that these two mechanisms enable much larger amounts of state to
be transmitted efficiently.
8.3.2.1 The StateSync Log Scheme
As we have described in Section 8.2, StateSync is based on a key–value data
model and the API is tuned to support tables of fixed length key–value pairs.
These design decisions fit neatly into a log–based transport scheme, because they
enable the application to define the granularity at which changes typically occur,
and specify precisely which parts of the existing state need to be re–transmitted.
The StateSync log scheme is designed to provide correctness with low overhead
and to support a continuous stream of log entries. The StateSync log is composed
of a sequence of variable–length entries containing a 16–bit sequence number and
a command field. The first entry is always an INIT command, and has sequence
number 0. The INIT message contains a 64–bit log sequence number that is
chosen randomly by each node on boot and is incremented whenever a new log is
created. This sequence number is used to protect StateSync against inconsistency
from reboots or stale data.
Following the INIT command, a sequence of ADD and DEL entries represent
the addition and deletion of keys. An ADD entry adds a new key and value
to the state published by a given node, replacing any previous entry with the
same key. A DEL entry removes an existing key and value from the published
state. Additional command types are used to fragment large entries that might
otherwise exceed the network MTU.
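To make the log semantics concrete, the sketch below replays a log into the state it represents, using the example entries from Figure 8.2. The (seq, command, key, value) tuple encoding is invented for illustration; real entries are binary and variable–length:

```python
def apply_log(entries):
    """Replay a StateSync-style log into the published state it encodes."""
    state = {}
    for seq, cmd, key, value in entries:
        if cmd == "INIT":
            assert seq == 0     # INIT is always entry 0; value is the log seq number
            state = {}
        elif cmd == "ADD":
            state[key] = value  # replaces any previous pair with the same key
        elif cmd == "DEL":
            state.pop(key, None)
    return state

# Example matching the checkpointed log in Figure 8.2:
log = [(0, "INIT", None, 2367), (1, "ADD", "x", 1), (2, "ADD", "y", 4),
       (3, "DEL", "x", None), (4, "ADD", "y", 6)]
```

Because ADD and DEL operate at the granularity of whole key–value pairs, any prefix-complete replay of the log yields a valid past state of the publisher.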
Unlike protocols like TCP that use byte ranges, sequence numbers in a StateSync
log are assigned at the granularity of log entries. The reason for this design
choice is two–fold. First, sequencing at a larger granularity reduces the required
size of the sequence numbers, and thus reduces protocol overhead. Second,
by always transmitting whole entries rather than byte ranges, the log entries
can be processed by the application out of order, as in application layer fram-
ing [FJL97]. The drawback of this scheme is that, unlike the case of IP fragmenta-
tion, StateSync log entries cannot be adaptively fragmented “in flight”. Instead,
a predefined granularity must be selected at design time, taking into account the
MTU of the networks in the system and the expected size of the values published
by the application. While the choice of granularity can impact the utilization of
packets, in practice we have been able to use a single default value for all of our
development.

Checkpointed Log: INIT(2367), ADD x=1, ADD y=4, DEL x, ADD y=6, TERM
Active Log: INIT(2368), ADD y=6, ADD z=3, ADD w=9, DEL y, ADD z=3, ADD s=6, ...

Figure 8.2: The StateSync Log Scheme maintains a checkpointed and an active log.
In the diagram, the first two ADD entries in the active log are carried over from the
checkpointed log after the redundant entries have been compressed out.
The other key design problem for the StateSync log mechanism is how to
address the problem of an infinitely growing log. While ADD and DEL commands
often make a previous log entry redundant, those redundant log entries cannot be
deleted without forfeiting the semantic requirement that StateSync subscribers
always see a valid past state of the publisher. In addition, as state changes
occur, an increasing fraction of the sequence space will be consumed by redundant
entries. Given StateSync’s relatively small 16–bit sequence numbers, this can lead
to sequence number exhaustion. To address this we apply a solution similar to
the “new page” abstraction implemented by the WB application [FJL97].
Each StateSync log maintains two sub–logs: a checkpointed log and an active
log, as shown in Figure 8.2. New additions to the log are always appended to the
active log. When certain conditions are met—such as a maximum level of redun-
dancy in the log—the active log is “checkpointed”. A special TERM command is
appended to the active log, and it is rotated into the checkpointed slot. A new
active log is formed by incrementing the log sequence number and compressing
the previous active log, renumbering the entries starting from sequence 0.
The checkpointing process addresses the problem of infinite logs at minimal
cost. The only cost of the scheme is an additional TERM entry; once the ter-
minated log is received completely, the checkpointing process is a local opera-
tion that does not require any additional network traffic. As an optimization,
StateSync will queue out–of–order entries that pertain to the new active log be-
fore checkpointing is complete.
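Under the same illustrative (seq, command, key, value) tuple encoding used earlier, the checkpoint rotation might be sketched as follows; the compression policy, keeping only the last surviving value of each key, is what preserves the guarantee that subscribers see only valid past states:

```python
def checkpoint(active_log, log_seq):
    """Terminate the active log and build a new, compressed active log,
    renumbered from sequence 0 under an incremented log sequence number.
    Sketch only; the real logs hold binary, variable-length entries."""
    # Append the special TERM entry and rotate into the checkpointed slot.
    terminated = active_log + [(len(active_log), "TERM", None, None)]
    # Compress out redundant entries: keep the last surviving value per key.
    state = {}
    for _seq, cmd, key, value in active_log:
        if cmd == "ADD":
            state[key] = value
        elif cmd == "DEL":
            state.pop(key, None)
    new_log = [(0, "INIT", None, log_seq + 1)]
    for i, (key, value) in enumerate(state.items(), start=1):
        new_log.append((i, "ADD", key, value))   # entries carried over
    return terminated, new_log
```

Once the terminated log has been received completely, this is a purely local operation, matching the claim that checkpointing requires no additional network traffic beyond the TERM entry.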
8.3.2.2 The StateSync Retransmission Protocol
Once the state data is organized in a sequenced stream of small blocks, we can
implement a local retransmission protocol. Similar to many reliable multicast
protocols, StateSync’s retransmission protocol is receiver–driven with proactive
broadcast as an optimization [FJL97]. Receivers add received entries to their logs
and maintain state about which log entries are missing based on sequence num-
ber gaps. Receivers then schedule NACK requests for specific missing sequence
ranges, with an initial delay followed by an exponential backoff. Optimizations
such as NACK suppression and more sophisticated timers such as in [FJL97] are
not currently implemented.
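The receiver side of this protocol reduces to detecting gaps in the sequence space; a minimal sketch, ignoring 16–bit wraparound and the backoff timers, might look like:

```python
def missing_ranges(received_seqs, highest_known):
    """Return the (start, end) sequence ranges a receiver should NACK,
    given the set of sequence numbers received so far and the highest
    sequence number known to exist. Simplified: real StateSync sequence
    numbers are 16-bit and wrap."""
    gaps, start = [], None
    for seq in range(highest_known + 1):
        if seq not in received_seqs:
            if start is None:
                start = seq              # opening a new gap
        elif start is not None:
            gaps.append((start, seq - 1))  # gap closed by a received entry
            start = None
    if start is not None:                # gap extends to the highest known entry
        gaps.append((start, highest_known))
    return gaps
```

Each returned range would then be scheduled as a NACK request with an initial delay followed by an exponential backoff, as described above.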
The wire protocol used by StateSync is designed to be efficient in terms of
network usage and flexible in terms of packet structure. The packet format does
not have a pre–defined header structure, but rather is composed of a series of
variable–length entries, similar to other proposed wire formats [GKE04] [BFH03]
[IGE00]. As a result, this flexible structure exhibits lower overhead and is also
more amenable to piggybacking on other traffic. The wire protocol incorporates
numerous optimizations, such as the ability to define a length field that applies
to several subsequent log entries, or a sequence number that applies to several
subsequent NACKs. For example, the overhead of sending 20 sequential range
entries in our acoustic localization application is 25 bytes beyond the 400 bytes
of data.
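The amortization behind that 25–byte figure can be illustrated with a shared-header run encoding: one length field (and one starting sequence number) covers a run of equal-length sequential entries. The field layout below is invented for illustration and is not the actual StateSync wire format:

```python
import struct

def pack_run(entry_type, first_seq, records):
    """Pack a run of equal-length, sequentially numbered records under a
    single shared header (type, count, record length, first sequence),
    instead of repeating a header on every record."""
    assert len({len(r) for r in records}) == 1   # all records equal length
    assert len(records) < 256                    # count fits in one byte
    header = struct.pack("!BBHH", entry_type, len(records),
                         len(records[0]), first_seq)
    return header + b"".join(records)

# Twenty 20-byte range records cost a single 6-byte header in this encoding.
payload = pack_run(0x01, first_seq=100, records=[b"\x00" * 20] * 20)
```

Per-record headers would instead repeat the length and sequence fields twenty times, which is the overhead the shared fields avoid.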
8.3.2.3 LogFlood Multihop Implementation
Given the log and protocol mechanisms described above, the LogFlood multihop
implementation is straightforward. First, the retransmission protocol is extended
to include the flow–id and the current hopcount of the data that follows. The
flow–id identifies the publisher–subscriber pair and any additional de–multiplexing
bits. In this case, the publisher is identified by a network–layer address and the
subscriber is always “broadcast N–hops”. Because of the flexible structure of the
wire protocol, entries from multiple flows can be packed into a single message.
With this minor change, a simple state machine can implement the multihop
flooding protocol. Incoming messages are parsed to extract the flow they pertain
to, the hopcount, and the log entries comprising the data. Any messages that
are not already present in the log and that are not beyond the maximum desired
hopcount are scheduled for retransmission. The hopcount of a flow is determined
by recording the lowest hopcount of incoming messages on that flow, and adding 1.
When the next transmission is scheduled, any outgoing entries are concatenated
with their flow–id’s and hopcounts into a single packet and broadcast out to
neighbors.
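This forwarding state machine can be sketched as a per-message handler; the data structures (node.logs, node.flow_hops) are hypothetical names for the state described above, and retransmission handling is omitted:

```python
def handle_message(node, flow_id, msg_hopcount, entries, max_hops):
    """Process one incoming flood message for one flow and return the
    entries to schedule for rebroadcast. Sketch of the LogFlood flood."""
    # A flow's hopcount is the lowest hopcount seen on incoming messages,
    # plus 1.
    node.flow_hops[flow_id] = min(
        node.flow_hops.get(flow_id, msg_hopcount + 1), msg_hopcount + 1)
    log = node.logs.setdefault(flow_id, set())
    to_forward = []
    for seq, entry in entries:
        if seq not in log:           # skip entries already present in the log
            log.add(seq)
            # Forward only if we are inside the maximum desired hopcount.
            if node.flow_hops[flow_id] < max_hops:
                to_forward.append((seq, entry))
    return to_forward  # entries from multiple flows share one broadcast packet
```

When the next transmission fires, the returned entries are concatenated with their flow–ids and hopcounts into a single broadcast, which is how floods from different sources piggyback on the same packets.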
This simple state machine, in addition to the local retransmission protocol,
implements an efficient many–to–many flood that can piggyback floods from dif-
ferent sources onto the same packets. However, it is not guaranteed to be reliable;
Figure 8.3: A screen shot from EmView displaying the wireless testbed deployed in our
building. The scale of the map is 5 meters per grid square.
if the last packet is lost, the retransmission protocol cannot discover that there is
a sequence number to NACK. To solve this issue, LogFlood also floods a periodic
refresh message, beginning a fixed time after the last new log entry was flooded.
These messages are small and can be piggybacked as described above, but still
represent a significant quiescent transmission overhead. They also place limits on
join latency. The quiescent cost scales roughly as nk where n is the total number
of nodes and k is the number of nodes in the flood radius; the join latency is
determined by the refresh rate.
8.3.3 LogTree
The LogTree variant builds on the log scheme and local retransmission protocol
described in Section 8.3.2. However, where LogFlood used a flooding protocol for
proactive dissemination and end–to–end reliability, LogTree implements a distri-
bution tree for each publisher in order to reduce redundant transmissions without
significantly impacting latency. LogTree also reduces the quiescent cost of the re-
liability mechanism to 1 message per node per refresh interval, compared with
k messages per node for LogFlood. To accomplish this, LogTree introduces an
underlying layer called ClusterSync.
8.3.3.1 ClusterSync
The ClusterSync mechanism serves two functions. First, it estimates the topology
of the network and constructs an overlay network consisting only of links that
meet certain criteria. Second, it provides a single–hop version of StateSync, with
the same API and semantics.
To form the overlay, ClusterSync uses a link estimator and periodic beacons to
discover the topology of the network and to continuously estimate link quality. It
uses a link estimator called RNPLite that consumes one additional byte of over-
head per packet and computes link estimates based on the principles in [CWP05].
ClusterSync uses the link estimates to select links for the overlay that meet cer-
tain criteria, including bidirectionality, a minimum link quality metric, and a
connectivity metric that prefers neighbors with distinct neighbor sets.
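A sketch of this selection step follows. The constants and scoring below are illustrative: the text names the criteria (bidirectionality, a minimum link quality, and a preference for neighbors with distinct neighbor sets) but not the exact values used by RNPLite and ClusterSync:

```python
def select_overlay_links(links, min_quality=0.8):
    """Choose overlay links from link-estimator output. `links` maps each
    neighbor to (forward_quality, reverse_quality, its_neighbor_set).
    Hypothetical sketch, not the actual ClusterSync policy."""
    my_neighbors = set(links)
    scored = []
    for nbr, (fwd, rev, nbr_set) in links.items():
        if fwd < min_quality or rev < min_quality:
            continue   # reject weak and one-way (asymmetric) links
        # Prefer neighbors whose neighbor sets add coverage beyond our own.
        distinct = len(set(nbr_set) - my_neighbors - {nbr})
        scored.append((distinct, nbr))
    return [nbr for _dist, nbr in sorted(scored, reverse=True)]
```

Filtering on both directions of the link is what guarantees the overlay contains only bidirectional links suitable for request/response traffic such as NACKs.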
The single–hop version of StateSync uses the same log scheme and retransmis-
sion protocol as other versions of StateSync. End–to–end reliability is achieved
by each node periodically including its latest sequence number in the beacon
message it sends for link estimation. When other ClusterSync traffic is present,
beacon messages and sequence numbers are piggybacked on existing traffic.
The ClusterSync mechanism has many advantages that pertain to applica-
tions. Many applications benefit from ClusterSync’s stable overlay network and
prompt detection of topology changes. While topology information is not always
necessary to the correctness of an application, it often simplifies the application
and results in greater responsiveness. The stable overlay also presents a more
stable definition for the hopcount used to limit the scope of state dissemination.
From an application’s perspective, it is often more important that the receiver set
be stable than that they be a specific “distance” away. ClusterSync also provides
an efficient way to reliably disseminate state variables to immediate broadcast
neighbors. ClusterSync will provide the greatest performance improvement to
applications that need to publish keys with long lifespans, since the up–front cost
of reliable transfer is amortized by higher efficiency during quiescent periods.
8.3.3.2 LogTree
LogTree is a multihop StateSync variant that builds distribution trees to yield
a performance improvement over LogFlood. It builds its trees in the overlay
topology constructed by ClusterSync, and uses ClusterSync to publish routing
and flow metadata.
LogTree implements a distance vector algorithm that computes a route and
a number of hops back to each publisher. The route to each publisher is used
to select a peer for requesting local retransmissions and the hopcount to the
publisher is used to determine whether or not to proactively forward new data.
Each node also advertises its “preferred” upstream peer for transmission, which
is used to prune the proactive distribution tree. All of this routing metadata (i.e.
flow ID, hopcount, and preferred upstream peer) is published to adjacent nodes in
the overlay network through the ClusterSync mechanism. Because ClusterSync
is reliable, LogTree only needs to process updates from ClusterSync and keep
pushing its most recent routing state back to ClusterSync. The ClusterSync
layer handles all of the complexity of message loss and of timing out stale data
and stale neighbors.
LogTree implements end–to–end reliability using a similar mechanism. In ad-
dition to the other per–flow routing metadata, a log sequence number is published
via ClusterSync. This sequence number propagates along with the distance vector
messages to inform all nodes of the most recent sequence number published by
the source node. In order to limit the traffic pushed through ClusterSync, LogTree
sets a 5 second holdoff timer after each new data element is pushed before push-
ing a new sequence number out via ClusterSync. This information enables nodes
to request retransmissions in the event that the most recent message of the log
was lost.
8.3.3.3 Optimizations to LogTree
Our experiments with LogTree show that it outperforms LogFlood and SoftState
in terms of total volume of data transferred, while incurring only a modest
latency penalty (see Section 8.4). However, in order to achieve these results we
implemented two optimizations: flooding mode and flow–ID compression.
Flooding mode addresses the startup latency of ClusterSync and of building
distribution trees. The original LogTree implementation suffered latency problems
in the event that the overlay network had not yet formed, or when the distribution
tree for a particular source had not yet been constructed. To address this, we
modified LogTree to proactively flood messages when the maximum hopcount had
not yet been reached and neighbors were observed that did not report an active
tree. This optimization achieves latency similar to LogFlood's, while incurring
the flooding bandwidth penalty only while the distribution tree is still under
construction.
Flow–ID compression is an optimization that allows the routing metadata to
scale better as the number of distribution trees grows. Each node defines a dictio-
nary that locally maps flow–IDs to small integers, and publishes this dictionary
through ClusterSync. This enables a full 12–byte flow–ID to be replaced by a
1–byte nickname, reducing the size of published route metadata and reducing
the size of headers on data messages that pertain to a given flow. This technique
might be applied to other nicknaming problems, although it can increase join
latency as the complete dictionary must be replayed to new neighbors.
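A minimal sketch of the nickname dictionary follows; the 12–byte flow–ID and 1–byte nickname sizes come from the text, while the class and method names are illustrative:

```python
class FlowNicknames:
    """Per-node dictionary mapping full flow-IDs to 1-byte nicknames.
    In LogTree the dictionary itself is published via ClusterSync, so
    neighbors can resolve nicknames back to full flow-IDs."""

    def __init__(self):
        self.to_nick = {}   # flow_id -> small integer nickname
        self.to_flow = {}   # nickname -> flow_id

    def nickname(self, flow_id):
        if flow_id not in self.to_nick:
            nick = len(self.to_nick)
            assert nick < 256   # nicknames must fit in a single byte
            self.to_nick[flow_id] = nick
            self.to_flow[nick] = flow_id
        return self.to_nick[flow_id]

    def resolve(self, nick):
        # Resolution requires the publisher's dictionary to have arrived,
        # which is why this scheme can increase join latency.
        return self.to_flow[nick]
```

Every data message and route metadata entry for a flow then carries the 1–byte nickname instead of the full 12–byte flow–ID.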
8.4 Benchmarking StateSync
In this section, we describe how we measure the performance of our StateSync
variants, both in a set of benchmark tests, and in the context of running appli-
cations.
8.4.1 Metrics and Experimental Setup
Our criteria are primarily focused on two metrics: the distribution of latency
in state propagation, and the network traffic incurred by our mechanisms. The
latency is determined by logging the activities of the application or benchmark,
matching up publish states with subscribe states, and logging the time lag. Net-
work traffic is determined by measuring the number of bytes and packets that
pass through the network interface, and in some cases by measuring statistics
gathered directly from the mechanisms.
Our measurements were taken from simulations and tests on a wireless testbed.
The testbed experiments were run from a centralized server with remote connections
to a set of twelve 802.11 radios hosted by Stargates distributed throughout
our building, as shown in Figure 8.3. The simulations were run within the Em-
star [GEC04] environment on a typical workstation. Simulations of the Localiza-
tion and Sink Tree applications were also run with a larger, 50–node topology.
For validation purposes, we also ran simulations using the same topology as the
testbed experiments, and found that the differences were negligible.
8.4.2 Benchmark Tests
In order to characterize the abstract performance of our different mechanisms,
we ran a series of benchmarks. The results of those benchmarks are shown in
[Three plots: Packets TX, Bytes TX, and Median Latency (sec); bars for LogFlood
and LogTree with 1 and 12 senders, over 1, 4, 16, and 64 chunks.]
Figure 8.4: Results of benchmark tests on the testbed. Each grouping of bars represents
four 20–minute experiments in which 64K of data is published in a fixed number of
chunks, issued at regular intervals.
Figure 8.4. These benchmarks are intended to measure the differences in per-
formance between the LogFlood and LogTree variants when driven with simple
workloads. Each experiment lasted 20 minutes, and published 64K of data via
StateSync, evenly distributed among the publishers in that experiment. The only
difference from one experiment to the next was the distribution of the data in
time (i.e., when it was published) and the number of nodes involved in publishing.
In the first set of experiments, only one node published data and we varied
the number of “chunks” the data was broken into. Each chunk was published at
a uniform division of the 20 minutes. From Figure 8.4, we can see that LogTree
always sends fewer bytes and generally achieves comparable latency.
One peculiar feature of the Bytes Transmitted graph is the fact that the 1
sender, 16 chunks case for LogFlood is so much higher than the other cases. This
is caused by additional retransmissions that occurred in that case. Apart from
that case, we see that in general for one sender the performance of LogFlood
appears to approach that of LogTree as the frequency of updates increases. This
occurs because LogTree gains the most ground in quiescent periods when updates
are not occurring, but the current state is being refreshed. As the frequency of
updates increases, the length of the quiescent periods decreases and LogTree loses
its advantage. In contrast, LogFlood performs best in the cases where quiescent
periods are short. The 1 sender, 64 chunk case performs almost optimally because
the data transmissions are spaced out such that collisions are unlikely and the
ratio of refresh messages to data messages is the minimum of all of our test cases.
In the second set of experiments, all 12 nodes in the network published at each
interval, dividing the same total amount of data among them. With 12 senders,
both variants incur greater traffic cost. LogFlood consistently sends more data
than LogTree, although LogFlood sends fewer packets (and per–packet overhead is
              First 600s           Last 600s
              Bytes     Packets    Bytes     Packets
LogFlood      979604    2074       108874    1283
LogTree       628467    2157       22231     830

Table 8.1: Packet and byte counts for LogFlood and LogTree for 1 chunk, 12 senders,
during the first and last 600 seconds of the 1200–second experiment.
not included in the byte counts.) This occurs because the current implementation
of ClusterSync sends its own independent packets rather than piggybacking them
on other traffic. Since each followup refresh in LogTree involves a change to the
data published through ClusterSync, this substantially increases the cost in terms
of packets, although these additional packets carry very small amounts of data.
This problem can be addressed in two ways: by piggybacking ClusterSync and
LogTree packets, and by changing the timing of ClusterSync packets to induce
more aggregation of data into each packet.
Increasing the number of senders magnifies the overhead incurred while the
data size being published remains the same. Relative to LogFlood, LogTree saves
overhead in three different ways. First, LogTree performs more efficiently during
quiescent periods. Table 8.1 shows the performance of LogTree during the quies-
cent period after the 64K data is transferred in the 1 chunk, 12 sender case. In
the latter 600 seconds of the experiment, LogFlood transmits 5 times more bytes
and 50% more packets than LogTree.
Second, LogTree can represent refresh messages and packet headers more ef-
ficiently because it can compress the flow IDs to single byte nicknames. In our
experiment, as the number of senders and chunks grows, the number of packets
grows as the product of senders and chunks.
Third, because it follows the broadcast distribution tree, LogTree transmits
[Figure: CDF of publication latency (sec) using LogTree, one curve per hopcount from 1 to 6 hops.]
Figure 8.5: The latency distribution, broken down by hopcount.
fewer times than LogFlood. Even though our test topology does not lend itself to
efficient distribution trees,1 we see that the total number of bytes transmitted by
LogTree beats the minimum possible byte count required for flooding (786KB) in
the 1 and 4 chunk, 12 sender cases.
This benchmark data also provides us with some idea on how to model latency
as a function of the amount of data being pushed and the number of hops. From
Figure 8.4, we see that the latency scales roughly linearly with the size of the
input data. This result was expected, given the various forms of rate limiting
implemented in the local retransmission protocol. In addition, Figure 8.5 shows
that the latency distribution shifts upward as a function of the number of hops
from the publisher. The bimodal distribution in latency reflects the probability
1 The long and narrow shape of our ceiling topology yields less redundancy than a grid topology.
of a loss that results in additional delay before the 5 second holdoff timer expires
and the new sequence number is pushed.
8.4.3 Determining Application Suitability
Latency and traffic consumed are a good starting point for determining whether
StateSync is helpful to an application. StateSync is most appropriate to applica-
tions that need notification when state is stale or when the source of some data
has disappeared. In cases such as these, epidemic protocols are not appropriate
because they will mask stale data; the only possible solution is some kind of re-
fresh mechanism. StateSync provides a reliable transport that protects a large
collection of state variables with a single aggregate refresh.
To quantify an application’s needs we characterize the application using two
metrics: the application’s specific latency requirements, and the level of “churn”
in the application’s data, defined by the expected lifetime of a key–value pair.
If the expected lifespan of application data is low enough compared with the
required latency bound, then a simple periodic refresh may be cheaper than the
expected cost of a reliable transmission protocol. However, if the lifespan of
application data is likely to be much longer than the latency with which stale
data is to be detected, then the additional overhead of a reliable protocol is
justified. This argument holds true to an even greater extent in cases where the
quantity of data being refreshed further increases the cost of refresh.
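This tradeoff can be made concrete with a back-of-envelope cost model. The sketch below is our own illustration, not code from the system; the transfer overhead fraction and beacon size are hypothetical placeholders:

```python
def refresh_cost_bytes(data_bytes, refresh_interval_s, lifetime_s):
    """Soft state: the full dataset is retransmitted every interval."""
    return data_bytes * (lifetime_s / refresh_interval_s)

def reliable_cost_bytes(data_bytes, refresh_interval_s, lifetime_s,
                        transfer_overhead=0.25, beacon_bytes=16):
    """Reliable transport: pay the transfer (plus protocol overhead)
    once, then send only a small aggregate refresh beacon."""
    return (data_bytes * (1.0 + transfer_overhead)
            + beacon_bytes * (lifetime_s / refresh_interval_s))

# Short-lived keys favor blind refresh; long-lived keys favor the
# reliable protocol.
short_lived = (refresh_cost_bytes(4096, 10, 10),
               reliable_cost_bytes(4096, 10, 10))
long_lived = (refresh_cost_bytes(4096, 10, 1500),
              reliable_cost_bytes(4096, 10, 1500))
```

With a 10-second refresh interval, a 4KB dataset that lives only one interval is cheaper to refresh blindly, while the same dataset living 1500 seconds costs two orders of magnitude more to refresh than to transfer reliably once.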
Figure 8.6 shows the distribution of key lifetimes we observed when the system
was driven by our Position estimation application. In our study of the overall
performance of this protocol, we also studied the performance when used in two
other applications. While we do not discuss those results here, they can be found
in [GLP05].
[Figure: CDF of key lifetimes (sec) for the acoustic localization application.]
Figure 8.6: The distribution of key lifetimes for our position estimation application.
The mean key lifetime is 1506 ± 121 seconds.
8.5 Applying StateSync to Position Estimation
We have already discussed many layers of our Position Estimation system, in-
cluding the Time–of–Flight ranging layer, the Multilateration layer, and the Syn-
chronized Sampling layer. The StateSync collaboration primitive is the final piece
that lets the whole system fit together. In this Section we show how we put all
these pieces together and how we leverage StateSync as a simplifying primitive.
In our system, the range estimates and angle estimates determined in Chap-
ter 3 using the synchronization infrastructure described in Chapter 7 are reliably
published to all nodes within some maximum hopcount. GPS–derived or stati-
cally configured information about survey points are published from the individ-
ual survey points using the same mechanism. Each node therefore has all the raw
data and can fuse it using the multilateration algorithm described in Chapter 4 to
estimate its position relative to the other nodes. This algorithm may determine
that additional ranging information is required in order to tie itself in to the map.
In that case, the multilateration algorithm triggers additional local experiments
to obtain additional improved estimates.
8.5.1 Applying the StateSync Model
The reliability and consistency model of StateSync is used to ensure consistency
in the datasets that are fed to the multilateration algorithm. In the event that a
node is rotated or moved, the ranges and direction estimates relating to that node
are no longer valid. This in itself is not a serious problem, as it will only result
in estimating the node’s location as its last location. However, if further ranging
experiments lead to a mixture of old and new range and orientation estimates,
these inconsistencies are likely to cause the multilateration algorithm to fail.
In our application, this problem is addressed using a per–node “orientation
sequence number” that is incremented each time the node moves or otherwise
invalidates its ranges. The ranging component indicates its current orientation
sequence number when it requests peers to range to it. This enables nodes that
receive acoustic range signals to annotate their published estimates with the
sequence number that was in effect at the acoustic sender at the time that the
experiment occurred. Published estimates are also annotated with the publisher’s
sequence number, indicating that those estimates are relative to their current
position. Whenever a node increments its sequence number, it deletes all ranges
it had previously published, and then publishes its new sequence number.
In spite of StateSync’s relatively loose consistency semantics, this protocol en-
ables the multilateration component to maintain a consistent dataset. To main-
tain consistency, the multilateration component records the current sequence
number published by each node, and all published data annotated with other
sequence numbers is ignored. The only exception is for range notification mes-
sages that arrive with a subsequent sequence number: as an optimization, these
messages are processed and published ahead of the arrival of a sequence num-
ber update from the source node. Because StateSync’s semantics guarantee that
states from an individual publisher arrive in sequence, the table received from
a node can never itself contain inconsistent sequence numbers, and StateSync
reliability guarantees that the table update will occur within a probabilistic time
bound.
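The filtering rule can be sketched in a few lines. The code below is a minimal illustration of the sequence-number discipline described above; the class and method names are our own, not the system's actual interfaces:

```python
class RangeTable:
    """Filters published range estimates by orientation sequence
    number, keeping the multilateration input set consistent."""

    def __init__(self):
        self.seq = {}     # node -> current orientation seq number
        self.ranges = {}  # (src, dst) -> (src_seq, dst_seq, range_m)

    def update_seq(self, node, new_seq):
        """A node moved: advance its seq and drop stale ranges."""
        if new_seq <= self.seq.get(node, -1):
            return
        self.seq[node] = new_seq
        self.ranges = {
            (s, d): (ss, ds, r)
            for (s, d), (ss, ds, r) in self.ranges.items()
            if not ((s == node and ss < new_seq) or
                    (d == node and ds < new_seq))
        }

    def add_range(self, src, dst, src_seq, dst_seq, range_m):
        """Accept ranges annotated at or ahead of the current seq
        numbers; a newer seq is processed optimistically, ahead of
        the source's own seq update arriving."""
        if (src_seq >= self.seq.get(src, -1) and
                dst_seq >= self.seq.get(dst, -1)):
            self.ranges[(src, dst)] = (src_seq, dst_seq, range_m)

t = RangeTable()
t.update_seq("A", 1); t.update_seq("B", 1)
t.add_range("A", "B", 1, 1, 4.93)   # accepted
t.update_seq("A", 2)                # node A moved: old range dropped
t.add_range("A", "B", 2, 1, 5.10)   # re-ranged under A's new seq
```

After node A increments its sequence number, the table retains only the range taken under the new number; the pre-movement range is silently dropped.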
8.5.2 StateSync Simplifies the System Design
This architecture simplifies the system on a number of levels. First, the applica-
tion itself does not need to implement any form of coordination. The Ranging
subsystem responds to local triggers from the Multilateration subsystem, and
publishes the data via StateSync. The Multilateration subsystem receives data
from StateSync and attempts to fuse it together. If the Multilateration subsystem
does not succeed in integrating that node into the system, or if new nodes appear that it
might range to, it triggers a local ranging experiment by commanding the Rang-
ing layer. Randomization is used in place of explicit scheduling of the ranging
experiments, de–synchronizing the ranging process.
Alternative solutions to this problem rapidly become more complex. Even if
explicit coordination is done, there is still the possibility that a collision occurs.
Further, if the explicit coordination requires a “leader”, then there are additional
problems of choosing a leader, and of handling cases where the leader fails or
becomes temporarily or permanently unavailable. There may be many reasons to
implement a leader–based approach: using a leader and explicit coordination the
ranging process can be done more quickly and only the leader need implement the
multilateration algorithm. However, it is notable that with these primitives,
this more complex approach is not necessary. Should a more coordinated scheme
be implemented, the StateSync primitive would still be helpful, similar to the
way Tuple Spaces such as Linda [GB82] are used. We discuss this possibility in
more detail in Chapter 12.
The StateSync reliability layer offloads much of the complexity of the wire-
less network from the application. StateSync handles all of the difficulties of
retransmission and recovery from message loss, while leveraging broadcasts to
efficiently distribute the data to multiple nodes. Nodes that join the network
after most of the other nodes have been running will automatically download the
complete set of published ranges when they join. Nodes that lose connectivity
to one part of the network and later rejoin to another part will see a smooth
transition. Their data will be re–broadcast only if the part of the network where
they rejoin has not already received that data. Data published by nodes that
reboot will automatically be marked as stale and removed.
Because the signal processing and estimation components of our Position esti-
mation system are already fairly complex, the StateSync abstraction is a powerful
tool. Using StateSync, the complexity of the multihop wireless network is reduced
to processing a gradually evolving set of table entries, subject to certain minimal
consistency checks.
8.6 Performance of StateSync for Position Estimation
The Position Estimation application is well suited to the properties of StateSync.
Typical large deployments of this type of system yield between 10 and 20 ranging
pairs per node [MGS04]. If the system performs multiple trials for each range, this
results in approximately 2–5KB of published data per node. The “churn” graph in
Figure 8.6 shows that these ranging records tend to have long lifespans, meaning
that simple soft–state refresh approaches will be costly over time compared with
mechanisms that can reliably cache the data. At the same time, low latency is
desirable because the latency in state update directly affects the length of time
that the system operates with incorrect position information, which in turn could
lead to sensing and actuation based on inaccurate location estimates.
To test the performance of StateSync when used in our localization appli-
cation, we ran several tests with different variants of StateSync underneath our
application. We ran each test for 1 hour on our wireless testbed, and for 1 hour in
the simulator, running a 50 node grid topology. The ranging process was primar-
ily driven by the multilateration algorithm, which would cause 3 range requests
in rapid succession, followed by an exponential backoff. In addition, we simulated
three of the nodes “moving” by forcing them to invalidate their range information
at three particular times. These invalidations result in a burst of ranging activity
and cause the backoffs to be canceled.
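The trigger schedule used in these tests can be sketched as follows. The burst size of 3 matches the description above; the backoff base and cap are illustrative constants, not the system's actual values:

```python
class RangingScheduler:
    """Issues a burst of range requests in rapid succession, then
    backs off exponentially; invalidation (e.g. the node moved)
    cancels the backoff and starts a fresh burst."""

    def __init__(self, burst=3, base_s=10.0, max_s=1280.0):
        self.burst, self.base_s, self.max_s = burst, base_s, max_s
        self.reset()

    def reset(self):
        """Called on startup and whenever ranges are invalidated."""
        self.sent = 0
        self.backoff_s = self.base_s

    def next_delay(self):
        """Delay to wait before issuing the next range request."""
        self.sent += 1
        if self.sent <= self.burst:
            return 0.0                    # rapid succession
        delay = self.backoff_s
        self.backoff_s = min(self.backoff_s * 2.0, self.max_s)
        return delay

sched = RangingScheduler()
delays = [sched.next_delay() for _ in range(6)]  # [0, 0, 0, 10, 20, 40]
sched.reset()   # a "moving" node invalidates its ranges: burst again
```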
Figures 8.7 and 8.8 show the results of testing our localization application
using different variants of StateSync. The graphs show two types of information:
the cumulative bytes transmitted throughout the network as a function of time,
and the distribution of latency in a published state arriving at a subscriber.
From the graphs we see that in terms of bytes transmitted, LogTree performs
[Figure: cumulative network traffic (bytes, log scale) vs. time, and CDF of publication latency (sec), for SoftState, LogFlood, and LogTree.]
Figure 8.7: Results of tests of our Position Estimation application from our 12 node
testbed. The latency graphs show a CDF of latency in seconds. The curve for LogTree
shows some initial traffic in setting up the ClusterSync trees before the start of data
traffic.
[Figure: cumulative network traffic (bytes, log scale) vs. time, and CDF of publication latency (sec), for SoftState, LogFlood, and LogTree.]
Figure 8.8: Results of tests of our Position Estimation application from a 50 node
simulation. The mean latency for LogTree is 31.54 ± 0.58 s; for LogFlood, 14.33 ± 0.12 s.
better than LogFlood, both in terms of the amount of overhead during trans-
fers as well as in the rate of traffic during quiescent periods. During transfers,
LogTree’s pruned distribution tree provides significant savings over flooding, es-
pecially as the size of the network grows to reach the maximum hopcount of the
published flows. During quiescent periods, LogFlood’s periodic re–flood of the
latest sequence number is considerably more costly than the ClusterSync beacon
traffic that refreshes the ClusterSync sequence number (that in turn protects the
LogTree per–flow sequence numbers).
The latency graphs show that LogTree has twice the expected latency of
LogFlood. This can be explained by a number of factors, including higher hop-
counts on average than in the flooding case, and lower redundancy when data
is sent via the distribution tree. The “knee” at 5 seconds in the data from the
small network is caused by message loss combined with the 5 second holdoff on
publishing a new sequence number in LogTree. However, for Position Estimation,
the latency of LogTree is more than acceptable.
8.7 Enabling System Visibility Using LogFlood
In addition to using StateSync to publish range data, we have also used the
LogFlood variant of StateSync in deployment to improve our visibility into the
state of the network. We modified LogFlood to reduce the traffic it generated by
rate–limiting it to one packet per second and by increasing the refresh interval
by a factor of 100. We then implemented a few small applications that would
report information into the LogFlood system, and implemented a visualization
component to display the data in EmView. The result was a lightweight way to
see the current state of the whole fielded network at once. By starting up the
LogFlood component on our laptop, we would quickly get a view of the network as
our laptop retrieved the data cached at the nearest reachable nodes, and received
a follow–on stream of flooded updates.
In our system, we publish EmRun fault information and link quality data for
all neighbors over 70% reception rate, republishing only when the link quality
changes dramatically. In addition, we publish the state of the LogTree overlay
network as it changes, giving us a view into whether the network is successfully
connected or partitioned. This information was invaluable to the process of di-
agnosing problems in the field, from hardware faults and mis–configurations to
network connectivity problems.
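The "republish only when link quality changes dramatically" rule amounts to a simple hysteresis filter. A minimal sketch; the 0.15 change threshold is an assumed value, not the deployed constant:

```python
def should_republish(last_published, current, threshold=0.15):
    """Decide whether to republish a neighbor's link quality.  Only
    links over 70% reception are published at all; after that, a new
    value is published only on a 'dramatic' change."""
    if last_published is None:
        return current >= 0.70
    return abs(current - last_published) >= threshold

# A link drifting from 0.82 to 0.85 stays quiet; a drop to 0.60 is
# republished so the visualizer sees the degradation.
quiet = should_republish(0.82, 0.85)   # False
noisy = should_republish(0.82, 0.60)   # True
```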
CHAPTER 9
Range and DOA Estimation Testing
In order to assess the capabilities of our system, we first performed several exper-
iments to measure the performance of the ranging and DOA estimation system.
These tests were performed in controlled environments, with as much ground–
truth accuracy as we could achieve. We performed two types of experiment:
an angular experiment to test the DOA estimation, and a straight–line distance
experiment to test the range estimation.
These tests demonstrate the functionality of all of the layers below the mul-
tilateration layer. These tests are a partial integration test, requiring the correct
behavior of the synchronized sampling layer, the time synchronization subsystem,
the multihop networking layer with hop–by–hop time conversion, and the esti-
mation algorithms themselves. For example, high precision ranging is effectively
a “ground–truth” test of the complete synchronized sampling layer. Inaccura-
cies in time conversions from the sender to the receiver will show up as range
error: every 28 µs of timing error translates into 1 cm of ranging error. In addi-
tion, forward and reverse ranges between the same pair of nodes apply inverse
linear conversions, so any timing error will affect the forward and reverse ranges
oppositely. See Section 10.3 for an exploration of this kind of analysis.
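The 28 µs figure follows directly from the speed of sound. A quick check, assuming roughly 343 m/s in air at room temperature:

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed, dry air at ~20 C

def range_error_cm(timing_error_s):
    """Ranging error induced by a given time-conversion error."""
    return timing_error_s * SPEED_OF_SOUND_M_S * 100.0

err = range_error_cm(28e-6)   # ~0.96 cm: 28 us of skew costs ~1 cm
```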
[Figure: diagram of the measured 24-foot square test layout, bounded by a cement wall.]
Figure 9.1: Experimental setup for the DOA component test.
9.1 DOA Component Testing
To analyze the performance of the direction–of–arrival (DOA) estimation, we
performed an outdoor experiment in Lot 9, one of the parking structures on
the UCLA campus. The setup for our experiment is shown in Figure 9.1. In
this experiment, we laid out a carefully measured square 24 feet on a side, by
setting out a plastic tape measure and taping it to the ground. We used a laser
measurement tool (Hilti PD30)1 to ensure that the tape was laid down squarely.
We positioned the emitter at one corner of the square. We positioned the receiver
on a tripod in the center so that we could easily rotate the array about the y
axis.
In order to accurately measure the ground truth azimuth angles, we attached
a laser to the side of the microphone array, as shown in the left hand image
1 The Hilti PD30 laser range measurement tool is accurate to 1/16 inch and has a maximum range of 600 feet.
Figure 9.2: Mounting the measurement laser for the azimuth test (left) and the zenith
test (right).
in Figure 9.2. Before taking each measurement, we set a cardboard box at a
measured location along the side of the square. We then rotated the array until
the laser lined up with a mark on the box that was in turn lined up at a particular
distance along the side of the square. We took measurements at 1–foot intervals
along the square, an average angular spacing of 3.75 degrees. At each measurement location, we
recorded 5 trials.
After doing this experiment for azimuth angles, we mounted the array on its
side and remounted the laser, as shown in the right image in Figure 9.2. We
then repeated the test, collecting data for the range of zenith angles for azimuth
90 deg and 270 deg.
[Figure: histogram of azimuth estimation errors (deg) with a fitted normal distribution, µ = −0.14, σ = 0.96.]
Figure 9.3: Overall distribution of errors in the Azimuth test. These results are well
within our target of ±1 deg.
9.1.1 Azimuth Performance
In this section, we present the results of our azimuth experiment. Figure 9.3
shows the overall distribution of errors from our azimuth test, without considering
incoming angle. This shows a roughly normal distribution, with mean −0.14 deg
and σ = 0.96 deg. This result outperforms the results reported in the Cricket
Compass [PMB01], and it is also a good result relative to our target of estimating
orientation ±1 deg.
Figure 9.4 shows the accuracy and precision of the DOA estimator as a func-
tion of incoming angle. The results for each test angle generally show high pre-
cision, but the means show a bias that appears to be dependent on angle. We
hypothesized two possible sources for this error, and tested modifications of the
algorithm to try to compensate for them.
[Figure: deviation in degrees (95% confidence) vs. ground-truth angle, for azimuth measurements at 5.17 m.]
Figure 9.4: Results of the Azimuth test, showing deviation from ground truth. These
results suggest a bias that is dependent on angle.
9.1.1.1 Failure of the Parallel Ray Assumption
First, since these tests were done at a fairly close range, the “parallel ray” as-
sumption underlying our estimation algorithm does not hold completely. The
equations we derived in Section 3.3 assume that the signal hits all microphones
from the same angle, meaning that we can compare the measured lags directly
to the component of that angle along the line between the microphones. How-
ever, at 5m range, the ray reaching the “high” microphone travels an additional
0.19 cm, resulting in an angular error of 1.36 deg. To test this hypothesis, we
implemented a partial solution to this problem. Our partial solution corrected
the lags that are used to compute the error term on the right–hand side of equa-
tion 3.13. Rather than using the projection of the baseline onto the hypothesis
angle, we geometrically computed the lag based on the angle and the range es-
timate, while leaving the coefficients in the Jacobian matrix unchanged. This
appeared to make the zenith angles more consistent, but had a negligible effect
on the overall performance of the azimuth measurement.
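The size of the near-field effect is easy to check numerically. The sketch below compares the exact lag with the parallel-ray approximation for a hypothetical two-microphone baseline (our own illustration, not the actual array geometry):

```python
import math

def exact_lag_m(src, mic_a, mic_b):
    """Exact path-length difference from a point source to two mics."""
    return math.dist(src, mic_b) - math.dist(src, mic_a)

def parallel_ray_lag_m(direction, mic_a, mic_b):
    """Far-field approximation: project the baseline onto the unit
    arrival direction."""
    baseline = (mic_b[0] - mic_a[0], mic_b[1] - mic_a[1],
                mic_b[2] - mic_a[2])
    return sum(u * v for u, v in zip(direction, baseline))

# Hypothetical geometry: one microphone raised 0.3 m above the other,
# source 5 m away in the horizontal plane of the lower microphone.
mic_a, mic_b = (0.0, 0.0, 0.0), (0.0, 0.0, 0.3)
src = (5.0, 0.0, 0.0)
direction = (-1.0, 0.0, 0.0)    # unit vector from source toward array

exact = exact_lag_m(src, mic_a, mic_b)                # ~0.9 cm extra path
approx = parallel_ray_lag_m(direction, mic_a, mic_b)  # 0: baseline orthogonal
near_field_error_m = exact - approx
```

At 5 m the raised microphone sees nearly a centimeter of extra path that the parallel-ray model assigns zero lag, which is the flavor of bias discussed above.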
9.1.1.2 Weighting to Offset Instability
Our second hypothesis was drawn from the fact that certain angles are more
error–prone than others. Cases where the incoming ray is nearly orthogonal to a
baseline in the array geometry will tend to yield constraints that are very sensitive
to slight perturbations in the input. To address this, we implemented a simple
weighting scheme that reduced the weights to compensate for this sensitivity.
Our scheme weights each row of the Jacobian matrix, as

    S_i = \sum_j J_{i,j}                                (9.1)

    J'_{i,j} = J_{i,j} \sqrt{ \max_i S_i / S_i }.       (9.2)
However, we did not find that this additional weighting significantly affected
the error.
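In code, the weighting is a per-row rescaling of the Jacobian. The following is a direct transcription of equations (9.1) and (9.2), applied to a made-up matrix; positive row sums are assumed:

```python
import math

def reweight_jacobian(J):
    """Scale each row i of J by sqrt(max_i S_i / S_i), where S_i is
    the row sum (equations 9.1 and 9.2).  Assumes positive row sums."""
    S = [sum(row) for row in J]
    s_max = max(S)
    return [[v * math.sqrt(s_max / s) for v in row]
            for row, s in zip(J, S)]

J = [[1.0, 2.0],    # row sum 3 -> scale sqrt(3/3) = 1
     [0.5, 0.5]]    # row sum 1 -> scale sqrt(3/1)
Jp = reweight_jacobian(J)
```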
9.1.1.3 Perturbations in Mounting and Placement
Our remaining hypothesis is that small variations between the actual and as-
sumed array geometry are causing the angle–dependent errors. An in–lab array
calibration procedure could be devised to calibrate the array to correct for these
specific deviations, similar to [RD04]. More investigation is required to determine
whether this is a significant source of error, and if so determine a solution. We
leave these improvements for future work.
9.1.2 Zenith Performance
We next present the results of our zenith angle experiment. Our setup for this
experiment is just as we described in Section 9.1, with the array mounted on its
side so that the zenith angle changes as the array rotates.
The zenith angle measurement varies from −90 deg to +90 deg, but it can be
measured at a variety of different azimuth angles. In our experiment, we mounted
the array with 0 deg azimuth pointed up, which meant that as we rotated the
array, one half of the experiment would be taken with azimuth 90 deg, and one
half with azimuth 270 deg.
The results for these two portions of the test are shown in Figure 9.5. The
left half of each graph represents results from cases where the signal is coming
toward the array from below the plane of the base, while the right half represents
cases where the signal approached from above the plane of the base.
9.1.2.1 Performance of Negative Zenith Angles
In general, we saw poor performance for negative zenith angles, where the angle
was less than −30 deg. In these cases, line of sight is often blocked by the Lucite
base of the array, yielding large errors and outliers.
The poor performance of the array for signals coming from below is not a
serious concern, because in practice this system would be deployed in a configu-
ration that minimizes the occurrence of signals coming from below the plane of
the array. Typically the arrays will be deployed on a terrain, with the object of
detecting signals on or above the terrain. For our envisioned applications, it is
[Figure: deviation in degrees (95% confidence) vs. zenith angle at 5.17 m, one panel each for the 90 deg and 270 deg azimuth sides.]
Figure 9.5: Results of the Zenith test, showing deviation from ground truth. Some
asymmetry is evident when comparing the two sides. Negative angles approach from
beneath the array and are heavily obstructed.
sufficient to achieve accurate detection of signals at most 30 deg below the plane
of the array. Algorithms that depend on angular data can also take that error
distribution into account.
9.1.2.2 Performance of Positive Zenith Angles
The right hand side of the graphs shows the performance as the signal arrival di-
rection passes over the top of the array from 0–90 deg. We observe generally good
performance up to about 45 deg, with slightly worse performance as we approach
90 deg. We also observe poorer performance as we approach from the 90 deg az-
imuth side relative to approaching from the 270 deg side. This asymmetry may
be a consequence of asymmetries in the array geometry and slight deviations in
the placement of the microphones.
The geometry of our arrays is partly a historical consequence: once manufac-
tured, changing the geometry is difficult. It would be interesting to experiment
with other geometries, such as a tetrahedron. From a software perspective, there
is very little cost to testing alternate geometries. We leave experiments with
alternative geometries to future work.
Another hypothesis is that the performance from the 90 deg side was hurt
because the taller microphone blocked LOS to the other lower microphone. Per-
forming a test with the array rotated to approach from azimuth 45 deg would
eliminate that shadow and might yield improved performance. An exhaustive
test of incoming angles for several arrays might yield more clues to the source of
these errors, as well as methods of compensation.
[Figure: histograms of zenith estimation errors (deg). Midrange angles (−30 deg < φ < +45 deg) fit a normal distribution with µ = 0.26, σ = 0.86; overhead angles (+45 deg < φ < +90 deg) fit µ = 0.31, σ = 2.28.]
Figure 9.6: Overall error distribution from the Zenith test. We observe that the error
distribution for “midrange” angles is comparable to that of the azimuth estimates,
although the error distribution for overhead angles performs more poorly.
9.1.2.3 Modeling the Error Distribution
As we saw in Chapter 4, our position estimation algorithms require a model of
the errors in the input variables. For most techniques, this model needs to be a
Gaussian model. Figure 9.6 shows our attempt to fit a Gaussian model to the
zenith error we observed in our test.
Noting that the error is angle–dependent, we divided the data into two sets,
one covering the “midrange” angles −30 deg < φ < +45 deg, and the other cov-
ering the “overhead” angles +45 deg < φ < +90 deg. We drop the angles under
−30 deg because they are usually very unreliable. Figure 9.6 shows two separate
histograms, one for each of these sets. Note that the bars in the histogram are
scaled to represent a fraction of the overall data set, so that they may be directly
Figure 9.7: The experimental setup for our range test in Lot 9, showing tests at 5m
(left) and 50m (right). The 50m test required multihop synchronization.
compared.
This technique could be applied further (e.g. creating a model that was
fully parameterized on both zenith and azimuth angles), but this data set is not
extensive enough to support a more complex model. We only have zenith data
for two azimuth angles, and all of our data comes from a single array. We do
not know the extent to which a detailed analysis of this data would be specific
to each array.
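The two-bin model above can be sketched directly: split the samples by zenith angle, drop the unreliable low angles, and fit a sample mean and standard deviation to each bin. The sample data here is made up for illustration, not the experimental data:

```python
import math

def fit_gaussian(errors):
    """Sample mean and standard deviation for a list of errors (deg)."""
    n = len(errors)
    mu = sum(errors) / n
    var = sum((e - mu) ** 2 for e in errors) / (n - 1)
    return mu, math.sqrt(var)

def split_by_zenith(samples):
    """Partition (zenith_deg, error_deg) samples into the midrange
    (-30..+45 deg) and overhead (+45..+90 deg) sets, dropping the
    unreliable angles below -30 deg."""
    mid = [e for z, e in samples if -30.0 <= z < 45.0]
    over = [e for z, e in samples if 45.0 <= z <= 90.0]
    return mid, over

# Illustrative samples only.
samples = [(-10, 0.3), (0, -0.1), (30, 0.5),
           (60, 2.0), (80, -1.5), (-60, 9.0)]
mid, over = split_by_zenith(samples)
mid_model = fit_gaussian(mid)    # (mean, sigma) for midrange angles
over_model = fit_gaussian(over)  # (mean, sigma) for overhead angles
```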
9.2 Range Component Testing
In order to characterize the accuracy and precision of our system’s ranging es-
timates, we designed an experiment to quantify the performance of the ranging
system. Figure 9.7 shows the setup for the ranging experiment. In order to very
accurately measure the range between the two arrays, we mounted the arrays
[Figure: range measurements from Lot 9, 0–90 m: measured distance vs. ground truth, and mean error in cm (95% confidence) per experiment ordered by distance.]
Figure 9.8: Results of the Ranging test, 0-90m. In (a) the impulses show the mean
deviation from ground truth (right y scale), as a function of distance. In (b) experiments
are shown ordered by distance, with the mean deviation plotted relative to the right y
scale. The distance for each experiment is represented by the dotted line, referenced
to the left y scale.
[Figure: scatter plot of range error (cm) vs. SNR, and SNR vs. distance with the fitted curve 21 − 11 log10 r.]
Figure 9.9: Plots showing the relationship between distance, SNR, and error. The
upper graph shows a scatter plot of range error vs. SNR. The lower graph shows the
relationship between SNR and distance, with experiments ordered by distance. The
dashed line shows a function of distance that fits well to SNR. The dotted line shows
the distance corresponding to each experiment.
on blocks that we could position on the ground with high precision. We used a
laser ranging tool and a tape measure to determine the ground truth distances,
using the laser to position the tape and then using the tape to get consistent
fine–grained measurements.
9.2.0.4 Range Error, Distance, and SNR
Figure 9.8 shows the results of the complete experiment. The upper graph plots
the measurements linearly against ground truth, with impulses showing the mean
deviation from ground truth according to the right–hand y axis. The lower graph
plots the mean error and 95% confidence intervals for all experiments, ordered by
distance; distance is shown as the dotted line. We observe from these graphs in
Figure 9.8 that there is no significant correlation between magnitude of error and
distance in an unobstructed environment. The errors we observe are independent
of distance because they result from inaccuracy in the detection process rather
than from error that accumulates over distance.
The plots in Figure 9.9 describe the relationship between measurement error,
SNR, and distance, for the test in Lot 9. The scatter plot in the upper graph
shows some correlation between measurement error and SNR. However, this data
is skewed because it excludes failed detections that did not generate a meaningful
error value. The lower plot relates SNR to distance, and shows that as the
distance grows the SNR drops, roughly proportionally to −11 log10 r. In this case,
failed detections are counted as SNR 0. We suspect that this slow falloff, well below the inverse–square decay expected in free space, is due to a combination of directional emitters and a wave–guide effect induced by the floor and ceiling of the parking structure. Unlike radio propagation, where
reflections cause a 180 deg phase shift and result in cancellation, reflections in
acoustic propagation generally add to the strength of the signal. These favorable
Experiment       Precision Scale   Range Scale
1.00–1.04m       1cm               1m
5.00–5.04m       1cm               5m
10.00–10.04m     1cm               10m
50.00–50.04m     1cm               50m
84.00–84.04m     1cm               84m
5.00–5.50m       10cm              5m
50.00–50.50m     10cm              50m
1–10m            1m                1–10m
10–50m           5m                10–50m
50–90m           10m               50–90m
Table 9.1: Range experiments, grouped by target scale and precision.
propagation characteristics enable very long ranges to be achieved with lower
energy expenditures.
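The log–distance falloff described above can be recovered from measured data with an ordinary least-squares fit of SNR = a + b log10 r. The sketch below is illustrative only; the function name and the synthetic data are not from the dissertation, and the fit here simply reproduces the coefficients it was fed.

```python
import math

def fit_log_distance(ranges_m, snrs_db):
    """Least-squares fit of SNR = a + b*log10(r); returns (a, b).

    For the Lot 9 data described in the text, such a fit would yield
    roughly a = 21 and b = -11.
    """
    xs = [math.log10(r) for r in ranges_m]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(snrs_db) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, snrs_db))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Synthetic check: data generated exactly from 21 - 11*log10(r).
rs = [1, 5, 10, 50, 84]
snrs = [21 - 11 * math.log10(r) for r in rs]
a, b = fit_log_distance(rs, snrs)
```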
9.2.0.5 Multi–Scale Measurement
In order to get a sense for the overall performance of the ranging system, we
performed an experiment that tested the performance at many different scales.
The clumps of points in Figure 9.8(a) fall into clusters at several scales. Table 9.1
shows the list of distances that were tested. Each set of tests is designed to assess
the precision and accuracy of the system at a given distance scale.
The results of the tests are shown in Figures 9.10 and 9.11. These plots zoom
in on different segments of a distance–distance plot. For example, we can see that
at 5m the system achieves sufficient precision to correctly order measurements
one cm apart, although the measurements’ accuracy is off by nearly 1cm. The
same sequence of measurements at 50m was much less precise and much less
accurate, although other tests past 50m yielded better results. We believe that
some system connectivity problems were the source of the error observed in the
[Figure 9.10, two panels: "Range Measurements from Lot 9, 1-10m" (upper) and "Range Measurements from Lot 9, 5m" (lower); measured range in meters (95% confidence) vs. ground truth in meters.]
Figure 9.10: Results of the Ranging test, zooming in on 10 and 5 meters. These tests
show good accuracy and precision, despite being taken over a long time interval and
assuming a single temperature over the entire experiment.
[Figure 9.11, two panels: "Range Measurements from Lot 9, 50-55m" (upper) and "Range Measurements from Lot 9, 50m" (lower); measured range in meters (95% confidence) vs. ground truth in meters.]
Figure 9.11: Results of the Ranging test, zooming in on tests from 50–55 meters.
Anomalous behavior is observed at 50 meters, perhaps the result of a transient syn-
chronization problem. A bug that could have caused this has since been fixed.
[Figure 9.12: "Distribution of Range Estimation Errors, Lot 9 Range Test"; fraction of values per bin vs. error in cm, with normal fits (µ = −2.38, σ = 3.81) and (µ = −1.73, σ = 1.76).]
Figure 9.12: Overall error distribution from the Lot 9 Range Test. The standard
deviation of the range error for all tests is 3.81 cm. If we drop the 17 values with error
larger than 10 cm, the standard deviation of the remaining distribution is 1.76 cm. By
applying the narrower model in our multilateration algorithm, we can drop the data in
the tails as outliers.
50 meter tests, and that they are anomalous.
9.2.0.6 Modeling Range Error
In Chapter 4 we saw the importance of developing an error model. Figure 9.12
shows the overall distribution of errors from the data we collected in the Lot
9 range test. From this graph, we can see that the data does not fit well to a
Gaussian model.
The reason for this is twofold. First, in the absence of environmental changes,
we would expect the data to be skewed to longer ranges. As we saw in Chapter 3,
the ranging signals tend to be detected as a sequence of pulses. If a lower–energy
first arrival is missed, the result will be a small positive error when one of the
following pulses is detected. As the SNR drops, the likelihood that second arrivals
are detected increases.
Second, this data was collected over a long time span, in which environmental
factors affecting range may have changed. We suspect that this is the cause of
the heavy tail of short ranges in this dataset. For the purposes of applying this
data to position estimation, we collect the data over a short time span in order
to avoid this problem.
Note that this model does not account for the effects of obstructions and
long reflections. Since our Lot 9 test did not have any obstructed conditions, we
did not observe any long ranges. However, as we saw in Chapter 4, reflection
problems are better removed by outlier rejection techniques, rather than trying
to incorporate them into a Gaussian model. The Gaussian model applies only to measurement and detection error, not to errors from reflections; however, it can be used to identify possible outliers when a system of measurements is analyzed.
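The modeling step just described can be sketched as follows. The data and function names here are hypothetical; only the 10 cm cutoff follows the discussion of Figure 9.12.

```python
import math

def fit_gaussian(vals):
    """Sample mean and (population) standard deviation."""
    n = len(vals)
    mu = sum(vals) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / n)
    return mu, sigma

def trimmed_fit(errors_cm, cutoff_cm=10.0):
    """Fit a Gaussian to all errors, then refit after dropping values
    whose magnitude exceeds cutoff_cm, treating them as outliers."""
    full = fit_gaussian(errors_cm)
    kept = [e for e in errors_cm if abs(e) <= cutoff_cm]
    return full, fit_gaussian(kept)

# Hypothetical error sample: a tight core plus one long reflection.
errors = [-1.0, -0.5, 0.0, 0.5, 1.0, 15.0]
(full_mu, full_sigma), (trim_mu, trim_sigma) = trimmed_fit(errors)
```

As in Figure 9.12, the narrower model fitted after trimming is the one applied in the multilateration algorithm.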
9.2.0.7 Sources of Error
In the course of running range experiments, we developed several hypotheses
about sources of error in this system. We hypothesize three primary sources of
error: time synchronization error, excessive noise, and weather factors.
Time synchronization error occurs when radio link quality changes and old
time conversions are kept without being updated. This occurred due to a bug in
the system which has since been corrected; time conversions now time out after
a certain period. However, we suspect that this bug may have tainted this data.
We plan to perform a followup experiment to address this.
Our estimation algorithms reject noise very effectively, but a sufficient level of background noise will still render the signal undetectable. This problem is mainly an issue in urban environments; in natural environments, noise sources that pose a significant problem for our system are rare.
Weather factors are the primary source of error and the most difficult to
address. As we discussed in Section 3.4, variations in the temperature change the speed of sound by roughly 0.2% per degree C. Wind and relative humidity also affect the accuracy of an
acoustic ranging system. Worse, it is unclear how to compensate for this. First
of all, it is difficult to measure air temperature accurately enough to compensate
for temperature with the accuracy required to achieve centimeter precision over
80 meters—most temperature sensors are only accurate to 0.5 degC. Second, it is
not the temperature at a single point, but the average temperature all along the
path between the nodes that determines the correction.
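The temperature sensitivity can be made concrete using the standard linear approximation for the speed of sound in dry air, c ≈ 331.4 + 0.6 T m/s with T in deg C. The sketch below is illustrative, not the system's actual calibration code; it shows why a 0.5 degC sensor error matters over an 80 m path.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air, m/s (standard linear fit)."""
    return 331.4 + 0.6 * temp_c

def tof_to_range(tof_s, temp_c):
    """Convert a measured time of flight to a range estimate, using the
    assumed path-average temperature."""
    return tof_s * speed_of_sound(temp_c)

# An 80 m path at a true temperature of 20 degC...
tof = 80.0 / speed_of_sound(20.0)
# ...converted using a temperature reading that is off by 0.5 degC.
range_err_m = abs(tof_to_range(tof, 20.5) - 80.0)
```

The resulting error is on the order of 7 cm, already above centimeter precision, and this assumes the sensor reading is representative of the whole path.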
Given these facts, our approach is to try to apply our system in the region
where the environmental factors are relatively uniform, and therefore their im-
pact can be well approximated by average values. For example, by performing
calibration at night, we avoid uneven heating from solar radiation and many is-
sues associated with that, such as updrafts. Our other technique to combat this
problem is to take all the measurements as temporally close together as possible.
For example, rather than combining range data collected over a long period of
time, we attempt to perform the ranges in a short span, during which we can
assume minimal changes to the environmental parameters.
Unfortunately, this does not help us much with our ranging distance test,
because the measurements are taken over the course of a few hours, as we move
the two nodes farther and farther apart. This time lag, and the fact that Lot 9
is only partially enclosed, may explain why we saw what may be time–varying
error in our Lot 9 data (we performed the measurements in order from near
to far, so in sequence they increase monotonically both in time and distance).
For this dataset, we collected point measurements of temperature and humidity
taken at different times and places, which varied considerably. Rather than use
all those measurements, we corrected this dataset based on a single consistent
temperature, selected near the end of the run.
While this testing strategy gives reasonable results, it is possible that much
of the error we see is due to temporal variations in conditions. To get a more
consistent assessment of the ranging system independent of environment, we plan
to perform a follow–up experiment in an underground garage which should have
a more stable climate.
CHAPTER 10
System Testing
In this chapter we describe the results of several complete system tests. In each
case, we deployed 10 nodes into an environment, measured the ground truth, and
ran the system repeatedly. Each run performed 5 trials at each of the nodes,
attempting to receive the signals at all of the other nodes. All of the raw data
was captured for offline analysis.
We performed system tests in two different environments: first in the UCLA
Court of Sciences and then in a forested area of the UCR James Reserve in
Idyllwild, CA. These tests are described in the following sections.
10.1 Urban Outdoor Test: Court of Sciences
Our first test was performed on September 22, 2005 in the UCLA Court of Sciences. The Court of Sciences, shown in Figure 10.1, is a large paved courtyard
surrounded on three sides by buildings. In addition to the paved areas, the court-
yard has intermittent grassy areas, and several tall hedges and planters. In the
figure, the hedges are indicated with yellow lines. The positions of our nodes in the deployment are indicated with yellow dots, along with the ID of each node.
Table 10.1 describes the sequence of experiments and the measured weather
conditions. Before experiment 9, we calibrated the emitters using a sound level
Figure 10.1: The experimental setup for our system test in the UCLA Court of Sciences.
Node locations are indicated by numbered dots, while yellow bars indicate the location
of hedges. North is toward the top of the photo. Image courtesy of Google Earth.
Figure 10.2: Output of the NLLS Position Estimation Algorithm, for the 1:45 AM dataset. The green crosses denote ground truth; the red arrows show the position and orientation of each node.
Figure 10.3: Output of the R–θ Position Estimation Algorithm, for the 1:59 AM dataset. This dataset was the best result for R–θ.
Expt.  Time   deg C  Humidity  Nodes  Note
1      21:46  19.0   73        10
2      22:03  18.6   76        10
3      22:20  19.0   78        10
4      22:34  18.4   79        10
5      23:01  18.2   80        10
6      23:21  17.2   79        10
7      23:33  17.1   80        10
8      23:51  16.9   80        10
9      00:41  16.9   79        10     Cal. Emitters, Reboot
10     00:56  16.9   80        10
11     01:29  16.6   81        10
12     01:45  16.6   81        10
13     01:59  16.6   81        10
14     02:12  16.6   81        8      2 failed
Table 10.1: Experiment timing and weather conditions.
meter, so that each emitted 100 dB at 1 meter from the emitter. During the calibration
process, one of the nodes was accidentally rebooted. At experiment 14, two nodes
malfunctioned and failed to participate in the system.
10.1.1 Measurement of Ground Truth
The nodes were laid out in a grid in order to simplify the measurement of ground
truth. Each array was mounted on a tripod 1.5 meters above the ground. Each
array was oriented with 0 deg pointed west.
We measured ground truth positions using the Hilti PD–30 laser rangefinder.
By laying the nodes out in a grid, we were able to sight along grid lines and
position the nodes in lines, with measured distances between each node. We then
also measured distances across from one line to another where that was possible.
In some cases, such as between nodes 109 and 100, line of sight was blocked and
there was no way to measure an east–west range.
The accuracy of the position measurement was limited by the stability of
the tripods, by our measurement capability, and by time limitations. While the
tripods are fairly stable, they might easily sway on the order of a centimeter.
Our ranging measurements over these fairly long distances were not always made
level, and the target of the laser was not always positioned perfectly. In addition,
the process suffers from additive error because all of our measurements were made
to the next node in the line rather than to an absolute reference position. In the
end, our cross–checking ranges (e.g. diagonal ranges) were generally accurate to
within about 4cm.
We aligned the arrays using a laser held to the side of the array, pointing
toward the next node in the grid. While this did not provide an absolute orienta-
tion reference, it was probably the most accurate solution, given the difficulty of
measuring angles on the small scale of the arrays and the problems using compass
readings in the presence of the speaker magnets.
We observed differences in elevation across the courtyard, but we had no
way to accurately measure them. However, using Google Earth we were able to
capture depth information for our experimental positions. We don’t know the
precision of Google Earth’s depth map, but it reports in units of feet.
In the next several sections we will explore different aspects of the performance
of our system on this data.
10.1.2 Selecting the Residual Cutoff
The first question we would like to answer is what cutoff value to use for our studentized residual outlier rejection scheme. Given that we have ground truth
information for this data, we can do an experiment where we keep rejecting the
[Figure 10.4: "Studentized Residual Threshold Achieving Minimum Error"; effective residual threshold (left axis) and minimum 3D position error in cm (right axis) vs. experiments, ordered by decreasing threshold.]
Figure 10.4: Results of running our 14 courtyard experiments using a residual threshold
of 2. We see that half of our experiments do equally well with a threshold of 3.
worst residual, and determine the residual that brought the error to a minimum
value. By doing this on a series of datasets from this experiment, we can thus
empirically determine a good cutoff.
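The rejection scheme itself can be sketched as a simple loop: repeatedly find the measurement with the largest studentized residual and drop it if it exceeds the threshold. The toy below substitutes a single scalar quantity for the full multilateration system (the real residuals come from the NLLS fit), and its names and data are illustrative.

```python
import statistics

def studentized(data, i):
    """Externally studentized residual of data[i]: its deviation from the
    mean of the other points, scaled by their standard deviation."""
    others = data[:i] + data[i + 1:]
    mu = statistics.fmean(others)
    sd = statistics.stdev(others)
    return abs(data[i] - mu) / sd if sd > 0 else 0.0

def reject_outliers(measurements, threshold=3.0):
    """Iteratively drop the worst measurement until all studentized
    residuals fall at or below the threshold."""
    data = list(measurements)
    while len(data) > 3:
        worst = max(range(len(data)), key=lambda i: studentized(data, i))
        if studentized(data, worst) <= threshold:
            break
        del data[worst]
    return data

# One wild value mixed into a set of consistent measurements (meters).
kept = reject_outliers([0.0, 0.1, -0.1, 0.05, -0.05, 0.08, -0.08, 0.02, 50.0])
```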
In this experiment, we ran through 14 datasets collected from our Court of
Sciences test. These datasets vary from each other in several ways. Since they
were taken over the course of several hours from 9:45 PM through 2:15 AM, the
air temperature and humidity changed over that time period, resulting in changes
in the scaling of the map. In addition, near the end of the experiment several
nodes began to malfunction and failed to fully participate in the system, leading
to a reduction in the amount of data available to the algorithm.
Figure 10.4 shows the result of running our courtyard data with a residual
[Figure 10.5: "Error Achieved Given Different Residual Thresholds"; fraction with higher error vs. minimum 3D position error in cm, for residual thresholds T=2, T=3, and T=4.]
Figure 10.5: CDF of the results of applying several different residual thresholds to our
14 courtyard experiments.
threshold of 2 (two experiments failed to converge well enough to drop any points, so they were left out of this graph). When such a low threshold is used, the algorithm will continue dropping constraints, even after all of the remaining constraints are “good” constraints that reduce the overall system error. This graph shows two curves: the minimum position error achieved by dropping constraints, and the residual threshold required to achieve that minimum. To compute this graph, we used ground truth to record the minimum error achieved each time a new low was reached, along with the lowest–valued residual dropped up to that point. Thus, if we were to re–run one of these experiments with the “effective” threshold set as our threshold, we would achieve that minimum error, although we might then continue to drop additional constraints past that point.
We interpret this graph by observing that half of our experiments have an effective threshold of 3 or greater.
when the true minimum was achieved with a lower threshold, a threshold of 3
got “most of the way” to the minimum, while dropping many fewer constraints.
Figure 10.5 shows the distributions of minimum error for rejection thresholds of
2, 3, and 4.
Since this result only represents a single dataset in a single environment, we
do not claim that our selection of a residual threshold value is universal. Rather,
we anticipate that the residual threshold might be tuned in the field. Fortunately,
the choice of residual cutoff value is not a difficult parameter to tune. There is
little risk in choosing a sub–optimal value. If we choose too high, there might be
too much bad data, resulting in poor convergence. If we choose too low, good
data may be dropped and the system may drop so much data that convergence
fails, and some nodes are not placed. An incorrect threshold choice is readily
detectable by observing the fit quality metric reported by the average range
residual value, by observing the amount of data dropped, and by observing the
high–level performance of the system. For the remainder of this analysis, we use
a threshold of 3; we anticipate that the threshold of 3 will work well for a wide
range of environments and will rarely need to be adjusted.
10.1.3 Comparison of R–θ and NLLS
In Chapter 4 we described two different position estimation algorithms, R–θ and
NLLS. To evaluate these algorithms, we run each of them on our 14 courtyard
datasets. The results of these runs are shown in Figure 10.6.
The graph shows two curves each for the R–θ and NLLS algorithms. For each
algorithm, the first curve shows position error projected onto the (x, y) plane after
fitting to ground truth, while the second curve shows the 3–D position error. We
[Figure 10.6: "Comparison of R–θ and NLLS Algorithms: Position Error"; position error in cm (log scale) vs. experiment, with XY and XYZ error curves for each algorithm.]
Figure 10.6: Position error achieved by the R–θ and NLLS algorithms on our 14 court-
yard experiments, using a residual threshold of 3. The NLLS algorithm consistently
outperforms the R–θ algorithm because it is able to make better use of the more ac-
curate range data. Our 2D results improve upon those in [KMS05] by a factor of
20.
show both of these results for two reasons. First, our system is much better constrained in the X–Y plane, because the nodes' placement deviates minimally from that plane, and because the zenith angle estimation is generally less accurate than the other measurements. Second, our ground truth measurements are much
more accurate in the (x, y) plane than in the Z direction. For depth measurements
we are relying on data from Google Earth which is at best accurate to 30 cm.
As we can see, the NLLS algorithm performs quite well in the X–Y plane and
significantly outperforms R–θ in both 2–D and 3–D estimation.
For a point of reference, we can compare against the results reported by Kwon et al. [KMS05], for a system of 45 nodes deployed on a grid in a
[Figure 10.7, two panels: (a) "R–θ and NLLS Algorithms: Avg. Range Residual"; (b) "NLLS Position Error and Avg. Range Residual"; cm (log scale) vs. experiment.]
Figure 10.7: (a) Average Range Residual achieved by the R–θ and NLLS algorithms
for our courtyard experiments, using a residual threshold of 3. (b) Average Range
Residual and Position Error for NLLS.
60 meter by 55 meter grassy area, with a minimum inter–node spacing of 9 me-
ters. Their system reported an average 2–D position error of 2.47 meters (or,
1.5 meters after dropping the largest 5 errors). In comparison, our system reliably
achieved average 2–D position errors of about 10 cm in a similar–sized deploy-
ment, but with only 10 nodes and a correspondingly larger minimum inter–node
spacing of 20 meters, and with a terrain interrupted by thick hedges. The smaller
number of nodes in our case does not benefit us, since we are not comparing the
computational complexity of the multilateration algorithms, and having fewer
nodes means a less–constrained system.
We observe an anomaly in the graph for experiment 9, where the NLLS algo-
rithm failed to converge. We suspect this was the result of a software bug that
caused a synchronization problem after the accidental reboot of one of the nodes.
This bug has since been fixed.
Figure 10.7(a) compares R–θ and NLLS using our other metric, Average
Range Residual. In this metric, we compute the average range residual, inde-
pendent of ground truth, by subtracting each measured range from the distance
between the computed position estimates in the map. We hoped that this quality–
of–fit metric might be useful as an indicator of position error. Figure 10.7(b) plots
position error and range consistency for NLLS on the same graph. While they
appear to be correlated, it is still not clear whether we can conclude much from
a good fit, although a bad fit is probably indicative of high position error. We
will take a more detailed look at this question in Section 10.2.
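Computed directly, the metric is the mean absolute difference between each measured range and the inter–node distance implied by the estimated map. A minimal sketch, using hypothetical positions and ranges:

```python
import math

def avg_range_residual(positions, ranges):
    """positions: node id -> (x, y) estimate; ranges: (a, b) -> measured
    range. Returns the mean |measured - implied| over all measured pairs."""
    resids = []
    for (a, b), measured in ranges.items():
        (ax, ay), (bx, by) = positions[a], positions[b]
        implied = math.hypot(ax - bx, ay - by)
        resids.append(abs(measured - implied))
    return sum(resids) / len(resids)

# A 3-4-5 triangle with slightly inconsistent measured ranges.
pos = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 4.0)}
rng = {(1, 2): 5.1, (1, 3): 4.0, (2, 3): 2.9}
residual = avg_range_residual(pos, rng)
```

Because it needs no ground truth, this quantity can be computed in the field on every run.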
[Figure 10.8: "Correlation of Map Scaling With Temperature"; scaling factor (left axis) and temperature in deg C (right axis) vs. experiment.]
Figure 10.8: Scaling factors relative to ground truth, and air temperature. We see a
correlation between map scaling relative to ground truth, and air temperature.
10.1.4 Map Scaling With Temperature
As we have previously noted, environmental parameters such as temperature
cause scaling in our estimated position map. Figure 10.8 shows a correlation
between air temperature and the scale of our position estimates.
Recall that our position error metric fits the position estimates to the ground
truth, and that fit includes a scale factor. That scale factor is shown on the left–
hand Y axis of Figure 10.8. The right–hand Y axis shows the air temperature in
deg C recorded during our courtyard experiment at a single point in the middle
of the field.
While the two sets of data show a correlation, they do not match the model
for the variation of the speed of sound as a function of temperature.² We do not
have a satisfying explanation for this discrepancy, but our current conjecture is
that there is an additional scaling factor in the system for which we are not
accounting. Future work on this problem might allow this system to produce
highly accurate air temperature estimates.
10.1.5 Repeatability
In this section we analyze the repeatability of our position algorithms across ex-
periments. One of the principal difficulties in analyzing this system is accurately
capturing ground truth. A consistent bias in the output of position estimation
might suggest either a flaw in the ground truth, or a persistent bias in the system
itself.
To investigate this further, we computed statistics for each node’s position
estimates over our courtyard experiments. Note that we dropped experiment 9 as
an outlier—this is fair because we detected a convergence failure and a very poor
fit metric, indicating that the experiment failed. The results of the remaining 13
experiments were summarized for each node by the mean and standard deviation
of each component: X, Y, Z, roll, pitch, and yaw.
The results of these analyses are shown in Figures 10.9 and 10.10. Recall
that to compute our position error metric, we first normalize the map to extract
its shape, by filtering out scale, translation, and rotation relative to the ground
truth landmarks that we use as a template. Once all maps have been fitted to the
ground truth, we can plot the distribution of estimates for a given node over all
experiments, relative to the ground truth values. In the upper plot in Figure 10.9,
we have plotted each node on an X–Y plot relative to the ground truth value.
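The normalization step can be sketched as a closed–form 2–D similarity fit, in the style of a Procrustes alignment (the dissertation's actual fitting code may differ, and handles 3-D and rotation of orientations as well). Given paired estimated and ground truth points, it recovers the scale, rotation, and translation that best map one set onto the other:

```python
import math

def similarity_fit(est, truth):
    """Least-squares 2-D similarity transform (scale s, rotation theta,
    translation t) minimizing sum ||s*R*p + t - q||^2 over paired points."""
    n = len(est)
    pcx = sum(p[0] for p in est) / n
    pcy = sum(p[1] for p in est) / n
    qcx = sum(q[0] for q in truth) / n
    qcy = sum(q[1] for q in truth) / n
    a = b = pp = 0.0
    for (px, py), (qx, qy) in zip(est, truth):
        px, py = px - pcx, py - pcy
        qx, qy = qx - qcx, qy - qcy
        a += px * qx + py * qy   # correlation term (cosine component)
        b += px * qy - py * qx   # cross term (sine component)
        pp += px * px + py * py
    theta = math.atan2(b, a)
    scale = math.hypot(a, b) / pp
    c, s = math.cos(theta), math.sin(theta)
    tx = qcx - scale * (c * pcx - s * pcy)
    ty = qcy - scale * (s * pcx + c * pcy)
    return scale, theta, (tx, ty)

# A map estimated at half scale should fit with a scale factor of 2.
truth = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
est = [(x / 2, y / 2) for x, y in truth]
scale, theta, (tx, ty) = similarity_fit(est, truth)
```

The recovered scale factor is the quantity plotted against temperature in Figure 10.8.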
² We also captured relative humidity data, but including this did not help the fit significantly.
[Figure 10.9, two panels: "NLLS Repeatability: X/Y" (upper, per–node scatter in cm) and "NLLS Repeatability: Z" (lower, per–node errorbars in cm).]
Figure 10.9: Repeatability statistics for position estimates, showing the per–node distri-
bution of deviations from ground truth. All errorbar ranges are ± Standard Deviation.
The mean standard deviations for X, Y, and Z estimates over all nodes are 3.18 cm, 3.85 cm, and 49.15 cm, respectively.
[Figure 10.10, two panels: "NLLS Repeatability: Yaw" (upper, per–node errorbars in deg) and "NLLS Repeatability: Roll/Pitch" (lower, roll vs. pitch in deg).]
Figure 10.10: Repeatability statistics for Yaw, Pitch, Roll, computed using the same
method as in Figure 10.9. All errorbar ranges are ± Standard Deviation. The mean
deviation for yaw estimates over all nodes is 1.37 deg.
Thus, for example, we can see that node 100 was consistently estimated to be
north of the landmark by 15 cm, and east by 10 cm. If the estimated maps
matched the ground truth exactly, they would all be plotted in the center of the
map at (0,0).
In each case, the range of the errorbars is twice the standard deviation. In
cases such as Node 105 where there is a tight cluster of points substantially offset
from the origin, this suggests that there might be an error in ground truth. In
cases such as Node 100 where there is a diffuse cluster of points, it is less clear.
However, it is worth noting that our ground truth measurement of Node 100 was
based only on range and bearing to 105, because a hedge blocking our line–of–
sight to 109 made a range measurement to the west impossible.
Overall, this data is inconclusive: while it is entirely possible that many of the offsets we see in (x, y) position are due to errors in ground truth, the deviations are still small and we have no way to re–examine ground truth. Some of the worst
outliers, such as 100, correspond to nodes that we know were poorly measured,
and given that the ground truth measurements incorporate additive error, errors
of 5–10 cm across the field are not surprising. However, errors on that order
could also result from some potential calibration issues that are currently not
well understood. These are discussed in more detail in Section 10.3.
Repeatability in the Z dimension is shown in the lower plot of Figure 10.9.
Because of the flat topology of this deployment, the Z dimension is poorly con-
strained and must rely largely on the less–accurate angular data for its estimates.
While our ground truth is accurate to only about 30 cm, we see variation on the
order of a meter or more, which we can safely say is estimation error.
Figure 10.10 shows similar statistics computed for our yaw, pitch, and roll
estimates. Here we see that the yaw estimates are quite repeatable with an
Figure 10.11: The experimental setup for our system test in the James Reserve in
Idyllwild. Node locations are indicated by red numbered dots. North is toward the top
of the photo; all arrays were aligned by compass to point west.
average range of about 3 degrees. Given the size of the arrays, a ground truth
alignment error of 5 degrees is not surprising. The data shows that the arrays
are mostly oriented accurately, with a few, such as 102 and 105 that seem likely
to be misaligned rather than mis–estimated.
10.2 Forest Outdoor Test: James Reserve
In order to evaluate the performance of the system in an environment more
realistic for our typical applications, we performed another system test in the
Figure 10.12: 3–D map generated by the NLLS algorithm from our deployment in the
James Reserve. Ground truth is shown as crosses, estimated positions and orientations
as arrows.
Figure 10.13: 3–D position estimation map generated by the R–θ algorithm. Both this
and Figure 10.12 use data captured at 10:30 AM on September 29, 2005.
James Reserve in Idyllwild, on September 28, 2005. For this test, we planted 10
stakes, roughly in the locations shown in Figure 10.11.³ We aligned the arrays
using a compass, lining the edge of the compass up with the edge of the array by
eye. The arrays were aligned with 0 deg pointing west.
We attempted to measure ground truth positions for the arrays, but it was
very difficult to get accurate readings because of the foliage and significant varia-
tions in elevation. In order to simplify the collection of ground truth, we aligned
the arrays linearly in a grid–like topology, and measured point–to–point ranges
with the laser rangefinder. We used a hand–held altimeter to get approximate
elevations, accurate to at best 1 meter. Based on this data, we have an ap-
proximate map, but it is not nearly as accurate as our courtyard ground truth
data.
Figures 10.12 and 10.13 show maps generated by our two positioning algo-
rithms, fitted to our approximate ground truth measurements. Both maps show
data from 10:30 AM, 5 hours after the beginning of the experiment. Due to bat-
tery problems during the test, only 6 experiments included all 10 nodes at once,
for two brief periods at 10 AM and at 2 PM. We selected this particular experi-
ment because of the experiments with 10 nodes, it reported the best position error
score, 44 cm average position error (139 cm for the R–θ algorithm). Table 10.2
shows the position error results for the 6 experiments with all 10 nodes.
When we consider all of the experiments (i.e. including those with fewer than
10 nodes), we find a much greater variation in the position error. Figure 10.14
shows histograms of the position error and range consistency metrics using the
NLLS and R–θ algorithms, over all experiments. The NLLS histogram includes
only those trials where the NLLS algorithm reached convergence, but is repre-
³ Image retrieved from the James Reserve GIS system by Vanessa Rivera Del Rio.
[Figure 10.14, two panels: "Position Errors from James Reserve" (upper) and "Avg. Range Residuals from James Reserve" (lower); fraction vs. cm for the R–θ and NLLS algorithms.]
Figure 10.14: Histograms of the Position Error and Average Range Residual Metrics for
the James Reserve data. NLLS outperforms R–θ according to both metrics, although
inaccuracies in ground truth likely prevent errors under 50 cm.
           Position Error (cm)   Average Range Residual (cm)
Time       NLLS      R–θ         NLLS      R–θ
10:34 AM   51        153         2.41      92
10:39 AM   44        139         4.42      107
10:44 AM   297       110         174       114
10:49 AM   59        359         2.07      43
2:46 PM    55        186         2.38      308
2:51 PM    55        145         2.96      257
Table 10.2: Position Error and Average Range Residual metrics for the NLLS and R–θ
algorithms, run on the 6 10–node experiments captured at the James Reserve. For the
experiment at 10:44 AM, the NLLS algorithm failed to reach convergence.
sented as a fraction of all trials.
The NLLS algorithm generally performs better, although both algorithms
have a significant number of results with very high average error values. Inves-
tigating some of these cases, we found that these large values of the position
error metric were often due to a single node that was placed quite far away from
the correct location. We found that dropping the worst position error from the
average reduced the percentage of experiments with average position errors over
500 cm from 30% to 2.6%.
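The drop-the-worst computation above is just a trimmed average; a minimal sketch with hypothetical per-node errors (not the actual experiment data):

```python
def average_position_error(errors_cm, drop_worst=0):
    """Mean position error, optionally dropping the largest values."""
    kept = sorted(errors_cm)
    if drop_worst:
        kept = kept[:len(kept) - drop_worst]
    return sum(kept) / len(kept)

# Hypothetical experiment: one badly placed node dominates the average.
errors = [38.0, 45.0, 52.0, 41.0, 60.0, 47.0, 55.0, 49.0, 44.0, 2100.0]
full = average_position_error(errors)                    # ~253 cm
trimmed = average_position_error(errors, drop_worst=1)   # ~48 cm
```

A single outlier of this magnitude moves the average by an order of magnitude, which is why the over-500-cm fraction drops so sharply when the worst node is excluded.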
Note that the histogram of range residuals in Figure 10.14(b) does not show
the cluster of large errors we see in the position error data. This suggests that the mis–placed nodes result from under–constrained systems, which would tend to yield low average range residuals. Unfortunately, we did not observe any significant predictive relationship between average range residual and average position
error. We believe that more work is needed to discover a metric independent of
ground truth that can be used to identify bad fits. Perhaps these metrics can be extended to take into account how well–constrained a node is, for example by computing the average range residual and the average position error only among well–constrained nodes. We leave
[Figure 10.15: two panels, “NLLS Repeatability: X/Y” (X/Y position estimates in cm, labeled by node 100–111) and “NLLS Repeatability: Z” (Z estimates in cm, per node).]
Figure 10.15: Repeatability statistics for position estimates, over all 10–node James
Reserve data. All errorbar ranges are ± Standard Deviation. The mean standard
deviations for X, Y, and Z estimates over all nodes are 3.48, 3.78, and 17.1 respectively.
[Figure 10.16: two panels, “NLLS Repeatability: Yaw” (degrees, per node) and “NLLS Repeatability: Roll/Pitch” (roll vs. pitch, in degrees).]
Figure 10.16: Repeatability statistics for Yaw, Pitch, and Roll. All errorbar ranges are
± Standard Deviation. The mean standard deviation for yaw estimates over all nodes
is 3.15 deg.
these efforts to future work.
Figures 10.15 and 10.16 show statistics for the variation in position estimates
over the 5 experiments for which 10 nodes reported and the NLLS converged.
In terms of X and Y, this data exhibits comparable variance to the courtyard
data, but with a much wider spread in the means of the position errors, perhaps
the result of errors in ground truth. In the case of the Z axis data shown in
Figure 10.15(b), the variance is actually lower than the comparable data from
the courtyard experiment. This improvement is probably due to the fact that the
James Reserve node placement has more variation in the Z axis, yielding a more
well–constrained system.
10.3 Analysis of Symmetric Ranges
Because it is often difficult to get accurate ground truth information from our
less–controlled system tests, this data is not always very helpful for characterizing the performance of the underlying ranging estimates. However, we can use this
data to learn about the ranging system by comparing “symmetric ranges”, i.e.
the range from Node A to Node B vs. the range from Node B to Node A.
Figure 10.17 and Figure 10.18(a) show symmetric range data from our James
Reserve experiment, comparing three nodes against each other. We can observe
several properties of this data. First, the data curves consistently downward, reflecting the temperature increase from the starting time of 5:30 AM through to 12:30 PM. Second, the initial data is much cleaner than the later data. We believe
that this additional noise is due both to the increase in audible noise during the
day, as well as weather conditions, including solar heating and wind currents.
Some of this noise is also attributable to synchronization errors.
[Figure 10.17: two panels of symmetric ranges vs. experiment time (sec after 5:30 AM): “Symmetric Ranges: 100 vs. 102” (100→102 and 102→100, roughly 3450–3600 cm) and “Symmetric Ranges: 102 vs. 103” (102→103 and 103→102, roughly 1400–1450 cm).]
Figure 10.17: Symmetric ranges, showing variation as a function of temperature and a
consistent offset.
[Figure 10.18: (a) “Symmetric Ranges: 100 vs. 103” vs. experiment time (roughly 2050–2150 cm); (b) “Raw Symmetric Ranges: 100 vs. 103” vs. experiment sequence (roughly 2100–2160 cm).]
Figure 10.18: (a) Symmetric ranges for 100–103, and (b) Raw range data showing
probable synchronization failure.
To get better insight into how well the time synchronization system is working,
we show the raw experiment data in Figure 10.18(b). Whenever there is an error
in synchronization, forward and reverse ranges are likely to be affected oppositely.
The reason for this is that if the skew rate parameter of the conversion is incorrect,
this parameter will skew times positively in one direction and negatively in the
other. In the graph, we see several instances where the lines diverge symmetrically.
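The symmetric effect of a skew-rate error can be sketched numerically (the linear error model and constants here are our illustrative assumptions, not the actual time-conversion code):

```python
C = 34300.0  # speed of sound in cm/s (nominal)

def symmetric_ranges(true_range_cm, skew_error, elapsed_s):
    """Forward and reverse range estimates when the clock-conversion
    skew-rate parameter is wrong by `skew_error` (dimensionless).

    The accumulated conversion error grows with the time elapsed since
    the clocks were last fitted, and its sign flips depending on which
    direction the timestamp is converted -- so the two ranges diverge
    symmetrically around the true range.
    """
    tof = true_range_cm / C
    time_error = skew_error * elapsed_s
    forward = (tof + time_error) * C   # A -> B: conversion error adds
    reverse = (tof - time_error) * C   # B -> A: same error subtracts
    return forward, reverse

# 20 m true range, 10 ppm skew error, 5 s since the last clock fit:
fwd, rev = symmetric_ranges(2000.0, 10e-6, 5.0)
# fwd and rev bracket the true range; their average recovers it
```

Even a 10 ppm skew error is visible at centimeter resolution after a few seconds, which is consistent with the divergence in the raw data.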
In the time since we performed this experiment, we have located and fixed a
bug in the time synchronization system that would retain old conversions long
after they were no longer valid. This problem was not noticeable in the lab or
in the courtyard test, because in those tests the connectivity density was higher.
However, in our James Reserve deployment, we encountered a sparser network
connectivity graph and in addition had several nodes run out of battery—leaving
conversions still in the system. We addressed this problem by adding a timeout
to remove conversions that are not periodically updated.
In addition to synchronization errors, there appears to be a fixed bias for each
node with respect to other nodes. We can see how this might be possible if each
node has some fixed delay that varies from node to node, because that delay
would add in when sending, and subtract when receiving. However, we currently
do not have an explanation for this anomaly.
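Under the fixed-delay hypothesis described above, the per-node delay shifts the two directions in opposite ways and cancels in their average; a minimal sketch with hypothetical delays (expressed in cm of equivalent path):

```python
def biased_pair(true_range_cm, delay_a_cm, delay_b_cm):
    """Symmetric ranges when each node has a fixed delay that adds
    when it sends and subtracts when it receives."""
    r_ab = true_range_cm + delay_a_cm - delay_b_cm  # A chirps, B listens
    r_ba = true_range_cm + delay_b_cm - delay_a_cm  # B chirps, A listens
    return r_ab, r_ba

r_ab, r_ba = biased_pair(2100.0, delay_a_cm=8.0, delay_b_cm=3.0)
offset = (r_ab - r_ba) / 2     # recovers delta_A - delta_B = 5 cm
unbiased = (r_ab + r_ba) / 2   # recovers the true range, 2100 cm
```

If the offset really is a stable per-node delay, the half-difference of each symmetric pair would estimate the delay difference, and the average would be usable as a bias-free range.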
CHAPTER 11
Conclusions
This work has made many important contributions to the field. Some of these con-
tributions have already been published, in [GE01] [GBE02] [MGE02] [EGE02b]
[GEC04] [GSR04] [GLP05]. Others, including a detailed description of the plat-
form and the position estimation algorithm, have not yet been published.
11.1 Summary of Contributions
In this work, we have demonstrated a deployable, wireless platform capable of
hosting distributed acoustic sensing applications. Our acoustic ranging and po-
sition estimation application requires many of the same capabilities as would a
distributed sensing and detection application such as woodpecker detection. Dur-
ing our system testing, we tested the system outdoors and developed many of the
necessary tools for deploying the system practically in the field. The results of
our position estimation tests show that we are able compute position estimates
to within our target tolerances, and orientation estimates nearly meeting our tar-
gets. With some additional work, we believe that these results can be improved
to easily meet our requirements.
Platform Development. In the process of building and testing this system,
we put significant effort into the development of a reusable platform, with a large
investment in reusable system software components. Over the course of several
years we developed Emstar, a software system designed to simplify the devel-
opment of wireless embedded networked systems1. Emstar includes numerous
reusable components, and embodies a set of design principles that we have used
to build a complex, yet robust system. We have implemented Emstar above a
hardware layer which we integrated to provide a platform capable of distributed
acoustic sensing and local DOA estimation via a microphone array.
Time–Synchronized Sampling and Multihop Coordination. Our imple-
mentations of time–synchronized sampling and multihop coordination primitives
are a key part of the platform we have built. We demonstrate their utility in
our implementation of acoustic ranging and position estimation, showing how
these components of the platform drastically simplify the implementation of our
position estimation application. We use the capability to do precise time syn-
chronization over multiple hops to perform accurate time–of–flight ranging, and
we use the StateSync coordination primitive to share that data reliably and ef-
ficiently over multiple RF hops, greatly simplifying many system aspects of the
position estimation algorithm.
Ranging and Direction of Arrival Estimation. We have developed a set
of algorithms that yield highly accurate range and DOA estimates. While these
techniques are not individually novel, their integration into a working distributed
system and a successful position estimation algorithm is. In this work we combine
many areas in signal processing to yield a high–performance ranging system, and
prove out our basic platform design with a demanding algorithm. While there is
1Emstar includes the work of many contributors, a few of whom are: Jeremy Elson, Thanos Stathopoulos, Alberto Cerpa, Nithya Ramanathan, and Martin Lukac.
still room for improvement and further testing, our ranging system achieves high
precision, with a standard deviation of 1.7 cm, long ranges of up to 90 meters,
and excellent noise immunity. The ranging is resilient to obstructing foliage, as
we have demonstrated in our tests at the James Reserve. The DOA estimation
determines estimates in 3–D, and obtains azimuth angles with a standard de-
viation of 0.96 deg and zenith angles between -30 and 45 deg with a standard
deviation of 0.86 deg, and slightly worse performance for angles above 45 deg.
The performance of the ranging sub–system is comparable to the best work
based on ultrasound [SHS01], and significantly better than that of previous sys-
tems based on audible sound [KMS05] [SBM04]. However, our system also performs well in the presence of foliage. The DOA sub–system performs considerably better than similar systems based on ultrasound, such
as [PMB01], perhaps because our system can perform sub–sample phase compar-
isons.
Positioning Application Performance. We have demonstrated position estimation performance far better than other work in the field, under more difficult
circumstances. The work of Kwon et al. [KMS05] is closest to ours in terms of
the objective: outdoor, ad–hoc position estimation in 2–D, in a grassy environment. Relative to this, we took our deployed testing a step farther, performing our tests in a very difficult environment containing significant levels of obstruction and multipath interference, and deploying the system more sparsely, as described in
Section 10.2. Despite this more challenging environment, our system performed
considerably better, giving typical average 3D position errors of 60 cm relative
to the 1.5–2.6 meter 2D position errors reported in [KMS05]. Furthermore, our
repeatability analysis and a comparison to our courtyard test described in Section 10.1 suggest that the true accuracy is actually much higher than is revealed
through comparison to our approximate ground truth measurements.
The effort invested in building the platform and the system software com-
ponents such as Emstar, the Synchronized Sampling layer and the StateSync
protocol has ensured that this system works as a stand–alone system, robustly,
and runs on a wireless, embedded system. In addition, it provides a platform
and a foundation for other applications to be built above. Applications such as
the woodpecker detection and monitoring system are already being built to run
on this platform.
11.2 Discussion
We now consider the system as a whole, and ask several questions: how practical is it, how well does it scale, and what can be generalized
from it?
11.2.1 Practicality and Cost of the System
Our system is based on an Intel PXA platform with 64 MB of RAM and 32 MB
of flash2. This increases the cost of the platform both in terms of dollars and en-
ergy consumed. Our current system is a prototype and has not been optimized;
if it were optimized the cost in both metrics could be considerably lower. On
the hardware side, the processor board could be integrated with a customized
sampling board and possibly with a DSP pre–processor. The microphone ar-
ray could be redesigned to be lighter, smaller, and more symmetrical in shape.
The software could be greatly optimized both by optimizing the code as well as
2The CPU board is developed and marketed by Sensoria Corporation. The acoustic platform hardware was integrated by Jefferey Tseng, and is now being offered as a product by Aevena Corporation.
optimizing the operation of the system.
Many of the other systems that have been documented in the literature are
based on much lighter–weight platforms. Work at UIUC [KMS05] and Vanderbilt [SBM04] is based on the Mica2 [HC02], while the Cricket [PCB00] [PMB01]
system is based on its own platform, and the AHLoS work [SHS01] also con-
structed a specialized platform based on a 16–bit ARM processor. In our system
we did not attempt to fit everything on an 8– or 16–bit processor, in part to
avoid early optimization, but also because we wanted to build a general–purpose
platform to support easy construction of experimental signal processing applica-
tions. While it might be possible to use an 8–bit microcontroller to do woodpecker
detection, it definitely will not be easy3.
Given our first–order goal of supporting embedded signal–processing applica-
tions, we chose to develop a system to be sparsely deployed in relatively small
numbers, and to have each node host an acoustic array. These applications are
also the source of our more stringent accuracy requirements, and of the require-
ment that we estimate array orientation.
In the final analysis, a distributed acoustic detection system will need all of
the features of the platform we have built: a local microphone array, sufficient
local processing and storage, high bandwidth wireless network with an appropri-
ate networking stack, and high accuracy time synchronization. Our automatic
position estimation system not only provides another key feature of the platform,
but it also comes at no additional cost, because all of the features it depends on
are already required by our set of target applications.
3This is a generous statement; distinguishing one species from another at a reasonable distance, if done digitally, requires quite a bit of processing.
11.2.2 Scaling Properties and Applicability Across Environments
We did not do a thorough investigation of the scaling properties of our system.
This was partly because we only constructed 10 nodes, so we could not test
larger networks in real life. However, it is clear that as the number of nodes grows, so will the resources required by the multilateration algorithm.
While we did not address this issue in this system, we have experience from
a prior system that scaled to 90 nodes [MGE02]. In this system, we gradually
accreted a coordinate system by adding 14 nodes at a time. We selected the
nodes to add based on their connectivity to each other: starting with the most
well–connected node and next adding in the node that was best connected to the
nodes in our current set. This method was simple and appeared to work well,
although we did not carefully measure its performance.
Scaling might also be accomplished by distributing the multilateration algo-
rithm to several nodes, and stitching the resulting coordinate systems together.
Some of the work we have done on the map fit heuristics in our position error
metric might apply well to this stitching process. If each node computed a patch
of the coordinate system nearest them and broadcast the results via StateSync,
data from the other nodes could be stitched in using translation, scaling, and
rotation. In a more sophisticated solution, StateSync can be used to elect leaders
who can locally coordinate this process.
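One possible core for such stitching is a rigid least-squares alignment over the nodes two patches share; a 2-D sketch omitting the scaling term (the function names are ours, not part of the system):

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping points in `src`
    onto corresponding points in `dst` (2-D Procrustes, no scaling).
    Returns (theta, tx, ty)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    sxx = sxy = syx = syy = 0.0  # cross-covariance of the centered sets
    for (x, y), (u, v) in zip(src, dst):
        x -= csx; y -= csy; u -= cdx; v -= cdy
        sxx += x * u; sxy += x * v; syx += y * u; syy += y * v
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def apply_rigid_2d(p, theta, tx, ty):
    """Apply the fitted rotation and translation to one point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

# Example: a patch rotated 0.5 rad and shifted (12, -4) is recovered.
src = [(0.0, 0.0), (10.0, 0.0), (0.0, 5.0), (7.0, 3.0)]
dst = [apply_rigid_2d(p, 0.5, 12.0, -4.0) for p in src]
theta, tx, ty = fit_rigid_2d(src, dst)
```

Given the coordinates that two patches assign to their shared nodes, the fitted transform carries one patch into the other's frame before merging.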
11.2.3 Ideas and Components to Carry Forward
Many aspects of this system will be useful for other research and system devel-
opment efforts. First of all, the platform itself is designed to be used to build
new applications. Enormous effort is required to assemble the hardware, including the physical acoustic array, and to get the whole software system working, including getting all of the parts of the system synchronized. Doing this requires
attention to details at every layer of the system from physical hardware and me-
chanical construction to system software to kernel drivers to the many layers of
software and the position estimation application that calibrates the deployed sys-
tem. Having all of this in one off–the–shelf box is a huge step forward for anyone
contemplating doing distributed acoustic sensing and detection. Already we have
several students beginning to work with this platform, and we hope that trend
continues.
In addition, many of the sub–systems and components are also valuable on
their own.
Emstar. The Emstar system is a valuable resource and is being used in numer-
ous other contexts, both within our lab and outside it. Emstar provides a software
framework designed for wireless embedded networking [GEC04], that is integrated
with simulation tools and that comes with a collection of tools and components
that are useful in deployment. Emstar also simplifies the integration of Motes
and other TinyOS devices into a common, heterogeneous system [GSR04].
StateSync. The StateSync primitive and implementation is currently being
used and extended by several other projects in our lab. StateSync provides a
simple interface to efficient, reliable, low–latency state dissemination to one–hop
neighbors or over multiple hops. StateSync has broad application to wireless
networked systems, facilitating the implementation of efficient routing protocols,
leader election algorithms, and distribution of configuration or calibration data.
Synchronized Sampling and Audio Server. The code for the Synchronized
Sampling layer is available, although it is not always trivial to port to hardware
because of the dependence on kernel modifications and in some cases special-
ized firmware. However, the audio server interface is easily ported, and as new
platforms are standardized, kernel patches can be implemented to add the syn-
chronization hooks. There is already one other student in our lab who has ported
the Audio Server for use on an iPAQ for an acoustic sensing project that does not
require tight synchronization.
Ranging and DOA Estimation. The Ranging and DOA estimation tech-
niques are not necessarily novel, although they collectively represent a working
system: an integration of known techniques that yields a characterized perfor-
mance. In our work we identified a few important ideas that are worth not-
ing. First, that 2–TDOA schemes work better than Angular Correlation (AC)
schemes, because AC schemes are too sensitive to the exact placement of micro-
phones, whereas multiple 2–TDOA measurements can be combined while allowing
for slip in the microphone placement. Second, that by interpolating the corre-
lation function at a higher resolution we are readily able to achieve sub–sample
phase comparison in the time domain.
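As a sketch of the second idea: our implementation interpolates the correlation function at higher resolution, but a parabolic fit around the integer peak illustrates the same sub-sample principle (the signals here are synthetic):

```python
import math

def cross_correlation(a, b, max_lag):
    """Cross-correlation of two equal-length signals over lags
    in [-max_lag, max_lag]."""
    out = []
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i, ai in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                s += ai * b[j]
        out.append(s)
    return out

def subsample_peak(corr):
    """Refine the integer peak index with a parabolic fit over its two
    neighbors, returning a fractional (sub-sample) peak position."""
    k = max(range(len(corr)), key=corr.__getitem__)
    if k == 0 or k == len(corr) - 1:
        return float(k)
    y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
    denom = y0 - 2.0 * y1 + y2
    return k + (0.5 * (y0 - y2) / denom if denom else 0.0)

# Two synthetic pulses offset by a fractional 2.4 samples:
a = [math.exp(-((i - 20.0) ** 2) / 8.0) for i in range(50)]
b = [math.exp(-((i - 22.4) ** 2) / 8.0) for i in range(50)]
corr = cross_correlation(a, b, max_lag=5)
lag = subsample_peak(corr) - 5  # close to 2.4, despite integer sampling
```

The integer peak alone would report a lag of 2 samples; the refinement recovers the fractional offset, which is what makes phase comparison below the sampling period possible.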
Position Estimation and Metrics. We have not seen the exact formulation
of our position estimation algorithm in prior work, so this may be of interest for
those researchers looking for a position estimation algorithm that works well for
3–D systems where angular data is available. In addition, our approach to fitting
for our position error metric, and our delineation from the basic Procrustean
approaches may at least be a good starting point for finding a good anchor–free position error metric, and may also be useful in stitching together a
coordinate system from patches. While our metric has performed adequately for
our current needs, we don’t claim that it is ideal or complete. Applying some
of the more advanced ideas in [DM98] might well yield significant improvements
over our metric.
The Utility of Angular Measurements. Another idea that we have demon-
strated in this work is that angular information is very important. We use angular
information in several ways:
• To get an initial guess before iterative refinement.
• To cross–check range data and detect likely reflections.
• To resolve many types of geometric ambiguity.
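The first of these uses can be written in closed form: one range plus a DOA estimate converts directly into a Cartesian seed for the iterative solver. A minimal sketch (we take θ as azimuth and φ as the zenith angle from +Z; the angle conventions are our illustrative choice):

```python
import math

def initial_guess(range_cm, theta_deg, phi_deg):
    """Cartesian seed position from one range and DOA estimate,
    taking theta as azimuth and phi as the zenith angle from +Z."""
    th = math.radians(theta_deg)
    ph = math.radians(phi_deg)
    return (range_cm * math.sin(ph) * math.cos(th),
            range_cm * math.sin(ph) * math.sin(th),
            range_cm * math.cos(ph))

# A node heard 20 m away, at 45 deg azimuth, level with the array:
x, y, z = initial_guess(2000.0, 45.0, 90.0)
```

Starting the refinement from such a seed, rather than from an arbitrary point, avoids many of the local minima the least-squares iteration could otherwise fall into.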
In practice, angular information is critical to get the vertical topology right,
because it is often difficult to place nodes with sufficient geometric diversity to
resolve all ambiguities. To demonstrate this, we ran our James Reserve data with
angles disabled. If all angles were disabled, the NLLS convergence failed because
the system was under–constrained. If θ is used without φ, the system converged
to a folded configuration, shown in Figure 11.1.
The ambiguity that caused the fold might be eliminated with more ranges,
especially if the nodes are well–distributed vertically. However, distributing nodes
vertically is often very inconvenient, because it requires mounting nodes in trees
or tall towers. By including angular information, we increase the robustness
of the system to poorly constrained regions, enabling correct results with fewer
nodes and sparser, easier–to–deploy networks. Angular information also helps
the system scale by resolving ambiguities locally that otherwise might require
information from other parts of the network.
[Figure 11.1: 3–D plot (X, Y, Z in cm), “Comparison of solutions with and without using φ”: position estimates using φ, position estimates ignoring φ, and ground truth.]
Figure 11.1: 3–D plot showing the importance of using the φ angle information. Al-
though the system converged without φ, it converged to a folded configuration.
CHAPTER 12
Future Work
While this work has made much progress, there are many areas where improve-
ments can still be made. More testing and experience with the platform is needed;
the feedback from initial users of the platform should be valuable input when de-
termining which future improvements to make a priority.
12.1 Algorithm Improvements
A number of improvements can be made to the algorithms used in this work.
Improved Position Error Metric and Fit. As we alluded to in Section 4.6, our
position error metric might be improved to better apply the Procrustes method.
In our current implementation we took several shortcuts to enable outlier rejection
that might lead to skewed data. This will not be a simple application of the
techniques in [DM98], but it might yield publishable position estimation metrics
that other researchers in the area could use.
In addition, it would be very useful to devise a metric that could tell us
when the system might have converged to a wrong answer, without access to
ground truth. Our existing Average Range Residual metric tells us whether the
system converged well, but it does not take into account how well–constrained
that configuration is.
Better Confidence Metrics. Our ranging and DOA estimators output con-
fidence values to indicate the quality of the detection. Currently the ranging
metric is based on the SNR of the detected pulse, but the angular metric is a
somewhat ad–hoc formula. It would be useful to devise a more useful confidence
metric, perhaps related to the quality–of–fit of the angular estimator.
Improvements to the Optimization Method. While our methods appear
to work in practice, we have at times tuned the algorithms based on measure-
ments from our tests. In our NLLS and R–θ implementations we have used
weightings that we derived from tests under controlled conditions, yet we are
applying these distributions in other environments that may have different prop-
erties. Our outlier rejection scheme relies on the selection of a threshold for
rejection of constraints. We determined this threshold empirically using ground
truth from our Court of Sciences tests, but we wish to apply this in arbitrary
environments.
These concerns suggest that more work is needed to produce a more gener-
alized solution to the problem. One approach would start by making a minimal
set of assumptions about the data, and using these assumptions to build a model
that can be used to derive solutions to some of these questions. In this effort, we
want to either make as few assumptions as possible, or require that each deploy-
ment make some number of ground truth measurements so that the results may
be verified.
As a first step, we will recover more accurate measurements of the James
Reserve ground truth deployment, so that we can better analyze the data. In
addition, deploying the system in new kinds of environments will also yield more
data with which to test our techniques.
Investigate the Forward/Reverse Range Offset. The offset observed in
Section 10.3 is not well understood. If this is a per–node offset or per–array
offset, then we should calibrate the systems to eliminate it. If the offset is not
stable, we should determine its cause and attempt to eliminate it. This offset may
account for the fact that our positioning performance does not seem to match
our ranging performance as well as expected.
Investigate Scaling Issues. We have thus far not addressed any scaling issues.
Since we only have 10 nodes it is difficult to investigate these issues in practice.
However, simulation work might be useful in testing possible staged or distributed
multilateration schemes, determining in the process how much accuracy is lost
when we solve the system in segments.
Implement Leader Election and Centralized Multilateration. The cur-
rent application performs periodic, uncoordinated ranging, and publishes all
range and angle data so that the multilateration algorithm can pick it up and
process it into a map. However, to get the best results, we want to do all ranging
in as short a time as possible. Implementing this would require additional coordination, and our idea here is to elect a leader who will coordinate all of the nearby
nodes to chirp in close succession, and then collect all the data to perform the
multilateration. After the multilateration, it would broadcast the results, and the
nodes receiving the results would stitch the maps from different leaders together
into a single map.
This implementation would be more complex than the current one, but using
StateSync it should be very straightforward to implement the leader election and
results broadcast components. The schedule for ranging can also be published
using StateSync, using the gsyncd global time distribution component to provide
a common timebase within which to reference the ranging. The resulting system
should provide a significant performance improvement during daytime when the
environmental parameters change most rapidly.
12.2 Platform Improvements
The platform is a prototype, and there are many ways to physically improve it:
• The sampling boards are expensive and cumbersome. Replacing them
with a custom board would be a cost savings and might improve the sys-
tem as well (e.g. we could eliminate the self–ranging speaker and many
workarounds in the audio driver). The audio amplifier should also be re-
placed.
• The microphone array configuration is asymmetric and would probably be
improved by changing to a symmetric configuration. The mounts that hold
the microphones should be replaced, because they shadow other micro-
phones in the array from certain angles, and are not mechanically held as
tightly as they should be. The “self–ranging” speaker should be positioned
so it does not block channel 0.
• An accelerometer should be added to the array to detect movement as well
as to enable correction for roll and pitch. An EEPROM on the array would
also be useful, so that calibration information and an ID can be stored
there.
• An 802.15.4 or Mica2 radio should be added to provide a low power paging
channel, to improve synchronization, and to communicate with Motes.
Calibration of Microphone Arrays. In our testing, we discovered a need
to better calibrate the microphone arrays. Specifically, a technique that would
determine more exactly the position of the microphones would result in better fits
in the DOA stage of the system. We can then associate calibration information
with each array, and use that information to improve the DOA estimates.
It would also be useful to quantify the accuracy of our model defining phase
lag as a function of DOA. We currently assume that the lags are a simple function
of the incoming angle, but if one of the microphones is shaded that will likely
induce excess path length to that microphone. A more thorough angular test
might reveal new information about the relationship between incoming angle
and the phase offsets.
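The "simple function of the incoming angle" we assume is the far-field plane-wave model; writing it out makes explicit what a shaded microphone would violate (the coordinate conventions and function are our illustrative sketch, not the system's code):

```python
import math

C_M_PER_S = 343.0  # nominal speed of sound

def expected_tdoa(mic_i, mic_j, theta_deg, phi_deg):
    """Far-field plane-wave model: expected time difference of arrival
    between two microphones (positions in meters) for a source at
    azimuth theta and zenith angle phi. Positive means mic_i hears
    the wave before mic_j. Shading or excess path length at one
    microphone breaks this model."""
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    u = (math.sin(ph) * math.cos(th),   # unit vector toward the source
         math.sin(ph) * math.sin(th),
         math.cos(ph))
    proj_i = sum(m * c for m, c in zip(mic_i, u))
    proj_j = sum(m * c for m, c in zip(mic_j, u))
    return (proj_i - proj_j) / C_M_PER_S

# Two mics 20 cm apart on the x axis, source along +x (endfire):
tau = expected_tdoa((0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), 0.0, 90.0)
```

A thorough angular test would compare measured lags against this model over a sweep of θ and φ, revealing any angle-dependent excess path length.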
Re–factor and Improve Multihop StateSync Implementation. Our ini-
tial experiences in the field have shown some problems using the multihop version
of StateSync. The performance is not as good as we had hoped, and we believe the cause
is that StateSync tries to do too much. To address this, we propose refactoring
Multihop StateSync and removing the currently existing “clustering algorithm”.
We believe that this will result in the creation of several components that are
individually more useful than the current monolithic StateSync implementation.
We can then re–assemble the StateSync functionality from this decomposed set
of modules.
Software Improvements and Optimization. In our work thus far we have
done very little optimization. Now that we are working with more complex sys-
tems, it may be time to start looking more carefully at resource usage and finding ways to prune it. We think that this process may also result in Emstar being generally a simpler system, with fewer, more manageable components.
Two areas of optimization that we see currently are memory usage and mes-
sage passing overhead for high–rate sensor devices. We propose to address mem-
ory by examining the system to find out where the memory is used and for what
purpose. We propose to implement a shared–memory version of the sensor device
that uses FUSD message passing for notification and control traffic only, and
uses shared–memory for direct data access.
Development of New Applications. Several new applications will serve to
refine and improve the platform. A mote localization system that localizes Mote–based emitters using DOA from multiple points would be a useful application of
this platform. As a mote localization scheme, it has the advantage that there
is no need for precise time synchronization to the motes, which substantially re-
duces the level of integration required. Other applications, such as woodpecker
detection, would also test the system in new ways. One interesting class of appli-
cations is a “call and response” application, in which the node emits a particular
animal call at a particular time, and listens for animals to respond to that call.
New Tests. Our range test in lot 9 was affected by a synchronization problem,
and possibly by weather. We think that redoing the test in an indoor environ-
ment, such as an underground parking garage, would result in more consistent
data.
We would also like to re–do the Court of Sciences test with more accurate
ground truth measurements, in addition to re–measuring the James Reserve de-
ployment.
References
[ADB04] A. Arora, P. Dutta, S. Bapat, V. Kulathumani, H. Zhang, V. Naik, V. Mittal, H. Cao, M. Demirbas, M. Gouda, Y. Choi, T. Herman, S. Kulkarni, U. Arumugam, and M. Nesterenko. “A Line in the Sand: A Wireless Sensor Network for Target Detection, Classification and Tracking.” Computer Networks, 46(5):605–634, December 2004.
[APB02] S. Azou, C. Pistre, and G. Burel. “A chaotic direct sequence spread-spectrum system for underwater communication.” In Proceedings of the IEEE Oceans Conf., Biloxi, Mississippi, October 2002.
[ARE05] A. Arora, R. Ramnath, E. Ertin, P. Sinha, S. Bapat, V. Naik, V. Kulathumani, H. Zhang, H. Cao, M. Sridharan, S. Kumar, N. Seddon, C. Anderson, T. Herman, N. Trivedi, C. Zhang, M. Nesterenko, R. Shah, S. Kulkarni, M. Aramugam, L. Wang, M. Gouda, Y. Choi, D. Culler, P. Dutta, C. Sharp, G. Tolle, M. Grimmer, B. Ferriera, and K. Parker. “ExScal: Elements of an Extreme Scale Wireless Sensor Network.” In The 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005.
[Baa05] Tom Van Baak. “Experiments With a PC Sound Card.” The LeapSecond Web Site, 2005.
[BC91] Kenneth Birman and Robert Cooper. “The ISIS Project: Real Experience with a Fault Tolerant Programming System.” Operating Systems Review, pp. 103–107, April 1991.
[BFH03] Robert Braden, Ted Faber, and Mark Handley. “From protocol stack to protocol heap: role-based architecture.” SIGCOMM Comput. Commun. Rev., 33(1):17–22, 2003.
[BHE00] Nirupama Bulusu, John Heidemann, and Deborah Estrin. “GPS-less low cost outdoor localization for very small devices.” IEEE Personal Communications, 5(5):28–34, 2000.
[BP00] P. Bahl and V.N. Padmanabhan. “RADAR: An in-building RF-based user location and tracking system.” In Proceedings of IEEE Infocom. IEEE, 2000.
[Bro86] Rodney Brooks. “A Robust Layered Control System for a Mobile Robot.” IEEE Journal of Robotics and Automation, 2(1), 1986.
[Byt05] Vladimir Bytchkovskiy. “Discussion of EmStar Layering.” Personal Communication, 2005.
[CAB03] Douglas S. J. De Couto, Daniel Aguayo, John Bicket, and Robert Morris. “A High-Throughput Path Metric for Multi-Hop Wireless Routing.” In Mobicom 2003. ACM, 2003.
[CDG03] K. Chintalapudi, A. Dhariwal, R. Govindan, and G. Sukhatme. “On the feasibility of ad-hoc localization systems.” Technical Report 03-796, Computer Science Department, University of Southern California, 2003.
[CGS04] Krishna Chintalapudi, Ramesh Govindan, Gaurav Sukhatme, and Amit Dhariwal. “Ad-Hoc Localization Using Ranging and Sectoring.” In INFOCOM ’04: Proceedings of the IEEE Infocom 2004, 2004.
[CJB01] B. Chen, K. Jamieson, H. Balakrishnan, and R. Morris. “Span: An energy-efficient coordination algorithm for topology maintenance in ad hoc wireless networks.” In Proceedings of the International Conference on Mobile Computing and Networking (MobiCom 2001), pp. 85–96, 2001.
[CKS03] N.S. Correal, S. Kyperountas, Q. Shi, and M. Welborn. “An Ultra Wideband Relative Location System.” In Proceedings of the IEEE Conference on Ultra Wideband Systems and Technologies, pp. 394–397, November 2003.
[Cla88] David D. Clark. “The Design Philosophy of the DARPA Internet Protocols.” Computer Communications Review, 18(4):106–114, 1988.
[CWP05] Alberto Cerpa, Jennifer L. Wong, Miodrag Potkonjak, and Deborah Estrin. “Statistical Model of Lossy Links in Wireless Sensor Networks.” In IPSN ’05: Proceedings of the Fourth ACM/IEEE International Conference on Information Processing in Sensor Networks, 2005.
[DM98] Ian L. Dryden and Kanti V. Mardia. Statistical Shape Analysis. John Wiley and Sons, 1998.
[DZL05] Ramani Duraiswami, Dmitry N. Zotkin, Zhiyun Li, Elena Grassi, Nail A. Gumerov, and Larry S. Davis. “High Order Spatial Audio Capture and its Binaural Head-Tracked Playback over Headphones with HRTF Cues.” Journal of the Audio Engineering Society, 2005.
[EBB03a] J. Elson, S. Bien, N. Busek, V. Bychkovskiy, A. Cerpa, D. Ganesan, L. Girod, B. Greenstein, T. Schoellhammer, T. Stathopoulos, and D. Estrin. “EmStar: An Environment for Developing Wireless Embedded Systems Software.” Technical Report CENS-0009, Center for Embedded Networked Sensing, University of California, Los Angeles, March 2003.
[EBB03b] Jeremy Elson, Solomon Bien, Naim Busek, Vladimir Bychkovskiy, Alberto Cerpa, Deepak Ganesan, Lewis Girod, Ben Greenstein, Tom Schoellhammer, Thanos Stathopoulos, and Deborah Estrin. “EmStar: An Environment for Developing Wireless Embedded Systems Software.” Technical Report CENS-0009, Center for Embedded Networked Sensing, 2003.
[EGE02a] Jeremy Elson, Lewis Girod, and Deborah Estrin. “Fine-Grained Network Time Synchronization using Reference Broadcasts.” In OSDI, pp. 147–163, Boston, MA, December 2002.
[EGE02b] Jeremy Elson, Lewis Girod, and Deborah Estrin. “A Wireless Time-Synchronized COTS Sensor Platform, Part I: System Architecture.” In IEEE CAS Workshop on Wireless Communications and Networking, 2002.
[EGE04] Jeremy Elson, Lewis Girod, and Deborah Estrin. “EmStar: Development with High System Visibility.” IEEE Wireless Communications Magazine, 2004.
[Els02] Jeremy Elson. FUSD: Framework for User Space Devices, 2002.
[Els03] Jeremy Elson. Time Synchronization in Wireless Sensor Networks. PhD thesis, University of California at Los Angeles, 2003.
[ER02] Jeremy Elson and Kay Romer. “Wireless Sensor Networks: A New Regime for Time Synchronization.” In First Workshop on Hot Topics in Networks (HotNets-I), 2002.
[FJL97] Sally Floyd, Van Jacobson, Ching-Gung Liu, Steven McCanne, and Lixia Zhang. “A reliable multicast framework for light-weight sessions and application level framing.” IEEE/ACM Trans. on Networking (TON), 5(6):784–803, 1997.
[GB82] David Gelernter and Arthur J. Bernstein. “Distributed communication via global buffer.” In PODC ’82: Proceedings of the first ACM SIGACT-SIGOPS symposium on Principles of distributed computing, pp. 10–18, New York, NY, USA, 1982. ACM Press.
[GBE02] Lewis Girod, Vladimir Bychkovskiy, Jeremy Elson, and Deborah Estrin. “Locating tiny sensors in time and space: A case study.” In Proceedings of ICCD 2002 (invited paper), Freiburg, Germany, September 2002. http://lecs.cs.ucla.edu/Publications.
[GE01] L. Girod and D. Estrin. “Robust Range Estimation Using Acoustic and Multimodal Sensing.” In International Conference on Intelligent Robots and Systems, October 2001.
[GEC04] Lewis Girod, Jeremy Elson, Alberto Cerpa, Thanos Stathopoulos, Nithya Ramanathan, and Deborah Estrin. “EmStar: a Software Environment for Developing and Deploying Wireless Sensor Networks.” In Proceedings of the 2004 USENIX Technical Conference, Boston, MA, 2004. USENIX Association.
[GGS05] S. Ganeriwal, D. Ganesan, H. Shim, V. Tsiatsis, and M. Srivastava. “Estimating Clock Uncertainty for Efficient Duty Cycling in Sensor Networks.” In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys05). ACM, November 2005.
[GKE04] Ben Greenstein, Eddie Kohler, and Deborah Estrin. “A Sensor Network Application Construction Kit (SNACK).” In Proceedings of the second international conference on Embedded networked sensor systems. ACM Press, 2004.
[GKS03] S. Ganeriwal, R. Kumar, and M. Srivastava. “Timing Sync Protocol for Sensor Networks.” In Sensys, Los Angeles, 2003.
[GLP05] Lewis Girod, Martin Lukac, Andrew Parker, Thanos Stathopoulos, Jeffrey Tseng, Hanbiao Wang, Deborah Estrin, Richard Guy, and Eddie Kohler. “A Reliable Multicast Mechanism for Sensor Network Applications.” Technical report, CENS, April 25, 2005.
[GR03] J. van Greunen and Jan Rabaey. “Lightweight Time Synchronization for Sensor Networks.” In Proceedings of the ACM Workshop on Wireless Sensor Networks and Applications (WSNA 2003), September 2003.
[GSR04] L. Girod, T. Stathopoulos, N. Ramanathan, J. Elson, D. Estrin, E. Osterweil, and T. Schoellhammer. “Tools for Deployment and Simulation of Heterogeneous Sensor Networks.” In Proceedings of SenSys 2004, November 2004.
[Ham00] M. Hamilton. “Hummercams, Robots, and the Virtual Reserve.” James San Jacinto Mountains Reserve web site, February 2000.
[Har05] Michael Hardy. “Studentized Residuals.” In Wikipedia: An Online Encyclopedia. Wikipedia Web Site, 2005.
[HC02] J. Hill and D. Culler. “Mica: A Wireless Platform for Deeply Embedded Networks.” IEEE Micro, 22(6):12–24, Nov/Dec 2002.
[HSE03] John Heidemann, Fabio Silva, and Deborah Estrin. “Matching Data Dissemination Algorithms to Application Requirements.” In Proceedings of the first international conference on Embedded networked sensor systems. ACM Press, 2003.
[HSS05] T. He, R. Stoleru, and J.A. Stankovic. “Spotlight: Low-Cost Asymmetric Localization System for Networked Sensor Nodes.” In The 4th International Conference on Information Processing in Sensor Networks (IPSN 2005), Demo Paper. USENIX, 2005.
[HSW00] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister. “System architecture directions for networked sensors.” In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pp. 93–104, Cambridge, MA, USA, November 2000. ACM.
[HVB01] J. Hightower, C. Vakili, G. Borriello, and R. Want. “Design and Calibration of the SpotON Ad-Hoc Location Sensing System.” Available from Jeff Hightower’s web site, 2001.
[HW02] Michael Hazas and Andy Ward. “A novel broadband ultrasonic location system.” In The 4th International Conference on Ubiquitous Computing (UbiComp 2002), 2002.
[IGE00] Chalermek Intanagonwiwat, Ramesh Govindan, and Deborah Estrin. “Directed diffusion: a scalable and robust communication paradigm for sensor networks.” In MobiCom ’00: Proceedings of the 6th annual international conference on Mobile computing and networking, pp. 56–67, New York, NY, USA, 2000. ACM Press.
[JM96] D. Johnson and D. Maltz. “Dynamic Source Routing in ad hoc wireless networks.” Mobile Computing, pp. 153–181, 1996.
[JZ04] Xiang Ji and Hongyuan Zha. “Sensor Positioning in Wireless Ad-Hoc Networks with Multidimensional Scaling.” In IEEE INFOCOM, 2004.
[KC76] C.H. Knapp and G.C. Carter. “The generalized correlation method for estimation of time delay.” IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-24(4):320–327, August 1976.
[KEE03] Richard Karp, Jeremy Elson, Deborah Estrin, and Scott Schenker. “Optimal and Global Time Synchronization in Sensornets.” Technical Report CENS-0012, Center for Embedded Networked Sensing, 2003.
[KMS05] YoungMin Kwon, Kirill Mechitov, Sameer Sundresh, Wooyoung Kim, and Gul Agha. “Resilient Localization for Sensor Networks in Outdoor Environments.” In IEEE International Conference on Distributed Computing Systems (ICDCS05), 2005.
[Koe03] Kay Romer. “The Lighthouse Location System for Smart Dust.” In The First International Conference on Mobile Systems, Applications, and Services (MobiSys03). USENIX, 2003.
[KSP03] F. Koushanfar, S. Slijepcevic, and M. Potkonjak. “Location Discovery in Ad-hoc Wireless Sensor Networks.” In X. Cheng, X. Huang, and D.Z. Du, editors, Ad-hoc Wireless Networking. Kluwer Academic Publishers, 2003.
[LLW03] P. Levis, N. Lee, M. Welsh, and D. Culler. “TOSSIM: Accurate and Scalable Simulations of Entire TinyOS Applications.” In Sensys, Los Angeles, 2003.
[LPC04] Philip Levis, Neil Patel, David E. Culler, and Scott Shenker. “Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks.” In Proceedings of 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), March 29–31, 2004, San Francisco, California, 2004.
[LR03] K. Langendoen and N. Reijers. “Distributed Localization in Wireless Sensor Networks: a Quantitative Comparison.” Computer Networks, special issue on Wireless Sensor Networks, 2003.
[LS02] J.Y. Lee and R.A. Scholtz. “Ranging in a dense multipath environment using an UWB radio link.” IEEE Journal on Selected Areas in Communications, 20(9):1677–1683, December 2002.
[MGE02] William Merrill, Lewis Girod, Jeremy Elson, Katayoun Sohrabi, Fredric Newberg, and William Kaiser. “Autonomous Position Location in Distributed, Embedded, Wireless Systems.” In the IEEE CAS Workshop on Wireless Communications and Networking, Pasadena, CA, 2002.
[MGS04] W. Merrill, L. Girod, B. Schiffer, D. McIntire, G. Rava, K. Sohrabi, F. Newberg, J. Elson, and W. Kaiser. “Dynamic Networking and Smart Sensing Enable Next-Generation Landmines.” IEEE Pervasive Computing Magazine, Oct–Dec 2004.
[Mil94] David L. Mills. “Internet Time Synchronization: The Network Time Protocol.” In Zhonghua Yang and T. Anthony Marsland, editors, Global States and Time in Distributed Systems. IEEE Computer Society Press, 1994.
[MKS04] M. Maroti, B. Kusy, G. Simon, and A. Ledeczi. “The Flooding Time Synchronization Protocol.” In Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys04). ACM, November 2004.
[MLR04] David Moore, John Leonard, Daniela Rus, and Seth Teller. “Robust distributed network localization with noisy range measurements.” In SenSys ’04: Proceedings of the 2nd international conference on Embedded networked sensor systems, pp. 50–61, New York, NY, USA, 2004. ACM Press.
[MVD05] Miklos Maroti, Peter Volgyesi, Sebestyen Dora, Branislav Kusy, Andras Nadas, Akos Ledeczi, Gyorgy Balogh, and Karoly Molnar. “Radio Interferometric Geolocation.” In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys05). ACM, November 2005.
[NN03] D. Niculescu and B. Nath. “DV-based positioning in ad hoc networks.” Telecommunications Systems, pp. 267–280, 2003.
[PAK05] Neal Patwari, Joshua Ash, Spyros Kyperountas, Alfred Hero III, Randolph Moses, and Neiyer Correal. “Locating the Nodes: Cooperative Localization in Sensor Networks.” IEEE Signal Processing Magazine, pp. 54–69, July 2005.
[PB94] C. Perkins and P. Bhagwat. “Highly Dynamic Destination-Sequenced Distance Vector Routing (DSDV) for Mobile Computers.” In Proceedings of the ACM SIGCOMM, pp. 234–244. ACM, August 1994.
[PCB00] N. Priyantha, A. Chakraborty, and H. Balakrishnan. “The Cricket Location Support System.” In Mobicom 2000. ACM, August 2000.
[PMB01] Nissanka B. Priyantha, Allen K. L. Miu, Hari Balakrishnan, and Seth Teller. “The Cricket Compass for Context-Aware Mobile Applications.” In The 7th ACM Conference on Mobile Computing and Networking (MOBICOM), 2001.
[PPT90] R. Pike, D. Presotto, K. Thompson, and H. Trickey. “Plan 9 from Bell Labs.” In Proceedings of the Summer 1990 UKUUG Conference, pp. 1–9, July 1990.
[PR99] C. Perkins and E. Royer. “Ad hoc On-demand Distance-vector (AODV) routing.” In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, 1999.
[PTV92] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition. Cambridge University Press, 1992.
[Rap96] T.S. Rappaport. Wireless Communications: Principles and Practice. Prentice-Hall, 1996.
[RBD97] Mendel Rosenblum, Edouard Bugnion, Scott Devine, and Steve Herrod. “Using the SimOS Machine Simulator to Study Complex Computer Systems.” ACM TOMACS Special Issue on Computer Simulation, 1997.
[RC01] Alessandro Rubini and Jonathan Corbet. Writing Linux Device Drivers, 2nd edition. O’Reilly, June 2001.
[RD04] Vikas C. Raykar and Ramani Duraiswami. “Automatic Position Calibration of Multiple Microphones.” In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP04), 2004.
[RDY05] Vikas C. Raykar, Ramani Duraiswami, and B. Yagnanarayana. “Extracting the frequencies of the pinna spectral notches from measured head-related impulse responses.” Journal of the Acoustical Society of America, 118(1):364–374, July 2005.
[RF03] Yong Rui and Dinei Florencio. “New Direct Approaches to Robust Sound Source Localization.” In IEEE International Conference on Multimedia and Expo (ICME), 2003.
[RN97] Brent Rector and Joseph Newcomer. Win32 Programming. Addison Wesley, January 1997.
[SBG04] Adam Smith, Hari Balakrishnan, Michel Goraczko, and Nissanka Priyantha. “Tracking Moving Devices with the Cricket Location System.” In ACM MobiSYS 2004, 2004.
[SBM04] Janos Sallai, Gyorgy Balogh, Miklos Maroti, Akos Ledeczi, and Branislav Kusy. “Acoustic Ranging in Resource-Constrained Sensor Networks.” Technical Report ISIS-04-504, Institute for Software Integrated Systems, 2004.
[SGS04] Andreas Savvides, Lewis Girod, Mani Srivastava, and Deborah Estrin. “Localization in Sensor Networks.” In C. S. Raghavendra, K. M. Sivalingam, and T. Znati, editors, Wireless Sensor Networks. Kluwer, 2004.
[SHS01] Andreas Savvides, Chih-Chieh Han, and Mani B. Srivastava. “Dynamic fine-grained localization in Ad-Hoc networks of sensors.” In MobiCom ’01: Proceedings of the 7th annual international conference on Mobile computing and networking, pp. 166–179, New York, NY, USA, 2001. ACM Press.
[SHS05] Radu Stoleru, Tian He, John A. Stankovic, and David Luebke. “A High Accuracy, Low Cost Localization System for Wireless Sensor Networks.” In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys05). ACM, November 2005.
[SKB00] A. Savvides, F. Koushanfar, A. Boulis, V. Karavas, M. Potkonjak, and M.B. Srivastava. “Location Discovery in Ad-hoc Wireless Networks.” Memorandum, Networked and Embedded Systems Laboratory, UCLA, June 2000.
[SMP02] Sasha Slijepcevic, Seapahn Megerian, and Miodrag Potkonjak. “Location Errors in Wireless Embedded Sensor Networks: Sources, Models, and Effects on Applications.” Mobile Computing and Communications Review, 6(3):67–78, June 2002.
[SMP03] Sasha Slijepcevic, Seapahn Megerian, and Miodrag Potkonjak. “Characterization of Location Error in Wireless Sensor Networks: Analysis and Applications.” In IPSN ’03: Proceedings of the Second ACM/IEEE International Conference on Information Processing in Sensor Networks, pp. 593–608, 2003.
[SP80] D.V. Sarwate and M.B. Pursley. “Crosscorrelation Properties of Pseudorandom and Related Sequences.” Proceedings of the IEEE, 68:593–619, 1980.
[SPS03] A. Savvides, H. Park, and M. B. Srivastava. “The N-Hop Multilateration Primitive for Node Localization Problems.” MONET Special Issue on Sensor Networks and Applications, 2003.
[SRZ03] Y. Shang, W. Ruml, Y. Zhang, and M. Fromherz. “Localization from mere connectivity.” In Proceedings of MobiHoc03, pp. 201–212, June 2003.
[SS04] Radu Stoleru and John Stankovic. “Probability Grid: A Location Estimation Scheme for Wireless Sensor Networks.” In Proceedings of the 1st IEEE Conference on Sensor and Ad Hoc Communications and Networks (SECON 2004), 2004.
[Tor52] W.S. Torgerson. “Multidimensional Scaling: I. Theory and Method.” Psychometrika, 17:401–419, 1952.
[War04] Matthias Warkus. The Official GNOME 2 Developer’s Guide. No Starch Press, April 2004.
[WC03] Kamin Whitehouse and David Culler. “Macro-Calibration in Sensor/Actuator Networks.” Mobile Networks and Applications, pp. 463–472, 2003.
[WCA05] Hanbiao Wang, Chiao-En Chen, Andreas Ali, Shadnaz Asgari, Ralph E. Hudson, Kung Yao, Deborah Estrin, and Charles Taylor. “Acoustic Sensor Networks for Woodpecker Localization.” In SPIE Conference on Advanced Signal Processing Algorithms, Architectures and Implementations, August 2005.
[WJH97] A. Ward, A. Jones, and A. Hopper. “A new location technique for the active office.” IEEE Personal Communications, 4(5), October 1997.
[WSB04] Kamin Whitehouse, Cory Sharp, Eric Brewer, and David Culler. “Hood: a neighborhood abstraction for sensor networks.” In MobiSYS ’04: Proceedings of the 2nd international conference on Mobile systems, applications, and services, pp. 99–110, New York, NY, USA, 2004. ACM Press.
[WTC03a] Alec Woo, Terence Tong, and David Culler. “Taming the Underlying Challenges of Reliable Multihop Routing in Sensor Networks.” In Sensys 2003, 2003.
[WTC03b] Alec Woo, Terence Tong, and David Culler. “Taming the underlying challenges of reliable multihop routing in sensor networks.” In Proceedings of the first international conference on Embedded networked sensor systems, pp. 14–27. ACM Press, 2003.
[WYP04] Hanbiao Wang, Kung Yao, Greg Pottie, and Deborah Estrin. “Entropy-based Sensor Selection Heuristic for Localization.” In Symposium on Information Processing in Sensor Networks (IPSN04), April 2004.
[YHE02] W. Ye, J. Heidemann, and D. Estrin. “An energy-efficient MAC protocol for wireless sensor networks.” In Proceedings of IEEE INFOCOM, 2002.
[ZG03] Jerry Zhao and Ramesh Govindan. “Understanding Packet Delivery Performance In Dense Wireless Sensor Networks.” In Proceedings of the first international conference on Embedded networked sensor systems. ACM Press, 2003.