University of California
Los Angeles
A Self-Calibrating System
of Distributed Acoustic Arrays
A dissertation submitted in partial satisfaction
of the requirements for the degree
Doctor of Philosophy in Computer Science
by
Lewis David Girod
2005
The dissertation of Lewis David Girod is approved.
Stefano Soatto
Gregory J. Pottie
Miodrag Potkonjak
Deborah L. Estrin, Committee Chair
University of California, Los Angeles
2005
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Contributions of this Work . . . . . . . . . . . . . . . . . . . . . . 3
1.2 How to avoid reading this document . . . . . . . . . . . . . . . . 6
I The Array Calibration Problem 8
2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Requirements of Acoustic Detection Applications . . . . . . . . . 9
2.2 Definition of the Calibration Problem . . . . . . . . . . . . . . . . 11
2.3 Outline of Proposed Solution . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Time of Flight Ranging Layer . . . . . . . . . . . . . . . . 13
2.3.2 Multilateration Layer . . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Properties of the Acoustic Array Hardware . . . . . . . . . . . . . 15
2.5 Comparison to Related Systems . . . . . . . . . . . . . . . . . . . 16
2.5.1 RF Localization Techniques . . . . . . . . . . . . . . . . . 17
2.5.2 Laser–based Localization . . . . . . . . . . . . . . . . . . . 19
2.5.3 Ultrasound Acoustic Localization . . . . . . . . . . . . . . 19
2.5.4 Orientation Discovery . . . . . . . . . . . . . . . . . . . . 20
2.5.5 Alternatives to the use of Sensor Arrays . . . . . . . . . . 21
3 Estimation of Range and DOA . . . . . . . . . . . . . . . . . . . . 23
3.1 Filtering and Correlation . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.1 Code Generation and Modulation . . . . . . . . . . . . . . 28
3.1.2 Input Extraction . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Detection and Extraction . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Noise Estimation and Peak Detection . . . . . . . . . . . . 32
3.2.2 Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Interpolation and Normalization . . . . . . . . . . . . . . . 41
3.3 DOA Estimation and Combining . . . . . . . . . . . . . . . . . . 42
3.3.1 Lag Finding . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.2 DOA Estimation . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.3 Alternative Approaches to DOA Estimation . . . . . . . . 48
3.3.4 Recombination . . . . . . . . . . . . . . . . . . . . . . . . 49
3.3.5 Peak Detection . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Environmental Effects . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Multilateration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1 Overview and Context . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Prefiltering and Initial Estimation . . . . . . . . . . . . . . . . . . 57
4.3 Two Solutions to the Position Estimation Problem . . . . . . . . 59
4.3.1 R–θ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 Iterative Non–Linear Least–Squares Minimization . . . . . 62
4.4 Interleaved Orientation Estimation . . . . . . . . . . . . . . . . . 69
4.5 Outlier Rejection Using Studentized Residuals . . . . . . . . . . . 71
4.6 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.7 System Considerations . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.1 What Can Go Wrong? . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2 Strategies for Robustness . . . . . . . . . . . . . . . . . . . . . . . 81
5.3 Successfully Managing Complexity . . . . . . . . . . . . . . . . . 84
II The Acoustic Sensing Platform 85
6 Emstar: a Software Framework . . . . . . . . . . . . . . . . . . . . 86
6.1 Design Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.1.1 Inter–node communication is not usually transparent. . . . 87
6.1.2 The system within a node is complex and benefits from
distributed system design principles. . . . . . . . . . . . . 88
6.2 How Emstar Works . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2.1 Layer 0: FUSD Syscall Inter–process RPC . . . . . . . . . 91
6.2.2 Layer 1: GLib Event System . . . . . . . . . . . . . . . . . 101
6.2.3 Layer 2: Emstar Device Patterns and Libraries . . . . . . . 103
6.2.4 Layer 3: Emstar Components and Services . . . . . . . . . 119
6.2.5 Layer 4: Additional Tools and Environment . . . . . . . . 125
7 A Synchronized Distributed Sampling Layer . . . . . . . . . . . 133
7.1 A Buffered Acoustic Sensor Interface . . . . . . . . . . . . . . . . 134
7.1.1 Continuous Sampling and Buffering . . . . . . . . . . . . . 135
7.1.2 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1.3 Multi–Client Interface . . . . . . . . . . . . . . . . . . . . 138
7.2 An Integrated Time Synchronization Service . . . . . . . . . . . . 139
7.2.1 Conversion–Based Time Synchronization . . . . . . . . . . 140
7.2.2 The Timesync API and Time Conversion Graph . . . . . . 142
7.2.3 RBS vs. MAC Layer Timestamps . . . . . . . . . . . . . . 144
7.3 Hop–by–Hop Time Conversion . . . . . . . . . . . . . . . . . . . . 149
8 Multihop Wireless Layer . . . . . . . . . . . . . . . . . . . . . . . . 151
8.1 How Wireless is Different . . . . . . . . . . . . . . . . . . . . . . . 151
8.2 The StateSync Abstraction . . . . . . . . . . . . . . . . . . . . . . 153
8.2.1 Application Requirements . . . . . . . . . . . . . . . . . . 154
8.2.2 The StateSync Abstraction . . . . . . . . . . . . . . . . . . 155
8.2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.3 Variants of StateSync . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.3.1 SoftState . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.3.2 LogFlood . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.3.3 LogTree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Benchmarking StateSync . . . . . . . . . . . . . . . . . . . . . . . 169
8.4.1 Metrics and Experimental Setup . . . . . . . . . . . . . . . 169
8.4.2 Benchmark Tests . . . . . . . . . . . . . . . . . . . . . . . 169
8.4.3 Determining Application Suitability . . . . . . . . . . . . . 174
8.5 Applying StateSync to Position Estimation . . . . . . . . . . . . . 175
8.5.1 Applying the StateSync Model . . . . . . . . . . . . . . . . 176
8.5.2 StateSync Simplifies the System Design . . . . . . . . . . . 177
8.6 Performance of StateSync for Position Estimation . . . . . . . . . 179
8.7 Enabling System Visibility Using LogFlood . . . . . . . . . . . . . 182
III Experimental Results 184
9 Range and DOA Estimation Testing . . . . . . . . . . . . . . . . 185
9.1 DOA Component Testing . . . . . . . . . . . . . . . . . . . . . . 186
9.1.1 Azimuth Performance . . . . . . . . . . . . . . . . . . . . 188
9.1.2 Zenith Performance . . . . . . . . . . . . . . . . . . . . . . 191
9.2 Range Component Testing . . . . . . . . . . . . . . . . . . . . . . 195
10 System Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
10.1 Urban Outdoor Test: Court of Sciences . . . . . . . . . . . . . . . 206
10.1.1 Measurement of Ground Truth . . . . . . . . . . . . . . . 210
10.1.2 Selecting the Residual Cutoff . . . . . . . . . . . . . . . . 211
10.1.3 Comparison of R–θ and NLLS . . . . . . . . . . . . . . . . 214
10.1.4 Map Scaling With Temperature . . . . . . . . . . . . . . . 218
10.1.5 Repeatability . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.2 Forest Outdoor Test: James Reserve . . . . . . . . . . . . . . . . 223
10.3 Analysis of Symmetric Ranges . . . . . . . . . . . . . . . . . . . . 231
11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . 235
11.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
11.2.1 Practicality and Cost of the System . . . . . . . . . . . . . 238
11.2.2 Scaling Properties and Applicability Across Environments 240
11.2.3 Ideas and Components to Carry Forward . . . . . . . . . . 240
12 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
12.1 Algorithm Improvements . . . . . . . . . . . . . . . . . . . . . . . 245
12.2 Platform Improvements . . . . . . . . . . . . . . . . . . . . . . . . 248
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
List of Figures
2.1 Photo of a node deployed at the James Reserve [Ham00], and a
diagram of a proposed distributed acoustic sensing application to
localize acorn woodpeckers. . . . . . . . . . . . . . . . . . . . . . 10
2.2 Block diagram of the self–calibration system. . . . . . . . . . . . 12
2.3 Photograph of an acoustic array, and a diagram of the array ge-
ometry. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 Block diagram of the ranging detection algorithm. . . . . . . . . 24
3.2 The Filtering and Correlation stage of the ranging detection al-
gorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Modulation for seed=7, encoding the sequence 0111011110. . . . 25
3.4 (a) The power spectral density (PSD) function for the exact ref-
erence signal. (b) The PSD of the reference signal as recorded at
the source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 (a) The PSD of the input signal received at Node 101, 80 meters
from the source. (b) The correlation of the signal above, expressed
in the time domain. The correlation peak is 7dB above the noise
floor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6 SNR for PN code detection as a function of sample rate skew, and
the observed sample rate skew in 100K outdoor trials. . . . . . . 29
3.7 Analyzing the distribution of autocorrelation noise. . . . . . . . . 33
3.8 (a) Distribution of noise for the correlation shown in Figure 3.5.
(b) Distribution of peak correlation values for a 100K trial outdoor
test. For each successful trial, the largest noise peak and the
detection peak are included. . . . . . . . . . . . . . . . . . . . . . 34
3.9 The Detection and Extraction stage of the ranging detection al-
gorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.10 The DOA Estimation and Combining stage of the ranging detec-
tion algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.11 Plot of the DOA objective function observed for a test with φ =
0, θ = 281. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.12 Combined signals for the trial from Figure 3.5. The two curves
show the effect of recombination using the straight DOA estimate
and our heuristic. . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1 Algorithm for determining an initial parameter estimate. . . . . . 58
6.1 The five layers of the Emstar framework. . . . . . . . . . . . . . 90
6.2 Message timing diagram of a FUSD call. The middle column of
the diagram represents the FUSD kernel module. . . . . . . . . . 94
6.3 A dependency loop, and using a broker service to break the loop. 96
6.4 Diagram showing how to use a thread and a queue to break a
FUSD dependency loop. . . . . . . . . . . . . . . . . . . . . . . . 97
6.5 The FUSD file operations structure. . . . . . . . . . . . . . . . . 98
6.6 Throughput comparison of FUSD and in–kernel implementations
of /dev/zero, timing a read of 1GB of data on a 2.8 GHz Xeon,
for both 2.4 and 2.6 kernels. . . . . . . . . . . . . . . . . . . . . . 100
6.7 The Emstar event system API. . . . . . . . . . . . . . . . . . . . 102
6.8 Setting a timer in the Emstar event system. . . . . . . . . . . . . 103
6.9 Block diagram of the Status Device pattern. The functions bi-
nary(), printable(), and write() are callbacks defined by the server,
while status_notify() is called by the server to notify the client of
a state change. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.10 A snippet of code that creates a Status Device. . . . . . . . . . . 108
6.11 Block diagram of the Packet Device pattern. The functions send()
and filter() are callbacks defined by the server, while pd_receive()
and pd_unblock() are functions called by the server. . . . . . . . . 109
6.12 Snippet of code that creates a Command Device. . . . . . . . . . 111
6.13 Block diagram of the Query Device pattern. In the Query Device,
queries from the clients are queued and “process” is called serially.
The “R” boxes represent a buffer per client to hold the response
to the last query from that client. . . . . . . . . . . . . . . . . . 113
6.14 Block diagram of the Sensor Device pattern. In the Sensor Device,
the server submits new samples by calling sdev_push(). These
are stored in the ring buffer (RB), and streamed to clients with
relevant requests. The “R” boxes represent each client’s pending
request. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.15 “Crashproof” auto–reopen algorithm. . . . . . . . . . . . . . . . 118
6.16 Screen shot of EmView, the Emstar visualizer. . . . . . . . . . . 132
7.1 Block diagram of the buffered acoustic sensor interface. . . . . . 134
7.2 Plot of the linear relationship between the VXP sample clock and
the platform’s CPU clock. . . . . . . . . . . . . . . . . . . . . . . 136
7.3 Block diagram of the syncd service. . . . . . . . . . . . . . . . . . 140
7.4 RBS correlation of the timing of received broadcasts. This graph
shows that CPU clocks are stable with respect to each other over
time periods as long as 20 minutes. . . . . . . . . . . . . . . . . . 144
7.5 The MAC clocks appear to actively adapt their rates, rather than
maintaining frequency stability: (a) shows a central mode with
perfect rate matching, while (b) shows that the frequency of the
MAC clock is unstable when referenced to the CPU clock. But we
know from Figure 7.4 that the CPU clocks are stable with respect
to each other. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.6 Expanded plot of MAC timestamps showing high levels of noise. 146
8.1 Publisher applications push tables of key–value pairs to StateSync,
which disseminates them and delivers the complete table of all re-
ceived keys to subscribers whenever a change occurs. . . . . . . 156
8.2 The StateSync Log Scheme maintains a checkpointed log and an
active log. In the diagram, the first two ADD entries in the active
log are carried over from the checkpointed log after the redundant
entries have been compressed out. . . . . . . . . . . . . . . . . . 162
8.3 A screen shot from EmView displaying the wireless testbed de-
ployed in our building. The scale of the map is 5 meters per grid
square. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.4 Results of benchmark tests on the testbed. Each grouping of bars
represents four 20–minute experiments in which 64K of data is
published in a fixed number of chunks, issued at regular intervals. 170
8.5 The latency distribution, broken down by hopcount. . . . . . . . 173
8.6 The distribution of key lifetimes for our position estimation ap-
plication. The mean key lifetime is 1506 ± 121 seconds. . . . . . . 175
8.7 Results of tests of our Position Estimation application from our
12 node testbed. The latency graphs show a CDF of latency in
seconds. The curve for LogTree shows some initial traffic in setting
up the ClusterSync trees before the start of data traffic. . . . . . 180
8.8 Results of tests of our Position Estimation application from a 50
node simulation. The mean latency for LogTree is 31.54 ± 0.58;
for LogFlood is 14.33 ± 0.12. . . . . . . . . . . . . . . . . . . . . 181
9.1 Experimental setup for the DOA component test. . . . . . . . . . 186
9.2 Mounting the measurement laser for the azimuth test (left) and
the zenith test (right). . . . . . . . . . . . . . . . . . . . . . . . . 187
9.3 Overall distribution of errors in the Azimuth test. These results
are well within our target of ±1 deg. . . . . . . . . . . . . . . . . 188
9.4 Results of the Azimuth test, showing deviation from ground truth.
These results suggest a bias that is dependent on angle. . . . . . 189
9.5 Results of the Zenith test, showing deviation from ground truth.
Some asymmetry is evident when comparing the two sides. Neg-
ative angles approach from beneath the array and are heavily
obstructed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.6 Overall error distribution from the Zenith test. We observe that
the error distribution for “midrange” angles is comparable to that
of the azimuth estimates, although the error distribution for over-
head angles is noticeably worse. . . . . . . . . . . . . . . . . . . 194
9.7 The experimental setup for our range test in Lot 9, showing tests
at 5m (left) and 50m (right). The 50m test required multihop
synchronization. . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.8 Results of the Ranging test, 0-90m. In (a) the impulses show the
mean deviation from ground truth (right y scale), as a function
of distance. In (b) experiments are shown ordered by distance,
with the mean deviation plotted relative to the right y scale. The
distance for each experiment is represented by the dotted line,
referenced to the left y scale. . . . . . . . . . . . . . . . . . . . . 196
9.9 Plots showing the relationship between distance, SNR, and error.
The upper graph shows a scatter plot of range error vs. SNR. The
lower graph shows the relationship between SNR and distance,
with experiments ordered by distance. The dashed line shows a
function of distance that fits well to SNR. The dotted line shows
the distance corresponding to each experiment. . . . . . . . . . . 197
9.10 Results of the Ranging test, zooming in on 10 and 5 meters. These
tests show good accuracy and precision, despite being taken over
a long time interval and assuming a single temperature over the
entire experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.11 Results of the Ranging test, zooming in on tests from 50–55 me-
ters. Anomalous behavior is observed at 50 meters, perhaps the
result of a transient synchronization problem. A bug that could
have caused this has since been fixed. . . . . . . . . . . . . . . . 201
9.12 Overall error distribution from the Lot 9 Range Test. The stan-
dard deviation of the range error for all tests is 3.81 cm. If we
drop the 17 values with error larger than 10 cm, the standard
deviation of the remaining distribution is 1.76 cm. By applying
the narrower model in our multilateration algorithm, we can drop
the data in the tails as outliers. . . . . . . . . . . . . . . . . . . 202
10.1 The experimental setup for our system test in the UCLA Court of
Sciences. Node locations are indicated by numbered dots, while
yellow bars indicate the location of hedges. North is toward the
top of the photo. Image courtesy of Google Earth. . . . . . . . . 207
10.2 Output of the NLLS Position Estimation Algorithm, for the 1:45
AM dataset. The green crosses denote ground truth; the red
arrows show the position and orientation of each node. . . . . . . 208
10.3 Output of the R–θ Position Estimation Algorithm, for the 1:59
AM dataset. This dataset was the best result for R–θ. . . . . . . 209
10.4 Results of running our 14 courtyard experiments using a residual
threshold of 2. We see that half of our experiments do equally
well with a threshold of 3. . . . . . . . . . . . . . . . . . . . . . 212
10.5 CDF of the results of applying several different residual thresholds
to our 14 courtyard experiments. . . . . . . . . . . . . . . . . . 213
10.6 Position error achieved by the R–θ and NLLS algorithms on our 14
courtyard experiments, using a residual threshold of 3. The NLLS
algorithm consistently outperforms the R–θ algorithm because it
is able to make better use of the more accurate range data. Our
2D results improve upon those in [KMS05] by a factor of 20. . . 215
10.7 (a) Average Range Residual achieved by the R–θ and NLLS algo-
rithms for our courtyard experiments, using a residual threshold
of 3. (b) Average Range Residual and Position Error for NLLS. . 216
10.8 Scaling factors relative to ground truth, and air temperature. We
see a correlation between map scaling relative to ground truth,
and air temperature. . . . . . . . . . . . . . . . . . . . . . . . . 218
10.9 Repeatability statistics for position estimates, showing the per–
node distribution of deviations from ground truth. All errorbar
ranges are ± Standard Deviation. The mean standard deviations
for X, Y, and Z estimates over all nodes are 3.18 cm, 3.85 cm,
and 49.15 cm, respectively. . . . . . . . . . . . . . . . . . . . . 220
10.10 Repeatability statistics for Yaw, Pitch, Roll, computed using the
same method as in Figure 10.9. All errorbar ranges are ± Stan-
dard Deviation. The mean deviation for yaw estimates over all
nodes is 1.37 deg. . . . . . . . . . . . . . . . . . . . . . . . . . . 221
10.11 The experimental setup for our system test in the James Reserve
in Idyllwild. Node locations are indicated by red numbered dots.
North is toward the top of the photo; all arrays were aligned by
compass to point west. . . . . . . . . . . . . . . . . . . . . . . . 223
10.12 3–D map generated by the NLLS algorithm from our deployment
in the James Reserve. Ground truth is shown as crosses, esti-
mated positions and orientations as arrows. . . . . . . . . . . . . 224
10.13 3–D position estimation map generated by the R–θ algorithm.
Both this and Figure 10.12 use data captured at 10:30 AM on
September 29, 2005. . . . . . . . . . . . . . . . . . . . . . . . . . 225
10.14 Histograms of the Position Error and Average Range Residual
Metrics for the James Reserve data. NLLS outperforms R–θ ac-
cording to both metrics, although inaccuracies in ground truth
likely prevent measured errors from falling below 50 cm. . . . . . . 227
10.15 Repeatability statistics for position estimates, over all 10–node
James Reserve data. All errorbar ranges are ± Standard Devi-
ation. The mean standard deviations for X, Y, and Z estimates
over all nodes are 3.48 cm, 3.78 cm, and 17.1 cm, respectively. . . . 229
10.16 Repeatability statistics for Yaw, Pitch, and Roll. All errorbar
ranges are ± Standard Deviation. The mean standard deviation
for yaw estimates over all nodes is 3.15 deg. . . . . . . . . . . . . 230
10.17 Symmetric ranges, showing variation as a function of temperature
and a consistent offset. . . . . . . . . . . . . . . . . . . . . . . . 232
10.18 (a) Symmetric ranges for 100–103, and (b) Raw range data show-
ing probable synchronization failure. . . . . . . . . . . . . . . . . 233
11.1 3–D plot showing the importance of using the φ angle information.
Although the system converged without φ, it converged to a folded
configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
List of Tables
4.1 Error Distributions for Range and DOA Estimates. . . . . . . . . 60
6.1 Device Patterns currently defined by the Emstar system. . . . . . 104
8.1 Packet and byte counts for LogTree and LogFlood for 1 chunk, 12
senders, at 600 seconds and 1200 seconds. . . . . . . . . . . . . . 172
9.1 Range experiments, grouped by target scale and precision. . . . . 199
10.1 Experiment timing and weather conditions. . . . . . . . . . . . . 210
10.2 Position Error and Average Range Residual metrics for the NLLS
and R–θ algorithms, run on the 6 10–node experiments captured
at the James Reserve. For the experiment at 10:44 AM, the NLLS
algorithm failed to reach convergence. . . . . . . . . . . . . . . . 228
Acknowledgments
Portions of the system described in this thesis were implemented by other
contributors. Many of the underpinnings of Emstar were co-designed and imple-
mented by Jeremy Elson, and many others have contributed to the implementation.
Jeremy Elson designed and implemented syncd, the Emstar time synchroniza-
tion service, gsyncd, and many other components of Emstar from Packet Device
and Directory Device, to the original version of EmSim and the radio channel
simulator. Nithya Ramanathan implemented the Sensor Device pattern used by
vxpcd and other modules. Martin Lukac implemented the multilateration module
and the Emstar HTTP service. Thanos Stathopoulos implemented IP Connec-
tor, EmTOS, and contributed greatly to the work that led to Emstar’s capability
for heterogeneous simulation. Nia-Chiang Liang implemented the least squares
minimization for the DOA estimator. Hanbiao Wang provided insight towards
the design of the DOA and multilateration algorithms.
Jeffrey Tseng assembled the initial version of the hardware platform, and Mar-
tin Lukac helped with numerous changes to the hardware configuration. Dustin
McIntire developed the system software distribution for both the initial Intel
Stargate platform and the final Slauson platform, and answered numerous ques-
tions during kernel debugging. Naim Busek assisted in the development of the
schematics for the custom microphone pre-amp board, Jeffrey Tseng did the parts
placement and board layout, Mohammad Rahimi helped to debug the pre-amp
circuits, and Carolina Garcia helped with board assembly.
Nia-Chiang Liang, Luiz Faveira, Martin Lukac, Alberto Cerpa, Vlad Trifa,
and Chris Mar helped prepare and run experiments. For our outdoor experiments
we used the facilities of the James Reserve, managed by the University
of California Riverside, and specifically Michael Hamilton, Michael Taggart, and
Tom Unwin.
Chapter 6 contains text from the previously published work EmStar: a Soft-
ware Environment for Developing and Deploying Wireless Sensor Networks, by
Lewis Girod, Jeremy Elson, Alberto Cerpa, Thanos Stathopoulos, Nithya Ra-
manathan, Deborah Estrin, in the proceedings of the 2004 USENIX Technical
Conference, Boston, MA. Figures from this work are reprinted here with permis-
sion.
Chapter 7 describes work done together with Jeremy Elson, who designed and
implemented the original Emstar time synchronization system, which this work
builds on and extends.
Chapter 8 contains text from the previously published Technical Report, A
Reliable Multicast Mechanism for Sensor Network Applications, by Lewis Girod,
Martin Lukac, Andrew Parker, Thanos Stathopoulos, Jeffrey Tseng, Hanbiao
Wang, Deborah Estrin, Richard Guy and Eddie Kohler, CENS Technical Report
48, April 25, 2005.
Support for this work has been provided by the NSF Cooperative Agreement
CCR-0120778, and the UC MICRO program (grant 01-031) with matching funds
from Intel.
Vita
1972 Born, Buffalo, New York, USA.
1994 B.S. (Mathematics),
Massachusetts Institute of Technology
1995 B.S. (Computer Science) and M.Eng. (EECS),
Massachusetts Institute of Technology
1995–1998 Sponsored Research Staff,
Advanced Network Architecture group,
Laboratory for Computer Science, MIT.
1998–2000 Graduate Research Assistant,
Information Sciences Institute, USC.
Summer 1999 Summer Intern,
AT&T Cambridge Research Laboratory,
Cambridge, UK.
2000–2003 Senior Development Engineer,
Sensoria Corporation.
2000–2005 Graduate Research Assistant,
Computer Science Department, UCLA.
Publications
Elson, J., Girod, L., and Estrin, D., “A Wireless Time-Synchronized COTS Sen-
sor Platform, Part I: System Architecture” (short paper). In Proceedings of the
IEEE CAS Workshop on Wireless Communications and Networking, Pasadena,
CA. September 5–6 2002.
Girod, L., Bychkovskiy, V., Elson, J., and Estrin, D., “Locating tiny sensors in
time and space: A case study”. In Proceedings of the International Conference
on Computer Design (ICCD 2002), Freiburg, Germany. September 16-18 2002.
Invited paper.
—, Elson, J., Cerpa, A., Stathopoulos, T., Ramanathan, N., and Estrin, D.,
“EmStar: a Software Environment for Developing and Deploying Wireless Sensor
Networks”. In Proceedings of the 2004 USENIX Technical Conference, Boston
MA June 2004.
—, and Estrin, D., “Robust Range Estimation Using Acoustic and Multimodal
Sensing”. In Proceedings of IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2001), Maui, Hawaii, October 2001.
—, Lukac, M., Parker, A., Stathopoulos, T., Tseng, J., Wang, H., Estrin, D.,
Guy, R., and Kohler, E., “A Reliable Multicast Mechanism for Sensor Network
Applications”. Center for Embedded Networked Sensing Technical Report #48,
April 25, 2005.
—, Stathopoulos, T., Ramanathan, N., Elson, J., Estrin, D., Osterweil, E., and
Schoellhammer, T., “A System for Simulation, Emulation, and Deployment of
Heterogeneous Sensor Networks”. In Proceedings of the ACM Conference on
Embedded Networked Sensor Systems (SenSys 2004), November 2004.
Merrill, W., Girod, L., Elson, J., Sohrabi, K., Newberg, F., and Kaiser, W.,
“Autonomous Position Location in Distributed, Embedded, Wireless Systems”.
In Proceedings of the IEEE CAS Workshop on Wireless Communications and
Networking, Pasadena, CA, September 5–6 2002.
Savvides, A., Girod, L., Srivastava, M., and Estrin, D., “Localization in Sen-
sor Networks”. In C.S. Raghavendra, K.M. Sivalingam and T. Znati, editors,
Wireless Sensor Networks. Kluwer Academic Publishers, 2004.
Abstract of the Dissertation
A Self-Calibrating System
of Distributed Acoustic Arrays
by
Lewis David Girod
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2005
Professor Deborah L. Estrin, Chair
The area of sensor networks promises to support the biological and physical
sciences by enabling measurements that were previously impossible. This is ac-
complished by pushing intelligence into the network and closer to the sensors,
enabling sensing to be accomplished at much higher scales and densities with
lower cost.
Recently, interest in acoustic sensing problems has increased, including the
localization and monitoring of birds, wolves, and other species, as well as the
localization of electronic devices themselves. This has spurred the development
of a rapidly–deployable distributed acoustic sensing platform.
A key problem in the development of this platform is the acoustic array cali-
bration problem, which estimates the locations and orientations of a distributed
collection of acoustic sensors. We present a system composed of a set of in-
dependent acoustic nodes that automatically determines calibration parameters
including the relative location and orientation (X,Y, Z, Θ) of each array. These
relative coordinates are then fitted to one or more survey points to relate the
relative coordinates to a physical map. The application that computes these
estimates is itself a distributed sensing application.
In this work we present a solution to this position estimation problem, demon-
strating a complete vertical application built above a stack of re–usable sys-
tem components and distributed services, implemented on a deployable embed-
ded hardware platform. We describe: the hardware platform itself; Emstar, a
software framework for developing complex embedded system software; a time–
synchronized sampling layer; a multihop reliable multicast coordination primi-
tive; a time–of–flight acoustic ranging and direction–of–arrival (DOA) estimation
layer; and the top–level application that estimates the position and orientation
of each array.
We present the results of controlled tests of the ranging and DOA estimation
system, as well as the results of deployment experiments in both an urban envi-
ronment and a forested environment. These results demonstrate that our system
outperforms other similar systems, and that it can achieve sufficient accuracy
for anticipated applications, such as bird localization.
CHAPTER 1
Introduction
The area of sensor networks promises to support the biological and physical
sciences by enabling measurements that were previously impossible. This is ac-
complished by pushing intelligence into the network and closer to the sensors,
enabling sensing to be accomplished at much higher scales and densities with
lower cost.
While most of the currently deployed sensor networks focus on long–lived,
low–rate sensing applications such as microclimate monitoring, interest in appli-
cations involving high–rate sensors has been on the increase. Recently, interest in
rapidly–deployable, self–configuring acoustic sensing problems has increased, in-
cluding the localization and monitoring of birds, wolves, and other species, as
well as the localization of electronic devices themselves. However, despite these needs,
very few of the requisite underpinnings of such systems are currently available.
This has spurred the development of a rapidly–deployable distributed acoustic
sensing platform.
Many of the problems faced in this work are familiar problems from the area of
wireless sensing and embedded networking. Although in this work we are target-
ing more capable systems and we expect to support shorter–lived deployments,
we still must design the system with energy consumption in mind. In addition,
because of the high volume of data captured by acoustic sensors, this system
must support local processing in order to function in the context of a wireless
network.
When we specifically consider embedded acoustic sensing using these high–
capability platforms, we see several system requirements come to the forefront:
• Synchronized, distributed sampling: the ability to relate and compare with
precision time series data and events recorded on different nodes in the
network.
• A network stack designed to support ad–hoc wireless applications: link
estimation, routing, and transport.
• Reliable group communication to support distributed coordination.
• Automatic array calibration: precise, automatic estimation of the position
and orientation of the sensor arrays in the system.
• Tools to support development, debugging, and deployment.
A key problem in the development of this platform is the acoustic array cali-
bration problem in which the locations and orientations of a distributed collection
of acoustic sensors are estimated. We present a system composed of a set of in-
dependent acoustic nodes that automatically determines calibration parameters,
including the relative location and orientation (X,Y, Z, Θ) of each array. These
relative coordinates are then fitted to one or more survey points to relate the
relative coordinates to a physical map.
This problem is difficult both from a systems perspective (how we can make
the system work robustly) and from an algorithmic perspective, as it involves
the correct functioning of a number of separate algorithms, from DSP and
estimation algorithms to multilateration algorithms. By solving
this problem, we simultaneously achieve two goals: we add a critical feature to
our platform, while presenting a worked–out example of a distributed acoustic
sensing application that exercises much of our target platform functionality.
In this work, we present a definition of the array calibration problem, and
explain the algorithms we used to solve it. We then explain in detail the system
components and infrastructure we developed to implement the solution. Finally,
we present the results of component testing under controlled conditions, as well
as a deployment experiment in a realistic forested environment. We thus demon-
strate that our system can achieve sufficient calibration accuracy to implement
typical acoustic applications, such as bird localization.
1.1 Contributions of this Work
The system we have built provides a deployable acoustic sensing platform that
automatically positions and orients its sensors within a relative coordinate sys-
tem, autonomously with no infrastructure requirement. As a result, this system
can be deployed much more easily and with less damage to the environment than
a system that requires extensive surveying or wired infrastructure. Since the sys-
tem does not depend on GPS, it can be deployed in obstructed environments with
overhead foliage. Since the system is above all a platform for acoustic sensing, the
self–positioning component requires no additional hardware because it uses the
same acoustic sensing hardware that would be required for typical applications.
This work represents a vertical system implementation, from hardware through
a distributed application. As such, it touches many areas of the embedded net-
worked sensing field. Below, we summarize the contributions of this work,
breaking them down into three main categories. These categories represent
work at different layers of the system, from the lowest layer of hardware and
system software to the highest layer of algorithms.
Integration of an Embedded Platform.
• We designed and constructed a box with a processing unit and a small
“head unit” containing speakers and a microphone array.
• We developed a software framework that enables convenient inter–process
communication and robust operation in the field.
• We developed an end–to–end implementation of a synchronized sampling
layer, without which implementing time–of–flight ranging over multiple RF
hops would be very difficult.
• We developed deployment tools that facilitate the deployment and control
of the deployed system.
Network Stack and Distributed System Support.
• We integrated multihop time synchronization support from previous work
into our system.
• We developed a topology discovery and control layer that discovers the
topology of an ad–hoc wireless network.
• We developed a multihop reliable broadcast data dissemination service that
provides a simple publish–subscribe interface.
Acoustic Ranging and Positioning System.
• We designed and developed DSP algorithms to precisely estimate range and
direction of arrival using acoustic signals.
• We implemented multilateration algorithms to translate a collection of
range and angle estimates into a consistent coordinate system.
• We tied this all together into a distributed application above the aforemen-
tioned network and platform development.
The closest similar system of which we are aware is a Mica2–based system
developed at UIUC [KMS05] based on ranging developed at Vanderbilt [SBM04].
Like our work, this system is a complete audible ad–hoc acoustic localization
system. However, the performance of their system demonstrated by their exper-
iments is considerably inferior to ours. Whereas our system located 10 nodes
over an 80 by 50 meter area with a 9 cm average 2D position error, their system
located 45 nodes over a 60 by 60 meter area with an average position error of
2.47 meters—nearly 25 times worse.
The reasons for this are likely a combination of poorer range accuracy, shorter
sensing range, and possibly the absence of outlier rejection heuristics integrated
into the multilateration algorithms. Because of the computational limitations
of the Mica2, the UIUC system uses a narrowband detector implemented by an
analog PLL circuit. This detector is susceptible to various forms of noise and
will therefore require higher signal amplitudes at the receiver. The detection
range of the UIUC system is also limited by RAM buffer space available and by
the absence of multihop time synchronization, which makes the RF transmission
range an upper bound on the acoustic detection range. In the descriptions of
their algorithms, no mention was made of outlier rejection apart from filtering
the input ranges.
In this work we demonstrate a robust localization system that is highly accu-
rate and operates well in a difficult outdoor environment, representing a signifi-
cant improvement over other work in the field.
1.2 How to avoid reading this document
In order to make this work more useful to the casual reader, we indicate where
to find the most useful parts of this work.
System Overview
• System design overview: Section 2.3
• Putting the whole system together: Section 8.5
Ranging and Multilateration Algorithms
• Layout and coordinate system of the acoustic arrays: Section 2.4
• The ranging system and DSP algorithms: Chapter 3
• The direction of arrival (DOA) estimator: Section 3.3
• The Multilateration algorithms: Chapter 4
Reusable System Components
• Emstar, our software framework: Chapter 6
• Synchronized sampling and time synchronization: Chapter 7
• Multihop transport layer (StateSync): Chapter 8
• Deployment tools: Sections 6.2.5.3, 6.2.5.4, and 8.7
Performance Measurements
• Ranging and DOA Component performance: Chapter 9
• Position estimation system performance: Chapter 10
• Network transport performance: Section 8.6
CHAPTER 2
Problem Definition
At a high level, the acoustic array calibration problem seeks to discover all cal-
ibration parameters required to use a collection of these arrays in a distributed
sensing application. However, to more clearly define this problem, we must first
discuss the application requirements in the context of the specific properties of
the hardware.
2.1 Requirements of Acoustic Detection Applications
Distributed acoustic detection algorithms are a class of applications that in-
volve sensing a phenomenon at several points and combining that information to
achieve some goal. For example, a project to study acorn woodpeckers in a partic-
ular habitat might intend to detect woodpecker calls, determine their locations,
and count the number of distinct individuals. Figure 2.1 sketches out a possible
arrangement of nodes to support such an application. Recent work on localization
of animal calls has focused on “beam–crossing” techniques [WYP04] [WCA05].
In “beam–crossing”, several stations positioned in a convex hull around the target
detect the signals from the target and estimate the direction of arrival (DOA) of
the signals. The estimated DOA vectors from multiple points are then used to
triangulate the target. Techniques that rely on DOA computed from small arrays
are preferred because in practice signals received at different stations often lack
Figure 2.1: Photo of a node deployed at the James Reserve [Ham00], and a diagram of
a proposed distributed acoustic sensing application to localize acorn woodpeckers.
the coherence required to measure time difference of arrivals (TDOA) at stations
that are more than a few meters apart.
Experience with DOA estimation for animal calls has yielded estimation er-
ror on the order of ±2.5 deg [WCA05]. Calibration error in the orientation of
a receiving array adds directly to the error of a DOA estimate at that sta-
tion. Therefore, we want to minimize the calibration error, and for practical
purposes keep it well under the ±2.5 deg figure. Error in the position estimates
for the arrays also adds into a DOA estimate, since a 0.5 meter error in position
amounts to a 1 degree error for a target at a range of 30 meters. Note that for
the purposes of localization by beam–crossing, the position estimates need only be
relatively consistent; uniform scaling of the map does not affect the results.
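The arithmetic behind these error figures is easy to check; a minimal sketch:

```python
import math

def bearing_error_deg(position_error_m, target_range_m):
    """Bearing error induced at a station whose estimated position is
    off laterally by position_error_m, for a target at target_range_m."""
    return math.degrees(math.atan2(position_error_m, target_range_m))

# A 0.5 m position error at 30 m range is roughly a 1 degree error:
print(round(bearing_error_deg(0.5, 30.0), 2))  # prints 0.95
```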
Based on this application, we can define some typical application requirements
for our system. The system needs enough nodes to surround the
target, with a distance from array to target of 30–50 meters. The calibration of
the system must achieve position estimates accurate to ±0.5 meters in a relative
map, and orientation estimates accurate to ±1 deg.
2.2 Definition of the Calibration Problem
The acoustic array calibration problem seeks to determine a set of parameters
that define the locations and orientations of a collection of arrays. The parameters
are referenced to the coordinate system specified in the array geometry diagram
shown later in Figure 2.3. These parameters are defined in a coordinate system
that can either be referenced to a single origin array, or referenced to one or more
arrays located at survey points.
• Let (Xi, Yi, Zi) be the location of array i, relative to survey coordinates or
simply to the other arrays.
• Let Θi be the azimuth orientation of array i, relative to survey coordinates
or simply to the other arrays.
• We assume that all arrays are leveled, so that the remaining two degrees of
freedom are fixed relative to each other.
• If survey coordinates are present, we add a global scaling variable V that
allows the relative map to be scaled to fit the survey points.
• Environmental parameters such as temperature, humidity, and wind speed
and direction affect the effective speed of sound upon which time–of–flight
(TOF) ranging is based. Local measurements can be used to compensate to
some extent for these factors, but a future extension of this problem might
include an environmental model as additional estimated parameters.
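For example, temperature compensation can use the standard first–order approximation for the speed of sound in dry air (a textbook formula, not one taken from this dissertation):

```python
def speed_of_sound_mps(temp_c):
    """First-order approximation of the speed of sound in dry air,
    in m/s, as a function of temperature in degrees Celsius."""
    return 331.3 + 0.606 * temp_c

# Ignoring a 10 degree C temperature difference biases every range by
# roughly 1.8%, i.e. tens of centimeters at 30-50 m node separations.
print(speed_of_sound_mps(20.0))  # ~343.4 m/s
```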
Figure 2.2: Block diagram of the self–calibration system.
Our proposed solution first estimates range and angle–of–arrival information
through acoustic ranging, and then uses that information to estimate these
parameters.
2.3 Outline of Proposed Solution
To solve this estimation problem, we divide the system into several components,
as shown in Figure 2.2: the time–of–flight ranging layer, the multilateration layer,
a time–synchronized sampling layer and a networking layer. In the time–of–flight
ranging layer, each node emits calibration signals that are received by the other
nodes. Through detection algorithms, the system estimates the phase and angle
of arrival of the incoming signal from each peer. The use of the time–synchronized
sampling layer enables these phases to be compared across nodes to establish
range estimates based on the time of flight of the signals. These range and angle
estimates are then passed over the network to a multilateration component that
estimates the most likely values for the calibration parameters to match the range
and angle data. In this section we briefly outline these layers, before discussing
them in more detail in the following chapters.
2.3.1 Time of Flight Ranging Layer
The TOF Ranging Layer is triggered by higher layers to emit a ranging signal.
The ranging signal is a coded signal modulated on a 12 KHz carrier, which can be
readily detected by a matched filter. These techniques have been shown to work
well for acoustic ranging in previous work, including [GE01] [GBE02] [MGS04].
The emitted code is detected at the emitter in order to determine the exact
time at which the code was transmitted. The transmission time and the code
index are then sent to the other nodes via a multihop wireless network. Upon
arrival, the time synchronized sampling layer is queried to extract the region of
the signal that might contain the arriving ranging sequence, and the detection
algorithm determines the range and direction of arrival (DOA) for the incoming
signal. The range and DOA information is passed back through the network to
the multilateration layer.
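With synchronized timestamps in hand, the range computation itself is a one–liner; a sketch (the names and the fixed speed of sound are our assumptions):

```python
def tof_range_m(t_emit_s, t_detect_s, c_mps=343.0):
    """Range from acoustic time of flight. Both timestamps must already
    be expressed in a common, synchronized timebase; providing that
    timebase is the job of the time-synchronized sampling layer."""
    tof_s = t_detect_s - t_emit_s
    if tof_s <= 0:
        raise ValueError("detection precedes emission; check time sync")
    return c_mps * tof_s
```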
2.3.2 Multilateration Layer
The multilateration layer controls the automatic calibration process and com-
putes a consistent map based on the ranges and DOA values estimated by the
ranging layer. The multilateration layer analyzes the raw list of range and DOA
estimates to locally determine whether new range experiments are needed. If
more data is needed, it will trigger local ranging after a randomized delay. Events
that cause range or angle estimates to be invalidated, such as moving one of the
array receivers, would trigger new ranging through the same mechanism when
the multilateration layer discovers that the ranges have been revoked.
The multilateration algorithm itself is based on a non–linear least squares
optimization in the variables (X,Y, Z, Θ). To address cases where line of sight
(LOS) is obscured, outlier rejection heuristics are used to remove inconsistent
data. Outliers are removed both by removing cases where the DOA estimate is
inconsistent with the estimated placement of the nodes by more than 20 deg, and
by excluding range constraints that have high weighted residuals.
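A minimal sketch of the least–squares core, reduced to 2D positions and range constraints only (the actual system also estimates Z and Θ, folds in the DOA constraints, and applies the outlier rejection described above), using SciPy:

```python
import numpy as np
from scipy.optimize import least_squares

def multilaterate_2d(range_obs, guess):
    """Solve for 2D node positions from pairwise ranges by non-linear
    least squares. range_obs maps (i, j) -> measured range in meters.
    Node 0 is pinned at the origin and node 1 to the positive x-axis
    to remove the translational and rotational freedom of the map."""
    def unpack(x):
        return np.vstack([[0.0, 0.0], [x[0], 0.0], x[1:].reshape(-1, 2)])

    def residuals(x):
        pts = unpack(x)
        return [np.linalg.norm(pts[i] - pts[j]) - r
                for (i, j), r in range_obs.items()]

    x0 = np.concatenate([[guess[1, 0]], guess[2:].ravel()])
    return unpack(least_squares(residuals, x0).x)
```

Weighted residuals (dividing each term by its expected standard deviation) and rejection of constraints with high weighted residuals would slot naturally into the residuals function.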
2.3.3 Network Layer
The network layer provides a multihop mesh network with several convenient
primitives for group communication and coordination. At the routing layer, flood-
ing and IP routing are supported. The flooding layer also supports hop–by–hop
time conversions for known packet types.
The network layer also supports some reliable state dissemination protocols.
The StateSync protocol [GLP05] provides a reliable multicast transport infras-
tructure with low–latency updates over a multihop network. This module is
interfaced through a simple publish–subscribe API. For example, in this system,
nodes publish their current range and DOA estimates to this system, which dis-
seminates them multiple hops across the network. The same mechanism is also
used to publish link state, routing state, and the existence of faults throughout
the network, enabling debugging and visualization in the field.
Figure 2.3: Photograph of an acoustic array, and a diagram of the array geometry.
2.4 Properties of the Acoustic Array Hardware
Many of the details of our implementation depend on the properties of the acous-
tic array hardware implementation. While many aspects of our software can be
modified to support configurations other than the one we have selected, some
aspects of the system, as well as the results, are influenced by these details. In
this section, we describe the configuration of our acoustic array hardware and
discuss some alternatives.
Our acoustic sensor nodes consist of a stand–alone wireless processing unit
connected to a four–channel microphone array and acoustic emitter. The CPU
is capable of sampling and emitting four channels at 48 KHz. The microphones
have a frequency response from 40Hz–15KHz, while the emitter has a frequency
response from 3.5KHz–27KHz.
The geometry of the array is shown in Figure 2.3. Projected onto the (x, y)
plane, the four microphones lie on the corners of a square 8 cm on a side. Three
of the microphones lie on the (x, y) plane, while the fourth is raised 14 cm above
the plane. The origin of the array is considered to be the center of the plane
containing the three co–planar microphones. The emitter is composed of four
separate emitters, wired in parallel and positioned around the array, 4 cm below
the origin plane. This geometry is simple to construct and provides enough
diversity along the z axis to provide reasonable results for estimates of zenith
angles.
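For concreteness, the geometry can be written down directly. Coordinates are in cm; only two corners, (-4, -4, 0) and (-4, 4, 14), are labeled in Figure 2.3, so the assignment of the remaining two microphones below is our assumption:

```python
import numpy as np

# Microphone positions in cm. Three lie in the z=0 plane; the fourth
# is raised 14 cm. Projected onto (x, y), all four form an 8 cm square.
MICS_CM = np.array([
    [-4.0, -4.0,  0.0],
    [ 4.0, -4.0,  0.0],
    [ 4.0,  4.0,  0.0],
    [-4.0,  4.0, 14.0],  # the raised microphone
])

# The emitters sit 4 cm below the origin plane of the array.
EMITTER_Z_CM = -4.0
```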
The geometry we selected is not ideal. When we began this work we consulted
with colleagues who were working in parallel on a woodpecker detection project,
which we intended our platform to support. They suggested a square array with
all four microphones in a plane because it would work well for their algorithms.
However, after constructing the arrays and gaining some experience with them,
we later retrofitted them to have one of the microphones raised above the plane
in an effort to improve the system's zenith estimates. Although our current
results have been satisfying, if we were to build more arrays today, we would
probably select a more symmetrical geometry, such as a tetrahedron. By adding
more microphones symmetrically, that geometry would approach a spherical or
hemispherical array, such as is presented in Duraiswami [DZL05].
2.5 Comparison to Related Systems
Since our system is designed to support acoustic detection algorithms, in some
sense using acoustics for position estimation is “free”—we already need to have all
of the hardware and software to do acoustic monitoring, so all that is required are
the additional algorithms to determine positions. In addition, it is advantageous
to use the array itself in the position estimation because our future measurements
(i.e. acoustic detection algorithms) will all be made relative to that array location.
This raises the bar for other localization mechanisms, because they will still
require some additional mechanism or calibration step to reference the location of
the “localization receiver” to the array. In general, these requirements are quite
difficult to satisfy using other techniques.
2.5.1 RF Localization Techniques
Position estimation using RF techniques has thus far been found to be difficult
to achieve in ad–hoc deployments. GPS is typically not available in forested re-
gions, because the faint signals from the satellites are easily blocked by overhead
foliage. Ultra wideband RF transceivers may someday provide accurate ad–hoc
localization, but licensing problems and the difficulty in acquiring the hardware
have made it difficult to test or adopt. Early experience with UWB systems have
shown error distributions with standard deviation of about 45 cm [CKS03], in
near–range, line–of–sight conditions. Obstructed environments with significant
multipath pose serious problems [LS02], although these issues might be addressed
by solving over–constrained systems, as in acoustic systems.
In any case, the improved penetration of RF relative to acoustics does not nec-
essarily improve matters, because just as with acoustics, obstructed RF environ-
ments often introduce significant range errors. In practice, most of the UWB
ranging technology available off–the–shelf relies on tightly synchronized base sta-
tions. While requiring base stations may not be an unreasonable requirement, it
does increase the cost and difficulty of deployment.
Solutions based on measuring received signal strength or connectivity typically
provide very poor accuracy in practical environments [BHE00]. Signal strength
poses problems because the transmit power of a radio is not generally well–
calibrated, and the propagation characteristics of the environment are generally
unknown. Multipath fading is especially problematic, because it can introduce
abrupt variations in received signal strength over very small distances.
Several ad–hoc location systems have been built based on automated calibra-
tion of RF measurements. SpotOn [HVB01] presented a system that included a
calibration phase to calibrate out differences in the transmitters and receivers, al-
though this system was still subject to environmental variations. RADAR [BP00]
presents a more comprehensive calibration process, in which a robot is used to
discover a detailed map of signal strength to 802.11 base stations.
Some systems have presented “range free” solutions based on connectivity
rather than signal strength [NN03] [SRZ03] [SS04]. These systems propose filtered
connectivity as a more reliable and more readily modeled metric than received
signal strength. The binary nature of the connectivity metric can also yield
simplifications in the localization algorithms. However, these systems do not
achieve accuracy approaching phase–based techniques.
Some recent work on an interferometry–based scheme presented in [MVD05]
demonstrated high accuracy in line–of–sight, low–multipath conditions: 5 cm av-
erage position error in an 18 by 18 meter field with three anchor points. While
these techniques are resilient to many types of amplitude noise, and are indepen-
dent of variations in signal amplitude due to transmitter or receiver variations,
they are not immune to multipath interference. Although this work did not show
results from environments containing reflectors, it seems likely that environments
rich in multipath interference would introduce significant difficulties.
2.5.2 Laser–based Localization
Laser ranging and pointing systems have been used for localization in the robotics
community for many years. One of the most popular systems is the SICK Scan-
ning Laser Rangefinder, an off–the–shelf module that can easily be attached to
a mobile robot and can report the range to reflective surfaces with accuracies
as high as a few millimeters. However, these systems are generally large and
expensive, both in terms of monetary cost and energy cost.
In the world of sensor networks, several laser–based systems have been pro-
posed, including the Lighthouse system [Koe03] and the Spotlight system [HSS05]
[SHS05]. These systems work by scanning a laser over a field of devices and
marking the times that the laser is detected by each device. These times are
then correlated to the scanning position of the laser at that time to estimate
the location of the device. While these systems have been demonstrated to work
well outdoors, they require line–of–sight from the scanning laser to the nodes in
the field. This requirement is similar to the requirements of GPS, and is not a
practical assumption in a forested area.
2.5.3 Ultrasound Acoustic Localization
The most successful implementations of ad–hoc sensor network localization sys-
tems to date have been based on measuring acoustic time–of–flight. Of these,
most have presented localization solutions based on ultrasound, including the Ac-
tive Bat [WJH97], AHLoS [SKB00] [SPS03] [SHS01], Cricket [PCB00] [PMB01]
[SBG04], and Calamari [WC03]. Of these systems, all are ad–hoc systems save the
Active Bat, which relies on surveyed ceiling–mounted receivers. Calamari fuses
RF signal strength with ultrasound ranges to improve its performance. Cricket
is based on ceiling–mounted beaconers, which are self–configuring in version 2.
AHLoS is a fully self–configuring system developed for the Active Kindergarten
project.
While ultrasound holds many advantages, such as being inaudible, experience
with ultrasound has shown that it does not perform well in outdoor environments.
Ultrasound is readily blocked by obstructions such as foliage, and tends to have
lower effective range. In addition, most implementations of ultrasound rang-
ing use off–the–shelf detectors and narrow-band signals that tend to have lower
processing gain than the wideband coding techniques we can employ with audible
acoustics. Work on wideband ultrasonic transducers by Hazas et al. [HW02]
might offer better outdoor performance, but these transducers are not readily
available as a manufactured product, and so far there has been limited experi-
ence with these devices. By comparison, the broad frequency diversity achievable
using typical audio speakers enables excellent interference rejection, especially
for somewhat obstructed environments such as are found in forested areas. In
our experience [GE01] [MGE02] and others [KMS05] [SBM04], using the audible
acoustic spectrum has given excellent performance in outdoor environments.
2.5.4 Orientation Discovery
Determining the orientation of the arrays is also a challenging problem. As we
have seen, this is a critical aspect of the calibration if we intend to support
distributed sensing applications. The small size of the arrays makes accurate
manual alignment of orientation difficult because a small movement of the array
yields a large rotation. Magnetic orientation sensors could be employed, but
they are subject to significant local variation caused by metallic objects and
other sources of magnetic interference. Instead, we use the acoustic sensors to
estimate the array orientation directly. This way, we avoid the need to reference
the measurement to the exact physical configuration of the array. The principles
we use to estimate direction of arrival are similar to those used in the Cricket
Compass [PMB01], although our algorithms achieve better results with fewer
receivers.
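The basic far–field principle behind small–array DOA estimation is the relation between the time–difference of arrival across a short baseline and the bearing; a hedged sketch in our own notation (not the dissertation's algorithm, which is given in Section 3.3):

```python
import math

def doa_from_tdoa(delta_t_s, baseline_m, c_mps=343.0):
    """Far-field bearing (radians from broadside) implied by the
    time-difference of arrival between two microphones separated
    by baseline_m."""
    x = c_mps * delta_t_s / baseline_m
    if abs(x) > 1.0:
        raise ValueError("TDOA larger than the baseline allows")
    return math.asin(x)

# With an 8 cm baseline, a ~117 microsecond delay is about 30 degrees:
print(round(math.degrees(doa_from_tdoa(0.04 / 343.0, 0.08))))  # prints 30
```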
2.5.5 Alternatives to the use of Sensor Arrays
As an alternative to the use of sensor arrays, other work such as [RDY05]
and [DZL05] has suggested that a single sensor with a known near field envi-
ronment can be used for DOA estimation. In these methods, the signal from a
single sensor is deconvolved according to a known near field impulse response,
which is parameterized in terms of the direction of arrival. Given a hypothesized
direction of arrival, this method can be used to perform spatial filtering. This
principle is applied in human auditory perception to estimate direction of arrival.
However, it is not clear whether we gain much for our systems by using this
technique.
Since these methods require a fixed near–field configuration (in the case of
humans, the head), having one microphone in addition to external reflectors may
not actually result in a reduction in form factor relative to a microphone array. In
addition, while it is understood how to apply this method to spatial filtering, we
are not currently aware of algorithms that can do the inverse: efficiently recover
the best–match impulse response from a family of responses parameterized on
incoming angle1. Since additional microphones are relatively inexpensive compo-
nents, adding microphones may ultimately be cheaper if they significantly reduce
the processing required.
1 While humans use these methods to estimate incoming angle, we need our system to achieve higher levels of accuracy, and we have far less computational power to work with.
The other drawback is that the precision of our direction of arrival estimate
depends a great deal on how precisely we know the incoming signal. For appli-
cations such as woodpecker detection, the signal is often not characterized well
enough to get a clean impulse response.
CHAPTER 3
Estimation of Range and DOA
In Chapter 2 we laid out the basic outline of the time of flight ranging system
developed in this work. In this chapter we will present the detection and esti-
mation algorithms in more detail, describing the ranging system and the inside
of the “Detection Algorithm” box of Figure 2.2.
Ranging and localization systems have been discussed and characterized in
many survey papers [PAK05] [LR03] and book chapters [SGS04] [KSP03]. This
ranging system is an active, cooperative ranging mechanism. It is an active
mechanism because the emitter in the system generates a signal specifically so
that a receiver can detect it and determine a range and bearing estimate. It is
cooperative because the emitter and receivers are working together: the emitter
notifies the receivers when a signal is emitted, and provides explicit timing and
decoding information.
In this system, the emitter selects a code seed and generates a coded ranging
signal. This ranging signal is then emitted through the speaker outputs. In order
to determine the exact time of emission, a segment of data is captured from the
local microphones and processed to detect the signal. The local detection time
and the code seed are then sent over the network to reach the receivers.
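The key property is that a small seed deterministically expands into the same code at the emitter and at every receiver, so only the seed and timing information need to cross the network. A generic Fibonacci LFSR sketch (the register width and tap positions here are illustrative, not the system's actual code parameters):

```python
def pn_sequence(seed, nbits, width=10, taps=(0, 3)):
    """Expand a seed into a pseudo-noise bit sequence with a Fibonacci
    LFSR: output the low bit, then shift in the XOR of the tap bits."""
    state = seed & ((1 << width) - 1)
    bits = []
    for _ in range(nbits):
        bits.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (width - 1))
    return bits
```

Both ends expand the same seed: the emitter modulates the bits onto the carrier, and each receiver builds its matched filter from the identical sequence.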
Figure 3.1 shows an expanded view of the detection algorithm. In the first
stage, the input signals are extracted and filtered according to the particular code
Figure 3.1: Block diagram of the ranging detection algorithm.
used by the emitter, as described previously in [GE01]. In the second stage, the
ranging signal is detected in the signals, and the approximate phase of the signal
is determined. Using this approximate phase, input signals are cropped to select
out only the segment of the input containing the ranging signal. In the final
stage, the cropped inputs are analyzed to estimate direction of arrival (DOA).
Then, that DOA estimate is used to recombine the 4 channels into a single signal,
from which a more accurate phase estimate and SNR value can be determined.
The following sections will describe each of these stages in detail.
This work is similar to other work such as the work of Sallai [SBM04], which
also performs ranging using audible acoustic signals. One advantage of Sallai’s
work is that it is simple enough computationally to fit on a Mica2 mote. However,
as we will see in Chapter 9, the performance of our detection algorithms is far
superior to that of the Mote–based system, and in addition our system determines
direction of arrival. Where our system measures range with a standard deviation
of 3.8 cm, the system described in [SBM04] achieves a standard deviation of
approximately 20 cm.
Figure 3.2: The Filtering and Correlation stage of the ranging detection algorithm.
3.1 Filtering and Correlation
The Filtering and Correlation stage of the detection process extracts a segment
from the input signal and filters it in preparation for detection. This stage has
two parallel tracks: one that generates and processes the reference signal, and
the other that processes the acoustic input signals.
Figure 3.3: Modulation for seed=7, encoding the sequence 0111011110.
Figure 3.4: (a) The power spectral density (PSD) function for the exact reference signal emitted by Node 103. (b) The PSD of the reference signal as recorded at the source.
Figure 3.5: (a) The PSD of the input signal received at Node 101, 80 meters from the source (Court of Sciences). (b) The correlation of the signal above, expressed in the time domain. The correlation peak is 7 dB above the noise floor.
3.1.1 Code Generation and Modulation
Our ranging system implements direct–sequence spread spectrum [SP80] [Rap96]
in the acoustic domain, by emitting and detecting a coded ranging signal. The
codes used in this system are selected from a family of chaotic pseudo–noise (PN)
codes generated by repeated evaluation of the logistic equation,
x_{n+1} = R x_n (1 − x_n). (3.1)
These types of chaotic codes have been used successfully in other commu-
nications systems, including underwater acoustic communications systems such
as [APB02]. In our system, we quantize the output of the equation above to one
bit per iteration, such that
C_n = { 0  if x_n < 0.5,
      { 1  otherwise.    (3.2)
This code is then modulated on a 12 KHz carrier. The modulation scheme
is a modified Binary Phase Shift Keying (BPSK) scheme. An example of the
modulation can be seen in Figure 3.3. Essentially, the signal shifts phase 180 deg
on every 0 bit, and maintains the same phase on every 1 bit. An additional
discontinuity in the signal is introduced by starting and ending each bit at π/2
rather than at 0. This forces an additional rail–to–rail transition.
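The code generation and modulation steps above can be sketched as follows. This is an illustrative reconstruction, not the system's implementation: the logistic parameter R, the use of the seed directly as the initial value x_0, and the choice of four carrier cycles per bit are assumptions for the sketch (the text fixes only the logistic map, the quantization rule, the 12 KHz carrier, and the modified–BPSK phase behavior).

```python
import numpy as np

def pn_code(x0, n_bits, R=3.99):
    # Chaotic PN sequence from the logistic map x_{n+1} = R*x_n*(1 - x_n),
    # quantized to one bit per iteration (Equation 3.2).
    # R = 3.99 (chaotic regime) is an assumption; x0 plays the role of the seed.
    x = x0
    bits = []
    for _ in range(n_bits):
        x = R * x * (1.0 - x)
        bits.append(0 if x < 0.5 else 1)
    return bits

def modulate(bits, fs=48000, fc=12000, cycles_per_bit=4):
    # Modified-BPSK modulation: flip phase 180 deg on every 0 bit, keep the
    # same phase on every 1 bit, and start each bit at pi/2 so that every
    # bit boundary forces a rail-to-rail transition.
    samples_per_bit = (fs // fc) * cycles_per_bit
    phase = 0.0
    out = []
    for b in bits:
        if b == 0:
            phase += np.pi  # 180 deg phase shift on a 0 bit
        t = np.arange(samples_per_bit) / fs
        out.append(np.sin(2 * np.pi * fc * t + phase + np.pi / 2))
    return np.concatenate(out)
```

At 48 KHz a 12 KHz carrier gives 4 samples per cycle, so each bit in this sketch occupies 16 samples.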
We performed several empirical tests, weighing waveforms designed for smooth
transitions against waveforms with more abrupt transitions. We found that overall better performance in both range and phase accuracy was achieved using
more abrupt transitions. Although we were concerned that the abrupt transi-
tions might not reproduce well as they passed through the speaker components,
we found that they increase the energy delivered through the system and also
result in a much broader spectrum. In practice, using more abrupt modulation
yields higher signal–to–noise ratios (SNR).
Figure 3.6: SNR for PN code detection as a function of sample rate skew, and the
observed sample rate skew in 100K outdoor trials.
Figure 3.4 shows the power spectral densities of the exact reference signal
and the reference signal as emitted and measured directly at the source. These
graphs show that much of the energy in the original signal is preserved as it passes
through the system.
Generating the reference signal used for detection requires inputs from other
layers of the system. The detection algorithm is passed a message containing the
code seed used by the emitter, and the rate skew. The code seed parameter defines
the initial value for the chaotic function. The rate skew parameter defines the
relative skew between the emitter and receiver codecs, as determined by the time
synchronization subsystem.
When the signal is modulated, rate skew is used to adjust the modulation
rate to match the rate of the incoming signal. This skew input could also be
used to correct for Doppler shift in systems involving motion. Rate skew has a
significant impact on the performance of the correlation. Figure 3.6 shows the
peak correlation of one of our PN codes, correlated against itself with varying
degrees of rate skew. The y axis of the graph is in dB above the noise floor
(see Section 3.2 for more details on how we estimate the noise floor for this
system). The plot shows that a skew rate of just 0.06% results in the maximum
correlation peak dropping to the noise floor. This skew rate could result from a
relative velocity of 2 m/s.
Measurements of sound card oscillators have shown accuracy on the order of
50 parts per million (PPM) [Baa05]. Observations with our system confirm this
measurement. Figure 3.6 shows the distribution of observed rate skew for 100K
trials. Our observations showed a mean of 1, a standard deviation of 13 PPM,
and a range of 100 PPM.
3.1.2 Input Extraction
Along with the code seed and rate input, the detection algorithm is passed a
message containing the signal start time. This start time has already been con-
verted by the time synchronization subsystem so that it is expressed in terms
of our local CPU clock. The sampling layer can therefore be queried to extract
segments of the signal starting at the time of emission. Once the signal is located
in those segments, the distance from the start time to the detection time, or lag,
determines the time of flight.
After extracting the relevant segments from the 4 input channels, the input
segments are transformed to the frequency domain using an FFT. A 2 KHz high
pass filter is then applied to the signals to eliminate low frequency noise. This
eliminates wind noise and other environmental sources of low frequency noise
with large amplitude. This can be seen clearly in Figure 3.5, where the power in
the low–frequency components dwarfs the rest of the signal. However, as we see
in Figure 3.4, the ranging signals used in this system roll off below 2 KHz. In
addition, the response of the piezo emitters used in the system drops off around
3 KHz. Therefore, a 2 KHz high pass filter can be applied without significantly
impacting the ranging signal.
3.1.3 Correlation
After the reference signal and the input signals have been transformed to the
frequency domain and pre–filtered, the next step is to correlate the input signals
to the reference. This process is sometimes called a matched filter because it
filters the input signal exclusively for superimposed copies of the reference signal.
Unlike simple filters that match a continuous band of frequencies, a matched filter
matches against an exact distribution of frequencies, and therefore is much more
specific.
In the time domain, this process is equivalent to convolution with the time–reversed reference, and is implemented as a “sliding correlator”, in which the reference is matched against the signal at every possible offset. In the frequency domain, this can be done by multiplying the spectrum of the input by the complex conjugate of the reference spectrum [KC76]. The result of correlation (after
transforming back to the time domain) is shown in Figure 3.5(b). As we can
see, even at 80m range this correlation function achieved an excellent signal to
noise ratio of 7 dB. The online determination of the noise floor is discussed in
Section 3.2.
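The frequency–domain path described above (the 2 KHz high–pass filter of Section 3.1.2 followed by matched filtering) can be sketched roughly as below. The use of a real FFT, the zero–padding length, and the hard bin mask are implementation choices for the sketch, not details taken from the system.

```python
import numpy as np

def matched_filter(signal, reference, fs=48000, hp_hz=2000):
    # Matched filter in the frequency domain: multiply the input spectrum
    # by the conjugate of the reference spectrum, after zeroing bins below
    # the 2 KHz cutoff to suppress wind and other low-frequency noise.
    n = len(signal) + len(reference)          # zero-pad to avoid circular wrap
    S = np.fft.rfft(signal, n)
    R = np.fft.rfft(reference, n)
    mask = np.fft.rfftfreq(n, d=1.0 / fs) >= hp_hz
    return np.fft.irfft(S * np.conj(R) * mask, n)

# A copy of the reference buried in noise yields a sharp correlation peak
# at the lag where it was inserted.
rng = np.random.default_rng(0)
ref = rng.standard_normal(256)
sig = 0.1 * rng.standard_normal(2048)
sig[500:756] += ref
lag = int(np.argmax(matched_filter(sig, ref)))
```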
3.2 Detection and Extraction
After the initial filtering and correlation steps are complete, the next stage im-
plements a rough detection to locate the ranging signal in the input. In the
detection stage, the input signals are first transformed back to the time domain
to yield correlation functions. Next, the correlations are processed by an adap-
tive noise estimator and peak detector that determines an estimate of the earliest
peak above the noise floor on each channel. Based on these estimates, a small
segment containing the peaks is extracted from all four channels. These segments
are interpolated at a higher resolution and normalized based on the noise floor
estimates. These small segments in the time domain are then passed along to the
estimation stage.
3.2.1 Noise Estimation and Peak Detection
The detection problem for ranging differs from many similar problems in the area
of communications. The difference stems from the fact that the most accurate
range estimate comes from the first arrival, rather than the strongest arrival.
So, where in communications systems the object is to locate and combine the
strongest multipath components, with ranging the object is to locate only the
first arriving component, which may not be the strongest. To achieve this, our
detector must develop a model of the noise and lock onto the first significant
deviation from that model. In practice, because the noise level can vary somewhat
Figure 3.7: Analyzing the distribution of autocorrelation noise. (a) The unfiltered distribution, with a logistic fit of µ = −4.09, b = 6510.10. (b) The filtered distribution, with a logistic fit of µ = −8.29, b = 9233.11.
Figure 3.8: (a) Distribution of noise for the correlation shown in Figure 3.5, with a logistic fit of µ = −0.00125, b = 31.847. (b) Distribution of peak correlation values for a 100K trial outdoor test with cutoff C = 12. For each successful trial, the largest noise peak and the detection peak are included.
Figure 3.9: The Detection and Extraction stage of the ranging detection algorithm.
as a function of time, we would also like to develop a model that reacts to changes
in the noise level.
To address this problem, we first need to develop a model for the noise in
the system. Because of the similarity between our detection process and straight
autocorrelation, we begin by examining the distribution of autocorrelation noise
for our PN code family. In autocorrelation, the reference code is correlated to
itself. In our system, the reference is correlated to a copy of the reference which
has been passed through a system function that includes distortions introduced
by the physical emitter, the receiver and the intervening environment. Therefore,
the noise model should be some combination of the autocorrelation noise and the
noise introduced in the system.
3.2.1.1 Modeling Autocorrelation Noise
Figure 3.7 shows the results of analyzing the distribution of autocorrelation noise.
Figure 3.7(a) shows the distribution of correlation values of a typical PN code,
after removing the values immediately surrounding the peak. The distribution
has a heavy proportion of small values which we conjecture are artefacts of the
exactness of the functions being correlated. We also conjecture that noise intro-
duced by the environment will dominate over these small correlates, effectively
redistributing those points according to the noise model imposed by the system
and environment.
If we filter these small values out, the distribution roughly fits a logistic dis-
tribution characterized by the distribution function
L(x, µ, b) = 1 / (1 + e^{−(x−µ)/b}), where (3.3)

b = σ√3 / π. (3.4)
Figure 3.7(b) shows the fit to the logistic distribution, achieving a value of
D+ = 0.0185 from the Kolmogorov–Smirnov test and A² = 2.074 from the
Anderson–Darling test. These fit results leave room for future work to better
characterize this data.
Using this fit, we can explain how we determined the noise floor for the graph
in Figure 3.6 that shows autocorrelation peaks as a function of rate skew. Based
on the distribution in Figure 3.7(b), we defined a safe noise floor as 6σ, which
in that case is approximately 10^5. Applying this to the data in Figure 3.6 yielded good results.
The largest peak that yielded an incorrect answer (i.e. did not detect at 0–lag)
was 5.1σ. While some smaller peaks yielded the correct answer, no peaks over
6σ were incorrect.
3.2.1.2 Modeling Noise from Real Data
Next, we considered real data from an outdoor experiment in a forested region
of the James Reserve [Ham00]. We will discuss this experiment in more detail in
Chapter 10, but for now we will consider the data set as 10^5 independent ranging trials. Here we follow a similar approach to the case of autocorrelation noise, in
which we examine a single trial in detail and then look at the results from the
complete set of trials.
Figure 3.8(a) shows the distribution of correlation noise for the ranging trial
shown in Figure 3.5 (an 80 meter test). There is a recursive problem in determining a noise distribution to detect noise: we want to exclude the signal from our distribution, but we also want to use the distribution to distinguish noise from signal.
To address this we first empirically determine a cutoff factor by choosing a
cutoff and comparing the output of an online estimator to ground truth. The
online estimator continuously estimates the mean and variance of the correlation
until the data exceeds the current standard deviation estimate by the cutoff
factor. In the event that no point exceeds that factor, the signal is assumed to
be undetectable and that trial is dropped. In this empirical design process we reduced the cutoff until the estimator began returning values less than the ground truth lag—values that result from locking onto noise.
We define the estimator more clearly as follows: given a collection of signal
time series Si and ground truth values Gi, we define the online estimates of mean
and standard deviation:
µ_{i,j} = ( Σ_{k<j} S_{i,k} ) / j, (3.5)

σ_{i,j} = √( Σ_{k<j} (S_{i,k} − µ_{i,j})² / j ). (3.6)
We then select the lowest cutoff value C such that:
∀i: ( ∃j s.t. S_{i,j} > C σ_{i,j} ) → ( min { j : S_{i,j} > C σ_{i,j} } ≥ G_i ). (3.7)
Once a hypothetical cutoff is selected, we measure how well this cutoff per-
forms using a large data set. Using our cutoff–based detection algorithm to lo-
cate the signal, we then analyze the preceding noise offline. This analysis shows
that the noise preceding the detection is roughly similar to the “filtered” autocorrelation noise, with a weak fit to the logistic distribution, as shown for example in Figure 3.8(a). For each trial in the data set, we analyze the noise distribution and
record the noise peak Ni, the largest candidate point before the detection peak Di
that exceeded the cutoff:
N_i = max_{k<j} S_{i,k} / σ_{i,k}, (3.8)

D_i = S_{i,j} / σ_{i,j}. (3.9)
Figure 3.8(b) shows a plot of the distribution of the Ni and Di over all trials for
which a detection peak was found. This distribution plot serves as a verification
of the choice of cutoff C, because it shows a clear gap between the N distribution
and the D distributions. This gap assures us that by choosing our cutoff value
C, we will have a very low likelihood of a false positive detection, in which we
mistake a sample from the Ni distribution as a detection peak. It also assures us
that we are unlikely to significantly lower C without increasing the probability
of false positives; rather, improvement can only come through a more accurate
noise model, i.e. a model more sophisticated than standard deviation.
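The online estimator and cutoff test of Equations 3.5–3.7 can be sketched as follows. The warm-up length is an assumption added so that the running σ is meaningful before the first comparison; C = 12 is the cutoff validated in Figure 3.8(b).

```python
def first_arrival(corr, C=12.0, warmup=32):
    # Online estimates of mean and standard deviation (Equations 3.5-3.6),
    # updated sample by sample; return the index of the first sample that
    # exceeds the running noise floor by the cutoff factor C (Equation 3.7).
    n, total, total_sq = 0, 0.0, 0.0
    for j, s in enumerate(corr):
        if n >= warmup:
            mean = total / n
            sigma = max(total_sq / n - mean * mean, 0.0) ** 0.5
            if sigma > 0.0 and s - mean > C * sigma:
                return j
        n += 1
        total += s
        total_sq += s * s
    return None  # no point exceeded the cutoff: signal assumed undetectable
```

Note that the estimates at index j are computed only from the samples preceding j, so the candidate point never inflates its own noise floor.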
3.2.1.3 An Adaptive Noise Model
The last remaining part of this problem has to do with reacting to dynamics in
the level of noise. This idea fits well with the concept of an online estimator, as
we can readily design an online estimator to incorporate a filter that favors recent
data over older data. The most common solution to this type of problem is the
Exponentially Weighted Moving Average (EWMA), which is a type of filter that
is easy to implement digitally and is equivalent to an “RC” low–pass filter. A
EWMA function E(t) filtering the isochronous signal S(t) is represented by the
update function:
E_t = α E_{t−1} + (1 − α) S_t, 0 < α < 1. (3.10)
Each element S_t is carried forward in the average with an exponentially decreasing weight; at time t, the element S_{t−k} is weighted by α^k (1 − α). We can
control the scale of the adaptation by selecting α. To see how, consider the step
response of the EWMA function. If we assume that E_0 = 0 and S_t = 1 for all t > 0, then (applying the Maclaurin series):

E_t = (1 − α) Σ_{i=0}^{t−1} α^i
    = (1 − α) ( Σ_{i=0}^{∞} α^i − Σ_{i=t}^{∞} α^i )
    = (1 − α) ( 1/(1 − α) − α^t Σ_{i=0}^{∞} α^i )
    = (1 − α) ( 1/(1 − α) − α^t / (1 − α) )
    = 1 − α^t,

so that

log(1 − E_t) = t log α,

α = e^{log(1 − E_t) / t}.
Translating this to acoustics: by choosing α we can select the scale at which to
adapt to changes in the noise environment. Because physical processes are the
source of sounds, the time constants involved with onset of noise are typically
scaled to match the rates that objects typically are expected to move. Since we
control the signal we intend to detect, we can ensure that its onset is much faster
than the expected types of noise, and we can select a filter that can adapt to en-
vironmental noise without risk of filtering out our signal. In our implementation,
we selected α = 0.99, in order to adapt to within 1% for changes on the order of
5 ms (480 samples). The results presented in Figure 3.8(b) use an EWMA filter
with α = 0.99 to compute the estimates of mean and variance.
3.2.2 Extraction
The detection algorithm described above is run on each of the 4 channels to de-
termine a noise floor estimate and an estimate of the earliest arrival time of the
signal. Next, we extract a peak–region segment, a small segment of 64 samples
surrounding the earliest arrival time, so that we can focus the rest of the process-
ing on that segment. This technique reduces processing requirements and also
helps to filter out multipath interference. By extracting a small region centered
on the first arrival, echoes and reverberation that arrive with a phase lag greater
than 32 samples¹ (22 cm of distance) will be ignored.
To extract the segment, we select a center point based on the results of the
detection algorithm. Only detection times with SNR values above the cutoff
are considered. If none of the channels achieves this, the detection is aborted.
Otherwise, the earliest time among the four channels is used to define the center
of the extraction. This heuristic increases the probability that the system locks
on to the true shortest path, even if one of the channels misses the earliest path
but detects a strong reflection. The 64 sample peak–region segments are then
extracted from each of the four channel inputs, and passed along to the next
stage.
3.2.3 Interpolation and Normalization
In this stage, the peak–region segments are normalized to equalize their noise
floors and they are interpolated to a higher sample rate using the Fourier series.
In the normalization step, each of the signals’ mean and variance is estimated
based on the samples immediately preceding the peak regions. The signals are
then adjusted so that these parameters are matched across the four channels.
Next, the signals are re–sampled at a higher resolution. To re–sample a seg-
ment, we first compute the FFT of the data to generate the coefficients of the
Fourier series that reconstructs the time series. Then, we numerically evaluate
the Fourier series at 8x resolution to generate a time series with exactly the same
frequency composition as the original, but with 8x as many points. While this
does not add any additional information, it does enable us to perform subsample
shifts when implementing a limited–slip sliding correlator in the next section.
¹A region of 32 samples is extracted because 32 is the next power of 2 larger than the longest allowed phase lag between any two microphones based on array geometry. For larger arrays, this size would necessarily increase.
This technique is equivalent to fractional–phase shifts when correlating in the
frequency domain, but in our case, with small series and limited phase shifts,
time–domain correlation is considerably more efficient. One question we leave to future work is how much information is lost when we perform this re–sampling operation using only components from the small, 32–bin FFT of the peak–region segment. We have not experimented to determine
the optimal size of the segment to use to re–sample the peak–region, nor the
optimal expansion factor in the re–sampling operation.
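The re–sampling operation can be sketched as spectral zero–padding. This version assumes an even segment length and splits the Nyquist bin between the positive and negative halves of the padded spectrum, which is one standard way to handle even–length inputs; the 8x factor matches the text.

```python
import numpy as np

def upsample_fourier(x, factor=8):
    # Re-sample a segment by zero-padding its spectrum: the result has
    # exactly the same frequency composition as the original but with
    # factor-times as many points (no new information, but it enables
    # sub-sample shifts in the limited-slip correlator).
    n = len(x)              # assumed even, e.g. a 64-sample peak region
    m = n * factor
    X = np.fft.fft(x)
    Y = np.zeros(m, dtype=complex)
    h = n // 2
    Y[:h] = X[:h]
    Y[h] = X[h] / 2.0       # split the Nyquist bin
    Y[m - h] = X[h] / 2.0
    Y[m - h + 1:] = X[h + 1:]
    return np.fft.ifft(Y).real * factor
```

For a band-limited input the interpolated series passes exactly through the original sample points.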
3.3 DOA Estimation and Combining
Figure 3.10: The DOA Estimation and Combining stage of the ranging detection algo-
rithm.
The final stage of processing, shown in Figure 3.10, takes as input the pre–filtered peak–region segments and produces direction of arrival (DOA) and range estimates. We use a technique known as 2–TDOA to estimate the DOA of
the incoming signal [RF03]. This estimation process begins by cross–correlating
the channels to determine the most likely relative phase lag for each pair. These
lags form constraint equations that are solved using least squares to estimate DOA
parameters (θ, φ) and a compensation variable v to allow for local variations in
the speed of sound. If a DOA estimate can be determined, that estimate is used
to combine the four channels before doing a final peak detection to estimate the
range. We now discuss each of these steps in more detail.
3.3.1 Lag Finding
To find the most likely phase lags, we do a cross–correlation of each pair of
channels and find the simple maximum of each correlation. Unlike the case of
range determination where we want to determine the absolute phase of the signal,
for DOA determination we are only interested in the relative phases of the signals.
This means that we are free to lock on to the strongest part of the signal without
worrying that it might not represent the true onset. To get accurate relative
phase, it is important that our algorithm consistently lock on to the same feature
in each of the lagged signals. Our experience indicates that a simple max achieves
this property.
The work done in the previous stage to re–sample the peak region now pays
off in DOA estimation. Each cross–correlation has a maximum relative phase
offset bounded by the distance between the microphones given the geometry of
the array. This means that we can apply a limited–slip cross–correlation in the
time domain, which can be less expensive than a frequency–domain correlation.
If M is the slip required and N is the length of the segment, then a time–domain
solution is faster whenever M < 2C lg N , where C is the constant cost of the
FFT operation.
In this case re–sampling is also critical, because the small size of the array
means that phase lags must be determined with subsample precision². For example, one sample lag for a beam arriving at 90 deg resolves to a 5 deg variation,
because a single sample is 0.71 cm, and the baseline of the array is only 8 cm:
θ = cos⁻¹(0.71 / 8) = 84.9°. (3.11)

²Doing the correlation in the frequency domain would give the same result if the result of transforming back to the time domain were interpolated in the same way as our re–sampling process.
The output of the cross–correlation step is summarized to describe the relative phase lags L_{i,j} and the maximum correlation values W_{i,j} for each of the 6 pairwise correlations.
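A limited–slip cross–correlator of the kind described can be sketched directly in the time domain. The sign convention (positive lag means the second channel lags the first) is a choice for this sketch.

```python
import numpy as np

def limited_slip_lag(a, b, max_slip):
    # Time-domain cross-correlation restricted to |lag| <= max_slip, the
    # bound implied by the array geometry. Returns (best_lag, peak_value);
    # positive lag means b is delayed relative to a.
    best_lag, best_val = 0, -np.inf
    n = len(a)
    for lag in range(-max_slip, max_slip + 1):
        if lag >= 0:
            v = float(np.dot(a[:n - lag], b[lag:]))
        else:
            v = float(np.dot(a[-lag:], b[:n + lag]))
        if v > best_val:
            best_lag, best_val = lag, v
    return best_lag, best_val

# Two channels observing the same waveform, the second delayed by 3 samples.
rng = np.random.default_rng(1)
ref = rng.standard_normal(256)
a = np.zeros(300); a[20:276] = ref
b = np.zeros(300); b[23:279] = ref
lag, peak = limited_slip_lag(a, b, max_slip=10)
```

Because the slip is bounded, the loop evaluates only 2·max_slip + 1 dot products, which is the source of the M < 2C lg N cost advantage over a frequency–domain correlation.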
3.3.2 DOA Estimation
Given the lag components and their weights, we now use a weighted least–squares
minimization to estimate the most likely DOA. We define the constraint equations
based on the geometry of the array and on the array coordinate system. From the
array coordinate system, we can define the unit vector for a particular direction:
( cos θ cos φ, sin θ cos φ, sin φ )ᵀ. (3.12)
The phase lag observed between two microphones is the component of that unit vector along the vector between the two microphones. By combining the lags L_{i,j} computed in the previous step with the array geometry and coordinate system (recall Figure 2.3), we can therefore derive constraint equations in the variables (V, θ, φ):
    [  0    8    0 ]                     [ L_{1,2} ]
    [ −8    8   14 ]   [ cos θ cos φ ]   [ L_{1,3} ]
V · [ −8    0    0 ] · [ sin θ cos φ ] = [ L_{1,4} ]    (3.13)
    [ −8    0   14 ]   [ sin φ       ]   [ L_{2,3} ]
    [ −8   −8    0 ]                     [ L_{2,4} ]
    [  0   −8  −14 ]                     [ L_{3,4} ]
Although these constraints are not linear functions, we can apply an iterative
non–linear least squares (NLLS) minimization, starting with an initial estimated value
for the variables and iteratively improving upon that estimate. To determine
the initial estimate, we test four hypothesis angles per quadrant, and select the
minimum of those 32 alternatives. Once we have an initial estimate, we itera-
tively apply a weighted least squares minimization to a linearized version of the
constraint equations, with the weights defined by the maximum correlation values W_{i,j}.
Linearization is a technique in which the partial derivatives of the constraint
equations are evaluated according to an existing estimate, and the resulting linear
equations are solved to determine a correction to update the estimate. This
technique is explained in greater detail in Section 4.3.2. In this case, the linearized
equations for the six constraints in Equation 3.13 are:
[ 8 sin θ cos φ,   8V cos θ cos φ,   −8V sin θ sin φ ]
[ −8 cos θ cos φ + 8 sin θ cos φ + 14 sin φ,   V(8 sin θ cos φ + 8 cos θ cos φ),   V(8 cos θ sin φ − 8 sin θ sin φ + 14 cos φ) ]
[ −8 cos θ cos φ,   8V sin θ cos φ,   8V cos θ sin φ ]
[ −8 cos θ cos φ + 14 sin φ,   8V sin θ cos φ,   V(8 cos θ sin φ + 14 cos φ) ]
[ −8 cos θ cos φ − 8 sin θ cos φ,   V(8 sin θ cos φ − 8 cos θ cos φ),   V(8 cos θ sin φ + 8 sin θ sin φ) ]
[ −8 sin θ cos φ − 14 sin φ,   −8V cos θ cos φ,   V(8 sin θ sin φ − 14 cos φ) ]

· ( dV, dθ, dφ )ᵀ =

( L_{1,2} − V(8 sin θ cos φ),
  L_{1,3} − V(−8 cos θ cos φ + 8 sin θ cos φ + 14 sin φ),
  L_{1,4} − V(−8 cos θ cos φ),
  L_{2,3} − V(−8 cos θ cos φ + 14 sin φ),
  L_{2,4} − V(−8 cos θ cos φ − 8 sin θ cos φ),
  L_{3,4} − V(−8 sin θ cos φ − 14 sin φ) )ᵀ. (3.14)
These linearized constraints are approximations that are only valid in the
region surrounding the initial estimate, meaning that a large movement of a
variable is a sign of potentially dangerous instability. This iterative algorithm
is also subject to settling on local minima; the algorithm implements steepest
descent from the initial estimate to reach a local minimum. We found that the
objective function is quite smooth and our relatively lightweight initial estimate is
sufficient to get into the region of convergence for the true minimum. Figure 3.11
shows a plot of the objective function for a test with φ = 0, θ = 281.
The most serious problem we encountered with this estimation mechanism was
that the results for φ > 50 deg tended to be skewed for some azimuth angles. We
Figure 3.11: Plot of the DOA objective function observed for a test with φ = 0, θ = 281.
hypothesized that this was caused by the “self–ranging” speaker which is placed
directly over the channel 0 microphone. This obstruction introduces excess lag
on that channel for high elevation arrivals. By compensating for this lag with a
fixed constant, we achieved an improvement in convergence for the zenith angle
estimates.
In some cases, the NLLS estimate does not converge well and yields a local
minimum. In these cases, we achieved results closer to ground truth through a
brute force solution in which we tested every possible angle at 1 deg intervals.
We apply this fallback mechanism whenever the weighted sum of the 6 residuals
is greater than 10 cm.
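The overall estimator can be sketched as below: seed with a coarse grid of hypothesis angles, then apply weighted Gauss–Newton steps to the constraints of Equation 3.13. A numeric Jacobian stands in for the analytic linearization of Equation 3.14, and the grid density, iteration count, and finite–difference step are choices for the sketch (the fallback brute–force search is omitted).

```python
import numpy as np

# Pairwise microphone baseline vectors (in cm) from Equation 3.13.
B = np.array([[ 0,  8,   0],
              [-8,  8,  14],
              [-8,  0,   0],
              [-8,  0,  14],
              [-8, -8,   0],
              [ 0, -8, -14]], dtype=float)

def unit(theta, phi):
    # Unit vector for a direction (theta, phi), Equation 3.12.
    return np.array([np.cos(theta) * np.cos(phi),
                     np.sin(theta) * np.cos(phi),
                     np.sin(phi)])

def residuals(p, lags):
    # Residuals of the constraints V * B * u(theta, phi) = L.
    v, theta, phi = p
    return lags - v * (B @ unit(theta, phi))

def estimate_doa(lags, weights, iters=25):
    W = np.diag(weights)
    # Coarse initial estimate over a grid of hypothesis angles.
    grid = [(1.0, t, p) for t in np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
                        for p in (-0.9, -0.3, 0.3, 0.9)]
    p = np.array(min(grid, key=lambda q: residuals(np.array(q), lags)
                                         @ W @ residuals(np.array(q), lags)))
    for _ in range(iters):
        r = residuals(p, lags)
        # Numeric Jacobian in place of the analytic linearization (Eq. 3.14).
        J = np.zeros((6, 3))
        for k in range(3):
            dp = np.zeros(3)
            dp[k] = 1e-6
            J[:, k] = (residuals(p + dp, lags) - r) / 1e-6
        step, *_ = np.linalg.lstsq(np.sqrt(W) @ J, -np.sqrt(W) @ r, rcond=None)
        p = p + step
    return p  # (V, theta, phi)
```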
In Section 9.1 we discuss these results in more detail. There, we speculate
that a more thorough per–array calibration procedure would yield a performance
improvement for both zenith and azimuth angle estimates. We also hypothesize
that shadowing of one microphone by the mounting for another might explain
some of the estimation errors. In that case, an outlier rejection heuristic might
help to address that problem.
3.3.3 Alternative Approaches to DOA Estimation
Some work has been done to develop DOA estimators that use all of the infor-
mation in the signals rather than only using one lag estimate per pair of micro-
phones [RF03]. These techniques work by assuming some fixed set of hypothesis
directions, and testing each hypothesis to find the direction with the maximum
correlated energy.
We experimented with this type of solution in our angular correlation algo-
rithm (AC), but found its performance to be lower than was achievable using
2–TDOA. In our solution, for each hypothesis angle, we phase–shifted the signals
according to the lags induced by the hypothesis angle and correlated all four
channels together at that specific lag configuration. This algorithm is similar to
a “sliding correlator”, but where a sliding correlator adjusts the relative phase of
two signals for all possible linear phase lags, our “angular correlator” adjusts the
phase of all four signals according to the lags induced by all possible 3D incoming
angles.
While the AC algorithm in principle uses more of the information in the input
signals, it performs poorly in practice, because it makes rigid assumptions about
the geometry of the array. In 2–TDOA, the 6–way cross–correlation determines
the most likely lag measurements regardless of whether those lags are an exact
fit to the geometry of the array. In fact, many practical considerations reduce
the likelihood that the empirical geometry will conform exactly to the nominal
specifications. There are many factors that affect the empirical geometry, including error in microphone mounting, phase dependence on incoming angle, and
environmental variations in the speed of sound.
Because of these slight deviations, our AC algorithm often misses the peak
correlation lags that the 6–way cross–correlation in 2–TDOA consistently locates.
The probability that all four lags are exact (a condition that would result in AC
computing the maximal correlation) is quite low in practice, and there is a rapid
falloff as the correlation moves off–peak. In comparison, the 2–TDOA algorithm
extracts the maximal energy from each pairwise cross–correlation, and then fits
those lags to the geometry, allowing for error in both lag detection and array
geometry. As a result, the AC algorithm failed to improve upon the 2–TDOA
algorithm, despite taking a larger amount of signal information into account.
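To make the comparison concrete, the following sketch illustrates the hypothesis-testing structure of an angular correlator in a simple delay-and-sum form; this is our illustration, not the dissertation's implementation, and the array geometry, speed of sound, and scoring rule (energy of the steered sum) are assumed for the example.

```python
import math

SPEED_OF_SOUND = 340.0   # m/s, assumed constant for this sketch
SAMPLE_RATE = 48000      # Hz, matching the 48 kHz rate used elsewhere

def induced_lag(mic_xy, angle):
    """Lag in samples induced at a microphone (2-D position, meters)
    by a plane wave arriving from `angle` (radians)."""
    proj = mic_xy[0] * math.cos(angle) + mic_xy[1] * math.sin(angle)
    return int(round(proj / SPEED_OF_SOUND * SAMPLE_RATE))

def angular_correlate(channels, mics, hypotheses):
    """Test each hypothesis angle: shift every channel by the lag that
    angle would induce, sum the shifted channels, and score the
    hypothesis by the energy of the sum. Returns the best angle."""
    n = len(channels[0])
    best_angle, best_energy = None, float("-inf")
    for angle in hypotheses:
        lags = [induced_lag(m, angle) for m in mics]
        energy = 0.0
        for j in range(n):
            s = 0.0
            for ch, lag in zip(channels, lags):
                k = j + lag
                if 0 <= k < n:
                    s += ch[k]
            energy += s * s
        if energy > best_energy:
            best_angle, best_energy = angle, energy
    return best_angle
```

The rigidity discussed above is visible here: the lags are computed from the nominal microphone positions, so any mismatch between nominal and empirical geometry lowers the score of the true direction.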
3.3.4 Recombination
Once we have an estimate of the direction of arrival, we can use that information
to recombine the signals according to the expected phase offsets into a single time
series with higher SNR. This technique is often called beam–forming or spatial
filtering. To do this combination we apply a heuristic that combines the DOA
estimate computed in the previous step with the results of the pairwise cross–
correlation.
A spatial filtering algorithm takes as input M signals S_i, their weights w_i, and their relative phase offsets p_i. The filtered signal S′ is computed by the weighted sum:

    S′_j = Σ_{i=1..M} w_i · S_{i, j+p_i}.    (3.15)
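As a minimal sketch of this weighted sum (our illustration, not the system's code), the following treats samples shifted past the end of a channel as zero:

```python
def spatial_filter(signals, weights, phase_offsets):
    """Weighted delay-and-sum recombination: S'_j = sum_i w_i * S_i[j + p_i].
    Samples shifted out of a channel's range are treated as zero."""
    n = min(len(s) for s in signals)
    out = [0.0] * n
    for s, w, p in zip(signals, weights, phase_offsets):
        for j in range(n):
            k = j + p
            if 0 <= k < len(s):
                out[j] += w * s[k]
    return out
```

With phase offsets matching the true arrival lags, the aligned pulses add coherently while uncorrelated noise adds incoherently, which is the source of the SNR gain.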
Given a DOA estimate, we can compute the corresponding lags using the
formulæ given in the constraint equations for the 2–TDOA algorithm in Sec-
tion 3.3.2. However, because of variations in the array geometry these computed
lags do not perfectly match the observed maximum correlation values.
To address this problem we apply the following heuristic to adjust the phase
lags computed from the DOA estimate so that they fit the observed correlation
peaks more accurately:
AdjustLags(DOALags[4], XCorrLags[6], XCorrPeaks[6])
  indices[6] ← 0, 1, 2, 3, 4, 5
  mapi[6] ← 0, 0, 0, 1, 1, 2
  mapj[6] ← 1, 2, 3, 2, 3, 3
  AscendingSortByKey(indices, XCorrPeaks, 6)
  for index ∈ [0, 5]
    do i ← indices[index]
       norm ← XCorrPeaks[i] / max(XCorrPeaks, 6)
       curr ← DOALags[mapj[i]] − DOALags[mapi[i]]
       ∆ ← norm ∗ (XCorrLags[i] − curr)
       if |∆| < 2 samples
         then DOALags[mapj[i]] ← DOALags[mapj[i]] − ∆/2
              DOALags[mapi[i]] ← DOALags[mapi[i]] + ∆/2
This heuristic will ignore cross–correlation measurements that differ greatly
from the DOA estimate. When correcting the lags, it will favor corrections that
match the highest cross–correlation peaks, by scaling down corrections resulting
from lesser peaks and by performing the corrections resulting from larger peaks
last. Figure 3.12 shows an example of the improvement yielded by our heuristic,
relative to combination based only on the DOA estimate.
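In Python, the heuristic might be transcribed as follows. This is an illustrative sketch: the 2-sample threshold is taken from the pseudocode, and we assume the cross-correlation lag for pair (i, j) shares the sign convention of DOALags[j] − DOALags[i], so each accepted correction moves the pair's lag difference toward the observed peak (the pseudocode's update signs depend on its own lag convention).

```python
def adjust_lags(doa_lags, xcorr_lags, xcorr_peaks, threshold=2.0):
    """Nudge the four per-channel lags implied by the DOA estimate toward
    the six observed pairwise cross-correlation peaks. Pairs are processed
    weakest peak first, so the strongest peaks get the last word, and each
    correction is scaled down by the pair's normalized peak height."""
    map_i = [0, 0, 0, 1, 1, 2]          # first channel of each pair
    map_j = [1, 2, 3, 2, 3, 3]          # second channel of each pair
    peak_max = max(xcorr_peaks)
    order = sorted(range(6), key=lambda k: xcorr_peaks[k])  # weakest first
    lags = [float(l) for l in doa_lags]
    for k in order:
        norm = xcorr_peaks[k] / peak_max
        curr = lags[map_j[k]] - lags[map_i[k]]
        delta = norm * (xcorr_lags[k] - curr)
        if abs(delta) < threshold:       # ignore wildly inconsistent pairs
            # Split the correction between the two channels so the pair's
            # lag difference moves toward the observed correlation peak.
            lags[map_j[k]] += delta / 2.0
            lags[map_i[k]] -= delta / 2.0
    return lags
```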
[Figure 3.12 plot: SNR vs. sample index (48 kHz); combined signal at node 101, 80 m range, Court of Sciences; curves for combination using the DOA estimate and using the heuristic, with the detected peak phase marked.]

Figure 3.12: Combined signals for the trial from Figure 3.5. The two curves show the
effect of recombination using the straight DOA estimate and our heuristic.
3.3.5 Peak Detection
Once the signal has been recombined, we need to determine the onset of the rang-
ing signal in order to get a range estimate. Figure 3.12 shows an example of the
recombined signal for the 80m trial described earlier in Figure 3.5. Determining
the peak is a difficult problem because the output of the correlation function
has strong negative and positive peaks, and often features trailing periodic rever-
beration. Depending on environmental conditions, this reverberation can often
approach or exceed the initial peak, so a simple maximum peak value is not a good
solution. If the goal is to achieve accuracy on the order of 1 cm, our selection
heuristic must consistently select the “same” peak for different measurements,
because the peaks are typically several samples apart.
The first key part of our heuristic lies in the definition of a “peak”. Rather
than measuring the absolute height of individual peaks, we instead measure
“swings” peak–to–peak, associating each swing with the preceding peak. This
is helpful because it eliminates the effect of any DC bias in the signal, which
could otherwise throw off an absolute measurement of peak value.
The second part defines the metric used to select a peak. By looking at the
correlation functions, we observed that a good peak selection typically has both
a large swing and a high slope. Thus, we define the metric for peak selection as
the magnitude of the swing times the slope of the swing, or S2/R where S is the
magnitude of the swing and R is the length of the “run”. We found that using
this metric, a simple maximum achieved good results, because reverberation and pre–onset noise tend to have a lower slope than the main pulse.
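The selection rule can be sketched as follows (an illustrative transcription; the helper name is ours, and it assumes the correlator output has no perfectly flat plateaus between extrema):

```python
def select_onset_peak(signal):
    """Pick the onset peak of a correlation output using the S^2/R metric:
    S is the peak-to-peak swing following a local peak and R is the length
    of the run, so S^2/R rewards both large swing and high slope.
    Returns the index of the winning peak."""
    # Locate local extrema (indices where the slope changes sign),
    # including the endpoints.
    extrema = [0]
    for i in range(1, len(signal) - 1):
        if (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0:
            extrema.append(i)
    extrema.append(len(signal) - 1)

    best_idx, best_score = None, float("-inf")
    for a, b in zip(extrema, extrema[1:]):
        swing = abs(signal[b] - signal[a])   # peak-to-peak magnitude S
        run = b - a                          # run length R
        score = swing * swing / run          # S^2/R = swing * slope
        # Associate each downward swing with the preceding peak.
        if signal[a] > signal[b] and score > best_score:
            best_idx, best_score = a, score
    return best_idx
```

In this sketch, a later reverberation peak with the same swing but a gentler slope scores lower than the onset peak, matching the rationale above.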
Future work might develop a better metric. Possible avenues for analysis
include:
• Analyzing the distributions of swing value and slope.
• Tracing backwards from the maximum through the contiguous peak region.
• Applying information from the autocorrelation of the selected code.
3.4 Environmental Effects
Environmental parameters affect the accuracy of our ranging system. Temper-
ature, humidity, and wind affect the speed of sound, and therefore can have a
significant impact on the accuracy of a measured range. Note that because DOA
estimation operates on relative phase shifts, these parameters generally have min-
imal impact on DOA.
Because temperature and humidity affect the speed of sound in air, they cause
error that scales with distance. Wind can carry the sound in a particular direc-
tion, also resulting in a distortion of the results. These errors can be significant;
for example, a 1 deg C offset in temperature results in approximately 1% error.
For an 80 m range, a 1% error dominates over the other sources of error in the
system, all of which add up to a few cm at the most. This can be seen more
clearly in Chapter 9, where we will see that the range measurements exhibit high
precision even when the accuracy is compromised by environmental parameters.
Our initial strategy was to try to compensate for temperature by measuring
the temperature and humidity and correcting the speed of sound. This proved
unsuccessful for two reasons. First, it is difficult to measure these parameters
accurately enough to achieve good results. At 80 m, a 10 cm error translates
to about 0.2 deg C, but thermometers are generally specified to measure with
at most ±0.5 deg C. Second, the temperature measurements were only made at
a few points. However, the distortion of the propagation of the ranging signal
is really a path integral of the speed of sound over the path followed by the
signal. Measuring this value is difficult, and adjusting based on a single point is
dangerous because that point may be subject to significant local variations.
Because of these issues, we settled on a different strategy, in which we avoid
the problem. First, we recommend designing the system to perform calibrations
at night, when signal quality will be better and when the environment has reached
a steady–state temperature. Second, we design the system to take all of the cali-
bration measurements in a short time span, during which we can expect minimal
environmental change. Third, we use the multilateration algorithm described in
the next chapter to discover a map based on the geometric consistencies and inde-
pendent of scale, and then fit the resulting map to surveyed points to determine
the scaling factor implied by the environmental parameters.
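For the final fit-to-survey step, the scale factor itself has a closed form. A minimal sketch (the helper name is ours): given corresponding inter-node distances from the computed map and from the survey, the least-squares scale s minimizing Σ(s·d_map − d_survey)² is a ratio of inner products:

```python
def scale_factor(map_dists, surveyed_dists):
    """Least-squares scale s minimizing sum((s*d_map - d_survey)^2).
    Setting the derivative to zero gives
    s = <d_map, d_survey> / <d_map, d_map>."""
    num = sum(dm * ds for dm, ds in zip(map_dists, surveyed_dists))
    den = sum(dm * dm for dm in map_dists)
    return num / den
```

Applied after map construction, this single factor absorbs the unknown speed of sound implied by the unmeasured environmental conditions.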
CHAPTER 4
Multilateration
After we have estimated ranges and angles and determined confidence values,
we must synthesize this data into a single consistent map, by determining the
Cartesian coordinates and orientations for the arrays. While many solutions in
the literature address the problem of ad–hoc positioning, one component of our
work that has not been discussed much is the problem of orientation calibration.
As we discussed in Chapter 2, this problem is critical for our application, because
our platform must support localization applications based on DOA and beam–
crossing techniques.
4.1 Overview and Context
Our solution takes the general structure of iterative refinement. We begin by
computing an initial estimate, and then refine that estimate by solving a set of
constraints. In Sections 4.3.2 and 4.3.1 we present the details of two constraint
representations which we tested. Once our constraints reach convergence, we
analyze the system to determine if there are any constraints that fit very poorly,
and if so, remove the worst offender and try again.
Our iterative process solves for position and orientation in separate, inter-
leaved constraint systems. After the primary constraint system converges on an
improvement to the position estimates, we determine the incoming angles based
on that system and compare them to our DOA measurements. For each node,
we compute the average difference between measured and computed angles to
estimate a orientation bias for that node relative to the current set of position
estimates. We report this bias as the node’s orientation estimate. This algorithm
will be described in more detail in Section 4.4.
A number of proposed localization systems have focused on developing dis-
tributed algorithms that implement this map–building operation while mini-
mizing network transmissions, CPU usage, or both [CGS04] [SHS01] [SPS03]
[KMS05] [LR03]. In this work we chose to implement a centralized solution
rather than a distributed one, for two reasons. First, a centralized solution is
much simpler and requires a simpler communication protocol. Second, it is very
difficult to filter the data effectively without developing an over–constrained sys-
tem. Most distributed algorithms focus on collecting the smallest amount of
information required to perform multilateration, e.g. [SPS03]. However, because
this approach rarely yields over–constrained systems, it is much more difficult to
detect and reject bad data resulting from either system errors or environmental
problems such as obstructions.
There are numerous examples in the literature of solutions to this type of
problem based on least–squares minimization [CGS04], multi–dimensional scal-
ing [Tor52] [CGS04] [KMS05] [JZ04] [RD04] and maximum likelihood optimiza-
tion [RD04]. To apply these techniques, we represent our measured data as
weighted constraints. The weights are determined by modeling the error in the
underlying measurements. We can capture a model from our controlled tests,
although that model may not hold in general: obstructions and other envi-
ronmental effects may impact the distribution of errors. In our work we have
characterized the underlying ranging components to obtain an estimate of the
uncertainty in those measurements, and then implemented filtering at different
points in the process to reject bad data. We have applied two different forms of
least–squares minimization, one that results in a linear system similar to those
described in [CGS04], and one that is based on extending a multi–dimensional
scaling (MDS) solution and that we solve as a non–linear least–squares (NLLS)
problem.
We reject bad data in two ways: by checking for geometrical inconsisten-
cies using DOA information, and by checking for constraint inconsistencies using
the method of studentized residuals [WJH97]. Past work in this area has used
the triangle inequality to reject range information based on inconsistent geome-
try [GBE02] [KMS05]. In Moore’s work [MLR04], more sophisticated geometric
analyses were used to avoid inconsistency. However, while these methods may
be good choices for systems that provide ranges only, they are not as effective
as using the direction information that our system provides. Using DOA we can
immediately spot reflections by identifying ranges that arrive from the wrong
direction, and reject that data. Because virtually all significant errors in range
data are the result of reflections in obstructed environments, this technique is
very effective.
In the remainder of this chapter, we discuss our multilateration algorithms
in detail. The following sections present each step of the process in turn, from
developing the initial estimate, through position estimation and orientation esti-
mation.
4.2 Prefiltering and Initial Estimation
In Chapter 2 we defined the calibration problem to be a solution to an estimation
problem that computes the Cartesian coordinates and an absolute orientation
for each node in the system. We defined these unknowns as Xi, Yi, Zi, Θi for
each array i. The inputs to this estimation problem consist of the data from the
ranging layer described in Chapter 3. Each node in the system will emit ranging
sounds, that are detected by the other nodes in the system. Each node i will
record detected ranges Ri,j and angles θi,j and φi,j, where j is the emitter of the
signal being detected1. Each node performs five trials within a short span of
time, in order to minimize variation in environmental conditions. These values
are collected centrally to be processed.
In the first phase of processing, the data is filtered to remove obvious incon-
sistencies. The five trials taken by each node are first filtered by selecting the entry corresponding to the median angular estimate2. These medians are kept
and the remaining entries are dropped.
Whenever there are bidirectional ranges, the smaller of the forward and reverse
paths Ri,j and Rj,i are retained, while the forward and reverse angles are selected
based on the angular confidence estimate. However, if a range is more than 3
meters longer than its reverse, or if it is more than 10% longer than its reverse,
or if a range is dropped as an outlier, its corresponding angles are also dropped.
The logic behind this heuristic is that large range errors are usually the result of
reflections, which will also corrupt the angle measurement. Moderately long ranges are usually the result of minor obstructions, such as foliage. While these
1Note that we use Θi to mean the orientation parameter being estimated and θi,j to mean the DOA estimate to node j measured by node i.
2To compute the median angle, we use a heuristic that first selects the largest subset of the data that lies in two contiguous quadrants.
DoGuess(node[], N, i)
  node[i].params ← AverageEstimates(node, i)
  node[i].state ← guessed
  node[i].estimates ← nil
  for 0 ≤ j < N
    do if (node[j].state = free) ∧ (∃(R, θ, φ)i,j)
         then Append(node[j].estimates,
                     Extrapolate(node[i].params, (R, θ, φ)i,j))

InitialGuess(node[], N)
  for 0 ≤ i < N
    do node[i].estimates ← nil
       node[i].state ← free
  i ← arg maxi CountRanges(node[i])
1: DoGuess(node, i)
  i ← arg maxi Length(node[i].estimates)
  if node[i].state = free
    then goto 1
Figure 4.1: Algorithm for determining an initial parameter estimate.
ranges should be ignored, the angle is likely to still be fairly accurate. Note that
all of this data is subject to rejection after each constraint solving step, in the
event that it appears to be an outlier.
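The bidirectional filtering rule can be sketched as follows (the function name is ours; the 3 m and 10% thresholds are from the text): keep the shorter of the two ranges, and flag for dropping the angles of any range that exceeds its reverse by more than 3 meters or by more than 10%.

```python
def prefilter_pair(r_fwd, r_rev):
    """Merge a bidirectional range pair R[i][j], R[j][i].
    Returns (range_to_keep, drop_fwd_angles, drop_rev_angles)."""
    def suspicious(r, reverse):
        # A range much longer than its reverse is likely a reflection,
        # which also corrupts the associated DOA measurement.
        return (r - reverse > 3.0) or (r > 1.1 * reverse)
    return min(r_fwd, r_rev), suspicious(r_fwd, r_rev), suspicious(r_rev, r_fwd)
```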
After pre–filtering the range and angle data, we next construct an initial
estimate of the system parameters. We construct this estimate by first selecting
an origin point. We choose an origin by finding the most “well–connected” node,
that is, the node with the largest number of ranges and angles to other nodes.
Once that node is selected, we can define its location as the origin and begin
extrapolating the locations of other connected nodes using the algorithm shown
in Figure 4.1.
4.3 Two Solutions to the Position Estimation Problem
In this work we tested two different solutions to the position estimation problem:
R–θ and Non–Linear Least–Squares (NLLS). In the R–θ scheme, we represent
each range and DOA estimate as a vector in R3 linking the coordinates of the
two nodes, resulting in a set of three linear equations. In the NLLS approach, we
represent each range and angle constraint separately, resulting in a set of three
non–linear equations that can be linearized and solved with iterative techniques.
The R–θ scheme is fast and simpler to understand, but it does not perform as
well as the NLLS approach.
4.3.1 R–θ
The R–θ scheme results in a set of three linear equations for each range and angle
estimate. A 2–dimensional R–θ solution is described in [CGS04] and [CDG03].
Let us consider the system from the perspective of node Ni. Let Ri,j, θi,j, and φi,j be node
Ni’s estimate of the range, azimuth and zenith DOA to node Nj, and let us
assume Θi to be an initial estimate of the orientation of node Ni. We can then
write the constraints
    X_j − X_i = R_{i,j} cos(θ_{i,j} − Θ_i) cos φ_{i,j},    (4.1)
    Y_j − Y_i = R_{i,j} sin(θ_{i,j} − Θ_i) cos φ_{i,j},    (4.2)
    Z_j − Z_i = R_{i,j} sin φ_{i,j}.    (4.3)
Since we assume Θi to be constant, and all of the other values are measure-
ments, the only variables are the position estimates (Xi, Yi, Zi). We also note
Measurement                     Mean Error   Standard Deviation
Range (cm)                         -2.38          1.76
Azimuth (deg)                       0.14          0.96
Zenith overall (deg)                2.22          7.97
Zenith, -30 deg to +45 deg          0.26          0.86
Zenith, +45 deg to +90 deg          0.31          2.29

Table 4.1: Error Distributions for Range and DOA Estimates.
that these are linear equations and thus are readily and efficiently solved using
weighted least–squares minimization, for example using singular value decompo-
sition [PTV92]. This technique assumes that the errors in the data are normally
distributed, and if so, a weighting value can be applied to each equation by mul-
tiplying through both sides by the square root of that weight.
This weighting value is inversely proportional to the square root of the stan-
dard deviation of the distribution3. Thus, in order to choose a weight we need
an estimate of the distribution of the errors in the measurements. Based on
the experiments we will describe in Chapter 9, Table 4.1 provides the standard
deviation values for range, azimuth and zenith estimates.
We can now apply these distribution estimates to derive weightings for each
of the constraints. First, we note that the uncertainty in one of our constraints
is largely due to the angular estimates. The angular error causes a position
error that is proportional to the range. Since we see that our range estimates
are accurate to within a few centimeters, and typical inter–node spacings are on
the order of tens of meters, the position error resulting from angular error will
dominate.
3The square root is needed because the least–squares minimization implicitly squares the error terms.
Thus, neglecting the uncertainty in the range estimate, we can derive weight
values for our constraints:
    w_{X,i,j} = max(R_{i,j} cos(θ_{i,j} − Θ_i ± σ_azi) cos(φ_{i,j} ± σ_zen)),    (4.4)
    w_{Y,i,j} = max(R_{i,j} sin(θ_{i,j} − Θ_i ± σ_azi) cos(φ_{i,j} ± σ_zen)),    (4.5)
    w_{Z,i,j} = max(R_{i,j} sin(φ_{i,j} ± σ_zen)).    (4.6)
Each row of the constraint matrix is then divided by the square root of the
appropriate weight value, and it is solved in the normal fashion.
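In the special case where node i's position is already known, each R–θ measurement predicts node j's coordinates directly, and the weighted least-squares solution reduces to a weighted average of the predictions. The following sketch (our simplification, not the general multi-node solver; for brevity it uses one scalar weight per measurement rather than the per-coordinate weights of Equations 4.4–4.6) shows Equations 4.1–4.3 used this way:

```python
import math

def rtheta_predict(pos_i, r, azi, zen, theta_i):
    """Eqs. (4.1)-(4.3): predict node j's position from node i's known
    position plus a range/DOA measurement (angles in radians)."""
    xi, yi, zi = pos_i
    return (xi + r * math.cos(azi - theta_i) * math.cos(zen),
            yi + r * math.sin(azi - theta_i) * math.cos(zen),
            zi + r * math.sin(zen))

def weighted_position(predictions, weights):
    """Weighted least-squares solution when every equation constrains a
    single coordinate of one node: the weighted mean of the predictions."""
    wsum = sum(weights)
    return tuple(sum(w * p[k] for p, w in zip(predictions, weights)) / wsum
                 for k in range(3))
```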
After solving, we can analyze the residuals to find the constraint that con-
tributes the most to the fit error, and possibly drop it as an outlier. We describe
this process in Section 4.5.
The R–θ solution is computationally fast, but as we will see in Chapter 10, it
does not perform well. The reason is that all of the constraints involve angular
measurements, and therefore they cannot take advantage of the much higher pre-
cision provided by the range measurements. This results in an order of magnitude
greater average positioning error, as well as an order of magnitude poorer fit to
the ranging data.
These results are fairly consistent with the results from [CGS04]. Although that paper presented its simulations of the R–θ technique as a success, its own data shows considerable error when the technique is applied to scenarios with 20–meter inter–node spacing.
4.3.2 Iterative Non–Linear Least–Squares Minimization
The second approach we investigated uses Iterative Non–Linear Least–Squares
Minimization (NLLS). A similar approach to this is also described in [CGS04],
where they call it Iterative Least–Mean–Square Refinement.
While systems of linear equations can be solved quickly, they are very limited
in the sorts of constraints they can express. For example, a central constraint for
a multilateration algorithm is the distance formula
    R_{i,j} = √((X_j − X_i)² + (Y_j − Y_i)² + (Z_j − Z_i)²).    (4.7)
However, this constraint (along with others detailed in Section 4.3.2.2) can-
not be solved directly using linear algebra because it is not a linear function in
Xi, Yi, Zi. Instead, we use a technique called Linearization to convert a system
of non–linear constraints to a set of linear equations. By solving these linear
equations, we can compute a refinement to our existing estimate of the system
variables, and iteratively improve our estimate.
4.3.2.1 Linearization
NLLS works by linearizing the non–linear constraint equations, and then solving
the system in an iterative fashion. A constraint can be linearized by first deter-
mining an initial estimate for the parameters, and then expanding the constraint
as a Taylor series around that initial estimate. Thus, for an arbitrary constraint
function F (X) = K, where X is an N element vector with current estimates X,
the linearized constraint is:
62
F (X) + F ′(X)(X − X) + ... = K (4.8)
F (X) +N∑
i
∂F
∂Xi |X(Xi + Xi) + ... = K. (4.9)
If we neglect the higher order terms of the Taylor series, the resulting function is a linear approximation to the constraint function, valid in the neighborhood of the estimate X̄. Using this linearization technique, we can express a set of constraints F_i(X) = K_i as a linear system

    Ax = b, where    (4.10)
    A_{i,j} = (∂F_i/∂X_j)|_X̄,    (4.11)
    x_j = X_j − X̄_j,    (4.12)
    b_i = K_i − F_i(X̄).    (4.13)
By solving this linear system, we determine the x that maximizes the consistency of the system. Since these x_i are deltas on our original estimates X̄, we can iterate by updating X̄ ← X̄ + x, recomputing A and b, and solving again until
we reach a stopping condition.
To apply this technique, we must have a way to determine an initial estimate
for our parameters, and we must compute these expansions for the constraints
we plan to apply.
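To make the iteration concrete, here is a minimal Gauss–Newton sketch for the simplest instance of this framework: a single unknown 2-D position constrained by ranges to known anchors. This is a reduced stand-in for the full multi-node system (our illustration, not the dissertation's solver); with only two unknowns, the linear step solves the 2×2 normal equations in closed form.

```python
import math

def solve_nlls_2d(anchors, ranges, guess, iters=20):
    """Gauss-Newton refinement of a single unknown 2-D position.
    Constraints F_k(X) = R_k, where F_k is the distance to anchor k;
    each step builds A (Jacobian) and b (residuals, b_k = R_k - F_k)
    around the current estimate, as in Eqs. (4.10)-(4.13)."""
    x, y = guess
    for _ in range(iters):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), rk in zip(anchors, ranges):
            d = math.hypot(x - ax, y - ay)
            jx, jy = (x - ax) / d, (y - ay) / d   # dF/dx, dF/dy
            res = rk - d                           # b_k = K_k - F_k(X)
            a11 += jx * jx; a12 += jx * jy; a22 += jy * jy
            b1 += jx * res; b2 += jy * res
        # Solve the 2x2 normal equations (A^T A) delta = A^T b directly.
        det = a11 * a22 - a12 * a12
        dx = (a22 * b1 - a12 * b2) / det
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy                      # X <- X + x
        if abs(dx) + abs(dy) < 1e-12:              # stopping condition
            break
    return x, y
```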
4.3.2.2 Non–Linear Constraint Equations
Once we have determined an initial estimate for a set of nodes, we can define
constraint equations that describe all of the information we have about them.
These constraints take the form of equations that should equal zero; when we solve
the set of functions we minimize the sum of the square errors in the constraint
equations. Since we plan to use NLLS to solve our system, we also derive a
linearized version of each constraint.
Range Constraints Each range between two nodes forms a range constraint
equation. This equation takes the form of the distance formula, with the coordi-
nates of each node being the unknowns we intend to estimate. So, given a range
estimate between nodes i and j of Ri,j, we define the range constraint to be:
    V·D_{i,j} = R_{i,j}, where    (4.14)
    D_{i,j} = √((X_i − X_j)² + (Y_i − Y_j)² + (Z_i − Z_j)²).    (4.15)

Linearizing this constraint,

    dX_i·V·(X_i − X_j)/D_{i,j} + dY_i·V·(Y_i − Y_j)/D_{i,j} + dZ_i·V·(Z_i − Z_j)/D_{i,j}
      − dX_j·V·(X_i − X_j)/D_{i,j} − dY_j·V·(Y_i − Y_j)/D_{i,j} − dZ_j·V·(Z_i − Z_j)/D_{i,j}
      = R_{i,j} − V·D_{i,j}.    (4.16)
Note that we added a scalar V to the constraint above to enable the range
constraints to expand uniformly, for example as a function of temperature. In
cases where certain nodes are anchored, V can allow the map to adjust to accom-
modate expansion in the ranging measurements relative to fixed anchor points.
V can be estimated separately from this system as described in Section 4.4.
Azimuth Constraints Each azimuth estimate recorded by a node can be ex-
pressed with an azimuth constraint. This equation uses the arctangent function
to relate the azimuth angle θi,j to the node coordinates and orientation parameter.
So, given an azimuth estimate θi,j measured at node i:
    arctan(Y_j − Y_i, X_j − X_i) + Θ_i = θ_{i,j}.    (4.17)
Note that in this equation, arctan is used to mean the atan2() function, which uses the signs of its arguments to return angles in all four quadrants. Note also that we do not consider Θi a variable in this system of constraints. Instead, we iteratively estimate Θi interleaved with iterations to estimate (Xi, Yi, Zi). Thus, linearizing this constraint,
    dX_i·T_{i,j}·(Y_j − Y_i)/(X_j − X_i)² − dX_j·T_{i,j}·(Y_j − Y_i)/(X_j − X_i)²
      − dY_i·T_{i,j}·1/(X_j − X_i) + dY_j·T_{i,j}·1/(X_j − X_i)
      = θ_{i,j} − (arctan(Y_j − Y_i, X_j − X_i) + Θ_i), where    (4.18)

    T_{i,j} = 1 / (1 + ((Y_j − Y_i)/(X_j − X_i))²).    (4.19)
Zenith Constraints Each zenith estimate recorded by a node can be expressed
with a zenith constraint. These equations relate the Z dimension to the observed
zenith angles φi,j. Because node deployments are often mostly in a plane, these
equations are strong contributors to accurately estimating node position along
the Z axis4.
So, given zenith estimate φi,j measured at node i:
    arctan((Z_j − Z_i) / √((X_i − X_j)² + (Y_i − Y_j)²)) = φ_{i,j}.    (4.20)
Linearizing this constraint,

    dZ_i·T′_{i,j}·(−1/D′_{i,j}) + dZ_j·T′_{i,j}·(1/D′_{i,j})
      − dX_i·T′_{i,j}·K_{i,j}·(X_i − X_j) + dX_j·T′_{i,j}·K_{i,j}·(X_i − X_j)
      − dY_i·T′_{i,j}·K_{i,j}·(Y_i − Y_j) + dY_j·T′_{i,j}·K_{i,j}·(Y_i − Y_j)
      = φ_{i,j} − arctan((Z_j − Z_i)/D′_{i,j}), where    (4.21)
4Because of the aforementioned poor error properties of angular estimates, this also means that the Z estimates may not be very accurate in flat deployments.
    T′_{i,j} = 1 / (1 + ((Z_j − Z_i)/D′_{i,j})²),    (4.22)
    D′_{i,j} = √((X_i − X_j)² + (Y_i − Y_j)²),    (4.23)
    K_{i,j} = (Z_j − Z_i)·((X_i − X_j)² + (Y_i − Y_j)²)^(−3/2).    (4.24)
Anchor Points In all of these constraints, we make no assumptions about
anchor points. In fact, our position estimation system works well without any
anchor points other than a single point that is defined to be the origin: the
mixture of angular constraints and range constraints will fix the map into a
specific orientation, while the origin point fixes it to a specific coordinate frame.
Alternatively, any number of points in the system can be assigned constant
coordinates. Any constraints that involve only those nodes drop out of the system,
and constraints that relate that node to other nodes retain only the terms relating
to the non–anchor node. Anchor nodes can be helpful, by reducing error that
might accumulate in very large systems5. However, anchors can also introduce
errors, because their placement is not perfect, and their coordinates may not be
scaled to match the ranges.
To address the scaling issue, the V factor in the range constraints enables the
ranges to scale to match the coordinates of the anchor points. Placement errors
can be absorbed by the existing range and angle constraints, although the errors
in placement are likely to have a different distribution than the errors in range
and angle measurements, so the weighting of those constraints may need to be
modified. It may also be more appropriate to process placement constraints in
5However, the results in Chapter 10 show negligible error for the largest systems we could deploy (10 nodes).
the “interleaved” solution step described in Section 4.4, or a post–processing fit
step as described in Section 4.6.
4.3.2.3 NLLS Constraint Weighting
Once we have our linearized constraint matrix, we can solve it using standard
linear algebra techniques such as Singular Value Decomposition [PTV92]. We
can then add the refinement values into our estimates and iteratively converge on
a solution. Each iteration will minimize the sum of the squares of the “residual”
values for the constraint equations. However, just as in the R–θ case, these
residual values are naturally in different units, and must be properly normalized
according to estimates of the variance of the measurements.
We can use a similar technique here to the case of R–θ. The range constraint
residuals are in units of cm, and from Table 4.1 we know the standard deviation
of the ranges. For the angular constraints, the residuals are measured in radians.
Thus, in order to make them commensurate with the range residuals we need to
apply weights to scale the standard deviation of the azimuth and zenith angles
to equal the standard deviation of the range measurements. Therefore, we can
define weights:
    w_range = 1.0,    (4.25)
    w_azi = σ_range / σ_azi,    (4.26)
    w_zen = σ_range / σ_zen.    (4.27)
In our implementation, we also take into account confidence estimates for
the angular constraints, as well as varying σzen as a function of φ. Since these
weights are constants for each constraint equation, we can apply an arbitrarily
sophisticated weighting scheme and variance estimate.
4.4 Interleaved Orientation Estimation
In the previous section, we presented two different schemes for estimating the
node positions (Xi, Yi, Zi). Both of these schemes assumed that the node orien-
tations Θi were already known and constant. While we can get an initial estimate
of the orientations in the algorithm described in Figure 4.1, our system must also
refine this estimate. Unfortunately, including Θi as a variable in the same system
of constraints introduces problems because it allows the entire map to rotate.
To address this, we iteratively refine each set of variables in an interleaved
fashion. After each refinement step, we use the updated node positions (Xi, Yi, Zi)
to derive new estimates for the node orientation. The node orientations are
computed by averaging the differences between the measured angles and angles
computed based on the current position estimates. This average represents a bias
between the observations and the estimated positions, which we deduce to be the
orientation of the array.
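The averaging step needs care at the ±π wraparound; a standard fix (our sketch, assuming azimuths in radians) averages the differences on the unit circle:

```python
import math

def orientation_bias(measured, computed):
    """Circular mean of the differences between measured DOA azimuths and
    the azimuths implied by the current position estimates. Averaging on
    the unit circle avoids errors at the +/-pi wraparound."""
    sum_sin = sum(math.sin(m - c) for m, c in zip(measured, computed))
    sum_cos = sum(math.cos(m - c) for m, c in zip(measured, computed))
    return math.atan2(sum_sin, sum_cos)
```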
After each pass improves the orientation estimates, the position estimates
should likewise improve because the angular constraints should be more consis-
tent. This process continues until a termination condition is reached. In the case
of the R–θ algorithm, this process terminates when the orientation correction
drops below a fixed threshold. In the case of the NLLS algorithm, this pro-
cess normally terminates when all components of the NLLS refinement step drop
below a fixed threshold.
For NLLS, the orientation correction introduces an additional problem. If
there is an overall bias in the angular constraints, correcting the orientation will
cause the NLLS algorithm to react by rotating the entire map in the opposite
direction. We address this with two additional mechanisms. First, we rotate the
map after each orientation correction, so that the orientation offsets of the nodes
always average 0. Second, we stop improving the yaw estimate after the first 10
iterations of the NLLS refinement algorithm. Since the yaw estimate converges
quickly, continuing to estimate and rotate serves no useful purpose.
After the tenth iteration, we perform an angular “sanity check”. Because
we now have a reasonable estimate of both yaw and position, we can tell whether any angles are significantly off relative to the position
estimates. These bad angles are likely to be associated with reflected ranges,
since the reflected path arrives at a different angle than the line–of–sight path.
During the angle check, we locate the worst angular inconsistency in the whole
system, and drop the range and angles associated with that angle if the error is
greater than 20 degrees. We then re–enter the yaw estimation mode for another
three iterations, before performing another angular check. If the angular check
finds no outlier angles, we then drop out of yaw estimation mode for the remainder
of the algorithm.
In addition to estimating the “yaw” orientation of each array, we also estimate
the “pitch” and “roll” of each array using a similar averaging technique. However, while we use the “yaw” estimate in the azimuth constraint equations to correct the azimuth DOA measurements, we did not find that correcting the zenith angles was helpful. In any case, we assume that deploying the arrays such that they are level is relatively easy and that the accuracy of zenith angles is less critical to the applications.
Other corrections and estimations can be included in this interleaved step,
for example the estimation of the V scaling parameter. However, we leave the
exploration and implementation of these ideas to future work.
4.5 Outlier Rejection Using Studentized Residuals
One of the fundamental assumptions underlying this algorithm is that the error
in the inputs can be modeled as Gaussian. However, in practice this is not
always true. The error observed in ranging and angular estimates often includes
very large outliers that can wreak havoc on a system of constraints. Typically,
outliers arise when the line–of–sight from sender to receiver is blocked and a
strong reflection is observed. These reflections arrive at the wrong angle and
in addition take a much longer path than the original. If the obstruction is a
solid and permanent one, repeated experiments typically yield the same wrong
answer, throwing off error estimates based on the variance of individual
measurements. While the effort in Chapter 3 has resulted in a highly effective
and resilient ranging sensor, in many environments it will still yield incorrect
answers.
To address the problem of outlier rejection, we use the technique of studentized
residuals [Har05] [WJH97]. Analysis of raw residuals in the solution of a linear
system does not generally yield useful information, because the largest residuals
are often not the most important ones. In fact, the opposite is true, because the
constraints that are “easiest to move” will yield the largest residual error terms.
Studentized Residuals is a technique that weights the residuals inversely to
the standard deviation of that residual’s value in response to perturbations to the
system. That is, if a particular residual would change dramatically in response
to other constraints being removed, that residual would be considered to have
a high variance. Thus, Studentized Residuals normalizes the magnitude of each
residual so that a high value connotes both a large residual error and a low
variance—suggesting that this error has a broad impact on the system.
The rejection heuristic runs after the system has converged. If a Studentized
Residual is found to be over a fixed threshold, the constraint corresponding to
the largest residual is dropped and the system is run again to convergence. If no
residual is over the threshold, the estimation is considered complete. The fixed
threshold is derived empirically; in our experience (see Chapter 10) a threshold of
4 works well.
4.6 Performance Metrics
In Chapter 10 we present the results of our experiments running our system.
However, first we need to define metrics to accurately measure the performance
of our system.
We use two metrics to assess the effectiveness of the position estimation algo-
rithm: a quality–of–fit metric that is independent of ground truth, and a position
error metric after an affine fit to ground truth. These metrics are similar to, but
slightly modified from, those discussed in [SMP03] and [SMP02].
The quality–of–fit metric is formed by the distribution and statistics of the
residual error terms of the constraints. For example, consider the average differ-
ence between the measured ranges and the computed ranges based on the position
estimates:
\frac{2}{N(N-1)} \sum_{i<j} \left| R_{i,j} - \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2} \right|. (4.28)
This formula provides some insight into the degree of consistency in the even-
tual fit. If this average is on the order of the expected variance in the range
measurements, and if the distribution of errors is normal, this suggests that the
fit is very consistent and that if the system is well constrained, the results are
likely to be correct. This is similar to the EF metric developed in [SMP03]
and [SMP02], except computed as an average rather than a sum. It is unclear
how to properly interpret a sum, as its magnitude varies greatly with the number
of input ranges in the system.
The position error metric relies on access to ground truth positions (X̄i, Ȳi, Z̄i)
for each node i. However, when computing the distance between corresponding
points, we want to make comparisons that take
only the shape into account, ignoring differences of translation, scale, and ro-
tation. The Procrustes method is a collection of techniques for characterizing
shape [DM98]. This method formalizes techniques that extract a characteristic
shape from a set of points, and define transforms that filter out translation, scale,
and rotation to fit an experimental dataset to a characteristic shape. By fitting
our estimated maps to ground truth landmarks, we accomplish several objectives
at once: we relate our map to a real coordinate system, we define a metric to
measure position error, and we define a way to compare repeated trials to each
other.
Our fitting process is similar to the Procrustes methods, but we have modified
it to implement several forms of outlier rejection. Although our fit metric has
given good performance, these methods might be improved more formally in
future work. The fit process involves four steps. First, we compute a scaling
factor from one map to the other. Second, we translate the maps so that the
node closest to the centroid is the origin, and scale the maps according to our
computed scaling factor. Third, we rotate the maps in three dimensions according
to the average angular offsets between corresponding nodes. Finally, we translate
the estimated map by the average difference between all corresponding points.
This fit method is not perfect, but it allows the easy integration of outlier
rejection heuristics that would otherwise be difficult to implement. Several tech-
niques that are resilient to outliers are described in [DM98], including Least
Median of Squares solutions (LMS). We leave these implementations to future
work, and describe our present heuristic in the following sections.
Determining a Scaling Factor. Our transformation first estimates a scaling
factor as the ratio, between the computed map and the ground truth map, of
the summed distances from each node to its nearest neighboring node in the
computed map,

V = \frac{\sum_i \sqrt{(X_i - X_{M_i})^2 + (Y_i - Y_{M_i})^2 + (Z_i - Z_{M_i})^2}}{\sum_i \sqrt{(\bar{X}_i - \bar{X}_{M_i})^2 + (\bar{Y}_i - \bar{Y}_{M_i})^2 + (\bar{Z}_i - \bar{Z}_{M_i})^2}}, where (4.29)

M_i = \arg\min_{j \ne i} \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2}, (4.30)

and barred coordinates denote ground truth positions.
We initially used the centroid size metric described in [DM98], but we found it
to be very susceptible to mis–placed nodes that throw off the centroid. Our metric
is a modified version of the baseline size metric, which uses the distance between
two arbitrary points; our metric simply averages a collection of baselines. This
metric is resilient to outliers caused by a single node being mis–placed. Rather
than adding several ranges to a mis–placed node into the sum, this metric will
tend to only count errors from mis–placed nodes once.
Translation to the Centroid Node. Next, we locate the node closest to the
centroid of the map, and translate the maps so that that node is the origin in
both maps. This differs from the usual methods used in Procrustes techniques,
which locate the maps according to their actual centroid. The problem with using
the true centroid is that, similar to the problem with the centroid size metric,
mis–placed nodes often add significant error to the centroid.
Except in the unlikely case that the center node is mis–placed, we avoid this
problem by locking to a single node in the center of the map. The most central
node is rarely significantly mis–placed, because it is typically one of the most
well–constrained nodes. In general, this method does introduce some distortion,
because the error in the placement of the “origin” node is distributed throughout
the map. Using the true centroid might be possible if we employed an iterative
technique to eliminate outliers and re–fit. We leave this possibility to future work.
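Locating the node closest to the centroid might be sketched as (illustrative names):

```c
#include <math.h>

/* Index of the node closest to the centroid of the map; both maps are
 * then translated so that this node sits at the origin. */
int centroid_node(double pos[][3], int n)
{
    double c[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 3; k++)
            c[k] += pos[i][k] / n;

    int best = 0;
    double best_d = INFINITY;
    for (int i = 0; i < n; i++) {
        double dx = pos[i][0] - c[0];
        double dy = pos[i][1] - c[1];
        double dz = pos[i][2] - c[2];
        double d = dx * dx + dy * dy + dz * dz; /* squared distance suffices */
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;
}
```

Note that while a badly placed node can shift the centroid itself, the node *selected* by this procedure is almost always a well-constrained one near the middle of the map.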
Rotation. Next we rotate our estimated map in three axes to match the ground
truth map. To determine the rotation, we compute a weighted average of the
angular offsets of each corresponding pair of nodes,
\theta = \frac{1}{N-1} \arctan\left( \frac{\sum_{i>0} R_{0,i} \sin(T_i)}{\sum_{i>0} R_{0,i} \cos(T_i)} \right), where (4.31)

T_i = \arctan\left( \frac{\bar{Y}_i}{\bar{X}_i} \right) - \arctan\left( \frac{Y_i}{X_i} \right), (4.32)

and barred coordinates denote ground truth positions.
In order to make the metric more resilient to mis–placed nodes, the angular
average above is computed over the subset of angles centered on the largest
cluster of angles within a 20 degree span. Angles lying outside of that span are
dropped from the average.
Then we apply a rotation about each of the three axes, together with the scaling,
to bring the estimated map into alignment with ground truth. For example, the
rotation about the Z axis is

X'_i = (X_i \cos\theta - Y_i \sin\theta)/V, (4.33)

Y'_i = (X_i \sin\theta + Y_i \cos\theta)/V. (4.34)
This method might be improved by using a Least Median of Squares solution,
and by solving for all three axes simultaneously.
Final Translation. Finally, the translation between each pair of points is com-
puted and averaged, and that average translation is applied to translate the es-
timated map.
Because these transforms scale and rotate without warping, they serve to
match the estimated map to the ground truth without altering the consistency
achieved by the estimation algorithms. After fitting, we then compute the po-
sition error metric, both as a projection to the (x, y) plane and considering all
three dimensions. We consider both metrics important, since our typical “flat”
deployments tend to have greater error along the Z axis.
4.7 System Considerations
So far we have discussed a number of algorithms and heuristics that transform
raw range and DOA estimates into position estimates. Separately from these
algorithms, many system considerations influence how the position estimation
works.
First, we have seen that it is more important that the range data be consistent
than that it be completely accurate. In the absence of sophisticated calibration
techniques to compensate for environmental factors, we expect that range data
will be scaled by an unknown factor. However, if the ranges are consistent, the
resulting map will be uniformly scaled and it can be made quite precise by fitting
it to a set of anchors. This suggests that the system should be designed to capture
a consistent snapshot of ranges in a brief span of time in order to minimize the
impact of changing environmental conditions.
Second, as the size of systems grows, the position estimation process may need
to be broken down into phases and possibly distributed among a set of nodes.
However, since effective outlier rejection requires an over–constrained system, we
do not suggest a maximally distributed system; rather, a distributed set of nodes
that each perform centralized computations on data from its local region. Even
for a given centralized computation, many of these techniques involve expensive
matrix operations that grow as O(N3). To make these algorithms practical for
embedded systems, we may need to reduce N by estimating positions for only a
part of the map at a time.
Third, there are many different network protocol schemes that might be ap-
plied to this problem. In our initial implementation, we use a protocol called
StateSync to reliably publish all of the raw range and angle data, broadcast to
neighbors N hops away. Since every node receives all of the raw data, they all
locally compute a map, without requiring any further coordination. This works
well for small networks, and it is very simple architecturally, but it is not the
most efficient technique. A much more efficient structure, both in terms of net-
work and CPU usage, would elect leaders who coordinate the ranging process.
These leaders would first coordinate the nodes to schedule ranging into a brief
time period. After ranging, the nodes would send the results back to the leader,
which would process it into a map. Each leader would then publish their map to
the entire network, and maps from different leaders would be stitched together
into a single comprehensive map. We leave this implementation to future work.
CHAPTER 5
Robustness
So far in Part I, we have largely discussed algorithms, only briefly touching on
practical issues. However, making a system like this work is much more than
inventing an algorithm that works. To achieve a robust solution, a great deal of
effort must be devoted to handling error conditions that occur in deployments,
and many layers of scaffolding must be constructed to develop and test such a
system. In this Chapter, we briefly present a number of things that can go wrong,
and some general approaches for combating those problems. Then, in Part II we
discuss the scaffolding we have created to address these issues in building our
platform.
5.1 What Can Go Wrong?
Problems and failures can occur at many layers of the system.
Hardware Malfunctions. At the lowest layer, many different hardware mal-
functions can occur. A failure in the power system (such as battery failure, loose
wiring, or water damage) would cause a reboot or a permanent node failure, to
which other nodes would need to adapt. The wiring to the RF antenna can fail,
leaving the node unable to communicate, or in some cases able to send but not
receive. The wiring in the microphone array can fail, causing one or more input
channels to fail, or causing the output channel to fail.
Resource Limits. Resource availability failures can occur due to software
glitches or unusual conditions that cause unexpected resource usage patterns.
Memory or flash exhaustion can cause the system to become unresponsive, or to
reboot. Hardware resources can also be unavailable if there is a hardware prob-
lem such as a loose PCMCIA card or some other malfunction. Unusual hardware
conditions or software bugs can also cause components to restart when they en-
counter cases that they are unprepared to handle. We have also observed some
problems in the Linux kernel on our system (ARM Linux version 2.6.10), specifically
problems with the JFFS2 flash filesystem when it gets close to full, and also
some sporadic problems with our sound and wireless card drivers.
The Wireless Channel. The wireless channel is a well–known source for a
wide range of failures. Peer–to–peer connectivity is time–varying and often can
be asymmetric. Connectivity is affected by the physical properties of the channel
(which themselves are a function of the environment), as well as by the use of the
channel by other nodes in the system. Collisions with peers and, in the case of
the “hidden terminal effect”, collisions with nodes that cannot be received, are
another important source of message loss [Rap96].
Time Synchronization. At the time–synchronization level, synchronization
to a node may not always be achievable because of connectivity. The Reference
Broadcast Synchronization (RBS) [EGE02a] technique used in our system (de-
scribed in Chapter 7) places certain requirements on connectivity: that every
node to be synchronized receive broadcast messages in common with at least one
other node, and that a contiguous chain of such relationships be present between
any two nodes that need to be synchronized. Because this connectivity require-
ment is more stringent than simple connectivity, it is possible for a node to be
present in the system and communicative, without being able to be synchronized.
Ranging and Multilateration. Finally, the ranging and multilateration lay-
ers must deal with these different failures in the underlying system, as well as
detecting and reducing the many forms of error that can appear in the rang-
ing measurements. Ranging errors can occur due to detection of late arrivals
in obstructed environments, failures to detect due to insufficient signal quality,
or system failures that interfere with time synchronization or other parts of the
system.
To make our system work in spite of this variety of failures, we must develop
a system that incorporates many layers of robustness.
5.2 Strategies for Robustness
We apply several strategies to allow the system to continue working in the face
of these varied types of fault.
Fault Detection and Reporting. Often software components can detect and
report faults through self–checks and upon detection of error conditions. A fault
reporting service enables software components to report faults which are then
propagated to an operator who can address the problems. While we want the
system to continue to operate in the presence of faults, this mechanism enables
the operator to debug the system. In our experience this has been crucial to
getting the system deployed, as there are invariably some hardware failures due
to loose wires or deployment errors, which would otherwise be very tedious to
track down. The fault reporting system can also give the operator a hint to
investigate further when things go wrong.
Soft State Design. One of the most powerful tools in the pursuit of robustness
is soft state design [Cla88]. In soft state, an operation is periodically executed,
without making assumptions about prior state from past executions, relying on
caching and low–level retry to enhance performance. This technique allows a
trivial recovery from any possible error condition, simply by throwing away any
information that has been previously cached and running through the same code
path. Since all error conditions are handled as part of the normal code path, we
avoid the common problem of latent bugs in the error paths. This principle is
generally one of reducing the number of states a system can be in, reducing the
number of code paths, and avoiding lockups caused by inconsistent views of the
state of the system.
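As an illustrative sketch (all names hypothetical), a soft-state neighbor table handles discovery, refresh, and recovery after a restart through a single code path, and handles peer disappearance with no teardown protocol at all: entries simply age out.

```c
#define MAX_NEIGHBORS 32
#define NEIGHBOR_TTL  30 /* seconds: entries expire unless refreshed */

struct neighbor {
    int id;       /* node ids assumed nonzero */
    long expires; /* wall-clock second after which the entry is stale */
};

static struct neighbor table[MAX_NEIGHBORS];

/* On every received announcement, (re)install the sender with a fresh
 * TTL. There is no distinction between "new" and "known" peers: the
 * same path handles discovery, periodic refresh, and recovery after
 * our own restart (which simply clears the table). */
void on_announce(int id, long now)
{
    int slot = -1;
    for (int i = 0; i < MAX_NEIGHBORS; i++) {
        if (table[i].id == id) { slot = i; break; }        /* refresh */
        if (slot < 0 && table[i].expires <= now) slot = i; /* reuse stale */
    }
    if (slot >= 0) { /* if the table is full of live entries, drop it */
        table[slot].id = id;
        table[slot].expires = now + NEIGHBOR_TTL;
    }
}

/* A peer is "up" iff its entry has not yet expired. */
int neighbor_up(int id, long now)
{
    for (int i = 0; i < MAX_NEIGHBORS; i++)
        if (table[i].id == id && table[i].expires > now)
            return 1;
    return 0;
}
```

Any error condition, including a crash of either peer, is recovered by the same periodic announce/expire cycle, so no separate error-handling path exists to harbor latent bugs.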
Reactive State Machines. Soft state design was popularized by distributed
systems and network protocols. However, it can also be applied within a node
to the communications channels between services, as well as to the design of the
services themselves. Reactive state machines are the application of the soft state
principle to the design of state machines.
In a reactive program, inputs feed in from a number of sources, and at each
step, the current set of inputs are used to determine the next output. In this
type of system the world model carried forward from step to step is designed to
be explicit. The term “reactive”, which was coined by Brooks [Bro86] and the
robotics community, suggests that rather than proceeding with old plans based
on old models, fresh inputs should modify the model on the fly and thus change
the behavior of the system on the fly.
For example, in application to wireless networking, many layers of the system
might need to react to a recently disconnected peer, changing their behavior in
response to that new condition.
Reduce and Simplify Inter–Node Dependencies. Inter–node dependen-
cies are costly because of the added complexity in handling the cases where a
peer node misbehaves or disappears. For example, any time your algorithm re-
quires a special leader node, there must be some additional mechanism to handle
the case that something goes wrong with the leader—whether it goes off–line
permanently or temporarily, or develops some more insidious fault that might
only affect its performance. Thus, design choices that simplify or eliminate de-
pendence on other nodes often result in a net simplification of the system. For
example, if there is a task that any node can do, it may make the most sense for
all nodes to perform that task independently, rather than devising some scheme
to have a single node perform the task and publish the result.
Fault Isolation. Process isolation in Linux enables a portion of the system to
fail without causing the whole system to fail. This enables the system to continue
to operate even if a subcomponent fails and restarts due to a software bug. The
benefit of this capability is that exit() is an acceptable way to address failures
that seem unrecoverable or that are unexpected. The caveat to this capability
is that the system components must be designed to survive the restart of their
underlying components. This increases the burden on the designer, who must at a
minimum be aware of the issues involved. However, this property is important as
the complexity of the system increases and when new conditions are experienced
during deployments.
The belief that more failures occur in the field than in the lab isn’t just an
urban legend—it is the result of running code for the first time in a different
environment. New timing relationships, new connectivity properties, and new
sensor inputs practically guarantee that the system will behave differently in the
field. As a result, the system may enter states that were not well exercised during
lab testing.
5.3 Successfully Managing Complexity
In Part II, we will see how we apply these ideas to build a layered stack of soft-
ware and system components that we can then integrate into a working system.
In this process, we manage complexity by dividing the problem into component
services, such as time synchronization, sampling, ranging, etc. Each service en-
capsulates a well–defined chunk of functionality, that is large enough to be useful
and small enough to be manageable, and provides a service to other modules and
to applications through published APIs. Throughout this process, we maintain
system visibility as a first–order goal: each service provides numerous debugging
and diagnostic interfaces in addition to its API. We will see how we apply the
many robustness strategies outlined above in the construction of our platform,
successfully addressing the many failure modes.
CHAPTER 6
Emstar: a Software Framework
Through our initial experiences developing distributed sensing systems we dis-
covered numerous impediments to developing deployable systems. While many
early projects found immediate success at solving relatively simple problems in a
one–off demo context, it proved more difficult to build more complex and robust
solutions on these early successes [GSR04] [ADB04] [ARE05]. Our desire to ex-
plore deeper and more powerful applications led us to develop Emstar [GEC04]
[GSR04] [EGE04] [EBB03b], a software framework for developing distributed
sensing systems from Linux systems.
Emstar is a complete software framework and development/deployment en-
vironment designed for distributed sensing applications. The role of Emstar for
distributed sensing applications is analogous to that of GNOME [War04] or the
Win32 API [RN97] for GUI applications. Just as GNOME and Win32 pro-
vide tools and libraries to build a universe of GUI applications with a common
user experience, Emstar provides tools and libraries to build a growing set of
inter–operable distributed sensing applications and system components. Where
GNOME and Win32 provide several different “Save As” dialog boxes and a li-
brary to build tear–off pull–down menus, Emstar provides several different link
estimation modules and libraries to build drivers for new link–layer network de-
vices.
6.1 Design Principles
Emstar was developed with several design principles in mind. These principles
resulted from early experiences with distributed wireless sensing systems, and
from considering the ways in which existing programming interfaces meshed or
clashed with the needs of these systems.
6.1.1 Inter–node communication is not usually transparent.
In the design of the Internet, TCP, an end–to–end reliable stream transport layer,
serves the purposes of most applications. This reliable layer, combined with the
overall high performance of the Internet, lends itself to remote/local transparency
via sockets. A local service that is accessed via a socket can transparently be
accessed remotely.
However, in the case of wireless sensor networks, local/remote transparency
is often undesirable. The reduced link reliability of wireless networks and the
absence of a fixed topology translate to higher communication costs, from a
combination of increased transmission costs and increased complexity of control
protocols. These increased costs of communication in a wireless network mean
that the application needs to know whether a service is local or remote—therefore
transparency is often counterproductive! When compared with a local transac-
tion, a remote transaction may have much higher latency, much higher energy
cost, and may fail because of connectivity failures. Because masking these differ-
ences is rarely beneficial, Emstar was designed to focus on support for interfaces
to local services, under the assumption that access to remote services would re-
quire more application–specific solutions.
6.1.2 The system within a node is complex and benefits from dis-
tributed system design principles.
The development of the Internet has highlighted many techniques for build-
ing complex systems out of components that are individually subject to fail-
ure [Cla88]. These principles also apply to the operation of individual nodes in
many distributed sensing applications, for two primary reasons. First, nodes in
distributed sensing applications must operate robustly in challenging environ-
ments. Because the environment provides inputs and circumstances that are
difficult to predict or reproduce in laboratory environments, a robust design is
often required to survive a deployment.
Second, in many distributed sensing applications, network costs mandate that
much of the intelligence in the system must be pushed into the network rather
than centralized. For example, 10 nodes hosting 4 channels of streaming audio
constitute an aggregate data rate of 4 MB/sec. Even if we set aside issues such
as packet loss, contention, adaptive transmit rates, and control overhead, this
rate is over 3 times the nominal capacity of an 802.11b card. Pushing intelligence
into the network increases both the complexity of individual nodes and of their
interactions. Problems that are easy when centralized, such as selecting a leader,
are yet another protocol design challenge when they are distributed, with the
need for fault recovery at every level.
These considerations have led to the introduction of distributed systems de-
sign principles into the inter–module interfaces within Emstar nodes. For exam-
ple, Emstar modules are typically designed to recover from the failure of other
modules in the system, using techniques such as soft–state refresh and dynamic
registration and unregistration of services at run–time. In order to reduce the
burden on the system designer, many of these features are built–in features of
the Emstar libraries.
System visibility is critical from the foundation up. As the complexity
of our systems increases, they become more difficult to debug. A critical aspect
of this is a capability to gain direct visibility and insight into the workings of
individual modules in the system. By using the UNIX device file interface as
its IPC interface, Emstar’s inter–module interfaces can be browsed and often
accessed directly from the shell. In some cases, transactions on the IPC channels
can be directly viewed without modifying the code. Debugging devices that
provide insight into the current state of a module are cheap and easy to add, and
are often much more convenient than using log files. These techniques enable
rapid fault isolation and debugging.
Interleaved, interacting events are the common case. Similar to the
principles of “reactive robotics”, distributed sensing systems tend to operate in a
“reactive” mode in which their immediate behavior is heavily influenced by sensor
inputs [Bro86]. This reactive style fits well within an event–driven programming
model, because events and inputs of different types arriving asynchronously must
be integrated to influence the immediate behavior of the system. From a system
design perspective, reactivity requires timely delivery of event notification among
the modules in the system, as opposed to a polling–based approach.
System development tools need to support “real code simulation” and
“emulation” for quick turn–around debugging. One of the most diffi-
cult aspects of distributed sensing systems is the difficulty of effectively testing
the system. Experience has shown that deployments often expose new kinds of
problem that did not initially appear in simulation. This issue underscores the
Layer 0: FUSD (low-level IPC)
Layer 1: Glib (handling events on IPC)
Layer 2: Device patterns and libraries (IPC mechanisms for a variety of interactions)
Layer 3: Existing modules and services (useful components for applications)
Layer 4: Extra tools (to help run, maintain, and debug applications)
Figure 6.1: The five layers of the Emstar framework.
importance of “real–code simulation” in which systems that experienced prob-
lems in deployment can be brought back into the lab and tested under various
simulated conditions. Another important element is the ability to run in vari-
ous “emulation” modes, where centralized, high–visibility simulations can be run
with real hardware in the loop. Often these techniques are the only practical
ways to debug a system that fails in the field.
6.2 How Emstar Works
Software frameworks are by nature difficult to conceptualize because they have no
tangible instantiation, except as the foundation beneath an application. However,
a framework can be described by describing the services and interfaces it provides,
and the structure it imposes on an application. Described in this way, we can
describe Emstar as a five layer system [Byt05], as shown in Figure 6.1 (this
figure is due to Martin Lukac).
6.2.1 Layer 0: FUSD Syscall Inter–process RPC
The lowest layer of Emstar is FUSD [GEC04] [Els02]. FUSD is a micro–kernel
interface implemented in Linux that allows user–space server processes to register
character device files and handle system calls on those devices. FUSD provides a
convenient way to enable cross–process message–passing, while at the same time
exposing shell–accessible interfaces to internal state and control functions. In
some respects, FUSD is similar to the AT&T Plan 9 system [PPT90], but has
the advantage of running on any platform that runs Linux, rather than requiring
the port of a complete OS to the latest embedded hardware. FUSD also has much
in common with the procfs and sysfs features of Linux, which expose control and
status interfaces to in–kernel features; the difference being that FUSD exposes
interfaces to user–space processes.
By enabling systems to be readily composed of separate processes, we benefit
greatly from fault isolation. A multi–process system prevents implementation
errors in one process from causing a complete system reset. This is an important
property for deployed sensor network systems because data from the field some-
times causes failures that did not occur in the lab. For example, one version of
our ranging system which was successfully tested in the lab, encountered a new
kind of failure in the field. The deployed system suffered from a certain type of
inconsistency in the ranging data early in the run, which would eventually be
resolved as more ranging data was collected. However, this inconsistency some-
times caused an exception in the multilateration engine, which in turn caused the
multilateration module to restart. If this restart had caused a complete system
restart, the system would never have gotten past the startup phase—but because
of process isolation our system was able to limp past that point and return valid
answers.
Fault isolation also means that combining components is less likely to result
in new failures. A more tightly coupled, single–process approach can lead latent
errors in a particular component to surface only when several components are
used in combination. By isolating components from each other, it is easier to
integrate systems of components of varied origin.
6.2.1.1 System Calls as Blocking RPC
A system call on a FUSD device represents a blocking RPC call to the server,
brokered by the kernel. For example, consider the following snippet of client
code:
int status, fd;
char buf[100];
fd = open("/dev/test", O_RDWR);
status = read(fd, buf, sizeof(buf));
In line 4, the read() system call results in the following sequence of events,
corresponding to the diagram in Figure 6.2:
1. Client process traps into the kernel and blocks.
2. Kernel marshals the arguments to the read() call into a FUSD message.
3. Kernel queues the FUSD message for the server process bound to “/dev/test”,
and wakes the server.
4. Server reads and processes any FUSD messages queued earlier (e.g. by other
clients).

5. Server reads out the new FUSD message and processes it.
6. After processing the message, Server marshals a response and writes the
response message to the kernel.
7. Kernel passes the response back to the Client and the Client’s system call
returns with a result code.
From this sequence of events, it is important to note that the client blocks for
the entire duration of the system call. Even if the system call is a “non–blocking”
call such as a read() on a file descriptor that is configured non–blocking, the client
will still be blocked until the server processes the message and returns a response,
e.g. EAGAIN to indicate that the server is not ready. This means that if the server
is unresponsive the client can block in a system call for arbitrary amounts of time.
Slow response times can be caused either by errors in the implementation of the
server or by the server being busy handling calls from other clients. Response time
can also increase if scheduling latency becomes significant under high load.
Despite this drawback of a blocking RPC model, there is also considerable
benefit to this approach. First, synchronous RPC calls can be made in a straight-
forward coding style, as opposed to using completion callbacks. Each syscall syn-
chronously returns a result code with a minimum of latency. Lengthy operations
are typically structured as a request which is accepted or rejected quickly in one
RPC call, followed by notification when the requested operation completes.
Second, syscalls are a very basic interface accessible from any POSIX appli-
cation, with no library required beyond the standard UNIX interface libraries.
Emstar device file interfaces are browseable within the device filesystem, and in
many cases can be accessed directly by existing UNIX programs such as cat. The
syscall interface is also narrow and therefore readily ported to operating systems
other than Linux.
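The round trip described above can be emulated entirely in user space. The following self-contained sketch stands in for the kernel broker with a socketpair and a forked "server" process; the message format and the function names here are invented for illustration and are not part of FUSD.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: a socketpair and a child process standing in for
   the kernel-brokered round trip.  The "wire format" is invented. */

static void serve(int fd) {
    char req[32];
    read(fd, req, sizeof(req));        /* steps 4-5: dequeue and process */
    const char reply[] = "zeroed";     /* step 6: marshal and write response */
    write(fd, reply, sizeof(reply));
}

/* Client side: returns the number of bytes received, reply left in buf. */
int do_blocking_rpc(char *buf, size_t cap) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {                    /* the "server process" */
        close(sv[0]);
        serve(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    write(sv[0], "read:32", 8);        /* steps 1-3: marshalled "syscall" */
    ssize_t n = read(sv[0], buf, cap); /* client blocks until the reply */
    waitpid(pid, NULL, 0);
    close(sv[0]);
    return (int)n;
}
```

The client's write()/read() pair mirrors the seven steps: the client remains blocked in read() until the server writes its response, just as a FUSD client remains blocked in its system call.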
Figure 6.2: Message timing diagram of a FUSD call. The middle column of the diagram
represents the FUSD kernel module.
6.2.1.2 Client–Server Connections in FUSD
Using FUSD, client and server are distinct roles. A process becomes a server by
registering a new device file and handling operations on that device file. A process
becomes a client by opening a FUSD device file. By successfully opening a device
file, a connection is established between client and server, which is named on the
client side by the file descriptor returned by open(). The client may invoke RPC
calls to the server at any time by making a system call on that file descriptor.
The client may also listen for asynchronous notification on that file descriptor
using select() or poll(). Thus, to communicate from server to client, the server
first notifies the client and the client then calls back to the server. For example,
the server might indicate “readable” and the client responds by calling read().
This asymmetric relationship has advantages. The primary advantage is that
clients can be written to lower standards than servers without compromising the
integrity of the system. That is, while an incorrectly implemented server can
permanently block a client, an implementation error in a client can’t cause a
server to fail. This is a consequence of the blocking semantics: while a server
can potentially cause a client to block, a client can at worst only pass malformed
arguments in a system call which are either caught by the kernel or should be
rejected by the server process.
The FUSD client–server connection provides certain guarantees ensured by
the kernel that enable fault isolation. The kernel ensures memory fault isolation
between the client and server processes. Any pointers provided as arguments to
a system call are checked for validity in the kernel and transferred to the memory
space of the destination process, thus protecting the client and server from each
other. In addition, in the event that a client or server terminates unexpectedly,
any open connections are automatically cleaned up. When a client terminates
with open connections, close() messages are generated and sent to the servers
handling those connections. When a server terminates with active clients, those
clients’ file descriptors are immediately notified with exception signals and any
future system calls on those descriptors will return EBADF error codes.
6.2.1.3 FUSD Dependency Graphs
An Emstar system typically involves dozens of components, each of which hosts
multiple servers and is client to several other components. For example, the
acoustic localization system described in this document, along with all of its Em-
star subcomponents, is composed of 23 components and 182 device file interfaces.
While processes are often both servers and clients of other processes, there is a
requirement that the dependency graph of clients and servers be loop–free. This
stems from the blocking nature of the system calls. A loop in the dependency
graph introduces the potential for deadlock, as shown in the left side of Figure 6.3.
In the diagram, process A is blocked in a write() system call as a client of process
B, but before handling that call process B attempts to make a write() call back
to process A. When a process is blocked in a system call as a client, it cannot
respond as a server.
Figure 6.3: A dependency loop, and using a broker service to break the loop.
Figure 6.3 also shows a way to resolve this type of circular dependency. The
most common solution is to restructure the system to add a new service that
can act as a “broker”. Building systems as a collection of services tends to lend
itself naturally to this type of structure, because most services naturally act as a
broker in the course of their operations.
Figure 6.4: Diagram showing how to use a thread and a queue to break a FUSD
dependency loop.
In some cases, there is no natural decomposition into strictly layered services.
When a strict layering is inconvenient, a thread and a queue can be used to break
a loop, as shown in Figure 6.4. Essentially, this decouples the client and server
portions of a process, with a queue between them.
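The decoupling in Figure 6.4 amounts to a thread-safe queue between the server and client halves of the process: the server side enqueues work without blocking, and a separate I/O thread drains the queue and makes the potentially blocking outgoing call. A minimal sketch, with hypothetical names (the real Emstar I/O thread is more involved):

```c
#include <assert.h>
#include <pthread.h>

/* Sketch of a bounded queue decoupling the server and client roles.
   Capacity and types are stand-ins; overflow handling is omitted. */

#define QCAP 8

typedef struct {
    int items[QCAP];
    int head, tail, count;
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
} queue_t;

void queue_init(queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Server side: enqueue and return immediately (never blocks on the peer). */
void queue_push(queue_t *q, int v) {
    pthread_mutex_lock(&q->mu);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mu);
}

/* I/O thread: block here, then make the outgoing call on the peer. */
int queue_pop(queue_t *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->mu);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return v;
}
```

Because only the I/O thread ever blocks in a system call on the peer, the process remains responsive as a server even while an outgoing call is in flight, which is exactly what breaks the deadlock cycle.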
6.2.1.4 The FUSD Device API
As we have seen, from a client’s perspective the FUSD API is simply the well–
known POSIX system call API. From a server’s perspective, FUSD provides
an API through which those system calls can be handled [Els02]. This API is
modeled along the lines of the character device API in the Linux kernel, with a
few key differences. See [RC01] for an introduction to the Linux character device
API.
Similar to the Linux character device API, FUSD handles the character device
system calls by calling handler functions specified by the server in the structure
shown in Figure 6.5. For example, when a client calls read(), the server’s read
typedef struct fusd_file_operations {
  int (*open) (struct fusd_file_info *file);
  int (*close) (struct fusd_file_info *file);
  ssize_t (*read) (struct fusd_file_info *file, char *buffer, size_t length,
                   loff_t *offset);
  ssize_t (*write) (struct fusd_file_info *file, const char *buffer,
                    size_t length, loff_t *offset);
  int (*ioctl) (struct fusd_file_info *file, int request, void *data);
  int (*poll_diff) (struct fusd_file_info *file, unsigned int cached_state);
  int (*unblock) (struct fusd_file_info *file);
} fusd_file_operations_t;
Figure 6.5: The FUSD file operations structure.
callback is called to handle that call. The fusd file info t pointer contains the
arguments for the call and other information about the calling process, includ-
ing an application–determined per–connection pointer. The server may either
return a return value, causing the read() to complete immediately, or may delay
the return, causing the client to block until a later time. A separate function,
fusd return(), is called to trigger an asynchronous return.
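The two return paths available to a handler can be sketched with stubs. Everything below (the struct layout, the FUSD_NOREPLY sentinel value, and the body of fusd_return()) is a stand-in chosen for illustration, not the real libfusd:

```c
#include <assert.h>
#include <stddef.h>

/* Stubbed sketch (NOT the real libfusd): illustrates the two return
   paths a read() handler can take -- immediate completion, or a
   deferred return completed later with fusd_return(). */

#define FUSD_NOREPLY (-0x1000)  /* hypothetical "reply later" sentinel */

struct fusd_file_info {
    int completed;       /* has the client's syscall returned yet? */
    int retval;          /* value handed back to the client */
    void *private_data;  /* per-connection pointer set at open() */
};

/* Stand-in for fusd_return(): completes a previously deferred syscall. */
void fusd_return(struct fusd_file_info *file, int retval) {
    file->retval = retval;
    file->completed = 1;
}

/* A read handler that defers when no data is ready yet. */
int my_read(struct fusd_file_info *file, int data_ready, int nbytes) {
    if (data_ready)
        return nbytes;       /* immediate return: client unblocks now */
    return FUSD_NOREPLY;     /* client stays blocked until fusd_return() */
}
```

The important control-flow point is that returning the sentinel leaves the client blocked in its system call, while a later fusd_return() from any event handler completes it asynchronously.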
A new client–server connection is established when the client calls open() and
the server accepts that open() by returning a value of 0 to indicate success. At that
time, the server may set a per–connection “private data” pointer. This pointer
enables the server to distinguish different client connections and respond to their
requests appropriately; it will be passed to subsequent handler callbacks relating
to that connection. The close() handler is called when the connection breaks for
any reason (e.g. if the client calls close() or if the client’s process terminates for
any reason.) The close() handler should do any clean–up and resource recovery
necessary to deallocate that connection.
The handlers for read(), write(), and ioctl() are used respectively to transfer
data to, from, and to/from a client process. The actual semantics of what this
means, e.g. what the server does with the data and what data the server
returns to the client, are application specific. However, the convention in Emstar is
to retain some similarity with the POSIX meaning of the calls. Emstar imple-
mentations typically design the semantics to be compatible with common UNIX
utility programs such as cat and shell functions such as echo.
Thus far, the FUSD API is very similar to the Linux character device API.
The main difference between the two lies in the way poll() is handled. In a Linux
driver, poll() is a callback function that is called whenever the polling process
awakens to see whether it should drop out of a blocking poll() or select(). This
Linux poll() callback is called from deep in the scheduler, a point in the kernel at
which a response is required immediately. Since queries out to a FUSD service can
potentially have unbounded latency (and in any case require a different process
to be scheduled), calling to a FUSD server to satisfy a kernel poll() request is not
an option. Consequently, FUSD preemptively requests poll() state and caches it
so that it can respond immediately from the cached version. The freshness of
the cache is maintained using the poll diff() callback function: a poll diff() call is
left “outstanding”, such that whenever the poll state changes from the current
cached state, the server is obliged to return that poll diff() with the update.
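The caching scheme can be sketched as follows; the state flag, the deferred-return sentinel, and the function names are hypothetical stand-ins for the real FUSD internals:

```c
#include <assert.h>

/* Hypothetical sketch of the poll-state cache: the kernel holds a cached
   state and keeps one poll_diff request outstanding at the server. */

#define POLLIN_FLAG 0x1
#define DEFERRED    (-1)   /* stand-in for "reply later" */

static int current_state = 0;        /* server's true poll state */
static int have_pending_diff = 0;    /* an outstanding poll_diff request */
static int pending_cached_state = 0; /* cached state the kernel sent us */
static int kernel_cache = 0;         /* kernel's cached copy */

/* Server-side poll_diff handler: answer only if the state differs. */
int poll_diff(int cached_state) {
    if (current_state != cached_state)
        return current_state;        /* stale cache: answer immediately */
    have_pending_diff = 1;           /* otherwise leave request outstanding */
    pending_cached_state = cached_state;
    return DEFERRED;
}

/* Server state change: complete any outstanding poll_diff request. */
void set_state(int new_state) {
    current_state = new_state;
    if (have_pending_diff && current_state != pending_cached_state) {
        kernel_cache = current_state;  /* "return" the diff to the kernel */
        have_pending_diff = 0;
    }
}
```

The kernel can thus answer an in-kernel poll() from its cache instantly, while the outstanding poll_diff guarantees the cache is refreshed as soon as the server's state actually changes.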
6.2.1.5 FUSD Performance
While FUSD has many advantages, the performance of drivers written using
FUSD suffers relative to an in–kernel implementation. To quantify the costs of
FUSD, we compared the performance of FUSD and in–kernel implementations
of the /dev/zero device in Linux. To implement /dev/zero using FUSD, we im-
plemented a server with a read() handler that returned a zeroed buffer of the
Figure 6.6: Throughput comparison of FUSD and in–kernel implementations of
/dev/zero, timing a read of 1GB of data on a 2.8 GHz Xeon, for both 2.4 and 2.6
kernels.
requested length. The in–kernel implementation implemented the same read()
handler directly in the kernel.
Figure 6.6 shows the results of our experiment, running on a 2.8 GHz Xeon.
The figure shows that for small reads, FUSD is about 17x slower than an in–
kernel implementation, while for long reads, FUSD is only about 3x slower. This
reduction in performance is a combination of two independent sources of over-
head.
The first source of overhead is the additional system call overhead and schedul-
ing latency incurred when FUSD proxies the client's system call out to the user–
space server. For each read() call by a client process, the user–space server must
first be scheduled, and must then itself call read() once to retrieve the marshalled
system call and writev() once to return the response with the filled data buffer.
This additional per–call latency dominates for small data transfers.
The second source of overhead is an additional data copy. Where the native
implementation only copies the response data back to the client, FUSD copies
the response data twice: once to copy it from the user–space server, and again
to copy it back to the client. This cost dominates for large data transfers.
In our experiments, we tested both the 2.6 and 2.4 kernels, and found that
the 2.6 kernel yielded an improvement for smaller transfer sizes. The 2.6 kernel's
advantage is more significant when many processes are running in parallel.
6.2.2 Layer 1: GLib Event System
As we discussed in Section 6.1, Emstar is designed to support “reactive”, event–
driven designs. To address these needs, Emstar incorporates an event system
that supports the management and multiplexing of I/O and timer events in a
modular way. Rather than invent a new event system, Emstar uses a preexisting
system that is part of the GLib library, a standard library widely used in Linux
and open–source projects. In order to minimize dependence on any particular
event system, Emstar defines a thin “glue” layer, shown in Figure 6.7, that in the
current implementation connects the Emstar codebase to the GLib events API,
but could be replaced with some amount of effort with another event system.
The Emstar events API handles two kinds of events: timer events, which are
optionally retriggerable, and I/O events, which enable poll flags to be watched for
a specific file descriptor. Since all Emstar signals and I/O are based on timers and
file descriptors, at the lowest layer these are the only event functions required.
These functions in turn call into GLib functions that configure the GLib event
loop. Figure 6.8 shows an example using a GLib timer from an Emstar program.
/* Condition values for I/O Events */
#define FUSD_NOTIFY_INPUT  0x1
#define FUSD_NOTIFY_OUTPUT 0x2
#define FUSD_NOTIFY_EXCEPT 0x4

/* Return values for event callbacks */
#define EVENT_DONE  (0)
#define EVENT_RENEW (1)
#define TIMER_DONE  (0)
#define TIMER_RENEW (1)
#define EVENT_ERROR(x) ((x) << 16)
#define TIMER_RENEW_MS(x) ((x+1) << 4)

typedef int (*g_event_handler_cb_t)
  (void *data, int fd, int fusd_condition, g_event_t *event);

typedef int (*g_timer_handler_cb_t)
  (void *data, int interval, g_event_t *event);

int g_event_add (int fd, int fusd_condition,
                 g_event_handler_cb_t function,
                 void *data, g_event_opts_t *opts,
                 g_event_t **ref);

int g_timer_add (uint interval,
                 g_timer_handler_cb_t function,
                 void *data, g_event_opts_t *opts,
                 g_event_t **ref);

int g_event_destroy(g_event_t *closure);
Figure 6.7: The Emstar event system API.
int cb_func(void *data, int *interval, g_event_t *ev) {
  elog(LOG_NOTICE, "Timeout fired!");
  return TIMER_RENEW;
}

int main(int argc, char *argv[])
{
  /* install the timer event */
  status = g_timer_add(1000, cb_func, &cb_data, NULL, NULL);

  /* . . . */

  /* enter the event loop */
  g_main();
  return 1;
}
Figure 6.8: Setting a timer in the Emstar event system.
Although this event API is very low–level, higher–level components can be
constructed above it. Typically, a higher level event will define its own application–
specific callbacks, encapsulating and masking the internal details of the basic I/O
and timer events. This enables modularity, since at the lowest level a consistent
events API and loop is used to combine and manage these low level events.
6.2.3 Layer 2: Emstar Device Patterns and Libraries
Layer 2 of the Emstar design is a layer of libraries that comprise the heart of
Emstar, supporting the implementations of all of the tools, services, and appli-
cations built within the framework. These libraries include a collection of useful
utility functions and data structures called libmisc, a collection of event–based
I/O functions such as socket and file I/O in libevent, and a set of functions for
creating and using FUSD devices called libdev.
Pattern Name       Description

Status Device      Presents current status on demand, and notification
                   of status change.
Packet Device      Send and receive small packets on a best–effort
                   basis, with per–client queueing.
Command Device     Presents usage information when read, accepts
                   command strings via write().
Query Device       Synchronous RPC with a single round–robin queue
                   for transactions.
Sensor Device      Streaming or buffered interface to a buffered
                   sequence of samples measured from a sensor.
Log Device         Buffer of recent log messages.
Option Device      /proc–style runtime–configurable option.
Directory Device   Internally stores a mapping from strings to small
                   integers, and allows clients to access and add to the
                   mapping.
Table 6.1: Device Patterns currently defined by the Emstar system.
Using FUSD, it is possible to implement character devices with almost arbi-
trary semantics. FUSD itself does not enforce any restrictions on the semantics
of system calls, other than those needed to maintain fault isolation between the
client, server, and kernel. While this absence of restriction makes FUSD a very
powerful tool, we have found that in practice the interface needs of most ap-
plications fall into well–defined classes, which we term Device Patterns. Device
Patterns factor out the device semantics common to a class of interfaces, while
leaving the rest to be customized in the implementation of the service. Table 6.1
shows a list of Emstar Device Patterns.
The Emstar device patterns are implemented by libraries that hook into the
GLib event framework. The libraries encapsulate the detailed interface to FUSD,
leaving the service to provide the configuration parameters and callback functions
that tailor the semantics of the device to fit the application. For example, while
the Status Device library defines the mechanism of handling each read(), it calls
back to the application to represent its current “status” as data.
Relative to other approaches such as log files and status files, a key property
of Emstar device patterns is their active nature. For example, the Logring Device
pattern creates a device that appears to be a regular log file, but always contains
only the most recent log messages, followed by a stream of new messages as they
arrive. The Status Device pattern appears to be a file that always contains the
most recent state of the service providing it. However, most status devices also
support poll()–based notification of changes to the state.
The following sections will describe a few of the Device Patterns defined within
Emstar. Most of these patterns were discovered during the development of ser-
vices that needed them and later factored out into libraries. In some cases, several
similar instances were discovered, and the various features amalgamated into a
single pattern.
6.2.3.1 Status Device
The Status Device pattern provides a device that reports the current state of a
module. The exact semantics of “state” and its representation in both human–
readable and binary forms are determined by the service. Status Devices are used
for many purposes, from the output of a neighbor discovery service to the current
configuration and packet transfer statistics for a radio link. Because they are so
easy to add, Status Devices are often the most convenient way to instrument a
program for debugging purposes, such as the output of the Neighbors service and
the packet reception statistics for links.
Status Devices support both human–readable and binary representations through
two independent callbacks implemented by the service. Since the devices default
Figure 6.9: Block diagram of the Status Device pattern. The functions binary(), print-
able(), and write() are callbacks defined by the server, while status notify() is called by
the server to notify the client of a state change.
to ASCII mode on open(), programs such as cat will read a human–readable
representation. Alternatively, a client can put the device into binary mode us-
ing a special ioctl() call, after which the device will produce output formatted
in service–specific structs. For programmatic use, binary mode is preferable for
both convenience and compactness.
Status Devices support traditional read–until–EOF semantics. That is, a
status report can be any size, and its end is indicated by a zero–length read.
But, in a slight break from traditional POSIX semantics, a client can keep a
Status Device open after EOF and use poll() to receive notification when the
status changes. When the service triggers notification, each client will see its
device become readable and may then read a new status report.
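The client-side convention can be sketched as a loop that accumulates data until the zero-length read. In this sketch a plain file descriptor stands in for a status device, and the subsequent poll()-and-re-read step is omitted:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the read-until-EOF convention: a complete status report is
   read by looping until a zero-length read marks its end.  A regular
   file stands in for a status device here. */

ssize_t read_status_report(int fd, char *buf, size_t cap) {
    size_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* zero-length read: report complete */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Against a real status device, the client would then hold the descriptor open and poll() for readability before calling read_status_report() again to fetch the next report.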
This process highlights a key property of the status device: while every new
report is guaranteed to be the current state, a client is not guaranteed to see
every intermediate state transition. The corollary to this is that if no clients care
about the state, no work is done to compute it. Applications that desire queue
semantics should use the Packet Device pattern (described in Section 6.2.3.2).
Like many Emstar device patterns, the Status Device supports multiple con-
current clients. Intended to support one–to–many status reporting, this feature
has the interesting side effect of increasing system transparency. A new client
that opens the device for debugging or monitoring purposes will observe the same
sequence of state changes as any other client, effectively snooping on the “traffic”
from that service to its clients. The ability to do this interactively is a powerful
development and troubleshooting tool.
A Status Device can implement an optional write() handler, which can be
used to configure client–specific state such as options or filters. For example, a
routing protocol that maintained multiple routing trees might expose its routing
tables as a status device that was client–configurable to select only one of the
trees.
In order to demonstrate the simplicity of implementing a “dual mode” Status
Device, Figure 6.10 shows a complete example using this interface. The ex-
ample creates a device called /dev/energy/status, that reports information about
remaining energy in the system, represented by the energy status t structure. The
device is created in the main() function, by calling the constructor with an options
structure.
The options structure specifies the name of the device, a private data pointer,
and two callback functions that will be called when the device is accessed by a
client. If the client sets the device into binary mode, the “binary” handler is
called to generate a response; otherwise, the “printable” handler is called. The
handlers are provided a buf t (a dynamically allocated growable buffer) which
they must fill. Typically the binary output is reported as a struct that is exposed
   #include <libdev/status_dev.h>

   typedef struct energy_status_s {
     float batt_voltage;
 5   int seconds_remain;
   } energy_status_t;

   int e_stat_bin(status_context_t *ctx, buf_t *buf) {
     energy_status_t *es = (energy_status_t *)sd_data(ctx);
10   bufcpy(buf, es, sizeof(energy_status_t));
     return STATUS_MSG_COMPLETE;
   }

   int e_stat_print(status_context_t *ctx, buf_t *buf) {
15   energy_status_t *es = (energy_status_t *)sd_data(ctx);
     bufprintf(buf, "Energy status: \n");
     bufprintf(buf, "  %.2f volts, %d seconds remain\n",
               es->batt_voltage, es->seconds_remain);
     return STATUS_MSG_COMPLETE;
20 }

   int main(int argc, char **argv) {
     energy_status_t energy_status = {};
     status_context_t *stat_dev = NULL;
25   status_dev_opts_t s_opts = {
       device: {
         devname: "energy/status",
         device_info: &energy_status
       },
30     printable: e_stat_print,
       binary: e_stat_bin
     };
     g_status_dev(&s_opts, &stat_dev);
     /* e_cmd_init(&energy_status); */
35   g_main();
     return 0;
   }
Figure 6.10: A snippet of code that creates a Status Device.
Figure 6.11: Block diagram of the Packet Device pattern. The functions send() and
filter() are callbacks defined by the server, while pd receive() and pd unblock() are func-
tions called by the server.
to clients in a header file, while the printable output constructs an equivalent
message from the same underlying struct. This approach of always reporting the
complete status (rather than a diff–based scheme) simplifies implementation and
eliminates a wide array of potential bugs.
Of course, in a real application there would be a mechanism that acquired and
filled in the energy status. In the event that a significant change occurred in
the energy state, it might be appropriate to notify any existing clients. In this
example, notification would take the form of the call g status dev notify(stat dev).
This call would trigger read notification on all clients, who would then re–read
the device to get the updated status.
6.2.3.2 Packet Device
The Packet Device pattern provides a read/write device with a queued
multi–client packet interface. This pattern is generally intended for packet data,
such as the interface to a radio, a fragmentation service, or a routing service, but
it is also convenient for many other interfaces where queue semantics are desired.
Reads and writes to a Packet Device must transfer a complete packet in each
system call. If read() is not supplied with a large enough buffer to contain the
packet, the packet will be truncated. A Packet Device may be used in either a
blocking or poll()–driven mode. In poll(), readable means there is at least one
packet in its input queue, and writable means that a previously filled queue has
dropped below half full.
Packet Device supports per–client input and output queues with client–configurable
lengths. When at least one client’s output queue contains data, the Packet De-
vice processes the client queues serially in round–robin order, and presents the
server with one packet at a time. This supports the common case of servers that
are controlling access to a rate–limited serial channel.
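The round-robin servicing can be sketched with per-client queues and a rotating cursor; the queue sizes, client count, and function names below are illustrative, not the libdev implementation:

```c
#include <assert.h>

/* Sketch of round-robin servicing across per-client output queues:
   each call hands the server the next queued packet, cycling across
   clients so no one client starves the channel. */

#define NCLIENTS 3
#define QCAP 4

typedef struct {
    int pkts[QCAP];
    int head, count;
} client_queue_t;

static client_queue_t queues[NCLIENTS];
static int rr_next = 0;   /* next client to service */

/* Client writes a packet into its own output queue (overflow ignored). */
void client_send(int client, int pkt) {
    client_queue_t *q = &queues[client];
    q->pkts[(q->head + q->count) % QCAP] = pkt;
    q->count++;
}

/* Server pulls the next packet in round-robin order; -1 if all empty. */
int next_packet(void) {
    for (int i = 0; i < NCLIENTS; i++) {
        int c = (rr_next + i) % NCLIENTS;
        client_queue_t *q = &queues[c];
        if (q->count > 0) {
            int pkt = q->pkts[q->head];
            q->head = (q->head + 1) % QCAP;
            q->count--;
            rr_next = (c + 1) % NCLIENTS;  /* advance past this client */
            return pkt;
        }
    }
    return -1;
}
```

Advancing the cursor past the client just serviced is what prevents a client with a deep queue from monopolizing a rate-limited channel.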
To deliver a packet to clients, the server must call into the Packet Device
library. Packets can be delivered to individual clients, but the common case is to
deliver the packet to all clients, subject to a client–specified filter. This method
enhances the transparency of the system by enabling a “promiscuous” client to
see all traffic passing through the device.
6.2.3.3 Command Device
The Command Device pattern provides an interface similar to the writable entries
in the Linux /proc filesystem, which enable user processes to modify configura-
tions and trigger actions. In response to a write(), the provider of the device
processes and executes the command, and indicates any problem with the com-
mand by returning an error code. Command Device does not support any form
of delayed or asynchronous return to the client.
#include <libdev/command_dev.h>

char *e_usage(void *data) {
  return "Echo 'suspend' to suspend system\n";
}

int e_command(char *cmd, size_t size, void *data) {
  int retval = EVENT_RENEW;
  if (strncasecmp(cmd, "suspend", 7) == 0) {
    /* initiate suspend mode. . . */
  }
  else
    retval |= EVENT_ERROR(EINVAL);
  return retval;
}

void e_cmd_init(energy_status_t *es) {
  cmd_dev_opts_t c_opts = {
    device: {
      devname: "energy/command",
      device_info: es
    },
    command: e_command,
    usage: e_usage
  };
  g_command_dev(&c_opts, NULL);
}
Figure 6.12: Snippet of code that creates a Command Device.
While Command Devices can accept arbitrary binary data, they typically
parse a simple ASCII command format. Using ASCII enables interactivity from
the shell and often makes client code more readable. Using a binary structure
might be slightly more efficient, but performance is not a concern for low–rate
configuration changes.
The Command Device pattern also includes a read() handler, which is typically
used to report “usage” information. Thus, an interactive user can get a command
summary using cat and then issue the command using echo. Alternatively, the
Command Device may report state information in response to a read. This
behavior would be more in keeping with the style used in the /proc filesystem,
and is explicitly implemented in a specialization of Command Device called the
Options Device pattern.
Figure 6.12 continues our previous example by adding a Command Device.
Uncommenting line 34 of Figure 6.10 (the e cmd init() call) and linking with Figure 6.12 will instantiate
a new Command Device called /dev/energy/command, that can be used to trigger
the system to suspend.
The implementation requires only the “command” handler. This handler tests
the string and triggers the suspend process if the string equals suspend. Any other
string will return the error EINVAL. The usage handler returns a usage string to
the client.
In many cases the commands to a command device are more complex than
a simple keyword. To support these cases, the Emstar libraries include a simple
parser that defines a standard syntax used by most Command Devices. This
syntax specifies a sequence of key/value pairs, delimited by colons.
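The dissertation does not give the exact grammar, so the sketch below assumes a hypothetical "key=value" form with colon delimiters, purely for illustration of such a parser:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch only: the text says commands are sequences of
   key/value pairs delimited by colons; the "k1=v1:k2=v2" form assumed
   here is hypothetical. */

#define MAX_PAIRS 8

typedef struct { char key[16]; char val[16]; } kv_t;

/* Parses cmd into pairs[]; returns the number of pairs, or -1 on error. */
int parse_command(const char *cmd, kv_t *pairs, int max_pairs) {
    char tmp[128];
    if (strlen(cmd) >= sizeof(tmp))
        return -1;
    strcpy(tmp, cmd);
    int n = 0;
    for (char *tok = strtok(tmp, ":"); tok; tok = strtok(NULL, ":")) {
        char *eq = strchr(tok, '=');
        if (!eq || n >= max_pairs)
            return -1;              /* malformed pair or too many pairs */
        *eq = '\0';
        snprintf(pairs[n].key, sizeof(pairs[n].key), "%s", tok);
        snprintf(pairs[n].val, sizeof(pairs[n].val), "%s", eq + 1);
        n++;
    }
    return n;
}
```

A server-side command handler would call such a parser on the written string and return EINVAL (as in Figure 6.12) when it reports an error.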
Figure 6.13: Block diagram of the Query Device pattern. In the Query Device, queries
from the clients are queued and “process” is called serially. The “R” boxes represent a
buffer per client to hold the response to the last query from that client.
6.2.3.4 Query Device
The Device Patterns we have covered up to now provide useful semantics, but
none of them really provides the semantics of synchronous RPC. To address this,
the Query Device pattern implements a transactional, request/response seman-
tics. To execute a transaction, a client first opens the device and writes the
request data. Then, the client uses poll() to wait for the file to become readable,
and reads back the response in the same way as reading a Status Device. For
those services that provide human–readable interfaces, we use a universal client
called echocat that performs these steps and reports the output.
It is interesting to note that the Query Device was not one of the first de-
vice types implemented; rather, most configuration interfaces in Emstar have
been implemented by separate Status and Command devices. In practice, any
given configurable service will have many clients that need to be apprised of its
current configuration, independent of whether they need to change the config-
uration. This is exacerbated by the high level of dynamics in sensor network
applications. Furthermore, to build more robust systems we often use soft–state
to store configurations. The current configuration is periodically read and then
modified if necessary. The asynchronous Command/Status approach achieves
these objectives while addressing a wide range of potential faults.
To the service implementing a Query Device, this pattern offers a simple,
transaction–oriented interface. The service defines a callback to handle new
transactions. Queries from the client are queued and are passed serially to the
transaction processing callback, similar to the way the output queues are handled
in a Packet Device. If the transaction is not complete when the callback returns,
it can be completed asynchronously. At the time of completion, a response is
reported to the device library, which it then makes available to the client. The
service may also optionally provide a callback to provide usage information, in
the event that the client reads the device before any query has been submitted.
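The serial transaction handling just described can be sketched as follows. The callback signature, structure names, and queue sizes are assumptions for illustration, not the actual Emstar library API.

```c
#include <string.h>

#define QMAX 8
#define RSP_LEN 64

/* Assumed transaction callback: returns nonzero if it completed the
 * transaction synchronously and filled in 'rsp'. */
typedef int (*txn_cb)(const char *query, char *rsp, int rsp_len);

typedef struct {
    const char *queue[QMAX];
    int head, tail;
    int busy;                       /* a transaction is in flight */
    char last_response[RSP_LEN];
} query_dev;

/* Run queued transactions serially; if the callback does not complete
 * synchronously, the service completes it later, asynchronously. */
static void dispatch(query_dev *q, txn_cb cb)
{
    while (!q->busy && q->head != q->tail) {
        const char *query = q->queue[q->head];
        q->head = (q->head + 1) % QMAX;
        q->busy = 1;
        if (cb(query, q->last_response, RSP_LEN))
            q->busy = 0;
    }
}

/* A client's written request is queued; returns -1 if the queue is full. */
int qdev_submit(query_dev *q, const char *query, txn_cb cb)
{
    int next = (q->tail + 1) % QMAX;
    if (next == q->head)
        return -1;
    q->queue[q->tail] = query;
    q->tail = next;
    dispatch(q, cb);
    return 0;
}

/* Example synchronous handler: echo the query back as the response. */
int echo_txn(const char *query, char *rsp, int rsp_len)
{
    strncpy(rsp, query, (size_t)rsp_len - 1);
    rsp[rsp_len - 1] = '\0';
    return 1;
}
```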
Clients of a Query Device are normally serviced in round–robin order. How-
ever, some applications need to allow a client to “lock” the device and perform
several back–to–back transactions. The service may choose to give a current
client the “lock”, with an optional timeout. The lock will be broken if the time-
out expires, or if the client with the lock closes its file descriptor.
6.2.3.5 Sensor Device
Sensor Device provides a convenient interface to recorded sensor data. On the
server side, the server acquires the sensor data and calls a function to push it to
the interface. Internally, the Sensor Device maintains a ring buffer of recent data
samples and assigns a monotonic index to each sample.
Figure 6.14: Block diagram of the Sensor Device pattern. In the Sensor Device, the
server submits new samples by calling sdev_push(). These are stored in the ring buffer
(RB), and streamed to clients with relevant requests. The “R” boxes represent each
client’s pending request.
Clients can retrieve the data by sending a request for a range of samples to
the Sensor Device. This range specifies an absolute starting sample or a starting
point relative to “now”, and optionally an ending point. If no ending point is
specified, data will continue to be streamed to the client until the client closes the
connection. Because the Sensor Device maintains a ring buffer, a client can access
recent historical data. As we will see in Chapter 7, this property is important for
building systems that want to compare sensor data recorded at different nodes.
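The ring buffer with monotonic sample indices can be sketched as follows; the sizes and names are illustrative, not the Emstar implementation.

```c
#include <stdint.h>
#include <string.h>

#define RB_SIZE 1024            /* samples retained (illustrative) */

/* Each pushed sample receives the next monotonically increasing index;
 * a client may read back any index still retained in the buffer. */
typedef struct {
    int16_t  data[RB_SIZE];
    uint64_t next_index;        /* index of the next sample to arrive */
} ring_buf;

void rb_push(ring_buf *rb, int16_t sample)
{
    rb->data[rb->next_index % RB_SIZE] = sample;
    rb->next_index++;
}

/* Returns 0 and fills *out if 'index' is still buffered; -1 if it has
 * been overwritten (dropped from the sequence) or has not arrived yet. */
int rb_get(const ring_buf *rb, uint64_t index, int16_t *out)
{
    if (index >= rb->next_index)          return -1;  /* future */
    if (rb->next_index - index > RB_SIZE) return -1;  /* overwritten */
    *out = rb->data[index % RB_SIZE];
    return 0;
}

/* Oldest index a late client can resume from. */
uint64_t rb_oldest(const ring_buf *rb)
{
    return rb->next_index > RB_SIZE ? rb->next_index - RB_SIZE : 0;
}
```

A slow client asking for an overwritten index gets a failure and must resume from rb_oldest(), mirroring the best–effort behavior described below.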
Samples are returned to the client in a packet format, with a header that
includes the starting sample index, number of samples, sample size and format.
This header is important because the Sensor Device is in some ways best–effort;
performance glitches or misbehaving clients may result in data dropped from the
sequence. For example, if a client makes a streaming request but never reads, it
is impossible for the Sensor Device to maintain an infinite buffer for that data.
Instead, when the client finally reads, it will only read back beginning with the
history retained in the ring buffer.
Like Query Device, Sensor Device is built above Status Device. The im-
plementation works by reporting the next chunk of sensor data as the current
“status”, and keeps notifying the client until no data remains to report. This im-
plementation is simple and provides good fault isolation, but passing bulk data
through FUSD has disadvantages in terms of performance. FUSD involves sig-
nificant context switches in and out of the kernel, and bulk data transfer through
FUSD messages involves many unnecessary copies. For high–rate sensors this can
be a significant challenge. To address this in the future, we intend to implement
a new version of Sensor Device that uses shared memory for bulk data transfer
and uses FUSD only to coordinate access to that memory.
6.2.3.6 Client Libraries
One of the benefits of the Emstar design is that services and applications are
separate processes and communicate through POSIX system calls. As such, Em-
star clients and applications can be implemented in a wide variety of languages
and styles. However, a large part of the convenience of Emstar as a development
environment comes from a set of helper libraries that improve the elegance and
simplicity of building robust applications.
In the preceding Sections we have described several device patterns, and we
have noted that an important part of these device patterns is the library that im-
plements them on the service side. Most device patterns also include a client–side
“API” library, that provides basic utility functions, GLib compatible notification
interfaces, and a crashproofing feature intended to prevent cascading failures.
Crashproofing is intended to prevent the failure of a lower–level service from
causing exceptions in clients that would lead them to abort. It achieves this
by encapsulating the mechanism required to open and configure the device, and
automatically triggering that mechanism to re–open the device whenever it closes
unexpectedly.
The algorithm used in crashproofing is described in Figure 6.15. The argu-
ments to this algorithm are the name of the device, and two callback functions,
config and handler. The config function configures a freshly opened device file
according to the needs of the client, e.g. setting queue lengths and filter pa-
rameters. The handler function is called when new data arrives. Note that in
the implementation, the call to poll() occurs in the GLib event system, but the
fundamental algorithm is the same.
A client’s use of crashproof devices is completely transparent. The client
constructs a structure specifying the device name, a handler callback, and the
client configuration, including desired queue lengths, filters, etc. Then, the client
calls a constructor function that opens and configures the device, and starts
watching it according to the algorithm in Figure 6.15. In the event of a crash
and reopen, the information originally provided by the client will be used to
reconfigure the new descriptor. Crashproof client libraries are supplied for both
Packet and Status devices.
6.2.3.7 Domain Specific Device Interfaces
Along with the generic devices we have described, there are also many domain–
specific device interfaces. These interfaces are implemented by libraries and are
usually composed of a set of devices that, taken together, provide a single logical
interface. The most broadly used example of this is the Data Link interface, a
specification of a standard interface for network stack modules.
The link interface is composed of a set of devices located in the /dev/link/*
Watch-Crashproof(devname,config,handler)
 1  fd ← open(devname)
 2  if configure(fd) < 0 goto 11
 3  crashed ← false
 4  resultset ← poll(fd, {input, except})
 5  if crashed
 6    then status ← read(fd, buffer)
 7         if status < 0 abort
 8         if devname ∈ buffer goto 1
 9  else
10    if except ∈ resultset
11      then close(fd)
12           fd ← open(“/dev/fusd/status”)
13           if fd < 0 abort
14           crashed ← true
15    elseif input ∈ resultset
16      then status ← read(fd, buffer)
17           if fatal error goto 11
18           if status ≥ 0 handler(buffer, status)
19  goto 4
Figure 6.15: “Crashproof” auto–reopen algorithm.
tree. Each link device is composed of a set of device files in a subdirectory
named by the link name, e.g. /dev/link/udp0/*. A link device always has three
subdevices: data, status and command, and in addition may also have other
related devices, such as neighbors, routes, errors, etc.
The data device is a Packet Device interface that is used to exchange packets
with the network. All packets transmitted on this interface begin with a standard
link header that specifies common fields. This link header masks certain cosmetic
differences in the actual over–the–air headers used by different MAC layers, such
as the Berkeley MAC [HSW00] and SMAC [YHE02] layers supported on Mica
Motes.
The command and status devices provide asynchronous access to the config-
uration of a stack module. The status device reports the current configuration
of the module (such as its channel, sleep state, link address, etc.) as well as the
latest packet transfer and error statistics. The command device is used to issue
configuration commands, for example to set the channel, sleep state, etc. The
set of valid commands and the set of values reported in status varies with the
underlying capabilities of the hardware. However, the binary format of the status
output is standard across all modules (currently, the union of all features).
Many “link drivers” and services have been implemented using the Link in-
terface. This uniform interface enables services to be stacked and swapped (at
run–time if needed), and provides a uniform interface for applications. We will
discuss the services in more detail in Section 6.2.4.
6.2.4 Layer 3: Emstar Components and Services
Layer 3 in the Emstar design is a collection of reusable components and services
that address common needs in embedded networked systems. This spans a wide
range of functionality including device drivers, routing algorithms, time synchro-
nization services, and distributed collaboration services. In this section we will
introduce many of the components, while Chapters 7 and 8 will focus on time
synchronization and network services in more detail.
6.2.4.1 Network Stack Components
In Section 6.2.3.7 we described the Link interface used to create network stack
components. Emstar includes a suite of components that can be used and com-
bined to provide network functionality tuned to the needs of wireless embedded
systems. These components include “link drivers” that implement the lowest–
layer interfaces to network resources; pass–through modules that implement
various types of filtering and passive processing; and routing modules that
provide network–layer interfaces, routing messages among one or more link–layer
interfaces.
Emstar implements several “link drivers”, providing interfaces to radio link
hardware including 802.11, and several flavors of the Mica Mote. The 802.11
driver overlays the socket interface, sending and receiving packets through the
Linux network stack, and optionally integrating feedback from the MAC layer
about RSSI, precise timing, and transmission failures. Two versions of the Mote
driver exist, one that supports both Berkeley MAC and SMAC on Mica2, and
a new version that supports only BMAC but adds support for newer platforms
such as Telos.
Because all of these drivers conform to the link interface spec, applications use
a single access method across different physical radio hardware. However, the Link
interface is not intended to treat all links transparently—it explicitly exposes low
level information about the link’s capabilities and status so that applications can
make intelligent decisions about how and when to use them. For example, link
capabilities such as variable transmit power, nominal link capacity, and MTU
are all accessible to applications and routing algorithms from the Link’s status
device.
Link interfaces are also used to construct modules that sit in the middle of the
stack, passing packets through to lower layers, possibly analyzing or modifying
them along the way. A pass–through module is both a client of a lower Link
device and a provider of an upper Link device. To simplify the implementation,
some of the work of proxying status and command interfaces is done by a library.
In some cases, the implementation of a pass–through involves implementing a
single function that transforms a packet from above and sends it below, and vice
versa.
Linkstats, Blacklisting, and Fragmentation are examples of pass–through mod-
ules. Linkstats adds a small header to each packet and counts gaps in sequence
numbers to estimate link quality. Blacklisting uses the output of a neighbor dis-
covery module to block traffic on links that are not bidirectional. Fragmentation
breaks large packets up into smaller packets.
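As an illustration of how small a pass–through can be, the following sketch mimics the Linkstats behavior just described: a sequence–number header is prepended on the way down, and gaps are counted on the way up. The two–byte header layout and all names are assumptions, not the actual module.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t tx_seq;            /* next sequence number to send */
    uint16_t rx_last;           /* last sequence number received */
    int      rx_started;
    unsigned lost;              /* gaps observed so far */
    unsigned received;
} linkstats;

/* Down path: write a 2-byte big-endian sequence header, then the
 * payload.  Returns the new on-the-wire length. */
size_t ls_send(linkstats *ls, const uint8_t *pkt, size_t len, uint8_t *out)
{
    out[0] = (uint8_t)(ls->tx_seq >> 8);
    out[1] = (uint8_t)(ls->tx_seq & 0xff);
    memcpy(out + 2, pkt, len);
    ls->tx_seq++;
    return len + 2;
}

/* Up path: consume the header and update loss statistics (assumes
 * in-order delivery, no duplicates).  Returns the payload length, or 0
 * if the packet is too short to carry the header. */
size_t ls_recv(linkstats *ls, const uint8_t *pkt, size_t len, uint8_t *out)
{
    if (len < 2) return 0;
    uint16_t seq = (uint16_t)((pkt[0] << 8) | pkt[1]);
    if (ls->rx_started)
        ls->lost += (uint16_t)(seq - ls->rx_last) - 1;  /* gap size */
    ls->rx_last = seq;
    ls->rx_started = 1;
    ls->received++;
    memcpy(out, pkt + 2, len - 2);
    return len - 2;
}
```

The ratio received / (received + lost) then serves as a simple link–quality estimate.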
Link interfaces are also used to provide interfaces to routing modules. These
are network layer interfaces rather than link layer interfaces, so the source and
destination addresses are usually interpreted as network layer IDs (i.e. Node IDs)
rather than link layer IDs (i.e. Interface IDs). The simplest routing module is the
floodd module, which accepts messages, adds a sequence number, and re–sends
each message exactly once. There is also a generic routing module called sink that
uses routing tables provided by another module to route messages to a specified
destination.
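The duplicate suppression that lets floodd re–send each message exactly once can be sketched as follows; the names and table size are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NODES 64            /* illustrative table size */

/* next_seq[origin] holds the next expected sequence number from that
 * origin node. */
typedef struct {
    uint32_t next_seq[MAX_NODES];
} flood_state;

/* Returns 1 if the message is new (forward it exactly once), 0 if it is
 * a duplicate or stale and must be dropped. */
int flood_accept(flood_state *f, uint16_t origin, uint32_t seq)
{
    if (origin >= MAX_NODES) return 0;
    if (seq < f->next_seq[origin]) return 0;   /* old or duplicate */
    f->next_seq[origin] = seq + 1;
    return 1;
}
```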
6.2.4.2 Local Directory Service
One of the best examples of a small Emstar service is the local directory service.
This service allows applications to dynamically assign and share mappings of
short strings to small integers. This avoids the need to statically assign numbers
to items in the system where the set of items is known only at run–time.
For example, there are many implementations that might provide Link inter-
faces, and an implementation might be used more than once in a single Emstar
system. One solution would be a global file that statically assigned numbers to
links. The disadvantage of such a scheme is that it is difficult to manage, and
the list will grow long and cumbersome.
Using the Directory service, each Link provider can register their link with
the service and be dynamically assigned a number. These numbers are thus
guaranteed to be small and the mapping is known by querying the Directory
service. The Directory service is also used in several other places, including to
define mappings of local clocks to numbers. However, because these mappings
are dynamic on each node, the numbers assigned cannot be assumed to be the
same on two different nodes in the system. For cases where global assignments
must be made, other techniques must be employed.
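The core mapping semantics (the first registration of a name assigns the next free small integer; later registrations return the same number) might look like the sketch below. The real service runs as a separate process behind device files; these names and the table size are assumptions.

```c
#include <string.h>

#define DIR_MAX 32              /* illustrative capacity */

typedef struct {
    const char *names[DIR_MAX];
    int count;
} directory;

/* Returns the number assigned to 'name', registering it on first use;
 * -1 if the table is full.  The mapping is stable within one node but,
 * as noted above, cannot be assumed equal across nodes. */
int dir_register(directory *d, const char *name)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return i;                      /* already registered */
    if (d->count >= DIR_MAX) return -1;
    d->names[d->count] = name;
    return d->count++;
}
```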
6.2.4.3 EmRun Services
EmRun is a program that parses a configuration file that describes an Emstar
system, launches the system, and provides several centralized services. Among
the services it provides are automatic respawning with visibility into the status
and history of processes, centralized in–memory logging, process responsiveness
tracking, fault reporting, and fast startup / graceful shutdown.
Respawn Process respawn is neither new, nor difficult to achieve, but it is very
important to an Emstar system. It is difficult to track down every bug, especially
ones that occur very infrequently, such as a floating–point error processing an
unusual set of data. Nonetheless, in a deployment, even infrequent crashes are
still a problem. Often, process respawn is sufficient to work around the problem;
eventually, the system will recover. Emstar’s process respawn is unique because
it happens in the context of “crashproofed” interfaces (Section 6.2.3.6). When
an Emstar process crashes and restarts, Crashproofing prevents a ripple effect,
and the system operates correctly when the process is respawned.
When processes die unexpectedly, EmRun tracks the termination signal and
last log message reported by the process. This information can be accessed from
the last_msg device, which reports the count and circumstances of all process
terminations along with their final message.
In–Memory Logs EmRun saves each process’ output to in–memory log rings
that are available interactively from the /dev/emlog/* hierarchy. These illustrate
the power of FUSD devices relative to traditional logfiles. Unlike rotating logs,
Emstar log rings never need to be switched, never grow beyond a maximum size,
and always contain only recent data.
Process Responsiveness Tracking An unresponsive server can cause perfor-
mance bottlenecks for an entire Emstar system. To address this, EmRun tracks
the responsiveness of all processes in the system. The EmRun client library in-
cludes a timer event that sends a periodic heartbeat message to EmRun. EmRun
tracks the arrival of these messages and compares the arrival time to the sched-
uled send time according to the timer. The discrepancy in the times reveals an
estimate of the responsiveness of the process, since whenever the timer fires, that
process is also free to respond to I/O events.
Fault Reporting Fielded systems often have the possibility of unexpected
faults. For example, our acoustic systems sometimes experience wiring problems
that cause one or more channels of the microphones or the speaker to fail. It
is also possible for our acoustic systems to be incorrectly set up, for example
if the battery pack for the microphone array is disconnected or if the array’s
wires are crossed. A driver that detects a fault can publish this fault through
a centralized reporting service provided by EmRun. This fault report is only
available locally through the faults device file, but other modules can publish
that fault information over the network to other nodes and to the user.
The fault reporting API is very simple. A process reports a fault by opening
a device file, writing in a string describing the fault, and keeping that file open
until the fault is corrected. When the file is closed (whether by the application
or by the process terminating), the fault will be removed from the list. Other
processes may monitor the fault list using the standard Status Client library.
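The report–and–hold pattern just described can be sketched as below. The device path is passed as a parameter because the exact file name is not shown here, and O_CREAT is included only so the sketch also runs against an ordinary file; both are assumptions.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open the fault device, write a description of the fault, and return
 * the descriptor.  The caller holds the descriptor open for as long as
 * the fault persists; close() (explicit or via process exit) removes
 * the fault from the list. */
int report_fault(const char *dev_path, const char *msg)
{
    int fd = open(dev_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, msg, strlen(msg)) < 0) {
        close(fd);
        return -1;
    }
    return fd;      /* keep open; close(fd) later clears the fault */
}
```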
Fast Startup EmRun’s fast startup and graceful shutdown are critical for a
system that needs to duty cycle to conserve energy. The implementation depends
on a control channel that Emstar services establish back to EmRun when they
start up. Emstar services notify EmRun when their initialization is complete,
signaling that they are now ready to respond to requests. The emrun_init() library
function, called by the service, communicates with EmRun by writing a message
to /dev/emrun/.int/control. EmRun then launches other processes waiting for
that service, based on the dependency graph expressed in the EmRun configuration
file.
This feedback enables EmRun to start independent processes with maximal
parallelism, and to wait exactly as long as it needs to wait before starting de-
pendent processes. This scheme is far superior to the naive approach of waiting
between daemon starts for pre–determined times, i.e., the ubiquitous “sleep 2”
statements found in *NIX boot scripts. Various factors can make startup times
difficult to predict and high in variance, such as flash filesystem garbage collec-
tion. On each boot, a static sleep value will either be too long, causing slow
startup, or too short, causing services to fail when their prerequisites are not yet
available.
Graceful Shutdown The control channel is also critical to supporting graceful
shutdown. EmRun can send a message through that channel, requesting that the
service shut down, saving state if needed. EmRun then waits for SIGCHLD to
indicate that the service has terminated. If the process is unresponsive, it will be
killed by a signal.
An interesting property of the EmRun control channel is one that differen-
tiates FUSD from other approaches. When proxying system calls to a service,
FUSD includes the PID, UID, and GID of the client along with the marshalled
system call. This means that EmRun can implicitly match up the client con-
nections on the control channel to the child processes it has spawned, and reject
connections from non–child processes. This property is not yet used much in
Emstar but it provides an interesting vector for customizing and securing device
behavior.
6.2.5 Layer 4: Additional Tools and Environment
In addition to reusable components that might be integrated into newly developed
systems, there are a collection of ancillary tools that help developers design,
implement, and deploy new systems. These tools include tools for simulation,
deployment, remote access, and visualization.
6.2.5.1 EmSim: the Emstar Simulator
Transparent simulation at varying levels of accuracy is crucial for building and
deploying large systems [EBB03a] [LLW03] [GSR04]. EmSim enables “real–code”
simulation at many different accuracy regimes. EmSim runs many virtual nodes
in parallel, each with its own device hierarchy. Because Emstar applications al-
ways interact with the world through standard interfaces such as Link devices and
Sensor Devices, EmSim can transparently run nodes in simulation by presenting
identical interfaces to simulated or remote resources.
For operations in pure simulation, a radio channel simulator and a sensor
simulator can provide interfaces to a modeled world, or re–play conditions that
have been observed empirically. EmSim also supports “emulation mode”, in
which real hardware such as radios and sensors can be accessed remotely. This
yields a very convenient interface to a testbed, because the entire application
can run centrally on a single simulation server, while the radio traffic or sensor
data comes from a deployed testbed. We have found that using real radios is far
superior to attempting to model radios, especially when there may be bugs or
glitches in the operation or performance of the radios.
These different simulation regimes speed development and debugging; pure
simulation helps to get the code logically correct, while emulation in the field helps
to understand environmental dynamics before a real deployment. Simulation and
emulation do not eliminate the need to debug a deployed system, but they do
tend to reduce it.
In all of these regimes, the Emstar source code and configuration files are
identical to those in a deployed system, making it painless to transition among
them during development and debugging. This serves to eliminate accidental
code differences that can arise when running in simulation requires modifications.
EmSim can also simulate heterogeneous networks containing both Motes and Emstar
systems, by running the Mote code inside an EmTOS wrapper. Other “real–
code” simulation environments include TOSSim [LLW03] and SimOS [RBD97],
but Emstar is the only environment that readily supports heterogeneous networks
and “emulation” using real hardware.
6.2.5.2 Remote Access Methods
As an IPC mechanism, FUSD has the benefit of being fast, deterministic, and
synchronous, making straight–line programming of sequential calls possible. These
properties make it easy to communicate between processes on a single node, but
unlike sockets, they do not provide any native remote access mechanism. In ad-
dition, some languages such as Java are designed to handle everything in terms
of sockets, and don’t have complete support for POSIX system calls.
To address these concerns, we have implemented several remote access mech-
anisms to Emstar. These mechanisms enable access to Emstar services over the
network, and can also simplify the integration of Emstar with other systems. The
three remote access mechanisms supported by Emstar are FUSDnet, the Emstar
HTTP server, and EmProxy.
FUSDnet FUSDnet is a remote access protocol based on FUSD. Using FUS-
Dnet, any FUSD device can be accessed remotely via a sockets protocol.
A server that wants to enable incoming remote connections must set a special
flag when it registers the device. This flag will register the device with the
FUSDnet daemon, which listens for incoming requests and de–multiplexes them
to the appropriate server. A client that wants to connect to a remote FUSD
service must run a client program that opens a socket to the remote node, requests
a connection to the specified device, and creates a local stub device. Once the
connection is established, the local stub device will be an exact mirror of the
remote device. A system call made on the stub will be marshalled and transferred
via the socket to the remote server, where it will be handled and a response
returned. Thus, FUSDnet provides transparent access to remote FUSD services.
FUSDnet is transparent, but it is recommended only for use in conditions
with reliable and deterministic network links between client and server. FUSDnet
might be a convenient way to link two Emstar systems together if they have a
wired Ethernet link between them, but it would not be so appropriate if those two
systems are physically separate and are connected wirelessly. For such situations,
protocols designed for slow or unreliable links would be preferred.
HTTP With the advent of the Web, HTTP has become one of the most uni-
versally implemented protocols. Recognizing this, Emstar supports an HTTP
gateway that enables remote access to FUSD devices. This access is implemented
by CGI scripts that enable access to Status Devices, Command Devices, and Log
Devices via simple URL formats. This approach can easily be extended by adding
additional CGI scripts to handle other device types.
The Emstar HTTP server integrates with EmRun to provide a default web
page that shows the node’s current status and can integrate sub–pages for each
running process. This “Node Page” can make it easy for novice users to browse
the status of a node using a web client. The HTTP service also enables integration
with Java and other software that can readily access services via HTTP, and
allows those programs to run remotely.
EmProxy EmProxy is a remote access protocol that exposes real–time state
changes via a best–effort UDP protocol. Unlike FUSDnet and HTTP, which use
TCP to reliably connect to a specific node, EmProxy provides a broadcast inter-
face to control and monitor groups of nodes over a broadcast network. Because
EmProxy can operate over broadcast to groups of nodes, it is very useful in de-
ployed environments where many nodes are involved but only a subset of those
nodes are reachable at any given time. The use of UDP enables EmProxy to
report status at high rates, dropping messages rather than buffering them when
the rate exceeds capacity.
An EmProxy client connects to EmProxy by periodically sending a request
message that lists a set of Status Devices to monitor. The EmProxy service opens
those Status Devices and monitors them for notification. Whenever notification
is triggered, EmProxy reads the new state and reports that back to the requester
via UDP. The request string can include options and arguments to limit the rate
at which replies are reported, to automatically re–read the device periodically,
and to set the mode in which the device is read (e.g. binary, ASCII, XML).
EmProxy also supports the ability to run shell commands and report back the
results. This broadcast remote shell is very useful for managing and controlling
groups of nodes in a deployed setting.
6.2.5.3 Deployment Tools
Emstar includes a number of tools and facilities designed to aid deployments.
When working on a deployment the two primary issues to address are finding
out the state of the nodes and controlling the nodes. Emstar provides several
mechanisms that address these issues: rbsh (Remote Broadcast SHell), IP routing,
and efficient state flooding. The routing and flooding facilities will be discussed
in more detail in Chapter 8.
rbsh The rbsh program is an invaluable tool for dealing with collections of nodes.
It provides a simple shell prompt interface, but when commands are written at
the prompt, they are broadcast out over a selected network, and the command
script is run on each node. The results of the command are then reported back
and collated at the prompt.
In a deployed setting, this provides a fast and convenient way to send com-
mands to all reachable nodes without needing to know which nodes exist or which
are reachable. In addition, relative to tools based on ssh and other connection–
oriented protocols, there is no need to maintain connections to remote nodes,
nor to time out broken connections. The result is a simple, fast, and generally
intuitive shell interface.
We have also had success using rbsh in scripts to control groups of nodes while
running experiments and to implement simple forms of coordination without
writing specialized application code.
IP Routing In a deployed setting, being able to telnet across the network is
very useful. For this reason, even if the application itself does not require end–
to–end IP routing, IP routing can be very useful for debugging a deployment.
Often full pairwise routing is not needed, and routing along a tree is sufficient.
Emstar provides IP routing by combining native Emstar routing facilities with
the IP Connector. The IP connector creates a tunnel device that IP applications
such as telnet and ping can use, but routes the traffic on that device through an
Emstar Link device.
State Flooding One of the most challenging parts of a deployment is deter-
mining what is happening in the network. For example, it is important to know
the link quality observed between different nodes in the system so that gaps in
connectivity can be corrected. It is also important for the user to be aware of
faults that may have occurred in the deployment.
To address these needs, Emstar includes an efficient, reliable state flooding
mechanism. This mechanism does not rely on any form of routing; each node
floods the current state of a set of variables peer to peer to its neighbors, using
a hopcount to limit propagation. Each flooded message also includes
a sequence number that is used to detect gaps in the sequence. When a gap is
detected, a local retransmission protocol requests the missing data. This mecha-
nism is described in more detail in Chapter 8.
In our acoustic deployment, we used this mechanism to flood reported faults
and neighbor link quality. This enabled us to use a laptop to observe the network
from anywhere in the field, quickly getting a picture of the link quality in the
network, and immediately seeing any reported faults.
6.2.5.4 Visualizing Emstar Systems
EmView is a graphical visualizer for Emstar systems. Through an extensible
design, developers can easily add “plugins” for new applications and services.
Figure 6.16 shows a screen–shot of EmView displaying real–time state of a run-
ning deployment at the James Reserve. In this instance, the data was being
displayed live from our state flooding protocol.
Figure 6.16: Screen shot of EmView, the Emstar visualizer.

EmView uses the EmProxy protocol to acquire status information from a
collection of nodes. Although the protocol is only best–effort, the responses
are delivered with low latency, such that EmView captures real–time system
dynamics. In order to support heterogeneous networks, EmView first requests a
configuration file from each node that details how to visualize the services on
that node. Based on that file, EmView then follows up with a request for node
status as needed. This design enables EmView to visualize any Emstar system
without needing to be informed up front about the details of what software or
services are present in each system.
CHAPTER 7
A Synchronized Distributed Sampling Layer
The time–synchronized distributed sampling layer is a critical part of what makes
this platform so ideally suited to distributed acoustic processing, and it is also
one of the most difficult parts of the system to engineer. This layer provides an
application developer with an API that represents the incoming signals at a single
node as a contiguous time series with a monotonically increasing sample clock. It
also provides a mechanism to precisely compare the time that two samples were
recorded, even if those samples were recorded on different nodes in the system.
This layer greatly simplifies many distributed signal processing applications.
For example, our calibration system demonstrates how this layer simplifies a
time–of–flight ranging implementation. When a node emits a ranging signal,
it detects the signal locally to determine the exact sample index at which the
signal was emitted. Then, that sample index is published across multiple hops
to potential receivers. Through the synchronized sampling layer, receiving nodes
can translate the sender’s sample index into their own time series, correct for any
skew in sample rates, and extract the portion of their local signals containing
the ranging signal. This process will succeed as long as the receiver performs the
extraction within 8 seconds of the original event occurring.
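The index–translation step described above can be sketched as follows. This is a minimal illustration in our own notation, not the system's code; the linear conversion parameters (rates and offsets) are invented for the example.

```python
# Sketch: translating a sender's sample index into a receiver's sample
# timebase by composing linear clock conversions (t_out = rate * t_in + offset).
# The parameters below are invented; real values come from the timesync layer.

def convert(t, rate, offset):
    """Apply one linear clock conversion."""
    return rate * t + offset

# Hypothetical parameters: sender sample clock -> common (CPU) timebase,
# then common timebase -> receiver sample clock.
SENDER_TO_CPU = (1.0000021, 1_500_000.0)    # slight rate skew, large offset
CPU_TO_RECEIVER = (0.9999987, -730_000.0)

def sender_index_to_receiver_index(sample_index):
    """Map a sample index in the sender's series to the receiver's series."""
    cpu_time = convert(sample_index, *SENDER_TO_CPU)
    return convert(cpu_time, *CPU_TO_RECEIVER)

rx_index = sender_index_to_receiver_index(48_000 * 5)   # 5 s of 48 kHz samples
```

The receiver would round the resulting index and extract a window of its buffered signal around it.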
This layer consists of three elements which we will describe separately in the
following sections: a buffered acoustic sensor interface, a time synchronization
system, and a hop–by–hop time conversion facility built into the routing layer.
7.1 A Buffered Acoustic Sensor Interface
Figure 7.1: Block diagram of the buffered acoustic sensor interface.
The buffered acoustic sensor interface, shown in Figure 7.1, is critical to
simplifying the development of distributed sensing applications. This compo-
nent maintains a consistent, monotonic, and continuous timebase, correlates that
timebase to the node’s main CPU clock, and provides a multi–client buffered and
streaming interface to the sensor data. In the diagram, the box marked vxpcd is
the acoustic sensor interface, providing access to data sampled from the sound
hardware. The box marked syncd is the time synchronization service, which is
discussed in more detail in Section 7.2.
This interface performs several important functions, which are described in the
next few sections. While some of these features are specific to the shortcomings of
this particular hardware, in our experience most sound hardware has some subset
of these problems. Given the costs of developing custom hardware, system designs
that can work around hardware shortcomings are desirable.
7.1.1 Continuous Sampling and Buffering
In order to provide a continuous and monotonic timebase, vxpcd continuously
samples from the sound hardware. By sampling continuously, vxpcd leverages the
frequency stability of the sample clock, and avoids glitches and discontinuities in
the time series. In the event that a hardware error or another problem forces a
break in sampling, vxpcd will insert space into the signal in an effort to preserve
the continuity of the signal. Continuous sampling also enables synchronization
information to be estimated and retained over time, as opposed to having to
re–sync each time sampling is started.
Given that vxpcd is sampling continuously, buffering that data is the next
logical step. In addition to streaming new data to its clients, vxpcd retains the
audio data in a large ring buffer. This buffered interface can greatly simplify the
design of distributed sensing applications, because the application can sustain
significant coordination delays without worrying that the signals of interest will
have passed by the time the system can react. In many applications, nodes
that may not initially detect a signal can still extract information about the
signal if they know where to look. A buffered sensor interface enables such an
implementation to work, even if the node coordination is delayed by messaging
latency or local processing delays. Even in cases where potential receivers can be
warned in advance, it is usually simpler to design the system with more relaxed
timing considerations.
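A minimal sketch of such a buffer, indexed by absolute sample number so that a late reader can still retrieve past data; the class name and structure are our own, not vxpcd's.

```python
# Sketch of a bounded ring buffer keyed by absolute sample index, assuming a
# single writer appending samples and readers requesting past ranges.
class SampleRingBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [0.0] * capacity
        self.next_index = 0          # absolute index of the next sample to write

    def append(self, samples):
        """Append new samples, overwriting the oldest data when full."""
        for s in samples:
            self.buf[self.next_index % self.capacity] = s
            self.next_index += 1

    def oldest_index(self):
        """Absolute index of the oldest sample still buffered."""
        return max(0, self.next_index - self.capacity)

    def read(self, start, count):
        """Return samples [start, start+count) if still buffered, else None."""
        if start < self.oldest_index() or start + count > self.next_index:
            return None
        return [self.buf[i % self.capacity] for i in range(start, start + count)]
```

A reader that arrives late simply asks for the absolute range it needs; the request fails only if coordination was delayed longer than the buffer's span.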
7.1.2 Synchronization
One of the biggest engineering challenges of building this platform is the prob-
lem of achieving synchronization to the audio hardware, a Digigram VXP440
PCMCIA card. Difficulties with the design of this specific audio hardware made
[Figure 7.2 plot: Correlation of VXP440 Sample Clock to CPU Clock (RMS = 5.2 µS). Axes: samples ×50000 (seconds ×1.04); left y axis, offset (µS); right y axis, fit error (µS).]
Figure 7.2: Plot of the linear relationship between the VXP sample clock and the
platform’s CPU clock.
this problem more difficult, but we have encountered similar issues in the past
with other hardware, including the audio hardware internal to the iPAQ and the
Cirrus Logic CS4281. In general, off–the–shelf sound hardware is not designed
to support high–precision synchronization.
Using the VXP440, two separate synchronization problems must be solved.
First, the 4 channels of audio must be synchronized together in order to achieve
high–precision phase comparisons between the channels. Second, the audio streams
must be synchronized to the system clock, so that software running on the main
processor can relate a particular point in a time series to a particular time.
In previous systems, we have used interrupt timing to deduce the time at
which samples were recorded. However, in the case of the VXP440 interrupt
timing was not well–correlated to the sample timing. In this case we contracted
with the manufacturer to add a custom feature that would report on demand the
total number of samples recorded by the card since the beginning of sampling.
Given this feature, we modified the in–kernel driver and the vxpcd module to
exploit this new command1.
To synchronize the audio streams to the CPU clock, the modified vxpcd mod-
ule periodically queries the card to retrieve the total sample count for each channel
and records the CPU time at which the command was issued. Each of these re-
quests provides a single observation, a data point that maps a sample index to a
CPU time. These observations are submitted to the timesync system, which per-
forms a linear fit on the data to determine a relation between the two clocks that
can be used to convert from one to the other. In order to enable finer–granularity
conversion, the sample clock is expanded by a factor of 20, so that each count is
approximately 1 microsecond at a sample rate of 48 KHz. This enables times to
be expressed and converted with sub–sample accuracy.
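The fit itself is an ordinary least–squares line through the (sample count, CPU time) observations. The sketch below is ours, not the syncd implementation; the data in the accompanying check is invented.

```python
# Sketch: least-squares fit of (sample_count, cpu_time) observations, the kind
# of pairs relation vxpcd submits to the timesync system.
def linear_fit(xs, ys):
    """Return (rate, offset) such that y ~ rate * x + offset."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    rate = sxy / sxx
    return rate, my - rate * mx

def rms_residual(xs, ys, rate, offset):
    """Root-mean-square residual, the fit-quality metric used in the plots."""
    n = len(xs)
    return (sum((y - (rate * x + offset)) ** 2
                for x, y in zip(xs, ys)) / n) ** 0.5
```

Given the fitted (rate, offset), a timestamp in either clock converts to the other by applying or inverting the line.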
Figure 7.2 shows the relationship between the two clocks at a particular time
on one of our nodes2. The x axis is in multiples of 50K samples, or 1.04 seconds
at a 48 KHz sample rate. The line running across the graph represents the
linear conversion function that is used to convert from one clock to another.
This line and the points on the graph are plotted according to the scale of the
left–hand y axis, representing the offset in µS relative to a constant offset. The
three horizontal dashed lines and the impulses represent the residuals from the
linear fit: the difference between a specific data point and the conversion value.
These are plotted in µS, according to the scale of the right–hand y axis. The
“RMS fit error” is an estimate of the quality of the fit based on the root mean
1 The modified firmware and drivers are available from our website. Although the new firmware appears to introduce certain race conditions, we have used it successfully.
2 This method of graphing syncd time conversions is due to Jeremy Elson.
square of the residuals. Properties of the VXP440 hardware that cause a non–
deterministic timing on command responses are the limiting factor in achieving
tighter synchronization.
In addition to synchronizing the acoustic time series to the CPU clock, vxpcd
must also synchronize the 4 channels of the VXP440. Although the sampling
process is driven off a single crystal, the design of the VXP440 implements two
independent stereo streams with no explicit synchronization between them.
Because the design of the command channel precludes commanding both
streams in a synchronized manner, we must implement a mechanism for lining
up the streams after the fact.
To do this, vxpcd opens each stream independently and buffers the data until
synchronization data can be retrieved from the card. By requesting synchroniza-
tion data from each stream in turn and matching up the CPU timestamps, vxpcd
can determine the offset between the channels and insert spaces into one of the
streams to sync them up. Because both streams are driven from the same clock,
the inter–channel synchronization achieved by this system is quite accurate.
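The alignment step can be sketched as follows, assuming each stream reports one (CPU time, sample count) synchronization point; the function names and figures are hypothetical, not the vxpcd code.

```python
# Sketch: aligning two streams driven by the same clock, using one
# (cpu_time_seconds, sample_count) synchronization point per stream.
def channel_offset(sync_a, sync_b, rate_hz):
    """Return how many samples stream B lags stream A (positive: B later)."""
    t_a, n_a = sync_a
    t_b, n_b = sync_b
    # Project B's sample count to A's CPU timestamp, assuming a shared rate.
    n_b_at_ta = n_b + (t_a - t_b) * rate_hz
    return round(n_a - n_b_at_ta)

def align(stream_a, stream_b, offset):
    """Insert zero padding so both streams start at the same instant."""
    if offset > 0:
        stream_b = [0.0] * offset + stream_b
    elif offset < 0:
        stream_a = [0.0] * (-offset) + stream_a
    return stream_a, stream_b
```

Because both streams share a crystal, the computed offset is constant and the padding need only be applied once, at startup.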
7.1.3 Multi–Client Interface
The vxpcd module presents a multi–client Sensor Device interface that supports
both streaming and buffered access modes. The Sensor Device interface, de-
scribed in Section 6.2.3.5, is one of the Emstar device patterns. Sensor Device
provides an interface that can be accessed from the shell or from scripts using
simple utilities, as well as via a binary programmatic API implemented in the
Sensor Client library. Sensor Device supports an unlimited number of concurrent
clients, enabling multiple independent applications to use the same sensor data.
Sensor Device supports a streaming interface with a buffer of past samples. A
client requests data starting at a particular sample index. If that starting index is
in the past, the past data is immediately reported, and any remaining requested
data is streamed to the client as it arrives. If that starting index is in the future,
no data is returned until the first samples arrive.
The client API implements helper functions that provide buffered and stream-
ing event–driven interfaces. The buffered interface allows a client to request a
whole buffer, and have that buffer delivered whole for processing. This is usually
the most convenient way to acquire a bounded clip of data when the data must
be processed in its entirety. The streaming interface allows a client to receive
incoming data in fixed or variable sized chunks. This is most convenient when
the client must process a stream of data in real time, with bounded latency.
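The request semantics described above can be modeled compactly: a request starting in the past is served from the buffer immediately, while a request starting in the future is served as data arrives. This sketch is ours, not the Sensor Device implementation.

```python
# Sketch of buffered-request semantics: past data is delivered immediately,
# future data is delivered as it arrives.
class SensorDevice:
    def __init__(self):
        self.samples = []            # absolute index == list index here
        self.pending = []            # outstanding (start, count, callback)

    def on_samples(self, new):
        """Writer path: append samples, then satisfy any completed requests."""
        self.samples.extend(new)
        still_pending = []
        for start, count, cb in self.pending:
            if len(self.samples) >= start + count:
                cb(self.samples[start:start + count])
            else:
                still_pending.append((start, count, cb))
        self.pending = still_pending

    def request(self, start, count, cb):
        """Reader path: deliver now if buffered, otherwise defer."""
        if len(self.samples) >= start + count:
            cb(self.samples[start:start + count])    # served from the buffer
        else:
            self.pending.append((start, count, cb))  # served as data arrives
```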
The vxpcd module uses the Sensor Device pattern to provide two separate
sensor devices: single and all. The single device presents only the data received on
channel 0, while the all device presents all four channels, after synchronization, in
an interleaved format. By providing both of these alternatives, applications that
need to do simpler real–time detection can reduce overhead by only processing
one of the channels. In the event that an event is detected, the data from all four
channels can be extracted and more sophisticated processing can be done.
7.2 An Integrated Time Synchronization Service
The Emstar time–synchronization service was discussed in prior work [EGE02a]
[ER02] [EGE02b], including a Ph.D. thesis [Els03]. That work developed the
framework and theory behind the Emstar time synchronization services, as well
as the initial implementation of syncd, the Emstar timesync service, shown in
Figure 7.3.

Figure 7.3: Block diagram of the syncd service.

In this work, we continued to develop that implementation, developing new
drivers and addressing some additional system considerations. We
also proved more conclusively that the approach promulgated in the Reference
Broadcast Synchronization (RBS) system design is often the only way to achieve
tight cross–node time synchronization without low–level firmware access, which
is generally impossible to obtain using COTS components.
7.2.1 Conversion–Based Time Synchronization
The Emstar timesync implementation is a departure from many other timesync
implementations, such as NTP [Mil94], as well as many timesync schemes in
the sensor network domain [GKS03] [GR03] [MKS04] [GGS05]. Rather than at-
tempting to synchronize or discipline clocks, the Emstar timesync module allows
the clocks to run freely, instead computing conversion parameters that enable an
application to relate timestamps from different timebases.
This approach has several advantages. First, disciplining a clock often requires
constant, timely adjustments, which can be difficult to engineer in application
software. The most effective clock disciplining approaches are implemented with
specialized hardware, e.g. a phase locking feedback loop using a VCO, frequency
counter, and a DAC, or a temperature compensation circuit. When it is possible
to implement these types of solutions in software, it must be done at a very low
layer of the system.
Second, altering a running clock results in discontinuities that complicate the
correct use of the clock values. To use these clock values correctly, applications
must be aware of a complex array of discontinuities, including the possibility that
the clock jumps backwards. When considering signal processing applications that
assume isochronous time series, this added complexity is a significant headache.
Third, the system can operate in a relative sync mode or stay offline for long
periods of time without introducing problems. Where clock–disciplining solutions
encounter their worst problems with discontinuities when they are forced to run
offline for long periods of time, a conversion–based system can always gracefully
recover. While timestamps recorded in the intervening offline period may not
be accurately converted, once new conversion parameters are computed, recent
timestamps can readily be compared. In addition, a conversion–based approach
does not require a global master clock; any two peers can meaningfully compare
their timestamps to each other without any third–party reference.
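A conversion relation of this kind is simply a rate and an offset, applied on demand in either direction, with neither clock ever being adjusted. The parameters below are invented for illustration.

```python
# Sketch: conversion-based synchronization keeps per-pair linear parameters
# and converts timestamps on demand; the clocks themselves run freely.
class Conversion:
    def __init__(self, rate, offset):
        self.rate, self.offset = rate, offset

    def forward(self, t):
        """Convert a timestamp from clock A's timebase to clock B's."""
        return self.rate * t + self.offset

    def inverse(self, t):
        """Convert a timestamp from clock B's timebase back to clock A's."""
        return (t - self.offset) / self.rate

a_to_b = Conversion(rate=1.000003, offset=12_345.0)   # ~3 ppm rate skew
t_b = a_to_b.forward(1_000_000.0)
t_a = a_to_b.inverse(t_b)
```

Because the relation is invertible, either peer can compare timestamps without a third-party reference, as the text notes.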
This third point highlights the caveat to the conversion–based timesync ap-
proach: because clocks are not linear over long time periods, interpretation of his-
torical timestamps requires historical conversion information. Thus, conversion–
based approaches are plainly better for applications requiring precise timing com-
parisons of recent timestamps. However, this approach is not sufficient for appli-
141
cations that require interpretation of historical timestamps, or that require global
time or frequency references.
The easiest way to address this concern is to use the Emstar conversion–based
timesync system to publish “global time” out to the network from a trusted
time and frequency reference. The Emstar gsyncd module does exactly this; it
uses hop–by–hop local conversions to disseminate global time from one or more
locations. Thus, historical timestamps should be maintained in a global timebase,
while accurate local comparisons can be made directly using local conversion
parameters.
7.2.2 The Timesync API and Time Conversion Graph
The syncd module maintains a time conversion graph in which nodes represent
clocks and edges represent linear conversion functions. This graph includes con-
versions from two different types of data: RBS relations, and “pairs” relations.
An RBS relation relates clocks on two nodes by correlating the measured
times of events, for example of the reception of broadcast packets. The better
correlated the event time estimates, the better RBS will work. That is, if the
process of detecting an event tends to have correlated latency on all nodes, then
RBS will work well, even if the latency varies widely from one event to the next.
The canonical example is the reception of broadcast packets: RBS is immune
to non–determinism in media access times, variations in packet length and
transmit rate, and so on.
A pairs relation correlates an arbitrary clock to a node’s CPU clock. For
example, the relationship between the CPU clock and the sample clock for a
sound card is expressed as a pairs relation. The difference is subtle: where RBS
relations are between two instances of the same observation mechanism, pairs
relations are between two different observation mechanisms. This means that
pairs relations tend to introduce more error, and often have an error distribution
whose mean is not zero, whereas RBS relations can often factor out the
common mechanism, because the error it introduces is correlated across nodes.
These RBS and pairs relations form a connected graph. All of the pairs
relations on a particular node form a star topology around the CPU clock, while
RBS relations link from one node to another. The client side of the timesync
API allows an application to readily convert a timestamp in any known clock to
any other clock that is connected to it through the conversion graph. The server
side of the API allows a service to inject synchronization information that clients
can use.
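The conversion graph can be sketched as follows: clocks are nodes, each edge stores a linear conversion (and implies its inverse), and converting between two clocks composes the conversions along a path. This is our illustration, not the syncd data structure.

```python
# Sketch of a time-conversion graph: nodes are clocks, edges are linear
# conversions; converting composes conversions along the first path found.
from collections import deque

class ConversionGraph:
    def __init__(self):
        self.edges = {}   # clock -> list of (neighbor, rate, offset)

    def add(self, a, b, rate, offset):
        """Record a -> b: t_b = rate * t_a + offset, plus the inverse edge."""
        self.edges.setdefault(a, []).append((b, rate, offset))
        self.edges.setdefault(b, []).append((a, 1.0 / rate, -offset / rate))

    def convert(self, t, src, dst):
        """BFS over clocks, carrying the partially converted timestamp."""
        queue = deque([(src, t)])
        seen = {src}
        while queue:
            clock, value = queue.popleft()
            if clock == dst:
                return value
            for nbr, rate, offset in self.edges.get(clock, []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, rate * value + offset))
        return None   # dst is not reachable from src
```

Pairs relations form the star edges around each CPU clock, and RBS relations supply the inter-node edges.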
For example, in Section 7.1.2 we described how our implementation synchro-
nizes the sound hardware to the CPU clock by submitting observations to the
timesync subsystem as a pairs relation. Once this data is provided to syncd,
applications can convert between the sample clock and the CPU clock, and
thus to any other clock that is “reachable” from that node. By converting
timestamps in packets as they travel through the network, sensor data from re-
mote nodes annotated with timestamps can be precisely matched up with local
sensor data.
One of the improvements we made to the timesync system was to decouple the
portions of syncd that send and receive radio messages, the portion that performs
the linear fit computations, and the portion that services clients and servers.
These changes were necessary to allow the system to scale to larger numbers of
nodes, higher density deployments, and to more complex systems. Figure 7.3
shows a block diagram of the use of timesync in our platform and ranging appli-
cation. During the development of this platform, we discovered that the linear
[Figure 7.4 plot: Correlation of network interrupts to CPU via RBS (RMS = 8.7 µsec). Axes: seconds; left y axis, offset (µS); right y axis, fit error (µS).]
Figure 7.4: RBS correlation of the timing of received broadcasts. This graph shows
that CPU clocks are stable with respect to each other over time periods as long as 20
minutes.
fit computations were blocking the main thread and causing significant system
latency problems. We also encountered a problem with module dependency loops
when we added timesync support to the low layer radio interface driver. Both
of these problems were solved through the use of threads and message queues,
represented in the diagram by dashed lines.
7.2.3 RBS vs. MAC Layer Timestamps
During the development of this platform, we discovered that our existing
RBS–based synchronization was not performing as well as it had on other plat-
forms. Figure 7.4 shows the performance of RBS synchronization on our platform,
based on correlating interrupt arrival times on different nodes. The plot shows
[Figure 7.5 plots: (a) Correlation of two MAC clocks using RBS (RMS = 1.2 µsec); (b) Correlation of the MAC clock to the CPU clock (RMS = 393.6 µS). Axes: seconds; left y axes, offset (µS); right y axes, fit error (µS).]
Figure 7.5: The MAC clocks appear to actively adapt their rates, rather than main-
taining frequency stability: (a) shows a central mode with perfect rate matching, while
(b) shows that the frequency of the MAC clock is unstable when referenced to the CPU
clock. But we know from Figure 7.4 that the CPU clocks are stable with respect to
each other.
[Figure 7.6 plot: Frequency stability of MAC–level timestamps; observations after applying linear correction. Axes: seconds in CPU time; y axis, µSec.]
Figure 7.6: Expanded plot of MAC timestamps showing high levels of noise.
seconds on the x axis, and shows the residuals from the linear fit as impulses ac-
cording to the scale of the right–hand y axis (see Section 7.1.2 for a more detailed
explanation of these plots).
Where we had been able to achieve fit errors of a microsecond with our previ-
ous iPAQ platform and Orinoco 802.11 cards, we found that our performance on
the Slauson with SMC 802.11 cards was 15–30 microseconds fit error given the
same averaging parameters. We found that we could improve this situation by
extending the averaging period, but even this could only improve the fit error to
about 5–10 microseconds. The problem appeared to be that, compared with the
Orinoco firmware, the generation of interrupts by the Prism II firmware was less
deterministically correlated with packet reception. As a result, we were seeing a
great deal more noise in the data, which required longer averaging intervals to
correct.
In an effort to improve upon this, we implemented a new in–kernel inter-
face that exposed MAC layer header information about each packet, including
a microsecond–granularity MAC layer timestamp. We hoped that by using this
timestamp we could get a more precise RBS relation directly between cards, and
we could then create a comprehensive pairs relation from the MAC clock to the
CPU clock. The advantage of this approach would be that we could use every
arriving packet to improve the pairs relation, and the MAC timestamp would
be highly accurate because of its low–level source. In addition, by breaking the
conversion into two components, the MAC–MAC conversion layer and the MAC–
CPU conversion layer, conversions through multiple radio hops could remain in
terms of MAC clocks directly, thus avoiding the need to convert through the
higher–error MAC–CPU relation on each hop.
This seemed like a promising plan, but it ran into an interesting but fatal flaw.
Figure 7.5 describes the performance of the two components of our MAC layer
synchronization experiment. The upper graph shows the result of correlating the
MAC timestamps of broadcast packets received on two nodes (the MAC–MAC
conversion). The lower graph shows the result of correlating the MAC clock
to CPU clock (the MAC–CPU conversion) based on relating interrupt times to
packet timestamps. In both graphs, the x axis is seconds of real time, and the
right–hand y axis shows the residual of each point against the linear fit.
These graphs display some interesting properties. Considering first the graph
of MAC–MAC correlation, we see that it contains several widely–spaced modes,
with a very tight central mode. The syncd linear fitting algorithm automatically
performs outlier rejection, and in this case rejected the other modes as outliers,
leaving only the central mode and a very tight fit of 1.2 µS average error. However,
the fact that the spread of outliers exceeds 100 µS is worrisome.
The other interesting fact about the MAC–MAC correlation is that the rate is
exactly matched, with a computed rate skew of 44 picoseconds per second. Given
the large outliers, this perfectly matched rate casts more doubt on the validity of
the MAC timestamps.
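The outlier rejection mentioned above can be sketched as an iterative trim around a least-squares line; the rejection threshold and iteration count here are invented, not syncd's actual parameters.

```python
# Sketch: iterative outlier rejection around a linear fit. Points whose
# residual exceeds k * RMS are dropped and the line is refit.
def robust_fit(xs, ys, k=2.5, iters=3):
    pts = list(zip(xs, ys))
    for _ in range(iters):
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        rate = sum((x - mx) * (y - my) for x, y in pts) / sxx
        offset = my - rate * mx
        resid = [y - (rate * x + offset) for x, y in pts]
        rms = (sum(r * r for r in resid) / n) ** 0.5
        kept = [p for p, r in zip(pts, resid) if abs(r) <= k * rms]
        if len(kept) == len(pts) or len(kept) < 2:
            break
        pts = kept          # refit on the surviving points
    return rate, offset
```

On data like the MAC–MAC correlation, a scheme of this kind keeps the tight central mode and discards the widely spaced outlying modes.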
Now considering the graph of MAC–CPU correlation, we see a very non–linear
behavior. Over the course of 300 seconds, the rates of the two clocks have varied
greatly with respect to each other, resulting in large errors over this averaging
interval. A valid linear approximation of the MAC clock can tolerate averaging
intervals of at most a few seconds. We know that this frequency instability does
not come from the CPU clocks, because our CPU–CPU correlation shown in Fig-
ure 7.4, demonstrates very linear behavior over 1200 seconds. Worse still, we can
see from Figure 7.6 that in addition to having very poor frequency stability, they
also have large amounts of noise, with frequent spikes greater than 50 microsec-
onds. The combination of these two factors is disastrous, since the noise can only
be corrected if we can assume that the clock will remain linear.
We did not perform any experiments to carefully measure these clock prop-
erties against a ground truth frequency source, and we do not know precisely
why the MAC clock performs poorly. One hypothesis is that in ad–hoc mode the
802.11 cards continually sync to each other’s clocks, for example by training the
clock in response to incoming packets. This would explain our observation that
the RBS sync among MAC clocks reported a rate skew of essentially zero, suggesting that
the MAC clocks are keeping their rates synchronized to each other. As an object
lesson, this demonstrates once again the value of the RBS approach, which,
rather than relying on features of the hardware, is largely blind to its
implementation details.
7.3 Hop–by–Hop Time Conversion
Thus far we have discussed the design and application of our synchronized sam-
pling layer, but we have not discussed in detail how this works in a multihop
network. As we have seen, the synchronized sampling layer provides a network
of time conversions on each node, linking local clocks such as the sample clocks
of the sensors and the CPU clock, and also linking to CPU clocks on neighboring
nodes. However, we have not addressed connections to nodes more than one hop
away.
We could solve the multihop case in two ways. The first possibility would be
to publish the neighborhood conversion information throughout the network, so
that any node could convert from any other node’s clock to its own. This has
the disadvantage of being costly in terms of network traffic, and also does a large
amount of unnecessary work in the event that conversions are not needed. In ad-
dition, since time conversion data does not remain valid forever, new conversions
would constantly need to be flooded throughout the network.
Given these drawbacks, we chose to add hooks into the routing layer that
would convert packets in flight. In other words, when a packet containing a
timestamp is sent by an application, that timestamp is modified at every hop to
convert it into the local timebase. Any packet that is successfully converted can
be forwarded on to its destination. These hooks are implemented in a library
that all routing modules invoke to process packets. The library understands
certain packet types; to implement hop–by–hop conversion for a new application,
the developer need only modify that library to add the appropriate conversion
hook3.
3 The Directed Diffusion API [IGE00] provides a more elegant solution to the problem of modifying packets at every hop. Our system could take a similar approach, but it is not clear whether the additional generality would be worth the increase in complexity.
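The effect of the hook can be sketched as follows: at each hop the timestamp in the packet is rewritten into the next node's timebase, so it always arrives expressed in the local clock. Node names and conversion parameters below are hypothetical.

```python
# Sketch: hop-by-hop timestamp conversion along a route. Each link carries
# (rate, offset) parameters mapping the upstream node's timebase to the
# downstream node's; the packet's timestamp stays in the current hop's clock.
def forward_packet(packet, route, conversions):
    """route: ordered node names; conversions[(a, b)] = (rate, offset)."""
    for a, b in zip(route, route[1:]):
        rate, offset = conversions[(a, b)]
        packet["timestamp"] = rate * packet["timestamp"] + offset
    return packet
```

Only packets that actually carry timing information pay the conversion cost, and only along the source-to-destination path, matching the scaling argument in the text.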
This solution has the benefit of using the latest conversion data available
(since it is getting it straight from the source), as well as only doing work when it
is actually needed—when timing information needs to be transferred through the
network. It also is a localized algorithm; since all time conversions are computed
within a neighborhood and since the packet need only travel from the source
to the destination, this solution scales with the number of hops from source to
destination, independent of the overall size of the network.
The gsyncd service mentioned in Section 7.2.1 provides a similar solution to
this problem by pushing out timestamps and converting them at every hop to
compute conversions at every node to a single global time reference. gsyncd
builds a global sync tree from one or more global time references to every node,
minimizing the error in the conversion paths from the root [KEE03].
This solution is a good alternative to hop–by–hop conversion, although in
many cases it would not perform as well as converting along the direct path
from source to destination, because it would involve more conversions. Using
gsyncd, all conversions take the path from the source up the global sync tree
to the branch where the path to the destination joins the tree, and then back
down the tree to the destination. However, using routing with integrated hop–
by–hop time conversion, conversions only occur along the path from the source
to destination, which is often a shorter path than the path through the root. The
other disadvantage of using the gsyncd approach is that it requires the additional
coordination of a global sync broadcaster, without which the system would fail.
CHAPTER 8
Multihop Wireless Layer
One of the most challenging aspects of wireless embedded networks is the de-
sign and development of the communications stack. This has been an active
area of research in the field, from ad–hoc IP routing protocols [PB94] [JM96]
[PR99] [CJB01] to a wide variety of protocol stacks designed for sensor net-
works [WTC03a] [IGE00]. The principal problem is that many of the abstrac-
tions that were developed in the context of the wired Internet no longer work
well when applied to wireless embedded systems. Given the relative youth of
the wireless embedded networking field, this leaves us with the difficult task of
developing all of the layers of the stack more or less from the ground up.
8.1 How Wireless is Different
Wireless networks differ substantially from wired networks. First, wireless net-
works tend to have lower capacities than wired networks. At any point in
time, wireless network technology tends to lag behind the equivalent wired tech-
nology in terms of capacity. For example, the current typical wired Ethernet
chipsets range from 100–1000Mb/sec, while typical wireless rates range from 11–
54Mb/sec. In addition, energy conservation is an important part of many
embedded wireless applications, and both transmitting and listening have
significant energy costs, thus encouraging the development of more efficient
protocols.
Second, wireless networks are significantly less reliable than wired networks.
This is true not only in terms of the probability of packet loss, but also the prob-
ability of a failure of contention–avoidance mechanisms such as CSMA. This fact
is one of the primary reasons that the TCP protocol performs poorly over wireless
links: TCP erroneously interprets packet loss as an indicator of congestion and
backs off its transmission rate.
Third, unlike wired networks, wireless networks do not have a pre–defined
topology. Rather, each pair of nodes has some probability of transmission success
in the absence of other traffic, and that probability varies as a function of time.
Some pairs that exhibit a very low probability of success might be considered to
be “disconnected”, but even with low probability some packets will get through,
and in addition that status may change over time. To make matters worse,
links are not always symmetric. Hardware differences and asymmetries in the
noise environments can yield links with radically different loss rates forward and
reverse. There are many characterizations of wireless networks in the literature,
such as [CWP05] [ZG03] [WTC03b] [CAB03].
This means that, relative to wired networks that for the most part make
routing decisions purely on the basis of whether a link is up or down, routing
protocols in wireless networks have to:
• Continuously estimate the characteristics of links to neighbors, while dis-
counting the effects of collisions.
• Continuously agree upon a multihop topology.
• Route data as required by the application, while minimizing overhead.
Building such a system is a daunting task, because to make it work requires a
vertical solution that addresses all of these problems at once. This task is made
more difficult because much of the prior work in existing layered protocol designs
does not apply.
In this work, to narrow the scope of this problem we have attacked it from the
perspective of building StateSync, an efficient vertical protocol implementation
that provides a simple publish/subscribe API. In the process of developing several
implementations of this protocol, we learned a little more about the shape of a
more general protocol stack for wireless networks. We discuss this work in the
remainder of this Chapter.
8.2 The StateSync Abstraction
As the field of embedded networked sensing matures, useful abstractions are
emerging to satisfy the needs of increasingly complex applications. As a part of
the implementation of our acoustic position estimation application, we developed
StateSync, an abstraction for reliable dissemination of application state through
a multihop wireless network.
The StateSync layer presents a publish/subscribe interface to a set of application–
defined tables. The contents of these tables are reliably and efficiently broadcast
a specified number of hops away, using a protocol that is robust to changes
to the network topology and changes in the receiver set. StateSync conforms to
a minimal consistency model for received values published by a single node, but
does not attempt to guarantee consistency between received values published by
different nodes. Using StateSync, the complexity of the multihop wireless net-
work is reduced to processing a gradually evolving set of table entries, subject to
certain minimal consistency checks.
8.2.1 Application Requirements
Embedded networked sensing applications inherit a long list of application re-
quirements that are more or less unique among distributed systems. The main
distinguishing characteristic is a high degree of dependence on the environment,
in the face of dynamic conditions and a limited capability to discover environ-
mental properties with certainty. Properties of the environment often affect both
system performance and the application’s objectives, and thus must be estimated
to achieve the system’s goals. These issues are at the heart of the design of suc-
cessful system components for embedded networked sensing applications.
We designed StateSync to extend the ideas of previous abstractions [WSB04]
and protocols [LPC04] to support a specific class of applications. These
applications have the following properties:
• Reliable delivery greatly simplifies the design of the application.
• A relatively large amount of data is shared, and freshness of the data is
important, including assurance that the publisher of data is still active.
• The data being shared exhibits low “churn”, meaning that the expected
lifespan of a data element is long compared with the system latency re-
quirements.
Our acoustic position estimation system is a good match to these properties.
Our system needs to disseminate range estimates throughout the network in or-
der to fuse them into a coordinate system. These range estimates tend to stay
constant as long as the nodes do not move, but might change drastically in the
event that a node is disturbed. Reliability is important for this application, be-
cause after a change to the position of one of the nodes, inconsistent or stale data
can present problems for the multilateration algorithm. The range data in this
application tends to have long lifespans, often going for hours or days without
modification. When modifications do occur, they often affect only a small frac-
tion of the data being published by a given node. Despite these long lifespans, low
latency is desirable, because additional latency in the propagation of updates di-
rectly affects the application–level performance of the position estimation system
by delaying position updates.
Building applications over the StateSync abstraction not only greatly sim-
plifies the implementation of applications, but also provides opportunities for
efficiently aggregating application state changes. Other examples of services and
applications that can benefit from this type of layer are routing protocols, con-
figuration and calibration mechanisms, and membership agreement protocols.
This work is similar to prior work in the wired network domain, including
ISIS [BC91], SRM/WB [FJL97], and implementations of Linda [GB82]. However,
these techniques make assumptions about Internet and LAN performance and
connectivity properties that do not hold for ad–hoc wireless networks. Our work
is designed to provide similar abstractions, but the protocol and implementation
is designed from the ground up for embedded ad–hoc wireless networks.
8.2.2 The StateSync Abstraction
The StateSync abstraction defines the data model, the API, and the semantics
of StateSync. StateSync imposes a simple data model of typed key–value pairs.
The data types are user–defined and can either specify a fixed record and key
length, or use variable record and key lengths. The key–value pairs are implicitly
annotated with a flow ID that includes a unique address for the publisher and
other user–definable fields. This additional implicit key effectively assigns each
Figure 8.1: Publisher applications push tables of key–value pairs to StateSync, which
disseminates them and delivers the complete table of all received keys to subscribers
whenever a change occurs.
publisher an independent key–space. At most one value is permitted per key:
when a pair is published with the same key, type, and flow ID as an existing pair,
the original pair is replaced.
The StateSync API presents a Publish/Subscribe interface. A publisher pro-
vides StateSync with a complete set of keys for a given type and flow ID to replace
all existing keys for that type and flow ID. A subscriber will receive events when-
ever there is any update in the data matching a specified type. The complete
data matching that type can be retrieved from StateSync, combined from all flows
that reach the subscriber. Each key–value pair is annotated with the flow ID of
the publisher of that data, as well as other metadata such as the arrival time and
the distance to the publisher in hops. For fixed–length records, simple arrays of
records are passed to and from the API. Figure 8.1 shows a block diagram of the
StateSync API.
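The data model and API just described can be condensed into a short Python sketch. The class and method names here are ours for illustration and do not correspond to the actual implementation, which also annotates each pair with metadata such as arrival time and hop distance (omitted here):

```python
from collections import defaultdict

class StateSyncSketch:
    """Toy model of the StateSync data model and publish/subscribe API."""

    def __init__(self):
        self.tables = {}                      # (type, flow_id) -> {key: value}
        self.subscribers = defaultdict(list)  # type -> list of callbacks

    def publish(self, type_, flow_id, pairs):
        # A publisher hands over the *complete* key set for this type and
        # flow ID, replacing all previously published keys for that pair.
        self.tables[(type_, flow_id)] = dict(pairs)
        self._notify(type_)

    def lookup(self, type_):
        # A subscriber sees the union of all flows of the given type that
        # reach it; each pair is annotated with its publisher's flow ID,
        # which gives each publisher an independent key space.
        merged = {}
        for (t, flow_id), table in self.tables.items():
            if t == type_:
                for key, value in table.items():
                    merged[(flow_id, key)] = value
        return merged

    def subscribe(self, type_, callback):
        self.subscribers[type_].append(callback)

    def _notify(self, type_):
        # Deliver the complete merged table whenever any flow changes.
        for cb in self.subscribers[type_]:
            cb(self.lookup(type_))
```

As in Figure 8.1, a subscriber is re-delivered the complete table of all received keys on every change, rather than a stream of individual updates.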
The StateSync mechanism provides semantics that are designed to be relaxed
enough to be implemented efficiently in a wireless network, while still maintaining
useful properties. The StateSync subscribe interface presents only the most recent
state to a subscriber; it does not present each intermediate published state. This
policy eliminates the need to retain a backlog or a complete history in the event
of lengthy disconnection. In addition, StateSync guarantees that each state pre-
sented at a subscriber was in fact an actual prior state of the publisher. That is,
the view at the subscriber is never a partial state of the publisher (such as would
occur if a sequence of updates were played out of order). Third, the latency with
which a state propagates from publisher to receiver conforms to a probabilistic
latency bound that is a function of the number of hops, the size of the transfer,
and timers in the implementation. StateSync deliberately relaxes any guarantee
of consistency across disparate publishers. Consistency is guaranteed across the
set of receivers of a given published state, after no change has occurred for the
expected latency bound for the farthest node.
8.2.3 Related Work
The design of StateSync builds on the observations and experience of many past
and present systems in sensor networks. The importance and value of a neighbor-
hood abstraction was clearly laid out in the discussion of Hood [WSB04]. Hood
provides a way to approach several important concepts about neighborhoods,
and provides a best–effort transport layer. StateSync provides a similar API to
Hood, but extends its scope by defining a model that includes reliable delivery
over multiple hops. The Hood and StateSync solutions in some ways address
orthogonal application properties. Whereas Hood is designed to share ephemeral
data in a best–effort fashion, StateSync is designed to share long–lived data with
very low quiescent cost. Each of these solutions advances a significant space of
applications.
Relative to much prior work that presents very generalized solutions to problems
in distributed systems, StateSync defines a narrower set of properties, which
nonetheless represent a large application space. The StateSync API draws upon
prior experience with Publish/Subscribe interfaces in the context of Directed
Diffusion [IGE00] [HSE03] and other early work in Sensor Networks. However,
StateSync imposes more structure than a simple raw data interface, providing
an interface supporting application–defined fixed–length tables. The StateSync
data model of typed key–value pairs draws on experience with Tuple space sys-
tems such as Linda [GB82]. However, StateSync relaxes most of the locking and
group consistency semantics, because group consistency is generally too heavy–
weight for the wireless networks StateSync is designed to support. The StateSync
implementations build upon Diffusion Trees and upon work in reliable multi-
cast [FJL97], but encapsulate most of the protocol details behind an interface
that is fairly implementation–independent.
StateSync’s focus on maintaining a low quiescent cost of state synchroniza-
tion bears much resemblance to the Trickle [LPC04] protocol for code update on
TinyOS motes. In implementing our algorithms, we focused on low latency oper-
ation, efficient support for many concurrent publishers, and prompt detection of
the disappearance of a publisher. Trickle is designed for higher latency tolerance,
and while Trickle can support multiple trees, the costs scale with the number of
trees. The “polite gossip” mechanism of Trickle is a very effective way to reduce
quiescent cost of maintaining state, but unfortunately the savings is incompatible
with detecting source disappearance.
8.3 Variants of StateSync
In our exploration of the StateSync abstraction, we developed several variants
of varying complexity and with different performance characteristics in terms of
latency and network traffic. Since each variant conforms to a common API, we
can readily compare them in the context of different applications.
In this section we present three StateSync variants, in increasing order of
sophistication: SoftState, LogFlood, and LogTree. SoftState is a very simple im-
plementation based on periodic re–flooding of the complete state with no retrans-
mission mechanism. LogFlood introduces a log mechanism to enable publication
of updates to existing state and implements a local retransmission protocol, while
using a flooding mechanism to push data with low latency. LogTree introduces
an overlay network consisting only of the most reliable bidirectional links, and
forms distribution trees via that overlay. These variants are discussed in more
detail in the following sections.
8.3.1 SoftState
SoftState implements a periodic refresh of the complete state published by each
node. Each refresh is transmitted via a best–effort flooding service and is received
by nodes a specified number of hops away. If the complete state is larger than
a single MTU, the message is fragmented and reassembled across each hop. No
other form of reliability is implemented, so as the state size grows the latency of
SoftState increases rapidly. The latency of updates is a function of the refresh
interval and of the probability of message loss, which is in turn a function of total
state size.
SoftState is a very simple variant of StateSync with numerous drawbacks—
for example, its quiescent cost is high for most applications. However, it is
sufficient for some applications, and it can be readily implemented on low–end
platforms. An application that publishes only small amounts of data and can
accept the bandwidth / latency tradeoff can use this protocol. SoftState is also
appropriate for applications with high “churn” relative to latency requirements.
If the expected lifetime of the data being published is on the order of the required
refresh interval, then there is little to be gained by transmitting only the portions
of the state that have changed.
8.3.2 LogFlood
The LogFlood variant introduces two important mechanisms that enable higher
efficiency and allow StateSync to be applied to a much larger space of applications.
The first is a log mechanism that stores and transmits published data in the
form of a log of additions and deletions of key–value pairs. This log enables the
data to be broken down into small segments and transmitted and re–transmitted
piecemeal. The second is a local retransmission protocol that can request missing
segments from a neighbor based on sequence numbers. In the following sections,
we will show that these two mechanisms enable much larger amounts of state to
be transmitted efficiently.
8.3.2.1 The StateSync Log Scheme
As we have described in Section 8.2, StateSync is based on a key–value data
model and the API is tuned to support tables of fixed length key–value pairs.
These design decisions fit neatly into a log–based transport scheme, because they
enable the application to define the granularity at which changes typically occur,
and specify precisely which parts of the existing state need to be re–transmitted.
The StateSync log scheme is designed to provide correctness with low overhead
and to support a continuous stream of log entries. The StateSync log is composed
of a sequence of variable–length entries containing a 16–bit sequence number and
a command field. The first entry is always an INIT command, and has sequence
number 0. The INIT message contains a 64–bit log sequence number that is
chosen randomly by each node on boot and is incremented whenever a new log is
created. This sequence number is used to protect StateSync against inconsistency
from reboots or stale data.
Following the INIT command, a sequence of ADD and DEL entries represent
the addition and deletion of keys. An ADD entry adds a new key and value
to the state published by a given node, replacing any previous entry with the
same key. A DEL entry removes an existing key and value from the published
state. Additional command types are used to fragment large entries that might
otherwise exceed the network MTU.
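To make the log semantics concrete, the sketch below replays a log into the state it represents, using the example entries from Figure 8.2. The (seq, command, key, value) tuple encoding is invented for illustration; real entries are binary and variable–length:

```python
def apply_log(entries):
    """Replay a StateSync-style log into the published state it encodes."""
    state = {}
    for seq, cmd, key, value in entries:
        if cmd == "INIT":
            assert seq == 0     # INIT is always entry 0; value is the log seq number
            state = {}
        elif cmd == "ADD":
            state[key] = value  # replaces any previous pair with the same key
        elif cmd == "DEL":
            state.pop(key, None)
    return state

# Example matching the checkpointed log in Figure 8.2:
log = [(0, "INIT", None, 2367), (1, "ADD", "x", 1), (2, "ADD", "y", 4),
       (3, "DEL", "x", None), (4, "ADD", "y", 6)]
```

Because ADD and DEL operate at the granularity of whole key–value pairs, any prefix-complete replay of the log yields a valid past state of the publisher.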
Unlike protocols like TCP that use byte ranges, sequence numbers in a StateSync
log are assigned at the granularity of log entries. The reason for this design
choice is two–fold. First, sequencing at a larger granularity reduces the required
size of the sequence numbers, and thus reduces protocol overhead. Second,
by always transmitting whole entries rather than byte ranges, the log entries
can be processed by the application out of order, as in application layer fram-
ing [FJL97]. The drawback of this scheme is that, unlike the case of IP fragmenta-
tion, StateSync log entries cannot be adaptively fragmented “in flight”. Instead,
a predefined granularity must be selected at design time, taking into account the
MTU of the networks in the system and the expected size of the values published
by the application. While the choice of granularity can impact the utilization of
packets, in practice we have been able to use a single default value for all of our
development.

Checkpointed Log: INIT(2367), ADD x=1, ADD y=4, DEL x, ADD y=6, TERM
Active Log: INIT(2368), ADD y=6, ADD z=3, ADD w=9, DEL y, ADD z=3, ADD s=6, ...

Figure 8.2: The StateSync Log Scheme maintains a checkpointed and an active log.
In the diagram, the first two ADD entries in the active log are carried over from the
checkpointed log after the redundant entries have been compressed out.
The other key design problem for the StateSync log mechanism is how to
address the problem of an infinitely growing log. While ADD and DEL commands
often make a previous log entry redundant, those redundant log entries cannot be
deleted without forfeiting the semantic requirement that StateSync subscribers
always see a valid past state of the publisher. In addition, as state changes
occur, an increasing fraction of the sequence space will be consumed by redundant
entries. Given StateSync’s relatively small 16–bit sequence numbers, this can lead
to sequence number exhaustion. To address this we apply a solution similar to
the “new page” abstraction implemented by the WB application [FJL97].
Each StateSync log maintains two sub–logs: a checkpointed log and an active
log, as shown in Figure 8.2. New additions to the log are always appended to the
active log. When certain conditions are met—such as a maximum level of redun-
dancy in the log—the active log is “checkpointed”. A special TERM command is
appended to the active log, and it is rotated into the checkpointed slot. A new
active log is formed by incrementing the log sequence number and compressing
the previous active log, renumbering the entries starting from sequence 0.
The checkpointing process addresses the problem of infinite logs at minimal
cost. The only cost of the scheme is an additional TERM entry; once the ter-
minated log is received completely, the checkpointing process is a local opera-
tion that does not require any additional network traffic. As an optimization,
StateSync will queue out–of–order entries that pertain to the new active log be-
fore checkpointing is complete.
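Under the same illustrative (seq, command, key, value) tuple encoding used earlier, the checkpoint rotation might be sketched as follows; the compression policy, keeping only the last surviving value of each key, is what preserves the guarantee that subscribers see only valid past states:

```python
def checkpoint(active_log, log_seq):
    """Terminate the active log and build a new, compressed active log,
    renumbered from sequence 0 under an incremented log sequence number.
    Sketch only; the real logs hold binary, variable-length entries."""
    # Append the special TERM entry and rotate into the checkpointed slot.
    terminated = active_log + [(len(active_log), "TERM", None, None)]
    # Compress out redundant entries: keep the last surviving value per key.
    state = {}
    for _seq, cmd, key, value in active_log:
        if cmd == "ADD":
            state[key] = value
        elif cmd == "DEL":
            state.pop(key, None)
    new_log = [(0, "INIT", None, log_seq + 1)]
    for i, (key, value) in enumerate(state.items(), start=1):
        new_log.append((i, "ADD", key, value))   # entries carried over
    return terminated, new_log
```

Once the terminated log has been received completely, this is a purely local operation, matching the claim that checkpointing requires no additional network traffic beyond the TERM entry.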
8.3.2.2 The StateSync Retransmission Protocol
Once the state data is organized in a sequenced stream of small blocks, we can
implement a local retransmission protocol. Similar to many reliable multicast
protocols, StateSync’s retransmission protocol is receiver–driven with proactive
broadcast as an optimization [FJL97]. Receivers add received entries to their logs
and maintain state about which log entries are missing based on sequence num-
ber gaps. Receivers then schedule NACK requests for specific missing sequence
ranges, with an initial delay followed by an exponential backoff. Optimizations
such as NACK suppression and more sophisticated timers such as in [FJL97] are
not currently implemented.
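The receiver side of this protocol reduces to detecting gaps in the sequence space; a minimal sketch, ignoring 16–bit wraparound and the backoff timers, might look like:

```python
def missing_ranges(received_seqs, highest_known):
    """Return the (start, end) sequence ranges a receiver should NACK,
    given the set of sequence numbers received so far and the highest
    sequence number known to exist. Simplified: real StateSync sequence
    numbers are 16-bit and wrap."""
    gaps, start = [], None
    for seq in range(highest_known + 1):
        if seq not in received_seqs:
            if start is None:
                start = seq              # opening a new gap
        elif start is not None:
            gaps.append((start, seq - 1))  # gap closed by a received entry
            start = None
    if start is not None:                # gap extends to the highest known entry
        gaps.append((start, highest_known))
    return gaps
```

Each returned range would then be scheduled as a NACK request with an initial delay followed by an exponential backoff, as described above.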
The wire protocol used by StateSync is designed to be efficient in terms of
network usage and flexible in terms of packet structure. The packet format does
not have a pre–defined header structure, but rather is composed of a series of
variable–length entries, similar to other proposed wire formats [GKE04] [BFH03]
[IGE00]. As a result, this flexible structure exhibits lower overhead and is also
more amenable to piggybacking on other traffic. The wire protocol incorporates
numerous optimizations, such as the ability to define a length field that applies
to several subsequent log entries, or a sequence number that applies to several
subsequent NACKs. For example, the overhead of sending 20 sequential range
entries in our acoustic localization application is 25 bytes beyond the 400 bytes
of data.
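The amortization behind that 25–byte figure can be illustrated with a shared-header run encoding: one length field (and one starting sequence number) covers a run of equal-length sequential entries. The field layout below is invented for illustration and is not the actual StateSync wire format:

```python
import struct

def pack_run(entry_type, first_seq, records):
    """Pack a run of equal-length, sequentially numbered records under a
    single shared header (type, count, record length, first sequence),
    instead of repeating a header on every record."""
    assert len({len(r) for r in records}) == 1   # all records equal length
    assert len(records) < 256                    # count fits in one byte
    header = struct.pack("!BBHH", entry_type, len(records),
                         len(records[0]), first_seq)
    return header + b"".join(records)

# Twenty 20-byte range records cost a single 6-byte header in this encoding.
payload = pack_run(0x01, first_seq=100, records=[b"\x00" * 20] * 20)
```

Per-record headers would instead repeat the length and sequence fields twenty times, which is the overhead the shared fields avoid.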
8.3.2.3 LogFlood Multihop Implementation
Given the log and protocol mechanisms described above, the LogFlood multihop
implementation is straightforward. First, the retransmission protocol is extended
to include the flow–id and the current hopcount of the data that follows. The
flow–id identifies the publisher–subscriber pair and any additional de–multiplexing
bits. In this case, the publisher is identified by a network–layer address and the
subscriber is always “broadcast N–hops”. Because of the flexible structure of the
wire protocol, entries from multiple flows can be packed into a single message.
With this minor change, a simple state machine can implement the multihop
flooding protocol. Incoming messages are parsed to extract the flow they pertain
to, the hopcount, and the log entries comprising the data. Any messages that
are not already present in the log and that are not beyond the maximum desired
hopcount are scheduled for retransmission. The hopcount of a flow is determined
by recording the lowest hopcount of incoming messages on that flow, and adding 1.
When the next transmission is scheduled, any outgoing entries are concatenated
with their flow–id’s and hopcounts into a single packet and broadcast out to
neighbors.
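This forwarding state machine can be sketched as a per-message handler; the data structures (node.logs, node.flow_hops) are hypothetical names for the state described above, and retransmission handling is omitted:

```python
def handle_message(node, flow_id, msg_hopcount, entries, max_hops):
    """Process one incoming flood message for one flow and return the
    entries to schedule for rebroadcast. Sketch of the LogFlood flood."""
    # A flow's hopcount is the lowest hopcount seen on incoming messages,
    # plus 1.
    node.flow_hops[flow_id] = min(
        node.flow_hops.get(flow_id, msg_hopcount + 1), msg_hopcount + 1)
    log = node.logs.setdefault(flow_id, set())
    to_forward = []
    for seq, entry in entries:
        if seq not in log:           # skip entries already present in the log
            log.add(seq)
            # Forward only if we are inside the maximum desired hopcount.
            if node.flow_hops[flow_id] < max_hops:
                to_forward.append((seq, entry))
    return to_forward  # entries from multiple flows share one broadcast packet
```

When the next transmission fires, the returned entries are concatenated with their flow–ids and hopcounts into a single broadcast, which is how floods from different sources piggyback on the same packets.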
This simple state machine, in addition to the local retransmission protocol,
implements an efficient many–to–many flood that can piggyback floods from dif-
ferent sources onto the same packets. However, it is not guaranteed to be reliable;
Figure 8.3: A screen shot from EmView displaying the wireless testbed deployed in our
building. The scale of the map is 5 meters per grid square.
if the last packet is lost, the retransmission protocol cannot discover that there is
a sequence number to NACK. To solve this issue, LogFlood also floods a periodic
refresh message, beginning a fixed time after the last new log entry was flooded.
These messages are small and can be piggybacked as described above, but still
represent a significant quiescent transmission overhead. They also place limits on
join latency. The quiescent cost scales roughly as nk where n is the total number
of nodes and k is the number of nodes in the flood radius; the join latency is
determined by the refresh rate.
8.3.3 LogTree
The LogTree variant builds on the log scheme and local retransmission protocol
described in Section 8.3.2. However, where LogFlood used a flooding protocol for
proactive dissemination and end–to–end reliability, LogTree implements a distri-
bution tree for each publisher in order to reduce redundant transmissions without
significantly impacting latency. LogTree also reduces the quiescent cost of the re-
liability mechanism to 1 message per node per refresh interval, compared with
k messages per node for LogFlood. To accomplish this, LogTree introduces an
underlying layer called ClusterSync.
8.3.3.1 ClusterSync
The ClusterSync mechanism serves two functions. First, it estimates the topology
of the network and constructs an overlay network consisting only of links that
meet certain criteria. Second, it provides a single–hop version of StateSync, with
the same API and semantics.
To form the overlay, ClusterSync uses a link estimator and periodic beacons to
discover the topology of the network and to continuously estimate link quality. It
uses a link estimator called RNPLite that consumes one additional byte of over-
head per packet and computes link estimates based on the principles in [CWP05].
ClusterSync uses the link estimates to select links for the overlay that meet cer-
tain criteria, including bidirectionality, a minimum link quality metric, and a
connectivity metric that prefers neighbors with distinct neighbor sets.
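A sketch of this selection step follows. The constants and scoring below are illustrative: the text names the criteria (bidirectionality, a minimum link quality, and a preference for neighbors with distinct neighbor sets) but not the exact values used by RNPLite and ClusterSync:

```python
def select_overlay_links(links, min_quality=0.8):
    """Choose overlay links from link-estimator output. `links` maps each
    neighbor to (forward_quality, reverse_quality, its_neighbor_set).
    Hypothetical sketch, not the actual ClusterSync policy."""
    my_neighbors = set(links)
    scored = []
    for nbr, (fwd, rev, nbr_set) in links.items():
        if fwd < min_quality or rev < min_quality:
            continue   # reject weak and one-way (asymmetric) links
        # Prefer neighbors whose neighbor sets add coverage beyond our own.
        distinct = len(set(nbr_set) - my_neighbors - {nbr})
        scored.append((distinct, nbr))
    return [nbr for _dist, nbr in sorted(scored, reverse=True)]
```

Filtering on both directions of the link is what guarantees the overlay contains only bidirectional links suitable for request/response traffic such as NACKs.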
The single–hop version of StateSync uses the same log scheme and retransmis-
sion protocol as other versions of StateSync. End–to–end reliability is achieved
by each node periodically including its latest sequence number in the beacon
message it sends for link estimation. When other ClusterSync traffic is present,
beacon messages and sequence numbers are piggybacked on existing traffic.
The ClusterSync mechanism has many advantages that pertain to applica-
tions. Many applications benefit from ClusterSync’s stable overlay network and
prompt detection of topology changes. While topology information is not always
necessary to the correctness of an application, it often simplifies the application
and results in greater responsiveness. The stable overlay also presents a more
stable definition for the hopcount used to limit the scope of state dissemination.
From an application’s perspective, it is often more important that the receiver set
be stable than that they be a specific “distance” away. ClusterSync also provides
an efficient way to reliably disseminate state variables to immediate broadcast
neighbors. ClusterSync will provide the greatest performance improvement to
applications that need to publish keys with long lifespans, since the up–front cost
of reliable transfer is amortized by higher efficiency during quiescent periods.
8.3.3.2 LogTree
LogTree is a multihop StateSync variant that builds distribution trees to yield
a performance improvement over LogFlood. It builds its trees in the overlay
topology constructed by ClusterSync, and uses ClusterSync to publish routing
and flow metadata.
LogTree implements a distance vector algorithm that computes a route and
a number of hops back to each publisher. The route to each publisher is used
to select a peer for requesting local retransmissions and the hopcount to the
publisher is used to determine whether or not to proactively forward new data.
Each node also advertises its “preferred” upstream peer for transmission, which
is used to prune the proactive distribution tree. All of this routing metadata (i.e.
flow ID, hopcount, and preferred upstream peer) is published to adjacent nodes in
the overlay network through the ClusterSync mechanism. Because ClusterSync
is reliable, LogTree only needs to process updates from ClusterSync and keep
pushing its most recent routing state back to ClusterSync. The ClusterSync
layer handles all of the complexity of message loss and of timing out stale data
and stale neighbors.
LogTree implements end–to–end reliability using a similar mechanism. In ad-
dition to the other per–flow routing metadata, a log sequence number is published
via ClusterSync. This sequence number propagates along with the distance vector
messages to inform all nodes of the most recent sequence number published by
the source node. In order to limit the traffic pushed through ClusterSync, LogTree
sets a 5 second holdoff timer after each new data element is pushed before push-
ing a new sequence number out via ClusterSync. This information enables nodes
to request retransmissions in the event that the most recent message of the log
was lost.
8.3.3.3 Optimizations to LogTree
Our experiments with LogTree show that it outperforms LogFlood and SoftState
in terms of total volume of data transferred, while incurring only a modest
latency penalty (see Section 8.4). However, in order to achieve these results we
implemented two optimizations: flooding mode and flow–ID compression.
Flooding mode addresses the startup latency of ClusterSync and of building
distribution trees. The original LogTree implementation suffered latency problems
in the event that the overlay network had not yet formed, or when the distribution
tree for a particular source had not yet been constructed. To address this, we
modified LogTree to proactively flood messages when the maximum hopcount had
not yet been reached and neighbors were observed that did not report an active
tree. This optimization achieves latency similar to LogFlood's, while incurring
the flooding bandwidth penalty only while the distribution tree is still under
construction.
Flow–ID compression is an optimization that allows the routing metadata to
scale better as the number of distribution trees grows. Each node defines a dictio-
nary that locally maps flow–IDs to small integers, and publishes this dictionary
through ClusterSync. This enables a full 12–byte flow–ID to be replaced by a
1–byte nickname, reducing the size of published route metadata and reducing
the size of headers on data messages that pertain to a given flow. This technique
might be applied to other nicknaming problems, although it can increase join
latency as the complete dictionary must be replayed to new neighbors.
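A minimal sketch of the nickname dictionary follows; the 12–byte flow–ID and 1–byte nickname sizes come from the text, while the class and method names are illustrative:

```python
class FlowNicknames:
    """Per-node dictionary mapping full flow-IDs to 1-byte nicknames.
    In LogTree the dictionary itself is published via ClusterSync, so
    neighbors can resolve nicknames back to full flow-IDs."""

    def __init__(self):
        self.to_nick = {}   # flow_id -> small integer nickname
        self.to_flow = {}   # nickname -> flow_id

    def nickname(self, flow_id):
        if flow_id not in self.to_nick:
            nick = len(self.to_nick)
            assert nick < 256   # nicknames must fit in a single byte
            self.to_nick[flow_id] = nick
            self.to_flow[nick] = flow_id
        return self.to_nick[flow_id]

    def resolve(self, nick):
        # Resolution requires the publisher's dictionary to have arrived,
        # which is why this scheme can increase join latency.
        return self.to_flow[nick]
```

Every data message and route metadata entry for a flow then carries the 1–byte nickname instead of the full 12–byte flow–ID.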
8.4 Benchmarking StateSync
In this section, we describe how we measure the performance of our StateSync
variants, both in a set of benchmark tests, and in the context of running appli-
cations.
8.4.1 Metrics and Experimental Setup
Our criteria are primarily focused on two metrics: the distribution of latency
in state propagation, and the network traffic incurred by our mechanisms. The
latency is determined by logging the activities of the application or benchmark,
matching up publish states with subscribe states, and logging the time lag. Net-
work traffic is determined by measuring the number of bytes and packets that
pass through the network interface, and in some cases by measuring statistics
gathered directly from the mechanisms.
Our measurements were taken from simulations and tests on a wireless testbed.
The testbed experiments were run from a centralized server with remote connections
to a set of twelve 802.11 radios hosted by Stargates distributed throughout
our building, as shown in Figure 8.3. The simulations were run within the Em-
star [GEC04] environment on a typical workstation. Simulations of the Localiza-
tion and Sink Tree applications were also run with a larger, 50–node topology.
For validation purposes, we also ran simulations using the same topology as the
testbed experiments, and found that the differences were negligible.
8.4.2 Benchmark Tests
In order to characterize the abstract performance of our different mechanisms,
we ran a series of benchmarks. The results of those benchmarks are shown in
[Three plots: Packets TX, Bytes TX, and Median Latency (sec); bars for LogFlood
and LogTree with 1 and 12 senders, over 1, 4, 16, and 64 chunks.]
Figure 8.4: Results of benchmark tests on the testbed. Each grouping of bars represents
four 20–minute experiments in which 64K of data is published in a fixed number of
chunks, issued at regular intervals.
Figure 8.4. These benchmarks are intended to measure the differences in per-
formance between the LogFlood and LogTree variants when driven with simple
workloads. Each experiment lasted 20 minutes, and published 64K of data via
StateSync, evenly distributed among the publishers in that experiment. The only
difference from one experiment to the next was the distribution of the data in
time (i.e., when it was published) and the number of nodes involved in publishing.
In the first set of experiments, only one node published data and we varied
the number of “chunks” the data was broken into. Each chunk was published at
a uniform division of the 20 minutes. From Figure 8.4, we can see that LogTree
always sends fewer bytes and generally achieves comparable latency.
One peculiar feature of the Bytes Transmitted graph is the fact that the 1
sender, 16 chunks case for LogFlood is so much higher than the other cases. This
is caused by additional retransmissions that occurred in that case. Apart from
that case, we see that in general for one sender the performance of LogFlood
appears to approach that of LogTree as the frequency of updates increases. This
occurs because LogTree gains the most ground in quiescent periods when updates
are not occurring, but the current state is being refreshed. As the frequency of
updates increases, the length of the quiescent periods decreases and LogTree loses
its advantage. In contrast, LogFlood performs best in the cases where quiescent
periods are short. The 1 sender, 64 chunk case performs almost optimally because
the data transmissions are spaced out such that collisions are unlikely and the
ratio of refresh messages to data messages is the minimum of all of our test cases.
In the second set of experiments, all 12 nodes in the network published at each
interval, dividing the same total amount of data among them. With 12 senders,
both variants incur greater traffic cost. LogFlood consistently sends more data
than LogTree, although LogFlood sends fewer packets (and per–packet overhead is
              First 600s           Last 600s
              Bytes     Packets    Bytes     Packets
LogFlood      979604    2074       108874    1283
LogTree       628467    2157       22231     830

Table 8.1: Packet and byte counts for LogFlood and LogTree for 1 chunk, 12 senders,
during the first and last 600 seconds of the 1200–second experiment.
not included in the byte counts.) This occurs because the current implementation
of ClusterSync sends its own independent packets rather than piggybacking them
on other traffic. Since each followup refresh in LogTree involves a change to the
data published through ClusterSync, this substantially increases the cost in terms
of packets, although these additional packets carry very small amounts of data.
This problem can be addressed in two ways: by piggybacking ClusterSync and
LogTree packets, and by changing the timing of ClusterSync packets to induce
more aggregation of data into each packet.
Increasing the number of senders magnifies the overhead incurred while the
data size being published remains the same. Relative to LogFlood, LogTree saves
overhead in three different ways. First, LogTree performs more efficiently during
quiescent periods. Table 8.1 shows the performance of LogTree during the quies-
cent period after the 64K data is transferred in the 1 chunk, 12 sender case. In
the latter 600 seconds of the experiment, LogFlood transmits 5 times more bytes
and 50% more packets than LogTree.
Second, LogTree can represent refresh messages and packet headers more ef-
ficiently because it can compress the flow IDs to single byte nicknames. In our
experiment, as the number of senders and chunks grows, the number of packets
grows as the product of senders and chunks.
Third, because it follows the broadcast distribution tree, LogTree transmits
[Figure: CDF of publication latency (sec) using LogTree, one curve per hopcount from 1 to 6 hops.]
Figure 8.5: The latency distribution, broken down by hopcount.
fewer times than LogFlood. Even though our test topology does not lend itself to
efficient distribution trees,1 we see that the total number of bytes transmitted by
LogTree beats the minimum possible byte count required for flooding (786KB) in
the 1 and 4 chunk, 12 sender cases.
This benchmark data also provides us with some idea on how to model latency
as a function of the amount of data being pushed and the number of hops. From
Figure 8.4, we see that the latency scales roughly linearly with the size of the
input data. This result was expected, given the various forms of rate limiting
implemented in the local retransmission protocol. In addition, Figure 8.5 shows
that the latency distribution shifts upward as a function of the number of hops
from the publisher. The bimodal distribution in latency reflects the probability
1 The long and narrow shape of our ceiling topology yields less redundancy than a grid topology.
of a loss that results in additional delay before the 5 second holdoff timer expires
and the new sequence number is pushed.
8.4.3 Determining Application Suitability
Latency and traffic consumed are a good starting point for determining whether
StateSync is helpful to an application. StateSync is most appropriate to applica-
tions that need notification when state is stale or when the source of some data
has disappeared. In cases such as these, epidemic protocols are not appropriate
because they will mask stale data; the only possible solution is some kind of re-
fresh mechanism. StateSync provides a reliable transport that protects a large
collection of state variables with a single aggregate refresh.
To quantify an application’s needs we characterize the application using two
metrics: the application’s specific latency requirements, and the level of “churn”
in the application’s data, defined by the expected lifetime of a key–value pair.
If the expected lifespan of application data is low enough compared with the
required latency bound, then a simple periodic refresh may be cheaper than the
expected cost of a reliable transmission protocol. However, if the lifespan of
application data is likely to be much longer than the latency with which stale
data is to be detected, then the additional overhead of a reliable protocol is
justified. This argument holds true to an even greater extent in cases where the
quantity of data being refreshed further increases the cost of refresh.
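This tradeoff can be made concrete with a back-of-envelope cost model. The sketch below is our own illustration, not code from the system; the transfer overhead fraction and beacon size are hypothetical placeholders:

```python
def refresh_cost_bytes(data_bytes, refresh_interval_s, lifetime_s):
    """Soft state: the full dataset is retransmitted every interval."""
    return data_bytes * (lifetime_s / refresh_interval_s)

def reliable_cost_bytes(data_bytes, refresh_interval_s, lifetime_s,
                        transfer_overhead=0.25, beacon_bytes=16):
    """Reliable transport: pay the transfer (plus protocol overhead)
    once, then send only a small aggregate refresh beacon."""
    return (data_bytes * (1.0 + transfer_overhead)
            + beacon_bytes * (lifetime_s / refresh_interval_s))

# Short-lived keys favor blind refresh; long-lived keys favor the
# reliable protocol.
short_lived = (refresh_cost_bytes(4096, 10, 10),
               reliable_cost_bytes(4096, 10, 10))
long_lived = (refresh_cost_bytes(4096, 10, 1500),
              reliable_cost_bytes(4096, 10, 1500))
```

With a 10-second refresh interval, a 4KB dataset that lives only one interval is cheaper to refresh blindly, while the same dataset living 1500 seconds costs two orders of magnitude more to refresh than to transfer reliably once.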
Figure 8.6 shows the distribution of key lifetimes we observed when the system
was driven by our Position estimation application. In our study of the overall
performance of this protocol, we also studied the performance when used in two
other applications. While we do not discuss those results here, they can be found
in [GLP05].
[Figure: CDF of key lifetimes (sec) for the acoustic localization application.]
Figure 8.6: The distribution of key lifetimes for our position estimation application.
The mean key lifetime is 1506 ± 121 seconds.
8.5 Applying StateSync to Position Estimation
We have already discussed many layers of our Position Estimation system, in-
cluding the Time–of–Flight ranging layer, the Multilateration layer, and the Syn-
chronized Sampling layer. The StateSync collaboration primitive is the final piece
that lets the whole system fit together. In this Section we show how we put all
these pieces together and how we leverage StateSync as a simplifying primitive.
In our system, the range estimates and angle estimates determined in Chap-
ter 3 using the synchronization infrastructure described in Chapter 7 are reliably
published to all nodes within some maximum hopcount. GPS–derived or stati-
cally configured information about survey points are published from the individ-
ual survey points using the same mechanism. Each node therefore has all the raw
data and can fuse it using the multilateration algorithm described in Chapter 4 to
estimate its position relative to the other nodes. This algorithm may determine
that additional ranging information is required in order to tie itself in to the map.
In that case, the multilateration algorithm triggers additional local experiments
to obtain additional improved estimates.
8.5.1 Applying the StateSync Model
The reliability and consistency model of StateSync is used to ensure consistency
in the datasets that are fed to the multilateration algorithm. In the event that a
node is rotated or moved, the ranges and direction estimates relating to that node
are no longer valid. This in itself is not a serious problem, as it will only result
in estimating the node’s location as its last location. However, if further ranging
experiments lead to a mixture of old and new range and orientation estimates,
these inconsistencies are likely to cause the multilateration algorithm to fail.
In our application, this problem is addressed using a per–node “orientation
sequence number” that is incremented each time the node moves or otherwise
invalidates its ranges. The ranging component indicates its current orientation
sequence number when it requests peers to range to it. This enables nodes that
receive acoustic range signals to annotate their published estimates with the
sequence number that was in effect at the acoustic sender at the time that the
experiment occurred. Published estimates are also annotated with the publisher’s
sequence number, indicating that those estimates are relative to their current
position. Whenever a node increments its sequence number, it deletes all ranges
it had previously published, and then publishes its new sequence number.
In spite of StateSync’s relatively loose consistency semantics, this protocol en-
ables the multilateration component to maintain a consistent dataset. To main-
tain consistency, the multilateration component records the current sequence
number published by each node, and all published data annotated with other
sequence numbers is ignored. The only exception is for range notification mes-
sages that arrive with a subsequent sequence number: as an optimization, these
messages are processed and published ahead of the arrival of a sequence num-
ber update from the source node. Because StateSync’s semantics guarantee that
states from an individual publisher arrive in sequence, the table received from
a node can never itself contain inconsistent sequence numbers, and StateSync
reliability guarantees that the table update will occur within a probabilistic time
bound.
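The filtering rule can be sketched in a few lines. The code below is a minimal illustration of the sequence-number discipline described above; the class and method names are our own, not the system's actual interfaces:

```python
class RangeTable:
    """Filters published range estimates by orientation sequence
    number, keeping the multilateration input set consistent."""

    def __init__(self):
        self.seq = {}     # node -> current orientation seq number
        self.ranges = {}  # (src, dst) -> (src_seq, dst_seq, range_m)

    def update_seq(self, node, new_seq):
        """A node moved: advance its seq and drop stale ranges."""
        if new_seq <= self.seq.get(node, -1):
            return
        self.seq[node] = new_seq
        self.ranges = {
            (s, d): (ss, ds, r)
            for (s, d), (ss, ds, r) in self.ranges.items()
            if not ((s == node and ss < new_seq) or
                    (d == node and ds < new_seq))
        }

    def add_range(self, src, dst, src_seq, dst_seq, range_m):
        """Accept ranges annotated at or ahead of the current seq
        numbers; a newer seq is processed optimistically, ahead of
        the source's own seq update arriving."""
        if (src_seq >= self.seq.get(src, -1) and
                dst_seq >= self.seq.get(dst, -1)):
            self.ranges[(src, dst)] = (src_seq, dst_seq, range_m)

t = RangeTable()
t.update_seq("A", 1); t.update_seq("B", 1)
t.add_range("A", "B", 1, 1, 4.93)   # accepted
t.update_seq("A", 2)                # node A moved: old range dropped
t.add_range("A", "B", 2, 1, 5.10)   # re-ranged under A's new seq
```

After node A increments its sequence number, the table retains only the range taken under the new number; the pre-movement range is silently dropped.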
8.5.2 StateSync Simplifies the System Design
This architecture simplifies the system on a number of levels. First, the applica-
tion itself does not need to implement any form of coordination. The Ranging
subsystem responds to local triggers from the Multilateration subsystem, and
publishes the data via StateSync. The Multilateration subsystem receives data
from StateSync and attempts to fuse it together. If the Multilateration subsystem
does not succeed in integrating that node into the system, or if new nodes appear that it
might range to, it triggers a local ranging experiment by commanding the Rang-
ing layer. Randomization is used in place of explicit scheduling of the ranging
experiments, de–synchronizing the ranging process.
Alternative solutions to this problem rapidly become more complex. Even if
explicit coordination is done, there is still the possibility that a collision occurs.
Further, if the explicit coordination requires a “leader”, then there are additional
problems of choosing a leader, and of handling cases where the leader fails or
becomes temporarily or permanently unavailable. There may be many reasons to
implement a leader–based approach: using a leader and explicit coordination the
ranging process can be done more quickly and only the leader need implement the
multilateration algorithm. However, it is notable that with these primitives,
this more complex approach is not necessary. Should a more coordinated scheme
be implemented, the StateSync primitive would still be helpful, similar to the
way Tuple Spaces such as Linda [GB82] are used. We discuss this possibility in
more detail in Chapter 12.
The StateSync reliability layer offloads much of the complexity of the wire-
less network from the application. StateSync handles all of the difficulties of
retransmission and recovery from message loss, while leveraging broadcasts to
efficiently distribute the data to multiple nodes. Nodes that join the network
after most of the other nodes have been running will automatically download the
complete set of published ranges when they join. Nodes that lose connectivity
to one part of the network and later rejoin to another part will see a smooth
transition. Their data will be re–broadcast only if the part of the network where
they rejoin has not already received that data. Data published by nodes that
reboot will automatically be marked as stale and removed.
Because the signal processing and estimation components of our Position esti-
mation system are already fairly complex, the StateSync abstraction is a powerful
tool. Using StateSync, the complexity of the multihop wireless network is reduced
to processing a gradually evolving set of table entries, subject to certain minimal
consistency checks.
8.6 Performance of StateSync for Position Estimation
The Position Estimation application is well suited to the properties of StateSync.
Typical large deployments of this type of system yield between 10 and 20 ranging
pairs per node [MGS04]. If the system performs multiple trials for each range, this
results in approximately 2–5KB of published data per node. The “churn” graph in
Figure 8.6 shows that these ranging records tend to have long lifespans, meaning
that simple soft–state refresh approaches will be costly over time compared with
mechanisms that can reliably cache the data. At the same time, low latency is
desirable because the latency in state update directly affects the length of time
that the system operates with incorrect position information, which in turn could
lead to sensing and actuation based on inaccurate location estimates.
To test the performance of StateSync when used in our localization appli-
cation, we ran several tests with different variants of StateSync underneath our
application. We ran each test for 1 hour on our wireless testbed, and for 1 hour in
the simulator, running a 50 node grid topology. The ranging process was primar-
ily driven by the multilateration algorithm, which would cause 3 range requests
in rapid succession, followed by an exponential backoff. In addition, we simulated
three of the nodes “moving” by forcing them to invalidate their range information
at three particular times. These invalidations result in a burst of ranging activity
and cause the backoffs to be canceled.
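The trigger schedule used in these tests can be sketched as follows. The burst size of 3 matches the description above; the backoff base and cap are illustrative constants, not the system's actual values:

```python
class RangingScheduler:
    """Issues a burst of range requests in rapid succession, then
    backs off exponentially; invalidation (e.g. the node moved)
    cancels the backoff and starts a fresh burst."""

    def __init__(self, burst=3, base_s=10.0, max_s=1280.0):
        self.burst, self.base_s, self.max_s = burst, base_s, max_s
        self.reset()

    def reset(self):
        """Called on startup and whenever ranges are invalidated."""
        self.sent = 0
        self.backoff_s = self.base_s

    def next_delay(self):
        """Delay to wait before issuing the next range request."""
        self.sent += 1
        if self.sent <= self.burst:
            return 0.0                    # rapid succession
        delay = self.backoff_s
        self.backoff_s = min(self.backoff_s * 2.0, self.max_s)
        return delay

sched = RangingScheduler()
delays = [sched.next_delay() for _ in range(6)]  # [0, 0, 0, 10, 20, 40]
sched.reset()   # a "moving" node invalidates its ranges: burst again
```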
Figures 8.7 and 8.8 show the results of testing our localization application
using different variants of StateSync. The graphs show two types of information:
the cumulative bytes transmitted throughout the network as a function of time,
and the distribution of latency in a published state arriving at a subscriber.
From the graphs we see that in terms of bytes transmitted, LogTree performs
[Figure: cumulative network traffic (bytes, log scale) vs. time, and CDF of publication latency (sec), for SoftState, LogFlood, and LogTree.]
Figure 8.7: Results of tests of our Position Estimation application from our 12 node
testbed. The latency graphs show a CDF of latency in seconds. The curve for LogTree
shows some initial traffic in setting up the ClusterSync trees before the start of data
traffic.
[Figure: cumulative network traffic (bytes, log scale) vs. time, and CDF of publication latency (sec), for SoftState, LogFlood, and LogTree.]
Figure 8.8: Results of tests of our Position Estimation application from a 50 node
simulation. The mean latency for LogTree is 31.54 ± 0.58 s; for LogFlood, 14.33 ± 0.12 s.
better than LogFlood, both in terms of the amount of overhead during trans-
fers as well as in the rate of traffic during quiescent periods. During transfers,
LogTree’s pruned distribution tree provides significant savings over flooding, es-
pecially as the size of the network grows to reach the maximum hopcount of the
published flows. During quiescent periods, LogFlood’s periodic re–flood of the
latest sequence number is considerably more costly than the ClusterSync beacon
traffic that refreshes the ClusterSync sequence number (that in turn protects the
LogTree per–flow sequence numbers).
The latency graphs show that LogTree has twice the expected latency of
LogFlood. This can be explained by a number of factors, including higher hop-
counts on average than in the flooding case, and lower redundancy when data
is sent via the distribution tree. The “knee” at 5 seconds in the data from the
small network is caused by message loss combined with the 5 second holdoff on
publishing a new sequence number in LogTree. However, for Position Estimation,
the latency of LogTree is more than acceptable.
8.7 Enabling System Visibility Using LogFlood
In addition to using StateSync to publish range data, we have also used the
LogFlood variant of StateSync in deployment to improve our visibility into the
state of the network. We modified LogFlood to reduce the traffic it generated by
rate–limiting it to one packet per second and by increasing the refresh interval
by a factor of 100. We then implemented a few small applications that would
report information into the LogFlood system, and implemented a visualization
component to display the data in EmView. The result was a lightweight way to
see the current state of the whole fielded network at once. By starting up the
LogFlood component on our laptop, we would quickly get a view of the network as
our laptop retrieved the data cached at the nearest reachable nodes, and received
a follow–on stream of flooded updates.
In our system, we publish EmRun fault information and link quality data for
all neighbors over 70% reception rate, republishing only when the link quality
changes dramatically. In addition, we publish the state of the LogTree overlay
network as it changes, giving us a view into whether the network is successfully
connected or partitioned. This information was invaluable to the process of di-
agnosing problems in the field, from hardware faults and mis–configurations to
network connectivity problems.
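The "republish only when link quality changes dramatically" rule amounts to a simple hysteresis filter. A minimal sketch; the 0.15 change threshold is an assumed value, not the deployed constant:

```python
def should_republish(last_published, current, threshold=0.15):
    """Decide whether to republish a neighbor's link quality.  Only
    links over 70% reception are published at all; after that, a new
    value is published only on a 'dramatic' change."""
    if last_published is None:
        return current >= 0.70
    return abs(current - last_published) >= threshold

# A link drifting from 0.82 to 0.85 stays quiet; a drop to 0.60 is
# republished so the visualizer sees the degradation.
quiet = should_republish(0.82, 0.85)   # False
noisy = should_republish(0.82, 0.60)   # True
```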
CHAPTER 9
Range and DOA Estimation Testing
In order to assess the capabilities of our system, we first performed several exper-
iments to measure the performance of the ranging and DOA estimation system.
These tests were performed in controlled environments, with as much ground–
truth accuracy as we could achieve. We performed two types of experiment:
an angular experiment to test the DOA estimation, and a straight–line distance
experiment to test the range estimation.
These tests demonstrate the functionality of all of the layers below the mul-
tilateration layer. These tests are a partial integration test, requiring the correct
behavior of the synchronized sampling layer, the time synchronization subsystem,
the multihop networking layer with hop–by–hop time conversion, and the esti-
mation algorithms themselves. For example, high precision ranging is effectively
a “ground–truth” test of the complete synchronized sampling layer. Inaccura-
cies in time conversions from the sender to the receiver will show up as range
error: every 28 µs of timing error translates into 1 cm of ranging error. In addi-
tion, forward and reverse ranges between the same pair of nodes apply inverse
linear conversions, so any timing error will affect the forward and reverse ranges
oppositely. See Section 10.3 for an exploration of this kind of analysis.
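The 28 µs figure follows directly from the speed of sound. A quick check, assuming roughly 343 m/s in air at room temperature:

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed, dry air at ~20 C

def range_error_cm(timing_error_s):
    """Ranging error induced by a given time-conversion error."""
    return timing_error_s * SPEED_OF_SOUND_M_S * 100.0

err = range_error_cm(28e-6)   # ~0.96 cm: 28 us of skew costs ~1 cm
```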
[Figure: diagram of the measured 24-foot square test layout, bounded by a cement wall.]
Figure 9.1: Experimental setup for the DOA component test.
9.1 DOA Component Testing
To analyze the performance of the direction–of–arrival (DOA) estimation, we
performed an outdoor experiment in Lot 9, one of the parking structures on
the UCLA campus. The setup for our experiment is shown in Figure 9.1. In
this experiment, we laid out a carefully measured square 24 feet on a side, by
setting out a plastic tape measure and taping it to the ground. We used a laser
measurement tool (Hilti PD30)1 to ensure that the tape was laid down squarely.
We positioned the emitter at one corner of the square. We positioned the receiver
on a tripod in the center so that we could easily rotate the array about the y
axis.
In order to accurately measure the ground truth azimuth angles, we attached
a laser to the side of the microphone array, as shown in the left hand image
1 The Hilti PD30 laser range measurement tool is accurate to 1/16 inch and has a maximum range of 600 feet.
Figure 9.2: Mounting the measurement laser for the azimuth test (left) and the zenith
test (right).
in Figure 9.2. Before taking each measurement, we set a cardboard box at a
measured location along the side of the square. We then rotated the array until
the laser lined up with a mark on the box that was in turn lined up at a particular
distance along the side of the square. We took measurements at 1–foot intervals
along the square, an average angular spacing of 3.75 degrees. At each measurement location, we
recorded 5 trials.
After doing this experiment for azimuth angles, we mounted the array on its
side and remounted the laser, as shown in the right image in Figure 9.2. We
then repeated the test, collecting data for the range of zenith angles for azimuth
90 deg and 270 deg.
[Figure: histogram of azimuth estimation errors (deg) with a fitted normal distribution, µ = −0.14, σ = 0.96.]
Figure 9.3: Overall distribution of errors in the Azimuth test. These results are well
within our target of ±1 deg.
9.1.1 Azimuth Performance
In this section, we present the results of our azimuth experiment. Figure 9.3
shows the overall distribution of errors from our azimuth test, without considering
incoming angle. This shows a roughly normal distribution, with mean −0.14 deg
and σ = 0.96 deg. This result outperforms the results reported in the Cricket
Compass [PMB01], and it is also a good result relative to our target of estimating
orientation ±1 deg.
Figure 9.4 shows the accuracy and precision of the DOA estimator as a func-
tion of incoming angle. The results for each test angle generally show high pre-
cision, but the means show a bias that appears to be dependent on angle. We
hypothesized two possible sources for this error, and tested modifications of the
algorithm to try to compensate for them.
[Figure: deviation in degrees (95% confidence) vs. ground-truth angle, for azimuth measurements at 5.17 m.]
Figure 9.4: Results of the Azimuth test, showing deviation from ground truth. These
results suggest a bias that is dependent on angle.
9.1.1.1 Failure of the Parallel Ray Assumption
First, since these tests were done at a fairly close range, the “parallel ray” as-
sumption underlying our estimation algorithm does not hold completely. The
equations we derived in Section 3.3 assume that the signal hits all microphones
from the same angle, meaning that we can compare the measured lags directly
to the component of that angle along the line between the microphones. How-
ever, at 5m range, the ray reaching the “high” microphone travels an additional
0.19 cm, resulting in an angular error of 1.36 deg. To test this hypothesis, we
implemented a partial solution to this problem. Our partial solution corrected
the lags that are used to compute the error term on the right–hand side of equa-
tion 3.13. Rather than using the projection of the baseline onto the hypothesis
angle, we geometrically computed the lag based on the angle and the range es-
timate, while leaving the coefficients in the Jacobian matrix unchanged. This
appeared to make the zenith angles more consistent, but had a negligible effect
on the overall performance of the azimuth measurement.
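The size of the near-field effect is easy to check numerically. The sketch below compares the exact lag with the parallel-ray approximation for a hypothetical two-microphone baseline (our own illustration, not the actual array geometry):

```python
import math

def exact_lag_m(src, mic_a, mic_b):
    """Exact path-length difference from a point source to two mics."""
    return math.dist(src, mic_b) - math.dist(src, mic_a)

def parallel_ray_lag_m(direction, mic_a, mic_b):
    """Far-field approximation: project the baseline onto the unit
    arrival direction."""
    baseline = (mic_b[0] - mic_a[0], mic_b[1] - mic_a[1],
                mic_b[2] - mic_a[2])
    return sum(u * v for u, v in zip(direction, baseline))

# Hypothetical geometry: one microphone raised 0.3 m above the other,
# source 5 m away in the horizontal plane of the lower microphone.
mic_a, mic_b = (0.0, 0.0, 0.0), (0.0, 0.0, 0.3)
src = (5.0, 0.0, 0.0)
direction = (-1.0, 0.0, 0.0)    # unit vector from source toward array

exact = exact_lag_m(src, mic_a, mic_b)                # ~0.9 cm extra path
approx = parallel_ray_lag_m(direction, mic_a, mic_b)  # 0: baseline orthogonal
near_field_error_m = exact - approx
```

At 5 m the raised microphone sees nearly a centimeter of extra path that the parallel-ray model assigns zero lag, which is the flavor of bias discussed above.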
9.1.1.2 Weighting to Offset Instability
Our second hypothesis was drawn from the fact that certain angles are more
error–prone than others. Cases where the incoming ray is nearly orthogonal to a
baseline in the array geometry will tend to yield constraints that are very sensitive
to slight perturbations in the input. To address this, we implemented a simple
weighting scheme that reduced the weights to compensate for this sensitivity.
Our scheme weights each row of the Jacobian matrix, as

    S_i = \sum_j J_{i,j}                                (9.1)

    J'_{i,j} = J_{i,j} \sqrt{ \max_i S_i / S_i }.       (9.2)
However, we did not find that this additional weighting significantly affected
the error.
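In code, the weighting is a per-row rescaling of the Jacobian. The following is a direct transcription of equations (9.1) and (9.2), applied to a made-up matrix; positive row sums are assumed:

```python
import math

def reweight_jacobian(J):
    """Scale each row i of J by sqrt(max_i S_i / S_i), where S_i is
    the row sum (equations 9.1 and 9.2).  Assumes positive row sums."""
    S = [sum(row) for row in J]
    s_max = max(S)
    return [[v * math.sqrt(s_max / s) for v in row]
            for row, s in zip(J, S)]

J = [[1.0, 2.0],    # row sum 3 -> scale sqrt(3/3) = 1
     [0.5, 0.5]]    # row sum 1 -> scale sqrt(3/1)
Jp = reweight_jacobian(J)
```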
9.1.1.3 Perturbations in Mounting and Placement
Our remaining hypothesis is that small variations between the actual and as-
sumed array geometry are causing the angle–dependent errors. An in–lab array
calibration procedure could be devised to calibrate the array to correct for these
specific deviations, similar to [RD04]. More investigation is required to determine
whether this is a significant source of error, and if so determine a solution. We
leave these improvements for future work.
9.1.2 Zenith Performance
We next present the results of our zenith angle experiment. Our setup for this
experiment is just as we described in Section 9.1, with the array mounted on its
side so that the zenith angle changes as the array rotates.
The zenith angle measurement varies from −90 deg to +90 deg, but it can be
measured at a variety of different azimuth angles. In our experiment, we mounted
the array with 0 deg azimuth pointed up, which meant that as we rotated the
array, one half of the experiment would be taken with azimuth 90 deg, and one
half with azimuth 270 deg.
The results for these two portions of the test are shown in Figure 9.5. The
left half of each graph represents results from cases where the signal is coming
toward the array from below the plane of the base, while the right half represents
cases where the signal approached from above the plane of the base.
9.1.2.1 Performance of Negative Zenith Angles
In general, we saw poor performance for negative zenith angles, where the angle
was less than −30 deg. In these cases, line of sight is often blocked by the Lucite
base of the array, yielding large errors and outliers.
The poor performance of the array for signals coming from below is not a
serious concern, because in practice this system would be deployed in a configu-
ration that minimizes the occurrence of signals coming from below the plane of
the array. Typically the arrays will be deployed on a terrain, with the object of
detecting signals on or above the terrain. For our envisioned applications, it is
[Figure: deviation in degrees (95% confidence) vs. zenith angle at 5.17 m, one panel each for the 90 deg and 270 deg azimuth sides.]
Figure 9.5: Results of the Zenith test, showing deviation from ground truth. Some
asymmetry is evident when comparing the two sides. Negative angles approach from
beneath the array and are heavily obstructed.
sufficient to achieve accurate detection of signals at most 30 deg below the plane
of the array. Algorithms that depend on angular data can also take that error
distribution into account.
9.1.2.2 Performance of Positive Zenith Angles
The right hand side of the graphs shows the performance as the signal arrival di-
rection passes over the top of the array from 0–90 deg. We observe generally good
performance up to about 45 deg, with slightly worse performance as we approach
90 deg. We also observe poorer performance as we approach from the 90 deg az-
imuth side relative to approaching from the 270 deg side. This asymmetry may
be a consequence of asymmetries in the array geometry and slight deviations in
the placement of the microphones.
The geometry of our arrays is partly a historical consequence: once manufac-
tured, changing the geometry is difficult. It would be interesting to experiment
with other geometries, such as a tetrahedron. From a software perspective, there
is very little cost to testing alternate geometries. We leave experiments with
alternative geometries to future work.
Another hypothesis is that the performance from the 90 deg side was hurt
because the taller microphone blocked LOS to the other lower microphone. Per-
forming a test with the array rotated to approach from azimuth 45 deg would
eliminate that shadow and might yield improved performance. An exhaustive
test of incoming angles for several arrays might yield more clues to the source of
these errors, as well as methods of compensation.
[Figure: histograms of zenith estimation errors (deg). Midrange angles (−30 deg < φ < +45 deg) fit a normal distribution with µ = 0.26, σ = 0.86; overhead angles (+45 deg < φ < +90 deg) fit µ = 0.31, σ = 2.28.]
Figure 9.6: Overall error distribution from the Zenith test. We observe that the error
distribution for “midrange” angles is comparable to that of the azimuth estimates,
although the error distribution for overhead angles performs more poorly.
9.1.2.3 Modeling the Error Distribution
As we saw in Chapter 4, our position estimation algorithms require a model of
the errors in the input variables. For most techniques, this model needs to be a
Gaussian model. Figure 9.6 shows our attempt to fit a Gaussian model to the
zenith error we observed in our test.
Noting that the error is angle–dependent, we divided the data into two sets,
one covering the “midrange” angles −30 deg < φ < +45 deg, and the other cov-
ering the “overhead” angles +45 deg < φ < +90 deg. We drop the angles under
−30 deg because they are usually very unreliable. Figure 9.6 shows two separate
histograms, one for each of these sets. Note that the bars in the histogram are
scaled to represent a fraction of the overall data set, so that they may be directly
Figure 9.7: The experimental setup for our range test in Lot 9, showing tests at 5m
(left) and 50m (right). The 50m test required multihop synchronization.
compared.
This technique could be applied further (e.g. creating a model that was
fully parameterized on both zenith and azimuth angles), but this data set is not
extensive enough to support a more complex model. We only have zenith data
for two azimuth angles, and all of our data comes from a single array. We do
not know the extent to which a detailed analysis of this data would be specific
to each array.
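The two-bin model above can be sketched directly: split the samples by zenith angle, drop the unreliable low angles, and fit a sample mean and standard deviation to each bin. The sample data here is made up for illustration, not the experimental data:

```python
import math

def fit_gaussian(errors):
    """Sample mean and standard deviation for a list of errors (deg)."""
    n = len(errors)
    mu = sum(errors) / n
    var = sum((e - mu) ** 2 for e in errors) / (n - 1)
    return mu, math.sqrt(var)

def split_by_zenith(samples):
    """Partition (zenith_deg, error_deg) samples into the midrange
    (-30..+45 deg) and overhead (+45..+90 deg) sets, dropping the
    unreliable angles below -30 deg."""
    mid = [e for z, e in samples if -30.0 <= z < 45.0]
    over = [e for z, e in samples if 45.0 <= z <= 90.0]
    return mid, over

# Illustrative samples only.
samples = [(-10, 0.3), (0, -0.1), (30, 0.5),
           (60, 2.0), (80, -1.5), (-60, 9.0)]
mid, over = split_by_zenith(samples)
mid_model = fit_gaussian(mid)    # (mean, sigma) for midrange angles
over_model = fit_gaussian(over)  # (mean, sigma) for overhead angles
```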
9.2 Range Component Testing
In order to characterize the accuracy and precision of our system’s ranging es-
timates, we designed an experiment to quantify the performance of the ranging
system. Figure 9.7 shows the setup for the ranging experiment. In order to very
accurately measure the range between the two arrays, we mounted the arrays
[Figure: range measurements from Lot 9, 0–90 m: measured distance vs. ground truth, and mean error in cm (95% confidence) per experiment ordered by distance.]
Figure 9.8: Results of the Ranging test, 0-90m. In (a) the impulses show the mean
deviation from ground truth (right y scale), as a function of distance. In (b) experiments
are shown ordered by distance, with the mean deviation plotted relative to the right y
scale. The distance for each experiment is represented by the dotted line, referenced
to the left y scale.
[Figure: scatter plot of range error (cm) vs. SNR, and SNR vs. distance with the fitted curve 21 − 11 log10 r.]
Figure 9.9: Plots showing the relationship between distance, SNR, and error. The
upper graph shows a scatter plot of range error vs. SNR. The lower graph shows the
relationship between SNR and distance, with experiments ordered by distance. The
dashed line shows a function of distance that fits well to SNR. The dotted line shows
the distance corresponding to each experiment.
on blocks that we could position on the ground with high precision. We used a
laser ranging tool and a tape measure to determine the ground truth distances,
using the laser to position the tape and then using the tape to get consistent
fine–grained measurements.
9.2.0.4 Range Error, Distance, and SNR
Figure 9.8 shows the results of the complete experiment. The upper graph plots
the measurements linearly against ground truth, with impulses showing the mean
deviation from ground truth according to the right–hand y axis. The lower graph
plots the mean error and 95% confidence intervals for all experiments, ordered by
distance; distance is shown as the dotted line. We observe from these graphs in
Figure 9.8 that there is no significant correlation between magnitude of error and
distance in an unobstructed environment. The errors we observe are independent
of distance because they result from inaccuracy in the detection process rather
than from error that accumulates over distance.
The plots in Figure 9.9 describe the relationship between measurement error,
SNR, and distance, for the test in Lot 9. The scatter plot in the upper graph
shows some correlation between measurement error and SNR. However, this data
is skewed because it excludes failed detections that did not generate a meaningful
error value. The lower plot relates SNR to distance, and shows that as the
distance grows the SNR drops, roughly proportionally to −11 log10 r. In this case,
failed detections are counted as SNR 0. We suspect that this slow falloff, well below the inverse–square decay expected in free space, is due to a combination of directional emitters and a wave–guide effect induced by the floor and ceiling of the parking structure. Unlike radio propagation, where
reflections cause a 180 deg phase shift and result in cancellation, reflections in
acoustic propagation generally add to the strength of the signal. These favorable
Experiment       Precision Scale   Range Scale
1.00–1.04m       1cm               1m
5.00–5.04m       1cm               5m
10.00–10.04m     1cm               10m
50.00–50.04m     1cm               50m
84.00–84.04m     1cm               84m
5.00–5.50m       10cm              5m
50.00–50.50m     10cm              50m
1–10m            1m                1–10m
10–50m           5m                10–50m
50–90m           10m               50–90m
Table 9.1: Range experiments, grouped by target scale and precision.
propagation characteristics enable very long ranges to be achieved with lower
energy expenditures.
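The log–distance falloff described above can be recovered from measured data with an ordinary least-squares fit of SNR = a + b log10 r. The sketch below is illustrative only; the function name and the synthetic data are not from the dissertation, and the fit here simply reproduces the coefficients it was fed.

```python
import math

def fit_log_distance(ranges_m, snrs_db):
    """Least-squares fit of SNR = a + b*log10(r); returns (a, b).

    For the Lot 9 data described in the text, such a fit would yield
    roughly a = 21 and b = -11.
    """
    xs = [math.log10(r) for r in ranges_m]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(snrs_db) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, snrs_db))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Synthetic check: data generated exactly from 21 - 11*log10(r).
rs = [1, 5, 10, 50, 84]
snrs = [21 - 11 * math.log10(r) for r in rs]
a, b = fit_log_distance(rs, snrs)
```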
9.2.0.5 Multi–Scale Measurement
In order to get a sense for the overall performance of the ranging system, we
performed an experiment that tested the performance at many different scales.
The clumps of points in Figure 9.8(a) fall into clusters at several scales. Table 9.1
shows the list of distances that were tested. Each set of tests is designed to assess
the precision and accuracy of the system at a given distance scale.
The results of the tests are shown in Figures 9.10 and 9.11. These plots zoom
in on different segments of a distance–distance plot. For example, we can see that
at 5m the system achieves sufficient precision to correctly order measurements
one cm apart, although the measurements’ accuracy is off by nearly 1cm. The
same sequence of measurements at 50m was much less precise and much less
accurate, although other tests past 50m yielded better results. We believe that
some system connectivity problems were the source of the error observed in the
[Figure 9.10, two panels: "Range Measurements from Lot 9, 1-10m" (upper) and "Range Measurements from Lot 9, 5m" (lower); measured range in meters (95% confidence) vs. ground truth in meters.]
Figure 9.10: Results of the Ranging test, zooming in on 10 and 5 meters. These tests
show good accuracy and precision, despite being taken over a long time interval and
assuming a single temperature over the entire experiment.
[Figure 9.11, two panels: "Range Measurements from Lot 9, 50-55m" (upper) and "Range Measurements from Lot 9, 50m" (lower); measured range in meters (95% confidence) vs. ground truth in meters.]
Figure 9.11: Results of the Ranging test, zooming in on tests from 50–55 meters.
Anomalous behavior is observed at 50 meters, perhaps the result of a transient syn-
chronization problem. A bug that could have caused this has since been fixed.
[Figure 9.12: "Distribution of Range Estimation Errors, Lot 9 Range Test"; fraction of values per bin vs. error in cm, with normal fits (µ = −2.38, σ = 3.81) and (µ = −1.73, σ = 1.76).]
Figure 9.12: Overall error distribution from the Lot 9 Range Test. The standard
deviation of the range error for all tests is 3.81 cm. If we drop the 17 values with error
larger than 10 cm, the standard deviation of the remaining distribution is 1.76 cm. By
applying the narrower model in our multilateration algorithm, we can drop the data in
the tails as outliers.
50 meter tests, and that they are anomalous.
9.2.0.6 Modeling Range Error
In Chapter 4 we saw the importance of developing an error model. Figure 9.12
shows the overall distribution of errors from the data we collected in the Lot
9 range test. From this graph, we can see that the data does not fit well to a
Gaussian model.
The reason for this is twofold. First, in the absence of environmental changes,
we would expect the data to be skewed to longer ranges. As we saw in Chapter 3,
the ranging signals tend to be detected as a sequence of pulses. If a lower–energy
first arrival is missed, the result will be a small positive error when one of the
following pulses is detected. As the SNR drops, the likelihood that second arrivals
are detected increases.
Second, this data was collected over a long time span, in which environmental
factors affecting range may have changed. We suspect that this is the cause of
the heavy tail of short ranges in this dataset. For the purposes of applying this
data to position estimation, we collect the data over a short time span in order
to avoid this problem.
Note that this model does not account for the effects of obstructions and
long reflections. Since our Lot 9 test did not have any obstructed conditions, we
did not observe any long ranges. However, as we saw in Chapter 4, reflection
problems are better removed by outlier rejection techniques, rather than trying
to incorporate them into a Gaussian model. The Gaussian model applies only to measurement and detection error, not to errors from reflections; however, it can be used to identify possible outliers when a system of measurements is analyzed.
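The modeling step just described can be sketched as follows. The data and function names here are hypothetical; only the 10 cm cutoff follows the discussion of Figure 9.12.

```python
import math

def fit_gaussian(vals):
    """Sample mean and (population) standard deviation."""
    n = len(vals)
    mu = sum(vals) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / n)
    return mu, sigma

def trimmed_fit(errors_cm, cutoff_cm=10.0):
    """Fit a Gaussian to all errors, then refit after dropping values
    whose magnitude exceeds cutoff_cm, treating them as outliers."""
    full = fit_gaussian(errors_cm)
    kept = [e for e in errors_cm if abs(e) <= cutoff_cm]
    return full, fit_gaussian(kept)

# Hypothetical error sample: a tight core plus one long reflection.
errors = [-1.0, -0.5, 0.0, 0.5, 1.0, 15.0]
(full_mu, full_sigma), (trim_mu, trim_sigma) = trimmed_fit(errors)
```

As in Figure 9.12, the narrower model fitted after trimming is the one applied in the multilateration algorithm.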
9.2.0.7 Sources of Error
In the course of running range experiments, we developed several hypotheses
about sources of error in this system. We hypothesize three primary sources of
error: time synchronization error, excessive noise, and weather factors.
Time synchronization error occurs when radio link quality changes and old
time conversions are kept without being updated. This occurred due to a bug in
the system which has since been corrected; time conversions now time out after
a certain period. However, we suspect that this bug may have tainted this data.
We plan to perform a followup experiment to address this.
Our estimation algorithms reject noise very effectively, but a sufficient level of background noise will still render the signal undetectable. This problem is mainly an issue in urban environments; in natural environments, noise sources that pose a significant problem for our system are rare.
Weather factors are the primary source of error and the most difficult to
address. As we discussed in Section 3.4, variations in the temperature change the speed of sound by roughly 0.2% per degree C. Wind and relative humidity also affect the accuracy of an
acoustic ranging system. Worse, it is unclear how to compensate for this. First
of all, it is difficult to measure air temperature accurately enough to compensate
for temperature with the accuracy required to achieve centimeter precision over
80 meters—most temperature sensors are only accurate to 0.5 degC. Second, it is
not the temperature at a single point, but the average temperature all along the
path between the nodes that determines the correction.
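The temperature sensitivity can be made concrete using the standard linear approximation for the speed of sound in dry air, c ≈ 331.4 + 0.6 T m/s with T in deg C. The sketch below is illustrative, not the system's actual calibration code; it shows why a 0.5 degC sensor error matters over an 80 m path.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air, m/s (standard linear fit)."""
    return 331.4 + 0.6 * temp_c

def tof_to_range(tof_s, temp_c):
    """Convert a measured time of flight to a range estimate, using the
    assumed path-average temperature."""
    return tof_s * speed_of_sound(temp_c)

# An 80 m path at a true temperature of 20 degC...
tof = 80.0 / speed_of_sound(20.0)
# ...converted using a temperature reading that is off by 0.5 degC.
range_err_m = abs(tof_to_range(tof, 20.5) - 80.0)
```

The resulting error is on the order of 7 cm, already above centimeter precision, and this assumes the sensor reading is representative of the whole path.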
Given these facts, our approach is to try to apply our system in the region
where the environmental factors are relatively uniform, and therefore their im-
pact can be well approximated by average values. For example, by performing
calibration at night, we avoid uneven heating from solar radiation and many is-
sues associated with that, such as updrafts. Our other technique to combat this
problem is to take all the measurements as temporally close together as possible.
For example, rather than combining range data collected over a long period of
time, we attempt to perform the ranges in a short span, during which we can
assume minimal changes to the environmental parameters.
Unfortunately, this does not help us much with our ranging distance test,
because the measurements are taken over the course of a few hours, as we move
the two nodes farther and farther apart. This time lag, and the fact that Lot 9
is only partially enclosed, may explain why we saw what may be time–varying
error in our Lot 9 data (we performed the measurements in order from near
to far, so in sequence they increase monotonically both in time and distance).
For this dataset, we collected point measurements of temperature and humidity
taken at different times and places, which varied considerably. Rather than use
all those measurements, we corrected this dataset based on a single consistent
temperature, selected near the end of the run.
While this testing strategy gives reasonable results, it is possible that much
of the error we see is due to temporal variations in conditions. To get a more
consistent assessment of the ranging system independent of environment, we plan
to perform a follow–up experiment in an underground garage which should have
a more stable climate.
CHAPTER 10
System Testing
In this chapter we describe the results of several complete system tests. In each
case, we deployed 10 nodes into an environment, measured the ground truth, and
ran the system repeatedly. Each run performed 5 trials at each of the nodes,
attempting to receive the signals at all of the other nodes. All of the raw data
was captured for offline analysis.
We performed system tests in two different environments: first in the UCLA
Court of Sciences and then in a forested area of the UCR James Reserve in
Idyllwild, CA. These tests are described in the following sections.
10.1 Urban Outdoor Test: Court of Sciences
Our first test was performed on September 22, 2005 in the UCLA Court of Sciences. The Court of Sciences, shown in Figure 10.1, is a large paved courtyard
surrounded on three sides by buildings. In addition to the paved areas, the court-
yard has intermittent grassy areas, and several tall hedges and planters. In the
figure, the hedges are indicated with yellow lines. The positions of our nodes in the deployment are indicated with yellow dots, along with the ID of each node.
Table 10.1 describes the sequence of experiments and the measured weather
conditions. Before experiment 9, we calibrated the emitters using a sound level
Figure 10.1: The experimental setup for our system test in the UCLA Court of Sciences.
Node locations are indicated by numbered dots, while yellow bars indicate the location
of hedges. North is toward the top of the photo. Image courtesy of Google Earth.
Figure 10.2: Output of the NLLS Position Estimation Algorithm, for the 1:45 AM dataset. The green crosses denote ground truth; the red arrows show the position and orientation of each node.
Figure 10.3: Output of the R–θ Position Estimation Algorithm, for the 1:59 AM dataset. This dataset was the best result for R–θ.
Expt.  Time   deg C  Humidity  Nodes  Note
1      21:46  19.0   73        10
2      22:03  18.6   76        10
3      22:20  19.0   78        10
4      22:34  18.4   79        10
5      23:01  18.2   80        10
6      23:21  17.2   79        10
7      23:33  17.1   80        10
8      23:51  16.9   80        10
9      00:41  16.9   79        10     Cal. Emitters, Reboot
10     00:56  16.9   80        10
11     01:29  16.6   81        10
12     01:45  16.6   81        10
13     01:59  16.6   81        10
14     02:12  16.6   81        8      2 failed
Table 10.1: Experiment timing and weather conditions.
meter, so that each emitted 100 dB at 1 meter from the emitter. During the calibration
process, one of the nodes was accidentally rebooted. At experiment 14, two nodes
malfunctioned and failed to participate in the system.
10.1.1 Measurement of Ground Truth
The nodes were laid out in a grid in order to simplify the measurement of ground
truth. Each array was mounted on a tripod 1.5 meters above the ground. Each
array was oriented with 0 deg pointed west.
We measured ground truth positions using the Hilti PD–30 laser rangefinder.
By laying the nodes out in a grid, we were able to sight along grid lines and
position the nodes in lines, with measured distances between each node. We then
also measured distances across from one line to another where that was possible.
In some cases, such as between nodes 109 and 100, line of sight was blocked and
there was no way to measure an east–west range.
The accuracy of the position measurement was limited by the stability of
the tripods, by our measurement capability, and by time limitations. While the
tripods are fairly stable, they might easily sway on the order of a centimeter.
Our ranging measurements over these fairly long distances were not always made
level, and the target of the laser was not always positioned perfectly. In addition,
the process suffers from additive error because all of our measurements were made
to the next node in the line rather than to an absolute reference position. In the
end, our cross–checking ranges (e.g. diagonal ranges) were generally accurate to
within about 4cm.
We aligned the arrays using a laser held to the side of the array, pointing
toward the next node in the grid. While this did not provide an absolute orienta-
tion reference, it was probably the most accurate solution, given the difficulty of
measuring angles on the small scale of the arrays and the problems using compass
readings in the presence of the speaker magnets.
We observed differences in elevation across the courtyard, but we had no
way to accurately measure them. However, using Google Earth we were able to
capture depth information for our experimental positions. We don’t know the
precision of Google Earth’s depth map, but it reports in units of feet.
In the next several sections we will explore different aspects of the performance
of our system on this data.
10.1.2 Selecting the Residual Cutoff
The first question we would like to answer is what cutoff value to use for our studentized residual outlier rejection scheme. Given that we have ground truth
information for this data, we can do an experiment where we keep rejecting the
[Figure 10.4: "Studentized Residual Threshold Achieving Minimum Error"; effective residual threshold (left axis) and minimum 3D position error in cm (right axis) vs. experiments, ordered by decreasing threshold.]
Figure 10.4: Results of running our 14 courtyard experiments using a residual threshold
of 2. We see that half of our experiments do equally well with a threshold of 3.
worst residual, and determine the residual that brought the error to a minimum
value. By doing this on a series of datasets from this experiment, we can thus
empirically determine a good cutoff.
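The rejection scheme itself can be sketched as a simple loop: repeatedly find the measurement with the largest studentized residual and drop it if it exceeds the threshold. The toy below substitutes a single scalar quantity for the full multilateration system (the real residuals come from the NLLS fit), and its names and data are illustrative.

```python
import statistics

def studentized(data, i):
    """Externally studentized residual of data[i]: its deviation from the
    mean of the other points, scaled by their standard deviation."""
    others = data[:i] + data[i + 1:]
    mu = statistics.fmean(others)
    sd = statistics.stdev(others)
    return abs(data[i] - mu) / sd if sd > 0 else 0.0

def reject_outliers(measurements, threshold=3.0):
    """Iteratively drop the worst measurement until all studentized
    residuals fall at or below the threshold."""
    data = list(measurements)
    while len(data) > 3:
        worst = max(range(len(data)), key=lambda i: studentized(data, i))
        if studentized(data, worst) <= threshold:
            break
        del data[worst]
    return data

# One wild value mixed into a set of consistent measurements (meters).
kept = reject_outliers([0.0, 0.1, -0.1, 0.05, -0.05, 0.08, -0.08, 0.02, 50.0])
```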
In this experiment, we ran through 14 datasets collected from our Court of
Sciences test. These datasets vary from each other in several ways. Since they
were taken over the course of several hours from 9:45 PM through 2:15 AM, the
air temperature and humidity changed over that time period, resulting in changes
in the scaling of the map. In addition, near the end of the experiment several
nodes began to malfunction and failed to fully participate in the system, leading
to a reduction in the amount of data available to the algorithm.
Figure 10.4 shows the result of running our courtyard data with a residual
[Figure 10.5: "Error Achieved Given Different Residual Thresholds"; fraction with higher error vs. minimum 3D position error in cm, for residual thresholds T=2, T=3, and T=4.]
Figure 10.5: CDF of the results of applying several different residual thresholds to our
14 courtyard experiments.
threshold of 2 (two experiments failed to converge well enough to drop any points, so they were left out of this graph). When such a low threshold is used, the algorithm will continue dropping constraints, even after all of the remaining constraints are “good” constraints that reduce the overall system error. This graph shows two curves: the minimum position error achieved by dropping constraints, and the residual threshold required to achieve that minimum. To compute this graph, we used ground truth to record the minimum error achieved each time a new low was reached, along with the lowest–valued residual dropped up to that point. Thus, if we were to re–run one of these experiments with the “effective” threshold set as our threshold, we would achieve that minimum error, although we might then continue to drop additional constraints past that point.
We interpret this graph by observing that half of our experiments have an effective threshold of 3 or greater.
when the true minimum was achieved with a lower threshold, a threshold of 3
got “most of the way” to the minimum, while dropping many fewer constraints.
Figure 10.5 shows the distributions of minimum error for rejection thresholds of
2, 3, and 4.
Since this result only represents a single dataset in a single environment, we
do not claim that our selection of a residual threshold value is universal. Rather,
we anticipate that the residual threshold might be tuned in the field. Fortunately,
the choice of residual cutoff value is not a difficult parameter to tune. There is
little risk in choosing a sub–optimal value. If we choose too high, there might be
too much bad data, resulting in poor convergence. If we choose too low, good
data may be dropped and the system may drop so much data that convergence
fails, and some nodes are not placed. An incorrect threshold choice is readily
detectable by observing the fit quality metric reported by the average range
residual value, by observing the amount of data dropped, and by observing the
high–level performance of the system. For the remainder of this analysis, we use
a threshold of 3; we anticipate that the threshold of 3 will work well for a wide
range of environments and will rarely need to be adjusted.
10.1.3 Comparison of R–θ and NLLS
In Chapter 4 we described two different position estimation algorithms, R–θ and
NLLS. To evaluate these algorithms, we run each of them on our 14 courtyard
datasets. The results of these runs are shown in Figure 10.6.
The graph shows two curves each for the R–θ and NLLS algorithms. For each
algorithm, the first curve shows position error projected onto the (x, y) plane after
fitting to ground truth, while the second curve shows the 3–D position error. We
[Figure 10.6: "Comparison of R–θ and NLLS Algorithms: Position Error"; position error in cm (log scale) vs. experiment, with XY and XYZ error curves for each algorithm.]
Figure 10.6: Position error achieved by the R–θ and NLLS algorithms on our 14 court-
yard experiments, using a residual threshold of 3. The NLLS algorithm consistently
outperforms the R–θ algorithm because it is able to make better use of the more ac-
curate range data. Our 2D results improve upon those in [KMS05] by a factor of
20.
show both of these results for two reasons. First, our system is much better constrained in the X–Y plane, because the nodes' placement deviates minimally from that plane, and because the zenith angle estimation is generally less accurate than the other measurements. Second, our ground truth measurements are much
more accurate in the (x, y) plane than in the Z direction. For depth measurements
we are relying on data from Google Earth which is at best accurate to 30 cm.
As we can see, the NLLS algorithm performs quite well in the X–Y plane and
significantly outperforms R–θ in both 2–D and 3–D estimation.
For a point of reference, we can compare against the results reported by Kwon et al. [KMS05], for a system of 45 nodes deployed on a grid in a
[Figure 10.7, two panels: (a) "R–θ and NLLS Algorithms: Avg. Range Residual"; (b) "NLLS Position Error and Avg. Range Residual"; cm (log scale) vs. experiment.]
Figure 10.7: (a) Average Range Residual achieved by the R–θ and NLLS algorithms
for our courtyard experiments, using a residual threshold of 3. (b) Average Range
Residual and Position Error for NLLS.
60 meter by 55 meter grassy area, with a minimum inter–node spacing of 9 me-
ters. Their system reported an average 2–D position error of 2.47 meters (or,
1.5 meters after dropping the largest 5 errors). In comparison, our system reliably
achieved average 2–D position errors of about 10 cm in a similar–sized deploy-
ment, but with only 10 nodes and a correspondingly larger minimum inter–node
spacing of 20 meters, and with a terrain interrupted by thick hedges. The smaller
number of nodes in our case does not benefit us, since we are not comparing the
computational complexity of the multilateration algorithms, and having fewer
nodes means a less–constrained system.
We observe an anomaly in the graph for experiment 9, where the NLLS algo-
rithm failed to converge. We suspect this was the result of a software bug that
caused a synchronization problem after the accidental reboot of one of the nodes.
This bug has since been fixed.
Figure 10.7(a) compares R–θ and NLLS using our other metric, Average
Range Residual. In this metric, we compute the average range residual, inde-
pendent of ground truth, by subtracting each measured range from the distance
between the computed position estimates in the map. We hoped that this quality–
of–fit metric might be useful as an indicator of position error. Figure 10.7(b) plots
position error and range consistency for NLLS on the same graph. While they
appear to be correlated, it is still not clear whether we can conclude much from
a good fit, although a bad fit is probably indicative of high position error. We
will take a more detailed look at this question in Section 10.2.
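Computed directly, the metric is the mean absolute difference between each measured range and the inter–node distance implied by the estimated map. A minimal sketch, using hypothetical positions and ranges:

```python
import math

def avg_range_residual(positions, ranges):
    """positions: node id -> (x, y) estimate; ranges: (a, b) -> measured
    range. Returns the mean |measured - implied| over all measured pairs."""
    resids = []
    for (a, b), measured in ranges.items():
        (ax, ay), (bx, by) = positions[a], positions[b]
        implied = math.hypot(ax - bx, ay - by)
        resids.append(abs(measured - implied))
    return sum(resids) / len(resids)

# A 3-4-5 triangle with slightly inconsistent measured ranges.
pos = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 4.0)}
rng = {(1, 2): 5.1, (1, 3): 4.0, (2, 3): 2.9}
residual = avg_range_residual(pos, rng)
```

Because it needs no ground truth, this quantity can be computed in the field on every run.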
[Figure 10.8: "Correlation of Map Scaling With Temperature"; scaling factor (left axis) and temperature in deg C (right axis) vs. experiment.]
Figure 10.8: Scaling factors relative to ground truth, and air temperature. We see a
correlation between map scaling relative to ground truth, and air temperature.
10.1.4 Map Scaling With Temperature
As we have previously noted, environmental parameters such as temperature
cause scaling in our estimated position map. Figure 10.8 shows a correlation
between air temperature and the scale of our position estimates.
Recall that our position error metric fits the position estimates to the ground
truth, and that fit includes a scale factor. That scale factor is shown on the left–
hand Y axis of Figure 10.8. The right–hand Y axis shows the air temperature in
deg C recorded during our courtyard experiment at a single point in the middle
of the field.
While the two sets of data show a correlation, they do not match the model
for the variation of the speed of sound as a function of temperature.² We do not
have a satisfying explanation for this discrepancy, but our current conjecture is
that there is an additional scaling factor in the system for which we are not
accounting. Future work on this problem might allow this system to produce
highly accurate air temperature estimates.
10.1.5 Repeatability
In this section we analyze the repeatability of our position algorithms across ex-
periments. One of the principal difficulties in analyzing this system is accurately
capturing ground truth. A consistent bias in the output of position estimation
might suggest either a flaw in the ground truth, or a persistent bias in the system
itself.
To investigate this further, we computed statistics for each node’s position
estimates over our courtyard experiments. Note that we dropped experiment 9 as
an outlier—this is fair because we detected a convergence failure and a very poor
fit metric, indicating that the experiment failed. The results of the remaining 13
experiments were summarized for each node by the mean and standard deviation
of each component: X, Y, Z, roll, pitch, and yaw.
The results of these analyses are shown in Figures 10.9 and 10.10. Recall
that to compute our position error metric, we first normalize the map to extract
its shape, by filtering out scale, translation, and rotation relative to the ground
truth landmarks that we use as a template. Once all maps have been fitted to the
ground truth, we can plot the distribution of estimates for a given node over all
experiments, relative to the ground truth values. In the upper plot in Figure 10.9,
we have plotted each node on an X–Y plot relative to the ground truth value.
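The normalization step can be sketched as a closed–form 2–D similarity fit, in the style of a Procrustes alignment (the dissertation's actual fitting code may differ, and handles 3-D and rotation of orientations as well). Given paired estimated and ground truth points, it recovers the scale, rotation, and translation that best map one set onto the other:

```python
import math

def similarity_fit(est, truth):
    """Least-squares 2-D similarity transform (scale s, rotation theta,
    translation t) minimizing sum ||s*R*p + t - q||^2 over paired points."""
    n = len(est)
    pcx = sum(p[0] for p in est) / n
    pcy = sum(p[1] for p in est) / n
    qcx = sum(q[0] for q in truth) / n
    qcy = sum(q[1] for q in truth) / n
    a = b = pp = 0.0
    for (px, py), (qx, qy) in zip(est, truth):
        px, py = px - pcx, py - pcy
        qx, qy = qx - qcx, qy - qcy
        a += px * qx + py * qy   # correlation term (cosine component)
        b += px * qy - py * qx   # cross term (sine component)
        pp += px * px + py * py
    theta = math.atan2(b, a)
    scale = math.hypot(a, b) / pp
    c, s = math.cos(theta), math.sin(theta)
    tx = qcx - scale * (c * pcx - s * pcy)
    ty = qcy - scale * (s * pcx + c * pcy)
    return scale, theta, (tx, ty)

# A map estimated at half scale should fit with a scale factor of 2.
truth = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
est = [(x / 2, y / 2) for x, y in truth]
scale, theta, (tx, ty) = similarity_fit(est, truth)
```

The recovered scale factor is the quantity plotted against temperature in Figure 10.8.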
² We also captured relative humidity data, but including this did not help the fit significantly.
[Figure 10.9, two panels: "NLLS Repeatability: X/Y" (upper, per–node scatter in cm) and "NLLS Repeatability: Z" (lower, per–node errorbars in cm).]
Figure 10.9: Repeatability statistics for position estimates, showing the per–node distri-
bution of deviations from ground truth. All errorbar ranges are ± Standard Deviation.
The mean standard deviations for X, Y, and Z estimates over all nodes are 3.18 cm, 3.85 cm, and 49.15 cm, respectively.
[Figure 10.10, two panels: "NLLS Repeatability: Yaw" (upper, per–node errorbars in deg) and "NLLS Repeatability: Roll/Pitch" (lower, roll vs. pitch in deg).]
Figure 10.10: Repeatability statistics for Yaw, Pitch, Roll, computed using the same
method as in Figure 10.9. All errorbar ranges are ± Standard Deviation. The mean
deviation for yaw estimates over all nodes is 1.37 deg.
Thus, for example, we can see that node 100 was consistently estimated to be
north of the landmark by 15 cm, and east by 10 cm. If the estimated maps
matched the ground truth exactly, they would all be plotted in the center of the
map at (0,0).
In each case, the range of the errorbars is twice the standard deviation. In
cases such as Node 105 where there is a tight cluster of points substantially offset
from the origin, this suggests that there might be an error in ground truth. In
cases such as Node 100 where there is a diffuse cluster of points, it is less clear.
However, it is worth noting that our ground truth measurement of Node 100 was
based only on range and bearing to 105, because a hedge blocking our line–of–
sight to 109 made a range measurement to the west impossible.
Overall, this data is inconclusive: while it is entirely possible that many of the offsets we see in (x, y) position are due to errors in ground truth, the deviations are still small and we have no way to re–examine ground truth. Some of the worst
outliers, such as 100, correspond to nodes that we know were poorly measured,
and given that the ground truth measurements incorporate additive error, errors
of 5–10 cm across the field are not surprising. However, errors on that order
could also result from some potential calibration issues that are currently not
well understood. These are discussed in more detail in Section 10.3.
Repeatability in the Z dimension is shown in the lower plot of Figure 10.9.
Because of the flat topology of this deployment, the Z dimension is poorly con-
strained and must rely largely on the less–accurate angular data for its estimates.
While our ground truth is accurate to only about 30 cm, we see variation on the
order of a meter or more, which we can safely say is estimation error.
Figure 10.10 shows similar statistics computed for our yaw, pitch, and roll
estimates. Here we see that the yaw estimates are quite repeatable with an
Figure 10.11: The experimental setup for our system test in the James Reserve in
Idyllwild. Node locations are indicated by red numbered dots. North is toward the top
of the photo; all arrays were aligned by compass to point west.
average range of about 3 degrees. Given the size of the arrays, a ground truth
alignment error of 5 degrees is not surprising. The data shows that the arrays
are mostly oriented accurately, with a few, such as 102 and 105 that seem likely
to be misaligned rather than mis–estimated.
10.2 Forest Outdoor Test: James Reserve
In order to evaluate the performance of the system in an environment more
realistic for our typical applications, we performed another system test in the
Figure 10.12: 3–D map generated by the NLLS algorithm from our deployment in the
James Reserve. Ground truth is shown as crosses, estimated positions and orientations
as arrows.
Figure 10.13: 3–D position estimation map generated by the R–θ algorithm. Both this
and Figure 10.12 use data captured at 10:30 AM on September 29, 2005.
James Reserve in Idyllwild, on September 28, 2005. For this test, we planted 10
stakes, roughly in the locations shown in Figure 10.11.³ We aligned the arrays
using a compass, lining the edge of the compass up with the edge of the array by
eye. The arrays were aligned with 0 deg pointing west.
We attempted to measure ground truth positions for the arrays, but it was
very difficult to get accurate readings because of the foliage and significant varia-
tions in elevation. In order to simplify the collection of ground truth, we aligned
the arrays linearly in a grid–like topology, and measured point–to–point ranges
with the laser rangefinder. We used a hand–held altimeter to get approximate
elevations, accurate to at best 1 meter. Based on this data, we have an ap-
proximate map, but it is not nearly as accurate as our courtyard ground truth
data.
Figures 10.12 and 10.13 show maps generated by our two positioning algo-
rithms, fitted to our approximate ground truth measurements. Both maps show
data from 10:30 AM, 5 hours after the beginning of the experiment. Due to bat-
tery problems during the test, only 6 experiments included all 10 nodes at once,
for two brief periods at 10 AM and at 2 PM. We selected this particular experi-
ment because of the experiments with 10 nodes, it reported the best position error
score, 44 cm average position error (139 cm for the R–θ algorithm). Table 10.2
shows the position error results for the 6 experiments with all 10 nodes.
When we consider all of the experiments (i.e. including those with fewer than
10 nodes), we find a much greater variation in the position error. Figure 10.14
shows histograms of the position error and range consistency metrics using the
NLLS and R–θ algorithms, over all experiments. The NLLS histogram includes
only those trials where the NLLS algorithm reached convergence, but is repre-
³ Image retrieved from the James Reserve GIS system by Vanessa Rivera Del Rio.
[Figure 10.14, two panels: "Position Errors from James Reserve" (upper) and "Avg. Range Residuals from James Reserve" (lower); fraction vs. cm for the R–θ and NLLS algorithms.]
Figure 10.14: Histograms of the Position Error and Average Range Residual Metrics for
the James Reserve data. NLLS outperforms R–θ according to both metrics, although
inaccuracies in ground truth likely prevent errors under 50 cm.
           Position Error (cm)   Average Range Residual (cm)
Time       NLLS      R–θ         NLLS      R–θ
10:34 AM   51        153         2.41      92
10:39 AM   44        139         4.42      107
10:44 AM   297       110         174       114
10:49 AM   59        359         2.07      43
2:46 PM    55        186         2.38      308
2:51 PM    55        145         2.96      257
Table 10.2: Position Error and Average Range Residual metrics for the NLLS and R–θ
algorithms, run on the 6 10–node experiments captured at the James Reserve. For the
experiment at 10:44 AM, the NLLS algorithm failed to reach convergence.
sented as a fraction of all trials.
The NLLS algorithm generally performs better, although both algorithms
have a significant number of results with very high average error values. Inves-
tigating some of these cases, we found that these large values of the position
error metric were often due to a single node that was placed quite far away from
the correct location. We found that dropping the worst position error from the
average reduced the percentage of experiments with average position errors over
500 cm from 30% to 2.6%.
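The drop-the-worst computation above is just a trimmed average; a minimal sketch with hypothetical per-node errors (not the actual experiment data):

```python
def average_position_error(errors_cm, drop_worst=0):
    """Mean position error, optionally dropping the largest values."""
    kept = sorted(errors_cm)
    if drop_worst:
        kept = kept[:len(kept) - drop_worst]
    return sum(kept) / len(kept)

# Hypothetical experiment: one badly placed node dominates the average.
errors = [38.0, 45.0, 52.0, 41.0, 60.0, 47.0, 55.0, 49.0, 44.0, 2100.0]
full = average_position_error(errors)                    # ~253 cm
trimmed = average_position_error(errors, drop_worst=1)   # ~48 cm
```

A single outlier of this magnitude moves the average by an order of magnitude, which is why the over-500-cm fraction drops so sharply when the worst node is excluded.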
Note that the histogram of range residuals in Figure 10.14(b) does not show
the cluster of large errors we see in the position error data. This suggests that the mis–placed nodes result from under–constrained systems, which would tend to yield low average range residuals. Unfortunately, we did not observe any significant predictive relationship between average range residual and average position
error. We believe that more work is needed to discover a metric independent of
ground truth that can be used to identify bad fits. Perhaps these metrics can be extended to take into account how well–constrained a node is, for example by computing the average range residual and the average position error only among well–constrained nodes. We leave
[Figure 10.15: two panels, “NLLS Repeatability: X/Y” (X/Y position estimates in cm, labeled by node 100–111) and “NLLS Repeatability: Z” (Z estimates in cm, per node).]
Figure 10.15: Repeatability statistics for position estimates, over all 10–node James
Reserve data. All errorbar ranges are ± Standard Deviation. The mean standard
deviations for X, Y, and Z estimates over all nodes are 3.48, 3.78, and 17.1 respectively.
[Figure 10.16: two panels, “NLLS Repeatability: Yaw” (degrees, per node) and “NLLS Repeatability: Roll/Pitch” (roll vs. pitch, in degrees).]
Figure 10.16: Repeatability statistics for Yaw, Pitch, and Roll. All errorbar ranges are
± Standard Deviation. The mean standard deviation for yaw estimates over all nodes
is 3.15 deg.
these efforts to future work.
Figures 10.15 and 10.16 show statistics for the variation in position estimates
over the 5 experiments for which 10 nodes reported and the NLLS converged.
In terms of X and Y, this data exhibits comparable variance to the courtyard
data, but with a much wider spread in the means of the position errors, perhaps
the result of errors in ground truth. In the case of the Z axis data shown in
Figure 10.15(b), the variance is actually lower than the comparable data from
the courtyard experiment. This improvement is probably due to the fact that the
James Reserve node placement has more variation in the Z axis, yielding a more
well–constrained system.
10.3 Analysis of Symmetric Ranges
Because it is often difficult to get accurate ground truth information from our
less–controlled system tests, this data is not always very helpful for characterizing the performance of the underlying ranging estimates. However, we can use this
data to learn about the ranging system by comparing “symmetric ranges”, i.e.
the range from Node A to Node B vs. the range from Node B to Node A.
Figure 10.17 and Figure 10.18(a) show symmetric range data from our James
Reserve experiment, comparing three nodes against each other. We can observe
several properties of this data. First, the data curves consistently downward, reflecting the temperature increase from the starting time of 5:30 AM through to 12:30 PM. Second, the initial data is much cleaner than the later data. We believe
that this additional noise is due both to the increase in audible noise during the
day, as well as weather conditions, including solar heating and wind currents.
Some of this noise is also attributable to synchronization errors.
[Figure 10.17: two panels of symmetric ranges vs. experiment time (sec after 5:30 AM): “Symmetric Ranges: 100 vs. 102” (100→102 and 102→100, roughly 3450–3600 cm) and “Symmetric Ranges: 102 vs. 103” (102→103 and 103→102, roughly 1400–1450 cm).]
Figure 10.17: Symmetric ranges, showing variation as a function of temperature and a
consistent offset.
[Figure 10.18: (a) “Symmetric Ranges: 100 vs. 103” vs. experiment time (roughly 2050–2150 cm); (b) “Raw Symmetric Ranges: 100 vs. 103” vs. experiment sequence (roughly 2100–2160 cm).]
Figure 10.18: (a) Symmetric ranges for 100–103, and (b) Raw range data showing
probable synchronization failure.
To get better insight into how well the time synchronization system is working,
we show the raw experiment data in Figure 10.18(b). Whenever there is an error
in synchronization, forward and reverse ranges are likely to be affected oppositely.
The reason for this is that if the skew rate parameter of the conversion is incorrect,
this parameter will skew times positively in one direction and negatively in the
other. In the graph, we see several instances where the lines diverge symmetrically.
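The symmetric effect of a skew-rate error can be sketched numerically (the linear error model and constants here are our illustrative assumptions, not the actual time-conversion code):

```python
C = 34300.0  # speed of sound in cm/s (nominal)

def symmetric_ranges(true_range_cm, skew_error, elapsed_s):
    """Forward and reverse range estimates when the clock-conversion
    skew-rate parameter is wrong by `skew_error` (dimensionless).

    The accumulated conversion error grows with the time elapsed since
    the clocks were last fitted, and its sign flips depending on which
    direction the timestamp is converted -- so the two ranges diverge
    symmetrically around the true range.
    """
    tof = true_range_cm / C
    time_error = skew_error * elapsed_s
    forward = (tof + time_error) * C   # A -> B: conversion error adds
    reverse = (tof - time_error) * C   # B -> A: same error subtracts
    return forward, reverse

# 20 m true range, 10 ppm skew error, 5 s since the last clock fit:
fwd, rev = symmetric_ranges(2000.0, 10e-6, 5.0)
# fwd and rev bracket the true range; their average recovers it
```

Even a 10 ppm skew error is visible at centimeter resolution after a few seconds, which is consistent with the divergence in the raw data.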
In the time since we performed this experiment, we have located and fixed a
bug in the time synchronization system that would retain old conversions long
after they were no longer valid. This problem was not noticeable in the lab or
in the courtyard test, because in those tests the connectivity density was higher.
However, in our James Reserve deployment, we encountered a sparser network
connectivity graph and in addition had several nodes run out of battery—leaving
conversions still in the system. We addressed this problem by adding a timeout
to remove conversions that are not periodically updated.
In addition to synchronization errors, there appears to be a fixed bias for each
node with respect to other nodes. We can see how this might be possible if each
node has some fixed delay that varies from node to node, because that delay
would add in when sending, and subtract when receiving. However, we currently
do not have an explanation for this anomaly.
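Under the fixed-delay hypothesis described above, the per-node delay shifts the two directions in opposite ways and cancels in their average; a minimal sketch with hypothetical delays (expressed in cm of equivalent path):

```python
def biased_pair(true_range_cm, delay_a_cm, delay_b_cm):
    """Symmetric ranges when each node has a fixed delay that adds
    when it sends and subtracts when it receives."""
    r_ab = true_range_cm + delay_a_cm - delay_b_cm  # A chirps, B listens
    r_ba = true_range_cm + delay_b_cm - delay_a_cm  # B chirps, A listens
    return r_ab, r_ba

r_ab, r_ba = biased_pair(2100.0, delay_a_cm=8.0, delay_b_cm=3.0)
offset = (r_ab - r_ba) / 2     # recovers delta_A - delta_B = 5 cm
unbiased = (r_ab + r_ba) / 2   # recovers the true range, 2100 cm
```

If the offset really is a stable per-node delay, the half-difference of each symmetric pair would estimate the delay difference, and the average would be usable as a bias-free range.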
CHAPTER 11
Conclusions
This work has made many important contributions to the field. Some of these con-
tributions have already been published, in [GE01] [GBE02] [MGE02] [EGE02b]
[GEC04] [GSR04] [GLP05]. Others, including a detailed description of the plat-
form and the position estimation algorithm, have not yet been published.
11.1 Summary of Contributions
In this work, we have demonstrated a deployable, wireless platform capable of
hosting distributed acoustic sensing applications. Our acoustic ranging and po-
sition estimation application requires many of the same capabilities as would a
distributed sensing and detection application such as woodpecker detection. Dur-
ing our system testing, we tested the system outdoors and developed many of the
necessary tools for deploying the system practically in the field. The results of
our position estimation tests show that we are able compute position estimates
to within our target tolerances, and orientation estimates nearly meeting our tar-
gets. With some additional work, we believe that these results can be improved
to easily meet our requirements.
Platform Development. In the process of building and testing this system,
we put significant effort into the development of a reusable platform, with a large
investment in reusable system software components. Over the course of several
years we developed Emstar, a software system designed to simplify the devel-
opment of wireless embedded networked systems1. Emstar includes numerous
reusable components, and embodies a set of design principles that we have used
to build a complex, yet robust system. We have implemented Emstar above a
hardware layer which we integrated to provide a platform capable of distributed
acoustic sensing and local DOA estimation via a microphone array.
Time–Synchronized Sampling and Multihop Coordination. Our imple-
mentations of time–synchronized sampling and multihop coordination primitives
are a key part of the platform we have built. We demonstrate their utility in
our implementation of acoustic ranging and position estimation, showing how
these components of the platform drastically simplify the implementation of our
position estimation application. We use the capability to do precise time syn-
chronization over multiple hops to perform accurate time–of–flight ranging, and
we use the StateSync coordination primitive to share that data reliably and ef-
ficiently over multiple RF hops, greatly simplifying many system aspects of the
position estimation algorithm.
Ranging and Direction of Arrival Estimation. We have developed a set
of algorithms that yield highly accurate range and DOA estimates. While these
techniques are not individually novel, their integration into a working distributed
system and a successful position estimation algorithm is. In this work we combine
many areas in signal processing to yield a high–performance ranging system, and
prove out our basic platform design with a demanding algorithm. While there is
1Emstar includes the work of many contributors, a few of whom are: Jeremy Elson, Thanos Stathopoulos, Alberto Cerpa, Nithya Ramanathan, and Martin Lukac.
still room for improvement and further testing, our ranging system achieves high
precision, with a standard deviation of 1.7 cm, long ranges of up to 90 meters,
and excellent noise immunity. The ranging is resilient to obstructing foliage, as
we have demonstrated in our tests at the James Reserve. The DOA estimation
determines estimates in 3–D, and obtains azimuth angles with a standard de-
viation of 0.96 deg and zenith angles between -30 and 45 deg with a standard
deviation of 0.86 deg, and slightly worse performance for angles above 45 deg.
The performance of the ranging sub–system is comparable to the best work
based on ultrasound [SHS01], and significantly better than that of previous sys-
tems based on audible sound [KMS05] [SBM04]. However, our system also performs well in the presence of foliage. The DOA sub–system performs considerably better than similar systems based on ultrasound, such
as [PMB01], perhaps because our system can perform sub–sample phase compar-
isons.
Positioning Application Performance. We have demonstrated position estimation performance far better than other work in the field, under more difficult
circumstances. The work of Kwon et al. [KMS05] is closest to ours in terms of
the objective: outdoor, ad–hoc position estimation in 2–D, in a grassy environment. Relative to this, we took our deployed testing a step farther, performing our tests in a very difficult environment containing significant levels of obstruction and multipath interference, and deploying the system more sparsely, as described in
Section 10.2. Despite this more challenging environment, our system performed
considerably better, giving typical average 3D position errors of 60 cm relative
to the 1.5–2.6 meter 2D position errors reported in [KMS05]. Furthermore, our
repeatability analysis and a comparison to our courtyard test described in Section 10.1 suggest that the true accuracy is actually much higher than is revealed
through comparison to our approximate ground truth measurements.
The effort invested in building the platform and the system software com-
ponents such as Emstar, the Synchronized Sampling layer and the StateSync
protocol has ensured that this system works as a stand–alone system, robustly,
and runs on a wireless, embedded system. In addition, it provides a platform
and a foundation for other applications to be built above. Applications such as
the woodpecker detection and monitoring system are already being built to run
on this platform.
11.2 Discussion
We now consider the system as a whole, and ask several questions: how practical is it, how well does it scale, and what can be generalized
from it?
11.2.1 Practicality and Cost of the System
Our system is based on an Intel PXA platform with 64 MB of RAM and 32 MB
of flash2. This increases the cost of the platform both in terms of dollars and en-
ergy consumed. Our current system is a prototype and has not been optimized;
if it were optimized the cost in both metrics could be considerably lower. On
the hardware side, the processor board could be integrated with a customized
sampling board and possibly with a DSP pre–processor. The microphone ar-
ray could be redesigned to be lighter, smaller, and more symmetrical in shape.
The software could be greatly optimized both by optimizing the code as well as
2The CPU board is developed and marketed by Sensoria Corporation. The acoustic platform hardware was integrated by Jefferey Tseng, and is now being offered as a product by Aevena Corporation.
optimizing the operation of the system.
Many of the other systems that have been documented in the literature are
based on much lighter–weight platforms. Work at UIUC [KMS05] and Vanderbilt [SBM04] is based on the Mica2 [HC02], while the Cricket [PCB00] [PMB01]
system is based on its own platform, and the AHLoS work [SHS01] also con-
structed a specialized platform based on a 16–bit ARM processor. In our system
we did not attempt to fit everything on an 8– or 16–bit processor, in part to
avoid early optimization, but also because we wanted to build a general–purpose
platform to support easy construction of experimental signal processing applica-
tions. While it might be possible to use an 8–bit microcontroller to do woodpecker
detection, it definitely will not be easy3.
Given our first–order goal of supporting embedded signal–processing applica-
tions, we chose to develop a system to be sparsely deployed in relatively small
numbers, and to have each node host an acoustic array. These applications are
also the source of our more stringent accuracy requirements, and of the require-
ment that we estimate array orientation.
In the final analysis, a distributed acoustic detection system will need all of
the features of the platform we have built: a local microphone array, sufficient
local processing and storage, high bandwidth wireless network with an appropri-
ate networking stack, and high accuracy time synchronization. Our automatic
position estimation system not only provides another key feature of the platform,
but it also comes at no additional cost, because all of the features it depends on
are already required by our set of target applications.
3This is a generous statement; distinguishing one species from another at a reasonable distance, if done digitally, requires quite a bit of processing.
11.2.2 Scaling Properties and Applicability Across Environments
We did not do a thorough investigation of the scaling properties of our system.
This was partly because we only constructed 10 nodes, so we could not test
larger networks in real life. However, it is clear that as the number of nodes grows, so will the resources required by the multilateration algorithm.
While we did not address this issue in this system, we have experience from
a prior system that scaled to 90 nodes [MGE02]. In this system, we gradually
accreted a coordinate system by adding 14 nodes at a time. We selected the
nodes to add based on their connectivity to each other: starting with the most
well–connected node and next adding in the node that was best connected to the
nodes in our current set. This method was simple and appeared to work well,
although we did not carefully measure its performance.
Scaling might also be accomplished by distributing the multilateration algo-
rithm to several nodes, and stitching the resulting coordinate systems together.
Some of the work we have done on the map fit heuristics in our position error
metric might apply well to this stitching process. If each node computed a patch
of the coordinate system nearest them and broadcast the results via StateSync,
data from the other nodes could be stitched in using translation, scaling, and
rotation. In a more sophisticated solution, StateSync can be used to elect leaders
who can locally coordinate this process.
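One possible core for such stitching is a rigid least-squares alignment over the nodes two patches share; a 2-D sketch omitting the scaling term (the function names are ours, not part of the system):

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping points in `src`
    onto corresponding points in `dst` (2-D Procrustes, no scaling).
    Returns (theta, tx, ty)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    sxx = sxy = syx = syy = 0.0  # cross-covariance of the centered sets
    for (x, y), (u, v) in zip(src, dst):
        x -= csx; y -= csy; u -= cdx; v -= cdy
        sxx += x * u; sxy += x * v; syx += y * u; syy += y * v
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def apply_rigid_2d(p, theta, tx, ty):
    """Apply the fitted rotation and translation to one point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

# Example: a patch rotated 0.5 rad and shifted (12, -4) is recovered.
src = [(0.0, 0.0), (10.0, 0.0), (0.0, 5.0), (7.0, 3.0)]
dst = [apply_rigid_2d(p, 0.5, 12.0, -4.0) for p in src]
theta, tx, ty = fit_rigid_2d(src, dst)
```

Given the coordinates that two patches assign to their shared nodes, the fitted transform carries one patch into the other's frame before merging.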
11.2.3 Ideas and Components to Carry Forward
Many aspects of this system will be useful for other research and system devel-
opment efforts. First of all, the platform itself is designed to be used to build
new applications. Enormous effort is required to assemble the hardware, including the physical acoustic array, and to get the whole software system working, including getting all of the parts of the system synchronized. Doing this requires
attention to details at every layer of the system from physical hardware and me-
chanical construction to system software to kernel drivers to the many layers of
software and the position estimation application that calibrates the deployed sys-
tem. Having all of this in one off–the–shelf box is a huge step forward for anyone
contemplating doing distributed acoustic sensing and detection. Already we have
several students beginning to work with this platform, and we hope that trend
continues.
In addition, many of the sub–systems and components are also valuable on
their own.
Emstar. The Emstar system is a valuable resource and is being used in numer-
ous other contexts, both within our lab and outside it. Emstar provides a software
framework designed for wireless embedded networking [GEC04], that is integrated
with simulation tools and that comes with a collection of tools and components
that are useful in deployment. Emstar also simplifies the integration of Motes
and other TinyOS devices into a common, heterogeneous system [GSR04].
StateSync. The StateSync primitive and implementation is currently being
used and extended by several other projects in our lab. StateSync provides a
simple interface to efficient, reliable, low–latency state dissemination to one–hop
neighbors or over multiple hops. StateSync has broad application to wireless
networked systems, facilitating the implementation of efficient routing protocols,
leader election algorithms, and distribution of configuration or calibration data.
Synchronized Sampling and Audio Server. The code for the Synchronized
Sampling layer is available, although it is not always trivial to port to hardware
because of the dependence on kernel modifications and in some cases special-
ized firmware. However, the audio server interface is easily ported, and as new
platforms are standardized, kernel patches can be implemented to add the syn-
chronization hooks. There is already one other student in our lab who has ported
the Audio Server for use on an iPAQ for an acoustic sensing project that does not
require tight synchronization.
Ranging and DOA Estimation. The Ranging and DOA estimation tech-
niques are not necessarily novel, although they collectively represent a working
system: an integration of known techniques that yields a characterized perfor-
mance. In our work we identified a few important ideas that are worth not-
ing. First, that 2–TDOA schemes work better than Angular Correlation (AC)
schemes, because AC schemes are too sensitive to the exact placement of micro-
phones, whereas multiple 2–TDOA measurements can be combined while allowing
for slip in the microphone placement. Second, that by interpolating the corre-
lation function at a higher resolution we are readily able to achieve sub–sample
phase comparison in the time domain.
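As a sketch of the second idea: our implementation interpolates the correlation function at higher resolution, but a parabolic fit around the integer peak illustrates the same sub-sample principle (the signals here are synthetic):

```python
import math

def cross_correlation(a, b, max_lag):
    """Cross-correlation of two equal-length signals over lags
    in [-max_lag, max_lag]."""
    out = []
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i, ai in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                s += ai * b[j]
        out.append(s)
    return out

def subsample_peak(corr):
    """Refine the integer peak index with a parabolic fit over its two
    neighbors, returning a fractional (sub-sample) peak position."""
    k = max(range(len(corr)), key=corr.__getitem__)
    if k == 0 or k == len(corr) - 1:
        return float(k)
    y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
    denom = y0 - 2.0 * y1 + y2
    return k + (0.5 * (y0 - y2) / denom if denom else 0.0)

# Two synthetic pulses offset by a fractional 2.4 samples:
a = [math.exp(-((i - 20.0) ** 2) / 8.0) for i in range(50)]
b = [math.exp(-((i - 22.4) ** 2) / 8.0) for i in range(50)]
corr = cross_correlation(a, b, max_lag=5)
lag = subsample_peak(corr) - 5  # close to 2.4, despite integer sampling
```

The integer peak alone would report a lag of 2 samples; the refinement recovers the fractional offset, which is what makes phase comparison below the sampling period possible.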
Position Estimation and Metrics. We have not seen the exact formulation
of our position estimation algorithm in prior work, so this may be of interest for
those researchers looking for a position estimation algorithm that works well for
3–D systems where angular data is available. In addition, our approach to fitting
for our position error metric, and our delineation from the basic Procrustean
approaches may at least be a good starting point for finding a good anchor–free position error metric, and may also be useful in stitching together a
coordinate system from patches. While our metric has performed adequately for
our current needs, we don’t claim that it is ideal or complete. Applying some
of the more advanced ideas in [DM98] might well yield significant improvements
over our metric.
The Utility of Angular Measurements. Another idea that we have demon-
strated in this work is that angular information is very important. We use angular
information in several ways:
• To get an initial guess before iterative refinement.
• To cross–check range data and detect likely reflections.
• To resolve many types of geometric ambiguity.
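The first of these uses can be written in closed form: one range plus a DOA estimate converts directly into a Cartesian seed for the iterative solver. A minimal sketch (we take θ as azimuth and φ as the zenith angle from +Z; the angle conventions are our illustrative choice):

```python
import math

def initial_guess(range_cm, theta_deg, phi_deg):
    """Cartesian seed position from one range and DOA estimate,
    taking theta as azimuth and phi as the zenith angle from +Z."""
    th = math.radians(theta_deg)
    ph = math.radians(phi_deg)
    return (range_cm * math.sin(ph) * math.cos(th),
            range_cm * math.sin(ph) * math.sin(th),
            range_cm * math.cos(ph))

# A node heard 20 m away, at 45 deg azimuth, level with the array:
x, y, z = initial_guess(2000.0, 45.0, 90.0)
```

Starting the refinement from such a seed, rather than from an arbitrary point, avoids many of the local minima the least-squares iteration could otherwise fall into.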
In practice, angular information is critical to get the vertical topology right,
because it is often difficult to place nodes with sufficient geometric diversity to
resolve all ambiguities. To demonstrate this, we ran our James Reserve data with
angles disabled. If all angles were disabled, the NLLS convergence failed because
the system was under–constrained. If θ is used without φ, the system converged
to a folded configuration, shown in Figure 11.1.
The ambiguity that caused the fold might be eliminated with more ranges,
especially if the nodes are well–distributed vertically. However, distributing nodes
vertically is often very inconvenient, because it requires mounting nodes in trees
or tall towers. By including angular information, we increase the robustness
of the system to poorly constrained regions, enabling correct results with fewer
nodes and sparser, easier–to–deploy networks. Angular information also helps
the system scale by resolving ambiguities locally that otherwise might require
information from other parts of the network.
[Figure 11.1: 3–D plot (X, Y, Z in cm), “Comparison of solutions with and without using φ”: position estimates using φ, position estimates ignoring φ, and ground truth.]
Figure 11.1: 3–D plot showing the importance of using the φ angle information. Al-
though the system converged without φ, it converged to a folded configuration.
CHAPTER 12
Future Work
While this work has made much progress, there are many areas where improve-
ments can still be made. More testing and experience with the platform is needed;
the feedback from initial users of the platform should be valuable input when de-
termining which future improvements to make a priority.
12.1 Algorithm Improvements
A number of improvements can be made to the algorithms used in this work.
Improved Position Error Metric and Fit. As we alluded to in Section 4.6, our
position error metric might be improved to better apply the Procrustes method.
In our current implementation we took several shortcuts to enable outlier rejection
that might lead to skewed data. This will not be a simple application of the
techniques in [DM98], but it might yield publishable position estimation metrics
that other researchers in the area could use.
In addition, it would be very useful to devise a metric that could tell us
when the system might have converged to a wrong answer, without access to
ground truth. Our existing Average Range Residual metric tells us whether the
system converged well, but it does not take into account how well–constrained
that configuration is.
Better Confidence Metrics. Our ranging and DOA estimators output con-
fidence values to indicate the quality of the detection. Currently the ranging
metric is based on the SNR of the detected pulse, but the angular metric is a
somewhat ad–hoc formula. It would be useful to devise a more useful confidence
metric, perhaps related to the quality–of–fit of the angular estimator.
Improvements to the Optimization Method. While our methods appear
to work in practice, we have at times tuned the algorithms based on measure-
ments from our tests. In our NLLS and R–θ implementations we have used
weightings that we derived from tests under controlled conditions, yet we are
applying these distributions in other environments that may have different prop-
erties. Our outlier rejection scheme relies on the selection of a threshold for
rejection of constraints. We determined this threshold empirically using ground
truth from our Court of Sciences tests, but we wish to apply this in arbitrary
environments.
These concerns suggest that more work is needed to produce a more gener-
alized solution to the problem. One approach would start by making a minimal
set of assumptions about the data, and using these assumptions to build a model
that can be used to derive solutions to some of these questions. In this effort, we
want to either make as few assumptions as possible, or require that each deploy-
ment make some number of ground truth measurements so that the results may
be verified.
As a first step, we will recover more accurate measurements of the James
Reserve ground truth deployment, so that we can better analyze the data. In
addition, deploying the system in new kinds of environments will also yield more
data with which to test our techniques.
Investigate the Forward/Reverse Range Offset. The offset observed in
Section 10.3 is not well understood. If this is a per–node offset or per–array
offset, then we should calibrate the systems to eliminate it. If the offset is not
stable, we should determine its cause and attempt to eliminate it. This offset may
account for the fact that our positioning performance does not seem to match
our ranging performance as well as expected.
Investigate Scaling Issues. We have thus far not addressed any scaling issues.
Since we only have 10 nodes it is difficult to investigate these issues in practice.
However, simulation work might be useful in testing possible staged or distributed
multilateration schemes, determining in the process how much accuracy is lost
when we solve the system in segments.
Implement Leader Election and Centralized Multilateration. The cur-
rent application performs periodic, uncoordinated ranging, and publishes all
range and angle data so that the multilateration algorithm can pick it up and
process it into a map. However, to get the best results, we want to do all ranging
in as short a time as possible. Implementing this would require additional coordination, and our idea here is to elect a leader who will coordinate all of the nearby
nodes to chirp in close succession, and then collect all the data to perform the
multilateration. After the multilateration, it would broadcast the results, and the
nodes receiving the results would stitch the maps from different leaders together
into a single map.
This implementation would be more complex than the current one, but using
StateSync it should be very straightforward to implement the leader election and
results broadcast components. The schedule for ranging can also be published
using StateSync, using the gsyncd global time distribution component to provide
a common timebase within which to reference the ranging. The resulting system
should provide a significant performance improvement during daytime when the
environmental parameters change most rapidly.
12.2 Platform Improvements
The platform is a prototype, and there are many ways to physically improve it:
• The sampling boards are expensive and cumbersome. Replacing them
with a custom board would be a cost savings and might improve the sys-
tem as well (e.g. we could eliminate the self–ranging speaker and many
workarounds in the audio driver). The audio amplifier should also be re-
placed.
• The microphone array configuration is asymmetric and would probably be
improved by changing to a symmetric configuration. The mounts that hold
the microphones should be replaced, because they shadow other micro-
phones in the array from certain angles, and are not mechanically held as
tightly as they should be. The “self–ranging” speaker should be positioned
so it does not block channel 0.
• An accelerometer should be added to the array to detect movement as well
as to enable correction for roll and pitch. An EEPROM on the array would
also be useful, so that calibration information and an ID can be stored
there.
• An 802.15.4 or Mica2 radio should be added to provide a low power paging
channel, to improve synchronization, and to communicate with Motes.
Calibration of Microphone Arrays. In our testing, we discovered a need
to better calibrate the microphone arrays. Specifically, a technique that would
determine more exactly the position of the microphones would result in better fits
in the DOA stage of the system. We can then associate calibration information
with each array, and use that information to improve the DOA estimates.
It would also be useful to quantify the accuracy of our model defining phase
lag as a function of DOA. We currently assume that the lags are a simple function
of the incoming angle, but if one of the microphones is shaded that will likely
induce excess path length to that microphone. A more thorough angular test
might reveal new information about the relationship between incoming angle
and the phase offsets.
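The "simple function of the incoming angle" we assume is the far-field plane-wave model; writing it out makes explicit what a shaded microphone would violate (the coordinate conventions and function are our illustrative sketch, not the system's code):

```python
import math

C_M_PER_S = 343.0  # nominal speed of sound

def expected_tdoa(mic_i, mic_j, theta_deg, phi_deg):
    """Far-field plane-wave model: expected time difference of arrival
    between two microphones (positions in meters) for a source at
    azimuth theta and zenith angle phi. Positive means mic_i hears
    the wave before mic_j. Shading or excess path length at one
    microphone breaks this model."""
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    u = (math.sin(ph) * math.cos(th),   # unit vector toward the source
         math.sin(ph) * math.sin(th),
         math.cos(ph))
    proj_i = sum(m * c for m, c in zip(mic_i, u))
    proj_j = sum(m * c for m, c in zip(mic_j, u))
    return (proj_i - proj_j) / C_M_PER_S

# Two mics 20 cm apart on the x axis, source along +x (endfire):
tau = expected_tdoa((0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), 0.0, 90.0)
```

A thorough angular test would compare measured lags against this model over a sweep of θ and φ, revealing any angle-dependent excess path length.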
Re–factor and Improve Multihop StateSync Implementation. Our ini-
tial experiences in the field have shown some problems using the multihop version
of StateSync. The performance is not as good as we had hoped, and we believe the cause
is that StateSync tries to do too much. To address this, we propose refactoring
Multihop StateSync and removing the currently existing “clustering algorithm”.
We believe that this will result in the creation of several components that are
individually more useful than the current monolithic StateSync implementation.
We can then re–assemble the StateSync functionality from this decomposed set
of modules.
Software Improvements and Optimization. In our work thus far we have
done very little optimization. Now that we are working with more complex sys-
tems, it may be time to start looking more carefully at resource usage and finding ways to prune it. We think that this process may also result in Emstar being generally a simpler system, with fewer, more manageable components.
Two areas of optimization that we see currently are memory usage and mes-
sage passing overhead for high–rate sensor devices. We propose to address mem-
ory by examining the system to find out where the memory is used and for what
purpose. We propose to implement a shared–memory version of the sensor device
that uses FUSD message passing for notification and control traffic only, and
uses shared–memory for direct data access.
Development of New Applications. Several new applications will serve to
refine and improve the platform. A mote localization system that localizes Mote–based emitters using DOA from multiple points would be a useful application of
this platform. As a mote localization scheme, it has the advantage that there
is no need for precise time synchronization to the motes, which substantially re-
duces the level of integration required. Other applications, such as woodpecker
detection, would also test the system in new ways. One interesting class of appli-
cations is a “call and response” application, in which the node emits a particular
animal call at a particular time, and listens for animals to respond to that call.
New Tests. Our range test in lot 9 was affected by a synchronization problem,
and possibly by weather. We think that redoing the test in an indoor environ-
ment, such as an underground parking garage, would result in more consistent
data.
We would also like to re–do the Court of Sciences test with more accurate
ground truth measurements, in addition to re–measuring the James Reserve de-
ployment.
References
[ADB04] A. Arora, P. Dutta, S. Bapat, V. Kulathumani, H. Zhang, V. Naik, V. Mittal, H. Cao, M. Demirbas, M. Gouda, Y. Choi, T. Herman, S. Kulkarni, U. Arumugam, and M. Nesterenko. “A Line in the Sand: A Wireless Sensor Network for Target Detection, Classification and Tracking.” Computer Networks, 46(5):605–634, December 2004.
[APB02] S. Azou, C. Pistre, and G. Burel. “A chaotic direct sequence spread-spectrum system for underwater communication.” In Proceedings of the IEEE Oceans Conf., Biloxi, Mississippi, October 2002.
[ARE05] A. Arora, R. Ramnath, E. Ertin, P. Sinha, S. Bapat, V. Naik, V. Kulathumani, H. Zhang, H. Cao, M. Sridharan, S. Kumar, N. Seddon, C. Anderson, T. Herman, N. Trivedi, C. Zhang, M. Nesterenko, R. Shah, S. Kulkarni, M. Aramugam, L. Wang, M. Gouda, Y. Choi, D. Culler, P. Dutta, C. Sharp, G. Tolle, M. Grimmer, B. Ferriera, and K. Parker. “ExScal: Elements of an Extreme Scale Wireless Sensor Network.” In The 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2005), 2005.
[Baa05] Tom Van Baak. “Experiments With a PC Sound Card.” The LeapSecond Web Site, 2005.
[BC91] Kenneth Birman and Robert Cooper. “The ISIS Project: Real Experience with a Fault Tolerant Programming System.” Operating Systems Review, pp. 103–107, April 1991.
[BFH03] Robert Braden, Ted Faber, and Mark Handley. “From protocol stack to protocol heap: role-based architecture.” SIGCOMM Comput. Commun. Rev., 33(1):17–22, 2003.
[BHE00] Nirupama Bulusu, John Heidemann, and Deborah Estrin. “GPS-less low cost outdoor localization for very small devices.” IEEE Personal Communications, 5(5):28–34, 2000.
[BP00] P. Bahl and V.N. Padmanabhan. “RADAR: An in-building RF-based user location and tracking system.” In Proceedings of IEEE Infocom. IEEE, 2000.
[Bro86] Rodney Brooks. “A Robust Layered Control System for a Mobile Robot.” IEEE Journal of Robotics and Automation, 2(1), 1986.
[Byt05] Vladimir Bytchkovskiy. “Discussion of EmStar Layering.” Personal Communication, 2005.
[CAB03] Douglas S. J. De Couto, Daniel Aguayo, John Bicket, and Robert Morris. “A High-Throughput Path Metric for Multi-Hop Wireless Routing.” In Mobicom 2003. ACM, 2003.
[CDG03] K. Chintalapudi, A. Dhariwal, R. Govindan, and G. Sukhatme. “On the feasibility of ad-hoc localization systems.” Technical Report 03-796, Computer Science Department, University of Southern California, 2003.
[CGS04] Krishna Chintalapudi, Ramesh Govindan, Gaurav Sukhatme, and Amit Dhariwal. “Ad-Hoc Localization Using Ranging and Sectoring.” In INFOCOM ’04: Proceedings of the IEEE Infocom 2004, 2004.
[CJB01] B. Chen, K. Jamieson, H. Balakrishnan, and R. Morris. “Span: An energy-efficient coordination algorithm for topology maintenance in ad hoc wireless networks.” In Proceedings of the International Conference on Mobile Computing and Networking (MobiCom 2001), pp. 85–96, 2001.
[CKS03] N.S. Correal, S. Kyperountas, Q. Shi, and M. Welborn. “An Ultra Wideband Relative Location System.” In Proceedings of the IEEE Conference on Ultra Wideband Systems and Technologies, pp. 394–397, November 2003.
[Cla88] David D. Clark. “The Design Philosophy of the DARPA Internet Protocols.” Computer Communications Review, 18(4):106–114, 1988.
[CWP05] Alberto Cerpa, Jennifer L. Wong, Miodrag Potkonjak, and Deborah Estrin. “Statistical Model of Lossy Links in Wireless Sensor Networks.” In IPSN ’05: Proceedings of the Fourth ACM/IEEE International Conference on Information Processing in Sensor Networks, 2005.
[DM98] Ian L. Dryden and Kanti V. Mardia. Statistical Shape Analysis. John Wiley and Sons, 1998.
[DZL05] Ramani Duraiswami, Dmitry N. Zotkin, Zhiyun Li, Elena Grassi, Nail A. Gumerov, and Larry S. Davis. “High Order Spatial Audio Capture and its Binaural Head-Tracked Playback over Headphones with HRTF Cues.” Journal of the Audio Engineering Society, 2005.
[EBB03a] J. Elson, S. Bien, N. Busek, V. Bychkovskiy, A. Cerpa, D. Ganesan, L. Girod, B. Greenstein, T. Schoellhammer, T. Stathopoulos, and D. Estrin. “EmStar: An Environment for Developing Wireless Embedded Systems Software.” Technical Report CENS-0009, Center for Embedded Networked Sensing, University of California, Los Angeles, March 2003.
[EBB03b] Jeremy Elson, Solomon Bien, Naim Busek, Vladimir Bychkovskiy, Alberto Cerpa, Deepak Ganesan, Lewis Girod, Ben Greenstein, Tom Schoellhammer, Thanos Stathopoulos, and Deborah Estrin. “EmStar: An Environment for Developing Wireless Embedded Systems Software.” Technical Report CENS-0009, Center for Embedded Networked Sensing, 2003.
[EGE02a] Jeremy Elson, Lewis Girod, and Deborah Estrin. “Fine-Grained Network Time Synchronization using Reference Broadcasts.” In OSDI, pp. 147–163, Boston, MA, December 2002.
[EGE02b] Jeremy Elson, Lewis Girod, and Deborah Estrin. “A Wireless Time-Synchronized COTS Sensor Platform, Part I: System Architecture.” In IEEE CAS Workshop on Wireless Communications and Networking, 2002.
[EGE04] Jeremy Elson, Lewis Girod, and Deborah Estrin. “EmStar: Development with High System Visibility.” IEEE Wireless Communications Magazine, 2004.
[Els02] Jeremy Elson. FUSD: Framework for User Space Devices, 2002.
[Els03] Jeremy Elson. Time Synchronization in Wireless Sensor Networks. PhD thesis, University of California at Los Angeles, 2003.
[ER02] Jeremy Elson and Kay Romer. “Wireless Sensor Networks: A New Regime for Time Synchronization.” In First Workshop on Hot Topics in Networks (HotNets-I), 2002.
[FJL97] Sally Floyd, Van Jacobson, Ching-Gung Liu, Steven McCanne, and Lixia Zhang. “A reliable multicast framework for light-weight sessions and application level framing.” IEEE/ACM Trans. on Networking (TON), 5(6):784–803, 1997.
[GB82] David Gelernter and Arthur J. Bernstein. “Distributed communication via global buffer.” In PODC ’82: Proceedings of the first ACM SIGACT-SIGOPS symposium on Principles of distributed computing, pp. 10–18, New York, NY, USA, 1982. ACM Press.
[GBE02] Lewis Girod, Vladimir Bychkovskiy, Jeremy Elson, and Deborah Estrin. “Locating tiny sensors in time and space: A case study.” In Proceedings of ICCD 2002 (invited paper), Freiburg, Germany, September 2002. http://lecs.cs.ucla.edu/Publications.
[GE01] L. Girod and D. Estrin. “Robust Range Estimation Using Acoustic and Multimodal Sensing.” In International Conference on Intelligent Robots and Systems, October 2001.
[GEC04] Lewis Girod, Jeremy Elson, Alberto Cerpa, Thanos Stathopoulos, Nithya Ramanathan, and Deborah Estrin. “EmStar: a Software Environment for Developing and Deploying Wireless Sensor Networks.” In Proceedings of the 2004 USENIX Technical Conference, Boston, MA, 2004. USENIX Association.
[GGS05] S. Ganeriwal, D. Ganesan, H. Shim, V. Tsiatsis, and M. Srivastava. “Estimating Clock Uncertainty for Efficient Duty Cycling in Sensor Networks.” In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys05). ACM, November 2005.
[GKE04] Ben Greenstein, Eddie Kohler, and Deborah Estrin. “A Sensor Network Application Construction Kit (SNACK).” In Proceedings of the second international conference on Embedded networked sensor systems. ACM Press, 2004.
[GKS03] S. Ganeriwal, R. Kumar, and M. Srivastava. “Timing Sync Protocol for Sensor Networks.” In Sensys, Los Angeles, 2003.
[GLP05] Lewis Girod, Martin Lukac, Andrew Parker, Thanos Stathopoulos, Jeffrey Tseng, Hanbiao Wang, Deborah Estrin, Richard Guy, and Eddie Kohler. “A Reliable Multicast Mechanism for Sensor Network Applications.” Technical report, CENS, April 25, 2005.
[GR03] J. van Greunen and Jan Rabaey. “Lightweight Time Synchronization for Sensor Networks.” In Proceedings of the ACM Workshop on Wireless Sensor Networks and Applications (WSNA 2003), September 2003.
[GSR04] L. Girod, T. Stathopoulos, N. Ramanathan, J. Elson, D. Estrin, E. Osterweil, and T. Schoellhammer. “Tools for Deployment and Simulation of Heterogeneous Sensor Networks.” In Proceedings of SenSys 2004, November 2004.
[Ham00] M. Hamilton. “Hummercams, Robots, and the Virtual Reserve.” James San Jacinto Mountains Reserve web site, February 2000.
[Har05] Michael Hardy. “Studentized Residuals.” In Wikipedia: An Online Encyclopedia. Wikipedia Web Site, 2005.
[HC02] J. Hill and D. Culler. “Mica: A Wireless Platform for Deeply Embedded Networks.” IEEE Micro, 22(6):12–24, Nov/Dec 2002.
[HSE03] John Heidemann, Fabio Silva, and Deborah Estrin. “Matching Data Dissemination Algorithms to Application Requirements.” In Proceedings of the first international conference on Embedded networked sensor systems. ACM Press, 2003.
[HSS05] T. He, R. Stoleru, and J.A. Stankovic. “Spotlight: Low-Cost Asymmetric Localization System for Networked Sensor Nodes.” In The 4th International Conference on Information Processing in Sensor Networks (IPSN 2005), Demo Paper. USENIX, 2005.
[HSW00] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister. “System architecture directions for networked sensors.” In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pp. 93–104, Cambridge, MA, USA, November 2000. ACM.
[HVB01] J. Hightower, C. Vakili, G. Borriello, and R. Want. “Design and Calibration of the SpotON Ad-Hoc Location Sensing System.” Available from Jeff Hightower’s web site, 2001.
[HW02] Michael Hazas and Andy Ward. “A novel broadband ultrasonic location system.” In The 4th International Conference on Ubiquitous Computing (UbiComp 2002), 2002.
[IGE00] Chalermek Intanagonwiwat, Ramesh Govindan, and Deborah Estrin. “Directed diffusion: a scalable and robust communication paradigm for sensor networks.” In MobiCom ’00: Proceedings of the 6th annual international conference on Mobile computing and networking, pp. 56–67, New York, NY, USA, 2000. ACM Press.
[JM96] D. Johnson and D. Maltz. “Dynamic Source Routing in ad hoc wireless networks.” Mobile Computing, pp. 153–181, 1996.
[JZ04] Xiang Ji and Hongyuan Zha. “Sensor Positioning in Wireless Ad-Hoc Networks with Multidimensional Scaling.” In IEEE INFOCOM, 2004.
[KC76] C.H. Knapp and G.C. Carter. “The generalized correlation method for estimation of time delay.” IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-24(4):320–327, August 1976.
[KEE03] Richard Karp, Jeremy Elson, Deborah Estrin, and Scott Schenker. “Optimal and Global Time Synchronization in Sensornets.” Technical Report CENS-0012, Center for Embedded Networked Sensing, 2003.
[KMS05] YoungMin Kwon, Kirill Mechitov, Sameer Sundresh, Wooyoung Kim, and Gul Agha. “Resilient Localization for Sensor Networks in Outdoor Environments.” In IEEE International Conference on Distributed Computing Systems (ICDCS05), 2005.
[Koe03] Kay Romer. “The Lighthouse Location System for Smart Dust.” In The First International Conference on Mobile Systems, Applications, and Services (MobiSys03). USENIX, 2003.
[KSP03] F. Koushanfar, S. Slijepcevic, and M. Potkonjak. “Location Discovery in Ad-hoc Wireless Sensor Networks.” In X. Cheng, X. Huang, and D.Z. Du, editors, Ad-hoc Wireless Networking. Kluwer Academic Publishers, 2003.
[LLW03] P. Levis, N. Lee, M. Welsh, and D. Culler. “TOSSIM: Accurate and Scalable Simulations of Entire TinyOS Applications.” In Sensys, Los Angeles, 2003.
[LPC04] Philip Levis, Neil Patel, David E. Culler, and Scott Shenker. “Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks.” In Proceedings of 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), March 29–31, 2004, San Francisco, California, 2004.
[LR03] K. Langendoen and N. Reijers. “Distributed Localization in Wireless Sensor Networks: a Quantitative Comparison.” Computer Networks, special issue on Wireless Sensor Networks, 2003.
[LS02] J.Y. Lee and R.A. Scholtz. “Ranging in a dense multipath environment using an UWB radio link.” IEEE Journal on Selected Areas in Communications, 20(9):1677–1683, December 2002.
[MGE02] William Merrill, Lewis Girod, Jeremy Elson, Katayoun Sohrabi, Fredric Newberg, and William Kaiser. “Autonomous Position Location in Distributed, Embedded, Wireless Systems.” In the IEEE CAS Workshop on Wireless Communications and Networking, Pasadena, CA, 2002.
[MGS04] W. Merrill, L. Girod, B. Schiffer, D. McIntire, G. Rava, K. Sohrabi, F. Newberg, J. Elson, and W. Kaiser. “Dynamic Networking and Smart Sensing Enable Next-Generation Landmines.” IEEE Pervasive Computing Magazine, Oct–Dec 2004.
[Mil94] David L. Mills. “Internet Time Synchronization: The Network Time Protocol.” In Zhonghua Yang and T. Anthony Marsland, editors, Global States and Time in Distributed Systems. IEEE Computer Society Press, 1994.
[MKS04] M. Maroti, B. Kusy, G. Simon, and A. Ledeczi. “The Flooding Time Synchronization Protocol.” In Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys04). ACM, November 2004.
[MLR04] David Moore, John Leonard, Daniela Rus, and Seth Teller. “Robust distributed network localization with noisy range measurements.” In SenSys ’04: Proceedings of the 2nd international conference on Embedded networked sensor systems, pp. 50–61, New York, NY, USA, 2004. ACM Press.
[MVD05] Miklos Maroti, Peter Volgyesi, Sebestyen Dora, Branislav Kusy, Andras Nadas, Akos Ledeczi, Gyorgy Balogh, and Karoly Molnar. “Radio Interferometric Geolocation.” In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys05). ACM, November 2005.
[NN03] D. Niculescu and B. Nath. “DV-based positioning in ad hoc networks.” Telecommunications Systems, pp. 267–280, 2003.
[PAK05] Neal Patwari, Joshua Ash, Spyros Kyperountas, Alfred Hero III, Randolph Moses, and Neiyer Correal. “Locating the Nodes: Cooperative Localization in Sensor Networks.” IEEE Signal Processing Magazine, pp. 54–69, July 2005.
[PB94] C. Perkins and P. Bhagwat. “Highly Dynamic Destination-Sequenced Distance Vector Routing (DSDV) for Mobile Computers.” In Proceedings of the ACM SIGCOMM, pp. 234–244. ACM, August 1994.
[PCB00] N. Priyantha, A. Chakraborty, and H. Balakrishnan. “The Cricket Location Support System.” In Mobicom 2000. ACM, August 2000.
[PMB01] Nissanka B. Priyantha, Allen K. L. Miu, Hari Balakrishnan, and Seth Teller. “The Cricket Compass for Context-Aware Mobile Applications.” In The 7th ACM Conference on Mobile Computing and Networking (MOBICOM), 2001.
[PPT90] R. Pike, D. Presotto, K. Thompson, and H. Trickey. “Plan 9 from Bell Labs.” In Proceedings of the Summer 1990 UKUUG Conference, pp. 1–9, July 1990.
[PR99] C. Perkins and E. Royer. “Ad hoc On-demand Distance-vector (AODV) routing.” In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, 1999.
[PTV92] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition. Cambridge University Press, 1992.
[Rap96] T.S. Rappaport. Wireless Communications: Principles and Practice. Prentice-Hall, 1996.
[RBD97] Mendel Rosenblum, Edouard Bugnion, Scott Devine, and Steve Herrod. “Using the SimOS Machine Simulator to Study Complex Computer Systems.” ACM TOMACS Special Issue on Computer Simulation, 1997.
[RC01] Alessandro Rubini and Jonathan Corbet. Writing Linux Device Drivers, 2nd edition. O’Reilly, June 2001.
[RD04] Vikas C. Raykar and Ramani Duraiswami. “Automatic Position Calibration of Multiple Microphones.” In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP04), 2004.
[RDY05] Vikas C. Raykar, Ramani Duraiswami, and B. Yagnanarayana. “Extracting the frequencies of the pinna spectral notches from measured head-related impulse responses.” Journal of the Acoustical Society of America, 118(1):364–374, July 2005.
[RF03] Yong Rui and Dinei Florencio. “New Direct Approaches to Robust Sound Source Localization.” In IEEE International Conference on Multimedia and Expo (ICME), 2003.
[RN97] Brent Rector and Joseph Newcomer. Win32 Programming. Addison Wesley, January 1997.
[SBG04] Adam Smith, Hari Balakrishnan, Michel Goraczko, and Nissanka Priyantha. “Tracking Moving Devices with the Cricket Location System.” In ACM MobiSYS 2004, 2004.
[SBM04] Janos Sallai, Gyorgy Balogh, Miklos Maroti, Akos Ledeczi, and Branislav Kusy. “Acoustic Ranging in Resource-Constrained Sensor Networks.” Technical Report ISIS-04-504, Institute for Software Integrated Systems, 2004.
[SGS04] Andreas Savvides, Lewis Girod, Mani Srivastava, and Deborah Estrin. “Localization in Sensor Networks.” In C. S. Raghavendra, K. M. Sivalingam, and T. Znati, editors, Wireless Sensor Networks. Kluwer, 2004.
[SHS01] Andreas Savvides, Chih-Chieh Han, and Mani B. Srivastava. “Dynamic fine-grained localization in Ad-Hoc networks of sensors.” In MobiCom ’01: Proceedings of the 7th annual international conference on Mobile computing and networking, pp. 166–179, New York, NY, USA, 2001. ACM Press.
[SHS05] Radu Stoleru, Tian He, John A. Stankovic, and David Luebke. “A High Accuracy, Low Cost Localization System for Wireless Sensor Networks.” In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys05). ACM, November 2005.
[SKB00] A. Savvides, F. Koushanfar, A. Boulis, V. Karavas, M. Potkonjak, and M.B. Srivastava. “Location Discovery in Ad-hoc Wireless Networks.” Memorandum, Networked and Embedded Systems Laboratory, UCLA, June 2000.
[SMP02] Sasha Slijepcevic, Seapahn Megerian, and Miodrag Potkonjak. “Location Errors in Wireless Embedded Sensor Networks: Sources, Models, and Effects on Applications.” Mobile Computing and Communications Review, 6(3):67–78, June 2002.
[SMP03] Sasha Slijepcevic, Seapahn Megerian, and Miodrag Potkonjak. “Characterization of Location Error in Wireless Sensor Networks: Analysis and Applications.” In IPSN ’03: Proceedings of the Second ACM/IEEE International Conference on Information Processing in Sensor Networks, pp. 593–608, 2003.
[SP80] D.V. Sarwate and M.B. Pursley. “Crosscorrelation Properties of Pseudorandom and Related Sequences.” Proceedings of the IEEE, 68:593–619, 1980.
[SPS03] A. Savvides, H. Park, and M. B. Srivastava. “The N-Hop Multilateration Primitive for Node Localization Problems.” MONET Special Issue on Sensor Networks and Applications, 2003.
[SRZ03] Y. Shang, W. Ruml, Y. Zhang, and M. Fromherz. “Localization from mere connectivity.” In Proceedings of MobiHoc03, pp. 201–212, June 2003.
[SS04] Radu Stoleru and John Stankovic. “Probability Grid: A Location Estimation Scheme for Wireless Sensor Networks.” In Proceedings of the 1st IEEE Conference on Sensor and Ad Hoc Communications and Networks (SECON 2004), 2004.
[Tor52] W.S. Torgerson. “Multidimensional Scaling: I. Theory and Method.” Psychometrika, 17:401–419, 1952.
[War04] Matthias Warkus. The Official GNOME 2 Developer’s Guide. No Starch Press, April 2004.
[WC03] Kamin Whitehouse and David Culler. “Macro-Calibration in Sensor/Actuator Networks.” Mobile Networks and Applications, pp. 463–472, 2003.
[WCA05] Hanbiao Wang, Chiao-En Chen, Andreas Ali, Shadnaz Asgari, Ralph E. Hudson, Kung Yao, Deborah Estrin, and Charles Taylor. “Acoustic Sensor Networks for Woodpecker Localization.” In SPIE Conference on Advanced Signal Processing Algorithms, Architectures and Implementations, August 2005.
[WJH97] A. Ward, A. Jones, and A. Hopper. “A new location technique for the active office.” IEEE Personal Communications, 4(5), October 1997.
[WSB04] Kamin Whitehouse, Cory Sharp, Eric Brewer, and David Culler. “Hood: a neighborhood abstraction for sensor networks.” In MobiSYS ’04: Proceedings of the 2nd international conference on Mobile systems, applications, and services, pp. 99–110, New York, NY, USA, 2004. ACM Press.
[WTC03a] Alec Woo, Terence Tong, and David Culler. “Taming the Underlying Challenges of Reliable Multihop Routing in Sensor Networks.” In Sensys 2003, 2003.
[WTC03b] Alec Woo, Terence Tong, and David Culler. “Taming the underlying challenges of reliable multihop routing in sensor networks.” In Proceedings of the first international conference on Embedded networked sensor systems, pp. 14–27. ACM Press, 2003.
[WYP04] Hanbiao Wang, Kung Yao, Greg Pottie, and Deborah Estrin. “Entropy-based Sensor Selection Heuristic for Localization.” In Symposium on Information Processing in Sensor Networks (IPSN04), April 2004.
[YHE02] W. Ye, J. Heidemann, and D. Estrin. “An energy-efficient MAC protocol for wireless sensor networks.” In Proceedings of IEEE INFOCOM, 2002.
[ZG03] Jerry Zhao and Ramesh Govindan. “Understanding Packet Delivery Performance In Dense Wireless Sensor Networks.” In Proceedings of the first international conference on Embedded networked sensor systems. ACM Press, 2003.