Active Learning of Gaussian Processes for Spatial Functions in Mobile Sensor Networks

Dongbing Gu ∗ Huosheng Hu ∗

∗ University of Essex, Wivenhoe Park, CO4 3SQ UK (e-mail:[email protected], [email protected]).

Abstract: This paper proposes a spatial function modeling approach using mobile sensor networks, which can potentially be used for environmental surveillance applications. The mobile sensor nodes are able to sample point observations of a 2D spatial function. On the one hand, they use the observations to generate a predictive model of the spatial function. On the other hand, they make collective motion decisions to move into the regions where the predictive model has high uncertainty. In the end, an accurate predictive model is obtained in the sensor network and all the mobile sensor nodes are distributed in the environment in an optimized pattern. Gaussian process regression is selected as the modeling technique in the proposed approach. The hyperparameters of the Gaussian process model are learned online to improve the accuracy of the predictive model. The collective motion control of the mobile sensor nodes is based on a locational optimization algorithm, which uses the information entropy of the predicted Gaussian process to explore the environment and reduce the uncertainty of the predictive model. Simulation results are provided to show the performance of the proposed approach.

Keywords: Coverage control, Gaussian process regression, Active sensing, Mobile sensor networks, Spatial function modeling.

1. INTRODUCTION

Environmental surveillance in meteorology and climatology, ecology, demography, epidemiology, forestry, fishery, oceanography, and other fields requires the capability of modeling spatial functions, or even spatial-temporal functions. The spatially distributed nature of mobile sensor networks, together with their ability to move over time, makes them well suited to environmental surveillance applications. Several research projects have targeted this area, such as monitoring forest fires using UAVs in Merino et al. (2006), monitoring air quality using UAVs in Corrigan et al. (2007), and monitoring ocean ecology conditions using UWVs in Leonard et al. (2007).

Mobile sensor networks are able to make sensing observations of an environmental spatial function with their on-board sensors, exchange information through on-board wireless communication, and explore the environment using their mobility. Consequently, they are able to produce a predictive model based on sensing observations and allocate themselves in a pattern that yields a more accurate predictive model. Gaussian process (GP) regression, also known as the Kriging filter, is a well-known regression technique for data assimilation. GPs are specified by a mean function, a covariance function, and a set of hyperparameters which can be determined from a training set. The learning algorithm for the hyperparameters is based on maximizing the marginal likelihood. The main advantage of GP regression over other regression techniques is the ability to predict not only the mean function but also the covariance function; see Williams and Rasmussen (1996), MacKay (1998), Rasmussen and Williams (2006).

The predictive uncertainty is valuable information for further decision making in environmental surveillance applications. Recent publications using GPs to model a spatial function include Krause et al. (2008), Stranders et al. (2008), Stachniss et al. (2009), Ny and Pappas (2009), Cortes (2009), and Singh et al. (2010). In Krause et al. (2008), GP regression was applied to monitoring the ecological condition of a river. The sensor placement was determined by maximizing a mutual information gain, which selects the locations that most effectively reduce the uncertainty at the unobserved locations. GP regression is a non-parametric regression technique and its computational complexity grows with the size of the sampled data. In Stranders et al. (2008), the computational complexity of GP regression was reduced by a Bayesian Monte Carlo approach, and an information entropy was used to allocate mobile sensor nodes. In Stachniss et al. (2009), a mixture of GPs was applied to building a gas distribution model with the aim of reducing the computational complexity. In Ny and Pappas (2009), a Kalman filter was built on top of a GP model to characterize spatial-temporal functions. A path planning problem was solved by optimizing a mutual information gain via an effective computation algorithm. In Cortes (2009), a Kriged Kalman filter was developed to build spatial-temporal functions. A centroidal Voronoi tessellation (CVT) algorithm was employed to allocate mobile sensor nodes according to the predictive spatial function. The Kriged Kalman filter and swarm control were developed in Choi et al. (2008) to build spatial-temporal functions. In Singh et al. (2010), several non-separable spatial-temporal covariance functions were proposed for modeling spatial-temporal functions. The same mutual information gain as in Krause et al. (2008) was used to plan the paths of mobile sensor nodes.

Preprints of the 18th IFAC World Congress, Milano (Italy), August 28 - September 2, 2011. Copyright by the International Federation of Automatic Control (IFAC).

Environmental spatial functions have been modeled in mobile sensor networks by using RBF networks in Schwager et al. (2009), where a CVT coverage control was used to allocate mobile sensor nodes, and in Lynch et al. (2008), where a flocking control was used to move sensor nodes. An RBF network is a truncated GP regression in which a limited number of basis functions is used. An environmental spatial function was approximated by using an inverse distance weighting interpolation method and updated by using a Kalman filter in Martinez (2010). A Kalman filter approach to dynamic coverage control was also proposed in Hussein (2007).

In this paper, we propose to use GP regression to build a spatial function with a mobile sensor network. Our contribution is to make the hyperparameters of the GP model adaptive online so that a more accurate time-varying predictive model can be obtained. An information entropy of the predictive Gaussian process is optimized to allocate mobile sensor nodes so that the environment can be explored and the model uncertainty can be reduced. A CVT algorithm that uses the information entropy as a utility function is proposed in this paper. This algorithm allows mobile sensor nodes to make collective motion decisions and allocate themselves in a pattern which can reduce the uncertainty of the predictive GP model.

In the following, Section 2 presents the basics of Gaussian process regression and the hyperparameter learning algorithm. The information entropy based coverage control and its integration with the CVT algorithm are introduced in Section 3. Section 4 provides simulation results. Our conclusion and future work are given in Section 5.

2. GAUSSIAN PROCESS REGRESSION

A mobile wireless sensor network with $N$ sensors is to be deployed in a 2D area $Q$ to model a scalar environmental spatial function in that area. Sensor node $i$ is located at a 2D position $x_{i,t}$, and it is assumed that the node can find its position $x_{i,t}$ by itself with self-localization techniques at time step $t$. Each sensor node $i$ can make a point observation $y_{i,t}$ of an environmental spatial function $f(x_{i,t})$ at time step $t$. The sensory observation distribution is assumed to be Gaussian:

$$y_{i,t} = f(x_{i,t}) + \varepsilon_{i,t}$$

where $\varepsilon_{i,t}$ is Gaussian noise with zero mean and variance $\sigma_t^2$, denoted as $\varepsilon_{i,t} \sim \mathcal{N}(0, \sigma_t^2)$.

It is assumed that each sensor node can collect the location information $x_{j,t}$ and the corresponding observation $y_{j,t}$ from all the other sensor nodes via wireless communication.
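As a minimal sketch of this observation model, the snippet below draws noisy point observations at random node positions. The field `bump`, the helper name `sample_observations`, and the random seed are assumptions made purely for illustration; $N = 30$, the $0.2 \times 0.2$ start region, and $\sigma = 0.05$ are borrowed from the simulation settings in Section 4.

```python
import numpy as np

def sample_observations(f, positions, sigma, rng):
    """Return y_i = f(x_i) + eps_i with eps_i ~ N(0, sigma^2)."""
    noise = rng.normal(0.0, sigma, size=len(positions))
    return f(positions) + noise

def bump(x):
    # Hypothetical ground-truth field: a Gaussian-like bump centred in Q = [0, 1]^2.
    return np.exp(-np.sum((x - 0.5) ** 2, axis=1) / 0.05)

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 0.2, size=(30, 2))  # N = 30 nodes in a 0.2 x 0.2 corner
y = sample_observations(bump, positions, sigma=0.05, rng=rng)
```

Every node would hold the same pair `(positions, y)` after the all-to-all exchange assumed above.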

2.1 Gaussian Process

In a sensor node, Gaussian inference is conducted at each time step based on the information available at that moment. The given information includes a data set $D_t$ of input vectors $X_t = [x_{1,t}, \ldots, x_{N,t}]^T$ and the corresponding observations $y_t = [y_{1,t}, \ldots, y_{N,t}]^T$.

In a GP model, the prior distribution of the latent variable $f_{i,t} = f(x_{i,t})$ is modeled as Gaussian. Its mean value is assumed to be zero because offsets and simple trends can be subtracted out first. The prior knowledge about multiple latent variables is modeled by a covariance matrix $K_{NN,t} = [k(x_{i,t}, x_{j,t})]$ built from a covariance function $k$. With a positive definite covariance matrix $K_{NN,t}$, the GP prior distribution of the latent vector $f_t = [f_{1,t}, \ldots, f_{N,t}]^T$ is represented as:

$$p(f_t) = \mathcal{N}(0, K_{NN,t})$$

The likelihood distribution of the observation vector $y_t$ is represented as:

$$p(y_t \mid f_t) = \mathcal{N}(f_t, \sigma_t^2 I)$$

GP regression can infer $f_{*,t} = f(x_{*,t})$ for a test point $x_{*,t} \in Q$ using $p(f_{*,t} \mid y_t)$, given a training data set $(X_t, y_t)$ and a single test point $x_{*,t}$. The latent predictive distribution of the given test point is obtained by solving the MAP problem and is given below:

$$p(f_{*,t} \mid y_t) = \mathcal{N}(\mu_{*,t}, \Sigma_{*,t})$$

in which the predictive mean function and the predictive covariance function are:

$$\begin{aligned}
\mu_{*,t} &= K_{*N,t}\,(K_{NN,t} + \sigma_t^2 I)^{-1} y_t \\
\Sigma_{*,t} &= K_{**,t} - K_{*N,t}\,(K_{NN,t} + \sigma_t^2 I)^{-1} K_{N*,t}
\end{aligned} \tag{1}$$

where $K_{N*,t} = K_{*N,t}^T$ for symmetrical covariance functions, and

$$\begin{aligned}
K_{**,t} &= k(x_{*,t}, x_{*,t}) \\
K_{*N,t} &= [k(x_{*,t}, x_{1,t}), \ldots, k(x_{*,t}, x_{N,t})]
\end{aligned}$$
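The predictive equations in (1) can be sketched in a few lines of NumPy. This is an illustrative implementation rather than the authors' code: it assumes the squared exponential covariance of Eq. (2), uses a Cholesky factorization in place of an explicit inverse, and the function names, toy data, and hyperparameter values are hypothetical.

```python
import numpy as np

def se_kernel(A, B, a, l):
    """Squared exponential covariance a^2 * exp(-r^2 / l^2) between rows of A and B."""
    r2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return a ** 2 * np.exp(-r2 / l ** 2)

def gp_predict(X, y, X_star, a, l, sigma):
    """Predictive mean and variance of Eq. (1) at each row of X_star."""
    C = se_kernel(X, X, a, l) + sigma ** 2 * np.eye(len(X))    # K_NN + sigma^2 I
    K_sN = se_kernel(X_star, X, a, l)                          # K_*N, one row per test point
    chol = np.linalg.cholesky(C)                               # C = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # C^{-1} y
    v = np.linalg.solve(chol, K_sN.T)                          # chol^{-1} K_N*
    mean = K_sN @ alpha
    var = a ** 2 - np.sum(v ** 2, axis=0)                      # K_** - K_*N C^{-1} K_N*
    return mean, var

# Toy data (assumed): three nodes, one test point on a node and one far from all nodes.
X = np.array([[0.1, 0.1], [0.2, 0.4], [0.7, 0.3]])
y = np.array([0.5, 0.2, 0.9])
X_star = np.array([[0.1, 0.1], [0.9, 0.9]])
mean, var = gp_predict(X, y, X_star, a=1.0, l=0.3, sigma=0.05)
```

In this toy setting the predictive variance is small at the test point that coincides with a node and close to the prior variance $a^2$ at the distant one, which is the behavior the entropy-based exploration relies on.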

2.2 Hyperparameter Learning

The prior knowledge about $K_{NN,t}$ is very important for GP regression. It determines the properties of sample functions drawn from the GP prior and represents the prior knowledge about environmental spatial functions. For the mobile sensor network discussed in this research, the prior knowledge we have is that $f(x_{i,t})$ is closely related to $f(x_{j,t})$, i.e. $k(x_{i,t}, x_{j,t})$ approaches its maximum value, if the distance between two nodes $x_{i,t}$ and $x_{j,t}$ is short. In contrast, $f(x_{i,t})$ is nearly unrelated to $f(x_{j,t})$, i.e. $k(x_{i,t}, x_{j,t})$ approaches zero, if the two nodes are far apart. A valid covariance function guarantees that the covariance matrix $K_{NN,t}$ is symmetric positive definite. A commonly used covariance function is the 'squared exponential':

$$k(r_t) = a_t^2 \exp\left(-\frac{r_t^2}{l_t^2}\right) \tag{2}$$

where $r_t$ is the Euclidean distance $\|x_{i,t} - x_{j,t}\|$ between two nodes, $a_t$ is the amplitude, and $l_t$ is the lengthscale. Both characterize the covariance function and are the hyperparameters of the squared exponential covariance function. Notice that a smaller lengthscale implies that the sample function varies more rapidly, and a larger lengthscale implies that it varies more slowly. The hyperparameter set is denoted as $\theta_t = [a_t, l_t, \sigma_t]^T$. The time-varying property of the hyperparameters is maintained via the maximum likelihood learning algorithm discussed below. Although temporal dynamics are not modeled in the GP regression discussed above, the time-varying hyperparameters can potentially compensate for temporal dynamics.

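Given a hyperparameter set, the negative log marginal likelihood is:

$$\begin{aligned}
L &= -\log p(y_t \mid \theta_t) \\
&= \frac{1}{2} y_t^T C_t^{-1} y_t + \frac{1}{2} \log |C_t| + \frac{N}{2} \log(2\pi)
\end{aligned}$$

where $C_t = K_{NN,t} + \sigma_t^2 I$. Maximizing the marginal likelihood is equivalent to minimizing $L$. The partial derivative of $L$ with respect to a hyperparameter is:

$$\begin{aligned}
\frac{\partial L}{\partial \theta_t} &= -\frac{1}{2} y_t^T C_t^{-1} \frac{\partial C_t}{\partial \theta_t} C_t^{-1} y_t + \frac{1}{2} \mathrm{tr}\left(C_t^{-1} \frac{\partial C_t}{\partial \theta_t}\right) \\
&= -\frac{1}{2} \mathrm{tr}\left(\left(\alpha_t \alpha_t^T - C_t^{-1}\right) \frac{\partial C_t}{\partial \theta_t}\right)
\end{aligned} \tag{3}$$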

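where $\alpha_t = C_t^{-1} y_t$. From (2), it follows that

$$\begin{aligned}
\frac{\partial C_t}{\partial a_t} &= \left[\frac{\partial k(r_t)}{\partial a_t}\right] = \left[2 a_t \exp\left(-\frac{r_t^2}{l_t^2}\right)\right] \\
\frac{\partial C_t}{\partial l_t} &= \left[\frac{\partial k(r_t)}{\partial l_t}\right] = \left[\frac{2 a_t^2 r_t^2}{l_t^3} \exp\left(-\frac{r_t^2}{l_t^2}\right)\right] \\
\frac{\partial C_t}{\partial \sigma_t} &= 2 \sigma_t I
\end{aligned}$$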
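One way to check Eq. (3) numerically is to code the objective and its gradient and compare the result against finite differences. The sketch below does this for $\theta = [a, l, \sigma]$. The function name, the toy data, and the step size of the final update are assumptions, and the explicit matrix inverse is used only for brevity on a small problem.

```python
import numpy as np

def neg_log_marginal_likelihood(theta, X, y):
    """Return L = -log p(y | theta) and dL/dtheta for theta = (a, l, sigma)."""
    a, l, sigma = theta
    n = len(X)
    r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    E = np.exp(-r2 / l ** 2)
    C = a ** 2 * E + sigma ** 2 * np.eye(n)                     # C = K_NN + sigma^2 I
    C_inv = np.linalg.inv(C)
    alpha = C_inv @ y
    _, logdet = np.linalg.slogdet(C)
    L = 0.5 * y @ alpha + 0.5 * logdet + 0.5 * n * np.log(2.0 * np.pi)
    dC = [
        2.0 * a * E,                     # dC/da
        2.0 * a ** 2 * r2 / l ** 3 * E,  # dC/dl
        2.0 * sigma * np.eye(n),         # dC/dsigma
    ]
    W = np.outer(alpha, alpha) - C_inv
    # tr(W D) equals sum(W * D) because W and D are both symmetric.
    grad = np.array([-0.5 * np.sum(W * D) for D in dC])
    return L, grad

# Toy data (assumed): eight nodes observing a smooth field with small noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(8, 2))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=8)
theta = np.array([0.5, 0.3, 0.1])
L0, grad = neg_log_marginal_likelihood(theta, X, y)
theta_new = theta - 1e-3 * grad  # one gradient step with a hypothetical step size
```

The paper does not state which optimizer or step size is used online, so the single gradient step here should be read only as one possible update rule.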

3. INFORMATION ENTROPY BASED COVERAGE CONTROL

When the sensor network is deployed in the environment to be explored, the nodes have no knowledge about the spatial function except the prior knowledge of GP regression. Mobile sensor node $i$ makes an observation $y_{i,t}$ at its location $x_{i,t}$, and obtains the observation $y_{j,t}$ and location $x_{j,t}$ from all the other nodes $j \ne i$ via wireless communication. It then learns the hyperparameters $\theta_t$ and constructs the covariance matrix $K_{NN,t}$. With GP regression, a predictive mean function $\mu_{*,t}$ and a predictive covariance function $\Sigma_{*,t}$ are deduced. This is the first step of our proposed approach.

In the second step of our proposed approach, the sensor nodes need a strategy to explore the environment so that they are able to model the spatial function more accurately. The predictive mean function $\mu_{*,t}$ could be used for generating a control signal $u_{i,t}$ for sensor node $i$. This strategy would concentrate the sensor nodes in the regions where the mean values are high. However, this behavior cannot reduce the uncertainty of GP regression. In this work, the information entropy is used by the mobile sensor nodes to explore the environment in order to reduce the uncertainty of the predictive function. By optimizing the information entropy with respect to the sensor node location $x_{i,t}$, a control input signal $u_{i,t}$ can be found for mobile sensor node $i$. Sensor node $i$ then moves to the next position according to its kinematics $x_{i,t+1} = x_{i,t} + u_{i,t}$.

3.1 Information Entropy

To reduce the uncertainty of GP regression, the predictive covariance function $\Sigma_{*,t}$ is a valuable resource. This motivates the use of the posterior information entropy $H(f_{*,t} \mid y_t)$, which measures the information potentially gained by making an observation.

The information entropy of a Gaussian random variable $f_{*,t}$ conditioned on the observation $y_t$ is a monotonic function of its variance:

$$\begin{aligned}
H(f_{*,t} \mid y_t) &= \frac{1}{2} \log\left(2\pi e\,|\Sigma_{*,t}|\right) \\
&= \frac{1}{2} \log\left(2\pi e\,|K_{**,t} - K_{*N,t} C_t^{-1} K_{N*,t}|\right)
\end{aligned}$$

Although the above equation does not contain the observation explicitly, the observation affects the hyperparameters and therefore the covariance function. Thus this information entropy depends on the actual observations. We can represent this dependence as follows:

$$H(x_{*,t}) = H(f_{*,t} \mid y_t)$$
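For a scalar test point the determinant reduces to the predictive variance, so the entropy can be evaluated directly from it. The short sketch below (function name and sample variances are illustrative assumptions) shows the monotonic relationship. It also shows that this differential entropy becomes negative once the variance drops below $1/(2\pi e) \approx 0.0585$, which matters if the entropy is later used as a weight.

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy 0.5 * log(2*pi*e*var) of a scalar Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

# Hypothetical predictive variances: well observed, partly observed, unobserved (prior) region.
var = np.array([0.01, 0.1, 1.0])
H = gaussian_entropy(var)  # increases with var; the first entry is negative
```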

3.2 Centroidal Voronoi Tessellation

According to the above discussion, optimizing the information entropy allows the mobile sensor nodes to explore the environment in order to reduce the uncertainty of GP regression. However, when multiple sensor nodes all optimize a single objective function, they may move together rather than spreading over a large area to explore more regions of high uncertainty.

It is necessary for multiple mobile sensor nodes to cooperate when they explore the environment. The centroidal Voronoi tessellation (CVT) approach proposed in Cortes et al. (2004) is an effective way to achieve motion cooperation. The CVT algorithm is a locational optimization approach based on a utility function. Here we propose to use the information entropy $H(x_{*,t})$ as the utility function in the CVT algorithm. The idea is to allow each node to move to the center of its Voronoi cell, which is weighted by the information entropy.

A Voronoi tessellation consists of multiple Voronoi cells $V_{i,t}$, each of which is occupied by one sensor node at time step $t$. A Voronoi cell $V_{i,t}$ is defined as follows:

$$V_{i,t} = \{x_{*,t} \in Q \mid \|x_{*,t} - x_{i,t}\| \le \|x_{*,t} - x_{j,t}\|, \ \forall j \ne i\}$$
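On a discretized area, the cell definition amounts to labeling each grid point with its nearest node. The following sketch assumes $Q = [0, 1]^2$, a hypothetical $50 \times 50$ grid of cell centers, and three made-up node positions; the paper does not specify how the cells are computed.

```python
import numpy as np

def voronoi_labels(points, nodes):
    """Index of the nearest node for every point (ties go to the lowest index)."""
    d2 = np.sum((points[:, None, :] - nodes[None, :, :]) ** 2, axis=-1)
    return np.argmin(d2, axis=1)

# Hypothetical discretization of Q = [0, 1]^2 into a 50 x 50 grid of cell centers.
g = (np.arange(50) + 0.5) / 50
grid = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
nodes = np.array([[0.25, 0.25], [0.75, 0.25], [0.5, 0.8]])
labels = voronoi_labels(grid, nodes)  # labels[p] == i means grid[p] lies in V_i
```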

A CVT is a special Voronoi tessellation, which requires each sensor node to move toward the mass center of its Voronoi cell. The utility function of the locational optimization problem is defined as:

$$U(x_{1,t}, \ldots, x_{N,t}) = \sum_{i=1}^{N} \int_{V_{i,t}} \frac{1}{2} \|x_{*,t} - x_{i,t}\|^2 H(x_{*,t})\, dx_{*,t}$$

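For computing the mass center of each $V_{i,t}$, we use the following definitions:

$$\begin{aligned}
M_{V_{i,t}} &= \int_{V_{i,t}} H(x_{*,t})\, dx_{*,t} \\
L_{V_{i,t}} &= \int_{V_{i,t}} x_{*,t} H(x_{*,t})\, dx_{*,t} \\
C_{V_{i,t}} &= \frac{L_{V_{i,t}}}{M_{V_{i,t}}}
\end{aligned}$$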

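The gradient of the utility function with respect to the sensor node position $x_{i,t}$ is:

$$\begin{aligned}
\frac{\partial U}{\partial x_{i,t}} &= -\int_{V_{i,t}} (x_{*,t} - x_{i,t}) H(x_{*,t})\, dx_{*,t} \\
&= -M_{V_{i,t}} (C_{V_{i,t}} - x_{i,t})
\end{aligned}$$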

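The control input signal of sensor node $i$ is:

$$u_{i,t} = k \frac{\partial U}{\partial x_{i,t}}$$

where $k$ is the control gain. Since the gradient equals $-M_{V_{i,t}} (C_{V_{i,t}} - x_{i,t})$, this input moves the node toward the mass center $C_{V_{i,t}}$ only if $k$ is negative (for positive $M_{V_{i,t}}$).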
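The mass, centroid, gradient, and control input can be approximated on a grid as sketched below. Several choices here are assumptions rather than details from the paper: the area is discretized into equal cells, the weight map stands in for $H(x_{*,t})$ and is taken to be non-negative (a uniform map is used for the demonstration), and the gain is set to $k = -1$ so that the step descends $U$ and moves each node toward its centroid.

```python
import numpy as np

def utility(nodes, grid, weight, cell_area):
    """Discrete U: each grid point contributes 0.5 * weight * squared distance to its nearest node."""
    d2 = np.sum((grid[:, None, :] - nodes[None, :, :]) ** 2, axis=-1)
    return 0.5 * np.sum(weight * d2.min(axis=1)) * cell_area

def cvt_control(nodes, grid, weight, cell_area, k):
    """Return u_i = k * dU/dx_i with dU/dx_i = -M_i (C_i - x_i) on a discretized Q."""
    d2 = np.sum((grid[:, None, :] - nodes[None, :, :]) ** 2, axis=-1)
    labels = np.argmin(d2, axis=1)  # nearest-node (Voronoi) assignment
    u = np.zeros_like(nodes)
    for i in range(len(nodes)):
        in_cell = labels == i
        mass = weight[in_cell].sum() * cell_area  # M_V
        if mass <= 0.0:
            continue  # empty or zero-weight cell: no motion
        first_moment = (weight[in_cell][:, None] * grid[in_cell]).sum(axis=0) * cell_area  # L_V
        centroid = first_moment / mass  # C_V = L_V / M_V
        grad = -mass * (centroid - nodes[i])
        u[i] = k * grad
    return u

# Hypothetical setup: 40 x 40 grid on [0, 1]^2, uniform weight, three nodes starting in a corner.
g = (np.arange(40) + 0.5) / 40
grid = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2)
weight = np.ones(len(grid))
cell_area = 1.0 / len(grid)
nodes = np.array([[0.05, 0.05], [0.15, 0.05], [0.05, 0.15]])
before = utility(nodes, grid, weight, cell_area)
for _ in range(50):
    nodes = nodes + cvt_control(nodes, grid, weight, cell_area, k=-1.0)  # x_{t+1} = x_t + u_t
after = utility(nodes, grid, weight, cell_area)
```

With $k = -1$ and a total weight of at most one, each update is a partial move toward the current centroid, so the discrete utility does not increase from one iteration to the next. In the full approach the weight map would be recomputed from the GP entropy at every step, possibly shifted to stay non-negative.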

4. SIMULATIONS

A $1 \times 1$ area was used in simulations. The mobile sensor network consisted of $N = 30$ sensor nodes, which were randomly distributed in a small area of size $0.2 \times 0.2$. The hyperparameters learned were the lengthscale $l_t$ and the amplitude $a_t$. The hyperparameter $\sigma_t$ was held constant in simulations. A Gaussian-like spatial function was simulated first. The sensor observation noise followed a Gaussian distribution with a standard deviation of $\sigma_t = 0.05$. The noisy function is shown in Fig. 1(a). The predictive mean function is shown in Fig. 1(b). These two figures show that the proposed algorithm was able to model a simple spatial function.

The mobile sensor trajectories are shown in Fig. 2(a). The 30 sensor nodes were initially placed at the bottom left corner of the environment. They moved to cover as much of the area as possible according to the uncertainty of the predictive model. The root mean squared error (RMSE) between the predictive function and the ground truth function with noise is shown in Fig. 2(b). At the beginning of the process (from loop 1 to loop 25), the RMSE remained nearly unchanged. Then the RMSE experienced a sharp rise at around the 40th loop. This indicates an exploring behavior of the proposed algorithm given the noisy observations. After the sharp rise, the RMSE declined rapidly and converged to zero. The behavior seen in the RMSE can also be observed in the results of hyperparameter learning. The lengthscale and the amplitude of the squared exponential function were learned online using the maximum likelihood algorithm. The learned results are shown in Fig. 3(a) for the lengthscale and in Fig. 3(b) for the amplitude. Both parameters stayed very low at the beginning because the sensor nodes had not yet covered enough of the environment. Then a sharp rise was observed in both parameters. After the 40th loop, the hyperparameters declined to stable values. The stable value of the lengthscale was about 0.25 and the stable value of the amplitude was about 0.28.

Fig. 1. Ground truth function with noise and predictive mean function: (a) noised function; (b) mean.

A more complex 2D spatial function was simulated to show that the proposed algorithm can also handle non-Gaussian spatial functions. The ground truth function is shown in Fig. 4(a). The sensor noise was sampled with a standard deviation of $\sigma = 0.005$. The predictive mean function is shown in Fig. 4(b). Comparing the ground truth function with the predictive mean function shows that they were very similar. The largest errors were found at the boundary of the square area. This is because the CVT algorithm kept the mobile sensor nodes inside their Voronoi cells, so the boundary was sampled less.

The trajectories of the mobile sensor nodes demonstrated the behavior of the proposed algorithm. The nodes started from the $0.2 \times 0.2$ corner and moved to the uncertain regions according to the information entropy. In the end, they covered the area in the optimized pattern shown in Fig. 5(a). The RMSE curve in Fig. 5(b) shows the convergence of the whole process. All the sensor nodes explored the environment at the beginning of the learning. Then the RMSE declined until a value very close to zero was reached.


Fig. 2. Trajectories and RMSE changes: (a) trajectories; (b) RMSE versus loops.

Fig. 3. Learning results of hyperparameters: (a) lengthscale versus loops; (b) amplitude versus loops.

Fig. 4. Ground truth non-Gaussian function and predictive mean function: (a) noised function; (b) mean.

5. CONCLUSIONS

GP regression is an effective tool for spatial function modeling problems due to its ability to generate both a predictive mean function and a predictive covariance function for the function to be estimated. With the predictive covariance function, an information entropy based approach to motion decision making is proposed in this paper. Combining the information entropy with the CVT algorithm makes it possible for multiple mobile sensor nodes to explore the uncertainty of the environment and exploit the predictive model. The proposed strategy is an active spatial function modeling approach.

Since the information entropy is related only to the covariance function and not to the sensor observations, it is possible to control the mobile sensor nodes even without actual sensor observations. However, when the hyperparameters are learned, actual sensor observations are necessary for motion decision making. Although the proposed algorithm targets spatial functions rather than spatial-temporal functions, it may be applied to spatial-temporal functions, where temporal dynamics can be handled implicitly by the adaptive covariance function obtained from the online hyperparameter learning algorithm.

In the next step, we would like to test the proposed algorithm on spatial-temporal functions to check its performance. Alternatively, it is feasible to use a hierarchical Bayesian structure to explicitly handle temporal dynamics. In a hierarchical Bayesian structure, a low layer handles the spatial dynamics using the proposed GP regression while a high layer handles the temporal dynamics using a Kalman filter.

Fig. 5. Trajectories and RMSE changes: (a) trajectories; (b) RMSE versus loops.

ACKNOWLEDGEMENTS

This research work is financially sponsored by the European Union FP7 program, ICT-231646, SHOAL.

REFERENCES

Choi, J., Lee, J., and Oh, S. (2008). Swarm intelligence for achieving the global maximum using spatio-temporal Gaussian processes. In Proc. of the American Control Conference. Seattle, Washington.

Corrigan, C.E., Roberts, G.C., Ramana, M.V., Kim, D., and Ramanathan, V. (2007). Capturing vertical profiles of aerosols and black carbon over the Indian Ocean using autonomous unmanned aerial vehicles. Atmospheric Chemistry and Physics Discussions, 7(4), 11429–11463.

Cortes, J. (2009). Distributed Kriged Kalman filter for spatial estimation. IEEE Trans. on Automatic Control, 54(12), 2816–2827.

Cortes, J., Martinez, S., Karatas, T., and Bullo, F. (2004). Coverage control for mobile sensing networks. IEEE Trans. on Robotics and Automation, 20(2), 243–255.

Hussein, I. (2007). A Kalman filter-based control strategy for dynamic coverage control. In Proc. of the 2007 American Control Conference, 3271–3276. New York, NY.

Krause, A., Singh, A., and Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9, 235–284.

Leonard, N.E., Paley, D., Lekien, F., Sepulchre, R., Fratantoni, D.M., and Davis, R. (2007). Collective motion, sensor networks and ocean sampling. Proceedings of the IEEE, 95(1), 48–74.

Lynch, K.M., Schwartz, I.B., Yang, P., and Freeman, R.A. (2008). Decentralized environmental modelling by mobile sensor networks. IEEE Trans. on Robotics, 24(3), 710–724.

MacKay, D.J.C. (1998). Introduction to Gaussian processes. In C.M. Bishop (ed.), Neural Networks and Machine Learning, volume 168, 133–165. Springer, Berlin.

Martinez, S. (2010). Distributed interpolation schemes for field estimation by mobile sensor networks. IEEE Trans. on Control Systems Technology, 18(2), to appear.

Merino, L.F., Caballero, J.R., de Dios, J.M., and Ferruz, A.O. (2006). A cooperative perception system for multiple UAVs: application to automatic detection of forest fires. Journal of Field Robotics, 23(3), 165–184.

Ny, J.L. and Pappas, G. (2009). On trajectory optimization for active sensing in Gaussian process models. In Proc. of the IEEE Conf. on Decision and Control, 6286–6292. Shanghai, China.

Rasmussen, C.E. and Williams, C. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge MA.

Schwager, M., Rus, D., and Slotine, J. (2009). Decentralized, adaptive coverage control for networked robots. Int. J. of Robotics Research, 28(3), 357–375.

Singh, A., Ramos, F., Whyte, H.D., and Kaiser, W.J. (2010). Modeling and decision making in spatio-temporal processes for environmental surveillance. In Proc. of the IEEE Int. Conf. on Robotics and Automation. Anchorage, Alaska, USA.

Stachniss, C., Plagemann, C., and Lilienthal, A.J. (2009). Learning gas distribution models using sparse Gaussian process mixtures. Autonomous Robots, 26(2-3), 187–202.

Stranders, R., Rogers, A., and Jennings, N. (2008). A decentralized, on-line coordination mechanism for monitoring spatial phenomena with mobile sensors. In Proc. of Second International Workshop on Agent Technology for Sensor Networks. Estoril, Portugal.

Williams, C.K.I. and Rasmussen, C.E. (1996). Gaussian processes for regression. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo (eds.), Advances in Neural Information Processing Systems 8, 514–520. MIT Press, Cambridge.

