280 IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 4, NO. 2, JUNE 2018

Distributed Particle Filtering via Optimal Fusion of Gaussian Mixtures

Jichuan Li and Arye Nehorai, Life Fellow, IEEE

Abstract—We propose a distributed particle filtering algorithm based on an optimal fusion rule for local posteriors. We implement the optimal fusion rule in a distributed and iterative fashion via an average consensus algorithm. We approximate local posteriors as Gaussian mixtures and fuse Gaussian mixtures through importance sampling. We prove that under certain conditions the proposed distributed particle filtering algorithm converges in probability to a global posterior locally available at each sensor in the network. Numerical examples are presented to demonstrate the performance advantages of the proposed method in comparison with other distributed particle filtering algorithms.

Index Terms—Average consensus, data fusion, distributed particle filtering, Gaussian mixture model, importance sampling.

I. INTRODUCTION

PARTICLE filtering, also known as the sequential Monte Carlo method, is a powerful tool for sequential Bayesian estimation [1]. Unlike Kalman filtering [2], particle filtering is able to work with both nonlinear models and non-Gaussian noise, and thus is applicable to sequential estimation problems under general assumptions. To improve estimation accuracy, a particle filter is often built on observations from more than one perspective or sensor. These sensors form a network and collaborate via wireless communication. The network can designate one of the sensors or an external node as the fusion center, which receives and processes observations sent from all the other sensors in the network. This centralized implementation is optimal for particle filtering in terms of accuracy, but does not scale with growing network size in applications like target tracking, environmental monitoring, and smart grids. This shortcoming motivates distributed particle filtering [3].

Manuscript received February 10, 2016; revised September 7, 2016 and March 24, 2017; accepted April 4, 2017. Date of publication April 12, 2017; date of current version May 8, 2018. This work was supported by the Air Force Office of Scientific Research under Grants FA9550-16-1-0386 and FA9550-11-1-0210. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Marcelo G. S. Bruno. (Corresponding author: Arye Nehorai.)

The authors are with the Preston M. Green Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSIPN.2017.2694318

Distributed particle filtering consists of separate particle filters that have access to local observations only and produce global estimates via communication. It is often implemented using a consensus algorithm [4], where sensors in a network reach agreement among their beliefs iteratively through communication between neighboring sensors. Depending on the type of information communicated in the consensus algorithm, a distributed particle filtering algorithm can be categorized as weight-based, likelihood-based, or posterior-based.

Weight-based algorithms [5]–[9] communicate the weight of each particle or, similarly, the local likelihood evaluated at each particle. To guarantee an accurate Monte Carlo approximation, the number of particles held by each local filter is usually considerably large, which results in considerably high communication overhead for weight consensus. Also, in order for weight consensus to make sense, different local filters must have identical particles, which necessitates perfect synchronization between the random number generators of different sensors. The reliance on perfect synchronization, together with the high communication overhead, makes weight-based algorithms costly to implement in practice.

Likelihood-based algorithms [10]–[12] communicate local likelihood functions approximated via factorization and linear regression. Since there is no universal approach to the desired factorization format, the likelihood approximation approach does not generalize well beyond the exponential family. Also, likelihood consensus requires uniform factorization across the network and thus does not apply to scenarios where the noise distribution at each sensor varies. Hence, likelihood consensus might not be an ideal choice for general applications.

Posterior-based algorithms [13]–[21] communicate local posteriors parametrically approximated in a compact form, and have several advantages over likelihood-based and weight-based algorithms. First, unlike likelihood functions, posteriors are essentially probability density functions and thus easy to represent parametrically. If a posterior follows a (multivariate) Gaussian distribution, it can be losslessly represented by its mean and variance (covariance matrix); if a posterior follows a non-Gaussian distribution, it can be sufficiently accurately approximated by a convex combination of multiple Gaussian components, i.e., a Gaussian mixture (GM) [22]. Also, such a compact parametric representation incurs significantly lower communication overhead than a nonparametric representation, e.g., particles. Moreover, posterior-based algorithms are invariant to how local posteriors are obtained and thus allow diverse sensing modalities [23] and various filtering tools to be exploited in a network. Lastly, posterior-based algorithms give each sensor privacy, since no sensor in the network needs to know how others compute their local posteriors.

The challenge of posterior-based algorithms mainly lies in the fusion of parametrically represented local posteriors.



In [13]–[16], local posteriors are fused in a Bayesian fashion but assumed to be Gaussian for fusion tractability. A posterior follows a Gaussian distribution only if both the state transition model and the observation model are linear with additive Gaussian noise. Thus, the Gaussian assumption is so strong that it incurs obvious approximation errors in nonlinear applications. In [17] and [18], local posteriors are approximated as Gaussian mixtures but fused linearly through their parameters. This linear fusion rule is, however, suboptimal because it is not justified by the underlying statistical model. Also, it requires Gaussian mixtures to have a uniform number of components, thus limiting the flexibility and adaptivity of local parametric representation. In [19]–[21], Gaussian mixtures are fused in an analytical fashion with approximations, but the approximate fusion strategy incurs inaccuracy and makes it difficult to control the number of components within a reasonable range for a Gaussian mixture model.

In this paper, we propose a posterior-based distributed particle filtering algorithm. We approximate local posteriors as Gaussian mixtures and fuse local posteriors via an optimal distributed fusion rule derived from Bayesian statistics and implemented via average consensus. Unlike other posterior-based algorithms, the proposed algorithm neither compromises approximation accuracy for fusion tractability nor compromises fusion validity for approximation accuracy. Also, the proposed algorithm seeks consensus on the posterior distribution, rather than on parameters of the posterior distribution, thus giving flexibility to local parametric approximations by allowing each Gaussian mixture to have an optimal yet possibly nonuniform number of components. To address the challenge in fusion, we design algorithms based on importance sampling [24] to fuse Gaussian mixtures nonlinearly within each consensus step. Finally, we prove the convergence of the proposed distributed particle filtering algorithm and demonstrate its advantages through numerical examples.

The rest of the paper is organized as follows. Section II introduces the sensor network model and the state-space model. Section III introduces centralized particle filtering. Section IV presents our distributed particle filtering algorithm. Section V analyzes the performance of the proposed algorithm. Section VI presents numerical examples, and Section VII concludes the paper.

II. PROBLEM FORMULATION

A. Network Model

We model a sensor network as a graph G = (V, E), where V = {S_1, S_2, . . . , S_K} is the set of vertices, corresponding to sensors, with cardinality |V| = K, and E ⊂ V × V is the set of edges, corresponding to communication links between sensors. We assume each communication link to be bidirectional, in the sense that sensors can transmit information in either direction through the link. With no particular direction assigned to any edge, we assume the graph G to be undirected. We restrict each communication link to a local neighborhood defined as an area within a circle of radius ρ, in the sense that a sensor can directly communicate only with its neighbors. Also, we assume that the graph G is connected, or in other words that there exists a multi-hop communication route connecting any two sensors in the network. Moreover, we assume the sensor network to be synchronous or, if not, synchronized via a clock synchronization scheme [25]–[27].

B. Signal Model

We consider a single moving target to be observed by the sensor network. We connect target state transition with sensor observation using a discrete-time state-space model,

\begin{cases} x_n = g(x_{n-1}) + u_n \\ y_{n,k} = h_k(x_n) + v_{n,k}, \quad k = 1, 2, \ldots, K, \end{cases} \qquad (1)

where
1) x_n ∈ R^d is the target state at the nth time point;
2) y_{n,k} ∈ R^{b_k} is the observation taken by S_k at the nth time point;
3) g is a known state transition function;
4) h_k is a known observation function of S_k;
5) both {u_n} and {v_{n,k}} are uncorrelated additive noise;
6) the distribution of x_0 is given as prior information;
7) state transition is Markovian, i.e., past and future states are conditionally independent, given the current state;
8) the current observation is conditionally independent of past states and observations, given the current state.

C. Goal

The goal is to sequentially estimate the current state x_n based on the estimate of the preceding state x_{n-1} and the newly available observations {y_{n,1}, y_{n,2}, . . . , y_{n,K}}.

D. Notation

We denote consecutive states {x_1, x_2, . . . , x_n} as x_{1:n}, the observations taken by the whole network at the nth time point {y_{n,1}, y_{n,2}, . . . , y_{n,K}} as y_n, and consecutive observations taken by the whole network {y_1, y_2, . . . , y_n} as y_{1:n}. We use f to denote a probability density function (pdf) and q to denote the pdf of a proposal distribution in importance sampling.

III. CENTRALIZED PARTICLE FILTERING

The problem formulated in Section II is a filtering problem. A filtering problem is often solved by a particle filter when the state-space model is nonlinear or the noise is non-Gaussian. A particle filter can be implemented in a centralized fashion by collecting observations from all the sensors in the network and processing them together.

A centralized particle filter approximates the posterior distribution of the current state, f(x_n | y_{1:n}), as a weighted ensemble of Monte Carlo samples (also known as particles):

f(x_n \mid y_{1:n}) \approx \sum_{m=1}^{M} w_n^{(m)} \, \delta\bigl(x_n - x_n^{(m)}\bigr), \qquad (2)

where M is the total number of particles, x_n^{(m)} is the mth particle, w_n^{(m)} is the weight of x_n^{(m)} with \sum_{m=1}^{M} w_n^{(m)} = 1,


and δ is the Dirac delta function. Using importance sampling, a particle is generated according to a proposal distribution q(x_n | x_{n-1}^{(m)}, y_n), and its weight is updated according to

w_n^{(m)} \propto \frac{f(y_n \mid x_n^{(m)}) \, f(x_n^{(m)} \mid x_{n-1}^{(m)})}{q(x_n^{(m)} \mid x_{n-1}^{(m)}, y_n)} \times w_{n-1}^{(m)}. \qquad (3)

The proposal distribution q is commonly chosen as the state transition pdf f(x_n | x_{n-1}^{(m)}), which, although slightly inefficient, yields a convenient weight update rule:

w_n^{(m)} \propto f(y_n \mid x_n^{(m)}) \times w_{n-1}^{(m)}. \qquad (4)

The global likelihood function f(y_n | x_n^{(m)}) in (4) can be factorized into a product of local likelihood functions,

f(y_n \mid x_n^{(m)}) = \prod_{k=1}^{K} f(y_{n,k} \mid x_n^{(m)}), \qquad (5)

thus providing a centralized fusion rule.

As time goes on, due to the finite number of particles, the weight in an ensemble tends to be concentrated in only a few particles, resulting in a small effective sample size and thus a poor approximation. When an ensemble's effective sample size falls below a threshold, a possible remedy is to resample the particles according to their weights. A popularly used estimate of the effective sample size of an ensemble is

M_e = \left[ \sum_{m=1}^{M} \bigl(w_n^{(m)}\bigr)^2 \right]^{-1}, \qquad (6)

and the threshold can be set as, for example, 60% of the original sample size M, or 100% if the plan is to resample in every iteration.
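As a concrete illustration of (6) and the resampling rule, the following Python sketch (illustrative only, not code from the paper; the 60% threshold and multinomial resampling are one common choice among several) computes the effective sample size of a weighted particle ensemble and resamples when it drops below the threshold.

    import numpy as np

    def effective_sample_size(w):
        """Estimate of the effective sample size, Eq. (6)."""
        w = np.asarray(w, dtype=float)
        w = w / w.sum()                      # ensure the weights are normalized
        return 1.0 / np.sum(w ** 2)

    def resample_if_needed(particles, w, threshold=0.6):
        """Multinomial resampling when M_e falls below threshold * M."""
        M = len(w)
        if effective_sample_size(w) < threshold * M:
            idx = np.random.choice(M, size=M, p=np.asarray(w) / np.sum(w))
            particles = particles[idx]
            w = np.full(M, 1.0 / M)          # resampled particles get equal weight
        return particles, w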

Although centralized particle filtering is optimal in estimation accuracy, it is impractical for large-scale sensor networks. First, it expends considerable energy and bandwidth on transmitting raw measurements from everywhere in the network to a common fusion center. Second, it causes severely unbalanced energy consumption and communication traffic in the network, because sensors located near the fusion center relay many more messages than those located far away. Further, reliance on a common fusion center makes it vulnerable to a single point of failure. Moreover, it does not scale with the network size. Therefore, it is often preferable to perform distributed particle filtering.

IV. DISTRIBUTED PARTICLE FILTERING

In distributed particle filtering, every sensor in the network performs local particle filtering on its own observation and then communicates with its neighbors for data fusion, thus achieving centralized filtering in a distributed fashion.

A. Consensus

Consensus [4] is a type of data fusion algorithm in which every sensor in the network iteratively communicates with its neighbors and updates its own belief based on its neighbors' until all the sensors hold the same belief. Consensus has the following advantages in distributed data fusion. First, it ends up with a global estimate available at each sensor in the network, so that the network is robust to sensor failures and every sensor in the network is ready to react based on the global estimate. Second, it requires only local communications and does not need global routing. Last but not least, it is robust to changes in the network topology. In this paper, we fuse local posteriors provided by different sensors via consensus, so that every sensor in the network ultimately obtains a global posterior.

Likelihood factorization in (5), as mentioned in Section III, makes data fusion convenient, because its logarithmic form (from now on, we assume that every pdf is positive over its support, so that its logarithm is well defined)

\log f(y_n \mid x_n) = \sum_{k=1}^{K} \log f(y_{n,k} \mid x_n) \qquad (7)

gives rise to a straightforward implementation of an average consensus algorithm [4]. However, unlike a prior or posterior density function, a likelihood function is generally difficult to approximate parametrically through a universal approach such as the Gaussian mixture model. This difficulty motivates us to communicate posterior density functions, instead of likelihood functions, in average consensus.

To derive a distributed fusion rule for posteriors, we start from a centralized approach to posterior fusion [28].

Due to conditional independence, a likelihood function can be equivalently written as

f(y_{n,k} \mid x_n) = f(y_{n,k} \mid x_n, y_{1:n-1}), \qquad (8)

which, according to Bayes' theorem, can be rewritten as

f(y_{n,k} \mid x_n) = \frac{f(x_n \mid y_{n,k}, y_{1:n-1}) \, f(y_{n,k} \mid y_{1:n-1})}{f(x_n \mid y_{1:n-1})}. \qquad (9)

Substitute (9) into (7), and we get

\log \frac{f(x_n \mid y_{1:n}) \, f(y_n \mid y_{1:n-1})}{f(x_n \mid y_{1:n-1})} = \sum_{k=1}^{K} \log \frac{f(x_n \mid y_{n,k}, y_{1:n-1}) \, f(y_{n,k} \mid y_{1:n-1})}{f(x_n \mid y_{1:n-1})}, \qquad (10)

which simplifies to

\log f(x_n \mid y_{1:n}) + (K-1) \log f(x_n \mid y_{1:n-1}) = \sum_{k=1}^{K} \log f(x_n \mid y_{n,k}, y_{1:n-1}) + \text{const}, \qquad (11)

where "const" represents a constant term equal to

-\log f(y_n \mid y_{1:n-1}) + \sum_{k=1}^{K} \log f(y_{n,k} \mid y_{1:n-1}). \qquad (12)

Because the constant term is not a function of the state variable x_n, we do not have to explicitly compute it when we compute the distribution of x_n.

Equation (11) presents a centralized fusion rule for local posteriors: f(x_n | y_{n,k}, y_{1:n-1}) on the right-hand side of (11) is the local posterior of x_n by S_k, while f(x_n | y_{1:n}) on the left-hand side of (11) is the global posterior of x_n by the whole network. There are two other terms in (11), namely the constant term and the prediction term. The constant term will disappear when we normalize f(x_n | y_{1:n}) so that it integrates to 1; the prediction term f(x_n | y_{1:n-1}) can be calculated as

f(x_n \mid y_{1:n-1}) = \int_{\mathbb{R}^d} f(x_n \mid x_{n-1}) \, f(x_{n-1} \mid y_{1:n-1}) \, dx_{n-1}, \qquad (13)

where f(x_n | x_{n-1}) is available from the state transition model, and f(x_{n-1} | y_{1:n-1}), i.e., the global posterior of the last state, is available at each sensor thanks to the consensus algorithm performed during the last time step.

The centralized fusion rule (11) can be implemented in a distributed manner through an average consensus algorithm. Denoting f(x_n | y_{n,k}, y_{1:n-1}) as \eta_k^{(0)}(x_n), the summation on the right-hand side of (11) can be computed iteratively based on a two-step distributed fusion rule:

Step 1: \log \eta_k^{(i+1)}(x_n) = \sum_{j \in N_k} \epsilon_{kj} \log \eta_j^{(i)}(x_n), \qquad (14)

Step 2: normalize \eta_k^{(i+1)}(x_n), \qquad (15)

where \eta_k^{(i)}(x_n) is the posterior density function of x_n held by S_k in the ith iteration of the average consensus algorithm during the nth time step, N_k is the neighborhood of S_k with S_k included, and \epsilon_{kj} is the Metropolis weight [29] defined as

\epsilon_{kj} = \begin{cases} 1/\max\{|N_k|, |N_j|\} & \text{if } (k, j) \in E \\ 1 - \sum_{l \in N_k \setminus \{k\}} \epsilon_{kl} & \text{if } k = j \\ 0 & \text{otherwise.} \end{cases} \qquad (16)

We call (14) the distributed fusion step and (15) the normalization step. In the distributed fusion step, every sensor sends its current belief to its neighbors and updates it with beliefs received from its neighbors; in the normalization step, an updated belief is normalized so that it appears as a valid probability density function and can be parametrically represented as a Gaussian mixture model for future communication. Note that (14) and (15) can also be reached from another perspective by minimizing the weighted average Kullback-Leibler (KL) distance between the fused posterior and the posteriors to be fused and then taking the logarithm, as shown in [21].
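For concreteness, the Metropolis weights of (16) can be computed from the network topology alone, as in the following Python sketch (an illustration with assumed variable names, not code from the paper); adjacency[k] lists the neighbors of sensor k, excluding k itself.

    import numpy as np

    def metropolis_weights(adjacency):
        """Metropolis weight matrix of Eq. (16)."""
        K = len(adjacency)
        eps = np.zeros((K, K))
        deg = [len(adjacency[k]) for k in range(K)]
        for k in range(K):
            for j in adjacency[k]:
                # |N_k| in the paper includes the sensor itself, hence deg + 1
                eps[k, j] = 1.0 / max(deg[k] + 1, deg[j] + 1)
            eps[k, k] = 1.0 - eps[k].sum()
        return eps

    # Example: a line network S_1 - S_2 - S_3
    print(metropolis_weights({0: [1], 1: [0, 2], 2: [1]}))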

In Section V-A, we show that under certain conditions we have, for all k,

\lim_{i \to \infty} \log \eta_k^{(i)}(x_n) = \frac{1}{K} \sum_{j=1}^{K} \log \eta_j^{(0)}(x_n) + \text{const}, \qquad (17)

where the constant term is added simply for the purpose of normalization. Combining (11) and (17), we have, for all k,

f(x_n \mid y_{1:n}) \propto \frac{\bigl(\lim_{i \to \infty} \eta_k^{(i)}(x_n)\bigr)^K}{f(x_n \mid y_{1:n-1})^{K-1}}, \qquad (18)

which, called the recovery step, concludes the consensus-based distributed particle filtering.

Algorithm 1: GM Learning from Weighted Samples.

1: procedure GMLEARN({x_i, w_i}_{i=1}^{M}, C)
2:   initialize C (if not given) and {α_c, µ_c, Σ_c}_{c=1}^{C}
3:   repeat
4:     for i = 1 to M do                          ▷ E-step
5:       for c = 1 to C do
6:         p_{i,c} = α_c N(x_i | µ_c, Σ_c)
7:       end for
8:       normalize {p_{i,c}}_{c=1}^{C}
9:     end for
10:    for c = 1 to C do                          ▷ M-step
11:      α_c = Σ_{i=1}^{M} p_{i,c} w_i
12:      µ_c = α_c^{-1} Σ_{i=1}^{M} p_{i,c} w_i x_i
13:      Σ_c = α_c^{-1} Σ_{i=1}^{M} p_{i,c} w_i (x_i − µ_c)(x_i − µ_c)^T
14:    end for
15:    normalize {α_c}_{c=1}^{C}
16:  until convergence
17:  return GM = {α_c, µ_c, Σ_c}_{c=1}^{C}
18: end procedure

Note that (18) is the final result of the proposed distributed fusion approach, calculated individually at each sensor based on the posterior it holds locally at the end of the average consensus algorithm. As a distributed fusion result, (18) is also validated by the centralized fusion result in [28], which, in contrast, is calculated centrally at a global fusion center based on local posteriors provided by all the sensors in the network.

B. Gaussian Mixture Model

Consensus necessitates inter-sensor communication. Communication is a major source of energy consumption for wireless sensor networks. Since wireless sensor networks are usually subject to strong energy constraints, it is important to minimize the amount of communication needed in consensus. A possible solution to communication minimization is to compress the data to be transmitted. In this paper, we compress all the posteriors in the distributed fusion step (14) and the recovery step (18) into Gaussian mixtures [22].

A Gaussian mixture is a convex combination of Gaussian components as follows,

\eta_k^{(i)}(x_n) \approx \sum_{c=1}^{C} \alpha_c \, \mathcal{N}(x_n; \mu_c, \Sigma_c), \qquad (19)

where C is the total number of components, and α_c, µ_c, and Σ_c are the weight, mean, and covariance matrix, respectively, of the cth component.
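As an illustration of the representation in (19), the following Python helpers (assumed names, using SciPy; not code from the paper) evaluate and sample from a Gaussian mixture stored as a triple (alphas, means, covs). Later sketches in this section reuse these two helpers.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gm_pdf(gm, x):
        """Mixture density sum_c alpha_c N(x; mu_c, Sigma_c) of Eq. (19) at points x."""
        alphas, means, covs = gm
        return sum(a * multivariate_normal.pdf(x, mean=m, cov=S)
                   for a, m, S in zip(alphas, means, covs))

    def gm_sample(gm, n):
        """Draw n samples: pick a component per sample, then sample from it."""
        alphas, means, covs = gm
        comps = np.random.choice(len(alphas), size=n,
                                 p=np.asarray(alphas) / np.sum(alphas))
        return np.stack([np.random.multivariate_normal(means[c], covs[c]) for c in comps])

    # Example: a two-component mixture in 2-D
    gm = ([0.3, 0.7], [np.zeros(2), np.ones(2)], [np.eye(2), 0.5 * np.eye(2)])
    print(gm_pdf(gm, gm_sample(gm, 5)))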

A Gaussian mixture model can be used to approximate an arbitrary probability distribution, and is often learned via the expectation-maximization (EM) algorithm [30] from samples generated from the underlying distribution. In particle filtering, samples are often weighted due to importance sampling, and thus we need to learn a Gaussian mixture model from weighted samples using the weighted EM algorithm [31], as summarized in Algorithm 1.


The convergence of Algorithm 1 can be determined in various ways. In this paper, we terminate Algorithm 1 when the absolute difference between the log-likelihoods of the current and previous Gaussian mixture models is smaller than a chosen percentage of the absolute difference between the log-likelihoods of the current and initial models.
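As a hedged illustration of this stopping rule (the fraction tol below is an assumed parameter, not a value from the paper), the check inside the EM loop could look like:

    def em_converged(ll_curr, ll_prev, ll_init, tol=0.01):
        """Stop when the latest improvement is a small fraction of the
        total log-likelihood improvement since initialization."""
        return abs(ll_curr - ll_prev) < tol * abs(ll_curr - ll_init)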

Note that the EM algorithm is often computationally inefficient and thus might not be a good choice for real-time applications. For real-time applications, an efficient Gaussian mixture learning method [32] can be used, but the discussion of efficient Gaussian mixture learning is beyond the scope of this paper. Also, the EM algorithm is not adaptive in terms of the number of components in the Gaussian mixture model. To obtain adaptivity, an adaptive Gaussian mixture learning method [33]–[36] can be used, but the discussion of adaptive Gaussian mixture learning is also beyond the scope of this paper.

C. Fusion of Gaussian Mixtures

With posteriors represented as Gaussian mixtures, the fusion of Gaussian mixtures has to be considered for both the distributed fusion step (14) and the recovery step (18). For convenience, we convert the distributed fusion step (14) from the logarithmic form to the exponential form:

\eta_k^{(i+1)}(x_n) = \prod_{j \in N_k} \bigl(\eta_j^{(i)}(x_n)\bigr)^{\epsilon_{kj}}. \qquad (20)

Both (18) and (20) involve a product of powers of Gaussian mixtures, which is unfortunately intractable to compute analytically. Therefore, we consider importance sampling. In [37], various methods are proposed to sample from a product of Gaussian mixtures, but unfortunately none of them directly applies to our problem, because of the negative exponent in (18) and the fractional exponents in (20). In this paper, we extend the mixture importance sampling approach presented in [37] to a general case where Gaussian mixtures in the product can have fractional or negative exponents, and propose a weighted mixture importance sampling approach for the fusion of Gaussian mixtures in both (18) and (20).

1) Distributed Fusion Step: We generate samples from each Gaussian mixture to be fused and assign to them importance weights calculated under their corresponding proposal distributions. For each j ∈ N_k, we draw M_j samples {x_n^{(j,m)}}_{m=1}^{M_j} from \eta_j^{(i)}(x_n) and assign to each x_n^{(j,m)} an importance weight w_n^{(j,m)} calculated as

w_n^{(j,m)} = \bigl(\eta_j^{(i)}(x_n^{(j,m)})\bigr)^{-1} \prod_{l \in N_k} \bigl(\eta_l^{(i)}(x_n^{(j,m)})\bigr)^{\epsilon_{kl}}. \qquad (21)

We set M_j to be proportional to the Metropolis weight \epsilon_{kj}, i.e., M_j = ⌊M \epsilon_{kj}⌋, where ⌊·⌋ is the floor function and M is the given total number of samples to be drawn (the total number of thus generated samples might be smaller than M due to rounding, but could be manually adjusted back to M by distributing the unused quota to some of the Gaussian mixtures to be fused). After applying the normalization step (15) to {w_n^{(j,m)}}, a Gaussian mixture model of the updated posterior \eta_k^{(i+1)}(x_n) can be learned from the weighted samples {x_n^{(j,m)}, w_n^{(j,m)}} using Algorithm 1.

Algorithm 2: GM Fusion.

1: procedure GMFUSE(GM_k, {GM_j}_{j∈N_k})
2:   initialize M, {ε_{kj}}_{j∈N_k}
3:   for j in N_k do
4:     M_j = ⌊M ε_{kj}⌋
5:     generate {x_j^{(m)}}_{m=1}^{M_j} from GM_j
6:     for m = 1 to M_j do
7:       w_j^{(m)} = GM_j(x_j^{(m)})^{-1} ∏_{l∈N_k} GM_l(x_j^{(m)})^{ε_{kl}}
8:     end for
9:   end for
10:  normalize {w_j^{(m)}}
11:  return GMLEARN({x_j^{(m)}, w_j^{(m)}})
12: end procedure

Here, the proposed approach draws samples from each Gaussian mixture to be fused, so that the drawn samples cover most of the support of the fused density function. Since multiple proposal distributions are used, the proposed approach equivalently samples from a mixture of proposal distributions. However, we do not have to use the whole mixture when calculating the importance weight of each sample. Instead, since we know exactly which proposal distribution in the mixture each sample is drawn from, it is more accurate to use the corresponding proposal distribution alone when calculating the importance weight. Also, since the sampling bias introduced by each proposal distribution is eliminated when we divide the true density by the corresponding importance density, importance weights calculated under different proposal distributions are consistent.

Another contribution of the proposed approach is weighted sample allocation. As we can see, the Gaussian mixtures in (20) do not contribute equally to the product: a Gaussian mixture with a large exponent contributes more to the product and is more influential in local fusion than one with a small exponent. By adjusting the contribution of each Gaussian mixture to the proposal distribution mixture according to its contribution to the product, weighted sample allocation makes the proposal distribution mixture closer to the product, thus improving the efficiency of importance sampling.

The weighted mixture importance sampling approach is summarized in Algorithm 2 for the distributed fusion step.
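To make Algorithm 2 concrete, the following Python sketch (an illustration under simplifying assumptions, not code from the paper) performs the weighted mixture importance sampling of (20) and (21) for one sensor's neighborhood and returns weighted samples; the gm_pdf and gm_sample helpers from the sketch in Section IV-B are reused, and the weighted EM of Algorithm 1 would then learn the updated mixture from the returned samples.

    import numpy as np

    def gm_fuse_samples(neighbor_gms, eps, M):
        """Weighted mixture importance sampling for the fusion step, Eqs. (20)-(21).
        neighbor_gms[j] is the mixture held by neighbor j (with k itself included),
        eps[j] is the Metropolis weight eps_kj, and M is the total sample budget."""
        samples, weights = [], []
        for j, gm_j in neighbor_gms.items():
            Mj = int(np.floor(M * eps[j]))          # weighted sample allocation
            if Mj == 0:
                continue
            x = gm_sample(gm_j, Mj)
            # Eq. (21): product of powers of the neighborhood mixtures,
            # divided by the density of the proposal the sample came from
            log_w = -np.log(gm_pdf(gm_j, x))
            for l, gm_l in neighbor_gms.items():
                log_w = log_w + eps[l] * np.log(gm_pdf(gm_l, x))
            samples.append(x)
            weights.append(np.exp(log_w))
        samples = np.concatenate(samples)
        weights = np.concatenate(weights)
        return samples, weights / weights.sum()     # normalization step (15)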

2) Recovery Step: We implement the recovery step (18) in a similar way via weighted mixture importance sampling. Let GM_k be the fully fused posterior held by S_k, i.e., \eta_k^{(\infty)}(x_n), and GM_{pk} be the prior prediction of the current state by S_k, i.e., f(x_n | y_{1:n-1}). We draw half of the samples from GM_k and the other half from GM_{pk}. For a sample x_n^{(m)} drawn from GM_k, its importance weight is calculated as

w_n^{(m)} = \frac{GM_k(x_n^{(m)})^K}{GM_{pk}(x_n^{(m)})^{K-1} \, GM_k(x_n^{(m)})} = \frac{GM_k(x_n^{(m)})^{K-1}}{GM_{pk}(x_n^{(m)})^{K-1}}; \qquad (22)


Algorithm 3: GM Recovery.

1: procedure GMRECOVER(GM_k, GM_{pk})
2:   initialize M
3:   generate {x^{(m)}}_{m=1}^{M/2} from GM_k
4:   for m = 1 to M/2 do
5:     w^{(m)} = [GM_k(x^{(m)}) / GM_{pk}(x^{(m)})]^{K−1}
6:   end for
7:   generate {x^{(m)}}_{m=M/2+1}^{M} from GM_{pk}
8:   for m = M/2 + 1 to M do
9:     w^{(m)} = [GM_k(x^{(m)}) / GM_{pk}(x^{(m)})]^{K}
10:  end for
11:  normalize {w^{(m)}}_{m=1}^{M}
12:  return GMLEARN({x^{(m)}, w^{(m)}}_{m=1}^{M})
13: end procedure

for a sample x_n^{(m)} drawn from GM_{pk}, its importance weight is calculated as

w_n^{(m)} = \frac{GM_k(x_n^{(m)})^K}{GM_{pk}(x_n^{(m)})^{K-1} \, GM_{pk}(x_n^{(m)})} = \frac{GM_k(x_n^{(m)})^K}{GM_{pk}(x_n^{(m)})^K}. \qquad (23)

A Gaussian mixture model of the recovered global posterior is then learned from the weighted samples {x_n^{(m)}, w_n^{(m)}} using Algorithm 1. Note that we do not apply weighted sample allocation to the recovery step, because negative weights are not well justified for allocation.

The recovery step is summarized in Algorithm 3.

As we can see, the fusion of Gaussian mixtures in Algorithms 2 and 3 depends only on the density function described by each Gaussian mixture and does not depend on how many components each Gaussian mixture has. In other words, the proposed method gives each individual sensor the flexibility to choose an optimal, yet not necessarily uniform, number of components based on its own samples, thus improving approximation accuracy and efficiency. In contrast, most other posterior-based algorithms fuse local posteriors based on their parameters rather than the density functions described by these parameters, and thus put structural constraints on local parametric representations. For example, linear fusion of Gaussian mixtures [17], [18] requires each mixture to have the same number of components. This requirement gives less flexibility to sensors and compromises adaptivity in local signal processing.
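In the same illustrative style, the recovery step (18) and Algorithm 3 can be sketched as below, again reusing gm_pdf and gm_sample from the Section IV-B sketch; the returned weighted samples would be passed to Algorithm 1 to learn the recovered global posterior.

    import numpy as np

    def gm_recover_samples(gm_k, gm_pk, K, M):
        """Recovery step via importance sampling: gm_k is the fully fused posterior
        held by sensor k, gm_pk the prior prediction, K the number of sensors."""
        x1 = gm_sample(gm_k, M // 2)                           # half from GM_k
        w1 = (gm_pdf(gm_k, x1) / gm_pdf(gm_pk, x1)) ** (K - 1)
        x2 = gm_sample(gm_pk, M - M // 2)                      # the rest from GM_pk
        w2 = (gm_pdf(gm_k, x2) / gm_pdf(gm_pk, x2)) ** K
        x = np.concatenate([x1, x2])
        w = np.concatenate([w1, w2])
        return x, w / w.sum()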

D. Summary

We summarize the proposed distributed particle filtering algorithm in Algorithm 4, in which "PF" is short for "particle filtering", and the convergence in fusion is locally determined when the discrepancy among local beliefs is lower than a certain threshold under a chosen metric or no neighbor is still sending data. We do not specify the exact particle filter used for local particle filtering, because any particle filter would fit. Also, each sensor can select its own customized particle filter, thanks to the flexibility given by posterior-based fusion.

Algorithm 4: Distributed Particle Filtering.

1: procedure DPF({x_{n−1,k}^{(m)}, w_{n−1,k}^{(m)}}_{m,k=1}^{M,K}, y_n)
2:   for k = 1 to K do in parallel                ▷ filtering
3:     {x_{n,k}^{(m)}, w_{n,k}^{(m)}}_{m=1}^{M} = PF({x_{n−1,k}^{(m)}, w_{n−1,k}^{(m)}}_{m=1}^{M}, y_{n,k})
4:     GM_{pk} = GMLEARN({g(x_{n−1,k}^{(m)}), w_{n−1,k}^{(m)}}_{m=1}^{M})
5:     GM_k = GMLEARN({x_{n,k}^{(m)}, w_{n,k}^{(m)}}_{m=1}^{M})
6:   end for
7:   repeat                                        ▷ fusion
8:     for k = 1 to K do in parallel
9:       S_k sends GM_k to S_j for all j ∈ N_k
10:    end for
11:    for k = 1 to K do in parallel
12:      GM_k = GMFUSE(GM_k, {GM_j}_{j∈N_k})
13:    end for
14:  until convergence
15:  for k = 1 to K do in parallel                 ▷ recovery
16:    GM_k = GMRECOVER(GM_k, GM_{pk})
17:    generate {x_{n,k}^{(m)}}_{m=1}^{M} from GM_k
18:  end for
19:  return {x_{n,k}^{(m)}, 1/M}_{m,k=1}^{M,K}
20: end procedure

V. PERFORMANCE ANALYSIS

In this section, we investigate the performance of the proposed distributed particle filtering algorithm in terms of convergence, communication overhead, and computational complexity.

A. Convergence of Average Consensus

The proposed distributed particle filtering algorithm is built on an average consensus algorithm. A standard average consensus algorithm is proved to converge under certain conditions in [4], [21], [29], and [38]. However, the proof for standard average consensus does not directly apply to the proposed average consensus algorithm in Section IV-A, because the proposed algorithm has an additional normalization step (15) for each sensor in each iteration and is thus different from standard average consensus. We claim the convergence of the proposed average consensus algorithm in (17) and, for rigorousness, show the convergence below, based on the proof for the convergence of standard average consensus.

Theorem 1: After a sufficiently large number of iterations of average consensus with normalization, the posterior held by each sensor converges in probability to the normalized geometric mean of the initial local posteriors obtained from local particle filters.

Proof: The exponential form of (14) with normalization (15) can be written as

\eta_k^{(i+1)}(x_n) = \gamma_k^{(i+1)} \prod_{j \in N_k} \bigl(\eta_j^{(i)}(x_n)\bigr)^{\epsilon_{kj}}, \qquad (24)

where \gamma_k^{(i+1)} is a constant coefficient that normalizes \eta_k^{(i+1)}(x_n) so that it integrates to one. Each consensus iteration involves such a constant coefficient for each sensor, and the constant coefficient accumulates across iterations.

We denote the part of \eta_k^{(i)} that comes purely from the fusion of the original posteriors obtained from local particle filters (in other words, \eta_k^{(0)}) as p_k^{(i)}(x_n), and the accumulated constant coefficient that \eta_k^{(i)} has collected up to the ith iteration as \lambda_k^{(i)}. Then, we have \eta_k^{(i)}(x_n) = \lambda_k^{(i)} p_k^{(i)}(x_n). When i = 0, we have \lambda_k^{(0)} = 1 and p_k^{(0)}(x_n) = f(x_n | y_{n,k}, y_{1:n-1}); when i ≥ 1, we have

\eta_k^{(i+1)}(x_n) = \gamma_k^{(i+1)} \prod_{j \in N_k} \bigl(\lambda_j^{(i)} p_j^{(i)}(x_n)\bigr)^{\epsilon_{kj}} = \underbrace{\gamma_k^{(i+1)} \prod_{j \in N_k} \bigl(\lambda_j^{(i)}\bigr)^{\epsilon_{kj}}}_{\lambda_k^{(i+1)}} \times \underbrace{\prod_{j \in N_k} \bigl(p_j^{(i)}(x_n)\bigr)^{\epsilon_{kj}}}_{p_k^{(i+1)}(x_n)}. \qquad (25)

The logarithmic form of the last term p_k^{(i+1)}(x_n) is

\log p_k^{(i+1)}(x_n) = \sum_{j \in N_k} \epsilon_{kj} \log p_j^{(i)}(x_n) = \log p_k^{(i)}(x_n) + \sum_{j \in N_k} \epsilon_{kj} \bigl(\log p_j^{(i)}(x_n) - \log p_k^{(i)}(x_n)\bigr), \qquad (26)

which coincides with the canonical form of (weighted) average consensus. With the underlying graph G being connected and not bipartite, according to [4], [21], [29], and [38], we have the following convergence in probability:

\lim_{i \to \infty} \log p_k^{(i)}(x_n) = \frac{1}{K} \sum_{k=1}^{K} \log p_k^{(0)}(x_n), \qquad (27)

or equivalently,

\lim_{i \to \infty} p_k^{(i)}(x_n) = \prod_{k=1}^{K} \bigl(p_k^{(0)}(x_n)\bigr)^{\frac{1}{K}} = \prod_{k=1}^{K} f(x_n \mid y_{n,k}, y_{1:n-1})^{\frac{1}{K}}. \qquad (28)

Hence,

\lim_{i \to \infty} \eta_k^{(i)}(x_n) = \lim_{i \to \infty} \lambda_k^{(i)} p_k^{(i)}(x_n) = \lim_{i \to \infty} \lambda_k^{(i)} \; \lim_{i \to \infty} p_k^{(i)}(x_n) = \Bigl(\lim_{i \to \infty} \lambda_k^{(i)}\Bigr) \prod_{k=1}^{K} f(x_n \mid y_{n,k}, y_{1:n-1})^{\frac{1}{K}}, \qquad (29)

where \lim_{i \to \infty} \eta_k^{(i)}(x_n) is the posterior held by S_k at convergence, \prod_{k=1}^{K} f(x_n \mid y_{n,k}, y_{1:n-1})^{1/K} is the geometric mean of the initial local posteriors, and \lim_{i \to \infty} \lambda_k^{(i)} normalizes the geometric mean so that it exists as a valid probability density function in the form of \lim_{i \to \infty} \eta_k^{(i)}(x_n). ∎

Following the convergence, the global posterior can be obtained separately by each sensor through a recovery step, which results from substituting (29) into (11).

Note that the proof above is for the average consensus algorithm, instead of the distributed particle filtering algorithm, and thus does not involve Gaussian mixture approximations or Monte Carlo approximations. We discuss the convergence of the distributed particle filtering algorithm, with approximations involved, in the following subsection.

B. Convergence of Distributed Particle Filtering

The proposed distributed particle filtering algorithm implements the proposed average consensus algorithm with approximations and asymptotically converges under the following three assumptions: (i) the number of consensus iterations is sufficiently large, (ii) the number of generated samples is sufficiently large, and (iii) the approximation error of a Gaussian mixture model is sufficiently small. In practice, however, none of these can be perfectly satisfied without considerable communication or computation. Hence, convergence errors are usually inevitable. Due to independent randomness, different sensors are likely to have different convergence errors, thus resulting in consensus errors. Although the proposed algorithm does not require exact consensus as weight-based algorithms do, inexact consensus, if too significant, leads to errors in both filtering and fusion in upcoming time steps.

Consensus errors can be manually eliminated by additional average consensus on the parameters of the obtained Gaussian mixture models. To promote the convergence of the average consensus, we match similar components between different Gaussian mixtures based on the Kullback-Leibler (KL) distance, and perform local parameter averaging among the matched components. As mentioned in the Introduction, parameter-based average consensus is not justified by the underlying statistical model and thus is suboptimal in the fusion of local posteriors. However, its suboptimality is not problematic here, because the method is used not for fusion but for numerical fine-tuning of beliefs that are already close to consensus. Also, because of the closeness to consensus, it is not expected to take many consensus iterations.

Note that parameter-based average consensus requires that all the Gaussian mixtures to be fused have the same number of components. To satisfy the constraint, we have to adjust the number of components for each Gaussian mixture in case they do not agree. We achieve this via sampling. More specifically, we first sample from each Gaussian mixture and then learn from the samples a Gaussian mixture model with a specified uniform number of components.

C. Communication Overhead

In the proposed algorithm, posteriors are transmitted between sensors in the form of Gaussian mixtures. Let C be the average number of components in these Gaussian mixtures; then we need to transmit C(d^2 + d + 1) numbers per Gaussian mixture, with d being the state dimension. Since covariance matrices are symmetric, we only need to transmit (d^2 + d)/2, instead of d^2, numbers for each covariance matrix in a Gaussian mixture. Also, since component weights sum to one, we only need to transmit C − 1, instead of C, component weights. Thus, the actual count of numbers needed to represent a Gaussian mixture is Cd^2/2 + (C/2 + 1)d + C − 1. In a consensus iteration, each communication link is used once in each direction, so the total number of Gaussian mixtures transmitted in a consensus iteration is 2|E|. Let L be the number of consensus iterations; then the proposed algorithm communicates 2|E|L[Cd^2/2 + (C/2 + 1)d + C − 1] numbers in total and 2|E|L[Cd^2/2 + (C/2 + 1)d + C − 1]/K numbers per sensor during each time step. Since |E| ranges from O(K) to O(K^2) for a connected graph, the communication complexity per sensor is between O(LCd^2) and O(KLCd^2).
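The count above is straightforward to script; the following sketch (illustrative only, using the formula exactly as stated in the text) evaluates the per-time-step communication cost for a given number of components C, state dimension d, edge count |E|, iteration count L, and network size K.

    def gm_size(C, d):
        """Numbers needed to represent one Gaussian mixture, as stated in the text."""
        return C * d**2 / 2 + (C / 2 + 1) * d + C - 1

    def comm_cost_per_step(C, d, num_edges, L, K):
        total = 2 * num_edges * L * gm_size(C, d)
        return total, total / K          # network-wide and per-sensor counts

    # Example: C = 3 components, d = 6 states, 40 links, 20 iterations, 20 sensors
    print(comm_cost_per_step(3, 6, 40, 20, 20))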

In comparison, the count of numbers transmitted by each sensor in a weight-based algorithm is proportional to the number of particles, which is proved to grow exponentially with the state dimension d for a successful particle filter [39]. The communication complexity of a likelihood-based algorithm is combinatorial in the state dimension, because the number of regression coefficients needed for the polynomial approximation presented in [11] is combinatorial in the state dimension. The communication complexity of a posterior-based algorithm with Gaussian approximations is quadratic in the state dimension, with d and (d^2 + d)/2 numbers to represent the mean and covariance matrix, respectively, of a Gaussian distribution.

It is hard to directly compare the communication complexity of the proposed algorithm with those of other algorithms, because the dependence of L and C on d is mostly problem-specific. In Section VI-E, we compare the actual communication costs of different algorithms through numerical examples.

D. Computational Complexity

We now investigate the computational complexity of distributed particle filtering algorithms, focusing on the fusion part with the filtering part excluded, because the latter has almost the same complexity among different algorithms. Since a distributed particle filtering algorithm runs at each sensor in parallel, we only consider the computation performed at a single sensor.

The proposed algorithm calls Algorithms 1, 2, and 3. Algorithm 1 (GM learning) costs O(L_g M_g C d^2) to learn a Gaussian mixture of C components from M_g samples in L_g iterations. Algorithm 2 (GM fusion) calls Algorithm 1 once and costs O(K M_f C d^2) in addition to calling Algorithm 1, where K means that a sensor has at most O(K) neighbors and M_f is the sample size for importance sampling in distributed fusion. Algorithm 3 (GM recovery) also calls Algorithm 1 once and costs O(M_r C d^2) in addition to calling Algorithm 1, with M_r being the sample size for importance sampling in recovery. In addition to Algorithms 1, 2, and 3, the proposed algorithm calls the parameter-based average consensus algorithm for numerical fine-tuning, which costs O(K C d^2) in each iteration. In summary, if we assume that the proposed algorithm takes L_f iterations for distributed fusion and L_p iterations for fine-tuning, then the overall computational complexity is O((L_f L_g M_g + K L_f M_f + M_r + K L_p) C d^2). Assuming that M_g, M_f, and M_r are all O(M), the complexity simplifies to O([(L_g + K) L_f M + K L_p] C d^2).

In comparison, a likelihood-based algorithm [11] costs O(R^3 + (M + q)R^2 + (d + q)MR) on polynomial approximation (M is the sample size, q is the dimension of the state function appearing in factorization, and R is the dimension of the polynomial basis expansion) and O(LKR) on consensus (L is the number of consensus iterations). Thus, the overall complexity is O(R^3 + (M + q)R^2 + [(d + q)M + LK]R). Since R itself is a combinatorial function of the state dimension d, the cubic function of R might make the algorithm scale poorly in high-dimensional systems. A weight-based algorithm [9] only needs to perform average consensus on weights, with a computational complexity of O(LKM). A Gaussian posterior-based algorithm costs O(LKd^3). Generally, the proposed algorithm and the likelihood-based algorithm require more computation than the weight-based algorithm and the Gaussian posterior-based algorithm. The former two algorithms use a certain compact representation for inter-sensor communication and thus need to enclose information in and read information out of the representation. Such a representation incurs much lower communication overhead than particles, and provides a more accurate approximation than a single Gaussian distribution, as shown in Section VI.

VI. NUMERICAL EXAMPLES

In this section, we demonstrate the performance of the proposed distributed particle filtering algorithm in comparison with weight-based, likelihood-based, and other posterior-based algorithms, through numerical examples of distributed target tracking.

A. General Settings

We considered a wireless sensor network consisting of 20 sensors programmed to track a moving target.

The target followed a Wiener process acceleration model [40] in two-dimensional space. The target state consisted of the position, velocity, and acceleration of the target in each dimension:

x_n = \begin{bmatrix} x_{n,1} & x_{n,2} & \dot{x}_{n,1} & \dot{x}_{n,2} & \ddot{x}_{n,1} & \ddot{x}_{n,2} \end{bmatrix}^T. \qquad (30)

The state transition function was

g(x_n) = D \, x_n, \qquad (31)

where

D = \begin{bmatrix} 1 & 0 & t & 0 & \frac{1}{2}t^2 & 0 \\ 0 & 1 & 0 & t & 0 & \frac{1}{2}t^2 \\ 0 & 0 & 1 & 0 & t & 0 \\ 0 & 0 & 0 & 1 & 0 & t \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \qquad (32)

with t being the state transition interval. The state transition noise u_n followed a multivariate Gaussian distribution N(0, R), where

R = \sigma_u^2 \begin{bmatrix} \frac{1}{20}t^5 & 0 & \frac{1}{8}t^4 & 0 & \frac{1}{6}t^3 & 0 \\ 0 & \frac{1}{20}t^5 & 0 & \frac{1}{8}t^4 & 0 & \frac{1}{6}t^3 \\ \frac{1}{8}t^4 & 0 & \frac{1}{3}t^3 & 0 & \frac{1}{2}t^2 & 0 \\ 0 & \frac{1}{8}t^4 & 0 & \frac{1}{3}t^3 & 0 & \frac{1}{2}t^2 \\ \frac{1}{6}t^3 & 0 & \frac{1}{2}t^2 & 0 & t & 0 \\ 0 & \frac{1}{6}t^3 & 0 & \frac{1}{2}t^2 & 0 & t \end{bmatrix}. \qquad (33)

Fig. 1. Wireless sensor network, communication links, and target trajectory.

We assumed that the target traveled over 30 unit-length state transition intervals.
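For reference, the transition matrix D of (32) and the noise covariance R of (33) can be assembled as below; this is a plain NumPy sketch of the stated model, with t = 1 and sigma_u = 0.5 taken from the settings in this section.

    import numpy as np

    t, sigma_u = 1.0, 0.5
    I2, Z2 = np.eye(2), np.zeros((2, 2))

    # D in Eq. (32): blocks over (position, velocity, acceleration)
    D = np.block([[I2, t * I2, 0.5 * t**2 * I2],
                  [Z2, I2, t * I2],
                  [Z2, Z2, I2]])

    # R in Eq. (33): Wiener process acceleration noise covariance
    R = sigma_u**2 * np.block([
        [t**5 / 20 * I2, t**4 / 8 * I2, t**3 / 6 * I2],
        [t**4 / 8 * I2,  t**3 / 3 * I2, t**2 / 2 * I2],
        [t**3 / 6 * I2,  t**2 / 2 * I2, t * I2]])

    def g(x):
        """State transition function of Eq. (31)."""
        return D @ x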

Each sensor measured the range and range rate (Doppler) of the target. The kth sensor, S_k, was located at l_k = (l_{k,1}, l_{k,2}) with the observation function

h_k(x_n) = \begin{bmatrix} h_{k,\mathrm{range}}(x_n) & h_{k,\mathrm{doppler}}(x_n) \end{bmatrix}^T, \qquad (34)

where

h_{k,\mathrm{range}}(x_n) = \sqrt{(x_{n,1} - l_{k,1})^2 + (x_{n,2} - l_{k,2})^2} \qquad (35)

and

h_{k,\mathrm{doppler}}(x_n) = \frac{\dot{x}_{n,1}(x_{n,1} - l_{k,1}) + \dot{x}_{n,2}(x_{n,2} - l_{k,2})}{\sqrt{(x_{n,1} - l_{k,1})^2 + (x_{n,2} - l_{k,2})^2}}, \qquad (36)

and the observation noise

v_{n,k} \sim \mathcal{N}\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_v^2 & 0 \\ 0 & \sigma_w^2 \end{bmatrix}\right). \qquad (37)
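A small NumPy sketch of the range/Doppler observation function (34)-(36), assuming the state ordering of (30), is:

    import numpy as np

    def h_k(x, sensor_loc):
        """Range and range-rate (Doppler) observation, Eqs. (34)-(36).
        x = [pos1, pos2, vel1, vel2, acc1, acc2]; sensor_loc = (l_k1, l_k2)."""
        dx, dy = x[0] - sensor_loc[0], x[1] - sensor_loc[1]
        rng = np.sqrt(dx**2 + dy**2)
        doppler = (x[2] * dx + x[3] * dy) / rng
        return np.array([rng, doppler])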

We set σ_u as 0.5, σ_v as 1, and σ_w as 1. We set the neighborhood radius threshold ρ according to the sensor locations to ensure that the network was connected. We set the initial target state x_0 as 0, and assumed N(x_0, R) as the prior information available to each sensor. Fig. 1 shows a wireless sensor network and a realization of the target trajectory under random noise. In a predefined fashion, we assumed three components for any Gaussian mixture. We used the sampling importance resampling (SIR) particle filter [1] for local particle filtering. We used 20,000 samples for importance sampling in both distributed fusion and recovery of the proposed algorithm.

We compared the proposed algorithm ("Optimal GM") with other posterior-based algorithms, including the Bayesian fusion of Gaussian approximations ("Bayesian Gauss") [13] and the linear fusion of Gaussian mixtures ("Linear GM") [18]. We also compared it with a representative weight-based algorithm [9], a likelihood-based algorithm [11], and the distributed unscented Kalman filter (UKF) [41], which can also be considered a posterior-based algorithm, although it does not involve particle filtering. Moreover, we compared it with centralized particle filtering, which served as a benchmark. We tested all the algorithms on repeated experiments and compared their average performance.

B. Metrics

We considered the posterior mean as a point estimate of each state and used the root-mean-square error (RMSE) to quantify the performance. For a single state x_n, the RMSE of an estimate \hat{x}_n was defined as ||\hat{x}_n − x_n||, namely the l-2 norm of \hat{x}_n − x_n; for a state sequence of length T, i.e., {x_n}_{n=1}^{T}, the average RMSE (ARMSE) of a sequence estimate, {\hat{x}_n}_{n=1}^{T}, was defined as \sqrt{\frac{1}{T} \sum_{n=1}^{T} ||\hat{x}_n − x_n||^2}. In a network that performs distributed filtering, each sensor holds a separate global estimate and thus has its own RMSE and ARMSE. We used their averages to quantify the performance of the whole network. We used the Kullback-Leibler (KL) distance [42] to describe the dissimilarity between two Gaussian mixtures. Since it is analytically intractable to compute the KL distance between two Gaussian mixtures, we approximated it using the first Gaussian approximation approach introduced in [43].
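Both error metrics are easy to compute; a minimal sketch (illustrative, with x_est and x_true as arrays of shape (T, d)):

    import numpy as np

    def rmse(x_est, x_true):
        """Per-state RMSE: the l-2 norm of the estimation error."""
        return np.linalg.norm(x_est - x_true, axis=-1)

    def armse(x_est, x_true):
        """Average RMSE over a state sequence of length T."""
        return np.sqrt(np.mean(rmse(x_est, x_true) ** 2))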

C. Accuracy

We tested all the methods on repeated experiments to compare their trajectory estimation accuracy as an ensemble average. Fig. 2 compares the ARMSEs as a function of the number of particles, under sufficient consensus iterations. We can see that the error of the proposed method varied the most with the number of particles. With 2000 particles, its error was the second highest; with no less than 8000 particles, its error was close to that of centralized particle filtering and no higher than that of any other method. The performance of the proposed method varied significantly because the approximation accuracy of a Gaussian mixture is strongly affected by the number of particles used in local particle filtering. In contrast, the error of Bayesian Gauss, also a posterior-based method, stayed almost constant across different numbers of particles, because the accuracy of a Gaussian approximation, consisting of a mean and a covariance matrix only, is relatively robust to the number of particles used to represent a local posterior. The error of Linear GM, another posterior-based method, did not vary much with the number of particles either, because Linear GM failed to benefit from the increased number of particles due to its unjustified fusion rule. The errors of both the likelihood-based and weight-based methods dropped as the number of particles increased. Their errors were lower than that of the proposed method when the number of particles was small, and comparable to that of the proposed method when the number of particles was either medium or large. In summary, the proposed method was the most accurate among all the posterior-based methods, and competitive with the likelihood-based and weight-based methods, when the number of particles was not too small.

Fig. 2. A comparison in the trajectory estimation ARMSE as a function of the number of particles.

Fig. 3. A comparison in the state estimation RMSE as a function of time.

We also investigated the state estimation accuracy of each method. For each method, we used the number of particles corresponding to its elbow point in Fig. 2, i.e., 10,000 particles for the proposed method, 2000 for Bayesian Gauss, 4000 for Linear GM, 10,000 for the weight-based method, 8000 for the likelihood-based method, and 6000 for centralized particle filtering. In Fig. 3, we show the state estimation RMSE of each method as a function of time along the trajectory of the target. We can see that the proposed method, the likelihood-based method, the weight-based method, and centralized particle filtering had state estimation errors at almost the same level, while Linear GM, Bayesian Gauss, and distributed UKF suffered from high errors at many time points. Among all the methods, Linear GM clearly yielded the highest errors.

Fig. 4. KL distance and state estimation RMSE across iterations during the 10th time step using the proposed method.

D. Consensus

We investigated the consensus process of the proposed method within a single time step. Fig. 4 shows the consensus process during the 10th time step as an example, in which we applied the proposed method, with 10,000 particles for local filtering, 20 iterations for average consensus, and 20 iterations for numerical fine-tuning, to the example in Fig. 1. We can see that in both average consensus and numerical fine-tuning, both the KL distance and the RMSE dropped and converged as the algorithm proceeded, which demonstrated the validity of the proposed average consensus algorithm in terms of convergence. Note that the metrics in Fig. 4 were computed based on unrecovered beliefs for average consensus and recovered beliefs for numerical fine-tuning, so they came in different scales.

E. Communication Overhead

We investigated the communication overhead of each method and the relationship between communication overhead and estimation accuracy. For each method, we fixed the number of particles at its elbow point, as specified in Section VI-C, and investigated its performance, as an ensemble average over repeated experiments, under varied numbers of consensus iterations.

In Fig. 5, we demonstrate the effect of the number of consensus iterations on the performance of a distributed filtering algorithm. As we can see, the error of each method dropped as the number of iterations increased and stayed constant beyond a certain threshold.

In Fig. 6, we show the trajectory estimation ARMSE of each method as a function of the communication overhead per time step. We used the count of numbers transmitted between sensors in the network to quantify the communication overhead of each method. As expected, there was a trade-off between estimation accuracy and communication efficiency.

Fig. 5. Trajectory estimation ARMSE as a function of the number of consensus iterations.

Fig. 6. Trajectory estimation ARMSE as a function of the communication cost per time step.

For each method, the estimation error dropped as the communication overhead increased, but stayed almost constant beyond a certain threshold. The proposed method, the weight-based method, and the likelihood-based method had errors at the same level but communication costs of different orders of magnitude. The weight-based method, which communicated non-parametric representations, transmitted significantly more numbers than the proposed method and the likelihood-based method, both of which communicated parametric representations. The likelihood-based method, which used polynomial approximations, transmitted more numbers than the proposed method, which used Gaussian mixture approximations. Bayesian Gauss and distributed UKF, both posterior-based methods, had errors at the same level, higher than that of the proposed method, due to the insufficient accuracy of Gaussian approximations. The communication overhead of distributed UKF was close to that of the proposed method, while that of Bayesian Gauss was lower than that of any other method in Fig. 6. Note that the trade-off between estimation accuracy and communication efficiency existed not only within each method, but also between different methods. As we can see, the proposed method was more accurate than Bayesian Gauss, benefiting from the upgrade from Gaussian approximations to Gaussian mixture approximations, but in the meantime incurred extra communication overhead due to the upgrade. Given the significant improvement in accuracy, we claim that the extra communication incurred by the Gaussian mixture model used in the proposed method was justified.

Fig. 7. Trajectory estimation ARMSE and average degree as functions of the local communication radius.
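As a rough illustration of how such counts can be tallied, the sketch below compares the number of real values per message for a parametric Gaussian-mixture representation against a non-parametric particle representation, and scales them by the number of links and consensus iterations. The per-message formulas (weights, means, and the upper triangle of each covariance for the mixture; each particle plus its weight for the particle set) and the numeric settings are our own assumptions, not the exact accounting used for Fig. 6.

```python
def gm_message_size(num_components, dim):
    """Real numbers per Gaussian-mixture message: one weight, a mean of
    length dim, and the upper triangle of a symmetric covariance per component."""
    per_component = 1 + dim + dim * (dim + 1) // 2
    return num_components * per_component

def particle_message_size(num_particles, dim):
    """Real numbers per non-parametric message: each particle plus its weight."""
    return num_particles * (dim + 1)

def cost_per_time_step(message_size, num_links, num_iterations):
    """Total numbers transmitted per filtering time step, assuming one
    message per directed link per consensus iteration."""
    return message_size * num_links * num_iterations

# Illustrative comparison for a 4-D state, 40 directed links, 20 iterations.
print(cost_per_time_step(gm_message_size(5, 4), 40, 20))            # Gaussian-mixture messages
print(cost_per_time_step(particle_message_size(10000, 4), 40, 20))  # particle-set messages
```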

F. Local Communication Radius

The local communication radius determines the number of neighbors of each sensor. Fig. 7 shows the effect of the radius on the performance of the distributed particle filtering methods, with both the number of particles and the number of consensus iterations fixed at the respective elbow points of each method. The simulations were conducted on the network in Fig. 1, whose default radius was 48. As we can see in Fig. 7, as the radius increased below the default radius, the errors of distributed UKF and the weight-based method decreased dramatically, while those of the proposed method, Bayesian Gauss, and the likelihood-based method decreased slightly; as the radius increased above the default radius, the error of each method either stayed constant or decreased slightly. In fact, the radius controls the rate of consensus. When the radius is small, it may take many iterations of communication for information to travel from one sensor to another in the network; when the radius is sufficiently large, a sensor can communicate directly with any other sensor in the network, and the network becomes effectively centralized. When the number of consensus iterations is fixed, the radius effectively controls the progress of consensus. Thus, once the radius is large enough for the network to reach consensus within the given number of consensus iterations, further increasing the radius does not help much, as shown in Fig. 7. Also, a large radius adds to the difficulty of wireless communication, and thus might not always be desirable in distributed fusion.
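To make the role of the radius concrete, the following sketch models the communication graph as a disc graph over sensor positions and reports the average degree as the radius grows. The random sensor layout and the radii used here are illustrative assumptions and do not reproduce the network of Fig. 1.

```python
import numpy as np

def average_degree(positions, radius):
    """Average number of neighbors when two sensors are linked whenever
    their Euclidean distance is at most the given radius."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adjacency = (dist <= radius) & (dist > 0)   # exclude self-loops
    return adjacency.sum(axis=1).mean()

# Illustrative layout: 25 sensors placed uniformly at random in a 100 x 100 area.
rng = np.random.default_rng(1)
positions = rng.uniform(0, 100, size=(25, 2))
for r in (20, 40, 60, 80):
    print(r, average_degree(positions, r))
```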

VII. CONCLUSION

In this paper, we proposed a distributed particle filtering algorithm based on the optimal fusion of local posteriors approximated as Gaussian mixtures. We implemented the optimal fusion rule in a distributed fashion via an average consensus algorithm. We derived a distributed fusion rule for the consensus algorithm and performed the fusion of Gaussian mixtures via a proposed variant of importance sampling. With an extra normalization step involved in the distributed fusion rule, the convergence of the proposed average consensus algorithm does not follow directly from that of a standard average consensus algorithm. We therefore proved the convergence of the proposed average consensus algorithm and then validated it with numerical examples. We also demonstrated through numerical examples that the proposed distributed particle filtering algorithm is more accurate than other posterior-based algorithms, and approaches centralized particle filtering when sufficient particles are used for local filtering. We compared communication costs and showed that the proposed algorithm incurs a lower communication cost than the weight-based and likelihood-based algorithms, thanks to the compact representation of the Gaussian mixture model, and that the extra communication cost incurred by using a Gaussian mixture model instead of a Gaussian approximation is justified by the improvement in accuracy.

The advantages of the proposed distributed particle filtering algorithm extend beyond accuracy and communication efficiency. As a posterior-based algorithm, it allows diverse sensing modalities and filtering tools to be exploited by the network; by performing importance sampling in fusion, it does not require uniformity in local approximations but allows each sensor to approximate its local belief as a Gaussian mixture with an adaptively determined number of mixture components based on its own data. Although adaptive Gaussian mixture learning is not a focus of this paper, we will explore efficient solutions in future work. We will also study analytical approximations to the product of powers of Gaussian mixtures, and explore gossip algorithms for the fusion of posteriors in distributed particle filtering.


Jichuan Li received the B.Sc. degree in electrical engineering from Fudan University, Shanghai, China, in 2011, and the M.Sc. and Ph.D. degrees in electrical engineering from Washington University in St. Louis, St. Louis, MO, USA, in 2014 and 2016, respectively, under the guidance of Dr. Arye Nehorai. He is currently a Software Engineer at Google, Inc., Mountain View, CA, USA. His research interests include statistical signal processing, Bayesian inference, Monte Carlo methods, machine learning, and distributed computing.

Arye Nehorai (S’80–M’83–SM’90–F’94–LF’17) received the B.Sc. and M.Sc. degrees from the Technion, Israel, and the Ph.D. degree from Stanford University, Stanford, CA, USA. He is the Eugene and Martha Lohman Professor of Electrical Engineering in the Preston M. Green Department of Electrical and Systems Engineering (ESE), Washington University in St. Louis (WUSTL), St. Louis, MO, USA. He served as Chair of this department from 2006 to 2016. Under his leadership, the undergraduate enrollment has more than tripled and the master’s enrollment has grown sevenfold. He is also a Professor in the Division of Biology and Biomedical Sciences, the Division of Biostatistics, the Department of Biomedical Engineering, and the Department of Computer Science and Engineering, and Director of the Center for Sensor Signal and Information Processing at WUSTL. Prior to serving at WUSTL, he was a faculty member at Yale University and the University of Illinois at Chicago.

He was the Editor-in-Chief of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2000 to 2002. From 2003 to 2005, he was the Vice President (Publications) of the IEEE Signal Processing Society (SPS), the Chair of the Publications Board, and a member of the Executive Committee of this Society. He was the founding Editor of the special columns on Leadership Reflections in the IEEE SIGNAL PROCESSING MAGAZINE from 2003 to 2006.

He received the 2006 IEEE SPS Technical Achievement Award and the 2010 IEEE SPS Meritorious Service Award. He was elected Distinguished Lecturer of the IEEE SPS for a term lasting from 2004 to 2005. He received several best paper awards in IEEE journals and conferences. In 2001, he was named University Scholar of the University of Illinois. He was the Principal Investigator of the Multidisciplinary University Research Initiative project titled Adaptive Waveform Diversity for Full Spectral Dominance from 2005 to 2010. He has been a Fellow of the Royal Statistical Society since 1996 and a Fellow of AAAS since 2012.

