
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 9, SEPTEMBER 2011

Performance Bounds for Expander-Based Compressed Sensing in Poisson Noise

Maxim Raginsky, Member, IEEE, Sina Jafarpour, Zachary T. Harmany, Student Member, IEEE, Roummel F. Marcia, Member, IEEE, Rebecca M. Willett, Member, IEEE, and Robert Calderbank, Fellow, IEEE

Abstract—This paper provides performance bounds for compressed sensing in the presence of Poisson noise using expander graphs. The Poisson noise model is appropriate for a variety of applications, including low-light imaging and digital streaming, where the signal-independent and/or bounded noise models used in the compressed sensing literature are no longer applicable. In this paper, we develop a novel sensing paradigm based on expander graphs and propose a maximum a posteriori (MAP) algorithm for recovering sparse or compressible signals from Poisson observations. The geometry of the expander graphs and the positivity of the corresponding sensing matrices play a crucial role in establishing the bounds on the signal reconstruction error of the proposed algorithm. We support our results with experimental demonstrations of reconstructing average packet arrival rates and instantaneous packet counts at a router in a communication network, where the arrivals of packets in each flow follow a Poisson process.

Index Terms—Compressive measurement, expander graphs, packet counters, photon-limited imaging, RIP-1.

I. INTRODUCTION

THE goal of compressive sampling or compressed sensing (CS) [1], [2] is to replace conventional sampling by a more efficient data acquisition framework, which generally requires fewer sensing resources. This paradigm is particularly enticing whenever the measurement process is costly or constrained in some sense. For example, in the context of photon-limited applications (such as low-light imaging), the photomultiplier tubes used within sensor arrays are physically large and expensive. Similarly, when measuring network traffic flows, the high-speed memory used in packet counters is cost-prohibitive. These problems appear ripe for the application of CS.

Manuscript received July 14, 2010; revised February 04, 2011; accepted April 28, 2011. Date of publication May 27, 2011; date of current version August 10, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Arie Yeredor. The work of M. Raginsky, Z. T. Harmany, and R. M. Willett was supported by NSF CAREER Award CCF-06-43947, DARPA Grant HR0011-07-1-003, and NSF Grant DMS-08-11062. The work of R. Calderbank and S. Jafarpour was supported in part by the NSF under Grant DMS 0914892, by the ONR under Grant N00014-08-1-1110, and by the AFOSR under Grant FA9550-09-1-0551. The work of R. F. Marcia was supported by NSF Grant DMS-08-11062.

M. Raginsky, Z. T. Harmany, and R. M. Willett are with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA (e-mail: [email protected]; [email protected]; [email protected]).

S. Jafarpour is with the Department of Computer Science, Princeton University, Princeton, NJ 08540 USA (e-mail: [email protected]).

R. F. Marcia is with the School of Natural Sciences, University of California, Merced, CA 95343 USA (e-mail: [email protected]).

R. Calderbank is with the Departments of Computer Science, Electrical Engineering, and Mathematics, Duke University, Durham, NC 27708 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2011.2157913

However, photon-limited measurements [3] and arrivals/departures of packets at a router [4] are commonly modeled with a Poisson probability distribution, posing significant theoretical and practical challenges in the context of CS. One of the key challenges is the fact that the measurement error variance scales with the true intensity of each measurement, so that we cannot assume constant noise variance across the collection of measurements. Furthermore, the measurements, the underlying true intensities, and the system models are all subject to certain physical constraints, which play a significant role in performance.

Recent works [5]–[8] explore methods for CS reconstruction in the presence of impulsive, sparse, or exponential-family noise, but do not account for the physical constraints associated with a typical Poisson setup and do not contain the related performance bounds emphasized in this paper. In previous work [9], [10], we showed that a Poisson noise model combined with conventional dense CS sensing matrices (properly scaled) yielded performance bounds that were somewhat sobering relative to bounds typically found in the literature. In particular, we found that if the number of photons (or packets) available to sense were held constant, and if the number of measurements, $m$, was above some critical threshold, then larger $m$ in general led to larger bounds on the error between the true and the estimated signals. This can intuitively be understood as resulting from the fact that dense CS measurements in the Poisson case cannot be zero-mean, and the DC offset used to ensure physical feasibility adversely impacts the noise variance.

The approach considered in this paper hinges, like most CS methods, on reconstructing a signal from compressive measurements by optimizing a sparsity-regularized goodness-of-fit objective function. In contrast to many CS approaches, however, we measure the fit of an estimate to the data using the Poisson log-likelihood instead of a squared error term. This paper demonstrates that the bounds developed in previous work can be improved for some sparsity models by considering alternatives to dense sensing matrices with random entries. In particular, we show that deterministic sensing matrices given by scaled adjacency matrices of expander graphs have important theoretical characteristics (especially an $\ell_1$ version of the restricted isometry property [11]) that are ideally suited to controlling the performance of Poisson CS.

Formally, suppose we have a signal $f^* \in \mathbb{R}_+^n$ with known $\ell_1$ norm $\|f^*\|_1$ (or a known upper bound on $\|f^*\|_1$). We aim to find a sensing matrix $A \in \mathbb{R}_+^{m \times n}$ with $m$, the number of measurements, as small as possible, so that $f^*$ can be recovered efficiently from the measured vector $y \in \mathbb{Z}_+^m$, which is related to $f^*$ through a Poisson observation model.


The restriction that elements of $A$ be nonnegative reflects the physical limitations of many sensing systems of interest (e.g., packet routers and counters or linear optical systems). The original CS approach employed dense random matrices [11], [12]. It has been shown that if the matrix $A$ acts nearly isometrically on the set of all $k$-sparse signals, thus obeying what is now referred to as the Restricted Isometry Property with respect to the $\ell_2$ norm (RIP-2) [11], then the recovery of $f^*$ from $y$ is indeed possible. It has also been shown that dense random matrices constructed from Gaussian, Bernoulli, or partial Fourier ensembles satisfy the required RIP-2 property with high probability [11].

Adjacency matrices of expander graphs [13] have recently been proposed as an alternative to dense random matrices within the compressed sensing framework, leading to computationally efficient recovery algorithms [14]–[16]. It has been shown that variations of the standard recovery approaches such as basis pursuit [2] and matching pursuit [17] are consistent with the expander sensing approach and can successfully recover the original sparse signal [18], [19]. In the presence of Gaussian or sparse noise, random dense sensing and expander sensing are known to provide similar performance in terms of the number of measurements and recovery computation time. Berinde et al. proved that expander graphs with sufficiently large expansion are near-isometries on the set of all $k$-sparse signals in the $\ell_1$ norm; this is referred to as a Restricted Isometry Property for the $\ell_1$ norm (RIP-1) [18]. Furthermore, expander sensing requires less storage whenever the signal is sparse in the canonical basis, while random dense sensing provides slightly tighter recovery bounds [16].

The approach described in this paper consists of the following key elements:

• expander sensing matrices and the RIP-1 associated with them;

• a reconstruction objective function that explicitly incorporates the Poisson likelihood;

• a countable collection of candidate estimators; and

• a penalty function defined over the collection of candidates, which satisfies the Kraft inequality and which can be used to promote sparse reconstructions.

In general, the penalty function is selected to be small for signals of interest, which leads to theoretical guarantees that errors are small with high probability for such signals. In this paper, exploiting the RIP-1 property and the non-negativity of the expander-based sensing matrices, we show that, in contrast to random dense sensing, expander sensing empowered with a maximum a posteriori (MAP) algorithm can approximately recover the original signal in the presence of Poisson noise, and we prove bounds which quantify the MAP performance. As a result, in the presence of Poisson noise, expander graphs not only provide general storage advantages, but they also allow for efficient MAP recovery methods with performance guarantees comparable to the best $k$-term approximation of the original signal. Finally, the bounds are tighter than those for the specific dense matrices proposed by Willett and Raginsky [9], [10] whenever the signal is sparse in the canonical domain, in that a log term in the bounds of [10] is absent from the bounds presented in this paper.

A. Relationship With Dense Sensing Matrices for Poisson CS

In recent work, the authors established performance bounds for CS in the presence of Poisson noise using dense sensing matrices based on appropriately shifted and scaled Rademacher ensembles [9], [10]. Several features distinguish that work from the present paper:

• The dense sensing matrices used in [9], [10] require more memory to store and more computational resources to apply to a signal in a reconstruction algorithm. The expander-based approach described in this paper, in contrast, is more efficient.

• The expander-based approach described in this paper works only when the signal of interest is sparse in the canonical basis. In contrast, the dense sensing matrices used in [9], [10] can be applied to arbitrary sparsity bases (though the proof technique there needs to be altered slightly to accommodate sparsity in the canonical basis).

• The bounds in both this paper and [9], [10] reflect a sobering tradeoff between performance and the number of measurements collected. In particular, more measurements (after some critical minimum number) can actually degrade performance, as a limited number of events (e.g., photons) are distributed among a growing number of detectors, impairing the SNR of the measurements.

Notation

Nonnegative reals (respectively, integers) will be denoted by $\mathbb{R}_+$ (respectively, $\mathbb{Z}_+$). Given a vector $u \in \mathbb{R}^n$ and a set $S \subseteq \{1, \dots, n\}$, we will denote by $u_S$ the vector obtained by setting to zero all coordinates of $u$ that are in $S^c$, the complement of $S$: $(u_S)_i = u_i 1_{\{i \in S\}}$. Given some $1 \le k \le n$, let $S_k(u)$ be the set of positions of the $k$ largest (in magnitude) coordinates of $u$. Then $u^{(k)} := u_{S_k(u)}$ will denote the best $k$-term approximation of $u$ (in the canonical basis of $\mathbb{R}^n$), and $\sigma_k(u) := \|u - u^{(k)}\|_1$ will denote the resulting approximation error. The $\ell_0$ quasi-norm measures the number of nonzero coordinates of $u$: $\|u\|_0 := |\{i : u_i \neq 0\}|$. For a subset $Q \subseteq \{1, \dots, n\}$ we will denote by $u|_Q$ the vector with components $u_i$, $i \in Q$. Given a vector $u$, we will denote by $[u]_+$ the vector obtained by setting to zero all negative components of $u$: for all $i$, $([u]_+)_i = \max\{u_i, 0\}$. Given two vectors $u, v \in \mathbb{R}^n$, we will write $u \succeq v$ if $u_i \ge v_i$ for all $i$. If $u \succeq c\mathbf{1}$ for some scalar $c$, we will simply write $u \succeq c$. We will write $u \succ v$ instead of $u \succeq v$ if the inequalities are strict for all $i$.

B. Organization of the Paper

This paper is organized as follows. In Section II, we summarize the existing literature on expander graphs applied to compressed sensing and the RIP-1 property. Section III describes how the problem of compressed sensing with Poisson noise can be formulated in a way that explicitly accounts for nonnegativity constraints and flux preservation (i.e., we cannot detect more events than have occurred); this section also contains our main theoretical result bounding the error of a sparsity-penalized likelihood reconstruction of a signal from compressive Poisson measurements. These results are illustrated and further analyzed in Section IV, in which we focus on the specific application of efficiently estimating packet arrival rates. Several technical discussions and proofs have been relegated to the appendices.


Fig. 1. A $(k, \epsilon)$-expander. In this example, the light nodes correspond to $U$, the dark nodes correspond to $V$, the light oval corresponds to the set $S \subset U$, and the dark oval corresponds to the set $N(S) \subset V$. There are three colliding edges.


II. BACKGROUND ON EXPANDER GRAPHS

We start by defining an unbalanced bipartite vertex-expander graph.

Definition II.1: We say that a bipartite simple graph $G = (U, V, E)$ with (regular) left degree¹ $d$ is a $(k, \epsilon)$-expander if, for any $S \subseteq U$ with $|S| \le k$, the set of neighbors $N(S) \subseteq V$ of $S$ has size $|N(S)| \ge (1 - \epsilon)\, d\, |S|$.

Fig. 1 illustrates such a graph. Intuitively, a bipartite graph is an expander if any sufficiently small subset of its variable nodes has a sufficiently large neighborhood. In the CS setting, $U$ (respectively, $V$) will correspond to the components of the original signal (respectively, its compressed representation). Hence, for a given $n = |U|$, a "high-quality" expander should have $\epsilon$, $m = |V|$, and $d$ as small as possible, while $k$ should be as close as possible to $n$. The following proposition, proved using the probabilistic method [20], is well-known in the literature on expanders:

Proposition II.2 (Existence of High-Quality Expanders): For any $1 \le k \le n/2$ and any $\epsilon > 0$, there exists a $(k, \epsilon)$-expander with left degree

$$ d = O\!\left( \frac{\log(n/k)}{\epsilon} \right) \qquad \text{and right set size} \qquad m = O\!\left( \frac{k \log(n/k)}{\epsilon^2} \right). $$

Unfortunately, there is no known explicit construction of expanders achieving the guarantees of Proposition II.2. However, it can be shown that, with high probability, any $d$-regular random bipartite graph with $d = O(\log(n/k)/\epsilon)$ and $m = O(k \log(n/k)/\epsilon^2)$ satisfies the required expansion property. Moreover, the graph may be assumed to be right-regular as well, i.e., every node in $V$ will have the same (right) degree $D$ [21]. Counting the number of edges in two ways, we conclude that

$$ n\, d = m\, D, \qquad \text{i.e.,} \qquad D = \frac{n d}{m}. $$

¹That is, each node in $U$ has the same number of neighbors in $V$.

Thus, in practice it may suffice to use random bipartite regular graphs instead of expanders.² Moreover, there exists an explicit construction for a class of expander graphs that comes very close to the guarantees of Proposition II.2. This construction, due to Guruswami et al. [23], uses Parvaresh–Vardy codes [24] and has the following guarantees:

Proposition II.3 (Explicit Construction of High-Quality Expanders): For any positive constant $\alpha$, and any $n$, $k$, and $\epsilon > 0$, there exists a deterministic explicit construction of a $(k, \epsilon)$-expander graph with $d = O\big( ((\log n)(\log k)/\epsilon)^{1 + 1/\alpha} \big)$ and $m \le d^2 k^{1+\alpha}$.

Expanders have recently been proposed as a means of constructing efficient compressed sensing algorithms [15], [18], [19], [22]. In particular, it has been shown that any $n$-dimensional vector that is $k$-sparse can be fully recovered using $O(k \log(n/k))$ measurements, with recovery time nearly linear in $n$ [15], [19]. It has also been shown that, even in the presence of noise in the measurements, if the noise vector has low $\ell_1$ norm, expander-based algorithms can approximately recover any $k$-sparse signal [16], [18], [19]. One reason why expander graphs are good sensing candidates is that the adjacency matrix of any $(k, \epsilon)$-expander almost preserves the $\ell_1$ norm of any $k$-sparse vector [18]. In other words, if the adjacency matrix of an expander is used for measurement, then the $\ell_1$ distance between two sufficiently sparse signals is preserved by measurement. This property is known as the "Restricted Isometry Property for $\ell_1$ norms," or the "RIP-1" property. Berinde et al. have shown that this condition is sufficient for sparse recovery using $\ell_1$ minimization [18].
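To make the random construction concrete, the following sketch (ours, not from the paper; `random_left_regular_adjacency` is a name we introduce) builds the 0/1 adjacency matrix of a random left-$d$-regular bipartite graph, which with high probability has the required expansion for suitable $(m, d)$:

```python
import numpy as np

def random_left_regular_adjacency(n, m, d, rng=None):
    """m x n 0/1 adjacency matrix of a random bipartite graph in which every
    left node (signal coordinate) has exactly d right neighbors (measurements),
    chosen uniformly without replacement."""
    rng = np.random.default_rng(rng)
    A = np.zeros((m, n), dtype=np.int8)
    for j in range(n):
        neighbors = rng.choice(m, size=d, replace=False)
        A[neighbors, j] = 1
    return A

# Example: n = 1000 coordinates, m = 100 measurements, left degree d = 8.
A = random_left_regular_adjacency(n=1000, m=100, d=8, rng=0)
assert (A.sum(axis=0) == 8).all()   # each column has exactly d ones
```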

The precise statement of the RIP-1 property, whose proof can be found in [15], goes as follows:

Lemma II.4 (RIP-1 Property of the Expander Graphs): Let $A$ be the adjacency matrix of a $(k, \epsilon)$-expander graph $G$. Then for any $k$-sparse vector $x \in \mathbb{R}^n$, we have

$$ (1 - 2\epsilon)\, d\, \|x\|_1 \;\le\; \|A x\|_1 \;\le\; d\, \|x\|_1. \qquad (1) $$
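A quick numerical sanity check of (1) (our illustration, using the random left-regular construction above; all parameter values are arbitrary choices of ours):

```python
import numpy as np

n, m, d, k, eps = 1000, 100, 8, 10, 1/6
rng = np.random.default_rng(1)
A = np.zeros((m, n), dtype=np.int8)
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1   # left degree exactly d

for _ in range(5):
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    ratio = np.abs(A @ x).sum() / np.abs(x).sum()
    # RIP-1 (1) predicts (1 - 2*eps)*d <= ||Ax||_1 / ||x||_1 <= d
    print(f"{(1 - 2*eps) * d:.2f} <= {ratio:.2f} <= {d}")
```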

The following proposition is a direct consequence of the above RIP-1 property. It states that if, for an almost $k$-sparse vector³ $f$, there exists a vector $g$ whose $\ell_1$ norm is close to that of $f$, and if $Ag$ approximates $Af$, then $g$ also approximates $f$. Our results of Section III exploit the fact that the proposed MAP decoding algorithm outputs a vector satisfying the two conditions above, and hence approximately recovers the desired signal.

Proposition II.5: Let $A$ be the adjacency matrix of a $(2k, \epsilon)$-expander with $\epsilon < 1/6$, and let $f, g$ be two vectors in $\mathbb{R}^n$ such that $\|g\|_1 \le \|f\|_1 + \nu$ for some $\nu \ge 0$. Then $\|f - g\|_1$ is upper-bounded by

$$ \|f - g\|_1 \;\le\; \frac{1 - 2\epsilon}{1 - 6\epsilon} \big( 2\,\sigma_k(f) + \nu \big) \;+\; \frac{2}{(1 - 6\epsilon)\, d}\, \|A(f - g)\|_1. $$

²Briefly, we can first generate a random left-regular graph with left degree $d$ (by choosing each edge independently). That graph is, with overwhelming probability, an expander graph. Then, given an expander graph which is only left-regular, a paper by Guruswami et al. [22] shows how to construct an expander graph with almost the same parameters, which is both left-regular and right-regular.

³By "almost sparsity" we mean that the vector has at most $k$ significant entries.


In particular, if we let $\nu = 0$, then we get the bound

$$ \|f - g\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(f) \;+\; \frac{2}{(1 - 6\epsilon)\, d}\, \|A(f - g)\|_1. $$

Proof: See Appendix B.

For future convenience, we will introduce the following piece of notation. Given $k$ and $\epsilon$, we will denote by $G_{k,\epsilon}$ a $(2k, \epsilon)$-expander with left set size $n$ whose existence is guaranteed by Proposition II.2. Then $G_{k,\epsilon}$ has

$$ d = O\!\left( \frac{\log(n/k)}{\epsilon} \right), \qquad m = O\!\left( \frac{k \log(n/k)}{\epsilon^2} \right). $$

III. COMPRESSED SENSING IN THE PRESENCE OF POISSON NOISE

A. Problem Statement

We wish to recover an unknown vector $f^* \in \mathbb{R}_+^n$ of Poisson intensities from a measured vector $y \in \mathbb{Z}_+^m$, sensed according to the Poisson model

$$ y \sim \mathrm{Poisson}(A f^*) \qquad (2) $$

where $A \in \mathbb{R}_+^{m \times n}$ is a positivity-preserving sensing matrix.⁴ That is, for each $j \in \{1, \dots, m\}$, $y_j$ is sampled independently from a Poisson distribution with mean $(A f^*)_j$:

$$ p(y \,|\, A f^*) = \prod_{j=1}^{m} p\big( y_j \,|\, (A f^*)_j \big) \qquad (3) $$

where, for any $y \in \mathbb{Z}_+$ and $\lambda \ge 0$, we have

$$ p(y \,|\, \lambda) = \begin{cases} \dfrac{\lambda^{y} e^{-\lambda}}{y!} & \text{if } \lambda > 0 \\[1ex] 1_{\{y = 0\}} & \text{if } \lambda = 0 \end{cases} \qquad (4) $$

where the case $\lambda = 0$ is a consequence of the fact that $\lim_{\lambda \to 0} p(y \,|\, \lambda) = 1_{\{y = 0\}}$.
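As an illustration (ours, not from the paper), simulating the observation model (2) with a column-normalized expander-type sensing matrix takes a few lines of NumPy; all numeric choices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, k = 1000, 100, 8, 10

# Normalized adjacency of a random left-d-regular bipartite graph: each column
# has d entries equal to 1/d, so column sums are 1 and the total intensity
# sum_j (A f*)_j equals ||f*||_1.
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1.0 / d

# A k-sparse nonnegative intensity vector f* with total intensity 1e4.
f_star = np.zeros(n)
f_star[rng.choice(n, size=k, replace=False)] = 1e4 / k

y = rng.poisson(A @ f_star)    # the Poisson measurements of (2)
print(y.sum(), f_star.sum())   # total counts concentrate around ||f*||_1
```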

We assume that the $\ell_1$ norm of $f^*$ is known, $\|f^*\|_1 = L$ (although later we will show that this assumption can be relaxed). We are interested in designing a sensing matrix $A$ and an estimator $\hat{f} = \hat{f}(y)$, such that $f^*$ can be recovered with small expected risk

$$ R(f^*, \hat{f}) := \mathbb{E}\, \|f^* - \hat{f}\|_1 $$

where the expectation is taken w.r.t. the distribution $p(y \,|\, A f^*)$.

B. The Proposed Estimator and its Performance

To recover $f^*$, we will use a penalized Maximum Likelihood Estimation (pMLE) approach. Let us choose a convenient pair $(k, \epsilon)$ and take $A$ to be the normalized adjacency matrix of

⁴Our choice of this observation model as opposed to a "shot-noise" model based on $A$ operating on Poisson observations of $f^*$ is discussed in Appendix A.

the expander $G_{k,\epsilon}$ (cf. Section II for definitions): $A = \frac{1}{d} A_{G_{k,\epsilon}}$. Moreover, let us choose a finite or countable set $\Gamma$ of candidate estimators with $\|g\|_1 = L$ for all $g \in \Gamma$, and a penalty $\mathrm{pen} : \Gamma \to \mathbb{R}_+$ satisfying the Kraft inequality⁵

$$ \sum_{g \in \Gamma} e^{-\mathrm{pen}(g)} \;\le\; 1. \qquad (5) $$

For instance, we can impose less penalty on sparser signals, or construct a penalty based on any other prior knowledge about the underlying signal.

With these definitions, we consider the following penalized maximum-likelihood estimator (pMLE):

$$ \hat{f} \;=\; \arg\min_{g \in \Gamma} \big\{ -\log p(y \,|\, A g) + 2\, \mathrm{pen}(g) \big\}. \qquad (6) $$

One way to think about the procedure in (6) is as a MAP algorithm over the set of estimates $\Gamma$, where the likelihood is computed according to the Poisson model (4) and the penalty function corresponds to a negative log prior on the candidate estimators in $\Gamma$.
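For intuition, here is a minimal brute-force implementation of (6) over a small finite candidate set (our sketch; the function names, toy matrix, candidates, and support-size penalty are all our illustrative choices, and practical instances use structured sets and convex relaxations, cf. Section III-D):

```python
import numpy as np

def poisson_nll(y, mu, eps=1e-12):
    """Negative Poisson log-likelihood -log p(y | mu), up to the log(y!) constant."""
    mu = np.maximum(mu, eps)           # guard the mu = 0 case in (4)
    return float(np.sum(mu - y * np.log(mu)))

def pmle(y, A, candidates, pen):
    """Penalized MLE (6): minimize -log p(y | A g) + 2 pen(g) over g in Gamma."""
    return min(candidates, key=lambda g: poisson_nll(y, A @ g) + 2.0 * pen(g))

# Toy example: n = 4, m = 2, candidates differing in their supports.
A = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
candidates = [np.array(g, dtype=float) for g in
              ([8, 0, 0, 0], [0, 8, 0, 0], [4, 4, 0, 0], [0, 0, 8, 0])]
pen = lambda g: np.count_nonzero(g)    # sparser candidates pay less
y = np.random.default_rng(0).poisson(A @ candidates[0])
print(pmle(y, A, candidates, pen))
```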

Our main bound on the performance of the pMLE is as follows:

Theorem III.1: Let $A$ be the normalized adjacency matrix of $G_{k,\epsilon}$ (with $\epsilon < 1/6$), let $f^*$ be the original signal compressively sampled in the presence of Poisson noise, and let $\hat{f}$ be obtained through (6). Then

$$ \mathbb{E}\, \|f^* - \hat{f}\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(f^*) \;+\; \frac{4}{1 - 6\epsilon}\, \sqrt{ L\, \min_{g \in \Gamma} \big[ K(p_{f^*}, p_g) + 2\, \mathrm{pen}(g) \big] } \qquad (7) $$

where

$$ K(p_f, p_g) \;:=\; \sum_{j=1}^{m} \left[ (A g)_j - (A f)_j + (A f)_j \log \frac{(A f)_j}{(A g)_j} \right] $$

is the Kullback–Leibler divergence (relative entropy) between $p_f = p(\cdot \,|\, A f)$ and $p_g = p(\cdot \,|\, A g)$ [25].

Proof: Since $\|\hat{f}\|_1 = \|f^*\|_1 = L$, we have $\nu = 0$. Hence, using Proposition II.5 with $\nu = 0$ (applied to the unnormalized adjacency matrix $A_{G_{k,\epsilon}} = dA$), we can write

$$ \|f^* - \hat{f}\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(f^*) \;+\; \frac{2}{1 - 6\epsilon}\, \big\| A (f^* - \hat{f}) \big\|_1. $$

Taking expectations, we obtain

$$ \mathbb{E}\, \|f^* - \hat{f}\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(f^*) \;+\; \frac{2}{1 - 6\epsilon}\, \sqrt{ \mathbb{E}\, \big\| A (f^* - \hat{f}) \big\|_1^2 } \qquad (8) $$

where the second step uses Jensen's inequality. Using Lemmas C.1 and C.2 in Appendix C, we have

$$ \mathbb{E}\, \big\| A (f^* - \hat{f}) \big\|_1^2 \;\le\; 4 L\, \min_{g \in \Gamma} \big[ K(p_{f^*}, p_g) + 2\, \mathrm{pen}(g) \big]. $$

Substituting this into (8), we obtain (7).

⁵Many penalization functions can be modified slightly (e.g., scaled appropriately) to satisfy the Kraft inequality. All that is required is a finite collection of estimators (i.e., $|\Gamma| < \infty$) and an associated prefix code for each candidate estimate in $\Gamma$. For instance, this would certainly be possible for a total variation penalty, though the details are beyond the scope of this paper.


The bound of Theorem III.1 is an oracle inequality: it states that the $\ell_1$ error of $\hat{f}$ is (up to multiplicative constants depending only on $\epsilon$) the sum of the $k$-term approximation error of $f^*$ and the square root of $L$ times the minimum penalized relative entropy over the set of candidate estimators $\Gamma$. The first term in (7) is smaller for sparser $f^*$, and the second term is smaller when there is a $g \in \Gamma$ which is simultaneously a good approximation to $f^*$ (in the sense that the distributions $p_{f^*}$ and $p_g$ are close) and has a low penalty.

Remark III.2: So far we have assumed that the $\ell_1$ norm of $f^*$ is known a priori. If this is not the case, we can still estimate it with high accuracy using the noisy compressive measurements. Observe that, since each measurement $y_j$ is a Poisson random variable with mean $(A f^*)_j$, the total count $\sum_j y_j$ is Poisson with mean $\sum_j (A f^*)_j$. Therefore, $\sqrt{\sum_j y_j}$ is approximately normally distributed with mean $\sqrt{\sum_j (A f^*)_j}$ and variance $1/4$ [26, Sec. 6.2].⁶ Hence, Mill's inequality [27, Thm. 4.7] guarantees that, for every positive $t$,

$$ \Pr\left( \Big| \sqrt{\textstyle\sum_j y_j} - \sqrt{\textstyle\sum_j (A f^*)_j} \Big| \ge t \right) \;\lesssim\; \sqrt{\frac{2}{\pi}}\; \frac{e^{-2 t^2}}{2 t} $$

where $\lesssim$ is meant to indicate the fact that this is only an approximate bound, with the approximation error controlled by the rate of convergence in the central limit theorem. Now we can use the RIP-1 property of the expander graphs to obtain the estimates

$$ \|f^*\|_1 \;\ge\; \Big( \sqrt{\textstyle\sum_j y_j} - t \Big)^2 \qquad \text{and} \qquad \|f^*\|_1 \;\le\; \frac{ \big( \sqrt{\sum_j y_j} + t \big)^2 }{ 1 - 2\epsilon } $$

that hold with (approximate) probability at least $1 - \sqrt{2/\pi}\, e^{-2 t^2} / (2 t)$.
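A quick numerical illustration of this remark (our sketch, assuming a column-normalized sensing matrix so that the total count is Poisson with mean exactly $\|f^*\|_1$):

```python
import numpy as np

rng = np.random.default_rng(0)
L_true = 1e4                    # ||f*||_1
total = rng.poisson(L_true)     # sum_j y_j ~ Poisson(L) when columns of A sum to 1
t = 1.0                         # sqrt(total) ~ N(sqrt(L), 1/4), so t = 1 is 2 sigma
lo, hi = (np.sqrt(total) - t)**2, (np.sqrt(total) + t)**2
print(f"estimated L in [{lo:.0f}, {hi:.0f}] (true {L_true:.0f})")
```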

C. A Bound in Terms of $\ell_1$ Error

The bound of Theorem III.1 is not always useful, since it bounds the risk of the pMLE in terms of the relative entropy. A bound purely in terms of $\ell_1$ errors would be more desirable. However, it is not easy to obtain without imposing extra conditions either on $f^*$ or on the candidate estimators in $\Gamma$. This follows from the fact that the divergence $K(p_{f^*}, p_g)$ may take the value $+\infty$ if there exists some $j$ such that $(A g)_j = 0$ but $(A f^*)_j > 0$.

One way to eliminate this problem is to impose an additional requirement on the candidate estimators in $\Gamma$: there exists some $c > 0$, such that

$$ A g \;\succeq\; c \qquad \text{for all } g \in \Gamma. \qquad (9) $$

Under this condition, we will now develop a risk bound for the pMLE purely in terms of the $\ell_1$ error.

⁶This observation underlies the use of variance-stabilizing transforms.

Theorem III.3: Suppose that all the conditions of Theorem III.1 are satisfied. In addition, suppose that the set $\Gamma$ satisfies the condition (9). Then

$$ \mathbb{E}\, \|f^* - \hat{f}\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(f^*) \;+\; \frac{4}{1 - 6\epsilon}\, \sqrt{ L\, \min_{g \in \Gamma} \left[ \frac{\|f^* - g\|_1^2}{c} + 2\, \mathrm{pen}(g) \right] }. \qquad (10) $$

Proof: Using Lemma C.3 in Appendix C, we get the bound

$$ K(p_{f^*}, p_g) \;\le\; \frac{\|f^* - g\|_1^2}{c}. $$

Substituting this into (7), we get (10).

Remark III.4: Because every $g \in \Gamma$ satisfies $\|g\|_1 = L$, the constant $c$ cannot be too large. In particular, if (9) holds, then for every $g \in \Gamma$ we must have

$$ c\, m \;\le\; \sum_{j=1}^{m} (A g)_j \;=\; \|A g\|_1. $$

On the other hand, by the RIP-1 property we have $\|A g\|_1 \le \|g\|_1 = L$. Thus, a necessary condition for (9) to hold is $c \le L/m$.

Since $c \le L/m$, the best risk we may hope to achieve under some condition like (9) is on the order of

$$ \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(f^*) \;+\; \frac{4}{1 - 6\epsilon}\, \sqrt{ \min_{g \in \Gamma} \left[ \frac{m}{a}\, \|f^* - g\|_1^2 + 2 L\, \mathrm{pen}(g) \right] } \qquad (11) $$

for some constant $a \in (0, 1]$, e.g., by choosing $c = a L / m$. Effectively, this means that, under the positivity condition (9), the $\ell_1$ error of $\hat{f}$ is the sum of the $k$-term approximation error of $f^*$ plus roughly $\sqrt{m}$ times the best penalized approximation error. The first term in (11) is smaller for sparser $f^*$, and the second term is smaller when there is a $g \in \Gamma$ which is simultaneously a good approximation to $f^*$ and has a low penalty.

D. Empirical Performance

Here, we present a simulation study that validates our method. In this experiment, compressive Poisson observations are collected of a randomly generated sparse signal passed through the sensing matrix generated from an adjacency matrix of an expander. We then reconstruct the signal by utilizing an algorithm that minimizes the objective function in (6) and assess the accuracy of this estimate. We repeat this procedure over several trials to estimate the average performance of the method.

More specifically, we generate our length-$n$ sparse signal through a two-step procedure. First, we select $k$ elements of $\{1, \dots, n\}$ uniformly at random; then we assign these elements a common intensity $I$. All other components of the signal are set to zero. For these experiments, we chose a length $n$ of 100 000 and varied the sparsity $k$ among three different choices of 100, 500, and 1000, for two intensity levels of 10 000 and 100 000. We then vary the number of Poisson observations $m$ from 100 to 20 000, using an expander graph sensing matrix with a fixed left degree $d$. Recall that the sensing matrix is normalized such that the total signal intensity is divided amongst the $m$ measurements, hence the seemingly high choices of intensity.


To reconstruct the signal, we utilize the SPIRAL-$\ell_1$ algorithm [28], which solves (6) when $\mathrm{pen}(g) = \tau \|g\|_1$. We design the algorithm to optimize over the continuous domain $\mathbb{R}_+^n$ instead of the discrete set $\Gamma$. This is equivalent to the proposed pMLE formulation in the limit as the discrete set of estimates becomes increasingly dense in the set of all $g \succeq 0$ with $\|g\|_1 = L$, i.e., we quantize this set on an ever finer scale, increasing the bit allotment to represent each $g \in \Gamma$. In this high-resolution limit, the Kraft inequality requirement (5) on the penalty will translate to $\int e^{-\mathrm{pen}(g)}\, dg \le 1$. If we select a penalty proportional to the negative log of a prior probability distribution for $g$, this requirement will be satisfied.

From a Bayesian perspective, the $\ell_1$ penalty arises by assuming each component $g_i$ is drawn i.i.d. from a zero-mean Laplace prior with scale parameter $b$. Hence, the regularization parameter $\tau$ is inversely related to the scale parameter of the prior, as a larger $\tau$ (smaller $b$) will promote solutions with more zero-valued components.

This relaxation results in a computationally tractable convex program over a continuous domain, albeit implemented on a machine with finite precision. The SPIRAL algorithm utilizes a sequence of quadratic subproblems derived by using a second-order Taylor expansion of the Poisson log-likelihood at each iteration. These subproblems are made easier to solve by using a separable approximation whereby the second-order Hessian matrix is approximated by a scaled identity matrix. For the particular case of the $\ell_1$ penalty, these subproblems can be solved quickly, exactly, and noniteratively by a soft-thresholding rule.
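The following sketch (ours; SPIRAL's actual step-size selection and acceptance rules are more elaborate, see [28]) shows the core iteration just described: a gradient step on the Poisson negative log-likelihood with a scaled-identity curvature estimate `alpha`, followed by soft-thresholding and a nonnegativity clip:

```python
import numpy as np

def spiral_l1_sketch(y, A, tau, iters=200, alpha=1.0, eps=1e-12):
    """Minimize sum_j [(Ag)_j - y_j log(Ag)_j] + tau * ||g||_1 over g >= 0
    via quadratic majorization with a scaled-identity Hessian approximation."""
    g = np.full(A.shape[1], y.mean())           # positive initialization
    for _ in range(iters):
        mu = np.maximum(A @ g, eps)
        grad = A.T @ (1.0 - y / mu)             # gradient of the Poisson NLL
        z = g - grad / alpha                    # gradient step
        g = np.maximum(np.abs(z) - tau / alpha, 0.0) * np.sign(z)  # soft threshold
        g = np.maximum(g, 0.0)                  # enforce nonnegativity
    return g
```

In SPIRAL itself, the curvature `alpha` is adapted at every iteration (e.g., by a Barzilai–Borwein rule) and sufficient decrease of the objective is monitored; we fix it here only to keep the sketch short.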

After reconstruction, we assess the estimate $\hat{f}$ according to the normalized $\ell_1$ error $\|\hat{f} - f^*\|_1 / \|f^*\|_1$. We select the regularization weighting $\tau$ in the SPIRAL-$\ell_1$ algorithm to minimize this quantity for each randomly generated experiment. To assure that the results are not biased in our favor by only considering a single random experiment for each configuration, we repeat this experiment several times. The averaged reconstruction accuracy over ten trials is presented in Fig. 2.

These results show that the proposed method is able to accurately estimate sparse signals when the signal intensity is sufficiently high; however, the performance of the method degrades for lower signal strengths. More interesting is the behavior as we vary the number of measurements. There is a clear phase transition where accurate signal reconstruction becomes possible; however, the performance gently degrades with the number of measurements, since there is a lower signal-to-noise ratio per measurement. This effect is more pronounced at lower intensity levels, as we more quickly enter the regime where only a few photons are collected per measurement. These findings support the error bounds developed in Section III-B.

IV. APPLICATION: ESTIMATING PACKET ARRIVAL RATES

This section describes an application of the pMLE estimator of Section III: an indirect approach for reconstructing average packet arrival rates and instantaneous packet counts for a given number of streams (or flows) at a router in a communication network, where the arrivals of packets in each flow are assumed to follow a Poisson process. All packet counting must be done in hardware at the router, and any hardware implementation must strike a delicate balance between speed, accuracy, and cost. For instance, one could keep a dedicated counter for each flow, but, depending on the type of memory used, one could end up with an implementation that is either fast but expensive and unable to keep track of a large number of flows (e.g., using SRAMs, which have low access times, but are expensive and physically large) or cheap and high-density but slow (e.g., using DRAMs, which are cheap and small, but have longer access times) [29], [30].

Fig. 2. Average performance (as measured by the normalized $\ell_1$ error) of the proposed expander-based observation method for recovering sparse signals under Poisson noise. In this experiment, we sweep over a range of measurements $m$ and consider a few sparsity ($k$) and intensity levels of the true signal.


However, there is empirical evidence [31], [32] that flow sizes in IP networks follow a power-law pattern: just a few flows (say, 10%) carry most of the traffic (say, 90%). Based on this observation, several investigators have proposed methodologies for estimating flows using a small number of counters by either a) keeping track only of the flows whose sizes exceed a given fraction of the total bandwidth (the approach suggestively termed "focusing on the elephants, ignoring the mice") [29] or b) using sparse random graphs to aggregate the raw packet counts and recovering flow sizes using a message-passing decoder [30].

We consider an alternative to these approaches based on Poisson CS, assuming that the underlying Poisson rate vector is sparse or approximately sparse; in fact, it is the approximate sparsity of the rate vector that mathematically describes the power-law behavior of the average packet counts. The goal is to maintain a compressed summary of the process sample paths using a small number of counters, such that it is possible to reconstruct both the total number of packets in each flow and the underlying rate vector. Since we are dealing here with Poisson streams, we would like to push the metaphor further and say that we are "focusing on the whales, ignoring the minnows."

A. Problem Formulation

We wish to monitor a large number of packet flows using a much smaller number of counters. Each flow is a homogeneous Poisson process (cf. [4] for details pertaining to Poisson processes and networking applications). Specifically, let $\lambda = (\lambda_1, \dots, \lambda_n) \in \mathbb{R}_+^n$ denote the vector of rates, and let $X = \{X(t)\}_{t \ge 0}$ denote the random process with sample paths in $\mathbb{Z}_+^n$, where, for each $i$, the $i$th component $X_i$ of $X$ is a homogeneous Poisson process with the rate of $\lambda_i$ arrivals per unit time, and all the component processes are mutually conditionally independent given $\lambda$.



The goal is to estimate the unknown rate vector $\lambda$ based on $X$. We will focus on performance bounds for power-law network traffic, i.e., for $\lambda$ belonging to the class

$$ \mathcal{L}_\alpha \;=\; \Big\{ \lambda \in \mathbb{R}_+^n \;:\; \|\lambda\|_1 = L \ \text{ and } \ \sigma_k(\lambda) = O\big( L\, k^{-\alpha} \big) \ \text{for all } k \ge 1 \Big\} \qquad (12) $$

for some $L > 0$ and $\alpha > 0$, where the constant hidden in the $O(\cdot)$ notation may depend on $\alpha$. Here, $\alpha$ is the power-law exponent that controls the tail behavior; in particular, the extreme regime $\alpha \to \infty$ describes the fully sparse setting. As in Section III, we assume the total arrival rate $\|\lambda\|_1$ to be known (and equal to a given $L$) in advance, but this assumption can be easily dispensed with (cf. Remark III.2).

As before, we evaluate each candidate estimator $\hat{\lambda} = \hat{\lambda}(X)$ based on its expected risk,

$$ R(\lambda, \hat{\lambda}) \;:=\; \mathbb{E}\, \|\lambda - \hat{\lambda}\|_1. $$

B. Two Estimation Strategies

We consider two estimation strategies. In both cases, we let our measurement matrix $A$ be the adjacency matrix of the expander $G_{k,\epsilon}$ for a fixed $\epsilon$ (see Section II for definitions). The first strategy, which we call the direct method, uses standard expander-based CS to construct an estimate of $\lambda$. The second is the pMLE strategy, which relies on the machinery presented in Section III and can be used when only the rates are of interest.

1) The Direct Method: In this method, which will be used as a "baseline" for assessing the performance of the pMLE, the counters are updated in discrete time, every $T$ time units. Let $Z[N] = X(NT)$ denote the sampled version of $X$, where $N \in \mathbb{Z}_+$. The update takes place as follows. We have a binary matrix $A$, and at each time $N$ we let $y[N] = A Z[N]$. In other words, $y[N]$ is obtained by passing a sampled $n$-dimensional homogeneous Poisson process with rate vector $\lambda$ through the linear transformation $A$. The direct method uses expander-based CS to obtain an estimate $\hat{Z}[N]$ of $Z[N]$ from $y[N]$, followed by letting

$$ \hat{\lambda} \;:=\; \frac{ \big[ \hat{Z}[N] \big]_+ }{ N T }. \qquad (13) $$

This strategy is based on the observation that $Z[N] / (NT)$ is the maximum-likelihood estimator of $\lambda$. To obtain $\hat{Z}[N]$, we need to solve the convex program

$$ \text{minimize } \|Z\|_1 \qquad \text{subject to } A Z = y[N] $$

which can be cast as a linear program [33]. The resulting solution may have negative coordinates,⁷ hence the use of the $[\cdot]_+$ operation in (13).

We then have the following result.

Theorem IV.1:

$$ \mathbb{E}\, \|\lambda - \hat{\lambda}\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{1 - 6\epsilon}\, \sigma_k(\lambda) \;+\; \frac{ \|\sqrt{\lambda}\|_1 }{ \sqrt{N T} } \qquad (14) $$

⁷Khajehnejad et al. [34] have recently proposed the use of perturbed adjacency matrices of expanders to recover nonnegative sparse signals.

where $\sqrt{\lambda}$ is the vector with components $\sqrt{\lambda_i}$, $i = 1, \dots, n$.

Remark IV.2: Note that the error term in (14) is $O(1/\sqrt{NT})$, assuming everything else is kept constant, which coincides with the optimal rate of the error decay in parametric estimation problems.

Proof: We first observe that, by construction, $\hat{Z} = \hat{Z}[N]$ satisfies the relations $A \hat{Z} = A Z[N]$ and $\|\hat{Z}\|_1 \le \|Z[N]\|_1$. Hence,

$$ \|\hat{\lambda} - \lambda\|_1 \;\le\; \frac{1}{NT}\, \big\| \hat{Z} - Z[N] \big\|_1 + \Big\| \frac{Z[N]}{NT} - \lambda \Big\|_1 \;\le\; \frac{2(1 - 2\epsilon)}{(1 - 6\epsilon)\, NT}\, \sigma_k\big( Z[N] \big) + \Big\| \frac{Z[N]}{NT} - \lambda \Big\|_1 \qquad (15) $$

where the first step uses the triangle inequality (together with the fact that $\|[\hat{Z}]_+ - Z[N]\|_1 \le \|\hat{Z} - Z[N]\|_1$, since $Z[N] \succeq 0$), while the second step uses Proposition II.5 with $\nu = 0$ and $A(\hat{Z} - Z[N]) = 0$. To bound the first term in (15), let $S$ denote the positions of the $k$ largest entries of $\lambda$. Then, by definition of the best $k$-term representation,

$$ \sigma_k\big( Z[N] \big) \;\le\; \big\| Z[N] - (Z[N])_S \big\|_1 \;=\; \sum_{i \notin S} Z_i[N]. $$

Therefore,

$$ \mathbb{E}\, \frac{ \sigma_k(Z[N]) }{ NT } \;\le\; \frac{1}{NT} \sum_{i \notin S} \mathbb{E}\, Z_i[N] \;=\; \sum_{i \notin S} \lambda_i \;=\; \sigma_k(\lambda). $$

To bound the second term, we can use concavity of the square root, as well as the fact that each $Z_i[N] \sim \mathrm{Poisson}(NT \lambda_i)$, to write

$$ \mathbb{E}\, \Big| \frac{Z_i[N]}{NT} - \lambda_i \Big| \;\le\; \frac{ \sqrt{ \mathrm{Var}\, Z_i[N] } }{ NT } \;=\; \sqrt{ \frac{\lambda_i}{NT} }. $$

Summing over $i$ now gives $\mathbb{E}\, \| Z[N]/(NT) - \lambda \|_1 \le \|\sqrt{\lambda}\|_1 / \sqrt{NT}$, which proves the theorem.

2) The Penalized MLE Approach: In the penalized MLE approach, the counters are updated in a slightly different manner. Here the counters are still updated in discrete time, every $T$ time units; however, the $j$th counter is updated at times $(N + j/m) T$, $N \in \mathbb{Z}_+$, and only aggregates the packets that have arrived during the time period $\big( (N + (j-1)/m) T,\; (N + j/m) T \big]$. Therefore, in contrast to the direct method, here each arriving packet is registered by at most one counter. Furthermore, since the packets arrive according to a homogeneous Poisson process, conditioned on the vector $\lambda$, the values measured by distinct


counters are independent.⁸ Therefore, the vector of counts at time $NT$ obeys

$$ y[N] \;\sim\; \mathrm{Poisson}\big( \gamma\, A \lambda \big) \qquad \text{where } \gamma := \frac{N T}{m} $$

which is precisely the sensing model we have analyzed in Section III.

Now assume that the total average arrival rate $L = \|\lambda\|_1$ is known. Let $\Gamma$ be a finite or a countable set of candidate estimators with $\|g\|_1 = L$ for all $g \in \Gamma$, and let $\mathrm{pen}(\cdot)$ be a penalty functional satisfying the Kraft inequality over $\Gamma$. Given $\gamma$ and $\Gamma$, consider the scaled set

$$ \gamma \Gamma \;:=\; \{ \gamma g \;:\; g \in \Gamma \} $$

with the same penalty function, $\mathrm{pen}(\gamma g) := \mathrm{pen}(g)$ for all $g \in \Gamma$. We can now apply the results of Section III. Specifically, let

$$ \hat{\lambda} \;:=\; \frac{ \hat{g}_\gamma }{ \gamma } $$

where $\hat{g}_\gamma$ is the corresponding pMLE estimator obtained according to (6) over the set $\gamma \Gamma$. The following theorem is a consequence of Theorem III.3 and the remark following it:

Theorem IV.3: If the set $\Gamma$ satisfies the strict positivity condition (9), then there exists some constant $C = C(\epsilon)$, such that

$$ \mathbb{E}\, \|\lambda - \hat{\lambda}\|_1 \;\le\; C \left( \sigma_k(\lambda) \;+\; \sqrt{ \min_{g \in \Gamma} \left[ m\, \|\lambda - g\|_1^2 \;+\; \frac{ L\, \mathrm{pen}(g) }{ \gamma } \right] } \right). \qquad (16) $$

We now develop risk bounds under the power-law condition. To this end, let us suppose that $\lambda$ is a member of the power-law class $\mathcal{L}_\alpha$ defined in (12). Fix a small positive number $\delta$, such that $1/\delta$ is an integer, and define the set

$$ \Gamma \;:=\; \Big\{ g \in \mathbb{R}_+^n \;:\; \|g\|_1 = L \ \text{ and } \ g_i \in \{0, \delta L, 2 \delta L, \dots, L\} \ \text{for all } i \Big\}. $$

These will be our candidate estimators of $\lambda$. We can define the penalty function $\mathrm{pen}(g) \propto \|g\|_0 \log(n/\delta)$. For any $\lambda$ and any $k$ we can find some $g \in \Gamma$ with $\|g\|_0 \le k$, such that

$$ \|\lambda - g\|_1 \;\le\; \sigma_k(\lambda) + \delta L k $$

and

$$ \mathrm{pen}(g) \;=\; O\big( k \log(n/\delta) \big). $$

Here, we assume that $\delta$ is sufficiently small, so that the penalty term dominates the quantization error $\delta L k$. In order to guarantee that the penalty function satisfies Kraft's inequality, we need to ensure that

⁸The independence follows from the fact that if $X_1, \dots, X_m$ are conditionally independent random variables, then for any choice of functions $\phi_1, \dots, \phi_m$, the random variables $\phi_1(X_1), \dots, \phi_m(X_m)$ are also conditionally independent.

For every fixed $s = \|g\|_0$, there are exactly $\binom{n}{s}$ subspaces of dimension $s$, and each subspace contains at most $(1/\delta)^s$ distinct elements of $\Gamma$. Therefore, as long as

$$ \mathrm{pen}(g) \;\ge\; \|g\|_0 \left( \log \frac{n}{\delta} + 2 \right) \qquad (17) $$

then

$$ \sum_{g \in \Gamma} e^{-\mathrm{pen}(g)} \;\le\; \sum_{s \ge 1} \binom{n}{s} \left( \frac{1}{\delta} \right)^{s} e^{-s (\log(n/\delta) + 2)} \;\le\; \sum_{s \ge 1} \frac{ e^{-2s} }{ s! } \;\le\; 1 $$

and Kraft's inequality is satisfied. Using the fact that $\sigma_k(\lambda) = O(L k^{-\alpha})$ for $\lambda \in \mathcal{L}_\alpha$, we can bound the minimum over $g \in \Gamma$ in (16) from above by

$$ \min_{1 \le k \le n}\; O\!\left( m L^2 k^{-2\alpha} \;+\; \frac{ L\, k \log n }{ \gamma } \right). $$

We can now particularize Theorem IV.3 to the power-law case:

Theorem IV.4:

$$ \mathbb{E}\, \|\lambda - \hat{\lambda}\|_1 \;=\; O\!\left( \sqrt{m}\; L \left( \frac{ \log n }{ \gamma\, m\, L } \right)^{\frac{\alpha}{2\alpha + 1}} \right) $$

where the constants implicit in the $O(\cdot)$ notation depend on $\epsilon$ and $\alpha$.

Note that the risk bound here is slightly worse than the benchmark bound of Theorem IV.1. However, it should be borne in mind that this bound is based on Theorem III.3, rather than on the potentially much tighter oracle inequality of Theorem III.1, since our goal was to express the risk of the pMLE purely in terms of the approximation properties of the power-law class $\mathcal{L}_\alpha$. In general, we will expect the actual risk of the pMLE to be much lower than what the conservative bound of Theorem IV.4 predicts. Indeed, as we will see in Section IV-D, the pMLE approach obtains higher empirical accuracy than the direct method. But first we show how the pMLE can be approximated efficiently with proper preprocessing of the observed counts based on the structure of $A$.

C. Efficient pMLE Approximation

In this section we present an efficient algorithm for approximating the pMLE estimate. The algorithm consists of two phases: 1) we preprocess $y$ to isolate a subset $\hat{U}$ of $\{1, \dots, n\}$ which is sufficiently small and is guaranteed to contain the locations of the $k$ largest entries of $\lambda$ (the whales); and 2) we then construct a set of candidate estimators whose support sets lie in $\hat{U}$, together with an appropriate penalty, and perform pMLE over this reduced set. The success of this approach hinges on the assumption that the magnitude of the smallest whale is sufficiently large compared to the magnitude of the largest minnow. Specifically, we make the following


assumption: let $S$ contain the locations of the $k$ largest coordinates of $\lambda$. Then we require that

$$ \min_{i \in S} \lambda_i \;>\; D\, \max_{i \notin S} \lambda_i. \qquad (18) $$

Recall that $D = nd/m$ is the right degree of the expander graph. One way to think about (18) is in terms of a signal-to-noise ratio, which must be strictly larger than $D$. We also require $\gamma$ to be sufficiently large, so that

(19)

Finally, we perturb our expander a bit as follows: choose an integer $k' > k$ so that

(20)

Then we replace our original $(2k, \epsilon)$-expander with left degree $d$ with a $(2k', \epsilon)$-expander with the same left degree. The resulting procedure, displayed below as Algorithm 1, has the following guarantees:

Algorithm 1: Efficient pMLE Approximation Algorithm

Input: Measurement vector $y$, and the sensing matrix $A$.

Output: An approximation $\hat{\lambda}$ of the pMLE estimate.

Let $\hat{V}$ consist of the locations of the largest elements of $y$, and let $\hat{V}^c = \{1, \dots, m\} \setminus \hat{V}$.

Let $\hat{U}$ contain the set of all variable nodes that have at least one neighbor in $\hat{V}$, and let $\hat{U}^c = \{1, \dots, n\} \setminus \hat{U}$.

Construct a candidate set $\Gamma_{\hat{U}}$ of estimators with support in $\hat{U}$ and a penalty over $\Gamma_{\hat{U}}$.

Output the pMLE $\hat{\lambda} = \arg\min_{g \in \Gamma_{\hat{U}}} \big\{ -\log p(y \,|\, \gamma A g) + 2\, \mathrm{pen}(g) \big\}$.
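A sketch of the preprocessing phase (our illustration; `isolate_whales` is a name we introduce, and keeping the $kd$ largest counters is our illustrative choice, motivated by the fact that $k$ whales touch at most $kd$ counters):

```python
import numpy as np

def isolate_whales(y, A, k, d):
    """Phase 1 of Algorithm 1: keep the counters with the largest counts, then
    collect every flow (variable node) adjacent to at least one of them."""
    top_counters = np.argsort(y)[-k * d:]        # candidate whale counters
    mask = A[top_counters, :].sum(axis=0) > 0    # flows with >= 1 neighbor there
    return np.flatnonzero(mask)                  # the reduced support set U_hat

# The pMLE is then computed over candidates supported on U_hat only,
# shrinking the search space from n coordinates to |U_hat|.
```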

Theorem IV.5: Suppose the assumptions (18), (19), and (20) hold. Then, with probability at least $1 - m \sqrt{2/\pi}\, e^{-2t^2}/(2t)$ (with $t$ as chosen in the proof), the set $\hat{U}$ constructed by Algorithm 1 has the following properties: 1) $S \subseteq \hat{U}$; 2) $|\hat{U}| \le 2k'$; and 3) $\hat{U}$ can be found in $O(m \log m + |\hat{V}|\, D)$ time.

Proof:

1) First, fix a measurement node $j$. Recall that $y_j$ is a Poisson random variable with mean $\gamma (A \lambda)_j$. By the same argument as in Remark III.2, $\sqrt{y_j}$ is approximately normally distributed with mean $\sqrt{\gamma (A \lambda)_j}$ and with variance $1/4$. Hence, it follows from Mill's inequality and the union bound that, for every positive $t$, with probability at least $1 - m \sqrt{2/\pi}\, e^{-2t^2}/(2t)$, every measurement node $j$ satisfies $\big| \sqrt{y_j} - \sqrt{\gamma (A \lambda)_j} \big| < t$. If $j$ is a neighbor of $S$, then $(A \lambda)_j \ge \min_{i \in S} \lambda_i$; whereas if $j$ is not connected to $S$, then $(A \lambda)_j \le D \max_{i \notin S} \lambda_i$. Hence, by setting

$$ t \;=\; \frac{ \sqrt{ \gamma \min_{i \in S} \lambda_i } - \sqrt{ \gamma D \max_{i \notin S} \lambda_i } }{ 2 } $$

(where w.l.o.g. we assume that $\max_{i \notin S} \lambda_i > 0$), we conclude that, with probability at least $1 - m \sqrt{2/\pi}\, e^{-2t^2}/(2t)$, for every measurement node $j$ the following holds:

• If $j$ is a neighbor of $S$, then $\sqrt{y_j} > \sqrt{ \gamma \min_{i \in S} \lambda_i } - t$.

• If $j$ is not connected to $S$, then $\sqrt{y_j} < \sqrt{ \gamma D \max_{i \notin S} \lambda_i } + t$.

Consequently, by virtue of (18) and (19), with probability at least $1 - m \sqrt{2/\pi}\, e^{-2t^2}/(2t)$, every element of $y$ that is a neighbor of $S$ has larger magnitude than every element of $y$ that is not a neighbor of $S$.

2) Suppose, to the contrary, that $|\hat{U}| > 2k'$. Let $Q \subseteq \hat{U}$ be any subset of size $2k'$. Now, Lemma 3.6 in [34] states that, provided (20) holds, every $(2k, \epsilon)$-expander with left degree $d$ is also a $(2k', \epsilon')$-expander with left degree $d$. We apply this result to our $(2k, \epsilon)$-expander, where $k'$ satisfies (20), to see that it is also a $(2k', \epsilon')$-expander. Therefore, for the set $Q$ we must have $|N(Q)| \ge (1 - \epsilon')\, d\, |Q|$. On the other hand, every node in $Q$ has at least one neighbor among the counters in $\hat{V}$, which forces $|N(Q)|$ below this expansion bound once $|Q| = 2k'$. This is a contradiction, hence we must have $|\hat{U}| \le 2k'$.

3) Finding the sets $\hat{V}$ and $\hat{V}^c$ can be done in $O(m \log m)$ time by sorting $y$. The set $\hat{U}$ can then be found in time $O(|\hat{V}|\, D)$, by sequentially enumerating all variable nodes connected to each node in $\hat{V}$.

Having identified the set $\hat{U}$, we can reduce the pMLE optimization only to those candidates whose support sets lie in $\hat{U}$. More precisely, if we originally start with a sufficiently rich class of estimators $\Gamma$, then the new feasible set can be reduced to

$$ \Gamma_{\hat{U}} \;:=\; \big\{ g \in \Gamma \;:\; \mathrm{supp}(g) \subseteq \hat{U} \big\}. $$

Hence, by extracting the set $\hat{U}$, we can significantly reduce the complexity of finding the pMLE estimate. If $|\Gamma_{\hat{U}}|$ is small, the optimization can be performed by brute-force search in $O(|\Gamma_{\hat{U}}|\, m)$ time. Otherwise, since $|\hat{U}| \le 2k'$, we can use the quantization technique from the preceding section with quantizer resolution $\delta$ to construct a $\Gamma_{\hat{U}}$ of size at most $O\big( (1/\delta)^{2k'} \big)$. In this case, we can even assign the uniform penalty

$$ \mathrm{pen}(g) \;=\; \log |\Gamma_{\hat{U}}| \qquad \text{for all } g \in \Gamma_{\hat{U}} $$

which amounts to a vanilla MLE over $\Gamma_{\hat{U}}$.

D. Empirical Performance

Here we compare penalized MLE with $\ell_1$-magic [35], a universal $\ell_1$-minimization method, and with SSMP [36], an alternative method that employs combinatorial optimization. $\ell_1$-magic and SSMP both compute the "direct" estimator. The pMLE estimate is computed using Algorithm 1 above. For ease of computation, the candidate set is approximated by the convex set of all positive vectors with bounded $\ell_1$ norm, and the CVX package [37], [38] is used to directly solve the pMLE objective function with an $\ell_1$ penalty.


Fig. 3. Relative $\ell_1$ error as a function of the number of whales $k$, for $\ell_1$-magic (LP), SSMP, and pMLE for different choices of the power-law exponent $\alpha$. The number of flows is $n$, the number of counters is $m$, and the number of updates is 40; panels (a)–(c) correspond to three increasing values of $\alpha$.

Fig. 4. Probability of successful support recovery as a function of the number of whales $k$, for $\ell_1$-magic (LP), SSMP, and pMLE for different choices of the power-law exponent $\alpha$. The number of flows is $n$, the number of counters is $m$, and the number of updates is 40; panels (a)–(c) correspond to three increasing values of $\alpha$.


Fig. 3(a) through Fig. 5(b) report the results of numerical experiments, where the goal is to identify the $k$ largest entries in the rate vector from the measured data. Since a random graph is, with overwhelming probability, an expander graph, each experiment was repeated 30 times using independent sparse random graphs with a fixed left degree $d$.

We also used the following process to generate the rate vector. First, given the power-law exponent $\alpha$, the magnitudes of the whales were chosen according to a power-law distribution with parameter $\alpha$. The positions of the whales were then chosen uniformly at random. Finally, the minnows were sampled independently from a zero-mean Gaussian distribution (negative samples were replaced by their absolute values). Thus, given the locations of the whales, their magnitudes decay according to a truncated power law (with the cut-off at $k$), while the magnitudes of the minnows represent a noisy background. Fig. 3 shows the relative $\ell_1$ error $\|\hat{\lambda} - \lambda\|_1 / \|\lambda\|_1$ of the three above algorithms as a function of $k$. Note that for all three choices of $\alpha$, the pMLE algorithm provides lower errors. Similarly, Fig. 4 reports the probability of exact recovery as a function of $k$. Again, it turns out that in all three cases the pMLE algorithm has a higher probability of exact support recovery compared to the two direct algorithms.

We also analyzed the impact of changing the number of updates on the accuracy of the three above algorithms. The results are shown in Fig. 5. Here we fixed the number of whales and changed the number of updates from 10 to 200. It turned out that, as the number of updates increases, the relative errors of all three algorithms decrease, and their probabilities of exact support recovery consistently increase. Moreover, the pMLE algorithm always outperforms the $\ell_1$-magic (LP) and SSMP algorithms.

V. CONCLUSION

In this paper we investigated expander-based sensing as an alternative to dense random sensing in the presence of Poisson noise. Even though the Poisson model is essential in some applications, it presents several challenges, as the noise is not bounded, or even as concentrated as Gaussian noise, and is signal-dependent. Here we proposed using normalized adjacency matrices of expander graphs as an alternative construction of sensing matrices, and we showed that the binary nature and the RIP-1 property of these matrices yield provable consistency for a MAP reconstruction algorithm.

The compressed sensing algorithms based on Poisson observations and expander-graph sensing matrices provide a useful mechanism for accurately and robustly estimating a collection of flow rates with relatively few counters. These techniques have the potential to significantly reduce the cost of hardware required for flow rate estimation. While previous approaches assumed packet counts matched the flow rates exactly or that flow rates were i.i.d., the approach in this paper accounts for the Poisson nature of packet counts with relatively mild assumptions about the underlying flow rates (i.e., that only a small fraction of them are large).


Fig. 5. Performance of the $\ell_1$-magic, SSMP, and pMLE algorithms as a function of the number of updates $N$. The number of flows is $n$, the number of counters is $m$, and the number of whales is $k$. The whales' magnitudes are assigned according to a power-law distribution, and the remaining entries are minnows with magnitudes determined by a zero-mean Gaussian random variable. (a) Relative $\ell_1$ error as a function of the number of updates $N$. (b) Probability of successful support recovery as a function of the number of updates $N$.


The "direct" estimation method (in which first the vector of flow counts is estimated using a linear program, and then the underlying flow rates are estimated using Poisson maximum likelihood) is juxtaposed with an "indirect" method (in which the flow rates are estimated in one pass from the compressive Poisson measurements using penalized likelihood estimation).

The methods in this paper, along with related results in this area, are designed for settings in which the flow rates are sufficiently stationary, so that they can be accurately estimated in a fixed time window. Future directions include extending these approaches to a more realistic setting in which the flow rates evolve over time. In this case, the time window over which packets should be counted may be relatively short, but this can be mitigated by exploiting estimates of the flow rates in earlier time windows.

APPENDIX A
OBSERVATION MODELS IN POISSON INVERSE PROBLEMS

In (2) and all the subsequent analysis in this paper, we assume

$$ y \;\sim\; \mathrm{Poisson}(A f^*). $$

However, one might question how accurately this models the physical systems of interest, such as a photon-limited imaging system or a router. In particular, we may prefer to think of only a small number of events (e.g., photons or packets) being incident upon our system, and the system then rerouting those events to a detector. In this appendix, we compare the statistical properties of these two models. Let $N_{ij}$ denote the number of events traveling from location $i$ in the source to location $j$ on the detector. Also, in this appendix let us assume $A$ is a stochastic matrix, i.e., each column of $A$ sums to one; in general, most elements of $A$ are going to be less than one. Physically, this assumption means that every event incident on the system hits some element of the detector array. Armed with these assumptions, we can think of $A_{ji}$ as the probability of an event from location $i$ in $f^*$ being transmitted to location $j$ in the observation vector $y$.

We consider two observation models:

Model A: $y_j \sim \mathrm{Poisson}\big( (A f^*)_j \big)$, independently for $j = 1, \dots, m$;

Model B: $N_i \sim \mathrm{Poisson}(f_i^*)$, and $(N_{i1}, \dots, N_{im}) \,|\, N_i \sim \mathrm{Multinomial}\big( N_i;\, A_{1i}, \dots, A_{mi} \big)$, with $y_j = \sum_{i=1}^{n} N_{ij}$,

where in both models all the components of $y$ are mutually conditionally independent given the appropriate parameters. Model A roughly corresponds to the model we consider throughout the paper; Model B corresponds to considering Poisson realizations with intensity $f^*$ (denoted $N = (N_1, \dots, N_n)$) incident upon our system and then redirected to different detector elements via $A$. We model this redirection process with a multinomial distribution. While Model B is slightly different from Model A, the following analysis will provide valuable insight into discrete event counting systems.
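A quick Monte Carlo check of the equivalence argued below (our sketch; the intensities, matrix, and trial count are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.array([3.0, 1.0, 0.5])            # source intensities f*
A = np.array([[0.7, 0.2, 0.1],           # stochastic: columns sum to 1
              [0.3, 0.8, 0.9]])
m, n, trials = A.shape[0], A.shape[1], 20_000

# Model A: y ~ Poisson(A f*), componentwise.
yA = rng.poisson(A @ f, size=(trials, m))

# Model B: N_i ~ Poisson(f_i), then multinomial thinning through column i of A.
yB = np.zeros((trials, m), dtype=int)
for i in range(n):
    N = rng.poisson(f[i], size=trials)
    yB += np.array([rng.multinomial(N[t], A[:, i]) for t in range(trials)])

print(yA.mean(axis=0), yB.mean(axis=0))  # both approach A f*
print(yA.var(axis=0), yB.var(axis=0))    # Poisson: variance equals mean
```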

We now show that the distribution of $y$ is the same in Models A and B. First note that

$$\sum_{i} A_{ij} = 1 \qquad \text{and} \qquad (Af^*)_i = \sum_{j} A_{ij} f^*_j. \qquad (21)$$

Under Model A, the $Z_{ij}$ are independent Poisson random variables, so each $y_i = \sum_j Z_{ij}$ is itself Poisson with mean $\sum_j A_{ij} f^*_j$, and therefore

$$p(y \mid f^*) = \prod_{i} \frac{e^{-(Af^*)_i}\,\big((Af^*)_i\big)^{y_i}}{y_i!}, \qquad (22)$$



where in the last step we used (21) and the mutual independence of the $Z_{ij}$. Under Model B, we marginalize over the $N_j$, using the fact that the multinomial probability mass function of $\{Z_{ij}\}_i$ is nonzero only if $\sum_i Z_{ij} = N_j$ (and is zero otherwise); this yields

$$p(y \mid f^*) = \prod_{i} \frac{e^{-(Af^*)_i}\,\big((Af^*)_i\big)^{y_i}}{y_i!}. \qquad (23)$$

The computation leading to (23) again uses (21). Since (22) and (23) are the same, we have shown that Models A and B are statistically equivalent. While Model B may be more intuitively appealing based on our physical understanding of how these systems operate, using Model A for our analysis and algorithm development is just as accurate and mathematically more direct.
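The equivalence can also be checked empirically. The following self-contained sketch (an illustration only; the matrix, the intensities, and the trial count are arbitrary assumptions, not values from the paper) samples from both models and compares the resulting empirical distributions of $y$.

import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4           # detector elements x source locations (assumed)
trials = 200_000      # Monte Carlo sample size (assumed)

A = rng.random((m, n))
A /= A.sum(axis=0)    # column-stochastic: every event hits some detector

f_star = np.array([2.0, 5.0, 1.0, 3.0])  # source intensities (assumed)

# Model A: y_i ~ Poisson((A f*)_i), independently across detectors i.
y_a = rng.poisson(A @ f_star, size=(trials, m))

# Model B: N_j ~ Poisson(f*_j) events at source j, each routed to a
# detector according to column j of A (multinomial thinning).
y_b = np.zeros((trials, m), dtype=np.int64)
for j in range(n):
    N_j = rng.poisson(f_star[j], size=trials)
    y_b += rng.multinomial(N_j, A[:, j])

print("empirical means, Model A:", y_a.mean(axis=0))
print("empirical means, Model B:", y_b.mean(axis=0))
print("theoretical means (Af*): ", A @ f_star)
for k in range(4):  # compare the pmf of the first detector's count
    print(f"P(y_1={k}):  A {np.mean(y_a[:, 0] == k):.4f}"
          f"   B {np.mean(y_b[:, 0] == k):.4f}")

The agreement reflects the Poisson thinning property: independently routing $\mathrm{Poisson}(f^*_j)$ events with probabilities $A_{ij}$ produces independent $\mathrm{Poisson}(A_{ij} f^*_j)$ counts.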

APPENDIX B
PROOF OF PROPOSITION II.5

Let $u = \hat{f} - f$, let $S$ denote the positions of the $s$ largest (in magnitude) coordinates of $u$, and enumerate the complementary set $S^c$ as $\{i_1, i_2, \ldots\}$ in decreasing order of magnitude of $u_{i_t}$. Let us partition the set $S^c$ into adjacent blocks $S_1, S_2, \ldots$, such that all blocks (but possibly the last one) have size $s$. Also let $S_0 = S$, and let $E$ denote the edge set of the underlying bipartite expander graph. Let $A'$ be a submatrix of $A$ containing rows from $N(S)$, the set of neighbors of $S$. Then, following the argument of Berinde et al. [18], which also goes back to Sipser and Spielman [21], we have the following chain of inequalities:

$$\begin{aligned} \|Au\|_1 &\ge \|A'u\|_1 \ge \|A'u_S\|_1 - \sum_{l \ge 1} \|A'u_{S_l}\|_1 \\ &\ge d(1-2\epsilon)\,\|u_S\|_1 - \sum_{l \ge 1} \sum_{(i,j) \in E:\, j \in S_l,\, i \in N(S)} |u_j| \\ &\ge d(1-2\epsilon)\,\|u_S\|_1 - 2\epsilon d s \sum_{l \ge 1} \max_{j \in S_l} |u_j| \\ &\ge d(1-2\epsilon)\,\|u_S\|_1 - 2\epsilon d \sum_{l \ge 1} \|u_{S_{l-1}}\|_1 \\ &\ge d(1-2\epsilon)\,\|u_S\|_1 - 2\epsilon d\, \|u\|_1. \end{aligned}$$

Most of the steps are straightforward consequences of the definitions, the triangle inequality, or the RIP-1 property. The fourth inequality follows from the following fact. Since we are dealing with a $(2s, \epsilon)$-expander and since $|S \cup S_l| \le 2s$ for every $l$, we must have $|N(S \cup S_l)| \ge (1-\epsilon)\, d\, |S \cup S_l|$. Therefore, at most $2\epsilon d s$ edges can cross from each $S_l$ to $N(S)$. From the above estimate, we obtain

$$\|u_S\|_1 \le \frac{\|Au\|_1}{d(1-2\epsilon)} + \frac{2\epsilon}{1-2\epsilon}\,\|u\|_1. \qquad (24)$$
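For reference, the RIP-1 property invoked above (Lemma II.4), established for expander adjacency matrices by Berinde et al. [18], takes the following form when $A$ is the unscaled 0/1 adjacency matrix of an $(s, \epsilon)$-expander with left degree $d$; the normalization here follows [18] and may differ from the scaling used elsewhere in this paper:

$$d(1-2\epsilon)\,\|x\|_1 \;\le\; \|Ax\|_1 \;\le\; d\,\|x\|_1 \qquad \text{for every } s\text{-sparse } x.$$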

Using the assumption that $\|\hat{f}\|_1 \le \|f\|_1$, the triangle inequality, and the fact that $\hat{f} = f + u$, we obtain

$$\|f\|_1 \ge \|f + u\|_1 \ge \|f_S\|_1 - \|u_S\|_1 + \|u_{S^c}\|_1 - \|f_{S^c}\|_1,$$

which yields

$$\|u\|_1 = \|u_S\|_1 + \|u_{S^c}\|_1 \le 2\|u_S\|_1 + 2\|f_{S^c}\|_1.$$

Using (24) to bound $\|u_S\|_1$, we further obtain

$$\|u\|_1 \le \frac{2\,\|Au\|_1}{d(1-2\epsilon)} + \frac{4\epsilon}{1-2\epsilon}\,\|u\|_1 + 2\|f_{S^c}\|_1.$$

Rearranging this inequality completes the proof.
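As an aside, the RIP-1 lower bound is easy to probe numerically. The sketch below is illustrative only: the graph is a random d-left-regular bipartite graph, which is an expander only with high probability, and all sizes are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(2)
N, m, d, s = 1000, 200, 8, 10  # signal length, measurements, degree, sparsity

# Unscaled 0/1 adjacency matrix of a random d-left-regular bipartite graph.
A = np.zeros((m, N))
for j in range(N):
    A[rng.choice(m, size=d, replace=False), j] = 1.0

ratios = []
for _ in range(1000):
    x = np.zeros(N)
    x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
    ratios.append(np.abs(A @ x).sum() / (d * np.abs(x).sum()))

# RIP-1 predicts ratios in [1 - 2*eps, 1] for an (s, eps)-expander.
print("min ratio:", min(ratios), " max ratio:", max(ratios))

Because the graph is drawn at random, occasional ratios below the range predicted for a true $(s, \epsilon)$-expander are possible.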

APPENDIX C
TECHNICAL LEMMAS

Lemma C.1: Any candidate estimator $f$ satisfies the bound

Proof: From Lemma II.4, it follows that

(25)

Let and . Then

Page 13: IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. …maxim.ece.illinois.edu/pubs/raginsky_et-al_IEEE-TSP11.pdfMAP recovery methods with performance guarantees compa-rable to the

RAGINSKY et al.: PERFORMANCE BOUNDS FOR EXPANDER-BASED COMPRESSED SENSING IN POISSON NOISE 4151

The first and the second inequalities are by Cauchy–Schwarz, while the third inequality is a consequence of (25).

Lemma C.2: Let $\hat{f}$ be a minimizer in (6). Then

(26)

Proof: Using Lemma C.4 below, we have

Clearly

We now provide a bound for this expectation. Consider a minimizer of the penalized objective over the candidate set; by the definition of this minimizer, we have

for every candidate. Consequently,

We can split the quantity

into three terms:

We show that the third term is always nonpositive, which completes the proof. Using Jensen's inequality,

Now

Since , we obtain

which proves the lemma.

Lemma C.3: If the estimators in the candidate set satisfy condition (9), then the following inequality holds:

Proof: By definition of the KL divergence,

The first inequality uses , the second is by (9), the third uses the fact that the $\ell_1$ norm dominates the $\ell_2$ norm, and the last one is by the RIP-1 property (Lemma II.4).

Lemma C.4: Given two Poisson parameter vectors $\lambda$ and $\mu$, the following equality holds:

$$\log \frac{p_{\lambda}(y)}{p_{\mu}(y)} = \sum_{i} \Big[ \mu_i - \lambda_i + y_i \log \frac{\lambda_i}{\mu_i} \Big],$$



where $p_{\lambda}$ and $p_{\mu}$ denote the corresponding probability mass functions with respect to the counting measure on $\mathbb{Z}_+^m$.

Proof:

$$\frac{p_{\lambda}(y)}{p_{\mu}(y)} = \prod_{i} \frac{e^{-\lambda_i} \lambda_i^{y_i} / y_i!}{e^{-\mu_i} \mu_i^{y_i} / y_i!} = \prod_{i} e^{\mu_i - \lambda_i} \Big( \frac{\lambda_i}{\mu_i} \Big)^{y_i}.$$

Taking logs, we obtain the lemma.
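As a quick numerical sanity check (illustrative only; the parameter values are arbitrary assumptions, and scipy's Poisson pmf is used for the left-hand side), the identity in Lemma C.4 can be verified directly, and taking expectations under $p_{\lambda}$ (so that each $y_i$ is replaced by $\lambda_i$) recovers the familiar closed form for the KL divergence between product Poisson distributions.

import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
lam = np.array([2.0, 0.5, 4.0])  # assumed Poisson parameter vectors
mu = np.array([1.0, 1.5, 3.0])

y = rng.poisson(lam)  # one draw from Poisson(lam)

# Left side: log-likelihood ratio from the pmfs; right side: Lemma C.4.
lhs = poisson.logpmf(y, lam).sum() - poisson.logpmf(y, mu).sum()
rhs = np.sum(mu - lam + y * np.log(lam / mu))
print("identity holds:", np.isclose(lhs, rhs))

# Taking expectations under Poisson(lam) (so E[y_i] = lam_i) gives the
# KL divergence between the two product Poisson distributions:
kl = np.sum(mu - lam + lam * np.log(lam / mu))
print("D(Poisson(lam) || Poisson(mu)) =", kl)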

ACKNOWLEDGMENT

The authors would like to thank P. Indyk for his insightful comments on the performance of the expander graphs, and the anonymous referees whose constructive criticism and numerous suggestions helped improve the quality of the paper.

REFERENCES

[1] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[2] E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006.

[3] D. Snyder, A. Hammond, and R. White, “Image recovery from data acquired with a charge-coupled-device camera,” J. Opt. Soc. Amer. A, vol. 10, pp. 1014–1023, 1993.

[4] D. Bertsekas and R. Gallager, Data Networks. Englewood Cliffs, NJ: Prentice-Hall, 1992.

[5] I. Rish and G. Grabarnik, “Sparse signal recovery with exponential-family noise,” in Proc. Allerton Conf. Commun., Control, Comput., 2009, pp. 60–66.

[6] L. Jacques, D. K. Hammond, and M. J. Fadili, “Dequantizing compressed sensing with non-Gaussian constraints,” in Proc. ICIP, 2009, pp. 1465–1468.

[7] R. E. Carrillo, K. E. Barner, and T. C. Aysal, “Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise,” IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 392–408, 2010.

[8] J. N. Laska, M. A. Davenport, and R. G. Baraniuk, “Exact signal recovery from sparsely corrupted measurements through the pursuit of justice,” in Proc. 43rd Asilomar Conf. Signals, Syst., Comput., 2009, pp. 1556–1560.

[9] R. Willett and M. Raginsky, “Performance bounds on compressed sensing with Poisson noise,” in Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, Jun./Jul. 2009, pp. 174–178.

[10] M. Raginsky, Z. Harmany, R. Marcia, and R. Willett, “Compressed sensing performance bounds under Poisson noise,” IEEE Trans. Signal Process., vol. 58, pp. 3990–4002, Aug. 2010.

[11] E. Candès and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies,” IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.

[12] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, 2006.

[13] S. Hoory, N. Linial, and A. Wigderson, “Expander graphs and their applications,” Bull. Amer. Math. Soc. (New Series), vol. 43, pp. 439–561, 2006.

[14] R. Berinde and P. Indyk, “Sparse recovery using sparse random matrices,” MIT, Cambridge, MA, Tech. Rep., 2008.

[15] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank, “Efficient and robust compressed sensing using optimized expander graphs,” IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4299–4308, Sep. 2009.

[16] R. Berinde, P. Indyk, and M. Ruzic, “Practical near-optimal sparse recovery in the $\ell_1$ norm,” in Proc. 46th Annu. Allerton Conf. Commun., Control, Comput., 2008, pp. 198–205.

[17] J. Tropp and A. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[18] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, “Combining geometry and combinatorics: A unified approach to sparse signal recovery,” in Proc. 46th Annu. Allerton Conf. Commun., Control, Comput., Sep. 2008, pp. 798–805.

[19] P. Indyk and M. Ruzic, “Near-optimal sparse recovery in the $\ell_1$ norm,” in Proc. 49th Annu. IEEE Symp. Found. Comput. Sci. (FOCS), 2008, pp. 199–207.

[20] N. Alon and J. Spencer, The Probabilistic Method. New York: Wiley-Interscience, 2000.

[21] M. Sipser and D. Spielman, “Expander codes,” IEEE Trans. Inf. Theory, vol. 42, no. 6, pp. 1710–1722, 1996.

[22] V. Guruswami, J. Lee, and A. Razborov, “Almost Euclidean subspaces of $\ell_1^N$ via expander codes,” in Proc. 19th Annu. ACM-SIAM Symp. Discrete Algorithms (SODA), Jan. 2008, pp. 353–362.

[23] V. Guruswami, C. Umans, and S. Vadhan, “Unbalanced expanders and randomness extractors from Parvaresh-Vardy codes,” in Proc. IEEE Conf. Comput. Complexity (CCC), 2007, pp. 96–108.

[24] F. Parvaresh and A. Vardy, “Correcting errors beyond the Guruswami-Sudan radius in polynomial time,” in Proc. 46th Annu. IEEE Symp. Found. Comput. Sci. (FOCS), 2005, pp. 285–294.

[25] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006.

[26] P. McCullagh and J. Nelder, Generalized Linear Models, 2nd ed. London, U.K.: Chapman & Hall, 1989.

[27] L. Wasserman, All of Statistics: A Concise Course in Statistical Inference. New York: Springer, 2003.

[28] Z. Harmany, R. Marcia, and R. Willett, “Sparse Poisson intensity reconstruction algorithms,” in Proc. IEEE Stat. Signal Process. Workshop, 2009, pp. 634–637.

[29] C. Estan and G. Varghese, “New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice,” ACM Trans. Comput. Syst., vol. 21, no. 3, pp. 270–313, 2003.

[30] Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar, and A. Kabbani, “Counter braids: A novel counter architecture for per-flow measurement,” in Proc. ACM SIGMETRICS, 2008, pp. 121–132.

[31] W. Fang and L. Peterson, “Inter-AS traffic patterns and their implications,” in Proc. IEEE GLOBECOM, 1999, pp. 1859–1868.

[32] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True, “Deriving traffic demands for operational IP networks: Methodology and experience,” IEEE/ACM Trans. Networking, pp. 265–279, Jun. 2001.

[33] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, “Combining geometry and combinatorics: A unified approach to sparse signal recovery,” in Proc. Allerton Conf., Sep. 2008, pp. 798–805.

[34] M. A. Khajehnejad, A. G. Dimakis, W. Xu, and B. Hassibi, “Sparse recovery of nonnegative signals with minimal expansion,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 196–208, Jan. 2011.

[35] E. Candès and J. Romberg, $\ell_1$-MAGIC: Recovery of Sparse Signals via Convex Programming, 2005 [Online]. Available: http://www.acm.caltech.edu/l1magic

[36] R. Berinde and P. Indyk, “Sequential sparse matching pursuit,” in Proc. 47th Annu. Allerton Conf. Commun., Control, Comput., 2009, pp. 36–43.

[37] M. Grant and S. Boyd, CVX: Matlab Software for Disciplined Convex Programming, ver. 1.21, Jan. 2011 [Online]. Available: http://cvxr.com/cvx

[38] M. Grant and S. Boyd, “Graph implementations for nonsmooth convex programs,” in Recent Advances in Learning and Control, ser. Lecture Notes in Control and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds. New York: Springer-Verlag, 2008, pp. 95–110.



Maxim Raginsky (S’99–M’00) received the B.S. and M.S. degrees in 2000 and the Ph.D. degree in 2002 from Northwestern University, Evanston, IL, all in electrical engineering.

From 2002 to 2004, he was a Postdoctoral Researcher at the Center for Photonic Communication and Computing at Northwestern University, where he pursued work on quantum cryptography and quantum communication and information theory. From 2004 to 2007, he was a Beckman Foundation Postdoctoral Fellow at the University of Illinois, Urbana-Champaign, where he carried out research on information theory, statistical learning, and computational neuroscience. Since September 2007, he has been with Duke University, Durham, NC, where he is now Assistant Research Professor of electrical and computer engineering. His interests include statistical signal processing, information theory, statistical learning, and nonparametric estimation. He is particularly interested in problems that combine the communication, signal processing, and machine learning components in a novel and nontrivial way, as well as in the theory and practice of robust statistical inference with limited information.

Sina Jafarpour received the B.Sc. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 2007. He is working towards the Ph.D. degree in the Computer Science Department of Princeton University, Princeton, NJ, under the supervision of Prof. R. Calderbank and Prof. R. Schapire.

His main research interests include compressed sensing and applications of machine learning in image processing, multimedia, and information retrieval. He has been a member of the Van Gogh project supervised by Prof. I. Daubechies since fall 2008.

Zachary T. Harmany (S’02) received the B.S. degree (magna cum laude) in electrical engineering and the B.S. degree (cum laude) in physics in 2006 from Pennsylvania State University, University Park. He is currently working towards the Ph.D. degree in the Department of Electrical and Computer Engineering at Duke University, Durham, NC.

In 2010, he was a visiting researcher at the University of California, Merced. His research interests include nonlinear optimization, statistical signal processing, learning theory, and image processing with applications in functional neuroimaging, medical imaging, astronomy, and night vision.

Mr. Harmany is a member of SIAM and SPIE.

Roummel F. Marcia (M’08) received the B.A. degree in mathematics from Columbia University in 1995 and the Ph.D. degree in mathematics from the University of California, San Diego, in 2002. He was a Computation and Informatics in Biology and Medicine Postdoctoral Fellow in the Biochemistry Department at the University of Wisconsin-Madison and a Research Scientist in the Department of Electrical and Computer Engineering at Duke University. He is currently an Assistant Professor of Applied Mathematics at the University of California, Merced. His research interests include nonlinear optimization, numerical linear algebra, signal and image processing, and computational biology.

Rebecca M. Willett (S’01–M’05) received the Ph.D. degree in electrical and computer engineering from Rice University, Houston, TX, in 2005.

She is an Assistant Professor in the Electrical and Computer Engineering Department at Duke University, Durham, NC. She has also held visiting researcher positions at the Institute for Pure and Applied Mathematics at the University of California, Los Angeles (UCLA), in 2004, the University of Wisconsin-Madison from 2003 to 2005, the French National Institute for Research in Computer Science and Control (INRIA) in 2003, and the Applied Science Research and Development Laboratory at GE Healthcare in 2002. Her research interests include network and imaging science with applications in medical imaging, wireless sensor networks, astronomy, and social networks. Additional information, including publications and software, is available online at http://www.ee.duke.edu/~willett/.

Prof. Willett received the National Science Foundation CAREER Award in 2007, is a member of the DARPA Computer Science Study Group, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010.

Robert Calderbank (M’89–SM’97–F’94) received the B.Sc. degree from Warwick University, U.K., in 1975, the M.Sc. degree from Oxford University, U.K., in 1976, and the Ph.D. degree from the California Institute of Technology, Pasadena, in 1980, all in mathematics.

At the start of his career at Bell Laboratories, his innovations were incorporated in a progression of voiceband modem standards that moved communications practice close to the Shannon limit. Together with P. Shor and colleagues at AT&T Laboratories, he showed that good quantum error correcting codes exist and developed the group theoretic framework for quantum error correction. He was Vice-President for Research at AT&T, responsible for directing the first industrial research laboratory in the world where the primary focus is data at scale. He was a Professor of Electrical Engineering and Mathematics at Princeton University, Princeton, NJ, where he directed the Program in Applied and Computational Mathematics. He is currently Dean of Natural Sciences at Duke University, Durham, NC. He is a co-inventor of space-time codes for wireless communication, where correlation of signals across different transmit antennas is the key to reliable transmission.

Dr. Calderbank served as Editor-in-Chief of the IEEE TRANSACTIONS ON INFORMATION THEORY from 1995 to 1998, and as Associate Editor for Coding Techniques from 1986 to 1989. He was a member of the Board of Governors of the IEEE Information Theory Society from 1991 to 1996 and from 2006 to 2008. He was honored by the IEEE Information Theory Prize Paper Award in 1995 for his work on the $\mathbb{Z}_4$ linearity of Kerdock and Preparata codes (jointly with A. R. Hammons, Jr., P. V. Kumar, N. J. A. Sloane, and P. Solé), and again in 1999 for the invention of space-time codes (jointly with V. Tarokh and N. Seshadri). He received the 2006 IEEE Donald G. Fink Prize Paper Award and the IEEE Millennium Medal. He was elected to the U.S. National Academy of Engineering in 2005.

