
1030 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 63, NO. 4, FEBRUARY 15, 2015

Compressive Two-Dimensional Harmonic Retrieval via Atomic Norm Minimization

Yuejie Chi, Member, IEEE, and Yuxin Chen, Student Member, IEEE

Abstract—This paper is concerned with estimation of two-dimensional (2-D) frequencies from partial time samples, which arises in many applications such as radar, inverse scattering, and super-resolution imaging. Suppose that the object under study is a mixture of continuous-valued 2-D sinusoids. The goal is to identify all frequency components when we only have information about a random subset of regularly spaced time samples. We demonstrate that under some mild spectral separation condition, it is possible to exactly recover all frequencies by solving an atomic norm minimization program, as long as the sample complexity exceeds the order of . We then propose to solve the atomic norm minimization via a semidefinite program and provide numerical examples to justify its practical ability. Our work extends the framework proposed by Tang et al. for line spectrum estimation to 2-D frequency models.

Index Terms—Atomic norm, basis mismatch, continuous-valued frequency recovery, sparsity.

I. INTRODUCTION

THE problem of estimating two-dimensional (2-D) spectrum is encountered in a variety of signal processing applications. For instance, the multi-dimensional frequency model naturally arises in several operational scenarios in multiple-input multiple-output (MIMO) radars [2], where multiple components of each frequency correspond respectively to the direction of arrival, direction of departure, and Doppler shift of a scatterer. Retrieving these parameters is of great importance for localization and tracking of targets [3]. A second application concerns channel sensing in wireless communications, where accurate estimation of channel state information is crucial for coherent detection in order to ensure high data rate. Physical arguments and a growing body of experimental evidence suggest that the number of significant paths in a wireless channel

Manuscript received March 25, 2014; revised September 25, 2014; accepted December 19, 2014. Date of publication December 24, 2014; date of current version January 23, 2015. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Akbar Sayeed. This paper has been presented in part at the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2013 [1]. The work of Y. Chi was supported in part by the Ralph E. Powe Junior Faculty Enhancement Award from the Oak Ridge Associated Universities.

Y. Chi is with Department of Electrical and Computer Engineering and the Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210 USA.

Y. Chen is with Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2014.2386283

is typically small [4], [5]. Each path is specified by a triple of time delay, Doppler shift and attenuation, and can be mapped to a multi-dimensional frequency. Another example is super-resolution imaging [6], where any 2-D point source translates into a 2-D complex sinusoid after passing through a Fourier imaging system.

One of the essential goals in various applications is to minimize the number of samples required to recover the underlying frequencies. Take wireless communications as an example, where training pilots are transmitted and extracted from the received signal to estimate the channel. The smaller the number of pilots, the higher the data rate. Conventional channel estimation methods are often based on linear least-squares estimators [7], which require the sample size to be greater than the dimensionality of the signal space determined by the maximal time delay and Doppler shift. To reduce the required sample size, conventional approaches are often based on parametric representation, which directly estimate 2-D frequencies via super-resolution methods such as 2-D unitary ESPRIT [8], 2-D MUSIC [9], Clark and Scharf’s IQML method [10], the Matrix Enhancement Matrix Pencil (MEMP) method [11], etc. However, many of these approaches require equi-spaced time-domain samples. They also rely on prior knowledge of the model order, that is, the number of sinusoids. Moreover, these methods are often sensitive to model order mismatch and noise.

Pioneered by the work of Candès et al. [12] and Donoho [13], Compressive Sensing (CS) suggests that it is possible to recover a spectrally sparse signal from highly incomplete time-domain samples. Specifically, consider a time-domain signal of ambient dimension , composed of distinct 2-D complex sinusoids. If the frequencies of the sinusoids lie approximately on the fine DFT grid of the normalized frequency plane [0,1) × [0,1), the signal of interest can be sparsely represented over the DFT basis. It has been demonstrated that the signal can be recovered from a random subset of time-domain samples with a sample size of [14] via basis pursuit [15] or greedy pursuit [16]. The success of CS has inspired a large body of algorithm and system design enabling sub-Nyquist sampling, notably for compressive channel sensing [17], [18], high-resolution radar [19], [20], and multi-user detection [21]. Caution needs to be exercised, however, when approximating the continuous-valued frequencies over a discrete (DFT) grid, since the signal of interest often contains off-the-grid components and might not enjoy a good sparse approximation over the discrete basis. This effect has been studied in great detail in [22], revealing considerable performance degradation of conventional CS algorithms when applied to off-the-grid signals. Several possible remedies have been suggested ever since, see for example

1053-587X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


[23]–[27]. Nevertheless, a grid is still assumed in these remedies and therefore the continuous-valued frequencies cannot be recovered perfectly. In addition, there seems to be little theoretical understanding of these approaches.

Several recent works have been proposed to deal directly with continuous-valued frequencies without imposing a discrete dictionary. In the pioneering work of [28], Candès and Fernandez-Granda prove that perfect frequency extrapolation is possible from partial low-end time samples by solving a total-variation minimization program. The analysis therein readily extends to multi-dimensional frequency models. Tang et al. [29] investigate the problem of 1-D spectral estimation when one is given randomly observed time-domain samples, and prove that atomic norm minimization [30] succeeds with samples, assuming that the wrap-around distance between distinct frequencies is at least . More precisely, the atomic norm proposed by Chandrasekaran et al. [30] is a general recipe for developing convex optimization solutions for model selection, where the goal is to minimize the number of selected atoms for a given parsimonious model. Many well-known problems can be treated as a special case of atomic norm minimization, including $\ell_1$-minimization for sparse recovery where the atoms are unit-norm one-sparse vectors, nuclear norm minimization for low-rank matrix completion where the atoms are unit-norm rank-one matrices, and so on. For spectrally sparse signals, the atoms are Vandermonde vectors with a continuous-valued frequency in [0,1). It is worth noting that the atomic norm for spectrally sparse signals is equivalent to the total-variation norm adopted in [28], [31]. Another line of work has approached the multi-dimensional harmonic retrieval problem via Enhanced Matrix Completion (EMaC) [32], [33], namely, to perform nuclear norm minimization over multi-fold Hankel matrices constructed from the time-domain samples. This algorithm is guaranteed to work from random samples, provided that the signal model obeys some mild incoherence properties.

In this paper, we extend the atomic norm minimization approach by Tang et al. [29] to 2-D frequency models. When the sample size exceeds the order of , the proposed atomic norm minimization algorithm is guaranteed to perfectly recover all 2-D frequency components with high probability, under a mild frequency separation condition. The proof is inspired by [29] and [28], that is, to construct a dual polynomial certifying the optimality of the solution to the corresponding convex program. We then propose to solve the atomic norm minimization problem via semidefinite programming (SDP), which can be performed tractably using off-the-shelf SDP solvers. However, unlike the case in the 1-D model [29], the equivalence between the atomic norm minimization and our proposed SDP is not guaranteed, primarily because Carathéodory’s theorem [34] does not hold in higher dimensions. Instead, we validate the effectiveness of the proposed SDP through numerical examples, and its noise robustness is also examined. After the conference version [1] of this paper was published, Xu et al. developed a precise SDP characterization [35] of the 2-D atomic norm minimization based on the theory of positive trigonometric polynomials [36], where our proposed SDP can be regarded as a first-order relaxation in their sum-of-squares relaxation hierarchy.

The rest of the paper is organized as follows. In Section II, we formulate the problem and review related literature. Section III presents the proposed atomic norm minimization algorithm along with its performance guarantee, whose proof is deferred to the Appendix. Section IV introduces a semidefinite program to approximate the original atomic norm minimization. Numerical experiments are supplied in Section V to validate the practical applicability of our algorithm. Finally we conclude in Section VI with a summary of our findings. Throughout the paper, we use and to denote the transpose and the conjugate transpose, respectively.

II. PROBLEM FORMULATION AND RELATED WORK

A. Problem Formulation

Without loss of generality, consider a 2-D square data matrix of size , where

denotes the union of the indices of . This assumption is imposed to simplify the development of the theoretical guarantees, and can be removed with minor modifications; see [29] for a similar treatment. Each entry of can be expressed as a superposition of complex sinusoids observed at the time index , i.e.

(1)

where represents the complex amplitude associated with each. Let

be the set of distinct frequencies. For notational simplicity,we introduce the following unit-norm atoms:

where , and . This allows us to write in a matrix form as follows

(2)

where is given by

(3)

(4)

and

(5)

Denote by the vectorized data matrix, then one has

(6)


where represents Kronecker product, and

satisfying .

In this paper, we assume that entries of are observed uniformly at random. Specifically, denote by the index set such that is observed if and only if . Define the operator such that represents the orthogonal projection of onto the subspace of matrices supported on . We shall abuse the notation, without ambiguity, to let , and represent the set of observed entries, all entries, and the observation operator with respect to the vectorized signal as well.

The primary focus of this paper is to recover the unobserved entries of the original data matrix . We note that the frequencies can also be recovered using conventional approaches such as the MEMP method [11] once the data matrix is recovered.
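The signal model and observation scheme described in this subsection can be sketched numerically as follows. This is only an illustration under stated assumptions: the names `atom`, `make_data`, `freqs`, `amps`, and the sizes `n`, `r`, `m` are hypothetical, the vectorization is taken row-major (the paper's own convention may order the Kronecker factors differently), and the observed index set is drawn uniformly at random without replacement.

```python
import numpy as np

def atom(f, n):
    """Unit-norm 1-D complex sinusoid of length n at normalized frequency f."""
    return np.exp(2j * np.pi * f * np.arange(n)) / np.sqrt(n)

def make_data(freqs, amps, n):
    """Build the n x n data matrix as a sum of rank-one 2-D sinusoids."""
    X = np.zeros((n, n), dtype=complex)
    for (f1, f2), d in zip(freqs, amps):
        X += d * np.outer(atom(f1, n), atom(f2, n))
    return X

rng = np.random.default_rng(0)
n, r, m = 8, 3, 30                      # illustrative sizes (assumed)
freqs = rng.random((r, 2))              # frequency pairs in [0,1) x [0,1)
amps = np.exp(2j * np.pi * rng.random(r))

X = make_data(freqs, amps, n)

# Row-major vectorization equals a sum of Kronecker products of the atoms.
x = X.reshape(-1)
x_kron = sum(d * np.kron(atom(f1, n), atom(f2, n))
             for (f1, f2), d in zip(freqs, amps))

# Observe m entries chosen uniformly at random; zero out the rest.
omega = rng.choice(n * n, size=m, replace=False)
mask = np.zeros(n * n, dtype=bool)
mask[omega] = True
P_X = np.where(mask.reshape(n, n), X, 0)
```

Here `P_X` plays the role of the projected data matrix: it agrees with `X` on the observed index set and is zero elsewhere.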

B. Conventional CS Approach

To apply conventional CS paradigms, we represent as a sparse signal in a pre-determined basis by discretizing the 2-D plane [0,1) × [0,1) with grid points , where

. Write the resulting DFT basis as

where is a DFT matrix of dimension . The vectorized signal can then be represented using as

(7)

where is approximately sparse. CS suggests that we could recover using the $\ell_1$-minimization as

where the minimizer is returned as an estimate of . The major issue with the above approach is that the frequencies never lie perfectly on the grid , resulting in an inevitable mismatch between the true frequencies and the discrete grid. It has been demonstrated in [22] that the performance of sparse recovery algorithms can degrade considerably. In this paper we will adopt a different approach and attempt to recover the frequencies directly without imposing a grid.
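The basis mismatch described above can be seen in a small 1-D experiment. The sketch below is not taken from the paper; it is a minimal illustration, with the hypothetical helper `dft_support_size` and the arbitrary choices `n = 32` and a half-bin offset, of how a single sinusoid that sits between two DFT grid points loses its sparse representation.

```python
import numpy as np

def dft_support_size(f, n, tol=1e-8):
    """Count DFT coefficients of a length-n sinusoid at frequency f that exceed tol."""
    x = np.exp(2j * np.pi * f * np.arange(n))
    c = np.fft.fft(x) / n               # normalized DFT coefficients
    return int(np.sum(np.abs(c) > tol))

n = 32
on_grid = dft_support_size(5 / n, n)     # frequency exactly on the DFT grid
off_grid = dft_support_size(5.5 / n, n)  # frequency halfway between grid points
```

With these choices the on-grid sinusoid occupies a single DFT bin, while the off-grid one leaks into every bin, which is why sparsity-based recovery over a fixed grid can degrade.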

III. ATOMIC NORM MINIMIZATION FOR 2-D HARMONIC RETRIEVAL

The atomic norm is proposed in [30] as a general recipe for designing convex optimization solutions for model selection, by convexifying the atomic set of the parsimonious models. The atomic set of a signal model is defined as the simplest building blocks of the signal, such as unit-norm one-sparse vectors for sparse recovery, unit-norm rank-one matrices for low-rank matrix completion, and so on. Interested readers are referred to [30] for a detailed discussion about the atomic norm. In the case of 2-D harmonic retrieval, it is straightforward to define the atomic set as the collection of all normalized 2-D complex sinusoids:

and the atomic norm for a signal as

(8)

This is obtained by convexifying the atomic representation of using the smallest number of 2-D frequency spikes:

The above definition generalizes the atomic norm for 1-D harmonic signals in [29] and allows one to accommodate higher dimensions. Given partial observations of (or equivalently ), we attempt recovery via the following atomic norm minimization program

(9)

namely, to seek a signal with minimal atomic norm satisfying the observation constraints. This approach is adopted in [29] for line spectrum estimation when the set of atoms is . In [29], it is shown that a random subset containing samples can ensure exact frequency recovery under a mild frequency separation condition.

The following theorem establishes that similar performance guarantees hold in the 2-D case, namely, the proposed algorithm (9) recovers the true data perfectly under a properly defined separation condition, provided that the sample complexity exceeds the order of .

Theorem 1: Let . Suppose that we observe samples of a data matrix in (1) on the index set of size uniformly at random, where . Suppose that the signs of ’s are i.i.d. and uniformly drawn from , and the minimum separation between ’s satisfies

(10)

where , are the wrap-around distances on the unit circle. Then there exists a numerical constant such that if

(11)

then the solution to (9) is exact and unique with probability at least . The same results hold with a different constant in (10) when the signs of ’s are i.i.d. uniformly generated on the complex unit circle.

The proof can be found in Appendix B. Theorem 1 suggests that as long as the frequencies are minimally separated as in (10), the recovery via atomic norm minimization is exact once is on the order of . This orderwise bound agrees with the performance guarantee for line spectrum estimation as derived in [29].

We compare Theorem 1 with conventional subspace methods such as ESPRIT. ESPRIT is able to recover the underlying frequencies from consecutive samples of the data matrix . The number of samples required for exact recovery depends only on the underlying degrees of freedom irrespective of the ambient dimension of . In contrast, the proposed algorithm (9) assumes random subsampling of the data matrix and requires a slightly higher sample complexity of about the order of . Moreover, in the absence of noise, ESPRIT allows recovery without imposing a separation condition like (10). Note, however, that a separation condition is necessary when noise is present, as detailed in [28], [37]. We will demonstrate through numerical examples that the proposed algorithm (9) is stable under noisy observations as well.

We also compare Theorem 1 with standard results in CS [14]. When the frequencies in are indeed on the DFT grid, CS allows recovery of complex sinusoids from a number of samples. The proposed algorithm (9) can be regarded as a remedy of CS for targets off the grid with slightly larger sample complexity.
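The separation condition in Theorem 1 is stated in terms of wrap-around distances on the unit circle. The sketch below shows one way such a quantity could be computed for a set of 2-D frequencies. The exact form of (10) is not legible in this copy, so the rule used here (for each pair of frequencies take the larger of the two coordinate-wise wrap-around distances, then minimize over pairs) is an assumption, and the function names `wrap_dist` and `min_separation` are hypothetical.

```python
import numpy as np

def wrap_dist(a, b):
    """Wrap-around distance between frequencies a and b on the unit circle [0,1)."""
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) % 1.0
    return np.minimum(d, 1.0 - d)

def min_separation(freqs):
    """Minimum over pairs of the larger coordinate-wise wrap-around distance.

    The max-over-coordinates rule is an assumption made for illustration.
    """
    freqs = np.asarray(freqs, dtype=float)
    best = np.inf
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            best = min(best, float(np.max(wrap_dist(freqs[i], freqs[j]))))
    return best

# 0.1 and 0.9 are only 0.2 apart once the circle wraps around.
sep = min_separation([[0.1, 0.1], [0.9, 0.15]])
```

A routine of this kind can be used to reject randomly drawn frequency sets until a desired separation is met, in the spirit of the experiments in Section V.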

IV. APPROXIMATE SEMIDEFINITE PROGRAM TO SOLVE ATOMIC NORM MINIMIZATION

Theorem 1 indicates that solving the atomic norm minimization problem (9) allows perfect recovery of the data matrix from only a small number of its time samples. However, a natural question arises as to how to solve (9) in a tractable manner. Unfortunately, the exact semidefinite programming characterization of atomic norm minimization in the line spectrum case, as proposed in [29], cannot be extended straightforwardly to 2-D models, due to the fundamental difficulty of generalizing the classical Carathéodory’s theorem [34] to higher dimensions. Nonetheless, in this section we propose a semidefinite program to approximately solve (9), which exhibits excellent empirical performance in Section V. We also provide a sufficient condition when the proposed semidefinite program returns the solution to (9).

We describe the algorithm in the general case when the dimension of is , which is not necessarily square. We will still assume that satisfies (2), but slightly abuse notations by letting

with , and

with , wherever they are clear from context.

Before presenting the algorithm, we first define a matrix enhancement using two-fold Toeplitz structures. Given a matrix with and , we define an block Toeplitz matrix from as

(12)

where each block is an Toeplitz matrix defined from the th row of :

We use to represent the corresponding two-fold block Toeplitz matrix constructed from . It is straightforward to verify that for any , an atom in the form of forms a two-fold block Toeplitz matrix. The following proposition presents a semidefinite program that allows approximation of the atomic norm .

Proposition 1: Let be an matrix and . Denote

(13)

and let the objective value under . Then we have

Furthermore, if can be written as

(14)

where

with ’s being real and positive values, then .

Proof: Let , where , then

where . This indicates that

Moreover, if the optimal and satisfy

then we have and by Schur complement condition. If we can write , then falls within the column space of or, equivalently, for some vector . Let be any vector such that , where is the sign vector of , then

This implies that

(15)

which is equivalent to . Therefore we have .

We propose to approximate the atomic norm minimization algorithm in (9) via the following semidefinite program

(16)

Unlike the 1-D algorithm proposed in [29], since it is not guaranteed that can be written in the form of (14), the semidefinite program formulation (16) is in general not guaranteed to be equivalent to (9).

Although the equivalence between (16) and (9) is not ensured, we can establish that if is not greater than for certain matrix (in general it could be as large as ), it can indeed be written uniquely in the form of (14). This is characterized in the following proposition, whose proof is deferred to Appendix H.

Proposition 2: If , then is PSD if and only if it can be represented as (14).

From the above proposition, it is straightforward that if the solution to (16) satisfies , the semidefinite characterization of (16) is exact.

Remark 1: The dual problem of (16) can be written as

where , is the Kronecker product of the symmetric Toeplitz matrix generated by the -th standard basis vector, and the symmetric Toeplitz matrix generated by the -th standard basis vector; and if , and otherwise. This is exactly the first-order relaxation in the sum-of-squares relaxation hierarchy proposed in [35] for the precise SDP characterization of (19). Therefore, one can also employ the checking mechanism proposed in [35] to determine if (16) is exact.
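The two-fold block Toeplitz structure that underlies (12)–(16) can be checked numerically for a single atom. The sketch below is an illustration rather than the paper's construction: it forms the rank-one matrix built from one 2-D sinusoid (unnormalized, with row-major Kronecker ordering, both assumptions) and verifies that each entry depends only on the pair of index differences, which is what block Toeplitz with Toeplitz blocks means. The helper names `atom2d` and `is_two_fold_block_toeplitz` are hypothetical.

```python
import numpy as np

def atom2d(f1, f2, n1, n2):
    """Vectorized (row-major) 2-D complex sinusoid of size n1 x n2, unnormalized."""
    a1 = np.exp(2j * np.pi * f1 * np.arange(n1))
    a2 = np.exp(2j * np.pi * f2 * np.arange(n2))
    return np.kron(a1, a2)

def is_two_fold_block_toeplitz(T, n1, n2, tol=1e-10):
    """True if T[(i,j),(k,l)] depends only on the differences (i-k, j-l)."""
    B = T.reshape(n1, n2, n1, n2)       # B[i, j, k, l] = T[i*n2 + j, k*n2 + l]
    ref = {}
    for i in range(n1):
        for j in range(n2):
            for k in range(n1):
                for l in range(n2):
                    key = (i - k, j - l)
                    v = B[i, j, k, l]
                    if key in ref and abs(ref[key] - v) > tol:
                        return False
                    ref.setdefault(key, v)
    return True

n1, n2 = 3, 4
c = atom2d(0.21, 0.67, n1, n2)
T = np.outer(c, c.conj())               # rank-one PSD matrix built from one atom

atom_is_toeplitz = is_two_fold_block_toeplitz(T, n1, n2)

# A generic symmetric matrix does not have this structure.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
random_is_toeplitz = is_two_fold_block_toeplitz(A + A.T, 2, 3)
```

Nonnegative combinations of such rank-one matrices keep the same structure and remain positive semidefinite, which is the kind of decomposition that (14) refers to.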

V. NUMERICAL SIMULATIONS

We present numerical examples to verify the performance of the proposed algorithm (16) for a data matrix of size .

In the first example, let . We randomly generated frequency pairs in [0,1) × [0,1), with

where the coefficient of each frequency was generated with constant magnitude one and a random phase from . In typical applications of interest such as radar or channel estimation, these frequency pairs correspond to delay, Doppler and amplitudes of the scatterers. The actual frequency locations are depicted in Fig. 1(a). Each entry in was observed with probability , with , which can be collected using the sub-Nyquist sampling framework described in [38]. We then implemented (16) using CVX [39]. Notice that the number of unknown parameters was . Fig. 1(b) shows the recovered frequency locations using basis pursuit (BP) by assuming the signal is sparse in a DFT basis, and Fig. 1(c) shows the recovered frequency locations using BP by assuming the signal is sparse in a DFT frame oversampled by a factor of 4. Finally, the recovered frequency locations using MEMP [11] from the data matrix recovered from (16) are depicted in Fig. 1(d), superimposed on the ground truth. The reconstruction is perfect using the proposed approach when the data is noise-free.

We also examine the phase transition of the proposed algorithm (16). Let . For each pair of and the number of modes , we ran 10 experiments, where in each experiment complex sinusoids (a) are generated randomly, or (b) generated randomly until a separation condition of is satisfied. The recovery was claimed successful if the normalized mean squared error (NMSE) , where was the reconstructed data. Fig. 2 shows the success rate for each pair of and , with the grayscale of each cell reflecting the empirical rate of success, for the two cases described above respectively in (a) and (b). Fig. 2(b) has a much sharper phase transition compared with (a), indicating that the number of samples grows approximately linearly with respect to when the separation condition is imposed, in line with our theoretical analysis.

We further compare the proposed algorithm with the EMaC algorithm proposed in [33] by setting the pencil parameters therein to be and respectively, which yields a two-fold Hankel matrix of size 20 × 20 to be completed. Fig. 2(c) shows the success rate of EMaC for each pair of and under the same condition as Fig. 2(a) when the frequencies are randomly generated. Our numerical examples also indicate that unlike the atomic norm approach, the phase transition curve of EMaC is insensitive to the separation condition. While the EMaC algorithm yields a much sharper phase transition than that of the proposed algorithm for randomly generated frequencies, the range of its recoverable is much smaller, partially due to the small dimensionality of the relevant two-fold Hankel matrix when the data size is small.

We further examine the performance of (16) in the presence of noise. The noisy data was generated as

where was generated in the same way as in Fig. 2 with different frequencies, and was standard additive white


Fig. 1. The recovered signal in the frequency domain for from measurements. (a) Ground truth; (b) BP (DFT basis); (c) BP (DFT frame); (d) proposed approach.

Fig. 2. Phase transition plots when : (a) the proposed algorithm with randomly generated frequencies; (b) the proposed algorithm with randomly generated frequencies satisfying a separation condition ; (c) the EMaC algorithm [33] with randomly generated frequencies. The success rate is calculated by averaging over 10 runs.

Gaussian noise (AWGN) with each entry i.i.d. from . The signal-to-noise ratio is defined as , which has been scaled with respect to the number of samples. The proposed algorithm was modified to incorporate noise as

(17)

Fig. 3 illustrates the NMSE against SNR under different sample complexity. When , there is a possibility of failure in which is not perfectly recovered even without noise, so the NMSE is relatively large under all SNRs. When , 40, and 50, is perfectly recovered with high probability in the absence of noise. When this was the case, the performance degraded gracefully as the SNR decreased. The performance also improved when the number of samples increased, but the gain was not as significant once it was above a certain threshold.
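The error metric and noise model used in these experiments can be sketched as follows. The exact NMSE and SNR definitions are not legible in this copy, so the forms below are assumptions: `nmse` is taken as the relative Frobenius-norm error, and `add_awgn` adds circularly symmetric complex Gaussian noise with per-entry variance `sigma**2`. Both function names are hypothetical.

```python
import numpy as np

def nmse(X_hat, X):
    """Relative Frobenius-norm error between an estimate and the ground truth (assumed form)."""
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

def add_awgn(X, sigma, rng):
    """Add complex white Gaussian noise with per-entry variance sigma**2."""
    noise = rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape)
    return X + sigma * noise / np.sqrt(2)

rng = np.random.default_rng(0)
X = np.exp(2j * np.pi * rng.random((8, 8)))   # stand-in data matrix
clean_err = nmse(add_awgn(X, 0.0, rng), X)    # no noise: zero error
noisy_err = nmse(add_awgn(X, 0.1, rng), X)    # small noise: small positive error
```

A recovery would then be declared successful when `nmse` falls below a chosen threshold; the threshold used in the paper is not recoverable from this copy.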

VI. CONCLUSIONS

In this paper we explore estimation of 2-D frequency components of a spectrally sparse signal, when we are given a random subset of its regularly spaced samples. We formulate an atomic norm minimization problem, and show that a sample size of is sufficient to guarantee perfect frequency recovery, provided that a mild separation condition is satisfied. Our work can be extended to an arbitrary higher dimension and a similar semidefinite program can be proposed using a multi-fold block Toeplitz matrix constructed similar to (14). Finally, it remains to be seen how to develop more efficient first-order algorithms for solving the semidefinite program (16), as generic SDP solvers based on interior point methods are limited to small-dimensionality problems.

Fig. 3. NMSE vs SNR when , where frequency locations are generated randomly with a separation condition for different numbers of samples.

APPENDIX A
USEFUL LEMMAS

We first present a few useful inequalities that will be used in the proofs.

Lemma 1. (Noncommutative Bernstein’s Inequality) [40]: Let be independent zero-mean symmetric random matrices of dimension . Suppose and almost surely for all . Then for any ,

(18)

Lemma 2. (Talagrand’s Concentration Inequality) [41]: Let be a finite sequence of independent random variables taking values in a Banach space and be defined as

for a countable family of real valued functions . Assume that and for all and every . Then for all ,

where , , and is a numerical constant.

Lemma 3. (Hoeffding’s Inequality) [42]: Let the components of be sampled i.i.d. from a symmetric distribution on the complex unit circle, , and be a positive real number. Then

APPENDIX B
PROOF OF THEOREM 1

This section is dedicated to the proof of Theorem 1 when the signs of ’s are randomly drawn from . The proof is similar for the case where ’s are complex-valued, following the discussions in [28, Section 1.3]. A road map of the proof is given below. We will first characterize properties of a dual polynomial that suffice to certify the optimality and uniqueness of the solution to (9), and then present a randomized dual construction scheme. Specifically, the construction scheme produces a polynomial by randomizing the dual polynomial in [28] constructed for the full-observation case. Finally, we will show that this random polynomial satisfies the optimality and uniqueness conditions with high probability.

A. Optimality Conditions for Dual Polynomial

The dual norm of is defined as

where . As a result, the dual problem associated with (9) is given by

(19)

where is the complement set of . Following standard analysis (see [28]), the optimal solution of (9) is unique if there exists a dual polynomial

satisfying the following set of conditions

(20a)

(20b)

(20c)

where represents the complex sign. In the sequel we will produce a dual polynomial satisfying the conditions (20a)–(20c) with high probability.

B. Fejér’s Kernel

In [28], the dual polynomial is constructed from the squared Fejér’s kernel [43], which is defined in the 1-D setting as

for , where

Two important features of are worth mentioning: 1) it is nonnegative, and 2) it exhibits rapid decay to zero as grows. We note that , which will be useful in later analysis.

In the 2-D setting, the corresponding Fejér’s kernel is defined as

for . Let be the partial derivative of given by

C. Construction of Dual Polynomial

Following the argument in [29, Section IV-B], it is sufficient to consider a Bernoulli observation model such that each entry in is observed with probability

Specifically, we assign an i.i.d. Bernoulli random variable to indicate whether the th entry is observed, which satisfies

(21)

Define a randomized 2-D Fejér’s kernel as

(22)

where is defined in (21). Let be the partial derivative of as

for any ,1,2. Then their expected values with respect to can be computed as

We propose to construct the dual polynomial of (19) as

(23)

i.e. a superposition of the randomized Fejér’s kernel and its first-order partial derivatives at the frequencies in .

To establish Theorem 1, we need to verify that in the form of (23) satisfies the hypotheses (20a)–(20c) with high probability. Clearly, Condition (20a) is satisfied by the randomized construction scheme. The next step is to tweak the interpolation coefficients , and to satisfy (20b). Specifically, let the th entry of be

where represents the partial derivative of . Choose the coefficients , and such that

(24)

where¹ , and obeys . Denote by the matrix on the left-hand side of the above equation whose expected value is given by . Here, denotes

with each sub-block defined with the th entry being

where . In order to find a solution to (24), one first needs to demonstrate that is invertible. To this end, we begin by presenting the following lemma, whose proof is given in Appendix C.

Lemma 4: Under the conditions of Theorem 1, one has

Lemma 4 immediately implies that is invertible,

(25)

and

The matrix can then be expressed as

(26)

where is given by

... (27)

Similarly, one can write as

(28)

1 is the second-order derivative of at .


We will establish that the spectral norm of can be well controlled, as stated in the following lemma. The proof is deferred to Appendix D.

Lemma 5: Let . If , and for some positive constant , then with probability at least ,

Denote by the event , which occurs with probability . Conditional on , one has

(29)

revealing the invertibility of . Writing for some , we have

where follows from [29, Corollary IV.5]. The interpolation coefficients can thus be written as

(30)

With this choice, (20b) is satisfied trivially.

D. Verification of (20c)

What remains to be established is Condition (20c). We will first show that it holds on a regular grid , and then extend it to the continuous domain.

To proceed, define as

(31)

then the derivatives of the dual polynomial in (23) can be written as

Define the mean of such that

We have

where

We first need to establish that and can be controlled uniformly over all . To this end, we apply similar techniques adopted in [29], which first bound and on a regular grid , and then extend the result to all frequencies. The following lemma quantifies the perturbations on a regular grid, whose proof is deferred to Appendix E.

Lemma 6: Suppose . For a regular grid , there exists a numerical constant such that if

(32)

then and for ,1,2,3 with probability at least .

Following Lemma 6, we immediately show that the event occurs with probability at least on the grid .

We will first extend to the whole continuous domain by the following lemma whose proof can be found in Appendix F.

Lemma 7: Suppose that . There exists a numerical constant such that if satisfies (32) for some constant , then

(33)

for with probability at least .

Finally, we can establish (20c) through the following lemma, where the proof is supplied in Appendix G.

Lemma 8: Suppose that . There exists a universal constant such that if

(34)

then with probability at least , one has for .

Combining the above lemmas, we have successfully constructed a dual polynomial when satisfies (34), completing the proof of Theorem 1.

APPENDIX C
PROOF OF LEMMA 4

Proof: When , using the result in [28, Proof of Lemma C.2], we have

where is the matrix infinity norm, i.e. the maximum absolute row sum. Since is symmetric and its diagonal entries are all zero, by Gershgorin’s circle theorem [44],

APPENDIX DPROOF OF LEMMA 5

Proof: First, write , where

is a random zero-mean self-adjoint matrix. We would like toapply Lemma 1. Since and

(35)

(36)

where (35) follows from , (36) follows from and

for where the last inequality follows from in [29]. And

where the last inequality follows from (25). Applying Lemma 1 and setting , one obtains

(37)

for some constant , leading to .

APPENDIX E
PROOF OF LEMMA 6

Proof: We first write as

where

Write

where . To apply Lemma 2, we compute

and

when . Then from Jensen’s inequality,

We can then upper bound


thereafter, following an argument similar to that in [29, Proof of Lemma IV.6],

where we have used and . By Lemma 2, if we let

and

if ,

otherwise,

then we have

with probability at least . Consequently

To bound on the set we use Hoeffding's inequality and the union bound (see [29, Proof of Lemma IV.8]), which gives

(38)

where the last term . Following arguments similar to those in [29, Lemma IV.8] to bound each term in (38), we obtain

under (32) with probability at least .

Similarly, we can bound on the set . From [29, Lemma IV.9], we can upper bound

under the event . Applying Hoeffding's inequality and the union bound, we have

(39)

Following arguments similar to those in [29, Proof of Lemma IV.9] to bound each term in (39), we obtain under (32) with probability at least .
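The "Hoeffding plus union bound" step used twice in this proof can be illustrated numerically. The sketch below is not the paper's computation: it assumes real-valued sums of the form sum_i eps_i w_{g,i} with independent random signs eps_i and deterministic weights bounded by |w_{g,i}| <= a_i, and it evaluates the generic two-sided Hoeffding tail multiplied by the number of grid points. The function names, the cosine weights, and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def hoeffding_union_bound(num_points, bounds, t):
    """Upper bound on P(max over grid points of |sum_i X_i| >= t).

    Assumes, at every grid point, independent zero-mean X_i with |X_i| <= a_i,
    so that P(|sum_i X_i| >= t) <= 2 exp(-t^2 / (2 sum_i a_i^2)).
    The union bound multiplies this tail by the number of grid points.
    """
    bounds = np.asarray(bounds, dtype=float)
    tail = 2.0 * np.exp(-t ** 2 / (2.0 * np.sum(bounds ** 2)))
    return min(1.0, num_points * tail)


def empirical_failure_rate(weights, t, trials=2000, seed=0):
    """Monte Carlo estimate of P(max_g |sum_i eps_i w[g, i]| >= t).

    weights has shape (num_points, n); eps_i are shared random signs.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(trials, weights.shape[1]))
    sums = signs @ weights.T  # shape (trials, num_points)
    return float(np.mean(np.max(np.abs(sums), axis=1) >= t))


# Illustrative setup (hypothetical): 32 grid points, 64 terms, |w| <= 1/8.
n, num_points, t = 64, 32, 4.0
grid = np.arange(num_points) / num_points
weights = np.cos(2 * np.pi * np.outer(grid, np.arange(n))) / 8.0
bound = hoeffding_union_bound(num_points, np.full(n, 1.0 / 8.0), t)
observed = empirical_failure_rate(weights, t)
print(f"union bound: {bound:.4f}, empirical rate: {observed:.4f}")
```

The union bound grows only linearly in the grid size while the Hoeffding tail decays exponentially in t^2, which is why a polynomially large grid costs only a logarithmic factor in the resulting sample-complexity condition.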

APPENDIX F
PROOF OF LEMMA 7

First we have from [29] for some constant . Then using Bernstein's polynomial inequality [45], we have

If we select the grid such that for any , there exists a point such that

The size of the grid is no smaller than . Conditioned on we have

Using the relationship , we can modify the constant in the bound (32) accordingly.
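The grid-to-continuum argument rests on Bernstein's inequality, which bounds the derivative of a trigonometric polynomial by its degree times its supremum. As a minimal 1-D sketch (the paper applies the idea to 2-D frequencies, and none of the names below come from the paper), consider p(f) = sum_{k=-n}^{n} c_k e^{j 2 pi k f} on f in [0, 1); Bernstein's inequality then reads sup|p'| <= 2 pi n sup|p|, so moving from a grid point to a nearby frequency at distance d changes p by at most 2 pi n d sup|p|. The code checks the ratio sup|p'| / (2 pi n sup|p|) on a fine grid, which only approximates the true suprema.

```python
import numpy as np


def trig_poly_and_derivative(coeffs, freqs):
    """Evaluate p(f) = sum_{k=-n}^{n} c_k exp(j 2 pi k f) and p'(f).

    coeffs has odd length 2n + 1 and is ordered from k = -n to k = n.
    """
    n = (len(coeffs) - 1) // 2
    k = np.arange(-n, n + 1)
    phases = np.exp(2j * np.pi * np.outer(freqs, k))
    p = phases @ coeffs
    dp = phases @ (2j * np.pi * k * coeffs)
    return p, dp


def bernstein_ratio(coeffs, num_grid=4096):
    """Return max|p'| / (2 pi n max|p|) over a fine grid; requires n >= 1."""
    n = (len(coeffs) - 1) // 2
    freqs = np.arange(num_grid) / num_grid
    p, dp = trig_poly_and_derivative(np.asarray(coeffs, dtype=complex), freqs)
    return float(np.max(np.abs(dp)) / (2 * np.pi * n * np.max(np.abs(p))))


# A random degree-8 polynomial (hypothetical test case).
rng = np.random.default_rng(0)
random_coeffs = rng.standard_normal(17) + 1j * rng.standard_normal(17)
random_ratio = bernstein_ratio(random_coeffs)

# The pure exponential exp(j 2 pi n f) attains the bound with equality.
extremal_coeffs = np.zeros(17, dtype=complex)
extremal_coeffs[-1] = 1.0
extremal_ratio = bernstein_ratio(extremal_coeffs)
print(random_ratio, extremal_ratio)
```

Because the ratio never exceeds one, a grid whose spacing is inversely proportional to the polynomial degree suffices to transfer a bound from the grid to all frequencies with only a constant-factor loss.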

APPENDIX G
PROOF OF LEMMA 8

Proof: We divide the whole frequency domain as

and . From [28], we have for . By Lemma 7 and letting , using the triangle inequality, it is straightforward to show , for with probability . On the other hand, for , from [28] we have

, and the Hessian matrix

is negative definite. In particular, we have ,, and . Let ,

with probability at least ,


following Lemma 7, hence and , the matrix is also negative definite. Therefore for . Combining the above, we have for

with probability at least .

APPENDIX H
PROOF OF PROPOSITION 2

Proof: The sufficient condition is trivial. We now prove the necessary condition. Since , the (1,1)-th block in (12), is a PSD Toeplitz matrix, by the Vandermonde decomposition lemma in 1-D, there exists a decomposition

(40)

where is an Vandermonde matrix with the th column specified by , and

where for . Given that is a PSD matrix of rank , then , and each block admits a decomposition from [46, Proposition 1] as

(41)

where and is a unitary matrix. Write as , where is a unitary matrix, is a diagonal matrix as , with . Then can be rewritten as

(42)

with . Combining with (40), can be written as for some unitary matrix . On the other hand, the principal submatrix of with entries from the first column of is also a PSD Toeplitz matrix, which can be written as


where is the first row of . The th entry of can be written as

Therefore we can write as

(43)

and is a Vandermonde matrix with the th column. Then can be written as

(44)

where . Therefore we have established Proposition 2.
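The proof above starts from the 1-D Vandermonde decomposition of a PSD Toeplitz matrix. The easy direction of that lemma can be checked numerically: any matrix of the form V D V^H, with V a Vandermonde matrix on the unit circle and D a positive diagonal matrix, is Hermitian, Toeplitz, PSD, and of rank equal to the number of distinct frequencies. The sketch below illustrates only this forward direction under those assumptions; the function names, frequencies, and weights are hypothetical and are not taken from the paper.

```python
import numpy as np


def vandermonde(freqs, n):
    """n x r matrix whose k-th column is [1, z_k, ..., z_k^(n-1)],
    with z_k = exp(j 2 pi f_k)."""
    return np.exp(2j * np.pi * np.outer(np.arange(n), freqs))


def toeplitz_from_frequencies(freqs, weights, n):
    """Form T = V diag(weights) V^H for positive weights."""
    V = vandermonde(np.asarray(freqs, dtype=float), n)
    return (V * np.asarray(weights, dtype=float)) @ V.conj().T


def is_toeplitz(T, tol=1e-10):
    """Check that every diagonal of T is constant."""
    n = T.shape[0]
    return all(
        np.allclose(np.diag(T, k), np.diag(T, k)[0], atol=tol)
        for k in range(-n + 1, n)
    )


# Three distinct frequencies with positive weights (illustrative values).
freqs = [0.10, 0.37, 0.80]
weights = [1.0, 0.5, 2.0]
T = toeplitz_from_frequencies(freqs, weights, n=6)

eigenvalues = np.linalg.eigvalsh(T)  # valid because T is Hermitian
rank = int(np.linalg.matrix_rank(T, tol=1e-8))
print(is_toeplitz(T), eigenvalues.min() > -1e-10, rank)
```

The (a, b) entry of T equals sum_k w_k z_k^(a-b), which depends only on a - b; this is why the Toeplitz structure appears. The converse direction, recovering V and D from a given PSD Toeplitz matrix, is what the lemma asserts and is not attempted here.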

REFERENCES

[1] Y. Chi and Y. Chen, “Compressive recovery of 2-D off-grid frequencies,” in Proc. Asilomar Conf. Signals, Syst., Comput., Nov. 2013, pp. 687–691.

[2] D. Nion and N. Sidiropoulos, “Tensor algebra and multidimensional harmonic retrieval in signal processing for MIMO radar,” IEEE Trans. Signal Process., vol. 58, no. 11, pp. 5693–5705, Nov. 2010.

[3] M. Skolnik, Radar Handbook. New York, NY, USA: McGraw-Hill, 1970.

[4] A. M. Sayeed and B. Aazhang, “Joint multipath-Doppler diversity in mobile wireless communications,” IEEE Trans. Commun., no. 1, pp. 123–132, Jan. 1999.

[5] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.

[6] M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nature Methods, vol. 3, no. 10, pp. 793–796, 2006.

[7] Y. Chi, A. Gomaa, N. Al-Dhahir, and A. Calderbank, “Training signal design and tradeoffs for spectrally-efficient multi-user MIMO-OFDM systems,” IEEE Trans. Wireless Commun., vol. 10, no. 7, pp. 2234–2245, 2011.

[8] M. Haardt, M. Zoltowski, C. Mathews, and J. Nossek, “2D unitary ESPRIT for efficient 2D parameter estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1995, vol. 3, pp. 2096–2099.

[9] Y. Hua, “A pencil-MUSIC algorithm for finding two-dimensional angles and polarizations using crossed dipoles,” IEEE Trans. Antennas Propag., vol. 41, no. 3, pp. 370–376, 1993.

[10] M. Clark and L. Scharf, “Two-dimensional modal analysis based on maximum likelihood,” IEEE Trans. Signal Process., vol. 42, no. 6, pp. 1443–1452, 1994.

[11] Y. Hua, “Estimating two-dimensional frequencies by matrix enhancement and matrix pencil,” IEEE Trans. Signal Process., vol. 40, no. 9, pp. 2267–2280, Sep. 1992.

[12] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[13] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[14] E. Candès and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Problems, vol. 23, no. 3, p. 969, 2007.

[15] S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Rev., vol. 43, no. 1, pp. 129–159, 2001.

[16] J. Tropp and A. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[17] W. Bajwa, J. Haupt, G. Raz, and R. Nowak, “Compressed channel sensing,” in Proc. 42nd IEEE Annu. Conf. Inf. Sci. Syst. (CISS), 2008, pp. 5–10.

[18] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, “Sparse channel estimation for multicarrier underwater acoustic communication: From subspace methods to compressed sensing,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1708–1721, 2010.

[19] M. Herman and T. Strohmer, “High-resolution radar via compressed sensing,” IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275–2284, 2009.

[20] R. Baraniuk and P. Steeghs, “Compressive radar imaging,” in Proc. IEEE Radar Conf., 2007, pp. 128–133.


[21] Y. Chi, Y. Xie, and R. Calderbank, “Compressive demodulation of mutually interfering signals,” 2013, arXiv preprint arXiv:1303.3904 [Online]. Available: http://arxiv.org/abs/1303.3904

[22] Y. Chi, L. L. Scharf, A. Pezeshki, and R. Calderbank, “Sensitivity to basis mismatch in compressed sensing,” IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182–2195, May 2011.

[23] A. Fannjiang and H.-C. Tseng, “Compressive radar with off-grid targets: A perturbation approach,” Inverse Problems, vol. 29, no. 5, p. 054008, 2013.

[24] M. F. Duarte and R. G. Baraniuk, “Spectral compressive sensing,” Appl. Comput. Harmon. Anal., vol. 35, no. 1, pp. 111–129, 2013.

[25] Z. Yang, L. Xie, and C. Zhang, “Off-grid direction of arrival estimation using sparse Bayesian inference,” IEEE Trans. Signal Process., vol. 61, no. 1, pp. 38–43, 2013.

[26] Z. Tan, P. Yang, and A. Nehorai, “Joint sparse recovery method for compressed sensing with structured dictionary mismatch,” 2013, arXiv preprint arXiv:1309.0858 [Online]. Available: http://arxiv.org/abs/1309.0858

[27] L. Hu, Z. Shi, J. Zhou, and Q. Fu, “Compressed sensing of complex sinusoids: An approach based on dictionary refinement,” IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3809–3822, 2012.

[28] E. J. Candès and C. Fernandez-Granda, “Towards a mathematical theory of super-resolution,” Commun. Pure Appl. Math., vol. 67, no. 6, pp. 906–956, 2014.

[29] G. Tang, B. Bhaskar, P. Shah, and B. Recht, “Compressed sensing off the grid,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7465–7490, 2013.

[30] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky, “The convex algebraic geometry of linear inverse problems,” in Proc. 48th IEEE Annu. Allerton Conf. Commun., Control, Comput., 2010, pp. 699–703.

[31] E. J. Candès and C. Fernandez-Granda, “Super-resolution from noisy data,” J. Fourier Anal. Appl., vol. 19, no. 6, pp. 1229–1254, 2013.

[32] Y. Chen and Y. Chi, “Spectral compressed sensing via structured matrix completion,” presented at the Int. Conf. Mach. Learn., Atlanta, GA, USA, Jun. 2013.

[33] Y. Chen and Y. Chi, “Robust spectral compressed sensing via structured matrix completion,” IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 6576–6601, Oct. 2014.

[34] C. Carathéodory and L. Fejér, “Über den Zusammenhang der Extremen von harmonischen Funktionen mit ihren Koeffizienten und über den Picard-Landauschen Satz,” (in German) Rendiconti del Circolo Matematico di Palermo, vol. 32, pp. 218–239, 1911.

[35] W. Xu, J.-F. Cai, K. V. Mishra, M. Cho, and A. Kruger, “Precise semidefinite programming formulation of atomic norm minimization for recovering -dimensional off-the-grid frequencies,” 2013, arXiv preprint arXiv:1312.0485 [Online]. Available: http://arxiv.org/abs/1312.0485

[36] B. Dumitrescu, Positive Trigonometric Polynomials and Signal Processing Applications. New York, NY, USA: Springer, 2007.

[37] D. Slepian, “Prolate spheroidal wave functions, Fourier analysis and uncertainty,” Bell Syst. Tech. J., vol. 57, no. 5, pp. 1371–1429, 1978.

[38] Y. Chi, “Sparse MIMO radar via structured matrix completion,” in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), 2013, pp. 321–324.

[39] M. Grant, S. Boyd, and Y. Ye, CVX: Matlab software for disciplined convex programming. 2008 [Online]. Available: http://stanford.edu/~boyd/cvx

[40] J. Tropp, “User-friendly tail bounds for sums of random matrices,” Found. Comput. Math., vol. 12, no. 4, pp. 389–434, 2012.

[41] M. Talagrand, “Concentration of measure and isoperimetric inequalities in product spaces,” Publications Mathematiques de l’IHES, vol. 81, no. 1, pp. 73–205, 1995.

[42] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” J. Amer. Statist. Assoc., vol. 58, no. 301, pp. 13–30, 1963.

[43] J. Taylor, “Aggregate dynamics and staggered contracts,” J. Political Economy, pp. 1–23, 1980.

[44] S. Gershgorin, “Über die Abgrenzung der Eigenwerte einer Matrix,” Izv. Akad. Nauk USSR Otd. Fiz.-Mat. Nauk, no. 6, pp. 749–754, 1931.

[45] A. Schaeffer, “Inequalities of A. Markoff and S. Bernstein for polynomials and related functions,” Bull. Amer. Math. Soc., vol. 47, pp. 565–579, 1941.

[46] L. Gurvits and H. Barnum, “Largest separable balls around the maximally mixed bipartite quantum state,” Phys. Rev. A, vol. 66, no. 6, p. 062311, 2002.

Yuejie Chi (S’09–M’12) received the Ph.D. degree in Electrical Engineering from Princeton University in 2012, and the B.E. (Hon.) degree in Electrical Engineering from Tsinghua University, Beijing, China, in 2007. Since September 2012, she has been an assistant professor with the department of Electrical and Computer Engineering and the department of Biomedical Informatics at The Ohio State University. She is the recipient of the IEEE Signal Processing Society Young Author Best Paper Award in 2013 and the Best Paper Award at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in 2012. She received the Ralph E. Powe Junior Faculty Enhancement Award from Oak Ridge Associated Universities in 2014, a Google Faculty Research Award in 2013, the Roberto Padovani scholarship from Qualcomm Inc. in 2010, and an Engineering Fellowship from Princeton University in 2007. She has held visiting positions at Colorado State University, Stanford University and Duke University, and interned at Qualcomm Inc. and Mitsubishi Electric Research Lab. Her research interests include high-dimensional data analysis, statistical signal processing, machine learning and their applications in communications, networks, imaging and bioinformatics.

Yuxin Chen (S’09) received the B.S. degree in Microelectronics with High Distinction from Tsinghua University in 2008, the M.S. in Electrical and Computer Engineering from the University of Texas at Austin in 2010, and the M.S. in Statistics from Stanford University in 2013. He is currently a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. His research interests include information theory, compressed sensing, network science and high-dimensional statistics.
