
On the Statistics of Matching Pursuit Angles

Lisandro Lovisolo1, Eduardo A. B. da Silva2 and Paulo S. R. Diniz2

1. PROSAICO – PEL/DETEL – UERJ, Rio de Janeiro, RJ, BRASIL
2. LPS – COPPE/UFRJ, Cx. P. 68504, Rio de Janeiro, RJ, BRASIL

[[email protected], [email protected], [email protected]]

Abstract

Matching Pursuit decompositions have been employed for signal coding. For this purpose, Matching Pursuit coefficients need to be quantized. However, their behavior has been shown to be chaotic in some cases, posing difficulties to their modeling and quantizer design. In this work, a different approach is presented. Instead of trying to model the statistics of Matching Pursuit coefficients, the statistics of the angle between the residue signal and the element selected in each iteration of the Matching Pursuit are studied, which allows one to model Matching Pursuit coefficients indirectly. This approach results in a simple statistical model. This is so because one observes that the statistics of such angles do not vary substantially after the first Matching Pursuit iteration, and can be approximately modeled as independent and identically distributed. Moreover, it is also observed that the probability density functions of Matching Pursuit angles are reasonably modeled by a single probability density function. This function depends only on the dictionary employed and not on the signal source. The derived statistical model is validated by employing it to design Lloyd-Max quantizers for Matching Pursuit coefficients. The Lloyd-Max quantizers obtained show good rate×distortion performance when compared to state-of-the-art methods.

Key words: Source Coding, Overcomplete Representations, Matching Pursuits, Lloyd-Max Quantization

1. Introduction

The Matching Pursuit algorithm (MP) [1][2] is a technique to decompose a signal through a weighted sum of atoms which are selected utilizing a greedy criterion. It employs an overcomplete collection of pre-defined elements or atoms; this collection is the so-called dictionary and is overcomplete because it contains more elements than necessary to span the signal space [1][2][3][4].

The MP has been used for: signal filtering and denoising [5][2][6]; analysis of the physical phenomena underlying the signals associated with pattern recognition and signal modeling [5][2][7][8][9][10][11][12]; time-frequency [1][2] and harmonic analyses [12][13]. It has also been employed for signal compression [5][14][15][16][17][13][18][19].

The MP starts by searching for the dictionary atom that has the largest inner product with the signal. The signal is first approximated by the selected atom scaled by its inner product with the signal. The approximation error or residue is the difference between the signal and the scaled atom. To improve the approximation, the same process is applied to the residue and iterated until a given approximation error norm or a given number of atoms is reached. Since this algorithm searches for the best possible approximation in each iteration, it is said to be greedy.

The MP has been used in different flavors of encoding algorithms. In some cases its usage resembles Gain-Shape Vector Quantization, as it is used for extracting very few atoms highly correlated with the signal and the corresponding coefficients [5][20]. In a similar framework it was used to encode video residues [21][22]. When the MP is used for encoding, the coefficients obtained in the decomposition must be quantized and transmitted along with the atom indices. Some works employ in-loop quantization [15][16][17], i.e., the quantization is considered before computing the residue; others use off-loop (or open-loop) quantization [18][5]. Quantizers providing good rate-distortion behavior are required; however, designing such quantizers requires statistical models.

Although the MP has been present in the literature for over ten years, a full understanding of the statistical properties of MP residues and coefficients is not available. In this work we address this issue by analyzing the statistics of the angles between residues and the atoms selected in MP iterations. We have verified that such statistics, after the first stage, have the asymptotic tendency to depend only on the dictionary used, being approximately independent of the signal being decomposed. That is, we report that, after some initial decomposition steps, these statistics can be modeled as depending only on the dictionary and not on the source. Therefore, for some applications there might be no need to indicate the source being decomposed, and the same quantizer might be used for any source; this has been verified in the application of quantizing MP coefficients. That is, the observed result is employed to design an off-loop Lloyd-Max quantizer for the coefficients of MP decompositions. This quantizer, when applied to code MP decompositions and compared to state-of-the-art quantization schemes for MP coefficients, validates the adequacy of the statistical model.

Organization and Main Results – Off-loop quantization of MP coefficients is discussed in Section 2, along with the pros and cons of this coding framework for MP-based encoders. In Section 3, MP coefficients are expressed using the angles between residues and atoms in Matching Pursuit iterations. In Section 4, we observe that, for outcomes of a signal source whose probability density function depends only on the signal energy, the statistics of MP angles are asymptotically independent of the iteration. It is also observed that for other signal sources, whose corresponding probability density functions do not depend only on the magnitude of the outcome but also on its angle, the same behavior is observed after some Matching Pursuit iterations. As a result, since the orientation of MP residues has chaotic behavior, we conjecture that, for most signal sources, MP angles can be approximately modeled as statistically independent and identically distributed (iid) after the first MP iteration. In Section 4.2, we state a result, later proved in Appendix A, regarding dictionaries that include an orthonormal basis. For such dictionaries, after a number of iterations greater than or equal to the signal space dimension, the MP has a non-zero probability of obtaining an exact signal representation. From this result, one may adopt two sets of statistics for such dictionaries: one used whenever the iteration number is smaller than the dimension of the signal space, and another in the remaining cases, i.e., when the iteration number is greater than or equal to the signal space dimension. Considering the approximate statistical model, in which the MP angles are iid, it is possible to obtain Lloyd-Max quantizers for MP coefficients independently of the signal source considered, as presented in Section 5. The Lloyd-Max quantizer (LMQ) presented is compared to the state-of-the-art off-loop MP quantization [18] in Section 6. Results show that both quantization schemes have similar rate×distortion (RD) performance, corroborating that the iid statistical model for MP angles is appropriate for use in practice. Section 7 concludes the paper.

2. Off-loop Quantization of MP Coefficients

Let the dictionary be D = {g_k}, k ∈ {1, ..., #D}, such that ‖g_k‖ = 1 ∀k, where #D is the dictionary cardinality, i.e., the number of elements in D. In each iteration m ≥ 1, the MP searches for the atom g_{i(m)} ∈ D, i.e., i(m) ∈ {1, ..., #D}, with the largest inner product with the residual signal r_x^{m−1} [1][2]. The initial residue is r_x^0 = x. The residue of each iteration is given by r_x^m = r_x^{m−1} − γ_m g_{i(m)}, where γ_m = ⟨r_x^{m−1}, g_{i(m)}⟩ is the inner product between r_x^{m−1} and the atom g_{i(m)}. The atom and the residue resulting from each MP iteration are orthogonal. Therefore, one can express x using the M-term representation (or simply M-term)

$$x \approx \hat{x} = \sum_{m=1}^{M} \gamma_m\, g_{i(m)}, \qquad (1)$$

where g_{i(m)} is the atom selected at the m-th MP iteration and γ_m is referred to as the m-th MP coefficient. After M MP iterations, the approximation error is the M-th residue r_x^M = x − x̂. Note that the MP is non-linear, since the decomposition of x = x_1 + x_2 is not, in general, the sum of the decompositions of x_1 and x_2. In practice, the MP (the calculation of γ_m, i(m), and r_x^m) is iterated until: a) a prescribed distortion (‖r_x^m‖) is met; b) a maximum number of steps M is achieved; or c) either a small decay rate or a stationary behavior of the approximation metric |γ_m|/‖r_x^{m−1}‖ is observed [2][4][5].

A common practice for compressing MP decompositions is to quantize the coefficients off-loop. In this scheme, first the decomposition is obtained and then its coefficients are quantized. That is, the quantizer is placed outside the decomposition loop, with the residues being computed without considering the quantization. In these cases the quantization may be either applied off-line, that is, after the entire M-term is available, such that all the M coefficients are provided to the coder, or on-line, i.e., each γ_m is quantized as soon as it is available. On-line coding delivers the quantized coefficients at an earlier stage than off-line, allowing the decomposition to be halted once the available rate is achieved. On the other hand, off-line coding allows for the conceptually simple rate×distortion (RD) optimization procedure consisting of trying different quantizers in order to find one meeting a prescribed RD criterion.

The state of the art for off-loop quantization of MP coefficients is the adaptive exponentially upper bounded quantization (EUQ) [18]. It allocates bits and sets the quantizer dynamic range for each coefficient in the MP decomposition. It accomplishes this by first sorting all the coefficients in decreasing magnitude order, and subsequently employing a uniform quantizer with a distinct dynamic range and number of levels for each coefficient. The quantizer range for the m-th coefficient depends on the quantized value of the (m−1)-th coefficient, and the number of levels of each quantizer is decided using a bit-allocation scheme based on a Lagrangian multiplier [18][23]. In order to allow the decoder to perform the inverse quantization, the value of the largest coefficient and the number of bits used to quantize the second largest coefficient are sent as side information. This scheme has good RD performance, particularly at low bit-rates [18]. Note that we have chosen to benchmark the proposed statistical model of MP angles by designing a Lloyd-Max quantizer and comparing its performance to that of the EUQ, which has good RD performance. If the results of the Lloyd-Max quantizer and the EUQ are comparable, we have an indication of the usefulness of the proposed statistical model.

When the M-term is quantized off-loop the signal is retrieved using

$$\hat{x}_q = \sum_{m=1}^{M} Q[\gamma_m]\, g_{i(m)}, \qquad (2)$$

where Q[γ_m] is the quantized value of coefficient γ_m. The quantization error relative to the actual signal is given by d = x − x̂_q and leads to the energy distortion per sample

$$d^2 = \frac{1}{N}\|d\|^2 = \frac{1}{N}\|x - \hat{x}_q\|^2, \qquad (3)$$

where N is the signal dimension. However, since x = x̂ + r_x^M, d² is influenced by the residual signal; therefore, d² does not depend only on the quantization of the coefficients. In order to consider just the distortion due to quantization, we employ the quantization error of the M-term

$$d_M = \hat{x} - \hat{x}_q = \sum_{m=1}^{M} (\gamma_m - Q[\gamma_m])\, g_{i(m)}, \qquad d_M^2 = \frac{1}{N}\|\hat{x} - \hat{x}_q\|^2. \qquad (4)$$

Noting that D is composed of unit-norm vectors (‖g_k‖ = 1, ∀k ∈ {1, ..., #D}) and defining

$$e_q(\gamma_m) = \gamma_m - Q[\gamma_m], \qquad (5)$$

the energy distortion per sample of the quantized M-term is given by

$$d_M^2 = \frac{1}{N}\left[\sum_{m=1}^{M} e_q^2(\gamma_m) + \sum_{m=1}^{M}\sum_{\substack{l=1 \\ l\neq m}}^{M} e_q(\gamma_m)\, e_q(\gamma_l)\, \langle g_{i(m)}, g_{i(l)}\rangle\right]. \qquad (6)$$

For a given signal source X, one may consider the expected value of d_M²:

$$E[d_M^2] = \frac{1}{N}\left[\sum_{m=1}^{M} E\left[e_q^2(\Gamma_m)\right] + \sum_{m=1}^{M}\sum_{\substack{l=1 \\ l\neq m}}^{M} E\left[e_q(\Gamma_m)\, e_q(\Gamma_l)\, \langle g_{i(m)}, g_{i(l)}\rangle\right]\right]. \qquad (7)$$

In this equation, Γ_m is a random variable (RV) corresponding to γ_m, for 1 ≤ m ≤ M, for the signal source X. One should note that no assumption has been imposed on the number of decomposition steps M; that is, equation (7) is valid for any M, small or large. For designing an efficient quantizer the statistics of Γ_m are required. However, the statistics of MP coefficients vary across iterations [18]. Instead of searching for a statistical model for MP coefficients, we employ a statistical model for the angles in MP iterations, which is discussed below.

3. Angles in Matching Pursuit Iterations

Let the dictionary be D = {g_k}, with ‖g_k‖ = 1, ∀k ∈ {1, ..., #D}. At the m-th MP iteration the angle between the residue r_x^{m−1} and the selected atom g_{i(m)} is

$$\theta_m = \arccos\left(\frac{\langle r_x^{m-1},\, g_{i(m)}\rangle}{\|r_x^{m-1}\|}\right). \qquad (8)$$

If g_k ∈ D implies that −g_k ∈ D, then the γ_m are always positive. In the sequel we assume that the dictionaries possess this property. If this is not the case for a given dictionary D′, then we create a new dictionary D = D′ ∪ D′′, where D′′ = {−g_k} ∀ g_k ∈ D′. This may require one more bit to index the atoms, but since one saves the coefficient's sign bit, the data rate of the M-term is unaltered. With D having this property, we have that

$$\gamma_1 = \|x\|\cos(\theta_1); \qquad \|r_x^1\| = \|x\|\sin(\theta_1) \qquad (9)$$

$$\gamma_2 = \|x\|\sin(\theta_1)\cos(\theta_2); \qquad \|r_x^2\| = \|x\|\sin(\theta_1)\sin(\theta_2) \qquad (10)$$

$$\vdots$$

$$\gamma_m = \|x\|\left[\prod_{i=1}^{m-1}\sin(\theta_i)\right]\cos(\theta_m); \qquad \|r_x^m\| = \|x\|\prod_{i=1}^{m}\sin(\theta_i). \qquad (11)$$
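The recursion above is easy to check numerically. The sketch below (an illustration of ours, assuming the dictionary matrix already contains −g for every atom g, as discussed above) records the angles of equation (8) during the MP and then rebuilds the coefficients and residue norms from ‖x‖ and the angles alone, following equations (9)-(11):

```python
import numpy as np

def angles_and_reconstruction(x, D, M):
    """Run the MP, record the angles theta_m of equation (8), then rebuild
    gamma_m and ||r_x^m|| from ||x|| and the angles, per equations (9)-(11).
    Assumes D contains -g for every atom g, so inner products are taken
    as-is and every selected one is non-negative."""
    r = x.astype(float).copy()
    thetas = np.empty(M)
    for m in range(M):
        p = D.T @ r
        i = int(np.argmax(p))                    # sign-symmetric D: plain max
        thetas[m] = np.arccos(np.clip(p[i] / np.linalg.norm(r), -1.0, 1.0))
        r = r - p[i] * D[:, i]
    # equation (11): prefix products of sines give both sequences
    sines = np.cumprod(np.sin(thetas))
    gammas = np.linalg.norm(x) * np.concatenate(([1.0], sines[:-1])) * np.cos(thetas)
    residue_norms = np.linalg.norm(x) * sines
    return thetas, gammas, residue_norms
```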

A relevant dictionary metric is the maximum angle between any signal x belonging to the signal space X and its closest atom in D:

$$\Theta(D) = \arccos\left(\min_{x\in X}\left[\max_{i\in\{1,\ldots,\#D\}}\left(\frac{|\langle x, g_i\rangle|}{\|x\|}\right)\right]\right). \qquad (12)$$

This dictionary metric was used in [24] in the context of Successive Approximation Vector Quantization. In the context of Matching Pursuit algorithms, it has also been called structural redundancy [25].

Since θ_m ≤ Θ(D), equation (11) implies that the residue at the m-th step is bounded by

$$\|r_x^m\| \le \|x\|\sin^m(\Theta(D)). \qquad (13)$$

However, this bound is weak, since θ_i is in general distributed between zero and Θ(D). Note that equation (13) also leads to the known fact that MP expansions converge if D spans the signal space [2][4][26], since, in this case, Θ(D) < π/2.

Although equation (13) bounds the norm of MP residues and, therefore, the norm of MP coefficients, subsequent MP coefficients may not decrease in magnitude. Theorem 1 uses Θ(D) to find an upper bound on the number of MP iterations required for the coefficient magnitude to decrease.

Theorem 1. After

$$l = \frac{\log[\cos(\Theta(D))]}{\log[\sin(\Theta(D))]} \ \text{steps}, \qquad (14)$$

the MP always produces coefficients γ_{m+l} ≤ γ_m.

Proof: Since γ_m = ‖x‖[∏_{i=1}^{m−1} sin(θ_i)] cos(θ_m) and γ_{m+l} = ‖x‖[∏_{i=1}^{m+l−1} sin(θ_i)] cos(θ_{m+l}), it follows that

$$\gamma_{m+l} = \gamma_m \tan(\theta_m)\sin(\theta_{m+1})\cdots\sin(\theta_{m+l-1})\cos(\theta_{m+l}).$$

Given γ_m, the largest possible value of γ_{m+l} is obtained if θ_i = Θ(D) ∀ i ∈ {m, ..., m+l−1} and θ_{m+l} = 0. Thus, one can guarantee that γ_{m+l} ≤ γ_m tan(Θ(D))[sin(Θ(D))]^{l−1}. Therefore, if tan(Θ(D))[sin(Θ(D))]^{l−1} ≤ 1 then γ_{m+l} ≤ γ_m, which is equivalent to

$$l \ge \frac{\log[\cos(\Theta(D))]}{\log[\sin(\Theta(D))]}, \qquad (15)$$

which implies equation (14).

Theorem 1 shows that once Θ(D) is known, the number of iterations l guaranteeing that the coefficient magnitude decreases is also known. Note that the bound provided by Theorem 1 is weak, for the same reasons that the bound in equation (13) is weak. In addition, obtaining Θ(D) for a given D is a difficult task; in practice, statistical estimates may suffice and can be obtained by decomposing a large set of signals from a randomly generated, uniformly oriented source [27].
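Such a statistical estimate, together with the bound of Theorem 1 computed from it, can be sketched as follows (our own helpers; a sample-based estimate can only lower-bound the true Θ(D), so the resulting l is itself an estimate):

```python
import numpy as np

def estimate_Theta_D(D, trials=100000, seed=0):
    """Statistical estimate of Theta(D) in equation (12): take many
    uniformly oriented vectors (normalized Gaussian outcomes) and keep
    the worst-case first angle observed."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    x = rng.standard_normal((trials, N))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    best = np.max(np.abs(x @ D), axis=1)      # |<x, g_i>| of the best atom
    return np.arccos(best.min())

def decrease_bound(theta_D):
    """The bound l of Theorem 1, equation (14); round up to use it as an
    integer number of steps."""
    return np.log(np.cos(theta_D)) / np.log(np.sin(theta_D))
```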

4. Statistics of the Angles in MP Iterations

Consider a memoryless independent and identically distributed (iid) N-dimensional Gaussian source N(0, σ²). It is a well-known fact that its pdf depends just on the vector magnitude [28]. Therefore, if one normalizes the outcomes x such that ‖x‖ = 1, the resulting source is uniform on the unit sphere. Figure 1 illustrates this fact by presenting the probability density functions (pdfs) of the orientation of normalized vectors drawn from four different two-dimensional random sources. The figure shows that the orientation of the Gaussian source is uniformly distributed.
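This is simple to verify numerically. The sketch below draws from the four sources of Figure 1 (parameters read off the legend) and checks the flatness of the orientation histogram; since the orientation is invariant to scaling, no explicit normalization is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200000

# The four 2-D sources of Figure 1 (parameters taken from the legend).
sources = {
    "Gaussian(0,1)": rng.standard_normal((n, 2)),
    "Gamma(1,1)":    rng.gamma(1.0, 1.0, (n, 2)),
    "Uniform[0,1]":  rng.uniform(0.0, 1.0, (n, 2)),
    "Uniform[-1,1]": rng.uniform(-1.0, 1.0, (n, 2)),
}

for name, x in sources.items():
    theta = np.degrees(np.arctan2(x[:, 1], x[:, 0]))  # orientation of each outcome
    hist, _ = np.histogram(theta, bins=72, range=(-180, 180), density=True)
    nz = hist[hist > 0]
    # A ratio close to 1 means a flat histogram, i.e. a uniformly
    # distributed orientation; only the Gaussian source should show it.
    print(f"{name:14s} max/min orientation density: {nz.max() / nz.min():.2f}")
```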

Figure 1: Probability density functions of the orientation of normalized vectors drawn from different random sources in R^2 (Gaussian σ² = 1, mean 0; Gamma shape 1, scale 1; Uniform in [0, 1]; Uniform in [−1, 1]).

It has been observed that after the first iterations, MP residues have a chaotic behavior [29]. Thus it is reasonable to assume that, after some MP iterations, MP residues do not have any preferred orientation in the signal space, or, in more precise terms, that their probability distribution depends just on their magnitude. Thus, one could assume that, if one normalizes the residuals such that ‖r_x^i‖ = 1, then, after the first MP step, they have a pdf that is uniform on the unit sphere. Since this is the same property enjoyed by the normalized iid Gaussian source, it seems reasonable to assume that a normalized iid Gaussian source may be a good model for such normalized MP residues. If this assumption is true, one may expect that, after their first iterations, the angles between the selected atoms and the MP residues have a distribution similar to the one obtained for the angle in the first MP iteration for a memoryless iid Gaussian source.

We now investigate the validity of the above argument. In order to do so, we start by using dictionaries D composed of normalized outcomes of a Gaussian iid source. Such dictionaries, composed of outcomes of random processes, have often been used [3][18] to investigate properties of decomposition algorithms. We refer to a dictionary composed of #D unit-norm signals obtained by normalizing outcomes of an N-dimensional Gaussian source as GSND(#D, N). Figure 2 shows the pdfs (actually, histograms normalized to have unit area) of the RVs (random variables) Θ_m, which correspond to the angles in the m-th MP iteration (for several m). These pdfs result from decompositions of realizations of a memoryless iid Gaussian source using a GSND(16, 4). Figure 3 shows the mean and the variance of Θ_m for several m, together with the covariance among some Θ_m, for the same dictionary and source. The results in Figs. 2 and 3 were obtained using an ensemble of 50,000 MP decompositions of signals from the Gaussian source. In Figure 2 it can be verified that the pdfs of the RVs Θ_m have a similar shape for all m. This corroborates the above assumptions about the pdf of the MP residues, leading to the conjecture that the pdfs f_{Θ_m}(θ_m) are independent of the MP iteration number m and identically distributed across MP iterations. Note that this conjecture is consistent with the results presented in [29][4], where it is shown that, under specific conditions, the MP has a chaotic behavior, i.e., the residue in step m maps, chaotically, to the residue in step m + 1.

Since in the example of Figure 2 the angles Θ_m are approximately identically distributed, one may ask whether they can also be modeled as statistically independent. Figure 3 gives us a hint on that. It depicts the covariances between MP angles in different steps. As can be observed, Cov[Θ_i, Θ_k] ≈ cδ(i − k) (c a constant), i.e., the angles are uncorrelated. Of course, this does not imply the independence of the angle RVs; instead, it shows that the independence assumption is consistent with the observed covariance data. This suggests that it may be reasonable to model the angle RVs Θ_m as statistically independent.
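The experiment behind Figures 2 and 3 can be reproduced in miniature as follows (a sketch with a smaller ensemble than the 50,000 decompositions used here; searching with |⟨·,·⟩| is equivalent to using the sign-symmetric dictionary of Section 3):

```python
import numpy as np

def gsnd(card, N, rng):
    """A GSND(card, N) dictionary: 'card' normalized Gaussian outcomes."""
    D = rng.standard_normal((N, card))
    return D / np.linalg.norm(D, axis=0)

def mp_angles(x, D, M):
    """Angles theta_m of equation (8) over M iterations."""
    r = x.astype(float).copy()
    out = np.empty(M)
    for m in range(M):
        p = D.T @ r
        i = int(np.argmax(np.abs(p)))
        out[m] = np.arccos(np.clip(np.abs(p[i]) / np.linalg.norm(r), -1.0, 1.0))
        r = r - p[i] * D[:, i]
    return out

rng = np.random.default_rng(0)
D = gsnd(16, 4, rng)
angles = np.array([mp_angles(rng.standard_normal(4), D, 20) for _ in range(5000)])
print("mean angle (deg) per iteration:", np.degrees(angles.mean(axis=0)).round(1))
print("corr of theta_2 with theta_m:  ", np.corrcoef(angles.T)[1].round(2))
```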

4.1. Angle Statistics for the Gabor Dictionary in R^64

The results presented so far were obtained using a dictionary of relatively low dimension and cardinality. In practice the MP is commonly used in high-dimensional spaces with parameterized dictionaries, for example, the Gabor one [1][2]. The elements of the Gabor dictionary are defined by translations, modulations and dilations of a prototype signal [1][2][30]. The most common choice for the prototype signal f[n] is the Gaussian window. The atoms of the Gabor dictionaries are complex valued; as such, for obtaining real coefficients, the optimal phase for the atom can be computed [31][5]. However, in compression applications, a quantized phase is often employed. Therefore, we analyze here a real Gabor dictionary composed of atoms with phases that are multiples of π/V, V ∈ ℕ*.

Figure 2: Probability density functions of Θ_m for a Gaussian source in R^4 using the GSND(16, 4).

Figure 3: Mean, variance and covariance of Θ_m for a Gaussian source in R^4 using the GSND(16, 4).

Each atom is then given by

$$g[n] = \begin{cases} \delta[n], & \text{if } j = 0 \\ K_{(j,p,k,v)}\, f\!\left[\dfrac{n - p2^j}{2^j}\right] \cos\!\left[nk\pi 2^{1-j} + \dfrac{\pi v}{V}\right], & \text{if } j \in (0, L) \\ \dfrac{1}{\sqrt{N}}, & \text{if } j = L \end{cases} \qquad (16)$$

where f[n] = 2^{1/4} e^{−πn²}, n is the coordinate, K_{(j,p,k,v)} provides a unit-norm atom, and v ∈ [0, ..., V − 1]. For the Gabor atom, j defines its scale, p defines the time shift, and k defines the atom modulation. For L = log₂(N) scales, the atom parameter ranges are j ∈ [0, L], p ∈ [0, N2^{−j}), k ∈ [0, 2^j), and v ∈ [0, V − 1].
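For concreteness, a rough sketch of how such a dictionary can be generated from equation (16); treating j = 0 as the set of all shifted impulses, sampling on n = 0, ..., N−1, and normalizing numerically (which plays the role of K_{(j,p,k,v)}) are assumptions of this sketch, and no periodization of the prototype window is applied:

```python
import numpy as np

def gabor_dictionary(N, V=4):
    """Sketch of a real Gabor dictionary following equation (16)."""
    L = int(np.log2(N))
    n = np.arange(N)
    atoms = list(np.eye(N))                    # j = 0: shifted impulses
    for j in range(1, L):                      # j in (0, L)
        s = 2.0 ** j
        for p in range(int(N // s)):
            for k in range(int(s)):
                for v in range(V):
                    f = 2 ** 0.25 * np.exp(-np.pi * ((n - p * s) / s) ** 2)
                    g = f * np.cos(n * k * np.pi * 2.0 ** (1 - j) + np.pi * v / V)
                    norm = np.linalg.norm(g)
                    if norm > 1e-12:           # drop degenerate (all-zero) atoms
                        atoms.append(g / norm)
    atoms.append(np.ones(N) / np.sqrt(N))      # j = L: the constant atom
    return np.column_stack(atoms)

D = gabor_dictionary(64)
print(D.shape)                                 # (64, #D)
```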

Figure 4.(a) shows f_{Θ_m}(θ_m), the pdfs of the RVs Θ_m, for some m, obtained for an ensemble of 512,000 decompositions of signals from the R^64 Gaussian source using a four-phase Gabor dictionary (V = 4). Figure 4.(b) shows f_{Θ_m}(θ_m), for some m, obtained for an ensemble of 512,000 decompositions of signals drawn from a memoryless source that has Gamma-distributed coordinates in R^64, for the same dictionary. Note that the statistics of the angles from these two sources differ significantly at the first MP iteration, but become closer as the iteration number m increases. The results in Figure 4 show that, for an iteration number between 4 and 16, MP angles start to have very similar statistics for both the Gaussian and the Gamma source. In addition, for both sources the statistics of these angles are very similar to the ones obtained in the first iteration for a Gaussian source. Indeed, for any source, as m increases the pdfs of the angles in different MP steps get more similar to the ones obtained for a Gaussian source. Similar results were observed for decompositions of signals taken from a source uniformly distributed in the unit-length cube and also for 8×8 image blocks taken from frontal x-ray images. The pdfs of the angles across different iterations when decomposing those image blocks with the same dictionary are shown in Figure 5.

Figure 4: Probability density functions of MP angles for two iid sources in R^64 at iterations m = {1, 2, 4, 8, 16, 32, 64}, for the 4-phase Gabor dictionary in R^64: (a) Gaussian; (b) Gamma.

Figure 5: Probability density functions of MP angles when decomposing 8×8 blocks taken from x-ray images at iterations m = {1, 2, 4, 8, 16, 32, 64}, for the 4-phase Gabor dictionary in R^64.

One should note that, although the statistics for other sources are not exactly equal to the statistics obtained for the first decomposition stage of signals from the Gaussian source, they are reasonably close to the latter. Therefore, the pdf of the first MP angle, f_{Θ_1}(θ_1), obtained for a memoryless iid Gaussian source, may be an acceptable model for f_{Θ_m}(θ_m), m > 1, for any signal source. This further corroborates the assumption made above that, after the first step, the MP residuals, when normalized to unit energy, are uniformly distributed over the unit sphere. Figure 6 presents the mean and the variance of the MP angles, as well as the covariance between the angles in MP iterations, for the decomposition of the same Gaussian source ensemble. As can be observed, similarly to the case of the GSND(16, 4) (see Figure 3), the covariance between Θ_m and Θ_{m+l} quickly decreases with l. This indicates that, even when the dimensionality of the signal space is high, modeling the Θ_m as independent may be a reasonable and accurate assumption.

Figure 6: Mean, variance and covariance between angles in MP iterations for a Gaussian source in R^64, as a function of the iteration, for the four-phase Gabor dictionary in R^64.

4.2. Dictionaries Including Orthonormal Bases

In the particular case that the dictionary includes an orthonormal basis, the assumption that the MP residues after the first step are identically distributed has to be slightly reformulated. This is so because, as we demonstrate here, for such a dictionary, under very general conditions, the MP algorithm has a non-zero probability of generating null residues in a finite number of steps. We refer to this as the null residue proposition. In essence it shows that there exists a region of the space, of volume greater than 0, such that all vectors belonging to it lead to a null residue after a finite number of MP iterations. Its proof is presented in Appendix A, where it is shown that if D includes an orthonormal basis then the MP has a non-zero probability of trapping the residuals of the MP decomposition of a given vector into successive subspaces that are orthogonal to the previously selected atoms. Therefore, since at any iteration m it is known that the residual r_x^m is orthogonal to g_{i(k)}, k = 1, 2, ..., m, there is a non-zero probability that the MP will produce a null residue (r_x^q = 0) in a finite number of steps q ≥ N. Details on the conditions necessary for this to hold can also be found in Appendix A.

For example, when a GSND(20, 4) is used to decompose an ensemble of 25,600 signals drawn from a four-dimensional memoryless Gaussian source, allowing at most 100 decomposition steps, none of the signal decompositions produces a null residue. However, once a set of 4 elements of the same GSND(20, 4) is replaced by the canonical basis of R^4, exact signal expansions with a finite number of terms are obtained. Figure 7 shows the histogram of the number of null residues produced by the MP as a function of the MP iteration for an ensemble of 25,600 signals drawn from a four-dimensional memoryless Gaussian source when the GSND(20, 4) is modified to include the canonical basis of R^4. The last bin of the histogram accounts for the decompositions that did not produce null residues. From this figure one can see that there are indeed signals that are decomposed with a null residue for a number of iterations greater than or equal to 4 (the signal dimension). This is in accordance with the results in Appendix A, where Corollary 1 shows that for an iid Gaussian source the MP has a non-zero probability of producing null residues after N (the signal space dimension) iterations.
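A sketch of this experiment (the tolerance 10⁻¹² standing in for an exact floating-point zero is our choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N, card, max_steps, trials = 4, 20, 100, 25600

# GSND(20, 4) with 4 of its elements replaced by the canonical basis of R^4.
D = rng.standard_normal((N, card))
D /= np.linalg.norm(D, axis=0)
D[:, :N] = np.eye(N)

null_steps = []
for _ in range(trials):
    r = rng.standard_normal(N)
    for m in range(1, max_steps + 1):
        p = D.T @ r
        i = int(np.argmax(np.abs(p)))
        r = r - p[i] * D[:, i]
        if np.linalg.norm(r) < 1e-12:    # a (numerically) null residue
            null_steps.append(m)
            break

frac = sum(1 for m in null_steps if m == N) / trials
print(f"estimated probability of a null residue at step {N}: {frac:.3f}")
```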

Figure 7: Incidence of null residues using a GSND(20, 4) with 4 elements replaced by the canonical basis of R^4.

The null residue proposition is further illustrated in Figure 8, which shows the pdfs of the angles at steps 1, 4 and 8 for the aforementioned decomposition process. The left-hand side of Figure 8 shows results for the original GSND(20, 4) dictionary, and the right-hand side for the modified GSND(20, 4) dictionary. One observes on the left-hand side of Figure 8 that for the original GSND(20, 4) null angles do not occur, which is equivalent to saying that the MP decompositions never generate a null residue. On the other hand, the impulses that appear at Θ_i = 0 (i ∈ {4, 8}) on the right-hand side of Figure 8 show that for the modified GSND(20, 4) null angles often occur when m ≥ 4. One should also note that for the modified GSND(20, 4) the percentage of null angles is very similar after the fourth MP iteration. The probability of producing a null residue at a given MP step depends on both the dictionary D and the signal source, and there is not yet a way to compute it theoretically; however, it can be estimated by simulation. For instance, in the example of Figure 7, one can deduce from the presented data that the probability of having a null residue at step 4 is approximately 0.063.

A useful kind of such dictionaries are the ones including several distinct orthonormal bases. Among them one can highlight the dictionaries that are unions of orthonormal bases [2][32][6]. An example of a dictionary composed of a union of bases is the one formed by the normalized elements of the first shell of the E_8 lattice [33][34] (the E_{8,sh1} dictionary). It is composed of 240 elements (in 120 directions) in R^8, and for each of its elements it contains either 126 or 110 orthogonal elements. Actually, the E_{8,sh1} dictionary can be regarded as the union of 30 bases for R^8. Figure 9 shows the pdfs of the MP angles Θ_m, m ∈ {1, ..., 10}, for E_{8,sh1}. They were obtained using an ensemble of 512,000 MP decompositions of Gaussian source signals. In Figure 9, it is possible to verify that the pdfs of Θ_m, m ∈ {1, ..., 7}, are reasonably similar. Note also that, after a number of steps m greater than or equal to 8 (the space dimension), the pdfs of the different Θ_m are also quite similar and have a large incidence of zero angles. The above results show

Figure 8: Probability density functions of the angles at m = {1, 4, 8} using the original GSND(20, 4) (left) and the modified GSND(20, 4) (right) – the GSND(20, 4) with 4 of its elements replaced by the canonical basis of R^4.

that, when the E_{8,sh1} dictionary is employed, the statistics of the first MP angle for a memoryless white Gaussian source are appropriate to model the angles in the first seven decomposition steps. However, after the eighth step, they are no longer appropriate. Nevertheless, the results imply that only two distinct pdfs are needed to model the MP angles for the E_{8,sh1}: one valid up to the seventh step and another valid for step 8 and beyond.

In order to better understand how the probability of producing null residues varies with the space dimension, we generated dictionaries with a fixed ratio of the number of bits necessary to represent an element of the dictionary to the number of bits needed to encode an element of a basis, that is, log₂(#D)/log₂(N) = 1 + α (α can be regarded as a redundancy factor). In this case the dictionary cardinality is ⌈N^{1+α}⌉. Figure 10 shows the probability of convergence for a GSND(⌈N^{1+α}⌉, N), for α = 0.25 and N ∈ [1, ..., 53], with N of its elements replaced by the canonical basis. One observes that the probability of convergence decreases approximately exponentially with N.

4.3. Discussion

In [29][4], the average value of the approximation ratio |γ_m|/‖r_x^{m−1}‖ was studied. It corresponds to the cosine of the angle between the MP residual and the atom at the m-th step. In those works it is argued that the mean of the approximation ratio converges to a fixed value. Here we support a stronger claim. We have observed that, after the first iterations, the statistics of the angles between residuals and atoms can be considered almost invariant across iterations. This happens for sources with very different angle statistics, as is the case for the Gamma and Gaussian sources and x-ray image blocks (refer to Figs. 1, 4, and 5). Exceptions occur when the dictionary includes an orthonormal basis; in these cases two different pdfs have to be used to model the angles in MP iterations, one valid at step numbers smaller than the signal dimension and another for the remaining steps – the ones greater than or equal to the signal space dimension.

Figure 9: Probability density functions of Θ_m for a Gaussian source in R^8 using the E_{8,sh1} dictionary.

Figure 10: Probability of convergence for a GSND(⌈N^{1+α}⌉, N) with N of its elements replaced by the canonical basis, for α = 0.25 and N ∈ [1, ..., 53].

The analysis of Figure 4 shows that, for the Gabor dictionary, the pdf of the angle between residuals and atoms, after some MP iterations, is reasonably similar to the pdf obtained in the first MP iteration of the decomposition of a memoryless Gaussian source. This indicates that one might use a Gaussian source to obtain good estimates of the pdf of the angle in MP iterations.

For measuring the quality of these estimates some metric is necessary. For evaluating the statistical similarities of the angles in MP iterations for different sources we use an "information similarity" metric, the Kullback-Leibler divergence [35]. For discrete sources it is defined as

$$D_{KL}(P\|Q) = \sum_i \Pr(P(i)) \log_2 \frac{\Pr(P(i))}{\Pr(Q(i))}. \qquad (17)$$

The Kullback-Leibler divergence can be interpreted as a measure of the extra bits that would be required for coding outcomes of a source P using an optimal code designed for the source Q. The larger the D_{KL} between two distributions, the more different the distributions are. It is capable of capturing statistical differences between two sources from a "quantity of information" point of view. It can vary from zero (identical distributions) to infinity.
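A sketch of how these divergences can be estimated from two sets of angle samples via histograms (the 200-bin choice matches the experiments reported next; skipping bins that are empty in either histogram, to avoid log(0), is a convention of this sketch):

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=200, support=(0.0, np.pi / 2)):
    """D_KL(P||Q) of equation (17) estimated from two sets of angle
    samples using histogram estimates of the two distributions."""
    p, edges = np.histogram(p_samples, bins=bins, range=support)
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    mask = (p > 0) & (q > 0)          # skip empty bins to avoid log(0)
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
```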

Figure 11 shows D_{KL} between the distributions of the angles in different decomposition steps for the four-phase Gabor dictionary in R^64. These were computed using histograms of 200 bins for the angles. Since D_{KL}(P,Q) ≠ D_{KL}(Q,P), we show the divergences computed in "both directions", i.e., interchanging the roles of the distributions in equation (17). Figure 11.a) shows the D_{KL} between the distributions of the angles in different decomposition steps for a Gaussian source, while Figure 11.b) shows the results for the Gamma source. One can observe that the divergences between the angle in the first decomposition step and subsequent ones are much smaller for the Gaussian source than for the Gamma source. For the Gaussian source, this suggests that it is reasonable to approximate the angle distribution f_{Θ_m}(θ_m) for m ≠ 1 using f_{Θ_1}(θ_1). That is, the distribution of the angles in the first decomposition step of a Gaussian source captures reasonably well the distributions of the angles in following steps, which does not occur for the Gamma source.

Figure 11: Kullback-Leibler divergences between the angles in different decomposition steps for the four-phase Gabor dictionary in R^64: (a) Gaussian source; (b) Gamma source.

Figure 12 shows the Kullback-Leibler divergences between the angle distributions obtained for the first decomposition step of the Gaussian and Gamma sources and the ones obtained in subsequent steps for the two sources. One can see that the distribution of the angle in the first decomposition step of a Gaussian source is reasonably close to the ones obtained for both the Gaussian and the Gamma source after the first step. Figure 13 presents in the left graph the results of the same experiment but between the Gaussian source and the source of x-ray 8×8 image blocks, where the same behavior can be observed. It also presents in the right graph the behavior of the Kullback-Leibler divergence across the Gamma and the x-ray image blocks sources, in which case a different behavior is observed – neither of the statistical models of the angle in the first MP iteration fits the statistics of the angles in further MP iterations for those sources reasonably well. However, as can be seen, the Gaussian source is capable of providing such a model. Therefore, the results indicate that one could use a Gaussian source to obtain good estimates of the pdf of the angle in MP iterations after the first iteration, irrespective of the source distribution. As a result, the iid statistical model proposed in this section is an appropriate assumption.

Figure 12: Kullback-Leibler divergences between the first angle distribution and subsequent angles across the Gaussian and Gamma sources for the four-phase Gabor dictionary in R^64.

Figure 13: Kullback-Leibler divergences between the first angle distribution and subsequent angles across the Gaussian source and the source generated from 8×8 blocks of x-ray images, for the four-phase Gabor dictionary in R^64 (left); and across a Gamma-distributed source and the x-ray image blocks source (right).

In the sequel, we assess the effectiveness of this model by using it to design Lloyd-Max quantizers for MP coefficients. It should be noted that the statistics of the first MP angle, and therefore of the first coefficient, have a much higher degree of source dependence than in other MP steps, as confirmed by the results in Figure 11. Therefore, in order to make appropriate use of the presented model for the angles between residuals and atoms, the first coefficient, γ_1, will be quantized with negligible error and encoded as side information. For coding the remaining coefficients, a Lloyd-Max quantizer is designed considering that the statistics of the MP angles are invariant with respect to the iteration number and source. One should highlight that the goal is not to obtain the best possible MP quantization, but to assess the usefulness of the proposed model of the pdf of MP coefficients by designing a Lloyd-Max quantizer with it.

5. Angle Based Lloyd-Max Quantizers for MP Coefficients

In equation (11) the value of ‖x‖ is required to compute the coefficients. Alternatively, one can express the coefficients as a function of the first coefficient γ_1; in this case equation (11) becomes

$$\gamma_m = \gamma_1\delta_m, \qquad \delta_m = \tan(\theta_1)\left[\prod_{i=2}^{m-1}\sin(\theta_i)\right]\cos(\theta_m), \quad m \ge 2. \qquad (18)$$

Thus, the pdfs of the coefficients γ_m can be computed from the pdfs of the angles θ_m. For a known γ_1, the pdf of the RV Γ_m, for m ≥ 2, is given by f_{Γ_m}(γ_m|γ_1) = f_{Γ_m}(γ_1δ_m|γ_1) = f_{∆_m}(δ_m|γ_1), where ∆_m is the RV whose outcome is δ_m (defined in equation (18)). If an optimal quantizer Q[·] is designed for the RV Y, then the optimal quantizer for Z = cY (c a constant) is simply a scaled version of Q[·]. Therefore, considering that instead of quantizing γ_m one quantizes δ_m (see equation (18)), and also that γ_1 is known, equation (7) becomes

$$E[d_M^2\,|\,\gamma_1] = \frac{\gamma_1^2}{N}\left[\sum_{m=2}^{M} E\left[e_q^2(\Delta_m)\right] + \sum_{m=2}^{M}\sum_{\substack{l=2 \\ l\neq m}}^{M} E\left[e_q(\Delta_m)\, e_q(\Delta_l)\, \langle g_{i(m)}, g_{i(l)}\rangle\right]\right]. \qquad (19)$$

Thus, assuming γ_1 to be known, if the pdfs of the RVs ∆_m are known then E[d_M²|γ_1] can be computed for any Q[·] applied to ∆_m. For a given γ_1, the quantization of MP coefficients should aim to minimize the distortion per sample of the quantized M-term, defined in equation (19), leading to the design of quantizers for ∆_m, m ≥ 2. Since the quantization is applied to δ_m instead of γ_m, the value of γ_1 is required at the decoder and therefore has to be transmitted as side information. This strategy does not imply any extra bit-rate cost, since when using the MP algorithm one has to transmit ‖x‖ as side information if one wants to use coefficient quantizers that are independent of the signal's dynamic range. If one sends γ_1 instead of ‖x‖, one has the extra advantage of providing the value of γ_1 with a low quantization error, which reduces the overall distortion.

For transmission, it is important that, if coefficients and/or atom indices are lost, the decoder can just ignore the lost terms when reconstructing the signal. Therefore, in order to improve error resilience, it would be desirable that the quantizer for a given γ_m be independent of the quantized values of other γ_l (l ≠ m). One way to accomplish this is to use the same quantization rule for all coefficients of the M-term. In this work we achieve that by designing a single quantizer. The same quantizer is applied to γ_m, 1 < m ≤ M, with M being the number of decomposition steps. For that purpose one considers that the values to be quantized are drawn from the sample space formed by the superposition of those corresponding to the ∆_m (m ≥ 2). In terms of sample spaces, one has that S_∆ = ∪_{m=2}^{M} S_{∆_m}. Note that, in order to do this, we must weight the probability density function f_∆(δ) accordingly. Given the number of MP iterations, all iterations are equally likely; therefore, the pdf of ∆ is the average of the pdfs of each ∆_m, that is,

$$f_\Delta(\delta) = \frac{1}{M-1}\sum_{m=2}^{M} f_{\Delta_m}(\delta). \qquad (20)$$

5.1. Distortion for an Optimal Quantizer

In Section 3 it was observed that the RVs of the angles Θ_m can be assumed to be uncorrelated. Although the RVs ∆_m and ∆_l may be correlated, when designing a quantizer for ∆ drawn from S_∆ = ∪_{m=2}^{M} S_{∆_m}, the assumption that e_q(∆_m) and e_q(∆_l) (where e_q(∆_m) = ∆_m − Q[∆_m] and m ≠ l) are uncorrelated is reasonable. It is also reasonable to assume that e_q(∆_m)e_q(∆_l) and ⟨g_{i(m)}, g_{i(l)}⟩ are uncorrelated. In addition, due to the invariant nature of MP angle statistics, it is possible to infer that the expected value of ⟨g_{i(m)}, g_{i(l)}⟩ is invariant. More specifically, we consider that E[⟨g_{i(m)}, g_{i(l)}⟩] = c, i.e., it is constant. Applying these assumptions to the second term of the right-hand side of equation (19) yields

$$\frac{\gamma_1^2}{N}\sum_{m=2}^{M}\sum_{\substack{l=2 \\ l\neq m}}^{M} E\left[e_q(\Delta_m)\, e_q(\Delta_l)\, \langle g_{i(m)}, g_{i(l)}\rangle\right] = (M-2)\, E[e_q(\Delta)] \sum_{m=2}^{M} E[e_q(\Delta_m)]\, c. \qquad (21)$$

An optimal quantizer for ∆ should be an unbiased estimator of ∆, that is, E[e_q(∆)] = 0 [37]. This makes the expression in equation (21) vanish. As a result, equation (19) becomes

$$E[d_M^2\,|\,\gamma_1] = \frac{\gamma_1^2}{N}\sum_{m=2}^{M} E\left[(\Delta_m - Q[\Delta_m])^2\right]. \qquad (22)$$

Because the same Q[·] applies to all ∆_m, 2 ≤ m ≤ M, and f_∆(δ) is given by equation (20), then

$$E[d_M^2\,|\,\gamma_1] = \frac{\gamma_1^2(M-1)}{N}\, E\left[(\Delta - Q[\Delta])^2\right], \qquad (23)$$

where the factor M − 1 multiplying the expectation takes into account the contribution of the M − 1 coefficients. In addition, it should be highlighted that no restriction is imposed on the number of decomposition steps M, and equation (23) applies for both small and large values of M. Equation (23) means that the distortion of the quantized MP decomposition, given its first coefficient γ_1, is equivalent to the quantization distortion of ∆. This random variable is given by the union of the random variables ∆_m, which correspond to the MP coefficients normalized by γ_1. Therefore, if the same quantization rule is applied to all the coefficients in M-terms, then one can minimize the quantization distortion by designing the optimal mean-square, or Lloyd-Max, quantizer (LMQ) [37] for ∆.

5.2. Lloyd-Max Quantizer Design

In order to design a Lloyd-Max quantizer (LMQ) [37] for ∆ we have to minimize the expression in equation (23), which can be written as the following integral:

$$E[d_M^2\,|\,\gamma_1] = \frac{\gamma_1^2(M-1)}{N}\int (\delta - Q[\delta])^2 f_\Delta(\delta)\, d\delta, \ \text{with} \qquad (24)$$

$$f_\Delta(\delta) = \frac{1}{M-1}\sum_{m=2}^{M} f_{\Delta_m}(\delta), \ \text{with} \ \Delta_m = \tan(\Theta_1)\left[\prod_{i=2}^{m-1}\sin(\Theta_i)\right]\cos(\Theta_m). \qquad (25)$$

The reconstruction levels and thresholds of the L-level, or b_coef-bit (L = 2^{b_coef} levels), LMQ are calculated using an iterative algorithm such as the one in [37], with the restriction that they are all constrained to be positive (see the discussion following equation (11)). An estimate of f_∆(δ) is required for obtaining the LMQ. The f_{∆_m}(δ) composing f_∆(δ) (see equation (20)) are computed using the model developed in Section 4, whereby the pdfs of MP angles are all equal to f_{Θ_1}(θ_1). To estimate f_{Θ_1}(θ_1), an ensemble of #D·N² one-step decompositions of realizations from a Gaussian source was employed. The number of bins employed to estimate f_{Θ_1}(θ_1) was varied according to the value of b_coef, since for smaller values of b_coef fewer bins are needed for optimizing the quantizer; we used 100×2^{b_coef} bins for estimating f_{Θ_1}(θ_1).
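Putting the pieces together, the design procedure can be sketched as follows (helper names are ours; the pool of first-iteration angles stands in for the estimate of f_{Θ_1}(θ_1), and the sample-based Lloyd iteration replaces the pdf-based algorithm of [37]):

```python
import numpy as np

def delta_samples(theta1_pool, M, n, rng):
    """Draw n outcomes of Delta: pick an iteration m uniformly in 2..M and
    form delta_m of equation (18), every angle drawn iid from the pool of
    first-iteration angles (the model of Section 4), pooled as in (25)."""
    ms = rng.integers(2, M + 1, n)
    out = np.empty(n)
    for j, m in enumerate(ms):
        th = rng.choice(theta1_pool, size=m)            # theta_1 .. theta_m
        out[j] = np.tan(th[0]) * np.prod(np.sin(th[1:-1])) * np.cos(th[-1])
    return out

def lloyd_max(samples, bits, iters=100):
    """Plain sample-based Lloyd iteration; all levels stay positive since
    the delta samples are non-negative."""
    levels = np.quantile(samples, np.linspace(0.05, 0.95, 2 ** bits))
    for _ in range(iters):
        thresholds = (levels[:-1] + levels[1:]) / 2
        cell = np.digitize(samples, thresholds)         # nearest-level cells
        for c in range(levels.size):                    # centroid update
            sel = samples[cell == c]
            if sel.size:
                levels[c] = sel.mean()
    return np.sort(levels)
```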

Given that the quantizer design is independent of γ_1, it is sufficient to design quantizers for γ_1 = 1 and store copies of them in both encoder and decoder. Since f_∆(δ) depends on the number of MP terms M, there is a different LMQ for each pair of number of bits per coefficient b_coef and M. In order to inform these values to the decoder, the generated bitstream includes a header containing b_coef, M and γ_1 quantized with low quantization error. The parameter γ_1 is used to scale the quantizer defined by b_coef and M in both coder and decoder, a simple strategy that makes good use of resources and has the extra advantage of providing the value of γ_1 with low distortion at the decoder. This approach also avoids the inadequacy, noted in Subsection 4.3 for sources that are not Gaussian, of using the same statistical model for the first MP angle.

Figure 14 shows LMQs obtained for b_coef ranging from 1 to 4 bits and M = {2, 4, 8, 16}, for a dictionary GSND(16, 4). One observes that, for a given b_coef, the LMQ is almost invariant after a sufficiently large number of terms M. This can be understood through the following argument. From the definition of δ_m (see equation (18)), and considering that the Θ_i are iid, the pdfs of successive RVs ∆_m get narrower and their averages approach zero as m increases. Therefore, the probability that each ∆_m is small increases as m increases. Since ∆ = ∪_{m=2}^{M} ∆_m, the probability of ∆ being close to zero also increases with M. Since sufficiently small values of δ are quantized to zero due to the quantizer dead-zone, the distortion due to coefficient quantization in equation (22) changes very little after a given M = J. The value of J depends on f_{Θ_1}(θ_1) (as the Θ_i are considered to be iid), which in turn depends on the dictionary D. Thus, for practical applications one designs b_coef-bit LMQs only for the range from 2 to J MP steps. For M > J the quantizer obtained for M = J can be used without major impact. J can be obtained either by observing the quantizer thresholds and reconstruction levels or by measuring the quantization distortion as M increases; when their variation from M to M + 1 is smaller than a threshold, J is reached. Note that as b_coef increases the quantizer dead-zone shrinks, thus reducing the coefficient range that is mapped to zero. This implies that J also depends on b_coef.

Figure 14: Coefficient LMQs (1 to 4 bits) for a dictionary GSND(16, 4), with γ_1 = 1, for 2, 4, 8 and 16 terms.

6. Comparison to the State of the Art

The pair (M, b_coef) (number of terms and number of quantization levels L = 2^{b_coef}) defines the LMQ that is applied to all the coefficients of the M-term. Since for an M-term decomposition one has to encode M atoms and M − 1 coefficients (γ_1 is encoded as side information in a header), it is easy to find a code such that the data rate of the Lloyd-Max quantized MP decomposition is

$$R = M\lceil \log_2(\#D)\rceil + (M-1)\, b_{\mathrm{coef}} + b_{\mathrm{header}}. \qquad (26)$$

Given an ensemble of signals from a certain source, one can optimize the overall rate-distortion (RD) performance of the proposed MP quantization method using the following procedure. We start by decomposing each signal of the ensemble using the MP algorithm. Then, for each target rate R one searches a bounded subset of ℕ² for the pair (M, b_coef) that leads to the lowest average distortion for the ensemble. The (M, b_coef) pairs leading to the best RD performance curve for the given source can be stored in both coder and decoder. Then, for transmitting a given signal, the coder simply chooses the pair meeting a desired RD target. In the present work the RD optimization of the LMQ is done considering just the Gaussian source; that is, the same set of pairs (M, b_coef) is employed independently of the source being coded. When coding different sources it is not necessary to indicate the source being coded, i.e., the same quantizer is employed. It should be noticed that this (M, b_coef) pair-selection strategy can be applied to both off-line and on-line quantization. In addition, this RD optimization could easily be adapted to consider more sophisticated RD tradeoffs, such as the one in [23].
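A sketch of this search (the rate follows equation (26); the distortion callback is a stand-in for measuring the average E[d²] of the ensemble with the LMQ defined by each pair):

```python
import numpy as np

def rate(M, b_coef, card, b_header):
    """Data rate of a Lloyd-Max quantized M-term, equation (26)."""
    return M * int(np.ceil(np.log2(card))) + (M - 1) * b_coef + b_header

def rd_optimize(distortion, target_rate, card, b_header, max_M=16, max_bits=8):
    """Exhaustive search over a bounded subset of N^2: return the
    (M, b_coef) pair of lowest average distortion whose rate does not
    exceed the target.  'distortion(M, b)' is a caller-supplied estimate
    of E[d^2] over the ensemble, e.g. measured with the LMQ above."""
    best_pair, best_d = None, np.inf
    for M in range(1, max_M + 1):
        for b in range(1, max_bits + 1):
            if rate(M, b, card, b_header) > target_rate:
                continue                      # over the rate budget
            d = distortion(M, b)
            if d < best_d:
                best_pair, best_d = (M, b), d
    return best_pair, best_d
```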

The EUQ scheme [18] adjusts the number of quantization levels for each MP coefficient according to an RD estimate. The corresponding Matlab code has been kindly provided by the authors of [18] and has been used in our experiments. In this code entropy encoding is applied to the differences between successive quantization indices. In order to minimize the impact of using different codes for the different quantization schemes, the same coding strategy was applied to the LMQ in our paper. This was done so that the differences in coding performance are due solely to the quantizer and not to the coding strategy. Figure 15 shows the RD curves of quantized MP expansions originated from three different memoryless iid random sources in R^10 (Gaussian, uniform and Gamma), using both LMQ and EUQ for a dictionary GSND(128, 10). For each distinct source the RD plots are averages over an ensemble of 100 signals. For this experiment LMQs were designed for (M, b_coef) ∈ ([1, ..., 16], [1, ..., 8]). The EUQ was set so that its header info has the same length as that of the LMQs, that is, the second coefficient can use from 1 to 128 quantization levels. It can be seen in Figure 15 that both quantizers have similar RD performance for all three sources; yet, the LMQ has better RD performance than the EUQ at low rates. The LMQ is also able to work at lower data rates than the EUQ.

Figure 15: LMQ and EUQ RD curves for three different random sources using the GSND(128, 10) dictionary.

The EUQ codes each signal with a different rate, i.e., its bit-rate is signal-dependent, since the quantizers it employs depend on the values of the coefficients in the M-term. Therefore, it is important to evaluate the ability of the two methods to achieve a given data rate. In order to perform such an evaluation, we analyze their RD plots for individual realizations of a random source. Figure 16 shows the RD curves of three realizations of a Gaussian source when their MP expansions using a dictionary GSND(128, 10) are quantized with both the RD-optimized LMQ and the EUQ. In Figure 16 one notes that at rates below 12 bits/sample both methods achieve similar distortion. However, the existence of a larger number of RD points at low rate for the LMQ than for the EUQ indicates that the LMQ provides a finer and more accurate rate control.

Figure 17 shows the RD curves for the LMQ and the EUQ for the Gaussian, uniform and Gamma iid sources in R^64 using the 4-phase Gabor dictionary (see equation (16)). For that purpose the LMQs were designed with b_coef ∈ [1, ..., 6]. For each distinct source the results are averages over an ensemble of 200 signals. For instance, note that the coding of these signals employs pairs (M, b_coef) that range from (1, 0) at the lowest rate, through (32, 4)

Figure 16: LMQ compared to the EUQ for the GSND(128, 10) dictionary – RD curves for three different signals drawn from a Gaussian source.

[Figure omitted in this extraction: three RD panels (Gaussian, uniform and Gamma sources), each plotting E[d²] on a log scale (10⁻⁴ to 10⁰) versus Rate [bits/sample] (0 to 18), comparing LMQ and EUQ.]

Figure 17: LMQ and EUQ RDs for three different random sources using the 4-phase Gabor dictionary in R^64.

The decompositions quantized with EUQ require larger bit-rates because the bit allocation employed in [18] fails for dictionaries that have a large Θ(D) (as is often the case in spaces of high dimension). In order to make a fairer comparison with LMQ, and thus better assess the proposed MP angles model, we have introduced a small improvement to EUQ: we allow the EUQ bitstream to be truncated after coding any given number of MP terms, which provides a more precise rate-distortion control (see the sketch after this paragraph). Figure 18 presents the RD plots of both LMQ and the modified EUQ for three realizations of the random Gaussian source when using the Gabor dictionary of 4 phases in R^64. The decompositions use a total of 100 terms. The LMQs employed to obtain these graphs were designed for (M, bcoef) ∈ ([1, . . . , 64], [1, . . . , 8]). The EUQ was set to allow the quantization of the second coefficient using from 1 to 256 quantization levels. In Figure 18 one observes that both the “truncated” EUQ and the LMQ have similar RD performance at low bit rates.
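A sketch of this truncation idea follows; it assumes (our naming and structure, not the code of [18]) that the coded stream is a header followed by a concatenation of per-term segments, so cutting it after m whole segments yields a valid m-term decoder input.

# Sketch of bitstream truncation for rate control, under the assumption that
# the stream is a header followed by independently decodable per-term segments.
def truncate_stream(header_bits, term_segments, rate_budget_bits):
    """Keep the header plus as many whole per-term segments as fit the budget."""
    stream = list(header_bits)
    used = len(header_bits)
    for seg in term_segments:
        if used + len(seg) > rate_budget_bits:
            break  # stop coding further MP terms
        stream.extend(seg)
        used += len(seg)
    return stream

header = [1, 0, 1, 1]
segments = [[0, 1] * 4, [1, 0] * 3, [1, 1] * 5]    # 8, 6 and 10 bits
print(len(truncate_stream(header, segments, 20)))  # -> 18 (header + 2 terms)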

64. The decompositionsuse a total of 100 terms. The LMQs employed to obtain these graphs were designed for(M, bcoef) ∈ ([1, . . . , 64], [1, . . . , 8]). The EUQ was set to allow the quantization of thesecond coefficient using from 1 to 256 bits. In Figure 18 one observes that both the“truncated” EUQ and the LMQ have similar RD performance at low bit rates.

[Figure omitted in this extraction: three RD panels, one per signal realization, each plotting E[d²] on a log scale (10⁻⁴ to 10⁰) versus Rate [bits/sample] (0 to 18), comparing LMQ and the truncated EUQ.]

Figure 18: LMQ compared to the EUQ for the 4-phase Gabor dictionary in R^64 – RD curves for three different signals drawn from a Gaussian source.

Some experiments were also carried out on real-life signals. We present the results obtained when coding pieces of an audio signal with both LMQ and EUQ. In this experiment, the audio signal is divided into non-overlapping blocks of 64 samples each and the bit-rate is allocated for each individual block independently of the other blocks (a minimal sketch of this block-wise setup follows). The results are shown in Figure 19, where one can observe that the LMQ and EUQ have similar performances, indicating that the quantizers designed by the proposed approach may also be valid for correlated sources. Similar results were obtained for x-ray images.
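The block-wise setup is straightforward; a minimal sketch, assuming a generic encode_block routine (hypothetical stand-in for the MP decomposition plus quantization of one block), is:

import numpy as np

def encode_block(block):
    # Hypothetical stand-in for the MP + LMQ coding of one block; here it
    # just returns the block's energy as a placeholder "stream".
    return float(np.dot(block, block))

def code_signal_blockwise(signal, block_len=64):
    """Split a 1-D signal into non-overlapping 64-sample blocks and code each
    block independently of the others, as done in the audio experiment."""
    n_blocks = len(signal) // block_len  # any trailing partial block is ignored
    return [encode_block(signal[b * block_len:(b + 1) * block_len])
            for b in range(n_blocks)]

audio = np.random.randn(1024)             # stand-in for the audio signal
print(len(code_signal_blockwise(audio)))  # -> 16 blocks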

7. Conclusion

When applying the Matching Pursuit to signal coding, the coefficients of the obtained decompositions need to be quantized. We have focused on the design of off-loop quantizers for those coefficients, that is, first the decomposition is obtained and then it is quantized. However, in order to design quantizers for those coefficients an appropriate statistical model is required. As an alternative route to such a statistical model, we studied the statistics of the angles in Matching Pursuit decompositions.


[Figure omitted in this extraction: a single RD panel titled “Audio signal coding”, plotting E[d²] on a log scale (10⁻⁴ to 10⁻²) versus Rate [bits/sample] (0.5 to 4), comparing LMQ and EUQ.]

Figure 19: Rate×Distortion performance for the quantization of the MP decomposition of an audio signal using both LMQ and EUQ for the 4-phase Gabor dictionary in R^64.

The empirical analysis of the angle between the residue and the atom selected in Matching Pursuit iterations led to the conjecture that these angles may be statistically modeled as independent and identically distributed. Therefore, they can be approximately considered as statistically invariant with respect to the decomposition iteration number. In addition, since Matching Pursuit residues tend to be chaotic, they do not have any preferred orientation in signal space, thus resembling a memoryless white Gaussian signal source. As a consequence, we considered estimating the statistics of the angles between the residuals and the corresponding atoms from the statistics of the angle between the outcomes of a Gaussian iid source and their closest atoms. Numerical results indicate that this is indeed a good approximation.
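A minimal Monte Carlo sketch of this estimation idea, assuming an arbitrary unit-norm dictionary matrix D (our notation, atoms as rows) and the usual magnitude-based MP atom selection, is:

import numpy as np

def angle_samples(D, n_draws=10000, seed=0):
    """Sample the angle between white Gaussian vectors and their closest atom.
    D: (#atoms, N) array of unit-norm dictionary atoms."""
    rng = np.random.default_rng(seed)
    N = D.shape[1]
    x = rng.standard_normal((n_draws, N))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # angles are scale-invariant
    cos_max = np.max(np.abs(x @ D.T), axis=1)      # |<x, g>| for the closest atom
    return np.degrees(np.arccos(np.clip(cos_max, -1.0, 1.0)))

# Example: a random 128-atom dictionary in R^10 (a stand-in for GSND(128, 10)).
rng = np.random.default_rng(1)
D = rng.standard_normal((128, 10))
D /= np.linalg.norm(D, axis=1, keepdims=True)
angles = angle_samples(D)
print(angles.mean())  # the samples can be histogrammed to estimate the pdf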

It was also shown that if the dictionary includes at least one orthonormal basis, then the Matching Pursuit has a non-null probability of producing a null residue whenever the number of decomposition steps is greater than or equal to the signal space dimension. Therefore, for such dictionaries the independent and identically distributed statistical model must employ two different sets of statistics: one applies whenever the iteration number is smaller than the signal dimension, and the other applies when the iteration number is greater than or equal to the signal dimension.

The independent and identically distributed statistical model of the angles between the residuals and the closest atoms in the Matching Pursuit algorithm was assessed by using it to develop a method to encode MP decompositions based on Lloyd-Max quantizers for Matching Pursuit coefficients. Their design is mainly based on the estimate of the probability density function of the angle with the closest atom for signals drawn from a white Gaussian signal source. The developed method for encoding MP decompositions was compared to a state-of-the-art off-loop MP decomposition encoding scheme; both encoders were shown to have similar rate-distortion performance. Therefore, the derived Lloyd-Max quantizer validates and shows the effectiveness of the proposed statistical model for the MP angles.


A. Proof of the Null Residue Proposition

First we state some relevant definitions and facts:

1. The MP algorithm is employed to decompose vectors x from a source whose realizations are a subset of R^N.

2. D is given by D ≡ {g_k}_{k∈{1,...,#D}} and its elements have unit norms, i.e., ‖g_k‖ = 1 ∀ k ∈ {1, . . . , #D}, and #D is finite.

3. All dictionary atoms are different, i.e., |〈g_k, g_j〉| < 1, ∀ j ≠ k ∈ {1, . . . , #D}.

4. We use f_X(y) to denote the pdf of the signal source X and f_{R_x^m}(y) to denote the pdf of the m-th MP residue.

Definition 1 The Voronoi region or cell [36] associated to each dictionary element g_k is

V_k = { x ∈ R^N | 〈x, g_k〉 > 〈x, g_j〉, j ∈ {1, . . . , #D} − {k} }.

a) Since g_k ≠ g_j for j ≠ k and ‖g_k‖ = 1, then V_k ≠ ∅, ∀ k ∈ {1, . . . , #D}, i.e., ∫_{V_k} dx > 0, ∀ k ∈ {1, . . . , #D}.

b) Let d = min_{g_j, g_k ∈ D, j≠k} ‖g_j − g_k‖; then, if ‖x‖ < d/2, g_j + x ∈ V_j, ∀ g_j ∈ D.

c) If x ∈ V_k then ax ∈ V_k, ∀ a ∈ R*₊.

d) At the m-th MP step, if r_x^{m−1} ∈ V_k then the MP selects g_{i(m)} = g_k, i.e., i(m) = k, for approximating r_x^{m−1}.

e) Let P(m, k) = Pr(r_x^{m−1} ∈ V_k) denote the probability of the m-th MP residual being inside the Voronoi cell V_k. Note that

P(1, k) = ∫_{V_k} f_X(x) dx, . . . , P(m, k) = ∫_{V_k} f_{R_x^{m−1}}(y) dy.
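In computational terms, locating the Voronoi cell of a residue is just an inner-product search, and item d) is the MP selection rule. A minimal sketch (our notation) follows; it uses the signed inner-product convention of Definition 1 (standard MP selects by the magnitude |〈x, g〉| instead):

import numpy as np

def voronoi_cell(x, D):
    """Index of the Voronoi cell V_k containing x, following Definition 1:
    the atom g_k maximizing the (signed) inner product <x, g_k>.
    D holds the unit-norm atoms as rows."""
    return int(np.argmax(D @ x))

def mp_step(residue, D):
    """One MP iteration (Definition 1.d): pick the atom whose cell contains
    the residue, then subtract its projection to form the next residue."""
    k = voronoi_cell(residue, D)
    coeff = float(D[k] @ residue)
    return k, coeff, residue - coeff * D[k]

# Tiny usage example with a random unit-norm dictionary in R^4.
rng = np.random.default_rng(0)
D = rng.standard_normal((9, 4))
D /= np.linalg.norm(D, axis=1, keepdims=True)
k, c, r = mp_step(rng.standard_normal(4), D)
print(k, c, np.linalg.norm(r))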

Definition 2 Let B_Q ⊂ R^N, B_Q ≡ {g_k}_{k∈{1,...,Q}}, be a set of Q orthonormal vectors, i.e., 〈g_i, g_j〉 = δ_{i,j}, ∀ i, j ∈ {1, . . . , Q}. H(B_Q) is defined as the Q-dimensional hyper-plane spanned by B_Q, i.e., x ∈ H(B_Q) if and only if x = Σ_{i=1}^{Q} β_i g_i, g_i ∈ B_Q.

Definition 3 Let B_Q be a set of Q orthonormal vectors in R^N, and H(B_Q) as in Definition 2. Given α > 0, ǫ > 0 and g ∈ H(B_Q), we define S_Q(αg, ǫ, N, B_Q) as the Q-dimensional hyper-sphere of radius ǫ centered at αg that is contained in H(B_Q).

a) Note that for α > 0 and x, g ∈ H(B_Q), if ‖x − αg‖ < ǫ then x ∈ S_Q(αg, ǫ, N, B_Q).

b) Note that if B_Q ⊂ B_{Q+1} and g ∈ H(B_Q) then S_Q(αg, ǫ, N, B_Q) ⊂ S_{Q+1}(αg, ǫ, N, B_{Q+1}).

c) S_Q(0, ǫ, N, B_Q) denotes the Q-dimensional hyper-sphere of radius ǫ centered at the origin.

Using the above definitions, the “null residue proposition” is stated in the theorem below.

Theorem 2 Let D ≡ {g_k}_{k∈{1,...,#D}} contain an orthonormal basis B_N of R^N, B_N ≡ {g_{l_k}}_{k∈{1,...,N}}, i.e., ∃ I_N ⊂ ℕ, I_N = {l_1, . . . , l_N}, such that 〈g_{l_i}, g_{l_j}〉 = δ(i − j) for i and j ∈ {1, . . . , N}.

Let the signal source X have the following property: ∃ g_{l_k}, l_k ∈ I_N, such that ∃ α_{l_k}, ǫ_{l_k} > 0 with

S_N(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_N) ⊂ V_{l_k} and f_X(x) > 0, ∀ x ∈ S_N(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_N). (27)

Then, there is a non-zero probability of the MP selecting only atoms that belong to B_N in the first q ≤ N steps, that is,

Pr(i(1) ∈ I_N, i(2) ∈ I_N, . . . , i(q) ∈ I_N) > 0. (28)

This implies a non-zero probability of the residue being null after the N-th MP iteration, that is,

Pr(r_x^N = 0) > 0. (29)

In order to prove this theorem we need some auxiliary results.

Lemma 1 Let D be as in Theorem 2. Given 0 ≤ m < N, let B_{N−m} ⊂ B_N be an orthonormal set defined by B_{N−m} ≡ {g_{l_k}}_{l_k∈I_{N−m}}, I_{N−m} ⊂ I_N. Let X_m be a source whose outcomes belong to H(B_{N−m}). This means that if x ∈ X_m, then x = Σ_{l_i∈I_{N−m}} β_{l_i} g_{l_i}.

If the source R_x^{q−1} has the following property: ∃ g_{l_k}, l_k ∈ I_{N−m}, such that ∃ α_{l_k}, ǫ_{l_k} > 0 with

S_{N−m}(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_{N−m}) ⊂ V_{l_k} and f_{R_x^{q−1}}(y) > 0, ∀ y ∈ S_{N−m}(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_{N−m}), (30)

then, if i(q) = l_k, ∀ l_i ∈ I_{N−m−1} = I_{N−m} − {l_k} there exist α_{l_i} and ǫ_{l_i} such that

f_{R_x^q}(y | i(q) = l_k) > 0, ∀ y ∈ S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}). (31)

That is, if i(q) = l_k then the resulting residue satisfies the assumption in eq. (30) ∀ l_i ∈ I_{N−m−1}.

Proof:

1. Since B_{N−m} = {g_{l_1}, . . . , g_{l_{N−m}}} is an orthonormal basis of H(B_{N−m}) ⊂ R^N, any x ∈ H(B_{N−m}) can be expressed as x = Σ_{i=1}^{N−m} γ_{l_i} g_{l_i}. Thus, ∀ x ∈ S_{N−m}(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_{N−m}) we have that

x = α_{l_k} g_{l_k} + Σ_{i=1}^{N−m} β_{l_i} g_{l_i} and then ‖x − α_{l_k} g_{l_k}‖² = Σ_{i=1}^{N−m} β_{l_i}² ≤ ǫ_{l_k}². (32)

2. If when approximating x the MP iteration selects g_{l_k}, i.e., i(q) = l_k, then the resulting residue is r_x^q = x − γ_{l_k} g_{l_k}. From eq. (32) it follows that γ_{l_k} = 〈x, g_{l_k}〉 = α_{l_k} + β_{l_k}, and then

r_x^q = Σ_{l_i∈I_{N−m}−{l_k}} β_{l_i} g_{l_i} → ‖r_x^q‖ = ( Σ_{l_i∈I_{N−m}−{l_k}} β_{l_i}² )^{1/2} ≤ ǫ_{l_k}.

That is, r_x^q ∈ S_{N−m−1}(0, ǫ_{l_k}, N, B_{N−m−1}), where B_{N−m−1} = B_{N−m} − {g_{l_k}}. Therefore, from eq. (30) we have that

f_{R_x^q}(y | i(q) = l_k) > 0, ∀ y ∈ S_{N−m−1}(0, ǫ_{l_k}, N, B_{N−m−1}). (33)

3. It is a fact that ∀ l_i ∈ I_{N−m−1} one has that V_{l_i} ∩ S_{N−m−1}(0, ǫ_{l_k}, N, B_{N−m−1}) ≠ ∅:

(a) Note that V_{l_i} ∩ H(B_{N−m−1}) ≠ ∅ and 0 ∈ V_{l_i} ∩ H(B_{N−m−1}).

(b) Note also that if x ∈ V_{l_i} ∩ H(B_{N−m−1}) then ax ∈ V_{l_i} ∩ H(B_{N−m−1}), a > 0.

(c) Let x ∈ V_{l_i} ∩ H(B_{N−m−1}); if a ≤ ǫ_{l_k}/‖x‖ then ax ∈ S_{N−m−1}(0, ǫ_{l_k}, N, B_{N−m−1}).

4. For each l_i ∈ I_{N−m−1} there exists S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}) such that

S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}) ∩ V_{l_i} = S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1})
and S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}) ⊂ S_{N−m−1}(0, ǫ_{l_k}, N, B_{N−m−1}). (34)

This can be shown by the following arguments:

(a) From Definition 1.b) we have that for each g_{l_i} ∈ B_{N−m−1} there exists d > 0 such that

S_{N−m−1}(g_{l_i}, d, N, B_{N−m−1}) ∩ V_{l_i} = S_{N−m−1}(g_{l_i}, d, N, B_{N−m−1}). (35)

(b) From Definition 1.c), we know that if eq. (35) holds then, ∀ a > 0,

S_{N−m−1}(a g_{l_i}, ad, N, B_{N−m−1}) ∩ V_{l_i} = S_{N−m−1}(a g_{l_i}, ad, N, B_{N−m−1}).

(c) It is known that ∀ y ∈ S_{N−m−1}(a g_{l_i}, ad, N, B_{N−m−1}), ‖y‖ is such that |a(1 − d)| ≤ ‖y‖ ≤ |a(1 + d)|.

(d) Therefore, if we select a < ǫ_{l_k}/|1 + d| then S_{N−m−1}(a g_{l_i}, ad, N, B_{N−m−1}) ⊂ S_{N−m−1}(0, ǫ_{l_k}, N, B_{N−m−1}).

(e) Therefore, if we select 0 < α_{l_i} < a and 0 < ǫ_{l_i} < ad then eq. (34) is valid.

5. From eqs. (33) and (34) we have that ∀ l_i ∈ I_{N−m−1} there exist α_{l_i} and ǫ_{l_i} such that

f_{R_x^q}(y | i(q) = l_k) > 0, ∀ y ∈ S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}). □

Lemma 2 Let D be as in Theorem 2. Let also B_{N−m} and the source R_x^{q−1} be as in Lemma 1. Let Pr(i(q+1) = l_i | i(q) = l_k) be the probability of i(q+1) = l_i given that i(q) = l_k. Then

Pr(i(q+1) = l_i | i(q) = l_k) > 0, ∀ l_i ∈ I_{N−m−1} = I_{N−m} − {l_k}. (36)

Proof:

1. From Lemma 1 we know that if i(q) = l_k then ∀ l_i ∈ I_{N−m−1} there exists S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}) such that f_{R_x^q}(y | i(q) = l_k) > 0, ∀ y ∈ S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}).

2. Since S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1}) ⊂ V_{l_i}, then ∀ l_i ∈ I_{N−m−1} we have that

Pr(i(q+1) = l_i | i(q) = l_k) = ∫_{V_{l_i}} f_{R_x^q}(y | i(q) = l_k) dy ≥ ∫_{S_{N−m−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−m−1})} f_{R_x^q}(y | i(q) = l_k) dy > 0. □

Proof of Theorem 2:

1. Since at each MP step g_{i(n)} ⊥ r_x^n, if from steps 1 to q the selected atoms belong to B_N we have that r_x^q ⊥ g_{i(k)}, k ∈ {1, . . . , q}, i.e., r_x^q ∈ H(B_{N−q}), B_{N−q} = B_N − {g_{i(1)}, . . . , g_{i(q)}}. Therefore, if i(1), i(2), . . . , i(q) ∈ I_N then necessarily i(q+1) ∉ {i(1), . . . , i(q)}. However, one still has to account for all the possible ways this selection can occur. Therefore, restricting the definition of I_{N−q} to be I_{N−q} = I_N − {i(1), . . . , i(q)} (a particular case of I_{N−m} as defined in Lemma 1), one has that

Pr(i(1), i(2), . . . , i(q) ∈ I_N) = Pr(i(1) ∈ I_N, i(2) ∈ I_{N−1}, . . . , i(q) ∈ I_{N−(q−1)})
= Σ_{l_{k(1)}∈I_N} P(1, l_{k(1)}) { Σ_{l_{k(2)}∈I_{N−1}} Pr(i(2) = l_{k(2)} | i(1) = l_{k(1)}) [ · · · ( Σ_{l_{k(q)}∈I_{N−(q−1)}} Pr(i(q) = l_{k(q)} | i(q−1) = l_{k(q−1)}) ) ] } > 0. (37)

2. Now we prove that eq. (37) is valid for q ≤ N by induction.

(a) Note that S_N(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_N) ⊂ [V_{l_k} ∩ H(B_N)]. Therefore, eq. (27) implies that

P(1, l_k) = ∫_{V_{l_k}∩H(B_N)} f_X(y) dy ≥ ∫_{S_N(α_{l_k} g_{l_k}, ǫ_{l_k}, N, B_N)} f_X(y) dy > 0.

Note that for the source considered both Lemmas 1 and 2 are valid. That is, ∀ l_i ∈ I_{N−1}, ∃ α_{l_i} and ǫ_{l_i} such that

f_{R_x^1}(y | i(1) ∈ I_N) > 0, ∀ y ∈ S_{N−1}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−1}),

and

Pr(i(1) ∈ I_N, i(2) ∈ I_{N−1}) = Σ_{l_k∈I_N} P(1, l_k) Σ_{l_i∈I_{N−1}} Pr(i(2) = l_i | i(1) = l_k) > 0.

(b) Suppose now that for at least one l_i ∈ I_{N−(M−1)}, ∃ α_{l_i} and ǫ_{l_i} such that

f_{R_x^{M−1}}(y | i(M−1) ∈ I_{N−(M−2)}) > 0, ∀ y ∈ S_{N−(M−1)}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−(M−1)}),

and that eq. (37) is greater than zero for q = M.

i. From Lemma 1, we know that ∀ l_i ∈ I_{N−M}, ∃ α_{l_i} and ǫ_{l_i} such that

f_{R_x^M}(y | i(M) ∈ I_{N−(M−1)}) > 0, ∀ y ∈ S_{N−M}(α_{l_i} g_{l_i}, ǫ_{l_i}, N, B_{N−M}).

ii. Therefore, from Lemma 2 we have that

Σ_{l_i∈I_{N−M}} Pr(i(M+1) = l_i | i(M)) > 0. (38)

iii. Therefore, if Pr(i(1), i(2), . . . , i(M) ∈ I_N) > 0 then Pr(i(1), i(2), . . . , i(M+1) ∈ I_N) > 0.

(c) If i(1), i(2), . . . , i(N−1) ∈ I_N then I_1 = I_N − {i(1), i(2), . . . , i(N−1)} = {l̂} and the resulting residue is r_x^{N−1} = α g_{l̂} (which is orthogonal to the previously selected elements). Since g_{l̂} ∈ D, the MP makes i(N) = l̂, therefore

Pr(i(1), i(2), . . . , i(N−1), i(N) ∈ I_N) = Pr(i(1), i(2), . . . , i(N−1) ∈ I_N) > 0.

(d) In item (b) it was shown that if eq. (28) is true for q = M then it is also true for q = M+1; in item (a) it was verified that eq. (28) is true for q = 2; in item (c) it was verified that for q = N eq. (28) equals its value for q = N−1. Therefore, by induction, eq. (28) is true for any q ≤ N.

3. Note that the selection i(N) = l̂ generates a null residue, therefore

Pr(r_x^N = 0) = Pr(i(1), i(2), . . . , i(N−1) ∈ I_N) > 0. □

Corollary 1 Let D be as in Theorem 2. If the signals to be decomposed using the MP come from an iid Gaussian source then

Pr(r_x^q = 0) > 0, q = N. (39)

Proof: The N-dimensional iid Gaussian source is such that f_X(x) > 0, ∀ x ∈ R^N, satisfying the condition in eq. (27) ∀ g_{l_k} ∈ B_N. This results in Pr(r_x^N = 0) > 0. □
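As a numerical sanity check of the corollary (not part of the proof), one can run MP with a dictionary containing an orthonormal basis on Gaussian vectors and count how often the residue vanishes by step N. A minimal sketch follows, using the canonical basis of R^5 plus random redundant atoms and the usual magnitude-based atom selection; the empirical fraction is only an estimate of the true (positive) probability.

import numpy as np

rng = np.random.default_rng(0)
N = 5
extra = rng.standard_normal((10, N))
extra /= np.linalg.norm(extra, axis=1, keepdims=True)
D = np.vstack([np.eye(N), extra])  # dictionary containing an orthonormal basis

def run_mp(x, n_steps):
    """Run n_steps MP iterations and return the final residue."""
    r = x.copy()
    for _ in range(n_steps):
        k = int(np.argmax(np.abs(D @ r)))  # magnitude-based atom selection
        r = r - (D[k] @ r) * D[k]
    return r

trials, hits = 2000, 0
for _ in range(trials):
    r_final = run_mp(rng.standard_normal(N), N)
    hits += np.linalg.norm(r_final) < 1e-12  # null only if basis atoms fired
print(hits / trials)  # empirical estimate of Pr(r_x^N = 0); positive per Corollary 1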

References

[1] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries”, IEEE Trans. on Signal Proc., vol. 41, no. 12, pp. 3397-3415, 1993.
[2] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, USA, 1998.
[3] V. K. Goyal, M. Vetterli, and N. T. Thao, “Quantized overcomplete expansions in R^N: Analysis, synthesis, and algorithms”, IEEE Trans. on Information Theory, vol. 44, no. 1, pp. 16-31, 1998.
[4] G. Davis, Adaptive Nonlinear Approximations, Ph.D. Thesis, New York University, 1994.
[5] L. Lovisolo, M. A. M. Rodrigues, E. A. B. da Silva, and P. S. R. Diniz, “Efficient coherent decompositions of power systems signals using damped sinusoids”, IEEE Trans. on Signal Proc., vol. 53, pp. 3831-3846, 2005.
[6] H. Krim, D. Tucker, S. Mallat, and D. Donoho, “On denoising and best signal representations”, IEEE Trans. on Information Theory, vol. 45, no. 7, pp. 2225-2238, 1999.
[7] P. Vera-Candeas, N. Ruiz-Reyes, M. Rosa-Zurera, D. Martinez-Munoz, and F. Lopez-Ferreras, “Transient modeling by matching pursuits with a wavelet dictionary for parametric audio coding”, IEEE Signal Proc. Let., vol. 11, pp. 349-352, 2004.
[8] R. Gribonval and E. Bacry, “Harmonic decomposition of audio signals with matching pursuit”, IEEE Trans. on Signal Proc., vol. 51, pp. 101-111, 2003.
[9] R. Heusdens, R. Vafin and W. B. Kleijn, “Sinusoidal modeling using psychoacoustic-adaptive matching pursuits”, IEEE Signal Proc. Let., vol. 9, pp. 262-265, 2002.
[10] M. M. Goodwin and M. Vetterli, “Matching pursuits and atomic signal models based on recursive filter banks”, IEEE Trans. on Signal Proc., vol. 47, no. 7, pp. 1890-1902, 1999.
[11] S. Jaggi, W. C. Carl, S. Mallat and A. S. Willsky, “High resolution pursuit for feature extraction”, Appl. Comput. Harmon. Anal., vol. 5, no. 4, pp. 428-449, 1998.
[12] M. M. Goodwin, Adaptive Signal Models: Theory, Algorithms, and Audio Applications, Kluwer, Boston, USA, 1998.
[13] D. L. Donoho, M. Vetterli, R. A. DeVore and I. Daubechies, “Data compression and harmonic analysis”, IEEE Trans. on Inf. Theory, vol. 44, no. 6, pp. 2435-2476, 1998.
[14] K. Engan, S. O. Aase and J. H. Husoy, “Multi-frame compression: Theory and design”, Elsevier Signal Proc., vol. 80, pp. 2121-2140, 2000.
[15] R. Neff and A. Zakhor, “Very low bit-rate video coding based on matching pursuits”, IEEE Trans. on Circ. and Syst. for Video Tech., vol. 7, pp. 158-171, 1997.
[16] O. K. Al-Shaykh, E. Miloslavsky, T. Nomura, R. Neff and A. Zakhor, “Video compression using matching pursuits”, IEEE Trans. on Circ. and Syst. for Video Tech., vol. 9, pp. 123-143, 1999.
[17] R. Caetano, E. A. B. da Silva and A. G. Ciancio, “Matching pursuits video coding using generalized bit-planes”, in IEEE Inter. Conf. on Image Proc., Rochester, NY, USA, 2002.
[18] P. Frossard, P. Vandergheynst, R. M. Figueras i Ventura and M. Kunt, “A posteriori quantization of progressive matching pursuit streams”, IEEE Trans. on Signal Proc., vol. 52, no. 2, pp. 525-535, 2004.
[19] C. De Vleeschouwer and B. Macq, “SNR scalability based on matching pursuits”, IEEE Trans. on Multimedia, vol. 2, no. 4, 2000.
[20] L. Lovisolo, M. P. Tcheou, E. A. B. da Silva, M. A. M. Rodrigues and P. S. R. Diniz, “Modeling of electric disturbance signals using damped sinusoids via atomic decompositions and its applications”, EURASIP Journal on Applied Signal Processing, Hindawi, 2007.
[21] R. Neff and A. Zakhor, “Modulus quantization for matching-pursuit video coding”, IEEE Trans. on Circ. and Syst. for Video Tech., vol. 10, pp. 895-912, 2000.
[22] C. De Vleeschouwer and A. Zakhor, “In-loop atom modulus quantization for matching pursuit and its application to video coding”, IEEE Trans. on Image Proc., vol. 12, pp. 1226-1242, 2003.
[23] M. Gharavi-Alkhansari, “A model for entropy coding in matching pursuit”, in IEEE Inter. Conf. on Image Proc., 1998.
[24] E. A. B. da Silva, D. G. Sampson and M. Ghanbari, “A successive approximation vector quantizer for wavelet transform image coding”, IEEE Trans. on Image Proc., vol. 5, Feb. 1996.
[25] P. Frossard and P. Vandergheynst, “Redundancy in non-orthogonal transforms”, in IEEE International Symposium on Information Theory, June 2001.
[26] Z. Zhang, Matching Pursuits, Ph.D. Dissertation, New York University, 1993.
[27] L. Lovisolo and E. A. B. da Silva, “Uniform distributions of points on a hyper-sphere with applications to vector bit-plane encoding”, IEE Proc. on Vision, Image and Signal Proc., vol. 148, pp. 187-193, 2001.
[28] J. Hamkins and K. Zeger, “Gaussian source coding with spherical codes”, IEEE Trans. on Inf. Theory, vol. 48, no. 11, pp. 2980-2989, 2002.
[29] G. Davis, S. Mallat and M. Avellaneda, “Adaptive greedy approximations”, Journal of Constructive Approx., vol. 13, pp. 57-98, 1997.
[30] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, USA, 1991.
[31] S. E. Ferrando, L. A. Kolasa and N. Kovacevic, “Algorithm 820: A flexible implementation of matching pursuit for Gabor functions on the interval”, ACM Trans. on Math. Software, vol. 28, pp. 337-353, 2002.
[32] R. Gribonval and M. Nielsen, “Sparse representations in unions of bases”, IEEE Trans. on Information Theory, 2003.
[33] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer-Verlag, New York, USA, 2nd ed., 1993.
[34] E. A. B. da Silva, Wavelet Transforms for Image Coding, Ph.D. Thesis, University of Essex, Essex, England, 1995.
[35] S. Kullback, “The Kullback-Leibler distance”, The American Statistician, vol. 41, pp. 340-341, 1987.
[36] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, USA, 1992.
[37] A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1989.
