
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 10, OCTOBER 2010 5057

Blind Separation of Gaussian Sources With General Covariance Structures:

Bounds and Optimal Estimation

Arie Yeredor, Senior Member, IEEE

Abstract—We consider the separation of Gaussian sources exhibiting general, arbitrary (not necessarily stationary) covariance structures. First, assuming a semi-blind scenario, in which the sources’ covariance structures are known, we derive the maximum likelihood estimate of the separation matrix, as well as the induced Cramér–Rao lower bound (iCRLB) on the attainable Interference to Source Ratio (ISR). We then extend our results to the fully blind scenario, in which the covariance structures are unknown. We show that (under a scaling convention) the Fisher information matrix in this case is block-diagonal, implying that the same iCRLB (as in the semi-blind scenario) applies in this case as well. Subsequently, we demonstrate that the same “semi-blind” optimal performance can be approached asymptotically in the “fully blind” scenario if the sources are sufficiently ergodic, or if multiple snapshots are available.

Index Terms—Blind source separation, independent component analysis, nonstationarity, second-order statistics, time-varying AR processes.

I. INTRODUCTION

BLIND source separation (BSS) consists of recovering unobserved source signals from their observed mixtures. In the classical BSS setup the mixture is assumed to be real-valued, linear, static, square-invertible and noiseless, and can be expressed using the following matrix notation:

X = A·S   (1)

where S is a K×N matrix containing the K unobserved source signals (each of length N) as its rows; A is the unknown K×K mixing matrix (assumed to be nonsingular); and X is the K×N matrix of observed mixture signals. The sources (and therefore the observations) are all assumed to have zero mean. We shall denote the demixing matrix as B = A^{-1}.

The term “blind” usually implies that the mixing matrix A is completely unknown and that the only available information regarding the sources is their mutual statistical independence (giving rise to the term independent component analysis (ICA)

Manuscript received October 20, 2009; accepted May 31, 2010. Date of publication June 21, 2010; date of current version September 15, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ljubisa Stankovic.

The author is with the School of Electrical Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2010.2053362

in this context). In a “semi-blind” scenario, more a priori structural or statistical information about the mixing matrix and/or about the sources might be available. In the context of this work we shall use the term “semi-blind” to refer to cases in which the additional information pertains only to statistical properties of the individual sources.

Generally, second-order statistics (SOS) are insufficient for solving the BSS problem (i.e., for obtaining consistent estimates of the sources), both in the blind and in our semi-blind scenario. For example, when each of the sources has an independent, identically distributed (i.i.d.) time-structure, even perfect knowledge of the observations’ SOS can only be used for spatial whitening, leaving an unknown residual (orthogonal) mixing, which can only be resolved using some additional statistics, such as higher-order statistics (HOS). Consequently, it is well-known that Gaussian sources with i.i.d. time-structures (or with otherwise similar temporal covariance structures) cannot be separated, since the mixtures’ HOS are invariant with respect to (w.r.t.) the mixing (as they all vanish).
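To make the whitening ambiguity concrete, the following sketch (not from the paper; the 2×2 mixing matrix and rotation angle are arbitrary, hypothetical values) verifies that after SOS-based spatial whitening of i.i.d. Gaussian mixtures, any residual orthogonal rotation leaves the observations’ SOS unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 2, 100_000
S = rng.standard_normal((K, N))          # i.i.d. Gaussian sources (rows)
A = np.array([[1.0, 0.6], [0.3, 1.0]])   # hypothetical mixing matrix
X = A @ S

# Spatial whitening based on the observed SOS alone:
Cx = X @ X.T / N
L = np.linalg.cholesky(Cx)
Z = np.linalg.solve(L, X)                # whitened: Z Z^T / N = I (up to float)

# Any orthogonal rotation of Z is equally white, so SOS cannot
# distinguish Z from Q Z: a residual orthogonal mixing remains.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Cz  = Z @ Z.T / N
Cqz = (Q @ Z) @ (Q @ Z).T / N
print(np.allclose(Cz, np.eye(K), atol=1e-2),
      np.allclose(Cqz, np.eye(K), atol=1e-2))   # True True
```

Both empirical covariances are the identity, so no SOS-based criterion can prefer one rotation over another.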

Nevertheless, when the sources have different temporal covariance structures (e.g., when they are stationary with distinct spectra), consistent separation can rely exclusively on SOS. Consequently, Gaussian sources can be separated in such cases. Moreover, the ability to work within the confines of a Gaussian model (clearly a widely adopted framework in statistical signal processing, which is nonetheless inapplicable in classical ICA with i.i.d. sources) has several advantages in this context: It enables a tractable derivation of maximum likelihood (ML) separation, as well as of the induced Cramér–Rao lower bound (iCRLB) on the attainable Interference to Source Ratio (ISR).¹ The ML separation in this context is asymptotically efficient, attaining the iCRLB.

Prior work on ML separation and on bounds for Gaussian sources has been limited to a few, very important yet merely particular cases in terms of the sources’ temporal covariance structure. The most prominent is the case of stationary sources: The “quasi-ML” (QML) approach [18], proposed by Pham and Garat in 1997, is based on expressing the stationary observations’ joint probability density function (pdf) in the frequency domain, assuming arbitrary sources’ spectra. When the presumed spectra are the true spectra of the sources (e.g., in a “semi-blind” scenario), QML becomes ML (for Gaussian sources). The “exact ML” (EML) approach [8], proposed by Dégerine and Zaïdi in 2004, further assumes that the sources can be modeled as Gaussian auto-regressive (AR) processes, and is based on expressing the observations’ pdf in the time domain, maximizing w.r.t. the unknown mixing matrix and AR parameters simultaneously. Other approaches, which, although not based directly on ML, can be shown to coincide (asymptotically, in the respective Gaussian models) with QML and EML, are Pham’s Gaussian mutual information (GMI) approach [14], and the weights-adjusted second-order blind identification (WASOBI) approach for AR sources, proposed by Yeredor [26] and by Tichavský et al. [21], [23]. With their EML derivation in [8], Dégerine and Zaïdi further provide an expression for the CRLB on estimation of the demixing matrix’s elements. In [9] this result is translated into the induced bound (iCRLB) on the attainable ISR (for Gaussian AR, moving average (MA) and ARMA sources). The derivation of the QML approach in [18] also includes asymptotic performance analysis, which essentially yields an expression for the iCRLB for the case of Gaussian stationary processes with general spectra.

¹We use the term “induced”, since the CRLB does not address the ISR directly, but rather bounds the variance in unbiased estimation of the mixing (or demixing) matrix’s elements, which in turn implies an “induced” bound on the ISR—see further discussion in Section II.

1053-587X/$26.00 © 2010 IEEE

Another particular case (in terms of the sources’ temporal covariance structure), involving non-stationary structures, was considered by Pham and Cardoso in [17], where ML separation was applied to Gaussian block-i.i.d. sources. The block-i.i.d. model assumes that the observation interval can be divided into several blocks, such that each source has an i.i.d. time-structure within each block (with different, unknown variances). A slightly more general block-AR model (in which the sources have different, unknown AR time-structures within blocks) was considered by Tichavský et al. in [24], where the iCRLB for this model was provided together with a separation scheme which is asymptotically equivalent to ML.

Wide and general as the stationary model may be, in terms of the sources’ temporal covariance matrices it is still merely a particular case, in which these matrices have a special (Toeplitz) structure. Naturally, so are the block-i.i.d. (block-diagonal with constant-diagonal blocks) and block-AR (block-diagonal with Toeplitz blocks) models. Evidently, in many cases of interest some or all of the sources may exhibit far richer and more diverse nonstationary structures, e.g., time-varying AR (TVAR) patterns, cyclo-stationary patterns, transient patterns, or any other general (arbitrary) covariance structure. However, to the best of our knowledge, no general framework for ML separation (or for the associated bounds) of Gaussian sources with arbitrary covariance structures has been presented so far. It is our purpose in this paper to fill this gap.

Rather than consider additional particular cases, we shall assume that each source s_k has a general N×N covariance matrix C_k, k = 1, …, K, not necessarily structured or constrained in any way. In the first part of the paper (in Section III) we shall assume a “semi-blind” scenario, in which all C_1, …, C_K are known, deriving the ML estimate of the demixing (or mixing) matrix, as well as the iCRLB for this case. Then, in the second part (in Section IV), we shall relax this (often unrealistic) assumption, and consider the “fully blind” scenario, in which these matrices are unknown. We shall show that under some scaling constraints, the Fisher information matrix (FIM) in this scenario is block-diagonal with separate blocks corresponding to the mixing parameters and to the unknown covariance parameters, implying that the same iCRLB from the “semi-blind” scenario is also applicable in the blind scenario. Accordingly, we shall also demonstrate (by simulation, in Section V) that with some models, the unknown covariance matrices can be consistently estimated from the observed mixtures (e.g., when the sources are sufficiently ergodic, or when several snapshots are available), leading to asymptotically efficient ML separation (attaining the iCRLB) in the fully blind scenario as well. We begin (in the next section) by outlining some general properties of the iCRLB, which will be useful in the subsequent derivations.

II. THE INDUCED CRAMÉR–RAO BOUND ON THE ISR

The CRLB is a well-known lower-bound on the mean-square estimation error (MSE) of any unbiased estimate of a (deterministic) parameters vector. In the context of BSS, the unknown parameters vector consists of elements of the mixing (or demixing) matrix, and, in a fully blind scenario, also of parameters related to the unknown sources’ distributions.² The estimated demixing matrix (or the inverse of the estimated mixing matrix) is applied to the observed data in order to recover the sources. Any errors in the estimation of the mixing or demixing matrix would be reflected in some residual mixing in the recovered sources.

Denoting the estimated demixing matrix (based on the observed signal-matrix X) as B̂ = B̂(X), we define T ≜ B̂·A as the overall mixing-unmixing matrix. The resulting ISR can then be described by a matrix whose (k,l)th element is given by

ISR_kl ≜ E[T_kl²]·(ε_l/ε_k),  where ε_k ≜ E[‖s_k‖²]   (2)

(where T_kl denotes the (k,l)th element of T), such that this value represents the relative residual energy of the lth source in the reconstruction of the kth source.

Noting that T is a linear function of B̂, it is evident that any lower-bound on the MSE in the estimation of B induces an easily-obtained element-wise lower-bound on the attainable ISR matrix. Thus, the CRLB on the MSE in unbiased estimation of B induces a bound on the ISR, which we term the “induced CRLB” (iCRLB). The iCRLB is a linear transformation of the CRLB on B. A basic property of the iCRLB, which is instrumental in facilitating our subsequent derivations, is its equivariance.
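As an illustrative sketch (not part of the paper’s derivation), a single-trial empirical analogue of the ISR matrix can be computed directly from the overall mixing-unmixing matrix T = B̂A; the mixing matrix and error perturbation below are hypothetical values, and the normalization by T_kk² reflects an assumed unit-power scaling convention:

```python
import numpy as np

def isr_matrix(B_hat, A):
    """Single-trial analogue of the ISR matrix: entry (k,l) is the residual
    energy of source l in reconstruction k, relative to the (rescaled)
    target source.  Assumes unit-power sources (a scaling convention)."""
    T = B_hat @ A
    return T ** 2 / np.diag(T)[:, None] ** 2   # normalize row k by T_kk^2

A = np.array([[1.0, 0.5],
              [0.2, 1.0]])                        # hypothetical mixing matrix
err = 0.01 * np.array([[0.0, 1.0], [-1.0, 0.0]])  # small estimation error
B_hat = np.linalg.inv(A) + err
print(isr_matrix(B_hat, A))   # off-diagonal entries ~1e-4, diagonal exactly 1
```

The off-diagonal entries scale with the squared estimation error, which is exactly the quantity the iCRLB bounds from below.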

The property of equivariance of ICA algorithms has been long observed and advocated by Cardoso and others (e.g., [5]–[7], [13]): Evidently, any estimator B̂(X) of B satisfying B̂(A·S) = B̂(S)·A^{-1} for all nonsingular A (and all S) would result in

T = B̂(X)·A = B̂(S)·A^{-1}·A = B̂(S)   (3)

which is independent of (or “equivariant in”) the true value of the mixing-matrix A, and depends only on the sources’ realization S. Hence, for such estimators, which we term “ISR-equivariant estimators,” the resulting ISR would depend only on the statistics of the sources matrix S, but not on A. The equivariance property is shared by estimators obtained by many (but certainly not by all) classical ICA algorithms.

It is therefore intuitively plausible (yet not obvious in general) that the iCRLB should exhibit a similar invariance property. In the Appendix, we prove the invariance of the iCRLB (for both the semi-blind and fully blind scenarios) by exploiting the asymptotic efficiency of the ML estimate, combined with its own equivariance.

²Possibly involving continuous distributions in a “semi-parametric” framework—see, e.g., [3].

This property is not only conceptually, but also practically significant: It allows one to obtain the iCRLB by calculating the CRLB for a specific, conveniently chosen value of the mixing-matrix—knowing that the same ISR bound would be obtained with any (nonsingular) mixing matrix. It is important to realize here, that only the iCRLB—but not the CRLB itself—is invariant in A. Therefore, the ability to use the calculation of the CRLB for just one particular choice of A in order to obtain the general iCRLB (for all A) is of considerable advantage.

We now turn to the explicit derivation of the iCRLB and MLestimate, concentrating on the semi-blind case first.

III. THE SEMI-BLIND CASE

Let us use a somewhat uncommon formulation to rewrite the mixing model (1) as

x = (A ⊗ I)·s   (4)

where s (respectively, x) denotes a KN×1 vector, formed by the concatenation of all K source (respectively, observation) signals (each of length N),

s = [s_1^T s_2^T ⋯ s_K^T]^T,   x = [x_1^T x_2^T ⋯ x_K^T]^T   (5)

I denotes the N×N identity matrix, and ⊗ denotes Kronecker’s product

A ⊗ I = [ a_11·I  a_12·I  ⋯  a_1K·I
          a_21·I  a_22·I  ⋯  a_2K·I
            ⋮       ⋮     ⋱    ⋮
          a_K1·I  a_K2·I  ⋯  a_KK·I ]   (6)

with a_kl denoting the (k,l)th element of A. According to the model assumptions, s, and therefore also x = (A ⊗ I)·s, are zero-mean random vectors. The KN×KN covariance matrix of s is a block-diagonal matrix composed of C_1, …, C_K as its blocks

C_s ≜ E[s·s^T] = [ C_1  0   ⋯  0
                   0   C_2  ⋯  0
                   ⋮    ⋮   ⋱  ⋮
                   0    0   ⋯  C_K ]   (7)

The covariance matrix of x is therefore given by

C_x ≜ E[x·x^T] = (A ⊗ I)·C_s·(A^T ⊗ I)   (8)

(where A^T ⊗ I is shorthand for (A^T) ⊗ I = (A ⊗ I)^T), and its inverse is given by

C_x^{-1} = (B^T ⊗ I)·C_s^{-1}·(B ⊗ I).   (9)

Assuming, in addition, that all sources are Gaussian and mutually statistically independent, it follows that the observations vector x is also a (zero-mean) Gaussian vector, x ~ N(0, C_x).
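The construction in (4)–(9) is easy to verify numerically; the sketch below uses arbitrary placeholder covariances and a hypothetical mixing matrix, and checks that the Kronecker-structured inverse in (9) indeed inverts (8) without ever inverting C_x directly:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 3, 4

# Arbitrary symmetric positive-definite covariance C_k for each source
Cs_blocks = []
for _ in range(K):
    M = rng.standard_normal((N, N))
    Cs_blocks.append(M @ M.T + N * np.eye(N))

# Block-diagonal C_s as in (7)
Cs = np.zeros((K * N, K * N))
for k, Ck in enumerate(Cs_blocks):
    Cs[k * N:(k + 1) * N, k * N:(k + 1) * N] = Ck

A = rng.standard_normal((K, K)) + 3 * np.eye(K)  # nonsingular mixing matrix
AI = np.kron(A, np.eye(N))                       # A ⊗ I, as in (6)
Cx = AI @ Cs @ AI.T                              # eq. (8)

# eq. (9): inverse via B = A^{-1} and the (cheap) block-diagonal C_s^{-1}
B = np.linalg.inv(A)
BI = np.kron(B, np.eye(N))
Cx_inv = BI.T @ np.linalg.inv(Cs) @ BI           # (B^T ⊗ I) C_s^{-1} (B ⊗ I)
print(np.allclose(Cx @ Cx_inv, np.eye(K * N)))   # True
```

The practical point of (9) is that only the small K×K matrix A and the individual N×N blocks C_k ever need to be inverted, never the full KN×KN matrix C_x.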

We shall now derive the CRLB on unbiased estimation of A, assuming, in accordance with the semi-blind scenario, that all C_k (k = 1, …, K) are known. This bound will in turn be used for obtaining the iCRLB for this problem. In addition, we shall derive the ML estimate of B.

A. The CRLB and iCRLB

Since all the distribution-related parameters (the C_k matrices) are known, the vector of unknown parameters consists only of the K² elements of the unknown mixing matrix A. Due to the zero-mean Gaussian distribution of x, the respective elements of the FIM for the (k,l)th and (m,r)th elements of the mixing matrix are well-known to be given by (see, e.g., [11])

F_{(k,l),(m,r)} = (1/2)·tr( C_x^{-1}·(∂C_x/∂a_kl)·C_x^{-1}·(∂C_x/∂a_mr) ) = −(1/2)·tr( (∂C_x^{-1}/∂a_kl)·(∂C_x/∂a_mr) )   (10)

where the second expression follows by substituting the identity

∂C^{-1}/∂θ = −C^{-1}·(∂C/∂θ)·C^{-1}   (11)

(which holds for any invertible matrix C which depends on a parameter θ), and by using C·C^{-1} = I. Let e_k denote the kth column of the K×K identity matrix, and let E_kl ≜ e_k·e_l^T denote the K×K matrix which is all-zeros except for a 1 at the (k,l)th element. Evidently ∂A/∂a_kl = E_kl. We therefore have from (9), using the relation ∂B/∂a_kl = −B·E_kl·B, and exploiting the block-diagonality of C_s^{-1},

∂C_x^{-1}/∂a_kl = −(B^T·E_lk·B^T ⊗ I)·C_s^{-1}·(B ⊗ I) − (B^T ⊗ I)·C_s^{-1}·(B·E_kl·B ⊗ I).   (12)

Multiplying on the right with C_x, we get

C_x^{-1}·(∂C_x/∂a_kl) = −(∂C_x^{-1}/∂a_kl)·C_x = (B^T·E_lk ⊗ I) + (B^T ⊗ I)·C_s^{-1}·(B·E_kl ⊗ I)·C_s·(A^T ⊗ I).   (13)

Recall now, that due to the invariance of the iCRLB w.r.t. A, we may obtain the iCRLB from the derivation of the CRLB with any A (or B). For convenience, we now choose to proceed with A = B = I (the identity matrix). Substituting into (13), we get

C_x^{-1}·(∂C_x/∂a_kl) = (E_lk ⊗ I) + C_s^{-1}·(E_kl ⊗ I)·C_s   (14)


where we have used the relation C_x = C_s (which holds for A = I). Substituting (14) into the FIM (for (k,l) and for (m,r)), we get

F_{(k,l),(m,r)} = (1/2)·tr( [ (E_lk ⊗ I) + C_s^{-1}·(E_kl ⊗ I)·C_s ]·[ (E_rm ⊗ I) + C_s^{-1}·(E_mr ⊗ I)·C_s ] )   (15)

Substituting the general relation

(E_kl ⊗ I)·C_s·(E_mr ⊗ I) = δ_lm·(E_kr ⊗ C_l)   (16)

(where δ_lm denotes Kronecker’s delta, which equals 1 if l = m and 0 if l ≠ m), we get

F_{(k,l),(m,r)} = δ_km·δ_lr·tr(C_k^{-1}·C_l) + N·δ_kr·δ_lm.   (17)

Put differently, this expression reads

F_{(k,l),(m,r)} = { tr(C_k^{-1}·C_l)   (m,r) = (k,l), k ≠ l
                    N                 (m,r) = (l,k), k ≠ l
                    2N                k = l = m = r
                    0                 elsewhere }   (18)

which means that the FIM is essentially block-diagonal, with K(K−1)/2 blocks (each 2×2) accounting for element-couples of the form (a_kl, a_lk) for k ≠ l; and K additional diagonal elements accounting for all a_kk terms. Consequently, the corresponding CRLB matrix is a similarly-structured, essentially block-diagonal matrix, with the respective 2×2 inverses substituting the 2×2 blocks, and 1/(2N) substituting the diagonal elements. Thus, defining

φ_kl ≜ tr( C_k^{-1}·C_l )   (19)

we may express the CRLB on unbiased estimation of elements of A when A = I as follows:

CRLB( [â_kl, â_lk] ) = 1/(φ_kl·φ_lk − N²)·[ φ_lk  −N
                                            −N   φ_kl ]   (20)

for k ≠ l (with differently-indexed cross-terms being zero). In particular, this means that the estimation variances of all â_kl (with k ≠ l) are bounded by

Var(â_kl) ≥ φ_lk / (φ_kl·φ_lk − N²).   (21)

In order to obtain the iCRLB, we may now use the relation

T = B̂·A = Â^{-1}·A = (A + Δ)^{-1}·A   (22)

where Δ ≜ Â − A is the estimation error in Â. In particular, when A = I (and Â = I + Δ) we have T ≈ I − Δ, where the variances of the elements of Δ are bounded by the respective CRLB, as expressed in (20) above. Recalling the invariance property of the iCRLB, and combining the estimation bound (21) with the ISR expression (2), we get (substituting E[T_kl²] = E[Δ_kl²] and ε_k = tr(C_k))

ISR_kl ≥ ( tr(C_l)/tr(C_k) )·φ_lk / (φ_kl·φ_lk − N²)   (23)

with φ_kl defined in (19). It is relatively straightforward (and reassuring) to show, that in the particular case of stationary sources, this bound coincides with asymptotic bounds developed previously under the stationarity assumption. These are, for example, the performance analysis of QML (with optimal filters) in [18], or the bound developed in [9] for the case of parametric (AR, MA, ARMA) Gaussian sources.

Some key properties of our iCRLB are summarized below.
• Invariance w.r.t. A: As already mentioned, the iCRLB does not depend on the mixing matrix A, but only on the sources’ covariance matrices.
• Invariance w.r.t. other sources: The bound on ISR_kl depends only on the covariance matrices of sources k and l, and is unaffected by the other sources.
• Invariance w.r.t. scale: The ISR bound is invariant to any scaling of the sources. Note that this property is not shared by the bound on the variance of elements of Â: If, for example, source l is amplified by a certain factor, the bounds on the variances of all of the resulting â_kl (k = 1, …, K, k ≠ l) would be reduced by the square of that factor; However, the bounds on all ISR_kl would remain unchanged.
• Non-identifiability condition: If sources k and l have similar covariance matrices (i.e., C_l is a scaled version of C_k), then φ_kl·φ_lk = N², implying (in (23)) an infinite bound on ISR_kl and on ISR_lk—which in turn implies non-identifiability of the respective elements of T. Recall that this bound was developed for Gaussian sources, and therefore this is a strict identifiability condition in the Gaussian case. In the case of non-Gaussian sources this identifiability condition can be easily shown to still be applicable to estimators based exclusively on SOS; However, when this condition is breached, mixtures of non-Gaussian sources may still be identifiable using HOS.
• Resemblance to other bounds: The general form of (21) is also shared by similar bounds developed for the case of sources with i.i.d. temporal structures (e.g., by Tichavský et al. in [22], or by Ollila et al. in [12]), with φ_lk replaced by a quantity which depends only on the probability distribution function of the lth source.
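As a numerical sketch of these properties (using the φ_kl = tr(C_k^{-1}·C_l) notation of (19) and hypothetical unit-variance stationary AR(1) covariances, which are merely one convenient special case of the general model), the bound’s blow-up as two sources’ covariance structures approach each other can be observed directly:

```python
import numpy as np

def ar1_cov(rho, N):
    """N x N (Toeplitz) covariance of a unit-variance stationary AR(1) process."""
    n = np.arange(N)
    return rho ** np.abs(np.subtract.outer(n, n))

def icrlb_isr(Ck, Cl):
    """Sketch of the ISR bound: with phi_kl = tr(C_k^{-1} C_l),
    ISR_kl >= (tr C_l / tr C_k) * phi_lk / (phi_kl * phi_lk - N^2)."""
    N = Ck.shape[0]
    phi_kl = np.trace(np.linalg.solve(Ck, Cl))
    phi_lk = np.trace(np.linalg.solve(Cl, Ck))
    return (np.trace(Cl) / np.trace(Ck)) * phi_lk / (phi_kl * phi_lk - N ** 2)

N = 500
C1 = ar1_cov(0.9, N)
for rho2 in (0.5, 0.8, 0.89):
    print(rho2, icrlb_isr(C1, ar1_cov(rho2, N)))
# the bound grows without limit as rho2 -> 0.9 (non-identifiability)
```

Note that φ_kl·φ_lk ≥ N² for any pair of positive-definite covariances (with equality only when C_l is proportional to C_k), so the denominator is nonnegative and vanishes exactly at the non-identifiability condition.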

We now turn to the derivation of the ML estimate.

B. The ML Estimate of B

Recall from (4) that the vector x of concatenated mixture signals is a zero-mean Gaussian vector, x ~ N(0, C_x), with C_x given in (8). It therefore follows that the log-likelihood is given by

ℓ(A; x) = c − N·log|det(A)| − (1/2)·x^T·C_x^{-1}·x = c − N·log|det(A)| − (1/2)·Σ_{k=1}^{K} b_k·X·C_k^{-1}·X^T·b_k^T   (24)

where c and the omitted additive terms are constants which do not depend on A. Differentiating w.r.t. B we get (using the relation ∂log|det(B)|/∂B = B^{-T} = A^T):

∂ℓ/∂b_k = N·a_k^T − b_k·X·C_k^{-1}·X^T,  k = 1, …, K   (25)

where b_k denotes the kth row of B, namely B = [b_1^T ⋯ b_K^T]^T, and a_k denotes the kth column of A. Exploiting the relations

∂log|det(B)|/∂b_k = a_k^T,   ∂(b_k·M·b_k^T)/∂b_k = 2·b_k·M (for symmetric M)   (26)

as well as the symmetry of C_k^{-1}, we end up with the likelihood equations (in which Â and B̂ denote the ML estimates of A and B (respectively), naturally satisfying B̂ = Â^{-1}) of the form

(1/N)·b̂_k·X·C_k^{-1}·X^T = â_k^T,  k = 1, …, K.   (27)

Defining the matrices

Q_k ≜ (1/N)·X·C_k^{-1}·X^T,  k = 1, …, K   (28)

we observe that the likelihood-equations (27) take the form

b̂_k·Q_k = â_k^T   (29)

where b̂_k is the kth row of B̂ and where â_k denotes the kth column of Â. This implies (by the symmetry of Q_k)

Q_k·b̂_k^T = â_k   (30)

where â_k = Â·e_k, namely Â = [â_1 ⋯ â_K]. Exploiting the relation B̂·Â = I, which implies that B̂·â_k = e_k, we obtain the condition B̂·Q_k·b̂_k^T = e_k, which can also be cast as

( B̂·Q_k·B̂^T )·e_k = e_k,  k = 1, …, K.   (31)

Note that this condition has an interesting interpretation, which is closely related to the concept of (non-orthogonal) approximate joint diagonalization (AJD), a well-studied subject in recent years (e.g., in [2], [15], [19], [23], [25], and [27], to name just a few). In the AJD framework, a set of M symmetric K×K “target-matrices” Q_1, …, Q_M is given, and the problem is to find a matrix B such that the transformed matrices B·Q_m·B^T (for m = 1, …, M) are “as diagonal as possible.” Different criteria have been proposed for measuring the diagonality of the transformed set, leading to various optimization algorithms for finding B. In our case the set of “target matrices” consists of exactly M = K matrices Q_1, …, Q_K, and the “diagonality criterion” reflects a hybrid “exact-approximate” diagonality requirement: The kth transformed matrix B·Q_k·B^T is required to be exactly diagonal w.r.t. its kth column and row, meaning that this column and this row (only) must be all-zeros (exactly) except for the (k,k)th (diagonal) element, which should be 1; All other values in B·Q_k·B^T are irrelevant. This should be satisfied for all the K matrices, namely for k = 1, …, K.

Such a “hybrid exact-approximate diagonalization” problem (which we term “HEAD”) has already been encountered by (at least) Pham and Garat in [18] and Dégerine and Zaïdi in [8], in the context of (Q)ML separation of stationary sources (which is merely a particular case of our more general framework). The HEAD problem is discussed in detail in [28], where several iterative solution algorithms are outlined. For completeness of the exposition, we mention here the simple, yet rather effective algorithm by Dégerine and Zaïdi ([8]), which consists of alternating updates of the rows of B as follows. Given some initial guess of B, repeat the following “sweep” (for k = 1, …, K):

b_k ← v^T / ( v^T·Q_k·v )^{1/2},  with v spanning the (one-dimensional) null space of B̃_k·Q_k   (32)

where B̃_k denotes the matrix B without its kth row (a (K−1)×K matrix). When such “sweeps” are repeated iteratively, convergence to a solution of (31) is attained within several iterations (see [18] or [28] for more details).

To summarize, the ML estimate of B in the semi-blind case, in which the sources’ covariance matrices C_1, …, C_K are known, is obtained as follows:

1) for k = 1, …, K, compute the “target-matrices” Q_k = (1/N)·X·C_k^{-1}·X^T;
2) find the matrix B̂, such that the transformed matrices B̂·Q_k·B̂^T each satisfies (31) (namely its kth column is e_k), e.g., using (32).

IV. THE FULLY-BLIND CASE

So far, we assumed that the sources’ covariance matrices C_k are all known. Naturally, in practice this is rarely a realistic assumption. In this section we consider the “fully blind” scenario, in which these matrices are not known in advance. We shall assume a general model, reflecting the full range of possible blindness, as follows. Assume that each source’s covariance matrix C_k depends on some unknown parameters vector θ_k. Typically (but not necessarily), each covariance matrix would be known only up to its respective parameters vector θ_k, such that θ = [θ_1^T ⋯ θ_K^T]^T is the concatenation of all these smaller vectors. The dimension of each θ_k may range anywhere between 1 (e.g., when C_k is fully known up to scale, in which case θ_k simply contains the unknown scale factor) and N(N+1)/2, in which case θ_k may contain all the free unknown elements of the (symmetric) matrix C_k.

Define a vector containing the entire set of unknown (mixing/demixing and covariances) parameters, ψ ≜ [a^T θ^T]^T, where a ≜ vec(A). We shall now show that under some scaling assumption (to be specified shortly), the FIM (w.r.t. ψ) is block-diagonal, with different blocks accounting for a and for θ. It is important to note here, that although the iCRLB is invariant in A (also in the fully blind case), the FIM F(ψ) is certainly not invariant in A, and therefore it would not be sufficient to establish block-diagonality of F(ψ) for A = I only. We need to show that F(ψ) is block-diagonal for all (nonsingular) A. Let θ_j denote an arbitrary element of θ. Recall that the general expression for the cross-blocks elements of the FIM is given by

F_{a_kl, θ_j} = (1/2)·tr( C_x^{-1}·(∂C_x/∂a_kl)·C_x^{-1}·Ċ_x ) = −(1/2)·tr( (∂C_x^{-1}/∂a_kl)·Ċ_x )   (33)

where we have used the identity (11), and where

Ċ_x ≜ ∂C_x/∂θ_j = (A ⊗ I)·Ċ_s·(A^T ⊗ I)   (34)

(Ċ_s being used as shorthand for ∂C_s/∂θ_j).

( being used as shorthand for ).Exploiting the structural similarity between and , we

can substitute with in (13) in order to obtain

(35)

Therefore,

(36)

Due to the symmetry of and of , these two traces areequal, so

(37)

Like , is block-diagonal, so only the respective diagonalblocks of are relevant for the trace. These aredetermined by the diagonal terms of . Theonly nonzero element along the principal diagonal of this matrix

can be its th element, which equals . The respectiveth block of is , and therefore

(38)

Noting that is simply the derivative ofw.r.t. , we conclude that the FIM is block-di-

agonal if for each , the determinant doesnot depend on any of the unknown parameters in (but, ofcourse, itself may certainly depend on some or on all of theparameters in ).

A common example of a case where depends on a param-eters vector but its determinant does not is the following (forconvenience, we drop the index in this example). Consider asource signal whose elements satisfy thedifference equations

(39)

where the “driving noise” is a zero-mean white processwith fixed (known) variance (and employing the conventions

for ). The vector of unknown relevant3

parameters consists, in the most general case, ofwith (40),

(40)

with and denoting, respectively, the AR and MA orders ofthe process. These difference equations define a nonstationaryARMA process with possibly time-varying AR and MA coef-ficients (the unknown parameters). To see that does notdepend on in this case, note that the difference (39) can be ex-pressed in matrix-vector form as

(41)

where is the “driving-noise” vector, and and are lower-triangular banded matrices with 1-s along their principal diagonals. We thus have

, so

(42)

where . Since and are lower-triangular with a fixed all-ones diagonal, their determinants (as well as the determinants of their inverses) are 1-s, and therefore and does not depend on (note that such independence also holds with any other driving-noise covariance whose determinant is fixed, but the case is considered more “standard”). Naturally, one widely familiar particular case within this

framework is the case of constant regression coefficients, which occurs when and for all . Note,

³Note that and for and for , respectively, are irrelevant, since they are always multiplied by zeros.


YEREDOR: BLIND SEPARATION OF GAUSSIAN SOURCES WITH GENERAL COVARIANCE STRUCTURES 5063

however, that strictly speaking, this does not lead to a stationary signal , since the difference equations (39) entail the effect of zero initial conditions (in other words, in (42) is not a Toeplitz matrix in this case). Nevertheless, asymptotically the effect of initial conditions (or the deviation of from a Toeplitz structure) becomes negligible, and can be considered as a segment taken from a stationary ARMA process. It is interesting to observe that this is the reason why the CRLB expression obtained in [8] on estimation of the demixing matrix for stationary Gaussian AR sources is only asymptotic: when the sources are truly stationary, the FIM is not block-diagonal, since the determinants of all -s depend on the respective unknown AR parameters. Only asymptotically (as ) does this dependence become negligible, as the fully stationary model becomes essentially equivalent to our zero initial-conditions model above.
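The determinant argument above is easy to verify numerically: for an arbitrary time-varying ARMA parameterization, the determinant of the implied source covariance stays fixed at the determinant of the driving-noise covariance (here 1), however the coefficients are chosen. The sketch below, with invented sizes and random coefficients, builds the two banded unit-lower-triangular matrices and checks this.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50        # signal length (small, just for the check)
P, Q = 2, 1   # AR and MA orders (arbitrary choices for this sketch)

# Random time-varying AR/MA coefficients -- the "unknown parameters"
a = rng.normal(scale=0.3, size=(P, T))
b = rng.normal(scale=0.3, size=(Q, T))

# Unit-lower-triangular banded matrices encoding  A s = B v :
# s[t] + sum_l a[l,t] s[t-1-l] = v[t] + sum_m b[m,t] v[t-1-m]
A = np.eye(T)
B = np.eye(T)
for t in range(T):
    for l in range(min(P, t)):
        A[t, t - 1 - l] = a[l, t]
    for m in range(min(Q, t)):
        B[t, t - 1 - m] = b[m, t]

# Covariance of s = A^{-1} B v for white unit-variance driving noise v:
Ainv = np.linalg.inv(A)
Cs = Ainv @ B @ B.T @ Ainv.T

# det(Cs) = det(B)^2 / det(A)^2 = 1, whatever the coefficients are
sign, logdet = np.linalg.slogdet(Cs)
```

Since both triangular matrices have unit diagonals, `logdet` comes out (numerically) zero for any draw of the coefficients.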

We therefore conclude that, since we have shown the FIM to be block-diagonal, the CRLB on estimation of (or ) when the covariance matrices are known is the same as the CRLB obtained when these matrices are unknown, provided that their determinants are known. For the same to hold for the iCRLB, however, knowledge of these determinants is not required. To show this, we recall that the iCRLB (as opposed to the CRLB) is invariant in , and may therefore be calculated with any chosen (nonsingular) , with the result holding true for all other (nonsingular) -s.

Choosing again, the only remaining (nonzero) off-block-diagonal terms of the FIM in (38) in this case are terms related to . Recalling the general block-diagonal structure (18) of the “ -part” of , this means that when , the only element of whose estimation bound may be affected (increased) by lack of knowledge of is (according to (38)) .

Fortunately, still thanks to the block-diagonality of the “ -part” of in (18), this has no effect on the ISR bound (23). Note that this is a rather intuitive result, since uncertainty in implies uncertainty in the scale of the th source, which is well known to be unresolvable in BSS on the one hand, but to have no effect on the ISR on the other.

We therefore conclude that the iCRLB is indifferent to the knowledge of the sources’ covariance matrices (or of any related parameters such as their scales or determinants). In other words, the iCRLB expression (23), which was obtained for the semi-blind scenario, is also valid in the fully blind scenario.

Of course, this does not mean in general that the same ISR can be attained in the semi-blind and fully blind cases, but it does have an important implication for the asymptotic performance in a multiple-experiments scenario: Indeed, assume that multiple independent experiments (or “multiple snapshots”)

are available (for ), where in each experiment the sources are redrawn from the same distribution (with the same -s). For example, such a situation can occur when conducting a sequence of repeated experiments, such that each experiment is synchronized to some external trigger, causing each source to obey the same temporal-covariance pattern (e.g., some energy rise-time and/or fall-time, some triggered frequency-drift, etc.) in each experiment.

Then, due to the asymptotic efficiency of the ML estimate, the block-diagonality of implies that asymptotically (as ) the same ISR which can be attained in the semi-blind scenario (when all are known in advance) can also be attained in the fully blind scenario (when they are not known in advance), and both coincide with the iCRLB developed earlier.

Intuitively, the ability to match the semi-blind performance asymptotically in the fully blind scenario is based on the ability to effectively estimate the sources’ covariance matrices from the observed mixtures over the multiple experiments. In particular, an iterative process can be proposed as follows.

1) Apply some “standard” separation algorithm (e.g., second-order blind identification (SOBI) [4]) to the concatenated observations , denoting the estimated separation matrix as .

2) Extract the estimated source matrices , and denote the extracted signals as (namely ).
3) Estimate the covariance matrices, e.g., .
4) Using as substitutes for the true , apply the ML estimation scheme described at the end of Section III and obtain an updated estimate of the demixing matrix.
5) Go back to step 2 and repeat until convergence is attained.

While such a scheme is theoretically feasible, it would generally require a huge number of experiments , namely

, in order to obtain reliable estimates of the general covariance matrices. Fortunately, however, in many cases of interest, the sources may have a general, yet succinctly parameterized covariance structure, thereby significantly reducing the number of required experiments, so as to match the (small) number of free parameters. For example, with TVAR [1] sources, a very small number of experiments can be sufficient for reliable estimation of the TVAR parameters (which determine the sources’ covariances), attaining optimal performance (i.e., matching the iCRLB and the semi-blind performance). In fact, in some cases (e.g., stationary or cyclostationary sources), a single (sufficiently long) realization of each source signal is sufficient for obtaining a reliable estimate of its covariance. Therefore, the iterative scheme can be useful also in the single-experiment scenario, using suitable estimation schemes for estimating the parameters of the covariance matrices from the extracted sources. We demonstrate both kinds of scenarios in simulation results in the next section.
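The five steps above can be sketched numerically as follows. This is a stand-in, not a reproduction of the paper's algorithm: the sizes, envelopes, and two-source setting are invented for the demo, the initializer is a crude identity-covariance guess rather than SOBI, and the ML update is replaced by an exact joint diagonalization (via a generalized eigendecomposition, valid for two sources) of the weighted correlation matrices M_k = Σ_r X_r Ĉ_k⁻¹ X_rᵀ, whose expectation has the form A D_k Aᵀ with diagonal D_k.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, R = 2, 60, 200          # sources, samples, experiments (illustrative)

# Ground-truth source covariances: two distinct nonstationary envelopes
t = np.arange(T)
env = np.stack([np.exp(-np.abs(t - 15) / 8), np.exp(-np.abs(t - 45) / 8)])
C_true = [np.diag(e**2 + 0.05) for e in env]      # diagonal toy covariances

A = rng.normal(size=(K, K))                       # unknown mixing matrix
S = np.stack([[rng.multivariate_normal(np.zeros(T), C_true[k])
               for k in range(K)] for _ in range(R)])   # (R, K, T)
X = A @ S                                         # mixtures, (R, K, T)

def separate(X, C_hat):
    """ML-like step: exactly joint-diagonalize M_k = sum_r X_r C_k^{-1} X_r^T
    (for K = 2 this reduces to a generalized eigendecomposition)."""
    M = [sum(x @ np.linalg.solve(C, x.T) for x in X) for C in C_hat]
    _, V = np.linalg.eig(np.linalg.solve(M[1], M[0]))
    return np.real(V).T                           # rows ~ demixing directions

# Step 1: crude initializer (identity covariances instead of SOBI)
C_hat = [np.eye(T) for _ in range(K)]
for _ in range(3):                                # Steps 2-5: iterate
    B = separate(X, C_hat)                        # update demixing estimate
    Y = B @ X                                     # extracted sources, (R, K, T)
    C_hat = [(Y[:, k].T @ Y[:, k]) / R for k in range(K)]   # Step 3

G = B @ A     # global matrix: ~ scaled permutation if separation worked
```

With well-separated envelopes and enough experiments, the global mixing-demixing product B A approaches a scaled permutation after a few iterations.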

We note in passing that when the matrices are not constrained to have a special structure, an alternative approach for obtaining the multiple-experiments (blind) ML estimate of can be taken: in this case maximizing the likelihood is equivalent to minimizing the Kullback–Leibler divergence (KLD)⁴ between the empirical covariance matrix of the estimated sources (where ) and its block-diagonal version (consisting only of its diagonal blocks of size ). This can be attained using the iterative joint block-diagonalization algorithm proposed by Pham (in a different context) in [16] (which, in our case, would be applied only to a single matrix ).

V. SIMULATION RESULTS

To demonstrate the theoretical results and the performance of the proposed estimates in their respective contexts, we present

⁴We loosely refer to the KLD between two zero-mean Gaussian distributions as the KLD between their covariance matrices.



Fig. 1. Sample functions of the two source signals used in the first experiment, expressing both spectral and temporal localization diversities.

Fig. 2. ISR versus and : iCRLB and the empirical performance of SOBI, BGL and SOBGL.

simulation results of three different experiments. The first experiment addresses a semi-blind scenario, whereas the other two address (nearly) fully blind scenarios: one with a “single experiment” scenario and the other with a “multiple experiments” scenario. In all three experiments, each empirical result represents an average of 1000 independent trials, with the mixing matrix drawn at random (with independent zero-mean unit-variance Gaussian elements) in each trial.

A. Experiment A: MA Sources With a Laplacian Envelope

In the first experiment we use two non-stationary source signals generated as follows. First, we generate two stationary Gaussian MA signals, one with four zeros at and the other with four zeros at (and their reciprocals), where is a parameter controlling the spectral diversity between these signals: as moves from 0 to , the two spectra become more similar. Then a segment of samples from each of these signals is multiplied by a Laplacian-shaped window, , centered around for the first signal and around for the second signal, with variable width (measured to the 3 dB points of the Laplacian window). Thus, controls the temporal-location diversity of the signals: as increases, the localization decreases, so that when the signals are practically distinct in time, whereas when there’s nearly no temporal distinction.

An example of a sample function of the two signals for and is depicted in Fig. 1 (with the Laplacian windows in dashed lines): The first signal has a relatively high-frequency content and its energy is concentrated more in the first half of the segment, whereas the second signal has a relatively low-frequency content, and its energy is concentrated more in the second half.
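The source construction just described can be sketched as follows. Since the exact zero locations, segment length, and window parameters were lost in the text extraction, the values below (`r`, `theta`, `W`, `T`) are stand-ins, and the window-width normalization is merely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500             # segment length (the exact value was lost in extraction)
theta = np.pi / 3   # assumed spectral parameter: angular position of the zeros
W = 50              # assumed Laplacian window width

def ma_source(zero_angle, center, r=0.8):
    """Stationary Gaussian MA signal with four zeros -- a conjugate pair at
    r*exp(+/- j*zero_angle) and its reciprocal pair -- multiplied by a
    Laplacian-shaped window centered at `center`."""
    pair = [1.0, -2 * r * np.cos(zero_angle), r * r]
    recip = [1.0, -2 * np.cos(zero_angle) / r, 1 / r**2]
    h = np.convolve(pair, recip)              # 5-tap MA impulse response
    v = rng.standard_normal(T + len(h) - 1)   # white Gaussian driving noise
    s = np.convolve(v, h, mode="valid")       # stationary MA process, length T
    window = np.exp(-np.abs(np.arange(T) - center) * np.log(2) / W)
    return s * window

s1 = ma_source(theta, center=T // 4)          # higher-frequency, early energy
s2 = ma_source(theta / 4, center=3 * T // 4)  # lower-frequency, late energy
```

By construction the first signal's energy is concentrated in the first half of the segment and the second signal's in the second half, mirroring Fig. 1.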

Evidently, the SOBI algorithm [4] can be expected to capture and exploit the spectral diversity of the underlying signals, but cannot exploit their different temporal concentrations. Conversely, the BGL algorithm [17] (with just two blocks, each of size ) can exploit the different temporal concentrations but cannot exploit the spectral diversity. Now, recall that SOBI is based on joint diagonalization of correlation matrices at different lags, whereas BGL is based on joint diagonalization of zero-lag correlations taken over different blocks. A natural extension which comes to mind is to combine SOBI and BGL into what we would call “SOBGL” for short, which would be based on joint diagonalization of all these correlation matrices together, hopefully being able to exploit both sources of diversity. Obviously, however, this is just an ad-hoc combination, which is far from optimal, as we demonstrate in the simulation results in Fig. 2.

Fig. 2 shows the attained performance in terms of (in decibels [dB]). The left-hand plot shows the ISR as varies from to with fixed at 25, whereas the right-hand plot shows the ISR as varies from 25 to 200 with fixed at . We show the iCRLB, as well as the ISR attained by SOBI, BGL, SOBGL and our ML estimate (implemented as prescribed for the semi-blind case in Section III-B above). On the left-hand plot we see that SOBI, which performs well with the smaller , deteriorates rapidly as approaches and the spectral diversity is nearly lost. BGL is relatively insensitive to , and SOBGL slightly outperforms the better of these two at each . However, all three are significantly worse than our ML estimate, which attains the iCRLB. On the right-hand plot we see that BGL deteriorates rapidly with the loss of localization diversity, whereas SOBI is relatively constant and SOBGL is slightly better. Again, our ML estimate significantly outperforms all three, coinciding with the iCRLB.

It is important to realize that, obviously, in this semi-blind scenario ML has a clear “unfair” advantage over SOBI, BGL and SOBGL, which, unlike ML, cannot exploit the prior knowledge of the sources’ covariance matrices. Nevertheless, it is our purpose in this work to show just how such information can be best exploited when available. Moreover, in the fully blind experiments which follow, this “unfair” prior knowledge is eliminated.

B. Experiment B: AR Sources With a Cyclostationary Driving Noise

In the second experiment, we use four AR sources of order , generated using driving-noise sequences whose power profiles change periodically. Therefore, the resulting sources are cyclostationary. More specifically, each source satisfies the difference equation



TABLE I: SOURCES’ PARAMETERS USED IN (43). NOTE THAT SINCE , THE PARAMETERS , , AND ARE MEANINGLESS

with

(43)

where are i.i.d. zero-mean unit-variance Gaussian sequences, and where the coefficients , as well as the poles of the polynomials , are specified in Table I. It is evident that since , the third and fourth sources are stationary AR sources. In addition, the spectral diversity of all sources is rather weak (the poles are rather close), so good separation, e.g., between sources 1 and 3 and between sources 2 and 4, has to rely on the difference in the driving-noise envelopes.

We assume that the general structures of the sources are known in advance, but only up to the unknown parameters of Table I (this is why we term this a “nearly” fully blind scenario). The blind separation therefore proceeds as follows. First, we apply initial separation using SOBI with three correlation matrices (at lags 0, 1, 2). We then repeat the following for three iterations:

1) For each separated (estimated) source:

• estimate the AR parameters , using Yule–Walker equations (e.g., [11]) applied to ordinary correlation estimates of ;

• using the estimated AR parameters, obtain the estimated driving-noise sequence via respective FIR filtering of ; denote this sequence as ;
• using the discrete-time Fourier transform (DTFT) of the squared , obtain an estimate of the period as the reciprocal of the (non-DC) highest peak location;
• using a least-squares (LS) fit of , estimate the amplitude and phase (this can be attained using a linear LS fit of a constant plus sine and cosine sequences with the estimated periods ).

2) Using the estimated parameters, which in turn yield the estimated sources’ covariance matrices, obtain the (blind) ML estimates of the sources.

(Note that estimation of , and was applied to all four sources, including the stationary sources (3 and 4), since the information that is unknown to the estimator.)
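The per-source estimation steps of step 1 can be sketched end-to-end on a synthesized source as follows. All parameter names and values are assumptions for the demo (the paper's Table I values are not reproduced here), and a plain FFT peak search replaces the DTFT.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 4096
# Synthesize one cyclostationary AR(2) source with an assumed parameter set
# (AR coefficients a, and amplitude c, period P0, phase phi of the
# driving-noise power profile -- all invented for this sketch):
a_true = np.array([-1.2, 0.72])
c_true, P_true, phi_true = 0.8, 64, 0.5
t = np.arange(T)
sigma2 = 1 + c_true * np.cos(2 * np.pi * t / P_true + phi_true)
v = rng.standard_normal(T) * np.sqrt(sigma2)
s = np.zeros(T)
for n in range(T):   # s[n] + a1 s[n-1] + a2 s[n-2] = v[n]
    s[n] = v[n] - a_true[0] * (s[n - 1] if n > 0 else 0) \
                - a_true[1] * (s[n - 2] if n > 1 else 0)

# (i) Yule-Walker estimate of the AR parameters from ordinary correlations
r = np.array([s[:T - k] @ s[k:] / T for k in range(3)])
a_hat = np.linalg.solve([[r[0], r[1]], [r[1], r[0]]], -r[1:])

# (ii) FIR filtering with taps [1, a1, a2] recovers the driving noise
v_hat = s + a_hat[0] * np.r_[0, s[:-1]] + a_hat[1] * np.r_[0, 0, s[:-2]]

# (iii) Period estimate: highest non-DC peak of the DFT of the squared residual
E = np.abs(np.fft.rfft(v_hat**2 - np.mean(v_hat**2)))
P_hat = T / (1 + np.argmax(E[1:]))

# (iv) Linear LS fit of a constant plus cosine/sine with the estimated period
H = np.c_[np.ones(T), np.cos(2 * np.pi * t / P_hat), np.sin(2 * np.pi * t / P_hat)]
al, be = np.linalg.lstsq(H, v_hat**2, rcond=None)[0][1:]
c_hat, phi_hat = np.hypot(al, be), np.arctan2(-be, al)
```

Note that the time-averaged correlations still satisfy the Yule–Walker equations here, since the AR coefficients are constant and only the driving-noise variance varies, which is why step (i) remains consistent despite the nonstationarity.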

Fortunately, in our case the procedure does not require explicit computation and inversion of the estimated sources’ covariance matrices (which are all ), since the AR model already provides direct access to the inverse covariance, and the driving-noise envelope can also be easily inverted. More specifically, let , , , and denote the estimated parameters, and let denote the (squared) estimated envelope of the driving noise. Defining as the Toeplitz matrix with 1-s along its main diagonal, along its first sub-diagonal and along its second sub-diagonal, and denoting by the diagonal matrix with along its diagonal, the implied estimate of the covariance matrix is given by

(44)

so the (blind) ML correlation matrices, given by

(45)

can be easily obtained as , where each row of is obtained by first filtering the respective row of with the FIR filter whose taps are and then dividing the result (elementwise) by the sequence .

This enables computation of the matrices in , rather than , operations, saving the need for explicit computation and inversion of the sources’ estimated covariance matrices .
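The fast inverse-covariance route can be checked numerically: with Ĉ = A⁻¹ D A⁻ᵀ (banded unit-lower-triangular A, diagonal envelope matrix D), applying Ĉ⁻¹ = Aᵀ D⁻¹ A to a data row costs O(T) via two FIR passes and an elementwise division. The exact normalization in the paper's (lost) display may differ; the sketch below, with invented parameter values, only verifies this algebra against explicit inversion.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
a1, a2 = -1.0, 0.5                     # assumed estimated AR(2) parameters
t = np.arange(T)
p = 1.0 + 0.6 * np.cos(2 * np.pi * t / 40)   # squared driving-noise envelope

# Explicit construction: C = A^{-1} D A^{-T}, with banded Toeplitz A
A = np.eye(T) + a1 * np.eye(T, k=-1) + a2 * np.eye(T, k=-2)
D = np.diag(p)
C = np.linalg.solve(A, D) @ np.linalg.inv(A).T
C_inv = np.linalg.inv(C)               # the O(T^3) route we want to avoid

x = rng.standard_normal(T)             # one row of the observation matrix

# O(T) route: FIR-filter x with taps [1, a1, a2], divide by the envelope,
# then apply the transposed (anticausal) filter -- i.e. compute A^T D^{-1} A x.
u = x + a1 * np.r_[0, x[:-1]] + a2 * np.r_[0, 0, x[:-2]]   # A @ x
w = u / p                                                  # D^{-1} A x
y = w + a1 * np.r_[w[1:], 0] + a2 * np.r_[w[2:], 0, 0]     # A^T @ w
```

Here `y` agrees with the explicit product `C_inv @ x` up to numerical error, without ever forming a T x T matrix.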

In Fig. 3, we show the obtained values for all versus the observation length . In addition to the obtained blind-ML (BML) results, we also show the following for comparison:

• the respective iCRLBs;
• the (semi-blind) ML separation results, obtained assuming known covariance matrices of the sources, namely using the true parameters of Table I;
• the WASOBI [23] separation results (using the code in [20]).

WASOBI (an asymptotically optimally-weighted version of SOBI) is supposed to attain the iCRLB asymptotically for stationary AR sources, and indeed it can be seen to coincide (asymptotically) with the iCRLB for and for , since sources 3 and 4 are stationary AR sources. However, for all other couples, which involve at least one nonstationary source, WASOBI is seen to be suboptimal, being unable to exploit the additional nonstationarity diversity. In particular, the WASOBI performance is severely suboptimal in resolving sources 1 and 3, as well as 2 and 4, since these couples share very similar poles and differ mainly by their driving-noise envelopes. The ML separation is seen to already coincide with the respective iCRLB for relatively low values of (typically less than 500), whereas the BML separation obviously needs more data to be able to obtain useful estimates of the covariance matrices, and therefore only coincides with the iCRLB for larger than, say, 2000.

C. Experiment C: TVAR Sources

In the last experiment we use five time-varying AR sources of order , generated as follows:

(46)

where the driving-noise sequences are mutually independent, white zero-mean unit-variance Gaussian processes, and where the TVAR coefficients reflect a linear drift of the poles from an initial state (at ) to a final state at



Fig. 3. ISR (for ; is the row index and is the column index) versus : iCRLB, ML, BML and WASOBI.

TABLE II: INITIAL AND FINAL VALUES OF THE FOUR POLES FOR THE FIVE SOURCES. NOTE THAT THE FIFTH SOURCE IS STATIONARY

(an observed sequence is samples long). In other words, these coefficients satisfy the following relation:

where

(47)

with the initial and final values of the poles for each source specified in Table II.
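Generating a TVAR source with linearly drifting poles can be sketched as follows. Table II's actual pole values were lost in the extraction, so the pole sets below are invented; the AR coefficients are obtained at each sample by expanding the instantaneous pole polynomial.

```python
import numpy as np

rng = np.random.default_rng(5)
T, P = 1000, 4            # samples and TVAR order (order 4 per the text)

# Assumed initial/final pole sets: two conjugate pairs drifting linearly
poles_init = np.array([0.85 * np.exp(1j * 0.5), 0.85 * np.exp(-1j * 0.5),
                       0.80 * np.exp(1j * 2.0), 0.80 * np.exp(-1j * 2.0)])
poles_final = np.array([0.85 * np.exp(1j * 0.9), 0.85 * np.exp(-1j * 0.9),
                        0.80 * np.exp(1j * 2.4), 0.80 * np.exp(-1j * 2.4)])

s = np.zeros(T)
for n in range(T):
    lam = n / (T - 1)                             # linear drift from 0 to 1
    poles = (1 - lam) * poles_init + lam * poles_final
    a = np.real(np.poly(poles))                   # a[0]=1, a[1..P] AR coeffs
    past = sum(a[k] * (s[n - k] if n >= k else 0.0) for k in range(1, P + 1))
    s[n] = rng.standard_normal() - past           # s[n] + sum a_k s[n-k] = v[n]
```

Since each interpolated pole stays inside the unit circle, the time-varying recursion remains stable over the whole segment.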

In this experiment, we employ the “multiple experiments” framework, estimating the sources’ covariance matrices from independent realizations of the mixtures (each of length ). The covariance estimation is based on estimation of the TVAR parameters from the independent realizations using the general Dym–Gohberg algorithm [1], [10] for TVAR parameter estimation. It is important to emphasize that we do not exploit the knowledge of our specific TVAR model (of linear drift of the poles) in the estimation process. The only information used is the TVAR order . The estimated TVAR parameters are obtained from the Dym–Gohberg algorithm as free parameters, with no particular temporal inter-relations between them.

In Fig. 4, we show empirical results for the total average ISR, given by

(48)

(with in this case), versus the number of experiments . As in the previous experiment, we compare the results to the iCRLB (also averaged over all couples), to the ML performance (obtained using the true TVAR parameters) and to WASOBI.⁵ Note that the iCRLB in this case is simply given by the single-experiment iCRLB divided by (since the experiments are independent).

Evidently, the ML performance nearly coincides with the iCRLB for all . The BML performance approaches the iCRLB rather rapidly, and for the difference becomes smaller than 1 dB. Recall that for estimating arbitrary covariance matrices, generally has to be much larger than ; nevertheless, with a TVAR model of order it is sufficient to have , namely , even if is much smaller than , which is well exploited in our case. As could

⁵Using a slightly modified version of the code in [20], adapted to estimating the time-lagged correlations from multiple snapshots.



Fig. 4. Total average ISR versus : iCRLB, ML, BML and WASOBI.

be expected, WASOBI, which attains the iCRLB for stationary AR sources, cannot exploit the temporal variability of the sources to the extent exploited by our ML and BML (and predicted by our iCRLB).

VI. CONCLUSION

We presented a general framework for optimal semi-blind and blind separation of Gaussian sources with arbitrary covariance structures. We derived the induced CRLB on the attainable ISR in terms of the sources’ covariance matrices, developed the semi-blind ML separation and proposed an approach for fully blind (or “nearly” fully blind) ML separation. We demonstrated that this new framework enables attaining, under various scenarios, substantial performance gains with respect to conventional approaches (aimed at stationary sources) by proper exploitation of the covariance diversity of the sources.

APPENDIX

INVARIANCE OF THE ICRLB

Let denote a matrix of source signals (not necessarily independent), whose joint distribution is known up to an unknown parameters vector , and let denote their observed mixtures. Let the vector of all unknown parameters be constructed as . Now consider a set of statistically independent experiments, such that in each experiment the entire set of source signals is independently redrawn from the same joint distribution and mixed by the same mixing matrix , such that the observed mixtures are given by , . Let denote the FIM for estimation of from . Evidently (see, e.g., [11]), we have .

Now let denote the ML estimate of based on the independent experiments. Asymptotically (as ) the ML estimate attains the CRLB, namely, is asymptotically unbiased, and its covariance coincides asymptotically with .

Let us show now that is ISR-equivariant. Our proof essentially resembles the one presented in [6]; however, in [6] all sources are assumed to have an i.i.d. time-structure, and their marginal distributions are implicitly assumed to be known. In the following, we address our slightly more general model.

Theorem 1: The ML estimate of from , denoted , is an ISR-equivariant estimator, namely, for any nonsingular matrix ,
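The displayed statement of Theorem 1 was lost in the text extraction. From the proof's conclusion ("when the observations are changed into …, the ML estimate changes into …"), its content is presumably the following equivariance property, written here (as an assumption on the lost notation) with $\hat{\mathbf{B}}_{\mathrm{ML}}(\mathbf{X})$ denoting the ML demixing estimate from observations $\mathbf{X}$ and $\mathbf{T}$ the nonsingular transformation:

```latex
\hat{\mathbf{B}}_{\mathrm{ML}}(\mathbf{T}\mathbf{X}) \;=\; \hat{\mathbf{B}}_{\mathrm{ML}}(\mathbf{X})\,\mathbf{T}^{-1}.
```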

Proof: We use the vectorized version of the mixing model as

(49)

where and . This implies

(50)

where and denote the pdf-s of each of the random vectors and (respectively). With slight modification of notations, using the relation and exploiting the statistical independence of the experiments, we get

(51)

where denotes the joint pdf of . Let and denote the ML estimates of and obtained with given measurements . This implies the inequality

(52)

for all nonsingular and , which in turn implies (using (51))

(53)

Now let denote an arbitrary nonsingular matrix. We observe (using (51) and (53)) that for all nonsingular and

(54)

Now let , where is any nonsingular matrix. We have

(55)



which coincides with the right-hand side of (54). We therefore conclude (using (53) and (55)) that

(56)

for all nonsingular and . In other words, when the observations are changed into , the ML estimate changes into (and the ML estimate of remains unchanged).

Thus, since the MSE of coincides (asymptotically) with the CRLB, this implies that the asymptotic iCRLB is also invariant w.r.t. . Furthermore, since, as discussed in Section II, the iCRLB is a linear transformation of the CRLB (on ), and since the single-experiment CRLB is given by , it follows that the single-experiment iCRLB also differs from the -experiments iCRLB only by a factor of , and must therefore be invariant w.r.t. as well.

ACKNOWLEDGMENT

The author would like to thank the anonymous reviewers, and particularly Reviewer 2, for suggesting ideas which were helpful in simplifying some of the derivations.

REFERENCES

[1] Y. I. Abramovich, N. K. Spencer, and M. D. E. Turley, “Time-varying autoregressive (TVAR) models for multiple radar observations,” IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1298–1311, Apr. 2007.

[2] B. Afsari, “Simple LU and QR based non-orthogonal matrix joint diagonalization,” in Lecture Notes in Computer Science (LNCS 3889): Proc. ICA, 2006, pp. 17–24.

[3] S.-I. Amari and J.-F. Cardoso, “Blind source separation: Semiparametric statistical approach,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2692–2700, Nov. 1997.

[4] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separation technique using second-order statistics,” IEEE Trans. Signal Process., vol. 45, no. 2, pp. 434–444, 1997.

[5] J.-F. Cardoso, “On the performance of orthogonal source separation algorithms,” in Proc. EUSIPCO, 1994, pp. 776–779.

[6] J.-F. Cardoso, “The invariant approach to source separation,” in Proc. NOLTA’95, 1995.

[7] J.-F. Cardoso and B. Laheld, “Equivariant adaptive source separation,” IEEE Trans. Signal Process., vol. 44, no. 12, pp. 3017–3030, 1996.

[8] S. Dégerine and A. Zaïdi, “Separation of an instantaneous mixture of Gaussian autoregressive sources by the exact maximum likelihood approach,” IEEE Trans. Signal Process., vol. 52, no. 6, pp. 1499–1512, Jun. 2004.

[9] E. Doron, A. Yeredor, and P. Tichavský, “Cramér-Rao lower bound for blind separation of stationary parametric Gaussian sources,” IEEE Signal Process. Lett., vol. 14, no. 6, pp. 417–420, Jun. 2007.

[10] H. Dym and I. Gohberg, “Extensions of band matrices with band inverses,” Linear Algebra Appl., vol. 36, pp. 1–24, 1981.

[11] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[12] E. Ollila, K. Hyon-Jung, and V. Koivunen, “Compact Cramér–Rao bound expression for independent component analysis,” IEEE Trans. Signal Process., vol. 56, no. 4, pp. 1421–1428, Apr. 2008.

[13] E. Ollila, H. Oja, and V. Koivunen, “Complex-valued ICA based on a pair of generalized covariance matrices,” Comput. Stat. Data Anal., vol. 52, no. 7, pp. 3789–3805, 2008.

[14] D.-T. Pham, “Blind separation of instantaneous mixture of sources via the Gaussian mutual information criterion,” Signal Process., vol. 81, pp. 855–870, 2001.

[15] D.-T. Pham, “Joint approximate diagonalization of positive definite matrices,” SIAM J. Matrix Anal. Appl., vol. 22, pp. 1136–1152, 2001.

[16] D.-T. Pham, “Blind separation of cyclostationary sources using joint block approximate diagonalization,” in Lecture Notes in Computer Science (LNCS 4666): Proc. ICA, 2007, pp. 244–251.

[17] D.-T. Pham and J.-F. Cardoso, “Blind separation of instantaneous mixtures of non-stationary sources,” IEEE Trans. Signal Process., vol. 49, no. 9, pp. 1837–1848, Sep. 2001.

[18] D.-T. Pham and P. Garat, “Blind separation of mixture of independent sources through a quasi-maximum likelihood approach,” IEEE Trans. Signal Process., vol. 45, no. 7, pp. 1712–1725, 1997.

[19] A. Souloumiac, “Nonorthogonal joint diagonalization by combining Givens and hyperbolic rotations,” IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2222–2231, Jun. 2009.

[20] P. Tichavský, Matlab Code for WEDGE and WASOBI. 2009 [Online]. Available: http://si.utia.cas.cz/Tichavsky.html

[21] P. Tichavský, E. Doron, A. Yeredor, and J. Nielsen, “A computationally affordable implementation of an asymptotically optimal BSS algorithm for AR sources,” in Proc. EUSIPCO, 2006.

[22] P. Tichavský, Z. Koldovsky, and E. Oja, “Performance analysis of the FastICA algorithm and Cramér-Rao bounds for linear independent component analysis,” IEEE Trans. Signal Process., vol. 54, no. 4, pp. 1189–1203, Apr. 2006.

[23] P. Tichavský and A. Yeredor, “Fast approximate joint diagonalization incorporating weight matrices,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 878–891, Mar. 2009.

[24] P. Tichavský, A. Yeredor, and Z. Koldovsky, “A fast asymptotically efficient algorithm for blind separation of a linear mixture of block-wise stationary autoregressive processes,” in Proc. ICASSP, 2009.

[25] R. Vollgraf and K. Obermayer, “Quadratic optimization for simultaneous matrix diagonalization,” IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3270–3278, Sep. 2006.

[26] A. Yeredor, “Blind separation of Gaussian sources via second-order statistics with asymptotically optimal weighting,” IEEE Signal Process. Lett., vol. 7, no. 7, pp. 197–200, Jul. 2000.

[27] A. Yeredor, “Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation,” IEEE Trans. Signal Process., vol. 50, no. 7, pp. 1545–1553, Jul. 2002.

[28] A. Yeredor, “On hybrid exact-approximate joint diagonalization,” in Proc. 3rd Int. Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2009.

Arie Yeredor (M’99–SM’02) received the B.Sc. (summa cum laude) and Ph.D. degrees in electrical engineering from Tel-Aviv University (TAU), Tel-Aviv, Israel, in 1984 and 1997, respectively.

He is currently with the School of Electrical Engineering, Department of Electrical Engineering-Systems, TAU, where his research and teaching areas are in statistical and digital signal processing. He also holds a consulting position with NICE Systems, Inc., Ra’anana, Israel, in the fields of speech and audio processing, video processing, and emitter location algorithms.

Dr. Yeredor previously served as an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II, and he is currently an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING. He served as Technical Co-Chair of The Third International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) 2009. He has been awarded the yearly Best Lecturer of the Faculty of Engineering Award (at TAU) six times. He is a member of the IEEE Signal Processing Society’s Signal Processing Theory and Methods (SPTM) Technical Committee.

