Non-Negative Blind Source Separation using Convex Analysis
Wing-Kin (Ken) Ma, The Chinese University of Hong Kong (CUHK)
Course on Convex Optimization for Wireless Comm. and Signal Proc., jointly taught by Daniel P. Palomar and Wing-Kin (Ken) Ma
National Chiao Tung Univ., Hsinchu, Taiwan
December 19-21, 2008
Acknowledgement: Tsung-Han Chan, Chong-Yung Chi, and Yue Wang
Blind source separation (BSS): Problem statement
Signal model: a real-valued, N-input, M-output linear mixing model:

$$x_i = \sum_{j=1}^{N} a_{ij} s_j, \quad i = 1, \ldots, M$$

where

$$x_i = \begin{bmatrix} x_i[1] \\ \vdots \\ x_i[L] \end{bmatrix}, \quad s_i = \begin{bmatrix} s_i[1] \\ \vdots \\ s_i[L] \end{bmatrix}$$

are the observation & true source vectors.
[Figure: mixing network — sources s_1, s_2 feed observations x_1, x_2, x_3 through the coefficients a_{11}, a_{12}, a_{21}, a_{22}, a_{31}, a_{32}.]
Problem: extract {s_1, ..., s_N} from {x_1, ..., x_M} without knowledge of the mixing matrix A = {a_{ij}}.
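As a toy illustration of the model (not from the slides; all sizes and names below are arbitrary), the mixing can be simulated in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

L, N, M = 1000, 2, 3               # data length, # sources, # observations (toy sizes)
S = rng.random((L, N))             # non-negative sources s_1, ..., s_N as columns
A = rng.random((M, N))             # unknown mixing matrix {a_ij}

X = S @ A.T                        # columns are x_i = sum_j a_ij s_j
```

Each observation X[:, i] is the linear combination sum_j a_ij S[:, j]; BSS asks us to recover the columns of S from X alone.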
W.-K. Ma 1
BSS: A biomedical imaging example
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) assessments of breast cancer, captured at different times. Courtesy of Yue Wang [Wang et al. 2003].
[Figure: time activity curves (TAC) over time for the fast flow, slow flow, and plasma compartments.]
Illustration of the source pattern mixing process. The signals represent a summation of vascular permeability with various diffusion rates. The goal is to separate the distributions of multiple biomarkers with the same diffusion rate.
BSS techniques
• A BSS approach is based on some assumptions on the characteristics of {s_1, ..., s_N} and/or A.

• There are two aspects in developing a BSS approach:

– a criterion established from the assumptions made, &
– optimization methods for fulfilling the criterion.

• The suitability of the assumptions (& the approach as a result) depends much on the applications under consideration.

Example: Independent component analysis (ICA), a well-known BSS technique, typically assumes that each s_i[n] is random, non-Gaussian, & mutually independent. Mutual independence is a good assumption in speech & wireless commun., but not so in hyperspectral imaging.
Non-negative blind source separation (nBSS)
• In some applications, source signals are non-negative by nature, e.g., in imaging.

• nBSS approaches exploit the signal non-negativity characteristic (plus some additional assumptions).

• Applications: biomedical imaging, hyperspectral imaging, & analytical chemistry.

• Some existing nBSS approaches:

– non-negative ICA (nICA) [Plumbley 2003]
– non-negative matrix factorization (NMF) [Lee-Seung 1999].
• nICA is a statistical approach adopting the mutual independence assumption.
• NMF is a deterministic approach that may cope with correlated sources.
• Essentially NMF deals with the optimization

$$\min_{S \in \mathbb{R}^{L \times N},\, A \in \mathbb{R}^{M \times N}} \; \| X - S A^T \|_F^2 \quad \text{s.t.} \;\; S \succeq 0, \; A \succeq 0 \;\; \text{(elementwise non-negative)}$$

where X = [x_1, ..., x_M] and S = [s_1, ..., s_N].

NMF may not be a unique factorization, however.
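As an illustrative sketch (not the benchmarked implementation), the classical Lee-Seung multiplicative updates for a factorization X ≈ S Aᵀ, with X = [x_1, ..., x_M] ∈ R^{L×M}, can be written as:

```python
import numpy as np

def nmf(X, N, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||X - S A^T||_F^2, S, A >= 0.

    X: (L, M) non-negative data; N: number of sources. A toy sketch with
    arbitrary defaults, not the reference implementation from the slides."""
    rng = np.random.default_rng(seed)
    L, M = X.shape
    S = rng.random((L, N))
    A = rng.random((M, N))
    for _ in range(iters):
        # each update keeps its factor non-negative and does not increase the error
        S *= (X @ A) / (S @ A.T @ A + eps)
        A *= (X.T @ S) / (A @ S.T @ S + eps)
    return S, A
```

Because (S, A) and (SQ, A Q⁻ᵀ) give the same product for suitable Q, the factorization found this way is generally not unique — the non-uniqueness noted above.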
CAMNS:Convex analysis of mixtures of non-negative sources
• CAMNS [Chan-Ma-Chi-Wang 2008] is a deterministic nBSS approach.
• In addition to utilizing source non-negativity, CAMNS employs a special deterministic assumption called local dominance.

• What is local dominance? Intuitively, signals with many 'zeros' are likely to satisfy local dominance (mathematical definition available soon).
• Appears to be a good assumption for sparse or high-contrast images.
An intuitive illustration of how CAMNS works
[Figure: source images s_1, s_2, s_3 and their mixtures, the observations x_1, x_2, x_3.]
How can we extract {s1, . . . , sN} from {x1, . . . ,xM} without knowing {aij}?
An intuitive illustration of how CAMNS works (cont’d)
[Figure: the observations x_1, x_2, x_3 and the polyhedral set constructed from them.]
Based on some assumptions (e.g., signal non-negativity & local dominance) & by convex analysis, we use {x_1, ..., x_M} to construct a polyhedral set.
An intuitive illustration of how CAMNS works (cont’d)
[Figure: the polyhedral set with the sources s_1, s_2, s_3 at its 'corners'.]
We show that the 'corners' (formally speaking, extreme points) of this polyhedral set are exactly {s_1, ..., s_N} (rather surprisingly).
An intuitive illustration of how CAMNS works (cont’d)
Using LP, we can locate the 'corners' of the polyhedral set effectively. As a result, perfect separation can be achieved.
A quick review of some convex analysis concepts
Affine hull of a given set of vectors {s_1, ..., s_N} ⊂ R^L:

$$\operatorname{aff}\{s_1, \ldots, s_N\} = \left\{ x = \sum_{i=1}^{N} \theta_i s_i \;\middle|\; \theta \in \mathbb{R}^N, \; \sum_{i=1}^{N} \theta_i = 1 \right\}.$$
• An affine hull can always be represented by

$$\operatorname{aff}\{s_1, \ldots, s_N\} = \left\{ x = C\alpha + d \;\middle|\; \alpha \in \mathbb{R}^P \right\}$$

for some (non-unique) d ∈ R^L and C ∈ R^{L×P}, where P ≤ N − 1 is the affine dimension.

• If {s_1, ..., s_N} is affinely independent (i.e., {s_1 − s_N, ..., s_{N−1} − s_N} is linearly independent), then P = N − 1.
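As a toy numerical check (the helper name is illustrative, not from the slides), the affine dimension can be computed as the rank of the difference vectors:

```python
import numpy as np

def affine_dim(points):
    """Affine dimension of a point set: the rank of the differences
    s_1 - s_N, ..., s_{N-1} - s_N. `points` holds one vector per row."""
    pts = np.asarray(points, dtype=float)
    diffs = pts[:-1] - pts[-1]          # row i is s_i - s_N
    return np.linalg.matrix_rank(diffs)
```

For N affinely independent points the result is N − 1; for collinear points it drops to 1.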
Convex hull of a given set of vectors {s_1, ..., s_N} ⊂ R^L:

$$\operatorname{conv}\{s_1, \ldots, s_N\} = \left\{ x = \sum_{i=1}^{N} \theta_i s_i \;\middle|\; \theta \in \mathbb{R}^N_+, \; \sum_{i=1}^{N} \theta_i = 1 \right\}$$

• A point x ∈ conv{s_1, ..., s_N} is an extreme point of conv{s_1, ..., s_N} if x is not any nontrivial convex combination of {s_1, ..., s_N}.

• If {s_1, ..., s_N} is affinely independent, then {s_1, ..., s_N} is the set of all extreme points of its convex hull.
Example of 3-dimensional signal space geometry with N = 3. In this example, aff{s_1, s_2, s_3} is a plane passing through s_1, s_2, s_3, & conv{s_1, s_2, s_3} is a triangle with corners (extreme points) s_1, s_2, s_3.
The assumptions in CAMNS
Recall the model $x_i = \sum_{j=1}^{N} a_{ij} s_j$. Our assumptions:

(A1) Source non-negativity: For each j, s_j ∈ R^L_+.

(A2) Local dominance: For each i ∈ {1, ..., N}, there exists an (unknown) index ℓ_i such that s_i[ℓ_i] > 0 and s_j[ℓ_i] = 0, ∀j ≠ i.

(A reasonable assumption for sparse or high-contrast signals.)

(A3) Unit row sum: For all i = 1, ..., M, $\sum_{j=1}^{N} a_{ij} = 1$.

(Already satisfied in MRI; can be relaxed.)

(A4) M ≥ N and A is of full column rank. (Standard BSS assumption.)
How to enforce (A3), if it does not hold
The unit row sum assumption (A3) may be relaxed.
Suppose that $x_i^T 1 \neq 0$ (where 1 is an all-one vector) for all i.

Consider a normalized version of x_i:

$$\bar{x}_i = \frac{x_i}{x_i^T 1} = \sum_{j=1}^{N} \underbrace{\left( \frac{a_{ij}\, s_j^T 1}{x_i^T 1} \right)}_{\triangleq\, \bar{a}_{ij}} \underbrace{\left( \frac{s_j}{s_j^T 1} \right)}_{\triangleq\, \bar{s}_j}.$$

One can show that $(\bar{a}_{ij})$ satisfies (A3).
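The normalization is a one-liner in NumPy; the helper name and the toy verification below are illustrative, not from the slides:

```python
import numpy as np

def normalize_observations(X):
    """Rescale each observation x_i by x_i^T 1, so that the effective mixing
    coefficients a_ij (s_j^T 1) / (x_i^T 1) sum to 1 over j.
    X holds the observations as columns; assumes every column sum is nonzero."""
    return X / X.sum(axis=0, keepdims=True)
```

One can verify numerically that the effective mixing matrix has unit row sums, since $\sum_j a_{ij}\, s_j^T 1 = x_i^T 1$ by linearity of the model.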
CAMNS
Since $\sum_{j=1}^{N} a_{ij} = 1$ [(A3)], we have, for each observation,

$$x_i = \sum_{j=1}^{N} a_{ij} s_j \in \operatorname{aff}\{s_1, \ldots, s_N\}.$$

This implies

$$\operatorname{aff}\{s_1, \ldots, s_N\} \supseteq \operatorname{aff}\{x_1, \ldots, x_M\}.$$
In fact, we can show that
Lemma. Under (A3) and (A4), aff{s1, . . . , sN} = aff{x1, . . . ,xM}.
• Consider the representation

$$\operatorname{aff}\{s_1, \ldots, s_N\} = \operatorname{aff}\{x_1, \ldots, x_M\} = \left\{ x = C\alpha + d \;\middle|\; \alpha \in \mathbb{R}^{N-1} \right\} \triangleq \mathcal{A}(C, d)$$

for some (C, d) ∈ R^{L×(N−1)} × R^L with rank(C) = N − 1.
• Let us consider determining the source affine set parameters (C,d) from{x1, . . . ,xM}.
• The solution is simple for M = N:

$$d = x_N, \qquad C = [\, x_1 - x_N, \ldots, x_{N-1} - x_N \,].$$
• For M > N , we use an affine set fitting solution.
Affine set fitting problem:

$$(C, d) = \arg \min_{\substack{C,\, d \\ C^T C = I}} \; \sum_{i=1}^{M} \underbrace{\min_{x \in \mathcal{A}(C, d)} \| x - x_i \|_2^2}_{\text{proj. error of } x_i \text{ onto } \mathcal{A}(C, d)} \qquad (*)$$

where $\mathcal{A}(C, d) = \{\, x = C\alpha + d \mid \alpha \in \mathbb{R}^{N-1} \,\}$.

Proposition. Problem (∗) has a closed-form solution

$$d = \frac{1}{M} \sum_{i=1}^{M} x_i, \qquad C = [\, q_1(UU^T),\, q_2(UU^T),\, \ldots,\, q_{N-1}(UU^T) \,]$$

where $U = [\, x_1 - d, \ldots, x_M - d \,] \in \mathbb{R}^{L \times M}$, and $q_i(R)$ denotes the eigenvector associated with the ith principal eigenvalue of R.
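A minimal sketch of this closed-form solution (function name illustrative; the principal eigenvectors of U Uᵀ are taken from the SVD of U, not the authors' code):

```python
import numpy as np

def affine_set_fit(X, N):
    """Closed-form affine set fitting: d is the mean of the observations and
    C collects the N-1 principal eigenvectors of U U^T, obtained here as the
    leading left singular vectors of U. X is (L, M), observations as columns."""
    d = X.mean(axis=1)
    U = X - d[:, None]
    Q, _, _ = np.linalg.svd(U, full_matrices=False)   # left singular vectors
    return Q[:, :N - 1], d
```

When the observations lie exactly on an (N − 1)-dimensional affine set, projecting them onto the fitted A(C, d) reproduces them with zero error.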
Be reminded that s_i ∈ R^L_+. Hence, it is true that

$$s_i \in \operatorname{aff}\{s_1, \ldots, s_N\} \cap \mathbb{R}^L_+ = \mathcal{A}(C, d) \cap \mathbb{R}^L_+ \triangleq \mathcal{S}.$$

The following lemma arises from local dominance (A2):

Lemma. Under (A1) and (A2),

$$\mathcal{S} = \operatorname{conv}\{s_1, \ldots, s_N\}.$$

Moreover, the set of all its extreme points is {s_1, ..., s_N}.
Summarizing the above results, a new nBSS criterion is as follows:

Theorem 1. (CAMNS criterion) Under (A1)-(A4), the polyhedral set

$$\mathcal{S} = \left\{ x \in \mathbb{R}^L \;\middle|\; x = C\alpha + d \succeq 0, \; \alpha \in \mathbb{R}^{N-1} \right\},$$

where (C, d) is obtained from the observation set {x_1, ..., x_M} by the affine set fitting procedure in the Proposition, has N extreme points given by the true source vectors s_1, ..., s_N.
Practical realization of CAMNS
• CAMNS boils down to finding all the extreme points of an observation-constructed polyhedral set.
• In the optimization context this is known as vertex enumeration.
• In CAMNS, there is one important problem structure that we can take full advantage of; that is,

Property implied by (A2): s_1, ..., s_N are linearly independent.

• By exploiting this property, we can locate all the extreme points by solving a sequence of LPs (≈ 2N LPs at worst).
Consider the following LP:

$$p^\star = \min_{s} \; r^T s \quad \text{s.t.} \;\; s \in \mathcal{S} \qquad (\dagger)$$

for an arbitrary r ∈ R^L. From basic LP theory, the solution of (†) is

• one of the extreme points of S (that is, one of the s_i), or

• a point on a face of S (intuitively, this looks rather unlikely).
We can prove that getting a non-extreme-point solution is very unlikely:

Lemma. Suppose that r is randomly generated following N(0, I_L). Then, with probability 1, the solution of

$$p^\star = \min_{s} \; r^T s \quad \text{s.t.} \;\; s \in \mathcal{S}$$

is uniquely given by s_i for some i ∈ {1, ..., N}.
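A sketch of solving this LP with scipy.optimize.linprog, working in the α-parameterization s = Cα + d so that s ∈ S reduces to the linear constraint Cα + d ⪰ 0 (toy data and helper name below are illustrative, not the authors' code):

```python
import numpy as np
from scipy.optimize import linprog

def extreme_point_lp(C, d, r):
    """Solve p* = min_s r^T s  s.t.  s = C @ alpha + d >= 0, as an LP in alpha.
    r^T d is constant, so the objective over alpha is (C^T r)^T alpha, and
    s >= 0 becomes -C @ alpha <= d. alpha itself is unconstrained."""
    res = linprog(C.T @ r, A_ub=-C, b_ub=d,
                  bounds=[(None, None)] * C.shape[1], method="highs")
    assert res.success, "LP should be solvable when S is a bounded polyhedron"
    return C @ res.x + d             # map the optimal alpha back to s
```

For the toy N = 2 pair s_1 = (1, 0, 0.5), s_2 = (0, 1, 0.5) (which satisfies local dominance), taking C = s_1 − s_2 and d = s_2 describes S, and the LP returns one of the two sources depending on the direction r.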
• Suppose that we have found l extreme points, say, {s_1, ..., s_l}.

• We can find the other extreme points by using the linear independence of {s_1, ..., s_N} to 'annihilate' the old extreme points.

Lemma. Suppose r = Bw, where w ∼ N(0, I_{L−l}) & B ∈ R^{L×(L−l)} is such that

$$B^T [\, s_1, \ldots, s_l \,] = 0, \qquad B^T B = I_{L-l}.$$

Then, with probability 1, at least one of the LPs

$$p^\star = \min_{s \in \mathcal{S}} \; r^T s, \qquad q^\star = \max_{s \in \mathcal{S}} \; r^T s$$

finds a new extreme point, i.e., s_i for some i ∈ {l + 1, ..., N}. The 1st LP finds a new extreme point if |p^⋆| ≠ 0; the 2nd LP finds a new extreme point if |q^⋆| ≠ 0.
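The annihilation step above can be sketched with a null-space computation (helper and argument names are illustrative, not the authors' code):

```python
import numpy as np
from scipy.linalg import null_space

def annihilating_direction(S_found, rng):
    """Draw r = B w, where the columns of B are an orthonormal basis of the
    orthogonal complement of span{s_1, ..., s_l}; then r^T s_i = 0 for every
    already-found extreme point. S_found is (L, l) with the found s_i as columns."""
    B = null_space(S_found.T)            # columns satisfy S_found^T B = 0, B^T B = I
    w = rng.standard_normal(B.shape[1])  # w ~ N(0, I_{L-l})
    return B @ w
```

Since r is orthogonal to every found source, the found extreme points score zero in the LP objective, so a nonzero optimal value must come from a new extreme point.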
On alternative ways of implementing CAMNS
• We have another theorem that converts S ⊂ R^L to another polyhedral set on R^{N−1}, denoted by F below.

[Figure: via the mapping x = Cα + d, the simplex F in the (α_1, α_2) domain corresponds to S, which contains the sources s_1, s_2 and the observations x_1, x_2, x_3.]

• The set F has a smaller vector dimension (note that L ≫ N). Also, it is a simplex with extreme points related to those of S in a one-to-one manner.

• For N = 2, F is a line segment on R and there is a closed form for locating its extreme points.

• For N = 3, F is a triangle on R^2 and there is also a simple way of locating its extreme points.
Simulation example 1: Dual energy X-Ray
Original sources
Observations
Separated sources by CAMNS
Separated sources by nICA (a benchmark nBSS method)
Separated sources by NMF (another benchmark nBSS method)
Simulation example 2: Human faces
Original sources
Observations
Separated sources by CAMNS
Separated sources by nICA
Separated sources by NMF
Simulation example 3: Ghosting
Original sources
Observations
Separated sources by CAMNS
Separated sources by nICA
Separated sources by NMF
Simulation example 4: Five of my students
Original sources
Observations
Separated sources by CAMNS
Simulation example 5: Monte Carlo performance for N = 3
[Plot: average sum squared error $e(S, \hat{S})$ in dB (15-50 dB range) versus SNR (25-40 dB) for CAMNS-LP, CAMNS-geometric, nLCA, nICA, and NMF.]

Average sum squared errors of the sources with respect to SNR.
Conclusion
• A convex analysis framework, called CAMNS, has been developed for nBSS.

• CAMNS guarantees perfect separation of the true sources, by determining the extreme points of an observation-constructed polyhedral set (under several assumptions).

• A systematic LP-based method has been proposed to realize CAMNS. Its complexity is polynomial (specifically, O(L^{1.5}(N − 1)^2)).

• A number of simulation results indicate that CAMNS performs very well even in the presence of dependent sources.

• The source code is available at http://www.ee.cuhk.edu.hk/~wkma
References
[Wang et al. 2003] Y. Wang, J. Xuan, R. Srikanchana, & P. L. Choyke, "Modeling and reconstruction of mixed functional and molecular patterns," Int'l Journal Biomedical Imaging, pp. 1–9, 2005.

[Plumbley 2003] M. D. Plumbley, "Algorithms for nonnegative independent component analysis," IEEE Trans. Neural Networks, vol. 14, no. 3, pp. 534–543, May 2003.

[Lee-Seung 1999] D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, Oct. 1999.

[Chan-Ma-Chi-Wang 2008] T.-H. Chan, W.-K. Ma, C.-Y. Chi, & Y. Wang, "A convex analysis framework for blind separation of non-negative sources," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 5120–5134, Oct. 2008.