
L. De Lathauwer

Block Component Analysis

A New Concept for Blind Source Separation

(Higher-Order Tensors and Blind Signal Separation)

Lieven De Lathauwer

KU Leuven

Belgium

Lieven.DeLathauwer@kuleuven-kulak.be

Lieven.DeLathauwer@esat.kuleuven.be

[Selected slides]


Factor analysis and blind source separation

• Decompose a data matrix in meaningful rank-1 terms

\[ T = A \cdot B^T = a_1 b_1^T + \cdots + a_R b_R^T \]

• Mixing vectors and sources


• Decomposition in rank-1 terms is not unique

\[ T = A \cdot B^T = (AM) \cdot (M^{-1} B^T) = \tilde{A} \cdot \tilde{B}^T \]

\[ T = \tilde{a}_1 \tilde{b}_1^T + \cdots + \tilde{a}_R \tilde{b}_R^T \]
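The non-uniqueness can be checked numerically. The sketch below is only an illustration: the sizes, the random factors, and the particular invertible matrix M are assumptions chosen for the example, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, R = 5, 6, 3                      # illustrative sizes (assumed)
A = rng.standard_normal((I, R))        # columns a_r
B = rng.standard_normal((J, R))        # columns b_r

# Any invertible R x R matrix works; this one has determinant 3.
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

T = A @ B.T
A_tilde = A @ M                        # A~ = A M
B_tilde = B @ np.linalg.inv(M).T       # B~^T = M^{-1} B^T

T_alt = A_tilde @ B_tilde.T
same_product = np.allclose(T, T_alt)               # same data matrix
different_factors = not np.allclose(A, A_tilde)    # but different rank-1 terms
print(same_product, different_factors)
```

Both factorizations reproduce the same T, so the data alone cannot tell the two sets of rank-1 terms apart.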


Principal Component Analysis and Singular Value Decomposition

PCA, SVD: uniqueness thanks to orthogonality constraints

\[ T = U \cdot \Sigma \cdot V^T = \sum_r \sigma_r u_r v_r^T \]

U, V orthogonal, Σ diagonal
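As a small illustration of the SVD as a sum of orthogonal rank-1 terms, the following sketch uses `numpy.linalg.svd` on a random matrix. The matrix size and the random data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((6, 4))        # illustrative data matrix (assumed)

# Thin SVD: T = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(T, full_matrices=False)

# Rebuild T as a sum of rank-1 terms sigma_r * u_r * v_r^T
T_sum = sum(s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(len(s)))

reconstructs = np.allclose(T, T_sum)
u_orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]))
v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))
print(reconstructs, u_orthonormal, v_orthonormal)
```

The orthogonality of the columns of U and V is what pins the terms down here; it is a constraint imposed by the method, not necessarily a property of the underlying sources.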


Motivating example: excitation-emission fluorescence in chemometrics

Matrix approach

row vector ∼ emission spectrum

column vector ∼ excitation spectrum

\[ T = a_1 b_1^T + \cdots + a_R b_R^T \]

NMF not unique in general


Tensor solution: CP Analysis

Tensorization: one matrix → several matrices, stacked in tensor

row vector ∼ emission spectrum

column vector ∼ excitation spectrum

coefficients ∼ concentrations

\[ \mathcal{T} = a_1 \circ b_1 \circ c_1 + \cdots + a_R \circ b_R \circ c_R \]

[Smilde, Bro, Geladi ’04]
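The stacking of several matrices into a tensor can be sketched as follows. The dimensions and the random nonnegative factors are assumptions for illustration; the names A, B, C follow the vectors a_r, b_r, c_r on the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, R = 5, 6, 4, 3                # illustrative sizes (assumed)
A = rng.random((I, R))                 # columns ~ excitation spectra
B = rng.random((J, R))                 # columns ~ emission spectra
C = rng.random((K, R))                 # row k ~ concentrations in sample k

# CP model: T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Each frontal slice is one measured matrix: the same spectra,
# weighted by the concentrations of that sample.
slices_match = all(
    np.allclose(T[:, :, k], A @ np.diag(C[k]) @ B.T) for k in range(K)
)
print(T.shape, slices_match)
```

Every slice shares the factors A and B and differs only in a diagonal weighting, which is the link to the joint matrix diagonalization view.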


Tensor rank and Canonical Polyadic Decomposition

Rank: minimal number of rank-1 terms [Hitchcock, 1927]

Canonical Polyadic Decomposition (CPD): decomposition in minimal number of rank-1 terms [Harshman ’70], [Carroll and Chang ’70]

\[ \mathcal{T} = a_1 \circ b_1 \circ c_1 + \cdots + a_R \circ b_R \circ c_R \]

• Unique under mild conditions on number of terms and differences between terms

• Orthogonality not required

• Uniqueness in “underdetermined” case
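One classical way to make "mild conditions" concrete is Kruskal's sufficient condition. It is given here as a well-known example, not necessarily the condition the slides have in mind.

```latex
% Kruskal's sufficient condition for uniqueness of the CPD (Kruskal, 1977).
% k_A is the Kruskal rank of A = [a_1 ... a_R]: the largest k such that
% every set of k columns of A is linearly independent (same for B, C).
\[
  k_A + k_B + k_C \;\ge\; 2R + 2
\]
% Example: for generic factors of a 4 x 4 x 4 tensor, k_A = k_B = k_C = 4,
% so the condition reads 12 >= 2R + 2, i.e. R <= 5.
% Uniqueness can therefore hold with more terms than any tensor dimension.
```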


Tensor data:

• telecommunications

• higher-order statistics

• annotated graphs

• hyperlink data

• matrices (deliberately) measured under different conditions / at different time instances / . . .

• matrices depending on parameter(s)

• . . .


Alternative representation: tensor diagonalization


Alternative representation: joint matrix diagonalization

Also underdetermined case


Motivating example: EEG

[Figure: multichannel EEG recording, 0 to 10 s (horizontal axis: Time (sec)); channels T1, T2, P3, C3, F3, O1, T5, T3, F7, Fp1, Pz, Cz, Fz, P4, C4, F4, O2, T6, T4, F8, Fp2; amplitude scale 236 µV]

Tensorization: biorthogonal wavelet


Components: eye blink and epileptic activity

[Figure, four panels: temporal atom of eye-blink activity and temporal atom of seizure activity (horizontal axis: Time (sec), 0 to 10); frequency distribution of eye-blink activity and frequency distribution of seizure activity (horizontal axis: Freq (Hz), 0 to 45)]

[De Vos et al., Neuroimage ’07], [Acar et al., Bioinformatics ’07]


Independent Component Analysis (ICA)

Model: X = AS

[Figure: sources s1, s2, s3 and their mixture]

Sources statistically independent


ICA: basic equations

Model:

\[ X = AS \]

Second order:

\[ C_X^{(2)} = E\{XX^T\} = A \cdot C_S^{(2)} \cdot A^T \]

uncorrelated sources: $C_S^{(2)}$ is diagonal

\[ C_X^{(2)} = \sigma_{s_1}^2 a_1 a_1^T + \sigma_{s_2}^2 a_2 a_2^T + \cdots + \sigma_{s_R}^2 a_R a_R^T \]
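The second-order relation can be checked on sample estimates. In this sketch the sizes, the mixing matrix, and the source standard deviations are assumptions, and the expectation is replaced by a sample average.

```python
import numpy as np

rng = np.random.default_rng(3)
P, R, N = 4, 3, 10_000                 # sensors, sources, samples (assumed)
A = rng.standard_normal((P, R))        # mixing matrix
sigma = np.array([1.0, 2.0, 0.5])      # source standard deviations (assumed)

S = sigma[:, None] * rng.standard_normal((R, N))   # uncorrelated sources
X = A @ S                                          # observations X = A S

C_S = S @ S.T / N                      # sample estimate of E{S S^T}
C_X = X @ X.T / N                      # sample estimate of E{X X^T}

# The congruence C_X = A C_S A^T holds exactly for the sample estimates.
identity_holds = np.allclose(C_X, A @ C_S @ A.T)

# C_S is only approximately diagonal for a finite number of samples.
off_diag_max = np.max(np.abs(C_S - np.diag(np.diag(C_S))))
print(identity_holds, off_diag_max)
```

Because the covariance is a symmetric matrix, this relation alone fixes A only up to a rotation, which is why ICA turns to higher-order or time-lagged statistics.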


Higher order:

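\[ C_X^{(N)} = C_S^{(N)} \cdot_1 A \cdot_2 A \cdot_3 \cdots \cdot_N A \]

independent sources: $C_S^{(N)}$ is diagonal

\[ C_X^{(N)} = c_{s_1}^{(N)} \, a_1 \circ a_1 \circ \cdots \circ a_1 + c_{s_2}^{(N)} \, a_2 \circ a_2 \circ \cdots \circ a_2 + \cdots + c_{s_R}^{(N)} \, a_R \circ a_R \circ \cdots \circ a_R \]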

Tensorization: decomposition data matrix → CPD cumulant tensor

[Comon ’94], [Cardoso ’93]
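For N = 3 the multilinear relation and its CPD form can be compared directly. This is a sketch under assumptions: the mixing matrix is random and the source cumulant values in `c` are hypothetical numbers, not estimated from data.

```python
import numpy as np

rng = np.random.default_rng(4)
P, R = 4, 3                            # sensors, sources (assumed)
A = rng.standard_normal((P, R))
c = np.array([1.5, -0.7, 2.0])         # hypothetical source cumulants

# Diagonal core: C_S[r, r, r] = c[r], zero elsewhere
C_S = np.zeros((R, R, R))
idx = np.arange(R)
C_S[idx, idx, idx] = c

# Mode products C_X = C_S ·1 A ·2 A ·3 A
C_X = np.einsum('abc,ia,jb,kc->ijk', C_S, A, A, A)

# The same tensor as a sum of symmetric rank-1 terms c_r * a_r ∘ a_r ∘ a_r
C_sum = sum(
    c[r] * np.einsum('i,j,k->ijk', A[:, r], A[:, r], A[:, r]) for r in range(R)
)

cpd_matches = np.allclose(C_X, C_sum)
is_symmetric = np.allclose(C_X, C_X.transpose(1, 0, 2))
print(cpd_matches, is_symmetric)
```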


ICA based on second-order statistics

Condition: sources mutually uncorrelated, but individually correlated in time

\[ C_X^{(2)}(\tau) = E\{X(t) X(t+\tau)^T\} = A \cdot C_S^{(2)}(\tau) \cdot A^T \]

Tensorization: stack covariance matrices in 3rd-order tensor

[Belouchrani et al. ’97], [Yeredor ’02], . . .
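Stacking lagged covariance matrices into a third-order tensor might look like the sketch below. The AR(1) source model, the lag set, and the helper name `lagged_cov` are assumptions introduced for illustration; they are not taken from the cited algorithms.

```python
import numpy as np

rng = np.random.default_rng(5)
P, R, N = 4, 3, 2000                   # sensors, sources, samples (assumed)
lags = [0, 1, 2, 3]                    # time lags tau (assumed)

# Sources: mutually independent AR(1) processes with different memory,
# so each one is correlated in time in its own way.
rho = np.array([0.9, 0.5, -0.6])
noise = rng.standard_normal((R, N))
S = np.zeros((R, N))
for t in range(1, N):
    S[:, t] = rho * S[:, t - 1] + noise[:, t]

A = rng.standard_normal((P, R))
X = A @ S

def lagged_cov(Z, tau):
    """Sample estimate of E{Z(t) Z(t + tau)^T}."""
    n = Z.shape[1] - tau
    return Z[:, :n] @ Z[:, tau:tau + n].T / n

# Tensorization: one covariance matrix per lag, stacked along the third mode
T = np.stack([lagged_cov(X, tau) for tau in lags], axis=2)

# Every slice factors as A * C_S(tau) * A^T with the same A
slices_factor = all(
    np.allclose(T[:, :, k], A @ lagged_cov(S, tau) @ A.T)
    for k, tau in enumerate(lags)
)
print(T.shape, slices_factor)
```

With uncorrelated sources the matrices C_S(tau) are approximately diagonal, so the stacked tensor approximately follows a CPD whose first two factor matrices both equal A.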


Tensor rank and Canonical Polyadic Decomposition

Rank: minimal number of rank-1 terms [Hitchcock, 1927]

Canonical Polyadic Decomposition (CPD): decomposition in minimal number of rank-1 terms

\[ \mathcal{T} = a_1 \circ b_1 \circ c_1 + \cdots + a_R \circ b_R \circ c_R \]

[Harshman ’70], [Carroll and Chang ’70]

Unique under mild conditions


Decomposition in rank-(L,L, 1) terms

\[ \mathcal{T} = (A_1 \cdot B_1^T) \circ c_1 + \cdots + (A_R \cdot B_R^T) \circ c_R \]

Unique under mild conditions

[DL ’08]
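A tensor following the rank-(L,L,1) model can be constructed as in this sketch. All sizes and the random blocks are assumptions for illustration; each A_r and B_r is taken to have L columns, so that A_r B_r^T has rank L.

```python
import numpy as np

rng = np.random.default_rng(6)
I, J, K, R, L = 6, 7, 4, 2, 2          # illustrative sizes (assumed)

A = [rng.standard_normal((I, L)) for _ in range(R)]   # blocks A_r
B = [rng.standard_normal((J, L)) for _ in range(R)]   # blocks B_r
C = rng.standard_normal((K, R))                       # columns c_r

# T = sum_r (A_r B_r^T) ∘ c_r
T = sum(np.einsum('ij,k->ijk', A[r] @ B[r].T, C[:, r]) for r in range(R))

# Each term pairs a rank-L matrix with a vector
term_ranks = [int(np.linalg.matrix_rank(A[r] @ B[r].T)) for r in range(R)]

# Unfolding along the third mode: one column per frontal slice, rank R
mode3_rank = int(np.linalg.matrix_rank(T.reshape(I * J, K)))
print(term_ranks, mode3_rank)
```

Setting L = 1 reduces every block to a single column, which gives back the CPD.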


Decomposition in rank-(R1, R2, R3) terms

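\[ \mathcal{T} = \mathcal{D}_1 \cdot_1 A_1 \cdot_2 B_1 \cdot_3 C_1 + \cdots + \mathcal{D}_R \cdot_1 A_R \cdot_2 B_R \cdot_3 C_R \]

with core tensors $\mathcal{D}_r$ of size $R_1 \times R_2 \times R_3$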

Unique under mild conditions

[DL ’08]


Alternative representation: tensor block diagonalization


Decomposition in rank-(R1, R2, •) terms

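\[ \mathcal{T} = \mathcal{D}_1 \cdot_1 A_1 \cdot_2 B_1 + \cdots + \mathcal{D}_R \cdot_1 A_R \cdot_2 B_R \]

with core tensors $\mathcal{D}_r$ of size $R_1 \times R_2$ in the first two modes and full size in the third mode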

[DL ’08]

Alternative representation: joint block diagonalization


Block Component Analysis

Demo

[Figure: six 3-D panels; axes ranging from −2 to 2, −2 to 2, and −1 to 1]


Exponentials, sinusoids, polynomials, exponential polynomials

Principle: Map every row of $T = A \cdot B^T$ to Hankel matrix

Hankel matrices are often very ill-conditioned

Hankel matrices generated by exponential polynomials are exactly low-rank

[DL ’11]
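The low-rank property is easy to observe numerically. The sketch below is an illustration under assumptions: the helper names `hankel_matrix` and `numerical_rank`, the signal length, the test signals, and the rank tolerance are all chosen for the example. In theory a single exponential gives rank 1, a real sinusoid rank 2, and a first-degree polynomial times an exponential rank 2.

```python
import numpy as np

def hankel_matrix(x):
    """Map a signal x[0..N-1] to a roughly square Hankel matrix H[i, j] = x[i + j]."""
    x = np.asarray(x, dtype=float)
    rows = (len(x) + 1) // 2
    cols = len(x) - rows + 1
    i, j = np.indices((rows, cols))
    return x[i + j]

def numerical_rank(H, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(H, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

n = np.arange(31)                      # 31 samples (assumed length)
signals = {
    "exponential": 0.9 ** n,
    "sinusoid": np.cos(0.3 * n),
    "exponential polynomial": (1 + n) * 0.9 ** n,
}
ranks = {name: numerical_rank(hankel_matrix(x)) for name, x in signals.items()}
print(ranks)
```

A relative tolerance is used because the small singular values of such matrices sit at rounding level rather than at exactly zero.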


[Figure: numerical example]

theoretical values: (L1, L2) = (2, 3)

perfect separation: (L1, L2) = (2, 3), (3, 3), (2, 4), (3, 4), (4, 4)

good separation: (L1, L2) = (2, 2), (1, 2)


[Figure: six signal panels; horizontal axis from 0 to 1]

501 samples, SNR = 5 dB

good separation: (L1, L2) = (1, 2), (2, 2), (2, 3)


theoretical values: L1 = 2, L2 = 251

[Figure: four signal panels; horizontal axis from 0 to 500]


theoretical values: L1 = 2, L2 = 251

results: L1 = 2, L2 = 2, 3, . . . , 7

[Figure: four signal panels; horizontal axis from 0 to 500]


Toy example: audio

[Figure: chirp (top) and train (bottom) signal, 31 samples]


[Figure: singular values of Hankel matrices generated by chirp (left) and train (right); top: 31 samples; bottom: 1000 samples]


L1 \ L2    1    2    3    4    5    6    7
   1      20   48   49   37   20   15   15
   2      48   47   49   48   44   17   16
   3      49   49   49   47   23   20   19
   4      37   48   47   47   47   20   18
   5      20   44   23   47   45   29   16
   6      15   17   20   20   29   25   33
   7      15   16   19   18   16   33   24

mean SIR [dB] (Hankel, noiseless); ICA: COM2 15 dB, JADE 14 dB

L1 \ L2    1    2    3    4    5    6    7
   1      49   47   49   51   51   19   13
   2      47   47   50   49   51   38   22
   3      49   50   49   48   49   47   45
   4      51   49   48   47   48   46   44
   5      51   51   49   48   48   46   44
   6      19   38   47   46   46   46   47
   7      13   22   45   44   44   47   44

median SIR [dB] (Hankel, noiseless); ICA: COM2 15 dB, JADE 14 dB
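The slides do not define how the SIR is computed. The sketch below shows one common convention, stated here as an assumption: the estimate is split into a scaled copy of the true source and a residual, and the SIR is the energy ratio of the two in dB. The function name `sir_db` and the toy vectors are hypothetical.

```python
import numpy as np

def sir_db(source, estimate):
    """Signal-to-interference ratio in dB (assumed convention).

    The estimate is projected onto the true source; whatever is left
    after removing that scaled copy counts as interference.
    """
    source = np.asarray(source, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    scale = (estimate @ source) / (source @ source)
    target = scale * source
    interference = estimate - target
    return 10.0 * np.log10((target @ target) / (interference @ interference))

# Toy check: the estimate is 3 * s plus a small component orthogonal to s.
s = np.array([1.0, 0.0, 1.0, 0.0])
e = np.array([0.0, 1.0, 0.0, -1.0])
sir_example = sir_db(s, 3.0 * s + 0.3 * e)
print(sir_example)
```

The projection makes the measure insensitive to the scaling ambiguity that is inherent in blind source separation.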


Results for noisy data:

[Figure: SIR [dB] versus SNR [dB] for BCA Hankel (L = 1, 2, 3, 4), BCA wavelet (L = 1, 2, 3, 4), and ICA COM2]


Foundation: BCA exploits low intrinsic dimensionality

intrinsic dimensionality ∼ multilinear rank

Related: Pareto analysis

compressed sensing

scientific computing

. . .

Tensorization: Hankel, wavelet, time-frequency, . . .

[Figure: unstructured signal; horizontal axis from 0 to 500]


Analogy:

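CPD: splitting in “atoms” (pure frequencies)

\[ \mathcal{T} = a_1 \circ b_1 \circ c_1 + \cdots + a_R \circ b_R \circ c_R \]

BTD: splitting in “molecules” (sounds)

\[ \mathcal{T} = (A_1 \cdot B_1^T) \circ c_1 + \cdots + (A_R \cdot B_R^T) \circ c_R \]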


Conclusion

• BCA: separation based on low intrinsic dimensionality

• Intrinsic dimensionality measured by (multilinear) rank

• Rank-1 hypothesis sometimes questionable

• Related to Pareto, compressed sensing, etc.

• Related to Sparse Component Analysis, etc.

• Tensorization: HOS, sets of matrices, Hankel, . . .

• Hankel: separation of exponential polynomials

• Low complexity variants of current tensorization-based schemes

• PCA, ICA, CPA, NMF, . . . : easier to use but assumptions should hold

• Constrained BCA: nonnegativity, sparsity, orthogonality, statistical independence, . . .

Related work: CPA with orthogonality constraint [Sørensen, DL et al.]

CPA with independence constraint [De Vos, Van Huffel, DL]

Thanks: to Laurent Sorber for helping with figures


L. De Lathauwer, “Block Component Analysis, a New Concept for Blind Source Separation,” in F. Theis, A. Cichocki, A. Yeredor, M. Zibulevsky (Eds.): Latent Variable Analysis and Signal Separation, 10th International Conference, LVA/ICA 2012, Tel-Aviv, Israel, March 2012, Proceedings, LNCS 7191, Springer, Heidelberg, 2012, pp. 1-8.
