L. De Lathauwer
Coupled Matrix/Tensor Decompositions:
An Introduction
Laurent Sorber, Mikael Sørensen, Marc Van Barel, Lieven De Lathauwer
KU Leuven
Belgium
Canonical Polyadic Decomposition
Rank: minimal number of rank-1 terms [Hitchcock, 1927]
Canonical Polyadic Decomposition (CPD): decomposition in minimal
number of rank-1 terms [Harshman ’70], [Carroll and Chang ’70]
T = a_1 ◦ b_1 ◦ c_1 + · · · + a_R ◦ b_R ◦ c_R
• Unique under mild conditions on number of terms and differences
between terms
• Orthogonality (triangularity, . . . ) not required (but may be imposed)
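As a concrete illustration, a sum of rank-1 terms as above can be formed numerically; the sizes I, J, K and rank R below are illustrative choices, not values from the slides (a NumPy sketch):

```python
import numpy as np

# Form a tensor as a sum of R rank-1 terms, T = sum_r a_r o b_r o c_r
# (o = outer product). The sizes I, J, K and rank R are illustrative.
I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((I, R))   # columns a_1, ..., a_R
B = rng.standard_normal((J, R))   # columns b_1, ..., b_R
C = rng.standard_normal((K, R))   # columns c_1, ..., c_R

T = np.zeros((I, J, K))
for r in range(R):
    T += np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])

# Every matrix unfolding of T then has rank at most R
print(np.linalg.matrix_rank(T.reshape(I, J * K)))  # 3
```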
Factor Analysis and Blind Source Separation
• Decompose a data matrix in rank-1 terms that can be interpreted
E.g. statistics, telecommunication, biomedical applications,
chemometrics, data analysis, . . .
A = F · G^T = f_1 g_1^T + f_2 g_2^T + · · · + f_R g_R^T
• F: mixing matrix
G: source signals
What about SVD?
• SVD is unique
• . . . thanks to orthogonality constraints
A = U · S · V^T = ∑_{r=1}^R s_{rr} u_r v_r^T
U, V orthogonal, S diagonal
• Whether these constraints make sense, depends on the application
• SVD is great for dimensionality reduction
best rank-R approximation ← truncated SVD
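The best rank-R approximation via the truncated SVD can be checked numerically (Eckart–Young); the matrix size and R below are illustrative:

```python
import numpy as np

# Best rank-R approximation via the truncated SVD (Eckart-Young).
# The matrix size and R are illustrative choices.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
R = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_R = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]   # keep the R dominant terms

# The spectral-norm error equals the first discarded singular value
err = np.linalg.norm(A - A_R, 2)
print(np.isclose(err, s[R]))  # True
```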
Uniqueness: C has full column rank
CPD: T = ∑_{r=1}^R a_r ◦ b_r ◦ c_r ∈ C^{I×J×K},  T_{[1,2;3]} = (A ⊙ B) · C^T ∈ C^{IJ×K}
e.g. C-mode is sample mode
Khatri–Rao product of second compound matrices:
U = C_2(A) ⊙ C_2(B) ∈ C^{(I(I−1)/2)(J(J−1)/2) × R(R−1)/2}
u_{(i_1,i_2),(j_1,j_2),(r_1,r_2)} = det [ a_{i_1 r_1} a_{i_2 r_1} ; a_{i_1 r_2} a_{i_2 r_2} ] · det [ b_{j_1 r_1} b_{j_2 r_1} ; b_{j_1 r_2} b_{j_2 r_2} ]
1 ≤ i_1 < i_2 ≤ I,  1 ≤ j_1 < j_2 ≤ J,  1 ≤ r_1 < r_2 ≤ R
Theorem: if U and C have full column rank, then CPD is unique
(proof is constructive)
[Jiang and Sidiropoulos, ’04], [DL ’06]
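A sketch of the construction above, assuming generic random factors: build C_2(A) and C_2(B) from 2 × 2 minors, form their columnwise Khatri–Rao product U, and check that it has full column rank. The sizes I, J, R are illustrative.

```python
import numpy as np
from itertools import combinations

def compound2(M):
    """Second compound matrix C2(M): the 2 x 2 minors of M,
    rows indexed by row pairs, columns by column pairs."""
    rows = list(combinations(range(M.shape[0]), 2))
    cols = list(combinations(range(M.shape[1]), 2))
    C2 = np.empty((len(rows), len(cols)))
    for p, (i1, i2) in enumerate(rows):
        for q, (r1, r2) in enumerate(cols):
            C2[p, q] = M[i1, r1] * M[i2, r2] - M[i1, r2] * M[i2, r1]
    return C2

# Form U = C2(A) (columnwise Khatri-Rao product) C2(B) and check that
# it has full column rank; the sizes I, J, R are illustrative.
rng = np.random.default_rng(2)
I, J, R = 4, 4, 5
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
CA, CB = compound2(A), compound2(B)
# column q of U is kron(CA[:, q], CB[:, q])
U = np.einsum('iq,jq->ijq', CA, CB).reshape(CA.shape[0] * CB.shape[0], -1)
print(U.shape, np.linalg.matrix_rank(U))  # (36, 10) 10
```

For generic factors U has full column rank whenever the row count I(I−1)/2 · J(J−1)/2 reaches R(R−1)/2, which is exactly the generic bound on the next slide.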
Uniqueness: C has full column rank (2)
Theorem: if U ∈ C^{(I(I−1)/2)(J(J−1)/2) × R(R−1)/2} and C ∈ C^{K×R} have full column rank,
then CPD is unique
Generic: CPD is unique for R bounded by I, J, K as in
(I(I−1)/2) · (J(J−1)/2) ≥ R(R−1)/2  and  K ≥ R
Approximately: IJ/√2 ≥ R and K ≥ R
Compare to Kruskal:
min(I, R) + min(J, R) ≥ R + 2 and K ≥ R
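The two bounds can be compared numerically; the helper functions and the choice I = J = 6 below are illustrative, not from the slides:

```python
# Largest R satisfying each uniqueness bound, for illustrative I = J = 6
# (both conditions additionally require K >= R).
def generic_ok(I, J, R):
    # (I(I-1)/2) * (J(J-1)/2) >= R(R-1)/2
    return (I * (I - 1) // 2) * (J * (J - 1) // 2) >= R * (R - 1) // 2

def kruskal_ok(I, J, R):
    # min(I, R) + min(J, R) >= R + 2
    return min(I, R) + min(J, R) >= R + 2

I, J = 6, 6
max_generic = max(R for R in range(1, 100) if generic_ok(I, J, R))
max_kruskal = max(R for R in range(1, 100) if kruskal_ok(I, J, R))
print(max_generic, max_kruskal)  # 21 10
```

For 6 × 6 × K tensors the generic condition allows ranks up to 21, roughly twice what Kruskal's condition guarantees.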
Recent results
Unifying theory
Constructive proof
Algorithm for Kruskal’s condition (and beyond)
[Domanov, DL, ’12], [Domanov, DL, ’13]
Decomposition in rank-(L,L, 1) terms
T = (A_1 · B_1^T) ◦ c_1 + · · · + (A_R · B_R^T) ◦ c_R
Unique under mild conditions
[DL ’08]
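A sketch of forming such a sum of rank-(L, L, 1) terms, with illustrative sizes:

```python
import numpy as np

# Form a tensor as a sum of R rank-(L, L, 1) terms,
# T = sum_r (A_r B_r^T) o c_r. The sizes are illustrative.
I, J, K, L, R = 6, 6, 4, 2, 2
rng = np.random.default_rng(3)
T = np.zeros((I, J, K))
for _ in range(R):
    Ar = rng.standard_normal((I, L))   # A_r with L columns
    Br = rng.standard_normal((J, L))   # B_r with L columns
    cr = rng.standard_normal(K)        # c_r
    T += np.einsum('ij,k->ijk', Ar @ Br.T, cr)

# The mode-3 unfolding mixes only the R matrices A_r B_r^T, so its rank
# is at most R even though each frontal slice has rank up to L * R.
print(np.linalg.matrix_rank(T.reshape(I * J, K).T))  # 2
```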
Decomposition in rank-(R1, R2, R3) terms
T = D_1 ·_1 A_1 ·_2 B_1 ·_3 C_1 + · · · + D_R ·_1 A_R ·_2 B_R ·_3 C_R  (D_r a core tensor of size R_1 × R_2 × R_3)
Unique under mild conditions
Rank-1 term ∼ data atom
Block term ∼ data molecule
[DL ’08]
Constraints
Examples: orthogonality [Sørensen and DL ’12]
nonnegativity [Cichocki et al. ’09]
Vandermonde [Sørensen and DL ’12]
independence [De Vos et al. ’12]
. . .
Not needed for uniqueness in tensor case
Pro: relaxed uniqueness conditions
easier interpretation
no degeneracy (NN, orthogonality)
higher accuracy
Depending on type of constraints, lower or higher computational cost
Tensorlab
Tensorlab — a MATLAB toolbox for tensor decompositions
esat.kuleuven.be/sista/tensorlab
• Elementary operations on tensors: multicore-aware and profiler-tuned
• Tensor decompositions with structure and/or symmetry: CPD, LMLRA, MLSVD, block term decompositions
• Global minimization of bivariate polynomials: exact line and plane search for tensor optimization
• Cumulants, tensor visualization, estimating a tensor's rank or multilinear rank, . . .
Coupled matrix/tensor decompositions
One or more matrices and/or one or more tensors
Symmetric and/or nonsymmetric
One or more factors shared (or parts of factors, or generators)
Constraints (orthogonal, nonnegative, exponential, constant modulus,
polynomial, rational, Toeplitz, Hankel, . . . )
Large, possibly incomplete data
Multi-view data / data fusion
E.g. coupled EEG-fMRI; person-location-activity and person-person and
location-location . . . ; . . .
Uniqueness: C has full column rank
Coupled CPD: T^{(n)} = ∑_{r=1}^R a_r^{(n)} ◦ b_r^{(n)} ◦ c_r ∈ C^{I_n×J_n×K}
Khatri–Rao product of second compound matrices:
U^{(n)} = C_2(A^{(n)}) ⊙ C_2(B^{(n)}) ∈ C^{(I_n(I_n−1)/2)(J_n(J_n−1)/2) × R(R−1)/2}
U = [ U^{(1)} ; U^{(2)} ; … ; U^{(N)} ]  (the U^{(n)} stacked vertically)
Theorem: if U and C have full column rank, then coupled CPD is unique
(proof is constructive)
Increase spatial diversity: Widely Separated Antenna Arrays
[figure: N widely separated antenna arrays A^{(1)}, …, A^{(N)}, each receiving the R source signals s^{(1)}_k, …, s^{(R)}_k through channel gains h^{(r,n)}_j]
For each array n = 1, …, N:
y^{(n)}_{ijk} = ∑_{r=1}^R a^{(r,n)}_i h^{(r,n)}_j s^{(r)}_k,  1 ≤ i ≤ I
(A biotensor example is multimodal data fusion: EEG × fMRI × MEG × · · ·)
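The data model above can be simulated to see the coupling: each array has its own spatial factors, but the source signals are shared. All sizes below are illustrative.

```python
import numpy as np

# Simulate the model above: array n observes
#   y^(n)_{ijk} = sum_r a^(r,n)_i h^(r,n)_j s^(r)_k,
# with the source signals s^(r) shared across all arrays.
rng = np.random.default_rng(4)
I, J, K, R, N = 4, 5, 20, 3, 2
S = rng.standard_normal((K, R))            # shared sources s^(r)
Y = []
for n in range(N):
    An = rng.standard_normal((I, R))       # array responses a^(r,n)
    Hn = rng.standard_normal((J, R))       # channel gains h^(r,n)
    Y.append(np.einsum('ir,jr,kr->ijk', An, Hn, S))

# Each Y[n] is a CPD with its own A^(n), H^(n) but the same S:
# unfolding each tensor in the sample mode and stacking keeps rank R.
Ystack = np.concatenate([Yn.reshape(I * J, K) for Yn in Y], axis=0)
print(Ystack.shape, np.linalg.matrix_rank(Ystack))  # (40, 20) 3
```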
”Coupled Tensor Decompositions” 11 / 19 Mikael Sørensen
Signal Separation from a Coupled Tensorial Perspective
Y^{(1)} = a^{(1,1)} ◦ h^{(1,1)} ◦ s^{(1)} + · · · + a^{(R,1)} ◦ h^{(R,1)} ◦ s^{(R)}
⋮
Y^{(N)} = a^{(1,N)} ◦ h^{(1,N)} ◦ s^{(1)} + · · · + a^{(R,N)} ◦ h^{(R,N)} ◦ s^{(R)}
Extension to Incoherent Multipath with Small Delay Spread
[figure: N antenna arrays, each receiving the R source signals over P_{r,n} incoherent paths with small delay spread, with channel gains h^{(p,r,n)}_j]
For each array n = 1, …, N:
y^{(n)}_{ijk} = ∑_{r=1}^R ∑_{p=1}^{P_{r,n}} a^{(p,r,n)}_i h^{(p,r,n)}_j s^{(r)}_k,  1 ≤ i ≤ I
Signal Separation from a Coupled Tensorial Perspective
Y^{(1)} = (H^{(1,1)} · A^{(1,1)T}) ◦ s^{(1)} + · · · + (H^{(R,1)} · A^{(R,1)T}) ◦ s^{(R)}
⋮
Y^{(N)} = (H^{(1,N)} · A^{(1,N)T}) ◦ s^{(1)} + · · · + (H^{(R,N)} · A^{(R,N)T}) ◦ s^{(R)}
with H^{(r,n)} and A^{(r,n)} having P_{r,n} columns
Tensorlab v2.0
Major upgrade which brings:
• Full support for sparse and incomplete tensors
• Major improvements in computational and memory efficiency
• Structured data fusion
Structured: choose from a large library of constraints to
impose on factors (nonnegative, orthogonal, Toeplitz, . . . )
Data fusion: jointly factorize multiple data sets
[figure: joint factorization of coupled data sets 𝑌, 𝒳 and 𝑍 with shared factors]
Applications
Example 4: GPS data set
Five coupled data sets: user-location-activity, user-user, location-feature, activity-activity and user-location
Challenge: predict user participation in activities
Solution with SDF: compute coupled tensor factorization
minimize over U, L, A, F, λ, μ, ν:
(ω_1/2) ‖ℳ^{(1)}(U, L, A) − 𝒯^{(1)}‖²_{𝒲^{(1)}} + (ω_2/2) ‖ℳ^{(2)}(U, U, λ) − 𝒯^{(2)}‖²_F
+ (ω_3/2) ‖ℳ^{(3)}(L, F) − 𝒯^{(3)}‖²_F + (ω_4/2) ‖ℳ^{(4)}(A, A, μ) − 𝒯^{(4)}‖²_F
+ (ω_5/2) ‖ℳ^{(5)}(U, L, ν) − 𝒯^{(5)}‖²_F
+ (ω_6/2) (‖U‖²_F + ‖L‖²_F + ‖A‖²_F + ‖F‖²_F + ‖λ‖²_F + ‖μ‖²_F + ‖ν‖²_F)
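A sketch of evaluating two representative terms of such an objective: the weighted CPD fit of 𝒯^{(1)} against the model ℳ^{(1)}(U, L, A), and the ω_6 regularizer. The sizes, weights and the `cpd_model` helper below are illustrative assumptions, not the actual GPS data set or SDF implementation.

```python
import numpy as np

# Evaluate the weighted CPD fit term and the regularizer of a coupled
# objective like the one above. All sizes and weights are illustrative.
rng = np.random.default_rng(5)
nU, nL, nA, R = 6, 5, 4, 2
U = rng.standard_normal((nU, R))
L = rng.standard_normal((nL, R))
A = rng.standard_normal((nA, R))

def cpd_model(U, L, A):
    # CPD model tensor: sum of R rank-1 terms of the factor columns
    return np.einsum('ir,jr,kr->ijk', U, L, A)

T1 = rng.standard_normal((nU, nL, nA))     # data tensor T^(1)
W1 = (rng.random((nU, nL, nA)) < 0.2)      # weight tensor: observed entries
w1, w6 = 1.0, 1e-2                         # weights omega_1, omega_6

# weighted fit: missing entries (W1 == 0) do not contribute
fit = 0.5 * w1 * np.sum((W1 * (cpd_model(U, L, A) - T1)) ** 2)
reg = 0.5 * w6 * sum(np.sum(F ** 2) for F in (U, L, A))
print(fit, reg)
```

The weight tensor 𝒲^{(1)} is what lets the factorization run on incomplete data: unobserved entries are simply masked out of the fit.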
Example 4: 80% missing entries in user-location-activity tensor
[figure: ROC curves (true positive rate vs. false positive rate) comparing SDF and CPD]
Example 4: 50 users missing in user-location-activity tensor
[figure: ROC curve (true positive rate vs. false positive rate) for SDF]
Breaking the Curse of Dimensionality in Data Analysis
9th-order (100 × 100 × . . . × 100) data set
10-component thermodynamic phase diagram
incomplete tensor: 130,000 known samples (out of 10^18)
sum of structured rank-1 terms
Example 3: InsPyro materials data set
Data set: an incomplete tensor in which each dimension represents the concentration of a metal in an alloy and the entries are the alloy's melting temperature
Challenge: predict melting temperatures of different alloys
Solution with SDF: use a structured CPD where each factor vector u^{(n)}_r is a sum of RBF kernels
u^{(n)}_{b,r} = ∑_{i=1}^{8} a_i exp(−(t_b − b_i)² / (2 c_i²))
where the a_i, b_i and c_i are the free parameters of u^{(n)}_r
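A sketch of such an RBF-parameterized factor vector; the eight kernels follow the slide, while the grid and parameter values are illustrative assumptions:

```python
import numpy as np

# A factor vector u^(n)_r parameterized as a sum of 8 Gaussian RBF
# kernels, evaluated on a concentration grid t. The grid and the
# parameter values below are illustrative assumptions.
t = np.linspace(0, 50, 51)          # concentration grid (%), 51 points
rng = np.random.default_rng(6)
a = rng.standard_normal(8)          # amplitudes a_i
b = rng.uniform(0, 50, 8)           # centers b_i
c = rng.uniform(2, 10, 8)           # widths c_i

# evaluate all 8 kernels on the whole grid at once and sum them
u = np.sum(a * np.exp(-(t[:, None] - b) ** 2 / (2 * c ** 2)), axis=1)
print(u.shape)  # (51,)
```

Only the 24 kernel parameters are optimized instead of all 51 grid values, which is what makes the factor smooth and the problem well-posed despite the missing entries.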
[figure: estimated factor vectors U^{(3)} plotted against concentration c3 (%)]
[figures: contour plots of melting temperature vs. c2 (%) and c3 (%)]
Conclusion
• Coupled matrix/tensor decompositions: an important new concept
• Many applications
• Gold mine for research topics:
– algebra
– (numerical) linear algebra
– randomized NLA
– large-scale numerical optimization
– . . .