L. De Lathauwer
Coupled Matrix/Tensor Decompositions:
An Introduction
Laurent Sorber, Mikael Sørensen, Marc Van Barel, Lieven De Lathauwer
KU Leuven
Belgium
Canonical Polyadic Decomposition
Rank: minimal number of rank-1 terms [Hitchcock, 1927]
Canonical Polyadic Decomposition (CPD): decomposition in minimal
number of rank-1 terms [Harshman ’70], [Carroll and Chang ’70]
T = a_1 ◦ b_1 ◦ c_1 + · · · + a_R ◦ b_R ◦ c_R
• Unique under mild conditions on number of terms and differences
between terms
• Orthogonality (triangularity, . . . ) not required (but may be imposed)
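As a concrete illustration, a sum of rank-1 terms as above can be formed numerically; the sizes I, J, K and rank R below are illustrative choices, not values from the slides (a NumPy sketch):

```python
import numpy as np

# Form a tensor as a sum of R rank-1 terms, T = sum_r a_r o b_r o c_r
# (o = outer product). The sizes I, J, K and rank R are illustrative.
I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((I, R))   # columns a_1, ..., a_R
B = rng.standard_normal((J, R))   # columns b_1, ..., b_R
C = rng.standard_normal((K, R))   # columns c_1, ..., c_R

T = np.zeros((I, J, K))
for r in range(R):
    T += np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])

# Every matrix unfolding of T then has rank at most R
print(np.linalg.matrix_rank(T.reshape(I, J * K)))  # 3
```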
Factor Analysis and Blind Source Separation
• Decompose a data matrix in rank-1 terms that can be interpreted
E.g. statistics, telecommunication, biomedical applications,
chemometrics, data analysis, . . .
A = F · G^T = f_1 g_1^T + f_2 g_2^T + · · · + f_R g_R^T
• F: mixing matrix
G: source signals
What about SVD?
• SVD is unique
• . . . thanks to orthogonality constraints
A = U · S · V^T = ∑_{r=1}^R s_{rr} u_r v_r^T
U, V orthogonal, S diagonal
• Whether these constraints make sense, depends on the application
• SVD is great for dimensionality reduction
best rank-R approximation ← truncated SVD
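The best rank-R approximation via the truncated SVD can be checked numerically (Eckart–Young); the matrix size and R below are illustrative:

```python
import numpy as np

# Best rank-R approximation via the truncated SVD (Eckart-Young).
# The matrix size and R are illustrative choices.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
R = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_R = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]   # keep the R dominant terms

# The spectral-norm error equals the first discarded singular value
err = np.linalg.norm(A - A_R, 2)
print(np.isclose(err, s[R]))  # True
```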
Uniqueness: C has full column rank
CPD: T = ∑_{r=1}^R a_r ◦ b_r ◦ c_r ∈ C^{I×J×K},  T_{[1,2;3]} = (A ⊙ B) · C^T ∈ C^{IJ×K}
e.g. C-mode is sample mode
Khatri–Rao product of second compound matrices:
U = C_2(A) ⊙ C_2(B) ∈ C^{(I(I−1)/2)(J(J−1)/2) × R(R−1)/2}
u_{(i_1,i_2),(j_1,j_2),(r_1,r_2)} = det [ a_{i_1 r_1} a_{i_2 r_1} ; a_{i_1 r_2} a_{i_2 r_2} ] · det [ b_{j_1 r_1} b_{j_2 r_1} ; b_{j_1 r_2} b_{j_2 r_2} ]
1 ≤ i_1 < i_2 ≤ I,  1 ≤ j_1 < j_2 ≤ J,  1 ≤ r_1 < r_2 ≤ R
Theorem: if U and C have full column rank, then CPD is unique
(proof is constructive)
[Jiang and Sidiropoulos, ’04], [DL ’06]
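A sketch of the construction above, assuming generic random factors: build C_2(A) and C_2(B) from 2 × 2 minors, form their columnwise Khatri–Rao product U, and check that it has full column rank. The sizes I, J, R are illustrative.

```python
import numpy as np
from itertools import combinations

def compound2(M):
    """Second compound matrix C2(M): the 2 x 2 minors of M,
    rows indexed by row pairs, columns by column pairs."""
    rows = list(combinations(range(M.shape[0]), 2))
    cols = list(combinations(range(M.shape[1]), 2))
    C2 = np.empty((len(rows), len(cols)))
    for p, (i1, i2) in enumerate(rows):
        for q, (r1, r2) in enumerate(cols):
            C2[p, q] = M[i1, r1] * M[i2, r2] - M[i1, r2] * M[i2, r1]
    return C2

# Form U = C2(A) (columnwise Khatri-Rao product) C2(B) and check that
# it has full column rank; the sizes I, J, R are illustrative.
rng = np.random.default_rng(2)
I, J, R = 4, 4, 5
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
CA, CB = compound2(A), compound2(B)
# column q of U is kron(CA[:, q], CB[:, q])
U = np.einsum('iq,jq->ijq', CA, CB).reshape(CA.shape[0] * CB.shape[0], -1)
print(U.shape, np.linalg.matrix_rank(U))  # (36, 10) 10
```

For generic factors U has full column rank whenever the row count I(I−1)/2 · J(J−1)/2 reaches R(R−1)/2, which is exactly the generic bound on the next slide.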
Uniqueness: C has full column rank (2)
Theorem: if U ∈ C^{(I(I−1)/2)(J(J−1)/2) × R(R−1)/2} and C ∈ C^{K×R} have full column rank,
then CPD is unique
Generic: CPD is unique for R bounded by I, J, K as in
(I(I−1)/2) · (J(J−1)/2) ≥ R(R−1)/2  and  K ≥ R
Approximately: IJ/√2 ≥ R and K ≥ R
Compare to Kruskal:
min(I, R) + min(J, R) ≥ R + 2 and K ≥ R
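The two bounds can be compared numerically; the helper functions and the choice I = J = 6 below are illustrative, not from the slides:

```python
# Largest R satisfying each uniqueness bound, for illustrative I = J = 6
# (both conditions additionally require K >= R).
def generic_ok(I, J, R):
    # (I(I-1)/2) * (J(J-1)/2) >= R(R-1)/2
    return (I * (I - 1) // 2) * (J * (J - 1) // 2) >= R * (R - 1) // 2

def kruskal_ok(I, J, R):
    # min(I, R) + min(J, R) >= R + 2
    return min(I, R) + min(J, R) >= R + 2

I, J = 6, 6
max_generic = max(R for R in range(1, 100) if generic_ok(I, J, R))
max_kruskal = max(R for R in range(1, 100) if kruskal_ok(I, J, R))
print(max_generic, max_kruskal)  # 21 10
```

For 6 × 6 × K tensors the generic condition allows ranks up to 21, roughly twice what Kruskal's condition guarantees.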
Recent results
Unifying theory
Constructive proof
Algorithm for Kruskal’s condition (and beyond)
[Domanov, DL, ’12], [Domanov, DL, ’13]
Decomposition in rank-(L,L, 1) terms
T = (A_1 · B_1^T) ◦ c_1 + · · · + (A_R · B_R^T) ◦ c_R
Unique under mild conditions
[DL ’08]
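A sketch of forming such a sum of rank-(L, L, 1) terms, with illustrative sizes:

```python
import numpy as np

# Form a tensor as a sum of R rank-(L, L, 1) terms,
# T = sum_r (A_r B_r^T) o c_r. The sizes are illustrative.
I, J, K, L, R = 6, 6, 4, 2, 2
rng = np.random.default_rng(3)
T = np.zeros((I, J, K))
for _ in range(R):
    Ar = rng.standard_normal((I, L))   # A_r with L columns
    Br = rng.standard_normal((J, L))   # B_r with L columns
    cr = rng.standard_normal(K)        # c_r
    T += np.einsum('ij,k->ijk', Ar @ Br.T, cr)

# The mode-3 unfolding mixes only the R matrices A_r B_r^T, so its rank
# is at most R even though each frontal slice has rank up to L * R.
print(np.linalg.matrix_rank(T.reshape(I * J, K).T))  # 2
```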
Decomposition in rank-(R1, R2, R3) terms
T = D_1 ·_1 A_1 ·_2 B_1 ·_3 C_1 + · · · + D_R ·_1 A_R ·_2 B_R ·_3 C_R  (D_r a core tensor of size R_1 × R_2 × R_3)
Unique under mild conditions
Rank-1 term ∼ data atom
Block term ∼ data molecule
[DL ’08]
Constraints
Examples: orthogonality [Sørensen and DL ’12]
nonnegativity [Cichocki et al. ’09]
Vandermonde [Sørensen and DL ’12]
independence [De Vos et al. ’12]
. . .
Not needed for uniqueness in tensor case
Pro: relaxed uniqueness conditions
easier interpretation
no degeneracy (NN, orthogonality)
higher accuracy
Depending on type of constraints, lower or higher computational cost
Tensorlab
Tensorlab — a MATLAB toolbox for tensor decompositions
esat.kuleuven.be/sista/tensorlab
• Elementary operations on tensors: multicore-aware and profiler-tuned
• Tensor decompositions with structure and/or symmetry: CPD, LMLRA, MLSVD, block term decompositions
• Global minimization of bivariate polynomials: exact line and plane search for tensor optimization
• Cumulants, tensor visualization, estimating a tensor's rank or multilinear rank, . . .
Coupled matrix/tensor decompositions
One or more matrices and/or one or more tensors
Symmetric and/or nonsymmetric
One or more factors shared (or parts of factors, or generators)
Constraints (orthogonal, nonnegative, exponential, constant modulus,
polynomial, rational, Toeplitz, Hankel, . . . )
Large, possibly incomplete data
Multi-view data / data fusion
E.g. coupled EEG-fMRI; person-location-activity and person-person and
location-location . . . ; . . .
Uniqueness: C has full column rank
Coupled CPD: T^{(n)} = ∑_{r=1}^R a_r^{(n)} ◦ b_r^{(n)} ◦ c_r ∈ C^{I_n×J_n×K}
Khatri–Rao product of second compound matrices:
U^{(n)} = C_2(A^{(n)}) ⊙ C_2(B^{(n)}) ∈ C^{(I_n(I_n−1)/2)(J_n(J_n−1)/2) × R(R−1)/2}
U = [ U^{(1)} ; U^{(2)} ; … ; U^{(N)} ]  (the U^{(n)} stacked vertically)
Theorem: if U and C have full column rank, then coupled CPD is unique
(proof is constructive)
Increase spatial diversity: Widely Separated Antenna Arrays
[figure: N widely separated antenna arrays A^{(1)}, …, A^{(N)}, each receiving the R source signals s^{(1)}_k, …, s^{(R)}_k through channel gains h^{(r,n)}_j]
For each array n = 1, …, N:
y^{(n)}_{ijk} = ∑_{r=1}^R a^{(r,n)}_i h^{(r,n)}_j s^{(r)}_k,  1 ≤ i ≤ I
(A biotensor example is multimodal data fusion: EEG × fMRI × MEG × · · ·)
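The data model above can be simulated to see the coupling: each array has its own spatial factors, but the source signals are shared. All sizes below are illustrative.

```python
import numpy as np

# Simulate the model above: array n observes
#   y^(n)_{ijk} = sum_r a^(r,n)_i h^(r,n)_j s^(r)_k,
# with the source signals s^(r) shared across all arrays.
rng = np.random.default_rng(4)
I, J, K, R, N = 4, 5, 20, 3, 2
S = rng.standard_normal((K, R))            # shared sources s^(r)
Y = []
for n in range(N):
    An = rng.standard_normal((I, R))       # array responses a^(r,n)
    Hn = rng.standard_normal((J, R))       # channel gains h^(r,n)
    Y.append(np.einsum('ir,jr,kr->ijk', An, Hn, S))

# Each Y[n] is a CPD with its own A^(n), H^(n) but the same S:
# unfolding each tensor in the sample mode and stacking keeps rank R.
Ystack = np.concatenate([Yn.reshape(I * J, K) for Yn in Y], axis=0)
print(Ystack.shape, np.linalg.matrix_rank(Ystack))  # (40, 20) 3
```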
”Coupled Tensor Decompositions” 11 / 19 Mikael Sørensen
Signal Separation from a Coupled Tensorial Perspective
Y^{(1)} = a^{(1,1)} ◦ h^{(1,1)} ◦ s^{(1)} + · · · + a^{(R,1)} ◦ h^{(R,1)} ◦ s^{(R)}
⋮
Y^{(N)} = a^{(1,N)} ◦ h^{(1,N)} ◦ s^{(1)} + · · · + a^{(R,N)} ◦ h^{(R,N)} ◦ s^{(R)}
Extension to Incoherent Multipath with Small Delay Spread
[figure: N antenna arrays, each receiving the R source signals over P_{r,n} incoherent paths with small delay spread, with channel gains h^{(p,r,n)}_j]
For each array n = 1, …, N:
y^{(n)}_{ijk} = ∑_{r=1}^R ∑_{p=1}^{P_{r,n}} a^{(p,r,n)}_i h^{(p,r,n)}_j s^{(r)}_k,  1 ≤ i ≤ I
Signal Separation from a Coupled Tensorial Perspective
Y^{(1)} = (H^{(1,1)} · A^{(1,1)T}) ◦ s^{(1)} + · · · + (H^{(R,1)} · A^{(R,1)T}) ◦ s^{(R)}
⋮
Y^{(N)} = (H^{(1,N)} · A^{(1,N)T}) ◦ s^{(1)} + · · · + (H^{(R,N)} · A^{(R,N)T}) ◦ s^{(R)}
with H^{(r,n)} and A^{(r,n)} having P_{r,n} columns
Tensorlab v2.0
Major upgrade which brings:
• Full support for sparse and incomplete tensors
• Major improvements in computational and memory efficiency
• Structured data fusion
Structured: choose from a large library of constraints to
impose on factors (nonnegative, orthogonal, Toeplitz, . . . )
Data fusion: jointly factorize multiple data sets
[figure: joint factorization of coupled data sets 𝑌, 𝒳 and 𝑍 with shared factors]
Applications
Example 4: GPS data set
Five coupled data sets: user-location-activity, user-user, location-feature, activity-activity and user-location
Challenge: predict user participation in activities
Solution with SDF: compute coupled tensor factorization
minimize over U, L, A, F, λ, μ, ν:
(ω_1/2) ‖ℳ^{(1)}(U, L, A) − 𝒯^{(1)}‖²_{𝒲^{(1)}} + (ω_2/2) ‖ℳ^{(2)}(U, U, λ) − 𝒯^{(2)}‖²_F
+ (ω_3/2) ‖ℳ^{(3)}(L, F) − 𝒯^{(3)}‖²_F + (ω_4/2) ‖ℳ^{(4)}(A, A, μ) − 𝒯^{(4)}‖²_F
+ (ω_5/2) ‖ℳ^{(5)}(U, L, ν) − 𝒯^{(5)}‖²_F
+ (ω_6/2) (‖U‖²_F + ‖L‖²_F + ‖A‖²_F + ‖F‖²_F + ‖λ‖²_F + ‖μ‖²_F + ‖ν‖²_F)
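A sketch of evaluating two representative terms of such an objective: the weighted CPD fit of 𝒯^{(1)} against the model ℳ^{(1)}(U, L, A), and the ω_6 regularizer. The sizes, weights and the `cpd_model` helper below are illustrative assumptions, not the actual GPS data set or SDF implementation.

```python
import numpy as np

# Evaluate the weighted CPD fit term and the regularizer of a coupled
# objective like the one above. All sizes and weights are illustrative.
rng = np.random.default_rng(5)
nU, nL, nA, R = 6, 5, 4, 2
U = rng.standard_normal((nU, R))
L = rng.standard_normal((nL, R))
A = rng.standard_normal((nA, R))

def cpd_model(U, L, A):
    # CPD model tensor: sum of R rank-1 terms of the factor columns
    return np.einsum('ir,jr,kr->ijk', U, L, A)

T1 = rng.standard_normal((nU, nL, nA))     # data tensor T^(1)
W1 = (rng.random((nU, nL, nA)) < 0.2)      # weight tensor: observed entries
w1, w6 = 1.0, 1e-2                         # weights omega_1, omega_6

# weighted fit: missing entries (W1 == 0) do not contribute
fit = 0.5 * w1 * np.sum((W1 * (cpd_model(U, L, A) - T1)) ** 2)
reg = 0.5 * w6 * sum(np.sum(F ** 2) for F in (U, L, A))
print(fit, reg)
```

The weight tensor 𝒲^{(1)} is what lets the factorization run on incomplete data: unobserved entries are simply masked out of the fit.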
Example 4: 80% missing entries in user-location-activity tensor
[figure: ROC curves (true positive rate vs. false positive rate) comparing SDF and CPD]
Example 4: 50 users missing in user-location-activity tensor
[figure: ROC curve (true positive rate vs. false positive rate) for SDF]
Breaking the Curse of Dimensionality in Data Analysis
9th-order (100 × 100 × . . . × 100) data set
10-component thermodynamic phase diagram
incomplete tensor: 130,000 known samples (out of 10^18)
sum of structured rank-1 terms
Example 3: InsPyro materials data set
Data set: an incomplete tensor in which each dimension represents the concentration of a metal in an alloy and the entries are the alloy's melting temperature
Challenge: predict melting temperatures of different alloys
Solution with SDF: use a structured CPD where each factor vector u^{(n)}_r is a sum of RBF kernels
u^{(n)}_{b,r} = ∑_{i=1}^{8} a_i exp(−(t_b − b_i)² / (2 c_i²))
where the a_i, b_i and c_i are the free parameters of u^{(n)}_r
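A sketch of such an RBF-parameterized factor vector; the eight kernels follow the slide, while the grid and parameter values are illustrative assumptions:

```python
import numpy as np

# A factor vector u^(n)_r parameterized as a sum of 8 Gaussian RBF
# kernels, evaluated on a concentration grid t. The grid and the
# parameter values below are illustrative assumptions.
t = np.linspace(0, 50, 51)          # concentration grid (%), 51 points
rng = np.random.default_rng(6)
a = rng.standard_normal(8)          # amplitudes a_i
b = rng.uniform(0, 50, 8)           # centers b_i
c = rng.uniform(2, 10, 8)           # widths c_i

# evaluate all 8 kernels on the whole grid at once and sum them
u = np.sum(a * np.exp(-(t[:, None] - b) ** 2 / (2 * c ** 2)), axis=1)
print(u.shape)  # (51,)
```

Only the 24 kernel parameters are optimized instead of all 51 grid values, which is what makes the factor smooth and the problem well-posed despite the missing entries.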
[figure: estimated factor vectors U^{(3)} plotted against concentration c3 (%)]
[figures: contour plots of melting temperature vs. c2 (%) and c3 (%)]
Conclusion
• Coupled matrix/tensor decompositions: an important new concept
• Many applications
• Gold mine for research topics:
– algebra
– (numerical) linear algebra
– randomized NLA
– large-scale numerical optimization
– . . .