From Matrix to Tensor: The Transition to Numerical Multilinear Algebra
Lecture 5. The CP Representation and Tensor Rank
Charles F. Van Loan
Cornell University
The Gene Golub SIAM Summer School 2010, Selva di Fasano, Brindisi, Italy
⊗ Transition to Computational Multilinear Algebra ⊗ Lecture 5. The CP Representation and Tensor Rank
Where We Are
Lecture 1. Introduction to Tensor Computations
Lecture 2. Tensor Unfoldings
Lecture 3. Transpositions, Kronecker Products, Contractions
Lecture 4. Tensor-Related Singular Value Decompositions
Lecture 5. The CP Representation and Tensor Rank
Lecture 6. The Tucker Representation
Lecture 7. Other Decompositions and Nearness Problems
Lecture 8. Multilinear Rayleigh Quotients
Lecture 9. The Curse of Dimensionality
Lecture 10. Special Topics
What is this Lecture About?
Sums of Rank-1 Tensors
The SVD of a matrix A expresses A as a very special sum of rank-1 matrices.
Let us do the same thing, as much as possible, with a tensor A.
This requires an understanding of (a) rank-1 tensors and their unfoldings, (b) the Kruskal tensor format, (c) the alternating least squares framework for multilinear sum-of-squares optimization, and (d) the notion of tensor rank.
We use the order-3 case to motivate the main ideas.
What is this Lecture About?
A Note on Terminology
The central decomposition in this lecture is the CP Decomposition.
It also goes by the name of the CANDECOMP/PARAFAC Decomposition.
CANDECOMP = Canonical Decomposition
PARAFAC = Parallel Factors Decomposition
Rank-1 Tensors (Order-3)
Definition
If f ∈ IRn1, g ∈ IRn2, and h ∈ IRn3, then
B = f ◦ g ◦ h
is defined by B(i1, i2, i3) = f(i1) g(i2) h(i3).
The tensor B ∈ IRn1×n2×n3 is a rank-1 tensor.
Rank-1 Tensors (Order-3)
The Kronecker Product Connection...
If
B = f ◦ g ◦ h,   f = [f1; f2], g = [g1; g2; g3], h = [h1; h2],
then
vec(B) = [ b111 b211 b121 b221 b131 b231 b112 b212 b122 b222 b132 b232 ]T
       = [ f1g1h1 f2g1h1 f1g2h1 f2g2h1 f1g3h1 f2g3h1 f1g1h2 f2g1h2 f1g2h2 f2g2h2 f1g3h2 f2g3h2 ]T
       = h ⊗ g ⊗ f
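The identity vec(f ◦ g ◦ h) = h ⊗ g ⊗ f is easy to check numerically. The following sketch uses NumPy rather than the lecture's Matlab, with arbitrary hypothetical values; the column-major reshape mimics Matlab's vec:

```python
import numpy as np

# Hypothetical small example with n1 = 2, n2 = 3, n3 = 2.
f, g, h = np.array([1., 2.]), np.array([3., 4., 5.]), np.array([6., 7.])

# B(i1,i2,i3) = f(i1) g(i2) h(i3) via an outer product.
B = np.einsum('i,j,k->ijk', f, g, h)

# Column-major (Matlab-style) vectorization: index i1 varies fastest.
vecB = B.reshape(-1, order='F')

# The Kronecker product connection: vec(f o g o h) = h (x) g (x) f.
assert np.allclose(vecB, np.kron(h, np.kron(g, f)))
```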
Rank-1 Tensors (Order-3)
The Modal Unfoldings...
If
B = f ◦ g ◦ h,   f = [f1; f2], g = [g1; g2; g3], h = [h1; h2],
then
B(1) = [ f1g1h1 f1g2h1 f1g3h1 f1g1h2 f1g2h2 f1g3h2
         f2g1h1 f2g2h1 f2g3h1 f2g1h2 f2g2h2 f2g3h2 ]
     = [ f1 · (h ⊗ g)T
         f2 · (h ⊗ g)T ]
     = f ⊗ (h ⊗ g)T
Rank-1 Tensors (Order-3)
The Modal Unfoldings...
If
B = f ◦ g ◦ h,   f = [f1; f2], g = [g1; g2; g3], h = [h1; h2],
then
B(2) = [ f1g1h1 f2g1h1 f1g1h2 f2g1h2
         f1g2h1 f2g2h1 f1g2h2 f2g2h2
         f1g3h1 f2g3h1 f1g3h2 f2g3h2 ]
     = [ g1 · (h ⊗ f)T
         g2 · (h ⊗ f)T
         g3 · (h ⊗ f)T ]
     = g ⊗ (h ⊗ f)T
Rank-1 Tensors (Order-3)
The Modal Unfoldings...
If
B = f ◦ g ◦ h,   f = [f1; f2], g = [g1; g2; g3], h = [h1; h2],
then
B(3) = [ f1g1h1 f2g1h1 f1g2h1 f2g2h1 f1g3h1 f2g3h1
         f1g1h2 f2g1h2 f1g2h2 f2g2h2 f1g3h2 f2g3h2 ]
     = [ h1 · (g ⊗ f)T
         h2 · (g ⊗ f)T ]
     = h ⊗ (g ⊗ f)T
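All three modal-unfolding formulas can be verified the same way. This NumPy sketch (Python rather than the lecture's Matlab; the unfold helper reproduces Matlab's column-major column ordering) checks each B(k) against its outer-product expression:

```python
import numpy as np

def unfold(T, k):
    # Mode-k unfolding with Matlab-style (column-major) column ordering.
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

f, g, h = np.array([1., 2.]), np.array([3., 4., 5.]), np.array([6., 7.])
B = np.einsum('i,j,k->ijk', f, g, h)   # B = f o g o h

# B_(1) = f (x) (h (x) g)^T,  B_(2) = g (x) (h (x) f)^T,  B_(3) = h (x) (g (x) f)^T
assert np.allclose(unfold(B, 0), np.outer(f, np.kron(h, g)))
assert np.allclose(unfold(B, 1), np.outer(g, np.kron(h, f)))
assert np.allclose(unfold(B, 2), np.outer(h, np.kron(g, f)))
```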
Problem 5.1. Suppose a ∈ IRn1n2n3. Show how to compute f ∈ IRn1 and g ∈ IRn2 so that ‖ a − h ⊗ g ⊗ f ‖2 is minimized where h ∈ IRn3 is given. Hint. It’s an SVD problem.
Problem 5.2. Given A ∈ IRn1×n2×n3 with positive entries, how would you choose B = f ◦ g ◦ h ∈ IRn1×n2×n3 so that
φ(f, g, h) = ∑i=1:N |log(A(i)) − log(B(i))|^2,   N = n1n2n3,
is minimized?
The CP Representation (Order-3)
Notation
Given λ ∈ IRr, F ∈ IRn1×r, G ∈ IRn2×r, and H ∈ IRn3×r, we define [[ λ; F, G, H ]] ∈ IRn1×n2×n3 by
[[ λ; F, G, H ]] = ∑j=1:r λj · F(:, j) ◦ G(:, j) ◦ H(:, j)
A weighted sum of rank-1 tensors where the vectors that specify the rank-1’s are columns of the matrices F, G, and H.
The CP Representation (Order-3)
Kruskal Form
We say that X ∈ IRn1×n2×n3 is in Kruskal form if
X = [[ λ; F, G, H ]]
where λ ∈ IRr, F ∈ IRn1×r, G ∈ IRn2×r, and H ∈ IRn3×r.
Can we write a given tensor A as an illuminating sum of rank-1 tensors? I.e., given A, can we find X = [[ λ; F, G, H ]] so that A ≈ X in some meaningful way?
The CP Representation (Order-3)
Equivalent Formulations...
If X = [[ λ; F, G, H ]] ∈ IRn1×n2×n3, then
X(i1, i2, i3) = ∑j=1:r λj · F(i1, j) · G(i2, j) · H(i3, j)
vec(X) = ∑j=1:r λj · H(:, j) ⊗ G(:, j) ⊗ F(:, j)
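These equivalent formulations can be confirmed numerically. A NumPy sketch (Python rather than the lecture's Matlab; random hypothetical data) builds X from its rank-1 terms and compares vec(X) against the Kronecker sum:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 3, 2, 3
lam = rng.standard_normal(r)
F, G, H = (rng.standard_normal((n, r)) for n in (n1, n2, n3))

# X = [[ lam; F, G, H ]] as a weighted sum of rank-1 tensors.
X = sum(lam[j] * np.einsum('i,j,k->ijk', F[:, j], G[:, j], H[:, j])
        for j in range(r))

# vec(X) = sum_j lam_j * H(:,j) (x) G(:,j) (x) F(:,j)
vecX = sum(lam[j] * np.kron(H[:, j], np.kron(G[:, j], F[:, j]))
           for j in range(r))
assert np.allclose(X.reshape(-1, order='F'), vecX)
```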
Matlab Tensor Toolbox: ktensor Set-Up
n = [5 8 3]; r = 4;
% Set up a random , length -r ktensor ...
F = randn(n(1),r); G = randn(n(2),r);
H = randn(n(3),r); lambda = ones(r,1);
X = ktensor(lambda ,{F,G,H});
Fsize = size(X.U{1}); Gsize = size(X.U{2});
Hsize = size(X.U{3});
L = length(X.lambda ); s = size(X);
A ktensor is a structure with two fields that is used to represent a tensor in Kruskal form. In the above, X.lambda houses the vector of weights while X.U is a cell array of the matrices that define the tensor X.
Variable   Value
Fsize      [5, 4]
Gsize      [8, 4]
Hsize      [3, 4]
L          4
s          [5 8 3]
Matlab Tensor Toolbox: Norm of a ktensor
function alfa = normKruskal(X)
% X is a ktensor and alfa is the Frobenius norm
% of the tensor it represents.
N = prod(size(X));
% Create a multidimensional array that houses
% the Kruskal tensor ...
Xarray = double(X);
% Reshape as a vector and compute its 2-norm ...
alfa = norm(reshape(Xarray ,N,1));
Problem 5.3. Write a Matlab function Y = normalize(X) that takes a ktensor X and returns a ktensor Y with the property that (a)
Y.U{j}(:,k) = X.U{j}(:,k)/norm(X.U{j}(:,k))
for all appropriate values of k and j and (b) double(X) = double(Y).
The CP Representation (Order-3)
The CP Approximation Problem
Given A ∈ IRn1×n2×n3 and r, determine λ ∈ IRr, F ∈ IRn1×r, G ∈ IRn2×r, and H ∈ IRn3×r so that
A ≈ [[ λ; F , G , H ]] = X .
Using Least Squares...
Choose λ, F , G , and H so that
‖ A − X ‖F^2 = ‖ vec(A) − ∑j=1:r λj · H(:, j) ⊗ G(:, j) ⊗ F(:, j) ‖2^2
is minimized.
A multilinear optimization problem. Reshape using the Khatri-Rao Product...
The Khatri-Rao Product
Definition
If
B = [ b1 · · · br ] ∈ IRn1×r
C = [ c1 · · · cr ] ∈ IRn2×r
then the Khatri-Rao product of B and C is given by
B ⊙ C = [ b1 ⊗ c1 · · · br ⊗ cr ].
Note that B ⊙ C ∈ IRn1n2×r.
The Khatri-Rao Product
“Fast” Property 1.
If B ∈ IRn1×r and C ∈ IRn2×r, then
(B ⊙ C)T (B ⊙ C) = (BTB) .∗ (CTC)
where “.∗” denotes pointwise multiplication.
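A NumPy sketch of the Khatri-Rao product together with a numerical check of Fast Property 1 (Python rather than the lecture's Matlab; khatri_rao is a helper written for this illustration):

```python
import numpy as np

def khatri_rao(B, C):
    # Columnwise Kronecker product: (n1 x r), (n2 x r) -> (n1*n2 x r).
    return np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(B.shape[1])])

rng = np.random.default_rng(1)
B, C = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
K = khatri_rao(B, C)

# Fast Property 1: the r x r Gram matrix needs no (n1*n2) x r product.
assert np.allclose(K.T @ K, (B.T @ B) * (C.T @ C))
```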
Problem 5.4. Prove this property using the Kronecker product facts (i) (W ⊗ X)(Y ⊗ Z) = WY ⊗ XZ and (ii) (W ⊗ X)T = WT ⊗ XT. How many flops are required?
The Khatri-Rao Product
“Fast” Property 2.
If
B = [ b1 · · · br ] ∈ IRn1×r
C = [ c1 · · · cr ] ∈ IRn2×r
x ∈ IRn1n2, and y = (B ⊙ C)T x, then
y = [ c1T X b1
      ...
      crT X br ],   X = reshape(x, n2, n1)
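Fast Property 2 admits the same kind of check. In this NumPy sketch (Python rather than Matlab; khatri_rao is an illustration helper), y is formed both ways:

```python
import numpy as np

def khatri_rao(B, C):
    # Columnwise Kronecker product of two matrices with r columns each.
    return np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(B.shape[1])])

rng = np.random.default_rng(2)
n1, n2, r = 5, 4, 3
B, C = rng.standard_normal((n1, r)), rng.standard_normal((n2, r))
x = rng.standard_normal(n1 * n2)

# Fast Property 2: y_j = c_j^T X b_j with X = reshape(x, n2, n1), column-major.
X = x.reshape((n2, n1), order='F')
y_fast = np.array([C[:, j] @ X @ B[:, j] for j in range(r)])
assert np.allclose(y_fast, khatri_rao(B, C).T @ x)
```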
Problem 5.5. Prove this property using vec(Y X WT) = (W ⊗ Y) · vec(X). How many flops are required?
Problem 5.6. Complete the following function so that it performs asspecified.
function x = KRLS(B,C,d)
% B is n1-by-r, C is n2-by-r, and d is n1*n2-by-1.
% x minimizes norm(A*x - d,2) where A is the Khatri-Rao
% product of B and C.
Use the method of normal equations and assume that A has full column rank. Is there an equally efficient way to solve the problem via the QR factorization of A?
The CP Representation (Order-3)
Unfolding Tensors in the Kruskal Form
Given A ∈ IRn1×n2×n3 our goal is to minimize
‖ A − X ‖F = ‖ A(k) − X(k) ‖F
where
X = [[ λ; F, G, H ]] = ∑j=1:r λj · fj ◦ gj ◦ hj
with
F = [ f1 · · · fr ]   G = [ g1 · · · gr ]   H = [ h1 · · · hr ]
So what do the modal unfoldings of X look like?
Each will be a sum of rank-1 tensor unfoldings...
The CP Representation (Order-3)
Unfolding Tensors in the Kruskal Form
Since B = f ◦ g ◦ h implies
B(1) = f ⊗ (h ⊗ g)T
B(2) = g ⊗ (h ⊗ f)T
B(3) = h ⊗ (g ⊗ f)T
we have
X(1) = ∑j=1:r λj · fj ⊗ (hj ⊗ gj)T = F · diag(λ) · (H ⊙ G)T
X(2) = ∑j=1:r λj · gj ⊗ (hj ⊗ fj)T = G · diag(λ) · (H ⊙ F)T
X(3) = ∑j=1:r λj · hj ⊗ (gj ⊗ fj)T = H · diag(λ) · (G ⊙ F)T
The CP Representation (Order-3)
The Alternating LS Solution Framework...
‖ A − X ‖F = ‖ A(1) − F · diag(λ) · (H ⊙ G)T ‖F   ⇐ 1. Fix G and H and improve λ and F.
           = ‖ A(2) − G · diag(λ) · (H ⊙ F)T ‖F   ⇐ 2. Fix F and H and improve λ and G.
           = ‖ A(3) − H · diag(λ) · (G ⊙ F)T ‖F   ⇐ 3. Fix F and G and improve λ and H.
The CP Representation (Order-3)
The Alternating LS Solution Framework
Repeat:
1. Let F minimize ‖ A(1) − F · (H ⊙ G)T ‖F and for j = 1:r set
   λj = ‖ F(:, j) ‖2 and F(:, j) = F(:, j)/λj.
2. Let G minimize ‖ A(2) − G · (H ⊙ F)T ‖F and for j = 1:r set
   λj = ‖ G(:, j) ‖2 and G(:, j) = G(:, j)/λj.
3. Let H minimize ‖ A(3) − H · (G ⊙ F)T ‖F and for j = 1:r set
   λj = ‖ H(:, j) ‖2 and H(:, j) = H(:, j)/λj.
These are linear least squares problems. The columns of F, G, and H are normalized.
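The whole framework fits in a few lines. The sketch below is a bare-bones NumPy analogue of this Matlab development (cp_als3, khatri_rao, and unfold are helpers invented for this illustration, not Toolbox functions). It runs ALS sweeps and records the fit after each sweep; because every step is an exact least squares solve and the normalization leaves X unchanged, the recorded errors never increase:

```python
import numpy as np

def khatri_rao(B, C):
    # Columnwise Kronecker product of two matrices with r columns each.
    return np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(B.shape[1])])

def unfold(T, k):
    # Mode-k unfolding with Matlab-style (column-major) column ordering.
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

def cp_als3(A, r, sweeps=30, seed=0):
    # One least squares solve per factor, then column normalization.
    rng = np.random.default_rng(seed)
    F, G, H = (rng.standard_normal((n, r)) for n in A.shape)
    errs = []
    for _ in range(sweeps):
        M = np.linalg.lstsq(khatri_rao(H, G), unfold(A, 0).T, rcond=None)[0].T
        lam = np.linalg.norm(M, axis=0); F = M / lam
        M = np.linalg.lstsq(khatri_rao(H, F), unfold(A, 1).T, rcond=None)[0].T
        lam = np.linalg.norm(M, axis=0); G = M / lam
        M = np.linalg.lstsq(khatri_rao(G, F), unfold(A, 2).T, rcond=None)[0].T
        lam = np.linalg.norm(M, axis=0); H = M / lam
        X = np.einsum('r,ir,jr,kr->ijk', lam, F, G, H)
        errs.append(np.linalg.norm(X - A))
    return lam, F, G, H, errs

# Hypothetical test data: an exact rank-2 tensor.
rng = np.random.default_rng(1)
A = np.einsum('ir,jr,kr->ijk', *(rng.standard_normal((n, 2)) for n in (4, 3, 5)))
lam, F, G, H, errs = cp_als3(A, 2)

# The objective is monotone non-increasing across sweeps.
assert all(e2 <= e1 + 1e-8 for e1, e2 in zip(errs, errs[1:]))
```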
The CP Representation (Order-3)
Solving the LS Problems
The solution to
minF ‖ A(1) − F · (H ⊙ G)T ‖F = minF ‖ A(1)T − (H ⊙ G) FT ‖F
can be obtained by solving the normal equation system
(H ⊙ G)T (H ⊙ G) FT = (H ⊙ G)T A(1)T
This system can be solved efficiently by exploiting the ideas in Problem 5.6.
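The normal-equations solve can be sketched in NumPy (Python rather than the lecture's Matlab; A1 is a random stand-in for the unfolding A(1)). Fast Property 1 supplies the r-by-r Gram matrix without ever squaring the tall Khatri-Rao matrix:

```python
import numpy as np

def khatri_rao(B, C):
    # Columnwise Kronecker product of two matrices with r columns each.
    return np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(B.shape[1])])

rng = np.random.default_rng(3)
n1, n2, n3, r = 6, 5, 4, 3
G, H = rng.standard_normal((n2, r)), rng.standard_normal((n3, r))
A1 = rng.standard_normal((n1, n2 * n3))     # stand-in for the unfolding A_(1)

# Normal equations: (H . G)^T (H . G) F^T = (H . G)^T A_(1)^T,
# with the Gram matrix formed cheaply via Fast Property 1.
K = khatri_rao(H, G)
gram = (H.T @ H) * (G.T @ G)                # r x r, no (n2*n3) x r product
F = np.linalg.solve(gram, K.T @ A1.T).T

# Same minimizer as a dense least squares solve.
F_ref = np.linalg.lstsq(K, A1.T, rcond=None)[0].T
assert np.allclose(F, F_ref)
```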
Problem 5.7. Write a Matlab function X = MyKruskal(A,r,itMax) that takes an order-3 tensor A and returns a ktensor X with the property that A ≈ X. X = [[ λ; F, G, H ]] should be obtained by applying the following improvement steps itMax times:
1. Solve (H ⊙ G)T (H ⊙ G) FT = (H ⊙ G)T A(1)T and for j = 1:r set
   λj = ‖ F(:, j) ‖2 and F(:, j) = F(:, j)/λj.
2. Solve (H ⊙ F)T (H ⊙ F) GT = (H ⊙ F)T A(2)T and for j = 1:r set
   λj = ‖ G(:, j) ‖2 and G(:, j) = G(:, j)/λj.
3. Solve (G ⊙ F)T (G ⊙ F) HT = (G ⊙ F)T A(3)T and for j = 1:r set
   λj = ‖ H(:, j) ‖2 and H(:, j) = H(:, j)/λj.
Choose the initial F, G, and H randomly unless you can think of something more clever.
Matlab Tensor Toolbox: The Function cp_als
n = [ 5 6 7 ]; rmax = 35;
% Generate a random tensor ...
A = tenrand(n);
for r = 1:rmax
% Find the closest length -r ktensor ...
X = cp_als(A,r);
% Display the fit ...
E = double(X)-double(A);
fit = norm(reshape(E,prod(n),1));
fprintf('r = %1d, fit = %5.3e\n',r,fit);
end
The function cp_als returns a ktensor. Default values for the number of iterations and the termination criteria can be modified:
X = cp_als(A,r,'maxiters',20,'tol',.001)
Problem 5.8. Compare the efficiency of MyKruskal and cp_als.
The CP Representation: General Order
Rank-1 Tensors: Definition
If uk ∈ IRnk for k = 1:d, then
B = u1 ◦ u2 ◦ · · · ◦ ud ,
defined by
B(i1, . . . , id) = u1(i1) · u2(i2) · · · ud(id),
is a rank-1 tensor. Note that B ∈ IRn1×···×nd.
The CP Representation: General Order
Rank-1 Tensors: Modal Unfoldings
If
B = u1 ◦ u2 ◦ · · · ◦ ud ,
then
vec(B) = ud ⊗ · · · ⊗ u2 ⊗ u1
and
B(k) = uk ⊗ (ud ⊗ · · · ⊗ uk+1 ⊗ uk−1 ⊗ · · · ⊗ u1)T .
Problem 5.9. Suppose B ∈ IRn1×···×nd is the rank-1 tensor defined above. Characterize the unfolding M = tenmat(B, [1:p], [p + 1:d ]) where 1 ≤ p < d.
The CP Representation: General Order
Notation
Given λ ∈ IRr and matrices U1, . . . , Ud with unit column norms, define
[[ λ; U1, . . . , Ud ]] = ∑j=1:r λj · U1(:, j) ◦ · · · ◦ Ud(:, j)
Assume that Uk ∈ IRnk×r.
A weighted sum of rank-1 tensors where the vectors that specify the rank-1’s are columns of the matrices U1, . . . , Ud.
The CP Representation: General Order
The Kruskal Form
We say that X ∈ IRn1×···×nd is in Kruskal form if
X = [[ λ; U1, . . . , Ud ]]
where λ ∈ IRr and Uk ∈ IRnk×r for k = 1:d.
Can we write a given tensor A as an illuminating sum of rank-1 tensors? I.e., given A, can we find a tensor X in Kruskal form so that A ≈ X in some meaningful way?
The CP Representation: General Order
Equivalent Formulations...
If X = [[ λ; U1, . . . , Ud ]] ∈ IRn1×···×nd, then
X(i1, . . . , id) = ∑j=1:r λj · U1(i1, j) · · · Ud(id, j)
vec(X) = ∑j=1:r λj · Ud(:, j) ⊗ · · · ⊗ U1(:, j)
Matlab Tensor Toolbox: ktensor Operations
function X = KruskalRandn(n,r)
% Creates a random order -d, length -r
% ktensor having size determined
% by the length -d integer vector n. The
% columns of X.U{1},...,X.U{d} have unit
% 2-norm.
d = length(n);
U = cell(d,1);
lambda = ones(r,1);
for k=1:d
U{k} = randn(n(k),r);
end
X0 = ktensor(lambda ,U);
X = arrange(X0);
The function arrange normalizes the columns of the matrices that define
X0 so that they have unit length. The weight vector is adjusted so that
double(X) and double(X0) have the same value.
Problem 5.10. Implement a function Y = MyArrange(X) that normalizesa ktensor X in the same way as arrange.
The CP Representation: General Order
Unfolding Tensors in Kruskal Form
If
X = [[ λ; U1, . . . , Ud ]] = ∑j=1:r λj · U1(:, j) ◦ · · · ◦ Ud(:, j)
then
X(k) = Uk · diag(λ) · (Ud ⊙ · · · ⊙ Uk+1 ⊙ Uk−1 ⊙ · · · ⊙ U1)T
Note that Khatri-Rao products can be sequenced:
F ⊙ G ⊙ H = (F ⊙ G) ⊙ H = F ⊙ (G ⊙ H).
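The general unfolding formula can be checked for, say, an order-4 tensor. A NumPy sketch (Python rather than Matlab; hypothetical sizes, with khatri_rao and unfold written for this illustration):

```python
import numpy as np
from functools import reduce

def khatri_rao(B, C):
    # Columnwise Kronecker product; associativity lets us chain with reduce.
    return np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(B.shape[1])])

def unfold(T, k):
    # Mode-k unfolding with Matlab-style (column-major) column ordering.
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order='F')

rng = np.random.default_rng(5)
dims, r = (3, 4, 2, 3), 2
lam = rng.standard_normal(r)
U = [rng.standard_normal((n, r)) for n in dims]

# X = [[ lam; U1, ..., U4 ]]
X = np.einsum('r,ir,jr,kr,lr->ijkl', lam, *U)

# X_(k) = Uk * diag(lam) * (Ud . ... . Uk+1 . Uk-1 . ... . U1)^T
for k in range(4):
    others = [U[m] for m in reversed(range(4)) if m != k]
    K = reduce(khatri_rao, others)
    assert np.allclose(unfold(X, k), U[k] @ np.diag(lam) @ K.T)
```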
The CP Representation: General Case
The CP Approximation Problem
Given A ∈ IRn1×···×nd and r, determine
X = [[ λ; U1, . . . , Ud ]] ∈ IRn1×···×nd
so that
‖ A − X ‖F = ‖ A(k) − X(k) ‖F
is minimized where
X(k) = Uk · diag(λ) · (Ud ⊙ · · · ⊙ Uk+1 ⊙ Uk−1 ⊙ · · · ⊙ U1)T
The CP Representation: General Case
The Alternating Least Squares Framework
for k = 1:d
   Fix U1, . . . , Uk−1, Uk+1, . . . , Ud.
   Improve λ and Uk by minimizing
      ‖ A(k) − Uk · (Ud ⊙ · · · ⊙ Uk+1 ⊙ Uk−1 ⊙ · · · ⊙ U1)T ‖F
   for j = 1:r
      λj = ‖ Uk(:, j) ‖2
      Uk(:, j) = Uk(:, j)/λj
   end
end
Problem 5.11. Assume that Bk ∈ IRnk×r for k = 1:d. (a) Show how to compute
y = (B1 ⊙ · · · ⊙ Bd)T x
efficiently where x ∈ IRN with N = n1 · · · nd. (b) Show how to compute efficiently
C = (B1 ⊙ · · · ⊙ Bd)T (B1 ⊙ · · · ⊙ Bd).
(c) Write a Matlab function x = KRLS(B,d) that solves the least squares problem
min ‖ (B1 ⊙ · · · ⊙ Bd) x − d ‖
Assume that B is a cell array that houses the matrices B1, . . . , Bd. (See Problem 5.6.)
Problem 5.12. Refer to Problem 5.7 and develop a general-order version of X = MyKruskal(A,r,itMax) based on the preceding alternating least squares framework. Take advantage of the ideas in Problem 5.11. Compare your implementation with cp_als.
Tensor Rank
What About r?
In the CP approximation problem we have assumed that r, the length of the approximating ktensor, is given:
A ≈ X = ∑j=1:r λj · U1(:, j) ◦ · · · ◦ Ud(:, j)
We can think of X as a rank-r approximation to A.
Tensor Rank
Departure from Matrix Case...
Suppose
Xr = ∑j=1:r λj · U1(:, j) ◦ · · · ◦ Ud(:, j)
is the best rank-r approximation of A and
Xr+1 = ∑j=1:r+1 λj · U1(:, j) ◦ · · · ◦ Ud(:, j)
is the best rank-(r + 1) approximation of A.
IT DOES NOT FOLLOW that Xr+1 is Xr plus a rank-1 term.
In this regard, the best Kruskal approximation is not SVD-like.
Tensor Rank
Definition
The rank of a tensor A is the smallest number of rank-1 tensorsthat sum to A.
This agrees with the definition for matrices. But there are somedifferences that make tensor rank a more complicated issue...
Tensor Rank
Anomaly 1
The largest rank attainable for an n1-by-· · ·-by-nd tensor is called the maximum rank. There is no simple formula for it in terms of the dimensions n1, . . . , nd. Indeed, its precise value is known only for small examples.
Maximum rank does not equal min{n1, . . . , nd} unless d ≤ 2.
Tensor Rank
Anomaly 2
If the set of rank-k tensors in IRn1×···×nd has positive Lebesgue measure, then k is a typical rank.

Size         Typical Ranks
2 × 2 × 2    2, 3
3 × 3 × 3    4
3 × 3 × 4    4, 5
3 × 3 × 5    5, 6

For n1-by-n2 matrices, typical rank and maximal rank are both equal to the smaller of n1 and n2.
Tensor Rank
Anomaly 3
The rank of a particular tensor over the real field may be different from its rank over the complex field.
Anomaly 4
A tensor with a given rank may be approximated with arbitrary precision by a tensor of lower rank. Such a tensor is said to be degenerate.
Problem 5.13. For various small choices of n = [n1, . . . , nd ], see if you can discover the typical rank possibilities by using cp_als. In particular, for a randomly generated A, compute the smallest r such that
‖ A − cp_als(A,r) ‖F ≤ 10−6
By running a sufficient number of examples, see what you can deduce about the typical rank of tensors in IRn1×···×nd.
Summary of Lecture 5.
Key Words
An order-d rank-1 tensor is the outer product of d vectors.
A tensor in Kruskal form has length r if it is the sum of r rank-1 tensors. For an order-d tensor, the vectors that make up the rank-1’s are specified as columns from d matrices.
The CP approximation problem for a given tensor A and a given integer r involves finding the nearest length-r ktensor to A in the Frobenius norm.
The alternating least squares framework is used by cp_als to solve the CP approximation problem. It proceeds by solving a sequence of structured linear least squares problems.