
Low-rank tensor discretization for high-dimensional

problems

Katharina Kormann

August 6, 2017

1 Introduction

Problems of high dimensionality appear in many areas of science. High dimensionality is usually tightly connected to large data, which is why handling high-dimensional problems computationally is challenging. Using tensor products is a natural way to generalize one-dimensional methods to higher dimensions. Therefore, high-dimensional problems often appear as tensors.

Some problems of high dimensionality:

• Boltzmann or Vlasov equations: 6d phase-space.

• Schrödinger equation: position of particles in a molecule.

• Financial mathematics: pricing of an option depending on multiple assets (Black-Scholes model).

• Multi-dimensional data in psychometrics, chemometrics (factor analysis), data mining,...

• Parameter-dependent problems, e.g. uncertainties in input variables.

1.1 Tensor decomposition and tensor approximation

Let us consider an example of the three-variate function

f(x, y, z) = 1 + ε cos(k_x x) cos(k_y y) cos(k_z z). (1)

If we discretize this function on a tensor product domain with a tensor product grid with 1d grid points (x_1, . . . , x_{N_x}), (y_1, . . . , y_{N_y}), and (z_1, . . . , z_{N_z}), we get the three-way tensor with elements

X_{i,j,k} = f(x_i, y_j, z_k),  1 ≤ i ≤ N_x, 1 ≤ j ≤ N_y, 1 ≤ k ≤ N_z. (2)

A representation as a full tensor contains N_x N_y N_z data points. The number of data points thus increases exponentially with the number of variables. This is usually referred to as the curse of dimensionality.

On the other hand, one could represent the tensor as a sum of tensor products in the following way:

X = (1, . . . , 1)^T ⊗ (1, . . . , 1)^T ⊗ (1, . . . , 1)^T
  + (ε cos(k_x x_1), . . . , ε cos(k_x x_{N_x}))^T ⊗ (cos(k_y y_1), . . . , cos(k_y y_{N_y}))^T ⊗ (cos(k_z z_1), . . . , cos(k_z z_{N_z}))^T.

This is a representation of the tensor as a low-rank (rank-2) tensor.
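To make the gain concrete, the following minimal numpy sketch builds the full tensor (2) and the rank-2 sum of outer products above and checks that they coincide; the grid sizes, wave numbers and ε are illustrative choices, not taken from the text.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the text).
Nx, Ny, Nz = 30, 40, 50
kx, ky, kz = 1.0, 2.0, 3.0
eps = 0.1
x = np.linspace(0, 2 * np.pi, Nx)
y = np.linspace(0, 2 * np.pi, Ny)
z = np.linspace(0, 2 * np.pi, Nz)

# Full tensor: Nx*Ny*Nz entries.
X_full = 1.0 + eps * np.einsum('i,j,k->ijk', np.cos(kx * x), np.cos(ky * y), np.cos(kz * z))

# Rank-2 representation: two triples of 1d vectors, i.e. 2*(Nx+Ny+Nz) numbers.
ones = [np.ones(Nx), np.ones(Ny), np.ones(Nz)]
cosines = [eps * np.cos(kx * x), np.cos(ky * y), np.cos(kz * z)]
X_low = np.einsum('i,j,k->ijk', *ones) + np.einsum('i,j,k->ijk', *cosines)

print(np.allclose(X_full, X_low))  # True: both representations agree
```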


[Figure 1: SVD of function (3) on a grid of 32 × 100 points. Panels: (a) data f(x, v), (b) singular values, (c) left singular vectors, (d) right singular vectors.]

In general, such an analytical tensorization is not possible. However, one can possibly still find a low-rank approximation, i.e. a low-rank tensor that gives a good approximation to the original tensor. In order to get the general idea, let us consider the following two-variate function

f(x, v) = 1/√(2π T_i(x)) · exp( −v² / (2 T_i(x)) ),  x ∈ [0.1, 14.5], (3)

with T_i(x) = exp(−0.4 tanh((x − 7.7)/8)), taken from the initial configuration of a simulation of the drift kinetic equations in plasma physics [4].

There is no analytical representation as a short sum of tensor products. Idea for compression: compute a singular value decomposition and discard the data belonging to small singular values.

Figure 1 shows the function discretized on a 32 × 100 tensor grid together with its singular values and vectors. Figure 2 shows the quality of the compressed tensor with 5 singular values.
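The following numpy sketch illustrates this compression for the function (3); the velocity interval is read off the axes of Figure 1 (an assumption), and the grid matches the 32 × 100 grid mentioned above.

```python
import numpy as np

# Discretize f(x, v) from (3) on a 32 x 100 grid, compute the SVD and keep
# only the 5 leading singular values.
x = np.linspace(0.1, 14.5, 32)
v = np.linspace(-6.0, 6.0, 100)       # velocity range read off the figure axes
Ti = np.exp(-0.4 * np.tanh((x - 7.7) / 8))
F = np.exp(-v[None, :] ** 2 / (2 * Ti[:, None])) / np.sqrt(2 * np.pi * Ti[:, None])

U, s, Vt = np.linalg.svd(F, full_matrices=False)
rank = 5
F5 = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

# The singular values decay rapidly, so the rank-5 reconstruction error is small.
print(np.linalg.norm(F - F5) / np.linalg.norm(F))
```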

1.2 Historical overview

Singular value decomposition and principal component analysis are well-known concepts for compression and for identifying properties of matrix-valued data. Interest in tensor decomposition first arose in the field of psychometrics with the introduction of the Tucker format [26] and the


[Figure 2: Compression with five singular values. Panels: uncompressed data, compressed data, pointwise error (of order 10^-6), left singular vectors, singular values, right singular vectors.]

CANDECOMP/PARAFAC format [3, 16]. Since then, tensor decomposition has been applied in a wide range of settings in psychometrics and chemometrics for extracting properties from multidimensional data arrays (see [19] for a review).

Another application of low-rank tensor decompositions is in computational sciences with problems that depend on multiple variables. A grid-based discretization of such problems suffers from the curse of dimensionality. However, in many cases there are preferred directions such that low-rank tensor formats can offer a compressed representation of the quantities of interest. These ideas arose first in the quantum dynamics community, where many-body wave functions describing the interaction of nuclei are propagated in time according to the time-dependent Schrödinger equation. In the so-called multi-configuration time-dependent Hartree method (MCTDH) [22], this many-body wave function is represented by linear combinations of tensor products of single-particle wave functions.

Only recently, the numerical analysis community took up similar ideas. Low-rank tensor formats have been systematically studied and new formats have been introduced. Oseledets presented the so-called tensor train format [23], and Grasedyck [11] and Hackbusch & Kühn [15] introduced a hierarchical version of the Tucker format. During the last years, both formats have been successfully used for the solution of high-dimensional partial differential equations in various fields and the solution of parameter-dependent problems in optimization and uncertainty quantification (see [13] for references). Hackbusch provides a textbook on the topic of tensor calculus [14].

1.3 Examples of successful tensor compression

Vlasov-Poisson equation The Vlasov equation describes the evolution of a plasma in its self-consistent fields. The distribution function is given in six-dimensional phase-space plus time, f(x, v, t), and the fields are computed from Maxwell's equations or, in a simplified model, by a Poisson equation. The Vlasov-Poisson equation for electrons in a neutralizing ion background


is given as

∂_t f(x, v, t) + v · ∇_x f(x, v, t) + E(x, t) · ∇_v f(x, v, t) = 0,  x ∈ R^d, v ∈ R^d,
−Δφ(x) = 1 − ∫ f(x, v) dv,
E(x) = −∇φ(x).
(4)

Note that the dimension d is three in the general form, but lower-dimensional simplifications are also considered.

In [20], the solution of the Vlasov-Poisson equation with f represented as a low-rank tensor is considered. Table 1 shows the computing times and compression rates for the low-rank tensor solution compared to the full grid, for a tensor product grid with 32 points per spatial direction and 128 points per velocity direction.

Table 1: Results from [20, Table 3] for the low-rank tensor solution of the Vlasov-Poisson equation.

dim   method   # doubles for f   fraction      wall time [s]   fraction
2D    FG       4096                            1.4 · 10^1
2D    TT       2720              0.66          1.8 · 10^1      1.3
4D    FG       1.7 · 10^7                      6.2 · 10^4
4D    TT       5.9 · 10^4        3.5 · 10^-3   2.7 · 10^2      4.4 · 10^-3
6D    TT       7.1 · 10^5        1.0 · 10^-5   6.6 · 10^3

Cookie problem See [1].

2 Notation and definitions

Definition 1. Let V, W be two vector spaces over some field K. Then, the algebraic tensor space is given by

V ⊗_a W = span{ v ⊗ w : v ∈ V, w ∈ W }.

The topological tensor space is given by the closure with respect to a given norm,

V ⊗_{‖·‖} W = closure(V ⊗_a W).

Remark 1.

• For two vectors v, w, the symbol ⊗ denotes the vector outer product defined by v ⊗ w = v w^T.

• Note that V ⊗_{‖·‖} W = V ⊗_a W for finite-dimensional vector spaces. In this case, we omit the index of the ⊗ symbol. We will mostly consider K^{I_1×I_2×...×I_d} where K = R or K = C.

• Tensor spaces are again vector spaces.

• We will denote tensors by calligraphic letters X.

• Let {v_i : i ∈ B_V} be a basis of V and {w_j : j ∈ B_W} a basis of W. Then

B := { v_i ⊗ w_j : i ∈ B_V, j ∈ B_W }

is a basis of V ⊗_a W.

• dim(V ⊗_a W) = dim(V) · dim(W).


• Repeating the product construction d − 1 times, we get the dth-order algebraic tensor space ⊗_{j=1}^{d} V_j (understood in the algebraic sense ⊗_a).

• A tensor space ⊗_{j=1}^{d} V_j is called non-degenerate if d > 0 and dim(V_j) ≥ 2 for all 1 ≤ j ≤ d. Otherwise, it is called degenerate. If not stated otherwise, we always consider non-degenerate tensor spaces.

Lemma 1. For any tensor X ∈ V ⊗_a W there are an r ∈ N_0 and linearly independent vectors {v_1, . . . , v_r} ⊂ V and {w_1, . . . , w_r} ⊂ W such that

X = Σ_{i=1}^{r} v_i ⊗ w_i.

For ⊗_{j=1}^{d} V_j, all linear combinations of r elementary tensors are contained in the following set:

R_r := R_r( ⊗_{j=1}^{d} V_j ) := { Σ_{i=1}^{r} v_i^{(1)} ⊗ . . . ⊗ v_i^{(d)} : v_i^{(j)} ∈ V_j }. (5)

Definition 2. The tensor rank of X ∈ ⊗_{j=1}^{d} V_j is defined by rank(X) := min{ r : X ∈ R_r } ∈ N_0.

Note that determining the rank of a tensor is NP-hard in general (cf. [17]). Some further notions:

• order of a tensor: the number of dimensions/modes/ways.

• fiber of a tensor: fixing all indices but one; the higher-order analogue of matrix rows/columns (e.g. X_{i,:,j} is a mode-2 fiber of a three-way tensor).

• slice of a tensor: fixing all indices but two (e.g. X_{i,:,:}).

For normed vector spaces (V_j, ‖·‖_j), the norm ‖·‖ of the tensor space ⊗_{j=1}^{d} V_j is called a crossnorm if all elementary tensors with v^{(j)} ∈ V_j satisfy

‖ ⊗_{j=1}^{d} v^{(j)} ‖ = Π_{j=1}^{d} ‖v^{(j)}‖_j. (6)

For finite-dimensional tensor spaces, we mostly use the following definition of the norm:

Definition 3. We define the norm ‖·‖ of a tensor X ∈ K^{I_1×...×I_d} as the square root of the sum of the squares of all its elements,

‖X‖ = √( Σ_{i_1=1}^{I_1} · · · Σ_{i_d=1}^{I_d} X_{i_1,...,i_d}² ). (7)

Note that ‖ · ‖ is the Frobenius norm in the matrix case.


2.1 Vectorization and Matricization

The matricization (or unfolding or flattening) X_{(α)} rearranges a dth-order tensor into a matrix by dividing the index set {1, . . . , d} into two non-empty, disjoint sets α and α^c = {1, . . . , d} \ α. In particular, the n-mode matricization with α = {n} arranges the mode-n fibers of the tensor in the columns of the matrix X_{(n)}.

Definition 4. Let X ∈ K^{I_1×...×I_d} and α ⊂ {1, . . . , d}. Then, we introduce the notation I_α = ×_{k∈α} I_k and I_{α^c} = ×_{k∉α} I_k. The matricization X_{(α)} maps a tensor element (i_1, . . . , i_d) to the matrix element (j_α, j_{α^c}), where

j_α = 1 + Σ_{k∈α} (i_k − 1) J_k,  J_k = Π_{m=1, m∈α}^{k−1} I_m,  and  j_{α^c} = 1 + Σ_{k∈α^c} (i_k − 1) L_k,  L_k = Π_{m=1, m∉α}^{k−1} I_m.

In particular, if α = {n}, the columns of X_{(α)} = X_{(n)} are the mode-n fibers of X.

Definition 5. Let X ∈ K^{I_1×...×I_d} and U ∈ K^{J×I_n}. Then, the multiplication of each mode-n fiber by U is called the n-mode product, X ×_n U. Elementwise it is defined by

(X ×_n U)_{i_1...i_{n−1} j i_{n+1}...i_d} = Σ_{i_n=1}^{I_n} U_{j,i_n} X_{i_1...i_{n−1} i_n i_{n+1}...i_d}. (8)
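As an illustration, the n-mode product (8) can be realized as a tensor contraction; the following numpy sketch (the function name is ours) implements it for dense arrays.

```python
import numpy as np

def mode_n_product(X, U, n):
    """Compute X x_n U for a numpy array X and a matrix U (1-based mode n), as in (8)."""
    axis = n - 1
    # tensordot contracts X's axis `axis` with U's columns; the new axis ends up last,
    # so move it back into position `axis`.
    Y = np.tensordot(X, U, axes=(axis, 1))
    return np.moveaxis(Y, -1, axis)

X = np.arange(24).reshape(4, 3, 2)                       # a small three-way tensor
U = np.random.default_rng(0).standard_normal((5, 3))
Y = mode_n_product(X, U, 2)                              # multiplies every mode-2 fiber by U
print(Y.shape)                                           # (4, 5, 2)
```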

In order to illustrate the matricization, we consider the three-way tensor X ∈ R^{4×3×2} with slices

X_{:,:,1} =
  1  5  9
  2  6 10
  3  7 11
  4  8 12

X_{:,:,2} =
  13 17 21
  14 18 22
  15 19 23
  16 20 24
(9)

The three mode-n matricizations are then given as

X_(1) =
  1  5  9 13 17 21
  2  6 10 14 18 22
  3  7 11 15 19 23
  4  8 12 16 20 24

X_(2) =
  1  2  3  4 13 14 15 16
  5  6  7  8 17 18 19 20
  9 10 11 12 21 22 23 24

X_(3) =
   1  2  3  4 ... 10 11 12
  13 14 15 16 ... 22 23 24
(10)

Another way to rearrange a tensor is the vectorization:

Definition 6. The vectorization of X ∈ K^{I_1×...×I_d} maps the element (i_1, . . . , i_d) of X to the vector element j = 1 + Σ_{k=1}^{d} (i_k − 1) J_k with J_k = Π_{m=1}^{k−1} I_m.

Revisiting the example (9), we get the vectorization

vec(X) = (1, 2, . . . , 24)^T. (11)
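The following numpy sketch reproduces the matricizations (10) and the vectorization (11) of the example tensor (9); note that Definitions 4 and 6 count the first index fastest, which corresponds to Fortran (column-major) ordering in numpy. The helper function is ours.

```python
import numpy as np

# Build the 4 x 3 x 2 example (9); Fortran order makes the first index run fastest.
X = 1 + np.arange(24).reshape((4, 3, 2), order='F')      # X[:, :, 0] and X[:, :, 1] as in (9)

def unfold(X, n):
    """Mode-n matricization X_(n) (1-based n), columns ordered as in Definition 4."""
    return np.reshape(np.moveaxis(X, n - 1, 0), (X.shape[n - 1], -1), order='F')

print(unfold(X, 1))            # the 4 x 6 matrix in (10)
print(unfold(X, 2))            # the 3 x 8 matrix in (10)
print(unfold(X, 3))            # the 2 x 12 matrix in (10)
print(X.flatten(order='F'))    # vec(X) = (1, ..., 24)^T as in (11)
```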

2.2 Matrix products

Definition 7. Let A ∈ K^{I_1×I_2} and B ∈ K^{J_1×J_2}. Then, the Kronecker product A ⊗ B ∈ K^{(I_1 J_1)×(I_2 J_2)} is defined as

A ⊗ B = ( A_{:,1} ⊗ B_{:,1}  A_{:,1} ⊗ B_{:,2}  . . .  A_{:,1} ⊗ B_{:,J_2}  A_{:,2} ⊗ B_{:,1}  . . .  A_{:,I_2} ⊗ B_{:,J_2} )
      =
  a_{11} B    a_{12} B    . . .  a_{1 I_2} B
  a_{21} B    a_{22} B    . . .  a_{2 I_2} B
  . . .
  a_{I_1 1} B a_{I_1 2} B . . .  a_{I_1 I_2} B
(12)


Note that the Kronecker product is related to the mode-n products in the following way:

Y = X ×_{i=1}^{d} A_i  ⇔  vec(Y) = (A_d ⊗ . . . ⊗ A_1) vec(X).

We note that the dimensions appear in reversed order in the Kronecker product.

Definition 8. Let A ∈ K^{I_1×I_2} and B ∈ K^{J_1×I_2}. Then, the Khatri-Rao product A ⊙ B ∈ K^{(I_1 J_1)×I_2} is defined as

A ⊙ B = ( A_{:,1} ⊗ B_{:,1}  A_{:,2} ⊗ B_{:,2}  . . .  A_{:,I_2} ⊗ B_{:,I_2} ). (13)

The Khatri-Rao product can be described as the matching columnwise Kronecker product.

Definition 9. Let A, B ∈ K^{I_1×I_2}. Then, the Hadamard or elementwise product A ∗ B ∈ K^{I_1×I_2} is defined as

(A ∗ B)_{i,j} = A_{i,j} B_{i,j},  1 ≤ i ≤ I_1, 1 ≤ j ≤ I_2. (14)

Let us collect some properties of these matrix products. We denote by A† the Moore-Penrose pseudoinverse of a matrix A.

• (A ⊗ B)(C ⊗ D) = AC ⊗ BD,
• (A ⊗ B)† = A† ⊗ B†,
• (A ⊙ B)^H (A ⊙ B) = A^H A ∗ B^H B,
• (A ⊙ B)† = ( (A^H A) ∗ (B^H B) )† (A ⊙ B)^H.

The Khatri-Rao product can also be used to compactly express the mode-n unfolding of a rank-r tensor X = Σ_{i=1}^{r} ⊗_{ν=1}^{d} a_i^{(ν)}:

X_{(n)} = A^{(n)} ( A^{(d)} ⊙ . . . ⊙ A^{(n+1)} ⊙ A^{(n−1)} ⊙ . . . ⊙ A^{(1)} )^T, (15)

where A^{(ν)} = [a_1^{(ν)} . . . a_r^{(ν)}] is the matrix formed by the r elementary vectors for dimension ν.
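Identity (15) can be checked numerically; the following sketch builds a random rank-r three-way tensor from its factor matrices and compares its mode-1 unfolding with the Khatri-Rao expression (the helper functions and test sizes are ours).

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x r) and B (J x r) -> (I*J) x r."""
    I, r = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, r)

rng = np.random.default_rng(0)
I1, I2, I3, r = 4, 5, 6, 3
A1, A2, A3 = (rng.standard_normal((I1, r)), rng.standard_normal((I2, r)),
              rng.standard_normal((I3, r)))

# Full tensor from its CP factors: X = sum_i a_i^(1) (x) a_i^(2) (x) a_i^(3).
X = np.einsum('ir,jr,kr->ijk', A1, A2, A3)

# Mode-1 unfolding with the column ordering of Definition 4 (first remaining index fastest).
X1 = X.reshape(I1, -1, order='F')
print(np.allclose(X1, A1 @ khatri_rao(A3, A2).T))   # True: identity (15) for n = 1
```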

3 Some results about matrices

Matrices are two-way tensors. Since matrices are ubiquitous and tensor calculus is a generalization of matrix computations, let us first briefly study matrices.

Let n, m ∈ N be finite and A ∈ K^{n×m} a real-valued (K = R) or complex-valued (K = C) matrix. We denote matrices by capital letters; A_{i,j} denotes the matrix element (i, j), and A_{i,:}, A_{:,j} denote the ith row and the jth column, respectively.

Definition 10. For a matrix A ∈ K^{n×m}, we define the matrix rank r = rank(A) by one of the following equivalent statements:

• r = dim range(A),

• r = dim range(AT ),

• r is the maximal number of linearly independent rows of A,

• r is the maximal number of linearly independent columns of A.

The maximal rank of A ∈ K^{n×m} is given by

r_max := min(n, m). (16)

Matrices with rank equal to r_max are called full-rank matrices. Two nonzero vectors a ∈ K^n and b ∈ K^m form a rank-1 matrix A = a b^H ∈ K^{n×m}.


3.1 Matrix decompositions

QR decomposition. For A ∈ K^{n×m} the QR factorization is given by a unitary matrix Q ∈ K^{n×n} and an upper triangular matrix R ∈ K^{n×m} with A = QR.

The matrix Q can be constructed as a product of Householder reflections. The cost of computing R is approximately 2mnN − (2/3)N³ operations (where N = min(n, m)). Constructing Q explicitly from the Householder reflections can be done at a cost of (4/3)n³ operations.

For n > m, it holds that

R = [ R′
      0  ],  (17)

with R′ ∈ K^{m×m}. Splitting Q in the same way, Q = (Q′  Q′′) with Q′ ∈ K^{n×m}, we get the reduced QR factorization A = Q′R′. For rank-deficient matrices of rank r, the factorization can be further reduced to Q′ ∈ K^{n×r} and R′ ∈ K^{r×m}.

Singular Value Decomposition (SVD). For A ∈ K^{n×m} the singular value decomposition is defined by two unitary matrices U ∈ K^{n×n} and V ∈ K^{m×m} and a diagonal rectangular matrix Σ ∈ K^{n×m} such that A = UΣV^H, where the diagonal entries σ_i of Σ are in decreasing order, σ_1 ≥ σ_2 ≥ . . . ≥ σ_{min(m,n)} ≥ 0.

Lemma 2. The Frobenius norm and the spectral norm can be computed from the singular values: ‖A‖_F = √( Σ_{i=1}^{min(n,m)} σ_i² ) and ‖A‖_2 = σ_1.

Remark 2. • To leading order, the cost of computing the SVD is O(nmN), where N = min(n, m). The SVD is usually computed by first reducing A to a bidiagonal matrix (e.g. using Householder reflections) and then applying an iterative eigenvalue algorithm (various variants are possible).

• The singular value decomposition is not unique. For instance, each pair of columns U_{:,i}, V_{:,i} may be replaced by z U_{:,i} and z V_{:,i} for some z ∈ K with |z| = 1.

• For a detailed discussion on SVD see e.g. [7, Sec. 5.4.5].

Reduced SVD. For a matrix A ∈ K^{n×m}, define r := max{ i : σ_i > 0 }. It can be shown that r = rank(A) and

A = Σ_{i=1}^{r} σ_i U_{:,i} V_{:,i}^H = U′ Σ′ V′^H, (18)

where U′ = U_{:,1:r}, Σ′ = Σ_{1:r,1:r}, and V′ = V_{:,1:r}.

One-sided SVD. For A ∈ K^{n×m} the left singular vectors are the eigenvectors of AA^H and the right singular vectors are the eigenvectors of A^H A. In particular, when m ≫ n it can be attractive to compute only the left singular vectors.

3.2 Low-rank approximation of matrices

Problem 1 (Approximation problem). For A ∈ K^{n×m} and k < rank(A), find R̂ with rank(R̂) ≤ k such that

‖A − R̂‖ = min_{rank(R)≤k} ‖A − R‖. (19)

For matrices, such a best rank-k approximation in the Frobenius and in the spectral norm can be constructed from the SVD:


Theorem 1 (Eckart-Young-Mirsky theorem). Let A ∈ K^{n×m} with SVD A = UΣV^H and let 0 ≤ k ≤ min(n, m). Define

R̂ := U Σ_k V^H (20)

with

(Σ_k)_{i,j} = σ_i for i = j ≤ k, and 0 otherwise. (21)

Then, it holds that

• ‖A − R̂‖_2 = σ_{k+1} and ‖A − R̂‖_2 = min_{rank(R)≤k} ‖A − R‖_2,

• ‖A − R̂‖_F = √( Σ_{i=k+1}^{min(n,m)} σ_i² ) and ‖A − R̂‖_F = min_{rank(R)≤k} ‖A − R‖_F.

R̂ is called a best rank-k approximation.
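The following numpy sketch illustrates Theorem 1: it forms the truncated SVD of a random matrix and checks the two error formulas against the discarded singular values (the matrix size and rank are arbitrary test choices).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 30))
k = 5

U, s, Vh = np.linalg.svd(A, full_matrices=False)
R = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]        # best rank-k approximation (20)

# Spectral error equals sigma_{k+1}; Frobenius error is the norm of the tail.
print(np.isclose(np.linalg.norm(A - R, 2), s[k]))
print(np.isclose(np.linalg.norm(A - R, 'fro'), np.sqrt(np.sum(s[k:] ** 2))))
```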

Proof. We only prove the result for the spectral norm here. Let us first define N := min(n, m). Using the definition of R̂, we get

‖A − R̂‖_2 = ‖ Σ_{i=k+1}^{N} σ_i U_{:,i} V_{:,i}^H ‖_2 = σ_{k+1}. (22)

Consider now any R with rank(R) = k. Then, there exists a linear combination w = Σ_{i=1}^{k+1} γ_i V_{:,i} such that Rw = 0 and ‖w‖_2 = 1. Then, we can estimate ‖A − R‖_2² as follows:

‖A − R‖_2² ≥ ‖(A − R)w‖_2² = ‖Aw‖_2² = Σ_{i=1}^{k+1} γ_i² σ_i² ≥ σ_{k+1}². (23)

We have thus shown ‖A − R‖_2 ≥ σ_{k+1} = ‖A − R̂‖_2.

The Frobenius norm of A − R̂ can be computed as

‖A − R̂‖_F = ‖ Σ_{i=k+1}^{N} σ_i U_{:,i} V_{:,i}^H ‖_F = √( Σ_{i=k+1}^{N} σ_i² ). (24)

For the optimality proof, we refer to [14, Sec. 2.6].

Lemma 3. Let A^{(ν)} ∈ K^{n×m} converge to the matrix A and denote the SVDs by A^{(ν)} = U^{(ν)} Σ^{(ν)} (V^{(ν)})^H. Then there exists a subsequence {ν_i : i ∈ N} ⊂ N such that

U^{(ν_i)} → U,  Σ^{(ν_i)} → Σ,  V^{(ν_i)} → V. (25)

Furthermore, it holds that A = UΣV^H.

Proof. The idea of the proof is to use the continuous dependence of the eigenvalues on the matrix for the convergence of the singular values. For the singular vectors, one considers the sequences of columns {U^{(ν)}_{:,i} : ν ∈ N} one by one. Since ‖U^{(ν)}_{:,i}‖_2 = 1 for all ν, the sequence is precompact and has a convergent subsequence with limit U_{:,i} = lim U^{(ν_i)}_{:,i}. It also holds that ‖U_{:,i}‖_2 = lim ‖U^{(ν_i)}_{:,i}‖_2 = 1.

The following result follows from the above lemma:

Corollary 1. Let A^{(ν)} → A. Then, there are best rank-k approximations R̂^{(ν)} according to (20) such that a subsequence R̂^{(ν_i)} converges to a best rank-k approximation of A.


4 CP format

One possible representation of a multi-dimensional tensor is a factorization into a sum of rank-one tensors, i.e.

X = Σ_{r=1}^{R} ⊗_{n=1}^{N} a_r^{(n)}, (26)

where R ∈ N and a_r^{(n)} ∈ K^{I_n} for r = 1, . . . , R and n = 1, . . . , N.

This format has been proposed by various authors under the following names: polyadic form of a tensor, PARAFAC (parallel factors), CANDECOMP or CAND (canonical decomposition), topographic components model, CP (CANDECOMP/PARAFAC), r-term representation/approximation (cf. [19]).

The set of all rank-R tensors was defined in (5). The storage complexity of the CP format is R · d · n, if n is the dimension of each one-dimensional vector space.

4.1 Approximation problems

An approximation in CP format can be found according to one of the following two approximation problems:

Problem 2 (AP1). Given X ∈ ⊗_{i=1}^{d} V_i and R ∈ N_0, determine Y ∈ R_R minimizing ‖X − Y‖.

Problem 3 (AP2). Given X ∈ ⊗_{i=1}^{d} V_i and ε > 0, determine Y ∈ R_R with ‖X − Y‖ ≤ ε for minimal R.

It can be shown that AP2 always has a solution. However, we might still want to ask for the best approximation for the identified R. This is then again equivalent to AP1.

Let us now consider AP1. First, we note that, for a given tensor X, we can restrict the search for a minimizer to the bounded subset

R_{R,X} := R_R ∩ { Y ∈ ⊗_{i=1}^{d} V_i : ‖Y‖ ≤ 2‖X‖ }. (27)

Otherwise, if Y ∈ R_{R,X}^c, it holds that

‖X − Y‖ ≥ ‖Y‖ − ‖X‖ > 2‖X‖ − ‖X‖ = ‖X − 0‖, (28)

i.e. 0 would then be a better approximation than Y.

Recalling the Eckart-Young-Mirsky theorem, we know that AP1 based on the Frobenius and the spectral norm has a solution in the matrix case. More generally, one can show:

Proposition 1. For d = 2, AP1 has a solution.

Unfortunately, however, this result does not extend to general dimensions. For a non-degenerate tensor space V_1 ⊗ V_2 ⊗ V_3, De Silva and Lim [5] constructed a counterexample:

Example 1. Let v^{(j)}, w^{(j)} ∈ V_j, j = 1, 2, 3, be linearly independent vectors and construct the tensor

X := v^{(1)} ⊗ v^{(2)} ⊗ w^{(3)} + v^{(1)} ⊗ w^{(2)} ⊗ v^{(3)} + w^{(1)} ⊗ v^{(2)} ⊗ v^{(3)}, (29)

and, for n ∈ N, define

X_n := ( w^{(1)} + n v^{(1)} ) ⊗ ( v^{(2)} + (1/n) w^{(2)} ) ⊗ v^{(3)} + v^{(1)} ⊗ v^{(2)} ⊗ ( w^{(3)} − n v^{(3)} ). (30)


For n ∈ N, we have X − X_n = −(1/n) w^{(1)} ⊗ w^{(2)} ⊗ v^{(3)}. Hence, lim_{n→∞} X_n = X. On the other hand,

3 = rank(X) = rank(lim X_n) > rank(X_n) = 2. (31)

Hence, the set R_2 is not closed.

This counterexample extends to higher-dimensional tensor spaces, since the tensor space of order three is embedded. Moreover, de Silva and Lim have shown that the set of tensors without a minimizer in R_R is not of measure zero.

This leads to the following notion of the border rank.

Definition 11. For a tensor X ∈ ⊗_{i=1}^{N} V_i the border rank is defined as

rank(X) := min{ r : X ∈ closure(R_r) } ∈ N_0, (32)

where closure(R_r) denotes the closure of R_r.

4.2 Greedy algorithm

Another special case where the best approximation exists is the case R = 1 in the finite-dimensional setting:

Lemma 4. For each X ∈ K^{I_1×...×I_d} there is a tensor Y_min ∈ R_1 minimizing the error, i.e.

‖X − Y_min‖ = min_{Y ∈ R_1} ‖X − Y‖. (33)

Proof. For the proof, we refer to [14, Lemma 9.3].

The fact that local minimizers can appear renders the practical computation more difficult. Based on this result, a greedy iterative algorithm can be designed that, in each step, seeks the best rank-one approximation of the residual. However, this can in general lead to a rather poor approximation.

4.3 Alternating least squares method (ALS)

A CP approximation is often computed based on the alternating least squares method, which is an algorithmic idea that is used in various respects in low-rank tensor calculus. Therefore, we will first discuss the general algorithmic structure.

Problem 4. Let Ω be an ordered index set and Φ a real-valued function of the variables x := (x_ω)_{ω∈Ω}. Find a minimizer x* of Φ(x).

Assuming that the minimization with respect to one variable at a time is easier, the following iterative algorithm is called an alternating least-squares method:

Algorithm 1 Alternating Least-Squares Method.

Require: Φ, initial guess x^{(0)}.
for m = 1, 2, . . . do
  for i = 1, . . . , #Ω do
    x_i^{(m)} := arg min_ξ Φ( x_1^{(m)}, . . . , x_{i−1}^{(m)}, ξ, x_{i+1}^{(m−1)}, . . . , x_{#Ω}^{(m−1)} ).
  end for
end for
return x^{(m_max)}

Some comments on the algorithm:


• A symmetric version can be obtained by replacing the inner loop by i = 1, . . . , #Ω − 1, #Ω, #Ω − 1, . . . , 2.

• Instead of optimizing over a single variable at a time, one can also optimize with respect to two or more variables jointly. These groups of variables may also overlap. The overlapping ALS algorithm that optimizes over (x_1, x_2), (x_2, x_3), (x_3, x_4), . . . is referred to as MALS (modified alternating least squares). In the quantum physics community this algorithm is known as DMRG (density matrix renormalization group).

• The minimization problems do not have to be solved exactly in such an iterative algorithm.

• The convergence properties of this algorithm depend on the properties of the function Φ.

Let us now reformulate the problem of computing a CP approximation of a tensor based on the ALS method. For this, we represent the mode-n unfoldings of X based on the Khatri-Rao product:

X_{(n)} = A^{(n)} ( A^{(d)} ⊙ . . . ⊙ A^{(n+1)} ⊙ A^{(n−1)} ⊙ . . . ⊙ A^{(1)} )^T. (34)

In each step of the inner ALS iteration, we need to find the optimal A^{(n)} while fixing all other mode frames A^{(i)}, i ≠ n, i.e., we need to solve the following optimization problem:

min_{B ∈ K^{I_n×r}} ‖ X_{(n)} − B ( A^{(d)} ⊙ . . . ⊙ A^{(n+1)} ⊙ A^{(n−1)} ⊙ . . . ⊙ A^{(1)} )^T ‖_F. (35)

This is a least-squares problem and the solution can be expressed by the Moore-Penrose pseudoinverse:

B* = X_{(n)} ( A^{(d)} ⊙ . . . ⊙ A^{(n+1)} ⊙ A^{(n−1)} ⊙ . . . ⊙ A^{(1)} ) ( (A^{(d)})^T A^{(d)} ∗ . . . ∗ (A^{(n+1)})^T A^{(n+1)} ∗ (A^{(n−1)})^T A^{(n−1)} ∗ . . . ∗ (A^{(1)})^T A^{(1)} )†. (36)

Hence, the solution of the subproblems in the ALS algorithm can be easily computed.
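As an illustration of the resulting method, the following sketch implements a few ALS sweeps for a rank-r CP approximation of a three-way tensor using the update (36); it is a bare-bones version without convergence checks or normalization, and the helper functions and test data are ours.

```python
import numpy as np

def khatri_rao(A, B):
    I, r = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, r)

def unfold(X, n):   # mode-n unfolding, 1-based, first remaining index fastest
    return np.reshape(np.moveaxis(X, n - 1, 0), (X.shape[n - 1], -1), order='F')

def cp_als(X, r, sweeps=50, seed=0):
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((dim, r)) for dim in X.shape]
    for _ in range(sweeps):
        for n in range(3):
            others = [A[m] for m in (2, 1, 0) if m != n]   # A^(d), ..., skipping mode n
            KR = khatri_rao(*others)
            G = np.ones((r, r))
            for m in range(3):
                if m != n:
                    G *= A[m].T @ A[m]                     # Hadamard product of Gram matrices
            A[n] = unfold(X, n + 1) @ KR @ np.linalg.pinv(G)   # least-squares update (36)
    return A

# Test: an exactly rank-2 tensor, so the relative error should become very small.
rng = np.random.default_rng(3)
U1, U2, U3 = (rng.standard_normal((s, 2)) for s in (6, 7, 8))
X = np.einsum('ir,jr,kr->ijk', U1, U2, U3)
A = cp_als(X, r=2)
X_hat = np.einsum('ir,jr,kr->ijk', *A)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```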

5 Tucker decomposition

The Tucker decomposition is a form of higher-order principal component analysis. It was introduced by Tucker for three-way tensors as three-mode factor analysis. Later it was also called three-mode PCA and, in the general case, higher-order SVD or N-mode SVD. In the Tucker format a tensor is transformed into a core tensor multiplied by a matrix along each mode,

X = G ×_{n=1}^{d} U^{(n)}. (37)

We can also express (37) in the following ways:

• Elementwise: X_{i_1 i_2 ... i_d} = Σ_{j_1=1}^{r_1} · · · Σ_{j_d=1}^{r_d} G_{j_1...j_d} U^{(1)}_{i_1 j_1} · · · U^{(d)}_{i_d j_d}, for i_n = 1, . . . , I_n, n = 1, . . . , d.

• Vectorization: vec(X) = ( U^{(d)} ⊗ . . . ⊗ U^{(1)} ) vec(G).

• µ-mode matricization:

X_{(µ)} = U^{(µ)} G_{(µ)} ( U^{(d)} ⊗ . . . ⊗ U^{(µ+1)} ⊗ U^{(µ−1)} ⊗ . . . ⊗ U^{(1)} )^T. (38)

From the representation in (38) we see that

rank(X_{(µ)}) ≤ r_µ. (39)

This motivates the definition of the multilinear rank.


Definition 12. Let X ∈ K^{I_1×...×I_d}. The n-rank R_n of X is defined as the column rank of X_{(n)}. The multilinear rank of X is given as (R_1, . . . , R_d) with R_n = rank(X_{(n)}), n = 1, . . . , d.

The storage complexity of the Tucker format is O(r^d + dnr). The factor r^d is exponential in the dimension d, so that the format is unattractive in high dimensions.

For r ∈ N^d, we denote the set of rank-r Tucker tensors by

T_r := { X ∈ K^{I_1×...×I_d} : there exist subspaces V_j ⊂ K^{I_j} such that dim(V_j) = r_j and X ∈ ⊗_{j=1}^{d} V_j }. (40)

Lemma 5 (Closedness). Let X^{(n)} ∈ T_r and X ∈ K^{I_1×...×I_d}. If lim_{n→∞} ‖X^{(n)} − X‖ = 0, then X ∈ T_r.

Proof. See [14, Lemma 8.6].

5.1 Computing the Tucker decomposition: higher-order SVD (HOSVD)

For a given multilinear rank (R_1, . . . , R_d), a rank-(R_1, . . . , R_d) Tucker decomposition can be computed based on singular value decompositions:

Algorithm 2 Truncated higher-order singular value decomposition (HOSVD).

Require: Tensor X ∈ K^{I_1×...×I_d} and multilinear rank (R_1, . . . , R_d).
for i = 1, . . . , d do
  U^{(i)} ← R_i leading left singular vectors of X_{(i)}
end for
G = X ×_{i=1}^{d} (U^{(i)})^T.
return Rank-(R_1, . . . , R_d) Tucker decomposition X̂ = G ×_{i=1}^{d} U^{(i)}
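A compact dense implementation of Algorithm 2 for a three-way real tensor could look as follows; this is a sketch, and the unfolding convention, helper functions and test tensor are our choices.

```python
import numpy as np

def unfold(X, i):   # mode-i unfolding (0-based), first remaining index fastest
    return np.reshape(np.moveaxis(X, i, 0), (X.shape[i], -1), order='F')

def mode_product(X, U, i):
    return np.moveaxis(np.tensordot(X, U, axes=(i, 1)), -1, i)

def hosvd(X, ranks):
    Us = []
    for i, R in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, i), full_matrices=False)
        Us.append(U[:, :R])                    # R_i leading left singular vectors of X_(i)
    G = X
    for i, U in enumerate(Us):
        G = mode_product(G, U.T, i)            # core G = X x_i (U^(i))^T for all i
    return G, Us

# Example: a separable (rank-1) tensor compressed to multilinear rank (3, 3, 3).
x = np.linspace(0, 1, 20)
X = np.exp(-np.add.outer(np.add.outer(x, x), x))   # X[i,j,k] = exp(-(x_i + x_j + x_k))
G, Us = hosvd(X, (3, 3, 3))
X_hat = G
for i, U in enumerate(Us):
    X_hat = mode_product(X_hat, U, i)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # tiny truncation error
```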

In contrast to the matrix case, the truncated tensor X̂ resulting from the HOSVD is usually not optimal. However, it holds:

Proposition 2. ‖X − X̂‖ ≤ √( Σ_{j=1}^{d} Σ_{i=r_j+1}^{I_j} (σ_i^{(j)})² ) ≤ √d · min_{Y ∈ T_{(r_1,...,r_d)}} ‖X − Y‖.

Proof. Since the mode frames U^{(j)} are all orthonormal, we can view the HOSVD as a product of orthogonal projections P_j = I ⊗ . . . ⊗ I ⊗ U^{(j)}(U^{(j)})^H ⊗ I ⊗ . . . ⊗ I. For orthogonal projections, we have

‖(I − P_1 P_2)X‖² = ‖(I − P_1)X‖² + ‖P_1(I − P_2)X‖² ≤ ‖(I − P_1)X‖² + ‖(I − P_2)X‖². (41)

Thus, it holds that

‖X − X̂‖² ≤ Σ_{j=1}^{d} ‖(I − P_j)X‖². (42)

Looking at one projection at a time, we are in the matrix case (X_{(j)}), where we have by the Eckart-Young-Mirsky theorem

‖(I − P_j)X‖² = Σ_{i=r_j+1}^{I_j} (σ_i^{(j)})² (43)

and that P_j X is a best approximation in K^{I_1} ⊗ . . . ⊗ K^{I_{j−1}} ⊗ V^{(j)} ⊗ K^{I_{j+1}} ⊗ . . . ⊗ K^{I_d} with dim(V^{(j)}) = r_j. Hence, we also have

‖(I − P_j)X‖ ≤ ‖X − X̂‖. (44)


The Tucker decomposition computed from the HOSVD can be further improved by an ALS algorithm, the so-called higher-order orthogonal iteration (HOOI).

While the Tucker decomposition comes with closedness and SVD-based compression, the storage requirement of r^d for the core tensor becomes increasingly unattractive. Next, we will introduce two formats that overcome the exponential storage requirements of the Tucker format while retaining its advantages.

6 Hierarchical Tucker format

The hierarchical Tucker decomposition is a multi-level variant of the Tucker decomposition. The hierarchy is organized by a dimension tree.

Definition 13. A dimension tree is a binary tree T with the following properties:

• each node is a mode cluster, i.e. a subset of the modes {1, . . . , d},

• the root node is {1, . . . , d},

• each leaf node is a singleton,

• each parent node is the disjoint union of its two children.

The set of leaf nodes is denoted by L(T) and the set of non-leaf nodes by N(T) = T \ L(T).

The number of non-leaf nodes is always d − 1. To simplify the notation, we enumerate the degrees of freedom such that the indices of the left child node are lower than the indices of the right child node for every parent node. For a node α ∈ T, we denote the left and right child nodes by α_l and α_r.

The starting point for the hierarchical Tucker format is the following nestedness property.

Lemma 6. Let X ∈ K^{I_1×I_2×...×I_d} and α = α_l ∪ α_r. Then, span(X_{(α)}) ⊂ span(X_{(α_r)} ⊗ X_{(α_l)}).

Proof. For ease of notation, let α_l = {l, . . . , m} and α_r = {m + 1, . . . , r}. A column of X_{(α)} can be considered as the unfolding of a tensor Y ∈ K^{I_l×...×I_r}. The columns of the unfolding Y_{(α_l)} are contained in span(X_{(α_l)}). Using the representation of the projection onto the column space of X_{(α_l)} by X_{(α_l)} (X_{(α_l)})†, we get Y_{(α_l)} = X_{(α_l)} (X_{(α_l)})† Y_{(α_l)}. Analogously, we get (Y_{(α_l)})^T = Y_{(α_r)} = X_{(α_r)} (X_{(α_r)})† Y_{(α_r)}. Defining V := (X_{(α_l)})† Y_{(α_l)} ((X_{(α_r)})†)^T, we get

Y_{(α_l)} = X_{(α_l)} V (X_{(α_r)})^T  and  vec(Y) = ( X_{(α_r)} ⊗ X_{(α_l)} ) vec(V). (45)

If we have bases U_α ∈ K^{I_α×r_α}, U_{α_l} ∈ K^{I_{α_l}×r_{α_l}} and U_{α_r} ∈ K^{I_{α_r}×r_{α_r}} for the column spaces of X_{(α)}, X_{(α_l)}, and X_{(α_r)}, respectively, Lemma 6 implies the existence of a transfer matrix B_α ∈ K^{r_{α_l} r_{α_r} × r_α} with

U_α = (U_{α_r} ⊗ U_{α_l}) B_α. (46)

Usually, we reshape the transfer matrix into a three-way transfer tensor B_α ∈ K^{r_{α_l}×r_{α_r}×r_α}. The definition of a hierarchical Tucker tensor is based on the recursive application of this construction.

Example 2. Let us look at the example of d = 4 with a balanced dimension tree (cf. Figure 3). Starting from the vectorization, we get

vec(X) = X_{({1,2,3,4})} = (U_{34} ⊗ U_{12}) B_{1234}. (47)


[Figure 3: Dimension tree for a four-way hierarchical Tucker tensor (balanced tree): the root {1, 2, 3, 4} has children {1, 2} and {3, 4}, which in turn have the leaves {1}, {2} and {3}, {4}.]

Next, we further split the nodes {1, 2} and {3, 4} to get

U_{12} = (U_2 ⊗ U_1) B_{12},  U_{34} = (U_4 ⊗ U_3) B_{34}. (48)

Finally, putting everything together, we can represent the vectorization of X as

vec(X) = (U_4 ⊗ U_3 ⊗ U_2 ⊗ U_1)(B_{34} ⊗ B_{12}) B_{1234}. (49)

For the representation of a tensor in hierarchical Tucker format, we store the mode frames U_µ ∈ K^{I_µ×r_µ} and the transfer tensors B_α ∈ K^{r_{α_l}×r_{α_r}×r_α}, where r_α = 1 for the root node. For each node α ∈ T, we can recursively define the mode frames

U_α = (U_{α_r} ⊗ U_{α_l}) B_α, (50)

starting from the leaf frames. For the root node, U_{{1,...,d}} = vec(X).

The storage complexity for a d-dimensional tensor in hierarchical Tucker format with all ranks equal to r and all mode lengths equal to n is given by

d n r + (d − 2) r³ + r². (51)

The first part is for the d mode frames, the second for the d − 2 non-root three-way transfer tensors, and the last term for the transfer tensor of the root.

In the following, we denote by X ∈ K^{I_1×I_2×...×I_d} a tensor, by U_n its mode frames, and by B_α its transfer tensors for each index subset α in the dimension tree. We will now study the most important algorithms for tensors in hierarchical Tucker format, following the presentation in [21].

Basic operations

• The n-mode product of a tensor X in hierarchical Tucker format, Y = X ×_n A, corresponds to replacing the n-mode frame U_n by AU_n.

• The addition of two tensors X^{[1]}, X^{[2]} in hierarchical Tucker format corresponds to the concatenation of the mode frames and a block-diagonal embedding of the transfer tensors. Note that this means that the ranks of the individual tensors add up. Figure 4 illustrates the addition of two six-way tensors.


Figure 4: Addition of two tensors (blue and green) in hierarchical Tucker format.

6.1 Orthogonalization

Definition 14. A hierarchical Tucker decomposition of a tensor is called orthogonalized if the columns of the mode frames U_α form an orthonormal basis for all nodes but the root.

Let us again look at the example of a four-way tensor:

vec(X) = (U_4 ⊗ U_3 ⊗ U_2 ⊗ U_1)(B_{34} ⊗ B_{12}) B_{1234}. (52)

In a first step, we compute the QR decompositions of the four mode frames: U_α =: Ũ_α R_α for α ∈ {1, 2, 3, 4}. Defining B̃_{12} := (R_2 ⊗ R_1) B_{12} and B̃_{34} := (R_4 ⊗ R_3) B_{34}, we can rewrite (52) as

vec(X) = (Ũ_4 ⊗ Ũ_3 ⊗ Ũ_2 ⊗ Ũ_1)(B̃_{34} ⊗ B̃_{12}) B_{1234}, (53)

where the mode frames Ũ_1, Ũ_2, Ũ_3, Ũ_4 are now orthogonal. We continue by computing the QR decompositions of the transfer matrices, B̃_{12} =: B̂_{12} R_{12} and B̃_{34} =: B̂_{34} R_{34}. Defining B̃_{1234} = (R_{34} ⊗ R_{12}) B_{1234}, we get the following representation of the tensor:

vec(X) = (Ũ_4 ⊗ Ũ_3 ⊗ Ũ_2 ⊗ Ũ_1)(B̂_{34} ⊗ B̂_{12}) B̃_{1234}. (54)

This representation is orthogonalized since

U_{12} = (Ũ_2 ⊗ Ũ_1) B̂_{12} (55)

is a product of matrices with orthonormal columns and as such again has orthonormal columns.

Generalized to arbitrary dimension, we get the following algorithm for the orthogonalization of tensors:


Algorithm 3 Orthogonalization of a tensor in hierarchical Tucker format.

Require: Mode frames U_α and transfer tensors B_α defining a tensor X in hierarchical Tucker format.
Ensure: Mode frames U_α and transfer tensors B_α defining an orthogonalized tensor X in hierarchical Tucker format.
for α ∈ L(T) do
  Compute the QR decomposition U_α =: U_α R_α.
end for
for α ∈ N(T) except the root (visiting both children before the parent node) do
  Form B_α = (R_{α_r} ⊗ R_{α_l}) B_α.
  Compute the QR decomposition B_α =: B_α R_α.
end for
Root node: Form B_α = (R_{α_r} ⊗ R_{α_l}) B_α.

The complexity of the orthogonalization algorithm is O(dnr² + dr⁴), determined by the complexity of the d QR decompositions of the mode frames and the d − 1 QR decompositions of the transfer matrices.

6.2 Tensor-tensor contraction

6.2.1 Inner product

The inner product of two tensors X^{[1]}, X^{[2]} ∈ K^{I_1×...×I_d} is defined as

⟨X^{[1]}, X^{[2]}⟩ = Σ_{i_1=1}^{I_1} · · · Σ_{i_d=1}^{I_d} X^{[1]}_{i_1,...,i_d} X^{[2]}_{i_1,...,i_d} = ⟨vec(X^{[1]}), vec(X^{[2]})⟩. (56)

For two tensors in hierarchical Tucker format with equal dimension tree, we can express this operation in terms of mode frames and transfer matrices. Let us express this again for the example of the four-way tensor with balanced dimension tree:

⟨X^{[1]}, X^{[2]}⟩ = (B^{[1]}_{1234})^H (B^{[1]}_{34} ⊗ B^{[1]}_{12})^H (U^{[1]}_4 ⊗ U^{[1]}_3 ⊗ U^{[1]}_2 ⊗ U^{[1]}_1)^H (U^{[2]}_4 ⊗ U^{[2]}_3 ⊗ U^{[2]}_2 ⊗ U^{[2]}_1)(B^{[2]}_{34} ⊗ B^{[2]}_{12}) B^{[2]}_{1234}. (57)

This product can be evaluated from the inside to the outside, i.e. from the leaves to the root. On each leaf, we can form M_µ = (U^{[1]}_µ)^H U^{[2]}_µ in 2 n_µ r_µ² operations. Next, we can form on each non-leaf node α (visiting both children before the parent)

M_α = (B^{[1]}_α)^H (M_{α_r} ⊗ M_{α_l}) B^{[2]}_α, (58)

taking 6r⁴ operations for constant rank. The total complexity, fixing n_µ = n, is then 6(d − 1)r⁴ + 2dnr². The algorithm is summarized for arbitrary dimension in Algorithm 4.


Algorithm 4 Inner product of two tensors in hierarchical Tucker format.

Require: Tensors X^{[1]}, X^{[2]} in hierarchical Tucker format with equal dimension tree and mode sizes.
Ensure: Inner product ⟨X^{[1]}, X^{[2]}⟩.
for α ∈ L(T) do
  Form M_α := (U^{[1]}_α)^H (U^{[2]}_α).
end for
for α ∈ N(T) (visiting both children before the parent node) do
  Form M_α := (B^{[1]}_α)^H (M_{α_r} ⊗ M_{α_l}) (B^{[2]}_α).
end for
return M_{α_root}

6.2.2 Reduced Gramians

Definition 15. For every α ∈ T, the mode frame U_α contains a basis for the column space of X_{(α)} and there is a matrix V_α ∈ K^{I_{α^c}×r_α} (where I_{α^c} = Π_{j∉α} I_j) such that X_{(α)} = U_α V_α^H. The reduced Gramian G_α ∈ K^{r_α×r_α} is defined as

G_α = V_α^H V_α. (59)

The reduced Gramian is a Hermitian positive semi-definite matrix.

Reduced Gramians provide an efficient way to compute the singular values of X_{(α)}: if U_α is orthonormal, the singular values of X_{(α)} are the square roots of the eigenvalues of the reduced Gramian. This can be seen from the following identity:

X_{(α)} X_{(α)}^H = U_α V_α^H V_α U_α^H = U_α G_α U_α^H. (60)

Later, we will also see that reduced Gramians are useful for the truncation of tensors.

Usually, we need the Gramians for each node in the dimension tree. Therefore, we discuss an algorithm that computes the Gramians for all nodes simultaneously. First of all, we note that G_root = 1. Now, let us assume the reduced Gramian for a node α is available and we want to compute the reduced Gramians for the child nodes α_l, α_r. For this, we use

X_{(α)} X_{(α)}^H = U_α G_α U_α^H = (U_{α_r} ⊗ U_{α_l}) B_α G_α B_α^H (U_{α_r} ⊗ U_{α_l})^H. (61)

From this we see that we can compute the Gramian G_{α_l} ∈ K^{r_{α_l}×r_{α_l}} as

G_{α_l} = (B_α)_{(1)} ( G_α ⊗ U_{α_r}^H U_{α_r} ) (B_α)_{(1)}^H. (62)


Algorithm 5 Reduced Gramians of a tensor in hierarchical Tucker format.

Require: Mode frames U_α and transfer tensors B_α defining a tensor X in hierarchical Tucker format.
Ensure: Reduced Gramians G_α for all α ∈ T.
for α ∈ L(T) do
  Form M_α = U_α^H U_α.
end for
for α ∈ N(T) (visiting both children before the parent node) do
  Form M_α := (B_α)^H (M_{α_r} ⊗ M_{α_l}) (B_α).
end for
Set G_root = 1.
for α ∈ N(T) (visiting the parent node before the children) do
  Form G_{α_r} = (B_α)_{(2)} (G_α ⊗ M_{α_l}) (B_α)_{(2)}^H.
  Form G_{α_l} = (B_α)_{(1)} (G_α ⊗ M_{α_r}) (B_α)_{(1)}^H.
end for

6.3 Truncation

Finally, we consider the question of how to compute a hierarchical Tucker decomposition of a tensor. We consider two truncation algorithms: firstly, we address the question of how to compute a hierarchical Tucker representation of a full tensor and, secondly, we discuss how to truncate a given hierarchical Tucker representation to a representation with lower rank. The latter is necessary since basic operations like addition increase the rank of hierarchical Tucker decompositions.

6.3.1 Truncation of a full tensor

In order to compute a hierarchical Tucker decomposition of a tensor X ∈ K^{I_1×...×I_d}, we can use a hierarchical version of the HOSVD. For each subset α ⊂ {1, . . . , d} in the tree, we project onto the subspaces span(W_α), which represent approximations of the column spaces of X_{(α)}. For a given rank r_α, the matrices W_α can be constructed from the r_α dominant left singular vectors of X_{(α)}. For our four-way tensor with balanced tree, we get the following projection X̂ of X:

vec(X̂) = (W_4 W_4^H ⊗ W_3 W_3^H ⊗ W_2 W_2^H ⊗ W_1 W_1^H)(W_{34} W_{34}^H ⊗ W_{12} W_{12}^H) vec(X)
        = (W_4 ⊗ W_3 ⊗ W_2 ⊗ W_1) [ (W_4^H ⊗ W_3^H) W_{34} ⊗ (W_2^H ⊗ W_1^H) W_{12} ] [ (W_{34}^H ⊗ W_{12}^H) vec(X) ], (63)

where (W_4^H ⊗ W_3^H) W_{34} =: B_{34}, (W_2^H ⊗ W_1^H) W_{12} =: B_{12}, and (W_{34}^H ⊗ W_{12}^H) vec(X) =: B_{1234} define the transfer tensors of the truncated tensor. This yields the following root-to-leaf truncation algorithm:


Algorithm 6 Root-to-leaf truncation to hierarchical Tucker format.

Require: Full tensor X ∈ K^{I_1×...×I_d}, dimension tree T, and desired hierarchical ranks (r_α)_{α∈T} of the truncated tensor.
Ensure: Tensor X̂ in hierarchical Tucker format with rank(X̂_{(α)}) ≤ r_α for all α ∈ T.
for α ∈ L(T) do
  Compute the SVD X_{(α)} =: U_α Σ_α V_α^H.
  Set U_α := U_α(:, 1 : r_α).
end for
for α ∈ N(T) \ {root} (visiting both children before the parent node) do
  Compute the SVD X_{(α)} =: U_α Σ_α V_α^H.
  Set U_α := U_α(:, 1 : r_α).
  Form B_α := (U_{α_r}^H ⊗ U_{α_l}^H) U_α.
end for
Form B_root := (U_{root_r}^H ⊗ U_{root_l}^H) vec(X).

Lemma 7. The tensor X̂ computed by Algorithm 6 satisfies the following error bound:

‖X − X̂‖ ≤ √( Σ_{α∈T′} Σ_{i=r_α+1}^{n_α} (σ_i^{(α)})² ) ≤ √(2d − 3) · min_{Y ∈ HT(r_1,...,r_N)} ‖X − Y‖, (64)

where T′ = T \ {α_root, α_{root_l}}.

Proof. The tensor X̂ is constructed by one projection per node in the tree T except for the root node. Moreover, the SVDs for the two child nodes of the root are the same. This gives the factor 2d − 3. The estimate for the individual projections is as in Proposition 2.

The error estimate (64) can be used for the construction of a hierarchical Tucker tensor with a certain error bound. Choosing r_α such that √( Σ_{i=r_α+1}^{n_α} (σ_i^{(α)})² ) ≤ ε_abs/√(2d − 3), the hierarchical Tucker tensor satisfies an absolute error bound of ε_abs. For a relative error bound of ε_rel, choose r_α such that √( Σ_{i=r_α+1}^{n_α} (σ_i^{(α)})² ) ≤ ε_rel ‖X‖/√(2d − 3).

6.3.2 Truncation of hierarchical Tucker tensor to lower rank

Next, we consider the question of how a tensor in hierarchical Tucker format can be truncated to a hierarchical Tucker tensor of lower rank. We assume that the original tensor has been orthogonalized. As for the truncation of a full tensor, we define projections W_α for each node in the tree (except the root). In order to construct the projections, we compute the Gramians

X_{(α)} X_{(α)}^H = U_α G_α U_α^H. (65)

Since the U_α are orthonormal, we can base our projection on the r_α dominant eigenvectors S_α of G_α. The projection is then defined via W_α := U_α S_α. To illustrate the truncation algorithm, we revisit the 4D example:

vec(X̂) = (W_4 W_4^H ⊗ W_3 W_3^H ⊗ W_2 W_2^H ⊗ W_1 W_1^H)(W_{34} W_{34}^H ⊗ W_{12} W_{12}^H) vec(X)
        = (W_4 ⊗ W_3 ⊗ W_2 ⊗ W_1) ( (W_4^H ⊗ W_3^H) W_{34} ⊗ (W_2^H ⊗ W_1^H) W_{12} ) ( (W_{34}^H ⊗ W_{12}^H) vec(X) ). (66)


Now, we use that U_{12} = (U_2 ⊗ U_1) B_{12}, vec(X) = U_{1234} = (U_{34} ⊗ U_{12}) B_{1234}, as well as the definition of W_α to get

vec(X̂) = (U_4 S_4 ⊗ U_3 S_3 ⊗ U_2 S_2 ⊗ U_1 S_1) ( (S_4^H ⊗ S_3^H) B_{34} S_{34} ⊗ (S_2^H ⊗ S_1^H) B_{12} S_{12} ) ( S_{34}^H ⊗ S_{12}^H ) B_{1234}. (67)

The general algorithm is described in Algorithm 7.

Algorithm 7 Truncation of a tensor in hierarchical Tucker format.

Require: Tensor X in hierarchical Tucker format and desired hierarchical ranks (r_α)_{α∈T} of the truncated tensor.
Ensure: Tensor X̂ in hierarchical Tucker format with rank(X̂_{(α)}) ≤ r_α for all α ∈ T.
Orthogonalize X.
Compute the reduced Gramians G_α.
for α ∈ T \ {root} do
  Compute the symmetric eigenvalue decomposition G_α =: S_α Σ_α² S_α^H.
  Set S_α := S_α(:, 1 : r_α).
end for
Set S_root = 1.
for α ∈ L(T) do
  Set U_α := U_α S_α.
end for
for α ∈ N(T) do
  Set B_α := (S_{α_r} ⊗ S_{α_l})^H B_α S_α.
end for

The complexity of the algorithm is determined by the complexity of its components:

• orthogonalization: O(dnk² + dk⁴),
• reduced Gramians: O(dk⁴),
• eigenvalues of the reduced Gramians: O(dk³),
• computation of U_α: O(dnk²),
• computation of B_α: O(dk⁴).

Mathematically, the algorithm is identical to the truncation algorithm for a full tensor. Hence, we can use the same error estimates.

7 Tensor Trains

The tensor train format was introduced by Ivan V. Oseledets [23] and is similar to a hierarchical Tucker decomposition with an unbalanced tree. In the tensor train format the dimensions are ordered in a linear chain (a "train") and each dimension is represented by a three-way tensor (called a core) coupling to the previous and the following dimension. A tensor element X_{i_1,...,i_d} is represented by

X_{i_1,...,i_d} = Σ_{α_0=1}^{r_0} Σ_{α_1=1}^{r_1} · · · Σ_{α_d=1}^{r_d} G_1(α_0, i_1, α_1) G_2(α_1, i_2, α_2) · · · G_{d−1}(α_{d−2}, i_{d−1}, α_{d−1}) G_d(α_{d−1}, i_d, α_d). (68)


For the ranks at the boundaries, it holds that r_0 = r_d = 1, and the first and last cores G_1, G_d are essentially matrices. The storage complexity of the tensor train format is (d − 2)nr² + 2nr.

A tensor train decomposition of a full tensor can be obtained by a chain of SVDs as described in Algorithm 8.

Algorithm 8 TT-SVD.

Require: Full tensor X ∈ K^{I_1×...×I_d}, prescribed ranks r_1, . . . , r_{d−1}.
Ensure: TT tensor approximation of X represented by cores G_k ∈ K^{r_{k−1}×I_k×r_k}.
Set the remainder matrix C = X_{(1)} and r_0 = 1.
for k = 1, . . . , d − 1 do
  Set N = Π_{i=k+1}^{d} I_i and reshape the remainder C := reshape(C, [r_{k−1} I_k, N]).
  Compute the SVD C =: USV^T.
  Set the new core (G_k)_{(1,2)} = U(:, 1 : r_k).
  Set the new remainder C = S(1 : r_k, 1 : r_k) V(1 : N, 1 : r_k)^T.
end for
Set the last core G_d = reshape(C, [r_{d−1}, I_d, 1]).
return Cores G_k.
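A dense reference implementation of Algorithm 8 could look as follows; this is a sketch using numpy's default C-ordered reshapes, and the reconstruction helper and test tensor are our additions.

```python
import numpy as np

def tt_svd(X, ranks):
    """TT-SVD with prescribed ranks (r_1, ..., r_{d-1}); r_0 = r_d = 1."""
    d = X.ndim
    cores = []
    r_prev = 1
    C = np.asarray(X)                              # remainder, starts as the full tensor
    for k in range(d - 1):
        C = C.reshape(r_prev * X.shape[k], -1)     # group (alpha_{k-1}, i_k) as rows
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(ranks[k], len(s))
        cores.append(U[:, :r].reshape(r_prev, X.shape[k], r))
        C = np.diag(s[:r]) @ Vt[:r, :]             # new remainder S V^T
        r_prev = r
    cores.append(C.reshape(r_prev, X.shape[-1], 1))
    return cores

def tt_to_full(cores):
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(-1, 0))       # contract the rank index
    return T.squeeze(axis=(0, -1))

# Example: a separable 4-way tensor has TT-ranks 1, so the error is tiny.
x = np.linspace(0, 1, 10)
X = np.exp(-(x[:, None, None, None] + x[None, :, None, None]
             + x[None, None, :, None] + x[None, None, None, :]))
cores = tt_svd(X, ranks=(2, 2, 2))
print([G.shape for G in cores], np.linalg.norm(X - tt_to_full(cores)))
```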

In the TT-SVD, we perform d − 1 SVDs, and therefore the following quasi-optimality result can be obtained in analogy to Proposition 2:

Corollary 2. Let X ∈ K^{I_1×...×I_d} and let X̂ be the result of Algorithm 8 with TT-ranks (r_1, . . . , r_{d−1}). Then, it holds that

‖X − X̂‖ ≤ √(d − 1) · min_{Y ∈ TT(r_1,...,r_{d−1})} ‖X − Y‖. (69)

Instead of ranks, we can also prescribe a tolerance. An approximation with relative error bounded by ε can be achieved by computing in each step a δ-truncated SVD with

C_i = U_i S_i V_i^T + E_i,  S_i ∈ K^{r_i×r_i} such that ‖E_i‖_F ≤ δ, (70)

where δ = ε ‖X‖/√(d − 1).

As for the hierarchical Tucker format, a variant of the TT-SVD for the recompression of TT tensors can be designed based on an orthogonalized representation (cf. Algorithm 9). For this algorithm we have to perform d − 1 QR and SVD decompositions of matrices of size nr × r. Hence, the complexity is in O(dnr³). Basic operations like addition, multiplication and contractions can be performed similarly as in the hierarchical Tucker format.


Algorithm 9 TT rounding.

Require: Tensor X ∈ K^{I_1×...×I_d} in tensor train format, represented by cores G_k; required relative tolerance ε.
Ensure: TT tensor approximation of X represented by cores H_k ∈ K^{r_{k−1}×I_k×r_k}.
Compute the truncation parameter δ = ε ‖X‖/√(d − 1).
G̃_d = G_d.
Right-to-left orthogonalization:
for k = d, . . . , 2 do
  Compute the LQ decomposition of (G̃_k)_{(1)} =: LQ.
  Set G̃_k = Q and G̃_{k−1} := G_{k−1} ×_3 L.
end for
H_1 = G̃_1.
for k = 1, . . . , d − 1 do
  Compute the δ-truncated SVD H_k =: H_k S V^T (keeping the left factor as the new core H_k).
  Set H_{k+1} = G̃_{k+1} ×_1 (V S)^T.
end for
return Cores H_k of the truncated TT tensor.

Grasedyck and Hackbusch [12] have shown the following rank bounds comparing the tensor train and the hierarchical Tucker format:

(a) The ranks in hierarchical Tucker format with an arbitrary dimension tree are bounded by the square of the ranks in TT format.

(b) Consider a tensor of dimension d = 2^p ≥ 2 in hierarchical Tucker format with a complete binary canonical dimension tree. Then the TT-ranks are bounded by k^{⌈p/2⌉} if the hierarchical Tucker ranks are bounded by k.

8 Linear systems

The solution of linear systems is a common task in the solution of partial differential equations. To define a linear system of equations in a tensor space, let us consider a linear operator

A = A(x_1, x_2, . . . , x_d, y_1, y_2, . . . , y_d) ∈ K^{I_1×...×I_d×I_1×...×I_d}. (71)

The action of the linear operator on a tensor X ∈ K^{I_1×...×I_d} is defined pointwise as

(A X)_{y_1,...,y_d} = Σ_{x_1=1}^{I_1} · · · Σ_{x_d=1}^{I_d} A(x_1, . . . , x_d, y_1, . . . , y_d) X_{x_1,...,x_d}. (72)

Using an unfolding of the operator and the vectorization of the tensors, we can express a linear system in a tensor space in matrix-vector form:

A_{(1,...,d)} vec(X) = vec(B). (73)

In this section, we will discuss how to solve such a linear system in tensor formats, especially in the tensor train format. For this we assume that there is a representation/approximation of A, X and B in the tensor format.


8.1 Iterative solution

For large linear systems, iterative solvers like the conjugate gradient method or GMRES are commonly used. These solvers rely on successive matrix-vector products. Since matrix-vector products are defined in tensor formats, an iterative solution of linear systems is also possible in tensor formats. However, the matrix-vector products need to be combined with truncation in order to avoid growing ranks. In order to achieve fast convergence, we usually need a preconditioner. If we use a tensor format, we need to provide a preconditioner that can be represented in the tensor format. One possibility is to use an approximation in Kronecker form, since there exists an analytic expression for the action of the inverse, as we will discuss in the next paragraph.

8.2 Inversion of linear systems with Kronecker product structure

Under certain conditions, a simple expression for the inverse of a matrix in Kronecker form can be given [10]:

Proposition 3. Let A = Σ_{i=1}^{d} Â_i with

Â_i = I ⊗ · · · ⊗ I ⊗ A_i ⊗ I ⊗ · · · ⊗ I,  A_i ∈ R^{I_i×I_i}, (74)

(the factor A_i sitting in the ith position), and let the spectrum of A be contained in the left complex half plane. Then, the inverse can be expressed as

A^{-1} = − ∫_0^∞ ⊗_{i=1}^{d} exp(t A_i) dt. (75)

Proof. For any matrix B with spectrum in the left complex half plane, we have

∂_t exp(tB) = B exp(tB).

Therefore, it follows that

B ∫_0^∞ exp(tB) dt = ∫_0^∞ ∂_t exp(tB) dt = − exp(0 · B) = −I.

Moreover, for t > 0, it holds, since the matrices Â_i commute,

exp(tA) = exp( t Σ_{i=1}^{d} Â_i ) = Π_{i=1}^{d} exp(t Â_i) = ⊗_{i=1}^{d} exp(t A_i).
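The integral formula can be checked numerically; the following sketch verifies (75) for d = 2 using scipy, with arbitrary test matrices (shifted so that their spectra lie in the left half plane) and a simple truncated quadrature.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
n = 3

def stable(M):
    # Shift so that all eigenvalues have real part <= -1 (assumption for the test).
    return M - (np.max(np.linalg.eigvals(M).real) + 1.0) * np.eye(n)

A1, A2 = stable(rng.standard_normal((n, n))), stable(rng.standard_normal((n, n)))

I = np.eye(n)
A = np.kron(A1, I) + np.kron(I, A2)        # d = 2 case of (74)

# Truncate the integral at t = 40; the integrand decays like exp(-2t).
ts = np.linspace(0.0, 40.0, 4001)
vals = np.array([np.kron(expm(t * A1), expm(t * A2)).ravel() for t in ts])
A_inv = -trapezoid(vals, ts, axis=0).reshape(n * n, n * n)

print(np.linalg.norm(A_inv - np.linalg.inv(A)))   # small, limited by the quadrature error
```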

For finite element matrices, the following generalization of the result is useful:

Corollary 3. Let A = Σ_{i=1}^{d} Â_i with

Â_i = M_d ⊗ · · · ⊗ M_{i+1} ⊗ A_i ⊗ M_{i−1} ⊗ · · · ⊗ M_1,  M_i, A_i ∈ R^{I_i×I_i}, (76)

and M_i regular. If the sum of the spectra of the M_i^{-1} A_i is contained in the left complex half plane, the inverse can be expressed as

A^{-1} = − ∫_0^∞ ⊗_{i=1}^{d} exp(t M_i^{-1} A_i) M_i^{-1} dt. (77)


8.3 Alternating least squares

Another way to solve linear systems with symmetric positive definite operators is based on an ALS algorithm. In this section, we concentrate on the tensor train format.

First, we reformulate the system (73) as an optimization problem:

X = arg min J(X), (78)

where J(X) := ⟨A_{(1,...,d)} vec(X), vec(X)⟩ − 2⟨vec(B), vec(X)⟩. The first-order optimality condition of this optimization problem is given by the linear equation and is a sufficient criterion for a minimizer due to the fact that the matrix is symmetric positive definite.

In order to define the micro-iterations, we define the one-component retraction operators P_{i,G}, based on the current iterate represented by the cores (G_1, . . . , G_d) of the ALS scheme:

P_{i,G} : K^{r_{i−1}×I_i×r_i} → K^{I_1×I_2×...×I_d},
V ↦ ( Σ_{α_1,...,α_{d−1}} G_1(j_1, α_1) G_2(α_1, j_2, α_2) · · · G_{i−1}(α_{i−2}, j_{i−1}, α_{i−1}) V(α_{i−1}, j_i, α_i) G_{i+1}(α_i, j_{i+1}, α_{i+1}) · · · G_d(α_{d−1}, j_d) )_{j_1,...,j_d}. (79)

Then, the i-th micro-iteration solves the optimization problem

V_i = arg min { (J ∘ P_{i,G})(V) : V ∈ K^{r_{i−1}×I_i×r_i} }, (80)

where (J ∘ P_{i,G})(V) = ⟨A_{(1,...,d)} P_{i,G} vec(V), P_{i,G} vec(V)⟩ − 2⟨vec(B), P_{i,G} vec(V)⟩, interpreting P_{i,G} as a matrix acting on the vectorized core. The first-order optimality condition is then given by

B_{i,G} vec(V) := P_{i,G}^T A_{(1,...,d)} P_{i,G} vec(V) = P_{i,G}^T vec(B). (81)

If the retraction operators are orthogonal, B_{i,G} retains the symmetry and positive definiteness of the operator A. This can be ensured by orthogonalizing the current iterate such that the cores (G_1, . . . , G_{i−1}, G_{i+1}, . . . , G_d) are orthogonal, which can be achieved by QR decompositions.

System (81) is of size r_{i−1} I_i r_i × r_{i−1} I_i r_i, i.e. much smaller than the original system. It can be solved by either a direct or an iterative solver, depending on its size. The matrix B_{i,G} can be formed by a recursive formula in O(dn²r²R² + dn²r³R) operations, where r is a rank bound for the tensor X and R is a rank bound for the linear operator A, and the right-hand side can be projected in O(r³nd) operations (cf. [18, 25]).

Algorithm 10 summarizes the structure of the ALS-based solution of linear systems.

Algorithm 10 ALS algorithm to solve linear systems in tensor train format.

Require: Tensor train representations of A and B, TT rank vector (r_1, . . . , r_{d−1}) and initial guess X in TT format, represented by cores G_1, . . . , G_d. Termination criterion.
Ensure: Solution of AX = B satisfying the termination criterion.
for k = d, . . . , 2 do   (right-to-left orthogonalization)
  Compute the LQ decomposition (G_k)_{(1)} =: LQ.
  Update G_k = Q, G_{k−1} := G_{k−1} ×_3 L.
end for
while the termination criterion is not satisfied do
  for k = 1, . . . , d − 1 do
    Form B_{k,G} and b_k = P_{k,G}^T vec(B).
    Solve B_{k,G} u_k = b_k.
    Update the core G_k with u_k; compute the QR decomposition G_k =: QR; update G_k = Q, G_{k+1} := G_{k+1} ×_1 R.   (orthogonalization)
  end for
  Repeat the for loop in reverse order.
end while


Note that the special case of A being the identity yields an ALS-based truncation algorithm.

MALS A major disadvantage of the ALS algorithm is the fact that the TT-rank vector (r_1, . . . , r_{d−1}) is fixed during the ALS iterations, i.e. it does not allow for rank adaptivity. Rank adaptivity can be achieved by the MALS scheme, which combines two dimensions in the micro-iterations. The simplest MALS algorithm solves the micro-iteration problem as in the ALS algorithm and then separates the two dimensions using a truncated SVD, where the inner rank r_i (for the fused modes) is chosen to meet a certain accuracy.

AMEn Another variant of the ALS method is the alternating minimal energy method (AMEn), which expands the search space by an inexact gradient direction [6]. Let us recall the steepest descent (SD) algorithm for a system Ax = b. Given an initial guess x_0, the SD algorithm improves the iterate along the gradient direction r = −grad J(x_0) = b − Ax_0 by x_1 = x_0 + rh with

h = arg min_{h′} J(x_0 + rh′) = ⟨r, r⟩ / ⟨r, Ar⟩. (82)

The inexact SD algorithm updates x_0 along an approximation z ≈ r with

x_1 = x_0 + zh,  h = arg min_{h′} J(x_0 + zh′) = ⟨z, r⟩ / ⟨z, Az⟩ = ( ⟨z, b⟩ − ⟨Az, x_0⟩ ) / ⟨z, Az⟩. (83)

The AMEn algorithm is a recursive algorithm that works in three steps:

1. Optimize the first TT core G_1 → H_1.

2. Find a TT representation Z_1 of the first core of the residual (up to a certain accuracy or rank bound) and expand the TT core by Z_1:

H_1 = ( H_1  Z_1 ). (84)

3. Reduce the system to dimension d − 1 by projection onto the first core.

9 Generalized Cross Approximation

The higher-order singular value decomposition provides an algorithm to construct a low-rank approximation of a full tensor when all elements are explicitly given. In this section, we are concerned with the question of how to find a low-rank approximation of a tensor given in functional form (i.e. a function is given from which all tensor elements can be computed) without forming/storing the full tensor. First, we discuss the cross approximation for matrices and then a generalization to tensors in the hierarchical Tucker format proposed in [2]. Generalized cross approximations have also been proposed for the tensor train format (see e.g. [24]).

9.1 Cross approximation of matrices

A matrix A ∈ R^{I_1×I_2} of rank r can be represented as

A = A|_{I_1×Q} S^{-1} A|_{P×I_2}, (85)

where P ⊂ {1, . . . , I_1} and Q ⊂ {1, . . . , I_2} define a set of r row and column indices of A, respectively, and S = A|_{P×Q}. This is referred to as the skeleton decomposition of a matrix. A cross or skeleton approximation of a matrix is an approximation of the matrix of the form (85). In [9], the following result on the approximation properties of the cross approximation was proven:


Theorem 2 ([9, Corollary 3.1]). Let $A \in \mathbb{R}^{I_1 \times I_2}$. Suppose there exists a matrix $A_r \in \mathbb{R}^{I_1 \times I_2}$ of rank $r$ with
\[
\|A - A_r\|_2 \le \varepsilon. \tag{86}
\]
Then, there exist subsets $P \subset \{1, \ldots, I_1\}$ and $Q \subset \{1, \ldots, I_2\}$ of row and column indices of size less than or equal to $r$ and a matrix $S \in \mathbb{R}^{P \times Q}$ such that
\[
\big\|A - A\big|_{I_1 \times Q}\, S^{-1}\, A\big|_{P \times I_2}\big\|_2 \le \varepsilon \left(1 + 2\sqrt{r}\left(\sqrt{I_1} + \sqrt{I_2}\right)\right). \tag{87}
\]

Such a cross approximation can be constructed by successive rank-one approximations of the remainder:
\[
X^{(1)} := A_{:,q_1} \frac{1}{A_{p_1,q_1}} A_{p_1,:}, \qquad
k = 2, \ldots, r: \quad R = A - X^{(k-1)}, \quad X^{(k)} = X^{(k-1)} + R_{:,q_k} \frac{1}{R_{p_k,q_k}} R_{p_k,:}. \tag{88}
\]
The final approximation is given by $X^{(r)}$ and $S = A\big|_{P \times Q}$. For a practical implementation, we need to define suitable index sets $P$, $Q$, which we will refer to as Pivot sets in the following. Ideally, one should use the submatrix $S = A\big|_{P \times Q}$ of maximal volume (determinant in modulus) (cf. [8]). In practice, this criterion is usually relaxed by some greedy strategy in order to speed up computations. Tyrtyshnikov [27] has proposed the so-called row–column alternating algorithm. This algorithm starts with an initial choice of column Pivots $Q$, e.g. chosen at random. One then computes the corresponding column matrix $A\big|_{I_1 \times Q}$, and a set of row Pivot elements $P$ of quasi-maximal volume is computed by the so-called maxvol algorithm (complexity $\mathcal{O}(n r^2)$). Then, the row matrix $A\big|_{P \times I_2}$ is computed and a new set of column Pivots is found by the maxvol algorithm. Sufficiently good submatrices are in practice found after a few iterations.
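A minimal NumPy sketch of the successive rank-one construction (88). For simplicity it uses full pivoting on the remainder (the entry of largest modulus) instead of the maxvol-based row–column alternation; practical implementations avoid forming the full remainder.

```python
import numpy as np

def cross_approx(A, r):
    """Cross approximation by r rank-one corrections of the remainder, eq. (88)."""
    R = A.copy()
    X = np.zeros_like(A)
    P, Q = [], []
    for _ in range(r):
        p, q = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if R[p, q] == 0:
            break                                  # exact rank reached early
        X += np.outer(R[:, q], R[p, :]) / R[p, q]  # rank-one cross update
        R = A - X
        P.append(p); Q.append(q)
    return X, P, Q

# example: a rank-3 matrix is reproduced (up to roundoff) by 3 cross steps
rng = np.random.default_rng(3)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
X, P, Q = cross_approx(A, 3)
print(np.linalg.norm(A - X))                       # close to machine precision
```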

9.2 Generalization for higher order tensors in hierarchical Tucker format

Given a dimension tree $T$, we can in principle use the cross approximation technique to construct a low-rank approximation of the matricizations $A^{(\alpha)} \in \mathbb{R}^{I_\alpha \times I_{\alpha^c}}$,
\[
A^{(\alpha)} \approx A^{(\alpha)}\big|_{I_\alpha \times Q_\alpha}\, S_\alpha^{-1}\, A^{(\alpha)}\big|_{P_\alpha \times I_{\alpha^c}}, \qquad S_\alpha = A^{(\alpha)}\big|_{P_\alpha \times Q_\alpha}. \tag{89}
\]

A hierarchical Tucker approximation can then be constructed defining the mode frames
\[
U_\alpha = A^{(\alpha)}\big|_{I_\alpha \times Q_\alpha}. \tag{90}
\]

In this form, the algorithm is, however, not practical since the index sets $I_\alpha$ are large for non-leaf nodes and we want to instead directly define the transfer tensors. This can be achieved in a recursive algorithm.

To exemplify the algorithm, let us first consider the example of a four-way tensor with balanced dimension tree. First we consider the first child node $\alpha = \{1,2\}$ of the root. With suitably chosen Pivot sets $P_{12}$, $Q_{12}$, the cross approximation for the matricization $A^{(\{1,2\})}$ is given as
\[
A^{(\{1,2\})} \approx A^{(\{1,2\})}\big|_{I_{\{1,2\}} \times Q_{12}}\, S_{12}^{-1}\, A^{(\{1,2\})}\big|_{P_{12} \times I_{\{3,4\}}}, \qquad S_{12} = A^{(\{1,2\})}\big|_{P_{12} \times Q_{12}}. \tag{91}
\]

In a next step, we consider the child node $\alpha = \{1\}$. In this step, we want to ensure nestedness of the Pivot elements, i.e. $Q_1 \subset I_2 \times Q_{12}$, because we are aiming to approximate the left factor matrix $A^{(\{1,2\})}\big|_{I_{\{1,2\}} \times Q_{12}}$. Choosing suitable Pivot sets, we analogously arrive at the following low-rank approximation

\[
A^{(\{1\})} \approx A^{(\{1\})}\big|_{I_1 \times Q_1}\, S_1^{-1}\, A^{(\{1\})}\big|_{P_1 \times I_{\{2,3,4\}}}, \qquad S_1 = A^{(\{1\})}\big|_{P_1 \times Q_1}, \tag{92}
\]
where we form the left factor matrix explicitly to get the mode frame of this leaf node:
\[
U_1 := A^{(\{1\})}\big|_{I_1 \times Q_1}. \tag{93}
\]

In the same way, we approximate the mode frame of the second child $\{2\}$ with Pivot sets $P_2$, $Q_2$,
\[
U_2 := A^{(\{2\})}\big|_{I_2 \times Q_2}. \tag{94}
\]

Once the approximations of the two children are obtained, we can form an approximate transfer tensor for the parent node $\{1,2\}$. Recall that the transfer tensor satisfies the following relation,
\[
(U_\alpha)_{:,j} = \sum_{j_1=1}^{r_{\alpha_l}} \sum_{j_2=1}^{r_{\alpha_r}} (B_\alpha)_{j,j_1,j_2}\, (U_{\alpha_l})_{:,j_1} \otimes (U_{\alpha_r})_{:,j_2}. \tag{95}
\]

If the mode frames of the children are approximated, this relation cannot be satisfied in general. However, we can restrict the relation to the Pivot sets:
\[
(U_{\{1,2\}})_{p_1,p_2,j} = \sum_{j_1 \in Q_1} \sum_{j_2 \in Q_2} (B_{12})_{j,j_1,j_2}\, (U_1)_{p_1,j_1} (U_2)_{p_2,j_2} \quad \text{for all } p_1 \in P_1,\ p_2 \in P_2. \tag{96}
\]

In this formula, we need the mode frames of the children restricted to the Pivot sets, which were defined to be the matrices $S_1$, $S_2$. Also, we need the coupling matrix $M_{12}^j$ for all $j \in Q_{12}$ with elements
\[
(M_{12}^j)_{p_1,p_2} = (A^{(\{1,2\})})_{(p_1,p_2),j} \quad \text{for } p_1 \in P_1,\ p_2 \in P_2. \tag{97}
\]
Then, we can define the transfer tensor as
\[
(B_{12})_{:,:,j} = S_1^{-1} M_{12}^j S_2^{-T}. \tag{98}
\]
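The assembly (98) is plain linear algebra once the pivot matrices are available. The following NumPy sketch uses random matrices of illustrative sizes in place of the actual evaluations of the tensor on the Pivot sets; the diagonal shifts only serve to keep the example well conditioned.

```python
import numpy as np

rng = np.random.default_rng(4)
r1, r2, r12 = 3, 3, 4                    # |P1| = |Q1| = r1, etc. (illustrative)
S1 = rng.standard_normal((r1, r1)) + r1 * np.eye(r1)
S2 = rng.standard_normal((r2, r2)) + r2 * np.eye(r2)
M = rng.standard_normal((r12, r1, r2))   # M[j] plays the role of M_12^j

B12 = np.empty((r1, r2, r12))
for j in range(r12):
    # (B12)_{:,:,j} = S1^{-1} M^j S2^{-T}, computed via two triangular-free solves
    B12[:, :, j] = np.linalg.solve(S1, np.linalg.solve(S2, M[j].T).T)
```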

Analogously, we can handle the right side of the tree. Algorithm 11 summarizes the recursive step for a general tensor, which should be called with the initial arguments $\alpha = \{1, \ldots, d\}$, $Q_\alpha = \emptyset$.

Algorithm 11 Recursive generalized cross approximation.

Require: Node $\alpha$ with column Pivot set $Q_\alpha$.
  if $\alpha \in N$ then
      for $\beta \in \{\alpha_l, \alpha_r\}$ do
          Find Pivot elements $(P_\beta, Q_\beta)$.
      end for
      Assemble $B_\alpha$: $(B_\alpha)_{:,:,j} = S_{\alpha_l}^{-1} M_\alpha^j S_{\alpha_r}^{-T}$.
      for $\beta \in \{\alpha_l, \alpha_r\}$ do
          Call the recursive generalized cross approximation with $\beta$, $Q_\beta$ as input.
      end for
  else
      Assemble $U_\alpha = A^{(\alpha)}\big|_{I_\alpha \times Q_\alpha}$.
  end if


Finally, we need to address the problem of finding proper Pivot sets. Ballani, Grasedyck & Kluge [2] propose a greedy Pivot strategy. The construction is based on successive rank-one approximations of the matricization for the node as in (88). In a given iteration, we find an initial Pivot element $(i_1, \ldots, i_d)$ respecting the nestedness property. This can e.g. be done by a random choice. Then, the Pivot element is improved by varying one index at a time in an ALS-type algorithm. The greedy Pivot search is summarized in Algorithm 12.

Algorithm 12 Greedy Pivot search.

Require: Initial Pivot element $(i_1, \ldots, i_d)$, residuum tensor $R$ for the matricization of node $\alpha$ in functional form, sets $\alpha$, $\alpha'$ and $\alpha_p$, where $\alpha_p = \alpha \cup \alpha'$, Pivot set $P_\alpha$, and $\ell_{\max}$ (typically 3).
Ensure: Improved Pivot element $(i_1, \ldots, i_d)$.
  for $\ell = 1, 2, \ldots, \ell_{\max}$ do
      for $\mu = 1, \ldots, d$ do
          Compute $i_\mu := \arg\max_{j \in \mathcal{I}_\mu} |R_{i_1, \ldots, i_{\mu-1}, j, i_{\mu+1}, \ldots, i_d}|$, where $\mathcal{I}_\mu = \{1, \ldots, I_\mu\}$ if $\mu \in \alpha_p$ and $\mathcal{I}_\mu = P_\alpha$ if $\mu \in \{1, \ldots, d\} \setminus \alpha_p$.
      end for
  end for
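A possible NumPy sketch of the pivot-improvement loop of Algorithm 12 for a tensor given as a function. The admissible index sets (the full range for modes in $\alpha_p$, the pivot-restricted values otherwise) are passed in explicitly, and all names, the example function and the number of sweeps are illustrative.

```python
import numpy as np

def improve_pivot(resid, idx, index_sets, n_sweeps=3):
    """ALS-type improvement of a pivot multi-index for a functional tensor."""
    idx = list(idx)
    for _ in range(n_sweeps):
        for mu, candidates in enumerate(index_sets):
            # vary index mu only, keep all other indices fixed
            vals = [abs(resid(*idx[:mu], j, *idx[mu + 1:])) for j in candidates]
            idx[mu] = candidates[int(np.argmax(vals))]
    return tuple(idx)

# example with a cheap functional tensor (d = 3)
f = lambda i, j, k: np.cos(0.1 * i) * np.sin(0.2 * j) * 0.3 ** k + 0.5
index_sets = [range(20), range(20), range(20)]
print(improve_pivot(f, (0, 0, 0), index_sets))
```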

Note that we need to evaluate the tensor along one fibre to compute the argmax in each iteration. This is only practical if the evaluation of the tensor is cheap. In case it is given as the solution of a parameter-dependent PDE (the typical situation for uncertainty quantification problems), the Pivoting strategy needs to be modified. Ballani and Grasedyck [1] have proposed a Pivoting strategy that reduces the matricizations to a smaller submatrix by drawing indices at random. Then, this matrix is decomposed by a matrix cross approximation.


A Exercises

1. Are the following statements true or false? Provide a counterexample where appropriate.

(a) $A_{ij} = (A^H)_{ji}$, $A \in \mathbb{R}^{I_1 \times I_2}$.

(b) If all summations are on the far left of an expression, we can always exchange theirorder.

(c) If all summations are on the far left of an expression, we cannot change the order ofthe variables since matrix multiplication does not commute.

(d) Let $X \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times I_4}$. If we cluster the first two and the second two dimensions, we can find a best approximation based on SVDs.

(e) Any tensor can be represented with a lower memory requirement in CP format thanin hierarchical Tucker format.

2. Consider $X = \begin{pmatrix}1\\2\end{pmatrix} \otimes \begin{pmatrix}1\\2\end{pmatrix} \otimes \begin{pmatrix}1\\1\end{pmatrix} + \begin{pmatrix}3\\2\end{pmatrix} \otimes \begin{pmatrix}3\\2\end{pmatrix} \otimes \begin{pmatrix}3\\1\end{pmatrix} \in \mathbb{R}^{2\times 2\times 2}$.

• Compute a best approximation in R1.

• Compute a rank-2 approximation based on the greedy algorithm (cf. [14], Sec. 9.4.4).

• Compute a rank-2 approximation based on the ALS algorithm. Try different starting points.

3. Consider the function $f(x_1, x_2, x_3, x_4) = (\cos(x_1) + 1)\, x_2\, (x_3 + x_4)$. Write down representations of the function in CP format, Tucker format and hierarchical Tucker format with different dimension trees.

4. Implement the HOSVD algorithm for truncation to Tucker format.

5. Download the Hierarchical Tucker Toolbox by Christine Tobler and Daniel Kressner at http://anchp.epfl.ch/htucker.

6. Revisit the function $f(x_1, x_2, x_3, x_4) = (\cos(x_1) + 1)\, x_2\, (x_3 + x_4)$. Compute a Tucker decomposition and a hierarchical Tucker decomposition of this function for several choices of the dimension tree. Repeat this experiment with your favorite multi-variate function.

7. Consider the advection equation
\[
\frac{\partial f(x, t)}{\partial t} + a \cdot \nabla f(x, t) = 0, \tag{99}
\]
on a box $[0, L]^d$ with periodic boundary conditions.

(a) Discretize the spatial derivatives based on finite differences and use a forward Euler scheme for propagation in time. Write a solver on a full tensor product grid in 2d (a possible starting point is sketched after the exercise list).

(b) For a given initial condition $f_0(x)$, the solution at time $t$ is given by
\[
f(x, t) = f_0(x - at). \tag{100}
\]
Verify this solution and use it as a reference to verify the convergence order of your code.

(c) Write another solver that represents the solution in hierarchical Tucker format based on the Hierarchical Tucker Toolbox. Compare to your solver on the full grid. Consider also a higher-dimensional problem and various choices of initial conditions.


(d) The following initial values could be studied:
\[
f_{0a}(x) = \cos\Big(\sum_i x_i\Big), \qquad f_{0b}(x) = \prod_i \cos(x_i), \qquad f_{0c}(x) = \frac{1}{1 + \cos\big(\sum_i x_i\big)}. \tag{101}
\]

8. Consider the Poisson equation in $d$ dimensions on a hypercube,
\[
-\Delta u(x) = f(x). \tag{102}
\]
Consider a discretization of the one-dimensional derivatives by second-order finite differences. We want to solve the Poisson equation in tensor train format with the help of the TT-Toolbox (https://github.com/oseledets/TT-Toolbox).

(a) Use the various ALS-based solvers for linear systems provided by the TT-Toolbox and compare them.

(b) Use the explicit inversion for Kronecker product matrices to solve the Poisson problem.

(c) Use the GMRES implementation tt_gmres.m as an example of an iterative solver.

(d) Note that the finite difference matrix for the second derivative with periodic boundary conditions is a circulant matrix that is diagonalized in Fourier space. This can be used to compute the matrix exponential efficiently. Solve the Poisson equation with periodic boundary conditions with the explicit inversion.

As a first example, you can study the problem with solution $u(x) = \prod_{i=1}^{d} \sin(x_i)$ on $[0, 2\pi]^d$ and homogeneous Dirichlet boundaries. Try a more complicated example as well. For instance, set the right-hand side to 1.

9. Revisit your multivariate functions and compute a hierarchical Tucker decomposition based on the generalized cross approximation algorithm instead of the high-order singular value decomposition. A Matlab implementation of the black-box algorithm as proposed in [2] can be downloaded at http://anchp.epfl.ch/htucker. You can also use the TT-Toolbox with the functions amen_cross, dmrg_cross and study for instance the solution of the Poisson problem.
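For exercise 7(a), one possible starting point is the following full-grid 2d solver. It uses first-order upwind differences (one particular finite-difference choice) with forward Euler time stepping and periodic boundary conditions; the domain size, grid resolution, advection velocity and initial condition (the function $f_{0a}$ from (101)) are illustrative choices.

```python
import numpy as np

# full-grid 2d advection solver: upwind finite differences + forward Euler
L, n = 2 * np.pi, 64
a = np.array([1.0, 0.5])               # advection velocity (componentwise > 0)
dx = L / n
dt = 0.4 * dx / np.abs(a).max()        # CFL-type restriction
x = np.arange(n) * dx
X, Y = np.meshgrid(x, x, indexing='ij')
f = np.cos(X + Y)                      # initial condition f_0a

def upwind_step(f):
    # periodic backward (upwind) differences for positive velocities
    dfx = (f - np.roll(f, 1, axis=0)) / dx
    dfy = (f - np.roll(f, 1, axis=1)) / dx
    return f - dt * (a[0] * dfx + a[1] * dfy)

t, t_end = 0.0, 1.0
while t < t_end - 1e-12:
    f = upwind_step(f)
    t += dt

# compare with the exact solution f_0(x - a t), cf. (100)
err = np.abs(f - np.cos(X - a[0] * t + Y - a[1] * t)).max()
print(f"max error at t = {t:.2f}: {err:.2e}")
```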


References

[1] J. Ballani and L. Grasedyck. Hierarchical tensor approximation of output quantities of parameter-dependent PDEs. SIAM/ASA Journal on Uncertainty Quantification, 3(1):852–872, 2015.

[2] J. Ballani, L. Grasedyck, and M. Kluge. Black box approximation of tensors in hierarchical Tucker format. Linear Algebra and its Applications, 438(2):639–657, 2013.

[3] J. D. Carroll and J.-J. Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart–Young decomposition. Psychometrika, 35(3):283–319, 1970.

[4] N. Crouseilles, P. Glanc, S. A. Hirstoaga, E. Madaule, M. Mehrenberger, and J. Pétri. A new fully two-dimensional conservative semi-Lagrangian method: applications on polar grids, from diocotron instability to ITG turbulence. The European Physical Journal D, 68(9):252, 2014.

[5] V. de Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30(3):1084–1127, 2008.

[6] S. V. Dolgov and D. V. Savostyanov. Alternating minimal energy methods for linear systems in higher dimensions. SIAM Journal on Scientific Computing, 36(5):A2248–A2271, 2014.

[7] G. H. Golub and C. F. van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1996.

[8] S. Goreinov and E. Tyrtyshnikov. The maximal-volume concept in approximation by low-rank matrices. Contemporary Mathematics, 280:47–52, 2001.

[9] S. Goreinov, E. Tyrtyshnikov, and N. Zamarashkin. A theory of pseudoskeleton approximations. Linear Algebra and its Applications, 261(1):1–21, 1997.

[10] L. Grasedyck. Existence and computation of low Kronecker-rank approximations for large linear systems of tensor product structure. Computing, 72(3):247–265, 2004.

[11] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications, 31(4):2029–2054, 2010.

[12] L. Grasedyck and W. Hackbusch. An introduction to hierarchical (H-) rank and TT-rank of tensors with examples. Comput. Methods Appl. Math., 11:291–304, 2011.

[13] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1):53–78, 2013.

[14] W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer Verlag, Berlin Heidelberg, 2012.

[15] W. Hackbusch and S. Kühn. A new scheme for the tensor representation. Journal of Fourier Analysis and Applications, 15(5):706–722, 2009.

[16] R. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 1970.


[17] J. Håstad. Tensor rank is NP-complete. J. Algorithms, 11:644–654, 1990.

[18] S. Holtz, T. Rohwedder, and R. Schneider. The alternating linear scheme for tensor optimization in the tensor train format. SIAM Journal on Scientific Computing, 34(2):A683–A713, 2012.

[19] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[20] K. Kormann. A Semi-Lagrangian Vlasov Solver in Tensor Train Format. SIAM Journal on Scientific Computing, 37(4):B613–B632, 2015.

[21] D. Kressner and C. Tobler. Algorithm 941: htucker, a Matlab toolbox for tensors in hierarchical Tucker format. ACM Trans. Math. Softw., 40(3):22:1–22:22, Apr. 2014.

[22] H.-D. Meyer, U. Manthe, and L. Cederbaum. The multi-configurational time-dependent Hartree approach. Chemical Physics Letters, 165(1):73–78, 1990.

[23] I. Oseledets. Tensor-train decomposition. SIAM J. Sci. Comput., 33(5):2295–2317, 2011.

[24] I. Oseledets and E. Tyrtyshnikov. TT-cross approximation for multidimensional arrays. Linear Algebra and its Applications, 432(1):70–88, 2010.

[25] I. V. Oseledets and S. V. Dolgov. Solution of linear systems and matrix inversion in the TT-format. SIAM Journal on Scientific Computing, 34(5):A2718–A2739, 2012.

[26] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966.

[27] E. Tyrtyshnikov. Incomplete cross approximation in the mosaic-skeleton method. Computing, 64(4):367–380, 2000.
