Low-Rank Tensor Techniques for High-Dimensional Problems
Daniel Kressner
CADMOS Chair for Numerical Algorithms and HPC
MATHICSE, EPFL

Contents
- What is a tensor?
- Applications
- Matrices and low rank
- CP and Tucker
- Hierarchical Tucker
- Algorithms based on low-rank tensors
- Conclusions

What is a tensor?
- Vectors, matrices, and tensors
- Basic calculus with tensors
- Vectorization and matricization
- µ-mode matrix products
- Two classes of tensor problems

Vectors, matrices, and tensors
[Figure: a vector, a matrix, and a tensor of order 3]
- scalar = tensor of order 0
- (column) vector = tensor of order 1
- matrix = tensor of order 2
- tensor of order 3 = $n_1 n_2 n_3$ numbers arranged in an $n_1 \times n_2 \times n_3$ array

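Tensors of arbitrary order
A $d$-th order tensor $\mathcal{X}$ of size $n_1 \times n_2 \times \cdots \times n_d$ is a $d$-dimensional array with entries

$\mathcal{X}_{i_1,i_2,\ldots,i_d}, \quad i_\mu \in \{1,\ldots,n_\mu\}$ for $\mu = 1,\ldots,d$.

In the following, entries of $\mathcal{X}$ are real (for simplicity):

$\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$.

Multi-index notation:

$\mathcal{I} = \{1,\ldots,n_1\} \times \{1,\ldots,n_2\} \times \cdots \times \{1,\ldots,n_d\}$.

Then $i \in \mathcal{I}$ is a tuple of $d$ indices:

$i = (i_1, i_2, \ldots, i_d)$.

This allows us to write the entries of $\mathcal{X}$ as $\mathcal{X}_i$ for $i \in \mathcal{I}$.
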
Two important points
1. A matrix $A \in \mathbb{R}^{m \times n}$ has a natural interpretation as a linear operator in terms of matrix-vector multiplications:

$A : \mathbb{R}^n \to \mathbb{R}^m, \quad A : x \mapsto A \cdot x.$

There is no such (unique and natural) interpretation for tensors! This makes it fundamentally difficult to define a meaningful general notion of eigenvalues and singular values of tensors.

2. The number of entries in a tensor grows exponentially with $d$: the curse of dimensionality.

Example: A tensor of order 30 with $n_1 = n_2 = \cdots = n_d = 10$ has $10^{30}$ entries, which corresponds to $8 \times 10^{12}$ exabytes of storage (at 8 bytes per entry)! [1]

For $d \gg 1$: Cannot afford to store the tensor explicitly (in terms of its entries).

[1] Global data storage calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672.

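Basic calculus
- Addition of two equal-sized tensors $\mathcal{X}, \mathcal{Y}$:

$\mathcal{Z} = \mathcal{X} + \mathcal{Y} \;\Leftrightarrow\; \mathcal{Z}_i = \mathcal{X}_i + \mathcal{Y}_i \quad \forall i \in \mathcal{I}.$

- Scalar multiplication with $\alpha \in \mathbb{R}$:

$\mathcal{Z} = \alpha \mathcal{X} \;\Leftrightarrow\; \mathcal{Z}_i = \alpha \mathcal{X}_i \quad \forall i \in \mathcal{I}.$

These two operations give a vector space structure.

- Inner product of two equal-sized tensors $\mathcal{X}, \mathcal{Y}$:

$\langle \mathcal{X}, \mathcal{Y} \rangle := \sum_{i \in \mathcal{I}} \mathcal{X}_i \mathcal{Y}_i.$

- Induced norm:

$\|\mathcal{X}\| := \Big( \sum_{i \in \mathcal{I}} \mathcal{X}_i^2 \Big)^{1/2}.$

For a 2nd order tensor (= matrix) this corresponds to the Frobenius norm.
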
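Vectorization
A tensor $\mathcal{X}$ of size $n_1 \times n_2 \times \cdots \times n_d$ has $n_1 \cdot n_2 \cdots n_d$ entries, so there are many ways to stack the entries in a (loooong) column vector.
One possible choice: The vectorization of $\mathcal{X}$ is denoted by $\mathrm{vec}(\mathcal{X})$, where

$\mathrm{vec} : \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d} \to \mathbb{R}^{n_1 \cdot n_2 \cdots n_d}$

stacks the entries of a tensor in reverse lexicographical order (the first index runs fastest) into a long column vector.

Remark: For $d = 2$, this is the usual way matrices are vectorized:

$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \;\Rightarrow\; \mathrm{vec}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{12} \\ a_{22} \\ a_{32} \end{bmatrix}.$
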
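Vectorization
Example: $d = 3$, $n_1 = 3$, $n_2 = 2$, $n_3 = 3$. With the first index running fastest:

$\mathrm{vec}(\mathcal{X}) = \big[ x_{111},\, x_{211},\, x_{311},\, x_{121},\, x_{221},\, x_{321},\, x_{112},\, \ldots,\, x_{123},\, x_{223},\, x_{323} \big]^T$
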
Matricization
- A matrix has two modes (column mode and row mode).
- A $d$-th order tensor $\mathcal{X}$ has $d$ modes ($\mu = 1$, $\mu = 2$, ..., $\mu = d$).

Let us fix all but one mode, e.g., $\mu = 1$: Then

$\mathcal{X}(:, i_2, i_3, \ldots, i_d)$ (abuse of MATLAB notation)

is a vector of length $n_1$ for each choice of $i_2, \ldots, i_d$.

View the tensor $\mathcal{X}$ as a bunch of column vectors.

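Matricization
Stack vectors into an $n_1 \times (n_2 \cdots n_d)$ matrix:

$\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d} \;\longrightarrow\; X^{(1)} \in \mathbb{R}^{n_1 \times (n_2 n_3 \cdots n_d)}$

For $\mu = 1, \ldots, d$, the µ-mode matricization of $\mathcal{X}$ is a matrix

$X^{(\mu)} \in \mathbb{R}^{n_\mu \times (n_1 \cdots n_{\mu-1} n_{\mu+1} \cdots n_d)}$

with entries

$\big( X^{(\mu)} \big)_{i_\mu, (i_1, \ldots, i_{\mu-1}, i_{\mu+1}, \ldots, i_d)} = \mathcal{X}_i \quad \forall i \in \mathcal{I}.$
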
Matricization
In MATLAB: a = rand(2,3,4,5);

- 1-mode matricization:
    reshape(a,2,3*4*5)
- 2-mode matricization:
    b = permute(a,[2 1 3 4]);
    reshape(b,3,2*4*5)
- 3-mode matricization:
    b = permute(a,[3 1 2 4]);
    reshape(b,4,2*3*5)
- 4-mode matricization:
    b = permute(a,[4 1 2 3]);
    reshape(b,5,2*3*4)

For a matrix $A \in \mathbb{R}^{n_1 \times n_2}$:

$A^{(1)} = A, \quad A^{(2)} = A^T.$

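The four cases above follow one pattern, which can be sketched for arbitrary µ. This is only an illustration using base MATLAB; the helper name `matricize` is hypothetical and not part of any toolbox, and it assumes µ does not exceed `ndims(X)`.

```matlab
% Sketch: mu-mode matricization of a d-dimensional array.
% `matricize` is a hypothetical helper name chosen for this example.
matricize = @(X, mu) reshape( ...
    permute(X, [mu, 1:mu-1, mu+1:ndims(X)]), size(X, mu), []);

a  = rand(2, 3, 4, 5);
A3 = matricize(a, 3);                      % 4 x (2*3*5) matrix

% Cross-check against the explicit permute/reshape used above.
b = permute(a, [3 1 2 4]);
disp(isequal(A3, reshape(b, 4, 2*3*5)));

% For a matrix, mode 1 returns A and mode 2 returns its transpose.
A = rand(3, 2);
disp(isequal(matricize(A, 1), A) && isequal(matricize(A, 2), A.'));
```

Moving mode µ to the front and keeping the remaining modes in their original order reproduces the column ordering of the definition.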
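µ-mode matrix products
Consider the 1-mode matricization $X^{(1)} \in \mathbb{R}^{n_1 \times (n_2 \cdots n_d)}$:

It seems to make sense to multiply an $m \times n_1$ matrix $A$ from the left:

$Y^{(1)} := A X^{(1)} \in \mathbb{R}^{m \times (n_2 \cdots n_d)}.$

We can rearrange $Y^{(1)}$ back into an $m \times n_2 \times \cdots \times n_d$ tensor $\mathcal{Y}$. This is called 1-mode matrix multiplication:

$\mathcal{Y} = A \circ_1 \mathcal{X} \;\Leftrightarrow\; Y^{(1)} = A X^{(1)}.$

More formally (and more ugly):

$\mathcal{Y}_{i_1, i_2, \ldots, i_d} = \sum_{k=1}^{n_1} a_{i_1,k} \, \mathcal{X}_{k, i_2, \ldots, i_d}.$
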
µ-mode matrix products
General definition of a µ-mode matrix product with $A \in \mathbb{R}^{m \times n_\mu}$:

$\mathcal{Y} = A \circ_\mu \mathcal{X} \;\Leftrightarrow\; Y^{(\mu)} = A X^{(\mu)}.$

More formally (and more ugly):

$\mathcal{Y}_{i_1, i_2, \ldots, i_d} = \sum_{k=1}^{n_\mu} a_{i_\mu,k} \, \mathcal{X}_{i_1, \ldots, i_{\mu-1}, k, i_{\mu+1}, \ldots, i_d}.$

For matrices:
- 1-mode multiplication = multiplication from the left:

$Y = A \circ_1 X = A X.$

- 2-mode multiplication = transposed multiplication from the right:

$Y = A \circ_2 X = X A^T.$

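The definition via matricization translates almost literally into base MATLAB. The following is a minimal sketch for one concrete case (a third-order tensor, µ = 2); sizes and variable names are chosen for illustration only.

```matlab
X  = rand(3, 4, 5);                  % tensor of size n1 x n2 x n3
A  = rand(6, 4);                     % m x n2 matrix, applied in mode mu = 2
mu = 2;

d    = ndims(X);
n    = size(X);
perm = [mu, 1:mu-1, mu+1:d];         % bring mode mu to the front
Xmu  = reshape(permute(X, perm), n(mu), []);   % mu-mode matricization
Ymu  = A * Xmu;                      % Y^(mu) = A * X^(mu)

% Fold Y^(mu) back into a tensor of size n1 x m x n3.
m = size(A, 1);
Y = ipermute(reshape(Ymu, [m, n(perm(2:end))]), perm);

% Check one entry against the summation formula.
ref = 0;
for k = 1:n(mu)
    ref = ref + A(5, k) * X(2, k, 3);
end
disp(abs(Y(2, 5, 3) - ref));         % rounding-error level
```

Only the size of mode µ changes (from $n_\mu$ to $m$); all other modes are untouched.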
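Kronecker product
For an $m \times n$ matrix $A$ and a $k \times \ell$ matrix $B$, the Kronecker product is defined as

$B \otimes A := \begin{bmatrix} b_{11} A & \cdots & b_{1\ell} A \\ \vdots & & \vdots \\ b_{k1} A & \cdots & b_{k\ell} A \end{bmatrix} \in \mathbb{R}^{km \times \ell n}.$

Most important properties (for our purposes):
1. $\mathrm{vec}(A X) = (I \otimes A)\, \mathrm{vec}(X)$.
2. $\mathrm{vec}(X A^T) = (A \otimes I)\, \mathrm{vec}(X)$.
3. $(B \otimes A)(D \otimes C) = (BD \otimes AC)$.
4. $I_m \otimes I_n = I_{mn}$.
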
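Properties 1 to 3 are easy to check numerically with MATLAB's `kron`. The sketch below uses small random matrices; all sizes and the helper `vec` are arbitrary choices for this example.

```matlab
A = rand(4, 3);  X = rand(3, 5);     % A*X is 4 x 5
vec = @(M) M(:);                     % column-major vectorization

% Property 1: vec(A*X) = (I kron A) vec(X), identity of size 5.
e1 = norm(vec(A*X) - kron(eye(5), A) * vec(X));

% Property 2: vec(X*C') = (C kron I) vec(X), identity of size 3.
C  = rand(6, 5);
e2 = norm(vec(X*C.') - kron(C, eye(3)) * vec(X));

% Property 3: (B kron A)(D kron E) = (B*D) kron (A*E).
B = rand(2, 3);  D = rand(3, 2);  E = rand(3, 4);
e3 = norm(kron(B, A) * kron(D, E) - kron(B*D, A*E), 'fro');

disp([e1 e2 e3]);                    % all at rounding-error level
```

In practice the left-hand sides are preferred: forming `kron(eye(5), A)` explicitly is only done here to verify the identities.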
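µ-mode matrix products and vectorization
By definition,

$\mathrm{vec}(\mathcal{X}) = \mathrm{vec}\big( X^{(1)} \big).$

Consequently, also

$\mathrm{vec}(A \circ_1 \mathcal{X}) = \mathrm{vec}\big( A X^{(1)} \big).$

Vectorized version of 1-mode matrix product:

$\mathrm{vec}(A \circ_1 \mathcal{X}) = (I_{n_2 \cdots n_d} \otimes A)\, \mathrm{vec}(\mathcal{X}) = (I_{n_d} \otimes \cdots \otimes I_{n_2} \otimes A)\, \mathrm{vec}(\mathcal{X}).$

Relation between µ-mode matrix product and matrix-vector product:

$\mathrm{vec}(A \circ_\mu \mathcal{X}) = (I_{n_d} \otimes \cdots \otimes I_{n_{\mu+1}} \otimes A \otimes I_{n_{\mu-1}} \otimes \cdots \otimes I_{n_1})\, \mathrm{vec}(\mathcal{X})$
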
Two classes of tensor problems
Class 1: function-related tensors
Consider a function $u(\xi_1, \ldots, \xi_d) \in \mathbb{R}$ in $d$ variables $\xi_1, \ldots, \xi_d$.
Tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ represents a discretization of $u$:
- $\mathcal{U}$ contains function values of $u$ evaluated on a grid; or
- $\mathcal{U}$ contains coefficients of a truncated expansion in tensorized basis functions:

$u(\xi_1, \ldots, \xi_d) \approx \sum_{i \in \mathcal{I}} \mathcal{U}_i \, \phi_{i_1}(\xi_1) \phi_{i_2}(\xi_2) \cdots \phi_{i_d}(\xi_d).$

Typical setting:
- $\mathcal{U}$ only given implicitly, e.g., as the solution of a discretized PDE;
- seek approximations to $\mathcal{U}$ with very low storage and tolerable accuracy;
- $d$ may become very large.

Focus of this lecture on function-related tensors!

Discretization of function in d variables
$\xi_1, \ldots, \xi_d \in [0,1]$: the number of function values grows exponentially with $d$.

Separability helps
Ideal situation: Function $f$ separable:

$f(\xi_1, \xi_2, \ldots, \xi_d) = f_1(\xi_1) f_2(\xi_2) \cdots f_d(\xi_d)$

[Figure: the discretized $f$ is the Kronecker product of the discretized $f_j$]

- discretized $f$: $O(n^d)$ memory
- discretized $f_j$: $O(dn)$ memory

Of course: Exact separability is rarely satisfied in practice.

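The memory comparison can be made concrete for $d = 3$. In this sketch the three factor functions are arbitrary examples, not taken from the lecture.

```matlab
n  = 20;  xi = linspace(0, 1, n).';
f1 = exp(xi);  f2 = cos(xi);  f3 = 1 + xi.^2;   % arbitrary example factors

% Full tensor of function values F(i,j,k) = f1(xi_i) f2(xi_j) f3(xi_k).
[G1, G2, G3] = ndgrid(xi, xi, xi);
F = exp(G1) .* cos(G2) .* (1 + G3.^2);

% The same data from the three factors: vec(F) = f3 kron f2 kron f1.
disp(norm(F(:) - kron(f3, kron(f2, f1))));
fprintf('full tensor: %d numbers, factors: %d numbers\n', n^3, 3*n);
```

The factor for the first variable appears last in the Kronecker product because the first index runs fastest in the vectorization.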
Two classes of tensor problems
Class 2: data-related tensors
Tensor $\mathcal{U} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ contains multi-dimensional data.

Example 1: $\mathcal{U}_{2011,3,2}$ denotes the number of papers published in 2011 by author 3 in mathematical journal 2.

Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor.

Typical setting:
- entries of $\mathcal{U}$ given explicitly (at least partially);
- extraction of dominant features from $\mathcal{U}$;
- usually moderate values for $d$.

Summary
- A tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is a $d$-dimensional array.
- There are various ways of reshaping the entries of a tensor $\mathcal{X}$ into a vector or matrix.
- µ-mode matrix multiplication can be expressed with Kronecker products.

Further reading:
- T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.

Software:
- MATLAB offers basic functionality to work with d-dimensional arrays.
- MATLAB Tensor Toolbox: http://www.csmr.ca.sandia.gov/~tgkolda/TensorToolbox/

Applications in scientific computing
- High-dimensional elliptic PDEs
- High-dimensional PDE-eigenvalue problems
- Quantum many-body problems
- Stochastic Automata Networks
- further applications

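High-dimensional elliptic PDEs: 3D model problem
- Consider

$-\Delta u = f$ in $\Omega$, $\quad u|_{\partial\Omega} = 0$,

on the unit cube $\Omega = [0,1]^3$.
- Discretize on a tensor grid. Uniform grid for simplicity:

$\xi_\mu^{(j)} = jh, \quad h = \frac{1}{n+1}$ for $\mu = 1, 2, 3$.

- Approximate solution tensor $\mathcal{U} \in \mathbb{R}^{n \times n \times n}$:

$\mathcal{U}_{i_1,i_2,i_3} \approx u\big( \xi_1^{(i_1)}, \xi_2^{(i_2)}, \xi_3^{(i_3)} \big).$
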
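High-dimensional elliptic PDEs: 3D model problem
- Discretization of the 1D Laplacian (up to the scaling factor $1/h^2$):

$-\partial_{xx} \approx \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix} =: A.$

- Application in each coordinate direction:

$-\partial_{\xi_1\xi_1} u(\xi_1, \xi_2, \xi_3) \approx A \circ_1 \mathcal{U}, \quad -\partial_{\xi_2\xi_2} u(\xi_1, \xi_2, \xi_3) \approx A \circ_2 \mathcal{U}, \quad -\partial_{\xi_3\xi_3} u(\xi_1, \xi_2, \xi_3) \approx A \circ_3 \mathcal{U}.$

- Hence,

$-\Delta u \approx A \circ_1 \mathcal{U} + A \circ_2 \mathcal{U} + A \circ_3 \mathcal{U}$

or in vectorized form with $\mathbf{u} = \mathrm{vec}(\mathcal{U})$:

$-\Delta u \approx (I \otimes I \otimes A + I \otimes A \otimes I + A \otimes I \otimes I)\, \mathbf{u}.$
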
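The equivalence of the mode-product form and the Kronecker form can be checked numerically. This is a small sketch with the unscaled matrix $A$ (the factor $1/h^2$ is omitted, as above); the grid size is arbitrary.

```matlab
n = 6;  I = speye(n);
e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);    % 1D matrix tridiag(-1, 2, -1)

% Kronecker-structured 3D operator.
L = kron(I, kron(I, A)) + kron(I, kron(A, I)) + kron(A, kron(I, I));

U = rand(n, n, n);
% The three mode products, each computed through a matricization.
V1 = reshape(A * reshape(U, n, []), [n n n]);
V2 = permute(reshape(A * reshape(permute(U, [2 1 3]), n, []), [n n n]), [2 1 3]);
V3 = permute(reshape(A * reshape(permute(U, [3 1 2]), n, []), [n n n]), [2 3 1]);
V  = V1 + V2 + V3;

disp(norm(L * U(:) - V(:)));             % rounding-error level
```

The mode-product form never builds the $n^3 \times n^3$ matrix, which is the reason it is preferred when $n$ or $d$ grows.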
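High-dimensional elliptic PDEs: 3D model problem
Finite difference discretization of model problem

$-\Delta u = f$ in $\Omega$, $\quad u|_{\partial\Omega} = 0$

for $\Omega = [0,1]^3$ takes the form

$(I \otimes I \otimes A + I \otimes A \otimes I + A \otimes I \otimes I)\, \mathbf{u} = \mathbf{f}.$

Similar structure for finite element discretization with tensorized FEs:

$\mathbb{V} \otimes \mathbb{W} \otimes \mathbb{Z} = \Big\{ \sum \alpha_{ijk} \, v_i(\xi_1) w_j(\xi_2) z_k(\xi_3) : \alpha_{ijk} \in \mathbb{R} \Big\}$

with

$\mathbb{V} = \{v_1(\xi_1), \ldots, v_n(\xi_1)\}, \quad \mathbb{W} = \{w_1(\xi_2), \ldots, w_n(\xi_2)\}, \quad \mathbb{Z} = \{z_1(\xi_3), \ldots, z_n(\xi_3)\}.$

Galerkin discretization:

$(K_V \otimes M_W \otimes M_Z + M_V \otimes K_W \otimes M_Z + M_V \otimes M_W \otimes K_Z)\, \mathbf{u} = \mathbf{f},$

with 1D mass/stiffness matrices $M_V, M_W, M_Z, K_V, K_W, K_Z$.
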
High-dimensional elliptic PDEs: Arbitrary dimensions
Finite difference discretization of model problem

$-\Delta u = f$ in $\Omega$, $\quad u|_{\partial\Omega} = 0$

for $\Omega = [0,1]^d$ takes the form

$\Big( \sum_{j=1}^{d} I \otimes \cdots \otimes I \otimes A \otimes I \otimes \cdots \otimes I \Big) \mathbf{u} = \mathbf{f}.$

To obtain such Kronecker structure in general:
- tensorized domain;
- highly structured grid;
- coefficients that can be written/approximated as a sum of separable functions.

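High-dimensional PDE-eigenvalue problems
PDE-eigenvalue problem

$\Delta u(\xi) + V(\xi) u(\xi) = \lambda u(\xi)$ in $\Omega = [0,1]^d$, $\quad u(\xi) = 0$ on $\partial\Omega$.

Assumption: Potential represented as

$V(\xi) = \sum_{j=1}^{s} V_j^{(1)}(\xi_1) V_j^{(2)}(\xi_2) \cdots V_j^{(d)}(\xi_d).$

Finite difference discretization:

$\mathcal{A} \mathbf{u} = (\mathcal{A}_L + \mathcal{A}_V)\, \mathbf{u} = \lambda \mathbf{u},$

with

$\mathcal{A}_L = \sum_{j=1}^{d} \underbrace{I \otimes \cdots \otimes I}_{d-j \text{ times}} \otimes A_L \otimes \underbrace{I \otimes \cdots \otimes I}_{j-1 \text{ times}},$

$\mathcal{A}_V = \sum_{j=1}^{s} A_{V,j}^{(d)} \otimes \cdots \otimes A_{V,j}^{(2)} \otimes A_{V,j}^{(1)}.$
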
Quantum many-body problems
- spin-1/2 particles: proton, neutron, electron, and quark.
- two states: spin-up, spin-down
- quantum state for each spin represented by a vector in $\mathbb{C}^2$ (spinor)
- quantum state for a system of $d$ spins represented by a vector in $\mathbb{C}^{2^d}$
- quantum mechanical operators expressed in terms of Pauli matrices

$P_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad P_y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad P_z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$

- spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities; each term describes a physical (inter)action of spins
- interaction of spins described by a graph
- Goal: Compute the ground state of the spin Hamiltonian.

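Quantum many-body problems
Example: 1d chain of 5 spins with periodic boundary conditions

[Figure: ring of spins 1, 2, 3, 4, 5]

Hamiltonian describing pairwise interaction between nearest neighbors:

$H = P_z \otimes P_z \otimes I \otimes I \otimes I + I \otimes P_z \otimes P_z \otimes I \otimes I + I \otimes I \otimes P_z \otimes P_z \otimes I + I \otimes I \otimes I \otimes P_z \otimes P_z + P_z \otimes I \otimes I \otimes I \otimes P_z$
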
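For such a small chain the Hamiltonian can still be assembled explicitly, which makes the Kronecker structure tangible. The loop below is an illustrative sketch; for larger $d$ the $2^d \times 2^d$ matrix is exactly what low-rank tensor techniques avoid forming.

```matlab
Pz = [1 0; 0 -1];  I2 = eye(2);  d = 5;

H = zeros(2^d);
for k = 1:d
    F = repmat({I2}, 1, d);          % d identity factors
    F{k} = Pz;                       % Pz at position k ...
    F{mod(k, d) + 1} = Pz;           % ... and at its periodic neighbour
    T = 1;
    for j = 1:d
        T = kron(T, F{j});           % F{1} kron F{2} kron ... kron F{d}
    end
    H = H + T;
end

disp(size(H));                       % 32 x 32
disp(isequal(H, diag(diag(H))));     % H is diagonal because Pz is diagonal
```

For `k = d` the neighbour index wraps around to 1, which produces the last term $P_z \otimes I \otimes I \otimes I \otimes P_z$.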
Quantum many-body problems
- Ising (ZZ) model for 1d chain of $d$ spins with open boundary conditions:

$H = \sum_{k=1}^{d-1} I \otimes \cdots \otimes I \otimes P_z \otimes P_z \otimes I \otimes \cdots \otimes I + \lambda \sum_{k=1}^{d} I \otimes \cdots \otimes I \otimes P_x \otimes I \otimes \cdots \otimes I$

$\lambda$ = ratio between strength of magnetic field and pairwise interactions
- 1d Heisenberg (XY) model
- Current research: 2d models.
- More details in:
  Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
  Schollwöck: The density-matrix renormalization group in the age of matrix product states.

Stochastic Automata Networks (SANs)
- 3 stochastic automata $A_1, A_2, A_3$ having 3 states each.
- Vector $x_t^{(i)} \in \mathbb{R}^3$ describes probabilities of states (1), (2), (3) in $A_i$ at time $t$.
- No coupling between automata: local transition $x_t^{(i)} \mapsto x_{t+1}^{(i)}$ described by Markov chain:

$x_{t+1}^{(i)} = E_i x_t^{(i)},$

with a stochastic matrix $E_i$.
- Stationary distribution of $A_i$ = Perron vector of $E_i$ (eigenvector for eigenvalue 1).

Stochastic Automata Networks (SANs)
- 3 stochastic automata $A_1, A_2, A_3$ having 3 states each.
- Coupling between automata: local transition $x_t^{(i)} \mapsto x_{t+1}^{(i)}$ not described by Markov chain.
- Need to consider all possible combinations of states in $(A_1, A_2, A_3)$:

(1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2), ...

- Vector $x_t \in \mathbb{R}^{3^3}$ (or tensor $\mathcal{X}^{(t)} \in \mathbb{R}^{3 \times 3 \times 3}$) describes probabilities of combined states.

Stochastic Automata Networks (SANs)
- Transition $x_t \mapsto x_{t+1}$ described by Markov chain:

$x_{t+1} = E x_t,$

with a large stochastic matrix $E$.
- Oversimplified example:

$E = \underbrace{I \otimes I \otimes \tilde{E}_1 + I \otimes \tilde{E}_2 \otimes I + \tilde{E}_3 \otimes I \otimes I}_{\text{local transition}} + \underbrace{I \otimes E_{21} \otimes E_{12}}_{\text{interaction between } A_1, A_2} + \underbrace{E_{32} \otimes E_{23} \otimes I}_{\text{interaction between } A_2, A_3}.$

- Goal: Compute stationary distribution = Perron vector of $E$.
- More details in:
  Stewart: Introduction to the Numerical Solution of Markov Chains. Chapter 9.
  Buchholz: Product Form Approximations for Communicating Markov Processes.

Further applications
Other applications in scientific computing featuring low-rank tensor concepts:

- Boltzmann equation [Ibragimov/Rjasanow’2009].
- Dynamical systems [Koch/Lubich’2009].
- Parabolic PDEs [Andreev/Tobler’2011], [Khoromskij’2009].
- Stochastic PDEs [Khoromskij/Schwab’2010], [Matthies/Zander’2011], [Kressner/Tobler’2011], [Ballani/Grasedyck/Kluge’2011], ...
- Electronic structure calculation [Chinnamsetty et al.’2007], [Flad et al.’2009], [Khoromskij/Khoromskaja’2009], [Limpanuparb/Gill’2009], [Benedikt et al.’2011], [Mohlenkamp’2011], ...
- Evaluation of boundary integrals (in BEM): [Grasedyck], [Khoromskij/Sauter/Veit’2011].
- ...

Summary
- Large diversity of applications leading to linear systems / eigenvalue problems with Kronecker product structures.
- For many problems of practical interest: Explicit storage / computation of the solution is infeasible.
- Increasing use of low-rank tensor techniques. Heaviest use currently: DMRG for quantum many-body problems.
- Remark: For PDE-related applications, high dimensionality can also be addressed during the discretization phase (sparse grids, adaptive sparse discretization, ...). This has advantages and disadvantages.

Approximate low-rank matrices
- Singular value decomposition
- Separability and low rank
- Separability by polynomial interpolation
- Separability by exponential sums
- Low rank of snapshot matrices

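Low-rank approximation
Setting: Matrix $X \in \mathbb{R}^{n \times m}$, $m$ and $n$ too large to compute/store $X$ explicitly.
Idea: Replace $X$ by $RS^T$ with $R \in \mathbb{R}^{n \times r}$, $S \in \mathbb{R}^{m \times r}$ and $r \ll m, n$.

            X            R S^T
Memory      nm           nr + rm
Cost        ops(m,n)     ops(m,n) × r / min{m,n}  (?)

$\min\big\{ \|X - RS^T\|_2 : R \in \mathbb{R}^{n \times r},\, S \in \mathbb{R}^{m \times r} \big\} = \sigma_{r+1},$

with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min\{m,n\}}$ of $X$.
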
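Construction from singular value decomposition
SVD: Let $X \in \mathbb{R}^{n \times m}$ and $k = \min\{m,n\}$. Then there exist matrices with orthonormal columns

$U = [u_1, u_2, \ldots, u_k] \in \mathbb{R}^{n \times k}, \quad V = [v_1, v_2, \ldots, v_k] \in \mathbb{R}^{m \times k},$

such that

$X = U \Sigma V^T, \quad \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k).$

Choose $r \le k$ and partition

$X = [U_1, U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} [V_1, V_2]^T = \underbrace{U_1 \Sigma_1}_{=:R} \underbrace{V_1^T}_{=:S^T} + U_2 \Sigma_2 V_2^T.$

Then $\|X - RS^T\|_2 = \|\Sigma_2\|_2 = \sigma_{r+1}$.

Good low-rank approximation if singular values decay sufficiently fast.

Also: $\mathrm{span}(X) \approx \mathrm{span}(R)$, $\mathrm{span}(X^T) \approx \mathrm{span}(S^T)$
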
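The construction can be tried out directly with MATLAB's `svd`. The matrix below (samples of a smooth kernel) is an arbitrary example chosen because its singular values decay quickly.

```matlab
n = 60;  m = 40;  r = 5;
[x, y] = ndgrid(linspace(0, 1, n), linspace(0, 1, m));
X = exp(-(x - y).^2);                 % example matrix, n x m

[U, S, V] = svd(X, 'econ');
s  = diag(S);
R  = U(:, 1:r) * S(1:r, 1:r);         % R   = U_1 * Sigma_1
St = V(:, 1:r).';                     % S^T = V_1^T
err = norm(X - R * St);               % spectral norm of the truncation error

fprintf('error %.3e, sigma_{r+1} %.3e\n', err, s(r+1));
```

The two printed numbers coincide up to rounding, and storing `R` and `St` requires $nr + rm$ instead of $nm$ numbers.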
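Discretization of bivariate function
- Bivariate function: $f(x,y) : [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}] \to \mathbb{R}$.
- Function values on tensor grid $[x_1, \ldots, x_n] \times [y_1, \ldots, y_m]$:

$F = \begin{bmatrix} f(x_1, y_1) & f(x_1, y_2) & \cdots & f(x_1, y_m) \\ f(x_2, y_1) & f(x_2, y_2) & \cdots & f(x_2, y_m) \\ \vdots & \vdots & & \vdots \\ f(x_n, y_1) & f(x_n, y_2) & \cdots & f(x_n, y_m) \end{bmatrix}$

Basic but crucial observation: $f(x,y) = g(x) h(y)$ gives

$F = \begin{bmatrix} g(x_1) h(y_1) & \cdots & g(x_1) h(y_m) \\ \vdots & & \vdots \\ g(x_n) h(y_1) & \cdots & g(x_n) h(y_m) \end{bmatrix} = \begin{bmatrix} g(x_1) \\ \vdots \\ g(x_n) \end{bmatrix} \begin{bmatrix} h(y_1) & \cdots & h(y_m) \end{bmatrix}$

Separability implies rank 1.
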
Separability and low rank
Approximation by sum of separable functions

$f(x,y) = \underbrace{g_1(x) h_1(y) + \cdots + g_r(x) h_r(y)}_{=: f_r(x,y)} + \text{error}.$

Define

$F_r = \begin{bmatrix} f_r(x_1, y_1) & \cdots & f_r(x_1, y_m) \\ \vdots & & \vdots \\ f_r(x_n, y_1) & \cdots & f_r(x_n, y_m) \end{bmatrix}.$

Then $F_r$ has rank $\le r$ and $\|F - F_r\|_F \le \sqrt{mn} \times \text{error}$. Consequently,

$\sigma_{r+1}(F) \le \|F - F_r\|_2 \le \|F - F_r\|_F \le \sqrt{mn} \times \text{error}.$

Semi-separable approximation implies low-rank approximation.

Semi-separable approximation by polynomials
Solution of approximation problem

$f(x,y) = g_1(x) h_1(y) + \cdots + g_r(x) h_r(y) + \text{error}$

is not trivial; $g_j, h_j$ can be chosen arbitrarily!

General construction by polynomial interpolation:
1. Lagrange interpolation of $f(x,y)$ in the $y$-coordinate:

$I_y[f](x,y) = \sum_{j=1}^{r} f(x, \theta_j) L_j(y)$

with Lagrange polynomials $L_j$ of degree $r - 1$ on $[y_{\min}, y_{\max}]$.

2. Interpolation of $I_y[f]$ in the $x$-coordinate:

$I_x[I_y[f]](x,y) = \sum_{i,j=1}^{r} f(\xi_i, \theta_j) L_i(x) L_j(y) \;\hat{=}\; \sum_{i=1}^{r} L_{i,x}(x) L_{i,y}(y),$

where the matrix $[f(\xi_i, \theta_j)]_{i,j}$ is “diagonalized” by SVD.

Semi-separable approximation by polynomials

$\text{error} \le \|f - I_x[I_y[f]]\|_\infty = \|f - I_x[f] + I_x[f] - I_x[I_y[f]]\|_\infty \le \|f - I_x[f]\|_\infty + \|I_x\|_\infty \|f - I_y[f]\|_\infty$

with Lebesgue constant $\|I_x\|_\infty \sim \log r$ when using Chebyshev interpolation nodes.

Polynomial interpolation error is typically much too pessimistic.

- Lebesgue constants hit hard in high dimensions: $(\log r)^{d-1}$.
- Severe theoretical barriers for general smooth multivariate functions: E. Novak and H. Woźniakowski: Tractability of Multivariate Problems, Volume I and II. EMS.

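Semi-separable approximation of 1/(x + y)
Consider

$f(x,y) = \frac{1}{x+y}, \quad x, y \in [\alpha, \beta], \quad 0 < \alpha < \beta.$

Apply numerical quadrature:

$\frac{1}{z} = \int_0^\infty e^{-tz} \, dt = \sum_{j=1}^{r} \omega_j e^{-\gamma_j z} + \text{error}.$

Inserting $z = x + y$:

$\frac{1}{x+y} = \sum_{j=1}^{r} \omega_j e^{-\gamma_j (x+y)} + \text{error} = \sum_{j=1}^{r} \omega_j e^{-\gamma_j x} e^{-\gamma_j y} + \text{error}.$

Choice of nodes $\gamma_j > 0$ and weights $\omega_j > 0$ as in [Stenger’93, Braess’86, Braess/Hackbusch’05] gives

$\text{error} \le \frac{8}{|\alpha|} \exp\Big[ -\frac{r \pi^2}{\log(8\beta/\alpha)} \Big].$
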
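The exponential decay predicted by this bound shows up in the singular values of the sampled function. The sketch below does not construct the quadrature nodes and weights; it only inspects the numerical rank of the sample matrix for one arbitrary choice of $\alpha$, $\beta$, and grid.

```matlab
alpha = 1;  beta = 10;  n = 200;
x = linspace(alpha, beta, n);
[X, Y] = ndgrid(x, x);
F = 1 ./ (X + Y);                     % F(i,j) = 1 / (x_i + y_j)

s   = svd(F);
tol = 1e-10 * s(1);
fprintf('numerical rank at relative tolerance 1e-10: %d of %d\n', ...
        sum(s > tol), n);
```

The reported rank is far below $n$, in line with the fact that a few exponential terms already approximate $1/(x+y)$ to high accuracy.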
Semi-separable approximation by exponential sums
- Consider more general case of function $f(x,y) := g(x+y)$.
- Approximation of $g(z)$ with $z := x + y$ by exponential sum

$g(z) \approx \sum_{j=1}^{r} \omega_j \exp(\gamma_j z) \qquad (1)$

for some coefficients $\gamma_j, \omega_j \in \mathbb{R}$.
- (1) gives semi-separable approximation for $f$:

$f(x,y) = g(x+y) \approx \sum_{j=1}^{r} \omega_j \exp(\gamma_j (x+y)) = \sum_{j=1}^{r} \omega_j \exp(\gamma_j x) \exp(\gamma_j y).$

- Naturally extends to arbitrarily many variables.
- Problem: (1) is a nontrivial approximation problem [Braess’1986], [Hackbusch’2006], ...

Low-rank approximation of snapshot matrices
Vector-valued function

$x(\alpha) : [\alpha_{\min}, \alpha_{\max}] \to \mathbb{R}^n$

Sampling at $\alpha_1, \ldots, \alpha_m \in [\alpha_{\min}, \alpha_{\max}]$:

Snapshot matrix $X = [x(\alpha_1), x(\alpha_2), \ldots, x(\alpha_m)]$

Example: Baking 1 cookie
Stationary heat equation with piecewise constant heat conductivity $\sigma(x, \alpha)$:

$-\nabla(\sigma(x,\alpha) \nabla u) = f$ in $\Omega = [-1,1]^2$, $\quad u = 0$ on $\partial\Omega$,

- $\sigma(\text{baking tray}) = 1$
- $\sigma(\text{cookie}) = 1 + \alpha$
- Undetermined parameter $\alpha \in [\alpha_{\min}, \alpha_{\max}]$.

[Figure: finite element mesh; # Vertices: 455, # Elements: 825, # Edges: 1279]

Standard FE discretization results in linearly parameter-dependent linear system

$(A_0 + \alpha A_1)\, x(\alpha) = b.$

Singular value decay – observation
- 1 Cookie: $n = 371$, $m = 101$.

[Figure: log10(singular values of snapshot matrix) versus index]

- Foundation of Proper Orthogonal Decomposition and Reduced Basis Methods.

Singular value decay – explanation
Polynomial approximation:

$x(\alpha) = x_0 + \alpha x_1 + \alpha^2 x_2 + \cdots + \alpha^{k-1} x_{k-1} + \text{error}.$

Approximation error:
- Assume $b(\cdot)$, $A(\cdot)$ analytic; then $x(\cdot)$ is analytic.
- Then

$\text{error} \lesssim \rho^{-k},$

where $\rho > 1$ depends on the domain of analyticity of $A, b$. (Proof: Direct extension of classical result for scalar-valued functions.)

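Singular value decay – explanation
Polynomial approximation:

$x(\alpha) = x_0 + \alpha x_1 + \alpha^2 x_2 + \cdots + \alpha^{k-1} x_{k-1} + \text{error}.$

Snapshot matrix:

$X = [x(\alpha_1), x(\alpha_2), \ldots, x(\alpha_m)] = [x_0, x_1, \ldots, x_{k-1}] \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_m \\ \vdots & \vdots & & \vdots \\ \alpha_1^{k-1} & \alpha_2^{k-1} & \cdots & \alpha_m^{k-1} \end{bmatrix} + \text{error} = \text{matrix of rank } k + \text{error}$

$\Rightarrow \; \sigma_{k+1}(X) \le \text{error} \lesssim \rho^{-k}$

Remark: Trivially extends to the piecewise analytic case.
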
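The decay can be observed on a small stand-in for the cookie problem. The sketch below replaces the finite element matrices by simple assumed ones (a scaled 1D Laplacian for $A_0$ and a diagonal indicator matrix for $A_1$); they are not the matrices of the lecture example.

```matlab
n = 50;  m = 101;
e  = ones(n, 1);
A0 = (n+1)^2 * spdiags([-e 2*e -e], -1:1, n, n);   % assumed stand-in for A_0
A1 = spdiags(double((1:n).' > n/2), 0, n, n);      % parameter acts on half the domain
b  = ones(n, 1);

alphas = linspace(0, 1, m);
X = zeros(n, m);
for j = 1:m
    X(:, j) = (A0 + alphas(j) * A1) \ b;           % snapshot x(alpha_j)
end

s = svd(X);
disp(s(1:8).' / s(1));                             % rapidly decaying ratios
```

Although the snapshot matrix has 101 columns, only a handful of singular values are above rounding level, which is what reduced basis methods exploit.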
Singular value decay – pw analytic case
Example: Consider the smallest singular value $\sigma(z)$ and corresponding right singular vector $v(z)$ of $B(z) = A - izI$ for $z \in [-1,1]$.

- $\sigma(z)$ is only Lipschitz continuous, but piecewise analytic.
- $v(z)$ is discontinuous, but piecewise analytic.
- $A$ = 2 × 2 block diagonal randn matrix, $n = 400$.
- Snapshot matrix of singular vectors:

$X = [v(z_1), v(z_2), \ldots, v(z_{100})]$

for equidistant samples $z_j \in [-1,1]$.

[Figure: left panel, $\sigma(z)$ versus $z$; right panel, singular values of $X$ on a logarithmic scale]

Summary

Need strong singular value decay for good low-rank approximations.

For function-related matrices/tensors: Strong link to semi-separable approximations.

Smoothness seems to be important... at least somehow.
- Fortunately, smoothness is not necessary. Piecewise smoothness can be enough.
- Unfortunately, smoothness is not sufficient for higher-order tensors.
- Need to impose stronger regularity as dimension/order $d$ increases, based, e.g., on mixed weak derivatives [Yserentant: Regularity and approximability of electronic wave functions. 2010].

Low-rank tensors: CP and Tucker
- CP
- Tucker
- Higher-order SVD
- Tensor networks

CP decomposition
- Aim: Generalize concept of low rank from matrices to tensors.
- One possibility motivated by

$X = [a_1, a_2, \ldots, a_R] \, [b_1, b_2, \ldots, b_R]^T = a_1 b_1^T + a_2 b_2^T + \cdots + a_R b_R^T.$

Vectorization gives

$\mathrm{vec}(X) = b_1 \otimes a_1 + b_2 \otimes a_2 + \cdots + b_R \otimes a_R.$

Canonical Polyadic decomposition of tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ defined via

$\mathrm{vec}(\mathcal{X}) = c_1 \otimes b_1 \otimes a_1 + c_2 \otimes b_2 \otimes a_2 + \cdots + c_R \otimes b_R \otimes a_R$

for vectors $a_j \in \mathbb{R}^{n_1}$, $b_j \in \mathbb{R}^{n_2}$, $c_j \in \mathbb{R}^{n_3}$.

CP directly corresponds to semi-separable approximation.
Tensor rank of $\mathcal{X}$ = minimal possible $R$

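CP decomposition
Illustration of CP decomposition

$\mathrm{vec}(\mathcal{X}) = c_1 \otimes b_1 \otimes a_1 + c_2 \otimes b_2 \otimes a_2 + \cdots + c_R \otimes b_R \otimes a_R.$

[Figure: tensor $\mathcal{X}$ drawn as a sum of rank-one terms built from $(a_1, b_1, c_1)$ up to $(a_R, b_R, c_R)$]
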
CP decomposition
- CP decomposition offers low data complexity; for constant $R$: linear complexity in $d$.
- For matrices:
  - rank $r$ is lower semi-continuous, which gives a closedness property: a sequence of rank-$r$ matrices can only converge to a matrix of rank $\le r$.
  - best low-rank approximation possible by successive rank-1 approximations.
  - Robust black-box algorithms/software available (svd, Lanczos).

For tensors of order $d \ge 3$:
- tensor rank $R$ is not lower semi-continuous, hence lack of closedness
- successive rank-1 approximations fail
- all algorithms based on optimization techniques (ALS, Gauss-Newton)

[Figure: picture taken from [Kolda/Bader’2009]]

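Tucker decomposition
- Aim: Generalize concept of low rank from matrices to tensors.
- Alternative possibility motivated by

$X = U \cdot \Sigma \cdot V^T, \quad U \in \mathbb{R}^{n_1 \times r}, \quad V \in \mathbb{R}^{n_2 \times r}, \quad \Sigma \in \mathbb{R}^{r \times r}.$

Vectorization gives

$\mathrm{vec}(X) = (V \otimes U) \cdot \mathrm{vec}(\Sigma).$

Ignore the diagonal structure of $\Sigma$ and call it $C$.

Tucker decomposition of tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ defined via

$\mathrm{vec}(\mathcal{X}) = (W \otimes V \otimes U) \cdot \mathrm{vec}(\mathcal{C})$

with $U \in \mathbb{R}^{n_1 \times r_1}$, $V \in \mathbb{R}^{n_2 \times r_2}$, $W \in \mathbb{R}^{n_3 \times r_3}$, and core tensor $\mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times r_3}$.

In terms of µ-mode matrix products:

$\mathcal{X} = U \circ_1 V \circ_2 W \circ_3 \mathcal{C} =: (U, V, W) \circ \mathcal{C}.$
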
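Tucker decomposition
Illustration of Tucker decomposition

$\mathcal{X} = (U, V, W) \circ \mathcal{C}$

[Figure: tensor $\mathcal{X}$ drawn as core tensor $\mathcal{C}$ with factor matrices $U$, $V$, $W$ attached to its three modes]
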
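Tucker decomposition
Consider all three matricizations:

$X^{(1)} = U \cdot C^{(1)} \cdot (W \otimes V)^T,$
$X^{(2)} = V \cdot C^{(2)} \cdot (W \otimes U)^T,$
$X^{(3)} = W \cdot C^{(3)} \cdot (V \otimes U)^T.$

These are low-rank decompositions:

$\mathrm{rank}\big( X^{(1)} \big) \le r_1, \quad \mathrm{rank}\big( X^{(2)} \big) \le r_2, \quad \mathrm{rank}\big( X^{(3)} \big) \le r_3.$

Multilinear rank of tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ defined by tuple

$(r_1, r_2, r_3)$, with $r_i = \mathrm{rank}\big( X^{(i)} \big)$.
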
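The relation between the Tucker factors and the multilinear rank can be checked on a random example. In this sketch the sizes, ranks, and random factors are arbitrary; for generic random data the matricization ranks equal $(r_1, r_2, r_3)$.

```matlab
n = [6 7 8];  r = [2 3 4];
C = randn(r);                                   % core tensor, r1 x r2 x r3
U = randn(n(1), r(1));  V = randn(n(2), r(2));  W = randn(n(3), r(3));

% vec(X) = (W kron V kron U) vec(C), folded into an n1 x n2 x n3 array.
X = reshape(kron(W, kron(V, U)) * C(:), n);

% 1-mode matricization: X^(1) = U * C^(1) * (W kron V)^T.
X1 = reshape(X, n(1), []);
disp(norm(X1 - U * reshape(C, r(1), []) * kron(W, V).', 'fro'));

% Multilinear rank from the three matricizations.
X2 = reshape(permute(X, [2 1 3]), n(2), []);
X3 = reshape(permute(X, [3 1 2]), n(3), []);
disp([rank(X1) rank(X2) rank(X3)]);
```

Storage drops from $n_1 n_2 n_3$ numbers to $r_1 r_2 r_3 + n_1 r_1 + n_2 r_2 + n_3 r_3$.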
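Higher-order SVD (HOSVD)
Goal: Approximate given tensor $\mathcal{X}$ by Tucker decomposition with prescribed multilinear rank $(r_1, r_2, r_3)$.

1. Calculate SVD of matricizations:

$X^{(\mu)} = \tilde{U}_\mu \tilde{\Sigma}_\mu \tilde{V}_\mu^T$ for $\mu = 1, 2, 3$.

2. Truncate basis matrices:

$U_\mu := \tilde{U}_\mu(:, 1:r_\mu)$ for $\mu = 1, 2, 3$.

3. Form core tensor:

$\mathrm{vec}(\mathcal{C}) := (U_3^T \otimes U_2^T \otimes U_1^T) \cdot \mathrm{vec}(\mathcal{X}).$

Truncated tensor produced by HOSVD [Lathauwer/De Moor/Vandewalle’2000]:

$\mathrm{vec}(\tilde{\mathcal{X}}) := (U_3 \otimes U_2 \otimes U_1) \cdot \mathrm{vec}(\mathcal{C}).$

Remark: Orthogonal projection $\tilde{\mathcal{X}} := (\pi_1 \circ \pi_2 \circ \pi_3) \mathcal{X}$ with $\pi_\mu \mathcal{X} := U_\mu U_\mu^T \circ_\mu \mathcal{X}$.
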
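Higher-order SVD (HOSVD)
Tensor $\tilde{\mathcal{X}}$ resulting from HOSVD satisfies quasi-optimality condition

$\|\mathcal{X} - \tilde{\mathcal{X}}\| \le \sqrt{d} \, \|\mathcal{X} - \mathcal{X}_{\text{best}}\|,$

where $\mathcal{X}_{\text{best}}$ is the best approximation of $\mathcal{X}$ with multilinear ranks $(r_1, \ldots, r_d)$.

Proof:

$\|\mathcal{X} - \tilde{\mathcal{X}}\|^2 = \|\mathcal{X} - (\pi_1 \circ \pi_2 \circ \pi_3)\mathcal{X}\|^2$
$= \|\mathcal{X} - \pi_1 \mathcal{X}\|^2 + \|\pi_1 \mathcal{X} - (\pi_1 \circ \pi_2)\mathcal{X}\|^2 + \|(\pi_1 \circ \pi_2)\mathcal{X} - (\pi_1 \circ \pi_2 \circ \pi_3)\mathcal{X}\|^2$
$\le \|\mathcal{X} - \pi_1 \mathcal{X}\|^2 + \|\mathcal{X} - \pi_2 \mathcal{X}\|^2 + \|\mathcal{X} - \pi_3 \mathcal{X}\|^2$

Using

$\|\mathcal{X} - \pi_\mu \mathcal{X}\| \le \|\mathcal{X} - \mathcal{X}_{\text{best}}\|$ for $\mu = 1, 2, 3$

leads to

$\|\mathcal{X} - \tilde{\mathcal{X}}\|^2 \le 3 \cdot \|\mathcal{X} - \mathcal{X}_{\text{best}}\|^2.$

Best approximation: See [Kolda/Bader’09].
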
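The three HOSVD steps and the intermediate inequality of the proof can be sketched in base MATLAB. The example tensor and target ranks below are arbitrary; since $\|\mathcal{X} - \pi_\mu \mathcal{X}\|^2$ equals the sum of the discarded squared singular values of $X^{(\mu)}$, the bound is computable without knowing $\mathcal{X}_{\text{best}}$.

```matlab
n = [10 11 12];  r = [3 3 3];
[i1, i2, i3] = ndgrid(1:n(1), 1:n(2), 1:n(3));
X = 1 ./ (i1 + i2 + i3);                 % example tensor

perms = {[1 2 3], [2 1 3], [3 1 2]};     % permutations for the matricizations
U = cell(1, 3);  tail = 0;
for mu = 1:3
    Xmu = reshape(permute(X, perms{mu}), n(mu), []);
    [Q, S, ~] = svd(Xmu, 'econ');        % step 1: SVD of X^(mu)
    U{mu} = Q(:, 1:r(mu));               % step 2: truncated basis U_mu
    s = diag(S);
    tail = tail + sum(s(r(mu)+1:end).^2);   % ||X - pi_mu X||^2
end

K  = kron(U{3}, kron(U{2}, U{1}));
c  = K.' * X(:);                         % step 3: vec(C)
Xt = reshape(K * c, n);                  % truncated tensor
err = norm(X(:) - Xt(:));
fprintf('error %.3e <= bound %.3e\n', err, sqrt(tail));
```

Forming `K` explicitly is only reasonable for such small sizes; an efficient implementation would apply the three factors as successive mode products.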
Tucker decomposition – Summary
For general tensors:
- multilinear rank $r$ is lower semi-continuous, which gives a closedness property.
- HOSVD: simple and robust algorithm to obtain quasi-optimal low-rank approximation.
- quasi-optimality good enough for most applications in scientific computing.
- robust black-box algorithms/software available (e.g., Tensor Toolbox).

Drawback: Storage of core tensor $\sim r^d$, i.e., the curse of dimensionality is still present.

Tensor network diagrams
Tensor network = undirected graph with:
- each node is a tensor;
- each outgoing edge is a mode;
- each connected edge represents a contraction; example:

$\mathcal{Z}_{i_1,i_2,i_3,i_4} = \sum_{j=1}^{r} \mathcal{X}_{i_1,i_2,j} \, \mathcal{Y}_{j,i_3,i_4}.$

[Figure: two nodes, each with modes 1, 2, 3, connected by one edge]

- number of free edges = order of tensor represented by entire network

Researchers on quantum many-body problems think [2] in terms of tensor networks!

[2] and dream

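A contraction over one connected edge is just a matrix product between suitable matricizations. The following sketch evaluates the example above for arbitrary small sizes.

```matlab
X = rand(3, 4, 5);  Y = rand(5, 6, 7);   % shared index j has length r = 5

% Contract the last mode of X with the first mode of Y:
% (n1*n2 x r) times (r x n3*n4), then fold into a 4th-order tensor.
Z = reshape(reshape(X, [], 5) * reshape(Y, 5, []), [3 4 6 7]);

% Check one entry against the explicit sum over j.
ref = squeeze(X(2, 3, :)).' * Y(:, 4, 5);
disp(abs(Z(2, 3, 4, 5) - ref));          % rounding-error level
```

The result has four free edges, so it is a tensor of order 4, matching the rule for counting free edges.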
Tensor network diagrams
Examples:

[Figure: five tensor network diagrams, labelled (i) to (v)]

(i) vector;
(ii) matrix;
(iii) matrix-matrix multiplication;
(iv) Tucker decomposition;
(v) hierarchical Tucker decomposition.

Low-rank tensors: Hierarchical Tucker
- Intro of Hierarchical Tucker Decomposition (HTD)
- MATLAB toolbox htucker
- Basic operations: µ-mode matrix multiplication, addition, ...
- Advanced operations: inner product, elementwise multiplication, ...

Introduction
- CP offers low data complexity but difficult truncation;
- Tucker offers simple truncation but high data complexity.

Recently developed formats:
- Matrix Product State (MPS),
- TT decomposition,
- Hierarchical Tucker decomposition (HTD).

Aim to offer compromise between CP and Tucker.

Focus in this lecture: HTD.
- L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
- W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
- D. Kressner and C. Tobler. htucker – A MATLAB toolbox for the hierarchical Tucker decomposition. In preparation. See http://www.math.ethz.ch/~ctobler.

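More general matricizations
Recall: µ-mode matricization for tensor $\mathcal{X}$,

$X^{(\mu)} \in \mathbb{R}^{n_\mu \times (n_1 \cdots n_{\mu-1} n_{\mu+1} \cdots n_d)}, \quad \mu = 1, \ldots, d.$

It is getting ugly...

General matricization for mode decomposition $\{1, \ldots, d\} = t \cup s$:

$X^{(t)} \in \mathbb{R}^{(n_{t_1} \cdots n_{t_k}) \times (n_{s_1} \cdots n_{s_{d-k}})}$

with

$\big( X^{(t)} \big)_{(i_{t_1}, \ldots, i_{t_k}), (i_{s_1}, \ldots, i_{s_{d-k}})} := \mathcal{X}_{i_1, \ldots, i_d}.$

[Figure: tensor $\mathcal{X}$ and its matricizations $X^{(1)}$ and $X^{(1,2)}$]
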
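Hierarchical construction
Singular value decomposition: $X^{(t)} = U_t \Sigma_t U_s^T$.
Column spaces are nested:

$t = t_1 \cup t_2 \;\Rightarrow\; \mathrm{span}(U_t) \subset \mathrm{span}(U_{t_2} \otimes U_{t_1}) \;\Rightarrow\; \exists B_t : U_t = (U_{t_2} \otimes U_{t_1}) B_t.$

Size of $U_t$:

$U_t \in \mathbb{R}^{n_{t_1} \cdots n_{t_k} \times r_t}$ with $r_t = \mathrm{rank}\big( X^{(t)} \big)$.

For $d = 4$:

$U_{12} = (U_2 \otimes U_1) B_{12}$
$U_{34} = (U_4 \otimes U_3) B_{34}$
$\mathrm{vec}(\mathcal{X}) = X^{(1234)} = (U_{34} \otimes U_{12}) B_{1234}$
$\Rightarrow \; \mathrm{vec}(\mathcal{X}) = (U_4 \otimes U_3 \otimes U_2 \otimes U_1)(B_{34} \otimes B_{12}) B_{1234}.$
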
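The nestedness of the column spaces can be verified numerically for the node $t = \{1,2\}$. The sketch below builds an arbitrary order-4 tensor with small multilinear rank from random Tucker factors (an assumption made only to obtain a low-rank test tensor) and recovers the transfer matrix $B_{12}$.

```matlab
n = 4;  r = 2;
C = randn(r, r, r, r);
F = cell(1, 4);
for mu = 1:4
    F{mu} = randn(n, r);
end
% Order-4 test tensor: vec(X) = (F4 kron F3 kron F2 kron F1) vec(C).
X = reshape(kron(kron(F{4}, F{3}), kron(F{2}, F{1})) * C(:), [n n n n]);

% Orthonormal bases for the column spaces of X^(1), X^(2), and X^(12).
[U1, ~, ~] = svd(reshape(X, n, []), 'econ');
U1 = U1(:, 1:r);
[U2, ~, ~] = svd(reshape(permute(X, [2 1 3 4]), n, []), 'econ');
U2 = U2(:, 1:r);
X12 = reshape(X, n*n, []);
r12 = rank(X12);
[U12, ~, ~] = svd(X12, 'econ');
U12 = U12(:, 1:r12);

% Transfer matrix B12 with U12 = (U2 kron U1) * B12.
B12 = kron(U2, U1).' * U12;
disp(norm(U12 - kron(U2, U1) * B12, 'fro'));    % nestedness residual
```

Only the small matrix `B12` of size $r_1 r_2 \times r_{12}$ needs to be stored for this node, not `U12` itself.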
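Dimension tree
Tree structure for $d = 4$:

[Figure: binary dimension tree with root $B_{1234}$ ($r_{12} r_{34} \times 1$), inner nodes $B_{12}$ ($r_1 r_2 \times r_{12}$) and $B_{34}$ ($r_3 r_4 \times r_{34}$), and leaves $U_1$ ($n_1 \times r_1$), $U_2$ ($n_2 \times r_2$), $U_3$ ($n_3 \times r_3$), $U_4$ ($n_4 \times r_4$)]

Reshape:

$B_{12} \in \mathbb{R}^{r_1 r_2 \times r_{12}} \;\Rightarrow\; \mathcal{B}_{12} \in \mathbb{R}^{r_1 \times r_2 \times r_{12}}$
$B_{34} \in \mathbb{R}^{r_3 r_4 \times r_{34}} \;\Rightarrow\; \mathcal{B}_{34} \in \mathbb{R}^{r_3 \times r_4 \times r_{34}}$
$B_{1234} \in \mathbb{R}^{r_{12} r_{34} \times 1} \;\Rightarrow\; \mathcal{B}_{1234} \in \mathbb{R}^{r_{12} \times r_{34}}$
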
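Dimension tree

[Figure: dimension tree with root $B_{1234}$, inner nodes $B_{12}$ and $B_{34}$, and leaves $U_1$, $U_2$, $U_3$, $U_4$]

- Often, $U_1, U_2, U_3, U_4$ are orthonormal. This is advantageous but not required.
- Storage requirements for general $d$:

$O(dnr) + O(dr^3),$

where $r = \max\{r_t\}$, $n = \max\{n_\mu\}$.
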
Constructors for MATLAB class htensor

- x = htensor([4 5 6 7]) constructs a zero htensor of size 4 × 5 × 6 × 7, with a balanced dimension tree.
- x = htensor([4 5 6 7], 'TT') constructs a zero htensor of size 4 × 5 × 6 × 7, with a TT-style dimension tree.
- x = htensor({U1, U2, U3}) constructs an htensor from a tensor in CP decomposition, $\mathcal{X}(i_1, i_2, i_3) = \sum_j U_1(i_1, j) U_2(i_2, j) U_3(i_3, j)$.
- x = htenrandn([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with random ranks and random entries.
- x = htenones([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with all entries one.
- ...

Basic functionality for MATLAB class htensor
Example: x is an htensor of order 4.

- x(1, 3, 4, 2) returns an entry of $\mathcal{X}$.
- x(1, 3, :, :) returns a slice of $\mathcal{X}$ as an htensor.
- full(x) returns the full tensor represented by $\mathcal{X}$. (use with care)
- disp_tree(htenrand([5 4 6 3])) returns tree structure/ranks:

    ans is an htensor of size 5 x 4 x 6 x 3
    1-4 1; 6 3 1
      1-2 2; 3 4 6
        1 4; 5 3
        2 5; 4 4
      3-4 3; 3 3 3
        3 6; 6 3
        4 7; 3 3

- spy(x) displays spy plots of $U_t$, $B_t$ on the dimension tree.
- change_root(x, i) switches the root node.

Singular value tree
plot_sv(x) plots singular values of the corresponding matricizations in the dimension tree of a tensor $\mathcal{X}$.

Example: Singular value tree of solution to elliptic PDE with 4 parameters.

[Figure: singular value plots arranged on the dimension tree, with nodes Dim. 1, 2 and Dim. 3, 4, 5; below them Dim. 1, Dim. 2, Dim. 3, and Dim. 4, 5; below that Dim. 4 and Dim. 5]

Remark: Singular values are computed from Gramians.

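Basic ops: µ-mode matrix multiplication
Application of matrix $A \in \mathbb{R}^{m \times n_\mu}$ to mode µ of $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$:

$\mathcal{Y} = A \circ_\mu \mathcal{X} \;\Leftrightarrow\; Y^{(\mu)} = A X^{(\mu)}.$

Nearly trivial if $\mathcal{X}$ is in H-Tucker format:

$A \circ_\mu \mathcal{X} = A \circ_\mu \big( (U_1, \ldots, U_d) \circ \mathcal{C} \big) = (U_1, \ldots, U_{\mu-1}, A U_\mu, U_{\mu+1}, \ldots, U_d) \circ \mathcal{C}$

- Almost no operations required.
- Ranks stay the same.
- Orthogonality destroyed.

In htucker:
- ttm(x, A, 2) applies matrix $A$ to htensor $\mathcal{X}$ in mode 2.
- y = ttm(x, {A, B, C}, [2, 3, 4]) applies matrices $A$, $B$, $C$ in modes 2, 3, 4.
- y = ttm(x, @(x)(fft(x)), 2) applies FFT in mode 2.
- y = ttm(x, {A, B, C}, [2, 3, 4], 'h') successively applies matrices $A^T$, $B^T$, $C^T$ in modes 2, 3, 4.
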
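Addition of low-rank matrices
Addition of two matrices in low-rank format:

$A = U_1 \Sigma_A U_2^T, \quad B = V_1 \Sigma_B V_2^T$

$\Rightarrow \; A + B = \begin{bmatrix} U_1 & V_1 \end{bmatrix} \begin{bmatrix} \Sigma_A & 0 \\ 0 & \Sigma_B \end{bmatrix} \begin{bmatrix} U_2 & V_2 \end{bmatrix}^T$

- No operations required.
- Rank increases.
- Orthogonality destroyed.
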
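The three bullet points can be seen directly in a small experiment. The factors below are random and serve only as an illustration.

```matlab
n = 30;  m = 20;
[U1, ~] = qr(randn(n, 3), 0);  [U2, ~] = qr(randn(m, 3), 0);   % factors of A
[V1, ~] = qr(randn(n, 2), 0);  [V2, ~] = qr(randn(m, 2), 0);   % factors of B
SA = diag([3 2 1]);  SB = diag([5 4]);

% Sum in factored form: concatenate bases, block-diagonal middle factor.
W1 = [U1 V1];  W2 = [U2 V2];  S = blkdiag(SA, SB);

A = U1 * SA * U2.';  B = V1 * SB * V2.';
disp(norm(A + B - W1 * S * W2.', 'fro'));   % factored sum reproduces A + B
disp(norm(W1.' * W1 - eye(5), 'fro'));      % not small: orthonormality is lost
```

The factored sum has formal rank 3 + 2 = 5, which is why repeated additions are normally followed by a truncation step.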
Addition of low-rank tensors
Addition of four tensors $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3, \mathcal{X}_4$ in H-Tucker format:

$\mathcal{X}_1 + \mathcal{X}_2 + \mathcal{X}_3 + \mathcal{X}_4.$

Proceed as in matrix case by embedding factors in larger matrices.
- No operations required.
- H-Tucker rank increases.
- Orthogonality destroyed.

Command in htucker: x1 + x2 + x3 + x4

[Figure: dimension tree of the sum; each leaf combines $U_\mu^{[1]}, \ldots, U_\mu^{[4]}$ and each inner node combines $B_t^{[1]}, \ldots, B_t^{[4]}$]

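Orthogonalization
Any tensor $\mathcal{X}$ in H-Tucker format can be orthogonalized in the sense that all factors in the dimension tree, except for the root node, contain orthonormal columns.

Example: $\mathrm{vec}(\mathcal{X}) = (U_4 \otimes U_3 \otimes U_2 \otimes U_1)(B_{34} \otimes B_{12}) B_{1234}$.

Step 1: QR decompositions $U_t = Q_t R_t$

$\mathrm{vec}(\mathcal{X}) = (Q_4 \otimes Q_3 \otimes Q_2 \otimes Q_1)(\tilde{B}_{34} \otimes \tilde{B}_{12}) B_{1234}$

with $\tilde{B}_{34} := (R_4 \otimes R_3) B_{34}$, $\tilde{B}_{12} := (R_2 \otimes R_1) B_{12}$.

Step 2: QR decompositions $\tilde{B}_{34} = Q_{34} R_{34}$, $\tilde{B}_{12} = Q_{12} R_{12}$

$\mathrm{vec}(\mathcal{X}) = (Q_4 \otimes Q_3 \otimes Q_2 \otimes Q_1)(Q_{34} \otimes Q_{12}) \tilde{B}_{1234}$

with $\tilde{B}_{1234} := (R_{34} \otimes R_{12}) B_{1234}$.

Computational requirements for general $d$: $O(dnr^2) + O(dr^4)$.

Command in htucker: x = orthog(x)
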
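Norms and inner products
Inner product of two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$:

$\langle \mathcal{X}, \mathcal{Y} \rangle = \langle \mathrm{vec}(\mathcal{X}), \mathrm{vec}(\mathcal{Y}) \rangle = \sum_{i_1=1}^{n_1} \cdots \sum_{i_d=1}^{n_d} x_{i_1,\ldots,i_d} \, y_{i_1,\ldots,i_d}.$

Can be performed efficiently in H-Tucker, provided that $\mathcal{X}, \mathcal{Y}$ have compatible dimension trees.

Example: Two tensors of order 4

$\langle \mathcal{X}, \mathcal{Y} \rangle = (B^x_{1234})^T (B^x_{34} \otimes B^x_{12})^T (U^x_4 \otimes U^x_3 \otimes U^x_2 \otimes U^x_1)^T (U^y_4 \otimes U^y_3 \otimes U^y_2 \otimes U^y_1)(B^y_{34} \otimes B^y_{12}) B^y_{1234}$

Norm: After $\mathcal{X}$ has been orthogonalized:

$\|\mathcal{X}\| = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle} = \|B^x_{12 \cdots d}\|_F.$

Possibly the most accurate way to compute the norm. Used in norm(x).
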
Computation of inner products

$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{n_1} \cdots \sum_{i_d=1}^{n_d} x_{i_1,\ldots,i_d} \, y_{i_1,\ldots,i_d}.$

[Figures: step-by-step contraction of the tensor network for the inner product]

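Computation of inner products – contraction step

[Figure: contraction of $(B^x_t)^T$, $(U^x_{t_2})^T U^y_{t_2}$, $(U^x_{t_1})^T U^y_{t_1}$, and $B^y_t$]

$(U^x_t)^T U^y_t = (B^x_t)^T \big( (U^x_{t_2})^T U^y_{t_2} \otimes (U^x_{t_1})^T U^y_{t_1} \big) B^y_t.$

- htucker command: innerprod(x,y)
- Overall cost: $O(dnr^2) + O(dr^4)$.
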
Reduced Gramians in H-Tucker

[Figure: node $t$ with factor $U_t$ and reduced Gramian $G_t$]

$X^{(t)} = U_t V_t^T \;\Rightarrow\; X^{(t)} (X^{(t)})^T = U_t \underbrace{V_t^T V_t}_{=: G_t} U_t^T$

If $U_t$ is orthonormal: $\mathrm{svd}\big( X^{(t)} \big) = \sqrt{\mathrm{eig}(G_t)}$ (used in plot_sv).

[Figures: step-by-step computation of the reduced Gramians in the dimension tree]

Implemented in htucker command gramians(x).

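The relation between the singular values of $X^{(t)}$ and the eigenvalues of $G_t$ can be checked in the matrix setting. The sketch below uses random factors; the full matrix is formed only to obtain a reference.

```matlab
n = 40;  m = 500;  r = 4;
[U, ~] = qr(randn(n, r), 0);        % orthonormal U_t
V = randn(m, r);                    % X^(t) = U * V'
G = V.' * V;                        % reduced Gramian, only r x r
G = (G + G.') / 2;                  % enforce exact symmetry before eig

sv_gram = sqrt(sort(eig(G), 'descend'));
sv_full = svd(U * V.');             % reference: SVD of the full matrix
disp([sv_gram, sv_full(1:r)]);      % the two columns agree
```

The Gramian route works with an $r \times r$ matrix only, at the price of squaring the condition number, so very small singular values are resolved less accurately.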
Advanced operations
- Truncation
- Combined addition + truncation
- Elementwise multiplication
- Elementwise reciprocal

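Truncation of explicit tensor
Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ be explicitly given.

- For each tree node $t$, let $W_t$ contain the $r_t$ dominant left singular vectors of $X^{(t)}$ and define the projection

$\pi_t \mathcal{X} = W_t W_t^T \circ_t \mathcal{X} \;\Leftrightarrow\; (\pi_t \mathcal{X})^{(t)} = W_t W_t^T X^{(t)}.$

- Truncated tensor:

$\tilde{\mathcal{X}} := \Big( \prod_{t \in T_L} \pi_t \Big) \cdots \Big( \prod_{t \in T_1} \pi_t \Big) \mathcal{X},$

where $T_\ell$ contains all nodes on level $\ell$.
- [Grasedyck’2010]: $\|\mathcal{X} - \tilde{\mathcal{X}}\| \le \sqrt{2d - 3} \, \|\mathcal{X} - \mathcal{X}_{\text{best}}\|$.

Proof similar to that for HOSVD.
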
Truncation of explicit tensor
Example:

$\mathrm{vec}(\tilde{\mathcal{X}}) = (W_4 W_4^T \otimes W_3 W_3^T \otimes W_2 W_2^T \otimes W_1 W_1^T)(W_{34} W_{34}^T \otimes W_{12} W_{12}^T)\, \mathrm{vec}(\mathcal{X})$
$= (W_4 \otimes W_3 \otimes W_2 \otimes W_1) \Big( \underbrace{[W_4^T \otimes W_3^T] W_{34}}_{=: B_{34}} \otimes \underbrace{[W_2^T \otimes W_1^T] W_{12}}_{=: B_{12}} \Big) \underbrace{\big( [W_{34}^T \otimes W_{12}^T]\, \mathrm{vec}(\mathcal{X}) \big)}_{=: B_{1234}}.$

- opts.max_rank = 10: maximal rank at truncation.
- opts.rel_eps = 1e-6: maximal relative truncation error.
- opts.abs_eps = 1e-6: maximal absolute truncation error.
- Condition max_rank takes precedence over rel_eps and abs_eps.
- xt = htensor.truncate_rtl(x, opts) returns the truncated tensor $\tilde{\mathcal{X}}$ of a multidimensional array.

Remark: There is also a significantly faster htensor.truncate_ltr (proceeds successively from leafs to roots), for which the same error bound holds [Tobler’10].

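Truncation of H-Tucker tensor
Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ be in H-Tucker format and orthogonalized.

- Compute left singular vectors of $X^{(t)} = U_t V_t^T$ from eigenvectors of

$X^{(t)} \big( X^{(t)} \big)^T = U_t \underbrace{V_t^T V_t}_{= G_t} U_t^T,$

with reduced Gramian $G_t$. If $S_t$ contains the $r_t$ dominant eigenvectors of $G_t$, then $W_t = U_t S_t$.

- Traverse tree from root to leafs. In each step:

[Figure: the projection $S_t S_t^T$ is inserted between $B_t$ and its parent node $B_{t_p}$, which results in the updated nodes $S_t^T \circ B_{t_p}$ and $S_t \circ B_t$]

- In htucker: truncate(x,opts). Complexity $O(dnr^2 + dr^4)$.
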
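Combined addition + truncation
Sum of more than two tensors:
$$\mathcal{Y} = \mathcal{X}_1 + \mathcal{X}_2 + \cdots + \mathcal{X}_s.$$
Two possibilities to incorporate truncation operator $\mathcal{T}$:
1. $\mathcal{Y} \approx \mathcal{T}(\mathcal{X}_1 + \mathcal{X}_2 + \mathcal{X}_3 + \cdots + \mathcal{X}_s)$
2. $\mathcal{Y} \approx \mathcal{T}(\cdots \mathcal{T}(\mathcal{T}(\mathcal{X}_1 + \mathcal{X}_2) + \mathcal{X}_3) + \cdots + \mathcal{X}_s)$
Option 2 is usually significantly cheaper but may suffer from severe cancellation.
Artificial example: $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3 \in \mathbb{R}^{101 \times 101 \times 101}$ truncated tensor grid discretizations for summands of
$$f(x_1, x_2, x_3) = \tan(x_1 + x_2 + x_3) + (x_1 + x_2 + x_3)^{-1} - \tan(x_1 + x_2 + x_3).$$
Error(Option 1) $\approx 10^{-7}$. Error(Option 2) $\approx 1.3$.
What is wrong with Option 1?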
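The cancellation effect of Option 2 already shows up for matrices. The sketch below is a matrix analogue of the artificial example, not a reproduction of it: a large rank-2 term $A$ takes the role of the $\tan$ summand, a small rank-1 term $B$ the role of $(x_1 + x_2 + x_3)^{-1}$, and $\mathcal{T}$ is a truncated SVD with fixed maximal rank. All sizes and scalings are assumptions chosen for illustration.

```python
import numpy as np

def truncate(X, max_rank):
    """Best approximation of X with rank at most max_rank (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :max_rank] * s[:max_rank]) @ Vt[:max_rank]

rng = np.random.default_rng(3)
n, r = 50, 2
A = 1e3 * rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # large rank-2 term
B = np.outer(rng.standard_normal(n), rng.standard_normal(n))         # small rank-1 term
summands = [A, B, -A]          # the exact sum is B
exact = B

# Option 1: form the full sum, truncate once.
option1 = truncate(sum(summands), r)

# Option 2: truncate after every single addition.
option2 = summands[0]
for X in summands[1:]:
    option2 = truncate(option2 + X, r)

err1 = np.linalg.norm(option1 - exact) / np.linalg.norm(exact)
err2 = np.linalg.norm(option2 - exact) / np.linalg.norm(exact)
```

In Option 2 the first truncation discards most of $B$ because it is tiny relative to $A$. Once $A$ cancels, nothing of that information can be recovered, so `err2` is of order one while `err1` is at rounding level.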
[Diagram: H-Tucker representation of a sum of four tensors. Each leaf holds the matrices $U_t^{[1]}, \ldots, U_t^{[4]}$ of the summands, and the transfer tensors $B_{12}$, $B_{34}$, $B_{1234}$ are assembled from the blocks $B_t^{[1]}, \ldots, B_t^{[4]}$.]
- Orthogonalization (needed before truncation) destroys block diagonal structure.
- Complexity $O(dns^2r^2 + ds^4r^4)$ for $s$ summands.
Idea: New variant delays orthogonalization to keep block diagonal structure in transfer tensors as long as possible.
Reduces $O(dns^2r^2 + ds^4r^4)$ to $O(dns^2r^2 + ds^2r^4 + ds^3r^3)$.
[Plot: run time [s] versus number of summands (log-log axes) for "truncate std", "truncate sum", and "truncate succ.", with reference curves $O(t^4)$, $O(t^2)$, $O(t)$.]
- htucker command: add_truncate({x1 x2 x3 x4}, opts).
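Elementwise multiplication
Elementwise multiplication (also called Hadamard or Schur product) of two low-rank matrices $A = U_1 \Sigma_A U_2^T$, $B = V_1 \Sigma_B V_2^T$:
$$A \star B = (U_1 \,\tilde{\odot}\, V_1)(\Sigma_A \otimes \Sigma_B)(U_2 \,\tilde{\odot}\, V_2)^T,$$
with the row-wise Khatri-Rao product
$$C \,\tilde{\odot}\, D = \begin{pmatrix} c_1^T \\ \vdots \\ c_n^T \end{pmatrix} \tilde{\odot} \begin{pmatrix} d_1^T \\ \vdots \\ d_n^T \end{pmatrix} = \begin{pmatrix} c_1^T \otimes d_1^T \\ \vdots \\ c_n^T \otimes d_n^T \end{pmatrix}.$$
- Orthogonality destroyed.
- Rank increases significantly.
But: singular value decay of $\Sigma_A \otimes \Sigma_B$ may become significantly stronger, giving additional opportunities for truncation.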
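The factored form of the Hadamard product can be verified with a few lines of NumPy. This is an illustrative sketch. The function name `rowwise_khatri_rao` and the test sizes are assumptions, and the factors are random rather than orthonormal, since the identity does not depend on orthogonality.

```python
import numpy as np

def rowwise_khatri_rao(C, D):
    """Row i of the result is kron(C[i], D[i])."""
    return np.einsum('ij,ik->ijk', C, D).reshape(C.shape[0], -1)

rng = np.random.default_rng(4)
m, n, ra, rb = 6, 7, 2, 3
U1, U2 = rng.standard_normal((m, ra)), rng.standard_normal((n, ra))
V1, V2 = rng.standard_normal((m, rb)), rng.standard_normal((n, rb))
SA = np.diag([3.0, 1.0])
SB = np.diag([2.0, 0.5, 0.1])

A = U1 @ SA @ U2.T
B = V1 @ SB @ V2.T

# Factors of the Hadamard product: the rank grows from (ra, rb) to ra * rb.
left = rowwise_khatri_rao(U1, V1)
right = rowwise_khatri_rao(U2, V2)
hadamard_lowrank = left @ np.kron(SA, SB) @ right.T
```

The product `A * B` is never formed from its entries. Only the factors are combined, at the price of the multiplied rank.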
Elementwise multiplication of two tensors $\mathcal{X}, \mathcal{Y}$ in H-Tucker format:
- Row-wise Khatri-Rao product of leaf matrices.
- “Kronecker product” of non-leaf tensors.
- Optional: Products are only formed after suitable truncation to avoid excessive memory requirements.
Commands in htucker:
x.*y (without truncation)
x.^2 (without truncation)
elem_mult( x, y, opt ) (with truncation)
Elementwise reciprocal
Goal: Compute reciprocal of each entry in tensor $\mathcal{X}$.
Basic idea: Newton-Schulz iteration
$$y_0 = 1, \qquad y_{i+1} = y_i + y_i (1 - x\, y_i), \qquad (2)$$
converges to $1/x$ for $0 < x < 2$.
Apply (2) simultaneously to all entries.
Code snippet of elem_reciprocal( x, opt ) in htucker:
    all_ones = htenones(size(x));
    y = all_ones;
    for it=1:maxit
        xy = elem_mult( x, y );           % x .* y
        xy = truncate( all_ones - xy );   % 1 - x .* y
        xy = elem_mult( xy, y );          % y .* (1 - x .* y)
        y = truncate( y + xy );           % update (2)
    end
See also [Oseledets et al. 2009].
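Iteration (2) is easy to try on a plain array, where every operation is exact and no truncation is involved. The sketch below assumes all entries lie in $(0, 2)$. The function name and the iteration count are illustrative choices, not part of htucker.

```python
import numpy as np

def elementwise_reciprocal(x, maxit=30):
    """Newton-Schulz iteration y <- y + y * (1 - x * y), applied to all entries at once."""
    y = np.ones_like(x)
    for _ in range(maxit):
        y = y + y * (1.0 - x * y)
    return y

x = np.linspace(0.05, 1.95, 9)     # all entries inside the convergence interval (0, 2)
y = elementwise_reciprocal(x)
```

The error $e_i = 1 - x y_i$ satisfies $e_{i+1} = e_i^2$ with $e_0 = 1 - x$, so convergence is quadratic but starts slowly for entries close to 0 or 2.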
Example: $(x_1 + x_2 + x_3 + x_4)^{-1}$ with $x_i \in [10^{-3}, 1]$.
    c = laplace_core(4);
    U = [ones(100, 1), linspace(1e-3, 1, 100)'];
    x = ttm(c, {U, U, U, U});
    inv_x = elem_reciprocal(x, opts);
[Plot: convergence of $\|\mathcal{X} \star \mathcal{Y}_k - 1\| / \|1\|$ over the iterations $k$.]
[Plot: singular value tree upon convergence, with nodes Dim. 1, 2 and Dim. 3, 4 and leaves Dim. 1 to Dim. 4.]
Summary
- HTD offers good compromise between CP and Tucker.
- Algorithms often quite technical but conceptually simple.
- Computational complexity $\sim d$ but often $\sim r^4$: Curse of dimensionality $\Rightarrow$ curse of rank?
- Important to keep in mind: Unless $d$ is tiny, tensor $\mathcal{X}$ can/should never be formed explicitly. All operations need to be performed implicitly in HTD. Can pose severe problems even for seemingly simple operations: min(X), max(X), abs(X), 1./X, ...

Algorithms based on low-rank tensors
- Inexact LOBPCG
- ALS / MALS
Strategies for solving tensor equations
- In many practical situations, tensor $\mathcal{X}$ is given implicitly as solution to linear system $\mathcal{A}(\mathcal{X}) = \mathcal{B}$, eigenvalue problem $\mathcal{A}(\mathcal{X}) = \lambda \mathcal{X}$, nonlinear system, ODE, ...
Two main strategies to use low-rank tensor techniques:
1. Combine existing iterative solver (e.g., CG, LOBPCG, GMRES) with repeated low-rank truncation of iterates (inexact CG).
   - Straightforward to derive and implement (based, e.g., on htucker).
   - Hard to analyze impact of nonnegligible truncations on accuracy and convergence.
   - Intermediate rank growth may result in excessive computing times and/or harm accuracy and convergence.
2. Formulate optimization problem, constrain to low-rank tensors, iteratively optimize with respect to individual factors of low-rank format.
   - Works well in practice.
   - Convergence theory not well understood.
   - Not straightforward to implement.
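Example: PDE-eigenvalue problem
Goal: Compute smallest eigenvalue for
$$\Delta u(\xi) + V(\xi) u(\xi) = \lambda u(\xi) \ \text{in } \Omega = [0,1]^d, \qquad u(\xi) = 0 \ \text{on } \partial\Omega.$$
Assumption: Potential represented as
$$V(\xi) = \sum_{j=1}^{s} V_j^{(1)}(\xi_1)\, V_j^{(2)}(\xi_2) \cdots V_j^{(d)}(\xi_d).$$
Finite difference discretization:
$$\mathcal{A} u = (\mathcal{A}_L + \mathcal{A}_V) u = \lambda u,$$
with
$$\mathcal{A}_L = \sum_{j=1}^{d} \underbrace{I \otimes \cdots \otimes I}_{d-j \text{ times}} \otimes A_L \otimes \underbrace{I \otimes \cdots \otimes I}_{j-1 \text{ times}}, \qquad \mathcal{A}_V = \sum_{j=1}^{s} A_{V,j}^{(d)} \otimes \cdots \otimes A_{V,j}^{(2)} \otimes A_{V,j}^{(1)}.$$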
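For small $n$ and $d$ the Kronecker-sum structure of $\mathcal{A}_L$ can be formed explicitly and inspected. The sketch below is an illustration only: it assumes the standard second-difference matrix $\operatorname{tridiag}(-1, 2, -1)/h^2$ as the one-dimensional factor $A_L$ (a positive definite choice), and the helper names are hypothetical.

```python
import numpy as np

def laplace_1d(n):
    """Second-difference matrix tridiag(-1, 2, -1) / h^2 on n interior points of [0, 1]."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def kronecker_sum(A, d):
    """sum_j I x ... x I x A x I x ... x I, with A in position j counted from the right."""
    n = A.shape[0]
    total = np.zeros((n**d, n**d))
    for j in range(1, d + 1):
        term = np.eye(1)
        for pos in range(d, 0, -1):            # build from the leftmost factor
            term = np.kron(term, A if pos == j else np.eye(n))
        total += term
    return total

n, d = 4, 3
AL = laplace_1d(n)
A_full = kronecker_sum(AL, d)                  # n^d x n^d, feasible only for tiny n, d
lam_1d = np.linalg.eigvalsh(AL)
lam_min = np.linalg.eigvalsh(A_full)[0]
```

The eigenvalues of the Kronecker sum are sums of one-dimensional eigenvalues, so `lam_min` equals `d * lam_1d[0]`. The explicit matrix has $n^d$ rows, which is exactly what the low-rank formats avoid.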
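LOBPCG method
LOBPCG with block size 1 [Knyazev’01] for computing smallest eigenvalue of
$$A x = \lambda x, \quad A \text{ symmetric}.$$
    $\lambda_0 = \langle x_0, x_0 \rangle_A$, $p_0 = 0$
    for $k = 0, 1, \ldots$ (until converged) do
        $r_k = B^{-1}(A x_k - \lambda_k x_k)$
        $U = [x_k, r_k, p_k]$
        $\tilde{A} = U^T A U$, $\tilde{M} = U^T U$
        Find eigenpair $(\lambda_{k+1}, y)$, with $\|y\|_2 = 1$, for smallest eigenvalue of matrix pencil $\tilde{A} - \lambda \tilde{M}$.
        $p_{k+1} = y_2 \cdot r_k + y_3 \cdot p_k$
        $x_{k+1} = y_1 \cdot x_k + p_{k+1}$
        $x_{k+1} \leftarrow x_{k+1} / \|x_{k+1}\|_2$
    end for
    Return $(\lambda_{\min}, x) = (\lambda_{k+1}, x_{k+1})$.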
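A dense-matrix version of the iteration fits in a few lines. The sketch below is one possible variant, not the author's code: it orthonormalizes $U$ by a QR factorization, so that $\tilde{M} = I$ and the pencil reduces to a standard $3 \times 3$ eigenvalue problem. The search space is the same as in the algorithm above, with $p_{k+1}$ taken as the part of the update outside $\operatorname{span}\{x_k\}$. The function name, the default identity preconditioner, and the stopping rule are assumptions.

```python
import numpy as np

def lobpcg_single(A, x0, apply_prec=lambda r: r, maxit=500, tol=1e-8):
    """LOBPCG with block size 1 for the smallest eigenvalue of symmetric A."""
    x = x0 / np.linalg.norm(x0)
    lam = x @ (A @ x)
    p = np.zeros_like(x)
    for _ in range(maxit):
        res = A @ x - lam * x
        if np.linalg.norm(res) < tol:
            break
        U = np.column_stack([x, apply_prec(res), p])
        Q, _ = np.linalg.qr(U)                  # orthonormal basis of the search space
        theta, Z = np.linalg.eigh(Q.T @ A @ Q)  # Rayleigh-Ritz on a 3 x 3 matrix
        lam, z = theta[0], Z[:, 0]
        p = Q[:, 1:] @ z[1:]                    # new search direction p_{k+1}
        x = Q @ z                               # x_{k+1}, already of unit norm
    return lam, x

n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x0 = np.random.default_rng(6).standard_normal(n)
lam, x = lobpcg_single(A, x0)
```

The orthonormalization is what makes this version robust for dense vectors. For low-rank tensors the same step is problematic, because orthogonalizing the iterates increases their ranks.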
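Tensor low-rank LOBPCG
Truncated LOBPCG with block size 1 for computing smallest eigenvalue of
$$\mathcal{A}(\mathcal{X}) = \lambda \mathcal{X}, \quad \mathcal{A} \text{ symmetric}, \ \mathcal{X} \text{ tensor}.$$
    $\lambda_0 = \langle \mathcal{X}_0, \mathcal{X}_0 \rangle_{\mathcal{A}}$, $\mathcal{P}_0 = 0 \cdot \mathcal{X}_0$
    for $k = 0, 1, \ldots$ (until converged) do
        $\mathcal{R}_k = \mathcal{B}^{-1}(\mathcal{A}(\mathcal{X}_k) - \lambda_k \mathcal{X}_k)$, $\mathcal{R}_k \leftarrow \mathcal{T}(\mathcal{R}_k)$
        $U_1 = \mathcal{X}_k$, $U_2 = \mathcal{R}_k$, $U_3 = \mathcal{P}_k$
        $\tilde{A}_{ij} = \langle U_i, U_j \rangle_{\mathcal{A}}$, $\tilde{M}_{ij} = \langle U_i, U_j \rangle$
        Find eigenpair $(\lambda_{k+1}, y)$, with $\|y\|_2 = 1$, for smallest eigenvalue of matrix pencil $\tilde{A} - \lambda \tilde{M}$.
        $\mathcal{P}_{k+1} = y_2 \cdot \mathcal{R}_k + y_3 \cdot \mathcal{P}_k$, $\mathcal{P}_{k+1} \leftarrow \mathcal{T}(\mathcal{P}_{k+1})$
        $\mathcal{X}_{k+1} = y_1 \cdot \mathcal{X}_k + \mathcal{P}_{k+1}$, $\mathcal{X}_{k+1} \leftarrow \mathcal{T}(\mathcal{X}_{k+1})$
        $\mathcal{X}_{k+1} \leftarrow \mathcal{X}_{k+1} / \sqrt{\langle \mathcal{X}_{k+1}, \mathcal{X}_{k+1} \rangle}$
    end for
    Return $(\lambda_{\min}, \mathcal{X}) = (\lambda_{k+1}, \mathcal{X}_{k+1})$.
$\mathcal{T}$ = truncation to hierarchical low rank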
Implementation details

Orthogonalization
In standard LOBPCG, orthogonalization of $U$ is recommended [Knyazev 2010]. This is not practical with low-rank tensors, as ranks would grow and truncation would destroy orthogonality.

Truncation
$\mathcal{X}_k, \mathcal{R}_k, \mathcal{P}_k$ are truncated in every step. Moreover, applying $\mathcal{A}(\cdot)$ and the preconditioner $\mathcal{B}^{-1}(\cdot)$ may itself involve truncation.

Inner product
Reduced matrix $\tilde{A}$ is very sensitive to truncation in $\mathcal{A}(\cdot)$. The computation of $\tilde{A}_{ij} = \langle U_i, U_j \rangle_{\mathcal{A}}$ must be exact.
Numerical Experiments - Sine potential
PDE-eigenvalue problem with $\Omega = [0, \pi]^d$ and sine potential
$$V(\xi) = q \cdot \prod_{i=1}^{d} \sin(\xi_i)$$
for some constant $q > 0$. We choose $d = 10$, $n = 128$.
Preconditioner: [Grasedyck 2004]
$$\mathcal{A}_L^{-1} = \int_0^\infty \exp(-t \mathcal{A}_L)\, dt \approx \sum_{j=-M}^{M} \omega_j \exp(-\alpha_j A_L^{(d)}) \otimes \cdots \otimes \exp(-\alpha_j A_L^{(1)}) =: \mathcal{B}^{-1},$$
for a certain, optimized and tabulated choice of coefficients $\alpha_j, \omega_j > 0$. We choose $M = 10$.
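The structure of this preconditioner can be reproduced for $d = 2$ with a generic quadrature. The sketch below does not use the optimized, tabulated coefficients referred to above. As a stand-in it assumes the trapezoidal rule after the substitution $t = e^s$, which gives $\alpha_j = e^{jh}$ and $\omega_j = h e^{jh}$ and needs noticeably more terms ($M = 40$ here) for comparable accuracy. The unscaled matrix $\operatorname{tridiag}(-1, 2, -1)$ and all names are illustrative.

```python
import numpy as np

def sinc_coefficients(M, h):
    """alpha_j = exp(j h), omega_j = h exp(j h): trapezoidal rule for t = exp(s)."""
    j = np.arange(-M, M + 1)
    alpha = np.exp(j * h)
    return alpha, h * alpha

def expm_sym(A, alpha):
    """exp(-alpha * A) for symmetric A via its eigendecomposition."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.exp(-alpha * lam)) @ Q.T

n = 8
A1 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1D factor, positive definite
A_full = np.kron(A1, np.eye(n)) + np.kron(np.eye(n), A1)   # Kronecker sum for d = 2

alpha, omega = sinc_coefficients(M=40, h=0.5)
# Each term is a Kronecker product of small matrix exponentials.
B_inv = sum(w * np.kron(expm_sym(A1, a), expm_sym(A1, a)) for a, w in zip(alpha, omega))
rel_err = np.linalg.norm(B_inv @ A_full - np.eye(n * n), 2)
```

Every summand is a Kronecker product, so applying `B_inv` to a low-rank tensor multiplies its rank by at most the number of terms. This is why a small $M$ with optimized coefficients matters in practice.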
[Plots for q = 1: residual (log scale) and maximal rank versus iterations (0 to 40), for eps = 1e-2, 1e-4, 1e-8.]
[Plots for q = 1000: residual (log scale) and maximal rank versus iterations (0 to 40), for eps = 1e-2, 1e-4.]
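ALS
Originally from computational quantum physics [Schollwöck 2011], recently investigated by [Huckle et al. 2010; Oseledets, Khoromskij 2010; Holtz et al. 2010; Dolgov, Oseledets 2011].
Goal:
$$\min\Big\{ \frac{\langle \mathcal{X}, \mathcal{A}(\mathcal{X}) \rangle}{\langle \mathcal{X}, \mathcal{X} \rangle} : \mathcal{X} \in \text{H-Tucker}\big((r_t)_{t \in T}\big),\ \mathcal{X} \neq 0 \Big\}$$
Method: Choose one node $t$, fix all other nodes, set new tensor at node $t$ to minimize the Rayleigh quotient $\langle \mathcal{X}, \mathcal{A}(\mathcal{X}) \rangle / \langle \mathcal{X}, \mathcal{X} \rangle$. This is done for all nodes (a sweep), and sweeps are continued until convergence.
Sketch:
$$X^{(t)} = U_t V_t^T = (U_{t_r} \otimes U_{t_l}) B_t V_t^T,$$
$$\operatorname{vec}(\mathcal{X}) = (V_t \otimes U_{t_r} \otimes U_{t_l}) \operatorname{vec}(B_t) = \mathcal{U}_t \operatorname{vec}(B_t).$$
$$\Rightarrow\ \min\Big\{ \frac{y^T (\mathcal{U}_t^T \mathcal{A}\, \mathcal{U}_t) y}{y^T (\mathcal{U}_t^T \mathcal{U}_t) y} : y \in \mathbb{R}^{r_{t_l} r_{t_r} r_t},\ y \neq 0 \Big\}.$$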
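The alternating idea is easiest to see in the smallest possible setting. The sketch below is a strong simplification, not an H-Tucker implementation: it assumes $d = 2$, rank one, $x = v \otimes u$, and an operator of the form $I \otimes A_1 + A_2 \otimes I + D_2 \otimes D_1$ (Laplacian plus a rank-one potential). With one factor fixed and normalized, the reduced problem for the other factor is a small standard eigenvalue problem. All names and parameter values are illustrative.

```python
import numpy as np

def als_rank_one(A1, A2, D1, D2, sweeps=20, seed=0):
    """ALS for the Rayleigh quotient of I x A1 + A2 x I + D2 x D1 over x = kron(v, u)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A2.shape[0])
    v /= np.linalg.norm(v)
    history = []
    for _ in range(sweeps):
        # Fix v (unit norm): reduced matrix for u, reduced mass matrix is I.
        red_u = A1 + (v @ A2 @ v) * np.eye(A1.shape[0]) + (v @ D2 @ v) * D1
        u = np.linalg.eigh(red_u)[1][:, 0]
        # Fix u (unit norm): reduced matrix for v.
        red_v = (u @ A1 @ u) * np.eye(A2.shape[0]) + A2 + (u @ D1 @ u) * D2
        theta, Z = np.linalg.eigh(red_v)
        v = Z[:, 0]
        history.append(theta[0])     # Rayleigh quotient after the sweep
    return u, v, history

n = 6
xi = np.linspace(0.0, np.pi, n + 2)[1:-1]      # interior grid points of [0, pi]
h = xi[1] - xi[0]
A1 = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
D1 = np.diag(np.sin(xi))
q = 10.0
u, v, history = als_rank_one(A1, A1, D1, q * D1)
```

Each half-step minimizes over one factor while the current iterate stays feasible, so the Rayleigh quotient cannot increase from sweep to sweep. It is bounded below by the smallest eigenvalue of the full operator, which a rank-one ansatz generally does not attain.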
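Computation of reduced matrices
Consider $\mathcal{A} = A_d \otimes \cdots \otimes A_1$. (Other operators can be treated similarly.)
Compute
$$\tilde{A}_t := \mathcal{U}_t^T \mathcal{A}\, \mathcal{U}_t = (V_t \otimes U_{t_r} \otimes U_{t_l})^T \mathcal{A}\, (V_t \otimes U_{t_r} \otimes U_{t_l}) = \hat{A}_t \otimes \tilde{A}_{t_r} \otimes \tilde{A}_{t_l},$$
where
$$\tilde{A}_{t_l} = U_{t_l}^T \Big( \bigotimes_{i \in t_l} A_i \Big) U_{t_l}, \quad \tilde{A}_{t_r} = U_{t_r}^T \Big( \bigotimes_{i \in t_r} A_i \Big) U_{t_r}, \quad \hat{A}_t = V_t^T \Big( \bigotimes_{i \notin t} A_i \Big) V_t.$$
Additionally
$$\tilde{M}_t := \mathcal{U}_t^T \mathcal{U}_t = V_t^T V_t \otimes U_{t_r}^T U_{t_r} \otimes U_{t_l}^T U_{t_l} = M_t \otimes M_{t_r} \otimes M_{t_l}.$$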
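The factorization of the reduced matrices follows from the mixed-product rule for Kronecker products and can be confirmed on small random data. In the sketch below each of the three groups (left child, right child, complement of $t$) consists of a single mode, which is an assumption made to keep the example short. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def random_symmetric(n):
    S = rng.standard_normal((n, n))
    return S + S.T

n1, n2, n3 = 4, 5, 6
A1, A2, A3 = random_symmetric(n1), random_symmetric(n2), random_symmetric(n3)
U_left = rng.standard_normal((n1, 2))     # plays the role of U_{t_l}
U_right = rng.standard_normal((n2, 3))    # plays the role of U_{t_r}
V = rng.standard_normal((n3, 2))          # plays the role of V_t

# Reference: form the large basis and operator explicitly.
basis = np.kron(V, np.kron(U_right, U_left))
A_full = np.kron(A3, np.kron(A2, A1))
reduced_direct = basis.T @ A_full @ basis
mass_direct = basis.T @ basis

# Factored form: only small projected matrices are combined.
reduced_factored = np.kron(V.T @ A3 @ V,
                           np.kron(U_right.T @ A2 @ U_right, U_left.T @ A1 @ U_left))
mass_factored = np.kron(V.T @ V, np.kron(U_right.T @ U_right, U_left.T @ U_left))
```

In the hierarchical setting the small factors are themselves obtained recursively from the children, in the same way as the inner products earlier.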
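[Diagram: dimension tree with the operator factors $A_1, \ldots, A_8$ at the leaves and the reduced matrices $\tilde{A}_{12}$, $\tilde{A}_{34}$, $\hat{A}_{1234}$ at interior nodes.]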
MALS
Method:
- Select edge of tensor network.
- Combine tensors at the adjacent nodes to form a higher-order tensor.
- Set this tensor to minimize the Rayleigh quotient.
- Use low-rank approximation to split new combined tensor into two tensors at adjacent nodes of selected edge.
MALS - Illustration
Numerical Experiments – Sine potential
PDE-eigenvalue problem with $\Omega = [0, \pi]^d$ and sine potential
$$V(\xi) = q \cdot \prod_{i=1}^{d} \sin(\xi_i)$$
for some constant $q > 0$. Choose $d = 10$, $n = 128$, $q = 1000$.
Preconditioner: [Grasedyck 2004]
$$\mathcal{A}_L^{-1} = \int_0^\infty \exp(-t \mathcal{A}_L)\, dt \approx \sum_{j=-M}^{M} \omega_j \exp(-\alpha_j A_L^{(d)}) \otimes \cdots \otimes \exp(-\alpha_j A_L^{(1)}) =: \mathcal{B}^{-1},$$
for a certain, optimized choice of coefficients $\alpha_j, \omega_j > 0$. We choose $M = 10$.
[Plot, ALS: err_lambda, res, and nr_iter versus execution time [s] (0 to 500). Hierarchical ranks 40.]
[Plot, MALS: err_lambda, res, eps, rank, and nr_iter versus execution time [s] (0 to 500). Maximal hierarchical rank 30.]
Conclusions and Outlook
- Scientific computing with low-rank tensors is a rapidly evolving and highly technical field.
- Precise scope of applications far from clear; many applications remain to be explored. More analysis and comparison to alternative techniques (sparse grids, adaptive tensor discretization, Monte Carlo, ...) needed.
Some current trends:
- Tensorization of vectors + low rank (discrete Chebfun?) by Hackbusch, Khoromskij, Oseledets, Tyrtyshnikov, ...
- Computational differential geometry on low-rank tensor manifolds by Koch, Lubich, Schneider, Uschmajew, Vandereycken, ...
- Robust low rank (Candes et al.) for tensors: suitable way of dealing with singularities?
- ...
Acknowledgments: Presentation heavily benefited from joint work with Christine Tobler (ETH Zurich).