Quantum and classical information processing with tensors
Caltech: ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
Abstract
We approach important concepts in quantum and classical information processing from a tensor perspective. We establish a rigorous mathematical framework for tensor calculus and introduce a versatile graphical formalism: wiring diagrams. Subsequently, we apply these ideas to a variety of timely topics.
These lecture notes accompany ACM 270-1 at Caltech (spring 2019), a special topics course aimed at mathematically inclined students from physics, math, computer science and electrical engineering. No quantum background is required, but familiarity with linear algebra is essential.
Contents
1 Lecture notes
1.1 Classical probability theory and quantum mechanics
1.2 Tensor products
1.3 Wiring calculus and entanglement
1.4 Symmetric and antisymmetric tensors
1.5 Haar integration
1.6 Entanglement is ubiquitous
1.7 Classical reversible circuits
1.8 Quantum circuits and quantum computing
1.9 Matrix rank
1.10 Tensor rank
1.11 Strassen's algorithm for matrix multiplication
1.12 Tensorial aspects of matrix multiplication
1.13 The CP decomposition for tensors
1.14 The Tucker decomposition for tensors
1.15 Tensor train decompositions I
1.16 Tensor train decomposition II
1.17 Tensor train algorithms (DMRG lite)

2 Exercises
2.1 Homework I
2.2 Homework II
2.3 Homework III
Lecture 01: Classical probability theory and quantum mechanics
Scribe: Florian Schäfer
ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
April 1, 2019
1 Agenda
1. Linear and semidefinite programming
2. Classical (discrete) probability theory
3. Postulates of quantum mechanics
4. Distinguishing classical probability distributions and the maximum likelihood rule
5. Distinguishing quantum distributions and the Holevo-Helstrom Theorem
2 Linear and semidefinite programming

In the first lecture we will show that the difference between classical probability theory and quantum mechanics is a direct analogue of the difference between linear and semidefinite programming. We begin by introducing the latter two concepts.

2.1 Linear programming
We endow the space $\mathbb{R}^d$ with the standard inner product
$$\langle x, y \rangle = \sum_{i=1}^d x_i y_i$$
and define the non-negative orthant
$$\mathbb{R}^d_+ = \{ x \in \mathbb{R}^d : x_i \geq 0,\ 1 \leq i \leq d \}.$$
This induces a partial order on $\mathbb{R}^d$ given by
$$x \geq y \iff x - y \in \mathbb{R}^d_+ \iff x_i \geq y_i,\ 1 \leq i \leq d.$$
A linear program (LP) is an optimization problem of the following form:
$$\begin{array}{ll} \underset{x \in \mathbb{R}^d}{\text{maximize}} & \langle c, x \rangle \\ \text{subject to} & \langle a_i, x \rangle = b_i, \quad 1 \leq i \leq m, \\ & x \geq 0. \end{array}$$
The vectors $c, a_1, \ldots, a_m \in \mathbb{R}^d$ and numbers $b_1, \ldots, b_m \in \mathbb{R}$ completely specify the problem. Problems of this form can be solved efficiently. Linear programming is a powerful technique from both an analytical and computational point of view.
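To make this concrete, here is a minimal numerical sketch that solves a small LP of the above form with SciPy; the problem data $c$, $a_1$, $b_1$ are invented for illustration, and `linprog` minimizes by convention, so we negate the objective.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data: maximize <c, x> subject to <a1, x> = b1 and x >= 0.
c = np.array([1.0, 2.0, 3.0])
A_eq = np.array([[1.0, 1.0, 1.0]])   # the single constraint vector a1
b_eq = np.array([1.0])               # b1

# linprog minimizes, so pass -c; the default bounds already enforce x >= 0.
res = linprog(c=-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
print(res.x, -res.fun)               # optimal point and optimal value <c, x>
```

Here the optimum concentrates all mass on the largest entry of $c$, in line with the geometric picture of optimizing a linear functional over the simplex.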
2.2 Semidefinite programming
We denote the space of $d \times d$ hermitian matrices by $\mathbb{H}^d = \{ X \in \mathbb{C}^{d \times d} : X^* = X \}$ and endow it with the Frobenius (or Hilbert-Schmidt) inner product
$$(X, Y) = \mathrm{tr}(XY).$$
Remark 2.1. We note that while members of $\mathbb{H}^d$ can have complex entries, $\mathbb{H}^d$ is not closed under multiplication with complex numbers and thus forms a $d^2$-dimensional vector space over the real numbers.
A matrix $X \in \mathbb{H}^d$ is positive semidefinite (psd) if $\langle x, Xx \rangle \geq 0$ for all $x \in \mathbb{C}^d$. The set of psd matrices $\mathbb{H}^d_+ \subset \mathbb{H}^d$ forms a convex cone ($\mathbb{H}^d_+$ is closed under convex mixtures and multiplication with non-negative scalars). This cone induces the following partial ordering on $\mathbb{H}^d$:
$$X \succeq Y \iff X - Y \in \mathbb{H}^d_+.$$
We succinctly write $X \succeq 0$ to indicate that $X \in \mathbb{H}^d$ is psd. A semidefinite program (SDP) is an optimization problem of the following form:
$$\begin{array}{ll} \underset{X \in \mathbb{H}^d}{\text{maximize}} & (A, X) \\ \text{subject to} & (B_i, X) = c_i, \quad 1 \leq i \leq m, \\ & X \succeq 0. \end{array}$$
This optimization is completely specified by the matrices $A, B_1, \ldots, B_m \in \mathbb{H}^d$ and $m$ numbers $c_1, \ldots, c_m \in \mathbb{R}$.
Like LPs, SDPs are very useful both in theory and practice. We note that LPs and SDPs arise in completely analogous ways from the triples $(\mathbb{R}^d, \langle \cdot, \cdot \rangle, \geq)$ and $(\mathbb{H}^d, (\cdot, \cdot), \succeq)$. We will now show that the difference between classical probability theory and quantum mechanics can equally be understood as replacing the former triple with the latter.
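As an illustration, the following sketch solves a toy instance of the SDP above, assuming the cvxpy package (not part of these notes) is available; the objective matrix $A$ and the single constraint are invented for the example.

```python
import numpy as np
import cvxpy as cp

d = 2
A = np.array([[1.0, 0.0], [0.0, -1.0]])      # illustrative objective matrix

X = cp.Variable((d, d), hermitian=True)      # optimization variable X in H^d
constraints = [X >> 0, cp.trace(X) == 1]     # X psd, one affine constraint
prob = cp.Problem(cp.Maximize(cp.real(cp.trace(A @ X))), constraints)
prob.solve()
print(prob.value)                            # optimal value 1, at X = diag(1, 0)
```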
3 Classical, discrete probability theory

Probability theory is modeled by probability triples consisting of a sample space (which contains all potential outcomes), a set of events (to which we might want to assign probabilities), and a probability rule (assigning a probability to each and every event). In the setting of discrete probability theory, the set of all possible outcomes is finite ($|\Omega| = d$). In this case, we can simply choose the power set of $\Omega$ as the set of events and, correspondingly, the probability triple is fully characterized by a probability density vector that assigns a probability to each outcome in $\Omega$. Let $\mathbb{1} = (1, \ldots, 1)^T$ denote the all-ones vector in $\mathbb{R}^d$.
Definition 3.1 (probability density). A probability density vector is a vector
$$p = \begin{pmatrix} p_1 \\ \vdots \\ p_d \end{pmatrix} \in \mathbb{R}^d : \quad p \geq 0, \quad \langle \mathbb{1}, p \rangle = \sum_{i=1}^d p_i = 1.$$
Probability theory is concerned with characterizing the likelihood of events or, equivalently, the distribution of measurement outcomes.
Definition 3.2 (measurement). Measurements are resolutions of the identity (vector):
$$\{ h_a : a \in A \} \subset \mathbb{R}^d : \quad h_a \geq 0,\ a \in A, \quad \text{and} \quad \sum_{a \in A} h_a = \mathbb{1}.$$
Here, $A$ is a (finite) set of potential measurement outcomes.
We still need a final ingredient to describe how probability densities (as vectors in $\mathbb{R}^d$) and measurements $\{h_a : a \in A\}$ relate to the probability of different measurement outcomes.
Definition 3.3 (probability rule). For a probability density $p \in \mathbb{R}^d$ and a measurement $\{h_a : a \in A\} \subset \mathbb{R}^d$, define the probability rule
$$\Pr[a|p] = \langle h_a, p \rangle \quad \text{for all } a \in A.$$
This assigns a probability to each possible outcome $a$ of the measurement.
Example 3.4 (Fair dice roll). The probability density of a fair dice roll is a flat distribution over 6 potential events: $p = \tfrac{1}{6}\mathbb{1} \in \mathbb{R}^6$. Suppose that we wish to test whether a single dice roll results in either $\{1,2\}$, $\{3,4\}$, or $\{5,6\}$. This measurement may be associated with the following resolution of identity:
$$h_{\{1,2\}} = (1,1,0,0,0,0)^T, \quad h_{\{3,4\}} = (0,0,1,1,0,0)^T, \quad h_{\{5,6\}} = (0,0,0,0,1,1)^T.$$
The probability rule then readily implies
$$\Pr[\{1,2\}|p] = \Pr[\{3,4\}|p] = \Pr[\{5,6\}|p] = \tfrac{1}{3}.$$
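A quick numerical check of this probability rule (a minimal numpy sketch; the vectors are exactly the ones above):

```python
import numpy as np

p = np.ones(6) / 6                               # fair dice density
h = {"12": np.array([1, 1, 0, 0, 0, 0]),
     "34": np.array([0, 0, 1, 1, 0, 0]),
     "56": np.array([0, 0, 0, 0, 1, 1])}

assert np.allclose(sum(h.values()), np.ones(6))  # resolution of the identity
for label, ha in h.items():
    print(label, ha @ p)                         # each outcome has probability 1/3
```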
We introduce the probability simplex in $\mathbb{R}^d$,
$$\Delta_{d-1} := \{ x \in \mathbb{R}^d : x \geq 0,\ \langle \mathbb{1}, x \rangle = 1 \},$$
and observe that it equals the convex hull of the standard basis vectors $e_1 = (1, 0, \ldots, 0)^T, \ldots, e_d = (0, \ldots, 0, 1)^T$:
$$\Delta_{d-1} = \mathrm{conv}\{ e_1, \ldots, e_d \}.$$
Definition 3.5. A probability distribution $p \in \Delta_{d-1}$ is called pure if it is an extreme point of $\Delta_{d-1}$. This is the case if and only if the probability distribution is deterministic.
The essential concepts of classical probability theory are summarized in Table 1.
Concept | Explanation | Mathematical formulation
probability density | normalized, non-negative vector $p \in \mathbb{R}^d$ | $p \geq 0$, $\langle \mathbb{1}, p \rangle = 1$
measurement | resolution of the identity $\{h_a : a \in A\}$ | $h_a \geq 0$, $\sum_{a \in A} h_a = \mathbb{1}$
probability rule | standard inner product | $\Pr[a|p] = \langle h_a, p \rangle$

Table 1: Axioms for classical probability theory. The structure of discrete probability theory is captured by the following geometric configuration: $\mathbb{R}^d$ endowed with the partial order $\geq$ and the identity element $\mathbb{1} = (1, \ldots, 1)^T$. This closely resembles linear programming.
4 Quantum Mechanics

The postulates of quantum mechanics naturally arise from an extension of classical probability theory: replace the triple $(\mathbb{R}^d, \geq, \mathbb{1})$ by the triple $(\mathbb{H}^d, \succeq, \mathbb{I})$.
The analogous object to a probability density vector is a probability density matrix.
Definition 4.1 (density matrix). The state of a $d$-dimensional quantum mechanical system is fully described by a density matrix
$$\rho \in \mathbb{H}^d : \quad \rho \succeq 0, \quad (\mathbb{I}, \rho) = \mathrm{tr}(\rho) = 1.$$
In analogy to measurements in classical probability theory, we define a quantum measurement as follows.
Definition 4.2 (measurement). A measurement is a resolution of the identity (matrix):
$$\{ H_a : a \in A \} : \quad H_a \succeq 0,\ a \in A, \quad \sum_{a \in A} H_a = \mathbb{I}.$$
If a measurement $\{H_a : a \in A\}$ is performed on a quantum mechanical system with density matrix $\rho$, the following two things happen.

1. (Born's rule) We obtain a random measurement outcome that is distributed according to
$$\Pr[a|\rho] = (H_a, \rho).$$

2. The quantum system ceases to exist.
The fundamental axioms of quantum mechanics are a straightforward generalization of classical probability theory, see Table 2. The transition from classical to quantum probability theory resembles a transition from linear to semidefinite programming.
Example 4.3 (Stern-Gerlach experiment). Fix $d = 2$ (single "spin") and consider the density matrix
$$\rho = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$
Concept | Explanation | Mathematical formulation
probability density | normalized, psd matrix $\rho \in \mathbb{H}^d$ | $\rho \succeq 0$, $(\mathbb{I}, \rho) = 1$
measurement | resolution of the identity $\{H_a : a \in A\}$ | $H_a \succeq 0$, $\sum_{a \in A} H_a = \mathbb{I}$
probability rule | standard inner product | $\Pr[a|\rho] = (H_a, \rho)$

Table 2: Axioms for quantum mechanics. The structure of quantum mechanics is captured by the following geometric configuration: $\mathbb{H}^d$ endowed with the psd order $\succeq$ and the identity matrix $\mathbb{I}$. This closely resembles semidefinite programming.
and two distinct potential measurements:
$$\{ H^{(z)}_\pm \} = \left\{ \tfrac{1}{2}\mathbb{I} \pm \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \right\} = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\},$$
$$\{ H^{(x)}_\pm \} = \left\{ \tfrac{1}{2}\mathbb{I} \pm \tfrac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \right\} = \left\{ \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \tfrac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \right\}.$$
The resulting probabilities are then given by
$$\Pr[+, (z)|\rho] = \left( \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \right) = 1, \qquad \Pr[-, (z)|\rho] = 0,$$
$$\Pr[+, (x)|\rho] = \left( \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \right) = \tfrac{1}{2}, \qquad \Pr[-, (x)|\rho] = \tfrac{1}{2}.$$
This may seem surprising. The state $\rho$ provides completely deterministic measurement outcomes for $\{H^{(z)}_\pm\}$. Yet, the outcomes for $\{H^{(x)}_\pm\}$ are completely random. This interesting feature of quantum mechanics is the basis of the famous Stern-Gerlach experiment (1922).
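The four probabilities can be reproduced with a few lines of numpy (a minimal sketch; the Frobenius inner product is just a trace):

```python
import numpy as np

rho = np.array([[1.0, 0.0], [0.0, 0.0]])
Hz = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]                  # {H^(z)_pm}
Hx = [0.5 * np.array([[1.0, 1.0], [1.0, 1.0]]),
      0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])]                # {H^(x)_pm}

born = lambda H: np.trace(H @ rho)                               # Born's rule (H, rho)
print([born(H) for H in Hz])   # [1.0, 0.0] -- deterministic outcomes
print([born(H) for H in Hx])   # [0.5, 0.5] -- completely random outcomes
```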
The set of all possible quantum states forms a convex subset of $\mathbb{H}^d$:
$$S(\mathbb{H}^d) = \{ \rho \in \mathbb{H}^d : \rho \succeq 0,\ (\mathbb{I}, \rho) = \mathrm{tr}(\rho) = 1 \}.$$
This is the quantum analogue of the standard simplex.
Definition 4.4. A density matrix $\rho \in S(\mathbb{H}^d)$ is called pure if it has rank one, i.e. $\rho = xx^*$ with $x \in \mathbb{C}^d$ normalized to unit Euclidean length.
Pure quantum states correspond to extreme points of the convex set $S(\mathbb{H}^d)$ and one can show
$$S(\mathbb{H}^d) = \mathrm{conv}\{ xx^* : x \in \mathbb{C}^d,\ \langle x, x \rangle = 1 \}.$$
This is the quantum version of the decomposition of the standard simplex into the convex hull of its extreme points: $\Delta_{d-1} = \mathrm{conv}\{e_1, \ldots, e_d\}$. Classical density vectors are extreme if and only if they are one-sparse, i.e. only one component is different from zero. Quantum density matrices are extreme if and only if they have rank one. This is the natural matrix generalization of sparsity: a rank-one matrix is one-sparse in its eigenbasis.
In contrast to pure density vectors (classical), pure density matrices (quantum) are not necessarily deterministic. We have encountered this feature in Example 4.3.
5 Applications: Maximum likelihood rule and Holevo-Helstrom theorem

In the last two sections we have illustrated the common structure of classical probability theory and quantum mechanics. Extending these parallels, we will now show the optimality of the maximum likelihood rule and the Holevo-Helstrom theorem. Both address the task of distinguishing two probability densities in the single-shot limit.

5.1 Distinguishing classical probability distributions and the maximum likelihood rule
Suppose that we know exact descriptions of two probability distributions $p, q \in \mathbb{R}^d$ and choose to play the following game: a referee chooses one of these distributions uniformly at random and hands it to us. We are allowed to perform a single measurement and, based on its outcome, we must guess which probability distribution was handed to us. We win the game if the guess is correct; otherwise we lose.
Let us now try to come up with an optimal guessing strategy. Since we are faced with a binary question, our decision should take the form of a binary measurement:
$$\{ h_p, h_q \} : \quad h_q = \mathbb{1} - h_p \quad \text{and} \quad \mathbb{1} \geq h_p \geq 0.$$
A brief computation yields the probability of guessing the distribution correctly, based on this binary measurement:
$$\begin{aligned} p_{\mathrm{succ}} &= \tfrac{1}{2}\Pr[p|p] + \tfrac{1}{2}\Pr[q|q] = \tfrac{1}{2}\left( \langle h_p, p \rangle + \langle h_q, q \rangle \right) \\ &= \tfrac{1}{2}\left( \langle h_p, p \rangle + \langle \mathbb{1}, q \rangle - \langle h_p, q \rangle \right) \\ &= \tfrac{1}{2} + \tfrac{1}{2} \langle h_p, p - q \rangle. \end{aligned}$$
We may rewrite the inner product in the last line as $\sum_{i=1}^d [h_p]_i ([p]_i - [q]_i)$. The factor $1/2$ in front of the expression should not be surprising: we can always achieve a success probability of $1/2$ by mere guessing. Optimizing over measurements $\{h_p, h_q\}$ allows us to further improve upon this basic strategy. This optimization problem assumes the form of a linear program:
$$\begin{array}{ll} \underset{h_p \in \mathbb{R}^d}{\text{maximize}} & \tfrac{1}{2} + \tfrac{1}{2}\langle p - q, h_p \rangle \\ \text{subject to} & \mathbb{1} \geq h_p \geq 0. \end{array}$$
This linear program is simple enough to solve analytically. The optimal measurement is
$$[h^\sharp_p]_i = \begin{cases} 1, & \text{if } p_i > q_i, \\ 0, & \text{else,} \end{cases} \quad \text{for } 1 \leq i \leq d.$$
The associated guessing strategy is called the maximum likelihood rule: upon observing measurement outcome $i$, we guess $p$ if $[p]_i \geq [q]_i$ and otherwise $q$. In words: we choose the distribution that is most likely to produce the outcome that we observed.
The associated optimal success probability is
$$p^\sharp_{\mathrm{succ}} = \tfrac{1}{2} + \tfrac{1}{2}\langle h^\sharp_p, p - q \rangle = \tfrac{1}{2} + \tfrac{1}{4}\sum_{i=1}^d |p_i - q_i| = \tfrac{1}{2} + \tfrac{1}{4}\| p - q \|_{\ell_1},$$
and the bias, the amount by which we improve over the naive guessing strategy, is proportional to the total variational distance $\tfrac{1}{2}\|p - q\|_{\ell_1}$ of the distributions.
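The optimal strategy and its success probability are easy to verify numerically (a minimal sketch; the two distributions are made up for illustration):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

h = (p > q).astype(float)                        # maximum likelihood measurement
p_succ = 0.5 + 0.5 * h @ (p - q)                 # success probability of this rule
print(p_succ, 0.5 + 0.25 * np.abs(p - q).sum())  # both equal 1/2 + ||p - q||_1 / 4
```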
5.2 Distinguishing quantum states and the Holevo-Helstrom Theorem

Let us now consider the analogous problem in the quantum setting. A referee hands us a black box that contains one of two quantum states: $\rho$ or $\sigma$. Assume that we know the density matrices associated with both states and that the referee chooses each of them with equal probability.

Similarly to before, we are allowed to perform a single quantum measurement to guess which state we obtained. Note that this single-shot limit is very appropriate here: a quantum measurement necessarily destroys the quantum state.
Again, we can base our guessing rule on a two-outcome measurement (the question is binary):
$$\{ H_\rho, H_\sigma \} : \quad H_\sigma = \mathbb{I} - H_\rho.$$
If we observe $\rho$, we guess $\rho$; otherwise we guess $\sigma$. In analogy to the last section, we compute the success probability associated with such a guessing strategy:
$$\begin{aligned} p_{\mathrm{succ}} &= \tfrac{1}{2}\Pr[H_\rho|\rho] + \tfrac{1}{2}\Pr[H_\sigma|\sigma] = \tfrac{1}{2}(H_\rho, \rho) + \tfrac{1}{2}(H_\sigma, \sigma) \\ &= \tfrac{1}{2}\left( (H_\rho, \rho) + (\mathbb{I}, \sigma) - (H_\rho, \sigma) \right) \\ &= \tfrac{1}{2} + \tfrac{1}{2}(H_\rho, \rho - \sigma). \end{aligned}$$
Next, we optimize this expression over all possible choices of measurements:
$$\begin{array}{ll} \underset{H_\rho \in \mathbb{H}^d}{\text{maximize}} & \tfrac{1}{2} + \tfrac{1}{2}(H_\rho, \rho - \sigma) \\ \text{subject to} & \mathbb{I} \succeq H_\rho \succeq 0. \end{array}$$
This is a semidefinite program that is simple enough to solve analytically. Apply an eigenvalue decomposition to $D = \rho - \sigma = \sum_{i=1}^d \lambda_i x_i x_i^*$. Set $P_+ = \sum_{i=1}^d \mathbb{I}\{\lambda_i > 0\}\, x_i x_i^*$ and $P_- = \sum_{i=1}^d \mathbb{I}\{\lambda_i < 0\}\, x_i x_i^*$. These are orthogonal projectors onto the positive and negative ranges of $D = \rho - \sigma$. They are the natural generalizations of the maximum likelihood rule to the quantum setting. In particular, the choice $H^\sharp_\rho = P_+$ is optimal and results in the following optimal success probability:
$$p^\sharp_{\mathrm{succ}} = \tfrac{1}{2} + \tfrac{1}{4}\| \rho - \sigma \|_*.$$
Here, $\| \cdot \|_*$ denotes the nuclear (or trace) norm. It is the natural quantum generalization of the total variational distance.
Theorem 5.1 (Holevo-Helstrom). The optimal success probability for distinguishing two quantum states $\rho, \sigma \in \mathbb{H}^d$ with a single measurement is
$$p^\sharp_{\mathrm{succ}} = \tfrac{1}{2} + \tfrac{1}{4}\| \rho - \sigma \|_1.$$
The optimal measurement is the projector onto the positive range of $\rho - \sigma$ and depends on the states in question.
This observation dates back to Holevo¹ (1973) and Helstrom (1976) and plays a prominent role in modern quantum information theory. For instance, when estimating density matrices from experimental observations, error bars are typically reported in the nuclear norm.

¹Alexander Holevo received the Claude E. Shannon Award in 2016 for his outstanding contributions to quantum information theory.
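The optimal measurement and success probability are straightforward to compute numerically (a minimal sketch; the two states are invented for illustration):

```python
import numpy as np

rho = np.diag([1.0, 0.0])                          # a pure state
sigma = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])   # another pure state

D = rho - sigma
vals, vecs = np.linalg.eigh(D)                     # eigendecomposition of rho - sigma
V = vecs[:, vals > 0]
P_plus = V @ V.conj().T                            # projector onto the positive range
p_succ = 0.5 + 0.5 * np.trace(P_plus @ D).real
print(p_succ, 0.5 + 0.25 * np.abs(vals).sum())     # both: 1/2 + ||rho - sigma||_1 / 4
```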
Lecture 02: Tensor products
Scribe: Chung-Yi Lin

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
April 3, 2019
1 Agenda
1. Natural axioms for vector multiplication
2. Bipartite tensor product spaces $H^{\otimes 2}$
3. Operators on $H^{\otimes 2}$
4. Multi-partite tensor product spaces $H^{\otimes k}$
5. Operators on $H^{\otimes k}$
2 Axiomatic approach to vector products

2.1 Natural axioms for vector multiplication

Tensor products are motivated by the following basic and natural question: What does it mean to multiply vectors?
In order to answer this question, we turn to scalar multiplication for guidance. Let $\mathbb{F}$ be a field, e.g. $\mathbb{R}$ or $\mathbb{C}$. Then, the scalar product is additive, homogeneous, has a zero element and is faithful, as well as symmetric. We refer to Definition 2.1 for a precise definition of these properties.
Now, let $H$ be a $d$-dimensional inner product space over $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$, equipped with an inner product¹
$$\langle x, y \rangle = \sum_{i=1}^d \bar{x}_i y_i \quad \text{for } x, y \in H.$$

¹In contrast to widespread mathematical convention, we define the inner product to be linear in the second argument. This convention will considerably simplify analysis throughout the course of these lectures.
Based on our intuition about scalar multiplication, we postulate the following "natural" set of properties.
Definition 2.1 (Axioms for vector multiplication). A well-defined product $x \circ y$ of vectors $x, y \in H$ should obey the following properties:

1. Additivity: for all $x, y, z \in H$:
$$(x + y) \circ z = x \circ z + y \circ z \quad \text{and} \quad x \circ (y + z) = x \circ y + x \circ z.$$

2. Homogeneity: for all $x, y \in H$ and $\alpha \in \mathbb{F}$:
$$\alpha(x \circ y) = (\alpha x) \circ y = x \circ (\alpha y).$$

3. Zero property of multiplication: Let $0 \in H$ denote the zero element. Then for all $x, y \in H$,
$$x \circ 0 = 0 \circ y = 0.$$

4. Faithfulness: If $x \circ y = 0$, then either $x = 0$ or $y = 0$.

These axioms naturally generalize to multiplication of more than two vectors.

Remark 2.2. Note that we have excluded symmetry from this list. A symmetric vector product also obeys $x \circ y = y \circ x$. We will discuss such a symmetric vector product in Lecture 4.
Having settled on these natural properties, let us analyze familiar notions of vector products.
1. The dot product: $H \times H \to \mathbb{F}$, where $(x, y) \mapsto \langle x, y \rangle$. The dot product obeys properties 1, 2 and 3, but is not faithful: two non-zero orthogonal vectors have a vanishing dot product.

2. The Schur/Hadamard product: $H \times H \to H$, where $(x, y) \mapsto [x_i y_i]_{i=1}^d$. This vector product obeys properties 1, 2 and 3, but is not faithful: the Schur product of two non-zero vectors with disjoint supports vanishes.

3. The outer product: $H \times H \to \mathcal{L}(H)$, where $(x, y) \mapsto xy^T$. This product fulfills all the axioms from Definition 2.1. However, it is not obvious how to generalize the outer product to more than two vectors.
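A short numpy sketch contrasting the three products and exhibiting the failures of faithfulness described above:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])   # orthogonal to x, with disjoint support

print(np.dot(x, y))        # 0.0 -- the dot product is not faithful
print(x * y)               # [0. 0.] -- the Hadamard product is not faithful
print(np.outer(x, y))      # nonzero 2x2 matrix -- the outer product is faithful
```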
2.2 Axiomatic approach to bipartite tensor products

Let "$\otimes$" denote a product operation that obeys the first three properties of Definition 2.1. For $x, y \in H$, we define the elementary tensor product $x \otimes y$. For now, we regard this as a formal product of $x$ and $y$. The tensor product space $H^{\otimes 2}$ contains all formal² linear combinations
$$\tau = \sum_{i=1}^n \alpha_i\, x_i \otimes y_i, \quad n \in \mathbb{N},\ x_i, y_i \in H,\ \alpha_i \in \mathbb{F}. \tag{1}$$
We emphasize that these formal representations of $\tau$ are not unique: the zero property implies that $x \otimes 0$ and $0 \otimes y$ both yield zero. Adding terms of this form in the summation (1) thus does not change $\tau$. Moreover, additivity and homogeneity ensure linearity:
$$\begin{aligned} (\alpha x + \beta y) \otimes z &= \alpha\, x \otimes z + \beta\, y \otimes z, \\ x \otimes (\alpha y + \beta z) &= \alpha\, x \otimes y + \beta\, x \otimes z. \end{aligned}$$
This in turn implies that vectors $x_i, y_i \in H$ can be further decomposed into different vectors (e.g. via a basis expansion) and inserting these decompositions into (1) seemingly leads to a different $\tau \in H^{\otimes 2}$.

We define the set of tensor products as the space of all tensors $\tau$ modulo these identity transformations:

²Formal means that, for now, we treat these expressions as a collection of symbols, give them a name and perform linear combinations.
Definition 2.3 (Tensor product space). Let $H$ be a finite dimensional vector space over $\mathbb{F}$ and let $x \otimes y$ be a vector product that satisfies Definition 2.1. Then
$$H^{\otimes 2} = \left\{ \sum_{i=1}^n \alpha_i\, x_i \otimes y_i : n \in \mathbb{N},\ \alpha_i \in \mathbb{F},\ x_i, y_i \in H \right\} \Big/ \text{identity}.$$
We emphasize that the definitions and concepts presented in this sub-section naturally generalize to products of more than two vectors.
3 Bipartite tensor product space $H^{\otimes 2}$

3.1 Bipartite tensor products and bilinear forms

In this section, we define a natural notion of a tensor product. It obeys all axioms from Definition 2.1 and is minimal in the sense that every true statement about tensor products can be reduced to these defining properties. No additional structure is present.
Definition 3.1 (Bilinear forms). A bilinear form is a function $B : H \times H \to \mathbb{F}$ with the following properties:

1. For any $x \in H$, $B(x, \cdot)$ is a linear functional on $H$;
2. For any $y \in H$, $B(\cdot, y)$ is a linear functional on $H$.

Let $\mathrm{Bil}(H, H)$ denote the (linear) space of all bilinear forms.
A concrete model for bilinear forms can be obtained in the following way: Let $A = [a_{i,j}]_{i,j=1}^d \in \mathbb{F}^{d \times d}$ be a $d \times d$ matrix. Then, we can associate $A$ with the following bilinear form:
$$B_A(x, y) = \sum_{i,j=1}^d x_i\, a_{i,j}\, y_j.$$
Note that this identification is bijective. Conversely, fix a bilinear form $B(\cdot, \cdot)$ and an orthonormal basis $\{e_i\}_{i=1}^d$ of $H$. Then, linearity implies
$$B(x, y) = B\left( \sum_{i=1}^d x_i e_i,\ \sum_{j=1}^d y_j e_j \right) = \sum_{i,j=1}^d x_i\, B(e_i, e_j)\, y_j$$
for any $x, y \in H$ with basis expansions $x = \sum_{i=1}^d x_i e_i$, $y = \sum_{j=1}^d y_j e_j$. The $d^2 = (\dim(H))^2$ numbers $B(e_i, e_j)$ are independent degrees of freedom. We can identify these degrees of freedom with entries of a matrix $A = [a_{i,j}]_{i,j=1}^d$ that tabulates the action of the bilinear form on different basis vectors: $a_{i,j} = B(e_i, e_j)$.
Definition 3.2 (Tensor product space). Let $H$ be a (finite dimensional) inner product space. The tensor product space $H^{\otimes 2}$ is the dual space of $\mathrm{Bil}(H, H)$. In particular, we identify elementary tensor products with the following functional:
$$x \otimes y : B \mapsto B(x, y) \in \mathbb{F}.$$
Fact 3.3. This definition of a tensor product obeys all properties listed in Definition 2.1.

The fact that vector spaces and their duals are both linear and have equal dimension allows us to infer the dimension of $H^{\otimes 2}$ via the correspondence between bilinear forms and matrices:
$$\dim(H^{\otimes 2}) = \dim(\mathrm{Bil}(H, H)^*) = \dim(\mathrm{Bil}(H, H)) = \dim\left( \mathbb{F}^{\dim(H) \times \dim(H)} \right) = \dim(H)^2.$$
Moreover, linear extension allows us to define an inner product on $H \otimes H$ that is induced by the inner product $\langle \cdot, \cdot \rangle$ on $H$. For elementary tensors $x_1 \otimes y_1$ and $x_2 \otimes y_2$, we define
$$\langle x_1 \otimes y_1, x_2 \otimes y_2 \rangle = \langle x_1, x_2 \rangle \langle y_1, y_2 \rangle \quad \text{for all } x_1, x_2, y_1, y_2 \in H. \tag{2}$$
We extend this definition linearly to the space of all linear combinations of elementary tensors (1), i.e. $H^{\otimes 2}$.

Fact 3.4 (Dimension of tensor products). The tensor product space $H^{\otimes 2}$ equipped with the induced inner product (2) forms an inner product space of dimension $\dim(H^{\otimes 2}) = \dim(H)^2$.

3.2 Concrete realization of $H^{\otimes 2}$ as the space of all outer products

So far, we have presented a construction of $H^{\otimes 2}$ as the dual space of bilinear forms $\mathrm{Bil}(H, H)$. This is rather abstract, but we can represent $H^{\otimes 2}$ as an outer product space. Set $d = \dim(H)$ and define the elementary tensors to be outer products of vectors:
$$x \otimes y := xy^T \in \mathcal{L}(H) \cong \mathbb{F}^{d \times d}.$$
The linear hull of these outer products corresponds to $\mathbb{F}^{d \times d}$, or equivalently, $\mathcal{L}(H)$. Note that $\dim(\mathcal{L}(H)) = d^2$, in accordance with Fact 3.4. On outer products $xy^T$, the induced inner product $(\cdot, \cdot)$ is
$$\left( x_1 y_1^T,\ x_2 y_2^T \right) := \langle x_1, x_2 \rangle \langle y_1, y_2 \rangle = \mathrm{tr}\left( (x_1 y_1^T)^* x_2 y_2^T \right),$$
where "tr" denotes the trace and $A^*$ is the adjoint of $A \in \mathcal{L}(H)$. By linear extension, this inner product becomes the Frobenius (or Hilbert-Schmidt) inner product $(A, B) = \mathrm{tr}(A^* B)$ on $\mathcal{L}(H)$.
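This concrete realization is easy to play with in numpy (a minimal sketch): elementary tensors are outer products, and the induced inner product (2) becomes the Frobenius inner product.

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)
x1, y1, x2, y2 = (rng.standard_normal(d) for _ in range(4))

T1, T2 = np.outer(x1, y1), np.outer(x2, y2)   # elementary tensors x ⊗ y = x y^T
lhs = np.trace(T1.T @ T2)                     # Frobenius inner product (T1, T2)
rhs = np.dot(x1, x2) * np.dot(y1, y2)         # <x1, x2><y1, y2>
print(np.isclose(lhs, rhs))                   # True -- the inner product factorizes
```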
4 Operators on $H^{\otimes 2}$

4.1 Definition and useful properties

For $A, B \in \mathcal{L}(H)$, we define $A \otimes B \in \mathcal{L}(H^{\otimes 2})$ via the following action on elementary tensors:
$$(A \otimes B)(x \otimes y) = (Ax) \otimes (By).$$
This action can be extended linearly to all elements of $H^{\otimes 2}$. Linear extensions of the form $\sum_{i=1}^n \alpha_i A_i \otimes B_i$ form the full set of linear operators on $H^{\otimes 2}$. The key property of this construction is:
$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD). \tag{3}$$
This property has powerful consequences.
Fact 4.1.

1. Let $\mathbb{I} \in \mathcal{L}(H)$ be the identity. Then, $\mathbb{I} \otimes \mathbb{I}$ is the identity in $\mathcal{L}(H^{\otimes 2})$.
2. Let $A, B \in \mathcal{L}(H)$ be invertible. Then, $(A \otimes B)^{-1} = (A^{-1}) \otimes (B^{-1})$.
3. $A \otimes B$ is invertible if and only if $A, B \in \mathcal{L}(H)$ are invertible.
4. Let $A^*$ denote the adjoint of $A \in \mathcal{L}(H)$. Then, $(A \otimes B)^* = (A^*) \otimes (B^*)$.
All these properties readily follow from the definition and the composition rule (3). For instance, the first claim is a consequence of
$$(\mathbb{I} \otimes \mathbb{I})(A \otimes B) = (\mathbb{I}A) \otimes (\mathbb{I}B) = A \otimes B = (A\mathbb{I}) \otimes (B\mathbb{I}) = (A \otimes B)(\mathbb{I} \otimes \mathbb{I})$$
for all $A, B \in \mathcal{L}(H)$. These facts together with the composition rule (3) imply the following persistence property.
Fact 4.2 (Persistence). Fix $A, B \in \mathcal{L}(H)$.

1. If $A$ and $B$ are positive semidefinite, then so is $A \otimes B$.
2. If $A$ and $B$ are self-adjoint, then so is $A \otimes B$.
3. If $A$ and $B$ are normal, then so is $A \otimes B$.
4. If $A$ and $B$ are unitary, then so is $A \otimes B$.
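The composition rule (3) and persistence can be checked directly with the Kronecker product, which, as noted below, is a concrete model for tensor products of operators. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))

# Composition rule (3): (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

# Persistence of positive semidefiniteness: (A A^T) ⊗ (B B^T) is psd
P = np.kron(A @ A.T, B @ B.T)
print(np.linalg.eigvalsh((P + P.T) / 2).min() >= -1e-12)  # all eigenvalues >= 0
```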
Remark 4.3. Converse persistence relations are usually false. The concept of quantum entanglement is closely related to the failure of a converse to property 1. We refer to Lecture 3 for details.
The Kronecker product is a concrete model for tensor products of operators.

4.2 Spectral Theory

To ease the notational burden, we will restrict attention to tensor product operators of the form $A \otimes A$. The generalization to asymmetric tensor products $A \otimes B$ is straightforward.

4.2.1 Spectral resolutions
Recall that an operator $A \in \mathcal{L}(H)$ is normal if $A^* A = A A^*$. The spectral theorem implies that every normal matrix has a spectral resolution:
$$A = \sum_i \lambda_i P_i. \tag{4}$$
Here, the $\lambda_i$ are (potentially complex-valued) eigenvalues and the $P_i$'s are (mutually) orthogonal projectors that form a particular resolution of the identity: $\sum_i P_i = \mathbb{I}$ and $P_i P_j = \delta_{i,j} P_i$.
Fact 4.4 (Spectral resolutions of tensor product operators). Let $A \in \mathcal{L}(H)$ be a normal matrix with spectral resolution (4). Then,
$$A \otimes A = \sum_{i,j} \lambda_i \lambda_j\, P_i \otimes P_j$$
is a spectral resolution of the tensor product.
4.2.2 Singular value decompositions of tensor product operators

Fact 4.5 (Singular value decompositions of tensor product operators). Let $A = U \Sigma V^*$ be an SVD of $A \in \mathcal{L}(H)$. Then,
$$A \otimes A = (U \otimes U)(\Sigma \otimes \Sigma)(V \otimes V)^*$$
is an SVD of $A \otimes A$. In particular, the singular values are $\sigma_i \sigma_j$ with $1 \leq i, j \leq d$.
4.2.3 Eigenvalue decompositions of tensor product operators

Recall that every operator admits a Schur decomposition
$$A = Q T Q^*.$$
Here, $Q$ is unitary and $T$ is upper triangular. The diagonal of $T$ contains all eigenvalues of $A$. Persistence implies that these properties readily extend to tensor products ($T \otimes T$ is again upper-triangular with respect to the designated basis used).

Fact 4.6 (Eigenvalues of tensor product operators). Suppose that $A \in \mathcal{L}(H)$ has eigenvalues $\lambda_1, \ldots, \lambda_d \in \mathbb{C}$. Then, $A \otimes A$ has eigenvalues $\lambda_i \lambda_j$ with $1 \leq i, j \leq d$.
5 Multi-partite tensor product spaces $H^{\otimes k}$

5.1 Axiomatic approach to tensor spaces of order $k \geq 3$

Formally, we introduce elementary $k$-fold tensors
$$x_1 \otimes \cdots \otimes x_k, \quad x_1, \ldots, x_k \in H,$$
and define the space of all $k$-fold tensors by linear extension:
$$H^{\otimes k} = \left\{ \sum_{i=1}^n \alpha_i\, x_{i_1} \otimes \cdots \otimes x_{i_k} : n \in \mathbb{N},\ \alpha_i \in \mathbb{F},\ x_{i_j} \in H \right\} \Big/ \text{identity}.$$
Note that this product formalism obeys all desirable axioms for vector multiplication. In particular, the zero property and faithfulness hold: $x_1 \otimes \cdots \otimes x_k = 0$ if and only if $x_j = 0$ for at least one coordinate $1 \leq j \leq k$.

5.2 Multi-partite tensor product spaces and multi-linear forms

More concretely, $H^{\otimes k}$ can be identified with the dual space of the space of all multi-linear forms.
Definition 5.1 (Multi-linear forms). A multi-linear form (of order $k$) is a function
$$M : H^{\times k} = \underbrace{H \times \cdots \times H}_{k \text{ times}} \to \mathbb{F}$$
that is linear in each argument. More precisely:

1. For any $x_2, \ldots, x_k \in H$, $M(\cdot, x_2, \ldots, x_k)$ is a linear functional on $H$;
...
k. For any $x_1, \ldots, x_{k-1} \in H$, $M(x_1, \ldots, x_{k-1}, \cdot)$ is a linear functional on $H$.

Let $\mathrm{Multi}(H^{\times k})$ denote the (linear) space of all multi-linear forms.
In complete analogy to the bipartite case, we can identify $H^{\otimes k}$ with the dual space of all multi-linear forms (of order $k$).

Definition 5.2 (Tensor product space). Let $H$ be a (finite dimensional) inner product space. The tensor product space $H^{\otimes k}$ is the dual space of $\mathrm{Multi}(H^{\times k})$. In particular, we identify elementary tensors with the following functional:
$$x_1 \otimes \cdots \otimes x_k : M \mapsto M(x_1, \ldots, x_k) \in \mathbb{F}.$$
The inner product $\langle \cdot, \cdot \rangle$ on $H$ induces an inner product on $H^{\otimes k}$. Define
$$\langle x_1 \otimes \cdots \otimes x_k,\ y_1 \otimes \cdots \otimes y_k \rangle = \prod_{j=1}^k \langle x_j, y_j \rangle \tag{5}$$
and extend this definition linearly.

Fact 5.3. Let $e_1, \ldots, e_d$ be an orthonormal basis of $H$ (with respect to $\langle \cdot, \cdot \rangle$). Then,
$$\{ e_{i_1} \otimes \cdots \otimes e_{i_k} : 1 \leq i_1, \ldots, i_k \leq d \}$$
is an orthonormal basis of $H^{\otimes k}$ (with respect to the extended inner product (5)).

Corollary 5.4. Set $d = \dim(H)$. Then, $\dim(H^{\otimes k}) = d^k$.
The dimension of tensor product spaces grows exponentially with the order. This is a veritable curse of dimensionality.
6 Operators on $H^{\otimes k}$

For $A_1, \ldots, A_k \in \mathcal{L}(H)$ we define tensor product operators $A_1 \otimes \cdots \otimes A_k \in \mathcal{L}(H^{\otimes k})$ via their action on elementary tensors
$$(A_1 \otimes \cdots \otimes A_k)(x_1 \otimes \cdots \otimes x_k) = (A_1 x_1) \otimes \cdots \otimes (A_k x_k)$$
and extend this definition by linearity. The results from Section 4 generalize naturally to this $k$-fold setting. This, in particular, includes the composition rule. For $A_1, \ldots, A_k, B_1, \ldots, B_k \in \mathcal{L}(H)$,
$$(A_1 \otimes \cdots \otimes A_k)(B_1 \otimes \cdots \otimes B_k) = (A_1 B_1) \otimes \cdots \otimes (A_k B_k).$$
This composition rule implies persistence. For instance, let $\mathbb{I} \in \mathcal{L}(H)$ be the identity. Then, $\mathbb{I} \otimes \cdots \otimes \mathbb{I}$ is the identity on $\mathcal{L}(H^{\otimes k})$.
Fact 6.1 (Persistence). Fix $A_1, \ldots, A_k \in \mathcal{L}(H)$.

1. If $A_1, \ldots, A_k$ are positive semidefinite, then so is $A_1 \otimes \cdots \otimes A_k$.
2. If $A_1, \ldots, A_k$ are self-adjoint, then so is $A_1 \otimes \cdots \otimes A_k$.
3. If $A_1, \ldots, A_k$ are normal, then so is $A_1 \otimes \cdots \otimes A_k$.
4. If $A_1, \ldots, A_k$ are unitary, then so is $A_1 \otimes \cdots \otimes A_k$.
The insights about spectral resolutions and decompositions also generalize in a straightforward way.
Fact 6.2. Fix $A \in \mathcal{L}(H)$ and write $A^{\otimes k} = A \otimes \cdots \otimes A$.

1. Suppose that $A$ is normal with spectral resolution $A = \sum_i \lambda_i P_i$. Then,
$$A^{\otimes k} = \sum_{i_1, \ldots, i_k} \lambda_{i_1} \cdots \lambda_{i_k}\, P_{i_1} \otimes \cdots \otimes P_{i_k}$$
is again a spectral resolution.

2. Let $A = U \Sigma V^*$ be a singular value decomposition with singular values $\sigma_1, \ldots, \sigma_d$. Then, $(U \otimes \cdots \otimes U)(\Sigma \otimes \cdots \otimes \Sigma)(V \otimes \cdots \otimes V)^*$ is a singular value decomposition of $A^{\otimes k}$. In particular, the singular values are $\sigma_{i_1} \cdots \sigma_{i_k}$ for $1 \leq i_1, \ldots, i_k \leq d$.

3. Suppose that $A$ has eigenvalues $\lambda_1, \ldots, \lambda_d \in \mathbb{C}$. Then, the eigenvalues of $A^{\otimes k}$ are $\lambda_{i_1} \cdots \lambda_{i_k}$ for $1 \leq i_1, \ldots, i_k \leq d$.
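A quick numerical sanity check of claim 3 for $k = 2$ (a sketch; we use a symmetric matrix so that all eigenvalues are real and easy to sort):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
A = (A + A.T) / 2                     # symmetric, so all eigenvalues are real

lam = np.linalg.eigvalsh(A)
lhs = np.sort(np.linalg.eigvalsh(np.kron(A, A)))
rhs = np.sort(np.outer(lam, lam).ravel())
print(np.allclose(lhs, rhs))          # True: eigenvalues of A ⊗ A are products
```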
Lecture 03: Wiring calculus and entanglement
Scribe: Erika Ye

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
April 8, 2019
1 Agenda
1. Wiring calculus
2. Joint classical probability distributions
3. Joint quantum distributions
4. Entanglement and the Positive-Partial-Transpose (PPT) test
2 Wiring Calculus

Wiring calculus is a graphical formalism that is designed to deal with index contractions among tensors. It has been used in various fields, such as physics (Feynman and Penrose), representation theory (Cvitanovic), knot theory (Bar-Natan and Kontsevich), quantum groups (Reshetikhin), and category theory (Deligne and Vogel). More recently, it has become popular in the field of tensor networks. (We refer to the excellent lecture notes by Bridgeman and Chubb for further information and reading.) Here, we will focus on wiring calculus developed for tensor representation and manipulation.

2.1 Wiring diagrams for vectors and adjoints

Let $H$ be a $d$-dimensional vector space with designated inner product $\langle \cdot, \cdot \rangle$. Let $e_1, \ldots, e_d$ denote a designated orthonormal basis of $H$. The basic building blocks of wiring calculus are boxes for standard basis vectors and their (basis-dependent) transposes: the vector $e_i$ is a box with a line emanating to the left, and $e_i^T$ is a box with a line emanating to the right, for $1 \leq i \leq d$. [Wiring diagrams omitted.]
We extend both definitions in a linear and anti-linear fashion to all of $H$:
$$x = \sum_{i=1}^d x_i\, e_i \quad \text{and} \quad x^* = \sum_{i=1}^d \bar{x}_i\, e_i^T. \tag{1}$$
This convention is crucial. Boxes with an emanating line towards the left are standard vectors, while boxes with an emanating line towards the right are adjoint vectors. This convention is designed to appropriately capture contractions, like the inner product:
$$\langle x, y \rangle = \sum_{i,j=1}^d \bar{x}_i y_j \langle e_i, e_j \rangle = \sum_{i=1}^d \bar{x}_i y_i. \tag{2}$$
In the wiring diagram, the right-emanating line of the box $x^*$ is joined with the left-emanating line of the box $y$.
Transposition corresponds to bending an emanating line into the opposite direction. For standard basis vectors, the transpose of the box $e_i$ (line to the left) is the box $e_i$ with its line bent to the right, and vice versa,
and we extend this action linearly to $H$:
$$x^T = \sum_{i=1}^d x_i\, e_i^T, \qquad (x^*)^T = \sum_{i=1}^d \bar{x}_i\, e_i = \bar{x}.$$
It is easy to see that applying the transpose twice returns the original vector.
Remark 2.1. Transposition is basis-dependent and turns column vectors (boxes with lines emanating to the left) into row vectors (boxes with lines emanating to the right). Importantly, it does not conjugate the vector entries.
The motivation behind this graphical formalism is as follows: vectors can be thought of as 1-dimensional arrays $x = [x_i]_{i=1}^d$. They correspond to $d$ numbers labeled by an index $1 \leq i \leq d$. The outgoing lines in Equation (1) represent a free index. The direction of the line tells us whether we should think of the array as a column or a row vector. The inner product (2) is an index contraction: it perfectly aligns the indices associated with a row vector and a column vector, $\langle x, y \rangle = \sum_{i=1}^d \bar{x}_i y_i$. A closed line represents such a contraction pictorially. This graphically mimics the Einstein summation convention $\langle x, y \rangle = \bar{x}_i y^i$, where the location of the index tells us whether the object is a contra- or covariant vector and it is implicitly assumed that one sums over indices that appear twice.

2.2 Wiring diagrams for operators

The wiring diagram formalism readily and consistently extends to operators (matrices). An operator $A \in \mathcal{L}(H)$ "eats" a vector $x$ and spits out another vector in $H$. In wiring calculus, $Ax$ is drawn by attaching the vector box $x$ to the operator box $A$; the adjoint $(Ax)^* = x^* A^*$ is drawn analogously with lines emanating to the right. Here, $A^* \in \mathcal{L}(H)$ is the adjoint of $A$. Operators are represented by boxes with two emanating index lines. This is consistent with the array interpretation: operators may be characterized by matrices $A = [a_{ij}]_{i,j=1}^d$, which are 2-dimensional arrays. The two indices correspond to two lines that emanate in different directions. Matrix multiplication combines two operators and returns a third one, $AB \in \mathcal{L}(H)$: pictorially, the boxes $A$ and $B$ are connected in series. [Wiring diagrams omitted.]
A particularly important operator/matrix is the identity $\mathbb{I} \in \mathcal{L}(H)$. It is characterized by the unique property of "doing nothing": $\mathbb{I}x = x$ for all $x \in H$. We pictorially underline this by drawing the identity as a plain line without a box.

There is an alternative explanation for this notation. We can expand $\mathbb{I} = \sum_{i=1}^d e_i e_i^T$. This expansion perfectly aligns both emanating indices. This resembles the contraction that features in the inner product (2).
The trace is the natural index contraction for matrices. It perfectly aligns left and right indices: $\mathrm{tr}(A) = \sum_{i=1}^d A_{ii}$. Pictorially, the two lines of the box $A$ are joined into a closed loop.
Finally, the (basis dependent) transpose operation swaps the indices associated with a matrix: $[A^T]_{ij} = A_{ji}$. Pictorially, both lines of the box $A$ are bent around to the opposite sides, and it is easy to verify that transposing twice returns the original operator diagram.

2.2.1 Graphical proofs for important results in linear algebra
1. Inner products are basis independent: Fix a unitary matrix $U \in \mathcal{L}(H)$ (basis change). Then, for any $x, y \in H$,
$$\langle Ux, Uy \rangle = \langle x, U^* U y \rangle = \langle x, \mathbb{I} y \rangle = \langle x, y \rangle.$$

2. The trace is cyclic: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. In the diagram, sliding one box around the closed loop exchanges the order of the two boxes.

3. Outer products are matrices: the diagram of $xy^*$ has one line emanating to the left and one to the right, i.e. it is an operator box. In particular,
$$\mathrm{tr}(A\, xy^*) = y^* A x = \langle y, Ax \rangle.$$
2.3 Wiring diagrams for tensors

The wiring formalism readily extends to tensor products. Note that, so far, all index lines have been arranged horizontally. Wiring diagrams for operations in $H$ may be thought of as wires that connect operations in a serial fashion. Tensor product operations are arranged in a parallel fashion instead: $x \otimes y$ is drawn as the boxes $x$ and $y$ stacked on top of each other, and likewise for $(x \otimes y)^* = x^* \otimes y^*$. [Wiring diagrams omitted.]
This definition extends linearly to general tensors on $H^{\otimes 2}$:
$$t = \sum_i \tau_i\, x_i \otimes y_i \quad \text{and} \quad t^* = \sum_i \bar{\tau}_i\, x_i^* \otimes y_i^*,$$
drawn as a single big box $t$ with two parallel index lines.
โจ๐ฅ1 โ ๐ฆ1, ๐ฅ2 โ ๐ฆ2โฉ =๐ฅ1
๐ฆ1
๐ฅ2
๐ฆ2
This concept extends linearly to more general tensors. Fix ๐ก1 = โ๐1๐=1 ๐ผ๐๐ค๐ โ ๐ฅ๐ and
๐ก2 = โ๐2๐=1 ๐ฝ๐๐ฆ๐ โ ๐ง๐ . Then,
โจ๐ก1, ๐ก2โฉ = ๐ก1 ๐ก2 =๐1โ
๐=1
๐2โ
๐=1๏ฟฝ๏ฟฝ๐๐ฝ๐
๐ค๐
๐ฅ๐
๐ฆ๐
๐ง๐
=๐1โ
๐=1
๐2โ
๐=1๏ฟฝ๏ฟฝ๐๐ฝ๐โจ๐ค๐, ๐ฆ๐โฉโจ๐ฅ๐, ๐ง๐โฉ.
The action of elementary tensor product operators $A \otimes B$ also factorizes appropriately:
$$(A \otimes B)(x \otimes y) = (Ax) \otimes (By),$$
drawn as the boxes $A$ and $B$ acting in parallel on the boxes $x$ and $y$.
Similar to general tensor products of vectors, we denote general tensor product operators by big boxes with (in total) 4 index lines:
$$T = \sum_{i=1}^n \alpha_i\, A_i \otimes B_i.$$
The trace on $\mathcal{L}(H \otimes H)$ again corresponds to a full index contraction. It aligns in- and outgoing indices on both spaces and sums over both: in wiring notation, both pairs of lines of the box $T$ are closed into loops.
For $\mathcal{L}(H)$, the trace is the only index contraction. For $\mathcal{L}(H \otimes H)$, partial contractions are also possible (align only one pair of indices). The two options correspond to partial traces over the first and second tensor factor, respectively: $\mathrm{tr}_1(T)$ and $\mathrm{tr}_2(T)$. More formally, these partial contractions are defined for elementary tensor product operators $A \otimes B$ and linearly extended to all of $\mathcal{L}(H \otimes H)$:
$$\mathrm{tr}_1(A \otimes B) = \mathrm{tr}(A)\, B \in \mathcal{L}(H), \qquad \mathrm{tr}_2(A \otimes B) = \mathrm{tr}(B)\, A \in \mathcal{L}(H).$$
Similarly, we define the partial transposes on elementary tensor products,
$$\mathrm{PT}_1(A \otimes B) = A^T \otimes B \quad \text{and} \quad \mathrm{PT}_2(A \otimes B) = A \otimes B^T, \tag{3}$$
and extend them linearly to $\mathcal{L}(H^{\otimes 2})$. In contrast to the previous definitions, these operations are basis dependent. The wiring formula for ordinary transposes readily generalizes to partial transposition: only the line pair of the affected tensor factor is bent around.
Finally, we introduce a useful correspondence between $\mathcal{L}(H)$ and $H^{\otimes 2}$. Note that both spaces are linear and have the same dimension. Vectorization is a bijective map from $\mathcal{L}(H)$ to $H^{\otimes 2}$ that makes this correspondence precise. For $1 \leq i, j \leq d$, define
$$\mathrm{vec}\left( e_i e_j^T \right) = e_i \otimes e_j \in H^{\otimes 2}.$$
The operators $E_{ij} = e_i e_j^T$ form a basis of $\mathcal{L}(H)$ and allow for extending this definition linearly to all of $\mathcal{L}(H)$. In wiring notation, $\mathrm{vec}(A)$ is obtained by bending the right line of the box $A$ around so that both lines emanate to the left.
This correspondence is basis dependent, but it does preserve the natural inner products associated with both spaces:
$$\langle \mathrm{vec}(A), \mathrm{vec}(B) \rangle = \mathrm{tr}(A^* B) = (A, B).$$
This isometry connects the (extended) Euclidean inner product on $H^{\otimes 2}$ with the Frobenius inner product on $\mathcal{L}(H)$.

Remark 2.2. In Lecture 2, we discussed a concrete realization of $H^{\otimes 2}$ as the linear hull of all outer products, $\mathcal{L}(H)$. This realization is equivalent to inverting the vectorization map.
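In coordinates, vectorization is just a reshape of the matrix into a long vector; the isometry above is then a one-line check (a minimal numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

vec = lambda M: M.reshape(-1)        # row-major reshape: vec(e_i e_j^T) = e_i ⊗ e_j
lhs = np.vdot(vec(A), vec(B))        # <vec(A), vec(B)>; vdot conjugates the first slot
rhs = np.trace(A.conj().T @ B)       # Frobenius inner product (A, B)
print(np.isclose(lhs, rhs))          # True
```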
3 Joint Probability Distributions

Recall that a classical (discrete) probability space is fully characterized by a probability vector
$$p \in \Delta_{d-1} = \{ x \in \mathbb{R}^d : x \geq 0,\ \langle \mathbb{1}, x \rangle = 1 \} = \mathrm{conv}\{ e_1, \ldots, e_d \} \subset \mathbb{R}^d.$$
Measurements correspond to resolutions of the identity (vector) $\{h_a : a \in A\} \subset \mathbb{R}^d$: $h_a \geq 0$ for each $a \in A$ and $\sum_{a \in A} h_a = \mathbb{1}$. The probability rule is given by the standard inner product on $\mathbb{R}^d$:
$$\Pr[a|p] = \langle h_a, p \rangle \quad \text{for all } a \in A.$$
This formalism fully describes a single $d$-variate random variable. A natural extension is to consider joint random variables. Here we restrict ourselves to joint distributions on a pair of $d$-variate random variables. Extensions to different dimensions and more distributions are straightforward.

3.1 Independent distributions

Let $\Delta_{d-1}$ be the standard probability simplex in $\mathbb{R}^d$. Define
$$\Delta_{d-1} \bar{\otimes} \Delta_{d-1} = \{ p \otimes q : p, q \in \Delta_{d-1} \} \subset \mathbb{R}^d \otimes \mathbb{R}^d. \tag{4}$$
The notation $\bar{\otimes}$ underlines that this is not the usual tensor product. We do not allow for convex (or linear) mixtures; only elementary tensor products feature. This in turn implies that $\Delta_{d-1} \bar{\otimes} \Delta_{d-1}$ is not a convex set. The fact that a joint probability distribution corresponds to an elementary tensor product has important consequences: the probability rule associated with such joint distributions factorizes. More precisely, let $\{h_a : a \in A\}, \{\tilde{h}_b : b \in B\} \subset \mathbb{R}^d$ be two measurements that address the first and second random variables, respectively. Then, the combined measurement on both random variables becomes
$$\{ h_a \otimes \tilde{h}_b : a \in A,\ b \in B \} \subset \mathbb{R}^d \otimes \mathbb{R}^d,$$
and the (extended) probability rule factorizes:
$$\Pr[a, b\,|\,p \otimes q] = \langle h_a \otimes \tilde{h}_b,\ p \otimes q \rangle = \langle h_a, p \rangle \langle \tilde{h}_b, q \rangle = \Pr[a|p]\Pr[b|q].$$
This is the defining property of two independent random variables.
Fact 3.1. The set of joint independent probability distributions corresponds to $\Delta_{d-1} \bar{\otimes} \Delta_{d-1} = \{ p \otimes q : p, q \in \Delta_{d-1} \}$.
Example 3.2 (two fair coins). The probability vector associated with a fair coin toss is $p = 2^{-1}\mathbb{1} \in \mathbb{R}^2$. The joint probability distribution of two independent coin tosses becomes
$$\left( \tfrac{1}{2}\mathbb{1} \right) \otimes \left( \tfrac{1}{2}\mathbb{1} \right) = \tfrac{1}{4}\mathbb{1} \in \mathbb{R}^4.$$
Example 3.3 (Two deterministic distributions). Let $e_1, \ldots, e_d$ denote the standard basis of $\mathbb{R}^d$. Then, each distribution of the form $p = e_i$ is deterministic. A joint distribution of deterministic probability vectors is independent:
$$p_{\mathrm{joint}} = e_i \otimes e_j \in \Delta_{d-1} \bar{\otimes} \Delta_{d-1}.$$
4 Correlated random variables

Independent joint probability distributions are not everything. Correlated distributions are also possible. Note that every joint probability distribution $p_{\mathrm{joint}} \in \mathbb{R}^d \otimes \mathbb{R}^d$ is necessarily entry-wise positive and normalized. Therefore, it must be contained in the extended probability simplex:
$$\Delta_{d^2-1} = \{ t \in \mathbb{R}^d \otimes \mathbb{R}^d : t \geq 0,\ \langle \mathbb{1} \otimes \mathbb{1}, t \rangle = 1 \}. \tag{5}$$
Importantly, $\Delta_{d-1} \bar{\otimes} \Delta_{d-1} \subset \Delta_{d^2-1}$ and this inclusion is strict.
Example 4.1. Consider a joint distribution of the form
$$p_{\mathrm{joint}} = \tfrac{1}{2}\left( e_1 \otimes e_2 + e_2 \otimes e_1 \right) \in \Delta_{d^2-1}.$$
It is easy to check that $p_{\mathrm{joint}} \notin \Delta_{d-1} \bar{\otimes} \Delta_{d-1}$.
Definition 4.2. We call a joint probability distribution $p_{\mathrm{joint}} \in \mathbb{R}^d \otimes \mathbb{R}^d$ correlated if it is not independent, i.e. $p_{\mathrm{joint}} \notin \Delta_{d-1} \bar{\otimes} \Delta_{d-1}$.
The following simple argument provides a geometric connection between independent random variables and every possible joint probability distribution.
Proposition 4.3. The two sets (4) and (5) have the following relation: $\mathrm{conv}\{ \Delta_{d-1} \bar{\otimes} \Delta_{d-1} \} = \Delta_{d^2-1}$.
Proof. Persistence ensures $p \otimes q \geq 0$ and $\langle \mathbb{1} \otimes \mathbb{1},\ p \otimes q \rangle = \langle \mathbb{1}, p \rangle \langle \mathbb{1}, q \rangle = 1$ for any $p, q \in \Delta_{d-1}$. This readily implies $\mathrm{conv}\{ \Delta_{d-1} \bar{\otimes} \Delta_{d-1} \} \subseteq \Delta_{d^2-1}$. Conversely, recall that $\Delta_{d-1}$ is the convex hull of its extreme points: $\Delta_{d-1} = \mathrm{conv}\{e_1, \ldots, e_d\}$. Tensor products of such deterministic distributions are independent and we conclude
$$\mathrm{conv}\{ \Delta_{d-1} \bar{\otimes} \Delta_{d-1} \} \supseteq \mathrm{conv}\{ e_i \otimes e_j : 1 \leq i, j \leq d \} = \Delta_{d^2-1},$$
where the last equation follows from the fact that independent deterministic distributions constitute all extreme points of the larger simplex.
Corollary 4.4 (Full characterization of joint probability distributions). Joint probability distributions are either independent or correlated (there is no third option). Moreover, every correlated distribution corresponds to a convex mixture of independent distributions:
$$\underbrace{\Delta_{d-1} \bar{\otimes} \Delta_{d-1}}_{\text{independent}} \subset \underbrace{\mathrm{conv}\{ \Delta_{d-1} \bar{\otimes} \Delta_{d-1} \}}_{\text{correlated}} = \underbrace{\Delta_{d^2-1}}_{\text{everything}}.$$
5 Joint states of bipartite quantum systems

Recall that a single quantum mechanical system is described by a (probability) density matrix:
$$\rho \in \{ X \in \mathbb{H}^d : X \succeq 0,\ (\mathbb{I}, X) = \mathrm{tr}(X) = 1 \} = S(\mathbb{H}^d).$$
Next, consider a joint quantum system that is comprised of two quantum mechanical particles. The associated joint density matrix lives in the tensor product $\mathbb{H}^d \otimes \mathbb{H}^d \subset \mathcal{L}(\mathbb{C}^d \otimes \mathbb{C}^d)$. In full analogy to the analysis of classical probability distributions, we introduce the following three sets:

(i) independent joint quantum states: $S(\mathbb{H}^d)\, \bar{\otimes}\, S(\mathbb{H}^d) = \{ \rho \otimes \sigma : \rho, \sigma \in S(\mathbb{H}^d) \}$,

(ii) separable joint quantum states: $\mathrm{SEP}(\mathbb{H}^d \otimes \mathbb{H}^d) := \mathrm{conv}\{ S(\mathbb{H}^d)\, \bar{\otimes}\, S(\mathbb{H}^d) \}$. This set encompasses all quantum states that arise as convex mixtures of independent joint quantum states.

(iii) all possible states: $S(\mathbb{H}^d \otimes \mathbb{H}^d) = \{ \rho \in \mathbb{H}^d \otimes \mathbb{H}^d : \rho \succeq 0,\ (\mathbb{I} \otimes \mathbb{I}, \rho) = 1 \}$.
These sets are related by the following inclusions:
$$\underbrace{S(\mathbb{H}^d)\, \bar{\otimes}\, S(\mathbb{H}^d)}_{\text{independent}} \subset \underbrace{\mathrm{SEP}(\mathbb{H}^d \otimes \mathbb{H}^d)}_{\text{correlated}} \subset \underbrace{S(\mathbb{H}^d \otimes \mathbb{H}^d)}_{\text{everything}}. \tag{6}$$
5.1 The Positive Partial Transpose Test

It should not come as a surprise that the first inclusion in Rel. (6) is strict: correlated joint quantum distributions do exist. A more interesting question is whether correlations span the entire space of joint density matrices.

A seminal test designed by Horodecki and Peres (1996) addresses this question and provides a necessary condition for joint quantum states to be separable.
Theorem 5.1 (The Positive-Partial-Transpose (PPT) test). Every separable joint quantum state $\rho_{\mathrm{joint}} \in \mathrm{SEP}(\mathbb{H}^d \otimes \mathbb{H}^d)$ admits positive semidefinite partial transposes:
$$\mathrm{PT}_1(\rho_{\mathrm{joint}}) \succeq 0 \quad \text{and} \quad \mathrm{PT}_2(\rho_{\mathrm{joint}}) \succeq 0.$$
Here, $\mathrm{PT}_1, \mathrm{PT}_2 : \mathbb{H}^d \otimes \mathbb{H}^d \to \mathbb{H}^d \otimes \mathbb{H}^d$ denote the partial transposes (3).
Proof. First note that ordinary transposition does not affect positive semidefiniteness. Suppose that $A \in \mathbb{H}^d$ is positive semidefinite and fix $y \in \mathbb{C}^d$. Then,
$$\langle y, A^T y \rangle = \langle \bar{y}, A \bar{y} \rangle,$$
where $\bar{y}$ denotes entry-wise complex conjugation; this is always non-negative, because $A \succeq 0$. Next, choose $\rho_{\mathrm{joint}} \in \mathrm{SEP}(\mathbb{H}^d \otimes \mathbb{H}^d)$ and decompose it as $\rho_{\mathrm{joint}} = \sum_{i=1}^n p_i\, \rho^{(1)}_i \otimes \rho^{(2)}_i$. Linearity of the partial transpose then implies
$$\mathrm{PT}_1(\rho_{\mathrm{joint}}) = \sum_{i=1}^n p_i\, \mathrm{PT}_1\!\left( \rho^{(1)}_i \otimes \rho^{(2)}_i \right) = \sum_{i=1}^n p_i \left( \rho^{(1)}_i \right)^{T} \otimes \rho^{(2)}_i.$$
This operator is necessarily positive semidefinite, because of persistence and the fact that psd operators form a convex cone. An analogous argument can be made for the partial transpose over the second tensor product factor.
5.2 Entanglement

One of the most interesting features of quantum distributions is the following discrepancy: convex mixtures of independent quantum distributions do not reach all possible joint quantum distributions. The second inclusion in Rel. (6) is strict!
Lemma 5.2. Set $\Omega = d^{-1}\mathrm{vec}(\mathbb{I})\mathrm{vec}(\mathbb{I})^*$, where $\mathbb{I} \in \mathbb{H}^d$ denotes the identity. Then, $\Omega \in S(\mathbb{H}^d \otimes \mathbb{H}^d)$, but $\mathrm{PT}_1(\Omega)$ is not positive semidefinite.
The following statement is an immediate consequence of the PPT-test (Theorem 5.1).
Corollary 5.3. $\Omega \in S(\mathbb{H}^d \otimes \mathbb{H}^d)$, but $\Omega \notin \mathrm{SEP}(\mathbb{H}^d \otimes \mathbb{H}^d)$.
Proof of Lemma 5.2. Vectorization of the identity matrix assumes a particularly simple form in wiring calculus: $\mathrm{vec}(\mathbb{I}) = \sum_{i=1}^d e_i \otimes e_i$ is a single bent line, and $\Omega$ consists of two nested bent lines. This allows us to compute the partial transpose pictorially:
$$\mathrm{PT}_1(\Omega) = \tfrac{1}{d}\, F.$$
The operator on the right is called the flip operator $F \in \mathcal{L}(\mathbb{C}^d \otimes \mathbb{C}^d)$. It acts on elementary tensors by permuting them: $F\, x \otimes y = y \otimes x$. It is easy to check that the flip operator is self-adjoint and also obeys $F^2 = \mathbb{I} \otimes \mathbb{I}$. The spectrum of such operators must be contained in $\{\pm 1\}$, and the only hermitian and unitary operator with only positive eigenvalues ($\lambda = 1$ with $d^2$-fold degeneracy) is the identity. Clearly, $F$ is not the identity; therefore it must have negative eigenvalues and cannot be positive semidefinite. More concretely, note that
$$F(x \otimes y - y \otimes x) = -(x \otimes y - y \otimes x).$$
This observation identifies eigenvectors of $F$ associated to eigenvalue $\lambda = -1$.
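Lemma 5.2 is easy to confirm numerically (a minimal sketch): build $\Omega$ for $d = 2$, take the partial transpose by swapping the appropriate tensor indices, and inspect the spectrum.

```python
import numpy as np

d = 2
vecI = np.eye(d).reshape(-1)             # vec(I) = sum_i e_i ⊗ e_i
Omega = np.outer(vecI, vecI) / d         # maximally entangled state

# Partial transpose over the first factor: view Omega as a 4-index tensor
# and swap the row index of the first factor with its column index.
PT1 = Omega.reshape(d, d, d, d).transpose(2, 1, 0, 3).reshape(d * d, d * d)
print(np.linalg.eigvalsh(PT1))           # [-0.5, 0.5, 0.5, 0.5] -- not psd
```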
Definition 5.4 (Entanglement). A joint quantum state $\rho_{\mathrm{joint}} \in S(\mathbb{H}^d \otimes \mathbb{H}^d)$ is called entangled if it is not separable, i.e. $\rho_{\mathrm{joint}} \notin \mathrm{SEP}(\mathbb{H}^d \otimes \mathbb{H}^d)$.
The name entanglement dates back to Schrödinger, who coined this term in a letter to Einstein in 1935 ("Verschränkung" in German, translated to English by Schrödinger himself).
Entangled states arise from correlations that do not have a classical counterpart. This led physicists to use words like "mysterious" and "elusive" to describe entanglement. Today, this strong form of correlation forms the basis of many "cool" quantum technologies, like quantum teleportation (homework), super-dense coding, quantum key distribution and quantum computing.
Lecture 04: Symmetric and antisymmetric tensors
Scribe: Erick Moen

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
April 10, 2019
1 Agenda
1. Flip operators
2. Symmetric and antisymmetric subspaces of $H^{\otimes k}$
3. The determinant
4. The permanent
2 Symmetric and antisymmetric subspaces of $H^{\otimes 2}$

Let $H$ denote a $d$-dimensional inner product space with designated orthonormal basis $e_1, \ldots, e_d$. The flip operator $F : H^{\otimes 2} \to H^{\otimes 2}$ permutes elementary tensor products:
$$F\, x \otimes y = y \otimes x \quad \text{for all } x, y \in H.$$
This action can be extended linearly to all of $H^{\otimes 2}$. In wiring calculus, the flip operator is drawn as two crossing lines.
It is easy to check that the flip operator is self-adjoint ($F^* = F \in \mathcal{L}(H^{\otimes 2})$) and unitary:
$$F^2 = \mathbb{I} \otimes \mathbb{I}.$$
The trace of $F$ can also readily be computed using wiring calculus: the crossing closes into a single loop, whence
$$\mathrm{tr}(F) = \mathrm{tr}(\mathbb{I}) = \dim(H) = d.$$
Define the following operators $P, Q \in \mathcal{L}(H^{\otimes 2})$:
$$P = \tfrac{1}{2}\left( \mathbb{I}^{\otimes 2} + F \right) \quad \text{and} \quad Q = \tfrac{1}{2}\left( \mathbb{I}^{\otimes 2} - F \right).$$
By construction, these operators are self-adjoint and have the following properties:
$$P^2 = P, \quad Q^2 = Q, \quad \text{and} \quad PQ = 0.$$
This implies that $P$ and $Q$ are orthogonal projectors onto disjoint subspaces of $H^{\otimes 2}$.
Definition 2.1 (Symmetric and antisymmetric subspaces of $H^{\otimes 2}$). The symmetric subspace of $H^{\otimes 2}$ is the range of the orthogonal projector $P_{\vee 2} = \tfrac{1}{2}(\mathbb{I} + F)$:
$$\textstyle\bigvee^2 H = \mathrm{range}(P_{\vee 2}) = \mathrm{span}\{ x \otimes y + y \otimes x : x, y \in H \}.$$
The antisymmetric subspace of $H^{\otimes 2}$ is the range of the orthogonal projector $P_{\wedge 2} = \tfrac{1}{2}(\mathbb{I} - F)$:
$$\textstyle\bigwedge^2 H = \mathrm{range}(P_{\wedge 2}) = \mathrm{span}\{ x \otimes y - y \otimes x : x, y \in H \}.$$
This notation is appropriate. The symmetric subspace contains all tensor products that are symmetric under permutation of tensor factors. In contrast, the antisymmetric subspace contains all tensor products that change sign upon permutations. This decomposition of $H^{\otimes 2}$ into symmetric and antisymmetric elements is complete.

Proposition 2.2. Suppose that $H$ has dimension $d$. Then,
$$H^{\otimes 2} = \textstyle\bigvee^2 H \oplus \bigwedge^2 H, \qquad \dim\left(\textstyle\bigvee^2 H\right) = \binom{d+1}{2}, \qquad \dim\left(\textstyle\bigwedge^2 H\right) = \binom{d}{2}.$$
Proof. Use $\mathrm{tr}(\mathbb{I} \otimes \mathbb{I}) = \mathrm{tr}(\mathbb{I})^2 = d^2$ and $\mathrm{tr}(F) = d$ to evaluate the dimensions of these subspaces:
$$\dim\left(\textstyle\bigvee^2 H\right) = \mathrm{tr}(P_{\vee 2}) = \tfrac{1}{2}\left( \mathrm{tr}(\mathbb{I} \otimes \mathbb{I}) + \mathrm{tr}(F) \right) = \tfrac{1}{2}(d^2 + d) = \binom{d+1}{2},$$
$$\dim\left(\textstyle\bigwedge^2 H\right) = \mathrm{tr}(P_{\wedge 2}) = \tfrac{1}{2}\left( \mathrm{tr}(\mathbb{I} \otimes \mathbb{I}) - \mathrm{tr}(F) \right) = \tfrac{1}{2}(d^2 - d) = \binom{d}{2}.$$
Next, $P_{\vee 2} P_{\wedge 2} = 0$, which ensures that both subspaces are mutually orthogonal. Finally, add the dimensions to check that the direct sum of both subspaces covers all of $H^{\otimes 2}$:
$$\dim\left(\textstyle\bigvee^2 H\right) + \dim\left(\textstyle\bigwedge^2 H\right) = \binom{d+1}{2} + \binom{d}{2} = d^2 = \dim\left( H^{\otimes 2} \right).$$
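These projectors are easy to build explicitly (a minimal numpy sketch): the flip operator permutes the two tensor indices, and the traces reproduce the dimension counts above.

```python
import numpy as np
from math import comb

d = 4
I2 = np.eye(d * d)
# Flip operator: F (e_i ⊗ e_j) = e_j ⊗ e_i, obtained by swapping the two row indices.
F = I2.reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)

P = (I2 + F) / 2          # projector onto the symmetric subspace
Q = (I2 - F) / 2          # projector onto the antisymmetric subspace
print(np.allclose(P @ P, P), np.allclose(Q @ Q, Q), np.allclose(P @ Q, 0))
print(np.trace(P), comb(d + 1, 2))    # 10.0, 10
print(np.trace(Q), comb(d, 2))        # 6.0, 6
```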
We conclude this section with a highly instructive example.

Example 2.3. Fix any operator $A \in \mathcal{L}(H)$. Then,
$$\mathrm{tr}\left( P_{\wedge 2} A^{\otimes 2} \right) = \tfrac{1}{2}\left( \mathrm{tr}(A)^2 - \mathrm{tr}(A^2) \right).$$
Next, fix $d = 2$ and set $A = [a_{ij}]_{i,j=1}^2$. Then,
$$\mathrm{tr}\left( P_{\wedge 2} A^{\otimes 2} \right) = \tfrac{1}{2}\left( \mathrm{tr}(A)^2 - \mathrm{tr}(A^2) \right) = a_{11}a_{22} - a_{12}a_{21} = \det(A).$$
This is not a coincidence, as we shall see later. For $d = 2$, the antisymmetric subspace has dimension $\binom{2}{2} = 1$. Evaluating the action of $A^{\otimes 2}$ on this one-dimensional subspace produces a famous polynomial: the determinant.
3 Symmetric and antisymmetric subspaces of $H^{\otimes k}$

Let us now generalize the constructions of symmetric and antisymmetric subspaces to tensor spaces of order $k \geq 3$. Let $\mathcal{S}_k$ denote the group of permutations of $k$ elements. For $\pi \in \mathcal{S}_k$, define $W_\pi \in \mathcal{L}(H^{\otimes k})$ via
$$W_\pi\, x_1 \otimes \cdots \otimes x_k = x_{\pi^{-1}(1)} \otimes \cdots \otimes x_{\pi^{-1}(k)} \quad \text{for } x_1, \ldots, x_k \in H$$
and extend this definition linearly. It is easy to check that these operators are unitary and respect the group structure of $\mathcal{S}_k$: $W_\pi W_\sigma = W_{\pi \circ \sigma}$. Hence, they form a unitary representation of the symmetric group $\mathcal{S}_k$ on $H^{\otimes k}$.
Example 3.1 ($k = 2$). For $k = 2$, there are only two permutations: the identity and the transposition ("flip"). In wiring notation, these are two parallel lines and two crossing lines, respectively.

Example 3.2 ($k = 3$). For $k = 3$, there are $3! = 6$ permutation operators, drawn as the six ways of wiring three inputs to three outputs. [Wiring diagrams omitted.]
Let $\mathrm{sign}(\pi) \in \{\pm 1\}$ be the signature of a permutation $\pi \in \mathcal{S}_k$. Define
$$P_{\vee k} = \frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} W_\pi \quad \text{and} \quad P_{\wedge k} = \frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} \mathrm{sign}(\pi)\, W_\pi. \tag{1}$$
These are the correct definitions for projectors onto the symmetric and antisymmetric subspaces of $H^{\otimes k}$. In order to demonstrate this, we need the following fact about signatures.
Fact 3.3. The signature is multiplicative: $\mathrm{sign}(\pi \circ \sigma) = \mathrm{sign}(\pi)\mathrm{sign}(\sigma)$ for all $\pi, \sigma \in \mathcal{S}_k$.
Proposition 3.4 (Symmetric and antisymmetric subspace of $H^{\otimes k}$). The operators $P_{\vee k}, P_{\wedge k} \in \mathcal{L}(H^{\otimes k})$ defined in Equation (1) are orthogonal projectors onto mutually orthogonal subspaces
$$\textstyle\bigvee^k H = \mathrm{range}(P_{\vee k}) \quad \text{and} \quad \textstyle\bigwedge^k H = \mathrm{range}(P_{\wedge k}).$$
Proof. First, note that $W_{\pi^{-1}} = W_\pi^*$ for any $\pi \in \mathcal{S}_k$. Moreover, inversion doesn't change the signature. Therefore,
$$P_{\wedge k}^* = \frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} \mathrm{sign}(\pi)\, W_\pi^* = \frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} \mathrm{sign}\left( \pi^{-1} \right) W_{\pi^{-1}} = \frac{1}{k!}\sum_{\pi' \in \mathcal{S}_k} \mathrm{sign}(\pi')\, W_{\pi'} = P_{\wedge k},$$
because permutations form a group and it doesn't matter if we sum over them or their inverses. The operator $P_{\wedge k}$ is self-adjoint. Next, use multiplicativity of the sign to conclude
$$P_{\wedge k}^2 = \frac{1}{(k!)^2}\sum_{\pi, \sigma \in \mathcal{S}_k} \mathrm{sign}(\pi)\mathrm{sign}(\sigma)\, W_\pi W_\sigma = \frac{1}{(k!)^2}\sum_{\pi, \sigma \in \mathcal{S}_k} \mathrm{sign}(\pi \circ \sigma)\, W_{\pi \circ \sigma} = \frac{1}{k!}\sum_{\tau \in \mathcal{S}_k} \mathrm{sign}(\tau)\, W_\tau.$$
Here, we have once more used the fact that $\mathcal{S}_k$ is a group. Hence, $P_{\wedge k}$ is indeed an orthogonal projector. A similar argument establishes that $P_{\vee k}$ is also a projector. Finally, note that
$$P_{\wedge k} P_{\vee k} = \frac{1}{(k!)^2}\sum_{\pi, \sigma \in \mathcal{S}_k} \mathrm{sign}(\pi)\, W_\pi W_\sigma = \left( \frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} \mathrm{sign}(\pi) \right) P_{\vee k} = 0,$$
which establishes orthogonality. The final equation is due to the fact that the sign is antisymmetric: averaging over all possible signatures must yield zero.
Proposition 3.4 naturally extends symmetry and antisymmetry to higher order tensor spaces.
Lemma 3.5. Fix $x_1, \ldots, x_k \in H$. Then,
$$P_{\vee k}\, x_2 \otimes x_1 \otimes x_3 \otimes \cdots \otimes x_k = P_{\vee k}\, x_1 \otimes x_2 \otimes x_3 \otimes \cdots \otimes x_k,$$
$$P_{\wedge k}\, x_2 \otimes x_1 \otimes x_3 \otimes \cdots \otimes x_k = -P_{\wedge k}\, x_1 \otimes x_2 \otimes x_3 \otimes \cdots \otimes x_k,$$
and similarly for any other exchange (flip) of two factors. In particular,
$$P_{\wedge k}\, x \otimes x \otimes x_3 \otimes \cdots \otimes x_k = 0 \quad \text{for any } x \in H.$$
Proof. Exchanging two factors corresponds to a certain transposition $\tau \in \mathcal{S}_k$: $x_2 \otimes x_1 \otimes \cdots \otimes x_k = W_\tau\, x_1 \otimes \cdots \otimes x_k$. Transpositions have sign $-1$. The group structure of the permutation group then implies
$$P_{\wedge k} W_\tau = \frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} \mathrm{sign}(\pi)\, W_\pi W_\tau = -\frac{1}{k!}\sum_{\pi \in \mathcal{S}_k} \mathrm{sign}(\pi \circ \tau)\, W_{\pi \circ \tau} = -P_{\wedge k}.$$
This establishes antisymmetry under permutations. A similar, but simpler, argument establishes symmetry for the projector onto the symmetric subspace: $P_{\vee k} W_\tau = P_{\vee k}$ for any transposition.

The final claim is enforced by contradicting requirements. By construction, $x \otimes x \otimes x_3 \otimes \cdots \otimes x_k$ is invariant under permuting the first two factors (symmetric). However, the projection onto $\bigwedge^k H$ must change sign when the first two factors are permuted (antisymmetry). The only tensor product that obeys $-t = +t$ is zero.
Let $e_1, \ldots, e_d$ be an orthonormal basis of $H$. Then, $\{ e_{i_1} \otimes \cdots \otimes e_{i_k} : 1 \leq i_1, \ldots, i_k \leq d \}$ is an orthonormal basis of $H^{\otimes k}$ with respect to the extended inner product. Applying $P_{\vee k}$ and $P_{\wedge k}$ to these basis vectors produces spanning sets for $\bigvee^k H$ and $\bigwedge^k H$, respectively. However, many of these symmetrized (antisymmetrized) standard basis vectors coincide. Removing such redundancies subsequently produces an orthonormal basis of these subspaces. Counting their cardinality yields the following dimension formula.
Proposition 3.7. Suppose that $H$ has dimension $d$. Then,
$$\dim\left(\textstyle\bigvee^k H\right) = \binom{d + k - 1}{k} \quad \text{and} \quad \dim\left(\textstyle\bigwedge^k H\right) = \binom{d}{k}.$$
Note that these two subspaces in general do not span the entire tensor product space $H^{\otimes k}$ for $k \geq 3$:
$$\dim\left(\textstyle\bigvee^k H\right) + \dim\left(\textstyle\bigwedge^k H\right) = \binom{d + k - 1}{k} + \binom{d}{k} < d^k = \dim\left( H^{\otimes k} \right).$$
Also, the dimension of the totally symmetric subspace grows very quickly as $k$ increases. In contrast, the dimension of the totally antisymmetric subspace achieves its maximum for $k = \lfloor d/2 \rfloor$; then the dimension starts to decrease. A second extreme case is achieved for $k = d$:
$$\dim\left(\textstyle\bigwedge^d H\right) = \binom{d}{d} = 1.$$
For larger values of $k$, the subspace vanishes entirely. The following reformulation explicitly points out that the range of $P_{\wedge d}$ is one-dimensional.
Lemma 3.8. Let $e_1, \ldots, e_d$ be an orthonormal basis of $H$. Then,
$$P_{\wedge d} = d!\, P_{\wedge d}\, (e_1 \otimes \cdots \otimes e_d)(e_1 \otimes \cdots \otimes e_d)^*\, P_{\wedge d}.$$
Proof. We may expand the identity on $H$ as $\mathbb{I} = \sum_{i=1}^d e_i e_i^*$. Persistence and the fact that $P_{\wedge d}$ is an orthogonal projector then imply
$$P_{\wedge d} = P_{\wedge d}\, \mathbb{I}^{\otimes d}\, P_{\wedge d} = \sum_{i_1, \ldots, i_d = 1}^d P_{\wedge d}\, (e_{i_1} \otimes \cdots \otimes e_{i_d})(e_{i_1} \otimes \cdots \otimes e_{i_d})^*\, P_{\wedge d}.$$
Most of these extended standard basis vectors must vanish when projected onto $\bigwedge^d H$. Indeed, Lemma 3.5 ensures that the only non-vanishing contributions are permutations of $e_1 \otimes \cdots \otimes e_d$. Up to signs, all $d!$ such vectors get projected onto the same vector, namely $P_{\wedge d}\, e_1 \otimes \cdots \otimes e_d$. Potential sign flips do not matter, however: each of these projected tensor products features twice in the expression and potential sign flips cancel out.
6
We conclude this section with a powerful property of ๐โจ๐ , ๐โง๐ and โ more generallyโ permutation operations.
Fact 3.9. Permutation operators commute with elementary tensor product operators:๐๐ โ โ(๐ปโ๐) for any ๐ด โ โ(๐ป) and any ๐ โ ๐ฎ๐. In particular,
๐โจ๐๐ดโ๐ = ๐ดโ๐๐โจ๐ and ๐โง๐๐ดโ๐ = ๐ดโ๐๐โง๐ .
Intuitively, this makes sense. Tensor product operators ๐ดโ๐ act identically on allfactors ๐ป of ๐ปโ๐. Permuting first and then applying this operator is equivalent to firstapplying the operator and then permuting tensor factors.
4 The determinant4.1 A tensor product formula for the determinant
Recall that the range of ๐โง๐ is one-dimensional. Moreover, Example 2.3 suggests aconnection between the projection of ๐ดโ๐ onto this 1D-subspace and the determinant(for ๐ = 2). The following definition extends this intuition to arbitrary dimensions ๐.
Definition 4.1 (The determinant). Suppose that ๐ป has dimension ๐. For ๐ด โ โ(๐ป) define
det(๐ด) = tr(๐โง๐๐ดโ๐
).
It is easy to check that this formula is equivalent to the Leibniz formula for thedeterminant. Let ๐1, . . . , ๐๐ โ ๐ป denote the columns of ๐ด. Lemma 3.8 allows fordeducing the Leibniz formula from this definition:
det(๐ด) = tr(๐โง๐๐ดโ๐
)=โ
๐โ๐ฎ๐
sign(๐)โจ๐1, ๐ด๐๐(1)โฉ ยท ยท ยท โจ๐๐, ๐ด๐๐(๐)โฉ. (2)
4.2 Properties of the determinant
All fundamental properties of the determinant can be established with relative ease inthis tensor product formalism.
Lemma 4.2 (Normalization). Let I โ โ(๐ป) denote the identity. Then, det(I) = 1.
Proof. The antisymmetric subspace โ๐ โ ๐ปโ๐ is one-dimensional. Therefore,
det(I) = tr(๐โง๐๐ผโ๐
)= tr(๐โง๐) = dim
(โ๐)
=(
๐
๐
)= 1.
Proposition 4.3 (Invariance under basis changes). Let ๐ โ โ(๐ป) be invertible. Then,det(๐๐ด๐โ1) = det(๐ด) for any ๐ด โ โ(๐ป).
7
Proof. The projector ๐โง๐ commutes with ๐โ๐, see Fact 3.9. Therefore
det(๐๐ด๐โ1
)=tr
(๐โง๐๐โ๐๐ดโ๐(๐โ1)โ๐
)= tr
(๐โ๐๐โง๐๐ดโ๐(๐โ1)โ๐
)
=tr(
๐โง๐๐ดโ๐(๐โ1๐
)โ๐)
= tr(๐โง๐๐ดโ๐
)= det(๐ด),
where we have also used cyclicity of the trace.
Corollary 4.4 (The determinant is the product of eigenvalues). Suppose that ๐ด โ โ(๐ป)is diagonalizable and has eigenvalues ๐1, . . . , ๐๐. Then, det(๐ด) = โ๐
๐=1 ๐๐.
Proof. Let ๐ด = ๐๐ท๐โ1 with ๐ท = diag(๐1, . . . , ๐๐) be an eigenvalue decomposition.Note that,
๐ทโ๐๐โง๐๐1 โ ยท ยท ยท โ ๐๐ = ๐โง๐๐ทโ๐๐1 โ ยท ยท ยท โ ๐๐ = ๐1 ยท ยท ยท ๐๐๐โ๐๐1 โ ยท ยท ยท โ ๐๐.
Proposition 4.3 and Lemma 3.8 then imply
det(๐ด) = det(๐ท) = ๐!โจ๐1 โ ยท ยท ยท โ ๐๐, ๐โง๐๐ทโ๐๐โง๐๐1 โ ยท ยท ยท โ ๐๐โฉ=๐1 ยท ยท ยท ๐๐๐!โจ๐1 โ ยท ยท ยท โ ๐๐, ๐โง๐Iโ๐๐โง๐๐1 โ ยท ยท ยท โ ๐๐โฉ=๐1 ยท ๐๐ det(I).
Proposition 4.5 (The determinant is multiplicative). Fix ๐ด, ๐ต โ โ(๐ป). Then, det(๐ด๐ต) =det(๐ด) det(๐ต).
Proof. Recall that ๐โง๐ is a rank-one projector that is proportional to the (projected)outer product ๐!๐โง๐ ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ*๐โง๐ . Here, we have introduced the short-hand notation ๏ฟฝ๏ฟฝ =๐1 โ ยท ยท ยท โ ๐๐ โ ๐ปโ๐. Also, ๐โง๐ commutes with ๐ดโ๐. Therefore,
det(๐ด๐ต) =tr(๐โง๐(๐ด๐ต)โ๐
)= tr
(๐โง๐๐ดโ๐๐โง๐๐ตโ๐
)
=(๐!)2tr(๐โง๐ ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ*๐โง๐๐ดโ๐๐โง๐ ๏ฟฝ๏ฟฝ๏ฟฝ๏ฟฝ*๐โง๐๐ตโ๐
)
=๐!โจ๏ฟฝ๏ฟฝ, ๐โง๐๐ดโ๐๐โง๐ ๏ฟฝ๏ฟฝโฉ๐!โจ๏ฟฝ๏ฟฝ, ๐โง๐๐ตโ๐๐โง๐ ๏ฟฝ๏ฟฝโฉ=tr
(๐โง๐๐ดโ๐
)tr(๐โง๐๐ตโ๐
)= det(๐ด) det(๐ต).
The following useful fact follows directly from multiplicativity.
Corollary 4.6. Let ๐ โ โ(๐ป) be invertible. Then, det(๐โ1) = det(๐).
Other fundamental properties are evident from the tensor product constructionitself. Tensor products of vectors are multi-linear (i.e. linear in each factor) and ๐โง๐
anti-symmetrizes. For the determinant, these fundamental properties ensure
8
1. Multi-linearity: det(๐ด) is linear in the columns ๐๐ = ๐ด๐๐ of ๐ด.2. Antisymmetry: Exchanging two columns of ๐ด negates the determinant.
These properties completely characterize the determinant and rule out any otherpossibility.
Fact 4.7 (Uniqueness). The determinant is the unique matrix function that is i) multi-plicative, ii) multi-linear, iii) antisymmetric and iv) obeys det(I) = 1.
5 The permanentThe permanent is the symmetric cousin of the determinant. It is typically definedanalogously to the Leibniz formula (2):
perm(๐ด) =โ
๐โ๐ฎ๐
โจ๐1, ๐ด๐๐(1)โฉ ยท ยท ยท โจ๐๐, ๐ด๐๐(๐)โฉ.
Definition 5.1 (The permanent). Suppose that ๐ป has dimension ๐. For ๐ด โ โ(๐ป) define
perm(๐ด) = ๐!โจ๐1 โ ยท ยท ยท โ ๐๐, ๐โจ๐๐ดโ๐๐โจ๐๐1 โ ยท ยท ยท โ ๐๐โฉ.
This definition in terms of tensor products readily implies desirable features. Thepermanent is multi-linear and symmetric under exchanging columns of ๐ด.
However, there is also a crucial difference. While the range of ๐โง๐ is one-dimensional,the range of ๐โจ๐ is huge. This prevents us from rewriting perm(๐ด) as the trace of๐โจ๐๐ดโ๐. This clever trick, however, was the basis for establishing multiple nice featuresof the determinant. For the permanent, such an approach is impossible. In general,rather little is known about the permanent.
The permanent is also notoriously difficult to compute. This computational dis-crepancy between det (easy to compute) and perm (hard to compute) forms the basicdichotomy of algebraic complexity theory (think P vs NP). Current quantum supremacyexperiments (โboson samplingโ) are also based on the computational hardness associatedwith computing (generic) permanents.
Lecture 05: Haar integration
Scribe: Oguzhan Teke
ACM 270-1, Spring 2019Richard Kueng & Joel TroppApril 15, 2019
1 Agendaโ Polynomials and Tensorsโ Closed form expression for Haar Integralsโ Schur-Weyl Duality
2 Motivational ExamplesIn order to motivate Haar integration, we start this lecture by studying two simpleexamples. In this regard, we consider the 2-dimensional complex unit sphere
S1 ={๐ฅ โ C2 | โจ๐ฅ, ๐ฅโฉ = 1
} โ C2, (1)
and two homogeneous polynomials:
๐1(๐ฅ, ๏ฟฝ๏ฟฝ) = ๐ฅ1 ๏ฟฝ๏ฟฝ1 + ๐ฅ2 ๏ฟฝ๏ฟฝ2 and ๐2(๐ฅ, ๏ฟฝ๏ฟฝ) = ๐ฅ1 ๏ฟฝ๏ฟฝ2 + ๐ฅ2 ๏ฟฝ๏ฟฝ1. (2)
The task is to integrate these polynomials over S1. A moment of thought reveals thatthe first polynomial is just the squared radius: ๐1(๐ฅ, ๏ฟฝ๏ฟฝ) = โจ๐ฅ, ๐ฅโฉ = 1. This polynomialis constant on S1 and we readily conclude
โซ
S1๐1(๐ฃ, ๐ฃ) d๐(๐ฃ) = 1.
The second polynomial flips its sign under negating one corrdinate. Set ๐ฆ = (๐ฅ1, โ๐ฅ2).Then ๐(๐ฆ, ๐ฆ) = โ๐(๐ฅ, ๏ฟฝ๏ฟฝ). This antisymmetry requires that the integral over the entiresphere must vanish: โซ
S1๐2(๐ฃ, ๐ฃ) d๐(๐ฃ) = 0.
Note that we could only evaluate these integration formulas by using clever symmetrytricks. This approach becomes more challenging for higher order polynomials, e.g.๐2(๐ฅ, ๏ฟฝ๏ฟฝ)2.
Haar integration provides a general means for integrating homogeneous polynomialsof any degree over complex unit spheres in any dimensions.
3 Polynomials and TensorsThroughout this course we restrict our attention to homogeneous polynomials.
Definition 3.1. A polynomial ๐(๐ฅ) of degree ๐ is homogeneous if all monomials havedegree ๐. We denote the space of such polynomials by Hom(๐)(๐ฅ).
2
This restriction is not very severe. Any degree-๐ polynomial in ๐ variables can berepresented as a homogeneous polynomial of the same degree in ๐ + 1 variables, wherewe fix the last variable to one:
๐(๐ฅ1, . . . , ๐ฅ๐) = ๐hom(๐ฅ1, . . . , ๐ฅ๐, 1).
3.1 Homogeneous polynomials and ๐ปโ๐
In fact, there is a close connection between homogeneous polynomials of degree ๐ andtensor product spaces of order ๐. Let ๐ป = C๐ be a ๐-dimensional complex vector spaceendowed with the standard inner product. Fix ๐ก โ ๐ปโ๐ and define
๐๐ก(๐ฅ) = โจ๐ก, ๐ฅ โ ยท ยท ยท โ ๐ฅโฉ for all ๐ฅ โ ๐ป.
Here, โจยท, ยทโฉ denotes the extended inner product on ๐ปโ๐. Recall from Lecture 2, that wecan expand ๐ก as a linear combination of elementary tensor products:
๐ก =๐โ
๐=1๐ผ๐ ๐
(1)๐ โ ยท ยท ยท โ ๐
(๐)๐ , ๐
(1)๐ , . . . , ๐
(๐)๐ โ ๐ป.
In turn, ๐(๐ฅ) becomes
๐๐ก(๐ฅ) =๐โ
๐=1๐ผ๐ โจ๐(1)
๐ โ ยท ยท ยท โ ๐(๐)๐ , ๐ฅ โ ยท ยท ยท โ ๐ฅโฉ
=๐โ
๐=1๐ผ๐ โจ๐(1)
๐ , ๐ฅโฉ ยท ยท ยท โจ๐(๐)๐ , ๐ฅโฉ =
๐โ
๐=1๐ผ๐
๐โ
๐=1โจ๐(๐)
๐ , ๐ฅโฉ.
It is easy to check that this is a homogeneous polynomial of degree ๐ in ๐ complexvariables ๐ฅ = (๐ฅ1, . . . , ๐ฅ๐).
A natural question is to ask whether this correspondence (between the tensor spaceof order ๐ and the homogeneous polynomials of degree ๐) is one-to-one? The answeris no in general. The tensor product ๐ฅ โ ยท ยท ยท โ ๐ฅ is invariant under permuting tensorfactors. More generally, let
๐โจ๐ = 1๐!โ
๐โ๐ฎ๐
๐๐
denote the projector onto the totally symmetric subspace โ๐ โ ๐ปโ๐. Fix ๐ก โ ๐ปโ๐.Then,
๐๐ก(๐ฅ) =โจ๐ก, ๐ฅ โ ยท ยท ยท โ ๐ฅโฉ = โจ๐ก, ๐โจ๐ ๐ฅ โ ยท ยท ยท โ ๐ฅโฉ = โจ๐โจ๐ ๐ก, ๐ฅ โ ยท ยท ยท โ ๐ฅโฉ=๐๐โจ๐ ๐ก(๐ฅ).
Here, we have used the fact that Pโจ๐ is the projection (hence, unitary) onto thesymmetric tensor space, and ๐ฅ โ ยท ยท ยท โ ๐ฅ belongs to this subspace. This result showsthat both ๐ก and ๐โจ๐ ๐ก correspond to the same polynomial. This implies that thecorrespondence between ๐ปโ๐ and Hom(๐)(๐ฅ) is in general not one-to-one. It does,however, become one-to-one if we restrict our attention to the totally symmetricsubspace. The following proposition presents this result formally:
3
Proposition 3.2. There is a one-to-one correspondence between Hom(๐)(๐ฅ) and โ๐ โ ๐ปโ๐.
Proof sketch. We note that
dim(Hom(๐)(๐ฅ)
)=(
๐ + ๐ โ 1๐
)= dim
(โ๐)
, (3)
and there is a one-to-one correspondence between the extended standard basis vectorsand monomials:
๐๐๐1 โยทยทยทโ๐๐๐(๐ฅ) = โจ๐๐1 , ๐ฅโฉ ยท ยท ยท โจ๐๐๐
, ๐ฅโฉ = ๐ฅ๐1 ยท ยท ยท ๐ฅ๐๐.
Monomials generate Hom(๐), while the extended standard basis spans โ(๐). Bothdimensions match up which can be used to establish a one-to-one correspondenceformally.
3.2 Doubly homogeneous polynomials and โ(๐ปโ๐)Definition 3.3. Define Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ) to be the space of doubly homogeneous polynomialsover C, i.e. polynomials that are are ๐-homogeneous in ๐ฅ and ๐-homogeneous in ๏ฟฝ๏ฟฝ:
Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ) = Hom(๐)(๐ฅ) Hom(๐)(๏ฟฝ๏ฟฝ). (4)
A typical example for a doubly homogeneous polynomial is the standard Euclideannorm โจ๐ฅ, ๐ฅโฉ and its integer powers.
Theorem 3.4. There is a one-to-one correspondence between Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ) and โ(โ๐
)โ
the space of linear operators from โ๐ to itself.
Proof Sketch. Let ๐, ๐ โ Hom(๐)(๐ป). Then, ๐(๏ฟฝ๏ฟฝ) ๐(๐ฅ) โ Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ). So,
๐(๏ฟฝ๏ฟฝ) ๐(๐ฅ) =โจ๐ฅ โ ยท ยท ยท โ ๐ฅ , ๐ก๐โฉ โจ๐ก*๐, ๐ฅ โ ยท ยท ยท โ ๐ฅ โฉ (5)
=โจ๐ฅ โ ยท ยท ยท โ ๐ฅ ๐ก๐ ๐ก*๐ ๐ฅ โ ยท ยท ยท โ ๐ฅโฉ (6)
for some ๐ก๐, ๐ก๐ โ โ๐. Rank-one operators of the form ๐ก๐ ๐ก*๐ span the space of all linear
operators. The dimension of this space is dim(โ๐)2 =(๐+๐โ1
๐
)2 which coincides with thespace of doubly homogeneous polynomials (
(๐+๐โ1๐
)degrees of freedom for homogeneous
polynomials in ๏ฟฝ๏ฟฝ and ๐ฅ each).
Corollary 3.5. Fix ๐ด โ โ(โ๐) (think of it as ๐โจ๐๐ด๐โจ๐ with ๐ด โ โ(๐ปโ๐)). Then,
๐๐ด(๐ฅ, ๏ฟฝ๏ฟฝ) = โจ๐ฅ โ ยท ยท ยท โ ๐ฅ , ๐ด(๐ฅ โ ยท ยท ยท โ ๐ฅ
)โฉ = tr(๐ด(๐ฅ๐ฅ*)โ๐
), (7)
is doubly-homogeneous of degree ๐. Moreover, every polynomial in Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ) hasthis form.
4
4 Haar integration4.1 Motivation
Haar-integration provides closed form expressions for integrating doubly homogeneouspolynomials over complex unit spheres. It may be viewed as a complex generalizationof Gaussian integration. At the heart is the correspondence between Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ) andโ(๐ปโ๐), see Corollary 3.5. For now, let d๐(๐ฅ) be an arbitrary integration mesure.Then,
โซ๐๐ด(๐ฃ, ๐ฃ) d๐(๐ฃ) =
โซtr(๐ด(๐ฃ๐ฃ*)โ๐
)d๐(๐ฃ) = tr
(๐ด
โซ(๐ฃ๐ฃ*)โ๐ d๐(๐ฃ)
).
This reformulation has deep implications. A closed-form expression for
๐ป(๐) =โซ
(๐ฃ๐ฃ*)โ๐ d๐(๐ฃ) โ โ(๐ปโ๐) (8)
would allow us to compute integrals of arbitrary polynomials by contracting tensorproduct operators: โซ
๐๐ด(๐ฃ, ๐ฃ) d๐(๐ฃ) = tr(๐ด ๐ป(๐)
).
Haar-integration achieves precisely this goal for the normalized, unitarily invariantmeasure d๐(๐ฃ) on the complex unit sphere S๐โ1 โ ๐ป. The exceedingly high degree ofsymmetry allows for deriving an analytic expression for ๐ป(๐).
Before doing so, a few comments are in order. ๐ stands for a measure on S๐โ1 thatinherits nice properties from C๐. Normalization means that ๐
(S๐โ1
)= 1. Finally, and
most importantly for our goals, unitary invariance means that the measure is invariantunder any unitary transformation ๐ :
๐(๐๐) = ๐(๐) for every Borel set ๐ โ S๐โ1.
One can show that there is only one measure on S๐โ1 with these desirable properties.This measure is called the Haar measure.
Informally, the Haar measure assigns an infinitesimally small weight to each point๐ฅ โ S๐โ1. This assignment is fair in the sense that no vector is weighted less (or more)than any other vector.
4.2 Reformulation of the integration formula
The unitary operators on ๐ป form a nice group U(๐). This group is unimodular andcarries the structure of a Lie group. Importantly, one can also endow U(๐) with anormalized, unitarily invariant measure d๐ . In fact, this Haar measure on U(๐) inducesthe unitarily invariant measure on S๐โ1. Indeed, we can think of the sphere as theset of all possible rotations of a fixed starting vector ๐ฃ0 โ S๐โ1, e.g. the โnorth pole.โThe precise choice of starting point is irrelevant, because both measures are unitarily
5
invariant. We can use this reasoning to rewrite ๐ป(๐) in the following fashion:
๐ป(๐) =โซ
S๐โ1(๐ฃ๐ฃ*)โ๐ d๐(๐ฃ) =
โซ
U(๐)(๐๐ฃ0๐ฃ*
0๐*)โ๐ d๐(๐)
=โซ
U(๐)๐โ๐(๐ฃ0๐ฃ*
0)โ๐(๐*)โ๐ d๐(๐).
This reformulation highlights an interesting property of ๐ป(๐).
Lemma 4.1. The operator ๐ปโ๐ โ โ(๐ปโ๐) defined in Eq. (8) commutes with anysynchronized change of basis in ๐ป:
๐ป(๐)๐ โ๐ = ๐ โ๐๐ป(๐) for all ๐ โ U(๐).
Proof. Fix ๐ โ U(๐). Unitary invariance of the Haar measure implies d(๐) = d(๐ ๐).This allows us to perform a simple change of integration variables ๐ โฒ = ๐ โฆโ ๐ ๐ thatensures
๐ โ๐๐ป(๐) =โซ
U(๐)(๐ ๐)โ๐(๐ฃ0๐ฃ*
0)โ๐(๐*)โ๐ d(๐)
=โซ
U(๐)(๐ โฒ)โ๐(๐ฃ0๐ฃ*
0)โ๐((๐ * ๐ โฒ)*)โ๐ d(๐ โฒ)
=โซ
๐(๐)(๐ โฒ)โ๐(๐ฃ0๐ฃ*
0)โ๐((๐ โฒ)*)โ๐ d๐(๐ โฒ)๐ โ๐ = ๐ป(๐)๐ โ๐.
4.3 Haar integration formula for degree ๐ = 1Theorem 4.2. Set ๐ป = C๐ and let d๐(๐ฃ) and d๐(๐) denote the Haar measures on S๐โ1
and U(๐), respectively. Then,
๐ป(1) =โซ
U(๐)๐๐ฃ0๐ฃ*
0๐* d๐(๐) = 1๐
I.
Proof. Lemma 4.1 implies that ๐ป(1) must obey ๐๐ป(1)๐* = ๐ป(1) for any ๐ โ U(๐).In other words: ๐ป(1) must have the same matrix representation for any choice of basis.There is only one operator with this property โ the identity I โ โ(๐ป). The pre-factor๐โ1 follows from taking the trace:
tr(๐ป(1)) =โซ
U(๐)tr(๐๐ฃ0๐ฃ*
0๐*)d๐(๐) = โจ๐ฃ0, ๐ฃ0โฉโซ
U(๐)d๐(๐) = 1.
This closed-form expression already allows us to compute integrals of doubly homo-geneous polynomials of degree one:
โซ
S๐โ1๐๐ด(๐ฃ, ๐ฃ) d๐(๐ฃ) = tr
(๐ด๐ป(1)
)= tr(๐ด)
๐.
6
The two example polynomials from the beginning of this lecture fall into this category:
๐1(๐ฅ, ๏ฟฝ๏ฟฝ) = ๐๐ด1(๐ฅ, ๏ฟฝ๏ฟฝ) = tr(๐ด1 ๐ฅ ๐ฅ*) with ๐ด1 = I =(
1 00 1
), (9)
๐2(๐ฅ, ๏ฟฝ๏ฟฝ) = ๐๐ด2(๐ฅ, ๏ฟฝ๏ฟฝ) = tr(๐ด2 ๐ฅ ๐ฅ*) with ๐ด2 =(
0 11 0
). (10)
Theorem 4.2 now allows us to quickly compute the associated integrals. Set ๐ = 2 andcompute
โซ
S1๐1(๐ฃ, ๐ฃ) d๐(๐ฃ) =tr(๐ด1)
2 = 1,
โซ
S1๐2(๐ฃ, ๐ฃ) d๐(๐ฃ) =tr(๐ด2)
2 = 0.
4.4 Haar integration formula for arbitrary degreeThe approach from the previous subsection can be extended to establish closed formexpressions for ๐ป(๐) with ๐ โฅ 2.
Theorem 4.3 (Haar integration formula). Set ๐ป = C๐ and let d๐(๐ฃ) and d๐(๐) denotethe Haar measures on S๐โ1 โ ๐ป and U(๐), respectively. Then, for any ๐ โฅ 2
๐ป(๐) =โซ
S๐โ1(๐ฃ๐ฃ*)โ๐ d๐(๐ฃ) =
โซ
U(๐)(๐๐ฃ0๐ฃ*
0๐*)โ๐ d๐(๐) =(
๐ + ๐ โ 1๐
)โ1
๐โจ๐ .
Here, ๐โจ๐ = 1๐!โ
๐โ๐ฎ๐๐๐ โ โ(๐ปโ๐) denotes the projector onto the totally symmetric
subspace and(๐+๐โ1
๐
)is the dimension of its range โ๐ โ ๐ปโ๐.
The proof of this statement is based on deep results from algebra (representationtheory). Recall that we can identify permutations ๐ โ ๐ฎ๐ with operators that permutetensor factors:
๐๐ : ๐ฅ1 โ ยท ยท ยท โ ๐ฅ๐ โฆโ ๐ฅ๐โ1(1) โ ยท ยท ยท โ ๐ฅ๐โ1(๐)
and linearly extended to all of ๐ปโ๐. These operators are unitary and respect the groupcomposition rule (see HW I):
๐ *๐ = ๐๐โ1 = ๐ โ1๐ and ๐๐๐๐ = ๐๐โ๐ for all ๐, ๐ โ ๐ฎ๐.
This means that the ๐๐ form a unitary representation of the symmetric group ๐ฎ๐ โ anice, finite group โ on the โrepresentation spaceโ ๐ปโ๐.
In a similar fashion, the map ๐ โฆโ ๐โ๐ forms a unitary representation of U(๐) โ anice Lie group โ on ๐ปโ๐. Crucially, Lemma 4.1 ensures that these two representationsalways commute:
[๐โ๐, ๐๐
]= ๐โ๐๐๐ โ ๐๐๐โ๐ = 0 for all ๐ โ ๐ฎ๐, ๐ โ U(๐).
7
This commutation relation has profound consequences. A deep result from algebra(representation theory) states that every matrix that commutes with all ๐โ๐ must bea linear combination of permutation operators, and vice versa. More precisely: thepermutation operators generate the commutant of
{๐โ๐ : ๐ โ U(๐)
}and vice versa.
We refer to Matthias Christandlโs PhD thesis for a short, detailed and insightful analysisof these properties and borrow the following result:
Theorem 4.4. Let ๐ โ โ(๐ปโ๐) be an operator that commutes with any unitary of theform ๐โ๐. Then, ๐ must be a linear combination of permutation operators:
๐ =โ
๐โ๐ฎ๐
๐๐๐๐.
A proof of this claim would go beyond the scope of this lecture. It follows from thedouble commutant theorem and exploiting both the finite group structure of ๐ฎ๐ and thenice Lie group structure of U(๐). This double-commutant theorem allows us to readilydeduce the Haar integration formula.
Proof of Theorem 4.3. Lemma 4.1 implies that ๐ป(๐) must commute with every tensorproduct unitary ๐โ๐. Theorem 4.4 then ensures that this operator must be a linearcombination of permutations:
๐ =โ
๐โ๐ฎ๐
๐๐๐๐. (11)
Next, note that there is additional symmetry present: ๐ป(๐) is also invariant under anypermutations. Fix ๐ โ ๐ฎ๐ and observe,
๐๐๐ป(๐) =โซ
S๐โ1๐๐(๐ฃ๐ฃ*)โ๐ d๐(๐ฃ) =
โซ
S๐โ1
(๐๐๐ฃโ๐
)(๐ฃ*)โ๐ d๐(๐ฃ)
=โซ
S๐โ1(๐ฃ๐ฃ*)โ๐ d๐(๐ฃ) = ๐ป(๐).
This invariance is only possible if all the expansion coefficients in Eq. (11) are the same:
๐ป(1) =โ
๐โ๐ฎ๐
๐๐๐ = ๐โ
๐โ๐ฎ๐
๐๐ = ๐ ๐โจ๐ .
Finally, take the trace on both sides to specify this constant:
๐
(๐ + ๐ โ 1
๐
)= tr(๐ ๐โจ๐) = tr
(๐ป(๐)
)= 1.
4.5 Closed form expressions for integrating homogeneous polynomials
Corollary 4.5. Let ๐๐ด(๐ฅ, ๏ฟฝ๏ฟฝ) = tr(๐ด ๐ฅ๐ฅ*) be an arbitrary polynomial in Hom(๐)(๐ฅ, ๏ฟฝ๏ฟฝ).Let d๐(๐ฃ) be the Haar measure on the complex unit sphere S๐โ1 โ C๐. Then,
โซ
S๐โ1๐๐ด(๐ฃ, ๐ฃ) d๐(๐ฃ) =
(๐ + ๐ โ 1
๐
)โ1
tr(๐ด ๐โจ๐).
8
Closed form expressions for such integration formulas have a variety of applications.We will discuss several of them in future lectures. For now, we content ourselves withintegrating the squares of the homogeneous polynomials defined in Eq. (2):
๐๐(๐ฅ, ๏ฟฝ๏ฟฝ)2 = tr(๐ด๐ ๐ฅ๐ฅ*)2 = tr(๐ดโ2
๐ (๐ฅ๐ฅ*)โ2).
Set ๐ = 2 and apply Corollary 4.5 to concludeโซ
S1๐๐(๐ฃ, ๐ฃ) d๐(๐ฃ) = tr
(๐ดโ2
๐ ๐ป(2))
= 13 tr
(๐ดโ2
๐ ๐โจ2
),
because(๐+1
2)
= 32 = 3. Next, use ๐โจ2 = 1
2(Iโ2 + ๐น
), where ๐น : ๐ฅ โ ๐ฆ โฆโ ๐ฆ โ ๐ฅ denotes
the flip operator. This ensures,
tr(๐ดโ2
๐ ๐โจ2
)=1
2(tr(๐ด๐Iโ2
)+ tr
(๐ดโ2
๐ ๐น))
= 12(tr(๐ด๐)2 + tr
(๐ด2
๐
)),
and we can insert the operator expressions from Equations (9) and (10) to get concretenumbers:
โซ
S1๐1(๐ฃ, ๐ฃ)2 d๐(๐ฃ) = 1
6(tr(๐ด1)2 + tr
(๐ด2
1))
= 16(22 + 2
)= 1,
โซ
S1๐2(๐ฃ, ๐ฃ)2 d๐(๐ฃ) = 1
6(tr(๐ด2)2 + tr
(๐ด2
2))
= 16(0 + 2) = 1
3 .
Lecture 06: Entanglement is ubiquitous
Scribe: Jiajie Chen
ACM 270-1, Spring 2019Richard Kueng & Joel TroppApril 17, 2019
1 Agenda1. Pure states and entanglement2. Almost all pure states are entangled3. Proof:
(a) Concentration(b) Discretization(c) Union bound
2 Pure quantum states and entanglement2.1 Pure states
Fix ๐ป = C๐ and endow it with the standard inner product. Recall that quantummechanical systems are described by density matrices:
๐ โ S(๐ป) = {๐ โ โ(๐ป) : ๐* = ๐, ๐ โชฐ 0, (I, ๐) = 1}.
Density matrices are the SDP-generalization of ๐-variate probability vectors:
๐ โ ฮ๐โ1 ={
๐ฅ โ R๐ : ๐ฅ โฅ 0, โจ1, ๐ฅโฉ = 1}
.
Pure probability vectors correspond to extreme points of this convex set: ๐ = ๐๐
for 1 โค ๐ โค ๐. These represent deterministic distributions. Every probability vectorcorresponds to a convex mixture of these extreme distributions.
The natural quantum analogue is captured by the following definition.
Definition 2.1 (pure state). A density matrix ๐ โ ๐ฎ(๐ป) is pure if and only if rank(๐) = 1,or equivalently: ๐ = ๐ข๐ข* for ๐ข โ ๐ป obeying โจ๐ข, ๐ขโฉ = 1. We call such density matricespure states.
Let S(๐ป) denote the complex unit sphere in ๐ป = C๐:
S(๐ป) = {๐ฅ โ ๐ป : โจ๐ฅ, ๐ฅโฉ = 1} โ ๐ป.
The following properties assert that pure states really mimic pure (deterministic)probability vectors:
โ Pure states form the boundary of the convex set ๐ฎ(๐ป).
2
โ Every density matrix corresponds to a convex mixture of pure states:
S(๐ป) = conv{๐ข๐ข* : ๐ข โ S(๐ป)}
โ Every density matrix corresponds to the marginalization of a larger pure state:
๐ = tr2(๐ข๐ข*) for some ๐ข โ S(๐ปโ2
).
Proofs of these claims are part of Exercise I. In summary: Pure states are the โmostextremeโ density matrices. Most quantum phenomena+tricks assume their โpurestโform for pure states. Their extension to general density matrices is then achieved byconvex mixtures. These tend to obfuscate the original properties. An extreme exampleis the maximally mixed state:
๐0 =โซ
โจ๐ข,๐ขโฉ=1๐ข๐ข*d๐(๐ข) = 1
๐I.
This is the โmost uselessโ state conceivable. The outcome of any quantum measurement{๐ป๐ : ๐ โ ๐ด} is maximally random:
Pr[๐|๐0] = (๐ป๐, ๐0) = tr(๐ป๐)๐
for all ๐ โ ๐ด.
Such outcome probabilities can typically be โsimulatedโ by tossing conventional coins.2.2 Entanglement for pure statesSuppose that a quantum mechanical system is comprised of two smaller system withdimensions ๐1 and ๐2, respectively. Set ๐ป1 = C๐1 and ๐ป2 = C๐2 . Then, the jointquantum state is an element of
๐ โ S(๐ป1 โ ๐ป2) โ โ(๐ป1) โ โ(๐ป2) โ โ(๐ป1 โ ๐ป2).
Pure states assume the following form:
๐joint = ๐ข๐ข* for ๐ข โ S(๐ป1 โ ๐ป2) โ S(๐ป),
where ๐ป = ๐ป1 โ ๐ป2 โ C๐ท with ๐ท = ๐1๐2.Recall that there are three possibilities for joint quantum states:
1. Product states: ๐joint = ๐1 โ ๐2 with ๐1 โ S(๐ป1) and ๐2 โ S(๐ป2). These behavelike independent distributions.
2. Separable states: ๐joint โ S(๐ป)1)โS(๐ป2) = conv(๐1 โ ๐2 : ๐1 โ S(๐ป1), ๐2 โ S(๐ป2)}.These correspond to convex mixtures of product states. In classical probabilitytheory, these convex mixtures reach โeverythingโ.
3. Entangled states: everything that is not separable.
Lemma 2.2. A joint pure state ๐joint = ๐ข๐ข* is separable if and only if it is a tensorproduct of pure states: ๐joint = ๐๐* โ ๐๐*, ๐ โ S(๐ป1), ๐ โ S(๐ป2).
3
Proof. It is clear that tensor products of pure states are pure product states: ๐๐*โ๐๐* โ(๐ โ ๐)(๐ โ ๐)*. These are the only pure product states โ the rank constraint is verystringent. Indeed, convex mixtures of pure product states necessarily increase the rank โdensity matrices are psd. So, the intersection of the set of all separable states with the(non-convex) set of all joint pure states returns the (non-convex) set of all pure productstates.
3 Almost all pure states are entangledA well-known result in quantum states that โalmost all states are maximally entangledโ.The typical argument is as follows: Remember our candidate for a entangled state fromLecture 3 (๐ป1 โ ๐ป2):
ฮฉ = 1๐ท
vec(I)vec(I)* = 1๐ท
,
This state is entangled and has the following interesting property:
tr1(ฮฉ) = tr2(ฮฉ) = 1๐
I = ๐0.
Although the joint state is pure, marginalizations produce the maximally mixed state(โgarbageโ). This is indicative of a very strong correlation in the joint system. Purestates with this property are called maximally entangled. Now, suppose that we choose๐ข โ S(๐ป1 โ ๐ป2) uniformly from the Haar measure. Then,
โtr2(๐ข๐ข*) โ 1๐1
Iโ1
is small with exceedingly high probability. Proving this will be part of Exercise II. Sincethe Haar-measure is fair in the sense that it assigns the same infinitesimal weight to anypure state, we can conclude the following quantitative statement: the marginalizationof almost every pure state results is very close to the maximally mixed state.
Remark 3.1. The trace distance is a natural metric for quantifying deviations amongdensity matrices. Helstromโs theorem (Lecture I) assigns an operational meaning to thisquantity: it is proportional to the optimal bias achievable when trying to distinguishthe two states in question with a single measurement.
Today, I want to derive a different statement that points in a similar direction.Almost every joint pure state is very far away from any product state.
Theorem 3.2. Set ๐ป = ๐ป1 โ ๐ป2 (dim(๐ป1) = ๐1, dim(๐ป2) = ๐2, dim(๐ป) = ๐1๐2).Choose ๐ข โ S(๐ป) uniformly from the complex unit sphere (Haar random). Then,
Pr[
inf๐โS(๐ป1),๐โS(๐ป2)
โ๐ข๐ข* โ ๐๐* โ ๐๐*โ1 โค 1]
โค 2 exp(
4.5(๐1 + ๐2) โ 7๐1๐232
)
We will use this statement to introduce a very powerful proof technique by example.It is based on three fundamental steps:
4
โ Concentration of individual problem instances (Haar-randomness)โ Discretization: Find covering nets for ๐ฎ(๐ป1) and ๐ฎ(๐ป2) and combine them to get
a covrering net for all pure product sates in S(๐ป). Finite covering nets allow usto move from controlling an infimum to controlling a minimum.
โ Apply a union bound to the (discretized) minimum and use Haar-concentrationof each instance to counter-balance the number of different instances.
Although not optimal (in terms of constants) and perhaps cumbersome, it is veryversatile and can be applied to a variety of problems. Today, we employ it to show thatalmost every pure state is entangled. Next week, we will employ it to show that almostevery pure state is useless for quantum computation. Other examples include:
1. (Quantum): Every quantum channel admits an accurate approximation (โsketchโ)that has low Kraus rank (Lancien and Winter)
2. (Quantum): Almost every quantum state (unitary channel) has high circuitโcomplexityโ (i.e. takes a long time to generate).
3. (Classical): Randomess in the measurement design allows for recovering sparsevectors and low-rank matrices efficiently from very few measurements.
4. (Classical): control the maximum eigenvalue of a random matrix.
4 Proof of Theorem 3.24.1 Preliminaries
Theorem 4.1 (Markovโs inequality). Let ๐ โ R be a non-negative random variable.Then, for any ๐ผ > 0
Pr[๐ โฅ ๐ผ] โค E[๐]๐ผ
.
Theorem 4.2 (Union bound, Booleโs inequality). Let ๐ธ1, . . . , ๐ธ๐ be events. Then,
Pr[
๐โ
๐=1๐ธ๐
]โค
๐โ
๐=1Pr[๐ธ๐].
In particular, for scalar random variables ๐1, . . . , ๐๐ โ R we have
Pr[
max1โค๐โค๐
๐๐ โฅ ๐ผ
]โค ๐ max
1โค๐โค๐Pr[๐๐ โฅ ๐ผ].
Theorem 4.3 (Haar integration tensor). Let ๐(๐ข) denote the unitarily invariant Haarmeasure on S(๐ป), dim(๐ป) = ๐ท. Then, for any ๐ โ N:
โซ
S(๐ป)(๐ข๐ข*)โ๐d๐(๐ข) =
(๐ท + ๐ โ 1
๐
)โ1
๐โจ๐ . (1)
5
Finally, we will need the concept of a covering net for the complex unit spheres S(๐ป1)and S(๐ป2). A covering net of fineness ๐ > 0 is a finite set of points {๐ง๐}๐
๐=1 โ S(๐ป๐)that evenly covers the sphere: For every ๐ฃ โ S(๐ป๐), there is a net element ๐ง๐ that is (atleast) ๐-close in Euclidean distance:
โ๐ฃ โ ๐ง๐โโ2 โค ๐.
Theorem 4.4 (Existence of covering nets). The complex unit sphere S(๐ป) in ๐ = dim(๐ป)dimensions admits a ๐-covering net of cardinality
๐ โค(
1 + 2๐
)2๐
.
The proof follows from embedding the complex unit sphere into a real-valued unitsphere of dimension 2๐ (isometry) and applying a volumetric counting argument: coverthe big sphere with many small balls.4.2 Step I: ConcentrationProposition 4.5. Suppose that ๐ท = dim(๐ป) and fix ๐ฃ โ S(๐ป). Choose ๐ข โ S(๐ป)uniformly from the Haar measure. Then, for any 0 < ๐ < 2
Pr[โ๐ข๐ข* โ ๐ฃ๐ฃ*โ1 โค ๐] โค 2eโ๐ท(1โ๐2/4)/2
This strong probabilistic concentration forms the basis of the entire argument.Identify a single instance of the larger problem. Then, apply the Haar integrationformula to show that a random vector ๐ข avoids this instance with exponentially largeprobability.
Remark 4.6. Randomness is a misleading term when describing Haar-uniform vectors.They avoid fixed, concrete events in a highly predictable fashion. Sometimes this isconstructive (here: we want to show that most states are entangled), sometimes this isdestructive (future lecture: Haar-random states are useless for quantum computation).
The proof is based on two steps.
Lemma 4.7 (Reformulation). Fix ๐ข, ๐ฃ โ S(๐ป). Then,
โ๐ข๐ข* โ ๐ฃ๐ฃ*โ1 = 2โ
1 โ |โจ๐ข, ๐ฃโฉ|2.
In particular, โ๐ข๐ข* โ ๐ฃ๐ฃ*โ1 โค ๐ if and only if |โจ๐ฃ, ๐ขโฉ|2 โฅ 1 โ ๐2/4
Proof. Set ๐ = ๐ข๐ข* โ ๐ฃ๐ฃ*. This is a hermitian, traceless matrix with rank at mosttwo. The trace norm collects the absolute values of the eigenvalues: โ๐โ1 = |๐1| + |๐2|.A vanishing trace demands ๐1 + ๐2 = tr(๐) = 0, or equivalently ๐1,2 = ยฑ๐. Next,
2๐2 = ๐21 + ๐2
2 = tr(๐) = 2(1 โ tr(๐ข๐ข* ๐ฃ๐ฃ*))
and we conclude ๐ =โ
1 โ |โจ๐ข, ๐ฃโฉ|2.
6
Proposition 4.8. Fix ๐ฃ โ S(๐ป) and choose ๐ข โ S(๐ป) according to the Haar measured๐(๐ข). Then, for any ๐ > 0
Pr[|โจ๐ฃ, ๐ขโฉ|2 โฅ ๐
]โค 2eโ๐ท๐/2.
Proof. Define the non-negative, scalar random variable ๐๐ฃ(๐ข) = |โจ๐ฃ, ๐ขโฉ|2. The Haarintegration formula (1) allows us to compute arbitrary moments: For any ๐ โ N
E[๐๐ฃ(๐ข)๐
]=E[|โจ๐ฃ, ๐ขโฉ|2๐
]= E
[tr(๐ฃ๐ฃ*๐ข๐ข*)๐
]= E
[tr((๐ฃ๐ฃ*)โ๐ (๐ข๐ข*)โ๐
)]
=tr(
(๐ฃ๐ฃ*)โ๐โซ
S(๐ป)(๐ข๐ข*)โ๐d๐(๐ข)
)=(
๐ท + ๐ โ 1๐
)โ1
tr((๐ฃ๐ฃ*)โ๐๐โจ๐
)
=(
๐ท + ๐ โ 1๐
)โ1
= ๐!(๐ท + ๐ โ 1) ยท ยท ยท (๐ท + 1)๐ท โค ๐!
๐ท๐.
This moment growth indicates sub-exponential tail behavior at a scale proportional to1/๐ท. More precisely, choose ๐ > 0 and observe
Pr[๐๐ฃ(๐ข) โฅ ๐ ] =Pr[๐ท๐๐ฃ(๐ข)/2 โฅ ๐ท๐/2] = Pr[exp(๐ท๐๐ฃ(๐ข)/2) โฅ exp(๐ท๐/2)].
Next, apply Markovโs inequality and expand the exponential in a Taylor series:
Pr[exp(๐ท๐๐ฃ(๐ข)/2) โฅ exp(๐ท๐/2)] โคeโ๐ท๐/2E[e๐ท๐๐ฃ(๐ข)/2
]
=eโ๐ท๐/2โโ
๐=0
1๐!
๐ท๐
2๐E[๐๐ฃ(๐ข)๐
]
โคeโ๐ท๐/2โโ
๐=0
12๐
= 2eโ๐ท๐/2.
Combining both statements readily yields Proposition 4.5.4.3 Step II: DiscretizationLet us now take into account the bi-partite structure: dim(๐ป1) = ๐1, dim(๐ป2) = ๐2and ๐ป = ๐ป1 โ ๐ป2 has dimension ๐ท = ๐1๐2.
Let us now choose an arbitrary fixed product state ๐ฃ๐ฃ* = ๐๐* โ ๐๐*. Concentrationโ Proposition 4.5 โ ensures that a Haar-random joint state ๐ข๐ข* will be very far awayfrom this reference state:
Pr[โ๐ข๐ข* โ ๐๐* โ ๐๐*โ1 โค ๐] โค 2eโ๐ท๐/2,
where ๐ = 1 โ ๐2/4 โ (0, 1). The probability of being close is exponentially supressed.What is more, the exponent features ๐ท = ๐1๐2 โ the dimension of ๐ป1 โ ๐ป2.
Intuitively, the set of all possible product space should have a much smaller dimension:it is the tensor product of two unit spheres in ๐1 and ๐2 dimensions each. The notion
7
of covrering nets allows for quantifying this intuition. Fix ๐ > 0 and endow S(๐ป1) andS2(๐ป2) with two covering nets:
{๐ฆ๐}๐1๐=1 โ S(๐ป1) and {๐ง๐}๐2
๐=1 โ S(๐ป2).
It should not come as a surprise that all possible net product states
๐ฉjoint ={
๐ฆ๐๐ฆ*๐ โ ๐ง๐๐ง*
๐ : 1 โค ๐ โค ๐1, 1 โค ๐ โค ๐2}
โ S(๐ป1)โS(๐ป2) โฉ S(๐ป1 โ ๐ป2).
provide an accurate discretization of the set of all product states.
Lemma 4.9. Fix an arbitrary product state ๐๐* โ ๐๐* with ๐ โ S(๐ป1) and ๐ โ S(๐ป2).Then, there is an element ๐ฆ๐๐ฆ
*๐ โ ๐ง๐๐ง
*๐ of the joint net ๐ฉjoint that obeys
โ๐๐* โ ๐๐* โ ๐ฆ๐ฆ* โ ๐ง๐ง*โ1 โค 2๐.
Moreover,
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โฅ min
๐ฃ๐ฃ*โ๐ฉ๐
โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โ 2๐.
The second statement achieves our second goal: discretization. The infimum overall (infinitely many) product states is well approximated by a finite infimum over statesin the net. The cardinality of this net obeys
|๐ฉ๐| = ๐1๐2 โค(
1 + 2๐
)๐1+๐2.
This number scales exponentially in ๐1 + ๐2, not ๐ท = ๐1๐2.
Proof. The covering net assumption ensures that there exist net elements ๐ฆ and ๐ง suchthat
๐2 โฅโ๐ โ ๐ฆโ22 = 2(1 โ Re(โจ๐, ๐ฆโฉ)) โฅ 2(1 โ |โจ๐, ๐ฆโฉ|),
๐2 โฅโ๐ โ ๐งโ22 = 2(1 โ Re(โจ๐, ๐งโฉ)) โฅ 2(1 โ |โจ๐, ๐ฆโฉ|).
Also, for any ๐ฅ, ๐ฆ โ [0, 1]โ
1 โ ๐ฅ2๐ฆ2 โคโ
1 โ ๐ฅ2 =โ
(1 โ ๐ฅ)(1 + ๐ฅ) โคโ
2(1 โ ๐ฅ).
Therefore,
โ๐๐* โ ๐๐* โ ๐ฆ๐ฆ* โ ๐ง๐ง*โ1 =2โ
1 โ |โจ๐ โ ๐, ๐ฆ โ ๐งโฉ|2
=2โ
1 โ |โจ๐, ๐ฆโฉ|2|โจ๐, ๐งโฉ|2
โค2โ
1 โ |โจ๐, ๐ฆโฉ|2 โค 23/2โ
1 โ |โจ๐, ๐ฆโฉ|โค2๐.
8
4.4 Step III: Union boundProposition 4.10. Let ๐ฉ๐ be the net of pure states. Fix 0 < ๐ < 2. Then,
Pr[
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐
]โค |๐ฉ๐| max
๐ฃ๐ฃ*โ๐ฉ๐
Pr[โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐].
This is the final trick. We replace a infimum over infinitely many points by aminimization over finitely many points. Subsequently, we can apply the union bound topull out the minimization. The extra cost we incur is the cardinality of the productstate net:
|๐ฉ๐| โค(
1 + 2๐
)2(๐1+๐2)
Proof. Fix ๐ข โ S(๐ป). Then
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ implies min
๐ฃ๐ฃ*โ๐ฉ๐
โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐,
according to Lemma 4.9. The converse direction must not hold, however. Viewed asevents, the right hand side therefore occurs with a larger probability. Importantly, theminimization is over a finite set of net states. Therefore, we may apply the union bound:
Pr[
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐
]โคPr
[min
๐ฃ๐ฃ*โ๐ฉ๐
โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐
]
=Pr
โกโฃ โ
๐ฃ๐ฃ*โ๐ฉ๐
{โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐}โคโฆ
โคโ
๐ฃ๐ฃ*โ๐ฉ๐
Pr[โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐]
โค|๐ฉ๐| max๐ฃ๐ฃ*โ๐ฉ๐
Pr[โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐].
5 Proof of Theorem 3.2Simply combine Lemma 4.10 with exponentially strong concentration for every netelement:
Pr[
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐
]โค |๐ฉ๐| max
๐ฃ๐ฃ*โ๐ฉ๐
Pr[โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐ + 2๐]
โค |๐ฉ๐|2eโ๐ท(1โ(๐+2๐)2/4)/2.
Finally, recall ๐ท = ๐1๐2 and use the volumetric bound on the cardinality of ๐ฉ๐:
Pr[
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค ๐
]โค 2
(1 + 2
๐
)2(๐1+๐2)eโ๐ท(1โ(๐+2๐)2/4)/2.
We could now optimize over the fineness ๐ of the net.
9
The naive choice ๐ = 1/4 suffices for our purpose. Furthermore, specifying ๐ = 1yields
Pr[
infproduct state ๐ฃ๐ฃ*โ๐ฃ๐ฃ* โ ๐ข๐ข*โ1 โค 1
]โค 292(๐1+๐2)eโ7๐ท/32 โค 2 exp
(4.5(๐1 + ๐2) โ 7๐1๐2
32
),
as advertised in Theorem 3.2.
Lecture 07: Classical reversible circuits
Scribe: Thom Bohdanowicz
ACM 270-1, Spring 2019Richard Kueng & Joel TroppApril 22, 2019
1 Agenda1. Evolution of classical probability distributions2. Classical reversible circuits
(a) Bit strings and tensor products(b) logical operations and permutation matrices(c) circuit diagrams
2 Evolution of classical probability distributions2.1 Recapitulation of classical probability distributionsThe set of all ๐-variate probability distributions is the simplex:
ฮ๐โ1 ={
๐ฅ โ R๐ : ๐ฅ โฅ 0, โจ1, ๐ฅโฉ = 1}
โ R๐.
It has ๐ extreme points, namely the standard basis vectors: ๐1, . . . , ๐๐. The entiresimplex can be reached by convex mixtures of these extreme points:
ฮ๐โ1 = conv{๐1, . . . , ๐๐} ={
๐โ
๐=1๐๐๐๐ : ๐๐ โฅ 0,
๐โ
๐=1๐๐ = 1
}
Another distinguished point of ฮ๐โ1 is its the (bary-) center:
๐ = 1๐
๐โ
๐=1๐๐ = 1
๐1.
This corresponds to the flat (maximally random) distribution of ๐ events.
2.2 Elementary transformationsWe now address the question of evolution of probability distributions. More precisely, weare looking for linear transformations that map probability distributions onto probabilitydistributions.
Example 2.1 (Reset). Fix ๐ โ ฮ๐โ1 and define ๐ด = ๐1๐ โ โ(๐ป). Then, for any๐ โ ฮ๐โ1:
๐ด๐ = ๐โจ1, ๐โฉ = ๐.
This evolution resets arbitrary distributions ๐ back to some fixed distribution ๐.
2
Although valid, resets behave in a rather peculiar fashion. They do not dependon the input at all. In the following we shall restrict our attention to less invasiveevolutions.
Definition 2.2. A map ๐ด โ โ(๐ป) is unital if the flat distribution is a fix-point: ๐ด1 = 1.
Unital transformations are linear and preserve the barycenter of the simplex. Inorder to fully preserve the geometric structure of the simplex, a unital transformationmust also obey the following three properties:
1. Non-negativity: ๐ด๐๐ โฅ 0. Every entry of ๐ด (with respect to the standard basis)must be non-negative. Otherwise we could identify a test distribution ๐ โ ฮ๐โ1that gets mapped onto a vector that has negative entries.
2. Rows must sum to one: โ๐๐=1[๐ด]๐,๐ = 1 for all 1 โค ๐ โค ๐. This is a consequence
of unitality:โโโ
1...1
โโโ = 1 = ๐ด1 =
โโโโ
โ๐๐=1 ๐ด1๐
...โ๐๐=1 ๐ด๐๐
โโโโ
3. Columns must sum to one: โ๐๐=1 ๐ด๐๐ = 1 for all 1 โค ๐ โค ๐. This is a consequence
of the normalization constraint โจ๐, 1โฉ = 1 for all ๐ โ ฮ๐โ1. Suppose ๐ = ๐ด๐for some ๐ โ ฮ๐โ1. Then, 1 = โจ1, ๐โฉ = โจ1, ๐ด๐โฉ = โจ๐ด๐ 1, ๐. Validity of thisnormalization for any ๐ โ ฮ๐1 enforces ๐ด๐ 1 = 1. Point 2 implies that this isequivalent to demanding that the rows of ๐ด๐ (i.e. the columns of ๐ด) must sumto one.
Operators ๐ด โ โ(๐ป) with these three properties correspond to doubly stochasticmatrices.
Fact 2.3. Every unital map ๐ด that obeys ๐ดฮ๐โ1 โ ฮ๐โ1 is described by a doublystochastic matrix.
It is easy to check that the set of all doubly-stochastic matrices is convex. Thisconvex set is called the Birkhoff polytope.
A permutation matrix is an orthogonal matrix ฮ โ โ(๐ป) such that every row andevery column contain exactly one entry of one. All other entries are zero.
Example 2.4 (Permutation matrices for ๐ = 3).โโโ
1 0 00 1 00 0 1
โโโ ,
โโโ
0 1 01 0 00 0 1
โโโ ,
โโโ
0 0 10 1 01 0 0
โโโ ,
โโโ
1 0 00 0 10 1 0
โโโ ,
โโโ
0 1 00 0 11 0 0
โโโ ,
โโโ
0 0 11 0 00 1 0
โโโ .
Permutation matrices are in one-to-one correspondence with permutations of thestandard basis vectors. For instance, the second matrix in the example permutes the
3
first two standard basis vectors, while leaving the third one invariant:โโโ
0 1 01 0 00 0 1
โโโ ๐1 = ๐2,
โโโ
0 1 01 0 00 0 1
โโโ ๐2 = ๐1,
โโโ
0 1 01 0 00 0 1
โโโ ๐3 = ๐3.
It is easy to check that every permutation matrix is doubly stochastic. What is more,permutation matrices seem to correspond to โextremeโ versions of doubly stochasticmatrices. Most of the entries are zero and therefore exactly saturate the non-negativityconstraint [๐ด๐๐ ] โฅ 0. The Birkhoff-von Neumann theorem makes this intuition precise.
Theorem 2.5 (Birkhoff von-Neumann). The set of doubly stochastic matrices is a convexpolytope. Its extreme points correspond to permutation matrices.
In other words: every doubly stochastic matrix is a convex mixture of permuta-tion matrices. This has profound implications for the study of (unital) evolutions ofprobability distributions.
Corollary 2.6 (Full characterization of unitary maps that preserve the simplex). Everyunital map ๐ด that obeys ๐ดฮ๐โ1 โ ฮ๐โ1 is a convex mixture of permutation matrices.
Permutation matrices are extreme unital transformations. They simply permute theset of extreme points of ฮ๐1 :
ฮ : ๐1, . . . , ๐๐ โฆโ ๐๐(1), . . . , ๐๐(๐).
In turn, they leave the simplex invariant:
ฮ ฮ๐โ1 = conv{ฮ ๐1, . . . , ๐๐} = conv{
๐๐(1), . . . , ๐๐(๐)}
= ฮ๐โ1.
More general untital evolutions (convex mixtures) shrink the simplex. This geometricobservation may be viewed as a starting point for the beautiful theory of majorization.
Remark 2.7. For the sake of simplicity, we have restricted our attention to untial mapsfrom ฮ๐โ1 to itself. This restriction is not necessary. Similar arguments allow for char-acterizing unital evolutions that change the dimension (number of potential outcomes):๐ด : ฮ๐โ1 โ ฮ๐โฒโ1 with ๐โฒ = ๐.
3 Classical reversible circuits3.1 Bit-strings and the extended standard basisThere is a deep connection between unital evolutions of probability distributions andclassical, reversible computation. To make this correspondence as explicit as possible,we introduce the following notation. Fix ๐ = 2 (bi-variate distributions) and identifythe standard basis with either the zero-bit, or the one-bit:
0 โผ ๐0 =(
10
), 1 โผ ๐1 =
(01
).
4
This establishes a connection between {0, 1} = Z2 (bit-land) and {๐0, ๐1} โ ฮ1 โ R2
(deterministic probability distributions).We can use tensor products to extend this identification to bit strings of length ๐:
(๐ฅ1 ยท ยท ยท ๐ฅ๐) โ {0, 1}๐ โผ ๐๐ฅ1 โ ยท ยท ยท โ ๐๐ฅ๐ โ(R2)โ๐
.
We identify length-๐ bit strings with the labels of the extended standard basis of(R2)โ๐.
These extended standard basis vectors form the extreme points of a simplex in 2๐
dimensions:
ฮ2๐โ1 =conv{๐๐ฅ1 โ ยท ยท ยท โ ๐๐ฅ๐: (๐ฅ1 ยท ยท ยท ๐ฅ๐) โ {0, 1}๐} (1)
โ{
๐ฅ โ(R2)โ๐
: โจ๐ฅ, 1โฉ = 1, ๐ฅ โฅ 0}
.
Here, the sign โโโ denotes equivalence up to isomorphisms. The above relation holdstrue with equality if we identify the standard basis of R2๐ with the extended standardbasis of
(R2)โ๐.
3.2 Permutation matrices in (R2)โ๐ and logical operations on bit strings
It is highly instructive to consider the symmetry group of the simplex ฮ2๐โ1 โ (R2)โ๐
defined in (1).3.2.1 The permutation matrix associated with negation (๐ = 1)
For ๐ = 1, there are only two permutations. The identity ฮ = I โ โ(R2) andtransposition. The former leaves standard basis vectors โ and their associated bits โinvariant, while transposition permutes them:
๐ (0) โผ๐ ๐0 =(
0 11 0
)๐0 = ๐1 โผ 1,
๐ (1) โผ๐ ๐1 =(
0 11 0
)๐1 = ๐0 โผ 1.
On the level of bits, we can associate this transformation with the following truth table:
๐ฅ โ {0, 1} ๐ (๐ฅ)0 11 0
.
Hence, the logical operation associated with transposition ๐ โ โ(R2) is negation:
๐ (๐ฅ) = ยฌ๐ฅ for ๐ฅ โ {0, 1}.
3.2.2 The permutation matrix associated with XOR (๐ = 2)
For ๐ = 2,(R2)โ2 โ R4 is accompanied by in total 4! = 24 permutation matrices.
Some of them arise from tensor products of permutation matrices acting on R2 only.
5
Concrete examples are the identity (do nothing) and all possible combinations of singlebit negations:
๐ผ(๐ฅ1๐ฅ2) โผI โ I๐๐ฅ1 โ ๐๐ฅ2 = ๐๐ฅ1 โ ๐๐ฅ2 = ๐ฅ1๐ฅ2,
๐1(๐ฅ1๐ฅ2) โผ๐ โ I๐๐ฅ1 โ ๐๐ฅ2 = ๐ยฌ๐ฅ1 โ ๐๐ฅ2 โผ ยฌ๐ฅ1๐ฅ2,
๐2(๐ฅ1๐ฅ2) โผI โ ๐ ๐๐ฅ1 โ ๐๐ฅ2 = ๐๐ฅ1 โ ๐ยฌ๐ฅ2 โผ ๐ฅ1ยฌ๐ฅ2,
๐1,2(๐ฅ1๐ฅ2) โผ๐ โ ๐ ๐๐ฅ1 โ ๐๐ฅ2 = ๐ยฌ๐ฅ1 โ ๐ยฌ๐ฅ2 โผ ยฌ๐ฅ1ยฌ๐ฅ2
for all ๐ฅ1, ๐ฅ2 โ {0, 1}. Other permutations are genuine elements of R4 and cannot bedecomposed into tensor products of smaller permutation matrices. A concrete exampleis the following permutation matrix
๐ =
โโโโโ
1 0 0 00 1 0 00 0 0 10 0 1 0
โโโโโ โ โ
(R4).
that we have represented with respect to the extended standard basis ๐00 = ๐0โ๐0, ๐01 =๐0 โ ๐1, ๐10 = ๐1 โ ๐0, ๐11 = ๐1 โ ๐1 of R4 โ (
R2)โ2. It corresponds to the followinglogical transformation on length-two bit strings:
๐(00) โผ๐๐0 โ ๐0 = ๐0 โ ๐0 โผ 00,
๐(01) โผ๐๐0 โ ๐1 = ๐0 โ ๐1 โผ 01,
๐(10) โผ๐๐1 โ ๐0 = ๐1 โ ๐1 โผ 11,
๐(11) โผ๐๐1 โ ๐1 = ๐1 โ ๐0 โผ 10.
On the level of bits, we can associate this transformation with the following truth table:
๐ฅ โ {0, 1} ๐ฅ2 โ {0, 1} [๐(๐ฅ1๐ฅ2)]1 [๐(๐ฅ1๐ฅ2)]20 0 0 00 1 0 11 0 1 11 1 1 0
.
Conditioned on the first bit being one (๐ฅ1 = 1), this operation inverts the second bit.Otherwise it does nothing. This action corresponds to the reversible XOR (exclusiveore):
XOR(๐ฅ1๐ฅ2) ={
๐ฅ1๐ฅ2 if ๐ฅ1 = 0,
๐ฅ1ยฌ๐ฅ2 if ๐ฅ1 = 1.
3.2.3 Correspondence for general ๐: Reversible logical functions
The approach outlined generalizes to tensor products of order ๐, or equivalently: bitstrings of length ๐. Permutation matrices ฮ โ (R2)โ๐ may be interpreted as logicaloperations on length-๐ bit strings. We emphasize two key features of permutationmatrices:
6
1. Orthogonality: ฮ โ1 = ฮ ๐ for any permutation matrix2. Group structure: Permutation matrices form a finite group.
Both features have profound implications for the associated logical functions. The firststatement highlights that reading a permutation backwards, corresponds to taking theinverse. In particular: ฮ ๐ ฮ = I implies ๐๐ (๐(๐ฅ1 ยท ยท ยท ๐ฅ๐)) = ๐ฅ1 ยท ยท ยท ๐ฅ๐. Here, ๐๐ = ๐โ1
denotes the โreverseโ of a logical function.
Fact 3.1. Permutation matrices ฮ acting on(R2)โ๐ can represent any reversible logical
function.
The group structure also has profound implications. Finite groups ๐บ typically havea small set of generators ๐บ1, . . . , ๐บ๐ โ ๐บ. These can be combined to generate anyelement of the group. In particular, a small number of permutation matrices suffices toโbuildโ arbitrary permutation matrices. Fact 3.1 allows for extending this structure toreversible logical functions.
Fact 3.2. Any reversible logical function ๐ : {0, 1}๐ โ {0, 1}๐ can be decomposed intoa product of smaller (more elementary) logical functions. This procedure is called acircuit decomposition.
The logical functions associated with generators of permutation matrices are calledan elementary (reversible) gate set. Perhaps surprisingly, a single logical function on 3bits suffices to decompose any reversible ๐-bit function1. This magic function is theToffoli gate. Itโs permutation matrix corresponds to
๐๐๐น =
โโโโโโโโโโโโโโ
1 0 0 0 0 0 0 00 1 0 0 0 0 0 00 0 1 0 0 0 0 00 0 0 1 0 0 0 00 0 0 0 1 0 0 00 0 0 0 0 1 0 00 0 0 0 0 0 0 10 0 0 0 0 0 1 0
โโโโโโโโโโโโโโ
.
3.3 Circuit diagrams for reversible computationSuppose that we have found a decomposition of a general permutation matrix ฮ โโ((R2)โ๐
)into a product of simpler permutation matrices that act only on a sub-set of
the tensor factors. We can then use the wiring formalism to visualize this decomposition.For instance, suppose that ฮ acts on four tensor factors (bits) and may be decomposedas
ฮ = (๐๐๐น โ I)(๐๐๐ โ ๐๐๐ )I โ ๐ โ3 :(R2)โ4
โ(R2)โ4
.
1In order to achieve this decomposition, we may have to embed the logical function ๐ : {0, 1}๐ โ{0, 1}๐ into a larger bit space {0, 1}๐+๐ and use the additional ๐ bits as โancillasโ.
7
The graphical visualization of this decomposition is
ฮ = ๐๐๐น ๐๐
๐ ๐
๐๐
๐
๐
๐ = .
On the right hand side, we have replaced the individual boxes with standard expressionsfrom the field of logical circuits. The diagram on the right is called a circuit diagram.The identification of bit strings with extended standard basis vectors in (R2)โ๐ naturallyproduces this important framework from electrical engineering. However, there is aslight twist. We read wiring diagrams from right to left, while circuit diagrams aretypically read from left to right.
This example can be generalized to arbitrary logical functions that are associatedwith big permutation matrices ฮ . The fact that permutation matrices form a groupasserts that any such logical function can be decomposed into a sequence of moreelementary logical functions. Wiring diagrams, or circuit diagrams, visualize such adecomposition in a graphical fashion.
Fact 3.3. Wiring diagrams may be viewed as a natural extension of classical circuitdiagrams.
Reversible logical functions (big permutation matrix) form the basic building blockof reversible computation.
Definition 3.4. A classical reversible computation is a three-step procedure:
1. input: ๐ฅ1 ยท ยท ยท ๐ฅ๐ โฆโ ๐๐ฅ1 โ ยท ยท ยท โ ๐๐ฅ๐
2. computation: run the reversible circuit on ๐ฅ1 ยท ยท ยท ๐ฅ๐.3. read-out: Perform the measurement ๐(๐ฆ1 ยท ยท ยท ๐ฆ๐) = โจ๐๐ฆ1 โ ยท ยท ยท โ ๐๐ฆ๐ฮ ๐๐ฅ1 โ ยท ยท ยท ๐๐ฅ๐
โฉ.This measurement yields a deterministic read-out.
It is worthwhile to point out the following feature of this formalism. Readingthe diagram/circuit backwards necessarily produces ฮ ๐ . This is the inverse of ฮ .A hardware implementation of such a circuit therefore has the following appealingfeature. We can re-set a concrete computation by running the circuit backwards! Thisautomatically re-sets the bits to the original input:
๐ฅ1 ยท ยท ยท ๐ฅ๐circuit: right to leftโโ ๐ฆ1 ยท ยท ยท ๐ฆ๐
circuit: left to rightโโ ๐ฅ1 ยท ยท ยท ๐ฅ๐.
Standard, i.e. non-reversible, circuits do not have this feature. They must be re-setby force after a computation is completed. A fundamental thermodynamic restriction,called Landauerโs principle, states that such re-sets necessarily cost work/energy, becausethey erase information. Reversible computation is in principle capable of bypassing
8
this fundamental threshold. For this reason, the study of reversible computations hasrecently gained some traction again. The ultimate goal is to derive hardware thatrequires considerable less energy.
We conclude this section with an important remark regarding generalization. Thetensor product representation of reversible circuits is remarkably powerful. We can easilyextend it to reason about randomized inputs and randomized computations. Simplyreplace the deterministic input ๐๐ฅ1 โยท ยท ยทโ๐๐ฅ๐ by a more general probability distribution๐ โ ฮ2๐โ1, and replace permutation matrices ฮ by doubly stochastic matrices.
Lecture 08: Quantum circuits and quantum computingScribe: Alex Dalzell
ACM 270-1, Spring 2019Richard Kueng & Joel TroppApril 24, 2019
1 Agenda1. Evolution of quantum states2. Quantum circuits and quantum computation3. Where is the quantum magic?
Unitaries vs. Permutation?Hadamard + superposition?Entanglement?
In the last lecture, we focused on the evolution of classical probability distributions.These are described by doubly stochastic matrices and correspond to the convex hull of allpermutation matrices. Subsequently, we restricted our attention to permutation matrices(extreme evolutions) and standard basis vectors (extreme probability distributions) onthe tensor product space (R2)โ๐. In doing so, we developed the theory of reversibleclassical computation: bit strings are associated with extended standard basis vectorsand reversible logical functions permute them. Here, we apply the same reasoning to(probability) density matrices. This will give rise to quantum computing โ the natural(SDP) extension of reversible computing that has received a tremendous amount ofattention over the past decades.
2 Evolution of quantum states2.1 Recapitulation of quantum states
Set ๐ป = C๐. The set of all quantum states is
S(๐ป) = {๐ โ โ(๐ป) : ๐* = ๐, ๐ โชฐ 0, (I, ๐)}.
This is a convex set. The extreme points are given by pure quantum states ๐ฃ๐ฃ*, where๐ฃ โ C๐ has unit norm:
S(๐ป) = conv{
๐ฃ๐ฃ* : ๐ฃ โ C๐, โจ๐ฃ, ๐ฃโฉ = 1}
.
This formula highlights that any density matrix ๐ corresponds to a convex mixture ofpure states. Another distinguished point is the bary-center:
๐ต =โซ
S๐โ1d๐(๐ฃ)๐ฃ๐ฃ* =
โซd๐(๐)๐๐ฃ0๐ฃ*
0๐* = 1๐
I.
This is called the maximally mixed state and is the quantum analogue of the completelyflat distribution.
2
2.2 Elementary evolution of density matricesWe now address the question of evolution. We are looking for linear transformationsthat map the set of quantum states onto itself:
๐ : โ(๐ป) โ โ(๐ป) s.t. ๐(S(๐ป)) โ S(๐ป).
These operators act on operators, i.e. ๐ โ โ(โ(๐ป)) and we will call them channels.
Example 2.1 (Reset). Fix ๐ โ S(๐ป) and define ๐(๐) = ๐tr(๐) for ๐ โ โ(๐ป). This isa valid quantum channel: ๐(๐) = ๐ โ S(๐ป) for all density matrices ๐ โ S(๐ป).
Although valid channels, resets behave in a peculiar fashion and do only respect thegeometry of quantum states in an extreme sense. All of S(๐ป) is mapped onto a singlepoint. In the following, we shall restrict our attention to less invasive evolutions thatpreserve the barycenter of state space.
Definition 2.2. A channel ๐ : โ(๐ป) โ โ(๐ป) is called unital, if ๐(I) = I.
Let us now focus on the natural quantum extension of permutation matrices.
Proposition 2.3. Fix a unitary matrix ๐ โ ๐(๐). Then, the channel ๐ฐ(๐) = ๐๐๐*
maps pure states onto pure states. These maps are also unital: ๐ฐ(I) = ๐I๐* = I.
Unitary channels appropriately mimic the features of permutation matrices actingon ฮ๐โ1. They fix the barycenter and map quantum state space onto itself:
๐ฐ(S(๐ป)) = {๐๐๐* : ๐* = ๐, ๐ โชฐ 0, (๐ผ, ๐) = 1}.
More general unital channels necessarily shrink S(๐ป). Typically one defines the set ofunital quantum channels via the following three conditions:
1. complete positivity: a channel must map every psd matrix onto a psd matrix (ina strong sense): ๐ โ โ(๐) โชฐ 0 for all ๐ โ โ(๐ป) โ โ(๐ป).
2. unitality: ๐(I) = I.3. trace preservation: (I, ๐(๐)) = (I, ๐).
These three requirements single out a convex set of channels. Unitary channels cor-respond to extreme points of this set. However, this set of extreme points is notcomplete!
Fact 2.4 (Quantum von-Neumann theorem is false). Convex mixtures of unitary channels๐ = โ๐
๐=1 ๐๐๐ฐ๐ do not generate the set of all unital quantum channels. There are exoticexceptions.
This apparent lack of structure renders the study of quantum channels more com-plicated than their classical counter-part (the set of doubly stochastic matrices is apolytope and permutation matrices are its extreme points). Nonetheless, it is usefulto consider unitary channels as the most extreme quantum evolutions. They do share
3
several desirable properties with permutation matrices. In particular, subsequent appli-cations of unitary channels correspond to products of unitaries. Let ๐ฑ1(๐) = ๐1๐๐ *
1and ๐ฑ2(๐) = ๐2๐๐ *
2 be two unitary channels. Then,
๐ฑ2 โ ๐ฑ1(๐) = ๐ฑ2(๐ฑ1(๐)) = ๐ฑ2(๐1๐๐ *1 ) = ๐2๐1๐(๐1๐2)* = ๐ฐ(๐)
is again a unitary channel described by ๐ = ๐1๐2. What is more, unitary matricesform a group ๐ฐ(๐). This allows us to decompose general unitary channels as a productof simpler unitary channels.
3 Quantum circuits and computationsWe can now basically repeat our analysis of reversible circuits from last lecture. Forconcreteness, we will be working explicitly with circuits acting on a colection of qubits(๐ = 2): thus denote
๐ป = C2, S(๐ป) = conv{๐ฃ๐ฃ* : ๐ฃ โ ๐ป, โจ๐ฃ, ๐ฃโฉ = 1}The qubits are the hardware part of a quantum circuit. If we have ๐ qubits, then thequantum circuit acts on an initial quantum state ๐joint โ S(๐ปโ๐) โ โ(๐ปโ๐). The state๐joint plays the role that the initial tensor ๐๐ฅ1 โ . . . โ ๐๐ฅ๐ โ (R2)๐ played for classicalcomputation. Then a quantum circuit is simply a big unitary operator ๐ โ โ(๐ปโ๐),which induces the unitary channel
๐ฐ(๐in) = ๐๐in๐*. (1)
In wiring notation, this unitary channel acts in the following fashion:
ยทยทยท
ยทยทยท
ยทยทยท
ยทยทยท ๐in๐ ๐*
Fact 3.1. The set of unitary matrices on ๐ปโ๐ forms a group.Moreover, the set of unitaries is approximately generated by a fixed finite set of
small unitaries.Theorem 3.2 (Solovay, Kitaev). A small number of small (i.e acting on two or threeneighboring tensor factors) suffices to approximate any unitary ๐ โ โ(๐ปโ๐)1.
ยทยทยท
ยทยทยท
๐ โ ยทยทยท
ยทยทยท
๐1 ๐2 ๐2ยท ยท ยท
ยท ยท ยท ๐1๐2๐2
1Specifically, they showed that any ๐ can be approximated to accuracy ๐ using a sequence composedof smaller unitaries of length only polynomial in log(1/๐).
4
A set of gates that is capable of approximately generating any unitary in โ(๐ปโ๐)is called a universal gate set. In the decomposition of ๐ into a sequence of unitariesdrawn from such a generating set, the length of the sequence is called the circuit length.
Remark 3.3. Parallelization can substantially reduce the length of the circuit. Theparallelized circuit length is called the circuit depth.
For example, there are six unitaries shown in the decomposition above, so theywould contribute six toward the circuit length but only three toward the circuit depthsince they can be implemented in three parallel layers.
Decomposition of unitary matrices readily implies decomposition of unitary channels(quantum circuits). Following Eq. (1), if ๐ = ๐1 . . . ๐๐ฟ, then
๐ฐ(๐in) = ๐1 . . . ๐๐ฟ๐in๐*๐ฟ . . . ๐*
1 = ๐ฐ1(๐ฐ2(. . . ๐ฐ๐ฟ(๐in) . . .)) = ๐ฐ1 โ . . . โ ๐ฐ๐ฟ(๐in) (2)
Now we can formally define a quantum computation.
Definition 3.4 (Quantum computation). A quantum computation consists of the followingingredients.
1. Input: Given a classical bit string ๐ฅ1, . . . , ๐ฅ๐, initialize ๐in to be in a tensorproduct state: ๐ฅ1, . . . , ๐ฅ๐ โฆโ ๐in = ๐๐ฅ1๐*
๐ฅ1 โ . . . โ ๐๐ฅ๐๐*๐ฅ๐
:
ยทยทยท
ยทยทยท ๐in = ยทยทยท
ยทยทยท
๐ฅ1 ๐ฅ1
๐ฅ๐ ๐ฅ๐
2. Computation: Apply a unitary channel ๐out = ๐ฐ(๐in = ๐๐in๐*
3. Output: Perform a fixed quantum measurement to retrieve a classical string๐ฆ1, . . . , ๐ฆ๐ from ๐out.
๐ป๐ฆ1,...,๐ฆ๐ = ๐๐ฆ1๐*๐ฆ1 โ . . . โ ๐๐ฆ๐๐*
๐ฆ1 โชฐ 0 and1โ
๐ฆ1...๐ฆ๐=0๐ป๐ฆ1,...,๐ฆ๐ = I. (3)
A quantum computation can then be represented by the following diagram, whichcomputes the probability of measuring output ๐ฆ1, . . . , ๐ฆ๐ after starting with input๐ฅ1, . . . , ๐ฅ๐.
ยทยทยท
ยทยทยท
๐ ยทยทยท
ยทยทยท
๐*
๐ฆ1 ๐ฅ1
๐ฆ๐ ๐ฅ๐
๐ฆ1๐ฅ1
๐ฆ๐๐ฅ๐
(4)
This diagram splits into two separate diagrams that are complex conjugates of oneanother. Recall that a classical reversible circuit diagram looked very similar to a singlecopy of one of these constituent diagrams.
5
The complexity of the quantum computation is the circuit depth of ๐ . A quantumcomputation is said to be polynomial-size if its circuit depth is less than a polynomialin the number of qubits ๐. The ultimate goal of research in quantum algorithms is tofind problems that can be solved by polynomial-sized quantum computations but notpolynomial-sized classical computations. For example, Shorโs algorithm describes apolynomial-sized quantum computation for factoring integers, a problem for which thereis no known classical polynomial-sized computation. Another example of a problemwhere quantum computations may provide a drastic advantage is in the simulation ofquantum systems. But there are other examples where quantum computation may offer asignificant, but less drastic speedup, or where it is suspected quantum computation maybe useful but no rigorous proof has been provided. Such examples include searching foritems within a large unstructured search space, solving linear equations, combinatorialoptimization problems, and even certain machine learning tasks.
This begs the question: what is it about quantum computations that leads to theiradvantage over classical computations? This is the subject of the next section.
4 What is special about quantum computing?4.1 Unitaries vs. Permutations
A clear difference between quantum computations and classical computations is thatquantum computations apply unitary mtarices on an initial tensor product of basisvectors, while classical circuits perform permutation matrices on the basis vectors.
It was discussed in the previous lecture how the "controlled-controlled-NOT" or "Toffoli" gate is universal for reversible classical computation, since it can generate any permutation matrix. The Toffoli gate is both a permutation matrix and a unitary matrix, so it qualifies as a quantum gate, but it does not alone form a universal set for quantum computation. However, combining the Toffoli gate with the unitary "Hadamard" gate

𝐻 = (1/√2) [ 1, 1 ; 1, −1 ]

is sufficient to form a universal gate set². Thus quantum computation is at least as powerful as classical computation: a designer of quantum computations may always simply forget about the Hadamard gate and perform any classical computation within the framework of a quantum computation using only Toffolis. This also illustrates the necessity of the Hadamard gate if one wishes to find any sort of quantum speedup.
Another important distinction between permutation and unitary matrices is that unitary matrices form a continuous group. For example, the rotation matrices

𝑅_𝜃 = [ cos(𝜃), −sin(𝜃) ; sin(𝜃), cos(𝜃) ]

² Technically, Toffoli and Hadamard generate the group of real-valued unitaries. This, however, turns out to be sufficient for quantum computation.
are unitary for any rotation angle 𝜃 ∈ [0, 2𝜋). Thus the degrees of freedom in a quantum computation may vary continuously while the circuit is running. The power provided by this fact alone is illustrated through the following example.³

Example 4.1. Suppose 𝑛 people are standing in a line and each is given a number 𝑥_𝑖 such that 𝑁 = ∑_{𝑖=1}^{𝑛} 𝑥_𝑖 is an integer. They wish to collectively compute whether 𝑁 is even or odd, but they are only allowed to communicate one bit of information to the person behind them in line.

Classically, if each 𝑥_𝑖 is an integer, then this is possible: the first person computes 𝑥_1 mod 2 and sends the result to the second person, who adds their number and sends 𝑥_1 + 𝑥_2 mod 2 to the third person, and so on, until the final person is able to compute 𝑁 mod 2. However, if the 𝑥_𝑖 are not integers but only rational numbers, then this strategy does not work and there is no way to successfully compute the parity of 𝑁 given only the ability to communicate one classical bit.

What if the people may communicate a single qubit instead of a single classical bit? In this case, the problem can be solved both when the 𝑥_𝑖 are integers and when they are rationals. The first person begins with the single-qubit state 𝑒_0𝑒_0^* and applies the unitary operation 𝑅_{𝜋𝑥_1/2}, yielding 𝑅_{𝜋𝑥_1/2} 𝑒_0𝑒_0^* 𝑅_{𝜋𝑥_1/2}^*. Person 𝑖 receives the qubit from person 𝑖 − 1 and applies 𝑅_{𝜋𝑥_𝑖/2}. The final state of the qubit before measurement is 𝑅_{𝜋𝑁/2} 𝑒_0𝑒_0^* 𝑅_{𝜋𝑁/2}^*, which is 𝑒_0𝑒_0^* if 𝑁 is even and 𝑒_1𝑒_1^* if 𝑁 is odd. Thus the cases can be deterministically distinguished by the measurement 𝐻_0 = 𝑒_0𝑒_0^*, 𝐻_1 = 𝑒_1𝑒_1^*. This works even in the rational case precisely because the unitary group is continuous and rotations can be performed by arbitrary angles.
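A short NumPy sketch of this protocol (the rational numbers below are arbitrary choices for illustration): each person applies 𝑅_{𝜋𝑥_𝑖/2} to the shared qubit, and the final measurement reveals the parity of 𝑁.

```python
# Simulate Example 4.1: rotations by pi*x_i/2 accumulate to pi*N/2.
import numpy as np
from fractions import Fraction

def R(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

x = [Fraction(1, 3), Fraction(5, 3), Fraction(2, 1), Fraction(3, 1)]  # N = 7
psi = np.array([1.0, 0.0])                     # qubit starts in e_0
for xi in x:
    psi = R(np.pi * float(xi) / 2) @ psi       # person i rotates by pi*x_i/2
p1 = psi[1] ** 2                               # probability of outcome "1"
print("Pr[outcome 1] =", round(p1, 6))         # ~1, so N = 7 is odd
```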
Again through this example we see the importance of the Hadamard gate, since Toffoli gates alone could not be used to approximate rotation gates by arbitrary angles. These arbitrary-angle rotations put the quantum data into a superposition of multiple basis states, which is not possible for classical computation. Indeed, as we will see in the following section, the Hadamard gate and superposition are intimately related.

4.2 Hadamard and superposition

The Hadamard gate is an extremely useful operation all by itself. It creates superpositions when acting on standard basis elements:
๐ป๐0 = 1โ2
(1 11 โ1
)(10
)= 1โ
2
(11
)= (๐0 + ๐1)/
โ2
๐ป๐1 = 1โ2
(1 11 โ1
)(01
)= 1โ
2
(1
โ1
)= (๐0 โ ๐1)/
โ2
and thus
๐ปโ๐๐โ๐0 = 1
2๐/2 (๐0 + ๐1)โ๐ = 12๐/2
1โ
๐ฅ1,...,๐ฅ๐=0๐๐ฅ1 โ . . . โ ๐๐ฅ๐ .
³ This example was communicated to me by Renato Renner.
This equation shows that performing a Hadamard on each of 𝑛 qubits in the 𝑒_0 basis vector state yields the uniform superposition over all 2^𝑛 extended basis states on 𝐻^{⊗𝑛}. This superposition property can be exploited to yield large speedups over classical computation. A good example of this is the Deutsch-Jozsa algorithm.
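Before turning to that algorithm, here is a quick numerical check of the uniform-superposition identity above; the qubit count 𝑛 = 4 is an arbitrary choice.

```python
# Check that H^{(x)n} e_0^{(x)n} is the uniform superposition over 2^n states.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
n = 4
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                  # H^{(x)n} as a 2^n x 2^n matrix
e0 = np.zeros(2 ** n); e0[0] = 1.0       # e_0^{(x)n}
psi = Hn @ e0
assert np.allclose(psi, np.full(2 ** n, 2 ** (-n / 2)))
```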
Example 4.2 (Deutsch-Jozsa algorithm). The task solved by the Deutsch-Jozsa algorithm is as follows. Let 𝑓 : {0, 1}^𝑛 → {0, 1} be a Boolean function satisfying the promise that either 𝑓 is constant (i.e. 𝑓(𝑧) = 0 for all 𝑧, or 𝑓(𝑧) = 1 for all 𝑧) or 𝑓 is balanced (i.e. |{𝑧 : 𝑓(𝑧) = 0}| = |{𝑧 : 𝑓(𝑧) = 1}|). We only have black-box access to 𝑓: we may query 𝑓(𝑧) for a chosen input 𝑧, but we can learn nothing else about 𝑓. We would like to determine whether 𝑓 is constant or balanced using as few queries as possible.
Classically, the optimal strategy is simply to begin querying 𝑓 for different values of 𝑧. The minimum number of queries we could get away with is two: if the value of 𝑓 disagrees on the first two queries, we know that 𝑓 cannot be constant and thus must be balanced. However, the number of queries we need in the worst case is 2^{𝑛−1} + 1, since even if the first 2^{𝑛−1} queries all agree, it is still possible for the function to be balanced or for it to be constant.
Quantumly, just one query is enough to solve the problem. The quantum computation that illustrates this consists only of Hadamards and queries to 𝑓. We will assume that 𝑓 can be queried by applying a unitary 𝑈_𝑓 that acts on basis states as

𝑈_𝑓 (𝑒_{𝑥_1} ⊗ ⋯ ⊗ 𝑒_{𝑥_𝑛} ⊗ 𝑒_𝑧) = 𝑒_{𝑥_1} ⊗ ⋯ ⊗ 𝑒_{𝑥_𝑛} ⊗ 𝑒_{𝑧⊕𝑓(𝑥_1,…,𝑥_𝑛)}   (5)
and is linearly extended to the rest of 𝐻^{⊗(𝑛+1)}, where ⊕ denotes addition modulo 2. Note that a similar construction would be required to construct a classical reversible computation involving queries to 𝑓; indeed, 𝑈_𝑓 is both a unitary and a permutation matrix. In wiring notation:

[Wiring diagram: 𝑈_𝑓 maps input wires 𝑥_1, …, 𝑥_𝑛, 𝑧 to output wires 𝑥_1, …, 𝑥_𝑛, 𝑧 ⊕ 𝑓(𝑥), for all 𝑥_1, 𝑥_2, …, 𝑥_𝑛 ∈ {0, 1}.]   (6)
Using this quantum circuit implementation of 𝑈_𝑓, the Deutsch-Jozsa problem is solved by the following quantum computation:

[Wiring diagram (7): prepare 𝑒_0^{⊗𝑛} ⊗ 𝑒_1, apply a Hadamard to every wire, apply 𝑈_𝑓, apply Hadamards again, and measure the first 𝑛 wires to obtain 𝑦_1, …, 𝑦_𝑛; the mirrored half of the diagram is the complex conjugate.]   (7)
To see that it works, we can track the data at different points in the computation. Define

𝑢_1 := 𝐻^{⊗(𝑛+1)} (𝑒_0^{⊗𝑛} ⊗ 𝑒_1) = 2^{−(𝑛+1)/2} ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} 𝑒_{𝑥_1} ⊗ ⋯ ⊗ 𝑒_{𝑥_𝑛} ⊗ (𝑒_0 − 𝑒_1),

𝑢_2 := 𝑈_𝑓 𝑢_1 = 2^{−(𝑛+1)/2} ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} 𝑒_{𝑥_1} ⊗ ⋯ ⊗ 𝑒_{𝑥_𝑛} ⊗ (𝑒_{𝑓(𝑥)} − 𝑒_{1−𝑓(𝑥)})
     = 2^{−(𝑛+1)/2} ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} (−1)^{𝑓(𝑥_1,…,𝑥_𝑛)} 𝑒_{𝑥_1} ⊗ ⋯ ⊗ 𝑒_{𝑥_𝑛} ⊗ (𝑒_0 − 𝑒_1),   (8)

where the last line follows from the fact that the last qubit is 𝑒_0 − 𝑒_1 if 𝑓(𝑥) = 0, and simply the negation 𝑒_1 − 𝑒_0 if 𝑓(𝑥) = 1. When we take tr_{𝑛+1}(𝑢_2𝑢_2^*) we still have a rank-one matrix and can write it as 𝑢_3𝑢_3^* with

𝑢_3 := 2^{−𝑛/2} ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} (−1)^{𝑓(𝑥_1,…,𝑥_𝑛)} 𝑒_{𝑥_1} ⊗ ⋯ ⊗ 𝑒_{𝑥_𝑛}.   (9)
The final step requires applying 𝑛 Hadamard gates again, which act as

𝑢_4 := 𝐻^{⊗𝑛} 𝑢_3 = 2^{−𝑛} ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} (−1)^{𝑓(𝑥_1,…,𝑥_𝑛)} ∑_{𝑤_1,…,𝑤_𝑛=0}^{1} (−1)^{∑_𝑖 𝑤_𝑖𝑥_𝑖} 𝑒_{𝑤_1} ⊗ ⋯ ⊗ 𝑒_{𝑤_𝑛}
     = 2^{−𝑛} ∑_{𝑤_1,…,𝑤_𝑛=0}^{1} ( ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} (−1)^{𝑓(𝑥_1,…,𝑥_𝑛)} (−1)^{∑_𝑖 𝑤_𝑖𝑥_𝑖} ) 𝑒_{𝑤_1} ⊗ ⋯ ⊗ 𝑒_{𝑤_𝑛}.   (10)
Then, we may express the probability of the measurement outcome 𝑦_1, …, 𝑦_𝑛 = 0 as

Pr(0 ⋯ 0 | 𝑢_4𝑢_4^*) = ⟨𝑒_0^{⊗𝑛}, 𝑢_4𝑢_4^* 𝑒_0^{⊗𝑛}⟩ = 2^{−2𝑛} | ∑_{𝑥_1,…,𝑥_𝑛=0}^{1} (−1)^{𝑓(𝑥_1,…,𝑥_𝑛)} |²

which equals 0 if 𝑓 is balanced and 1 if 𝑓 is constant.
Thus one may deterministically distinguish between the two cases using only one application of 𝑈_𝑓.
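The following dense simulation is a pedagogical sketch of the whole circuit; the functions passed in as 𝑓 are arbitrary examples. The probability of the all-zeros outcome comes out as one for constant 𝑓 and zero for balanced 𝑓, as derived above.

```python
# Dense simulation of Deutsch-Jozsa on n qubits plus one ancilla.
import numpy as np
from itertools import product

def deutsch_jozsa(f, n):
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn1 = H
    for _ in range(n):
        Hn1 = np.kron(Hn1, H)                       # H^{(x)(n+1)}
    # U_f as a permutation on the 2^(n+1) basis states e_x (x) e_z
    dim = 2 ** (n + 1)
    Uf = np.zeros((dim, dim))
    for idx, bits in enumerate(product([0, 1], repeat=n + 1)):
        x, z = bits[:n], bits[n]
        out = list(x) + [z ^ f(x)]
        Uf[int("".join(map(str, out)), 2), idx] = 1.0
    psi = np.zeros(dim); psi[1] = 1.0               # e_0^{(x)n} (x) e_1
    psi = Hn1 @ psi                                 # u_1
    psi = Uf @ psi                                  # u_2
    psi = Hn1 @ psi                                 # final Hadamards (the one
                                                    # on the ancilla is harmless)
    # probability that the first n qubits read 0...0
    return psi[0] ** 2 + psi[1] ** 2

print(deutsch_jozsa(lambda x: 0, 3))                # constant f -> 1.0
print(deutsch_jozsa(lambda x: x[0], 3))             # balanced f -> 0.0
```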
In the Deutsch-Jozsa example, the computation consisted only of Hadamards and queries to 𝑓. The Hadamards orchestrated a system of superpositions and, later, cancellations of the coefficients of the various basis vectors in such a way that exactly solved the problem, albeit a problem designed specifically to be easy for such a simple quantum computation.

But are superposition and cancellations really the crux of what makes quantum computation special? A result from Schwarz and Van den Nest challenges this idea.
Theorem 4.3 (Schwarz, Van den Nest). A wide variety of architectures for quantum computations (including ones similar to Shor's algorithm and the Deutsch-Jozsa algorithm) can be simulated efficiently if the classical outcome probabilities are very "peaky."
We gain from this result the intuition that quantum computations relying on superposition and interference between different basis states can only yield an exponential speedup if the final output distribution is not concentrated on too few of the possible outcomes. In other words, there is more to the story than simply interference.

4.3 Entanglement

We refer the reader to the Quora article⁴ by current Caltech postdoc Andru Gheorghiu, which gives an excellent perspective on the utility of entanglement as a resource for quantum algorithms.
But is entanglement sufficient for quantum computation? No. In fact, we will now illustrate why most quantum states are actually useless for quantum computation despite having lots of entanglement.

Theorem 4.4 (Gross, Flammia, Eisert). Most quantum states are useless for quantum computation.

Proof strategy. For an 𝑛-qubit computation, we imagine taking a random pure state 𝜌_in = 𝑢𝑢^* on 𝐻^{⊗𝑛} as the input to the quantum computation. Meanwhile, we restrict the quantum computation to be polynomial-size and require the unitary operation 𝑈 that it implements to have some bounded length 𝐿. We suppose that such a quantum computation could allow one to solve an interesting problem, and then we show that you could do the same thing by tossing 𝑛 (classical) coins, meaning the interesting problem could also be solved by an efficient randomized classical computation.
For a fixed 𝑈, 𝑢, and 𝑦_1, …, 𝑦_𝑛, let

𝛼 = Pr(𝑦_1 ⋯ 𝑦_𝑛 | 𝑈𝑢𝑢^*𝑈^*) = tr((𝑒_{𝑦_1}𝑒_{𝑦_1}^* ⊗ ⋯ ⊗ 𝑒_{𝑦_𝑛}𝑒_{𝑦_𝑛}^*) 𝑈𝑢𝑢^*𝑈^*) = tr(𝑣𝑣^* 𝑢𝑢^*),

where we have implicitly defined 𝑣 = 𝑈^*(𝑒_{𝑦_1} ⊗ ⋯ ⊗ 𝑒_{𝑦_𝑛}). Now if we consider 𝑈 fixed but randomize over 𝑢, we can compute the expectation value of this random number (over 𝑢):

E[𝛼] = tr( 𝑣𝑣^* ∫_{S(𝐻^{⊗𝑛})} 𝑢𝑢^* d𝜇(𝑢) ) = tr(𝑣𝑣^*) / dim(𝐻^{⊗𝑛}) = ⟨𝑣, 𝑣⟩ / 2^𝑛 = 1/2^𝑛.
Here, we have used the Haar-integration formula for 𝑘 = 1. Since 𝑢 ∈ S(𝐻^{⊗𝑛}) is a Haar-random vector, we can use higher-order Haar integration to establish exponential concentration around this mean value.

Proposition 4.5. Fix a unit vector 𝑣 ∈ S(𝐻^{⊗𝑛}) and choose 𝑢 ∈ S(𝐻^{⊗𝑛}) uniformly from the Haar measure. Then, there is a constant 𝑐 > 0 such that for any 𝜏 > 0

Pr[ |tr(𝑣𝑣^*𝑢𝑢^*) − 2^{−𝑛}| ≥ 𝜏 ] ≤ exp(−𝑐 2^𝑛 𝜏²).
⁴ https://www.quora.com/How-is-quantum-entanglement-beneficial-in-quantum-computers
The argument is very similar to the concentration step in Lecture 6 (bound moments using the Haar-integration formula and apply the exponential Markov inequality). Alternatively, one could also use concentration of measure (Lévy's lemma).

Proposition 4.5 implies that, after randomizing over 𝑢, the output distribution will be close to flat and can be simulated by flipping classical coins.
The only chance to circumvent this argument is to choose 𝑢 randomly but then choose 𝑈 wisely in a 𝑢-dependent way. However, a counting argument shows that we simply don't have enough knobs to turn in 𝑈 to make a difference. There are 2^𝑛 extended standard basis vectors 𝑦_1 ⋯ 𝑦_𝑛 (measurement), and 𝑛^𝐿|𝐺|^𝐿 different unitaries 𝑈 of length 𝐿, where |𝐺| is the number of gates in the universal gate set. Thus,

Pr_𝑢[ max_𝑈 max_{𝑦_1⋯𝑦_𝑛} | Pr[𝑦_1 ⋯ 𝑦_𝑛 | 𝑈𝑢𝑢^*𝑈^*] − 2^{−𝑛} | ≥ 𝜏 ] ≤ 𝑛^𝐿 |𝐺|^𝐿 2^𝑛 exp(−𝑐 2^𝑛 𝜏²),   (11)

where the maximum is taken over all basis states and all unitaries 𝑈 of length 𝐿. For constant 𝜏 and 𝐿 = poly(𝑛), this probability is small for sufficiently large 𝑛; thus the output distribution is approximately uniform and can be simulated by flipping 𝑛 classical coins.
Lecture 09: Matrix rank

Scribe: Hsin-Yuan Huang

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
April 29, 2019
1 Agenda

This lecture is devoted to recalling and gathering fundamental and useful properties of the matrix rank.

1. Definition
2. Computing the matrix rank
3. Uniqueness of decompositions
4. Upper bounds on the maximal rank
5. Typical rank
6. Low-rank approximation and the Eckart-Young Theorem
Lecture 10 is then devoted to generalizing the matrix rank to tensors. We shall see that the tensor rank behaves very differently and is much more challenging to handle.
2 Definition of matrix rank

Let 𝑉 and 𝑊 be finite-dimensional inner product spaces. Let 𝑉^* and 𝑊^* denote their dual spaces and fix 𝑣^* ∈ 𝑉^* and 𝑤 ∈ 𝑊. Consider the linear map 𝑤 ⊗ 𝑣^* : 𝑉 → 𝑊 defined by

𝑥 ↦ 𝑤𝑣^*𝑥 = 𝑤⟨𝑣, 𝑥⟩.

The linear hull of all such elementary maps forms ℒ(𝑉, 𝑊), the space of linear operators from 𝑉 to 𝑊.
Definition 2.1 (rank-one operator). A linear operator 𝑋 ∈ ℒ(𝑉, 𝑊) has rank one if it corresponds to an elementary tensor product: 𝑋 = 𝑤𝑣^* for 𝑣^* ∈ 𝑉^* (𝑣 ∈ 𝑉) and 𝑤 ∈ 𝑊.
The matrix rank is a straightforward generalization of this concept.
Definition 2.2 (matrix rank). The rank of an operator 𝑋 ∈ ℒ(𝑉, 𝑊) is the smallest number 𝑟 such that 𝑋 can be represented as a sum of 𝑟 rank-one tensor products:

𝑋 = ∑_{𝑖=1}^{𝑟} 𝑤_𝑖 𝑣_𝑖^*.
We emphasize that this definition may seem non-standard. The following equivalences show that it nonetheless captures the familiar concept.
Proposition 2.3. Fix ๐ โ โ(๐, ๐ ). The following are equivalent:
1. rank: ๐ has rank ๐
2. column rank: ๐ = dim(Im(๐))
3. row rank: ๐ = dim(๐ ) โ dim(ker(๐))
4. determinantal rank: every minor of size (𝑟 + 1) of any matrix representation vanishes, while some minor of size 𝑟 does not.
Proof sketch. We will show how 1. implies 2.; for the remaining equivalences, we refer to standard textbooks and lecture notes on linear algebra. Note that minimality of the decomposition 𝑋 = ∑_{𝑖=1}^{𝑟} 𝑤_𝑖 ⊗ 𝑣_𝑖^* ensures that {𝑣_1, …, 𝑣_𝑟} ⊂ 𝑉 and {𝑤_1, …, 𝑤_𝑟} ⊂ 𝑊 are linearly independent. Indeed, suppose that this were not the case: 𝑤_𝑟 = ∑_{𝑖=1}^{𝑟−1} 𝜏_𝑖 𝑤_𝑖. Then,

𝑋 = ∑_{𝑖=1}^{𝑟} 𝑤_𝑖 ⊗ 𝑣_𝑖^* = ∑_{𝑖=1}^{𝑟−1} 𝑤_𝑖 ⊗ 𝑣_𝑖^* + ∑_{𝑖=1}^{𝑟−1} 𝜏_𝑖 𝑤_𝑖 ⊗ 𝑣_𝑟^* = ∑_{𝑖=1}^{𝑟−1} 𝑤_𝑖 ⊗ (𝑣_𝑖 + 𝜏̄_𝑖 𝑣_𝑟)^*,

which would contradict minimality. Next note that

𝑋𝑥 = ∑_{𝑖=1}^{𝑟} 𝑤_𝑖 ⟨𝑣_𝑖, 𝑥⟩ = ∑_{𝑖=1}^{𝑟} 𝑐_𝑖 𝑤_𝑖 ∈ span{𝑤_1, …, 𝑤_𝑟}.

Choosing different inputs 𝑥 leads to different coefficients 𝑐_𝑖 and we can reach all of span{𝑤_1, …, 𝑤_𝑟}. Linear independence ensures that this is an 𝑟-dimensional subspace of 𝑊.
3 Computing the matrix rank

There is a one-to-one correspondence between linear operators 𝑋 ∈ ℒ(𝑉, 𝑊) and dim(𝑊) × dim(𝑉) matrices.

Fact 3.1. Computing the matrix rank is easy!

Several efficient algorithms exist, each with its own advantages and drawbacks. All of them require Θ(𝑛³) arithmetic operations to compute the matrix rank when dim(𝑉) = dim(𝑊) = 𝑛. For example:

1. Gaussian elimination: this can be done analytically, but floating-point implementations can become unreliable.
2. Singular value decomposition: stable and provides a minimum rank decomposition, but slightly more expensive.
3. QR decomposition with column pivoting: less expensive than the SVD and more robust than Gaussian elimination.
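In floating-point practice, "rank" means counting singular values above a tolerance, as in the following sketch; the tolerance convention shown matches the one used by NumPy's matrix_rank.

```python
# Numerical rank via the SVD: count singular values above a tolerance.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))   # rank 3 by construction

sigma = np.linalg.svd(A, compute_uv=False)
tol = sigma.max() * max(A.shape) * np.finfo(A.dtype).eps
print("rank =", int((sigma > tol).sum()))                 # 3
print("rank =", np.linalg.matrix_rank(A))                 # same answer
```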
4 Minimal rank decompositions and uniqueness

Fix a linear operator 𝑋 ∈ ℒ(𝑉, 𝑊). The singular value decomposition readily provides a minimal rank decomposition.

Theorem 4.1 (Singular value decomposition). The singular value decomposition (SVD) decomposes any matrix 𝑋 ∈ ℒ(𝑉, 𝑊) into a triple of structured matrices:

𝑋 = 𝑈Σ𝑉^* = [𝑢_1, …, 𝑢_𝑟] diag(𝜎_1, …, 𝜎_𝑟) [𝑣_1, …, 𝑣_𝑟]^* = ∑_{𝑖=1}^{𝑟} 𝜎_𝑖 𝑢_𝑖 𝑣_𝑖^*.

The matrices 𝑈 and 𝑉 are linear isometries (equivalently: {𝑢_1, …, 𝑢_𝑟} ⊂ 𝑊 and {𝑣_1, …, 𝑣_𝑟} ⊂ 𝑉 are orthonormal sets of vectors) and 𝜎_1, …, 𝜎_𝑟 > 0 are strictly positive numbers (singular values). Computing the SVD requires 𝒪(𝑛³) arithmetic operations.
The SVD is the workhorse of numerical linear algebra. For our purposes it provides two types of highly relevant information:

1. The matrix rank 𝑟 is just the number of non-zero singular values.
2. A minimal rank decomposition with rich structure:

𝑋 = 𝑈Σ𝑉^* = ∑_{𝑖=1}^{𝑟} 𝜎_𝑖 𝑢_𝑖 𝑣_𝑖^* ≃ ∑_{𝑖=1}^{𝑟} 𝜎_𝑖 𝑢_𝑖 ⊗ 𝑣_𝑖^*.
Fact 4.2. Computing minimal rank decompositions of matrices is easy!
Next we want to address another fundamental question: are these minimal rank decompositions unique? Firstly, note that each matrix factorization carries two kinds of trivial ambiguities:

1. Permutation of factor pairs: choose 𝜋 ∈ 𝒮_𝑟. Then, permuting factor pairs (𝑢_{𝜋(𝑖)}, 𝑣_{𝜋(𝑖)}) does not change the decomposition:

∑_{𝑖=1}^{𝑟} 𝑢_{𝜋(𝑖)} ⊗ 𝑣_{𝜋(𝑖)}^* = ∑_{𝑖=1}^{𝑟} 𝑢_𝑖 ⊗ 𝑣_𝑖^* = 𝑋.

2. Scaling of factor pairs: fix non-zero 𝛼_1, …, 𝛼_𝑟 ∈ F. Then, scaling each factor pair (𝑢_𝑖, 𝑣_𝑖) ↦ (𝛼_𝑖^{−1} 𝑢_𝑖, 𝛼̄_𝑖 𝑣_𝑖) does not change the decomposition:

∑_{𝑖=1}^{𝑟} 𝛼_𝑖^{−1} 𝑢_𝑖 ⊗ (𝛼̄_𝑖 𝑣_𝑖)^* = ∑_{𝑖=1}^{𝑟} (𝛼_𝑖/𝛼_𝑖) 𝑢_𝑖 ⊗ 𝑣_𝑖^* = 𝑋.
Such symmetries are intrinsic to any factorization and cannot be avoided.
Definition 4.3. We call a minimal rank decomposition of an operator 𝑋 ∈ ℒ(𝑉, 𝑊) ≃ 𝑊 ⊗ 𝑉^* unique if it is uniquely determined up to trivial symmetries (permutation and scaling).
Proposition 4.4. Minimal rank factorizations of operators 𝑋 ∈ ℒ(𝑉, 𝑊) are never unique.

Proof. Fix 𝑋 ∈ ℒ(𝑉, 𝑊) and apply an SVD: 𝑋 = 𝑈Σ𝑉^*. Choosing 𝑢_1, …, 𝑢_𝑟 ∈ 𝑊 to be the columns of 𝑈 and 𝑣_1^*, …, 𝑣_𝑟^* to be the rows of 𝑉^* gives rise to a minimal rank decomposition. Use I = ∑_{𝑖=1}^{𝑟} 𝑒_𝑖 𝑒_𝑖^* to conclude

∑_{𝑖=1}^{𝑟} 𝜎_𝑖 𝑢_𝑖 𝑣_𝑖^* = ∑_{𝑖=1}^{𝑟} 𝑈Σ 𝑒_𝑖 𝑒_𝑖^* 𝑉^* = 𝑈Σ𝑉^* = 𝑋.

Evidently, this is a minimal rank decomposition. However, we could also have included an additional invertible (𝑟 × 𝑟) map 𝑇:

𝑋 = 𝑈Σ𝑉^* = 𝑈Σ𝑇 𝑇^{−1}𝑉^* = ∑_{𝑖=1}^{𝑟} 𝑢̃_𝑖 𝑣̃_𝑖^*,

where the 𝑢̃_𝑖 are the columns of 𝑈Σ𝑇 and the 𝑣̃_𝑖^* are the rows of 𝑇^{−1}𝑉^*. This is also a valid minimal rank decomposition. Unless 𝑇 is a signed permutation matrix or a diagonal scaling matrix, this alternative decomposition is not related to the original one via a trivial symmetry operation.
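A numerical illustration of this non-uniqueness; the random test matrix and the generic invertible 𝑇 below are arbitrary choices.

```python
# Mixing the SVD factors with an invertible T yields a different, equally
# valid minimal rank decomposition of the same X.
import numpy as np

rng = np.random.default_rng(3)
r = 3
X = rng.normal(size=(6, r)) @ rng.normal(size=(r, 5))      # rank-3 matrix

U, s, Vh = np.linalg.svd(X, full_matrices=False)
W1 = U[:, :r] * s[:r]                                      # X = W1 @ H1
H1 = Vh[:r]

T = rng.normal(size=(r, r))                                # generic invertible map
W2 = W1 @ T
H2 = np.linalg.inv(T) @ H1
assert np.allclose(W1 @ H1, X) and np.allclose(W2 @ H2, X)
```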
5 Upper bounds on the maximal rank

Theorem 5.1. The rank of any 𝑋 ∈ ℒ(𝑉, 𝑊) obeys 𝑟 ≤ min{dim(𝑉), dim(𝑊)}.

Proof. Use the equivalent definitions of rank from Proposition 2.3. The column-rank definition readily implies

𝑟 = dim(Im(𝑋)) ≤ dim(𝑊),

while the row-rank definition ensures

𝑟 = dim(Im(𝑋^*)) ≤ dim(𝑉).

Both bounds hold simultaneously, so we may without loss take the minimum of the two to make the bound as tight as possible.
6 Typical rank

Typical rank addresses the following question: what matrix rank do we expect to see for generic, or typical, matrices? Such typicality statements require endowing the space of all operators ℒ(𝑉, 𝑊) with a "fair" measure. One way to achieve this is to consider random matrices with independent entries that follow a continuous distribution. Standard Gaussian matrices meet all these desiderata:

𝐺 = [ 𝑔_{11}, …, 𝑔_{1𝑛} ; ⋮ ⋱ ⋮ ; 𝑔_{𝑛1}, …, 𝑔_{𝑛𝑛} ],

where each 𝑔_{𝑖𝑗} is an independent instance of a standard Gaussian random variable: 𝑔_{𝑖𝑗} ~ 𝒩(0, 1) i.i.d. for real-valued matrices and 𝑔_{𝑖𝑗} ~ 𝒩(0, 2^{−1/2}) + i 𝒩(0, 2^{−1/2}) i.i.d. for complex-valued matrices. Such Gaussian matrices correspond to matrix representations of generic operators. Rotational invariance moreover ensures that the choice of basis is irrelevant.
Fact 6.1. A typical/generic matrix — e.g. a matrix with standard Gaussian entries — saturates the rank inequality 𝑟 ≤ min{dim(𝑉), dim(𝑊)} with probability one.

This fundamental fact from random matrix theory holds true regardless of whether we work with real-valued or complex-valued matrices. It highlights that the matrix rank bound is tight in a strong sense: it is saturated for almost all matrices.
7 Low-rank approximations and the Eckart-Young theorem

Theorem 7.1 (Eckart-Young-Mirsky theorem). Let 𝑋 ∈ ℒ(𝑉, 𝑊) be a matrix with SVD 𝑋 = 𝑈Σ𝑉^*. Then, the best rank-𝑘 approximation is the truncated SVD:

𝑋_𝑘 = 𝑈 diag(𝜎_1, …, 𝜎_𝑘, 0, …, 0) 𝑉^*.

It achieves

‖𝑋_𝑘 − 𝑋‖_𝐹² = ∑_{𝑖=𝑘+1}^{𝑟} 𝜎_𝑖².

Remark 7.2. The original Eckart-Young theorem proves optimality of the truncated SVD for approximation in operator norm.
Proof. We seek the solution to the following problem:

minimize ‖𝑋 − 𝑌‖_𝐹²  subject to rank(𝑌) = 𝑘.

First, note that

‖𝑋 − 𝑌‖_𝐹² = tr(𝑋𝑋^*) − tr(𝑋𝑌^*) − tr(𝑌^*𝑋) + tr(𝑌𝑌^*) = ‖𝑋‖_𝐹² + ‖𝑌‖_𝐹² − tr(𝑋𝑌^*) − tr(𝑌^*𝑋).

We need to make these trace inner products as large as possible. Von Neumann's trace inequality (which uses Birkhoff-von Neumann) states that

|tr(𝑋𝑌^*)| ≤ ∑_{𝑖=1}^{𝑛} 𝜎_𝑖(𝑋) 𝜎_𝑖(𝑌),

with equality if and only if 𝑋 = 𝑈Σ𝑉^* and 𝑌 = 𝑈𝐷𝑉^* share the same singular vectors. This tells us that the SVD provides us with the "right" basis rotations: 𝑌 = ∑_{𝑖=1}^{𝑛} 𝑑_𝑖 𝑢_𝑖 𝑣_𝑖^*. But at most 𝑘 of the 𝑑_𝑖 can be non-zero. Therefore,

‖𝑋 − 𝑌‖_𝐹² = ∑_{𝑖=1}^{𝑛} 𝜎_𝑖² + ∑_{𝑖=1}^{𝑛} 𝑑_𝑖² − 2 ∑_{𝑖=1}^{𝑛} 𝜎_𝑖 𝑑_𝑖 = ∑_{𝑖=1}^{𝑛} (𝜎_𝑖 − 𝑑_𝑖)².

This expression is minimized if we set 𝑑_1 = 𝜎_1, …, 𝑑_𝑘 = 𝜎_𝑘 and 𝑑_{𝑘+1} = ⋯ = 𝑑_𝑛 = 0.
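A quick numerical check of the theorem; the matrix size and truncation rank below are arbitrary.

```python
# The truncated SVD achieves a squared Frobenius error equal to the sum of
# the discarded squared singular values.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 8))
U, s, Vh = np.linalg.svd(X)

k = 3
Xk = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]        # best rank-k approximation
err = np.linalg.norm(X - Xk, "fro") ** 2
assert np.isclose(err, (s[k:] ** 2).sum())
```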
This theorem has profound implications for data processing.
โ Greed is good: The theorem justifies a greedy approach to matrix factorization:find the largest rank-one factor, peel it off and repeat. You can expect progressby going from rank-๐ to rank-(๐ + 1).
โ Dimension reduction: The Eckart-Young-Mirski theorem provides the foundationfor dimension reduction (such as principal component analysis) in data analysis โan important subroutine in machine learning. The idea of low-rank approximationis also used extensively in modern recommendation system such as those poweringYouTube and Netflix.
โ Noise resilience: If you wiggle the original matrix a little bit, the best rank-๐approximation changes slightly but is still a good approximation to the originalmatrix. Hence it is robust against (small) noise corruption.
Fact 7.3. The Eckart-Young theorem justifies greedy approaches to matrix factorization: find the largest rank-one factor, peel it off, and iterate. Increasing the rank of an approximation gets you closer to the true underlying matrix.
Lecture 10: Tensor rank

Scribe: Richard Kueng

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
May 01, 2019
1 Agenda

Last lecture was devoted to studying several desirable features of the matrix rank, as well as their implications. Today, we will extend this rank discussion to tensors. We shall see that, as a rule, tensor rank behaves very differently from its matrix counterpart and is considerably more challenging to handle.

1. Definition of tensor rank
2. Computation of tensor rank
3. Uniqueness of minimal rank decompositions
4. Upper bounds on the maximal rank
5. Tensor rank depends on the underlying field (C vs. R)
6. Typical rank
7. Low-rank approximations and border rank
8. Examples: the standard inner product and the Hadamard product as tensors
We will mostly restrict our attention to tensors of order three. Generalizations to tensors of higher order are straightforward.
2 Recapitulation: Matrix rank

The rank of an operator 𝑋 ∈ ℒ(𝑉, 𝑊) is the smallest number of rank-one tensors required to represent 𝑋:

𝑋 = ∑_{𝑖=1}^{𝑟} 𝑤_𝑖 ⊗ 𝑣_𝑖^*.

It has many interesting and desirable properties:

1. Computation: several efficient algorithms exist for computing the matrix rank.
2. Identifying minimal rank decompositions and uniqueness: the SVD provides an efficient way to compute minimal rank decompositions. These are never unique.
3. Upper bound: 𝑟 ≤ min{dim(𝑉), dim(𝑊)}.
4. Typical rank: a generic matrix saturates the rank bound with probability one (regardless of whether we work over R or C).
5. Low-rank approximations: the Eckart-Young-Mirsky theorem states that the best rank-𝑘 approximation is a truncated SVD. Increasing 𝑘 can only increase approximation accuracy.
3 Definition of tensor rank

Let 𝐴, 𝐵, 𝐶 be inner product spaces with dimensions 𝑎, 𝑏 and 𝑐. We also assume

𝑎 ≥ 𝑏 ≥ 𝑐.

Recall that the tensor product space 𝐴 ⊗ 𝐵 ⊗ 𝐶 is the linear hull of all elementary tensor products:

𝑡 = 𝑎 ⊗ 𝑏 ⊗ 𝑐,  𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵, 𝑐 ∈ 𝐶.

Definition 3.1. A tensor 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 has rank one if it corresponds to an elementary tensor product.

Definition 3.2. The rank of a tensor 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 is the smallest number 𝑟 such that 𝑡 can be represented as a sum of 𝑟 rank-one tensors:

𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖.
4 Computing the tensor rank

In contrast to the matrix rank — where we can choose between several different efficient numerical algorithms — computing the tensor rank is in general very challenging.

Fact 4.1 (Håstad, 1990). Computing the tensor rank of 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 is difficult. It is NP-complete over any finite field and NP-hard over the rational numbers.

The proof follows from a standard reduction from 3SAT, which is known to be NP-complete.
5 Uniqueness of minimal rank decompositions

Any factorization into sums of rank-one tensors is necessarily accompanied by trivial ambiguities. Let

𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖

be a minimal rank decomposition of 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶. Then, this decomposition is invariant under permuting factor triples: (𝑎_𝑖, 𝑏_𝑖, 𝑐_𝑖) ↦ (𝑎_{𝜋(𝑖)}, 𝑏_{𝜋(𝑖)}, 𝑐_{𝜋(𝑖)}), where 𝜋 ∈ 𝒮_𝑟 can be any permutation. Indeed,

∑_{𝑖=1}^{𝑟} 𝑎_{𝜋(𝑖)} ⊗ 𝑏_{𝜋(𝑖)} ⊗ 𝑐_{𝜋(𝑖)} = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖 = 𝑡.

Similarly, scaling of individual factors also leaves the final decomposition invariant. Choose 𝛼_𝑖, 𝛽_𝑖 ≠ 0. Then, (𝑎_𝑖, 𝑏_𝑖, 𝑐_𝑖) ↦ (𝛼_𝑖 𝑎_𝑖, 𝛽_𝑖 𝑏_𝑖, (𝛼_𝑖𝛽_𝑖)^{−1} 𝑐_𝑖) provides a rank-𝑟 decomposition of 𝑡. Permutation and scaling of factors are trivial ambiguities that cannot be avoided. A minimal rank decomposition is unique up to trivial ambiguities
if it is unique up to scaling and permutation of factors. Recall that minimal rank decompositions of operators (matrices) are never unique, unless one imposes strong additional assumptions and constraints. The situation for tensors of order 𝑛 ≥ 3 is very different. A seminal result by Kruskal ensures uniqueness of minimal rank decompositions under much weaker conditions.

To state this result, we are going to introduce the following useful notation by Kolda. Let 𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖 be a decomposition of 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 into rank-one tensors. Define the three factor matrices

𝐴 = [𝑎_1, …, 𝑎_𝑟], 𝐵 = [𝑏_1, …, 𝑏_𝑟], 𝐶 = [𝑐_1, …, 𝑐_𝑟],

and set

𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖 =: [[𝐴, 𝐵, 𝐶]].

It is easy to keep track of permutation ambiguities in this notation. Let Π ∈ R^{𝑟×𝑟} be a permutation matrix. Then,

[[𝐴Π, 𝐵Π, 𝐶Π]] = [[𝐴, 𝐵, 𝐶]].
Scaling ambiguities act in a similar fashion.
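A minimal sketch of the bracket notation with np.einsum, together with a check of the permutation invariance above; the dimensions are arbitrary.

```python
# Kolda's bracket [[A, B, C]]: sum of outer products of matching columns.
import numpy as np

def kolda(A, B, C):
    # t = sum_i a_i (x) b_i (x) c_i, with a_i, b_i, c_i the columns of A, B, C
    return np.einsum("ai,bi,ci->abc", A, B, C)

rng = np.random.default_rng(5)
A, B, C = (rng.normal(size=(4, 3)) for _ in range(3))
perm = np.array([2, 0, 1])
assert np.allclose(kolda(A, B, C), kolda(A[:, perm], B[:, perm], C[:, perm]))
```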
Definition 5.1. The 𝑘-rank of a matrix 𝐴 is the largest number 𝑘_𝐴 such that any 𝑘_𝐴 columns of 𝐴 are linearly independent.

This concept is closely related to the spark of a matrix (the smallest number 𝑘 such that there exists a set of 𝑘 linearly dependent columns).
Theorem 5.2 (Kruskal, 1977). Suppose that 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 admits a decomposition

𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖 = [[𝐴, 𝐵, 𝐶]]

and suppose that

𝑟 ≤ (1/2)(𝑘_𝐴 + 𝑘_𝐵 + 𝑘_𝐶) − 1.

Then 𝑡 has rank 𝑟 and this decomposition is unique up to trivial symmetries (permutations and scaling).
6 Upper bounds on the maximal tensor rank

Upper bounds on the maximal rank of tensors exist, but they are much weaker than in the matrix case.

Theorem 6.1. Consider 𝐴 ⊗ 𝐵 ⊗ 𝐶 with dimensions 𝑎, 𝑏 and 𝑐. Then, the rank of any 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 obeys

𝑟(𝑡) ≤ min{𝑎𝑏, 𝑎𝑐, 𝑏𝑐}.
The proof of this claim is instructive, because it follows a basic line of thought: we know very little about tensors, but a lot about matrices. Therefore, it is often beneficial to convert tensor problems into matrix problems. The following technical result follows from such a reduction argument.

Lemma 6.2. Let 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶. Then, its rank 𝑟 equals the minimal number of rank-one matrices required to span (a space containing) all possible marginalizations

𝑡(𝐴^*) := span{ (𝑎^* ⊗ I ⊗ I)𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎^*(𝑎_𝑖) 𝑏_𝑖 ⊗ 𝑐_𝑖 : 𝑎^* ∈ 𝐴^* } ⊂ 𝐵 ⊗ 𝐶 ≃ ℒ(𝐶, 𝐵).

Proof. Suppose that 𝑡 has rank 𝑟 and express it as 𝑡 = ∑_{𝑖=1}^{𝑟} 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖. (Note that, in contrast to matrices, the vectors 𝑎_𝑖, 𝑏_𝑖 and 𝑐_𝑖 need not be linearly independent.) Therefore,

𝑡(𝐴^*) ⊂ span{𝑏_1 ⊗ 𝑐_1, …, 𝑏_𝑟 ⊗ 𝑐_𝑟}

is spanned by at most 𝑟 rank-one matrices. Conversely, suppose that 𝑡(𝐴^*) is spanned by 𝑟 rank-one matrices 𝑏_1 ⊗ 𝑐_1, …, 𝑏_𝑟 ⊗ 𝑐_𝑟. Choose an (orthonormal) basis 𝑎_1^*, …, 𝑎_𝑎^* of 𝐴^*. Then,

𝑡(𝑎_𝑗^*) = ∑_{𝑖=1}^{𝑟} 𝑥_{𝑗,𝑖} 𝑏_𝑖 ⊗ 𝑐_𝑖.

Next, let 𝑎_1, …, 𝑎_𝑎 be the dual vectors associated with 𝑎_1^*, …, 𝑎_𝑎^* (column vs. row vectors). Then,

𝑡 = ∑_{𝑗,𝑖} 𝑥_{𝑗,𝑖} 𝑎_𝑗 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖 = ∑_{𝑖=1}^{𝑟} ( ∑_{𝑗} 𝑥_{𝑗,𝑖} 𝑎_𝑗 ) ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖 =: ∑_{𝑖=1}^{𝑟} 𝑎̃_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖.

This is a valid decomposition into exactly 𝑟 rank-one factors. Therefore, the tensor rank can be at most 𝑟.
Proof of Theorem 6.1. The space ℒ(𝐶, 𝐵) has dimension 𝑏𝑐 = dim(𝐶) dim(𝐵), and it is spanned by 𝑏𝑐 rank-one matrices (the standard basis). With the previous lemma, we conclude

𝑟 ≤ dim(𝑡(𝐴^*)) ≤ dim(ℒ(𝐶, 𝐵)) = 𝑏𝑐.

Permuting tensor factors also establishes 𝑎𝑏 and 𝑎𝑐 as upper bounds.
7 The tensor rank depends on the underlying field

Choose real-valued vector spaces 𝑉 = R^𝑛 and 𝑊 = R^𝑚. Then, we may represent 𝑋 ∈ ℒ(𝑉, 𝑊) as a real-valued matrix. Alternatively, we could embed 𝑉 ↪ C^𝑛, 𝑊 ↪ C^𝑚 and extend 𝑋 linearly to ℒ(C^𝑛, C^𝑚). The matrix rank does not care: it is the same in both cases.
This is not the case for tensors, as the following example from Kruskal shows. Fix dim(𝐴) = dim(𝐵) = dim(𝐶) = 2 and set

𝑡 = 𝑒_1 ⊗ 𝑒_1 ⊗ 𝑒_1 + 𝑒_1 ⊗ 𝑒_2 ⊗ 𝑒_2 − 𝑒_2 ⊗ 𝑒_2 ⊗ 𝑒_1 − 𝑒_2 ⊗ 𝑒_1 ⊗ 𝑒_2.

Over R, this tensor has rank three: 𝑡 = [[𝐴, 𝐵, 𝐶]] with

𝐴 = [ 1, 0, 1 ; 0, 1, −1 ],  𝐵 = [ 1, 0, 1 ; 0, 1, 1 ],  𝐶 = [ 1, 1, 0 ; −1, 1, 1 ].

However, over C, we can find a rank-two decomposition 𝑡 = [[𝐴′, 𝐵′, 𝐶′]] with

𝐴′ = (1/√2) [ 1, 1 ; −i, i ],  𝐵′ = (1/√2) [ 1, 1 ; i, −i ],  𝐶′ = [ 1, 1 ; i, −i ].
8 Typical rank

The matrix rank bound 𝑟 ≤ min{dim(𝑉), dim(𝑊)} is useful: a typical matrix is going to saturate it.

For tensors, the situation is quite different. A simple parameter-counting argument suggests the following typical behavior. Consider the tensor product space 𝐴_1 ⊗ ⋯ ⊗ 𝐴_𝑘 with dimensions 𝑑_1, …, 𝑑_𝑘. The number of degrees of freedom of a rank-𝑟 tensor is

𝑟(𝑑_1 + ⋯ + 𝑑_𝑘) − 𝑟(𝑘 − 1) = 𝑟(𝑑_1 + ⋯ + 𝑑_𝑘 − (𝑘 − 1)),

while the total number of degrees of freedom is

dim(𝐴_1 ⊗ ⋯ ⊗ 𝐴_𝑘) = 𝑑_1 × ⋯ × 𝑑_𝑘.

We expect that the typical rank occurs precisely at the threshold where both numbers become equal:

𝑟 = ⌈ 𝑑_1 ⋯ 𝑑_𝑘 / (𝑑_1 + ⋯ + 𝑑_𝑘 − 𝑘 + 1) ⌉.

This simple counting argument is approximately correct. Here are some rigorous results that provide some insight.

Theorem 8.1.
1. (Strassen) The typical rank of an element of C³ ⊗ C³ ⊗ C³ is five (not the expected four).
2. (Strassen-Lickteig) For all 𝑑 ≠ 3, the typical rank of an element of C^𝑑 ⊗ C^𝑑 ⊗ C^𝑑 is ⌈𝑑³/(3𝑑 − 2)⌉ (as expected).
3. The typical rank of an element of C² ⊗ C² ⊗ C³ and of C² ⊗ C³ ⊗ C³ is three (as expected).

For tensor products with equal dimensions (𝑑_1 = ⋯ = 𝑑_𝑘 = 𝑑), we have

𝑟 = ⌈ 𝑑^𝑘 / (𝑘(𝑑 − 1) + 1) ⌉ ∼ 𝑑^{𝑘−1}/𝑘 < 𝑑^{𝑘−1}.
The typical rank does not saturate the maximal rank bound!

Finally, the typical rank also depends on the underlying field. Over C the typical rank is unique, but over R this need not be the case. Tensors in R² ⊗ R² ⊗ R² have typical ranks two and three over R. Monte Carlo experiments reveal that rank-two tensors fill about 79% of the space, while rank-three tensors fill the remaining 21%. Rank-one tensors are possible, but occur with probability zero.
9 Low-rank approximations and border rank

Recall that the best rank-𝑘 approximation of a matrix is given by the truncated SVD (Eckart-Young theorem). This supports greedy, iterative strategies to approximate a matrix.

Similar approaches seem like a promising avenue to generalize to tensors. Suppose 𝑡 ∈ 𝐴 ⊗ 𝐵 ⊗ 𝐶 admits a decomposition

𝑡 = ∑_{𝑖=1}^{𝑟} 𝜎_𝑖 𝑎_𝑖 ⊗ 𝑏_𝑖 ⊗ 𝑐_𝑖,

where ⟨𝑎_𝑖, 𝑎_𝑖⟩ = ⟨𝑏_𝑖, 𝑏_𝑖⟩ = ⟨𝑐_𝑖, 𝑐_𝑖⟩ = 1 for all 1 ≤ 𝑖 ≤ 𝑟. Assume, moreover, that the weights are arranged in non-increasing order: 𝜎_1 ≥ 𝜎_2 ≥ ⋯ ≥ 𝜎_𝑟. Apparent parallels to the SVD suggest the following iterative approach for approximating 𝑡:

1. Identify the largest contributing rank-one factor 𝜎_1 𝑎_1 ⊗ 𝑏_1 ⊗ 𝑐_1 (somehow).
2. Subtract its contribution and iterate.

We expect that this greedy method provides us with better and better approximations of 𝑡.
Unfortunately, this intuition is flawed. Tensors are much more complicated than matrices. We illustrate this with the following simple example:

𝑡 = 𝑎_1 ⊗ 𝑏_1 ⊗ 𝑐_2 + 𝑎_1 ⊗ 𝑏_2 ⊗ 𝑐_1 + 𝑎_2 ⊗ 𝑏_1 ⊗ 𝑐_1,

where 𝑎_1, 𝑎_2 ∈ 𝐴, 𝑏_1, 𝑏_2 ∈ 𝐵 and 𝑐_1, 𝑐_2 ∈ 𝐶 are each linearly independent. Evidently, this tensor has rank three. It can, however, be approximated to arbitrary accuracy by a rank-two tensor:

𝑠(𝜖) = (1/𝜖) ( (𝑎_1 + 𝜖𝑎_2) ⊗ (𝑏_1 + 𝜖𝑏_2) ⊗ (𝑐_1 + 𝜖𝑐_2) − 𝑎_1 ⊗ 𝑏_1 ⊗ 𝑐_1 ).

More precisely, let ‖·‖ be the Euclidean norm induced by the extended standard inner product on 𝐴 ⊗ 𝐵 ⊗ 𝐶. Then,

‖𝑡 − 𝑠(𝜖)‖ = 𝜖 ‖𝑎_2 ⊗ 𝑏_2 ⊗ 𝑐_1 + 𝑎_2 ⊗ 𝑏_1 ⊗ 𝑐_2 + 𝑎_1 ⊗ 𝑏_2 ⊗ 𝑐_2 + 𝜖 𝑎_2 ⊗ 𝑏_2 ⊗ 𝑐_2‖,

which can be made arbitrarily small; a numerical illustration follows below. Many different simple examples of this behavior are known. These examples motivate the following definition:
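The sketch below evaluates ‖𝑡 − 𝑠(𝜖)‖ numerically for the example above, reusing the standard basis for the 𝑎_𝑖, 𝑏_𝑖, 𝑐_𝑖 (any linearly independent choice would do): the rank-two tensors 𝑠(𝜖) approach the rank-three tensor 𝑡 linearly in 𝜖.

```python
# Border rank in action: s(eps) has rank two, yet ||t - s(eps)|| -> 0.
import numpy as np

a1, a2 = np.eye(2)                 # standard basis doubles as a_i, b_i, c_i
outer3 = lambda x, y, z: np.einsum("a,b,c->abc", x, y, z)

t = outer3(a1, a1, a2) + outer3(a1, a2, a1) + outer3(a2, a1, a1)
for eps in [1e-1, 1e-3, 1e-5]:
    s = (outer3(a1 + eps * a2, a1 + eps * a2, a1 + eps * a2)
         - outer3(a1, a1, a1)) / eps
    print(eps, np.linalg.norm(t - s))      # error shrinks linearly in eps
```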
Definition 9.1 (border rank). A tensor 𝑡 has border rank 𝑏(𝑡) = 𝑏 if it is a limit of tensors of rank 𝑏, but not a limit of tensors of rank 𝑐 < 𝑏.
There is an elegant geometric interpretation of this behavior. Intuitively, 𝑠(𝜖) is a point on a line through rank-one tensors inside the set of rank-two tensors. Taking the limit results in a point in the tangent space at 𝑎_1 ⊗ 𝑏_1 ⊗ 𝑐_1. This point on the tangent line is itself not contained in the set of rank-two tensors, but is infinitesimally close to it.

The property of having border rank at most 𝑟 is an algebraic property — similar to the matrix-rank characterization via vanishing minors. As such it can in principle be tested exactly by checking whether certain polynomial equations vanish identically. While this is not efficient by any means, it provides at least a strategy that can be executed for very small tensor products.
Remark 9.2 (Relation between rank and border rank). Very little is known about the relation between rank and border rank. For 𝑡 ∈ 𝐴_1 ⊗ ⋯ ⊗ 𝐴_𝑑 with border rank two, the actual rank can be anywhere between 2 and 𝑑. More is known for symmetric tensors 𝑡 ∈ ∨^𝑑(𝐴), because they are closely related to homogeneous polynomials.
10 Examples

10.1 The standard inner product as an order-2 tensor (matrix)

Fix 𝑉 = F^𝑑 and define the standard inner product:

𝑉^* × 𝑉 → F,  ⟨𝑥, 𝑦⟩ = ∑_{𝑖=1}^{𝑑} 𝑥̄_𝑖 𝑦_𝑖.

This is a bilinear form. The space of bilinear forms is closely related to the tensor product 𝑉 ⊗ 𝑉. In fact, we defined 𝑉 ⊗ 𝑉 to be the dual space of the space of all bilinear forms. In particular,

𝑥 ⊗ 𝑦 : Bil(𝑉, 𝑉) → F,  𝐵 ↦ 𝐵(𝑥, 𝑦).

A moment of thought reveals that the tensor associated with the inner product is

∑_{𝑖=1}^{𝑑} 𝑒_𝑖 ⊗ 𝑒_𝑖^* ∈ 𝑉 ⊗ 𝑉^*,

which corresponds to the identity operator I ∈ ℒ(𝑉, 𝑉). We know that this operator has matrix rank 𝑑 and minimal decompositions are never unique:

I = ∑_{𝑖=1}^{𝑑} (𝑈𝑒_𝑖) ⊗ (𝑈𝑒_𝑖)^* for any unitary 𝑈.

This is just a fancy way of saying that the standard inner product is basis-independent. Next, we turn to the question of computing the inner product:

⟨𝑥, 𝑦⟩ = 𝑥^* I 𝑦 = ∑_{𝑖=1}^{𝑑} 𝑥^* 𝑢_𝑖 𝑢_𝑖^* 𝑦 = ∑_{𝑖=1}^{𝑑} ⟨𝑥, 𝑢_𝑖⟩ ⟨𝑢_𝑖, 𝑦⟩.
Evaluating this expression requires computing 2𝑑 different inner products in general. A smart choice of basis substantially reduces the cost of the individual scalar-product evaluations. If we opt for the standard basis, both ⟨𝑥, 𝑒_𝑖⟩ and ⟨𝑒_𝑖, 𝑦⟩ are very cheap to evaluate. The total arithmetic cost becomes 𝒪(𝑑).

We could also use a very bad basis — e.g. a generic ONB. In this case the arithmetic cost could blow up to 𝒪(𝑑²).
10.2 The Hadamard product as an order-3 tensor

Endow 𝑉 = F^𝑑 with the standard basis 𝑒_1, …, 𝑒_𝑑. The Hadamard product is typically defined as

𝑉 × 𝑉 → 𝑉 :  𝑥 ⊙ 𝑦 = ∑_{𝑖=1}^{𝑑} 𝑒_𝑖 ⟨𝑒_𝑖, 𝑥⟩ ⟨𝑒_𝑖, 𝑦⟩.

We can view this as a tensor in 𝑉 ⊗ 𝑉^* ⊗ 𝑉^*:

⊙ = ∑_{𝑖=1}^{𝑑} 𝑒_𝑖 ⊗ 𝑒_𝑖^* ⊗ 𝑒_𝑖^*.

Modulo vector space dualities (𝑉 ≃ 𝑉^*), this tensor looks like the standard extension of the identity to order three:

⊙ = ∑_{𝑖=1}^{𝑑} 𝑒_𝑖 ⊗ 𝑒_𝑖 ⊗ 𝑒_𝑖 = [[I, I, I]].

Kruskal's theorem on uniqueness shows that this order-three tensor is essentially unique.

Corollary 10.1. The Hadamard tensor ⊙ ∈ 𝑉^{⊗3} has rank 𝑟 = 𝑑 and is unique up to trivial ambiguities (permutations and scaling), provided that 𝑑 ≥ 2.

Proof. The decomposition above has rank 𝑟 = 𝑑 and the individual factor matrices each obey 𝑘_I = 𝑑. Correctness of the claimed rank and uniqueness then follow from checking Kruskal's condition:

𝑟 = 𝑑 ≤ (1/2)(𝑘_I + 𝑘_I + 𝑘_I) − 1 = (3/2)𝑑 − 1.

The resulting inequality is true provided that 𝑑 ≥ 2.
The uniqueness requirement 𝑑 ≥ 2 is perhaps tautological, but worth noting: for 𝑑 = 1, the Hadamard product and the standard inner product coincide. We also conclude the following well-known fact.

Fact 10.2. In contrast to the standard inner, wedge and tensor products, the Hadamard product is basis-dependent.
The Hadamard tensor also does not saturate the upper bound on tensor rank:

𝑟 = 𝑟(⊙) = 𝑑 ≪ 𝑑².

This has meaningful consequences for the computational cost. Similar to the matrix case, the cost of evaluating tensors is strongly connected to the rank: a rank-𝑟 tensor requires at least 𝑟 individual arithmetic operations.

For the Hadamard product this cost 𝑟 = 𝑑 is tight. We can use desirable properties of the standard basis to compute ⟨𝑒_𝑖, 𝑥⟩, ⟨𝑒_𝑖, 𝑦⟩ and their product at unit cost.
Fact 10.3. The arithmetic cost of computing the Hadamard product is proportional to rank(⊙) = 𝑑.
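A quick numerical check of Fact 10.2: performing the entrywise product in a rotated (generic) orthonormal basis and rotating back does not reproduce 𝑥 ⊙ 𝑦.

```python
# The Hadamard product is basis-dependent.
import numpy as np

rng = np.random.default_rng(6)
d = 4
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))     # generic orthonormal basis
x, y = rng.normal(size=d), rng.normal(size=d)

direct = x * y                                   # Hadamard in the standard basis
rotated = Q @ ((Q.T @ x) * (Q.T @ y))            # Hadamard in the basis Q
print(np.linalg.norm(direct - rotated))          # generically nonzero
```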
Lecture 11: Strassen's algorithm for matrix multiplication

Scribe: Richard Kueng

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
April 29, 2019
1 Agenda

1. Recapitulation: matrix multiplication
2. Strassen's matrix multiplication algorithm
3. The arithmetic complexity model and rigorous improvements for the complexity of matrix multiplication
2 Recapitulation: matrix multiplication

Let 𝑄, 𝑅 ∈ R^{𝑑×𝑑}. We use Einstein notation to label the individual matrix entries: 𝑄^𝑖_𝑗 denotes the entry in the 𝑖-th row (upper index) and the 𝑗-th column (lower index). The matrix product 𝑇 = 𝑄𝑅 ∈ R^{𝑑×𝑑} of two square matrices is defined component-wise:

[ 𝑄^1_1, ⋯, 𝑄^1_𝑑 ; ⋮ ⋱ ⋮ ; 𝑄^𝑑_1, ⋯, 𝑄^𝑑_𝑑 ] [ 𝑅^1_1, ⋯, 𝑅^1_𝑑 ; ⋮ ⋱ ⋮ ; 𝑅^𝑑_1, ⋯, 𝑅^𝑑_𝑑 ] = [ ∑_{𝑘=1}^{𝑑} 𝑄^1_𝑘 𝑅^𝑘_1, ⋯, ∑_{𝑘=1}^{𝑑} 𝑄^1_𝑘 𝑅^𝑘_𝑑 ; ⋮ ⋱ ⋮ ; ∑_{𝑘=1}^{𝑑} 𝑄^𝑑_𝑘 𝑅^𝑘_1, ⋯, ∑_{𝑘=1}^{𝑑} 𝑄^𝑑_𝑘 𝑅^𝑘_𝑑 ].

More succinctly: let 𝑇 = 𝑄𝑅 ∈ R^{𝑑×𝑑} be the matrix product. Then, its coefficients correspond to

𝑇^𝑖_𝑗 = ∑_{𝑘=1}^{𝑑} 𝑄^𝑖_𝑘 𝑅^𝑘_𝑗 for all 1 ≤ 𝑖, 𝑗 ≤ 𝑑.   (1)
Remark 2.1 (Restriction to real-valued square matrices). For today, we will restrict our attention to matrix products of real-valued, square 𝑑 × 𝑑 matrices. More general rectangular matrices can be converted into square matrices by adding rows/columns of zeros. Moreover, the product of complex-valued matrices can be decomposed into real and imaginary parts. This decomposition results in four sub-multiplications that are effectively real-valued.
Formula (1) suggests the following general cost for computing the product 𝑇 = 𝑄𝑅 of two 𝑑 × 𝑑 matrices. Computing a coefficient 𝑇^𝑖_𝑗 requires 𝑑 elementary multiplications (𝑄^𝑖_𝑘 𝑅^𝑘_𝑗 for 1 ≤ 𝑘 ≤ 𝑑) and 𝑑 subsequent elementary additions. There are in total 𝑑² coefficients, so the total arithmetic cost is 𝑑³ elementary multiplications and 𝑑³ elementary additions.

Fact 2.2. The arithmetic cost of matrix multiplication by means of Formula (1) is 2𝑑³.
This total arithmetic cost is of order 𝒪(𝑑³). A natural question is whether this scaling in the problem size is optimal. Any improvement could lead to faster matrix multiplication algorithms. This is highly desirable in practice, since matrix multiplications are at the very core of most numerical linear algebra techniques. We use the following notation to indicate such potential improvements.

Definition 2.3 (exponent of matrix multiplication). The exponent of matrix multiplication 𝜔 is the smallest number such that a (potentially asymptotic) matrix multiplication algorithm exists whose arithmetic cost obeys 𝒪(𝑑^𝜔).

Fact 2.2 asserts 𝜔 ≤ 3. Also, we do not make any assumption about additional structure in the matrices that we wish to multiply. A general 𝑑 × 𝑑 matrix has 𝑑² degrees of freedom. This imposes a fundamental lower bound on the matrix multiplication exponent: 2𝑑² arithmetic operations are necessary just to read in the problem description. Combining both yields

2 < 𝜔 ≤ 3.   (2)
Naively, one might assume that 𝜔 = 3 (standard matrix multiplication) is optimal. However, this is not the case. The current record is

𝜔 ≤ 2.3729,

achieved by Le Gall in 2014. This remarkable improvement is a consequence of tensor analysis. We will devote this lecture and the next one to pointing out the ideas and methods behind these impressive developments.
3 Strassen's algorithms

3.1 Strassen's algorithm for multiplying 2 × 2 matrices

All fundamental improvements in the cost of matrix multiplication date back to a key observation due to Strassen from 1969. According to Landsberg, Strassen tried to prove that the naive matrix multiplication cost of 𝜔 = 3 is optimal. In order to achieve this goal, he focused on 2 × 2 matrices defined over a finite field, where an exhaustive analysis is possible. His thorough analysis had quite the opposite effect: he found an alternative way of doing matrix multiplication that readily generalizes to any field. Set 𝑑 = 2 and define the following seven numbers:

𝑚_1 = (𝑄^1_1 + 𝑄^2_2)(𝑅^1_1 + 𝑅^2_2),
𝑚_2 = (𝑄^2_1 + 𝑄^2_2) 𝑅^1_1,
𝑚_3 = 𝑄^1_1 (𝑅^1_2 − 𝑅^2_2),
𝑚_4 = 𝑄^2_2 (𝑅^2_1 − 𝑅^1_1),
𝑚_5 = (𝑄^1_1 + 𝑄^1_2) 𝑅^2_2,
𝑚_6 = (𝑄^2_1 − 𝑄^1_1)(𝑅^1_1 + 𝑅^1_2),
𝑚_7 = (𝑄^1_2 − 𝑄^2_2)(𝑅^2_1 + 𝑅^2_2).
One can then check that all entries of the product 𝑇 = 𝑄𝑅 ∈ R^{2×2} correspond to elementary linear combinations of these seven numbers:

𝑇 = [ 𝑚_1 + 𝑚_4 − 𝑚_5 + 𝑚_7 , 𝑚_3 + 𝑚_5 ; 𝑚_2 + 𝑚_4 , 𝑚_1 − 𝑚_2 + 𝑚_3 + 𝑚_6 ] ∈ R^{2×2}.
It is instructive to group this algorithm into three stages; a direct transcription into code follows the list.

1. Linear pre-processing: compute 𝑠 = 7 linear combinations of the entries of each original input matrix:

𝛼_1 = 𝑄^1_1 + 𝑄^2_2,  𝛼_2 = 𝑄^2_1 + 𝑄^2_2,  …,  𝛼_7 = 𝑄^1_2 − 𝑄^2_2,
𝛽_1 = 𝑅^1_1 + 𝑅^2_2,  𝛽_2 = 𝑅^1_1,  …,  𝛽_7 = 𝑅^2_1 + 𝑅^2_2.

2. Elementary multiplications: compute 𝑚_𝑖 = 𝛼_𝑖 𝛽_𝑖 for each 1 ≤ 𝑖 ≤ 7 = 𝑠.

3. Linear post-processing: infer the entries of the final matrix product by computing linear combinations of the 𝑚_𝑖's.
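A direct NumPy transcription of the three stages (0-indexed, so 𝑄^1_1 becomes Q[0, 0]), checked against the ordinary matrix product:

```python
# Strassen's seven products for 2x2 matrices.
import numpy as np

def strassen_2x2(Q, R):
    m1 = (Q[0, 0] + Q[1, 1]) * (R[0, 0] + R[1, 1])
    m2 = (Q[1, 0] + Q[1, 1]) * R[0, 0]
    m3 = Q[0, 0] * (R[0, 1] - R[1, 1])
    m4 = Q[1, 1] * (R[1, 0] - R[0, 0])
    m5 = (Q[0, 0] + Q[0, 1]) * R[1, 1]
    m6 = (Q[1, 0] - Q[0, 0]) * (R[0, 0] + R[0, 1])
    m7 = (Q[0, 1] - Q[1, 1]) * (R[1, 0] + R[1, 1])
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

rng = np.random.default_rng(7)
Q, R = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
assert np.allclose(strassen_2x2(Q, R), Q @ R)
```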
We emphasize that scalar multiplications are isolated and only occur in stage 2. What is more, Strassen's algorithm requires fewer scalar multiplications than naive matrix multiplication: 7 instead of 8. This reduction in multiplications seems to come at an additional price in elementary additions: 18 (10 for pre-processing plus 8 for post-processing) instead of 8 for standard matrix multiplication.

3.2 Strassen's algorithm for multiplying 2^𝑘 × 2^𝑘 matrices

Strassen's basic algorithm seems more resource-demanding than the naive procedure. However, it does get by with fewer multiplications. This small saving in multiplications does not (yet) offset the extra cost in linear pre- and post-processing. Perhaps surprisingly, this changes when we extend Strassen's basic algorithm to higher-dimensional matrix products.
The divide and conquer rule allows for readily generalizing Strassen's algorithm to matrix multiplication of 2^𝑘 × 2^𝑘 matrices. Simply divide 𝑄 and 𝑅 into 2 × 2 block matrices

𝑄 = [ 𝑄^1_1 , 𝑄^1_2 ; 𝑄^2_1 , 𝑄^2_2 ],  𝑅 = [ 𝑅^1_1 , 𝑅^1_2 ; 𝑅^2_1 , 𝑅^2_2 ],

where each block is a 2^{𝑘−1} × 2^{𝑘−1} matrix. The intermediate values in Strassen's algorithm readily generalize to matrices:

𝑀_1 = (𝑄^1_1 + 𝑄^2_2)(𝑅^1_1 + 𝑅^2_2) ∈ R^{2^{𝑘−1} × 2^{𝑘−1}},
⋮
𝑀_7 = (𝑄^1_2 − 𝑄^2_2)(𝑅^2_1 + 𝑅^2_2) ∈ R^{2^{𝑘−1} × 2^{𝑘−1}}.

So does the linear post-processing:

𝑇 = [ 𝑀_1 + 𝑀_4 − 𝑀_5 + 𝑀_7 , 𝑀_3 + 𝑀_5 ; 𝑀_2 + 𝑀_4 , 𝑀_1 − 𝑀_2 + 𝑀_3 + 𝑀_6 ] ∈ R^{2^𝑘 × 2^𝑘}.
This ansatz reduces the task of computing a single 2^𝑘 × 2^𝑘 matrix multiplication to seven matrix multiplications of size 2^{𝑘−1} × 2^{𝑘−1}. Nothing prevents us from repeating this argument inductively: re-use Strassen to decompose each 2^{𝑘−1} × 2^{𝑘−1} matrix multiplication into seven matrix multiplications of size 2^{𝑘−2} × 2^{𝑘−2}. Iterate this division recursively 𝑘 times until the submatrices degenerate into numbers. For 𝑑 = 2^𝑘, this recursive procedure results in

7^𝑘 = (2^𝑘)^{log₂(7)} ≈ 𝑑^{2.807}

arithmetic multiplications. This is strictly smaller than the 𝑑³ elementary multiplications associated with naive matrix multiplication. This simple counting argument does not (yet) take into account the extra effort in linear pre- and post-processing that is required for sequentially applying Strassen multiplication. We will devote Section 4 to a thorough analysis of the size of this extra cost. This study will highlight that the number of multiplications asymptotically dominates the total arithmetic effort:
Theorem 3.1 (Strassen's improvement for matrix multiplication). Asymptotically, a recursive application of Strassen's algorithm achieves a matrix multiplication exponent 𝜔 ≈ 2.807 < 3.
We conclude this section with a couple of remarks. Strassen's algorithm only works for matrices whose dimension is a power of two. This reshaping may be achieved by zero-padding: extend the original matrices with zero rows and columns until they have a 2^𝑘 × 2^𝑘 shape. Theorem 3.1 asserts that this seemingly counter-intuitive step may speed up the computation: for sufficiently large dimensions, it is beneficial to first increase the problem dimension in order to subsequently apply a faster algorithm.

Secondly, Strassen's algorithm — like most divide and conquer methods — may be parallelized to a considerable degree. A smooth working of the algorithm, however, requires a considerable amount of data transfer between the cores at each recursion step.

Finally, Strassen's algorithm is used in practice. Basic Linear Algebra Subprograms (BLAS) implementations use it as a subroutine. The reduction in arithmetic cost comes at the price of additional memory and reduced numerical stability. For these reasons, Strassen's algorithm is mostly used for integer matrix multiplication. Moreover, practical implementations switch to the naive matrix multiplication procedure once the submatrices are small enough; they do not carry out the full recursive reduction.
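A recursive NumPy sketch along these lines, including the practical cutoff below which we fall back on the naive product; the cutoff value 64 is an arbitrary choice.

```python
# Recursive Strassen multiply for 2^k x 2^k matrices with a naive base case.
import numpy as np

def strassen(Q, R, cutoff=64):
    n = Q.shape[0]
    if n <= cutoff:
        return Q @ R                                  # naive base case
    h = n // 2
    Q11, Q12, Q21, Q22 = Q[:h, :h], Q[:h, h:], Q[h:, :h], Q[h:, h:]
    R11, R12, R21, R22 = R[:h, :h], R[:h, h:], R[h:, :h], R[h:, h:]
    M1 = strassen(Q11 + Q22, R11 + R22, cutoff)
    M2 = strassen(Q21 + Q22, R11, cutoff)
    M3 = strassen(Q11, R12 - R22, cutoff)
    M4 = strassen(Q22, R21 - R11, cutoff)
    M5 = strassen(Q11 + Q12, R22, cutoff)
    M6 = strassen(Q21 - Q11, R11 + R12, cutoff)
    M7 = strassen(Q12 - Q22, R21 + R22, cutoff)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

rng = np.random.default_rng(8)
Q, R = rng.normal(size=(256, 256)), rng.normal(size=(256, 256))
assert np.allclose(strassen(Q, R), Q @ R)
```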
4 Asymptotic dominance of multiplications in Strassen's algorithm

4.1 The algebraic complexity model

In theoretical computer science, the complexity of an algorithm is usually measured in runtime. This measures the number of steps that a Turing machine would execute before terminating and providing the output.

Today, we shall focus on a conceptually similar, but slightly different, computational model: the algebraic complexity model. There, an algorithm is a sequence of algebraic steps. With matrix multiplication in mind, step 𝑖 is a statement of one of the following forms:
1. constant initialization: 𝑡_𝑖 ← 𝑐 for any 𝑐 ∈ R,
2. read-in of the problem description: 𝑡_𝑖 ← 𝑄^𝑘_𝑙 or 𝑡_𝑖 ← 𝑅^𝑘_𝑙 for 1 ≤ 𝑘, 𝑙 ≤ 𝑛,
3. arithmetic computation: 𝑡_𝑖 ← 𝑡_𝑗 ∘ 𝑡_𝑘, where¹ ∘ ∈ {+, −, ×} and 𝑗, 𝑘 < 𝑖,
4. solution output: 𝑇^𝑘_𝑙 ← 𝑡_𝑖 for some 𝑖 and 1 ≤ 𝑘, 𝑙 ≤ 𝑛.

We say that such an arithmetic algorithm computes a matrix product 𝑇 = 𝑄𝑅 if it outputs 𝑇^𝑘_𝑙 = ∑_{𝑗=1}^{𝑛} 𝑄^𝑘_𝑗 𝑅^𝑗_𝑙 for all 1 ≤ 𝑘, 𝑙 ≤ 𝑛. The running time, or complexity, of the algorithm is the total number of steps, disregarding read-in (step 2) and output (step 4). For matrix multiplication, this simplification is justified: the lower bound (2) on the complexity of computing matrix products is strictly larger than the quadratic cost associated with problem read-in and solution output.
When dealing with matrix multiplication algorithms in the algebraic complexity model, it is sufficient to focus on algorithms of a very special and desirable form.

Definition 4.1 (Normal form for matrix multiplication). We say that a matrix multiplication algorithm is in normal algebraic form if it computes 𝑇 = 𝑄𝑅 by executing the following steps:

1. For 1 ≤ 𝑖 ≤ 𝑠: compute 𝛼_𝑖, a linear combination of the entries of 𝑄.
2. For 1 ≤ 𝑖 ≤ 𝑠: compute 𝛽_𝑖, a linear combination of the entries of 𝑅.
3. For 1 ≤ 𝑖 ≤ 𝑠: compute 𝑚_𝑖 = 𝛼_𝑖 𝛽_𝑖.
4. For 1 ≤ 𝑘, 𝑙 ≤ 𝑛: compute 𝑇^𝑘_𝑙 as a linear combination of the 𝑚_𝑖's.

All these linear combinations are fixed, i.e. they do not depend on the inputs 𝑄 and 𝑅. The size of this normal form algorithm is characterized by 𝑠.

This normal form mimics the original presentation of Strassen's algorithm from Subsection 3.1.

Fact 4.2. Strassen's original 2 × 2 algorithm has normal form with 𝑠 = 7.
Definition 4.1 may seem somewhat ad hoc and geared towards Strassen's basic algorithm. This, however, is not the case: any arithmetic algorithm for matrix multiplication can be converted into this normal form at comparatively little extra cost.

Theorem 4.3. Suppose there exists an arithmetic algorithm for matrix multiplication that has runtime 𝑇. Then, there is a normal form algorithm of size 𝑠 = 2𝑇.

The proof of this statement is somewhat pedantic and we refer to, for instance, Yuval Filmus' lecture notes for details. The key idea is that matrix multiplication has a rich bilinear structure: the final output must be linear in the entries of 𝑄 and linear in the entries of 𝑅. This restriction alone imposes severe constraints on the arithmetic

¹ Typically, the algebraic complexity model also allows for division. However, division does not feature in our algorithms for matrix multiplication and behaves somewhat differently in the algorithmic analysis. This is why we choose to omit it here.
expressions that can occur throughout the course of a general arithmetic algorithm. For instance, no polynomials of order three (or higher) can feature in the final expression. Should the algorithm compute a third-order polynomial at any step, it must cancel out again at a later time and we can safely ignore it. Elementary arguments like this allow one to considerably trim the arithmetic representation of a general matrix multiplication algorithm. Subsequently, this allows for a conversion into normal form at relatively little extra cost.
4.2 Dominance of multiplications in the arithmetic complexity model

The following statement provides a rigorous connection between the size 𝑠 of a matrix multiplication algorithm in normal form and the associated runtime (measured in the arithmetic complexity model).

Theorem 4.4. Suppose there is an 𝑛 ∈ N such that there exists a normal form algorithm that multiplies two 𝑛 × 𝑛 matrices and has size 𝑠 = 𝑛^𝛼. Then, the exponent of matrix multiplication obeys 𝜔 ≤ 𝛼.

Theorem 3.1 is an immediate consequence of this general result. Strassen's basic algorithm meets the requirements of this statement for 𝑛 = 2 and 𝑠 = 7 = 2^{log₂(7)}. Applying it ensures

𝜔 ≤ log₂(7) ≈ 2.807.
Proof of Theorem 4.4. We follow the divide and conquer approach sketched for Strassen's algorithm: we can recursively extend an algorithm for 𝑛 × 𝑛 matrix multiplication to an algorithm for multiplying two 𝑛^ℓ × 𝑛^ℓ matrices. Let us denote the associated runtime by 𝑇(ℓ). We will establish the claim by induction over ℓ. The assumption 𝑇(1) = 𝑛^𝛼 establishes the base case. For the induction step, we bound 𝑇(ℓ + 1) in terms of 𝑇(ℓ) and additional dimension-dependent factors.

Divide and conquer allows us to reduce matrix multiplication of 𝑛^{ℓ+1} × 𝑛^{ℓ+1} matrices to a sequence of in total 𝑠 smaller matrix multiplications. To do this, we divide 𝑄 and 𝑅 into 𝑛² blocks of size 𝑛^ℓ × 𝑛^ℓ each. We then apply the algorithm to compute the matrix product block-wise. The normal form assumption ensures that this involves exactly 𝑠 linear combinations of sub-blocks 𝑄^𝑘_𝑙 and 𝑠 linear combinations of sub-blocks 𝑅^𝑘_𝑙. Moreover, let us assume for simplicity that each linear combination only involves a constant number of blocks² (this is true for Strassen's algorithm). Since the 𝑄^𝑘_𝑙 and 𝑅^𝑘_𝑙 have size 𝑛^ℓ × 𝑛^ℓ, computing each linear combination takes at most 𝑐(𝑛^ℓ)² = 𝑐𝑛^{2ℓ} arithmetic operations. The normal form ensures that we need to compute exactly 𝑠 linear combinations of sub-blocks of 𝑄 and of 𝑅 each. The total arithmetic cost of pre-processing is therefore bounded by

𝑇_pre(ℓ + 1) ≤ 2𝑠𝑐𝑛^{2ℓ}.

² A more involved argument allows for bypassing this simplifying assumption, but somewhat obscures the main conceptual ideas.

Next, we multiply these linear combinations using the normal form algorithm for 𝑛^ℓ × 𝑛^ℓ matrices. The induction hypothesis asserts that this step requires

𝑇_mult(ℓ + 1) = 𝑠𝑇(ℓ)

arithmetic operations. Finally, we compute all 𝑛² blocks of the target matrix 𝑇 = 𝑄𝑅:

𝑇_comb(ℓ + 1) ≤ 𝑛² × 𝑐′(𝑛^ℓ)² = 𝑐′𝑛²𝑛^{2ℓ}.

Adding all these individual runtime bounds results in the following bound on the overall runtime:

𝑇(ℓ + 1) ≤ 𝑠𝑇(ℓ) + (2𝑠𝑐 + 𝑐′𝑛²)𝑛^{2ℓ} = 𝑛^𝛼 𝑇(ℓ) + (2𝑐𝑛^𝛼 + 𝑐′𝑛²)𝑛^{2ℓ}.

Finally, recall the lower bound on the minimal cost of matrix multiplication from (2): multiplying two 𝑛^ℓ × 𝑛^ℓ matrices requires more than 𝑛^{2ℓ} arithmetic operations. Applied to the problem at hand, this ensures 𝛼 > 2, and we may further simplify the inductive runtime bound:

𝑇(ℓ + 1) ≤ 𝑛^𝛼 (𝑇(ℓ) + 𝐶𝑛^{2ℓ}).

Since 𝛼 > 2, 𝑇(ℓ) asymptotically dominates 𝐶𝑛^{2ℓ} for any value of the constant 𝐶. In turn, the asymptotic solution to this implicit recurrence is

𝑇(ℓ) = 𝒪(𝑛^{𝛼ℓ}) = 𝒪((𝑛^ℓ)^𝛼).

In terms of the matrix size 𝑁 = 𝑛^ℓ, this is 𝑇(𝑁) = 𝒪(𝑁^𝛼), providing an upper bound on the exponent of matrix multiplication.
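A tiny numerical illustration of this recurrence for Strassen's parameters (𝑛 = 2, 𝑠 = 7), with an arbitrary constant 𝐶: the ratio 𝑇(ℓ)/7^ℓ converges, consistent with 𝑇(ℓ) = 𝒪(7^ℓ) = 𝒪((2^ℓ)^{log₂ 7}).

```python
# Iterate T(l+1) = 7 T(l) + C * 4^l and watch T(l) / 7^l settle down.
C, T = 100.0, 7.0              # T(1) = n^alpha = 7 for Strassen
for l in range(1, 30):
    T = 7 * T + C * 4 ** l     # after this line, T holds T(l+1)
    if l % 7 == 0:
        print(l + 1, T / 7 ** (l + 1))
```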
Lecture 12: Tensorial aspects of matrix multiplication

Scribe: Richard Kueng

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
May 01, 2019
1 Agenda and Outline

1. Matrix multiplication as a tensor
2. Connections between tensor rank and the exponent of matrix multiplication
3. Schönhage's approach
4. Current records and a rough sketch of the laser method
Last lecture was devoted to a detailed analysis of Strassen's algorithm for fast matrix multiplication. Today, we will revisit this idea and expand upon it using tensor methods. This alternative point of view has led to spectacular improvements in the fundamental algorithmic cost associated with matrix multiplication. These speed-ups are general, i.e. they do not depend on advantageous matrix structure like sparsity. Since matrix multiplication is the dominant subroutine in numerical linear algebra — and, by extension, data analysis — results of this type are of great relevance to many scientific communities.
Naively, one would expect that the arithmetic cost of multiplying two 𝑑 × 𝑑 matrices is 𝒪(𝑑³). This would imply that the exponent of matrix multiplication is 𝜔 = 3. Since Strassen's discovery in 1969, many researchers believe that the true cost of matrix multiplication is "almost" linear in the problem size, i.e. the exponent obeys 𝜔 = 2 + 𝜖, where 𝜖 > 0 is small. The current record in this direction is

𝒪(𝑑^𝜔) where 𝜔 ≤ 2.3728639,

established by Le Gall in 2014. Subsequent work by Ambainis, Filmus and Le Gall highlights that the potential of the underlying approach is almost exhausted: 𝜔 = 2.3725 cannot be overcome by incremental improvements of current techniques. Scientifically, this is an exciting state of the art: further improvements seem to require truly novel ideas.
2 Matrix multiplication as a tensor

We will restrict our attention to real-valued matrix multiplication. Fix 𝑄 ∈ R^{𝑚×𝑛} as well as 𝑅 ∈ R^{𝑛×𝑝} and denote their entries by 𝑄^𝑖_𝑗 (entry in the 𝑖-th row and 𝑗-th column) and 𝑅^𝑗_𝑘 (entry in the 𝑗-th row and 𝑘-th column), respectively. Then, the product 𝑇 = 𝑄𝑅 is an 𝑚 × 𝑝 matrix whose entries are defined by the matrix multiplication rule:

𝑇^𝑖_𝑗 = ∑_{𝑘=1}^{𝑛} 𝑄^𝑖_𝑘 𝑅^𝑘_𝑗 for 1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑝.
Matrix multiplication can be regarded as the following bilinear map:

R^{𝑚×𝑛} × R^{𝑛×𝑝} → R^{𝑚×𝑝},  (𝑄, 𝑅) ↦ 𝑇 = 𝑄𝑅.

Indeed, it is easy to check that this map is linear in both inputs. Next, note that the output space R^{𝑚×𝑝} is a linear vector space with (finite) dimension 𝑚𝑝. We can endow it with the Frobenius inner product (𝑇_1, 𝑇_2) = tr(𝑇_1^𝑇 𝑇_2). This inner product establishes a one-to-one relation (isomorphism) between R^{𝑚×𝑝} and its dual space (R^{𝑚×𝑝})^* ≃ R^{𝑚×𝑝}, the space of all linear functionals on R^{𝑚×𝑝}:

𝑓(𝑇) = (Φ, 𝑇) for some Φ ∈ R^{𝑚×𝑝}.

Dualizing the image space allows us to convert matrix multiplication into a trilinear form:

R^{𝑚×𝑛} × R^{𝑛×𝑝} × (R^{𝑚×𝑝})^* → R,  (𝑄, 𝑅, 𝑇^*) ↦ (𝑇^*, 𝑄𝑅) = tr(𝑇𝑄𝑅) = tr(𝑄𝑅𝑇),

where we identify the functional 𝑇^* with a matrix 𝑇 ∈ R^{𝑝×𝑚} so that the traces are well defined. The space of trilinear forms is the canonical dual space of order-three tensor products. This correspondence allows us to associate matrix multiplication with a tensor. This tensor becomes concrete if we choose bases for the individual matrix spaces (viewed as finite-dimensional vector spaces). Denote the standard basis of R^{𝑚×𝑛} by 𝑥^𝑖_𝑗, the standard basis of R^{𝑛×𝑝} by 𝑦^𝑗_𝑘, and let 𝑧^𝑘_𝑖 be the standard basis of R^{𝑝×𝑚}. Then,

⟨𝑚, 𝑛, 𝑝⟩ = ∑_{𝑖=1}^{𝑚} ∑_{𝑗=1}^{𝑛} ∑_{𝑘=1}^{𝑝} 𝑥^𝑖_𝑗 ⊗ 𝑦^𝑗_𝑘 ⊗ 𝑧^𝑘_𝑖   (1)
is the matrix multiplication tensor. It has a very symmetric structure; see Figure 1.

It is worthwhile to underline this correspondence with a more concrete calculation. For 𝐴 ∈ R^{𝑚×𝑛}, 𝐵 ∈ R^{𝑛×𝑝} and 𝐶 ∈ R^{𝑝×𝑚}, we obtain

(⟨𝑚, 𝑛, 𝑝⟩, 𝐴 ⊗ 𝐵 ⊗ 𝐶) = ∑_{𝑖=1}^{𝑚} ∑_{𝑗=1}^{𝑛} ∑_{𝑘=1}^{𝑝} (𝑥^𝑖_𝑗, 𝐴)(𝑦^𝑗_𝑘, 𝐵)(𝑧^𝑘_𝑖, 𝐶) = ∑_{𝑖,𝑗,𝑘} 𝐴^𝑖_𝑗 𝐵^𝑗_𝑘 𝐶^𝑘_𝑖 = tr(𝐴𝐵𝐶).

Specifying 𝐶 = (𝑒_𝑖 𝑒_𝑗^*)^𝑇 allows us to read off the (𝑖, 𝑗)-th entry of the matrix product 𝐴𝐵 ∈ R^{𝑚×𝑝}:

(⟨𝑚, 𝑛, 𝑝⟩, 𝐴 ⊗ 𝐵 ⊗ (𝑒_𝑖 𝑒_𝑗^*)^𝑇) = tr(𝐴𝐵 (𝑒_𝑖 𝑒_𝑗^*)^𝑇) = 𝑒_𝑖^* 𝐴𝐵 𝑒_𝑗

for all 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑝.

The matrix multiplication tensor (1) is a sum of 𝑚𝑛𝑝 elementary tensor products.
Figure 1: Visualization of ⟨2, 2, 2⟩ viewed as a 3-dimensional array in R⁴ ⊗ R⁴ ⊗ R⁴ ≃ R^{4×4×4}. The blue boxes indicate an entry of one, while white boxes denote zero entries.

Evaluating the contribution of each elementary tensor to the overall matrix product is cheap: simply read in the corresponding matrix entries and compute a single product of three numbers. This suggests that the computational cost is dominated by the number of rank-one terms that constitute the tensor ⟨𝑚, 𝑛, 𝑝⟩ — in other words: the tensor rank matters. The decomposition (1) provides an upper bound on the tensor rank:

𝑟(⟨𝑚, 𝑛, 𝑝⟩) ≤ 𝑚𝑛𝑝.

Note that this upper bound is proportional to the number of scalar multiplications required for standard matrix multiplication.
3 Connections between tensor rank and the exponent of matrix multiplication

3.1 Strassen's algorithm from a tensor perspective

In 1969, Strassen found a way of multiplying two 2 × 2 matrices that gets by with fewer multiplications than the naive algorithm. We refer to the previous lecture for details. Here, we emphasize that Strassen's procedure implies an alternative way of decomposing the matrix multiplication tensor ⟨2, 2, 2⟩ from (1) into elementary rank-one tensors:
$$\begin{aligned}
\langle 2,2,2\rangle \;=\;& \big(x_1^1 + x_2^2\big)\otimes\big(y_1^1 + y_2^2\big)\otimes\big(z_1^1 + z_2^2\big)\\
&+ \big(x_2^1 + x_2^2\big)\otimes y_1^1\otimes\big(z_1^2 - z_2^2\big)\\
&+ x_1^1\otimes\big(y_1^2 - y_2^2\big)\otimes\big(z_2^1 + z_2^2\big)\\
&+ x_2^2\otimes\big(y_2^1 - y_1^1\big)\otimes\big(z_1^1 + z_1^2\big)\\
&+ \big(x_1^1 + x_1^2\big)\otimes y_2^2\otimes\big(-z_1^1 + z_2^1\big)\\
&+ \big(x_2^1 - x_1^1\big)\otimes\big(y_1^1 + y_1^2\big)\otimes z_2^2\\
&+ \big(x_1^2 - x_2^2\big)\otimes\big(y_2^1 + y_2^2\big)\otimes z_1^1.
\end{aligned}$$
This is a sum of only seven elementary tensor products.
Theorem 3.1 (Strassen, 1969). The matrix multiplication tensor $\langle 2,2,2\rangle \in (\mathbb{R}^{2\times2})^{\otimes 3}$ has rank at most seven (today, we know that it is exactly seven).
3.2 Connection between tensor rank and the complexity of matrix multiplication

Strassen's results suggest a connection between tensor rank and the number of elementary multiplications required for matrix multiplication. Let $\omega$ be the exponent of matrix multiplication, i.e. the smallest number such that a (potentially asymptotic) algorithm exists that multiplies two square matrices using $\mathcal{O}(n^\omega)$ arithmetic operations.

We can combine our new tensor observation with the technical results from last lecture to derive the following profound correspondence.

Theorem 3.2. The tensor rank $R(\langle n,n,n\rangle)$ of any square matrix multiplication tensor provides an upper bound on the exponent of matrix multiplication:
$$\omega \le \log_n\big(R(\langle n,n,n\rangle)\big) \qquad\text{for any } n\in\mathbb{N}.$$
Combining this insight with Theorem 3.1 readily reproduces the main result from the previous lecture:
$$\omega \le \log_2\big(R(\langle 2,2,2\rangle)\big) \le \log_2(7) \approx 2.807. \tag{2}$$
Proof sketch of Theorem 3.2. Fix $n\in\mathbb{N}$ and suppose that $\langle n,n,n\rangle$ admits a decomposition into $s$ elementary tensor products (i.e. $\langle n,n,n\rangle$ has rank at most $s$). We can then use this decomposition as a guideline to construct an arithmetic algorithm in normal form that multiplies two $n\times n$ matrices. The size of this algorithm is governed by $s$, because the number of arithmetic multiplications exactly corresponds to the number of rank-one tensor contributions. Linear pre- and post-processing steps take into account that the elementary tensor factors need not be matrix standard basis elements. We leave a detailed establishment of this correspondence as an instructive exercise.

Subsequently, we can apply Theorem 4.4 from Lecture 11 to draw a precise connection to the exponent of matrix multiplication. Recall that this result uses divide and conquer to extend the arithmetic algorithm to matrix products of size $n^k\times n^k$ and lets $k$ go to infinity. The resulting recurrence establishes the advertised correspondence between the size $s$ of the algorithm and the exponent of matrix multiplication.
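The following minimal recursive implementation (a sketch of our own, not taken from the notes) illustrates how the seven rank-one terms drive the divide and conquer recursion: the cost obeys $F(n) = 7F(n/2) + \mathcal{O}(n^2)$, i.e. $\mathcal{O}(n^{\log_2 7})$. We assume $n$ is a power of two and fall back to standard multiplication below a cutoff.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:                      # base case: standard multiplication
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # the seven recursive multiplications (one per rank-one term)
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # linear post-processing reassembles the four output blocks
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

X = np.random.default_rng(1).standard_normal((256, 256))
Y = np.random.default_rng(2).standard_normal((256, 256))
assert np.allclose(strassen(X, Y), X @ Y)
```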
3.3 Extension to asymmetric matrix multiplication tensors

Theorem 3.2 hinges on the assumption that the underlying tensor describes matrix multiplication of square matrices. This structural requirement is essential for the divide and conquer strategy that establishes the connection between tensor rank and the exponent of matrix multiplication.

Viewed from this angle, the tensor rank is somewhat more flexible. There are certain symmetrization operations on tensors that do not increase the tensor rank. Suppose that we have established a non-trivial bound on the rank of $\langle p,q,r\rangle$. We can then "symmetrize" the tensor to effectively convert $\langle p,q,r\rangle$ into $\langle pqr, pqr, pqr\rangle$, a square matrix multiplication, while simultaneously maintaining key aspects of the original rank bound. This is the general idea behind a proof of the following extension of Theorem 3.2.
Theorem 3.3. The tensor rank of a general matrix multiplication tensor provides an upper bound on the exponent of matrix multiplication:
$$\omega \le 3\log_{pqr}\big(R(\langle p,q,r\rangle)\big).$$
Proof. Let $T = \sum_{i,j,k} t_{ijk}\, x_i\otimes y_j\otimes z_k$ be a general order-three tensor. Define the rotations
$$T^{C} = \sum_{i,j,k} t_{ijk}\, y_j\otimes z_k\otimes x_i \qquad\text{and}\qquad T^{C^2} = \sum_{i,j,k} t_{ijk}\, z_k\otimes x_i\otimes y_j.$$
This operation extends the notion of transposition to higher order tensors. It is easy to check that rotations do not affect the tensor rank. Next, note that we can also form the tensor product of two order-three tensors. Formally, this results in a tensor of order six. The tensor rank is sub-multiplicative under tensoring: $R(T\otimes T') \le R(T)\,R(T')$.

By combining these operations, we can use $s = R(\langle p,q,r\rangle)$ to bound the tensor rank of $\langle pqr, pqr, pqr\rangle$:
$$R\big(\langle pqr,pqr,pqr\rangle\big) = R\big(\langle p,q,r\rangle\otimes\langle q,r,p\rangle\otimes\langle r,p,q\rangle\big) \le R\big(\langle p,q,r\rangle\big)\,R\big(\langle q,r,p\rangle\big)\,R\big(\langle r,p,q\rangle\big) = R\big(\langle p,q,r\rangle\big)^3 = s^3.$$
The last step follows from the fact that $\langle q,r,p\rangle$ and $\langle r,p,q\rangle$ are rotations of $\langle p,q,r\rangle$. Inserting the bound $R(\langle pqr,pqr,pqr\rangle) \le s^3$ into Theorem 3.2 establishes the claim.
3.4 Extension to border rank

Tensors behave very differently from matrices. Recall that certain rank-$s$ tensors $T$ can be approximated to arbitrary accuracy by tensors that have much smaller rank. The minimal rank of such approximating tensors is called the border rank $\underline{R}(T)$.

At first sight, the conversion of such approximations into accurate numerical algorithms for matrix multiplication seems challenging. However, it turns out that this is not the case. By adapting the divide and conquer strategy appropriately, one can show that the approximation accuracy becomes almost irrelevant when extending the original algorithm recursively to very large matrix dimensions. We content ourselves with highlighting the result, while referring to the literature for rigorous proofs.

Theorem 3.4. The border rank of a general matrix multiplication tensor provides an upper bound on the exponent of matrix multiplication:
$$\omega \le 3\log_{pqr}\big(\underline{R}(\langle p,q,r\rangle)\big).$$

4 Improved bounds on the exponent of matrix multiplication

The previous results may seem somewhat technical. However, Theorem 3.4 forms the basis of virtually all improvements on the size of $\omega$ since Strassen's original discovery.

In a nutshell, all of these improvements arise from variants and refinements of the following basic strategy: identify (small) numbers $p, q, r \in \mathbb{N}$ and a matrix multiplication tensor $\langle p,q,r\rangle$ whose border rank is as small as possible; then use Theorem 3.4 (or refinements thereof) to convert this insight about border rank into an upper bound on $\omega$.
4.1 Schönhage's Theorem

In 1981, Schönhage established the following upper bound on the exponent of matrix multiplication:
$$\omega \le 2.55. \tag{3}$$
This substantial improvement over Strassen's bound (2) is a consequence of Schönhage's identity:
$$\underline{R}\big(\langle 4,1,4\rangle \oplus \langle 1,9,1\rangle\big) \le 17. \tag{4}$$
Here, $\oplus$ denotes the direct sum of two matrix multiplication tensors. The direct sum for tensors is defined in an analogous fashion to the direct sum of two matrices $A\oplus B$: each tensor factor is decomposed into two orthogonal subspaces and each tensor only acts on one subset of these subspaces. From an operational perspective, Schönhage's identity bounds the joint border rank of an outer product of two 4-dimensional vectors and an inner product of two completely unrelated 9-dimensional vectors. This bound is remarkable, because the border ranks of both individual operations are well understood¹:
$$\underline{R}(\langle 4,1,4\rangle) = 16 \qquad\text{and}\qquad \underline{R}(\langle 1,9,1\rangle) = 9.$$
If we associate the border rank (qualitatively) with the number of multiplications required to compute outer and inner products, we obtain the following puzzling interpretation of (4): computing the outer product of two 4-dimensional vectors requires 16 multiplications. At the cost of one additional multiplication, we get an inner product of two completely unrelated 9-dimensional vectors for free!

Schönhage capitalized on this counter-intuitive tensor phenomenon by extending Theorem 3.4 to direct sums of different rectangular matrix multiplication tensors.
Theorem 4.1 (Asymptotic sum inequality). The following bound is true for any sequence of triples $(p_1, q_1, r_1), \ldots, (p_s, q_s, r_s) \in \mathbb{N}\times\mathbb{N}\times\mathbb{N}$:
$$\sum_{i=1}^{s} (p_i q_i r_i)^{\omega/3} \le \underline{R}\Big(\bigoplus_{i=1}^{s} \langle p_i, q_i, r_i\rangle\Big).$$
Schönhage's bound (3) follows from combining the identity (4) with the asymptotic sum inequality and capitalizing on the insight that the border rank is sub-multiplicative under taking tensor products. Choose any $s\in\mathbb{N}$ and note that
$$\underline{R}\big((\langle 4,1,4\rangle\oplus\langle 1,9,1\rangle)^{\otimes s}\big) \le 17^s.$$
We may interpret this tensor power as a direct sum of many independent matrix multiplications: expanding the $s$-fold tensor power of the direct sum produces $\binom{s}{t}$ copies of $\langle 4,1,4\rangle^{\otimes t}\otimes\langle 1,9,1\rangle^{\otimes(s-t)}$ for each $t$. Applying the asymptotic sum inequality and subsequently transforming back to (tensor) products yields
$$17^s \ge \underline{R}\big((\langle 4,1,4\rangle\oplus\langle 1,9,1\rangle)^{\otimes s}\big) \ge \sum_{t=0}^{s}\binom{s}{t}\big(16^t\, 9^{s-t}\big)^{\omega/3} = \big(16^{\omega/3} + 9^{\omega/3}\big)^s.$$
Solving $16^{\omega/3} + 9^{\omega/3} = 17$ for $\omega$ establishes Schönhage's improvement (3).

¹ Exact numbers for rank and border rank readily follow from the fact that both operations may be described by matrices.
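The defining equation is easy to solve numerically. The following bisection sketch (our own, not from the notes) reproduces $\omega \approx 2.5479 \le 2.55$.

```python
# solve 16^(w/3) + 9^(w/3) = 17 for w by bisection
def f(w):
    return 16 ** (w / 3) + 9 ** (w / 3) - 17

lo, hi = 2.0, 3.0          # f(2) < 0 < f(3), so a root lies in between
for _ in range(100):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)

print(lo)                  # approx. 2.5479, consistent with bound (3)
```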
4.2 A rough sketch of the main idea behind recent top scores

Strassen's bound on the exponent of matrix multiplication arises from finding a single square matrix multiplication tensor $\langle 2,2,2\rangle$ whose rank is smaller than naively anticipated (7 vs. 8). The divide and conquer strategy subsequently allows for converting this gain in tensor rank into a genuine speed-up for multiplying large square matrices. This approach can be readily extended to handle non-square matrix multiplication tensors (Theorem 3.3) and border rank (Theorem 3.4).

Schönhage deviated from this straightforward approach by considering direct sums of different matrix multiplication tensors. This affects the relation to matrix multiplication. The correspondence is more involved and mediated by the direct sum inequality (Theorem 4.1), which is non-trivial to prove. However, this relaxation allowed Schönhage to analyze the border rank of more "exotic" order-three tensors. The identity (4) achieved a substantially smaller border rank than Strassen's observation (and, more generally, any known border rank bound for matrix multiplication tensors). This resulted in a much better bound: $\omega \le 2.55$.

More recent developments are based on pushing Schönhage's idea further: deviate even more from nice matrix multiplication tensors and search this larger set of target tensors for specimens that have a particularly small border rank. This is the main idea behind the so-called laser method. At the basis of this method is a border rank identity due to Coppersmith and Winograd from 1990. For any $q \in \mathbb{N}$,
$$\underline{R}\Big(\langle 1,1,q\rangle^{[0,1,1]} + \langle q,1,1\rangle^{[1,0,1]} + \langle 1,q,1\rangle^{[1,1,0]} + \langle 1,1,1\rangle^{[0,0,2]} + \langle 1,1,1\rangle^{[0,2,0]} + \langle 1,1,1\rangle^{[2,0,0]}\Big) \le q + 2.$$
Here, $\langle p,q,r\rangle^{[\alpha,\beta,\gamma]}$ denotes a matrix multiplication tensor equivalent to $\langle p,q,r\rangle$, but whose support is restricted to a certain subset of matrix entries. The superscript indicates which subset of matrix entries is affected by the multiplication tensor. After flattening the matrix standard basis vectors, these partitions are
$$x_0^{[0]},\ x_1^{[1]},\ \ldots,\ x_q^{[1]},\ x_{q+1}^{[2]}, \qquad y_0^{[0]},\ y_1^{[1]},\ \ldots,\ y_q^{[1]},\ y_{q+1}^{[2]}, \qquad\text{and}\qquad z_0^{[0]},\ z_1^{[1]},\ \ldots,\ z_q^{[1]},\ z_{q+1}^{[2]}.$$
Importantly, the Coppersmith–Winograd identity is valid for any $q \in \mathbb{N}$. However, it is not directly related to a simple matrix multiplication procedure. Forcing certain matrix entries to zero, however, allows for reducing the tensor to something that looks much more like a standard matrix multiplication $\langle p(q), q(q), r(q)\rangle$. Tighter bounds on $\omega$ follow from identifying zero-out patterns that are as sparse as possible and nonetheless enforce a nice matrix multiplication structure. Zeroing out can be done for the original tensor directly (sub-optimal), or for high-order tensor products and direct sums, respectively. In 2014, Le Gall automated this search for efficient zero-out patterns in a large search space (vary $q$ and the size of the tensor product) using numerical algorithms based on convex optimization. In doing so, he scored the current record regarding the asymptotic cost of matrix multiplication:
$$\omega \le 2.3728639.$$
The laser method yields impressive results, but its potential is almost exhausted. In 2015, Ambainis, Filmus and Le Gall proved that it is impossible to go beyond $\omega = 2.3725$
using the Coppersmith–Winograd identity. This no-go result follows from putting the above ideas into a rigorous framework. Tight bounds on the convergence of $\omega$ to certain entropy functions can be established. These bounds cut both ways: lower bounds establish upper bounds for $\omega$, which is what Le Gall implicitly found in 2014. Upper bounds limit the capabilities of the entire framework and highlight a veritable bottleneck.
Lecture 13: The CP decomposition for tensors
Scribe: Richard Kueng
ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
May 13, 2019

1 Agenda

1. Practical tools for handling tensors
   (a) Tensors as multi-dimensional arrays
   (b) Tensor slices
   (c) Matricization
2. Useful (tensor) products and identities
3. The CP decomposition
   (a) Motivation
   (b) Definition
   (c) Computation
2 Practical tools for handling tensors

Today we will focus exclusively on real-valued tensor products of order three. The methods discussed readily extend to tensors of arbitrary order and may also be generalized to complex-valued tensors. However, the latter generalization may require some care: transposition and conjugation are not equivalent for complex vector spaces.

Fix $A = \mathbb{R}^{d_1}$, $B = \mathbb{R}^{d_2}$ and $C = \mathbb{R}^{d_3}$ and endow each space with the standard basis $e_1, \ldots, e_{d_i}$, $i = 1, 2, 3$. We consider tensors in the tensor product space $A\otimes B\otimes C$:
$$T = \sum_{l=1}^{m} \tau_l\, a_l\otimes b_l\otimes c_l, \qquad a_l \in A,\ b_l \in B,\ c_l \in C,\ \tau_l \in \mathbb{R}.$$
2.1 Bases and inner products

The individual standard bases give rise to an extended standard basis on the tensor product $A\otimes B\otimes C$:
$$e_i\otimes e_j\otimes e_k, \qquad 1\le i\le d_1,\ 1\le j\le d_2,\ 1\le k\le d_3.$$
The canonical inner products $\langle\cdot,\cdot\rangle$ extend as well. Expand $T, T' \in A\otimes B\otimes C$ with respect to the extended standard basis,
$$T = \sum_{i,j,k} t_{ijk}\, e_i\otimes e_j\otimes e_k \qquad\text{and}\qquad T' = \sum_{i,j,k} t'_{ijk}\, e_i\otimes e_j\otimes e_k,$$
and set
$$\langle T, T'\rangle = \sum_{i,j,k}\sum_{i',j',k'} t_{ijk}\, t'_{i'j'k'}\, \big\langle e_i\otimes e_j\otimes e_k,\ e_{i'}\otimes e_{j'}\otimes e_{k'}\big\rangle = \sum_{i,j,k}\sum_{i',j',k'} t_{ijk}\, t'_{i'j'k'}\, \langle e_i, e_{i'}\rangle\langle e_j, e_{j'}\rangle\langle e_k, e_{k'}\rangle = \sum_{i,j,k} t_{ijk}\, t'_{ijk}.$$
This in particular endows $A\otimes B\otimes C$ with a Euclidean norm:
$$\|T\|_2^2 = \sum_{i,j,k} t_{ijk}^2. \tag{1}$$
The extended standard basis representation highlights a useful interpretation of tensors. They correspond to multi-dimensional arrays:
$$T = [t_{ijk}]_{i,j,k} \in A\otimes B\otimes C \cong \mathbb{R}^{d_1\times d_2\times d_3}.$$
Vectors are 1-dimensional arrays and matrices are 2-dimensional arrays. Tensors correspond to higher order arrays. We emphasize, however, that this array interpretation is manifestly basis-dependent.

Example 2.1 (Movie frames). We can associate a digital picture with a matrix of pixels. The $(i,j)$-th entry of this matrix encodes the color of the pixel that sits at position $1\le i\le x_{\max}$ in the $x$-direction and $1\le j\le y_{\max}$ in the $y$-direction. Movies correspond to a sequence of at least 24 frames per second. Throughout a single scene (no cut), these individual frames are typically closely related to each other. It therefore makes sense to represent a movie scene as a 3-dimensional data array, where the third axis encodes time.
2.2 Tensor fibres and slices

Let $T \in A\otimes B\otimes C$ be a tensor with basis expansion $[t_{ijk}]$ for $1\le i\le d_1$, $1\le j\le d_2$ and $1\le k\le d_3$.

Definition 2.2 (Fibre). A fibre is the higher order analogue of matrix rows and columns. For $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$, fix $1\le i_0\le d_1$, $1\le j_0\le d_2$, $1\le k_0\le d_3$ and define
$$t_{:j_0k_0} = \sum_{i=1}^{d_1} t_{ij_0k_0}\, e_i \in A, \qquad t_{i_0:k_0} = \sum_{j=1}^{d_2} t_{i_0jk_0}\, e_j \in B, \qquad t_{i_0j_0:} = \sum_{k=1}^{d_3} t_{i_0j_0k}\, e_k \in C.$$
These vectors are called mode-1, mode-2 and mode-3 fibres, respectively. Fibres arise from contracting certain indices with fixed standard basis vectors. In wiring calculus, each fibre corresponds to the diagram in which two of the three index lines of $t$ are contracted with the fixed standard basis vectors $e_{j_0}, e_{k_0}$ (mode 1), $e_{i_0}, e_{k_0}$ (mode 2), or $e_{i_0}, e_{j_0}$ (mode 3). [wiring diagrams]
Example 2.3. For matrices (order-two tensors), mode-1 fibres are column vectors and mode-2 fibres are row vectors.
Definition 2.4 (Slice). Slices are 2-dimensional sections of a tensor $[t_{ijk}] \in A\otimes B\otimes C$ and we treat them as matrices. Fix $1\le i_0\le d_1$, $1\le j_0\le d_2$, $1\le k_0\le d_3$ and define
$$T_{i_0::} = \sum_{j,k} t_{i_0jk}\, e_j\otimes e_k \cong \sum_{j,k} t_{i_0jk}\, e_j e_k^T \in \mathbb{R}^{d_2\times d_3},$$
$$T_{:j_0:} = \sum_{i,k} t_{ij_0k}\, e_i\otimes e_k \cong \sum_{i,k} t_{ij_0k}\, e_i e_k^T \in \mathbb{R}^{d_1\times d_3},$$
$$T_{::k_0} = \sum_{i,j} t_{ijk_0}\, e_i\otimes e_j \cong \sum_{i,j} t_{ijk_0}\, e_i e_j^T \in \mathbb{R}^{d_1\times d_2}.$$
Slices arise from contracting one index with a fixed standard basis vector and turning the remaining order-two tensor into a matrix (bend one index). [wiring diagrams for $T_{i_0::}$, $T_{:j_0:}$ and $T_{::k_0}$]

Example 2.5 (The Schur product). Set $d_1 = d_2 = d_3 = d$. Then, the tensor associated with the Hadamard product $x\odot y$ is $H = \sum_{i=1}^{d} e_i\otimes e_i\otimes e_i$. This corresponds to a "data cube" with ones on the super-diagonal and zeros everywhere else.
Example 2.6 (Matrix multiplication for $2\times2$ matrices). Associate the standard matrix basis for $2\times2$ matrices with four standard basis vectors: $e_i e_j^T \cong e_i\otimes e_j = e_{(i,j)}$. Then, matrix multiplication can be viewed as a tensor in $\mathbb{R}^4\otimes\mathbb{R}^4\otimes\mathbb{R}^4$:
$$\langle 2,2,2\rangle = \sum_{i,j,k=1}^{2} e_{(i,j)}\otimes e_{(j,k)}\otimes e_{(k,i)}.$$
The associated data cube is depicted in Figure 1.
2.3 Matricization: transforming tensors into a matrix

Matricization, also known as unfolding or flattening, is the process of reordering the elements of a tensor into a matrix. It is easiest understood in wiring notation. Consider an order-three tensor $T \in A\otimes B\otimes C$. There is an easy standard procedure to convert this tensor into a matrix: leave one index line unchanged, group the other two together and bend them.
Figure 1 Data cube visualization of $2\times2$ matrix multiplication viewed as a tensor in $\mathbb{R}^4\otimes\mathbb{R}^4\otimes\mathbb{R}^4$.
For order-three tensors, there are $\binom{3}{2} = 3$ possible ways to achieve this goal: the mode-1 unfolding $T^{(1)} \in \mathbb{R}^{d_1\times d_2d_3}$ keeps the $A$-line and bends the grouped $B\otimes C$-lines; the mode-2 and mode-3 unfoldings $T^{(2)} \in \mathbb{R}^{d_2\times d_1d_3}$ and $T^{(3)} \in \mathbb{R}^{d_3\times d_1d_2}$ are defined analogously. [wiring diagrams] The superscript indicates which tensor factor remains unchanged. These matricizations are called mode-$n$ unfoldings, where $n$ refers to the tensor factor that remains unchanged.
Example 2.7. Consider $T \in \mathbb{R}^3\otimes\mathbb{R}^4\otimes\mathbb{R}^2 \cong \mathbb{R}^{3\times4\times2}$ with frontal slices
$$T_{::1} = \begin{pmatrix} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{pmatrix}, \qquad T_{::2} = \begin{pmatrix} 13 & 16 & 19 & 22 \\ 14 & 17 & 20 & 23 \\ 15 & 18 & 21 & 24 \end{pmatrix}.$$
Then,
$$T^{(1)} = \begin{pmatrix} 1 & 4 & 7 & 10 & 13 & 16 & 19 & 22 \\ 2 & 5 & 8 & 11 & 14 & 17 & 20 & 23 \\ 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \end{pmatrix} \in \mathbb{R}^{3\times(4\cdot2)},$$
$$T^{(2)} = \begin{pmatrix} 1 & 2 & 3 & 13 & 14 & 15 \\ 4 & 5 & 6 & 16 & 17 & 18 \\ 7 & 8 & 9 & 19 & 20 & 21 \\ 10 & 11 & 12 & 22 & 23 & 24 \end{pmatrix} \in \mathbb{R}^{4\times(3\cdot2)},$$
$$T^{(3)} = \begin{pmatrix} 1 & 2 & 3 & \cdots & 11 & 12 \\ 13 & 14 & 15 & \cdots & 23 & 24 \end{pmatrix} \in \mathbb{R}^{2\times(4\cdot3)}.$$
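These unfoldings are easy to reproduce in NumPy. The sketch below (our own illustration) assumes the convention visible in the example, namely that earlier modes vary fastest along the columns, which corresponds to Fortran-style reshapes.

```python
import numpy as np

# T[i,j,k] with entries 1..24, filled first-index-fastest (Fortran order)
T = np.arange(1, 25).reshape(3, 4, 2, order='F')
assert T[0, 0, 1] == 13          # matches the frontal slice T_{::2}

def unfold(T, mode):
    # move the chosen mode to the front, flatten the rest Fortran-style
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

print(unfold(T, 0))   # the 3 x 8 matrix T^(1) from the example
print(unfold(T, 1))   # the 4 x 6 matrix T^(2)
print(unfold(T, 2))   # the 2 x 12 matrix T^(3)
```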
3 Useful product operations on vectors and matrices

3.0.1 Kronecker product

Fix $a = (a_1,\ldots,a_{d_1})^T \in \mathbb{R}^{d_1}$ and $b = (b_1,\ldots,b_{d_2})^T \in \mathbb{R}^{d_2}$. We define the Kronecker product
$$a\otimes b = \begin{pmatrix} a_1 b \\ \vdots \\ a_{d_1} b \end{pmatrix} \in \mathbb{R}^{d_1 d_2} \cong \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}.$$
This is a concrete realization of the tensor product that has accompanied us throughout the course of this lecture. In wiring notation, we denote it by two parallel lines. [wiring diagram]

This product operation readily extends to matrices. For $A \in \mathbb{R}^{d_1\times d_2}$ and $B \in \mathbb{R}^{d_3\times d_4}$ we set
$$A\otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1d_2}B \\ \vdots & \ddots & \vdots \\ a_{d_11}B & \cdots & a_{d_1d_2}B \end{pmatrix} \in \mathbb{R}^{(d_1d_3)\times(d_2d_4)} \cong \mathbb{R}^{d_1\times d_2}\otimes\mathbb{R}^{d_3\times d_4}.$$
In wiring notation, this corresponds to arranging operators in parallel. [wiring diagram]
3.0.2 Hadamard product

The Hadamard product only makes sense for vectors (matrices) with equal dimensions. Fix $a, b \in \mathbb{R}^d$ and define their entry-wise product:
$$a\odot b = \begin{pmatrix} a_1b_1 \\ \vdots \\ a_db_d \end{pmatrix} \in \mathbb{R}^{d}.$$
Note that the coefficients of this vector correspond to a certain (symmetric) sub-selection of entries in the Kronecker product. We introduce the following wiring "gadget" to make this explicit:
$$\sum_{i=1}^{d} e_i\otimes e_i\otimes e_i. \quad\text{[wiring diagram]}$$
Then, $a\odot b$ is obtained by feeding $a$ and $b$ into two legs of this gadget. [wiring diagram]

The Hadamard product can be extended to matrices of equal dimension. For $A, B \in \mathbb{R}^{d_1\times d_2}$ we define
$$A\odot B = \begin{pmatrix} a_{11}b_{11} & \cdots & a_{1d_2}b_{1d_2} \\ \vdots & \ddots & \vdots \\ a_{d_11}b_{d_11} & \cdots & a_{d_1d_2}b_{d_1d_2} \end{pmatrix} \in \mathbb{R}^{d_1\times d_2}.$$
Similar to the vector case, this matrix product arises from sub-selecting certain entries of the Kronecker product $A\otimes B$. In wiring notation, this sub-selection is achieved by applying the Hadamard gadget twice. [wiring diagram]

Finally, note that we recover the Hadamard tensor $H \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_1}\otimes\mathbb{R}^{d_1}$ from Example 2.5 by bending the first and third indices of the Hadamard gadget to the left:
$$H = \sum_{i=1}^{d_1} e_i\otimes e_i\otimes e_i. \tag{2}$$
3.0.3 Khatri–Rao product

Suppose that $A \in \mathbb{R}^{d_1\times d_2}$ and $B \in \mathbb{R}^{d_3\times d_2}$ have the same number of columns. Then, we can define the following matrix product:
$$A * B = [a_1 \cdots a_{d_2}] * [b_1 \cdots b_{d_2}] = [a_1\otimes b_1\ \cdots\ a_{d_2}\otimes b_{d_2}] \in \mathbb{R}^{(d_1d_3)\times d_2}.$$
This is called the Khatri–Rao product. It results from a one-sided application of the Hadamard gadget. [wiring diagram] This graphical notation underlines the intermediary nature of this product. It is, in a precise sense, halfway between the general Kronecker product and the highly structured Hadamard product.

3.0.4 Useful identities

Let $A^\dagger$ denote the Moore–Penrose inverse. Then, compatibility of the Kronecker product with operator composition readily implies
$$(A\otimes B)(C\otimes D) = AC\otimes BD \qquad\text{and}\qquad (A\otimes B)^\dagger = A^\dagger\otimes B^\dagger.$$
Although not obvious from the entry-wise definition, the Khatri–Rao product is associative:
$$(A*B)*C = A*(B*C) =: A*B*C.$$
This readily follows from the wiring definition and specific features of the Hadamard gadget: this gadget corresponds to a 3-way Kronecker delta, which is only non-zero if all three indices coincide. Like ordinary Kronecker products, this 3-way generalization is associative. The wiring formalism also allows for readily establishing the following useful identity between the Khatri–Rao and Hadamard products:
$$(A*B)^T(A*B) = (A^TA)\odot(B^TB).$$
This identity provides a simple expression for the Moore–Penrose inverse:
$$(A*B)^\dagger = \big((A^TA)\odot(B^TB)\big)^\dagger (A*B)^T. \tag{3}$$
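Both identities are easy to verify numerically. The sketch below is our own check; the helper `khatri_rao` is not a NumPy built-in but implements the column-wise Kronecker product defined above.

```python
import numpy as np

def khatri_rao(A, B):
    # column l of the result is a_l (x) b_l
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 3))
KR = khatri_rao(A, B)                      # shape (20, 3)

# Gram identity: (A*B)^T (A*B) = (A^T A) . (B^T B), entry-wise product
assert np.allclose(KR.T @ KR, (A.T @ A) * (B.T @ B))

# identity (3) for the Moore-Penrose inverse
lhs = np.linalg.pinv(KR)
rhs = np.linalg.pinv((A.T @ A) * (B.T @ B)) @ KR.T
assert np.allclose(lhs, rhs)
```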
4 The CP decomposition

4.1 Recapitulation: minimal rank decompositions

Recall that we may express any tensor as a sum of rank-one elements. For order-three tensors, we obtain the following decomposition:
$$t = \sum_{l=1}^{R} a_l\otimes b_l\otimes c_l \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}. \tag{4}$$
Here $a_1,\ldots,a_R \in A$, $b_1,\ldots,b_R \in B$ and $c_1,\ldots,c_R \in C$. Set $A = [a_1 \cdots a_R] \in \mathbb{R}^{d_1\times R}$, $B = [b_1 \cdots b_R] \in \mathbb{R}^{d_2\times R}$ and $C = [c_1 \cdots c_R] \in \mathbb{R}^{d_3\times R}$. Then, Kolda advocates the following notation:
$$t = [[A, B, C]]. \tag{5}$$
The matrices $A$, $B$ and $C$ are called factor matrices. The Hadamard gadget (2) allows us to convert Kolda's notation into a simple wiring diagram: Equation (5) corresponds to attaching the three factor matrices to the three legs of the gadget. [wiring diagram]

Factor matrices are closely related to matricizations. For instance,
$$T^{(2)} = B(A*C)^T, \quad\text{[wiring diagram]}$$
where we have identified the transpose of the Khatri–Rao product. Similar relations hold true for the other mode-$n$ unfoldings. Each matricization singles out one factor matrix:
$$T^{(1)} = A(B*C)^T, \qquad T^{(2)} = B(A*C)^T \qquad\text{and}\qquad T^{(3)} = C(A*B)^T. \tag{6}$$
4.2 CP decomposition

Minimal rank decompositions (4) of tensors closely resemble matrix factorizations. Indeed, suppose that $M \in \mathbb{R}^{d_1\times d_2}$ may be decomposed into a product of smaller matrices: $M = WV^T$, where $W \in \mathbb{R}^{d_1\times R}$ and $V \in \mathbb{R}^{d_2\times R}$. Then,
$$\mathrm{vec}(M) = (W\otimes V)\,\mathrm{vec}(I).$$
The (modified) Hadamard gadget $H = \sum_{i=1}^{R} e_i\otimes e_i\otimes e_i$ may be viewed as a natural extension of the vectorized identity $\mathrm{vec}(I) = \sum_{i=1}^{R} e_i\otimes e_i$. One of the most classical tensor factorizations is based on this correspondence:

Definition 4.1 (CP decomposition). A decomposition (4) of a tensor $t$ into a sum of $R$ rank-one elements is called a CP decomposition with rank $R$.

The name is a synthesis of CANDECOMP (canonical decomposition) and PARAFAC (parallel factors). CP decompositions always exist. Indeed, we have defined the tensor product space $\mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$ as the linear hull of all elementary tensor products $a\otimes b\otimes c$. The CP decomposition seeks to achieve the opposite: decompose a general tensor into a sum of $R$ elementary tensor products. However, finding such a decomposition is in general very challenging for the following reasons:
1. Finding a CP decomposition with rank $R$ equal to the tensor rank would implicitly identify the tensor rank. However, we know that the problem of determining the rank of a tensor is NP-hard.
2. CP decompositions are unique, but not invariant under linear transformations. Therefore, we cannot assume that the columns of $A$, $B$, or $C$ are orthogonal.
3. The existence of border rank shows that CP decompositions need not be stable.

At the same time, CP decompositions are highly valuable in concrete applications. As already pointed out, the name CP is a synthesis of two different names that have a long history:

1. CANDECOMP was introduced for analyzing multiple similarity/dissimilarity matrices from a variety of subjects. Simple averaging of all subjects annihilates different points of view. Example: vowel sound data, where different individuals (mode 1) spoke different vowels (mode 2) and the formant (pitch, frequency pattern) was measured (mode 3). This point of view was adopted by many research groups in fields ranging from chemometrics and neuroscience to telecommunication.
2. PARAFAC was introduced because tensor methods eliminate ambiguities associated with traditional PCA. In contrast to matrices, tensor factorizations are almost always unique.
4.3 An alternating least squares algorithm for approximating the CP decomposition

As already pointed out, there are complexity-theoretic obstructions towards computing the CP decomposition exactly. This, however, does not mean that we cannot come up with approximation heuristics. Here, we present one such heuristic that is based on alternating least squares. It is designed to compute individual iterations as quickly as possible. The hope is then that many simple iterations ultimately converge to something useful.

The concrete goal is to compute an accurate rank-$R$ approximation of a given tensor $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$. The extended inner product provides us with a natural distance measure, the extended Euclidean norm (1), that quantifies approximation accuracy. Moreover, we shall consider the approximation rank $R$ as a free input parameter to our algorithm. For given $R$, we aim to solve
$$\text{minimize}\quad \big\|t - [[A,B,C]]\big\| \qquad\text{subject to}\qquad A\in\mathbb{R}^{d_1\times R},\ B\in\mathbb{R}^{d_2\times R},\ C\in\mathbb{R}^{d_3\times R}.$$
Solving this problem requires simultaneous optimization over three different matrix variables, which is typically very challenging. Alternating least squares (ALS) is a popular approach for iteratively solving such problems heuristically. First, fix $B, C$, solve for $A$ and update $A$ to be the optimal solution of the single-variable problem. Then, fix $A, C$, solve for $B$ and update accordingly. Keeping $A, B$ fixed and updating $C$ in a similar fashion completes one ALS cycle. This cycle is repeated many times until some stopping condition is reached.
input: tensor $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$, rank $R$
Initialize $A \in \mathbb{R}^{d_1\times R}$, $B \in \mathbb{R}^{d_2\times R}$, $C \in \mathbb{R}^{d_3\times R}$, $k_{\max}$ (max. number of iterations);
while $k < k_{\max}$ do
  $A \leftarrow T^{(1)}(B*C)\big(B^TB \odot C^TC\big)^\dagger$;
  $B \leftarrow T^{(2)}(A*C)\big(A^TA \odot C^TC\big)^\dagger$;
  $C \leftarrow T^{(3)}(A*B)\big(A^TA \odot B^TB\big)^\dagger$;
  (Break loop if a certain stopping condition is reached);
  $k \leftarrow k + 1$;
end
output: factor matrices $A \in \mathbb{R}^{d_1\times R}$, $B \in \mathbb{R}^{d_2\times R}$, $C \in \mathbb{R}^{d_3\times R}$

Algorithm 1: Alternating least squares (ALS) algorithm for approximating CP decompositions.
Prominent stopping conditions are (i) very little improvement in the objective function, (ii) very little change in the factor matrices, (iii) the objective value (norm distance to the target tensor) is close to zero, or (iv) a pre-fixed maximum number of cycle repetitions is exhausted.

We emphasize that initialization also plays a very important role in such ALS-type algorithms. The initial choices of $A$, $B$ and $C$ may affect the performance considerably. A naive initialization would correspond to populating all three matrices with random entries. This often works well in practice, because random initialization may avoid "hard problem instances". Smarter initialization techniques use spectral information about matricizations of $t$ to reduce the distance of the initialization to the final target tensor. This may limit the number of cycle repetitions required for convergence. However, we emphasize that we have barely scratched the surface here.

Algorithm 1 summarizes pseudo-code for an ALS approach to computing CP decompositions. The individual updates are optimized to require as few resources as possible. To understand their working, let us focus on the first sub-iteration. Fix $B, C$ and optimize the norm distance over $A \in \mathbb{R}^{d_1\times R}$. The Euclidean norm has an interesting feature: it is invariant under re-arranging tensor indices. This in particular includes matricizations. Together with the first identity in Eq. (6), this allows for isolating the contribution of $A$ to the norm difference:
$$\big\|t - [[A,B,C]]\big\| = \big\|T^{(1)} - A(B*C)^T\big\|_F.$$
Since $B$, $C$ and $T^{(1)}$ are fixed, minimizing this (Frobenius) norm distance reduces to a simple least squares problem. The optimal solution is well-known and corresponds to
$$A^\sharp = T^{(1)}\big((B*C)^T\big)^\dagger,$$
where $\dagger$ denotes the Moore–Penrose pseudo-inverse. Computing this pseudo-inverse is the most expensive step in the update $A \leftarrow A^\sharp$. The cost scales polynomially in
the dimension $d_2d_3 \times R$ of $B*C$. The identity (3) allows for significantly reducing this dimension:
$$A^\sharp = T^{(1)}(B*C)\big(B^TB \odot C^TC\big)^\dagger. \tag{7}$$
This re-formulation only requires computing the pseudo-inverse of $B^TB \odot C^TC$, an $R\times R$ matrix.

After we have updated $A$ according to (7), we move on to optimizing over $B$ exclusively. Choosing a different matricization, $T^{(2)}$ in this case, allows for isolating $B$ and repeating the least squares argument from before. A similar analysis extends to the optimization over $C$ exclusively, where the right matricization is $T^{(3)}$.
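The following compact Python sketch (our own, not from the notes) implements the sweep of Algorithm 1. One caveat: with the Fortran-style unfoldings introduced earlier, the Khatri–Rao arguments appear in reversed order compared to Eq. (6); this only compensates for a column-ordering convention and does not change the algorithm.

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, R, k_max=200, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, R)) for d in T.shape)  # naive init
    for _ in range(k_max):
        # least squares updates (7); note the reversed Khatri-Rao order
        A = unfold(T, 0) @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = unfold(T, 1) @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = unfold(T, 2) @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        # stopping condition (iii): residual close to zero
        if np.linalg.norm(T - np.einsum('il,jl,kl->ijk', A, B, C)) < tol:
            break
    return A, B, C

# usage: recover an exactly rank-3 tensor
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((d, 3)) for d in (4, 5, 6))
T = np.einsum('il,jl,kl->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=3)
print(np.linalg.norm(T - np.einsum('il,jl,kl->ijk', A, B, C)))
```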
Lecture 14: The Tucker decomposition for tensors
Scribe: Richard Kueng
ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
May 15, 2019

1 Agenda

1. Prelude: different views on matrix factorization
2. The Tucker decomposition
   (a) Definition and motivation
   (b) Algorithms
   (c) Specifications
2 Prelude: different views on matrix factorization

Matrix factorizations are an indispensable tool for both analytical and numerical linear algebra. The key idea is to decompose a large matrix into smaller constituents that are easier to analyze and work with. As such, matrix factorizations feature prominently in data analysis.

In a nutshell, there are two different views on matrix factorization that are somewhat dual to each other. What is more, both approaches are ultimately based on the singular value decomposition and therefore yield comparable results. This equivalence, however, is broken for higher order tensors. One view gives rise to the CP decomposition, while the other motivates the Tucker decomposition.

2.1 Independent component analysis (ICA)

Let $\mathbb{R}^{m\times n}$ denote the linear space of real-valued $m\times n$ matrices. This space is the linear hull of all outer products (rank-one matrices):
$$\mathbb{R}^{m\times n} = \Big\{ \sum_{l=1}^{r} x_l y_l^T :\ x_1,\ldots,x_r \in \mathbb{R}^m,\ y_1,\ldots,y_r \in \mathbb{R}^n,\ r \in \mathbb{N} \Big\}.$$
Outer products are the elementary elements that generate the matrix space. A natural approach to matrix factorization is to find the best elementary fit to a given matrix, remove its contribution and iterate. More precisely, fix $M \in \mathbb{R}^{m\times n}$ and execute the following iterative procedure.

Firstly, identify the best rank-one fit to $M$. This can be obtained by maximizing the Rayleigh quotient:
$$\underset{u\in\mathbb{R}^m,\ v\in\mathbb{R}^n}{\text{maximize}} \quad \frac{\langle u, Mv\rangle}{\|u\|\|v\|}. \tag{1}$$
Secondly, set $M^{(1)} = \sigma^\sharp u^\sharp (v^\sharp)^T$ and remove this leading contribution from $M$, i.e. update $M \mapsto M - M^{(1)}$. This step is often called deflation.
This two-step procedure can be repeated $r$ times to obtain a sequence of outer products that approximates $M$ ever more accurately:
$$M \approx \sum_{l=1}^{r} M^{(l)} = \sum_{l=1}^{r} \sigma_l\, u_l v_l^T.$$
As $r$ increases, this approximation becomes more accurate and exactly reproduces $M$ once $r = \mathrm{rank}(M)$. In many concrete applications, $r \ll \min\{m, n\}$ already provides an excellent approximation. The order of contributions is also important: by construction, the relevance of each contribution diminishes with each iteration, $\sigma_l \ge \sigma_{l+1}$ for all $l = 1, \ldots, r-1$.
Vectorization provides a straightforward mapping of this procedure to order-two tensors:
$$\mathrm{vec}(M) \approx \sum_{l=1}^{r} \sigma_l\, u_l\otimes v_l \in \mathbb{R}^m\otimes\mathbb{R}^n.$$
We approximate $t = \mathrm{vec}(M) \in \mathbb{R}^m\otimes\mathbb{R}^n$ by a sequence of elementary tensor products $\sigma_l\, u_l\otimes v_l$. The CP decomposition is a natural extension of this factorization approach to higher order tensors. For instance,
$$t \approx \sum_{l=1}^{r} \sigma_l\, a_l\otimes b_l\otimes c_l \in \mathbb{R}^m\otimes\mathbb{R}^n\otimes\mathbb{R}^p,$$
where $a_1,\ldots,a_r \in \mathbb{R}^m$, $b_1,\ldots,b_r \in \mathbb{R}^n$, $c_1,\ldots,c_r \in \mathbb{R}^p$ and $\sigma_1,\ldots,\sigma_r \in \mathbb{R}$.

2.2 Principal component analysis (PCA)

Independent component analysis treats $\mathbb{R}^{m\times n}$ as a vector space. It decomposes $M$ into a linear combination of distinguished elements, namely rank-one matrices. However, $\mathbb{R}^{m\times n}$ is more than just a vector space. We can also multiply matrices. This is the starting point for another approach to matrix factorization: approximate $M$ by a product of (smaller) matrices:
$$M \approx FL^T \qquad\text{where}\qquad F \in \mathbb{R}^{m\times k},\ L \in \mathbb{R}^{n\times k}. \tag{2}$$
Ideally, one chooses $k \ll \min\{m, n\}$ to expose latent structures. This approach has a long and proud tradition in statistics that dates back to Pearson in 1901. It is called principal component analysis. A data matrix $M$ is decomposed into factors, the columns of $F$, and loadings, the columns of $L$. The factors isolate core features of the data, while the loadings highlight how these core features need to be combined.

Concrete approximations require a notion of distance on $\mathbb{R}^{m\times n}$. Typically, one chooses the Frobenius norm. PCA then corresponds to choosing a value for $k$ and solving the following optimization problem:
$$\underset{F\in\mathbb{R}^{m\times k},\ L\in\mathbb{R}^{n\times k}}{\text{minimize}} \quad \big\|M - FL^T\big\|.$$
In contrast to the previous factorization approach, it is not so clear how to extend this method to tensors. Tensor product spaces do not have a natural algebra structure; it is not clear how to multiply them.

The following re-interpretation of PCA helps to overcome this challenge, but is also insightful by itself. Matrices can be interpreted either as elements of a vector space, or as concrete realizations of a linear operator:
$$M \in \mathbb{R}^{m\times n} \qquad\text{vs.}\qquad M \in \mathcal{L}\big(\mathbb{R}^n, \mathbb{R}^m\big).$$
We can interpret $F$ in Eq. (2) as a linear operator from a small space $\mathbb{R}^k$ to a much larger space $\mathbb{R}^m$. In contrast, we treat $L^T$ as an element of the (small) vector space $\mathbb{R}^{k\times n}$. This interpretation can be readily extended to order-two tensors:
$$\mathrm{vec}(M) = \underbrace{(F\otimes I)}_{\text{blow-up}}\ \underbrace{\mathrm{vec}(L^T)}_{\text{tensor}}.$$
The core information of a concrete data table is contained in a small matrix, the loadings, which determines interactions among final rows and columns. The factors correspond to a blow-up that embeds these interactions in a much larger column space.

2.3 Relation between both approaches

For matrices, ICA and PCA are closely related. In fact, it is useful to view them as primal and dual approaches to solving the same problem. This close relation is due to the singular value decomposition (SVD), the royal emperor of all matrix factorizations. This single decomposition solves both ICA and PCA at once. Fix $M \in \mathbb{R}^{m\times n}$ and apply an SVD:
$$M = U\Sigma V^T = \sum_{l=1}^{\mathrm{rank}(M)} \sigma_l\, u_l v_l^T.$$
Assume that the singular values are arranged in non-increasing order and the vectors $u_l \in \mathbb{R}^m$, $v_l \in \mathbb{R}^n$ are orthogonal and normalized. Then,
$$\langle u_1, Mv_1\rangle = \sum_{l=1}^{\mathrm{rank}(M)} \sigma_l\, \langle u_1, u_l\rangle\langle v_l, v_1\rangle = \sigma_1,$$
which is the maximum Rayleigh quotient value achievable:
$$\frac{\langle u, Mv\rangle}{\|u\|\|v\|} \le \|M\|_\infty = \sigma_1.$$
This highlights that the rank-one matrix $\sigma_1 u_1 v_1^T$ provides the best rank-one approximation to $M$. Subtracting this contribution and iterating the procedure reveals additional SVD triples $(\sigma_l, u_l, v_l)$. Their order is dictated by the size of the singular values. For $r$ iterations, we obtain the following approximation accuracy:
$$\Big\|M - \sum_{l=1}^{r} \sigma_l u_l v_l^T\Big\|^2 = \sum_{l=r+1}^{\mathrm{rank}(M)} \sigma_l^2.$$
The SVD also provides an optimal solution for PCA. Set $F^\sharp = [\sigma_1 u_1, \ldots, \sigma_k u_k] \in \mathbb{R}^{m\times k}$ and $L^\sharp = [v_1, \ldots, v_k]$. Then,
$$\big\|M - F^\sharp (L^\sharp)^T\big\|^2 = \Big\|M - \sum_{l=1}^{k} \sigma_l u_l v_l^T\Big\|^2 = \sum_{l=k+1}^{\mathrm{rank}(M)} \sigma_l^2.$$
Not only does this approximation accuracy exactly coincide with the ICA value; the Eckart–Young–Mirsky Theorem asserts that this value is optimal and cannot be further improved.
This equivalence breaks down for tensors of higher order. To illustrate the main problem, let us consider the following seemingly trivial identity:
$$\sum_{l=1}^{k} \sigma_l\, u_l\otimes v_l = (U\Sigma\otimes V)\sum_{l=1}^{k} e_l\otimes e_l = (U\Sigma\otimes V)\,\mathrm{vec}(I) = \mathrm{vec}\big(U\Sigma V^T\big) = \mathrm{vec}\big(F^\sharp (L^\sharp)^T\big).$$
The left hand side is the (vectorized) ICA, while the right hand side is the (vectorized) PCA. The steps in between, however, break down for tensors of higher order. We cannot move around operators for free anymore. Consequently, the two approaches obtain a rather different flavor and give rise to the oldest and most classical tensor decompositions:

1. ICA gives rise to the CP decomposition, the topic of Lecture 13.
2. PCA gives rise to the Tucker decomposition, today's topic.
3 The Tucker decomposition

Once more we shall focus our attention on order-three tensors: $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$. A generalization to higher orders is straightforward, but becomes more involved notation-wise.

The key idea behind the Tucker decomposition is to view $t$ as a blow-up of another order-three tensor that lives in a much smaller space.
Definition 3.1. A tensor $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$ admits a Tucker decomposition of local dimensions $(R_1, R_2, R_3)$ if there are matrices $A \in \mathbb{R}^{d_1\times R_1}$, $B \in \mathbb{R}^{d_2\times R_2}$, $C \in \mathbb{R}^{d_3\times R_3}$ and a tensor $g \in \mathbb{R}^{R_1}\otimes\mathbb{R}^{R_2}\otimes\mathbb{R}^{R_3}$ such that
$$t = (A\otimes B\otimes C)\, g.$$
This representation becomes interesting if the local dimensions are much smaller than the ambient dimensions: $R_i \ll d_i$ for $i = 1, 2, 3$. In this case, the core tensor $g$ is much smaller than the original tensor $t$. The latter is recovered by blowing up $g$ in different directions. Importantly, these blow-ups are assumed to be independent:
$$A\otimes B\otimes C \in \mathcal{L}\big(\mathbb{R}^{R_1}, \mathbb{R}^{d_1}\big)\otimes\mathcal{L}\big(\mathbb{R}^{R_2}, \mathbb{R}^{d_2}\big)\otimes\mathcal{L}\big(\mathbb{R}^{R_3}, \mathbb{R}^{d_3}\big).$$
It is instructive to expand the Tucker decomposition further. Write $A = [a_1, \ldots, a_{R_1}]$, $B = [b_1, \ldots, b_{R_2}]$, $C = [c_1, \ldots, c_{R_3}]$ and interpret $g \in \mathbb{R}^{R_1}\otimes\mathbb{R}^{R_2}\otimes\mathbb{R}^{R_3}$ as a ternary array $[g_{ijk}] \in \mathbb{R}^{R_1\times R_2\times R_3}$. Then,
$$t = \sum_{i=1}^{R_1}\sum_{j=1}^{R_2}\sum_{k=1}^{R_3} g_{ijk}\, a_i\otimes b_j\otimes c_k.$$
Remark 3.2. The CP decomposition is a special case of a Tucker decomposition. Set $R_1 = R_2 = R_3 = R$ and fix $g = H = \sum_{l=1}^{R} e_l\otimes e_l\otimes e_l$, the Hadamard tensor. Then,
$$(A\otimes B\otimes C)\, H = \sum_{l=1}^{R} a_l\otimes b_l\otimes c_l.$$
The above remark highlights that the Tucker decomposition is more flexible than the CP decomposition. It also has a natural interpretation in terms of data compression. This becomes interesting when choosing $R_1 \ll d_1$, $R_2 \ll d_2$ and $R_3 \ll d_3$ still results in an accurate approximation of the original tensor. The "big tensor" $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$ arises from blowing up a much smaller tensor that mediates interactions between different factors. Hence, $g$ may be viewed as a tensor generalization of the loadings matrix in PCA. The individual factors $A$, $B$ and $C$ describe how these original interactions are embedded in a much larger space.
Although not necessary, one typically assumes that the blow-ups $A$, $B$, $C$ are isometric embeddings. In other words, they correspond to matrices with orthonormal columns.
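In array language, the blow-up is a single contraction, $t_{ijk} = \sum_{p,q,r} g_{pqr} A_{ip} B_{jq} C_{kr}$. A minimal sketch (our own illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3 = 10, 11, 12
R1, R2, R3 = 3, 4, 5
g = rng.standard_normal((R1, R2, R3))               # small core tensor
A = np.linalg.qr(rng.standard_normal((d1, R1)))[0]  # isometric blow-ups
B = np.linalg.qr(rng.standard_normal((d2, R2)))[0]
C = np.linalg.qr(rng.standard_normal((d3, R3)))[0]

t = np.einsum('pqr,ip,jq,kr->ijk', g, A, B, C)      # the blown-up tensor
print(t.shape)   # (10, 11, 12), generated from only 3*4*5 core entries
```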
Example 3.3 (Tucker decomposition for matrices). Identify $\mathbb{R}^{d_1\times d_2}$ with the tensor product $\mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}$. The exact correspondence is provided by vectorization and its inverse. Applying this inverse to a Tucker decomposition of an order-two tensor yields
$$M = \mathrm{vec}^{-1}\big((A\otimes B)\,g\big) = A\,\mathrm{vec}^{-1}(g)\,B^T = AGB^T.$$
Importantly, the core matrix $G = \mathrm{vec}^{-1}(g) \in \mathbb{R}^{R_1\times R_2}$ need not be diagonal. This additional flexibility may allow for achieving accurate approximations with even smaller internal degrees of freedom than the SVD (there, the core matrix $\Sigma$ is necessarily diagonal).

3.1 Specifications

The Tucker decomposition is rather general and flexible. Several specifications of the Tucker decomposition have become popular in the data analysis literature.

3.1.1 Tucker2 decomposition

Set one of the blow-ups to be the identity matrix. For instance, choose $R_3 = d_3$ and fix $C = I$. A Tucker2 decomposition of a tensor is
$$t = (A\otimes B\otimes I)\, g.$$
The tensor $g$ itself mediates interactions between the first two tensor factors and the third. This may be viewed as a PCA with an additional tensor product constraint on the factor matrix. To see this, consider a matricization that leaves the third tensor factor invariant:
$$T^{(3)} = G^{(3)}(A\otimes B)^T \qquad\text{and consequently}\qquad \big(T^{(3)}\big)^T = (A\otimes B)\big(G^{(3)}\big)^T.$$
As advertised, $(G^{(3)})^T \in \mathbb{R}^{R_1R_2\times d_3}$ is a matrix that mediates correlations between the first two tensor factors and the final one.

3.1.2 Tucker1 decomposition

The Tucker1 decomposition is an even more radical extension of the previous restriction. Set two of the blow-ups to be the identity. For instance, choose $R_2 = d_2$, $R_3 = d_3$ and fix $B = I$, as well as $C = I$. This specification recovers the traditional PCA for $d_1\times d_2d_3$ matrices. Choose a matricization for the first component and observe
$$T^{(1)} = A\,G^{(1)} \qquad\text{where}\qquad A \in \mathbb{R}^{d_1\times R_1},\ G^{(1)} \in \mathbb{R}^{R_1\times d_2d_3}.$$
3.2 Computing the Tucker decomposition

It should not come as a surprise that the problem of computing Tucker decompositions is challenging. We will now describe an alternating least squares (ALS) algorithm that attempts to find good Tucker approximations of a given order-three tensor. The approximation quality is measured in the Euclidean norm induced by the extended standard inner product:
$$\|t\|^2 = \langle t, t\rangle = \sum_{i,j,k}\sum_{i',j',k'} t_{ijk}\, t_{i'j'k'}\, \langle e_i, e_{i'}\rangle\langle e_j, e_{j'}\rangle\langle e_k, e_{k'}\rangle = \sum_{i,j,k} t_{ijk}^2.$$
ALS-type algorithms are a common heuristic to address complicated, multi-objective optimization problems. The core idea is to fix all but one variable and optimize over the remaining variable while expending as few computational resources as possible. Sweeping across all different variables results in an update for each contribution, and the cheapness of each step allows for iterating this procedure many times.

At first sight, the problem of finding a Tucker decomposition is similar to computing the CP decomposition of a tensor. However, there are two core differences: (i) the core tensor $g \in \mathbb{R}^{R_1}\otimes\mathbb{R}^{R_2}\otimes\mathbb{R}^{R_3}$ is a new optimization variable, and (ii) the blow-ups $A$, $B$ and $C$ are assumed to be isometries.

Nonetheless, the final ALS algorithm looks rather similar to the one we developed for the CP decomposition. Pseudo-code for it is provided in Algorithm 1.
Let us start the discussion of the algorithm with the update for the core tensor $g$. Suppose that $A$, $B$, $C$ are fixed isometries. Isometric invariance of the Euclidean norm readily implies
$$\big\|t - (A\otimes B\otimes C)\,g\big\| = \big\|(A^T\otimes B^T\otimes C^T)\,t - g\big\|.$$
Clearly, this expression is minimized if we choose
$$g = (A^T\otimes B^T\otimes C^T)\,t \in \mathbb{R}^{R_1}\otimes\mathbb{R}^{R_2}\otimes\mathbb{R}^{R_3}.$$
input: tensor $t \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}$, inner dimensions $(R_1, R_2, R_3)$
Initialize isometries $A \in \mathbb{R}^{d_1\times R_1}$, $B \in \mathbb{R}^{d_2\times R_2}$, $C \in \mathbb{R}^{d_3\times R_3}$ and $k_{\max}$;
while $k < k_{\max}$ do
  $g \leftarrow (A^T\otimes B^T\otimes C^T)\,t$;
  Compute an SVD of $T^{(1)}(B\otimes C)$, extract the top $R_1$ left singular vectors and update $A \leftarrow [u_1, \ldots, u_{R_1}]$;
  Compute an SVD of $T^{(2)}(A\otimes C)$, extract the top $R_2$ left singular vectors and update $B \leftarrow [u_1, \ldots, u_{R_2}]$;
  Compute an SVD of $T^{(3)}(A\otimes B)$, extract the top $R_3$ left singular vectors and update $C \leftarrow [u_1, \ldots, u_{R_3}]$;
  (Break loop if a certain stopping condition is reached);
  $k \leftarrow k + 1$;
end
output: matrix factors $A \in \mathbb{R}^{d_1\times R_1}$, $B \in \mathbb{R}^{d_2\times R_2}$, $C \in \mathbb{R}^{d_3\times R_3}$

Algorithm 1: Alternating least squares (ALS) algorithm for approximating the Tucker decomposition.
This simple update rule fully takes care of the core tensor and we can restrict our attention to optimizing the isometries individually. Suppose that $g$ is of this form. Then, the fact that $A$, $B$ and $C$ are isometries implies
$$\big\|t - (A\otimes B\otimes C)\,g\big\|_2^2 = \langle t, t\rangle - 2\big\langle (A^T\otimes B^T\otimes C^T)\,t,\ g\big\rangle + \big\langle g,\ \big(A^TA\otimes B^TB\otimes C^TC\big)\,g\big\rangle = \|t\|^2 - 2\langle g, g\rangle + \langle g, g\rangle = \|t\|^2 - \langle g, g\rangle.$$
The first contribution $\|t\|^2$ is fixed and constant. In turn, minimizing the Euclidean distance to $t$ is equivalent to maximizing the norm of the core tensor $g = (A^T\otimes B^T\otimes C^T)\,t$. If we keep $B, C$ fixed, the single-objective optimization over the remaining isometry becomes
$$\underset{A\in\mathbb{R}^{d_1\times R_1}}{\text{maximize}} \quad \big\|(A^T\otimes B^T\otimes C^T)\,t\big\|^2 \qquad\text{subject to}\qquad A^TA = I.$$
We can isolate the contribution of $A^T$ by choosing a particular matricization (recall that the Euclidean norm remains invariant under re-grouping of indices):
$$\big\|(A^T\otimes B^T\otimes C^T)\,t\big\|^2 = \big\|A^T\, T^{(1)}(B\otimes C)\big\|_F^2.$$
Here, $\|\cdot\|_F$ denotes the Frobenius norm (the natural Euclidean norm for matrices). Principal component analysis tells us how to maximize this expression over all isometries. Simply compute an SVD $T^{(1)}(B\otimes C) = U\Sigma V^T \in \mathbb{R}^{d_1\times R_2R_3}$, collect the leading $R_1$ left singular vectors and transpose:
$$\big(A^\sharp\big)^T = [u_1, \ldots, u_{R_1}]^T \in \mathbb{R}^{R_1\times d_1}.$$
The computational cost of this update is governed by the SVD. We emphasize that our approach to the problem results in the SVD of a $d_1\times R_2R_3$ matrix. The (potentially large) dimensions $d_2$ and $d_3$ do not feature at all.

The same idea allows for updating the other isometries $B$ and $C$ in an analogous fashion. The only difference is that other matricizations are used to isolate the contributions of a given isometry ($T^{(2)}$ for $B$ and $T^{(3)}$ for $C$).
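The sketch below implements the sweep of Algorithm 1 in NumPy (a higher-order orthogonal iteration; the function names, the contraction order and the exact-recovery test are our own choices, not part of the notes).

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order='F')

def tucker_als(T, ranks, k_max=50, seed=0):
    rng = np.random.default_rng(seed)
    # initialize random isometries via QR decompositions
    U = [np.linalg.qr(rng.standard_normal((d, R)))[0]
         for d, R in zip(T.shape, ranks)]
    for _ in range(k_max):
        for n in range(3):
            # contract t with the two other isometries, then take an SVD
            S = T
            for m in (m for m in range(3) if m != n):
                S = np.moveaxis(np.tensordot(U[m].T, S, axes=(1, m)), 0, m)
            W, _, _ = np.linalg.svd(unfold(S, n), full_matrices=False)
            U[n] = W[:, :ranks[n]]          # top left singular vectors
    # core tensor g = (A^T (x) B^T (x) C^T) t
    g = np.einsum('ijk,ip,jq,kr->pqr', T, U[0], U[1], U[2])
    return g, U

# usage: recover a tensor with exact Tucker ranks (3, 4, 5)
rng = np.random.default_rng(1)
T = np.einsum('pqr,ip,jq,kr->ijk', rng.standard_normal((3, 4, 5)),
              rng.standard_normal((10, 3)), rng.standard_normal((11, 4)),
              rng.standard_normal((12, 5)))
g, (A, B, C) = tucker_als(T, (3, 4, 5))
print(np.linalg.norm(T - np.einsum('pqr,ip,jq,kr->ijk', g, A, B, C)))
```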
Lecture 15: Tensor train decomposition I

Scribe: Richard Kueng

ACM 270-1, Spring 2019
Richard Kueng & Joel Tropp
May 20, 2019

1 Agenda

1. Motivation: the Schmidt decomposition for order-two tensors
2. Tensor train decomposition: keep applying the Schmidt decomposition sequentially
3. Examples of tensor train representations
2 Motivation: Schmidt decompositions of order-two tensors

2.1 Recapitulation: the SVD

Let $A \in \mathcal{L}(\mathbb{C}^{d_2}, \mathbb{C}^{d_1})$ be an operator, or equivalently: a complex-valued $d_1\times d_2$ matrix. Every such operator admits a singular value decomposition:
$$A = U\Sigma V^*.$$
Here, $\Sigma \in \mathbb{R}^{r\times r}$ is a diagonal matrix that collects the singular values and $U \in \mathbb{C}^{d_1\times r}$, $V \in \mathbb{C}^{d_2\times r}$ are isometries. In wiring notation, we write this decomposition as a chain of three boxes [wiring diagram] and use a round box to notationally underline the diagonal nature of $\Sigma$.

The SVD by itself may already be a compression of the original matrix. Suppose that $A$ has rank $r \ll \min\{d_1, d_2\}$. Then, the inner lines denote indices that live in $\mathbb{C}^r$, rather than $\mathbb{C}^{d_1}$ (left) or $\mathbb{C}^{d_2}$ (right). Even if this is not the case, we may obtain a compressed approximation by truncating the inner index dimension from $r = \mathrm{rank}(A)$ to $R \le r$. Define $\Sigma^{(R)} = \mathrm{diag}(\sigma_1, \ldots, \sigma_R, 0, \ldots, 0)$ and set $A^{(R)} = U\Sigma^{(R)}V^*$. Isometric invariance of the Frobenius norm then implies
$$\big\|A - A^{(R)}\big\|_F^2 = \big\|U\big(\Sigma - \Sigma^{(R)}\big)V^*\big\|_F^2 = \sum_{l=R+1}^{r} \sigma_l(A)^2. \tag{1}$$
The Eckart–Young–Mirsky Theorem asserts that this rank-$R$ approximation is optimal.

2.2 The Schmidt decomposition of order-two tensors

The SVD and its truncated version readily extend to tensors of order two. Recall that we may identify the space of complex-valued $d_1\times d_2$ matrices with $\mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}$. The precise correspondence is provided by vectorization ($xy^T \mapsto x\otimes y$) and the outer product representation ($x\otimes y \mapsto xy^T$). Identify $x \in \mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}$ with a $d_1\times d_2$ matrix $M$ and apply an SVD. [wiring diagram]

Here, we have implicitly defined $s = \mathrm{vec}(\Sigma) = \sum_{l=1}^{r} \sigma_l\, e_l\otimes e_l \in \mathbb{C}^r\otimes\mathbb{C}^r$. It is worthwhile to formulate this decomposition formula explicitly without wiring diagrams. Set $r = \mathrm{rank}(M)$ and decompose the isometries into columns: $U = [u_1, \ldots, u_r] \in \mathbb{C}^{d_1\times r}$, $V = [v_1, \ldots, v_r] \in \mathbb{C}^{d_2\times r}$. Then, the above decomposition reads
$$x = (U\otimes V)\,\mathrm{vec}(\Sigma) = (U\otimes V)\sum_{l=1}^{r}\sigma_l\, e_l\otimes e_l = \sum_{l=1}^{r} \sigma_l\, u_l\otimes v_l \in \mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}. \tag{2}$$
Definition 2.1. The decomposition (2) is called a Schmidt decomposition. The parameter $r = \mathrm{rank}(\mathrm{vec}^{-1}(x))$ is called the Schmidt rank.

Schmidt decompositions are an important tool in quantum information theory. They feature prominently in the study of bi-partite entanglement. The following properties follow directly from the SVD.

Proposition 2.2. The Schmidt decomposition (2) has many desirable features:
1. the weights $\sigma_1, \ldots, \sigma_r$ are strictly positive,
2. $\{u_1, \ldots, u_r\}$ is a set of $r$ orthonormal vectors in $\mathbb{C}^{d_1}$,
3. $\{v_1, \ldots, v_r\}$ is a set of $r$ orthonormal vectors in $\mathbb{C}^{d_2}$.
Another important feature of the Schmidt decomposition is compressed approximation. The following claim is an immediate consequence of Equation (1).

Corollary 2.3. Fix $x \in \mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}$ and $R \in \mathbb{N}$. Then, the truncated Schmidt decomposition $x^{(R)} = \sum_{l=1}^{R} \sigma_l\, u_l\otimes v_l$ is the best (tensor) rank-$R$ approximation of $x$. It achieves
$$\big\|x - x^{(R)}\big\|^2 = \sum_{l=R+1}^{r} \sigma_l^2,$$
where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}$.
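In code, the Schmidt decomposition is literally a reshape followed by an SVD. A minimal sketch of our own (for complex inputs, the rows of `Vh` play the role of the second Schmidt vectors up to conjugation conventions, which we gloss over here just as in the text):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, R = 6, 7, 3
x = rng.standard_normal(d1 * d2) + 1j * rng.standard_normal(d1 * d2)

M = x.reshape(d1, d2)                     # vec^{-1}: matrix form of x
U, s, Vh = np.linalg.svd(M, full_matrices=False)

# truncated Schmidt decomposition x^(R) = sum_{l<=R} sigma_l u_l (x) v_l
xR = sum(s[l] * np.kron(U[:, l], Vh[l, :]) for l in range(R))

# squared truncation error equals the sum of discarded sigma_l^2
err = np.linalg.norm(x - xR) ** 2
assert np.isclose(err, np.sum(s[R:] ** 2))
```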
2.3 Context: relation between Schmidt and CP/Tucker decompositions

The Schmidt decomposition (2) is a natural starting point for cornering tensor decompositions. For order-two tensors it reads
$$x = (U\otimes V)\,s = \sum_{l=1}^{r} \sigma_l\, u_l\otimes v_l. \tag{3}$$
The CP decomposition generalizes the final expression to higher order tensors:
$$t = \sum_{l=1}^{R} \sigma_l\, a_l\otimes b_l\otimes c_l \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}. \quad\text{[wiring diagram]}$$
The Tucker decomposition is based on a generalization of the second expression in Eq. (3):
$$t = (A\otimes B\otimes C)\,g = \sum_{i=1}^{R_1}\sum_{j=1}^{R_2}\sum_{k=1}^{R_3} g_{ijk}\, a_i\otimes b_j\otimes c_k \in \mathbb{R}^{d_1}\otimes\mathbb{R}^{d_2}\otimes\mathbb{R}^{d_3}. \quad\text{[wiring diagram]}$$
While both decompositions coincide for order-two tensors (see Eq. (3)), they obtain a unique, genuine tensor flavor when extended to higher order. Moreover, each generalization comes at a price: Corollary 2.3 does not generalize. It is not clear how to truncate CP and Tucker decompositions in an optimal fashion. The tensor train decomposition, the main topic of the remaining lectures, is designed to preserve this optimal compression property.
3 The tensor train decomposition

3.1 Derivation of tensor train representations

The Schmidt decomposition (2) provides a way to factorize order-two tensors into sums of elementary tensors. We can naively apply it to an order-three tensor $t \in \mathbb{C}^{d_1}\otimes\mathbb{C}^{d_2}\otimes\mathbb{C}^{d_3}$ by grouping the second and third tensor factors together: interpret $\mathbb{C}^{d_2}\otimes\mathbb{C}^{d_3}$ as a single complex vector space $\mathbb{C}^{d_2d_3}$ of much larger dimension. In wiring notation, a single Schmidt decomposition splits $t$ into an isometry $A^{(1)} = U \in \mathbb{C}^{d_1\times r}$ acting on the first factor and a remainder $s^{(1)}$, where $r$ denotes the Schmidt rank of $t$ viewed as an order-two tensor. [wiring diagram]

A single Schmidt decomposition allows for decoupling the first tensor factor from the rest. But nothing stops us from repeating this procedure. Insert a resolution of the identity $I = \sum_{l=1}^{r} e_l e_l^T$ on the virtual index to obtain

[wiring diagram] (4)

and apply a Schmidt decomposition to each $s_{e_l} \in \mathbb{C}^{d_2}\otimes\mathbb{C}^{d_3}$ individually:

[wiring diagram] (5)
The last reformulation is a mathematical trick. We absorb the index dependence into an additional tensor degree of freedom: $B$ corresponds to an order-three tensor that implicitly takes care of the labeling. Inserting Eq. (5) into Eq. (4) yields

[wiring diagram]

where we have absorbed the resolution of the identity. We can now bend indices and use $s^{(i)} = \mathrm{vec}(\Sigma^{(i)})$ to represent the wiring diagram on the right hand side in a more symmetric fashion. To further increase readability, we also rotate wiring diagrams by 90 degrees:
$$t = A\ \Sigma^{(1)}\ B\ \Sigma^{(2)}\ C. \quad\text{[wiring diagram]} \tag{6}$$
This is a tensor train decomposition. We can further pinpoint the underlying tensor structure by absorbing the diagonal matrices $\Sigma^{(1)}$ and $\Sigma^{(2)}$ into one of the other tensor components:
$$t = \tilde{A}\ B\ \tilde{C}. \quad\text{[wiring diagram]} \tag{7}$$
The tensor train decomposition factorizes a general order-three tensor into a "train" of three individual tensors: the boundary is comprised of order-two tensors $\tilde{A}$, $\tilde{C}$, while the center corresponds to an order-three tensor $B$. This asymmetry can be defined away by another minor modification. Include an additional tensor factor on each end of the train and combine them with a trace operation:
$$t = A\ B\ C\ Q. \quad\text{[wiring diagram]}$$
The matrix $Q$ provides an additional degree of freedom. Choosing an outer product $Q = y\tilde{y}^*$ effectively recovers the asymmetric tensor train from (7). An extension of this decomposition to tensors of arbitrary order is now straightforward; e.g. for order-8 tensors we obtain
$$t = A^{(1)}\ A^{(2)}\ A^{(3)}\ A^{(4)}\ A^{(5)}\ A^{(6)}\ A^{(7)}\ A^{(8)}\ Q. \quad\text{[wiring diagram]} \tag{8}$$
A single order-three tensor, a "wagon", represents each tensor factor. These wagons are connected by an internal line that represents a virtual degree of freedom. This line connects all the wagons as well as the single matrix $Q$ at the top, the "locomotive".
3.2 Definition of tensor trains and key features

Definition 3.1. A tensor train (TT) is a tensor $t \in \mathbb{C}^{d_1}\otimes\cdots\otimes\mathbb{C}^{d_N}$ that is fully characterized by an array of $N$ order-three tensors $\{A^{(n)}\} \subset \mathbb{C}^{D_n}\otimes\mathbb{C}^{D_{n+1}}\otimes\mathbb{C}^{d_n}$ and a single matrix $Q \in \mathbb{C}^{D_{N+1}\times D_1}$:
$$t = \sum_{i_1=1}^{d_1}\cdots\sum_{i_N=1}^{d_N} \mathrm{tr}\big(A^{(1)}_{::i_1} A^{(2)}_{::i_2}\cdots A^{(N)}_{::i_N}\, Q\big)\ e_{i_1}\otimes e_{i_2}\otimes\cdots\otimes e_{i_N}. \tag{9}$$
Here, $A^{(n)}_{::i_n} \in \mathbb{C}^{D_n\times D_{n+1}}$ denote the frontal matrix slices of $A^{(n)}$ with respect to the third factor.

Note that we may express each order-three tensor $A^{(n)}$ completely by its frontal slices:
$$A^{(n)}_{i_n} = A^{(n)}_{::i_n} \qquad\text{for } 1 \le i_n \le d_n.$$
This re-formulation is not only convenient notation-wise. It also highlights the origin of an alternative name for tensor trains.
Remark 3.2 (Alternative name: matrix product state (MPS)). According to Eq. (9), each expansion coefficient of $t$ with respect to the extended standard basis is a trace of a product of matrices: $t_{i_1\cdots i_N} = \mathrm{tr}\big(A^{(1)}_{i_1}\cdots A^{(N)}_{i_N}\, Q\big)$. In quantum mechanics, vectors are typically associated with pure states: up to normalization, $\rho = tt^*$ describes a joint pure state of $N$ quantum mechanical systems. These two features together explain an alternative nomenclature from quantum mechanics: matrix product states.

It is instructive to introduce the following summary parameters for external and internal dimensions:
$$d = \max_{1\le n\le N} d_n \qquad\text{and}\qquad D = \max_{1\le n\le N+1} D_n.$$
The maximum internal dimension $D$ is called the bond dimension. The dimension of the tensor product $\mathbb{C}^{d_1}\otimes\cdots\otimes\mathbb{C}^{d_N}$ is then roughly $d^N$, while the number of degrees of freedom in a TT is
$$\deg(\mathrm{TT}) = NdD^2 + D^2 = (Nd + 1)D^2. \tag{10}$$
The tensor train representation is complete: every tensor $t \in \mathbb{C}^{d_1}\otimes\cdots\otimes\mathbb{C}^{d_N}$ can be represented as a TT. This expressiveness, however, does not come cheap. In general, the internal dimensions $D$ must scale exponentially in the number of tensor factors:
$$D \approx d^{(N-1)/2}/\sqrt{N}.$$
This relation readily follows from comparing (10) to the overall tensor space dimension $d^N$. It should not come as a surprise; dimensions of tensor products grow very quickly. Nonetheless, such a general scaling is prohibitively expensive. Why would we want to represent individual expansion coefficients of an order-$N$ tensor as the trace of a product of $N$ matrices of size $d^{N/2}\times d^{N/2}$?

The true advantage of the TT formalism stems from manually trimming the internal dimension to a much smaller value. Suppose that $D$ only scales polynomially in the number of tensor factors $N$. Then, the associated TT is described by
$$\deg(\mathrm{TT}) = C\, d\, \mathrm{poly}(N)$$
degrees of freedom; compared to $d^N$, this is an exponential compression. What is more, the connection to the SVD tells us exactly how we have to trim a general tensor train: isolate the diagonal singular value matrices between the wagons, see Eq. (6), and truncate them to only contain the $D$ largest singular values. The Eckart–Young–Mirsky Theorem ensures that this compression is optimal, even for large tensor products. The incurred approximation error is bounded by the singular values that we cut out.

3.3 Examples and a non-example
3.3.1 Elementary tensor product

Consider the elementary tensor product $t = e_1\otimes\cdots\otimes e_1 \in \mathbb{C}^d\otimes\cdots\otimes\mathbb{C}^d$. Then, its expansion coefficients with respect to the extended standard basis are
$$[t]_{i_1,\ldots,i_N} = \delta_{i_1,1}\cdots\delta_{i_N,1}.$$
These admit a particularly concise tensor train decomposition. Choose $D = 1$ (trivial bond dimension) and set $A_i = \delta_{1,i} \in \mathbb{C} \cong \mathbb{C}^{1\times1}$ for all $1\le i\le d$ and $Q = 1 \in \mathbb{C} \cong \mathbb{C}^{1\times1}$. Then,
$$\mathrm{tr}(A_{i_1}A_{i_2}\cdots A_{i_N}\, Q) = \delta_{i_1,1}\cdots\delta_{i_N,1} = [t]_{i_1,\ldots,i_N}.$$
More general elementary tensor products $x_1\otimes\cdots\otimes x_N$ can be constructed in a similar fashion.

3.3.2 The GHZ state
Define the following tensor $A \in \mathbb{C}^2\otimes\mathbb{C}^2\otimes\mathbb{C}^2$ ($d = D = 2$) via its frontal slices
$$A_1 = A_{::1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad A_2 = A_{::2} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$
and set $Q = I \in \mathbb{C}^{2\times2}$. Then, this collection defines a tensor train on $(\mathbb{C}^2)^{\otimes N}$ with bond dimension $D$. Note that $A_i^2 = A_i$ for $i = 1, 2$ and $A_1A_2 = 0 \in \mathbb{C}^{2\times2}$. Therefore,
$$t = \sum_{i_1=1}^{2}\cdots\sum_{i_N=1}^{2} \mathrm{tr}(A_{i_1}\cdots A_{i_N}\, I)\ e_{i_1}\otimes\cdots\otimes e_{i_N} = \sum_{i_1=1}^{2}\cdots\sum_{i_N=1}^{2} \delta_{i_1=\cdots=i_N}\ e_{i_1}\otimes\cdots\otimes e_{i_N} = e_1^{\otimes N} + e_2^{\otimes N}.$$
This describes a highly structured tensor product vector that is associated with a prominent pure quantum state $\rho = tt^*/2$: the Greenberger–Horne–Zeilinger (GHZ) state.
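The trace formula (9) can be evaluated by brute force for small $N$. The sketch below (our own helper, exponential in $N$ and only meant for illustration) reproduces the GHZ tensor from its two slices; the W-state slices from the next subsection can be checked the same way.

```python
import numpy as np
from itertools import product
from functools import reduce

def tt_to_full(slices, Q):
    # slices[n][i] is the i-th frontal slice of the n-th wagon
    dims = [len(s) for s in slices]
    t = np.zeros(dims, dtype=complex)
    for idx in product(*(range(d) for d in dims)):
        mats = [slices[n][i] for n, i in enumerate(idx)]
        t[idx] = np.trace(reduce(np.matmul, mats) @ Q)  # tr(A_{i1}...A_{iN} Q)
    return t

N = 4
A1 = np.array([[1, 0], [0, 0]], dtype=complex)
A2 = np.array([[0, 0], [0, 1]], dtype=complex)
ghz = tt_to_full([[A1, A2]] * N, np.eye(2))

e1, e2 = np.eye(2)                       # standard basis of C^2
target = reduce(np.kron, [e1] * N) + reduce(np.kron, [e2] * N)
assert np.allclose(ghz.ravel(), target)  # e_1^{(x)N} + e_2^{(x)N}
```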
3.3.3 The W-state

Define the following tensor $A \in \mathbb{C}^2\otimes\mathbb{C}^2\otimes\mathbb{C}^2$ ($d = D = 2$) via its frontal slices:
$$A_1 = A_{::1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I, \qquad A_2 = A_{::2} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \qquad\text{and set}\qquad Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Note that $A_1^2 = A_1$, $A_1A_2 = A_2$ and $A_2A_2 = 0$. Moreover, $\mathrm{tr}(A_1Q) = 0$ and $\mathrm{tr}(A_2Q) = 1$. These elementary relations fully characterize the following tensor train with bond dimension $D$ on $(\mathbb{C}^2)^{\otimes N}$:
$$t = \sum_{i_1=1}^{2}\cdots\sum_{i_N=1}^{2} \mathrm{tr}(A_{i_1}\cdots A_{i_N}\, Q)\ e_{i_1}\otimes\cdots\otimes e_{i_N} = e_2\otimes e_1^{\otimes(N-1)} + e_1\otimes e_2\otimes e_1^{\otimes(N-2)} + \cdots + e_1^{\otimes(N-1)}\otimes e_2.$$
Up to normalization, this TT produces a pure quantum state that is an equal superposition of all possible permutations of $e_2\otimes e_1^{\otimes(N-1)}$:
$$t = N\, P^{\vee}_N\,\big(e_2\otimes e_1\otimes\cdots\otimes e_1\big),$$
where $P^{\vee}_N$ denotes the projector onto the symmetric subspace. The associated (pure) quantum state is called the W-state. It features prominently in the study of multi-partite entanglement.

3.4 A random element of $(\mathbb{C}^d)^{\otimes N}$

Set $d_1 = \ldots = d_N = d$ and consider a Haar-random unit vector in $(\mathbb{C}^d)^{\otimes N} \cong \mathbb{C}^{d^N}$:
$$u \sim S\big(\mathbb{C}^{d^N}\big).$$
The parameter counting argument from (10) suggests that an exponentially large bond dimension $D \approx d^{N/2}$ is required to accurately approximate this generic tensor. This is indeed the case. To see this, divide the $N$ tensor factors into two families. The first $N_1$ factors are grouped into family $A$, while the remaining $N - N_1$ factors belong to family $B$. Haar integration implies the following concentration identity:
$$\mathbb{E}\,\Big\|\mathrm{tr}_B(uu^*) - \frac{1}{d^{N_1}}\, I\Big\|_2 < d^{-(N-N_1)/2}.$$
We refer to Homework Sheet II for details. Moreover, Levy's Lemma asserts that the norm deviation of any concrete realization of $u$ will concentrate sharply around this expected value. Next, choose $N_1 = N/3$. This ensures $d^{N_1} \le d^{(N-N_1)/2}$ and, in turn,
$$\big\|d^{N_1}\,\mathrm{tr}_B(uu^*) - I\big\|_2 < 1 \qquad\text{with overwhelming probability}.$$
Next, note that we may express $I$ as a sum of $\dim(A) = d^{N_1}$ outer products: $I = \sum_l v_l v_l^*$. Inserting this into the norm bound demands
$$\Big\|d^{N_1}\,\mathrm{tr}_B(uu^*) - \sum_{l=1}^{d^{N_1}} v_l v_l^*\Big\|_2 < 1 \qquad\text{with overwhelming probability}.$$
This has profound consequences. The partial trace $\mathrm{tr}_B(uu^*)$ must approximate each of the $d^{N_1}$ outer products to accuracy strictly better than one. This is only possible if the TT representation gives rise to at least $d^{N_1}$ different outer products when taking the partial trace. In turn, this imposes a lower bound on the bond dimension that connects the system $A$ with the system $B$:
$$D_{N_1,N_1+1} \ge d^{N_1} = d^{N/3}.$$
We thus conclude that the largest bond dimension for expressing a generic vector must grow exponentially in the number of tensor factors. This argument may be readily extended to lower bound every virtual degree of freedom: $D_n \ge d^{N/3}$ for all $1\le n\le N$. Indeed, a random vector $u \in S(\mathbb{C}^{d^N})$ does not care about the specific ordering of tensor factors and we may permute them at will.
Lecture 16: Tensor train decomposition IIScribe: Richard Kueng
ACM 270-1, Spring 2019Richard Kueng & Joel TroppMay 22, 2019
1 Agenda1. Recapitulation: tensor trains (TT)/matrix product states (MPS)2. TT properties and symmetries3. Exponential decay of correlations accross a tensor
2 Recapitulation: tensor trains/ matrix product states2.1 Definition of tensor trainsTensor trains (TT) are a decomposition of order-๐ tensors. They arise from sequentiallyapplying Schmidt decompositions to separate individual tensor factors one at a time.This approach is rather different from other prominent tensor decompositions and hasvery unique features. Over the last decades, TT have become a highly useful tool inquantum physics, as well as machine learning.
Definition 2.1 (tensor train (TT)). A tensor train representation of ๐ก โ C๐1 โ ยท ยท ยท โ C๐๐ ischaracterized by an array of ๐ order-three tensors ๐ด(1) โ C๐ท1 โC๐ท2 โC๐1 , . . . , ๐ด(๐) โC๐ท๐ โ C๐ท๐+1 โ C๐๐ , and a single matrix ๐ โ C๐ท๐+1ร๐ท1 :
๐ก =๐1โ
๐1=1ยท ยท ยท
๐๐โ
๐๐ =1tr(๐ด
(1)::๐1๐ด
(2)::๐2 ยท ยท ยท ๐ด
(๐)::๐๐
๐)๐๐1 โ ยท ยท ยท โ ๐๐๐ โ C๐1 โ ยท ยท ยท โ C๐๐ . (1)
The elements ๐ด(๐)::๐๐
โ C๐ท๐ร๐ท๐+1 denote the frontal slices of the tensor ๐ด(๐), and๐1, . . . , ๐๐๐
denotes the standard basis of C๐๐ .
The TT decomposition becomes exceptionally clear in wiring notation. Here is anexample of a general TT for ๐ = 8 factors:
๐ด(1)
C๐1
C๐ท1
๐ด(2)
C๐2
C๐ท2
๐ด(3)
C๐3
C๐ท3
๐ด(4)
C๐4
C๐ท4
๐ด(5)
C๐5
C๐ท5
๐ด(6)
C๐6
C๐ท6
๐ด(7)
C๐7
C๐ท7
๐ด(8)
C๐8
C๐ท8
๐ตC๐ท9
The internal dimensions ๐ท1, . . . , ๐ท๐+1 are often called bond dimensions. Their size doesnot feature in the final tensor expressions. Viewed from this angle, the bond dimensionsare โvirtualโ degrees of freedom. In contrast, the dimensions ๐1, . . . , ๐๐ are fixed andin one-to-one correspondence with the tensor product space C๐1 โ ยท ยท ยท โ C๐ท๐ โ theactual object of interest. As such, these dimensions are often called physical dimensions,because they carry an actual meaning.
2
Remark 2.2 (Alternative name: matrix product states (MPS)). The expansion (1) is characterizedby the trace of a product of ๐ matrices โ the frontal slices associated with the tensortrains. Moreover, the outer product ๐ก๐ก* is proportional to a pure state of a joint quantumsystem. For these reasons, tensor trains are typically called matrix product states in thephysics literature.
TT are only valuable if the bond dimension scales moderately in the number of tensorfactors. A simple parameter counting argument highlights that this is a substantialrestriction. A generic tensor will require an exponentially large bond dimension foraccurate representation.In summary: TT can only efficiently represent a tiny fraction of all elements in a verylarge tensor space. However, this small fraction contains tensor products with veryreasonable behavior that reflects our intuition about how physically meaningful tensorsshould behave. One such feature โ exponential decay of correlations โ will be coveredlater in todayโs lecture.2.2 Additional definitions and Physics jargonThe first definition addresses tensor product spaces, independent of concrete tensorrepresentations.
Definition 2.3 (thermodynamic limit for tensor products). Set ๐1 = ยท ยท ยท = ๐๐ = ๐. Thethermodynamic limit is the limit of infinitely large tensor products of C๐:
lim๐โโ
(C๐)โ๐
.
The thermodynamic limit may be viewed as a mathematical idealization. The studyof many tensor-related properties โ as well as tensors themselves โ often becomes easier.A concrete example are moments of random variables and, more generally, polynomials.We know from earlier lectures that both are related to the symmetric subspace of(C๐)โ๐ . High order moments tend to become more and more well-behaved and regular.Likewise, polynomials of very high degrees accurately approximate smooth functions(Taylorโs theorem) which are often easier to work with.
Within physics, the thermodynamic limit arises naturally when one tries to approx-imate infinite dimensional Hilbert spaces in a discrete fashion. Informally speaking,it marks the transition between matrix analysis and functional analysis. Traditionalquantum mechanics is phrased in this language. Interestingly, the first introduction oftensor trains / matrix product states was phrased in this language (Fannes, Nachtergaele,Werner, Finitely correlated states on quantum spin chains, 1992). Only later, FrankVerstraete (then at Caltech) and others discretized this observation and popularizedthe TT/MPS framework in its current finite-dimensional form.
Definition 2.4 (Translation invariant tensor trains). A tensor train ๐ด(1), ๐ด(๐), ๐ is translationinvariant if all order-three tensors are the same
๐ด(๐) = ๐ด(๐) โ C๐ท โ C๐ท โ C๐ for all 1 โค ๐, ๐ โค ๐.
This in particular implies ๐1 = ๐2 = ยท ยท ยท ๐๐ and ๐ท1 = ๐ท2 = ยท ยท ยท = ๐ท๐+1 = ๐ท.
3
Translation invariance substantially reduces the complexity of a TT decomposition. Itnaturally arises in applications, where the individual tensor factors are indistinguishable,e.g. a quantum system describing ๐ identical particles on a line. Symmetric tensors๐ก โ โ๐
(C๐)
are another promising candidate for such a substantial simplification.Such tensors arise naturally when one considers the moment distribution of data โ e.g.frequencies of words in topic related texts.
Definition 2.5 (Periodic boundary conditions). A tensor train ๐ด(1), . . . , ๐ด(๐), ๐ is said tohave periodic boundary conditions if ๐ท๐+1 = ๐ท1 and ๐ = I.
This nomenclature has a geometric origin. The left and right-most constituents of aTT are connected by a trace. If ๐ต = I, there is no discontinuity when moving fromthe ๐ -th train to the first. Effectively, the trains form a circle, not a line. Periodicboundary conditions go well with translation invariance. Combining both assumptionsresults in a circle of identical trains with physical indices pointing outwards:
๐ด
C๐
C๐ท
๐ด
C๐
C๐ท
๐ดC๐
C๐ท
๐ด
C๐ C๐ท ๐ด
C๐
C๐ท
๐ด
C๐
C๐ท
๐ด C๐
C๐ท
๐ด
C๐C๐ท๐ด
C๐
C๐ท
2.3 Uniqueness of tensor train decompositions
Recall that tensor factorizations are typically unique. Kruskalโs theorem provides arather mild condition that ensures that the minimal rank decomposition of a tensor โ i.ethe optimal CP decomposition โ is unique up to trivial ambiguities. This is not the casefor matrix factorizations. If ๐ = ๐๐ * is a matrix factorization, then so is ๐๐ ๐ โ1๐ *
for any invertible matrix ๐ โ C๐ร๐โฒ (๐โฒ โฅ ๐). Tensor train decompositions behave in asimilar fashion. This lack of uniqueness should not come as a surprise. After all, wedeveloped tensor train decomposition by sequentially applying Schmidt decompositions.The latter simple correspond to matriciating tensors and applying a matrix SVD.
Theorem 2.6 (Gauge freedom). Let ๐ด(1), . . . , ๐ด(๐), ๐ be a TT with physical dimensions๐1, . . . , ๐๐ and bond dimensions ๐ท1, . . . , ๐ท๐+1. Choose ๐1 โฅ ๐ท1, . . . , ๐๐+1 โฅ ๐ท๐+1and ๐ 1 โ C๐1ร๐ท1 , . . . , ๐ ๐+1 โ C๐๐+1ร๐ท๐+1 such that each matrix admits a left inverse,
4
i.e. ๐ โ ๐ ๐ ๐ = ๐ผ โ C๐ท๐ร๐ท๐. Then, the transformations
๐ด(๐)::๐๐
โฆโ๐ ๐๐ด(๐)::๐๐
๐ โ ๐+1 for all 1 โค ๐๐ โค ๐๐,
๐ โฆโ๐ ๐+1๐ ๐ โ 1
do not affect the associated tensor ๐ก โ C๐1 โ . . . โ C๐๐ .
Proof. With wiring diagrams. We will address the case ๐ = 4, but the proof readilygeneralizes. Let ๏ฟฝ๏ฟฝ(1), . . . , ๏ฟฝ๏ฟฝ(4) and ๐ต denote the descriptions that result from such atransformation. Then,
๐ก = ๏ฟฝ๏ฟฝ(1) ๏ฟฝ๏ฟฝ(2) ๏ฟฝ๏ฟฝ(3) ๏ฟฝ๏ฟฝ(4) ๏ฟฝ๏ฟฝ
= ๐ด(1) ๐ด(2) ๐ด(3) ๐ด(4)๐ โ 1 ๐ 1 ๐ โ
2 ๐ 2 ๐ โ 3 ๐ 3 ๐ โ
4 ๐ 4 ๐ โ 5 ๐ 5 ๐ต
= ๐ด(1) ๐ด(2) ๐ด(3) ๐ด(4) ๐ต = ๐ก
Physicists call this freedom a gauge freedom. It is exploited by several state-of-the artalgorithms that use tensor trains. It allows for converting the order three tensors ๐ด(๐)
into a standard form that greatly reduces the cost of computing tensor contractions.
3 Exponential decay of correlations within tensor trainsWe have already alluded to physically well-motivated structures that seem to be hiddenwithin the TT formalism. Chief among them is the following feature: Correlationsbetween individual tensor factors decay exponentially with their mutual distance.3.1 Conditional expectation valueIn order to rigorously state this claim, we need a bit of additional notation.
Definition 3.1 (Conditional expectation value). Fix a tensor ๐ก โ C๐1 โ ยท ยท ยท โ C๐๐ and anoperator ๐ด acting on this tensor product space. The conditional expectation value of ๐ดwith respect to ๐ก๐ก* is
โจ๐ดโฉ๐ก๐ก* = tr(๐ด๐ก๐ก*)โจ๐ก, ๐กโฉ = โจ๐ก, ๐ด๐กโฉ
โจ๐ก, ๐กโฉ .
The origin of this notion hails from quantum mechanics. The normalized outerproduct ๐ก๐ก*/โจ๐ก, ๐กโฉ describes a pure quantum state of the joint system comprised of ๐(potentially different) quantum mechanical systems.
The operator ๐ด may correspond to an element in a quantum mechanical measure-ment: ๐ด = ๐ป๐๐
โชฐ 0, where โ๐ ๐ป๐๐= I. In this case, the conditional expectation
value tells us the probability for obtaining outcome ๐๐ when measuring ๐ = ๐ก๐ก*.
5
3.2 Aside: quantum measurements vs. observablesMany physicists like to combine a quantum measurement (resolution of the identity)with the associated outcomes to obtain a single hermitian operator. This operator iscalled an observable:
๐ =โ
๐
๐๐๐ป๐.
The conditional expectation value of an observable then corresponds to the expectedmeasurement outcome:
โจ๐โฉ๐ก๐ก* = tr(โ
๐
๐๐๐ป๐๐ก๐ก*)
=โ
๐
๐๐tr(๐ป๐๐ก๐ก*) =โ
๐
๐๐Pr[๐๐|๐ก๐ก*].
Observables often have a concrete physical interpretation, like energy, or spin. Theconditional expectation value of an observable corresponds to the expected size of theassociated physical quantity. Single-shot measurement results may differ from thisexpectation.3.3 2-point correlatorsIn quantum mechanics, tensor products arise naturally when studying joint quantumsystems. Every tensor factor corresponds to a microscopic quantum system. Resourceconstraints typically prevent us from performing joint measurements on all ๐ systemssimultaneously. Instead, we may restrict our measurement effort to few select systemsand ignore (marginalize over) the rest. The following short-hand notation captures thisfeature:
๏ฟฝ๏ฟฝ๐ :=๐ผ๐1ร๐1 โ ๐ผ๐2ร๐2 ยท ยท ยท โ ๐ผ๐๐โ1ร๐๐โ1 โ ๐๐ โ ๐ผ๐๐+1ร๐๐+1 โ ยท ยท ยท โ ๐ผ๐๐ ร๐๐,
๏ฟฝ๏ฟฝ๐ :=๐ผ๐1ร๐1 โ ๐ผ๐2ร๐2 ยท ยท ยท โ ๐ผ๐๐โ1ร๐๐โ1 โ ๐๐ โ ๐ผ๐๐+1ร๐๐+1 โ ยท ยท ยท โ ๐ผ๐๐ ร๐๐
Each of these operators only acts non-trivially on the ๐-th and ๐-th tensor factor,respectively. For ๐ = ๐, the product ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐ = ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐ commutes and acts non-trivially onboth the ๐-th and the ๐-th tensor factor.
One of the most fascinating aspects of quantum mechanics is that measurementsnecessarily affect the quantum system in question. An interaction โ e.g. a measurementโ with the ๐-th system may affect the joint quantum state of all constituents. Thefollowing definition provides a way to probe this effect:Definition 3.2 (2-point correlator). For a tensor ๐ก โ C๐1 โ ยท ยท ยท โ C๐๐ and operators ๐๐ โC๐๐ร๐๐ , ๐๐ โ C๐๐ร๐๐ we define the 2-point correlator :
โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* โ โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*
Two point correlators allow us to probe the spread of local perturbations withinthe joint quantum system. Suppose that we poke the ๐-th system and are interested inestimating how severely the ๐-th system is affected by this interaction. Then, โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*
is a meaningful measure to address this question. It vanishes if and only if ๏ฟฝ๏ฟฝ๐ has noinfluence on the conditional expectation value โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* . Otherwise, it is strictly largerand upper-bounds the strength of correlations between the ๐-th and the ๐-th subsystem
6
3.4 Tensor trains and exponential decay of 2-point correlators
Theorem 3.3. Let ๐ก โ C๐1 โยท ยท ยทโC๐๐ be a tensor that admits a translationally invariantTT representation with periodic boundary conditions that is also injective1. Then,2-point correlators decay exponentially with distance:
โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* โ โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*
โค poly(๐, ๐ท, โ๐๐โโ, โ๐๐โโ)eโ๐|๐โ๐|. (2)
Note that this result is only meaningful if the bond dimension is moderate, i.e.๐ท = ๐ถ๐poly(๐). Otherwise, the pre-factor could absorb any exponential decay indistance. Before proving this result, it is worthwhile to point out a strong converse.
Theorem 3.4 (Brandao, Horodecki; 2015). Let ๐ = ๐ก๐ก*/โจ๐ก, ๐กโฉ be a pure joint quantumstate of ๐ โidenticalโ systems. Suppose that all 2-point correlators decay exponentiallyin the sense of Eq. (2). Then, ๐ก โ C๐ โ ยท ยท ยท โ C๐ is well-approximated by a TT withpolynomial bond dimension ๐ท = ๐ถ๐poly(๐).
3.5 Proof of Exponential decay of correlations in tensor trains
We will present a self-contained proof of Theorem 3.3 that is based on several simplifyingassumptions:
1. Translation invariance: ๐ด(๐) = ๐ด(๐) for all 1 โค ๐, ๐ โค ๐ . This also ensures๐1 = ยท ยท ยท = ๐๐ = ๐ and ๐ท1 = ยท ยท ยท = ๐ท๐+1 = ๐ท.
2. Periodic boundary conditions: ๐ต = I,
3. Thermodynamic limit: we will assume ๐ to be โvery bigโ (think ๐ โ โ), butrefrain from a rigorous limit analysis.
We will also need a fourth assumption โ injectivity. We will introduce it later, once weneed it. Also, we assume without loss that ๐ก is normalized: โจ๐ก, ๐กโฉ = 1. For now, let us
1We refer to Definition 3.5 below for a precise definition of this term.
7
rewrite the expressions of interest in wiring notation:
โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* =๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
๐๐ ,
โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* =๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท๐๐
, (3)
โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* =๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
๐ด
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
ยท ยท ยท
๐๐ ๐๐(4)
Cyclicity of the trace allows us to assume without loss ๐ < ๐ โ as indicated in thediagrams above. If we read these expressions from left to right, they reveal a lot ofstructure. We are dealing with traces of ๐ operators that act on C๐ท โ C๐ท. What ismore, translation invariance ensures that almost all of them are the same. This centralbuilding block is called a transfer matrix:
๐ =๐ด
๐ด
=๐โ
๐=1๐ด::๐ โ ๐ด*
::๐ โ โ(C๐ท โ C๐ท
)
All but two (one) constituents are such transfer matrices. The remaining two operatorsare
๐๐๐ =๐ด
๐ด
๐๐ โ โ(C๐ท โ C๐ท
)and ๐๐๐ =
๐ด
๐ด
๐๐.
This new notation considerably simplifies the conditional expectation values. Insertingthese matrix definitions into the wiring diagram expressions readily yields
โจ๐๐โฉ๐ก๐ก* =tr(๐ ๐โ1๐๐๐ ๐ ๐โ๐โ1
)= tr
(๐ ๐โ1๐๐๐
),
โจ๐๐โฉ๐ก๐ก* =tr(๐ ๐โ1๐๐๐
๐ ๐โ๐โ1)
= tr(๐ ๐โ1๐๐๐
),
โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* =tr(๐ ๐โ1๐๐๐ ๐ ๐โ๐โ1๐๐๐
๐ ๐โ๐โ1)
= tr(๐ ๐โ2โ(๐โ๐)๐๐๐ ๐ ๐โ๐โ1๐๐๐
).
8
Here, we have also used cyclicity of the trace to further simplify these expressions. Weare now ready to phrase our fourth assumption:
4 Injectivity: The transfer matrix ๐ is diagonalizable and it has a unique largesteigenvalue of one. All other eigenvalues are smaller in modulus.
The assumption that ๐max = 1 readily follows from normalization. The power methoddemands for ๐ โ โ (thermodynamic limit)
1 = โจ๐ก, ๐กโฉ = tr(๐ ๐
)โ ๐๐
max which ensures ๐max = 1.
In contrast, uniqueness of the largest eigenvalue is a more severe demand that is essentialfor exponential decay of correlations.
Definition 3.5. A translationally invariant TT is injective if the transfer matrix is diago-nalizable and has a unique largest eigenvalue.
Apply an eigenvalue decomposition ๐ = ๐ ๐ท๐ โ1, where ๐ท = diag(๐max, ๐2, . . . , ๐๐ท2)and |๐๐| < ๐max = 1 for all 2 โค ๐ โค ๐ท2. We can again imply the power method toconclude
lim๐โโ
๐ ๐ = lim๐โโ
๐ ๐ท๐ ๐ โ1 = ๐ lim๐โโ
diag(1๐ , ๐๐
2 , . . . , ๐๐๐ท2
)๐ โ1
=๐ diag(1, 0, . . . , 0)๐ โ1 = ๐ค๐๐ค*๐ ,
where we have implicitly defined ๐ค๐ = ๐ ๐1 โ C๐ท and ๐ค๐ = (๐ โ1)*๐1 โ C๐. Injectivitytogether with the thermodynamic limit ensure that high powers of the transfer matrixwith itself approach an outer product.
This insight allows us to considerably simplify the 2-point correlator:
โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* โ โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* =tr(๐ ๐โ2โ(๐โ๐)๐๐๐ ๐ ๐โ๐โ1๐๐๐
)โ tr
(๐ ๐โ1๐๐๐
)tr(๐ ๐โ1๐๐๐
)
โtr(๐ โ๐๐๐ ๐ ๐โ๐โ1๐๐๐
)โ tr
(๐ โ๐๐๐
)tr(๐ โ๐๐๐
)
=๐ค*๐๐๐๐ ๐ ๐โ๐โ1๐๐๐
๐ค๐ โ ๐ค*๐๐๐๐ ๐ค๐๐ค
*๐๐๐๐
๐ค๐
=๐ค*๐๐๐๐ ๐ ๐โ๐โ1๐๐๐
๐ค๐ โ ๐ค*๐๐๐๐ ๐ โ๐๐๐
๐ค๐
=๐ค*๐๐๐๐
(๐ ๐โ๐โ1 โ ๐ โ
)๐๐๐
๐ค๐.
We are now almost done. Exponential decay readily follows from the following observa-tion:
๐ ๐โ๐โ1 โ ๐ โ =๐(๐ท๐โ๐โ1 โ ๐ทโ
)๐ โ1 = ๐ diag
(1๐โ๐โ1 โ 1, ๐๐โ๐โ1
2 , . . . , ๐๐โ๐โ1๐ท2
)๐ โ1
=๐ diag(0, ๐๐โ๐โ1
2 , . . . , ๐๐โ๐โ1๐ท2
)๐ โ1.
Since |๐๐| < 1 for all 2 โค ๐ โค ๐ท2, this matrix difference decays exponentially in anymatrix norm. A quick look at the expression above highlights that control of the
9
operator norm suffices:โจ๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก* โ โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*โจ๏ฟฝ๏ฟฝ๐โฉ๐ก๐ก*
โค๐*
1๐ โ1๐๐๐ ๐(๐ท๐โ๐โ1 โ ๐ทโ
)๐ โ1๐๐๐
๐ ๐1
โคโ๐ โ1๐๐๐ ๐ ๐1โโ2โ๐ โ1๐๐๐๐ ๐1โโ2
๐ท๐โ๐โ1 โ ๐ทโ
โโคpoly(๐ท, ๐, โ๐๐โโ, โ๐๐โโ) max
2โค๐โค๐ท2|๐๐|๐โ๐โ1.
This establishes the relation advertised in Theorem 3.3.
Lecture 17: Tensor train algorithms (DMRG lite)
Scribe: Richard Kueng
ACM 270-1, Spring 2019Richard Kueng & Joel TroppMay 29, 2019
1 Agenda1. Problem statement: compute ground state energies of joint quantum systems
2. Concrete examples
3. Tensor train ansatz
4. Matrix product operators
5. Alternate tensor train minimization (DMRG1)
6. Extension (DMRG2) and rigorous convergence guarantees
2 Problem statement: compute ground state energies of joint quan-tum systems
2.1 Recapitulation: quantum states
Consider the set of joint quantum states on ๐ identical systems โ each with localdimension ๐:
S(C๐)โ๐
)={
๐ โ โ(๐ปโ๐ ) : ๐ โชฐ 0, (I, ๐) = 1}
.
Here, (๐, ๐ ) = tr(๐๐ denotes the Frobenius inner product on โ((C๐)โ๐ ) This is aconvex subset of the (real-valued) space of ๐๐ ร ๐๐ hermitian matrices. The extremepoints of this set correspond to pure states:
๐ = ๐ข๐ข* with ๐ข โ (C๐)โ๐ , โจ๐ข, ๐ขโฉ = 1.
A measurement is a resolution of the identity:
๐ป๐1 , . . . , ๐ป๐๐: ๐ป๐๐
โชฐ 0,๐โ
๐=1๐ป๐๐
= I.
Bornโs rule assertsPr[๐๐|๐] = (๐ป๐๐
, ๐)
An important sub-class of measurements are projective measurements: Each ๐ป๐๐is
an orthogonal projection ๐๐๐. This ensures that the psd constraint is met by default.
Moreover, a set of orthogonal projectors forms a resolution of the identity if and onlyif the projectors project onto mutually orthogonal subspaces whose union spans all of(C๐)โ๐ .
2
2.2 Quantum mechanical observables and the ground state problem
Let ๐๐1 , . . . , ๐๐๐ be a projective quantum measurement. Suppose that the measure-ment outcomes are real-valued numbers (e.g. energy, or spin). Then, we can combinemeasurement and outcomes to a single Hermitian matrix:
๐ =๐โ
๐=1๐๐๐๐๐
.
This object is called an observable. The associated measurements arise from a spectraldecomposition. Note that
โจ๐โฉ๐ = (๐, ๐) =๐โ
๐=1๐๐(๐๐๐
, ๐) =๐โ
๐=1๐๐Pr[๐๐|๐] = E๐[๐].
This conditional expectation value measures the expected physical quantity achieved bya quantum state ๐. Arguably, the most important physical quantity of any system isenergy.
Definition 2.1. The observable associated with energy is caled a Hamiltonian and isdenoted by ๐ป โ โ((C๐)โ๐ ). Its smallest eigenvalue ๐min is called the ground stateenergy.
2.3 The ground state problem
One of the most fundamental questions in quantum physics and chemistry is: Given aHamiltonian ๐ป , find the smallest expected energy achievable and โ ideally โ a quantumstate that achieves this minimal value.
Definition 2.2. Let ๐ป be a Hamiltonian. A quantum state ๐โฏ is said to be in the groundstate if โจ๐ปโฉ๐โฏ
= min๐โจ๐ปโฉ๐.
The following immediate consequence of convexity allows for substantially reducingthe complexity of the ground state problem.
Lemma 2.3. For any Hamiltonian ๐ป, there always exists a pure state ๐ = ๐ข๐ข* thatachieves the ground state energy: โจ๐ปโฉ๐ข๐ข* = min๐โจ๐ปโฉ๐.
We note in passing that this ground state need not be unique. There might be otherpure states that achieve the same energy in expectation. Linearity then implies thatany convex mixture of such pure ground states is also a ground state.
Proof. Fix ๐ป and note that the function (๐ป, ๐) is linear in ๐ and therefore also concave.We minimize this concave function over the set of all quantum states which is convex.A fundamental result from convex optimization states that a concave function achievesits minimum over a convex set at the boundary. This boundary corresponds to the setof all pure quantum states.
3
Problem statement Let ๐ป โ โ((C๐)โ๐ ) be a Hamiltonian. The ground state problemcorresponds to solving the following Rayleigh quotient:
minimize๐ขโ(C๐)โ๐
โจ๐ข, ๐ป๐ขโฉโจ๐ข, ๐ขโฉ .
This problem is not difficult in its own right. A โsimpleโ eigenvalue decomposition of ๐ปwould readily solve it. The challenge stems from the curse of dimensionality associatedwith tensor product spaces: ๐ข lives in a ๐๐ dimensional space. Even for moderate ๐ ,this exponential growth renders a full eigenvalue decomposition of ๐ป impractical. Theassociated runtime would be ๐ช
(๐3๐
).
3 The ground state problem for spin chainsStated as it is, the ground state problem might seem strange at first. The glaring difficultystems from very large dimensions. But how do such high dimensional Hamiltoniansโvery large-dimensional hermitian matrix โ arise in the first place? This is a featureof many body physics, and โ more general โ the study of emergent phenomena (e.g.swarm behavior in certain animal species). Already very simple, structured interactionsbetween ๐ players can give rise to a very intricate global interaction patterns. Thisis in particular true for interactions among ๐ simple quantum mechanical systems.Understanding such phenomena may help to explain effects that we can measure in thelab. Recall the following short-hand notation for operators on โ((C๐)โ๐ ):
๏ฟฝ๏ฟฝ๐ = Iโ(๐โ1) โ ๐ โ Iโ(๐โ๐โ1) for ๐ โ โ(C๐), 1 โค ๐ โค ๐.
Also, recall the Pauli matrices
๐ =(
0 11 0
), ๐ =
(0 โ๐๐ 0
), ๐ =
(1 00 โ1
)
The spin is a 2-dimensional degree of freedom and loosely resembles a magnetic moment(think of an electric current that passes through a closed ring). The Pauli matrices playa crucial role in the study of spin. They correspond to observables that measure theorientation of the spin along the three different axes in space.
Now, suppose that we have prepared ๐ quantum mechanical systems along a line.And we have isolated them sufficiently from the environment and each other such thatonly their spin degree of freedom matters. This allows us to accurately approximateeach system with a 2-dimensional quantum state โ the spin state. The joint systemis described by an enormous density matrix in โ((C2)โ๐ ). Since spin resembles amagnetic moment, the individual systems are affected by external electric/magneticfields and can also interact with each other. Physicists have come up with various toymodels that reflect these types of interactions. The result is typically a big Hamiltonian
4
๐ป that is comprised of many simple terms:
โโ โโ โโ โโ โโ
external field
nearest neighbor coupling
.
For each individual particle, the energy contributions are very simple. They feature spininteractions among nearest neighbors and a contribution from an external field. Sinceenergy is additive, the full Hamiltonian then corresponds to a sum of ๐ simple terms:
๐ป = โ๐โ1โ
๐=1
(๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐+1 + ๏ฟฝ๏ฟฝ๐
)โ โ
((C2)โ๐
).
Example 3.1 (Ising model). The Ising model is arguably the simplest interesting spin chainmodel. The nearest neighbor interactions are mediated by Pauli-๐ matrices, while theexternal (magnetic) field contributes a Pauli-๐ term each:
๐ปIsing = โ๐ฝ๐โ1โ
๐=1๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐+1 โ โ
๐โ
๐=1๐๐. (1)
The parameters ๐ฝ (coupling strength) and โ (external field strengths) are the parametersof the model. Ising actually could provide an analytic solution to this ground stateproblem. The Ising model can be readily extended to higher dimensions (e.g. lattices).The higher dimensional Ising ground state problem can be solved efficiently (i.e. intime polynomial in ๐) for for planar graphs (e.g. a 2D lattice), but is NP-complete fornon-planar graphs (e.g. a lattice in 3D).
Example 3.2 (Heisenberg model). The Heisenberg model expands on Ising by consideringnearest neighbor interactions along all possible spin directions:
๐ปHeisenberg = โ๐ฝ๐
๐โ1โ
๐=1๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐+1 โ ๐ฝ๐
๐โ1โ
๐=1๐๐๐๐+1 โ ๐ฝ๐
๐โ1โ
๐=1๏ฟฝ๏ฟฝ๐๏ฟฝ๏ฟฝ๐+1 โ โ
๐โ
๐=1๐๐. (2)
This quantum mechanical model is used in the study of critical points and phasetransitions of magnetic systems. Note that the Ising model is a specification of theHeisenberg model, where ๐ฝ๐ = ๐ฝ๐ = 0.
4 Tensor Train Ansatz for solving ground state problems4.1 Recapitulation: tensor trains
Every tensor ๐ก โ (C๐)โ๐ can be expanded as a tensor train. Each โwagonโ correspondsto an order-three tensor ๐ด(๐) โ C๐ท๐ โ C๐ท๐+1 โ C๐ that is typically cut into frontalslices:
๐ด(๐)๐ = ๐ด
(๐)::๐ โ C๐ท๐ร๐ท๐+1 for 1 โค ๐ โค ๐.
5
These local tensors characterize the tensor ๐ก by means of the following expansionformula:
๐ก =๐โ
๐1=1,...,๐๐ =1tr(๐ด
(1)๐1 ๐ด
(2)๐2 ยท ยท ยท ๐ด
(๐)๐๐
)๐๐1 โ ๐๐2 ยท ยท ยท โ ๐๐๐ .
The motivation for this representation (and its name) becomes exceptionally clear inwiring notation:
๐ก =
C๐
C๐ท
C๐
C๐ท
C๐
C๐ท
C๐
C๐ท
๐ด(1) ๐ด(2) ๐ด(๐)๐ด(๐)
Remark 4.1. This is a slight modification of the TT framework introduced in previouslectures. The virtual index does not wrap around. In turn, the boundary tensors๐ด(1) and ๐ด(2) have a qualitatively different flavour from the other wagons: they areonly order-two tensors. Although highly relevant in practice, we will ignore boundaryeffects/representations and entirely focus on tensors ๐ด(๐) in the center of the train.
Tensor trains approximate arbitrary tensors. They mediate correlations betweenindividual factors by exploiting an additional degree of freedom that connects thewagons on a virtual level. The dimension of this auxiliar space ๐ท is called the bonddimension.
A large value of ๐ท โ exponentially large in ๐ โ greatly increases the expressiveness ofthe TT model. Every tensor can be represented by a TT with bond dimension ๐ท โ ๐๐/2.Small values of ๐ท โ polynomially large in ๐ โ facilitate actual tensor computations atthe cost of expressiveness. Varying the parameter ๐ท interpolates between both regimes.
Fact 4.2 (Gauge transformations). Tensor train representations are never unique. Wecan apply arbitrary invertible linear transformations along the virtual degrees of freedom.This allows us to convert a given TT into either a left- or a right- normal form:
๐ด(๐)
๐ด(๐)
= or๐ด(๐)
๐ด(๐)
= .
4.2 Tensor train ansatz for the ground state problem
Recall the ground state problem for a given Hamiltonian ๐ป โ โ((C๐)โ๐ ):
๐min = minimize๐ขโ(C๐)โ๐
โจ๐ข, ๐ป๐ขโฉโจ๐ข, ๐ขโฉ .
The challenge in solving this problem does stem from the fact that (C๐)โ๐ is a hugespace with dimension ๐๐ . A natural ansatz to approximate this problem is to restrict
6
the regime over which we optimize. Tensor trains with fixed bond dimension (b.d.) ๐ทare a natural candidate for such a restricted optimization:
๏ฟฝ๏ฟฝmin(๐ท) = minimize๐ก is TT with b.d. ๐ท
โจ๐ก, ๐ป๐กโฉโจ๐ก, ๐กโฉ . (3)
This is an optimization over a strict subset of all tensors. What is more, a moment ofthought reveals that TT with bond dimension ๐ท are included in the set of TT withbond dimension ๐ท + 1:
TT(1) โ TT(2) โ ยท ยท ยท โ TT(๐ท) โ ยท ยท ยท โ TT(๐๐/2) = (C๐)โ๐ .
In turn๏ฟฝ๏ฟฝmin(1) โฅ ๏ฟฝ๏ฟฝmin(2) โฅ ยท ยท ยท โฅ ๏ฟฝ๏ฟฝmax(๐ท) โฅ ยท ยท ยท โฅ ๏ฟฝ๏ฟฝmin(๐๐/2) = ๐min,
where the last equality follows from complete expressiveness of TT for sufficiently highbond dimension.
Although simple, this is a profound insight. While varying the bond dimension, weobtain ever more accurate approximations of the true ground state. However, there is atrade-off. The runtime of the underlying algorithm will scale polynomially in ๐ท.4.3 Problem reformulation: Matrix product operatorsIt is instructive to rewrite the objective function in (3) in wiring formalism:
โจ๐ก, ๐ป๐กโฉโจ๐ก, ๐กโฉ =
๐ด(๐)
๐ด(๐)
๐ด(๐)
๐ด(๐)
๐ด(1)
๐ด(1)
๐ป
๐ด(๐)
๐ด(๐)
๐ด(๐)
๐ด(๐)
๐ด(1)
๐ด(1)
The denominator reveals a lot of structure. In particular, its form already suggeststhe potential benefits of transforming the individual ๐ด(๐)โs into a suitable normal form.The big Hamiltonian in the enumerator, however, breaks this nice sequential structure.It seems highly advisable to decompose it further into a tensor expression that mimicsthe structure of tensor trains.
Definition 4.3 (Matrix product operator). A matrix product operator (MPO) is fully char-acterized by collection of ๐ order four tensors ๐ (๐) โ C๐ โ C๐ โ C๐ทโฒ โ C๐ทโฒ (thinkoperator-valued matrices of size ๐ทโฒ ร ๐ทโฒ) and two vectors ๐ฃ๐, ๐ฃ๐ โ C๐ทโฒ :
๐
C๐
C๐
C๐
C๐
= ๐ (๐) ๐ (๐)๐ (1)
C๐
C๐
C๐
C๐
C๐
C๐
C๐ทโฒC๐ทโฒ
๐ฃ๐ ๐ฃ๐ โ โ(C๐)โ๐
)
7
Every operator acting on a tensor space may be represented as a MPO. Thedecomposition may be achieved in a fashion similar to the derivation of tensor trains.However, in general, the bond dimension must scale exponentially with the number oftensor factors: ๐ทโฒ โ ๐๐ is necessary to accurately represent a generic operator.
However, the Hamiltonians we consider for the ground state problem are typicallyvery far from being generic. Their simple structure manifests itself in a tiny bonddimension.
Example 4.4 (MPO for the Ising Hamiltonian). The Ising Hamiltonian ๐ปIsing โ โ((C2)โ๐ )(1) is fully characterized by a single MPO with bond dimension ๐ทโฒ = 3:
๐ (๐) =
โโโ
I 0 0๐ 0 0
โโ๐ โ๐ฝ๐ I
โโโ and ๐ฃ๐ =
โโโ
001
โโโ , ๐ฃ๐ =
โโโ
100
โโโ .
Example 4.5 (MPO for the Heisenberg Hamiltonian). The Heisenberg Hamiltonian ๐ปHeisenberg โโ((C2)โ๐ ) (2) is fully characterized by a single MPO with bond dimension ๐ทโฒ = 5:
โโโโโโโ
I 0 0 0 0๐ 0 0 0 0๐ 0 0 0 0๐ 0 0 0 0
โโ๐ โ๐ฝ๐๐ โ๐ฝ๐ ๐ โ๐ฝ๐๐ I
โโโโโโโ
and ๐ฃ๐ =
โโโโโโโ
00001
โโโโโโโ
, ๐ฃ๐ =
โโโโโโโ
10000
โโโโโโโ
.
5 DMRG lite5.1 OverviewWe are now ready to discuss a simplified version of DMRG. The task is to approximatelysolve the ground state problem for a big Hamiltonian๐ป โ โ
((C๐)โ๐
). We will do a TT
ansatz and use an alternating least squares heuristics to individually optimize ๐ด(๐) bykeeping all other tensors fixed. One iteration consists of ๐ independent optimizations โone for each tensor train wagon โ and results in a global update of the TT approximationto the ground state. Repeating this โsweepโ many times heuristically boosts convergence.The problem parameters are:
โ local physical dimension ๐: typically this is small, e.g. ๐ = 2 for spins.โ number of quantum systems ๐ : we suppose that this number is very large.โ bond dimension ๐ทโฒ of the expansion of ๐ป as a matrix product operator: this
is fixed and typically small, e.g. ๐ท = 3 for the Ising model, or ๐ทโฒ = 5 for theHeisenberg model.
โ bond dimension ๐ท: this is a free parameter that we get to choose. We supposethat it scales moderately in the problem size: ๐ท = poly(๐).
Our ALS-type algorithm is based on a nifty sub-routine to individually optimizetensor ๐ด(๐) individually. Importantly, each optimization can be achieved in runtime
8
polynomial in ๐, ๐ทโฒ, ๐ and the model parameter ๐ท. This is efficient, as long as we donot choose ๐ท to be too large. This results in a total runtime of ๐ = poly(๐, ๐ทโฒ, ๐, ๐ท) =poly(๐, ๐ทโฒ, ๐, ๐ท) for each iteration. This polynomial cost is cheap when compared tothe exponentially large problem dimension ๐๐ . This gain allows us to repeat thesesequential updates many times to โ hopefully โ boost convergence to the the groundstate energy.5.2 ALS subroutine
We focus on optimizing one tensor ๐ด(๐) while keeping all other elements in the TTfixed. Next, we insert an MPO representation of the Hamiltonian ๐ป in the enumeratorof the objective energy function:
โจ๐ก, ๐ป๐กโฉ =
๐ท
๐ทโฒ
๐ท
๐
๐
๐ด(๐)
๐ด(๐)
.
The MPO formalism ensures that this wiring diagram now looks very similar to thedenominator:
โจ๐ก, ๐กโฉ =
๐ท
๐ท
๐
๐ด(๐)
๐ด(๐)
.
We can now contract everything to the left of the ๐ด(๐) and everything to the right toget an effective environment tensor:
โจ๐ก, ๐ป๐กโฉ =๐ด(๐)
๐ด(๐)
๐ป(๐)eff
=
๐
๐
๏ฟฝ๏ฟฝ(๐)eff
= โจ๐, ๏ฟฝ๏ฟฝ(๐)eff ๐โฉ, ๐ = vec
(๐ด(๐)
)โ C๐๐ท2
.
The last equation follows from vectorization and a re-arrangement of the tensor factorsin ๐ป
(๐)eff . It is easy to check that the runtime for constracting ๐ป
(๐)eff scales polynomially
in ๐ท, ๐ทโฒ, ๐ and linearly in ๐ .At a comparable cost, we can construct a similar environment tensor for the
denominator:
โจ๐ก, ๐กโฉ =๐ด(๐)
๐ด(๐)
๐ธ(๐)๐ ๐ธ
(๐)๐ก
=
๐
๐
๏ฟฝ๏ฟฝ(๐)๐ ๏ฟฝ๏ฟฝ
(๐)๐
= โจ๐, ๏ฟฝ๏ฟฝ(๐)๐ โ I๏ฟฝ๏ฟฝ(๐)
๐ ๐โฉ.
A smart normal form convention in the TT can further simplify this expression. If alltensors ๐ด(๐) left from ๐ด(๐) (1 โค ๐ < ๐) are in left-normal form and all tensors right
9
from ๐ด(๐) (๐ < ๐ โค ๐) are in right-normal form, the effective tensors become trivial1:๏ฟฝ๏ฟฝ
(๐)๐ = I and ๏ฟฝ๏ฟฝ
(๐)๐ = I. Under these assumptions and reformulations, the optimization
problem exactly resembles a Raiyleigh quotient:
minimize๐ด(๐)
โจ๐ก, ๐ป๐กโฉโจ๐ก, ๐กโฉ = minimize
๐โC๐ท2๐
โจ๐, ๏ฟฝ๏ฟฝ(๐)eff ๐โฉ
โจ๐, ๐โฉ .
It can be solved by computing the eigenvalue decomposition of ๏ฟฝ๏ฟฝ(๐)eff โ โ
(C๐ท2๐
),
extracting the smallest eigenvector ๐โฏ and re-shaping it into a tensor. The runtime ofa dense eigenvalue decomposition is of order ๐ช(๐๐ท2). A subsequent conversion of theupdated ๐ด(๐) into left- or right normal form comes at a similar cost.5.3 Extensions and rigorous resultsThe above sweeping procedure is often called DMRG1. It is an iterative procedure,where the individual tensors in a TT representation are updated sequentially. The bonddimension ๐ท is a proper input to the heuristic โ there is no easy way to change it withinthe algorithm.
This can be a severe drawback in practice. Once we start with a certain bonddimension value ๐ท, we must stick to it. DMRG2 is a conceptually simple refinement ofDMRG1 that allows to adjust the bond dimension dynamically while the algorithm isrunning. The main idea is to group two tensors together and treat them as a singletensor:
๐ด(๐) ๐ด(๐+1) = ๏ฟฝ๏ฟฝ(๐,๐+1) .
Subsequently, apply DMRG1 to this coarse-grained tensor network. Once an update for๏ฟฝ๏ฟฝ(๐,๐+1) is obtained, we can subsequently apply a singular value decomposition to pullthe two original tensors apart. The decay of the spectrum associated with this SVDprovides us with valuable guidance on how to adjust bond dimensions dynamically.
Last but not least, we want to emphasize that a comparatively recent result providesa rigorous underpinning for tensor train approaches to solve the ground state problem.It applies to an iterative algorithm, that is similar in spirit to DMRG, but the detailsare somewhat different. The associated rigorous convergence guarantee applies to localHamiltonians ๐ป of 1D-chains that have a spectral gap:
๐min < ๐๐ โ ๐ for all ๐๐ = ๐min and ๐ > 0 is constant.
Theorem 5.1 (Landau, Vazirani, Vidick; 2013). Let ๐ป be a local Hamiltonian of a 1Dquantum system with a constant spectral gap. Then, there is a tensor train algorithmthat accurately2 approximates both the ground state (as a tensor train) and the groundstate energy and runs in polynomial time.
1Such a smart reformulation is achievable in practice: Start the first iteration by moving from leftto right and convert all updated tensors into left normal form. Start the second iteration from rightto left and convert all updated tensors into right normal form. Continue this sweeping procedure forsubsequent iterations
2The accuracy is inverse polynomial in the problem parameters.
Homework Ihand-in date: April 29, 2019
ACM 270-1, Spring 2019Richard Kueng and Joel TroppApril 15, 2019
1 Extreme points of classical and quantum probability distributionsFix a vector space V and let ๐ฆ โ V be a convex set.
Definition 1.1 (extreme point). Let ๐ถ โ V be a convex set. A point ๐ฅ โ ๐ถ is extreme if๐ฅ = 1
2(๐ฆ + ๐ง) for ๐ฆ, ๐ง โ ๐ถ implies ๐ฆ = ๐ง = ๐ฅ.
Definition 1.2 (exposed point). Let ๐ถ โ V be a convex set. A point ๐ฅ โ ๐ถ is exposed ifthere exists a linear functional ๐ : ๐ถ โ R such that ๐(๐ฅ) = 1 and ๐(๐ฆ) < 1 for anyother point ๐ฆ โ ๐ถ.
1. Set V = R๐ and consider the convex set ฮ๐โ1(R๐) = {๐ฅ โ R๐ : ๐ฅ โฅ 0, โจ1, ๐ฅโฉ = 1}.Determine all extreme points of ฮ๐โ1(R๐).
2. Show that every extreme point of ฮ๐โ1(R๐) is also exposed.3. Show that ฮ๐โ1(R๐) is the convex hull of all extreme points.
4. Set V = H๐ and consider the convex set S(H๐) ={
๐ โ H๐ : ๐ โชฐ 0, (I, ๐) = 1}
.Determine all extreme points of S(H๐).
5. Show that every extreme point of S(H๐) is also exposed.6. Show that S(H๐) is the convex hull of all extreme points.
Definition 1.3. A density matrix ๐ is called a pure (quantum) state if it is an extremepoint of ๐ฎ(H๐).
Pure states are the quantum mechanical analogue of deterministic classical proba-bility distributions.
7. Let ๐ โ S(H๐) be a pure quantum state. Find a (finite) quantum measurement{๐ป๐ : ๐ โ ๐ด} โ H๐ for which the measurement outcomes are deterministic, i.e.Pr[๐|๐] = 1 for one ๐ โ ๐ด and Pr[๐|๐] = 0 else.
8. Let ๐ โ S(H๐) be a pure quantum state. Find a (finite) quantum measurement{๐ป๐ : ๐ โ ๐ด} โ H๐ for which the measurement outcomes are fully random, i.e.Pr[๐|๐] = 1/|๐ด| for all ๐ โ ๐ด.
2 Symmetric and Antisymmetric tensorsLet ๐ป be a ๐-dimensional vector space with inner product โจยท, ยทโฉ and designated orthonor-mal basis ๐1, . . . , ๐๐. We associate the group of permutations ๐ฎ๐ among ๐ parties withthe following operators on Hโ๐:
๐๐๐1 โ ยท ยท ยท โ ๐๐ = ๐๐โ1(1) โ ยท ยท ยท โ ๐๐โ1(๐)
2
and linearly extended to ๐ปโ๐. Define
๐โจ๐ = 1๐!
โ
๐โ๐ฎ๐
๐๐ and ๐โง๐ = 1๐!
โ
๐โ๐ฎ๐
sign(๐)๐๐.
1. Verify that the map ๐ โ ๐๐ defines a unitary representation of the permutationgroup, i.e. each ๐๐ โ โ(๐ปโ๐) is a unitary operator and the map is a grouphomomorphism: ๐๐โ๐ = ๐๐๐๐ for all ๐, ๐ โ ๐ฎ๐.
2. Find a basis for the totally symmetric subspace โ๐ = range(๐โจ๐) โ ๐ปโ๐ that isorthonormal with respect to the extended inner product on ๐ปโ๐.
3. Find a basis for the totally antisymmetric subspace โ๐ = range(๐โง๐) โ ๐ปโ๐ thatis orthonormal with respect to the extended inner product on ๐ปโ๐.
4. Determine dim(โ๐
)and dim
(โ๐).
5. Let ๐ด = [๐๐๐ ]๐๐,๐=1 be a matrix associated to an operator in โ(๐ป). Verify theLeibniz formulae
det(๐ด) :=๐!โจ๐1 โ ยท ยท ยท ๐๐, ๐โง๐๐ดโ๐๐โง๐๐1 โ ยท ยท ยท โ ๐๐โฉ =โ
๐โ๐ฎ๐
sign(๐)๐โ
๐=1๐๐,๐(๐),
perm(๐ด) :=๐!โจ๐1 โ ยท ยท ยท ๐๐, ๐โจ๐๐ดโ๐๐โจ๐๐1 โ ยท ยท ยท โ ๐๐โฉ =โ
๐โ๐ฎ๐
๐โ
๐=1๐๐,๐(๐)
6. (optional) Prove Schurโs theorem: Let ๐ด โ โ(๐ป) be positive semidefinite. Then
perm(๐ด) โฅ det(๐ด).
Hint: Every positive semidefinite matrix admits a Cholesky decomposition: ๐ด =๐ฟ๐ฟ*, where ๐ฟ is lower-triangular.
3 Wiring computations with implications for quantum informationtheory
Let (๐ป, โจยท, ยทโฉ) be a ๐-dimensional inner product space with designated orthonormalbasis ๐1, . . . , ๐๐. The outer products ๐ธ๐๐ = ๐๐๐
๐๐ then form a basis of โ(๐ป). Define
vectorization:vec : โ(๐ป) โ ๐ป โ ๐ป ๐๐๐
๐๐ โฆโ ๐๐ โ ๐๐
and extend this action linearly to โ(๐ป). Partial traces are linear contractions:
tr1(๐ โ ๐ ) = tr(๐)๐ and tr2(๐ โ ๐ ) = tr(๐ )๐ for ๐, ๐ โ โ(๐ป)
and extend this definition linearly to โ(๐ป โ ๐ป). In wiring calculus these operationsassume the following pictorial form:
vec(๐) = ๐ and tr1(๐ ) = ๐ , tr2(๐ ) = ๐
3
1. Verify the following formulas with and without wiring calculus. Fix ๐, ๐ , ๐ โโ(๐ป). Then,
vec(๐๐ ๐) =๐ โ ๐๐ vec(๐ ),tr1(vec(๐)vec(๐ )*) =๐๐ *,
tr2(vec(๐)vec(๐ )*) =(๐ *๐)๐ ,
where ๐๐ is the transpose of ๐ (with respect to the designated matrix basis๐๐๐
๐๐ ) and ๐* = ๏ฟฝ๏ฟฝ๐ is the (basis-independent) adjoint.
2. Purification: Let ๐ โ โ(๐ป) be a positive semidefinite matrix. Show that thereexists a tensor product ๐ฅ โ ๐ป โ ๐ป such that
๐ = tr2(๐ฅ๐ฅ*).
Context: In quantum mechanics, partial traces correspond to marginalization(ignore one part of a joint probability distribution). Every density matrix ๐ โ๐ฎ(H๐) corresponds to the marginalization of a larger quantum state that is pure.
3. Schmidt-decomposition: Show that every ๐ก โ ๐ป โ ๐ป can be expressed in the form
๐ก =๐โ
๐=1๐๐๐ข๐ โ ๐ฃ๐,
for positive numbers ๐1, . . . , ๐๐ and orthonormal sets {๐ข1, . . . , ๐ข๐}, {๐ฃ1, . . . , ๐ฃ๐} โ๐ป.
4 Unitary operator bases and Bell basis measurements (optional)This exercise gathers auxiliary results that are essential for quantum teleportation (nextexercise). A proof of these relations is optional!
Let ๐1, . . . , ๐๐ be a designated orthonormal basis of ๐ป = C๐. For ๐, ๐ โ [๐] ={1, . . . , ๐} define the following operators
๐๐๐๐ = ๐๐โ๐ and ๐๐๐๐ = ๐๐๐๐๐ for all ๐ โ [๐]
and extend this action linearly to ๐ป. Here โ denotes addition modulo ๐ and ๐ =exp(2๐๐/๐) is a ๐-th root of unity. Combine both to obtain
๐๐,๐ = ๐๐๐๐ for all ๐, ๐ โ [๐].
1. Unitary operator basis: Each ๐๐,๐ โ โ(๐ป) is unitary and moreover,(๐ (๐, ๐), ๐ (๐โฒ, ๐โฒ)
)= tr
(๐ (๐, ๐)โ ๐ (๐โฒ, ๐โฒ)
)= ๐๐ฟ๐,๐โฒ๐ฟ๐,๐โฒ .
2. Mixing property for unitary operator bases:
1๐2
โ
๐,๐โ[๐]๐ *
๐,๐๐๐๐,๐ = tr(๐)๐
I for all ๐ โ H๐. (1)
4
3. Bell basis measurements: Define ฮฉ = ๐โ1vec(I)vec(I)* โ H๐ โ H๐ and set
๐ป๐,๐ = (๐ (๐, ๐) โ I)ฮฉ(๐ (๐, ๐)* โ I). (2)
This defines a family of ๐2 mutually orthogonal rank-one projectors on H๐ โ H๐.Hence, {๐ป๐,๐ : ๐, ๐ โ [๐]} is a valid quantum measurement for joint quantumstates defined on S
(H๐ โ H๐
).
5 Quantum teleportationThe concept of entanglement is the basis of several surprising โquantum technologiesโ.Quantum teleportation is a process by which a quantum state ๐ โ S(H๐) can betransmitted (โteleportedโ) from one location to another. To understand it, we needtwo additional facts from quantum information theory. We refer to Watrousโ lecturenotes (Lecture 3) for details.
Fact 5.1 (Partial measurement). Let ๐ โ S(H๐ โ H๐โฒ) be a joint quantum state and
let {๐ป๐ : ๐ โ ๐ด} โ H๐ be a partial measurement on the first system only. Then, theprobability of measuring outcome ๐ โ ๐ด is
Pr[๐|๐] = tr(๐ป๐ โ I๐). (3)
Conditioned on obtaining outcome ๐ โ ๐ด, the surviving quantum state on the secondsystem becomes
๐ = 1tr(๐ป๐ โ I๐)tr1(๐ป๐ โ I๐) โ ๐ฎ(H๐).
Fact 5.2 (Quantum circuit). Quantum systems can evolve with time: ๐(initial) โฆโ ๐(final).The most basic evolution is a unitary map ๐(final) = ๐๐(initial)๐
*, where ๐ โ U(๐) isa unitary operator. Such unitary evolutions do not change the defining properties ofquantum states and form the basis of quantum processing technologies. This is why wealso call them quantum circuits.
Suppose that two parties (Alice and Bob) are at very distant locations, but share abipartite, entangled state ฮฉ โ S(H๐ โH๐). One half is with Alice, while one half is withBob. Alice can then use her half of this joint entangled state to โteleportโ an arbitraryquantum state ๐ โ S(H๐) from her location to Bob. The protocol is as follows:
(i) Alice prepares the state ๐ โ S(H๐) she wants to transmit. She jointly measures ๐and her half of the entangled state in the Bell basis (2).
(ii) She records the observed measurement outcome (๐0, ๐0) โ [๐] ร [๐] and sends thesenumbers to Bob via a classical communication channel.
(iii) Upon receiving (๐0, ๐0), Bob applies the quantum circuit ๐(final) = ๐๐0,๐0๐(initial)๐*
๐0,๐0to his half of the entangled state ฮฉ.
5
๐
ฮฉ
๐ป๐ =
๐๐
Figure 1 Schematic depiction of the first step in the quantum teleportation procedure. Alice(top) and Bob (bottom) share a entangled state ฮฉ. Alice performs a Bell basis measurement๐ป๐ (with ๐ = (๐, ๐)) on ๐ and her half of ฮฉ. This affects Bobโs part of ฮฉ which getstransformed into a novel quantum state ๐๐. Since Alice performs a measurement, herquantum systems cease to exist.
This protocol perfectly transfers Aliceโs initial quantum state ๐ to Bob. Although, Aliceโsmeasurement outcome is random, the entire teleportation procedure is deterministic.Bobโs action depends on the outcome of Aliceโs measurement and perfectly corrects it forevery possible measurement outcome. We refer to Figure 1 for a pictorial representation.
1. Show that the probability outcome distribution for Aliceโs measurement is flat:Pr[(๐, ๐)|๐] = 1/๐2 for all ๐, ๐ โ [๐].Hint: Start with the schematic diagram of the protocol in Figure 1. Evaluateformula (3) by inserting Wiring formulas for ฮฉ, ๐ป๐,๐ and contracting lines.
2. Condition on Alice measuring (๐0, ๐0). Show that Bobโs half of the maximallyentangled state necessarily transforms into ๐ *
๐0,๐0๐๐๐0,๐0 โ S(H๐).3. Prove the correct working of the quantum teleportation protocol, regardless of
Aliceโs measurement outcome.4. The transmission of ๐ from Alice to Bob happens instantaneously โ regardless of
the distance between them. This seemingly contradicts causality โ Information canat most travel with the speed of light. Such apparent implications of entanglementgreatly worried scientists for decades. Argue that there is no need to worry:Quantum teleportation does not violate causality (in expectation).Hint: Use the mixing property (1) of unitary operator bases and flatness of Aliceโsmeasurement outcome distribution.
Homework IIhand-in date: May 13th, 2019
ACM 270-1, Spring 2019Richard Kueng and Joel TroppApril 29, 2019
1 Almost all pure states are maximally entangledThe purpose of this exercise is to prove the following famous result regarding pure stateentanglement.
Theorem 1.1 (Most pure states are almost maximally entangled). Set ๐ป1 = C๐1 and๐ป2 = C๐2. Choose a joint pure quantum state ๐ = ๐ข๐ข* uniformly from the complexunit sphere S(๐ป1 โ ๐ป2) โ S๐1๐2โ1. Then, for any ๐ > 0:
Pr๐ขโผS(๐ป1โ๐ป2)
[tr2(๐ข๐ข*) โ 1
๐1I
1โฅโ
๐1๐2
+ ๐
]โค 2 exp
(๐1๐2๐2
18๐3
).
This is a consequence of Haar integration and concentration of measure:
Theorem 1.2 (Leviโs Lemma). Let ๐ : S2๐โ1 โ R be a Lipschitz-continuous functionon the real-valued unit sphere, i.e. there is a constant ๐ฟ > 0 such that |๐(๐ฅ) โ ๐(๐ฆ)| โค๐ฟโ๐ฅ โ ๐ฆโโ2 for any ๐ฅ, ๐ฆ โ S2๐โ1. Then, for any ๐ > 0
Pr๐ฅโผS2๐โ1 [|๐(๐ฅ) โ E๐ฅ๐(๐ฅ)| โฅ ๐] โค 2 exp(
โ 2๐๐2
9๐3๐ฟ2
).
1. Use Haar-integration and โ๐โ22 = tr(๐2) to show that
E๐ข
tr2(๐ข๐ข*) โ 1
๐1I
2
2โค 1
๐2.
2. Combine Jensenโs inequality and โ๐โ1 โคโ
dim(๐ป)โ๐โ2 for any ๐ โ โ(๐ป) toconclude
E๐ข
tr2(๐ข๐ข*) โ 1
๐1I
1โคโ
๐1๐2
.
3. Use Leviโs Lemma to prove Theorem 1.1. Hint: find an isometric embedding ofthe complex unit sphere S๐ทโ1 into the real-valued unit sphere S2๐ทโ1.
2 Quadrature formulas for Haar integrationHaar integration provides closed-form expressions for all moments of the uniformdistribution over the complex unit sphere:
โซ
S๐โ1d๐(๐ฃ)(๐ฃ๐ฃ*)โ๐ =
(๐ + ๐ โ 1
๐
)โ1
๐โจ๐ for all ๐ โ N.
Full knowledge of all moments is often not necessary in concrete applications. Oftentimes,control of the first ๐ก moments suffices.
2
Definition 2.1. A complex projective ๐ก-design is a finite set of ๐ unit vectors ๐ค1, . . . , ๐ค๐ โS๐โ1 such that
1๐
๐โ
๐=1(๐ค๐๐ค
*๐ )โ๐ก =
โซ
S๐โ1d๐(๐ฃ)(๐ฃ๐ฃ*)โ๐ก.
It is instructive to think of ๐ก-designs as quadrature rules for the complex unit sphere.These finite point sets approximate the uniform measure up to ๐ก-th moments. They arealso the natural extension of ๐ก-wise independent functions to the complex unit sphere.
1. Show that every ๐ก-design is also a ๐กโฒ-design with ๐กโฒ โค ๐ก.2. Let ๐ค1, . . . , ๐ค๐ โ S๐โ1 be arbitrary. Show that for any ๐ก โ N,
๐น๐ก(๐ค1, . . . , ๐ค๐ ) = 1๐2
๐โ
๐,๐=1|โจ๐ค๐, ๐ค๐โฉ|2๐ก โฅ
(๐ + ๐ก โ 1
๐ก
)โ1
(Welch bound)
with equality if and only if ๐ค1, . . . , ๐ค๐ constitutes a ๐ก-design.Hint: Compute the squared Frobenius norm-difference between the frame operator๐โ1โ๐
๐=1(๐ค๐๐ค*๐ )โ๐ก and its Haar-uniform counterpart.
3. Show that every orthonormal basis of C๐ is a 1-design4. Two orthonormal bases ๐1, . . . , ๐๐ and ๐1, . . . , ๐๐ of C๐ are mutually unbiased1
if |โจ๐๐, ๐๐โฉ|2 = ๐โ1 for all 1 โค ๐, ๐ โค ๐. It is known that at most ๐ + 1 pairwisemutually unbiased bases can exist in any dimension ๐. Show that such maximalsets of mutually unbiased bases form a 2-design of cardinality ๐ = (๐ + 1)๐.Context: Explicit constructions are known for prime power dimensions ๐ = ๐๐.In contrast, still very little is known about maximal sets of MUBs in compositedimensions (including ๐ = 6).
5. Let ๐ฃ1, . . . , ๐ฃ๐ โ S๐โ1 be a 4-design. Show that{
๐๐ ๐ฃ๐๐ฃ
*๐ : 1 โค ๐ โค ๐
}does
constitute a valid quantum measurement. Moreover, show that this fixed quantummeasurement โalmostโ achieves the Helstrom bound for successfully distinguishingstates that have low rank. The optimal probability of correctly distinguishing twostates with a single 4-design measurement obeys
๐succ โฅ 12 + โ๐ โ ๐โ1
3โ
rank(๐ โ ๐)for all ๐, ๐ โ S
(H๐).
Hint: Apply the maximum likelihood rule to the classical probability distributionsover potential outcomes. Then, rewrite this expression as the expected absolutevalue of a scalar random variable ๐ and apply the following useful inequality byBerger: E|๐| โฅ
โE[๐2]3/E[๐4].
6. Show that a 2-design measurement does not allow for drawing comparable conclu-sions. Hint: use a mutually unbiased basis measurement and carefully pick purestates ๐, ๐ to minimize the success probability.
1A prominent example are the standard and Fourier bases, respectively.
3
3 Unital quantum channelsDefinition 3.1. Set ๐ป1 = C๐1 and ๐ป2 = C๐2 A linear map ๐ณ : โ(๐ป1) โ โ(๐ป2) is a unitalquantum channel, if
(i) Trace preservation: tr(๐ณ (๐)) = tr(๐) for all ๐ โ โ(๐ป1),(ii) Unitality: ๐ณ (I๐ป1) = I๐ป2 , where I๐ป๐ โ โ(๐ป๐) denote the identity operators on ๐ป1
and ๐ป2,(iii) Complete positivity: ๐ณ โโ(๐) โชฐ 0 for all psd ๐ โ โ(๐ป1 โ ๐ป1). Here โ : ๐ โฆโ ๐
denotes the identity operator on โ(๐ป1).
1. Positivity vs. complete positivity: Let ๐ฏ : โ(๐ป1) โ โ(๐ป1) be the transposechannel: ๐ โฆโ ๐๐ . Verify that this linear map is trace preserving, unital andpositive, i.e.๐ โชฐ 0 implies ๐ฏ (๐) โชฐ 0. However, show that the transpose map isnot completely positive.
2. Equivalent characterizations of unital channels. Show that the following fourconditions are equivalent:
(a) unital quantum channels: ๐ณ : โ(๐ป1) โ โ(๐ป2) obeys all requirements fromDefinition 3.1.
(b) Kraus-representation: There is ๐ โ N and operators ๐ด1, . . . , ๐ด๐ โ โ(๐ป1, ๐ป2)such that
๐ณ (๐) =๐โ
๐=1๐ด๐๐๐ด*
๐ and๐โ
๐=1๐ด๐๐ด
*๐ = I๐ป2 ,
๐โ
๐=1๐ด*
๐ ๐ด๐ = I๐ป1 .
(c) Choi-Jamiolkowski representation: The Choi-matrix
๐ฝ(๐ณ ) = ๐ณ โ โ(๐โ1ฮฉฮฉ*
)โ โ(๐ป2 โ ๐ป1) with ฮฉ = vec(I๐ป1)
is psd and obeys tr1(๐ฝ(๐ณ )) = ๐โ11 I๐ป1 , as well as tr2((๐ฝ(๐ณ )) = ๐โ1
2 I๐ป2 .(d) Stinespring representation: there exists ๐ป3 = C๐3 and a linear isometry
๐ : ๐ป1 โ ๐ป2 โ ๐ป3 such that ๐ณ (๐) = tr3(๐๐๐*).
Context: The lack of a quantum Birkhoff-von Neumann theorem renders the studyof general quantum evolutions somewhat cumbersome. The fact that positivity doesnot imply complete positivity (see 1.) does not make things easier, either. The fourequivalent characterizations of channels provide different points view with uniqueadvantages and drawbacks: (a) summarizes the three core properties of unital channels.(b) The Kraus representation is the easiest way to construct unital quantum channels inpractice. (c) The Choi-Jamiolkowski representation is a linear bijection that pinpointsthe convex structure of unital quantum maps. They are in one-to-one relation to anaffine slice of the cone of psd operators in โ(๐ป2 โ๐ป1). It also provides a tractable way tocheck complete positivity. (d) The Stinespring representation provides a nice conceptualinterpretation of general unital channels. They arise from considering an isometricevolution on a larger joint quantum system (system+environment) and subsequentlytracing out the environment (marginalization).
4
4 Average channel fidelity and twirlingSet ๐ป = C๐ and let ๐ณ : โ(๐ป) โ โ(๐ป) be a unital quantum channel. Define the averagefidelity:
๐(๐ณ ) =โซ
S๐โ1d๐(๐ฃ)โจ๐ฃ, ๐ณ (๐ฃ๐ฃ*)๐ฃโฉ
1. Let ๐ณ (๐) = โ๐๐=1 ๐ด๐๐๐ด*
๐ be a Kraus representation of ๐ณ . Show that
๐(๐ณ ) = 1(๐ + 1)๐
(๐ +
๐โ
๐=1|tr(๐ด๐)|2
).
2. Conclude that ๐(๐ณ ) โ [(๐ + 1)โ1, 1]
and find (unital) channels that saturate bothbounds.
3. Define the twirl of ๐ณ to be the following channel:
๐ฏ๐ณ =โซ
d๐๐ฐ* โ ๐ณ โ ๐ฐ i.e. ๐ฏ๐ณ (๐) =โซ
d๐๐*๐ณ (๐๐๐*)๐ .
Show that twirling turns every unital channel ๐ณ into a depolarizing channel:
๐ฏ๐ณ (๐) = ๐๐(๐) = ๐๐ + (1 โ ๐)I/๐ with ๐ = ๐๐(๐ณ ) โ 1๐ โ 1 .
Hint: Use the following generalization of the Haar-integral formula for tensorproducts: For any ๐ด โ โ(๐ปโ2),
โซd๐๐โ2๐ด(๐*)โ2 =
(๐ + 1
2
)โ1
tr(๐โจ2๐ด)๐โจ2 +(
๐
2
)โ1
tr(๐โง2๐ด)๐โง2 .
4. (optional:) show that the composition of two depolarizing channels ๐๐ and ๐๐ isagain a depolarizing channel: ๐๐ โ ๐๐ = ๐๐๐.
Context: the average fidelity is a popular benchmark for actual implementationof quantum circuits in the lab: apply a circuit, reverse it and check how close theresulting empirical average fidelity is to one. It is easy to check that twirling leavesthe average fidelity invariant, but reduces any channel to a depolarizing channel whichis much easier to analyze. This trick is at the basis of several popular techniques forbenchmarking concrete implementations of quantum circuits. For instance, the populartechnique of randomized benchmarking cleverly combines features 3 and 4 to estimatethe average fidelity in an extremely noise-resiliant fashion.
5 One-clean qubitTypical quantum circuits require initializing a set of qubits in a pure product state. Theone-clean qubit model substantially weakens this requirement: only one qubit must beinitialized in a product state, the remaining ๐ qubits are maximally mixed (โgarbageโ).Perhaps surprisingly, a single clean qubit suffices to do useful quantum computations.The following protocol shows how to efficiently compute traces of big tensor-productunitaries ๐ โ โ(๐ปโ๐). Set ๐ป = C2 and consider the following computation thatrequires ๐ + 1 qubits:
5
(i) Initialize: ๐in = ๐0๐*0 โ
(12I)โ๐
.
(ii) Apply a Hadamard gate to the first qubit: ๐1 = ๐ป โ Iโ๐๐in๐ป* โ Iโ๐.(iii) Apply a conditional unitary circuit to all ๐ + 1 qubits:
๐ถ๐๐0 โ ๐ฅ = ๐0 โ ๐ฅ, ๐ถ๐๐1 โ ๐ฅ = ๐1 โ ๐๐ฅ for all ๐ฅ โ ๐ปโ๐
and linearly extended to ๐ปโ(๐+1): ๐2 = ๐๐ฐ(๐1) = ๐ถ๐๐1๐ถ๐*.(iv) Apply a Hadamard gate to the first qubit: ๐3 = ๐ป โ Iโ๐๐2๐ป* โ Iโ๐.(v) Perform the following two-outcome quantum measurement:
๐ป0 = ๐0๐*0 โ Iโ๐, ๐ป1 = ๐1๐*
1 โ Iโ๐ โ โ(๐ปโ(๐+1)
).
1. Draw a (wiring) circuit diagram for this quantum computation.2. Show that the probability of obtaining outcome 0 is in one-to-one correspondence
with the real-part of the trace of ๐ :
Pr[0|๐3] = 12 + Re(tr(๐))
2๐+1 and Pr[1|๐3] = 12 โ Re(tr(๐))
2๐+1 .
3. How must the quantum computation be altered to estimate the imaginary part ofthe trace instead?
Context (the power of one-clean-qubit): The unitary matrix ๐ acts on the 2๐-dimensional space ๐ปโ๐. A naive computation of the trace therefore requires exponentialruntime (in ๐) on a classical computer. The one-clean qubit circuit above provides analternative means to estimate this trace. It is โefficientโ, whenever ๐ can be implementedin polynomial circuit size. Such short-sized unitary matrices occur naturally in thestudy of knots and are related to evaluating certain Jones polynomials (see e.g. Shor andJordan, Estimating Jones Polynomials is a Complete Problem for One Clean Qubit).Evaluating these Jones polynomials is believed to be very difficult for classical computers(NP-hard).
The relationship between DQC1 (the official term for problems that can be solvedefficiently using one-clean-qubit-architectures) and BQP (the class of problems that canbe solved efficiently using orthodox quantum computations) is still not fully understood.
Homework IIIhand-in date: June 3rd, 2019
ACM 270-1, Spring 2019Richard Kueng and Joel TroppMay 21, 2019
1 Tensor rankLet ๐1, ๐2 denote the standard basis in F2 (F = R, or F = C) and consider the followingtensor on F2 โ F2 โ F2 that is defined in terms of its frontal slices:
๐ก1 =(
1 00 1
), ๐ก2 =
(0 1
โ1 0
).
1. Show that this tensor has rank-two over F = C.2. Show that this tensor has rank-three over F = R and argue why an improvement
to rank-two is impossible over the real numbers.
2 Strassenโs algorithmApply Strassenโs algorithm for multiplying two 2ร2 matrices recursively to compute theproduct of two 4 ร 4 matrices. Whenever possible, use linearity of the matrix productto remove superfluous contributions and verify that the final result remains correct.
3 Matrix multiplication tensors

Set
$$\big\{a_i^j = e_i e_j^* : i \in [m],\ j \in [n]\big\} \quad \text{(standard basis of $\mathbb{C}^{m \times n}$)},$$
$$\big\{b_j^k = e_j e_k^* : j \in [n],\ k \in [p]\big\} \quad \text{(standard basis of $\mathbb{C}^{n \times p}$)},$$
as well as
$$\big\{c_k^i = e_k e_i^* : k \in [p],\ i \in [m]\big\} \quad \text{(standard basis of $\mathbb{C}^{p \times m}$)}$$
and define
$$\langle m, n, p \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{p} a_i^j \otimes b_j^k \otimes c_k^i.$$

1. Verify that $\langle m, n, p \rangle$ encodes matrix multiplication as a tensor. More precisely, verify
$$\operatorname{tr}_{1,2}\big(\langle m, n, p \rangle^*\, A \otimes B\big) = AB \in \mathbb{C}^{m \times p} \quad \text{for any } A \in \mathbb{C}^{m \times n},\ B \in \mathbb{C}^{n \times p}.$$
2. Verify Bini's identity:
$$\begin{aligned}
&\varepsilon\big(a_1^1 \otimes b_1^1 \otimes c_1^1 + a_1^1 \otimes b_1^2 \otimes c_2^1 + a_1^2 \otimes b_2^1 \otimes c_1^1 + a_1^2 \otimes b_2^2 \otimes c_2^1 + a_2^1 \otimes b_1^1 \otimes c_1^2 + a_2^1 \otimes b_1^2 \otimes c_2^2\big) \\
&\qquad + \varepsilon^2\big(a_1^1 \otimes b_2^2 \otimes c_2^1 + a_1^1 \otimes b_1^1 \otimes c_1^2 + a_1^2 \otimes b_2^1 \otimes c_2^2 + a_2^1 \otimes b_2^1 \otimes c_2^2\big) \\
&= \big(a_1^2 + \varepsilon a_1^1\big) \otimes \big(b_1^2 + \varepsilon b_2^2\big) \otimes c_2^1 \\
&\qquad + \big(a_2^1 + \varepsilon a_1^1\big) \otimes b_1^1 \otimes \big(c_1^1 + \varepsilon c_1^2\big) \\
&\qquad - a_1^2 \otimes b_1^2 \otimes \big(c_1^1 + c_2^1 + \varepsilon c_2^2\big) \\
&\qquad - a_2^1 \otimes \big(b_1^1 + b_1^2 + \varepsilon b_2^1\big) \otimes c_1^1 \\
&\qquad + \big(a_1^2 + a_2^1\big) \otimes \big(b_1^2 + \varepsilon b_2^1\big) \otimes \big(c_1^1 + \varepsilon c_2^2\big).
\end{aligned}$$
3. Relate the left-hand side of Bini's identity to the following partial matrix multiplication:
$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & 0 \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}.$$

4. What is the border rank of this partial matrix multiplication?
Context: The exact rank (which equals the border rank) of the full $2 \times 2$ matrix multiplication tensor is seven. The border rank associated with Bini's partial matrix multiplication tensor is much lower. Recursive arguments, similar to the one we did for Strassen, show that this border rank identity yields the following upper bound on the exponent of matrix multiplication: $\omega \leq 3 \log_6 5 \approx 2.70$. This was the first genuine improvement over Strassen's algorithm for matrix multiplication.
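The identity from part 2 can also be checked numerically. The following sketch is my own illustration (0-indexed matrix units, helper names hypothetical): it builds the partial multiplication tensor, evaluates the five rank-one terms at a small $\varepsilon$, and confirms that the error after dividing by $\varepsilon$ shrinks linearly in $\varepsilon$, the hallmark of a border rank decomposition.

\begin{verbatim}
import numpy as np

def E(i, j):
    # 2x2 matrix unit e_i e_j^* (0-indexed)
    M = np.zeros((2, 2))
    M[i, j] = 1.0
    return M

def rank_one(X, Y, Z):
    # rank-one tensor X (x) Y (x) Z, stored as a 6-index array
    return np.einsum('ab,cd,ef->abcdef', X, Y, Z)

# Partial matrix multiplication tensor: all triples (i, j, k) except
# those touching the missing entry a_22 (0-indexed: i == 1, j == 1).
P = sum(rank_one(E(i, j), E(j, k), E(k, i))
        for i in range(2) for j in range(2) for k in range(2)
        if not (i == 1 and j == 1))

def bini(eps):
    # the five rank-one terms on the right-hand side of Bini's identity
    return (rank_one(E(0, 1) + eps * E(0, 0), E(0, 1) + eps * E(1, 1), E(1, 0))
            + rank_one(E(1, 0) + eps * E(0, 0), E(0, 0), E(0, 0) + eps * E(0, 1))
            - rank_one(E(0, 1), E(0, 1), E(0, 0) + E(1, 0) + eps * E(1, 1))
            - rank_one(E(1, 0), E(0, 0) + E(0, 1) + eps * E(1, 0), E(0, 0))
            + rank_one(E(0, 1) + E(1, 0), E(0, 1) + eps * E(1, 0),
                       E(0, 0) + eps * E(1, 1)))

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, np.abs(bini(eps) / eps - P).max())   # error decays like eps
\end{verbatim}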
4 Tensor trains / Matrix product states

We consider translationally invariant matrix product states for tensors $t \in (\mathbb{C}^d)^{\otimes N}$ with "bond" (inner) dimension $D$. These are described by a single order-three tensor $A \in \mathbb{C}^{D \times D} \otimes \mathbb{C}^d$ (one $D \times D$ matrix $A_i$ for each index $i \in [d]$). We also allow ourselves the freedom of including an additional $D \times D$ matrix $B$ at the end of the tensor train:
$$t = \sum_{i_1, \ldots, i_N = 1}^{d} \operatorname{tr}\big(A_{i_1} A_{i_2} \cdots A_{i_N} B\big)\, e_{i_1} \otimes \cdots \otimes e_{i_N} \in (\mathbb{C}^d)^{\otimes N}.$$

For $N = 8$, the wiring formalism of such a construction assumes the following form:

[wiring diagram: a chain of eight $A$ tensors contracted along their bond indices, closed into a loop by $B$ (the trace), with one free physical leg per site]
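Concretely, each entry of $t$ is the trace of the corresponding product of matrices along the train. The following sketch (my own illustration; the helper name tt_entry is hypothetical) tabulates these entries for small $N$ and can be used to cross-check the hand computations below, shown here with the matrices from question 1 and $N = 3$:

\begin{verbatim}
import numpy as np

def tt_entry(A, B, idx):
    # entry t_{i_1 ... i_N} = tr(A_{i_1} A_{i_2} ... A_{i_N} B)
    M = np.eye(B.shape[0])
    for i in idx:
        M = M @ A[i]
    return np.trace(M @ B)

# d = D = 2, N = 3, with the matrices from question 1 (0-indexed cores)
A = np.array([[[1., 0.], [0., 1.]],     # A_1
              [[0., 1.], [0., 0.]]])    # A_2
B = np.array([[0., 1.], [1., 0.]])
t = np.array([[[tt_entry(A, B, (i, j, k)) for k in range(2)]
               for j in range(2)] for i in range(2)])
\end{verbatim}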
1. Set $d = 2$, $D = 2$, $N \in \mathbb{N}$ arbitrary and compute the tensor train associated with
$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
2. Set $d = 2$, $D = 2$, $N \in \mathbb{N}$ arbitrary and compute the tensor train associated with
$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
3. Fix $d = D$, as well as $N \in \mathbb{N}$ arbitrary, and compute the tensor train defined by the $d$ shift matrices $A_1, \ldots, A_d \in \mathbb{C}^{d \times d}$ and $B = e_k e_1^*$ for arbitrary $k \in [d]$. The shift matrices are defined by their action on standard basis vectors, $A_i e_j = e_{(i-1) \oplus j}$, and linearly extended (see also HW1). Here, $\oplus$ denotes addition modulo $d$.
5 Alternating least squares (ALS) algorithms for computing the CP decomposition

The CP decomposition factorizes an order-$N$ tensor $t \in \mathbb{R}^{d_1} \otimes \cdots \otimes \mathbb{R}^{d_N}$ into $R$ rank-one components:
$$t = \sum_{r=1}^{R} \sigma_r\, u_r^{(1)} \otimes \cdots \otimes u_r^{(N)}.$$
In the lecture, we saw an iterative algorithm that approximates a CP decomposition of order-three tensors. Given an input rank $R$, it sequentially optimizes over the different tensor factors while keeping the remaining factors fixed. Different matricizations of the original tensor allow for implementing each of these optimization steps efficiently, as a linear least-squares problem.
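As a reference point, here is a minimal sketch of the order-three scheme just described (my own reconstruction; the random initialization and fixed iteration count are arbitrary choices). Each factor update solves a least-squares problem against a matricization of the tensor, with the weights $\sigma_r$ absorbed into the factors:

\begin{verbatim}
import numpy as np

def als_cp3(T, R, iters=100, seed=0):
    # ALS for an order-three CP decomposition: T is approximated by
    # sum_r U[:, r] (x) V[:, r] (x) W[:, r].
    rng = np.random.default_rng(seed)
    d1, d2, d3 = T.shape
    U, V, W = (rng.normal(size=(d, R)) for d in (d1, d2, d3))
    # Khatri-Rao product of the two factors that are held fixed
    kr = lambda X, Y: np.einsum('ir,jr->ijr', X, Y).reshape(-1, R)
    for _ in range(iters):
        # each update: linear least squares against a matricization of T
        U = np.linalg.lstsq(kr(V, W), T.reshape(d1, -1).T, rcond=None)[0].T
        V = np.linalg.lstsq(kr(U, W), T.transpose(1, 0, 2).reshape(d2, -1).T,
                            rcond=None)[0].T
        W = np.linalg.lstsq(kr(U, V), T.transpose(2, 0, 1).reshape(d3, -1).T,
                            rcond=None)[0].T
    return U, V, W
\end{verbatim}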
1. Generalize this iterative algorithm to tensors of arbitrary order $N$. Write concise pseudo-code for this procedure.

2. Do a rigorous runtime analysis. How does each iteration scale with the problem parameters $N$, $d_{\max} = \max\{d_1, \ldots, d_N\}$ and $R$?

3. (optional) Implement your pseudo-code in your favorite programming language. Apply it to the tensor from Exercise 1 and check whether it can find the right rank behavior.