Beyond matrices: statistical method for higher-order tensors and its application. I

Miaoyan Wang
Department of Statistics, University of Wisconsin-Madison

Fudan University, July 2019
Page 1: Beyond matrices: statistical method for higher-order tensors (slides: pages.stat.wisc.edu/~miaoyan/tensor_tutorial_fudan.pdf)

Beyond matrices: statistical method for higher-order tensors and its application. I

Miaoyan Wang

Department of Statistics, University of Wisconsin-Madison

Fudan University, July 2019

Page 2

Introduction Population inference Tensor spectral norm Tensor decomposition Conclusions

Introduction: session aim

This session focuses on statistical machine learning methods for tensor and matrix analysis. We aim to cover:

I Spectral theory for higher-order tensors

I PCA and population structure inference

I Structured tensor decomposition and its statistical properties

I Application of tensor decomposition to genetics

I Low-rank tensor estimation from binary data

I Multiway clustering via tensor block models

Page 3

Introduction: about me

I Assistant professor in Statistics at University of Wisconsin-Madison, USA

I Past experiences:

I Postdoc in Computer Science at UC Berkeley
I Simons Math + Biology visitor at University of Pennsylvania
I PhD in Statistics at UChicago
I B.S. in Pure and Applied Mathematics, Fudan University

Page 4

My research

Statistical machine learning:

I structured tensor decomposition, optimization

Numerical analysis:

I functional properties of higher-order tensors

Biological applications:

I statistical genetics, complex traits, gene expression analysis

Page 5

Introduction: resources

The class site is http://www.stat.wisc.edu/~miaoyan/tensor.html. It provides:

I PDF copies of slides

I Datasets needed for exercises

I Links to software packages

Page 6

A success story: PCA of Europeans

[Figure: genotype matrix with individuals as rows and SNPs as columns; 1,389 samples, ~200k SNPs. Novembre et al. (2008)]

Page 7

Matrix methods are powerful, however...

[Figure: left, matrix PCA; right, principal components of kurtosis. All Gaussian except points 17 and 39. Figure credit: Jason Morton and Lek-Heng Lim (2009, 2015 Rasmus Bro).]

Page 8

What is a tensor?

I Tensors are generalizations of vectors and matrices:

I An order-k tensor A = [a_{i1...ik}] ∈ F^{d1×···×dk} is a hypermatrix with dimensions (d1, . . . , dk) and entries a_{i1...ik} ∈ F.

I This talk will focus on F = R or {0, 1}.
I We focus on tensors of order 3 or greater, also known as higher-order tensors.

Page 9

Tensors in statistical modeling

“Tensors are the new matrices” that tie together a wide range of areas:

I Longitudinal social network data {Y_t : t = 1, . . . , n}
I Spatio-temporal transcriptome data

I Joint probability table of a set of variables P(X1, X2, X3)

I Higher-order moments in topic models

I Markov models for the phylogenetic tree K1,3

Liu, Yuan, and Zhao 2017; Hoff 2015; Montanari and Richard 2014; Anandkumar et al. 2014; Mossel et al. 2004; McCullagh 1987.

Page 10

Tensors in genomics

I Many biomedical datasets come naturally in a multiway form.

I Multi-tissue, multi-individual gene expression measures can be organized as a multiway dataset A = [a_{git}] ∈ R^{nG×nI×nT}.

[Figure: analysis workflow: normalization, imputation, multiway clustering.]

To identify subsets of genes that are similarly expressed within subsets of individuals and tissues, we seek local blocks in the expression tensor.

Page 11

Tensors in scientific computing

Tensor algebra software speeds big-data analysis 100-fold (Science Daily).

I Deep learning frameworks: tensorflow / torch / theano

Page 12

Talk outline

Prohibitive Computational Complexity

Most higher-order tensor problems are NP-hard [Hillar & Lim, 2013].

Topics I will address:

I PCA and population structure inference

I Spectral theory for higher-order tensors

I Structured tensor decomposition and its statistical properties

Page 13

A success story: PCA of Europeans

[Figure: genotype matrix with individuals as rows and SNPs as columns; 1,389 samples, ~200k SNPs. Novembre et al. (2008)]

Page 14

Background: Population structure

I Many organisms (humans, Arabidopsis) spread across the world many thousands of years ago.

I Migration and genetic drift led to genetic diversity between groups.

Page 15

Population structure inferences

I Inference on genetic ancestry differences among individuals from different populations, or population structure, has been motivated by a variety of applications:

I population genetics
I genetic association studies
I personalized medicine
I forensics

I Advances in genotyping technologies have greatly facilitated the investigation of genetic diversity at remarkably high levels of detail.

I A variety of methods have been proposed for identifying genetic ancestry differences among individuals in a sample using high-density genome-screen data.

Page 16

Inferring Population Structure with PCA

I Principal components analysis (PCA) is the most widely used approach for identifying and adjusting for ancestry differences among sample individuals.

I PCA applied to genotype data yields principal components (PCs) that explain differences among the sample individuals in the genetic data.

I The top PCs are viewed as continuous axes of variation that reflect genetic variation due to ancestry in the sample.

I PCA is an unsupervised learning tool for dimension reduction in multivariate analysis.

Page 17

Data structure

I Sample of n individuals, indexed by i = 1, 2, . . . , n.

I Genome-screen data on m autosomal genetic markers, indexed by ℓ = 1, 2, . . . , m.

I At each marker, for each individual, we have a genotype value x_{iℓ}.

I Here we consider bi-allelic SNP data, so x_{iℓ} takes values 0, 1, or 2, corresponding to the number of reference alleles.

I We center and standardize these genotype values:

z_{iℓ} = (x_{iℓ} − 2p_ℓ) / √(2p_ℓ(1 − p_ℓ)),

where p_ℓ is an estimate of the reference allele frequency for marker ℓ.
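A minimal numpy sketch of this standardization, using a small hypothetical genotype matrix (the data and dimensions below are made up for illustration):

```python
import numpy as np

# Hypothetical genotype matrix: n = 4 individuals x m = 5 SNPs,
# entries in {0, 1, 2} = number of reference alleles.
X = np.array([[0, 1, 2, 0, 1],
              [1, 1, 0, 2, 0],
              [2, 0, 1, 1, 1],
              [1, 2, 1, 0, 2]], dtype=float)

# Estimate the reference allele frequency p_l at each marker:
# each individual carries 2 alleles, so p_l = mean(x_il) / 2.
p = X.mean(axis=0) / 2.0

# Center by 2*p_l and scale by sqrt(2*p_l*(1 - p_l)).
Z = (X - 2 * p) / np.sqrt(2 * p * (1 - p))

print(Z.shape)         # (4, 5)
print(Z.mean(axis=0))  # each column has mean 0 after centering
```

Because p_ℓ is estimated from the same sample, each standardized column has mean exactly zero.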

Page 18

Genetic Correlation Estimation

I Create an n × m matrix Z of centered and standardized genotype values, and from this, a genetic correlation matrix (GRM):

Φ = (1/m) Z Zᵀ

I Φ_{ij} is an estimate of the genome-wide average genetic correlation between individuals i and j.

I PCA relies on individuals from the same ancestral population being more genetically correlated than individuals from different ancestral populations.

Page 19

Standard Principal Components Analysis (PCA)

I PCA is performed by obtaining the eigendecomposition of Φ.

I Top eigenvectors (PCs) are used as surrogates for population structure.

I This identifies orthogonal axes of variation, i.e. linear combinations of SNPs, that best explain the genotypic variability among the n sample individuals.

I Individuals with “similar” values for a particular top principal component tend to have “similar” ancestry.
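The GRM-plus-eigendecomposition pipeline above can be sketched in a few lines of numpy; here Z is a randomly generated stand-in for a real standardized genotype matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in standardized genotype matrix Z (n individuals x m SNPs).
n, m = 50, 500
Z = rng.standard_normal((n, m))

# Genetic correlation matrix (GRM): Phi = (1/m) Z Z^T, an n x n matrix.
Phi = Z @ Z.T / m

# Eigendecomposition; eigh returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(Phi)

# Top principal components = eigenvectors with the largest eigenvalues;
# each individual gets one coordinate per PC.
pcs = evecs[:, ::-1][:, :2]
print(pcs.shape)   # (50, 2)
```

With real data, plotting the two columns of `pcs` against each other is exactly the kind of map shown in the Novembre et al. example.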

Page 20

PCA of Europeans

An application of principal components to genetic data from European samples showed that the first two principal components computed using 200K SNPs could map their country of origin accurately.

[Figure: genotype matrix with individuals as rows and SNPs as columns; 1,389 samples, ~200k SNPs. Novembre et al. (2008)]

Page 21

Population structure among Arabidopsis (host) sample

An application of PCA to genetic data from the 1001 Arabidopsis project largely captures the geographical origins of the Arabidopsis accessions:

I US vs. European
I Smaller regional groups among European accessions

[Figure: PCA scatter plots of accessions. Panel A: PC 1 vs. PC 2; Panel B: PC 2 vs. PC 3.]

Page 22

Population structure among pathogen sample

We develop a method for genetic correlation matrix (GRM) estimation using both mutation and deletion polymorphisms [PNAS, Vol. 115 (24), 2018].

I GRM can be used for clustering analysis.
I The Xanthomonas sample exhibits strong population stratification.

Page 23

HapMap ASW and MXL Ancestry

I Genome-screen data on 150,872 autosomal SNPs was used to estimate ancestry.

I Genome-wide ancestry proportions of every individual were estimated using the ADMIXTURE software (Alexander et al., 2009).

I A supervised analysis was conducted using genotype data from the following reference population samples for three “ancestral” populations:

I HapMap YRI for West African ancestry
I HapMap CEU samples for northern and western European ancestry
I HGDP Native American samples for Native American ancestry

Page 24

Conomos, Matthew P et al. Genetic epidemiology 39.4 (2015): 276-293.

Page 25

Figure source: SISG 2017. Timothy Thornton and Michael Wu.

Page 26

Table source: SISG 2017. Timothy Thornton and Michael Wu.

Page 27

Outline

PCA and population structure inference

Spectral theory for higher-order tensors

Structured tensor decomposition and its statistical properties

Page 28

Tensor spectral norm

I An order-k tensor can be viewed as a k-linear functional A : R^{d1} × · · · × R^{dk} → R given by

⟨A, x_1 ⊗ · · · ⊗ x_k⟩ = Σ_{i1,...,ik} a_{i1...ik} x^{(1)}_{i1} · · · x^{(k)}_{ik},

where x_1 ⊗ · · · ⊗ x_k is a rank-1 tensor and x_n = (x^{(n)}_1, . . . , x^{(n)}_{dn})ᵀ ∈ R^{dn}, n ∈ [k].

I Spectral norm. Determine the value of

‖A‖_2 = max_{‖x_i‖_2 = 1, x_i ∈ R^{di}} ⟨A, x_1 ⊗ · · · ⊗ x_k⟩.

I Finding ‖A‖2 is closely related to the best rank-1 tensor approximation.

Key question

Can we provide polynomial-time computable bounds for ‖A‖2?

Page 29

Unfolding

I Matricization. Rearrange the slices of the tensor in different modes into a matrix. For an order-3 tensor A:

Unfold_π(A) ∈ R^{d2 × d1 d3} for π = {{2}, {1, 3}}
Unfold_π(A) ∈ R^{d1 × d2 d3} for π = {{1}, {2, 3}}
Unfold_π(A) ∈ R^{d3 × d1 d2} for π = {{3}, {1, 2}}

I General unfolding. The set of all possible unfoldings of an order-k tensor is in one-to-one correspondence with the set P_[k] of all partitions of [k] := {1, . . . , k}.

I For π = {B_1, . . . , B_ℓ} ∈ P_[k], Unfold_π(A) is obtained by combining the modes in each block B_n into a single mode.
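Unfolding is a permute-then-reshape operation; a numpy sketch with a hypothetical `unfold` helper (the function name and toy tensor are mine, not from the slides):

```python
import numpy as np

# Toy order-3 tensor with dimensions (d1, d2, d3) = (2, 3, 4).
A = np.arange(24, dtype=float).reshape(2, 3, 4)

def unfold(tensor, row_modes):
    """Matricization: modes in `row_modes` index rows; the rest index columns."""
    k = tensor.ndim
    col_modes = [m for m in range(k) if m not in row_modes]
    perm = list(row_modes) + col_modes
    rows = int(np.prod([tensor.shape[m] for m in row_modes]))
    return np.transpose(tensor, perm).reshape(rows, -1)

print(unfold(A, [1]).shape)     # pi = {{2}, {1,3}}: a 3 x 8 matrix
print(unfold(A, [0]).shape)     # pi = {{1}, {2,3}}: a 2 x 12 matrix
print(unfold(A, [0, 1]).shape)  # pi = {{1,2}, {3}}: a 6 x 4 matrix
```

Every unfolding is a rearrangement of the same entries, so the Frobenius norm is unchanged; it is the spectral norm that varies with π.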


Page 31

Partition lattice

Partial order on P_[k]: π ≤ π′ if π is a refinement of π′.

Example: k = 4. [Figure: three example partitions π1, π2, π3 of {1, 2, 3, 4}.] Here π3 ≤ π1, while π1 and π2 are not comparable.

I The set of all partitions of [k] with two blocks ↔ all matricizations.

Page 32

Norm inequalities between two arbitrary unfoldings

Theorem (W. et al., 2017a)

Let A ∈ R^{d1×···×dk} be an arbitrary order-k tensor, and π1, π2 any two partitions in P_[k]. Define dim(A) = Π_{n=1}^{k} d_n. Then

√(dim_A(π1, π2) / dim(A)) ‖Unfold_{π1}(A)‖_2 ≤ ‖Unfold_{π2}(A)‖_2 ≤ √(dim(A) / dim_A(π2, π1)) ‖Unfold_{π1}(A)‖_2.

Given A ∈ R^{d1×···×dk}, we define the map dim_A : P_[k] × P_[k] → N_+ as

dim_A(π1, π2) = Π_{B ∈ π1} [max_{B′ ∈ π2} (Π_{n ∈ B ∩ B′} d_n)], where π1, π2 ∈ P_[k].

Wang et al., Linear Algebra and its Applications, Vol. 520 (2017), 44-66.

Page 33

Corollaries

Bottom-up inequality

Let A ∈ R^{d×···×d} be an order-k tensor, and let P^ℓ_[k] denote the set of all partitions of [k] with ℓ blocks. Then, for all 1 ≤ ℓ ≤ k,

(1 / d^{(k−ℓ)/2}) max_{π ∈ P^ℓ_[k]} ‖Unfold_π(A)‖_2 ≤ ‖A‖_2 ≤ min_{π ∈ P^ℓ_[k]} ‖Unfold_π(A)‖_2.
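The ℓ = 2 case can be checked numerically: ‖A‖_2 itself is NP-hard to compute, but both bounds are ordinary matrix spectral norms of matricizations. A numpy sketch (random tensor, helper function mine):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 3
A = rng.standard_normal((d, d, d))

def unfold(tensor, row_modes):
    # Moves `row_modes` to the front and flattens them into the row index.
    nd = tensor.ndim
    perm = list(row_modes) + [m for m in range(nd) if m not in row_modes]
    rows = int(np.prod([tensor.shape[m] for m in row_modes]))
    return np.transpose(tensor, perm).reshape(rows, -1)

# Matrix spectral norms of the three matricizations (l = 2 blocks).
norms = [np.linalg.norm(unfold(A, [m]), ord=2) for m in range(k)]

lower = max(norms) / d ** ((k - 2) / 2)  # lower bound on ||A||_2
upper = min(norms)                       # upper bound on ||A||_2
print(lower, upper)                      # lower <= ||A||_2 <= upper
```

Since both quantities sandwich ‖A‖_2, the lower bound never exceeds the upper bound, which the sketch confirms on random inputs.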

Page 34

Corollaries

Frobenius norm vs. spectral norm

‖A‖_F = max_{π ∈ P_[k]} ‖Unfold_π(A)‖_2,   ‖A‖_2 = min_{π ∈ P_[k]} ‖Unfold_π(A)‖_2,

‖A‖_F ≤ [(Π_n d_n) / max_{n ∈ [k]} d_n]^{1/2} ‖A‖_2.

This bound improves over the recent result of Friedland and Lim [Lemma 5.1, 2016], namely ‖A‖_F ≤ (Π_n d_n)^{1/2} ‖A‖_2.

Page 35

Orthogonal decomposability

I In general, the unfolding operation may change the spectral norm by up to a poly(d) factor.

I How about specially-structured tensors?

Page 36

Definition (Orthogonally decomposable)

A tensor A ∈ R^{d1×···×dk} is called orthogonally decomposable, or 0_[k]-OD, if it admits the decomposition

A = λ_1 a^{(1)}_1 ⊗ a^{(1)}_2 ⊗ · · · ⊗ a^{(1)}_k + · · · + λ_r a^{(r)}_1 ⊗ a^{(r)}_2 ⊗ · · · ⊗ a^{(r)}_k,

where the set of vectors {a^{(n)}_i} satisfies

⟨a^{(n)}_i, a^{(m)}_i⟩ = δ_{nm}, for all n, m ∈ [r].

Page 37

Definition (π-orthogonally decomposable)

A tensor A ∈ R^{d1×···×dk} is called π-orthogonally decomposable, or π-OD, if it admits the decomposition

A = λ_1 a^{(1)}_1 ⊗ a^{(1)}_2 ⊗ · · · ⊗ a^{(1)}_k + · · · + λ_r a^{(r)}_1 ⊗ a^{(r)}_2 ⊗ · · · ⊗ a^{(r)}_k,

with the factors grouped into the blocks B_1, . . . , B_ℓ of π, and where the set of vectors {a^{(n)}_i} satisfies

⟨⊗_{i ∈ B} a^{(n)}_i, ⊗_{i ∈ B} a^{(m)}_i⟩ = δ_{nm}, for all B ∈ π and all n, m ∈ [r].

Page 38

π-OD tensors and norm-preserving cones

Suppose A is a π-OD tensor and define c := ‖A‖_2.

I ‖Unfold_τ(A)‖_2 = c for all τ ∈ C_π := {τ : τ ≥ π} ∪ {τ : τ ≤ π} \ 1_[k].

I π-OD implies π′-OD for all π′ ≥ π. Hence ‖Unfold_τ(A)‖_2 = c for all τ ∈ C_{π1} ∪ · · · ∪ C_{πs}, where π1, . . . , πs are the matricizations obtained by merging blocks of π.

I This yields sharper bounds for spectral norms.
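For a concrete check of this norm preservation in the simplest symmetric case, one can build a symmetric orthogonally decomposable tensor and verify that all three matricizations share the spectral norm max_i |λ_i| (a numpy sketch with randomly chosen orthonormal vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 5, 3
lam = np.array([3.0, 2.0, 1.0])

# Orthonormal vectors u_1, ..., u_r via QR.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

# Orthogonally decomposable order-3 tensor A = sum_i lam_i u_i (x) u_i (x) u_i.
A = np.einsum('i,ai,bi,ci->abc', lam, U, U, U)

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Every matricization has the same spectral norm: max |lam_i| = 3.
for mode in range(3):
    s = np.linalg.norm(unfold(A, mode), ord=2)
    assert np.isclose(s, lam.max())
print("all matricization norms equal", lam.max())
```

The reason: each unfolding is a sum of rank-1 terms λ_i u_i Vec(u_i ⊗ u_i)ᵀ with orthonormal left and right factors, i.e. already an SVD, so its spectral norm is max_i |λ_i| regardless of the mode.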


Page 43

Outline

I PCA and population structure inference

I Spectral theory for higher-order tensors

I Structured tensor decomposition and its statistical properties

Page 44

Review of matrix eigendecomposition

Matrix perturbation theorem (Davis-Kahan 1970)

Let A and E be symmetric matrices, and Ā = A + E. Let u_i, ū_i denote the i-th eigenvectors of A and Ā, respectively. Then

sin Θ(u_i, ū_i) ≤ 2‖E‖_2 / min_{j ≠ i} |λ_j − λ_i|.

[Figure: schematic of a matrix decomposed into a sum of rank-1 terms plus noise.]

I Does there exist a tensor analogue of matrix eigendecomposition? How about perturbation analysis?


Page 46

Symmetric tensors

Definition (Symmetric tensors)

A tensor A = [a_{i1...ik}] ∈ R^{d1×···×dk} is called symmetric if d1 = · · · = dk and

a_{i1 i2 ... ik} = a_{i_{σ(1)} i_{σ(2)} ... i_{σ(k)}}

for all permutations σ of [k].

I By the spectral theorem, every symmetric matrix A admits an eigendecomposition

A = λ_1 u_1^{⊗2} + λ_2 u_2^{⊗2} + · · · + λ_r u_r^{⊗2}.

I This does not hold for general symmetric tensors.

Page 47

SOD tensors

I A tensor A is called symmetric and orthogonally decomposable (SOD) if

A = Σ_{i=1}^{r} λ_i u_i^{⊗k},

where {u_i} are orthonormal vectors in R^d and {λ_i} are non-zero scalars.

I For example, k = 3 and r = 3: [Figure: tensor written as a sum of three rank-1 terms.]

I Kruskal's theorem implies that {u_i} is unique even in the case of degenerate λ_i's.

I Eigen-components of a 3rd cumulant tensor are closely related to parameter estimation in latent variable models [Anandkumar et al. 2014].

Page 48

Tensor decomposition

I Nearly SOD tensors:

A = Σ_{i=1}^{r} λ_i u_i^{⊗k} + E,

where E ∈ R^{d×···×d} is a symmetric but otherwise arbitrary tensor with ‖E‖_2 ≤ ε.

I For example, k = 3 and r = 3: [Figure: sum of three rank-1 terms plus a noise tensor.]

Key question

Can we recover the vectors {ui} from the noisy observation A?

Page 49

Decomposition of SOD tensors: noiseless case

I The structure of A = Σ_{i=1}^{r} λ_i u_i^{⊗k} implies a common eigenspace for all matrix slices. [Figure: matrix slices sharing a common set of eigenvectors.]

I Is it possible to recover {u_i}_{i ∈ [r]} using the left singular vectors of the 1-mode unfolding, A_{(1)(2...k)}?


Page 51

Decomposition of SOD tensors: noiseless case

I The structure of A = Σ_{i=1}^{r} λ_i u_i^{⊗k} implies that the one-mode unfolding is

A_{(1)(2...k)} = Σ_{i=1}^{r} λ_i u_i Vec(u_i^{⊗(k−1)})ᵀ.

[Figure: unfolding of the rank-1 terms.]

I Is it possible to recover {u_i}_{i ∈ [r]} using the left singular vectors of the 1-mode unfolding, A_{(1)(2...k)}?
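In the noiseless case with distinct |λ_i|, that unfolding is literally an SVD, so the u_i can be read off the left singular vectors up to sign. A numpy sketch (random orthonormal u_i of my choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 6, 3
lam = np.array([3.0, 2.0, 1.5])

# Orthonormal u_1, ..., u_r and the SOD tensor A = sum_i lam_i u_i^{(x)3}.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
A = np.einsum('i,ai,bi,ci->abc', lam, U, U, U)

# One-mode unfolding A_(1)(23): a d x d^2 matrix.
A1 = A.reshape(d, d * d)

# Because the Vec(u_i (x) u_i) are orthonormal, this is exactly an SVD:
# singular values |lam_i|, left singular vectors u_i up to sign.
u_left, s, _ = np.linalg.svd(A1, full_matrices=False)
for i in range(r):
    overlap = abs(u_left[:, i] @ U[:, i])
    assert np.isclose(overlap, 1.0)   # u_i recovered up to sign
print(s[:r])
```

With degenerate λ_i the singular subspaces are no longer unique, which is exactly the caveat the next slides address.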

Page 52

Matrix vs. tensor decompositions


Caveats:

I A rank-r matrix with r > 1 can be decomposed in multiple ways as a sum of r outer-product (rank-1) terms in the case of degenerate λ_i's.

I Kruskal's theorem guarantees that the set of vectors {u_i}_{i∈[r]} of an SOD tensor is unique up to signs even when some λ_i's are degenerate.


Two-mode HOSVD via rank-1 matrix pursuit

Key idea: Instead of A(1)(2...k), we consider the two-mode unfolding of A.

Two-mode unfolding

A_{(12)(3...k)} is a d² × d^{k−2} matrix obtained by grouping the first 2 indices as the row index and the remaining (k − 2) indices as the column index.
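In numpy, both the one-mode and two-mode unfoldings are plain row-major reshapes (a quick illustrative check, not from the slides):

```python
import numpy as np

d, k = 4, 4
A = np.arange(d**k, dtype=float).reshape((d,) * k)

# Two-mode unfolding: modes (1,2) grouped as the row index,
# modes (3,...,k) grouped as the column index.
A_12 = A.reshape(d * d, d ** (k - 2))
# One-mode unfolding for comparison: d x d^{k-1}.
A_1 = A.reshape(d, d ** (k - 1))

print(A_12.shape, A_1.shape)  # (16, 16) (4, 64)
# Entry (i1, i2, i3, i4) lands at row i1*d + i2, column i3*d + i4.
print(A_12[1 * d + 2, 3 * d + 0] == A[1, 2, 3, 0])  # True
```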


Our results

Given an order-k nearly SOD tensor A ∈ R^{d×···×d},

A = ∑_{i=1}^r λ_i u_i^{⊗k} + E, where ‖E‖₂ ≤ ε.

Goal: recover {ui} from A.

I Noiseless case: every rank-1 matrix in the left singular space of A_{(12)(3...k)} is (up to a scalar) the Kronecker square of some robust tensor eigenvector u_i.

I Noisy case: if ε/|λ|_min ≲ d^{−(k−2)/2}, we can recover {u_i} up to error O(ε) in polynomial time.

Wang, M. and Song, Y.S., Journal of Machine Learning Research W&CP, Vol. 54 (2017) 614-622.



Comparison of tensor decomposition algorithms

I The error bound in tensor decomposition does not depend on the eigenvalue gap ⇒ more stable than matrix decomposition.

Method                                          Noise threshold (ε/|λ|_min ≤)        Recovery accuracy (‖û_i − u_i‖₂ ≤)
Power iteration (Anandkumar et al., 2014)       O(d^{−1}) for order 3                8ε/λ_i
Joint diagonalization (Kuleshov et al., 2015)   –                                    2ε√(‖λ‖₁ λ_max)/λ_i² + o(ε)
Our method (W. and Song, 2017b)                 O(d^{−1/2}) for order 3              2ε/λ_i + o(ε)
                                                O(d^{−(k−2)/2}) for order k

Wang, M. and Song, Y.S., Journal of Machine Learning Research W&CP, Vol. 54 (2017) 614-622.


Numerical experiments

Our method achieves a higher estimation accuracy and performs favorably as the order increases.

[figure: estimation error comparison, panels (a)-(c)]

TPM: tensor power method (JMLR 2014); OJD: orthogonal joint diagonalization (AISTATS 2015)


Numerical experiments

Our method has better convergence performance compared with other decomposition methods.

I Order-3:

I Order-4:

TPM: tensor power method (JMLR 2014); OJD: orthogonal joint diagonalization (AISTATS 2015)


Conclusions

I We see a successful application of the matrix spectral method in revealing latent structure in genetics data.

I We establish a full picture of the norm landscape over all possible unfoldings, providing the mathematical foundations of tensor algorithms.

I We propose a new tensor decomposition algorithm that provably handles a higher level of noise while achieving high recovery accuracy.


Future work

Keywords: higher-order tensor, genomics data, randomized algorithm.

I Developing statistical tools for large-scale genomics data:

I Integrative analysis of multiple omics datasets;

I Spatial-temporal gene expression analysis;

I Single-cell RNA-seq gene expression studies.

I Developing random tensor theory for probabilistic algorithms:

I Preliminary results: Gaussian random tensors and the array normal distribution; concentration properties;

I Open problems: a tensor analogue of the Tracy-Widom law for the top eigenvalue? A Bernstein-type inequality?


Publications:

I M. Wang and Y. S. Song. Tensor decomposition via two-mode higher-order SVD (HOSVD). Journal of Machine Learning Research W&CP (AISTATS track), Vol. 54 (2017) 614-622.

I M. Wang, K. Dao Duc, J. Fischer, and Y. S. Song. Operator norm inequalities between tensor unfoldings on the partition lattice. Linear Algebra and its Applications, Vol. 520 (2017) 44-66.

I M. Wang, J. Fischer, and Y. S. Song. Three-way clustering of multi-tissue multi-individual gene expression data via semi-nonnegative tensor decomposition. Annals of Applied Statistics, Vol. 13, No. 2 (2019) 1124-1148.


More

Example

A symmetric tensor that is not orthogonally decomposable:

A(:, :, 1) = [2 1; 1 1],    A(:, :, 2) = [1 1; 1 1].


Orthogonality

Definition (π-orthogonally decomposable)

A tensor A ∈ R^{d1×···×dk} is called π-OD with partition π = {B_1, . . . , B_ℓ} if it admits the decomposition

A = λ_1 a_1^{(1)} ⊗ a_2^{(1)} ⊗ · · · ⊗ a_k^{(1)} + · · · + λ_r a_1^{(r)} ⊗ a_2^{(r)} ⊗ · · · ⊗ a_k^{(r)},

with the modes of each term grouped according to the blocks B_1, . . . , B_ℓ, where the set of vectors {a_i^{(n)}} satisfies

⟨⊗_{i∈B} a_i^{(n)}, ⊗_{i∈B} a_i^{(m)}⟩ = δ_{nm},

for all B ∈ π and all n, m ∈ [r].


Unfolding of an order-k tensor

I General unfolding. The set of all possible unfoldings of an order-k tensor is in one-to-one correspondence with the set P[k] of all partitions of [k] = {1, . . . , k}.

I For π = {B1, . . . , B`} ∈ P[k], Unfoldπ(A) is obtained by combiningthe modes in each block Bn into a single mode.

Example. An order-4 tensor A = ⟦a_{i1 i2 i3 i4}⟧ ∈ R^{2×2×2×2} with a_{i1 i2 i3 i4} = 1 if i1 = i2 = i3 = i4 and 0 otherwise can be matricized into:

I a 2 × 2³ matrix: Unfold_[1|234](A) = [1 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 1];

I a 2² × 2² matrix: Unfold_[12|34](A) = [1 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 1].


Definition (Inner product)

For any two tensors A = ⟦a_{i1...ik}⟧, B = ⟦b_{i1...ik}⟧ ∈ R^{d1×···×dk} of identical order and dimensions, their inner product is defined as

⟨A, B⟩ = ∑_{i1,...,ik} a_{i1...ik} b_{i1...ik}.

The tensor Frobenius norm of A is defined as ‖A‖_F = √⟨A, A⟩.
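In code, the inner product and Frobenius norm are entrywise operations; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((3, 4, 5))

inner = np.sum(A * B)         # <A, B>: sum over all matching entries
fro = np.sqrt(np.sum(A * A))  # ||A||_F = sqrt(<A, A>)

# Equivalent formulations via vectorization:
print(np.isclose(inner, A.ravel() @ B.ravel()))  # True
print(np.isclose(fro, np.linalg.norm(A)))        # True
```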


Norm inequalities between any two tensor unfoldings

Given A ∈ R^{d1×···×dk}, we define the map dim_A : P[k] × P[k] → N_+ as

dim_A(π_1, π_2) = ∏_{B∈π_1} [ max_{B′∈π_2} ( ∏_{n∈B∩B′} d_n ) ], where π_1, π_2 ∈ P[k].

Theorem (p-norm inequalities)

Let A ∈ R^{d1×···×dk} be an arbitrary order-k tensor, and π_1, π_2 any two partitions in P[k]. Define dim(A) = ∏_{i=1}^k d_i. Then,

(a) For any 1 ≤ p ≤ 2,

([dim(A)]^{−1/p} / [dim_A(π_1, π_2)]^{−1/2}) ‖Unfold_{π_1}(A)‖_p ≤ ‖Unfold_{π_2}(A)‖_p ≤ ([dim(A)]^{1/p} / [dim_A(π_2, π_1)]^{1/2}) ‖Unfold_{π_1}(A)‖_p.

(b) For any 2 ≤ p ≤ ∞,

([dim(A)]^{1/p − 1} / [dim_A(π_1, π_2)]^{−1/2}) ‖Unfold_{π_1}(A)‖_p ≤ ‖Unfold_{π_2}(A)‖_p ≤ ([dim(A)]^{1 − 1/p} / [dim_A(π_2, π_1)]^{1/2}) ‖Unfold_{π_1}(A)‖_p.


Two-mode HOSVD algorithm for tensors with noise

Rank-1 matrices in LS(r) are sufficient to find {u_i}.

I Define the two-mode left singular space by

LS(r) := Span{a_i ∈ R^{d²} : a_i is the ith left singular vector of A_{(12)(3...k)}}.

I Look for a "nearly" rank-1 matrix M in the linear space LS(r):

maximize_{M∈R^{d×d}} ‖M‖₂, subject to M ∈ LS(r) and ‖M‖_F = 1.

Justification of the optimization: ‖M‖₂ ≤ ‖M‖_F ≤ √(rank M) ‖M‖₂.

I Apply an eigendecomposition to the matrix M to recover u_i.
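The justifying inequality is easy to check numerically (an illustrative sketch): ‖M‖₂ = ‖M‖_F forces rank one, so among unit-Frobenius matrices the spectral norm is maximized exactly by rank-1 matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))

spec = np.linalg.norm(M, 2)       # spectral norm = largest singular value
fro = np.linalg.norm(M, 'fro')
r = np.linalg.matrix_rank(M)

print(spec <= fro)               # True
print(fro <= np.sqrt(r) * spec)  # True

# Equality ||M||_2 = ||M||_F holds exactly for rank-1 matrices, so a
# unit-Frobenius rank-1 matrix attains the maximal spectral norm 1.
u = rng.standard_normal(5)
u /= np.linalg.norm(u)
M1 = np.outer(u, u)              # rank one, ||M1||_F = 1
print(np.isclose(np.linalg.norm(M1, 2), 1.0))  # True
```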


Exact Recovery for SOD Tensors in the noiseless case

Optimization to recover the desired factors {u_i} of A:

maximize_{M∈R^{d×d}} ‖M‖₂, subject to M ∈ LS0 and ‖M‖_F = 1.   (1)

Theorem (W. and Song, 2017)

The optimization problem (1) has exactly r pairs of local maximizers {±M*_i : i ∈ [r]}. Furthermore, they satisfy the following three properties:

1. ‖M*_i‖₂ = 1 for all i ∈ [r].

2. |⟨Vec(M*_i), Vec(M*_j)⟩| = δ_{ij} for all i, j ∈ [r], where ⟨·, ·⟩ denotes the inner product.

3. There exists a permutation π on [r] such that M*_i = ±u_{π(i)}^{⊗2} for all i ∈ [r].


Two-mode HOSVD algorithm for tensors with noise

Optimization to recover the desired factors {u_i} of A:

maximize_{M∈R^{d×d}} ‖M‖₂, subject to M ∈ LS(r) and ‖M‖_F = 1.

Algorithm 1 Two-mode HOSVD

Input: noisy tensor T = ∑_{i=1}^r λ_i u_i^{⊗k} + E, number of factors r.
Output: r pairs of estimators (û_i, λ̂_i).
1: Reshape the tensor T into a d²-by-d^{k−2} matrix T_{(12)(3...k)};
2: Find the top r left singular vectors of T_{(12)(3...k)}, denoted {a_1, . . . , a_r};
3: Initialize LS(r) = Span{a_i : i ∈ [r]};
4: for i = 1 to r do
5:   Solve M_i = argmax_{M∈LS(r), ‖M‖_F=1} ‖M‖_σ and û_i = argmax_{u∈S^{d−1}} |u^T M_i u|;
6:   Update M_i ← T_{(1)(2)(3...k)}(I, I, Vec(û_i^{⊗(k−2)})) and û_i ← argmax_{u∈S^{d−1}} |u^T M_i u|;
7:   Return (û_i, λ̂_i) ← (û_i, T(û_i, . . . , û_i));
8:   Set LS(r) ← LS(r) ∩ [Vec(û_i^{⊗2})]^⊥;
9: end for
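A simplified noiseless sketch of the two-mode idea in numpy (illustrative only: the spectral-norm maximization in step 5 is replaced here by eigendecomposing a random combination of the singular-space basis, which suffices when E = 0 and is not the algorithm's actual step):

```python
import numpy as np

rng = np.random.default_rng(3)
d, r = 10, 4

# Ground-truth SOD tensor A = sum_i lam_i u_i^{⊗3}.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
lam = rng.uniform(1, 2, size=r)
A = np.einsum('i,ai,bi,ci->abc', lam, U, U, U)

# Two-mode unfolding (d^2 x d) and its top-r left singular space,
# which spans {vec(u_i u_i^T)}.
A12 = A.reshape(d * d, d)
L, _, _ = np.linalg.svd(A12, full_matrices=False)
basis = L[:, :r]

# A random combination of the basis is S = sum_i w_i u_i u_i^T with
# a.s.-distinct weights, so one eigendecomposition recovers all u_i.
S = (basis @ rng.standard_normal(r)).reshape(d, d)
S = (S + S.T) / 2
w, V = np.linalg.eigh(S)
Uhat = V[:, np.argsort(-np.abs(w))[:r]]

overlap = np.abs(Uhat.T @ U)  # a permutation matrix up to sign
print(np.round(overlap.max(axis=0), 6))  # [1. 1. 1. 1.]
```

In the noisy case the random-combination shortcut is no longer reliable, which is why the algorithm instead seeks the nearly rank-1 matrix of maximal spectral norm in LS(r).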

[flowchart: Two-Mode HOSVD → Nearly Rank-1 Matrix → Post-Processing → Deflation]


Theorem (W. and Song, 2017b)

Let T = ∑_{i=1}^r λ_i u_i^{⊗k} + E ∈ R^{d×···×d}, where {u_i}_{i∈[r]} are orthonormal vectors, λ_i > 0 for all i ∈ [r], and ‖E‖₂ ≤ ε. Suppose ε ≤ |λ|_min/[c_0 d^{(k−2)/2}], where c_0 > 0 is a sufficiently large constant that does not depend on d. Let {(û_i, λ̂_i)}_{i∈[r]} be the output of Algorithm 1 for inputs T and r. Then there exists a permutation π on [r] such that for all i ∈ [r],

Loss(û_i, u_{π(i)}) ≤ 2ε/λ_{π(i)} + o(ε),    Loss(λ̂_i, λ_{π(i)}) ≤ 2ε + o(ε),

and

‖T − ∑_{i=1}^r λ̂_i û_i^{⊗k}‖₂ ≤ Cε + o(ε),

where C = C(k) > 0 is a constant that only depends on k.

For two unit vectors a, b ∈ R^d, define Loss(a, b) = min(‖a − b‖₂, ‖a + b‖₂). If a, b are two scalars in R, we define Loss(a, b) = min(|a − b|, |a + b|).


Run time comparison

Complexity (for order-3 tensors):

I TPM (Anandkumar et al., 2014): O(d³M) per iteration, where M is the number of restarts.

I OJD (Kuleshov et al., 2015): O(d³L) per iteration, where L is the number of projected matrices.

I Our method (W. and Song, 2017b): O(d³) per iteration.

Simulation study: decompose A ∈ R^{18000×500×40} into 10 components.

I SDA (Hore et al., 2016): 73,989 seconds (∼ 20.6 hrs)

I HOSVD (Omberg et al., 2007): 5,849 seconds (∼ 1.6 hrs)

I Our method (W. et al., 2017c): 6,047 seconds (∼ 1.7 hrs)


Theorem

Let A ∈ ⊗^k R^d be an order-k, dimension-d random tensor with i.i.d. standard Gaussian entries. Then

d^{1/2} < E‖A‖₂ < k d^{1/2}.

Further, ‖A‖₂ concentrates tightly around its expectation. Namely, for any s ≥ 0,

P(|‖A‖₂ − E‖A‖₂| ≥ s) ≤ 2e^{−s²/2}.

With little modification, the above result can be generalized to order-k tensors of dimensions (d_1, . . . , d_k). Specifically, we have

√d_max < E‖A‖₂ < ∑_{i=1}^k √d_i.

This implies ‖A‖₂ = O_P(√d_max) asymptotically for large d and fixed k.


Theorem (Non-Asymptotic Chain)

Let A ∈ ⊗^k R^d be an order-k, dimension-d random tensor with i.i.d. standard Gaussian entries. Then for any d ≥ 4 and k ≥ 2,

E‖Mat₁(A)‖₂ > E‖Mat₂(A)‖₂ > · · · > E‖Mat_{⌊k/2⌋}(A)‖₂.

Further, for any 1 ≤ p ≤ ⌊k/2⌋,

d^{(k−p)/2} < E‖Mat_p(A)‖₂ < d^{(k−p)/2} + d^{p/2}.

The following inequality chain holds almost surely as d → ∞ at any fixed k:

‖Mat₁(A)‖₂ > ‖Mat₂(A)‖₂ > · · · > ‖Mat_{⌊k/2⌋}(A)‖₂.

Further, for any 1 ≤ p ≤ ⌊k/2⌋,

‖Mat_p(A)‖₂ / d^{(k−p)/2} →_{a.s.} 1 + 1{p = k − p}  as d → ∞.
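A quick Monte-Carlo illustration of the chain for k = 4 (a sketch with loose numeric checks, not the theorem's exact constants):

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 20, 4
A = rng.standard_normal((d,) * k)

# Spectral norms of the p-mode unfoldings Mat_p(A) for p = 1, 2.
norm1 = np.linalg.norm(A.reshape(d, d**3), 2)     # Mat_1: d x d^3
norm2 = np.linalg.norm(A.reshape(d**2, d**2), 2)  # Mat_2: d^2 x d^2

# The theorem predicts norm1 > norm2, with Mat_1 near d^{(k-1)/2} =
# d^{3/2} and the square unfolding (p = k - p case) near 2d.
print(norm1 > norm2)                    # True
print(norm1 / d**1.5, norm2 / (2 * d))  # both close to 1
```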


[figure: operator norms of the 1- through 5-mode flattenings, plotted against the bound d^{p/2} + d^{(k−p)/2}. Left panel: "Multi-Mode Flattening of Random Tensor"; right panel: "Multi-Mode Flattening of Random Symmetric Tensor".]


Proposition (lp-norm vs. lq-norm)

Let A ∈ R^{d1×···×dk} be an order-k tensor and suppose q ≥ p ≥ 1. Then,

‖A‖_p ≤ ‖A‖_q ≤ [dim(A)]^{1/p − 1/q} ‖A‖_p.
