Post on 29-Dec-2019
Beyond matrices: statistical method for higher-order tensors and its application. II
Miaoyan Wang
University of Wisconsin, Madison
Fudan University,
July, 2019
gene expression low-rank binary tensor tensor block model Conclusions and future work
Talk outline
Prohibitive Computational Complexity
Most higher-order tensor problems are NP-hard [Hillar & Lim, 2013].
Topics I will address:
• Application of tensor decomposition to genetics
• Low-rank tensor estimation from binary data
• Multiway clustering via tensor block models
Applications of tensor decomposition to biology
• Many biomedical datasets come naturally in a multiway form, e.g., patients × genes × experimental conditions.
• For example, time-series gene expression measurements can be organized as an order-3 tensor $\mathcal{A} = [\![a_{gst}]\!] \in \mathbb{R}^{N\times S\times T}$.
[Figure: gene × individual × time tensor]
• g = 1, ..., N indexes the genes (often thousands of them);
• s = 1, ..., S indexes the individuals/samples;
• t = 1, ..., T indexes the time points.
• Typically, $N \gg ST$.
Goal
Identify subsets of genes that are similarly expressed within subsets of individuals, and learn the time trajectories of the associated expression levels.
Time-series gene expression
We simulate tensorial data $\mathcal{A} \in \mathbb{R}^{N\times S\times T}$ for time-series gene expression:
• Given a time t, the matrix slice A(·, ·, t) is transposable (i.e., both the gene and sample indexes are permutable) ⇒ simulate a “checkerboard” pattern for the latent bicluster memberships at the initial time.
• Given a gene g and a sample s, the trajectory A(g, s, ·) is temporally structured (i.e., the time indexes are meaningful) ⇒ for each bicluster, select a time trajectory from a library of candidate functions, such as sigmoids, sines, cosines, etc.
[Figure: two views of the simulated gene × individual × time tensor]
Tensor Applications to Time-Series Gene Expression Data
• Data (denoted A): noisy time-series gene expression with shuffled gene and sample indexes.
• Goal: Identify subsets of genes that are similarly expressed within subsets of individuals, and learn the time trajectories of the associated expression levels.
We propose to
1. first perform a truncated CP decomposition of A via the two-mode HOSVD algorithm;
[Figure: CP decomposition along the gene, individual, and time modes]
2. then perform K-means clustering on the gene factor matrix $G = [g_1, \ldots, g_k]$ and the sample factor matrix $S = [s_1, \ldots, s_k]$.
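Step 2 can be illustrated with a minimal sketch. It assumes the gene factor matrix (one row per gene, one column per CP component) has already been computed, and it runs plain Lloyd's k-means on its rows in NumPy. The function name `kmeans_rows` and the deterministic farthest-point initialization are illustrative choices, not part of the two-mode HOSVD method described above.

```python
import numpy as np

def kmeans_rows(F, k, n_iter=100):
    """Lloyd's k-means on the rows of a factor matrix F (n_rows x n_components).

    Hypothetical helper: centers are initialized by farthest-point selection,
    which is deterministic and an assumption of this sketch.
    """
    F = np.asarray(F, dtype=float)
    centers = [F[0]]
    for _ in range(1, k):
        # Squared distance from every row to its nearest already-chosen center.
        d2 = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[d2.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(F), dtype=int)
    for _ in range(n_iter):
        # Assign each row to the nearest center (squared Euclidean distance).
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its rows; keep it if the cluster is empty.
        new_centers = np.array([
            F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy "gene factor matrix": two well-separated groups of genes.
G = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
gene_labels, _ = kmeans_rows(G, k=2)
```

The same call applies unchanged to the sample factor matrix S.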
Comparison to matrix-based clustering
Tensor-based clustering outperforms matrix-based clustering. Matrix-based clustering:
• PCA on the time-averaged matrix $M_1 = \frac{1}{T}\sum_{t=1}^{T} \mathcal{A}(\cdot,\cdot,t)$;
• PCA on the sample-averaged matrix $M_2 = \frac{1}{S}\sum_{s=1}^{S} \mathcal{A}(\cdot,s,\cdot)$.
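For comparison, the two matrix-based baselines can be sketched in NumPy. The averaging matches the formulas above; the PCA step (column-centering followed by a thin SVD, returning gene scores on the leading components) is an assumed standard implementation, and the helper names are hypothetical.

```python
import numpy as np

def averaged_matrices(A):
    """Collapse an N x S x T tensor into the two matrices used by the baselines."""
    M1 = A.mean(axis=2)  # time-averaged, N x S: (1/T) * sum_t A(:, :, t)
    M2 = A.mean(axis=1)  # sample-averaged, N x T: (1/S) * sum_s A(:, s, :)
    return M1, M2

def pca_row_scores(M, n_components):
    """Scores of the rows (genes) on the leading principal components of M."""
    Mc = M - M.mean(axis=0, keepdims=True)  # center each column
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

A = np.random.default_rng(0).standard_normal((50, 4, 6))  # N=50 genes, S=4, T=6
M1, M2 = averaged_matrices(A)
scores = pca_row_scores(M1, n_components=2)  # could then be clustered, e.g. by k-means
```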
Comparison to HOSVD-based clustering
In our simulations, clustering based on CP decomposition is more accurate than that based on Tucker decomposition/HOSVD.
[Figure: schematics of HOSVD and CP decomposition on the gene × individual × time tensor; classification error rate (CER) of gene clustering versus model complexity (# of sample groups among 20 samples), for CP decomposition via two-mode HOSVD and for Tucker/HOSVD decomposition]
Multi-tissue gene expression analysis
• Central dogma of genetics: DNA →(transcription) RNA →(translation) protein.
• GTEx RNA-seq data: expression profiles (∼25,000 genes) measured from 544 individuals across 53 tissues.
[Figure: gene × individual expression matrices]
Genotype-tissue expression (GTEx) project, Aguet et al. Nature (2017) 550, 204-213.
Review: Matrix SVD for biclustering
[Figure: SVD of a samples × features data matrix, $X = UDV^T$]
• Columns of U describe patterns across samples.
• Rows of $V^T$ describe patterns across genes.
Y. Kluger et al., Genome Research (2003). 13(4): 703-71
Data Science Specialization (COURSERA) by Brian Caffo and Jeff Leek
Multi-tissue gene expression analysis
We have developed a semi-nonnegative tensor decomposition for three-way clustering of multi-tissue multi-individual gene expression data.
Wang et al., under review (2017). doi.org/10.1101/229245
Performance comparison
Our algorithm identifies 3-way clusters with higher accuracy than existing methods.
[Figure: relative error of our method versus existing methods, two simulation panels]
SDA: sparse decomposition of array (Hore et al., Nat. Gen. 2016).
HOSVD: higher-order singular value decomposition; Tucker decomposition (Omberg et al., PNAS 2007).
Component I: shared, global expression
• Both tissue and individual loadings are essentially flat ⇒ this component captures global expression common to all samples.
[Figure: component 1 loadings. (a) Tissue loadings, with tissues grouped as brain, blood, artery, esophagus, skin, heart, digestion, adipose, others; (b) gene loadings (ranked genes); (c) individual loadings (ranked individuals)]
Component I: shared, global expression
• Top genes are mainly mitochondrial genes (15/20 top genes); other non-mitochondrial genes include ACTB, EEF1A1, and EEF2.
[Figure: (b) gene loadings for component 1 (top 200 genes); (d) enriched gene ontologies (GOs) among top genes, plotted as the number of top genes in each GO with Benjamini-Hochberg adjusted p-values: SRP-dependent cotranslational protein targeting to membrane; cotranslational protein targeting to membrane; protein targeting to endoplasmic reticulum; ribosomal large subunit biogenesis; cytoplasmic translation]
Component II: brain tissues
• Tissue vector clearly separates brain tissues from non-brain tissues.
[Figure: component 2 loadings. (a) Tissue loadings (brain, blood, artery, esophagus, skin, heart, digestion, adipose); (b) gene loadings (ranked genes); (c) individual loadings (ranked individuals)]
Component II: brain tissues
• Enriched GOs are mostly related to the glutamate receptor signaling pathway and chemical synaptic transmission.
[Figure: (b) gene loadings for component 2 (top 899 genes); (d) enriched GOs among top genes, plotted as the number of top genes in each GO with B-H corrected p-values: glutamate receptor signaling pathway; chemical synaptic transmission, postsynaptic; excitatory postsynaptic potential; memory; synaptic vesicle exocytosis]
Component II: brain tissues
• Age explains more variation (24.4%) than gender (0.3%) or ethnicity (4.3%).
[Figure: (c) individual loadings for component 2 (ranked individuals)]
• How can we pinpoint the age-related genes?
Tensor projection to detect differentially-expressed genes
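• We incorporate information across tissues by projecting the tensor through the tissue vector.
• For a given test gene, we perform the following regression analysis:
$$\mathcal{A}(\text{test gene}, \cdot, T_i) = W\beta_0 + X\beta_1 + \varepsilon,$$
where $\mathcal{A}(\text{test gene}, \cdot, T_i)$ is the test gene's individual-by-tissue slice projected onto the tissue vector $T_i$, W is the covariate, X is the primary variable of interest, and $\varepsilon \sim N(0, I)$. We test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$.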
[Figure: genes × individuals × tissues tensor with the gene to be tested highlighted, and the projected gene-by-individual matrix]
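A minimal NumPy sketch of the projection-then-regression idea follows. It assumes the tensor is stored as genes × individuals × tissues and that the tissue vector is given, and it uses an ordinary least-squares t-statistic with an estimated noise variance (the slide states ε ∼ N(0, I), so treating the variance as known would be an equally valid choice). The data, effect size, uniform tissue weights, and helper names are made up for illustration.

```python
import numpy as np

def project_tissues(A, tissue_vec):
    """Contract the tissue mode of a genes x individuals x tissues tensor."""
    return np.einsum('gnt,t->gn', A, tissue_vec)

def ols_t_stat(y, W, X):
    """Fit y = W b0 + X b1 + e by OLS; return (b1_hat, t statistic for H0: b1 = 0)."""
    D = np.column_stack([W, X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    sigma2 = resid @ resid / (len(y) - D.shape[1])  # estimated noise variance
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[-1, -1])
    return coef[-1], coef[-1] / se

rng = np.random.default_rng(1)
n_genes, n_ind, n_tis = 3, 200, 5
age = rng.uniform(20, 70, n_ind)
A = rng.standard_normal((n_genes, n_ind, n_tis))
A[0] += 0.05 * age[:, None]  # gene 0: same-direction age effect in every tissue

tissue_vec = np.ones(n_tis) / np.sqrt(n_tis)  # assumed tissue vector (uniform weights)
P = project_tissues(A, tissue_vec)            # projected gene-by-individual matrix
W = np.ones((n_ind, 1))                       # covariate: intercept only
beta1, t = ols_t_stat(P[0], W, age)
```

Because every tissue contributes an effect in the same direction, the projection pools them while the noise is averaged, which is the mechanism behind the gain in power reported next.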
Power to detect differentially-expressed genes
• Using tensor-projection procedures, we identify 694 age-related genes, of which only 514 were detected by separate analyses.
• Receiver operating characteristic (ROC) plot from simulated expression data with age-related genes:
[Figure: ROC curves (true positive rate vs. false positive rate) for single-tissue analysis (10 tissues) and for tensor projection with r = 10 (= # of tissues), r = 5, and r = 3 (= # of tissue groups)]
Gene expression in the brain
• We analyze 13 brain tissues (a “subtensor”) and apply our tensor method to this subtensor.
• Expression modules are spatially restricted to specific brain regions.
[Figure: tissue vectors of tensor components 2 to 5 (ranked tissues), with tissues grouped as cerebellum, cortex, basal ganglia, and others (hippocampus, amygdala, spinal cord, etc.)]
Gene expression in the brain
Table 1: Three-way clustering analysis of gene expression in the brain

Tissue: enriched region | Gene: enriched ontology | Individual: variance explained by age | gender | ethnicity
cerebellum | dorsal spinal cord development | 0.0% | 8.0% | 0.2%
cortex | behavior defense response | 16.7% | 0.6% | 1.4%
basal ganglia | forebrain generation of neurons | 1.3% | 0.8% | 1.7%
others | embryonic skeletal system morphogenesis | 10.5% | 0.7% | 5.2%

Red numbers indicate p-value < 0.001.
Power to detect differentially-expressed genes
[Figure: tensor component 8. Tissue vector (ranked tissues: cerebellum, cortex, basal ganglia, others); top genes IL1RL1, SERPINA3, etc.; individual vector, with variance explained by age 22.1%, gender 0.6%, ethnicity 1.9%]
• The 2nd top gene, SERPINA3, is implicated in Alzheimer’s disease [Kamboh et al., Neurobiol Aging 2006].
• This gene has a moderate age effect in each single tissue (p ranging from 0.05 to $1.1\times10^{-6}$, with all effects in the same direction).
• The tensor projection method yields $p = 1.0\times10^{-8}$ for the age effect of this gene ⇒ an increase in significance of two orders of magnitude.
Outline
• Application of tensor decomposition to genetics
• Low-rank tensor estimation from binary data
• Multiway clustering via tensor block models
Review: Matrix SVD
[Figure: SVD of a samples × features data matrix, $X = UDV^T$]
• Columns of U describe patterns across samples.
• Rows of $V^T$ describe patterns across genes.
Tensor rank decomposition
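• Tensor analogue of matrix SVD:
$$\mathcal{Y} = \sum_{r=1}^{R} \lambda_r\, a_r \otimes b_r \otimes c_r,$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_R$ and $\|a_r\|_2 = \|b_r\|_2 = \|c_r\|_2 = 1$.
• Example: a rank-3 order-3 tensor.
[Figure: a rank-3 order-3 tensor drawn as the sum of three rank-1 tensors]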
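To make the decomposition concrete, the sum of rank-1 terms can be assembled with `numpy.einsum`. The sketch below builds a rank-3 order-3 tensor from unit-norm factors and decreasing weights λ_r (random values chosen only for illustration); the mode-1 unfolding of such a tensor has matrix rank at most R, and exactly R for generic factors like these.

```python
import numpy as np

def cp_tensor(lam, A, B, C):
    """sum_r lam[r] * a_r (outer) b_r (outer) c_r, with a_r, b_r, c_r the r-th columns."""
    return np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)

rng = np.random.default_rng(0)
d1, d2, d3, R = 4, 5, 6, 3
A, B, C = (rng.standard_normal((d, R)) for d in (d1, d2, d3))
A, B, C = (M / np.linalg.norm(M, axis=0) for M in (A, B, C))  # unit-norm columns
lam = np.array([3.0, 2.0, 1.0])                               # lambda_1 >= lambda_2 >= lambda_3

Y = cp_tensor(lam, A, B, C)
unfold_rank = np.linalg.matrix_rank(Y.reshape(d1, -1))  # rank of the mode-1 unfolding
```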
Low-rank approximation
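• A noisy rank-R tensor:
$$\underbrace{\mathcal{Y}}_{\text{observation}} = \underbrace{\sum_{r=1}^{R} \lambda_r\, a_r \otimes b_r \otimes c_r}_{\text{signal}} + \underbrace{\mathcal{E}}_{\text{noise}},$$
• Assume that R is known and the error $\mathcal{E} = [\![\varepsilon_{ijk}]\!]$ has i.i.d. $N(0, \sigma^2)$ entries.
• Maximizing the likelihood under the i.i.d. Gaussian error model is equivalent to minimizing the squared loss:
$$\min_{a_r, b_r, c_r, \lambda_r:\ r \in [R]} \Big\| \mathcal{Y} - \sum_{r=1}^{R} \lambda_r\, a_r \otimes b_r \otimes c_r \Big\|_F^2$$
• CANDECOMP/PARAFAC (CP) tensor approximation (Carroll & Chang, 1970; Kolda & Bader, 2009; Casey et al., 2018)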
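The squared-loss problem above is commonly solved by alternating least squares (ALS), which updates one factor matrix at a time while the others are fixed; each update is an ordinary least-squares problem, so the objective never increases. The sketch below is a generic CP-ALS for order-3 tensors and is not necessarily the solver used in the talk; the random initialization, iteration count, and function name are assumptions.

```python
import numpy as np

def cp_als(Y, R, n_iter=200, seed=0):
    """Rank-R CP approximation of an order-3 tensor by alternating least squares.

    Returns weights lam (descending), unit-norm factors A, B, C, and the
    squared-loss value after every sweep.
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, R)) for d in Y.shape)
    losses = []
    for _ in range(n_iter):
        # Normal equations for A: A (B^T B * C^T C) = sum_jk Y_ijk B_jr C_kr.
        A = np.einsum('ijk,jr,kr->ir', Y, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', Y, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', Y, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        fit = np.einsum('ir,jr,kr->ijk', A, B, C)
        losses.append(float(np.sum((Y - fit) ** 2)))
    # Move column norms into lambda_r and sort in decreasing order.
    na, nb, nc = (np.linalg.norm(M, axis=0) for M in (A, B, C))
    lam = na * nb * nc
    order = np.argsort(lam)[::-1]
    return lam[order], (A / na)[:, order], (B / nb)[:, order], (C / nc)[:, order], losses

# Usage on a noiseless rank-2 tensor.
rng = np.random.default_rng(1)
Y = np.einsum('ir,jr,kr->ijk', *(rng.standard_normal((d, 2)) for d in (6, 5, 4)))
lam, A, B, C, losses = cp_als(Y, R=2)
Y_hat = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
```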
Bibliography dataset in DBLP
• Three-way bibliography relationships (author, conference, keyword) extracted from the DBLP database.
• A tensor entry is 1 if the triplet co-occurs in a bibliography entry.
• The data can be organized into a tensor of the form $\mathcal{Y} \in \{0,1\}^{10\text{k}\times 200\times 10\text{k}}$ (10k authors, 200 conferences, and 10k keywords).
Latent variable models for binary tensors
Directly applying rank-R approximation to $\mathcal{Y} = [\![Y_{ijk}]\!]$ is sub-optimal. Why? (symmetry w.r.t. 0 ↔ 1; non-negativity; non-linearity)
[Figure: a “preference” tensor and the resulting binary tensor]
• Θ is unknown and has low rank.
• $\pi: \mathbb{R} \to [0,1]$ is a known function (e.g., the logistic curve).
• $\Theta \in \mathbb{R}^{d_1\times d_2\times d_3}$, $\mathcal{Y} \in \{0,1\}^{d_1\times d_2\times d_3}$.
Probabilistic models for binary tensors
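Latent variable formulation:
• We observe $\mathcal{Y} = [\![Y_{ijk}]\!] \in \{0,1\}^{d_1\times d_2\times d_3}$.
• Assume an underlying continuous-valued tensor $\mathcal{Z} = [\![Z_{ijk}]\!]$:
$$Y_{ijk} \mid Z_{ijk} = \begin{cases} 1 & \text{if } Z_{ijk} \ge 0, \\ 0 & \text{if } Z_{ijk} < 0. \end{cases}$$
• The latent tensor $\mathcal{Z}$ is further modeled as
$$\mathcal{Z} = \underbrace{\Theta}_{\text{signal}} + \underbrace{\mathcal{E}}_{\text{noise}}, \quad \text{where } \operatorname{rank}(\Theta) \le R,$$
and the entries of $\mathcal{E} = [\![\varepsilon_{ijk}]\!]$ are i.i.d. $N(0, \sigma^2)$.
• $\operatorname{rank}(\Theta) = \min\{R \in \mathbb{N}_+ : \Theta = \sum_{r=1}^{R} a_r^{(1)} \otimes \cdots \otimes a_r^{(K)}\}$.
• Assume R is known or can otherwise be estimated (e.g., using BIC).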
Connection with GLM
More generally, we can model the binary tensor $\mathcal{Y}$ as
$$\mathcal{Y} = [\![Y_{ijk}]\!] \overset{\text{ind.}}{\sim} \text{Bernoulli}(\pi(\Theta)), \quad \text{where } \operatorname{rank}(\Theta) \le R,$$
and $\pi: \mathbb{R} \to [0,1]$ is the link function (we allow π to be applied to tensors entrywise):
• Logistic link: $\pi(\theta) = \frac{1}{1 + e^{-\theta/\sigma}}$ ⟺ the latent noise $\mathcal{E}$ follows an i.i.d. logistic distribution with scale parameter σ.
• Probit link: $\pi(\theta) = \Phi(\theta/\sigma)$, where Φ(·) is the CDF of the standard normal distribution ⟺ the latent noise $\mathcal{E}$ follows i.i.d. $N(0, \sigma^2)$.
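The equivalence between each link function and its latent-noise formulation can be checked by simulation. The NumPy sketch below (helper names are hypothetical) draws $Y = \mathbb{1}\{\Theta + \mathcal{E} \ge 0\}$ with Gaussian or logistic noise of scale σ and compares the empirical frequency of ones with π(θ); the probit link uses Φ(x) = (1 + erf(x/√2))/2 from Python's `math` module.

```python
import math
import numpy as np

def logistic_link(theta, sigma=1.0):
    """pi(theta) = 1 / (1 + exp(-theta / sigma))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(theta, dtype=float) / sigma))

def probit_link(theta, sigma=1.0):
    """pi(theta) = Phi(theta / sigma), with Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(theta, dtype=float) / (sigma * math.sqrt(2.0))))

def sample_binary(theta, sigma, noise, rng):
    """Y = 1{theta + E >= 0} with i.i.d. Gaussian or logistic noise of scale sigma."""
    if noise == "gaussian":
        E = rng.normal(0.0, sigma, size=theta.shape)
    else:
        E = rng.logistic(0.0, sigma, size=theta.shape)
    return (theta + E >= 0).astype(int)

rng = np.random.default_rng(0)
theta = np.full(200_000, 0.5)
freq_probit = sample_binary(theta, 1.0, "gaussian", rng).mean()  # close to probit_link(0.5)
freq_logit = sample_binary(theta, 1.0, "logistic", rng).mean()   # close to logistic_link(0.5)
```

Both checks rest on the noise being symmetric: $P(\theta + \varepsilon \ge 0) = P(\varepsilon \le \theta)$, which is the noise CDF evaluated at θ.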
Low-rank tensor estimation
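• Goal: estimate low-rank $\Theta \in \mathbb{R}^{d_1\times d_2\times d_3}$ and/or $\pi(\Theta) \in [0,1]^{d_1\times d_2\times d_3}$.
• Data: $\mathcal{Y} \in \{0,1\}^{d_1\times d_2\times d_3}$.
Log-likelihood:
$$\mathcal{L}_{\mathcal{Y}}(\Theta) = \sum_{ijk} \Big( \mathbb{1}(Y_{ijk}=1) \log \pi(\theta_{ijk}) + \mathbb{1}(Y_{ijk}=0) \log\big(1 - \pi(\theta_{ijk})\big) \Big).$$
We propose to estimate Θ using the constrained MLE:
$$\hat{\Theta}_{\text{MLE}} = \arg\max_{\Theta \in \mathcal{D}} \mathcal{L}_{\mathcal{Y}}(\Theta),$$
where
$$\mathcal{D}(R, \alpha) = \left\{ \Theta \in \mathbb{R}^{d_1\times d_2\times d_3} : \operatorname{rank}(\Theta) \le R,\ \|\Theta\|_\infty \le \alpha \right\}.$$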
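As a quick check on the log-likelihood, the NumPy sketch below evaluates $\mathcal{L}_{\mathcal{Y}}(\Theta)$ directly from the indicator form and, for the logistic link (σ = 1 assumed), in the equivalent form $\sum_{ijk}\log\pi(q_{ijk}\theta_{ijk})$ with $q_{ijk} = 2Y_{ijk} - 1$, which relies on $1 - \pi(\theta) = \pi(-\theta)$. It also checks the entrywise bound $\|\Theta\|_\infty \le \alpha$ of the constraint set; the rank constraint is not checked. Function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(Y, Theta, link):
    """sum_ijk 1(Y=1) log pi(theta) + 1(Y=0) log(1 - pi(theta))."""
    p = link(Theta)
    return float(np.sum(np.where(Y == 1, np.log(p), np.log1p(-p))))

def logistic_loglik_q(Y, Theta):
    """Same quantity as sum log pi(q * theta), q = 2Y - 1, with log pi(x) = -log(1 + e^{-x})."""
    q = 2 * Y - 1
    return float(-np.sum(np.logaddexp(0.0, -q * Theta)))

def satisfies_sup_norm(Theta, alpha):
    """Entrywise part of the constraint set D(R, alpha): ||Theta||_inf <= alpha."""
    return bool(np.max(np.abs(Theta)) <= alpha)

rng = np.random.default_rng(0)
Theta = rng.uniform(-3, 3, size=(4, 5, 6))
Y = (rng.random(Theta.shape) < sigmoid(Theta)).astype(int)
ll_direct = log_likelihood(Y, Theta, sigmoid)
ll_q = logistic_loglik_q(Y, Theta)
```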
Convergence rate
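Define $\text{Loss}(\Theta_1, \Theta_2) = \frac{1}{\sqrt{\prod_k d_k}} \|\Theta_1 - \Theta_2\|_F$, where $\Theta_1, \Theta_2 \in \mathbb{R}^{d_1\times\cdots\times d_K}$.
Error bound for constrained MLE (W. and Li, 2019)
Assume that $\Theta_{\text{true}} \in \mathcal{D}(R, \alpha)$ and regularity conditions on the link function π. Then, with high probability,
$$\text{Loss}(\hat{\Theta}_{\text{MLE}}, \Theta_{\text{true}}) \le C \frac{L_\alpha}{\gamma_\alpha} \sqrt{\frac{\log(K)\, R^{K-1} \sum_k d_k}{\prod_k d_k}},$$
where $\dot\pi$ and $\ddot\pi$ denote the first and second derivatives of π, and
$$L_\alpha := \sup_{|\theta| \le \alpha} \frac{\dot\pi(\theta)}{\pi(\theta)(1 - \pi(\theta))}, \qquad \gamma_\alpha := \inf_{|\theta| \le \alpha} \left( \frac{\dot\pi^2(\theta)}{\pi^2(\theta)} - \frac{\ddot\pi(\theta)}{\pi(\theta)} \right) > 0.$$
Is this bound tight?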
Special case: Probit link
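$$L_{\alpha,\sigma} \approx \frac{\alpha/\sigma + 1}{\sigma}, \qquad \gamma_{\alpha,\sigma} \approx \frac{\alpha/\sigma + 1/6}{\sigma^2}\, e^{-\alpha^2/\sigma^2}.$$
Theorem (W. and Li, 2019)
• Upper bound for the constrained MLE: w.h.p.,
$$\text{Loss}(\hat{\Theta}_{\text{MLE}}, \Theta_{\text{true}}) \le \min\left\{ 2\alpha,\ C\sigma \left( \frac{\alpha + \sigma}{6\alpha + \sigma} \right) \exp\left( \frac{\alpha^2}{\sigma^2} \right) \sqrt{\frac{d_{\max}}{\prod_k d_k}} \right\}.$$
• Lower bound for any estimator:
$$\inf_{\hat{\Theta}} \sup_{\Theta_{\text{true}} \in \mathcal{D}(R,\alpha)} \mathbb{E}\, \text{Loss}(\hat{\Theta}, \Theta_{\text{true}}) \ge \min\left\{ \alpha,\ C'\sigma \sqrt{\frac{d_{\max}}{\prod_k d_k}} \right\}.$$
• The constrained MLE is rate-optimal in terms of $\{d_k\}$ in the class of low-rank tensors.
• Tensor vs. “best” matricization: $O(d^{-K+1})$ vs. $O(d^{-\lfloor K/2 \rfloor})$.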
Phase Diagram
• Define the signal-to-noise ratio $\text{SNR} = \|\Theta_{\text{true}}\|_\infty / \sigma = \alpha/\sigma$.

SNR regime | SNR ≫ O(1) | O(1) ≳ SNR ≫ $O(d^{-(K-1)/2})$ | $O(d^{-(K-1)/2})$ ≳ SNR
Binary tensor | $\sigma e^{\alpha^2/\sigma^2} d^{-(K-1)/2}$ ↘ (“noise helps” region) | $\sigma d^{-(K-1)/2}$ ↗ (“noise hurts” region) | α
Continuous-valued tensor | $\sigma d^{-(K-1)/2}$ | $\sigma d^{-(K-1)/2}$ | α

Table: Error bound for estimating $\Theta_{\text{true}} \in (\mathbb{R}^d)^{\otimes K}$ from noisy observations.
• Noise helps in the high-SNR regime! (Davenport et al., 2014; Cai & Zhou, 2013.)
Noise helps!
Intuition: If σ = 0, we cannot recover the magnitude of the underlying tensor $\Theta \in \mathbb{R}^{d_1\times\cdots\times d_K}$ from the observed binary tensor $\mathcal{Y} \in \{0,1\}^{d_1\times\cdots\times d_K}$.
• Consider the rank-1 probit model without noise:
$$Y_{ijk} = \mathbb{1}\{\theta_{ijk} > 0\}, \quad \Theta = a \otimes a \otimes a.$$
• We cannot distinguish $a = (a_1, \ldots, a_d)^T$ from $\tilde{a} = (\operatorname{sign}(a_1), \ldots, \operatorname{sign}(a_d))^T$, since $\theta_{ijk} = a_i a_j a_k$ has the same sign in both cases.
• Contrary to many statistical problems, noise is essential for recovering Θ.
• Our constrained MLE based on binary observations achieves the same degree of accuracy as if it were given access to the completely unquantized measurements.
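The sign ambiguity in the noiseless case is easy to verify numerically. In the sketch below (the vector a is an arbitrary example), a and sign(a) produce the same binary tensor without noise, while under a probit model with σ = 1 (an assumed value) their success probabilities differ; this is why noise makes the magnitude identifiable.

```python
import math
import numpy as np

def rank1(a):
    """Theta = a (outer) a (outer) a."""
    return np.einsum('i,j,k->ijk', a, a, a)

a = np.array([0.3, -2.0, 5.0, -0.01])
s = np.sign(a)

# Noiseless model: Y_ijk = 1{theta_ijk > 0}.
Y_a = (rank1(a) > 0).astype(int)
Y_s = (rank1(s) > 0).astype(int)

# With probit noise (sigma = 1), P(Y_ijk = 1) = Phi(theta_ijk) depends on the magnitude of a.
Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
prob_gap = np.abs(Phi(rank1(a)) - Phi(rank1(s))).max()
```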
Algorithm
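Alternating optimization
$$\max_{\Theta = [\![\theta_{ijk}]\!]} \mathcal{L}_{\mathcal{Y}}(\Theta) = \sum_{ijk} \log \pi(q_{ijk}\theta_{ijk}), \quad \text{subject to } \Theta = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r,\ \|\Theta\|_\infty \le \alpha,$$
where $q_{ijk} = 2Y_{ijk} - 1$ takes values −1 or 1 and π(·) is the link function. (This compact form uses $1 - \pi(\theta) = \pi(-\theta)$, which holds for the logistic and probit links.)
• Denote $A = [a_1, \ldots, a_R]$, $B = [b_1, \ldots, b_R]$, and $C = [c_1, \ldots, c_R]$.
• The objective (negative log-likelihood) is multi-convex in A, B, and C.
• Update $A^{t+1} \leftarrow \arg\max_A \mathcal{L}_{\mathcal{Y}}(A; B^t, C^t)$; this subproblem is solved by a GLM.
• Modify $A^{t+1} \leftarrow \gamma A^t + (1-\gamma) A^{t+1}$ subject to the infinity-norm constraint; this is a 1-dimensional optimization over $\gamma \in [0, 1]$.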
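One sweep of the A-update can be sketched as follows, assuming the logistic link with σ = 1. For fixed B and C, $\theta_{ijk} = \sum_r A_{ir}B_{jr}C_{kr}$ is linear in the i-th row of A, so each row is a logistic regression whose design rows are the elementwise products $B_{j\cdot} \ast C_{k\cdot}$; here it is fitted with a few Newton steps plus a tiny ridge term (an assumption for numerical stability). The γ step is a grid search over [0, 1] among feasible points; because γ = 1 returns the previous iterate, the log-likelihood cannot decrease when the previous iterate is feasible. Names and defaults are hypothetical.

```python
import numpy as np

def loglik(Y, Theta):
    """Logistic log-likelihood sum_ijk log pi(q_ijk * theta_ijk), q = 2Y - 1."""
    return float(-np.sum(np.logaddexp(0.0, -(2 * Y - 1) * Theta)))

def update_A(Y, A, B, C, alpha, newton_steps=5, ridge=1e-3, n_grid=21):
    R = A.shape[1]
    # Row (j, k) of X is B[j] * C[k], so Theta[i, j, k] = X[(j, k)] @ A[i].
    X = np.einsum('jr,kr->jkr', B, C).reshape(-1, R)
    A_glm = A.copy()
    for i in range(A.shape[0]):
        y, a = Y[i].reshape(-1), A_glm[i].copy()
        for _ in range(newton_steps):
            p = 0.5 * (1.0 + np.tanh(0.5 * (X @ a)))  # numerically stable sigmoid
            grad = X.T @ (y - p) - ridge * a
            hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(R)
            a = a + np.linalg.solve(hess, grad)
        A_glm[i] = a
    # Line search over gamma: A(gamma) = gamma * A_old + (1 - gamma) * A_glm.
    best_A, best_val = A, -np.inf
    for gamma in np.linspace(0.0, 1.0, n_grid):
        A_g = gamma * A + (1.0 - gamma) * A_glm
        Theta = np.einsum('ir,jr,kr->ijk', A_g, B, C)
        if np.max(np.abs(Theta)) <= alpha:  # infinity-norm constraint
            val = loglik(Y, Theta)
            if val > best_val:
                best_A, best_val = A_g, val
    return best_A

rng = np.random.default_rng(0)
d1, d2, d3, R, alpha = 6, 5, 4, 2, 10.0
Theta_true = np.einsum('ir,jr,kr->ijk', *(rng.standard_normal((d, R)) for d in (d1, d2, d3)))
Y = (rng.random(Theta_true.shape) < 1.0 / (1.0 + np.exp(-Theta_true))).astype(int)
A, B, C = (0.3 * rng.standard_normal((d, R)) for d in (d1, d2, d3))
before = loglik(Y, np.einsum('ir,jr,kr->ijk', A, B, C))
A_next = update_A(Y, A, B, C, alpha)
after = loglik(Y, np.einsum('ir,jr,kr->ijk', A_next, B, C))
```

The B- and C-updates are symmetric (swap the roles of the modes), and alternating the three gives one iteration of the scheme.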
Empirical performance
Interplay between statistical efficiency and computational cost
Let $M^{(t)} = [A^{(t)}, B^{(t)}, C^{(t)}]$ denote the sequence of estimators generated by the alternating GLM algorithm. Under certain conditions∗ on the initial point $M^{(0)}$ and the true point $M_{\text{true}}$, we have, with very high probability,
$$\text{Loss}\big(\Theta(M^{(t)}), \Theta_{\text{true}}\big) \le \underbrace{C_1 \rho^t \varepsilon}_{\text{algorithmic error}} + \underbrace{C_2 \frac{L_\alpha}{\gamma_\alpha} \sqrt{\frac{R^{K-1} \sum_k d_k}{\prod_k d_k}}}_{\text{statistical error}},$$
where ρ ∈ (0, 1) is a contraction parameter and $C_1, C_2 > 0$ are two constants.
Stopping criterion:
$$t \ge T \asymp \log_{1/\rho}\big( d^{(K-1)/2} \big).$$
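The stopping rule follows from balancing the two error terms: with $d_k = d$, the statistical error scales as $d^{-(K-1)/2}$, and $\rho^t \le d^{-(K-1)/2}$ holds exactly when $t \ge \log_{1/\rho}(d^{(K-1)/2})$. The small sketch below ignores the constants $C_1$, $C_2$, ε, and the R-dependence (an assumption), so it is only a rough guide.

```python
import math

def iterations_needed(d, K, rho):
    """Smallest integer T with rho**T <= d**(-(K - 1) / 2), i.e. T >= log_{1/rho}(d^{(K-1)/2})."""
    return math.ceil(((K - 1) / 2) * math.log(d) / math.log(1.0 / rho))

T = iterations_needed(d=100, K=3, rho=0.5)
```

With d = 100, K = 3, and ρ = 0.5, seven iterations bring the algorithmic factor $0.5^7 \approx 0.0078$ below $d^{-1} = 0.01$.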
A long but fun journey
The spectral analysis of higher-order tensors is much more delicate than matrix analysis.
• Norm landscape over all possible tensor unfoldings (W. et al., LAA'17)
• Perturbation analysis of the higher-order SVD algorithm (W. and Song, AISTATS'17)
• Exponential-valued tensor analysis (W. and Li '19; Zeng and W. '19)
• Regularity on decomposition algorithms (W. et al., AOAS'19)
Simulation
• Rank selection via BIC:

True rank | σ = 0.1, d = 20 | σ = 0.1, d = 40 | σ = 0.1, d = 60 | σ = 0.01, d = 20 | σ = 0.01, d = 40 | σ = 0.01, d = 60
R = 10 | 8.7 (0.9) | 10 (0) | 10 (0) | 8.8 (0.4) | 10 (0) | 10 (0)
R = 40 | 36.8 (1.1) | 39.6 (1.7) | 40.2 (0.4) | 36.0 (1.2) | 38.8 (1.6) | 40.3 (1.1)

• Comparison with alternative methods:
[Figure: comparison with BooleanTF, BTF Bayesian, and BTF Gradient]
BooleanTF (Boolean tensor factorization; Miettinen 2011, Rukat et al. 2018); BTF Bayesian (Rai et al. 2014); BTF Gradient (Hong et al. 2018)
Experiments with real data
• Nations (Nickel et al., 2011). 14 × 14 × 56 binary tensor consisting of 56 relations among 14 countries.
• Human connectome project (HCP) (Wang et al., 2017a). 68 × 68 × 212 binary tensor consisting of structural connectivity patterns between 68 brain regions among 212 individuals.
• Enron (Zhe et al., 2016). 581 × 124 × 48 binary tensor depicting three-way relationships (sender, receiver, time) from the Enron email dataset.
• Kinship (Nickel et al., 2011). 104 × 104 × 26 binary tensor consisting of 26 types of relations among 104 individuals.
Political relationships in 1950-1965
• 56 types of relationships among 14 countries between 1950 and 1965.
• Multi-relational data can be organized as a tensor $\mathcal{Y} \in \{0,1\}^{14\times 14\times 56}$. A tensor entry is 1 if the relation holds between the two countries.
• Relations: “sends tourists to”, “exports books to”, “joint membership of NGOs”, “conferences”, etc.
• Countries: USSR, Poland, China, UK, Brazil, etc.
Estimates from binary tensor decomposition
Human connectome project (HCP)
• Structural connectivity patterns among 68 brain regions for 212 individuals from HCP.
• Brain images were parcellated into 68 regions of interest following the Desikan atlas (Desikan et al., 2006).
• Strong spatial patterns in the brain connectivity, e.g., edges captured by tensor component 2 are located within the cerebral hemisphere.
• Nodes with high connectivity intensity: superior frontal gyrus (sensory system), corpus callosum (commissural tract).
[Figure: tensor component 6]
Comparison with continuous-valued tensor decomposition
• Binary tensor decomposition achieves better prediction performance than classical (continuous-valued) CP tensor decomposition, with higher AUC on all four datasets.
• Link prediction from 5-fold cross-validation:

Dataset | Non-zeros | Binary (logistic link): AUC | Binary (logistic link): MSE | Continuous-valued: AUC | Continuous-valued: MSE
HCP | 35.3% | 0.9860 | $1.3\times10^{-3}$ | 0.9314 | $1.4\times10^{-2}$
Nations | 21.1% | 0.9169 | $1.1\times10^{-2}$ | 0.8619 | $2.2\times10^{-2}$
Kinship | 3.8% | 0.9708 | $1.2\times10^{-4}$ | 0.9436 | $1.4\times10^{-3}$
Enron | 0.01% | 0.9432 | $6.4\times10^{-3}$ | 0.7956 | $6.3\times10^{-5}$

AUC: area under the receiver operating characteristic (ROC) curve; MSE: mean squared error.
Outline
• Application of tensor decomposition to genetics
• Low-rank tensor estimation from binary data
• Multiway clustering via tensor block models
Tensor block model
• In many applications, the data tensors are expected to have an underlying block structure.
• Examples of the tensor block model (TBM):
[Figure: input and output examples for (a) and (b)]
Figure: (a) Our TBM method is used for multiway clustering and for revealing the underlying checkerbox structure in a noisy tensor. (b) The sparse TBM method is used for detecting sub-tensors of elevated means.
Related work:
• Low-rank estimation: block structure implies low-rankness, but directly applying low-rank estimation to a block tensor yields an inferior estimator.
• Multiway clustering: typically two steps, first estimating a low-rank representation and then performing clustering on the resulting factors.
Tensor block model
• Tensor block model on $\mathcal{Y} = [\![y_{i_1,\ldots,i_K}]\!] \in \mathbb{R}^{d_1\times\cdots\times d_K}$:
• Suppose the tensor entry $y_{i_1,\ldots,i_K}$ belongs to the block determined by the $r_k$-th cluster in mode k, for $r_k \in [R_k]$; then
$$y_{i_1,\ldots,i_K} = c_{r_1,\ldots,r_K} + \varepsilon_{i_1,\ldots,i_K}, \quad \text{for } (i_1,\ldots,i_K) \in [d_1]\times\cdots\times[d_K].$$
• This can be viewed as a super-sparse Tucker model:
$$\mathcal{Y} = \mathcal{C} \times_1 M_1 \times_2 \cdots \times_K M_K + \mathcal{E},$$
where $\mathcal{C} \in \mathbb{R}^{R_1\times\cdots\times R_K}$ is a core tensor of block means, $M_k \in \{0,1\}^{d_k\times R_k}$ is the membership matrix for mode k (each row contains a single 1), and $\mathcal{E}$ is a sub-Gaussian(σ) noise tensor.
• Special cases: Gaussian tensor block model (real tensors), stochastic tensor block model (binary tensors, σ = 1/4).
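The super-sparse Tucker form can be simulated directly. In the NumPy sketch below (order 3; the labels, core values, and σ are arbitrary choices), each $M_k$ is the one-hot $d_k \times R_k$ membership matrix built from a label vector, so the noiseless signal at entry (i, j, k) equals the block mean $c_{z_1(i), z_2(j), z_3(k)}$.

```python
import numpy as np

def membership(labels, R):
    """d x R one-hot membership matrix: M[i, r] = 1 iff labels[i] == r."""
    return np.eye(R, dtype=int)[labels]

def tbm_signal(core, labels):
    """Theta = C x_1 M_1 x_2 M_2 x_3 M_3 for an order-3 core tensor."""
    M1, M2, M3 = (membership(z, R) for z, R in zip(labels, core.shape))
    return np.einsum('abc,ia,jb,kc->ijk', core, M1, M2, M3)

core = np.arange(8.0).reshape(2, 2, 2)  # block means c_{r1 r2 r3}
labels = (np.array([0, 1, 1]), np.array([1, 0]), np.array([0, 0, 1, 1]))
Theta = tbm_signal(core, labels)

rng = np.random.default_rng(0)
sigma = 0.5
Y = Theta + sigma * rng.standard_normal(Theta.shape)  # Gaussian tensor block model
```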
Identifiability
Irreducible core
The core tensor $\mathcal{C}$ is irreducible if it cannot be written as a block tensor with fewer than $R_k$ mode-k clusters, for any $k \in [K]$.
• K = 2: $\mathcal{C}$ has no two identical rows and no two identical columns.
• General K: for each mode, none of the order-(K−1) fibers (slices) of $\mathcal{C}$ are identical.
• Irreducibility is a weaker assumption than full-rankness.
Identifiability
Under the irreducibility assumption, the factor matrices $M_k$ are identifiable up to permutations of cluster labels.
• Recall that Tucker and many other factor analyses suffer from rotational invariance.
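Irreducibility can be checked mechanically: for every mode, flatten the mode-k slices of the core and test whether any two coincide. The sketch below uses exact equality (a simplification; noisy estimates would need a tolerance) and includes a core that is irreducible yet rank-deficient, consistent with irreducibility being weaker than full-rankness.

```python
import numpy as np

def is_irreducible(core):
    """True iff, for every mode k, no two mode-k slices of the core are identical."""
    core = np.asarray(core)
    for k in range(core.ndim):
        slices = np.moveaxis(core, k, 0).reshape(core.shape[k], -1)
        if len(np.unique(slices, axis=0)) < core.shape[k]:
            return False
    return True

full_rank = np.array([[1.0, 2.0], [3.0, 4.0]])
same_rows = np.array([[1.0, 2.0], [1.0, 2.0]])
same_cols = np.array([[1.0, 1.0], [2.0, 2.0]])
rank_one = np.array([[1.0, 2.0], [2.0, 4.0]])  # distinct rows and columns, but rank 1
```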
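Least-squares estimation
• We propose the least-squares estimator:
$$\hat{\Theta} = \arg\min_{\Theta \in \mathcal{P}} \left\{ -2\langle \mathcal{Y}, \Theta \rangle + \|\Theta\|_F^2 \right\} = \arg\min_{\Theta \in \mathcal{P}} \|\mathcal{Y} - \Theta\|_F^2,$$
where the two forms agree because $\|\mathcal{Y}\|_F^2$ does not depend on Θ, and the parameter space $\mathcal{P}$ is
$$\mathcal{P} = \left\{ \Theta \in \mathbb{R}^{d_1\times\cdots\times d_K} : \Theta = \mathcal{C} \times_1 M_1 \times_2 \cdots \times_K M_K \text{ with some membership matrices } M_k \text{ and a core tensor } \mathcal{C} \in \mathbb{R}^{R_1\times\cdots\times R_K} \right\}.$$
• Algorithm: alternating optimization with suitable initialization.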
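One common way to carry out the alternating optimization for an order-3 block model is sketched below: given the labels, the least-squares core is the block-wise average; given the core and the labels of the other two modes, each mode-k index moves to the cluster whose core slice fits it best. Both steps minimize the objective over one group of variables, so the objective is non-increasing. This is an illustrative sketch (random initialization, fixed number of sweeps, hypothetical names), not the talk's implementation, and it does not address the choice of a suitable initialization.

```python
import numpy as np

def block_means(Y, z, R):
    """Least-squares core given labels: block-wise averages (0 for empty blocks)."""
    M = [np.eye(r)[lab] for lab, r in zip(z, R)]
    sums = np.einsum('ijk,ia,jb,kc->abc', Y, *M)
    counts = np.einsum('ia,jb,kc->abc', *M)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def objective(Y, core, z):
    """||Y - Theta||_F^2 with Theta[i, j, k] = core[z1[i], z2[j], z3[k]]."""
    return float(np.sum((Y - core[np.ix_(*z)]) ** 2))

def reassign(Y, core, z, k):
    """Relabel mode-k indices to the best-fitting core slice, other labels fixed."""
    Yk, Ck = np.moveaxis(Y, k, 0), np.moveaxis(core, k, 0)
    za, zb = [z[m] for m in range(3) if m != k]
    target = Ck[:, za][:, :, zb]  # R_k x d_a x d_b
    cost = ((Yk[:, None] - target[None]) ** 2).sum(axis=(2, 3))
    return cost.argmin(axis=1)

def tbm_alternating(Y, z, R, n_sweeps=10):
    z = [np.asarray(lab).copy() for lab in z]
    history = [objective(Y, block_means(Y, z, R), z)]
    for _ in range(n_sweeps):
        for k in range(3):
            z[k] = reassign(Y, block_means(Y, z, R), z, k)
            history.append(objective(Y, block_means(Y, z, R), z))
    return block_means(Y, z, R), z, history

rng = np.random.default_rng(0)
R = (2, 2, 2)
z_true = [np.repeat([0, 1], 4), np.repeat([0, 1], 3), np.array([0, 0, 1, 1, 1])]
core_true = rng.normal(0.0, 3.0, size=R)
Y = core_true[np.ix_(*z_true)] + 0.1 * rng.standard_normal((8, 6, 5))
z0 = [rng.integers(0, r, size=d) for r, d in zip(R, Y.shape)]
core_hat, z_hat, history = tbm_alternating(Y, z0, R)
```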
Convergence rate (Zeng and W., 2019)
Let $\hat{\Theta}$ be the least-squares estimator under the tensor block model. There exist two constants $C_1, C_2 > 0$ such that, w.h.p.,
$$\text{Loss}^2(\Theta_{\text{true}}, \hat{\Theta}) \le \frac{C_1 \sigma^2}{\prod_k d_k} \Big( \underbrace{\prod_k R_k}_{\text{estimating block means}} + \underbrace{\sum_k d_k \log R_k}_{\text{estimating block allocations}} \Big).$$
In the special case $d = d_1 = \cdots = d_K$:

Our rate | Tucker rate (Zhang & Xia '18) | Lasso rate (Chi et al. '18)
O(d log R) | O(dR) | $O(d^{K-1})$

Partition consistency (Zeng and W., 2019)
Under a mild separation assumption∗ on the block means, there exist permutation matrices $P_k$ such that the misclassification rate (MCR) satisfies
$$\sum_k \text{MCR}(\hat{M}_k, P_k M_k^{\text{true}}) \to 0 \quad \text{in probability}.$$
∗Block-mean gap $= \frac{1}{\|\mathcal{C}\|_F} \min_{r_k \neq r_k',\, I_{-k}} \big| \text{fiber}(\mathcal{C}_{r_k, I_{-k}}) - \text{fiber}(\mathcal{C}_{r_k', I_{-k}}) \big| \ge O(d)$ for all $k \in [K]$.
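A misclassification rate that is invariant to relabeling can be computed by minimizing over all permutations of the estimated labels. The brute-force sketch below (hypothetical helper, one mode at a time) is fine for small $R_k$; for many clusters, an assignment solver such as SciPy's `linear_sum_assignment` is the usual alternative. The exact MCR definition used in the paper may differ; this is one natural choice.

```python
import itertools
import numpy as np

def mcr(est, true, R):
    """Fraction of indices whose estimated label disagrees with the true one,
    minimized over all relabelings (permutations) of the estimated clusters."""
    est, true = np.asarray(est), np.asarray(true)
    best = 1.0
    for perm in itertools.permutations(range(R)):
        relabeled = np.array(perm)[est]
        best = min(best, float(np.mean(relabeled != true)))
    return best
```

Summing `mcr` over the K modes gives the quantity whose convergence to zero is stated above.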
Extension to sparse estimation
• In many large-scale applications, not every block in a data tensor is of equal importance.
• For example, in genome-wide expression data analysis, only a few entries represent signals, while the majority come from background noise.
• We propose the regularized least-squares estimator:
$$\hat{\Theta}_{\text{sparse}} = \arg\min_{\Theta \in \mathcal{P}} \left\{ \|\mathcal{Y} - \Theta\|_F^2 + \lambda \|\mathcal{C}\|_\rho \right\},$$
where ρ = 1 (lasso), ρ = 0 (subset sparsity), or ρ = F (Frobenius; ridge), among many others.
• Sparse estimation requires only slight changes to the previous algorithm.
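For the lasso case (ρ = 1) with the memberships held fixed, the penalized problem separates across blocks: a block with n entries and average $\bar{y}$ minimizes $\sum (y - c)^2 + \lambda |c|$ at $c = \operatorname{sign}(\bar{y}) \max(|\bar{y}| - \lambda/(2n), 0)$, i.e., by soft-thresholding the block mean. This is one way the core update could change; below is a sketch under that assumption (order 3, hypothetical names).

```python
import numpy as np

def lasso_block_means(Y, z, R, lam):
    """Core update for ||Y - Theta||_F^2 + lam * sum |c| with labels z held fixed."""
    M = [np.eye(r)[lab] for lab, r in zip(z, R)]
    sums = np.einsum('ijk,ia,jb,kc->abc', Y, *M)
    counts = np.einsum('ia,jb,kc->abc', *M)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # Soft-threshold each block mean at lam / (2 * n_block); empty blocks stay 0.
    thresh = np.divide(lam, 2.0 * counts, out=np.zeros_like(counts), where=counts > 0)
    return np.sign(means) * np.maximum(np.abs(means) - thresh, 0.0)

Y = np.empty((2, 2, 2))
Y[0], Y[1] = 3.0, 0.1  # strong block and weak block, 4 entries each
z = (np.array([0, 1]), np.array([0, 0]), np.array([0, 0]))
core = lasso_block_means(Y, z, (2, 1, 1), lam=2.0)  # threshold = 2 / (2 * 4) = 0.25
```

The weak block (mean 0.1) is set exactly to zero, while the strong block is shrunk from 3.0 to 2.75.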
Simulation
• Simulate order-3 Gaussian block tensors with $d_1 \log R_1 \approx d_2 \log R_2 \approx d_3 \log R_3$.
• Performance is assessed by root mean squared error (RMSE):
Figure: (a) Average RMSE against $d_1$. (b) Average RMSE against the rescaled sample size $N = \sqrt{d_2 d_3 / \log R_1}$.
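RMSE coincides with the Loss used in the convergence results: since $\|\Theta_1 - \Theta_2\|_F^2 / \prod_k d_k$ is the mean of the squared entrywise differences, Loss = RMSE. A short NumPy check (random tensors only for illustration):

```python
import numpy as np

def loss(T1, T2):
    """Loss(T1, T2) = ||T1 - T2||_F / sqrt(prod_k d_k)."""
    return np.linalg.norm((T1 - T2).ravel()) / np.sqrt(T1.size)

def rmse(T1, T2):
    return np.sqrt(np.mean((T1 - T2) ** 2))

rng = np.random.default_rng(0)
T1, T2 = rng.standard_normal((2, 4, 5, 6))
```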
Comparison with alternative methods
We compare our tensor block model (TBM) with two popular low-rank tensor estimation methods: (i) CP decomposition and (ii) Tucker decomposition.
• Non-sparse case
[Figure: TBM compared with CP and Tucker decompositions, non-sparse case]
• Sparse case

Sparsity (ρ) | Noise (σ) | Penalization (λ) | Estimated sparsity rate | Correct zero rate | Sparsity error rate
0.5 | 4 | 0 | 0 (0) | 0 (0) | 0.49 (0.03)
0.5 | 4 | 136.4 | 0.56 (0.04) | 0.99 (0.02) | 0.06 (0.03)
0.5 | 8 | 0 | 0 (0) | 0 (0) | 0.49 (0.03)
0.5 | 8 | 439.7 | 0.59 (0.05) | 0.99 (0.01) | 0.14 (0.06)
0.8 | 8 | 0 | 0 (0) | 0 (0) | 0.80 (0.05)
0.8 | 8 | 241.3 | 0.83 (0.06) | 0.95 (0.04) | 0.12 (0.06)
Conclusions and future work
• We have developed a framework of statistical models, scalable algorithms, and statistical theory to analyze tensor-valued data.
• The general strategy is to carve out a broad range of specially structured tensors that are useful in practice, and to develop efficient algorithms for analyzing these high-dimensional tensor data.
• Other structures are also possible, e.g., non-negativity (W. et al., AOAS 2019, in press) and smoothness (ongoing).
• Extend the modeling to second-order structure in tensors, e.g., the tensor normal model with factorized covariance
$$\mathcal{E} \sim N(0, \Phi_1 \otimes \Phi_2 \otimes \Phi_3).$$
This is the tensor analogue of matrix normal models (W. et al., PNAS 2018).
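A draw from the tensor normal model can be generated by multiplying a standard normal tensor by Cholesky factors along each mode, $\mathcal{E} = \mathcal{Z} \times_1 L_1 \times_2 L_2 \times_3 L_3$ with $\Phi_k = L_k L_k^T$. Under NumPy's row-major flattening, vec(E) = (L_1 ⊗ L_2 ⊗ L_3) vec(Z), so by the mixed-product property its covariance is Φ_1 ⊗ Φ_2 ⊗ Φ_3. The covariance matrices below are arbitrary positive-definite examples, and the helper names are hypothetical.

```python
import numpy as np

def mode_multiply(Z, L1, L2, L3):
    """Z x_1 L1 x_2 L2 x_3 L3 for an order-3 tensor Z."""
    return np.einsum('abc,ia,jb,kc->ijk', Z, L1, L2, L3)

def sample_tensor_normal(Phi1, Phi2, Phi3, rng):
    """One draw of E ~ N(0, Phi1 (x) Phi2 (x) Phi3) (row-major vectorization)."""
    L1, L2, L3 = (np.linalg.cholesky(P) for P in (Phi1, Phi2, Phi3))
    Z = rng.standard_normal((len(Phi1), len(Phi2), len(Phi3)))
    return mode_multiply(Z, L1, L2, L3)

Phi1 = np.array([[2.0, 0.5], [0.5, 1.0]])
Phi2 = np.array([[1.0, 0.3], [0.3, 1.0]])
Phi3 = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 1.0]])
rng = np.random.default_rng(0)
E = sample_tensor_normal(Phi1, Phi2, Phi3, rng)

# Check the vectorization identity used above.
L1, L2, L3 = (np.linalg.cholesky(P) for P in (Phi1, Phi2, Phi3))
K = np.kron(np.kron(L1, L2), L3)
Z = rng.standard_normal((2, 2, 3))
```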
References:
• Yuchen Zeng and M. Wang. Multiway clustering via tensor block model. Under review in NIPS (2019).
• M. Wang and L. Li. Learning from binary multiway data: probabilistic tensor decomposition and its statistical optimality. Under review in JMLR (2019).
• M. Wang, J. Fischer, and Y. S. Song. Three-way clustering of multi-tissue multi-individual gene expression data using semi-nonnegative tensor decomposition. Annals of Applied Statistics (2019), Vol. 13, No. 2, 1124-1148.
• M. Wang et al. Two-way mixed-effects methods for joint association analyses using both host and pathogen genomes. PNAS (2018), Vol. 115 (24), E5440-E5449.
• M. Wang and Y. S. Song. Tensor decomposition via two-mode higher-order SVD (HOSVD). Journal of Machine Learning Research W&CP (AISTATS track) (2017), Vol. 54, 614-622.
• M. Wang, K. Dao Duc, J. Fischer, and Y. S. Song. Operator norm inequalities between tensor unfoldings on the partition lattice. Linear Algebra and its Applications (2017), Vol. 520, 44-66.
Thank you!