Tensors and graphical models
Mariya Ishteva with Haesun Park, Le Song
Dept. ELEC, VUB / Georgia Tech, USA
INMA Seminar, May 7, 2013, LLN
Outline
Tensors
Random variables and graphical models
Tractable representations
Structure learning
Tensors
A third-order tensor: A ∈ R^{M×N×P}
Ranks
• Multilinear rank (R1,R2,R3)
• Rank-R
Rank-1 tensor: outer product of three vectors
R = min r  such that  A = Σ_{i=1}^{r} {rank-1 tensor}_i
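As a concrete illustration (a NumPy sketch, not from the slides; the vectors are arbitrary examples), a rank-1 tensor is the outer product of three vectors:

```python
import numpy as np

# A rank-1 tensor: A(i,j,k) = a(i) * b(j) * c(k)
a, b, c = np.arange(2.0), np.arange(3.0), np.arange(4.0)
A = np.einsum('i,j,k->ijk', a, b, c)

print(A.shape)  # (2, 3, 4)
assert A[1, 2, 3] == a[1] * b[2] * c[3]  # 1 * 2 * 3 = 6
```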
Matrix representations of tensors
• Mode-1 unfolding: A(1)
• Mode-2 unfolding: A(2)
• Mode-3 unfolding: A(3)
• Multilinear rank: (rank(A(1)), rank(A(2)), rank(A(3)))
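The multilinear rank can be read off numerically from the three unfoldings. A NumPy sketch with hypothetical sizes (a 3×4×5 tensor built from a random 2×2×2 core, so its multilinear rank is (2, 2, 2) for generic factors):

```python
import numpy as np

rng = np.random.default_rng(0)
core = rng.standard_normal((2, 2, 2))
U1 = rng.standard_normal((3, 2))
U2 = rng.standard_normal((4, 2))
U3 = rng.standard_normal((5, 2))
A = np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)

def unfold(T, mode):
    """Mode-n matricization A(n): rows indexed by mode n."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

mlrank = tuple(int(np.linalg.matrix_rank(unfold(A, m))) for m in range(3))
print(mlrank)  # (2, 2, 2) for generic random factors
```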
Tensor-matrix multiplication
• Tensor-matrix product
• Contraction: for A ∈ R^{I×J×M} and B ∈ R^{K×L×M},
  C = ⟨A, B⟩_3,   C(i, j, k, l) = Σ_{m=1}^{M} a_{ijm} b_{klm}
  giving a 4th-order tensor C ∈ R^{I×J×K×L}
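The contraction above maps directly onto an einsum. A NumPy sketch with hypothetical dimensions:

```python
import numpy as np

# C(i,j,k,l) = sum_m A(i,j,m) * B(k,l,m): contract the shared third mode
rng = np.random.default_rng(1)
I, J, K, L, M = 2, 3, 4, 5, 6
A = rng.standard_normal((I, J, M))
B = rng.standard_normal((K, L, M))

C = np.einsum('ijm,klm->ijkl', A, B)
print(C.shape)  # (2, 3, 4, 5)

# Check one entry against the definition
assert np.isclose(C[1, 2, 3, 4], np.dot(A[1, 2, :], B[3, 4, :]))
```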
Basic decompositions
Singular value decomposition (SVD)
MLSVD / HOSVD
CP / CANDECOMP / PARAFAC
Outline
Tensors
Random variables and graphical models
Tractable representations
Structure learning
Discrete random variables
• Random variable
• Random variable X taking values 1, . . . , n with probabilities
  PX(1), . . . , PX(n); the vector PX lives in R^n, with entries in [0, 1]
• Two variables: X1, X2 with joint P(X1, X2), stored as a table P12 ∈ R^{n×n}
  with entries P12(1, 1), . . . , P12(n, n)
• Notation: P(x1, x2) := P(X1 = x1, X2 = x2)
2 random variables
X1, X2; P(X1, X2), P12 ∈ R^{n×n}
• X1 ⊥ X2 (independent):
  P(x1, x2) = P(x1) P(x2)   −→  P12 is a rank-1 matrix
• Hidden variable H with children X1 and X2:
  P(x1, x2) = Σ_h P(x1|h) P(x2|h) P(h)   −→  P12 is a low-rank matrix (rank k < n)
• Conditional probability tables (CPTs): P(X1|H), P(X2|H)
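The low-rank structure is easy to check numerically. A NumPy sketch with hypothetical sizes n = 10, k = 3 and random CPTs:

```python
import numpy as np

# P12(x1,x2) = sum_h P(x1|h) P(x2|h) P(h) = P1H diag(Ph) P2H^T has rank <= k
rng = np.random.default_rng(2)
n, k = 10, 3

def random_cpt(rows, cols, rng):
    """Random conditional probability table: columns sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0)

P1H = random_cpt(n, k, rng)          # P(X1 | H)
P2H = random_cpt(n, k, rng)          # P(X2 | H)
Ph = rng.random(k); Ph /= Ph.sum()   # P(H)

P12 = P1H @ np.diag(Ph) @ P2H.T
print(np.isclose(P12.sum(), 1.0), np.linalg.matrix_rank(P12))  # True 3
```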
3 random variables
X1, X2, X3; P(X1, X2, X3), P123 ∈ R^{n×n×n}
• X1, X2, X3 independent:
  P(x1, x2, x3) = P(x1) P(x2) P(x3)   −→  rank-1 tensor
• Hidden variable H with children X1, X2, X3:
  P(x1, x2, x3) = Σ_h P(x1|h) P(x2|h) P(x3|h) P(h)   −→  rank-k tensor, k < n
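The same construction in NumPy (a sketch with hypothetical sizes n = 5, k = 2): the joint table is a CP tensor with k terms, so each unfolding has rank at most k.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2

def random_cpt(rows, cols, rng):
    """Random conditional probability table: columns sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0)

P1H = random_cpt(n, k, rng)
P2H = random_cpt(n, k, rng)
P3H = random_cpt(n, k, rng)
Ph = rng.random(k); Ph /= Ph.sum()

# P(x1,x2,x3) = sum_h P(x1|h) P(x2|h) P(x3|h) P(h)
P123 = np.einsum('ah,bh,ch,h->abc', P1H, P2H, P3H, Ph)
print(P123.shape)                                   # (5, 5, 5)
print(np.linalg.matrix_rank(P123.reshape(n, -1)))   # 2: rank of the unfolding is <= k
```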
4 random variables
• X1, X2, X3, X4; P(X1, X2, X3, X4), P1234 ∈ R^{n×n×n×n}
• X1, X2, X3, X4 independent −→ rank-1 tensor
• Hidden variable H with children X1, X2, X3, X4:
  P(x1, x2, x3, x4) = Σ_h P(x1|h) P(x2|h) P(x3|h) P(x4|h) P(h)
• The same pattern extends to more variables and more hidden variables
Challenges
• 10 variables, 10 states each −→ 10^10 entries
• We need tractable representations
  • Latent variable models / low-rank factors
  • # parameters: exponential −→ polynomial
[Figure: a latent tree model over the observed variables]
• Challenges:
  • Choose a good representation ✓
  • Learn the correct structure ✓
  • Estimate the parameters ✗
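The exponential-to-polynomial reduction is simple arithmetic. A sketch with the hypothetical sizes above (d = 10 variables, n = 10 states, a naive Bayes model with k = 10 hidden states):

```python
# Parameter counts for d discrete variables with n states each
n, d, k = 10, 10, 10

full_joint = n ** d                   # explicit joint table: exponential in d
latent = (k - 1) + d * k * (n - 1)    # P(H) plus d CPTs P(Xi | H)

print(full_joint)  # 10000000000
print(latent)      # 909
```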
Outline
Tensors
Random variables and graphical models
Tractable representations
Structure learning
Tensors and graphical models
• CP / CANDECOMP / PARAFAC ↔ naive Bayes model: hidden H with children X1, X2, . . . , Xn
• Tensor train ↔ hidden Markov model (HMM): hidden chain H1, H2, . . . , Hn emitting X1, X2, . . . , Xn
• Hierarchical Tucker ↔ latent tree model
• Tucker / MLSVD, block term decomposition ↔ ✗
Tensor train (TT) decomposition
A(i1, . . . , id) = Σ_{α0,...,αd} G1(α0, i1, α1) G2(α1, i2, α2) · · · Gd(αd−1, id, αd)
[I. V. Oseledets, SIAM J. Scientific Computing, 2011]
• Avoids the curse of dimensionality
• Small number of parameters, compared to the Tucker model
• Slightly more parameters than CP, but more stable
• Gk(αk−1, ik, αk) has dimensions rk−1 × nk × rk, with r0 = rd = 1
• The rk are called compression ranks:
  Ak = Ak(i1, . . . , ik ; ik+1, . . . , id),  rank(Ak) = rk
• Computation based on the SVD, proceeding top → bottom
[Figure: hidden chain H1 − H2 − H3 − · · · − Hn emitting X1, X2, X3, . . . , Xn]
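A NumPy sketch of the TT format (random cores with hypothetical sizes; `tt_entry` and `tt_full` are illustrative helper names, not from the paper). An entry A(i1, . . . , id) is a chain of small matrix products:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 4, 3
r = [1, 2, 2, 2, 1]  # compression ranks, r0 = rd = 1
# TT cores G_k of size r_{k-1} x n x r_k
cores = [rng.standard_normal((r[k], n, r[k + 1])) for k in range(d)]

def tt_entry(cores, idx):
    """A(i1,...,id) as a product of r_{k-1} x r_k matrix slices."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]
    return float(v[0, 0])

def tt_full(cores):
    """Expand to the full tensor (exponential cost; for checking only)."""
    T = cores[0]
    for G in cores[1:]:
        T = np.einsum('...a,aib->...ib', T, G)
    return T.reshape(T.shape[1:-1])

A = tt_full(cores)
print(A.shape)  # (3, 3, 3, 3)
assert np.isclose(tt_entry(cores, (0, 1, 2, 1)), A[0, 1, 2, 1])
```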
Hierarchical Tucker decomposition
[L. Grasedyck, SIMAX, 2010]
• Similar properties as TT decomposition• Computation: bottom → top
[Figure: latent tree model with hidden internal nodes and observed leaves]
Potential advantages of tensor approach
• Real data are often multi-way
• Provides higher-level view
• Flexibility: different ranks in each mode: Tucker
• Uniqueness: CP, Block term decomposition
• No curse of dimensionality: tensor train, hierarchical Tucker
Outline
Tensors
Random variables and graphical models
Tractable representations
Structure learning
Structure learning
• Given: (samples of) observed variables
• Assumption: the variables can be connected via hidden variables in a tree structure in a meaningful way
• Find: the tree / the relationships between the variables
• Additional difficulty: unknown number of hidden states
[Figure: several candidate latent tree structures over the observed variables; which one is correct?]
Quartet relationships: topologies
The three possible topologies over hidden variables H and G:
• {X1, X2 | X3, X4}
• {X1, X3 | X2, X4}
• {X1, X4 | X2, X3}
For the first topology:
P(x1, x2, x3, x4) = Σ_{h,g} P(x1|h) P(x2|h) P(h, g) P(x3|g) P(x4|g)
Building trees based on quartet relationships
Choose 3 variables and form a tree
Add all other variables, one by one:
• Split the current tree into 3 subtrees
• Choose 3 variables from different subtrees
• Resolve the quartet relation with the current and chosen variables
• Insert the current variable in a subtree or connect it to the tree
[For simplicity, assume each latent variable has 3 neighbors]
Tensor view of quartets
For the topology {X1, X2 | X3, X4} with hidden variables H and G:
[Figure: tensor network for P(X1, X2, X3, X4) with factors P1|H, P2|H, IH, PHG, IG, P4|G, P3|G]
The three matrix unfoldings (MATLAB notation):
A = reshape(P, n^2, n^2);
B = reshape(permute(P, [1,3,2,4]), n^2, n^2);
C = reshape(permute(P, [1,4,2,3]), n^2, n^2);
Notation: P1|H, P2|H, etc. stand for P(X1|H), P(X2|H), etc.
Rank properties of matrix representations
A = (P2|H ⊙ P1|H) PHG (P4|G ⊙ P3|G)^T
B = (P3|G ⊗ P1|H) diag(PHG(:)) (P4|G ⊗ P2|H)^T
(⊙: Khatri–Rao, i.e. column-wise Kronecker product; ⊗: Kronecker product)
• rank(A) = rank(PHG) = k,  rank(B) = rank(C) = nnz(PHG)
  ⇒ rank(A) ≪ rank(B) = rank(C)
• Sampling noise −→ nuclear norm relaxation:
  ‖A‖∗ = Σ_{i=1}^{n^2} σi(A)
Resolving quartet relations
Algorithm 1: i∗ = Quartet(X1, X2, X3, X4)
1: Estimate P(X1, X2, X3, X4) from a set of m i.i.d. samples.
2: Unfold P into matrices A, B and C; compute a1 = ‖A‖∗, a2 = ‖B‖∗ and a3 = ‖C‖∗.
3: Return i∗ = arg min_{i ∈ {1,2,3}} ai.
• Easy to compute
• Recovery conditions
• Finite sample guarantees
• Agnostic to the number of hidden states
• Compares favorably to alternatives
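Algorithm 1 is short enough to sketch in NumPy, mirroring the MATLAB reshape/permute unfoldings from the earlier slide (`quartet` and the synthetic check are illustrative, not the authors' code):

```python
import numpy as np

def quartet(P):
    """Return the index (1, 2, or 3) of the pairing whose unfolding has the
    smallest nuclear norm. P is an (estimated) joint table of shape (n,n,n,n)."""
    n = P.shape[0]
    A = P.reshape(n * n, n * n)                        # pairs {X1,X2 | X3,X4}
    B = P.transpose(0, 2, 1, 3).reshape(n * n, n * n)  # pairs {X1,X3 | X2,X4}
    C = P.transpose(0, 3, 1, 2).reshape(n * n, n * n)  # pairs {X1,X4 | X2,X3}
    norms = [np.linalg.norm(M, 'nuc') for M in (A, B, C)]
    return int(np.argmin(norms)) + 1

# Noise-free synthetic quartet with true topology {X1,X2 | X3,X4}
rng = np.random.default_rng(5)
n, k = 4, 2

def cpt(rows, cols):
    M = rng.random((rows, cols))
    return M / M.sum(axis=0)

P1, P2, P3, P4 = cpt(n, k), cpt(n, k), cpt(n, k), cpt(n, k)
PHG = rng.random((k, k)); PHG /= PHG.sum()
P = np.einsum('ah,bh,hg,cg,dg->abcd', P1, P2, PHG, P3, P4)
print(quartet(P))  # typically 1: the true pairing gives the lowest-rank unfolding
```

The rank gap driving the decision is visible directly: the unfolding for the true pairing has rank k = 2, while the other two have rank nnz(PHG) = 4.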
Example: stock data
Given: stock prices (25 years, discretized into 10 values)
Find: relations between stocks
Finance:
• C (Citigroup)
• JPM (JPMorgan Chase)
• AXP (American Express)
• F (Ford Motor: Automotive and Financial Services)
Retailers:
• TGT (Target)
• WMT (Walmart)
• RSH (RadioShack)
Conclusions
• Tensor decompositions are related to graphical models
• A common goal: tractable representations
• Tensors can be used for structure learning