Post on 16-May-2020
A central limit theorem for an omnibus embedding of random dot product graphs
Keith Levin¹
with Avanti Athreya², Minh Tang², Vince Lyzinski³ and Carey E. Priebe²
¹University of Michigan, ²Johns Hopkins University, ³University of Massachusetts Amherst
November 18, 2017
Classical two-sample hypothesis testing
Well-studied in statistics (indeed, the only thing we teach undergrads?)
K. Levin (U.Michigan,JHU,UMass) A CLT for omnibus embeddings November 18, 2017 2 / 20
Graph Hypothesis Testing
Q: how to tell if two (or more) graphs are from the same distribution?
Random Dot Product Graph (RDPG; Young and Scheinerman, 2007)
Extends the stochastic block model (SBM).
Vertices are assigned latent positions drawn i.i.d. from a $d$-dimensional distribution $F$.
$F$ is constrained so that $0 \le x^T y \le 1$ whenever $x, y \in \operatorname{supp} F$.
Denote the $i$-th latent position by $X_i \in \mathbb{R}^d$.
Edges $\{i, j\}$ are present or absent independently with probability $X_i^T X_j$.
Collect the latent positions in the rows of $X \in \mathbb{R}^{n \times d}$.
Warning: Non-identifiability. The model is specified only up to orthogonal rotation of the latent positions.
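The generative model above can be sketched in a few lines of NumPy. This is only an illustration: the function name `sample_rdpg`, the two-point distribution standing in for $F$, and the rotation angle are assumptions made for the example, not taken from the slides. The last lines check the non-identifiability warning numerically: rotating the latent positions leaves the edge probabilities unchanged.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix with P[A_ij = 1] = X_i^T X_j."""
    P = X @ X.T
    if P.min() < 0 or P.max() > 1:
        raise ValueError("latent positions must satisfy 0 <= x^T y <= 1")
    n = P.shape[0]
    # Draw each edge {i, j} with i < j independently, then symmetrize.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(float)

rng = np.random.default_rng(0)
# Assumed example F: a two-point mixture in R^2 (equivalent to a two-block SBM).
points = np.array([[0.8, 0.2], [0.3, 0.6]])
X = points[rng.integers(0, 2, size=200)]
A = sample_rdpg(X, rng)

# Non-identifiability: an orthogonal W applied to X gives the same matrix X X^T.
theta = 0.7
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
same_P = np.allclose(X @ X.T, (X @ W) @ (X @ W).T)
```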
Estimating latent positions: adjacency spectral embedding (Sussman et al., 2012)
Definition (Adjacency Spectral Embedding (ASE))
Given an adjacency matrix $A$ with eigendecomposition $A = USU^T$, embed the vertices into $\mathbb{R}^d$ as the rows of $\hat{X} = U_d S_d^{1/2} \in \mathbb{R}^{n \times d}$, where $U_d$ denotes the first $d$ columns of $U$ and $S_d$ denotes the truncation of $S$ to the top $d$ eigenvalues.
Under the RDPG, $\exists W$: $\max_{1 \le i \le n} \|\hat{X}_i - W X_i\| = O_P(n^{-1/2} \log n)$.
Lyzinski et al. (2014): ASE yields a.a.s. perfect recovery of block memberships in the SBM.
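A minimal sketch of the ASE definition, assuming "top $d$ eigenvalues" means the $d$ algebraically largest ones and clipping tiny negative values before taking the square root. The function name `ase` and the toy latent positions are illustrative assumptions. The check uses the noise-free matrix $P = XX^T$ in place of a sampled adjacency matrix, so the embedding should reproduce $P$ exactly (up to rotation of the rows).

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: rows of U_d S_d^{1/2}."""
    vals, vecs = np.linalg.eigh(A)        # symmetric input, ascending eigenvalues
    top = np.argsort(vals)[::-1][:d]      # indices of the d largest eigenvalues
    S_d = np.clip(vals[top], 0.0, None)   # guard against small negative values
    return vecs[:, top] * np.sqrt(S_d)    # scale column k by sqrt of eigenvalue k

# Toy latent positions (assumed for illustration only).
X = np.array([[0.8, 0.2], [0.3, 0.6], [0.5, 0.5], [0.1, 0.9]])
P = X @ X.T
X_hat = ase(P, d=2)

# X_hat equals X only up to an orthogonal W, so compare the Gram matrices.
recovers_P = np.allclose(X_hat @ X_hat.T, P)
```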
RDPG: what do we mean by same distribution?
Option 1: Test if latent positions are drawn from same distribution.
G1 positions drawn i.i.d. F1, G2 positions drawn i.i.d. F2
Test if F1 = F2
“Nonparametric” testing
Tang, Athreya, Sussman, Lyzinski and Priebe (2017)
Estimate latent positions of G1 and G2 via ASE, apply maximum mean discrepancy (Gretton et al, 2012) to ASE estimates.
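A rough sketch of the maximum mean discrepancy step, assuming a Gaussian kernel with a hand-picked bandwidth and the biased (V-statistic) estimator. The function name `mmd2_biased`, the bandwidth value and the toy point clouds are illustrative assumptions and not the procedure of the cited paper; in practice the two inputs would be the ASE estimates for G1 and G2.

```python
import numpy as np

def mmd2_biased(X, Y, bandwidth=0.5):
    """Biased estimate of the squared MMD between samples X and Y (rows = points)."""
    def gram(A, B):
        # Pairwise squared Euclidean distances, then a Gaussian kernel.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2 * bandwidth ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

rng = np.random.default_rng(1)
# Stand-ins for estimated latent positions (assumed toy data).
X_hat = rng.uniform(0.2, 0.4, size=(100, 2))
Y_same = X_hat.copy()
Y_shifted = X_hat + 0.3

mmd_same = mmd2_biased(X_hat, Y_same)        # identical samples give zero
mmd_shifted = mmd2_biased(X_hat, Y_shifted)  # shifted sample gives a positive value
```

A large value of the statistic is evidence against F1 = F2; how to calibrate "large" is not covered by this sketch.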
RDPG: what do we mean by same distribution?
Option 2: Test if latent positions are the same
G1 latent positions X ∈ Rn×d , G2 latent positions Y ∈ Rn×d
Test if X = YW for some unitary W .
“Semiparametric” testing
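Tang, Athreya, Sussman, Lyzinski and Priebe (2015)
Embed both graphs via ASE, align estimated positions via Procrustes analysis (Gower, 1975). Reject $H_0$ if alignment is poor, i.e., if $T_{\mathrm{Proc}} = \min_{W \in \mathcal{U}_d} \|\hat{X} - \hat{Y}W\|_F$ is large, where $\mathcal{U}_d$ is the set of $d \times d$ unitary matrices.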
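The Procrustes statistic has a closed-form minimizer via the singular value decomposition, sketched below. The function name `procrustes_stat` and the toy inputs are assumptions for illustration; the inputs stand in for ASE estimates of the two graphs. If $\hat{Y}^T \hat{X} = U \Sigma V^T$, then $W = UV^T$ minimizes $\|\hat{X} - \hat{Y}W\|_F$ over orthogonal $W$.

```python
import numpy as np

def procrustes_stat(X_hat, Y_hat):
    """Return (min over orthogonal W of ||X_hat - Y_hat W||_F, the minimizing W)."""
    U, _, Vt = np.linalg.svd(Y_hat.T @ X_hat)
    W = U @ Vt
    return np.linalg.norm(X_hat - Y_hat @ W, ord="fro"), W

# Toy positions (assumed): Y_rot is X rotated, Y_diff also moves one row.
X_hat = np.array([[0.8, 0.2], [0.3, 0.6], [0.5, 0.5], [0.1, 0.9]])
theta = 1.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Y_rot = X_hat @ R
Y_diff = Y_rot.copy()
Y_diff[0] += 0.4

T_rot, W_rot = procrustes_stat(X_hat, Y_rot)   # alignment undoes the rotation
T_diff, _ = procrustes_stat(X_hat, Y_diff)     # a genuine difference remains
```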
Challenges in semiparametric graph testing
Problem 1: Procrustes alignment introduces variance
More variance ⇒ less power.
Problem 2: How to generalize to multiple-graph hypothesis testing?
Ultimately, we want something like ANOVA for graphs.
Goal: develop a technique that...
1. Avoids Procrustes alignment
2. Generalizes naturally to 3 or more graphs
Omnibus matrix: motivation
Definition (Omnibus matrix)
Let graphs G1 and G2 be $d$-dimensional RDPGs with adjacency matrices $A^{(1)}$ and $A^{(2)}$. We construct an omnibus matrix for the graphs as
\[
M = \begin{bmatrix} A^{(1)} & \frac{A^{(1)} + A^{(2)}}{2} \\ \frac{A^{(1)} + A^{(2)}}{2} & A^{(2)} \end{bmatrix} \in \mathbb{R}^{2n \times 2n}.
\]
Note: generalizes naturally to $m$ graphs, with $(i, j)$-block $(A^{(i)} + A^{(j)})/2$.
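A short sketch of the block construction for a list of $m$ adjacency matrices, following the $(i, j)$-block rule in the note. The function name `omnibus_matrix` and the tiny example graphs are assumptions for illustration.

```python
import numpy as np

def omnibus_matrix(graphs):
    """Build the mn-by-mn omnibus matrix with (i, j) block (A_i + A_j) / 2."""
    m = len(graphs)
    n = graphs[0].shape[0]
    M = np.zeros((m * n, m * n))
    for i, Ai in enumerate(graphs):
        for j, Aj in enumerate(graphs):
            # Diagonal blocks reduce to (A_i + A_i) / 2 = A_i.
            M[i * n:(i + 1) * n, j * n:(j + 1) * n] = (Ai + Aj) / 2
    return M

# Two tiny graphs on n = 2 vertices (assumed toy data).
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.zeros((2, 2))
M = omnibus_matrix([A1, A2])
M3 = omnibus_matrix([A1, A2, A1])
```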
Omnibus embedding
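Reminder
\[
M = \begin{bmatrix} A^{(1)} & \frac{A^{(1)} + A^{(2)}}{2} \\ \frac{A^{(1)} + A^{(2)}}{2} & A^{(2)} \end{bmatrix} \in \mathbb{R}^{2n \times 2n}
\]
Under $H_0$, we have $\mathbb{E}A^{(1)} = \mathbb{E}A^{(2)} = XX^T = P = U_P S_P U_P^T$, where $S_P \in \mathbb{R}^{d \times d}$ is diagonal and $U_P \in \mathbb{R}^{n \times d}$ has orthonormal columns.
\[
\mathbb{E}M = \tilde{P} = \begin{bmatrix} P & P \\ P & P \end{bmatrix}
= \begin{bmatrix} U_P \\ U_P \end{bmatrix} S_P \begin{bmatrix} U_P^T & U_P^T \end{bmatrix}
= \begin{bmatrix} X \\ X \end{bmatrix} \begin{bmatrix} X^T & X^T \end{bmatrix}
= U_{\tilde{P}} S_{\tilde{P}} U_{\tilde{P}}^T.
\]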
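Key point
Applying ASE to $M$, we get a $2n$-by-$d$ matrix,
\[
\hat{Z} = \begin{bmatrix} \hat{X} \\ \hat{Y} \end{bmatrix},
\]
where $\hat{X}, \hat{Y} \in \mathbb{R}^{n \times d}$ provide estimates of the latent positions of G1, G2 in the same $d$-dimensional space without an additional alignment step. A natural test statistic is given by $T_{\mathrm{Omni}} = \|\hat{X} - \hat{Y}\|_F$.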
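The whole pipeline (omnibus matrix, ASE, difference of the two halves) fits in a short sketch. The helper names `omnibus_matrix`, `ase` and `omni_stat` and the toy latent positions are assumptions for illustration, and the example feeds in noise-free expected matrices $XX^T$ and $YY^T$ rather than sampled adjacency matrices, purely so the outcome is deterministic.

```python
import numpy as np

def omnibus_matrix(graphs):
    """mn-by-mn matrix with (i, j) block (A_i + A_j) / 2."""
    m, n = len(graphs), graphs[0].shape[0]
    M = np.zeros((m * n, m * n))
    for i, Ai in enumerate(graphs):
        for j, Aj in enumerate(graphs):
            M[i * n:(i + 1) * n, j * n:(j + 1) * n] = (Ai + Aj) / 2
    return M

def ase(A, d):
    """Rows of U_d S_d^{1/2}, using the d algebraically largest eigenvalues."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def omni_stat(A1, A2, d):
    """T_Omni = ||X_hat - Y_hat||_F from one embedding of the omnibus matrix."""
    n = A1.shape[0]
    Z_hat = ase(omnibus_matrix([A1, A2]), d)
    X_hat, Y_hat = Z_hat[:n], Z_hat[n:]   # no Procrustes step is needed
    return np.linalg.norm(X_hat - Y_hat, ord="fro")

# Toy latent positions (assumed); Y differs from X in one row.
X = np.array([[0.8, 0.2], [0.3, 0.6], [0.5, 0.5], [0.1, 0.9]])
Y = X.copy()
Y[0] = [0.2, 0.7]

T_null = omni_stat(X @ X.T, X @ X.T, d=2)  # identical inputs: halves coincide
T_alt = omni_stat(X @ X.T, Y @ Y.T, d=2)   # differing inputs: halves separate
```

Because both halves come from the same eigenvectors, any sign or rotation ambiguity in the embedding affects $\hat{X}$ and $\hat{Y}$ identically and cancels in the difference.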
Main results: Notational preliminaries
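In what follows, we assume the null hypothesis, so G1 and G2 have shared latent positions $X \in \mathbb{R}^{n \times d}$:
\[
\mathbb{E}A^{(1)} = \mathbb{E}A^{(2)} = P = U_P S_P U_P^T = XX^T \in \mathbb{R}^{n \times n}.
\]
We denote the “true latent positions” of $M$ by
\[
Z = \begin{bmatrix} X \\ X \end{bmatrix} = \begin{bmatrix} U_P \\ U_P \end{bmatrix} S_P^{1/2} = U_{\tilde{P}} S_{\tilde{P}}^{1/2} \in \mathbb{R}^{2n \times d}
\]
and their estimates by
\[
\hat{Z} = U_M S_M^{1/2} = \begin{bmatrix} \hat{X} \\ \hat{Y} \end{bmatrix} \in \mathbb{R}^{2n \times d},
\]
where $S_M \in \mathbb{R}^{d \times d}$ is the diagonal matrix of the top $d$ eigenvalues of $M$, with the corresponding eigenvectors in the columns of $U_M \in \mathbb{R}^{2n \times d}$.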
Main results: Concentration inequality
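Lemma (Uniform concentration of estimates)
Let $\{A^{(i)}\}_{i=1}^m$ be adjacency matrices of $m$ independent RDPGs with shared latent positions $X = U_P S_P^{1/2} \in \mathbb{R}^{n \times d}$, and let $M \in \mathbb{R}^{mn \times mn}$ be their omnibus matrix, with its top $d$ eigenvalues collected in the diagonal matrix $S_M \in \mathbb{R}^{d \times d}$ and the corresponding eigenvectors in the columns of $U_M \in \mathbb{R}^{mn \times d}$. Let $\tilde{P} = U_{\tilde{P}} S_{\tilde{P}} U_{\tilde{P}}^T$ be the $mn \times mn$ matrix whose $n \times n$ blocks all equal $P = XX^T$, so that $U_{\tilde{P}} S_{\tilde{P}}^{1/2}$ consists of $m$ stacked copies of $X$. There exists a constant $C > 0$ such that, with high probability, there exists an orthogonal matrix $W \in \mathbb{R}^{d \times d}$ such that
\[
\max_{1 \le h \le mn} \left\| \left( U_M S_M^{1/2} - U_{\tilde{P}} S_{\tilde{P}}^{1/2} W \right)_{h,\cdot} \right\| \le \frac{C m^{1/2} \log mn}{\sqrt{n}}.
\]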
Main results: CLT
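Theorem (CLT: informally)
Let $\{A^{(i)}\}_{i=1}^m$ be adjacency matrices of $m$ independent RDPGs with shared latent positions $X = U_P S_P^{1/2} \in \mathbb{R}^{n \times d}$ drawn i.i.d. from a $d$-dimensional distribution $F$. Let $M \in \mathbb{R}^{mn \times mn}$ be their omnibus matrix, with its top $d$ eigenvalues collected in the diagonal matrix $S_M \in \mathbb{R}^{d \times d}$ and the corresponding eigenvectors in the columns of $U_M \in \mathbb{R}^{mn \times d}$. Fix $h = n(s - 1) + i$ for $i \in [n]$ and $s \in [m]$, i.e., the row of vertex $i$ in graph $s$. Then the error between the $h$-th position estimate and the (properly rotated) true $h$-th position is asymptotically a continuous mixture of normals, with mixing determined by $F$:
\[
n^{1/2} \left( U_M S_M^{1/2} - U_{\tilde{P}} S_{\tilde{P}}^{1/2} W_n \right)_{h,\cdot} \to \int N(0, \Sigma(y)) \, dF(y),
\]
where $\tilde{P}$ is the $mn \times mn$ matrix whose $n \times n$ blocks all equal $P = XX^T$.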
Main results: CLT
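Theorem (CLT: More formally)
Let $\{A^{(i)}\}_{i=1}^m$ be adjacency matrices of $m$ independent RDPGs with shared latent positions $X = U_P S_P^{1/2} \in \mathbb{R}^{n \times d}$ drawn i.i.d. from a $d$-dimensional distribution $F$. Let $M \in \mathbb{R}^{mn \times mn}$ be their omnibus matrix, with its top $d$ eigenvalues collected in the diagonal matrix $S_M \in \mathbb{R}^{d \times d}$ and the corresponding eigenvectors in the columns of $U_M \in \mathbb{R}^{mn \times d}$, and let $\tilde{P}$ be the $mn \times mn$ matrix whose $n \times n$ blocks all equal $P = XX^T$. Let $\Phi(x, \Sigma)$ denote the cdf of a multivariate Gaussian with mean 0 and covariance matrix $\Sigma$. Fix $h = n(s - 1) + i$ for $i \in [n]$ and $s \in [m]$. There exists a sequence of $d$-by-$d$ orthogonal matrices $(W_n)_{n=1}^{\infty}$ such that for all $x \in \mathbb{R}^d$,
\[
\lim_{n \to \infty} \Pr\left[ n^{1/2} \left( U_M S_M^{1/2} - U_{\tilde{P}} S_{\tilde{P}}^{1/2} W_n \right)_{h,\cdot} \le x \right] = \int \Phi(x, \Sigma(y)) \, dF(y),
\]
where $\Sigma(y) = (m + 3) \Delta^{-1} \tilde{\Sigma}(y) \Delta^{-1} / (4m)$ and
\[
\Delta = \mathbb{E}_F\left[ X_1 X_1^T \right], \qquad \tilde{\Sigma}(y) = \mathbb{E}_F\left[ \left( y^T X_1 - (y^T X_1)^2 \right) X_1 X_1^T \right].
\]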
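To make the covariance formula concrete, here is a small sketch that evaluates $\Sigma(y)$ when $F$ is a discrete distribution, so the expectations become finite weighted sums. The function name `omni_clt_covariance` and the example distribution (a point mass at 0.5 in one dimension, i.e. an Erdős–Rényi graph with edge probability 0.25) are assumptions chosen so the arithmetic can be checked by hand.

```python
import numpy as np

def omni_clt_covariance(y, support, probs, m):
    """Sigma(y) = (m + 3) / (4 m) * Delta^{-1} Sigma_tilde(y) Delta^{-1}, discrete F."""
    support = np.asarray(support, dtype=float)   # shape (K, d): atoms of F
    probs = np.asarray(probs, dtype=float)       # shape (K,): their probabilities
    # Delta = E_F[X_1 X_1^T]
    Delta = np.einsum("k,ki,kj->ij", probs, support, support)
    # Sigma_tilde(y) = E_F[(y^T X_1 - (y^T X_1)^2) X_1 X_1^T]
    p = support @ y
    weights = probs * (p - p ** 2)
    Sigma_tilde = np.einsum("k,ki,kj->ij", weights, support, support)
    Delta_inv = np.linalg.inv(Delta)
    return (m + 3) / (4 * m) * Delta_inv @ Sigma_tilde @ Delta_inv

y = np.array([0.5])
# By hand: Delta = 0.25, Sigma_tilde = (0.25 - 0.0625) * 0.25 = 0.046875,
# so Delta^{-1} Sigma_tilde Delta^{-1} = 0.75 and the m = 2 factor is 5/8.
sigma_m2 = omni_clt_covariance(y, [[0.5]], [1.0], m=2)[0, 0]   # 0.46875
sigma_m1 = omni_clt_covariance(y, [[0.5]], [1.0], m=1)[0, 0]   # 0.75
```

The factor $(m + 3)/(4m)$ equals 1 for $m = 1$ and decreases toward $1/4$ as $m$ grows.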
Experiments: hypothesis testing
[Figure, panels (a), (b), (c): empirical power (0 to 1) against number of vertices (log scale), for the Omnibus and Procrustes methods.]
Figure: Power of the Procrustes-based (blue) and omnibus-based (green) tests to detect when the two graphs being tested differ in (a) one, (b) five, and (c) ten of their latent positions. Each point is the proportion of 1000 trials for which the given technique correctly rejected the null hypothesis. Error bars denote two standard errors of this empirical mean.
Experiments: estimating latent positions
[Figure: mean squared error (log scale) against number of vertices (log scale, 10 to 1000), for the methods Abar, ASE1, OMNI, OMNIbar and PROCbar.]
Figure: Mean squared error (MSE) in recovery of latent positions (up to rotation) in a 2-graph RDPG model as a function of the number of vertices for different estimation procedures.
Future Work
Develop graph analogues of ANOVA and other multiple hypothesis testing procedures
Improve techniques for choosing critical value in omnibus test
Improve understanding of power under HA
Thanks!
Full paper: https://arxiv.org/abs/1705.09355