High Dimensional Separable Representations for Statistical Estimation and Controlled Sensing
Theodoros Tsiligkaridis†
†Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor
Ph.D. Thesis Presentation
December 11, 2013
Motivation
- Separable approximations are effective dimensionality reduction techniques for high dimensional problems.
- Covariance estimation: reduced computational complexity & improved estimation accuracy. What is the statistical estimation performance of separable models in high dimensions? What about model mismatch?
- Centralized controlled sensing leads to great performance gains at the expense of query design. Are there separable approximations to the optimal joint policy? How much performance degradation?
- Controlled sensing over a network of greedy agents. Is there a separable representation of the information state? Of the policy? Does the scheme converge?
Application: Spatiotemporal Signal Processing
Figure: U-component of wind speed as a function of time and latitude/longitude for year 2008. (Source: National Centers for Environmental Prediction, NOAA)
Application: Centralized Active Multisensor Target Localization
Figure: Illustration of a basic centralized collaborative tracking system: a fusion center issues queries A_n(m) to sensors S1, S2, S3 and receives responses Y_{n+1}(m) about the target X*.
Application: Decentralized Active Multisensor Target Localization
Figure: Illustration of a basic decentralized collaborative tracking system: networked sensors S1-S5 issue local queries A_n(m) about the target X* with no fusion center.
Impact
- Engineering: collaborative on-road vehicle recognition & tracking, optimization & design of active sensing systems (e.g., frequency agile radar, multicamera object tracking with PTZ cameras), conditions on network structure for successful aggregation of information in decentralized settings, human-in-the-loop decision making
- Signal Processing & Control: covariance decompositions for multidimensional data with theoretical guarantees, centralized & decentralized collaborative estimation with active queries, non-Bayesian social learning with active queries over finite networks leading to global consistency, decentralized stochastic search
- Social Sciences: social learning & opinion dynamics, adaptive testing, recommendation systems, multitask learning, interview design
Contributions of Thesis
1. Performance bounds for high-dimensional Kronecker-product structured covariance matrix estimation
2. Optimal query design for a centralized collaborative controlled sensing system for target localization
3. Global convergence theory for decentralized collaborative controlled sensing for target localization
Kronecker Graphical Lasso
Mathematical setting
Observed d × n random matrix:

$$Z = \begin{bmatrix} z_{1,1} & \cdots & z_{1,n} \\ \vdots & \ddots & \vdots \\ z_{d,1} & \cdots & z_{d,n} \end{bmatrix} = [z_1, \ldots, z_n]$$

Each column of Z is an independent realization of a Gaussian random vector $z = [z_1, \ldots, z_d]^T$.

Of interest: estimate the d × d inverse covariance (precision) matrix of z (and the covariance matrix):

$$\Theta = \Sigma^{-1}, \qquad \Sigma = \mathrm{cov}(z) = E[zz^T]$$
Gaussian graphical models: activity recognition, gene expression networks, social networks, multiple financial time series.
Gaussian Graphical Models
Consider a random vector measurement Z ∈ R^d. The joint probability distribution of the d measurements can be represented as an undirected graph G = (V, E). Edge (i, j) ∉ E iff Z_i and Z_j are conditionally independent given all the other variables.
- If Z is a Gaussian random vector, the conditional independence relationships between variables are encoded in the precision matrix (Lauritzen [1996]). Thus, estimating the Gaussian graphical model is equivalent to estimating the precision matrix.
- A sparse GGM is equivalent to a sparse precision matrix.

Define the sparsity parameter:

$$s_{\Theta_0} = \mathrm{card}\left(\{(i,j) : [\Theta_0]_{i,j} \neq 0,\ i \neq j\}\right)$$
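A small numerical illustration of this equivalence (the 3-variable chain example is ours, not from the slides): a zero in Θ corresponds to a vanishing partial correlation, i.e., a missing edge, even when the marginal correlation is nonzero.

```python
import numpy as np

# Precision matrix of a 3-variable Gaussian chain Z1 - Z2 - Z3:
# [Theta]_{1,3} = 0 encodes "no edge (1,3)", i.e., Z1 _||_ Z3 | Z2.
Theta = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.8],
                  [ 0.0, -0.8,  2.0]])
Sigma = np.linalg.inv(Theta)

print(Sigma[0, 2])  # nonzero: Z1 and Z3 are marginally correlated,
# yet conditionally independent given Z2: the partial correlation
# rho_{13|2} = -Theta[0,2] / sqrt(Theta[0,0] * Theta[2,2]) vanishes.
print(-Theta[0, 2] / np.sqrt(Theta[0, 0] * Theta[2, 2]))  # 0.0
```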
Sparse inverse covariance matrices and associated graphical models
Figure: Left: inverse correlation matrix. Right: associated graphical model (Wiesel et al. [2010]).
Prior Work
- Many more unknown parameters (d(d + 1)/2) than measurements (n).
- The sample covariance matrix $S_n = \frac{1}{n}\sum_{t=1}^n z_t z_t^T$ is a poor estimator of Σ:
  - Large eigenvalue spread in the high dimensional regime (Karoui [2008]).
  - Estimation of eigenvectors of the SCM becomes impossible if the ratio n/d is below a critical threshold (Paul [2007], Rao et al. [2008]).
- Regularize:
  - Parametric models: Toeplitz, AR, ARMA (Bickel and Levina [2008], Huang et al. [2006], Cai et al. [2012]).
  - Sparse structured (inverse) covariance: graphical lasso (Yuan and Lin [2007]).
  - Kronecker structured covariance: Flip-Flop Kronecker covariance estimator (Werner et al. [2008]).
Kronecker product model for covariance matrix
Figure: A saturated model with an 18 × 18 covariance matrix has 18(18+1)/2 = 171 unknown covariance parameters. A Kronecker product covariance model reduces the number of parameters to 6 + 21 = 27 unknown covariance parameters.
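The count follows from the symmetry of each factor; the figure's numbers force the factorization 18 = 3 × 6, with A a 3 × 3 factor and B a 6 × 6 factor:

$$\underbrace{\tfrac{3(3+1)}{2}}_{A:\ 6} \;+\; \underbrace{\tfrac{6(6+1)}{2}}_{B:\ 21} \;=\; 27 \;\ll\; \tfrac{18(18+1)}{2} = 171.$$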
Sparse Kronecker product model for covariance matrix
Figure: A sparse Kronecker product covariance model reduces the number of parameters from 65 to 16 unknown covariance parameters.
Applications of KP Covariance
- geostatistics (Cressie [1993], Genton [2007])
- genomics (Yin and Li [2012])
- multi-task learning (Bonilla et al. [2008])
- face recognition (Zhang and Schneider [2010])
- recommendation systems (Allen and Tibshirani [2010])
- collaborative filtering (Yu et al. [2009])
- MIMO wireless communications (Werner and Jansson [2007])
Problem Formulation
- Available are n i.i.d. multivariate Gaussian observations $\{z_t\}_{t=1}^n$, where $z_t \in \mathbb{R}^{pq}$, having zero mean and covariance equal to

$$\Sigma = \underbrace{A_0}_{p\times p} \otimes \underbrace{B_0}_{q\times q} = \begin{bmatrix} [A_0]_{1,1}B_0 & \cdots & [A_0]_{1,p}B_0 \\ \vdots & \ddots & \vdots \\ [A_0]_{p,1}B_0 & \cdots & [A_0]_{p,p}B_0 \end{bmatrix},$$

where $A_0 \in \mathbb{S}^p_{++}$ and $B_0 \in \mathbb{S}^q_{++}$.
- The goal is to estimate the covariance matrix and its inverse Θ = Σ⁻¹ (precision matrix).
Graphical Lasso (Yuan and Lin [2007])
Penalized negative log-likelihood function for Θ = Σ⁻¹:

$$J(\Theta) := \mathrm{tr}(\Theta S_n) - \log\det(\Theta) + \lambda|\Theta|_1 \tag{1}$$

where $S_n = \frac{1}{n}\sum_{t=1}^n z_t z_t^T$ is the sample covariance matrix (SCM). Minimizer $\hat\Theta_n \in \arg\min J(\Theta)$.

- Fast algorithms exist for minimizing (1) (Friedman et al. [2008], Hsieh et al. [2011]) with worst-case computational complexity of O(d⁴).
- High-dimensional MSE convergence rate (Rothman et al. [2008]):

$$\|\hat\Theta_n - \Theta_0\|_F^2 = O_P\!\left(\frac{(d + s_{\Theta_0})\log d}{n}\right) \tag{2}$$

where $\lambda \asymp \sqrt{\log(d)/n}$.
ML estimator of Kronecker structured covariance
Negative log-likelihood function when Θ has Kronecker structure Θ = X ⊗ Y:

$$J(X, Y) = \mathrm{tr}((X\otimes Y)S_n) - q\log\det(X) - p\log\det(Y) \tag{3}$$

Alternating minimization yields the Flip-Flop algorithm (Werner et al. [2008]), which generates updates of A = X⁻¹, B = Y⁻¹:

$$\underbrace{\hat A(B)}_{p\times p} = \frac{1}{q}\sum_{k,l=1}^{q}[B^{-1}]_{k,l}\,\tilde S_n(l,k) \tag{4}$$

$$\underbrace{\hat B(A)}_{q\times q} = \frac{1}{p}\sum_{i,j=1}^{p}[A^{-1}]_{i,j}\,S_n(j,i) \tag{5}$$

where $\tilde S_n = K_{p,q}^T S_n K_{p,q}$ and $K_{p,q}\mathrm{vec}(N) = \mathrm{vec}(N^T)$ for any p × q matrix N; here $S_n(j,i)$ and $\tilde S_n(l,k)$ denote q × q and p × p submatrices, respectively.
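A minimal numpy sketch of updates (4) and (5), under the assumption that Sn is partitioned into a p × p grid of q × q blocks Sn(i, j) (this block convention and the helper names are ours, not thesis code):

```python
import numpy as np

def block_avg(S, W, nblk, bsz):
    """Weighted block average (1/nblk) * sum_{i,j} W[i,j] * S(j, i),
    where S(j, i) denotes the (j, i)-th bsz x bsz block of S."""
    out = np.zeros((bsz, bsz))
    for i in range(nblk):
        for j in range(nblk):
            out += W[i, j] * S[j*bsz:(j+1)*bsz, i*bsz:(i+1)*bsz]
    return out / nblk

def permute(S, p, q):
    """S_tilde = K_{p,q}^T S K_{p,q}: reorders S into a q x q grid of
    p x p blocks (the B (x) A ordering)."""
    return S.reshape(p, q, p, q).transpose(1, 0, 3, 2).reshape(p*q, p*q)

def flip_flop_step(S, A, p, q):
    """One flip-flop sweep: B-update (5) given A, then A-update (4) given B."""
    B = block_avg(S, np.linalg.inv(A), p, q)                 # update (5)
    A = block_avg(permute(S, p, q), np.linalg.inv(B), q, p)  # update (4)
    return A, B
```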
Submatrix partitioning of the SCM

Figure: SCM of size pq × pq with p = 4, q = 5. Blue: Sn(1, 2). Red: Sn(1, 1).
MSE Convergence Rate of FF (Tsiligkaridis et al. [2012])
Let $\hat R_{FF}(3) := \hat A(\hat B(A_{init})) \otimes \hat B(\hat A(\hat B(A_{init})))$ denote the 3-step (noniterative) version of the flip-flop algorithm (Werner et al. [2008]). More generally, let $\hat R_{FF}(k)$ denote the k-step version of the flip-flop algorithm.

Theorem. Let A₀, B₀, and A_init have uniformly bounded spectra and define M = max(p, q, n). Assume p ≥ q ≥ 2 and p log M ≤ C′′n for some finite constant C′′ > 0. Finally, assume n ≥ p/q + 1. Then, for k ≥ 2 finite,

$$\|\hat\Theta_{FF}(k) - \Theta_0\|_F^2 = O_P\!\left(\frac{(p^2+q^2)\log M}{n}\right) \tag{6}$$

as n → ∞.
KGlasso Algorithm
$$\min_{X,Y}\ J_\lambda(X, Y) = J(X, Y) + \lambda_X|X|_1 + \lambda_Y|Y|_1 \tag{7}$$

where J(·, ·) is given in (3) and λ_X, λ_Y ≥ 0.

Algorithm 1 KGlasso (Tsiligkaridis et al. [2012, 2013a])
1: Input: Sn, p, q, n, λX > 0, λY > 0
2: Output: Θ̂_KGlasso
3: Initialize A_init to be positive definite.
4: A ← A_init
5: repeat
6:   B ← (1/p) Σ_{i,j=1}^p [A⁻¹]_{i,j} Sn(j, i)
7:   Y ← arg min_{Y∈S^q_{++}} tr(YB) − log det(Y) + λY|Y|₁
8:   A ← (1/q) Σ_{k,l=1}^q [B⁻¹]_{k,l} S̃n(l, k)
9:   X ← arg min_{X∈S^p_{++}} tr(XA) − log det(X) + λX|X|₁
10: until convergence
11: Θ̂_KGlasso ← X ⊗ Y

Computational complexity: O(p⁴ + q⁴) (KGlasso) vs. O(p⁴q⁴) (Glasso).
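Reusing block_avg and permute from the flip-flop sketch above, one KGlasso cycle interleaves those closed-form steps with a graphical lasso solve on each factor; graphical_lasso here is scikit-learn's solver, used for convenience (not the thesis implementation):

```python
from sklearn.covariance import graphical_lasso

def kglasso_cycle(S, X, p, q, lam_X, lam_Y):
    """One KGlasso cycle (lines 6-9 of Algorithm 1); X estimates A^{-1}."""
    B_hat = block_avg(S, X, p, q)                  # line 6, with [A^{-1}] = X
    _, Y = graphical_lasso(B_hat, alpha=lam_Y)     # line 7: sparse factor Y
    A_hat = block_avg(permute(S, p, q), Y, q, p)   # line 8, with [B^{-1}] = Y
    _, X = graphical_lasso(A_hat, alpha=lam_X)     # line 9: sparse factor X
    return X, Y                                    # Theta_KGlasso = kron(X, Y)
```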
KGlasso Convergence Rate (Tsiligkaridis et al. [2012])
Define Θ̂_KGlasso(k) as the output of the kth KGlasso iteration.

Theorem. Let A₀, B₀, A_init have uniformly bounded spectra. Let M = max(p, q, n). Assume sparse X₀ and Y₀, i.e. $s_{X_0} = O(p)$, $s_{Y_0} = O(q)$. Assume $\max\!\left(\frac{p}{q}, \frac{q}{p}\right)\log M = o(n)$. If in the KGlasso algorithm

$$\lambda_X^{(k)} \asymp \left(\tfrac{1}{\sqrt p} + \tfrac{1}{\sqrt q}\right) q\sqrt{\tfrac{\log M}{n}} \quad\text{and}\quad \lambda_Y^{(k')} \asymp \left(\tfrac{1}{\sqrt p} + \tfrac{1}{\sqrt q}\right) p\sqrt{\tfrac{\log M}{n}}$$

for all k, k′ ≥ 1, then

$$\|\hat\Theta_{KGlasso}(k) - \Theta_0\|_F^2 = O_P\!\left(\frac{(p+q)\log M}{n}\right) \tag{8}$$

as n → ∞.

Assume p ∼ q. Comparing the KGlasso convergence rate (p + q)/n in (8) with the others:
- SCM rate: p²q²/n. Worse by 3 orders of magnitude (in p).
- FF rate: (p² + q²)/n. Worse by 1 order of magnitude.
- Glasso rate: (pq + s_{Θ₀})/n. Worse by 1 order of magnitude.
Large Sample MSE Convergence
We considered X₀ and Y₀ large sparse matrices of dimension p = q = 100, yielding a covariance matrix Θ₀ of dimension 10,000 × 10,000. This dimension was too large for implementation of Glasso even using the state-of-the-art algorithm (Hsieh et al. [2011]). However, we can run KGlasso and FF and compare performances since they have considerably less computational burden.

Figure: Sparse Kronecker matrix representation. Left panel: left Kronecker factor. Right panel: right Kronecker factor. The sparsity factor for both precision matrices is approximately 200.
Large Sample MSE Convergence (Cont.)
Figure: Normalized Frobenius error in the inverse covariance matrix as a function of sample size n (log-log axes; curves: FF, KGlasso, FF/Thres; max number of iterations = 100, 40 trials, (p, q) = (100, 100)). For n = 10, there is a 72% RMSE reduction from the FF to the KGlasso solution and a 70% RMSE reduction from FF/Thres to KGlasso.
Kronecker PCA
[Diagram: projection of the SCM S onto a subspace V, proj_V(S), and onto Kronecker product subspaces V1 = A1 ⊗ B1 and V2 = A2 ⊗ B2, giving proj_{V1}(S) and proj_{V2}(S).]
Introduction
- Represent the covariance as a Sum of Kronecker Products (SKP) of two lower dimensional factor matrices:

$$\Sigma_0 = \sum_{\gamma=1}^{r} A_{0,\gamma} \otimes B_{0,\gamma} \tag{9}$$

where $\{A_{0,\gamma}\}$ are p × p linearly independent matrices and $\{B_{0,\gamma}\}$ are q × q linearly independent matrices.
- Note 1 ≤ r ≤ r₀ = min(p², q²); we refer to r as the separation rank.
Introduction
Applications of the Sum of Kronecker Products (SKP) model (9):
- Spatiotemporal MEG/EEG covariance modeling (de Munck et al. [2002, 2004], Bijma et al. [2005], Jun et al. [2006])
- Synthetic Aperture Radar (SAR) data analysis (Tebaldini [2009], Rucci et al. [2010])

Van Loan and Pitsianis [1993]:
- Any pq × pq matrix Σ₀ can be written as an orthogonal expansion of Kronecker products of the form (9).
- Low separation rank is equivalent to low rank in a permuted space defined by the reshaping operator R(·).
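A numpy sketch of the reshaping operator R(·) and the resulting SVD-based Kronecker decomposition, following our reading of Van Loan and Pitsianis [1993] (the vec ordering is an assumption):

```python
import numpy as np

def rearrange(Sigma, p, q):
    """R(Sigma): pq x pq -> p^2 x q^2. Row (i, j) holds vec(Sigma_ij)^T
    for the q x q block Sigma_ij, so that Sigma = sum_g A_g (x) B_g
    if and only if R(Sigma) = sum_g vec(A_g) vec(B_g)^T."""
    R = np.empty((p * p, q * q))
    for i in range(p):
        for j in range(p):
            blk = Sigma[i*q:(i+1)*q, j*q:(j+1)*q]
            R[i * p + j] = blk.T.ravel()   # column-major vec of the block
    return R

def kron_svd(Sigma, p, q, r):
    """Best separation-rank-r approximation via the SVD of R(Sigma)."""
    U, s, Vt = np.linalg.svd(rearrange(Sigma, p, q), full_matrices=False)
    A = [(s[g] * U[:, g]).reshape(p, p) for g in range(r)]
    B = [Vt[g].reshape(q, q, order='F') for g in range(r)]
    return A, B   # Sigma ~ sum_g np.kron(A[g], B[g])
```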
Low separation rank ⇔ Low rank in permuted space
Figure: Original covariance Σ₀ (top) and permuted covariance R₀ (bottom). The original covariance is Σ₀ = A₀ ⊗ B₀, where A₀ is a 10 × 10 Toeplitz matrix and B₀ is a 20 × 20 unstructured p.d. matrix. Note that the permutation operator R maps the symmetric p.s.d. matrix Σ₀ to a non-symmetric rank-1 matrix R₀ = R(Σ₀).
Permuted rank-penalized least-squares (PRLS) (Tsiligkaridis and Hero [2013a,b])
1. Map the SCM to a different linear space:

$$\mathcal{R}_n = \mathcal{R}(S_n) \in \mathbb{R}^{p^2\times q^2}$$

2. Solve a least-squares problem with nuclear norm penalization:

$$\hat{\mathcal{R}}_n^\lambda \in \arg\min_{R\in\mathbb{R}^{p^2\times q^2}} \|\mathcal{R}_n - R\|_F^2 + \lambda\|R\|_* \tag{10}$$

3. Map back to the original space:

$$\hat\Sigma_n^\lambda = \mathcal{R}^{-1}(\hat{\mathcal{R}}_n^\lambda) \in \mathbb{R}^{pq\times pq}$$

where λ ≥ 0 is a regularization parameter.
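A sketch of steps 1-3, reusing rearrange from the Kronecker PCA sketch above; the soft-threshold level λ/2 is the standard nuclear-norm proximal step for objective (10) (the factor of 2 is a scaling convention, ours):

```python
import numpy as np

def prls(S, p, q, lam):
    """PRLS sketch: singular value soft-thresholding of R(S), mapped back."""
    R = rearrange(S, p, q)                          # step 1
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    R_hat = (U * np.maximum(s - lam / 2, 0)) @ Vt   # step 2: nuclear-norm prox
    Sigma = np.empty((p * q, p * q))                # step 3: invert R
    for i in range(p):
        for j in range(p):
            Sigma[i*q:(i+1)*q, j*q:(j+1)*q] = R_hat[i*p + j].reshape(q, q, order='F')
    return Sigma
```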
Properties of PRLS Estimator (Tsiligkaridis and Hero [2013b])
Theorem.
- The solution $\hat\Sigma_n^\lambda$ is symmetric.
- If n ≥ pq, then the solution $\hat\Sigma_n^\lambda$ is positive definite with probability 1.

Theorem. Define M = max(p, q, n). Set

$$\lambda = \lambda_n = 2C_0\frac{t}{1-2\epsilon'}\max\left\{\frac{p^2+q^2+\log M}{n},\ \sqrt{\frac{p^2+q^2+\log M}{n}}\right\}$$

for t > 0 large enough. Then, with probability at least $1 - 2M^{-\frac{t}{4C}}$:

$$\|\hat\Sigma_n^\lambda - \Sigma_0\|_F^2 \le \inf_{R:\,\mathrm{rank}(R)\le r}\|R - R_0\|_F^2 + C' r\max\left\{\left(\frac{p^2+q^2+\log M}{n}\right)^2,\ \frac{p^2+q^2+\log M}{n}\right\} \tag{11}$$

for some absolute constant C′ > 0.
Setup
- NCEP Dataset: Daily average wind speeds collected at 144 × 73 weather stations spread throughout the world (Kalnay et al. [1996], Tsiligkaridis and Hero [2013b]).
- Considered a 10 × 10 grid of stations, corresponding to latitude range 90°N-67.5°N and longitude range 0°E-22.5°E.
- Prediction time lag p − 1 = 7, full dimension d = pq = 800, number of training samples n = 228.
- Training period: 2003-2007. Testing period: 2008-2012.
Kronecker product decomposition: PRLS
Figure: Sample covariance matrix (SCM) (top left), PRLS covariance estimate with effective rank r_eff = 2 (top right), temporal Kronecker factor for the first KP component (bottom left) and spatial Kronecker factor for the first KP component (bottom right).
Kronecker Spectrum
Figure: Kronecker spectrum of the SCM (left) and eigenspectrum of the SCM (right). The KP spectrum is more compact than the eigenspectrum.
RMSE performance gains
Figure: RMSE prediction performance across q stations for linear estimators using the SCM (blue), PRLS (green) and regularized Tyler (magenta).

- Average gain of PRLS over SCM = 4.64 dB
- Average gain of Reg. Tyler over SCM = 3.41 dB
Centralized Collaborative 20 Questions
Motivation
- What is the intrinsic value of adding a human-in-the-loop to an autonomous learning machine?
- Insight into human-aided autonomous sensing for estimating an unknown target location or identifying a target.
Motivation
Figure: PTZ IP camera. Source: en.wikipedia.org/wiki/Pan-tilt-zoom camera

- Sensor systems are becoming more flexible, e.g. pan-tilt-zoom cameras: where to look? Which sensor waveforms & observation modes? How to control these aspects for a common localization objective?
Prior Work & Applications
Ask a sequence of questions and refine the posterior distribution of the target's location given the responses.
- Probabilistic Bisection Algorithm (PBA) first introduced in (Horstein [1963]).
- Discretized PBA (Burnashev and Zigangirov [1974]).
- Noisy Binary Search (Karp and Kleinberg [2007]).
- Convergence rate for the BZ algorithm (Castro and Nowak [2007]).
- Noisy 20 questions game: PBA shown to be optimal under the minimum expected entropy criterion (Jedynak et al. [2012]).
- Convergence rate for PBA (Waeber et al. [2013]).

Applications of PBA: stochastic root finding, combinatorial optimization, road tracking, electron microscopy.
Single player setting
- Jedynak et al. [2012] considers 20 questions with noise, where a noisy oracle is queried as to whether a target X* lies in a set $A_n \subset \mathbb{R}^d$.
- Starting with a prior distribution p₀(·) on the target's location, minimize the expected entropy of the posterior distribution:

$$\inf_\pi\ \mathbb{E}^\pi[H(p_N)] \tag{12}$$

where π = (π₀, π₁, ...) denotes the policy. The posterior mean/median of p_N(·) is the target location estimate.
- Jedynak et al. [2012] shows the bisection policy is optimal under the minimum entropy criterion. Assuming the noisy channel is a BSC, optimal policies are characterized by:

$$P_n(A_n) := \int_{A_n} p_n(x)\,dx = 1/2 \tag{13}$$
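A minimal sketch of the bisection policy (13) on a discretized unit interval, assuming a BSC oracle with known crossover probability ε (grid size and names are illustrative):

```python
import numpy as np

def pba(x_star, eps=0.2, n_iter=100, grid=2000, rng=np.random.default_rng(0)):
    """Probabilistic bisection on [0, 1] with a BSC(eps) oracle."""
    x = (np.arange(grid) + 0.5) / grid
    p = np.full(grid, 1.0 / grid)                     # uniform prior p_0
    for _ in range(n_iter):
        med = x[np.searchsorted(np.cumsum(p), 0.5)]   # P_n([0, med]) = 1/2
        z = x_star <= med                             # true answer Z_n
        y = z ^ (rng.random() < eps)                  # response through the BSC
        p = p * np.where((x <= med) == y, 1 - eps, eps)
        p /= p.sum()                                  # Bayes update
    return np.sum(x * p)                              # posterior mean estimate

print(pba(0.75))   # approaches 0.75 as n_iter grows
```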
Noisy 20 Questions with Collaborative Players: Model (Tsiligkaridis et al. [2013c])
- M collaborating players can be asked questions at each time instant.
- The mth player's query at time n: "does X* lie in the region $A_n^{(m)} \subset \mathbb{R}^d$?"
- The query is the binary variable $Z_n^{(m)} = I(X^* \in A_n^{(m)}) \in \{0,1\}$, to which the player provides a noisy response $Y_{n+1}^{(m)} \in \{0,1\}$.
- Define the M-tuples $Y_{n+1} = (Y_{n+1}^{(1)}, \ldots, Y_{n+1}^{(M)})$ and $A_n = \{A_n^{(1)}, \ldots, A_n^{(M)}\}$.

Assumption. Players' responses are conditionally independent:

$$P(Y_{n+1} = y\,|\,A_n, X^* = x, \mathcal{F}_n) = \prod_{m=1}^{M} P(Y_{n+1}^{(m)} = y^{(m)}\,|\,A_n^{(m)}, X^* = x, \mathcal{F}_n) \tag{14}$$

$$P(Y_{n+1}^{(m)} = y^{(m)}\,|\,A_n^{(m)}, X^* = x, \mathcal{F}_n) = \begin{cases} f_1^{(m)}(y^{(m)}|\epsilon_m), & x \in A_n^{(m)} \\ f_0^{(m)}(y^{(m)}|\epsilon_m), & x \notin A_n^{(m)} \end{cases} \tag{15}$$

$$f_j^{(m)}(y^{(m)}|\epsilon_m) = \begin{cases} 1-\epsilon_m, & y^{(m)} = j \\ \epsilon_m, & y^{(m)} = 1-j \end{cases} \tag{16}$$
Optimal Joint Query Design: Setup
- The joint controller chooses M queries $A_n^{(m)}$ at time n. Define the set of subsets of $\mathbb{R}^d$:

$$\gamma(A^{(1)}, \ldots, A^{(M)}) = \left\{\bigcap_{m=1}^{M} (A^{(m)})^{i_m} : i_m \in \{0, 1\}\right\}$$

where $(A)^0 := A^c$ and $(A)^1 := A$. The cardinality of this set of subsets is $2^M$ and these subsets partition $\mathbb{R}^d$.
- Define the density parameterized by $A_n, p_n, i_1, \ldots, i_M$:

$$g_{i_1:i_M}(y^{(1)}, \ldots, y^{(M)}\,|\,A_n, \mathcal{F}_n) := \prod_{m=1}^{M} f_{i_m}^{(m)}(y^{(m)}\,|\,A_n^{(m)}, \mathcal{F}_n)$$

where $i_j \in \{0, 1\}$.
Sequential Query Design
[Diagram: sequential pipeline in which controller m issues a query to player m, whose response feeds a posterior update before control passes to controller m + 1.]

- Query region $A_{n_t}$ chosen at time $n_t = (n, t)$, where n = 0, 1, ... indexes over cycles and t = 0, ..., M − 1 indexes within cycles.
- Nested sequence of sigma-algebras $\mathcal{G}_{n,t}$, with $\mathcal{G}_{n,t} \subset \mathcal{G}_{n+i,t+j}$ for all i ≥ 0 and j ∈ {0, ..., M − 1 − t}, generated by the sequence of queries and the players' responses.
Optimal Joint Query Design
[Diagram: a joint controller at the fusion center issues a batch of M queries, one to each of players 1, ..., M.]

- The joint controller chooses a batch of M queries $\{A_n^{(m)}\}$ at time n.
- As in sequential query design, joint queries are chosen based on the information accumulated at the controller. Since the full batch of joint queries is determined at the start of the nth cycle, the joint controller only has access to a coarser filtration $\mathcal{F}_n$, $\mathcal{F}_{n-1} \subset \mathcal{F}_n$, as compared to $\mathcal{G}_{n,t}$.
Equivalence Theorem (Tsiligkaridis et al. [2013c])
Theorem (Equivalence, Known Error Probabilities).
1. The expected entropy loss under an optimal joint query design is the same as under the greedy sequential query design. This loss is given by:

$$C = \sum_{m=1}^{M} C(\epsilon_m) = \sum_{m=1}^{M} (1 - h_b(\epsilon_m)) \tag{17}$$

where $h_b(\epsilon_m) = -\epsilon_m\log\epsilon_m - (1-\epsilon_m)\log(1-\epsilon_m)$ is the binary entropy function.
2. All jointly optimal control laws equalize the posterior probability over the dyadic partitions induced by $A_n = \{A_n^{(1)}, \ldots, A_n^{(M)}\}$:

$$P_n(R) = \int_R p_n(x)\,dx = 2^{-M}, \quad \forall R \in \gamma(A_n). \tag{18}$$
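As a numerical reading of (17), taking h_b in bits (a convention assumption here) so that C(ε_m) is exactly the capacity of a BSC with crossover probability ε_m: for M = 2 players with ε₁ = ε₂ = 0.1,

$$h_b(0.1) \approx 0.469 \quad\Rightarrow\quad C = 2\,(1 - 0.469) \approx 1.06,$$

i.e., roughly one bit of posterior entropy is removed per query cycle.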
Consequences of Equivalence Theorem
- The optimal policy can be implemented using the simpler sequential query design.
- Despite the fact that all players are conditionally independent, the joint policy does not decouple into separate single-player optimal policies (analogous to the non-separability of the optimal vector quantizer in source coding, even for independent sources; Gersho and Gray [1992]).
- Optimal queries must be overlapping, i.e., $\bigcap_{m=1}^{M} A_n^{(m)} \neq \emptyset$, but not identical.
- The optimal query $A_n$ is not unique.
Example of optimal queries for M = 2
Figure: Jointly optimal queries under a uniform prior.
Lower Bounds on MSE via Entropy Loss
Theorem (Lower Bound on MSE). Assume the entropy H(p₀) is finite. Then the MSE of the joint or sequential query policies satisfies:

$$\frac{K}{2\pi e d}\exp\!\left(-\frac{2nC}{d}\right) \le \mathbb{E}[\|X^* - \hat X_n\|_2^2] \tag{19}$$

where $K = e^{2H(p_0)}$ and $\hat X_n$ is the posterior mean. The expected entropy loss per iteration is $C = \sum_m C(\epsilon_m)$.
Upper Bounds on MSE: Setup
- Performance analysis of the PBA is difficult, primarily due to the continuous nature of the posterior (Castro and Nowak [2007]):

"The probabilistic bisection algorithm seems to work extremely well in practice, but it is hard to analyze and there are few theoretical guarantees for it, especially pertaining error rates of convergence."

- A discretized version of the PBA was proposed in (Burnashev and Zigangirov [1974]) (the BZ algorithm), which imposes a piecewise constant structure on the posterior (see Castro and Nowak [2007], App. A in Castro [2007]).
- Recently, an answer for the continuous PBA was given in (Waeber et al. [2013]) for one-dimensional target search.
Upper Bounds on MSE: Setup
- For simplicity, assume the target location is constrained to the unit interval X = [0, 1].
- A step size Δ > 0 is defined such that Δ⁻¹ ∈ ℕ, and the posterior after j iterations is p_j : X → ℝ, given by

$$p_j(x) = \frac{1}{\Delta}\sum_{i=1}^{\Delta^{-1}} a_i(j)\, I(x \in I_i)$$

where I₁ = [0, Δ] and $I_i = ((i-1)\Delta, i\Delta]$ for i = 2, ..., Δ⁻¹. The initial pseudo-posterior is $a_i(0) = \Delta$. The posterior is characterized completely by the pseudo-posterior $a(j) = [a_1(j), \ldots, a_{\Delta^{-1}}(j)]$, which is updated at each iteration via Bayes' rule.
Upper Bounds on MSE
Theorem (Upper Bound on MSE). Consider the sequential bisection algorithm for M players in one dimension, where each bisection is implemented using the BZ algorithm. Then:

$$P(|X^* - \hat X_n| > \Delta) \le \left(\frac{1}{\Delta} - 1\right)\exp(-nC)$$

$$\mathbb{E}[(X^* - \hat X_n)^2] \le (2^{-2/3} + 2^{1/3})\exp\!\left(-\frac{2}{3}nC\right) \tag{20}$$

where $C = \sum_{m=1}^{M} C(\epsilon_m)$ and $C(\epsilon) = 1/2 - \sqrt{\epsilon(1-\epsilon)}$.
Upper Bounds on MSE: Human-in-the-loop
- Player 1 (machine) has constant error probability ε₁ ∈ (0, 1/2).
- Player 2 (human) has an error probability depending on the target localization error:

$$P(Y_{n+1}^{(2)} = y^{(2)}\,|\,Z_n^{(2)} = 1 - y^{(2)}) = \frac{1}{2} - \min(\delta_0,\ \mu|X^* - \hat X_n|^{\kappa-1}) \tag{21}$$

- κ = human "resolution" (κ > 1)
- δ₀ = reliability parameter (0 < δ₀ < µ < 1/2)
- MSE upper bound for the "player 1 + human" system:

$$\mathbb{E}[(X^* - \hat X_n)^2] \le e^{-\frac{2}{3}nC(\epsilon_1)}\left[2^{-2/3} + 2^{1/3}\exp\!\left(-\frac{\mu^2}{50}\left(\frac{3\cdot 2^{-1/3}}{4}\right)^{2\kappa-2} n\, e^{-nC(\epsilon_1)\frac{2\kappa-2}{3}}\right)\right] \tag{22}$$

which is no greater than the "player 1" MSE bound.
- Both bounds converge to zero at the same rate as n → ∞.
- Human gain ratio (HGR) = ratio of the MSE upper bounds associated with "player 1" and "player 1 + human":

$$R_n(\kappa) = \frac{2^{-2/3} + 2^{1/3}}{2^{-2/3} + 2^{1/3}\exp\!\left(-\frac{\mu^2}{50}\left(\frac{3\cdot 2^{-1/3}}{4}\right)^{2\kappa-2} n\, e^{-nC(\epsilon_1)\frac{2\kappa-2}{3}}\right)} \tag{23}$$
Upper Bounds on MSE: Human-Gain Ratio
- The larger ε₁ is, the larger the HGR.
- As κ decreases to 1, the ratio increases, meaning that the human becomes more like the machine and helps more.
Figure: Human gain ratio versus iteration n for κ ∈ {1.5, 2, 2.5} with ε₁ = 0.4 (optimized bounds and predictions). The human provides the largest gain in the first few iterations and its value of information decreases as n → ∞. The predictions match the optimized bounds well.
Simulation (Known error probabilities): Initial Distribution
Figure: Initial distribution (iteration 0): a mixture of three Gaussians with means 0.25, 0.5 and 0.75, and variances 0.02, 0.05 and 0.08, respectively. The target was set to the center of the mode at X* = 0.75 with the largest variance.
Simulation (Known error probabilities): MSE Decay
Figure: Monte Carlo simulation of the MSE performance of the sequential estimator as a function of iteration, comparing "player 1" alone against "player 1 + human" for ε₁ ∈ {0.1, 0.2, 0.3, 0.4}. 2000 Monte Carlo trials were used. The human parameters were set to κ = 1.5, µ = 0.42, δ₀ = 0.4, and the length of the pseudo-posterior was Δ⁻¹ = 1618. The target was set to X* = 0.75.
Decentralized Collaborative 20 Questions
Motivation
Consider a collection of agents in a network with the objective of collectively localizing a target.
- What is the value of collaboration when there is no central authority?
- Does local in-network querying and processing lead to a global equilibrium? Is the limit deterministic or random? Is it unbiased?
Intractability of fully Bayesian methodology
- Limited observability (the observations of an agent are not observable by others) & lack of global knowledge of observation statistics.
- If agents have only partial information on the network structure and the probability distribution of the signals observed by other agents, the Bayesian approach becomes more complicated because agents would need to form and update beliefs on the states of the world, in addition to the network structure and the rest of the agents' signal structures.
- Even if the network structure is known, agents would still need to update beliefs on the information of every other agent in the network, given only the neighbors' beliefs at each iteration.
Prior Work on Distributed Averaging
Consensus, gossip algorithms, distributed averaging: messages distributed around the network through local processing.
- averaging under randomized gossip (Boyd et al. [2006])
- geographic gossip (Dimakis et al. [2006])
- randomized path averaging (Benezit et al. [2010])
- gossip algorithms for sensor networks (Dimakis et al. [2010])
- randomized gossip broadcast algorithms for consensus (Aysal et al. [2009])
- gossip distributed estimation for linear parameter estimation (Kar and Moura [2011])
- consensus for the wireless medium (Nokleby et al. [2013])

Applications: distributed optimization (Tsitsiklis [1984], Tsitsiklis et al. [1986]), load-balancing (Cybenko [1989]), distributed detection (Saligrama et al. [2006]).

Our work differs because we consider new information injected into the dynamical system described by averaging, and because we consider controlled observations.
Prior Work on Social Learning
Dynamic model of opinion formation.
- opinion formation model (DeGroot [1974])
- convergence of dynamics generated by a non-Bayesian decentralized estimation scheme (Jadbabaie et al. [2012])
- rate of convergence analysis (Molavi et al. [2013])

Our work differs because we consider a continuous-valued target space and controlled observations.
Prior Work on Computerized Adaptive Testing
Given the current estimate of proficiency, how should the next test item be chosen?
- dynamic selection of test items via item-response theory & a maximum information or maximum expected precision criterion (Wainer [2000], Owen [1975])

Our work differs because we consider continuous-valued query regions, impose no practical constraints, and use a different objective function.
Prior Work on Active Stochastic Search/20 Questions
Active querying for sequential estimation.
- single-player 20 questions for target localization (Jedynak et al. [2012])
- convergence rate for a discretized version of single-player 20 questions (Castro and Nowak [2007])
- convergence rate for the continuous-space single-player PBA (Waeber et al. [2013])
- (centralized) multi-player 20 questions for target localization (Tsiligkaridis et al. [2013b])

Our work differs because we consider intermediate local belief sharing between agents after each local bisection and Bayesian update (entropy is no longer monotonically decreasing for each agent!). Also, each agent incorporates the beliefs of its neighbors in a way that is agnostic of the neighbors' error probabilities.
Notation
- X* ∈ [0, 1] = true target location
- N = {1, ..., M} = agent set of the network
- G = (N, E) = directed graph capturing agent interactions
- N_i = {j ∈ N : (j, i) ∈ E} = local neighborhood of the ith agent
- p_{i,t}(·) = belief of the ith agent at time t
Decentralized Estimation
Algorithm 2 Decentralized Estimation Algorithm
1: Input: G = (N, E), A = {a_{i,j} : (i, j) ∈ N × N}, {ε_i : i ∈ N}
2: Output: {X_{i,t}, X̂_{i,t} : i ∈ N}
3: Initialize p_{i,0}(·) to be positive everywhere.
4: repeat
5:   For each agent i ∈ N:
6:   Bisect the posterior density at the median: $X_{i,t} = F_{i,t}^{-1}(1/2)$.
7:   Obtain the (noisy) binary response $y_{i,t+1} \in \{0, 1\}$.
8:   Belief update:

$$p_{i,t+1}(x) = a_{i,i}\,\frac{p_{i,t}(x)\, l_i(y_{i,t+1}|x, X_{i,t})}{Z_{i,t}(y_{i,t+1})} + \sum_{j\in N_i} a_{i,j}\, p_{j,t}(x), \quad x \in \mathcal{X} \tag{24}$$

where the observation p.m.f. is:

$$l_i(y|x, X_{i,t}) = f_1^{(i)}(y)\, I(x \le X_{i,t}) + f_0^{(i)}(y)\, I(x > X_{i,t}), \quad y \in \mathcal{Y} \tag{25}$$

and $f_1^{(i)}(y) = (1-\epsilon_i)^{I(y=1)}\epsilon_i^{I(y=0)}$, $f_0^{(i)}(y) = 1 - f_1^{(i)}(y)$.
9:   Calculate the target estimate: $\hat X_{i,t} = \int_{\mathcal{X}} x\, p_{i,t}(x)\, dx$.
10: until convergence
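A minimal sketch of Algorithm 2 on a discretized unit interval for a small network; the grid discretization and the example row-stochastic weight matrix A are illustrative assumptions:

```python
import numpy as np

def decentralized_step(P, A, eps, x, x_star, rng):
    """One pass of update (24) for all agents; P[i] is agent i's belief on x."""
    M, _ = P.shape
    P_new = np.zeros_like(P)
    for i in range(M):
        med = x[np.searchsorted(np.cumsum(P[i]), 0.5)]   # line 6: bisect at median
        y = (x_star <= med) ^ (rng.random() < eps[i])    # line 7: BSC response
        bayes = P[i] * np.where((x <= med) == y, 1 - eps[i], eps[i])
        bayes /= bayes.sum()                             # normalized Bayes term
        P_new[i] = A[i, i] * bayes + sum(A[i, j] * P[j]  # mix with neighbors
                                         for j in range(M) if j != i)
    return P_new

rng = np.random.default_rng(2)
M, grid, x_star = 3, 1000, 0.75
x = (np.arange(grid) + 0.5) / grid
P = np.full((M, grid), 1.0 / grid)
A = np.array([[0.8, 0.1, 0.1],   # strongly connected, positive self-reliances
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
eps = [0.1, 0.3, 0.4]
for _ in range(300):
    P = decentralized_step(P, A, eps, x, x_star, rng)
print(P @ x)   # each agent's posterior-mean estimate; all near x_star
```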
Assumptions
- (Conditional Independence) Assume conditional independence:

$$P(Y_{t+1} = y\,|\,\mathcal{F}_t) = \prod_{i=1}^{M} P(Y_{i,t+1} = y_i\,|\,\mathcal{F}_t) \tag{26}$$

and each player's response is governed by:

$$l_i(y_i\,|\,x, A_{i,t}) := P(Y_{i,t+1} = y_i\,|\,A_{i,t}, X^* = x) = \begin{cases} f_1^{(i)}(y_i), & x \in A_{i,t} \\ f_0^{(i)}(y_i), & x \notin A_{i,t} \end{cases} \tag{27}$$

- (Memoryless Binary Symmetric Channels) Model players' responses as independent BSCs with crossover probabilities $\epsilon_i \in (0, 1/2)$:

$$f_z^{(i)}(y_i) = \begin{cases} 1-\epsilon_i, & y_i = z \\ \epsilon_i, & y_i \neq z \end{cases} \quad \text{for } i = 1, \ldots, M,\ z = 0, 1.$$

- (Strong Connectivity & Positive Self-reliances) Assume that the network is strongly connected and all self-reliances $a_{i,i}$ are strictly positive.
Global Convergence Theory
Theorem (Asymptotic Agreement/Consensus). Consider Algorithm 2. Let B = [0, b] ∈ B([0, 1]). Then consensus of the agents' beliefs is asymptotically achieved across the network:

$$V_t(B) = \max_i P_{i,t}(B) - \min_i P_{i,t}(B) \xrightarrow{p} 0$$

as t → ∞.

Theorem (Convergence of Beliefs to a Deterministic Limit & Consistency). Consider Algorithm 2. Let B = [0, b] ∈ B([0, 1]). Then we have:

1. For each i ∈ N:

$$F_{i,t}(b) = P_{i,t}(B) \xrightarrow{p} F_\infty(b) = \begin{cases} 0, & b < X^* \\ 1, & b > X^* \end{cases}$$

2. For all i ∈ N:

$$\hat X_{i,t} := \int_0^1 x\, p_{i,t}(x)\, dx \xrightarrow{p} X^*$$
Simulation: Three network topologies
a) Fully connected graph. b) Cyclic graph. c) Star graph.
MSE Performance, εi = 0.4, ∀i
[Figure: MSE (log scale), δ probability mass, and entropy as a function of iteration for the three topologies (one row per topology). Legend: I: avg/min/max and A: avg/min/max over agents.]
MSE Performance, ε1 = 0.05, εi = 0.45, ∀i ≠ 1
[Figure: MSE (log scale), δ probability mass, and entropy as a function of iteration for the three topologies, with the centralized benchmark overlaid. Legend: I: avg/min/max, A: avg/min/max, Centralized.]
Main Contributions
1. Kronecker Graphical Lasso
- Sparse covariance estimation algorithm (KGlasso) introduced for the high-dimensional setting with Kronecker product structure.
- High-dimensional MSE convergence rate analysis.
- Analysis prescribes the selection of regularization parameters.
2. Covariance Estimation via Kronecker Product Expansions
- Scalable covariance estimation algorithm (PRLS) introduced for the high-dimensional setting.
- Tradeoff between approximation error and estimation error.
- High-dimensional MSE convergence rate analysis.
- Analysis prescribes the selection of the regularization parameter.
Main Contributions
3. Centralized Collaborative 20 Questions
- Introduced a model for centralized collaborative 20 questions.
- Characterized optimal policies & proved an equivalence theorem that simplifies policy implementation.
- Incorporated a human-in-the-loop by treating the human as a collaborative player.
- Linked information theoretic gains to MSE convergence rates.
4. Decentralized Collaborative 20 Questions
- Introduced a model for decentralized collaborative 20 questions.
- Proved consensus of agents' beliefs & global consistency of the decentralized estimation algorithm.
Thank you!
G. I. Allen and R. Tibshirani. Transposable regularized covariance models with an application to missing data imputation. The Annals of Applied Statistics, 4(2):764–790, 2010.
T. C. Aysal, M. E. Yildiz, A. D. Sarwate, and A. Scaglione. Broadcast gossip algorithms for consensus. IEEE Transactions on Signal Processing, 57(7), July 2009.
F. Benezit, A. Dimakis, P. Thiran, and M. Vetterli. Order-optimal consensus through randomized path averaging. IEEE Transactions on Information Theory, 56(10):5150–5167, October 2010.
P. Bickel and E. Levina. Covariance regularization by thresholding. Annals of Statistics, 36(6):2577–2604, 2008.
F. Bijma, J. de Munck, and R. Heethaar. The spatiotemporal MEG covariance matrix modeled as a sum of Kronecker products. NeuroImage, 27:402–415, 2005.
E. Bonilla, K. M. Chai, and C. Williams. Multi-task Gaussian process prediction. Advances in Neural Information Processing Systems, pages 153–160, 2008.
S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, June 2006.
M. V. Burnashev and K. Sh. Zigangirov. An interval estimation problem for controlled observations. Problems in Information Transmission, 10:223–231, 1974.
T. T. Cai, Z. Ren, and H. Zhou. Optimal rates of convergence for estimating Toeplitz covariance matrices. Probability Theory and Related Fields, March 2012.
R. Castro. Active Learning and Adaptive Sampling for Non-parametric Inference. PhD thesis, Rice University, August 2007.
R. Castro and R. Nowak. Active learning and sampling. In Foundations and Applications of Sensor Management. Springer, 2007.
N. Cressie. Statistics for Spatial Data. Wiley, New York, 1993.
G. Cybenko. Dynamic load balancing for distributed memory multiprocessors. Journal of Parallel and Distributed Computing, 7(2):279–301, 1989.
J. C. de Munck, H. M. Huizenga, L. J. Waldorp, and R. M. Heethaar. Estimating stationary dipoles from MEG/EEG data contaminated with spatially and temporally correlated background noise. IEEE Transactions on Signal Processing, 50(7), July 2002.
J. C. de Munck, F. Bijma, P. Gaura, C. A. Sieluzycki, M. I. Branco, and R. M. Heethaar. A maximum-likelihood estimator for trial-to-trial variations in noisy MEG/EEG data sets. IEEE Transactions on Biomedical Engineering, 51(12), 2004.
M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69:118–121, 1974.
A. Dimakis, A. Sarwate, and M. Wainwright. Geographic gossip: efficient averaging for sensor networks. IEEE Transactions on Signal Processing, 56(3):1205–1216, March 2006.
A. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11), November 2010.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
M. G. Genton. Separable approximations of space-time covariance matrices. Environmetrics, 18:681–695, 2007.
A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Press/Springer, 1992.
M. Horstein. Sequential transmission using noiseless feedback. IEEE Transactions on Information Theory, pages 136–143, July 1963.
C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. Advances in Neural Information Processing Systems, 24, 2011.
J. Huang, N. Liu, M. Pourahmadi, and L. Liu. Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 2006.
A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi. Non-Bayesian social learning. Games and Economic Behavior, 76:210–225, 2012.
B. Jedynak, P. I. Frazier, and R. Sznitman. Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability, 49:114–136, 2012.
S. C. Jun, S. M. Plis, D. M. Ranken, and D. M. Schmidt. Spatiotemporal noise covariance estimation from limited empirical magnetoencephalographic data. Physics in Medicine and Biology, 51:5549–5564, 2006.
E. Kalnay, M. Kanamitsu, R. Kistler, W. Collins, D. Deaven, L. Gandin, M. Iredell, S. Saha, G. White, J. Woollen, Y. Zhu, M. Chelliah, W. Ebisuzaki, W. Higgins, J. Janowiak, K. C. Mo, C. Ropelewski, J. Wang, A. Leetmaa, R. Reynolds, R. Jenne, and D. Joseph. The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77(3):437–471, 1996.
S. Kar and J. M. F. Moura. Convergence rate analysis of distributed gossip (linear parameter) estimation: fundamental limits and tradeoffs. IEEE Journal of Selected Topics in Signal Processing, 5(4), August 2011.
N. El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Annals of Statistics, 36(6):2757–2790, 2008.
R. M. Karp and R. Kleinberg. Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 881–890, 2007.
S. L. Lauritzen. Graphical Models. Oxford University Press US, first edition, 1996.
C. Van Loan and N. Pitsianis. Approximation with Kronecker products. In Linear Algebra for Large Scale and Real Time Applications, pages 293–314. Kluwer Publications, 1993.
P. Molavi, A. Jadbabaie, K. R. Rad, and A. Tahbaz-Salehi. Reaching consensus with increasing information. IEEE Journal of Selected Topics in Signal Processing, 7(2):358–369, April 2013.
M. Nokleby, W. Bajwa, R. Calderbank, and B. Aazhang. Toward resource-optimal consensus over the wireless medium. IEEE Journal of Selected Topics in Signal Processing, 7(2), April 2013.
R. J. Owen. A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70:351–356, 1975.
D. Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17:1617–1642, 2007.
N. R. Rao, J. Mingo, R. Speicher, and A. Edelman. Statistical eigen-inference from large Wishart matrices. Annals of Statistics, 36(6):2850–2885, 2008.
A. Rothman, P. Bickel, E. Levina, and J. Zhu. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515, 2008.
A. Rucci, S. Tebaldini, and F. Rocca. SKP-shrinkage estimator for SAR multi-baselines applications. In Proceedings of the IEEE Radar Conference, 2010.
V. Saligrama, M. Alanyali, and O. Savas. Distributed detection in sensor networks with packet loss and finite capacity links. IEEE Transactions on Signal Processing, 54(11):4118–4132, November 2006.
S. Tebaldini. Algebraic synthesis of forest scenarios from multibaseline PolInSAR data. IEEE Transactions on Geoscience and Remote Sensing, 47(12), December 2009.
T. Tsiligkaridis and A. O. Hero. Low separation rank covariance estimation using Kronecker product expansions. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), July 2013a.
T. Tsiligkaridis and A. O. Hero. Covariance estimation via Kronecker product expansions. arXiv:1302.2686, February 2013b.
T. Tsiligkaridis, A. O. Hero, and S. Zhou. Convergence properties of Kronecker graphical lasso algorithms. arXiv:1204.0585, July 2012.
T. Tsiligkaridis, A. O. Hero, and S. Zhou. On convergence of Kronecker graphical lasso algorithms. IEEE Transactions on Signal Processing, 61(7):1743–1755, April 2013a.
T. Tsiligkaridis, B. M. Sadler, and A. O. Hero. Collaborative 20 questions for target localization. Preprint, arXiv:1306.1922, August 2013b.
T. Tsiligkaridis, B. M. Sadler, and A. O. Hero. A collaborative 20 questions model for target search with human-machine interaction. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013c.
J. Tsitsiklis. Problems in decentralized decision making and computation. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, November 1984.
J. Tsitsiklis, D. Bertsekas, and M. Athans. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control, 31(9):803–812, September 1986.
R. Waeber, P. I. Frazier, and S. G. Henderson. Bisection search with noisy responses. SIAM Journal on Control and Optimization, 53(3):2261–2279, 2013.
H. Wainer. Computerized Adaptive Testing: A Primer. Routledge, 2nd edition, 2000.
K. Werner, M. Jansson, and P. Stoica. On estimation of covariance matrices with Kronecker product structure. IEEE Transactions on Signal Processing, 56(2), February 2008.
K. Werner and M. Jansson. Estimation of Kronecker structured channel covariances using training data. In Proceedings of EUSIPCO, 2007.
A. Wiesel, Y. Eldar, and A. O. Hero. Covariance estimation in decomposable Gaussian graphical models. IEEE Transactions on Signal Processing, 58(3):1482–1492, March 2010.
J. Yin and H. Li. Model selection and estimation in the matrix normal graphical model. Journal of Multivariate Analysis, 107:119–140, 2012.
K. Yu, J. Lafferty, S. Zhu, and Y. Gong. Large-scale collaborative prediction using a nonparametric random effects model. In ICML, pages 1185–1192, 2009.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94:19–35, 2007.
Y. Zhang and J. Schneider. Learning multiple tasks with a sparse matrix-normal penalty. Advances in Neural Information Processing Systems, 23:2550–2558, 2010.