Stein's Method for Matrix Concentration
Lester Mackey†
Collaborators: Michael I. Jordan†, Richard Y. Chen∗, Brendan Farrell∗, and Joel A. Tropp∗
†University of California, Berkeley; ∗California Institute of Technology
BEARS 2012
Mackey (UC Berkeley), Stein's Method, BEARS 2012
Motivation
Concentration Inequalities
Matrix concentration
$\mathbb{P}\{\|X - \mathbb{E}X\| \ge t\} \le \delta$
$\mathbb{P}\{\lambda_{\max}(X - \mathbb{E}X) \ge t\} \le \delta$
Non-asymptotic control of random matrices with complex distributions
Applications
Matrix estimation from sparse random measurements (Gross, 2011; Recht, 2009; Mackey, Talwalkar, and Jordan, 2011)
Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan, 2008; Hsu, Kakade, and Zhang, 2011b)
Convex relaxation of robust or chance-constrained optimization (Nemirovski, 2007; So, 2011; Cheung, So, and Wang, 2011)
Random graph analysis (Christofides and Markstrom, 2008; Oliveira, 2009)
Concentration Inequalities
Matrix concentration
$\mathbb{P}\{\lambda_{\max}(X - \mathbb{E}X) \ge t\} \le \delta$
Difficulty: Matrix multiplication is not commutative
Past approaches (Oliveira, 2009; Tropp, 2011; Hsu, Kakade, and Zhang, 2011a)
Deep results from matrix analysis
Sums of independent matrices and matrix martingales
This work
Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration
⇒ Improved exponential tail inequalities (Hoeffding, Bernstein)
⇒ Polynomial moment inequalities (Khintchine, Rosenthal)
⇒ Dependent sums and more general matrix functionals
Roadmap
1. Motivation
2. Stein's Method: Background and Notation
3. Exponential Tail Inequalities
4. Polynomial Moment Inequalities
5. Extensions
Background
Notation
Hermitian matrices: $\mathbb{H}^d = \{A \in \mathbb{C}^{d \times d} : A = A^*\}$
All matrices in this talk are Hermitian
Maximum eigenvalue: $\lambda_{\max}(\cdot)$
Trace: $\operatorname{tr} B$, the sum of the diagonal entries of $B$
Spectral norm: $\|B\|$, the maximum singular value of $B$
Schatten p-norm: $\|B\|_p := (\operatorname{tr} |B|^p)^{1/p}$ for $p \ge 1$
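The quantities in this notation slide can be computed numerically. The following is a minimal NumPy sketch; the helper names and the 2x2 example matrix are illustrative assumptions, not part of the talk.

```python
import numpy as np

def lambda_max(B):
    """Largest eigenvalue of a Hermitian matrix B."""
    return float(np.linalg.eigvalsh(B)[-1])  # eigvalsh returns ascending order

def spectral_norm(B):
    """Spectral norm: the largest singular value of B."""
    return float(np.linalg.norm(B, 2))

def schatten_norm(B, p):
    """Schatten p-norm (tr |B|^p)^(1/p), computed from the singular values."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

# Hypothetical example: a 2x2 Hermitian matrix with eigenvalues 1 and 3.
B = np.array([[2.0, 1.0j], [-1.0j, 2.0]])
print(lambda_max(B), np.trace(B).real, spectral_norm(B), schatten_norm(B, 2))
```

For a Hermitian matrix the singular values are the absolute values of the eigenvalues, which is why the Schatten norm can be read off from the SVD.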
Matrix Stein Pair
Definition (Exchangeable Pair)
$(Z, Z')$ is an exchangeable pair if $(Z, Z') \overset{d}{=} (Z', Z)$
Definition (Matrix Stein Pair)
Let $(Z, Z')$ be an auxiliary exchangeable pair, and let $\Psi : \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random matrices
$$X := \Psi(Z) \quad \text{and} \quad X' := \Psi(Z').$$
$(X, X')$ is a matrix Stein pair with scale factor $\alpha \in (0, 1]$ if
$$\mathbb{E}[X' \mid Z] = (1 - \alpha) X.$$
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
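One standard way to build a matrix Stein pair, sketched here as an assumption rather than as the construction used in the talk, is to take a Rademacher series X = sum_k eps_k A_k and form Z' by resampling one uniformly chosen sign. The code below averages X' exactly over the resampled index and the fresh sign, and compares the result with (1 - alpha) X for alpha = 1/n. The matrices A_k and all names are hypothetical.

```python
import numpy as np

def random_hermitian(rng, d):
    """Draw a random d x d Hermitian matrix (illustrative test data)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(0)
n, d = 5, 3
A = [random_hermitian(rng, d) for _ in range(n)]
eps = rng.choice([-1.0, 1.0], size=n)          # Z = (eps_1, ..., eps_n)
X = sum(e * Ak for e, Ak in zip(eps, A))       # X = Psi(Z)

# Z' replaces coordinate K (uniform on 0..n-1) by a fresh independent sign.
# Average X' = Psi(Z') exactly over the n * 2 equally likely outcomes.
cond_mean = np.zeros((d, d), dtype=complex)
for K in range(n):
    for new_sign in (-1.0, 1.0):
        X_prime = X + (new_sign - eps[K]) * A[K]
        cond_mean += X_prime / (2 * n)

alpha = 1.0 / n
print(np.allclose(cond_mean, (1 - alpha) * X))
```

The conditional mean shrinks X by the factor 1 - 1/n because each summand is replaced by a zero-mean copy with probability 1/n.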
The Conditional Variance
Definition (Conditional Variance)
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix
$$\Delta_X := \Delta_X(Z) := \frac{1}{2\alpha} \mathbb{E}\left[(X - X')^2 \mid Z\right].$$
$\Delta_X$ is a stochastic estimate for the variance $\mathbb{E} X^2$
Control over $\Delta_X$ yields control over $\lambda_{\max}(X)$
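The remark that the conditional variance estimates the variance can be checked in a few lines. This is a sketch that is not taken from the slides; it assumes $\mathbb{E} X^2$ is finite and uses only the Stein pair property $\mathbb{E}[X' \mid Z] = (1 - \alpha) X$ together with exchangeability, which gives $\mathbb{E} (X')^2 = \mathbb{E} X^2$.

```latex
\begin{aligned}
\mathbb{E}[X X'] &= \mathbb{E}\bigl[X \, \mathbb{E}[X' \mid Z]\bigr] = (1 - \alpha)\, \mathbb{E} X^2,
\qquad
\mathbb{E}[X' X] = (1 - \alpha)\, \mathbb{E} X^2, \\
\mathbb{E}\, \Delta_X &= \frac{1}{2\alpha}\, \mathbb{E} (X - X')^2
 = \frac{1}{2\alpha} \bigl( \mathbb{E} X^2 - \mathbb{E}[X X'] - \mathbb{E}[X' X] + \mathbb{E} (X')^2 \bigr) \\
&= \frac{1}{2\alpha} \bigl( 2\, \mathbb{E} X^2 - 2 (1 - \alpha)\, \mathbb{E} X^2 \bigr)
 = \mathbb{E} X^2 .
\end{aligned}
```

So the conditional variance is unbiased for the matrix variance, which is the sense in which it acts as a stochastic estimate.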
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that
$$\Delta_X \preccurlyeq cX + vI \quad \text{almost surely for } c, v \ge 0.$$
Then, for all $t \ge 0$,
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\left\{ \frac{-t^2}{2v + 2ct} \right\}.$$
Control over the conditional variance $\Delta_X$ yields
Gaussian tail for $\lambda_{\max}(X)$ for small $t$; Poisson tail for large $t$
When $d = 1$, reduces to the scalar result of Chatterjee (2007)
The dimensional factor $d$ cannot be removed
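To get a feel for the bound, it can be evaluated directly. The helper below is a small sketch with a hypothetical name; capping the value at 1 is an added convenience, since a probability bound above 1 carries no information.

```python
import math

def stein_tail_bound(t, d, c, v):
    """Evaluate d * exp(-t^2 / (2v + 2ct)), capped at 1.

    t: threshold (t >= 0); d: matrix dimension;
    c, v: constants in the assumed bound Delta_X <= c X + v I.
    """
    if t < 0 or d < 1 or c < 0 or v < 0 or (c == 0 and v == 0):
        raise ValueError("need t >= 0, d >= 1, and c, v >= 0 with c + v > 0")
    if t == 0:
        return 1.0  # the bound equals d >= 1 at t = 0, so it is trivial
    return min(1.0, d * math.exp(-t * t / (2.0 * v + 2.0 * c * t)))

# With c = 0 the bound is purely Gaussian: d * exp(-t^2 / (2v)).
print(stein_tail_bound(3.0, d=2, c=0.0, v=1.0))
# With c > 0 the exponent grows only linearly in t once ct dominates v.
print(stein_tail_bound(30.0, d=2, c=1.0, v=1.0))
```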
Application: Matrix Hoeffding Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying
$$\mathbb{E} Y_k = 0 \quad \text{and} \quad Y_k^2 \preccurlyeq A_k^2$$
for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter
$$\sigma^2 := \frac{1}{2} \left\| \sum\nolimits_k \left( A_k^2 + \mathbb{E} Y_k^2 \right) \right\|.$$
Then, for all $t \ge 0$,
$$\mathbb{P}\left\{ \lambda_{\max}\left( \sum\nolimits_k Y_k \right) \ge t \right\} \le d \cdot e^{-t^2 / 2\sigma^2}.$$
Improves upon the matrix Hoeffding inequality of Tropp (2011)
Optimal constant 1/2 in the exponent
Variance parameter $\sigma^2$ smaller than the bound $\left\| \sum_k A_k^2 \right\|$
Tighter than classical Hoeffding inequality (1963) when $d = 1$
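A quick Monte Carlo sanity check of the corollary, under the assumption that Y_k = eps_k A_k with independent Rademacher signs eps_k (so Y_k^2 = A_k^2 exactly and sigma^2 reduces to the spectral norm of sum_k A_k^2). The matrices, sample sizes, and threshold are arbitrary illustrative choices.

```python
import numpy as np

def random_hermitian(rng, d):
    """Draw a random d x d Hermitian matrix (illustrative test data)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(1)
n, d, trials = 20, 4, 2000
A = np.stack([random_hermitian(rng, d) for _ in range(n)])   # shape (n, d, d)

# For Y_k = eps_k A_k: Y_k^2 = A_k^2 = E Y_k^2, hence
# sigma^2 = (1/2) || sum_k (A_k^2 + E Y_k^2) || = || sum_k A_k^2 ||.
sigma2 = float(np.linalg.norm(sum(Ak @ Ak for Ak in A), 2))

eps = rng.choice([-1.0, 1.0], size=(trials, n))
X = np.einsum("tk,kij->tij", eps, A)        # one Rademacher series per trial
top = np.linalg.eigvalsh(X)[:, -1]          # lambda_max of each sample

t = 2.0 * np.sqrt(sigma2)
empirical = float(np.mean(top >= t))
bound = float(d * np.exp(-t ** 2 / (2.0 * sigma2)))
print(f"empirical tail {empirical:.4f} <= bound {bound:.4f}")
```

A simulation like this can only fail to contradict the inequality; it does not measure how tight the bound is in general.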
Exponential Concentration: Proof Sketch
1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate tail probability to the trace of the mgf of $X$
$$\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta)$$
where $m(\theta) := \mathbb{E} \operatorname{tr} e^{\theta X}$
How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem
Chatterjee's strategy for scalar concentration
Control mgf growth by bounding derivative
$$m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} \quad \text{for } \theta \in \mathbb{R}$$
Rewrite using exchangeable pairs
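For reference, the Laplace transform bound in step 1 follows from Markov's inequality and the spectral mapping theorem. The chain below is a standard argument written out as a sketch; it is not copied from the slides. It holds for every fixed $\theta > 0$, and taking the infimum over $\theta$ gives the stated bound.

```latex
\begin{aligned}
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
 &= \mathbb{P}\bigl\{ e^{\theta \lambda_{\max}(X)} \ge e^{\theta t} \bigr\}
 \le e^{-\theta t}\, \mathbb{E}\, e^{\theta \lambda_{\max}(X)} \\
 &= e^{-\theta t}\, \mathbb{E}\, \lambda_{\max}\bigl(e^{\theta X}\bigr)
 \le e^{-\theta t}\, \mathbb{E} \operatorname{tr} e^{\theta X}
 = e^{-\theta t} \cdot m(\theta).
\end{aligned}
```

The last inequality uses that $e^{\theta X}$ is positive definite, so its largest eigenvalue is at most the sum of all its eigenvalues. This step is also the source of the dimensional factor, since $m(0) = \operatorname{tr} I = d$.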
Method of Exchangeable Pairs
Lemma
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying
$$\mathbb{E} \|(X - X') F(X)\| < \infty.$$
Then
$$\mathbb{E}[X \, F(X)] = \frac{1}{2\alpha} \mathbb{E}[(X - X')(F(X) - F(X'))]. \tag{1}$$
Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$
Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
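The identity in the lemma can be derived in three lines. This is a sketch under the lemma's integrability assumption, using the Stein pair property $\mathbb{E}[X - X' \mid Z] = \alpha X$ and then exchangeability of $(X, X')$.

```latex
\begin{aligned}
\mathbb{E}[(X - X') F(X)]
 &= \mathbb{E}\bigl[ \mathbb{E}[X - X' \mid Z] \, F(X) \bigr]
 = \alpha\, \mathbb{E}[X F(X)], \\
\mathbb{E}[(X - X') F(X)]
 &= \mathbb{E}[(X' - X) F(X')]
 = -\,\mathbb{E}[(X - X') F(X')], \\
\alpha\, \mathbb{E}[X F(X)]
 &= \tfrac{1}{2}\, \mathbb{E}[(X - X') F(X)] - \tfrac{1}{2}\, \mathbb{E}[(X - X') F(X')]
 = \tfrac{1}{2}\, \mathbb{E}[(X - X')(F(X) - F(X'))].
\end{aligned}
```

The first line pulls out $F(X)$, which is a function of $Z$; the second swaps the roles of $X$ and $X'$; the third averages the two expressions. Dividing by $\alpha$ gives the stated identity.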
Exponential Concentration: Proof Sketch
2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
$$m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\left[ (X - X') \left( e^{\theta X} - e^{\theta X'} \right) \right]$$
Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative
Mean Value Trace Inequality
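Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that $g : \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h : \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$, it holds that
$$\operatorname{tr}[(g(A) - g(B)) \cdot (h(A) - h(B))] \le \frac{1}{2} \operatorname{tr}[(g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B))].$$
Standard matrix functions: If $g : \mathbb{R} \to \mathbb{R}$, then
$$g(A) := Q \begin{bmatrix} g(\lambda_1) & & \\ & \ddots & \\ & & g(\lambda_d) \end{bmatrix} Q^* \quad \text{when} \quad A = Q \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} Q^*$$
Inequality does not hold without the trace
For exponential concentration, we let $g(A) = A$ and $h(B) = e^{\theta B}$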
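The special case used for exponential concentration, g(s) = s and h(s) = exp(theta * s) with theta > 0, can be spot-checked numerically. The sketch below applies the standard matrix function via an eigendecomposition and evaluates both sides on random Hermitian pairs; the test matrices, theta = 0.3, and all helper names are illustrative assumptions.

```python
import numpy as np

def matrix_function(f, A):
    """Standard matrix function: apply f to the eigenvalues of Hermitian A."""
    lam, Q = np.linalg.eigh(A)
    return (Q * f(lam)) @ Q.conj().T     # Q diag(f(lam)) Q^*

def random_hermitian(rng, d):
    """Draw a random d x d Hermitian matrix (illustrative test data)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def trace_gap(A, B, theta):
    """Right side minus left side for g(s) = s, h(s) = exp(theta * s)."""
    expA = matrix_function(lambda s: np.exp(theta * s), A)
    expB = matrix_function(lambda s: np.exp(theta * s), B)
    D = A - B
    lhs = np.trace(D @ (expA - expB)).real
    # h'(s) = theta * exp(theta * s), which is convex when theta > 0.
    rhs = 0.5 * theta * np.trace(D @ D @ (expA + expB)).real
    return rhs - lhs

rng = np.random.default_rng(2)
gaps = [trace_gap(random_hermitian(rng, 4), random_hermitian(rng, 4), theta=0.3)
        for _ in range(200)]
print(min(gaps) >= -1e-8)   # the lemma predicts every gap is nonnegative
```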
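Exponential Concentration: Proof Sketch
3. Mean Value Trace Inequality
Bound the derivative of the trace mgf
$$\begin{aligned}
m'(\theta) &= \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\left[ (X - X') \left( e^{\theta X} - e^{\theta X'} \right) \right] \\
&\le \frac{\theta}{4\alpha} \mathbb{E} \operatorname{tr}\left[ (X - X')^2 \cdot \left( e^{\theta X} + e^{\theta X'} \right) \right] \\
&= \theta \cdot \mathbb{E} \operatorname{tr}\left[ \Delta_X \, e^{\theta X} \right]
\end{aligned}$$
4. Conditional Variance Bound: $\Delta_X \preccurlyeq cX + vI$
Yields differential inequality
$$m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta)$$
Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$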
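The final "solve" step can be made explicit. The following is a sketch of one way to carry it out, assuming $v > 0$ and $t > 0$ and reading $1/c$ as $+\infty$ when $c = 0$. It uses $m(0) = \operatorname{tr} I = d$ and the Laplace transform bound from step 1.

```latex
\begin{aligned}
(1 - c\theta)\, m'(\theta) \le v\theta\, m(\theta)
 \quad &\Longrightarrow \quad
 \frac{d}{d\theta} \log m(\theta) \le \frac{v\theta}{1 - c\theta}
 \qquad (0 \le \theta < 1/c), \\
\log m(\theta) &\le \log d + \int_0^{\theta} \frac{v s}{1 - c s}\, ds
 \le \log d + \frac{v \theta^2}{2 (1 - c\theta)}, \\
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
 &\le d \cdot \exp\Bigl\{ -\theta t + \frac{v \theta^2}{2 (1 - c\theta)} \Bigr\}
 = d \cdot \exp\Bigl\{ \frac{-t^2}{2v + 2ct} \Bigr\}
 \quad \text{at } \theta = \frac{t}{v + ct}.
\end{aligned}
```

With this choice of $\theta$ we have $1 - c\theta = v / (v + ct)$, so the two terms in the exponent are $-t^2/(v + ct)$ and $t^2 / (2(v + ct))$, which combine to the exponent in the theorem.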
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E} \|X\|_{2p}^{2p} < \infty$. Then
$$\left( \mathbb{E} \|X\|_{2p}^{2p} \right)^{1/2p} \le \sqrt{2p - 1} \cdot \left( \mathbb{E} \|\Delta_X\|_p^p \right)^{1/2p}.$$
Moral: The conditional variance controls the moments of $X$
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997); Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Application: Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,
$$\left( \mathbb{E} \left\| \sum\nolimits_k \varepsilon_k A_k \right\|_{2p}^{2p} \right)^{1/2p} \le \sqrt{2p - 1} \cdot \left\| \left( \sum\nolimits_k A_k^2 \right)^{1/2} \right\|_{2p}.$$
Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
e.g., used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
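For a small number of summands, the left side of the matrix Khintchine inequality can be computed exactly by enumerating all sign patterns. The sketch below does this for a few values of p; the matrices and helper names are illustrative assumptions. At p = 1 the two sides coincide, since both equal the square root of tr(sum_k A_k^2).

```python
import itertools
import numpy as np

def random_hermitian(rng, d):
    """Draw a random d x d Hermitian matrix (illustrative test data)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def schatten_norm(B, p):
    """Schatten p-norm (tr |B|^p)^(1/p), computed from the singular values."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def khintchine_sides(A, p):
    """Return (lhs, rhs) of the inequality; lhs by exact enumeration of signs."""
    n = len(A)
    moment = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        X = sum(e * Ak for e, Ak in zip(signs, A))
        moment += schatten_norm(X, 2 * p) ** (2 * p) / 2 ** n
    lhs = moment ** (1.0 / (2 * p))
    S = sum(Ak @ Ak for Ak in A)          # positive semidefinite
    # ||S^(1/2)||_{2p} = (tr S^p)^(1/(2p)) = ||S||_p^(1/2)
    rhs = float(np.sqrt(2 * p - 1) * np.sqrt(schatten_norm(S, p)))
    return lhs, rhs

rng = np.random.default_rng(3)
A = [random_hermitian(rng, 3) for _ in range(6)]
results = {p: khintchine_sides(A, p) for p in (1, 1.5, 2, 3)}
for p, (lhs, rhs) in results.items():
    print(p, lhs <= rhs + 1e-9)
```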
Extensions
Refined Exponential Concentration
Relate trace mgf of conditional variance to trace mgf of $X$
Yields matrix generalization of classical Bernstein inequality
Offers tool for unbounded random matrices
General Complex Matrices
Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation
$$\mathcal{D}(B) := \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$
Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$
Dependent Sequences
Sums of conditionally zero-mean random matrices
Combinatorial matrix statistics (e.g., sampling w/o replacement)
Matrix-valued functions satisfying a self-reproducing property
Yields a dependent bounded differences inequality for matrices
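The Hermitian dilation described under General Complex Matrices is easy to build and test numerically. This is a minimal sketch with a hypothetical function name; it checks on a random rectangular matrix that the dilation is Hermitian and that its largest eigenvalue equals the spectral norm of B.

```python
import numpy as np

def hermitian_dilation(B):
    """Return the (d1 + d2) x (d1 + d2) block matrix [[0, B], [B^*, 0]]."""
    B = np.atleast_2d(np.asarray(B, dtype=complex))
    d1, d2 = B.shape
    return np.block([[np.zeros((d1, d1)), B],
                     [B.conj().T, np.zeros((d2, d2))]])

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
D = hermitian_dilation(B)

print(np.allclose(D, D.conj().T))                          # Hermitian
print(np.linalg.eigvalsh(D)[-1], np.linalg.norm(B, 2))     # equal values
```

The nonzero eigenvalues of the dilation are plus and minus the singular values of B, which is why results stated for Hermitian matrices transfer to rectangular ones.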
The End
Thanks
References
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.
Christofides, D. and Markstrom, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 20 20
Motivation
Concentration Inequalities
Matrix concentration
PX minus EX ge t le δ
Pλmax(X minus EX) ge t le δ
Non-asymptotic control of random matrices with complexdistributions
Applications
Matrix estimation from sparse random measurements(Gross 2011 Recht 2009 Mackey Talwalkar and Jordan 2011)
Randomized matrix multiplication and factorization(Drineas Mahoney and Muthukrishnan 2008 Hsu Kakade and Zhang 2011b)
Convex relaxation of robust or chance-constrained optimization(Nemirovski 2007 So 2011 Cheung So and Wang 2011)
Random graph analysis (Christofides and Markstrom 2008 Oliveira 2009)
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 2 20
Motivation
Concentration Inequalities
Matrix concentration
Pλmax(X minus EX) ge t le δ
Difficulty Matrix multiplication is not commutative
Past approaches (Oliveira 2009 Tropp 2011 Hsu Kakade and Zhang 2011a)
Deep results from matrix analysis
Sums of independent matrices and matrix martingales
This work
Steinrsquos method of exchangeable pairs (1972) as advanced byChatterjee (2007) for scalar concentration
rArr Improved exponential tail inequalities (Hoeffding Bernstein)rArr Polynomial moment inequalities (Khintchine Rosenthal)rArr Dependent sums and more general matrix functionals
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 3 20
Motivation
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Extensions
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 4 20
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Schatten p-norm Bp =(
tr|B|p)1p
for p ge 1
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 5 20
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an auxiliary exchangeable pair and let Ψ Z rarr Hd
be a measurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 6 20
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Control over ∆X yields control over λmax(X)
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 7 20
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust2
2v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t Poisson tail for large t
When d = 1 reduces to scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 8 20
Exponential Tail Inequalities
Application Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Y 2
k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =1
2
∥
∥
∥
sum
k
(
A2
k + EY 2
k
)
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponentVariance parameter σ2 smaller than the bound
∥
∥
sum
k A2k
∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 9 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 10 20
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) lt infin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 11 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 12 20
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R then
g(A) = Q
g(λ1)
g(λd)
Qlowast when A = Q
λ1
λd
Qlowast
Inequality does not hold without the trace
For exponential concentration we let g(A) = A and h(B) = eθB
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 13 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
= θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound Pλmax(X) ge t
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 14 20
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere EX2p
2p lt infin Then(
EX2p2p
)12p leradic
2pminus 1 middot(
E∆Xpp)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 15 20
Polynomial Moment Inequalities
Application Matrix Khintchine Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (εk)kge1 be an independent sequence of Rademacher randomvariables and (Ak)kge1 be a deterministic sequence of Hermitianmatrices Then if p = 1 or p ge 15
(
E
∥
∥
∥
sum
kεkAk
∥
∥
∥
2p
2p
)12p
leradic
2pminus 1 middot∥
∥
∥
∥
(
sum
kA2
k
)12∥
∥
∥
∥
2p
Noncommutative Khintchine inequality (Lust-Piquard 1986 Lust-Piquard
and Pisier 1991) is a dominant tool in applied matrix analysis
eg Used in analysis of column sampling and projection forapproximate SVD (Rudelson and Vershynin 2007)
Steinrsquos method offers an unusually concise proof
The constantradic2pminus 1 is within
radice of optimal
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 16 20
Extensions
Extensions
Refined Exponential Concentration
Relate trace mgf of conditional variance to trace mgf of XYields matrix generalization of classical Bernstein inequalityOffers tool for unbounded random matrices
General Complex Matrices
Map any matrix B isin Cd1timesd2 to a Hermitian matrix via dilation
D(B) =
[
0 B
Blowast0
]
isin Hd1+d2
Preserves spectral information λmax(D(B)) = BDependent Sequences
Sums of conditionally zero-mean random matricesCombinatorial matrix statistics (eg sampling wo replacement)Matrix-valued functions satisfying a self-reproducing property
Yields a dependent bounded differences inequality for matricesMackey (UC Berkeley) Steinrsquos Method BEARS 2012 17 20
Extensions
The End
Thanks
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 18 20
Extensions
References IAhlswede R and Winter A Strong converse for identification via quantum channels IEEE Trans Inform Theory 48(3)
569ndash579 Mar 2002
Burkholder D L Distribution function inequalities for martingales Ann Probab 119ndash42 1973 doi101214aop1176997023
Chatterjee S Steinrsquos method for concentration inequalities Probab Theory Related Fields 138305ndash321 2007
Cheung S-S So A Man-Cho and Wang K Chance-constrained linear matrix inequalities with dependent perturbations Asafe tractable approximation approach Available at httpwwwsecuhkeduhk~manchosopaperscclmi_stapdf 2011
Christofides D and Markstrom K Expansion properties of random cayley graphs and vertex transitive graphs via matrixmartingales Random Struct Algorithms 32(1)88ndash100 2008
Drineas P Mahoney M W and Muthukrishnan S Relative-error CUR matrix decompositions SIAM Journal on Matrix
Analysis and Applications 30844ndash881 2008
Gross D Recovering low-rank matrices from few coefficients in any basis IEEE Trans Inform Theory 57(3)1548ndash1566 Mar2011
Hoeffding W Probability inequalities for sums of bounded random variables Journal of the American Statistical Association58(301)13ndash30 1963
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matrices Available atarXiv11041672 2011a
Hsu D Kakade S M and Zhang T Dimension-free tail inequalities for sums of random matricesarXiv11041672v3[mathPR] 2011b
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities Ann Probab 31(2)948ndash995 2003
Junge M and Xu Q Noncommutative BurkholderRosenthal inequalities II Applications Israel J Math 167227ndash282 2008
Lust-Piquard F Inegalites de Khintchine dans Cp (1 lt p lt infin) C R Math Acad Sci Paris 303(7)289ndash292 1986
Lust-Piquard F and Pisier G Noncommutative Khintchine and Paley inequalities Ark Mat 29(2)241ndash260 1991
Mackey L Talwalkar A and Jordan M I Divide-and-conquer matrix factorization In Advances in Neural Information
Processing Systems 24 2011
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 19 20
Extensions
References II
Mackey L Jordan M I Chen R Y Farrell B and Tropp J A Matrix concentration inequalities via the method ofexchangeable pairs Available at arXiv Jan 2012
Nemirovski A Sums of random symmetric matrices and quadratic optimization under orthogonality constraints Math
Program 109283ndash317 January 2007 ISSN 0025-5610 doi 101007s10107-006-0033-0 URLhttpdlacmorgcitationcfmid=12297161229726
Oliveira R I Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges Availableat arXiv09110600 Nov 2009
Pisier G and Xu Q Non-commutative martingale inequalities Comm Math Phys 189(3)667ndash698 1997
Recht B A simpler approach to matrix completion arXiv09100651v2[csIT] 2009
Rudelson M and Vershynin R Sampling from large matrices An approach through geometric functional analysis J Assoc
Comput Mach 54(4)Article 21 19 pp Jul 2007 (electronic)
So A Man-Cho Moment inequalities for sums of random matrices and their applications in optimization Math Program 130(1)125ndash151 2011
Stein C A bound for the error in the normal approximation to the distribution of a sum of dependent random variables InProc 6th Berkeley Symp Math Statist Probab Berkeley 1972 Univ California Press
Tropp J A User-friendly tail bounds for sums of random matrices Found Comput Math August 2011
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 20 20
Motivation
Concentration Inequalities
Matrix concentration
Pλmax(X minus EX) ge t le δ
Difficulty Matrix multiplication is not commutative
Past approaches (Oliveira 2009 Tropp 2011 Hsu Kakade and Zhang 2011a)
Deep results from matrix analysis
Sums of independent matrices and matrix martingales
This work
Steinrsquos method of exchangeable pairs (1972) as advanced byChatterjee (2007) for scalar concentration
rArr Improved exponential tail inequalities (Hoeffding Bernstein)rArr Polynomial moment inequalities (Khintchine Rosenthal)rArr Dependent sums and more general matrix functionals
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 3 20
Motivation
Roadmap
1 Motivation
2 Steinrsquos Method Background and Notation
3 Exponential Tail Inequalities
4 Polynomial Moment Inequalities
5 Extensions
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 4 20
Background
Notation
Hermitian matrices Hd = A isin Cdtimesd A = AlowastAll matrices in this talk are Hermitian
Maximum eigenvalue λmax(middot)Trace trB the sum of the diagonal entries of B
Spectral norm B the maximum singular value of B
Schatten p-norm Bp =(
tr|B|p)1p
for p ge 1
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 5 20
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an auxiliary exchangeable pair and let Ψ Z rarr Hd
be a measurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 6 20
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Control over ∆X yields control over λmax(X)
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 7 20
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust2
2v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t Poisson tail for large t
When d = 1 reduces to scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 8 20
Exponential Tail Inequalities
Application Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Y 2
k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =1
2
∥
∥
∥
sum
k
(
A2
k + EY 2
k
)
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponentVariance parameter σ2 smaller than the bound
∥
∥
sum
k A2k
∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 9 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 10 20
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) lt infin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 11 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 12 20
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R then
g(A) = Q
g(λ1)
g(λd)
Qlowast when A = Q
λ1
λd
Qlowast
Inequality does not hold without the trace
For exponential concentration we let g(A) = A and h(B) = eθB
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 13 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
= θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound Pλmax(X) ge t
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 14 20
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere EX2p
2p lt infin Then(
EX2p2p
)12p leradic
2pminus 1 middot(
E∆Xpp)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 15 20
Polynomial Moment Inequalities
Application: Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then if $p = 1$ or $p \ge 1.5$,
\[
\Big(\mathbb{E}\Big\|\sum_k \varepsilon_k A_k\Big\|_{2p}^{2p}\Big)^{1/2p} \le \sqrt{2p - 1} \cdot \Big\|\Big(\sum_k A_k^2\Big)^{1/2}\Big\|_{2p}
\]
Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
e.g., used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
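For a small number of matrices, both sides of the matrix Khintchine inequality can be computed exactly by enumerating all sign patterns. The sketch below is illustrative only: the function names, the random test matrices, and the sizes (five real symmetric $3 \times 3$ matrices) are assumptions, not part of the talk.

```python
import itertools
import numpy as np

def schatten_norm(B, p):
    """Schatten p-norm: the l_p norm of the singular values of B."""
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def khintchine_sides(A_list, p):
    """Return the exact left and right sides of the matrix Khintchine inequality."""
    n = len(A_list)
    # Average ||sum_k eps_k A_k||_{2p}^{2p} over all 2^n equally likely sign patterns.
    moment = np.mean([
        schatten_norm(sum(e * A for e, A in zip(signs, A_list)), 2 * p) ** (2 * p)
        for signs in itertools.product([-1.0, 1.0], repeat=n)
    ])
    lhs = moment ** (1.0 / (2 * p))
    # The singular values of (sum_k A_k^2)^{1/2} are the square roots of the
    # eigenvalues of the positive-semidefinite matrix sum_k A_k^2.
    eigs = np.clip(np.linalg.eigvalsh(sum(A @ A for A in A_list)), 0.0, None)
    rhs = np.sqrt(2 * p - 1) * float(np.sum(eigs ** p) ** (1.0 / (2 * p)))
    return lhs, rhs

rng = np.random.default_rng(1)
A_list = [(G + G.T) / 2 for G in rng.standard_normal((5, 3, 3))]  # real symmetric
lhs, rhs = khintchine_sides(A_list, p=2)
```

For $p = 1$ the two sides coincide, because the cross terms $\varepsilon_j \varepsilon_k$ average to zero and $\sqrt{2p - 1} = 1$.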
Extensions
Extensions
Refined Exponential Concentration
Relate trace mgf of conditional variance to trace mgf of $X$
Yields matrix generalization of classical Bernstein inequality
Offers tool for unbounded random matrices
General Complex Matrices
Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation
\[
\mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}
\]
Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$
Dependent Sequences
Sums of conditionally zero-mean random matrices
Combinatorial matrix statistics (e.g., sampling w/o replacement)
Matrix-valued functions satisfying a self-reproducing property
Yields a dependent bounded differences inequality for matrices
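The Hermitian dilation from the General Complex Matrices item is easy to build and check numerically. This is a minimal sketch; the function name `hermitian_dilation` and the random $3 \times 5$ test matrix are assumptions made for illustration.

```python
import numpy as np

def hermitian_dilation(B):
    """Return D(B) = [[0, B], [B*, 0]] for a d1 x d2 matrix B."""
    d1, d2 = B.shape
    return np.block([
        [np.zeros((d1, d1), dtype=B.dtype), B],
        [B.conj().T, np.zeros((d2, d2), dtype=B.dtype)],
    ])

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
D = hermitian_dilation(B)
lam_max = np.linalg.eigvalsh(D)[-1]   # eigvalsh returns eigenvalues in ascending order
spectral_norm = np.linalg.norm(B, 2)  # largest singular value of B
```

Here `lam_max` and `spectral_norm` should agree up to rounding, so a tail bound for $\lambda_{\max}$ of the dilation controls the spectral norm of a rectangular matrix.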
Extensions
The End
Thanks
Extensions
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.
Christofides, D. and Markstrom, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.
Extensions
References II
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Background
Notation
Hermitian matrices: $\mathbb{H}^d = \{A \in \mathbb{C}^{d \times d} : A = A^*\}$
All matrices in this talk are Hermitian
Maximum eigenvalue: $\lambda_{\max}(\cdot)$
Trace: $\operatorname{tr} B$, the sum of the diagonal entries of $B$
Spectral norm: $\|B\|$, the maximum singular value of $B$
Schatten p-norm: $\|B\|_p = \big(\operatorname{tr}|B|^p\big)^{1/p}$ for $p \ge 1$
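As a concrete reading of the notation, the Schatten p-norm is the $\ell_p$ norm of the singular values, since $\operatorname{tr}|B|^p$ sums their $p$-th powers. The sketch below assumes NumPy; the function name `schatten_norm` and the $2 \times 2$ example matrix are illustrative choices.

```python
import numpy as np

def schatten_norm(B, p):
    """Schatten p-norm (p >= 1): (sum of singular values ** p) ** (1 / p)."""
    if p < 1:
        raise ValueError("the Schatten p-norm is defined here for p >= 1")
    s = np.linalg.svd(B, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

B = np.array([[2.0, 1.0], [1.0, -3.0]])
nuclear = schatten_norm(B, 1)        # p = 1: sum of singular values
frobenius = schatten_norm(B, 2)      # p = 2: Frobenius norm
near_spectral = schatten_norm(B, 200)  # large p approaches the spectral norm
```

The cases $p = 1$ and $p = 2$ coincide with the nuclear and Frobenius norms, and the value approaches the spectral norm $\|B\|$ as $p$ grows.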
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
$(Z, Z')$ is an exchangeable pair if $(Z, Z') \overset{d}{=} (Z', Z)$
Definition (Matrix Stein Pair)
Let $(Z, Z')$ be an auxiliary exchangeable pair, and let $\Psi : \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random matrices
\[
X = \Psi(Z) \quad \text{and} \quad X' = \Psi(Z')
\]
$(X, X')$ is a matrix Stein pair with scale factor $\alpha \in (0, 1]$ if
\[
\mathbb{E}[X' \mid Z] = (1 - \alpha) X
\]
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix
\[
\Delta_X = \Delta_X(Z) = \frac{1}{2\alpha} \mathbb{E}\big[(X - X')^2 \mid Z\big]
\]
$\Delta_X$ is a stochastic estimate for the variance $\mathbb{E} X^2$
Control over $\Delta_X$ yields control over $\lambda_{\max}(X)$
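A concrete example may help. Take the Rademacher series $X = \sum_k \varepsilon_k A_k$ and form $X'$ by resampling one uniformly chosen sign; this construction, the scale factor $\alpha = 1/n$, and all names in the sketch are assumptions chosen for illustration. The code averages exactly over the resampled index and the fresh sign to evaluate $\mathbb{E}[X' \mid Z]$ and $\Delta_X$.

```python
import numpy as np

def stein_pair_moments(signs, A_list):
    """For X = sum_k signs[k] * A_k, average exactly over the resampled index K
    and the fresh sign to obtain E[X' | Z] and E[(X - X')^2 | Z]."""
    n = len(A_list)
    X = sum(e * A for e, A in zip(signs, A_list))
    mean_X_prime = np.zeros_like(X)
    mean_sq_diff = np.zeros_like(X)
    for k in range(n):
        for fresh in (-1.0, 1.0):
            # X' replaces the k-th sign by an independent copy.
            X_prime = X + (fresh - signs[k]) * A_list[k]
            weight = 1.0 / (2 * n)  # P(K = k) * P(fresh sign)
            mean_X_prime += weight * X_prime
            mean_sq_diff += weight * (X - X_prime) @ (X - X_prime)
    return X, mean_X_prime, mean_sq_diff

rng = np.random.default_rng(3)
n, d = 4, 3
A_list = [(G + G.T) / 2 for G in rng.standard_normal((n, d, d))]  # real symmetric
signs = rng.choice([-1.0, 1.0], size=n)
alpha = 1.0 / n
X, mean_X_prime, mean_sq_diff = stein_pair_moments(signs, A_list)
Delta_X = mean_sq_diff / (2 * alpha)
```

In this example `mean_X_prime` equals $(1 - \alpha) X$ and `Delta_X` equals the deterministic matrix $\sum_k A_k^2$, so the conditional variance is controlled with $c = 0$ and $v = \|\sum_k A_k^2\|$.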
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that
\[
\Delta_X \preceq cX + vI \quad \text{almost surely for } c, v \ge 0
\]
Then, for all $t \ge 0$,
\[
\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\left\{\frac{-t^2}{2v + 2ct}\right\}
\]
Control over the conditional variance $\Delta_X$ yields
Gaussian tail for $\lambda_{\max}(X)$ for small $t$; Poisson tail for large $t$
When $d = 1$, reduces to scalar result of Chatterjee (2007)
The dimensional factor $d$ cannot be removed
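The tail bound is simple to evaluate. The helper below is a sketch with an assumed name and signature; capping the value at 1 is a convenience added here (any probability is at most 1) and is not part of the theorem. The requirement $v > 0$ is also an assumption, made only to avoid a zero denominator at $t = 0$.

```python
import math

def stein_tail_bound(t, d, c, v):
    """Evaluate d * exp(-t^2 / (2v + 2ct)), capped at 1."""
    if t < 0 or d < 1 or c < 0 or v <= 0:
        raise ValueError("need t >= 0, d >= 1, c >= 0, v > 0")
    return min(1.0, d * math.exp(-t * t / (2 * v + 2 * c * t)))

# Small t: the exponent behaves like -t^2 / (2v), a Gaussian-type tail.
small_t = stein_tail_bound(t=2.0, d=1, c=0.0, v=1.0)
# Large t with c > 0: the exponent behaves like -t / (2c).
large_t = stein_tail_bound(t=100.0, d=1, c=1.0, v=1.0)
```

With $c = 0$ the bound is purely Gaussian in $t$; a positive $c$ slows the decay for large $t$.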
Exponential Tail Inequalities
Application: Matrix Hoeffding Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying
\[
\mathbb{E} Y_k = 0 \quad \text{and} \quad Y_k^2 \preceq A_k^2
\]
for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter
\[
\sigma^2 = \frac{1}{2} \Big\|\sum_k \big(A_k^2 + \mathbb{E} Y_k^2\big)\Big\|
\]
Then, for all $t \ge 0$,
\[
\mathbb{P}\Big\{\lambda_{\max}\Big(\sum_k Y_k\Big) \ge t\Big\} \le d \cdot e^{-t^2 / 2\sigma^2}
\]
Improves upon the matrix Hoeffding inequality of Tropp (2011)
Optimal constant 1/2 in the exponent
Variance parameter $\sigma^2$ smaller than the bound $\big\|\sum_k A_k^2\big\|$
Tighter than classical Hoeffding inequality (1963) when $d = 1$
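The corollary can be checked exactly on a small instance. The sketch below takes $Y_k = \varepsilon_k A_k$ with Rademacher signs, so that $\mathbb{E} Y_k = 0$ and $Y_k^2 = A_k^2$; this choice, the function name `hoeffding_check`, and the test sizes are assumptions made for illustration. The tail probability is computed by enumerating all sign patterns rather than by simulation.

```python
import itertools
import math
import numpy as np

def hoeffding_check(A_list, t):
    """Exact P{lambda_max(sum_k eps_k A_k) >= t} and the bound d * exp(-t^2 / (2 sigma^2))."""
    d = A_list[0].shape[0]
    sum_sq = sum(A @ A for A in A_list)
    # E Y_k^2 = A_k^2 for Y_k = eps_k A_k, so sum_k (A_k^2 + E Y_k^2) = 2 * sum_sq.
    sigma2 = 0.5 * np.linalg.norm(2 * sum_sq, 2)
    hits = [
        np.linalg.eigvalsh(sum(e * A for e, A in zip(signs, A_list)))[-1] >= t
        for signs in itertools.product([-1.0, 1.0], repeat=len(A_list))
    ]
    return float(np.mean(hits)), d * math.exp(-t * t / (2 * sigma2))

rng = np.random.default_rng(4)
A_list = [(G + G.T) / 2 for G in rng.standard_normal((6, 3, 3))]  # real symmetric
results = [hoeffding_check(A_list, t) for t in (0.0, 1.0, 2.0, 4.0, 8.0)]
```

Each pair in `results` holds the exact tail probability followed by the bound, and the first entry should never exceed the second.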
Exponential Tail Inequalities
Exponential Concentration: Proof Sketch
1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate tail probability to the trace of the mgf of $X$
\[
\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta)
\]
where $m(\theta) = \mathbb{E} \operatorname{tr} e^{\theta X}$
How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem
Chatterjee's strategy for scalar concentration
Control mgf growth by bounding derivative
\[
m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} \quad \text{for } \theta \in \mathbb{R}
\]
Rewrite using exchangeable pairs
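For reference, the Laplace transform bound in step 1 follows from a short chain of standard facts; the derivation below is a sketch of the usual argument rather than a quotation from the talk. For any $\theta > 0$,
\begin{align*}
\mathbb{P}\{\lambda_{\max}(X) \ge t\}
&= \mathbb{P}\{e^{\theta \lambda_{\max}(X)} \ge e^{\theta t}\} \\
&\le e^{-\theta t} \cdot \mathbb{E}\, e^{\theta \lambda_{\max}(X)} && \text{(Markov's inequality)} \\
&= e^{-\theta t} \cdot \mathbb{E}\, \lambda_{\max}(e^{\theta X}) && \text{(spectral mapping)} \\
&\le e^{-\theta t} \cdot \mathbb{E} \operatorname{tr} e^{\theta X} && \text{(} e^{\theta X} \text{ is positive definite)} \\
&= e^{-\theta t} \cdot m(\theta).
\end{align*}
Taking the infimum over $\theta > 0$ gives the stated bound. The trace step is where the dimensional factor enters, since $m(0) = \operatorname{tr} I = d$.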
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying
\[
\mathbb{E}\|(X - X') F(X)\| < \infty
\]
Then
\[
\mathbb{E}[X F(X)] = \frac{1}{2\alpha} \mathbb{E}\big[(X - X')(F(X) - F(X'))\big] \tag{1}
\]
Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$
Eq. 1 allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
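A sketch of why Eq. 1 holds, assuming the integrability condition of the lemma so that every expectation below is finite. Since $X = \Psi(Z)$ is determined by $Z$, conditioning on $Z$ and using the Stein pair identity $\mathbb{E}[X' \mid Z] = (1 - \alpha) X$ gives
\[
\mathbb{E}[(X - X') F(X)] = \mathbb{E}\big[\mathbb{E}[X - X' \mid Z] \, F(X)\big] = \alpha \, \mathbb{E}[X F(X)].
\]
Exchangeability of $(X, X')$ allows the two matrices to be swapped inside the expectation:
\[
\mathbb{E}[(X - X') F(X)] = \mathbb{E}[(X' - X) F(X')] = -\mathbb{E}[(X - X') F(X')].
\]
Averaging the two expressions for the same quantity yields
\[
\alpha \, \mathbb{E}[X F(X)] = \frac{1}{2} \mathbb{E}\big[(X - X')(F(X) - F(X'))\big],
\]
and dividing by $\alpha$ gives Eq. 1.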
Exponential Tail Inequalities
Exponential Concentration: Proof Sketch
2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
\[
m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} = \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\big[(X - X')(e^{\theta X} - e^{\theta X'})\big]
\]
Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative
Background
Matrix Stein Pair
Definition (Exchangeable Pair)
(ZZ prime) is an exchangeable pair if (ZZ prime)d= (Z prime Z)
Definition (Matrix Stein Pair)
Let (ZZ prime) be an auxiliary exchangeable pair and let Ψ Z rarr Hd
be a measurable function Define the random matrices
X = Ψ(Z) and X prime = Ψ(Z prime)
(XX prime) is a matrix Stein pair with scale factor α isin (0 1] if
E[X prime |Z] = (1minus α)X
Matrix Stein pairs are exchangeable pairs
Matrix Stein pairs always have zero mean
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 6 20
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that (XX prime) is a matrix Stein pair with scale factor αconstructed from the exchangeable pair (ZZ prime) The conditional
variance is the random matrix
∆X = ∆X(Z) =1
2αE[
(X minusX prime)2 |Z]
∆X is a stochastic estimate for the variance EX2
Control over ∆X yields control over λmax(X)
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 7 20
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let (XX prime) be a matrix Stein pair with X isin Hd Suppose that
∆X 4 cX + v I almost surely for c v ge 0
Then for all t ge 0
Pλmax(X) ge t le d middot exp minust2
2v + 2ct
Control over the conditional variance ∆X yields
Gaussian tail for λmax(X) for small t Poisson tail for large t
When d = 1 reduces to scalar result of Chatterjee (2007)
The dimensional factor d cannot be removed
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 8 20
Exponential Tail Inequalities
Application Matrix Hoeffding Inequality
Corollary (Mackey Jordan Chen Farrell and Tropp 2012)
Let (Yk)kge1 be independent matrices in Hd satisfying
EYk = 0 and Y 2
k 4 A2
k
for deterministic matrices (Ak)kge1 Define the variance parameter
σ2 =1
2
∥
∥
∥
sum
k
(
A2
k + EY 2
k
)
∥
∥
∥
Then for all t ge 0
P
λmax
(
sum
kYk
)
ge t
le d middot eminust22σ2
Improves upon the matrix Hoeffding inequality of Tropp (2011)Optimal constant 12 in the exponentVariance parameter σ2 smaller than the bound
∥
∥
sum
k A2k
∥
∥
Tighter than classical Hoeffding inequality (1963) when d = 1
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 9 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
1 Matrix Laplace transform method (Ahlswede amp Winter 2002)
Relate tail probability to the trace of the mgf of X
Pλmax(X) ge t le infθgt0
eminusθt middotm(θ)
where m(θ) = E tr eθX
How to bound the trace mgf
Past approaches Golden-Thompson Liebrsquos concavity theorem
Chatterjeersquos strategy for scalar concentration
Control mgf growth by bounding derivative
mprime(θ) = E trXeθX for θ isin R
Rewrite using exchangeable pairs
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 10 20
Exponential Tail Inequalities
Method of Exchangeable Pairs
Lemma
Suppose that (XX prime) is a matrix Stein pair with scale factor α LetF Hd rarr Hd be a measurable function satisfying
E(X minusX prime)F (X) lt infin
Then
E[X F (X)] =1
2αE[(X minusX prime)(F (X)minus F (X prime))] (1)
Intuition
Can characterize the distribution of a random matrix byintegrating it against a class of test functions F
Eq 1 allows us to estimate this integral using the smoothnessproperties of F and the discrepancy X minusX prime
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 11 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
2 Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf
mprime(θ) = E trXeθX =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
Goal Use the smoothness of F (X) = eθX to bound the derivative
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 12 20
Exponential Tail Inequalities
Mean Value Trace Inequality
Lemma (Mackey Jordan Chen Farrell and Tropp 2012)
Suppose that g R rarr R is a weakly increasing function and thath R rarr R is a function whose derivative hprime is convex For allmatrices AB isin Hd it holds that
tr[(g(A)minus g(B)) middot (h(A)minus h(B))] le1
2tr[(g(A)minus g(B)) middot (AminusB) middot (hprime(A) + hprime(B))]
Standard matrix functions If g R rarr R then
g(A) = Q
g(λ1)
g(λd)
Qlowast when A = Q
λ1
λd
Qlowast
Inequality does not hold without the trace
For exponential concentration we let g(A) = A and h(B) = eθB
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 13 20
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
= θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound Pλmax(X) ge t
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 14 20
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey Jordan Chen Farrell and Tropp 2012)
Let p = 1 or p ge 15 Suppose that (XX prime) is a matrix Stein pairwhere EX2p
2p lt infin Then(
EX2p2p
)12p leradic
2pminus 1 middot(
E∆Xpp)12p
Moral The conditional variance controls the moments of X
Generalizes Chatterjeersquos version (2007) of the scalarBurkholder-Davis-Gundy inequality (Burkholder 1973)
See also Pisier amp Xu (1997) Junge amp Xu (2003 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite dimensional Schatten-class operators
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 15 20
Polynomial Moment Inequalities
Application: Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then, if $p = 1$ or $p \ge 1.5$,
\[
\Big(\mathbb{E}\Big\|\sum\nolimits_k \varepsilon_k A_k\Big\|_{2p}^{2p}\Big)^{1/2p}
\le \sqrt{2p - 1} \cdot \Big\|\Big(\sum\nolimits_k A_k^2\Big)^{1/2}\Big\|_{2p}.
\]
Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis.
  e.g., used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof.
The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal.
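For a small number of matrices, both sides of the matrix Khintchine inequality can be computed exactly by enumerating all sign patterns. The sketch below assumes that $\|\cdot\|_q$ denotes the Schatten $q$-norm (the $\ell_q$ norm of the eigenvalues for a Hermitian matrix); the function names and the random symmetric test matrices are illustrative choices, not part of the talk.

```python
import itertools
import numpy as np

def schatten_norm(A, q):
    """Schatten q-norm of a Hermitian matrix: l_q norm of its eigenvalues."""
    w = np.linalg.eigvalsh(A)
    return float(np.sum(np.abs(w) ** q) ** (1.0 / q))

def khintchine_sides(As, p):
    """Return (lhs, rhs) of the matrix Khintchine inequality, computed exactly."""
    moments = []
    for signs in itertools.product([-1.0, 1.0], repeat=len(As)):
        X = sum(s * A for s, A in zip(signs, As))
        moments.append(schatten_norm(X, 2 * p) ** (2 * p))
    lhs = float(np.mean(moments)) ** (1.0 / (2 * p))
    S = sum(A @ A for A in As)
    # S is positive semidefinite, so ||S^{1/2}||_{2p} = ||S||_p^{1/2}.
    rhs = float(np.sqrt(2 * p - 1)) * schatten_norm(S, p) ** 0.5
    return lhs, rhs

rng = np.random.default_rng(0)
As = []
for _ in range(5):
    G = rng.standard_normal((3, 3))
    As.append((G + G.T) / 2)   # real symmetric, hence Hermitian

sides = {p: khintchine_sides(As, p) for p in (1, 1.5, 2, 3)}
for p, (lhs, rhs) in sides.items():
    print(f"p = {p}: lhs = {lhs:.4f}, rhs = {rhs:.4f}")
```

For $p = 1$ the two sides agree up to rounding, since $\mathbb{E} \operatorname{tr}(\sum_k \varepsilon_k A_k)^2 = \operatorname{tr} \sum_k A_k^2$ and $\sqrt{2p - 1} = 1$.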
Extensions
Refined Exponential Concentration
  Relate trace mgf of conditional variance to trace mgf of $X$
  Yields matrix generalization of classical Bernstein inequality
  Offers tool for unbounded random matrices
General Complex Matrices
  Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation
  \[
  \mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}
  \]
  Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$
Dependent Sequences
  Sums of conditionally zero-mean random matrices
  Combinatorial matrix statistics (e.g., sampling w/o replacement)
  Matrix-valued functions satisfying a self-reproducing property
  Yields a dependent bounded differences inequality for matrices
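The Hermitian dilation under "General Complex Matrices" is easy to check numerically. The sketch below builds the dilation of a random rectangular complex matrix and compares its largest eigenvalue with the spectral norm of the original matrix; the function name `dilation` and the test dimensions are assumptions for this example.

```python
import numpy as np

def dilation(B):
    """Hermitian dilation [[0, B], [B*, 0]] of a d1 x d2 complex matrix."""
    d1, d2 = B.shape
    top = np.hstack([np.zeros((d1, d1), dtype=complex), B])
    bottom = np.hstack([B.conj().T, np.zeros((d2, d2), dtype=complex)])
    return np.vstack([top, bottom])

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
D = dilation(B)

lam_max = np.linalg.eigvalsh(D)[-1]   # eigvalsh returns eigenvalues in ascending order
spec_norm = np.linalg.norm(B, 2)      # largest singular value of B
print(lam_max, spec_norm)
```

Because the eigenvalues of the dilation are plus and minus the singular values of $B$ (padded with zeros), results stated for the maximum eigenvalue of Hermitian matrices transfer to the spectral norm of rectangular matrices.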
The End
Thanks
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.
References II
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Background
The Conditional Variance
Definition (Conditional Variance)
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$, constructed from the exchangeable pair $(Z, Z')$. The conditional variance is the random matrix
\[
\Delta_X = \Delta_X(Z) = \frac{1}{2\alpha} \mathbb{E}\big[(X - X')^2 \mid Z\big].
\]
$\Delta_X$ is a stochastic estimate for the variance $\mathbb{E} X^2$.
Control over $\Delta_X$ yields control over $\lambda_{\max}(X)$.
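As a concrete illustration of the definition, consider a Rademacher series $X = \sum_k \varepsilon_k A_k$. The sketch below assumes the usual Stein-pair condition $\mathbb{E}[X - X' \mid Z] = \alpha X$ and one particular construction of $X'$: pick an index uniformly at random and replace that sign with a fresh independent sign, which gives $\alpha = 1/n$. Under these assumptions the conditional variance is the deterministic matrix $\sum_k A_k^2$, and the code verifies this by exact enumeration. All names are hypothetical.

```python
import itertools
import numpy as np

def conditional_variance(As, signs):
    """Delta_X = (1 / (2 * alpha)) * E[(X - X')^2 | signs] for X = sum_k signs[k] * As[k].

    Assumed pair: X' replaces one uniformly chosen sign with a fresh
    Rademacher sign, which gives scale factor alpha = 1 / n.
    """
    n = len(As)
    alpha = 1.0 / n
    acc = np.zeros_like(As[0], dtype=float)
    for k in range(n):                       # resampled index, probability 1/n
        for new_sign in (-1.0, 1.0):         # fresh sign, probability 1/2
            diff = (signs[k] - new_sign) * As[k]   # X - X'
            acc += (diff @ diff) / (2 * n)
    return acc / (2 * alpha)

rng = np.random.default_rng(0)
As = []
for _ in range(4):
    G = rng.standard_normal((3, 3))
    As.append((G + G.T) / 2)                 # real symmetric, hence Hermitian

sum_of_squares = sum(A @ A for A in As)
deltas = [
    conditional_variance(As, s)
    for s in itertools.product([-1.0, 1.0], repeat=len(As))
]
print(all(np.allclose(D, sum_of_squares) for D in deltas))
```

Here $\Delta_X$ does not depend on the signs at all, which is why bounds on the conditional variance are especially simple for Rademacher series.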
Exponential Tail Inequalities
Exponential Concentration for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(X, X')$ be a matrix Stein pair with $X \in \mathbb{H}^d$. Suppose that
\[
\Delta_X \preccurlyeq cX + vI \quad \text{almost surely for } c, v \ge 0.
\]
Then, for all $t \ge 0$,
\[
\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \cdot \exp\Big\{\frac{-t^2}{2v + 2ct}\Big\}.
\]
Control over the conditional variance $\Delta_X$ yields a Gaussian tail for $\lambda_{\max}(X)$ for small $t$ and a Poisson tail for large $t$.
When $d = 1$, reduces to the scalar result of Chatterjee (2007).
The dimensional factor $d$ cannot be removed.
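To get a feel for the bound, it can be evaluated directly. The helper below is a small sketch: the function name, the input checks, and the cap at 1 (any probability bound above 1 is vacuous) are additions for this example rather than part of the theorem.

```python
import math

def stein_tail_bound(t, d, c, v):
    """Evaluate min(1, d * exp(-t^2 / (2v + 2ct))) for t >= 0, c >= 0, v >= 0, d >= 1."""
    if t < 0 or c < 0 or v < 0 or d < 1:
        raise ValueError("need t >= 0, c >= 0, v >= 0, d >= 1")
    denom = 2.0 * v + 2.0 * c * t
    if denom == 0.0:
        # Degenerate denominator: trivial bound at t = 0, limiting value 0 for t > 0.
        return 1.0 if t == 0 else 0.0
    return min(1.0, d * math.exp(-t * t / denom))

# Illustrative parameters (assumed): dimension 10, c = v = 1.
for t in (0.0, 5.0, 10.0, 20.0):
    print(t, stein_tail_bound(t, d=10, c=1.0, v=1.0))
```

For $t$ much smaller than $v/c$ the exponent behaves like $-t^2/(2v)$, and for $t$ much larger than $v/c$ it behaves like $-t/(2c)$, which is the small-$t$ versus large-$t$ distinction noted above.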
Application: Matrix Hoeffding Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(Y_k)_{k \ge 1}$ be independent matrices in $\mathbb{H}^d$ satisfying
\[
\mathbb{E} Y_k = 0 \quad \text{and} \quad Y_k^2 \preccurlyeq A_k^2
\]
for deterministic matrices $(A_k)_{k \ge 1}$. Define the variance parameter
\[
\sigma^2 = \frac{1}{2} \Big\|\sum\nolimits_k \big(A_k^2 + \mathbb{E} Y_k^2\big)\Big\|.
\]
Then, for all $t \ge 0$,
\[
\mathbb{P}\Big\{\lambda_{\max}\Big(\sum\nolimits_k Y_k\Big) \ge t\Big\} \le d \cdot e^{-t^2 / 2\sigma^2}.
\]
Improves upon the matrix Hoeffding inequality of Tropp (2011)
  Optimal constant $1/2$ in the exponent
  Variance parameter $\sigma^2$ smaller than the bound $\big\|\sum_k A_k^2\big\|$
Tighter than classical Hoeffding inequality (1963) when $d = 1$
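The variance parameter is straightforward to compute once $A_k$ and $\mathbb{E} Y_k^2$ are known. The sketch below assumes $\|\cdot\|$ is the spectral norm and uses a hypothetical family of summands: $Y_k$ equals $\pm A_k$ with probability $q/2$ each and $0$ otherwise, so that $\mathbb{E} Y_k = 0$, $Y_k^2 \preccurlyeq A_k^2$, and $\mathbb{E} Y_k^2 = q A_k^2$. Function names and parameters are illustrative.

```python
import numpy as np

def hoeffding_sigma2(A_list, EY2_list):
    """sigma^2 = (1/2) * || sum_k (A_k^2 + E[Y_k^2]) ||, with the spectral norm assumed."""
    total = sum(A @ A + EY2 for A, EY2 in zip(A_list, EY2_list))
    return 0.5 * float(np.linalg.norm(total, 2))

def matrix_hoeffding_bound(t, d, sigma2):
    """min(1, d * exp(-t^2 / (2 * sigma^2))); the cap at 1 is added for convenience."""
    return min(1.0, d * float(np.exp(-t ** 2 / (2.0 * sigma2))))

rng = np.random.default_rng(0)
q = 0.25                                     # assumed probability that Y_k is nonzero
A_list = []
for _ in range(6):
    G = rng.standard_normal((4, 4))
    A_list.append((G + G.T) / 2)             # real symmetric, hence Hermitian
EY2_list = [q * (A @ A) for A in A_list]     # E[Y_k^2] = q * A_k^2 for this family

sigma2 = hoeffding_sigma2(A_list, EY2_list)
crude = float(np.linalg.norm(sum(A @ A for A in A_list), 2))
print("sigma^2 =", sigma2, " ||sum A_k^2|| =", crude)
print("bound at t = 3*sqrt(crude):", matrix_hoeffding_bound(3 * np.sqrt(crude), 4, sigma2))
```

For this family $\sigma^2 = \tfrac{1}{2}(1 + q)\,\|\sum_k A_k^2\|$, so the variance parameter is strictly smaller than $\|\sum_k A_k^2\|$ whenever $q < 1$.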
Exponential Concentration: Proof Sketch
1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate tail probability to the trace of the mgf of $X$:
\[
\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta),
\quad \text{where } m(\theta) = \mathbb{E} \operatorname{tr} e^{\theta X}.
\]
How to bound the trace mgf?
  Past approaches: Golden-Thompson, Lieb's concavity theorem
Chatterjee's strategy for scalar concentration
  Control mgf growth by bounding the derivative
  \[
  m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} \quad \text{for } \theta \in \mathbb{R}
  \]
  Rewrite using exchangeable pairs
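The matrix Laplace transform bound can be checked exactly on a small example where the distribution of $X$ is fully enumerable. The sketch below uses a Rademacher series $X = \sum_k \varepsilon_k A_k$ as an assumed test case and approximates the infimum over $\theta$ by a grid; the function name, the threshold $t$, and the grid are illustrative choices.

```python
import itertools
import numpy as np

def exact_tail_and_trace_mgf(As, t, theta):
    """Exact P{lambda_max(X) >= t} and m(theta) = E tr exp(theta X) for a Rademacher series."""
    eigs = []
    for signs in itertools.product([-1.0, 1.0], repeat=len(As)):
        X = sum(s * A for s, A in zip(signs, As))
        eigs.append(np.linalg.eigvalsh(X))   # ascending eigenvalues
    tail = float(np.mean([w[-1] >= t for w in eigs]))
    # tr exp(theta X) is the sum of exp(theta * eigenvalue).
    m = float(np.mean([np.sum(np.exp(theta * w)) for w in eigs]))
    return tail, m

rng = np.random.default_rng(0)
As = []
for _ in range(5):
    G = rng.standard_normal((3, 3))
    As.append((G + G.T) / 2)                 # real symmetric, hence Hermitian

t = 2.0
thetas = np.linspace(0.05, 3.0, 60)
results = [exact_tail_and_trace_mgf(As, t, th) for th in thetas]
tail = results[0][0]
laplace_bound = min(float(np.exp(-th * t)) * m for th, (_, m) in zip(thetas, results))
m_at_zero = exact_tail_and_trace_mgf(As, t, 0.0)[1]
print("exact tail:", tail, " Laplace bound on grid:", laplace_bound)
```

The bound holds for every $\theta > 0$ because the indicator of $\{\lambda_{\max}(X) \ge t\}$ is at most $e^{\theta(\lambda_{\max}(X) - t)}$, which in turn is at most $e^{-\theta t} \operatorname{tr} e^{\theta X}$.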
Method of Exchangeable Pairs
Lemma
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying
\[
\mathbb{E}\|(X - X') F(X)\| < \infty.
\]
Then
\[
\mathbb{E}[X F(X)] = \frac{1}{2\alpha} \mathbb{E}[(X - X')(F(X) - F(X'))]. \tag{1}
\]
Intuition
  Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$
  Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
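Identity (1) can be verified by exact enumeration on a small example. The sketch below assumes a Rademacher series $X = \sum_k \varepsilon_k A_k$ with $X'$ obtained by resampling one uniformly chosen sign (so $\alpha = 1/n$ under the usual condition $\mathbb{E}[X - X' \mid Z] = \alpha X$), and takes $F(X) = e^{\theta X}$. The construction and all names are assumptions for illustration.

```python
import itertools
import numpy as np

def expm_herm(A, theta):
    """exp(theta * A) for Hermitian A via its eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.exp(theta * w)) @ Q.conj().T

def both_sides(As, theta):
    """Return E[X F(X)] and (1/(2 alpha)) E[(X - X')(F(X) - F(X'))] for F = exp(theta .)."""
    n = len(As)
    alpha = 1.0 / n
    patterns = list(itertools.product([-1.0, 1.0], repeat=n))
    lhs = np.zeros_like(As[0], dtype=float)
    rhs = np.zeros_like(As[0], dtype=float)
    for signs in patterns:
        X = sum(s * A for s, A in zip(signs, As))
        FX = expm_herm(X, theta)
        lhs += (X @ FX) / len(patterns)
        for k in range(n):                   # resampled index, probability 1/n
            for new_sign in (-1.0, 1.0):     # fresh sign, probability 1/2
                Xp = X + (new_sign - signs[k]) * As[k]
                weight = 1.0 / (len(patterns) * n * 2)
                rhs += weight * ((X - Xp) @ (FX - expm_herm(Xp, theta)))
    return lhs, rhs / (2 * alpha)

rng = np.random.default_rng(0)
As = []
for _ in range(4):
    G = rng.standard_normal((3, 3))
    As.append((G + G.T) / 2)                 # real symmetric, hence Hermitian

lhs, rhs = both_sides(As, theta=0.7)
print(np.max(np.abs(lhs - rhs)))
```

The two sides agree as full matrices, not only in trace, which matches the form of the lemma.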
Exponential Concentration: Proof Sketch
2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
\[
m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X}
= \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\big[(X - X')(e^{\theta X} - e^{\theta X'})\big].
\]
Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative.
Exponential Tail Inequalities
Exponential Concentration: Proof Sketch
1. Matrix Laplace transform method (Ahlswede & Winter, 2002)
Relate the tail probability to the trace of the mgf of $X$:
\[
\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le \inf_{\theta > 0} e^{-\theta t} \cdot m(\theta),
\quad \text{where } m(\theta) = \mathbb{E} \operatorname{tr} e^{\theta X}
\]
How to bound the trace mgf?
Past approaches: Golden-Thompson, Lieb's concavity theorem
Chatterjee's strategy for scalar concentration:
Control mgf growth by bounding the derivative
\[
m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X} \quad \text{for } \theta \in \mathbb{R}
\]
Rewrite using exchangeable pairs
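The matrix Laplace transform bound can be checked exactly on a toy example. The sketch below assumes a Rademacher series $X = \sum_k \varepsilon_k A_k$ with a handful of randomly generated symmetric matrices $A_k$ (the matrices, the seed, and the grid of $\theta$ values are illustrative choices, not from the talk) and enumerates every sign pattern, so both the tail probability and the trace mgf are computed without sampling error.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3

# Hypothetical coefficient matrices A_k (real symmetric, hence Hermitian).
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)

# Eigenvalues of X = sum_k eps_k A_k for all 2^n equally likely sign patterns.
eigs = np.array([
    np.linalg.eigvalsh(sum(e * Ak for e, Ak in zip(eps, A)))
    for eps in itertools.product([-1.0, 1.0], repeat=n)
])

def trace_mgf(theta):
    """m(theta) = E tr exp(theta X); tr exp(theta X) is the sum of exp(theta * eigenvalue)."""
    return float(np.exp(theta * eigs).sum(axis=1).mean())

def tail_prob(t):
    """Exact P{lambda_max(X) >= t} under the uniform distribution on sign patterns."""
    return float((eigs[:, -1] >= t).mean())

thetas = np.linspace(0.01, 3.0, 300)
mgf_values = np.array([trace_mgf(th) for th in thetas])

def laplace_bound(t):
    """Minimum over the theta grid of exp(-theta t) m(theta); every grid point is a valid bound."""
    return float(np.min(np.exp(-thetas * t) * mgf_values))

for t in [1.0, 2.0, 3.0, 4.0]:
    print(f"t={t}: tail={tail_prob(t):.4f}  bound={laplace_bound(t):.4f}")
```

Because the bound holds for each fixed $\theta > 0$, minimizing over a finite grid still gives a valid (if slightly loose) upper bound on the tail probability.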
Method of Exchangeable Pairs
Lemma
Suppose that $(X, X')$ is a matrix Stein pair with scale factor $\alpha$. Let $F : \mathbb{H}^d \to \mathbb{H}^d$ be a measurable function satisfying
\[
\mathbb{E} \|(X - X') F(X)\| < \infty.
\]
Then
\[
\mathbb{E}[X F(X)] = \frac{1}{2\alpha} \mathbb{E}[(X - X')(F(X) - F(X'))]. \tag{1}
\]
Intuition
Can characterize the distribution of a random matrix by integrating it against a class of test functions $F$
Eq. (1) allows us to estimate this integral using the smoothness properties of $F$ and the discrepancy $X - X'$
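Identity (1) can be verified exactly for a small Rademacher series. The sketch below assumes the usual definition of a matrix Stein pair, $\mathbb{E}[X - X' \mid Z] = \alpha X$ with $(Z, Z')$ exchangeable, and builds $X'$ by resampling one uniformly chosen sign, which gives $\alpha = 1/n$. The matrices $A_k$, the value of $\theta$, and the helper name `expm_herm` are illustrative assumptions.

```python
import itertools
import numpy as np

def expm_herm(A):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.exp(w)) @ Q.conj().T

rng = np.random.default_rng(1)
n, d, theta = 4, 3, 0.7

# Hypothetical Hermitian coefficient matrices A_k.
A = []
for _ in range(n):
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    A.append((G + G.conj().T) / 2)

alpha = 1.0 / n                      # scale factor for single-coordinate resampling
F = lambda X: expm_herm(theta * X)   # test function F(X) = exp(theta X)

signs = list(itertools.product([-1.0, 1.0], repeat=n))
lhs = np.zeros((d, d), dtype=complex)
rhs = np.zeros((d, d), dtype=complex)
for eps in signs:
    X = sum(e * Ak for e, Ak in zip(eps, A))
    lhs += X @ F(X) / len(signs)
    # Average over the resampled index j and the fresh sign it receives.
    for j in range(n):
        for new_sign in (-1.0, 1.0):
            Xp = X + (new_sign - eps[j]) * A[j]
            weight = 1.0 / (len(signs) * n * 2)
            rhs += weight * (X - Xp) @ (F(X) - F(Xp))
rhs /= 2 * alpha

print("max abs difference:", np.max(np.abs(lhs - rhs)))
```

The two sides agree as full matrices, not only in trace, which is what the lemma asserts.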
Exponential Concentration: Proof Sketch
2. Method of Exchangeable Pairs
Rewrite the derivative of the trace mgf:
\[
m'(\theta) = \mathbb{E} \operatorname{tr} X e^{\theta X}
= \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\left[ (X - X') \left( e^{\theta X} - e^{\theta X'} \right) \right]
\]
Goal: Use the smoothness of $F(X) = e^{\theta X}$ to bound the derivative
Mean Value Trace Inequality
Lemma (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Suppose that $g : \mathbb{R} \to \mathbb{R}$ is a weakly increasing function and that $h : \mathbb{R} \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d$ it holds that
\[
\operatorname{tr}[(g(A) - g(B)) \cdot (h(A) - h(B))]
\le \frac{1}{2} \operatorname{tr}[(g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B))].
\]
Standard matrix functions: If $g : \mathbb{R} \to \mathbb{R}$, then
\[
g(A) = Q \begin{bmatrix} g(\lambda_1) & & \\ & \ddots & \\ & & g(\lambda_d) \end{bmatrix} Q^*
\quad \text{when} \quad
A = Q \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} Q^*.
\]
The inequality does not hold without the trace
For exponential concentration we let $g(A) = A$ and $h(B) = e^{\theta B}$
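A quick numerical sanity check of the mean value trace inequality in the case used for exponential concentration, $g(s) = s$ and $h(s) = e^{\theta s}$ with $\theta > 0$, so that $h'(s) = \theta e^{\theta s}$ is convex. This is only a spot check on randomly drawn Hermitian pairs, not a proof; the helper names, dimension, seed, and $\theta$ are illustrative assumptions.

```python
import numpy as np

def herm_fn(A, f):
    """Standard matrix function: apply f to the eigenvalues of a Hermitian matrix."""
    w, Q = np.linalg.eigh(A)
    return (Q * f(w)) @ Q.conj().T

def random_hermitian(rng, d):
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(2)
d, theta = 4, 0.5
gaps = []
for _ in range(200):
    A, B = random_hermitian(rng, d), random_hermitian(rng, d)
    eA = herm_fn(A, lambda w: np.exp(theta * w))
    eB = herm_fn(B, lambda w: np.exp(theta * w))
    # Left side: tr[(A - B)(h(A) - h(B))] with h(s) = exp(theta s).
    lhs = np.trace((A - B) @ (eA - eB)).real
    # Right side: (1/2) tr[(A - B)(A - B)(h'(A) + h'(B))] with h'(s) = theta exp(theta s).
    rhs = 0.5 * theta * np.trace((A - B) @ (A - B) @ (eA + eB)).real
    gaps.append(rhs - lhs)

print("smallest value of rhs - lhs:", min(gaps))
```

Both traces are real because each is the trace of a product of a Hermitian matrix with a Hermitian matrix function of commuting or symmetric form; taking `.real` only discards rounding noise.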
Exponential Concentration: Proof Sketch
3. Mean Value Trace Inequality
Bound the derivative of the trace mgf:
\[
\begin{aligned}
m'(\theta) &= \frac{1}{2\alpha} \mathbb{E} \operatorname{tr}\left[ (X - X') \left( e^{\theta X} - e^{\theta X'} \right) \right] \\
&\le \frac{\theta}{4\alpha} \mathbb{E} \operatorname{tr}\left[ (X - X')^2 \cdot \left( e^{\theta X} + e^{\theta X'} \right) \right] \\
&= \theta \cdot \mathbb{E} \operatorname{tr}\left[ \Delta_X \, e^{\theta X} \right]
\end{aligned}
\]
4. Conditional Variance Bound: $\Delta_X \preccurlyeq cX + vI$
Yields differential inequality
\[
m'(\theta) \le c\theta \cdot m'(\theta) + v\theta \cdot m(\theta)
\]
Solve to bound $m(\theta)$ and thereby bound $\mathbb{P}\{\lambda_{\max}(X) \ge t\}$
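To see the last two steps in action, consider the special case $c = 0$. The differential inequality reads $m'(\theta) \le v\theta \cdot m(\theta)$, and since $m(0) = \operatorname{tr} I = d$, integrating $(\log m)' \le v\theta$ gives $m(\theta) \le d \, e^{v\theta^2/2}$ for $\theta \ge 0$. The sketch below checks both statements exactly for a Rademacher series, assuming the conditional variance is defined as $\Delta_X = \frac{1}{2\alpha}\mathbb{E}[(X - X')^2 \mid Z]$ and that $X'$ resamples one sign, in which case $\Delta_X = \sum_k A_k^2$. The matrices and the grid of $\theta$ values are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 3

# Hypothetical real symmetric coefficient matrices A_k.
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)

# Assumed conditional variance for the Rademacher series: Delta_X = sum_k A_k^2.
# It is deterministic, so the bound Delta_X <= c X + v I holds with c = 0.
v = float(np.linalg.eigvalsh(sum(Ak @ Ak for Ak in A))[-1])

# Eigenvalues of X for every sign pattern (exact enumeration).
eigs = np.array([
    np.linalg.eigvalsh(sum(e * Ak for e, Ak in zip(eps, A)))
    for eps in itertools.product([-1.0, 1.0], repeat=n)
])

def m(theta):
    """Trace mgf m(theta) = E tr exp(theta X)."""
    return float(np.exp(theta * eigs).sum(axis=1).mean())

def m_prime(theta):
    """Derivative m'(theta) = E tr[X exp(theta X)]."""
    return float((eigs * np.exp(theta * eigs)).sum(axis=1).mean())

thetas = np.linspace(0.0, 2.0, 41)
diff_ineq_ok = all(m_prime(th) <= v * th * m(th) + 1e-9 for th in thetas)
mgf_bound_ok = all(m(th) <= d * np.exp(v * th ** 2 / 2) + 1e-9 for th in thetas)
print(diff_ineq_ok, mgf_bound_ok)
```

Plugging $m(\theta) \le d \, e^{v\theta^2/2}$ into the Laplace transform bound and choosing $\theta = t/v$ gives $\mathbb{P}\{\lambda_{\max}(X) \ge t\} \le d \, e^{-t^2/(2v)}$ in this $c = 0$ case.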
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E}\|X\|_{2p}^{2p} < \infty$. Then
\[
\left( \mathbb{E}\|X\|_{2p}^{2p} \right)^{1/(2p)} \le \sqrt{2p - 1} \cdot \left( \mathbb{E}\|\Delta_X\|_{p}^{p} \right)^{1/(2p)}
\]
Moral: The conditional variance controls the moments of $X$
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
Application: Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then if $p = 1$ or $p \ge 1.5$,
\[
\left( \mathbb{E} \left\| \sum_k \varepsilon_k A_k \right\|_{2p}^{2p} \right)^{1/(2p)}
\le \sqrt{2p - 1} \cdot \left\| \left( \sum_k A_k^2 \right)^{1/2} \right\|_{2p}
\]
The noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
e.g., used in the analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
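The matrix Khintchine inequality can be evaluated exactly for a short Rademacher series by enumerating all sign patterns. In the sketch below the matrices $A_k$, the seed, and the tested values of $p$ are illustrative assumptions; the Schatten norm of a Hermitian matrix is computed as the $\ell_q$ norm of its eigenvalues.

```python
import itertools
import numpy as np

def schatten(M, q):
    """Schatten q-norm of a Hermitian matrix: the l_q norm of its eigenvalues."""
    return float((np.abs(np.linalg.eigvalsh(M)) ** q).sum() ** (1.0 / q))

rng = np.random.default_rng(4)
n, d = 5, 3

# Hypothetical real symmetric coefficient matrices A_k.
A = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    A.append((G + G.T) / 2)

signs = list(itertools.product([-1.0, 1.0], repeat=n))

# Matrix square root of sum_k A_k^2 via its eigendecomposition.
w, Q = np.linalg.eigh(sum(Ak @ Ak for Ak in A))
sqrt_V = (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.conj().T

results = {}
for p in [1.0, 1.5, 2.0, 3.0]:
    # Exact expectation of the (2p)-th power of the Schatten 2p-norm.
    moment = np.mean([
        schatten(sum(e * Ak for e, Ak in zip(eps, A)), 2 * p) ** (2 * p)
        for eps in signs
    ])
    lhs = float(moment ** (1.0 / (2 * p)))
    rhs = float(np.sqrt(2 * p - 1) * schatten(sqrt_V, 2 * p))
    results[p] = (lhs, rhs)
    print(f"p={p}: lhs={lhs:.4f}  rhs={rhs:.4f}")
```

For $p = 1$ the two sides coincide, since $\mathbb{E} \operatorname{tr} X^2 = \operatorname{tr} \sum_k A_k^2$ when the signs are independent with mean zero; for larger $p$ the gap reflects the constant $\sqrt{2p - 1}$.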
Extensions
Extensions
Refined Exponential Concentration
Relate trace mgf of conditional variance to trace mgf of $X$
Yields matrix generalization of classical Bernstein inequality
Offers tool for unbounded random matrices
General Complex Matrices
Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation
\[
\mathcal{D}(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}
\]
Preserves spectral information: $\lambda_{\max}(\mathcal{D}(B)) = \|B\|$
Dependent Sequences
Sums of conditionally zero-mean random matrices
Combinatorial matrix statistics (e.g., sampling w/o replacement)
Matrix-valued functions satisfying a self-reproducing property
Yields a dependent bounded differences inequality for matrices
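The Hermitian dilation used for general complex matrices is easy to illustrate numerically. The sketch below builds $\mathcal{D}(B)$ for a random rectangular complex matrix (the shape and seed are arbitrary choices, and `dilation` is a hypothetical helper name) and compares its largest eigenvalue with the spectral norm of $B$.

```python
import numpy as np

def dilation(B):
    """Hermitian dilation [[0, B], [B*, 0]] of a d1 x d2 matrix B."""
    d1, d2 = B.shape
    return np.block([
        [np.zeros((d1, d1)), B],
        [B.conj().T, np.zeros((d2, d2))],
    ])

rng = np.random.default_rng(5)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

D = dilation(B)
lam_max = float(np.linalg.eigvalsh(D)[-1])   # largest eigenvalue of the dilation
spec_norm = float(np.linalg.norm(B, 2))      # largest singular value of B

print(lam_max, spec_norm)
```

The nonzero eigenvalues of the dilation are plus and minus the singular values of $B$, which is why tail bounds on $\lambda_{\max}$ for Hermitian matrices transfer to the spectral norm of rectangular matrices.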
The End
Thanks
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.
References II
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.
Exponential Tail Inequalities
Exponential Concentration Proof Sketch
3 Mean Value Trace Inequality
Bound the derivative of the trace mgf
mprime(θ) =1
2αE tr
[
(X minusX prime)(
eθX minus eθXprime)]
le θ
4αE tr
[
(X minusX prime)2 middot(
eθX + eθXprime)]
= θ middot E tr[
∆X eθX]
4 Conditional Variance Bound ∆X 4 cX + v I
Yields differential inequality
mprime(θ) le cθ middotmprime(θ) + vθ middotm(θ)
Solve to bound m(θ) and thereby bound Pλmax(X) ge t
Mackey (UC Berkeley) Steinrsquos Method BEARS 2012 14 20
Polynomial Moment Inequalities
Polynomial Moments for Random Matrices
Theorem (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $p = 1$ or $p \ge 1.5$. Suppose that $(X, X')$ is a matrix Stein pair where $\mathbb{E}\|X\|_{2p}^{2p} < \infty$. Then
$$\left( \mathbb{E}\|X\|_{2p}^{2p} \right)^{1/(2p)} \le \sqrt{2p - 1} \cdot \left( \mathbb{E}\|\Delta_X\|_{p}^{p} \right)^{1/(2p)}$$
Moral: The conditional variance controls the moments of $X$
Generalizes Chatterjee's version (2007) of the scalar Burkholder-Davis-Gundy inequality (Burkholder, 1973)
See also Pisier & Xu (1997), Junge & Xu (2003, 2008)
Proof techniques mirror those for exponential concentration
Also holds for infinite-dimensional Schatten-class operators
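As a sanity check, the case $p = 1$ can be worked by hand. The sketch below assumes the usual exchangeable-pairs setup, which is not restated on this slide: $X$ is a function of $Z$, $\mathbb{E}[X - X' \mid Z] = \alpha X$, $\Delta_X = \frac{1}{2\alpha} \mathbb{E}[(X - X')^2 \mid Z]$, and $\|\cdot\|_p$ is the Schatten $p$-norm.

```latex
% Exchangeability gives E[X'^2] = E[X^2]; the Stein pair condition gives
% E[X X'] = E[X E[X' | Z]] = (1 - \alpha) E[X^2], and likewise for E[X' X].
\mathbb{E}\left[ (X - X')^2 \right]
= 2\, \mathbb{E}[X^2] - 2(1 - \alpha)\, \mathbb{E}[X^2]
= 2\alpha\, \mathbb{E}[X^2]
\quad\Longrightarrow\quad
\mathbb{E}[\Delta_X] = \mathbb{E}[X^2].

% Since \Delta_X is positive semidefinite, its Schatten 1-norm is its trace.
\mathbb{E}\|X\|_2^2 = \mathbb{E} \operatorname{tr} X^2
= \operatorname{tr} \mathbb{E}[\Delta_X]
= \mathbb{E}\|\Delta_X\|_1 .
```

So at $p = 1$ the constant $\sqrt{2p - 1}$ equals 1 and, under these assumptions, the bound holds with equality.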
Polynomial Moment Inequalities
Application: Matrix Khintchine Inequality
Corollary (Mackey, Jordan, Chen, Farrell, and Tropp, 2012)
Let $(\varepsilon_k)_{k \ge 1}$ be an independent sequence of Rademacher random variables and $(A_k)_{k \ge 1}$ be a deterministic sequence of Hermitian matrices. Then if $p = 1$ or $p \ge 1.5$,
$$\left( \mathbb{E} \Big\| \sum_k \varepsilon_k A_k \Big\|_{2p}^{2p} \right)^{1/(2p)} \le \sqrt{2p - 1} \cdot \Big\| \Big( \sum_k A_k^2 \Big)^{1/2} \Big\|_{2p}$$
Noncommutative Khintchine inequality (Lust-Piquard, 1986; Lust-Piquard and Pisier, 1991) is a dominant tool in applied matrix analysis
e.g., used in analysis of column sampling and projection for approximate SVD (Rudelson and Vershynin, 2007)
Stein's method offers an unusually concise proof
The constant $\sqrt{2p - 1}$ is within $\sqrt{e}$ of optimal
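The corollary can be checked numerically on a small instance. The sketch below is illustrative only: the helper names `schatten_norm` and `khintchine_sides` are hypothetical, the matrices $A_k$ are arbitrary random Hermitian matrices, and the expectation is computed exactly by enumerating all $2^n$ sign patterns, which is only feasible for small $n$.

```python
import itertools
import numpy as np

def schatten_norm(M, q):
    """Schatten q-norm of a Hermitian matrix: the l_q norm of its eigenvalues."""
    eigs = np.linalg.eigvalsh(M)
    return float(np.sum(np.abs(eigs) ** q) ** (1.0 / q))

def khintchine_sides(A, p):
    """Return (lhs, rhs) of the matrix Khintchine inequality for Hermitian A[k]."""
    n = len(A)
    q = 2 * p
    # Exact expectation: average over all 2^n equally likely sign patterns.
    total = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        S = sum(s * Ak for s, Ak in zip(signs, A))
        total += schatten_norm(S, q) ** q
    lhs = (total / 2 ** n) ** (1.0 / q)
    # ||(sum A_k^2)^{1/2}||_{2p} equals ||sum A_k^2||_p^{1/2} because the sum is PSD.
    V = sum(Ak @ Ak for Ak in A)
    rhs = float(np.sqrt(2 * p - 1) * schatten_norm(V, p) ** 0.5)
    return lhs, rhs

# Illustrative input: six random 4 x 4 Hermitian matrices (an assumption, not from the talk).
rng = np.random.default_rng(0)
A = []
for _ in range(6):
    G = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A.append((G + G.conj().T) / 2)

for p in (1, 1.5, 2, 3):
    lhs, rhs = khintchine_sides(A, p)
    print(f"p={p}: lhs={lhs:.4f}  rhs={rhs:.4f}")
```

For $p = 1$ the two sides agree up to rounding, since the cross terms $\mathbb{E}[\varepsilon_j \varepsilon_k]$ vanish for $j \ne k$; for larger $p$ the left side should not exceed the right.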
Extensions
Extensions
Refined Exponential Concentration
Relate trace mgf of conditional variance to trace mgf of $X$
Yields matrix generalization of classical Bernstein inequality
Offers tool for unbounded random matrices
General Complex Matrices
Map any matrix $B \in \mathbb{C}^{d_1 \times d_2}$ to a Hermitian matrix via dilation
$$D(B) = \begin{bmatrix} 0 & B \\ B^* & 0 \end{bmatrix} \in \mathbb{H}^{d_1 + d_2}$$
Preserves spectral information: $\lambda_{\max}(D(B)) = \|B\|$
Dependent Sequences
Sums of conditionally zero-mean random matrices
Combinatorial matrix statistics (e.g., sampling w/o replacement)
Matrix-valued functions satisfying a self-reproducing property
Yields a dependent bounded differences inequality for matrices
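The dilation identity is easy to verify numerically. The following is a minimal sketch, assuming NumPy; the function name `hermitian_dilation` and the random test matrix are illustrative choices rather than anything from the talk.

```python
import numpy as np

def hermitian_dilation(B):
    """Return D(B) = [[0, B], [B*, 0]], a Hermitian matrix of size d1 + d2."""
    d1, d2 = B.shape
    return np.block([
        [np.zeros((d1, d1), dtype=complex), B],
        [B.conj().T, np.zeros((d2, d2), dtype=complex)],
    ])

# Illustrative input: a random 3 x 5 complex matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
D = hermitian_dilation(B)

lam_max = np.linalg.eigvalsh(D)[-1]  # eigvalsh returns eigenvalues in ascending order
spec_norm = np.linalg.norm(B, 2)     # spectral norm: largest singular value of B
print(lam_max, spec_norm)
```

The two printed values should agree up to floating-point error, which is why bounds on $\lambda_{\max}$ for Hermitian matrices transfer to the spectral norm of rectangular matrices.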
Extensions
The End
Thanks
Extensions
References I
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3):569–579, Mar. 2002.
Burkholder, D. L. Distribution function inequalities for martingales. Ann. Probab., 1:19–42, 1973. doi:10.1214/aop/1176997023.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138:305–321, 2007.
Cheung, S.-S., So, A. Man-Cho, and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: A safe tractable approximation approach. Available at http://www.se.cuhk.edu.hk/~manchoso/papers/cclmi_sta.pdf, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88–100, 2008.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3):1548–1566, Mar. 2011.
Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. Available at arXiv:1104.1672, 2011a.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv:1104.1672v3 [math.PR], 2011b.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities. Ann. Probab., 31(2):948–995, 2003.
Junge, M. and Xu, Q. Noncommutative Burkholder/Rosenthal inequalities II: Applications. Israel J. Math., 167:227–282, 2008.
Lust-Piquard, F. Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Math. Acad. Sci. Paris, 303(7):289–292, 1986.
Lust-Piquard, F. and Pisier, G. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.
Mackey, L., Talwalkar, A., and Jordan, M. I. Divide-and-conquer matrix factorization. In Advances in Neural Information Processing Systems 24, 2011.
Extensions
References II
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv, Jan. 2012.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109:283–317, January 2007. ISSN 0025-5610. doi:10.1007/s10107-006-0033-0. URL http://dl.acm.org/citation.cfm?id=1229716.1229726.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600, Nov. 2009.
Pisier, G. and Xu, Q. Non-commutative martingale inequalities. Comm. Math. Phys., 189(3):667–698, 1997.
Recht, B. A simpler approach to matrix completion. arXiv:0910.0651v2 [cs.IT], 2009.
Rudelson, M. and Vershynin, R. Sampling from large matrices: An approach through geometric functional analysis. J. Assoc. Comput. Mach., 54(4):Article 21, 19 pp., Jul. 2007 (electronic).
So, A. Man-Cho. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1):125–151, 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Berkeley, 1972. Univ. California Press.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., August 2011.