Outline J-L lemma Classical proof Variants Applications Circulant matrices Decoupling vs. Fourier
Dimensionality reduction:
Johnson-Lindenstrauss lemma
for circulant matrices
Jan Vybíral
Austrian Academy of Sciences
RICAM, Linz, Austria
April 2010, Helmholtz Zentrum
Munich, Germany
partially joint work with Aicke Hinrichs (University of Jena, Germany)
Outline
◮ Johnson-Lindenstrauss lemma
◮ Classical proof
◮ Variants and improvements
◮ Applications - Approximate nearest neighbours
◮ Circulant matrices
◮ Decoupling vs. Fourier transform
Johnson-Lindenstrauss lemma
Let
◮ $\varepsilon \in (0, \tfrac12)$,
◮ $x_1, \dots, x_n \in \mathbb{R}^d$ be arbitrary points,
◮ $k = O(\varepsilon^{-2}\log n)$, i.e. $k \ge C\varepsilon^{-2}\log n$.
There exists a (linear) mapping $f : \mathbb{R}^d \to \mathbb{R}^k$ such that
$$(1-\varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1+\varepsilon)\|x_i - x_j\|_2^2$$
for all $i, j \in \{1, \dots, n\}$. Here $\|\cdot\|_2$ stands for the Euclidean norm in $\mathbb{R}^d$ or $\mathbb{R}^k$, respectively.
For example: $n = 10^9$, $\varepsilon = 0.2$, $k = 4200$, $d$ arbitrary!
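The statement can be checked numerically for a concrete map. The following minimal sketch, assuming NumPy and using a scaled Gaussian matrix as a stand-in for $f$, computes the smallest $\varepsilon$ for which all pairwise squared distances are preserved. The helper names and all sizes are illustrative choices, not part of the lemma.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances ||x_i - x_j||^2 for all pairs i < j (rows of X)."""
    i, j = np.triu_indices(X.shape[0], k=1)
    diff = X[i] - X[j]
    return np.sum(diff * diff, axis=1)

def max_distortion(X, FX):
    """Smallest eps such that all squared distances are preserved within factors 1 -/+ eps."""
    ratios = pairwise_sq_dists(FX) / pairwise_sq_dists(X)
    return float(np.max(np.abs(ratios - 1.0)))

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 400
X = rng.standard_normal((n, d))                # n arbitrary points in R^d
A = rng.standard_normal((k, d)) / np.sqrt(k)   # scaled so that E||Ax||^2 = ||x||^2
print(max_distortion(X, X @ A.T))              # rows of X @ A.T are f(x_i) = A x_i
```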
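Typical proof
$\mathcal{A} \subset \mathbb{R}^{k\times d}$ - $k \times d$ matrices,
$P$ - probability measure on $\mathcal{A}$.
For each $y$ with $\|y\|_2 = 1$: concentration of measure
$$P(A \in \mathcal{A} : \|Ay\|_2 > 1+\varepsilon) \le \exp(-ck\varepsilon^2),$$
$$P(A \in \mathcal{A} : \|Ay\|_2 < 1-\varepsilon) \le \exp(-ck\varepsilon^2).$$
Choosing
$$\exp(-ck\varepsilon^2) \le \frac{1}{n^2},$$
the probability of failure (union bound over all pairs) is smaller than
$$2 \cdot \binom{n}{2} \cdot \frac{1}{n^2} = 1 - \frac{1}{n} < 1.$$
Hence, the probability of success is positive!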
The condition
$$\exp(-ck\varepsilon^2) \le \frac{1}{n^2}$$
leads to $ck\varepsilon^2 \ge 2\log n$ and
$$k \ge \frac{2}{c}\,\varepsilon^{-2}\log n, \quad \text{i.e. } C = \frac{2}{c}.$$
By increasing $C$ (to $C = 3/c$), we can ensure that such a mapping becomes "typical", i.e. occurs with probability at least $1 - 1/n$.
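The arithmetic of the last two slides can be sketched as follows. The constant $c$ depends on the chosen distribution $P$ and is not specified here; `c = 0.25` below is a hypothetical placeholder, as are the function names.

```python
import math

def k_lower_bound(n, eps, c, power=2):
    """Smallest integer k with exp(-c k eps^2) <= n^(-power), i.e. k >= (power/c) eps^-2 log n."""
    return math.ceil(power * math.log(n) / (c * eps ** 2))

def failure_bound(n, eps, c, k):
    """Union bound: two one-sided events for each of the C(n, 2) pairs."""
    return 2 * math.comb(n, 2) * math.exp(-c * k * eps ** 2)

c = 0.25  # hypothetical value; the true c depends on the distribution P
for n in (100, 10 ** 9):
    for power in (2, 3):  # C = 2/c: failure < 1;  C = 3/c: failure <= 1/n
        k = k_lower_bound(n, 0.2, c, power)
        print(n, power, k, failure_bound(n, 0.2, c, k))
```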
Classical proof
W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math., 26:189-206, 1984
Projection onto a "random" k-dimensional subspace satisfies the desired property with positive probability
advantages: geometrical proof
disadvantages: requires a measure on the set of all k-dimensional subspaces; evaluating f(x) involves orthonormalisation; time-consuming
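A minimal sketch of the classical construction, assuming NumPy. Orthonormalising the columns of a $d\times k$ Gaussian matrix (here via QR, one way of doing the orthonormalisation mentioned above) gives a basis of a uniformly distributed random $k$-dimensional subspace. The projection is rescaled by $\sqrt{d/k}$ so that squared lengths are preserved in expectation. The scaling convention and the function name are assumptions of this sketch.

```python
import numpy as np

def random_subspace_projection(d, k, rng):
    """f(x) = sqrt(d/k) Q^T x, where the k orthonormal columns of Q span a random subspace."""
    G = rng.standard_normal((d, k))      # Gaussian columns: their span is uniformly distributed
    Q, _ = np.linalg.qr(G)               # orthonormalisation, O(d k^2): the costly step
    scale = np.sqrt(d / k)               # E||Q^T x||^2 = (k/d) ||x||^2
    return lambda x: scale * (Q.T @ x)

rng = np.random.default_rng(0)
d, k = 2000, 200
f = random_subspace_projection(d, k, rng)
x = rng.standard_normal(d)
print(np.sum(f(x) ** 2) / np.sum(x ** 2))   # close to 1 with high probability
```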
Variants and improvements
Elementary proof: S. Dasgupta and A. Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22:60-65, 2003.
Improvements motivated by applications:
◮ Good running times of f (x)
◮ Small randomness used
◮ Small memory space used
◮ ...others...
D. Achlioptas, Database-friendly random projections:
Johnson-Lindenstrauss with binary coins.
J. Comput. Syst. Sci., 66(4):671-687, 2003.
f realised by a k × d matrix, where each entry is generated independently at random: Gaussian or Bernoulli (or similar) variables.
◮ Running time: k × d
◮ Randomness: k × d
◮ Memory space: k × d
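A minimal NumPy sketch of such a dense random matrix, assuming the symmetric Bernoulli variant with entries $\pm 1/\sqrt{k}$. Other distributions covered by the paper are not shown. Every evaluation of $f$ is a dense $k\times d$ matrix-vector product, and all $k\times d$ random signs are generated and stored, which is where the three $k\times d$ costs above come from.

```python
import numpy as np

def bernoulli_projection_matrix(k, d, rng):
    """k x d matrix with i.i.d. entries +1/sqrt(k) or -1/sqrt(k), each with probability 1/2."""
    signs = rng.choice(np.array([-1.0, 1.0]), size=(k, d))   # k*d random bits, stored densely
    return signs / np.sqrt(k)

rng = np.random.default_rng(1)
k, d = 300, 1000
A = bernoulli_projection_matrix(k, d, rng)
x, y = rng.standard_normal(d), rng.standard_normal(d)
fx, fy = A @ x, A @ y                                        # each evaluation costs O(kd)
ratio = np.sum((fx - fy) ** 2) / np.sum((x - y) ** 2)
print(ratio)   # close to 1 with high probability
```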
N. Ailon and B. Chazelle, Approximate nearest neighbors and
the fast Johnson-Lindenstrauss transform. In Proc. 38th
Annual ACM Symposium on Theory of Computing, 2006.
$f(x) = PHDx$, where
◮ $P$ is a $k \times d$ matrix, where each component is generated independently at random: $P_{i,j} \sim N(0,1)$ with probability
$$q = \min\left\{\Theta\left(\frac{\log^2 n}{d}\right), 1\right\},$$
and $P_{i,j} = 0$ with probability $1 - q$,
◮ $H$ is the $d \times d$ normalised Hadamard matrix,
◮ $D$ is a random $d \times d$ diagonal matrix, with each $D_{i,i}$ drawn independently from $\{-1, 1\}$ with probability 1/2.
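A minimal NumPy sketch of $f(x) = PHDx$, assuming $d$ is a power of two so that $Hx$ can be computed by the fast Walsh-Hadamard transform in $O(d\log d)$. Three choices here are assumptions of the sketch, not details taken from Ailon and Chazelle: the constant hidden in $\Theta$ is taken to be 1, $P$ is stored densely for simplicity (a sparse format would be needed for the running time and memory stated below), and the output is divided by $\sqrt{kq}$ so that squared norms are preserved in expectation.

```python
import numpy as np

def fwht(v):
    """Normalised Walsh-Hadamard transform H v in O(d log d); len(v) must be a power of 2."""
    v = np.array(v, dtype=float)                 # work on a copy
    d = v.shape[0]
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of 2")
    h = 1
    while h < d:
        w = v.reshape(-1, 2, h)                  # view: blocks of length 2h, split in halves
        top, bottom = w[:, 0, :].copy(), w[:, 1, :].copy()
        w[:, 0, :] = top + bottom                # butterfly (x, y) -> (x + y, x - y)
        w[:, 1, :] = top - bottom
        h *= 2
    return v / np.sqrt(d)

def make_fjlt(n, d, k, rng):
    """x -> P H D x / sqrt(k q); the Theta-constant 1 and the scaling are assumptions."""
    q = min(np.log(n) ** 2 / d, 1.0)
    D = rng.choice(np.array([-1.0, 1.0]), size=d)          # random diagonal signs
    mask = rng.random((k, d)) < q                           # P_ij nonzero with probability q
    P = np.where(mask, rng.standard_normal((k, d)), 0.0)    # dense storage, for simplicity
    return lambda x: P @ fwht(D * x) / np.sqrt(k * q)

rng = np.random.default_rng(2)
n, d, k = 1000, 1024, 200
f = make_fjlt(n, d, k, rng)
x = rng.standard_normal(d)
print(np.sum(f(x) ** 2) / np.sum(x ** 2))   # close to 1 with high probability
```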
Running time: with high probability, $f(x)$ may be calculated in time $O(d\log d + qdk)$
Randomness: k × d
Memory space: with high probability $O(d + kdq) = O(d + k\log^2 n)$
Not easy to implement
Other variants and improvements. . .
Approximate nearest neighbors
The nearest neighbor problem: Given $P = \{x_1, \dots, x_n\}$ in a metric space $X$, preprocess $P$ so as to efficiently find, for a query $q \in X$, the minimiser of
$$\min_{i=1,\dots,n} d(x_i, q).$$
Naive algorithm: compare all the distances; no preprocessing.
The approximate nearest neighbor problem: Given $P = \{x_1, \dots, x_n\}$ in a metric space $X$ and $\varepsilon > 0$, preprocess $P$ so as to efficiently find $p \in P$ such that
$$d(p, q) \le (1+\varepsilon)\,d(p', q) \quad \text{for all } p' \in P.$$
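A minimal sketch, assuming NumPy, of the naive algorithm and of how a J-L map can serve as preprocessing. The points are projected once, a query is projected at search time, and the linear scan runs in $\mathbb{R}^k$ instead of $\mathbb{R}^d$. Re-ranking the few best projected candidates by their exact distances is an illustrative heuristic of this sketch, not the Ailon-Chazelle algorithm; the names and the parameter `n_candidates` are hypothetical.

```python
import numpy as np

def naive_nn(P, q):
    """Index of the exact nearest neighbour: compare all n distances, O(nd)."""
    return int(np.argmin(np.sum((P - q) ** 2, axis=1)))

def jl_nn(P, q, A, PA, n_candidates=5):
    """Scan in the projected space R^k, then re-rank a few candidates exactly."""
    proj_dist = np.sum((PA - A @ q) ** 2, axis=1)
    cand = np.argsort(proj_dist)[:n_candidates]
    return int(cand[np.argmin(np.sum((P[cand] - q) ** 2, axis=1))])

rng = np.random.default_rng(3)
n, d, k = 2000, 500, 60
P = rng.standard_normal((n, d))
A = rng.standard_normal((k, d)) / np.sqrt(k)
PA = P @ A.T                                     # preprocessing: project all points once
q = P[17] + 0.1 * rng.standard_normal(d)         # a query close to point 17
i_exact, i_approx = naive_nn(P, q), jl_nn(P, q, A, PA)
eps_achieved = np.linalg.norm(P[i_approx] - q) / np.linalg.norm(P[i_exact] - q) - 1
print(i_exact, i_approx, eps_achieved)
```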
$X = \mathbb{R}^d$, hashing functions:
Choose randomly $v \in \mathbb{R}^d$ and pre-compute $h_v(i) := v^T x_i$, $i = 1, \dots, n$. Find
$$\operatorname{argmin}_{p \in P} |v^T(p - q)|.$$
Iterate over different $v_1, v_2, \dots$
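A minimal sketch of this scheme, assuming NumPy and the standard-library `bisect` module. For each random direction $v$ the values $h_v(i) = v^Tx_i$ are sorted once, so $\operatorname{argmin}_p |v^T(p-q)|$ is found by binary search in $O(\log n)$. The candidates from several directions are then compared by their true distances. The class name, the number of directions and the final re-ranking step are choices of this sketch, not prescribed by the slide.

```python
import bisect
import numpy as np

class ProjectionHash:
    """Random-projection index: h_v(i) = v^T x_i for several random directions v."""

    def __init__(self, P, n_dirs, rng):
        self.P = P
        self.V = rng.standard_normal((n_dirs, P.shape[1]))      # v_1, ..., v_m
        H = P @ self.V.T                                          # H[i, t] = v_t^T x_i
        self.order = np.argsort(H, axis=0)                        # point indices sorted by h_v
        self.sorted_h = [H[self.order[:, t], t].tolist() for t in range(n_dirs)]

    def query(self, q):
        cands = set()
        for t, v in enumerate(self.V):
            s, hq = self.sorted_h[t], float(v @ q)
            pos = bisect.bisect_left(s, hq)
            # argmin_p |v^T (p - q)| sits next to the insertion point of v^T q
            near = [r for r in (pos - 1, pos) if 0 <= r < len(s)]
            best = min(near, key=lambda r: abs(s[r] - hq))
            cands.add(int(self.order[best, t]))
        cands = sorted(cands)
        dist = np.sum((self.P[cands] - q) ** 2, axis=1)           # exact check of candidates
        return cands[int(np.argmin(dist))]

rng = np.random.default_rng(4)
P = rng.standard_normal((1000, 50))
index = ProjectionHash(P, n_dirs=20, rng=rng)
q = P[42] + 1e-4 * rng.standard_normal(50)
print(index.query(q))
```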
Connection to compressed sensing - RIP
We say that $A \in \mathbb{R}^{n\times N}$ satisfies the Restricted Isometry Property of order $k$ if there exists $\delta_k \in (0,1)$ such that
$$(1-\delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta_k)\|x\|_2^2$$
holds for all $x \in \mathbb{R}^N$ with $\|x\|_0 := \#\{j = 1, \dots, N : x_j \ne 0\} \le k$.
The aim is to find matrices with small δk for large k.
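For very small matrices, $\delta_k$ can be computed exactly by brute force. For every support $S$ with $\#S = k$ it is the largest deviation from 1 of the eigenvalues of $A_S^TA_S$, where $A_S$ keeps the columns in $S$. Supports of size exactly $k$ suffice, since by eigenvalue interlacing smaller supports cannot give a larger deviation. The NumPy sketch below does this. The enumeration of $\binom{N}{k}$ supports makes it useless beyond toy sizes. A computed value $\ge 1$ means RIP of order $k$ fails. The normalisation $1/\sqrt n$ and the sizes are illustrative.

```python
import itertools
import numpy as np

def rip_constant(A, k):
    """Exact delta_k: max over supports S, #S = k, of the eigenvalue deviation of A_S^T A_S from 1."""
    delta = 0.0
    for S in itertools.combinations(range(A.shape[1]), k):
        A_S = A[:, list(S)]
        eig = np.linalg.eigvalsh(A_S.T @ A_S)    # ascending eigenvalues
        delta = max(delta, eig[-1] - 1.0, 1.0 - eig[0])
    return delta

rng = np.random.default_rng(5)
n, N = 40, 12
A = rng.standard_normal((n, N)) / np.sqrt(n)     # columns have squared norm about 1
print([round(rip_constant(A, k), 3) for k in (1, 2, 3)])   # at most C(12, 3) = 220 supports
```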
R. Baraniuk, M. Davenport, R. DeVore and M. Wakin, A simple proof of the Restricted Isometry Property for Random Matrices, Constructive Approximation, 2008.
If
$$P_\omega\left(\|A(\omega)x\|_2^2 \ge (1+\varepsilon)\|x\|_2^2\right) \le \exp(-nc(\varepsilon))$$
(and the same for $\|A(\omega)x\|_2^2 \le (1-\varepsilon)\|x\|_2^2$) and $\delta > 0$, then $A(\omega)$ satisfies RIP of order
$$k \le \frac{c'(\delta)\,n}{\log(N/n) + 1}$$
with constant $\delta$, with exponentially high probability.
Every distribution that yields J-L transforms also yields RIP matrices.
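Circulant matrices
Let $a = (a_0, \dots, a_{d-1})$ be i.i.d. random variables and
$$M_{a,k} = \begin{pmatrix} a_0 & a_1 & a_2 & \dots & a_{d-1}\\ a_{d-1} & a_0 & a_1 & \dots & a_{d-2}\\ a_{d-2} & a_{d-1} & a_0 & \dots & a_{d-3}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_{d-k+1} & a_{d-k+2} & a_{d-k+3} & \dots & a_{d-k} \end{pmatrix} \in \mathbb{R}^{k\times d}.$$
Is it possible to take $f(x) = \frac{1}{\sqrt{k}}M_{a,k}x$? Or $f(x) = \frac{1}{\sqrt{k}}M_{a,k}D_\kappa x$, where $D_\kappa$ is the $d \times d$ diagonal matrix with independent random signs $\kappa_0, \dots, \kappa_{d-1} \in \{-1, 1\}$ on the diagonal?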
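Both proposals can be evaluated without forming $M_{a,k}$. By the matrix above, $(M_{a,k}z)_j = \sum_{i=0}^{d-1} a_i z_{(i+j) \bmod d}$, a circular cross-correlation, which the FFT computes in $O(d\log d)$. The NumPy sketch below, with hypothetical function names, assumes $D_\kappa = \mathrm{diag}(\kappa)$ with random signs $\kappa_i \in \{-1,1\}$ and checks the FFT route against the explicit matrix.

```python
import numpy as np

def circulant_rows(a, k):
    """Explicit M_{a,k} (for checking only): entry (j, i) equals a[(i - j) mod d]."""
    d = len(a)
    return a[(np.arange(d)[None, :] - np.arange(k)[:, None]) % d]

def circulant_jl(x, a, kappa, k):
    """f(x) = M_{a,k} D_kappa x / sqrt(k) in O(d log d) via the FFT."""
    z = kappa * x                                                # D_kappa x
    corr = np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(z))  # circular cross-correlation
    return corr.real[:k] / np.sqrt(k)                            # keep the first k rows

rng = np.random.default_rng(6)
d, k = 1024, 64
a = rng.standard_normal(d)
kappa = rng.choice(np.array([-1.0, 1.0]), size=d)
x = rng.standard_normal(d)
fast = circulant_jl(x, a, kappa, k)
slow = circulant_rows(a, k) @ (kappa * x) / np.sqrt(k)
print(np.allclose(fast, slow))
```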
Decoupling vs. Fourier transform
Yes! With $k = O(\varepsilon^{-2}\log^3 n)$ - decoupling techniques
Yes! With $k = O(\varepsilon^{-2}\log^2 n)$ - Fourier-analytic methods
The improvement to $O(\varepsilon^{-2}\log n)$ is still open... promising numerical experiments
advantages: running time $O(d\log d)$ using the FFT; randomness used: $2d$ instead of $(k+1)d$; easy to implement: the FFT is part of every software package
disadvantage: up to now, bigger $k$
$\log^3 n$:
Let
◮ $x_1, \dots, x_n$ be arbitrary points in $\mathbb{R}^d$,
◮ $\varepsilon \in (0, \tfrac12)$,
◮ $k = O(\varepsilon^{-2}\log^3 n)$,
◮ $a = (a_0, \dots, a_{d-1})$ be independent Bernoulli variables or independent normally distributed variables,
◮ $M_{a,k}$ and $D_\kappa$ be as above and
◮ $f(x) = \frac{1}{\sqrt{k}}M_{a,k}D_\kappa x$.
Then with probability at least 2/3 the following holds:
$$(1-\varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1+\varepsilon)\|x_i - x_j\|_2^2, \quad i, j = 1, \dots, n.$$
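The statement can be explored empirically. The sketch below, assuming NumPy, Gaussian $a$ and $D_\kappa = \mathrm{diag}(\kappa)$ with random signs, draws $(a, \kappa)$ several times. Each time it applies the FFT form of $f$ to a fixed point set and records whether all pairwise squared distances are preserved within $1\pm\varepsilon$. The sizes are arbitrary and the resulting frequency is only an illustration; it says nothing about the constant in $k = O(\varepsilon^{-2}\log^3 n)$.

```python
import numpy as np

def circulant_jl_rows(X, a, kappa, k):
    """Apply f(x) = M_{a,k} D_kappa x / sqrt(k) to every row x of X via the FFT."""
    Z = X * kappa                                              # D_kappa x for each row
    spec = np.conj(np.fft.fft(a))[None, :] * np.fft.fft(Z, axis=1)
    return np.fft.ifft(spec, axis=1).real[:, :k] / np.sqrt(k)

def preserves_all_pairs(X, FX, eps):
    """True if every squared pairwise distance is kept within factors 1 -/+ eps."""
    i, j = np.triu_indices(X.shape[0], k=1)
    r = np.sum((FX[i] - FX[j]) ** 2, axis=1) / np.sum((X[i] - X[j]) ** 2, axis=1)
    return bool(np.all((r >= 1 - eps) & (r <= 1 + eps)))

def success_frequency(X, k, eps, trials, rng):
    """Fraction of independent draws of (a, kappa) for which f works on all pairs."""
    d = X.shape[1]
    hits = 0
    for _ in range(trials):
        a = rng.standard_normal(d)                             # Gaussian a_0, ..., a_{d-1}
        kappa = rng.choice(np.array([-1.0, 1.0]), size=d)      # random signs of D_kappa
        hits += preserves_all_pairs(X, circulant_jl_rows(X, a, kappa, k), eps)
    return hits / trials

rng = np.random.default_rng(7)
X = rng.standard_normal((30, 2048))
print(success_frequency(X, k=1000, eps=0.45, trials=20, rng=rng))
```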
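Strategy of the proof of $\log^3 n$: decoupling the dependence
Concentration inequalities for every fixed $x$ with $\|x\|_2 = 1$:
$$P_{a,\kappa}\left(\|M_{a,k}D_\kappa x\|_2^2 \ge (1+\varepsilon)k\right) \le \exp\left(-c(k\varepsilon^2)^{1/3}\right)$$
and
$$P_{a,\kappa}\left(\|M_{a,k}D_\kappa x\|_2^2 \le (1-\varepsilon)k\right) \le \exp\left(-c(k\varepsilon^2)^{1/3}\right).$$
Then union bound over all $n(n-1)/2$ pairs of points.
The bound on $k$ is given by
$$2 \cdot \frac{n(n-1)}{2} \cdot \exp\left(-c(k\varepsilon^2)^{1/3}\right) < 1.$$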
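Solving this condition for $k$ shows where the third power of $\log n$ comes from (a short derivation by taking logarithms; $c$ is the unspecified constant of the concentration inequalities):
$$n(n-1)\,e^{-c(k\varepsilon^2)^{1/3}} < 1 \iff c\,(k\varepsilon^2)^{1/3} > \log\big(n(n-1)\big) \iff k > \frac{\log^3\big(n(n-1)\big)}{c^3\,\varepsilon^2},$$
and since $\log(n(n-1)) < 2\log n$, any $k \ge \frac{8}{c^3}\,\varepsilon^{-2}\log^3 n$ satisfies it.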
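Separation of the diagonal and the off-diagonal term
$$\|M_{a,k}D_\kappa x\|_2^2 = \sum_{j=0}^{k-1}\Big(\sum_{i=0}^{d-1} a_i\kappa_{j+i}x_{j+i}\Big)^2 = I + II,$$
$$I = \underbrace{\sum_{i=0}^{d-1} a_i^2 \cdot \sum_{j=0}^{k-1} x_{j+i}^2}_{\text{diagonal}}, \qquad II = \underbrace{\sum_{j=0}^{k-1}\sum_{i \ne i'} a_i a_{i'}\kappa_{j+i}\kappa_{j+i'}x_{j+i}x_{j+i'}}_{\text{off-diagonal}}$$
... summation in the index is modulo $d$ ...
$$P_{a,\kappa}\left(\|M_{a,k}D_\kappa x\|_2^2 \ge (1+\varepsilon)k\right) \le P_a\left(I \ge (1+\varepsilon/2)k\right) + P_{a,\kappa}\left(II \ge \varepsilon k/2\right)$$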
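The splitting into I and II is an exact identity for every realisation of $a$ and $\kappa$; it uses only $\kappa_i^2 = 1$. The following NumPy sketch, with arbitrary sizes and $D_\kappa$ assumed to be diagonal with random signs, evaluates both sides term by term with indices taken modulo $d$. The function name is hypothetical.

```python
import numpy as np

def split_I_II(a, kappa, x, k):
    """Return (||M_{a,k} D_kappa x||^2, I, II), all indices taken modulo d."""
    d = len(a)
    idx = (np.arange(k)[:, None] + np.arange(d)[None, :]) % d    # idx[j, i] = (j + i) mod d
    T = a[None, :] * kappa[idx] * x[idx]                          # T[j, i] = a_i kappa_{j+i} x_{j+i}
    lhs = float(np.sum(np.sum(T, axis=1) ** 2))                   # sum_j (sum_i ...)^2
    I = float(np.sum(a ** 2 * np.sum(x[idx] ** 2, axis=0)))       # sum_i a_i^2 sum_j x_{j+i}^2
    off = ~np.eye(d, dtype=bool)                                  # mask selecting i != i'
    II = float(sum(np.sum(np.outer(t, t)[off]) for t in T))       # off-diagonal products
    return lhs, I, II

rng = np.random.default_rng(8)
d, k = 32, 10
a = rng.standard_normal(d)
kappa = rng.choice(np.array([-1.0, 1.0]), size=d)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
lhs, I, II = split_I_II(a, kappa, x, k)
print(lhs, I + II)   # equal up to rounding
```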
Estimates of I: $P_a(I \ge (1+\varepsilon/2)k)$
Lemma of B. Laurent and P. Massart
... or any other variant of Bernstein's inequality
Exponential concentration of
$$Z = \sum_{i=1}^{D} \alpha_i (a_i^2 - 1),$$
where $a_i$ are i.i.d. normal variables and $\alpha_i$ are nonnegative real numbers. Then for any $t > 0$
$$P\left(Z \ge 2\|\alpha\|_2\sqrt{t} + 2\|\alpha\|_\infty t\right) \le \exp(-t),$$
$$P\left(Z \le -2\|\alpha\|_2\sqrt{t}\right) \le \exp(-t).$$
$\alpha_i := \sum_{j=0}^{k-1} x_{j+i}^2$, $\|\alpha\|_1 = k$, $\|\alpha\|_\infty \le 1$ and $\|\alpha\|_2 \le \sqrt{k}$.
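For a concrete unit vector $x$ the weights $\alpha_i$ and the three norm bounds can be checked directly. Since $\|\alpha\|_1 = k$, we have $Z = I - k$ (with $D = d$), so the Laurent-Massart bound turns into a tail estimate for I. The NumPy sketch below does this. Choosing $t$ so that $2\|\alpha\|_2\sqrt t + 2\|\alpha\|_\infty t = \varepsilon k/2$ is one way, assumed here, of matching the event $I \ge (1+\varepsilon/2)k$; the function names are hypothetical.

```python
import numpy as np

def alpha_weights(x, k):
    """alpha_i = sum_{j=0}^{k-1} x_{(j+i) mod d}^2 for i = 0, ..., d-1."""
    d = len(x)
    idx = (np.arange(k)[:, None] + np.arange(d)[None, :]) % d
    return np.sum(x[idx] ** 2, axis=0)

def laurent_massart_tail(alpha, k, eps):
    """exp(-t) with t chosen so that 2||alpha||_2 sqrt(t) + 2||alpha||_inf t = eps k / 2."""
    a2, ainf = np.linalg.norm(alpha), np.max(alpha)
    s = (-a2 + np.sqrt(a2 ** 2 + eps * k * ainf)) / (2 * ainf)   # positive root, s = sqrt(t)
    return float(np.exp(-s ** 2))

rng = np.random.default_rng(9)
d, k, eps = 1024, 500, 0.2
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                       # ||x||_2 = 1
alpha = alpha_weights(x, k)
print(alpha.sum(), alpha.max(), np.linalg.norm(alpha), np.sqrt(k))
print(laurent_massart_tail(alpha, k, eps))   # bound on P(I >= (1 + eps/2) k)
```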
Estimates of II:
Decoupling lemma of Bourgain and Tzafriri:
Let $\xi_0, \dots, \xi_{d-1}$ be independent random variables with $\mathbb{E}\xi_0 = \dots = \mathbb{E}\xi_{d-1} = 0$ and let $\{x_{i,j}\}_{i,j=0}^{d-1}$ be a double sequence of real numbers. Then for $1 \le p < \infty$
$$\mathbb{E}\Big|\sum_{i \ne j} x_{i,j}\xi_i\xi_j\Big|^p \le 4^p\,\mathbb{E}\Big|\sum_{i \ne j} x_{i,j}\xi_i\xi'_j\Big|^p,$$
where $(\xi'_0, \dots, \xi'_{d-1})$ denotes an independent copy of $(\xi_0, \dots, \xi_{d-1})$.
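For Rademacher variables ($\xi_i = \pm 1$ with probability 1/2) and small $d$, both expectations in the decoupling lemma are finite averages. They can be computed exactly by enumerating all sign patterns. The NumPy sketch below does this for one random coefficient matrix and several values of $p$. It only illustrates the inequality for one instance and is in no sense a proof; the choice of Rademacher variables and of $d = 8$ are assumptions of the example.

```python
import itertools
import numpy as np

def decoupling_sides(X, p):
    """Exact E|sum_{i!=j} x_ij xi_i xi_j|^p and E|sum_{i!=j} x_ij xi_i xi'_j|^p (Rademacher)."""
    d = X.shape[0]
    X0 = X - np.diag(np.diag(X))                                       # keep only i != j
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))  # all 2^d sign vectors
    coupled = np.einsum("si,ij,sj->s", signs, X0, signs)              # xi^T X0 xi
    decoupled = signs @ X0 @ signs.T                                   # xi^T X0 xi' for all pairs
    return np.mean(np.abs(coupled) ** p), np.mean(np.abs(decoupled) ** p)

rng = np.random.default_rng(10)
X = rng.standard_normal((8, 8))
for p in (1, 2, 4, 8):
    lhs, rhs = decoupling_sides(X, p)
    print(p, lhs, 4 ** p * rhs, lhs <= 4 ** p * rhs)
```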
Further tools: Khintchine's inequality (used twice) and
$$\Big(\mathbb{E}_{a,a'}\Big|\sum_{j=0}^{k-1} a_j a'_j\Big|^p\Big)^{1/p} \le \sqrt{p(k+p)}$$
for both Bernoulli and Gaussian variables.
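For Bernoulli (Rademacher) $a_j, a'_j$ the products $a_ja'_j$ are again independent Rademacher variables. So $\sum_{j=0}^{k-1} a_ja'_j = 2B - k$ with $B$ binomial$(k, 1/2)$, and the left-hand side can be computed exactly from the binomial distribution. The standard-library sketch below checks the stated bound on a grid of $k$ and $p$; it illustrates the Bernoulli case only, and the function name is hypothetical.

```python
import math

def moment_norm(k, p):
    """(E|sum_{j<k} a_j a'_j|^p)^(1/p) for Rademacher a, a': the sum is 2B - k, B ~ Bin(k, 1/2)."""
    total = sum(math.comb(k, b) * abs(2 * b - k) ** p for b in range(k + 1))
    return (total / 2 ** k) ** (1.0 / p)

violations = [(k, p) for k in range(1, 41) for p in range(1, 13)
              if moment_norm(k, p) > math.sqrt(p * (k + p))]
print(violations)
```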
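The role of $D_\kappa$
$k \le d$, $a_0, \dots, a_{d-1}$ independent standard normal variables,
$$x = \frac{1}{\sqrt d}(1, \dots, 1), \qquad \|M_{a,k}x\|_2^2 = k\Big(\sum_{j=0}^{d-1}\frac{a_j}{\sqrt d}\Big)^2.$$
2-stability:
$$b := \sum_{j=0}^{d-1}\frac{a_j}{\sqrt d} \sim N(0,1),$$
$$P_a\left(\|M_{a,k}x\|_2^2 > (1+\varepsilon)k\right) = P_b\left(b^2 > 1+\varepsilon\right)$$
depends neither on $k$ nor on $d$, so without $D_\kappa$ this probability cannot be made small by increasing $k$.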
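This computation can be reproduced numerically. For $x = \frac{1}{\sqrt d}(1,\dots,1)$ every row of $M_{a,k}$ sums all entries of $a$, so $M_{a,k}x$ is the constant vector $b(1,\dots,1)$. For $b \sim N(0,1)$ we have $P(b^2 > 1+\varepsilon) = \operatorname{erfc}\big(\sqrt{(1+\varepsilon)/2}\big)$. The NumPy sketch below, with arbitrary sizes, confirms the first fact for several $k$ and prints this failure probability, which does not shrink as $k$ grows.

```python
import math
import numpy as np

rng = np.random.default_rng(11)
d, eps = 1000, 0.1
a = rng.standard_normal(d)                     # a_0, ..., a_{d-1} standard normal
x = np.ones(d) / math.sqrt(d)                  # x = (1, ..., 1) / sqrt(d)
b = a.sum() / math.sqrt(d)                     # b ~ N(0, 1) by 2-stability
for k in (10, 100, 1000):
    M = a[(np.arange(d)[None, :] - np.arange(k)[:, None]) % d]   # M_{a,k}
    y = M @ x
    # every row sums all of a, so M_{a,k} x = b (1, ..., 1) and ||M_{a,k} x||^2 = k b^2
    print(k, np.allclose(y, b), np.isclose(y @ y, k * b ** 2))
# failure probability P(b^2 > 1 + eps), the same for every k and d
print(math.erfc(math.sqrt((1 + eps) / 2)))
```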
$\log^2 n$
Let
◮ $x_1, \dots, x_n$ be arbitrary points in $\mathbb{R}^d$,
◮ $\varepsilon \in (0, \tfrac12)$,
◮ $k = O(\varepsilon^{-2}\log^2 n)$,
◮ $a = (a_0, \dots, a_{d-1})$ be independent normally distributed variables,
◮ $M_{a,k}$ and $D_\kappa$ be as above and
◮ $f(x) = \frac{1}{\sqrt{k}}M_{a,k}D_\kappa x$.
Then with probability at least 2/3 the following holds:
$$(1-\varepsilon)\|x_i - x_j\|_2^2 \le \|f(x_i) - f(x_j)\|_2^2 \le (1+\varepsilon)\|x_i - x_j\|_2^2, \quad i, j = 1, \dots, n.$$
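Fourier methods
$\mathcal{F}$ - unitary discrete Fourier transform, $\mathcal{F} : \mathbb{C}^d \to \mathbb{C}^d$
Every circulant matrix may be diagonalised by $\mathcal{F}$ and $\mathcal{F}^{-1}$:
$$M_{a,d}x = \mathcal{F}\,\mathrm{diag}(\sqrt{d}\,\mathcal{F}a)\,\mathcal{F}^{-1}x.$$
The singular values are the square roots of the eigenvalues of
$$M_{a,d}M_{a,d}^* = \mathcal{F}\,\mathrm{diag}(\sqrt{d}\,\mathcal{F}a)\,\overline{\mathrm{diag}(\sqrt{d}\,\mathcal{F}a)}\,\mathcal{F}^{-1} = \mathcal{F}\,\mathrm{diag}(d|\mathcal{F}a|^2)\,\mathcal{F}^{-1},$$
i.e. $\sqrt{d}\,|\mathcal{F}a|$.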
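The statement about the singular values can be checked numerically. With NumPy's unnormalised FFT, $\sqrt d\,|\mathcal F a|$ (for the unitary $\mathcal F$) is simply `np.abs(np.fft.fft(a))`. The sketch below compares it, as a multiset, with the singular values of the full circulant matrix $M_{a,d}$ from a dense SVD. Sorting is needed because the two computations list the values in different orders.

```python
import numpy as np

rng = np.random.default_rng(12)
d = 64
a = rng.standard_normal(d)
M = a[(np.arange(d)[None, :] - np.arange(d)[:, None]) % d]    # M_{a,d}: entry (j, i) = a_{i-j}
sv_dense = np.sort(np.linalg.svd(M, compute_uv=False))
sv_fft = np.sort(np.abs(np.fft.fft(a)))                         # sqrt(d) |F a| with unitary F
print(np.allclose(sv_dense, sv_fft))
```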
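Strategy of the proof of $\log^2 n$
Concentration inequalities for all $\tilde x = \frac{x_i - x_j}{\|x_i - x_j\|_2}$:
$$P_a\left(\|M_{a,k}D_\kappa\tilde x\|_2^2 \ge 2(1+\varepsilon)k\right) \le \exp\left(-\frac{ck\varepsilon^2}{\log n}\right),$$
$$P_a\left(\|M_{a,k}D_\kappa\tilde x\|_2^2 \le 2(1-\varepsilon)k\right) \le \exp\left(-\frac{ck\varepsilon^2}{\log n}\right).$$
From this, the result follows again by a union bound.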
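Solving the union bound for $k$ shows where $\log^2 n$ comes from. Counting two one-sided events for each of the $n(n-1)/2$ pairs, one factor $\log n$ comes from the exponent above and one from the number of pairs:
$$n(n-1)\exp\Big(-\frac{ck\varepsilon^2}{\log n}\Big) < 1 \iff k > \frac{\log n \cdot \log\big(n(n-1)\big)}{c\,\varepsilon^2},$$
so, since $\log(n(n-1)) < 2\log n$, any $k \ge \frac{2}{c}\,\varepsilon^{-2}\log^2 n$ suffices.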
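Let $\|x\|_2 = 1$ and
$$y^j := S^j(D_\kappa x) \in \mathbb{C}^d, \quad j = 0, \dots, k-1,$$
where $S$ is the shift operator
$$S : \mathbb{C}^d \to \mathbb{C}^d, \quad S(z_0, \dots, z_{d-1}) = (z_1, \dots, z_{d-1}, z_0).$$
$Y$ ... $k \times d$ matrix with rows $y^0, \dots, y^{k-1}$. Note that $\|M_{a,k}D_\kappa x\|_2^2 = \|Ya\|_2^2$.
Hence,
$$P\left(\|M_{a,k}D_\kappa x\|_2^2 \ge (1+\varepsilon)k\right) = P\left(\|Ya\|_2^2 \ge (1+\varepsilon)k\right).$$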
Let $Y = U\Sigma V$ be the singular value decomposition of $Y$. Then
$$\|Ya\|_2^2 = \|U\Sigma Va\|_2^2 = \|\Sigma Va\|_2^2 = \|\Sigma b\|_2^2,$$
where $b := Va$ is a $k$-dimensional vector of independent normal variables. Hence,
$$P\left(\|M_{a,k}D_\kappa x\|_2^2 \ge (1+\varepsilon)k\right) = P\Big(\sum_{j=0}^{k-1}\lambda_j^2 b_j^2 \ge (1+\varepsilon)k\Big),$$
where $\lambda_j$ are the singular values of $Y$.
Lemma of B. Laurent and P. Massart: estimate $\|\lambda\|_4$ and $\|\lambda\|_\infty$!
$\|\lambda\|_2^2 = \|Y\|_F^2 = k$ and $\|\lambda\|_\infty^2 \le c\log n$ imply
$$\|\lambda\|_4^4 \le c\,k\log n.$$
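The reduction to $\sum_j \lambda_j^2 b_j^2$ and the norm identities can be checked on an example. The NumPy sketch below builds $Y$ from shifts of $D_\kappa x$, with $D_\kappa$ assumed diagonal with random signs. It verifies $Ya = M_{a,k}D_\kappa x$, the identity $\|\lambda\|_2^2 = \|Y\|_F^2 = k$, and the elementary inequality $\|\lambda\|_4^4 \le \|\lambda\|_\infty^2\|\lambda\|_2^2$ behind the last line. Whether $\|\lambda\|_\infty^2 \le c\log n$ holds is a probabilistic statement about $\kappa$ that the example does not test.

```python
import numpy as np

rng = np.random.default_rng(13)
d, k = 256, 40
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                                   # ||x||_2 = 1
kappa = rng.choice(np.array([-1.0, 1.0]), size=d)
z = kappa * x                                            # D_kappa x
Y = np.stack([np.roll(z, -j) for j in range(k)])         # row j = S^j(D_kappa x)
a = rng.standard_normal(d)
M = a[(np.arange(d)[None, :] - np.arange(k)[:, None]) % d]    # M_{a,k}
lam = np.linalg.svd(Y, compute_uv=False)                 # singular values lambda_j
print(np.allclose(Y @ a, M @ z))                         # even Ya = M_{a,k} D_kappa x
print(np.isclose(np.sum(lam ** 2), k))                   # ||lambda||_2^2 = ||Y||_F^2 = k
print(np.sum(lam ** 4) <= np.max(lam) ** 2 * k + 1e-9)   # ||lambda||_4^4 <= ||lambda||_inf^2 k
```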
References:
W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math., 26:189-206, 1984.
S. Dasgupta and A. Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, 22:60-65, 2003.
N. Ailon and B. Chazelle, Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In Proc. 38th Annual ACM Symposium on Theory of Computing, 2006.
A. Hinrichs and J. Vybíral, Johnson-Lindenstrauss lemma for circulant matrices, http://arxiv.org/abs/1001.4919
J. Vybíral, A variant of the Johnson-Lindenstrauss lemma for circulant matrices, http://arxiv.org/abs/1002.2847