Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform
Nir Ailon, Bernard Chazelle (Princeton University)
Dimension Reduction
Algorithmic metric embedding technique:
(R^d, ℓ_q) → (R^k, ℓ_p), k << d
Useful in algorithms requiring time/space exponential in d
Johnson-Lindenstrauss for ℓ_2
What is the exact complexity?
Dimension Reduction ApplicationsDimension Reduction Applications
Approximate nearest neighbor [KOR00, IM98]…Approximate nearest neighbor [KOR00, IM98]… Text analysis [PRTV98]Text analysis [PRTV98] Clustering [BOR99, S00]Clustering [BOR99, S00] Streaming [I00]Streaming [I00] Linear algebra [DKM05, DKM06]Linear algebra [DKM05, DKM06]
Matrix multiplication Matrix multiplication SVD computationSVD computation LL22 regression regression
VLSI layout Design [V98]VLSI layout Design [V98] Learning [AV99, D99, V98] . . .Learning [AV99, D99, V98] . . .
Three Quick Slides on: Approximate Nearest Neighbor Searching . . .
Approximate Nearest Neighbor
P = set of n points
Given a query x with nearest neighbor p_min in P, return any p ∈ P with
dist(x, p) ≤ (1+ε) dist(x, p_min)
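To fix notation, a minimal NumPy check of this definition (the function `is_approx_nn` and its interface are our own illustration, not part of the talk):

```python
import numpy as np

def is_approx_nn(x, P, p, eps):
    """Check that p is a (1+eps)-approximate nearest neighbor of the
    query x in the point set P (one point per row)."""
    dist_to_pmin = np.linalg.norm(P - x, axis=1).min()
    return np.linalg.norm(p - x) <= (1 + eps) * dist_to_pmin
```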
Approximate Nearest Neighbor
d can be very large; ε-approximation beats the "curse of dimensionality"
[IM98, H01] (Euclidean), [KOR00] (cube):
Time O(ε^{-2} d log n), Space n^{O(ε^{-2})}
Bottleneck: dimension reduction
Using FJLT: O(d log d + ε^{-3} log^2 n)
The d-Hypercube Case [KOR00]
- Binary search on distance ℓ ∈ [d]
- For distance ℓ, multiply space by a random matrix Φ ∈ Z_2^{k×d}, k = O(ε^{-2} log n), Φ_ij i.i.d. ~ biased coin
- Preprocess lookup tables for Φx (mod 2)
- Our observation: Φ can be made sparse
- Using a "handle" to p ∈ P s.t. dist(x, p) ≤ ℓ
- Time for each step: O(ε^{-2} d log n) ⇒ O(d + ε^{-2} log n)
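A hedged sketch of the ingredient visible on the slide: a random Φ ∈ Z_2^{k×d} with i.i.d. biased-coin entries, applied mod 2. The bias 1/(2ℓ) below is an illustrative choice tied to the distance ℓ being tested; [KOR00] fixes its own bias and constants:

```python
import numpy as np

def biased_gf2_matrix(k, d, ell, rng):
    """Phi in Z_2^{k x d}: i.i.d. biased-coin entries. The bias
    1/(2*ell) is illustrative for testing distance ell; the exact
    bias is the one chosen in [KOR00], not reproduced here."""
    return (rng.random((k, d)) < 1.0 / (2.0 * ell)).astype(np.uint8)

def gf2_sketch(Phi, x):
    """Compute Phi x (mod 2); precomputed lookup tables over blocks of
    this sketch give the per-step time improvement mentioned above."""
    return (Phi @ x) % 2
```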
How to make a similar improvement for ℓ_2?
Back to Euclidean Space and Johnson-Lindenstrauss . . .
History of Johnson-Lindenstrauss Dimension Reduction
[JL84]: Projection Φ of R^d onto a random subspace of dimension k = c ε^{-2} log n; w.h.p.:
∀ p_i, p_j ∈ P:
||Φ(p_i - p_j)||_2 = (1 ± O(ε)) ||p_i - p_j||_2
An ℓ_2 → ℓ_2 embedding
History of Johnson-Lindenstrauss Dimension Reduction
[FM87], [DG99]: Simplified proof, improved constant c
Φ ∈ R^{k×d}: a random orthogonal matrix, with rows Φ_1, ..., Φ_k satisfying ||Φ_i||_2 = 1 and Φ_i · Φ_j = 0
History of Johnson-Lindenstrauss Dimension Reduction
[IM98]: Φ ∈ R^{k×d}, Φ_ij i.i.d. ~ N(0, 1/d), so that E||Φ_i||_2^2 = 1 and E[Φ_i · Φ_j] = 0
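A minimal NumPy rendering of this construction (the √(d/k) output rescaling, which makes E||Φx||_2^2 = ||x||_2^2, is our choice):

```python
import numpy as np

def gaussian_jl(X, k, rng):
    """[IM98]-style projection: Phi_ij i.i.d. ~ N(0, 1/d), so each row
    Phi_i has E||Phi_i||_2^2 = 1. Rescaling by sqrt(d/k) makes the map
    norm-preserving in expectation."""
    n, d = X.shape
    Phi = rng.normal(0.0, 1.0 / np.sqrt(d), size=(k, d))
    return np.sqrt(d / k) * (X @ Phi.T)
```

With k = c ε^{-2} log n, pairwise ℓ_2 distances among the n rows are preserved to 1 ± O(ε) w.h.p.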
History of Johnson-Lindenstrauss Dimension Reduction
[A03]: Need only tight concentration of |Φ_i · v|^2
Φ ∈ R^{k×d}: Φ_ij i.i.d. ~ { +1 w.p. 1/2, -1 w.p. 1/2 }, normalized so that E||Φ_i||_2^2 = 1 and E[Φ_i · Φ_j] = 0
History of Johnson-Lindenstrauss Dimension Reduction
[A03]: Φ ∈ R^{k×d}: Φ_ij i.i.d. ~ sparse: { +1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6 }, again with E||Φ_i||_2^2 = 1 and E[Φ_i · Φ_j] = 0
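The same interface for the sparse [A03] distribution; the √(3/d) factor normalizes E[Φ_ij^2] to 1/d so that E||Φ_i||_2^2 = 1 as stated above (both rescalings are our choices):

```python
import numpy as np

def achlioptas_jl(X, k, rng):
    """[A03] sparse projection: entries +1 w.p. 1/6, 0 w.p. 2/3,
    -1 w.p. 1/6, scaled by sqrt(3/d) so that E[Phi_ij^2] = 1/d."""
    n, d = X.shape
    Phi = np.sqrt(3.0 / d) * rng.choice(
        [1.0, 0.0, -1.0], size=(k, d), p=[1/6, 2/3, 1/6])
    return np.sqrt(d / k) * (X @ Phi.T)
```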
Sparse Johnson-Lindenstrauss
Sparsity parameter: s = Pr[Φ_ij ≠ 0]
s cannot be o(1), due to a "hidden coordinate":
v = (0, 1, 0, ..., 0) ∈ R^d
(with o(1) sparsity, Φ misses v's single nonzero coordinate with high probability, so |Φ_i · v|^2 cannot concentrate)
Uncertainty Principle
v sparse ⇒ v̂ = Hv dense
H = Walsh-Hadamard matrix: the Fourier transform on {0,1}^{log_2 d}
- Computable in time O(d log d)
- Isometry: ||v̂||_2 = ||v||_2
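A standard in-place fast Walsh-Hadamard transform makes both properties concrete (assuming d is a power of two; the 1/√d normalization gives the isometry):

```python
import numpy as np

def fwht(v):
    """Normalized Walsh-Hadamard transform of v (len(v) a power of
    two), via the usual butterfly recurrence in O(d log d) time."""
    v = np.asarray(v, dtype=float).copy()
    d = len(v)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = v[i:i + h].copy(), v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v / np.sqrt(d)
```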
Adding RandomizationAdding Randomization
H deterministic, invertibleH deterministic, invertible)) We’re back to square one! We’re back to square one!
Precondition H with random diagonal DPrecondition H with random diagonal D
±1 ±1
±1
. . .D = - Computable in time O(d)- Isometry
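Combining the two steps (a sketch, reusing `fwht` from above):

```python
import numpy as np

def randomized_hadamard(v, rng):
    """Apply HD: an O(d) random sign flip (the diagonal D), then the
    O(d log d) Walsh-Hadamard transform. Still an isometry, but no
    longer deterministic: a fixed adversarial v cannot stay sparse."""
    signs = rng.choice([-1.0, 1.0], size=len(v))
    return fwht(signs * v)
```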
The ℓ_∞-Bound Lemma
W.h.p., ∀ p_i, p_j ∈ P ⊆ R^d:
||HD(p_i - p_j)||_∞ ≤ O(d^{-1/2} log^{1/2} n) ||p_i - p_j||_2
Rules out: HD(p_i - p_j) = "hidden coordinate vector"!
Instead...
Hidden Coordinate-Set
Worst-case v = p_i - p_j (assuming the ℓ_∞-bound; normalize ||v||_2 = 1):
∀ j ∈ J: |v_j| = Θ(d^{-1/2} log^{1/2} n)
∀ j ∉ J: v_j = 0
J ⊆ [d], |J| = Θ(d / log n)
Fast J-L Transform
FJLT = Φ = P · H · D
- D: Diag(±1)
- H: Hadamard
- P: Sparse JL, P_ij i.i.d. ~ { N(0, 1/s) w.p. s, 0 w.p. 1-s }
- ℓ_2 → ℓ_1: s = ε^{-1} log n / d; bottleneck: bias of |Φ_i · v|
- ℓ_2 → ℓ_2: s = log^2 n / d; bottleneck: variance of |Φ_i · v|^2
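Putting the three matrices together, a minimal sketch of Φ = P·H·D (reusing `fwht` from above; treating the sparsity s as an explicit parameter and the 1/√k output scaling are our choices, with all constants omitted):

```python
import numpy as np

def fjlt(v, k, s, rng):
    """Sketch of the FJLT Phi = P H D: D = random +/-1 diagonal (O(d)),
    H = normalized Walsh-Hadamard (O(d log d)), P = k x d sparse matrix
    with P_ij ~ N(0, 1/s) w.p. s and 0 otherwise. len(v) must be a
    power of two; s would be set as on the slide (e.g. ~log^2 n / d
    for the l2 -> l2 case)."""
    d = len(v)
    HDv = fwht(rng.choice([-1.0, 1.0], size=d) * v)
    mask = rng.random((k, d)) < s
    P = np.where(mask, rng.normal(0.0, 1.0 / np.sqrt(s), (k, d)), 0.0)
    return (P @ HDv) / np.sqrt(k)
```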
Applications
- Approximate nearest neighbor in (R^d, ℓ_2)
- ℓ_2 regression: minimize ||Ax - b||_2, A ∈ R^{n×d} over-constrained (d << n)
  - [DMM06]: approximate by sampling (non-constructive)
  - [Sarlos06]: using FJLT ⇒ constructive (a minimal sketch follows this list)
- More applications...?
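In that spirit, a minimal sketch-and-solve for the regression application (we use a dense Gaussian sketch for brevity; swapping in the FJLT above is what makes the projection step fast, per [Sarlos06]):

```python
import numpy as np

def sketched_lstsq(A, b, k, rng):
    """Solve min_x ||S(Ax - b)||_2 for a k x n random sketch S as a
    proxy for min_x ||Ax - b||_2, with k a small multiple of d."""
    n = A.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```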
Interesting Problem I
Improvement & lower bound for J-L computation
Interesting Problem II
Dimension reduction is sampling; sampling by random walk:
- Expander graphs for uniform sampling
- Convex bodies for volume estimation
[Kac59]: Random walk on the orthogonal group:
  for t = 1..T:
    pick i, j ∈_R [d], θ ∈_R [0, 2π)
    v_i ← v_i cos θ + v_j sin θ
    v_j ← -v_i sin θ + v_j cos θ
Output (v_1, ..., v_k) as the dimension reduction of v
How many steps T suffice for a J-L guarantee? [CCL01], [DS00], [P99] . . .
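A direct rendering of the walk (the simultaneous update needs the old value of v_i; the final √(d/k) rescaling, which preserves norms in expectation, is our addition):

```python
import numpy as np

def kac_walk_reduce(v, k, T, rng):
    """T steps of Kac's random walk on the orthogonal group, then keep
    the first k coordinates. How large T must be for a J-L guarantee
    is exactly the open question posed above."""
    v = np.asarray(v, dtype=float).copy()
    d = len(v)
    for _ in range(T):
        i, j = rng.choice(d, size=2, replace=False)
        theta = rng.uniform(0.0, 2.0 * np.pi)
        vi, vj = v[i], v[j]
        v[i] = vi * np.cos(theta) + vj * np.sin(theta)
        v[j] = -vi * np.sin(theta) + vj * np.cos(theta)
    return np.sqrt(d / k) * v[:k]
```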
Thank You!