Page 1: Faster Johnson-Lindenstrauss style reductions

Faster Johnson-Lindenstrauss style reductions

Aditya Menon

August 23, 2007

Page 2: Faster Johnson-Lindenstrauss style reductions

Outline

1 Introduction: Dimensionality reduction; The Johnson-Lindenstrauss Lemma; Speeding up computation

2 The Fast Johnson-Lindenstrauss Transform: Sparser projections; Trouble with sparse vectors?; Summary

3 Ailon and Liberty's improvement: Bounding the mapping; The Walsh-Hadamard transform; Error-correcting codes; Putting it together

4 References

Page 3: Faster Johnson-Lindenstrauss style reductions

Distances

For high-dimensional vector data, it is of interest to have a notion of distance between two vectors

Recall that the $\ell_p$ norm of a vector x is

$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$

The $\ell_2$ norm corresponds to the standard Euclidean norm of a vector

The $\ell_\infty$ norm is the maximal absolute value of any component:

$\|x\|_\infty = \max_i |x_i|$

Page 4: Faster Johnson-Lindenstrauss style reductions

Dimensionality reduction

Suppose we’re given an input vector x ∈ Rd

We want to reduce the dimensionality of x to some k < d ,while preserving the `p norm

Can think of this as a metric embedding problem - can weembed `d

p into `kp?

Formally, we have the following problem

Problem

Suppose we are given an x ∈ Rd , and some parameters p, ε. Canwe find a y ∈ Rk for some k = f (ε) so that

(1− ε)||x||p ≤ ||y||p ≤ (1 + ε)||x||p

Page 5: Faster Johnson-Lindenstrauss style reductions

The Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss Lemma [5] is the archetypal result for $\ell_2$ dimensionality reduction

Tells us that for n points, there is an ε-embedding of $\ell_2^d$ into $\ell_2^{O(\log n/\varepsilon^2)}$

Theorem

Suppose $\{u_i\}_{i=1}^{n} \subset \mathbb{R}^d$. Then, for ε > 0 and $k = O(\log n/\varepsilon^2)$, there is a mapping $f : \mathbb{R}^d \to \mathbb{R}^k$ so that

$(\forall i, j)\ (1-\varepsilon)\|u_i - u_j\|_2 \le \|f(u_i) - f(u_j)\|_2 \le (1+\varepsilon)\|u_i - u_j\|_2$

Page 6: Faster Johnson-Lindenstrauss style reductions

Johnson-Lindenstrauss in practice

The proof of the Johnson-Lindenstrauss lemma is non-constructive (unfortunately!)

In practice, we use the probabilistic method to do a Johnson-Lindenstrauss style reduction

Insert randomness at the cost of an exact guarantee

Now the guarantee becomes probabilistic

Page 7: Faster Johnson-Lindenstrauss style reductions

Johnson-Lindenstrauss in practice

Standard version:

Theorem

Suppose $\{u_i\}_{i=1}^{n} \subset \mathbb{R}^d$. Then, for ε > 0 and $k = O(\beta \log n/\varepsilon^2)$, the mapping $f(u_i) = \frac{1}{\sqrt{k}} u_i R$, where R is a d × k matrix of i.i.d. Gaussian variables, satisfies with probability at least $1 - \frac{1}{n^\beta}$,

$(\forall i, j)\ (1-\varepsilon)\|u_i - u_j\|_2 \le \|f(u_i) - f(u_j)\|_2 \le (1+\varepsilon)\|u_i - u_j\|_2$
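A minimal sketch of this standard construction (assuming numpy, with the constant in k taken to be 1 for illustration):

```python
import numpy as np

def jl_gaussian(X, eps, beta=1.0, rng=None):
    """Project the rows of X (n x d) down to k = beta * log(n) / eps^2
    dimensions with a dense i.i.d. Gaussian matrix, scaled by 1/sqrt(k)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    k = int(np.ceil(beta * np.log(n) / eps ** 2))
    R = rng.standard_normal((d, k))   # d x k, i.i.d. N(0, 1)
    return (X @ R) / np.sqrt(k)       # n x k embedding; costs O(ndk)
```

Note the O(ndk) cost of the dense matrix product - this is exactly what the rest of the talk tries to beat.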

Page 8: Faster Johnson-Lindenstrauss style reductions

Achlioptas’ improvement

Achlioptas [1] gave an ever simpler matrix construction:

Rij =√

3

+1 probability = 1

6

0 probability = 23

−1 probability = 16

23 rds sparse, and simpler to construct than a Gaussian matrix

With no loss in accuracy!
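A sketch of sampling this matrix (assuming numpy; the function name is illustrative):

```python
import numpy as np

def achlioptas_matrix(d, k, rng=None):
    """d x k matrix with entries sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}."""
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3) * signs

# Used exactly like the Gaussian version: y = (x @ achlioptas_matrix(d, k)) / np.sqrt(k)
```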

Page 9: Faster Johnson-Lindenstrauss style reductions

A question

2/3rds sparsity is a good speedup in practice

But the density is still O(dk), so computing the mapping is still an O(dk) operation asymptotically

Let

$\mathcal{A} = \{A : \forall \text{ unit } x \in \mathbb{R}^d, \text{ with v.h.p., } (1-\varepsilon) \le \|Ax\|_2 \le (1+\varepsilon)\}$

Question: For which $A \in \mathcal{A}$ can Ax be computed more quickly than O(dk)?

Page 10: Faster Johnson-Lindenstrauss style reductions

The answer?

We look at two approaches that allow for quicker computation

First is the Fast Johnson-Lindenstrauss transform, based on a Fourier transform

Next is the Ailon-Liberty transform, based on a Fourier transform and error-correcting codes!

Page 11: Faster Johnson-Lindenstrauss style reductions

The Fast Johnson-Lindenstrauss Transform

Ailon and Chazelle [2] proposed the Fast Johnson-Lindenstrauss transform

Can speed up $\ell_2$ reduction from O(dk) to (roughly) O(d log d)

How?

Make the projection matrix even sparser; need some "tricks" to solve the problems associated with this

Let's reverse engineer the construction...

Page 12: Faster Johnson-Lindenstrauss style reductions

Sparser projection matrix

Use the projection matrix

$P_{ij} \sim \begin{cases} N(0, \frac{1}{q}) & \text{with probability } q \\ 0 & \text{with probability } 1-q \end{cases}$

where

$q = \min\left\{\Theta\left(\frac{\log^2 n}{d}\right),\ 1\right\}$

The density of the matrix is $O\left(\frac{1}{\varepsilon^2}\min\{\log^3 n,\ d\log n\}\right)$

In practice, this is typically significantly sparser than Achlioptas' matrix
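A sketch of sampling P (assuming numpy, with the constant inside Θ(·) taken to be 1):

```python
import numpy as np

def sparse_gaussian_P(d, k, n, rng=None):
    """k x d projection whose entries are N(0, 1/q) w.p. q and 0 otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    q = min(np.log(n) ** 2 / d, 1.0)
    mask = rng.random((k, d)) < q                  # Bernoulli(q) sparsity pattern
    return np.where(mask, rng.normal(0.0, 1.0 / np.sqrt(q), size=(k, d)), 0.0)
```

In practice one would store P in a sparse format, so that applying it costs time proportional to its nonzeros.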

Page 13: Faster Johnson-Lindenstrauss style reductions

What do we lose?

Can follow standard concentration-proof methods

But we end up needing to assume that $\|x\|_\infty$ is bounded - namely, that information is spread out

We fail on vectors like x = (1, 0, ..., 0), i.e. sparse data and a sparse projection don't mix well

So are we forced to choose between generality and usefulness?

Not if we try to insert randomness...

Page 15: Faster Johnson-Lindenstrauss style reductions

A clever idea

Can we randomly transform x so that

$\|\Phi(x)\|_2 = \|x\|_2$, and $\|\Phi(x)\|_\infty$ is bounded with v.h.p.?

Answer: Yes! Use a Fourier transform Φ = F: it is distance preserving, and it has an "uncertainty principle" - a "signal" and its Fourier transform cannot both be concentrated

Use the FFT to give an O(d log d) random mapping

Details on the specifics in the next section...

Page 17: Faster Johnson-Lindenstrauss style reductions

Applying a Fourier transform

The Fourier transform will guarantee that

$\|x\|_\infty = \omega(1) \iff \|Fx\|_\infty = o(1)$

But now we will be in trouble if the input is uniformly distributed!

To deal with this, do a random sign change:

$x \mapsto Dx$

where D is a random diagonal ±1 matrix

Now we get a guarantee of spread with high probability, so the "random" Fourier transform gives us back generality

Page 18: Faster Johnson-Lindenstrauss style reductions

Random sign change

The sign change mapping Dx will give us

$Dx = \begin{bmatrix} d_1 x_1 \\ d_2 x_2 \\ \vdots \\ d_d x_d \end{bmatrix} = \begin{bmatrix} \pm x_1 \\ \pm x_2 \\ \vdots \\ \pm x_d \end{bmatrix}$

where each ± is attained with equal probability

Clearly norm preserving

Page 19: Faster Johnson-Lindenstrauss style reductions

Putting it together

So, we compute the mapping $f : x \mapsto P F(Dx)$

Runtime will be

$O\left(d\log d + \min\left\{\frac{d\log n}{\varepsilon^2},\ \frac{\log^3 n}{\varepsilon^2}\right\}\right)$

Under some loose conditions, the runtime is

$O(\max\{d\log d,\ k^3\})$

If $k \in [\Omega(\log d),\ O(\sqrt{d})]$, this is quicker than the O(dk) simple mapping

In practice, the upper bound is reasonable; the lower bound might not be, though
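A minimal end-to-end sketch of x ↦ P F(Dx) (assuming numpy, using the normalized Walsh-Hadamard transform as F - as the deck does in the next section - with d a power of two and all constants taken to be 1):

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform in O(d log d); d a power of two."""
    x = x.astype(float).copy()
    h, d = 1, len(x)
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)

def fjlt(x, k, n, rng=None):
    """Sketch of x -> P F(D x): random signs, fast transform, sparse projection."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(x)
    D = rng.choice([-1.0, 1.0], size=d)     # random diagonal +-1
    z = fwht(D * x)                         # spreads out the mass of x
    q = min(np.log(n) ** 2 / d, 1.0)
    mask = rng.random((k, d)) < q
    P = np.where(mask, rng.normal(0.0, 1.0 / np.sqrt(q), size=(k, d)), 0.0)
    return P @ z / np.sqrt(k)
```

The dense P @ z product is written for clarity; the claimed runtime requires storing P sparsely so the projection costs time proportional to its nonzeros.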

Page 20: Faster Johnson-Lindenstrauss style reductions

Summary

Tried increasing sparsity with disregard for generality

Used randomization to get back generality (probabilistically)

Key ingredient was a Fourier transform, with a randomization step first

Page 21: Faster Johnson-Lindenstrauss style reductions

Ailon and Liberty’s improvement

Ailon and Liberty [3] improved the runtime from O(d log d) to O(d log k), for $k = O(d^{1/2-\delta})$, δ > 0

Idea: Sparsity isn't the only way to speed up computation time

Can also speed up the runtime when the projection matrix has a special structure; so find a matrix with a convenient structure which will satisfy the JL property

Page 22: Faster Johnson-Lindenstrauss style reductions

Operator norm

We need something called the operator norm in our analysis

The operator norm of a transformation matrix A is

$\|A\|_{p\to q} = \sup_{\|x\|_p = 1} \|Ax\|_q$

i.e. the maximal q-norm of the transformation of unit $\ell_p$-norm points

A fact we will need to employ:

$\|A\|_{p_1\to p_2} = \|A^T\|_{q_2\to q_1}$

where $\frac{1}{p_1} + \frac{1}{q_1} = 1$ and $\frac{1}{p_2} + \frac{1}{q_2} = 1$

Page 23: Faster Johnson-Lindenstrauss style reductions

Reverse engineering

Let's say the mapping is a matrix multiplication

In particular, say we have a mapping of the form

$f : x \mapsto BDx$

where B is some k × d matrix with unit columns, and D is a diagonal matrix whose entries are randomly ±1

Doing a random sign change again

Now we just need to see what properties B must satisfy in order for

$\|BDx\|_2 \approx \|x\|_2$

Page 24: Faster Johnson-Lindenstrauss style reductions

Bounding the mapping

Easy to see that

$BDx = \begin{bmatrix} B_{11}d_1x_1 + \dots + B_{1d}d_dx_d \\ \vdots \\ B_{k1}d_1x_1 + \dots + B_{kd}d_dx_d \end{bmatrix}$

Write this as BDx = Mz, where

$M^{(i)} = x_i B^{(i)}$ (the i-th column of M is $x_i$ times the i-th column of B), and $z = [d_1 \dots d_d]^T$

There is a special name for a vector like Mz...
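A quick numeric check of this rewriting (assuming numpy; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 3, 5
B = rng.standard_normal((k, d))
x = rng.standard_normal(d)
z = rng.choice([-1.0, 1.0], size=d)   # the diagonal of D

M = B * x                             # column i of M is x_i * B^(i)
assert np.allclose(B @ (z * x), M @ z)
```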

Page 25: Faster Johnson-Lindenstrauss style reductions

Rademacher series

Definition

If M is an arbitrary k × d real matrix, and $z \in \mathbb{R}^d$ is such that

$z_i = \begin{cases} +1 & \text{w.p. } 1/2 \\ -1 & \text{w.p. } 1/2 \end{cases}$

then Mz is called a Rademacher random variable. This is a vector whose entries are arbitrary sums/differences of the entries in the rows of M.

Such a variable is interesting because of a powerful theorem...

Page 26: Faster Johnson-Lindenstrauss style reductions

Talagrand’s theorem

Theorem

Suppose M, z are as above. Let $Z = \|Mz\|_p$, and let

$\sigma = \|M\|_{2\to p}, \qquad \mu = \mathrm{median}(Z)$

Then,

$\Pr[|Z - \mu| > t] \le 4e^{-t^2/8\sigma^2}$

(see [6])

σ (the "deviation") is the maximal p-norm of the image of unit $\ell_2$-norm points

The theorem says that the norm of a Rademacher variable is sharply concentrated about its median
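A small simulation illustrating the concentration for p = 2 (assuming numpy; the matrix and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 32, 1024
M = rng.standard_normal((k, d)) / np.sqrt(d)   # a fixed, arbitrary matrix
sigma = np.linalg.norm(M, ord=2)               # ||M||_{2->2}, the deviation

Z = np.array([np.linalg.norm(M @ rng.choice([-1.0, 1.0], size=d))
              for _ in range(5000)])
mu = np.median(Z)
for c in (2.0, 4.0, 8.0):
    emp = np.mean(np.abs(Z - mu) > c * sigma)
    print(f"t = {c}*sigma: empirical tail {emp:.4f}, bound {4 * np.exp(-c ** 2 / 8):.4f}")
```

The empirical tails sit well below the (loose) theoretical bound.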

Page 27: Faster Johnson-Lindenstrauss style reductions

Implications for us

Our mapping, BDx, has given us a Rademacher random variable

We know that we can apply Talagrand's theorem to get a concentration result

So, all we need to do is find out what the median and deviation are...

Page 28: Faster Johnson-Lindenstrauss style reductions

Deviation

Let $Y = \|BDx\|_2 = \|Mz\|_2$. The deviation is

$\sigma = \sup_{\|y\|_2=1} \|y^T M\|_2 = \sup_{\|y\|_2=1} \left( \sum_{i=1}^{d} x_i^2 \left(y^T B^{(i)}\right)^2 \right)^{1/2} \le \|x\|_4 \sup_{\|y\|_2=1} \left( \sum_{i=1}^{d} \left(y^T B^{(i)}\right)^4 \right)^{1/4} = \|x\|_4 \|B^T\|_{2\to4}$

where the inequality is by Cauchy-Schwarz

Page 29: Faster Johnson-Lindenstrauss style reductions

What do we need?

So, $\sigma \le \|x\|_4 \|B^T\|_{2\to4}$

Fact: $|1 - \mu| \le \sqrt{3/2}\,\sigma$

Can combine these to get

$\Pr[|Y - 1| > t] \le c_0 e^{-c_1 t^2 / (\|x\|_4^2 \|B^T\|_{2\to4}^2)}$

Result: We need to control both $\|x\|_4$ and $\|B^T\|_{2\to4}$, i.e. we want them both to be small

If we manage this, we've got our concentration bound

Page 31: Faster Johnson-Lindenstrauss style reductions

The two ingredients

To get the concentration bound, we need to ensure that $\|x\|_4$ and $\|B^T\|_{2\to4}$ are sufficiently small

How to control $\|x\|_4$? Use repeated Fourier/Walsh-Hadamard transforms

How to control $\|B^T\|_{2\to4}$? Use error-correcting codes

Page 34: Faster Johnson-Lindenstrauss style reductions

Controlling ||x||4

Problem: Input x is "adversarial" - so how do we make $\|x\|_4$ small?

Solution: Use an isometric mapping Φ, with a guarantee that $\|\Phi x\|_4$ is small with very high probability

Problem: What is such a Φ?

(Final!) Solution: Back to the Fourier transform!

Page 38: Faster Johnson-Lindenstrauss style reductions

The Discrete Fourier transform

The Discrete Fourier transform of $a_0, a_1, \dots, a_{N-1}$ is

$a_k \mapsto \sum_{n=0}^{N-1} a_n e^{-2\pi i kn/N} = \sum_{n=0}^{N-1} a_n \left(e^{-2\pi i k/N}\right)^n$

Can think of it as a polynomial evaluation - if

$P(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{N-1} x^{N-1}$

then we have

$a_k \mapsto P\left(e^{-2\pi i k/N}\right)$
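A quick numeric confirmation of the polynomial-evaluation view (assuming numpy; np.polyval wants coefficients highest-degree first, hence the reversal):

```python
import numpy as np

a = np.array([1.0, 2.0, 0.0, -1.0])          # arbitrary coefficients a_0, ..., a_3
N = len(a)
w = np.exp(-2j * np.pi * np.arange(N) / N)    # the evaluation points e^{-2 pi i k / N}
as_poly = np.array([np.polyval(a[::-1], wk) for wk in w])
assert np.allclose(as_poly, np.fft.fft(a))    # agrees with the DFT
```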

Page 39: Faster Johnson-Lindenstrauss style reductions

The finite-field Fourier transform

Notice that $\omega_k = e^{-2\pi i k/N} \ne 1$ satisfies $(\omega_k)^N = 1$

$\omega_k$ is a primitive root of 1

The transform is

$a_k \mapsto P(\omega^k)$

for any primitive root ω

Page 40: Faster Johnson-Lindenstrauss style reductions

The multi-dimensional Fourier transform

We can also consider the transform of multi-dimensional data

1-D case:

$a_k \mapsto \sum_{n=0}^{N-1} a_n \omega^{kn}$

υ-D case: If $\mathbf{n} = (n_1, \dots, n_\upsilon)$,

$a_{\mathbf{k}} \mapsto \sum_{n_1,\dots,n_\upsilon=0}^{N-1} a_{\mathbf{n}}\, \omega^{\mathbf{k}\cdot\mathbf{n}}$

Page 41: Faster Johnson-Lindenstrauss style reductions

The Walsh-Hadamard transform

Consider the case N = 2, ω = −1 [7]:

$a_{k_1,k_2} \mapsto \sum_{n_1,n_2=0}^{1} a_{n_1,n_2} (-1)^{k_1 n_1 + k_2 n_2}$

This is called the Walsh-Hadamard transform

Intuition: Instead of using sinusoidal basis functions, use square-wave functions

The square waves are called Walsh functions

Why not the standard discrete FT? We use a technical property of the Walsh-Hadamard transform matrix...

Page 42: Faster Johnson-Lindenstrauss style reductions

Fourier transform on the binary hyper-cube

Suppose we work with $\mathbb{F}_2 = \{0, 1\}$

We can encode the Fourier transform with the Walsh-Hadamard matrix $H_d$,

$H_d(i, j) = \frac{1}{2^{d/2}} (-1)^{\langle i-1,\, j-1 \rangle}$

where $\langle i, j \rangle$ is the dot-product of i, j as expressed in binary

Fact:

$H_d = \frac{1}{\sqrt{2}} \begin{bmatrix} H_{d/2} & H_{d/2} \\ H_{d/2} & -H_{d/2} \end{bmatrix}$

Corollary: We can compute $H_d x$ in O(d log d) time
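A sketch of the recursive structure (assuming numpy; building the dense matrix is for illustration only - the O(d log d) claim is about applying the transform without materializing $H_d$):

```python
import numpy as np

def hadamard(d):
    """Normalized Walsh-Hadamard matrix via the block recursion; d a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < d:
        H = np.block([[H, H], [H, -H]]) / np.sqrt(2)
    return H

H4 = hadamard(4)                           # matches the d = 4 example on the next slide
assert np.allclose(H4 @ H4.T, np.eye(4))   # rows are orthonormal
```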

Page 43: Faster Johnson-Lindenstrauss style reductions

Example of Hadamard matrix

When d = 4, we get

$H_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$

Note the entries are always ±1 (up to the normalization)

Page 44: Faster Johnson-Lindenstrauss style reductions

Fourier again?

Let $\Phi : x \mapsto H_d D_0 x$, with $D_0$ as before a random diagonal ±1 matrix

Already know that it will preserve the $\ell_2$ norm

But is $\|\Phi(x)\|_4$ small?

Answer: Yes - by another application of Talagrand's theorem!

Page 46: Faster Johnson-Lindenstrauss style reductions

Towards Talagrand

Need σ, µ for Talagrand's theorem

Write Φ(x) = Mz as before, where $M^{(i)} = x_i H^{(i)}$

Estimate the deviation:

$\sigma = \|M\|_{2\to4} = \|M^T\|_{4/3\to2} \text{ (from the earlier fact)} \le \left(\sum x_i^4\right)^{1/4} \sup_{\|y\|_{4/3}=1} \left( \sum \left(y^T H^{(i)}\right)^4 \right)^{1/4} = \|x\|_4 \|H\|_{4/3\to4}$

Page 47: Faster Johnson-Lindenstrauss style reductions

Some magic

We now employ the following theorem [4]

Theorem

(Hausdorff-Young) For any p ∈ [1, 2], if H is the Hadamard matrix and $\frac{1}{p} + \frac{1}{q} = 1$, then

$\|H\|_{p\to q} \le \sqrt{d} \cdot d^{-1/p}$

As a result, for p = 4/3,

$\sigma \le \|x\|_4\, d^{-1/4}$

Further, we have the following fact (see [3] for proof!)...

Fact: $\mu = O(d^{-1/4})$

Page 48: Faster Johnson-Lindenstrauss style reductions

Getting the desired result

With the above σ, µ, an application of Talagrand, along with the assumption $k = O(d^{1/2-\delta})$, reveals

$\|HD_0x\|_4 \le c_0 d^{-1/4} + c_1 d^{-\delta/2} \|x\|_4$

If we compose the mapping,

$\|HD_1(HD_0x)\|_4 \le c_0 d^{-1/4} + c_0 c_1 d^{-1/4-\delta/2} + c_1^2 d^{-\delta} \|x\|_4$

If we repeat this $r = \frac{1}{2\delta}$ times,

$\|HD_{r-1} HD_{r-2} \dots HD_0 x\|_4 = O\left(d^{-1/4}\right)$

Page 49: Faster Johnson-Lindenstrauss style reductions

Our resultant transform

To control $\|x\|_4$, use the composed transform

$\Phi^{(r)} : x \mapsto HD_{r-1} HD_{r-2} \dots HD_0 x$

We manage to preserve $\|x\|_2$, and contract $\|\Phi^{(r)}x\|_4$

Runtime is $O\left(\frac{d \log d}{\delta}\right)$
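A sketch of $\Phi^{(r)}$ (assuming numpy; fwht is the same O(d log d) Walsh-Hadamard helper as in the earlier FJLT sketch, repeated here so the snippet stands alone):

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform; len(x) a power of two."""
    x = x.astype(float).copy()
    h, d = 1, len(x)
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)

def phi_r(x, delta, rng=None):
    """Apply H D_{r-1} ... H D_0 with r = ceil(1 / (2 delta)) rounds."""
    rng = np.random.default_rng() if rng is None else rng
    r = int(np.ceil(1.0 / (2.0 * delta)))
    for _ in range(r):
        x = fwht(rng.choice([-1.0, 1.0], size=len(x)) * x)
    return x
```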

Page 50: Faster Johnson-Lindenstrauss style reductions

Error-correcting codes

The Hadamard matrix also has a connection to error-correcting codes

Such codes look to represent one's message in such a way that it can be decoded correctly even if there are some errors during transmission

Suppose we want to send out a message to a decoder in a way that allows for at most d errors

i.e. we can recover from d or fewer errors in the transmission

Fact: By choosing our "code-words" from the matrix $\begin{bmatrix} H_{2d} \\ -H_{2d} \end{bmatrix}$, where −1 ↦ 0, we can correct up to d errors

Page 51: Faster Johnson-Lindenstrauss style reductions

Code matrix

An m × d matrix A is called a code matrix if

$A = \sqrt{\frac{d}{m}} \begin{bmatrix} H_d(i_1, :) \\ H_d(i_2, :) \\ \vdots \\ H_d(i_m, :) \end{bmatrix}$

i.e. we pick out only m of the d rows of the Hadamard matrix (rescaled so the columns are unit norm)

Page 52: Faster Johnson-Lindenstrauss style reductions

Independence in codes

A code matrix is called a-wise independent if exactly $\frac{d}{2^a}$ of its columns agree in any a places

Independence is very useful for us:

Theorem

Suppose B is a k × d, 4-wise independent code matrix. Then,

$\|B^T\|_{2\to4} = O\left(\frac{d^{1/4}}{\sqrt{k}}\right)$

Page 53: Faster Johnson-Lindenstrauss style reductions

Proof of theorem

Recall that we need to bound

$\|B^T\|_{2\to4} = \sup_{\|y\|_2=1} \|y^T B\|_4$

Consider:

$\|y^T B\|_4^4 = d\,\mathbb{E}\left[\left(y^T B^{(j)}\right)^4\right] = \frac{d}{k^2} \sum_{i_1} \sum_{i_2} \sum_{i_3} \sum_{i_4} \mathbb{E}[y_{i_1} y_{i_2} y_{i_3} y_{i_4} b_1 b_2 b_3 b_4] = \frac{d}{k^2} \left(3\|y\|_2^4 - 2\|y\|_4^4\right) \le \frac{3d}{k^2}$

Consequently,

$\|B^T\|_{2\to4} \le \frac{(3d)^{1/4}}{\sqrt{k}}$

Page 54: Faster Johnson-Lindenstrauss style reductions

Making our matrix

We're set if we can get a k × d, 4-wise independent code matrix

Problem: How do we make such a matrix?

Fact: There exists a 4-wise independent code matrix of size k × BCH(k), where BCH(k) = Θ(k²)

Called the BCH code matrix

Which is good, because...

Fact: By padding and "copy-pasting", we retain independence. In particular, we can construct a k × d matrix from a k × BCH(k) matrix:

$B = [\underbrace{B_{BCH}\ B_{BCH}\ \dots\ B_{BCH}}_{d/BCH(k) \text{ copies}}]$

Page 57: Faster Johnson-Lindenstrauss style reductions

Time to make matrix

Time to compute the mapping x ↦ Bx?

We have to do d/BCH(k) mappings $B_{BCH} x_{BCH}$

Each such mapping can be done via a Walsh-Hadamard transform, by the construction of BCH codes

Takes time O(BCH(k) log BCH(k))

Total runtime is therefore O(d log k)
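A dense stand-in for this blocked product (assuming numpy; in the real construction each B_bch @ block is itself done via a fast Walsh-Hadamard-style transform):

```python
import numpy as np

def apply_tiled_code_matrix(B_bch, x):
    """Compute B x for B = [B_bch B_bch ... B_bch]: a sum of per-block products.
    Assumes B_bch is k x m with m dividing len(x)."""
    k, m = B_bch.shape
    return sum(B_bch @ x[i:i + m] for i in range(0, len(x), m))
```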

Page 58: Faster Johnson-Lindenstrauss style reductions

Merging results

Use the randomized Fourier transform to keep $\|x\|_4$ small

O(d log d) time

Use the error-correcting code matrix to keep $\|B^T\|_{2\to4}$ small

O(d log k) time

Result: We get the concentration bound!

Page 59: Faster Johnson-Lindenstrauss style reductions

Runtime

Runtime is still going to be O(d log d)

Question: Can we speed up the computation of $\Phi^{(r)}$?

Answer: Yes - use the same "block" idea as with the error-correcting codes

Some rather technical calculation reveals this will still work

Page 61: Faster Johnson-Lindenstrauss style reductions

Blocked transform

Choose $\beta = BCH(k) \cdot k^\delta = \Theta(k^{2+\delta})$

Let

$H = \begin{bmatrix} H_1 & & & \\ & H_2 & & \\ & & \ddots & \\ & & & H_{d/\beta} \end{bmatrix}$

where each $H_i$ is of size β × β

Fact: The above mapping can replace $\Phi^{(r)}$

The mapping HD′x can be computed in time O(d log k), so our total runtime is O(d log k)
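A sketch of the block-diagonal operator (assuming scipy; scipy.linalg.hadamard returns the unnormalized ±1 matrix, and the dense construction is for illustration only - in practice each block is applied by a fast transform):

```python
import numpy as np
from scipy.linalg import block_diag, hadamard

def blocked_hadamard(d, beta):
    """Block-diagonal matrix with d/beta normalized beta x beta Hadamard blocks.
    Assumes beta is a power of two dividing d."""
    Hb = hadamard(beta) / np.sqrt(beta)
    return block_diag(*([Hb] * (d // beta)))
```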

Page 62: Faster Johnson-Lindenstrauss style reductions

A tabular comparison

Runtimes of the three approaches (standard JL, Fast JLT, and Ailon-Liberty), ranked fastest to slowest in each regime of k (from [3]):

k = o(log d):                              AL, then JL, then FJLT
k ∈ [ω(log d), o(poly(d))]:                AL, then FJLT, then JL
k ∈ [Ω(poly(d)), o((d log d)^(1/3))]:      AL and FJLT (tied), then JL
k ∈ [ω((d log d)^(1/3)), O(d^(1/2−δ))]:    AL, then FJLT, then JL

Page 63: Faster Johnson-Lindenstrauss style reductions

Conclusion

$\ell_2$ dimensionality reduction is based on the Johnson-Lindenstrauss lemma

The standard approach takes O(dk) time to perform the reduction

By sparsifying, and compensating with a randomized Fourier transform, we can reduce the runtime to roughly O(d log d) via the Fast Johnson-Lindenstrauss transform [2]

By using error-correcting codes and a randomized Fourier transform, we can reduce the runtime to roughly O(d log k) via Ailon and Liberty's transform [3]

Open questions: Can one extend this to $k = O(d^{1-\delta})$? To k = Ω(d)?

Page 64: Faster Johnson-Lindenstrauss style reductions

References

[1] Achlioptas, D.
Database-friendly random projections.
In PODS '01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA, 2001), ACM Press, pp. 274-281.

[2] Ailon, N., and Chazelle, B.
Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform.
In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2006), ACM Press, pp. 557-563.

[3] Ailon, N., and Liberty, E.
Fast dimension reduction using Rademacher series on dual BCH codes.
Tech. Rep. TR07-070, Electronic Colloquium on Computational Complexity, 2007.

[4] Bergh, J., and Löfström, J.
Interpolation Spaces.
Springer-Verlag, 1976.

Page 65: Faster Johnson-Lindenstrauss style reductions

[5] Johnson, W., and Lindenstrauss, J.
Extensions of Lipschitz mappings into a Hilbert space.
In Conference in Modern Analysis and Probability (Providence, RI, USA, 1984), American Mathematical Society, pp. 189-206.

[6] Ledoux, M., and Talagrand, M.
Probability in Banach Spaces: Isoperimetry and Processes.
Springer, 2006.

[7] Massey, J. L.
Design and analysis of block ciphers.
http://www.win.tue.nl/math/eidma/courses/minicourses/massey/dabcmay2000f3.pdf, May 2000.
Presentation.

