High Dimensional Spaces

Foundations of Data Science Course

Ramesh Hariharan

Jan 2014

Ramesh Hariharan High Dimensional Spaces

What is Volume?

Volume of a cuboid with sides $l_1, \ldots, l_n$ is $l_1 \cdot l_2 \cdots l_n$

For a general object, integrate:

Decompose the object into infinitesimal n-dimensional cuboids

Count the number of such cuboids

Scaling each dimension by $r$ multiplies volume by $r^n$.

Volume of an n-Dimensional Sphere

$V_n(r) = f_n \, r^n$ for radius $r$

$f_1 = 2$

$f_2 = \pi$

$f_3 = \frac{4}{3}\pi$

Does $f_n$ increase or decrease with $n$?

Inductive View of $f_n$

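Inductive Derivation for $f_n$

$$f_n = 2 f_{n-1} \int_0^{\pi/2} \sin^n(\theta)\, d\theta, \quad n \ge 2$$

$$f_1 = 2$$

$$f_n = 2^n \int_0^{\pi/2} \sin^n(\theta)\, d\theta \int_0^{\pi/2} \sin^{n-1}(\theta)\, d\theta \cdots \int_0^{\pi/2} \sin^1(\theta)\, d\theta$$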
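The recurrence can be checked numerically. The following is a minimal Python sketch, not part of the original slides: it approximates each sine integral with a midpoint rule and unrolls the recurrence starting from f_1 = 2. The helper names `sine_power_integral` and `unit_ball_coefficient` and the step count are illustrative assumptions.

```python
import math


def sine_power_integral(n: int, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of sin^n(theta) over [0, pi/2]."""
    h = (math.pi / 2) / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))


def unit_ball_coefficient(n: int) -> float:
    """f_n obtained by unrolling f_n = 2 * f_{n-1} * I_n from f_1 = 2."""
    f = 2.0
    for k in range(2, n + 1):
        f = 2.0 * f * sine_power_integral(k)
    return f


for n in range(1, 5):
    print(n, unit_ball_coefficient(n))
```

For n = 2, 3, 4 the printed values should come out close to π, 4π/3 and π²/2 respectively.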

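Volume of a 1, 2, 3, 4-Dimensional Sphere

$$f_1 = 2$$

$$f_2 = 2^2 \int_0^{\pi/2} \sin^2(\theta)\, d\theta \int_0^{\pi/2} \sin^1(\theta)\, d\theta = \pi$$

$$f_3 = 2^3 \int_0^{\pi/2} \sin^3(\theta)\, d\theta \int_0^{\pi/2} \sin^2(\theta)\, d\theta \int_0^{\pi/2} \sin^1(\theta)\, d\theta = \frac{4}{3}\pi$$

$$f_4 = 2^4 \int_0^{\pi/2} \sin^4(\theta)\, d\theta \int_0^{\pi/2} \sin^3(\theta)\, d\theta \int_0^{\pi/2} \sin^2(\theta)\, d\theta \int_0^{\pi/2} \sin^1(\theta)\, d\theta = \frac{\pi^2}{2}$$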
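As a worked check of the 4-dimensional case, one can substitute the values of the four integrals. These values follow from elementary integration and are not written out on this slide.

```latex
\begin{align*}
\int_0^{\pi/2}\sin\theta\,d\theta &= 1, &
\int_0^{\pi/2}\sin^2\theta\,d\theta &= \frac{\pi}{4}, \\
\int_0^{\pi/2}\sin^3\theta\,d\theta &= \frac{2}{3}, &
\int_0^{\pi/2}\sin^4\theta\,d\theta &= \frac{3\pi}{16}, \\
f_4 &= 2^4 \cdot \frac{3\pi}{16} \cdot \frac{2}{3} \cdot \frac{\pi}{4} \cdot 1 = \frac{\pi^2}{2}.
\end{align*}
```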

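Sine Power Integrals

$$\int_0^{\pi/2} \sin^n(\theta)\, d\theta = \frac{n-1}{n} \int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta$$

$$\int_0^{\pi/2} \sin^n(\theta)\, d\theta = \frac{n-1}{n} \cdot \frac{n-3}{n-2} \cdots \frac{1}{2} \cdot \frac{\pi}{2}, \quad \text{for even } n$$

$$\int_0^{\pi/2} \sin^n(\theta)\, d\theta = \frac{n-1}{n} \cdot \frac{n-3}{n-2} \cdots \frac{2}{3}, \quad \text{for odd } n$$

$$\int_0^{\pi/2} \sin^n(\theta)\, d\theta \int_0^{\pi/2} \sin^{n-1}(\theta)\, d\theta = \frac{\pi}{2n}$$

$$\sqrt{\frac{\pi}{2(n+1)}} \le \int_0^{\pi/2} \sin^n(\theta)\, d\theta \le \sqrt{\frac{\pi}{2n}}$$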
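These identities are easy to sanity-check in code. The sketch below (an illustration, not from the slides) evaluates the integral through the reduction formula, starting from the base values π/2 for n = 0 and 1 for n = 1, then tests the product identity and the two-sided square-root bound. The function name `sine_power_integral` is an assumption made for this example.

```python
import math


def sine_power_integral(n: int) -> float:
    """Integral of sin^n over [0, pi/2] via I_n = (n-1)/n * I_{n-2}."""
    value = math.pi / 2 if n % 2 == 0 else 1.0  # I_0 = pi/2, I_1 = 1
    for k in range(2 + n % 2, n + 1, 2):
        value *= (k - 1) / k
    return value


for n in (1, 2, 5, 10, 50):
    lower = math.sqrt(math.pi / (2 * (n + 1)))
    upper = math.sqrt(math.pi / (2 * n))
    product = sine_power_integral(n) * sine_power_integral(n - 1)
    in_bounds = lower <= sine_power_integral(n) <= upper
    product_ok = math.isclose(product, math.pi / (2 * n))
    print(n, in_bounds, product_ok)
```

The bound follows from the product identity because the integrals decrease with n, so the square of the n-th integral lies between two consecutive products.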

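The Formula for $f_n$

$$f_n = \frac{\pi^{n/2}}{\left(\frac{n}{2}\right)!}, \quad \text{for even } n$$

$$f_n = \frac{\pi^{(n-1)/2}}{\frac{n}{2}\left(\frac{n}{2}-1\right)\cdots\frac{1}{2}}, \quad \text{for odd } n$$

$f_n \to 0$ as $n \to \infty$!

The biggest unit sphere sits in 5-d!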
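Both claims can be checked with a few lines of Python. The sketch below uses the standard Gamma-function form π^(n/2) / Γ(n/2 + 1), which agrees with the even and odd cases above; writing the two cases as one formula is a choice made for this example and does not appear on the slide.

```python
import math


def unit_ball_volume(n: int) -> float:
    """f_n written with the Gamma function: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


volumes = {n: unit_ball_volume(n) for n in range(1, 21)}
largest = max(volumes, key=volumes.get)
print("largest unit-ball volume at n =", largest)
print("f_20 =", volumes[20])
```

Scanning dimensions 1 through 20 locates the maximum at n = 5, and the later values shrink rapidly toward zero.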

The Unit Sphere vs the Unit Cube

Corners of a unit cube are distance $\frac{\sqrt{n}}{2}$ from the origin

Center points of each side are distance $\frac{1}{2}$ from the origin

It looks like this
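The two distances can be computed directly from coordinates. This short sketch assumes a cube of side 1 centered at the origin, which is what the stated distances imply; the helper names are illustrative.

```python
import math


def distance_from_origin(point):
    """Euclidean norm of a point given as a list of coordinates."""
    return math.sqrt(sum(x * x for x in point))


def cube_corner(n):
    """One corner of the side-1 cube centered at the origin."""
    return [0.5] * n


def cube_face_center(n):
    """Center of one face: half a unit along a single axis."""
    return [0.5] + [0.0] * (n - 1)


for n in (2, 4, 16, 100):
    print(n, distance_from_origin(cube_corner(n)), distance_from_origin(cube_face_center(n)))
```

For n = 4 the corners sit at distance 1, and for larger n they lie outside a sphere of radius 1 around the origin, while the face centers stay at distance 1/2.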

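Where is the Volume Concentrated?

How much of the volume is located outside a band of angle $2\alpha$ around the equator?

$$\frac{\int_0^{\pi/2-\alpha} \sin^n(\theta)\, d\theta}{\int_0^{\pi/2} \sin^n(\theta)\, d\theta}$$

Denominator: $\int_0^{\pi/2} \sin^n(\theta)\, d\theta \ge \sqrt{\frac{\pi}{2(n+1)}}$

Numerator: $\int_0^{\pi/2-\alpha} \sin^n(\theta)\, d\theta \le\ ?$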

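$\int_0^{\pi/2-\alpha} \sin^n(\theta)\, d\theta \le\ ?$

$$\begin{aligned}
\int_0^{\pi/2-\alpha} \sin^n(\theta)\, d\theta
&= \int_{\sin^2\alpha}^{1} \frac{1}{2\sqrt{y}}\, (1-y)^{\frac{n-1}{2}}\, dy, \quad y = \cos^2(\theta) \\
&\le \frac{1}{2\sin\alpha} \int_{\sin^2\alpha}^{1} e^{-y\frac{n-1}{2}}\, dy \\
&\le \frac{1}{(n-1)\sin\alpha}\, e^{-\frac{n-1}{2}\sin^2\alpha}
\end{aligned}$$

The first inequality uses $\frac{1}{\sqrt{y}} \le \frac{1}{\sin\alpha}$ on the range of integration together with $1 - y \le e^{-y}$; the second extends the upper limit of the exponential integral to $\infty$.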
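A numerical comparison makes the final bound concrete. The sketch below is an illustration with assumed helper names (`cap_integral`, `cap_bound`): it evaluates the left-hand integral with a midpoint rule and prints it beside the closed-form upper bound for a few choices of n and α.

```python
import math


def cap_integral(n: int, alpha: float, steps: int = 20_000) -> float:
    """Midpoint-rule value of the integral of sin^n(theta) over [0, pi/2 - alpha]."""
    upper = math.pi / 2 - alpha
    h = upper / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))


def cap_bound(n: int, alpha: float) -> float:
    """Upper bound exp(-(n-1)/2 * sin^2(alpha)) / ((n-1) * sin(alpha))."""
    s = math.sin(alpha)
    return math.exp(-(n - 1) / 2 * s * s) / ((n - 1) * s)


for n, alpha in ((10, 0.3), (50, 0.3), (200, 0.3)):
    print(n, alpha, cap_integral(n, alpha), cap_bound(n, alpha))
```

The bound is loose for small n and small α but captures the exponential decay in n for a fixed band width.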

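Volume Fraction outside the $2\alpha$-angle Equatorial Band

$$\frac{\int_0^{\pi/2-\alpha} \sin^n(\theta)\, d\theta}{\int_0^{\pi/2} \sin^n(\theta)\, d\theta} \le \sqrt{\frac{2(n+1)}{\pi}}\; \frac{1}{(n-1)\sin\alpha}\; e^{-\frac{n-1}{2}\sin^2\alpha}$$

For $\alpha \sim \sin(\alpha) = \frac{1}{\sqrt{n}}$, this is $\sim \sqrt{\frac{2}{\pi e}} = 0.4839$

More than half the volume is in a $\frac{2}{\sqrt{n}}$ angle band around the equator.

For $\sin(\alpha) = \frac{a}{\sqrt{n}}$, the above bound is $\sim \sqrt{\frac{2}{\pi}}\; \frac{1}{a}\; e^{-\frac{a^2}{2}}$

Reminiscent of the Normal distribution?

$$2\int_a^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx \le \sqrt{\frac{2}{\pi}}\; \frac{1}{a}\; e^{-\frac{a^2}{2}}$$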
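To see how the bound compares with the actual ratio of integrals, the sketch below evaluates the ratio by midpoint-rule integration and prints it next to the asymptotic bound and the two-sided Normal tail. The function names, the choice n = 400 and the use of `math.erfc` for the Normal tail are assumptions made for this illustration.

```python
import math


def sine_integral(n: int, upper: float, steps: int = 20_000) -> float:
    """Midpoint-rule value of the integral of sin^n(theta) over [0, upper]."""
    h = upper / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))


def fraction_outside_band(n: int, a: float) -> float:
    """Ratio of the two sine integrals for sin(alpha) = a / sqrt(n)."""
    alpha = math.asin(a / math.sqrt(n))
    return sine_integral(n, math.pi / 2 - alpha) / sine_integral(n, math.pi / 2)


def asymptotic_bound(a: float) -> float:
    """sqrt(2/pi) * (1/a) * exp(-a^2 / 2)."""
    return math.sqrt(2 / math.pi) / a * math.exp(-a * a / 2)


def two_sided_normal_tail(a: float) -> float:
    """2 * P(Z > a) for a standard Normal Z, written with erfc."""
    return math.erfc(a / math.sqrt(2))


for a in (1.0, 2.0):
    print(a, fraction_outside_band(400, a), two_sided_normal_tail(a), asymptotic_bound(a))
```

For large n the computed ratio tracks the Normal tail closely, and both sit below the asymptotic bound.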

Do 2 Equators sum to more than the whole!

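Surface Area $A_n(r)$ of an n-Dimensional Sphere

$$\int_0^r A_n(r)\, dr = V_n(r)$$

$$\frac{dV_n(r)}{dr} = A_n(r)$$

$A_n(r) = a_n r^{n-1}$, and $a_n = n f_n$

$$a_n = 2 a_{n-1} \int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta$$

$$a_2 = 2\pi$$

$$a_n = 2^{n-1} \pi \int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta \int_0^{\pi/2} \sin^{n-3}(\theta)\, d\theta \cdots \int_0^{\pi/2} \sin^1(\theta)\, d\theta$$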
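The relation between the surface and volume coefficients can be verified by running both recurrences side by side. This is a minimal sketch with assumed helper names; the surface recurrence is applied from n = 3 onward, starting at the base value 2π for n = 2.

```python
import math


def sine_power_integral(n: int) -> float:
    """Integral of sin^n over [0, pi/2] via I_n = (n-1)/n * I_{n-2}."""
    value = math.pi / 2 if n % 2 == 0 else 1.0  # I_0 = pi/2, I_1 = 1
    for k in range(2 + n % 2, n + 1, 2):
        value *= (k - 1) / k
    return value


def volume_coefficients(max_n: int) -> dict:
    """f_n from f_n = 2 * f_{n-1} * I_n with f_1 = 2."""
    f = {1: 2.0}
    for n in range(2, max_n + 1):
        f[n] = 2.0 * f[n - 1] * sine_power_integral(n)
    return f


def surface_coefficients(max_n: int) -> dict:
    """a_n from a_n = 2 * a_{n-1} * I_{n-2} with a_2 = 2 * pi."""
    a = {2: 2.0 * math.pi}
    for n in range(3, max_n + 1):
        a[n] = 2.0 * a[n - 1] * sine_power_integral(n - 2)
    return a


f = volume_coefficients(10)
a = surface_coefficients(10)
for n in range(2, 11):
    print(n, a[n], n * f[n])
```

The two printed columns should agree up to rounding; for example n = 3 gives the familiar 4π.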

Inductive View of $a_n$

Dot Product between a Fixed Unit Vector and a Random Unit Vector

A Spherically Symmetric Random Unit Vector: Probability of lying in any specific patch P on the surface is proportional to the area of P.

Dot Product is also the length of the projection of the fixed vector on the random vector.

Dot Product equals $\cos(\theta)$, where $\theta$ is the angle between the two vectors.

$E(\cos^2(\theta))$, $\mathrm{Var}(\cos^2(\theta))$, and tail bounds on $\cos^2(\theta)$?

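$E(\cos^2(\theta))$

$$\begin{aligned}
\frac{\int_0^{\pi/2} \sin^{n-2}(\theta) \cos^2(\theta)\, d\theta}{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta}
&= \frac{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta - \int_0^{\pi/2} \sin^{n}(\theta)\, d\theta}{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta} \\
&= 1 - \frac{n-1}{n} \\
&= \frac{1}{n}
\end{aligned}$$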
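A quick simulation agrees with this value. The sketch below is an illustration, not the slides' method: it draws spherically symmetric unit vectors by normalizing vectors of independent standard Gaussians (a standard sampling technique assumed here), takes the fixed unit vector to be the first coordinate axis, and averages the squared dot product.

```python
import math
import random


def random_unit_vector(n: int, rng: random.Random) -> list:
    """Spherically symmetric unit vector: a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_squared_dot(n: int, trials: int, seed: int = 0) -> float:
    """Average of cos^2(theta) against the fixed vector e_1 = (1, 0, ..., 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = random_unit_vector(n, rng)
        total += u[0] ** 2  # dot product with e_1 is the first coordinate
    return total / trials


n = 20
estimate = mean_squared_dot(n, trials=20_000)
print(estimate, 1 / n)
```

By symmetry the choice of the fixed vector does not matter, so using the first axis loses no generality.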

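$\mathrm{Var}(\cos^2(\theta))$

$$\begin{aligned}
&\frac{\int_0^{\pi/2} \sin^{n-2}(\theta) \cos^4(\theta)\, d\theta}{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta} - \frac{1}{n^2} \\
&= \frac{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta - 2\int_0^{\pi/2} \sin^{n}(\theta)\, d\theta + \int_0^{\pi/2} \sin^{n+2}(\theta)\, d\theta}{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta} - \frac{1}{n^2} \\
&= 1 - 2\,\frac{n-1}{n} + \frac{(n-1)(n+1)}{n(n+2)} - \frac{1}{n^2} = \frac{2(n-1)}{n^2(n+2)} \le \frac{2}{n^2}
\end{aligned}$$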
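The algebra in the last line can be confirmed with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` and only the ratio (n-1)/n between sine power integrals two indices apart; the function name is an assumption made for this example.

```python
from fractions import Fraction


def moments_of_cos_squared(n: int):
    """Exact mean and variance of cos^2(theta) from the integral ratios."""
    r1 = Fraction(n - 1, n)              # I_n / I_{n-2}
    r2 = r1 * Fraction(n + 1, n + 2)     # I_{n+2} / I_{n-2}
    mean = 1 - r1
    second_moment = 1 - 2 * r1 + r2      # E(cos^4) after expanding (1 - sin^2)^2
    variance = second_moment - mean ** 2
    return mean, variance


for n in (2, 3, 10, 100):
    mean, variance = moments_of_cos_squared(n)
    print(n, mean, variance, Fraction(2 * (n - 1), n * n * (n + 2)))
```

Working in fractions avoids any floating-point doubt about whether the two sides are equal.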

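Tail Bounds on $\cos^2(\theta)$

$$\Pr\left(\cos^2(\theta) > \frac{a^2}{n}\right) = \frac{\int_0^{\cos^{-1}\left(\frac{a}{\sqrt{n}}\right)} \sin^{n-2}(\theta)\, d\theta}{\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta}$$

$$\le \sqrt{\frac{2(n-1)(n-2)}{\pi}}\; \frac{1}{(n-3)\,a}\; e^{-\frac{n-3}{2n} a^2} \sim \sqrt{\frac{2}{\pi}}\; \frac{1}{a}\; e^{-\frac{a^2}{2}}$$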
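The tail probability can also be estimated by sampling and compared with the asymptotic expression. The sketch below is illustrative only: it samples random unit vectors as normalized Gaussian vectors (an assumed, standard technique), uses the first coordinate axis as the fixed vector, and the values n = 50 and a = 2 are arbitrary choices.

```python
import math
import random


def squared_dot_with_fixed_vector(n: int, rng: random.Random) -> float:
    """cos^2(theta) between e_1 and a random unit vector (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return v[0] ** 2 / sum(x * x for x in v)


def empirical_tail(n: int, a: float, trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of samples with cos^2(theta) > a^2 / n."""
    rng = random.Random(seed)
    threshold = a * a / n
    hits = sum(squared_dot_with_fixed_vector(n, rng) > threshold for _ in range(trials))
    return hits / trials


def asymptotic_tail_bound(a: float) -> float:
    """sqrt(2/pi) * (1/a) * exp(-a^2 / 2)."""
    return math.sqrt(2 / math.pi) / a * math.exp(-a * a / 2)


estimate = empirical_tail(50, 2.0)
print(estimate, asymptotic_tail_bound(2.0))
```

The estimate carries sampling noise, so it should be read as a rough check rather than a verification of the inequality.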

Projection Length of Fixed Unit Vector on Random Unit Vector

With probability $1 - \sqrt{\frac{2}{\pi}}\; \frac{1}{a}\; e^{-\frac{a^2}{2}}$, the projected length is between $0$ and $\frac{a}{\sqrt{n}}$

With probability 0.946, the projected length is between $0$ and $\frac{2}{\sqrt{n}}$

Can we drive the projected length to be much more tightly distributed around $\frac{1}{\sqrt{n}}$?
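The figure 0.946 is the a = 2 case of the asymptotic tail bound sqrt(2/π) · (1/a) · e^(-a²/2) subtracted from 1. The few lines below, with an assumed function name, evaluate that expression for several values of a.

```python
import math


def projection_probability(a: float) -> float:
    """1 - sqrt(2/pi) * (1/a) * exp(-a^2 / 2)."""
    return 1.0 - math.sqrt(2.0 / math.pi) / a * math.exp(-a * a / 2.0)


for a in (1.0, 2.0, 3.0):
    print(a, round(projection_probability(a), 3))
```

Since the expression comes from an asymptotic form of the bound, the values are best read as large-n guidance.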

Project on to many Random Vectors

Let $X_1, \ldots, X_k$ be the projection lengths on to $k$ independent random unit vectors

The resulting $k$-tuple defines a mapping from $n$-dimensional space to $k$-dimensional space

$X = \sqrt{X_1^2 + \cdots + X_k^2}$ is the length of the vector post-mapping

Consider $X^2 = X_1^2 + \cdots + X_k^2$.
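The mapping described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: random unit vectors are drawn by normalizing Gaussians, the helper names `random_unit_vector` and `project` are hypothetical, and the dot products are signed, which does not change their squares.

```python
import math
import random

def random_unit_vector(n, rng):
    """Draw a uniformly random unit vector by normalizing i.i.d. Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(v, directions):
    """Map v to the k-tuple (v . r_1, ..., v . r_k)."""
    return [sum(a * b for a, b in zip(v, r)) for r in directions]

rng = random.Random(1)
n, k = 200, 50
directions = [random_unit_vector(n, rng) for _ in range(k)]
v = random_unit_vector(n, rng)      # the fixed unit vector being mapped
xs = project(v, directions)         # X_1, ..., X_k (signed)
x_squared = sum(x * x for x in xs)  # X^2 = X_1^2 + ... + X_k^2
print(len(xs), x_squared, k / n)
```

For these illustrative sizes, `x_squared` should come out in the neighbourhood of `k / n`.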


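Sums of Random Variables

Since $X_1^2, \ldots, X_k^2$ are i.i.d., their average $\overline{X^2_k} = \frac{1}{k}\sum_{i=1}^{k} X_i^2$ satisfies $E(\overline{X^2_k}) = E(X_1^2)$ and $\mathrm{Var}(\overline{X^2_k}) = \frac{\mathrm{Var}(X_1^2)}{k}$

I.e., the distribution of $\overline{X^2_k}$ preserves the mean but is much tighter around the mean.

$\Pr(|\overline{X^2_k} - E(\overline{X^2_k})| \ge \alpha) \ll \Pr(|X_1^2 - E(X_1^2)| \ge \alpha)$

$\Pr(|X^2 - E(X^2)| \ge k\alpha) \ll \Pr(|X_1^2 - E(X_1^2)| \ge \alpha)$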
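The variance reduction claimed here can be checked empirically. The following sketch is illustrative only: it assumes the fixed vector is the first coordinate axis, so that a squared projection is the squared first coordinate of a normalized Gaussian vector, and the helper names are hypothetical.

```python
import random

def squared_projection(n, rng):
    """X_i^2 for the fixed unit vector e_1 and a fresh random unit vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return v[0] ** 2 / sum(x * x for x in v)

def variance(samples):
    """Unbiased sample variance."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)

rng = random.Random(2)
n, k, trials = 20, 10, 3000
singles = [squared_projection(n, rng) for _ in range(trials)]
averages = [sum(squared_projection(n, rng) for _ in range(k)) / k
            for _ in range(trials)]
ratio = variance(singles) / variance(averages)
print(ratio)  # expected to be roughly k
```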


Approximate Length Preservation in $k$-Dimensional Random Projection

$E(X^2) = \frac{k}{n}$, by Linearity of Expectation

$\mathrm{Var}(X^2) \le \frac{2k}{n^2}$, by Linearity of Variance under Independence

With probability $1 - ?$, $X^2$ is in $(1-\varepsilon)\frac{k}{n} \ldots (1+\varepsilon)\frac{k}{n}$

If $?$ is as small as $m^{-3}$...

Union Bound: With probability $1 - m^{-1}$, lengths for $m^2$ distinct fixed vectors of arbitrary lengths are all simultaneously approximately preserved, modulo scaling by $\sqrt{\frac{n}{k}}$!!
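The numbers on this slide can be traced step by step. The derivation below is a sketch that relies on two facts not spelled out on the slide: by symmetry each coordinate of a random unit vector has second moment $1/n$, and the fourth moment is $3/(n(n+2))$ (a standard result for the uniform distribution on the sphere).

```latex
\begin{align*}
E(X_i^2) &= \frac{1}{n}, &
E(X^2) &= \sum_{i=1}^{k} E(X_i^2) = \frac{k}{n}, \\
\mathrm{Var}(X_i^2) &= E(X_i^4) - E(X_i^2)^2
  = \frac{3}{n(n+2)} - \frac{1}{n^2}
  = \frac{2(n-1)}{n^2(n+2)} \le \frac{2}{n^2}, &
\mathrm{Var}(X^2) &= \sum_{i=1}^{k} \mathrm{Var}(X_i^2) \le \frac{2k}{n^2}, \\
\Pr(\text{some vector is not preserved}) &\le m^2 \cdot m^{-3} = m^{-1}.
\end{align*}
```

The last line is the union bound: each of the $m^2$ vectors fails with probability at most $m^{-3}$, so all succeed simultaneously with probability at least $1 - m^{-1}$.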


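Asymptotic Tight Concentration for $X^2$

By CLT, for $k \to \infty$, the distribution of $X^2 = \sum_{i=1}^{k} X_i^2$ tends to $N\left(\frac{k}{n}, \le \frac{2k}{n^2}\right)$

$\Pr\left(\left|X^2 - \frac{k}{n}\right| \ge \frac{\varepsilon k}{n}\right)$ should then be $\le \sqrt{\frac{4}{\varepsilon^2 k \pi}}\; e^{-\frac{\varepsilon^2 k}{4}}$

For $k > \frac{12 \log m}{\varepsilon^2}$, this is at most $\frac{1}{m^3}$

How do we show this for finite $k$?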
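To see how the threshold on $k$ and the target $1/m^3$ fit together, the heuristic Gaussian bound can simply be evaluated. The sketch below is illustrative: the function names and the sample values of `m` and `eps` are assumptions, not taken from the slides.

```python
import math

def clt_tail_bound(eps, k):
    """Heuristic Gaussian bound sqrt(4 / (eps^2 k pi)) * exp(-eps^2 k / 4)."""
    return math.sqrt(4.0 / (eps * eps * k * math.pi)) * math.exp(-eps * eps * k / 4.0)

def k_threshold(eps, m):
    """The threshold 12 ln(m) / eps^2 from the slide."""
    return 12.0 * math.log(m) / (eps * eps)

m, eps = 1000, 0.1
k = math.ceil(k_threshold(eps, m))
print(k, clt_tail_bound(eps, k), m ** -3)
```

At the threshold the exponential factor alone equals $m^{-3}$, and the square-root prefactor only makes the bound smaller.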


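Tight Concentration and Tail Bound Inequalities

Markov’s Inequality for a non-negative random variable $Y$

$\Pr(Y > k) \le E(Y)/k$

Chebyshev’s Inequality

$$\Pr\left(\left|X^2 - \frac{k}{n}\right| \ge \varepsilon\frac{k}{n}\right) \le \frac{\mathrm{Var}(X^2)}{\left(\varepsilon\frac{k}{n}\right)^2} \le \frac{2}{\varepsilon^2 k}$$

Not strong enough to yield negative exponential dependence on $k$.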
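The gap between a polynomial and an exponential dependence on $k$ is easiest to appreciate with numbers. This sketch compares how large $k$ must be for each bound to reach a failure probability of $m^{-3}$; the function names and sample values are illustrative assumptions, and the exponential case uses the Gaussian-style exponent $e^{-\varepsilon^2 k/4}$ with its prefactor ignored.

```python
import math

def k_needed_chebyshev(eps, delta):
    """Smallest k with 2 / (eps^2 k) <= delta."""
    return math.ceil(2.0 / (eps * eps * delta))

def k_needed_exponential(eps, delta):
    """Smallest k with exp(-eps^2 k / 4) <= delta (prefactor ignored)."""
    return math.ceil(4.0 * math.log(1.0 / delta) / (eps * eps))

m, eps = 100, 0.5
delta = m ** -3
print(k_needed_chebyshev(eps, delta), k_needed_exponential(eps, delta))
```

With Chebyshev the required $k$ grows like $m^3$, which would exceed any useful target dimension, whereas the exponential bound needs only $O(\log m)$.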


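Lower Tail Bound for $X^2$

Using Markov’s inequality on $e^{-tX^2}$, where $t > 0$ (as in Chernoff Bounds):

$$\Pr\left(X^2 < (1-\varepsilon)\frac{k}{n}\right) = \Pr\left(-tX^2 > -t(1-\varepsilon)\frac{k}{n}\right) = \Pr\left(e^{-tX^2} > e^{-t(1-\varepsilon)\frac{k}{n}}\right) \le E(e^{-tX^2})\, e^{t(1-\varepsilon)\frac{k}{n}}$$

Since $X^2 = \sum_{i=1}^{k} X_i^2$ and the $X_i$’s are identical and independent:

$$E(e^{-tX^2})\, e^{t(1-\varepsilon)\frac{k}{n}} = E(e^{-tX_i^2})^k\, e^{t(1-\varepsilon)\frac{k}{n}}$$

$E(e^{-tX_i^2}) \le\ ?$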


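$E(e^{-tX_i^2}) \le\ ?$

Using $1 - x \le e^{-x} \le 1 - x + \frac{x^2}{2}$, for all $x \ge 0$:

$$E(e^{-tX_i^2}) \le E\left(1 - tX_i^2 + \frac{t^2 X_i^4}{2}\right) \le 1 - \frac{t}{n} + \frac{3t^2}{2n^2} \le e^{-\frac{t}{n}\left(1 - \frac{3t}{2n}\right)}$$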
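The two elementary inequalities used on this slide can be spot-checked numerically. The sketch below is only a sanity check at a few sample points, not a proof; the helper names `taylor_upper` and `mgf_bound` and the sample values are hypothetical.

```python
import math

def taylor_upper(x):
    """Upper bound 1 - x + x^2/2 on exp(-x), valid for x >= 0."""
    return 1.0 - x + x * x / 2.0

def mgf_bound(t, n):
    """Final bound exp(-(t/n) * (1 - 3t/(2n))) from the slide."""
    return math.exp(-(t / n) * (1.0 - 3.0 * t / (2.0 * n)))

# Sandwich 1 - x <= exp(-x) <= 1 - x + x^2/2 at a few non-negative points.
for x in [0.0, 0.1, 1.0, 3.0]:
    assert 1.0 - x <= math.exp(-x) <= taylor_upper(x)

# Last step: 1 - u <= exp(-u) with u = t/n - 3t^2/(2n^2).
n = 100
for t in [0.5, 5.0, 20.0, n / 3.0]:
    middle = 1.0 - t / n + 3.0 * t * t / (2.0 * n * n)
    assert middle <= mgf_bound(t, n)

print("all checks passed")
```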


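Completing the Lower Tail Bound for $X^2$

$$\Pr\left(X^2 < (1-\varepsilon)\frac{k}{n}\right) \le E(e^{-tX_i^2})^k\, e^{t(1-\varepsilon)\frac{k}{n}} \le e^{-\frac{kt}{n}\left(1 - \frac{3t}{2n}\right) + \frac{kt}{n}(1-\varepsilon)} \le e^{-\frac{kt}{n}\left(\varepsilon - \frac{3t}{2n}\right)}$$

Setting $t = \frac{n\varepsilon}{3} > 0$ to minimize the above

$$\Pr\left(X^2 < (1-\varepsilon)\frac{k}{n}\right) \le e^{-\frac{k\varepsilon}{3}\left(\varepsilon - \frac{\varepsilon}{2}\right)} \le e^{-\frac{k\varepsilon^2}{6}}$$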
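The choice $t = n\varepsilon/3$ can be confirmed with a brute-force search over $t$. This is a small numerical sketch with illustrative values of `n`, `k`, and `eps`; the function name `lower_tail_exponent` is hypothetical.

```python
def lower_tail_exponent(t, n, k, eps):
    """Exponent -(k t / n) * (eps - 3 t / (2 n)); more negative is a better bound."""
    return -(k * t / n) * (eps - 3.0 * t / (2.0 * n))

n, k, eps = 1000, 200, 0.3
# Grid of candidate t values in (0, n * eps].
grid = [n * eps * i / 3000.0 for i in range(1, 3001)]
best_t = min(grid, key=lambda t: lower_tail_exponent(t, n, k, eps))

print(best_t, n * eps / 3.0)
print(lower_tail_exponent(n * eps / 3.0, n, k, eps), -k * eps ** 2 / 6.0)
```

The grid minimizer should coincide with $n\varepsilon/3$, and the exponent there should equal $-k\varepsilon^2/6$.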


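Upper Tail Bound for $X^2$

As for the Lower Tail Bound, with $t > 0$:

$$\Pr\left(X^2 > (1+\varepsilon)\frac{k}{n}\right) = \Pr\left(tX^2 > t(1+\varepsilon)\frac{k}{n}\right) = \Pr\left(e^{tX^2} > e^{t(1+\varepsilon)\frac{k}{n}}\right) \le E(e^{tX^2})\, e^{-t(1+\varepsilon)\frac{k}{n}}$$

Since $X^2 = \sum_{i=1}^{k} X_i^2$ and the $X_i$’s are identical and independent:

$$E(e^{tX^2})\, e^{-t(1+\varepsilon)\frac{k}{n}} = E(e^{tX_i^2})^k\, e^{-t(1+\varepsilon)\frac{k}{n}}$$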


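The Upper Tail Bound for $X^2$

Setting $y = \cos^2\theta$:

$$E(e^{tX_i^2}) = \frac{\int_0^{\pi/2} \sin^{n-2}\theta\; e^{t\cos^2\theta}\, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta\, d\theta} \le \sqrt{\frac{2(n-1)}{\pi}}\; \frac{1}{2} \int_0^1 \frac{(1-y)^{\frac{n-3}{2}}\, e^{ty}}{\sqrt{y}}\, dy$$

Using $1 - y \le e^{-y}$, $\forall y$:

$$\le \sqrt{\frac{2(n-1)}{\pi}}\; \frac{1}{2} \int_0^1 \frac{e^{-y\left(\frac{n-3}{2} - t\right)}}{\sqrt{y}}\, dy$$

Using $\int_0^1 y^{-\frac{1}{2}} e^{-cy}\, dy \le \sqrt{\frac{\pi}{c}}$ with $c = \frac{n-3}{2} - t > 0$:

$$\le \sqrt{\frac{2(n-1)}{\pi}}\; \frac{1}{2\sqrt{\frac{n-3}{2} - t}}\; \sqrt{\pi} \le \sqrt{\frac{n-1}{n-3-2t}}$$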
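The closed-form bound $\sqrt{(n-1)/(n-3-2t)}$ can be compared against a direct numerical evaluation of the ratio of integrals. The sketch below uses a plain midpoint rule; the function names, step count, and sample values of `n` and `t` are illustrative assumptions.

```python
import math

def mgf_exact(t, n, steps=20000):
    """Midpoint-rule estimate of E(exp(t X_i^2)) as a ratio of the two integrals."""
    h = (math.pi / 2.0) / steps
    num = 0.0
    den = 0.0
    for j in range(steps):
        theta = (j + 0.5) * h
        weight = math.sin(theta) ** (n - 2)
        num += weight * math.exp(t * math.cos(theta) ** 2)
        den += weight
    return num / den

def mgf_bound(t, n):
    """Closed-form bound sqrt((n - 1) / (n - 3 - 2t)), valid for 2t < n - 3."""
    return math.sqrt((n - 1) / (n - 3 - 2.0 * t))

n = 50
for t in [0.0, 1.0, 5.0, 10.0]:
    print(t, mgf_exact(t, n), mgf_bound(t, n))
```

For each sampled $t$ the numerically integrated value should sit below the closed-form bound, with the gap widening as $2t$ approaches $n - 3$.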


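Completing the Upper Tail Bound for $X^2$

$$E(e^{tX_i^2})^k\, e^{-t(1+\varepsilon)\frac{k}{n}} \le \left(\sqrt{\frac{n-1}{n-3-2t}}\right)^k e^{-t(1+\varepsilon)\frac{k}{n}}$$

Using $(1-x)^{-\frac{1}{2}} \le \sqrt{1 + x + 2x^2} \le e^{\frac{x}{2}(1+2x)}$, for $0 \le x \le \frac{1}{2}$, and constraining $0 < 2t < \frac{n-3}{2}$, $k \ll n$:

$$\left(\sqrt{\frac{n-1}{n-3-2t}}\right)^k \le \left(\sqrt{\frac{n-1}{n-3}}\right)^k \left(1 - \frac{2t}{n-3}\right)^{-\frac{k}{2}} \le e^{O\left(\frac{k}{n}\right) + \frac{tk}{n-3}\left(1 + \frac{4t}{n-3}\right)}$$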


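Completing the Upper Tail Bound for $X^2$

So:

$$E(e^{tX_i^2})^k\, e^{-t(1+\varepsilon)\frac{k}{n}} \le \left[e^{O\left(\frac{k}{n}\right) + \frac{tk}{n-3}\left(1 + \frac{4t}{n-3}\right)}\right]\left[e^{-t(1+\varepsilon)\frac{k}{n}}\right]$$

$$= e^{O\left(\frac{k}{n}\right) + \left[\frac{tk}{n-3} - \frac{tk}{n}\right] + \left[\frac{4t^2 k}{(n-3)^2} - \frac{\varepsilon t k}{n}\right]}$$

$$\le e^{O\left(\frac{k}{n}\right) + \left[\frac{4t^2 k}{(n-3)^2} - \frac{\varepsilon t k}{n}\right]}$$

Setting $t = \frac{\varepsilon (n-3)^2}{8n}$ and assuming $k \ll n$, we get:

$$\le e^{-\frac{\varepsilon^2 k}{16} + O\left(\frac{k}{n}\right)} \le 2e^{-\frac{\varepsilon^2 k}{16}}$$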
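The choice of $t$ in this last step can be checked by plugging it into the bracketed exponent. The sketch below drops the $O(k/n)$ term and uses illustrative values of `n`, `k`, and `eps`; the function names are hypothetical.

```python
def upper_tail_exponent(t, n, k, eps):
    """Bracketed exponent 4 t^2 k / (n-3)^2 - eps t k / n (O(k/n) term omitted)."""
    return 4.0 * t * t * k / (n - 3) ** 2 - eps * t * k / n

def chosen_t(n, eps):
    """The slide's choice t = eps (n-3)^2 / (8 n)."""
    return eps * (n - 3) ** 2 / (8.0 * n)

n, k, eps = 10000, 200, 0.3
t = chosen_t(n, eps)

# Value of the exponent at the chosen t versus the target -eps^2 k / 16.
print(upper_tail_exponent(t, n, k, eps), -eps ** 2 * k / 16.0)
# The constraint 2t < (n - 3) / 2 used for the (1 - x)^(-1/2) inequality.
print(2.0 * t < (n - 3) / 2.0)
```

The two printed exponents differ only by a factor of $(n-3)^2/n^2$, which is the kind of correction absorbed into the $O(k/n)$ term.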


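Wrapping Up: The Johnson-Lindenstrauß Theorem

Given $m$ points $a_1, \ldots, a_m$ in $n$-dimensional space, $m \ge n$, and given $\varepsilon$, $0 < \varepsilon \le 1$.

Choose $k$ random unit vectors $r_1, \ldots, r_k$, where $k = \frac{48 \ln m}{\varepsilon^2} \ll n$.

Define $k$-dimensional points $b_1, \ldots, b_m$, where $b_i = (a_i \cdot r_1, a_i \cdot r_2, \cdots, a_i \cdot r_k)$.

Consider any pair $a_i, a_j$. Then:

$$\frac{|b_i - b_j|}{|a_i - a_j|} = \sqrt{\left(\frac{a_i - a_j}{|a_i - a_j|} \cdot r_1\right)^2 + \left(\frac{a_i - a_j}{|a_i - a_j|} \cdot r_2\right)^2 + \cdots + \left(\frac{a_i - a_j}{|a_i - a_j|} \cdot r_k\right)^2}$$

Then $\sqrt{1-\varepsilon}\,\sqrt{\frac{k}{n}} \le \frac{|b_i - b_j|}{|a_i - a_j|} \le \sqrt{1+\varepsilon}\,\sqrt{\frac{k}{n}}$ with probability at least $1 - \frac{3}{m^3}$: the lower tail contributes $e^{-k\varepsilon^2/6} = m^{-8}$ and the upper tail $2e^{-k\varepsilon^2/16} = 2m^{-3}$.

And this holds for all pairs simultaneously with probability at least $1 - \frac{3}{2m}$, by a union bound over the $\binom{m}{2} \le \frac{m^2}{2}$ pairs.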
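The construction in the theorem can be tried out directly. The sketch below is an illustration under loud assumptions: random unit vectors are drawn by normalizing Gaussians, the points are arbitrary synthetic data, the sizes `n`, `k`, `m` are far smaller than the theorem's regime ($k = 48 \ln m / \varepsilon^2 \ll n$), and all function names are hypothetical.

```python
import math
import random

def random_unit_vector(n, rng):
    """Draw a uniformly random unit vector by normalizing i.i.d. Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def jl_project(points, k, rng):
    """Map each n-dimensional point a_i to b_i = (a_i . r_1, ..., a_i . r_k)."""
    n = len(points[0])
    directions = [random_unit_vector(n, rng) for _ in range(k)]
    return [[sum(a * r for a, r in zip(p, d)) for d in directions] for p in points]

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def scaled_ratios(points, projected, k):
    """|b_i - b_j| / |a_i - a_j| rescaled by sqrt(n / k), for all pairs i < j."""
    n = len(points[0])
    scale = math.sqrt(n / k)
    m = len(points)
    return [scale * distance(projected[i], projected[j]) / distance(points[i], points[j])
            for i in range(m) for j in range(i + 1, m)]

rng = random.Random(3)
n, k, m = 300, 100, 6   # illustrative sizes only
points = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
ratios = scaled_ratios(points, jl_project(points, k, rng), k)
print(min(ratios), max(ratios))
```

After rescaling by $\sqrt{n/k}$, every pairwise distance ratio should cluster around 1, with the spread shrinking as `k` grows.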


Wrapping Up: The Johnson-Lindenstrauß Theorem

Given m points a1, . . . , am in n-dimensional space, m ≥ n, andgiven ε, 0 ≤ ε ≤ 1.

Choose k random unit vectors r1, . . . , rk , where k = 48 ln mε2 << n.

Define k -dimensional points b1, . . . , bm, wherebi = (ai · r1, ai · r2, · · · , ai · rk ).

Consider any pair ai , aj . Then:

|bi − bj ||ai − aj |

=

√(

ai − aj

|ai − aj |· r1)2 + (

ai − aj

|ai − aj |· r2)2 + · · ·+ (

ai − aj

|ai − aj |· rn)2

Then√

(1− ε)√

kn ≤

|bi−bj ||ai−aj | ≤

√(1 + ε)

√kn with probability 3

m3 .

And this holds for all pairs simultaneously with probability 1− 32m .

Ramesh Hariharan High Dimensional Spaces

