
L6 Curse of Dimensionality Parchas

Date post: 03-Jun-2018
8/12/2019 L6 Curse of Dimensionality Parchas

http://slidepdf.com/reader/full/l6-curse-of-dimensionality-parchas 1/40

The Curse of Dimensionality

Panagiotis Parchas

Advanced Data Management

Spring 2012

CSE, HKUST

Multiple Dimensions

• As we discussed in the lectures, it is often convenient to transform a signal (time series, picture) to a point in multidimensional space.
• This transformation is handy, as we can apply conventional database indexing techniques for queries such as NN or other search queries.
• This transform may lead us to very high "dimensionality" (hundreds of dimensions).
• In high dimensionality, there are a number of problems (geometrical and index performance) that are usually referred to as the "Curse of Dimensionality".
• In this presentation:
 –  Some intuition about the Curse.
 –  Explore techniques that try to overcome it.

The Curse

• Volume and area depend exponentially on the number of dimensions.
• Non-intuitive effects:
 –  Geometric effects concerning the volume of hypercubes and spheres
 –  Indexing effects
 –  Effects in the Database environment (query selectivity)
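To make the exponential dependence concrete, here is a minimal sketch (an illustration added to the slide, not the lecturer's own example). It computes which fraction of the unit hypercube is covered by its inscribed hypersphere, using the standard ball-volume formula V_d(r) = π^(d/2) r^d / Γ(d/2 + 1). The function name is hypothetical.

```python
import math

def inscribed_sphere_fraction(d: int) -> float:
    """Volume of the radius-0.5 ball divided by the unit cube's volume (which is 1)."""
    return math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)

# The fraction collapses towards zero as the dimensionality grows.
for d in (2, 3, 10, 20):
    print(d, inscribed_sphere_fraction(d))
```

In 2D the sphere covers π/4 of the cube; in high dimensions almost all of the cube's volume sits in its corners.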

a) Geometric Effects

Lemma:
A sphere touching or intersecting all the (d−1)-dimensional borders of a cube will contain the center.

• True for 2D and 3D (by visualization)
• It should be true for higher dimensions (hypercubes, hyperspheres)!

It is NOT!
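A small numeric check of why the lemma fails (a sketch under assumptions: the unit cube [0, 1]^d and a sphere centre at (0.3, ..., 0.3) are illustrative choices, not values from the slide). The sphere is given the smallest radius that reaches every face, and we then test whether the cube centre lies inside it.

```python
import math

def sphere_touching_all_faces(d: int, offset: float = 0.3):
    """Sphere centred at (offset, ..., offset) inside the unit cube [0, 1]^d."""
    centre = [offset] * d
    # Smallest radius that reaches every (d-1)-dimensional face: along each
    # axis the farther face is at distance max(offset, 1 - offset).
    radius = max(offset, 1.0 - offset)
    # Distance from the sphere centre to the cube centre (0.5, ..., 0.5).
    dist_to_cube_centre = math.sqrt(sum((c - 0.5) ** 2 for c in centre))
    return radius, dist_to_cube_centre

for d in (2, 3, 16):
    radius, dist = sphere_touching_all_faces(d)
    verdict = "contains the centre" if dist <= radius else "misses the centre"
    print(d, radius, dist, verdict)
```

The radius stays fixed while the distance to the cube centre grows with the square root of d, so for large enough d the sphere no longer contains the centre.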

b) Indexing Effects

b) Indexing effects [cont]

• The higher the dimensionality, the coarser the indexing (which renders it useless!)
• This affects all the indexing techniques.

Christian Böhm, 2001
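One way to see why the index becomes coarse (a back-of-the-envelope sketch, not taken from the slide): an index that halves the data at every split needs only about log2(N / page capacity) splits before each page holds its share of the N points. If every split uses a different axis, at most that many axes are ever split, and the remaining ones are not partitioned at all. The function name and the numbers are illustrative assumptions.

```python
import math

def max_axes_split(num_points: int, page_capacity: int) -> int:
    """Approximate number of binary splits until each page holds page_capacity points."""
    return math.ceil(math.log2(num_points / page_capacity))

num_points, page_capacity = 1_000_000, 32   # illustrative values only
splits = max_axes_split(num_points, page_capacity)
for d in (8, 16, 64, 256):
    print(d, "dimensions:", min(splits, d), "split at most,", max(d - splits, 0), "never split")
```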

c) Query selectivity

When is NN meaningful?

Kevin Beyer et al., 1999
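The question can be probed empirically. The sketch below (an illustration under assumptions: uniformly random points, Euclidean distance, and a hypothetical function name) measures the relative contrast (Dmax − Dmin) / Dmin between the farthest and the nearest neighbor of a query point. When this contrast is close to zero, the "nearest" neighbor is barely nearer than any other point.

```python
import math
import random

def relative_contrast(d: int, n_points: int = 200, seed: int = 0) -> float:
    """(Dmax - Dmin) / Dmin for one random query against n_points random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(d)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 500):
    print(d, relative_contrast(d))
```

Under these assumptions the contrast shrinks steadily as d grows, which is the effect the cited paper warns about.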

What is the spell for the curse?

• Various attempts at multidimensional indexing have been shown not to make sense for a large category of data distributions [Christian Böhm, 2001]

→ Dimensionality Reduction techniques.

• They basically apply ideas from compression to the data, in order to reduce the dimensionality.
• In what follows we will focus mainly on Time Series.

Introduction

[Figure: plot of a time series from 9/1/2011 to 2/1/2012, y-axis from 9 to 11.5.]

12 Data points → 12D space

[Figure: comparison panels (x-axis 0 to 120) titled DFT, DWT, APCA, PAA, PLA.]

Tutorial in IEEE ICDM 2004 by Dr. Keogh

Discrete Fourier Transform (DFT)

• "Every signal, no matter how complex, can be represented as a summation of sinusoids"
• Idea:
 –  Find the hidden sinusoids that form the time series
 –  Store two numbers for each: (A, φ), the magnitude and the phase
 –  Higher-frequency sinusoids generally correspond to details of the time series
 –  We can discard them and keep just the first ones (low frequency)
 –  Then we use the Inverse DFT to get the approximation of the time series.

[Formulas: DFT and Inverse DFT.]
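The steps above can be sketched in a few lines. This is a minimal illustration, not the lecturer's code: it uses a naive O(n²) transform instead of an FFT, and the normalisation (no factor on the forward transform, 1/n on the inverse) is an assumption, since the slide's own formulas did not survive. All function names and the synthetic series are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) DFT (unnormalised forward transform)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(coeffs):
    """Inverse of dft(); returns the real part, scaled by 1/n."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def dft_approximation(x, keep):
    """Zero every coefficient whose frequency index exceeds `keep`, then invert."""
    n = len(x)
    coeffs = dft(x)
    for k in range(n):
        # Bins k and n - k describe the same frequency for a real-valued signal.
        if min(k, n - k) > keep:
            coeffs[k] = 0
    return inverse_dft(coeffs)

n = 64
smooth = [10 + math.sin(2 * math.pi * t / n) for t in range(n)]          # slow trend
series = [s + 0.1 * math.sin(2 * math.pi * 20 * t / n)                   # plus fine detail
          for t, s in enumerate(smooth)]

magnitude, phase = cmath.polar(dft(series)[1])   # the (A, phi) pair of the first sinusoid
approx = dft_approximation(series, keep=2)       # keep the low frequencies only
print(max(abs(a - s) for a, s in zip(approx, smooth)))
```

Discarding the high-frequency coefficient removes the fine detail and leaves the slow trend, which is the approximation the slide describes.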

DFT example

[Figure: plot and table of the time series ("TIME SERIES") next to the table of its DFT coefficients, one magnitude A and one phase φ per sinusoid, with the annotation "We store [...] values".]

DFT example (cont)

[Figure: only the first coefficients (A, φ) are kept; applying the IDFT to them gives the "Approximate TS", shown in the "DFT approximation" plot.]

DFT

DFT (pros and cons)

• O(n log n) complexity
• Hardware implementations
• Good ability to compress most signals
• Many applications
• Not a good approximation for bursty signals
• Not a good approximation if the signal contains both flat and busy segments
• Cannot support other distance metrics
• Contains info only for the frequency distribution
 –  The time domain?

Why is DFT not enough?

• It gives us information about the frequency components of a time series, without telling us where these frequencies lie in the time domain.

[Figure: three panels. Signal z(t)=sin(5*t) , sin(10*t); signal x(t)=sin(5*t)+sin(10*t); and "Fourier Decomposition (Spectrum)".]
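The point can be reproduced numerically. In the sketch below (an illustration, assuming that z(t) means sin(5t) on the first half of the window followed by sin(10t) on the second half, while x(t) is their sum over the whole window), both signals turn out to have their two strongest spectral peaks at the same frequencies. The window length 8π, the sample count and the function names are illustrative choices.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of the naive DFT for the non-negative frequency bins."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

def top_two_bins(x):
    """Indices of the two largest spectral peaks, in increasing order."""
    mags = magnitude_spectrum(x)
    return sorted(sorted(range(len(mags)), key=mags.__getitem__)[-2:])

n = 256
t = [8 * math.pi * i / n for i in range(n)]                      # window of length 8*pi
x = [math.sin(5 * ti) + math.sin(10 * ti) for ti in t]           # both sinusoids all the time
z = [math.sin(5 * ti) if i < n // 2 else math.sin(10 * ti)       # one sinusoid after the other
     for i, ti in enumerate(t)]

print(top_two_bins(x), top_two_bins(z))
```

The spectrum alone therefore cannot tell whether the two frequencies occur together or one after the other.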

Discrete Wavelet Transform (DWT)

• This comes as a solution to the previous problem.
• The wavelet transform contains information both for the frequency domain AND the time domain.
• The basic idea is to express the time series as a linear combination of wavelet basis functions. The Haar wavelet is mostly used.

DWT: Graphical Intuition

• The wavelet is stretched and shifted in time, and this is done for all the possible stretches and shifts.
• Afterwards, each is multiplied with the TS.
• We keep only the ones with a high product.

DWT: Numerical Intuition

Resolution   Averages     Details
4            [9 7 3 5]
2            [8 4]        [1 −1]
1            [6]          [2]
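The averaging and differencing behind this example can be written out directly. This is a minimal sketch, assuming the input series is [9, 7, 3, 5] and using the unnormalised Haar scheme (pairwise average, half the pairwise difference). The function name is hypothetical, and the input length must be a power of two.

```python
def haar_decompose(values):
    """Full Haar decomposition by pairwise averaging and differencing.

    Returns [overall average] followed by the detail coefficients,
    coarsest resolution first.
    """
    averages = list(values)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level_details = [(a - b) / 2 for a, b in pairs]   # what the average loses
        averages = [(a + b) / 2 for a, b in pairs]        # next, coarser resolution
        details = level_details + details                 # coarser details go first
    return averages + details

print(haar_decompose([9, 7, 3, 5]))
```

Keeping only the first few coefficients of the result gives a coarse approximation; the later coefficients add progressively finer detail.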


Example taken from Stollnitz, E. et al., 1995

DWT

In our example:

• We had 12pts
• The approximation (red line) uses only 16 Haar coefficients

[Figure: "Wavelet Approximation" plot of the time series and its approximation.]

DWT (Pros and Cons)

• Good ability to compress stationary signals.
• Fast linear time algorithms for DWT exist.
• Able to support some interesting non-Euclidean similarity measures.
• Signals must have a length n = 2^some_integer
• Works best if N is = 2^some_integer. Otherwise wavelets approximate the left side of the signal at the expense of the right side.
• Cannot support weighted distance measures.

Singular Value Decomposition (SVD)

• All the previous methods try to transform each time series independently of the others.
• What if we take into account all the time series?
• We can then achieve the desired dimensionality reduction for the specific dataset.

SVD: Basic Idea (1)

[Figure]

SVD: Basic Idea (2)

[Figure]

SVD: Basic Idea (3)

[Figure]

SVD [more]

• The goal is to find the axes with the biggest variance.

High variance axes → A lot of information → Important axes
Low variance axes → Little information / noise → Axes can be truncated

SVD [more]

• In the previous intuition, we can keep the coefficients of the projections onto the new axes.
• This can be efficiently done by SVD.
• So we perform the dimensionality reduction in an aggregate way, taking into account the whole dataset.
• This idea was traditionally used in linear algebra for matrix compression.
• The idea was to find the (nearly) linearly dependent columns of a matrix A and eliminate them.
• It can be proved that this compression is optimal.

A = U Σ V^T
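The intuition of "keep the projections onto the high-variance axis" can be shown in two dimensions without any linear-algebra library. The sketch below is an illustration under assumptions, not a general SVD routine: it uses the closed-form principal direction of a 2×2 covariance matrix, which for mean-centred data coincides with the first right singular vector. The function names and the sample points are hypothetical.

```python
import math

def principal_axis_2d(points):
    """Mean and unit direction of largest variance for a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def reduce_to_1d(points):
    """Project each centred point onto the principal axis: one number per point."""
    (mx, my), (ux, uy) = principal_axis_2d(points)
    return [(p[0] - mx) * ux + (p[1] - my) * uy for p in points]

# Points scattered closely around the line y = x.
points = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.1)]
mean, axis = principal_axis_2d(points)
coords = reduce_to_1d(points)
print(axis, coords)
```

Each 2-D point is replaced by a single coordinate along the important axis; the discarded axis carries only the small scatter around the line.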

SVD: compression

[Figure]

Projection to the axis denoted by the biggest singular value s → MINIMUM information loss → Good for compression

SVD: Clustering

[Figure]

Projection to the axis denoted by the smallest singular value s → MAXIMUM information loss → Good for clustering

SVD (Pros and Cons)

• Optimal linear dimensionality reduction technique.
• The eigenvalues tell us something about the underlying structure of the data.
• Computationally very expensive.
 –  Time: O(Mn^2)
 –  Space: O(Mn)
• An insertion into the database requires recomputing the SVD.
• Cannot support weighted distance measures or non-Euclidean measures.

Piecewise Aggregate Approximation (PAA)

• Very simple, intuitive
• Represent the time series as a summation of boxes of equal length.
• We keep 13 boxes

[Figure: "PAA approximation" plot (x-axis 0 to 140, y-axis 9.8 to 11.4).]
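PAA is simple enough to sketch in a few lines. This is a minimal illustration (hypothetical function names, and it assumes the series length is a multiple of the number of boxes): each box stores the mean of its segment, and the approximation repeats that mean across the segment.

```python
def paa(series, n_boxes):
    """Mean of each of n_boxes equal-length segments of the series."""
    if len(series) % n_boxes != 0:
        raise ValueError("series length must be a multiple of n_boxes")
    width = len(series) // n_boxes
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series), width)]

def paa_reconstruct(means, width):
    """Step-function approximation: repeat each box mean `width` times."""
    return [m for m in means for _ in range(width)]

series = [1, 3, 5, 7, 2, 2]
means = paa(series, n_boxes=3)
print(means, paa_reconstruct(means, width=2))
```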

PAA (Pros and Cons)

• Fast, easy to implement, intuitive
• The authors claim it is as efficient as other approaches (empirically)
• Supports queries of arbitrary lengths
• Supports non-Euclidean measures
• It looks like a simplification of DWT that cannot be generalized to other types of signals

Adaptive Piecewise Constant Approximation (APCA)

• What about signals with flat areas and peaks?

IDEA: generalize PAA so it can automatically adapt itself to the correct box size.
(we should now keep both the length and height of the box)

[Figure: panels on an x-axis from 0 to 250. Raw Data (Electrocardiogram); Adaptive Representation (APCA), Reconstruction Error 2.61; Haar Wavelet or PAA, Reconstruction Error 3.27; DFT, Reconstruction Error 3.11.]

example by E. Keogh, IEEE ICDM 2004
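To illustrate what "boxes of adaptive length" buys, here is a sketch that stores each box as a (height, length) pair. It is loudly an assumption: the boxes are found with a simple greedy bottom-up merge chosen for brevity, which is not the procedure proposed by the APCA authors. The function names and the toy series are hypothetical.

```python
def merge_cost(a, b):
    """Increase in squared error when two adjacent boxes (mean, length) are merged."""
    (mean_a, len_a), (mean_b, len_b) = a, b
    return len_a * len_b / (len_a + len_b) * (mean_a - mean_b) ** 2

def adaptive_boxes(series, n_boxes):
    """Greedy illustration only: merge the cheapest adjacent pair until n_boxes remain."""
    boxes = [(float(v), 1) for v in series]          # start with one box per point
    while len(boxes) > n_boxes:
        i = min(range(len(boxes) - 1),
                key=lambda j: merge_cost(boxes[j], boxes[j + 1]))
        (mean_a, len_a), (mean_b, len_b) = boxes[i], boxes[i + 1]
        merged_mean = (mean_a * len_a + mean_b * len_b) / (len_a + len_b)
        boxes[i:i + 2] = [(merged_mean, len_a + len_b)]
    return boxes

def reconstruct(boxes):
    """Expand (height, length) pairs back into a step function."""
    return [height for height, length in boxes for _ in range(length)]

series = [1, 1, 1, 1, 9, 1, 1, 1]                    # flat signal with one peak
boxes = adaptive_boxes(series, n_boxes=3)
print(boxes, reconstruct(boxes))
```

With three adaptive boxes the flat parts get long boxes and the peak gets its own short box, whereas equal-length boxes would smear the peak over its neighbours. The price is two stored numbers per box instead of one.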

APCA [more]

• In order to implement it, the authors propose first a DWT transformation, followed by merging of the similar, adjacent wavelets.
• However, the indexing is more complicated than for PAA, since we need two numbers for each box.
• That is the reason why it is not used very often.

Piecewise Linear Approximation (PLA)

Linear segments for representation (not necessarily …)

Although efficient in some cases, the implementation is slow and it is not indexable.

example for visualization only

Non Linear Techniques

Dimensionality Reduction: A Comparative Review, L.J.P. van der Maaten, 2008

Non Linear techniques [2]

• A lot of techniques have emerged in the last years.
• However, [Maaten et al. 2008] compared them: on most of the datasets all these complicated techniques turn out to be worse.
• The reasons, the authors claim, are data overfitting and the curse of dimensionality!

Conclusion

• All the aforementioned techniques have their strong and weak points.
• Dr. Keogh tested them over 65 different datasets with different characteristics:

On average, they are all about the same. In particular, on 80% of the datasets they are all within 10% of each other.

So the choice of the best method depends on the characteristics of the dataset.

