Page 1

Subspace estimation in linear dimension reduction

Hannu Oja (with Klaus Nordhausen and David E. Tyler)

BIRS workshop, Banff, November 2015

1

Page 2

The plan

• Linear and nonlinear dimension reduction

• Supervised and unsupervised dimension reduction

• Similarities between PCA, FOBI and SIR

• Signal and noise subspaces

• Bootstrap tests for the dimension of signal subspace

• Estimation of the dimension of signal subspace

2

Page 3

Introduction

• Let x be a p-variate random vector with cumulative distribution Fx.

• Linear dimension reduction.

Find a projection matrix P such that you do not lose information

if you transform x → z = Px:

(i) x|Px is not “interesting” (unsupervised)

(ii) y ⊥⊥ x |Px for some “interesting” y (supervised)

• Nonlinear dimension reduction - not discussed here.

Find a (nonlinear) function H : Rp → Rk such that you do not lose information

if you transform x → z = H(x):

(i) x|H(x) is not “interesting” (unsupervised)

(ii) y ⊥⊥ x |H(x) for some “interesting” y (supervised)

3

Page 4

Linear dimension reduction

• The dimension of x is reduced using a k × p matrix B.

Then

x → z = Bx

or

x → z = PBx where PB = B′(BB′)−1B (a small numerical sketch is given below).

• The idea is that k << p and that “no information is lost” in the transformation.

• Dimension reduction methods (unsupervised and supervised):

PCA, ICA, ICS, SIR, SAVE, etc.
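To make the projection concrete, here is a minimal numerical sketch (my own illustration, not from the slides) of PB = B′(BB′)−1B for a matrix B with full row rank; the dimensions, the random B, and the random observation are all assumptions made only for this example.

```python
import numpy as np

# Minimal sketch: form the projection P_B = B'(BB')^{-1} B for a k x p matrix B
# (assumed to have full row rank) and check its basic properties.
rng = np.random.default_rng(0)
p, k = 6, 2
B = rng.standard_normal((k, p))            # hypothetical k x p reduction matrix
P_B = B.T @ np.linalg.inv(B @ B.T) @ B     # p x p projection onto the row space of B

x = rng.standard_normal(p)                 # one p-variate observation
z = P_B @ x                                # projected observation

print(np.allclose(P_B @ P_B, P_B))         # idempotent: True
print(np.allclose(B @ z, B @ x))           # Bx is unchanged by the projection: True
```

The last check reflects the point of the slide: reducing to Bx or to PBx carries exactly the same information about the directions spanned by the rows of B.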

4

Page 5

Looking for similarities: PCA, FOBI, SIR

• Assume that E(x) = 0. In PCA, one then finds the p × p transformation matrix W

such that

WW′ = Ip and WE(xx′)W′ = D

where D is a diagonal matrix with diagonal elements d1 ≥ ... ≥ dp ≥ 0.

• In independent component analysis (ICA), FOBI finds a transformation matrix W such that

WE(xx′)W′ = Ip and WE(xx′E(xx′)−1xx′)W′ = D,

where the diagonal elements of D are ordered so that |d1 − (p + 2)| ≥ ... ≥ |dp − (p + 2)|. (A small numerical FOBI sketch follows after this list.)

• The sliced inverse regression (SIR) uses a dependent variable y and finds a transformation matrix W which satisfies

WE(xx′)W′ = Ip and WE(E(x|y)E(x|y)′)W′ = D,

where the diagonal elements of D satisfy d1 ≥ ... ≥ dp ≥ 0.
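As a concrete illustration of the FOBI conditions above, here is a minimal sketch (my own, not the authors' code): whiten the centred data and eigendecompose the fourth-moment matrix E(‖z‖²zz′). The sample size and the component distributions below are assumptions chosen only for the example.

```python
import numpy as np

def fobi(X):
    """Minimal FOBI sketch: returns W with W Cov(x) W' = I_p, and the eigenvalues d,
    ordered by |d_i - (p + 2)|, of the whitened fourth-moment matrix E(||z||^2 z z')."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Cov(x)^{-1/2}
    Z = Xc @ cov_inv_sqrt                                     # whitened data
    cov4 = (Z * (Z ** 2).sum(axis=1, keepdims=True)).T @ Z / n
    d, U = np.linalg.eigh(cov4)
    order = np.argsort(-np.abs(d - (p + 2)))                  # most non-Gaussian first
    return (cov_inv_sqrt @ U[:, order]).T, d[order]

rng = np.random.default_rng(1)
X = np.column_stack([rng.chisquare(3, 1000),                  # two non-Gaussian components
                     rng.uniform(0, 1, 1000),
                     rng.standard_normal((1000, 2))])         # two Gaussian components
W, d = fobi(X)
print(d)   # the trailing eigenvalues should be close to p + 2 = 6 (Gaussian directions)
```

In the population, a Gaussian component gives the eigenvalue exactly p + 2, which is what the FOBI ordering above and the later test statistic exploit.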

5

Page 6

• The idea in dimension reduction is then that W = (W1′, W2′)′, where

– the k-dimensional W1x represents the information (the signal), and

– the (p − k)-dimensional W2x represents the noise.

6

Page 7

Figure 1: Data set 1, Fisher’s Iris Data: Original variables. [Scatter plot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.]

7

Page 8

Figure 2: Data set 1, Fisher’s Iris Data: Principal components. [Scatter plot matrix of Comp.1, Comp.2, Comp.3, and Comp.4.]

8

Page 9

Figure 3: Data set 1, Fisher’s Iris Data: FOBI coordinates. [Scatter plot matrix of IC.1, IC.2, IC.3, and IC.4.]

9

Page 10

Figure 4: Data set 2: Original variables. [Scatter plot matrix of V1 to V6.]

10

Page 11

Figure 5: Data set 2: Principal components. [Scatter plot matrix of the six components, panels labelled V1 to V6.]

11

Page 12

Figure 6: Data set 2: FOBI coordinates. [Scatter plot matrix of the six coordinates, panels labelled V1 to V6.]

12

Page 13

Figure 7: Data set 3: Original variables. [Scatter plot matrix of V1 to V6.]

13

Page 14

Figure 8: Data set 3: Principal components. [Scatter plot matrix of the six components, panels labelled V1 to V6.]

14

Page 15

Figure 9: Data set 3: FOBI coordinates. [Scatter plot matrix of the six coordinates, panels labelled V1 to V6.]

15

Page 16

Figure 10: Data set 4: Original variables. [Scatter plot matrix of X.1 to X.5 and y.]

16

Page 17

Figure 11: Data set 4: SIR coordinates. [Scatter plot matrix of Z.1 to Z.5 and y.]

17

Page 18

Testing whether W2x is noise

• In dimension reduction W = (W1′, W2′)′ and the k-variate W1x is assumed to carry the relevant information. We then wish to test the following null hypotheses, each saying that W2x represents noise:

– PCA:

(i) H0: W2x ∼ Np−k(0, σ²Ip−k),

(ii) H0: W2x is spherically symmetric, or

(iii) H0: W2x has exchangeable components.

– FOBI:

H0: W2x ∼ Np−k(0, Ip−k).

– SIR:

H0: (y, W1x) ⊥⊥ W2x (this implies y ⊥⊥ W2x | W1x together with the linearity condition).

• Unconventional semiparametric bootstrapping is used in the following to test these hypotheses.

18

Page 19

Test statistics for the dimension of W1x

• Let X = (x1, ..., xn)′ (or (y, X)) be a random sample from the distribution of x (or of (y, x)), and let Ŵ and D̂ be natural estimates of W and D, respectively. We then have the following.

• PCA: H0 implies that d1 ≥ ... ≥ dk > dk+1 = ... = dp. We choose

T(X) = − log [ (∏_{i=k+1}^{p} d̂i)^{1/(p−k)} / ( ∑_{i=k+1}^{p} d̂i / (p − k) ) ].

• FOBI: H0 implies that d1 ≥ ... ≥ dk > dk+1 = ... = dp = p + 2. We choose

T(X) = ∑_{i=k+1}^{p} (d̂i − (p + 2))².

• SIR: H0 implies that d1 ≥ ... ≥ dk > dk+1 = ... = dp = 0. We choose

T(y, X) = log [ (∏_{i=k+1}^{p} d̂i)^{1/(p−k)} ]   (or ∑_{i=k+1}^{p} d̂i²).
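A worked sketch (my own) of the three statistics above, written as plain functions of the estimated eigenvalues d̂1 ≥ ... ≥ d̂p and the hypothesised dimension k; the example eigenvalues at the end are invented purely for illustration.

```python
import numpy as np

def T_pca(d, k):
    # minus the log of (geometric mean / arithmetic mean) of the last p - k eigenvalues
    tail = np.asarray(d, dtype=float)[k:]
    return -(np.log(tail).mean() - np.log(tail.mean()))

def T_fobi(d, k, p):
    # squared distance of the last p - k eigenvalues from their Gaussian value p + 2
    tail = np.asarray(d, dtype=float)[k:]
    return np.sum((tail - (p + 2)) ** 2)

def T_sir(d, k):
    # the second form given on the slide: sum of squares of the last p - k eigenvalues
    tail = np.asarray(d, dtype=float)[k:]
    return np.sum(tail ** 2)

d_hat = [3.1, 1.9, 1.02, 0.97, 1.01]   # hypothetical PCA eigenvalues, p = 5
print(T_pca(d_hat, k=2))               # close to 0 when the last three are nearly equal
```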

19

Page 20

Tests based on limiting distributions

• Let X = (x1, ..., xn)′ (or (y, X)) be a random sample from the distribution of x (or of (y, x)), and let Ŵ and D̂ be natural estimates of W and D, respectively. We then have the following.

• PCA: Tyler (1981), Schott (2006), etc.

• FOBI: ?

• SIR: Li (1991), Bura and Cook (2001)

20

Page 21

PCA: Strategies for bootstrapping

• Write Z = (X − 1nµ′)W′ and Z = (Z1, Z2) = (X − 1nµ′)(W1′, W2′).

• Our bootstrap samples X∗ under the null model are then obtained as follows.

1. Write Z̃ = (Z̃1, Z̃2) for a bootstrap sample of size n from {z1, ..., zn}.

2. Set Z∗1 = Z̃1 and

2.1 Z∗2 = (O1z̃21, ..., Onz̃2n)′ for n independent random orthogonal (p − k) × (p − k) matrices O1, ..., On (subsphericity of W2x), or

2.2 Z∗2 = (P1z̃21, ..., Pnz̃2n)′ for n independent random (p − k) × (p − k) permutation matrices P1, ..., Pn (exchangeability of W2x).

3. Write Z∗ = (Z∗1, Z∗2).

4. Write X∗ = Z∗(W′)−1 + 1nµ′.

• An estimated p-value for a bootstrap test with the test statistic T(X) is then obtained as M⁻¹ #{j : T(X∗j) ≥ T(X)}, where X∗1, ..., X∗M are M independent bootstrap samples.
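A minimal sketch (my own reading of steps 1-4 above, not the authors' code) of how one bootstrap sample X∗ could be generated under subsphericity; the function names and the Haar-via-QR construction of the random rotations are my assumptions.

```python
import numpy as np

def random_orthogonal(m, rng):
    # Haar-distributed random orthogonal m x m matrix via QR of a Gaussian matrix
    Q, R = np.linalg.qr(rng.standard_normal((m, m)))
    return Q * np.sign(np.diag(R))

def pca_bootstrap_sample(Z, W, mu, k, rng):
    """One X* under subsphericity: resample rows of Z, keep the signal block,
    rotate each noise row by an independent random orthogonal matrix, map back."""
    n, p = Z.shape
    Zb = Z[rng.integers(0, n, n)].copy()         # step 1: ordinary bootstrap of the rows
    for i in range(n):                           # step 2.1: rotate each noise row
        Zb[i, k:] = random_orthogonal(p - k, rng) @ Zb[i, k:]
    # for step 2.2, replace the rotation by a random permutation of Zb[i, k:]
    return Zb @ np.linalg.inv(W.T) + mu          # step 4: X* = Z*(W')^{-1} + 1_n mu'

# An estimated p-value then averages the indicators T(X*_j) >= T(X)
# over M independent bootstrap samples, as in the bullet above.
```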

21

Page 22

PCA: Simulation results

• 500 repetitions (random samples) for sample sizes n = 50, 100, 150, 200 were

generated from N5(0, diag(3, 2, 1, 1, 1)).

• For each random sample, M = 200 bootstrap samples were generated for null

hypotheses k = 3, 2, 1 under the assumptions of subsphericity (O) and

subexchangeability (P).

• The proportion of bootstrap p-values below 0.05 is reported in the following. The true

value is k = 2.

n        k = 3            k = 2            k = 1
         O       P        O       P        O       P
50       0.026   0.020    0.032   0.028    0.416   0.392
100      0.016   0.016    0.044   0.050    0.828   0.814
150      0.016   0.016    0.054   0.060    0.970   0.972
200      0.022   0.018    0.036   0.034    0.998   0.998
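For completeness, a compact, self-contained sketch (mine, with deliberately reduced Monte Carlo sizes so that it runs quickly) of how one cell of the table above could be reproduced for the rotation-based test (O). The nominal level 0.05, the true k = 2, and the covariance diag(3, 2, 1, 1, 1) follow the slide; everything else (function names, seed, reduced sizes) is an assumption.

```python
import numpy as np

def T_pca(d, k):
    tail = np.asarray(d, dtype=float)[k:]
    return -(np.log(tail).mean() - np.log(tail.mean()))

def bootstrap_pvalue(X, k, M, rng):
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    d, V = np.linalg.eigh(Xc.T @ Xc / n)
    d, V = d[::-1], V[:, ::-1]                  # eigenvalues in decreasing order
    Z = Xc @ V                                  # PCA scores, Z = (X - 1 mu')W' with W = V'
    T_obs, hits = T_pca(d, k), 0
    for _ in range(M):
        Zb = Z[rng.integers(0, n, n)].copy()
        for i in range(n):                      # random rotation of each noise row
            Q, R = np.linalg.qr(rng.standard_normal((p - k, p - k)))
            Zb[i, k:] = (Q * np.sign(np.diag(R))) @ Zb[i, k:]
        Xb = Zb @ V.T + mu                      # back to the x-scale
        Xbc = Xb - Xb.mean(axis=0)
        db = np.linalg.eigvalsh(Xbc.T @ Xbc / n)[::-1]
        hits += T_pca(db, k) >= T_obs
    return hits / M

rng = np.random.default_rng(2015)
n, k, M, reps = 50, 2, 100, 50                  # smaller than the 500 / 200 on the slide
rej = 0
for _ in range(reps):
    X = rng.standard_normal((n, 5)) * np.sqrt([3.0, 2.0, 1.0, 1.0, 1.0])
    rej += bootstrap_pvalue(X, k, M, rng) < 0.05
print(rej / reps)   # rejection rate under H0, roughly comparable to the n = 50, k = 2 (O) entry
```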

22

Page 23

FOBI: Strategies for bootstrapping

• Write Z = (X − 1nµ′)W′ and Z = (Z1, Z2) = (X − 1nµ′)(W1′, W2′).

• Our bootstrap samples X∗ under the null model are then obtained as follows.

1. Write Z∗1 for a matrix of componentwise bootstrap samples of size n from Z1.

2. Let Z∗2 be a random sample of size n from Np−k(0, Ip−k).

3. Write Z∗ = (Z∗1, Z∗2).

4. Write X∗ = Z∗(W′)−1 + 1nµ′.

• An estimated p-value for a bootstrap test with the test statistic T(X) is then obtained as M⁻¹ #{j : T(X∗j) ≥ T(X)}, where X∗1, ..., X∗M are M independent bootstrap samples.
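A minimal sketch (my own) of one bootstrap sample under the FOBI null above: componentwise bootstrap for the signal columns and fresh standard Gaussian noise for the rest. The function name and its arguments are assumptions.

```python
import numpy as np

def fobi_bootstrap_sample(Z, W, mu, k, rng):
    """One X* under the FOBI null: Z1* is a componentwise bootstrap of the first k
    columns of Z, Z2* is drawn from N_{p-k}(0, I_{p-k}), and X* = Z*(W')^{-1} + 1_n mu'."""
    n, p = Z.shape
    Z_star = np.empty_like(Z, dtype=float)
    for j in range(k):                                # step 1: componentwise bootstrap of Z1
        Z_star[:, j] = rng.choice(Z[:, j], size=n, replace=True)
    Z_star[:, k:] = rng.standard_normal((n, p - k))   # step 2: Gaussian noise block
    return Z_star @ np.linalg.inv(W.T) + mu           # step 4: back to the x-scale
```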

23

Page 24

FOBI: Simulation results

• 500 repetitions (random samples) for sample sizes n = 50, 100, 200, ..., 1000 were

generated from a 5-variate independent component model (Settings 1 and 2 below).

• For each random sample, M = 200 bootstrap samples were generated for null

hypotheses k = 2, 3, 4 under the assumption of subgaussianity.

• The proportion of bootstrap p-values below 0.05 is reported in the following.

24

Page 25

• FOBI, Setting 1: The distributions of the independent components are χ²(3), N(0, 1), N(0, 1), N(0, 1), and U(0, 1), and the mixing matrix is I5. The true value is k = 2.

n        k = 3    k = 2    k = 1
50       0.026    0.030    0.044
100      0.028    0.050    0.104
200      0.030    0.062    0.236
500      0.018    0.062    0.890
1000     0.028    0.044    1.000

25

Page 26

• FOBI, Setting 2: The distributions of the independent components are exp(1), t6, N(0, 1), N(0, 1), and N(0, 1), and the mixing matrix is I5. The true value is k = 2.

n        k = 3    k = 2    k = 1
50       0.038    0.034    0.102
100      0.040    0.048    0.206
200      0.058    0.094    0.468
500      0.024    0.058    0.798
1000     0.028    0.070    0.962

26

Page 27

SIR: Strategies for bootstrapping

• Write Z = (X − 1nµ′)W′ and Z = (Z1, Z2) = (X − 1nµ′)(W1′, W2′).

• Our bootstrap samples (y∗, X∗) under the null model are then obtained as follows.

1. Let (y∗, Z∗1) be a bootstrap sample of size n from (y, Z1).

2. Let Z∗2 be a bootstrap sample of size n from Z2. (The two bootstrap samples are drawn independently.)

3. Write Z∗ = (Z∗1, Z∗2).

4. Write X∗ = Z∗(W′)−1 + 1nµ′.

• An estimated p-value for a bootstrap test with the test statistic T(y, X) is then obtained as M⁻¹ #{j : T((y∗, X∗)j) ≥ T(y, X)}, where (y∗, X∗)1, ..., (y∗, X∗)M are M independent bootstrap samples.
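A minimal sketch (my own) of one bootstrap sample under the SIR null above; drawing (y, Z1) and Z2 with two independent sets of row indices is what enforces independence of (y, W1x) and W2x. The function name and its arguments are assumptions.

```python
import numpy as np

def sir_bootstrap_sample(y, Z, W, mu, k, rng):
    """One (y*, X*) under the SIR null: (y, Z1) and Z2 are bootstrapped independently."""
    n, p = Z.shape
    idx1 = rng.integers(0, n, n)                   # rows for (y*, Z1*)
    idx2 = rng.integers(0, n, n)                   # independent rows for Z2*
    Z_star = np.hstack([Z[idx1, :k], Z[idx2, k:]])
    X_star = Z_star @ np.linalg.inv(W.T) + mu      # X* = Z*(W')^{-1} + 1_n mu'
    return y[idx1], X_star
```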

27

Page 28

SIR: Simulation results

• 500 repetitions (random samples) for sample sizes n = 50, 100, 200, ..., 1000 were

generated from a nonlinear model for response y and 5-variate x (Settings 1 and 2 below).

• For each random sample, M = 200 bootstrap samples were generated for null

hypotheses k = 2, 3, 4 under the assumption that (y, W1x) and W2x are

independent.

• The proportion of bootstrap p-values below 0.05 is reported in the following.

28

Page 29

• SIR, Setting 1: Now x ∼ N5(0, I5) and y = x1(x1 + x2 + 1) + ε, where ε ∼ N(0, 0.25) and ε ⊥⊥ x. Again, k = 2.

n        k = 3    k = 2    k = 1
100      0.010    0.024    0.162
200      0.004    0.034    0.298
500      0.004    0.042    0.552
1000     0.012    0.038    0.740
2000     0.010    0.040    0.908
5000     0.006    0.046    0.982
10000    0.010    0.052    0.996

29

Page 30

• SIR, Setting 2: Now x ∼ N5(0, I5) and y = x1 / (0.5 + (x2 + 1.5)²) + ε, where ε ∼ N(0, 0.25) and ε ⊥⊥ x. The true value is k = 2.

n        k = 3    k = 2    k = 1
100      0.006    0.030    0.242
200      0.010    0.036    0.398
500      0.006    0.060    0.710
1000     0.004    0.032    0.856
2000     0.006    0.030    0.950
5000     0.002    0.028    0.986
10000    0.008    0.060    0.996

30

Page 31

Final remarks

• FOBI and SIR serve here just as first examples of ICA (ICS) methods and supervised dimension reduction methods. Our approach works for other methods as well.

• Comparison: Asymptotic tests vs. bootstrap tests

• How to robustify?

– PCA: Replace the covariance matrix by a robust scatter matrix (elliptic case)

– FOBI: Use two robust scatter matrices with the independence property

– SIR: Robustify both Cov(x) and Cov(x|y). Gather et al. (2001, 2002), Yohai and Noste

(2005)

31

Page 32

• Estimation of k: Test H0,0, H0,1, H0,2, ... in turn, moving on to the next null hypothesis after each rejection; the first accepted H0,k gives the estimate of k (a small sketch follows below).

Figure 12: Estimation through a stepwise testing procedure. [Flowchart: test H0,0; if rejected, test H0,1; if rejected, test H0,2, and so on; accepting H0,0 gives k = 0, accepting H0,1 gives k = 1, etc.]
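A minimal sketch (my own) of the stepwise rule in Figure 12, written for an arbitrary bootstrap p-value routine; the helper name p_value_for_k and the default level 0.05 are assumptions made for illustration.

```python
def estimate_k(p_value_for_k, p, alpha=0.05):
    """Stepwise estimate of k: test H_{0,0}, H_{0,1}, ... in turn and return the first
    dimension whose null hypothesis is not rejected at level alpha."""
    for k in range(p):
        if p_value_for_k(k) >= alpha:   # H_{0,k} accepted, so stop here
            return k
    return p                            # every null rejected: no noise subspace found
```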

32

Page 33

Some references

Bura, E. and Cook, R.D. (2001). Extending sliced inverse regression: the weighted chi-squared test. Journal of the American Statistical Association, 96, 996-1003.

Dray, S. (2008). On the number of principal components: A test of dimensionality based on measurements of similarity between matrices. Computational Statistics & Data Analysis, 52, 2228-2237.

Ilmonen, P., Serfling, R., and Oja, H. (2012). Invariant coordinate selection (ICS) functionals. International Statistical Review, 80, 93-110.

Li, K.C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316-342.

Liski, E., Nordhausen, K., and Oja, H. (2013). Supervised invariant coordinate selection. Statistics: A Journal of Theoretical and Applied Statistics, 48, 711-731.

Miettinen, J., Nordhausen, K., Oja, H., and Taskinen, S. (2015). Fourth moments and independent component analysis. Statistical Science, 30, 372-390.

Tyler, D.E. (1981). Asymptotic inference for eigenvectors. The Annals of Statistics, 9, 725-736.

Tyler, D.E., Critchley, F., Dümbgen, L., and Oja, H. (2009). Invariant coordinate selection. Journal of the Royal Statistical Society B, 71, 549-592.

33

Page 34

THANK YOU FOR YOUR INTEREST!

34

